Decided to try writing a basic inverse preg_replace today, knowing that it would be impossible to make a “perfect” algo.
There are potentially a number of uses for such a thing, though I’m currently thinking along the lines of a WYSIWYG editor for MyBB (dunno if I’ll make one). Well, I’ve gotten around to making a basic one, which probably works on most custom MyCodes posted in the MyBB community forums. The basic idea is to swap the replacement tokens ($1, $2, etc.) with the capture groups they refer to in the source pattern, and vice versa, which happens to fit nicely with most posted custom MyCodes.
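A minimal sketch of one half of that swap, going from the source pattern to a replacement. The helper name pattern_to_replacement is hypothetical (it isn't part of the code posted below). It assumes flat capture groups with no parentheses or backslashes inside them and no back references, and any other escape simply loses its backslash.

```php
<?php
// Hypothetical helper: one half of the token swap. Each capture group in the
// source pattern becomes a $n token, and other backslash escapes lose their
// backslash. Assumes flat groups with no "(", ")" or "\" inside them and no
// back references; real MyCode patterns can break these assumptions.
function pattern_to_replacement(string $pattern): array
{
    $groups = []; // group number => sub-pattern it captured
    $replacement = preg_replace_callback(
        '~\\\\(.)|\(([^()\\\\]*)\)~s',
        function (array $m) use (&$groups): string {
            if ($m[1] !== '') {
                return $m[1];          // escaped literal: "\[" -> "["
            }
            $n = count($groups) + 1;
            $groups[$n] = $m[2];       // remember the group's sub-pattern
            return '$' . $n;           // "(.*?)" -> "$1"
        },
        $pattern
    );
    return ['replacement' => $replacement, 'groups' => $groups];
}

var_dump(pattern_to_replacement('\[b\](.*?)\[/b\]'));
```

The returned groups are what the other half of the swap needs when it rebuilds a pattern from the replacement string.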
So, in other words, a pattern of \[b\](.*?)\[/b\] and replacement <strong>$1</strong>, after passing through my inverse function, comes out with a pattern of \<strong\>(.*?)\</strong\> and replacement of [b]$1[/b].
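A quick way to sanity-check a pair like this is a round trip with plain preg_replace: convert some MyCode to HTML with the original pair, then convert it back with the inverted pair. The ~ delimiters and the s modifier are additions for this sketch, since the patterns above are written without delimiters.

```php
<?php
// Round trip of the [b] example using only preg_replace.
// Delimiters (~) and the s modifier are assumptions added for the sketch.
$forward = ['~\[b\](.*?)\[/b\]~s', '<strong>$1</strong>'];
$inverse = ['~\<strong\>(.*?)\</strong\>~s', '[b]$1[/b]'];

$bbcode = 'some [b]bold[/b] text';
$html = preg_replace($forward[0], $forward[1], $bbcode);
$back = preg_replace($inverse[0], $inverse[1], $html);

echo $html, "\n"; // some <strong>bold</strong> text
echo $back, "\n"; // some [b]bold[/b] text
```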
It also takes group positions into account and handles repeated tokens in the replacement string correctly, so a pattern of \[tag\](a)(b)\1\[/tag\] and replacement <strong>$2$1$2$1</strong> comes out with a pattern of \<strong\>(b)(a)\1\2\</strong\> and replacement of [tag]$2$1$2[/tag]. The first occurrence of each token becomes a capture group, and later occurrences become back references to it.
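The repetition handling can be sketched on its own, going from the replacement string to a pattern. replacement_to_pattern is a hypothetical name, and it assumes the caller already knows the sub-pattern of each source group (keyed by group number) and that the replacement only uses $n tokens for groups that exist, with no $0. New groups are numbered in order of first appearance.

```php
<?php
// Hypothetical sketch: build a pattern from a replacement string, given the
// sub-pattern of each source group ($groups[1] is group 1, and so on).
// First use of $n -> new capture group; later uses -> back reference to it.
function replacement_to_pattern(array $groups, string $replacement): string
{
    // one capture group, so even indexes are literal text, odd ones are tokens
    $parts = preg_split('~(\$\d+)~', $replacement, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pid = [];   // source group number => group number in the new pattern
    $next = 0;   // capture groups emitted so far
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= preg_quote($part);       // literal text between tokens
            continue;
        }
        $n = (int) substr($part, 1);         // "$2" -> 2
        if (isset($pid[$n])) {
            $out .= '\\' . $pid[$n];         // repeated token: back reference
        } else {
            $pid[$n] = ++$next;
            $out .= '(' . $groups[$n] . ')'; // first use: capture group
        }
    }
    return $out;
}

$p = replacement_to_pattern([1 => 'a', 2 => 'b'], '<strong>$2$1$2$1</strong>');
echo $p, "\n"; // \<strong\>(b)(a)\1\2\</strong\>
var_dump(preg_match('~' . $p . '~', '<strong>baba</strong>')); // int(1)
```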
Obviously, however, it can’t handle all patterns, or even many of them. The inverse function can’t readily determine what to emit for parts of the pattern that aren’t captured (or are captured but never used in the replacement): it will sometimes try to guess, but the guesses are usually crap.
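Part of the reason no inverse can be exact is that preg_replace throws information away. In this sketch the source pattern accepts an optional, uncaptured e, so two different inputs produce the same HTML. The inverse pair is hand-written as an assumption of what a guess might look like (keeping e? as a plain e); whatever the guess, at least one of the two inputs can't come back unchanged.

```php
<?php
// Two inputs collapse to the same output, so the inverse has to pick one.
// The inverse pair is a hand-written guess: "e?" is kept as a literal "e".
$forward = ['~\[tag\](a)e?\[/tag\]~', '<strong>$1</strong>'];
$inverse = ['~\<strong\>(a)\</strong\>~', '[tag]$1e[/tag]'];

foreach (['[tag]a[/tag]', '[tag]ae[/tag]'] as $input) {
    $html = preg_replace($forward[0], $forward[1], $input);
    $back = preg_replace($inverse[0], $inverse[1], $html);
    printf("%-14s -> %-18s -> %s\n", $input, $html, $back);
}
```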
At the current stage, $0 replacement and nested subpatterns are a little problematic, but probably work.
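A sketch of why $0 gets awkward: $0 is the whole match, so the inverse pattern has to wrap the entire source pattern in a group of its own. That wrapper takes one group number and every capturing group inside the source pattern takes another, so any group added after it is shifted along by all of them. The patterns here are made up for illustration (think of a replacement like $0 $1).

```php
<?php
// $0 means "the whole match", so an inverse that needs it wraps the entire
// source pattern in a new group. The source's own groups are numbered after
// that wrapper, shifting every group added later.
$source = '\[b\](x)\[/b\]';          // one capturing group of its own
$inverse = '(' . $source . ') (x)';  // e.g. inverting a replacement "$0 $1"

preg_match('~' . $inverse . '~', '[b]x[/b] x', $m);
print_r($m);
// $m[1] is the wrapper (for $0), $m[2] is the source's own group,
// so the separately added "(x)" ends up as group 3, not group 2.
```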
Update: since I’m no longer working on this (and as requested in the comments) here’s the code I was working on:
<?php
// this is a simple inverse for preg_replace. This will obviously not be a complete reverse (it will only replace captured patterns) and even then, probably won't work all the time.
function pregreplace_inverse($pattern, $replacement) {
$pattern = str_replace("\0", '', $pattern); // preg patterns can't contain nulls anyway, but be pedantic...
// backup pattern
$orig_pattern = $pattern;
// first, find all captured patterns
$matches = $placeholders = array();
pregreplace_inverse_makephcache($placeholders);
$num_matches = 0;
/* explanation of below regular expression
* .* greedy prefix, so each match is the last remaining bracket (to properly handle stacked expressions)
* (?:^|[^\\\\](?:\\\\\\\\)*) ensure the bracketed thing hasn't been backslash escaped
* \(([^?\\\\]|[^?].*?(?:[^\\\\](?:\\\\\\\\)*))\) capture main pattern - if only one character, it can't be a ? or a \, if more than one character, can't start with a ? (non-captured) or end with a backslash (escaped bracket)
* ([+*]\\??|\\?|\\{\d+(?:,\d*)?\\})? also stick in any quantifiers (*, +, ?, {n,m}) which follow the brackets
*/
while(preg_match('~.*(?:^|[^\\\\](?:\\\\\\\\)*)(\(([^?\\\\]|[^?].*?(?:[^\\\\](?:\\\\\\\\)*))\)([+*]\\??|\\?|\\{\d+(?:,\d*)?\\})?)~s', $pattern, $match, PREG_OFFSET_CAPTURE)) {
// remove the captured pattern from the pattern, and put in placeholder
if(!isset($placeholders[$num_matches])) pregreplace_inverse_makephcache($placeholders);
$pattern = substr($pattern, 0, $match[1][1]) . $placeholders[$num_matches] . substr($pattern, $match[1][1] + strlen($match[1][0]));
$matches[$num_matches] = array(
'pattern' => $match[2][0],
'quant' => isset($match[3]) ? $match[3][0] : '', // the quantifier group is absent when nothing follows the bracket
//'id' => $num_matches,
);
++$num_matches;
}
// note: matches were retrieved back-to-front, so $matches[0] is the LAST group in the pattern;
// instead of reversing, group $n is looked up as $matches[$num_matches-$n] below
// replace matches in matches with back references
// TODO: check
/* foreach($matches as &$match) {
if(strpos($match['pattern'], "\0") !== false) {
$match['pattern'] = preg_replace('~\\0__placeholder__(\d+)__\\0~e', '\'\\\\\'.'.$num_matches.'-$1', $match['pattern']);
}
} */
// now we start changing the replacement string
$r = preg_split('~(?<=[^\\\\]|^)(\\\\\\\\)*(\$\d+)~s', $replacement, -1, PREG_SPLIT_DELIM_CAPTURE);
$c = count($r);
$pc = 0; // pattern count (for backrefs)
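// preg_split gives triples of (literal text, escaped backslashes, $n token),
// so the tokens sit at indexes 2, 5, 8, ...; everything else is quoted as literal text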
for($i=0; $i<$c; $i++) {
if(($i-2) % 3) {
$r[$i] = preg_quote($r[$i], '#');
} else {
// grab number
$n = intval(substr($r[$i], 1));
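if($n > $num_matches) {
// preg_replace substitutes an empty string for a group that doesn't exist, so there's nothing to match
$r[$i] = '';
} elseif($n) {
$match =& $matches[$num_matches-$n];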
if(isset($match['pid']))
$r[$i] = '\\'.$match['pid']; // back reference
else {
$r[$i] = '('.$match['pattern'].')'.$match['quant'];
$match['pid'] = ++$pc;
}
}
else {
// replacement for $0
if(isset($orig_pid))
$r[$i] = '\\'.$orig_pid;
else {
$r[$i] = '('.$orig_pattern.')';
$orig_pid = ++$pc;
// as the original pattern may contain capturing subpatterns, amend $pc accordingly:
// the wrapper takes $orig_pid, and the original's $num_matches groups are numbered after it
$pc += $num_matches;
// note that the above isn't a "perfectly" correct way to do this (eg patterns can differ)
}
}
}
}
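// fix nested patterns: a placeholder left inside a group becomes a back reference
// (the /e modifier was removed in PHP 7, so use a callback instead)
for($i=2; $i<$c; $i+=3) {
if(strpos($r[$i], "\0") !== false) {
// TODO: check if it references a future reference and swap if necessary
$r[$i] = preg_replace_callback('~\\0__placeholder__(\d+)__\\0~', function($m) use ($matches) {
return isset($matches[$m[1]]['pid']) ? '\\'.$matches[$m[1]]['pid'] : '';
}, $r[$i]);
}
}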
$r = implode('', $r);
// finally, fix up the source pattern
// first, try to do basic heuristics - this will suck, but try something at least, that'll probably work in most cases
// this is done in a single left-to-right pass (with a callback, as /e is gone in PHP 7), so a literal
// produced by unescaping (eg the * from \*) is never re-read as a quantifier
// keep ^ and $ tokens as is, as they're _probably_ okay
// can't handle (?: ... ), look ahead/behind, [...] character sequences or |'s so ignore >_>
$pattern = preg_replace_callback(
'~\\\\(x[0-9a-fA-F]{0,2}|0[0-7]{0,2}|c.|[^xc0-9])' // escape sequence
. '|\[\^.+?\]' // exclusion character sequence
. '|\.' // any char
. '|[*+]\??|\?|\{\d+(?:,\d*)?\}\??~', // quantifiers ({0} not handled correctly)
function($m) {
if(isset($m[1])) return pregreplace_inverse_ptn_escape($m[1]);
if($m[0][0] === '[') return "\1"; // random character which is unlikely to be excluded
if($m[0] === '.') return ' '; // well, here's a character...
return ''; // throw away quantifiers
},
$pattern
);
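// next, placeholders, and backreferences (callbacks instead of the removed /e modifier)
$pattern = preg_replace_callback('~\\0__placeholder__(\d+)__\\0~', function($m) use ($matches) {
return isset($matches[$m[1]]['pid']) ? '$'.$matches[$m[1]]['pid'] : '';
}, $pattern);
$pattern = preg_replace_callback('~\\\\(\d+)~', function($m) use ($matches, $num_matches) {
$k = $num_matches - $m[1];
return isset($matches[$k]['pid']) ? '$'.$matches[$k]['pid'] : '';
}, $pattern);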
return array(
'pattern' => $r,
'replacement' => $pattern
);
}
function pregreplace_inverse_makephcache(&$a) {
$c = count($a);
for($i=0; $i<50; $i++) // increment cache size by 50
$a[] = "\0__placeholder__".($c+$i)."__\0";
}
function pregreplace_inverse_ptn_escape($char) {
$char = str_replace('\\"', '"', $char);
switch($char[0]) { // $char{0}-style string offsets no longer parse in PHP 8
case '[': case ']': case '(': case ')': case '{': case '}':
case '?': case '*': case '+': case '.': case '|': case '^': case '$': case '\\':
return $char;
case 'a': return "\x07"; // PHP strings have no "\a" escape, so spell out BEL
case 'e': return "\x1B";
case 'f': return "\x0C";
case 'n': return "\n";
case 'r': return "\r";
case 't': return "\t";
case 'c': return chr(ord(strtoupper($char[1])) ^ 0x40);
case 'x':
$hex = substr($char, 1);
if($hex) return chr(hexdec($hex));
else return "\0";
case '0':
$oct = substr($char, 1);
if($oct) return chr(octdec($oct));
else return "\0";
case 'd': return '0';
case 'D': return 'a';
case 's': return ' ';
case 'S': return '_';
case 'w': return 'a';
case 'W': return ' ';
case 'b': case 'B': case 'A': case 'Z': case 'z': case 'G':
return '';
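default:
// other non-alphanumeric characters are simply escaped literals (eg \/ or \<)
if(!preg_match('~^[a-z0-9]~i', $char)) return $char;
// anything else (eg \p or \R) is left escaped; back references never get here
return '\\'.$char;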
}
}
var_dump(pregreplace_inverse(
'\[tag\](a)e?\[/tag\]', '<strong>$1</strong>'
));
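The capture-group extraction loop in pregreplace_inverse() finds groups from the end of the pattern backwards, which is why group $n is later looked up as $matches[$num_matches-$n]. This is a simplified sketch of just that ordering, with a hypothetical helper name and a much simpler group regex (no escaped or nested parentheses, no quantifiers); the NUL-wrapped placeholders follow the same idea as the real code.

```php
<?php
// Hypothetical sketch of the extraction order: the greedy ".*" prefix makes
// preg_match land on the LAST "(...)" each time round the loop.
function extract_groups_back_to_front(string $pattern): array
{
    $found = [];
    while (preg_match('~.*(\(([^()]*)\))~s', $pattern, $m, PREG_OFFSET_CAPTURE)) {
        $found[] = $m[2][0];
        // swap the group for a NUL-wrapped placeholder so it isn't found again
        $placeholder = "\0ph" . (count($found) - 1) . "\0";
        $pattern = substr($pattern, 0, $m[1][1]) . $placeholder
                 . substr($pattern, $m[1][1] + strlen($m[1][0]));
    }
    return ['groups' => $found, 'pattern' => $pattern];
}

$res = extract_groups_back_to_front('\[tag\](a)(b)\[/tag\]');
$n = count($res['groups']);
// group $k of the source pattern is $res['groups'][$n - $k]
echo $res['groups'][$n - 1], $res['groups'][$n - 2], "\n"; // prints "ab"
```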

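The replacement-splitting step in pregreplace_inverse() relies on how preg_split lays out its results with PREG_SPLIT_DELIM_CAPTURE. This standalone sketch runs the same split regex on a sample replacement and labels each piece with the same ($i-2) % 3 test the loop uses. The backslash group still produces an (empty) entry when it matches nothing, which keeps the layout in steps of three.

```php
<?php
// Same split regex as pregreplace_inverse(): two groups in the delimiter
// (a run of escaped backslashes, then the $n token), so with
// PREG_SPLIT_DELIM_CAPTURE each delimiter contributes two entries.
$r = preg_split(
    '~(?<=[^\\\\]|^)(\\\\\\\\)*(\$\d+)~s',
    '<strong>$2$1</strong>',
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

foreach ($r as $i => $piece) {
    // the test the original loop uses to tell tokens from everything else
    $kind = (($i - 2) % 3) ? 'literal/backslashes' : 'token';
    printf("%d %-19s %s\n", $i, $kind, var_export($piece, true));
}
```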
I don’t suppose you’ll make something like it now 🙂
But you have given me a nice idea for it.
If you ever do make an editor for MyBB, can I have its code?
You’re right, no longer interested.
I’ve put up the reversing code that I wrote above, but it probably isn’t much use to you.