Decided to try writing a basic inverse preg_replace today, knowing that it would be impossible to make a “perfect” algo.
There are potentially a number of uses for one, though I’m currently thinking along the lines of a WYSIWYG editor for MyBB (dunno if I’ll make one). Well, I’ve gotten around to making a basic one which, in fact, probably works on most custom MyCodes posted in the MyBB community forums. The basic idea is to swap the replacement tokens ($1, $2 etc) with the captured subpatterns they refer to, and vice versa: the escaped replacement text becomes the new pattern, and the old pattern, with its capture groups turned into tokens, becomes the new replacement. This happens to fit nicely with most posted custom MyCodes.
So, in other words, a pattern of \[b\](.*?)\[/b\] and replacement <strong>$1</strong>, after passing through my inverse function, comes out with a pattern of \<strong\>(.*?)\</strong\> and replacement of [b]$1[/b].
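A quick round-trip check of that example might look like the following. The `#` delimiters and the sample string are assumptions added for illustration, and the inverse rule is written out by hand rather than generated.

```php
<?php
// Forward rule: the MyCode from the example above.
$forward = array('#\[b\](.*?)\[/b\]#', '<strong>$1</strong>');
// Inverse rule, copied by hand from the example above.
$inverse = array('#\<strong\>(.*?)\</strong\>#', '[b]$1[/b]');

$bbcode = 'some [b]bold[/b] text';
$html = preg_replace($forward[0], $forward[1], $bbcode);
$back = preg_replace($inverse[0], $inverse[1], $html);

var_dump($html);             // the HTML produced by the forward rule
var_dump($back === $bbcode); // whether the inverse rule restored the input
```

For a rule this simple, running the inverse over the forward output gets the original text back, which is the property a WYSIWYG editor would rely on.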
It takes the position of each token into account and handles repeated tokens in the replacement string correctly, so a pattern of \[tag\](a)(b)\1\[/tag\] and replacement <strong>$2$1$2$1</strong> comes out with a pattern of \<strong\>(b)(a)\1\2\</strong\> and replacement of [tag]$2$1$2[/tag].
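The bookkeeping behind that can be sketched roughly as follows. This is a cut-down illustration, not the function from this post: `tokens_to_pattern()` and `pattern_to_tokens()` are hypothetical helpers, and they assume flat capture groups with no nesting, quantifiers or escaped brackets. The first time a token appears it becomes a capture group; later appearances become back references.

```php
<?php
// Hypothetical helper: build the new pattern from the old replacement string.
// $groups holds the old capture groups in order, e.g. array('(a)', '(b)').
function tokens_to_pattern(array $groups, $replacement) {
    // Odd indexes of $parts are token numbers, even indexes are literal text.
    $parts = preg_split('~\$(\d+)~', $replacement, -1, PREG_SPLIT_DELIM_CAPTURE);
    $map = array(); // old group number => new group number
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 == 0) {
            $out .= preg_quote($part, '#');
        } elseif (isset($map[(int)$part])) {
            $out .= '\\' . $map[(int)$part]; // seen before: back reference
        } else {
            $map[(int)$part] = count($map) + 1;
            $out .= $groups[(int)$part - 1]; // first use: emit the group itself
        }
    }
    return array('pattern' => $out, 'map' => $map);
}

// Hypothetical helper: turn the old pattern into the new replacement string.
function pattern_to_tokens($pattern, array $map) {
    $k = 0; // running number of the capture group being visited
    $out = preg_replace_callback('~\((?!\?)[^()]*\)|\\\\(\d+)~', function ($m) use (&$k, $map) {
        $n = isset($m[1]) ? (int)$m[1] : ++$k; // back reference, or next group
        return isset($map[$n]) ? '$' . $map[$n] : '';
    }, $pattern);
    return preg_replace('~\\\\(.)~s', '$1', $out); // drop the remaining escapes
}

$new = tokens_to_pattern(array('(a)', '(b)'), '<strong>$2$1$2$1</strong>');
$replacement = pattern_to_tokens('\[tag\](a)(b)\1\[/tag\]', $new['map']);
var_dump($new['pattern'], $replacement);
```

The map from old to new group numbers is what keeps both sides consistent: (b) is used first, so it becomes group 1 in the new pattern, and every reference to the old group 2 is renumbered to match.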
Obviously, however, it cannot handle all patterns, or even many of them. The inverse function can’t readily determine what to put in place of parts of the pattern which aren’t captured or used. It will try to make guesses sometimes, but they’re usually crap.
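A small demonstration of why: in the pattern below, the optional `e` is matched but never captured, so two different inputs collapse to the same output and nothing in the result says which one to restore. The `#` delimiters and the sample inputs are assumptions for illustration.

```php
<?php
$pattern = '#\[tag\](a)e?\[/tag\]#';
$replacement = '<strong>$1</strong>';

$first  = preg_replace($pattern, $replacement, '[tag]a[/tag]');
$second = preg_replace($pattern, $replacement, '[tag]ae[/tag]');

// Both inputs give the same HTML, so the "e" cannot be recovered from it.
var_dump($first === $second);
```

Any inverse has to pick one of the possible sources here, which is where the guessing comes in.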
At the current stage, $0 replacement and nested subpatterns are a little problematic, but probably work.
Update: since I’m no longer working on this (and as requested in the comments) here’s the code I was working on:
<?php
// this is a simple inverse for preg_replace. This will obviously not be a complete reverse (it will only replace captured patterns) and even then, probably won't work all the time.
function pregreplace_inverse($pattern, $replacement) {
    $pattern = str_replace("\0", '', $pattern); // preg patterns can't contain nulls anyway, but be pedantic...
    // backup pattern
    $orig_pattern = $pattern;

    // first, find all captured patterns
    $matches = $placeholders = array();
    pregreplace_inverse_makephcache($placeholders);
    $num_matches = 0;
    /* explanation of below regular expression
     * .* greedy prefix (to properly handle stacked expressions)
     * (^|[^\\\\](?:\\\\\\\\)*) ensure the bracketed thing hasn't been backslash escaped
     * \((?:[^?\\\\]|[^?].*?([^\\\\](?:\\\\\\\\)*))\) capture main pattern - if only one character, it can't be a ? or a \, if more than one character, can't start with a ? (non-captured) or end with a backslash (escaped bracket)
     * ([.*]\\??|\\?|\\{\d+(?:,\d*)?\\})? also stick in any quantifiers which follow the brackets
     */
    while(preg_match('~.*(?:^|[^\\\\](?:\\\\\\\\)*)(\(([^?\\\\]|[^?].*?(?:[^\\\\](?:\\\\\\\\)*))\)([.*]\\??|\\?|\\{\d+(?:,\d*)?\\})?)~s', $pattern, $match, PREG_OFFSET_CAPTURE)) {
        // remove the captured pattern from the pattern, and put in placeholder
        if(!isset($placeholders[$num_matches])) pregreplace_inverse_makephcache($placeholders);
        $pattern = substr($pattern, 0, $match[1][1]) . $placeholders[$num_matches] . substr($pattern, $match[1][1] + strlen($match[1][0]));
        $matches[$num_matches] = array(
            'pattern' => $match[2][0],
            'quant' => $match[3][0],
            //'id' => $num_matches,
        );
        ++$num_matches;
    }
    // now reverse matches, as we retrieved them from back-to-front
    //$matches = array_reverse($matches);

    // replace matches in matches with back references
    // TODO: check
    /* foreach($matches as &$match) {
        if(strpos($match['pattern'], "\0") !== false) {
            $match['pattern'] = preg_replace('~\\0__placeholder__(\d+)__\\0~e', '\'\\\\\'.'.$num_matches.'-$1', $match['pattern']);
        }
    } */

    // now we start changing the replacement string
    $r = preg_split('~(?<=[^\\\\]|^)(\\\\\\\\)*(\$\d+)~s', $replacement, -1, PREG_SPLIT_DELIM_CAPTURE);
    $c = count($r);
    $pc = 0; // pattern count (for backrefs)
    //for($i=2; $i<$c; $i+=3) {
    for($i=0; $i<$c; $i++) {
        if(($i-2) % 3) {
            $r[$i] = preg_quote($r[$i], '#');
        } else {
            // grab number
            $n = intval(substr($r[$i], 1));
            if($n) {
                $match =& $matches[$num_matches-$n];
                if(isset($match['pid']))
                    $r[$i] = '\\'.$match['pid']; // back reference
                else {
                    $r[$i] = '('.$match['pattern'].')'.$match['quant'];
                    $match['pid'] = ++$pc;
                }
            } else {
                // replacement for $0
                if(isset($orig_pid))
                    $r[$i] = '\\'.$orig_pid;
                else {
                    $r[$i] = '('.$orig_pattern.')';
                    $orig_pid = ++$pc;
                    // as the original pattern may contain capturing subpatterns, amend $pc accordingly
                    $pc += $num_matches-1;
                    // note that the above isn't a "perfectly" correct way to do this (eg patterns can differ)
                }
            }
        }
    }
    // fix nested patterns
    for($i=2; $i<$c; $i+=3) {
        if(strpos($r[$i], "\0") !== false) {
            // TODO: check if it references a future reference and swap if necessary
            $r[$i] = preg_replace('~\\0__placeholder__(\d+)__\\0~e', 'isset($matches[$1][\'pid\']) ? \'\\\\\\\\\'.$matches[$1][\'pid\'] : \'\'', $r[$i]);
        }
    }
    $r = implode('', $r);

    // finally, fix up the source pattern
    // first, try to do basic heuristics - this will suck, but try something at least, that'll probably work in most cases
    $pattern = preg_replace(array(
        '~\\\\(x[0-9a-fA-F]{0,2}|0[0-7]{0,2}|c.|[^xc0-9])~e', // escape sequence (must be first)
        //'~\(\?[<>]?[=!].+?\)~s', // look behind/ahead - bad pattern because nested brackets can stuff it up, but I'm lazy
        //'~\[([^\^]).*?\]~', // character sequence (we're possibly a bit stuffed if this contains an escape sequence...)
        '~\[\^.+?\]~', // exclusion character sequence
        '~\.~', // any char
        '~[*+]\??~', // quantifier
        '~\?~', // quantifier2
        '~\{\d+(?:,\d*)?\}\??~', // quantifier3 ({0} not handled correctly)
        // keep ^ and $ tokens as is, as they're _probably_ okay
        // can't handle (?: ... ) or |'s so ignore >_>
    ), array(
        'pregreplace_inverse_ptn_escape(\'$1\')',
        //'', // throw look ahead/behind away - can't deal with them
        //'$1', // just replace with first character
        "\1", // random character which is unlikely to be excluded
        ' ', // well, here's a character...
        '', // throw away quantifiers
        '', // throw away quantifiers
        '', // throw away quantifiers
    ), $pattern);

    // next, placeholders, and backreferences
    $pattern = preg_replace(array(
        '~\\0__placeholder__(\d+)__\\0~e',
        '~\\\\(\d+)~e'
    ), array(
        'isset($matches[$1][\'pid\']) ? \'$\'.$matches[$1][\'pid\'] : \'\'',
        'isset($matches[$num_matches-$1][\'pid\']) ? \'$\'.$matches[$num_matches-$1][\'pid\'] : \'\''
    ), $pattern);

    return array(
        'pattern' => $r,
        'replacement' => $pattern
    );
}

function pregreplace_inverse_makephcache(&$a) {
    $c = count($a);
    for($i=0; $i<50; $i++) // increment cache size by 50
        $a[] = "\0__placeholder__".($c+$i)."__\0";
}

function pregreplace_inverse_ptn_escape($char) {
    $char = str_replace('\\"', '"', $char);
    switch($char[0]) {
        case '[': case ']': case '(': case ')': case '{': case '}':
        case '?': case '*': case '+': case '.': case '|': case '^':
        case '$': case '\\':
            return $char;
        case 'a': return "\x07"; // bell; PHP strings have no "\a" escape
        case 'e': return "\x1B";
        case 'f': return "\x0C";
        case 'n': return "\n";
        case 'r': return "\r";
        case 't': return "\t";
        case 'c': return chr(ord(strtoupper($char[1])) ^ 0x40);
        case 'x':
            $hex = substr($char, 1);
            if($hex) return chr(hexdec($hex));
            else return "\0";
        case '0':
            $oct = substr($char, 1);
            if($oct) return chr(octdec($oct));
            else return "\0";
        case 'd': return '0';
        case 'D': return 'a';
        case 's': return ' ';
        case 'S': return '_';
        case 'w': return 'a';
        case 'W': return ' ';
        case 'b': case 'B': case 'A': case 'Z': case 'z': case 'G':
            return '';
        default: // also handles back references :P
            return '\\'.$char;
    }
}

var_dump(pregreplace_inverse(
    '\[tag\](a)e?\[/tag\]',
    '<strong>$1</strong>'
));
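One note for anyone trying to run this: the `e` pattern modifier used in several of the `preg_replace()` calls was removed in PHP 7, so those calls would need porting to `preg_replace_callback()`. As a rough sketch of what that might look like for the placeholder step (the `$matches` contents and the input string are made up for illustration, and this is not a drop-in patch for the code above):

```php
<?php
// Made-up state: placeholder 0 became group 2, placeholder 1 became group 1.
$matches = array(
    0 => array('pid' => 2),
    1 => array('pid' => 1),
);
// Source pattern with its two capture groups already swapped for placeholders.
$pattern = "\\[tag\\]\0__placeholder__1__\0\0__placeholder__0__\0\\[/tag\\]";

// Same regex as the placeholder step, minus the "e" modifier; the evaluated
// string becomes a closure that looks up the new group number.
$tokens = preg_replace_callback('~\\0__placeholder__(\d+)__\\0~', function ($m) use ($matches) {
    $id = (int)$m[1];
    return isset($matches[$id]['pid']) ? '$' . $matches[$id]['pid'] : '';
}, $pattern);

var_dump($tokens);
```

The closure receives the match array directly, so there is no evaluated string to quote and the layers of backslash escaping in the original mostly go away.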
I don’t think you’ll make something like it now 🙂
But you have given me a nice idea.
If you ever did try to make an editor for MyBB, can I have its code?
You’re right, no longer interested.
I’ve put up the reversing code that I wrote above, but it probably isn’t much use to you.