The problem is that in the worst case, converting an NFA to a DFA gives an overh...
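
The comment above is cut off. Assuming it refers to the well-known worst case of NFA-to-DFA conversion, where the subset construction can need exponentially many DFA states, here is a minimal Python sketch that shows the blowup. It uses the textbook language "strings over {a, b} whose n-th symbol from the end is a". The function names and the dict-of-sets transition encoding are hypothetical choices for illustration, not code from re2 or Python's re.

```python
from collections import deque


def nth_from_end_nfa(n):
    """NFA over {a, b} accepting strings whose n-th symbol from the end is 'a'.

    State 0 loops on both symbols and, on 'a', guesses that this is the
    n-th-from-last symbol; states 1..n then count the remaining symbols.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        for c in "ab":
            delta[(i, c)] = {i + 1}
    return delta, 0, {n}  # transitions, start state, accepting states


def subset_construction(delta, start, alphabet="ab"):
    """Build the reachable DFA states; each DFA state is a set of NFA states."""
    start_set = frozenset({start})
    seen = {start_set}
    queue = deque([start_set])
    transitions = {}
    while queue:
        current = queue.popleft()
        for c in alphabet:
            target = frozenset(q for p in current for q in delta.get((p, c), ()))
            transitions[(current, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, transitions


for n in range(1, 11):
    delta, start, _accepting = nth_from_end_nfa(n)
    dfa_states, _ = subset_construction(delta, start)
    print(f"n={n:2}: NFA states={n + 1:3}, DFA states={len(dfa_states):5}")
```

The NFA grows by one state per step of lookback, while the DFA column doubles each time (2^n states). This is the standard example where even the minimized DFA needs 2^n states.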

koenigdavidmj · on July 23, 2010

And fortunately, if re2 cannot handle one of the regexen you throw at it, it will pass it down to Python's original re module (NFA-based). Perfectly seamless.
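
A minimal sketch of that fallback idea, assuming a wrapper that tries the fast engine first and hands the pattern to re when the fast engine refuses it. It does not use any real re2 binding: `toy_linear_compile`, `UnsupportedPattern` and `compile_with_fallback` are hypothetical names, and rejecting backreferences is just one example of a feature that a pure finite automaton cannot handle.

```python
import re


class UnsupportedPattern(Exception):
    """Raised by the stand-in engine below for patterns it refuses to compile."""


def toy_linear_compile(pattern):
    """Hypothetical stand-in for a linear-time engine such as re2.

    It rejects numbered backreferences (\\1 .. \\9), which no finite automaton
    can match. The check is deliberately crude (it would also flag an escaped
    backslash followed by a digit), and accepted patterns are simply compiled
    with re here, so this stand-in is not actually faster.
    """
    if re.search(r"\\[1-9]", pattern):
        raise UnsupportedPattern(pattern)
    return re.compile(pattern)


def compile_with_fallback(pattern, fast_compile=toy_linear_compile):
    """Try the fast engine first; hand the pattern to Python's re if it refuses."""
    try:
        return fast_compile(pattern), "fast"
    except UnsupportedPattern:
        return re.compile(pattern), "re"


for pattern in (r"(a|b)*c", r"(a+)b\1"):
    compiled, engine = compile_with_fallback(pattern)
    print(f"{pattern!r}: handled by {engine}")
```

The caller gets a compiled pattern either way, which is what makes the fallback seamless from the outside; the cost is that patterns routed to re lose the linear-time guarantee.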

ori_b · on July 24, 2010

Both re2 and Python's re are NFA-based. The difference, I believe, is that re2 doesn't support backtracking, and backtracking is what changes the worst-case time from linear growth to exponential growth in the input length. (For the snobs, that means that re2 regexes are regular expressions. Python's aren't.)
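
To make the linear-versus-exponential point concrete, here is a minimal Python sketch using a commonly cited pathological pattern, a?^n a^n (n optional a's followed by n required a's), matched against a string of n a's. It compares a toy greedy backtracking matcher with a toy state-set simulation that reads each input character once. Neither is the actual code of re or re2; the token encoding and all function names are hypothetical.

```python
# Hypothetical token encoding for the pattern a?^n a^n:
# ("opt", "a") is an optional 'a', ("req", "a") is a required 'a'.
def make_tokens(n):
    return [("opt", "a")] * n + [("req", "a")] * n


def backtrack(tokens, text, ti, si, counter):
    """Backtracking matcher: tries 'consume' before 'skip' for each optional token."""
    counter[0] += 1
    if ti == len(tokens):
        return si == len(text)  # succeed only on a full match
    kind, ch = tokens[ti]
    can_consume = si < len(text) and text[si] == ch
    if kind == "opt":
        # Greedy: consume first; on failure, back up and try skipping.
        if can_consume and backtrack(tokens, text, ti + 1, si + 1, counter):
            return True
        return backtrack(tokens, text, ti + 1, si, counter)
    return can_consume and backtrack(tokens, text, ti + 1, si + 1, counter)


def closure(tokens, states):
    """Add every state reachable by skipping optional tokens (epsilon moves)."""
    result, stack = set(states), list(states)
    while stack:
        t = stack.pop()
        if t < len(tokens) and tokens[t][0] == "opt" and t + 1 not in result:
            result.add(t + 1)
            stack.append(t + 1)
    return result


def simulate(tokens, text, counter):
    """State-set simulation: tracks all live positions, no backing up."""
    current = closure(tokens, {0})
    for c in text:
        nxt = set()
        for t in current:
            counter[0] += 1
            if t < len(tokens) and tokens[t][1] == c:
                nxt.add(t + 1)
        current = closure(tokens, nxt)
    return len(tokens) in current


for n in (5, 10, 15):
    tokens, text = make_tokens(n), "a" * n
    bt, sim = [0], [0]
    assert backtrack(tokens, text, 0, 0, bt) and simulate(tokens, text, sim)
    print(f"n={n:2}: backtracking steps={bt[0]:7}, simulation steps={sim[0]:4}")
```

The backtracking count is at least 2^n, because the greedy matcher tries every consume/skip combination of the optional tokens before reaching the all-skip path that succeeds. The simulation keeps at most n + 1 live states per character, so it does at most about n(n + 1) state visits.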

btilly · on July 24, 2010

The difference is that re2 avoids backtracking in most cases where Python's normal regular expression engine would use it, and by avoiding backtracking it avoids the possibility of exponential slowdowns.

ori_b · on July 24, 2010

Isn't that what I just said?

btilly · on July 24, 2010

No. "Doesn't support backtracking" suggests that it doesn't support features that re backtracks for. In fact it does.

Also, it is technically possible for re2 to add backtracking to its current strategy. It doesn't do this today, but if it did, it could offer all the features of re while frequently being much faster. I would not permanently rule that possibility out.

ori_b · on July 24, 2010

I'm talking about the C++ re2 engine, not the Python wrapper.

btilly · on July 24, 2010

So was I.