Hacker News new | past | comments | ask | show | jobs | submit login

The problem is that in the worst case, converting an NFA to a DFA gives an overhead of 2^n, while keeping it as an NFA only incurs an overhead of m (where m is the regex length).

So, in the worst case (ie, under algorithmic complexity attacks), the NFA implementation is O(m * n), while the complexity of the DFA implementation is - I believe - O(2^m * n)




And fortunately, if re2 can not handle one of the regexen that you throw at it, then it will pass it down to Python's original re module (NFA_based). Perfectly seamless.


Both re2 and Python are NFA based. The difference, I believe, is that re2 doesn't support backtracking, which changes the worst-case time from linear growth to exponential growth on the input length. (For the snobs, that means that re2 regexes are regular expressions. Python's aren't.)


The difference is that re2 avoids backtracking in most cases where Python's normal regular expression engine would use it. And by avoiding backtracking, avoids the possibility of exponential slowdowns.


Isn't that what I just said?


No. "Doesn't support backtracking" suggests that it doesn't support features that re backtracks for. In fact it does.

Also it is technically possible for re2 to (although it doesn't currently do this) add backtracking to its current strategy. If it did this, then it could offer all features of re, but would frequently be much faster. However I would not permanently rule that possibility out.


I'm talking about the C++ re2 engine, not the Python wrapper.


So was I.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: