Hacker News new | past | comments | ask | show | jobs | submit login

You are right that the lookahead might be expensive.

(There's probably a way that a sufficiently smart compiler of (ir-)regular expressions can optimize this expression to be still matchable quickly; but Python's regular expression matcher is probably not that smart. I'm not sure if any real world matcher is.)

> The time complexity of finding all matches is not O(N) [...]

If you are happy find a maximal set of non-overlapping matches, you can still do it in O(N). By 'maximal' I mean that you can't greedily find another match (without removing any of the existing matches.)

A sketch of the technique is: take your pattern and wrap it up like this '.?{pattern.?}' (where ? means non-greedy repetition) and match that against your input string. You can do non-greedy repetition and the very limited form of sub-pattern capturing that you need to find all the matches of 'pattern' without breaking O(N) time.

I'm not sure whether you can find the global maximum number of non-overlapping matches, instead of a just a greedy maximum, in O(N) time.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: