You are right that the lookahead might be expensive.
(There's probably a way that a sufficiently smart compiler of (ir-)regular expressions can optimize this expression to be still matchable quickly; but Python's regular expression matcher is probably not that smart. I'm not sure if any real world matcher is.)
> The time complexity of finding all matches is not O(N) [...]
If you are happy find a maximal set of non-overlapping matches, you can still do it in O(N). By 'maximal' I mean that you can't greedily find another match (without removing any of the existing matches.)
A sketch of the technique is: take your pattern and wrap it up like this '.?{pattern.?}' (where ? means non-greedy repetition) and match that against your input string. You can do non-greedy repetition and the very limited form of sub-pattern capturing that you need to find all the matches of 'pattern' without breaking O(N) time.
I'm not sure whether you can find the global maximum number of non-overlapping matches, instead of a just a greedy maximum, in O(N) time.
(There's probably a way that a sufficiently smart compiler of (ir-)regular expressions can optimize this expression to be still matchable quickly; but Python's regular expression matcher is probably not that smart. I'm not sure if any real world matcher is.)
> The time complexity of finding all matches is not O(N) [...]
If you are happy find a maximal set of non-overlapping matches, you can still do it in O(N). By 'maximal' I mean that you can't greedily find another match (without removing any of the existing matches.)
A sketch of the technique is: take your pattern and wrap it up like this '.?{pattern.?}' (where ? means non-greedy repetition) and match that against your input string. You can do non-greedy repetition and the very limited form of sub-pattern capturing that you need to find all the matches of 'pattern' without breaking O(N) time.
I'm not sure whether you can find the global maximum number of non-overlapping matches, instead of a just a greedy maximum, in O(N) time.