@guy038 said in Getting "Invalid Regular Expression" for an extremely simple expression:
At this point, I tried to select all the zones around these 11 matches in a small new file, that I named Matches.txt. Then, using the Mark dialog with (?:[^,]*,){13}[\u\l], against this small file, it does return 10 matches ( not 11 as explained in the next post ! )
However, it is distressing to note that the equivalent regex (?:.*?,){13}[\u\l] still fails against this tiny Matches.txt file, of only 16,138 bytes :-((
Unfortunately, it’s quite certain that cases, like that one, may arise when using most of the available regex engines !
There are two ways an implementation can look at a regex:
1. A regex is a definition of matching character strings.
2. A regex is a procedure for matching character strings.
From the first perspective, your two expressions are equivalent: they specify the same strings as matches. From the second perspective, they are not: they specify different procedures for finding strings that match.
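To make that concrete, here is a rough sketch in Python. It is not the Boost engine Notepad++ uses, so some things are my assumptions: Python's `re` has no `\u` / `\l` classes, so `[A-Za-z]` stands in for `[\u\l]`; the repeat count is a parameter so the sample lines can stay short; and the helper name `both_find` is made up for illustration. On single lines the two patterns agree on whether a match exists, even though the text each one reports can differ, because `.*?` is allowed to swallow commas while `[^,]*` is not.

```python
import re

def both_find(line: str, n: int = 3) -> tuple[bool, bool]:
    """Report whether each pattern finds a match somewhere in `line`."""
    # [A-Za-z] is a stand-in for Boost's [\u\l] (an assumption for this sketch).
    negated = re.compile(r"(?:[^,]*,){%d}[A-Za-z]" % n)
    lazy = re.compile(r"(?:.*?,){%d}[A-Za-z]" % n)
    return (negated.search(line) is not None,
            lazy.search(line) is not None)

samples = ["a,b,c,d", "a,b,c,1", ",,,,x", "1,2,3"]
for line in samples:
    # Same yes/no answer from both patterns on every sample line.
    print(repr(line), both_find(line))

# The matched text can still differ: the lazy dot may cross a comma.
print(re.search(r"(?:[^,]*,){3}[A-Za-z]", ",,,,x").span())
print(re.search(r"(?:.*?,){3}[A-Za-z]", ",,,,x").span())
```

Same answer to "is there a match on this line?", but a backtracking engine gets there by a different route for each pattern, and the lazy-dot version gives it far more routes to try before it can give up.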
No one has found a way to implement back references using the first approach (regex as definition). Once your regular expression syntax includes the ability to use back references, you are stuck with the procedural interpretation.
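A tiny illustration of why back references are different, again in Python rather than Boost, with a pattern I chose only for the example. "Some word, a space, then that same word again" requires the engine to remember what the group captured and compare against it later, and that set of strings is not a regular language, so no fixed finite-state description covers it.

```python
import re

# \1 must repeat exactly what group 1 captured on this particular attempt.
doubled = re.compile(r"(\w+) \1")

print(doubled.fullmatch("abc abc") is not None)  # the second word repeats the first
print(doubled.fullmatch("abc abd") is not None)  # the second word differs
```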
There are other features of Perl-compatible regular expressions that present problems, but back references are the killer.
I’m speculating here, but I think once you include any back reference in an expression, it breaks the ability to process any part of the expression that occurs before the back reference as a definition rather than a procedure. (I’m not certain of that. I have no doubt someone does know the answer to that… but that someone isn’t me.)
So I think you’ll find all those more efficient regular expression engines implement a severely restricted syntax for regular expressions which omits features none of us would like to do without (particularly, back references).
I’ve also speculated that a regular expression engine could include two engines: one which processes using the “definition” approach for expressions to which it is applicable, and one which uses the “procedural” approach for the remaining expressions. I don’t know if any do that now.
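Purely as a sketch of that idea, the dispatch step might look something like the following. Everything here is hypothetical: the function names, the "definition" / "procedural" labels, and the deliberately naive scan, which only knows Python-style `\1`..`\9` and `(?P=name)` back references and can over-report (for example inside a character class). Over-reporting is the safe direction, since it only sends a pattern to the slower engine.

```python
def has_back_reference(pattern: str) -> bool:
    """Naive scan for \\1..\\9 or (?P=name); may over-report."""
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "123456789":
                return True
            i += 2  # skip the escaped character, e.g. \\ or \w
            continue
        if pattern.startswith("(?P=", i):
            return True
        i += 1
    return False

def choose_engine(pattern: str) -> str:
    """Pick which (hypothetical) engine should handle the pattern."""
    return "procedural" if has_back_reference(pattern) else "definition"

print(choose_engine(r"(?:[^,]*,){13}[A-Za-z]"))
print(choose_engine(r"(\w+) \1"))
```

A real engine would make this decision on the parsed expression rather than on the raw text, and would have to account for other features besides back references.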