Hi, All,
I’m back with some additional information about lazy, greedy and possessive quantifiers. It’s fundamental to correctly understand the differences between these 3 types of quantifiers!
So, let’s consider the simple text 12345ABCDE, in a new tab.
With the greedy quantifier {1,10}, here is how the regex engine interprets, for instance, the regex \w{1,10}[A-Z]{5}:
It first tries to match the LONGEST range of \w => 10 word characters. But the part [A-Z]{5} CANNOT match anything, as no text remains
Then, it backtracks and tries the first 9 word characters. Again, the part [A-Z]{5} does NOT match, as only the letter E remains
Then, it backtracks and tries the first 8 word characters. Again, the part [A-Z]{5} does NOT match, as only the letters DE remain
Then, it backtracks and tries the first 7 word characters. Again, the part [A-Z]{5} does NOT match, as only the letters CDE remain
Then, it backtracks and tries the first 6 word characters. Again, the part [A-Z]{5} does NOT match, as only the letters BCDE remain
Then, it backtracks and tries the first 5 word characters. This time, the part [A-Z]{5} DOES match the letters ABCDE
=> After the backtracking phase, all the text is matched and selected!
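For those who prefer to see it run, here is a minimal Python sketch of the steps above. It is only a hand-made simulation, anchored at the start of the text, and not how any real engine is written. The names greedy_trace, TEXT and TAIL are my own, and Python’s re module stands in for the Notepad++ regex engine.

```python
import re

TEXT = "12345ABCDE"
TAIL = re.compile(r"[A-Z]{5}")

def greedy_trace(text, low=1, high=10):
    # Try the longest run of word characters first, then give back one at a time.
    steps = []
    for n in range(high, low - 1, -1):
        if re.fullmatch(r"\w{%d}" % n, text[:n]) is None:
            continue  # fewer than n word characters available at the start
        tail_ok = TAIL.match(text, n) is not None  # does [A-Z]{5} match right after?
        steps.append((n, tail_ok))
        if tail_ok:
            break
    return steps

trace = greedy_trace(TEXT)
for n, ok in trace:
    print(n, "word characters ->", "match" if ok else "no match")

# The real engine reaches the same split: 5 word characters, then 5 letters.
m = re.search(r"(\w{1,10})([A-Z]{5})", TEXT)
print(m.group(1), m.group(2))
```

The trace goes from 10 down to 5 word characters, and the two groups of the real search show the same final split.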
Now, with the lazy quantifier {1,10}?, the regex engine interprets the regex \w{1,10}?[A-Z]{5} as follows:
It first tries to match the SHORTEST range of \w => 1 word character. But the part [A-Z]{5} CANNOT match at the start of the remaining string 2345ABCDE
Then, it backtracks and extends the match to the first 2 word characters. Again, the part [A-Z]{5} does NOT match at the start of the string 345ABCDE
Then, it backtracks and extends the match to the first 3 word characters. Again, the part [A-Z]{5} does NOT match at the start of the string 45ABCDE
Then, it backtracks and extends the match to the first 4 word characters. Again, the part [A-Z]{5} does NOT match at the start of the string 5ABCDE
Then, it backtracks and extends the match to the first 5 word characters. This time, the part [A-Z]{5} DOES match the letters ABCDE
=> After the backtracking phase, all the text is matched and selected!
Note: Instead of the English verb backtrack, a verb like fortrack would be more appropriate here, as the lazy quantifier extends its match one character at a time! Sorry, English isn’t my mother tongue!
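The lazy steps can be sketched the same way in Python. Again, this is only a rough simulation anchored at the start of the text, with names of my own (lazy_trace, TEXT, TAIL, OTHER). The second text ABCDEFGHIJ is not part of the walkthrough above: I only add it because, on 12345ABCDE, the greedy and lazy regexes end with the same match, which hides the difference.

```python
import re

TEXT = "12345ABCDE"
OTHER = "ABCDEFGHIJ"  # extra sample text, used only for the comparison below
TAIL = re.compile(r"[A-Z]{5}")

def lazy_trace(text, low=1, high=10):
    # Try the shortest run of word characters first, then extend one at a time.
    steps = []
    for n in range(low, high + 1):
        if re.fullmatch(r"\w{%d}" % n, text[:n]) is None:
            break  # the run of word characters cannot be extended any further
        tail_ok = TAIL.match(text, n) is not None  # does [A-Z]{5} match right after?
        steps.append((n, tail_ok))
        if tail_ok:
            break
    return steps

trace = lazy_trace(TEXT)
for n, ok in trace:
    print(n, "word characters ->", "match" if ok else "no match")

lazy_same = re.search(r"\w{1,10}?[A-Z]{5}", TEXT).group(0)
lazy_other = re.search(r"\w{1,10}?[A-Z]{5}", OTHER).group(0)
greedy_other = re.search(r"\w{1,10}[A-Z]{5}", OTHER).group(0)
print(lazy_same, lazy_other, greedy_other)
```

On ABCDEFGHIJ, the lazy regex stops as soon as it can, after 1 word character followed by 5 letters, whereas the greedy regex still selects all the text.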
Finally, with the possessive quantifier {1,10}+, the regex engine interprets the regex \w{1,10}+[A-Z]{5} as follows:
It first tries to match the LONGEST range of \w => 10 word characters. But the part [A-Z]{5} CANNOT match anything, as no text remains
Now, the normal process would be to backtrack. But this action is forbidden, due to the possessive quantifier! In other words, once the first part \w{1,10}+ has found a match, it never gives any character back, and the following parts of the regex must match the remainder of the text. But, as the first regex part has consumed all the text, the part [A-Z]{5} will NEVER match anything!
So, the overall match fails and you get the normal message Find: Can’t find the text “\w{1,10}+[A-Z]{5}”
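The failure of the possessive version can be checked with a short Python sketch. Two assumptions of mine here: Python’s re module only accepts the possessive syntax from version 3.11 onwards, so the sketch also uses the classic emulation with a look-ahead and a back-reference, (?=(\w{1,10}))\1, which grabs the longest run and cannot give anything back. The names EMULATED and possessive_search are just illustrative.

```python
import re
import sys

TEXT = "12345ABCDE"

# Emulation: the look-ahead captures the longest \w run in group 1,
# then \1 consumes exactly that run, with no way to give characters back.
EMULATED = re.compile(r"(?=(\w{1,10}))\1[A-Z]{5}")

def possessive_search(text):
    # Use the native possessive syntax when the running Python supports it.
    if sys.version_info >= (3, 11):
        return re.search(r"\w{1,10}+[A-Z]{5}", text)
    return EMULATED.search(text)

greedy = re.search(r"\w{1,10}[A-Z]{5}", TEXT)
possessive = possessive_search(TEXT)
emulated = EMULATED.search(TEXT)

print("greedy     :", greedy.group(0))
print("possessive :", possessive)
print("emulated   :", emulated)
```

The greedy search selects all the text, whereas both possessive searches return no match at all.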
Using, again, the same example 12345ABCDE, in a new tab, it’s easy to verify that:
The regex \w{1,10} matches the longest range of word characters => The whole string 12345ABCDE is matched
The regex \w{1,10}? matches the shortest range of word characters => The single character 1 is matched, then, on the next search, the digit 2, and so on…
The regex \w{1,10}+ matches the longest range of word characters => The whole string 12345ABCDE is matched, too!
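These three checks can also be reproduced with a few lines of Python, as a quick sketch. As before, the possessive case is written with the look-ahead emulation (?=(\w{1,10}))\1, so that it runs on Python versions older than 3.11; that emulation and the variable names are my own choices.

```python
import re

TEXT = "12345ABCDE"

# Successive matches of each quantifier used on its own.
greedy_hits = [m.group(0) for m in re.finditer(r"\w{1,10}", TEXT)]
lazy_hits = [m.group(0) for m in re.finditer(r"\w{1,10}?", TEXT)]
possessive_hits = [m.group(0) for m in re.finditer(r"(?=(\w{1,10}))\1", TEXT)]

print("greedy     :", greedy_hits)
print("lazy       :", lazy_hits)
print("possessive :", possessive_hits)
```

The greedy and possessive forms each give one match of 10 characters, and the lazy form gives ten matches of 1 character.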
So, to sum up, here is a list of all the quantifiers:
GREEDY quantifiers     :  *  ( = {0,} )    +  ( = {1,} )    ?  ( = {0,1} )    {n}   {n,}   {m,n}
LAZY quantifiers       :  *? ( = {0,}? )   +? ( = {1,}? )   ?? ( = {0,1}? )   {n}?  {n,}?  {m,n}?
POSSESSIVE quantifiers :  *+ ( = {0,}+ )   ++ ( = {1,}+ )   ?+ ( = {0,1}+ )   {n}+  {n,}+  {m,n}+
Remark: The two syntaxes {n}? and {n}+, although correct, are useless, as {n} is an EXACT quantifier: it always matches exactly n occurrences, so there is nothing to shorten and nothing to give back!
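A tiny Python sketch of this remark, with a sample string of my own (123456789). It only compares {3} with {3}?, because the {3}+ form needs an engine with possessive support (Python 3.11 or later, for instance).

```python
import re

SAMPLE = "123456789"  # illustrative text, not taken from the example above

exact_hits = re.findall(r"\d{3}", SAMPLE)
lazy_exact_hits = re.findall(r"\d{3}?", SAMPLE)

print(exact_hits)
print(lazy_exact_hits)
```

Both searches return the same three groups of 3 digits.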
Best Regards,
guy038