Hello, @peter-mccormack, @terry-r and All,
Terry, you said :
We can expand on this and say \d{5,8} which asks for at least 5, and up to 8 digits together, tending to as few as necessary. A further expansion of this is to say \d{5,8}+ which says between 5 and 8 digits but more rather than less, so it’s greedy.
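I’m really sorry, Terry, but your reasoning, about lazy, greedy and possessive quantifiers, is not exact ! In short, \d{5,8} is already the greedy form ( as many digits as possible ), \d{5,8}? is the lazy form and \d{5,8}+ is the possessive form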
First, if we consider, for instance, the general syntax A{2,9}, this defines 3 types of quantifiers :
The greedy quantifier A{2,9} which tries to match as many letters A as possible, with a maximum of 9 letters A
The lazy quantifier A{2,9}? which tries to match as few letters A as possible, so a minimum of 2 letters A
The possessive quantifier A{2,9}+ which tries to match as many letters A as possible, with a maximum of 9 letters A, but which NEVER allows the regex engine to backtrack so that the overall pattern would match !
Let’s suppose that our sample text, to test some regex syntaxes, is the simple string AAAAAAAAA ( 9 letters A ), in a new tab
The regex (?-i)A{2,9}?A matches the string AAA : Logical, because A{2,9}? matches AA, as a lazy quantifier. Then A matches the third A, of course !
The regex (?-i)A{2,9}A matches the entire string AAAAAAAAA. Again logical, but this needs a quick explanation :
First, the part A{2,9} matches the entire string AAAAAAAAA ( 9 letters ), but, now, there’s NO more text, to satisfy the last part of the regex A
So, the regex engine backtracks and the part A{2,9} matches the string AAAAAAAA ( 8 letters only ). This time, the remainder of the regex : A can match the 9th letter A !
The regex (?-i)A{2,9}+A, with the possessive quantifier, matches nothing ! Do you understand the logic of this result ?
Like above, the part A{2,9}+ matches the entire string AAAAAAAAA. And again, there is NO more text which could be matched by A, the remainder of the regex !
But, unlike the case above, due to the possessive quantifier, the regex engine is NOT allowed, this time, to backtrack. So, as the regex doesn’t have any other alternative, the attempt fails ( and it fails in the same way from each later starting position ), so the regex engine cannot match our subject string and the process stops !
The slightly modified regex (?-i)A{2,8}+A, although containing a possessive quantifier, does match the entire string AAAAAAAAA ! I suppose you’ve already guessed why :-))
First, the part A{2,8}+ although possessive, matches the string AAAAAAAA ( its maximum : 8 letters ) and the last part A of the regex matches the 9th letter A
This time, NO need to backtrack : the overall pattern matches our subject string AAAAAAAAA
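The four cases above can be checked with a minimal Python sketch. Some assumptions : Python's re module stands in for the regex engine discussed here, the (?-i) prefix is dropped because re is case-sensitive by default, and the helper names are made up for this illustration. Native possessive syntax only exists in Python 3.11 and later, so the sketch falls back, on older versions, to the usual lookahead plus backreference emulation (?=(A{2,9}))\1, which should behave the same way here.

```python
import re

SUBJECT = "A" * 9  # the sample text AAAAAAAAA


def compile_with_fallback(native, emulated):
    """Compile `native`; if this Python rejects possessive syntax, use `emulated`."""
    try:
        return re.compile(native)
    except re.error:
        # Older Python: emulate the possessive quantifier with an atomic lookahead.
        return re.compile(emulated)


def matched_text(pattern):
    """Return the text of the first match in SUBJECT, or None when nothing matches."""
    m = pattern.search(SUBJECT)
    return m.group(0) if m else None


lazy = matched_text(re.compile(r"A{2,9}?A"))
greedy = matched_text(re.compile(r"A{2,9}A"))
possessive_9 = matched_text(compile_with_fallback(r"A{2,9}+A", r"(?=(A{2,9}))\1A"))
possessive_8 = matched_text(compile_with_fallback(r"A{2,8}+A", r"(?=(A{2,8}))\1A"))

print("lazy         :", lazy)
print("greedy       :", greedy)
print("possessive 9 :", possessive_9)
print("possessive 8 :", possessive_8)
```

The lazy regex should report AAA, the greedy one and the {2,8}+ one the whole string, and the {2,9}+ one no match at all.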
Keeping again our sample text AAAAAAAAA, let’s test some other regexes :
The regex (?-i)A{2,9}?A{2,9}?, with two lazy quantifiers, matches the string AAAA ( 2 times the minimum of 2 letters )
The regex (?-i)A{2,9}?A{2,9}, with a lazy quantifier, followed by a greedy one, matches the entire string AAAAAAAAA ( The first part A{2,9}? matches AA and the last part A{2,9} matches AAAAAAA )
The regex (?-i)A{2,9}A{2,9}?, with a greedy quantifier, followed by a lazy one, also matches the entire subject string, but only after some backtracking :
Indeed, the first part A{2,9} can match all the subject string ( 9 letters )
As there is NO more text for the last part A{2,9}?, the regex engine backtracks 1 position
So, the first part A{2,9} matches the 8-chars string AAAAAAAA but the last part A{2,9}? cannot match the 9th letter A, as a minimum of two letters A is required !
Again, the regex engine backtracks 1 position. So the first part A{2,9} matches the 7-chars string AAAAAAA
This time, the last part A{2,9}? can match the string AA : Done !
The regex (?-i)A{2,9}A{2,9}, with two greedy quantifiers, matches the entire string AAAAAAAAA. Logical, as, like above, after two backtracking steps, the first part A{2,9} matches the 7-chars string AAAAAAA and the last part A{2,9} matches the 8th and 9th letters A, so AA ( UPDATED 07-05-2019 )
The regex (?-i)A{2,9}+A{2,9}, with a possessive quantifier, followed by a greedy one, matches nothing. Why ?
The first part A{2,9}+ matches the entire string AAAAAAAAA, at the beginning. But, as NO more text can be matched by the last part A{2,9} and as backtracking is not allowed because the quantifier is possessive, the process stops without any match !
Finally, the regex (?-i)A{2,9}+A{2,9}?, with a possessive quantifier, followed by a lazy one, fails for the same reason and gives no match
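To see how the two parts share the 9 letters in each of these six regexes, here is a small Python sketch. It assumes Python's re module as the engine, drops the (?-i) prefix ( re is case-sensitive by default ) and wraps each part in a named group ; the group names first and last and the helper split_lengths are only illustrative. On Python older than 3.11, where possessive syntax is rejected, it falls back to a lookahead plus backreference emulation.

```python
import re

SUBJECT = "A" * 9  # the sample text AAAAAAAAA


def split_lengths(native, emulated=None):
    """Return (letters taken by first part, letters taken by last part), or None."""
    try:
        pattern = re.compile(native)
    except re.error:
        if emulated is None:
            raise
        # Older Python: possessive syntax unavailable, use the emulation instead.
        pattern = re.compile(emulated)
    m = pattern.search(SUBJECT)
    return (len(m.group("first")), len(m.group("last"))) if m else None


lazy_lazy = split_lengths(r"(?P<first>A{2,9}?)(?P<last>A{2,9}?)")
lazy_greedy = split_lengths(r"(?P<first>A{2,9}?)(?P<last>A{2,9})")
greedy_lazy = split_lengths(r"(?P<first>A{2,9})(?P<last>A{2,9}?)")
greedy_greedy = split_lengths(r"(?P<first>A{2,9})(?P<last>A{2,9})")
possessive_greedy = split_lengths(
    r"(?P<first>A{2,9}+)(?P<last>A{2,9})",
    r"(?=(?P<first>A{2,9}))(?P=first)(?P<last>A{2,9})",
)
possessive_lazy = split_lengths(
    r"(?P<first>A{2,9}+)(?P<last>A{2,9}?)",
    r"(?=(?P<first>A{2,9}))(?P=first)(?P<last>A{2,9}?)",
)

for name, value in [
    ("lazy + lazy", lazy_lazy),
    ("lazy + greedy", lazy_greedy),
    ("greedy + lazy", greedy_lazy),
    ("greedy + greedy", greedy_greedy),
    ("possessive + greedy", possessive_greedy),
    ("possessive + lazy", possessive_lazy),
]:
    print(f"{name:20} {value}")
```

A pair like (7, 2) means that the first part kept 7 letters and the last part took 2, and None means no match.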
You could say : So, what is the benefit of using possessive quantifiers ?
First, to speed up regular expressions. In particular, they help some alternatives, of your regex, to fail faster !
Secondly, they prevent the regex engine from trying all possible permutations. This can be useful for performance reasons !
Thirdly, in case of nested quantifiers, for instance, they may save your day by preventing the regex engine from the catastrophic backtracking event :-((
One example :
Let’s imagine the regex (?-i)A*Z, against this sample text AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA, in a new tab
The regex engine, due to the greedy quantifier, first grabs the 50 letters A, then backtracks 50 times, giving back one letter A each time, before realizing that an uppercase letter Z cannot be found. And it may have to repeat this work from each following starting position :-((
Let’s consider, now, the regex (?-i)A*+Z. This time, the possessive part A*+ grabs all the letters A and there is NO more text to match the Z part of the regex. And, as backtracking is not allowed, the regex fails faster and so, the regex engine can search, for other matches, further on, more quickly !
Of course, you don’t see any difference between the two cases but, for long texts and/or complicated regexes, this may be significant !!
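To give a rough idea of the difference, here is a toy model in Python which counts how many times a naive backtracking engine would test for the letter Z, with A*Z and with A*+Z. This is only a sketch : the function count_z_tests is hypothetical, it handles these two regexes only, and real engines may apply optimizations which change the actual numbers.

```python
def count_z_tests(subject, possessive):
    """Count the tests for 'Z' made by a naive engine running A*Z or A*+Z."""
    tests = 0
    for start in range(len(subject) + 1):
        # A* (or A*+) first grabs every consecutive 'A' from `start`.
        end = start
        while end < len(subject) and subject[end] == "A":
            end += 1
        # Greedy may give letters back one at a time; possessive keeps them all.
        lowest = end if possessive else start
        for pos in range(end, lowest - 1, -1):
            tests += 1
            if subject.startswith("Z", pos):
                return tests  # overall match found
    return tests  # no match at any starting position


SUBJECT = "A" * 50
greedy_tests = count_z_tests(SUBJECT, possessive=False)
possessive_tests = count_z_tests(SUBJECT, possessive=True)
print("greedy     A*Z  :", greedy_tests)
print("possessive A*+Z :", possessive_tests)
```

In this toy model, the greedy version makes 1326 tests over all the starting positions, against 51 for the possessive one, so the gap grows quickly with the length of the text.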
Finally, here is, below, a summary table of all the quantifiers :
                •--------------------------------------•
                |             QUANTIFIERS              |
•---------------•----------•------------•--------------•
|  REPETITIONS  |  Greedy  |    Lazy    |  Possessive  |
•---------------•----------•------------•--------------•
|  From n to ∞  |  {n,}    |  {n,}?     |  {n,}+       |
•---------------•----------•------------•--------------•
|  From n to m  |  {n,m}   |  {n,m}?    |  {n,m}+      |
•---------------•----------•------------•--------------•
|  From 0 to 1  |  ?       |  ??        |  ?+          |
•---------------•----------•------------•--------------•
|  From 0 to ∞  |  *       |  *?        |  *+          |
•---------------•----------•------------•--------------•
|  From 1 to ∞  |  +       |  +?        |  ++          |
•---------------•----------•------------•--------------•
|  From n to n  |                 {n}                  |
•---------------•--------------------------------------•
Note that the {n} quantifier cannot be qualified with the flavors Greedy, Lazy or Possessive. It just means exactly n times, the character or expression, right before !
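The greedy and lazy columns of the table can be compared quickly with Python's re module, taken here as a stand-in engine. In this sketch, the subject AAA, the concrete bounds ( 2 for n, and 1 to 2 for n to m ) and the helper name prefix are arbitrary choices for the illustration.

```python
import re

SUBJECT = "AAA"

# (repetition, greedy form, lazy form), following the rows of the table
ROWS = [
    ("from 2 to inf", r"A{2,}", r"A{2,}?"),
    ("from 1 to 2", r"A{1,2}", r"A{1,2}?"),
    ("from 0 to 1", r"A?", r"A??"),
    ("from 0 to inf", r"A*", r"A*?"),
    ("from 1 to inf", r"A+", r"A+?"),
]


def prefix(pattern):
    """Return what `pattern` matches at the very beginning of SUBJECT."""
    return re.match(pattern, SUBJECT).group(0)


results = {label: (prefix(g), prefix(l)) for label, g, l in ROWS}
for label, (g, l) in results.items():
    print(f"{label:14} greedy={g!r:6} lazy={l!r}")
```

Each greedy form takes as many letters as its maximum allows, and each lazy form stops at its minimum, which can be the empty string for ?? and *?.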
Best Regards
guy038