Hello, @scott-smith-0, @peterjones and All,
Just a clarification about greedy and lazy quantifiers :
Imagine the following regex, written in free-spacing mode for better readability :
(?x) \w+? ABC test ABC
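For anyone not used to free-spacing mode, here is a minimal sketch in Python. Note that Python's `re` module is not the regex engine used by Notepad++, so this only assumes that both engines treat `(?x)` the same way for such a simple pattern. The sample text is one I chose for illustration.

```python
import re

# Free-spacing form: with (?x), the blanks inside the pattern are ignored.
spaced = re.compile(r"(?x) \w+? ABC test ABC")
# Compact form of the same regex, written without free-spacing mode.
compact = re.compile(r"\w+?ABCtestABC")

text = "XYZ 12345ABCtestABCdefghi"  # illustrative sample text

spaced_match = spaced.search(text).group()
compact_match = compact.search(text).group()

print(spaced_match)   # 12345ABCtestABC
print(compact_match)  # 12345ABCtestABC
```

Both forms find the same match, so the blanks are purely cosmetic here.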
You may think that we must use the lazy quantifier +?, in order to get the first occurrence of the string ABC. But this is a common reasoning mistake !
In fact, the regex looks for the shortest range of word characters followed by the whole string ABCtestABC, not just by the first ABC
Easy to prove :
Against the text XYZ 12345ABCtestABCdefghiABCtestABCjklmn the regex \w+?ABCtestABC does match the string 12345ABCtestABC, whereas the regex \w+ABCtestABC matches the string 12345ABCtestABCdefghiABCtestABC. Given this subject text, the difference between the two kinds of quantifiers is obvious !
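The same check can be sketched with Python's `re` module. This is only an illustration, assuming that Python's engine behaves like the Notepad++ one for these simple patterns, which should be the case here.

```python
import re

text = "XYZ 12345ABCtestABCdefghiABCtestABCjklmn"

# Lazy: \w+? grows one character at a time until ABCtestABC can follow.
lazy_match = re.search(r"\w+?ABCtestABC", text).group()
# Greedy: \w+ grabs all word characters, then backtracks to the last ABCtestABC.
greedy_match = re.search(r"\w+ABCtestABC", text).group()

print(lazy_match)    # 12345ABCtestABC
print(greedy_match)  # 12345ABCtestABCdefghiABCtestABC
```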
Now, against the shorter text XYZ 12345ABCtestABCdefghi, as there is only one occurrence of the string ABCtestABC, you don’t have to bother about lazy vs greedy quantifiers ! Both regexes, \w+?ABCtestABC and \w+ABCtestABC, will match the same string 12345ABCtestABC
Of course, had we used the simple regexes \w+?ABC and \w+ABC, the matched strings would have been quite distinct !
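Here is a small Python sketch of the shorter-text case, again assuming that Python's `re` module gives the same results as Notepad++ for these patterns. It compares the two full regexes, then the two simple ones (lazy and greedy versions of `\w+` followed by `ABC`).

```python
import re

text = "XYZ 12345ABCtestABCdefghi"

# Full regexes: only one ABCtestABC exists, so lazy and greedy agree.
lazy_full = re.search(r"\w+?ABCtestABC", text).group()
greedy_full = re.search(r"\w+ABCtestABC", text).group()

# Simple regexes: two ABC strings exist, so lazy and greedy differ.
lazy_simple = re.search(r"\w+?ABC", text).group()
greedy_simple = re.search(r"\w+ABC", text).group()

print(lazy_full, greedy_full)  # 12345ABCtestABC 12345ABCtestABC
print(lazy_simple)             # 12345ABC
print(greedy_simple)           # 12345ABCtestABC
```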
Now, let’s imagine :
The subject text XYZ 12345" display="defghi, where I replaced the two strings ABC with a double-quote " and the word test with a space followed by display=
The regex \w+?" display="
In this specific case, there is an additional reason why the lazy quantifier is useless ;-))
Indeed, you may think, wrongly, at first sight, that this quantifier is needed in order to stop at the first " symbol. But even the greedy \w+ could not match any further characters, as neither the double-quote nor the following space character is a word character, anyway !
Just try the two regexes \w+?" display=" and \w+" display=", against the following text :
XYZ 12345" display=“defghi” display="jklmn
In both cases, the matched string is just 12345" display="
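A quick way to check this is the following Python sketch. It assumes that Python's `re` module behaves like the Notepad++ engine for these patterns, and it uses the test text with classic double-quotes only.

```python
import re

# Test text written with classic double-quote characters (\x22) only.
text = 'XYZ 12345" display="defghi" display="jklmn'

lazy = r'\w+?" display="'
greedy = r'\w+" display="'

# First match of each regex.
lazy_first = re.search(lazy, text).group()
greedy_first = re.search(greedy, text).group()

# All matches: \w+ can never run across a double-quote or a space,
# so the greedy version finds exactly the same matches as the lazy one.
lazy_all = re.findall(lazy, text)
greedy_all = re.findall(greedy, text)

print(lazy_first)              # 12345" display="
print(greedy_first)            # 12345" display="
print(lazy_all == greedy_all)  # True
```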
Best Regards,
guy038
Reminder : for correct tests, replace any “ and ” quoting characters with a classic double-quote char " ( \x{0022} )