Hi, @alan-kilborn and All,
Regarding the precedence of regex operators, the information below is taken from the link :
https://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.operator_precedence
The list below gives the hierarchy of these operators, from the highest priority to the lowest priority :
1. POSIX-based bracket character sets : [:Character class:], [=Equivalence class=] and [.Collating element.]
2. Escaped characters : \...
3. Bracket character sets ( negative or not ) : [^.....] and [.....]
4. Grouping ( capturing or not ) : (.....) and (?:.....)
5. Quantifiers : *, +, ?, {n}, {m,n} and {m,}
6. Concatenation ( implicit )
7. Anchoring : ^ and $
8. Alternation : |
Here are some examples to verify this hierarchy :
Between level 1 and level 2 :
The regex [[=\=]] matches the backslash \ only : the equivalence class [=\=] is read first, so the \ does NOT escape the = character. Otherwise, we would get the regex [[==]], which is, besides, invalid !
Between level 2 and level 3 :
The regex \[1] is read as the regex \[, so the literal character [, followed by the string 1], and NOT as the regex \1, which is what we would get if the character class [1], which simply matches the digit 1, were evaluated first
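This reading can be checked with a minimal sketch. It assumes Python's re module as a stand-in for the Boost engine documented at the link above; for an escaped bracket, both are expected to behave the same way. The variable names are only illustrative.

```python
import re

# Assumption: Python's re module is used here, not Boost
pattern = re.compile(r'\[1]')

# The escape is applied first: a literal [ followed by the literal string 1]
literal_match = pattern.fullmatch('[1]')

# The regex is not a character class, so the lone digit 1 does not match
digit_match = pattern.fullmatch('1')

print(literal_match is not None)  # True
print(digit_match is None)        # True
```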
Between level 3 and level 4 :
The regex [(123)45] matches any single character among the digits 1, 2, 3, 4 and 5 and the parentheses ( and ), and NOT the number 123, as a group, or one of the digits 4 or 5, which can be found with the regex (123)|[45]
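Here is a small sketch of that difference, assuming Python's re module rather than Boost ( the two are expected to agree on this point ). The subject string and the variable names are hypothetical.

```python
import re

# Hypothetical subject string, chosen to contain every character of interest
subject = '(123)45'

# Inside [...] the parentheses are ordinary characters: each character matches alone
class_hits = [m.group(0) for m in re.finditer(r'[(123)45]', subject)]

# With real grouping, 123 is matched as a whole and the literal parentheses are skipped
group_hits = [m.group(0) for m in re.finditer(r'(123)|[45]', subject)]

print(class_hits)  # ['(', '1', '2', '3', ')', '4', '5']
print(group_hits)  # ['123', '4', '5']
```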
Between level 4 and level 5 :
The regex (123)+ matches the number 123, possibly repeated, and NOT the number 12, followed by one or more consecutive digits 3, which can be found with the regex 123+
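A quick way to see it, as a sketch only : the snippet assumes Python's re module instead of Boost, and the test strings are made up for the occasion.

```python
import re

# Assumption: Python's re module, which applies quantifiers the same way here

# The quantifier applies to the whole group
group_repeated = re.fullmatch(r'(123)+', '123123')
group_on_threes = re.fullmatch(r'(123)+', '12333')

# Without the group, the quantifier applies to the digit 3 only
digit_repeated = re.fullmatch(r'123+', '12333')
digit_on_groups = re.fullmatch(r'123+', '123123')

print(group_repeated is not None, group_on_threes is None)   # True True
print(digit_repeated is not None, digit_on_groups is None)   # True True
```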
Between level 5 and level 6 :
The regex 123+45+ matches the number 12, followed by one or more consecutive digits 3, followed by the digit 4, followed by one or more consecutive digits 5, and NOT one or more occurrences of the number 123, followed by one or more occurrences of the number 45, which can be obtained with the regex (123)+(45)+
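The same idea as a minimal sketch, again assuming Python's re module in place of Boost, with hypothetical sample strings :

```python
import re

# Assumption: Python's re module; quantifiers bind tighter than concatenation there too

# Each + applies to the single digit right before it
digits_ok = re.fullmatch(r'123+45+', '1233455')
digits_ko = re.fullmatch(r'123+45+', '1231234545')

# With groups, each + applies to the whole number
groups_ok = re.fullmatch(r'(123)+(45)+', '1231234545')
groups_ko = re.fullmatch(r'(123)+(45)+', '1233455')

print(digits_ok is not None, digits_ko is None)  # True True
print(groups_ok is not None, groups_ko is None)  # True True
```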
Between level 6 and level 7 :
I have not been able to find any difference between the implicit concatenation of regexes ( for instance, the regex a, followed by the regex b, resulting in the regex ab ) and anchoring, which defines zero-length regexes matching specific locations in file contents !
Indeed, if we consider the simple regex ^123, to my mind, the regex ^1, immediately followed by the regex 23, the regex ^12, immediately followed by the regex 3, the regex ^123 itself, or even the zero-length regex ^, followed by the regex 123, all seem identical !?
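For what it is worth, a small sketch agrees with this impression, at least on one sample. It assumes Python's re module ( not Boost ) with its MULTILINE flag, and uses non-capturing groups to force each of the splittings described above; the sample text is hypothetical.

```python
import re

# Hypothetical two-line sample: 123 appears mid-line, then at the start of a line
text = 'x123\n123'

# Each pattern forces a different way of splitting ^123 into two sub-regexes
patterns = [r'^123', r'(?:^1)23', r'(?:^12)3', r'(?:^)123']

# Collect the match positions found by each pattern
spans = [[m.span() for m in re.finditer(p, text, re.MULTILINE)] for p in patterns]

for p, s in zip(patterns, spans):
    print(p, s)  # every pattern reports [(5, 8)]
```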
A bit off topic : just note that string concatenation does NOT represent the same concept as regex concatenation ! For instance, the regex [12], followed by the regex [34], matches all the elements of the set { 13, 14, 23, 24 }, whereas the string 12, followed by the string 34, represents the single-element set { 1234 }
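To make the distinction concrete, here is a short sketch, assuming Python's re module instead of Boost. It enumerates every two-character string over the digits 1 to 4 and keeps those matched by the concatenated regex; all the names are illustrative.

```python
import re
from itertools import product

# All two-character strings built from the digits 1, 2, 3 and 4
candidates = [a + b for a, b in product('1234', repeat=2)]

# Regex concatenation: one character from [12], then one character from [34]
regex_set = [s for s in candidates if re.fullmatch(r'[12][34]', s)]

# String concatenation: a single, fixed result
string_concat = '12' + '34'

print(regex_set)      # ['13', '14', '23', '24']
print(string_concat)  # 1234
```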
Between level 7 and level 8 :
The regex ^12|34$ matches the number 12, beginning a line, OR the number 34, ending a line. It does NOT match only the lines containing just the number 12 OR the number 34 ( which can be found with the regex ^(12|34)$ ), NOR only the lines beginning with the digit 1, ending with the digit 4 and containing, in between, either the digit 2 OR the digit 3 ( which can be found with the regex ^1(2|3)4$ )
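The three regexes can be compared side by side with a minimal sketch. It assumes Python's re module rather than Boost, and the list of sample lines is hypothetical.

```python
import re

# Hypothetical sample lines, one string per line of a file
lines = ['12', '34', '125', '534', '124', '134', '3412']

def matching(pattern):
    """Return the sample lines in which the pattern is found."""
    return [line for line in lines if re.search(pattern, line)]

# Alternation has the lowest priority: ( ^12 ) OR ( 34$ )
loose = matching(r'^12|34$')

# Anchors applied to both alternatives
whole = matching(r'^(12|34)$')

# Alternation restricted to the middle digit
middle = matching(r'^1(2|3)4$')

print(loose)   # ['12', '34', '125', '534', '124', '134']
print(whole)   # ['12', '34']
print(middle)  # ['124', '134']
```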
Best regards,
Merry Christmas and Happy Holidays to all ;-))
guy038
P.S. :
I’ve also found a great article on operator precedence in the main programming and scripting languages ;-)) Just click below :
https://rosettacode.org/wiki/Operator_precedence