Some thoughts about regex assertions, conditional structures and (non-)optional groups !
Hi All,
Recently, I've learned something about the zero-length assertions, such as `^` or `$`. Indeed, as they behave like pre-defined look-arounds that match an empty string, you may surround them with parentheses, followed by the `?` quantifier, in order to create a NON-optional capturing group, possibly absent! So:
- If the location of the text to be matched verifies the assertion, that group is defined and any conditional regex structure using that group will be in the `TRUE` state
- If the location of the text to be matched cannot verify the assertion, that group is NOT defined and any conditional regex structure using that group will be in the `FALSE` state
For instance, let's consider the following regex S/R:

SEARCH `(^)?ABC($)?`, where both assertions `^` and `$` are stored as NON-optional groups `1` and `2`, possibly absent, due to the `?` quantifier

REPLACE `?1(?2<123>:<123):(?{2}123>:123)`, which represents a conditional replacement structure, relative to groups `1` and `2`
Given the sample data below:

    ABC
    ABC test
    test ABC
    test ABC test

you should get the text:

    <123>
    <123 test
    test 123>
    test 123 test

It's easy to notice that the replacement depends on the location of the ABC string:
- If the ABC string is alone on a line, the two assertions are verified. So, the groups `1` and `2` both exist => `<123>`
- If the ABC string begins a line, the `^` assertion is verified. So, only the group `1` exists => `<123`
- If the ABC string ends a line, the `$` assertion is verified. So, only the group `2` exists => `123>`
- If any ABC string is embedded in a line, NO assertion can be verified. So, the two groups do not exist => `123`
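
For readers who want to try the same idea outside Notepad++, here is a minimal sketch in Python. It is only an approximation: Python's `re` module is not the Boost engine used by Notepad++ and has no conditional replacement syntax, so the `?1...:...` logic is emulated by a hypothetical helper function, `wrap_123`. It relies on Python reporting a group that did not take part in the match as `None`.

```python
import re

# Same search regex as in the S/R above; re.MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(r"(^)?ABC($)?", re.MULTILINE)

def wrap_123(match):
    """Emulate the conditional replacement ?1(?2<123>:<123):(?{2}123>:123)."""
    # A group that took part in the match holds "" (empty, but defined);
    # a group skipped by the ? quantifier is None (undefined).
    left = "<" if match.group(1) is not None else ""
    right = ">" if match.group(2) is not None else ""
    return left + "123" + right

sample = "ABC\nABC test\ntest ABC\ntest ABC test"
result = PATTERN.sub(wrap_123, sample)
print(result)
```

Run on the four sample lines, this should print the same four lines as the S/R above.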
Reminder:
The conditional structures below:
- `(?(##)Regex_if_TRUE|Regex_if_FALSE)` (in the Search regex)
- `(?{##}Regex_if_TRUE:Regex_if_FALSE)` (in the Replacement regex)

both refer to the `##`th capturing group of the search regex, which can be defined or not.
However, in order to be effective, these structures must concern a NON-optional group only!
Indeed, let's consider the group `1` in the regex `(?-i)ABC(\d*)ABC`. A condition on the group `(\d*)` is always `TRUE`, as it refers to either:
- An existent NON-empty group `1` (some digits) => Group `1` is defined and the condition is `TRUE`
- An existent empty group `1` (an empty string) => Group `1` is defined and the condition is `TRUE`
Now, if we re-build the regex this way: `(?-i)ABC(\d+)?ABC`, this time the regex `(\d+)?` refers to a NON-optional group `1`, possibly absent. So, the regex refers to either:
- The existent group `1` (some digits) => Group `1` is defined and the condition is `TRUE`
- The NON-existent group `1` (an empty string between the two `ABC`) => Group `1` is NOT defined and the condition is `FALSE`
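
The search-side form `(?(1)...|...)` shows the same difference. Below is a small sketch in Python, whose `re` module also supports this conditional syntax. The trailing `yes` / `no` literals are made up for the illustration, and `(?-i)` is left out because Python is case-sensitive by default.

```python
import re

# Group 1 always takes part in the match, even when it captures nothing.
always_defined = re.compile(r"ABC(\d*)ABC(?(1)yes|no)")
# Group 1 is skipped entirely when there are no digits.
possibly_absent = re.compile(r"ABC(\d+)?ABC(?(1)yes|no)")

def accepts(pattern, text):
    """Return True when the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

for text in ("ABCABCyes", "ABCABCno", "ABC12345ABCyes"):
    print(text, accepts(always_defined, text), accepts(possibly_absent, text))
```

With `(\d*)` the `no` branch can never be taken, whereas with `(\d+)?` the text `ABCABCno` is accepted and `ABCABCyes` is rejected.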
So, given the two sample lines:

    ABCABC
    ABC12345ABC

The regex:

SEARCH `(?-i)ABC(\d*)ABC`

REPLACE `(?1True:False)`

gives the text:

    True
    True

whereas the near-equivalent regex:

SEARCH `(?-i)ABC(\d+)?ABC`

REPLACE `(?1True:False)`

does give the text:

    False
    True

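
The same comparison can be sketched in Python, with the caveat that Python's `re` is a different engine from the one in Notepad++. The replacement `(?1True:False)` is emulated by a hypothetical callback, `true_or_false`, which tests whether group `1` is defined.

```python
import re

def true_or_false(match):
    """Emulate the replacement (?1True:False)."""
    # None means group 1 did not take part in the match at all.
    return "True" if match.group(1) is not None else "False"

sample = "ABCABC\nABC12345ABC"

# (\d*) always takes part in the match, so group 1 is always defined.
with_star = re.sub(r"ABC(\d*)ABC", true_or_false, sample)
# (\d+)? is skipped when there are no digits, so group 1 may be undefined.
with_optional_plus = re.sub(r"ABC(\d+)?ABC", true_or_false, sample)

print(with_star)
print(with_optional_plus)
```

The first result should read `True` / `True` and the second `False` / `True`, matching the two S/R above.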
Cheers,
guy038