Some thoughts about regex assertions, conditional structures and (non-)optional groups !



  • Hi All,

    Recently, I’ve learned something about zero-length assertions, such as ^ or $. Indeed, as they are simply pre-defined look-arounds which match an empty string, you may surround them with parentheses, followed by the ? quantifier, in order to create a NON-optional capturing group ( its contents must be matched as they are ), possibly absent ( because of the ? quantifier ) ! So :

    • If the assertion holds at the location of the text to be matched, that group is defined and any conditional regex structure using that group will be in the TRUE state

    • If the assertion does not hold at the location of the text to be matched, that group is NOT defined and any conditional regex structure using that group will be in the FALSE state


    For instance, let’s consider the following regex S/R :

    SEARCH (^)?ABC($)? , where both assertions ^ and $ are stored as the NON-optional groups 1 and 2, possibly absent due to the ? quantifier

    REPLACE ?1(?2<123>:<123):(?{2}123>:123) , which represents a conditional replacement structure, relative to groups 1 and 2

    Given the sample data, below :

    ABC
    ABC test
    test ABC
    test ABC test
    

    you should get the text :

    <123>
    <123 test
    test 123>
    test 123 test
    

    It’s easy to notice that the replacement depends on the location of the ABC string :

    • If the ABC string is alone on a line, the two assertions are verified. So, both groups 1 and 2 exist => <123>

    • If the ABC string begins a line, the ^ assertion is verified. So, only group 1 exists => <123

    • If the ABC string ends a line, the $ assertion is verified. So, only group 2 exists => 123>

    • If the ABC string is embedded in a line, NO assertion can be verified. So, the two groups do not exist => 123
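
    For those who want to check the four cases outside Notepad++, here is a minimal sketch in Python. Note the assumptions : Python's re module is not the Boost engine used by Notepad++ and has no conditional replacement syntax, so the (?1...) and (?2...) tests are emulated with a replacement function ( the name replace_abc is mine ), where a group which did not take part in the match is reported as None.

```python
import re

# Same search regex as in the post; re.MULTILINE lets ^ and $ match at each line.
PATTERN = re.compile(r"(^)?ABC($)?", re.MULTILINE)


def replace_abc(match: re.Match) -> str:
    # In Python, an undefined group is None, whereas a group which
    # matched the empty string is "" (defined, although empty).
    group1_defined = match.group(1) is not None  # ^ verified
    group2_defined = match.group(2) is not None  # $ verified
    return ("<" if group1_defined else "") + "123" + (">" if group2_defined else "")


sample = "ABC\nABC test\ntest ABC\ntest ABC test"
result = PATTERN.sub(replace_abc, sample)
print(result)
```

    Testing "is not None", rather than the truthiness of the group, matters here : both assertions capture an empty string, which Python would otherwise treat as false.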


    Reminder :

    The conditional structures, below :

    • (?(##)Regex_if_TRUE|Regex_if_FALSE) ( in the Search regex )

    • (?{##}Regex_if_TRUE:Regex_if_FALSE) ( in the Replacement regex )

    both refer to capturing group ## of the search regex, which may or may not be defined

    However, in order to be effective, these structures must concern a NON-optional group only !
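
    As a small illustration of the search-side structure, here is a sketch in Python, whose re module accepts the same (?(1)...|...) syntax. The pattern and the test strings are hypothetical, not taken from a real case : an optional < is stored as group 1, and a closing > is required only when group 1 is defined.

```python
import re

# Hypothetical example: (<)? is group 1, possibly absent.
# (?(1)>|) means: if group 1 is defined, match ">", else match nothing.
BRACKETED = re.compile(r"^(<)?\d+(?(1)>|)$")

candidates = ["<123>", "123", "<123", "123>"]
checks = {text: bool(BRACKETED.match(text)) for text in candidates}

for text, ok in checks.items():
    print(f"{text!r:8} -> {'match' if ok else 'no match'}")
```

    Only the balanced forms <123> and 123 are accepted, because the condition follows the defined or undefined state of group 1.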


    Indeed, let’s consider group 1 in the regex (?-i)ABC(\d*)ABC. A condition on the group (\d*) is always TRUE because \d* can match an empty string, so the group always takes part in the match. It refers to either :

    • An existent NON-empty group 1 ( Some digits ) => Group 1 is defined and the condition is TRUE

    • An existent empty group 1 ( An empty string ) => Group 1 is defined and the condition is TRUE

    Now, if we re-build the regex this way : (?-i)ABC(\d+)?ABC, this time the regex (\d+)? refers to a NON-optional group 1, possibly absent. So, the regex refers to either :

    • The existent group 1 ( Some digits ) => Group 1 is defined and the condition is TRUE

    • The NON-existent group 1 ( No digits, so the group does not take part in the match ) => Group 1 is NOT defined and the condition is FALSE


    So, given the two sample lines :

    ABCABC
    ABC12345ABC
    

    The regex :

    SEARCH (?-i)ABC(\d*)ABC

    REPLACE (?1True:False)

    gives the text :

    True
    True
    

    whereas the equivalent regex :

    SEARCH (?-i)ABC(\d+)?ABC

    REPLACE (?1True:False)

    gives the text :

    False
    True
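
    The difference between the two regexes can also be checked with a short Python sketch. Again, this is only an emulation under stated assumptions : the helper group1_defined is mine, it stands in for the (?1True:False) replacement, and the (?-i) modifier is left out because Python's re module is case-sensitive by default.

```python
import re


def group1_defined(pattern: str, text: str) -> str:
    # Emulates the replacement (?1True:False): "True" when group 1
    # took part in the match, even if it captured an empty string.
    return re.sub(
        pattern,
        lambda m: "True" if m.group(1) is not None else "False",
        text,
    )


sample = "ABCABC\nABC12345ABC"

always_defined = group1_defined(r"ABC(\d*)ABC", sample)    # (\d*) always takes part
possibly_absent = group1_defined(r"ABC(\d+)?ABC", sample)  # (\d+)? may be skipped

print(always_defined)
print(possibly_absent)
```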
    

    Cheers,

    guy038

