Regex: capture a date if it exists in a block, but match the block anyway



  • I am using this regex

    (^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))
    

    to capture two groups and capture an optional date in a test sample block:

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    02 FEB 1842.
    line 6
    ==
    

    This matches two groups

    **@X1@ F.**
    line 1
    line 2
    

    and

    **@X2@ F.**
    line 3.
    line 4
    02 FEB 1842.
    line 6
    

    but does not capture the date

    whereas

    ^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)
    

    produces just one match (which spans both of the required groups)

    **@X1@ F.**
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    **02 FEB 1842.**
    line 6
    

    and captures the date.
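
The behaviour described above can be reproduced outside Notepad++. The following is a minimal sketch using Python's `re` module, under two assumptions: `\R` is replaced by `\n`, and `re.MULTILINE` stands in for the line-based `^` and `$` that Notepad++ uses by default. The names `SAMPLE`, `OPTIONAL_DATE` and `REQUIRED_DATE` are made up for this illustration. The likely cause of the missing date is that the first lazy `(?s:.*?)` matches nothing, the optional date group is tried right after the label line, fails there and is skipped, and the second lazy part then runs straight to the look-ahead.

```python
import re

SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 4\n"
    "02 FEB 1842.\n"
    "line 6\n"
    "==\n"
)

# First pattern from the post: the date group is optional.
OPTIONAL_DATE = re.compile(
    r"(^(@X\d.+F\.$\n)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))",
    re.MULTILINE,
)

# Second pattern from the post: the date group is mandatory.
REQUIRED_DATE = re.compile(
    r"^(@X\d.+F\.$\n)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)",
    re.MULTILINE,
)

# Collect (label line, date) pairs for every match of each pattern.
optional_hits = [(m.group(2).strip(), m.group(3)) for m in OPTIONAL_DATE.finditer(SAMPLE)]
required_hits = [(m.group(1).strip(), m.group(2)) for m in REQUIRED_DATE.finditer(SAMPLE)]

print(optional_hits)  # two matches, date group unset in both
print(required_hits)  # one match starting at @X1, date captured
```

In this sketch the optional version yields two matches with no date, and the mandatory version yields a single match that starts at `@X1@ F.` and swallows the `@X2@ F.` block to reach the date.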

    Can someone (guy038?) produce a regex that will capture the two groups AND the optional date?



  • @John-Slee

    What does “optional date” mean?

    Does it mean your “before” block could look like this (exactly as you show):

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    02 FEB 1842.
    line 6
    ==
    

    Or could it (the “before” block) also look like this, with no date line at all:

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    line 6
    ==
    


  • It means it could be

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    03 MAR 1806.
    line 6
    ==
    
    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    line 6
    ==
    

    or

    == Ma ==
    @X1@ F.
    line 1
    04 FEB 1811.
    line 2
    @X2@ F.
    line 3.
    03 MAR 1806.
    line 6
    ==
    

    or

    == Ma ==
    @X1@ F.
    line 1
    04 JUN 1961.
    line 2
    @X2@ F.
    line 3.
    line 6
    ==
    

    Each block begins with the @…@ F. line and can have a variable number of lines, one of which can be the date line.
    I want to match each block and capture the date where it occurs.



  • Hello, @john-slee, @alan-kilborn and All,

    I think that the following regex S/R should be OK !

    SEARCH (?-s)^@X.+\R(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))(?=^@X|^==|\Z)

    If we use the in-line modifier (?x), we can build the corresponding multi-line regex, with explanations in comments :

    (?x)                           # FREE-SPACING mode
    (?-s)                          # Forces the DOT regex symbol to match a SINGLE STANDARD character only ( NOT EOL chars )
    ^@X.+\R                        # An ENTIRE Line BEGINNING with @X
    (?:                            # NON-capturing group, beginning 2 ALTERNATIVES
    (?s:[^@]*?)                    #     SHORTEST range of chars, even NULL, DIFFERENT from @, till the DATE, in a NON-capturing group
    (\d\d\x20\w\w\w\x20\d\d\d\d)\. #     DATE, stored in CAPTURING group 1, followed with a DOT
    (?s:.*?)                       #     SHORTEST range of chars till the LOOK-AROUND, in a NON-capturing group
    |                              #   OR
    (?s:.*?)                       #     SHORTEST range of chars, even NULL, till the LOOK-AROUND, in a NON-capturing group
    )                              # END of the NON-capturing group
    (?=^@X|^==|\Z)                 # LOOK-AROUND ( if FOLLOWED with @X or ==, BEGINNING a line, or the END of file [ possibly PRECEDED with EMPTY lines ] )
    

    You can select that whole block, copy it with Ctrl+C and paste it with Ctrl+V into the Find what zone of the Find dialog ;-))

    Notes :

    • Each matched multi-line block, from a line ^@X... up to, but excluding, the next line ^@X, can be used in the replacement with the $0 syntax ( the overall match )

    • Group 1 stores the date when it is present in the current block, and is an empty string when the date is absent from the block. You can re-use the date in the replacement with the \1 or $1 syntax
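
For checking the search regex outside Notepad++, here is a minimal sketch in Python's `re` module. It is a translation, not the Boost pattern itself, and rests on these assumptions: `\n` replaces `\R`, `re.MULTILINE` replaces the default line-based `^`, the leading `(?-s)` is dropped because Python's dot already excludes newlines, and `\Z` here means the very end of the string. The names `BLOCK` and `SAMPLE` are made up for the illustration.

```python
import re

# Python translation of the SEARCH regex (see the assumptions in the lead-in).
BLOCK = re.compile(
    r"^@X.+\n"                                                        # label line
    r"(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))"  # date or not
    r"(?=^@X|^==|\Z)",                                                # block end
    re.MULTILINE,
)

# One of the sample layouts from the thread: date in the first block only.
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "04 JUN 1961.\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 6\n"
    "==\n"
)

blocks = [m.group(0) for m in BLOCK.finditer(SAMPLE)]
dates = [m.group(1) for m in BLOCK.finditer(SAMPLE)]

for block, date in zip(blocks, dates):
    print(repr(block), "->", date)
```

Note one difference in this sketch: Python reports the absent date as `None`, whereas in a Notepad++ replacement the unset group 1 behaves as an empty string.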

    Best Regards,

    guy038



  • @guy038 Thank you so much, Guy. To achieve what I want, I adjusted the search regex slightly, so that the whole block, the label line (^@X\d+@ F\.\R) and the date are all captured:

    (?-s)((^@X.+\R)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))(?=^@X|^==|\Z)
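
The group numbering of this adjusted regex can be checked with a minimal Python sketch. As before it is a translation under assumptions: `\n` replaces `\R`, `re.MULTILINE` replaces the default line-based `^`, and the leading `(?-s)` is dropped because Python's dot already excludes newlines. The names `BLOCK`, `SAMPLE` and `rows` are made up for the illustration. Group 1 is the whole block, group 2 the label line and group 3 the date.

```python
import re

# Python translation of the adjusted regex with three capturing groups.
BLOCK = re.compile(
    r"((^@X.+\n)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))"
    r"(?=^@X|^==|\Z)",
    re.MULTILINE,
)

# One of the sample layouts from the thread: a date in both blocks.
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "04 FEB 1811.\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "03 MAR 1806.\n"
    "line 6\n"
    "==\n"
)

# (whole block, label line, date) for every match.
rows = [(m.group(1), m.group(2), m.group(3)) for m in BLOCK.finditer(SAMPLE)]

for whole, label, date in rows:
    print(repr(label), date, len(whole.splitlines()), "lines")
```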
    

    Best Regards. Stay safe!
    John

