
    Regex capture date if it exists in a block but match the block anyway

    • John Slee

      I am using this regular expression

      (^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))
      

      to capture two groups and capture an optional date in a test sample block:

      == Ma ==
      @X1@ F.
      line 1
      line 2
      @X2@ F.
      line 3.
      line 4
      02 FEB 1842.
      line 6
      ==
      

      This matches two blocks

      **@X1@ F.**
      line 1
      line 2
      

      and

      **@X2@ F.**
      line 3.
      line 4
      02 FEB 1842.
      line 6
      

      but does not capture the date

      whereas

      ^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)
      

      matches just one block (which spans both of the required blocks)

      **@X1@ F.**
      line 1
      line 2
      @X2@ F.
      line 3.
      line 4
      **02 FEB 1842.**
      line 6
      

      and captures the date.

      Can someone (guy038?) produce a regex that will capture the two groups AND the optional date?
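
For readers who want to reproduce the two behaviours outside Notepad++, here is a minimal Python sketch. It is an adaptation under stated assumptions, not the Notepad++ (Boost) regex itself: `\R` is replaced by `\n`, `re.MULTILINE` is used so that `^` and `$` work per line, and the outer capturing group of the first expression is dropped. The likely cause of the missing date is that the optional group is tried immediately after the first lazy `.*?` has matched nothing, fails there, is skipped, and the second lazy `.*?` then consumes the rest of the block.

```python
import re

SAMPLE = """\
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
02 FEB 1842.
line 6
==
"""

# Date sub-pattern copied from the post (the trailing dot is inside the group).
DATE = r"(^\d\d \D\D\D \d\d\d\d\.)"

# Assumption: \R is written as \n because Python's re module has no \R.
LABEL = r"^(@X\d.+F\.$\n)"

# Variant 1: the date group is optional.
OPTIONAL_DATE = re.compile(LABEL + r"(?s:.*?)" + DATE + r"?(?s:.*?)(?=@|==)", re.MULTILINE)

# Variant 2: the date group is mandatory.
REQUIRED_DATE = re.compile(LABEL + r"(?s:.*?)" + DATE + r"(?s:.*?)(?=@|==)", re.MULTILINE)

# Group 2 is the date group in both variants.
optional_dates = [m.group(2) for m in OPTIONAL_DATE.finditer(SAMPLE)]
required_dates = [m.group(2) for m in REQUIRED_DATE.finditer(SAMPLE)]

print(optional_dates)  # two matches, date group unset in both
print(required_dates)  # one match spanning both blocks, date captured
```

Note that Python reports an unset group as `None`, which is a property of Python's `re` module and not of Notepad++.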

      • Alan Kilborn

        @John-Slee

        What does “optional date” mean?

        Does it mean your “before” block could look like this (exactly as you show):

        == Ma ==
        @X1@ F.
        line 1
        line 2
        @X2@ F.
        line 3.
        line 4
        02 FEB 1842.
        line 6
        ==
        

        Or it (the “before” block) could look like this:

        == Ma ==
        @X1@ F.
        line 1
        line 2
        @X2@ F.
        line 3.
        line 4
        line 6
        ==
        
        • John Slee

          It means it could be

          == Ma ==
          @X1@ F.
          line 1
          line 2
          @X2@ F.
          line 3.
          03 MAR 1806.
          line 6
          ==
          
          == Ma ==
          @X1@ F.
          line 1
          line 2
          @X2@ F.
          line 3.
          line 4
          line 6
          ==
          

          or

          == Ma ==
          @X1@ F.
          line 1
          04 FEB 1811.
          line 2
          @X2@ F.
          line 3.
          03 MAR 1806.
          line 6
          ==
          

          or

          == Ma ==
          @X1@ F.
          line 1
          04 JUN 1961.
          line 2
          @X2@ F.
          line 3.
          line 6
          ==
          

          Each block begins with the @…@ F. line and can have a variable number of lines, one of which can be the date line.
          I want to match each block and capture the date where it occurs.

          • guy038

            Hello, @john-slee, @alan-kilborn and All,

            I think that the following search regex should be OK!

            SEARCH (?-s)^@X.+\R(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))(?=^@X|^==|\Z)

            If we use the in-line modifier (?x), we can build the corresponding multi-line regex, with explanations in comments:

            (?x)                           # FREE-SPACING mode
            (?-s)                          # Forces the DOT regex symbol to match a SINGLE STANDARD character only ( not EOL chars )
            ^@X.+\R                        # An ENTIRE Line BEGINNING with @X
            (?:                            # NON-capturing group, beginning 2 ALTERNATIVES
            (?s:[^@]*?)                    #     SHORTEST range of chars, even NULL, DIFFERENT from @, till the DATE, in a NON-capturing group
            (\d\d\x20\w\w\w\x20\d\d\d\d)\. #     DATE, stored in CAPTURING group 1, followed with a DOT
            (?s:.*?)                       #     SHORTEST range of chars till the LOOK-AROUND, in a NON-capturing group
            |                              #   OR
            (?s:.*?)                       #     SHORTEST range of chars, even NULL, till the LOOK-AROUND, in a NON-capturing group
            )                              # END of the NON-capturing group
            (?=^@X|^==|\Z)                 # LOOK-AHEAD ( if FOLLOWED with @X or ==, BEGINNING a line, or the END of file [ possibly PRECEDED with EMPTY lines ] )
            

            You can select that whole block, copy it with Ctrl+C, and paste it with Ctrl+V into the Find what zone of the Find dialog ;-))

            Notes :

            • Each matched multi-line block, from a line ^@X... up to, but excluding, the next ^@X line, can be reused in the replacement with the $0 syntax ( the overall match )

            • Group 1 stores the date when it is present in the current block, and is an empty string when the date is absent from the block. You can reuse the date in the replacement with the \1 or $1 syntax

            Best Regards,

            guy038
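
As a rough cross-check of the regex above outside Notepad++, the following Python sketch runs an adapted version on one of the sample blocks from the thread. The adaptation is an assumption, not the original Boost expression: the global `(?-s)` is dropped (Python's dot already excludes newlines by default), `\R` becomes `\r?\n`, `re.MULTILINE` makes `^` match at each line start, and Python's `\Z` matches only at the very end of the text. `re.VERBOSE` plays the role of the `(?x)` free-spacing mode. The names `BLOCK_RE` and `find_blocks` are hypothetical.

```python
import re

# Adapted from the free-spacing regex in the post (see assumptions in the lead-in).
BLOCK_RE = re.compile(
    r"""
    ^@X.+\r?\n                          # entire label line beginning with @X
    (?:
        [^@]*?                          # shortest run without @, up to the date
        (\d\d\x20\w\w\w\x20\d\d\d\d)\.  # group 1: the date, followed by a dot
        (?s:.*?)                        # shortest run up to the look-ahead
      |
        (?s:.*?)                        # alternative for a block without a date
    )
    (?=^@X|^==|\Z)                      # stop before next label, an == line, or end of text
    """,
    re.MULTILINE | re.VERBOSE,
)

SAMPLE = """\
== Ma ==
@X1@ F.
line 1
04 JUN 1961.
line 2
@X2@ F.
line 3.
line 6
==
"""


def find_blocks(text):
    """Return (label line, date or None) for every matched block."""
    results = []
    for m in BLOCK_RE.finditer(text):
        label = m.group(0).splitlines()[0]
        results.append((label, m.group(1)))
    return results


for label, date in find_blocks(SAMPLE):
    print(label, "->", date)
```

In Python an absent date shows up as `None`, whereas the note above describes an empty string in a Notepad++ replacement. The `[^@]*?` in the first alternative is what keeps the date search from running past the next `@X` label.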

            • John Slee

              @guy038 Thank you so much, Guy. To achieve what I want, I adjusted the search regex slightly so that the whole block, the label line (^@X\d+@ F\.\R) and the date are all captured:

              (?-s)((^@X.+\R)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))(?=^@X|^==|\Z)
              

              Best Regards. Stay safe!
              John
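
To illustrate how the three capturing groups of the adjusted regex line up, here is a small Python sketch run on the sample block from the first post. It is an adaptation under assumptions, not the Notepad++ expression itself: `(?-s)` is dropped, `\R` is written as `\r?\n`, and `re.MULTILINE` is added. The names `ADJUSTED_RE` and `rows` are hypothetical.

```python
import re

# Group 1: whole block, group 2: label line, group 3: date (None when absent).
ADJUSTED_RE = re.compile(
    r"((^@X.+\r?\n)"
    r"(?:[^@]*?(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))"
    r"(?=^@X|^==|\Z)",
    re.MULTILINE,
)

SAMPLE = """\
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
02 FEB 1842.
line 6
==
"""

# One (block, label, date) tuple per matched block.
rows = [m.group(1, 2, 3) for m in ADJUSTED_RE.finditer(SAMPLE)]

for block, label, date in rows:
    print(repr(label), repr(date), len(block.splitlines()), "lines")
```

With the extra outer group, the date moves from group 1 to group 3, so a replacement that reused `\1` for the date with the earlier regex would need `\3` here.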

              The Community of users of the Notepad++ text editor.