Regex capture date if it exists in a block but match the block anyway

  • John Slee
    Apr 13, 2020, 11:53 AM (last edited by John Slee Apr 13, 2020, 11:54 AM)

    I am using this regex

    (^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))
    

    to match two blocks and capture an optional date in this test sample:

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    02 FEB 1842.
    line 6
    ==
    

    This matches two blocks

    **@X1@ F.**
    line 1
    line 2
    

    and

    **@X2@ F.**
    line 3.
    line 4
    02 FEB 1842.
    line 6
    

    but does not capture the date

    whereas

    ^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)
    

    matches just one block (which spans both of the required blocks)

    **@X1@ F.**
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    **02 FEB 1842.**
    line 6
    

    and captures the date.

    Can someone (guy038?) produce a regex that will match the two blocks AND capture the optional date?
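
The behaviour described in this post can be reproduced outside Notepad++. The following is a minimal Python sketch, not the poster's own test: it assumes LF line endings and swaps `\R` for `\n`, because Python's `re` module has no `\R` escape. The names `SAMPLE`, `OPTIONAL_DATE`, `MANDATORY_DATE` and the two helper functions are made up for illustration. It shows that with the optional group, the first lazy `(?s:.*?)` matches nothing, the date group is tried once at that position, fails, and is allowed to match zero times; the second lazy part then consumes the rest of the block, so the engine never needs to backtrack to find the date.

```python
import re

# Test sample from the post, with LF line endings (an assumption).
SAMPLE = """== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
02 FEB 1842.
line 6
==
"""

# First pattern from the post: the date group is optional.
OPTIONAL_DATE = re.compile(
    r"(^(@X\d.+F\.$\n)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))",
    re.MULTILINE,
)

# Second pattern from the post: the date group is mandatory.
MANDATORY_DATE = re.compile(
    r"^(@X\d.+F\.$\n)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)",
    re.MULTILINE,
)


def optional_date_captures(text):
    """Return (label, date) per match; date is None when group 3 matched zero times."""
    return [(m.group(2).strip(), m.group(3)) for m in OPTIONAL_DATE.finditer(text)]


def mandatory_date_captures(text):
    """Return (label, date) per match for the mandatory-date pattern."""
    return [(m.group(1).strip(), m.group(2)) for m in MANDATORY_DATE.finditer(text)]


if __name__ == "__main__":
    print(optional_date_captures(SAMPLE))   # two matches, no date captured
    print(mandatory_date_captures(SAMPLE))  # one match spanning both blocks
```

In Python an unmatched group is reported as `None`; the two-match and one-match results mirror what the post describes.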

    • Alan Kilborn @John Slee
      Apr 13, 2020, 12:08 PM

      @John-Slee

      What does “optional date” mean?

      Does it mean your “before” block could look like this (exactly as you show):

      == Ma ==
      @X1@ F.
      line 1
      line 2
      @X2@ F.
      line 3.
      line 4
      02 FEB 1842.
      line 6
      ==
      

      Or it (the “before” block) could look like this:

      == Ma ==
      @X1@ F.
      line 1
      line 2
      @X2@ F.
      line 3.
      line 4
      line 6
      ==
      
      • John Slee
        Apr 13, 2020, 2:21 PM

        It means it could be

        == Ma ==
        @X1@ F.
        line 1
        line 2
        @X2@ F.
        line 3.
        03 MAR 1806.
        line 6
        ==
        
        == Ma ==
        @X1@ F.
        line 1
        line 2
        @X2@ F.
        line 3.
        line 4
        line 6
        ==
        

        or

        == Ma ==
        @X1@ F.
        line 1
        04 FEB 1811.
        line 2
        @X2@ F.
        line 3.
        03 MAR 1806.
        line 6
        ==
        

        or

        == Ma ==
        @X1@ F.
        line 1
        04 JUN 1961.
        line 2
        @X2@ F.
        line 3.
        line 6
        ==
        

        Each block begins with the @…@ F. line and can have a variable number of lines, one of which can be the date line.
        I want to match each block and capture the date where it occurs.

        • guy038
          Apr 13, 2020, 3:17 PM (last edited by guy038 Apr 13, 2020, 3:47 PM)

          Hello, @john-slee, @alan-kilborn and All,

          I think that the following search regex should be OK!

          SEARCH (?-s)^@X.+\R(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))(?=^@X|^==|\Z)

          If we use the in-line modifier (?x), we can build the corresponding multi-line regex, with explanations in comments:

          (?x)                           # FREE-SPACING mode
          (?-s)                          # Forces the DOT regex symbol to match a SINGLE STANDARD character , only ( Not EOL chars )
          ^@X.+\R                        # An ENTIRE Line BEGINNING with @X
          (?:                            # NON-capturing group, beginning 2 ALTERNATIVES
          (?s:[^@]*?)                    #     SHORTEST range of chars, even NULL, DIFFERENT from @, till the DATE, in a NON-capturing group
          (\d\d\x20\w\w\w\x20\d\d\d\d)\. #     DATE, stored in CAPTURING group 1, followed with a DOT
          (?s:.*?)                       #     SHORTEST range of chars till the LOOK-AROUND, in a NON-capturing group
          |                              #   OR
          (?s:.*?)                       #     SHORTEST range of chars, even NULL, till the LOOK-AROUND, in a NON-capturing group
          )                              # END of the NON-capturing group
          (?=^@X|^==|\Z)                 # LOOK-AROUND ( if FOLLOWED with @X or ==, BEGINNING a line, or the END of file [ possibly PRECEDED with EMPTY lines ] )
          

          You can select that whole block, copy it with Ctrl+C and paste it with Ctrl+V into the Find what zone of the Find dialog ;-))

          Notes:

          • Each matched multi-line block, from a line ^@X... up to, but excluding, the next ^@X line, can be re-used in the replacement with the $0 syntax (the overall match)

          • Group 1 stores the date when it is present in the current block and is an empty string when the date is absent. You can re-use the date in the replacement with the \1 or $1 syntax

          Best Regards,

          guy038
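
For readers who want to try the search regex above outside Notepad++, here is a minimal Python sketch. It is an adaptation, not the regex as posted: `\R` is replaced by `\n` (LF line endings are assumed), the leading `(?-s)` is dropped because Python's dot already excludes newlines by default, and `^` relies on `re.MULTILINE`. The names `BLOCK` and `blocks_with_dates` are hypothetical.

```python
import re

# Python adaptation of the search regex: one match per @X block,
# with the date (if any) in capturing group 1.
BLOCK = re.compile(
    r"^@X.+\n"                                  # entire label line beginning with @X
    r"(?:"
    r"(?s:[^@]*?)"                              # shortest run without @, up to the date
    r"(\d\d\x20\w\w\w\x20\d\d\d\d)\."           # date in group 1, followed by a dot
    r"(?s:.*?)"                                 # shortest run up to the look-ahead
    r"|"
    r"(?s:.*?)"                                 # no date: shortest run up to the look-ahead
    r")"
    r"(?=^@X|^==|\Z)",                          # next block, closing ==, or end of text
    re.MULTILINE,
)


def blocks_with_dates(text):
    """Return (whole_block, date) per block; date is None when the block has no date."""
    return [(m.group(0), m.group(1)) for m in BLOCK.finditer(text)]


SAMPLE = (
    "== Ma ==\n@X1@ F.\nline 1\nline 2\n"
    "@X2@ F.\nline 3.\nline 4\n02 FEB 1842.\nline 6\n==\n"
)

if __name__ == "__main__":
    for block, date in blocks_with_dates(SAMPLE):
        print(repr(block), date)
```

One difference to keep in mind: Python reports an absent date as `None`, whereas in a Notepad++ replacement the group is simply empty.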

          • John Slee @guy038
            Apr 13, 2020, 5:41 PM

            @guy038 Thank you so much, Guy. In order to achieve what I want, I adjusted the search regex slightly, so that the whole block, the label line (^@X\d+@ F\.\R) and the date are all captured:

            (?-s)((^@X.+\R)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))(?=^@X|^==|\Z)
            

            Best Regards. Stay safe!
            John
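
As an illustration of how the three groups in the adjusted regex might be used, here is a minimal Python sketch. The thread does not say what replacement was intended, so the rewrite shown (appending the captured date to the label line) is purely hypothetical, as are the names `BLOCK_LABEL_DATE` and `append_date_to_label`. As an adaptation to Python's `re` module, `\R` is replaced by `\n` (LF line endings assumed) and `(?-s)` is dropped.

```python
import re

# Group 1: whole block, group 2: label line, group 3: date (if present).
BLOCK_LABEL_DATE = re.compile(
    r"((^@X.+\n)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))"
    r"(?=^@X|^==|\Z)",
    re.MULTILINE,
)


def append_date_to_label(text):
    """Hypothetical rewrite: copy the block's date, if any, onto its label line."""

    def repl(m):
        block, label, date = m.group(1), m.group(2), m.group(3)
        if date is None:
            # Block without a date: leave it untouched.
            return block
        rest = block[len(label):]  # everything after the label line
        return f"{label.rstrip()} [{date}]\n{rest}"

    return BLOCK_LABEL_DATE.sub(repl, text)


SAMPLE = (
    "== Ma ==\n@X1@ F.\nline 1\n04 JUN 1961.\nline 2\n"
    "@X2@ F.\nline 3.\nline 6\n==\n"
)

if __name__ == "__main__":
    print(append_date_to_label(SAMPLE))
```

Only the first block in this sample has a date, so only its label line changes; the second block passes through as matched.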

            The Community of users of the Notepad++ text editor.