    Regex query - match a block of several lines starting and ending with (but not including) the same string

    • John Slee
      John Slee last edited by

      Please forgive me if this has been asked before, but I couldn’t find exactly what I’m looking for.

      My original text is of the form

      === Line1 ===
      ab
      de
      ef
      === Line 2 ===
      gh
      ij
      kl
      mn
      op
      === Line 3 ===
      qr
      stu
      vw
      xyz
      == Line 4 ==
      zyx
      wv

      In this instance I want to match 3 cases but there can be a variable number of matches. This time the matches would be:

      === Line1 ===>>== Line 2 ===>>=== Line 3 ===
      ab>>>>>>>>>>>gh>>>>>>>>>>qr
      de>>>>>>>>>>>ij>>>>>>>>>>>stu
      ef>>>>>>>>>>>kl>>>>>>>>>>>vw
      ^ >>>>>>>>>>>mn>>>>>>>>>>xyz
      ^ >>>>>>>>>>>op

      (ignore the ^ and > they are just to break up the blocks in this post)
      My regex would start something like
      ^([=]{2,3}[ \s\S]+[=]{2,3}$)+
      but that always results in one match, including the == Line 4 ==. Clearly I need to look ahead to a (\n[=]{2,3}) but not include it but I can’t work out how to do that. I’m not being lazy or greedy by asking (puns intended) but I’ve spent hours looking around (another pun?) for a solution.

      Thanks in anticipation.
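For reference, the single oversized match can be reproduced outside Notepad++. The sketch below is a rough port to Python 3's standard `re` module, which is not the Boost engine that Notepad++ uses. It assumes `re.MULTILINE` so that `^` and `$` anchor at line boundaries, and it assumes the text uses `\n` line endings. It suggests the cause is the greedy `[ \s\S]+`, which runs to the end of the text and then backtracks only as far as the last line that ends in `==`.

```python
import re

# John's attempt, unchanged apart from the assumed re.MULTILINE flag.
ATTEMPT = re.compile(r"^([=]{2,3}[ \s\S]+[=]{2,3}$)+", re.MULTILINE)

# The sample text from the post, assumed to use "\n" line endings.
text = (
    "=== Line1 ===\nab\nde\nef\n"
    "=== Line 2 ===\ngh\nij\nkl\nmn\nop\n"
    "=== Line 3 ===\nqr\nstu\nvw\nxyz\n"
    "== Line 4 ==\nzyx\nwv\n"
)

matches = [m.group(0) for m in ATTEMPT.finditer(text)]

# Greedy [ \s\S]+ swallows every block, then backtracks to the last "==" at a line end.
print(len(matches))
print(matches[0].splitlines()[0], "...", matches[0].splitlines()[-1])
```

Making the body lazy and stopping it with a look-ahead, as the replies go on to do, avoids this.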

      • Terry R
        Terry R last edited by

        @John-Slee said in Regex query - match a block of several lines starting and ending with (but not including) the same string:

        === Line1 ===>>== Line 2 ===>>=== Line 3 ===
        ab>>>>>>>>>>>gh>>>>>>>>>>qr
        de>>>>>>>>>>>ij>>>>>>>>>>>stu
        ef>>>>>>>>>>>kl>>>>>>>>>>>vw
        ^ >>>>>>>>>>>mn>>>>>>>>>>xyz
        ^ >>>>>>>>>>>op

        I don’t understand how you get this from the original input, but then again I am utterly confused by most of it.

        I think you tried to emulate spaces and tabs by using other characters, but in reality that caused more problems. To show data so that the forum’s interpreter doesn’t alter it, consider reading our FAQ, specifically the entry titled “request for help without sufficient information…”. It suggests putting data like yours in a code block (delimited by ~~~ lines), and then we can more easily see what needs to happen. Could you do that for both your input and the expected result for the example? It may then become more apparent what you need.
        
        Terry
        • John Slee
          John Slee last edited by

          OK, I’ll not try laying out in three columns for the three matches.

          The 3 matches I want are:

          === Line1 ===
          ab
          de
          ef

          • and

          === Line 2 ===
          gh
          ij
          kl
          mn
          op

          • and

          === Line 3 ===
          qr
          stu
          vw
          xyz

          • guy038
            guy038 last edited by guy038

            Hello, @John-slee, @terry-r and All,

            I think that a suitable regex could be :

            SEARCH (?-s)^={2,3}.+\R(?s:.+?)(?==|\Z)

            If correct, I’ll explain some details, next time !

            Best Regards,

            guy038

            • John Slee
              John Slee last edited by

              @guy038 said in Regex query - match a block of several lines starting and ending with (but not including) the same string:

              (?-s)^={2,3}.+\R(?s:.+?)(?==|\Z)

              Brilliant, thank you. Nearly right!
              I need to change my sample data to include lines which contain one or more single =

              e.g.
              === Line 3 ===
              qr
              stu = ignore this but include in capture
              vw
              xyz

              • John Slee
                John Slee @John Slee last edited by

                @John-Slee
                I should probably say that I am using the Python Script plugin for Notepad++

                This is the processor that is ignoring text after the single = in a capture
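The truncation described here can be reproduced with a rough port of the first regex to Python 3's standard `re` module. This is only a sketch under assumptions: stdlib `re` is not the Boost engine behind Notepad++ searches, so `re.MULTILINE` stands in for the line-anchored `^`, a plain `\n` stands in for `\R` (LF line endings assumed), and the leading `(?-s)` is dropped because `.` already excludes newlines by default in `re`.

```python
import re

# Assumed stdlib port of (?-s)^={2,3}.+\R(?s:.+?)(?==|\Z)
FIRST_TRY = re.compile(r"^={2,3}.+\n(?s:.+?)(?==|\Z)", re.MULTILINE)

sample = (
    "=== Line 3 ===\n"
    "qr\n"
    "stu = ignore this but include in capture\n"
    "vw\n"
    "xyz\n"
)

match = FIRST_TRY.search(sample)

# The lazy body stops just before the first "=" it meets, so the block is cut short.
truncated = match.group(0)
print(repr(truncated))
```

The look-ahead `(?==|\Z)` is satisfied by any single `=`, which is why the match ends in the middle of the `stu` line.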

                • guy038
                  guy038 last edited by guy038

                  Hi, @John-slee, @terry-r and All,

                  Ah, OK ! Given this new condition, I also tried to simplify that search regex a bit. So, a correct solution could be :

                  SEARCH (?-s)^==.+\R(?s:.+?)(?===|\Z)

                  Notes :

                  • First, the (?-s) in-line modifier forces the regex engine to treat the special . symbol as matching a single standard character only ( not EOL chars )

                  • Then, the part ^== looks for at least two = signs at the beginning of a line

                  • Next, the part .+\R matches all the remaining characters of the current line ( .+ ), followed by their EOL char(s) ( \R )

                  • The (?s:.+?) is a non-capturing group (?:.....) containing the in-line modifier s, which means that the part .+? will match the shortest non-null range of any chars, even EOL ones…

                  • But only if it is followed by either at least two consecutive = signs, which start the next block, or the end of file, possibly preceded by some line-breaks only ( \Z ), thanks to the positive look-ahead structure (?=.......)


                  This new regex should work against the example text, below :

                  === Line1 ===
                  ab
                  de
                  ef
                  === Line 2 ===
                  gh
                  ij
                  kl
                  mn
                  op
                  === Line 3 ===
                  qr
                  stu = ignore this but include in capture
                  vw
                  xyz
                  == Line 4 ==
                  zyx
                  wv
                  

                  Cheers,

                  guy038
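The revised regex can be tried against the same example text with a rough port to Python 3's standard `re` module. This is a sketch under assumptions rather than the Notepad++ behaviour itself: `re.MULTILINE` stands in for the line-anchored `^`, a plain `\n` stands in for `\R` (LF line endings assumed), and the leading `(?-s)` is omitted because `.` excludes newlines by default in `re`.

```python
import re

# Assumed stdlib port of (?-s)^==.+\R(?s:.+?)(?===|\Z)
BLOCK = re.compile(r"^==.+\n(?s:.+?)(?===|\Z)", re.MULTILINE)

text = """\
=== Line1 ===
ab
de
ef
=== Line 2 ===
gh
ij
kl
mn
op
=== Line 3 ===
qr
stu = ignore this but include in capture
vw
xyz
== Line 4 ==
zyx
wv
"""

# Each match is one header line plus the lines that follow it.
blocks = [m.group(0) for m in BLOCK.finditer(text)]
for block in blocks:
    print(repr(block))
```

With this sample the port finds four blocks, because the `== Line 4 ==` block is closed by `\Z`. The look-ahead `==` is not anchored to a line start, so a body line that itself contains `==` would end its block early.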

                  • John Slee
                    John Slee @guy038 last edited by

                    @guy038 Thank you so much. You have cracked it and stopped me cracking under the strain ;-)
                    SOLVED!

                    • John Slee
                      John Slee last edited by

                      Just a note to add that I have modified it very slightly so that it captures

                      the Header line (without the EndofLine)

                      and the remaining text as a second capture. The revised expression is thus:

                      (?-s)(^==.+\R)((?s:.+?)(?===|\Z))

                      • guy038
                        guy038 last edited by guy038

                        Hello, @John-slee,

                        Ah, of course, if you want to capture part(s) of the regex for use in the replacement, you need to surround these parts with parentheses. In that case, we have to change the non-capturing group (?s:.+?) into a capturing group with the in-line modifier inside, which gives the new syntax ((?s).+?)

                        So the final regex would be :

                        SEARCH (?-s)(^==.+\R)((?s).+?)(?===|\Z)

                        with two groups :

                        • The header line with its EOL ( group 1 )

                        • The subsequent lines of each block, with their line-breaks ( group 2 )


                        However, as you said :

                        so that it captures the Header line (without the EndofLine)

                        In that case, the final regex should be :

                        SEARCH (?-s)(^==.+)\R((?s).+?)(?===|\Z)

                        Best Regards

                        guy038
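The two-group form can be sketched in Python 3's standard `re` module as well. The port below rests on assumptions: `re.MULTILINE` stands in for the line-anchored `^`, a plain `\n` stands in for `\R` (LF line endings assumed), and the scoped group `(?s:.+?)` is wrapped in capturing parentheses because stdlib `re` deprecates or rejects a bare `(?s)` in the middle of a pattern, depending on the Python version.

```python
import re

# Assumed stdlib port of (?-s)(^==.+)\R((?s).+?)(?===|\Z)
# Group 1: header line without its line break. Group 2: the lines that follow.
HEADER_AND_BODY = re.compile(r"^(==.+)\n((?s:.+?))(?===|\Z)", re.MULTILINE)

text = "=== Line1 ===\nab\nde\nef\n=== Line 2 ===\ngh\nij\n"

pairs = [(m.group(1), m.group(2)) for m in HEADER_AND_BODY.finditer(text)]
for header, body in pairs:
    print(repr(header), repr(body))
```

In Python, `\Z` matches only at the very end of the string, so the last body keeps its trailing line break, whereas the notes above describe the Notepad++ `\Z` as also matching before trailing line-breaks.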
