    regexp help: lookahead and lookbehind with spaces

    12 Posts 4 Posters 686 Views
    • Alan Kilborn @patrickdrd

      @patrickdrd

      “f1 fantasy” (space in between, without quotes) and “fantasy f1”

      (?:(f1)|(fantasy)) (?(1)(?2)|(?1))
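Alan's pattern relies on two Boost/PCRE-style features: the conditional (?(1)yes|no), which checks whether group 1 took part in the match, and the subroutine calls (?1)/(?2), which re-run that group's pattern (unlike the backreference \1, which repeats the captured text). Python's standard re module supports the conditional but not subroutine calls, so this minimal sketch spells each call out as the literal it refers to. It illustrates the logic only; it is not how Notepad++ evaluates the regex.

```python
import re

# Alan's regex:  (?:(f1)|(fantasy)) (?(1)(?2)|(?1))
# Python's re has no (?N) subroutine calls, so (?2) is written out as
# "fantasy" (group 2's pattern) and (?1) as "f1" (group 1's pattern).
ALAN = re.compile(r"(?:(f1)|(fantasy)) (?(1)fantasy|f1)")

# If group 1 ("f1") matched, the conditional then demands "fantasy";
# otherwise group 2 ("fantasy") matched and it demands "f1".
results = {
    s: bool(ALAN.fullmatch(s))
    for s in ["f1 fantasy", "fantasy f1", "f1 f1", "fantasy fantasy", "f1fantasy"]
}
print(results)
```

Only the two mixed orders match; "f1fantasy" fails because this version requires exactly one space between the words.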

      also how do I do lookahead and lookbehind in npp

      Described in the N++ user manual.
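For the lookaround part of the question, here is a small sketch using Python's re module (not Notepad++'s Boost engine, so details may differ): a lookbehind (?<=...) or lookahead (?=...) checks the neighbouring text without including it in the match. Python requires a lookbehind to be fixed-width, so a variable separator such as \s* cannot go inside one, which makes lookbehind awkward when the amount of spacing varies.

```python
import re

text = "f1 fantasy, fantasy f1, f1fantasy"

# Lookbehind: "fantasy" only where the three characters before it are "f1 ".
behind = [m.start() for m in re.finditer(r"(?<=f1 )fantasy", text)]

# Lookahead: "fantasy" only where the three characters after it are " f1".
ahead = [m.start() for m in re.finditer(r"fantasy(?= f1)", text)]

# A variable-width lookbehind such as (?<=f1\s*) is rejected by Python's re.
try:
    re.compile(r"(?<=f1\s*)fantasy")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(behind, ahead, variable_width_ok)
```

Only the start offset of "fantasy" is reported; the "f1 " or " f1" that satisfied the lookaround is not part of the match.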

      • patrickdrd @Alan Kilborn

        @alan-kilborn Thanks, but I would like f1fantasy matched as well.

        • Alan Kilborn @patrickdrd

          @patrickdrd said in regexp help: lookahead and lookbehind with spaces:

          but I would like f1fantasy matched as well

          Hmm, the target moves…

          I think you can figure out how to change my regex to do that, given your first guess at it. :-)

          • patrickdrd @Alan Kilborn

            @alan-kilborn \s* means zero or more whitespace chars, but I can’t figure out where to put it; I tried every position and it doesn’t work.

            I also have a couple more terms to check for, like mls and ucl.

            • patrickdrd @patrickdrd

              OK, got it:

              (?:(f1|ucl|mls)|(fantasy))\s*(?(1)(?2)|(?1))
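patrickdrd's final pattern can be checked the same way. This Python sketch again expands (?2) and (?1) into the patterns of groups 2 and 1. Because a subroutine call re-runs the pattern rather than repeating captured text, the expanded else-branch is (?:f1|ucl|mls), i.e. any of the three terms. The \s* separator also allows zero characters, which is what makes "f1fantasy" match.

```python
import re

# Python translation of (?:(f1|ucl|mls)|(fantasy))\s*(?(1)(?2)|(?1)).
# re has no subroutine calls, so (?2) -> fantasy and (?1) -> (?:f1|ucl|mls).
PAIR = re.compile(r"(?:(f1|ucl|mls)|(fantasy))\s*(?(1)fantasy|(?:f1|ucl|mls))")

samples = ["f1fantasy", "ucl fantasy", "mls    fantasy", "fantasyf1",
           "fantasy ucl", "fantasy fantasy", "f1 mls"]

# fullmatch: the whole sample must be one term, optional whitespace, partner.
results = {s: bool(PAIR.fullmatch(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

Same-side pairs such as "fantasy fantasy" or "f1 mls" are rejected, since the conditional always demands the opposite kind of term.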
              
              • guy038

                Hello, @patrickdrd, @alan-kilborn and All,

                Actually, two solutions are possible:

                Regex A : SEARCH (?-i:(f1|ucl|mls)|(fantasy))\s*(?(1)(?2)|(?1))

                Regex B : SEARCH (?-i:(f1|ucl|mls)|(fantasy))\x20*(?(1)(?2)|(?1))

                They do not match exactly the same occurrences!


                • Paste the text below into a new tab:
                f1fantasy
                f1 fantasy
                f1      fantasy
                uclfantasy
                ucl fantasy
                ucl            fantasy
                mlsfantasy
                mls fantasy
                mls    fantasy
                
                fantasyf1
                fantasy f1
                fantasy                f1
                fantasyucl
                fantasy ucl
                fantasy          ucl
                fantasymls
                fantasy mls
                fantasy                       mls
                ============================================================
                Match with \x{000D}\x{000A} (CRLF) in between :
                f1
                fantasy
                
                Match with \x{000A} (LF) in between :
                f1
                fantasy
                
                Match with \x{000D} (CR) in between :
                f1
                fantasy  
                
                Match with \x{0009} (TABULATION) in between :
                fantasy	f1
                
                Match with \x{000B} (VERTICAL TABULATION) in between :
                f1fantasy
                
                Match with \x{00A0} (NO-BREAK SPACE) in between :
                fantasy f1
                
                Match with a multi-line MIX of these SPECIFIC chars in between :
                fantasy			
                		            f1
                
                • Open the Mark dialog ( Ctrl + M )

                • Tick only the Purge for each search and Wrap around options

                • Click on the Mark All button

                As you can see, regex A, in addition to matching ordinary space chars, also matches many other “SPACE”-like characters!

                => Below the line of equal signs, several matches occur, some of them spanning multiple lines!

                Regex B, being more rigorous, matches only zero or more \x20 chars ;-))

                => Nothing below the line of equal signs is matched at all!
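The difference between regex A and regex B can be reproduced outside Notepad++ with a short Python sketch. The (?-i:...) modifier is dropped because Python's re is case-sensitive by default, (?2)/(?1) are expanded as before, and the sample text is a hypothetical miniature of guy038's test data.

```python
import re

def pair_regex(sep):
    """Python translation of guy038's regex with a pluggable separator.
    (?2) and (?1) are expanded to the group patterns they call."""
    return re.compile(r"(?:(f1|ucl|mls)|(fantasy))" + sep + r"(?(1)fantasy|(?:f1|ucl|mls))")

# One pair separated by a space, one by CRLF, one by a tab.
text = "f1 fantasy\nf1\r\nfantasy\nfantasy\tf1\n"

regex_a = [m.group(0) for m in pair_regex(r"\s*").finditer(text)]    # any whitespace
regex_b = [m.group(0) for m in pair_regex(r"\x20*").finditer(text)]  # spaces only

print(regex_a)
print(regex_b)
```

With \s* all three pairs are found, including the one split across CRLF; with \x20* only the space-separated pair survives, mirroring what the Mark All test shows.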

                If your words may appear in any case, simply replace the -i inline modifier, within the non-capturing group, with the i modifier!
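In guy038's regexes the case modifier is scoped to the non-capturing group. In the Python translation used in these sketches, (?2)/(?1) have been expanded into literals that sit outside that group, so a scoped (?i:...) would not reach them; this sketch therefore sets re.IGNORECASE on the whole pattern instead.

```python
import re

PATTERN = r"(?:(f1|ucl|mls)|(fantasy))\s*(?(1)fantasy|(?:f1|ucl|mls))"

case_sensitive = re.compile(PATTERN)                     # like the -i modifier
case_insensitive = re.compile(PATTERN, re.IGNORECASE)    # like the i modifier

for s in ["F1 Fantasy", "UCL fantasy", "f1 fantasy"]:
    print(s, bool(case_sensitive.fullmatch(s)), bool(case_insensitive.fullmatch(s)))
```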

                Best regards,

                guy038

                • patrickdrd @guy038

                  @guy038 I prefer the first approach,
                  because I want it to match tabs as well;
                  also, \n shouldn’t be a problem because I’m matching single-line items.

                  • Neil Schipper @patrickdrd

                    @patrickdrd and @guy038,

                    In much the same way that Goldilocks found Papa Bear’s bed too hard, and Mama Bear’s bed too soft, but Baby Bear’s bed just right …

                    in the present context, surely \s is too promiscuous, and \x20 is too brittle, while \h is just right!

                    • guy038

                      Hi, @patrickdrd, @alan-kilborn, @neil-schipper and All,

                      @neil-schipper is perfectly right! This third solution, (?-i:(f1|ucl|mls)|(fantasy))\h*(?(1)(?2)|(?1)), is just what you need, as \h* matches any run, possibly empty, of Space, Tabulation and/or No-Break Space chars!
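Python's re has no \h escape, so this sketch assumes the character class [ \t\u00a0] as a stand-in, covering exactly the three characters guy038 lists (space, tab, no-break space). Boost's \h may accept further horizontal-space characters; the class here is an approximation for illustration.

```python
import re

# Assumed stand-in for Boost's \h: space, tab, no-break space.
H = r"[ \t\u00a0]"
PAIR_H = re.compile(r"(?:(f1|ucl|mls)|(fantasy))" + H + r"*(?(1)fantasy|(?:f1|ucl|mls))")

# No-break space pair, tab pair, then a pair split across a line break.
text = "fantasy\u00a0f1\nucl\tfantasy\nf1\nfantasy\n"

found = [m.group(0) for m in PAIR_H.finditer(text)]
print(found)
```

The first two pairs are matched, while the f1/fantasy pair split by a line feed is not, which is the middle ground between \s* and \x20* described above.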

                      BR

                      guy038

                      • patrickdrd @Neil Schipper

                        @neil-schipper I still tend to think that \s is slightly better because why not match “accidental” carriage returns/ line feeds?

                        Which is more likely: an “accidental” carriage return/line feed, or a “next line mismatch”?
                        I don’t know if you see what I mean, but I think the first is more likely.
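The “next line mismatch” can be made concrete with a small Python sketch on hypothetical data: line 1 holds a stray f1 with no partner. With \s* the pattern is free to cross the line break, so the stray term consumes the "fantasy" that belongs to line 2.

```python
import re

HEAD = r"(?:(f1|ucl|mls)|(fantasy))"
TAIL = r"(?(1)fantasy|(?:f1|ucl|mls))"   # (?(1)(?2)|(?1)) with calls expanded

# Hypothetical data: line 1 is malformed (a lone "f1").
text = "f1\nfantasy f1\nucl fantasy"

# \s* crosses the newline: the stray f1 pairs with line 2's "fantasy",
# and the genuine "fantasy f1" pair on line 2 is never reported.
with_s = [m.group(0) for m in re.finditer(HEAD + r"\s*" + TAIL, text)]

# A horizontal-only separator lets the stray f1 simply fail to match.
with_h = [m.group(0) for m in re.finditer(HEAD + r"[ \t]*" + TAIL, text)]

print(with_s)
print(with_h)
```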

                        • Neil Schipper @patrickdrd

                          @patrickdrd said in regexp help: lookahead and lookbehind with spaces:

                          why not match “accidental” carriage returns/ line feeds?

                          I don’t know enough about your data or the intended purpose of the pair matching to say either way, but I will say that, since you are gathering up either ab or ba pairs, then, a small error can make every subsequent pair a different pair from what would have matched without the error.

                          It’s up to you to think through whether this harms what you’re trying to accomplish and, if it does, to devise a strategy that can detect a malformed pair, skip it, and then resync to pick up all subsequent pairs with the desired grouping.
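One possible resync strategy, sketched in Python under the assumption (stated by patrickdrd) that each pair sits on its own line: match each line independently, record lines that do not form a pair, and carry on with the next line. The function name scan_lines is hypothetical, chosen for illustration.

```python
import re

# Horizontal separator only, so a pair can never straddle two lines.
PAIR = re.compile(r"(?:(f1|ucl|mls)|(fantasy))[ \t]*(?(1)fantasy|(?:f1|ucl|mls))")

def scan_lines(text):
    """Hypothetical helper: return (pairs, malformed_line_numbers),
    assuming one pair per line."""
    pairs, malformed = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        m = PAIR.fullmatch(line.strip())
        if m:
            pairs.append((lineno, m.group(0)))
        else:
            # Skip the bad line; matching restarts on the next line,
            # so one error cannot shift every later pair.
            malformed.append(lineno)
    return pairs, malformed

pairs, malformed = scan_lines("f1\nfantasy f1\nucl fantasy")
print(pairs, malformed)
```

Here line 1 is reported as malformed, and the pairs on lines 2 and 3 keep their intended grouping.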

                          The Community of users of the Notepad++ text editor.