    Regex: Select only the first instance of search results / first match

    54 Posts 7 Posters 56.1k Views 2 Watching
    • Alan KilbornA Offline
      Alan Kilborn @Alan Kilborn
      last edited by

      @PeterJones

      BTW, I solved it on my own as well, for my own “pleasure”.
      But I wasn’t going to post it, punishing, I guess, the 32.6K readers that were on the edge of their seat waiting for it – at the expense of those that I didn’t really want to have it. But since you let the cat out of the bag, perhaps it is instructive to see a different approach:

      (?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)

      It seems to work; maybe there are holes.

      • Terry RT Offline
        Terry R @Alan Kilborn
        last edited by

        @Alan-Kilborn said in Regex: Select only the first instance of search results / first match:

        BTW, I solved it on my own as well, for my own “pleasure”.

        I also had one, based on @PeterJones’s solution for the “first” instance, with JUST 1 character removed. Maybe mine also has holes.
        (?s)\A.*<tr>\s*\K.*?(\s*</tr>)
        So it turns a non-greedy regex into a greedy one. It first grabs everything, then backtracks until the <tr>…</tr> sequence matches. Even the \A could be removed IF the cursor were at the very start of the open file.
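
To make the greedy-versus-lazy difference concrete, here is a minimal Python sketch. It only approximates what Notepad++ does: Python's built-in `re` module has no `\K`, so the text to keep is captured in group 1 instead, and the three-row sample is made up for illustration.

```python
import re

# Hypothetical sample: three rows on separate lines.
text = "<tr>one</tr>\n<tr>two</tr>\n<tr>three</tr>\n"

# Lazy .*? stops at the FIRST <tr>.
# Greedy .* grabs everything, then backtracks to the LAST <tr>.
# Group 1 stands in for "whatever follows \K" in the Notepad++ regexes.
first = re.search(r"(?s)\A.*?<tr>\s*(.*?)\s*</tr>", text)
last = re.search(r"(?s)\A.*<tr>\s*(.*?)\s*</tr>", text)

print(first.group(1))  # one
print(last.group(1))   # three
```

The only difference between the two patterns is the single `?` after the first `.*`.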

        Terry

        • PeterJonesP Offline
          PeterJones @Terry R
          last edited by

          @Terry-R and @Alan-Kilborn ,

          Those are so much simpler than mine! Congrats! 🎉👏👍

          Anyway, I am still glad I presented my solution, as it hopefully shows future readers a thought process that can arrive at a working regex, even if it’s not the simplest or most efficient.

          • Alan KilbornA Offline
            Alan Kilborn @PeterJones
            last edited by Alan Kilborn

            @PeterJones said in Regex: Select only the first instance of search results / first match:

            …so much simpler…

            Well, maybe.
            But nothing is going to beat your discussion of your thought process.
            An important factor in a good solution.

            I’ve always thought of the ((?!UNWANTED).)* construct as somewhat “expensive”, though maybe that’s just because it “feels” complicated. It would take a true regex genius like @guy038 to discuss that.

            @Terry-R

            Nice one as well!

            • Alan KilbornA Offline
              Alan Kilborn @Alan Kilborn
              last edited by

              @Terry-R

              I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?

              Peter’s and my regexes only matched what was inside; not sure if you were solving something Vasile wanted or not with that – not going back to read/revisit it! – but I took the liberty of tweaking yours a bit so it matches what ours does:

              (?s)\A.*<tr>\K.+?(?=</tr>)

              and that appears to be the shortest matching regex thus far.

              • Terry RT Offline
                Terry R
                last edited by

                @Alan-Kilborn said in Regex: Select only the first instance of search results / first match:

                I was experimenting with your regex a bit and I noticed that not only did it match the text inside the final <tr></tr> pair, but it also matched the </tr> tag as well?

                As I said, it was based on @PeterJones’s solution for the first instance. From his post:

                FIND = (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
                REPLACE = new contents$1
                MODE = regular expression
                REPLACE ALL
                then I get

                So the replacement text would have been new contents$1, the same as in the first-instance solution. Sorry, I forgot to mention that.
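
A rough Python sketch of what the `$1` in that replacement is for. This is an assumption-laden translation, not the Notepad++ behaviour itself: Python's `re` lacks `\K`, so the kept prefix becomes group 1, and the group that Notepad++ calls `$1` becomes group 2 here. The sample text is invented.

```python
import re

# Hypothetical sample with indented row content.
text = "<tr>\n  old\n</tr>\n<tr>\n  other\n</tr>\n"

# Group 1: everything up to the first <tr> plus following whitespace (the part before \K).
# Group 2: the whitespace and </tr> that the regex consumes, so it must be put back.
pattern = re.compile(r"(?s)(\A.*?<tr>\s*).*?(\s*</tr>)")

result = pattern.sub(
    lambda m: m.group(1) + "new contents" + m.group(2),
    text,
    count=1,
)
print(result)
```

Because the closing tag is matched (not just looked ahead at), dropping group 2 from the replacement would delete the `</tr>`.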

                Terry

                    • Vasile CarausV Offline
                      Vasile Caraus
                      last edited by

                      So, in conclusion, I have collected all the regexes from the last conversation:

                      Select and replace the first instance:

                      SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)(?=$)
                      REPLACE BY: NEW CONTENT $1

                      or

                      SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
                      REPLACE BY: NEW CONTENT $1

                      Select and replace the last instance:

                      SEARCH: (?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)
                      REPLACE BY: \r NEW CONTENT $1 \r

                      or

                      SEARCH: (?s)\A.*<tr>\K.+?(?=</tr>)
                      REPLACE BY: \r NEW CONTENT $1 \r

                      WORKS. Thanks a lot friends.

                      • Alan KilbornA Offline
                        Alan Kilborn
                        last edited by Alan Kilborn

                        This all seems rather “special case”.
                        This <tr> and </tr> junk…

                        To be generic, that is, a roadmap for other interested parties to use, why not specify it like this:


                        Match only the first occurrence in a file of a regular expression RE:

                        (?s)\A.*?\KRE


                        Match the last occurrence of a regular expression RE:

                        (?s)\A.*(RE).*?\K\1


                        Of course, clearly the RE has to be something a bit more specific than (example) .., but these seem to mostly work to achieve the goal.
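
For anyone who wants to try the two generic forms outside Notepad++, here is a minimal Python sketch. The helper names `first_span` and `last_span` are hypothetical, and because Python's `re` has no `\K`, the RE is wrapped in a capture group (which shifts any group numbers used inside RE itself).

```python
import re

def first_span(pattern, text):
    """(start, end) of the first occurrence of pattern, or None."""
    m = re.search(r"(?s)\A.*?(" + pattern + r")", text)
    return m.span(1) if m else None

def last_span(pattern, text):
    """(start, end) of the last occurrence of pattern, or None.

    The greedy .* swallows the whole text, then backtracks until
    the pattern can match.
    """
    m = re.search(r"(?s)\A.*(" + pattern + r")", text)
    return m.span(1) if m else None

text = "a 1 b 22 c 333"
print(first_span(r"\b\d+\b", text))  # (2, 3)
print(last_span(r"\b\d+\b", text))   # (11, 14)
# A too-loose RE shows the caveat: only the final digit is found.
print(last_span(r"\d+", text))       # (13, 14)
```

The last line illustrates why the RE needs to be specific: the greedy prefix gives back as little as possible, so an unanchored `\d+` matches only the final `3`.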

                        • guy038G Offline
                          guy038
                          last edited by guy038

                          Hello, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

                          IMPORTANT : I wrote this post after reading only the posts from the 4 YEARS LATER banner up to @peterjones’s post, below :

                          https://community.notepad-plus-plus.org/post/62964

                          But I am going to add a second post, after reading the most recent solutions ! Sorry for my incomplete work !


                          First, @vasile-caraus, I totally agree with @alan-kilborn’s comment on your attitude ! Not very fair and nice to @Terry-r, who was trying to help you :-((

                          Seemingly, you know quite well, by now, the power of regexes for text manipulation. And if you had seriously studied some regex tutorials, you would not have proposed the regex (?s)\z.*?<tr>\s*\K.*?(\s*</tr>), which is complete nonsense !

                          For instance, from the two pages of the Regular-expressions.info site, below, you would have understood, at once, that the \z syntax always comes at the very end of a regex or, possibly, before an alternation symbol | !!

                          https://www.regular-expressions.info/anchors.html

                          https://www.regular-expressions.info/refanchors.html


                          Now, I slightly simplified @peterjones’s search regex, which searches for the first element <tr> ••••• </tr>, of an HTML page :

                          SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>)

                          With that search, the result depends on the replacement text :

                          • If the replacement is Here is the NEW text, you’ll get the simple text
                           <tr>Here is the NEW text</tr>
                          
                          • If the replacement is \r\nHere is the NEW text\r\n, the output text will be :
                          <tr>
                          Here is the NEW text
                          </tr>
                          
                          In both cases :

                          • Tick the Wrap around option

                          • Click on the Replace All button, exclusively !


                          Now, to search for the last element <tr> ••••• </tr>, of an HTML page, use the following regex :

                          SEARCH (?s-i)<tr>\K((?!<tr>).)*?(?=</tr>((?!<tr>).)*?\z)

                          Note that I use exactly the scheme proposed by @Peterjones :

                          
                          - find from <tr> to </tr> ( NOT included )          =>    (?s-i)<tr>\K •••••••••• (?=</tr> •••••••••• )
                                                                                                     ^                 ^    ^
                                                                                                     |                 |    |
                          - WITHOUT any contained <tr>                        =>    ((?!<tr>).)*? ---•                 |    |
                                                                                                                       |    |
                          - FOLLOWED by anything that’s NOT a <tr>            =>    ((?!<tr>).)*? ---------------------•    |
                                                                                                                            |
                          - until the VERY END of the file                    =>    \z -------------------------------------•
                          

                          To All :

                          You could ask me : why the regex to search for the last <tr> ••••• </tr> block is more complicated than the one to search for the first one ?

                          This is because of the general direction used by the regex engine : from LEFT to RIGHT !

                          • Indeed, when we search for (?s-i)\A.*?<tr>, part of the first regex, the range of any chars (?s).*, made lazy by the ? quantifier, extends only up to the first occurrence of the string <tr>, which means that, necessarily, this range cannot contain any <tr> !

                          • Similarly, the regex (?s).*?(?=</tr>) would search for any range of any char, possibly empty, till the nearest string </tr>, meaning, implicitly, that this range of chars cannot contain a </tr> string

                          • Whereas, when searching the last <tr> ••••• </tr> block, as our reference is the anchor \z ( very end of current file ), we must build up the regex, using a kind of back-propagation method :

                            • Starting from the very end of file

                            • Moving back, through characters without any <tr> string

                            • Till a </tr> string

                            • Moving back, again, through characters without any <tr> string

                            • Till a <tr> string

                          Of course, I assume that any <tr> correctly ends with </tr> !
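
The backward-reasoning steps above can be tried in Python with a minimal sketch. Two substitutions are assumptions of mine, not part of the Notepad++ regex: `\Z` is used as the end-of-text anchor in place of `\z`, and a capture group replaces `\K`. The one-line sample is invented.

```python
import re

# Hypothetical sample: three well-formed rows followed by trailing markup.
html = "<tr>a</tr><tr>b</tr><tr>c</tr></table>"

last_block = re.compile(
    r"(?s)<tr>"
    r"((?:(?!<tr>).)*?)"      # row content WITHOUT any contained <tr>
    r"(?=</tr>"               # up to, but not including, </tr>
    r"(?:(?!<tr>).)*?\Z)"     # followed by no further <tr> until the very end
)

m = last_block.search(html)
print(m.group(1))   # c
print(m.start(1))   # 24
```

The first two `<tr>` candidates are rejected because another `<tr>` sits between their `</tr>` and the end of the text.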

                          Test these two regexes against this sample, derived from Peter’s, which contains 4 <tr> •••• </tr> blocks :

                          <html><body>
                          <table>
                          <tr>
                          get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
                          </tr>
                          <tr>
                          keep stuff including <embedded/> <tags/>
                          </tr>
                          <tr>
                          keep stuff including <embedded/> <tags/>
                          </tr>
                          <tr>
                          get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
                          </tr>
                          </table>
                          </body>
                          </html>
                          

                          The first regex, with the \A syntax, should replace the first block only, and the last regex, with the \z syntax, should replace the fourth and last <tr> block
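
As a rough cross-check outside Notepad++, the same expectation can be sketched in Python on a shortened copy of the sample. The translation is an assumption: group 1 stands in for `\K`, `\Z` stands in for `\z`, and `count=1` plays the role of a single replacement.

```python
import re

# Shortened copy of the sample (embedded tags omitted for brevity).
sample = r"""<table>
<tr>
get rid of stuff, in case of \A anchor
</tr>
<tr>
keep stuff
</tr>
<tr>
keep stuff
</tr>
<tr>
get rid of stuff, in case of \z anchor
</tr>
</table>
"""

NEW = "\nHere is the NEW text\n"

# First block: lazy prefix after \A; the lookahead leaves </tr> in place.
first_re = re.compile(r"(?s)(\A.*?<tr>).*?(?=</tr>)")
# Last block: no further <tr> may appear before the end of the text.
last_re = re.compile(r"(?s)(<tr>)(?:(?!<tr>).)*?(?=</tr>(?:(?!<tr>).)*?\Z)")

after_first = first_re.sub(lambda m: m.group(1) + NEW, sample, count=1)
after_both = last_re.sub(lambda m: m.group(1) + NEW, after_first, count=1)
print(after_both)
```

Only the first and fourth blocks change; the two "keep stuff" blocks are left untouched.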

                          Best Regards,

                          guy038

                          P.S. :

                          @vasile-caraus, note that I’m willing, and probably, all people involved in that discussion, to help you if you have difficulty understanding a specific part of a regex tutorial, that you have decided to study. A different perspective will certainly be very useful to you … and others ;-))

                          • guy038G Offline
                            guy038
                            last edited by

                            Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

                            My God !! Of course, the @terry-r’s regex is just magic and so simple ! Congratulations, Terry ;-)) How could we not think of it ??

                            If I adapt Terry’s concept to the regexes of my previous post, everything becomes crystal clear :

                            SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>) to search ( and replace ) the first <tr> ••••• </tr> block

                            SEARCH (?s-i)\A.*<tr>\K.*?(?=</tr>) to search ( and replace ) the last <tr> ••••• </tr> block

                            As usual, tick the Regular expression and Wrap around options and click on the Replace All button, exclusively


                            @vasile-caraus, this demonstrates, in a masterful way, that things can be skillfully solved by other people than me and moreover… by @terry-r !!


                            Now, @alan-kilborn you said :

                            Match the last occurrence of a regular expression RE:

                            (?s)\A.*(RE).*?\K\1

                            But, unless I’m mistaken, doesn’t this regex, below, do the same search ?

                            (?s)\A.*\KRE
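
A small Python sketch for comparing the two forms on a literal search string. It is only an approximation: capture groups replace `\K`, the helper names are hypothetical, and the sample strings are invented. In this sketch the two agree when the text occurs at least twice; with a single occurrence the back-reference form finds nothing, since `\1` needs a second copy to match.

```python
import re

def last_span_backref(literal, text):
    # (?s)\A.*(RE).*?\K\1  ->  group 2 is the text after \K
    m = re.search(r"(?s)\A.*(" + re.escape(literal) + r").*?(\1)", text)
    return m.span(2) if m else None

def last_span_simple(literal, text):
    # (?s)\A.*\KRE  ->  group 1 is the text after \K
    m = re.search(r"(?s)\A.*(" + re.escape(literal) + r")", text)
    return m.span(1) if m else None

several = "cat dog cat bird cat"
print(last_span_backref("cat", several))  # (17, 20)
print(last_span_simple("cat", several))   # (17, 20)

single = "dog cat"
print(last_span_backref("cat", single))   # None
print(last_span_simple("cat", single))    # (4, 7)
```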

                            Best regards,

                            guy038

                            • Terry RT Offline
                              Terry R
                              last edited by Terry R

                              @guy038 said in Regex: Select only the first instance of search results / first match:

                              Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
                              My God !! Of course, the @terry-r’s regex is just magic and so simple !

                              I feel like I’m being rewarded for something I “stole” (borrowed) now. ;-)) All I did was point out the marvellous creation of @PeterJones and how the absence of a single character turns one thing into another.

                              But hey, I’m happy that collectively we can show there are many answers, all working in various ways.

                              Terry

                              • Alan KilbornA Offline
                                Alan Kilborn @guy038
                                last edited by

                                @guy038 said in Regex: Select only the first instance of search results / first match:

                                But, unless I’m mistaken, doesn’t this regex, below, do the same search ?
                                (?s)\A.*\KRE

                                Yes, indeed.
                                That’s what I get for dabbling in the area of another master! :-)

                                • Vasile CarausV Offline
                                  Vasile Caraus @guy038
                                  last edited by

                                  @guy038 thanks a lot !

                                  • dr ramaanandD Offline
                                    dr ramaanand @Vasile Caraus
                                    last edited by dr ramaanand

                                    @Vasile-Caraus The regular expression (?s)\A.*?\Kstring(?:.*?)?> helps find the very first occurrence of a string. If you want to find the first occurrence of a tag, say TAG_2, AFTER the first occurrence of another tag, say TAG_1, my generic regex becomes :

                                    (?s-i)\A.*?<TAG_1(?: .*?)?>.*?\K<TAG_2(?: .*?)?> as per @guy038
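
A minimal Python sketch of the same idea, with some assumptions: the helper name `first_tag_after` is hypothetical, a capture group replaces `\K`, and the `-i` flag is dropped because Python's `re` is case-sensitive by default. The HTML snippet is invented.

```python
import re

def first_tag_after(tag_1, tag_2, text):
    """Return the first opening tag_2 that follows the first opening tag_1, or None."""
    pattern = (
        r"(?s)\A.*?<" + re.escape(tag_1) + r"(?: .*?)?>"   # first <TAG_1 ...>
        r".*?(<" + re.escape(tag_2) + r"(?: .*?)?>)"       # then the nearest <TAG_2 ...>
    )
    m = re.search(pattern, text)
    return m.group(1) if m else None

# Hypothetical input: one span before the div, one inside it, one after it.
html = '<span>0</span><div id="a"><span class="x">1</span></div><span>2</span>'
print(first_tag_after("div", "span", html))  # <span class="x">
```

The `(?: .*?)?>` part accepts either a bare tag or one with attributes, so the span that precedes the div is skipped and the one inside it is returned.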

                                    • dr ramaanandD Offline
                                      dr ramaanand @dr ramaanand
                                      last edited by dr ramaanand

                                      On testing the above, I observed that both the above regular expressions work only for tags or strings that begin with a < and end with a > - so if you are searching for a string between inverted commas, to find the first string, you should use the regular expression (?s)\A.*?\K"string(?:.*?)?"


                                        The Community of users of the Notepad++ text editor.
                                        Powered by NodeBB | Contributors