Regex: Select only the first instance of search results / first match

54 Posts 7 Posters 42.8k Views
    Vasile Caraus
    last edited by Feb 17, 2021, 11:57 AM

    So, in conclusion, I have collected all the regexes from the last conversation:

    Select and replace the first instance:

    SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)(?=$)
    REPLACE BY: NEW CONTENT $1

    or

    SEARCH: (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
    REPLACE BY: NEW CONTENT $1

    Select and replace the last instance:

    SEARCH: (?s)<tr>.*</tr>.*?<tr>\K.+?(?=</tr>.*?\z)
    REPLACE BY: \r NEW CONTENT \r

    or

    SEARCH: (?s)\A.*<tr>\K.+?(?=</tr>)
    REPLACE BY: \r NEW CONTENT \r

    It works. Thanks a lot, friends.

      Alan Kilborn
      last edited by Alan Kilborn Feb 18, 2021, 1:49 AM Feb 18, 2021, 1:46 AM

      This all seems rather “special case”.
      This <tr> and </tr> junk…

      To be generic, that is, a roadmap for other interested parties to use, why not specify it like this:


      Match only the first occurrence in a file of a regular expression RE:

      (?s)\A.*?\KRE


      Match the last occurrence of a regular expression RE:

      (?s)\A.*(RE).*?\K\1


      Of course, clearly the RE has to be something a bit more specific than (example) .., but these seem to mostly work to achieve the goal.
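For readers who want to try these two generic forms outside Notepad++, here is a minimal Python sketch. It rests on assumptions: Python's `re` module has no `\K`, so a capture group keeps the text that `\K` would discard, and the helper names `replace_first` and `replace_last` are made up for illustration. It also assumes RE contains no capture groups of its own.

```python
import re

def replace_first(re_src, new, text):
    # Stand-in for (?s)\A.*?\KRE : group 1 holds what \K would throw away.
    # No count is needed: \A can only match at the very start of the text,
    # so a "replace all" still replaces exactly one occurrence.
    pattern = r"(?s)\A(.*?)(?:" + re_src + r")"
    return re.sub(pattern, lambda m: m.group(1) + new, text)

def replace_last(re_src, new, text):
    # Stand-in for (?s)\A.*(RE).*?\K\1 : group 2 is RE, \2 is the backreference.
    pattern = r"(?s)\A(.*(" + re_src + r").*?)\2"
    return re.sub(pattern, lambda m: m.group(1) + new, text)

text = "x ab y ab z ab"
print(replace_first("ab", "XY", text))
print(replace_last("ab", "XY", text))
```

The replacement is passed as a function rather than a template string so that backslashes in the new text are not reinterpreted.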

        guy038
        last edited by guy038 Feb 18, 2021, 11:24 PM Feb 18, 2021, 10:28 PM

        Hello, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

        IMPORTANT : I wrote this post, after reading posts from the banner 4 YEARS LATER till the @peterjones’s post, below :

        https://community.notepad-plus-plus.org/post/62964

        But I am going to add a second post, after reading the most recent solutions ! Sorry for my incomplete work !


        First, @vasile-caraus, I totally agree with @alan-kilborn’s comment on your attitude ! It was not very fair or nice to @Terry-r, who was trying to help you :-((

        Seemingly, you know quite well, by now, the power of regexes for text manipulation. And if you had seriously studied some regex tutorials, you would not have spoken about that regex (?s)\z.*?<tr>\s*\K.*?(\s*</tr>), which is complete nonsense !

        For instance, from the two pages of the Regular-expressions.info site, below, you would have understood, at once, that the \z syntax always comes at the very end of a regex or, possibly, before an alternation symbol | !!

        https://www.regular-expressions.info/anchors.html

        https://www.regular-expressions.info/refanchors.html


        Now, I slightly simplified the @peterjones’s search regex, which searches for the first element <tr> ••••• </tr>, of an HTML page :

        SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>)

        In return, if your replacement regex is :

        • The expression Here is the NEW text, you’ll get the simple text
         <tr>Here is the NEW text</tr>
        
        • The expression is \r\nHere is the NEW text\r\n the output text will be :
        <tr>
        Here is the NEW text
        </tr>
        
        • Tick the Wrap around option

        • Click on the Replace All button, exclusively !


        Now, to search for the last element <tr> ••••• </tr>, of an HTML page, use the following regex :

        SEARCH (?s-i)<tr>\K((?!<tr>).)*?(?=</tr>((?!<tr>).)*?\z)

        Note that I use exactly the scheme proposed by @Peterjones :

        
        - find from <tr> to </tr> ( NOT included )          =>    (?s-i)<tr>\K •••••••••• (?=</tr> •••••••••• )
                                                                                   ^                 ^    ^
                                                                                   |                 |    |
        - WITHOUT any contained <tr>                        =>    ((?!<tr>).)*? ---•                 |    |
                                                                                                     |    |
        - FOLLOWED by anything that’s NOT a <tr>            =>    ((?!<tr>).)*? ---------------------•    |
                                                                                                          |
        - until the VERY END of the file                    =>    \z -------------------------------------•
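The building block in this scheme is the "tempered dot" `((?!<tr>).)*?`, which advances one character at a time but refuses to step onto the start of a `<tr>`. A small Python sketch may make that concrete. It is an approximation under stated assumptions: Python's `re` has no `\K` and spells the end-of-text anchor `\Z`, so a look-behind replaces `<tr>\K` and `\Z` replaces `\z`. The helper name `last_tr_body_span` is hypothetical.

```python
import re

# Tempered dot: any character, as long as "<tr>" does not start there.
NO_TR = r"(?:(?!<tr>).)*?"

# Assumed Python equivalents: (?<=<tr>) for <tr>\K, and \Z for \z.
LAST_TR_BODY = re.compile(
    r"(?s)(?<=<tr>)" + NO_TR + r"(?=</tr>" + NO_TR + r"\Z)"
)

def last_tr_body_span(html):
    """Return (start, end) of the text inside the last <tr>...</tr>, or None."""
    m = LAST_TR_BODY.search(html)
    return m.span() if m else None

html = "<tr>a</tr> x <tr>b</tr> y"

# A greedy run of tempered dots stops right before the next "<tr>".
stops_before = re.match(r"(?s)(?:(?!<tr>).)*", html[4:]).group(0)
print(repr(stops_before))

span = last_tr_body_span(html)
print(span, repr(html[span[0]:span[1]]))
```

The first row is rejected because the text after its `</tr>` cannot reach the end of the input without crossing another `<tr>`, which is exactly the condition the look-ahead encodes.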
        

        To All :

        You could ask me : why the regex to search for the last <tr> ••••• </tr> block is more complicated than the one to search for the first one ?

        This is because of the general direction used by the regex engine : from LEFT to RIGHT !

        • Indeed, when we search for (?s-i)\A.*?<tr>, part of the first regex, the range of any char (?s).* with the lazy quantifier ? is then extended to the first occurrence of the string <tr> and means that, necessarily, this range cannot contain any <tr> inside !

        • Similarly, the regex (?s).*?(?=</tr>) would search for any range of any char, possibly empty, till the nearest string </tr>, meaning, implicitly, that this range of chars cannot contain a </tr> string

        • Whereas, when searching the last <tr> ••••• </tr> block, as our reference is the anchor \z ( very end of current file ), we must build up the regex, using a kind of back-propagation method :

          • Starting from the very end of file

          • Moving back, through characters without any <tr> string

          • Till a </tr> string

          • Moving back, again, through characters without any <tr> string

          • Till a <tr> string

        Of course, I assume that any <tr> correctly ends with </tr> !

        Test these two regexes against this sample, derived from Peter’s one, which contains 4 blocks <tr> •••• </tr> :

        <html><body>
        <table>
        <tr>
        get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
        </tr>
        <tr>
        keep stuff including <embedded/> <tags/>
        </tr>
        <tr>
        keep stuff including <embedded/> <tags/>
        </tr>
        <tr>
        get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
        </tr>
        </table>
        </body>
        </html>
        

        The first regex, with the \A syntax, should replace only the first block, and the last regex, with the \z syntax, should replace the fourth and last <tr> block
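The expected outcome can also be checked outside Notepad++ with a short Python sketch. This is not the Boost engine that Notepad++ uses, so the patterns are adapted under these assumptions: a capture group or a look-behind replaces `\K`, `\Z` replaces `\z`, and the `-i` flag is dropped because Python is case-sensitive by default. The helper `row_bodies` and the replacement text `NEW` are illustrative only.

```python
import re

SAMPLE = r"""<html><body>
<table>
<tr>
get rid of stuff, in case of \A anchor, including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
<tr>
get rid of stuff, in case of \z anchor, including <embedded/> <tags/>
</tr>
</table>
</body>
</html>
"""

# First block: group 1 keeps everything up to and including the first <tr>.
FIRST = re.compile(r"(?s)\A(.*?<tr>).*?(?=</tr>)")
# Last block: no <tr> inside the body, and none between </tr> and the end.
LAST = re.compile(r"(?s)(?<=<tr>)(?:(?!<tr>).)*?(?=</tr>(?:(?!<tr>).)*?\Z)")

def row_bodies(html):
    """List the trimmed text of every <tr>...</tr> block, in order."""
    return [body.strip() for body in re.findall(r"(?s)<tr>(.*?)</tr>", html)]

after_first = FIRST.sub(lambda m: m.group(1) + "\nNEW\n", SAMPLE, count=1)
after_last = LAST.sub(lambda m: "\nNEW\n", SAMPLE, count=1)

print(row_bodies(after_first))
print(row_bodies(after_last))
```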

        Best Regards,

        guy038

        P.S. :

        @vasile-caraus, note that I, and probably everyone involved in this discussion, am willing to help you if you have difficulty understanding a specific part of a regex tutorial that you have decided to study. A different perspective will certainly be very useful to you … and others ;-))

          guy038
          last edited by Feb 18, 2021, 11:19 PM

          Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,

          My God !! Of course, the @terry-r’s regex is just magic and so simple ! Congratulations, Terry ;-)) How could we not think of it ??

          If I adapt Terry’s concept to the regexes of my previous post, everything becomes crystal clear :

          SEARCH (?s-i)\A.*?<tr>\K.*?(?=</tr>) to search ( and replace ) the first <tr> ••••• </tr> block

          SEARCH (?s-i)\A.*<tr>\K.*?(?=</tr>) to search ( and replace ) the last <tr> ••••• </tr> block

          As usual, tick the Regular expression and Wrap around options and click on the Replace All button, exclusively
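The only difference between these two searches is the `?` after the leading `.*`. A minimal Python sketch of that one-character switch follows, assuming a capture group as a stand-in for `\K` (which Python's `re` lacks). The function name `tr_body` is hypothetical.

```python
import re

def tr_body(html, last=False):
    # Lazy lead (.*?) stops at the first <tr>; greedy lead (.*) runs to the
    # end of the text and backtracks to the last <tr>.
    lead = r".*" if last else r".*?"
    m = re.search(r"(?s)\A" + lead + r"<tr>(.*?)(?=</tr>)", html)
    return m.group(1) if m else None

html = "<tr>one</tr><tr>two</tr><tr>three</tr>"
print(tr_body(html))
print(tr_body(html, last=True))
```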


          @vasile-caraus, this demonstrates, in a masterful way, that things can be skillfully solved by other people than me and moreover… by @terry-r !!


          Now, @alan-kilborn you said :

          Match the last occurrence of a regular expression RE:

          (?s)\A.*(RE).*?\K\1

          But, unless I’m mistaken, doesn’t this regex, below, do the same search ?

          (?s)\A.*\KRE

          Best regards,

          guy038

            Terry R
            last edited by Terry R Feb 18, 2021, 11:24 PM Feb 18, 2021, 11:24 PM

            @guy038 said in Regex: Select only the first instance of search results / first match:

            Hi, @vasile-caraus, @Terry-R, @alan-kilborn, @peterjones and All,
            My God !! Of course, the @terry-r’s regex is just magic and so simple !

            I feel like I’m being rewarded for something I stole borrowed now. ;-)) All I did was point out the marvellous creation of @PeterJones and how by the absence of a single character it turns one thing into another.

            But hey, I’m happy that collectively we can show there are many answers, all work in various ways.

            Terry

              Alan Kilborn @guy038
              last edited by Feb 19, 2021, 12:07 AM

              @guy038 said in Regex: Select only the first instance of search results / first match:

              But, unless I’m mistaken, doesn’t this regex, below, do the same search ?
              (?s)\A.*\KRE

              Yes, indeed.
              That’s what I get for dabbling in the area of another master! :-)
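A quick way to compare the two "last occurrence" forms is to run both on the same inputs. The Python sketch below does that, assuming capture groups in place of `\K` and an RE with no groups of its own; both function names are invented here. It was only reasoned through for Python's `re`, so treat it as a hint to test in Notepad++ rather than as a statement about the Boost engine.

```python
import re

def last_via_backref(re_src, text):
    # Stand-in for (?s)\A.*(RE).*?\K\1 : group 2 is the part after \K.
    m = re.search(r"(?s)\A.*(" + re_src + r").*?(\1)", text)
    return m.span(2) if m else None

def last_via_greedy(re_src, text):
    # Stand-in for (?s)\A.*\KRE : group 1 is the part after \K.
    m = re.search(r"(?s)\A.*(" + re_src + r")", text)
    return m.span(1) if m else None

literal = "ab ab ab"
digits = "1 2 3"
print(last_via_backref("ab", literal), last_via_greedy("ab", literal))
print(last_via_backref(r"\d+", digits), last_via_greedy(r"\d+", digits))
```

With a literal RE the two agree. When RE can match different texts, the backreference form additionally needs an identical copy of the captured text later in the input, so the shorter form looks like the safer default.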

                Vasile Caraus @guy038
                last edited by Feb 19, 2021, 3:31 PM

                @guy038 thanks a lot !

                  dr ramaanand @Vasile Caraus
                  last edited by dr ramaanand Nov 17, 2024, 10:59 AM Nov 17, 2024, 10:53 AM

                  @Vasile-Caraus The regular expression (?s)\A.*?\Kstring(?:.*?)?> helps find the very first occurrence of a string and if you want to find the first occurrence of a tag, say TAG_2, AFTER the first occurrence of another tag, say TAG_1, my generic regex becomes :

                  (?s-i)\A.*?<TAG_1(?: .*?)?>.*?\K<TAG_2(?: .*?)?> as per @guy038
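As a rough illustration of the TAG_1 / TAG_2 pattern, here is a Python sketch. The assumptions: a capture group replaces `\K`, the `-i` flag is dropped since Python is case-sensitive by default, and the helper name `first_tag2_after_tag1` plus the sample tags are invented for the example.

```python
import re

def first_tag2_after_tag1(tag1, tag2, html):
    # (?: .*?)? allows optional attributes: a space, then anything up to ">".
    pattern = (
        r"(?s)\A.*?<" + re.escape(tag1) + r"(?: .*?)?>"
        r".*?(<" + re.escape(tag2) + r"(?: .*?)?>)"
    )
    m = re.search(pattern, html)
    return m.group(1) if m else None

html = '<tr id="x"><table class="c"><tr id="a"><tr id="b">'
print(first_tag2_after_tag1("table", "tr", html))
```

The `<tr id="x">` that precedes the first `<table ...>` is skipped, because the pattern only starts looking for TAG_2 once TAG_1 has been consumed.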

                    dr ramaanand @dr ramaanand
                    last edited by dr ramaanand Nov 17, 2024, 1:42 PM Nov 17, 2024, 1:35 PM

                    On testing the above, I observed that both the above regular expressions work only for tags or strings that begin with a < and end with a > - so if you are searching for a string between inverted commas, to find the first string, you should use the regular expression (?s)\A.*?\K"string(?:.*?)?"
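The quoted-string variant can be sketched the same way in Python, again with a capture group standing in for `\K`. The function name `first_quoted` and the sample text are assumptions made for the example.

```python
import re

def first_quoted(prefix, text):
    # Finds the first "prefix..." up to the nearest closing double quote.
    pattern = r'(?s)\A.*?("' + re.escape(prefix) + r'(?:.*?)?")'
    m = re.search(pattern, text)
    return m.group(1) if m else None

text = 'a = "other"; b = "string one"; c = "string two"'
print(first_quoted("string", text))
```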

                      The Community of users of the Notepad++ text editor.