    Regex: Select only the first instance of search results / first match

    54 Posts 7 Posters 42.4k Views
    • guy038

      Hello Vasile,

      As you know, regexes force us, and help us, to keep a rigorous attitude, just as programming does! So I slightly changed your example text, in order to see exactly what you need.

      So, let’s suppose the example text of 20 lines below:

      Line 01
      Line 02
      Line 03 Text_1
      Line 04
      Line_05
      Text_1
      Line 07
      Line 08
      Line 09 Text_1
      Line_10
      Line_11
      Text_2 Line 12
      Line_13
      Line_14
      Text_2
      Line_16
      Line_17
      Text_2 Line_18
      Line_19
      Line_20
      

      Would you like to delete, in one go:

      • All lines up to the first occurrence of Text_1 (so, lines 01, 02 and 03) AND all lines from the first occurrence of Text_2 (so, lines 12 to 20) = case A

      • All lines up to the first occurrence of Text_1 (so, lines 01, 02 and 03) AND all lines from the last occurrence of Text_2 (so, lines 18, 19 and 20) = case B

      • All lines up to the last occurrence of Text_1 (so, lines 01 to 09) AND all lines from the first occurrence of Text_2 (so, lines 12 to 20) = case C

      • All lines up to the last occurrence of Text_1 (so, lines 01 to 09) AND all lines from the last occurrence of Text_2 (so, lines 18, 19 and 20) = case D

      Just tell me which case (A, B, C or D) we have to find the regex for.

      Keep in mind that I implicitly suppose that:

      • No string Text_1 occurs after the first occurrence of string Text_2

      • The strings Text_1 and/or Text_2 may appear alone on a line

      See you later,

      Best regards

      guy038

      P.S. :

      Anyway, Vasile, I updated my reply one hour later!

      Here are the solutions to the four cases, given the exact example text above.

      We just have to make each star quantifier, in turn, lazy or greedy. So:

      • Case A : SEARCH = (?s).*?Text_1\R(.*?)Text_2.* , which keeps the lines 04 to 11, only

      • Case B : SEARCH = (?s).*?Text_1\R(.*)Text_2.* , which keeps the lines 04 to 17, only

      • Case C : SEARCH = (?s).*Text_1\R(.*?)Text_2.* , which keeps the lines 10 to 11, only

      • Case D : SEARCH = (?s).*Text_1\R(.*)Text_2.* , which keeps the lines 10 to 17, only

      NOTE :

      • For these four cases, the replacement regex is, simply, \1 OR $1
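
For readers who want to check the four cases outside Notepad++, here is a minimal Python sketch. It only approximates the Notepad++ engine: Python's re module has no \R, so \n stands in for it (assuming LF line endings), and the names LINES, SAMPLE, CASES and keep are made up for illustration.

```python
import re

# The 20-line example text from the post, assuming LF line endings.
LINES = [
    "Line 01", "Line 02", "Line 03 Text_1", "Line 04", "Line_05",
    "Text_1", "Line 07", "Line 08", "Line 09 Text_1", "Line_10",
    "Line_11", "Text_2 Line 12", "Line_13", "Line_14", "Text_2",
    "Line_16", "Line_17", "Text_2 Line_18", "Line_19", "Line_20",
]
SAMPLE = "\n".join(LINES) + "\n"

# Same regexes as cases A to D, with \R replaced by \n (Python has no \R).
CASES = {
    "A": r"(?s).*?Text_1\n(.*?)Text_2.*",  # lazy, lazy
    "B": r"(?s).*?Text_1\n(.*)Text_2.*",   # lazy, greedy
    "C": r"(?s).*Text_1\n(.*?)Text_2.*",   # greedy, lazy
    "D": r"(?s).*Text_1\n(.*)Text_2.*",    # greedy, greedy
}


def keep(case):
    """Replace the whole match with group 1 and return the surviving lines."""
    return re.sub(CASES[case], r"\1", SAMPLE).splitlines()


for case in "ABCD":
    kept = keep(case)
    print(case, "keeps", kept[0], "to", kept[-1])
```

Only the laziness of the two star quantifiers differs between the four patterns, which is what selects the first or the last occurrence of each boundary.
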
      • Vasile Caraus

        guy, works great. But what if I want to replace the entire line in the first part, (?s).*?Text_1\R(.*?), that is, not only from the word onward, but the entire line that contains that word?

        • Scott Sumner

          guy038 should get paid a consulting rate for his excellent quality and quantity of answers! His perseverance is remarkable!

          • Vasile Caraus

            Guy is very talented and smart. He is the one who helps Notepad++ users grow and develop, step by step.

            Maybe someday somebody will need the answers to my questions, and Guy is the one who makes that possible!

            Evolution starts with questions… and answers!

            By the way, I am not a programmer; in fact, I have almost no connection with this domain. But I am learning the basics, which helps me a lot in other ways. Thank you, Guy038!

            • guy038

              Hello Vasile and Scott,

              Yeah, Scott, you’re right about it ! So, I could drink some more beers, as the weather is quite hot, presently, in Grenoble !!

              No problem, Vasile. So, I’m starting with the original text below :

              Line 01
              Line 02
              Line 03 Text_1 Line 03
              Line 04
              Line_05
              Line 06 Text_1 Line 06
              Line 07
              Line 08
              Line 09 Text_1 Line 09
              Line 10
              Line 11
              Line 12 Text_2 Line 12
              Line 13
              Line 14
              Line 15 Text_2 Line 15
              Line 16
              Line 17
              Line 18 Text_2 Line 18
              Line 19
              Line 20
              

              As you can see, this time, the strings Text_1 and Text_2, in lines 03, 06, 09, 12, 15 and 18 are, all, embedded in the template Line ##......Line ##


              I keep the same principle, using lazy and greedy star quantifiers (*). That leads to the four regexes below:

              • Case A : SEARCH = (?s).*?Text_1(?-s).*\R((?:.*\R)*?).*Text_2(?s).* , which keeps the lines 04 to 11

              • Case B : SEARCH = (?s).*?Text_1(?-s).*\R((?:.*\R)*).*Text_2(?s).* , which keeps the lines 04 to 17

              • Case C : SEARCH = (?s).*Text_1(?-s).*\R((?:.*\R)*?).*Text_2(?s).* , which keeps the lines 10 to 11

              • Case D : SEARCH = (?s).*Text_1(?-s).*\R((?:.*\R)*).*Text_2(?s).* , which keeps the lines 10 to 17

              Remark :

              • Remember that an in-line modifier stays in effect until an opposite modifier is met in the regex, or until the end of the regex is reached!

              • The replacement regex has not changed : \1 or $1
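
The same four cases can be tried in Python with a small sketch, under some assumptions: Python's re module does not accept (?-s) or (?s) in the middle of a pattern, so the scoped form (?s:...) is swapped in, and \n replaces \R (LF line endings assumed). The names LINES, SAMPLE, build and keep are hypothetical.

```python
import re

# The second 20-line example text, assuming LF line endings.
LINES = [
    "Line 01", "Line 02", "Line 03 Text_1 Line 03", "Line 04", "Line_05",
    "Line 06 Text_1 Line 06", "Line 07", "Line 08", "Line 09 Text_1 Line 09",
    "Line 10", "Line 11", "Line 12 Text_2 Line 12", "Line 13", "Line 14",
    "Line 15 Text_2 Line 15", "Line 16", "Line 17", "Line 18 Text_2 Line 18",
    "Line 19", "Line 20",
]
SAMPLE = "\n".join(LINES) + "\n"


def build(head_lazy, middle_lazy):
    """Assemble one of the four regexes; only the two star quantifiers vary."""
    head = r"(?s:.*?)" if head_lazy else r"(?s:.*)"        # dot matches newlines here
    middle = r"((?:.*\n)*?)" if middle_lazy else r"((?:.*\n)*)"  # whole lines only
    # Outside the (?s:...) groups the dot stops at line ends, like (?-s).
    return head + r"Text_1.*\n" + middle + r".*Text_2(?s:.*)"


def keep(head_lazy, middle_lazy):
    """Replace the whole match with group 1 and return the surviving lines."""
    return re.sub(build(head_lazy, middle_lazy), r"\1", SAMPLE).splitlines()


print("A", keep(True, True))
print("B", keep(True, False))
print("C", keep(False, True))
print("D", keep(False, False))
```
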

              Cheers

              guy038

              P.S. :

              If some parts of the regexes seem too difficult, just ask me for further information! It’s simply a question of mental gymnastics, which anyone can learn!

              Also, try to visualize the position of the regex engine while it executes the regex, especially when look-arounds (look-behind or look-ahead) are used. Indeed, in that case, the location of the regex engine does NOT change while evaluating the look-around!

              For instance, take the subject text This is a simple text to visualize the cursor location of the regex engine

              Then, the regex (?-s)(?=.*regex).{4} matches the first four letters of this sentence (the word This). Let’s split the process:

              • The cursor location is just before the first letter T of the text

              • The regex engine checks whether the look-ahead is satisfied from the present cursor location; in other words, whether the string regex occurs further on, on the same line

              • As this condition is true, the regex engine goes on, executing the following regex code .{4}

              • But the working position of the regex engine is STILL before the first letter T of the text!

              • Therefore, the regex engine matches the first four characters of the subject string, that is to say, the word This


              Note that if the word regex had NOT been found in the text, the regex engine would have delivered the message:
              Can't find the text "(?-s)(?=.*regex).{4}" !
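
The zero-width behaviour of the look-ahead can be checked with a short Python sketch. It uses Python's re module rather than the Notepad++ engine; by default the Python dot does not cross line ends, which plays the role of (?-s). The variable names are illustrative only.

```python
import re

subject = "This is a simple text to visualize the cursor location of the regex engine"

# The look-ahead is tested at offset 0 and consumes nothing,
# so .{4} starts matching from that same offset.
m = re.search(r"(?=.*regex).{4}", subject)
print(m.group(0), m.span())

# Without the word "regex" later on the line, the look-ahead fails
# at every position and there is no match at all.
no_match = re.search(r"(?=.*regex).{4}", "This is a simple text")
print(no_match)
```
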

              • Vasile Caraus

                thanks a lot Guy !

                • Vasile Caraus

                  Hello Guy. And what if I want to match (in the last formulas) the first instance of Text_1 and the last instance of Text_2?

                  • guy038

                    Hi Vasile,

                    I first thought that the regex (?s).*?\KText_1|.*\KText_2 would give you the exact matches that you asked for:

                    And If I want to match (in the last formulas) the first instance of Text_1 and the last instance of Text_2?

                    Unfortunately, when using the search functionality only, this regex matches every string Text_1, then the last string Text_2! And I was not able to find the right regex, which would find, in the current file, the first instance of Text_1, then the last instance of Text_2 :-((

                    However, the regex (?s).*?\KText_1.*Text_2 allows us to select, in one go, the whole range between these two specific boundaries, boundaries included!
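
A rough Python equivalent of that last regex, as a sketch only: Python's re module has no \K, but re.search already stops at the leftmost Text_1, so the (?s).*?\K prefix is simply dropped. The sample text and variable names are made up for illustration.

```python
import re

text = (
    "aaa Text_1 bbb\n"
    "ccc Text_1 ddd\n"
    "eee Text_2 fff\n"
    "ggg Text_2 hhh\n"
)

# Leftmost Text_1, then a greedy dot-all .* that runs to the LAST Text_2.
m = re.search(r"(?s)Text_1.*Text_2", text)
print(m.span())
print(m.group(0))
```
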

                    Best Regards,

                    guy038

                    • Vasile Caraus

                      Hello again. I have many <tr></tr> tags on an HTML page. I want to select, with a regex, only the first instance of the <tr> tags. I made a regex, but it selects both <tr> blocks. I want only the first one, not the second one with Other Code.

                      FIND: \b<tr>[\s\S]+</tr>\b

                      <tr>
                      <td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
                      </tr>
                      

                      and more

                      <tr>
                      Other Code
                      </tr>
                      
                      • Terry R

                        @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                        FIND: \b<tr>[\s\S]+</tr>\b

                        The simplest change I can see is to put a ? after the + character, as your regex is greedy. I presume it is currently going to the last </tr> in the file.

                        Also, as far as I can see, the \s\S combination means every character, including CR and LF. The whole thing could be rewritten as (?s)\b<tr>.+?</tf>\b.
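
The greedy versus lazy difference can be sketched in Python, with the caveat that Python's re module is not the Notepad++ engine and that the shortened sample below is an assumption standing in for the real page. In Python at least, the \b variant (with the closing tag spelled </tr>) finds nothing on this sample, because there is no word character next to < or >.

```python
import re

# Shortened stand-in for the HTML in the question.
html = (
    "<tr>\n"
    '<td class="right">first row</td>\n'
    "</tr>\n"
    "\n"
    "and more\n"
    "\n"
    "<tr>\n"
    "Other Code\n"
    "</tr>\n"
)

# Greedy + runs to the last </tr>; lazy +? stops at the first one.
greedy = re.search(r"<tr>[\s\S]+</tr>", html).group(0)
lazy = re.search(r"(?s)<tr>.+?</tr>", html).group(0)

# \b needs a word character on one side, and < and > are not word characters.
with_b = re.search(r"(?s)\b<tr>.+?</tr>\b", html)

print(repr(greedy))
print(repr(lazy))
print(with_b)
```
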

                        I’m not on a PC to currently check my answer so apologies if I have it slightly wrong.

                        Terry

                        • Vasile Caraus

                          @Terry-R said in Regex: Select only the first instance of search results / first match:

                          (?s)\b<tr>.+?</tf>\b

                          your (?s)\b<tr>.+?</tr>\b is not working :(

                          I also tried something different, also not working :( (?:^(?ms)(<tr>).*?(</tr>))

                          • Vasile Caraus @Vasile Caraus

                            I also tried another combination, not working: (?-s)(\b(?!^<tr>(.+)</tr>)

                            :((

                            <tr>
                            <td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
                            </tr>
                            

                            code

                            <tr>
                            Other Code
                            </tr>
                            
                            • Terry R @Vasile Caraus

                              @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                              your (?s)\b<tr>.+?</tr>\b is not working :(

                              What does it do? Does it select anything? And sorry for the typo with the tf, which I see you caught.
                              The (?s) is necessary to cross lines. As you had \b, I also included them, although they could both be removed as a test.

                              Terry

                              • Vasile Caraus

                                I believe only @guy038 can find a good answer :)

                                • Alan Kilborn @Vasile Caraus

                                  @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                                  I believe only @guy038 can find a good answer

                                  Nice kick in the teeth for Terry, who was trying to help you…
                                  :-) notwithstanding.

                                  • Vasile Caraus @Terry R

                                    @Terry-R I want to select everything from <tr> to </tr>, but only one instance, the first instance, because I have many tags starting with <tr> and closing with </tr>

                                    • Alan Kilborn @Vasile Caraus

                                      @Vasile-Caraus

                                      Wouldn’t (?s)\A.*?<tr>.*?</tr> work to get only the first?

                                      • PeterJones @Vasile Caraus

                                        @Vasile-Caraus ,

                                        I would normally say that you should have started a new topic, rather than reviving one from 4.5 years ago. But since 4.5 years later, you still haven’t learned the lesson that Guy taught you in 2016, maybe you should be in the same single topic – but it would be better to actually learn from the dozens of different regular expressions that we have provided for you over the last 4.5 years. This forum is not a regular expression help forum – this is a Notepad++ discussion forum, where regex are only a small part of the power of Notepad++.

                                        In your new question, you state,

                                        I have many of <tr></tr> tags on a html page.

                                        That phrasing, in English, implies you have only one HTML page you are doing this to. If that’s really the case, then it’s really simple: take the simple regex you guessed, and hit FIND once and REPLACE once, and you are done. Or you could have just gone to the beginning of the document, and done a single search for <tr> and then manually done the replacement, which is even easier.

                                        But I doubt that’s your real situation. The only reason it would make sense to ask this question is if you were really doing a Find In Files > Replace All, in order to make this single change in multiple HTML files.

                                        As Guy explained in 2016, if you only want to replace one instance per file (the first instance), you can do that by consuming the rest of the file in the single regex. That will work for small files, but if your files are too large, it will not work, because regex has only a certain amount of capture memory.

                                        Fortunately, since then, assuming you have updated Notepad++, the developers have fixed the \A anchor, so the beginning-of-file check works – as @Alan-Kilborn showed in his recent reply.

                                        I will modify his regex (and the one I was going to supply, which consumed everything) to give an example replacement.

                                        If I have the simple file:

                                        <html><body>
                                        <table>
                                        <tr>
                                        get rid of stuff including <embedded/> <tags/>
                                        </tr>
                                        <tr>
                                        keep stuff including <embedded/> <tags/>
                                        </tr>
                                        </table>
                                        </body>
                                        </html>
                                        

                                        and I run

                                        • FIND = (?s)\A.*?<tr>\s*\K.*?(\s*</tr>)
                                          REPLACE = new contents$1
                                          MODE = regular expression
                                          REPLACE ALL

                                        then I get

                                        <html><body>
                                        <table>
                                        <tr>
                                        new contents
                                        </tr>
                                        <tr>
                                        keep stuff including <embedded/> <tags/>
                                        </tr>
                                        </table>
                                        </body>
                                        </html>
                                        

                                        This should work, even on long files. It should work the same if you’re using Find in Files instead of the single-file Replace dialog.

                                        Compared to @Alan-Kilborn’s regex, I added the feature that it uses the \K reset to automatically keep everything up to the first <tr>. I also kept the spaces after the <tr> and the spaces before the </tr> (where “spaces” include any space character, even newlines), so that way if you have <tr>blah</tr> all on one line, your replacement will stay all on one line, but if you have the three-line version like you showed, it will stay as three lines.

                                        Because I captured the final spaces and </tr>, I had to include $1 in the replacement to re-instate that part of the text. But it could have been done with positive lookahead instead, meaning you wouldn’t need the group text in the replacement. TIMTOWTDI.
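
For comparison, the same single replacement can be sketched in Python. This is an approximation, not what Notepad++ runs: Python's re module has no \K, so an extra capture group for the prefix is swapped in, and the second call shows the positive look-ahead variant mentioned above. The variable names are hypothetical.

```python
import re

before = """<html><body>
<table>
<tr>
get rid of stuff including <embedded/> <tags/>
</tr>
<tr>
keep stuff including <embedded/> <tags/>
</tr>
</table>
</body>
</html>
"""

# Group 1 plays the role of \K (text kept before the replaced part);
# group 2 is the trailing whitespace and </tr>, like $1 in the post.
with_group = re.sub(
    r"(?s)\A(.*?<tr>\s*).*?(\s*</tr>)",
    r"\g<1>new contents\g<2>",
    before,
)

# Look-ahead variant: the closing part is checked but not consumed,
# so it does not need to be re-inserted by the replacement.
with_lookahead = re.sub(
    r"(?s)\A(.*?<tr>\s*).*?(?=\s*</tr>)",
    r"\g<1>new contents",
    before,
)

print(with_group)
```

Because \A only matches at the very start of the text, the substitution can fire at most once, which is what limits it to the first <tr> block.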

                                        Now that we’ve given you an answer that works for the situation you described, please take the following advice: Please remember that this isn’t a “give me a regex forum”. This isn’t even a paid support, where we are obligated to help you. This is a community to discuss Notepad++; we will answer the occasional Notepad++-related regex question, especially for new users who have never been exposed to regular expressions. But we expect people who have been around the forum for many years to participate in helping others, not just in getting free regex creation service. Please learn from the four and a half years of regular expression advice we have been providing you. Many, many times, we have linked you to the regular expression documentation. I will give you my boiler plate one more time, just in case you missed it the last however many times I’ve posted it here. But please understand that if you continue to show a disregard for our previous advice, and if you continue to just “request” that we craft regex for you, rather than truly participating in the forum, you will find fewer and fewer here who are willing to help you, and you might start noticing downvotes on your questions.

                                        ----

                                        Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                                          • Vasile Caraus

                                            @PeterJones said in Regex: Select only the first instance of search results / first match:

                                            <tr>
                                            get rid of stuff including <embedded/> <tags/>
                                            </tr>

                                            How about the last instance? I should use \z, shouldn’t I?

                                            Like this: (?s)\z.*?<tr>\s*\K.*?(\s*</tr>)

                                            But this does not select the last instance. What did I do wrong?
