
Regex: Select only the first instance of search results / first match

54 Posts 7 Posters 48.2k Views
    guy038
    last edited by guy038 Sep 13, 2016, 11:15 AM Sep 11, 2016, 5:47 PM

    Hello Vasile,

    I’ve got a solution, based on the one I gave in my last post on the topic Regex : Double your words, below, where I also matched all the remaining contents of the file, in order to be certain that there will be ONE replacement, only, per file !

    https://notepad-plus-plus.org/community/topic/12341/regex-double-your-words/9

    I also supposed, Vasile, that the line containing your string <div class="pagination"> may have some characters before and/or after that specific string.

    Well, let’s go !

    • To change the contents of the FIRST line, ONLY, of the current file, which contains the string <div class="pagination">, use :

    SEARCH : (?-s)(?:.*\R)*?\K.*<div class="pagination">.*(?s)(\R.*)

    REPLACE : New contents of the line\1

    • To change the contents of the LAST line, ONLY, of the current file, which contains the string <div class="pagination">, use :

    SEARCH : (?-s)(?:.*\R)*\K.*<div class="pagination">.*(?s)(\R.*)

    REPLACE : New contents of the line\1

    ***** With the help of Vasile, see, below, in my next post, a shorter version of these two regexes ! *****


    Notes :

    • I won’t speak about the (?-s) and (?s) in-line modifiers ! You are already aware of their use :-)

    • The first part (?:.*\R)*? catches the minimum number of complete lines, before the line containing the string <div class="pagination">

    • Then, again, the \K syntax forces the regex engine to forget everything matched so far, so that the reported match starts at the current position, just before the first character of the line to be changed

    • Then the next part .*<div class="pagination">.* corresponds to all the standard characters of the line to be changed

    • And the final part (\R.*) stands for the EOL character(s) of the line to be changed, followed by all the text, till the end of the current file

    • In the second S/R, the first part (?:.*\R)* catches the maximum number of complete lines, before the line containing the string <div class="pagination">

    • In replacement, we just change the contents of the line containing the string <div class="pagination"> to the string New contents of the line, followed by the contents of group 1 ( = text from the next line to the end of the current file )
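For readers who want to check the first-line / last-line logic outside Notepad++, here is a minimal Python sketch. Python's standard re module has no \K or \R, so this is a translation under assumptions, not the Notepad++ regexes themselves: count=1 stands in for the "one replacement per file" trick, a captured greedy prefix stands in for (?:.*\R)*\K, and \n is assumed to be the only line ending. The sample text and the variable names are invented for illustration.

```python
import re

MARKER = '<div class="pagination">'
NEW_LINE = "New contents of the line"

# Hypothetical file contents with two lines containing the marker.
text = (
    "header\n"
    '  <div class="pagination"> one\n'
    "middle\n"
    '  <div class="pagination"> two\n'
    "footer\n"
)

# A whole line containing the marker ("." does not match "\n" by default).
line_with_marker = r".*" + re.escape(MARKER) + r".*"

# FIRST matching line: re.sub scans left to right, count=1 stops after one hit.
first_changed = re.sub(line_with_marker, NEW_LINE, text, count=1)

# LAST matching line: a greedy run of complete lines is captured and put back,
# which plays the role of the greedy (?:.*\R)* prefix followed by \K.
last_pattern = r"\A((?:.*\n)*)" + line_with_marker
last_changed = re.sub(last_pattern, lambda m: m.group(1) + NEW_LINE, text, count=1)

print(first_changed)
print(last_changed)
```

The only difference between the two cases is where the engine is allowed to stop: the leftmost hit for the first line, and the longest possible run of preceding lines for the last one.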


    IMPORTANT :

    As usual, if you perform these S/R on a few files, using the Replace dialog ( CTRL + H ), just remember these two rules :

    • Firstly, go to the very beginning of the current file ( CTRL + Home )

    • Secondly, use, exclusively, the Replace All button ( due to the \K syntax, the step-by-step replacement, with the Replace button, does NOT work ! )

    Cheers,

    guy038

    P.S. :

    Of course, if the Find/Replace dialog contained the four, non standard, options :

    • Skip the first N matches

    • Find/Replace the next M matches, only

    • Per Line : [X] or Per File : [X]

    Vasile, you would just have to type 0 for number N, 1 for number M and check the Per File option.

    Then, the following simple S/R would be enough !

    SEARCH : (?-s).*<div class="pagination">.*

    REPLACE : New contents of the line
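These options do not exist in Notepad++, but the idea of "skip the first N matches, then replace the next M" is easy to sketch in a script. The helper below is purely hypothetical (its name replace_some and its parameters skip and limit are invented here); it uses Python's re.sub with a callback that counts the matches, and the sample text is made up.

```python
import re

def replace_some(pattern, repl, text, skip=0, limit=1):
    """Replace `limit` matches after skipping the first `skip` ones (hypothetical helper)."""
    seen = 0

    def substitute(match):
        nonlocal seen
        seen += 1
        if skip < seen <= skip + limit:
            return match.expand(repl)   # this match is inside the wanted window
        return match.group(0)           # leave every other match untouched

    return re.sub(pattern, substitute, text)

sample = 'a\n<div class="pagination">1\nb\n<div class="pagination">2\nc\n'
pattern = r'.*<div class="pagination">.*'

# N = 0, M = 1 : only the first matching line is rewritten.
only_first = replace_some(pattern, "New contents of the line", sample, skip=0, limit=1)
# N = 1, M = 1 : only the second matching line is rewritten.
only_second = replace_some(pattern, "New contents of the line", sample, skip=1, limit=1)

print(only_first)
print(only_second)
```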

      Vasile Caraus
      last edited by Sep 11, 2016, 7:52 PM

      hello guy, I must say you always have something great to say about regex. You helped me a lot !

      about the first part, the two regexes work fine.

      But I don’t really understand the last part (P.S. :), the four, non standard, options

      I cannot see the

      or Per File : [X]

      I guess something is missing on my part. Can you give me a screenshot?

        Vasile Caraus
        last edited by Sep 12, 2016, 12:48 PM

        I found a simpler way on the internet

        Step 1. Enable the . matches newline option

        Search:
        <div class="pagination">(.*)\z
        Replace by:
        Anything $1

          guy038
          last edited by Sep 13, 2016, 11:11 AM

          Hi, Vasile,

          Two points :

          • Firstly, Vasile, don’t be mistaken about what I wrote, at the end of my previous post ! I was just dreaming about it ! Of course, these options are NOT part of the present “Find/Replace” dialog. And they, probably, will NEVER be :-((

          I just wanted to point out that these additional options could help us, in some cases, to build simpler regexes !


          • Secondly, yes, you’re right : your regex is more elegant ! But it works ONLY IF there is ONE string <div class="pagination">, exactly, in your current file. And, apparently, you said :

          The problem is that this line is repeated 5 times in each html file

          But, indeed, it’s incredible how our brain is disposed to make simple things more complicated:-(( So, from your interesting regex, we can, still, simplify the previous regexes :


          • To change the contents of the FIRST line, ONLY, of the current file, containing the string <div class="pagination">, use :

          SEARCH : (?s)<div class="pagination">(.*)

          REPLACE : Anything else\1

          • To change the contents of the LAST line, ONLY, of the current file, containing the string <div class="pagination">, use :

          SEARCH : (?s).*\K<div class="pagination">(.*)

          REPLACE : Anything else\1


          I don’t think we’ll be able to get shorter regexes !!

          Notes :

          • As usual, the in-line modifier (?s) saves us from having to check or uncheck the . matches newline option

          • In the second regex, we need to add, at the beginning, the form .*\K, in order to get the maximum range of characters till the last string <div class="pagination"> of the current file

          • The last part of the regex (.*) represents the remaining text, after the matched string <div class="pagination">, till the very end of file
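The two short regexes can be tried out with a small Python sketch. This is an approximation under assumptions: Python's re module does not know \K, so for the LAST occurrence the greedy prefix is captured as a group and written back in the replacement. The sample text is invented. Note that, like the short regexes, it replaces the marker string itself, not the rest of its line.

```python
import re

MARKER = '<div class="pagination">'
text = 'top\n<div class="pagination">A\nmid\n<div class="pagination">B\nend\n'

# FIRST occurrence: the search starts at the leftmost marker and (.*) swallows
# the rest of the text, so no second replacement can happen.
first = re.sub(r"(?s)" + re.escape(MARKER) + r"(.*)", r"Anything else\1", text)

# LAST occurrence: the greedy (.*) prefix runs to the last marker;
# it is captured and re-inserted because \K is not available.
last = re.sub(r"(?s)(.*)" + re.escape(MARKER) + r"(.*)", r"\1Anything else\2", text)

print(first)
print(last)
```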

          Cheers,

          guy038

            Vasile Caraus
            last edited by Vasile Caraus Sep 13, 2016, 2:36 PM Sep 13, 2016, 2:35 PM

            hello guy, works wonderfully. Thank you.

            But there is one more thing for which I cannot find the solution. With the regex below, as you can see, the first part selects everything before a word, and the second part selects everything after a word. Practically, I keep the middle text of a file. Works beautifully. I use Replace All on more than 2,000 files.

            (?s)((^.*)(DELETE_UNTIL_THIS_TEXT)|(DELETE_AFTER_THIS_TEXT)(.*$))

            Sometimes, the problem is that I have several instances in the last part. For example:

            text_1
            DELETE_UNTIL_THIS_TEXT (text_1)
            –my text–
            –my text–
            DELETE_AFTER_THIS_TEXT (text_2)
            text_2
            text_2
            text_2

            So, as you can see, in the last part I have three (or many more) instances (occurrences) of the same “text_2”.
            When I run the regex, it deletes all the instances of text_2.

            So, I want to delete just the last instance of text_2 with the regex. I will write it again, but it should be modified a little bit for the second part.

            (?s)((^.*)(text_1)|(text_2)(.*$))

              guy038
              last edited by guy038 Sep 13, 2016, 5:56 PM Sep 13, 2016, 4:32 PM

              Hello Vasile,

              As you know, regexes force and help us to keep a rigorous attitude, just as programming does ! So I slightly changed your example text, in order to see exactly what you need !

              So, let’s suppose the example text, of 20 lines, below :

              Line 01
              Line 02
              Line 03 Text_1
              Line 04
              Line_05
              Text_1
              Line 07
              Line 08
              Line 09 Text_1
              Line_10
              Line_11
              Text_2 Line 12
              Line_13
              Line_14
              Text_2
              Line_16
              Line_17
              Text_2 Line_18
              Line_19
              Line_20
              

              Would you like to delete, in one go :

              • All lines till the first occurrence of Text_1 ( so, lines 01,02 and 03 ) AND all lines from the first occurrence of Text_2 ( so, from lines 12 to 20 ) = case A

              • All lines till the first occurrence of Text_1 ( so, lines 01,02 and 03 ) AND all lines from the last occurrence of Text_2 ( so, lines 18, 19 and 20 ) = case B

              • All lines till the last occurrence of Text_1 ( so, from lines 01 to 09 ) AND all lines from the first occurrence of Text_2 ( so, from lines 12 to 20 ) = case C

              • All lines till the last occurrence of Text_1 ( so, from lines 01 to 09 ) AND all lines from the last occurrence of Text_2 ( so, lines 18, 19 and 20 ) = case D

              Just tell me which case ( A, B, C or D ) we have to find the regex for.

              Keep in mind, that I, implicitly, suppose that :

              • No string Text_1 may occur, after the first occurrence of string Text_2 !!

              • The strings Text_1 and/or Text_2 may appear alone, in a line

              See you later,

              Best regards

              guy038

              P.S. :

              Anyway, Vasile, I updated my reply, one hour later !

              Here are, below, the solutions to the FOUR cases, given the exact previous example text, above.

              We just have to make each star quantifier, in turn, lazy or greedy ! So :

              • Case A : SEARCH = (?s).*?Text_1\R(.*?)Text_2.* , which keeps the lines 04 to 11, only

              • Case B : SEARCH = (?s).*?Text_1\R(.*)Text_2.* , which keeps the lines 04 to 17, only

              • Case C : SEARCH = (?s).*Text_1\R(.*?)Text_2.* , which keeps the lines 10 to 11, only

              • Case D : SEARCH = (?s).*Text_1\R(.*)Text_2.* , which keeps the lines 10 to 17, only
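The four lazy / greedy combinations can be verified with a short Python sketch on the same 20-line sample. Assumptions: \R is written as \n (Python's re has no \R), the lines end with \n only, and the helper name keep is invented for this illustration. The replacement keeps group 1 only.

```python
import re

# The 20-line sample from the post, rebuilt line by line.
lines = [
    "Line 01", "Line 02", "Line 03 Text_1", "Line 04", "Line_05",
    "Text_1", "Line 07", "Line 08", "Line 09 Text_1", "Line_10",
    "Line_11", "Text_2 Line 12", "Line_13", "Line_14", "Text_2",
    "Line_16", "Line_17", "Text_2 Line_18", "Line_19", "Line_20",
]
text = "\n".join(lines) + "\n"

def keep(before, middle):
    """before / middle are '.*?' (lazy) or '.*' (greedy); returns the kept lines."""
    pattern = r"(?s)" + before + r"Text_1\n(" + middle + r")Text_2.*"
    return re.sub(pattern, r"\1", text, count=1).splitlines()

case_a = keep(".*?", ".*?")   # first Text_1, first Text_2
case_b = keep(".*?", ".*")    # first Text_1, last Text_2
case_c = keep(".*", ".*?")    # last Text_1, first Text_2
case_d = keep(".*", ".*")     # last Text_1, last Text_2

for name, kept in zip("ABCD", (case_a, case_b, case_c, case_d)):
    print(name, kept[0], "->", kept[-1])
```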

              NOTE :

              • For these four cases, the replacement is simply \1 or $1

                Vasile Caraus
                last edited by Vasile Caraus Sep 13, 2016, 6:11 PM Sep 13, 2016, 6:11 PM

                guy, works great. But what if I want to replace the entire line in the first part (?s).*?Text_1\R(.*?), so not only from the word, but the entire line that contains that word?

                  Scott Sumner
                  last edited by Sep 13, 2016, 6:19 PM

                  guy038 should get paid a consulting rate for his excellent quality and quantity of answers! His perseverance is remarkable!

                    Vasile Caraus
                    last edited by Sep 13, 2016, 7:46 PM

                    Guy is very talented and smart. He is the one who helps Notepad++ users to grow and develop, step by step.

                    Maybe someday, somebody will need the answers to my questions, and Guy is the one who makes that possible !

                    Evolution starts with questions…and answers !

                    by the way, I am not a programmer; in fact, I have almost no connection with this domain. But I am learning the basics, which helps me a lot in other ways ! Thank you Guy038 !

                      guy038
                      last edited by guy038 Sep 14, 2016, 3:43 AM Sep 13, 2016, 8:10 PM

                      Hello Vasile and Scott,

                      Yeah, Scott, you’re right about it ! So, I could drink some more beers, as the weather is quite hot, presently, in Grenoble !!

                      No problem, Vasile. So, I’m starting with the original text below :

                      Line 01
                      Line 02
                      Line 03 Text_1 Line 03
                      Line 04
                      Line_05
                      Line 06 Text_1 Line 06
                      Line 07
                      Line 08
                      Line 09 Text_1 Line 09
                      Line 10
                      Line 11
                      Line 12 Text_2 Line 12
                      Line 13
                      Line 14
                      Line 15 Text_2 Line 15
                      Line 16
                      Line 17
                      Line 18 Text_2 Line 18
                      Line 19
                      Line 20
                      

                      As you can see, this time, the strings Text_1 and Text_2, in lines 03, 06, 09, 12, 15 and 18 are, all, embedded in the template Line ##......Line ##


                      I keep the same principle, using lazy and greedy quantifiers star *. That leads to the four regexes, below :

                      • Case A : SEARCH = (?s).*?Text_1(?-s).*\R((?:.*\R)*?).*Text_2(?s).* , which keeps the lines 04 to 11

                      • Case B : SEARCH = (?s).*?Text_1(?-s).*\R((?:.*\R)*).*Text_2(?s).* , which keeps the lines 04 to 17

                      • Case C : SEARCH = (?s).*Text_1(?-s).*\R((?:.*\R)*?).*Text_2(?s).* , which keeps the lines 10 to 11

                      • Case D : SEARCH = (?s).*Text_1(?-s).*\R((?:.*\R)*).*Text_2(?s).* , which keeps the lines 10 to 17

                      Remark :

                      • Remember that an in-line modifier stays in effect till an opposite modifier is met in the regex or till the end of the regex is reached !

                      • The replacement regex has not changed : \1 or $1
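The whole-line variants can be checked the same way. In this sketch, which rests on a few assumptions, the (?s) ... (?-s) ... (?s) switches are written with Python's scoped form (?s:...), because recent Python versions do not accept a global in-line flag in the middle of a pattern; \R is again written as \n, and the sample is regenerated in the same shape as the text above. The helper name kept_lines is invented.

```python
import re

# Sample rebuilt in the same shape as the 20-line text of the post
# (Text_1 on lines 03, 06, 09 and Text_2 on lines 12, 15, 18).
lines = []
for n in range(1, 21):
    if n in (3, 6, 9):
        lines.append(f"Line {n:02d} Text_1 Line {n:02d}")
    elif n in (12, 15, 18):
        lines.append(f"Line {n:02d} Text_2 Line {n:02d}")
    else:
        lines.append(f"Line {n:02d}")
text = "\n".join(lines) + "\n"

def kept_lines(head, body):
    """head: '.*?' or '.*' before Text_1; body: '*?' or '*' for the kept lines."""
    pattern = (
        r"(?s:" + head + r")Text_1"    # dot matches newline only inside (?s:...)
        r".*\n"                         # rest of the Text_1 line
        r"((?:.*\n)" + body + r")"     # group 1: the complete lines to keep
        r".*Text_2(?s:.*)"              # the Text_2 line and everything after it
    )
    return re.sub(pattern, r"\1", text, count=1).splitlines()

case_a = kept_lines(".*?", "*?")
case_b = kept_lines(".*?", "*")
case_c = kept_lines(".*", "*?")
case_d = kept_lines(".*", "*")

for name, kept in zip("ABCD", (case_a, case_b, case_c, case_d)):
    print(name, kept[0], "->", kept[-1])
```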

                      Cheers

                      guy038

                      P.S. :

                      If some parts of the regexes seem too difficult, just ask me for further information ! It’s simply a question of mental gymnastics, that anyone can learn !

                      Also, try to visualize the position of the regex engine while it executes the regex, especially when Look-Arounds ( Look-Behind or Look-Ahead ) are used. Indeed, in that case, the location of the regex engine does NOT change while evaluating the look-around !

                      For instance, take the subject text This is a simple text to visualize the cursor location of the regex engine

                      Then, the regex (?-s)(?=.*regex).{4} matches the first four letters of this sentence ( the word This ). Let’s split the process :

                      • The cursor location is just before the first letter T of the text

                      • The regex engine checks whether the look-ahead is satisfied from the present cursor location. In other words, is there, further on, on the same line, the string regex ?

                      • As this condition is true, the regex engine goes on, executing the following regex code .{4}

                      • But the working position of the regex engine is STILL before the first letter T of the text !

                      • Therefore, the regex engine matches the first four characters of the subject string, that is to say, the word This


                      Note that IF the word regex had NOT been found in the text, the regex engine would have delivered the message :
                      Can't find the text "(?-s)(?=.*regex).{4}" !
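The look-ahead walkthrough can be reproduced in a few lines of Python. This is only a sketch: the (?-s) prefix is dropped because "." does not match a newline in Python's re by default, and a search that fails simply returns None instead of showing a dialog message.

```python
import re

subject = "This is a simple text to visualize the cursor location of the regex engine"

# The look-ahead (?=.*regex) only checks that "regex" occurs further on;
# it consumes nothing, so .{4} still starts at position 0.
match = re.search(r"(?=.*regex).{4}", subject)
print(match.group(0), match.span())

# Without the word "regex", the look-ahead fails at every position: no match at all.
no_match = re.search(r"(?=.*regex).{4}", "This is a simple text")
print(no_match)
```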

                        Vasile Caraus
                        last edited by Sep 14, 2016, 6:42 AM

                        thanks a lot Guy !

                          Vasile Caraus
                          last edited by Sep 18, 2016, 1:32 PM

                          hello Guy. And what if I want to match (in the last formulas) the first instance of Text_1 and the last instance of Text_2?

                            guy038
                            last edited by Sep 19, 2016, 10:46 PM

                            Hi Vasile,

                            I first thought that the regex (?s).*?\KText_1|.*\KText_2 would give you the exact matches that you asked for :

                            And If I want to match (in the last formulas) the first instance of Text_1 and the last instance of Text_2?

                            Unfortunately, when using the search functionality only, this regex matches any string Text_1, then the last string Text_2 ! And I was not able to get the right regex, which could find, in the current file, the first instance of Text_1, then the last instance of Text_2 :-((

                            However, the regex (?s).*?\KText_1.*Text_2 allows us to select, in one go, the whole range between these two specific boundaries, boundaries included !
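In a script, the same selection is straightforward, and the two separate positions can be obtained with two searches instead of one regex. The Python sketch below assumes an invented one-line sample; the leading .*?\K is not needed because re.search already stops at the leftmost Text_1.

```python
import re

text = "aa Text_1 bb Text_1 cc Text_2 dd Text_2 ee"

# One selection, boundaries included: leftmost Text_1, then a greedy .*
# that runs up to the last Text_2.
whole = re.search(r"(?s)Text_1.*Text_2", text)

# Two separate positions: first Text_1 and last Text_2, found by two searches.
first_text_1 = re.search(r"Text_1", text)
last_text_2 = re.search(r"(?s).*(Text_2)", text)

print(whole.group(0))
print(first_text_1.span(), last_text_2.span(1))
```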

                            Best Regards,

                            guy038

                              Vasile Caraus
                              last edited by Vasile Caraus Feb 16, 2021, 9:23 AM Feb 16, 2021, 9:23 AM

                              hello again. I have many <tr></tr> tags on an HTML page. I want to select, with a regex, only the first instance of the <tr> tags. I made a regex, but this formula selects both <tr> blocks. I want only the first one, not the second one with Other Code

                              FIND: \b<tr>[\s\S]+</tr>\b

                              <tr>
                              <td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
                              </tr>
                              

                              and more

                              <tr>
                              Other Code
                              </tr>
                              
                                Terry R
                                last edited by Feb 16, 2021, 10:29 AM

                                @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                                FIND: \b<tr>[\s\S]+</tr>\b

                                The simplest change I can see is to put a ? after the + character, as your regex is greedy. I presume it is currently going to the last </tr> in the file.

                                Also, as far as I can see, the [\s\S] combination means every character, including the CR and LF ones. The whole thing could be rewritten as (?s)\b<tr>.+?</tf>\b.

                                I’m not on a PC to currently check my answer so apologies if I have it slightly wrong.

                                Terry

                                  Vasile Caraus
                                  last edited by Feb 16, 2021, 10:45 AM

                                  @Terry-R said in Regex: Select only the first instance of search results / first match:

                                  (?s)\b<tr>.+?</tf>\b

                                  your (?s)\b<tr>.+?</tr>\b is not working :(

                                  I also tried something different, also not working :( (?:^(?ms)(<tr>).*?(</tr>))

                                    Vasile Caraus @Vasile Caraus
                                    last edited by Vasile Caraus Feb 16, 2021, 10:51 AM Feb 16, 2021, 10:50 AM

                                    I also tried another combination, not working (?-s)(\b(?!^<tr>(.+)</tr>)

                                    :((

                                    <tr>
                                    <td class="right">On December 15, 2012, in <a href="https://mywebsite.com/index.html" title="See all articles here" class="external" rel="category tag">Expert-Expert</a>, by Michael Ende</td>
                                    </tr>
                                    

                                    code

                                    <tr>
                                    Other Code
                                    </tr>
                                    
                                      Terry R @Vasile Caraus
                                      last edited by Terry R Feb 16, 2021, 10:58 AM Feb 16, 2021, 10:57 AM

                                      @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                                      your (?s)\b<tr>.+?</tr>\b is not working :(

                                      What does it do? Does it select anything? And sorry for the typo with the tf, which I see you caught.
                                      The (?s) is necessary to cross lines. As you had \b I also included them although they could both be removed as a test.
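The test of removing both \b can be sketched in Python on a shortened, invented version of the sample. A word boundary \b needs a word character on one side, and here each <tr> is preceded by a line break or the start of the text, while each </tr> is followed by a line break, so with the \b anchors nothing can match. That may be why the regex selects nothing in Notepad++ as well, although this sketch only demonstrates the behaviour of Python's re module.

```python
import re

# Shortened, hypothetical version of the HTML sample from the thread.
html = (
    "<tr>\n"
    '<td class="right">first row</td>\n'
    "</tr>\n"
    "\n"
    "and more\n"
    "\n"
    "<tr>\n"
    "Other Code\n"
    "</tr>\n"
)

# With \b on both sides: no word character touches "<" or ">", so no match.
with_boundaries = re.search(r"(?s)\b<tr>.+?</tr>\b", html)

# Without \b, the lazy .+? stops at the first </tr>.
without_boundaries = re.search(r"(?s)<tr>.+?</tr>", html)

# The greedy .+ runs on to the last </tr>, which selects both blocks.
greedy = re.search(r"(?s)<tr>.+</tr>", html)

print(with_boundaries)
print(without_boundaries.group(0))
```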

                                      Terry

                                        Vasile Caraus
                                        last edited by Feb 16, 2021, 1:37 PM

                                        I believe only @guy038 can find a good answer :)

                                          Alan Kilborn @Vasile Caraus
                                          last edited by Alan Kilborn Feb 16, 2021, 1:42 PM Feb 16, 2021, 1:41 PM

                                          @Vasile-Caraus said in Regex: Select only the first instance of search results / first match:

                                          I believe only @guy038 can find a good answer

                                          Nice kick in the teeth for Terry, who was trying to help you…
                                          :-) notwithstanding.
