    regex: Match string not containing string

    • Neculai I. Fantanaru

      Good day, and Merry Christmas!

      I have these lines, all starting with the tag <p class="TESTA"> and ending with the tag <br>, except the last two.

      <p class="TESTA">I love you.<br>
      <p class="TESTA">You love her.<br>
      <p class="TEXTA">She loves me.<LLbr>
      <p class="TEXTA">It is not about me.<AAbr>
      

      My output should be:

      <p class="TEXTA">She loves me.<LLbr>
      <p class="TEXTA">It is not about me.<AAbr>
      

      I wrote a regex, but it does not work well. I used a negative lookahead (?!...) and \b, but that does not seem to be a good idea.

      (?-s)(.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$

      • rinku singh

        Try comparing with these:
        https://regex101.com/r/We7Afi/1

        https://notepad-plus-plus.org/community/topic/16817/regex-find-all-lines-starting-with-a-specific-tag-and-ending-with-a-different-tag/3

        • Meta Chuh (moderator) @Neculai I. Fantanaru

          hi @Neculai-I.-Fantanaru
          merry christmas to you too !!

          this regex will work on your given example:

          find what: ^(.*?)TESTA(.*?)(\r\n|\r|\n)
          replace with: (leave empty)
          search mode: regular expression
          click on replace all

          so from your example:

          <p class="TESTA">I love you.<br>
          <p class="TESTA">You love her.<br>
          <p class="TEXTA">She loves me.<LLbr>
          <p class="TEXTA">It is not about me.<AAbr>
          

          every line will be deleted except:

          <p class="TEXTA">She loves me.<LLbr>
          <p class="TEXTA">It is not about me.<AAbr>
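
The same deletion can be tried outside Notepad++. Here is a minimal sketch, assuming Python's standard `re` module and the four sample lines above (the variable names are only illustrative). `re.MULTILINE` is needed so that `^` matches at the start of every line, and the empty replacement plays the role of the empty "replace with" field:

```python
import re

sample = (
    '<p class="TESTA">I love you.<br>\n'
    '<p class="TESTA">You love her.<br>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)

# Same pattern as in "find what"; each matched line is deleted with its line break.
pattern = re.compile(r'^(.*?)TESTA(.*?)(\r\n|\r|\n)', re.MULTILINE)
result = pattern.sub('', sample)

print(result)
```

Only the two TEXTA lines should remain, because the lazy `.*?` cannot cross a line break to reach a TESTA on another line.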
          
          • Neculai I. Fantanaru

            Yes, @gurikbal-singh, I took inspiration from a previous topic, but it is not the same thing. This is why I opened another topic, even though it is close.

            • Neculai I. Fantanaru

              Yes, I found the solution. I believe my mistake was also using (?-s) in a regex with a negative lookahead. It does not work that way.

              <p class="TEXTA">(?:(?!<br>).)*$

              or

              (.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$
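
These two patterns do not behave the same way. A quick check, sketched here with Python's `re` module (an assumption: the thread itself uses the Notepad++ regex engine, and the two test lines are made up for illustration), suggests that only the first one really rejects a line containing <br>. In the second pattern the lookahead is tested at a single position, after `.*` has already run to the end of the line, so it excludes nothing:

```python
import re

good = '<p class="TEXTA">She loves me.<LLbr>'
bad = '<p class="TEXTA">She loves me.<br>'

# Tempered dot: the lookahead is re-checked before every consumed character.
tempered = re.compile(r'<p class="TEXTA">(?:(?!<br>).)*$', re.MULTILINE)
# Single lookahead: checked once, wherever the preceding .* happens to stop.
single = re.compile(r'(.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$', re.MULTILINE)

tempered_good = tempered.search(good) is not None
tempered_bad = tempered.search(bad) is not None
single_good = single.search(good) is not None
single_bad = single.search(bad) is not None

print(tempered_good, tempered_bad, single_good, single_bad)
```

The second pattern only appears to work on the original sample because none of the TEXTA lines there contains <br>.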

              • guy038

                Hello, @neculai-i-fantanaru, and All

                In order to match complete non-empty lines which do NOT contain a specific string, let's say the word TEXT, with that exact case, here are 5 regexes :

                • Regex A : (?-is)^(?!.*TEXT).+\R matches any line which does NOT contain the string TEXT

                • Regex B : (?-is)^(?!^TEXT).+\R matches any line which does NOT begin with the string TEXT

                • Regex C : (?-is)^(?!.*TEXT$).+\R matches any line which does NOT end with the string TEXT

                • Regex D : (?-is)^(?!^TEXT|.*TEXT$).+\R matches any line which neither begins NOR ends with the string TEXT

                • Regex E : (?-is)^(?!^.+TEXT.+$).+\R matches any line which does NOT contain the string TEXT strictly inside the line, that is, with at least one character before and after it


                In the table below, the matched lines are marked with an X and will therefore be deleted if the Replace zone is empty :

                •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
                |              Lines Scanned              |  Regex A  |  Regex B  |  Regex C  |  Regex D  |  Regex E  |
                •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
                |  TEST : I love you.                     |     X     |     X     |     X     |     X     |     X     |
                |  TEXT : She loves me.                   |           |           |     X     |           |     X     |
                |  ABCD : It is not about me.             |     X     |     X     |     X     |     X     |     X     |
                |  TEXT : You love her.                   |           |           |     X     |           |     X     |
                •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
                |  Statement "TEST" : I love you.         |     X     |     X     |     X     |     X     |     X     |
                |  Statement "TEXT" : She loves me.       |           |     X     |     X     |     X     |           |
                |  Statement "ABCD" : It is not about me. |     X     |     X     |     X     |     X     |     X     |
                |  Statement "TEXT" : You love her.       |           |     X     |     X     |     X     |           |
                •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
                |  I love you.          = TEST            |     X     |     X     |     X     |     X     |     X     |
                |  She loves me.        = TEXT            |           |     X     |           |           |     X     |
                |  It is not about me.  = ABCD            |     X     |     X     |     X     |     X     |     X     |
                |  You love her.        = TEXT            |           |     X     |           |           |     X     |
                •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
                

                Remark : To test these regexes yourself, just copy the text below :

                TEST : I love you.
                TEXT : She loves me.
                ABCD : It is not about me.
                TEXT : You love her.
                
                Statement "TEST" : I love you.
                Statement "TEXT" : She loves me.
                Statement "ABCD" : It is not about me.
                Statement "TEXT" : You love her.
                
                I love you.          = TEST
                She loves me.        = TEXT
                It is not about me.  = ABCD
                You love her.        = TEXT
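
The table can be cross-checked with a small script. This is only a sketch, assuming Python's standard `re` module rather than the Notepad++ engine: `\R` is not available there, so `\n` stands in for the line break, and `(?-is)` is dropped because case-sensitive matching and a dot that excludes newlines are already the defaults. The names `SAMPLE`, `PATTERNS` and `matched_lines` are illustrative:

```python
import re

SAMPLE = """\
TEST : I love you.
TEXT : She loves me.
ABCD : It is not about me.
TEXT : You love her.

Statement "TEST" : I love you.
Statement "TEXT" : She loves me.
Statement "ABCD" : It is not about me.
Statement "TEXT" : You love her.

I love you.          = TEST
She loves me.        = TEXT
It is not about me.  = ABCD
You love her.        = TEXT
"""

# Python translations of regexes A to E (\n replaces \R, (?-is) is omitted).
PATTERNS = {
    "A": r'^(?!.*TEXT).+\n',
    "B": r'^(?!^TEXT).+\n',
    "C": r'^(?!.*TEXT$).+\n',
    "D": r'^(?!^TEXT|.*TEXT$).+\n',
    "E": r'^(?!^.+TEXT.+$).+\n',
}

def matched_lines(name):
    """Return the lines of SAMPLE that the named regex would delete."""
    regex = re.compile(PATTERNS[name], re.MULTILINE)
    return [m.group().rstrip("\n") for m in regex.finditer(SAMPLE)]

for name in PATTERNS:
    print(name, len(matched_lines(name)))
```

Each count should equal the number of X marks in the corresponding column of the table. Note that the last line is only matched because the sample ends with a line break.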
                

                Now, to match all complete non-empty lines which do NOT begin with the expression <p class="TEXT">, possibly preceded by some blank characters, with that exact case, use the regex :

                Regex F : (?-is)^(?!\h*<p class="TEXT">).+\R

                •-------------------------------------------•-----------•
                |               Lines Scanned               |  Regex F  |
                •-------------------------------------------•-----------•
                |  <p class="TEST">I love you.<br>          |     X     |
                |      <p class="TEXT">She loves me.<br>    |           |
                |  <p class="ABCD">It is not about me.<br>  |     X     |
                |  <p class="TEXT">You love her.<br>        |           |
                •-------------------------------------------•-----------•
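
Regex F can be tried the same way. A hedged Python sketch follows, in which `[ \t]*` approximates `\h*` (Python's `re` has no `\h`) and `\n` replaces `\R`; the variable names are illustrative:

```python
import re

sample = (
    '<p class="TEST">I love you.<br>\n'
    '    <p class="TEXT">She loves me.<br>\n'
    '<p class="ABCD">It is not about me.<br>\n'
    '<p class="TEXT">You love her.<br>\n'
)

# Delete every non-empty line that does not start (after optional blanks)
# with <p class="TEXT">.
regex_f = re.compile(r'^(?![ \t]*<p class="TEXT">).+\n', re.MULTILINE)
kept = regex_f.sub('', sample)

print(kept)
```

Because the lookahead is anchored right after `^`, a line with <p class="TEXT"> further to the right, after other text, would still be deleted.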
                

                Best Regards,

                guy038

                • Neculai I. Fantanaru

                  Yes, @guy038, but what if I have the opposite scenario:

                  1. <p class="TEXTA">She loves me.</p>
                  2. <p class="TEXTA">She loves me.<LLbr>
                  3. <p class="TEXTA">It is not about me.<AAbr>
                  4. <p class="TEXTA">She loves me.
                     </p>
                  

                  I want to select all lines which contain <p class="TEXTA"> but do not contain </p>. So I want to select only lines 2 and 3. How can I do this?

                  • Meta Chuh (moderator) @Neculai I. Fantanaru

                    @Neculai-I.-Fantanaru

                    this regex will work on your new example:

                    find what: ^(.*?)TEXTA(.*?)(.|\R)(.*?)</p>\R
                    replace with: (leave empty)
                    search mode: regular expression
                    click on replace all

                    so from a copy of your example:
                    (it has to have an empty line after all texts for a correct newline \R detection on multi line <p>…</p> tags like the 4. you’ve given in your example)

                    1. <p class="TEXTA">She loves me.</p>
                    2. <p class="TEXTA">She loves me.<LLbr>
                    3. <p class="TEXTA">It is not about me.<AAbr>
                    4. <p class="TEXTA">She loves me.
                       </p>
                    
                    

                    it will delete everything and leave you with:

                    2. <p class="TEXTA">She loves me.<LLbr>
                    3. <p class="TEXTA">It is not about me.<AAbr>
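
For readers who want to reproduce this outside Notepad++, here is a rough Python equivalent. It is a sketch under assumptions: `\R` is written as `\r?\n`, `re.MULTILINE` makes `^` match at each line start, and the sample ends with a line break, as required above:

```python
import re

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '   </p>\n'
)

# (.|\r?\n) lets the match cross at most one line break before reaching </p>.
pattern = re.compile(r'^(.*?)TEXTA(.*?)(.|\r?\n)(.*?)</p>\r?\n', re.MULTILINE)
remaining = pattern.sub('', sample)

print(remaining)
```

Since only one line break can be crossed, a paragraph spread over three or more lines would not be removed by this pattern.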
                    
                    

                    if this does not work with your real data, please provide us with a real data example and what your result should look like

                    • guy038

                      Hi, @neculai-i-fantanaru, and All

                      In that case, you could use the regex (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

                      Notes :

                      • First, this regex looks for the literal expression <p class="TEXTA">, with that exact case

                      • Followed by a non-empty run of characters, all different from < and >, up to a < symbol

                      • Followed by a non-empty run of standard characters up to the nearest > symbol, but ONLY IF the string /p cannot be found right after the < symbol !

                      Just test it, with that text below :

                      <p class="TEXTA">She loves me.</p>
                      <p class="TEXTA">an other
                      test </abc>
                      <p class="TEXTA">She loves me.</p>      <p class="TEXTA">She loves me.</123>
                         <p class="TEXTA">She loves me.<LLbr>    <p class="TEXTA">She loves me.</p>
                         <p class="TEXTA">She loves me.
                         </p>
                      <p class="TEXTA">She loves me.<LLbr>
                      <p class="TEXTA">It is not about me.<AAbr>     <p class="TEXTA">It is
                       not about
                       me.<p>
                      
                      <p class="TEXTA">She loves me.
                         </p>
                      <p class="TEXTA">It is not about me.<AAbr>
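
The search regex above happens to use only syntax that Python's `re` module also accepts, so it can be checked quickly there. This is a sketch, not the Notepad++ behaviour itself: `(?-i)` is omitted because `re` is case-sensitive by default, and the shorter four-item example from the question is used as input:

```python
import re

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '   </p>\n'
)

# [^<>]+ may span line breaks, so item 4 is correctly seen as closed by </p>.
pattern = re.compile(r'<p class="TEXTA">[^<>]+<(?!/p).+?>')
matches = pattern.findall(sample)

for m in matches:
    print(m)
```

Only items 2 and 3 should be reported, since in items 1 and 4 the first tag after the text is </p>.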
                      

                      As I suppose that you would like to replace any bad ending tag, like </abc>, </123>, <LLbr>, <AAbr>, or even <p>, with the right ending tag </p>, use the following regex S/R :

                      SEARCH <p class="TEXTA">[^<>]+<\K(?!/p).+?(?=>)

                      REPLACE /p

                      Notes :

                      • If you just perform the search part, it matches the content of any bad ending tag, that is, anything different from /p, without the < and > boundaries

                      • Remember that the \K syntax forces the regex engine to forget everything already matched and to reset the start of the match to the location right after the < symbol !

                      • If you click on the Replace All button ( not the Replace one ), any bad ending tag is then changed into </p>
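
Python's standard `re` module does not support `\K`, so a direct port of this S/R is not possible there. A common workaround, sketched below as an assumption and not as part of the thread's solution, is to capture the prefix in group 1 and put it back in the replacement:

```python
import re

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '   </p>\n'
)

# Group 1 keeps everything up to and including "<"; only the tag content
# between "<" and ">" is replaced by /p.
pattern = re.compile(r'(<p class="TEXTA">[^<>]+<)(?!/p).+?(?=>)')
fixed = pattern.sub(r'\g<1>/p', sample)

print(fixed)
```

The effect is the same as with `\K`: the text before the bad tag is left untouched and <LLbr> and <AAbr> become </p>.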

                      Cheers,

                      guy038

                      • Neculai I. Fantanaru

                        @guy038 said:

                        (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

                        Your regex is great. But I just found another case, for which you may want to update the regex. Strangely, I did not take this into account: there can be other tags inside the same tag. For example:

                        <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                        

                        So, to update my last scenario:

                        1. <p class="TEXTA">She loves me.</p>
                        2. <p class="TEXTA">She loves me.<LLbr>
                        3. <p class="TEXTA">It is not about me.<AAbr>
                        4. <p class="TEXTA">She loves me.
                           </p>
                        5.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                        6.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
                        

                        So, the regex should select lines 2, 3 and 6. Right now, your regex also selects line 5 (because of the <em></em> pair), which is not good.

                        • guy038

                          @neculai-i-fantanaru, and All

                          Ah, OK ! So, I've created a regex using a recursive pattern ( because of the (?1) subroutine call to group 1, located inside the very group it refers to ), which allows the search of any block :

                          • Beginning with the tag <p class="TEXTA">

                          • Ending with a tag, different from </p>, which ends the line

                          • Containing any correctly matched areas <tag>.....</tag>, possibly juxtaposed and/or nested, as for instance :

                          <p class="TEXTA">.......<abc>.....<def>...
                          ....</def>...........</abc>.........<123>........</123>......<456>....
                          ...</456>............
                          ........<Niv1>.......<Niv2>.........<Niv3>......
                          ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                          ..............</Niv3>..........
                          .......</Niv2>..........
                          .........</Niv1>...........<bla bla bla>
                          

                          Highly unlikely case, isn’t it !

                          So, here is the regex :

                          (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

                          And, again, if you just want to catch the wrong ending tag use the regex :

                          (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<\K(?!/p)[^<>]+?(?=>\R)


                          Test these regexes, against text below. Note that they match only the blocks with even numbers ( 2, 4, 6, … )

                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          1. <p class="TEXTA">She loves me.</p>
                          
                          2. <p class="TEXTA">She loves me.<LLbr>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          3. <p class="TEXTA">It is not
                                   about me
                              </p>
                          
                          4. <p class="TEXTA">It is not
                                   about me.
                              <AAbr>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                          
                          6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          7. <p class="TEXTA">I believe in love<em>but
                                 only 
                               if</em>you can make me smile</p>
                          
                          8. <p class="TEXTA">I believe in love<em>but
                                 only 
                               if</em>you can make me smile</html>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          9.  <p class="TEXTA">I believe in love<12345>but
                              only 
                              if</12345>you can make me smile</p>
                          
                          10. <p class="TEXTA">I believe in love<12345>but
                              only 
                              if</12345>you can make me smile</div>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          11. <p class="TEXTA">I believe<em> in love<em>but
                            only 
                          if</em>you can ma</em>ke me smile</p>
                          
                          12. <p class="TEXTA">I believe<em> in love<em>but
                            only 
                          if</em>you can ma</em>ke me smile<h3>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          13.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
                               only 
                               if</em>you can ma</def>ke me smi</em>le</p>
                          
                          14.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
                               only 
                               if</em>you can ma</def>ke me smi</em>le</body>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          15.
                          <p class="TEXTA">I be<abc>lieve<def> in love<em>but
                          only 
                          if</em>you can ma</def>ke me smi</abc>le</p>
                          
                          16.
                          <p class="TEXTA">I be<abc>lieve<def> in love<em>but
                          only 
                          if</em>you can ma</def>ke me smi</abc>le<abcde>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          17. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</p>
                          
                          18. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</a>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          19. <p class="TEXTA">I <code>believe </code>in love<em>but
                                only if<123>you </123>can 
                                make </em>me
                             smile</p>
                          
                          20. <p class="TEXTA">I <code>believe </code>in love<em>but
                                only if<123>you </123>can 
                                make </em>me
                             smile</tr>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          21.<p class="TEXTA"></p>
                          
                          22.<p class="TEXTA"><script>
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          23.   <p class="TEXTA">.......<abc>.....<def>...
                          ....</def>...........</abc>.........<123>........</123>......<456>....
                          ...</456>............
                          ........<Niv1>.......<Niv2>.........<Niv3>......
                          ..............</Niv3>..........
                          ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                          .......</Niv2>..........
                          .........</Niv1>...........</p>
                          
                          24..   <p class="TEXTA">.......<abc>.....<def>...
                          ....</def>...........</abc>.........<123>........</123>......<456>....
                          ...</456>............
                          ........<Niv1>.......<Niv2>.........<Niv3>......
                          ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                          ..............</Niv3>..........
                          ...<Niv3>.......<XXX>...........</XXX>............</Niv3>......
                          .......</Niv2>..........
                          .........</Niv1>...........<bla bla bla>
                          

                          Best Regards,

                          guy038

                          • Neculai I. Fantanaru

                            @guy038 said:

                            (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

                            Good morning. I tried both of your regexes but, I don't know why, they do not select line number 6, only lines 2 and 3.

                            • Meta Chuh (moderator) @Neculai I. Fantanaru

                              @Neculai-I.-Fantanaru

                              again, you will have to add an empty line below 6. if it is the last line of your test document.
                              then @guy038 's regex will find line 6. correctly with your given example.
                              so your document must end with an empty line in order for the regex to work.
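
The effect of the trailing (?=\R) can be seen on a single line. The sketch below assumes Python's `re` module and therefore uses a simplified, non-recursive form of the regex (only its `[^<>]*` branch, since `re` has no `(?1)` recursion), with `(?=\r?\n)` standing in for `(?=\R)`. The variable names are illustrative:

```python
import re

last_line = '<p class="TEXTA">It is not about me.<AAbr>'

# Simplified pattern: no nested tags, lookahead requires a following line break.
with_lookahead = re.compile(r'<p class="TEXTA">[^<>]*<(?!/p)[^<>]+?>(?=\r?\n)')
# Same pattern with the final lookahead removed.
without_lookahead = re.compile(r'<p class="TEXTA">[^<>]*<(?!/p)[^<>]+?>')

at_end_of_file = with_lookahead.search(last_line) is not None
before_line_break = with_lookahead.search(last_line + '\n') is not None
relaxed = without_lookahead.search(last_line) is not None

print(at_end_of_file, before_line_break, relaxed)
```

The lookahead fails only when nothing follows the tag, which is the situation of a last line without a final line break.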

                              • Neculai I. Fantanaru

                                Yes, OK, but in a real .html file, this line will never be the last one. :) There will certainly be a lot of lines and other tags after line 6. :)

                                Anyway, I get it. I removed the last part, (?=\R), and it works.

                                (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>

                                thank you @guy038

                                And Happy New Year everyone !!
