    regex: Match string not containing string

    • Neculai I. Fantanaru

      Yes, @gurikbal-singh, I took inspiration from a previous topic, but it’s not the same thing. That is why I opened another topic, even though it’s close.

      • Neculai I. Fantanaru

        Yes, I found the solution. I believe my mistake was that I also used (?-s) in a negative lookahead regex. It doesn’t work that way.

        <p class="TEXTA">(?:(?!<br>).)*$

        or

        (.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$

        • guy038

          Hello, @neculai-i-fantanaru, and All

          To match complete non-empty lines which do NOT contain a specific string, say the word TEXT with that exact case, here are 5 regexes :

          • Regex A : (?-is)^(?!.*TEXT).+\R matches any line which does NOT contain the string TEXT

          • Regex B : (?-is)^(?!^TEXT).+\R matches any line which does NOT begin with the string TEXT

          • Regex C : (?-is)^(?!.*TEXT$).+\R matches any line which does NOT end with the string TEXT

          • Regex D : (?-is)^(?!^TEXT|.*TEXT$).+\R matches any line which neither begins NOR ends with the string TEXT

          • Regex E : (?-is)^(?!^.+TEXT.+$).+\R matches any line which does NOT contain the string TEXT strictly inside the line, i.e. with at least one character before it and one after it


          In the table below, the matched lines are marked with an X and will therefore be deleted if the Replace zone is empty

          •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
          |              Lines Scanned              |  Regex A  |  Regex B  |  Regex C  |  Regex D  |  Regex E  |
          •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
          |  TEST : I love you.                     |     X     |     X     |     X     |     X     |     X     |
          |  TEXT : She loves me.                   |           |           |     X     |           |     X     |
          |  ABCD : It is not about me.             |     X     |     X     |     X     |     X     |     X     |
          |  TEXT : You love her.                   |           |           |     X     |           |     X     |
          •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
          |  Statement "TEST" : I love you.         |     X     |     X     |     X     |     X     |     X     |
          |  Statement "TEXT" : She loves me.       |           |     X     |     X     |     X     |           |
          |  Statement "ABCD" : It is not about me. |     X     |     X     |     X     |     X     |     X     |
          |  Statement "TEXT" : You love her.       |           |     X     |     X     |     X     |           |
          •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
          |  I love you.          = TEST            |     X     |     X     |     X     |     X     |     X     |
          |  She loves me.        = TEXT            |           |     X     |           |           |     X     |
          |  It is not about me.  = ABCD            |     X     |     X     |     X     |     X     |     X     |
          |  You love her.        = TEXT            |           |     X     |           |           |     X     |
          •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
          

          Remark : To test these regexes correctly, copy the sample text exactly as follows :

          TEST : I love you.
          TEXT : She loves me.
          ABCD : It is not about me.
          TEXT : You love her.
          
          Statement "TEST" : I love you.
          Statement "TEXT" : She loves me.
          Statement "ABCD" : It is not about me.
          Statement "TEXT" : You love her.
          
          I love you.          = TEST
          She loves me.        = TEXT
          It is not about me.  = ABCD
          You love her.        = TEXT
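As a cross-check outside Notepad++, the five regexes can be approximated in Python. This is only a sketch under stated assumptions: Python's `re` module has no `\R`, so each pattern is applied to a single line with `$` standing in for the line break, and `(?-is)` is dropped because it is already Python's default behaviour. The names `PATTERNS` and `matching_regexes` are invented for this illustration.

```python
import re

# One pattern per regex letter; "$" replaces the trailing "\R" of the originals.
PATTERNS = {
    "A": r"^(?!.*TEXT).+$",
    "B": r"^(?!TEXT).+$",
    "C": r"^(?!.*TEXT$).+$",
    "D": r"^(?!TEXT|.*TEXT$).+$",
    "E": r"^(?!.+TEXT.+$).+$",
}


def matching_regexes(line):
    """Return the letters of the regexes that match (and would delete) the line."""
    return "".join(letter for letter, pat in PATTERNS.items() if re.match(pat, line))


if __name__ == "__main__":
    for sample in ("TEXT : She loves me.", "She loves me.        = TEXT"):
        print(repr(sample), "->", matching_regexes(sample))
```

Each returned letter corresponds to an X in the table above for that line.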
          

          Now, to match all complete non-empty lines which do NOT begin with the expression <p class="TEXT">, possibly preceded by some blank characters, with that exact case, use the regex :

          Regex F : (?-is)^(?!\h*<p class="TEXT">).+\R

          •-------------------------------------------•-----------•
          |               Lines Scanned               |  Regex F  |
          •-------------------------------------------•-----------•
          |  <p class="TEST">I love you.<br>          |     X     |
          |      <p class="TEXT">She loves me.<br>    |           |
          |  <p class="ABCD">It is not about me.<br>  |     X     |
          |  <p class="TEXT">You love her.<br>        |           |
          •-------------------------------------------•-----------•
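Regex F can be tried in Python in the same spirit. A minimal sketch, assuming `[ \t]*` is an acceptable stand-in for `\h*` (which Python's `re` does not provide) and that lines are tested one by one instead of consuming `\R`; `REGEX_F` and `kept_lines` are hypothetical names.

```python
import re

# [ \t]* approximates \h* (horizontal blanks); "$" replaces the trailing \R.
REGEX_F = re.compile(r'^(?![ \t]*<p class="TEXT">).+$')


def kept_lines(text):
    """Return the lines regex F does NOT match, i.e. the ones that survive deletion."""
    return [line for line in text.splitlines() if not REGEX_F.match(line)]


if __name__ == "__main__":
    sample = (
        '<p class="TEST">I love you.<br>\n'
        '    <p class="TEXT">She loves me.<br>\n'
        '<p class="ABCD">It is not about me.<br>\n'
        '<p class="TEXT">You love her.<br>\n'
    )
    for line in kept_lines(sample):
        print(line)
```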
          

          Best Regards,

          guy038

          • Neculai I. Fantanaru

            Yes, @guy038, but what if I have the opposite scenario:

            1. <p class="TEXTA">She loves me.</p>
            2. <p class="TEXTA">She loves me.<LLbr>
            3. <p class="TEXTA">It is not about me.<AAbr>
            4. <p class="TEXTA">She loves me.
               </p>
            

            I want to select all tags which contain <p class="TEXTA"> but do not contain </p>. So I want to select only lines 2 and 3. How can I do this?

            • Meta Chuh (moderator) @Neculai I. Fantanaru

              @Neculai-I.-Fantanaru

              this regex will work on your new example:

              find what: ^(.*?)TEXTA(.*?)(.|\R)(.*?)</p>\R
              replace with: (leave empty)
              search mode: regular expression
              click on replace all

              so from a copy of your example:
              (it has to have an empty line after all texts for a correct newline \R detection on multi line <p>…</p> tags like the 4. you’ve given in your example)

              1. <p class="TEXTA">She loves me.</p>
              2. <p class="TEXTA">She loves me.<LLbr>
              3. <p class="TEXTA">It is not about me.<AAbr>
              4. <p class="TEXTA">She loves me.
                 </p>
              
              

              it will delete everything and leave you with:

              2. <p class="TEXTA">She loves me.<LLbr>
              3. <p class="TEXTA">It is not about me.<AAbr>
              
              

              if this does not work with your real data, please provide us with a real data example and show what your result should look like
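The same find/replace can be rehearsed in Python. This is a sketch only: `\n` stands in for `\R`, `re.MULTILINE` makes `^` match at every line start as it does in Notepad++, and `delete_closed_paragraphs` is a name made up for the example.

```python
import re

# \n replaces \R; MULTILINE lets ^ anchor at the start of each line.
PATTERN = re.compile(r"^(.*?)TEXTA(.*?)(.|\n)(.*?)</p>\n", re.MULTILINE)


def delete_closed_paragraphs(text):
    """Delete each run that starts at a line containing TEXTA and ends
    with </p> at the end of that line or of the next one."""
    return PATTERN.sub("", text)


if __name__ == "__main__":
    sample = (
        '1. <p class="TEXTA">She loves me.</p>\n'
        '2. <p class="TEXTA">She loves me.<LLbr>\n'
        '3. <p class="TEXTA">It is not about me.<AAbr>\n'
        '4. <p class="TEXTA">She loves me.\n'
        '   </p>\n'
        '\n'
    )
    print(delete_closed_paragraphs(sample))
```

Because the pattern ends with a line break, a closing `</p>` on the very last line of the text is only matched when a line break follows it, which is why the sample ends with an empty line.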

              • guy038

                Hi, @neculai-i-fantanaru, and All

                In that case, you could use the regex (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

                Notes :

                • First, this regex looks for the literal expression <p class="TEXTA">, with that exact case

                • Followed by a non-empty range of characters, all different from < and >, up to a < symbol

                • Followed by a non-empty range of standard characters up to the nearest > symbol, but ONLY IF the string /p does not occur right after the < symbol !
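The three steps above happen to use only syntax that Python's `re` module also understands, so the regex can be tried there unchanged apart from dropping `(?-i)`, which is Python's default. A small sketch, assuming that `.` not matching line breaks (Python's default) corresponds to leaving ". matches newline" unchecked in Notepad++; `UNCLOSED` and `unclosed_paragraphs` are illustrative names.

```python
import re

# Literal opening tag, then text without < or >, then a tag that does not start with /p.
UNCLOSED = re.compile(r'<p class="TEXTA">[^<>]+<(?!/p).+?>')


def unclosed_paragraphs(text):
    """Return every TEXTA paragraph whose first following tag is not </p>."""
    return UNCLOSED.findall(text)


if __name__ == "__main__":
    sample = (
        '1. <p class="TEXTA">She loves me.</p>\n'
        '2. <p class="TEXTA">She loves me.<LLbr>\n'
        '3. <p class="TEXTA">It is not about me.<AAbr>\n'
        '4. <p class="TEXTA">She loves me.\n'
        '   </p>\n'
    )
    for match in unclosed_paragraphs(sample):
        print(match)
```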

                Just test it, with that text below :

                <p class="TEXTA">She loves me.</p>
                <p class="TEXTA">an other
                test </abc>
                <p class="TEXTA">She loves me.</p>      <p class="TEXTA">She loves me.</123>
                   <p class="TEXTA">She loves me.<LLbr>    <p class="TEXTA">She loves me.</p>
                   <p class="TEXTA">She loves me.
                   </p>
                <p class="TEXTA">She loves me.<LLbr>
                <p class="TEXTA">It is not about me.<AAbr>     <p class="TEXTA">It is
                 not about
                 me.<p>
                
                <p class="TEXTA">She loves me.
                   </p>
                <p class="TEXTA">It is not about me.<AAbr>
                

                As I suppose that you would like to replace any bad ending tag, like </abc>, </123>, <LLbr>, <AAbr>, or even <p>, with the right ending tag </p>, use the following regex S/R :

                SEARCH <p class="TEXTA">[^<>]+<\K(?!/p).+?(?=>)

                REPLACE /p

                Notes :

                • If you just perform the search part, it just matches any bad ending tag, without the < and > boundaries, which is different from /p

                • Remember that the \K syntax forces the regex engine to forget everything already matched and reset the working position to the location, right after the < symbol !

                • If you click on the Replace All button ( not the Replace one ), any bad ending tag is then changed into </p>
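Python's `re` module has no `\K`, so a rough equivalent of this S/R keeps the prefix in a capturing group and writes it back in the replacement. This is a sketch of that substitution, not the Notepad++ behaviour itself; `BAD_END` and `fix_ending_tags` are hypothetical names.

```python
import re

# Group 1 plays the role of everything before \K; the lookahead leaves ">" untouched.
BAD_END = re.compile(r'(<p class="TEXTA">[^<>]+<)(?!/p).+?(?=>)')


def fix_ending_tags(text):
    """Replace the content of every wrong ending tag with /p, like Replace All."""
    return BAD_END.sub(r"\g<1>/p", text)


if __name__ == "__main__":
    print(fix_ending_tags('<p class="TEXTA">She loves me.<LLbr>'))
    print(fix_ending_tags('<p class="TEXTA">She loves me.</123>'))
```

`re.sub` rewrites every occurrence in one pass, which corresponds to the Replace All button.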

                Cheers,

                guy038

                • Neculai I. Fantanaru

                  @guy038 said:

                  (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

                  Your regex is great. But I just found another case, in case you want to update the regex. Oddly, I did not take this into account: there can be other tags nested inside the same tag. For example:

                  <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                  

                  So, to update my last scenario:

                  1. <p class="TEXTA">She loves me.</p>
                  2. <p class="TEXTA">She loves me.<LLbr>
                  3. <p class="TEXTA">It is not about me.<AAbr>
                  4. <p class="TEXTA">She loves me.
                     </p>
                  5.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                  6.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
                  

                  So the regex should select lines 2, 3 and 6. Right now, your regex also selects line 5 (because of the <em></em> pair), which is not wanted.

                  • guy038

                    @neculai-i-fantanaru, and All

                    Ah, OK ! So, I’ve created a regex using a recursive pattern ( recursive because the (?1) subroutine call to group 1 is located inside the very group it refers to ), which allows the search of any block :

                    • Beginning with the tag <p class="TEXTA">

                    • Ending with a tag, different from </p>, which ends the line

                    • Containing any correctly matched areas <tag>.....</tag>, possibly juxtaposed and/or nested, as for instance :

                    <p class="TEXTA">.......<abc>.....<def>...
                    ....</def>...........</abc>.........<123>........</123>......<456>....
                    ...</456>............
                    ........<Niv1>.......<Niv2>.........<Niv3>......
                    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                    ..............</Niv3>..........
                    .......</Niv2>..........
                    .........</Niv1>...........<bla bla bla>
                    

                    Highly unlikely case, isn’t it !

                    So, here is the regex :

                    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

                    And, again, if you just want to catch the wrong ending tag use the regex :

                    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<\K(?!/p)[^<>]+?(?=>\R)
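Python's built-in `re` module does not support recursive calls such as `(?1)`, so the regexes above cannot be pasted there as they are. The sketch below swaps in a different technique with a similar aim: it repeatedly peels off the innermost well-formed `<tag>...</tag>` pairs and then applies a simple, non-recursive check for the wrong ending tag. It is an illustration under assumptions, not a translation: it accepts empty text between tags, it does not require the ending tag to finish the line, and `INNER_PAIR`, `BAD_END` and `bad_ending_tag` are invented names.

```python
import re

# An innermost pair: <name>, text without any tag, then the matching </name>.
INNER_PAIR = re.compile(r"<(\w{1,10})>[^<>]*</\1>")
# After peeling, the first tag following the opening <p ...> is the ending tag.
BAD_END = re.compile(r'<p class="TEXTA">[^<>]*<(?!/p)([^<>]+)>')


def bad_ending_tag(block):
    """Return the content of the wrong ending tag of a TEXTA block,
    or None when the block is closed by </p>."""
    previous = None
    while previous != block:  # stop once a pass removes nothing
        previous = block
        block = INNER_PAIR.sub("", block)
    match = BAD_END.search(block)
    return match.group(1) if match else None


if __name__ == "__main__":
    ok = '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>'
    bad = '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>'
    print(bad_ending_tag(ok), bad_ending_tag(bad))
```

The opening `<p class="TEXTA">` is never peeled off, because its tag content contains a space and so does not fit `\w{1,10}`.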


                    Test these regexes against the text below. Note that they match only the blocks with even numbers ( 2, 4, 6, … )

                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    1. <p class="TEXTA">She loves me.</p>
                    
                    2. <p class="TEXTA">She loves me.<LLbr>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    3. <p class="TEXTA">It is not
                             about me
                        </p>
                    
                    4. <p class="TEXTA">It is not
                             about me.
                        <AAbr>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
                    
                    6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    7. <p class="TEXTA">I believe in love<em>but
                           only 
                         if</em>you can make me smile</p>
                    
                    8. <p class="TEXTA">I believe in love<em>but
                           only 
                         if</em>you can make me smile</html>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    9.  <p class="TEXTA">I believe in love<12345>but
                        only 
                        if</12345>you can make me smile</p>
                    
                    10. <p class="TEXTA">I believe in love<12345>but
                        only 
                        if</12345>you can make me smile</div>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    11. <p class="TEXTA">I believe<em> in love<em>but
                      only 
                    if</em>you can ma</em>ke me smile</p>
                    
                    12. <p class="TEXTA">I believe<em> in love<em>but
                      only 
                    if</em>you can ma</em>ke me smile<h3>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    13.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
                         only 
                         if</em>you can ma</def>ke me smi</em>le</p>
                    
                    14.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
                         only 
                         if</em>you can ma</def>ke me smi</em>le</body>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    15.
                    <p class="TEXTA">I be<abc>lieve<def> in love<em>but
                    only 
                    if</em>you can ma</def>ke me smi</abc>le</p>
                    
                    16.
                    <p class="TEXTA">I be<abc>lieve<def> in love<em>but
                    only 
                    if</em>you can ma</def>ke me smi</abc>le<abcde>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    17. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</p>
                    
                    18. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</a>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    19. <p class="TEXTA">I <code>believe </code>in love<em>but
                          only if<123>you </123>can 
                          make </em>me
                       smile</p>
                    
                    20. <p class="TEXTA">I <code>believe </code>in love<em>but
                          only if<123>you </123>can 
                          make </em>me
                       smile</tr>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    21.<p class="TEXTA"></p>
                    
                    22.<p class="TEXTA"><script>
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    23.   <p class="TEXTA">.......<abc>.....<def>...
                    ....</def>...........</abc>.........<123>........</123>......<456>....
                    ...</456>............
                    ........<Niv1>.......<Niv2>.........<Niv3>......
                    ..............</Niv3>..........
                    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                    .......</Niv2>..........
                    .........</Niv1>...........</p>
                    
                    24.   <p class="TEXTA">.......<abc>.....<def>...
                    ....</def>...........</abc>.........<123>........</123>......<456>....
                    ...</456>............
                    ........<Niv1>.......<Niv2>.........<Niv3>......
                    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
                    ..............</Niv3>..........
                    ...<Niv3>.......<XXX>...........</XXX>............</Niv3>......
                    .......</Niv2>..........
                    .........</Niv1>...........<bla bla bla>
                    

                    Best Regards,

                    guy038

                    • Neculai I. Fantanaru

                      @guy038 said:

                      (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

                      Good morning. I tried both of your regexes and, I don’t know why, they don’t select line number 6, only lines 2 and 3.

                      • Meta Chuh (moderator) @Neculai I. Fantanaru

                        @Neculai-I.-Fantanaru

                        again, you will have to add an empty line below 6. if it is the last line of your test document.
                        then @guy038 's regex will find line 6. correctly with your given example.
                        so your document must end with an empty line in order for the regex to work.
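The effect of the trailing `(?=\R)` can be reproduced with a much smaller pattern in Python. A sketch, assuming `(?=\r?\n)` as a stand-in for `(?=\R)`; `ENDS_LINE` and `has_line_ending_bad_tag` are names chosen for the example.

```python
import re

# A tag not starting with /p that must be followed by a line break.
ENDS_LINE = re.compile(r"<(?!/p)[^<>]+?>(?=\r?\n)")


def has_line_ending_bad_tag(text):
    """Tell whether some tag other than </p> sits right before a line break."""
    return ENDS_LINE.search(text) is not None


if __name__ == "__main__":
    line6 = '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>'
    print(has_line_ending_bad_tag(line6))         # no line break after </title>
    print(has_line_ending_bad_tag(line6 + "\n"))  # line break present
```

Without a line break after the last tag the lookahead has nothing to match, which is the situation when line 6 is the very last line of the file.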

                        • Neculai I. Fantanaru

                          Yes, OK, but if I have an .html file, it will never finish with this line. :) So there will certainly be a lot of lines and other tags after line six :)

                          Anyway, I get it. I removed the last part, (?=\R), and it works.

                          (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>

                          thank you @guy038

                          And Happy New Year everyone !!

                          The Community of users of the Notepad++ text editor.