regex: Match string not containing string



  • hi @Neculai-I.-Fantanaru
    merry christmas to you too !!

    this regex will work on your given example:

    find what: ^(.*?)TESTA(.*?)(\r\n|\r|\n)
    replace with: (leave empty)
    search mode: regular expression
    click on replace all

    so from your example:

    <p class="TESTA">I love you.<br>
    <p class="TESTA">You love her.<br>
    <p class="TEXTA">She loves me.<LLbr>
    <p class="TEXTA">It is not about me.<AAbr>
    

    every line will be deleted except:

    <p class="TEXTA">She loves me.<LLbr>
    <p class="TEXTA">It is not about me.<AAbr>
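
    As a rough cross-check outside Notepad++, the same line deletion can be sketched in Python. This assumes Python's `re` module (not the Boost engine used by Notepad++) and `\n` line endings; `delete_lines_with` is a hypothetical helper name, not something from this thread.

```python
import re

def delete_lines_with(word, text):
    """Delete every complete line containing `word` (hypothetical helper)."""
    # (?m) lets ^ match at the start of each line, as it does in Notepad++.
    pattern = r"(?m)^(.*?)" + re.escape(word) + r"(.*?)(\r\n|\r|\n)"
    return re.sub(pattern, "", text)

sample = (
    '<p class="TESTA">I love you.<br>\n'
    '<p class="TESTA">You love her.<br>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)
print(delete_lines_with("TESTA", sample), end="")
```

    Only the two TEXTA lines should remain. Note that the last line needs a trailing line break, otherwise the final group cannot match.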
    


  • yes, @gurikbal-singh , I was inspired by a previous topic, but it’s not the same thing. That is why I opened another topic, even if it’s close.



  • yes, I found the solution. I believe my mistake was that I also used (?-s) in a negative lookahead regex. It doesn’t work that way.

    <p class="TEXTA">(?:(?!<br>).)*$

    or

    (.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$
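
    A minimal Python sketch of the first pattern above, assuming Python's `re` module rather than Notepad++ itself. The `(?:(?!<br>).)*` part advances one character at a time, and only while `<br>` does not start at the current position; `NO_BR` is just an illustrative name.

```python
import re

# Tempered dot: consume a character only if "<br>" does not begin right here.
NO_BR = re.compile(r'<p class="TEXTA">(?:(?!<br>).)*$')

lines = [
    '<p class="TEXTA">She loves me.<LLbr>',  # no <br> after the tag
    '<p class="TEXTA">She loves me.<br>',    # contains <br>
]
for line in lines:
    print(bool(NO_BR.search(line)), line)
```

    The first line is expected to match and the second one is not.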



  • Hello, @neculai-i-fantanaru, and All

    In order to match complete non-empty lines which do NOT contain a specific string, let’s say, the word TEXT, with that exact case, here are, below, 5 regexes :

    • Regex A : (?-is)^(?!.*TEXT).+\R matches any line which does NOT contain the string TEXT

    • Regex B : (?-is)^(?!^TEXT).+\R matches any line which does NOT begin with the string TEXT

    • Regex C : (?-is)^(?!.*TEXT$).+\R matches any line which does NOT end with the string TEXT

    • Regex D : (?-is)^(?!^TEXT|.*TEXT$).+\R matches any line which neither begins NOR ends with the string TEXT

    • Regex E : (?-is)^(?!^.+TEXT.+$).+\R matches any line which does NOT contain the string TEXT strictly inside the line, i.e. with at least one character both before AND after it


    In the table below, the matched lines are marked with an X and, therefore, will be deleted if the Replace zone is empty

    •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
    |              Lines Scanned              |  Regex A  |  Regex B  |  Regex C  |  Regex D  |  Regex E  |
    •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
    |  TEST : I love you.                     |     X     |     X     |     X     |     X     |     X     |
    |  TEXT : She loves me.                   |           |           |     X     |           |     X     |
    |  ABCD : It is not about me.             |     X     |     X     |     X     |     X     |     X     |
    |  TEXT : You love her.                   |           |           |     X     |           |     X     |
    •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
    |  Statement "TEST" : I love you.         |     X     |     X     |     X     |     X     |     X     |
    |  Statement "TEXT" : She loves me.       |           |     X     |     X     |     X     |           |
    |  Statement "ABCD" : It is not about me. |     X     |     X     |     X     |     X     |     X     |
    |  Statement "TEXT" : You love her.       |           |     X     |     X     |     X     |           |
    •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
    |  I love you.          = TEST            |     X     |     X     |     X     |     X     |     X     |
    |  She loves me.        = TEXT            |           |     X     |           |           |     X     |
    |  It is not about me.  = ABCD            |     X     |     X     |     X     |     X     |     X     |
    |  You love her.        = TEXT            |           |     X     |           |           |     X     |
    •-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
    

    Remark : Of course, to test these regexes correctly, just copy the text below, exactly as it is :

    TEST : I love you.
    TEXT : She loves me.
    ABCD : It is not about me.
    TEXT : You love her.
    
    Statement "TEST" : I love you.
    Statement "TEXT" : She loves me.
    Statement "ABCD" : It is not about me.
    Statement "TEXT" : You love her.
    
    I love you.          = TEST
    She loves me.        = TEXT
    It is not about me.  = ABCD
    You love her.        = TEXT
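
    For anyone who wants to re-check the table above without Notepad++, here is a small Python sketch. It assumes Python's `re` module, which has no `\R` and no global `(?-is)`, so each line is tested on its own and the defaults (case-sensitive, dot not matching newlines) stand in for `(?-is)`. `PATTERNS` and `matched_by` are hypothetical names.

```python
import re

# The five lookaheads from regexes A to E, applied to one line at a time.
PATTERNS = {
    "A": r"^(?!.*TEXT).+$",
    "B": r"^(?!^TEXT).+$",
    "C": r"^(?!.*TEXT$).+$",
    "D": r"^(?!^TEXT|.*TEXT$).+$",
    "E": r"^(?!^.+TEXT.+$).+$",
}

def matched_by(line):
    """Return the letters of the regexes that match (and would delete) the line."""
    return "".join(name for name, pat in PATTERNS.items() if re.match(pat, line))

for line in [
    "TEST : I love you.",
    "TEXT : She loves me.",
    'Statement "TEXT" : She loves me.',
    "She loves me.        = TEXT",
]:
    print(f"{matched_by(line):6} {line}")
```

    Each printed set of letters should agree with the X marks of the corresponding table row.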
    

    Now, to match all complete non-empty lines which do NOT contain the expression <p class="TEXT">, possibly preceded by some blank characters, with that exact case, use the regex :

    Regex F : (?-is)^(?!\h*<p class="TEXT">).+\R

    •-------------------------------------------•-----------•
    |               Lines Scanned               |  Regex F  |
    •-------------------------------------------•-----------•
    |  <p class="TEST">I love you.<br>          |     X     |
    |      <p class="TEXT">She loves me.<br>    |           |
    |  <p class="ABCD">It is not about me.<br>  |     X     |
    |  <p class="TEXT">You love her.<br>        |           |
    •-------------------------------------------•-----------•
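
    The same deletion can be sketched in Python, assuming the `re` module: `[ \t]*` stands in for `\h*` and `\n` for `\R`, since `re` supports neither escape. `REGEX_F` is only an illustrative name.

```python
import re

# (?m) makes ^ match at every line start; the lookahead rejects lines that
# start with optional blanks followed by <p class="TEXT">.
REGEX_F = re.compile(r'(?m)^(?![ \t]*<p class="TEXT">).+\n')

html = (
    '<p class="TEST">I love you.<br>\n'
    '    <p class="TEXT">She loves me.<br>\n'
    '<p class="ABCD">It is not about me.<br>\n'
    '<p class="TEXT">You love her.<br>\n'
)
kept = REGEX_F.sub("", html)
print(kept, end="")
```

    Only the two `<p class="TEXT">` lines should survive, including the indented one.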
    

    Best Regards,

    guy038



  • yes, @guy038 . But what if I have the opposite scenario:

    1. <p class="TEXTA">She loves me.</p>
    2. <p class="TEXTA">She loves me.<LLbr>
    3. <p class="TEXTA">It is not about me.<AAbr>
    4. <p class="TEXTA">She loves me.
       </p>
    

    And I want to select all tags which contain <p class="TEXTA"> but do not contain </p>. So I want to select only lines 2 and 3. How can I do this ?



  • @Neculai-I.-Fantanaru

    this regex will work on your new example:

    find what: ^(.*?)TEXTA(.*?)(.|\R)(.*?)</p>\R
    replace with: (leave empty)
    search mode: regular expression
    click on replace all

    so from a copy of your example:
    (it has to have an empty line after all texts for a correct newline \R detection on multi line <p>…</p> tags like the 4. you’ve given in your example)

    1. <p class="TEXTA">She loves me.</p>
    2. <p class="TEXTA">She loves me.<LLbr>
    3. <p class="TEXTA">It is not about me.<AAbr>
    4. <p class="TEXTA">She loves me.
       </p>
    
    

    it will delete everything and leave you with:

    2. <p class="TEXTA">She loves me.<LLbr>
    3. <p class="TEXTA">It is not about me.<AAbr>
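
    A hedged Python sketch of this second search, assuming the `re` module and `\n` line endings (`\R` is replaced by `\n`, because `re` does not know `\R`). The `(.|\n)` group is what allows the match to cross at most one line break, so the two-line block 4. is caught as well. `CLOSED_P` is a hypothetical name.

```python
import re

CLOSED_P = re.compile(r"(?m)^(.*?)TEXTA(.*?)(.|\n)(.*?)</p>\n")

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '   </p>\n'
)
remaining = CLOSED_P.sub("", sample)
print(remaining, end="")
```

    Lines 2. and 3. should be the only ones left. As in Notepad++, the final `</p>` must be followed by a line break for the last block to be matched.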
    
    

    if this does not work with your real data, please provide us with a real data example and show what your result should look like



  • Hi, @neculai-i-fantanaru, and All

    In that case, you could use the regex (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

    Notes :

    • First, this regex looks for the literal expression <p class="TEXTA">, with that exact case

    • Followed by a non-empty range of characters, all different from < and >, up to a < symbol

    • Followed by a non-empty range of standard characters up to the nearest > symbol, but ONLY IF the string /p cannot be found right after the < symbol !

    Just test it, with that text below :

    <p class="TEXTA">She loves me.</p>
    <p class="TEXTA">an other
    test </abc>
    <p class="TEXTA">She loves me.</p>      <p class="TEXTA">She loves me.</123>
       <p class="TEXTA">She loves me.<LLbr>    <p class="TEXTA">She loves me.</p>
       <p class="TEXTA">She loves me.
       </p>
    <p class="TEXTA">She loves me.<LLbr>
    <p class="TEXTA">It is not about me.<AAbr>     <p class="TEXTA">It is
     not about
     me.<p>
    
    <p class="TEXTA">She loves me.
       </p>
    <p class="TEXTA">It is not about me.<AAbr>
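
    To try the search on a shortened version of this text outside Notepad++, a Python sketch could look like the one below. It assumes the `re` module, where the leading `(?-i)` is dropped because matching is case-sensitive by default; `UNCLOSED` is an illustrative name.

```python
import re

UNCLOSED = re.compile(r'<p class="TEXTA">[^<>]+<(?!/p).+?>')

sample = (
    '<p class="TEXTA">She loves me.</p>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">She loves me.\n'
    '   </p>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)
hits = UNCLOSED.findall(sample)
for hit in hits:
    print(hit)
```

    Only the `<LLbr>` and `<AAbr>` blocks should be reported. The block split over two lines is skipped because `[^<>]+` also runs across the line break before reaching `</p>`.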
    

    As I suppose that you would like to replace any bad ending tag, like </abc>, </123>, <LLbr>, <AAbr>, or even <p>, with the right ending tag </p>, use the following regex S/R :

    SEARCH <p class="TEXTA">[^<>]+<\K(?!/p).+?(?=>)

    REPLACE /p

    Notes :

    • If you just perform the search part, it just matches any bad ending tag, without the < and > boundaries, which is different from /p

    • Remember that the \K syntax forces the regex engine to forget everything already matched and reset the working position to the location, right after the < symbol !

    • If you click on the Replace All button ( not the Replace one ), any bad ending tag is then changed into </p>
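
    Python's `re` module has no `\K`, so a sketch of this S/R in Python has to emulate it: the part before `\K` goes into a capturing group and is written back unchanged by the replacement. This is an approximation under that assumption, and `fix_endings` is a hypothetical helper name.

```python
import re

# Group 1 plays the role of everything located before \K in the search regex.
BAD_END = re.compile(r'(<p class="TEXTA">[^<>]+<)(?!/p).+?(?=>)')

def fix_endings(text):
    """Rewrite any ending tag other than </p> into </p>."""
    return BAD_END.sub(r"\g<1>/p", text)

print(fix_endings('<p class="TEXTA">She loves me.<LLbr>'))
print(fix_endings('<p class="TEXTA">an other\ntest </abc>'))
print(fix_endings('<p class="TEXTA">She loves me.</p>'))
```

    The first two calls should end in `</p>` and the third input should come back unchanged.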

    Cheers,

    guy038



  • @guy038 said:

    (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

    Your regex is great. But I just found another case, in case you want to update the regex. Strangely, I did not take this into account: there can be other tags inside the same tag. For example:

    <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
    

    So, to update my last scenario:

    1. <p class="TEXTA">She loves me.</p>
    2. <p class="TEXTA">She loves me.<LLbr>
    3. <p class="TEXTA">It is not about me.<AAbr>
    4. <p class="TEXTA">She loves me.
       </p>
    5.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
    6.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
    

    So, the regex should select lines 2, 3 and 6. Right now, your regex also selects line 5 (because of the <em></em> pair), which is not good.



  • @neculai-i-fantanaru, and All

    Ah, OK ! So, I’ve created a regex, using a recursive pattern ( the (?1) subroutine call to group 1 is located inside the very group it refers to ), which searches for any block :

    • Beginning with the tag <p class="TEXTA">

    • Ending with a tag, different from </p>, which ends the line

    • Containing any correctly matched areas <tag>.....</tag>, possibly juxtaposed and/or nested, as for instance :

    <p class="TEXTA">.......<abc>.....<def>...
    ....</def>...........</abc>.........<123>........</123>......<456>....
    ...</456>............
    ........<Niv1>.......<Niv2>.........<Niv3>......
    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
    ..............</Niv3>..........
    .......</Niv2>..........
    .........</Niv1>...........<bla bla bla>
    

    Highly unlikely case, isn’t it !

    So, here is the regex :

    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

    And, again, if you just want to catch the wrong ending tag use the regex :

    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<\K(?!/p)[^<>]+?(?=>\R)


    Test these regexes, against text below. Note that they match only the blocks with even numbers ( 2, 4, 6, … )

    ------------------------------------------------------------------------------------------------------------
    1. <p class="TEXTA">She loves me.</p>
    
    2. <p class="TEXTA">She loves me.<LLbr>
    ------------------------------------------------------------------------------------------------------------
    3. <p class="TEXTA">It is not
             about me
        </p>
    
    4. <p class="TEXTA">It is not
             about me.
        <AAbr>
    ------------------------------------------------------------------------------------------------------------
    5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
    
    6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
    ------------------------------------------------------------------------------------------------------------
    7. <p class="TEXTA">I believe in love<em>but
           only 
         if</em>you can make me smile</p>
    
    8. <p class="TEXTA">I believe in love<em>but
           only 
         if</em>you can make me smile</html>
    ------------------------------------------------------------------------------------------------------------
    9.  <p class="TEXTA">I believe in love<12345>but
        only 
        if</12345>you can make me smile</p>
    
    10. <p class="TEXTA">I believe in love<12345>but
        only 
        if</12345>you can make me smile</div>
    ------------------------------------------------------------------------------------------------------------
    11. <p class="TEXTA">I believe<em> in love<em>but
      only 
    if</em>you can ma</em>ke me smile</p>
    
    12. <p class="TEXTA">I believe<em> in love<em>but
      only 
    if</em>you can ma</em>ke me smile<h3>
    ------------------------------------------------------------------------------------------------------------
    13.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
         only 
         if</em>you can ma</def>ke me smi</em>le</p>
    
    14.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
         only 
         if</em>you can ma</def>ke me smi</em>le</body>
    ------------------------------------------------------------------------------------------------------------
    15.
    <p class="TEXTA">I be<abc>lieve<def> in love<em>but
    only 
    if</em>you can ma</def>ke me smi</abc>le</p>
    
    16.
    <p class="TEXTA">I be<abc>lieve<def> in love<em>but
    only 
    if</em>you can ma</def>ke me smi</abc>le<abcde>
    ------------------------------------------------------------------------------------------------------------
    17. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</p>
    
    18. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</a>
    ------------------------------------------------------------------------------------------------------------
    19. <p class="TEXTA">I <code>believe </code>in love<em>but
          only if<123>you </123>can 
          make </em>me
       smile</p>
    
    20. <p class="TEXTA">I <code>believe </code>in love<em>but
          only if<123>you </123>can 
          make </em>me
       smile</tr>
    ------------------------------------------------------------------------------------------------------------
    21.<p class="TEXTA"></p>
    
    22.<p class="TEXTA"><script>
    ------------------------------------------------------------------------------------------------------------
    23.   <p class="TEXTA">.......<abc>.....<def>...
    ....</def>...........</abc>.........<123>........</123>......<456>....
    ...</456>............
    ........<Niv1>.......<Niv2>.........<Niv3>......
    ..............</Niv3>..........
    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
    .......</Niv2>..........
    .........</Niv1>...........</p>
    
    24.   <p class="TEXTA">.......<abc>.....<def>...
    ....</def>...........</abc>.........<123>........</123>......<456>....
    ...</456>............
    ........<Niv1>.......<Niv2>.........<Niv3>......
    ...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
    ..............</Niv3>..........
    .......</Niv2>..........
    .........</Niv1>...........<bla bla bla>
    ------------------------------------------------------------------------------------------------------------
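
    Python's built-in `re` module cannot run recursive patterns such as `(?1)`, so the sketch below swaps in a different technique: a plain stack of open tags instead of recursion. It is not the regex above, only an illustration of the same balanced-tag idea, under these assumptions: a block runs from one `<p class="TEXTA">` to the next, tag names need not be limited to `\w{1,10}`, and void or self-closing tags are not handled. `bad_ending` and `scan` are hypothetical names.

```python
import re

OPENER = '<p class="TEXTA">'
# group 1: "/" for a closing tag, group 2: the tag name
TAG = re.compile(r"<(/?)([^<>\s/]+)[^<>]*>")

def bad_ending(block):
    """Return the offending tag of one block, or None if </p> closes it."""
    open_tags = []  # (name, full tag text) of tags opened and not yet closed
    for m in TAG.finditer(block):
        is_close, name = m.group(1) == "/", m.group(2)
        if not is_close:
            open_tags.append((name, m.group(0)))
        elif open_tags and open_tags[-1][0] == name:
            open_tags.pop()      # balanced inner pair such as <em>...</em>
        elif not open_tags and name == "p":
            return None          # block correctly closed by </p>
        else:
            # stray closing tag: blame the oldest unclosed tag, else the stray tag
            return open_tags[0][1] if open_tags else m.group(0)
    return open_tags[0][1] if open_tags else None

def scan(text):
    # Everything between two openers is treated as one block.
    return [bad_ending(block) for block in text.split(OPENER)[1:]]

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>\n'
    '6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>\n'
)
print(scan(sample))
```

    With this sample, blocks 1 and 5 should come back as `None`, while blocks 2 and 6 report `<LLbr>` and `</title>`. Unlike the regex, this sketch does not require the wrong tag to end its line.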
    

    Best Regards,

    guy038



  • @guy038 said:

    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

    good morning. I tried both of your regexes. I don’t know why, but they don’t select line number 6, only lines 2 and 3.



  • @Neculai-I.-Fantanaru

    again, you will have to add an empty line below 6. if it is the last line of your test document.
    then @guy038 's regex will find line 6. correctly with your given example.
    so your document must end with an empty line in order for the regex to work.
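
    The reason is that a lookahead for a line break has nothing to find when the tag is the very last thing in the document. A small Python sketch of that effect, assuming the `re` module, with `\n` standing in for `\R` and `\Z` meaning "very end of the text"; the names `strict` and `relaxed` are illustrative only.

```python
import re

last_line = '6.<p class="TEXTA">smile</title>'  # no line break after the tag

strict = re.compile(r"</title>(?=\n)")       # needs a line break right after the tag
relaxed = re.compile(r"</title>(?=\n|\Z)")   # a line break OR the very end of the text

print(strict.search(last_line) is None)
print(relaxed.search(last_line) is not None)
print(strict.search(last_line + "\n") is not None)
```

    All three checks should print True: the strict form only matches once a line break follows the tag.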



  • yes, ok, but if I have an .html file, it will never finish with this line. :) For sure I will have a lot of lines and other tags after line six :)

    anyway, I get it. I removed the last part (?=\R) and it works.

    (?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>

    thank you @guy038

    And Happy New Year everyone !!

