regex: Match string not containing string

Meta Chuh

hi @Neculai-I.-Fantanaru
merry christmas to you too !!

this regex will work on your given example:

find what: ^(.*?)TESTA(.*?)(\r\n|\r|\n)
replace with: (leave empty)
search mode: regular expression
click on replace all

so from your example:

<p class="TESTA">I love you.<br>
<p class="TESTA">You love her.<br>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>

every line will be deleted except:

<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
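To check the idea outside Notepad++, here is a minimal Python sketch of the same search/replace. It assumes Python's `re` module, with `re.MULTILINE` standing in for Notepad++'s default behaviour of letting `^` match at every line start. It is only an illustration, not part of the original answer.

```python
import re

# Same pattern as above; re.MULTILINE lets ^ match at the start of every
# line, as Notepad++ does by default.
PATTERN = re.compile(r"^(.*?)TESTA(.*?)(\r\n|\r|\n)", re.MULTILINE)

text = (
    '<p class="TESTA">I love you.<br>\n'
    '<p class="TESTA">You love her.<br>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)

# An empty replacement deletes every line containing TESTA, line break included.
result = PATTERN.sub("", text)
print(result, end="")
```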

Neculai I. Fantanaru

yes, @gurikbal-singh, I was inspired by a previous topic, but it’s not the same thing. This is why I opened another topic, even if it’s similar.

Neculai I. Fantanaru

yes, I found the solution. I believe my mistake was that I also used (?-s) in a negative lookahead regex. It doesn’t work that way.

(?:(?! ).)*$

or

(.*.*)(?:(?!\b \b))(.*)$
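The first pattern relies on the "tempered greedy token" (?:(?!X).)*, which consumes one character at a time only where X does not start. The string inside the lookahead did not survive in the post, so this Python sketch assumes TEXTA as a placeholder and adds ^ so that only whole lines are matched.

```python
import re

# Tempered greedy token: (?:(?!TEXTA).)* consumes one character at a time,
# but only at positions where "TEXTA" does not start. Anchored between ^ and
# $, it can only match a whole line that contains no "TEXTA" at all.
# "TEXTA" is a placeholder (assumption): the original string was lost.
NO_TEXTA = re.compile(r"^(?:(?!TEXTA).)*$", re.MULTILINE)

text = "\n".join([
    '<p class="TESTA">I love you.<br>',
    '<p class="TEXTA">She loves me.<LLbr>',
    '<p class="TESTA">You love her.<br>',
])

kept = [m.group() for m in NO_TEXTA.finditer(text)]
print(kept)
```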

guy038

Hello, @neculai-i-fantanaru, and All

To match complete, non-empty lines which do NOT contain a specific string, let’s say the word TEXT, with that exact case, here are 5 regexes (the leading (?-is) makes the search case-sensitive and stops the dot from matching line breaks) :

Regex A : (?-is)^(?!.*TEXT).+\R matches any line which does NOT contain the string TEXT anywhere
Regex B : (?-is)^(?!^TEXT).+\R matches any line which does NOT begin with the string TEXT
Regex C : (?-is)^(?!.*TEXT$).+\R matches any line which does NOT end with the string TEXT
Regex D : (?-is)^(?!^TEXT|.*TEXT$).+\R matches any line which neither begins NOR ends with the string TEXT
Regex E : (?-is)^(?!^.+TEXT.+$).+\R matches any line which does NOT contain the string TEXT away from the line boundaries, i.e. with at least one character before AND after it

In the table below, the matched lines are marked with an X and will therefore be deleted if the Replace zone is empty

•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|              Lines Scanned              |  Regex A  |  Regex B  |  Regex C  |  Regex D  |  Regex E  |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  TEST : I love you.                     |     X     |     X     |     X     |     X     |     X     |
|  TEXT : She loves me.                   |           |           |     X     |           |     X     |
|  ABCD : It is not about me.             |     X     |     X     |     X     |     X     |     X     |
|  TEXT : You love her.                   |           |           |     X     |           |     X     |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  Statement "TEST" : I love you.         |     X     |     X     |     X     |     X     |     X     |
|  Statement "TEXT" : She loves me.       |           |     X     |     X     |     X     |           |
|  Statement "ABCD" : It is not about me. |     X     |     X     |     X     |     X     |     X     |
|  Statement "TEXT" : You love her.       |           |     X     |     X     |     X     |           |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  I love you.          = TEST            |     X     |     X     |     X     |     X     |     X     |
|  She loves me.        = TEXT            |           |     X     |           |           |     X     |
|  It is not about me.  = ABCD            |     X     |     X     |     X     |     X     |     X     |
|  You love her.        = TEXT            |           |     X     |           |           |     X     |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•

Remark : To test these regexes correctly, just copy the text below as is, making sure the last line is also followed by a line break, since each regex ends with \R :

TEST : I love you.
TEXT : She loves me.
ABCD : It is not about me.
TEXT : You love her.

Statement "TEST" : I love you.
Statement "TEXT" : She loves me.
Statement "ABCD" : It is not about me.
Statement "TEXT" : You love her.

I love you.          = TEST
She loves me.        = TEXT
It is not about me.  = ABCD
You love her.        = TEXT

Now, to match all complete, non-empty lines which do NOT begin with the expression <p class="TEXT">, possibly preceded by some blank characters, with that exact case, use the regex :

Regex F : (?-is)^(?!\h*<p class="TEXT">).+\R

•-------------------------------------------•-----------•
|               Lines Scanned               |  Regex F  |
•-------------------------------------------•-----------•
|  <p class="TEST">I love you.<br>          |     X     |
|      <p class="TEXT">She loves me.<br>    |           |
|  <p class="ABCD">It is not about me.<br>  |     X     |
|  <p class="TEXT">You love her.<br>        |           |
•-------------------------------------------•-----------•
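A Python sketch of Regex F follows. It rests on three assumptions: [ \t] approximates Boost's \h, \n stands in for \R, and the literal inside the lookahead is <p class="TEXT">, as the unmatched lines of the table suggest.

```python
import re

# Regex F in Python syntax (assumptions: [ \t] for \h, \n for \R, and the
# literal <p class="TEXT"> inside the lookahead).
REGEX_F = re.compile(r'^(?![ \t]*<p class="TEXT">).+\n', re.MULTILINE)

text = (
    '<p class="TEST">I love you.<br>\n'
    '    <p class="TEXT">She loves me.<br>\n'
    '<p class="ABCD">It is not about me.<br>\n'
    '<p class="TEXT">You love her.<br>\n'
)

matched = [m.group().rstrip("\n") for m in REGEX_F.finditer(text)]
print(matched)
```

The indented TEXT line is skipped because [ \t]* lets the lookahead see past the leading blanks.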

Best Regards,

guy038

Neculai I. Fantanaru

yes, @guy038. But what if I have the opposite scenario:

1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
   </p>

And I want to select all tags which contain <p class="TEXTA"> but do not contain </p>. So I want to select only lines 2 and 3. How can I do this?

Meta Chuh

@Neculai-I.-Fantanaru

this regex will work on your new example:

find what: ^(.*?)TEXTA(.*?)(.|\R)(.*?)</p>\R
replace with: (leave empty)
search mode: regular expression
click on replace all

so from a copy of your example:
(your text must end with an empty line, so that \R can detect the final line break of multi-line … tags like the 4. you’ve given in your example)

1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
   </p>

it will delete everything else and leave you with:

2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
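A Python sketch shows why the final empty line matters. It assumes the pattern ends with </p>\R (the literal before \R was lost from the post as shown), with (.|\n) playing the role of (.|\R) so that a match can run onto one extra line, and \n standing in for \R.

```python
import re

# Assumed pattern: the literal before the final \R is </p>; \n replaces \R.
PATTERN = re.compile(r"^(.*?)TEXTA(.*?)(.|\n)(.*?)</p>\n", re.MULTILINE)

example = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '   </p>'
)

# Without a final line break, block 4 survives because </p>\n cannot match.
without_break = PATTERN.sub("", example)
# With the final line break (the "empty line" mentioned above), block 4 goes too.
with_break = PATTERN.sub("", example + "\n")
print(with_break, end="")
```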

if this does not work with your real data, please provide us with a real data example and show what your result should look like

guy038

Hi, @neculai-i-fantanaru, and All

In that case, you could use the regex (?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>

Notes :

First, this regex looks for the literal expression <p class="TEXTA">, with that exact case
Followed by a non-empty range of characters, all different from < and >, up to a < symbol
Followed by a non-empty range of standard characters up to the nearest > symbol, but ONLY IF the string /p does not come right after the < symbol !

Just test it, with that text below :

<p class="TEXTA">She loves me.</p>
<p class="TEXTA">an other
test </abc>
<p class="TEXTA">She loves me.</p>      <p class="TEXTA">She loves me.</123>
   <p class="TEXTA">She loves me.<LLbr>    <p class="TEXTA">She loves me.</p>
   <p class="TEXTA">She loves me.
   </p>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>     <p class="TEXTA">It is
 not about
 me.<p>

<p class="TEXTA">She loves me.
   </p>
<p class="TEXTA">It is not about me.<AAbr>
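Here is a Python sketch of this regex, assuming it starts with the literal <p class="TEXTA"> described in the first note. The leading (?-i) is dropped because Python matching is case-sensitive by default.

```python
import re

# Assumed full regex: <p class="TEXTA"> literal, then text, then a tag not
# starting with /p. Note that [^<>]+ also matches line breaks.
BAD_END = re.compile(r'<p class="TEXTA">[^<>]+<(?!/p).+?>')

text = (
    '<p class="TEXTA">She loves me.</p>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
    '<p class="TEXTA">She loves me.\n'
    '   </p>\n'
)

matches = BAD_END.findall(text)
print(matches)
```

Because [^<>]+ can cross line breaks, the multi-line fourth block still reaches its </p> and is correctly rejected.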

As I suppose that you would like to replace any bad ending tag, like </abc>, </123>, <LLbr>, <AAbr>, or even <p>, with the right ending tag </p>, use the following regex S/R :

SEARCH <p class="TEXTA">[^<>]+<\K(?!/p).+?(?=>)

REPLACE /p

Notes :

If you only run the search part, it matches any bad ending tag other than /p, without its < and > boundaries
Remember that the \K syntax forces the regex engine to forget everything matched so far, so that the final match starts right after the < symbol !
If you click on the Replace All button ( not the Replace one ), any bad ending tag is then changed into </p>
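Python's re module has no \K, so this sketch swaps in a capturing group. Group 1 holds the part that \K would "forget", and the replacement writes it back before /p. The leading literal <p class="TEXTA"> is an assumption taken from the earlier notes.

```python
import re

# Capturing group instead of \K (a swapped-in technique, not the S/R above):
# group 1 is kept, and only the bad tag name that follows it is replaced.
FIX_END = re.compile(r'(<p class="TEXTA">[^<>]+<)(?!/p).+?(?=>)')

text = (
    '<p class="TEXTA">She loves me.</p>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)

fixed = FIX_END.sub(r"\1/p", text)
print(fixed, end="")
```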

Cheers,

guy038

Neculai I. Fantanaru

@guy038 said:

(?-i)[^<>]+<(?!/p).+?>

Your regex is great. But I just found another case, for which you may want to update the regex. Strangely, I did not take this into account: there can be other tags inside the same tag. For example:

<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>

So, to update my last scenario:

1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
   </p>
5.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
6.<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>

So, the regex should select lines 2, 3 and 6. Right now, your regex also selects line 5 (because of the inner <em> tag, which is not good).

guy038

@neculai-i-fantanaru, and All

Ah, OK ! So, I’ve created a regex using a recursive pattern ( the (?1) subroutine call to group 1 is located inside the very group it refers to ), which searches for any block :

Beginning with the tag <p class="TEXTA">
Ending with a tag, different from </p>, which ends the line
Containing any correctly matched areas <tag>.....</tag>, possibly juxtaposed and/or nested, as for instance :

<p class="TEXTA">.......<abc>.....<def>...
....</def>...........</abc>.........<123>........</123>......<456>....
...</456>............
........<Niv1>.......<Niv2>.........<Niv3>......
...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
..............</Niv3>..........
.......</Niv2>..........
.........</Niv1>...........<bla bla bla>

Highly unlikely case, isn’t it !

So, here is the regex :

(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

And, again, if you just want to catch the wrong ending tag, use the regex :

(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<\K(?!/p)[^<>]+?(?=>\R)

Test these regexes against the text below. Note that they match only the blocks with even numbers ( 2, 4, 6, … )

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. <p class="TEXTA">She loves me.</p>

2. <p class="TEXTA">She loves me.<LLbr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3. <p class="TEXTA">It is not
         about me
    </p>

4. <p class="TEXTA">It is not
         about me.
    <AAbr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>

6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7. <p class="TEXTA">I believe in love<em>but
       only 
     if</em>you can make me smile</p>

8. <p class="TEXTA">I believe in love<em>but
       only 
     if</em>you can make me smile</html>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9.  <p class="TEXTA">I believe in love<12345>but
    only 
    if</12345>you can make me smile</p>

10. <p class="TEXTA">I believe in love<12345>but
    only 
    if</12345>you can make me smile</div>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11. <p class="TEXTA">I believe<em> in love<em>but
  only 
if</em>you can ma</em>ke me smile</p>

12. <p class="TEXTA">I believe<em> in love<em>but
  only 
if</em>you can ma</em>ke me smile<h3>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
     only 
     if</em>you can ma</def>ke me smi</em>le</p>

14.  <p class="TEXTA">I be<em>lieve<def> in love<em>but
     only 
     if</em>you can ma</def>ke me smi</em>le</body>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15.
<p class="TEXTA">I be<abc>lieve<def> in love<em>but
only 
if</em>you can ma</def>ke me smi</abc>le</p>

16.
<p class="TEXTA">I be<abc>lieve<def> in love<em>but
only 
if</em>you can ma</def>ke me smi</abc>le<abcde>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</p>

18. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</a>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19. <p class="TEXTA">I <code>believe </code>in love<em>but
      only if<123>you </123>can 
      make </em>me
   smile</p>

20. <p class="TEXTA">I <code>believe </code>in love<em>but
      only if<123>you </123>can 
      make </em>me
   smile</tr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21.<p class="TEXTA"></p>

22.<p class="TEXTA"><script>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
23.   <p class="TEXTA">.......<abc>.....<def>...
....</def>...........</abc>.........<123>........</123>......<456>....
...</456>............
........<Niv1>.......<Niv2>.........<Niv3>......
..............</Niv3>..........
...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
.......</Niv2>..........
.........</Niv1>...........</p>

24.   <p class="TEXTA">.......<abc>.....<def>...
....</def>...........</abc>.........<123>........</123>......<456>....
...</456>............
........<Niv1>.......<Niv2>.........<Niv3>......
...<Niv4>.......<XXX>...........</XXX>............</Niv4>......
..............</Niv3>..........
...<Niv3>.......<XXX>...........</XXX>............</Niv3>......
.......</Niv2>..........
.........</Niv1>...........<bla bla bla>
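Python's re module cannot run the recursive (?1) part, so the sketch below swaps in an explicit stack to illustrate the same idea. The inner tags must be properly paired and nested, and the block must end with a tag other than </p>. The helper name wrong_ending_tag is hypothetical, and the check is looser than the regex: any tag name is accepted, and no text is required between tags.

```python
import re
from typing import Optional

# Hypothetical helper, NOT the regex itself: a stack replaces the recursion.
TAG = re.compile(r"<(/?)([^<>]+)>")
OPENER = '<p class="TEXTA">'


def wrong_ending_tag(block: str) -> Optional[str]:
    """Return the bad ending tag of a block, or None if the block is fine."""
    if not block.startswith(OPENER):
        return None
    body = block[len(OPENER):].rstrip("\r\n")
    tags = list(TAG.finditer(body))
    # The block must end with a tag.
    if not tags or tags[-1].end() != len(body):
        return None
    *inner, last = tags
    stack = []
    for tag in inner:
        slash, name = tag.group(1), tag.group(2)
        if not slash:
            stack.append(name)  # opening tag
        elif not stack or stack.pop() != name:
            return None  # closing tag without a matching opening tag
    if stack:
        return None  # opening tag that was never closed
    return None if last.group(0) == "</p>" else last.group(0)


# A few blocks from the test text above (numbers refer to that text).
BLOCKS = {
    5: '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>',
    6: '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>',
    13: '<p class="TEXTA">I be<em>lieve<def> in love<em>but\n     only \n'
        '     if</em>you can ma</def>ke me smi</em>le</p>',
    14: '<p class="TEXTA">I be<em>lieve<def> in love<em>but\n     only \n'
        '     if</em>you can ma</def>ke me smi</em>le</body>',
    21: '<p class="TEXTA"></p>',
    22: '<p class="TEXTA"><script>',
}

for number, block in BLOCKS.items():
    print(number, wrong_ending_tag(block))
```

For these blocks it prints None for the odd numbers and the offending tag for the even ones, mirroring the even/odd split described above.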

Best Regards,

guy038

Neculai I. Fantanaru

@guy038 said:

(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)

good morning. I tried both of your regexes, and I don’t know why, but they don’t select line number 6. Only lines 2 and 3.

Meta Chuh

@Neculai-I.-Fantanaru

again, you will have to add an empty line below 6. if it is the last line of your test document, because the final (?=\R) of @guy038's regex needs a line break right after the ending tag.
then the regex will find line 6. correctly with your given example.

Neculai I. Fantanaru

yes, ok, but in an .html file, I will never end with this line. :) For sure, I have a lot of lines and other tags after line six :)

anyway, I get it. I removed the last part, (?=\R), and it works.

(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>

thank you @guy038

And Happy New Year everyone !!
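Dropping (?=\R) may bring back the line 5 problem. The reduced Python sketch below keeps only the second alternative of the regex (plain text, then one tag), since Python's re cannot run the recursive (?1) part, and assumes the leading <p class="TEXTA"> literal. It suggests that without the lookahead, an inner opening tag such as <em> already passes the (?!/p) test. Ending the pattern with $ instead, which also matches at the very end of the text, is one possible alternative. That is an assumption of this sketch and was not tested in Notepad++ here.

```python
import re

# Reduced pattern (assumption): only the non-recursive second alternative.
PREFIX = r'<p class="TEXTA">[^<>]*<(?!/p)[^<>]+?>'
WITH_LOOKAHEAD = re.compile(PREFIX + r"(?=\n)")
WITHOUT_LOOKAHEAD = re.compile(PREFIX)
WITH_DOLLAR = re.compile(PREFIX + r"$", re.MULTILINE)

line5 = '<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>\n'
last_line = '<p class="TEXTA">She loves me.<LLbr>'  # no line break after it

for label, rx in [("(?=\\n)", WITH_LOOKAHEAD), ("none", WITHOUT_LOOKAHEAD), ("$", WITH_DOLLAR)]:
    # Does the pattern match line 5 (unwanted) and the last line (wanted)?
    print(label, bool(rx.search(line5)), bool(rx.search(last_line)))
```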