regex: Match string not containing string
-
Good day, and Merry Christmas !!
I have these lines, all starting with the tag <p class="TESTA"> and ending with the tag <br>, except the last two.
<p class="TESTA">I love you.<br>
<p class="TESTA">You love her.<br>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
My output result should be
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
I made a regex, but it is not very good. I used ?! and \b, but that does not seem to be a very good idea:
(?-s)(.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$
-
-
hi @Neculai-I.-Fantanaru
merry christmas to you too !!
this regex will work on your given example:
find what:
^(.*?)TESTA(.*?)(\r\n|\r|\n)
replace with: (leave empty)
search mode: regular expression
click on replace all
so from your example:
<p class="TESTA">I love you.<br>
<p class="TESTA">You love her.<br>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
every line will be deleted except:
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
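For anyone who wants to try the same deletion outside the editor, here is a minimal Python sketch. It assumes the sample is held in a string whose lines each end with a newline; the name delete_testa_lines is made up for this illustration, and Python's re module only stands in for the Notepad++ regex engine.

```python
import re

# Same idea as the "find what" pattern above: consume a whole line that
# contains TESTA, together with its line ending, and replace it with nothing.
# (?m) makes ^ match at the start of every line.
TESTA_LINE = re.compile(r"(?m)^(.*?)TESTA(.*?)(\r\n|\r|\n)")

def delete_testa_lines(text):
    """Return text without the lines that contain TESTA (hypothetical helper)."""
    return TESTA_LINE.sub("", text)

sample = (
    '<p class="TESTA">I love you.<br>\n'
    '<p class="TESTA">You love her.<br>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<AAbr>\n'
)

print(delete_testa_lines(sample), end="")
```

As in the editor, a line is only removed when it is followed by a line ending, because the pattern consumes that line ending.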
-
yes, @gurikbal-singh , I took inspiration from a previous topic, but it’s not the same thing. This is why I opened another topic, even if it’s close.
-
yes, I found the solution. I believe my mistake was the fact that I also used (?-s) in a negative lookahead regex. It doesn’t work this way. Instead:
<p class="TEXTA">(?:(?!<br>).)*$
or
(.*<p class="TEXTA">.*)(?:(?!\b<br>\b))(.*)$
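The first of these two patterns advances one character at a time, and only while <br> does not start at the current position. A small Python sketch of that behaviour follows; the helper name lacks_br is hypothetical and Python's re module is used in place of the editor's engine.

```python
import re

# After the opening tag, consume one character at a time, but only while
# "<br>" does not begin at the current position; $ then demands end of line.
NO_BR_AFTER_TEXTA = re.compile(r'<p class="TEXTA">(?:(?!<br>).)*$')

def lacks_br(line):
    """True if the line has the TEXTA tag and no <br> after it (hypothetical helper)."""
    return NO_BR_AFTER_TEXTA.search(line) is not None

for line in [
    '<p class="TEXTA">She loves me.<LLbr>',
    '<p class="TEXTA">She loves me.<br>',
    '<p class="TESTA">I love you.<br>',
]:
    print(lacks_br(line), line)
```

Note that <LLbr> passes because it contains "br>" but never the exact sequence "<br>".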
-
Hello, @neculai-i-fantanaru, and All
In order to match complete non-empty lines which do NOT contain a specific string, let’s say, the word TEXT, with that exact case, here are, below, 5 regexes :
- Regex A : (?-is)^(?!.*TEXT).+\R matches any line which does NOT contain the string TEXT
- Regex B : (?-is)^(?!^TEXT).+\R matches any line which does NOT contain the string TEXT, at beginning of line
- Regex C : (?-is)^(?!.*TEXT$).+\R matches any line which does NOT contain the string TEXT, at end of line
- Regex D : (?-is)^(?!^TEXT|.*TEXT$).+\R matches any line which does NOT contain the string TEXT, at beginning OR at end of line
- Regex E : (?-is)^(?!^.+TEXT.+$).+\R matches any line which does NOT contain the string TEXT, NOT at line boundaries
In the table, below, the lines matched are noted with an X and therefore, will be deleted, if the Replace zone is empty
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|              Lines Scanned              |  Regex A  |  Regex B  |  Regex C  |  Regex D  |  Regex E  |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  TEST : I love you.                     |     X     |     X     |     X     |     X     |     X     |
|  TEXT : She loves me.                   |           |           |     X     |           |     X     |
|  ABCD : It is not about me.             |     X     |     X     |     X     |     X     |     X     |
|  TEXT : You love her.                   |           |           |     X     |           |     X     |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  Statement "TEST" : I love you.         |     X     |     X     |     X     |     X     |     X     |
|  Statement "TEXT" : She loves me.       |           |     X     |     X     |     X     |           |
|  Statement "ABCD" : It is not about me. |     X     |     X     |     X     |     X     |     X     |
|  Statement "TEXT" : You love her.       |           |     X     |     X     |     X     |           |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
|  I love you. = TEST                     |     X     |     X     |     X     |     X     |     X     |
|  She loves me. = TEXT                   |           |     X     |           |           |     X     |
|  It is not about me. = ABCD             |     X     |     X     |     X     |     X     |     X     |
|  You love her. = TEXT                   |           |     X     |           |           |     X     |
•-----------------------------------------•-----------•-----------•-----------•-----------•-----------•
Remark : Of course, for correct testing of these regexes, just copy the text provided, in that way :
TEST : I love you.
TEXT : She loves me.
ABCD : It is not about me.
TEXT : You love her.
Statement "TEST" : I love you.
Statement "TEXT" : She loves me.
Statement "ABCD" : It is not about me.
Statement "TEXT" : You love her.
I love you. = TEST
She loves me. = TEXT
It is not about me. = ABCD
You love her. = TEXT
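The table can be re-checked with a short Python sketch. This is only an approximation of the Notepad++ behaviour: Python's re module has no \R and does not accept a global (?-is), so the sketch tests each line on its own and relies on Python's defaults (case-sensitive, dot not matching newlines). The names PATTERNS and matching_regexes are invented for the illustration.

```python
import re

# Python ports of regexes A to E. The (?-is) prefix is dropped because Python's
# re is case-sensitive and keeps "." off newlines by default; \R is dropped
# because every line is tested separately, without its line ending.
PATTERNS = {
    "A": r"^(?!.*TEXT).+",
    "B": r"^(?!^TEXT).+",
    "C": r"^(?!.*TEXT$).+",
    "D": r"^(?!^TEXT|.*TEXT$).+",
    "E": r"^(?!^.+TEXT.+$).+",
}

def matching_regexes(line):
    """Return the letters of the regexes that match (and would delete) the line."""
    return "".join(
        name for name, pattern in PATTERNS.items() if re.match(pattern, line)
    )

for line in [
    "TEST : I love you.",
    "TEXT : She loves me.",
    'Statement "TEXT" : She loves me.',
    "She loves me. = TEXT",
]:
    print(f"{line:36} {matching_regexes(line)}")
```

Each printed string of letters corresponds to the X marks of one table row.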
Now, to match all complete non-empty lines which do NOT begin with the expression <p class="TEXT">, possibly preceded by some blank characters, with that exact case, use the regex :
Regex F : (?-is)^(?!\h*<p class="TEXT">).+\R
•-------------------------------------------•-----------•
|               Lines Scanned               |  Regex F  |
•-------------------------------------------•-----------•
|  <p class="TEST">I love you.<br>          |     X     |
|  <p class="TEXT">She loves me.<br>        |           |
|  <p class="ABCD">It is not about me.<br>  |     X     |
|  <p class="TEXT">You love her.<br>        |           |
•-------------------------------------------•-----------•
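Regex F can be sketched in Python as well, with two substitutions that are assumptions of this example: [ \t]* replaces \h* and \n replaces \R, since Python's re module supports neither escape. The helper name keep_text_paragraphs is hypothetical.

```python
import re

# Delete every non-empty line that does not start (after optional blanks)
# with <p class="TEXT">. (?m) lets ^ match at the start of each line.
NOT_TEXT_PARAGRAPH = re.compile(r'(?m)^(?![ \t]*<p class="TEXT">).+\n')

def keep_text_paragraphs(text):
    """Return only the lines beginning with <p class="TEXT"> (hypothetical helper)."""
    return NOT_TEXT_PARAGRAPH.sub("", text)

sample = (
    '<p class="TEST">I love you.<br>\n'
    '  <p class="TEXT">She loves me.<br>\n'
    '<p class="ABCD">It is not about me.<br>\n'
    '<p class="TEXT">You love her.<br>\n'
)

print(keep_text_paragraphs(sample), end="")
```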
Best Regards,
guy038
-
-
yes, @guy038 . but what if I have the opposite scenario:
1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
</p>
And I want to select all tags which contain <p class="TEXTA"> but do not contain </p>. So I want to select only lines 2 and 3. How can I do this ?
-
this regex will work on your new example:
find what:
^(.*?)TEXTA(.*?)(.|\R)(.*?)</p>\R
replace with: (leave empty)
search mode: regular expression
click on replace all
so from a copy of your example:
(it has to have an empty line after all texts for a correct newline \R detection on multi line <p>…</p> tags like the 4. you’ve given in your example)
1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
</p>
it will delete everything else and leave you with:
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
if this does not work with your real data, please provide us with a real data example and how your result should look like
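A rough Python equivalent of this replacement is sketched below, assuming Unix line endings so that \n can stand in for \R (which Python's re module does not support). The helper name delete_closed_blocks is invented for the example.

```python
import re

# Port of the "find what" pattern: a line containing TEXTA, optionally
# continued on the next line, ending with </p> and a line break.
CLOSED_BLOCK = re.compile(r"(?m)^(.*?)TEXTA(.*?)(.|\n)(.*?)</p>\n")

def delete_closed_blocks(text):
    """Remove TEXTA blocks that are closed by </p> (hypothetical helper)."""
    return CLOSED_BLOCK.sub("", text)

sample = (
    '1. <p class="TEXTA">She loves me.</p>\n'
    '2. <p class="TEXTA">She loves me.<LLbr>\n'
    '3. <p class="TEXTA">It is not about me.<AAbr>\n'
    '4. <p class="TEXTA">She loves me.\n'
    '</p>\n'
)

print(delete_closed_blocks(sample), end="")
```

The final line break after </p> is required for block 4 to be removed, which is the reason for the empty line mentioned above.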
-
Hi, @neculai-i-fantanaru, and All
In that case, you could use the regex
(?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>
Notes :
- First, this regex looks for the literal expression <p class="TEXTA">, with that exact case
- Followed with a non-empty range of characters, all different from < and >, till a < symbol
- Followed with a non-empty range of standard characters till the nearest > symbol, but ONLY IF the string /p cannot be found, right after the < symbol !
Just test it, with that text below :
<p class="TEXTA">She loves me.</p>
<p class="TEXTA">an other test </abc>
<p class="TEXTA">She loves me.</p>
<p class="TEXTA">She loves me.</123>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">She loves me.</p>
<p class="TEXTA">She loves me. </p>
<p class="TEXTA">She loves me.<LLbr>
<p class="TEXTA">It is not about me.<AAbr>
<p class="TEXTA">It is not about me.<p>
<p class="TEXTA">She loves me. </p>
<p class="TEXTA">It is not about me.<AAbr>
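The same search can be tried in Python on a few of these lines. This is a sketch, not the editor's engine: the (?-i) prefix is left out because Python's re is case-sensitive by default, and the name find_unclosed is hypothetical.

```python
import re

# Opening tag, then text without < or >, then a tag whose content
# does not start with /p.
UNCLOSED = re.compile(r'<p class="TEXTA">[^<>]+<(?!/p).+?>')

def find_unclosed(text):
    """Return every TEXTA block whose first following tag is not </p>."""
    return UNCLOSED.findall(text)

sample = (
    '<p class="TEXTA">She loves me.</p>\n'
    '<p class="TEXTA">an other test </abc>\n'
    '<p class="TEXTA">She loves me.<LLbr>\n'
    '<p class="TEXTA">It is not about me.<p>\n'
)

for block in find_unclosed(sample):
    print(block)
```

Only the last three lines are reported; the first one is skipped because /p follows the < symbol.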
As I suppose that you would like to replace any bad ending tag, like </abc>, </123>, <LLbr>, <AAbr>, or even <p> with the right ending tag </p>, use the following regex S/R :
SEARCH <p class="TEXTA">[^<>]+<\K(?!/p).+?(?=>)
REPLACE /p
Notes :
- If you just perform the search part, it just matches any bad ending tag, without the < and > boundaries, which is different from /p
- Remember that the \K syntax forces the regex engine to forget everything already matched and reset the working position to the location, right after the < symbol !
- If you click on the Replace All button ( not the Replace one ), any bad ending tag is then changed into </p>
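Python's re module has no \K, so a Python sketch of this S/R has to keep the leading part in a capture group and write it back in the replacement. That substitution, and the helper name fix_closing_tags, are assumptions of this example rather than part of the regex above.

```python
import re

# Group 1 plays the role of everything before \K: it is matched, then
# re-inserted unchanged. Only the tag content after "<" is replaced.
BAD_CLOSING = re.compile(r'(<p class="TEXTA">[^<>]+<)(?!/p).+?(?=>)')

def fix_closing_tags(text):
    """Turn a wrong tag after a TEXTA paragraph into </p> (hypothetical helper)."""
    return BAD_CLOSING.sub(r"\g<1>/p", text)

for line in [
    '<p class="TEXTA">She loves me.<LLbr>',
    '<p class="TEXTA">She loves me.</123>',
    '<p class="TEXTA">It is not about me.<p>',
    '<p class="TEXTA">She loves me.</p>',
]:
    print(fix_closing_tags(line))
```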
Cheers,
guy038
-
-
@guy038 said:
(?-i)<p class="TEXTA">[^<>]+<(?!/p).+?>
Your regex is great. But I just found another case, for which you may want to update the regex. Strange thing, I did not take this into account: there can be other tags inside the same tag. For example:
<p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
So, to update my last scenario:
1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
3. <p class="TEXTA">It is not about me.<AAbr>
4. <p class="TEXTA">She loves me.
</p>
5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
So, the regex should select lines 2, 3 and 6. Right now, your regex also selects line 5 (because of the <em></em> pair, which is not good).
-
@neculai-i-fantanaru, and All
Ah, OK ! So, I’ve created a regex, using a recursive pattern ( due to the (?1) subroutine call to group 1, located inside the very group it refers to ), which allows the search of any block :
- Beginning with the tag <p class="TEXTA">
- Ending with a tag, different from </p>, which ends the line
- Containing any correctly matched areas <tag>.....</tag>, possibly juxtaposed and/or nested, as for instance :
<p class="TEXTA">.......<abc>.....<def>... ....</def>...........</abc>.........<123>........</123>......<456>.... ...</456>............ ........<Niv1>.......<Niv2>.........<Niv3>...... ...<Niv4>.......<XXX>...........</XXX>............</Niv4>...... ..............</Niv3>.......... .......</Niv2>.......... .........</Niv1>...........<bla bla bla>
Highly unlikely case, isn’t it !
So, here is the regex :
(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)
And, again, if you just want to catch the wrong ending tag use the regex :
(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<\K(?!/p)[^<>]+?(?=>\R)
Test these regexes, against text below. Note that they match only the blocks with even numbers ( 2, 4, 6, … )
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. <p class="TEXTA">She loves me.</p>
2. <p class="TEXTA">She loves me.<LLbr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3. <p class="TEXTA">It is not about me </p>
4. <p class="TEXTA">It is not about me. <AAbr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>
8. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</html>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9. <p class="TEXTA">I believe in love<12345>but only if</12345>you can make me smile</p>
10. <p class="TEXTA">I believe in love<12345>but only if</12345>you can make me smile</div>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11. <p class="TEXTA">I believe<em> in love<em>but only if</em>you can ma</em>ke me smile</p>
12. <p class="TEXTA">I believe<em> in love<em>but only if</em>you can ma</em>ke me smile<h3>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13. <p class="TEXTA">I be<em>lieve<def> in love<em>but only if</em>you can ma</def>ke me smi</em>le</p>
14. <p class="TEXTA">I be<em>lieve<def> in love<em>but only if</em>you can ma</def>ke me smi</em>le</body>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15. <p class="TEXTA">I be<abc>lieve<def> in love<em>but only if</em>you can ma</def>ke me smi</abc>le</p>
16. <p class="TEXTA">I be<abc>lieve<def> in love<em>but only if</em>you can ma</def>ke me smi</abc>le<abcde>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</p>
18. <p class="TEXTA">I <ab>believe </ab>in love<em>but only if</em>you <123>can make </123>me smile</a>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19. <p class="TEXTA">I <code>believe </code>in love<em>but only if<123>you </123>can make </em>me smile</p>
20. <p class="TEXTA">I <code>believe </code>in love<em>but only if<123>you </123>can make </em>me smile</tr>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21. <p class="TEXTA"></p>
22. <p class="TEXTA"><script>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
23. <p class="TEXTA">.......<abc>.....<def>... ....</def>...........</abc>.........<123>........</123>......<456>.... ...</456>............ ........<Niv1>.......<Niv2>.........<Niv3>...... ..............</Niv3>.......... ...<Niv4>.......<XXX>...........</XXX>............</Niv4>...... .......</Niv2>.......... .........</Niv1>...........</p>
24. <p class="TEXTA">.......<abc>.....<def>... ....</def>...........</abc>.........<123>........</123>......<456>.... ...</456>............ ........<Niv1>.......<Niv2>.........<Niv3>...... ...<Niv4>.......<XXX>...........</XXX>............</Niv4>...... ..............</Niv3>.......... ...<Niv3>.......<XXX>...........</XXX>............</Niv3>...... .......</Niv2>.......... .........</Niv1>...........<bla bla bla>
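Python's built-in re module does not support recursive subroutine calls such as (?1), so the sketch below swaps in a different technique: it collects the tags after <p class="TEXTA"> and checks their nesting with a stack. It is looser than the regex above (no \w{1,10} limit on tag names, no requirement for text between tags), and the name bad_closing_tag is made up for the illustration.

```python
import re

TAG = re.compile(r"<(/?)([^<>]+)>")
PREFIX = '<p class="TEXTA">'

def bad_closing_tag(block):
    """Return the content of the final tag if it is not /p, else None.

    Inner tags must be properly paired and nested, otherwise None is
    returned as well (hypothetical helper, not the regex from the post).
    """
    start = block.find(PREFIX)
    if start < 0:
        return None
    body = block[start + len(PREFIX):].rstrip()
    tags = list(TAG.finditer(body))
    # The block has to end with a tag.
    if not tags or tags[-1].end() != len(body):
        return None
    *inner, last = tags
    stack = []
    for tag in inner:
        closing, name = tag.group(1), tag.group(2)
        if closing:
            # A closing tag must pair with the most recent open tag.
            if not stack or stack.pop() != name:
                return None
        else:
            stack.append(name)
    if stack:
        return None
    final = last.group(1) + last.group(2)
    return None if final == "/p" else final

print(bad_closing_tag('5. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</p>'))
print(bad_closing_tag('6. <p class="TEXTA">I believe in love<em>but only if</em>you can make me smile</title>'))
```

The return value corresponds to what the second regex (the one with \K) is meant to capture: the wrong ending tag without its < and > boundaries.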
Best Regards,
guy038
-
-
@guy038 said:
(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>(?=\R)
good morning. I tried both of your regexes. I don’t know why, but they don’t select line number 6, only lines 2 and 3.
-
again, you will have to add an empty line below 6. if it is the last line of your test document.
then @guy038 's regex will find line 6. correctly with your given example.
so your document must end with an empty line in order for the regex to work.
-
yes, ok, but if I have an .html file, it will never finish with this line. :) For sure I have a lot of lines and other tags after line six :)
anyway, I get it. I removed the last part (?=\R) and it works:
(?-i)<p class="TEXTA">(?:([^<>]+<(\w{1,10})>([^<>]+|(?1))</\2>[^<>]+)+|[^<>]*)<(?!/p)[^<>]+?>
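Why the trailing lookahead mattered can be shown with a small Python sketch. It uses a simplified, non-recursive pattern and \n in place of \R, both of which are assumptions of this example; the variant that also accepts the end of the text is one possible alternative to removing the lookahead entirely.

```python
import re

# Simplified stand-in for the pattern above (no nested tags handled).
BASE = r'<p class="TEXTA">[^<>]*<(?!/p)[^<>]+?>'

needs_newline = re.compile(BASE + r"(?=\n)")        # like a trailing (?=\R)
newline_or_end = re.compile(BASE + r"(?=\n|\Z)")    # also accepts end of text

last_line = '<p class="TEXTA">It is not about me.<AAbr>'

# No line break after the final tag: the first variant finds nothing.
print(needs_newline.search(last_line))
print(newline_or_end.search(last_line).group())

# With a line break after the tag, both variants match.
print(needs_newline.search(last_line + "\n").group())
```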
thank you @guy038
And Happy New Year everyone !!