Regex: Find those tags that contain a string, but which do not contain other string
-
1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name</a> is beyond magic.</p>
2. <p class="mb-40px">I love my home s< because I stay with my lovely cat.</p>
3. <p class="mb-40px">Because of this book t< I cannot sleep well.</p>
I want to find only the lines that have the operator < inside the HTML tag <p class="mb-40px"> </p> , except those lines that have
In my example above, the output should be line 2 (which has s<) and line 3 (which has t<).
So, I use @guy038's generic formula: (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))
In my case, FIND: (<p class="mb-40px">)+(.)+\K(\w<)(?s:(?=.*(</p>)))
The problem is that my regex also finds the e</a> from the first line, and I don't want to find the tags with </a>.
Maybe @guy038 has a better generic formula for this kind of problem.
-
Hello, @rodica-f and All,
Just consider this example :
1. <p class="mb-40px">My nick name is Prince and <a href="https://mywebsite.com/bla.html" class="color-gege" target="_new">my real name </a> is z<beyond f< magic.</p>
2. <p class="Test">I love my home s< because I stay with my b<lovely cat.</p>
3. <p class="mb-40px">Because of this book t<I cannot a< sleep well.</p>
Within this text :
- Two tags begin with <p class="mb-40px"> and one begins with <p class="Test">
- Each <p tag contains two < operators ( one followed by a space char, the other followed by a letter )
So :
- To find, in any <p... tag, any string \w< preceded by a space char, use the regex :
SEARCH / MARK
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<
- To find, in any <p... tag, any string \w< both preceded and followed by a space char, use the regex :
SEARCH / MARK
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)
- To find, in the specific tag <p class="mb-40px">, any string \w< preceded by a space char, use the regex :
SEARCH / MARK
(?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<
- To find, in the specific tag <p class="mb-40px">, any string \w< both preceded and followed by a space char, use the regex :
SEARCH / MARK
(?-si:<p class="mb-40px">|(?!\A)\G)(?s-i:(?!</p>).)*?\x20\K\w<(?=\x20)
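For readers who want to check the same idea outside Notepad++, here is a minimal Python sketch. It is not a port of the regexes above: the standard `re` module has neither `\K` nor `\G`, so the sketch assumes a two-pass approach instead (first isolate each region from the opening tag up to, but not including, the closing tag, then search for the target inside that region). The function name `find_in_region` and its parameters are made up for this illustration, and capture group 1 of the target pattern plays the role of the text kept after `\K`.

```python
import re

# Sample text from the example above, one <p> tag per line.
TEXT = (
    '<p class="mb-40px">My nick name is Prince and '
    '<a href="https://mywebsite.com/bla.html" class="color-gege" '
    'target="_new">my real name </a> is z<beyond f< magic.</p>\n'
    '<p class="Test">I love my home s< because I stay with my b<lovely cat.</p>\n'
    '<p class="mb-40px">Because of this book t<I cannot a< sleep well.</p>\n'
)


def find_in_region(text, begin, end, find):
    """Return (offset, matched_text) for group 1 of every `find` match
    located between a `begin` match and the next `end` match.

    Hypothetical helper: `begin`, `end` and `find` are regex strings, and
    `find` must contain one capture group around the part to report.
    """
    # Region body: any chars (newlines included) as long as `end` does not start here.
    region = re.compile(rf"(?:{begin})((?s:(?:(?!{end}).)*))")
    target = re.compile(find)
    hits = []
    for reg in region.finditer(text):
        # Restrict the second search to the region body only.
        for m in target.finditer(text, reg.start(1), reg.end(1)):
            hits.append((m.start(1), m.group(1)))
    return hits


# \w< preceded by a space, inside <p class="mb-40px"> tags only.
specific = find_in_region(TEXT, r'<p class="mb-40px">', r'</p>', r'\x20(\w<)')
# Same search inside any <p class="..."> tag.
any_class = find_in_region(TEXT, r'<p class=".+?">', r'</p>', r'\x20(\w<)')
# \w< preceded AND followed by a space, inside <p class="mb-40px"> tags only.
spaced = find_in_region(TEXT, r'<p class="mb-40px">', r'</p>', r'\x20(\w<)(?=\x20)')
```

With the sample text, `specific` reports `z<`, `f<`, `t<` and `a<`, while `spaced` keeps only `f<` and `a<`. The `e </a>` sequence of the first line is never reported, because no word character sits between the space and the `<`.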
Best Regards,
guy038
P.S. :
BTW, no need to use a new profile. You’re certainly @robin-cruise !
-
-
@guy038 thanks for the solution.
1 mobile account and 1 desktop account. No difference. It’s all about where you are at that time…
-
@guy038 So, the generic formulas for this kind of problem (contains one string, but doesn't contain another string) should be these:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\K(FR)
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\x20\KFR(?=\x20)
BSR (begin part) = <p class="mb-40px">
ESR (end part) = </p>
FR (FIND regex) = \w<
-
Hi, @rodica-f and All,
Just a reminder :
- Don't forget to move the caret to the very beginning of the file, before running the regex, with the Ctrl + Home shortcut
BR
guy038
-
@guy038 but what if I have the following case?
I must use a regex to find all lines which contain <p class="sd-23"> but do not contain the closing tag </p>
<p class="sd-23">Somebody to love</p>
<p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
<p class="sd-23">Holy Birth of God, have mercy on us!</p>
My regex doesn't work at all. It should have found the second line.
FIND:
(?<p class="sd-23">).*(?!</p>)
-
Hello, @Robin-cruise and All,
Well, not so difficult ! I assume that each line must end with the </p> tag and that you're not speaking about any multi-line block !
Then, use the following regex in order to find all the lines beginning with <p class="sd-23"> and not ending with the </p> tag :
(?-i)\h*<p class="sd-23">((?!</p>).)*$
For instance, using this four-lines text :
<p class="sd-23">Somebody to love</p>
<p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing to his daughter Helen.
<p class="sd-23">
<p class="sd-23">Holy Birth of God, have mercy on us!</p>
The regex would select the entire lines 2 and 3 !
Notes :
- The regex finds, first, the string <p class="sd-23">, with this exact case, after possible leading blank characters
- Then, it grasps all remaining text ( .* ) till the end of the current line ( $ )…
- …But ONLY IF it does not meet the </p> tag at any position after <p class="sd-23">, till the end of current line
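The notes above can be tried out with a short Python sketch. Two adjustments are assumptions of this sketch, not part of the Notepad++ regex: the stdlib `re` module has no `\h`, so the explicit class `[\t\x20\xa0]` stands in for it, and the leading `(?-i)` is dropped because `re` is case-sensitive by default. `re.MULTILINE` makes `$` match at the end of each line.

```python
import re

# The four-line sample text from the post above.
LINES = (
    '<p class="sd-23">Somebody to love</p>\n'
    '<p class="sd-23">In 1495, the Grand Prince gave this icon as a blessing '
    'to his daughter Helen.\n'
    '<p class="sd-23">\n'
    '<p class="sd-23">Holy Birth of God, have mercy on us!</p>\n'
)

# ((?!</p>).)* consumes one character at a time, but only while the
# closing tag </p> does not start at the current position.
UNCLOSED = re.compile(
    r'[\t\x20\xa0]*<p class="sd-23">((?!</p>).)*$',
    re.MULTILINE,
)

unclosed_lines = [m.group(0) for m in UNCLOSED.finditer(LINES)]
for line in unclosed_lines:
    print(line)
```

On the sample text, only the second and third lines are collected, which agrees with the result described above.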
Best Regards,
guy038
-
-
@guy038 thanks ! But I don't understand what this part does:
(?-i)\h*
-
Hi, @robin-cruise,
- The (?-i) part means that, from that point, the search will be sensitive to case. So, it will match the string <p class="sd-23">, but not, for instance, the string <P class="sd-23"> nor the string <p CLASS="sd-23"> !
- Then, the \h character class represents any horizontal blank character ( so, either the \t [ Tabulation ] char, the \x20 [ Space ] char or the \xa0 [ No-breaking Space ] char )
- Thus, the \h* syntax represents any range of horizontal blank chars, from 0 to n
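Both points can be illustrated with a small Python sketch. As an assumption of this sketch, the name `H` is a hypothetical stand-in for `\h` limited to the three characters listed above, since the stdlib `re` module does not know `\h`. Case sensitivity is the default in `re`, and `re.IGNORECASE` shows what the search would accept without `(?-i)`.

```python
import re

# Hypothetical stand-in for \h : tab, space, no-break space.
H = r'[\t\x20\xa0]'

# Case-sensitive pattern (the behaviour that (?-i) forces in Notepad++).
TAG = re.compile(H + r'*<p class="sd-23">')
# Same pattern, but insensitive to case, for comparison.
TAG_ANYCASE = re.compile(H + r'*<p class="sd-23">', re.IGNORECASE)

samples = [
    '<p class="sd-23">',          # no leading blank chars (the "0" case of \h*)
    '\t \xa0<p class="sd-23">',   # tab + space + no-break space before the tag
    '<P class="sd-23">',          # upper-case tag name
    '<p CLASS="sd-23">',          # upper-case attribute name
]

sensitive = [bool(TAG.fullmatch(s)) for s in samples]
insensitive = [bool(TAG_ANYCASE.fullmatch(s)) for s in samples]
```

Here `sensitive` accepts only the first two samples, whereas `insensitive` accepts all four.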
BR
guy038
-