How to find words with 12 or more alphabets that are between a <li style...........> and </li>
-
Block of text for testing:-
<div class=“right”>
<ol>
<li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Haemorrhoids <br>- piles</div></li>
<li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Haemorrhoids<br>-piles</div></li>
<li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Offensive haemorrhages</div></li>
<li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>CONSCIOUSNESS of womb. Hysterically inclined.</div></li>
</ol>
I tried this regular expression, to no avail:
<li style[^<>]*+>.*([a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z][a-z]).*</li>
Haemorrhoids, haemorrhages, CONSCIOUSNESS and Hysterically should all be found/matched (the search should be case-insensitive). Every other word with 12 or more letters should also be found/matched. Notice that I have put the word in a capturing group, for replacement.
-
Hello, @dr-ramaanand and All,
For your problem, I would use the second form of the generic regex, described in this post :
The SEARCH regex is
(?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)
Note that the key point of that generic regex is the use of the \G regex feature, which means that the next match MUST begin right after the previous match !
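To make the "next match must begin right after the previous match" idea concrete, here is a small Python sketch. Python's standard re module has no \G, so the sketch emulates it with Pattern.match(text, pos), which only succeeds if a match starts exactly at pos. The TOKEN pattern and the function name are made up for this illustration and are not part of the regex above.

```python
import re

# Hypothetical token pattern: a run of lowercase letters, optionally followed by a comma
TOKEN = re.compile(r"[a-z]+,?")

def contiguous_matches(text):
    """Collect matches that each start exactly where the previous one ended."""
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)  # must match exactly at pos, like \G
        if m is None:
            break                   # the chain is broken, so stop here
        out.append(m.group(0))
        pos = m.end()
    return out

print(contiguous_matches("ab,cd,ef gh,ij"))  # ['ab,', 'cd,', 'ef']
```

The space after "ef" breaks the chain, so "gh," and "ij" are never reached. In the generic regex, the BSR alternative is what lets a new chain start.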
If we apply this generic regex to your practical search, it means that :
- The BSR ( Begin Search region Regex ) is
<li style=
- The FR ( Find Regex ) is
(?i)(\b[a-z]{12,}\b)
Note that I changed the initial case-sensitive FR region
(?-i:FR)
, embedded in a non-capturing group, to the case-insensitive region, embedded in group 1
(?i)(\b[a-z]{12,}\b)
This leads to this functional regex, below, which solves your practical case :
SEARCH
(?-is:<li style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b)
REPLACE RR
Thus :
- Put the INPUT text, below , in a new tab
<div class=“right”> <ol> <li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Haemorrhoids <br>- piles</div></li> <li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Haemorrhoids<br>-piles</div></li> <li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>Offensive haemorrhages</div></li> <li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”><div class=“marginleft”>CONSCIOUSNESS of womb. Hysterically inclined.</div></li> </ol>
- Move to the very beginning of the file
- Open the Find or Replace dialog
- Uncheck all the box options
- Check the Wrap around option
- Select the Regular expression search mode
- Click, several times, on the Next button to verify the different matches or click, once only, on the Replace All button for a global replacement
Note : if your text may contain accented characters, I advise you to prefer this version :
- SEARCH
(?-is:<li style=|(?!\A)\G).*?\K(?i)(\b[\u\l]{12,}\b)
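For readers who want to check the expected matches outside Notepad++, here is a minimal Python sketch. It is not the regex above: Python's standard re module supports neither \G nor \K, so the sketch swaps in a two-step approach (locate the case-sensitive BSR on each line, then collect every FR match from there to the end of that line). The function name and the shortened style attributes in the sample are assumptions made for the illustration.

```python
import re

BSR = re.compile(r"<li style=")                    # case-sensitive start of the search region
FR = re.compile(r"\b[a-z]{12,}\b", re.IGNORECASE)  # words of 12 or more ASCII letters

def find_long_words(text):
    """Return every 12+ letter word located after '<li style=' on the same line."""
    found = []
    for line in text.splitlines():
        start = BSR.search(line)
        if start is None:
            continue  # no search region on this line
        # Like the (?-s) regex, keep scanning up to the end of the line
        found.extend(m.group(0) for m in FR.finditer(line, start.end()))
    return found

# Sample with shortened style attributes (an assumption, to keep the lines readable)
sample = '''<ol>
<li style="color: black;"><div class="marginleft">Haemorrhoids <br>- piles</div></li>
<li style="color: black;"><div class="marginleft">Offensive haemorrhages</div></li>
<li style="color: black;"><div class="marginleft">CONSCIOUSNESS of womb. Hysterically inclined.</div></li>
</ol>'''

print(find_long_words(sample))
# ['Haemorrhoids', 'haemorrhages', 'CONSCIOUSNESS', 'Hysterically']
```

Like the search regex, this sketch does not stop at </li> : it scans everything that follows <li style= on the same line, including attribute values and tag names.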
Best Regards,
guy038
-
@guy038 The word “Hysterically” is not found/matched, probably because it is the second word with 12 or more letters on the same line. Can you tweak that regex to find such words too?
-
@guy038
(?-is:<li style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b).*?\K(\b[a-z]{12,}\b)
finds/matches only the second word with 12 or more letters. So I can probably use this regex, make my changes, and then use the one you gave to find the first word with 12 or more letters.
-
Hi, @dr-ramaanand and All,
I don’t understand ! With the
(?-is:<li style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b)
regex, once the search regex has matched the word CONSCIOUSNESS, if you click again on the Find Next button, it does find the word Hysterically !
If it’s not the case, just post the EXACT raw text used
You do not need your second version !
BR
guy038
P.S. : if your strings
<li style=
always begin a line, you may narrow down the results with this version :
- SEARCH
(?-is:^<li style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b)
-
@guy038 It doesn’t begin at the start of the line, so I will not use your last regex (just above this response of mine). At regex101.com your regular expression does find that second word with 12 or more letters (see https://regex101.com/r/Dw8XTK/1) but it doesn’t on my laptop. It may be due to a bug, but I will manage. I don’t want to waste your time. Thanks for your help and time!
-
@guy038 I have another block of text for testing as follows:-
<span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Haemorrhoids <br>- piles</span>
<span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Haemorrhoids<br>-piles</span>
<span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Offensive haemorrhages</span>
<span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>CONSCIOUSNESS of womb. Hysterically inclined.</span>
The regular expression
(?-is:<span style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b).*?\K(\b[a-z]{12,}\b)
does not find/match anything
-
@guy038 However,
(?-is:<span style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b)
does find/match it
-
@dr-ramaanand said in How to find words with 12 or more alphabets that are between a <li style...........> and </li>:
(?-is:<span style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b)
How can I make the search stop upon encountering a </span>?
-
@guy038 The regular expression
(?-is:<span style=|(?!\A)\G).*?\K(?i)(\b[a-z]{12,}\b).*?<\/span>
helps find words of 12 letters or more between <span style=.....> and </span>. Is it correct?
-
Block of text for testing:-
<span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Haemorrhoids <br>- piles</span> <span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Haemorrhoids<br>-piles</span> <span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Offensive haemorrhages</span> <span style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>CONSCIOUSNESS of womb. Hysterically inclined.</span> <p style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>Confirmatory symptoms</p> <ol><li style=“padding: 0px; list-style-type: decimal; list-style-image: none; list-style-position: outside; font-family: “verdana”; font-size: 18px; color: black;”>REMEDY RELATIONSHIPS</li></ol>
Everything after the </span> should be skipped
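One way to picture the "stop at </span>" requirement is the following Python sketch. It does not use the \G regex from this thread (Python's standard re module lacks \G and \K). Instead it first isolates each <span style=...>...</span> region and then looks for long words only inside that region. The names and the shortened style attributes are assumptions for the illustration, and it assumes that the spans are not nested and that each one sits on a single line.

```python
import re

# One region = an opening <span style=...> tag, its content, and the first </span> after it
REGION = re.compile(r"<span style=[^<>]*>(.*?)</span>")
WORD = re.compile(r"\b[a-z]{12,}\b", re.IGNORECASE)  # words of 12 or more ASCII letters

def long_words_in_spans(text):
    """Return 12+ letter words found between <span style=...> and </span> only."""
    words = []
    for region in REGION.finditer(text):
        # group(1) is the content between the tags; anything outside is never searched
        words.extend(WORD.findall(region.group(1)))
    return words

# Sample with shortened style attributes (an assumption, to keep the lines readable)
sample = ('<span style="color: black;">CONSCIOUSNESS of womb. Hysterically inclined.</span> '
          '<p style="color: black;">Confirmatory symptoms</p> '
          '<ol><li style="color: black;">REMEDY RELATIONSHIPS</li></ol>')

print(long_words_in_spans(sample))
# ['CONSCIOUSNESS', 'Hysterically']
```

Confirmatory and RELATIONSHIPS are long enough to qualify, but they sit outside any span region, so they are skipped.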