Regex: Add html tags in the lines that doesn't have html tags
-
OK, I believe I took a step forward. It seems to work:
FIND:
^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$
REPLACE BY:
<p class="mb-40px">\2</p>
Now I have to restrict this regex to the section between
<!-- START -->
and
<!-- FINAL -->
I will use this generic formula:
(?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX
which becomes:
FIND:
(?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$
REPLACE:
<p class="mb-40px">\2</p>
In this case, it does not work very well: something is wrong with this final regex. Maybe @guy038 has a better idea?
-
Hello @robin-cruise and All,
No need to use the generic formula!
Here is my general method:
- From the beginning of the current line, I look for a line which:
  - does not contain the string <!-- START --> at any position of the current line
  - AND does not contain the string <!-- FINAL --> at any position of the current line
  - AND either does not contain the tag <p class="mb-40px"> OR does not contain the tag </p>, at any position of the current line
- Then I select all the characters of the current line which come:
  - after a possible <p class="mb-40px"> tag
  - before a possible </p> tag
So, given this INPUT text below, with 3 lines to change:
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
I use the following regex S/R:
SEARCH
(?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
REPLACE
<p class="mb-40px">\1</p>
And, after a click on the Replace All button, I get the expected OUTPUT text:
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
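The same S/R can be tried outside Notepad++. Below is a minimal sketch in Python 3, using only the standard re module; the names PATTERN, REPLACEMENT and INPUT are illustrative. Python's re has no branch reset group (?|...), so the sketch swaps in the plain non-capturing group with the \1\2 replacement discussed in the Remark below. It also drops (?-is), because re is case-sensitive and . excludes line breaks by default, while re.MULTILINE lets ^ match at the start of each line.

```python
import re

# Illustrative port of the S/R above. Stdlib re has no (?|...) branch reset,
# so a plain non-capturing group defines groups 1 and 2, and \1\2 is used.
# Since Python 3.5, a group that did not participate is substituted as "".
PATTERN = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'  # skip the boundary lines
    r'(?:(?!.*<p class)|(?!.*</p>))'              # skip lines having both tags
    r'(?:<p class="mb-40px">)?'                   # consume a possible opening tag
    r'(?:(.+)</p>|(.+))',                         # text before </p>, or the rest
    re.MULTILINE,
)
REPLACEMENT = r'<p class="mb-40px">\1\2</p>'

INPUT = """<!-- START -->
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->"""

output, count = PATTERN.subn(REPLACEMENT, INPUT)
print(count)   # number of lines rewritten
print(output)
```

On this INPUT, count is 3 and output matches the OUTPUT text above.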
Notes:
- First, after the usual modifiers, come the boundaries which must not be matched: (?!.*<!-- START -->)(?!.*<!-- FINAL -->)
- Then comes an alternative, within a non-capturing group, on the two tags which must not both be present: (?:(?!.*<p class)|(?!.*</p>)). So the line must lack either the opening tag or the closing tag.
- Now, after a possible (?:<p class="mb-40px">)?, also in a non-capturing group, the regex selects either:
  - all chars before the </p> tag
  - OR all remaining chars of the current line
Remark:
- Note the special syntax of this group: (?|(.+)</p>|(.+)). This branch reset group numbers the groups of each alternative from the same starting point, so both (.+) are group 1. Thus, you just need the <p class="mb-40px">\1</p> syntax in the replacement part.
- If I had used a normal non-capturing group, (?:(.+)</p>|(.+)), two groups, 1 and 2, would have been defined! So the correct replacement would have been <p class="mb-40px">\1\2</p>, which works because these two groups are mutually exclusive: only one of them is ever set.
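To make the numbering difference concrete, here is a small Python check of the plain-group variant, using the standard re module only (it has no branch reset group, so the (?|...) form cannot be shown with it). The two test lines come from the INPUT text above; PLAIN is an illustrative name.

```python
import re

# A plain non-capturing group: each alternative owns its own capturing group,
# so the groups are numbered 1 and 2 and only one of them is ever set.
PLAIN = re.compile(r'(?:<p class="mb-40px">)?(?:(.+)</p>|(.+))')

with_close = PLAIN.fullmatch('I need someone to take me home.</p>').groups()
without_close = PLAIN.fullmatch('<p class="mb-40px">I can love you now').groups()
print(with_close)     # ('I need someone to take me home.', None)
print(without_close)  # (None, 'I can love you now')

# Hence the \1\2 replacement: the unset group contributes an empty string.
fixed = PLAIN.sub(r'<p class="mb-40px">\1\2</p>', 'I need someone to take me home.</p>')
print(fixed)          # <p class="mb-40px">I need someone to take me home.</p>
```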
Best Regards,
guy038
-
-
@guy038 Super, thanks.
What would the generic regex be in this case? (I cannot figure out the last part.)
(?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
-
Hi, @robin-cruise,
You cannot use the generic regex, discussed in the topic:
in order to reach your present goal. Why?
Well, because that generic regex assumes:
- First, matching a BSR region (REGION-START in the formula above), followed by any range of chars, possibly empty, not containing the ESR region (REGION-FINAL), and then, after the \K feature, matching the FR region (FIND REGEX)
- Then, matching, from the current caret position (the \G(?!^) branch), any range of chars, possibly empty, not containing the ESR region, and then, after the \K feature, matching the FR region
But, in your present case, the INPUT lines to modify, like I need someone to take me home., do not contain the BSR or the ESR region. So, how could the search regex match these absent regions??
Best Regards,
guy038
-
-
SEARCH:
(?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
REPLACE:
<p class="mb-40px">\1</p>
Your regex seems to be very good, except for one thing: if I also have the code below in my html pages, it gets changed too.
So, I need to change only the lines between
<!-- START -->
and
<!-- FINAL -->
<html lang="en">
<head>
<!-- Meta Tags -->
<meta charset="utf-8"/>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "10 media farces of big days",
  "image": "icon.jpg",
  "description": "horses of Letea Delta Danube successfully saved,",
  "brand": {
    "@type": "Brand",
    "name": "something"
  },
  "sku": "NFL",
  "gtin8": "NFL",
  "offers": {
    "@type": "Offer",
    "url": "https://something.html",
    "priceCurrency": "RON",
    "price": "0",
    "priceValidUntil": "2022-02-15",
    "availability": "https://schema.org/OnlineOnly"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "5",
    "bestRating": "5",
    "ratingCount": "6"
  },
  "review": {
    "@type": "Review",
    "reviewRating": {
      "@type": "Rating",
      "ratingValue": "5",
      "bestRating": "5"
    },
    "author": {"@type": "Person", "name": "omehing"},
    "publisher": {"@type": "Organization", "name": "omehing"}
  }
}
</script>
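A small Python sketch makes the problem visible. The regex only tests the content of the current line and knows nothing about where the <!-- START --> and <!-- FINAL --> boundaries are, so any line outside them that lacks one of the two tags is wrapped too. PAGE is a hypothetical, shortened page, and the standard re module needs the plain-group form with \1\2 because it has no branch reset group.

```python
import re

# The region-unaware S/R from earlier in the thread (plain-group form).
PATTERN = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'
    r'(?:(?!.*<p class)|(?!.*</p>))'
    r'(?:<p class="mb-40px">)?(?:(.+)</p>|(.+))',
    re.MULTILINE,
)

# Hypothetical, shortened page: only the line between the boundaries should change.
PAGE = """<head>
<meta charset="utf-8"/>
</head>
<!-- START -->
I need someone to take me home.
<!-- FINAL -->"""

output, count = PATTERN.subn(r'<p class="mb-40px">\1\2</p>', PAGE)
print(count)   # 4: the three <head> lines are wrapped as well
print(output)
```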
-
Hi, @robin-cruise,
Once and for all, Robin, please post a complete, exact file which represents all the data you need to change!
We cannot keep working this way if you do not provide real examples, because regexes depend closely on the real text!
BR
guy038
-
Yes, but I cannot copy/paste the entire html page either. The html code is very large.
-
Hi, @Robin-cruise
If you don’t mind, just send me your file by e-mail!
Here is my temporary mail address:
BR
guy038
-
Hello @robin-cruise and All,
Ah… OK. Thanks for the HTML file attached to your mail. It’s always easier with a real example ;-))
Now, as you have just one <!-- ARTICOL START -->.......<!-- ARTICOL FINAL --> zone in your HTML file, the simplest thing to do is:
- In search, look for:
  - Any char from the very start of the file till the complete <!-- ARTICOL START --> line
  - OR any char from the <!-- ARTICOL FINAL --> line till the very end of your file
  - OR (scan of the lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries):
    - A possible <p class="mb-40px"> tag, beginning the current line
    - Followed by a single-line range of characters:
      - till a </p> tag ending the current line
      - OR till the end of the current line
- In replacement, rewrite:
  - (If scanning within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, i.e. when group 2 is defined):
    - First, the <p class="mb-40px"> tag, if absent in the INPUT file (group 1 not defined)
    - Then all the contents of the current line ($0)
    - And, finally, the </p> tag, if absent in the INPUT file (group 3 not defined)
  - OR the two ranges of chars, up to the <!-- ARTICOL START --> boundary and from the <!-- ARTICOL FINAL --> boundary, boundaries included, rewritten unchanged ($0), which occurs when group 2 is not defined
For instance, from this INPUT file below:
<!DOCTYPE html>
....
bla bla
....
blah bla
<!-- ARTICOL START -->
<p class="mb-40px">I need someone to take me home.</p>
I need someone to take me home.
I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.
<!-- ARTICOL FINAL -->
bla bla
....
blah bla
....
</html>
The following regex S/R:
SEARCH
(?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
REPLACE
?2(?1:<p class="mb-40px">)$0(?3:</p>):$0
Should give you the expected results:
<!DOCTYPE html>
....
bla bla
....
blah bla
<!-- ARTICOL START -->
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<!-- ARTICOL FINAL -->
bla bla
....
blah bla
....
</html>
The message "Replace All: 6 occurrences were replaced" is displayed in the status bar:
- One for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
- One for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> (4 lines)
- One for the part between <!-- ARTICOL FINAL --> and the very end of file
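The same logic can be sketched in Python with the standard re module, assuming Python 3.7 or later (for the scoped (?s:...) flags). re supports neither \R, nor the (?|...) branch reset group, nor Boost's conditional replacement ?2(...):..., so in this sketch \R becomes (?:\r\n|\n|\r), the branch reset group becomes groups 2/3 and 4, and a hypothetical rewrite() function decides what to insert. INPUT is the example file above.

```python
import re

# Hypothetical port of the final S/R: the three alternatives are kept in order.
PATTERN = re.compile(
    r'(?s:^.+<!-- ARTICOL START -->(?:\r\n|\n|\r))'  # file start .. START line
    r'|(?s:<!-- ARTICOL FINAL -->.+)'                # FINAL line .. file end
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$', # one line in between
    re.MULTILINE,
)


def rewrite(m):
    # Groups 2 and 4 play the role of the branch-reset group 2 in Boost.
    if m.group(2) is None and m.group(4) is None:
        return m.group(0)  # boundary chunk: rewritten unchanged, like $0
    opening = '' if m.group(1) else '<p class="mb-40px">'  # (?1:<p ...>)
    closing = '' if m.group(3) else '</p>'                 # (?3:</p>)
    return opening + m.group(0) + closing


INPUT = """<!DOCTYPE html>
....
bla bla
....
blah bla
<!-- ARTICOL START -->
<p class="mb-40px">I need someone to take me home.</p>
I need someone to take me home.
I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.
<!-- ARTICOL FINAL -->
bla bla
....
blah bla
....
</html>"""

output, count = PATTERN.subn(rewrite, INPUT)
print(count)   # same count as the status bar message
print(output)
```

Here count is 6, and the four lines between the boundaries all end up as <p class="mb-40px">I need someone to take me home.</p>, while the parts outside the zone are left untouched.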
Note that this final solution needs neither look-ahead structures nor the \G syntax or other goodies!!
Best Regards,
guy038
-
-
@guy038 said in Regex: Add html tags in the lines that doesn't have html tags:
SEARCH
(?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
REPLACE
?2(?1:<p class="mb-40px">)$0(?3:</p>):$0
Great answer, thank you @guy038!