Regex: Add html tags in the lines that doesn't have html tags
-
I have this paragraph. Also, I have a line,
I need someone to take me home.
that doesn’t have html tags. So, I need to find this line (not the others) and wrap it between tags.
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
OUTPUT:
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
I don’t know why my regex doesn’t work.
FIND:
^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$
REPLACE:
<p class="mb-40px">\2\</p>
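One likely reason this FIND regex misbehaves can be checked with Python's re module, used here only as a stand-in (Notepad++ uses the Boost regex engine, but these constructs behave the same way). A capturing group that is repeated with * keeps only the text of its last iteration, so \2 holds a single character instead of the line. The variable names are illustrative.

```python
import re

LINE = "I need someone to take me home."

# The FIND regex from the question. re.M makes ^ and $ match at every line
# boundary, which is what Notepad++ does by default.
question_find = re.compile(r'^(?!<p class="mb-40px">)(.*?)((?!</p>).)*$', re.M)

m = question_find.search(LINE)
# (.*?) is lazy, so it stays empty; ((?!</p>).)* consumes the whole line,
# but a repeated capturing group only remembers its LAST iteration.
print(repr(m.group(1)))  # ''
print(repr(m.group(2)))  # '.'

# Capturing the whole repetition (inner group made non-capturing)
# puts the full line into group 2 instead.
whole_run = re.compile(r'^(?!<p class="mb-40px">)(.*?)((?:(?!</p>).)*)$', re.M)
print(whole_run.search(LINE).group(2))  # I need someone to take me home.
```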
-
OK, so, I believe I took a step forward. It seems to work.
FIND:
^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$
REPLACE BY:
<p class="mb-40px">\2</p>
Now, I have to apply this regex only between the
<!-- START -->
and
<!-- FINAL -->
markers. I will use this generic formula:
(?s)(?-i:REGION-START.+?">|\G(?!^))((?!REGION-FINAL).)*?\KFIND REGEX
which will become:
FIND:
(?s)(?-i:<\!-- START -->.+?">|\G(?!^))((?!<\!-- FINAL -->).)*?\K^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$
REPLACE:
<p class="mb-40px">\2</p>
In this case, it does not work very well. Something is not quite right in this final regex. Maybe @guy038 has a better idea.
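A quick check of the step-forward regex, again emulated with Python's re (a stand-in for Notepad++'s Boost engine), suggests it does wrap the untagged line. One thing to watch: a line that already ends with </p> is captured whole by the greedy .+, so its closing tag gets doubled. The sample strings are illustrative.

```python
import re

# Step-forward FIND / REPLACE BY from the post; re.M for per-line ^ and $.
step_find = re.compile(r'^(?!<p class="mb-40px">)(([a-zA-Z-].+))((?!</p>).)*$', re.M)
step_replace = r'<p class="mb-40px">\2</p>'

sample = "\n".join([
    "<!-- START -->",
    '<p class="mb-40px">I may go to cinema</p>',
    "I need someone to take me home.",
    '<p class="mb-40px">I can love you now</p>',
    "<!-- FINAL -->",
])

# Only the untagged line starts with a letter and lacks the opening tag.
print(step_find.sub(step_replace, sample))

# A line ending in </p> is captured whole by .+, so </p> appears twice.
print(step_find.sub(step_replace, "I need someone to take me home.</p>"))
```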
-
Hello @robin-cruise and All,
No need to use the generic formula!
Here is my general method:
- From the beginning of the current line, I try to find a line which does not contain:
  - the string <!-- START --> at any position of the current line
  AND
  - the string <!-- FINAL --> at any position of the current line
  AND
  - both the tag <p class="mb-40px"> and the tag </p> ( in other words, at least one of these two tags is missing from the current line )
- Then I select all the characters of the current line which come:
  - after a possible <p class="mb-40px"> tag
  - before a possible </p> tag
So, given this INPUT text below, with 3 lines to change:
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.
<p class="mb-40px">I may go to cinema</p>
I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
I use the following regex S/R:
SEARCH
(?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
REPLACE
<p class="mb-40px">\1</p>
And, after a click on the Replace All button, I get the expected OUTPUT text:
<!-- START -->
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I may go to cinema</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I can love you now</p>
<!-- FINAL -->
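For anyone who wants to try this outside Notepad++, here is a rough Python emulation of the S/R above. It is an approximation under one stated assumption: Python's re has no branch-reset group (?|...), so the sketch uses a plain alternation and a small replacement function picks whichever group matched. The names SEARCH and wrap are illustrative.

```python
import re

# Python's re has no branch reset (?|...), so this sketch uses the plain
# alternation (?:(.+)</p>|(.+)) and picks whichever group matched in wrap().
# re.M makes ^ match at every line start, as Notepad++ does by default.
SEARCH = re.compile(
    r'^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)'  # not a boundary line
    r'(?:(?!.*<p class)|(?!.*</p>))'                # lacks <p class or lacks </p>
    r'(?:<p class="mb-40px">)?'                     # skip a possible opening tag
    r'(?:(.+)</p>|(.+))',                           # text before </p>, or the rest
    re.M,
)


def wrap(m: re.Match) -> str:
    # Exactly one of the two alternatives participated in the match.
    text = m.group(1) if m.group(1) is not None else m.group(2)
    return f'<p class="mb-40px">{text}</p>'


INPUT = "\n".join([
    "<!-- START -->",
    '<p class="mb-40px">I may go to cinema</p>',
    "I need someone to take me home.",
    '<p class="mb-40px">I may go to cinema</p>',
    "I need someone to take me home.</p>",
    '<p class="mb-40px">I may go to cinema</p>',
    '<p class="mb-40px">I need someone to take me home.',
    '<p class="mb-40px">I can love you now</p>',
    "<!-- FINAL -->",
])

result, count = SEARCH.subn(wrap, INPUT)
print(result)
print(count)  # 3 lines changed
```

Lines that already carry both tags fail both look-aheads, so they are never matched and stay untouched.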
Notes:
- First, after the usual modifiers, the boundary lines which must not be matched:
(?!.*<!-- START -->)(?!.*<!-- FINAL -->)
- Then, inside a non-capturing group with an alternation, the two tags which must not both be present on the line:
(?:(?!.*<p class)|(?!.*</p>))
- Now, after a possible opening tag, matched by the optional non-capturing group (?:<p class="mb-40px">)?, the regex selects either:
  - all the chars before the </p> tag
  OR
  - all the remaining chars of the current line
Remark:
- Note the special syntax of this group:
(?|(.+)</p>|(.+))
This branch-reset group gives both alternatives the same group number, 1. Thus, you just need the
<p class="mb-40px">\1</p>
syntax in the replacement part.
- If I had used a normal non-capturing group
(?:(.+)</p>|(.+))
two groups, 1 and 2, would have been defined! So the correct replacement regex would have been
<p class="mb-40px">\1\2</p>
as these two groups are mutually exclusive!
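Python's re module has no branch-reset group, which makes it a convenient way to see the numbering described in this remark: with the normal non-capturing group, the two alternatives become groups 1 and 2, exactly one of them is set, and \1\2 in the replacement yields the right text. PLAIN_GROUP is an illustrative name.

```python
import re

# Without branch reset, each alternative gets its own group number.
PLAIN_GROUP = re.compile(r'(?:(.+)</p>|(.+))')

print(PLAIN_GROUP.match("I need someone to take me home.</p>").groups())
# ('I need someone to take me home.', None)
print(PLAIN_GROUP.match("I need someone to take me home.").groups())
# (None, 'I need someone to take me home.')

# Since Python 3.5, a group that did not participate is substituted as an
# empty string, so \1\2 outputs whichever alternative actually matched.
for line in ("I need someone to take me home.</p>", "I need someone to take me home."):
    print(PLAIN_GROUP.sub(r'<p class="mb-40px">\1\2</p>', line, count=1))
```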
Best Regards,
guy038
-
@guy038 super, thanks.
What should the generic regex be in this case? (I cannot figure out the last part.)
(?-is)^(?!.*REGION-START)(?!.*REGION-FINAL)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
-
Hi, @robin-cruise,
You cannot use the generic regex discussed in the topic:
in order to solve your present goal. Why?
Well, because that generic regex supposes:
- First, matching a BSR (Begin Search-region Regex) region, followed with any range of chars, possibly null, different from the ESR (End Search-region Regex) region, and, after the \K feature, matching the FR (Find Regex) region
- Then, matching, from the current caret position, any range of chars, possibly null, different from the ESR region, and, after the \K feature, matching the FR region
But, in your present case, the INPUT lines to modify (like I need someone to take me home.) do not contain the BSR and/or the ESR region. So, how do you expect to match these absent regions in the search regex?
Best Regards,
guy038
-
SEARCH:
(?-is)^(?!.*<!-- START -->)(?!.*<!-- FINAL -->)(?:(?!.*<p class)|(?!.*</p>))(?:<p class="mb-40px">)?(?|(.+)</p>|(.+))
REPLACE:
<p class="mb-40px">\1</p>
Your regex seems to be very good, except for one thing: if I also have the code below in my html pages, it gets changed as well.
So, I need to change only the lines between <!-- START --> and <!-- FINAL -->.
<html lang="en">
<head>
<!-- Meta Tags -->
<meta charset="utf-8"/>
<script type="application/ld+json">
{
"@context": "https://schema.org/",
"@type": "Product",
"name": "10 media farces of big days",
"image": "icon.jpg",
"description": "horses of Letea Delta Danube successfully saved,",
"brand": {
"@type": "Brand",
"name": "something"
},
"sku": "NFL",
"gtin8": "NFL",
"offers": {
"@type": "Offer",
"url": "https://something.html",
"priceCurrency": "RON",
"price": "0",
"priceValidUntil": "2022-02-15",
"availability": "https://schema.org/OnlineOnly"
},
"aggregateRating": {
"@type": "AggregateRating",
"ratingValue": "5",
"bestRating": "5",
"ratingCount": "6"
},
"review": {
"@type": "Review",
"reviewRating": {
"@type": "Rating",
"ratingValue": "5",
"bestRating": "5"
},
"author": {"@type": "Person", "name": "omehing"},
"publisher": {"@type": "Organization", "name": "omehing"}
}
}
</script>
-
Hi, @robin-cruise,
Once and for all, Robin, please post a complete, exact file which represents all the data you need to change!
We cannot keep working this way: if you do not provide real examples, we cannot help, because a regex depends very closely on the real text!
BR
guy038
-
Yes, but I cannot copy/paste the entire html page. It is a very large html file.
-
Hi, @Robin-cruise
If you don’t mind, just send me your file by e-mail!
Here is my temporary mail address:
BR
guy038
-
Hello @robin-cruise and All,
Ah… OK. Thanks for the HTML file attached to your mail. It’s always easier with a real example ;-))
Now, as you have just one
<!-- ARTICOL START -->.......<!-- ARTICOL FINAL -->
zone in your HTML file, the simple thing to do is:
- In search, to look for:
  - any char from the very start of the file till the complete <!-- ARTICOL START --> line
  OR
  - any char from the <!-- ARTICOL FINAL --> line till the very end of your file
  OR ( scan of the lines between the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> boundaries )
  - a possible <p class="mb-40px"> tag, beginning the current line
  - followed with a single-line range of characters:
    - till a </p> tag, ending the current line
    OR
    - till the end of the current line
- In replacement, to rewrite:
  ( if scanning within the <!-- ARTICOL START -->.........<!-- ARTICOL FINAL --> zone, so when group 2 is defined )
  - first, the <p class="mb-40px"> tag, if absent in the INPUT file ( group 1 not defined )
  - then all the contents of the current line ( $0 )
  - and, finally, the </p> tag, if absent in the INPUT file ( group 3 not defined )
  OR
  - the two ranges of chars, from the start of the file up to the <!-- ARTICOL START --> boundary and from the <!-- ARTICOL FINAL --> boundary to the end of the file, both boundaries included, rewritten unchanged ( which occurs when group 2 is not defined )
For instance, from this INPUT file below:
<!DOCTYPE html>
....
bla bla
....
blah bla
<!-- ARTICOL START -->
<p class="mb-40px">I need someone to take me home.</p>
I need someone to take me home.
I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.
<!-- ARTICOL FINAL -->
bla bla
....
blah bla
....
</html>
the following regex S/R:
SEARCH
(?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
REPLACE
?2(?1:<p class="mb-40px">)$0(?3:</p>):$0
should give you the expected results:
<!DOCTYPE html>
....
bla bla
....
blah bla
<!-- ARTICOL START -->
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<p class="mb-40px">I need someone to take me home.</p>
<!-- ARTICOL FINAL -->
bla bla
....
blah bla
....
</html>
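The same idea can be sketched in Python, as a rough emulation rather than an equivalent of the Notepad++ S/R. Assumptions: LF line endings and a single ARTICOL zone; (?:\r\n|\n|\r) stands in for Boost's \R; since Python's re has no branch reset, the text of the second alternative lands in group 4 instead of group 2; and a small function plays the role of the Boost conditional replacement ?2(?1:...)$0(?3:...):$0. The names SEARCH, rewrite and SAMPLE are illustrative.

```python
import re

# Groups: 1 = opening tag, 2/3 = text + closing tag, 4 = text without </p>
# (no branch reset in Python, so the second alternative uses group 4).
# Assumes LF line endings and a single ARTICOL zone.
SEARCH = re.compile(
    r'(?s:^.+<!-- ARTICOL START -->(?:\r\n|\n|\r))'  # head, START line included
    r'|(?s:<!-- ARTICOL FINAL -->.+)'                 # tail, FINAL line included
    r'|^(<p class="mb-40px">)?(?:(.+)(</p>)|(.+))$',  # one line inside the zone
    re.M,
)


def rewrite(m: re.Match) -> str:
    in_zone = m.group(2) is not None or m.group(4) is not None
    if not in_zone:
        return m.group(0)  # head or tail: put it back unchanged ($0)
    opening = "" if m.group(1) else '<p class="mb-40px">'
    closing = "" if m.group(3) else "</p>"
    return opening + m.group(0) + closing


SAMPLE = "\n".join([
    "<!DOCTYPE html>", "....", "bla bla", "....", "blah bla",
    "<!-- ARTICOL START -->",
    '<p class="mb-40px">I need someone to take me home.</p>',
    "I need someone to take me home.",
    "I need someone to take me home.</p>",
    '<p class="mb-40px">I need someone to take me home.',
    "<!-- ARTICOL FINAL -->",
    "bla bla", "....", "blah bla", "....", "</html>",
])

result, count = SEARCH.subn(rewrite, SAMPLE)
print(result)
print(count)  # head + 4 zone lines + tail = 6
```

Because the head and tail are each swallowed by a single match and written back as they are, the line-by-line alternative only ever runs on the lines inside the zone.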
The message Replace All: 6 occurrences were replaced is displayed in the status bar:
- one for the part between <!DOCTYPE html> and <!-- ARTICOL START -->
- one for each non-empty line between <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> ( 4 lines )
- one for the part between <!-- ARTICOL FINAL --> and the very end of the file
Note that this final solution does not need any look-ahead structure, nor the \G syntax or other goodies!!
Best Regards,
guy038
-
@guy038 said in Regex: Add html tags in the lines that doesn't have html tags:
SEARCH
(?s-i)^.+<!-- ARTICOL START -->\R|<!-- ARTICOL FINAL -->.+|(?-s)^(<p class="mb-40px">)?(?|(.+)(</p>)|(.+))$
REPLACE
?2(?1:<p class="mb-40px">)$0(?3:</p>):$0
Great answer, thank you @guy038