Regex: How to remove newline characters from a particular HTML tag?
-
I have this HTML tag, which is interrupted by a line break (\n) at some point after the word masuri:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
reluata apoi de catre instrumentul solist cu un cintec popular.</p>
THE OUTPUT must be:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p>
I tried this regex, but it doesn't work too well, because it also changes the entire HTML code, not just that particular tag.
FIND:
(?:<p class="mb-40px">|\G)(?:(?!</p>).)*?\K(\r\n|\r|\n)
REPLACE BY:
\x20
Also, I found a solution by @neil-schipper on another page of this forum, but I don't know how to integrate it with my HTML tag:
FIND:
(?<=[^\r\n])\R(?=[^\r\n])
REPLACE BY: (LEAVE EMPTY)
-
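If you want to reproduce the requested transformation outside Notepad++, here is a minimal Python sketch that uses only the standard `re` module. It deliberately swaps in a different technique from the `\G`-based regexes in this thread. It first finds each `<p class="mb-40px">...</p>` element, then applies @neil-schipper's lookaround pattern only to that element's text.

The sketch rests on a few assumptions:
- Python's `re` has no `\R`, so the line break is spelled out as `(?:\r\n|\r|\n)`.
- The replacement is a space, to match the expected output in the question.
- The names `LINE_BREAK`, `P_TAG` and `join_lines_in_tag` are illustrative only.

```python
import re

# @neil-schipper's pattern, with \R spelled out because Python's re has no \R
# (assumption: only CRLF, CR and LF need handling). The lookarounds mean a break
# is replaced only when it sits between two non-empty lines.
LINE_BREAK = re.compile(r"(?<=[^\r\n])(?:\r\n|\r|\n)(?=[^\r\n])")

# One <p class="mb-40px"> element: (?s) lets .*? cross line breaks, and the lazy
# quantifier stops at the first </p>.
P_TAG = re.compile(r'(?s)<p class="mb-40px">.*?</p>')


def join_lines_in_tag(html: str) -> str:
    """Replace line breaks by a space, but only inside <p class="mb-40px"> elements."""
    return P_TAG.sub(lambda m: LINE_BREAK.sub(" ", m.group(0)), html)


html = (
    "<div>\nkeep these\nlines</div>\n"
    '<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
    "reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n"
)
print(join_lines_in_tag(html))
```

The `<div>` lines are left alone. Blank lines inside the tag also survive, because of the lookarounds.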
This is just a (by now) simple replace-but-only-between-delimiters problem; see HERE for the templatized solution.
-
@alan-kilborn THANKS, it works!!
Find:
(?-i:<p class="mb-40px">|(?!\A)\G)(?s:(?!</p>).)*?\K(?-i:(?<=[^\r\n])\R(?=[^\r\n]))
Replace by:
\x20
-
Another solution:
(\r\n|\r|\n)
FIND:
(<p class="mb-40px">)+(.)+\K(\r\n|\r|\n)(?=.*<\/p>)
REPLACE BY:
\x20
The GENERIC regex formula below is much simpler than the ones @guy038 made in many of his other GENERIC regex posts:
(REGION-START)+(.)+\K(FIND REGEX)(?=.*REGION-FINAL)
-
@robin-cruise said in Regex: How to remove newline characters from a particular HTML tag?:
The GENERIC regex formula below is much simpler than the ones @guy038 made
Why should you be believed over @guy038?
-
Another alternative to Robin's generic formula, a better version, could be:
(REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL))) -
Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu and All,
Referring to my first blog post about a generic regex, below:
https://community.notepad-plus-plus.org/post/75007
and since Robin wants to search for line-ending chars, we need to use, of course, the complete generic regex S/R:
SEARCH
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
REPLACE
RR
and not the simplified single-line version.
So:
- The FR regex is just \R, as the associated non-capturing group, beginning with (?-si:..., is useless in this case
- The RR regex is \x20
- The BSR regex may be strictly the string <p class="mb-40px"> but may also be expressed as <p class=".+?">
- The ESR regex is, of course, the ending tag </p>, which must never occur before the next line-ending to replace
giving the functional regex S/R:
SEARCH
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\R
REPLACE
\x20
Test it against this text:
<a href="https://www.w3schools.com/">We strongly suggest to visit the w3schools.com site</a>
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
reluata apoi de catre instrumentul solist cu un cintec popular.</p>
<p class="Test">A SINGLE line</p>
<h1>this is my very first heading </h1>
<p class="123-456 789">This is a quick text to verify if it replaces line-endings by a space char in <p> tags ONLY</p>
ONLY the <p class="...">...</p> elements, multi-line or not, should be affected by the replacement! Of course, these HTML commands do not represent a legal HTML file and are just used to verify the regex S/R!
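To see how the generic S/R behaves step by step, the sketch below emulates it in Python. Python's `re` supports neither `\G` nor `\K`, so this is only an approximation. It relies on these assumptions:
- `Pattern.match(text, pos)`, which is anchored at `pos`, plays the role of `\G`.
- Only the named group `fr` is replaced, which mimics `\K`.
- `\R` is approximated by `(?:\r\n|\r|\n)`.
- The search starts at the very beginning of the text.

The function name `generic_sr` and its `skip_start` flag are hypothetical. Passing `skip_start=False` drops the `(?!\A)` guard, which roughly reproduces the bare `\G` in the original question. The chain then starts at the top of the text and joins every line before the first `</p>`.

```python
import re

# Stand-in for Boost's \R (assumption: only CRLF, CR and LF matter here).
R = r"(?:\r\n|\r|\n)"


def generic_sr(text, bsr, esr, fr, rr, skip_start=True):
    r"""Approximate Notepad++ Replace All with
        SEARCH  (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
        REPLACE RR
    using the stdlib re module, which has neither \G nor \K."""
    gap = "(?s:(?:(?!" + esr + ").)*?)"  # lazy run of chars, none of which starts ESR
    fresh = re.compile("(?:" + bsr + ")" + gap + "(?P<fr>" + fr + ")")  # BSR branch
    cont = re.compile(gap + "(?P<fr>" + fr + ")")  # \G branch
    out, copied, pos = [], 0, 0
    # \G position (end of the previous match); None models (?!\A): no \G at the start.
    last_end = None if skip_start else 0
    while pos < len(text):
        m = fresh.match(text, pos)  # match() with pos is anchored exactly at pos
        if m is None and pos == last_end:
            m = cont.match(text, pos)  # continue where the previous match ended
        if m is None:
            pos += 1  # bump along one char, as the regex engine does
            continue
        out.append(text[copied:m.start("fr")])  # text before \K is kept
        out.append(rr)  # only the FR part is replaced
        copied = last_end = pos = m.end()
    out.append(text[copied:])
    return "".join(out)


sample = '<h1>title\nline</h1>\n<p class="x">one\ntwo\nthree</p>\n<p class="y">single</p>\n'
print(repr(generic_sr(sample, r'<p class=".+?">', "</p>", R, " ")))
# Bare \G, as in the question: the lines before the first </p> get joined too
print(repr(generic_sr(sample, r'<p class=".+?">', "</p>", R, " ", skip_start=False)))
```

With the guard, only `one two three` is joined. Without it, the `<h1>` lines are joined as well.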
Now, the generic variants, proposed by @Robin-cruise and @hellena-crainicu, with a final look-ahead only, containing the ESR region, will not work, most of the time :-(
SEARCH
(?-si:BSR|(?!\A)\G).*?\K(?-si:FR)(?=(?s-i:.*?ESR))
In our case, the functional regex S/R becomes:
SEARCH
(?-si:<p class=".+?">|(?!\A)\G).*?\K\R(?=(?s-i:.*?</p>))
REPLACE
\x20
But if you test it against, for instance:
<p class="Test">Several consecutive lines</p> <h1>this is my very first heading </h1> <p class="Test">A SINGLE line</p> <h2>this is my second heading </h2>It would concatenate all text till the last
</p>of the file, just leaving the last<h2>tag untouched. You could say : But I did add a final question mark in order to get a lazy range of chars before</p>!You’re right ! But remember that the regex engine tries, by all means, to get a solution. So, it matches the
CRLFchars, which follow lines</p>, because the regex engine considers that the.*?lazy range of chars begins immediately after the line-ending and continues till right before the third and final</p>, so defining a correct look-ahead assertion !Thus, testing if the ESR region is not reached at any position, till a NEXT FR match, seems the only method which works properly !
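The failure described above can be reproduced with a small Python emulation. The helper `chain_replace` is hypothetical and carries the same caveats as any stdlib `re` approximation:
- `Pattern.match(text, pos)` stands in for `\G`.
- The named group `fr` stands in for the part after `\K`.
- `\R` is approximated by `(?:\r\n|\r|\n)`.
- `.` is assumed not to match line breaks, as with ". matches newline" unchecked.

The sketch runs the tempered version and the look-ahead-only version on the test text above, written here with one tag per line.

```python
import re

R = r"(?:\r\n|\r|\n)"  # stand-in for \R (assumption: CRLF, CR and LF only)


def chain_replace(text, head, tail, rr):
    r"""Approximate Replace All of (?:HEAD|(?!\A)\G)TAIL, where the named group
    'fr' in TAIL is the part after \K that gets replaced by rr."""
    fresh, cont = re.compile(head + tail), re.compile(tail)
    out, copied, last_end, pos = [], 0, None, 0
    while pos < len(text):
        m = fresh.match(text, pos)  # HEAD branch, anchored at pos
        if m is None and pos == last_end:
            m = cont.match(text, pos)  # \G branch: only where the last match ended
        if m is None:
            pos += 1
            continue
        out.append(text[copied:m.start("fr")])
        out.append(rr)
        copied = last_end = pos = m.end()
    out.append(text[copied:])
    return "".join(out)


BSR = r'(?:<p class=".+?">)'
# Tempered version: no position before the line break may start </p>
TEMPERED = r"(?s:(?:(?!</p>).)*?)(?P<fr>" + R + ")"
# Look-ahead-only version: plain .*? (no line breaks), then "a </p> exists somewhere later"
LOOKAHEAD = r".*?(?P<fr>" + R + r")(?=(?s:.*?</p>))"

text = (
    '<p class="Test">Several consecutive lines</p>\n'
    "<h1>this is my very first heading </h1>\n"
    '<p class="Test">A SINGLE line</p>\n'
    "<h2>this is my second heading </h2>"
)
print(repr(chain_replace(text, BSR, TEMPERED, " ")))   # unchanged: no break inside a <p>
print(repr(chain_replace(text, BSR, LOOKAHEAD, " ")))  # joined up to the last </p>
```

In this emulation, the look-ahead-only version joins the first three lines. The tempered version leaves the text as it is.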
Best Regards
guy038
Reminder: Move to the very beginning of the text before clicking on the Find Next or Replace All button!
-
-
@alan-kilborn said in Regex: How to remove newline characters from a particular HTML tag?:
Why should you be believed over @guy038?
@guy038 said in Regex: How to remove newline characters from a particular HTML tag?:
Now, the generic variants, proposed by @Robin-cruise and @hellena-crainicu, with a final look-ahead only, containing the ESR region will not work, most of the time :-(
@Robin-cruise and @hellena-crainicu:
Be careful about posting simplifications.
Probably best to leave these things to the “Master”. :-)
-
The best solution is this:
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\s+
General regex:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\KFR
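One consequence of using `\s+` as FR can be sketched with the same kind of Python emulation. The helper `generic_sr` is hypothetical, `Pattern.match(text, pos)` stands in for `\G`, and `\R` is approximated by `(?:\r\n|\r|\n)`.

The lazy gap lets FR be tried at every position, and `\s+` also matches ordinary spaces. As a result, every run of spaces, tabs and line breaks inside the tag becomes a single space. By contrast, `\R` replaces each line break individually. Whether that is desirable depends on the text being edited.

```python
import re

R = r"(?:\r\n|\r|\n)"  # stand-in for \R (assumption: CRLF, CR and LF only)


def generic_sr(text, bsr, esr, fr, rr):
    r"""Approximate Replace All of (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) -> RR.
    Pattern.match(text, pos) stands in for \G; the named group 'fr' for the part after \K."""
    gap = "(?s:(?:(?!" + esr + ").)*?)"  # lazy run of chars, none of which starts ESR
    fresh = re.compile("(?:" + bsr + ")" + gap + "(?P<fr>" + fr + ")")
    cont = re.compile(gap + "(?P<fr>" + fr + ")")
    out, copied, last_end, pos = [], 0, None, 0
    while pos < len(text):
        m = fresh.match(text, pos)
        if m is None and pos == last_end:
            m = cont.match(text, pos)
        if m is None:
            pos += 1
            continue
        out.append(text[copied:m.start("fr")])
        out.append(rr)
        copied = last_end = pos = m.end()
    out.append(text[copied:])
    return "".join(out)


BSR = r'<p class=".+?">'
sample = '<p class="x">a  b\n\nc</p>\n<div>\n</div>'
print(repr(generic_sr(sample, BSR, "</p>", r"\s+", " ")))  # every whitespace run -> one space
print(repr(generic_sr(sample, BSR, "</p>", R, " ")))       # each line break -> one space
```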