Regex: How to remove newline characters from a particular HTML tag?
-
I have this HTML tag, which is interrupted by a line break (\n) at some point, after the word masuri:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
reluata apoi de catre instrumentul solist cu un cintec popular.</p>
THE OUTPUT must be:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p>
I tried this regex, but it doesn’t work well, because it also changes the rest of the HTML code, not just that particular tag.
FIND:
(?:<p class="mb-40px">|\G)(?:(?!</p>).)*?\K(\r\n|\r|\n)
REPLACE BY:
\x20
I also found a solution by @neil-schipper on another page of this forum, but I don’t know how to integrate it with my HTML tag:
FIND:
(?<=[^\r\n])\R(?=[^\r\n])
REPLACE BY: (LEAVE EMPTY)
-
This is just a (by now) simple replace-but-only-between-delimiters problem; see HERE for the templatized solution.
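A side note for anyone who needs the same replace-but-only-between-delimiters idea outside Notepad++: below is a minimal sketch in Python. It is not the templatized regex solution referred to above. Python's built-in re module supports neither \G nor \K, so the sketch swaps in a different technique: match each whole <p class="mb-40px">...</p> region, then replace line breaks only inside that match. The names REGION, LINE_BREAK and join_lines_in_tag, and the <h1> line in the sample, are made up for the illustration.

```python
import re

# One whole region: the opening tag through the nearest closing </p>.
# DOTALL lets "." cross line breaks; the lazy "*?" stops at the first </p>.
REGION = re.compile(r'<p class="mb-40px">.*?</p>', re.DOTALL)

# A line break in any of its three common forms.
LINE_BREAK = re.compile(r'\r\n|\r|\n')


def join_lines_in_tag(html: str) -> str:
    """Replace line breaks by a space, but only inside the matched regions."""
    return REGION.sub(lambda m: LINE_BREAK.sub(' ', m.group(0)), html)


# Hypothetical sample: the <h1> line is an assumption, added to show
# that text outside the region keeps its line breaks.
sample = (
    '<h1>title\n</h1>\n'
    '<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
    'reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n'
)
print(join_lines_in_tag(sample))
```

Only the line break after masuri: is replaced; the breaks around the <h1> tag and after the closing </p> stay as they are.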
-
@alan-kilborn THANKS, it works !!
Find:
(?-i:<p class="mb-40px">|(?!\A)\G)(?s:(?!</p>).)*?\K(?-i:(?<=[^\r\n])\R(?=[^\r\n]))
Replace by:
\x20
-
Another solution:
(\r\n|\r|\n)
FIND:
(<p class="mb-40px">)+(.)+\K(\r\n|\r|\n)(?=.*<\/p>)
REPLACE BY:
\x20
The GENERIC regex formula below can be made much simpler than the GENERIC regex formulas @guy038 has posted elsewhere:
(REGION-START)+(.)+\K(FIND REGEX)(?=.*REGION-FINAL)
-
@robin-cruise said in Regex: How to remove newline characters from a particular HTML tag?:
The GENERIC regex formula below can be made much simpler than @guy038 made
Why should you be believed over @guy038 ?
-
Another alternative to Robin’s generic formula, a better version, can be:
(REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))
-
Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu and All,
Referring to my first blog post about a generic regex, below:
https://community.notepad-plus-plus.org/post/75007
and as Robin wants to search for line-ending chars, we need to use, of course, the complete generic regex S/R:
SEARCH
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
REPLACE RR
and not the simplified single-line version.
So:
- The FR regex is just \R, as the associated non-capturing group, beginning with (?-si:... , is useless in this case
- The RR regex is \x20
- The BSR regex may be strictly the string <p class="mb-40px"> but may also be expressed as <p class=".+?">
- The ESR regex is, of course, the ending tag </p>, which must never occur before the next line-ending to replace
giving the functional regex S/R:
SEARCH
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\R
REPLACE
\x20
Test it against this text:
<a href="https://www.w3schools.com/">We strongly suggest to visit the w3schools.com site</a>
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
reluata apoi de catre instrumentul solist cu un cintec popular.</p>
<p class="Test">A SINGLE line</p>
<h1>this is my very first heading </h1>
<p class="123-456 789">This is a quick text to verify if it replaces line-endings by a space char in <p> tags ONLY</p>
ONLY the <p class.............</p> regions, multi-line or not, should be affected by the replacement!
Of course, these HTML commands do not represent a legal HTML file and are just used to verify the regex S/R!
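To make the roles of BSR, ESR, FR and RR concrete, here is a rough Python sketch. It is an emulation under stated assumptions, not the Notepad++ regex itself: Python's re module has no \G, \K or \R, so the sketch first matches a whole region (a BSR match followed by every character that does not start an ESR match) and then replaces FR inside that region's body. The function name replace_in_regions is hypothetical, \r\n|\r|\n stands in for \R, and the positions of the line breaks in the sample are assumptions.

```python
import re


def replace_in_regions(text: str, bsr: str, esr: str, fr: str, rr: str) -> str:
    """Replace each FR match by RR, but only between a BSR match and the
    next ESR match (or the end of the text if no ESR follows)."""
    # "open" is the BSR match; "body" is every following character that
    # does not start an ESR match. (?s:...) lets "." cross line breaks in
    # the body only, so a BSR such as <p class=".+?"> stays on one line.
    region = re.compile(rf'(?P<open>{bsr})(?P<body>(?s:(?:(?!{esr}).)*))')
    find = re.compile(fr)
    return region.sub(
        # lambda _: rr inserts RR literally, without escape processing.
        lambda m: m.group('open') + find.sub(lambda _: rr, m.group('body')),
        text,
    )


# Assumed sample: the line-break positions are made up for the demonstration.
sample = (
    '<a href="https://www.w3schools.com/">We strongly suggest\n'
    'to visit the w3schools.com site</a>\n'
    '<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
    'reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n'
    '<p class="Test">A SINGLE line</p>\n'
    '<h1>this is my very first heading\n'
    '</h1>\n'
)
result = replace_in_regions(sample, r'<p class=".+?">', r'</p>', r'\r\n|\r|\n', ' ')
print(result)
```

In this sample only the break after masuri: is replaced. The breaks inside the <a> and <h1> elements, and those between elements, are left alone because they are never inside a region body.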
Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(
SEARCH
(?-si:BSR|(?!\A)\G).*?\K(?-si:FR)(?=(?s-i:.*?ESR))
In our case, the functional regex S/R becomes:
SEARCH
(?-si:<p class=".+?">|(?!\A)\G).*?\K\R(?=(?s-i:.*?</p>))
REPLACE
\x20
But if you test it against, for instance:
<p class="Test">Several consecutive lines</p>
<h1>this is my very first heading </h1>
<p class="Test">A SINGLE line</p>
<h2>this is my second heading </h2>
It would concatenate all the text up to the last </p> of the file, leaving only the last <h2> tag untouched. You could say: "But I did add a final question mark in order to get a lazy range of chars before </p>!"
You’re right! But remember that the regex engine tries, by all means, to find a match. So, it matches the CRLF chars which follow lines</p>, because the regex engine considers that the .*? lazy range of chars begins immediately after the line-ending and continues till right before the third and final </p>, which satisfies the look-ahead assertion!
Thus, testing that the ESR region is not reached at any position, until the NEXT FR match, seems the only method which works properly!
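The look-ahead pitfall can be reproduced in isolation with a small Python sketch. This is a simplification, not the S/R above: it keeps only the look-ahead condition (Python's re has no \G or \K), and the places where the sample is split into lines are assumptions.

```python
import re

# Assumed layout of the sample: the line-break positions are made up.
sample = (
    '<p class="Test">Several\n'
    'consecutive\n'
    'lines</p>\n'
    '<h1>this is my very first heading\n'
    '</h1>\n'
    '<p class="Test">A SINGLE line</p>\n'
    '<h2>this is my second heading\n'
    '</h2>\n'
)

# Look-ahead only: "a line break followed, somewhere later, by </p>".
# Nothing here checks that the line break sits INSIDE a <p ...> region,
# and the lazy .*? is free to run past </h1> up to the last </p>.
lookahead_only = re.compile(r'\n(?=(?s:.*?</p>))')

result = lookahead_only.sub(' ', sample)
print(result)
```

Every line break located before the last </p> is replaced, including those around the <h1> element. Only the breaks after the last </p> survive, which matches the behaviour described above.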
Best Regards
guy038
Reminder: Move to the very beginning of the text before clicking on the Find Next or Replace All button!
-
@alan-kilborn said in Regex: How to remove newline characters from a particular HTML tag?:
Why should you be believed over @guy038 ?
@guy038 said in Regex: How to remove newline characters from a particular HTML tag?:
Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(
@Robin-cruise and @hellena-crainicu :
Be careful of posting simplifications.
Probably best to leave these things to the “Master”. :-)
-
The best solution is this:
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\s+
General regex:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\KFR
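Using \s+ as the FR part changes the result in one noticeable way, which the following Python sketch tries to show. Assumptions: the replacement is a single space (the post does not state it), the region is matched as a whole because Python's re lacks \G and \K, and the names REGION and normalize plus the sample line are invented for the comparison.

```python
import re

# "open" is the opening tag; "body" is everything up to, but not
# including, the next </p>. (?s:...) lets "." cross line breaks there.
REGION = re.compile(r'(?P<open><p class=".+?">)(?P<body>(?s:(?:(?!</p>).)*))')


def normalize(html: str, fr: str) -> str:
    """Replace every FR match inside a region body by one space (assumed RR)."""
    find = re.compile(fr)
    return REGION.sub(
        lambda m: m.group('open') + find.sub(' ', m.group('body')), html
    )


# Hypothetical sample: trailing spaces before the break, indentation after it.
sample = '<p class="x">first line  \n    second line</p>\n'

print(repr(normalize(sample, r'\r\n|\r|\n')))  # only the line break is replaced
print(repr(normalize(sample, r'\s+')))         # whole whitespace runs collapse
```

With a plain line-break FR, the trailing spaces and the indentation of the next line remain in the joined text. With \s+, each run of whitespace inside the tag becomes a single space, including runs that contain no line break at all, which may or may not be what you want.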