Regex: How to remove newline characters from particular HTML tags?
-
I have this HTML tag, which is interrupted by a newline (\n) after the word masuri:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:
reluata apoi de catre instrumentul solist cu un cintec popular.</p>
THE OUTPUT must be:
<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p>
I tried this regex, but it doesn't work well, because it also changes the entire HTML code, not just that particular tag.
FIND:
(?:<p class="mb-40px">|\G)(?:(?!</p>).)*?\K(\r\n|\r|\n)
REPLACE BY:
\x20
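A likely reason this regex touches the whole file is that \G is also allowed to match at the very start of the file, so the chain of matches begins on line 1 instead of at <p class="mb-40px">. The later answers guard it with (?!\A). Python's stdlib re has no \G, \K or \R, so the sketch below emulates them with a hypothetical helper, g_chain_replace(), assuming "Replace All" is started at the very beginning of the text and ". matches newline" is unchecked. It only approximates Notepad++'s Boost engine.

```python
import re


def g_chain_replace(text, bsr, body, fr, rr, allow_g_at_start=False):
    r"""Hypothetical helper: emulate Replace All of (?:BSR|(?!\A)\G)BODY\K(FR) -> RR.

    Python's re lacks \G and \K, so \G is emulated with Pattern.match(text, pos),
    which is anchored at pos, and \K by replacing only the named group "fr".
    allow_g_at_start=True drops the (?!\A) guard, letting \G match at offset 0.
    """
    start = re.compile(f"(?:{bsr}){body}(?P<fr>{fr})")
    cont = re.compile(f"{body}(?P<fr>{fr})")
    pieces, last, pos = [], 0, 0
    g_pos = 0 if allow_g_at_start else None  # offset where \G may match
    while True:
        m = start.match(text, pos)            # BSR alternative at pos
        if m is None and pos == g_pos:
            m = cont.match(text, pos)         # \G alternative at pos
        if m is None:
            m = start.search(text, pos + 1)   # further on, only BSR can match
        if m is None:
            break
        pieces.append(text[last:m.start("fr")])
        pieces.append(rr)
        last = pos = g_pos = m.end("fr")
    pieces.append(text[last:])
    return "".join(pieces)


text = 'line one\nline two\n<p class="mb-40px">A:\nB.</p>\n'
body = "(?:(?!</p>).)*?"   # "." stops at line endings
fr = r"\r\n|\r|\n"         # the question's (\r\n|\r|\n)

# The question's pattern: \G unguarded, so the lines before the tag are joined too.
unguarded = g_chain_replace(text, '<p class="mb-40px">', body, fr, " ", allow_g_at_start=True)
# Same pattern with the (?!\A) guard: only the break inside the tag is replaced.
guarded = g_chain_replace(text, '<p class="mb-40px">', body, fr, " ")
print(repr(unguarded))
print(repr(guarded))
```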
Also, I found a solution by @neil-schipper on another page of this forum, but I don't know how to integrate it with my HTML tag:
FIND:
(?<=[^\r\n])\R(?=[^\r\n])
REPLACE BY: (LEAVE EMPTY)
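On its own, this regex acts on the whole file: a line break is removed only when it has a non-line-break character on both sides, so blank lines survive. A minimal Python sketch; join_single_breaks() is an illustrative name, and (?:\r\n|\r|\n) stands in for \R because Python's re does not support \R.

```python
import re

# (?<=[^\r\n]) : the break must follow a real character, not another break
# (?=[^\r\n])  : ... and be followed by one, so empty lines are left alone
JOIN_SINGLE_BREAKS = re.compile(r"(?<=[^\r\n])(?:\r\n|\r|\n)(?=[^\r\n])")


def join_single_breaks(text):
    # Empty replacement, as in the FIND/REPLACE above.
    return JOIN_SINGLE_BREAKS.sub("", text)


lf = join_single_breaks("first\nsecond\n\nthird")
crlf = join_single_breaks("a\r\nb\r\n\r\nc")
print(repr(lf))
print(repr(crlf))
```

With an empty replacement the neighbouring words are glued together ("firstsecond"), which is why a space (\x20) is the more useful replacement for the paragraph case.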
-
This is just a (by now) simple replace-but-only-between-delimiters problem; see HERE for the templatized solution.
-
@alan-kilborn THANKS, it works !!
Find:
(?-i:<p class="mb-40px">|(?!\A)\G)(?s:(?!</p>).)*?\K(?-i:(?<=[^\r\n])\R(?=[^\r\n]))
Replace by:
\x20
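The working regex above can be sketched in Python with the same kind of emulation: the hypothetical g_chain_replace() helper fakes \G with Pattern.match(text, pos) and \K with a named group, since Python's re supports neither, and (?:\r\n|\r|\n) replaces \R. Because the body uses (?s:...), the chain can step over a blank line, while the look-behind/look-ahead pair keeps the blank line itself intact. A sketch only, assuming "Replace All" is run from the start of the text.

```python
import re


def g_chain_replace(text, bsr, body, fr, rr):
    r"""Hypothetical helper: emulate Replace All of (?:BSR|(?!\A)\G)BODY\K(FR) -> RR.

    \G is emulated with Pattern.match(text, pos); \K by replacing group "fr" only.
    """
    start = re.compile(f"(?:{bsr}){body}(?P<fr>{fr})")
    cont = re.compile(f"{body}(?P<fr>{fr})")
    pieces, last, pos, g_pos = [], 0, 0, None  # (?!\A): no \G before 1st match
    while True:
        m = start.match(text, pos)            # BSR alternative at pos
        if m is None and pos == g_pos:
            m = cont.match(text, pos)         # \G alternative at pos
        if m is None:
            m = start.search(text, pos + 1)   # further on, only BSR can match
        if m is None:
            break
        pieces.append(text[last:m.start("fr")])
        pieces.append(rr)
        last = pos = g_pos = m.end("fr")
    pieces.append(text[last:])
    return "".join(pieces)


BSR = '<p class="mb-40px">'
BODY = "(?s:(?!</p>).)*?"                       # may cross lines, never </p>
FR = r"(?<=[^\r\n])(?:\r\n|\r|\n)(?=[^\r\n])"   # a single break between text

question = ('<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
            'reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n')
fixed = g_chain_replace(question, BSR, BODY, FR, " ")
# The blank line between "B" and "C" is kept; the other breaks become spaces.
blank = g_chain_replace('<p class="mb-40px">A\nB\n\nC\nD</p>\n', BSR, BODY, FR, " ")
print(fixed)
print(repr(blank))
```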
-
Another solution:
(\r\n|\r|\n)
FIND:
(<p class="mb-40px">)+(.)+\K(\r\n|\r|\n)(?=.*<\/p>)
REPLACE BY:
\x20
The GENERIC regex formula below is much simpler than the ones @guy038 made in many of his other GENERIC regex formulas:
(REGION-START)+(.)+\K(FIND REGEX)(?=.*REGION-FINAL)
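Instantiating this formula shows its limits. Without \G, every match has to start again at <p class="mb-40px">, and when ". matches newline" is unchecked, both (.)+ and the look-ahead .*</p> stop at the end of a line. So the pattern can join a paragraph split over two lines, but not one split over three. A Python sketch under that assumption; another_solution() is an illustrative name, \K is emulated by capturing the text before the break as group 1, the redundant + after the first group is dropped, and (.)+ is written .+, which matches the same text.

```python
import re

# "Another solution" in Python syntax, assuming ". matches newline" is unchecked.
# Python's re has no \K, so the text in front of the line break is captured as
# group 1 and written back in the replacement.
ANOTHER = re.compile(r'(<p class="mb-40px">.+)(?:\r\n|\r|\n)(?=.*</p>)')


def another_solution(text):
    return ANOTHER.sub(r"\1 ", text)


two_lines = another_solution('<p class="mb-40px">A:\nB.</p>\n')
three_lines = another_solution('<p class="mb-40px">A:\nB\nC.</p>\n')
print(repr(two_lines))    # the only line break is replaced
print(repr(three_lines))  # nothing is replaced
```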
-
@robin-cruise said in Regex: How to remove newline characters from particular HTML tags?:
The GENERIC regex formula below is much simpler than the ones @guy038 made
Why should you be believed over @guy038 ?
-
Another alternative to Robin's generic formula, a better version, could be:
(REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL)))
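In this variant the look-ahead is wrapped in (?s:...), so it can reach any later </p> in the file, not only the one closing the current paragraph, and there is still no \G to continue inside a paragraph. A Python sketch of the two consequences, assuming ". matches newline" is unchecked; variant() is an illustrative name, \K is emulated with capture group 1, and \R is written (?:\r\n|\r|\n).

```python
import re

# (REGION-START)+(.)+\K(FIND REGEX)(?s:(?=.*(REGION-FINAL))) instantiated for
# this thread, with \K emulated by group 1 and \R by (?:\r\n|\r|\n).
VARIANT = re.compile(r'(<p class="mb-40px">.+)(?:\r\n|\r|\n)(?=(?s:.*</p>))')


def variant(text):
    return VARIANT.sub(r"\1 ", text)


# 1) Only the first break after the tag is joined per pass: "B" and "C" stay apart.
three_lines = variant('<p class="mb-40px">A\nB\nC</p>\n')
# 2) The break AFTER </p> is replaced, because some later </p> exists in the file.
outside = variant('<p class="mb-40px">A</p>\n<p class="other">B</p>\n')
print(repr(three_lines))
print(repr(outside))
```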
-
Hello, @robin-cruise, @alan-kilborn, @hellena-crainicu and All,
Referring to my first blog post about a generic regex, below:
https://community.notepad-plus-plus.org/post/75007
and as Robin wants to search for line-ending chars, we need to use, of course, the complete generic regex S/R:
SEARCH
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
REPLACE
RR
and not the simplified single-line version.
So:
- The FR regex is just \R, as the non-capturing group around it, beginning with (?-si:..., is useless in this case, since the s and i modifiers do not affect \R
- The RR regex is \x20
- The BSR regex may be strictly the string <p class="mb-40px"> but may also be expressed as <p class=".+?">
- The ESR regex is, of course, the ending tag </p>, which must never occur before the next line-ending to replace

This gives the functional regex S/R:
SEARCH
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\R
REPLACE
\x20
Test it against this text:
<a href="https://www.w3schools.com/">We strongly suggest to visit the w3schools.com site</a> <p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri: reluata apoi de catre instrumentul solist cu un cintec popular.</p> <p class="Test">A SINGLE line</p> <h1>this is my very first heading </h1> <p class="123-456 789">This is a quick text to verify if it replaces line-endings by a space char in <p> tags ONLY</p>
ONLY the <p class.............</p> blocks, multi-line or not, should be affected by the replacement! Of course, these HTML commands do not form a legal HTML file and are just used to verify the regex S/R!
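Plugging the four parts into the template can be sketched in Python. Python's re lacks \G, \K and \R, so the hypothetical g_chain_replace() helper emulates \G with Pattern.match(text, pos) and \K by replacing only a named group, \r\n|\r|\n stands in for \R, and the equally hypothetical generic_sr() builds the (?s:(?!ESR).)*? part from ESR. The sample is an assumed multi-line version of the test text, since its original line breaks did not survive; "Replace All" is assumed to start at the very beginning of the text.

```python
import re


def g_chain_replace(text, bsr, body, fr, rr):
    r"""Hypothetical helper: emulate Replace All of (?:BSR|(?!\A)\G)BODY\K(FR) -> RR."""
    start = re.compile(f"(?:{bsr}){body}(?P<fr>{fr})")
    cont = re.compile(f"{body}(?P<fr>{fr})")
    pieces, last, pos, g_pos = [], 0, 0, None  # (?!\A): no \G before 1st match
    while True:
        m = start.match(text, pos)            # BSR alternative at pos
        if m is None and pos == g_pos:
            m = cont.match(text, pos)         # \G alternative at pos
        if m is None:
            m = start.search(text, pos + 1)   # further on, only BSR can match
        if m is None:
            break
        pieces.append(text[last:m.start("fr")])
        pieces.append(rr)
        last = pos = g_pos = m.end("fr")
    pieces.append(text[last:])
    return "".join(pieces)


def generic_sr(text, bsr, esr, fr, rr):
    # Hypothetical wrapper: ESR must not appear between the region start
    # (or the previous match) and the next FR match.
    return g_chain_replace(text, bsr, f"(?s:(?!{esr}).)*?", fr, rr)


# Assumed line breaks: the forum copy of the test text lost the original ones.
sample = (
    '<a href="https://www.w3schools.com/">We strongly suggest\n'
    'to visit the w3schools.com site</a>\n'
    '<p class="mb-40px">Aceasta este o melodie alcatuita din patru masuri:\n'
    'reluata apoi de catre instrumentul solist cu un cintec popular.</p>\n'
    '<p class="Test">A SINGLE line</p>\n'
    '<h1>this is my\nvery first heading</h1>\n'
    '<p class="123-456 789">This is a quick text\nin <p> tags\nONLY</p>\n'
)
result = generic_sr(sample, r'<p class=".+?">', "</p>", r"\r\n|\r|\n", " ")
print(result)
```

Only the breaks inside <p class="...">...</p> are replaced; the <a> and <h1> blocks keep theirs.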
Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(
SEARCH
(?-si:BSR|(?!\A)\G).*?\K(?-si:FR)(?=(?s-i:.*?ESR))
In our case, the functional regex S/R becomes :
SEARCH
(?-si:<p class=".+?">|(?!\A)\G).*?\K\R(?=(?s-i:.*?</p>))
REPLACE
\x20
But if you test it against, for instance :
<p class="Test">Several consecutive lines</p> <h1>this is my very first heading </h1> <p class="Test">A SINGLE line</p> <h2>this is my second heading </h2>
It would concatenate all the text up to the last </p> of the file, leaving only the last <h2> tag untouched.
You could say: "But I did add a final question mark in order to get a lazy range of chars before </p>!" You're right! But remember that the regex engine tries, by all means, to find a match. So it matches the CRLF chars which follow lines</p>, because the regex engine considers that the .*? lazy range of chars begins immediately after that line-ending and continues till right before the third and final </p>, so the look-ahead assertion is satisfied! Thus, testing at every position that the ESR region is not reached before the NEXT FR match seems to be the only method that works properly!
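The failure described here can be reproduced with a Python sketch. g_chain_replace() is a hypothetical helper that emulates \G with Pattern.match(text, pos) and \K with a named group, since Python's re supports neither, and \r\n|\r|\n stands in for \R. The input is the example text above with assumed line breaks, because the forum copy lost them.

```python
import re


def g_chain_replace(text, bsr, body, fr, rr):
    r"""Hypothetical helper: emulate Replace All of (?:BSR|(?!\A)\G)BODY\K(FR) -> RR."""
    start = re.compile(f"(?:{bsr}){body}(?P<fr>{fr})")
    cont = re.compile(f"{body}(?P<fr>{fr})")
    pieces, last, pos, g_pos = [], 0, 0, None  # (?!\A): no \G before 1st match
    while True:
        m = start.match(text, pos)            # BSR alternative at pos
        if m is None and pos == g_pos:
            m = cont.match(text, pos)         # \G alternative at pos
        if m is None:
            m = start.search(text, pos + 1)   # further on, only BSR can match
        if m is None:
            break
        pieces.append(text[last:m.start("fr")])
        pieces.append(rr)
        last = pos = g_pos = m.end("fr")
    pieces.append(text[last:])
    return "".join(pieces)


text = ('<p class="Test">Several\nconsecutive\nlines</p>\n'
        '<h1>this is my very first heading</h1>\n'
        '<p class="Test">A SINGLE line</p>\n'
        '<h2>this is my second heading</h2>\n')
BSR = r'<p class=".+?">'
LB = r"\r\n|\r|\n"

# Look-ahead-only variant: any later </p> satisfies the look-ahead.
lookahead_only = g_chain_replace(text, BSR, ".*?", f"(?:{LB})(?=(?s:.*?</p>))", " ")
# ESR tested at every position: the chain stops at the first </p>.
guarded = g_chain_replace(text, BSR, "(?s:(?!</p>).)*?", LB, " ")
print(lookahead_only)
print(guarded)
```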
Best Regards
guy038
Reminder: Move to the very beginning of the text before clicking on the Find Next or Replace All button!
-
@alan-kilborn said in Regex: How to remove newline characters from particular HTML tags?:
Why should you be believed over @guy038 ?
@guy038 said in Regex: How to remove newline characters from particular HTML tags?:
Now, the generic variants proposed by @Robin-cruise and @hellena-crainicu, with only a final look-ahead containing the ESR region, will not work most of the time :-(
@Robin-cruise and @hellena-crainicu :
Be careful about posting simplifications.
Probably best to leave these things to the “Master”. :-)
-
The best solution is this:
(?-si:<p class=".+?">|(?!\A)\G)(?s-i:(?!</p>).)*?\K\s+
General regex:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\KFR
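Using \s+ as FR changes what gets replaced: a whole run of whitespace (spaces, tabs and any number of line breaks) becomes a single space, whereas \R replaces each line break separately and keeps the spaces around it. Runs of spaces between words inside the tag are collapsed as well. A Python sketch comparing both, where g_chain_replace() is a hypothetical helper emulating \G and \K (unsupported by Python's re) and \r\n|\r|\n stands in for \R.

```python
import re


def g_chain_replace(text, bsr, body, fr, rr):
    r"""Hypothetical helper: emulate Replace All of (?:BSR|(?!\A)\G)BODY\K(FR) -> RR."""
    start = re.compile(f"(?:{bsr}){body}(?P<fr>{fr})")
    cont = re.compile(f"{body}(?P<fr>{fr})")
    pieces, last, pos, g_pos = [], 0, 0, None  # (?!\A): no \G before 1st match
    while True:
        m = start.match(text, pos)            # BSR alternative at pos
        if m is None and pos == g_pos:
            m = cont.match(text, pos)         # \G alternative at pos
        if m is None:
            m = start.search(text, pos + 1)   # further on, only BSR can match
        if m is None:
            break
        pieces.append(text[last:m.start("fr")])
        pieces.append(rr)
        last = pos = g_pos = m.end("fr")
    pieces.append(text[last:])
    return "".join(pieces)


BSR = r'<p class=".+?">'
BODY = "(?s:(?!</p>).)*?"
# Trailing spaces, indentation and a blank line inside the tags.
text = '<p class="x">A  \n   B</p>\n<p class="x">C\n\nD</p>\n'

with_s = g_chain_replace(text, BSR, BODY, r"\s+", " ")        # FR = \s+
with_r = g_chain_replace(text, BSR, BODY, r"\r\n|\r|\n", " ")  # FR = \R
print(repr(with_s))
print(repr(with_r))
```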