Regex: Delete all html tags inside 2 other tags, except <a href=.*?"> and </a>
-
I want to delete all HTML tags inside 2 other tags, except
<a href=".*?"> and </a>
For example:
<p class="mb-40px">Another blending </h2>option is to all the <div>brushstrokes to show. In the painting of trees above, I didn’t spend much time trying to <a href=https://orfun.com/acrylic class="color-bebe" target="_new">blend the colors</a>. I simply mix each color and apply it without fussing with it.</p>
In the example above, <div> and </h2> must be deleted, but <a href ...> and </a> must be kept.
Output:
<p class="mb-40px">Another blending option is to all the brushstrokes to show. In the painting of trees above, I didn’t spend much time trying to <a href=https://orfun.com/acrylic class="color-bebe" target="_new">blend the colors</a>. I simply mix each color and apply it without fussing with it.</p>
My regex is not too good:
(?s-i)^.+<p class="mb-40px">\R|</p>.+|(?-s)(<a href.*>)?(?|(.+)(</a>)|(.+))$
Replace by:
?2(?1:<p class="mb-40px">)$0(?3:</p>):$1
-
Tough challenge! But I believe I have a regex that will meet your need.
FIND:
(?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20[^>]+>))[^>]*>
REPLACE WITH: <empty>
I converted
<p class="mb-40px">Delete <h2>ALL </h2>of the <div>html</div> <abc foo="bar">tags inside </abc> of a p element <abstract>even this one here</abstract> <a href=https://orfun.com/acrylic class="color-bebe" target="_new">UNLESS THE TAG IS AN a tag</a>. <A HREF="blah">uppercase A tags don't count</A> Text should be left as is</p> <div> This is not a p tag, so <all>the tags</all> in these <here>tags</here> should be left <a href="orneorne">untouched.</a> <p>but <a href="reorn">not</a> <this>tag!</this></p> </div>
into this:
<p class="mb-40px">Delete ALL of the html tags inside of a p element even this one here <a href=https://orfun.com/acrylic class="color-bebe" target="_new">UNLESS THE TAG IS AN a tag</a>. uppercase A tags don't count Text should be left as is</p> <div> This is not a p tag, so <all>the tags</all> in these <here>tags</here> should be left <a href="orneorne">untouched.</a> <p>but <a href="reorn">not</a> tag!</p> </div>
-
@Mark-Olson said in Regex: Delete all html tags inside 2 other tags, except <a href=.*?"> and </a>:
(?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20[^>]+>))[^>]*>
Thanks a lot. But how did you manage to find this solution?
And how did you convert the text?
Where can I find this [ap]? I have never seen it!
-
@Vasile-Caraus said in Regex: Delete all html tags inside 2 other tags, except <a href=.*?"> and </a>:
And how did you convert the text?
I just used the find/replace form, with regular expressions turned on.
Thanks a lot. But how did you manage to find this solution?
Since you’ve taken an interest, I’ll give a pretty detailed explanation of my regex.
By the way, I have a slight update that should work just as well, but is simpler:
Replace
(?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20))[^>]*>
with nothing.
- It’s modeled off of guy038’s now-famous replacing in a specific region of text regex. I won’t explain all the parts of this regex that are indebted to that; you can just read his excellent explanation in the linked post.
- Specifically, the BSR is <p[^>]*>, which is an opening p tag, and the ESR is </p>, the closing p tag.
- So far this accounts for the first part of the regex, (?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K. But the tricky part is matching only tags other than <a> and the closing </p> tag.
- We know that any tag we want to remove contains <[^>]*>, that is, an opening <, some stuff, and a closing >.
- To distinguish the tags we want to remove, we’ll do a negative lookahead right after the opening <, so we get <(?!{%distinguishing text%})[^>]*>.
- Let’s start by observing that the tag cannot be a closing a or p tag. This is the /[ap]> branch of the negative lookahead, where [ap] simply means “a or p”.
- Next we need to rule out opening a tags. This is the a\x20 branch of the negative lookahead. By the way, \x20 is just another way to say space, as in the space you make with your space bar. Regex aficionados like to use \x20, because it can’t be mistaken for any other character.
- So we arrive at the final regex,
(?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20))[^>]*>
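If you want to try the same idea outside the find/replace form, here is a rough Python sketch. Note that it is not the regex above: Python's built-in re module has no \K or \G, so this swaps in a two-pass approach (first find each p element, then delete tags only inside it). The negative lookahead <(?!(?:/[ap]>|a\x20))[^>]*> is reused unchanged. The names P_ELEMENT, REMOVABLE_TAG and strip_tags_inside_p are made up for this illustration, and it assumes p elements are not nested.

```python
import re

# Opening p tag, body, closing p tag, captured separately so that only the
# body gets cleaned. DOTALL plays the role of (?s) in the thread's regex.
P_ELEMENT = re.compile(r'(<p[^>]*>)(.*?)(</p>)', re.DOTALL)

# Same negative lookahead as in the explanation: match any tag unless it is
# a closing a/p tag (/[ap]>) or an opening a tag (a followed by a space).
# Python's re is case-sensitive by default, which matches (?-i).
REMOVABLE_TAG = re.compile(r'<(?!(?:/[ap]>|a\x20))[^>]*>')


def strip_tags_inside_p(html):
    """Delete every tag inside <p>...</p> except <a ...> and </a> (hypothetical helper)."""
    def clean(match):
        opening, body, closing = match.groups()
        # Only the body is touched; the surrounding p tags are kept as they are.
        return opening + REMOVABLE_TAG.sub('', body) + closing
    return P_ELEMENT.sub(clean, html)


sample = ('<p class="mb-40px">Another blending </h2>option is to all the '
          '<div>brushstrokes to show. <a href=https://orfun.com/acrylic '
          'class="color-bebe" target="_new">blend the colors</a>.</p>')
print(strip_tags_inside_p(sample))
```

On the sample from the first post this should drop </h2> and <div> while keeping the link, and text outside any p element is left alone.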