@Vasile-Caraus said in Regex: Delete all html tags inside 2 other tags, except <a href=.*?"> and </a>:
And how did you convert the text?
I just used the find/replace form, with regular expressions on.
Thanks a lot. But how did you manage to find this solution?
Since you’ve taken an interest, I’ll give a pretty detailed explanation of my regex.
By the way, I have a slight update that should work just as well, but is simpler:
Replace (?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20))[^>]*> with nothing.
It’s modeled on guy038’s now-famous regex for replacing in a specific region of text. I won’t explain all the parts of this regex that are indebted to that; you can just read his excellent explanation in the linked post.
Specifically, the BSR (the start of the search region) is <p[^>]*>, which is an opening p tag, and the ESR (the end of the search region) is </p>, the closing p tag.
So far this accounts for the first part of the regex, (?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K. But the tricky part is matching only the tags we want to delete, that is, every tag except the opening <a ...>, the closing </a>, and the closing </p>.
We know that any tag we want to remove matches <[^>]*>, that is, an opening <, some stuff that isn’t >, and a closing >.
To distinguish the tags we want to remove, we’ll do a negative lookahead right after the opening <, so we get <(?!{%distinguishing text%})[^>]*>.
Let’s start by observing that the tag cannot be a closing a or p tag. This is the /[ap]> branch of the negative lookahead, where [ap] simply means “a or p”.
Next we need to rule out opening a tags. This is the a\x20 branch of the negative lookahead. By the way, \x20 is just another way to say space, as in the space you make with your space bar. Regex aficionados like to use \x20, because it can’t be mistaken for any other character. Note that this branch only protects an opening a tag that is followed by a space, as in <a href="...">; a bare <a> with no attributes would still be removed.
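If you want to convince yourself of what the two lookahead branches do, here is a small Python sketch that tests only the tag-matching tail of the regex against a few sample tags. The function name is_removable and the sample tags are my own, made up for illustration; Python's re module happens to accept this part of the pattern unchanged.

```python
import re

# Only the tag-matching tail of the regex: "<", then a negative lookahead
# with two branches, then the rest of the tag up to the closing ">".
TAG = re.compile(r"<(?!(?:/[ap]>|a\x20))[^>]*>")

def is_removable(tag):
    """Return True if the whole string is a tag the regex would delete."""
    return TAG.fullmatch(tag) is not None

# Hypothetical sample tags, chosen to exercise each branch.
for tag in ['<b>', '</b>', '<span class="x">', '<a href="u">', '</a>', '</p>', '<a>']:
    print(tag, is_removable(tag))
```

The /[ap]> branch is what spares </a> and </p>, and the a\x20 branch is what spares <a href="u">. Everything else in the list, including the bare <a>, is reported as removable.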
So we arrive at the final regex, (?s-i)(?:<p[^>]*>|(?!\A)\G)(?:(?!</p>).)*?\K<(?!(?:/[ap]>|a\x20))[^>]*>
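For anyone who wants to try the same idea outside the find/replace form, here is a rough Python sketch. Be aware that it is not the same regex: Python's built-in re module has no \G or \K, so I swapped in a two-step approach instead (first find each <p ...>...</p> region, then delete the unwanted tags inside it). The function name strip_tags_in_paragraphs and the sample text are invented for this example.

```python
import re

# Step 1: a region runs from an opening p tag (the BSR) to the first
# closing </p> (the ESR). (?s) lets "." match newlines, as in the original.
REGION = re.compile(r"(?s)(<p[^>]*>)(.*?)(</p>)")

# Step 2: the same tag-matching tail used in the find/replace regex.
TAG = re.compile(r"<(?!(?:/[ap]>|a\x20))[^>]*>")

def strip_tags_in_paragraphs(html):
    """Delete every tag inside <p>...</p> except <a ...>, </a> and </p>."""
    def clean(match):
        opening, body, closing = match.groups()
        # Leave the opening and closing p tags alone; clean only the body.
        return opening + TAG.sub("", body) + closing
    return REGION.sub(clean, html)

# Hypothetical sample text.
sample = ('<div><p class="c">Some <b>bold</b> and '
          '<a href="x.html">a <i>link</i></a>.</p><span>kept</span></div>')
print(strip_tags_in_paragraphs(sample))
```

On the sample, the b and i tags inside the paragraph disappear, while the a tags, the p tags, and everything outside the paragraph (the div and span) are left alone.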