How to remove all HTML tags except <p> or <h1> <h2> tags?
- 
 I have several articles in txt files under a directory. The articles’ HTML is somehow messed up. I want to remove all HTML tags except <p>, <h1>, and <h2>. The following regex removes all HTML tags:
 <[^>]+>
 How do I add an exception so that any p, h1, or h2 tags are kept?
 Thank you in advance for sharing your RegEx knowledge!
- 
 I would recommend a negative lookahead assertion:
 FIND = <(?!h1|h2|p)[^>]+>
 That says: “look for <, look ahead and make sure the next characters aren’t h1, h2, or p, then consume one or more non-> characters up to the first >.”
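 To see which tags the lookahead lets through, it can help to test the pattern against individual tags. The sketch below uses Python's re module, which is an assumption (the thread does not say which tool runs the search); the helper name is_removed is made up for illustration. Note that the lookahead only checks a prefix, so a tag such as <pre> is also kept because it starts with p.

```python
import re

# Pattern suggested above: "<", a negative lookahead, then everything up to ">".
PATTERN = re.compile(r"<(?!h1|h2|p)[^>]+>")

def is_removed(tag: str) -> bool:
    """Return True when the pattern matches the whole tag, i.e. it would be replaced."""
    return PATTERN.fullmatch(tag) is not None

# Opening tags only; each line prints the tag and whether the pattern would strip it.
for tag in ["<div>", "<span class='x'>", "<h1>", "<h2>", "<p>", "<pre>"]:
    print(tag, "removed" if is_removed(tag) else "kept")
```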
- 
 Thank you for the reply. This regex now replaces all HTML tags except the h1, h2, and p tags:
 <(?!h1|h2|p)[^>]+>
 But I notice that it also replaces the closing </h1>, </h2>, and </p> tags.
 I tried the patterns below to keep those closing tags, but they failed:
 <(?!h1|/h2|h2|/h2||p|/p)[^>]+>
 or this
 <(?!h1|\h2|h2|\h2||p|\p)[^>]+>
 Would you advise how to keep the closing tags?
- 
 I found that this regex does the job:
 </?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>
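 The pattern above also keeps every tag whose name starts with a, p, ul, li, or h (for example <pre> or <html>). If only p, h1, and h2 should survive, including their closing tags, one possible tightening is to put an optional slash and a word boundary inside the lookahead. This is a minimal sketch in Python, which is an assumption about the tool; the function name strip_tags_except_p_h1_h2 is invented for illustration. Regex-based cleanup like this assumes reasonably simple markup and may misbehave on comments or on > characters inside attribute values.

```python
import re

# "<", then refuse to match if the tag name (with an optional leading "/")
# is exactly p, h1, or h2; \b stops "p" from also protecting "pre".
TAG_PATTERN = re.compile(r"<(?!/?(?:p|h1|h2)\b)[^>]+>", re.IGNORECASE)

def strip_tags_except_p_h1_h2(html: str) -> str:
    """Remove every tag except opening and closing p, h1, and h2 tags."""
    return TAG_PATTERN.sub("", html)

sample = (
    '<div><h1>Title</h1><p class="x">Text <b>bold</b> and '
    "<pre>code</pre></p><h3>Sub</h3></div>"
)
print(strip_tags_except_p_h1_h2(sample))
# prints: <h1>Title</h1><p class="x">Text bold and code</p>Sub
```

 The same pattern should carry over to other PCRE-style search-and-replace tools, since it relies only on a negative lookahead and \b.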
