How to remove all HTML tags except <p> or <h1> <h2> tags?

4 Posts 2 Posters 3.7k Views
  • N
    NZ Select
    last edited by Jul 18, 2020, 3:18 PM

    I have several articles in txt files under a directory.

    The articles’ html code is somehow messed up.

    I want to remove all HTML tags except the <p>, <h1>, and <h2> tags.

    The following regex removes all HTML tags:
    <[^>]+>

    How can I add an exception so that any p, h1, or h2 tags are kept?

    Thank you in advance for your sharing of RegEx knowledge!

    • P
      PeterJones @NZ Select
      last edited by Jul 18, 2020, 4:43 PM

      @NZ-Select ,

      I would recommend a negative lookahead assertion: FIND = <(?!h1|h2|p)[^>]+>. That says, “look for <, look ahead and make sure it isn’t followed by h1 or h2 or p, then consume one or more non-> characters up to the first >”.
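
The lookahead can be tried outside Notepad++ with a short Python sketch. This assumes Python's `re` module treats this simple pattern the same way Notepad++'s engine does; the list of tags is made up for illustration.

```python
import re

# Pattern from the post: "<", a negative lookahead rejecting names that
# start with h1, h2 or p, then one or more non-">" characters, then ">".
PATTERN = re.compile(r"<(?!h1|h2|p)[^>]+>")

# Hypothetical opening tags, not taken from the original articles.
tags = ["<div>", '<span class="x">', "<h1>", "<h2>", '<p id="a">', "<pre>"]

# A tag is removed by the search-and-replace only if the pattern matches it.
removed = [t for t in tags if PATTERN.fullmatch(t)]
kept = [t for t in tags if not PATTERN.fullmatch(t)]

print("removed:", removed)
print("kept:   ", kept)
```

Because the lookahead only checks how the tag name starts, a tag such as `<pre>` is kept as well, since it begins with `p`.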

      • N
        NZ Select @PeterJones
        last edited by Jul 18, 2020, 8:46 PM

        @PeterJones

        Thank you for the reply.

        This regex now replaces all HTML tags except the h1, h2, and p tags:
        <(?!h1|h2|p)[^>]+>

        But I notice that it also replaces the closing </h1>, </h2>, and </p> tags.
        I tried the following to keep those tags as well, but both failed:
        <(?!h1|/h2|h2|/h2||p|/p)[^>]+>
        or this
        <(?!h1|\h2|h2|\h2||p|\p)[^>]+>

        Could you advise how to keep the closing tags?
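
One likely reason the first attempt fails is the doubled `||`. It creates an empty alternative, and an empty alternative matches at every position, so the negative lookahead always rejects and the pattern never matches anything. The second attempt has the same `||` and also uses backslashes, which escape the next character instead of matching a literal `/`. A minimal Python sketch of the first effect follows; Python's `re` stands in for Notepad++'s engine here, and both the sample string and the `corrected` pattern are hypothetical.

```python
import re

# First attempt from the post, copied verbatim. Note the "||" in the middle.
attempt = re.compile(r"<(?!h1|/h2|h2|/h2||p|/p)[^>]+>")

# Same list of alternatives with the empty one removed and "/h1" added
# (a hypothetical correction, for illustration only).
corrected = re.compile(r"<(?!h1|/h1|h2|/h2|p|/p)[^>]+>")

# Invented sample input.
sample = "<div><h1>Title</h1><p>Text</p></div>"

attempt_matches = attempt.findall(sample)      # empty alternative blocks every match
corrected_matches = corrected.findall(sample)  # only the div tags are matched

print(attempt_matches)    # → []
print(corrected_matches)  # → ['<div>', '</div>']
```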

        • N
          NZ Select
          last edited by Jul 18, 2020, 9:42 PM

          I found that this regex does the job:
          </?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>
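
To check what this pattern keeps and removes, here is a small Python sketch. It assumes Python's `re` handles the pattern like Notepad++ does. The sample string and the `STRICT` variant are hypothetical and not from the thread; `STRICT` is one possible way to keep only `p`, `h1`, and `h2`, as the original question asked.

```python
import re

# Pattern from the post, copied verbatim. "</?" covers opening and closing
# tags; the lookaheads skip any tag whose name starts with a, p, ul, li or h.
FOUND = re.compile(r"</?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>")

# Hypothetical stricter variant: keep only tags named exactly p, h1 or h2.
# The \b inside the lookahead stops "p" from also protecting "pre".
STRICT = re.compile(r"</?(?!(?:h1|h2|p)\b)\w+\b[^>]*>")

# Invented sample input.
sample = ('<div id="a"><h1>Title</h1><p>See <a href="#">this</a> and '
          '<span>that</span>.</p><pre>x</pre></div>')

found_result = FOUND.sub("", sample)
strict_result = STRICT.sub("", sample)

print(found_result)
print(strict_result)
```

With this sample, the pattern from the post keeps the `a` and `pre` tags along with `h1` and `p`, while the stricter variant leaves only the `h1` and `p` tags. Which behavior is wanted depends on the articles being cleaned.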

          The Community of users of the Notepad++ text editor.