    How to remove all HTML tags except <p> or <h1> <h2> tags?

    • NZ Select

      I have several articles stored as .txt files in a directory.

      The articles' HTML markup is somehow messed up.

      I want to remove all HTML tags except the <p>, <h1>, and <h2> tags.

      The following regex removes all HTML tags:
      <[^>]+>

      How do I add an exception, so that any p, h1, or h2 tag is kept?

      Thank you in advance for your sharing of RegEx knowledge!

      • PeterJones @NZ Select

        @NZ-Select,

        I would recommend a negative lookahead assertion: FIND = <(?!h1|h2|p)[^>]+>. That says, “look for <, look ahead and make sure what follows isn’t h1 or h2 or p, then consume one or more non-> characters up to the first >.”
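To see how the negative lookahead behaves, here is a minimal sketch using Python's `re` module rather than Notepad++'s own regex engine (an assumption made purely for illustration). The sample string and variable names are made up and do not come from the thread.

```python
import re

# Pattern from the post above: match a tag unless "h1", "h2" or "p"
# comes immediately after the opening "<".
pattern = re.compile(r"<(?!h1|h2|p)[^>]+>")

# Hypothetical sample input, not taken from the thread.
sample = '<div class="x"><h1>Title<span> here</span><p>Some <b>bold</b> text'

# Replace every matched tag with nothing, as an empty "Replace with" field would.
stripped = pattern.sub("", sample)
print(stripped)  # <h1>Title here<p>Some bold text
```

Because the lookahead only checks the characters right after `<`, tags whose names merely start with `p`, such as `<pre>`, are kept as well.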

          • NZ Select @PeterJones

          @PeterJones

          Thank you for the reply.

          This regex now removes all HTML tags except the h1, h2, and p tags:
          <(?!h1|h2|p)[^>]+>

          But I notice that it also removes the closing </h1>, </h2>, and </p> tags.
          I tried the patterns below to keep those closing tags, but they failed:
          <(?!h1|/h2|h2|/h2||p|/p)[^>]+>
          or this
          <(?!h1|\h2|h2|\h2||p|\p)[^>]+>

          Could you advise how to keep the closing tags?
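A closing tag starts with `</`, so the lookahead sees `/h1` rather than `h1` and lets the tag be matched. The sketch below, again in Python's `re` as a stand-in for Notepad++ (an assumption; Notepad++'s engine may treat some details differently), shows why the first attempt matches nothing: the empty alternative in `||` always succeeds, so the negative lookahead always fails. The second attempt uses backslashes, which are regex escape characters rather than the `/` found in closing tags. The `fixed` pattern is one possible repair, not something proposed in the thread, and the sample string is hypothetical.

```python
import re

# Hypothetical sample input with opening and closing tags.
sample = "<div><h1>Title</h1><p>Some <b>bold</b> text</p></div>"

# First attempt from the post, copied verbatim. The empty alternative in "||"
# always matches, so the negative lookahead always fails: nothing is removed.
attempt = re.compile(r"<(?!h1|/h2|h2|/h2||p|/p)[^>]+>")
print(attempt.sub("", sample))  # prints the sample unchanged

# One possible fix (an assumption, not from the thread): allow an optional "/"
# before the tag name, and use \b so that e.g. <pre> is not mistaken for <p>.
fixed = re.compile(r"<(?!/?(?:h1|h2|p)\b)[^>]+>")
print(fixed.sub("", sample))  # <h1>Title</h1><p>Some bold text</p>
```

With the `\b` in place, a tag with attributes such as `<p class="a">` is still kept, because the word boundary falls between `p` and the space.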

            • NZ Select

            I found that this regex does the job:
            </?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>
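Note that this pattern keeps more than the p, h1, and h2 tags from the original question: any tag whose name starts with a, p, ul, li, or h survives, including closing tags. The sketch below checks this with Python's `re` (an assumption standing in for Notepad++'s engine) on a made-up sample. The `strict` variant is a hypothetical narrowing in the same style, not something posted in the thread.

```python
import re

# Hypothetical sample input, not taken from the thread.
sample = ('<div><h1>Title</h1><p>Some <b>bold</b> <a href="#">link</a></p>'
          '<ul><li>item</li></ul></div>')

# Pattern from the post above, copied verbatim.
found = re.compile(r"</?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>")
kept_more = found.sub("", sample)
print(kept_more)
# <h1>Title</h1><p>Some bold <a href="#">link</a></p><ul><li>item</li></ul>

# Hypothetical stricter variant: keep only p, h1 and h2 (opening and closing).
# The \b after each name stops <pre> from being treated as <p>.
strict = re.compile(r"</?(?!p\b)(?!h1\b)(?!h2\b)\w*\b[^>]*>")
kept_less = strict.sub("", sample)
print(kept_less)
# <h1>Title</h1><p>Some bold link</p>item
```

In both patterns the `\w*\b` part is what protects closing tags: if the engine tries to skip the optional `/`, there is no word boundary between `<` and `/`, so the match fails and the tag is left alone.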

            The Community of users of the Notepad++ text editor.