    How to remove all HTML tags except <p> or <h1> <h2> tags?

    • NZ Select

      I have several articles in txt files under a directory.

      The articles' HTML markup is somehow messed up.

      I wish to remove all HTML tags except the <p>, <h1>, and <h2> tags.

      The following regex removes all HTML tags:
      <[^>]+>

      How do I add an exception so that any p, h1, or h2 tags are kept?

      Thank you in advance for sharing your regex knowledge!

      • PeterJones @NZ Select

        @NZ-Select,

        I would recommend a negative lookahead assertion: FIND = <(?!h1|h2|p)[^>]+>. That says, “look for <, then look ahead and make sure what follows isn’t h1, h2, or p, then consume one or more non-> characters up to the first >.”
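
To see what the negative lookahead does to a concrete string, here is a minimal Python sketch. It is only an illustration: Notepad++ uses the Boost regex engine rather than Python's `re` module, but this pattern relies only on syntax both support. The `sample` string is made up for the demonstration.

```python
import re

# Pattern from the post above: match any tag whose text right after "<"
# does not begin with h1, h2 or p.
pattern = re.compile(r"<(?!h1|h2|p)[^>]+>")

# Hypothetical sample input, invented for this illustration.
sample = '<div class="x"><h1>Title</h1><p>Some <b>bold</b> text</p></div>'

# Replace every match with nothing, as "Replace All" with an empty
# replacement would do.
print(pattern.sub("", sample))
# prints: <h1>Title<p>Some bold text
```

Two things are visible here. The lookahead is tested immediately after `<`, so a closing tag such as `</p>` begins with `/` and is matched like any other tag. The test is also a prefix test, so a tag such as `<pre>` would be kept because it starts with `p`.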

        • NZ Select @PeterJones

          @PeterJones

          Thank you for the reply.

          This regex now removes all HTML tags except the h1, h2, and p tags:
          <(?!h1|h2|p)[^>]+>

          But I notice that it also removes the closing </h1>, </h2>, and </p> tags.
          I tried the patterns below to keep those closing tags, but they failed:
          <(?!h1|/h2|h2|/h2||p|/p)[^>]+>
          or this
          <(?!h1|\h2|h2|\h2||p|\p)[^>]+>

          Could you advise how to keep the closing tags?

          • NZ Select

            I found that this regex does the job:
            </?(?!a)(?!p)(?!ul)(?!li)(?!h)\w*\b[^>]*>
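
Note that the pattern above keeps every tag whose name starts with a, p, ul, li, or h, which is broader than the p, h1, and h2 asked about in the first post. If only those three are wanted, one possible tightening is sketched below in Python. It is an assumption-laden illustration, not something posted in the thread: the pattern, the `strip_other_tags` helper, and the `sample` string are all hypothetical. The idea is to put the optional slash and a word boundary inside the lookahead, so that `</p>` is protected and `<pre>` is not.

```python
import re

# Hypothetical stricter variant (not from the thread): the optional "/" and
# the \b are inside the negative lookahead, so only p, h1 and h2 tags,
# opening or closing, escape the match.
STRIP_OTHER_TAGS = re.compile(r"<(?!/?(?:p|h1|h2)\b)[^>]+>", re.IGNORECASE)


def strip_other_tags(html: str) -> str:
    """Remove every tag except <p>, <h1>, <h2> and their closing tags."""
    return STRIP_OTHER_TAGS.sub("", html)


# Hypothetical sample input, invented for this illustration.
sample = (
    '<div><h1 id="t">Title</h1>'
    '<p>Some <b>bold</b> <a href="#">link</a></p>'
    "<pre>x</pre></div>"
)

print(strip_other_tags(sample))
# prints: <h1 id="t">Title</h1><p>Some bold link</p>x
```

The same pattern text uses only lookahead, a non-capturing group, and `\b`, so it should also be usable in the Notepad++ Find what field with Regular expression mode selected and an empty replacement. Like any regex approach, it assumes no attribute value contains a literal `>`.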

            The Community of users of the Notepad++ text editor.