    Removing everything but the content of certain HTML tags

    • Eugene Fishgalov

      Say, I open an HTML page in Notepad++.

      This page has a lot of stuff, but especially these two tags:

      <div id="first id" class="first class">CONTENT</div>

      <div id="second id" class="second class">CONTENT</div>

      I’d like to remove everything from the file except the CONTENT of these two tags. What is the most efficient way to do that?

      • StanDog

        I’m not sure whether this is the most efficient way, but it is at least one way: you could use a regular expression and replace the whole match with the placeholders for its parenthesized (capture) groups. Open the Replace dialog (Ctrl + H) and enter the following regular expression in “Find what”:

        (.*?)(<div id=\"first id\" class=\"first class\">)(.*?)(<\/div>)(.*?)(<div id=\"second id\" class=\"second class\">)(.*?)(<\/div>)(.*)
        

        And in the “Replace with” field: ${3}${7}
        Or, alternatively: ${3}\r\n${7}

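        The second one adds a line break between the two contents. You must also set “Search Mode” to “Regular expression” and tick the “. matches newline” checkbox; with that option the pattern spans the whole file, so everything outside groups 3 and 7 is matched by the other groups and dropped. Finally, click “Replace All”.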
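
For anyone who would rather script this than use the Replace dialog, here is a minimal sketch of the same idea in Python. It is an illustration under assumptions, not part of the Notepad++ workflow above: the function name `extract_contents` and the sample page are made up for the example, `re.DOTALL` stands in for “. matches newline”, and Python's `re` module refers to groups with `m.group(3)` rather than `${3}`.

```python
import re

# Same pattern as in the post, without the optional backslashes before " and /.
PATTERN = re.compile(
    r'(.*?)(<div id="first id" class="first class">)(.*?)(</div>)'
    r'(.*?)(<div id="second id" class="second class">)(.*?)(</div>)(.*)',
    re.DOTALL,  # lets "." match line breaks, like ". matches newline"
)


def extract_contents(html: str, separator: str = "\r\n") -> str:
    """Keep only groups 3 and 7, i.e. the contents of the two divs.

    Hypothetical helper for illustration. If the pattern does not match,
    the input is returned unchanged.
    """
    return PATTERN.sub(
        lambda m: m.group(3) + separator + m.group(7), html, count=1
    )


# Made-up sample page for the example.
sample = (
    "<html><body>\n<p>junk</p>\n"
    '<div id="first id" class="first class">FIRST</div>\n'
    "<p>more junk</p>\n"
    '<div id="second id" class="second class">SECOND</div>\n'
    "</body></html>\n"
)

result = extract_contents(sample)
print(repr(result))
```

One limitation applies to the regular expression in either tool: `(.*?)(</div>)` stops at the first closing `</div>`, so if CONTENT itself contains nested `<div>` elements, the captured text is cut short.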

        • Eugene Fishgalov

          This worked well! Thank you!
