Community

    Removing everything but the content of certain HTML tags

    • Eugene Fishgalov

      Say, I open an HTML page in Notepad++.

      This page has a lot of stuff, but especially these two tags:

      <div id="first id" class="first class">CONTENT</div>

      <div id="second id" class="second class">CONTENT</div>

      I’d like to remove everything from the file except the CONTENT of these two tags. How could I do that in the most efficient manner?

      • StanDog

        I’m not sure whether this is the most efficient way, but it is one way: you could use a regular expression with capture groups (the parenthesized parts) and keep only the groups you need in the replacement. Open the Replace dialog (Ctrl + H) and enter the following regular expression in “Find what”:

        (.*?)(<div id=\"first id\" class=\"first class\">)(.*?)(<\/div>)(.*?)(<div id=\"second id\" class=\"second class\">)(.*?)(<\/div>)(.*)
        

        And in the “Replace with” field: ${3}${7}
        Or, alternatively: ${3}\r\n${7}

        The second one adds a line break between the two contents. You must also set the “Search Mode” to “Regular expression” and tick the “. matches newline” checkbox. Finally, click “Replace All”.
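
If you would rather script this than use the dialog, the same idea can be sketched in Python. This is only an illustration under a few assumptions: the function name `extract_contents` and the `separator` parameter are made up for this example, `re.DOTALL` stands in for the “. matches newline” checkbox, and the backslashes before `"` and `/` are dropped because they are not needed. As with the dialog version, the non-greedy `(.*?)` stops at the first `</div>`, so it will cut the content short if CONTENT itself contains nested `<div>` tags.

```python
import re

# Same structure as the "Find what" pattern: nine groups, of which
# group 3 and group 7 hold the contents of the two divs.
PATTERN = re.compile(
    r'(.*?)(<div id="first id" class="first class">)(.*?)(</div>)'
    r'(.*?)(<div id="second id" class="second class">)(.*?)(</div>)(.*)',
    re.DOTALL,  # let "." match newlines, like the ". matches newline" checkbox
)


def extract_contents(html: str, separator: str = "\r\n") -> str:
    """Replace the whole document with the contents of the two divs.

    If the pattern does not match, the input is returned unchanged.
    """
    # A function is used as the replacement so that backslashes in the
    # captured text or the separator are never treated as escape sequences.
    return PATTERN.sub(lambda m: m.group(3) + separator + m.group(7), html, count=1)


if __name__ == "__main__":
    page = (
        "<html><body>\n"
        '<div id="first id" class="first class">Hello\nworld</div>\n'
        "<p>something in between</p>\n"
        '<div id="second id" class="second class">Goodbye</div>\n'
        "</body></html>\n"
    )
    print(extract_contents(page))
```

Passing `separator=""` corresponds to the `${3}${7}` replacement, and the default corresponds to `${3}\r\n${7}`.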

        • Eugene Fishgalov

          This worked well! Thank you!

          Copyright © 2014 NodeBB Forums | Contributors