Removing everything but the content of certain HTML tags
-
Say I open an HTML page in Notepad++.
The page contains a lot of stuff, but in particular these two tags:
<div id="first id" class="first class">CONTENT</div>
<div id="second id" class="second class">CONTENT</div>
I’d like to remove everything from the file except the CONTENT of these two tags. How could I do that in the most efficient manner?
-
I’m not sure whether this is the most efficient way, but it is at least one way: you could use a regular expression that captures every part of the file in parentheses, and then keep only the groups you want in the replacement. Open the Replace dialog (Ctrl + H) and enter the following regular expression in “Find what”:
(.*?)(<div id=\"first id\" class=\"first class\">)(.*?)(<\/div>)(.*?)(<div id=\"second id\" class=\"second class\">)(.*?)(<\/div>)(.*)
And in the “Replace with” field: ${3}${7}
Or, alternatively: ${3}\r\n${7}
The second one adds a line break between the two contents. You must also set the “Search Mode” to “Regular expression” and tick the “. matches newline” checkbox. Finally, click “Replace All”.
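If you want to try the expression outside Notepad++ first, here is a rough Python sketch of the same idea. The sample page and the function name keep_two_divs are made up for illustration; only the two div tags come from the question. The re.DOTALL flag plays the role of the “. matches newline” checkbox.

```python
import re

# Hypothetical sample page; only the two <div> tags come from the question.
html = (
    "<html><body>\n"
    "<p>header stuff</p>\n"
    '<div id="first id" class="first class">FIRST\nCONTENT</div>\n'
    "<p>middle stuff</p>\n"
    '<div id="second id" class="second class">SECOND CONTENT</div>\n'
    "<p>footer stuff</p>\n"
    "</body></html>\n"
)

# Same nine groups as the "Find what" expression above.
# re.DOTALL lets "." match line breaks, like ". matches newline".
pattern = re.compile(
    r'(.*?)(<div id="first id" class="first class">)(.*?)(</div>)'
    r'(.*?)(<div id="second id" class="second class">)(.*?)(</div>)(.*)',
    re.DOTALL,
)


def keep_two_divs(text, separator="\r\n"):
    """Keep only groups 3 and 7 (the two contents), joined by separator."""
    return pattern.sub(lambda m: m.group(3) + separator + m.group(7), text)


print(repr(keep_two_divs(html)))
```

Note that (.*?) followed by (</div>) stops at the first closing tag it finds, so this (like the Notepad++ version) assumes the two contents do not contain nested div elements.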
-
This worked well! Thank you!