Removing everything but the content of certain HTML tags
-
Say I open an HTML page in Notepad++.
The page contains a lot of markup, including these two tags:
<div id="first id" class="first class">CONTENT</div>
<div id="second id" class="second class">CONTENT</div>
I’d like to remove everything from the file except the CONTENT of these two tags. How can I do that most efficiently?
-
I’m not sure whether this is the most efficient way, but it is one way: use a regular expression that matches the whole file, and replace the match with only the captured groups you want to keep. Open the Replace dialog (Ctrl + H) and enter the following regular expression in “Find what”:
(.*?)(<div id=\"first id\" class=\"first class\">)(.*?)(<\/div>)(.*?)(<div id=\"second id\" class=\"second class\">)(.*?)(<\/div>)(.*)
And in the “Replace with” field:
${3}${7}
Or, alternatively:
${3}\r\n${7}
The second one adds a line break between the two contents. You must also set “Search Mode” to “Regular expression” and check “. matches newline”. Finally, click “Replace All”.
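If you would rather script this than use the dialog, the same idea can be sketched in Python. This is only an illustration under assumptions: the sample page, the function name extract_contents, and the separator parameter are made up for the example, and re.DOTALL plays the role of the “. matches newline” checkbox. Groups 3 and 7 are the two CONTENT parts, exactly as in the replacement above.

```python
import re

# Hypothetical sample page; only the two <div> tags come from the question.
html = (
    '<html><body>\n'
    '<p>header</p>\n'
    '<div id="first id" class="first class">FIRST\nCONTENT</div>\n'
    '<p>middle</p>\n'
    '<div id="second id" class="second class">SECOND CONTENT</div>\n'
    '<p>footer</p>\n'
    '</body></html>\n'
)

# Same nine groups as the Notepad++ expression. re.DOTALL lets "." match
# newlines, which corresponds to the ". matches newline" option.
pattern = re.compile(
    r'(.*?)(<div id="first id" class="first class">)(.*?)(</div>)'
    r'(.*?)(<div id="second id" class="second class">)(.*?)(</div>)(.*)',
    re.DOTALL,
)


def extract_contents(text: str, separator: str = "\r\n") -> str:
    """Replace the whole text with group 3 + separator + group 7.

    If the pattern does not match, the text is returned unchanged,
    just as "Replace All" would leave the file untouched.
    """
    return pattern.sub(
        lambda m: m.group(3) + separator + m.group(7), text, count=1
    )


result = extract_contents(html)
print(repr(result))
```

Note that, like the expression above, this stops at the first </div> after each opening tag, so it assumes the two tags do not contain nested <div> elements.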
-
This worked well! Thank you!