How to remove duplicate entries in each line
-
I’ve an HTML file with duplicate href attributes like this:
<a href="https://example.com/" href="https://example.com/" target="_blank">Example</a>
<a href="https://website.com/" href="https://website.com/" target="_blank">Website</a>
<a href="https://sample.com/" href="https://sample.com/" target="_blank">Sample</a>
How can I remove the duplicate href attribute in each line using Notepad++?
-
Maybe try:
find:
(?-s)(href="https://.+?\.com/" )(?=\1)
repl: nothing
mode: Regular expression
-
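To sanity-check that find/replace outside Notepad++, here is a minimal Python sketch that applies the same pattern with the standard re module to the three sample lines from the question. The function name remove_duplicate_href is made up for illustration, and Notepad++ uses its own regex engine, so this only approximates what the editor does.

```python
import re

# Notepad++ pattern: (?-s)(href="https://.+?\.com/" )(?=\1)
# Python's "." never matches a newline unless re.DOTALL is passed, so the
# leading (?-s) is unnecessary here and is omitted.
DUP_HREF = re.compile(r'(href="https://.+?\.com/" )(?=\1)')


def remove_duplicate_href(text: str) -> str:
    # A match is the first href attribute (plus its trailing space), but only
    # when the lookahead (?=\1) sees an identical copy right after it.
    # Replacing that match with "" keeps the second copy.
    return DUP_HREF.sub("", text)


sample = "\n".join([
    '<a href="https://example.com/" href="https://example.com/" target="_blank">Example</a>',
    '<a href="https://website.com/" href="https://website.com/" target="_blank">Website</a>',
    '<a href="https://sample.com/" href="https://sample.com/" target="_blank">Sample</a>',
])

print(remove_duplicate_href(sample))
# <a href="https://example.com/" target="_blank">Example</a>
# <a href="https://website.com/" target="_blank">Website</a>
# <a href="https://sample.com/" target="_blank">Sample</a>
```

Because the lookahead does not consume the second copy, a line that already has a single href attribute is left unchanged.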
Hi @alan-kilborn, thanks a lot for the reply. Appreciate your time.
I forgot to mention that there are various domains like .org, .co, .gov, etc. in the file. Is there a regex that handles all of these?
-
@zcraber said in How to remove duplicate entries in each line:
I forgot to mention that
Changes spec after solution is provided. :-(
-
@zcraber said in How to remove duplicate entries in each line:
there are various domains like .org, .co, .gov etc in the file
(?-s)(href="https://.+?\.(?:com|org|co|gov)/" )(?=\1)
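A similar Python sketch can check the multi-domain version. It assumes the alternation should also include co, since the question lists .co. The sample URLs are made up, and ANY_DUP_HREF is a hypothetical catch-all variant that was not suggested in the thread; it matches any quoted href value instead of a fixed list of domain endings.

```python
import re

# The reply's pattern with "co" added to the alternation (the question lists
# .co as well as .org and .gov); (?-s) dropped since Python's "." skips newlines.
TLD_DUP_HREF = re.compile(r'(href="https://.+?\.(?:com|org|co|gov)/" )(?=\1)')

# Hypothetical catch-all variant, not from the thread: any quoted href value
# followed by an identical copy, so no list of domain endings is needed.
ANY_DUP_HREF = re.compile(r'(href="[^"]*" )(?=\1)')

# Made-up sample lines, one per domain ending.
lines = [
    '<a href="https://example.org/" href="https://example.org/" target="_blank">Org</a>',
    '<a href="https://example.co/" href="https://example.co/" target="_blank">Co</a>',
    '<a href="https://example.gov/" href="https://example.gov/" target="_blank">Gov</a>',
    '<a href="https://example.io/" href="https://example.io/" target="_blank">Io</a>',
]

for line in lines:
    print("TLD list:", TLD_DUP_HREF.sub("", line))
    print("any href:", ANY_DUP_HREF.sub("", line))
```

The .io line is left untouched by the domain list but cleaned by the catch-all. The list is stricter about what it touches, while the catch-all needs no updating when a new domain ending turns up in the file.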
-
@alan-kilborn Thank you.
Changes spec after solution is provided. :-(
Sorry about that. Next time I’ll be more specific. :)