Remove duplicate links from the end - Notepad++

robaned

Below are links with duplicate file names; only the ID segment of each URL differs.
For example:

https://mysite.to/73cn05wqida5/Fabulous.WEB.H264-BRC.mp4.html
https://mysite.to/lg0g5t7bc6d8/Fabulous.WEB.H264-BRC.mp4.html
https://mysite.to/1eyjovgzk1f5/Fabulous.720p.WEB.H264-OND.mkv.html
https://mysite.to/gjm1xuyxmgy5/Fabulous.720p.WEB.H264-OND.mkv.html
https://mysite.to/lwjny2xiuatk/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html
https://mysite.to/6aivx4f1xe86/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html

I want the duplicate links removed, like this:

https://mysite.to/73cn05wqida5/Fabulous.WEB.H264-BRC.mp4.html
https://mysite.to/gjm1xuyxmgy5/Fabulous.720p.WEB.H264-OND.mkv.html
https://mysite.to/lwjny2xiuatk/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html

Terry R

@robaned
I suspect you actually just made these up and weren’t consistent.
For the first 2 lines you picked the first of them to remain, the second being the duplicate. You repeated this for the 5th and 6th lines. However, when it comes to the 3rd and 4th lines, you picked the 4th line to keep.

Unless you can identify why you need to do that instead of just selecting the first of any duplicates to remain, no one is going to be able to help you.

Terry

robaned

It doesn’t matter which links are removed, I just want duplicates removed.

Terry R

@robaned

Any solution will always remove the same one in a duplicate set. What you also haven’t told us is whether a set can contain more than 2 duplicate lines.

Terry

robaned

Yes, there are more than two duplicate lines.

Terry R

@robaned
This solution will keep the last of the duplicate lines for each set.

This is a regular expression (regex), so the search mode in the Replace function must be set to “Regular expression”. Make sure the cursor is at the start of the first line, then click Replace All.
Find What: (?-s)^.+?/([^/]+)\R(?=.+?\1)
Replace With: (nothing here, leave the field empty)

Note that this only removes duplicate lines that are adjacent to each other, leaving the last line of each set.
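
For readers who want to try the pattern outside Notepad++, here is a minimal Python sketch of the same idea. It is an approximation, not the Notepad++ engine: Python's re module has no \R, so the line break is spelled out, and the (?-s) prefix is dropped because "." already stops at newlines by default in Python. The function name is made up for illustration.

```python
import re

# Approximate port of the Notepad++ pattern (assumptions: \R is replaced by an
# explicit line-break alternation, and (?-s) is omitted since Python's "."
# does not match newlines unless re.DOTALL is set).
KEEP_LAST = re.compile(r"^.+?/([^/]+)(?:\r\n|\n|\r)(?=.+?\1)", re.MULTILINE)

def remove_adjacent_duplicates_keep_last(text: str) -> str:
    """Delete each line whose file name reappears on the very next line."""
    return KEEP_LAST.sub("", text)

sample = (
    "https://mysite.to/73cn05wqida5/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/lg0g5t7bc6d8/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/1eyjovgzk1f5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/gjm1xuyxmgy5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/lwjny2xiuatk/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
    "https://mysite.to/6aivx4f1xe86/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
)

result = remove_adjacent_duplicates_keep_last(sample)
print(result)
```

On the sample from the first post, the 2nd, 4th and 6th links remain, i.e. the last line of each adjacent set.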

Terry

guy038

Hello @robaned, @terry-r and All,

Terry, your regex (?-s)^.+?/([^/]+)\R(?=.+?\1) works as expected. However, you could still shorten it!

Indeed, if you begin your regex with the greedy (?-s)^.+/, it first matches up to the last / of the line, so the remainder of the current line cannot contain any / character!

Thus, your search regex can be simplified to:

(?-s)^.+/(.+)\R(?=.+\1)
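
As a quick sanity check, the two variants can be compared in Python on the sample links from this thread. This is only a sketch: Python's re module lacks \R, so a hypothetical EOL fragment stands in for it, and (?-s) is omitted because "." excludes newlines by default in Python. It shows agreement on this sample only, not equivalence on every input.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # assumed stand-in for \R, which Python's re lacks

# Original (lazy) and simplified (greedy) patterns, ported to Python syntax.
LAZY = re.compile(rf"^.+?/([^/]+){EOL}(?=.+?\1)", re.MULTILINE)
GREEDY = re.compile(rf"^.+/(.+){EOL}(?=.+\1)", re.MULTILINE)

sample = (
    "https://mysite.to/73cn05wqida5/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/lg0g5t7bc6d8/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/1eyjovgzk1f5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/gjm1xuyxmgy5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/lwjny2xiuatk/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
    "https://mysite.to/6aivx4f1xe86/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
)

lazy_result = LAZY.sub("", sample)
greedy_result = GREEDY.sub("", sample)
print(lazy_result == greedy_result)
```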

I found another solution, which could be faster when there are numerous duplicates:

FIND (?-s)^(.+/(.+)\R)(.+/\2\R)+

REPLACE $1

My solution does the opposite of yours: it keeps the first duplicate line of each set!
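
The keep-first behaviour can be illustrated with a small Python sketch. As before, this is an approximation under stated assumptions: \R is replaced by an explicit line-break fragment (the name EOL is made up here), (?-s) is dropped because Python's "." does not match newlines by default, and $1 becomes \g<1> in the replacement.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # assumed stand-in for \R, which Python's re lacks

# Group 1: the first line of a set (with its line break).
# Group 2: its file name, i.e. the text after the last "/".
# The repeated group then swallows every following line ending in that name.
KEEP_FIRST = re.compile(rf"^(.+/(.+){EOL})(.+/\2{EOL})+", re.MULTILINE)

def remove_adjacent_duplicates_keep_first(text: str) -> str:
    """Collapse each run of lines sharing a file name down to its first line."""
    return KEEP_FIRST.sub(r"\g<1>", text)

sample = (
    "https://mysite.to/73cn05wqida5/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/lg0g5t7bc6d8/Fabulous.WEB.H264-BRC.mp4.html\n"
    "https://mysite.to/1eyjovgzk1f5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/gjm1xuyxmgy5/Fabulous.720p.WEB.H264-OND.mkv.html\n"
    "https://mysite.to/lwjny2xiuatk/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
    "https://mysite.to/6aivx4f1xe86/Fabulous.1080p.AMZN.WEBRip.DD5.1.x264-TTH.mkv.html\n"
)

result = remove_adjacent_duplicates_keep_first(sample)
print(result)
```

One thing to watch: the pattern expects a line break after every line it consumes, so in this sketch a duplicate on the very last line is left in place if the text does not end with a line break.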

Best Regards,

guy038