remove duplicate urls
-
Hello, can someone help with this, please?
input:
http://www.abc.com/123
http://www.abc.com/456
http://www.def.com/223
http://www.def.com/556
http://www.def.com/602
http://www.ghi.com/700
http://www.ghi.com/731
http://www.qwe.com/667
http://www.qwe.com/667
http://www.qwe.com/667
Output:
http://www.abc.com/123
http://www.def.com/223
http://www.ghi.com/700
http://www.qwe.com/667
I found this, but it doesn’t work with Notepad++:
^(http://[^/]+/)(.*$\n?)((\1)(?2))+
Replace with: $1$2
-
@guy038, can you help, please, sir? <3
-
@El-FAROUZ said in remove duplicate urls:
Hello, can someone help with this, please?
If it were me I would do the following:
- Insert line numbers and order the lines descending (backwards)
- Use a regex to remove the current line if the next line contains the same address
- Re-order in line ascending order and then remove the line numbers.
So:
- Have the cursor in the very first position of the file. Use the Column editor to first insert a , (comma), then insert a number starting with 1, increasing by 1 and with “leading zero” ticked. Then use the Line Operation function to order lines in Integer Descending.
- Using the Replace function we have
Find What: (?-s)^\d+,http://([^/]+)/.+\R(?=[^/]+?//\1)
Replace With: empty field here so it erases the line.
As this is a regex, the “search mode” must be “Regular expression”. Click on the “Replace All” button.
- Re-order the lines as Integer Ascending. Then use the Replace function again with:
Find What: ^\d+,
Replace With: empty field here so it removes the line numbers and comma.
At this point you should have your required results.
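For readers who want to check the logic outside Notepad++, the three steps above can be sketched in Python. This is only an illustration under assumptions: the function name is made up, Python's re module has no \R so \n stands in for it, and the leading (?-s) is dropped because the dot does not match newlines by default in Python.

```python
import re

def dedupe_keep_first_per_host(lines):
    # Step 1: prefix every line with a zero-padded number and a comma,
    # then sort by that number, descending.
    width = len(str(len(lines)))
    numbered = [f"{i:0{width}d},{line}" for i, line in enumerate(lines, start=1)]
    numbered.sort(key=lambda s: int(s.split(",", 1)[0]), reverse=True)
    text = "\n".join(numbered) + "\n"

    # Step 2: delete a line when the next line has the same host.
    # \n replaces \R, which Python's re does not support.
    same_host_next = re.compile(
        r"^\d+,http://([^/]+)/.+\n(?=[^/]+?//\1)", re.MULTILINE
    )
    text = same_host_next.sub("", text)

    # Step 3: sort ascending again and strip the "number," prefix.
    remaining = sorted(text.splitlines(), key=lambda s: int(s.split(",", 1)[0]))
    return [re.sub(r"^\d+,", "", line) for line in remaining]

urls = [
    "http://www.abc.com/123", "http://www.abc.com/456",
    "http://www.def.com/223", "http://www.def.com/556",
    "http://www.def.com/602", "http://www.ghi.com/700",
    "http://www.ghi.com/731", "http://www.qwe.com/667",
    "http://www.qwe.com/667", "http://www.qwe.com/667",
]
for url in dedupe_keep_first_per_host(urls):
    print(url)
```

Like the regex itself, the sketch only removes a line when the same host appears on the line directly after it, so it relies on URLs of one host sitting next to each other.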
Terry
-
@Terry-R said in remove duplicate urls:
Step 1 might be a bit unclear for the novice user, because it packs a lot in. Terry, if you’ll allow, I’d specify it like this:
1a. Have the cursor in the very first position of the file. Use the Column editor to insert a ,(comma) via Text to Insert; the caret will remain in the very first position of the file after the insertion.
1b. Use the Column editor’s Number to Insert option to insert a number starting with 1, increasing by 1 and with “leading zero” ticked to add incrementing numbers to the start of every line. Then use the Line Operation function to order lines in Integer Descending.
Overall, a nice solution!
-
Hello @el-farouz, @terry-r, @alan-kilborn and All,
Terry, I don’t see the necessity of inserting line numbers !?
For instance, given @el-farouz’s list, not sorted at all, as below :
http://www.def.com/602
http://www.abc.com/123
http://www.qwe.com/667
http://www.ghi.com/700
http://www.def.com/556
http://www.abc.com/456
http://www.ghi.com/731
http://www.qwe.com/667
http://www.qwe.com/667
http://www.def.com/223
We select this block of addresses and perform an ascending sort (Edit > Line Operations > Sort Lines Lexicographically Ascending) :
http://www.abc.com/123
http://www.abc.com/456
http://www.def.com/223
http://www.def.com/556
http://www.def.com/602
http://www.ghi.com/700
http://www.ghi.com/731
http://www.qwe.com/667
http://www.qwe.com/667
http://www.qwe.com/667
And, with the following regex S/R :
SEARCH ^(http://(.+?)/.+\R)(?:http://\2.+\R)+
REPLACE \1
We directly get our expected list :
http://www.abc.com/123
http://www.def.com/223
http://www.ghi.com/700
http://www.qwe.com/667
Am I missing something obvious ?
Best Regards,
guy038
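The sort-then-replace idea from this post can be tried out in Python as a rough sketch. The function name is hypothetical, \n replaces \R (which Python's re lacks), and a trailing newline is appended on the assumption that every line, including the last, must end with a line break for the pattern to consume it.

```python
import re

def dedupe_sorted(lines):
    # Sort lexicographically so URLs of the same host become adjacent.
    # The trailing "\n" lets the pattern also consume the final line.
    text = "\n".join(sorted(lines)) + "\n"

    # Group 1 is the first line of a run, group 2 its host; the
    # non-capturing group swallows the following lines of that host.
    run_of_same_host = re.compile(
        r"^(http://(.+?)/.+\n)(?:http://\2.+\n)+", re.MULTILINE
    )
    return run_of_same_host.sub(r"\1", text).splitlines()

urls = [
    "http://www.def.com/602", "http://www.abc.com/123",
    "http://www.qwe.com/667", "http://www.ghi.com/700",
    "http://www.def.com/556", "http://www.abc.com/456",
    "http://www.ghi.com/731", "http://www.qwe.com/667",
    "http://www.qwe.com/667", "http://www.def.com/223",
]
for url in dedupe_sorted(urls):
    print(url)
```

Note that sorting decides which URL of each host survives (the lexicographically smallest), so the result can differ from "first in the original order".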
-
Perhaps Terry is just trying to cover the more general case, where the lines are not in any kind of pre-sorted order, and one wants to keep the original order while removing the duplicate URLs.
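For that general case, a script may be simpler than a regex. The following is a minimal Python sketch, not something proposed in this thread: it keeps the first URL seen for each host and leaves the original order untouched. The function name is made up, and it assumes one well-formed URL per line.

```python
from urllib.parse import urlsplit

def first_url_per_host(lines):
    seen_hosts = set()
    kept = []
    for line in lines:
        # netloc is the host part, e.g. "www.def.com".
        host = urlsplit(line.strip()).netloc
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(line)
    return kept

urls = [
    "http://www.def.com/602", "http://www.abc.com/123",
    "http://www.qwe.com/667", "http://www.ghi.com/700",
    "http://www.def.com/556", "http://www.abc.com/456",
    "http://www.ghi.com/731", "http://www.qwe.com/667",
    "http://www.qwe.com/667", "http://www.def.com/223",
]
for url in first_url_per_host(urls):
    print(url)
```

Because it tracks hosts in a set, it does not need same-host lines to be adjacent, and no sorting or line numbering is involved.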
-
@guy038 said in remove duplicate urls:
Am I missing something obvious ?
I made no assumptions about the list; I just wanted to keep the order that did exist in reverse. The OP had upvoted my solution, suggesting it worked for them.
Terry