DELETE DUPLICATE URLS WITH SAME DOMAIN

    Varun Teja
    last edited by Aug 2, 2021, 4:45 AM

    https://www.bizjournals.com/houston/news/2021/07/27/buc-ees-travel-center-calhoun-georgia-opening-date.html
    https://www.bizjournals.com/orlando/news/2020/10/19/3-ways-to-use-twitter-fleets-for-business.html
    I have URLs of this type,
    but I need to keep only one per domain. Is there any plugin or method to delete duplicate lines that point to the same domain?

    Thanks for any help!

      PeterJones @Varun Teja
      last edited by PeterJones Aug 2, 2021, 1:30 PM Aug 2, 2021, 1:29 PM

      @Varun-Teja ,

      If it doesn’t matter which one you keep (that is, if it’s okay to keep only the last instance of a specific domain), then I would suggest doing it this way:

      • FIND = ((?-s)^.*?(https?://[^/]*/).*?$(\R|\Z))(?=(?s).*\2)
      • REPLACE = leave empty
      • SEARCH MODE = regular expression
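For readers who would rather script this outside Notepad++, here is a minimal Python sketch of roughly the same effect: keep only the last line for each `https?://domain/` prefix. It is not a port of the regex above (Python's `re` module does not accept Boost's `\R` or an unscoped `(?-s)`), and the names `DOMAIN_RE` and `keep_last_per_domain` are made up for illustration. It assumes one URL per line and leaves lines without a URL alone.

```python
import re

# Same domain pattern as group 2 of the FIND expression: scheme, host, first slash.
DOMAIN_RE = re.compile(r"https?://[^/]*/")


def keep_last_per_domain(lines):
    """Return lines with earlier duplicates of a domain removed (last one wins)."""
    seen = set()
    kept = []
    # Walk from the bottom up so the first line we meet per domain is the last one.
    for line in reversed(lines):
        match = DOMAIN_RE.search(line)
        if match and match.group(0) in seen:
            continue  # a later line already covers this domain
        if match:
            seen.add(match.group(0))
        kept.append(line)  # lines without a URL are always kept
    kept.reverse()  # restore the original top-to-bottom order
    return kept
```

With the two bizjournals.com lines from the question as input, only the second (orlando) line should remain.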

      If it doesn’t matter what order they are in, then you could sort first (Edit > Line Operations > Sort Lines Lexicographically Ascending) and then use that replacement. (Edit: though that’s pointless, because if the order doesn’t matter, the replacement alone already does the job.)

      If it does matter what order they are in, then you could use column-select (alt+click+drag) to select the zeroth column in the file, then use Edit > Column Editor > Number to Insert to insert numbers:
      (You might want to do a second column select and then also insert a space between the numbers and the lines by selecting the zero-width column after the numbers and then typing a space)

      1 https://www.fourthdomain.example/misc
      2 https://www.fifthdomain.example/elsewhat
      3 https://www.seconddomain.example/elsewhat
      4 https://www.firstdomain.example/blah
      5 https://www.seconddomain.example/blah
      6 https://www.fourthdomain.example/blah
      7 https://www.thirddomain.example/blah
      8 https://www.firstdomain.example/elsewhat
      

      Then, after that, sort descending (so in my example, it would be 8 down to 1). Then do the replacement I showed above: because it keeps the last instance of each domain, in the reversed file it keeps what was originally the first instance. Then sort ascending again. Then remove the leading numbers (another column select followed by cut or backspace, or do a regular-expression search-and-replace with FIND = ^\d+\x20* and REPLACE left empty).
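The number / sort / replace / sort / strip sequence can be simulated in a few lines of Python to check what it leaves behind. This is only a sketch under assumptions: the function names are invented, and the zero-padding of the line numbers is my addition so that a lexicographic sort still matches numeric order once a file has ten or more lines.

```python
import re

DOMAIN_RE = re.compile(r"https?://[^/]*/")


def keep_last_per_domain(lines):
    """Drop every line whose domain shows up again on a later line."""
    seen, kept = set(), []
    for line in reversed(lines):
        match = DOMAIN_RE.search(line)
        if match and match.group(0) in seen:
            continue
        if match:
            seen.add(match.group(0))
        kept.append(line)
    return kept[::-1]


def keep_first_per_domain(lines):
    """Mimic the manual steps: number, sort descending, dedupe, sort ascending, strip."""
    # Step 1: prefix each line with its number (zero-padded: an assumption, see above).
    width = len(str(len(lines)))
    numbered = [f"{i:0{width}d} {line}" for i, line in enumerate(lines, 1)]
    # Step 2: sort descending, so the original first instance becomes the last one.
    numbered.sort(reverse=True)
    # Step 3: the replacement keeps only the last instance of each domain.
    deduped = keep_last_per_domain(numbered)
    # Step 4: sort ascending again to restore the original order.
    deduped.sort()
    # Step 5: remove the leading numbers, same pattern as FIND=^\d+\x20*
    return [re.sub(r"^\d+\x20*", "", line) for line in deduped]
```

Fed the eight example lines above (without their numbers), this should keep lines 1, 2, 3, 4 and 7, that is, the first occurrence of each domain in the original order.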

      The Community of users of the Notepad++ text editor.