
    DELETE DUPLICATE URLS WITH SAME DOMAIN

    Help wanted
    • Varun Teja

      https://www.bizjournals.com/houston/news/2021/07/27/buc-ees-travel-center-calhoun-georgia-opening-date.html
      https://www.bizjournals.com/orlando/news/2020/10/19/3-ways-to-use-twitter-fleets-for-business.html
      I have URLs like these, but I only need one per domain. Is there a plugin or method to delete duplicate lines based only on the domain?

      Thanks for any help!

      • PeterJones @Varun Teja

        @Varun-Teja,

        If it doesn’t matter which one you keep (that is, if it’s okay to keep only the last instance of a specific domain), then I would suggest doing it this way:

        • FIND = ((?-s)^.*?(https?://[^/]*/).*?$(\R|\Z))(?=(?s).*\2)
        • REPLACE = leave empty
        • SEARCH MODE = regular expression
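        For anyone who wants to check what that expression does outside Notepad++, here is a rough Python translation (a sketch only, not something Notepad++ itself uses). Python's `re` module has no `\R` and handles inline flag switches like `(?-s)` differently, so this version assumes plain `\n` line endings, leaves `.` in its default no-newline mode, and uses `[\s\S]` inside the lookahead so it can span lines. The helper name `keep_last_per_domain` is made up for illustration.

```python
import re

# Rough Python translation of the Notepad++ FIND expression.
# Assumption: lines end in "\n" (Python's re has no \R).
# Group 1 captures the "scheme://domain/" prefix, e.g. "https://www.bizjournals.com/".
# The lookahead only lets a line match (and be deleted) if that prefix
# appears again somewhere later in the text.
DUP_DOMAIN_LINE = re.compile(
    r"^.*?(https?://[^/]*/).*?$(?:\n|\Z)(?=[\s\S]*\1)",
    re.MULTILINE,  # ^ and $ match at every line start/end, as in Notepad++
)


def keep_last_per_domain(text: str) -> str:
    """Hypothetical helper: delete each line whose domain shows up again later."""
    return DUP_DOMAIN_LINE.sub("", text)


if __name__ == "__main__":
    urls = (
        "https://www.bizjournals.com/houston/news/2021/07/27/buc-ees-travel-center-calhoun-georgia-opening-date.html\n"
        "https://www.bizjournals.com/orlando/news/2020/10/19/3-ways-to-use-twitter-fleets-for-business.html\n"
    )
    # Only the orlando line survives: it is the last bizjournals.com line.
    print(keep_last_per_domain(urls), end="")
```

        Run on the two bizjournals.com URLs from the question, only the second (orlando) line is left, which is the "keep the last instance" behaviour described above.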

        If it doesn’t matter what order they are in, you could sort first (Edit > Line Operations > Sort Lines Lexicographically Ascending) and then do that replacement. (Edit: that is pointless, though, because the replacement on its own already covers the “doesn’t matter what order it’s in” case.)

        If it does matter what order they are in, you could use column-select (Alt+click+drag) to select the zeroth column in the file, then use Edit > Column Editor > Number to Insert to insert line numbers. (You might also want to put a space between the numbers and the text: column-select the zero-width column just after the numbers, then type a space.) The result looks like this:

        1 https://www.fourthdomain.example/misc
        2 https://www.fifthdomain.example/elsewhat
        3 https://www.seconddomain.example/elsewhat
        4 https://www.firstdomain.example/blah
        5 https://www.seconddomain.example/blah
        6 https://www.fourthdomain.example/blah
        7 https://www.thirddomain.example/blah
        8 https://www.firstdomain.example/elsewhat
        

        Then sort descending (in my example, 8 down to 1), do the replacement I showed above, and sort ascending again. Because the replacement keeps the last instance of each domain, doing it on the reversed list keeps the earliest instance from your original order. Finally, remove the leading numbers, either with another column select followed by cut or Backspace, or with a regular-expression search-and-replace using FIND = ^\d+\x20* and an empty REPLACE.
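        Here is a Python sketch of that whole numbered round trip (number, sort descending, replace, sort ascending, strip numbers), which ends up keeping the first line for each domain in its original order. It uses the FIND expression from this post translated to Python's `re` (assuming `\n` line endings) plus the `^\d+\x20*` cleanup. One assumption worth flagging: it sorts on the inserted numbers numerically, because with ten or more lines a plain lexicographic sort would put 10 between 1 and 2. The names `keep_first_per_domain` and `line_number` are hypothetical.

```python
import re

# The FIND expression from the post in Python syntax; assumes "\n" line endings.
DUP_DOMAIN_LINE = re.compile(
    r"^.*?(https?://[^/]*/).*?$(?:\n|\Z)(?=[\s\S]*\1)",
    re.MULTILINE,
)
# The cleanup expression from the post: FIND=^\d+\x20* with an empty REPLACE.
LEADING_NUMBER = re.compile(r"^\d+\x20*")


def line_number(line: str) -> int:
    """Read the number that was inserted in front of a line."""
    return int(line.split(" ", 1)[0])


def keep_first_per_domain(lines: list[str]) -> list[str]:
    """Hypothetical helper mimicking the number/sort/replace/sort/strip steps."""
    # 1. Number the lines and add a space (like Column Editor > Number to Insert).
    numbered = [f"{i} {line}" for i, line in enumerate(lines, start=1)]
    # 2. Sort descending by number (numerically, so 10 comes after 9).
    numbered.sort(key=line_number, reverse=True)
    # 3. Run the replacement: it keeps the LAST line per domain, which after
    #    the reversal is the FIRST one in the original order.
    remaining = DUP_DOMAIN_LINE.sub("", "\n".join(numbered)).splitlines()
    # 4. Sort ascending again to restore the original order.
    remaining.sort(key=line_number)
    # 5. Strip the leading numbers.
    return [LEADING_NUMBER.sub("", line) for line in remaining]


if __name__ == "__main__":
    sample = [
        "https://www.fourthdomain.example/misc",
        "https://www.fifthdomain.example/elsewhat",
        "https://www.seconddomain.example/elsewhat",
        "https://www.firstdomain.example/blah",
        "https://www.seconddomain.example/blah",
        "https://www.fourthdomain.example/blah",
        "https://www.thirddomain.example/blah",
        "https://www.firstdomain.example/elsewhat",
    ]
    for url in keep_first_per_domain(sample):
        print(url)
```

        On the eight example lines above, this returns the fourthdomain/misc, fifthdomain/elsewhat, seconddomain/elsewhat, firstdomain/blah and thirddomain/blah lines, in that order. If you are scripting anyway rather than working in Notepad++, a single pass that remembers which domains it has already seen gives the same first-occurrence result without any sorting.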

        The Community of users of the Notepad++ text editor.
        Powered by NodeBB | Contributors