    DELETE DUPLICATE URLS WITH SAME DOMAIN

    • Varun Teja

      https://www.bizjournals.com/houston/news/2021/07/27/buc-ees-travel-center-calhoun-georgia-opening-date.html
      https://www.bizjournals.com/orlando/news/2020/10/19/3-ways-to-use-twitter-fleets-for-business.html
      I have URLs of this type,
      but I need to keep only one per domain. Is there any plugin or method to delete duplicate lines that point to the same domain?

      Thanks in advance for any help!

      • PeterJones

        @Varun-Teja,

        If it doesn’t matter which one you keep (that is, if it’s okay to keep only the last instance of a specific domain), then I would suggest doing it this way:

        • FIND = ((?-s)^.*?(https?://[^/]*/).*?$(\R|\Z))(?=(?s).*\2)
        • REPLACE = leave empty
        • SEARCH MODE = regular expression
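
For anyone who would rather script this than use the Replace dialog, here is a minimal Python sketch of the same "keep only the last line for each domain" idea. It is an approximation rather than an exact port of the regex: it reuses the `https?://[^/]*/` part of the FIND expression to pick out the domain, and the names `DOMAIN_RE` and `keep_last_per_domain` are made up for this illustration.

```python
import re

# Same domain-matching idea as group 2 of the FIND expression:
# the scheme, the host, and the first slash after the host.
DOMAIN_RE = re.compile(r"https?://[^/]*/")


def keep_last_per_domain(lines):
    """Drop every line whose domain shows up again on a later line."""
    # First pass: remember the index of the last line for each domain.
    last_index = {}
    for i, line in enumerate(lines):
        match = DOMAIN_RE.search(line)
        if match:
            last_index[match.group(0)] = i

    # Second pass: keep a line only if it is the last one for its domain.
    kept = []
    for i, line in enumerate(lines):
        match = DOMAIN_RE.search(line)
        # Lines without a URL are left alone, as the regex would not match them.
        if match is None or last_index[match.group(0)] == i:
            kept.append(line)
    return kept


urls = [
    "https://www.bizjournals.com/houston/news/2021/07/27/buc-ees-travel-center-calhoun-georgia-opening-date.html",
    "https://www.bizjournals.com/orlando/news/2020/10/19/3-ways-to-use-twitter-fleets-for-business.html",
]
print(keep_last_per_domain(urls))
```

With the two bizjournals.com lines from the question, only the orlando line survives, because it is the last line for that domain.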

        If it doesn’t matter what order they are in, then you could sort first (Edit > Line Operations > Sort Lines Lexicographically Ascending) and then use that replacement. (Edit: though that’s pointless, because if the order doesn’t matter, the first method on its own already does the job.)

        If it does matter what order they are in, then you could use column-select (alt+click+drag) to select the zeroth column in the file, then use Edit > Column Editor > Number to Insert to insert numbers:
        (You might want to do a second column select and then also insert a space between the numbers and the lines by selecting the zero-width column after the numbers and then typing a space)

        1 https://www.fourthdomain.example/misc
        2 https://www.fifthdomain.example/elsewhat
        3 https://www.seconddomain.example/elsewhat
        4 https://www.firstdomain.example/blah
        5 https://www.seconddomain.example/blah
        6 https://www.fourthdomain.example/blah
        7 https://www.thirddomain.example/blah
        8 https://www.firstdomain.example/elsewhat
        

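        After that, sort descending (so in my example, it would be 8 down to 1), do the replacement I showed above, and then sort ascending again. Because the lines are in reverse order when the replacement runs, this keeps the first instance of each domain and leaves the survivors in their original order. If you have ten or more lines, sort the numbers as integers or insert them with leading zeros, so that 10 does not sort ahead of 2. Finally, remove the leading numbers: either do another column select followed by cut or backspace, or do a regular-expression search-and-replace with FIND=^\d+\x20* and replace with nothing.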
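
The number / sort descending / replace / sort ascending / strip sequence can also be sketched in Python. This is only an illustration under the same assumptions as the regex (the domain is whatever `https?://[^/]*/` matches), and the function names are hypothetical. It follows the manual steps one by one, so the numbers really are prepended and then stripped with `^\d+\x20*`.

```python
import re

DOMAIN_RE = re.compile(r"https?://[^/]*/")
# Same expression as the FIND used to strip the leading numbers.
NUMBER_PREFIX_RE = re.compile(r"^\d+\x20*")


def line_number(numbered_line):
    """Return the integer prefix that was inserted in front of a line."""
    return int(numbered_line.split(" ", 1)[0])


def keep_first_per_domain(lines):
    # Step 1: insert a number and a space in front of every line.
    numbered = [f"{i} {line}" for i, line in enumerate(lines, start=1)]
    # Step 2: sort descending. This sorts numerically; a lexicographic
    # sort in an editor would need zero-padded numbers for 10+ lines.
    numbered.sort(key=line_number, reverse=True)
    # Step 3: keep only the last line for each domain in this reversed
    # list, which is the first occurrence in the original order.
    last_index = {}
    for i, text in enumerate(numbered):
        match = DOMAIN_RE.search(text)
        if match:
            last_index[match.group(0)] = i
    deduped = []
    for i, text in enumerate(numbered):
        match = DOMAIN_RE.search(text)
        if match is None or last_index[match.group(0)] == i:
            deduped.append(text)
    # Step 4: sort ascending again to restore the original order.
    deduped.sort(key=line_number)
    # Step 5: remove the leading numbers.
    return [NUMBER_PREFIX_RE.sub("", text) for text in deduped]


example = [
    "https://www.fourthdomain.example/misc",
    "https://www.fifthdomain.example/elsewhat",
    "https://www.seconddomain.example/elsewhat",
    "https://www.firstdomain.example/blah",
    "https://www.seconddomain.example/blah",
    "https://www.fourthdomain.example/blah",
    "https://www.thirddomain.example/blah",
    "https://www.firstdomain.example/elsewhat",
]
for url in keep_first_per_domain(example):
    print(url)
```

On the eight example lines above, this keeps lines 1, 2, 3, 4 and 7, that is, the first line for each of the five domains, still in their original order.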
