remove duplicate urls

General Discussion

  • El FAROUZ
    last edited by Nov 28, 2020, 2:15 AM

    Hello, can someone help with this, please?

    input:

    http://www.abc.com/123
    http://www.abc.com/456
    http://www.def.com/223
    http://www.def.com/556
    http://www.def.com/602
    http://www.ghi.com/700
    http://www.ghi.com/731
    http://www.qwe.com/667
    http://www.qwe.com/667
    http://www.qwe.com/667

    Output:

    http://www.abc.com/123
    http://www.def.com/223
    http://www.ghi.com/700
    http://www.qwe.com/667

    I found this regex, but it doesn’t work in Notepad++:

    ^(http://[^/]+/)(.*$\n?)((\1)(?2))+

    replace with $1$2
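
For comparison, here is an illustration added to this thread (not from any of the posts): outside Notepad++, the same de-duplication is a one-pass job, keeping the first URL seen for each domain while preserving the original line order.

```python
# Keep the first URL per domain, preserving the original order.
urls = [
    "http://www.abc.com/123",
    "http://www.abc.com/456",
    "http://www.def.com/223",
    "http://www.def.com/556",
    "http://www.def.com/602",
    "http://www.ghi.com/700",
    "http://www.ghi.com/731",
    "http://www.qwe.com/667",
    "http://www.qwe.com/667",
    "http://www.qwe.com/667",
]

seen = set()
result = []
for url in urls:
    domain = url.split("/")[2]   # "www.abc.com" from "http://www.abc.com/123"
    if domain not in seen:       # only the first URL for each domain is kept
        seen.add(domain)
        result.append(url)

print("\n".join(result))
```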

    • El FAROUZ
      last edited by Nov 28, 2020, 2:18 AM

      @guy038 can you help please sir ? <3

      • Terry R
        last edited by Nov 28, 2020, 2:39 AM

        @El-FAROUZ said in remove duplicate urls:

        Hello, can someone help with this, please?

        If it were me, I would do the following:

        1. Insert line numbers and order the lines descending (backwards).
        2. Use a regex to remove the current line if the next line contains the same address.
        3. Re-order the lines in ascending order and then remove the line numbers.

        So:

        1. Have the cursor in the very first position of the file. Use the Column Editor to first insert a , (comma), then insert a number starting with 1, increasing by 1 and with “leading zero” ticked. Then use the Line Operations function to order the lines in Integer Descending.
        2. Using the Replace function we have
          Find What: (?-s)^\d+,http://([^/]+)/.+\R(?=[^/]+?//\1)
          Replace With: empty field here so it erases the line.
          As this is a regex, the “Search Mode” must be “Regular expression”. Click on the “Replace All” button.
        3. Re-order the lines as Integer Ascending. Then use the Replace function again with:
          Find What: ^\d+,
          Replace With: empty field here so it removes the line numbers and comma.

        At this point you should have your required results.
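
The three steps above can be sketched in Python (an illustration added here, not part of the thread; the list literal stands in for the file’s lines, and `enumerate` stands in for the inserted line numbers):

```python
# Illustration of the number / reverse / remove / restore procedure.
urls = [
    "http://www.abc.com/123",
    "http://www.abc.com/456",
    "http://www.def.com/223",
    "http://www.def.com/556",
    "http://www.def.com/602",
    "http://www.ghi.com/700",
    "http://www.ghi.com/731",
    "http://www.qwe.com/667",
    "http://www.qwe.com/667",
    "http://www.qwe.com/667",
]

# Step 1: attach line numbers, then order the lines descending.
numbered = list(enumerate(urls, start=1))
numbered.reverse()

# Step 2: remove a line when the NEXT line carries the same domain
# (this mirrors the regex lookahead on the following line).
kept = []
for i, (n, url) in enumerate(numbered):
    domain = url.split("/")[2]          # e.g. "www.abc.com"
    if i + 1 < len(numbered) and numbered[i + 1][1].split("/")[2] == domain:
        continue
    kept.append((n, url))

# Step 3: restore ascending order and strip the line numbers.
kept.sort()
result = [url for _, url in kept]
print("\n".join(result))
```

Because the lines are processed in reverse, the line that survives each run of duplicates is the one with the lowest line number, i.e. the first occurrence in the original file.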

        Terry

        • Alan Kilborn @Terry R
          last edited by Nov 28, 2020, 12:15 PM

          @Terry-R said in remove duplicate urls:

          Step 1 might be a bit unclear for the novice user, because it packs a lot in. Terry, if you’ll allow, I’d specify it like this:

          1a. Have the cursor in the very first position of the file. Use the Column Editor to insert a , (comma) via Text to Insert; the caret will remain in the very first position of the file after the insertion.

          1b. Use the Column editor’s Number to Insert option to insert a number starting with 1, increasing by 1 and with “leading zero” ticked to add incrementing numbers to the start of every line. Then use the Line Operation function to order lines in Integer Descending.

          Overall, a nice solution!

          • guy038
            last edited by Nov 28, 2020, 2:22 PM

            Hello @el-farouz, @terry-r, @alan-kilborn and All,

            Terry, I don’t see the necessity of inserting line numbers!?

            For instance, given @el-farouz’s list, not sorted at all, as below:

            http://www.def.com/602
            http://www.abc.com/123
            http://www.qwe.com/667
            http://www.ghi.com/700
            http://www.def.com/556
            http://www.abc.com/456
            http://www.ghi.com/731
            http://www.qwe.com/667
            http://www.qwe.com/667
            http://www.def.com/223
            

            We select this block of addresses and perform an ascending sort (Edit > Line Operations > Sort Lines Lexicographically Ascending):

            http://www.abc.com/123
            http://www.abc.com/456
            http://www.def.com/223
            http://www.def.com/556
            http://www.def.com/602
            http://www.ghi.com/700
            http://www.ghi.com/731
            http://www.qwe.com/667
            http://www.qwe.com/667
            http://www.qwe.com/667
            

            And, with the following regex S/R:

            SEARCH ^(http://(.+?)/.+\R)(?:http://\2.+\R)+

            REPLACE \1

            We directly get our expected list:

            http://www.abc.com/123
            http://www.def.com/223
            http://www.ghi.com/700
            http://www.qwe.com/667
            

            Am I missing something obvious?

            Best Regards,

            guy038
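
The sort-then-collapse idea above can be sketched in Python (an illustration added here, not part of the thread): after the lexicographic sort, duplicate domains sit on consecutive lines, so a single pass keeps only the first line of each run, just as the regex does.

```python
# Illustration of the sort-first approach on guy038's unsorted list.
urls = [
    "http://www.def.com/602",
    "http://www.abc.com/123",
    "http://www.qwe.com/667",
    "http://www.ghi.com/700",
    "http://www.def.com/556",
    "http://www.abc.com/456",
    "http://www.ghi.com/731",
    "http://www.qwe.com/667",
    "http://www.qwe.com/667",
    "http://www.def.com/223",
]

urls.sort()  # Edit > Line Operations > Sort Lines Lexicographically Ascending

result = []
for url in urls:
    domain = url.split("/")[2]
    if result and result[-1].split("/")[2] == domain:
        continue  # same domain as the previous kept line: part of a duplicate run
    result.append(url)

print("\n".join(result))
```

Note the trade-off the thread discusses: this produces the correct set of URLs, but only because sorting first is acceptable; it does not preserve the file’s original line order.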

            • Alan Kilborn @guy038
              last edited by Nov 28, 2020, 2:39 PM

              @guy038

              Perhaps Terry is just trying to cover the more general case, where the lines are not in any kind of pre-sorted order, and one wants to keep the original order while removing the duplicate URLs.

              • Terry R @guy038
                last edited by Nov 28, 2020, 5:47 PM

                @guy038 said in remove duplicate urls:

                Am I missing something obvious?

                I made no assumptions about the list; I just wanted to keep whatever order did exist, working in reverse. The OP has since upvoted my solution, suggesting it worked for them.

                Terry
