
    remove duplicate urls

    7 Posts 4 Posters 2.8k Views
    • El FAROUZ

      Hello, can someone help with this, please?

      input:

      http://www.abc.com/123
      http://www.abc.com/456
      http://www.def.com/223
      http://www.def.com/556
      http://www.def.com/602
      http://www.ghi.com/700
      http://www.ghi.com/731
      http://www.qwe.com/667
      http://www.qwe.com/667
      http://www.qwe.com/667

      Output:

      http://www.abc.com/123
      http://www.def.com/223
      http://www.ghi.com/700
      http://www.qwe.com/667

      I found this, but it doesn’t work with Notepad++:

      ^(http://[^/]+/)(.*$\n?)((\1)(?2))+

      replace with $1$2

      • El FAROUZ

        @guy038 can you help please sir ? <3

          • Terry R

          @El-FAROUZ said in remove duplicate urls:

          Hello, can someone help with this, please?

          If it were me I would do the following:

          1. Insert line numbers and order the lines descending (backwards)
          2. Use a regex to remove the current line if the next line contains the same address
          3. Re-order in line ascending order and then remove the line numbers.

          So:

          1. Have the cursor in the very first position of the file. Use the Column editor to first insert a ,(comma), then insert a number starting with 1, increasing by 1 and with “leading zero” ticked. Then use the Line Operation function to order lines in Integer Descending.
          2. Using the Replace function we have
            Find What:(?-s)^\d+,http://([^/]+)/.+\R(?=[^/]+?//\1)
            Replace With: empty field here so it erases the line.
            Because this is a regex, the “Search Mode” must be “Regular expression”. Then click the “Replace All” button.
          3. Re-order the lines as Integer Ascending. Then use the Replace function again with:
            Find What:^\d+,
            Replace With: empty field here so it removes the line numbers and comma.

          At this point you should have your required results.
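
For readers who would rather script this outside Notepad++, the three steps above can be sketched in Python. This is only an illustration, not part of the original recipe: the function name `keep_first_of_each_run` is made up, and because Python's `re` module has no `\R`, the sketch assumes line endings are normalised to `\n` first.

```python
import re


def keep_first_of_each_run(text: str) -> str:
    """Hypothetical helper mirroring the number / reverse / regex / restore steps."""
    lines = text.splitlines()

    # Step 1: prefix every line with a zero-padded number and a comma,
    # then put the lines in descending (reversed) order.
    width = len(str(len(lines)))
    numbered = [f"{i:0{width}d},{line}" for i, line in enumerate(lines, start=1)]
    numbered.reverse()
    body = "\n".join(numbered) + "\n"

    # Step 2: erase a line when the NEXT line carries the same host.
    # Same idea as the Find What expression, with \n standing in for \R.
    pattern = re.compile(r"^\d+,http://([^/]+)/.+\n(?=[^/]+?//\1)", re.MULTILINE)
    body = pattern.sub("", body)

    # Step 3: restore ascending order by number, then strip the "NN," prefix.
    kept = sorted(body.splitlines(), key=lambda s: int(s.split(",", 1)[0]))
    return "\n".join(re.sub(r"^\d+,", "", s) for s in kept)
```

Like the regex it imitates, this only collapses runs of adjacent lines that share a host; it keeps the first line of each run in its original position.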

          Terry

          • Alan Kilborn @Terry R

            @Terry-R said in remove duplicate urls:

            Step 1 might be a bit unclear for the novice user, because it packs a lot in. Terry, if you’ll allow, I’d specify it like this:

            1a. Have the cursor in the very first position of the file. Use the Column editor to insert a ,(comma) via Text to Insert; the caret will remain in the very first position of the file after the insertion.

            1b. Use the Column editor’s Number to Insert option to insert a number starting with 1, increasing by 1 and with “leading zero” ticked to add incrementing numbers to the start of every line. Then use the Line Operation function to order lines in Integer Descending.

            Overall, a nice solution!

            • guy038

              Hello @el-farouz, @terry-r, @alan-kilborn and All,

              Terry, I don’t see the necessity of inserting line numbers !?

              For instance, given @el-farouz’s list, not sorted at all, as below :

              http://www.def.com/602
              http://www.abc.com/123
              http://www.qwe.com/667
              http://www.ghi.com/700
              http://www.def.com/556
              http://www.abc.com/456
              http://www.ghi.com/731
              http://www.qwe.com/667
              http://www.qwe.com/667
              http://www.def.com/223
              

              We select this block of addresses and perform an ascending sort ( Edit > Line Operations > Sort Lines Lexicographically Ascending )

              http://www.abc.com/123
              http://www.abc.com/456
              http://www.def.com/223
              http://www.def.com/556
              http://www.def.com/602
              http://www.ghi.com/700
              http://www.ghi.com/731
              http://www.qwe.com/667
              http://www.qwe.com/667
              http://www.qwe.com/667
              

              And, with the following regex S/R :

              SEARCH ^(http://(.+?)/.+\R)(?:http://\2.+\R)+

              REPLACE \1

              We directly get our expected list :

              http://www.abc.com/123
              http://www.def.com/223
              http://www.ghi.com/700
              http://www.qwe.com/667
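
The same sort-then-collapse idea can be tried outside Notepad++ with a short Python sketch. It is an approximation under stated assumptions: `sort_and_collapse` is a hypothetical name, Python's `sorted` stands in for the lexicographic line sort, and `\n` replaces `\R` because Python's `re` module does not support `\R`.

```python
import re


def sort_and_collapse(text: str) -> str:
    """Hypothetical helper: sort the lines, then keep the first URL of each host group."""
    # Stand-in for the ascending lexicographic line sort.
    lines = sorted(text.splitlines())
    # Every line, including the last, must end with a line break for the regex to apply.
    body = "\n".join(lines) + "\n"

    # Group 1 is the first line of a host group, group 2 its host;
    # the non-capturing group swallows the following lines with the same host.
    pattern = re.compile(r"^(http://(.+?)/.+\n)(?:http://\2.+\n)+", re.MULTILINE)
    return pattern.sub(r"\1", body)
```

Note that the result comes back in sorted order, so the original line order is not preserved.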
              

              Am I missing something obvious ?

              Best Regards,

              guy038

              • Alan Kilborn @guy038

                @guy038

                Perhaps Terry is just trying to cover the more general case, where the lines are not in any kind of pre-sorted order, and one wants to keep the original order while removing the duplicate URLs.
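
For that more general case (unsorted input, original order kept, same-host lines not necessarily adjacent), a small script may be simpler than any regex. The sketch below is one possible approach, not something proposed in this thread: `first_url_per_host` is a hypothetical name, and it assumes one URL per line.

```python
from urllib.parse import urlsplit


def first_url_per_host(lines):
    """Hypothetical helper: keep only the first line seen for each host, in original order."""
    seen = set()
    kept = []
    for line in lines:
        host = urlsplit(line.strip()).netloc
        if not host:
            # Assumption: lines without a recognisable host (e.g. blank lines) are kept as-is.
            kept.append(line)
        elif host not in seen:
            seen.add(host)
            kept.append(line)
    return kept
```

Applied to the unsorted list from earlier in the thread, this would keep the `def.com/602`, `abc.com/123`, `qwe.com/667` and `ghi.com/700` lines, in that order.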

                • Terry R @guy038

                  @guy038 said in remove duplicate urls:

                  Am I missing something obvious ?

                  I made no assumptions about the list; I just wanted to keep the order that did exist, working in reverse. The OP had upvoted my solution, suggesting it worked for them.

                  Terry
