
    Remove duplicate lines not possible?

    • Cletos

      Hello Sofistanpp,

      OK, sounds very good! Many thanks!

      • Sofistanpp @Cletos

        @Cletos Glad to be of help.

        • Alan Kilborn @Sofistanpp

          @Sofistanpp

          Maybe explain how reversing the lines helps?

          • Sofistanpp @Alan Kilborn

            @Alan-Kilborn Sure. It overcomes a limitation pointed out by guy038, who wrote that the regex he posted removes all the duplicates except the last one, whereas it seems the goal was to keep the first one. So if you reverse the order of the lines and run the regex, it will, of course, remove all the instances except the last duplicate. Now reverse the list back to its original order, and you will actually have kept the first instance of the line: the “bbb” between “aaa” and “ccc” in the example.
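To make the reversal trick concrete, here is a minimal Python sketch using guy038's sample lines. It is only an illustration of the logic, not what Notepad++ or the Reverse Lines plugin does internally. `keep_last` is a hypothetical stand-in for the regex (it keeps only the last occurrence of each duplicated line); wrapping it in two reversals keeps the first occurrence instead.

```python
def keep_last(lines):
    """Keep only the last occurrence of each line, preserving order.

    Hypothetical stand-in for the regex that deletes a line whenever
    the same line appears again further down (O(n^2), fine for a demo).
    """
    result = []
    for i, line in enumerate(lines):
        # Drop this line if an identical line appears later on.
        if line not in lines[i + 1:]:
            result.append(line)
    return result


def keep_first(lines):
    """Reverse, keep the last occurrence, then reverse back."""
    return keep_last(lines[::-1])[::-1]


sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]

print(keep_last(sample))
# ['aaa', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'bbb', 'hhh', 'iii']
print(keep_first(sample))
# ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii']
```

With `keep_last` alone, the surviving “bbb” sits between “ggg” and “hhh”; with the double reversal it stays between “aaa” and “ccc”.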

            Hope it is clear now (English is not my first language).

            Best Regards.

            • Alan Kilborn @Sofistanpp

              @Sofistanpp

              Ah, okay, I missed the point about wanting to keep the first rather than the last. Thanks for the clarification.

              • guy038

                Hi, @cletos, @sofistanpp, @alan-kilborn and All,

                @sofistanpp, I didn’t want to favor either solution but, indeed, thanks to your clever idea of using the Reverse Lines plugin, it’s good to be able to choose between these two solutions:

                • Keep the first duplicate line and delete all subsequent duplicate lines

                • Delete all duplicates but keep only the last duplicate line

                Now, thinking about it further, I found a solution that can be carried out entirely within N++, without using any external tool.


                Going back to my previous example, move the caret to the first column of the first line of your text, open the Column Editor (Edit > Column Editor...) and create a list of numbers (don’t forget to tick the Leading zeros option!).

                Then, after adding one or more blank characters after each number with a column-mode selection, you should get:

                
                01 aaa
                02 bbb
                03 ccc
                04 ddd
                05 bbb
                06 bbb
                07 eee
                08 fff
                09 bbb
                10 ggg
                11 bbb
                12 hhh
                13 iii
                
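For readers who want to see what the Column Editor step produces, here is a minimal Python sketch that prefixes each line with a zero-padded sequence number and a single blank. The helper name `number_lines` is hypothetical, and padding to the width of the largest number is an assumption about how the Leading zeros option behaves.

```python
def number_lines(lines, start=1):
    """Prefix each line with a zero-padded number and one blank.

    Hypothetical helper mimicking Edit > Column Editor with Leading zeros:
    numbers are padded to the width of the largest number so that they
    sort correctly as plain text.
    """
    width = len(str(start + len(lines) - 1))
    return [f"{n:0{width}d} {line}" for n, line in enumerate(lines, start)]


sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]
numbered = number_lines(sample)
print(numbered[0], numbered[-1])  # 01 aaa 13 iii
```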

                Now, sort the lines with Edit > Line Operations > Sort Lines Lexicographically Descending, which gives:

                13 iii
                12 hhh
                11 bbb
                10 ggg
                09 bbb
                08 fff
                07 eee
                06 bbb
                05 bbb
                04 ddd
                03 ccc
                02 bbb
                01 aaa
                
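The Leading zeros option matters because this is a lexicographic (character-by-character) sort, not a numeric one. The following Python sketch uses plain string sorting as an approximation of the Notepad++ command: padded numbers come out in true descending order, while unpadded ones do not.

```python
padded = ["01 aaa", "02 bbb", "09 bbb", "10 ggg", "13 iii"]
unpadded = ["1 aaa", "2 bbb", "9 bbb", "10 ggg", "13 iii"]

# Lexicographic descending sort, comparing the lines as plain strings.
print(sorted(padded, reverse=True))
# ['13 iii', '10 ggg', '09 bbb', '02 bbb', '01 aaa']
print(sorted(unpadded, reverse=True))
# ['9 bbb', '2 bbb', '13 iii', '10 ggg', '1 aaa']
```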

                Next, after running this new version of my previous regex S/R:

                • SEARCH (?-s)^\d+\h+(.+\R)(?=(?s:.*)^\d+\h+\1)

                • REPLACE Leave EMPTY
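To experiment with this S/R outside Notepad++, here is a rough Python `re` translation, checked only against this example. Python's `re` has no `\h` or `\R`, so they are swapped for `[ \t]` and `(?:\r\n|\n|\r)`; the leading `(?-s)` is dropped because `.` already excludes `\n` by default in Python, and the line body is written as `[^\r\n]+` so it cannot swallow a `\r`. This translation is an assumption, not a byte-for-byte equivalent of the Boost regex engine that Notepad++ uses.

```python
import re

# Approximate translation of:  (?-s)^\d+\h+(.+\R)(?=(?s:.*)^\d+\h+\1)
PATTERN = re.compile(
    r"^\d+[ \t]+([^\r\n]+(?:\r\n|\n|\r))"  # number, blanks, then line text + EOL
    r"(?=(?s:.*)^\d+[ \t]+\1)",            # same text + EOL on a later numbered line
    re.MULTILINE,
)

text = (
    "13 iii\n12 hhh\n11 bbb\n10 ggg\n09 bbb\n08 fff\n07 eee\n"
    "06 bbb\n05 bbb\n04 ddd\n03 ccc\n02 bbb\n01 aaa\n"
)

# Replace with nothing: each line that has a later duplicate is deleted.
result = PATTERN.sub("", text)
print(result.splitlines())
# ['13 iii', '12 hhh', '10 ggg', '08 fff', '07 eee', '04 ddd', '03 ccc', '02 bbb', '01 aaa']
```

Because the lookahead only looks further down, the last “bbb” in the descending list (number 02, i.e. the first one in the original text) is the one that survives.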

                You’re left with:

                13 iii
                12 hhh
                10 ggg
                08 fff
                07 eee
                04 ddd
                03 ccc
                02 bbb
                01 aaa
                

                Finally, after a second sort in the opposite direction, with Edit > Line Operations > Sort Lines Lexicographically Ascending, we get the following output text:

                01 aaa
                02 bbb
                03 ccc
                04 ddd
                07 eee
                08 fff
                10 ggg
                12 hhh
                13 iii
                

                As expected, the only remaining bbb line is the one between lines aaa and ccc ;-))
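Putting the steps together, here is a minimal end-to-end Python sketch: number the lines, sort descending, apply the regex, sort ascending. The helper name `dedupe_keep_first` is hypothetical, the regex is the same approximate Python translation (`\h` and `\R` replaced), and the final removal of the numbers is an extra step not in the post. The last line checks the result against a plain order-preserving deduplication.

```python
import re

# Approximate Python translation of guy038's regex (\h -> [ \t], \R -> \r\n|\n|\r).
DUP = re.compile(
    r"^\d+[ \t]+([^\r\n]+(?:\r\n|\n|\r))(?=(?s:.*)^\d+[ \t]+\1)",
    re.MULTILINE,
)


def dedupe_keep_first(lines):
    """Hypothetical helper: keep the first occurrence of each line."""
    if not lines:
        return []
    width = len(str(len(lines)))
    # 1. Number the lines with leading zeros (Column Editor step).
    numbered = [f"{n:0{width}d} {line}" for n, line in enumerate(lines, 1)]
    # 2. Sort lexicographically descending.
    numbered.sort(reverse=True)
    # 3. Delete every line whose text appears again further down.
    text = DUP.sub("", "\n".join(numbered) + "\n")
    # 4. Sort lexicographically ascending to restore the original order.
    kept = sorted(text.splitlines())
    # Extra step (not in the post): drop the number and its blank.
    return [line.split(" ", 1)[1] for line in kept]


sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]
result = dedupe_keep_first(sample)
print(result)
# ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii']
assert result == list(dict.fromkeys(sample))
```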

                Best Regards,

                guy038

                • Sofistanpp

                  Hi guy038, All:

                  Well done. I’m glad my post somehow inspired you to develop a more comprehensive solution to the current issue. As I learned from reading archived posts, ancillary lists are a frequently used tool in your toolbox.

                  For my part, reversing the lines wasn’t my first thought. What would happen, I asked myself, if I ran that regex in the backward direction from the last line? Would I get, by symmetry, the first “bbb”? I enabled the Backward direction button via an AutoHotkey script and clicked Replace All, but no joy: you get exactly the same outcome as when you run the regex in the normal direction.

                  I suspect the lookarounds are the culprit (simpler regexes do the expected job), but I haven’t tested this thoroughly.

                  Maybe you or someone else can elaborate on this issue.

                  Best Regards.

                  • Cletos

                    Hello guy038,

                    Thank you very much for the new method!

                    • Alan Kilborn @Sofistanpp

                      @Sofistanpp

                      run that regex in backward direction from the last line

                      Searching backward with a regex is “discouraged” and is partially disabled in Notepad++.
                      The reason, I think, is that on a given text, searching backward versus forward won’t necessarily produce the same hits. Sometimes (with simpler regexes, as you noted) it will, but not always (it depends on the regex and maybe the data).

                      Enabled the Backward direction button via an AutoHotkey script

                      In general, enabling disabled controls and then performing an operation and expecting good results is a dubious premise.

                      • endolith @Cletos

                        @Cletos Yes, this feature is buggy; I see it fairly often. Usually I can click “Remove duplicate lines” and it removes them all, regardless of order, but sometimes it doesn’t remove any of them. Something is wrong with the software, but I can’t pinpoint what. It depends on the text? Or do I have to create a new blank document, where it works, and then copy the result back into the original?

                        • Alan Kilborn @endolith

                          @endolith said in Remove duplicate lines not possible?:

                          It depends on the text?

                          Could it be a line-ending problem?
                          If the line endings differ on otherwise identical lines, those lines won’t be considered true duplicates.
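Alan's line-ending hypothesis is easy to illustrate. In this minimal Python sketch (an illustration only, not how Notepad++ compares lines internally), two lines with the same text but different endings (`\r\n` vs `\n`) count as distinct until the endings are stripped.

```python
# Same text, but the first two lines end in CRLF and the last one in LF.
text = "aaa\r\nbbb\r\naaa\n"

raw = text.splitlines(keepends=True)
print(raw)            # ['aaa\r\n', 'bbb\r\n', 'aaa\n']
print(len(set(raw)))  # 3: the two "aaa" lines differ only by their ending

# Compare on the text alone (endings stripped): the duplicate is detected.
normalized = text.splitlines()
print(len(set(normalized)))             # 2
print(list(dict.fromkeys(normalized)))  # ['aaa', 'bbb']
```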

                          The Community of users of the Notepad++ text editor.