Remove duplicate lines not possible?



  • @guy038 said in Remove duplicate lines not possible?:

    I cannot get another layout with a correct regex S/R! (For instance, keeping the bbb line between lines aaa and ccc and deleting all subsequent bbb lines.) Sorry for this limitation!

    Hi guy038, Cletos, All:

    Not a regex solution, but if you reverse the list (for example, with the Reverse Lines plugin) and then run the nice regex you provided, you will keep the first “bbb” and have all its duplicates deleted. Once you are done, reverse the list again to restore the original order of lines.

    Hope you find this, my first post here, useful.

    Best Regards.



  • Hello Sofistanpp,

    OK, sounds very good! Many thanks!



  • @Cletos Glad to be of help.



  • @Sofistanpp

    Maybe explain how reversing the lines helps?



  • @Alan-Kilborn Sure. It is meant to overcome a limitation pointed out by guy038: the regex he posted removes all the duplicates except the last one, but it seems he wanted to keep the first one. If you reverse the order of the lines and run the regex, it will, of course, remove every instance except the last one. Then reverse the list back to its original order, and the instance you kept turns out to be the first one: the “bbb” between “aaa” and “ccc” in the example.
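    To make the reasoning concrete, here is a small Python sketch. The function drop_all_but_last is a plain-Python stand-in for the regex (it deletes every line that appears again further down, so only the last copy survives); it is an assumption for illustration, not guy038's actual regex. Wrapping it between two reversals keeps the first copy instead.

```python
def drop_all_but_last(lines):
    """Stand-in for the regex: delete every line that appears again further down."""
    return [line for i, line in enumerate(lines) if line not in lines[i + 1:]]


def keep_first(lines):
    """Reverse, drop all but the last copy, reverse back: the survivor is the first copy."""
    return drop_all_but_last(lines[::-1])[::-1]


sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]

print(drop_all_but_last(sample))
# ['aaa', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'bbb', 'hhh', 'iii']
print(keep_first(sample))
# ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii']
```

    Note how the surviving “bbb” moves: without the reversals it ends up between “ggg” and “hhh”; with them it stays between “aaa” and “ccc”.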

    Hope it is clear now (English is not my first language).

    Best Regards.



  • @Sofistanpp

    Ah, okay, I missed the point about wanting to keep the first rather than the last. Thanks for the clarification.



  • Hi, @cletos, @sofistanpp, @alan-kilborn and All,

    @sofistanpp, I didn’t want to favor either solution but, indeed, thanks to your clever idea of using the Reverse Lines plugin, it’s good to be able to choose between these two solutions:

    • Keep the first duplicate line and delete all subsequent duplicate lines

    • Delete all duplicate lines except the last one

    Now, thinking about it, I found a solution that can be carried out entirely within N++, without using any external tool:


    Going back to my previous example, open the Column Editor (Edit > Column Editor...) and, with the caret moved to the first column of the first line of your text, create a new number list (don't forget to tick the Leading zeros option!).

    Then, after adding one or more blank characters after each number (using a column-mode selection), you should get:

    
    01 aaa
    02 bbb
    03 ccc
    04 ddd
    05 bbb
    06 bbb
    07 eee
    08 fff
    09 bbb
    10 ggg
    11 bbb
    12 hhh
    13 iii
    
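    For readers who want to check the numbering step outside Notepad++, a rough Python equivalent of the Column Editor's number list with Leading zeros is sketched below. The function name number_lines and the single-space separator are assumptions; the padding width is taken from the largest number so that every prefix has the same length.

```python
def number_lines(lines, separator=" "):
    """Prefix each line with its 1-based number, zero-padded to a common width."""
    width = len(str(len(lines)))  # 13 lines -> width 2 -> "01" ... "13"
    return [f"{n:0{width}d}{separator}{line}" for n, line in enumerate(lines, start=1)]


sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]

for numbered in number_lines(sample):
    print(numbered)
```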

    Now, sort the lines with Edit > Line Operations > Sort Lines Lexicographically Descending, which gives:

    13 iii
    12 hhh
    11 bbb
    10 ggg
    09 bbb
    08 fff
    07 eee
    06 bbb
    05 bbb
    04 ddd
    03 ccc
    02 bbb
    01 aaa
    
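    The Leading zeros option matters because this sort compares text, not numbers. Assuming a plain character-by-character comparison (which is what Python's default string sort does), the sketch below shows that without padding "10" sorts before "2", which would scramble the original line order.

```python
unpadded = ["1 aaa", "2 bbb", "10 ggg"]
padded = ["01 aaa", "02 bbb", "10 ggg"]

# Without leading zeros, "10 ggg" lands between "2 bbb" and "1 aaa".
print(sorted(unpadded, reverse=True))  # ['2 bbb', '10 ggg', '1 aaa']

# With leading zeros, the text order matches the numeric order.
print(sorted(padded, reverse=True))    # ['10 ggg', '02 bbb', '01 aaa']
```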

    Then, after running this new version of my previous regex S/R:

    • SEARCH (?-s)^\d+\h+(.+\R)(?=(?s:.*)^\d+\h+\1)

    • REPLACE Leave EMPTY
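    For anyone who wants to experiment with this regex outside Notepad++, here is a rough Python translation. Python's re module has no \h or \R, so this sketch assumes [ \t] for \h and LF line endings (\n) for \R; (?-s) is already Python's default and is dropped, while re.M makes ^ match at every line start, as it does in Notepad++. It is not guaranteed to behave identically to Notepad++'s Boost engine in every case.

```python
import re

# Translation assumptions: \h -> [ \t], \R -> \n (LF line endings only).
# (?s:.*) lets the lookahead scan across lines; elsewhere "." stops at newlines.
PATTERN = re.compile(r"^\d+[ \t]+(.+\n)(?=(?s:.*)^\d+[ \t]+\1)", re.M)

descending = (
    "13 iii\n12 hhh\n11 bbb\n10 ggg\n09 bbb\n08 fff\n07 eee\n"
    "06 bbb\n05 bbb\n04 ddd\n03 ccc\n02 bbb\n01 aaa\n"
)

# Each match is a whole numbered line whose text reappears further down;
# replacing it with nothing deletes it, so only the lowest-numbered copy stays.
result = PATTERN.sub("", descending)
print(result, end="")
```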

    You’re left with:

    13 iii
    12 hhh
    10 ggg
    08 fff
    07 eee
    04 ddd
    03 ccc
    02 bbb
    01 aaa
    

    Finally, after a second sort in the opposite order (Edit > Line Operations > Sort Lines Lexicographically Ascending), we get the following output text:

    01 aaa
    02 bbb
    03 ccc
    04 ddd
    07 eee
    08 fff
    10 ggg
    12 hhh
    13 iii
    
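    Putting the steps together, the net effect is an order-preserving removal of duplicates that keeps the first occurrence of each line. For comparison only (outside Notepad++), Python gets the same lines, minus the number prefixes, from dict.fromkeys, since dictionaries keep insertion order (Python 3.7+) and a repeated key keeps its first position.

```python
sample = ["aaa", "bbb", "ccc", "ddd", "bbb", "bbb", "eee",
          "fff", "bbb", "ggg", "bbb", "hhh", "iii"]

# dict.fromkeys ignores repeated keys, so each line stays where it first appeared.
unique_first = list(dict.fromkeys(sample))
print(unique_first)
# ['aaa', 'bbb', 'ccc', 'ddd', 'eee', 'fff', 'ggg', 'hhh', 'iii']
```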

    As expected, only the bbb line between lines aaa and ccc remains ;-))

    Best Regards,

    guy038



  • Hi guy038, All:

    Well done. I’m glad my post somehow inspired you to develop a more comprehensive solution to this issue. As I learned from reading archived posts, ancillary lists are a frequently used tool in your toolbox.

    On my side, reversing lines wasn’t my first thought. What would happen, I asked myself, if I run that regex in backward direction from the last line? Would I get, by symmetry, the first “bbb”? I enabled the Backward direction button via an AutoHotkey script and clicked Replace All, but no joy: you get exactly the same outcome as when you run the regex in the normal direction.

    I suspect that the lookarounds are the culprits (simpler regexes do the expected job), but I haven’t tested it thoroughly.
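    One way to see why the direction may make no difference here: the lookahead only asks whether the same line appears again further down, and the last copy of a line never matches, so it is never deleted. The sketch below is a toy model (not Notepad++'s actual backward-search code) that deletes matching lines one at a time, visiting them top-to-bottom or bottom-to-top. DUP is a hypothetical, simplified pattern, not guy038's exact regex, and \n stands in for \R.

```python
import re

# Hypothetical, simplified pattern (not guy038's exact regex): a line whose
# text appears again, as a whole line, further down. "\n" stands in for \R.
DUP = re.compile(r"^(.+\n)(?=(?s:.*)^\1)", re.M)


def delete_duplicates(text, backward=False):
    """Toy model of Replace All: visit lines top-to-bottom or bottom-to-top
    and delete each matching line immediately."""
    lines = text.splitlines(keepends=True)
    order = range(len(lines) - 1, -1, -1) if backward else range(len(lines))
    for i in order:
        current = "".join(lines)
        start = sum(len(line) for line in lines[:i])  # offset of line i in current text
        if DUP.match(current, start):
            lines[i] = ""  # delete in place; indexes of the other lines stay valid
    return "".join(lines)


text = "aaa\nbbb\nccc\nddd\nbbb\nbbb\neee\nfff\nbbb\nggg\nbbb\nhhh\niii\n"
print(delete_duplicates(text) == delete_duplicates(text, backward=True))  # True
print(delete_duplicates(text, backward=True), end="")
```

    In both directions the last “bbb” never matches, and every earlier “bbb” still finds it through the lookahead, so, in this model at least, scanning from the bottom does not turn “keep the last” into “keep the first”.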

    Maybe you or someone else can elaborate on this issue.

    Best Regards.



  • Hello guy038,

    Thank you very much for the new method!



  • @Sofistanpp

    run that regex in backward direction from the last line

    Searching backwards with regex is “discouraged” and is partially disabled in Notepad++.
    The reason, I think, is that in a given text, searching backwards may not give the same hits as searching forwards. Sometimes (with simpler regexes, as you noted) it will, but not always; it depends on the regex and maybe on the data.
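    A tiny example of how the hits can differ: the sketch below compares Python's normal left-to-right scan with a naive backward scan that tries start positions from the end of the text. The backward function is a hypothetical toy model, not how Notepad++ actually implements backward regex search.

```python
import re


def forward_hits(pattern, text):
    """Normal left-to-right scan: non-overlapping matches from the start."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]


def naive_backward_hits(pattern, text):
    """Toy backward scan: try each start position from the end, and never let
    a match run into text already claimed by a later hit."""
    rx = re.compile(pattern)
    hits = []
    end = len(text)
    pos = end - 1
    while pos >= 0:
        m = rx.match(text, pos, end)  # endpos keeps matches left of the previous hit
        if m and m.group():
            hits.append((m.start(), m.group()))
            end = m.start()
        pos -= 1
    return hits


print(forward_hits(r"a+", "aaa"))         # [(0, 'aaa')]
print(naive_backward_hits(r"a+", "aaa"))  # [(2, 'a'), (1, 'a'), (0, 'a')]
```

    The same pattern yields one three-character hit going forward and three one-character hits going backward, which illustrates the kind of mismatch described above.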

    Enabled the Backward direction button via an AutoHotkey script

    In general, enabling disabled controls and then performing an operation and expecting good results is a dubious premise.

