Eliminating duplicate (identical) lines



  • Hello!
    Is there any way to find and eliminate duplicate lines?

    Example:
    Machine 1
    Machine 2
    Machine 2
    Machine 2
    Machine 3
    Machine 4
    Machine 4
    Machine 5

    Once cleaned up, should give:
    Machine 1
    Machine 2
    Machine 3
    Machine 4
    Machine 5

    Thank you for any suggestion!
    Ed



  • Hello, Ed,

    All duplicate lines in a pre-sorted file can easily be removed with a single Search/Replace in Regular expression mode!

    • Open your file containing the sorted list of items

    • Open the Replace dialog ( CTRL + H )

    • In the Search what: field, type (?-s)(^.+\R)\1+

    • In the Replace with: field, type \1

    • Check the Regular expression radio button

    • Click on the Replace All button

    Et voilà !!
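    The Replace All step above can also be reproduced outside the editor. What follows is a minimal Python sketch, not the editor's own implementation, and it assumes LF or CRLF line endings. Python's re module has no \R, so \r?\n stands in for it, and [^\r\n] stands in for the dot under (?-s). The name collapse_duplicate_lines is hypothetical.

```python
import re

# Equivalent of (?-s)(^.+\R)\1+ in Python's re module (an assumption-based port):
#   [^\r\n]+  plays the role of .+ when the dot cannot match line breaks
#   \r?\n     replaces \R (only LF and CRLF endings are handled here)
#   MULTILINE makes ^ match at the start of every line, not just the text
DUP_RUN = re.compile(r"^([^\r\n]+\r?\n)\1+", re.MULTILINE)


def collapse_duplicate_lines(text: str) -> str:
    """Replace each run of consecutive identical lines with a single copy."""
    # \1 in the replacement re-inserts group 1: one line, ending included.
    return DUP_RUN.sub(r"\1", text)


if __name__ == "__main__":
    sample = (
        "Machine 1\nMachine 2\nMachine 2\nMachine 2\n"
        "Machine 3\nMachine 4\nMachine 4\nMachine 5\n"
    )
    print(collapse_duplicate_lines(sample), end="")
```

    As with the editor replacement, only adjacent duplicates are collapsed, so the lines must be sorted first. Also, because group 1 includes the line ending, a duplicate on a final line that has no trailing line break is not matched and survives.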

    Notes:

    • The (?-s) in-line modifier ensures that the special regex dot character matches standard characters only, never line-break characters, even if you previously checked the . matches newline option!

    • Then, the part ^.+\R matches all the characters ( .+ ) of any non-empty line, from the beginning of the line ( ^ ) up to and including its End of Line character(s) ( \R )

    • So the part (^.+\R), enclosed in parentheses, stores the complete contents of a line, including its line ending, as group 1

    • Finally, the part \1+ tries to match one or more following lines that are identical to that line

    • If an overall match is found, that whole block of identical lines is replaced by group 1 ( \1 ), that is to say, by ONE copy of that line
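    To see what group 1 and \1+ each capture, the hypothetical helper describe_runs below lists every run the pattern finds. It is a Python sketch, not the editor's code: \r?\n stands in for \R, [^\r\n] stands in for the dot under (?-s), and LF or CRLF line endings are assumed.

```python
import re

# Port of (?-s)(^.+\R)\1+ : [^\r\n] for the non-newline dot, \r?\n for \R.
DUP_RUN = re.compile(r"^([^\r\n]+\r?\n)\1+", re.MULTILINE)


def describe_runs(text: str) -> list[tuple[str, int]]:
    """Return (line text, extra copies removed) for each run of duplicates."""
    runs = []
    for m in DUP_RUN.finditer(text):
        line = m.group(1)                      # group 1: one complete line, ending included
        copies = len(m.group(0)) // len(line)  # whole match is that line repeated `copies` times
        runs.append((line.rstrip("\r\n"), copies - 1))
    return runs


if __name__ == "__main__":
    sample = (
        "Machine 1\nMachine 2\nMachine 2\nMachine 2\n"
        "Machine 3\nMachine 4\nMachine 4\nMachine 5\n"
    )
    for text, removed in describe_runs(sample):
        print(f"{text!r}: {removed} duplicate(s) removed")
```

    For an unsorted input such as a, b, a, no run is reported at all, which is why the file has to be sorted before the replacement.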

    Best Regards,

    guy038

