Eliminating duplicate (identical) lines
-
Hello!
Is there any way to find and eliminate duplicate lines? Example:
Machine 1
Machine 2
Machine 2
Machine 2
Machine 3
Machine 4
Machine 4
Machine 5
Once cleaned up, it should give:
Machine 1
Machine 2
Machine 3
Machine 4
Machine 5
Thank you for any suggestions!
Ed -
Hello, Ed,
All the duplicate lines in a pre-sorted file can easily be removed with a search/replace in Regular expression mode!
- Open your file, containing the sorted list of items
- Open the Replace dialog ( CTRL + H )
- In the Find what: field, type
  (?-s)(^.+\R)\1+
- In the Replace with: field, type
  \1
- Check the Regular expression radio button
- Click on the Replace All button
Et voilà !!
Notes :
- The (?-s) in-line modifier ensures that the regex dot character matches standard characters only (never line-break characters), even if you previously checked the ". matches newline" option!
- Then, the part ^.+\R matches all the characters (.+) of any non-empty line, between the beginning of the line (^) and its End of Line character(s) (\R), included
- So, the part (^.+\R), enclosed in round brackets, simply stores the complete line, including its line ending, as group 1
- Finally, the part \1+ tries to match one or more subsequent identical lines, immediately following that line
- And if an overall match is found, the whole block of identical lines is simply replaced by group 1 (\1), that is to say, ONE copy of the line
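If you ever need the same clean-up outside the editor, the idea can be sketched in Python. This is only an illustration, not part of the recipe above, and it rests on a few assumptions: Python's re module has no \R, so \r?\n stands in for the line ending; the dot never matches a newline by default there, so no (?-s) is needed; and the function name collapse_adjacent_duplicates is just a hypothetical helper.

```python
import re

# Same pattern as in the Replace dialog, adapted for Python's `re`:
#   ^        start of a line (needs re.MULTILINE)
#   (.+\r?\n) one non-empty line plus its line ending, stored as group 1
#   \1+      one or more immediate repeats of that exact line
DUPLICATE_RUN = re.compile(r"^(.+\r?\n)\1+", re.MULTILINE)


def collapse_adjacent_duplicates(text: str) -> str:
    """Replace each run of identical consecutive lines with a single copy."""
    return DUPLICATE_RUN.sub(r"\1", text)


sample = (
    "Machine 1\n"
    "Machine 2\n"
    "Machine 2\n"
    "Machine 2\n"
    "Machine 3\n"
    "Machine 4\n"
    "Machine 4\n"
    "Machine 5\n"
)

print(collapse_adjacent_duplicates(sample), end="")
```

As with the editor version, only duplicates that sit directly under each other are merged, which is why the list has to be sorted first. Also note that a final line without a line ending cannot be stored in group 1, so it would not be merged with the line above it.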
Best Regards,
guy038