i want to keep only unique lines
-
Hi
Please, I have an issue.
I want to keep only unique lines in Notepad++. I have read some previous posts, but they are not what I’m looking for.
example :
aaa
bbb
ccc
ddd
aaa
bbb
I want it to become like this :
ccc
ddd
I want both duplicates to be removed and keep only Unique Lines
thanks
-
Hi sophey hence and All,
Indeed, sophey hence, you’re raising a general problem ! How can we keep, from the contents of the current file, ONLY :
-
A) All the lines which are unique
-
B) All the duplicate lines
-
C) The first duplicate line, from all the duplicate lines
UPDATE, on 11/19/16 :
-
D) All the lines which are unique AND the last duplicate line, from all the duplicate lines
For this last case D), refer to this other post : https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7
In order to get the lines of the remaining 3 cases A), B) or C), TWO methods are possible :
-
METHOD 1 needs only a lexical sort and an appropriate regex
-
METHOD 2 needs some secondary search/replace (S/R) operations, the use of the Column Editor, two lexical sorts and a main appropriate regex
Of course, METHOD 1 is simpler. However, contrary to METHOD 2, it does NOT keep the original order of the lines
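For comparison, outside Notepad++, the four cases A) to D) can be expressed directly by counting occurrences. The Python sketch below is only my own illustration, not one of the two methods; the function name `keep_lines` and its `case` argument are hypothetical, and the short list is the example from the question.

```python
from collections import Counter

def keep_lines(lines, case):
    """Filter lines for case 'A', 'B', 'C' or 'D', keeping the file order."""
    counts = Counter(lines)   # total occurrences of each line
    seen = Counter()          # occurrences met so far while scanning
    result = []
    for line in lines:
        seen[line] += 1
        is_dup = counts[line] > 1
        if case == "A":       # lines which are unique
            keep = not is_dup
        elif case == "B":     # all the duplicate lines
            keep = is_dup
        elif case == "C":     # first copy of each duplicate line
            keep = is_dup and seen[line] == 1
        elif case == "D":     # unique lines AND last copy of each duplicate
            keep = (not is_dup) or seen[line] == counts[line]
        else:
            raise ValueError(f"unknown case: {case}")
        if keep:
            result.append(line)
    return result

sample = ["aaa", "bbb", "ccc", "ddd", "aaa", "bbb"]
print(keep_lines(sample, "A"))  # ['ccc', 'ddd']
```

Case A) is the one asked for in the question. Unlike METHOD 1, this sketch never sorts, so the original order is kept.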
Hypotheses :
-
I assume that your file contains no blank or empty lines. If it does, just use the regex : SEARCH = ^\h*\R , REPLACE = EMPTY , to get rid of all these useless lines
-
For METHOD 2, I need ONE temporary character, NOT presently used in your file. I chose the exclamation mark ( ! ). Of course, any other symbol would do ! However, take care to escape this symbol if it’s a meta character, with a special meaning, inside a regex !
-
Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )
-
Use the Replace All button, only, to keep the present cursor location
-
I’ll use the sample text below, containing 15 lines, 3 of which occur several times :
hhhhhhhhhhh
fffffffffffffff
bbbbbbb
bbbbbbb
jj
eeeeeeeeeeeeeeeeeeeeeeeeeee
aaaaa
ccccccccccccccccccccccccccccccccccccccccccccccc
aaaaa
ddd
iiiiiiiiiiiiiiiii
aaaaa
hhhhhhhhhhh
gggggggggggggggggggggggggggggggggggg
bbbbbbb
Well, let’s go !
METHOD 1
-
Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
aaaaa
aaaaa
aaaaa
bbbbbbb
bbbbbbb
bbbbbbb
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
eeeeeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffff
gggggggggggggggggggggggggggggggggggg
hhhhhhhhhhh
hhhhhhhhhhh
iiiiiiiiiiiiiiiii
jj
-
For case A), use the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY . We get the final text :
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
eeeeeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffff
gggggggggggggggggggggggggggggggggggg
iiiiiiiiiiiiiiiii
jj
-
For case B), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0) . We get the final text :
aaaaa
aaaaa
aaaaa
bbbbbbb
bbbbbbb
bbbbbbb
hhhhhhhhhhh
hhhhhhhhhhh
-
For case C), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1) . We get the final text :
aaaaa
bbbbbbb
hhhhhhhhhhh
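The three METHOD 1 replacements can be tried outside Notepad++ with the Python sketch below. It is an approximation under stated assumptions : Python’s `re` module has no `\R`, so `\n` stands in for it, and it has no conditional replacement such as `(?2$0)`, so a small function plays that role. The list is a shortened stand-in for the 15-line sample (same letters, same order), and the variable names are mine.

```python
import re

# Shortened stand-in for the 15-line sample text (same letters, same order).
lines = ["hhh", "fff", "bbb", "bbb", "jj", "eee", "aaa", "ccc",
         "aaa", "ddd", "iii", "aaa", "hhh", "ggg", "bbb"]

# Lexicographic sort; every line, including the last, ends with a line break.
sorted_text = "".join(line + "\n" for line in sorted(lines))

# By default "." does not match a line break in Python,
# which is what (?-s) guarantees in Notepad++.
run = re.compile(r"^(.+\n)\1+", re.MULTILINE)
run_or_single = re.compile(r"^(.+\n)(?:(\1)+|(?!\1))", re.MULTILINE)

# Case A: delete every run of identical consecutive lines.
case_a = run.sub("", sorted_text)
# Case B: keep the whole match only when group 2 took part in it.
case_b = run_or_single.sub(lambda m: m.group(0) if m.group(2) else "", sorted_text)
# Case C: keep only the first copy (group 1) of each run.
case_c = run_or_single.sub(lambda m: m.group(1) if m.group(2) else "", sorted_text)

print(case_a.split())
print(case_b.split())
print(case_c.split())
```

The sketch appends a line break to every line on purpose : a last line without a line break cannot be matched by `(.+\n)`, and the same holds for `(.+\R)`.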
METHOD 2
-
Use the regexes : SEARCH = ^ , REPLACE = !!
!!hhhhhhhhhhh
!!fffffffffffffff
!!bbbbbbb
!!bbbbbbb
!!jj
!!eeeeeeeeeeeeeeeeeeeeeeeeeee
!!aaaaa
!!ccccccccccccccccccccccccccccccccccccccccccccccc
!!aaaaa
!!ddd
!!iiiiiiiiiiiiiiiii
!!aaaaa
!!hhhhhhhhhhh
!!gggggggggggggggggggggggggggggggggggg
!!bbbbbbb
-
Place the cursor between the two exclamation marks ( ! ), on the first line
-
Open the Column Editor ( ALT + C )
-
Select the second option Number to insert
-
Type 1 in both the Initial number : and Increase by : fields
-
Check the Leading zeros option
-
Click on the OK button
!01!hhhhhhhhhhh
!02!fffffffffffffff
!03!bbbbbbb
!04!bbbbbbb
!05!jj
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!07!aaaaa
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!09!aaaaa
!10!ddd
!11!iiiiiiiiiiiiiiiii
!12!aaaaa
!13!hhhhhhhhhhh
!14!gggggggggggggggggggggggggggggggggggg
!15!bbbbbbb
-
Use the regexes : SEARCH = ^(.+!)(.+) , REPLACE = \2\1
hhhhhhhhhhh!01!
fffffffffffffff!02!
bbbbbbb!03!
bbbbbbb!04!
jj!05!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
aaaaa!07!
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
aaaaa!09!
ddd!10!
iiiiiiiiiiiiiiiii!11!
aaaaa!12!
hhhhhhhhhhh!13!
gggggggggggggggggggggggggggggggggggg!14!
bbbbbbb!15!
-
Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
aaaaa!07!
aaaaa!09!
aaaaa!12!
bbbbbbb!03!
bbbbbbb!04!
bbbbbbb!15!
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
ddd!10!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
fffffffffffffff!02!
gggggggggggggggggggggggggggggggggggg!14!
hhhhhhhhhhh!01!
hhhhhhhhhhh!13!
iiiiiiiiiiiiiiiii!11!
jj!05!
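Both sorts in METHOD 2 are lexical, so the position tags only come back in numeric order thanks to the Leading zeros option of the Column Editor. A short Python sketch of the difference (purely illustrative; the variable names are mine) :

```python
# Position tags for 15 lines, with and without zero padding.
padded = [f"!{n:02d}!" for n in range(1, 16)]
unpadded = [f"!{n}!" for n in range(1, 16)]

# A lexical sort compares character by character, so "!10!" lands before "!2!".
print(sorted(unpadded)[:3])      # ['!1!', '!10!', '!11!']

# With leading zeros, the lexical order and the numeric order are the same.
print(sorted(padded) == padded)  # True
```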
-
For case A), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
ddd!10!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
fffffffffffffff!02!
gggggggggggggggggggggggggggggggggggg!14!
iiiiiiiiiiiiiiiii!11!
jj!05!
-
Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!10!ddd
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!02!fffffffffffffff
!14!gggggggggggggggggggggggggggggggggggg
!11!iiiiiiiiiiiiiiiii
!05!jj
-
Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!02!fffffffffffffff
!05!jj
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!10!ddd
!11!iiiiiiiiiiiiiiiii
!14!gggggggggggggggggggggggggggggggggggg
-
Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY . We get the final text :
fffffffffffffff
jj
eeeeeeeeeeeeeeeeeeeeeeeeeee
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
iiiiiiiiiiiiiiiii
gggggggggggggggggggggggggggggggggggg
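The whole METHOD 2 sequence for case A) (tag, first sort, filter, move the tag back, second sort, remove the tag) can be replayed with the Python sketch below. As before, this is only an approximation : `\n` stands in for `\R`, the list is a shortened stand-in for the sample text, and the variable names are mine.

```python
import re

# Shortened stand-in for the 15-line sample text (same letters, same order).
lines = ["hhh", "fff", "bbb", "bbb", "jj", "eee", "aaa", "ccc",
         "aaa", "ddd", "iii", "aaa", "hhh", "ggg", "bbb"]

# Tag each line with its zero-padded position, placed after the text: "hhh!01!".
width = len(str(len(lines)))
tagged = [f"{line}!{n:0{width}d}!" for n, line in enumerate(lines, start=1)]

# First lexical sort: identical texts become consecutive lines.
text = "".join(t + "\n" for t in sorted(tagged))

# Case A: delete every group of lines sharing the same text before the first "!".
text = re.sub(r"^(.+!).+\n(?:\1.+\n)+", "", text, flags=re.MULTILINE)

# Move the tag back in front of the text: "ccc!08!" becomes "!08!ccc".
text = re.sub(r"^(.+?)(!.+)", r"\2\1", text, flags=re.MULTILINE)

# Second lexical sort restores the original order, then the tag is removed.
restored = [re.sub(r"^.+!", "", t) for t in sorted(text.splitlines())]
print(restored)  # ['fff', 'jj', 'eee', 'ccc', 'ddd', 'iii', 'ggg']
```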
-
For case B), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)
aaaaa!07!
aaaaa!09!
aaaaa!12!
bbbbbbb!03!
bbbbbbb!04!
bbbbbbb!15!
hhhhhhhhhhh!01!
hhhhhhhhhhh!13!
-
Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!07!aaaaa
!09!aaaaa
!12!aaaaa
!03!bbbbbbb
!04!bbbbbbb
!15!bbbbbbb
!01!hhhhhhhhhhh
!13!hhhhhhhhhhh
-
Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!01!hhhhhhhhhhh
!03!bbbbbbb
!04!bbbbbbb
!07!aaaaa
!09!aaaaa
!12!aaaaa
!13!hhhhhhhhhhh
!15!bbbbbbb
-
Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY . We get the final text :
hhhhhhhhhhh
bbbbbbb
bbbbbbb
aaaaa
aaaaa
aaaaa
hhhhhhhhhhh
bbbbbbb
-
For case C), use the regexes : SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)
aaaaa!07!
bbbbbbb!03!
hhhhhhhhhhh!01!
-
Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!07!aaaaa
!03!bbbbbbb
!01!hhhhhhhhhhh
-
Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!01!hhhhhhhhhhh
!03!bbbbbbb
!07!aaaaa
-
Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY . We get the final text :
hhhhhhhhhhh
bbbbbbb
aaaaa
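Cases B) and C) of METHOD 2 differ from case A) only by the main regex and its conditional replacement. A hedged Python sketch of both, assuming `\n` in place of `\R` and a small function in place of the `(?2$0)` and `(?3\1)` conditional replacements, which Python’s `re` module does not support; `method2` and `CASES` are hypothetical names :

```python
import re

M = re.MULTILINE
# (search pattern, replacement function) for each case.
CASES = {
    "B": (r"^(.+!).+\n(?:(\1.+\n)+|(?!\1.+\n))",
          lambda m: m.group(0) if m.group(2) else ""),
    "C": (r"^((.+!).+\n)(?:(\2.+\n)+|(?!\2.+\n))",
          lambda m: m.group(1) if m.group(3) else ""),
}

def method2(lines, case):
    pattern, keep = CASES[case]
    width = len(str(len(lines)))
    # Tag with the zero-padded position after the text, then first lexical sort.
    tagged = sorted(f"{line}!{n:0{width}d}!" for n, line in enumerate(lines, start=1))
    # Main regex: keep or drop each group of identical texts.
    text = re.sub(pattern, keep, "".join(t + "\n" for t in tagged), flags=M)
    # Move the tag in front, second lexical sort, then remove the tag.
    fronted = sorted(re.sub(r"^(.+?)(!.+)", r"\2\1", text, flags=M).splitlines())
    return [re.sub(r"^.+!", "", t) for t in fronted]

# Shortened stand-in for the 15-line sample text (same letters, same order).
lines = ["hhh", "fff", "bbb", "bbb", "jj", "eee", "aaa", "ccc",
         "aaa", "ddd", "iii", "aaa", "hhh", "ggg", "bbb"]
print(method2(lines, "B"))
print(method2(lines, "C"))
```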
To end with, I also tried a normal case, with a file containing 1557 lines, 189 of which are unique. No problem !
Best Regards,
guy038
Pffff! About a complete day to get this post :-)) Really time to eat and rest a bit !
-
-
Hi Guy038 ,
" the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY "
I did that step, but nothing happened. Why is that?