I want to keep only unique lines
-
Hi
I have an issue: I want to keep only the unique lines in Notepad++. I have read some previous posts, but they are not what I'm looking for.
Example:
aaa
bbb
ccc
ddd
aaa
bbb
I want it to become like this:
ccc
ddd
I want both copies of every duplicated line to be removed, keeping only the unique lines.
Thanks
-
Hi sophey hence and All,
Indeed, sophey hence, you're raising a general problem! How, from the contents of the current file, to keep ONLY:
- A) All the lines which are unique
- B) All the duplicate lines
- C) The first duplicate line, from all the duplicate lines
UPDATE, on 11/19/16:
- D) All the lines which are unique AND the last duplicate line, from all the duplicate lines
For this last case D), refer to that other post, below: https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7
In order to get the lines of the remaining 3 cases A), B) OR C), TWO methods are possible:
- METHOD 1 needs only a lexical sort and an appropriate regex
- METHOD 2 needs some secondary S/R, the use of the Column Editor, two lexical sorts and a main appropriate regex
Of course, METHOD 1 is simpler. However, contrary to METHOD 2, it does NOT keep the original order of the lines.
Hypotheses:
- I suppose that no blank or empty line exists in your file. If some do, just use the regex SEARCH = ^\h*\R , REPLACE = EMPTY, to get rid of all these useless lines
- For METHOD 2, I need ONE temporary character, NOT presently used in your file. I chose the exclamation mark ( ! ). Of course, any other symbol could suit! However, take care to escape this symbol if it's a meta character, with special meaning, inside a regex!
- Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )
- Use the Replace All button only, so that the present cursor location is kept
I'll use the sample text below, containing 15 lines, three of which occur more than once:
hhhhhhhhhhh
fffffffffffffff
bbbbbbb
bbbbbbb
jj
eeeeeeeeeeeeeeeeeeeeeeeeeee
aaaaa
ccccccccccccccccccccccccccccccccccccccccccccccc
aaaaa
ddd
iiiiiiiiiiiiiiiii
aaaaa
hhhhhhhhhhh
gggggggggggggggggggggggggggggggggggg
bbbbbbb
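For anyone who wants to check the first hypothesis outside Notepad++, here is a minimal Python sketch of the blank-line clean-up. It is only an approximation: Python's re module has no \h or \R, so [ \t]* and \r?\n stand in for them, and the function name drop_blank_lines is just an illustrative choice.

```python
import re

def drop_blank_lines(text: str) -> str:
    # Approximation of SEARCH = ^\h*\R , REPLACE = EMPTY.
    # [ \t]* replaces \h* and \r?\n replaces \R (assumption: only
    # spaces/tabs and LF or CRLF line endings occur in the file).
    return re.sub(r"(?m)^[ \t]*\r?\n", "", text)

# Hypothetical three-line text with one empty and one blank line.
print(drop_blank_lines("aaa\n\nbbb\n   \nccc\n"))
```

A line made of spaces only counts as blank here, exactly as in the S/R above.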
Well, let’s go !
METHOD 1
- Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
aaaaa
aaaaa
aaaaa
bbbbbbb
bbbbbbb
bbbbbbb
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
eeeeeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffff
gggggggggggggggggggggggggggggggggggg
hhhhhhhhhhh
hhhhhhhhhhh
iiiiiiiiiiiiiiiii
jj
- For case A), use the regexes: SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY. We get the final text:
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
eeeeeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffff
gggggggggggggggggggggggggggggggggggg
iiiiiiiiiiiiiiiii
jj
- For case B), use the regexes: SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0). We get the final text:
aaaaa
aaaaa
aaaaa
bbbbbbb
bbbbbbb
bbbbbbb
hhhhhhhhhhh
hhhhhhhhhhh
- For case C), use the regexes: SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1). We get the final text:
aaaaa
bbbbbbb
hhhhhhhhhhh
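The three S/R of METHOD 1 can be tried outside Notepad++ with a rough Python sketch like the one below. Several things are assumptions of mine, not part of the method itself: the lines are shortened stand-ins for the 15 sample lines (same order, same repetitions), \n replaces \R, and a small function replaces the conditional replacements (?2$0) and (?2\1), which Python's re module does not support. Note that every line, the last one included, must end with a line break for these regexes to match.

```python
import re

# Shortened stand-ins for the 15 sample lines (same order and repetitions).
LINES = ["hh", "ff", "bb", "bb", "jj", "ee", "aa", "cc",
         "aa", "dd", "ii", "aa", "hh", "gg", "bb"]

def method1(lines, case):
    # Lexical sort; each line, the last one included, ends with "\n".
    text = "".join(line + "\n" for line in sorted(lines))
    if case == "A":
        # Delete every run of two or more identical consecutive lines.
        text = re.sub(r"(?m)^(.+\n)\1+", "", text)
    else:
        # Group 2 is set only when the line is followed by a copy of itself.
        pattern = r"(?m)^(.+\n)(?:(\1)+|(?!\1))"
        if case == "B":
            # Keep the whole run of duplicates, drop unique lines.
            keep = lambda m: m.group(0) if m.group(2) else ""
        else:
            # Case C: keep one line per run of duplicates, drop unique lines.
            keep = lambda m: m.group(1) if m.group(2) else ""
        text = re.sub(pattern, keep, text)
    return text.splitlines()

for case in "ABC":
    print(case, method1(LINES, case))
```

As in Notepad++, the results come out in sorted order, not in the original order of the lines.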
METHOD 2
- Use the regexes: SEARCH = ^ , REPLACE = !!
!!hhhhhhhhhhh
!!fffffffffffffff
!!bbbbbbb
!!bbbbbbb
!!jj
!!eeeeeeeeeeeeeeeeeeeeeeeeeee
!!aaaaa
!!ccccccccccccccccccccccccccccccccccccccccccccccc
!!aaaaa
!!ddd
!!iiiiiiiiiiiiiiiii
!!aaaaa
!!hhhhhhhhhhh
!!gggggggggggggggggggggggggggggggggggg
!!bbbbbbb
- Place the cursor between the two exclamation marks ! of the first line
- Open the Column Editor ( ALT + C )
- Select the second option Number to insert
- Type 1 in the Initial number: and Increase by: zones
- Check the Leading zeros option
- Click on the OK button
!01!hhhhhhhhhhh
!02!fffffffffffffff
!03!bbbbbbb
!04!bbbbbbb
!05!jj
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!07!aaaaa
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!09!aaaaa
!10!ddd
!11!iiiiiiiiiiiiiiiii
!12!aaaaa
!13!hhhhhhhhhhh
!14!gggggggggggggggggggggggggggggggggggg
!15!bbbbbbb
- Use the regexes: SEARCH = ^(.+!)(.+) , REPLACE = \2\1
hhhhhhhhhhh!01!
fffffffffffffff!02!
bbbbbbb!03!
bbbbbbb!04!
jj!05!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
aaaaa!07!
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
aaaaa!09!
ddd!10!
iiiiiiiiiiiiiiiii!11!
aaaaa!12!
hhhhhhhhhhh!13!
gggggggggggggggggggggggggggggggggggg!14!
bbbbbbb!15!
- Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
aaaaa!07!
aaaaa!09!
aaaaa!12!
bbbbbbb!03!
bbbbbbb!04!
bbbbbbb!15!
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
ddd!10!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
fffffffffffffff!02!
gggggggggggggggggggggggggggggggggggg!14!
hhhhhhhhhhh!01!
hhhhhhhhhhh!13!
iiiiiiiiiiiiiiiii!11!
jj!05!
- For case A), use the regexes: SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY
ccccccccccccccccccccccccccccccccccccccccccccccc!08!
ddd!10!
eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
fffffffffffffff!02!
gggggggggggggggggggggggggggggggggggg!14!
iiiiiiiiiiiiiiiii!11!
jj!05!
- Use the regexes: SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!10!ddd
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!02!fffffffffffffff
!14!gggggggggggggggggggggggggggggggggggg
!11!iiiiiiiiiiiiiiiii
!05!jj
- Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!02!fffffffffffffff
!05!jj
!06!eeeeeeeeeeeeeeeeeeeeeeeeeee
!08!ccccccccccccccccccccccccccccccccccccccccccccccc
!10!ddd
!11!iiiiiiiiiiiiiiiii
!14!gggggggggggggggggggggggggggggggggggg
- Finally, use the regexes: SEARCH = ^.+! , REPLACE = EMPTY. We get the final text:
fffffffffffffff
jj
eeeeeeeeeeeeeeeeeeeeeeeeeee
ccccccccccccccccccccccccccccccccccccccccccccccc
ddd
iiiiiiiiiiiiiiiii
gggggggggggggggggggggggggggggggggggg
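A side note on why the Leading zeros option matters: the second sort is a lexical one, so it only restores the original order if all the numbers have the same width. The small Python sketch below (my own illustration, with the tags alone and no line text behind them) compares the two situations.

```python
# Tags for 15 lines, without and with leading zeros.
tags = [f"!{n}!" for n in range(1, 16)]
padded = [f"!{n:02d}!" for n in range(1, 16)]

# Lexical sort without padding: "!10!" comes right after "!1!".
print(sorted(tags)[:4])          # ['!1!', '!10!', '!11!', '!12!']

# With padding, lexical order and numeric order agree.
print(sorted(padded) == padded)  # True
```

So, with more than 99 lines, the numbers simply get three digits, which the Column Editor does on its own when Leading zeros is checked.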
- For case B), use the regexes: SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)
aaaaa!07!
aaaaa!09!
aaaaa!12!
bbbbbbb!03!
bbbbbbb!04!
bbbbbbb!15!
hhhhhhhhhhh!01!
hhhhhhhhhhh!13!
- Use the regexes: SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!07!aaaaa
!09!aaaaa
!12!aaaaa
!03!bbbbbbb
!04!bbbbbbb
!15!bbbbbbb
!01!hhhhhhhhhhh
!13!hhhhhhhhhhh
- Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!01!hhhhhhhhhhh
!03!bbbbbbb
!04!bbbbbbb
!07!aaaaa
!09!aaaaa
!12!aaaaa
!13!hhhhhhhhhhh
!15!bbbbbbb
- Finally, use the regexes: SEARCH = ^.+! , REPLACE = EMPTY. We get the final text:
hhhhhhhhhhh
bbbbbbb
bbbbbbb
aaaaa
aaaaa
aaaaa
hhhhhhhhhhh
bbbbbbb
- For case C), use the regexes: SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)
aaaaa!07!
bbbbbbb!03!
hhhhhhhhhhh!01!
- Use the regexes: SEARCH = ^(.+?)(!.+) , REPLACE = \2\1
!07!aaaaa
!03!bbbbbbb
!01!hhhhhhhhhhh
- Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending
!01!hhhhhhhhhhh
!03!bbbbbbb
!07!aaaaa
- Finally, use the regexes: SEARCH = ^.+! , REPLACE = EMPTY. We get the final text:
hhhhhhhhhhh
bbbbbbb
aaaaa
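The whole of METHOD 2 (tag, sort, filter, swap, sort, untag) can be summed up in a rough Python sketch. As before, these are assumptions of mine: shortened stand-ins for the sample lines, \n in place of \R, a small function in place of the conditional replacements, and the name method2 itself. The tag is written directly after the text, which merges the Column Editor step and the first swap.

```python
import re

# Shortened stand-ins for the 15 sample lines (same order and repetitions).
LINES = ["hh", "ff", "bb", "bb", "jj", "ee", "aa", "cc",
         "aa", "dd", "ii", "aa", "hh", "gg", "bb"]

def method2(lines, case):
    width = len(str(len(lines)))  # 15 lines -> 2 digits, like "Leading zeros"
    # Tag each line with its zero-padded position, placed after the text,
    # then do the first lexical sort.
    tagged = sorted(f"{line}!{n:0{width}d}!" for n, line in enumerate(lines, 1))
    text = "".join(t + "\n" for t in tagged)
    if case == "A":
        # Delete every run of lines sharing the same text before the tag.
        text = re.sub(r"(?m)^(.+!).+\n(?:\1.+\n)+", "", text)
    elif case == "B":
        # Keep whole runs of duplicates, drop unique lines.
        text = re.sub(r"(?m)^(.+!).+\n(?:(\1.+\n)+|(?!\1.+\n))",
                      lambda m: m.group(0) if m.group(2) else "", text)
    else:
        # Case C: keep the first line of each run, drop unique lines.
        text = re.sub(r"(?m)^((.+!).+\n)(?:(\2.+\n)+|(?!\2.+\n))",
                      lambda m: m.group(1) if m.group(3) else "", text)
    # Move the tag back to the front, sort by position, strip the tag.
    text = re.sub(r"(?m)^(.+?)(!.+)", r"\2\1", text)
    ordered = sorted(text.splitlines())
    return [re.sub(r"^.+!", "", line) for line in ordered]

for case in "ABC":
    print(case, method2(LINES, case))
```

This time, the remaining lines come out in their original order, which is the whole point of the extra steps.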
To end with, I also tried a normal case, with a file containing 1557 lines, 189 of which are unique. No problem!
Best Regards,
guy038
Pffff! About a complete day to get this post :-)) Really time to eat and rest a bit !
-
-
Hi Guy038 ,
" the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY "
I did that step, but nothing happens. Why is that?