
I want to keep only unique lines

Help wanted · Tags: duplicates, unique lines
    sophey hence
    last edited by Oct 11, 2016, 8:52 AM

    Hi,
    I have an issue: I want to keep only the unique lines in Notepad++. I have read some previous posts, but they are not what I’m looking for.
    Example:
    aaa
    bbb
    ccc
    ddd
    aaa
    bbb

    I want it to become:
    ccc
    ddd

    I want both copies of each duplicate removed, keeping only the unique lines.

    Thanks

      guy038
      last edited by guy038 Nov 19, 2016, 8:57 AM (first posted Oct 11, 2016, 9:58 PM)

      Hi sophey hence and All,

      Indeed, sophey hence, you’re raising a general problem! How can we keep, from the contents of the current file, ONLY:

      • A) All the lines which are unique

      • B) All the duplicate lines

      • C) The first occurrence of each duplicate line

      UPDATE, on 11/19/16 :

      • D) All the unique lines AND the last occurrence of each duplicate line

      For this last case D), refer to this other post:

      https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7
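      To pin down what each case keeps, here is a minimal Python sketch that states cases A to D as plain counting rules, outside Notepad++. It assumes "first" and "last" refer to the original line order; the function name select_lines is hypothetical, and case D is included only as a definition (its regex lives in the linked post).

```python
from collections import Counter

def select_lines(lines: list[str]) -> dict[str, list[str]]:
    """Cases A-D as counting rules, keeping the original line order."""
    counts = Counter(lines)
    # Later positions overwrite earlier ones, so this maps each line to its LAST index.
    last_pos = {line: i for i, line in enumerate(lines)}
    seen: set[str] = set()
    first_dups: list[str] = []
    for line in lines:
        if counts[line] > 1 and line not in seen:
            first_dups.append(line)  # first occurrence of a duplicated line
        seen.add(line)
    return {
        "A": [l for l in lines if counts[l] == 1],                  # unique lines only
        "B": [l for l in lines if counts[l] > 1],                   # every copy of each duplicate
        "C": first_dups,                                            # first copy of each duplicate
        "D": [l for i, l in enumerate(lines) if last_pos[l] == i],  # uniques + last copy of each duplicate
    }

print(select_lines(["aaa", "bbb", "ccc", "ddd", "aaa", "bbb"])["A"])  # ['ccc', 'ddd']
```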

      To get the lines of the remaining three cases A), B) OR C), TWO methods are possible:

      • METHOD 1 needs only a lexical sort and an appropriate regex

      • METHOD 2 needs some secondary searches/replacements (S/R), the Column Editor, two lexical sorts and a main regex

      Of course, METHOD 1 is simpler. However, unlike METHOD 2, it does NOT keep the original order of the lines.


      Assumptions:

      • I assume that your file contains no empty or blank lines. If it does, just use the regex SEARCH = ^\h*\R , REPLACE = EMPTY to get rid of all these useless lines

      • For METHOD 2, I need ONE temporary character that is NOT currently used anywhere in your file. I chose the exclamation mark ( ! ). Of course, any other symbol would do! However, take care to escape that symbol if it is a regex metacharacter, with a special meaning inside a regex!

      • Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )

      • Select the Regular expression search mode, and use only the Replace All button, which keeps the present cursor location

      • I’ll use the sample text below, which contains 15 lines, among which 3 different lines occur more than once:

        hhhhhhhhhhh
        fffffffffffffff
        bbbbbbb
        bbbbbbb
        jj
        eeeeeeeeeeeeeeeeeeeeeeeeeee
        aaaaa
        ccccccccccccccccccccccccccccccccccccccccccccccc
        aaaaa
        ddd
        iiiiiiiiiiiiiiiii
        aaaaa
        hhhhhhhhhhh
        gggggggggggggggggggggggggggggggggggg
        bbbbbbb
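      For readers checking the blank-line clean-up outside Notepad++, this is a rough Python equivalent of SEARCH = ^\h*\R , REPLACE = EMPTY. It is an approximation: Python's re module has no \h or \R, so [ \t] stands in for horizontal whitespace (Boost's \h also covers other Unicode horizontal spaces) and (?:\r\n|\n) for the line break, which assumes \n or \r\n line endings.

```python
import re

# ^ with re.MULTILINE matches at the start of every line (after a "\n").
# [ \t]* : optional spaces/tabs only, so the match can never run into the next line.
BLANK_LINE = re.compile(r"^[ \t]*(?:\r\n|\n)", re.MULTILINE)

def remove_blank_lines(text: str) -> str:
    """Delete empty lines and lines made only of spaces/tabs."""
    return BLANK_LINE.sub("", text)

print(repr(remove_blank_lines("aaa\n\n  \t\nbbb\r\n\r\nccc\n")))  # 'aaa\nbbb\r\nccc\n'
```

      As with the Notepad++ regex, a blank last line that has no line break after it is not removed, because the pattern requires the break.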

      Well, let’s go!


      METHOD 1

      • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

        aaaaa
        aaaaa
        aaaaa
        bbbbbbb
        bbbbbbb
        bbbbbbb
        ccccccccccccccccccccccccccccccccccccccccccccccc
        ddd
        eeeeeeeeeeeeeeeeeeeeeeeeeee
        fffffffffffffff
        gggggggggggggggggggggggggggggggggggg
        hhhhhhhhhhh
        hhhhhhhhhhh
        iiiiiiiiiiiiiiiii
        jj

      • For case A), use the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY. We get the final text :

        ccccccccccccccccccccccccccccccccccccccccccccccc
        ddd
        eeeeeeeeeeeeeeeeeeeeeeeeeee
        fffffffffffffff
        gggggggggggggggggggggggggggggggggggg
        iiiiiiiiiiiiiiiii
        jj

      • For case B), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0). We get the final text :

        aaaaa
        aaaaa
        aaaaa
        bbbbbbb
        bbbbbbb
        bbbbbbb
        hhhhhhhhhhh
        hhhhhhhhhhh

      • For case C), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1). We get the final text :

        aaaaa
        bbbbbbb
        hhhhhhhhhhh
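      The METHOD 1 regexes can be cross-checked with Python's re module. This is a translation sketch under stated assumptions: re has no \R, so "\n" is used and every line (the last one included) must end with a line break; without re.DOTALL the dot never matches "\n", which is what (?-s) guarantees; and re has no conditional replacement such as (?2$0), so a small function does that job. The names sort_lines and method1 are hypothetical. Note that \1+ only matches copies that immediately follow, which is why the sort must come first.

```python
import re

CASE_A = re.compile(r"^(.+\n)\1+", re.MULTILINE)
CASE_BC = re.compile(r"^(.+\n)(?:(\1)+|(?!\1))", re.MULTILINE)

def sort_lines(text: str) -> str:
    """Stand-in for Sort Lines Lexicographically Ascending (code-point order).
    The exact order may differ from Notepad++; only adjacency of equal lines matters."""
    return "".join(line + "\n" for line in sorted(text.splitlines()))

def method1(text: str, which: str) -> str:
    sorted_text = sort_lines(text)
    if which == "A":
        return CASE_A.sub("", sorted_text)  # drop each run of identical lines entirely
    # Group 2 is set only when the line is followed by at least one copy of itself.
    # Like (?2$0) case B keeps the whole run (group 0); like (?2\1) case C keeps one copy (group 1).
    keep = 0 if which == "B" else 1
    return CASE_BC.sub(lambda m: m.group(keep) if m.group(2) is not None else "", sorted_text)

print(method1("aaa\nbbb\nccc\nddd\naaa\nbbb\n", "A"), end="")  # prints ccc and ddd, one per line
```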


      METHOD 2

      • Use the regexes : SEARCH = ^ , REPLACE = !!

        !!hhhhhhhhhhh
        !!fffffffffffffff
        !!bbbbbbb
        !!bbbbbbb
        !!jj
        !!eeeeeeeeeeeeeeeeeeeeeeeeeee
        !!aaaaa
        !!ccccccccccccccccccccccccccccccccccccccccccccccc
        !!aaaaa
        !!ddd
        !!iiiiiiiiiiiiiiiii
        !!aaaaa
        !!hhhhhhhhhhh
        !!gggggggggggggggggggggggggggggggggggg
        !!bbbbbbb

      • Place the cursor on the first line, between the two exclamation marks

      • Open the Column Editor ( ALT + C )

      • Select the second option Number to insert

      • Type 1 in both the Initial number : and Increase by : fields

      • Check the Leading zeros option

      • Click on the OK button

        !01!hhhhhhhhhhh
        !02!fffffffffffffff
        !03!bbbbbbb
        !04!bbbbbbb
        !05!jj
        !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
        !07!aaaaa
        !08!ccccccccccccccccccccccccccccccccccccccccccccccc
        !09!aaaaa
        !10!ddd
        !11!iiiiiiiiiiiiiiiii
        !12!aaaaa
        !13!hhhhhhhhhhh
        !14!gggggggggggggggggggggggggggggggggggg
        !15!bbbbbbb

      • Use the regexes : SEARCH = ^(.+!)(.+) , REPLACE = \2\1

        hhhhhhhhhhh!01!
        fffffffffffffff!02!
        bbbbbbb!03!
        bbbbbbb!04!
        jj!05!
        eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
        aaaaa!07!
        ccccccccccccccccccccccccccccccccccccccccccccccc!08!
        aaaaa!09!
        ddd!10!
        iiiiiiiiiiiiiiiii!11!
        aaaaa!12!
        hhhhhhhhhhh!13!
        gggggggggggggggggggggggggggggggggggg!14!
        bbbbbbb!15!

      • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

        aaaaa!07!
        aaaaa!09!
        aaaaa!12!
        bbbbbbb!03!
        bbbbbbb!04!
        bbbbbbb!15!
        ccccccccccccccccccccccccccccccccccccccccccccccc!08!
        ddd!10!
        eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
        fffffffffffffff!02!
        gggggggggggggggggggggggggggggggggggg!14!
        hhhhhhhhhhh!01!
        hhhhhhhhhhh!13!
        iiiiiiiiiiiiiiiii!11!
        jj!05!
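      The preparation steps of METHOD 2 can be simulated in Python to see why they work. In this sketch (hypothetical names, and each sample line shortened to one letter since only line identity matters), the numbering is zero-padded like the Leading zeros option, so lexical order equals numeric order; after the sort, identical texts are adjacent and each run keeps its original order. It assumes, as the post does, that no line already contains "!".

```python
import re

# The post's 15-line sample, each line shortened to a single letter.
SAMPLE = "h f b b j e a c a d i a h g b".split()

def tag_and_sort(lines: list[str]) -> list[str]:
    width = len(str(len(lines)))  # 15 lines -> 2 digits, like "Leading zeros"
    # SEARCH = ^ , REPLACE = !! followed by the Column Editor numbering, in one step.
    tagged = [f"!{n:0{width}d}!{line}" for n, line in enumerate(lines, start=1)]
    # SEARCH = ^(.+!)(.+) , REPLACE = \2\1 : the greedy (.+!) ends at the LAST "!",
    # so the "!NN!" prefix moves to the end of the line.
    moved = [re.sub(r"^(.+!)(.+)", r"\2\1", line) for line in tagged]
    # Lexicographic sort (code-point order here): equal texts become adjacent,
    # and within a run the zero-padded numbers stay ascending.
    return sorted(moved)

print(tag_and_sort(SAMPLE)[:3])  # ['a!07!', 'a!09!', 'a!12!']
```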


      • For case A), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY

        ccccccccccccccccccccccccccccccccccccccccccccccc!08!
        ddd!10!
        eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
        fffffffffffffff!02!
        gggggggggggggggggggggggggggggggggggg!14!
        iiiiiiiiiiiiiiiii!11!
        jj!05!

      • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

        !08!ccccccccccccccccccccccccccccccccccccccccccccccc
        !10!ddd
        !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
        !02!fffffffffffffff
        !14!gggggggggggggggggggggggggggggggggggg
        !11!iiiiiiiiiiiiiiiii
        !05!jj

      • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

        !02!fffffffffffffff
        !05!jj
        !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
        !08!ccccccccccccccccccccccccccccccccccccccccccccccc
        !10!ddd
        !11!iiiiiiiiiiiiiiiii
        !14!gggggggggggggggggggggggggggggggggggg

      • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

        fffffffffffffff
        jj
        eeeeeeeeeeeeeeeeeeeeeeeeeee
        ccccccccccccccccccccccccccccccccccccccccccccccc
        ddd
        iiiiiiiiiiiiiiiii
        gggggggggggggggggggggggggggggggggggg
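      Here is a Python sketch of the case A steps above, from the sorted tagged text to the final result. It uses the same assumptions as before (one-letter stand-ins for the sample lines, "\n" for \R, no "!" inside the lines); the helper names are hypothetical. Group 1 is the text plus its first "!", so a whole run of lines with the same text is removed whatever their numbers.

```python
import re

# Sorted, tagged sample (the listing after the first sort), one letter per sample line.
TAGGED = ["a!07!", "a!09!", "a!12!", "b!03!", "b!04!", "b!15!", "c!08!", "d!10!",
          "e!06!", "f!02!", "g!14!", "h!01!", "h!13!", "i!11!", "j!05!"]

# SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY
CASE_A = re.compile(r"^(.+!).+\n(?:\1.+\n)+", re.MULTILINE)

def restore_order(lines: list[str]) -> list[str]:
    # ^(.+?)(!.+) -> \2\1 : the lazy (.+?) stops at the FIRST "!", moving "!NN!" back in front.
    front = [re.sub(r"^(.+?)(!.+)", r"\2\1", line) for line in lines]
    # Sorting on the zero-padded numbers restores the original order,
    # then ^.+! -> EMPTY strips the "!NN!" prefix (greedy .+ reaches the last "!").
    return [re.sub(r"^.+!", "", line) for line in sorted(front)]

def method2_case_a(tagged: list[str]) -> list[str]:
    text = "".join(line + "\n" for line in tagged)
    return restore_order(CASE_A.sub("", text).splitlines())

print(method2_case_a(TAGGED))  # ['f', 'j', 'e', 'c', 'd', 'i', 'g']
```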


      • For case B), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)

        aaaaa!07!
        aaaaa!09!
        aaaaa!12!
        bbbbbbb!03!
        bbbbbbb!04!
        bbbbbbb!15!
        hhhhhhhhhhh!01!
        hhhhhhhhhhh!13!

      • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

        !07!aaaaa
        !09!aaaaa
        !12!aaaaa
        !03!bbbbbbb
        !04!bbbbbbb
        !15!bbbbbbb
        !01!hhhhhhhhhhh
        !13!hhhhhhhhhhh

      • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

        !01!hhhhhhhhhhh
        !03!bbbbbbb
        !04!bbbbbbb
        !07!aaaaa
        !09!aaaaa
        !12!aaaaa
        !13!hhhhhhhhhhh
        !15!bbbbbbb

      • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

        hhhhhhhhhhh
        bbbbbbb
        bbbbbbb
        aaaaa
        aaaaa
        aaaaa
        hhhhhhhhhhh
        bbbbbbb
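      The case B regex of METHOD 2 translates to Python in the same way, again as a sketch with one-letter stand-ins and "\n" for \R. Since Python's re has no conditional replacement, the lambda plays the role of (?2$0): keep the whole match when group 2 (the following copies) took part in it, otherwise replace the unique line with nothing.

```python
import re

# Sorted, tagged sample (the listing after the first sort), one letter per sample line.
TAGGED = ["a!07!", "a!09!", "a!12!", "b!03!", "b!04!", "b!15!", "c!08!", "d!10!",
          "e!06!", "f!02!", "g!14!", "h!01!", "h!13!", "i!11!", "j!05!"]

# SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)
CASE_B = re.compile(r"^(.+!).+\n(?:(\1.+\n)+|(?!\1.+\n))", re.MULTILINE)

def method2_case_b(tagged: list[str]) -> list[str]:
    text = "".join(line + "\n" for line in tagged)
    return CASE_B.sub(lambda m: m.group(0) if m.group(2) is not None else "", text).splitlines()

print(method2_case_b(TAGGED))
# ['a!07!', 'a!09!', 'a!12!', 'b!03!', 'b!04!', 'b!15!', 'h!01!', 'h!13!']
```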


      • For case C), use the regexes : SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)

        aaaaa!07!
        bbbbbbb!03!
        hhhhhhhhhhh!01!

      • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

        !07!aaaaa
        !03!bbbbbbb
        !01!hhhhhhhhhhh

      • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

        !01!hhhhhhhhhhh
        !03!bbbbbbb
        !07!aaaaa

      • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

        hhhhhhhhhhh
        bbbbbbb
        aaaaa
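      Case C is where the zero padding really matters, as this Python sketch of its regex suggests (same assumptions: one-letter stand-ins, "\n" for \R, a lambda instead of the conditional (?3\1)). The kept line is the first one of each run, and that is the first occurrence only because the padded numbers sort numerically.

```python
import re

# Sorted, tagged sample (the listing after the first sort), one letter per sample line.
TAGGED = ["a!07!", "a!09!", "a!12!", "b!03!", "b!04!", "b!15!", "c!08!", "d!10!",
          "e!06!", "f!02!", "g!14!", "h!01!", "h!13!", "i!11!", "j!05!"]

# SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)
# Group 1 = first line of a run, group 2 = its text plus "!", group 3 = the following copies.
CASE_C = re.compile(r"^((.+!).+\n)(?:(\2.+\n)+|(?!\2.+\n))", re.MULTILINE)

def method2_case_c(tagged: list[str]) -> list[str]:
    text = "".join(line + "\n" for line in tagged)
    return CASE_C.sub(lambda m: m.group(1) if m.group(3) is not None else "", text).splitlines()

print(method2_case_c(TAGGED))  # ['a!07!', 'b!03!', 'h!01!']
```

      If the numbers had been written without zero padding (a!7!, a!9!, a!12!), a lexical sort would put a!12! first, and this step would keep the last copy of the line instead of the first one.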


      Finally, I also tested a real-world case: a file containing 1557 lines, 189 of which are unique. No problem!

      Best Regards,

      guy038

      Pffff! It took me about a whole day to write this post :-)) Really time to eat and rest a bit!

        Hà Nguyễn
        last edited by Feb 1, 2017, 4:56 PM

        Hi guy038,

        " the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY "

        I did that step, but nothing happened. Why is that?

        The Community of users of the Notepad++ text editor.