I want to keep only unique lines



  • Hi,
    I have an issue.
    I want to keep only the unique lines in Notepad++. I have read some previous posts, but they are not what I’m looking for.
    Example:
    aaa
    bbb
    ccc
    ddd
    aaa
    bbb

    I want it to become this:
    ccc
    ddd

    I want every copy of a duplicated line to be removed, so that only the unique lines remain.

    Thanks



  • Hi sophey hence and All,

    Indeed, sophey hence, you’re raising a general problem ! From the contents of the current file, how can we keep ONLY :

    • A) All the lines which are unique

    • B) All the duplicate lines

    • C) The first line of each set of duplicate lines

    UPDATE, on 11/19/16 :

    • D) All the lines which are unique AND the last line of each set of duplicate lines

    For this last case D), refer to that other post, below :

    https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7

    To get the lines of the remaining 3 cases A), B) OR C), TWO methods are possible :

    • METHOD 1 needs, only, a lexical sort and an appropriate regex

    • METHOD 2 needs some secondary search/replace ( S/R ) operations, the use of the Column Editor, two lexical sorts and a main appropriate regex

    Of course, METHOD 1 is simpler. However, unlike METHOD 2, it does NOT keep the original order of the lines


    Hypotheses :

    • I assume that your file contains no blank or empty lines. If it does, just use the regex : SEARCH = ^\h*\R , REPLACE = EMPTY , to get rid of all these useless lines

    • For METHOD 2, I need ONE temporary character, NOT currently used in your file. I chose the exclamation mark ( ! ). Of course, any other symbol would do ! However, take care to escape this symbol if it’s a meta-character, with a special meaning inside a regex !

    • Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )

    • Use only the Replace All button, so that the current cursor location is kept

    • I’ll use the sample text below, containing 15 lines, in which 3 different lines occur more than once :

      hhhhhhhhhhh
      fffffffffffffff
      bbbbbbb
      bbbbbbb
      jj
      eeeeeeeeeeeeeeeeeeeeeeeeeee
      aaaaa
      ccccccccccccccccccccccccccccccccccccccccccccccc
      aaaaa
      ddd
      iiiiiiiiiiiiiiiii
      aaaaa
      hhhhhhhhhhh
      gggggggggggggggggggggggggggggggggggg
      bbbbbbb
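
    As a rough sketch of the first hypothesis above, the same blank-line cleanup could be done outside Notepad++, in Python. Python’s re module has no \h or \R, so this version assumes that [ \t] and \r?\n are close enough substitutes (they cover spaces, tabs and Windows/Unix line breaks only). The function name drop_blank_lines is purely illustrative :

```python
import re

def drop_blank_lines(text: str) -> str:
    """Remove lines that are empty or contain only spaces/tabs."""
    # (?m) makes ^ match at the start of every line.
    # [ \t]* stands in for \h*, and \r?\n stands in for \R (assumption).
    return re.sub(r"(?m)^[ \t]*\r?\n", "", text)

if __name__ == "__main__":
    sample = "aaa\n\n  \t\nbbb\r\n\r\nccc\n"
    print(repr(drop_blank_lines(sample)))
```

    Note that a blank last line without a final line break is left alone, exactly as with the regex above.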

    Well, let’s go !


    METHOD 1

    • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

      aaaaa
      aaaaa
      aaaaa
      bbbbbbb
      bbbbbbb
      bbbbbbb
      ccccccccccccccccccccccccccccccccccccccccccccccc
      ddd
      eeeeeeeeeeeeeeeeeeeeeeeeeee
      fffffffffffffff
      gggggggggggggggggggggggggggggggggggg
      hhhhhhhhhhh
      hhhhhhhhhhh
      iiiiiiiiiiiiiiiii
      jj

    • For case A), use the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY. We get the final text :

      ccccccccccccccccccccccccccccccccccccccccccccccc
      ddd
      eeeeeeeeeeeeeeeeeeeeeeeeeee
      fffffffffffffff
      gggggggggggggggggggggggggggggggggggg
      iiiiiiiiiiiiiiiii
      jj

    • For case B), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0). We get the final text :

      aaaaa
      aaaaa
      aaaaa
      bbbbbbb
      bbbbbbb
      bbbbbbb
      hhhhhhhhhhh
      hhhhhhhhhhh

    • For case C), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1). We get the final text :

      aaaaa
      bbbbbbb
      hhhhhhhhhhh
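
    For readers who want to try METHOD 1 outside Notepad++, the sort-then-regex idea can be sketched in Python. This is an adaptation, not the regexes above verbatim : Python’s re module supports neither \R nor the (?2...) conditional replacement, so the sketch assumes \n line endings, requires every line (including the last one) to end with a line break, and uses a callback in place of the conditional replacement. The names RUN, case_a, case_b and case_c are made up for the illustration :

```python
import re

LINES = [
    "hhhhhhhhhhh", "fffffffffffffff", "bbbbbbb", "bbbbbbb", "jj",
    "eeeeeeeeeeeeeeeeeeeeeeeeeee", "aaaaa",
    "ccccccccccccccccccccccccccccccccccccccccccccccc", "aaaaa", "ddd",
    "iiiiiiiiiiiiiiiii", "aaaaa", "hhhhhhhhhhh",
    "gggggggggggggggggggggggggggggggggggg", "bbbbbbb",
]

# Step 1: lexical sort. Every line gets a trailing "\n", including the last.
sorted_text = "".join(line + "\n" for line in sorted(LINES))

# One full line (group 1), optionally followed by exact copies of it (group 2).
RUN = re.compile(r"^(.+\n)(\1+)?", re.MULTILINE)

def case_a(text: str) -> str:
    """Keep only the lines that occur once."""
    return RUN.sub(lambda m: "" if m.group(2) else m.group(0), text)

def case_b(text: str) -> str:
    """Keep every copy of the lines that occur more than once."""
    return RUN.sub(lambda m: m.group(0) if m.group(2) else "", text)

def case_c(text: str) -> str:
    """Keep the first copy of each line that occurs more than once."""
    return RUN.sub(lambda m: m.group(1) if m.group(2) else "", text)

if __name__ == "__main__":
    print(case_a(sorted_text), case_b(sorted_text), case_c(sorted_text), sep="---\n")
```

    As with METHOD 1 itself, the result comes out in sorted order, not in the original order of the lines.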


    METHOD 2

    • Use the regexes : SEARCH = ^ , REPLACE = !!

      !!hhhhhhhhhhh
      !!fffffffffffffff
      !!bbbbbbb
      !!bbbbbbb
      !!jj
      !!eeeeeeeeeeeeeeeeeeeeeeeeeee
      !!aaaaa
      !!ccccccccccccccccccccccccccccccccccccccccccccccc
      !!aaaaa
      !!ddd
      !!iiiiiiiiiiiiiiiii
      !!aaaaa
      !!hhhhhhhhhhh
      !!gggggggggggggggggggggggggggggggggggg
      !!bbbbbbb

    • Place the cursor between the two exclamation marks of the first line

    • Open the Column Editor ( ALT + C )

    • Select the second option Number to insert

    • Type 1 in both the Initial number : and Increase by : fields

    • Check the Leading zeros option

    • Click on the OK button

      !01!hhhhhhhhhhh
      !02!fffffffffffffff
      !03!bbbbbbb
      !04!bbbbbbb
      !05!jj
      !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
      !07!aaaaa
      !08!ccccccccccccccccccccccccccccccccccccccccccccccc
      !09!aaaaa
      !10!ddd
      !11!iiiiiiiiiiiiiiiii
      !12!aaaaa
      !13!hhhhhhhhhhh
      !14!gggggggggggggggggggggggggggggggggggg
      !15!bbbbbbb

    • Use the regexes : SEARCH = ^(.+!)(.+) , REPLACE = \2\1

      hhhhhhhhhhh!01!
      fffffffffffffff!02!
      bbbbbbb!03!
      bbbbbbb!04!
      jj!05!
      eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
      aaaaa!07!
      ccccccccccccccccccccccccccccccccccccccccccccccc!08!
      aaaaa!09!
      ddd!10!
      iiiiiiiiiiiiiiiii!11!
      aaaaa!12!
      hhhhhhhhhhh!13!
      gggggggggggggggggggggggggggggggggggg!14!
      bbbbbbb!15!

    • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

      aaaaa!07!
      aaaaa!09!
      aaaaa!12!
      bbbbbbb!03!
      bbbbbbb!04!
      bbbbbbb!15!
      ccccccccccccccccccccccccccccccccccccccccccccccc!08!
      ddd!10!
      eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
      fffffffffffffff!02!
      gggggggggggggggggggggggggggggggggggg!14!
      hhhhhhhhhhh!01!
      hhhhhhhhhhh!13!
      iiiiiiiiiiiiiiiii!11!
      jj!05!


    • For case A), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY

      ccccccccccccccccccccccccccccccccccccccccccccccc!08!
      ddd!10!
      eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
      fffffffffffffff!02!
      gggggggggggggggggggggggggggggggggggg!14!
      iiiiiiiiiiiiiiiii!11!
      jj!05!

    • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

      !08!ccccccccccccccccccccccccccccccccccccccccccccccc
      !10!ddd
      !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
      !02!fffffffffffffff
      !14!gggggggggggggggggggggggggggggggggggg
      !11!iiiiiiiiiiiiiiiii
      !05!jj

    • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

      !02!fffffffffffffff
      !05!jj
      !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
      !08!ccccccccccccccccccccccccccccccccccccccccccccccc
      !10!ddd
      !11!iiiiiiiiiiiiiiiii
      !14!gggggggggggggggggggggggggggggggggggg

    • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

      fffffffffffffff
      jj
      eeeeeeeeeeeeeeeeeeeeeeeeeee
      ccccccccccccccccccccccccccccccccccccccccccccccc
      ddd
      iiiiiiiiiiiiiiiii
      gggggggggggggggggggggggggggggggggggg


    • For case B), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)

      aaaaa!07!
      aaaaa!09!
      aaaaa!12!
      bbbbbbb!03!
      bbbbbbb!04!
      bbbbbbb!15!
      hhhhhhhhhhh!01!
      hhhhhhhhhhh!13!

    • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

      !07!aaaaa
      !09!aaaaa
      !12!aaaaa
      !03!bbbbbbb
      !04!bbbbbbb
      !15!bbbbbbb
      !01!hhhhhhhhhhh
      !13!hhhhhhhhhhh

    • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

      !01!hhhhhhhhhhh
      !03!bbbbbbb
      !04!bbbbbbb
      !07!aaaaa
      !09!aaaaa
      !12!aaaaa
      !13!hhhhhhhhhhh
      !15!bbbbbbb

    • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

      hhhhhhhhhhh
      bbbbbbb
      bbbbbbb
      aaaaa
      aaaaa
      aaaaa
      hhhhhhhhhhh
      bbbbbbb


    • For case C), use the regexes : SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)

      aaaaa!07!
      bbbbbbb!03!
      hhhhhhhhhhh!01!

    • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

      !07!aaaaa
      !03!bbbbbbb
      !01!hhhhhhhhhhh

    • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

      !01!hhhhhhhhhhh
      !03!bbbbbbb
      !07!aaaaa

    • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

      hhhhhhhhhhh
      bbbbbbb
      aaaaa
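
    METHOD 2 is, in essence, a decorate / sort / undecorate pattern : tag each line with its position, sort by content so that identical lines become neighbours, filter, then sort by the tag to restore the original order and drop the tag. A minimal Python sketch of that idea, assuming the lines are already held in a list (the function name keep and its case argument are hypothetical, and tuples replace the !NN! markers) :

```python
from itertools import groupby

LINES = [
    "hhhhhhhhhhh", "fffffffffffffff", "bbbbbbb", "bbbbbbb", "jj",
    "eeeeeeeeeeeeeeeeeeeeeeeeeee", "aaaaa",
    "ccccccccccccccccccccccccccccccccccccccccccccccc", "aaaaa", "ddd",
    "iiiiiiiiiiiiiiiii", "aaaaa", "hhhhhhhhhhh",
    "gggggggggggggggggggggggggggggggggggg", "bbbbbbb",
]

def keep(lines, case):
    """Return the lines for case "A", "B" or "C", in their original order."""
    # Decorate: pair each line with its original position (the !NN! tag).
    tagged = [(text, pos) for pos, text in enumerate(lines)]
    # First sort: by content, then by position within identical lines.
    tagged.sort()
    kept = []
    for _, group in groupby(tagged, key=lambda t: t[0]):
        group = list(group)
        if case == "A" and len(group) == 1:
            kept.extend(group)        # unique line
        elif case == "B" and len(group) > 1:
            kept.extend(group)        # every copy of a duplicated line
        elif case == "C" and len(group) > 1:
            kept.append(group[0])     # lowest position = first occurrence
    # Second sort: by position, which restores the original order.
    kept.sort(key=lambda t: t[1])
    # Undecorate: drop the position tag.
    return [text for text, _ in kept]

if __name__ == "__main__":
    for case in "ABC":
        print(case, keep(LINES, case))
```

    Because the tag is a number and not text, this sketch needs no temporary character and no leading zeros.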


    To end with, I also tried a normal case, with a file containing 1557 lines, 189 of which are unique. No problem !

    Best Regards,

    guy038

    Pffff! About a complete day to get this post :-)) Really time to eat and rest a bit !



  • Hi guy038,

    " the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY "

    I did that step, but nothing happened. Why is that?

