
    i want to keep only unique lines

    • sophey hence

      Hi,
      I have an issue: I want to keep only the unique lines in Notepad++. I have read some previous posts, but they are not what I'm looking for.
      Example:
      aaa
      bbb
      ccc
      ddd
      aaa
      bbb

      I want it to become this:
      ccc
      ddd

      I want every line that has a duplicate to be removed entirely, so that only the unique lines are kept.

      Thanks

      • guy038

        Hi sophey hence and All,

        Indeed, sophey hence, you're raising a general problem ! How, from the contents of the current file, can we keep ONLY :

        • A) All the lines which are unique

        • B) All the duplicate lines

        • C) Only the first occurrence of each duplicated line

        UPDATE, on 11/19/16 :

        • D) All the lines which are unique AND the last occurrence of each duplicated line

        For this last case D), refer to that other post, below :

        https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7

        To get the lines for the remaining three cases A), B) or C), TWO methods are possible :

        • METHOD 1 needs only a lexical sort and an appropriate regex

        • METHOD 2 needs some secondary search/replace operations, the Column Editor, two lexical sorts and a main regex

        Of course, METHOD 1 is simpler. However, unlike METHOD 2, it does NOT keep the original order of the lines
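For readers who can run a script instead of working inside Notepad++, cases A), B) and C) can be sketched in Python by counting occurrences. This is only an illustration, not part of either method described in this post; the function names are hypothetical, and the sample is the six-line example from the question. Like METHOD 2, it keeps the original order of the lines.

```python
from collections import Counter


def keep_unique(lines):
    """Case A: keep only the lines that occur exactly once."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] == 1]


def keep_duplicates(lines):
    """Case B: keep every copy of the lines that occur more than once."""
    counts = Counter(lines)
    return [line for line in lines if counts[line] > 1]


def keep_first_duplicate(lines):
    """Case C: keep only the first occurrence of each duplicated line."""
    counts = Counter(lines)
    seen = set()
    kept = []
    for line in lines:
        if counts[line] > 1 and line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


# The example from the question.
sample = ["aaa", "bbb", "ccc", "ddd", "aaa", "bbb"]

print(keep_unique(sample))           # ['ccc', 'ddd']
print(keep_duplicates(sample))       # ['aaa', 'bbb', 'aaa', 'bbb']
print(keep_first_duplicate(sample))  # ['aaa', 'bbb']
```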


        Hypotheses :

        • I assume that your file contains no blank or empty lines. If it does, just use the regex : SEARCH = ^\h*\R , REPLACE = EMPTY , to get rid of all these useless lines

        • For METHOD 2, I need ONE temporary character that is NOT presently used in your file. I chose the exclamation mark ( ! ). Of course, any other symbol would do ! However, take care to escape this symbol if it's a meta character, with a special meaning inside a regex !

        • Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )

        • Use only the Replace All button, so that the cursor keeps its present location

        • I'll use the sample text below, containing 15 lines, among which 3 distinct lines occur more than once :

          hhhhhhhhhhh
          fffffffffffffff
          bbbbbbb
          bbbbbbb
          jj
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          aaaaa
          ccccccccccccccccccccccccccccccccccccccccccccccc
          aaaaa
          ddd
          iiiiiiiiiiiiiiiii
          aaaaa
          hhhhhhhhhhh
          gggggggggggggggggggggggggggggggggggg
          bbbbbbb
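The blank-line cleanup from the first assumption can be tried outside Notepad++ as well. The sketch below is an approximation in Python: the standard `re` module has no `\h` or `\R`, so `[ \t]` and `\r?\n` are used as stand-ins, which is an assumption that covers spaces, tabs and the usual Windows/Unix line breaks only. The variable names are hypothetical.

```python
import re

# Approximation of SEARCH = ^\h*\R : a line made only of spaces/tabs, plus its line break.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)

text = "aaa\n\n  \nbbb\n\t\nccc\n"

# REPLACE = EMPTY
cleaned = BLANK_LINE.sub("", text)

print(repr(cleaned))  # 'aaa\nbbb\nccc\n'
```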

        Well, let’s go !


        METHOD 1

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          aaaaa
          aaaaa
          aaaaa
          bbbbbbb
          bbbbbbb
          bbbbbbb
          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          fffffffffffffff
          gggggggggggggggggggggggggggggggggggg
          hhhhhhhhhhh
          hhhhhhhhhhh
          iiiiiiiiiiiiiiiii
          jj

        • For case A), use the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY. We get the final text :

          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          fffffffffffffff
          gggggggggggggggggggggggggggggggggggg
          iiiiiiiiiiiiiiiii
          jj

        • For case B), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0). We get the final text :

          aaaaa
          aaaaa
          aaaaa
          bbbbbbb
          bbbbbbb
          bbbbbbb
          hhhhhhhhhhh
          hhhhhhhhhhh

        • For case C), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1). We get the final text :

          aaaaa
          bbbbbbb
          hhhhhhhhhhh
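METHOD 1 can be reproduced in Python to see what each regex does. This is a sketch under a few assumptions: `\n` stands in for `\R`, a function replaces the conditional replacement syntax `(?2...)` (which Python's `re` does not have), and the sample is a shortened version of the 15-line text above, with two letters per line in the same order.

```python
import re

# Shortened stand-in for the 15-line sample (same order, two letters per line).
lines = ["hh", "ff", "bb", "bb", "jj", "ee", "aa", "cc",
         "aa", "dd", "ii", "aa", "hh", "gg", "bb"]

# Step 1: lexicographic sort. Every line, including the last one,
# ends with a line break so that the regexes can match it.
text = "".join(line + "\n" for line in sorted(lines))

# Case A: a line followed by one or more identical copies is deleted entirely.
case_a = re.sub(r"^(.+\n)\1+", "", text, flags=re.MULTILINE)

# Cases B and C share one search regex: group 2 is set only when copies follow.
SEARCH_BC = re.compile(r"^(.+\n)(?:(\1)+|(?!\1))", re.MULTILINE)

# Case B: keep the whole run of duplicates, delete unique lines.
case_b = SEARCH_BC.sub(lambda m: m.group(0) if m.group(2) else "", text)

# Case C: keep only the first line of each run of duplicates.
case_c = SEARCH_BC.sub(lambda m: m.group(1) if m.group(2) else "", text)

print(case_a.split())  # ['cc', 'dd', 'ee', 'ff', 'gg', 'ii', 'jj']
print(case_b.split())  # ['aa', 'aa', 'aa', 'bb', 'bb', 'bb', 'hh', 'hh']
print(case_c.split())  # ['aa', 'bb', 'hh']
```

The results match the three final texts shown above, in sorted order rather than in the original order.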


        METHOD 2

        • Use the regexes : SEARCH = ^ , REPLACE = !!

          !!hhhhhhhhhhh
          !!fffffffffffffff
          !!bbbbbbb
          !!bbbbbbb
          !!jj
          !!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !!aaaaa
          !!ccccccccccccccccccccccccccccccccccccccccccccccc
          !!aaaaa
          !!ddd
          !!iiiiiiiiiiiiiiiii
          !!aaaaa
          !!hhhhhhhhhhh
          !!gggggggggggggggggggggggggggggggggggg
          !!bbbbbbb

        • Place the cursor between the two exclamation marks of the first line

        • Open the Column Editor ( ALT + C )

        • Select the second option Number to insert

        • Type 1 in the Initial number : and Increase by : fields

        • Check the Leading zeros option

        • Click on the OK button

          !01!hhhhhhhhhhh
          !02!fffffffffffffff
          !03!bbbbbbb
          !04!bbbbbbb
          !05!jj
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !07!aaaaa
          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !09!aaaaa
          !10!ddd
          !11!iiiiiiiiiiiiiiiii
          !12!aaaaa
          !13!hhhhhhhhhhh
          !14!gggggggggggggggggggggggggggggggggggg
          !15!bbbbbbb

        • Use the regexes : SEARCH = ^(.+!)(.+) , REPLACE = \2\1

          hhhhhhhhhhh!01!
          fffffffffffffff!02!
          bbbbbbb!03!
          bbbbbbb!04!
          jj!05!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          aaaaa!07!
          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          aaaaa!09!
          ddd!10!
          iiiiiiiiiiiiiiiii!11!
          aaaaa!12!
          hhhhhhhhhhh!13!
          gggggggggggggggggggggggggggggggggggg!14!
          bbbbbbb!15!

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          aaaaa!07!
          aaaaa!09!
          aaaaa!12!
          bbbbbbb!03!
          bbbbbbb!04!
          bbbbbbb!15!
          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          ddd!10!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          fffffffffffffff!02!
          gggggggggggggggggggggggggggggggggggg!14!
          hhhhhhhhhhh!01!
          hhhhhhhhhhh!13!
          iiiiiiiiiiiiiiiii!11!
          jj!05!


        • For case A), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY

          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          ddd!10!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          fffffffffffffff!02!
          gggggggggggggggggggggggggggggggggggg!14!
          iiiiiiiiiiiiiiiii!11!
          jj!05!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !10!ddd
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !02!fffffffffffffff
          !14!gggggggggggggggggggggggggggggggggggg
          !11!iiiiiiiiiiiiiiiii
          !05!jj

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !02!fffffffffffffff
          !05!jj
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !10!ddd
          !11!iiiiiiiiiiiiiiiii
          !14!gggggggggggggggggggggggggggggggggggg

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          fffffffffffffff
          jj
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          iiiiiiiiiiiiiiiii
          gggggggggggggggggggggggggggggggggggg


        • For case B), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)

          aaaaa!07!
          aaaaa!09!
          aaaaa!12!
          bbbbbbb!03!
          bbbbbbb!04!
          bbbbbbb!15!
          hhhhhhhhhhh!01!
          hhhhhhhhhhh!13!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !07!aaaaa
          !09!aaaaa
          !12!aaaaa
          !03!bbbbbbb
          !04!bbbbbbb
          !15!bbbbbbb
          !01!hhhhhhhhhhh
          !13!hhhhhhhhhhh

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !01!hhhhhhhhhhh
          !03!bbbbbbb
          !04!bbbbbbb
          !07!aaaaa
          !09!aaaaa
          !12!aaaaa
          !13!hhhhhhhhhhh
          !15!bbbbbbb

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          hhhhhhhhhhh
          bbbbbbb
          bbbbbbb
          aaaaa
          aaaaa
          aaaaa
          hhhhhhhhhhh
          bbbbbbb


        • For case C), use the regexes : SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)

          aaaaa!07!
          bbbbbbb!03!
          hhhhhhhhhhh!01!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !07!aaaaa
          !03!bbbbbbb
          !01!hhhhhhhhhhh

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !01!hhhhhhhhhhh
          !03!bbbbbbb
          !07!aaaaa

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          hhhhhhhhhhh
          bbbbbbb
          aaaaa
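The whole of METHOD 2 (tag each line with its position, sort, filter, move the tag back to the front, sort again, strip the tag) can also be sketched in Python. The helper names `tag_and_sort` and `restore_order` are hypothetical, `\n` stands in for `\R`, and the sketch assumes, as in the post, that the exclamation mark does not occur in the lines. The sample is a shortened version of the 15-line text, two letters per line in the same order.

```python
import re

M = re.MULTILINE


def tag_and_sort(lines):
    """Append !NN! (zero-padded original position) to each line, then sort."""
    width = len(str(len(lines)))
    tagged = [f"{line}!{i:0{width}d}!" for i, line in enumerate(lines, start=1)]
    return "".join(t + "\n" for t in sorted(tagged))


def restore_order(text):
    """Move the tag to the front, sort on it, then strip it."""
    swapped = re.sub(r"^(.+?)(!.+)", r"\2\1", text, flags=M)
    ordered = sorted(swapped.splitlines())
    return [re.sub(r"^.+!", "", line) for line in ordered]


lines = ["hh", "ff", "bb", "bb", "jj", "ee", "aa", "cc",
         "aa", "dd", "ii", "aa", "hh", "gg", "bb"]
text = tag_and_sort(lines)

# Case A: delete every run of lines sharing the same text before the tag.
case_a = restore_order(re.sub(r"^(.+!).+\n(?:\1.+\n)+", "", text, flags=M))

# Case B: keep whole runs; group 2 is set only when copies follow.
case_b = restore_order(re.sub(
    r"^(.+!).+\n(?:(\1.+\n)+|(?!\1.+\n))",
    lambda m: m.group(0) if m.group(2) else "", text, flags=M))

# Case C: keep the first line of each run; group 3 is set only when copies follow.
case_c = restore_order(re.sub(
    r"^((.+!).+\n)(?:(\2.+\n)+|(?!\2.+\n))",
    lambda m: m.group(1) if m.group(3) else "", text, flags=M))

print(case_a)  # ['ff', 'jj', 'ee', 'cc', 'dd', 'ii', 'gg']
print(case_b)  # ['hh', 'bb', 'bb', 'aa', 'aa', 'aa', 'hh', 'bb']
print(case_c)  # ['hh', 'bb', 'aa']
```

The zero-padded tags are what make the second lexicographic sort restore the original order, which is why the Leading zeros option matters in the Column Editor step.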


        To end with, I also tried a real-world case, with a file containing 1557 lines, of which 189 are unique. No problem !

        Best Regards,

        guy038

        Pffff! About a complete day to get this post :-)) Really time to eat and rest a bit !

        • Hà Nguyễn

          Hi guy038,

          "the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY"

          I did that step, but nothing happens. Why is that?
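The thread does not say why nothing happened here. One thing worth checking, offered only as a possibility, is whether the lines were sorted first: this regex removes duplicates only when they sit on adjacent lines, which is what the sort step of METHOD 1 guarantees. A small Python sketch of that behaviour, with `\n` standing in for `\R` and the six-line example from the first post:

```python
import re

PATTERN = re.compile(r"^(.+\n)\1+", re.MULTILINE)

# The example from the first post, in its original (unsorted) order.
unsorted_text = "aaa\nbbb\nccc\nddd\naaa\nbbb\n"

# Same lines after a lexicographic sort: duplicates become adjacent.
sorted_text = "".join(sorted(unsorted_text.splitlines(keepends=True)))

# Without the sort, no line is directly followed by a copy of itself,
# so the regex finds nothing to replace.
print(PATTERN.sub("", unsorted_text) == unsorted_text)  # True

# After the sort, every duplicated line is removed.
print(repr(PATTERN.sub("", sorted_text)))  # 'ccc\nddd\n'
```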
