    i want to keep only unique lines

    • sophey hence

      Hi,
      I have an issue: I want to keep only the unique lines in Notepad++. I have read some previous posts, but they are not what I'm looking for.
      example :
      aaa
      bbb
      ccc
      ddd
      aaa
      bbb

      I want it to become like this :
      ccc
      ddd

      I want every copy of a duplicated line to be removed, keeping only the unique lines.

      thanks

      • guy038

        Hi sophey hence and All,

        Indeed, sophey hence, you’re raising a general problem ! How, from the contents of the current file, can we keep ONLY :

        • A) All the lines which are unique

        • B) All the duplicate lines

        • C) The first line of each group of duplicate lines

        UPDATE, on 11/19/16 :

        • D) All the lines which are unique AND the last line of each group of duplicate lines

        For this last case D), refer to that other post, below :

        https://notepad-plus-plus.org/community/topic/12569/delete-duplicate-lines/7
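
To make cases A), B) and C) concrete outside of Notepad++, here is a minimal Python sketch, not part of the original post. The function name `split_cases` is hypothetical, and the sketch keeps the original line order. Case D) is left to the linked post.

```python
from collections import Counter

def split_cases(lines):
    """Return the lines kept in cases A, B and C, in original order."""
    counts = Counter(lines)
    unique_only = [ln for ln in lines if counts[ln] == 1]      # case A
    all_duplicates = [ln for ln in lines if counts[ln] > 1]    # case B
    seen = set()
    first_of_each = []                                         # case C
    for ln in lines:
        if counts[ln] > 1 and ln not in seen:
            seen.add(ln)
            first_of_each.append(ln)
    return unique_only, all_duplicates, first_of_each

# The example from the question above
a, b, c = split_cases(["aaa", "bbb", "ccc", "ddd", "aaa", "bbb"])
print(a)  # ['ccc', 'ddd']
print(b)  # ['aaa', 'bbb', 'aaa', 'bbb']
print(c)  # ['aaa', 'bbb']
```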

        To get the lines for the three remaining cases A), B) or C), TWO methods are possible :

        • METHOD 1 needs, only, a lexical sort and an appropriate regex

        • METHOD 2 needs some secondary search/replace operations (S/R), the use of the Column Editor, two lexical sorts and a main appropriate regex

        Of course, METHOD 1 is simpler. However, contrary to METHOD 2, it does NOT keep the original order of the lines


        Hypotheses :

        • I assume that no blank or empty line exists in your file. If that is NOT the case, just use the regex : SEARCH = ^\h*\R , REPLACE = EMPTY , to get rid of all these useless lines

        • For METHOD 2, I need ONE temporary character, NOT presently used in your file. I chose the exclamation mark ( ! ). Of course, any other symbol would do ! However, take care to escape this symbol if it’s a metacharacter, with special meaning, inside a regex !

        • Before performing any replacement, remember to go back to the very beginning of your file ( CTRL + Home )

        • Use only the Replace All button, which leaves the cursor at its present location

        • I’ll use the sample text below, containing 15 lines, among which 3 distinct lines occur several times :

          hhhhhhhhhhh
          fffffffffffffff
          bbbbbbb
          bbbbbbb
          jj
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          aaaaa
          ccccccccccccccccccccccccccccccccccccccccccccccc
          aaaaa
          ddd
          iiiiiiiiiiiiiiiii
          aaaaa
          hhhhhhhhhhh
          gggggggggggggggggggggggggggggggggggg
          bbbbbbb
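
As a side note, the blank-line cleanup from the first hypothesis can be approximated in Python. This is only a sketch: Python's `re` module has no `\h` or `\R`, so `[ \t]*` and `\r?\n` stand in for them here, and the function name `drop_blank_lines` is hypothetical.

```python
import re

def drop_blank_lines(text):
    # Rough equivalent of SEARCH = ^\h*\R , REPLACE = EMPTY.
    # (?m) lets ^ match at the start of every line.
    return re.sub(r"(?m)^[ \t]*\r?\n", "", text)

sample = "aaa\n\n   \nbbb\r\n\t\r\nccc\n"
print(repr(drop_blank_lines(sample)))  # 'aaa\nbbb\r\nccc\n'
```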

        Well, let’s go !


        METHOD 1

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          aaaaa
          aaaaa
          aaaaa
          bbbbbbb
          bbbbbbb
          bbbbbbb
          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          fffffffffffffff
          gggggggggggggggggggggggggggggggggggg
          hhhhhhhhhhh
          hhhhhhhhhhh
          iiiiiiiiiiiiiiiii
          jj

        • For case A), use the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY. We get the final text :

          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          fffffffffffffff
          gggggggggggggggggggggggggggggggggggg
          iiiiiiiiiiiiiiiii
          jj

        • For case B), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2$0). We get the final text :

          aaaaa
          aaaaa
          aaaaa
          bbbbbbb
          bbbbbbb
          bbbbbbb
          hhhhhhhhhhh
          hhhhhhhhhhh

        • For case C), use the regexes : SEARCH = (?-s)^(.+\R)(?:(\1)+|(?!\1)) , REPLACE = (?2\1). We get the final text :

          aaaaa
          bbbbbbb
          hhhhhhhhhhh
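
For readers who want to check the three METHOD 1 regexes outside Notepad++, here is a rough Python sketch. Several things are assumptions of the sketch, not of the post: the sample is shortened to single letters with the same duplicate pattern, `\n` replaces `\R`, every line ends with a line break, and a replacement function stands in for the conditional replacements `(?2$0)` and `(?2\1)`, which Python's `re` does not support.

```python
import re

# Same duplicate pattern as the 15-line sample, one letter per line
lines = list("hfbbjeacadiahgb")
sorted_text = "".join(ln + "\n" for ln in sorted(lines))

# Case A: delete every run of two or more identical consecutive lines
case_a = re.sub(r"(?m)^(.+\n)\1+", "", sorted_text)

# Cases B and C share one search; group 2 is set only when a repeat follows
pattern = re.compile(r"(?m)^(.+\n)(?:(\1)+|(?!\1))")
case_b = pattern.sub(lambda m: m.group(0) if m.group(2) else "", sorted_text)
case_c = pattern.sub(lambda m: m.group(1) if m.group(2) else "", sorted_text)

print(case_a.split())  # ['c', 'd', 'e', 'f', 'g', 'i', 'j']
print(case_b.split())  # ['a', 'a', 'a', 'b', 'b', 'b', 'h', 'h']
print(case_c.split())  # ['a', 'b', 'h']
```

Note that the search relies on the final line ending with a line break; a duplicate sitting on a last line without one would not be matched by `(.+\n)`.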


        METHOD 2

        • Use the regexes : SEARCH = ^ , REPLACE = !!

          !!hhhhhhhhhhh
          !!fffffffffffffff
          !!bbbbbbb
          !!bbbbbbb
          !!jj
          !!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !!aaaaa
          !!ccccccccccccccccccccccccccccccccccccccccccccccc
          !!aaaaa
          !!ddd
          !!iiiiiiiiiiiiiiiii
          !!aaaaa
          !!hhhhhhhhhhh
          !!gggggggggggggggggggggggggggggggggggg
          !!bbbbbbb

        • Place the cursor between the two exclamation marks of the first line

        • Open the Column Editor ( ALT + C )

        • Select the second option Number to insert

        • Type 1 in the Initial number : and Increase by : fields

        • Check the Leading zeros option

        • Click on the OK button

          !01!hhhhhhhhhhh
          !02!fffffffffffffff
          !03!bbbbbbb
          !04!bbbbbbb
          !05!jj
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !07!aaaaa
          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !09!aaaaa
          !10!ddd
          !11!iiiiiiiiiiiiiiiii
          !12!aaaaa
          !13!hhhhhhhhhhh
          !14!gggggggggggggggggggggggggggggggggggg
          !15!bbbbbbb

        • Use the regexes : SEARCH = ^(.+!)(.+) , REPLACE = \2\1

          hhhhhhhhhhh!01!
          fffffffffffffff!02!
          bbbbbbb!03!
          bbbbbbb!04!
          jj!05!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          aaaaa!07!
          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          aaaaa!09!
          ddd!10!
          iiiiiiiiiiiiiiiii!11!
          aaaaa!12!
          hhhhhhhhhhh!13!
          gggggggggggggggggggggggggggggggggggg!14!
          bbbbbbb!15!

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          aaaaa!07!
          aaaaa!09!
          aaaaa!12!
          bbbbbbb!03!
          bbbbbbb!04!
          bbbbbbb!15!
          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          ddd!10!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          fffffffffffffff!02!
          gggggggggggggggggggggggggggggggggggg!14!
          hhhhhhhhhhh!01!
          hhhhhhhhhhh!13!
          iiiiiiiiiiiiiiiii!11!
          jj!05!
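
The steps so far amount to tagging each line with its original position, then sorting on the content. A minimal Python sketch of that idea follows, again with the sample shortened to single letters (an assumption of the sketch). It builds the `content!NN!` form directly instead of going through the `!!` insertion, the Column Editor and the swap.

```python
lines = list("hfbbjeacadiahgb")

# Zero-padded width, like the Leading zeros option (2 digits for 15 lines)
width = len(str(len(lines)))

# Append the 1-based original position, wrapped in the temporary "!" character
tagged = [f"{ln}!{i:0{width}d}!" for i, ln in enumerate(lines, start=1)]
tagged.sort()

print(tagged[:6])  # ['a!07!', 'a!09!', 'a!12!', 'b!03!', 'b!04!', 'b!15!']
```

The zero padding matters: without it, a plain lexical sort would place position 10 before position 2 when the tags are later used to restore the original order.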


        • For case A), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:\1.+\R)+ , REPLACE = EMPTY

          ccccccccccccccccccccccccccccccccccccccccccccccc!08!
          ddd!10!
          eeeeeeeeeeeeeeeeeeeeeeeeeee!06!
          fffffffffffffff!02!
          gggggggggggggggggggggggggggggggggggg!14!
          iiiiiiiiiiiiiiiii!11!
          jj!05!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !10!ddd
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !02!fffffffffffffff
          !14!gggggggggggggggggggggggggggggggggggg
          !11!iiiiiiiiiiiiiiiii
          !05!jj

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !02!fffffffffffffff
          !05!jj
          !06!eeeeeeeeeeeeeeeeeeeeeeeeeee
          !08!ccccccccccccccccccccccccccccccccccccccccccccccc
          !10!ddd
          !11!iiiiiiiiiiiiiiiii
          !14!gggggggggggggggggggggggggggggggggggg

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          fffffffffffffff
          jj
          eeeeeeeeeeeeeeeeeeeeeeeeeee
          ccccccccccccccccccccccccccccccccccccccccccccccc
          ddd
          iiiiiiiiiiiiiiiii
          gggggggggggggggggggggggggggggggggggg
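
The whole case A) pipeline of METHOD 2 can be replayed in Python as a sanity check. This is a sketch under the same assumptions as before: single-letter sample lines, `\n` in place of `\R`, and `!` absent from the real content.

```python
import re

lines = list("hfbbjeacadiahgb")
tagged = sorted(f"{ln}!{i:02d}!" for i, ln in enumerate(lines, start=1))
text = "".join(t + "\n" for t in tagged)

# Delete each group of consecutive lines sharing the same content before "!"
text = re.sub(r"(?m)^(.+!).+\n(?:\1.+\n)+", "", text)

# Move the "!NN!" tag back to the front, then sort to restore the original order
text = re.sub(r"(?m)^(.+?)(!.+)", r"\2\1", text)
restored = sorted(text.splitlines())

# Strip the tag
result = [re.sub(r"^.+!", "", ln) for ln in restored]
print(result)  # ['f', 'j', 'e', 'c', 'd', 'i', 'g']
```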


        • For case B), use the regexes : SEARCH = (?-s)^(.+!).+\R(?:(\1.+\R)+|(?!\1.+\R)) , REPLACE = (?2$0)

          aaaaa!07!
          aaaaa!09!
          aaaaa!12!
          bbbbbbb!03!
          bbbbbbb!04!
          bbbbbbb!15!
          hhhhhhhhhhh!01!
          hhhhhhhhhhh!13!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !07!aaaaa
          !09!aaaaa
          !12!aaaaa
          !03!bbbbbbb
          !04!bbbbbbb
          !15!bbbbbbb
          !01!hhhhhhhhhhh
          !13!hhhhhhhhhhh

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !01!hhhhhhhhhhh
          !03!bbbbbbb
          !04!bbbbbbb
          !07!aaaaa
          !09!aaaaa
          !12!aaaaa
          !13!hhhhhhhhhhh
          !15!bbbbbbb

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          hhhhhhhhhhh
          bbbbbbb
          bbbbbbb
          aaaaa
          aaaaa
          aaaaa
          hhhhhhhhhhh
          bbbbbbb


        • For case C), use the regexes : SEARCH = (?-s)^((.+!).+\R)(?:(\2.+\R)+|(?!\2.+\R)) , REPLACE = (?3\1)

          aaaaa!07!
          bbbbbbb!03!
          hhhhhhhhhhh!01!

        • Use the regexes : SEARCH = ^(.+?)(!.+) , REPLACE = \2\1

          !07!aaaaa
          !03!bbbbbbb
          !01!hhhhhhhhhhh

        • Click on the menu option Edit - Line Operations - Sort Lines Lexicographically Ascending

          !01!hhhhhhhhhhh
          !03!bbbbbbb
          !07!aaaaa

        • Finally, use the regexes : SEARCH = ^.+! , REPLACE = EMPTY. We get the final text :

          hhhhhhhhhhh
          bbbbbbb
          aaaaa
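
Cases B) and C) of METHOD 2 can be checked the same way. In this sketch, a replacement function plays the role of the conditional replacements `(?2$0)` and `(?3\1)`, which Python's `re` does not offer. The helper name `restore` is hypothetical, and the sample is again shortened to single letters.

```python
import re

lines = list("hfbbjeacadiahgb")
text = "".join(
    t + "\n" for t in sorted(f"{ln}!{i:02d}!" for i, ln in enumerate(lines, start=1))
)

def restore(tagged_text):
    """Move the tags to the front, sort by tag, then strip the tags."""
    swapped = re.sub(r"(?m)^(.+?)(!.+)", r"\2\1", tagged_text)
    return [re.sub(r"^.+!", "", ln) for ln in sorted(swapped.splitlines())]

# Case B: group 2 is set only when a following line shares the same content
pat_b = re.compile(r"(?m)^(.+!).+\n(?:(\1.+\n)+|(?!\1.+\n))")
case_b = restore(pat_b.sub(lambda m: m.group(0) if m.group(2) else "", text))

# Case C: group 1 is the first line of a group, group 3 flags a repeat
pat_c = re.compile(r"(?m)^((.+!).+\n)(?:(\2.+\n)+|(?!\2.+\n))")
case_c = restore(pat_c.sub(lambda m: m.group(1) if m.group(3) else "", text))

print(case_b)  # ['h', 'b', 'b', 'a', 'a', 'a', 'h', 'b']
print(case_c)  # ['h', 'b', 'a']
```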


        To end with, I also tried a normal case, with a file containing 1557 lines, of which 189 lines are unique. No problem !

        Best Regards,

        guy038

        Pffff! It took about a whole day to write this post :-)) Really time to eat and rest a bit !

        • Hà Nguyễn

          Hi Guy038 ,

          " the regexes : SEARCH = (?-s)^(.+\R)\1+ , REPLACE = EMPTY "

          I did that step, but nothing happened. Why is that?
