
    remove duplicated line

    • pinuzzu99

      I have a very long text file like the one below, and some of its lines are duplicated. I tried this regex in “Replace”:
      find: ^(.*)(\r?\n\1)+$
      replace: $1

      but it does not work in my specific case. I also tried:
      find: ^(.*\r?\n)\1+
      replace: empty

      but this does not work in my case either. How can I remove the duplicate lines?

      dangsjceamkales@gsnail.com:c6718e7c
      Tom34f@sogbug.com:y7vk5z9292
      zesorex@gmail.com:ploksfasd
      j096875244@gmail.com:st608g410000
      doniel.ctz@homail.com:Cotvxbza22523286
      levjaamel@hetmail.com:camxmel2004
      Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
      szaborefeupert666@gail.com:Rupejffgano666
      jodgsjny0531@cofx.net:Draskakgon357
      zesorex@gmail.com:ploksfasd
      wse_adgel_one@hogmail.com:6947903024
      j096875244@gmail.com:st608g410000
      jringahdhsque@hotmail.com:nadfjddkalgo
      Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
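
Both patterns above require the backreference \1 to follow the captured line immediately, so they can only match duplicates that sit directly under each other. In the sample the repeated lines are separated by other lines. A minimal Python sketch of that behaviour follows. It rests on a few assumptions: Python's `re` module stands in for Notepad++'s regex engine, the sample strings are made up, and the replacement is `\1` rather than empty so that one copy survives.

```python
import re

# Second pattern from the post; assumption: replacement is \1 so one copy is kept.
ADJACENT_DUPES = re.compile(r"^(.*\r?\n)\1+", re.MULTILINE)

adjacent = "a\na\nb\n"      # duplicates directly follow each other
separated = "a\nb\na\n"     # duplicates separated by another line

print(repr(ADJACENT_DUPES.sub(r"\1", adjacent)))   # the adjacent duplicate is removed
print(repr(ADJACENT_DUPES.sub(r"\1", separated)))  # the text comes back unchanged
```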
      
      • Ekopalypse @pinuzzu99

        @pinuzzu99

        If you don’t need to keep the original order, you can use:
        Edit->Line Operations->Sort Lines …
        Edit->Line Operations->Remove Consecutive Duplicate Lines
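
Outside Notepad++, the same two steps (sort, then drop consecutive duplicates) can be sketched in Python. This is only an illustration: `sort_and_dedupe` is a hypothetical helper name, the sample lines are made up, and Python's default string ordering stands in for whichever sort option is picked in the menu.

```python
from itertools import groupby


def sort_and_dedupe(lines):
    """Sort the lines, then collapse each run of identical neighbours to one line."""
    # After sorting, identical lines are adjacent, so groupby yields one group per distinct line.
    return [line for line, _run in groupby(sorted(lines))]


sample = ["b@x.com:1", "a@x.com:2", "b@x.com:1", "c@x.com:3", "a@x.com:2"]
print("\n".join(sort_and_dedupe(sample)))
```

As in the menu-based approach, the original line order is lost.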

        • pinuzzu99

          Oh great, thanks.
          Anyway, I need a regex that deletes the duplicate lines without changing the order…

          • rinku singh @pinuzzu99

            @pinuzzu99
            Use the Remove Duplicate Lines plugin.

            • PeterJones

              @pinuzzu99 ,

              If you are willing to hit Replace All multiple times, until all duplicates are removed, this worked for me with your example:

              • FIND = (?s)((^.*?$)\R.*)\R*\2(\R|\Z)
              • REPLACE = $1
              • MODE = regular expression

              After three runs, it had become:

              dangsjceamkales@gsnail.com:c6718e7c
              Tom34f@sogbug.com:y7vk5z9292
              zesorex@gmail.com:ploksfasd
              j096875244@gmail.com:st608g410000
              doniel.ctz@homail.com:Cotvxbza22523286
              levjaamel@hetmail.com:camxmel2004
              Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
              szaborefeupert666@gail.com:Rupejffgano666
              jodgsjny0531@cofx.net:Draskakgon357
              wse_adgel_one@hogmail.com:6947903024
              jringahdhsque@hotmail.com:nadfjddkalgo
              
              

              … which I think is what you wanted.

              But yes, @gurikbal-singh’s Remove Duplicate Lines plugin should do what you want, too. Just go to Plugins > Plugins Admin to install it.

              • pinuzzu99

                Oh yes, PeterJones, it works well! Thanks.
                But I have to click each time to delete one row at a time… what if I had 5000 duplicate rows?
                Isn’t there a single command to bulk remove everything in one go?

                And thanks for the advice about the “Remove Duplicate Lines” plug-in. I didn’t know it existed; I will try it now. Thank you.

                • PeterJones

                  @pinuzzu99 said in remove duplicated line:

                  But I have to click each time to delete one row at a time… what if I had 5000 duplicate rows?
                  Isn’t there a single command to bulk remove everything in one go?

                  Regexes aren’t infinitely powerful. You can do a lot with them, but for super-complicated things it’s sometimes better to use a full-blown programming language (which is what the plugin does, obviously).

                  For example, in Perl, running from the command line, it could be done with a readable 3-line script, or with this condensed one-liner:

                  perl -pi.bak -e "chomp($k=$_);$_=''if$h{$k};++$h{$k}" filename

                  This saves the original to filename.bak and deletes the duplicate lines when regenerating filename, assuming there’s enough memory to hold the hash (map) that checks for duplicates. If memory became a concern, you could trade speed for memory and store a shorter key for each line (maybe using crc32 or a similar algorithm), so that the keys don’t overflow your memory; note that a short checksum can collide, in which case two different lines would be treated as duplicates. But this isn’t a general programming-help forum, so I won’t go any further than that.
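
For readers who prefer Python, the same hash-based idea (keep a line the first time it is seen, drop it afterwards) can be sketched as below. This is not the plugin's code and not a translation checked against the Perl one-liner; `dedupe_keep_first` and its `use_digest` switch are hypothetical names. The digest variant illustrates the "shorter key" idea with SHA-256 in place of crc32, which is a swapped-in choice, and it only saves memory when lines are longer than the 32-byte digest.

```python
import hashlib
from typing import Iterable, Iterator


def dedupe_keep_first(lines: Iterable[str], use_digest: bool = False) -> Iterator[str]:
    """Yield each line the first time it is seen, preserving the original order."""
    seen = set()
    for line in lines:
        text = line.rstrip("\r\n")  # compare lines without their line break
        # Optionally remember a fixed-size digest instead of the full line text.
        key = hashlib.sha256(text.encode("utf-8")).digest() if use_digest else text
        if key not in seen:
            seen.add(key)
            yield line


sample = ["a\n", "b\n", "a\n", "c\n", "b\n"]
print("".join(dedupe_keep_first(sample)), end="")
```

To process a file, the lines could be read with `open(path, encoding="utf-8")` and the result written to a new file, keeping the original as a backup.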

                  • pinuzzu99

                    OK, understood. You have been very clear.
                    At this point I will use the regex for simple things and the plug-in for the more complicated text files. Thank you for your support.

                    • pinuzzu99

                      Hey guy038, don’t you have a valid recipe to do it all in one shot?
                      I do not mean a string like (?s)((^.*?$)\R.*)\R*\2(\R|\Z)
                      REPLACE = $1
                      which works on only one value at a time…
                      The duplicate-line plug-in works fine, but isn’t it possible to refine the regex?

                      • Alan Kilborn @pinuzzu99

                        @pinuzzu99

                        A regex might work, but a search like this can overwhelm the regex engine. You will know this has happened because the entire document will become selected. It is better to do it in a non-regex way.

                        • guy038

                          Hello @pinuzzu99, @ekopalypse, @gurikbal-singh, @peterjones, @alan-kilborn and All,

                          Sorry for my late answer: I was on a 3-day ski trip to the French resort of Les Arcs 1800. We were a group of 14 people. Unfortunately, there was no sun on the first two days, and on the last day there was no skiing because of snow showers!


                          Luckily, a one-go regex S/R is possible ;-))

                          So, assuming the input text, below :

                          Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
                          dangsjceamkales@gsnail.com:c6718e7c
                          Tom34f@sogbug.com:y7vk5z9292
                          zesorex@gmail.com:ploksfasd
                          j096875244@gmail.com:st608g410000
                          doniel.ctz@homail.com:Cotvxbza22523286
                          zesorex@gmail.com:ploksfasd
                          levjaamel@hetmail.com:camxmel2004
                          Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
                          szaborefeupert666@gail.com:Rupejffgano666
                          jodgsjny0531@cofx.net:Draskakgon357
                          zesorex@gmail.com:ploksfasd
                          wse_adgel_one@hogmail.com:6947903024
                          j096875244@gmail.com:st608g410000
                          j096875244@gmail.com:st608g410000
                          jringahdhsque@hotmail.com:nadfjddkalgo
                          Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
                          

                          Use the following regex S/R :

                          SEARCH (?-is)^(.+)\R(?=(?s).*^\1)

                          REPLACE Leave EMPTY

                          And you’ll get the output text

                          dangsjceamkales@gsnail.com:c6718e7c
                          Tom34f@sogbug.com:y7vk5z9292
                          doniel.ctz@homail.com:Cotvxbza22523286
                          levjaamel@hetmail.com:camxmel2004
                          szaborefeupert666@gail.com:Rupejffgano666
                          jodgsjny0531@cofx.net:Draskakgon357
                          zesorex@gmail.com:ploksfasd
                          wse_adgel_one@hogmail.com:6947903024
                          j096875244@gmail.com:st608g410000
                          jringahdhsque@hotmail.com:nadfjddkalgo
                          Andrewhsjfmesjones00@yahoo.com:Winpfgston99001
                          

                          Notes :

                          • This regex searches for any non-empty line that is followed, further down, by an identical line (case included), with any range of characters in between, possibly empty and/or spanning several lines. Thus, it deletes every earlier duplicate of a line and keeps only its last occurrence.

                          • The first part, (?-is), sets the usual in-line modifiers (so the dot matches a single standard character and the search is case-sensitive).

                          • Then the part ^(.+)\R matches the contents of any non-empty line from its beginning, stored as group 1, followed by its line-break \R.

                          • The last part, (?=(?s).*^\1), is a positive look-ahead structure, (?=........), that is to say a condition which must be true in order to validate the overall match, but which is never part of the overall match!

                            • The part (?s).* represents any range, even empty, of any kind of characters (standard or EOL chars), due to the (?s) modifier.

                            • The part ^\1 matches the same text as group 1, at the beginning of a line.

                          • As the replacement zone is empty, any line that is repeated further down is deleted, together with its line-break.
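
The notes above can be tried out with a rough Python adaptation of the search. It is a sketch under assumptions, not the Notepad++ behaviour itself: Python's standard `re` module has, as far as I know, no `\R`, so the text is assumed to use `\n` line endings; the leading `(?-is)` is dropped because Python patterns are already case-sensitive and the dot excludes `\n` by default; and `remove_earlier_duplicates` is a hypothetical helper name.

```python
import re

# Assumptions: "\n" line endings; (?s:...) limits dot-matches-newline to the look-ahead.
KEEP_LAST = re.compile(r"^(.+)\n(?=(?s:.*)^\1)", re.MULTILINE)


def remove_earlier_duplicates(text: str) -> str:
    """Delete every non-empty line whose text reappears at the start of a later line."""
    return KEEP_LAST.sub("", text)


sample = "a\nb\na\nc\nb\n"
print(remove_earlier_duplicates(sample), end="")
```

One thing the sketch makes easy to check: `^\1` is not anchored at the end of the line, so in this Python version a line that is only a prefix of a later line (for example `ab` followed further down by `abc`) is deleted as well. Appending `$` after `\1` would be one way to guard against that.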

                          Remark :

                          In a huge file, if two identical lines are separated by a lot of text/lines, this regex S/R may fail and wrongly match the entire file contents. For instance:

                          • Two lines separated by 1600 all-different lines of 32 characters each give a correct result of 1 occurrence (the line with a duplicate).

                          • Two lines separated by 1700 all-different lines of 32 characters each give an incorrect result of 2 occurrences (the line with a duplicate and the entire file contents).
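
When a file is large enough to hit that limit, a plain pass over the lines avoids the regex engine altogether. The sketch below is a hypothetical Python helper (`dedupe_keep_last` is not part of Notepad++ or any plugin) that mimics what the search above does on small inputs: each non-empty line is kept only at its last occurrence, and empty lines are left alone.

```python
from typing import List


def dedupe_keep_last(lines: List[str]) -> List[str]:
    """Keep only the last occurrence of each non-empty line, preserving order."""
    # First pass: remember the index at which each line text occurs last.
    last_index = {line: i for i, line in enumerate(lines)}
    # Second pass: keep empty lines as they are, and other lines only at their last index.
    return [line for i, line in enumerate(lines) if line == "" or last_index[line] == i]


sample = ["a", "b", "a", "", "c", "", "b"]
print(dedupe_keep_last(sample))
```

Unlike the regex, this compares whole lines only, and its running time grows linearly with the number of lines.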

                          Best Regards,

                          guy038

                          • pinuzzu99

                            Thanks, guy038.
                            I’m glad you went skiing, even if the weather was not perfect… every now and then it is good to get away from the PC!
                            Thanks for your reply, not just for the answer itself but also for the spirit you put into it…
                            Thank you so much for your much-appreciated answers.
