    Remove duplicate numerical lines

    48 Posts 8 Posters 14.4k Views
      Scott Sumner
      last edited by

      We have been around the block with the regular expression solution to this. There are also Pythonscript and OS-level solutions. How about one more KISS version of a Pythonscript? This is about as simple and barebones as it gets…maybe let’s see what kind of limitations are encountered with its use:

      from Npp import notepad, editor
      eol = ['\r\n', '\r', '\n'][notepad.getFormatType()]
      line_dict = {}
      line_removal_list = []
      for j in range(editor.getLineCount()):
          l = editor.getLine(j)
          if len(l) > len(eol):
              if l in line_dict:
                  line_removal_list.append(j)
              else:
                  line_dict[l] = None
      if len(line_removal_list) > 0:
          editor.beginUndoAction()
          # remove lines in highest-line-number to lowest-line-number fashion:
          for j in line_removal_list[::-1]: editor.deleteLine(j)
          editor.endUndoAction()
      
        rizla kostas
        last edited by

        Nice, this is a Python script. How do I run it in Notepad++?

        So you put all the lines into an array and remove the duplicates?

          Scott Sumner @rizla kostas
          last edited by Scott Sumner

          @rizla-kostas

          python script how to run it in notepad++

          Well you need to install the Pythonscript plugin. :)

          The script makes the contents of each line a dictionary key (thus, unique). As each line is examined, if its contents are already a key in the dictionary, we know that the line has already occurred, so its line number is added to a list of lines to delete. After all lines have been examined, we run through the list of duplicate line numbers in reverse order (high-to-low) and delete them. Why high-to-low? Because deleting low-to-high would shift the remaining line numbers. For example: if you need to delete lines 5 and 7 and you delete line 5 first, the original line 7 is now line 6! If you delete line 7 first, then line 5 is still the one you want to delete next.
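The deletion-order point can be tried outside Notepad++ with an ordinary list. This is only a plain-Python sketch: the list stands in for the editor buffer, and `find_duplicate_indexes` and `delete_indexes` are illustrative names, not PythonScript functions.

```python
def find_duplicate_indexes(lines):
    """Return the indexes of lines that repeat an earlier non-blank line."""
    seen = {}            # line content -> None, used only for membership tests
    duplicates = []
    for index, line in enumerate(lines):
        if not line:     # blank lines are left alone
            continue
        if line in seen:
            duplicates.append(index)
        else:
            seen[line] = None
    return duplicates


def delete_indexes(lines, indexes, reverse=True):
    """Delete the given indexes from a copy of lines, one at a time."""
    result = list(lines)
    for index in sorted(indexes, reverse=reverse):
        if index < len(result):
            del result[index]
    return result


lines = ["a", "b", "a", "c", "b", "d"]
dups = find_duplicate_indexes(lines)               # [2, 4]
good = delete_indexes(lines, dups, reverse=True)   # ['a', 'b', 'c', 'd']
bad = delete_indexes(lines, dups, reverse=False)   # ['a', 'b', 'c', 'b']
```

Deleting low-to-high removes index 2 correctly, but by then the second duplicate has moved to index 3, so deleting index 4 takes out "d" instead.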

            PeterJones
            last edited by

            @rizla-kostas ,

            There is a plugin for Notepad++ called “PythonScript”, which embeds a Python interpreter inside the plugin, and allows automation of the Notepad++ GUI/Environment/editor-component through the Python language. If you install PythonScript (some useful links below), then you can run those programs from the PythonScript plugin’s menu.

            -----

            • PythonScript HOME
            • PythonScript DOWNLOAD
            • HELP = Plugins > Python Script > Context-Help
            • Getting Started with Python
              rizla kostas
              last edited by rizla kostas

              Thank you so, so much, all of you guys behind Notepad++.

              I will test it tomorrow and report back. Thanks again!

                Terry R
                last edited by

                @Scott-Sumner
                I just tested your pythonscript and I think it misses 1 dup, possibly due to the last line having no CRLF. I added that and it then worked as expected (for me).

                I’m trying to learn pythonscript, but unable to see where in your code the problem might be arising.

                Terry

                  Scott Sumner @Terry R
                  last edited by

                  @Terry-R

                  Hey Terry!

                  Is the last line which doesn’t have a line-ending REALLY a duplicate of an earlier line that does have a line-ending? :) Well, okay, it IS if we are talking about line-endingless content, which we (probably) are.

                  Anyway, the culprit line in the code would be the one with editor.getLineCount() in it. You will have one less line without a line-ending on your last line, and thus the range function will cause it to go one less iteration. But also to blame is that when the script remembers a previously encountered line, it does so WITH THE LINE-ENDING ON. So there’s a double reason for failure here.

                  I don’t like files without line-endings on their last lines. I sure do wish there was an option in N++ to automatically make sure lines all have proper ends on them. [Of course I have a Pythonscript that makes sure of this for me, so I don’t usually remember to take this stuff into account.]

                  BTW, note that the script ignores blank lines; something I should have mentioned earlier.
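A small plain-Python illustration of the line-ending mismatch, assuming (as described above) that the script remembers each line with its line ending still attached. `splitlines(True)` is used here only as a stand-in for reading lines from the editor.

```python
text = "apple\r\npear\r\napple"          # last line has no line ending

# splitlines(True) keeps the line ending on each piece
lines_with_eol = text.splitlines(True)   # ['apple\r\n', 'pear\r\n', 'apple']

first, last = lines_with_eol[0], lines_with_eol[-1]

# compared as stored, the two "apple" lines are different strings
raw_match = (first == last)

# compared without their line endings, they are the same
stripped_match = (first.rstrip("\r\n") == last.rstrip("\r\n"))
```

So the final "apple" is never found in the dictionary, because the key that was stored earlier is "apple" plus the line ending.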

                    PeterJones
                    last edited by

                    So @Scott-Sumner, are you going to leave us hanging? You need to publish the code to add the line-ending to the last line, if it’s missing it, so that your above code works properly. :-)

                      Scott Sumner @PeterJones
                      last edited by

                      @PeterJones said:

                      You need to publish the code to add the line-ending to the last line…so that your above code works properly

                      HAHa. I will, but right now it looks overcomplicated for general use. :-) I’ll work on it and post back here when it is suitable for general consumption…

                      In the meantime, why don’t we just fix the original code? I found that all that is needed is to change this line:

                      l = editor.getLine(j)
                      

                      into this:

                      l = editor.getLine(j).rstrip('\n\r')
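One thing that may be worth checking with this change: the first script's blank-line test, `len(l) > len(eol)`, would then compare bare line content against the length of the line ending. Under `\r\n` that would treat a one- or two-character line as blank and never check it for duplicates. The sketch below is plain Python rather than PythonScript, and both test functions are illustrative names.

```python
def duplicate_indexes(raw_lines, eol, is_real_line):
    """Indexes of lines whose stripped content already occurred earlier."""
    seen = set()
    duplicates = []
    for index, raw in enumerate(raw_lines):
        content = raw.rstrip("\n\r")
        if not is_real_line(content, eol):
            continue                     # treated as blank, so skipped
        if content in seen:
            duplicates.append(index)
        else:
            seen.add(content)
    return duplicates


def original_test(content, eol):
    # the length test from the first script, applied to stripped content
    return len(content) > len(eol)


def plain_test(content, eol):
    # hypothetical replacement: any non-empty content counts as a real line
    return len(content) > 0


raw_lines = "ab\r\nlonger\r\nab\r\nlonger".splitlines(True)
with_original = duplicate_indexes(raw_lines, "\r\n", original_test)  # [3]
with_plain = duplicate_indexes(raw_lines, "\r\n", plain_test)        # [2, 3]
```

With the original test the repeated "ab" is skipped as if it were blank. Testing for non-empty content catches it.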
                      
                        Eko palypse @Scott Sumner
                        last edited by

                        @Scott-Sumner

                        what about using OrderedDict from collections?
                        Preserves the ordering and dict keys are unique per se.

                        from Npp import editor
                        from collections import OrderedDict
                        _dict = OrderedDict.fromkeys(editor.getText().splitlines())
                        editor.setText('\r\n'.join(_dict.keys()))
                        

                        Eko

                          Scott Sumner @Eko palypse
                          last edited by

                          @Eko-palypse said:

                          what about…?

                          Sure, why not? Only objection might be the empty line case (my experience is that people usually want their blank lines retained as is, and not removed as duplicates).
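To make the empty-line objection concrete, here is the same `OrderedDict.fromkeys` idea applied to an ordinary string instead of the editor text (a plain-Python sketch, not a PythonScript):

```python
from collections import OrderedDict

text = "one\r\n\r\ntwo\r\n\r\none\r\n"

# splitlines() gives ['one', '', 'two', '', 'one']
unique = list(OrderedDict.fromkeys(text.splitlines()))

# joining puts line endings only *between* lines
rebuilt = "\r\n".join(unique)
```

The second blank line counts as a duplicate of the first and disappears. Because the lines are joined rather than terminated, the final line ending is dropped as well.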

                            Eko palypse @Scott Sumner
                            last edited by

                            @Scott-Sumner

                            Right, this case makes it a little bit more difficult, agreed.

                            Eko

                              Eko palypse
                              last edited by Eko palypse

                              @Scott-Sumner

                              What about this

                              from Npp import editor
                              lastLineContainsEOL = len(editor.getLine(editor.getLineCount()-1)) == 0
                              lines = editor.getText().splitlines()
                              uniqueLines = set(lines)
                              newText = '' 
                              for line in lines:
                                  if line in uniqueLines or line.strip() == '':
                                      newText += line + '\r\n'
                                      if line.strip() != '':
                                          uniqueLines.remove(line)
                              editor.setText(newText if lastLineContainsEOL else newText[:-2])
                              
                              • generates unique lines only (ignoring empty lines with and without spaces)
                              • preserves ordering
                              • preserves usage of last EOL
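For trying those three properties outside Notepad++, the same behaviour can be sketched as a plain function on a string. This is a variant rather than the script itself: it tracks lines already seen instead of removing entries from a set of unique lines, and the function name and `eol` parameter are made up for the example.

```python
def remove_duplicate_lines(text, eol="\r\n"):
    """Keep the first copy of each non-blank line; keep every blank line."""
    had_final_eol = text.endswith(("\r\n", "\r", "\n"))
    seen = set()
    kept = []
    for line in text.splitlines():
        if line.strip() == "":
            kept.append(line)          # blank or whitespace-only: always kept
        elif line not in seen:
            seen.add(line)
            kept.append(line)          # first occurrence of this content
    result = eol.join(kept)
    if had_final_eol and kept:
        result += eol                  # keep the final line ending if there was one
    return result


with_eol = remove_duplicate_lines("a\r\nb\r\na\r\n")
without_eol = remove_duplicate_lines("a\n\nb\na", eol="\n")
```

Note that Python's `splitlines()` also splits on a few less common separators (form feed, for example), so this is only a reasonable stand-in for text that uses ordinary line endings.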

                              Eko

                                Scott Sumner @Eko palypse
                                last edited by

                                @Eko-palypse said:

                                What about this

                                Sure. I say “whatever works”. Much like I don’t get all fancy about shaving a few characters off a regex, I think with scripts it is to each his own. As long as it does the job, it is super. :-)

                                  Scott Sumner
                                  last edited by

                                  @Eko-palypse

                                  One comment, though: I’m guessing you pretty much exclusively use Windows. I use Windows/Linux about 75%/25%…because of that I have learned to not think that line-endings are always \r\n. So scripts I post here will work (that’s the goal anyway) with either Windows or Linux (or even Mac) files.

                                  This may be something you want to consider doing as well. But it doesn’t bother me if you don’t because I understand the meaning of it–for someone that just wants to blindly pick up and use a script and doesn’t understand Python, oh and BTW uses Linux files…it could be a problem.

                                  BTW, good job! I like seeing Pythonscripts besides my own posted here. Not many people are doing it anymore. :-(
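As an aside on the line-ending point: inside Notepad++ the scripts in this thread ask `notepad.getFormatType()` for the file's format. Outside the editor, a rough plain-Python way to guess the ending from the text itself could look like the sketch below. `detect_eol` is a hypothetical helper, not part of PythonScript.

```python
def detect_eol(text, default="\r\n"):
    """Guess the dominant line ending in text (hypothetical helper)."""
    crlf = text.count("\r\n")
    cr = text.count("\r") - crlf      # lone carriage returns (old Mac)
    lf = text.count("\n") - crlf      # lone line feeds (Unix)
    counts = [(crlf, "\r\n"), (cr, "\r"), (lf, "\n")]
    best_count, best_eol = max(counts, key=lambda pair: pair[0])
    # fall back to the default when the text has no line ending at all
    return best_eol if best_count > 0 else default
```

A file with mixed endings gets whichever style occurs most often, which may or may not be what you want.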

                                    Scott Sumner @PeterJones
                                    last edited by

                                    @PeterJones said:

                                    You need to publish the code to add the line-ending to the last line, if it’s missing it

                                    OK, so here it is. I run a similar one (more complicated, for my own needs) from my startup.py so that it is always in place, and thus I never have to deal with files without line-endings on their last lines.

                                    One thing I don’t like, but haven’t found a good method for handling, is that in certain circumstances (e.g. a Save All), after the script does its work, it can leave you sitting in a tab that is different from the tab that was active before. If people are interested in this script and have ideas about solving that particular problem, I’m interested in hearing them.

                                    Here’s the Pythonscript:

                                    from Npp import notepad, editor, NOTIFICATION
                                    
                                    def callback_npp_FILEBEFORESAVE(args):
                                        line_ending = ['\r\n', '\r', '\n'][notepad.getFormatType()]
                                        doc_size = editor.getTextLength()
                                        if editor.getTextRange(doc_size - 1, doc_size) != line_ending[-1]:
                                            # fix Notepad++'s "broken" functionality and add a line-ending at end-of-file
                                            editor.appendText(line_ending)
                                    
                                    notepad.callback(callback_npp_FILEBEFORESAVE, [NOTIFICATION.FILEBEFORESAVE])
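The core check in the callback can be exercised without Notepad++ as a plain function on a string. This sketch differs slightly from the callback above: it accepts any of the three line endings at end-of-text rather than comparing against the last character of the file's own ending. Leaving an empty text untouched is my assumption, and `ensure_trailing_eol` is an illustrative name.

```python
def ensure_trailing_eol(text, eol):
    """Return text with eol appended unless it already ends in a line ending."""
    if text == "":
        return text                      # assumption: leave an empty document empty
    if text.endswith(("\r\n", "\r", "\n")):
        return text                      # already terminated, nothing to do
    return text + eol


fixed = ensure_trailing_eol("a\r\nb", "\r\n")     # 'a\r\nb\r\n'
unchanged = ensure_trailing_eol("a\nb\n", "\n")   # 'a\nb\n'
```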
                                    
                                      Eko palypse
                                      last edited by

                                      @Scott-Sumner said:

                                      One comment, though: I’m guessing you pretty much exclusively use Windows. I use Windows/Linux about 75%/25%…because of that I have learned to not think that line-endings are always \r\n. So scripts I post here will work (that’s the goal anyway) with either Windows or Linux (or even Mac) files.

                                      Good point and you offered the solution already, even better :-D

                                      from Npp import notepad, editor
                                      lastLineContainsEOL = True if len(editor.getLine(editor.getLineCount()-1)) == 0 else False
                                      line_ending = ['\r\n', '\r', '\n'][notepad.getFormatType()]
                                      lines = editor.getText().splitlines()
                                      uniqueLines = set(lines)
                                      newText = '' 
                                      for line in lines:
                                          if line in uniqueLines or line.strip() == '':
                                              newText += line + line_ending 
                                              if line.strip() != '':
                                                  uniqueLines.remove(line)
                                      editor.setText(newText if lastLineContainsEOL else newText[:-2])
                                      

                                      Eko

                                        Scott Sumner @Eko palypse
                                        last edited by Scott Sumner

                                        @Eko-palypse :

                                        Yes, but you forgot something. :-)

                                        editor.setText(newText if lastLineContainsEOL else newText[:-len(line_ending)])
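Why the slice length matters, shown on a plain string with Unix line endings (a sketch only; `strip_final_eol` is an illustrative helper):

```python
def strip_final_eol(text, eol):
    """Remove one trailing eol from text, if present."""
    return text[:-len(eol)] if text.endswith(eol) else text


unix_text = "alpha\nbeta\n"

# a hard-coded 2 assumes a two-character ending and eats the final 'a'
hard_coded = unix_text[:-2]

# slicing by the actual length removes only the line ending
by_length = strip_final_eol(unix_text, "\n")
```

With `\r\n` both versions give the same result, which is why the problem only shows up on Unix or old Mac files.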

                                          PeterJones
                                          last edited by

                                          To continue with the hijack-tangent of this thread… :-)

                                          @Scott-Sumner said,

                                          If people are interested in this script and have ideas about solving that particular problem, I’m interested in hearing them.

                                          Challenge accepted. :-)

                                          My first idea was that you could track the previous bufferID, and make sure you always activate the previous one. While trying to see if that would help, I noticed that with the exact script you had posted, if all open files were missing EOL, it would save all files, but only fix the EOL on the active file.

                                          That gave me the flash for the solution: in the callback, store the currently-active bufferID, activate the buffer for the argument to the callback (ie, the file being saved), make the changes to the now-active file, then re-activate the originally-active buffer. The script below seemed to do it for me:

                                          from Npp import notepad, editor, NOTIFICATION
                                          
                                          def callback_npp_FILEBEFORESAVE(args):
                                              # editor.appendText goes to the _active_ buffer, which is not
                                              # necessarily the file currently being saved.  So to kill two birds
                                              # with one stone, save the active buffer ID, then switch to the
                                              # buffer ID for this instance of the callback -- now the editor has
                                              # the correct buffer active.
                                              oldActiveID = notepad.getCurrentBufferID()
                                              notepad.activateBufferID(args["bufferID"])
                                          
                                              line_ending = ['\r\n', '\r', '\n'][notepad.getFormatType()]
                                              doc_size = editor.getTextLength()
                                              if editor.getTextRange(doc_size - 1, doc_size) != line_ending[-1]:
                                                  # fix Notepad++'s "broken" functionality and add a line-ending at end-of-file
                                                  editor.appendText(line_ending)
                                          
                                              # now that you're done editing, go back to the originally-active buffer
                                              notepad.activateBufferID(oldActiveID)
                                          
                                          notepad.callback(callback_npp_FILEBEFORESAVE, [NOTIFICATION.FILEBEFORESAVE])
                                          

                                          I tested this with three open files: two in one view, one in the other view; I tried various combinations of which ones needed to be saved, and which ones were missing EOL, and which was active, and it seemed to always do what I intended, but it’s possible that other combinations won’t work.
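The save-switch-restore pattern could also be wrapped in a context manager, so that the original buffer is re-activated even if the edit in the middle raises. The sketch below runs outside Notepad++: `FakeNotepad` is a stand-in written for the example and implements only the two methods used in the script above, and `buffer_activated` is a hypothetical helper name.

```python
from contextlib import contextmanager


class FakeNotepad(object):
    """Stand-in for the PythonScript notepad object, for illustration only."""

    def __init__(self, active_id):
        self.active_id = active_id
        self.history = []              # every buffer ID that was activated

    def getCurrentBufferID(self):
        return self.active_id

    def activateBufferID(self, buffer_id):
        self.active_id = buffer_id
        self.history.append(buffer_id)


@contextmanager
def buffer_activated(notepad, buffer_id):
    """Temporarily activate buffer_id, then restore the previous buffer."""
    previous_id = notepad.getCurrentBufferID()
    notepad.activateBufferID(buffer_id)
    try:
        yield
    finally:
        # runs even if the code inside the with-block raises
        notepad.activateBufferID(previous_id)


npp = FakeNotepad(active_id=101)
with buffer_activated(npp, 202):
    active_during_edit = npp.getCurrentBufferID()   # 202 while editing
active_afterwards = npp.getCurrentBufferID()        # back to 101
```

In a real callback, the body of the `with` block would hold the line-ending check and the `editor.appendText` call.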

                                            Scott Sumner @PeterJones
                                            last edited by

                                            @PeterJones said:

                                            if all open files were missing EOL, it would save all files, but only fix the EOL on the active file

                                            Really? I tested with several open files (at least one of the 3 types, Win/Linux/Mac) that needed fixing and when I did a Save All they all got saved after being modified…hmmm, guess I will have another look…
