    Search and Replace with PythonScript compared to built-in

    • Wim Gielis

      Hello all,

      I often work with large log files, currently with a file of 420K lines and 70 million characters. I understand that this is very big.

      I was experimenting with a quick “delete text” PythonScript:

      • when text is selected under the cursor, delete that text in the entire file (literal delete, no regex)
      • if no text is selected, a prompt asks for the text to be deleted
        • if the text starts with 0, it is a literal delete of what follows the 0
        • if the text starts with 1, it is a regex delete of what follows the 1
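A minimal sketch of the prefix rules above as a pure Python function, so they can be checked outside Notepad++. The name parse_delete_input and its (mode, pattern) return value are hypothetical and not taken from the script in this thread. Unlike that script, this sketch also treats a bare "0" or "1" as nothing to delete, so an empty pattern never reaches a replace call.

```python
# Hypothetical helper mirroring the prefix rules described above:
# "0..." = literal delete of the rest, "1..." = regex delete of the rest,
# anything else = literal delete of the whole input.
LITERAL, REGEX = 0, 1


def parse_delete_input(user_input):
    """Return (mode, pattern), or None when there is nothing to delete."""
    if not user_input:  # None (prompt cancelled) or an empty string
        return None
    prefix, rest = user_input[0], user_input[1:]
    if prefix == "0":
        mode, pattern = LITERAL, rest
    elif prefix == "1":
        mode, pattern = REGEX, rest
    else:
        mode, pattern = LITERAL, user_input
    # A bare "0" or "1" leaves an empty pattern: treat it as nothing to delete
    return (mode, pattern) if pattern else None


# Example: the date regex from this thread, entered with the "1" prefix
print(parse_delete_input(r"1\d{4}-\d{2}-\d{2}"))
```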

      Alt-d launches the script. It works very well and I am happy with the result.

      Two questions remain, though, and AI tools could not help because they invented non-existent methods:

      • Deleting (replacing with nothing) is much slower in PythonScript than with Search/Replace in Notepad++ itself. For example, dates like \d{4}-\d{2}-\d{2} are deleted in 15 seconds with Notepad++ but take 35 seconds with PythonScript. Do I have other options, like notepad.runMenuCommand("Search", "Replace")? I cannot get it to work: I don't see how to set the search strings or choose regex versus literal mode.
      • The Extended search mode in Notepad++ (for \r, \n, \t, etc.): I think this is not exposed in the replace methods of PythonScript?

      If anyone wants, I can post the working script here. It is definitely faster to use than Find/Replace with nothing in the Notepad++ interface, especially when doing several deletes one after the other.

      Thanks, best regards,

      Wim

      • Alan Kilborn @Wim Gielis

        @Wim-Gielis said:

        deleted in 15 seconds with Notepad++, 35 seconds with PythonScript

        This is a large difference; I would think it would not be so different.
        Perhaps you’d better show a script.

        do I have other options like notepad.runMenuCommand("Search", "Replace")?

        Well, nothing that easy. You can, of course, gain access to the controls themselves and manipulate them via code…

        the extended replacement method in Notepad++ for \r, \n, \t etc. I think that this is not exposed in the … methods in PythonScript ?

        This is true. It isn’t needed: in Regular expression mode, \r, \n and \t are already understood. In fact, Extended mode only exists because it predates regular expression support, and it just wasn’t removed when Regular expression mode arrived.
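To make this concrete: in regex mode, escapes such as \t, \r and \n already match the real characters (in editor.rereplace and in Python's re alike), so nothing is lost without Extended mode. If you still want Extended-style input for a literal delete, a small converter can expand the escapes first. This is a sketch under assumptions: expand_extended_subset is a hypothetical helper, and it covers only \n, \r, \t, \0 and \\, not every escape that Notepad++'s Extended mode accepts.

```python
import re

# Hypothetical helper (not part of PythonScript): expand a *subset* of the
# Extended-mode escapes into real characters, so the result can be used
# in a literal delete such as editor.replace(text, "").
_EXTENDED = {"n": "\n", "r": "\r", "t": "\t", "0": "\0", "\\": "\\"}


def expand_extended_subset(text):
    # Each backslash plus the following character is looked up;
    # unknown escapes (e.g. \q) are left exactly as typed.
    return re.sub(r"\\(.)", lambda m: _EXTENDED.get(m.group(1), m.group(0)), text)


# In regex mode no conversion is needed, the escapes already work.
# Inside Notepad++ the equivalent would be editor.rereplace(r"\t|\r\n", "")
print(re.sub(r"\t|\r\n", "", "a\tb\r\nc"))  # prints abc
```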

        • Wim Gielis @Alan Kilborn

          @Alan-Kilborn

          Hello Alan,

          Sure! Here goes:

          from Npp import editor, notepad
          
          
          PREFIX_LITERAL = '0'
          PREFIX_REGEX = '1'
          
          
          # Wim Gielis
          # Feb. 2025
          # Support received from https://community.notepad-plus-plus.org
          #
          # DeleteText script (Alt-d):
          #       - The selected text is deleted in the whole file
          #       - If no text is selected then input is asked from the user
          #       - This input can also request a regex delete:
          #         * if the input is 1...... and it starts with 1 then the regex following 1 is used
          #         * if the input is 0...... and it starts with 0 then the literal text following 0 is used
          #       - This script works faster for the regular deletes (literal text),
          #         but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
          #       - The script allows for multi-selections, in which case it will be a literal delete of each selected text.
          
          
          # Options for delete_mode (when single selection is done):
          # delete_mode = -1 ==> no delete
          # delete_mode =  0 ==> a literal delete
          # delete_mode =  1 ==> a regex delete
          
          # Get the selected area(s) and the selected text(s)
          num_selections = editor.getSelections()
          if num_selections == 1:
              delete_mode = 0
              selected_text = editor.getSelText()
          
              # Nothing selected: ask the user what to delete (literal or regex)
              if not selected_text:
                  # Ask the user for input
                  selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix 1 for a regex delete, prefix 0 or no prefix for a literal delete)", "User input", "")
                  if selected_text is None:
                      # the user canceled the prompt
                      delete_mode = -1
                  elif not selected_text:
                      # the user entered nothing
                      delete_mode = -1
                  elif selected_text.startswith(PREFIX_LITERAL):
                      # prefix 0 means a literal delete
                      delete_mode = 0
                      selected_text = selected_text[1:]
                  elif selected_text.startswith(PREFIX_REGEX):
                      # prefix 1 means a regex delete
                      delete_mode = 1
                      selected_text = selected_text[1:]
                  else:
                      # default case is a literal delete
                      delete_mode = 0
          
              if delete_mode > -1:
          
                  # Start an undo action for a single undo step
                  editor.beginUndoAction()
          
                  # Get the current position of the caret to avoid losing position after replacement
                  caret_position = editor.getCurrentPos()
          
                  # Extended mode is not available here, and for large files this is slower than the built-in Search/Replace
                  # Replace all occurrences of the selected text with an empty string
                  if delete_mode == 0:
                      editor.replace(selected_text, "")
                  elif delete_mode == 1:
                      editor.rereplace(selected_text, "")
          
                  # Move caret back to its original position
                  editor.gotoPos(caret_position)
          
                  # End the undo action
                  editor.endUndoAction()
          
          else:
              selected_texts = []
          
              # Multiple selections are done
              for i in range(num_selections):
                  # Get each time the selected text
                  start = editor.getSelectionNStart(i)
                  end = editor.getSelectionNEnd(i)
                  if end > start:
                      selected_text = editor.getTextRange(start, end)
                      if selected_text:
                          selected_texts.append(selected_text)
                  
              # Delete the found text(s) in the whole file, as a single undo step
              # (see above for explanations of the code lines)
              editor.beginUndoAction()
              caret_position = editor.getCurrentPos()
              for selected_text in selected_texts:
                  editor.replace(selected_text, "")
              editor.gotoPos(caret_position)
              editor.endUndoAction()
          • Mark Olson

            In PythonScript, suppose you want to replace the pattern (?i)^f\w+ with XXX.

            The normal (and most concise) way in PythonScript to do this would be editor.rereplace(r'(?i)^f\w+', 'XXX').

            However, you can also import re, slurp up all the text of the file into Python-land, and use re.sub to do all of these replacements. This is much faster; in a test file consisting of 160,000 lines where each line is all random word characters, the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.

            Below is a complete script that uses re.sub instead of editor.rereplace.

            import re # standard Python regex library
            from Npp import editor
            
            text = editor.getText()
            # do regex-replace
            # raw string, so that \w reaches the regex engine unchanged;
            # the `m` flag makes `^` match at the start of every line, not only at the start of the text
            REGEX = r'(?mi)^f\w+'
            REPL = 'XXX'
            replaced = re.sub(REGEX, REPL, text)
            # set the text in the file to the regex-replaced text
            editor.setText(replaced)
            

            So, in summary, here are the pros and cons of the re.sub version:
            PROS:

            1. about 10 times faster (but the ratio can vary a fair bit)

            That’s the only pro that I can think of TBH.

            CONS:

            1. It has to use the re flavor of regular expressions, which has several annoying deficiencies compared to the Boost::Regex flavor that Notepad++ uses.
            2. Because all the changes are made inside Python and the entire file is then overwritten with editor.setText, the Change History feature of Notepad++ will show the entire file as modified, whereas editor.rereplace lets Change History correctly mark only the lines that were actually changed.
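A minimal sketch, assuming PythonScript 3 (Python 3), that softens the second con: re.subn also returns the number of replacements, so the document is only rewritten when at least one match was found, and Change History stays untouched otherwise. When something does match, the whole file is still marked as modified. It also tries to put the caret back, clamped to the new length, since replacing the whole text may not leave it where it was. The function names are hypothetical; editor.getText, setText, getCurrentPos, gotoPos and getLength are the PythonScript calls already used in this thread.

```python
import re

try:
    from Npp import editor  # only available inside Notepad++ (PythonScript plugin)
except ImportError:
    editor = None  # lets the pure-Python part run (and be tested) anywhere


def regex_delete_text(pattern, text):
    """Return (new_text, number_of_deletions), using Python's re flavor."""
    return re.subn(pattern, "", text)


def regex_delete_in_editor(pattern):
    """Rewrite the document only if the pattern matched at least once."""
    new_text, count = regex_delete_text(pattern, editor.getText())
    if count:
        caret = editor.getCurrentPos()
        editor.setText(new_text)
        # The document got shorter, so clamp the old caret position
        editor.gotoPos(min(caret, editor.getLength()))
    return count


if editor is not None:
    # Hypothetical example: delete dates such as 2025-02-14
    regex_delete_in_editor(r"\d{4}-\d{2}-\d{2}")
```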
            • Wim Gielis @Mark Olson

              Hello @Mark-Olson,

              Thank you very much!

              I think that my strategy will be:

              • check the size of the document
              • if smallish, proceed with my code above
              • if bigger, proceed with your code above
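One way to decide what counts as "smallish" is to measure rather than guess. A minimal sketch, assuming Python 3 and that you time the editor.rereplace path separately (for example on a copy of the file, since that call changes the document); best_time is a hypothetical helper, and this block only times re.sub, which returns a new string and writes nothing back.

```python
import re
import time


def best_time(func, *args, repeats=3):
    """Smallest wall-clock time in seconds of func(*args) over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


try:
    from Npp import editor  # only available inside Notepad++ (PythonScript plugin)
except ImportError:
    editor = None

if editor is not None:
    # re.sub only returns a new string; nothing is written back to the document
    seconds = best_time(re.sub, r"\d{4}-\d{2}-\d{2}", "", editor.getText())
    print("re.sub over %d bytes: %.3f s" % (editor.getLength(), seconds))
```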

              I am not particularly worried about regex dialects. I use regex quite often, but not its most advanced features,
              so I expect every flavor to give me the same results for my needs.
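One practical consequence of the dialect difference: if a pattern written for Notepad++ uses syntax that Python's built-in re does not know, re.sub raises re.error instead of deleting anything. A quick check, sketched with a hypothetical helper compiles_in_python_re (Python 3.7 or later, where unknown escapes such as \K are errors). It only catches patterns that fail to compile, not patterns that compile but match differently.

```python
import re


def compiles_in_python_re(pattern):
    """True if Python's built-in re module accepts the pattern, else False."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles_in_python_re(r"\d{4}-\d{2}-\d{2}"))  # True
print(compiles_in_python_re(r"foo\Kbar"))  # False: re has no \K
```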

              Thanks!

              • Alan Kilborn @Mark Olson

                @Mark-Olson said:

                the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.

                I’d guess that what slows the PythonScript version down is that each change triggers a bunch of notifications and other “administrative” work in Notepad++, whereas this doesn’t really happen (well, except once) in the Python “slurp it up and put it all back” approach.

                • Wim Gielis @Alan Kilborn

                  Hello all,

                  Proud to present the adapted code. It is very, very fast, even on my log file containing 420K lines.

                  My only two questions:

                  • What do you think of the cutoff value of 1 MB (1_000_000 bytes)? How would you set it?
                  • What is the usefulness of the lines with beginUndoAction, caret_position, gotoPos and endUndoAction? Would you keep them or leave them out?

                  This tool is going to save a lot of time :-) Sharing it with anyone in the community who is interested.

                  from Npp import editor, notepad
                  import re
                  
                  
                  # Wim Gielis
                  # Feb. 2025
                  #
                  # DeleteText script (Alt-d):
                  #       - The selected text is deleted in the whole file
                  #       - If no text is selected then input is asked from the user. This input can also request a regex delete:
                  #         * if the input starts with 're' then the regex following the 're' is deleted everywhere
                  #         * all other input is treated as literal text
                  #       - This script works faster for the regular deletes (literal text),
                  #         but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
                  #       - The script allows for multi-selections, in which case it will be a literal delete of each selected text.
                  #       - For bigger files (above 1 MB), regex replacements through Notepad++ can be slower. In that case, we use the re module from Python.
                  #       - Options for delete_mode when single selection delete is done:
                  #         * delete_mode = -1 ==> no delete
                  #         * delete_mode =  0 ==> a literal delete
                  #         * delete_mode =  1 ==> a regex delete
                  #       - Options for delete_mode when multi-selections delete is done:
                  #         * delete_mode =  0 ==> a literal delete
                  #       - Documentation and links:
                  #         * https://community.notepad-plus-plus.org/topic/26620/search-and-replace-with-pythonscript-compared-to-built-in
                  #         * https://npppythonscript.sourceforge.net/docs/latest/scintilla.html
                  
                  
                  PREFIX_REGEX = 're'
                  
                  
                  def delete_text(text_pattern: str, delete_mode: int) -> None:
                  
                      # delete_mode = 0 ==> literal delete so a literal replace is needed
                      # delete_mode = 1 ==> regex delete so a regex replace is needed
                  
                      if editor.getLength() < 1_000_000:
                  
                          # Start an undo action for a single undo step
                          editor.beginUndoAction()
                  
                          # Get the current position of the caret to avoid losing position after replacement
                          caret_position = editor.getCurrentPos()
                  
                          if delete_mode == 0:
                              editor.replace(text_pattern, "")
                  
                          elif delete_mode == 1:
                              editor.rereplace(text_pattern, "")
                  
                          # Move caret back to its original position
                          editor.gotoPos(caret_position)
                  
                          # End the undo action
                          editor.endUndoAction()
                  
                      else:
                          # print("File size too big and switching to the Python re module: " + str(editor.getLength()))
                          full_text = editor.getText()
                  
                          if delete_mode == 0:
                              editor.setText(full_text.replace(text_pattern, ""))
                  
                          elif delete_mode == 1:
                              editor.setText(re.sub(text_pattern, "", full_text))
                  
                  # Get the selected area(s) and the selected text(s)
                  num_selections = editor.getSelections()
                  if num_selections == 1:
                      delete_mode = 0
                      selected_text = editor.getSelText()
                  
                      # Nothing selected: ask the user what to delete (literal or regex)
                      if not selected_text:
                          # Ask the user for input
                          selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix re for a regex delete, otherwise a literal delete)", "User input", "")
                  
                          if selected_text is None:
                              # the user canceled the prompt
                              delete_mode = -1
                          elif not selected_text:
                              # the user entered nothing
                              delete_mode = -1
                          elif selected_text.startswith(PREFIX_REGEX):
                              # prefix 're' means a regex delete
                              delete_mode = 1
                              selected_text = selected_text[len(PREFIX_REGEX):]
                          else:
                              # default case is a literal delete
                              delete_mode = 0
                  
                      if delete_mode > -1:
                          delete_text(selected_text, delete_mode)
                  
                  else:
                      selected_texts = []
                  
                      # Multiple selections are done
                      for i in range(num_selections):
                  
                          # Get each time the selected text
                          start = editor.getSelectionNStart(i)
                          end = editor.getSelectionNEnd(i)
                          if end > start:
                              selected_text = editor.getTextRange(start, end)
                              if selected_text:
                                  selected_texts.append(selected_text)
                  
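                      # Wrap all the deletes in one undo action, so a single Ctrl+Z undoes them
                      editor.beginUndoAction()
                      for selected_text in selected_texts:
                          delete_text(selected_text, 0)
                      editor.endUndoAction()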
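A caveat with PREFIX_REGEX = 're': any literal input that starts with those two letters is treated as a regex, so typing result would delete the regex sult everywhere. A minimal sketch of the prompt parsing with a hypothetical, less collision-prone prefix "re:"; parse_prompt is not part of the script above, and input that genuinely starts with "re:" would still be read as a regex.

```python
# Hypothetical alternative prefix "re:" instead of "re", so that a literal
# word such as "result" is no longer read as the regex "sult".
PREFIX_REGEX = "re:"


def parse_prompt(user_input, prefix=PREFIX_REGEX):
    """Return (delete_mode, pattern) or None; 0 = literal, 1 = regex."""
    if not user_input:  # prompt cancelled (None) or nothing entered
        return None
    if user_input.startswith(prefix):
        pattern = user_input[len(prefix):]
        return (1, pattern) if pattern else None
    return (0, user_input)


print(parse_prompt("result"))  # (0, 'result')
print(parse_prompt(r"re:\d{4}-\d{2}-\d{2}"))
```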
                  
                  • Mark Olson @Wim Gielis

                    @Wim-Gielis said in Search and Replace with PythonScript compared to built-in:

                    what is the usefulness of the lines with beginUndoAction, caret_position, gotoPos, endUndoAction: would you apply it too or not have these lines?

                    Regarding beginUndoAction and endUndoAction:

                    Suppose you have the following code:

                    editor.beginUndoAction()
                    editor.doSomething() # this can be undone with Ctrl+Z
                    editor.doSomethingElse() # also can be undone with Ctrl+Z
                    editor.doAnotherThing() # also can be undone with Ctrl+Z
                    editor.endUndoAction()
                    

                    The (begin/end)UndoAction calls wrapping this code ensure that a single Ctrl+Z undoes the entire block between those calls. Without them, the user would have to hit Ctrl+Z three times to undo that plugin command.

                    For example, if you have a plugin command that iterates through all the user’s selections, and changes each of those selections one by one, it is important to use (begin/end)UndoAction because otherwise the user will have to hit Ctrl+Z once for every selection to undo their call to the plugin command.

                    Regarding caret_position and gotoPos:

                    Some Scintilla calls move the caret somewhere unexpected. Saving the position first and calling editor.gotoPos afterwards helps alleviate that.
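The two habits discussed above (one undo step, then restoring the caret) can be bundled into a context manager. A minimal sketch: one_undo_step_keep_caret is a hypothetical name, and it relies only on beginUndoAction, endUndoAction, getCurrentPos, gotoPos and getLength, which already appear in this thread. Nesting it around code that itself calls begin/endUndoAction should still yield one undo step, since Scintilla undo groups nest.

```python
from contextlib import contextmanager


@contextmanager
def one_undo_step_keep_caret(ed):
    """Group every edit made in the with-block into one Ctrl+Z step, then put
    the caret back (clamped, because the document may have become shorter)."""
    caret = ed.getCurrentPos()
    ed.beginUndoAction()
    try:
        yield ed
    finally:
        ed.gotoPos(min(caret, ed.getLength()))
        ed.endUndoAction()


# Usage inside Notepad++ (PythonScript), e.g. several literal deletes at once:
# from Npp import editor
# with one_undo_step_keep_caret(editor):
#     for text in ("foo", "bar"):
#         editor.replace(text, "")
```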

                    • Wim Gielis @Mark Olson

                      @Mark-Olson

                      Thank you, then I’ll leave it in the code.

                      The Community of users of the Notepad++ text editor.