Search and Replace with PythonScript compared to built-in
-
Hello all,
I often work with large log files, currently with a file of 420K lines and 70 million characters. I understand that this is very big.
I was experimenting with a quick “delete text” PythonScript:
- when text is selected under the cursor, delete that text in the entire file (literal delete, no regex)
- if no text is selected, a prompt asks for the text to be deleted
- if the text starts with 0, it is a literal delete of what follows the 0
- if the text starts with 1, it is a regex delete of what follows the 1
Alt-d launches the script. It works very well and I am happy with the result.
Two questions remain, though, and AI tools could not help because they invented non-existent methods:
- Replacing with nothing is much slower in PythonScript than in Notepad++ itself. For example, dates like \d{4}-\d{2}-\d{2} are deleted in 15 seconds with Notepad++ but take 35 seconds with PythonScript. Do I have other options, like notepad.runMenuCommand("Search", "Replace")? I cannot get it to set the search strings or to choose between regex and literal mode.
- The Extended search mode in Notepad++ (for \r, \n, \t, etc.): I think this is not exposed in the replace methods in PythonScript?
If anyone wants, I can provide the working script here. It is definitely faster than doing Find/Replace with nothing through the Notepad++ interface, especially with multiple deletes one after the other.
Thanks, best regards,
Wim
-
@Wim-Gielis said :
deleted in 15 seconds with Notepad++, 35 seconds with PythonScript
This is a large difference; I would think it would not be so different.
Perhaps you’d better show a script.
do I have other options like notepad.runMenuCommand("Search", "Replace") ?
Well, nothing that easy. You can, of course, gain access to the controls themselves and manipulate them via code…
the extended replacement method in Notepad++ for \r, \n, \t etc. I think that this is not exposed in the … methods in PythonScript ?
This is true. It isn’t needed. In fact, Extended only exists because it existed before regular expressions, and it just wasn’t removed with the advent of Regular expression mode.
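As a small illustration of that point: regex mode already understands escapes such as \t, \r and \n, so nothing from Extended mode is lost. The sketch below uses Python's re module so that it runs outside Notepad++; as far as I know, the same two patterns can be passed to editor.rereplace inside PythonScript. The helper name strip_tabs_and_blank_lines and the sample text are made up for the example.

```python
import re


def strip_tabs_and_blank_lines(text):
    """Hypothetical helper: replace tabs by spaces and squeeze runs of empty CRLF lines."""
    # \t is understood directly by the regex engine
    text = re.sub(r'\t', ' ', text)
    # two or more consecutive CRLF line ends collapse into a single one
    text = re.sub(r'(?:\r\n){2,}', '\r\n', text)
    return text


sample = "a\tb\r\n\r\n\r\nc\td\r\n"
cleaned = strip_tabs_and_blank_lines(sample)
print(repr(cleaned))
```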
-
Hello Alan,
Sure ! Here goes:
from Npp import editor, notepad

PREFIX_LITERAL = '0'
PREFIX_REGEX = '1'

# Wim Gielis
# Feb. 2025
# Support received from https://community.notepad-plus-plus.org
#
# DeleteText script (Alt-d):
# - The selected text is deleted in the whole file
# - If no text is selected then input is asked from the user
# - Using this input, a regular expressions way of deleting can be asked:
#   * if the input is 1...... and it starts with 1 then the regex following 1 is used
#   * if the input is 0...... and it starts with 0 then the literal text following 0 is used
# - This script works faster for the regular deletes (literal text),
#   but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
# - The script allows for multi-selections, in which case it will be a literal delete of each selected text.

# Options for delete_mode (when single selection is done):
# delete_mode = -1 ==> no delete
# delete_mode = 0 ==> a literal delete
# delete_mode = 1 ==> a regex delete

# Get the selected area(s) and the selected text(s)
num_selections = editor.getSelections()

if num_selections == 1:

    delete_mode = 0
    selected_text = editor.getSelText()

    # Ensure there is some selected text (otherwise we ask the user for input)
    if not selected_text:
        # Ask the user for input
        selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix 1 for a regex delete, otherwise a literal delete)", "User input", "")

        if selected_text is None:
            # the user canceled the prompt
            delete_mode = -1
        elif not selected_text:
            # the user entered nothing
            delete_mode = -1
        elif selected_text.startswith(PREFIX_LITERAL):
            # prefix 0 means a literal delete
            delete_mode = 0
            selected_text = selected_text[1:]
        elif selected_text.startswith(PREFIX_REGEX):
            # prefix 1 means a regex delete
            delete_mode = 1
            selected_text = selected_text[1:]
        else:
            # default case is a literal delete
            delete_mode = 0

    if delete_mode > -1:

        # Start an undo action for a single undo step
        editor.beginUndoAction()

        # Get the current position of the caret to avoid losing position after replacement
        caret_position = editor.getCurrentPos()

        # Here we have not extended delete, and for large files it is slower than the builtin Search/Replace functionality
        # Replace all occurrences of the selected text with an empty string
        if delete_mode == 0:
            editor.replace(selected_text, "")
        elif delete_mode == 1:
            editor.rereplace(selected_text, "")

        # Move caret back to its original position
        editor.gotoPos(caret_position)

        # End the undo action
        editor.endUndoAction()

else:

    selected_texts = []

    # Multiple selections are done
    for i in range(num_selections):
        # Get each time the selected text
        start = editor.getSelectionNStart(i)
        end = editor.getSelectionNEnd(i)
        if end > start:
            selected_text = editor.getTextRange(start, end)
            if selected_text:
                selected_texts.append(selected_text)

    for selected_text in selected_texts:
        # Delete the found text(s) in the whole file
        # (see above for explanations of the code lines)
        editor.beginUndoAction()
        caret_position = editor.getCurrentPos()
        editor.replace(selected_text, "")
        editor.gotoPos(caret_position)
        editor.endUndoAction()
-
In PythonScript, suppose you want to replace the pattern (?i)^f\w+ with XXX.
The normal (and most concise) way in PythonScript to do this would be
editor.rereplace(r'(?i)^f\w+', 'XXX')
However, you can also import re, slurp up all the text of the file into Python-land, and use re.sub to do all of these replacements. This is much faster; in a test file consisting of 160,000 lines where each line is all random word characters, the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.
Below is a complete script that uses re.sub instead of editor.rereplace.
import re  # standard Python regex library
from Npp import editor

text = editor.getText()
# do regex-replace
# note that we have to set the `m` flag so that `^` matches at the start of every line, not just the start of the file
REGEX = r'(?mi)^f\w+'
REPL = 'XXX'
replaced = re.sub(REGEX, REPL, text)
# set the text in the file to the regex-replaced text
editor.setText(replaced)
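To see why the m flag matters in the re.sub version, here is a minimal sketch that runs in plain Python, outside Notepad++. The sample text is made up; the two patterns are the ones discussed above, with and without the m flag.

```python
import re

# Made-up sample text: three of the four lines start with "f" or "F"
text = "foo one\nFig two\nbar three\nfalse four"

# Without the m flag, ^ matches only at the very start of the string
without_m = re.sub(r'(?i)^f\w+', 'XXX', text)

# With the m flag, ^ matches at the start of every line
with_m = re.sub(r'(?mi)^f\w+', 'XXX', text)

print(without_m)
print(with_m)
```

Only the first line changes without the flag, whereas all three matching lines change with it.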
So in summary, here are the pros and cons of the re.sub version:
PROS:
- about 10 times faster (but the ratio can vary a fair bit)
That’s the only pro that I can think of TBH.
CONS:
- Has to use the re flavor of regular expressions, which has several annoying deficiencies compared to the Boost::Regex flavor that Notepad++ uses.
- Because all the changes are made inside of Python and the entire file is overwritten with the result of re.sub, the Change History feature of Notepad++ will show the entire file as being modified when you use re.sub, whereas editor.rereplace will cause the Change History feature to correctly show only the lines that were actually modified as being modified.
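One flavor difference that may matter for log files with Windows line endings: in Python's re module, $ with the m flag matches just before \n only, so a \r\n line end leaves the \r in the way. As far as I know, Notepad++'s own regex mode treats \r\n as a single line end here. The sketch below shows the Python behaviour and one possible workaround; the sample text is made up.

```python
import re

# Made-up sample with Windows (CRLF) line endings
text = "error foo\r\nok\r\n"

# Naive pattern: $ sits before \n, but " foo" is followed by \r, so nothing matches
naive = re.sub(r'(?m) foo$', '', text)

# Workaround: allow an optional \r before the end of the line
crlf_aware = re.sub(r'(?m) foo(?=\r?$)', '', text)

print(repr(naive))
print(repr(crlf_aware))
```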
-
Hello @Mark-Olson,
Thank you very much !
I think that my strategy will be:
- check the size of the document
- if smallish, proceed with my code above
- if bigger, proceed with your code above
I am not particularly worried about regex dialects. Personally, I use regex quite often, but not its most advanced features.
Hence I would think that, given my needs, every variant would give me the same results and possibilities.
Thanks !
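The size-based strategy listed above can be sketched as a small dispatcher. This is only an illustration: the function name delete_with_strategy, its parameters and the default cutoff of 1000000 characters are assumptions made for the example, and the two callables stand in for the editor-based and re-based code paths.

```python
def delete_with_strategy(doc_length, small_file_delete, large_file_delete, cutoff=1000000):
    """Hypothetical dispatcher: pick a delete routine based on document size.

    doc_length        -- size of the document (inside Notepad++: editor.getLength())
    small_file_delete -- callable used below the cutoff (e.g. editor.replace based)
    large_file_delete -- callable used at or above the cutoff (e.g. re.sub based)
    cutoff            -- assumed threshold; tune it by timing on your own files
    """
    if doc_length < cutoff:
        return small_file_delete()
    return large_file_delete()


# Stand-in callables so the sketch runs outside Notepad++
small_choice = delete_with_strategy(5000, lambda: "editor", lambda: "python")
large_choice = delete_with_strategy(70000000, lambda: "editor", lambda: "python")
print(small_choice, large_choice)
```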
-
@Mark-Olson said:
the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.
I’d guess that what slows the editor.rereplace version down is that each individual change triggers a bunch of notifications and other “administrative” work in Notepad++, whereas in the Python “slurp it up and put it all back” approach this happens only once.
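One way to check a guess like this on your own files is to time both approaches. Below is a minimal timing sketch; the helper name timed and the generated sample text are made up, and it only times re.sub so that it can run outside Notepad++.

```python
import re
from timeit import default_timer  # available in both Python 2 and Python 3


def timed(label, func):
    """Hypothetical helper: run func once, print and return (result, elapsed seconds)."""
    start = default_timer()
    result = func()
    elapsed = default_timer() - start
    print("%s: %.3f s" % (label, elapsed))
    return result, elapsed


# Made-up log-like sample: 10,000 lines that each start with a date
sample = "\n".join("2025-02-%02d some log text" % (i % 28 + 1) for i in range(10000))

result, seconds = timed("re.sub", lambda: re.sub(r'\d{4}-\d{2}-\d{2}', '', sample))
```

Inside Notepad++, the same helper could wrap a call such as `lambda: editor.rereplace(r'\d{4}-\d{2}-\d{2}', '')` to compare the two on the same document.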
-
Hello all,
Proud to present the adapted code. It works very very fast, also in my log file containing 420K lines.
My only 2 questions:
- what do you think of the cutoff value of 1 MB (1_000_000 bytes): how would you set it ?
- what is the usefulness of the lines with beginUndoAction, caret_position, gotoPos, endUndoAction: would you apply it too or not have these lines ?
This tool is going to save a lot of time :-) Sharing it with the community for anyone interested.
from Npp import editor, notepad
import re

# Wim Gielis
# Feb. 2025
#
# DeleteText script (Alt-d):
# - The selected text is deleted in the whole file
# - If no text is selected then input is asked from the user. Using this input, a regular expressions way of deleting can be asked:
#   * if the input is re...... and it starts with 're' then the regex following 're' is deleted everywhere
#   * all other input is treated as literal text
# - This script works faster for the regular deletes (literal text),
#   but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
# - The script allows for multi-selections, in which case it will be a literal delete of each selected text.
# - For bigger files (above 1 MB), regex replacements through Notepad++ can be slower. In that case, we use the re module from Python.
# - Options for delete_mode when single selection delete is done:
#   * delete_mode = -1 ==> no delete
#   * delete_mode = 0 ==> a literal delete
#   * delete_mode = 1 ==> a regex delete
# - Options for delete_mode when multi-selections delete is done:
#   * delete_mode = 0 ==> a literal delete
# - Documentation and links:
#   * https://community.notepad-plus-plus.org/topic/26620/search-and-replace-with-pythonscript-compared-to-built-in
#   * https://npppythonscript.sourceforge.net/docs/latest/scintilla.html

PREFIX_REGEX = 're'


def delete_text(text_pattern: str, delete_mode: int) -> None:
    # delete_mode = 0 ==> literal delete so a literal replace is needed
    # delete_mode = 1 ==> regex delete so a regex replace is needed

    if editor.getLength() < 1_000_000:

        # Start an undo action for a single undo step
        editor.beginUndoAction()

        # Get the current position of the caret to avoid losing position after replacement
        caret_position = editor.getCurrentPos()

        if delete_mode == 0:
            editor.replace(text_pattern, "")
        elif delete_mode == 1:
            editor.rereplace(text_pattern, "")

        # Move caret back to its original position
        editor.gotoPos(caret_position)

        # End the undo action
        editor.endUndoAction()

    else:

        # print("File size too big and switching to the Python re module: " + str(editor.getLength()))
        full_text = editor.getText()
        if delete_mode == 0:
            editor.setText(full_text.replace(text_pattern, ""))
        elif delete_mode == 1:
            editor.setText(re.sub(text_pattern, "", full_text))


# Get the selected area(s) and the selected text(s)
num_selections = editor.getSelections()

if num_selections == 1:

    delete_mode = 0
    selected_text = editor.getSelText()

    # Ensure there is some selected text (otherwise we ask the user for input)
    if not selected_text:
        # Ask the user for input
        selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix re for a regex delete, otherwise a literal delete)", "User input", "")

        if selected_text is None:
            # the user canceled the prompt
            delete_mode = -1
        elif not selected_text:
            # the user entered nothing
            delete_mode = -1
        elif selected_text.startswith(PREFIX_REGEX):
            # prefix 're' means a regex delete
            delete_mode = 1
            selected_text = selected_text[len(PREFIX_REGEX):]
        else:
            # default case is a literal delete
            delete_mode = 0

    if delete_mode > -1:
        delete_text(selected_text, delete_mode)

else:

    selected_texts = []

    # Multiple selections are done
    for i in range(num_selections):
        # Get each time the selected text
        start = editor.getSelectionNStart(i)
        end = editor.getSelectionNEnd(i)
        if end > start:
            selected_text = editor.getTextRange(start, end)
            if selected_text:
                selected_texts.append(selected_text)

    for selected_text in selected_texts:
        delete_text(selected_text, 0)
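The prompt handling in the script can also be pulled out into a pure function, which makes it easy to try out without Notepad++. The sketch below is an assumption-laden variant, not the script's own code: the function name parse_delete_request and the three mode constants are made up, and unlike the script it treats a bare 're' with nothing after it as "no delete".

```python
PREFIX_REGEX = 're'

# Hypothetical names for the three delete_mode values used in the script
NO_DELETE, LITERAL_DELETE, REGEX_DELETE = -1, 0, 1


def parse_delete_request(user_input, prefix=PREFIX_REGEX):
    """Return (delete_mode, pattern) for the text typed into the prompt."""
    if not user_input:
        # None (prompt canceled) or an empty string
        return NO_DELETE, ''
    if user_input.startswith(prefix):
        pattern = user_input[len(prefix):]
        # Assumption: an empty regex is treated as "nothing to delete"
        return (REGEX_DELETE, pattern) if pattern else (NO_DELETE, '')
    return LITERAL_DELETE, user_input


print(parse_delete_request(r're\d{4}-\d{2}-\d{2}'))
print(parse_delete_request('ERROR'))
print(parse_delete_request('return'))
```

The last call shows a side effect of using 're' as the prefix: literal input that happens to start with "re", such as "return", is read as the regex "turn". Selecting the text in the document instead of typing it avoids this, since selections are always deleted literally.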
-
@Wim-Gielis said in Search and Replace with PythonScript compared to built-in:
what is the usefulness of the lines with beginUndoAction, caret_position, gotoPos, endUndoAction: would you apply it too or not have these lines ?
Regarding beginUndoAction and endUndoAction:
Suppose you have the following code:
editor.beginUndoAction()
editor.doSomething()      # this can be undone with Ctrl+Z
editor.doSomethingElse()  # also can be undone with Ctrl+Z
editor.doAnotherThing()   # also can be undone with Ctrl+Z
editor.endUndoAction()
The (begin/end)UndoAction calls wrapping this code ensure that when the user hits Ctrl+Z, that undoes the entire block between those calls. If you didn’t have those calls, the user would have to hit Ctrl+Z three times to undo that plugin command.
For example, if you have a plugin command that iterates through all the user’s selections, and changes each of those selections one by one, it is important to use (begin/end)UndoAction because otherwise the user will have to hit Ctrl+Z once for every selection to undo their call to the plugin command.
Regarding caret_position and gotoPos:
Some Scintilla calls snap the position of the caret to somewhere unexpected. editor.gotoPos helps alleviate that.
-
Thank you, then I will leave it in the code.
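For anyone who wants to reuse the undo and caret handling discussed above, it could be packaged as a context manager. This is a minimal sketch under assumptions: the name single_undo_step is made up, and FakeEditor is a hypothetical stand-in that only records calls so the example runs outside Notepad++. Inside PythonScript you would pass the real editor object instead.

```python
from contextlib import contextmanager


@contextmanager
def single_undo_step(ed):
    """Group all edits in the with-block into one undo step and restore the caret."""
    caret_position = ed.getCurrentPos()
    ed.beginUndoAction()
    try:
        yield ed
    finally:
        # Runs even if an edit raises, so the undo action is always closed
        ed.gotoPos(caret_position)
        ed.endUndoAction()


class FakeEditor(object):
    """Hypothetical stand-in for the PythonScript editor object; it only records calls."""

    def __init__(self):
        self.calls = []
        self.pos = 42

    def getCurrentPos(self):
        return self.pos

    def gotoPos(self, pos):
        self.pos = pos
        self.calls.append('gotoPos(%d)' % pos)

    def beginUndoAction(self):
        self.calls.append('beginUndoAction')

    def endUndoAction(self):
        self.calls.append('endUndoAction')

    def replace(self, search, replacement):
        self.pos = 0  # pretend the replace moved the caret
        self.calls.append('replace')


fake = FakeEditor()
with single_undo_step(fake) as ed:
    ed.replace("foo", "")
    ed.replace("bar", "")
print(fake.calls)
```

Wrapping a whole multi-selection loop in one such block would make the entire command undoable with a single Ctrl+Z.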