Search and Replace with PythonScript compared to built-in

Wim Gielis

Hello all,

I often work with large log files; the current one has 420K lines and 70 million characters. I understand that this is very big.

I was experimenting with a quick “delete text” PythonScript:

when text is selected under the cursor, delete that text in the entire file (literal delete, no regex)
if no text is selected, a prompt asks for the text to be deleted
- if the text starts with 0, it is a literal delete of what follows the 0
- if the text starts with 1, it is a regex delete of what follows the 1

Alt-d launches the script. It works very well and I am happy with the result.

Two questions remain, though, and AI tools could not help because they invented nonexistent methods:

the performance of Search/Replace with nothing (i.e. deleting) is much weaker in PythonScript than in Notepad++ itself. For example, deleting dates matching \d{4}-\d{2}-\d{2} takes 15 seconds with Notepad++ and 35 seconds with PythonScript. Do I have other options, like notepad.runMenuCommand("Search", "Replace")? I cannot get that to work in a way that sets the search strings and chooses regex or literal mode.
the Extended search mode in Notepad++ for \r, \n, \t, etc.: I think this is not exposed in the replace methods of PythonScript?

If anyone wants, I can provide the working script here. It is definitely faster than Find/Replace with nothing in the Notepad++ interface, especially when doing several deletes one after the other.

Thanks, best regards,

Wim

Alan Kilborn

@Wim-Gielis said :

deleted in 15 seconds with Notepad++, 35 seconds with PythonScript

This is a large difference; I would think it would not be so different.
Perhaps you’d better show a script.

do I have other options like notepad.runMenuCommand("Search", "Replace")?

Well, nothing that easy. You can, of course, gain access to the controls themselves and manipulate them via code…

the extended replacement method in Notepad++ for \r, \n, \t etc. I think that this is not exposed in the … methods in PythonScript ?

This is true. It isn’t needed. In fact, Extended only exists because it existed before regular expressions, and it just wasn’t removed with the advent of Regular expression mode.
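To illustrate why Extended mode isn't needed in a script: in a normal Python string literal, \r, \n and \t already become the real control characters, so a literal replace receives them directly; in a raw-string regex pattern, the regex engine interprets the escapes itself, as Regular expression mode does. A minimal pure-Python sketch (runnable outside Notepad++); in PythonScript the same strings could presumably be passed to editor.replace or editor.rereplace.

```python
import re

sample = "2025-02-01\tERROR\r\nnext line"

# Literal delete: the Python literal "\t" is already a real tab character,
# which is what Extended mode would build from the two characters \ and t.
literal_result = sample.replace("\t", "")

# Regex delete: in a raw string, r"\r\n" stays as backslash sequences and
# the regex engine turns them into CR and LF when matching.
regex_result = re.sub(r"\r\n", "", sample)

print(repr(literal_result))  # '2025-02-01ERROR\r\nnext line'
print(repr(regex_result))    # '2025-02-01\tERRORnext line'
```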

Wim Gielis

@Alan-Kilborn

Hello Alan,

Sure ! Here goes:

from Npp import editor, notepad


PREFIX_LITERAL = '0'
PREFIX_REGEX = '1'


# Wim Gielis
# Feb. 2025
# Support received from https://community.notepad-plus-plus.org
#
# DeleteText script (Alt-d):
#       - The selected text is deleted in the whole file
#       - If no text is selected then input is asked from the user
#       - Using this input, a regular expressions way of deleting can be asked:
#         * if the input is 1...... and it starts with 1 then the regex following 1 is used
#         * if the input is 0...... and it starts with 0 then the literal text following 0 is used
#       - This script works faster for the regular deletes (literal text),
#         but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
#       - The script allows for multi-selections, in which case it will be a literal delete of each selected text.


# Options for delete_mode (when single selection is done):
# delete_mode = -1 ==> no delete
# delete_mode =  0 ==> a literal delete
# delete_mode =  1 ==> a regex delete

# Get the selected area(s) and the selected text(s)
num_selections = editor.getSelections()
if num_selections == 1:
    delete_mode = 0
    selected_text = editor.getSelText()

    # If nothing is selected, ask the user for the text to delete (literal or regex)
    if not selected_text:
        # Ask the user for input
        selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix 1 for a regex delete, prefix 0 or no prefix for a literal delete)", "User input", "")
        if selected_text is None:
            # the user canceled the prompt
            delete_mode = -1
        elif not selected_text:
            # the user entered nothing
            delete_mode = -1
        elif selected_text.startswith(PREFIX_LITERAL):
            # prefix 0 means a literal delete
            delete_mode = 0
            selected_text = selected_text[1:]
        elif selected_text.startswith(PREFIX_REGEX):
            # prefix 1 means a regex delete
            delete_mode = 1
            selected_text = selected_text[1:]
        else:
            # default case is a literal delete
            delete_mode = 0

    if delete_mode > -1:

        # Start an undo action for a single undo step
        editor.beginUndoAction()

        # Get the current position of the caret to avoid losing position after replacement
        caret_position = editor.getCurrentPos()

        # There is no Extended mode here, and for large files this is slower than the built-in Search/Replace functionality
        # Replace all occurrences of the selected text with an empty string
        if delete_mode == 0:
            editor.replace(selected_text, "")
        elif delete_mode == 1:
            editor.rereplace(selected_text, "")

        # Move caret back to its original position
        editor.gotoPos(caret_position)

        # End the undo action
        editor.endUndoAction()

else:
    selected_texts = []

    # Multiple selections are done
    for i in range(num_selections):
        # Get each time the selected text
        start = editor.getSelectionNStart(i)
        end = editor.getSelectionNEnd(i)
        if end > start:
            selected_text = editor.getTextRange(start, end)
            if selected_text:
                selected_texts.append(selected_text)
        
    for selected_text in selected_texts:
        # Delete the found text(s) in the whole file
        # (see above for explanations of the code lines)
        editor.beginUndoAction()
        caret_position = editor.getCurrentPos()
        editor.replace(selected_text, "")
        editor.gotoPos(caret_position)
        editor.endUndoAction()

Mark Olson

In PythonScript, suppose you want to replace the pattern (?i)^f\w+ with XXX.

The normal (and most concise) way in PythonScript to do this would be editor.rereplace(r'(?i)^f\w+', 'XXX').

However, you can also import re, slurp up all the text of the file into Python-land, and use re.sub to do all of these replacements. This is much faster; in a test file consisting of 160,000 lines where each line is all random word characters, the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.

Below is a complete script that uses re.sub instead of editor.rereplace.

import re # standard Python regex library
from Npp import editor

text = editor.getText()
# do regex-replace
# raw string so that \w reaches the regex engine unchanged;
# the `m` flag makes `^` match at the start of every line, not just at the start of the file
REGEX = r'(?mi)^f\w+'
REPL = 'XXX'
replaced = re.sub(REGEX, REPL, text)
# set the text in the file to the regex-replaced text
editor.setText(replaced)

So in summary, here are the pros and cons of the re.sub version:
PROS:

about 10 times faster (but the ratio can vary a fair bit)

That’s the only pro that I can think of TBH.

CONS:

Has to use the re flavor of regular expressions, which has several annoying deficiencies compared to the Boost::Regex flavor that Notepad++ uses.
Because all the changes are made inside Python and the entire file is then overwritten with editor.setText, the Change History feature of Notepad++ will show the entire file as modified when you use re.sub, whereas editor.rereplace lets Change History correctly mark only the lines that were actually changed.
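One way to soften the Change History drawback (an assumption, not something tested in this thread): re.subn also returns the number of replacements, so the text only needs to be written back when something actually matched. The helper name delete_regex is hypothetical; it is plain Python so it can be tried outside Notepad++.

```python
import re

def delete_regex(text, pattern):
    """Return (new_text, count) after deleting every match of pattern.

    Hypothetical helper: in PythonScript you would call editor.setText(new_text)
    only when count > 0, so a pattern with no matches leaves the document
    (and its Change History) untouched.
    """
    new_text, count = re.subn(pattern, "", text)
    return new_text, count

log = "2025-02-01 start\n2025-02-02 stop\n"
new_log, n = delete_regex(log, r"\d{4}-\d{2}-\d{2} ")
print(n)              # 2
print(repr(new_log))  # 'start\nstop\n'

unchanged, n = delete_regex(log, r"(?m)^f\w+")
print(n)              # 0
```

Inside Notepad++ the call would look something like `new_text, count = delete_regex(editor.getText(), pattern)` followed by `if count: editor.setText(new_text)`.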

Wim Gielis

Hello @Mark-Olson,

Thank you very much !

I think that my strategy will be:

check the size of the document
if smallish, proceed with my code above
if bigger, proceed with your code above

I am not particularly worried about regex dialects. Personally, I use regex quite often but not the most advanced tools inside regex.
Hence I would expect both variants to give me the same results for my needs.

Thanks !
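Even with simple patterns, a couple of differences between Python's re and the Boost flavor in Notepad++ are worth knowing. Two that are easy to check: in re, ^ only matches at the start of the whole string unless the m flag is set (Notepad++ matches it at every line start, as in Mark's example), and Boost escapes such as \h (horizontal whitespace) are rejected by re in Python 3.7 and later. A small pure-Python check:

```python
import re

text = "foo one\nfar two\n"

# Without (?m), Python's ^ anchors only at the start of the string.
print(re.findall(r"^f\w+", text))      # ['foo']
# With (?m), ^ anchors at every line start, like Notepad++ does.
print(re.findall(r"(?m)^f\w+", text))  # ['foo', 'far']

# \h is valid in Boost (Notepad++) but is a "bad escape" in Python's re.
try:
    re.compile(r"\h+")
    h_supported = True
except re.error:
    h_supported = False
print(h_supported)                     # False
```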

Alan Kilborn

@Mark-Olson said:

the version using re.sub took 0.04 seconds while the editor.rereplace version took about 0.53 seconds.

I’d guess that what slows the PythonScript version down is that each change causes Notepad++ to send a bunch of notifications and do other “administrative” work, whereas this doesn't really happen (well, except once) in the Python “slurp it up and put it all back” approach.

Wim Gielis

Hello all,

Proud to present the adapted code. It works very, very fast, also on my log file containing 420K lines.

My only 2 questions:

what do you think of the cutoff value of 1 MB (1_000_000 bytes)? How would you set it?
what is the usefulness of the lines with beginUndoAction, caret_position, gotoPos and endUndoAction? Would you keep them too, or drop these lines?

This tool is going to save a lot of time :-) Sharing to the community with anyone interested.

from Npp import editor, notepad
import re


# Wim Gielis
# Feb. 2025
#
# DeleteText script (Alt-d):
#       - The selected text is deleted in the whole file
#       - If no text is selected then input is asked from the user. Using this input, a regular expressions way of deleting can be asked:
#         * if the input starts with 're' then the regex following 're' is deleted everywhere
#         * all other input is treated as literal text
#       - This script works faster for the regular deletes (literal text),
#         but also for regex, it is faster and has the added benefit that the regex checkbox is not activated in the Find/Replace dialog window.
#       - The script allows for multi-selections, in which case it will be a literal delete of each selected text.
#       - For bigger files (above 1 MB), regex replacements through Notepad++ can be slower. In that case, we use the re module from Python.
#       - Options for delete_mode when single selection delete is done:
#         * delete_mode = -1 ==> no delete
#         * delete_mode =  0 ==> a literal delete
#         * delete_mode =  1 ==> a regex delete
#       - Options for delete_mode when multi-selections delete is done:
#         * delete_mode =  0 ==> a literal delete
#       - Documentation and links:
#         * https://community.notepad-plus-plus.org/topic/26620/search-and-replace-with-pythonscript-compared-to-built-in
#         * https://npppythonscript.sourceforge.net/docs/latest/scintilla.html


PREFIX_REGEX = 're'


def delete_text(text_pattern: str, delete_mode: int) -> None:

    # delete_mode = 0 ==> literal delete so a literal replace is needed
    # delete_mode = 1 ==> regex delete so a regex replace is needed

    if editor.getLength() < 1_000_000:

        # Start an undo action for a single undo step
        editor.beginUndoAction()

        # Get the current position of the caret to avoid losing position after replacement
        caret_position = editor.getCurrentPos()

        if delete_mode == 0:
            editor.replace(text_pattern, "")

        elif delete_mode == 1:
            editor.rereplace(text_pattern, "")

        # Move caret back to its original position
        editor.gotoPos(caret_position)

        # End the undo action
        editor.endUndoAction()

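    else:
        # print("File size too big, switching to the Python re module: " + str(editor.getLength()))
        # setText replaces the whole document and moves the caret, so remember where it was
        caret_position = editor.getCurrentPos()
        full_text = editor.getText()

        if delete_mode == 0:
            editor.setText(full_text.replace(text_pattern, ""))

        elif delete_mode == 1:
            editor.setText(re.sub(text_pattern, "", full_text))

        # Move caret back to its original position
        editor.gotoPos(caret_position)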

# Get the selected area(s) and the selected text(s)
num_selections = editor.getSelections()
if num_selections == 1:
    delete_mode = 0
    selected_text = editor.getSelText()

    # If nothing is selected, ask the user for the text to delete (literal or regex)
    if not selected_text:
        # Ask the user for input
        selected_text = notepad.prompt("Please enter the text to be deleted from the active file\n(prefix with re for a regex delete, otherwise a literal delete)", "User input", "")

        if selected_text is None:
            # the user canceled the prompt
            delete_mode = -1
        elif not selected_text:
            # the user entered nothing
            delete_mode = -1
        elif selected_text.startswith(PREFIX_REGEX):
            # prefix 're' means a regex delete
            delete_mode = 1
            selected_text = selected_text[len(PREFIX_REGEX):]
        else:
            # default case is a literal delete
            delete_mode = 0

    if delete_mode > -1:
        delete_text(selected_text, delete_mode)

else:
    selected_texts = []

    # Multiple selections are done
    for i in range(num_selections):

        # Get each time the selected text
        start = editor.getSelectionNStart(i)
        end = editor.getSelectionNEnd(i)
        if end > start:
            selected_text = editor.getTextRange(start, end)
            if selected_text:
                selected_texts.append(selected_text)

    for selected_text in selected_texts:
        delete_text(selected_text, 0)
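A side effect of the 're' prefix is that literal text which happens to start with those letters (for example "result") is read as a regex. A hypothetical pure-Python version of the prompt-parsing step makes this easy to test; the function name parse_delete_input and the alternative prefix 're:' are assumptions for illustration, not part of the script above.

```python
def parse_delete_input(raw, prefix="re"):
    """Map prompt input to (delete_mode, pattern), mirroring the script above.

    delete_mode: -1 = no delete, 0 = literal delete, 1 = regex delete.
    """
    if not raw:  # None (prompt cancelled) or empty string
        return -1, ""
    if raw.startswith(prefix):
        return 1, raw[len(prefix):]
    return 0, raw

# With the original 're' prefix, a literal word can be misread as a regex:
print(parse_delete_input("result"))                 # (1, 'sult')
# A more distinctive prefix such as 're:' avoids that:
print(parse_delete_input("result", prefix="re:"))   # (0, 'result')
print(parse_delete_input("re:\\d+", prefix="re:"))  # (1, '\\d+')
print(parse_delete_input(None))                     # (-1, '')
```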

Mark Olson

@Wim-Gielis said in Search and Replace with PythonScript compared to built-in:

what is the usefullness of the lines with beginUndoAction, caret_position, gotoPos, endUndoAction: would you apply it too or not have these lines ?

Regarding beginUndoAction and endUndoAction:

Suppose you have the following code:

editor.beginUndoAction()
editor.doSomething() # this can be undone with Ctrl+Z
editor.doSomethingElse() # also can be undone with Ctrl+Z
editor.doAnotherThing() # also can be undone with Ctrl+Z
editor.endUndoAction()

The (begin/end)UndoAction calls wrapping this code ensure that when the user hits Ctrl+Z, that undoes the entire block between those calls. If you didn’t have those calls, the user would have to hit Ctrl+Z three times to undo that plugin command.

For example, if you have a plugin command that iterates through all the user’s selections, and changes each of those selections one by one, it is important to use (begin/end)UndoAction because otherwise the user will have to hit Ctrl+Z once for every selection to undo their call to the plugin command.
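The begin/end pairing can also be wrapped in a context manager so that endUndoAction still runs if something inside the block raises an exception (otherwise the undo group would be left open). A minimal sketch, assuming only that ed has beginUndoAction() and endUndoAction() methods, as PythonScript's editor object does; undo_group is a hypothetical name, and a fake editor stands in so it can be tried outside Notepad++.

```python
from contextlib import contextmanager

@contextmanager
def undo_group(ed):
    """Group every edit made inside the with-block into one undo step."""
    ed.beginUndoAction()
    try:
        yield ed
    finally:
        # Always close the group, even if an edit raised an exception.
        ed.endUndoAction()

class FakeEditor:
    """Stand-in that records calls; in Notepad++ you would pass `editor`."""
    def __init__(self):
        self.calls = []
    def beginUndoAction(self):
        self.calls.append("begin")
    def endUndoAction(self):
        self.calls.append("end")
    def replace(self, search, repl):
        self.calls.append(("replace", search, repl))

fake = FakeEditor()
with undo_group(fake):
    for text in ["foo", "bar"]:  # e.g. one literal delete per selection
        fake.replace(text, "")
print(fake.calls)
# ['begin', ('replace', 'foo', ''), ('replace', 'bar', ''), 'end']
```

With the multi-selection loop placed inside one `with undo_group(editor):` block, a single Ctrl+Z would undo all the deletes at once.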

Regarding caret_position and gotoPos:

Some Scintilla calls move the caret to an unexpected position. Saving it with editor.getCurrentPos() beforehand and restoring it with editor.gotoPos afterwards helps alleviate that.
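The save-and-restore pattern can be packaged the same way. A sketch assuming ed provides getCurrentPos(), getLength() and gotoPos(pos), which PythonScript's editor does; preserve_caret and the fake editor are hypothetical, and the fake's setText simply models the caret landing at the start of the document.

```python
from contextlib import contextmanager

@contextmanager
def preserve_caret(ed):
    """Put the caret back where it was once the with-block finishes."""
    pos = ed.getCurrentPos()
    try:
        yield ed
    finally:
        # Deleting text can make the document shorter than the old position.
        ed.gotoPos(min(pos, ed.getLength()))

class FakeEditor:
    """Minimal stand-in for PythonScript's editor, for testing only."""
    def __init__(self, text, pos):
        self.text, self.pos = text, pos
    def getCurrentPos(self):
        return self.pos
    def getLength(self):
        return len(self.text)
    def gotoPos(self, pos):
        self.pos = pos
    def setText(self, text):
        # Assumed behaviour: replacing the whole text leaves the caret at 0.
        self.text, self.pos = text, 0

fake = FakeEditor("aaa bbb ccc", pos=8)
with preserve_caret(fake):
    fake.setText(fake.text.replace("bbb ", ""))
print(fake.pos)  # 7
```

Like the script, this restores the absolute position rather than tracking the text around the caret, so if text before the caret was deleted, the caret ends up a little further along in the remaining content.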

Wim Gielis

@Mark-Olson

Thank you, then I'll leave them in the code.