Unexpected results when doing multiple complicated regex replace in quick succession on several thousand lines



  • Hello,

    Apologies for the rather long topic title, but I wanted to be specific. I’ve tried searching for people with a similar issue, but was unable to find anything. I suspect that is partly because I’m not sure if there’s a specific term to describe the issue I’m having. Anyway, I’ll move onto the actual topic.

    Using the ‘Python Script’ plugin I’ve written a script that performs multiple complicated regex search and replace operations on a document using the editor.rereplace method. Documentation on that method can be found at http://npppythonscript.sourceforge.net/docs/latest/scintilla.html?highlight=rereplace#Editor.rereplace. I’m especially interested in the last paragraph, which I’ll quote here:

    An small point to note, is that the replacements are first searched, and then all replacements are made. This is done for performance and reliability reasons. Generally this will have no side effects, however there may be cases where it makes a difference. (Author’s note: If you have such a case, please post a note on the forums such that it can be added to the documentation, or corrected).

    The issue I’m experiencing is that, as far as I’ve been able to tell, each individual operation works as intended when run on its own. However, running the script and firing those operations off in quick succession produces unexpected results a few thousand lines into the document.

    My working theory is that because of the documented implementation of replace operations, perhaps the next regex operation is trying to match against a document that hasn’t been (fully) processed by the previous operation.

    I’m working on a test for this hypothesis: buffer the entire document in a variable, run the replacements on it with Python’s re module, and then write the variable back to the document. But because Python’s regex engine differs from NPP’s, there’s some work involved in adjusting my regexes, so for now I don’t know whether this actually is the issue. For the record, I’d much prefer using NPP’s regex engine :)
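
    A minimal sketch of that buffered test, assuming plain Python re syntax. The rules and the sample text are hypothetical placeholders, not the actual regexes from the script. The commented-out line at the end assumes the Python Script plugin’s editor.getText() and editor.setText() calls for reading and writing the buffer.

```python
import re

def apply_in_order(text, rules):
    """Apply each (pattern, replacement) pair to the whole buffer, one after another.

    Each re.sub call returns only after every match has been replaced,
    so the next rule always sees the fully processed text.
    """
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text

# Hypothetical rules, for illustration only.
rules = [
    (r"[ \t]+$", ""),              # strip trailing whitespace on each line
    (r"^(\w+)=(\w+)$", r"\2=\1"),  # swap key and value on key=value lines
]

sample = "foo=bar  \nbaz=qux\t\n"
result = apply_in_order(sample, rules)
print(repr(result))

# Inside Notepad++ (assumption: Python Script plugin) this would presumably become:
#   editor.setText(apply_in_order(editor.getText(), rules))
```

    If the output of this version matches what each operation produces individually, that would suggest the regexes themselves are fine and the ordering of operations is worth a closer look.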

    I was wondering if other people have had similar experiences and if so, what was your solution? Is there perhaps a way to check if the editor object has completely finished its search & replace operation? Or a way to force it to complete any pending replace operations before starting matching on the next? Would a time.sleep() in my script allow the editor time to fully process a search & replace before I send the next? Or am I maybe barking up the wrong tree entirely?



  • @Spartelfant

    As far as I understand your findings, I haven’t run into such behavior yet.
    And I’m a bit unsure whether this is a Python Script issue at all.
    My understanding is that rereplace first searches, then replaces at all the positions it found,
    and only then returns.
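
    To illustrate what “search first, then replace” means, here is a rough sketch in plain Python. It is only an illustration of the idea under my own assumptions, not the plugin’s actual implementation, and two_phase_replace is a hypothetical helper name.

```python
import re

def two_phase_replace(text, pattern, template):
    """Phase 1: collect every match against the untouched text.
    Phase 2: substitute them, working backwards so earlier offsets stay valid.
    """
    matches = list(re.finditer(pattern, text))
    for m in reversed(matches):
        text = text[:m.start()] + m.expand(template) + text[m.end():]
    return text

# Replaced text is never rescanned within a single operation:
# each of the three original "a" characters is replaced exactly once.
doubled = two_phase_replace("aaa", r"a", "aa")
print(doubled)
```

    The function is synchronous: by the time it returns, every replacement has been made, so a following operation would start from the finished text.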

    Do you have a small script and some sample data that show what you have discovered?

    Cheers
    Claudia



    What you’re describing is very common in OOP languages. If this behaves the way they do, you would want something like a doEvents operation rather than a sleep, since sleep usually halts all background operations as well (unless sleep implementations have started incorporating a doEvents equivalent, which IMO should have been done years ago). In fact, I wonder whether this is a Python background-processing issue rather than a regex issue.

    I’m just getting started with regex and have been thinking along similar lines to what you mention here, so I’m interested in seeing what others can add by way of responses. Perhaps you could insert timestamps in a test document?
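
    One way to try the timestamp idea, sketched with Python’s own re module rather than the editor object: record the replacement count and elapsed time for each step. The rules and the test data are made up for illustration, and timed_replacements is a hypothetical helper.

```python
import re
import time

def timed_replacements(text, rules):
    """Run each (pattern, replacement) rule in order and log
    (pattern, number of replacements, elapsed seconds) per step."""
    log = []
    for pattern, replacement in rules:
        start = time.time()
        text, count = re.subn(pattern, replacement, text)
        log.append((pattern, count, time.time() - start))
    return text, log

# Hypothetical test document: 5000 identical lines.
text, log = timed_replacements("a1 b2 c3\n" * 5000, [(r"\d", "#"), (r"#", "")])
for pattern, count, seconds in log:
    print(pattern, count, round(seconds, 4))
```

    If the counts for a later step come out lower than expected, that step did not see everything the previous step was supposed to produce.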

