Python Script Regex replace with uppercase



  • @Alan-Kilborn said in Python Script Regex replace with uppercase:

    editor.rereplace(r'(?-i)([A-Z])', ur'\L\1')

    Interesting… I used a nearly identical line of code here, and it works for me:

    editor.rereplace(r'(?-i)^([A-Z])', ur'\L\1')

    Note that my script file is itself ANSI/Windows-1252, and therefore begins with the following line:

    # encoding: Windows-1252

    I’m guessing yours is Unicode, and maybe that’s the interfering factor?



  • @M-Andre-Z-Eckenrode ,

    Weird.

    I just tried a comparison:

    Zz
    Yy
    XxXx
    
    C_
    À
    Á
    Â
    Ã
    Ä
    Å
    
    1. File that is Encoding > ANSI: [screenshot: before] => editor.rereplace(r'(?-i)^([A-Z])', ur'\L\1') => [screenshot: after]

    2. File that is Encoding > Charset > Western > Windows-1252: same file, same rereplace line, no characters go lowercase

    3. File that is Encoding > UTF-8: same file, same rereplace line, no characters go lowercase

    So ANSI works differently than a forced charset or forced UTF-8.


    addendum:

    Notepad++ v8.1.4   (64-bit)
    Build time : Aug 21 2021 - 13:04:59
    Path : C:\usr\local\apps\notepad++\notepad++.exe
    Command Line : 
    Admin mode : OFF
    Local Conf mode : ON
    Cloud Config : OFF
    OS Name : Windows 10 Enterprise (64-bit) 
    OS Version : 2009
    OS Build : 19042.1165
    Current ANSI codepage : 1252
    Plugins : AutoSave.dll ComparePlugin.dll ExtSettings.dll MarkdownViewerPlusPlus.dll mimeTools.dll NppConsole.dll NppConverter.dll NppEditorConfig.dll NppExec.dll NppExport.dll NppFTP.dll NppUISpy.dll PreviewHTML.dll PythonScript.dll QuickText.dll TagLEET.dll XMLTools.dll 
    

    Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)]



  • @M-Andre-Z-Eckenrode ,

    Instead of doing the lowercase through a regex replacement in the rereplace, what about a lambda function? editor.rereplace(r'(?-i)^([A-Z])', lambda m: m.group(1).lower()) worked in all three of the test file conditions I listed in my previous post.

    And the opposite, which your original question asked for: editor.rereplace(r'(?-i)^([a-z])', lambda m: m.group(1).upper())
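    Outside Notepad++, the same callback idea can be tried with the standard library's re.sub, which also accepts a function as the replacement. This is a minimal sketch assuming plain Python rather than PythonScript (editor is not available here). re.MULTILINE is added because Python's ^ anchors only at the start of the whole string unless that flag is set, and the (?-i) prefix is dropped because Python's re is case-sensitive by default.

```python
import re

text = 'Zz\nYy\nXxXx'

# Lowercase a leading capital on each line (mirrors the lambda in the post).
lowered = re.sub(r'^([A-Z])', lambda m: m.group(1).lower(), text, flags=re.MULTILINE)

# The opposite: uppercase a leading lowercase letter on each line.
raised = re.sub(r'^([a-z])', lambda m: m.group(1).upper(), lowered, flags=re.MULTILINE)
```

    Python's own re.sub does not understand \U or \L in a replacement string (recent Python 3 versions reject them as bad escapes), so a function is the usual way to change case there as well.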



  • @PeterJones said in Python Script Regex replace with uppercase:

    Instead of doing the lowercase through a regex replacement in the rereplace, what about a lambda function?

    I’ve never even heard of that before, but it sounds like a promising work-around unless and until an actual fix for rereplace is in place, if possible. Where can I read more about lambda? I see only a passing mention in the Python Script doc page for ‘Editor Object’, and though typing ‘lambda’ in the search box for the online NPP user manual makes it appear that it can be found in numerous sections including ‘Searching’, I’m unable to locate any specific instance of it there using my browser’s ‘Find’ facility.



  • @M-Andre-Z-Eckenrode said in Python Script Regex replace with uppercase:

    Where can I read more about lambda?

    lambda functions are part of Python, not specific to Notepad++'s PythonScript plugin.

    Read more about them by “googling” for “lambda functions in Python”.

    lambda functions are available in other programming languages as well, so it is not even something specific to Python (but that’s the context here, so…).
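    As a quick illustration in plain Python (a generic sketch, not tied to PythonScript): a lambda is an anonymous function whose body is a single expression, and it is most often passed directly to another function.

```python
# A lambda passed straight to map(): double each number.
doubled = list(map(lambda x: x * 2, [1, 2, 3]))  # [2, 4, 6]

# A lambda as a sort key: order words by length, then alphabetically.
words = sorted(['pear', 'fig', 'apple'], key=lambda w: (len(w), w))  # ['fig', 'pear', 'apple']
```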



  • @Alan-Kilborn said in Python Script Regex replace with uppercase:

    Read more about them by “googling” for “lambda functions in Python”.

    Ok, thanks much.



  • @M-Andre-Z-Eckenrode said in Python Script Regex replace with uppercase:

    I see only a passing mention in the Python Script doc page for ‘Editor Object’,

    There’s only a passing mention because, as @Alan-Kilborn explained, it’s a standard feature in Python (and elsewhere).

    And you don’t even need to know about lambdas for the problem at hand. Really, you just need to learn, as the PythonScript documentation showed, that rereplace accepts either a replacement expression or a function that it will call on the matching text. A lambda function and a normally defined function work equally well (like the infamous add_1 in the PS docs). The function accesses the text of the match through m.group(N), where N is the number of a capture group in your regular expression. The function should return the text that you want to replace the entire match with (in your case, the function-based equivalent of \U\1).

    So when rereplace finds a match, it passes that match as m to the function, and the function returns the replacement value; then rereplace moves on to the next match and calls the function again, until no more matches are found. (To make it abundantly clear: your function does not need to loop through the matches, because rereplace handles that. Your function just needs to transform one match m into the text to use as its replacement.)
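    Python's stdlib re.sub follows the same contract for a function replacement, so it can show the one-call-per-match behavior outside Notepad++. This is a sketch assuming plain Python; the calls list is hypothetical instrumentation added only to record each invocation, and is not something rereplace needs.

```python
import re

calls = []  # hypothetical: records each match the function receives

def to_upper(m):
    # Called once per match; it only transforms the single match it is given.
    calls.append(m.group(0))
    return m.group(1).upper()

text = 'alpha\nbeta\ngamma'
result = re.sub(r'^([a-z])', to_upper, text, flags=re.MULTILINE)
```

    The function contains no loop: re.sub finds each match and calls to_upper three times, once for 'a', 'b' and 'g'.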

    The call editor.rereplace(r'(?-i)^([a-z])', lambda m: m.group(1).upper()) is exactly equivalent to the longer script:

    def do_capitalize(m):
        return m.group(1).upper()
    
    editor.rereplace(r'(?-i)^([a-z])', do_capitalize)
    

    … but it fits nicely in a one-liner. A lambda's body is limited to a single expression, so if your replacement logic needed statements (for example, intermediate variables built up through various calculations), you’d have to use the defined-function variant instead of a lambda function.
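    Here is a sketch of that defined-function case, again using plain Python's re.sub as a stand-in for rereplace. The function name bump_value and the key=value sample text are hypothetical; the point is that the assignment and try/except statements cannot be written inside a lambda.

```python
import re

def bump_value(m):
    # Several statements (assignments, try/except): more than a lambda allows.
    name, value = m.group(1), m.group(2)
    try:
        number = int(value)
    except ValueError:
        # Leave non-numeric values untouched by returning the whole match.
        return m.group(0)
    return '%s=%d' % (name, number + 1)

text = 'major=1\nminor=9\nlabel=beta'
bumped = re.sub(r'^(\w+)=(\S+)$', bump_value, text, flags=re.MULTILINE)
```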



  • @PeterJones said in Python Script Regex replace with uppercase:

    you don’t even need to know about lambdas for the problem at hand: Really, you just need to learn, as the PythonScript documentation showed, that rereplace allows either a replacement expression or a function that it will call on the matching text.

    Noted, and thank you for the more detailed explanation. Although I can’t think of any immediate use for a lambda function other than your helpful suggestion for rereplace just now, it’s certainly possible one will come to me in the future, so I’m happy to learn more about it than I strictly need to for my present needs, even though a refresher and more thorough study will surely be necessary when the time comes.

    Thanks again to you and @Alan-Kilborn for your help.



  • It might have been instructive for the PythonScript docs to have shown the add_1 example as a lambda, e.g.:

    editor.rereplace('X([0-9]+)', lambda m: 'Y' + str(int(m.group(1)) + 1))


    For those without easy access to the PythonScript docs, here’s what they DO show:

    def add_1(m):
        return 'Y' + str(int(m.group(1)) + 1)
    
    # replace X followed by numbers by an incremented number
    # e.g.   X56 X39 X999
    #          becomes
    #        Y57 Y40 Y1000
    
    editor.rereplace('X([0-9]+)', add_1);
    

    And no, I have no idea what’s up with the trailing semicolon on the last line of that example.
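    The two forms can be checked against each other with plain Python's re.sub, which takes a function replacement the same way (a sketch assuming stdlib re in place of editor.rereplace). Both should turn the documented sample into the documented result.

```python
import re

text = 'X56 X39 X999'

# The documented add_1 as a named function...
def add_1(m):
    return 'Y' + str(int(m.group(1)) + 1)

via_def = re.sub('X([0-9]+)', add_1, text)

# ...and the same logic as a lambda.
via_lambda = re.sub('X([0-9]+)', lambda m: 'Y' + str(int(m.group(1)) + 1), text)
```

    As for the semicolon: Python accepts a trailing semicolon as a statement terminator, so it is harmless, just unnecessary.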



    Sorry for the late reply (by now too late to be of help), but I usually stay away from the computer on weekends.

    I assume that Python does its string processing before the boost::regex engine gets a chance to interpret the string, but I’ve never really looked into it. The lambda or explicit-function solution seems to be the way to solve this problem.
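    One way to probe the string-processing question is to look at what the literal contains before any regex engine sees it. This is a sketch in plain Python 3; the ur'' prefix used earlier in the thread is Python 2 only. A raw literal keeps \L and \1 as typed, while an ordinary literal turns \1 into the control character U+0001. (In Python 2, ur'' literals still process \u and \U escapes, so ur'\U\1' would not even compile; \L is not affected.)

```python
# Raw literal: backslashes are kept, giving the four characters \ L \ 1.
raw = r'\L\1'

# Ordinary literal: '\\' is one backslash, and '\1' is an octal escape for U+0001.
cooked = '\\L\1'
```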

