Python Script Regex replace with uppercase
-
@Alan-Kilborn said in Python Script Regex replace with uppercase:
editor.rereplace(r'(?-i)([A-Z])', ur'\L\1')
Interesting… I used a nearly identical line of code here, and it works for me:
editor.rereplace(r'(?-i)^([A-Z])', ur'\L\1')
Note that my script file is itself ANSI/Windows-1252, and therefore begins with the following line:
# encoding: Windows-1252
I’m guessing yours is Unicode, and maybe that’s the interfering factor?
-
Weird.
I just tried a comparison:
Zz Yy XxXx C_ À Á Â Ã Ä Å
-
File that is Encoding > ANSI: =>
editor.rereplace(r'(?-i)^([A-Z])', ur'\L\1')
=> -
File that is Encoding > Charset > Western > Windows-1252: same file, same rereplace line, no characters go lowercase
-
File that is Encoding > UTF-8: same file, same rereplace line, no characters go lowercase
So ANSI works differently than a forced charset or forced UTF-8.
—
addendum:
Notepad++ v8.1.4 (64-bit)
Build time : Aug 21 2021 - 13:04:59
Path : C:\usr\local\apps\notepad++\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : ON
Cloud Config : OFF
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 2009
OS Build : 19042.1165
Current ANSI codepage : 1252
Plugins : AutoSave.dll ComparePlugin.dll ExtSettings.dll MarkdownViewerPlusPlus.dll mimeTools.dll NppConsole.dll NppConverter.dll NppEditorConfig.dll NppExec.dll NppExport.dll NppFTP.dll NppUISpy.dll PreviewHTML.dll PythonScript.dll QuickText.dll TagLEET.dll XMLTools.dll
Python 2.7.18 (v2.7.18:8d21aa21f2, Apr 20 2020, 13:25:05) [MSC v.1500 64 bit (AMD64)]
-
-
Instead of doing the lowercase through a regex replacement in the rereplace, what about a lambda function?
editor.rereplace(r'(?-i)^([A-Z])', lambda m: m.group(1).lower())
worked in all three of the test file conditions I listed in my previous post.
And the opposite, which your original question asked for,
editor.rereplace(r'(?-i)^([a-z])', lambda m: m.group(1).upper())
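For anyone who wants to try the callable-replacement idea outside Notepad++, here is a minimal sketch that uses Python's standard re module as a stand-in for editor.rereplace. That substitution is an assumption for illustration only: the editor object exists only inside the PythonScript plugin, and re is a different regex engine with no \L or \U replacement escapes at all. The sample text and the names sample, lowered, and raised are made up, and re.MULTILINE is passed so that ^ matches at the start of every line.

```python
import re

# Hypothetical sample text; it stands in for the editor buffer.
sample = "Zz Yy\nxxXx\nC_"

# Lowercase an uppercase ASCII letter at the start of each line.
lowered = re.sub(r'^([A-Z])', lambda m: m.group(1).lower(), sample, flags=re.MULTILINE)

# The opposite: uppercase a lowercase ASCII letter at the start of each line.
raised = re.sub(r'^([a-z])', lambda m: m.group(1).upper(), sample, flags=re.MULTILINE)

print(lowered)
print(raised)
```

Because the case change happens in Python code rather than in the replacement string, it does not depend on how the replacement escapes are interpreted.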
-
@PeterJones said in Python Script Regex replace with uppercase:
Instead of doing the lowercase through a regex replacement in the rereplace, what about a lambda function?
I’ve never even heard of that before, but it sounds like a promising work-around unless and until an actual fix for rereplace is in place, if possible. Where can I read more about lambda? I see only a passing mention in the Python Script doc page for ‘Editor Object’, and though typing ‘lambda’ in the search box for the online NPP user manual makes it appear that it can be found in numerous sections including ‘Searching’, I’m unable to locate any specific instance of it there using my browser’s ‘Find’ facility.
-
@M-Andre-Z-Eckenrode said in Python Script Regex replace with uppercase:
Where can I read more about lambda?
lambda functions are part of Python, not specific to Notepad++'s PythonScript plugin.
Read more about them by “googling” for “lambda functions in Python”.
lambda functions are available in other programming languages as well, so it is not even something specific to Python (but that’s the context here, so…).
-
@Alan-Kilborn said in Python Script Regex replace with uppercase:
Read more about them by “googling” for “lambda functions in Python”.
Ok, thanks much.
-
@M-Andre-Z-Eckenrode said in Python Script Regex replace with uppercase:
@PeterJones said in Python Script Regex replace with uppercase:
I see only a passing mention in the Python Script doc page for ‘Editor Object’,
There’s only a passing mention because, as @Alan-Kilborn explained, it’s a standard feature in Python (and elsewhere).
And you don’t even need to know about lambdas for the problem at hand: Really, you just need to learn, as the PythonScript documentation showed, that rereplace allows either a replacement expression or a function that it will call on the matching text. A lambda function or a normally-defined function will both work equally well (like the infamous add_1 in the PS docs). The function accesses the text of the match through the m.group(#) where # aligns with the capture groups in your regular expression match expression. The function should return the text that you want to replace the entire match with (in your case, the function-based equivalent of \U\1). So when rereplace finds a match, it will send that match as m to the function, and then the function returns the replacement value; then rereplace moves on to the next match and calls the function again, until no more matches are found. (To make it abundantly clear, your function does not need to loop through the matches; that is handled by the rereplace; your function just needs to transform one match m into some text to return to be used as the replacement.)
The call
editor.rereplace(r'(?-i)^([a-z])', lambda m: m.group(1).upper())
is exactly equivalent to the longer script
def do_capitalize(m):
    return m.group(1).upper()
editor.rereplace(r'(?-i)^([a-z])', do_capitalize)
… but it fits nicely in a one-liner. If your replacement function required more than one line (if you wanted to build a more complicated string through various calculations), then you’d have to use the defined-function variant instead of a lambda function.
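The once-per-match calling pattern described above can be watched in a rough sketch outside Notepad++. It assumes Python's standard re.sub as a stand-in for editor.rereplace (a different engine, used here only because it also accepts a function as the replacement). The calls list and the sample text are hypothetical, added purely to show what the function receives.

```python
import re

calls = []  # records the text of each match handed to the callback

def do_capitalize(m):
    # Invoked once per match; m is the match object for that single match.
    calls.append(m.group(0))
    first_letter = m.group(1)    # text of capture group 1
    return first_letter.upper()  # replaces the entire match

text = "alpha\nBeta\ngamma"
result = re.sub(r'^([a-z])', do_capitalize, text, flags=re.MULTILINE)

print(result)
print(calls)
```

The function body spans several statements here, which is the situation where a named function is needed instead of a lambda. Note that it contains no loop: the substitution call walks through the matches and calls the function for each one.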
-
@PeterJones said in Python Script Regex replace with uppercase:
you don’t even need to know about lambdas for the problem at hand: Really, you just need to learn, as the PythonScript documentation showed, that rereplace allows either a replacement expression or a function that it will call on the matching text.
Noted, and thank you for the more detailed explanation. Although I can’t think of any immediate use for a lambda function other than your helpful suggestion for rereplace just now, it’s certainly possible one will come to me in the future, so I’m happy to learn more about it than I absolutely have to for my present needs, even though a refresh and more thorough study of it will surely be necessary when the time comes.
Thanks again to you and @Alan-Kilborn for your help.
-
It might have been instructive for the PythonScript docs to have shown the add_1 example as a lambda, e.g.:
editor.rereplace('X([0-9]+)', lambda m: 'Y' + str(int(m.group(1)) + 1))
For those without easy access to the PythonScript docs, here’s what they DO show:
def add_1(m):
    return 'Y' + str(int(m.group(1)) + 1)

# replace X followed by numbers by an incremented number
# e.g. X56 X39 X999
#          becomes
#      Y57 Y40 Y1000
editor.rereplace('X([0-9]+)', add_1);
And, no, I’ve no idea what’s up with the trailing semicolon on the last line of that example.
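To check that the def and lambda forms of add_1 behave the same, here is a small sketch that can be run in any Python interpreter. It assumes re.sub from the standard library in place of editor.rereplace, which only exists inside the plugin; the input string is the one from the docs' comment, and via_def and via_lambda are names invented for this example.

```python
import re

def add_1(m):
    # Turn the captured digits into an int, add one, and prefix with Y.
    return 'Y' + str(int(m.group(1)) + 1)

text = "X56 X39 X999"

# Named-function form, as in the PythonScript docs example.
via_def = re.sub(r'X([0-9]+)', add_1, text)

# Lambda form of the same replacement.
via_lambda = re.sub(r'X([0-9]+)', lambda m: 'Y' + str(int(m.group(1)) + 1), text)

print(via_def)     # Y57 Y40 Y1000
print(via_lambda)  # Y57 Y40 Y1000
```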
-
Sorry for the late and already too late reply, but I usually stay away from the computer on weekends.
I assume that Python does its string processing before the boost::regex function gets a chance to interpret the string, but I’ve never really looked into it. The lambda or explicit function solution seems to be the way to solve this problem.
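The general idea that Python interprets a string literal before any regex engine sees it can be illustrated in plain Python, with caveats. This sketch is written for Python 3, not the Python 2.7 the plugin in this thread runs, so it is an analogy and not a reproduction of the problem. It only shows that a non-raw '\U\1' literal is rejected by Python itself, while the raw form passes through untouched. Python 2's language reference states that ur'' literals still process \uXXXX and \UXXXXXXXX escapes, which may be relevant to a ur'\U\1' replacement, but the thread does not confirm that this is the cause. The helper literal_compiles is hypothetical.

```python
def literal_compiles(source):
    """Return True if Python itself accepts the given source text."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

# Non-raw literal: Python reads \U as the start of an 8-hex-digit escape and fails.
plain_ok = literal_compiles(r"s = '\U\1'")

# Raw literal: the backslashes are left alone for the regex engine to interpret.
raw_ok = literal_compiles(r"s = r'\U\1'")

print(plain_ok, raw_ok)
```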