pythonscript: any ready pyscript to replace one huge set of regex/ phrases with others?
-
Hi, @alan-kilborn, and All,
Alan, it’s just all the points, described in my previous post !
In both the search and replacement regexes, you can insert characters located outside the BMP, either directly or with the syntax \x{HHHHHHHH}
From the text below :
🍬 = \x{1F36C}
🎂 = \x{1F382}
🎄 = \x{1F384}
🎅 = \x{1F385}
🎇 = \x{1F387}
🎺 = \x{1F3BA}
👼 = \x{1F47C}
with the Python regex engine, you can use :
SEARCH
[\x{0001F36C}-\x{0001F47C}].+
or
[\x{1F36C}-\x{1F47C}].+
REPLACE
\x{1F385} = \\x{1F385}
So, with my modified script :
@[\x{1F36C}-\x{1F47C}].+@\x{1F385} = \\x{1F385}@
and you get:
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
For characters with a code above \x{FFFF}, you cannot do this kind of S/R with our Boost regex engine
The NUL character, \x{0000}, can be used in both search and replacement regexes.
For instance, you can execute the following S/R, with the Python regex engine :
SEARCH
[\x20-\x7f]
REPLACE
$0\x00
giving for the script :
@[\x20-\x7f]@$0\x00@
This S/R cannot be run with our Boost regex engine, which just deletes all the characters
The backward assertions, such as \A, seem correctly supported.
Just imagine the text “This is a test” in a new N++ tab and the regex S/R :
SEARCH
\A.
REPLACE
-
So, in the script, the syntax
@\A.@-@
With the Python regex engine, we get the correct text -his is a test ! With our Boost regex engine, after clicking on the Replace All button, we, wrongly, obtain the text -------------- :-((
The look-behind assertions are correctly handled, even if they overlap with the end of the previous match.
Consider the text
aaaabaaabaaa
and the regex S/R :
SEARCH
(?<=a)ba+
REPLACE
123a
=> the syntax @(?<=a)ba+@123a@ in the script.
With the Python regex engine, the text is correctly modified as aaaa123a123a ( two S/R ) whereas, with the Boost regex engine, after clicking on the Replace All button, we get the wrong string aaaa123abaaa
Indeed, the second match never occurs, whereas the engine should have seen that the last char of the replacement, a, was right before the baaa string, hence a second match :-((
Cheers,
guy038
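As an aside, the @search@replace@ lines above can be split with a few lines of plain Python. This is only a sketch: it assumes the first character of a rule is the delimiter and that the delimiter never appears inside the regexes themselves. parse_rule is a hypothetical helper, not taken from the script discussed in this thread.

```python
def parse_rule(line):
    """Split a rule such as '@search@replace@' into (search, replace).

    Assumption: the first character is the delimiter and it does not
    occur inside the search or replacement text.
    """
    delim = line[0]
    parts = line.split(delim)
    # A well-formed rule splits into ['', search, replace, ''].
    if len(parts) != 4 or parts[0] != '' or parts[3] != '':
        raise ValueError('malformed rule: %r' % line)
    return parts[1], parts[2]


search, replace = parse_rule('@(?<=a)ba+@123a@')
```

Picking the delimiter from the first character means a rule containing @ could simply be written with another delimiter, for example #.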
-
Are you really using the Python regex engine?
This would mean you have some code like re.sub(pattern, repl, string, count=0, flags=0)
but the snippet you showed earlier uses editor.rereplace
which is supposed to be the Boost regex engine.
-
Hi, @eko-palypse, @alan-kilborn and All,
Huum…, I’m a bit confused ! When I say : “With the Python regex engine…”, I’m just saying that I did all the tests with Alan’s script, above, which does use the helper method editor.rereplace ! And, of course, with the classical N++ Replace dialog, to compare with.
In fact, I’m already aware of this fact, as, some time ago, I noticed differences, while using Scott Sumner’s or Claudia Frank’s Python scripts, which dealt, essentially, with searches ! As, this time, we have a nice search and replace script, I just verified that my assumptions were correct : the present behavior of the editor.rereplace method gives improved results and seems to fix some bugs of the current implementation of the Boost library, within Notepad++ :-))
But, I’m not a true coder ! So, unfortunately, it’s… up to all of you, to tell me why it looks better ;-))
Cheers,
guy038
-
So to clarify, when using the Pythonscript plugin, one can do 1 of 2 things:
- editor.rereplace() which uses the Boost regex that is very similar to, but maybe not exactly the same as the one directly in N++
- use re.sub() which uses the Python regex engine (which is its own thing, not Boost, not PCRE, not ANYTHING except Python’s own re module)
So far I believe everything discussed in this thread is using the FIRST one.
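For comparison, here is a minimal sketch of the SECOND option, run as plain Python 3 rather than inside Notepad++. It feeds the sample texts from earlier in this thread to Python's own re module. The variable names are made up for illustration, and since re has no \x{...} escape, the astral range is written with \U escapes instead. None of this says anything about how editor.rereplace() behaves.

```python
import re

# \A anchors at the very start of the subject, so only the first
# character is replaced: '-his is a test'
first_char = re.sub(r'\A.', '-', 'This is a test')

# The look-behind is checked against the original text, so both
# "b" runs match: 'aaaa123a123a'
lookbehind = re.sub(r'(?<=a)ba+', '123a', 'aaaabaaabaaa')

# Characters outside the BMP: Python 3 strings hold full code points,
# so a character range across them works directly.
santa_line = '\U0001F385 = \\x{1F385}'
lines = '\U0001F36C = \\x{1F36C}\n\U0001F47C = \\x{1F47C}'
# A function is used as the replacement so the backslash is taken literally.
astral = re.sub('[\U0001F36C-\U0001F47C].+', lambda m: santa_line, lines)
```

Note that re.sub() works on a string and returns a new one, so a script built this way would have to read the document text and write the result back itself.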
-
@guy038 said:
When I (say) “With the Python regex engine…”, I’m just saying that I did all the tests with…Alan’s script
“With the Python regex engine” would be my SECOND bullet point above, but that is not what you’re doing unless you’ve changed the editor.rereplace() call in the script to a re.sub() call (and slightly changed the other logic to cope with that change).
BTW when you import re (to get access to the re.IGNORECASE aka re.I flag) that is all you are doing–getting access to that, which happens to be shared, for convenience, with the Boost regex engine.
-
So, from what I gather, there is a difference in the implementation details of boost::regex between npp and the pythonscript plugin.
So the best would be if the pythonscript plugin implemented the missing pieces and npp silently stole the code and
adapted it so that both work the same ;-)
-
@guy038 said:
SEARCH \w+
REPLACE \u$0
AFAIK, they do not modify anything, ( I mean regarding case of characters ! ) when executed from a Python script :-((
Interesting. I noticed that the following variant on that above WILL work to affect case when using editor.rereplace() in a script:
Find: (\w+)
Repl: \U\1
It seems like either variant should capitalize all lowercase letters in a document. HOWEVER, only the script version does this! When run interactively with the Replace dialog in Notepad++, these 2 variants only capitalize the first letter of every “word”.
Can anyone offer an explanation for:
- why Guy’s original regex replace does nothing in the script
- why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but – and I think act correctly in the script)
-
@Alan-Kilborn said:
why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but – and I think act correctly in the script)
Let me correct this:
- why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but the one that involves capturing group #1 and using \1 in the replace part – acts correctly in the script, at least I think it does)
Hmm, better but maybe still not a great way of expressing it. :-P
-
If I understand you correctly, I’m totally lost - my setup must have some kind of builtin wizard, as
I do get a different result. So just to clarify, having the text this is some text
and aiming to get
THIS IS SOME TEXT
we would use \w+ and replace with \U$0, or (\w+) with \U$1 as replacement.
For me, both work the same in the dialog and neither works when called like editor.rereplace('\w+','\U$0')
from a script.
But you do have a different result?
-
Wow. WOW. I find I cannot reproduce my earlier results. It seems to be working consistently now (duplicating your results). I guess I have egg on my face and sorry for the false alarm; unless it takes some sort of special sequence of actions to get into a weird mode! I did restart N++ an hour ago so I supposed that is a possible occurrence.
One thing I am seeing now that the editor.rereplace() is doing “nothing”:
I use the LocationNavigate plugin for its ability to specially mark changed lines (wish there was a better/more-current solution to that, btw!). When I run the scripted replace, although visually no text changes, the plugin does mark every line where \w+ matches. Basically this means all lines besides empty ones got “changed”. So…not really sure what under-the-hood voodoo magic is happening when using \U (and probably others like it) with a scripted replace, but it sure seems like SOMETHING interesting might be happening. If what is happening is that the \U is being ignored and abc is simply being replaced by abc perhaps that is not all that interesting. :(
-
:-) … life is interesting, isn’t it … what is true now might be false in the next minute :-)
From modify callback I see that there is a replace but it just ignores the \U
{'code': 2008, 'annotationLinesAdded': 0, 'text': 'this', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 0, 'line': 0, 'foldLevelNow': 0}
{'code': 2008, 'annotationLinesAdded': 0, 'text': 'is', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 2, 'foldLevelPrev': 0, 'position': 5, 'line': 0, 'foldLevelNow': 0}
{'code': 2008, 'annotationLinesAdded': 0, 'text': 'some', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 8, 'line': 0, 'foldLevelNow': 0}
{'code': 2008, 'annotationLinesAdded': 0, 'text': 'text', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 13, 'line': 0, 'foldLevelNow': 0}
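If the case-conversion escape gets dropped like this, one way around it is to do the uppercasing in Python. The sketch below uses Python's own re module (plain Python, outside Notepad++), which has no \U escape in replacement strings but accepts a function as the replacement. upper_match is a name made up for this example.

```python
import re


def upper_match(m):
    # m is the match object; return the text that should replace the match.
    return m.group(0).upper()


result = re.sub(r'\w+', upper_match, 'this is some text')
print(result)
```

The PythonScript documentation describes editor.rereplace() as also accepting a function for the replacement, so passing upper_match there might be a workaround, but that is untested here.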
-
Since you pointed out that the Pythonscript editor.rereplace() way of working has some distinct advantages (fewer bugs?) over the Notepad++ interactive Replace dialog, how about a little script for making testing the differences even easier (and less cryptic than typing a one-liner in the PS console):

    from Npp import editor, notepad

    # No selection: work from the caret to the end of the document.
    if editor.getSelectionEmpty():
        p1 = editor.getCurrentPos()
        p2 = editor.getTextLength() - 1
    else:
        # Selection active: restrict the replace to the selected range.
        p1 = editor.getSelectionStart()
        p2 = editor.getSelectionEnd()
    s = notepad.prompt('Enter search regex:', '', '')
    # Cancelled prompt or empty search: do nothing.
    if s is not None and len(s) > 0:
        r = notepad.prompt('Enter replace regex:', '', '')
        if r is not None:
            # Group all replacements into a single undo step.
            editor.beginUndoAction()
            editor.rereplace(s, r, 0, p1, p2)
            editor.endUndoAction()

If a selection is active when running the script, it acts like N++'s “Replace All…with In Selection ticked”. Otherwise, it acts like a normal N++'s “Replace All” acting on text from caret downward to EOF.
Not rocket science, but then again neither was the original script way above. :)
-
how can I download all script at bruderstein/PythonScript/scripts/Samples/ as a single zip file?
Thanks.
-
how can I download all script at bruderstein/PythonScript/scripts/Samples/ as a single zip file?
you can do that by either downloading the whole master repo from >>> here <<< and extract PythonScript/scripts/Samples/,
… or by downloading them manually one by one, then package them into another zip and commit this zip to david @bruderstein , with the request for publishing, so that you can re-download it 😉
-
… or by writing a python script doing it
@V-S-Rawat I leave the implementation of the compressing function to you ;-)

    # Python 2 names (urllib.urlopen / urlretrieve / unquote).
    import urllib
    import re

    sample_url = "https://github.com/bruderstein/PythonScript/tree/master/scripts/Samples"
    raw_url = "https://raw.githubusercontent.com/bruderstein/PythonScript/master/scripts/Samples/"
    save_dir = r'D:\xyz\{}'  # target folder; {} is filled with the file name

    # Fetch the listing page and collect every link that ends in .py
    _f = urllib.urlopen(sample_url)
    data = _f.read()
    links = re.findall(r'href="(.*?\.py)"', data)
    for link in links:
        # Keep only the file name, then download the raw file.
        file = link.rpartition('/')[2]
        urllib.urlretrieve(raw_url + file, save_dir.format(urllib.unquote(file)))

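Since the compressing part was left as an exercise, here is one possible sketch using the standard zipfile module. The function name zip_python_files and its arguments are made up for illustration. It assumes the downloaded scripts sit directly in one folder and should end up at the top level of the archive.

```python
import os
import zipfile


def zip_python_files(src_dir, zip_path):
    """Pack every *.py file found directly in src_dir into zip_path.

    Returns the list of file names that were added.
    """
    names = sorted(n for n in os.listdir(src_dir) if n.endswith('.py'))
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname keeps only the file name, not the full folder path.
            zf.write(os.path.join(src_dir, name), arcname=name)
    return names
```

A call could look like zip_python_files(r'D:\xyz', r'D:\xyz\samples.zip'), with both paths adjusted to wherever the files were saved.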
-
This script will run from within npp and will access net and download all *.py on that page?
Quite powerful. I never thought such a thing is even possible.
:-)
Thanks.
-
Yes, but since it does not use any Notepad ++ objects, it can also be run outside of npp.
The ecosystem of Python is really and I mean really huge.
If you are thinking about a problem and how you can solve it, there is most likely already a Python module available that is made for it.
The only little downer is that PythonScript has not implemented pip or it can not be easily implemented (as far as I understand it), which means you have to, if you want to rely on external modules, do some manual work to make it happen within pythonscript plugin.
-
That brings another bouncer to me.
@Eko-palypse said:
Yes, but since it does not use any Notepad ++ objects, it can also be run outside of npp.
You mean the same npp pyscript can be run from outside of npp?
How? I only have the pyscript plugin installed, and it seems it had installed a huge set of files. Are those files sufficient to run an npp pyscript from outside npp? How? Which exe file to run?
Or do I have to download python afresh and install that to be able to run python outside of npp?
And if so, can I then remove the pyscript plugin and still be able to run pyscript from within npp?
Because that would mean one python installed by pyscript within npp, and another python installed directly in programfiles. Why keep both? There may be version clashes between them.
The ecosystem of Python is really and I mean really huge.
If you are thinking about a problem and how you can solve it, there is most likely already a Python module available that is made for it.
Yeah, I was also thinking that across the years that python has been popular and widely used, a huge repository of pyscripts must already have been made. But I find far fewer repositories and far fewer scripts than I thought should have been there. Seems I did not reach the major pyscript repositories.
Thanks.
-
Yes, this particular script may run outside npp if you have a local
python installation, by calling python scriptname.py
Theoretically, you can also uninstall the plugin and remote control
npp and scintilla with the local python installation.
But then you have to program everything yourself
what the pythonscript plugin otherwise provides to you.
The callbacks could be a bigger challenge.
A central repo for scripts does not exist, as far as I know,
but a central system for installing modules does.