pythonscript: any ready pyscript to replace one huge set of regex/ phrases with others?

Meta Chuh

what happens if you try a complete new pythonscript 1.3.0.0 install on a new 7.6.2 portable ?
still the same, or does alan’s script work on that ?

i’ve added the folder locations, from the other py thread, below as convenience if lib location or anything else of the add ons might be a trigger:

get a new copy of PythonScript_Full_1.3.0.0.zip from here
extract it and put its contents as listed below

PythonScript.dll, plugin dll goes to:
npp.7.6.2.bin\plugins\PythonScript\PythonScript.dll

python27.dll goes to:
npp.7.6.2.bin\python27.dll

machine level scripts and python library go to:
npp.7.6.2.bin\plugins\PythonScript\lib\
npp.7.6.2.bin\plugins\PythonScript\scripts\
contains sample scripts and startup.py

manual, context-help files go to:
npp.7.6.2.bin\plugins\doc\PythonScript\
contains PythonScript.chm up to version 1.2.0.0
contains html docs since version 1.3.0.0

user level scripts go to:
npp.7.6.2.bin\plugins\config\PythonScript\scripts\
note: this folder will be created automatically as soon as a new script is created.
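For anyone who wants to double-check a portable setup against the list above, here is a minimal sketch in plain Python. The function name missing_files and the choice of which paths to check are assumptions made for illustration, not part of PythonScript; the relative paths themselves are the ones listed above.

```python
import os

# Relative paths taken from the list above (portable Notepad++ 7.6.2 with
# PythonScript 1.3.0.0). Which paths to check is an assumption of this sketch.
EXPECTED = [
    os.path.join("plugins", "PythonScript", "PythonScript.dll"),
    "python27.dll",
    os.path.join("plugins", "PythonScript", "lib"),
    os.path.join("plugins", "PythonScript", "scripts"),
    os.path.join("plugins", "doc", "PythonScript"),
]


def missing_files(npp_root):
    """Return the expected relative paths that do not exist under npp_root."""
    return [p for p in EXPECTED
            if not os.path.exists(os.path.join(npp_root, p))]
```

Calling missing_files(r'npp.7.6.2.bin') from a regular Python prompt should return an empty list when everything is in place. The user-level scripts folder is left out on purpose, since it only appears once a first script is created.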

Meta Chuh

@Alan-Kilborn said:

It is really a simple script

it may be simple, but it definitely comes in very handy.
i use it, love it, and without you i wouldn’t have it. 😃👍

before using your script, i used your suggested way of using macros for multi pass, multi regexes on files.
(and before that, there was pure darkness 😂 )

but now i prefer your script, as the regexes are much easier to change or read than within a saved macro. 👍👍👍

guy038

Hello @alan-kilborn, @meta-chuh and All,

Sorry, I preferred to take some time, doing numerous tests and … it works nicely ;-))

Alan, it’s just my mistake, because I should have opened the console immediately ! Indeed, I wrote some accented characters, above \x7f, although only in a comment ! So I added the directive #coding=utf-8 as the first line of the script. You’re really lucky as an [A-Z] person ;-))

In order to run the S/R in a case-insensitive way, I just imported the re library and used the flag re.IGNORECASE
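As a rough illustration of what the flag does, here is a sketch that uses only the standard re module, because the editor object exists only inside Notepad++. The sample text and variable names are made up for the example.

```python
import re

text = "Fix one, fix two, FIX three"

# Case-sensitive: only the exact spelling "fix" is replaced.
sensitive = re.sub(r"fix", "done", text)

# Case-insensitive: the re.IGNORECASE flag (short form re.I) matches any case.
insensitive = re.sub(r"fix", "done", text, flags=re.IGNORECASE)

print(sensitive)
print(insensitive)
```

In the script itself, the flag is passed as an extra argument, as in editor.rereplace(s, r, re.IGNORECASE).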

I also tried your second solution with strings/regexes in a file and… no problem, too !

Just notice that, in this case, the exact python code is, rather :

f = open(r'<Drive_Letter>:\....\....\sr_list.txt')

sr_list = f.readlines()

I first thought that, according to your comments, the part to be inserted was, literally :

open(r'sr_list.txt') as f: sr_list = f.readlines()

But, of course, this code is wrong and an error occurs on word as -:))

Hi, @meta-chuh,

Hummmm…, I’ve been hungry for a while… So be patient a bit ! I’ll be back very soon and, as I’ve already installed a portable v7.6.2 version of N++, I’ll simply need to add the latest PythonScript

Just before posting, I’ve seen your last reply to Alan and I do agree to your compliments ! It’s really a magic script ;-))

Best Regards,

guy038

PeterJones

An aside on as:

@guy038 said:

I first thought that, according to your comments, the part to be inserted was, literally :
open(r'sr_list.txt') as f: sr_list = f.readlines()

@Alan-Kilborn said:

with open(r'sr_list.txt') as f: sr_list = f.readlines()

Having just done some Python tutorials, I learned about the with statement, which is what enables the as – the with was literally part of what was needed in order for the as to work
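A small self-contained sketch of the point, for readers who have not met with yet. The temporary file and the names path, f2 and same_list are only there to make the example runnable; they are not from Alan's script.

```python
import os
import tempfile

# Create a throwaway list file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sr_list.txt")
with open(path, "w") as f:
    f.write("!a!A\n@b@B\n")

# "with ... as f" binds the opened file to f and closes it when the block ends.
with open(path) as f:
    sr_list = f.readlines()

# Without "with", the equivalent code needs an explicit close().
f2 = open(path)
try:
    same_list = f2.readlines()
finally:
    f2.close()
```

Dropping the with keyword leaves "open(...) as f:", which Python rejects as a syntax error at the word as.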

Alan Kilborn

@guy038 said:

It’s really a magic script

It’s just a tiny, obvious script, at least to me. :)

BTW glad you got it going.

I think Peter straightened out the with/as stuff. I’ll just say that the line that started out #with was correct as written. The intent was that one would simply remove the comment (with ONE keypress) to enable the line. Note that my coding style is that if a line is an informational comment it starts out as # plus space plus text. If it is code that is commented out, it is # (at correct indent level), then no space(!), then code.

[A-Z] person

What is that? Why is it a lucky thing?

Meta Chuh

@Alan-Kilborn

Note that my coding style is that if a line is an informational comment it starts out as # plus space plus text. If it is code that is commented out, it is # (at correct indent level), then no space(!), then code

very nice and clean to read 👍

[A-Z] person

hahaha, i never thought that [A-Z] person could be interpreted as a potential insult, or discrimination, but it is not. 😂😂😂

it’s just an expression for languages without special characters and letters like áàñøö and so on.

guy038

Hello @alan-kilborn, @meta-chuh and All,

First, Alan, the expression “An [A-Z] person” is a common way, for @scott-sumner, to point out that he’s little concerned with accented characters and all related questions ! That’s why I said that you’re lucky for not having to bother about these problems ;-))

Note also that I said, above, “is a common way” and not “was a common way” as I do hope that Scott will be back, on our forum, very soon !

Now, of course, the Python syntax, below, is totally correct !

 with open(r'sr_list.txt') as f: sr_list = f.readlines()

It’s just that when I saw the two comment lines :

# or take input from a file:
#with open(r'sr_list.txt') as f: sr_list = f.readlines()

I thought, wrongly, it meant, in fact :

# or take input from a file with [ the sentence ]:
# open(r'sr_list.txt') as f: sr_list = f.readlines()

BTW, Alan, I tested, in the sr_list.txt file, the syntax |^|Test , with some space chars after the word Test and, unfortunately, the ending spaces are not taken into account. Of course, I could have used |^|Test\x20\x20\x20…

So, may I ask for two improvements :

The possibility to repeat the separator, after the replacement string, to take extra blank chars into account, either in the sr_list.txt file or in the script itself
The possibility to add comments, beginning with the usual # char, in the sr_list.txt file

For instance :

# Add the string ABC, followed with 3 SPACES, at BEGINNING of line 
|^|ABC   |
# Add the string XYZ, followed with 3 SPACES  at END of line
!$!XYZ   !

Contrary to what I said, @meta-chuh, I didn’t come back and just preferred going to bed as I’ve planned to spend a ski-day, as weather was quite nice, Wednesday, on Grenoble and, in addition, I also met some friends of mine, in Chamrousse ski-station ;-))

As promised, I installed the latest PythonScript version, 1.3.0.0, in my local N++ v7.6.2 installation

Let’s suppose that N++ v7.6.2 is installed in any folder XXXX, different from the folders C:\Program Files and C:\Program Files (x86). Then :

I downloaded the PythonScript_Full_1.3.0.0.zip archive into the XXXX folder
With 7zFileManager, I extracted all the archive’s contents into the XXXX folder
I needed to execute an extra task :
- Move the library PythonScript.dll from the plugins folder to the plugins\PythonScript folder
I opened Notepad++ v7.6.2
- I chose the menu option Plugins > Python Script > New Script and immediately closed the window, with the ESC key, in order to create the file tree XXXX > plugins > Config > PythonScript > scripts

Finally, here is, below, the main file layout, right after installing the latest PythonScript v1.3.0.0 :

XXXX, below, represents the INSTALL folder of N++ v7.6.2, which must be DIFFERENT from both "C:\Program Files" and "C:\Program Files (x86)"

It's IMPORTANT to note that this LOCAL installation needs the ZERO-LENGTH file "doLocalConf.xml", alongside "Notepad++.exe"

XXXX
    \
    |-- autoCompletion (folder)
    |                \
    |                |-- ".xml" files
    |
    |-- localization (folder)
    |              \
    |              |-- ".xml" files
    |
    |-- plugins (folder)
    |         \
    |         |-- Config (folder)
    |         |        \
    |         |        |-- Hunspell (folder)
    |         |        |          \
    |         |        |          |-- en_US.aff
    |         |        |          |
    |         |        |          |-- en_US.dic
    |         |        |
    |         |        |-- PythonScript (folder)
    |         |        |              \
    |         |        |              |-- scripts (folder)
    |         |        |                        \
    |         |        |                        |-- Future USER ".py" scripts
    |         |        |
    |         |        |-- ".ini" files
    |         |        |
    |         |        |-- nppPluginList.dll
    |         |
    |         |-- doc (folder)
    |         |     \
    |         |      |-- PythonScript (folder)
    |         |                     \
    |         |                     |-- _sources (folder)
    |         |                     |
    |         |                     |-- _static  (folder)
    |         |                     |
    |         |                     |-- ".html" files and Miscellaneous files
    |         |
    |         |-- DSpellCheck (folder)
    |         |             \
    |         |             |-- DSpellCheck.dll
    |         |
    |         |-- mineTools (folder)
    |         |           \
    |         |           |-- mineTools.dll
    |         |
    |         |-- NppConverter (folder)
    |         |              \
    |         |              |-- NppConverter.dll
    |         |
    |         |-- NppExport (folder)
    |         |           \
    |         |           |-- NppExport.dll
    |         |
    |         |-- PythonScript (folder)
    |         |              \
    |         |              |-- lib (folder)
    |         |              |     \
    |         |              |      |-- Sub-folders
    |         |              |      |
    |         |              |      |-- ".py" files
    |         |              |
    |         |              |-- scripts (folder)
    |         |              |         \
    |         |              |         |-- Samples (folder)
    |         |              |         |         \
    |         |              |         |         |-- ".py" scripts
    |         |              |         |
    |         |              |         |-- startup.py
    |         |              |
    |         |              |-- PythonScript.dll
    |
    |-- themes (folder)
    |        \
    |        |-- ".xml" files
    |
    |-- updater (folder)
    |         \
    |         |-- GUP.exe
    |         |
    |         |-- gup.xml
    |         |
    |         |-- libcurl.dll
    |
    |-- doLocalConf.xml
    |
    |-- Notepad++.exe
    |
    |-- python27.dll
    |
    |-- SciLexer.dll
    |
    |-- ".txt" files
    |
    |-- ".xml" CONFIGURATION files

Best Regards,

guy038

P.S. :

In the future, I think that, at least, for portable installs, when all problems concerning “Plugins Admin” are solved, it would be reasonable to migrate the Config and doc directories from the plugins folder to the higher level, with the other directories localization, autoCompletion, themes and updater

So, the plugins folder would only contain sub-folders related to each installed plugin ! What do you think of my idea ?

guy038

Hello @alan-kilborn, @meta-chuh and All,

I’m answering myself, concerning the last question, at the end of my previous post

On reflection, it would not be a nice solution to do so since, indeed, the Config and doc folders both contain files related to plugins, too !

Cheers,

guy038

Alan Kilborn

@guy038 said:

may I ask for two improvements

We don’t really need to repeat the delimiter, we just need to NOT ignore trailing space. What causes an ignoring of the trailing space in the original script is the rstrip() function. By default this function removes all whitespace from the right side of a string. If we change it to tell it to only strip line ending characters, it will leave blanks on that side: rstrip('\n'). Note that this will work for line endings of \n or \r\n in the file. I mention this because at first glance it would appear to only work for line endings of \n but that is not the case.
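A quick way to see the difference, as a sketch. The \r\n case works, as far as I can tell, because a text-mode read hands the line ending to the script as a plain \n. The sample line and variable names are invented for the example, and io.open is used so that the newline translation is explicit.

```python
import io
import os
import tempfile

# Write one list line with a Windows line ending and three trailing spaces.
path = os.path.join(tempfile.mkdtemp(), "sr_list.txt")
with io.open(path, "wb") as f:
    f.write(b"|^|ABC   \r\n")

# Text mode with newline translation: "\r\n" on disk arrives as "\n".
with io.open(path, "r", newline=None) as f:
    line = f.readline()

stripped_all = line.rstrip()       # removes the newline AND the three spaces
stripped_eol = line.rstrip("\n")   # removes only the newline

print(repr(stripped_all))
print(repr(stripped_eol))
```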

Using # as a comment character is also easy, we can do it with this logic: if line[0] == '#': continue which means “if the first column of the data is # then ‘continue’ the ‘for’ loop by jumping back up to the ‘for’ line, ignoring the rest of the indented lines under the ‘for’”.

A new version of the “magic” (still LOL!) script is:

# format for each line is: delimiter then search regex then delimiter then replace regex
sr_list = [
    '!a!A  ',
    '# I start with # so I am merely a comment line',
    '@b@B',
    '!c!C',
    ]

# or take input from a file:
#with open(r'sr_list.txt') as f: sr_list = f.readlines()

editor.beginUndoAction()

for line in sr_list:
    if line[0] == '#': continue
    (s,r) = line[1:].rstrip('\n').split(line[0])
    editor.rereplace(s,r)

editor.endUndoAction()
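For testing list files outside Notepad++, the parsing part of the script can be pulled out into a plain function. This is only a sketch: the name parse_sr_lines is hypothetical, and the blank-line skip and the field-count check are additions of the sketch, not something the script above does.

```python
def parse_sr_lines(lines):
    """Yield (search, replace) pairs from lines shaped <d>search<d>replace."""
    for line in lines:
        line = line.rstrip("\n")          # keep trailing blanks, drop the newline
        if not line or line[0] == "#":
            continue                      # skip blank lines and comment lines
        delimiter = line[0]
        parts = line[1:].split(delimiter)
        if len(parts) != 2:
            # The delimiter also occurs inside the search or replace text.
            raise ValueError("expected exactly two fields in %r" % line)
        yield parts[0], parts[1]


pairs = list(parse_sr_lines(["!a!A  \n", "# a comment\n", "\n", "@b@B"]))
print(pairs)
```

Inside the script, the loop would then read: for s, r in parse_sr_lines(sr_list): editor.rereplace(s, r). One consequence of the comment rule is that # itself cannot serve as a delimiter.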

Meta Chuh

@guy038

Note also that I said, above, “is a common way” and not “was a common way” as I do hope that Scott will be back, on our forum, very soon !

me too, and i think all others too … a little secret: i saw him active at the npp github repo a few days ago 😃👍 … but don’t tell anyone ;-)

I didn’t come back and just preferred going to bed as I’ve planned to spend a ski-day, as weather was quite nice, Wednesday, on Grenoble and, in addition, I also met some friends of mine, in Chamrousse ski-station ;-))

well done, best thing to do … but envyyyyyy ;-)

XXXX ( INSTALL folder of N++ v7.6.2 , DIFFERENT from folder “C:\Program Files” and folder “C:\Program Files (x86)” )

thanks for your tree, it comes in very handy and i’ve bookmarked it.

for completeness i would edit it to:
XXXX ( PORTABLE folder of N++ v7.6.2 , DIFFERENT from folder "C:\Program Files" and folder "C:\Program Files (x86)" )
and/or a note that doLocalConf.xml has to be present.
just to make sure readers will not get those structures mixed up with the different folder structure of an installed version without doLocalConf.xml.

guy038

Hello, @v-s-rawat, @alan-kilborn, @meta-chuh and All,

Alan, I tried your second version and everything went OK ! However, I prefer having a final separator, in order to easily see, in the SR_list.txt, the contents of the replacement regex.

So, here is, below, my own version of your excellent script :

#coding=utf-8

import re

# --------------------------------------------------------------------------------------------------------------------------------------

#                                           Script "Multiples_SR.py"

# A LITTLE adaptation from an ORIGINAL and VALUABLE script of Alan KILBORN ( January 2019 ) !

# See https://notepad-plus-plus.org/community/topic/16942/pythonscript-any-ready-pyscript-to-replace-one-huge-set-of-regex-phrases-with-others/21

# This script :

#   - Reads an existing "SR_List.txt" file, of the CURRENT directory, containing a list of SEARCH/REPLACEMENT strings, ONE PER line
#   - Selects, one at a time, a COUPLE of SEARCH and REPLACEMENT regexes  / expressions / strings / characters
#   - Executes this present S/R on CURRENT edited file, in NOTEPAD++
#   - Loops till the END of file

# Any PURE BLANK line or COMMENT line, beginning with '#', of the "SR_list.txt" file, is simply IGNORED

# --------------------------------------------------------------------------------------------------------------------------------------

# For EACH line, in the "SR_List.txt" file, the format is <DELIMITER><SEARCH regex><DELIMITER><REPLACE regex><DELIMITER>

## EXAMPLES :
## ¯¯¯¯¯¯¯¯

##  Deletes any [ending] "; comment"  /  Delimiter = '!'
#!(?-s)(^.*?);.+!\1!

##  Changes any LOWER-case string "notepad++" in its UPPER-case equivalent  /  Delimiter = '@'
#@(?-i)notepad\+\+@NOTEPAD++@

##  Changes any "Smith" and 'James' strings, with that EXACT case, to, respectively, "Name" and "First name"  /  Delimiter = '&'
##  Deletes any "TEST" string, with that EXACT case
#&(Smith)|TEST|(James)&(?1Name)(?2First name)&

##  Replaces any BACKSLASH character with the "123" number, both  preceded and followed with 3 SPACE characters  /  Delimiter = '%'
#%\\%   123   %
##    or, also, the syntax   %\x5c%   123   %

##  Deletes any string "Fix", followed with a SPACE char, whatever its CASE  /  Delimiter = '+'
#+(?i)Fix ++

##  Change 3 CONSECUTIVE "#" characters with 3 BACKSLASH characters  /  Delimiter = '*'
#*###*\\\\\\*

# --------------------------------------------------------------------------------------------------------------------------------------

# In the CODE line, right below, you may :

#   - Modify the NAME of the file, containing the SEARCH and REPLACEMENT regexes  
#   - Indicate an ABSOLUTE or RELATIVE path, before the filename

with open(r'SR_list.txt') as f: sr_list = f.readlines()

# You may, as well, insert the SEARCH and REPLACE regexes, directly, in THIS script :

#sr_list = [
#    '!(?-s)(^.*?);.+!\\1!',
#    '@(?-i)notepad\\+\\+@NOTEPAD++@',
#    '&(Smith)|TEST|(James)&(?1Name)(?2First name)&',
#    '%\\\\%   123   %',
#          # or the syntax  '%\x5c\x5c%   123   %',
#    '+(?i)Fix ++',
#    '*###*\\\\\\\\\\\\*',
#    ]

# The use of RAW strings  r'.......'  is also possible, in order to SIMPLIFY some regexes

# Note that these RAW regexes are strictly IDENTICAL to those, which could be contained in a "SR_List.txt" file, WITHOUT the 'r' PREFIX 

#sr_list = [
#    r'!(?-s)(^.*?);.+!\1!',
#    r'@(?-i)notepad\+\+@NOTEPAD++@',
#    r'&(Smith)|TEST|(James)&(?1Name)(?2First name)&',
#    r'%\\%   123   %',
#          # or the syntax  r'%\x5c%   123   %',
#    r'+(?i)Fix ++',
#    r'*###*\\\\\\*',
#    ]

editor.beginUndoAction()

console.write('\nMODIFICATIONS on FILE "{}" :\n\n'.format(notepad.getCurrentFilename()))

# Note : Variable e is always EMPTY string ( Part AFTER the THIRD delimiter and BEFORE the END of line ! )

for line in sr_list:

    if line[0] == '#' or line == '\n' : continue
    (s,r,e) = line[1:].rstrip('\n').split(line[0])

    console.write('    SEARCH  : >{}<\n'.format(s))
    console.write('    REPLACE : >{}<\n\n'.format(r))

    editor.rereplace(s,r)   # or editor.rereplace(s,r,re.IGNORECASE) / editor.rereplace(s,r,re.I)

editor.endUndoAction()

# END of Multiples_SR.py script
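The remark in the script about RAW strings can be checked in plain Python. The sketch below copies four of the entries from the two commented-out lists and compares them; the names escaped, raw, same and fields are just for the example.

```python
# Escaped form, as in the first commented-out list of the script.
escaped = [
    '!(?-s)(^.*?);.+!\\1!',
    '@(?-i)notepad\\+\\+@NOTEPAD++@',
    '%\\\\%   123   %',
    '*###*\\\\\\\\\\\\*',
]

# Raw form, as in the second commented-out list.
raw = [
    r'!(?-s)(^.*?);.+!\1!',
    r'@(?-i)notepad\+\+@NOTEPAD++@',
    r'%\\%   123   %',
    r'*###*\\\\\\*',
]

# Both spellings produce exactly the same strings.
same = (escaped == raw)

# Splitting one entry the way the script does gives three fields;
# the last one (variable e in the script) is the empty string.
fields = raw[2][1:].split(raw[2][0])

print(same)
print(fields)
```

The raw form is the one that matches, character for character, what would be typed in the SR_list.txt file.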

@meta-Chuh, as you said, I slightly modified the local Notepad++ tree, in my previous post, to point out the importance of the doLocalConf.xml file ;-))
Cheers,

guy038

Alan Kilborn

@guy038

Yea, probably a good idea. Trailing blanks are hard to see without having visible line ends turned on (yuck!), or doing them as \x20 or, as you like, a trailing delimiter.

Glad you are enjoying the script and your own script mods!

chcg

Would you like to create a PR of the script to be added to https://github.com/bruderstein/PythonScript/tree/master/scripts/Samples? Otherwise I could also add the last version of @guy038 , if that is ok for you.

I know the installation of PythonScript with N++ > 7.6.x is right now a horror. Hope i will find some time to get it compatible with PluginAdmin changes. The biggest problem known so far is moving the location of python27.dll into the plugin folder.

Meta Chuh

@chcg

I know the installation of PythonScript with N++ > 7.6.x is right now a horror.

i’ve made a little guide and summary of all paths, while being in a chat with peter, for the installed version here

and one for the portable version here

maybe you can use it, if you need to help someone.

The biggest problem known so far is moving the location of python27.dll into the plugin folder.

i suppose so, unless the plugin spawns a process with a different relative path, not bound to notepad++.exe’s path, or maybe even a static python27 library in the spawn.

guy038

Hi, @alan-kilborn and All,

I did some tests, with your script and, finally, the Python regex engine seems more reliable than our Boost regex engine ;-))

Some bugs or limitations, present in our Boost implementation ( see the REMARK section of this FAQ, below )

https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

do not occur anymore with the Python regex engine ;-))

Indeed :

You can insert, either, in search and replacement regexes, characters, located outside the BMP, directly or with the syntax \x{HHHHHHHH}
The NUL character, \x{0000}, can be used, either, in search and replacement regexes
The backward assertions, as, for instance, \A, seem correctly supported
The Look-behind assertions are correctly handled, even if it overlaps with the end of the previous match

Seemingly, we’ll just lack, with the Python regex engine, the case modifiers, ( \u, \l, \U, \L and \E )

These escaped sequences are available, with our Boost engine, in the replacement part. Refer to the address, below :

https://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html#boost_regex.format.boost_format_syntax.escape_sequences

For instance, against this text:

This is simple test

You may test the two regex S/R :

SEARCH \w+

REPLACE \u$0

and

SEARCH \w+

REPLACE \U$0 $0\E <$0>

AFAIK, they do not modify anything, ( I mean regarding case of characters ! ) when executed from a Python script :-((
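For comparison, the same two case changes can be obtained with Python's own re module by passing a function as the replacement. This is only a sketch with the standard library, so that it runs outside Notepad++; the helper names are invented. If I read the PythonScript documentation correctly, editor.rereplace also accepts a function as the replacement argument, so the idea may carry over, but that is not verified here.

```python
import re

text = "This is simple test"


def upper_first(m):
    # Rough equivalent of the Boost replacement \u$0
    word = m.group(0)
    return word[0].upper() + word[1:]


def doubled_upper(m):
    # Rough equivalent of the Boost replacement \U$0 $0\E <$0>
    word = m.group(0)
    return (word + " " + word).upper() + " <" + word + ">"


result_1 = re.sub(r"\w+", upper_first, text)
result_2 = re.sub(r"\w+", doubled_upper, text)

print(result_1)
print(result_2)
```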

Best Regards,

guy038

Alan Kilborn

@guy038 said:

I did some tests, with your script and, finally, the Python regex engine seems more reliable than our Boost regex engine

Can you show some examples of the Python regex engine testing you did?

Eko palypse

@guy038,

the script provided by @Alan-Kilborn uses the boost regex implementation from the PythonScript plugin, which, as you’ve already shown, is implemented differently than with npp.

Alan Kilborn

@Eko-palypse

Well that’s kinda what I was getting at by asking @guy038 that last question. I couldn’t tell from what he was saying if he was talking about the earlier script or if he had tried some real Python re.xxx functions for search and replace. Hence my question to him.

uses the boost regex implementation from the PythonScript plugin which is implemented differently than with npp

Is it truly, though? I always thought that it made calls back to whatever regex engine is in N++, but, hmmm, maybe not. Maybe I should check the source code. :)

Eko palypse

@Alan-Kilborn

From what I understand, yes, this is the case, it has the boost::regex engine implemented
https://github.com/bruderstein/PythonScript/blob/d54a2b434ec2b51f0dbacd3828fc36a20533c2dc/PythonScript/src/Replacer.cpp

guy038

Hi, @alan-kilborn, and All,

Alan, it’s just all the points, described in my previous post !

You can insert, either, in search and replacement regexes, characters, located outside the BMP, directly or with the syntax \x{HHHHHHHH}

From the text below :

🍬 = \x{1F36C}
🎂 = \x{1F382}
🎄 = \x{1F384}
🎅 = \x{1F385}
🎇 = \x{1F387}
🎺 = \x{1F3BA}
👼 = \x{1F47C}

with the Python regex engine, you can use :

SEARCH [\x{0001F36C}-\x{0001F47C}].+ or [\x{1F36C}-\x{1F47C}].+

REPLACE \x{1F385} = \\x{1F385}

So, with my modified script : @[\x{1F36C}-\x{1F47C}].+@\x{1F385} = \\x{1F385}@

and you get:

🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}
🎅 = \x{1F385}

For characters with code, above \x{FFFF}, you cannot do this kind of S/R with our Boost regex engine
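As a side note, a similar range replacement can be tried with the standard re module of Python 3, which works on full code points. This is a sketch under that assumption (Python 3, not the Python 2.7 bundled with the plugin); the standard re module does not know the \x{...} syntax, so the characters are written with \U escapes, and the sample text is shortened.

```python
import re

# Two lines starting with characters outside the BMP (U+1F36C and U+1F382).
text = "\U0001F36C = candy\n\U0001F382 = cake"

# Character class covering the range U+1F36C to U+1F47C, then the rest of the line.
pattern = "[\U0001F36C-\U0001F47C].+"

# A function is used as replacement so that the backslash is taken literally.
result = re.sub(pattern, lambda m: "\U0001F385 = \\x{1F385}", text)

print(result)
```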

The NUL character, \x{0000}, can be used, either, in search and replacement regexes

For instance, you can execute the following S/R, with the Python regex engine :

SEARCH [\x20-\x7f]

REPLACE $0\x00

giving for the script : @[\x20-\x7f]@$0\x00@

This S/R cannot be run with our Boost regex engine, which just deletes all the characters

The backward assertions, as, for instance, \A, seem correctly supported

Just imagine the text “This is a test” in a new N++ tab and the regex S/R :

SEARCH \A.

REPLACE -

So, in the script, the syntax @\A.@-@

With the Python regex engine, we get the correct text -his is a test ! With our Boost regex engine, after clicking on the Replace All button, we, wrongly, obtain the text -------------- :-((

The Look-behind assertions are correctly handled, even if it overlaps with the end of the previous match

Consider the text aaaabaaabaaa and the regex S/R :

SEARCH (?<=a)ba+

REPLACE 123a

=> the syntax @(?<=a)ba+@123a@, in the script

With the Python regex engine, the text is correctly modified as aaaa123a123a ( two S/R ) whereas, with the Boost regex engine, after clicking on the Replace All button, we get the wrong string aaaa123abaaa

Indeed, the second match never occurs, as it should have seen that the last char of replacement a was right before the baaa string, hence a second match :-((
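For reference, here is how the standard re module of Python treats the last two cases. Note that this is a different engine from the one behind editor.rereplace, so the sketch is for comparison only; the variable names are made up.

```python
import re

# \A anchors at the very start of the text, so only one character is replaced.
start_only = re.sub(r"\A.", "-", "This is a test")

# The look-behind is checked against the original text, so both "b" runs match.
both_runs = re.sub(r"(?<=a)ba+", "123a", "aaaabaaabaaa")

print(start_only)
print(both_runs)
```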

Cheers,

guy038