Testing regular expressions by RegexTester.py
-
Hello Claudia,
I missed your awesome python script Regex Tester.py !! After some tests, here are some deductions :
-
First of all, how to stop your script ?! May be, I miss something obvious !
-
Once, after an N++ re-start, I ran Regex Tester.py, but I omitted to open, before, a new document and to move it in the secondary view. Doing that, afterwards, I was surprised to get two tabs New 1 in the secondary view !
-
When using the End of Line characters syntax (
\R
,\r
or\n
) in a regex, they are not highlighted, even if the button Show All characters is set. However, the regex.\R.
does highlight the last character of a line and the first character of the next line :-) -
Let’s consider the subject string
aabaaababbbaabbab
On the picture, below, just under each regex, I indicated, first, the way your script highlights and outlines the matches, then, a new highlighting and outlining, just below. What do you think of, Claudia ? I don’t even know if it’s technically possible and not too hard to code !
But, please Claudia, take all your time, as it seems that you do hundred things, at the same time !!
Cheers,
guy038
-
-
Hi Guy,
thank you for your kind words.
First of all, how to stop your script ?! May be, I miss something obvious !
Obviously I missed that :- (
Make sure editor2 and the tab where RegexTester isActive… has the focus and then run it a second time.
Text will change to RegexTester inActive…
If you accidentally run it in another tab a second time you need to call it in that tab again.
This isn’t really user friendly and I’m thinking about having a solution like first run of script activates, second run deactivates, regardless
where you execute it.Once, after an N++ re-start, I ran Regex Tester.py, but I omitted to open, before, a new document and to move it in the secondary view.
Doing that, afterwards, I was surprised to get two tabs New 1 in the secondary view !Ooopss - don’t think that my script is responsible but will take a look
When using the End of Line characters syntax ( \R, \r or \n) in a regex, they are not highlighted, even if the button Show All characters is set.
However, the regex .\R. does highlight the last character of a line and the first character of the next line :-)Yes, I discovered this as well. It looks like scintilla/npp doesn’t allow me to color/access it.
Let’s consider the subject string aabaaababbbaabbab On the picture, below, just under each regex, I indicated, first, the way your script highlights and outlines the matches, then, a new highlighting and outlining, just below. What do you think of, Claudia ? I don’t even know if it’s technically possible and not too hard to code !
Obviously I cannot divide the letters but I see what you mean, I’ll think about it, should be possible.
I saw you reply on the regex /v topic, damned, missed that /v and [/v] have different meanings. Thx for clarifying it - AGAIN. ;-)
Cheers
Claudia -
Hello Claudia,
OK , after having focus on the new 1 tab again and choosing the option Plugins - Python Script - Run Previous Script (Regex Tester), I got, as expected, the text RegexTester inActive… and all the highlighting was suppressed ! But, why the option Plugins - Python Script - Stop script is greyed ?
To reproduce the issue, about new tabs :
-
Run the Regex Tester python script first ( Menu Plugins - Python Script - Scripts - Regex Tester )
-
Open a new tab ( Menu File - New ) or the CTRL + N shortcut
-
Right-Click) on that new tab and choose, in the context menu, the option Move to Other view
=> Two tabs new 1 are displayed !
BTW, the message :
RegexTester isActive [i] i=sensitive, I=insesitive
could be changed into :
RegexTester isActive [S] S=Sensitive, I=Insensitive
without changing any code :-))
Cheers,
guy038
-
-
Hi Guy,
in regards to the two new1 documents, npp does this in the background.
When you start npp with only one view open, npp already has opened a new1 document in second view.
If you don’t access it, it gets deleted/replaced by the one which you move to the second view.E.g.
New start of npp with one view result in one document named new1 which is visible.
Another new1 document is available in second view but currently invisible.
If you open another new document -> new2 appears but if you move this to second view,
new1 from second view gets replaced, and only if it hasn’t been touched in the meantime,
by new2 document. Not sure if this expected but doesn’t harm anyway.In regards to the sensitive switch, yes, you could use any letter as I’m checking for I (Capital i)only ;-)
Currently preparing a v2 of RegexTester, with your comments ;-)
Cheers
Claudia -
Hi Guy and all,
Version2 of the RegexTester.
Improvements based on Guys comments.-
If second view isn’t active a message pops up complaining about this and abort the start.
-
To stop the script you can execute the script a second time regardless which view is active.
-
Coloring/Grouping changed - based on a even/odd differentiation.
-
minor changes.
import re # import regular expression module
editor1.indicSetStyle(10,INDICATORSTYLE.ROUNDBOX) # used to color whole match - odd lines
editor1.indicSetFore(10,(95,215,184)) # the color
editor1.indicSetAlpha(10,55) # alpha settings
editor1.indicSetOutlineAlpha(10,255) # outlining
editor1.indicSetUnder(10,True) # draw under the texteditor1.indicSetStyle(9,INDICATORSTYLE.ROUNDBOX) # used to color whole match - even lines
editor1.indicSetFore(9,(195,215,184))
editor1.indicSetAlpha(9,55)
editor1.indicSetOutlineAlpha(9,255)
editor1.indicSetUnder(9,True)editor1.indicSetStyle(8,INDICATORSTYLE.CONTAINER) # used for sub matches
editor1.indicSetFore(8,(100,215,100))
editor1.indicSetAlpha(8,55)
editor1.indicSetOutlineAlpha(8,255)
editor1.indicSetUnder(8,True)isOdd = False # used as even/odd line identifier
def match_found(m):
global isOdd # global, because we modify it if m.lastindex > 0: # >0 = how many submatches do we have for i in range(0, m.lastindex + 1): # loop over it if i == 0: # match 0 is always the whole match editor1.setIndicatorCurrent(8) # set indicator editor1.indicatorFillRange(m.span(i)[0], m.span(i)[1] - m.span(i)[0]) # draw it else: editor1.setIndicatorCurrent(9 if isOdd else 10) # set indicator for sub matches editor1.indicatorFillRange(m.span(i)[0], m.span(i)[1] - m.span(i)[0]) # draw indicator isOdd = False if isOdd else True # set even/odd identifier - next sub match gets coloured different else: # no sub matches editor1.setIndicatorCurrent(9 if isOdd else 10) # set indicator for matches editor1.indicatorFillRange(m.span(0)[0], m.span(0)[1] - m.span(0)[0]) # draw indicator isOdd = False if isOdd else True # set even/odd identifier
def clear_indicator(): # clear all indicators by
length = editor1.getTextLength() # calculating length of document
for i in range(8,11): # and looping over
editor1.setIndicatorCurrent(i) # each indicator to
editor1.indicatorClearRange(0,length) # clear the rangedef regex(): # here the regex starts
clear_indicator() # first have a clear view ;-) pattern = editor2.getLine(0).rstrip() # next, get the pattern for the second view and cut of line endings try: # try it if editor2.getLine(2)[22:23] == 'i': # is it a case insensitive search? editor1.research(pattern, match_found, re.IGNORECASE) # then call research with the ignore case flag else: # otherwise editor1.research(pattern, match_found) # call without flag except: pass # is needed to catch incorrect regular expressions
def RegexTester_CHARADDED(args): # callback which gets called each time when char is added in editor
regex() # calls itself regex functiondef RegexTester_UPDATEUI(args): # callback gets called and emulates a CHARDELETE notification
if args[‘updated’] == 3: # is a bit of a hack but
regex() # seems to workdef checkIfSecondViewActive(): # self-explanatory
if notepad.getCurrentView() == 0:
notepad.messageBox(‘It is needed to have the second view active!!’,‘RegexTester’,0)
return False
else:
return Truedef startRegexTester(): # start procedures
editor.callback(RegexTester_CHARADDED, [SCINTILLANOTIFICATION.CHARADDED]) # register the callbacks charadd
editor.callback(RegexTester_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI]) # and emulated chardeletewasAlreadyRunning = 1 if console.editor.getProperty('RegexTester_running') == '0' else 0 # this checks if script was already running inputTab = console.editor.getProperty('RegexTester_inputTab') # do we have the bufferid of the previous run saved? if inputTab == '': newTabActive = 0 inputTab = notepad.getCurrentBufferID() # get bufferid from active tab to console.editor.setProperty('RegexTester_inputTab', inputTab) # remember where the input is done else: notepad.activateBufferID(int(inputTab)) newTabActive = 0 if notepad.getCurrentBufferID() == int(inputTab) else 1 # Has old tab been closed? if wasAlreadyRunning == 1 and newTabActive == 0: # editor2.replace('RegexTester inActive', 'RegexTester isActive') # add the status info to second view else: # no, this is the first time we run the script so editor2.appendText('\r\n\r\nRegexTester isActive [s] s=sensitive, i=insensitive') # add the status info to second view console.editor.setProperty('RegexTester_running', '1') # and set the running identifier editor2.setFocus(True) # give the second view the focus editor2.gotoLine(0) # and jump to line 1
def stopRegexTester(): # stop procedures
editor.clearCallbacks([SCINTILLANOTIFICATION.CHARADDED]) # clear the callback charadded and editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI]) # emulated chardeleted console.editor.setProperty('RegexTester_running', '0') # set info that script isn't running inputTab = console.editor.getProperty('RegexTester_inputTab') # get the bufferid of the inupttab if notepad.getCurrentBufferID() != inputTab: # is it currently active? if not, notepad.activateBufferID(int(inputTab)) # activate it editor2.replace('RegexTester isActive', 'RegexTester inActive') # add the status info to second view clear_indicator() # clear all indicators editor1.setFocus(True) # and give first view the focus. Have fun
if console.editor.getProperty(‘RegexTester_running’) == ‘1’: # if the script is currently running
stopRegexTester() # stop RegexTester
else: # else
if checkIfSecondViewActive(): # check if second view is the active one
startRegexTester() # start RegexTester
Cheers
Claudia -
-
Hi @guy038
just saw that I didn’t answer this question
But, why the option Plugins - Python Script - Stop script is greyed ?
Technically the script starts, registers its callbacks and ends. Done.
The reason why you still see changes while typing the regexes is,
that the callbacks get executed by python script plugin.Cheers
Claudia -
Hi, Claudia and All,
Got plenty of things to do at work, this week and I couldn’t find enough time and motivation to be on forums ! Even yesterday, I rebuilt a server, till 9.30 pm, whose hard disk was definitively dead, without any possibility of restoring ! ( So, please, don’t forget to backup your important files, from time to time ! One never knows ! )
Let’s go back, Claudia, to your second version of your Regex Tester Python script. Awesome, you did it ! Not only, it works perfectly well, but I suppose that the
re
Python’s module don’t have the issues than our present Boost regex version has :-)) Finally, your plugin behaves exactly as the non-official François-R Boyer regex engine does:-)
For instance, if we consider the subject string
aabaaababbbaabbab
, of my previous post, and the regex(?<!a)ba*
, the correct results are :- 1st match
b
, at position 10 - 2nd match
baa
, at positions 11, 12 and 13 - 3rd match
ba
, at positions 15 and 16
That is exactly the matches found with your script, as shown below :
With the classical regex search, we get 5 matches. But two of them are wrong : the
b
at position 14 and at position 17
A second example. From that link, below, you’ll see the 40 characters of the Osmanya alphabet, in the range
[\x{10480}-\x{104AF}]
, which are, obviously, outside the Unicode BMP Planehttp://www.unicode.org/charts/PDF/U10480.pdf
With an appropriate font ( Andagii ) set to Default Style of Global Styles , on the picture below, you’ll see that the regex
[\x{10485}-\x{104A3}]
does find the correct consecutive characters, with the Regex Tester script , UNLIKE the classical regex search, which leads to the error message Invalid regular expression :-((Cheers,
guy038
- 1st match
-
Hi Guy,
many many thx for doing all your tests and efforts. Very much appreciated.
Unfortunately your test means that my script failed as I was expecting that I can use
it to test functionlist more easily. :-(
(I already discovered that functionlist regex behaves strange sometimes but now… - YOUUUU broke it ;-))) kidding.In regards to the François-R Boyer regex implementation I’m on a good way I think.
Currently boost regex supports two ways of implementing unicode awareness.
Relying on wchar_t, which is how Don implemented it and by using unicode aware regular expression types like François did.So, at the moment, I don’t see how I could merge both codes reasonably, that’s why I started to use François’s code to replace Dons implementation.
Cheers
Claudia -
Hello Claudia,
I did the tests, of my previous post, with a 6.8.8 version, where I has, previously installed the Python plugin. I decided to verify if S/R are faster or not, with the François-R Boyer version, on the last 6.9.1 N++ version. And there’s a bad new, indeed !
The François-R Boyer regex engine, included in his SciLexer.dll version, does NOT work, with the last 6.9.1 version of N++ :-((
I verified that it’s OK with the 6.9 version, and the previous versions of Notepad++
Why ?!
Cheers,
guy038
-
Hi Guy,
sorry for answering so late - I had a day off - mostly bicycling and enjoying the nice weather.
I thought since upgrading scintilla this lib wasn’t working anymore!!??
Because of that I didn’t test it - I will give the original code a try and see what it is complaining about.Will comeback on this.
Cheers
Claudia