Poorman regex based styler/lexer
-
Recently a coworker asked for a Python script that would automatically color a monitored log based on regular expressions.
The script should allow defining multiple regexes and monitoring different files, identified by their extension.

Well, this is what came out. Maybe it is useful for others too.

The critical parts are the configuration area and the first function, init_scintilla.
Here are the steps that need to be done if one wants to extend it with another regex.
- create a new style id like STYLE_INFO and assign the next free integer
- define a new color like STYLE_INFO_FOREGROUND = (255,255,255)
- define a new regex like REGEX_DICT['STYLE_INFO'] = '(?-s).*INFO:\d{4}\s\-.*'
  The critical part here is that the dictionary key 'STYLE_INFO' needs to match the name of the id you chose in step 1
- add the appropriate function call(s) to init_scintilla
- done

This example should result in the following code

    STYLE_DEFAULT = 0 # don't change this
    STYLE_WARNINGS = 1
    STYLE_ERRORS = 2
    STYLE_INFO = 3

    STYLE_DEFAULT_FOREGROUND = (200,200,200) # grey
    STYLE_DEFAULT_BACKGROUND = (30,30,30) # kind of black
    STYLE_WARNINGS_FOREGROUND = (255,255,0) # yellow
    STYLE_ERRORS_FOREGROUND = (255,0,0) # red
    STYLE_INFO_FOREGROUND = (255,255,255)

    REGEX_DICT = OrderedDict() # to be sure that regex get called in the same order each time
    REGEX_DICT['STYLE_WARNINGS'] = '(?-s).*WARNING:\d{3}\s\-.*'
    REGEX_DICT['STYLE_ERRORS'] = '(?-s).*ERROR:\d{4}\s\-.*'
    REGEX_DICT['STYLE_INFO'] = '(?-s).*INFO:\d{4}\s\-.*'

and

    def init_scintilla():
        # this needs to match the definitions in configuration area
        editor.styleSetFore(STYLE_DEFAULT, STYLE_DEFAULT_FOREGROUND)
        editor.styleSetBack(STYLE_DEFAULT, STYLE_DEFAULT_BACKGROUND)
        editor.styleSetFore(STYLE_WARNINGS, STYLE_WARNINGS_FOREGROUND)
        editor.styleSetFore(STYLE_ERRORS, STYLE_ERRORS_FOREGROUND)
        editor.styleSetFore(STYLE_INFO, STYLE_INFO_FOREGROUND)

If one wants to add other file extensions, modify the list REGISTERED_FILE_EXTENSIONS.
Let's say the file extension .dmp_log should be added, then the list needs to look like this

    REGISTERED_FILE_EXTENSIONS = ['.log','.dmp_log']

There is also a boolean USE_AS_DEFAULT_LANGUAGE which could be set to True if
the 'new ?' docs should be colored as well.

OK, what else? The order in which REGEX_DICT is filled is the order in which the different
regexes get called. This matters when matches overlap:
e.g. if regex 1 colors the term error file copy failed in blue and another regex,
called later, looks for the word failed and colors it in red, then the result is
error file copy in blue and failed in red.

And if you play with different colors and regexes while testing,
you may notice that an "old" color doesn't get removed if it isn't part of the new match.
This is because the script doesn't use the styleClearAll method, which normally should be
called first on each start. But because one might use a themed npp, calling it could give an
undesirable result. So as a workaround while testing, it is best to stop the script,
activate another tab in the view which contains the file to be colored,
and then reactivate the file. Now you see that all colors are gone. Restart the script.
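
The ordering rule described above (a regex called later overwrites the color of an earlier one where their matches overlap) can be sketched outside Notepad++. This is only an illustration, assuming plain Python `re` instead of editor.research and a list of style ids instead of Scintilla's style bytes; the style names, ids and patterns are hypothetical.

```python
import re
from collections import OrderedDict

# Hypothetical style ids and patterns, chosen only for this illustration.
STYLE_DEFAULT, STYLE_BLUE, STYLE_RED = 0, 1, 2
STYLES = {'STYLE_BLUE': STYLE_BLUE, 'STYLE_RED': STYLE_RED}

REGEX_DICT = OrderedDict()
REGEX_DICT['STYLE_BLUE'] = r'error file copy failed'   # called first
REGEX_DICT['STYLE_RED'] = r'failed'                    # called second

def simulate_styling(text):
    """Return one style id per character, applying the regexes in dict order."""
    styles = [STYLE_DEFAULT] * len(text)
    for key, pattern in REGEX_DICT.items():
        for m in re.finditer(pattern, text):
            for pos in range(m.start(), m.end()):
                styles[pos] = STYLES[key]   # a later regex overwrites an earlier one
    return styles

text = 'error file copy failed'
styles = simulate_styling(text)
print(styles)
```

In this sketch the first 16 characters keep the blue id and the word failed ends up with the red id, which is the behaviour described for the real script.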
And, obviously, you cannot use this script for documents which are already colored by
a different lexer. That's it, I guess.

As always, the first run of the script starts it, whereas the next run stops it.
Questions or suggestions? Feel free to post them.
Cheers
Claudia

    import re, os
    from collections import OrderedDict

    # ---------------------------- configuration area -----------------------------
    # style definitions - changes here need to match setup calls in init_scintilla()
    STYLE_DEFAULT = 0 # don't change this
    STYLE_WARNINGS = 1
    STYLE_ERRORS = 2

    STYLE_DEFAULT_FOREGROUND = (200,200,200) # grey
    STYLE_DEFAULT_BACKGROUND = (30,30,30) # kind of black
    STYLE_WARNINGS_FOREGROUND = (255,255,0) # yellow
    STYLE_ERRORS_FOREGROUND = (255,0,0) # red

    REGISTERED_FILE_EXTENSIONS = ['.log']
    USE_AS_DEFAULT_LANGUAGE = False

    REGEX_DICT = OrderedDict() # to be sure that regex get called in the same order each time
    REGEX_DICT['STYLE_WARNINGS'] = '(?-s).*WARNING:\d{3}\s\-.*'
    REGEX_DICT['STYLE_ERRORS'] = '(?-s).*ERROR:\d{4}\s\-.*'
    # -------------------------- configuration area end ---------------------------

    CUSTOM_STYLER_IS_RUNNING = globals().get('CUSTOM_STYLER_IS_RUNNING', False)
    LIST_OF_FIRST_RUN_DONE = []
    DOC_NEEDS_TO_BE_COLORED = False

    def init_scintilla():
        # this needs to match the definitions in configuration area
        editor.styleSetFore(STYLE_DEFAULT, STYLE_DEFAULT_FOREGROUND)
        editor.styleSetBack(STYLE_DEFAULT, STYLE_DEFAULT_BACKGROUND)
        editor.styleSetFore(STYLE_WARNINGS, STYLE_WARNINGS_FOREGROUND)
        editor.styleSetFore(STYLE_ERRORS, STYLE_ERRORS_FOREGROUND)

    def custom_lexer(start_pos, end_pos):
        for k in REGEX_DICT.keys():
            matches = []
            editor.research(REGEX_DICT[k], lambda m: matches.append(m.span()), 0, start_pos, end_pos)
            for match in matches:
                # console.write('match:{} - {}\n'.format(match, editor.getTextRange(*match)))
                editor.startStyling(match[0],31)
                editor.setStyling(match[1]-match[0], eval(k))

    def this_doc_should_be_styled():
        global DOC_NEEDS_TO_BE_COLORED
        current_file = notepad.getCurrentFilename()
        _file, _ext = os.path.splitext(current_file)
        if _ext in REGISTERED_FILE_EXTENSIONS or (USE_AS_DEFAULT_LANGUAGE and _file.startswith('new ')):
            DOC_NEEDS_TO_BE_COLORED = True
        else:
            DOC_NEEDS_TO_BE_COLORED = False
        return DOC_NEEDS_TO_BE_COLORED

    def get_visible_area():
        first_visible_line = editor.getFirstVisibleLine()
        last_visible_line = editor.linesOnScreen() + first_visible_line
        start_pos = editor.positionFromLine(first_visible_line)
        end_pos = editor.positionFromLine(last_visible_line)
        return start_pos, end_pos

    def callback_BUFFERACTIVATED(args):
        if this_doc_should_be_styled():
            init_scintilla()
            if args['bufferID'] not in LIST_OF_FIRST_RUN_DONE:
                custom_lexer(*get_visible_area())
                LIST_OF_FIRST_RUN_DONE.append(args['bufferID'])

    def callback_UPDATEUI(args):
        if DOC_NEEDS_TO_BE_COLORED and args['updated'] >= 4:
            custom_lexer(*get_visible_area())

    def main():
        global CUSTOM_STYLER_IS_RUNNING
        if CUSTOM_STYLER_IS_RUNNING:
            notepad.clearCallbacks([NOTIFICATION.BUFFERACTIVATED])
            editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = False
        else:
            if this_doc_should_be_styled():
                init_scintilla()
                custom_lexer(*get_visible_area())
            notepad.callback(callback_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
            editor.callback(callback_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = True

    main()

-
this is absolutely great
it is super useful for noobs like me, who are trying to learn how to interface python with NPP
thank you
-
Hello everyone,
maybe someone can help me out, I can't really get it to work…
I would like to color columns of fixed width (alternating, in columns of 8: 8-8-8-8-8…), independent of the content of the line. I tried to adapt the poor man's script :-)
Here is what I wrote, but the coloring of even the first 8 columns does not work…
Maybe someone knows a solution. I guess the regex is still wrong, but I don't know why; it works here:
https://regex101.com/r/wuvNR3/2
Thanks a lot in advance!
Kind regards,
Johannes

    import re, os
    from collections import OrderedDict

    STYLE_DEFAULT = 0 # don't change this
    STYLE_COL1 = 1

    STYLE_DEFAULT_FOREGROUND = (255,0,255) # black
    STYLE_DEFAULT_BACKGROUND = (245,245,245) # white
    STYLE_COL1_BACKGROUND = (0,255,0) # green

    REGEX_DICT = OrderedDict() # to be sure that regex get called in the same order each time
    REGEX_DICT['STYLE_COL1'] = '(?-s).{8}(.{8})' # http://stackoverflow.com/questions/12559144/regex-for-fixed-position-and-length-field

    REGISTERED_FILE_EXTENSIONS = ['.bdf']

    def init_scintilla():
        # this needs to match the definitions in configuration area
        editor.styleSetFore(STYLE_DEFAULT, STYLE_DEFAULT_FOREGROUND)
        editor.styleSetBack(STYLE_DEFAULT, STYLE_DEFAULT_BACKGROUND)
        editor.styleSetBack(STYLE_COL1, STYLE_COL1_BACKGROUND)

    init_scintilla()

-
First make sure that no other lexer is active. Meaning, if you open a Python file
and run the script, it won't work, as the default Python lexer will overwrite the styles.
So if your file is already assigned to another lexer (builtin or user defined), choose
Normal text from the Language menu and then run the script.

The script's purpose was to color monitored files, therefore the following code has been created

    def callback_UPDATEUI(args):
        if DOC_NEEDS_TO_BE_COLORED and args['updated'] >= 4:
            custom_lexer(*get_visible_area())

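
A side note on the `>= 4` test in the callback above, as a minimal sketch. It assumes the SC_UPDATE_* flag values from the Scintilla documentation (1 content, 2 selection, 4 vertical scroll, 8 horizontal scroll) and that args['updated'] carries that bit mask; the helper names are made up for illustration. Under those assumptions, `>= 4` is true exactly when at least one scroll flag is set, so an explicit bit test behaves the same way and may be easier to adapt.

```python
# Flag values as documented for Scintilla's SCN_UPDATEUI notification.
SC_UPDATE_CONTENT = 0x1    # text or styling changed
SC_UPDATE_SELECTION = 0x2  # selection or caret moved
SC_UPDATE_V_SCROLL = 0x4   # scrolled vertically
SC_UPDATE_H_SCROLL = 0x8   # scrolled horizontally

SCROLL_MASK = SC_UPDATE_V_SCROLL | SC_UPDATE_H_SCROLL

def passes_threshold(updated):
    """The test used in the script."""
    return updated >= 4

def has_scroll_flag(updated):
    """Hypothetical explicit alternative: check the scroll bits directly."""
    return bool(updated & SCROLL_MASK)

# Both tests agree for every combination of the four flags.
agreement = all(passes_threshold(u) == has_scroll_flag(u) for u in range(16))
print(agreement)
```

To react to vertical scrolling only, the mask in the explicit variant could be reduced to SC_UPDATE_V_SCROLL.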
The args['updated'] >= 4 check is the critical part: it means that the coloring is only updated
if a scroll action happened. See here for more details, maybe you want to use a different value for your purpose.

Now the regex part.
The regex engine always returns the whole match as well as submatches, which means that if you are
looking for .{8}(.{8}), the engine returns the whole match containing the 16 chars
and a submatch containing the second 8 chars.
The script ignores submatches. If you need to work with submatches,
here is my regex tester, which shows how to handle those.

Otherwise you have to create something like this.

    REGEX_DICT['STYLE_COL1'] = '(?-s)^.{8}'
    REGEX_DICT['STYLE_COL2'] = '(?-s)^.{8}\K.{8}'
    REGEX_DICT['STYLE_COL3'] = '(?-s)^.{16}\K.{8}'

Just another question: what are you trying to achieve?
If you just want to have a column view on the text, you need to make
sure that all rows contain the same amount of text, otherwise you will
see gaps.

If there is something else I can do, let me know.
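
The three column regexes above follow a regular pattern, so they could also be generated instead of typed by hand. A minimal sketch, assuming the same '^.{offset}\K.{width}' form; the function name and NUM_COLUMNS are hypothetical and not part of the script.

```python
from collections import OrderedDict

COLUMN_WIDTH = 8   # field width used in the thread
NUM_COLUMNS = 4    # hypothetical: how many fields should get a style

def build_column_regexes(width, count):
    """Build one regex per fixed-width column, keyed STYLE_COL1, STYLE_COL2, ..."""
    regexes = OrderedDict()
    for index in range(count):
        offset = index * width
        key = 'STYLE_COL{}'.format(index + 1)
        if offset == 0:
            regexes[key] = r'(?-s)^.{%d}' % width
        else:
            # skip 'offset' characters, then \K drops them from the reported match
            regexes[key] = r'(?-s)^.{%d}\K.{%d}' % (offset, width)
    return regexes

REGEX_DICT = build_column_regexes(COLUMN_WIDTH, NUM_COLUMNS)
for key, pattern in REGEX_DICT.items():
    print(key, pattern)
```

Because the script looks up each dictionary key as a variable name, every generated key would still need a matching style id (STYLE_COL1 = 1, STYLE_COL2 = 2, ...) and its own style setup call in init_scintilla.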
Cheers
Claudia
-
Dear Claudia,
thanks for the quick response, this really gives me hope that sometime soon I can solve this looong lasting problem of mine…
The task the script should perform is actually rather simple, I suppose:

I use a script language called Nastran, an old, Fortran-related language, where everything is written in an ASCII text file on a "fixed width" basis, meaning that each text package has to be written in fields of length 8 (-> column width 8).
In Linux, I use nedit as text editor for these ASCII files, and in nedit I have the option to highlight columns, independent of their content. This makes programming very easy, since input errors caused by accidentally shifted input are nicely visualized:
https://goo.gl/photos/t3RKWFenAixy5Eve9
(can you see this pic?)
Since I am switching more and more to Windows and npp, it would be great to have the same coloring option…
With your input above I still can't get it to work, I have to admit… I adapted it to highlight at least the first 8 columns, but it does not show any change when running the script… Maybe you can think of a simple version for highlighting columns independent of their content? I think it should not look too different from what you proposed, maybe without all the callbacks?
I think the script has to be run only once, since the coloring is independent of what's actually in the text file…

Thanks a lot again in advance!!
Kind regards,
Johannes
-
Hello Johannes,
I will follow up on this when back from work.
But without callbacks you would have to run it over and over again
to recolor when the text changes.

One question in the meantime: are the 8 columns shown in your photo
also fixed, or could it be that more or fewer columns are required?

Cheers
Claudia
-
Hello Claudia,
thanks again for the quick reply! Oh, it does not have to be without callbacks, I thought it might make life easier. In my version I tried without, but it did not work, so… :)
The 8 columns are fixed.
In the meantime I will try some more if I find the time; in case I can get something to work, I will let you know…
Thanks a lot again!
Kind regards,
Johannes
-
Hello Johannes,
I understand that you will give it a try yourself
and that you will come back in case of a question, correct?

Cheers
Claudia
-
Hello @Claudia-frank and All,
If you don’t mind, Claudia, I would like you to test my example text, with my own version of your script !
Here is the sample text, in the Test.log file :
1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 12 123456 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 12 123456 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 12 123456 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 12 123456 1234567890123456789012345 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345 1234567890123456789012345 1234567890 1234567890123456789012345
Now, here is the part of your script, that I slightly changed :

    # ---------------------------- configuration area -----------------------------
    # style definitions - changes here need to match setup calls in init_scintilla()
    STYLE_DEFAULT = 0 # don't change this
    STYLE_WARNINGS = 1
    STYLE_ERRORS = 2

    STYLE_DEFAULT_FOREGROUND = (0,0,0) # Black
    STYLE_DEFAULT_BACKGROUND = (255,255,255) # White
    STYLE_WARNINGS_BACKGROUND = (0,255,0) # Green
    STYLE_ERRORS_BACKGROUND = (255,0,0) # Red

    REGISTERED_FILE_EXTENSIONS = ['.log']
    USE_AS_DEFAULT_LANGUAGE = False

    REGEX_DICT = OrderedDict() # to be sure that regex get called in the same order each time
    REGEX_DICT['STYLE_WARNINGS'] = '(?-s)^.{8}'
    REGEX_DICT['STYLE_ERRORS'] = '(?-s)^.{16}\K.{4}'
    # -------------------------- configuration area end ---------------------------

    CUSTOM_STYLER_IS_RUNNING = globals().get('CUSTOM_STYLER_IS_RUNNING', False)
    LIST_OF_FIRST_RUN_DONE = []
    DOC_NEEDS_TO_BE_COLORED = False

    def init_scintilla():
        # this needs to match the definitions in configuration area
        editor.styleSetFore(STYLE_DEFAULT, STYLE_DEFAULT_FOREGROUND)
        editor.styleSetBack(STYLE_DEFAULT, STYLE_DEFAULT_BACKGROUND)
        editor.styleSetBack(STYLE_WARNINGS, STYLE_WARNINGS_BACKGROUND)
        editor.styleSetBack(STYLE_ERRORS, STYLE_ERRORS_BACKGROUND)

Notes :
- As you can see, as I usually work with the default colours, I changed the STYLE_DEFAULT_… colours
- Of course, I changed the two regexes for testing :-D
- In the init_scintilla() function, the last two lines begin with editor.styleSetBack ( instead of editor.styleSetFore )
- And, accordingly, I renamed the two variables STYLE_WARNINGS_FOREGROUND and STYLE_ERRORS_FOREGROUND to STYLE_WARNINGS_BACKGROUND and STYLE_ERRORS_BACKGROUND

After starting N++ and opening the Test.log file, I ran your script and… bingo : Wow, I got two coloured columns :
- A green one, eight characters wide, between column 1 and column 8
- A red one, four characters wide, between column 17 and column 20

That’s great, indeed !
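
The two column ranges reported above can be cross-checked outside Notepad++. This is a rough sketch with Python's own `re` module rather than the engine behind editor.research: as far as I know, Python's `re` has no \K, so the part to be styled is put into group 1 instead, and the (?-s) prefix is left out. The helper name and the shortened sample are assumptions made for the illustration.

```python
import re

# Group 1 stands in for the part that \K leaves as the reported match.
GREEN = re.compile(r'^(.{8})', re.MULTILINE)      # stands in for '(?-s)^.{8}'
RED = re.compile(r'^.{16}(.{4})', re.MULTILINE)   # stands in for '(?-s)^.{16}\K.{4}'

# A shortened version of the Test.log sample: one long line and three short ones.
sample = '\n'.join([
    '1234567890123456789012345',
    '12',
    '123456',
    '1234567890',
])

def styled_columns(pattern, text):
    """Return (line number, first column, last column), all 1-based, per match."""
    result = []
    for m in pattern.finditer(text):
        start, end = m.span(1)
        line_start = text.rfind('\n', 0, start) + 1
        line_no = text.count('\n', 0, start) + 1
        result.append((line_no, start - line_start + 1, end - line_start))
    return result

green = styled_columns(GREEN, sample)
red = styled_columns(RED, sample)
print(green)
print(red)
```

On this sample the green pattern covers columns 1 to 8 of every line that has at least 8 characters, and the red pattern covers columns 17 to 20 of the long line only, which agrees with the result described above.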
Now, there is a small problem : If you scroll the editor window up and down, quickly enough, with the mouse, some parts of the text, which should not be matched, are highlighted too :-((. I, first, thought it was due to the \K syntax. Unfortunately, even if you write, as below :

    REGEX_DICT['STYLE_ERRORS'] = 'XYZ'

You'll surely notice that some green parts of text are wrongly highlighted !
What do you think about it ?
Cheers,
guy038
-
-
Hi Guy,
scrolling that fast - you must be one of those who can read a whole book in under a minute ;-)
I assume the asynchronous updateui callback isn't good in such a case.
Give it a try with the synchronous version.

    editor.callbackSync(callback_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])

Cheers
Claudia
-
Hi Claudia,
got it working, thanks a lot! Also the synchronous version works well, better than the asynchronous one in my case, I guess.
https://goo.gl/photos/yqRr2o8rDJUm7Z2x6
(never mind the colors…)
One last question… I want to grey out comment lines, which in this language start with a dollar sign and last till the end of the line (always). The dollar sign can appear in the middle of a line, so also after a few columns of actual code.
I thought that this regex might do, but it does not:

    REGEX_DICT['STYLE_COMM'] = '(?<=[$]).+$'

Can you give me a hint on how it should look?
Thanks again!
Kind regards,
Johannes
-
Hello, @Claudia-frank, @johannes-dillinger and All
A quick reply, because it’s 3h00 am, in France, and… I work, tomorrow !
It’s just fine ! No problem, anymore, with the synchronous UPDATEUI callback :-))) So, thanks to you, you have a new wonderful world to discover, with multiple colouring of, either, foreground and/or background of lines of files, with a specific extension, handled with regular expressions !!
I think that the correct regex ( tested, with a grey colour, on foreground and background ! ) should be, as below :

    REGEX_DICT['STYLE_COMM'] = '(?-s)\$.*'

Notes :
- The (?-s) modifier tells the regex engine that the meta-character . matches a single standard character, only
- The \$ part searches for a literal $ character
- The final part .* looks for the remainder of the current line, possibly empty !

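
To see what the two comment regexes select, here is a small comparison with Python's own `re` module. It is only a sketch: the engine behind editor.research is a different one, so this does not explain why the first attempt failed inside Notepad++, and the (?-s) prefix is left out because, to my knowledge, Python's `re` does not accept it in that form. The sample lines and the helper name are made up.

```python
import re

LOOKBEHIND = re.compile(r'(?<=[$]).+$')   # Johannes's attempt
LITERAL = re.compile(r'\$.*')             # guy038's suggestion, without (?-s)

def comment_part(pattern, line):
    """Return the text the pattern would select in the line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None

# Hypothetical sample lines in the style of a fixed-width input file.
line_with_comment = 'GRID    1       $ node one'
line_ending_in_dollar = 'GRID    1       $'

for line in (line_with_comment, line_ending_in_dollar):
    print(repr(comment_part(LOOKBEHIND, line)), repr(comment_part(LITERAL, line)))
```

In this sketch the lookbehind version leaves the dollar sign itself out of the match and finds nothing when the dollar sign is the last character, whereas the literal version includes the dollar sign and also matches an empty comment.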
Once more, a very powerful script, that you give us, Claudia ! A thousand thanks, Yeah !
Cheers,
guy038
-
-
Hello guy,
perfect!! Thanks for the solution, all happy now :-)
Kind regards,
Johannes
-
Hi, @claudia-frank,
When I, first, saw the @jcrmatos post, as the address, below :
I thought that it should have been possible to use your great script to visualize the gap between column 73 and 79, colouring the background. So, @jcrmatos could easily see, at once, the two limits, located at columns 72 and 79 !
Then, from your original script, I built the script, below, which highlights, in pale blue, the columns between the columns 73 and 79 :

    import re, os
    from collections import OrderedDict

    # ---------------------------- configuration area -----------------------------
    # style definitions - changes here need to match setup calls in init_scintilla()
    STYLE_DEFAULT = 0 # don't change this
    STYLE_EDGES = 1

    STYLE_DEFAULT_FOREGROUND = (0,0,0) # Black
    STYLE_DEFAULT_BACKGROUND = (255,255,255) # White
    STYLE_EDGES_BACKGROUND = (208,240,255) # Pale Blue ( &xD0 , &xF0 , &xFF )

    REGISTERED_FILE_EXTENSIONS = ['.log']
    USE_AS_DEFAULT_LANGUAGE = False

    REGEX_DICT = OrderedDict() # to be sure that regex get called in the same order each time
    REGEX_DICT['STYLE_EDGES'] = '(?-s)^.{72}\K.{1,7}'
    # -------------------------- configuration area end ---------------------------

    CUSTOM_STYLER_IS_RUNNING = globals().get('CUSTOM_STYLER_IS_RUNNING', False)
    LIST_OF_FIRST_RUN_DONE = []
    DOC_NEEDS_TO_BE_COLORED = False

    def init_scintilla():
        # this needs to match the definitions in configuration area
        editor.styleSetFore(STYLE_DEFAULT, STYLE_DEFAULT_FOREGROUND)
        editor.styleSetBack(STYLE_DEFAULT, STYLE_DEFAULT_BACKGROUND)
        editor.styleSetBack(STYLE_EDGES, STYLE_EDGES_BACKGROUND)

    def custom_lexer(start_pos, end_pos):
        for k in REGEX_DICT.keys():
            matches = []
            editor.research(REGEX_DICT[k], lambda m: matches.append(m.span()), 0, start_pos, end_pos)
            for match in matches:
                # console.write('match:{} - {}\n'.format(match, editor.getTextRange(*match)))
                editor.startStyling(match[0],31)
                editor.setStyling(match[1]-match[0], eval(k))

    def this_doc_should_be_styled():
        global DOC_NEEDS_TO_BE_COLORED
        current_file = notepad.getCurrentFilename()
        _file, _ext = os.path.splitext(current_file)
        if _ext in REGISTERED_FILE_EXTENSIONS or (USE_AS_DEFAULT_LANGUAGE and _file.startswith('new ')):
            DOC_NEEDS_TO_BE_COLORED = True
        else:
            DOC_NEEDS_TO_BE_COLORED = False
        return DOC_NEEDS_TO_BE_COLORED

    def get_visible_area():
        first_visible_line = editor.getFirstVisibleLine()
        last_visible_line = editor.linesOnScreen() + first_visible_line
        start_pos = editor.positionFromLine(first_visible_line)
        end_pos = editor.positionFromLine(last_visible_line)
        return start_pos, end_pos

    def callback_BUFFERACTIVATED(args):
        if this_doc_should_be_styled():
            init_scintilla()
            if args['bufferID'] not in LIST_OF_FIRST_RUN_DONE:
                custom_lexer(*get_visible_area())
                LIST_OF_FIRST_RUN_DONE.append(args['bufferID'])

    def callback_UPDATEUI(args):
        if DOC_NEEDS_TO_BE_COLORED and args['updated'] >= 4:
            custom_lexer(*get_visible_area())

    def main():
        global CUSTOM_STYLER_IS_RUNNING
        if CUSTOM_STYLER_IS_RUNNING:
            notepad.clearCallbacks([NOTIFICATION.BUFFERACTIVATED])
            editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = False
        else:
            if this_doc_should_be_styled():
                init_scintilla()
                custom_lexer(*get_visible_area())
            notepad.callback(callback_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
            editor.callbackSync(callback_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = True

    main()

Although it works fine on .log files, I noticed two problems :
- If I modify a .log file by adding a new line more than 79 characters long, I have to re-run the script twice in order to get this new line highlighted between the 73rd and the 79th character ! ( BTW, a small drawback : the current line cannot be highlighted ! )
- When I decided to change the .log extension to the .py extension, I was quite surprised to get almost all comments of the Python script highlighted, too ! And sometimes, the columns between positions 73 and 79 were not highlighted.

So, Claudia, any explanation about these behaviours ?
Best Regards,
guy038
-
-
Hi Guy,
you have to rerun it two times because I implemented my start/stop logic as in most of my scripts.
See the main section

    def main():
        global CUSTOM_STYLER_IS_RUNNING
        if CUSTOM_STYLER_IS_RUNNING:
            notepad.clearCallbacks([NOTIFICATION.BUFFERACTIVATED])
            editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = False
        else:
            if this_doc_should_be_styled():
                init_scintilla()
                custom_lexer(*get_visible_area())
            notepad.callback(callback_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
            editor.callbackSync(callback_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])
            CUSTOM_STYLER_IS_RUNNING = True

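
The start/stop behaviour of main() above can be reproduced without Notepad++. A minimal sketch, assuming (as the globals().get(...) line in the script suggests) that module-level names survive from one run of the script to the next; the namespace dictionary and the reduced script body are stand-ins invented for the illustration.

```python
# Hypothetical stand-in for the script body: only the flag handling is kept.
SCRIPT_BODY = """
CUSTOM_STYLER_IS_RUNNING = globals().get('CUSTOM_STYLER_IS_RUNNING', False)
if CUSTOM_STYLER_IS_RUNNING:
    CUSTOM_STYLER_IS_RUNNING = False   # the real script clears its callbacks here
else:
    CUSTOM_STYLER_IS_RUNNING = True    # the real script registers its callbacks here
"""

def run_script(namespace):
    """Execute the script body in a namespace that persists between runs."""
    exec(SCRIPT_BODY, namespace)
    return namespace['CUSTOM_STYLER_IS_RUNNING']

shared_namespace = {}
states = [run_script(shared_namespace) for _ in range(3)]
print(states)
```

The flag alternates on every run, so when the styler is already running, the next run only stops it and one more run is needed to start it again.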
If CUSTOM_STYLER_IS_RUNNING is set, it clears the callbacks, so there is no further coloring;
if it is not running, it registers the callbacks to make coloring happen.

Concerning the renaming: this script is NOT a real lexer, as it doesn't register itself
as a lexer by using the container object, so it is critical that you choose an extension which hasn't
a lexer assigned. By the way, this means that the script can't be used on .py files, as you normally have
the Python lexer active.

Regarding the active line: yes, it gets overwritten by the active line style.
In general, Scintilla doesn't provide many functions when it comes to column-based coloring;
basically the edge line is the only one I've encountered so far. A solution might be to
draw to the window context itself, but this would mean one has to handle window resizing messages,
themes etc…

Cheers
Claudia
-
Hi, @claudia-frank,
Oh yes, I just forgot that your script acts, exactly, like your RegexTesterPro.py script. This just proves that I haven't studied your excellent regex tester script for a long time !!
Concerning the second point, I do understand your explanations. To be honest, right after changing the extension to .py and before restarting your script, I already thought that this change could lead to some unpredictable results :-((
Cheers,
guy038