Define custom syntax highlighting with Pythonscript plugin

Mikhail V

I want to try to customize the syntax colors using the Pythonscript plugin.
I’ve looked into examples by @Claudia-Frank here:
custom colors for methods

One thing I want to know: is it possible to make the code more ‘procedural-style’?

In those examples I am trying to understand what all those class, __metaclass__, etc.
things are and how it all works together.
Looks really hacky, to me at least. I am quite proficient with programming
but when I see a word class, I already get some kind of disturbance.

So the question is basically - is involving classes and method tweaking the only
approach in this case?

I am thinking about the algorithm and how I would go about colorising
characters, and it comes down to this pseudo-program:

1. Take the indexes of start and end of visible chunk -
   no problem, I can do it.
2. Find the indexes of needed characters sequence -
   and store it in say “group 1”. No problem so far.
3. Do same for some other searches, store as “group 2”,
   “3”, … etc. So end up with N groups of indexes.
4. In a loop, take each index group and mark with its own style, i.e.
   group i -> style i

It seems everything is super-simple and I can cope with each step easily.
But is it possible to make the above steps into a working example without
going into PhD-level meta-programming techniques?
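
The steps listed above can be sketched procedurally, with no classes. This is only a rough sketch under assumptions: `find_groups` and `mark_groups` are hypothetical helper names, the patterns are made up, and `ed` is assumed to behave like PythonScript's `editor` object (its `setIndicatorCurrent` and `indicatorFillRange` methods wrap SCI_SETINDICATORCURRENT and SCI_INDICATORFILLRANGE). The `re` module reports character offsets while Scintilla positions are byte offsets, so the sketch assumes plain ASCII text.

```python
import re

def find_groups(text, patterns, offset=0):
    """Return one list of (start, end) index pairs per pattern.

    `offset` is the document position at which `text` begins, e.g. the
    start of the visible chunk.
    """
    groups = []
    for pattern in patterns:
        spans = [(m.start() + offset, m.end() + offset)
                 for m in re.finditer(pattern, text)]
        groups.append(spans)
    return groups

def mark_groups(ed, groups, first_indicator=8):
    """Mark group i with indicator number first_indicator + i.

    Assumption: `ed` behaves like PythonScript's `editor` object.
    8 is Scintilla's first container indicator number; whether it is
    free in a given Notepad++ setup is also an assumption.
    """
    for i, spans in enumerate(groups):
        ed.setIndicatorCurrent(first_indicator + i)
        for start, end in spans:
            ed.indicatorFillRange(start, end - start)

# Example with made-up patterns: group 0 = numbers, group 1 = the word "def".
sample = "def f(): return 42"
groups = find_groups(sample, [r"\d+", r"\bdef\b"], offset=100)
```

Inside Notepad++ the call would presumably be `mark_groups(editor, groups)`, with the text of the visible chunk fetched beforehand, for instance via `editor.getTextRange(start, end)`.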

Scott Sumner

@Mikhail-V said:

but when I see a word class, I already get some kind of disturbance

Well-worded…is the feeling similar to motion sickness? :-D

Mikhail V

@Scott-Sumner said:

Well-worded…is the feeling similar to motion sickness? :-D

Yes similar to that :)
Or for example a feeling when I have to fill in an application form
which should be easy, but the wording of questions gets irritating.

Claudia Frank

@Mikhail-V

I’m confident that the classes can be rewritten to use a more procedural-style as you said.

There are two reasons why I used classes here.

The first one involves the metaclass.
The script uses callbacks, and the intention is to use it on several
documents, with the possibility to assign a different lexer and, if needed,
reassign this “pseudo”-lexer. I wanted to make sure that the callback
function is registered only once; otherwise the callback function gets
called multiple times. That already explains what this metaclass feature
does: it ensures that you always get the same object when calling
EnhancedUDLLexer.

The second reason is similar to the first one - make sure that the variables/functions
used are not overwritten by another (or the same) script as it would break the script immediately.

So, if you plan to use such a script on multiple documents, then you need to solve this issue;
otherwise you might see unexpected behavior.

Cheers
Claudia

Mikhail V

@Claudia-Frank said:

I’m confident that the classes can be rewritten to use a more procedural-style as you said.
There are two reasons why I used classes here.

Thanks for the clarification :) I hope it does not look like I want to make fun of
your coding style. It is just my pet peeve (OOP).

So I was able to run that example in the linked post.
Though I can run only on the new Npp version with the latest Pythonscript plugin.
I suspect you have some knowledge of non-documented features ;)

So I am still trying to understand the possibilities in the first place.
Earlier you have written about possibilities:

a) writing a lexer entirely with python script
b) writing an “xml-linter” with python script

This is taken out of the context but seems you got some experience.
So maybe you know the answer for these 2 questions:

1. Can I apply a styler to specific range? I make
   emphasis on “styler” because in your examples you deal only with
   “indicator” which only changes the color, but I want to change font and
   size of the characters if it is possible.
2. Can I get the information about currently applied styler or
   any info about the lexer state for the character at specific index?

In Scintilla Documentation I find this:

SCI_GETLINESTATE(int line) → int
As well as the 8 bits of lexical state stored for each character there 
is also an integer stored for each line.

This mentions some “state for each character”. ?? But I can’t find any other calls
that refers to this state. I guess the active lexer should store this useful info, but how to read it?

I am just trying to understand how plausible is the idea of writing a fully
custom highlighting with the PS plugin, and I am still in doubt.

Mikhail V

Follow-up: regarding question #1, there are related Scintilla API calls:

SCI_GETENDSTYLED → position
SCI_STARTSTYLING(int start, int unused)
SCI_SETSTYLING(int length, int style)
SCI_SETSTYLINGEX(int length, const char *styles)
SCI_SETIDLESTYLING(int idleStyling)
SCI_GETIDLESTYLING → int
SCI_SETLINESTATE(int line, int state)
SCI_GETLINESTATE(int line) → int
SCI_GETMAXLINESTATE → int

So I suppose these might work to set the styles somehow.

Claudia Frank

@Mikhail-V

I hope it does not look like I want to make fun of your coding style.

No problem, I haven’t understood it that way anyway. :-)
But I hope that I do get criticism when I post something which could/should
be coded differently. For example like guy038 does when I post regexes which
can be simplified or aren’t correct at all.
I’m still learning new things every day. If you had asked 6 months ago
whether it is possible to have two different lexers acting on the same document,
I would have posted NO; nowadays I know better or, I should say, nowadays I know a
way around that problem.

So I was able to run that example in the linked post.
Though I can run only on the new Npp version with the latest Pythonscript plugin.
I suspect you have some knowledge of non-documented features ;)

I hope I haven’t used undocumented features, but yes, pythonscript > 1.0.8 is needed,
as we have added notepad functions like notepad.getLanguageName(notepad.getLangType())
only recently. I should have made that clear - thx for pointing it out.

Before answering the two questions, let me clarify, as far as I understand it, the two
ways scintilla supports to colorize documents.
The first one, available from the beginning of scintilla, is styling; indicators were added later.
The idea behind indicators is totally different from styling and, as far as I understand, they weren’t
intended to be used as another way of styling. I just misuse them as a kind of lightweight styler.
And yes, there are differences between using a styler and using indicators.
A lexer (styler) gets the information about which part of the document needs to be restyled; indicators don’t get this info.
When using styling you have full control over every aspect of a style, like a different font;
with indicators you can only modify the foreground and background color
(ignoring, for the moment, the different shapes you can put around text).
But because they are handled independently they can be used together, and from my understanding indicators are
the only safe way to enhance an existing lexer.

Can I apply a styler to specific range? I make
emphasis on “styler” because in your examples you deal only with
“indicator” which only changes the color, but I want to change font and
size of the characters if it is possible.

You can, by using the styling functions, but at the price that you have to write your own lexer.
That means you cannot use it together with an existing lexer.
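
For illustration, applying styles to ranges with the styling functions might look like the sketch below. It assumes that `ed` behaves like Pythonscript's `editor` object, that the buffer has already been switched to container lexing (for example with `editor.setLexer(LEXER.CONTAINER)` in the Pythonscript versions that provide it), and that style numbers 1 and 2 are free to redefine. `define_styles` and `apply_styles` are hypothetical helper names, and the font and colors are made up.

```python
def define_styles(ed):
    """Give two style numbers their own color, font and size."""
    ed.styleSetFore(1, (200, 0, 0))      # colors are (r, g, b) tuples
    ed.styleSetFont(1, "Courier New")
    ed.styleSetSize(1, 14)
    ed.styleSetFore(2, (0, 0, 200))
    ed.styleSetBold(2, True)

def apply_styles(ed, ranges):
    """Apply (start, end, style) triples via SCI_STARTSTYLING / SCI_SETSTYLING."""
    for start, end, style in sorted(ranges):
        # The second argument was a style-bit mask in older Scintilla
        # releases and is documented as unused in newer ones.
        ed.startStyling(start, 31)
        ed.setStyling(end - start, style)
```

A call such as `apply_styles(editor, [(0, 4, 1), (10, 15, 2)])` would then style two ranges. If a built-in or UDL lexer is still active on the buffer, it will most likely overwrite these styles again.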

Can I get the information about currently applied styler or
any info about the lexer state for the character at specific index?

Styles can be retrieved, e.g. with the editor.getStyleAt function.
You cannot get a lexer state, in terms of a builtin lexer or a udl lexer.
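
As a small illustration of reading styles back, the sketch below collapses a range of positions into runs of equal style using `getStyleAt`. `style_runs` is a hypothetical helper, and `ed` is assumed to behave like Pythonscript's `editor` object.

```python
def style_runs(ed, start, end):
    """Return (run_start, run_end, style) triples for positions start..end-1."""
    runs = []
    run_start = start
    current = None
    for pos in range(start, end):
        style = ed.getStyleAt(pos)
        if current is None:
            current = style
        elif style != current:
            # The style changed: close the previous run and open a new one.
            runs.append((run_start, pos, current))
            run_start, current = pos, style
    if current is not None:
        runs.append((run_start, end, current))
    return runs
```

What a given style number means depends on whichever lexer produced it, so the numbers alone say nothing about the lexer's internal state.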

SCI_GETLINESTATE(int line) → int
As well as the 8 bits of lexical state stored for each character there
is also an integer stored for each line.
This mentions some “state for each character”. ?? But I can’t find any other calls
that refers to this state. I guess the active lexer should store this useful info, but how to read it?

My understanding is that this is stored by scintilla internally rather than by the lexer, but I haven’t really checked the sources.
The additional linestate functions have been introduced to give the lexer a way to have sub-lexers working.
Like in the html lexer, where different lines need to be colored differently depending on which sub-lexer (php, js, html …) is used.

I am just trying to understand how plausible is the idea of writing a fully
custom highlighting with the PS plugin, and I am still in doubt.

I would say it depends - a full python lexer is always slower than a builtin or udl lexer,
but nowadays with such computing power it might be possible that you do not even notice the difference
if the documents which should be colored have only thousands, and not millions, of lines.
Writing your own lexer gives you full control, which means you can possibly do what a builtin or udl lexer can’t do,
like having a regex based lexer.
Whether it makes sense or not is always up to the one who thinks about a possible solution.
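
To give an idea of how small the core of a regex based lexer can be, here is a sketch with made-up rules and style numbers. It is an illustration only, not anyone's actual lexer: `tokenize`, `RULES` and `DEFAULT_STYLE` are hypothetical names, and the offsets are character offsets, so it assumes ASCII text.

```python
import re

# Made-up rules: (style number, pattern). Where several rules could match
# at the same position, the first listed rule wins.
RULES = [
    (1, r'#[^\n]*'),       # comment up to end of line
    (2, r'"[^"\n]*"'),     # double-quoted string
    (3, r'\b\d+\b'),       # number
]
DEFAULT_STYLE = 0

def tokenize(text, rules=RULES):
    """Split text into (start, length, style) runs covering every character."""
    combined = "|".join("(?P<s%d>%s)" % (style, pat) for style, pat in rules)
    runs = []
    pos = 0
    for m in re.finditer(combined, text):
        if m.start() > pos:                  # unstyled gap before the match
            runs.append((pos, m.start() - pos, DEFAULT_STYLE))
        style = int(m.lastgroup[1:])         # group name "s3" -> style 3
        runs.append((m.start(), m.end() - m.start(), style))
        pos = m.end()
    if pos < len(text):                      # trailing unstyled text
        runs.append((pos, len(text) - pos, DEFAULT_STYLE))
    return runs

runs = tokenize('x = 42 # "no"')
```

Each run could then be applied with the styling calls, e.g. `editor.startStyling(start, 31)` followed by `editor.setStyling(length, style)`.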

I hope I was able to demystify this a little bit, if not, let me know.

Cheers
Claudia

Claudia Frank

@Mikhail-V

I have uploaded a proof-of-concept regex based lexer here.

Cheers
Claudia

Mikhail V

@Claudia-Frank
Thanks for the input!

I got to do a lot of experimenting with it.
What comes to my mind - I could use some external libraries for
the lexical analysis, but then I have another task - to import and use 3rd-party
libraries with the PS plugin… but anyway, it seems that is not necessary - I have
regex and could loop over bytes as well, so it is not the main problem here.

All in all, sounds like an interesting challenge.