[c#] Adding a custom styler or lexer in C# for scintilla/notepad++
-
I’ve found some time to work on this again, and I’m adding the lexer to the CSV lint plug-in. So far it’s looking pretty good, though there are still some bugs to fix. 😏
-
I’ve got a lexer-related question.
The CSV Lint lexer needs to (among other things) set the separator character when a different file is selected. For example, file `test123.csv` will have `,` (comma) as separator, while `tabsfile.txt` will have `\t` (tab) as separator. To make this work, I’ve added code in `Main` to catch the event when the Notepad++ user changes to a different tab, i.e. when a different file is shown, and I catch the notification like so:

```csharp
public static void OnNotification(ScNotification notification)
{
    // changing tabs
    if (notification.Header.Code == (uint)NppMsg.NPPN_BUFFERACTIVATED)
    {
        // determine the separator character of the current file
        var sep = SomeCodeToDetermineSeparator();
        // set the separator character for the lexer
        ILexer.separatorChar = sep;
    }
}
```
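`SomeCodeToDetermineSeparator()` isn’t shown in the post. As a rough illustration only, here is a hypothetical version (the class `SeparatorGuesser` and its heuristic are my own assumptions, not CSV Lint’s actual code): it counts a few candidate characters on the first lines of a sample and picks the one that occurs equally often on every line. It ignores quoted values, and the sample should be trimmed to whole lines, otherwise a cut-off last line makes every candidate look inconsistent.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for SomeCodeToDetermineSeparator(); not CSV Lint's real detection code.
public static class SeparatorGuesser
{
    // Candidate separators, in order of preference when two score the same.
    private static readonly char[] Candidates = { ',', '\t', ';', '|' };

    // Guess the separator from a sample of the file (e.g. its first few kilobytes).
    // Quoted values are ignored, so a comma inside "..." still counts.
    public static char Guess(string sample, char fallback = ',')
    {
        // Only look at the first 10 non-empty lines.
        string[] lines = sample
            .Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries)
            .Take(10)
            .ToArray();
        if (lines.Length == 0)
            return fallback;

        char best = fallback;
        int bestCount = 0;
        foreach (char candidate in Candidates)
        {
            // Number of times the candidate occurs on each sampled line.
            int[] counts = lines.Select(line => line.Count(ch => ch == candidate)).ToArray();

            // A plausible separator occurs at least once, and equally often on every line.
            bool consistent = counts[0] > 0 && counts.All(n => n == counts[0]);
            if (consistent && counts[0] > bestCount)
            {
                best = candidate;
                bestCount = counts[0];
            }
        }
        return best;
    }
}
```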
And then in the lexer there is a variable `separatorChar` which can be set, and that will be used in the `Lex()` method to give each column a different color.

```csharp
internal static class ILexer
{
    public static readonly string Name = "CSVLint\0";
    public static readonly string StatusText = "My CSV Lint example\0";
    public static char separatorChar = '\t';
    //etc.

    public static void Lex(IntPtr instance, UIntPtr start_pos, IntPtr length_doc, int init_style, IntPtr p_access)
    {
        int start = (int)start_pos;
        int length = (int)length_doc;
        IntPtr range_ptr = editor.GetRangePointer(start, length);
        string content = Marshal.PtrToStringAnsi(range_ptr, length);

        // use the separatorChar
        while (i < length)
        {
            if (content[i] == separatorChar)
            //etc. code for different color per column
```
This works, kind of, but the problem is that it doesn’t always show the colors correctly at first. When selecting a tab it’s all one color, but when you make one edit at the beginning of the file (add/remove one character) then `Lex()` is called again and the colors are shown correctly. I understand why this happens: when the separator character does not correspond with the file contents, the lexer will not find the separator, and the plug-in will interpret the entire line as one column.
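To make the failure mode concrete, here is a minimal, hypothetical per-character styler (`ColumnStyler` is an illustrative name, not the plug-in’s code). It assumes single-byte text, as with `Marshal.PtrToStringAnsi` above, and doesn’t handle quoted values. With the wrong separator no column boundary is ever found, so every character on the line gets the style of the first column.

```csharp
using System;

// Sketch: assign one style number per character of a CSV range.
// Style 0 = default (separators and line breaks); columns cycle through styles 1..columnStyles.
public static class ColumnStyler
{
    public static byte[] Style(string text, char separator, int columnStyles)
    {
        if (columnStyles < 1 || columnStyles > 255)
            throw new ArgumentOutOfRangeException(nameof(columnStyles));

        var styles = new byte[text.Length];
        int column = 0;
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\r' || c == '\n')
            {
                styles[i] = 0;   // explicit default style
                column = 0;      // a new line starts at the first column again
            }
            else if (c == separator)
            {
                styles[i] = 0;   // separators are not part of any column
                column++;        // everything after it belongs to the next column
            }
            else
            {
                styles[i] = (byte)(column % columnStyles + 1);
            }
        }
        return styles;
    }
}
```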
So I suspect this is some timing issue, and `Lex()` is probably already running before `separatorChar` has been updated. So my question is:
What is the best way to communicate or set parameters to be used in the `Lex()` function?
-
While working on the CSVLint lexer, the lexer randomly crashes when you have multiple files opened. I get this error message when debugging:
A call has been made on a garbage collected delegate ‘CSVLint!NppPluginNET.PluginInfrastructure.ILexer+ILexerLex::Invoke’
So I checked the original EdifactLexer example project, but there the same thing happens. When you open more than 1 file with the EdifactLexer enabled, then Notepad++ also crashes when you switch between the files.
The error is slightly different, I think because EdifactLexer uses the wordlists while CSV Lint doesn’t; see this error:
A call has been made on a garbage collected delegate ‘EdifactLexer!NppPluginNET.PluginInfrastructure.ILexer+ILexerWordListSet::Invoke’
It’s always when switching tabs to the other file, but I can’t quite nail down the circumstances. It seems to happen either when you start editing one of the files, or after you’ve manually enabled the lexer from the language menu, and then switch tabs.
@Ekopalypse Could this have something to do with switching between `_scintillaMainHandle` and `_scintillaSecondHandle`?
-
@Bas-de-Reuver said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
What is the best way to communicate or set parameters to be used in the Lex() function?
Use the PropertySet method to inform the lexer that a different separator character has been selected. To quote from the docs:
The return values from PropertySet and WordListSet are used to indicate whether the change requires performing lexing or folding over any of the document. It is the position at which to restart lexing and folding or -1 if the change does not require any extra work on the document. A simple approach is to return 0 if there is any possibility that a change requires lexing the document again while an optimisation could be to remember where a setting first affects the document and return that position.
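A rough sketch of what the managed side of such a `PropertySet` handler could look like, following the return-value rule quoted above. The property key `csvlint.separator` and the `LexerSettings` class are hypothetical; the actual ILexer callback receives the key and value as native strings, which would first be converted (for example with `Marshal.PtrToStringAnsi`).

```csharp
using System;

// Sketch of the managed logic behind a PropertySet callback.
// "csvlint.separator" is a hypothetical property name.
public static class LexerSettings
{
    public static char SeparatorChar = ',';

    // Returns the position to restart lexing from (0 = whole document), or -1 if nothing changed.
    public static int Apply(string key, string value)
    {
        if (key == "csvlint.separator" && !string.IsNullOrEmpty(value))
        {
            // Accept a tab written as the escape "\t" as well as a literal character.
            char sep = value == "\\t" ? '\t' : value[0];
            if (sep != SeparatorChar)
            {
                SeparatorChar = sep;
                return 0;   // separator changed: the whole document must be lexed again
            }
        }
        return -1;          // unknown key or same value: no extra work on the document
    }
}
```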
Could this have something to do with switching between the _scintillaMainHandle and _scintillaSecondHandle
Only if those documents are in different views.
If it’s just a different tab in the same view, then it’s always the
same Scintilla handle, and I’m pretty sure I tested that.
Hmm, let me double-check that today.
Do you have any sample data where this crash occurs?
-
@Ekopalypse said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
Use the PropertySet method
Thanks, it hadn’t occurred to me to use that; I’ll look into it. The lexer needs fewer parameters than the CSV editing functions: it only needs the separator character and/or the column widths (for fixed-width files).
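Since lexer properties are plain key/value strings, the column widths would need to be serialized somehow. One hypothetical encoding (my assumption, not an existing CSV Lint format) is a comma-separated list such as `5,10,3`, which the lexer turns back into column start offsets:

```csharp
using System;
using System.Linq;

// Hypothetical property format: fixed-width column widths as one comma-separated string.
public static class FixedWidthProperty
{
    // e.g. { 5, 10, 3 } -> "5,10,3"
    public static string Encode(int[] widths) =>
        string.Join(",", widths.Select(w => w.ToString()));

    // e.g. "5,10,3" -> { 5, 10, 3 }; an empty value means "no fixed widths".
    public static int[] Decode(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            return new int[0];
        return value.Split(',').Select(part => int.Parse(part.Trim())).ToArray();
    }

    // Offset of each column from the start of a line, e.g. { 5, 10, 3 } -> { 0, 5, 15 }.
    public static int[] ColumnStarts(int[] widths)
    {
        var starts = new int[widths.Length];
        for (int i = 1; i < widths.Length; i++)
            starts[i] = starts[i - 1] + widths[i - 1];
        return starts;
    }
}
```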
Do you have any sample data where this crash occurs?
I’ve added an extra edifact data file, see `edifact_example.txt` and `edifact_example_2.edi` on the GitHub page NppPluginLexerExample. Btw, afaik the data contents don’t really affect the crashing, it’s just switching between the tabs. The easiest way I can reproduce it is like this:
1. open two .EDI files, both should have syntax colors because of the file extension
2. edit or delete some line(s) in one file
3. switch to the other file
4. edit or delete any line(s) in the second file
5. switch back to the tab of the first file

It’s not very consistent, but it usually crashes at either step 3 or step 5, though sometimes I have to repeat the steps one more time.
-
I just did a quick test with Npp 8.1.2 and can see the crash,
then I tested with Npp 7.9.5, the version I was originally using,
and the crash did not occur.
I’m not sure if this is a problem introduced by the new Npp
version or if this just exposed a bug in the plugin.
This needs to be investigated further; I will keep you posted.
-
Nope, I see the crash even with 7.9.5 when the files have different sizes.
I suppose I know where this is going: the lexer is still assuming
the previous buffer and trying to style text in an area that is
not valid for the current buffer.
It seems I need to find a way to implement the IDocument interface.
-
Ok, the good news is that I got the IDocument interface working,
the bad news is that the issue still exists.
It seems GC is the issue.
Managed Debugging Assistant 'CallbackOnCollectedDelegate' : 'A callback was made on a garbage collected delegate of type 'EdifactLexer!NppPluginNET.PluginInfrastructure.ILexer+ILexerLex::Invoke'. This may cause application crashes, corruption and data loss. When passing delegates to unmanaged code, they must be kept alive by the managed application until it is guaranteed that they will never be called.'
I thought that defining a static class like `internal static class ILexer` with `static ILexer4 ilexer4 = new ILexer4 { };` would prevent it from getting collected, but this message obviously tells me it doesn’t. :-(
So, what needs to be done to prevent the class from being GC’ed? Enough for today, I’m going to sleep.
-
At first I thought Notepad++ made a `Lexer` instance per document, but `GetLexerFactory(int index)` only gets called once. I don’t really know what `IDocument` has to do with this, or how to fix this.
Also, I’ve tested with two files: file A is smaller and file B is larger. As far as I can tell it always crashes when you edit the smaller of the two files, never when editing the larger one.
-
Interestingly, `public static IntPtr ILexerImplementation()` does get called multiple times, once for every tab that needs `Lex()` for the colors.
So if I start Notepad++ and it has two CSV file tabs already opened from the previous session, the last file is shown with colors, and `IntPtr ILexerImplementation()` has been called only once. When I then switch to the other CSV tab, `IntPtr ILexerImplementation()` is called a second time. So `ilexer4` is initialized again with all new properties.
-
If I add a check to see if it is already initialised, then it doesn’t crash anymore, so that’s good.

```csharp
public static IntPtr ILexerImplementation()
{
    if (ilexer4.Version == null)
    {
        // simulate a c++ vtable by creating an array of 25 function pointers
        ilexer4.Version = new ILexerVersion(Version);
        ilexer4.Release = new ILexerRelease(Release);
        ilexer4.PropertyNames = new ILexerPropertyNames(PropertyNames);
        //etc
```

But I’m not sure if this is a good solution or if it’s considered just a hack.
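A stripped-down sketch of the "initialise once" idea, to show why it helps: the delegates passed to native code live in static fields, and the fake vtable is built only on the first call, so later calls hand Scintilla the same pointers instead of replacing delegates that the GC could then collect. The two-slot vtable, the delegate types and the placeholder version value are simplifications I made up; the real ILexer4 layout has many more entries.

```csharp
using System;
using System.Runtime.InteropServices;

// Sketch: build a C++-style object (first field = pointer to a vtable) exactly once,
// and keep every delegate reachable through static fields.
public static class LexerVtable
{
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate int VersionFn(IntPtr instance);

    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    public delegate void ReleaseFn(IntPtr instance);

    // Static fields keep the delegates alive for the lifetime of the plug-in.
    private static VersionFn versionFn;
    private static ReleaseFn releaseFn;

    private static IntPtr vtable = IntPtr.Zero;    // array of function pointers
    private static IntPtr instance = IntPtr.Zero;  // "object" whose first field points to the vtable

    private static int Version(IntPtr self) => 2;  // placeholder value for this sketch
    private static void Release(IntPtr self) { }    // shared instance: nothing to free

    public static IntPtr GetInstance()
    {
        if (instance != IntPtr.Zero)
            return instance;   // already built: reuse it, do NOT create new delegates

        versionFn = Version;
        releaseFn = Release;

        vtable = Marshal.AllocHGlobal(2 * IntPtr.Size);
        Marshal.WriteIntPtr(vtable, 0, Marshal.GetFunctionPointerForDelegate<VersionFn>(versionFn));
        Marshal.WriteIntPtr(vtable, IntPtr.Size, Marshal.GetFunctionPointerForDelegate<ReleaseFn>(releaseFn));

        instance = Marshal.AllocHGlobal(IntPtr.Size);
        Marshal.WriteIntPtr(instance, vtable);
        return instance;
    }
}
```

Since every call returns the same shared instance, a `Release` written in this style should not free it.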
-
@Bas-de-Reuver said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
At first I thought Notepad++ made a Lexer instance per document
Also, I’ve tested with two files, file A is smaller and file B is larger.
Interestingly, the public static IntPtr ILexerImplementation() does get called multiple times…
This is also my understanding.
GetLexerFactory is called once, by Scintilla, to get the function that returns the pointer of the ILexer implementation.
Each time a document that has this lexer assigned to it is activated, the ILexerImplementation function is called.
If I add a check to see if it is already initialised then it doesn’t crash anymore, so that’s good.
But I’m not sure if this is a good solution or if it’s considered just a hack
I think you are on to something. If this solves the crash problem, then it means that the garbage collection
happened when the functions were renewed and, ultimately, the vtable_pointer as well.
I have updated my fork regarding this, which also contains the implementation of the IDocument interface.
The IDocument interface provides predefined methods for lexing and folding a document.
Unlike the ILexer implementation, where we provide methods for Scintilla, the IDocument interface is where Scintilla provides us with methods.
The advantage is that Scintilla only ever processes the document for which it calls Lex or Fold.
The easiest way to see how this happens is to open two documents and move one of them to the other view that Npp offers.
If you now put the focus in one view, then move the mouse to the other view WITHOUT activating it and start scrolling
with the mouse wheel, this non-active document will be processed accordingly.
This was not possible with the original version, because only the active document was handled.
From a lexer’s point of view, all requirements are met now.
I hope that this also solves the issue of garbage collection.
-
@Ekopalypse said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
I think you are on to something. If this solves the crash problem, then it means that the garbage collection
happened when the functions were renewed and, ultimately, the vtable_pointer also.
I have updated my fork regarding this, which also contains the implementation of the IDocument interface.
Sounds great, I hadn’t even noticed the “other view” issue as I haven’t used it. If you want to make a pull request, I’d be happy to accept it.
The LexerExample project would be pretty much done then. I only want to clean it up, remove all unused menu items and add one option to highlight the numeric values in yellow or something. That will make the lexer “user-interactive”, so to speak, which is also still needed for the CSV Lint lexer.
-
Thanks, I’ll do the PR tomorrow.
I would like to go through it again.
Feel free to change anything that is not really C#-like,
and I would be happy if you could drop me a short note when
you have done your cleanup. Then I would write my VLang lexer
in C# and use it for a while to make sure we don’t have any hidden/unknown issues.
-
@Ekopalypse Ok, I’ve cleaned up the project and removed the unused menu items, and I’ve added a menu item with a checkmark that you can click to toggle on or off. I’ve also changed the lexer so it can highlight numeric values, which can be toggled on or off with a boolean.
However, I can read the property from the property list, see here, but how do I send a boolean value to the `PropertySet` function of the iLexer? I mean, what is the best way to invoke the `PropertySet` function of the iLexer from the main menu? See the line here, it’s commented out/disabled. If I understand correctly, invoking `PropertySet` will cause `Lex()` to also refresh (depending on the return value), which is good because it will then toggle the numeric highlights on/off.
-
Thx, yes but no. You should NEVER call the methods of the
ILexer interface, but their corresponding functions of the
Scintilla object,
like SetProperty and SetKeywords.
The ILexer methods are meant to be used exclusively by Scintilla.
Sorry I haven’t done the PR yet, but real life and other projects got in the way.
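Following that advice, a menu callback would set the property through the editor instead of calling `PropertySet` itself. A minimal sketch, assuming two callbacks that send `SCI_SETPROPERTY` (4004) and `SCI_COLOURISE` (4003) to the current Scintilla view; the property name and the `NumericHighlightMenu` class are hypothetical, and the key just has to match what the lexer’s `PropertySet` checks.

```csharp
using System;

// Sketch of a menu callback that toggles a lexer option via the editor, not via ILexer.
public static class NumericHighlightMenu
{
    // Hypothetical property name; must match the key checked in the lexer's PropertySet.
    public const string PropertyName = "edifact.numeric.highlight";

    public static bool Enabled { get; private set; }

    // setProperty: sends SCI_SETPROPERTY (4004) to the current Scintilla view.
    // colourise:   sends SCI_COLOURISE (4003) with (start, end).
    public static void Toggle(Action<string, string> setProperty, Action<int, int> colourise)
    {
        Enabled = !Enabled;

        // Scintilla forwards this key/value pair to the lexer's PropertySet.
        setProperty(PropertyName, Enabled ? "1" : "0");

        // Optional: restyle the whole document now (end = -1 means "to the end")
        // instead of waiting for the next repaint.
        colourise(0, -1);
    }
}
```

In the plug-in, the two arguments could be lambdas around the existing SendMessage wrapper, or methods of the Scintilla gateway if it exposes matching signatures.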
-
PR made - I hope I didn’t mix up too many things when I merged your changes and mine into one PR.
From a high level perspective, the Lexer implementation is now fully functional and my tests have shown no more crashes, but I guess only the future can tell.
-
@Ekopalypse said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
PR made - I hope I didn’t mix up too many things when I merged your changes and mine into one PR.
Thanks for the PR, I’ve updated the Lexer Example project and I’ve also updated the CSV Lint plug-in to include the syntax highlighting. It works great for both CSV and fixed-width data 😃 (quoted string values are also supported now, which was an important issue).
About the syntax highlighting, I’ve tested it extensively and it works in large part, but there are some minor things I ran into:
- Large files take quite a while for the lexer to render, about 10 seconds for a 10 MB file. The lexer is then also called for every edit the user makes, so each edit to the text file is slightly delayed, by something like a quarter to half a second per keypress. It’s workable, but you definitely notice it. I’ve added a larger edi data file to test it (and a script to generate even larger files).
- When changing an option in the lexer, it isn’t always immediately updated. The edifact lexer now has an option to give all numeric values a different (yellow) color. When you click the menu item to toggle this on, sometimes it works immediately, but sometimes you have to click in the document or switch tabs back and forth before it’s visible. As far as I know, I’ve set it correctly through the `SetProperty` of the editor (see here).
- In the CSV lexer, sometimes the parts that are supposed to be unstyled (so just white) contain a random color. I think it’s because the lexer just skips the parts that should have no styling (style index 0), and then when you edit the document the styling gets shifted around. I suspect that the solution is to either clear all styling on every lexer update, or explicitly set the “no styling” style (the latter is probably better for performance).
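One way to do the "explicitly set no styling" part, as a sketch: collapse the per-character styles into runs that cover the whole lexed range, style 0 included, so no gap keeps a stale color. Scintilla’s IDocument interface has `StartStyling(position)` and `SetStyleFor(length, style)` (and `SetStyles` for a whole array) that such runs could be written with; how those are reached from C# depends on the IDocument wrapper, and `StyleRuns` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch: turn a per-character style array into (length, style) runs that cover
// every character of the lexed range, including default style 0.
public static class StyleRuns
{
    public static List<(int Length, byte Style)> From(byte[] styles)
    {
        var runs = new List<(int Length, byte Style)>();
        int i = 0;
        while (i < styles.Length)
        {
            byte style = styles[i];
            int start = i;
            // Extend the run while the style stays the same.
            while (i < styles.Length && styles[i] == style)
                i++;
            runs.Add((i - start, style));   // style 0 runs are emitted too, never skipped
        }
        return runs;
    }
}
```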
-
Large files take quite a …
That sounds like the lexer tries to lex the whole document instead of the range provided by scintilla.
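A minimal sketch of keeping the work local: take the `start`/`length` Scintilla passes to `Lex()` and only widen it to whole lines, so a line-based lexer like CSV can restart its column counter at 0 without scanning from the top of the document. `LexRange` is a hypothetical helper, positions are treated as character indices, and values with quoted line breaks would need more context (for example `init_style`).

```csharp
using System;

// Sketch: widen the requested lexing range to whole lines instead of lexing the whole document.
public static class LexRange
{
    // Moves start back to the beginning of its line and the end forward to the end of its
    // line (exclusive, including the '\n'), never beyond the text.
    public static (int Start, int End) ToWholeLines(string text, int start, int length)
    {
        if (start < 0 || length < 0 || start + length > text.Length)
            throw new ArgumentOutOfRangeException(nameof(start));

        int s = start;
        while (s > 0 && text[s - 1] != '\n')
            s--;                         // back up to the first character of the line

        int e = start + length;
        while (e < text.Length && e > 0 && text[e - 1] != '\n')
            e++;                         // finish the last, partially requested line
        return (s, e);
    }
}
```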
When changing an option in the lexer, it isn’t always immediately updated
I didn’t notice this with my own lexer. As soon as I use SetProperty, PropertySet of the lexer is called
and changes do get updated via the ILexer interface.
In the CSV lexer, sometimes
Yes, that’s the interesting part of lexing: what do you do with the parts that have changed?
Brute force would be to clean everything up and restyle everything, but of course that is time-consuming.
Usually Scintilla can make a pretty good guess as to which parts need to be restyled,
but it can’t be sure it’s always right.
Personally, however, I have not noticed this in my projects yet.
I’ll take another look at the above points and let you know.
-
@Ekopalypse said in [c#] Adding a custom styler or lexer in C# for scintilla/notepad++:
I tried the edifact lexer with a 20MB file and the first load
took about 7 seconds, but then each change was made
in 4-7 ms according to `watch.ElapsedMilliseconds`.
A good test to see if a lexer is fast enough is to press a key and hold it.
If you don’t see any stuttering while rendering, the lexer is fast enough.
According to the property, it looks like it is used immediately,
but I have found that it can take a while, depending on where
you are in the current file, since it starts from position 0 (`IntPtr.Zero`).
A possible solution would be to start with the first visible line
to speed things up.
As for the (re)styling, I haven’t found a good general approach.
I suppose this is something each lexer has to take care of itself.
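For the "start with the first visible line" idea, a sketch of how that position could be looked up, assuming `send` wraps SendMessage to the current Scintilla handle (an assumption; `VisibleRange` is a hypothetical name). The display line is converted to a document line first, because folding and wrapping make them differ. The result could then be used as the start for `SCI_COLOURISE`, which only works if the lexer can restart at any line, as a simple line-based CSV lexer can.

```csharp
using System;

// Sketch: find the document position of the first visible line.
// `send` stands in for SendMessage(scintillaHandle, msg, wParam, lParam).
public static class VisibleRange
{
    public const int SCI_GETFIRSTVISIBLELINE = 2152;
    public const int SCI_DOCLINEFROMVISIBLE = 2221;
    public const int SCI_POSITIONFROMLINE = 2167;

    public static long FirstVisiblePosition(Func<int, long, long, long> send)
    {
        // Display line at the top of the view (differs from the document line with folding/wrapping).
        long displayLine = send(SCI_GETFIRSTVISIBLELINE, 0, 0);
        // Convert it to the corresponding document line.
        long docLine = send(SCI_DOCLINEFROMVISIBLE, displayLine, 0);
        // Position of the first character of that line.
        return send(SCI_POSITIONFROMLINE, docLine, 0);
    }
}
```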