Custom lexer: updating syntax highlighting when the user adds new lines inside a quoted string
-
Okay, this is a very specific issue, but I’m open to any input because I’m not quite sure how best to fix it.
The CSV Lint plugin supports syntax highlighting, including quoted strings. When the user edits a file, the plugin's `Lex()` function is called, either from the start of the file or from the line that was edited, and this all works fine. However, a user recently posted an issue describing that when they start editing and add new lines (a carriage return, i.e. Enter) inside a quoted string, the syntax highlighting colors become incorrect. See the screenshot below.
The problem is that the `Lex()` function is called with the start position at the beginning of the edited line, so in the example it’s at the start of line 8. This skips the opening quote character `"`, so the CSV Lint lexer initialises the color index to 1 (= blue) as if it were at a new record. In other words, the line is treated as if it were at the start of the file, while in fact it starts inside a quoted string. Put another way, the lexer is missing the context of the location where the user started editing.
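To make the failure concrete, here is a minimal Python model of the situation. It is not the plugin's actual code: `lex_lines`, the sample data, and the `(column, in_quotes)` tuples are all made up for illustration, and the real lexer tracks more than this.

```python
def lex_lines(lines, start_line=0, in_quotes=False, column=1):
    """Toy CSV lexer: return the (column, in_quotes) state at the START of
    each lexed line. Hypothetical model, not the CSV Lint implementation."""
    states = []
    for line in lines[start_line:]:
        states.append((column, in_quotes))
        for ch in line:
            if ch == '"':
                in_quotes = not in_quotes      # opening or closing quote
            elif ch == ',' and not in_quotes:
                column += 1                    # next column, next color
        if not in_quotes:
            column = 1                         # line break outside quotes: new record
    return states


lines = ['id,text', '1,"also', 'multiline"', '2,plain']

# Lexing from the top: line index 2 starts in column 2, inside a quoted string.
full = lex_lines(lines)

# Restarting at line index 2 with default state: the lexer believes it is at a
# new record, and the closing quote is then mistaken for an opening one.
restarted = lex_lines(lines, start_line=2)

print(full[2:])
print(restarted)
```

The two printed lists differ, which corresponds to the wrong colors: the restarted run has lost the state that was in effect at the end of the previous line.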
This is quite a rare situation. If I change the plugin to always scan backwards for a starting quote, I think performance would suffer, because most of the time there is no starting quote and the scan would run all the way back to the beginning of the file.
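A rough sketch of what such a backward check could look like, again as a Python model with made-up names (`starts_inside_quotes` is hypothetical). It assumes that an odd number of quote characters before a line means the line starts inside a quoted string, and it returns how many characters had to be visited so the cost is visible.

```python
def starts_inside_quotes(lines, line_no):
    """Scan backwards over every line before line_no and count quotes.
    Returns (inside_quotes, characters_scanned). Hypothetical sketch."""
    quotes = 0
    scanned = 0
    for line in reversed(lines[:line_no]):
        quotes += line.count('"')
        scanned += len(line)
    # Odd number of quotes so far: the last one opened a string that is
    # still open at the start of line_no.
    return quotes % 2 == 1, scanned


lines = ['id,text', '1,"also', 'multiline"', '2,plain']
print(starts_inside_quotes(lines, 2))  # starts inside the quoted string
print(starts_inside_quotes(lines, 3))  # starts at a new record
```

In this model the scan cannot stop early when no quote is open, so the work grows with the distance from the start of the file, which is the performance concern described above.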
Any ideas on how to tackle this specific issue?
-
I had this issue sometimes while using CSVLint, but not always.
I’ll pull out my trusty old silly_example.csv and see if I can figure out under what circumstances this happens.
As you can see, in this example, your lexer seems to work fine for me when I add newlines in the middle of a string.
What seems to cause problems is when, after creating a line break in a quoted string, you then enter text at the beginning of a subsequent line (so for example before the `m` of `also \r\nmultiline`). Usually adding text elsewhere in a subsequent line (e.g., after the `e` of `also \r\nmultiline`) doesn’t change the lexing of that line, but it does change the lexing of subsequent lines in the quoted string (so if `also\r\nmultiline\r\nstring` was all colored green, adding a character after the `e` of `multiline` would make `string` blue, but leave the second and first lines green).
It also seems like I can’t get more than one newline in a string, unless I use find/replace to introduce multiple newlines in a single action.
I’ll stare at your lexer code for a while and see if I can understand it well enough to propose a fix.
-
> Any ideas on how to tackle this specific issue?
Scintilla has an API for something called “line state”.
In broad terms, a lexer “bookmarks” a line in the document by sending `SCI_SETLINESTATE` with some meaningful value, then looks it up later on by sending `SCI_GETLINESTATE`.
A few Lexilla lexers already use line state to keep track of nested stream comments.
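As a rough illustration of the idea applied to CSV, here is a small Python model. Everything in it is an assumption for the sake of the sketch: the `Doc` class stands in for the editor, its `line_state` list stands in for what `SCI_SETLINESTATE` / `SCI_GETLINESTATE` would store and return, and the bit packing in `encode` is just one possible way to fit the column index and an "inside quotes" flag into a single integer.

```python
def encode(column, in_quotes):
    # Pack the column index and the in-quotes flag into one integer,
    # since a line state is a single integer value.
    return (column << 1) | int(in_quotes)


def decode(state):
    return state >> 1, bool(state & 1)


class Doc:
    """Stand-in for the editor: text lines plus one integer state per line."""
    def __init__(self, lines):
        self.lines = list(lines)
        self.line_state = [0] * len(self.lines)


def lex(doc, start_line):
    """Lex from start_line to the end; return the (column, in_quotes) state
    at the start of each lexed line. Hypothetical sketch."""
    if start_line > 0:
        # The "SCI_GETLINESTATE" step: resume from the previous line's state.
        column, in_quotes = decode(doc.line_state[start_line - 1])
    else:
        column, in_quotes = 1, False
    starts = []
    for n in range(start_line, len(doc.lines)):
        starts.append((column, in_quotes))
        for ch in doc.lines[n]:
            if ch == '"':
                in_quotes = not in_quotes
            elif ch == ',' and not in_quotes:
                column += 1
        if not in_quotes:
            column = 1  # line break outside quotes: new record
        # The "SCI_SETLINESTATE" step: remember the state at the end of line n.
        doc.line_state[n] = encode(column, in_quotes)
    return starts


doc = Doc(['id,text', '1,"also', 'multiline"', '2,plain'])
full = lex(doc, 0)

# The user types at the start of the continuation line; only that line
# onwards is lexed again, but the saved state restores the context.
doc.lines[2] = 'X' + doc.lines[2]
partial = lex(doc, 2)
print(partial == full[2:])
```

The lookup is a single read of the previous line's state, so no backward scan is needed. The model edits a line in place; keeping the per-line values aligned when lines are inserted or deleted is left out of the sketch.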
For a (much more complicated) alternative, have a look at how the Python lexer implements f-strings.