Help with JSON and Scintilla Lexer

Tyree Bingham

After getting frustrated with JSON highlighting comments red, I started poking around the lexer on github and found that there is a function in LexJSON.cxx to allow comments

116
struct OptionSetJSON : public OptionSet<OptionsJSON> {
	OptionSetJSON() {
		DefineProperty("lexer.json.escape.sequence", &OptionsJSON::escapeSequence,
					   "Set to 1 to enable highlighting of escape sequences in strings");

		DefineProperty("lexer.json.allow.comments", &OptionsJSON::allowComments,
					   "Set to 1 to enable highlighting of line/block comments in JSON");

		DefineProperty("fold.compact", &OptionsJSON::foldCompact);
		DefineProperty("fold", &OptionsJSON::fold);
		DefineWordListSets(JSONWordListDesc);
	}
};

Later it checks those settings

389
			} else if (options.allowComments && context.Match("/*")) {
				context.SetState(SCE_JSON_BLOCKCOMMENT);
				context.Forward();
			} else if (options.allowComments && context.Match("//")) {
				context.SetState(SCE_JSON_LINECOMMENT);

And yet, nowhere am I allowed to change allowComments.
So my question is; Can I change this at all? For example, could I perhaps attach a shell and send a command that changes Lexer.json.allow.comments to 1? I’m just sick of looking at red highlights whenever there are comments in the JSON I open. I know that the JSON specifications technically exclude comments for many reasons, but many parsers check for comments and apparently those are the one’s I’m working with because there are almost always comments when I open a JSON file. Even Microsoft is using comments in their JSON. (For example, their new Terminal app has its settings stored in a JSON you have to edit yourself.)

Any help would be appreciated.

PeterJones

@Tyree-Bingham said in Help with JSON and Scintilla Lexer:

could I perhaps attach a shell and send a command that changes Lexer.json.allow.comments to 1?

Interesting idea. However, I think that Lexer.json’s options.allowComments is an internal variable that you cannot access from the outside world. So I don’t think that would be possible from a scripting plugin or external “shell”.

The best long-term solution would be to follow the FAQ to see how to issue a feature request. I know that the developer has previously done a language-specific option – the SQL backslash-as-escape option – on the Preferences > Language menu:

If you wanted to steal that image to embed in your feature request, and say something like

I would like to be able to allow comments in my JSON files; by default, the JSON lexer disables JSON comments, but line 389 of the JSON lexer source code shows there's an option `options.allowComments` in the lexer:

			} else if (options.allowComments && context.Match("/*")) {
				context.SetState(SCE_JSON_BLOCKCOMMENT);
				context.Forward();
			} else if (options.allowComments && context.Match("//")) {
				context.SetState(SCE_JSON_LINECOMMENT);

... and the Style Configurator has the LINECOMMENT and BLOCKCOMMENT styles, but those never take effect because the lexer option is turned off.  Would it be possible to add an option in the Notepad++ interface that would toggle that lexer option?  Maybe put it in **Preferences > Language**, like for the SQL backslash-as-escape option: ![](https://community.notepad-plus-plus.org/assets/uploads/files/1591019138165-90d075a6-fc62-4dc6-8299-3108ae91cef4-image.png)

Note: I wrote possible text for the issue for you because I think this is a good option to provide to all Notepad++ users. A year ago, I would have actually submitted the issue for you, but I learned recently that the developer pays more attention to issues put in by “normal users” like you, so it would increase your chances if you put in the request yourself. Feel free to rephrase to your own words

I’m just sick of looking at red highlights whenever there are comments in the JSON I open

As a workaround, you can add extra highlighting to a builtin lexer (like the JSON lexer) using regexes via the script EnhanceAnyBuiltinLexer.py that @Ekopalypse shares in this linked post. With that, you should be able to look for the comment sequence and set a new highlighting that would override the default error highlighting.

-—
PS: I had an idea that I want to experiment with – maybe it is possible to access, since it’s done with scintilla properties, but I’ve got to be away from the keyboard, so it might be a while before I get a chance to experiment. One of the other regulars here might be able to take my “scintilla properties hint” and finish the experiment before I get back. :-)

moderator: fixed link to match modern forum linking

Michael Vincent

@PeterJones said in Help with JSON and Scintilla Lexer:

PS: I had an idea that I want to experiment with – maybe it is possible to access, since it’s done with scintilla properties, but I’ve got to be away from the keyboard, so it might be a while before I get a chance to experiment. One of the other regulars here might be able to take my “scintilla properties hint” and finish the experiment before I get back. :-)

@Tyree-Bingham
Pick your scripting plugin of choice - mine is NppExec.

Testing for JSON properties with an open JSON document:

{
  "key1": "value1",
  "key2": 2,
  // comment
  "list1": [
    "keylist1": "keyvalue1"   
  ]
}

which looks like (note I’m using the Obsidian theme so colors may vary, but the comment is red error text):

I can use NppExec and get the following:

================ READY ================
SCI_SENDMSG SCI_PROPERTYNAMES 0 @""
================ READY ================
echo $(MSG_LPARAM)
lexer.json.escape.sequence
lexer.json.allow.comments
fold.compact
fold
================ READY ================

So the script to enable comments is (you’ll need to translate from this script to your scripting language preference (e.g., PythonScript, LuaScript, “PerlScript”).

SCI_SENDMSG SCI_SETPROPERTY "lexer.json.allow.comments" "1"

Now I see:

Cheers.

PeterJones

I mentioned,

scintilla properties

Yes, that worked – and as I was writing it up, @Michael-Vincent replied. :-)

The Scintilla interface has a way to set and get properties. I was able to confirm that you can write that property: in PythonScript (available through Plugins Admin), using editor.setProperty("lexer.json.allow.comments",1) will accomplish the same thing that Michael’s NppExec script did.

If you have a perl interpreter, Win32::Mechanize::NotepadPlusPlus will do the same with editor->setProperty("lexer.json.allow.comments",1) (though I had to click back in the editor pane to get the screen to refresh after Perl tried to update the property)

Does @dinkumoil want to add per-language settings to the ExtSettings plugin?

Michael Vincent

@PeterJones said in Help with JSON and Scintilla Lexer:

though I had to click back in the editor pane to get the screen to refresh

As did I with NppExec, thanks for mentioning that.

Cheers.

Michael Vincent

@PeterJones said in Help with JSON and Scintilla Lexer:

The Scintilla interface has a way to set and get properties.

I got pretty excited about this:

::lex
NPP_CONSOLE keep

SET LOCAL AFTER  = 0

IF "$(ARGC)"<="1" THEN
    SCI_SENDMSG SCI_PROPERTYNAMES 0 @""
    ECHO $(MSG_LPARAM)
ELSE IF "$(ARGC)"<="2" THEN
    SCI_SENDMSG SCI_DESCRIBEPROPERTY "$(ARGV[1])" @""
    ECHO $(MSG_LPARAM)
    SCI_SENDMSG SCI_GETPROPERTYEXPANDED "$(ARGV[1])" @""
    ECHO $(ARGV[1]) = $(MSG_LPARAM)
ELSE IF "$(ARGC)"<="3" THEN
    SCI_SENDMSG SCI_SETPROPERTY "$(ARGV[1])" "$(ARGV[2])"
ELSE
    GOTO USAGE
ENDIF
GOTO END

:USAGE
ECHO Usage:
ECHO   \$(ARGV[0])         = current lexer properties
ECHO   \$(ARGV[0]) [K]     = get value of property K
ECHO   \$(ARGV[0]) [K] [V] = set property K to value V

:END

Now I’ve been running \lex from NppExec console on all filetypes to see what magical properties Scintilla has that I can monkey with :-)

Cheers.

Ekopalypse

@Michael-Vincent

but don’t be sad if it turns out that very few lexers
propagate their “properties” in this way. :-(

PeterJones

@Ekopalypse,

propagate their “properties” in this way. :-(

Good timing. I just finished a script to explore this:

# encoding=utf-8
"""
!!!REQUIRES PythonScript 1.5.4 to avoid getLanguageDesc() bug!!!
"""
from Npp import *

console.show()
console.clear()

notepad.new()
keep = notepad.getLangType()

for t in [LANGTYPE.ADA, LANGTYPE.ASM, LANGTYPE.ASN1, LANGTYPE.ASP, LANGTYPE.AU3, LANGTYPE.AVS, LANGTYPE.BAANC, LANGTYPE.BASH, LANGTYPE.BATCH, LANGTYPE.BLITZBASIC, LANGTYPE.C, LANGTYPE.CAML, LANGTYPE.CMAKE, LANGTYPE.COBOL, LANGTYPE.COFFEESCRIPT, LANGTYPE.CPP, LANGTYPE.CS, LANGTYPE.CSOUND, LANGTYPE.CSS, LANGTYPE.D, LANGTYPE.DIFF, LANGTYPE.ERLANG, LANGTYPE.ESCRIPT, LANGTYPE.FLASH, LANGTYPE.FORTH, LANGTYPE.FORTRAN, LANGTYPE.FORTRAN_77, LANGTYPE.FREEBASIC, LANGTYPE.GUI4CLI, LANGTYPE.HASKELL, LANGTYPE.HTML, LANGTYPE.IHEX, LANGTYPE.INI, LANGTYPE.INNO, LANGTYPE.JAVA, LANGTYPE.JAVASCRIPT, LANGTYPE.JS, LANGTYPE.JSON, LANGTYPE.JSP, LANGTYPE.KIX, LANGTYPE.LATEX, LANGTYPE.LISP, LANGTYPE.LUA, LANGTYPE.MAKEFILE, LANGTYPE.MATLAB, LANGTYPE.MMIXAL, LANGTYPE.NIMROD, LANGTYPE.NNCRONTAB, LANGTYPE.NSIS, LANGTYPE.OBJC, LANGTYPE.OSCRIPT, LANGTYPE.PASCAL, LANGTYPE.PERL, LANGTYPE.PHP, LANGTYPE.POWERSHELL, LANGTYPE.PROPS, LANGTYPE.PS, LANGTYPE.PUREBASIC, LANGTYPE.PYTHON, LANGTYPE.R, LANGTYPE.RC, LANGTYPE.REBOL, LANGTYPE.REGISTRY, LANGTYPE.RUBY, LANGTYPE.RUST, LANGTYPE.SCHEME, LANGTYPE.SEARCHRESULT, LANGTYPE.SMALLTALK, LANGTYPE.SPICE, LANGTYPE.SQL, LANGTYPE.SREC, LANGTYPE.SWIFT, LANGTYPE.TCL, LANGTYPE.TEHEX, LANGTYPE.TEX, LANGTYPE.TXT, LANGTYPE.TXT2TAGS, LANGTYPE.USER, LANGTYPE.VB, LANGTYPE.VERILOG, LANGTYPE.VHDL, LANGTYPE.VISUALPROLOG, LANGTYPE.XML, LANGTYPE.YAML]:
    n = notepad.getLanguageName(t)
    d = notepad.getLanguageDesc(t)
    console.write("{!s:<35.35s} {!s:<35.35s}  {!s:<35.35s}\n".format(t,n,d))

    notepad.setLangType(t)
    for s in editor.propertyNames().split():
        if editor.getPropertyInt(s,-65537) > -65537:
            v = str(editor.getPropertyInt(s,-65537))
        else:
            v = '"' + editor.getProperty(s) + '"'
        console.write("\t{:<32.32}:{:01d}: {:<32.32} \"{}\"\n".format("'"+s+"'", editor.propertyType(s), v, editor.describeProperty(s)))

notepad.setLangType(keep)
notepad.close()

-–
Tangential: While I’ve got you here, @Ekopalypse , I manually used dir(LANGTYPE) to get that list, then I changed ['ADA', ...] to [LANGTYPE.ADA, ...]. Is there an easier way to do that inside Python, rather than manually making the list of types?
-–

Anyway, back to the results: quite a few have properties, but not all:

...
JAVASCRIPT                          JavaScript                           JavaScript file                    
	'styling.within.preprocessor'   :0: ""                               "For C++ code, determines whether all preprocessor code is styled in the preprocessor style (0, the default) or only from the initial # to the end of the command word(1)."
	'lexer.cpp.allow.dollars'       :0: ""                               "Set to 0 to disallow the '$' character in identifiers with the cpp lexer."
	'lexer.cpp.track.preprocessor'  :0: 0                                "Set to 1 to interpret #if/#else/#endif to grey out code that is not active."
	'lexer.cpp.update.preprocessor' :0: ""                               "Set to 1 to update preprocessor definitions when #define found."
	'lexer.cpp.verbatim.strings.allo:0: ""                               "Set to 1 to allow verbatim strings to contain escape sequences."
	'lexer.cpp.triplequoted.strings':0: ""                               "Set to 1 to enable highlighting of triple-quoted strings."
	'lexer.cpp.hashquoted.strings'  :0: ""                               "Set to 1 to enable highlighting of hash-quoted strings."
	'lexer.cpp.backquoted.strings'  :0: 1                                "Set to 1 to enable highlighting of back-quoted raw strings ."
	'lexer.cpp.escape.sequence'     :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
	'fold'                          :0: 1                                ""
	'fold.cpp.syntax.based'         :0: ""                               "Set this property to 0 to disable syntax based folding."
	'fold.comment'                  :0: 1                                "This option enables folding multi-line comments and explicit fold points when using the C++ lexer. Explicit fold points allows adding extra folding by placing a //{ comment at the start and a //} at the end of a section that should fold."
	'fold.cpp.comment.multiline'    :0: ""                               "Set this property to 0 to disable folding multi-line comments when fold.comment=1."
	'fold.cpp.comment.explicit'     :0: ""                               "Set this property to 0 to disable folding explicit fold points when fold.comment=1."
	'fold.cpp.explicit.start'       :2: ""                               "The string to use for explicit fold start points, replacing the standard //{."
	'fold.cpp.explicit.end'         :2: ""                               "The string to use for explicit fold end points, replacing the standard //}."
	'fold.cpp.explicit.anywhere'    :0: ""                               "Set this property to 1 to enable explicit fold points anywhere, not just in line comments."
	'fold.cpp.preprocessor.at.else' :0: ""                               "This option enables folding on a preprocessor #else or #endif line of an #if statement."
	'fold.preprocessor'             :0: 1                                "This option enables folding preprocessor directives when using the C++ lexer. Includes C#'s explicit #region and #endregion folding directives."
	'fold.compact'                  :0: 0                                ""
	'fold.at.else'                  :0: ""                               "This option enables C++ folding on a "} else {" line of an if statement."
JS                                  JavaScript                           JavaScript file                    
	'styling.within.preprocessor'   :0: ""                               "For C++ code, determines whether all preprocessor code is styled in the preprocessor style (0, the default) or only from the initial # to the end of the command word(1)."
	'lexer.cpp.allow.dollars'       :0: ""                               "Set to 0 to disallow the '$' character in identifiers with the cpp lexer."
	'lexer.cpp.track.preprocessor'  :0: 0                                "Set to 1 to interpret #if/#else/#endif to grey out code that is not active."
	'lexer.cpp.update.preprocessor' :0: ""                               "Set to 1 to update preprocessor definitions when #define found."
	'lexer.cpp.verbatim.strings.allo:0: ""                               "Set to 1 to allow verbatim strings to contain escape sequences."
	'lexer.cpp.triplequoted.strings':0: ""                               "Set to 1 to enable highlighting of triple-quoted strings."
	'lexer.cpp.hashquoted.strings'  :0: ""                               "Set to 1 to enable highlighting of hash-quoted strings."
	'lexer.cpp.backquoted.strings'  :0: 1                                "Set to 1 to enable highlighting of back-quoted raw strings ."
	'lexer.cpp.escape.sequence'     :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
	'fold'                          :0: 1                                ""
	'fold.cpp.syntax.based'         :0: ""                               "Set this property to 0 to disable syntax based folding."
	'fold.comment'                  :0: 1                                "This option enables folding multi-line comments and explicit fold points when using the C++ lexer. Explicit fold points allows adding extra folding by placing a //{ comment at the start and a //} at the end of a section that should fold."
	'fold.cpp.comment.multiline'    :0: ""                               "Set this property to 0 to disable folding multi-line comments when fold.comment=1."
	'fold.cpp.comment.explicit'     :0: ""                               "Set this property to 0 to disable folding explicit fold points when fold.comment=1."
	'fold.cpp.explicit.start'       :2: ""                               "The string to use for explicit fold start points, replacing the standard //{."
	'fold.cpp.explicit.end'         :2: ""                               "The string to use for explicit fold end points, replacing the standard //}."
	'fold.cpp.explicit.anywhere'    :0: ""                               "Set this property to 1 to enable explicit fold points anywhere, not just in line comments."
	'fold.cpp.preprocessor.at.else' :0: ""                               "This option enables folding on a preprocessor #else or #endif line of an #if statement."
	'fold.preprocessor'             :0: 1                                "This option enables folding preprocessor directives when using the C++ lexer. Includes C#'s explicit #region and #endregion folding directives."
	'fold.compact'                  :0: 0                                ""
	'fold.at.else'                  :0: ""                               "This option enables C++ folding on a "} else {" line of an if statement."
JSON                                json                                 JSON file                          
	'lexer.json.escape.sequence'    :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
	'lexer.json.allow.comments'     :0: ""                               "Set to 1 to enable highlighting of line/block comments in JSON"
	'fold.compact'                  :0: 0                                ""
	'fold'                          :0: 1                                ""
...
PERL                                Perl                                 Perl source file                   
	'fold'                          :0: 1                                ""
	'fold.comment'                  :0: 1                                ""
	'fold.compact'                  :0: 0                                ""
	'fold.perl.pod'                 :0: ""                               "Set to 0 to disable folding Pod blocks when using the Perl lexer."
	'fold.perl.package'             :0: ""                               "Set to 0 to disable folding packages when using the Perl lexer."
	'fold.perl.comment.explicit'    :0: ""                               "Set to 0 to disable explicit folding."
	'fold.perl.at.else'             :0: ""                               "This option enables Perl folding on a "} else {" line of an if statement."
...
PYTHON                              Python                               Python file                        
	'tab.timmy.whinge.level'        :1: ""                               "For Python code, checks whether indenting is consistent. The default, 0 turns off indentation checking, 1 checks whether each line is potentially inconsistent with the previous line, 2 checks whether any space characters occur before a tab character in the indentation, 3 checks whether any spaces are in the indentation, and 4 checks for any tab characters in the indentation. 1 is a good level to use."
	'lexer.python.literals.binary'  :0: ""                               "Set to 0 to not recognise Python 3 binary and octal literals: 0b1011 0o712."
	'lexer.python.strings.u'        :0: ""                               "Set to 0 to not recognise Python Unicode literals u"x" as used before Python 3."
	'lexer.python.strings.b'        :0: ""                               "Set to 0 to not recognise Python 3 bytes literals b"x"."
	'lexer.python.strings.f'        :0: ""                               "Set to 0 to not recognise Python 3.6 f-string literals f"var={var}"."
	'lexer.python.strings.over.newli:0: ""                               "Set to 1 to allow strings to span newline characters."
	'lexer.python.keywords2.no.sub.i:0: ""                               "When enabled, it will not style keywords2 items that are used as a sub-identifier. Example: when set, will not highlight "foo.open" when "open" is a keywords2 item."
	'fold'                          :0: 1                                ""
	'fold.quotes.python'            :0: 1                                "This option enables folding multi-line quoted strings when using the Python lexer."
	'fold.compact'                  :0: 0                                ""
	'lexer.python.unicode.identifier:0: ""                               "Set to 0 to not recognise Python 3 unicode identifiers."
...

(it was too long, so I pared down results

Ekopalypse

@PeterJones

oopss - I’m surprised to see that much info.
Boost::python seems to create the classes with names and values attributes,
which itself do return a dict and this can be iterated.
Something like for t in LANGTYPE.values.values():
should do the job.
The first values creates the dictionary(hash) and the second a list of LANGTYPE enum members.

Ekopalypse

By the way, what always works, but is banned by most of the python programmers
is using eval function. So something like

eval('LANGTYPE.{}'.format('ADA'))

will return the LANGTYPE.ADA object.

PeterJones

@Ekopalypse said in Help with JSON and Scintilla Lexer:

Something like for t in LANGTYPE.values.values():

Yep. Exactly that syntax worked for me. Much cleaner – and now my saved script will still work in future versions with new LANGTYPEs. :-)

Since the first worked, I didn’t bother with the eval, though that’s a good alternative. Also, for parsing the dir(...) strings, I would have had to eliminate the various helper methods, too, which I don’t have to do with the LANGTYPE.values.values() syntax, so all-in-all, values were better. :-)

PeterJones

@Tyree-Bingham ,

Sorry for hijacking your thread. In the first few responses, we showed a few of the scripting-languages’ methods for changing that setting. If you need more help, let us know which scripting language you chose, and where you’re having trouble.

Michael Vincent

@PeterJones said in Help with JSON and Scintilla Lexer:

Good timing. I just finished a script to explore this:

As the author of “PerlScript” why don’t you post Perl solutions ??? :-)

Pretty quick but roughly Perl equivalent of the PythonScript you posted:

#!perl

use strict;
use warnings;

use Win32::Mechanize::NotepadPlusPlus ':all';

my $keep = notepad->getLangType();

for my $t ( sort ( keys ( %LANGTYPE ) ) ) {
    my $n = notepad->getLanguageName($LANGTYPE{$t});
    my $d = notepad->getLanguageDesc($LANGTYPE{$t});
    printf "%-35s %-35s %-35s\n", $t, $n, $d;

    notepad->setLangType($LANGTYPE{$t});
    for my $s ( split "\n", editor->propertyNames() ) {
        my $v = editor->getProperty($s);
        printf "\t'%-32s' :%i: %-32s %s\n" , $s, editor->propertyType($s), '"' . $v . '"', editor->describeProperty($s);
    }
    print "\n";
}

notepad->setLangType($keep);

PeterJones

For future readers:

Approximately two years later, someone put in feature request #11713 to request this toggle. (They also put in a separate request #11676 to add all the features of JSON5/JSONC, not just comments.)

update: This request was implemented and released in v8.4.9 in January 2023 (see release notes item 3)