Help with JSON and Scintilla Lexer



  • After getting frustrated with JSON highlighting comments red, I started poking around the lexer on github and found that there is a function in LexJSON.cxx to allow comments

    116
    struct OptionSetJSON : public OptionSet<OptionsJSON> {
    	OptionSetJSON() {
    		DefineProperty("lexer.json.escape.sequence", &OptionsJSON::escapeSequence,
    					   "Set to 1 to enable highlighting of escape sequences in strings");
    
    		DefineProperty("lexer.json.allow.comments", &OptionsJSON::allowComments,
    					   "Set to 1 to enable highlighting of line/block comments in JSON");
    
    		DefineProperty("fold.compact", &OptionsJSON::foldCompact);
    		DefineProperty("fold", &OptionsJSON::fold);
    		DefineWordListSets(JSONWordListDesc);
    	}
    };
    

    Later it checks those settings

    389
    			} else if (options.allowComments && context.Match("/*")) {
    				context.SetState(SCE_JSON_BLOCKCOMMENT);
    				context.Forward();
    			} else if (options.allowComments && context.Match("//")) {
    				context.SetState(SCE_JSON_LINECOMMENT);
    

    And yet, nowhere am I allowed to change allowComments.
    So my question is; Can I change this at all? For example, could I perhaps attach a shell and send a command that changes Lexer.json.allow.comments to 1? I’m just sick of looking at red highlights whenever there are comments in the JSON I open. I know that the JSON specifications technically exclude comments for many reasons, but many parsers check for comments and apparently those are the one’s I’m working with because there are almost always comments when I open a JSON file. Even Microsoft is using comments in their JSON. (For example, their new Terminal app has its settings stored in a JSON you have to edit yourself.)

    Any help would be appreciated.



  • @Tyree-Bingham said in Help with JSON and Scintilla Lexer:

    could I perhaps attach a shell and send a command that changes Lexer.json.allow.comments to 1?

    Interesting idea. However, I think that Lexer.json’s options.allowComments is an internal variable that you cannot access from the outside world. So I don’t think that would be possible from a scripting plugin or external “shell”.

    The best long-term solution would be to follow the FAQ to see how to issue a feature request. I know that the developer has previously done a language-specific option – the SQL backslash-as-escape option – on the Preferences > Language menu:
    90d075a6-fc62-4dc6-8299-3108ae91cef4-image.png
    If you wanted to steal that image to embed in your feature request, and say something like

    I would like to be able to allow comments in my JSON files; by default, the JSON lexer disables JSON comments, but line 389 of the JSON lexer source code shows there's an option `options.allowComments` in the lexer:
    
    			} else if (options.allowComments && context.Match("/*")) {
    				context.SetState(SCE_JSON_BLOCKCOMMENT);
    				context.Forward();
    			} else if (options.allowComments && context.Match("//")) {
    				context.SetState(SCE_JSON_LINECOMMENT);
    
    ... and the Style Configurator has the LINECOMMENT and BLOCKCOMMENT styles, but those never take effect because the lexer option is turned off.  Would it be possible to add an option in the Notepad++ interface that would toggle that lexer option?  Maybe put it in **Preferences > Language**, like for the SQL backslash-as-escape option: ![](https://community.notepad-plus-plus.org/assets/uploads/files/1591019138165-90d075a6-fc62-4dc6-8299-3108ae91cef4-image.png)
    

    Note: I wrote possible text for the issue for you because I think this is a good option to provide to all Notepad++ users. A year ago, I would have actually submitted the issue for you, but I learned recently that the developer pays more attention to issues put in by “normal users” like you, so it would increase your chances if you put in the request yourself. Feel free to rephrase to your own words

    I’m just sick of looking at red highlights whenever there are comments in the JSON I open

    As a workaround, you can add extra highlighting to a builtin lexer (like the JSON lexer) using regexes via the script EnhanceAnyBuiltinLexer.py that @Ekopalypse shares in this linked post. With that, you should be able to look for the comment sequence and set a new highlighting that would override the default error highlighting.

    ----
    PS: I had an idea that I want to experiment with – maybe it is possible to access, since it’s done with scintilla properties, but I’ve got to be away from the keyboard, so it might be a while before I get a chance to experiment. One of the other regulars here might be able to take my “scintilla properties hint” and finish the experiment before I get back. :-)



  • @PeterJones said in Help with JSON and Scintilla Lexer:

    PS: I had an idea that I want to experiment with – maybe it is possible to access, since it’s done with scintilla properties, but I’ve got to be away from the keyboard, so it might be a while before I get a chance to experiment. One of the other regulars here might be able to take my “scintilla properties hint” and finish the experiment before I get back. :-)

    @Tyree-Bingham
    Pick your scripting plugin of choice - mine is NppExec.

    Testing for JSON properties with an open JSON document:

    {
      "key1": "value1",
      "key2": 2,
      // comment
      "list1": [
        "keylist1": "keyvalue1"   
      ]
    }
    

    which looks like (note I’m using the Obsidian theme so colors may vary, but the comment is red error text):

    5cb2217d-dd44-4acd-8127-97fbfdf47241-image.png

    I can use NppExec and get the following:

    ================ READY ================
    SCI_SENDMSG SCI_PROPERTYNAMES 0 @""
    ================ READY ================
    echo $(MSG_LPARAM)
    lexer.json.escape.sequence
    lexer.json.allow.comments
    fold.compact
    fold
    ================ READY ================
    

    So the script to enable comments is (you’ll need to translate from this script to your scripting language preference (e.g., PythonScript, LuaScript, “PerlScript”).

    SCI_SENDMSG SCI_SETPROPERTY "lexer.json.allow.comments" "1"
    

    Now I see:

    6a6aae80-72c4-430e-aaed-bfc0e33981a7-image.png

    Cheers.



  • I mentioned,

    scintilla properties

    Yes, that worked – and as I was writing it up, @Michael-Vincent replied. :-)

    The Scintilla interface has a way to set and get properties. I was able to confirm that you can write that property: in PythonScript (available through Plugins Admin), using editor.setProperty("lexer.json.allow.comments",1) will accomplish the same thing that Michael’s NppExec script did.

    If you have a perl interpreter, Win32::Mechanize::NotepadPlusPlus will do the same with editor->setProperty("lexer.json.allow.comments",1) (though I had to click back in the editor pane to get the screen to refresh after Perl tried to update the property)

    Does @dinkumoil want to add per-language settings to the ExtSettings plugin?



  • @PeterJones said in Help with JSON and Scintilla Lexer:

    though I had to click back in the editor pane to get the screen to refresh

    As did I with NppExec, thanks for mentioning that.

    Cheers.



  • @PeterJones said in Help with JSON and Scintilla Lexer:

    The Scintilla interface has a way to set and get properties.

    I got pretty excited about this:

    ::lex
    NPP_CONSOLE keep
    
    SET LOCAL AFTER  = 0
    
    IF "$(ARGC)"<="1" THEN
        SCI_SENDMSG SCI_PROPERTYNAMES 0 @""
        ECHO $(MSG_LPARAM)
    ELSE IF "$(ARGC)"<="2" THEN
        SCI_SENDMSG SCI_DESCRIBEPROPERTY "$(ARGV[1])" @""
        ECHO $(MSG_LPARAM)
        SCI_SENDMSG SCI_GETPROPERTYEXPANDED "$(ARGV[1])" @""
        ECHO $(ARGV[1]) = $(MSG_LPARAM)
    ELSE IF "$(ARGC)"<="3" THEN
        SCI_SENDMSG SCI_SETPROPERTY "$(ARGV[1])" "$(ARGV[2])"
    ELSE
        GOTO USAGE
    ENDIF
    GOTO END
    
    :USAGE
    ECHO Usage:
    ECHO   \$(ARGV[0])         = current lexer properties
    ECHO   \$(ARGV[0]) [K]     = get value of property K
    ECHO   \$(ARGV[0]) [K] [V] = set property K to value V
    
    :END
    

    Now I’ve been running \lex from NppExec console on all filetypes to see what magical properties Scintilla has that I can monkey with :-)

    Cheers.



  • @Michael-Vincent

    but don’t be sad if it turns out that very few lexers
    propagate their “properties” in this way. :-(



  • @Ekopalypse,

    propagate their “properties” in this way. :-(

    Good timing. I just finished a script to explore this:

    # encoding=utf-8
    """
    !!!REQUIRES PythonScript 1.5.4 to avoid getLanguageDesc() bug!!!
    """
    from Npp import *
    
    console.show()
    console.clear()
    
    notepad.new()
    keep = notepad.getLangType()
    
    for t in [LANGTYPE.ADA, LANGTYPE.ASM, LANGTYPE.ASN1, LANGTYPE.ASP, LANGTYPE.AU3, LANGTYPE.AVS, LANGTYPE.BAANC, LANGTYPE.BASH, LANGTYPE.BATCH, LANGTYPE.BLITZBASIC, LANGTYPE.C, LANGTYPE.CAML, LANGTYPE.CMAKE, LANGTYPE.COBOL, LANGTYPE.COFFEESCRIPT, LANGTYPE.CPP, LANGTYPE.CS, LANGTYPE.CSOUND, LANGTYPE.CSS, LANGTYPE.D, LANGTYPE.DIFF, LANGTYPE.ERLANG, LANGTYPE.ESCRIPT, LANGTYPE.FLASH, LANGTYPE.FORTH, LANGTYPE.FORTRAN, LANGTYPE.FORTRAN_77, LANGTYPE.FREEBASIC, LANGTYPE.GUI4CLI, LANGTYPE.HASKELL, LANGTYPE.HTML, LANGTYPE.IHEX, LANGTYPE.INI, LANGTYPE.INNO, LANGTYPE.JAVA, LANGTYPE.JAVASCRIPT, LANGTYPE.JS, LANGTYPE.JSON, LANGTYPE.JSP, LANGTYPE.KIX, LANGTYPE.LATEX, LANGTYPE.LISP, LANGTYPE.LUA, LANGTYPE.MAKEFILE, LANGTYPE.MATLAB, LANGTYPE.MMIXAL, LANGTYPE.NIMROD, LANGTYPE.NNCRONTAB, LANGTYPE.NSIS, LANGTYPE.OBJC, LANGTYPE.OSCRIPT, LANGTYPE.PASCAL, LANGTYPE.PERL, LANGTYPE.PHP, LANGTYPE.POWERSHELL, LANGTYPE.PROPS, LANGTYPE.PS, LANGTYPE.PUREBASIC, LANGTYPE.PYTHON, LANGTYPE.R, LANGTYPE.RC, LANGTYPE.REBOL, LANGTYPE.REGISTRY, LANGTYPE.RUBY, LANGTYPE.RUST, LANGTYPE.SCHEME, LANGTYPE.SEARCHRESULT, LANGTYPE.SMALLTALK, LANGTYPE.SPICE, LANGTYPE.SQL, LANGTYPE.SREC, LANGTYPE.SWIFT, LANGTYPE.TCL, LANGTYPE.TEHEX, LANGTYPE.TEX, LANGTYPE.TXT, LANGTYPE.TXT2TAGS, LANGTYPE.USER, LANGTYPE.VB, LANGTYPE.VERILOG, LANGTYPE.VHDL, LANGTYPE.VISUALPROLOG, LANGTYPE.XML, LANGTYPE.YAML]:
        n = notepad.getLanguageName(t)
        d = notepad.getLanguageDesc(t)
        console.write("{!s:<35.35s} {!s:<35.35s}  {!s:<35.35s}\n".format(t,n,d))
    
        notepad.setLangType(t)
        for s in editor.propertyNames().split():
            if editor.getPropertyInt(s,-65537) > -65537:
                v = str(editor.getPropertyInt(s,-65537))
            else:
                v = '"' + editor.getProperty(s) + '"'
            console.write("\t{:<32.32}:{:01d}: {:<32.32} \"{}\"\n".format("'"+s+"'", editor.propertyType(s), v, editor.describeProperty(s)))
    
    notepad.setLangType(keep)
    notepad.close()
    


    Tangential: While I’ve got you here, @Ekopalypse , I manually used dir(LANGTYPE) to get that list, then I changed ['ADA', ...] to [LANGTYPE.ADA, ...]. Is there an easier way to do that inside Python, rather than manually making the list of types?

    Anyway, back to the results: quite a few have properties, but not all:

    ...
    JAVASCRIPT                          JavaScript                           JavaScript file                    
    	'styling.within.preprocessor'   :0: ""                               "For C++ code, determines whether all preprocessor code is styled in the preprocessor style (0, the default) or only from the initial # to the end of the command word(1)."
    	'lexer.cpp.allow.dollars'       :0: ""                               "Set to 0 to disallow the '$' character in identifiers with the cpp lexer."
    	'lexer.cpp.track.preprocessor'  :0: 0                                "Set to 1 to interpret #if/#else/#endif to grey out code that is not active."
    	'lexer.cpp.update.preprocessor' :0: ""                               "Set to 1 to update preprocessor definitions when #define found."
    	'lexer.cpp.verbatim.strings.allo:0: ""                               "Set to 1 to allow verbatim strings to contain escape sequences."
    	'lexer.cpp.triplequoted.strings':0: ""                               "Set to 1 to enable highlighting of triple-quoted strings."
    	'lexer.cpp.hashquoted.strings'  :0: ""                               "Set to 1 to enable highlighting of hash-quoted strings."
    	'lexer.cpp.backquoted.strings'  :0: 1                                "Set to 1 to enable highlighting of back-quoted raw strings ."
    	'lexer.cpp.escape.sequence'     :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
    	'fold'                          :0: 1                                ""
    	'fold.cpp.syntax.based'         :0: ""                               "Set this property to 0 to disable syntax based folding."
    	'fold.comment'                  :0: 1                                "This option enables folding multi-line comments and explicit fold points when using the C++ lexer. Explicit fold points allows adding extra folding by placing a //{ comment at the start and a //} at the end of a section that should fold."
    	'fold.cpp.comment.multiline'    :0: ""                               "Set this property to 0 to disable folding multi-line comments when fold.comment=1."
    	'fold.cpp.comment.explicit'     :0: ""                               "Set this property to 0 to disable folding explicit fold points when fold.comment=1."
    	'fold.cpp.explicit.start'       :2: ""                               "The string to use for explicit fold start points, replacing the standard //{."
    	'fold.cpp.explicit.end'         :2: ""                               "The string to use for explicit fold end points, replacing the standard //}."
    	'fold.cpp.explicit.anywhere'    :0: ""                               "Set this property to 1 to enable explicit fold points anywhere, not just in line comments."
    	'fold.cpp.preprocessor.at.else' :0: ""                               "This option enables folding on a preprocessor #else or #endif line of an #if statement."
    	'fold.preprocessor'             :0: 1                                "This option enables folding preprocessor directives when using the C++ lexer. Includes C#'s explicit #region and #endregion folding directives."
    	'fold.compact'                  :0: 0                                ""
    	'fold.at.else'                  :0: ""                               "This option enables C++ folding on a "} else {" line of an if statement."
    JS                                  JavaScript                           JavaScript file                    
    	'styling.within.preprocessor'   :0: ""                               "For C++ code, determines whether all preprocessor code is styled in the preprocessor style (0, the default) or only from the initial # to the end of the command word(1)."
    	'lexer.cpp.allow.dollars'       :0: ""                               "Set to 0 to disallow the '$' character in identifiers with the cpp lexer."
    	'lexer.cpp.track.preprocessor'  :0: 0                                "Set to 1 to interpret #if/#else/#endif to grey out code that is not active."
    	'lexer.cpp.update.preprocessor' :0: ""                               "Set to 1 to update preprocessor definitions when #define found."
    	'lexer.cpp.verbatim.strings.allo:0: ""                               "Set to 1 to allow verbatim strings to contain escape sequences."
    	'lexer.cpp.triplequoted.strings':0: ""                               "Set to 1 to enable highlighting of triple-quoted strings."
    	'lexer.cpp.hashquoted.strings'  :0: ""                               "Set to 1 to enable highlighting of hash-quoted strings."
    	'lexer.cpp.backquoted.strings'  :0: 1                                "Set to 1 to enable highlighting of back-quoted raw strings ."
    	'lexer.cpp.escape.sequence'     :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
    	'fold'                          :0: 1                                ""
    	'fold.cpp.syntax.based'         :0: ""                               "Set this property to 0 to disable syntax based folding."
    	'fold.comment'                  :0: 1                                "This option enables folding multi-line comments and explicit fold points when using the C++ lexer. Explicit fold points allows adding extra folding by placing a //{ comment at the start and a //} at the end of a section that should fold."
    	'fold.cpp.comment.multiline'    :0: ""                               "Set this property to 0 to disable folding multi-line comments when fold.comment=1."
    	'fold.cpp.comment.explicit'     :0: ""                               "Set this property to 0 to disable folding explicit fold points when fold.comment=1."
    	'fold.cpp.explicit.start'       :2: ""                               "The string to use for explicit fold start points, replacing the standard //{."
    	'fold.cpp.explicit.end'         :2: ""                               "The string to use for explicit fold end points, replacing the standard //}."
    	'fold.cpp.explicit.anywhere'    :0: ""                               "Set this property to 1 to enable explicit fold points anywhere, not just in line comments."
    	'fold.cpp.preprocessor.at.else' :0: ""                               "This option enables folding on a preprocessor #else or #endif line of an #if statement."
    	'fold.preprocessor'             :0: 1                                "This option enables folding preprocessor directives when using the C++ lexer. Includes C#'s explicit #region and #endregion folding directives."
    	'fold.compact'                  :0: 0                                ""
    	'fold.at.else'                  :0: ""                               "This option enables C++ folding on a "} else {" line of an if statement."
    JSON                                json                                 JSON file                          
    	'lexer.json.escape.sequence'    :0: ""                               "Set to 1 to enable highlighting of escape sequences in strings"
    	'lexer.json.allow.comments'     :0: ""                               "Set to 1 to enable highlighting of line/block comments in JSON"
    	'fold.compact'                  :0: 0                                ""
    	'fold'                          :0: 1                                ""
    ...
    PERL                                Perl                                 Perl source file                   
    	'fold'                          :0: 1                                ""
    	'fold.comment'                  :0: 1                                ""
    	'fold.compact'                  :0: 0                                ""
    	'fold.perl.pod'                 :0: ""                               "Set to 0 to disable folding Pod blocks when using the Perl lexer."
    	'fold.perl.package'             :0: ""                               "Set to 0 to disable folding packages when using the Perl lexer."
    	'fold.perl.comment.explicit'    :0: ""                               "Set to 0 to disable explicit folding."
    	'fold.perl.at.else'             :0: ""                               "This option enables Perl folding on a "} else {" line of an if statement."
    ...
    PYTHON                              Python                               Python file                        
    	'tab.timmy.whinge.level'        :1: ""                               "For Python code, checks whether indenting is consistent. The default, 0 turns off indentation checking, 1 checks whether each line is potentially inconsistent with the previous line, 2 checks whether any space characters occur before a tab character in the indentation, 3 checks whether any spaces are in the indentation, and 4 checks for any tab characters in the indentation. 1 is a good level to use."
    	'lexer.python.literals.binary'  :0: ""                               "Set to 0 to not recognise Python 3 binary and octal literals: 0b1011 0o712."
    	'lexer.python.strings.u'        :0: ""                               "Set to 0 to not recognise Python Unicode literals u"x" as used before Python 3."
    	'lexer.python.strings.b'        :0: ""                               "Set to 0 to not recognise Python 3 bytes literals b"x"."
    	'lexer.python.strings.f'        :0: ""                               "Set to 0 to not recognise Python 3.6 f-string literals f"var={var}"."
    	'lexer.python.strings.over.newli:0: ""                               "Set to 1 to allow strings to span newline characters."
    	'lexer.python.keywords2.no.sub.i:0: ""                               "When enabled, it will not style keywords2 items that are used as a sub-identifier. Example: when set, will not highlight "foo.open" when "open" is a keywords2 item."
    	'fold'                          :0: 1                                ""
    	'fold.quotes.python'            :0: 1                                "This option enables folding multi-line quoted strings when using the Python lexer."
    	'fold.compact'                  :0: 0                                ""
    	'lexer.python.unicode.identifier:0: ""                               "Set to 0 to not recognise Python 3 unicode identifiers."
    ...
    

    (it was too long, so I pared down results



  • @PeterJones

    oopss - I’m surprised to see that much info.
    Boost::python seems to create the classes with names and values attributes,
    which itself do return a dict and this can be iterated.
    Something like for t in LANGTYPE.values.values():
    should do the job.
    The first values creates the dictionary(hash) and the second a list of LANGTYPE enum members.



  • By the way, what always works, but is banned by most of the python programmers
    is using eval function. So something like

    eval('LANGTYPE.{}'.format('ADA'))
    

    will return the LANGTYPE.ADA object.



  • @Ekopalypse said in Help with JSON and Scintilla Lexer:

    Something like for t in LANGTYPE.values.values():

    Yep. Exactly that syntax worked for me. Much cleaner – and now my saved script will still work in future versions with new LANGTYPEs. :-)

    Since the first worked, I didn’t bother with the eval, though that’s a good alternative. Also, for parsing the dir(...) strings, I would have had to eliminate the various helper methods, too, which I don’t have to do with the LANGTYPE.values.values() syntax, so all-in-all, values were better. :-)



  • @Tyree-Bingham ,

    Sorry for hijacking your thread. In the first few responses, we showed a few of the scripting-languages’ methods for changing that setting. If you need more help, let us know which scripting language you chose, and where you’re having trouble.



  • @PeterJones said in Help with JSON and Scintilla Lexer:

    Good timing. I just finished a script to explore this:

    As the author of “PerlScript” why don’t you post Perl solutions ??? :-)

    Pretty quick but roughly Perl equivalent of the PythonScript you posted:

    #!perl
    
    use strict;
    use warnings;
    
    use Win32::Mechanize::NotepadPlusPlus ':all';
    
    my $keep = notepad->getLangType();
    
    for my $t ( sort ( keys ( %LANGTYPE ) ) ) {
        my $n = notepad->getLanguageName($LANGTYPE{$t});
        my $d = notepad->getLanguageDesc($LANGTYPE{$t});
        printf "%-35s %-35s %-35s\n", $t, $n, $d;
    
        notepad->setLangType($LANGTYPE{$t});
        for my $s ( split "\n", editor->propertyNames() ) {
            my $v = editor->getProperty($s);
            printf "\t'%-32s' :%i: %-32s %s\n" , $s, editor->propertyType($s), '"' . $v . '"', editor->describeProperty($s);
        }
        print "\n";
    }
    
    notepad->setLangType($keep);
    

Log in to reply