    HTML & User-defined keywords

    General Discussion
    13 Posts 3 Posters 1.0k Views
    • mpheath
      mpheath @PeterJones
      last edited by

      @PeterJones said in HTML & User-defined keywords:

      Lexilla doesn’t include user-defined keywords for HTML.

      There are user-defined keyword lists for Lexilla lexers.

      Lexilla does not handle User-defined keywords for any lexer. User-defined keywords are a special extra feature of Notepad++. Notepad++ joins the Default keywords and User-defined keywords together and then calls SCI_SETKEYWORDS with the single keyword list for Lexilla to use. If you look at stylers.xml, you may see keywordClass attributes; the tags that carry them are the ones that can save User-defined keywords. The joining function of Default keywords and User-defined keywords is here. That cpp file, which you may recognize, is where the lexers and keyword lists are set, and it is under Notepad++'s source control. Convincing Lexilla with a posted issue appears futile. So, another day, another learning experience I guess. :)
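A minimal C++ sketch of the joining step, assuming both lists are plain space-separated strings. joinKeywordLists is a hypothetical name, not Notepad++'s real code (the diff later in this thread calls getCompleteKeywordList() for this job). What it illustrates is that Lexilla only ever receives one combined string, so it has no notion of which words were user-defined.

```cpp
#include <string>

// Hypothetical helper, not Notepad++'s real function: join a user-defined
// keyword list and a default keyword list (both space-separated) into the
// single list that would then be passed to Lexilla with SCI_SETKEYWORDS.
std::string joinKeywordLists(const std::string& userDefined,
                             const std::string& defaults) {
    std::string joined = userDefined;
    // Only add a separator when both parts are non-empty.
    if (!joined.empty() && !defaults.empty()) {
        joined += ' ';
    }
    joined += defaults;
    return joined;
}
```

For example, joinKeywordLists("xmppeter", "a abbr xmp") returns "xmppeter a abbr xmp".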

      • PeterJones
        PeterJones @mpheath
        last edited by PeterJones

        @mpheath said in HTML & User-defined keywords:

        Notepad++ joins Default keywords and User-defined keywords together and then calls SCI_SETKEYWORDS with the single keyword list for Lexilla to use.

        Sorry I didn’t give the full details of my experiment when I was answering a newb OP. But since you want those details:

        My experiment was to edit the instre1 keyword list for HTML in Notepad++'s langs.xml, which Notepad++ sends to LexHTML as the first keyword list. And even after restarting, it would not lex my new tag.

        So then I went to SciTE, edited html.properties, and added my dummy tag right after xmp in the list of keywords (I tried xmpPeter and zzzPeter, both of which keep the list alphabetical, even though that shouldn't be necessary). After restarting SciTE, it would syntax-highlight <xmp>a</xmp> as a valid tag, but showed <xmpPeter>a</xmpPeter> and <zzzPeter>a</zzzPeter> as invalid tags. So changing the keyword list that Lexilla sees does not change what it considers a valid tag.

        As far as I can tell, a user of the Lexilla HTML lexer cannot customize the keyword list at all, and have it affect whether Lexilla thinks a given HTML tag is valid or not. Thus, I used the non-explicit phrasing to the OP that you “cannot send user-defined keywords for HTML” tags to Lexilla.
        (I justify that phrasing: even when I, as a user, define the keyword list in the Scintilla/Lexilla testbed SciTE – hence, it’s user-defined – it will not change its lexing of HTML tags based on that list.)

        Hence my conclusion that it’s something that needs to be fixed in Lexilla, not Notepad++.

        • mpheath
          mpheath @PeterJones
          last edited by mpheath

          @PeterJones said in HTML & User-defined keywords:

          Sorry I didn’t give the full details of my experiment when I was answering a newb OP. But since you want those details:

          The fine details have been absent since the OP. Any conclusion drawn from so few details is not assured to be conclusive. So, thanks for providing some of your details for me to investigate further.

          My experiment was to edit the instre1 keyword list for HTML in Notepad++'s langs.xml, which Notepad++ sends to LexHTML as the first keyword list. And even after restarting, it would not lex my new tag.

          So then I went to SciTE, edited html.properties, and added my dummy tag right after xmp in the list of keywords (I tried xmpPeter and zzzPeter, both of which keep the list alphabetical, even though that shouldn't be necessary). After restarting SciTE, it would syntax-highlight <xmp>a</xmp> as a valid tag, but showed <xmpPeter>a</xmpPeter> and <zzzPeter>a</zzzPeter> as invalid tags. So changing the keyword list that Lexilla sees does not change what it considers a valid tag.

          Keywords need to be lowercase. See SciTE’s html.properties comment:

           # All hypertext elements and attributes must be listed in lower case
          

          xmpPeter or zzzPeter will not be recognized as a keyword if it is not lowercase in the keyword list. Insert them as xmppeter or zzzpeter. P is a different char from p.

          With xmppeter added to SciTE's keyword list named hypertext.elements, this is the result:

          [Screenshot: xmpPeter.png]

          xmpnotexist is not in the keyword list, so it gets the red error style for being an unrecognized keyword.
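The case problem can be modelled with a short C++ sketch. It assumes, as the html.properties comment implies, that the lexer lowercases the tag name from the document before looking it up, while the keyword list is searched exactly as written. All names are hypothetical and this is not Lexilla's actual code; the returned numbers correspond to the TAG (styleID 1) and TAG UNKNOWN (styleID 2) styles listed in stylers.model.xml.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Style numbers of the TAG and TAG UNKNOWN entries in stylers.model.xml.
constexpr int kStyleTag = 1;
constexpr int kStyleTagUnknown = 2;

// Hypothetical model: split a space-separated keyword list into a set,
// storing each word exactly as written (no case folding).
std::set<std::string> parseKeywordList(const std::string& list) {
    std::set<std::string> words;
    std::istringstream in(list);
    std::string word;
    while (in >> word) {
        words.insert(word);
    }
    return words;
}

// Hypothetical model of the tag lookup: the tag name from the document is
// lowered first, then searched case-sensitively in the keyword set.
int classifyTag(const std::string& tagName, const std::set<std::string>& keywords) {
    std::string lowered = tagName;
    std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return keywords.count(lowered) > 0 ? kStyleTag : kStyleTagUnknown;
}
```

With the list "xmp xmpPeter", classifyTag("xmpPeter", ...) returns 2, because the lowered name xmppeter is not in the list. With "xmp xmppeter" it returns 1, and so does classifyTag("XMP", ...), since the case used in the document stops mattering once the list itself is lowercase.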

          In Notepad++ with a modified langs.xml and using the default theme:

          [Screenshot: xmpPeterNpp.png]

          Similar to SciTE, though lacking the red error style. It appears that some themes show unrecognized keywords in red or similar, and some do not.

          Hence my conclusion that it’s something that needs to be fixed in Lexilla, not Notepad++.

          The experiment that you did was flawed, so your conclusion needs adjustment.

          • DomOBU
            DomOBU @mpheath
            last edited by

            @mpheath
            @PeterJones

            Here is ScintillaOrg/lexilla’s reply:

            nyamatongwe commented May 3, 2024
            Recent release 5.3.2 included:

            HTML: Implement substyles for tags, attributes, and identifiers SCE_H_TAG, SCE_H_ATTRIBUTE, SCE_HJ_WORD, SCE_HJA_WORD, SCE_HB_WORD, SCE_HP_WORD, SCE_HPHP_WORD.
            HTML: Implement context-sensitive attributes. "tag.attribute" matches "attribute" only inside "tag".
            

            I suppose that’s understandable for you.

            • PeterJones
              PeterJones @mpheath
              last edited by PeterJones

              @mpheath said in HTML & User-defined keywords:

              Keywords need to be lowercase.

               # All hypertext elements and attributes must be listed in lower case
              

              Interesting design choice on their part, especially since that restriction isn’t in most of the lexers. For example, they could have avoided it if here they had skipped the GetRangeLowered() and used InListCaseInsensitive() instead of InList(). (Update: in their defense, maybe, when the HTML lexer was written, the case-insensitive version of the method didn’t exist, and they just never thought of or bothered making it work with any-case keywords after that method was introduced.) But yes, I missed that comment when doing my experiment, and I just had the poor luck of picking an edge case for my experiment.

              So @DomOBU could just add the lowercase custom keywords into langs.xml; or, could add a keywordClass="instre1" attribute to the <WordsStyle name="TAG" styleID="1"... /> in stylers.xml or the active themes\*.xml, which after a restart would put the keyword and user-defined-keyword boxes in the Style Configurator > Language: HTML > Style: TAG for easily adding the keywords in the GUI.

              • mpheath
                mpheath @PeterJones
                last edited by mpheath

                @PeterJones

                There are three comparisons in the else section of the code. InListCaseInsensitive() could be used, though the comparisons with type and language also need a case-insensitive compare. So using GetRangeLowered() is more beneficial, as it lowers the case only once.
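The trade-off can be sketched in C++. This assumes, hypothetically, that the three comparisons are a keyword-list lookup plus checks for the attribute names "type" and "language"; the names below are illustrative and this is a simplification, not the real LexHTML code.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Lowercase a copy of a string (ASCII only, for this sketch).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

struct AttributeInfo {
    bool isKnownAttribute;  // found in the (lowercase) keyword list
    bool isLanguageType;    // attribute name is "type" or "language"
};

// Hypothetical simplification of the approach described above: lower once,
// then reuse the lowered string for all three comparisons.
AttributeInfo classifyAttribute(const std::string& rawName,
                                const std::set<std::string>& lowercaseKeywords) {
    const std::string s = toLower(rawName);  // single case conversion
    AttributeInfo info{};
    info.isKnownAttribute = lowercaseKeywords.count(s) > 0;
    info.isLanguageType = (s == "type" || s == "language");
    return info;
}
```

Lowering once turns every later check into an ordinary string comparison. With InListCaseInsensitive(), only the keyword lookup would become case-insensitive; the "type" and "language" checks would each still need their own case-insensitive comparison.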

                Most keyword lists I see are lowercase, though I have seen some mixed case and some uppercase. Percentage-wise, lowercase might dominate, though I do not want to waste time counting.

                Indeed, luck was not on your side for getting the case correct. It can happen to anyone. I learnt some hard lessons in the past, which is why the fine details do matter with these technical subjects. Taking things for granted and failing can make a person slap themselves in disappointment (ouch!).

                … could add keywordClass=“instre1”…

                That adds the feature to the GUI, though it does not style the tags. The HTML lexer setup may need to access a keyword array to make use of User-defined keywords. From my review so far, makeStyle seems to build such an array, though I am not sure yet whether that is for built-in lexers or external lexers. There might already be a way to access the feature, or it may need to be custom coded. For example, I do not see the Inno lexer setup in plain sight, yet it has User-defined keywords, so there might be an automatic built-in method to achieve this for HTML. I would not know how to answer until I have a working compile with the needed changes.

                • mpheath
                  mpheath @DomOBU
                  last edited by

                  @DomOBU said in HTML & User-defined keywords:

                  HTML: Implement substyles for tags, attributes, and identifiers SCE_H_TAG, SCE_H_ATTRIBUTE, SCE_HJ_WORD, SCE_HJA_WORD, SCE_HB_WORD, SCE_HP_WORD, SCE_HPHP_WORD.
                  HTML: Implement context-sensitive attributes. "tag.attribute" matches "attribute" only inside "tag".
                  

                  I suppose that’s understandable for you.

                  It is an enhancement from the last release of Lexilla. Consider p.attrib: the attribute attrib would be matched only inside a p tag, AFAIK, though I have not done any testing to prove this.
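A minimal C++ model of the rule quoted from the release notes, where a plain "attribute" entry matches in any tag and a "tag.attribute" entry matches only inside that tag. It is based only on that release-note wording, uses hypothetical names, assumes lowercase input, and has not been tested against Lexilla itself.

```cpp
#include <set>
#include <string>

// Hypothetical model of the release-note rule: "attribute" matches in any tag,
// while "tag.attribute" matches the attribute only inside that tag.
// Both names are assumed to be lowercase already.
bool attributeMatches(const std::set<std::string>& keywords,
                      const std::string& tag, const std::string& attribute) {
    if (keywords.count(attribute) > 0) {
        return true;  // unqualified entry: valid in every tag
    }
    return keywords.count(tag + "." + attribute) > 0;  // tag-qualified entry
}
```

With the keywords class and p.attrib, attrib is accepted on p but not on div, while class is accepted on both.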

                  When you ask someone who does not understand exactly what you are talking about, you may get a reply that you do not understand. User-defined keywords is a Notepad++ feature that is not related directly with Lexilla.

                  • mpheath
                    mpheath @PeterJones
                    last edited by

                    @PeterJones It might be possible to create a method similar to the ones for the embedded languages.

                    diff --git a/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp b/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
                    index 2cbe46d1..223b60d2 100644
                    --- a/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
                    +++ b/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
                    @@ -828,13 +828,8 @@ void ScintillaEditView::setXmlLexer(LangType type)
                     	else if ((type == L_HTML) || (type == L_PHP) || (type == L_ASP) || (type == L_JSP))
                     	{
                     		setLexerFromLangID(L_HTML);
                    -		const TCHAR *htmlKeyWords_generic = NppParameters::getInstance().getWordList(L_HTML, LANG_INDEX_INSTR);
                    -
                    -		WcharMbcsConvertor& wmc = WcharMbcsConvertor::getInstance();
                    -		const char *htmlKeyWords = wmc.wchar2char(htmlKeyWords_generic, CP_ACP);
                    -		execute(SCI_SETKEYWORDS, 0, reinterpret_cast<LPARAM>(htmlKeyWords?htmlKeyWords:""));
                    -		makeStyle(L_HTML);
                     
                    +		setHTMLLexer();
                             setEmbeddedJSLexer();
                             setEmbeddedPhpLexer();
                     		setEmbeddedAspLexer();
                    @@ -846,6 +841,21 @@ void ScintillaEditView::setXmlLexer(LangType type)
                     	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("fold.hypertext.comment"), reinterpret_cast<LPARAM>("1"));
                     }
                     
                    +void ScintillaEditView::setHTMLLexer()
                    +{
                    +	const TCHAR *pKwArray[10] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL};
                    +	makeStyle(L_HTML, pKwArray);
                    +
                    +	basic_string<char> keywordList("");
                    +	if (pKwArray[LANG_INDEX_INSTR])
                    +	{
                    +		basic_string<wchar_t> kwlW = pKwArray[LANG_INDEX_INSTR];
                    +		keywordList = wstring2string(kwlW, CP_ACP);
                    +	}
                    +
                    +	execute(SCI_SETKEYWORDS, 0, reinterpret_cast<LPARAM>(getCompleteKeywordList(keywordList, L_HTML, LANG_INDEX_INSTR)));
                    +}
                    +
                     void ScintillaEditView::setEmbeddedJSLexer()
                     {
                     	const TCHAR *pKwArray[10] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL};
                    diff --git a/PowerEditor/src/ScintillaComponent/ScintillaEditView.h b/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
                    index db3c9555..4ec35f3e 100644
                    --- a/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
                    +++ b/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
                    @@ -827,6 +827,7 @@ protected:
                     	//Complex lexers (same lexer, different language)
                     	void setXmlLexer(LangType type);
                      	void setCppLexer(LangType type);
                    +	void setHTMLLexer();
                     	void setJsLexer();
                     	void setTclLexer();
                         void setObjCLexer(LangType type);
                    diff --git a/PowerEditor/src/stylers.model.xml b/PowerEditor/src/stylers.model.xml
                    index 04d26994..18d8726e 100644
                    --- a/PowerEditor/src/stylers.model.xml
                    +++ b/PowerEditor/src/stylers.model.xml
                    @@ -569,7 +569,7 @@
                                 <WordsStyle name="NUMBER" styleID="5" fgColor="FF0000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
                                 <WordsStyle name="DOUBLE STRING" styleID="6" fgColor="8000FF" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="" />
                                 <WordsStyle name="SINGLE STRING" styleID="7" fgColor="8000FF" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="" />
                    -            <WordsStyle name="TAG" styleID="1" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
                    +            <WordsStyle name="TAG" styleID="1" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" keywordClass="instre1"/>
                                 <WordsStyle name="TAG END" styleID="11" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
                                 <WordsStyle name="TAG UNKNOWN" styleID="2" fgColor="000000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
                                 <WordsStyle name="ATTRIBUTE" styleID="3" fgColor="FF0000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
                    
                    

                    All the other theme files would need to be updated too. It seems to work OK. You can build on this if you regard it as OK, and do the PR.
                    I am not sure whether the removed WcharMbcsConvertor::getInstance() line is needed; it is not used for the embedded languages or the surrounding languages.

                    [Screenshot: npplexhtml.png]

                    • DomOBU
                      DomOBU @mpheath
                      last edited by

                      Before using the “User-defined keywords” functionality with CSS, it was suggested that I add the missing properties to an xml file and this worked. However, I don’t see this as end-user functionality.

                      The ‘User-defined keywords’ window is more ‘end-user’.

                      • PeterJones
                        PeterJones @mpheath
                        last edited by

                        @mpheath said in HTML & User-defined keywords:

                        Might be able to create a similar method as the embedded languages.

                        Thanks for that example code.

                        Not sure if the WcharMbcsConvertor::getInstance() line removed is needed.

                        getCompleteKeywordList() contains the getInstance(), so we don’t need one in the setHTMLLexer().

                        I was able to get it confirmed and working (including adding it to all the themes, not just default stylers.model.xml).

                        Created issue 15093 and PR 15094

                        • DomOBU
                          DomOBU @PeterJones
                          last edited by

                          @PeterJones
                          @mpheath

                          Thanks for the solution, which will be implemented in v8.6.7.

                          The Community of users of the Notepad++ text editor.