HTML & User-defined keywords
-
@mpheath said in HTML & User-defined keywords:
Notepad++ joins Default keywords and User-defined keywords together and then calls SCI_SETKEYWORDS with the single keyword list for Lexilla to use.
Sorry I didn’t give the full details of my experiment when I was answering a newb OP. But since you want those details:
My experiment was to edit the instre1 keyword list for HTML in Notepad++'s langs.xml, which Notepad++ sends to LexHTML as the first keyword list. And even after restarting, it would not lex my new tag.

So then I went to SciTE, and edited html.properties, and added my dummy tag right after xmp in the list of keywords (I tried either xmpPeter or zzzPeter, so both were alphabetical, even though that shouldn’t be necessary), and after restarting SciTE, it would syntax highlight <xmp>a</xmp> as a valid tag, but showed <xmpPeter>a</xmpPeter> or <zzzPeter>a</zzzPeter> as an invalid tag. So changing the keyword list that Lexilla sees does not change what it considers a valid tag.

As far as I can tell, a user of the Lexilla HTML lexer cannot customize the keyword list at all, and have it affect whether Lexilla thinks a given HTML tag is valid or not. Thus, I used the non-explicit phrasing to the OP that you “cannot send user-defined keywords for HTML” tags to Lexilla.

(I justify that phrasing: even when I, as a user, define the keyword list in the Scintilla/Lexilla testbed SciTE – hence, it’s user-defined – it will not change its lexing of HTML tags based on that list.)

Hence my conclusion that it’s something that needs to be fixed in Lexilla, not Notepad++.
-
@PeterJones said in HTML & User-defined keywords:
Sorry I didn’t give the full details of my experiment when I was answering a newb OP. But since you want those details:
The fine details have been absent since the OP. A conclusion drawn from so little detail is not assured to be conclusive. So, thanks for providing some of your details for me to investigate further.
My experiment was to edit the instre1 keyword list for HTML in Notepad++'s langs.xml, which Notepad++ sends to LexHTML as the first keyword list. And even after restarting, it would not lex my new tag.

So then I went to SciTE, and edited html.properties, and added my dummy tag right after xmp in the list of keywords (I tried either xmpPeter or zzzPeter, so both were alphabetical, even though that shouldn’t be necessary), and after restarting SciTE, it would syntax highlight <xmp>a</xmp> as a valid tag, but showed <xmpPeter>a</xmpPeter> or <zzzPeter>a</zzzPeter> as an invalid tag. So changing the keyword list that Lexilla sees does not change what it considers a valid tag.

Keywords need to be lowercase. See SciTE’s html.properties comment:
# All hypertext elements and attributes must be listed in lower case
xmpPeter or zzzPeter will not be recognized as a keyword if it is not lowercase in the keyword list. Insert them as xmppeter or zzzpeter. P is a different char to p.

With xmppeter added to SciTE’s keyword list named hypertext.elements, this is the result:

xmpnotexist is not in the keyword list so gets the red error style for being an unrecognized keyword.

In Notepad++ with a modified langs.xml and using default theme:
Similar to SciTE though lacking the red error style. Appears some themes have unrecognized keywords as red or similar and some do not.
Hence my conclusion that it’s something that needs to be fixed in Lexilla, not Notepad++.
The experiment that you have done was flawed so your conclusion needs adjustment.
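
To make the lowercase requirement concrete, here is a minimal C++ sketch of the behaviour described above. It is not Lexilla's code: parseKeywords, lowered and isKnownTag are hypothetical names, and the only assumption carried over from the thread is that the tag text is lowered before a case-sensitive lookup in the keyword list.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Split a space-separated keyword string into a set, keeping each entry's case as written.
std::set<std::string> parseKeywords(const std::string &list) {
    std::istringstream in(list);
    std::set<std::string> words;
    std::string word;
    while (in >> word) words.insert(word);
    return words;
}

// Return an ASCII-lowercased copy of the text.
std::string lowered(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical stand-in for the lexer's check (an assumption, not the real API):
// the tag text is lowered once, then compared case-sensitively against the list.
bool isKnownTag(const std::set<std::string> &keywords, const std::string &tagText) {
    return keywords.count(lowered(tagText)) > 0;
}

// Usage check mirroring the experiment in this thread.
void demoLowercaseRule() {
    const auto mixedCaseList = parseKeywords("xmp xmpPeter");
    assert(isKnownTag(mixedCaseList, "xmp"));
    assert(!isKnownTag(mixedCaseList, "xmpPeter"));  // list entry keeps its capital P, so it never matches

    const auto lowerCaseList = parseKeywords("xmp xmppeter");
    assert(isKnownTag(lowerCaseList, "xmpPeter"));   // the tag in the document may use any case
    assert(isKnownTag(lowerCaseList, "XMPPETER"));
}
```

Under that assumption, a mixed-case entry in the keyword list can never match, while a lowercase entry matches the tag however it is capitalized in the document.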
-
Here is ScintillaOrg/lexilla’s reply:
nyamatongwe commented May 3, 2024
Recent release 5.3.2 included:
HTML: Implement substyles for tags, attributes, and identifiers SCE_H_TAG, SCE_H_ATTRIBUTE, SCE_HJ_WORD, SCE_HJA_WORD, SCE_HB_WORD, SCE_HP_WORD, SCE_HPHP_WORD.
HTML: Implement context-sensitive attributes. "tag.attribute" matches "attribute" only inside "tag".
I suppose that’s understandable for you.
-
@mpheath said in HTML & User-defined keywords:
Keywords need to be lowercase.
# All hypertext elements and attributes must be listed in lower case
Interesting design choice on their part, especially since that restriction isn’t in most of the lexers. For example, they could have avoided it if here they had skipped the GetRangeLowered() and used InListCaseInsensitive() instead of InList(). (Update: in their defense, maybe, when the HTML lexer was written, the case-insensitive version of the method didn’t exist, and they just never thought of or bothered making it work with any-case keywords after that method was introduced.) But yes, I missed that comment when doing my experiment, and I just had the poor luck of picking an edge case for my experiment.
So @DomOBU could just add the lowercase custom keywords into langs.xml; or, could add the keywordClass="instre1" attribute to the <WordsStyle name="TAG" styleID="1"... /> in stylers.xml or the active themes\*.xml, which after a restart would put the keyword and user-defined-keyword boxes in the Style Configurator > Language:HTML > Style:TAG for easily adding the keywords in the GUI.
-
There are 3 comparisons in the else section of code. InListCaseInsensitive() could be used, though there is also comparing with type and language that needs a case-insensitive compare. So the choice of using GetRangeLowered() is more beneficial as it lowers the case once.

I see keyword lists as lowercase and I have seen some mixed case and some uppercase. Percentage wise, the lowercase might dominate though I do not want to waste time counting.
Indeed luck was not on your side for getting the case correct. It can happen to anyone. I learnt some hard lessons in the past which is why the fine details do matter with these technical subjects. Taking things for granted and failing can make a person slap themselves in disappointment (ouch!).
… could add keywordClass=“instre1”…
Adds the feature to the GUI though it does not style the tags. May need a keyword array to be accessed by the HTML lexer setup to make use of User-defined keywords. At a review so far, it seems makeStyle builds an array, though whether it is for builtin lexers or external lexers, I am not sure yet. There might be a way already to access the feature, or perhaps it needs to be custom coded. For example, I do not see the inno lexer setup in plain sight and it has User-defined keywords, so there might be an automatic builtin method to achieve this for HTML. Would not know how to answer until a working compile is done with the changes needed.
-
@DomOBU said in HTML & User-defined keywords:
HTML: Implement substyles for tags, attributes, and identifiers SCE_H_TAG, SCE_H_ATTRIBUTE, SCE_HJ_WORD, SCE_HJA_WORD, SCE_HB_WORD, SCE_HP_WORD, SCE_HPHP_WORD. HTML: Implement context-sensitive attributes. "tag.attribute" matches "attribute" only inside "tag".
I suppose that’s understandable for you.
It is an enhancement from last release of Lexilla. Consider p.attrib, so attribute attrib can be matched inside a p tag, AFAIK without any testing done to prove this.

When you ask someone who does not understand exactly what you are talking about, you may get a reply that you do not understand. User-defined keywords is a Notepad++ feature that is not related directly with Lexilla.
-
@PeterJones Might be able to create a similar method as the embedded languages.
diff --git a/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp b/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
index 2cbe46d1..223b60d2 100644
--- a/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
+++ b/PowerEditor/src/ScintillaComponent/ScintillaEditView.cpp
@@ -828,13 +828,8 @@ void ScintillaEditView::setXmlLexer(LangType type)
 	else if ((type == L_HTML) || (type == L_PHP) || (type == L_ASP) || (type == L_JSP))
 	{
 		setLexerFromLangID(L_HTML);
-		const TCHAR *htmlKeyWords_generic = NppParameters::getInstance().getWordList(L_HTML, LANG_INDEX_INSTR);
-
-		WcharMbcsConvertor& wmc = WcharMbcsConvertor::getInstance();
-		const char *htmlKeyWords = wmc.wchar2char(htmlKeyWords_generic, CP_ACP);
-		execute(SCI_SETKEYWORDS, 0, reinterpret_cast<LPARAM>(htmlKeyWords?htmlKeyWords:""));
-		makeStyle(L_HTML);
+		setHTMLLexer();
 
 		setEmbeddedJSLexer();
 		setEmbeddedPhpLexer();
 		setEmbeddedAspLexer();
@@ -846,6 +841,21 @@ void ScintillaEditView::setXmlLexer(LangType type)
 	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("fold.hypertext.comment"), reinterpret_cast<LPARAM>("1"));
 }
 
+void ScintillaEditView::setHTMLLexer()
+{
+	const TCHAR *pKwArray[10] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL};
+	makeStyle(L_HTML, pKwArray);
+
+	basic_string<char> keywordList("");
+	if (pKwArray[LANG_INDEX_INSTR])
+	{
+		basic_string<wchar_t> kwlW = pKwArray[LANG_INDEX_INSTR];
+		keywordList = wstring2string(kwlW, CP_ACP);
+	}
+
+	execute(SCI_SETKEYWORDS, 1, reinterpret_cast<LPARAM>(getCompleteKeywordList(keywordList, L_HTML, LANG_INDEX_INSTR)));
+}
+
 void ScintillaEditView::setEmbeddedJSLexer()
 {
 	const TCHAR *pKwArray[10] = {NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL};
diff --git a/PowerEditor/src/ScintillaComponent/ScintillaEditView.h b/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
index db3c9555..4ec35f3e 100644
--- a/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
+++ b/PowerEditor/src/ScintillaComponent/ScintillaEditView.h
@@ -827,6 +827,7 @@ protected:
 	//Complex lexers (same lexer, different language)
 	void setXmlLexer(LangType type);
 	void setCppLexer(LangType type);
+	void setHTMLLexer();
 	void setJsLexer();
 	void setTclLexer();
 	void setObjCLexer(LangType type);
diff --git a/PowerEditor/src/stylers.model.xml b/PowerEditor/src/stylers.model.xml
index 04d26994..18d8726e 100644
--- a/PowerEditor/src/stylers.model.xml
+++ b/PowerEditor/src/stylers.model.xml
@@ -569,7 +569,7 @@
             <WordsStyle name="NUMBER" styleID="5" fgColor="FF0000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
             <WordsStyle name="DOUBLE STRING" styleID="6" fgColor="8000FF" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="" />
             <WordsStyle name="SINGLE STRING" styleID="7" fgColor="8000FF" bgColor="FFFFFF" fontName="" fontStyle="1" fontSize="" />
-            <WordsStyle name="TAG" styleID="1" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
+            <WordsStyle name="TAG" styleID="1" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" keywordClass="instre1"/>
             <WordsStyle name="TAG END" styleID="11" fgColor="0000FF" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
             <WordsStyle name="TAG UNKNOWN" styleID="2" fgColor="000000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
             <WordsStyle name="ATTRIBUTE" styleID="3" fgColor="FF0000" bgColor="FFFFFF" fontName="" fontStyle="0" fontSize="" />
All the other theme files would need to be updated too. Seems to work ok. You can build on this if you regard it as ok and do the PR.

Not sure if the WcharMbcsConvertor::getInstance() line removed is needed. It is not used on the embedded languages or surrounding languages.
-
Before using the “User-defined keywords” functionality with CSS, it was suggested that I add the missing properties to an xml file and this worked. However, I don’t see this as end-user functionality.
The ‘User-defined keywords’ window is more ‘end-user’.
-
@mpheath said in HTML & User-defined keywords:
Might be able to create a similar method as the embedded languages.
Thanks for that example code.
Not sure if the WcharMbcsConvertor::getInstance() line removed is needed.
getCompleteKeywordList() contains the getInstance(), so we don’t need one in the setHTMLLexer().

I was able to get it confirmed and working (including adding it to all the themes, not just default stylers.model.xml).

Created issue 15093 and PR 15094
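
For readers following along, the idea discussed in this thread of joining the default and user-defined keywords into one list before the SCI_SETKEYWORDS call can be sketched roughly as below. This is only an illustration: joinKeywordLists is a hypothetical helper, not the getCompleteKeywordList() in the Notepad++ source, and the order of the two lists is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the single space-separated keyword string that
// would be handed to the lexer, from the default list (langs.xml) and the
// user-defined list (Style Configurator). Either part may be empty.
std::string joinKeywordLists(const std::string &defaults, const std::string &userDefined) {
    if (defaults.empty()) return userDefined;
    if (userDefined.empty()) return defaults;
    return defaults + " " + userDefined;
}

// Usage check: the user-defined tag must already be lowercase to be matched later.
void demoJoinKeywordLists() {
    assert(joinKeywordLists("wbr xmp", "xmppeter") == "wbr xmp xmppeter");
    assert(joinKeywordLists("wbr xmp", "") == "wbr xmp");
    assert(joinKeywordLists("", "xmppeter") == "xmppeter");
}
```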
-
Thanks for the solution, which will be implemented in v8.6.7.