Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?
-
Hello,
I use the function list to navigate my structured CSS files.
The regex I use to fetch all comments in the format
/** comment */
is
\/\*\*[\*]?[a-zA-Z0-9äöüÄÖÜß -]+\*\/
but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
All comments with umlauts won't show up; all others will.
Does anyone know a solution?
-
@stephan-romhart said in Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?:
but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
All comments with umlauts won't show up; all others will.
Does anyone know a solution?
I recall some posts from @guy038 on character ranges and, while trying to locate them, I found an online application he mentioned at this website.
I copied in your ß character and pressed “Go”, which told me the hex code is 00DF. So using the Find function and typing
\xdf
locates that character. I eventually found one of his posts, which might be useful reading for you: https://community.notepad-plus-plus.org/topic/20595/examining-a-character/10
The other characters are in a similar range: \xe4, \xf6, \xfc, \xc4, \xd6, \xdc.
So if you replace the individual characters in your “set” as follows, it should work:
\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/
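As a quick sanity check outside Notepad++, the same pattern can be tried in Python. This is only a sketch: Python's re module is not the Boost engine that Notepad++ uses, but it reads the \xNN escapes the same way, and the sample CSS text is made up for illustration.

```python
import re

# Same pattern as above; \xe4 etc. are the code points of ä, ö, ü, Ä, Ö, Ü, ß.
PATTERN = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/")

# Made-up sample text (an assumption, not taken from the original files).
css = """/** Header */
/** Menü für Größen */
/** Kopf-Bereich */
/* not listed: only one asterisk */
body { margin: 0; }
"""

found = PATTERN.findall(css)
for comment in found:
    print(comment)
```

In Python, at least, the unescaped - at the end of the set is taken literally, which is why the "Kopf-Bereich" comment is found.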
Other changes are also possible, as you are escaping the / when it is not needed. The - at the end of the set might need escaping, as it normally means a range, and since the space is ahead of it, that could be read as the start of a range (space to ??).
I’m hoping that by referencing @guy038 he will elaborate more fully, since he has a very good appreciation of these things. Being a native English speaker, I haven’t been exposed to these “extended characters”. Hopefully he can also verify whether the last character in the set, -, needs “escaping” in this position.
I also think a much simpler regex such as
(?-s)^/\*+.+?\*/
would work, unless you specifically need to match ONLY the characters within your set and exclude others. The (?-s)^ limits the match to a single line and requires the first character (/) to be at column 1 (^). The .+? is lazy, so it will only select characters up to the first */ that follows.
Terry
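The behaviour of the simpler regex can be sketched in Python, with two adjustments that are assumptions of this sketch: Python has no unscoped (?-s), so the DOTALL flag is simply left off, and re.MULTILINE is needed for ^ to match at the start of every line. The sample text is invented.

```python
import re

# No re.DOTALL: "." stops at line ends, like (?-s) in Notepad++.
# re.MULTILINE: "^" matches at the beginning of each line.
SIMPLE = re.compile(r"^/\*+.+?\*/", re.MULTILINE)

css = """/** Größe */ a { } /* second comment */
  /** indented, so not at column 1 */
/* any characters: äöü_#! */
"""

matches = SIMPLE.findall(css)
for m in matches:
    print(m)
```

The lazy .+? stops at the first */, so the second comment on the first line is not swallowed, and the indented comment is skipped because it does not start at column 1.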
-
Hello, @stephan-romhart, @terry-r and All,
From the definition of a CSS comment:
https://www.w3schools.com/css/css_comments.asp
I suppose that the following general regex, inside a commentExpr attribute, should work in your css.xml definition file:
<parser displayName="CSS" id="css_syntax" commentExpr="(?s)/\*.+?\*/" >
....
This regex will match any single-line or multi-line CSS comment!
I think we shouldn’t bother with specific characters in CSS comments ;-))
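A small Python sketch of that regex, assuming Python's re treats (?s) and the lazy quantifier the same way for this pattern. The CSS sample is invented for illustration.

```python
import re

# (?s) lets "." cross line ends, so multi-line comments are matched too.
COMMENT_EXPR = re.compile(r"(?s)/\*.+?\*/")

css = """/* one line */
h1 { color: red; }
/* spans
   two lines, with Umlaute: äöü */
p { margin: 0; }
"""

comments = COMMENT_EXPR.findall(css)
for c in comments:
    print(repr(c))
```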
Now, you said:
All comments with umlauts won't show up; all others will.
Well, I’m not sure about your overall comprehension of the Function List feature. Indeed, from:
https://community.notepad-plus-plus.org/topic/19480/faq-desk-function-list-basics
(see main points 2 and 4 of the parse steps, at the beginning of the topic)
And from the Function List tutorial below (old N++ site, before Sept 19, 2019):
https://web.archive.org/web/20190826024431/https://notepad-plus-plus.org/features/function-list.html
where it is said:
comment: Optional. you can make a RE in this attribute in order to identify comment zones. The identified zones will be ignored by search.
My understanding is that the commentExpr attribute is used to define range(s) of characters where the Function List feature will not look for any class or function / method block!!
Best Regards
guy038
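To make that reading of commentExpr concrete, here is a rough Python model. It is not Notepad++'s actual implementation: MAIN_EXPR is a hypothetical "function" pattern invented for this sketch, and the overlap test is only one plausible way to ignore comment zones.

```python
import re

COMMENT_EXPR = re.compile(r"(?s)/\*.+?\*/")  # the commentExpr suggested above
# Hypothetical mainExpr, for illustration only: a selector followed by "{".
MAIN_EXPR = re.compile(r"[.#]?[A-Za-z][\w-]*(?=\s*\{)")

def list_entries(text):
    """Return MAIN_EXPR matches that do not start inside a comment zone."""
    zones = [m.span() for m in COMMENT_EXPR.finditer(text)]
    entries = []
    for m in MAIN_EXPR.finditer(text):
        inside = any(start <= m.start() < end for start, end in zones)
        if not inside:
            entries.append(m.group())
    return entries

css = "/* .hidden { } */\n.header { color: red; }\n"
print(list_entries(css))
```

Under this model, anything matched by commentExpr can never appear in the list, so comments that should show up as entries must not be covered by it.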
P.S.:
For non-German people, the German Umlaut letters are:
•-----------•--------------•--------------•----------•-----------•
| Character | Substitution |   ANSI (*)   | UNICODE  |   HTML    |
•-----------•--------------•--------------•----------•-----------•
|     ä     |      ae      |  Alt + 0228  |   00E4   |  &auml;   |
|     ö     |      oe      |  Alt + 0246  |   00F6   |  &ouml;   |
|     ü     |      ue      |  Alt + 0252  |   00FC   |  &uuml;   |
•-----------•--------------•--------------•----------•-----------•
|     ß     |      ss      |  Alt + 0223  |   00DF   |  &szlig;  |
•-----------•--------------•--------------•----------•-----------•
|     Ä     |      Ae      |  Alt + 0196  |   00C4   |  &Auml;   |
|     Ö     |      Oe      |  Alt + 0214  |   00D6   |  &Ouml;   |
|     Ü     |      Ue      |  Alt + 0220  |   00DC   |  &Uuml;   |
•-----------•--------------•--------------•----------•-----------•
(*) In Win-1250, Win-1252, Win-1254, Win-1257 or Win-1258 encodings
-
Thank you, @Terry-R and @guy038.
The solution of using the hex character escapes in the regex works like a charm.
Here is my complete code, in case someone else is looking for a way to use the function list as a CSS comment overview:
<NotepadPlus>
    <functionList>
        <!-- ================================================ [ CSS ] -->
        <parser id="css_comment" displayName="CSS" commentExpr="">
            <function mainExpr="\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/" displayMode="$functionName">
                <functionName>
                    <nameExpr expr="[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+" />
                </functionName>
            </function>
        </parser>
    </functionList>
</NotepadPlus>
Stephan
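How the two expressions seem to cooperate can be sketched in Python: mainExpr finds each comment block, and nameExpr then picks the text to display from inside that block. This is a rough model under that assumption, not Notepad++'s own code. The helper name comment_names, the sample CSS and the strip() call are additions for illustration; whether Notepad++ trims the surrounding spaces is not confirmed here.

```python
import re

MAIN_EXPR = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/")
NAME_EXPR = re.compile(r"[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+")

def comment_names(text):
    """Hypothetical helper: list the name found inside each matched comment."""
    names = []
    for block in MAIN_EXPR.finditer(text):
        name = NAME_EXPR.search(block.group())
        if name:
            # The raw match includes the spaces around the text; strip them
            # here only to make the printed result easier to read.
            names.append(name.group().strip())
    return names

css = "/** Kopfbereich */\nheader { }\n/*** Fußzeile */\nfooter { }\n/* ignoriert */\n"
print(comment_names(css))
```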
-
Hi, @stephan-romhart, @terry-r and All,
Oooooh, Stephan, I see! Actually, it’s the comments themselves that are the objects to look for ;-))
Then, just for your information, here is a shorter version of the two regexes, within the css_comment.xml file contents:
<?xml version="1.0" encoding="UTF-8" ?>
<NotepadPlus>
    <functionList>
        <!-- ================================================ [ CSS comments ] -->
        <parser id="css_comment" displayName="CSS" commentExpr="">
            <function mainExpr="/\*{2,3}(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/" displayMode="$functionName">
                <functionName>
                    <nameExpr expr="(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+" />
                </functionName>
            </function>
        </parser>
    </functionList>
</NotepadPlus>
With the added line in the configuration file overrideMap.xml:
<association id= "css_comment.xml" langID= "20"/>
As the default CSS association is:
<!-- <association id= "css.xml" langID= "20"/> -->
Note that the added line in the overrideMap.xml file is not mandatory. But in that case, you must rename the css_comment.xml file to css.xml!
BR
guy038
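A hedged Python comparison of the long and short mainExpr forms. One adaptation is an assumption of this sketch: recent Python versions reject a bare (?i) in the middle of a pattern, so the scoped group (?i:...) stands in for it. The helper agree and the sample strings are invented.

```python
import re

# Stephan's mainExpr, with explicit lower- and upper-case letters.
LONG = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/")
# The shorter form; (?i:...) replaces the mid-pattern (?i) for Python.
SHORT = re.compile(r"/\*{2,3}(?i:[A-Z0-9\xc4\xd6\xdc\xdf -]+)\*/")

def agree(text):
    """True when both patterns give the same match (or both give none)."""
    a, b = LONG.search(text), SHORT.search(text)
    return (a.group() if a else None) == (b.group() if b else None)

samples = [
    "/** Navigation */",
    "/*** Fußzeile */",
    "/** ÜBERSCHRIFT größe */",
    "/* einfacher Kommentar */",  # one asterisk only: no match expected
    "/** a_b */",                 # underscore is outside the set: no match
]
for s in samples:
    print(s, agree(s), bool(SHORT.search(s)))
```

Case-insensitive matching lets the upper-case escapes \xc4, \xd6 and \xdc cover ä, ö and ü as well, which is what makes the shorter character set possible.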