Function List regex for CSS comments with German umlauts äöüÄÖÜß: bug, or an error in my regex?
-
Hello,
I use the Function List to navigate my structured CSS files.
To fetch all comments in the format
/** comment */
I use the regex
\/\*\*[\*]?[a-zA-Z0-9äöüÄÖÜß -]+\*\/
but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
Comments containing umlauts don't show up; all others do.
Does anyone know a solution?
-
@stephan-romhart said in Function List regex for CSS comments with German umlauts äöüÄÖÜß: bug, or an error in my regex?:
but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
Comments containing umlauts don't show up; all others do.
Does anyone know a solution?
I recall some posts from @guy038 on character ranges, and while trying to locate them I found an online application he mentioned at this website.
I copied your ß character into it and pressed “Go”, which told me the hex code is 00DF. So using the Find function and typing
\xdf
locates that character. I eventually found one of his posts, https://community.notepad-plus-plus.org/topic/20595/examining-a-character/10 , which might be useful reading for you.
The other characters are in a similar range: \xe4, \xf6, \xfc, \xc4, \xd6, \xdc.
So if you replace the individual characters in your “set” as follows, it should work:
\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/
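The mapping from characters to hex escapes can be checked outside Notepad++. Below is a minimal sketch in Python, assuming that Python's `re` module treats this simple pattern the same way as Notepad++'s regex engine does (that is an assumption, and the sample comments are made up for illustration).

```python
import re

# Build the \xNN escapes from the code points of the German special characters.
UMLAUTS = "äöüÄÖÜß"
escapes = "".join(f"\\x{ord(ch):02x}" for ch in UMLAUTS)

# Same pattern as in the post, with the escapes substituted into the set.
pattern = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9" + escapes + r" -]+\*\/")

# Hypothetical sample comments, not taken from the original poster's files.
samples = [
    "/** Header */",
    "/** Menü */",
    "/** Größe und Abstände */",
    "/* plain comment */",
]
matched = [s for s in samples if pattern.fullmatch(s)]
print(escapes)
print(matched)
```

The single-star comment is rejected because the pattern requires at least two stars after the slash; the comments with umlauts are accepted through their escapes.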
Other changes are also possible, as you are escaping the / when it is not needed. The - at the end of the set might need escaping, as it normally means a range, and since the space is ahead of it, that could be read as the start of a range (space to ??).
I’m hoping that by referencing @guy038 he will elaborate more fully, since he has a very good appreciation of these things. Being a native English speaker, I haven’t been exposed to these “extended characters”. Hopefully he can also verify whether the last character in the set, -, needs “escaping” in this position.
I also think a much simpler regex such as
(?-s)^/\*+.+?\*/
would work, unless you specifically need to match ONLY the characters within your set and exclude others. The (?-s)^ limits it to 1 line ONLY and anchors the first character (/) at column 1 (^). The .+? is lazy, so it only selects characters until the first */ that follows is found.
Terry
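The behaviour of the simpler regex can be sketched in Python. This is only an approximation under stated assumptions: Python's `re` has no global `(?-s)` switch, but its `.` already excludes newlines by default, and `re.MULTILINE` is needed so that `^` matches at the start of each line. The CSS sample is hypothetical.

```python
import re

# "." does not match newlines by default in Python, which stands in for (?-s);
# re.MULTILINE makes ^ match at the beginning of every line.
simple = re.compile(r"^/\*+.+?\*/", re.MULTILINE)

# Hypothetical CSS text for illustration.
css = (
    "/** Menü */\n"
    "body { margin: 0; } /* trailing */\n"
    "/* first */ a { color: red; } /* second */\n"
    "/* spans\n"
    "   two lines */\n"
)
found = [m.group(0) for m in simple.finditer(css)]
print(found)
```

Only comments that start at column 1 and close on the same line are found, and the lazy `.+?` stops at the first `*/` rather than running on to a later one.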
-
Hello, @stephan-romhart, @terry-r and All,
From the definition of a CSS comment: https://www.w3schools.com/css/css_comments.asp
I suppose that the following general regex, inside a commentExpr attribute, should work in your css.xml definition file:
<parser displayName="CSS" id ="css_syntax" commentExpr="(?s)/\*.+?\*/" > ....
This regex will match any single-line or multi-line CSS comment!
I think we shouldn’t bother with specific characters in CSS comments ;-))
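As a quick check of that general regex, here is a small Python sketch. It assumes Python's `re` handles the inline `(?s)` flag (dot matches newline) the same way for this pattern, and the CSS text is made up.

```python
import re

# (?s) lets "." cross line breaks, so multi-line comments are covered too.
comment_expr = re.compile(r"(?s)/\*.+?\*/")

# Hypothetical CSS text for illustration.
css = (
    "/** Menü */\n"
    "body { margin: 0; } /* trailing */\n"
    "/* spans\n"
    "   two lines */\n"
)
zones = comment_expr.findall(css)
print(zones)
```

All three comments are matched regardless of the characters they contain, including the one that spans two lines.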
Now, you said:
All comments with Umlaute won't show, all others will.
Well, I’m not sure about your overall comprehension of the Function List feature. Indeed, from: https://community.notepad-plus-plus.org/topic/19480/faq-desk-function-list-basics (see main points 2 and 4 of the parse steps, at the beginning of the topic)
And, from the Function List tutorial below (old N++ site, before Sept 19, 2019): https://web.archive.org/web/20190826024431/https://notepad-plus-plus.org/features/function-list.html
Where it is said:
comment: Optional. you can make a RE in this attribute in order to identify comment zones. The identified zones will be ignored by search.
My understanding is that the commentExpr attribute is used to define range(s) of characters where the Function List feature will not look for any class or function / method block!!
Best Regards
guy038
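To make that reading of commentExpr concrete, here is a deliberately simplified model in Python. It is not how Notepad++ is implemented; `list_entries` and the `main_expr` pattern are hypothetical names introduced only for this sketch, which assumes that any match overlapping a comment zone is dropped.

```python
import re

def list_entries(text, main_expr, comment_expr=""):
    """Simplified model: keep main_expr matches that do not overlap a comment zone."""
    zones = []
    if comment_expr:
        zones = [m.span() for m in re.finditer(comment_expr, text)]
    entries = []
    for m in re.finditer(main_expr, text):
        overlaps = any(m.start() < end and start < m.end() for start, end in zones)
        if not overlaps:
            entries.append(m.group(0))
    return entries

# Hypothetical CSS text and a hypothetical mainExpr that targets the comments themselves.
css = "/** Header */\nh1 { color: red; }\n/** Footer */\n"
main_expr = r"/\*\*[A-Za-z ]+\*/"

with_zones = list_entries(css, main_expr, r"(?s)/\*.+?\*/")
without_zones = list_entries(css, main_expr)
print(with_zones, without_zones)
```

Under this model, a commentExpr that matches every comment hides exactly the items a comment-based list wants to show, whereas an empty commentExpr leaves them visible.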
P.S.:
For non-German speakers, the German umlaut letters are:
•-----------•--------------•--------------•----------•-----------•
| Character | Substitution |   ANSI (*)   | UNICODE  |   HTML    |
•-----------•--------------•--------------•----------•-----------•
|     ä     |      ae      |  Alt + 0228  |   00E4   |  &auml;   |
|     ö     |      oe      |  Alt + 0246  |   00F6   |  &ouml;   |
|     ü     |      ue      |  Alt + 0252  |   00FC   |  &uuml;   |
•-----------•--------------•--------------•----------•-----------•
|     ß     |      ss      |  Alt + 0223  |   00DF   |  &szlig;  |
•-----------•--------------•--------------•----------•-----------•
|     Ä     |      Ae      |  Alt + 0196  |   00C4   |  &Auml;   |
|     Ö     |      Oe      |  Alt + 0214  |   00D6   |  &Ouml;   |
|     Ü     |      Ue      |  Alt + 0220  |   00DC   |  &Uuml;   |
•-----------•--------------•--------------•----------•-----------•
(*) In Win-1250, Win-1252, Win-1254, Win-1257 or Win-1258 encodings
-
Thank you, @Terry-R and @guy038.
The solution of using hex-escaped characters in the regex works like a charm.
Here is my complete code, in case someone else is searching for a way to use the Function List as a CSS comment overview:
<NotepadPlus>
    <functionList>
        <!-- ================================================ [ CSS ] -->
        <parser id="css_comment" displayName="CSS" commentExpr="">
            <function mainExpr="\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/" displayMode="$functionName">
                <functionName>
                    <nameExpr expr="[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+" />
                </functionName>
            </function>
        </parser>
    </functionList>
</NotepadPlus>
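How the two expressions cooperate can be sketched in Python: mainExpr finds each comment, and nameExpr then picks the display name out of that match. This is an illustrative model only, assuming Python's `re` behaves like the Notepad++ engine for these patterns; the CSS sample is hypothetical, and the `strip()` call is added here purely to tidy the output.

```python
import re

CHARS = r"[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+"
main_expr = re.compile(r"\/\*\*[\*]?" + CHARS + r"\*\/")  # the mainExpr value
name_expr = re.compile(CHARS)                             # the nameExpr value

# Hypothetical CSS text for illustration.
css = (
    "/** Kopfbereich */\n"
    "header { margin: 0; }\n"
    "/*** Menü oben */\n"
    "nav { display: flex; }\n"
    "/** Fußzeile */\n"
    "footer { padding: 1em; }\n"
)

names = []
for block in main_expr.finditer(css):
    # Search for the name only inside the text matched by mainExpr.
    name = name_expr.search(block.group(0))
    names.append(name.group(0).strip())  # strip() is for display in this sketch only
print(names)
```

The result is one entry per section comment, which is the comment overview the configuration is meant to produce.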
Stephan
-
Hi, @stephan-romhart, @terry-r and All,
Oooooh, Stephan, I see! Actually, it’s the comments themselves that are the objects to look for ;-))
Then, just for your information, here is a shorter version of the two regexes, within the css_comment.xml file contents:
<?xml version="1.0" encoding="UTF-8" ?>
<NotepadPlus>
    <functionList>
        <!-- ================================================ [ CSS comments ] -->
        <parser id="css_comment" displayName="CSS" commentExpr="">
            <function mainExpr="/\*{2,3}(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/" displayMode="$functionName">
                <functionName>
                    <nameExpr expr="(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+" />
                </functionName>
            </function>
        </parser>
    </functionList>
</NotepadPlus>
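That the shorter form accepts the same comments as the longer one can be spot-checked in Python. This is a sketch under assumptions: recent Python versions reject an inline `(?i)` in the middle of a pattern, so the flag is passed as `re.IGNORECASE` instead, which is harmless here because the prefix contains no letters. The sample comments are hypothetical.

```python
import re

long_form = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/")
# Case-insensitive matching lets A-Z cover a-z and \xc4\xd6\xdc cover äöü as well.
short_form = re.compile(r"/\*{2,3}[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/", re.IGNORECASE)

# Hypothetical sample comments for illustration.
samples = [
    "/** Header */",
    "/*** Menü oben */",
    "/** ÄNDERUNGEN */",
    "/** größe */",
    "/* single star */",
    "/** a_b */",
]
long_matches = [s for s in samples if long_form.fullmatch(s)]
short_matches = [s for s in samples if short_form.fullmatch(s)]
print(long_matches == short_matches, short_matches)
```

On these samples both forms accept the same four comments and reject the single-star comment and the one containing an underscore.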
With the added line, in the configuration file overrideMap.xml:
<association id= "css_comment.xml" langID= "20"/>
As the default CSS association is:
<!-- <association id= "css.xml" langID= "20"/> -->
Note that the added line in the overrideMap.xml file is not mandatory. But if you omit it, you must rename the css_comment.xml file to css.xml!
BR
guy038