Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?

stephan-romhart

Hello,

I use the Function List to navigate in my structured CSS files.

The regex I use to fetch all comments in the format

/** comment */

is

\/\*\*[\*]?[a-zA-Z0-9äöüÄÖÜß -]+\*\/

but it seems Notepad++ doesn't recognize the German Umlaute (äöüÄÖÜß).
Comments containing Umlaute don't show up; all others do.

Does anyone know a solution?

Terry R

@stephan-romhart said in Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?:

but it seems Notepad++ doesn't recognize the German Umlaute (äöüÄÖÜß).
Comments containing Umlaute don't show up; all others do.
Does anyone know a solution?

I recall some posts from @guy038 on character ranges etc., and while trying to locate them I found an online application he mentioned at this website.
I pasted in your ß character and pressed “Go”, which told me the hex code is 00DF. Typing \xdf in the Find dialog then locates that character. I eventually found one of his posts, https://community.notepad-plus-plus.org/topic/20595/examining-a-character/10, which might be useful reading for you.
The other characters are in a similar range: \xe4, \xf6, \xfc, \xc4, \xd6, \xdc.

So if you replace the individual characters in your “set” as follows, it should work.
\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/
Other changes are also possible, as you are escaping the / when it is not needed. The - at the end of the set might need escaping, since it normally denotes a range and, with the space ahead of it, it could be read as the start of a range (space to ??).
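
As a quick sanity check outside Notepad++, the escaped set can be tried in Python. This is only a sketch: Python's `re` module is not the Boost regex engine that Notepad++ uses, and the sample comments and the helper name `matching_comments` are made up for illustration. In Python, at least, the `\x..` escapes match the Umlaute and a `-` placed last in the set is taken literally.

```python
import re

# Pattern from the post above, with the Umlaute written as \x escapes.
PATTERN = re.compile(r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/")

# Hypothetical sample comments, not taken from the original CSS files.
samples = [
    "/** Header */",
    "/** Men\u00fc */",                    # "Menü"
    "/** Gr\u00f6\u00dfe - Fu\u00df */",   # "Größe - Fuß", contains a literal hyphen
    "/* single star */",                   # only one asterisk, should not match
]


def matching_comments(lines):
    """Return the lines that the pattern matches in full."""
    return [line for line in lines if PATTERN.fullmatch(line)]


if __name__ == "__main__":
    for comment in matching_comments(samples):
        print(comment)
```

The third sample contains a hyphen between two words, so it only matches if the trailing `-` in the set is treated as a plain character.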

I’m hoping by referencing @guy038 he will elaborate more fully, since he has a very good appreciation of these things. Being a native English speaker I haven’t been exposed to these “extended characters”. He hopefully can also verify if the last character in the set - needs “escaping” in this position.

I also think a much simpler regex such as (?-s)^/\*+.+?\*/ would work, unless you specifically need to match ONLY the characters within your set and exclude others. The (?-s) keeps . from matching line breaks, so the match stays on 1 line ONLY, and the ^ requires the first character (/) to be at column 1. The .+? is lazy, so it will only select characters up to the first */ that follows.
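
The behaviour of the simpler regex can be sketched in Python as well. Assumptions: Python's `re` stands in for the Notepad++ engine, the `(?-s)` prefix is dropped because in Python `.` does not match newlines unless `re.DOTALL` is set, and the CSS text and the helper name `column_one_comments` are invented for the example.

```python
import re

# re.MULTILINE makes "^" match at the start of every line, not only at the
# start of the text. "." excludes newlines by default, so no (?-s) is needed.
SIMPLE = re.compile(r"^/\*+.+?\*/", re.MULTILINE)

# Hypothetical CSS text for illustration only.
css = (
    "/** Men\u00fc */\n"
    "body { margin: 0; } /* trailing comment */\n"
    "/*** Fu\u00dfzeile *** extra */ more */\n"
)


def column_one_comments(text):
    """Return comments that start at column 1 and end at the first '*/'."""
    return SIMPLE.findall(text)


if __name__ == "__main__":
    for comment in column_one_comments(css):
        print(comment)
```

The trailing comment on the second line is skipped because it does not start at column 1, and on the third line the lazy `.+?` stops at the first `*/` rather than the last.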

Terry

guy038

Hello, @stephan-romhart, @terry-r and All,

From the definition of a CSS comment :

https://www.w3schools.com/css/css_comments.asp

I suppose that the following general regex, inside a commentExpr attribute, should work in your css.xml definition file :

		<parser
			displayName="CSS"
			id         ="css_syntax"
			commentExpr="(?s)/\*.+?\*/"
		>
....

This regex will match any single-line or multi-line CSS comment !
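
A minimal sketch of what this general regex selects, tried in Python rather than in Notepad++ itself (so the engine differs, and the sample CSS and the helper name `all_comments` are assumptions made for the example):

```python
import re

# (?s) lets "." match line breaks too, so a comment may span several lines.
GENERAL = re.compile(r"(?s)/\*.+?\*/")

# Hypothetical CSS text for illustration only.
css = "/* one line */\nh1 { color: red; }\n/* spans\n   two lines */\n"


def all_comments(text):
    """Return every /* ... */ block, single-line or multi-line."""
    return GENERAL.findall(text)


if __name__ == "__main__":
    for comment in all_comments(css):
        print(repr(comment))
```

Because `.+?` is lazy, each match ends at the nearest `*/`, so two separate comments are not merged into one.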

I think we shouldn’t bother with specific characters in CSS comments ;-))

Now, you said :

All comments with Umlaute wont show, all other will.

Well, I’m not sure about your overall comprehension of the Function List feature. Indeed, from :

https://community.notepad-plus-plus.org/topic/19480/faq-desk-function-list-basics ( See main points 2 and 4 of the parse steps, at beginning of the topic )

And, from the Function List tutorial, below ( Old N++ site, before Sept 19, 2019 ) :

https://web.archive.org/web/20190826024431/https://notepad-plus-plus.org/features/function-list.html

Where it is said :

comment: Optional. you can make a RE in this attribute in order to identify comment zones. The identified zones will be ignored by search.

My understanding is that the commentExpr attribute is used to define range(s) of characters where the Function List feature will not look for any class or function / method block !!

Best Regards

guy038

P.S. :

For non-German readers, the German Umlaut letters ( and ß ) are :

•-----------•--------------•--------------•----------•-----------•
| Character | Substitution |   ANSI (*)   | UNICODE  |    HTML   |   
•-----------•--------------•--------------•----------•-----------•
|     ä     |      ae      |  Alt + 0228  |   00E4   |  &auml;   |
|     ö     |      oe      |  Alt + 0246  |   00F6   |  &ouml;   |
|     ü     |      ue      |  Alt + 0252  |   00FC   |  &uuml;   |
•-----------•--------------•--------------•----------•-----------•
|     ß     |      ss      |  Alt + 0223  |   00DF   |  &szlig;  |
•-----------•--------------•--------------•----------•-----------•
|     Ä     |      Ae      |  Alt + 0196  |   00C4   |  &Auml;   |
|     Ö     |      Oe      |  Alt + 0214  |   00D6   |  &Ouml;   |
|     Ü     |      Ue      |  Alt + 0220  |   00DC   |  &Uuml;   |
•-----------•--------------•--------------•----------•-----------•

(*) In Win-1250, Win-1252, Win-1254, Win-1257 or Win-1258 encodings
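
The rows of the table can be cross-checked with a few lines of Python. This is just a sketch: the dictionary simply restates the table, and the helper names `check_row` and `regex_escape` are made up here. It also derives the `\x..` form used in the regexes of this thread from the UNICODE column.

```python
import html

# HTML entity -> (Unicode code point, Alt code), restating the table above.
TABLE = {
    "&auml;": (0x00E4, 228),
    "&ouml;": (0x00F6, 246),
    "&uuml;": (0x00FC, 252),
    "&szlig;": (0x00DF, 223),
    "&Auml;": (0x00C4, 196),
    "&Ouml;": (0x00D6, 214),
    "&Uuml;": (0x00DC, 220),
}


def check_row(entity, code_point, alt_code):
    """True if entity, code point and Win-1252 byte all describe one character."""
    char = html.unescape(entity)
    return ord(char) == code_point and char.encode("cp1252") == bytes([alt_code])


def regex_escape(entity):
    """Return the \\x.. escape for the character named by the HTML entity."""
    return "\\x{:02x}".format(ord(html.unescape(entity)))


if __name__ == "__main__":
    for entity, (code_point, alt_code) in TABLE.items():
        print(entity, regex_escape(entity), check_row(entity, code_point, alt_code))
```

For these seven letters the Win-1252 byte value and the Unicode code point coincide, which is why the Alt code and the hex escape describe the same number.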

stephan-romhart

Thank you @Terry-R and @guy038

Using the Unicode escapes in the regex works like a charm.

Here is my complete code, in case someone else is searching for a way to use the Function List as a CSS comment overview:

<NotepadPlus>
	<functionList>
		<!-- ================================================ [ CSS ] -->
		<parser id="css_comment" displayName="CSS" commentExpr="">
			<function mainExpr="\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/" displayMode="$functionName">
				<functionName>
					<nameExpr expr="[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+" />
				</functionName>
			</function>
		</parser>
	</functionList>
</NotepadPlus>
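
How the two expressions cooperate can be sketched in Python: mainExpr selects the whole comment, and nameExpr is then applied to that match to pick out the text shown in the list. This is an illustration under assumptions, not the Notepad++ implementation: Python's `re` replaces the Boost engine, the sample CSS and the function name `function_list_entries` are hypothetical, and the trimming of surrounding spaces is added here only for readability.

```python
import re

# The two expressions from the parser definition above.
MAIN_EXPR = r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/"
NAME_EXPR = r"[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+"

# Hypothetical CSS text for illustration only.
css = (
    "/** Header */\n"
    ".a { color: red; }\n"
    "/*** Men\u00fc - Gr\u00f6\u00dfe */\n"
    "/* ignored */\n"
)


def function_list_entries(text):
    """Find each comment with MAIN_EXPR, then extract its name with NAME_EXPR."""
    entries = []
    for block in re.finditer(MAIN_EXPR, text):
        name = re.search(NAME_EXPR, block.group(0))
        if name:
            # Trimming is an assumption of this sketch, not taken from Notepad++.
            entries.append(name.group(0).strip())
    return entries


if __name__ == "__main__":
    for entry in function_list_entries(css):
        print(entry)
```

The comment with a single asterisk is left out because mainExpr requires at least two.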

Stephan

guy038

Hi, @stephan-romhart, @terry-r and All,

Oooooh, Stephan, I see ! Actually, it’s the comments, themselves, that are the objects to look for ;-))

Then, just for your information, here is a shorter version of the two regexes, within the css_comment.xml file contents :

<?xml version="1.0" encoding="UTF-8" ?>

<NotepadPlus>
	<functionList>
		<!-- ================================================ [ CSS comments ] -->
		<parser id="css_comment" displayName="CSS" commentExpr="">
			<function mainExpr="/\*{2,3}(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/" displayMode="$functionName">
				<functionName>
					<nameExpr expr="(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+" />
				</functionName>
			</function>
		</parser>
	</functionList>
</NotepadPlus>

With the added line, in the configuration file overrideMap.xml :

			<association id= "css_comment.xml" langID= "20"/>

As the default CSS association is :

	<!--	<association id= "css.xml"	langID= "20"/>  -->

Note that the added line, in the overrideMap.xml file, is not mandatory. But, if you omit it, you must rename the css_comment.xml file as css.xml !
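
The effect of the (?i) switch in the shorter mainExpr can be sketched in Python: with case-insensitive matching, listing only the upper-case letters Ä, Ö, Ü (plus ß) is enough to cover the lower-case ones too. Assumptions: Python's `re` is used instead of the Notepad++ engine, the flag is passed as `re.IGNORECASE` because recent Python versions reject an inline (?i) that is not at the start of the pattern, and the helper name `is_listed` is made up.

```python
import re

# Same character set as the shorter mainExpr, with the case-insensitive
# switch supplied as a flag instead of an inline (?i).
SHORT = re.compile(r"/\*{2,3}[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/", re.IGNORECASE)


def is_listed(comment):
    """True if the whole comment matches the shorter pattern."""
    return SHORT.fullmatch(comment) is not None


if __name__ == "__main__":
    # Hypothetical sample comments for illustration only.
    for sample in ["/** men\u00fc */", "/*** GR\u00d6SSE */", "/**** four stars */"]:
        print(sample, is_listed(sample))
```

The quantifier {2,3} accepts two or three asterisks, so a comment opening with four is not listed.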

BR

guy038