
    Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?

    • stephan-romhart

      Hello,

      I use the function list to navigate in my structured CSS files.

      I want to list all comments in the format

      /** comment */
      

      The regex I use is

      \/\*\*[\*]?[a-zA-Z0-9äöüÄÖÜß -]+\*\/
      

      but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
      Comments containing umlauts don't show up; all others do.

      Does anyone know a solution?

      • Terry R

        @stephan-romhart said in Functionlist regex for CSS comments with german Umlaute äöüÄÖÜß: Bug or Error of my regex?:

        but it seems Notepad++ doesn't recognize the German umlauts (äöüÄÖÜß).
        Comments containing umlauts don't show up; all others do.
        Does anyone know a solution?

        I recall some posts from @guy038 on character ranges, and while trying to locate them I found an online application he mentioned at this website.
        I pasted your ß character into it and pressed “Go”, which told me the hex code is 00DF. So using the Find function and typing \xdf locates that character. I eventually found one of his posts, https://community.notepad-plus-plus.org/topic/20595/examining-a-character/10, which might be useful reading for you.
        The other characters are in a similar range, \xe4, \xf6, \xfc, \xc4, \xd6, \xdc.

        So if you replace the individual characters in your “set” as follows, it should work.
        \/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/
        Other changes are possible too: you are escaping the / when that is not needed. The - at the end of the set might need escaping, as it normally means a range, and since the space is ahead of it that could be read as the start of a range (space to ??).
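To make the mapping between the literal characters and the \x escapes concrete, here is a minimal sketch in Python. It assumes that Python's re module treats this particular pattern the same way as the regex engine in Notepad++, which is an assumption and not something confirmed in this thread. The sample comments are made up for illustration.

```python
import re

# The seven German special characters from the original character class.
chars = "äöüÄÖÜß"

# Build the \xNN escape for each character from its code point.
escapes = [f"\\x{ord(c):02x}" for c in chars]
print(escapes)

# Same pattern twice: once with literal characters, once with hex escapes.
# (The "/" is left unescaped here, since it needs no escaping.)
literal_class = re.compile(r"/\*\*\*?[a-zA-Z0-9äöüÄÖÜß -]+\*/")
escaped_class = re.compile(
    r"/\*\*\*?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*/"
)

# Hypothetical sample comments, not taken from the original CSS file.
samples = ["/** Überschrift */", "/** Fußzeile */", "/** Header */"]

for s in samples:
    print(s, bool(literal_class.fullmatch(s)), bool(escaped_class.fullmatch(s)))
```

In Python both forms behave identically, so if the literal form fails inside Notepad++, the difference most likely lies in how the configuration file is read rather than in the pattern itself.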

        I’m hoping by referencing @guy038 he will elaborate more fully, since he has a very good appreciation of these things. Being a native English speaker I haven’t been exposed to these “extended characters”. He hopefully can also verify if the last character in the set - needs “escaping” in this position.

        I also think a much simpler regex such as (?-s)^/\*+.+?\*/ would work, unless you specifically need to match ONLY the characters within your set and exclude others. The (?-s) keeps each match on a single line, and the ^ requires the first character (/) to be at column 1. The .+? is lazy, so it only selects characters up to the first */ that follows.
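A rough way to try out the simpler regex outside Notepad++ is the Python sketch below. Python does not accept a leading (?-s), so the sketch simply leaves out re.DOTALL (the dot then stops at a line break) and uses re.MULTILINE so that ^ matches at the start of every line. That substitution and the sample CSS are assumptions made for illustration.

```python
import re

# Equivalent of (?-s)^/\*+.+?\*/ under the assumptions stated above:
# no DOTALL flag, so "." never crosses a line break; MULTILINE for "^".
simple = re.compile(r"^/\*+.+?\*/", re.MULTILINE)

# Hypothetical CSS text.
css = (
    "/** Überschrift */\n"
    "body { margin: 0; } /* inline note */\n"
    "/*** Fußzeile */ /* second */\n"
)

matches = [m.group(0) for m in simple.finditer(css)]
print(matches)
```

The comment on the second line is skipped because it does not start at column 1, and on the third line the lazy .+? stops at the first */, so the trailing comment is not included in the match.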

        Terry

        • guy038

          Hello, @stephan-romhart, @terry-r and All,

          From the definition of a CSS comment:

          https://www.w3schools.com/css/css_comments.asp

          I suppose that the following general regex, inside a commentExpr attribute, should work in your css.xml definition file :

          		<parser
          			displayName="CSS"
          			id         ="css_syntax"
          			commentExpr="(?s)/\*.+?\*/"
          		>
          ....
          

          This regex will match any single-line or multi-line CSS comment that has at least one character between /* and */!
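As a quick check of that claim, here is a small Python sketch. It assumes Python's re module handles (?s) and the lazy quantifier like Notepad++ does for this pattern; the CSS text is invented for the example.

```python
import re

# (?s) lets "." match line breaks, so a comment may span several lines.
comment_expr = re.compile(r"(?s)/\*.+?\*/")

# Hypothetical CSS text with one single-line and one two-line comment.
css = (
    "/* one line */\n"
    "p { color: red; }\n"
    "/* spans\n"
    "   two lines */\n"
)

found = comment_expr.findall(css)
for comment in found:
    print(repr(comment))
```

Because .+? is lazy, each match ends at the nearest */ instead of swallowing everything up to the last comment in the file.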

          I think we shouldn’t bother with specific characters in CSS comments ;-))


          Now, you said :

          Comments containing umlauts don't show up; all others do.

          Well, I’m not sure about your overall comprehension of the Function List feature. Indeed, from :

          https://community.notepad-plus-plus.org/topic/19480/faq-desk-function-list-basics    ( See main points 2 and 4 of the parse steps, at beginning of the topic )

          And, from the Function List tutorial below ( Old N++ site, before Sept 19, 2019 ) :

          https://web.archive.org/web/20190826024431/https://notepad-plus-plus.org/features/function-list.html

          Where it is said :

          comment: Optional. you can make a RE in this attribute in order to identify comment zones. The identified zones will be ignored by search.

          My understanding is that the commentExpr attribute is used to define range(s) of characters where the Function List feature will not look for any class or function / method block !!

          Best Regards

          guy038

          P.S. :

          For non-German speakers, the German umlaut letters (and ß) are :

          •-----------•--------------•--------------•----------•-----------•
          | Character | Substitution |   ANSI (*)   | UNICODE  |    HTML   |   
          •-----------•--------------•--------------•----------•-----------•
          |     ä     |      ae      |  Alt + 0228  |   00E4   |  &auml;   |
          |     ö     |      oe      |  Alt + 0246  |   00F6   |  &ouml;   |
          |     ü     |      ue      |  Alt + 0252  |   00FC   |  &uuml;   |
          •-----------•--------------•--------------•----------•-----------•
          |     ß     |      ss      |  Alt + 0223  |   00DF   |  &szlig;  |
          •-----------•--------------•--------------•----------•-----------•
          |     Ä     |      Ae      |  Alt + 0196  |   00C4   |  &Auml;   |
          |     Ö     |      Oe      |  Alt + 0214  |   00D6   |  &Ouml;   |
          |     Ü     |      Ue      |  Alt + 0220  |   00DC   |  &Uuml;   |
          •-----------•--------------•--------------•----------•-----------•
          

          (*) In Win-1250, Win-1252, Win-1254, Win-1257 or Win-1258 encodings

          • stephan-romhart

            Thank you @Terry-R and @guy038

            The solution of using Unicode escapes in the regex works like a charm.

            Here is my complete code, in case someone else is looking for a way to use the function list as a CSS comment overview:

            <NotepadPlus>
            	<functionList>
            		<!-- ================================================ [ CSS ] -->
            		<parser id="css_comment" displayName="CSS" commentExpr="">
            			<function mainExpr="\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/" displayMode="$functionName">
            				<functionName>
            					<nameExpr expr="[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+" />
            				</functionName>
            			</function>
            		</parser>
            	</functionList>
            </NotepadPlus>
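The interplay of mainExpr and nameExpr in the configuration above can be sketched in Python. The sketch assumes that the Function List first finds each mainExpr match and then shows the first nameExpr match inside it; that two-step behaviour, the helper name comment_names and the sample CSS are all assumptions made for illustration.

```python
import re

# The two expressions from the configuration, copied unchanged.
main_expr = re.compile(
    r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/"
)
name_expr = re.compile(r"[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+")


def comment_names(css):
    """Return the text nameExpr picks out of every mainExpr match."""
    names = []
    for block in main_expr.finditer(css):
        name = name_expr.search(block.group(0))
        if name:
            names.append(name.group(0))
    return names


# Hypothetical CSS text.
css = (
    "/** Grundlayout */\n"
    "body { margin: 0; }\n"
    "/** Menü oben */\n"
    "nav { display: flex; }\n"
    "/* ignored: single star */\n"
)

names = comment_names(css)
print(names)
```

In this sketch the extracted names keep the spaces that surround the comment text, because the space is part of the character class.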
            

            Stephan

            • guy038

              Hi, @stephan-romhart, @terry-r and All,

              Oooooh, Stephan, I see ! Actually, it’s the comments, themselves, that are the objects to look for ;-))

              Then, just for your information, here is a shorter version of the two regexes, within the css_comment.xml file contents :

              <?xml version="1.0" encoding="UTF-8" ?>
              
              <NotepadPlus>
              	<functionList>
              		<!-- ================================================ [ CSS comments ] -->
              		<parser id="css_comment" displayName="CSS" commentExpr="">
              			<function mainExpr="/\*{2,3}(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/" displayMode="$functionName">
              				<functionName>
              					<nameExpr expr="(?i)[A-Z0-9\xc4\xd6\xdc\xdf -]+" />
              				</functionName>
              			</function>
              		</parser>
              	</functionList>
              </NotepadPlus>
              

              With the added line, in the configuration file overrideMap.xml :

              			<association id= "css_comment.xml" langID= "20"/>
              

              As the default CSS association is :

              	<!--	<association id= "css.xml"	langID= "20"/>  -->
              

              Note that the added line in the overrideMap.xml file is not mandatory. But if you omit it, you must rename the css_comment.xml file to css.xml !
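To compare the shorter mainExpr with the longer one from the earlier post, here is a small Python sketch. Recent Python versions reject an inline (?i) in the middle of a pattern, so the sketch passes re.IGNORECASE for the whole pattern instead; that changes nothing here because the part before (?i) contains no letters. It also assumes Python's case-insensitive matching agrees with Notepad++ for these characters. The sample strings are invented.

```python
import re

# Longer form, as posted earlier in the thread.
long_form = re.compile(
    r"\/\*\*[\*]?[a-zA-Z0-9\xe4\xf6\xfc\xc4\xd6\xdc\xdf -]+\*\/"
)

# Shorter form: {2,3} replaces \*\*[\*]? and case-insensitivity covers
# a-z as well as the lowercase umlauts.
short_form = re.compile(
    r"/\*{2,3}[A-Z0-9\xc4\xd6\xdc\xdf -]+\*/", re.IGNORECASE
)

# Hypothetical test strings: some should match, some should not.
samples = [
    "/** Überschrift */",
    "/*** größe ändern */",
    "/* single star */",
    "/**** four stars */",
    "/** a_b */",
]

results = [
    (s, bool(long_form.fullmatch(s)), bool(short_form.fullmatch(s)))
    for s in samples
]
for row in results:
    print(row)
```

For these samples both forms agree: the first two strings match, the other three do not.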

              BR

              guy038
