    Regular Expression in c.xml function list

    • José Luis Montero Castellanos

      I took the c.xml file that ships with Notepad++ as an example for making my own parser.
      In:

      <functionName>
      	<nameExpr expr="(?x)                  	        # Utilize inline comments (see `RegEx - Pattern Modifiers`)
      		[A-Za-z_\x7F-\xFF][\w\x7F-\xFF]*	
      		\s*\(                             	# start of parameters
      		(?s:.*?)                          	# whatever, until...
      		\)                                	# end   of parameters
      	" />
      

      In the third line of that code:
      Shouldn’t the range \x7F-\xFF be excluded from the class […] rather than listed as allowed characters?
      Why is \w used, given that it also allows accented letters, when the aim is to allow digits and the underscore?

      My solution, for the third line only:
      [A-Za-z_][A-Za-z_\d]+ Is it correct? Can it be simplified?

      Thanks in advance for your comments :)

      • PeterJones @José Luis Montero Castellanos

        @José-Luis-Montero-Castellanos ,

        Why is \w used, given that it also allows accented letters, when the aim is to allow digits and the underscore?

        My guess is because C can probably handle non-ASCII letters (like accented letters) in the function names, depending on compiler settings and file encoding, so they wanted to make sure that those characters would be allowed in the functionList. (I know Perl can use such characters if use utf8; is enabled, so I am assuming C/C++ can as well.)
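
To see how the two halves of that character class behave, here is a minimal sketch in Python. Python's `re` module is only a stand-in (Notepad++ uses the Boost regex engine, whose treatment of `\w` may differ with the file encoding), and the sample names, including `fúñç`, are purely illustrative.

```python
import re

# Stand-in for the identifier part of the c.xml nameExpr (Python `re`, not Boost).
CXML_IDENT = re.compile(r"[A-Za-z_\x7F-\xFF][\w\x7F-\xFF]*")
# The same idea restricted to ASCII letters, digits, and the underscore.
ASCII_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def accepts(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


# "f\u00fa\u00f1\u00e7" is the accented name f-u-acute, n-tilde, c-cedilla.
for name in ["main", "_tmp1", "f\u00fa\u00f1\u00e7", "1abc"]:
    print(ascii(name), accepts(CXML_IDENT, name), accepts(ASCII_IDENT, name))
```

In this sketch the accented name is accepted only by the c.xml-style class, and a name starting with a digit is rejected by both.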

        My solution … Is it correct? Can it be simplified?

        We don’t know the details of your language. Your pattern says the name must start with an ASCII letter or an underscore, and be followed by one or more characters, each of which is an ASCII letter, an underscore, or any Unicode digit (the traditional 0-9, but also digits from other scripts in Unicode).
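
The proposed pattern can be tried out with a short Python sketch. This uses Python's `re` as a stand-in for the Boost engine in Notepad++, so treat the results as indicative only. `ASCII_ONLY` is a hypothetical variant, not something from c.xml: it uses `*` so that one-character names match, and `[0-9]` instead of `\d` so that only the digits 0-9 are accepted.

```python
import re

# The pattern proposed in the first post.
PROPOSED = re.compile(r"[A-Za-z_][A-Za-z_\d]+")
# Hypothetical ASCII-only variant: `*` permits one-character names,
# and 0-9 replaces \d so digits from other scripts are rejected.
ASCII_ONLY = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(pattern, name):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


# "\u0661" is ARABIC-INDIC DIGIT ONE, a Unicode digit outside 0-9.
samples = ["myFunc2", "f", "x\u0661", "2fast"]
for name in samples:
    print(ascii(name), is_identifier(PROPOSED, name), is_identifier(ASCII_ONLY, name))
```

With Python's engine, the proposed pattern rejects the single-letter name `f` (the `+` demands at least two characters) and accepts the name ending in a non-ASCII digit; the variant does the opposite in both cases.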


        • José Luis Montero Castellanos @PeterJones

          @PeterJones
          Thanks for your reply:
          I had understood that C identifiers admit only English alphabetic characters (without diacritics or accents), the underscore, and digits, though not as the first character. I thought the C parser (c.xml) was wrong!

          The language parser I plan to create follows that rule for identifiers, so I’m going to do some research on it.

          Have a good afternoon

          • PeterJones @José Luis Montero Castellanos

            @José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:

            I had understood that C identifiers admit only English alphabetic characters (without diacritics or accents), the underscore, and digits, though not as the first character.

            That is good advice in practice. However, as explained in this SO answer with the quote from the C99 spec, if a compiler implementation defines its own “implementation defined set of other characters”, the C99 standard does allow other characters. Most standard compilers do not (or at least don’t make it easy), but there are some compilers that have historically allowed that (like the Plan9 C compiler they mentioned), and they are still C99-standards-compliant if they do.

            If you want a more restrictive regex for your functionList parser, you are allowed to use one – even for your own copy of the c parser rules. But do not submit a PR to change the one distributed with Notepad++, because if someone out there is using a different compiler implementation than you are, their compiler may legally allow those other characters, and they would be upset if you made Notepad++'s functionList not work for them.

            —
            PS: the c.xml parser is not actually good enough, because according to that same page, what they called the “universal character syntax” (and the gcc FAQ calls “UCN”, for universal character names), written as \u#### or \U########, is legal inside function identifiers and supported by the gcc compiler:

            #include <stdio.h>
            #include <stdlib.h>
            
            void function(void)
            {
                printf("Hello World\n");
            }
            
            void f\u00FA\u00F1\u00E7(void)
            {
                printf("weird name worked\n");
            }
            
            int main()
            {
                function();
                f\u00FA\u00F1\u00E7();
                return 0;
            }
            

            And I was able to run it with

            c:> gcc -std=c99 -fextended-identifiers win1252_function_name.c -o win1252_function_name & win1252_function_name
            Hello World
            weird name worked
            

            … but Notepad++'s parser doesn’t recognize the whole name, because it doesn’t allow \ in the function name.
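
The effect can be reproduced with a small Python sketch. Python's `re` stands in for the Boost engine used by Notepad++, `NAME_EXPR` mirrors only the nameExpr fragment quoted in the first post, and `NAME_EXPR_UCN` is a hypothetical extension (not part of the shipped c.xml) that also accepts `\uXXXX` and `\UXXXXXXXX` inside the name.

```python
import re

# Stand-in for the nameExpr quoted in the first post.
NAME_EXPR = re.compile(r"""
    [A-Za-z_\x7F-\xFF][\w\x7F-\xFF]*
    \s*\(            # start of parameters
    (?s:.*?)         # whatever, until...
    \)               # end of parameters
""", re.VERBOSE)

# Hypothetical extension: a universal character name counts as a name character.
UCN = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"
NAME_EXPR_UCN = re.compile(
    r"(?:[A-Za-z_\x7F-\xFF]|" + UCN + r")"      # first character of the name
    + r"(?:[\w\x7F-\xFF]|" + UCN + r")*"        # remaining characters
    + r"\s*\( (?s:.*?) \)",                     # parameter list
    re.VERBOSE,
)

# Raw string: the backslashes are literal, as they are in the C source file.
line = r"void f\u00FA\u00F1\u00E7(void)"

print(NAME_EXPR.search(line).group(0))      # u00E7(void)
print(NAME_EXPR_UCN.search(line).group(0))  # f\u00FA\u00F1\u00E7(void)
```

The first pattern can only start matching after the last backslash, so it reports a truncated name; the extended sketch captures the full identifier.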


            • José Luis Montero Castellanos @PeterJones

              @PeterJones
              Thanks for your reply:
              As I said at the beginning, I took the parser only as an example for making my own. The Ada, SQL, C, and COBOL parsers were useful because they share certain characteristics with what I’m implementing.

              I have finished it now, and it works, although I had to resort to some tricks that I’m not happy with, because they are not good practice.

              The problem is that, given a function declared as

              static function myUglyFunc (param1,param2)
              

              I want the list of functions to present it as:

              static myUglyFunc (param1,param2)
              

              and prevent the word function from appearing in the list of functions, since it is implied.

              I am going to study other parsers to see what I can learn from them and apply to mine.

              For sure, I still have a lot of practice and study of regex ahead of me :). I still don’t know how to test my parser, because I don’t know what the “PR” on the Notepad++ GitHub page, which the manual mentions, is.

              • PeterJones @José Luis Montero Castellanos

                @José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:

                prevent the word function from appearing in the list of functions

                Sorry, that cannot happen in the Function List feature. It gives the entire match, and cannot remove text from the middle.
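
The limitation is easier to see with a concrete match. The following Python sketch uses a hypothetical pattern for the declaration syntax in the example above (it is not taken from any Notepad++ parser file): the match is one contiguous span, so removing a word from the middle needs a separate stitching step that, per the explanation above, Function List does not perform.

```python
import re

line = "static function myUglyFunc (param1,param2)"

# Hypothetical pattern for the user-defined language in the example.
decl = re.compile(r"(static\s+)?function\s+(\w+\s*\([^)]*\))")
m = decl.search(line)

# The match itself is one contiguous span, so the keyword stays inside it.
whole_match = m.group(0)
print(whole_match)  # static function myUglyFunc (param1,param2)

# Dropping "function" means joining two capture groups, an extra step
# done here in Python, outside the regex match itself.
stitched = (m.group(1) or "") + m.group(2)
print(stitched)     # static myUglyFunc (param1,param2)
```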

                • José Luis Montero Castellanos @PeterJones

                  @PeterJones
                  I understand, so I won’t waste any more time on it. I consider myself well served, since the parser works for me.
                  Could you tell me what it is to “create a PR on Notepad++ GitHub page” to test my parser? Is it needed to do the test?

                  • PeterJones @José Luis Montero Castellanos

                    @José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:

                    Could you tell me what it is to “create a PR on Notepad++ GitHub page” to test my parser? Is it needed to do the test?

                    Please re-read
                    https://npp-user-manual.org/docs/function-list/

                    It was recently clarified there that you only need to do that if you are submitting a parser for a built-in lexer. You have already said this is for your own UDL, for which the manual specifically says you don’t need the PR or the automated test suite.

                    • José Luis Montero Castellanos @PeterJones

                      @PeterJones
                      Now I know: PR means “Pull Request”, doesn’t it? That was the question. It is something that the manual does not translate or explain, and the first thing the search engine returns, to a newbie Spanish speaker, is “Public Relations”!

                      It is assumed that if I ask the forum for help, it is because I am not an expert, much less in GitHub jargon. Thank you anyway : ) …

                      • Alan Kilborn @José Luis Montero Castellanos

                        @José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:

                        create a PR on Notepad++ GitHub page

                        It seems a bit of a stretch to call it necessary, but perhaps where the user manual mentions “PR” or “pull request” it should link HERE.

                        • PeterJones @José Luis Montero Castellanos

                          @José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:

                          That was the question

                          Sorry. There was an imperfect translation between our native languages, and I assumed you were asking “do I need to create a PR in order to test my parser?”, when you were really asking “what does it mean to ‘create a PR on the Notepad++ GitHub page’?”.

                          Quoting from the user manual:

                          Contribute your new or enhanced parser rule to the Notepad++ codebase

                          You are welcome to contribute your new or enhanced parser definition file to the Notepad++ codebase by creating PR on the Notepad++ GitHub page. This can be an update for a language that already has a function list definition, or can be a new definition file for one of the builtin lexer languages that does not yet have a function list definition. (This is not necessary if you are creating a function list definition for a UDL: since UDLs do not get distributed with Notepad++, neither do function list definitions for the UDLs. As such, you will not submit your UDL’s function list definition to the Notepad++ GitHub page through a PR, and you do not need to go through the “unit test” procedure described below for your UDL’s function list definition.)

                          Even if that section of the user manual is confusing in its terminology by not defining “PR”, I would hope that it is clear: whatever this magical “PR” entity that needs to be created is, it only needs to be created if you are trying to submit your function list definition to the Notepad++ codebase, and it specifically says regarding User Defined Languages: >>As such, you will not submit your UDL’s function list definition to the Notepad++ GitHub page through a PR, and you do not need to go through the “unit test” procedure described below for your UDL’s function list definition.<< – So whatever that magical “PR” is, you shouldn’t need to care, because you have made your function list definition for your own language (a UDL).

                          If that is not clear in the User Manual, please explain to me how I can phrase it better, because I don’t know how else to say it.

                          • PeterJones @PeterJones

                            My current proposal for the updated wording:

                            Contribute your new or enhanced parser rule to the Notepad++ codebase

                            If you have added or updated the parser definition file for one of Notepad++'s built-in languages, you are welcome to contribute your file to the Notepad++ codebase by creating a “Pull Request” (also called a “PR”) on the Notepad++ GitHub page. (A “Pull Request” is just the GitHub mechanism for requesting that code you write be added to a project.)

                            Please Note: You only need to create a Pull Request if you want your Function List definition to be bundled as part of the Notepad++ codebase going forward, so that everyone who downloads Notepad++ gets your Function List definition. If you do not need to contribute your Function List definition to everyone, then you do not need to read anything below this paragraph.

                            • If you created a Function List for your own UDL, you do not need to create a Pull Request using the link above, because user-created UDLs and their Function List definitions are not distributed as part of Notepad++. You do not need to read any further.
                            • If you just edited one of the pre-existing Function List definitions for your own personal use, and you don’t want to share it with anyone else, you do not need to create a Pull Request using the link above because you are not sharing it with others. You do not need to read any further.
                            • This Pull Request can be used to update the Function List for a language that already has a Function List definition, when you want to make it better for everyone; or it can be for a new definition file for one of the builtin lexer languages that does not yet have a function list definition. If your case does not meet one of these requirements, you do not need to read any further.

                            If you still want to believe you should be submitting your Function List parser to the Notepad++ codebase at this point, please follow the steps below to create and verify your Unit Tests and then submit the Pull Request.
