Regular Expression in c.xml function list
-
I took the c.xml file that comes with Notepad++ as an example for writing my own parser:
In:
<functionName>
    <nameExpr expr="(?x)                  # Utilize inline comments (see `RegEx - Pattern Modifiers`)
            [A-Za-z_\x7F-\xFF][\w\x7F-\xFF]*
            \s*\(                         # start of parameters
            (?s:.*?)                      # whatever, until...
            \)                            # end of parameters
        " />
In the third line of the code:
Shouldn't the range
\x7F-\xFF
be excluded from the character class […] rather than allowed?
Why is
\w
used, when it also matches accented letters, if the goal is only to allow digits and the underscore?
My solution, for the third line only:
[A-Za-z_][A-Za-z_\d]+
Is it correct? Can it be simplified?
Thanks in advance for your comments :)
-
@José-Luis-Montero-Castellanos ,
Why is
\w
used, when it also matches accented letters, if the goal is only to allow digits and the underscore?
My guess is that C can probably handle non-ASCII letters (like accented letters) in function names, depending on compiler settings and file encoding, so they wanted to make sure that those characters would be allowed in the functionList. (I know Perl can use such characters if
use utf8;
is enabled, so I am assuming C/C++ can as well.)
My solution … Is it correct? Can it be simplified?
We don’t know the details of your language. Your pattern says the name must start with an ASCII letter or an underscore, and be followed by one or more characters that are each an ASCII letter, an underscore, or any Unicode digit, including the traditional
0-9
but also digits from other scripts in Unicode. Because of the trailing +, a one-character name such as f would not match.
-
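As a quick check of the description above, here is a minimal sketch in Python. It assumes Python's `re` module, which is not the Boost regex engine that Notepad++ uses, so the Unicode-digit behaviour of `\d` in particular may differ there. The names `ORIGINAL`, `ASCII_ONLY`, and `is_identifier` are made up for this illustration, and `ASCII_ONLY` is only one possible simplification, not a recommendation for any particular language.

```python
import re

# The pattern from the question: a first character, then ONE OR MORE others.
ORIGINAL = re.compile(r"[A-Za-z_][A-Za-z_\d]+")

# Hypothetical alternative: ASCII digits only, and zero or more trailing
# characters, so that a one-character name is accepted.
ASCII_ONLY = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(pattern, name):
    """Return True if the whole of `name` matches `pattern`."""
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    # "x\u0663" is the letter x followed by ARABIC-INDIC DIGIT THREE.
    for name in ["f", "my_func2", "2fast", "x\u0663"]:
        print(repr(name), is_identifier(ORIGINAL, name), is_identifier(ASCII_ONLY, name))
```

In Python, the first pattern rejects `f` and accepts a name that ends in a non-ASCII digit; the second does the opposite on both counts.
-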
@PeterJones
Thanks for your reply:
I had understood that C identifiers admit only English alphabetic characters (without diacritics or accents), the underscore, and digits, but not a digit at the beginning. I thought that the C parser (c.xml) was wrong!
The language parser I plan to create follows that rule for identifiers. I’m going to do some research on it.
Have a good afternoon
-
@José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:
I had understood that C identifiers admit only English alphabetic characters (without diacritics or accents), the underscore, and digits, but not a digit at the beginning.
That is good advice in practice. However, as explained in this SO answer with the quote from the C99 spec, if a compiler implementation defines its own “implementation defined set of other characters”, the C99 standard does allow other characters. Most standard compilers do not (or at least don’t make it easy), but there are some compilers that have historically allowed that (like the Plan9 C compiler they mentioned), and they are still C99-standards-compliant if they do.
If you want a more restrictive regex for your functionList parser, you are free to use one, even in your own copy of the C parser rules. But do not submit a PR to change the one distributed with Notepad++: if someone out there is using a different compiler implementation than you are, their compiler may legally allow those other characters, and they would be upset if you made Notepad++'s functionList stop working for them.
—
PS: the c.xml parser is not actually good enough. According to that same page, using what they call the “universal character syntax” (which the gcc FAQ calls the “UCN”), that is,
\u####
or
\U########
inside function identifiers, is legal and supported by the gcc compiler:
    #include <stdio.h>
    #include <stdlib.h>

    void function(void)
    {
        printf("Hello World\n");
    }

    void f\u00FA\u00F1\u00E7(void)
    {
        printf("weird name worked\n");
    }

    int main()
    {
        function();
        f\u00FA\u00F1\u00E7();
        return 0;
    }
And I was able to run it with
    c:> gcc -std=c99 -fextended-identifiers win1252_function_name.c -o win1252_function_name & win1252_function_name
    Hello World
    weird name worked
… but Notepad++'s parser doesn’t recognize the whole name, because it doesn’t allow
\
in the function name.
-
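To illustrate the PS above, here is a minimal sketch in Python of why the stock name pattern loses most of that identifier, and of one way the name part could be widened to accept UCNs. Assumptions: Python's `re` stands in for the Boost engine that Notepad++ uses; `STOCK`, `WITH_UCN`, `UCN`, and `first_name` are names invented for this example; only the name part and the opening parenthesis are modelled, with a capture group added around the name so it can be inspected; and `WITH_UCN` is not something that ships in c.xml.

```python
import re

# Name part of the stock expression, followed by optional spaces and "(".
STOCK = re.compile(r"([A-Za-z_\x7F-\xFF][\w\x7F-\xFF]*)\s*\(")

# Hypothetical extension: also accept \uXXXX and \UXXXXXXXX inside the name.
UCN = r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}"
WITH_UCN = re.compile(
    rf"((?:[A-Za-z_\x7F-\xFF]|{UCN})(?:[\w\x7F-\xFF]|{UCN})*)\s*\("
)


def first_name(pattern, source):
    """Return the first captured function name in `source`, or None."""
    match = pattern.search(source)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Raw string: the backslashes are literal, as in the C source file.
    line = r"void f\u00FA\u00F1\u00E7(void)"
    print(first_name(STOCK, line))
    print(first_name(WITH_UCN, line))
```

In Python, the stock pattern captures only the tail `u00E7`, while the widened one captures the full `f\u00FA\u00F1\u00E7`.
-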
@PeterJones
Thanks for your reply:
As I said at the beginning, I took the parser only as an example for making my own. The Ada, SQL, C and COBOL parsers were useful because they have certain characteristics similar to what I’m implementing.
I have finished it now, and it works, although I had to resort to some tricks that I’m not happy with, because they are not good practice.
The problem is that, given a function declared as
static function myUglyFunc (param1,param2)
I want the list of functions to present it as:
static myUglyFunc (param1,param2)
and to keep the word
function
out of the list of functions, since it is implied.
I am going to study other parsers to see what I can learn from them and apply to mine.
For sure, I still have a lot of RegEx practice and study ahead of me :). I still don’t know how to test my parser, because I don’t know what
PR
means on the Npp GitHub, as the manual says.
-
@José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:
prevent the word
function
from passing to the list of functions
Sorry, that cannot happen in the Function List feature. It gives the entire match, and cannot remove text from the middle.
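The underlying reason is that a regex match is always one contiguous span of the text. A minimal sketch in Python (again only a stand-in for Notepad++'s regex engine; the patterns and the names `WHOLE` and `AFTER_KEYWORD` are made up for this example and are not taken from any shipped parser):

```python
import re

line = "static function myUglyFunc (param1,param2)"

# Matching from "static" onward necessarily includes the word "function".
WHOLE = re.compile(r"static\s+function\s+\w+\s*\([^)]*\)")

# A lookbehind can leave a leading prefix out of the match, but then
# "static" is left out too: the match still cannot skip over the middle.
AFTER_KEYWORD = re.compile(r"(?<=function )\w+\s*\([^)]*\)")

print(WHOLE.search(line).group(0))
print(AFTER_KEYWORD.search(line).group(0))
```

The first search returns the whole declaration and the second only `myUglyFunc (param1,param2)`; neither can return `static myUglyFunc (param1,param2)`.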
-
@PeterJones
I understand, so I won’t waste any more time on it. I consider myself well served, since the parser works for me.
Could you tell me what “create a
PR
on Notepad++ GitHub page” means, for testing my parser? Is it needed in order to do the test?
-
@José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:
Could you tell me what it is: create a
PR
on Notepad++ GitHub page. to test my parser? Is needed to do the test?
Please re-read
https://npp-user-manual.org/docs/function-list/
It was recently clarified there that you only need to do that if you are submitting a parser for a built-in lexer. You have already said this is for your own UDL, for which the manual specifically says you don’t need the PR or automated test suite.
-
@PeterJones
Now I know what
PR
is: “Pull Request”, is it? That was the question. It is something that the manual “does not translate or explain”, and the first thing that a search engine returns is “Public Relations”, to a newbie Spanish speaker! It should be assumed that if I ask the forum for help, it is because I am not an expert, much less in GitHub jargon. Thank you anyway : ) …
-
@José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:
create a PR on Notepad++ GitHub page
It seems a bit of a stretch to call it necessary, but perhaps wherever the user manual mentions “PR” or “pull request” it should link HERE.
-
@José-Luis-Montero-Castellanos said in Regular Expression in c.xml function list:
That was the question
Sorry. There was an imperfect translation between our native languages, and I assumed you were asking “do I need to create a PR in order to test my parser?”, when you were really asking “what does it mean to ‘create a PR on the Notepad++ GitHub page’?”.
Quoting from the user manual:
Contribute your new or enhanced parser rule to the Notepad++ codebase
You are welcome to contribute your new or enhanced parser definition file to the Notepad++ codebase by creating PR on the Notepad++ GitHub page. This can be an update for a language that already has a function list definition, or can be a new definition file for one of the builtin lexer languages that does not yet have a function list definition. (This is not necessary if you are creating a function list definition for a UDL: since UDLs do not get distributed with Notepad++, neither do function list definitions for the UDLs. As such, you will not submit your UDL’s function list definition to the Notepad++ GitHub page through a PR, and you do not need to go through the “unit test” procedure described below for your UDL’s function list definition.)
Even if that section of the user manual is confusing in its terminology by not defining “PR”, I would hope that it is clear: whatever this magical “PR” entity that needs to be created is, it only needs to be created if you are trying to submit your function list definition to the Notepad++ codebase, and it specifically says regarding User Defined Languages: >>As such, you will not submit your UDL’s function list definition to the Notepad++ GitHub page through a PR, and you do not need to go through the “unit test” procedure described below for your UDL’s function list definition.<< – So whatever that magical “PR” is, you shouldn’t need to care, because you have made your function list definition for your own language (a UDL).
If that is not clear in the User Manual, please explain to me how I can phrase it better, because I don’t know how else to say it.
-
My current proposal for the updated wording:
Contribute your new or enhanced parser rule to the Notepad++ codebase
If you have added or updated the parser definition file for one of Notepad++'s built-in languages, you are welcome to contribute your file to the Notepad++ codebase by creating a “Pull Request” (also called a “PR”) on the Notepad++ GitHub page. (A “Pull Request” is just the GitHub mechanism for requesting that code you write be added to a project.)
Please Note: You only need to create a Pull Request if you want your Function List definition to be bundled as part of the Notepad++ codebase going forward, so that everyone who downloads Notepad++ gets your Function List definition. If you do not need to contribute your Function List definition to everyone, then you do not need to read anything below this paragraph.
- If you created a Function List for your own UDL, you do not need to create a Pull Request using the link above, because user-created UDLs and their Function List definitions are not distributed as part of Notepad++. You do not need to read any further.
- If you just edited one of the pre-existing Function List definitions for your own personal use, and you don’t want to share it with anyone else, you do not need to create a Pull Request using the link above because you are not sharing it with others. You do not need to read any further.
- This Pull Request can be used to update the Function List for a language that already has a Function List definition, when you just want to make it better for everyone; or it can be for a new definition file for one of the built-in lexer languages that does not yet have a Function List definition. If it does not meet one of these requirements, you do not need to read any further.
If you still believe you should be submitting your Function List parser to the Notepad++ codebase at this point, please follow the steps below to create and verify your Unit Tests and then submit the Pull Request.