Custom Function List, help needed
-
Hi
I’m trying to create a Function List for a custom language which looks a lot like the functions from Excel. I’m having trouble with functions (classes?) that have optional comments between the name and the opening square bracket.
Example of language:
SubString ~ This is a comment ~ [ GetValue["key"], 0, ~ This is another comment ~ IndexOf [ variable1, "," ] ]
From the above code, I would like to see 3 functions (or classes): SubString, GetValue & IndexOf.
However, currently I only see GetValue and IndexOf. I haven’t been able to get classRange to work at all :( so the regex below is from the function parser.
My mainExpr looks like this:
(?x)                        # free-spacing (see `RegEx - Pattern Modifiers`)
(?im)                       # case-insensitive, ^ and $ match at line breaks
(
    [a-z0-9]+               # function identifier
    (?:
        \s*                 # possible whitespace
        (?:~.*~)* |         # optional comments
        (?:\n?\r|\r?\n)*    # possible newlines
    )*
    \[                      # start of parameter list
)
and expr like this:
[a-zA-Z0-9]+
I have been using https://regex101.com/ to test my regex; from what I see there, it should be working…
If the “levels” from classRange could be used to show which functions are within which, then that would be awesome … but if too many levels don’t work with Notepad++, then a simple function list would be fine :)
Thanks in advance for any and all assistance.
-
Since Npp’s function list only supports two nesting levels and your custom language seems to allow an unlimited number of nesting levels, I recommend using a function parser instead of a class parser.
To get a working function list, you also have to define a custom language (see menu Language -> User Defined Language -> Define your language...).
With the following parser, I was able to parse your sample code the way you wanted.
<function mainExpr="(?x) (?ims) ^ \h* ([a-z_]\w*) (?= .*? \[ ) ">
    <functionName>
        <nameExpr expr="(?im-s)[a-z_]\w*" />
    </functionName>
</function>
Please note: I assumed that identifiers must start with a letter or an underscore.
-
Hi @dinkumoil
Thanks for your feedback. Good to know about the nesting level limitation, so I’ll have to work with functions.
Your regex seems to have worked better than mine, but it still isn’t right as it is picking up more than it should - which is my fault, I left some details out:
The language is like Excel, but there are more differences, like variables:
~ comment comment comment ~
~ [comment comment comment] ~
var1:=FunctionName01["String"];
var2:=FunctionName02 ["String"];
var3:=FunctionName03 ~ comment comment comment ~ [FunctionName04["String"], FunctionName05 ~ comment ~ [123]];
FunctionName06[ ~ comment ~ FunctionName07[FunctionName08~comment~[FunctionName09 [123]]]];
FunctionName10 ~ comment ~ [ "String" ];
The lookahead you added is a good idea, but it shouldn’t pick up everything (such as variables).
From the above (poorly formatted ;)) example, it should only pick up the 10 Functions, the 3 variables should not be caught.
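To make the over-matching concrete, here is a minimal Python sketch that runs an approximation of the suggested mainExpr against the sample. It is only an illustration under a few assumptions: Python's re module is not the regex engine Notepad++ uses, \h is written as [ \t] because re has no such escape, and the one-statement-per-line layout of the sample is a guess. The helper name names_at_line_start is made up for this sketch.

```python
import re

# Sample from this post; the one-statement-per-line layout is an assumption.
SAMPLE = """\
~ comment comment comment ~
~ [comment comment comment] ~
var1:=FunctionName01["String"];
var2:=FunctionName02 ["String"];
var3:=FunctionName03 ~ comment comment comment ~ [FunctionName04["String"], FunctionName05 ~ comment ~ [123]];
FunctionName06[ ~ comment ~ FunctionName07[FunctionName08~comment~[FunctionName09 [123]]]];
FunctionName10 ~ comment ~ [ "String" ];
"""

# Approximate port of the suggested mainExpr.
# Assumption: \h (horizontal whitespace) is written as [ \t] for Python.
LINE_START_EXPR = re.compile(
    r"^[ \t]*([a-z_]\w*)(?=.*?\[)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def names_at_line_start(text):
    """Return the identifiers captured by the line-anchored expression."""
    return [m.group(1) for m in LINE_START_EXPR.finditer(text)]


if __name__ == "__main__":
    print(names_at_line_start(SAMPLE))
    # → ['var1', 'var2', 'var3', 'FunctionName06', 'FunctionName10']
```

With this layout the expression takes the first identifier on each line, as long as any [ follows somewhere later, so the three variables are reported and the nested functions are not.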
Using your regex I made some modifications, but it still isn’t working. Functions which have comments between their name and the starting bracket are still missing:
<function mainExpr="(?x) (?ims) \h* ([a-z][a-z0-9]+) (?=\s*(?:\x7E[^\n\r]*\x7E)*\s*\[) ">
    <functionName>
        <nameExpr expr="(?im-s)[a-z0-9]+"/>
    </functionName>
</function>
Note: Only alphanumeric characters are allowed, no special characters at all.
Can an online resource such as regex101 be used to validate the regexes?
-
I finally got it working the way I’d like to see it … the main problem was a mistake in my commentExpr, which I really should have mentioned I had defined :(
Here is the Regex that worked:
(?x) (?ims) \h* ([a-z][a-z0-9]+) (?=\s*(?:\x7E[^\n\r\x7E]*\x7E)*\s*\[)
Thanks @dinkumoil, you got me on the right track…
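For anyone who wants to check the expression outside Notepad++, here is a minimal Python sketch that runs a port of it against the earlier sample. It makes several assumptions: \h is replaced by [ \t] because Python's re module has no such escape, the sample is laid out one statement per line, and comments are left in the text rather than handled by a separate commentExpr. The helper name function_names is made up for this sketch.

```python
import re

# Sample from the earlier post; the line layout is an assumption.
SAMPLE = """\
~ comment comment comment ~
~ [comment comment comment] ~
var1:=FunctionName01["String"];
var2:=FunctionName02 ["String"];
var3:=FunctionName03 ~ comment comment comment ~ [FunctionName04["String"], FunctionName05 ~ comment ~ [123]];
FunctionName06[ ~ comment ~ FunctionName07[FunctionName08~comment~[FunctionName09 [123]]]];
FunctionName10 ~ comment ~ [ "String" ];
"""

# Port of the working mainExpr.
# Assumption: \h is written as [ \t], since Python's re does not support \h.
MAIN_EXPR = re.compile(
    r"""
    [ \t]*
    ([a-z][a-z0-9]+)                        # function identifier
    (?=\s*(?:\x7E[^\n\r\x7E]*\x7E)*\s*\[)   # optional ~comments~, then an opening bracket
    """,
    re.VERBOSE | re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def function_names(text):
    """Return every identifier that is followed by optional comments and '['."""
    return [m.group(1) for m in MAIN_EXPR.finditer(text)]


if __name__ == "__main__":
    for name in function_names(SAMPLE):
        print(name)
```

On this sample the sketch lists FunctionName01 through FunctionName10 in order and skips var1, var2 and var3. Words inside the comments are not reported here only because none of them is directly followed by an opening bracket; in Notepad++ that part is the job of commentExpr.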
-
Hello, @ericsambach, @dinkumoil and All;
I think that you can simplify your function list as below:
<function mainExpr="(?x) \h* [[:alnum:]]+ (?= (?:\s*~[^\n\r~]*~)* \s* \[ )">
    <functionName>
        <nameExpr expr="[[:alnum:]]+"/>
    </functionName>
</function>
Notes:
- No need for the (?i) modifier, as we use the [[:alnum:]] POSIX character class, which is identical to [\u\l\d] or [A-Za-z0-9]
- No need for the (?s) modifier, as we don’t use the regex dot symbol (.) anyway!
- No need for the (?m) modifier, as we don’t use the ^ and $ assertions either!
- The first \s* of the regex has been moved to the beginning of the non-capturing group (?:\s*.....)
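The simplified expression can be tried out with a minimal Python sketch. It is an approximation under stated assumptions: Python's re module supports neither \h nor POSIX classes, so they are written as [ \t] and [A-Za-z0-9], and the nameExpr step is imitated by searching inside each mainExpr match. The helper name function_names and the Foo line are made up for illustration.

```python
import re

# Approximate ports of the simplified mainExpr and nameExpr.
# Assumptions: \h -> [ \t] and [[:alnum:]] -> [A-Za-z0-9] for Python's re.
MAIN_EXPR = re.compile(r"[ \t]*[A-Za-z0-9]+(?=(?:\s*~[^\n\r~]*~)*\s*\[)")
NAME_EXPR = re.compile(r"[A-Za-z0-9]+")


def function_names(text):
    """Apply mainExpr, then extract the name from each match with nameExpr."""
    names = []
    for match in MAIN_EXPR.finditer(text):
        name = NAME_EXPR.search(match.group(0))
        if name:
            names.append(name.group(0))
    return names


if __name__ == "__main__":
    first_sample = (
        'SubString ~ This is a comment ~ [ GetValue["key"], 0, '
        '~ This is another comment ~ IndexOf [ variable1, "," ] ]'
    )
    print(function_names(first_sample))
    # → ['SubString', 'GetValue', 'IndexOf']

    # Hypothetical line with two comments separated by a space.
    print(function_names("Foo ~ a ~ ~ b ~ [1]"))
    # → ['Foo']
```

The second call shows one effect of moving \s* inside the group: whitespace is accepted before every comment, so several comments in a row may be separated by spaces.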
Best Regards,
guy038
-