Function list parser for classes
-
Hello,
I tried to write a parser for JavaScript classes.
I want to parse this example:
class Rectangle {
	constructor(height, width) {
		this.height = height;
		this.width = width;
	}
}
This is my current approach, which doesn’t work at all:
<classRange
    mainExpr="(class)\s+\w+\s*\{"
    openSymbole="\{"
    closeSymbole="\}"
>
    <className>
        <nameExpr expr="(class)\s+\w+" />
    </className>
    <function
        mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{"
    >
        <functionName>
            <funcNameExpr expr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{" />
        </functionName>
    </function>
</classRange>
Can somebody help? What is the issue?
BTW: I tested my regexes on this website: https://regex101.com/
-
@Stephan-Romhart-0 I thought about it:
Does classRange need to match the whole class content,
or only the fragment up to the first “{”?
From the manual:
openSymbole & closeSymbole (attribute): They are optional. If defined, then the parser will determine the zone of this class: it find first openSymbole from the first character of the string found by “mainExpr” attribute; then it determines the end of class by closeSymbole found.
How is that meant to work?
-
@Stephan-Romhart-0
This is my current regex to get every class separately: ^class.*\n\{\n((\t.*\n)|(^$\n))*^\}
But it doesn’t work :-(
<classRange
    mainExpr="^class.*\n\{\n((\t.*\n)|(^$\n))*^\}"
    openSymbole="\{"
    closeSymbole="\}"
>
    <className>
        <nameExpr expr="(class).*" />
    </className>
    <function
        mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{"
    >
        <functionName>
            <funcNameExpr expr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{" />
        </functionName>
    </function>
</classRange>
-
It’s hard to get Function List definitions right (at least for me). I always have to start with the simplest, which matches too much (or too little), and then slowly tweak the regex until it does everything I want.
From the way I understand things, if you don’t have the openSymbole and closeSymbole, then the regex for your classRange must match the entire class from start to finish, whereas if you do have the openSymbole and closeSymbole, then the expression just needs to match from the start of the class until (and including) the openSymbole.
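To make that distinction concrete, here is a minimal sketch of the two styles for a JavaScript class, given as two alternative fragments rather than a complete definition. It follows the understanding described above and has not been checked against every Notepad++ version. The second variant is purely illustrative: it assumes every class ends with a “}” alone at the start of a line, so that a lazy match up to that brace covers the whole class.

```xml
<!-- Style 1: openSymbole/closeSymbole are set, so mainExpr only has to match
     from the class keyword up to and including the opening brace. -->
<classRange
    mainExpr="(?-i:class)\s+[A-Za-z_$][\w$]*\s*\{"
    openSymbole="\{"
    closeSymbole="\}"
>
    <className>
        <nameExpr expr="(?-i:class)\s+\K[A-Za-z_$][\w$]*" />
    </className>
    <function mainExpr="[A-Za-z_$][\w$]*\s*\([^()]*\)\s*\{">
        <functionName>
            <funcNameExpr expr="[A-Za-z_$][\w$]*" />
        </functionName>
    </function>
</classRange>

<!-- Style 2 (illustrative assumption): no open/close symbols, so mainExpr
     itself must run from the class keyword to the end of the class.
     Assumes the closing brace of the class sits at the start of a line. -->
<classRange
    mainExpr="(?ms)^(?-i:class)\s+[A-Za-z_$][\w$]*\s*\{.*?^\}"
>
    <className>
        <nameExpr expr="(?-i:class)\s+\K[A-Za-z_$][\w$]*" />
    </className>
    <function mainExpr="[A-Za-z_$][\w$]*\s*\([^()]*\)\s*\{">
        <functionName>
            <funcNameExpr expr="[A-Za-z_$][\w$]*" />
        </functionName>
    </function>
</classRange>
```

Use one style or the other, not both. Style 1 is usually the easier one for brace-delimited languages, because the parser finds the end of the class and the regex does not have to.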
@MAPJe71 is our resident expert, having written our Function List Basics FAQ, so maybe he’ll come in with specific advice. But in the meantime, that FAQ links to his repository of UDL, Auto-Completion, and Function List definition config files for many, many languages. And this is the <parser> portion of a function list definition for JavaScript with classes enabled. So if you wanted, you could just try to use his parser first (you would have to embed that <parser> section in a file with the right name and the <NotepadPlus><functionList>...</functionList></NotepadPlus> wrapper), or you could just study how that one works, and try to adjust it if you have different needs/wants.
-
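For reference, the wrapper mentioned in the previous post can be sketched as the skeleton below. The displayName and id values are taken from the JavaScript parser quoted later in this thread; the file name to save it under depends on the Notepad++ version and is deliberately not shown here.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<NotepadPlus>
    <functionList>
        <!-- displayName and id as used by the JavaScript parser in this
             thread; adjust them if your setup expects other values. -->
        <parser
            displayName="JavaScript"
            id="javascript_function"
        >
            <!-- the classRange and/or function elements go here -->
        </parser>
    </functionList>
</NotepadPlus>
```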
I tried out the FunctionList definition that I linked to, and it worked okay for functions, but didn’t seem to handle classes the way I’d expect. Looking at the details, unless I was misreading, I think it was looking for var as the start of a class, so I think it was looking for var myObject = { ... } rather than the class Name { ... } definition. Sorry.
Since I was curious at this point, I kept the comments and function definition from that one, but then changed the classRange to look for the class keyword. I’m not an expert on JavaScript syntax, so this shouldn’t be taken as canonical or all-encompassing, but it worked with your simple example and a slightly expanded example I made. It will probably take work on your part to make it handle everything you want, but if I use:
<classRange
    mainExpr    ="(?x)                                  # free-spacing (see `RegEx - Pattern Modifiers`)
            (?-i:class)
            \s+
            [A-Za-z_$][\w$]*
            \s*
            \{                                          # start of class body
        "
    openSymbole ="\{"
    closeSymbole="\}"
>
    <className>
        <nameExpr expr="(?-i:class)\s+\K[A-Za-z_$][\w$]*" />
    </className>
    <function
        mainExpr="(?x)                                  # free-spacing (see `RegEx - Pattern Modifiers`)
                \s*(?-i:\bfunction\b)?\s*
                [A-Za-z_$][\w$]*
                \s*(?-i:\bfunction\b)?\s*
                \s*\([^()]*\)                           # parameters
                \s*\{                                   # start of function body
            "
    >
        <functionName>
            <funcNameExpr expr="[A-Za-z_$][\w$]*" />
        </functionName>
    </function>
</classRange>
With the JavaScript code:
class Test {
	constructor({test=0,test}) {
		...
	}
	someMethod(a,b,c) {
		...
	}
}
class Rectangle {
	constructor(height, width) {
		this.height = height;
		this.width = width;
	}
}
function blah(a,b,c) {
	...
}
the Function List showed me:
… so that tells me it’s at least a reasonable starting point.
-
@PeterJones Hello Peter, thank you very, very much.
It is strange. Sometimes the regex seems to work, sometimes not. If I use your regex definition, I have scripts that work and others that don’t.
So I tried to figure out the two regexes to catch the class and its methods:
If I test your class regex as a one-liner, it works:
https://regex101.com/r/UBrx23/1
If I test your method regex as a one-liner, it also matches all if, for, etc. items:
https://regex101.com/r/eWYfV1/1
So I updated both regexes to clean them up:
class regex
https://regex101.com/r/6rHP8q/1
methods regex
https://regex101.com/r/RpV9ph/1
Now both seem to match correctly, but of the 10 different JS files, only half worked :-)
Probably someone can see what I am not getting here…
Code for C&P
<parser
    displayName="JavaScript"
    id         ="javascript_function"
    commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)"
>
    <classRange
        mainExpr    ="class [A-Za-z_$]*\s*\{"
        openSymbole ="\{"
        closeSymbole="\}"
    >
        <className>
            <nameExpr expr="class [A-Za-z_$]*" />
        </className>
        <function
            mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{"
        >
            <functionName>
                <funcNameExpr expr="[A-Za-z_$][\w$]*" />
            </functionName>
        </function>
    </classRange>
    <function
        mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*function(\s+[A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{"
    >
        <functionName>
            <nameExpr expr="[A-Za-z_$][\w$]*\s*[=:]|[A-Za-z_$][\w$]*\s*\(" />
            <nameExpr expr="[A-Za-z_$][\w$]*" />
        </functionName>
        <className>
            <nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\." />
            <nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*" />
        </className>
    </function>
</parser>
-
regex101 does not use the same regular expression engine as Notepad++ uses (N++ uses Boost). Every engine has its own quirks and rules, and writing a regex that works with one does not guarantee it works with another, so just because it works on regex101 doesn’t mean it will work with N++, and vice versa.
And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.
As an idea, if my method regex is matching too many keywords and thinking they are methods, you could use the examples from @MAPJe71’s file that I linked, which has a negative lookahead (?!...) to prevent it from thinking that if(...) and for(...) and similar are function names.
-
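A minimal sketch of the negative-lookahead idea from the previous post, written as a method expression for use inside a classRange. The keyword list here is an illustrative assumption, not the list from @MAPJe71’s file, and would need extending for real code.

```xml
<function
    mainExpr="(?x)                              # free-spacing
            (?m)                                # ^ matches at line starts
            ^[\t\x20]*                          # optional leading white-space
            (?!(?-i:                            # reject control keywords as method names
                if | for | while | switch | catch
            )\b)
            [A-Za-z_$][\w$]*                    # method name
            \s*\([^()]*\)                       # parameters
            \s*\{                               # start of method body
        "
>
    <functionName>
        <funcNameExpr expr="[A-Za-z_$][\w$]*" />
    </functionName>
</function>
```

The lookahead sits directly in front of the identifier, so a line such as if (x) { fails to match, while a method named iffy(x) { still passes because the keyword must be followed by a word boundary.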
@Stephan-Romhart-0 I think I found out:
If the last line of the file is the class’s closing “}”, it does not work.
When I add a line break (Enter) after the “}”, it works.
So it probably has to do with the closeSymbole?
-
@PeterJones said in Function list parser for classes:
And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.
Sorry, I didn’t mean to sound harsh. You have helped me so much!!!
-
@Stephan-Romhart-0 said in Function list parser for classes:
If the last line of the file is the class’s closing “}”, it does not work.
Oh, there’s nothing you can do to fix that. That’s just been a long-time limitation of the Function List parser. See the second post in the FAQ for known limitations.
-
@PeterJones Thank you again
I will post my final solution in case someone needs the same regexes ;-)
-
Have a look at the Java parser for inspiration:
<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|
|   To learn how to make your own language parser, please check the following
|   link:
|       https://npp-user-manual.org/docs/function-list/
|
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <!--
        |   Based on:
        |       https://community.notepad-plus-plus.org/topic/12691/function-list-with-java-problems
        |
        |   20161116:
        |   - added embedded comment to RegEx;
        |   - removed `commentExpr` as it prevents classes and functions
        |     from showing in the FunctionList tree when they contain
        |     comments and/or literal strings;
        |       commentExpr="(?x)                               # free-spacing (see `RegEx - Pattern Modifiers`)
        |               (?s:                                    # Multi Line Comment
        |                   \x2F\x2A{1}                         # - starts with a forward-slash and one asterisk
        |                   (?:                                 # - followed by zero or more characters
        |                       [^\x2A\x5C]                     #   ...not an asterisk and not a backslash (i.e. escape character)
        |                   |   \x2A[^\x2F]                     #   ...or an asterisk not followed by a forward-slash
        |                   |   \x5C.                           #   ...or a backslash followed by any character
        |                   )*                                  #
        |                   \x2A\x2F                            # - ends with an asterisk and forward-slash
        |               )
        |           |   (?m-s:\x2F{2}.*$)                       # Single Line Comment
        |           |   (?s:                                    # JavaDoc Comment
        |                   \x2F\x2A{2}                         # - starts with a forward-slash and two asterisk'
        |                   (?:                                 # - followed by zero or more characters
        |                       [^\x2A\x5C]                     #   ...not an asterisk and not a backslash (i.e. escape character)
        |                   |   \x2A[^\x2F]                     #   ...or an asterisk not followed by a forward-slash
        |                   |   \x5C.                           #   ...or a backslash followed by any character
        |                   )*                                  #
        |                   \x2A\x2F                            # - ends with an asterisk and forward-slash
        |               )
        |           |   (?s:\x22(?:[^\r\n\x22\x5C]|\x5C[^\r\n])*\x22)   # String Literal - Double Quoted, no embedded line-breaks
        |           |   (?s:\x27(?:[^\r\n\x27\x5C]|\x5C[^\r\n])*\x27)   # String Literal - Single Quoted, no embedded line-breaks
        |           "
        |   - 'type name' and 'parent type name(s)' parts in function 'declarator'
        |     group/subroutine do not use "(?&VALID_ID)" as it prevents
        |     classes and functions from showing in the FunctionList tree;
        |   20181130:
        |   - Fix for "Function List Omits Java Functions with Spaces Before Closing Parentheses"
        |     (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5085)
        \-->
        <parser
            displayName="Java"
            id         ="java_syntax"
        >
            <classRange
                mainExpr    ="(?x)                              # free-spacing (see `RegEx - Pattern Modifiers`)
                        (?m)                                    # ^ and $ match at line-breaks
                        ^[\t\x20]*                              # optional leading white-space at start-of-line
                        (?:
                            (?-i:
                                abstract
                            |   final
                            |   native
                            |   p(?:rivate|rotected|ublic)
                            |   s(?:tatic|trictfp|ynchronized)
                            |   transient
                            |   volatile
                            |   @[A-Za-z_]\w*                   # qualified identifier
                                (?:                             # consecutive names...
                                    \.                          # ...are dot separated
                                    [A-Za-z_]\w*
                                )*
                            )
                            \s+
                        )*
                        (?-i:class|enum|@?interface)
                        \s+
                        (?'DECLARATOR'
                            (?'VALID_ID'                        # valid identifier, use as subroutine
                                \b(?!(?-i:                      # keywords (case-sensitive), not to be used as identifier
                                    a(?:bstract|ssert)
                                |   b(?:oolean|reak|yte)
                                |   c(?:ase|atch|har|lass|on(?:st|tinue))
                                |   d(?:efault|o(?:uble)?)
                                |   e(?:lse|num|xtends)
                                |   f(?:inal(?:ly)?|loat|or)
                                |   goto
                                |   i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                                |   long
                                |   n(?:ative|ew)
                                |   p(?:ackage|rivate|rotected|ublic)
                                |   return
                                |   s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                                |   th(?:is|rows?)|tr(?:ansient|y)
                                |   vo(?:id|latile)
                                |   while
                                )\b)
                                [A-Za-z_]\w*                    # valid character combination for identifiers
                            )
                            (?:
                                \s*\x3C                         # start-of-template indicator...
                                (?'GENERIC'                     # ...match first generic, use as subroutine
                                    \s*
                                    (?:
                                        (?&DECLARATOR)          # use named generic
                                    |   \?                      # or unknown
                                    )
                                    (?:                         # optional type extension
                                        \s+(?-i:extends|super)
                                        \s+(?&DECLARATOR)
                                        (?:                     # multiple bounds...
                                            \s+\x26             # ...are ampersand separated
                                            \s+(?&DECLARATOR)
                                        )*
                                    )?
                                    (?:                         # match consecutive generics objects...
                                        \s*,                    # ...are comma separated
                                        (?&GENERIC)
                                    )?
                                )
                                \s*\x3E                         # end-of-template indicator
                            )?
                            (?:                                 # package and|or nested classes...
                                \.                              # ...are dot separated
                                (?&DECLARATOR)
                            )?
                        )
                        (?:                                     # optional object extension
                            \s+(?-i:extends)
                            \s+(?&DECLARATOR)
                            (?:                                 # consecutive objects...
                                \s*,                            # ...are comma separated
                                \s*(?&DECLARATOR)
                            )*
                        )?
                        (?:                                     # optional object implementation
                            \s+(?-i:implements)
                            \s+(?&DECLARATOR)
                            (?:                                 # consecutive objects...
                                \s*,                            # ...are comma separated
                                \s*(?&DECLARATOR)
                            )*
                        )?
                        \s*\{                                   # whatever, until start-of-body indicator
                    "
                openSymbole ="\{"
                closeSymbole="\}"
            >
                <className>
                    <nameExpr expr="(?-i:class|enum|@?interface)\s+\K\w+(?:\s*\x3C.*?\x3E)?" />
                </className>
                <function
                    mainExpr="(?x)                              # free-spacing (see `RegEx - Pattern Modifiers`)
                            ^[\t\x20]*                          # optional leading white-space at start-of-line
                            (?:
                                (?-i:
                                    abstract
                                |   final
                                |   native
                                |   p(?:rivate|rotected|ublic)
                                |   s(?:tatic|trictfp|ynchronized)
                                |   transient
                                |   volatile
                                |   @[A-Za-z_]\w*               # qualified identifier
                                    (?:                         # consecutive names...
                                        \.                      # ...are dot separated
                                        [A-Za-z_]\w*
                                    )*
                                )
                                \s+
                            )*
                            (?:
                                \s*\x3C                         # start-of-template indicator
                                (?&GENERIC)
                                \s*\x3E                         # end-of-template indicator
                            )?
                            \s*
                            (?'DECLARATOR'
                                [A-Za-z_]\w*                    # (parent) type name
                                (?:                             # consecutive sibling type names...
                                    \.                          # ...are dot separated
                                    [A-Za-z_]\w*
                                )*
                                (?:
                                    \s*\x3C                     # start-of-template indicator
                                    (?'GENERIC'                 # match first generic, use as subroutine
                                        \s*
                                        (?:
                                            (?&DECLARATOR)      # use named generic
                                        |   \?                  # or unknown
                                        )
                                        (?:                     # optional type extension
                                            \s+(?-i:extends|super)
                                            \s+(?&DECLARATOR)
                                            (?:                 # multiple bounds...
                                                \s+\x26         # ...are ampersand separated
                                                \s+(?&DECLARATOR)
                                            )*
                                        )?
                                        (?:                     # consecutive generics objects...
                                            \s*,                # ...are comma separated
                                            (?&GENERIC)
                                        )?
                                    )
                                    \s*\x3E                     # end-of-template indicator
                                )?
                                (?:                             # package and|or nested classes...
                                    \.                          # ...are dot separated
                                    (?&DECLARATOR)
                                )?
                                (?:                             # optional compound type...
                                    \s*\[                       # ...start-of-compound indicator
                                    \s*\]                       # ...end-of-compound indicator
                                )*
                            )
                            \s+
                            (?'VALID_ID'                        # valid identifier, use as subroutine
                                \b(?!(?-i:                      # keywords (case-sensitive), not to be used as identifier
                                    a(?:bstract|ssert)
                                |   b(?:oolean|reak|yte)
                                |   c(?:ase|atch|har|lass|on(?:st|tinue))
                                |   d(?:efault|o(?:uble)?)
                                |   e(?:lse|num|xtends)
                                |   f(?:inal(?:ly)?|loat|or)
                                |   goto
                                |   i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                                |   long
                                |   n(?:ative|ew)
                                |   p(?:ackage|rivate|rotected|ublic)
                                |   return
                                |   s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                                |   th(?:is|rows?)|tr(?:ansient|y)
                                |   vo(?:id|latile)
                                |   while
                                )\b)
                                [A-Za-z_]\w*                    # valid character combination for identifiers
                            )
                            \s*\(                               # start-of-parameters indicator
                            (?'PARAMETER'                       # match first parameter, use as subroutine
                                \s*(?-i:final\s+)? (?&DECLARATOR)
                                \s+(?&VALID_ID)                 # parameter name
                                (?:                             # consecutive parameters...
                                    \s*,                        # ...are comma separated
                                    (?&PARAMETER)
                                )?
                            )?
                            \s*\)                               # end-of-parameters indicator
                            (?:                                 # optional exceptions
                                \s*(?-i:throws)
                                \s+(?&VALID_ID)                 # first exception name
                                (?:                             # consecutive exception names...
                                    \s*,                        # ...are comma separated
                                    \s*(?&VALID_ID)
                                )*
                            )?
                            [^{;]*\{                            # start-of-function-body indicator
                        "
                >
                    <functionName>
                        <funcNameExpr expr="\w+(?=\s*\()" />
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
-
@MAPJe71 Thank you very much, I will study it :-)
-
@Stephan-Romhart-0
The final JavaScript class parser I now use for work is the following:
<parser displayName="JavaScript" id="javascript_function" commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)">
    <classRange mainExpr="class [A-Za-z_$]*\s*\{" openSymbole ="\{" closeSymbole="\}">
        <className>
            <nameExpr expr="Klasse: [A-Za-z_$]*" />
        </className>
        <function mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{">
            <functionName>
                <funcNameExpr expr="[A-Za-z_$][\w$]*" />
            </functionName>
        </function>
    </classRange>
</parser>
It works for me at the moment, because I only use JS classes written as class declarations.