    Function list parser for classes

    Notepad++ & Plugin Development
      Stephan Romhart 0 @Stephan Romhart 0

      @Stephan-Romhart-0
      This is my current regex to match each class separately:

      ^class.*\n\{\n((\t.*\n)|(^$\n))*^\}

      But it doesn't work :-(

      			<classRange mainExpr="^class.*\n\{\n((\t.*\n)|(^$\n))*^\}" openSymbole="\{" closeSymbole="\}">
      				<className>
      					<nameExpr expr="(class).*" />
      				</className>
      				<function mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{">
      					<functionName>
      						<funcNameExpr expr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*([A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{" />
      					</functionName>
      				</function>
      			</classRange>
      
        PeterJones @Stephan Romhart 0

        @Stephan-Romhart-0 ,

        It’s hard to get Function List definitions right (at least for me). I always have to start with the simplest, which matches too much (or too little), and then slowly tweak the regex until it does everything I want.

        From the way I understand things, if you don’t have the openSymbole and closeSymbole, then the regex for your classRange must match the entire class from start to finish, whereas if you do have the openSymbole and closeSymbole, then the expression just needs to match from the start of the class until (and including) the openSymbole.

        @MAPJe71 is our resident expert, having written our Function List Basics FAQ, so maybe he’ll come in with specific advice. But in the meantime, that FAQ links to his repository of UDL, Auto-Completion, and Function List definition config files for many, many languages. And this is the <parser> portion of a function list definition for JavaScript with classes enabled. So if you wanted, you could just try to use his parser first (you would have to embed that <parser> section in a file with the right name and the <NotepadPlus><functionList>...</functionList></NotepadPlus> wrapper), or you could just study how that one works, and try to adjust it if you have different needs/wants.
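
        As a rough sketch of the wrapper mentioned above (not a complete definition): the `displayName` and `id` values are the ones used for JavaScript later in this thread, and the comments mark where the actual definitions would go. The file name is deliberately not shown here.

        ```xml
        <?xml version="1.0" encoding="UTF-8" ?>
        <NotepadPlus>
        	<functionList>
        		<!-- The <parser> section sits directly inside <functionList>. -->
        		<parser
        			displayName="JavaScript"
        			id         ="javascript_function"
        		>
        			<!-- <classRange> and/or <function> definitions go here. -->
        		</parser>
        	</functionList>
        </NotepadPlus>
        ```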

          PeterJones @PeterJones

          @Stephan-Romhart-0 ,

          I tried out the FunctionList definition that I linked to, and it worked okay for functions, but didn’t seem to handle classes the way I’d expect. Looking at the details, unless I was misreading, I think that was looking for var as the start of a class… so I think it was looking for var myObject = { ... } rather than the class Name { ... } definition. Sorry.

          Since I was curious at this point, I kept the comments and function definition from that one, but then changed the classRange to look for the class keyword. I’m not an expert on JavaScript syntax, so this shouldn’t be taken as canonical or all-encompassing, but it worked with your simple example and a slightly expanded example I made. It will probably take work on your part to make it handle everything you want, but if I use:

          				<classRange
          					mainExpr    ="(?x)                                          # free-spacing (see `RegEx - Pattern Modifiers`)
          							(?-i:class)
          							\s+
          							[A-Za-z_$][\w$]*
          							\s*
          							\{                                                  # start of class body
          						"
          					openSymbole ="\{"
          					closeSymbole="\}"
          				>
          					<className>
          						<nameExpr expr="(?-i:class)\s+\K[A-Za-z_$][\w$]*" />
          					</className>
          					<function
          						mainExpr="(?x)                                          # free-spacing (see `RegEx - Pattern Modifiers`)
          								\s*(?-i:\bfunction\b)?\s*
          								[A-Za-z_$][\w$]*
          								\s*(?-i:\bfunction\b)?\s*
          								\s*\([^()]*\)                                   # parameters
          								\s*\{                                           # start of function body
          							"
          					>
          						<functionName>
          							<funcNameExpr expr="[A-Za-z_$][\w$]*" />
          						</functionName>
          					</function>
          				</classRange>
          

          With the JavaScript code:

          class Test
          {
              constructor({test=0,test}) {
                  ...
              }
          
              someMethod(a,b,c) {
                  ...
              }
          }
          class Rectangle
          {
            constructor(height, width) {
              this.height = height;
              this.width = width;
            }
          }
          
          function blah(a,b,c) {
              ...
          }
          

          the Function List showed me:
          [screenshot: d0660616-621a-4e07-a9f1-4cd642fd1e5c-image.png]

          … so that tells me it’s at least a reasonable starting point.

            Stephan Romhart 0 @PeterJones

            @PeterJones Hello Peter, thank you very, very much.

            It is strange. Sometimes the regex seems to work, sometimes not. If I use your regex definition, I have scripts that work and others that don’t.

            So I tried to work out the two regexes that catch the class and its methods:

            If I test your class regex as a one-liner, it works:
            https://regex101.com/r/UBrx23/1

            If I test your method regex as a one-liner, it also matches every if, for, etc.:
            https://regex101.com/r/eWYfV1/1

            So I updated both regexes to clean them up.
            Class regex:
            https://regex101.com/r/6rHP8q/1

            Methods regex:
            https://regex101.com/r/RpV9ph/1

            Now both seem to match correctly, but of the 10 different JS files, only half worked :-)

            Perhaps someone can see what I am missing here…

            Code for copy and paste:

            		<parser
            			displayName="JavaScript"
            			id         ="javascript_function"
            			commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)"
            		>
            			<classRange
            				mainExpr    ="class [A-Za-z_$]*\s*\{"
            				openSymbole ="\{"
            				closeSymbole="\}"
            			>
            				<className>
            					<nameExpr expr="class [A-Za-z_$]*" />
            				</className>
            				<function
            					mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{"
            				>
            					<functionName>
            						<funcNameExpr expr="[A-Za-z_$][\w$]*" />
            					</functionName>
            				</function>
            			</classRange>
            			<function
            				mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*function(\s+[A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{"
            			>
            				<functionName>
            					<nameExpr expr="[A-Za-z_$][\w$]*\s*[=:]|[A-Za-z_$][\w$]*\s*\(" />
            					<nameExpr expr="[A-Za-z_$][\w$]*" />
            				</functionName>
            				<className>
            					<nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\." />
            					<nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*" />
            				</className>
            			</function>
            		</parser>
            
              PeterJones @Stephan Romhart 0

              @Stephan-Romhart-0 ,

              regex101 does not use the same regular expression engine as Notepad++ uses (N++ uses Boost). Every engine has its own quirks and rules, and writing a regex that works with one does not guarantee it works with another, so just because it works on regex101 doesn’t mean it will work with N++, and vice versa.

              And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.

              As an idea, if my method regex is matching too many keywords and thinking they are methods, you could use the examples from @MAPJe71’s file that I linked, which has a negative lookahead (?!...) to prevent it from thinking that if(...) and for(...) and similar are function names.
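
              A minimal sketch of that idea, not taken from @MAPJe71’s file: the keyword list (if, for, while, switch, catch) and the single-line layout are assumptions for illustration, and the pattern has not been checked against every JavaScript construct. The negative lookahead sits right before the name, so a control-flow keyword followed by a word boundary is rejected, while a name such as ifReady still passes.

              ```xml
              <function
              	mainExpr="(?m)^[\t\x20]*(?!(?-i:if|for|while|switch|catch)\b)[A-Za-z_$][\w$]*\s*\([^()]*\)\s*\{"
              >
              	<functionName>
              		<!-- The first identifier in the match is taken as the method name. -->
              		<funcNameExpr expr="[A-Za-z_$][\w$]*" />
              	</functionName>
              </function>
              ```

              This element would replace the <function> inside the <classRange>; more keywords can be added to the alternation as needed.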

                Stephan Romhart 0 @Stephan Romhart 0

                @Stephan-Romhart-0 I think I found the cause:

                If the last line of the file is the class’s closing “}”, it does not work.

                When I press Enter after the “}”, it works.

                So it probably has to do with the closeSymbole?

                  Stephan Romhart 0 @PeterJones

                  @PeterJones said in Function list parser for classes:

                  And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.

                  Sorry, I didn’t mean to sound harsh. You have helped me so much!!!

                    PeterJones @Stephan Romhart 0

                    @Stephan-Romhart-0 said in Function list parser for classes:

                    If the last line of the file is the class’s closing “}”, it does not work.

                    Oh, there’s nothing you can do to fix that. That’s just been a long-time limitation of the Function List parser. See the second post in the FAQ for known limitations.

                      Stephan Romhart 0 @PeterJones

                      @PeterJones Thank you again.
                      I will post my final solution in case someone needs the same regexes ;-)

                        MAPJe71 @Stephan Romhart 0

                        Have a look at the Java parser for inspiration:

                        <?xml version="1.0" encoding="UTF-8" ?>
                        <!-- ==========================================================================\
                        |
                        |   To learn how to make your own language parser, please check the following
                        |   link:
                        |       https://npp-user-manual.org/docs/function-list/
                        |
                        \=========================================================================== -->
                        <NotepadPlus>
                        	<functionList>
                        		<!--
                        		|   Based on:
                        		|       https://community.notepad-plus-plus.org/topic/12691/function-list-with-java-problems
                        		|
                        		|   20161116:
                        		|   - added embedded comment to RegEx;
                        		|   - removed `commentExpr` as it prevents classes and functions
                        		|     from showing in the FunctionList tree when they contain
                        		|     comments and/or literal strings;
                        		|	commentExpr="(?x)                                                   # free-spacing (see `RegEx - Pattern Modifiers`)
                        		|				(?s:                                                    # Multi Line Comment
                        		|					\x2F\x2A{1}                                         # - starts with a forward-slash and one asterisk
                        		|					(?:                                                 # - followed by zero or more characters
                        		|						[^\x2A\x5C]                                     #   ...not an asterisk and not a backslash (i.e. escape character)
                        		|					|	\x2A[^\x2F]                                     #   ...or an asterisk not followed by a forward-slash
                        		|					|	\x5C.                                           #   ...or a backslash followed by any character
                        		|					)*                                                  #
                        		|					\x2A\x2F                                            # - ends with an asterisk and forward-slash
                        		|				)
                        		|			|	(?m-s:\x2F{2}.*$)                                       # Single Line Comment
                        		|			|	(?s:                                                    # JavaDoc Comment
                        		|					\x2F\x2A{2}                                         # - starts with a forward-slash and two asterisk'
                        		|					(?:                                                 # - followed by zero or more characters
                        		|						[^\x2A\x5C]                                     #   ...not an asterisk and not a backslash (i.e. escape character)
                        		|					|	\x2A[^\x2F]                                     #   ...or an asterisk not followed by a forward-slash
                        		|					|	\x5C.                                           #   ...or a backslash followed by any character
                        		|					)*                                                  #
                        		|					\x2A\x2F                                            # - ends with an asterisk and forward-slash
                        		|				)
                        		|			|	(?s:\x22(?:[^\r\n\x22\x5C]|\x5C[^\r\n])*\x22)           # String Literal - Double Quoted, no embedded line-breaks
                        		|			|	(?s:\x27(?:[^\r\n\x27\x5C]|\x5C[^\r\n])*\x27)           # String Literal - Single Quoted, no embedded line-breaks
                        		|		"
                        		|   - 'type name' and 'parent type name(s)' parts in function 'declarator'
                        		|     group/subroutine do not use "(?&amp;VALID_ID)" as it prevents
                        		|     classes and functions from showing in the FunctionList tree;
                        		|   20181130:
                        		|   - Fix for "Function List Omits Java Functions with Spaces Before Closing Parentheses"
                        		|     (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5085)
                        		\-->
                        		<parser
                        			displayName="Java"
                        			id         ="java_syntax"
                        		>
                        			<classRange
                        				mainExpr    ="(?x)                                              # free-spacing (see `RegEx - Pattern Modifiers`)
                        						(?m)                                                    # ^ and $ match at line-breaks
                        						^[\t\x20]*                                              # optional leading white-space at start-of-line
                        						(?:
                        							(?-i:
                        								abstract
                        							|	final
                        							|	native
                        							|	p(?:rivate|rotected|ublic)
                        							|	s(?:tatic|trictfp|ynchronized)
                        							|	transient
                        							|	volatile
                        							|	@[A-Za-z_]\w*                                   # qualified identifier
                        								(?:                                             # consecutive names...
                        									\.                                          # ...are dot separated
                        									[A-Za-z_]\w*
                        								)*
                        							)
                        							\s+
                        						)*
                        						(?-i:class|enum|@?interface)
                        						\s+
                        						(?'DECLARATOR'
                        							(?'VALID_ID'                                        # valid identifier, use as subroutine
                        								\b(?!(?-i:                                      # keywords (case-sensitive), not to be used as identifier
                        									a(?:bstract|ssert)
                        								|	b(?:oolean|reak|yte)
                        								|	c(?:ase|atch|har|lass|on(?:st|tinue))
                        								|	d(?:efault|o(?:uble)?)
                        								|	e(?:lse|num|xtends)
                        								|	f(?:inal(?:ly)?|loat|or)
                        								|	goto
                        								|	i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                        								|	long
                        								|	n(?:ative|ew)
                        								|	p(?:ackage|rivate|rotected|ublic)
                        								|	return
                        								|	s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                        								|	th(?:is|rows?)|tr(?:ansient|y)
                        								|	vo(?:id|latile)
                        								|	while
                        								)\b)
                        								[A-Za-z_]\w*                                    # valid character combination for identifiers
                        							)
                        							(?:
                        								\s*\x3C                                         # start-of-template indicator...
                        								(?'GENERIC'                                     # ...match first generic, use as subroutine
                        									\s*
                        									(?:
                        										(?&amp;DECLARATOR)                      # use named generic
                        									|	\?                                      # or unknown
                        									)
                        									(?:                                         # optional type extension
                        										\s+(?-i:extends|super)
                        										\s+(?&amp;DECLARATOR)
                        										(?:                                     # multiple bounds...
                        											\s+\x26                             # ...are ampersand separated
                        											\s+(?&amp;DECLARATOR)
                        										)*
                        									)?
                        									(?:                                         # match consecutive generics objects...
                        										\s*,                                    # ...are comma separated
                        										(?&amp;GENERIC)
                        									)?
                        								)
                        								\s*\x3E                                         # end-of-template indicator
                        							)?
                        							(?:                                                 # package and|or nested classes...
                        								\.                                              # ...are dot separated
                        								(?&amp;DECLARATOR)
                        							)?
                        						)
                        						(?:                                                     # optional object extension
                        							\s+(?-i:extends)
                        							\s+(?&amp;DECLARATOR)
                        							(?:                                                 # consecutive objects...
                        								\s*,                                            # ...are comma separated
                        								\s*(?&amp;DECLARATOR)
                        							)*
                        						)?
                        						(?:                                                     # optional object implementation
                        							\s+(?-i:implements)
                        							\s+(?&amp;DECLARATOR)
                        							(?:                                                 # consecutive objects...
                        								\s*,                                            # ...are comma separated
                        								\s*(?&amp;DECLARATOR)
                        							)*
                        						)?
                        						\s*\{                                                   # whatever, until start-of-body indicator
                        					"
                        				openSymbole ="\{"
                        				closeSymbole="\}"
                        			>
                        				<className>
                        					<nameExpr expr="(?-i:class|enum|@?interface)\s+\K\w+(?:\s*\x3C.*?\x3E)?" />
                        				</className>
                        				<function
                        					mainExpr="(?x)                                              # free-spacing (see `RegEx - Pattern Modifiers`)
                        							^[\t\x20]*                                          # optional leading white-space at start-of-line
                        							(?:
                        								(?-i:
                        									abstract
                        								|	final
                        								|	native
                        								|	p(?:rivate|rotected|ublic)
                        								|	s(?:tatic|trictfp|ynchronized)
                        								|	transient
                        								|	volatile
                        								|	@[A-Za-z_]\w*                               # qualified identifier
                        									(?:                                         # consecutive names...
                        										\.                                      # ...are dot separated
                        										[A-Za-z_]\w*
                        									)*
                        								)
                        								\s+
                        							)*
                        							(?:
                        								\s*\x3C                                         # start-of-template indicator
                        								(?&amp;GENERIC)
                        								\s*\x3E                                         # end-of-template indicator
                        							)?
                        							\s*
                        							(?'DECLARATOR'
                        								[A-Za-z_]\w*                                    # (parent) type name
                        								(?:                                             # consecutive sibling type names...
                        									\.                                          # ...are dot separated
                        									[A-Za-z_]\w*
                        								)*
                        								(?:
                        									\s*\x3C                                     # start-of-template indicator
                        									(?'GENERIC'                                 # match first generic, use as subroutine
                        										\s*
                        										(?:
                        											(?&amp;DECLARATOR)                  # use named generic
                        										|	\?                                  # or unknown
                        										)
                        										(?:                                     # optional type extension
                        											\s+(?-i:extends|super)
                        											\s+(?&amp;DECLARATOR)
                        											(?:                                 # multiple bounds...
                        												\s+\x26                         # ...are ampersand separated
                        												\s+(?&amp;DECLARATOR)
                        											)*
                        										)?
                        										(?:                                     # consecutive generics objects...
                        											\s*,                                # ...are comma separated
                        											(?&amp;GENERIC)
                        										)?
                        									)
                        									\s*\x3E                                     # end-of-template indicator
                        								)?
                        								(?:                                             # package and|or nested classes...
                        									\.                                          # ...are dot separated
                        									(?&amp;DECLARATOR)
                        								)?
                        								(?:                                             # optional compound type...
                        									\s*\[                                       # ...start-of-compound indicator
                        									\s*\]                                       # ...end-of-compound indicator
                        								)*
                        							)
                        							\s+
                        							(?'VALID_ID'                                        # valid identifier, use as subroutine
                        								\b(?!(?-i:                                      # keywords (case-sensitive), not to be used as identifier
                        									a(?:bstract|ssert)
                        								|	b(?:oolean|reak|yte)
                        								|	c(?:ase|atch|har|lass|on(?:st|tinue))
                        								|	d(?:efault|o(?:uble)?)
                        								|	e(?:lse|num|xtends)
                        								|	f(?:inal(?:ly)?|loat|or)
                        								|	goto
                        								|	i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                        								|	long
                        								|	n(?:ative|ew)
                        								|	p(?:ackage|rivate|rotected|ublic)
                        								|	return
                        								|	s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                        								|	th(?:is|rows?)|tr(?:ansient|y)
                        								|	vo(?:id|latile)
                        								|	while
                        								)\b)
                        								[A-Za-z_]\w*                                    # valid character combination for identifiers
                        							)
                        							\s*\(                                               # start-of-parameters indicator
                        							(?'PARAMETER'                                       # match first parameter, use as subroutine
                        								\s*(?-i:final\s+)?
                        								(?&amp;DECLARATOR)
                        								\s+(?&amp;VALID_ID)                             # parameter name
                        								(?:                                             # consecutive parameters...
                        									\s*,                                        # ...are comma separated
                        									(?&amp;PARAMETER)
                        								)?
                        							)?
                        							\s*\)                                               # end-of-parameters indicator
                        							(?:                                                 # optional exceptions
                        								\s*(?-i:throws)
                        								\s+(?&amp;VALID_ID)                             # first exception name
                        								(?:                                             # consecutive exception names...
                        									\s*,                                        # ...are comma separated
                        									\s*(?&amp;VALID_ID)
                        								)*
                        							)?
                        							[^{;]*\{                                            # start-of-function-body indicator
                        						"
                        				>
                        					<functionName>
                        						<funcNameExpr expr="\w+(?=\s*\()" />
                        					</functionName>
                        				</function>
                        			</classRange>
                        		</parser>
                        	</functionList>
                        </NotepadPlus>
                        
                          Stephan Romhart 0 @MAPJe71

                          @MAPJe71 Thank you very much, I will study it :-)

                            Stephan Romhart 0 @Stephan Romhart 0

                            @Stephan-Romhart-0
                            The final JavaScript class parser I now use for work is the following:

                            <parser displayName="JavaScript" id="javascript_function" commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)">
                            			<classRange mainExpr="class [A-Za-z_$]*\s*\{" openSymbole ="\{" closeSymbole="\}">
                            				<className>
                            					<nameExpr expr="Klasse: [A-Za-z_$]*" />
                            				</className>
                            				<function mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{">
                            					<functionName>
                            						<funcNameExpr expr="[A-Za-z_$][\w$]*" />
                            					</functionName>
                            				</function>
                            			</classRange>
                            </parser>
                            

                            It works for me at the moment, because I only use JS class declarations.
