    Function list parser for classes

    Notepad++ & Plugin Development
    14 Posts 3 Posters 1.3k Views
    • PeterJones

      @Stephan-Romhart-0 ,

      I tried out the FunctionList definition that I linked to, and it worked okay for functions, but didn’t seem to handle classes the way I’d expect. Looking at the details, unless I was misreading, I think it was looking for var as the start of a class, so it was matching var myObject = { ... } rather than the class Name { ... } definition. Sorry.

      Since I was curious at this point, I kept the comments and function definition from that one, but changed the classRange to look for the class keyword. I’m not an expert on JavaScript syntax, so this shouldn’t be taken as canonical or all-encompassing, but it worked with your simple example and a slightly expanded example I made. It will probably take work on your part to make it handle everything you want, but if I use:

      				<classRange
      					mainExpr    ="(?x)                                          # free-spacing (see `RegEx - Pattern Modifiers`)
      							(?-i:class)
      							\s+
      							[A-Za-z_$][\w$]*
      							\s*
      							\{                                                  # start of class body
      						"
      					openSymbole ="\{"
      					closeSymbole="\}"
      				>
      					<className>
      						<nameExpr expr="(?-i:class)\s+\K[A-Za-z_$][\w$]*" />
      					</className>
      					<function
      						mainExpr="(?x)                                          # free-spacing (see `RegEx - Pattern Modifiers`)
      								\s*(?-i:\bfunction\b)?\s*
      								[A-Za-z_$][\w$]*
      								\s*(?-i:\bfunction\b)?\s*
      								\s*\([^()]*\)                                   # parameters
      								\s*\{                                           # start of function body
      							"
      					>
      						<functionName>
      							<funcNameExpr expr="[A-Za-z_$][\w$]*" />
      						</functionName>
      					</function>
      				</classRange>
      

      With the JavaScript code:

      class Test
      {
          constructor({test=0,test}) {
              ...
          }
      
          someMethod(a,b,c) {
              ...
          }
      }
      class Rectangle
      {
        constructor(height, width) {
          this.height = height;
          this.width = width;
        }
      }
      
      function blah(a,b,c) {
          ...
      }
      

      the Function List showed me:
      [screenshot of the Function List panel]

      … so that tells me it’s at least a reasonable starting point.

      • Stephan Romhart 0 @PeterJones

        @PeterJones Hello Peter, thank you very, very much.

        It is strange. Sometimes the regex seems to work, sometimes not. With your regex definition, some of my scripts work and others don’t.

        So I tried to figure out the two regexes to catch the class and its methods:

        If I test your class regex as a one-liner, it works:
        https://regex101.com/r/UBrx23/1

        If I test your method regex as a one-liner, it also matches all the if, for, etc. statements:
        https://regex101.com/r/eWYfV1/1

        So I updated both regexes to clean them up.
        Class regex:
        https://regex101.com/r/6rHP8q/1

        Method regex:
        https://regex101.com/r/RpV9ph/1

        Now both seem to match correctly, but of my 10 different JS files, only half worked :-)

        Perhaps someone can see what I am missing here…

        Code for copy and paste:

        		<parser
        			displayName="JavaScript"
        			id         ="javascript_function"
        			commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)"
        		>
        			<classRange
        				mainExpr    ="class [A-Za-z_$]*\s*\{"
        				openSymbole ="\{"
        				closeSymbole="\}"
        			>
        				<className>
        					<nameExpr expr="class [A-Za-z_$]*" />
        				</className>
        				<function
        					mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{"
        				>
        					<functionName>
        						<funcNameExpr expr="[A-Za-z_$][\w$]*" />
        					</functionName>
        				</function>
        			</classRange>
        			<function
        				mainExpr="((^|\s+|[;\}\.])([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\s*[=:]|^|[\s;\}]+)\s*function(\s+[A-Za-z_$][\w$]*)?\s*\([^\)\(]*\)[\n\s]*\{"
        			>
        				<functionName>
        					<nameExpr expr="[A-Za-z_$][\w$]*\s*[=:]|[A-Za-z_$][\w$]*\s*\(" />
        					<nameExpr expr="[A-Za-z_$][\w$]*" />
        				</functionName>
        				<className>
        					<nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*\." />
        					<nameExpr expr="([A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*" />
        				</className>
        			</function>
        		</parser>
        
        • PeterJones @Stephan Romhart 0

          @Stephan-Romhart-0 ,

          regex101 does not use the same regular expression engine as Notepad++ uses (N++ uses Boost). Every engine has its own quirks and rules, and writing a regex that works with one does not guarantee it works with another, so just because it works on regex101 doesn’t mean it will work with N++, and vice versa.
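
One concrete difference can be seen with the `\K` ("keep out") escape used in the className expression earlier in this thread. The sketch below uses JavaScript's built-in RegExp purely as an example of a non-Boost engine; it makes no claim about which flavor was selected on regex101. The `(?-i:...)` wrapper is left out here because support for inline modifiers varies between JavaScript runtimes.

```javascript
// Boost/PCRE-style pattern, taken from the className expression in this thread
// (minus the inline modifier group).
const source = "class\\s+\\K[A-Za-z_$][\\w$]*";

// Without the `u` flag, JavaScript reads \K as a literal capital "K",
// so the pattern only matches when the class name happens to start with "K".
const legacy = new RegExp(source);
const matchTest = legacy.exec("class Test"); // no match at all
const matchKite = legacy.exec("class Kite"); // matches, but includes "class "

// With the `u` flag, JavaScript rejects the unknown escape \K outright.
let unicodeModeError = null;
try {
  new RegExp(source, "u");
} catch (err) {
  unicodeModeError = err;
}

console.log(matchTest, matchKite && matchKite[0], unicodeModeError && unicodeModeError.name);
```

In an engine that supports `\K`, the same pattern would report only the class name as the match, which is what the Function List definition relies on.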

          And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.

          As an idea, if my method regex is matching too many keywords and thinking they are methods, you could use the examples from @MAPJe71’s file that I linked, which has a negative lookahead (?!...) to prevent it from thinking that if(...) and for(...) and similar are function names.
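
A minimal sketch of the negative-lookahead idea follows. It is not the expression from @MAPJe71’s file: the keyword list and the names `KEYWORDS`, `methodHeader`, and `isMethodHeader` are made up for illustration. It is checked here with JavaScript's RegExp and uses only constructs that both engines share, but it should still be verified inside Notepad++ before being relied on.

```javascript
// Hypothetical keyword list; extend it to whatever your code base needs.
const KEYWORDS = ["if", "for", "while", "switch", "catch"];

// Method header: identifier, parameter list without nested parentheses,
// then the opening brace. The negative lookahead (?!...) rejects an
// identifier that is exactly one of the keywords (\b ends the keyword).
const methodHeader = new RegExp(
  "^\\s*(?!(?:" + KEYWORDS.join("|") + ")\\b)" +
  "[A-Za-z_$][\\w$]*" +
  "\\s*\\([^()]*\\)" +
  "\\s*\\{"
);

function isMethodHeader(line) {
  return methodHeader.test(line);
}

const lines = [
  "    someMethod(a,b,c) {",
  "    constructor({test=0,test}) {",
  "    format(value) {",
  "    if (a > b) {",
  "    for (let i = 0; i < n; i++) {",
];
for (const line of lines) {
  console.log(isMethodHeader(line), line);
}
```

Names that merely start with a keyword, such as `format`, still pass because the `\b` requires the keyword to end right there.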

          • Stephan Romhart 0 @Stephan Romhart 0

            @Stephan-Romhart-0 I think I figured it out:

            If the last line of the file is the class’s closing “}”, it does not work.

            When I add a line break (Enter) after the “}”, it works.

            So it probably has to do with the closeSymbole?

            • Stephan Romhart 0 @PeterJones

              @PeterJones said in Function list parser for classes:

              And remember, I said mine was a starting point, not the final version. I don’t have the time nor the skill to custom write a complete function list definition that matches all of your requirements. I just thought I’d give you something to help you get started. You will have to put in the effort to improve it to match your own specifications.

              Sorry, I didn’t mean to sound harsh. You have helped me so much!!!

              • PeterJones @Stephan Romhart 0

                @Stephan-Romhart-0 said in Function list parser for classes:

                If the last line of the file is the class’s closing “}”, it does not work.

                Oh, there’s nothing you can do to fix that. That’s just been a long-time limitation of the Function List parser. See the second post in the FAQ for known limitations.
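
The parser itself cannot be changed, but the workaround reported above (a line break after the final “}”) can be applied to file contents automatically. The helper below is only a sketch of that workaround; the function names are invented for illustration and nothing here is part of Notepad++.

```javascript
// True when the text already ends with a line break (\n, \r, or \r\n).
function endsWithLineBreak(text) {
  return /[\r\n]$/.test(text);
}

// Returns the text unchanged if it already ends with a line break,
// otherwise appends `eol` (defaults to "\n"; pass "\r\n" for Windows files).
function ensureTrailingLineBreak(text, eol = "\n") {
  return endsWithLineBreak(text) ? text : text + eol;
}

const source = "class Test\n{\n\tsomeMethod(a,b,c) {\n\t}\n}";
const fixed = ensureTrailingLineBreak(source);
console.log(JSON.stringify(fixed.slice(-2)));
```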

                • Stephan Romhart 0 @PeterJones

                  @PeterJones Thank you again.
                  I will post my final solution in case someone needs the same regexes ;-)

                  • MAPJe71 @Stephan Romhart 0

                    Have a look at the Java parser for inspiration:

                    <?xml version="1.0" encoding="UTF-8" ?>
                    <!-- ==========================================================================\
                    |
                    |   To learn how to make your own language parser, please check the following
                    |   link:
                    |       https://npp-user-manual.org/docs/function-list/
                    |
                    \=========================================================================== -->
                    <NotepadPlus>
                    	<functionList>
                    		<!--
                    		|   Based on:
                    		|       https://community.notepad-plus-plus.org/topic/12691/function-list-with-java-problems
                    		|
                    		|   20161116:
                    		|   - added embedded comment to RegEx;
                    		|   - removed `commentExpr` as it prevents classes and functions
                    		|     from showing in the FunctionList tree when they contain
                    		|     comments and/or literal strings;
                    		|	commentExpr="(?x)                                                   # free-spacing (see `RegEx - Pattern Modifiers`)
                    		|				(?s:                                                    # Multi Line Comment
                    		|					\x2F\x2A{1}                                         # - starts with a forward-slash and one asterisk
                    		|					(?:                                                 # - followed by zero or more characters
                    		|						[^\x2A\x5C]                                     #   ...not an asterisk and not a backslash (i.e. escape character)
                    		|					|	\x2A[^\x2F]                                     #   ...or an asterisk not followed by a forward-slash
                    		|					|	\x5C.                                           #   ...or a backslash followed by any character
                    		|					)*                                                  #
                    		|					\x2A\x2F                                            # - ends with an asterisk and forward-slash
                    		|				)
                    		|			|	(?m-s:\x2F{2}.*$)                                       # Single Line Comment
                    		|			|	(?s:                                                    # JavaDoc Comment
                    		|					\x2F\x2A{2}                                         # - starts with a forward-slash and two asterisk'
                    		|					(?:                                                 # - followed by zero or more characters
                    		|						[^\x2A\x5C]                                     #   ...not an asterisk and not a backslash (i.e. escape character)
                    		|					|	\x2A[^\x2F]                                     #   ...or an asterisk not followed by a forward-slash
                    		|					|	\x5C.                                           #   ...or a backslash followed by any character
                    		|					)*                                                  #
                    		|					\x2A\x2F                                            # - ends with an asterisk and forward-slash
                    		|				)
                    		|			|	(?s:\x22(?:[^\r\n\x22\x5C]|\x5C[^\r\n])*\x22)           # String Literal - Double Quoted, no embedded line-breaks
                    		|			|	(?s:\x27(?:[^\r\n\x27\x5C]|\x5C[^\r\n])*\x27)           # String Literal - Single Quoted, no embedded line-breaks
                    		|		"
                    		|   - 'type name' and 'parent type name(s)' parts in function 'declarator'
                    		|     group/subroutine do not use "(?&amp;VALID_ID)" as it prevents
                    		|     classes and functions from showing in the FunctionList tree;
                    		|   20181130:
                    		|   - Fix for "Function List Omits Java Functions with Spaces Before Closing Parentheses"
                    		|     (https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5085)
                    		\-->
                    		<parser
                    			displayName="Java"
                    			id         ="java_syntax"
                    		>
                    			<classRange
                    				mainExpr    ="(?x)                                              # free-spacing (see `RegEx - Pattern Modifiers`)
                    						(?m)                                                    # ^ and $ match at line-breaks
                    						^[\t\x20]*                                              # optional leading white-space at start-of-line
                    						(?:
                    							(?-i:
                    								abstract
                    							|	final
                    							|	native
                    							|	p(?:rivate|rotected|ublic)
                    							|	s(?:tatic|trictfp|ynchronized)
                    							|	transient
                    							|	volatile
                    							|	@[A-Za-z_]\w*                                   # qualified identifier
                    								(?:                                             # consecutive names...
                    									\.                                          # ...are dot separated
                    									[A-Za-z_]\w*
                    								)*
                    							)
                    							\s+
                    						)*
                    						(?-i:class|enum|@?interface)
                    						\s+
                    						(?'DECLARATOR'
                    							(?'VALID_ID'                                        # valid identifier, use as subroutine
                    								\b(?!(?-i:                                      # keywords (case-sensitive), not to be used as identifier
                    									a(?:bstract|ssert)
                    								|	b(?:oolean|reak|yte)
                    								|	c(?:ase|atch|har|lass|on(?:st|tinue))
                    								|	d(?:efault|o(?:uble)?)
                    								|	e(?:lse|num|xtends)
                    								|	f(?:inal(?:ly)?|loat|or)
                    								|	goto
                    								|	i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                    								|	long
                    								|	n(?:ative|ew)
                    								|	p(?:ackage|rivate|rotected|ublic)
                    								|	return
                    								|	s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                    								|	th(?:is|rows?)|tr(?:ansient|y)
                    								|	vo(?:id|latile)
                    								|	while
                    								)\b)
                    								[A-Za-z_]\w*                                    # valid character combination for identifiers
                    							)
                    							(?:
                    								\s*\x3C                                         # start-of-template indicator...
                    								(?'GENERIC'                                     # ...match first generic, use as subroutine
                    									\s*
                    									(?:
                    										(?&amp;DECLARATOR)                      # use named generic
                    									|	\?                                      # or unknown
                    									)
                    									(?:                                         # optional type extension
                    										\s+(?-i:extends|super)
                    										\s+(?&amp;DECLARATOR)
                    										(?:                                     # multiple bounds...
                    											\s+\x26                             # ...are ampersand separated
                    											\s+(?&amp;DECLARATOR)
                    										)*
                    									)?
                    									(?:                                         # match consecutive generics objects...
                    										\s*,                                    # ...are comma separated
                    										(?&amp;GENERIC)
                    									)?
                    								)
                    								\s*\x3E                                         # end-of-template indicator
                    							)?
                    							(?:                                                 # package and|or nested classes...
                    								\.                                              # ...are dot separated
                    								(?&amp;DECLARATOR)
                    							)?
                    						)
                    						(?:                                                     # optional object extension
                    							\s+(?-i:extends)
                    							\s+(?&amp;DECLARATOR)
                    							(?:                                                 # consecutive objects...
                    								\s*,                                            # ...are comma separated
                    								\s*(?&amp;DECLARATOR)
                    							)*
                    						)?
                    						(?:                                                     # optional object implementation
                    							\s+(?-i:implements)
                    							\s+(?&amp;DECLARATOR)
                    							(?:                                                 # consecutive objects...
                    								\s*,                                            # ...are comma separated
                    								\s*(?&amp;DECLARATOR)
                    							)*
                    						)?
                    						\s*\{                                                   # whatever, until start-of-body indicator
                    					"
                    				openSymbole ="\{"
                    				closeSymbole="\}"
                    			>
                    				<className>
                    					<nameExpr expr="(?-i:class|enum|@?interface)\s+\K\w+(?:\s*\x3C.*?\x3E)?" />
                    				</className>
                    				<function
                    					mainExpr="(?x)                                              # free-spacing (see `RegEx - Pattern Modifiers`)
                    							^[\t\x20]*                                          # optional leading white-space at start-of-line
                    							(?:
                    								(?-i:
                    									abstract
                    								|	final
                    								|	native
                    								|	p(?:rivate|rotected|ublic)
                    								|	s(?:tatic|trictfp|ynchronized)
                    								|	transient
                    								|	volatile
                    								|	@[A-Za-z_]\w*                               # qualified identifier
                    									(?:                                         # consecutive names...
                    										\.                                      # ...are dot separated
                    										[A-Za-z_]\w*
                    									)*
                    								)
                    								\s+
                    							)*
                    							(?:
                    								\s*\x3C                                         # start-of-template indicator
                    								(?&amp;GENERIC)
                    								\s*\x3E                                         # end-of-template indicator
                    							)?
                    							\s*
                    							(?'DECLARATOR'
                    								[A-Za-z_]\w*                                    # (parent) type name
                    								(?:                                             # consecutive sibling type names...
                    									\.                                          # ...are dot separated
                    									[A-Za-z_]\w*
                    								)*
                    								(?:
                    									\s*\x3C                                     # start-of-template indicator
                    									(?'GENERIC'                                 # match first generic, use as subroutine
                    										\s*
                    										(?:
                    											(?&amp;DECLARATOR)                  # use named generic
                    										|	\?                                  # or unknown
                    										)
                    										(?:                                     # optional type extension
                    											\s+(?-i:extends|super)
                    											\s+(?&amp;DECLARATOR)
                    											(?:                                 # multiple bounds...
                    												\s+\x26                         # ...are ampersand separated
                    												\s+(?&amp;DECLARATOR)
                    											)*
                    										)?
                    										(?:                                     # consecutive generics objects...
                    											\s*,                                # ...are comma separated
                    											(?&amp;GENERIC)
                    										)?
                    									)
                    									\s*\x3E                                     # end-of-template indicator
                    								)?
                    								(?:                                             # package and|or nested classes...
                    									\.                                          # ...are dot separated
                    									(?&amp;DECLARATOR)
                    								)?
                    								(?:                                             # optional compound type...
                    									\s*\[                                       # ...start-of-compound indicator
                    									\s*\]                                       # ...end-of-compound indicator
                    								)*
                    							)
                    							\s+
                    							(?'VALID_ID'                                        # valid identifier, use as subroutine
                    								\b(?!(?-i:                                      # keywords (case-sensitive), not to be used as identifier
                    									a(?:bstract|ssert)
                    								|	b(?:oolean|reak|yte)
                    								|	c(?:ase|atch|har|lass|on(?:st|tinue))
                    								|	d(?:efault|o(?:uble)?)
                    								|	e(?:lse|num|xtends)
                    								|	f(?:inal(?:ly)?|loat|or)
                    								|	goto
                    								|	i(?:f|mp(?:lements|ort)|nstanceof|nt(?:erface)?)
                    								|	long
                    								|	n(?:ative|ew)
                    								|	p(?:ackage|rivate|rotected|ublic)
                    								|	return
                    								|	s(?:hort|tatic|trictfp|uper|witch|ynchronized)
                    								|	th(?:is|rows?)|tr(?:ansient|y)
                    								|	vo(?:id|latile)
                    								|	while
                    								)\b)
                    								[A-Za-z_]\w*                                    # valid character combination for identifiers
                    							)
                    							\s*\(                                               # start-of-parameters indicator
                    							(?'PARAMETER'                                       # match first parameter, use as subroutine
                    								\s*(?-i:final\s+)?
                    								(?&amp;DECLARATOR)
                    								\s+(?&amp;VALID_ID)                             # parameter name
                    								(?:                                             # consecutive parameters...
                    									\s*,                                        # ...are comma separated
                    									(?&amp;PARAMETER)
                    								)?
                    							)?
                    							\s*\)                                               # end-of-parameters indicator
                    							(?:                                                 # optional exceptions
                    								\s*(?-i:throws)
                    								\s+(?&amp;VALID_ID)                             # first exception name
                    								(?:                                             # consecutive exception names...
                    									\s*,                                        # ...are comma separated
                    									\s*(?&amp;VALID_ID)
                    								)*
                    							)?
                    							[^{;]*\{                                            # start-of-function-body indicator
                    						"
                    				>
                    					<functionName>
                    						<funcNameExpr expr="\w+(?=\s*\()" />
                    					</functionName>
                    				</function>
                    			</classRange>
                    		</parser>
                    	</functionList>
                    </NotepadPlus>
                    
                    • Stephan Romhart 0 @MAPJe71

                      @MAPJe71 Thank you very much, I will study it :-)

                      • Stephan Romhart 0 @Stephan Romhart 0

                        @Stephan-Romhart-0
                        My final JavaScript class parser, which I now use for work, is the following:

                        <parser displayName="JavaScript" id="javascript_function" commentExpr="(?s:/\*.*?\*/)|(?m-s://.*?$)">
                        			<classRange mainExpr="class [A-Za-z_$]*\s*\{" openSymbole ="\{" closeSymbole="\}">
                        				<className>
                        					<nameExpr expr="Klasse: [A-Za-z_$]*" />
                        				</className>
                        				<function mainExpr="^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{">
                        					<functionName>
                        						<funcNameExpr expr="[A-Za-z_$][\w$]*" />
                        					</functionName>
                        				</function>
                        			</classRange>
                        </parser>
                        

                        It works for me at the moment, because I only use JS classes defined with a class declaration.
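
For anyone reusing this, it may help to see which method headers the function expression above accepts. The sketch below runs that same expression against a few sample lines using JavaScript's RegExp, one line at a time. The sample lines are made up, and the result is only an approximation of what Notepad++ does, although the expression uses nothing engine-specific.

```javascript
// The method expression from the parser above, unchanged.
const finalMethodExpr = /^\t[A-Za-z_$]*\([a-zA-Z,=0-9]*\)\s*\{/;

function matchesFinalMethodExpr(line) {
  return finalMethodExpr.test(line);
}

// Hypothetical sample lines; \t is a real tab character.
const samples = [
  "\tsomeMethod(a,b,c) {",           // tab indent, no spaces in parameters
  "\tif(ready) {",                   // keywords are not excluded
  "    someMethod(a,b,c) {",         // indented with spaces instead of a tab
  "\tconstructor(height, width) {",  // space inside the parameter list
  "\tconstructor({test=0,test}) {",  // braces inside the parameter list
  "\tmethod2(a) {",                  // digit in the method name
];

for (const line of samples) {
  console.log(matchesFinalMethodExpr(line), JSON.stringify(line));
}
```

Under these assumptions only the first two lines match, so files indented with spaces, or with spaces after the commas in parameter lists, would show no methods with this expression.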
