Trouble making functionlist parser for user defined language MOX

bookwyrm12

I have a custom user defined language called MOX. I am trying to add the parser for it in the functionList.xml. The problem is that it is only showing the first function in my file.

I’m sure I’m just missing something basic, but I’m not sure if it’s in my regex or in my understanding of how the NP++ function list works.

My additions to functionList.xml:

<association id=	"mox_syntax"				userDefinedLangName="MOX"		/>
<association id=	"mox_syntax"				ext=".mox"						/>

<!-- ========================================================= [ MOX ] -->
			
<parser
	id         ="mox_syntax"
	displayName="MOX"
>
	<function
		mainExpr="(^|\s+)(Function|Method|Macro) (\w+)(\.\w+)*\((.*)\)"
	>
		<functionName>
			<nameExpr expr="([A-Za-z_]?\w*(\.\w+)*\s*\()" />
		</functionName>
	</function>
</parser>

Sample MOX code:

Method Blah1()
	[New] dynamicVar = "string"
	[New] dynamicVar2 = 4
	[New] result = Bar2(dynamicVar, dynamicVar2)
End Method

Function Bar2(pParam3, pParam2)
	Foo3
	Return pParam3 && pParam2
End Function

Macro Foo3()
End Macro

Function list display:

sample.mox
  |--->  Blah1(

MAPJe71

@bookwyrm12

Based on your parser and source example, I presumed that the name of a function, method, or macro can contain (base/parent) class name(s).
I adapted your parser and added some comments.

<parser
    displayName="MOX"
    id         ="mox_syntax"
>
    <function
        mainExpr="(?x)                       # free-spacing (see `RegEx - Pattern Modifiers`)
                (?m)                         # ^ and $ match at line breaks
                ^\h*                         # optional leading whitespace at start-of-line
                (?i:FUNCTION|METHOD|MACRO)   # start-of indicator, case-insensitive
                \s+                          # trailing whitespace required
                \K                           # discard text matched so far
                (?:(?&amp;VALID_ID)\.)*      # optional class identifier prefix
                (?'VALID_ID'                 # valid identifier, use as subroutine
                    [A-Za-z_]\w*             # valid character combination for identifiers
                )
                \s*\(                        # start-of-parameter-list indicator
                [^)]*                        # parameter list
                \)                           # end-of-parameter-list indicator
            "
    >
        <functionName>
            <nameExpr expr="(?x)             # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?:(?&amp;VALID_ID)\.)*  # optional class identifier prefix
                    \K                       # discard text matched so far
                    (?'VALID_ID'             # valid identifier, use as subroutine
                        [A-Za-z_]\w*         # valid character combination for identifiers
                    )
                "
            />
        </functionName>
        <className>
            <nameExpr expr="(?x)             # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?'VALID_ID'             # valid identifier, use as subroutine
                        [A-Za-z_]\w*         # valid character combination for identifiers
                    )
                    (?:\.(?&amp;VALID_ID))*  # optional sub-class-name(s)
                    (?=\.)                   # exclude last (sub-)class-method-name separator
                "
            />
        </className>
    </function>
</parser>
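For placement, here is a minimal sketch of where the pieces sit. It assumes the single-file functionList.xml layout, with an associationMap section and a parsers section inside functionList. The association lines are the ones from the question, and the parser body is reduced to a placeholder comment.

```xml
<NotepadPlus>
    <functionList>
        <associationMap>
            <!-- Link the user defined language and the file extension to the parser id. -->
            <association id="mox_syntax" userDefinedLangName="MOX" />
            <association id="mox_syntax" ext=".mox" />
        </associationMap>
        <parsers>
            <!-- The id must equal the id used in the association entries above. -->
            <parser
                displayName="MOX"
                id         ="mox_syntax"
            >
                <!-- Placeholder: the function element with its functionName and className children goes here. -->
            </parser>
        </parsers>
    </functionList>
</NotepadPlus>
```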

bookwyrm12

@MAPJe71

Thank you so much, that’s brilliant! (And yes, correct assumption regarding base class names.)

Looking at other parsers I’ve noticed the commentExpr attribute, and I was wondering what that does? Does it allow functions that are commented out to still show up in the function list? Or does it do the opposite?

MAPJe71

@bookwyrm12

The commentExpr attribute is intended to have the parser discard the text it matches.
See also my comment in this topic.

bookwyrm12

@MAPJe71

So, for example, if this language only has single line comments denoted by a single quote at the beginning of the line, like so:

Method Blah()
    'Single line comment
End Method

'Commented out method:
'Method Blah2()
'End Method

Then it’s unnecessary to add the commentExpr attribute, because the 'Method Blah2() line doesn’t match the function parser anyway (it only matches at the beginning of a line or after leading whitespace, so a line starting with any other character doesn’t match)… right?

MAPJe71

@bookwyrm12

“because it’s matching on the beginning of a line or leading white space”

That’s the explanation of the (^|\s+)(Function... in the main expression of your parser (presuming the ^ matches at line breaks):
^ : match at the beginning of a line
| : OR
\s+ : leading white space (e.g. space, tab, newline, carriage return, vertical tab)
Function... : followed by the keyword(s)

But without the commentExpr attribute it can still match commented-out functions, e.g.:

'   Function ExampleFunction()
    End Function

The ExampleFunction() does not match ^Function... but it does match the alternative expression track \s+Function....

With “(?m-s)^'.*$” set as the commentExpr attribute, the parser discards all lines starting with a single quote and thus does not match the above ExampleFunction().
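As a sketch of where that goes: commentExpr is an attribute of the parser element. The fragment below only shows the added attribute and leaves the function element as a placeholder. It assumes that lines starting with a single quote are the only comment form in MOX.

```xml
<parser
    displayName="MOX"
    id         ="mox_syntax"
    commentExpr="(?m-s)^'.*$"
>
    <!-- Assumption: MOX only has line comments that start with a single quote at the beginning of the line. -->
    <!-- Placeholder: the function element stays unchanged. -->
</parser>
```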

The explanation of the (?m)^\h*(Function... in the main expression of my parser:
(?m) : have the ^ match at line breaks
^ : match at the beginning of line AND (i.e. one expression track)
\h* : optional leading white space (i.e. space, tab)
Function... : followed by the keyword(s)
This will prevent the above ExampleFunction() from being a match even without the commentExpr attribute set.

bookwyrm12

@MAPJe71

First of all, thanks so much for all of your help!
I have a couple more questions:

Does the Notepad++ function list only support one level of hierarchy? (I’m sort of assuming this is true, but if it does support multiple levels that would be awesome!)
I.e., if I have functions in this sort of format:

Method Blah()
    [New] dynamicVar = "string"
    [New] dynamicVar2 = 4
    [New] result = Blah.Bar(dynamicVar, dynamicVar2)
End Method

Function Blah.Bar(pParam3, pParam2)
    Blah.Bar.Foo1
    Return pParam3 && pParam2
End Function

Method Blah.Bar.Foo1()
End Method

Method Blah.Bar.Foo2()
End Method

Macro Blah.Bar.Foo3()
End Macro

Method Blah.Bar.Foo3.Test1()
End Method

Method Blah.Bar.Foo3.Test2()
End Method

…I get a functionlist that looks like this:

sample.mox
  |--> Blah
        |--> Bar
  |--> Blah.Bar
        |--> Foo1
        |--> Foo2
        |--> Foo3
  |--> Blah.Bar.Foo3
        |--> Test1
        |--> Test2

I’m looking into getting this language parser into the Notepad++ functionlist source code. How can I do this?

MAPJe71

@bookwyrm12

On the hierarchy: actually two levels are supported, class and method/function (using C++ terminology).
On getting the parser into Notepad++: either with a pull request to the Notepad++ GitHub repository, or I could add it to my Function List Update 3.

I get: