Function List does not ignore commented class' opening and closing symbols



  • Hi,

    I’m trying to add a custom parser to the functionList.xml file used by the ‘Function List’ feature, in order to parse functions in a UDL.
    I’ve been able to add the parser and get it working the way I want (after giving up on supporting nested functions/classes, since the Function List engine does not support them, as stated here).

    However, I’ve run into a problem I can’t find a way to solve.
    The language I’m trying to parse, AHK, uses curly braces to delimit functions and classes. A class is defined with the keyword “class” followed by the class name and, optionally, the keyword “extends” followed by the name of the parent class. Functions and methods are defined by writing their name, followed by parentheses containing their arguments (if any) and then the body enclosed in curly braces, as shown here:

    class className extends parentName {
        ...
    }
    myFunction() {
        ...
    }
    

    However, curly braces can also appear in strings (enclosed in double quotes; only single-line strings are supported) and in comments, in which case they have no special meaning as block delimiters.
    So if I specify “\{” as the openSymbole attribute and “\}” as the closeSymbole attribute of the classRange element in my parser in functionList.xml, the parser fails miserably on code like this:

    class myClass {
        myMethod() {
            return "}"
        }
        myOtherMethod() {
            ...
        }
    }
    myGlobalFunction() {
        ...
    }
    

    The Function List will think that class myClass ends at the closing brace inside the string returned on the third line of the example, and will therefore classify myOtherMethod (and any subsequent methods) as global functions instead.
    The results are much worse if the brace inside the string is an opening one: the RE then fails to find the end of the class range before reaching EOF, and the class is ignored completely.

    I’ve tried adding quoted strings to the parser’s commentExpr attribute, which in theory should make the parser ignore braces inside strings, but it doesn’t seem to have any effect (yes, I’ve already saved the changes to the XML file and restarted Notepad++).

    The header of my parser looks like this:

    <parser id="ahk_syntax" displayName="AHK Class"
    		commentExpr="((/\*.*?\*/|(^|\s);.*?$)\x22[^\n\r\x22]*\x22)">
    	<!--\x22 stands for a double quote, since it's a special XML character-->
    	<classRange
    		mainExpr="^[\t ]*class[\t ]+\K\w+([\t ]+extends[\t ]+\w+)?\s*\{"
    		openSymbole="\{"
    		closeSymbole="\}"
    		displayMode="class $className">
    		<className>
    			<nameExpr expr="\w+"/>
    		</className>
    		<function
    			mainExpr="(?x)..."
    			...
    

    Does anyone know if this is intended behaviour, or a bug? Or perhaps I’m doing something wrong here?

    I would appreciate any comments, tips or help.



  • @David-Ignacio-Alcántara-García

    1. Your commentExpr is wrong.
    2. The displayMode attribute of classRange is not used, so it can be removed.

    ad.1. Please try this one:

    				commentExpr="(?x)                                               # free-spacing (see `RegEx - Pattern Modifiers`)
    							(?:                                                 # Multi Line Comment
    								(?ms)                                           # - ^, $ and dot match at line-breaks
    								^\x2F\x2A                                       # - start-of-comment indicator at start-of-line
    								.*?                                             # - whatever, until
    								^\x2A\x2F                                       # - end-of-comment indicator at start-of-line
    							)
    						|	(?m-s:;.*$)                                         # Single Line Comment
    						|	(?:\x22[^\r\n\x22]*\x22)                            # String Literal - Double Quoted
    					"
    


  • Oh my… I didn’t type the disjunction pipe (|) between the single line comment’s RE and the string literals’ one. How awful of me…
    Thank you very much, @MAPJe71, yours works flawlessly. You’ve even added the requirement that multi-line comment delimiters be at the start of a line, which I had also missed.
    However, although the AHK documentation doesn’t say so explicitly, multi-line comments allow arbitrary indentation before their delimiters, so I’ve added that as well; a minor change.

    Also, the single-line comment delimiter in this language must not be preceded by a non-whitespace character, so I’ve added the assertion (?<!\S) too, which is clearer than the (^|\s) I originally used (which is not even an assertion, but does the trick).
    As for the flags, I don’t think the multi-line or single-line (dot-matches-newline) flags need to be set explicitly in the expression, since both are enabled by default in the parser (I’ve tested it), and the negated single-line flag can be avoided by using the non-greedy wildcard, as I originally did. Still, the explicit flags make the purpose of the RE clear, and your RE would keep working even if the Function List engine changed its defaults, so I’ll use them as well.
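    For reference, a commentExpr combining the changes described above (optional indentation before the multi-line delimiters, and the (?<!\S) assertion before the single-line delimiter) might look like the sketch below. It is a reconstruction built on @MAPJe71’s expression, not the exact expression used in the final parser. A space is written as \x20 so it can’t be swallowed by free-spacing mode, and because a literal < is not allowed inside an XML attribute value, the lookbehind has to be written as (?&lt;!\S).

    ```xml
    commentExpr="(?x)                                 # free-spacing mode
            (?:                                       # Multi Line Comment
                (?ms)                                 # - ^, $ and dot match at line-breaks
                ^[\t\x20]*\x2F\x2A                    # - start-of-comment indicator, optionally indented
                .*?                                   # - whatever, until
                ^[\t\x20]*\x2A\x2F                    # - end-of-comment indicator, optionally indented
            )
        |   (?m-s:(?&lt;!\S);.*$)                     # Single Line Comment, ';' at line start or after whitespace
        |   (?:\x22[^\r\n\x22]*\x22)                  # String Literal - Double Quoted
        "
    ```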

    I should probably have written the commentExpr and the classRange’s mainExpr with the extended (free-spacing) flag, like the other, larger ones, and then this wouldn’t have happened.
    Sorry for posting over nothing; it was late at night and I was losing all hope. Thank you for your quick response. ^^

    Also, regarding the displayMode attribute: I knew it wasn’t used yet, but I had read that it was reserved for future use, so I tried to guess a value that might work in the future. Now that you mention it, I’ve checked the other built-in parsers in the file and none of them include it, so I guess the feature may have been dropped, and I’ve removed it from mine as well. Thanks.
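    Applying both suggestions (the free-spacing flag and dropping displayMode), the classRange part of the header above could be written as in the following sketch. It is meant to match the same text as the original one-line mainExpr; the only other change is that the optional extends group is non-capturing, and the function element stays elided as in the original header.

    ```xml
    <classRange
        mainExpr="(?x)                          # free-spacing mode
                ^[\t\x20]*                      # optional indentation
                class[\t\x20]+                  # 'class' keyword
                \K                              # discard the text matched so far
                \w+                             # class name
                (?:                             # optional parent class
                    [\t\x20]+extends[\t\x20]+
                    \w+
                )?
                \s*\{                           # opening brace of the class body
            "
        openSymbole="\{"
        closeSymbole="\}"
    >
        <className>
            <nameExpr expr="\w+"/>
        </className>
        <!-- function element(s) as before -->
    </classRange>
    ```

    Note that in free-spacing mode an unescaped # starts a comment, so a literal # in a pattern would have to be written as \# (or \x23).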

    By the way, I’ve noticed that the commentExpr has no effect on global functions, although it does apply to methods within classes. It’s not a problem at all, but it surprised me, so I’ll leave a note here in case someone else is puzzled by it. In code like this:

    class myClass {
    /*
        myMethod() ...
    */
    }
    /*
    myFunction() ...
    */
    

    myFunction is found by the parser even though it is commented out, whereas myMethod, which is commented out as well, is not (regardless of the indentation of the comment delimiters).