Purpose of <nameExpr> in functionList.xml

rossjparker

Hi gang - I might be missing something simple so thought it worthy of a quick question.

I’ve successfully defined a custom language and added a parser for pulling out functions in functionList.xml. It is displaying the functions nicely - but I am trying to tweak sorting the functions.

We’re using the convention of starting the name with an underscore (_) if the function is a ‘private’ function that should only be called within that module (file), and without an underscore if the function is ‘public’. The functions are freely mixed in order in the module.

What I’d like to do is be able to sort the function list alphanumerically ignoring the leading underscore of the function name (if present).

I thought that’s what <nameExpr> would allow me to do - allow me to match just the function name ignoring the leading underscore (if present) - but it doesn’t seem to work like that. Am I missing something fundamental?

My parser in functionList.xml:

		<parser
			displayName="CiCode"
			id         ="cicode_function"
			commentExpr="(?x)(?m-s:;(?!\$PATH).*?$)"
		>
			<function
				mainExpr="(?x)
							(?:FUNCTION\s+)\K\w+\(.*?\)"
			/>
				<functionName>
					<nameExpr expr="\_*\K.*" />
				</functionName>
		</parser>

The “_*\K” is what I think should be discarding the leading underscore (if present) from the function name. (In fact it will discard any number of leading underscores - which is fine.)

Some sample code of a function with leading underscore:

INT
FUNCTION
_KRN_Alarm_InitCountQue(STRING sAlarmGroup = "")
    INT		iQueTmp;
    INT		iType;
    INT		iData;

And without a leading underscore:

INT
FUNCTION
KRN_Alarm_AcknowledgeAllEnabled(INT iMonitor = -1)
    INT		bEnabled;

When I show the function list and sort it alphanumerically all the functions with leading underscores appear before the functions without leading underscores. They should be mixed without regard for the leading underscore.

Have I misunderstood what <nameExpr> is used for? Or have I got the regex wrong?

Notepad++ v7.7.1

guy038

Hello, @rossjparker and All,

I’m not a specialist of XML/HTML but I think that you could simplify your parser as below :

		<parser 
			displayName="CiCode"
			id         ="cicode_function"
			commentExpr="(?-si);(?!\$PATH).*"
		>
			<function
				mainExpr="(?-i)FUNCTION\s+\K\w+?\(.*?\)"
			>
				<functionName>
					<nameExpr expr="_?\K.+" />
				</functionName>
			</function>
		</parser>

Note that I changed all the regexes, as well as the end of the line after mainExpr, from /> to >

Now, if you do not have any additional extraction, from the result of the mainExpr regex, you could even use the minimum syntax, below :

		<parser	displayName="CiCode" id="cicode_function" commentExpr="(?-si);(?!\$PATH).*" >
			<function mainExpr="(?-i)FUNCTION\s+_?\K\w+?\(.*?\)" />
		</parser>

I tested it in a simple text file containing your examples of functions, with the syntax :

			<association id="cicode_function" langID="0" />   <!-- L_TEXT -->

added in the associationMap node

and… everything went fine ;-))
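For a user-defined language, rather than plain text, the association would presumably be keyed on the UDL name instead of a langID. A minimal sketch, assuming the language was saved under the name "CiCode" (a hypothetical name, not given in this thread) and that the installed functionList.xml accepts the userDefinedLangName attribute described in its own header comments:

```xml
<associationMap>
	<!-- Built-in languages are matched by numeric langID (0 = normal text, as in the test above) -->
	<association id="cicode_function" langID="0" />
	<!-- Assumption: the UDL is named "CiCode"; adjust to the name shown in the Language menu -->
	<association id="cicode_function" userDefinedLangName="CiCode" />
</associationMap>
```

Only one of the two lines is needed in practice; both are shown here for comparison.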

I assumed that the key word FUNCTION is always uppercase. If that is not the case, simply change the (?-i) modifier to (?i), before the word FUNCTION, in the regex !
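As a sketch of that change, here is the minimum-syntax parser from above with only the case modifier swapped. It has not been tested beyond what is described in this thread:

```xml
<parser	displayName="CiCode" id="cicode_function" commentExpr="(?-si);(?!\$PATH).*" >
	<!-- (?i) lets FUNCTION, function, Function, ... all introduce a function name -->
	<function mainExpr="(?i)FUNCTION\s+_?\K\w+?\(.*?\)" />
</parser>
```

The (?-si) in commentExpr is left alone, so $PATH is still compared case-sensitively.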

Best Regards,

guy038

rossjparker

@guy038 - thanks so much for taking the time to reply. You sorted my problem out!

It was the line where you said:

Note that I changed all the regexes, as well as the end of the line after mainExpr, from /> to >

That was the key for me. I completely failed to spot my error: I terminated the <function> element prematurely by closing the tag with />, meaning the subsequent <functionName> and <nameExpr> elements were not being evaluated at all! Hence my confusion about the purpose of <nameExpr>, as it didn’t seem to matter what I put in there; it had no effect on the function parser.
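For anyone skimming for the same symptom, the difference can be reduced to the two shapes below. This is only an illustrative sketch; the "..." values are placeholders, not working expressions:

```xml
<!-- Broken: "/>" closes <function> immediately, so <functionName>
     becomes a sibling of <function> and is not evaluated. -->
<function mainExpr="..." />
<functionName>
	<nameExpr expr="..." />
</functionName>

<!-- Fixed: ">" leaves <function> open, so <functionName> is its child
     and <nameExpr> is applied to whatever mainExpr matched. -->
<function mainExpr="...">
	<functionName>
		<nameExpr expr="..." />
	</functionName>
</function>
```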

For the record, here is what my parser looks like now:

		<parser
			displayName="CiCode"
			id         ="cicode_function"
			commentExpr=""
		>
			<function
				mainExpr="(?x)												# free-spacing for commenting
							(?-i)											# ignore case
							FUNCTION\s+										# CiCode functions marked by keyword 'function' (any case)
							\K												# discard everything matched so far - 'function' keyword not part of function name
							\w+?											# function name itself (non-greedy match)
							\(.*?\)											# function parameter list surrounded by brackets
						"
			/>
		</parser>

Everything is working as I expect it to now - the function list is displaying nicely. I wasn’t actually able to achieve my original goal (display function names with the leading underscore (if present), but sort them alpha-numerically without regard for any leading underscores), but that’s fine - I can live with that. :)

Thank you also for the case-insensitive tip - yes it can be written as function or FUNCTION.

Interesting footnote: you will see the commentExpr string is empty. I had originally just copied it from another parser section as a starting template and not modified it. I don’t need it for my work (I have never seen CiCode files with comments between the FUNCTION keyword and the function name, although technically it is possible). But if you delete the commentExpr line entirely the whole parser stops working!

This is not what the documentation on https://notepad-plus-plus.org/features/function-list.html says:

comment(sic): Optional. you can make a RE in this attribute in order to identify comment zones. The identified zones will be ignored by search.

Who maintains the documentation pages? Is there any way for the community to contribute and fix errors on the documentation pages?

Thank you once again @guy038.

guy038

Hi, @rossjparker and All,

Pleased that everything went right ;-)) I just want to add two remarks !

First, I would like to point out that :

The (?-i) modifier means that the subsequent part of the regex is treated in a NON-insensitive way ( So, case-sensitive ! )
Conversely, the (?i) modifier means that the subsequent part of the regex is treated in a case-insensitive way

So your line, regarding case, should be, as below :

                            (?i)                                           # ignore case : (i)nsensitive

Now, you said :

(I have never seen CiCode files with comments between the FUNCTION keyword and the function name, although technically it is possible)

But, on the contrary, the commentExpr attribute prevents any function embedded inside a comment from being detected !

For instance; let’s imagine this sample text, below :

INT
FUNCTION
_KRN_Alarm_InitCountQue(STRING sAlarmGroup = "")
    INT     iQueTmp;
    INT     iType;
    INT     iData;

; ----------------------------------------------------------
INT   ;   FUNCTION _KRN_Test_Ross() is NOT used anymore !
; ----------------------------------------------------------

INT
function
KRN_Alarm_AcknowledgeAllEnabled(INT iMonitor = -1)
    INT     bEnabled;

With your present line, below :

            commentExpr=""

You should see 3 function names, displayed in the Function List panel, although one of them is located inside a comment line

Whereas, with the line, below :

            commentExpr="(?-si);(?!\$PATH).*"

Only 2 functions are, correctly, displayed, as expected !

Of course, if, in the sample text, you delete only the comment symbol ; in the line between the two comment lines of dashes, you should see the function _KRN_Test_Ross() again !

Note that the present comment regex, (?-si);(?!\$PATH).*, means that a comment zone :

Is a single-line zone, due to the (?-s) modifier
Begins with a semi-colon ; symbol, at any position of the current line
Is followed by a range, possibly empty, ( .* ) of standard characters (anything but line breaks), due to the (?-s) modifier
But ONLY IF the string $PATH, with this exact case (?-i), does not appear right after the ;, due to the negative look-ahead (?!\$PATH)
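The breakdown above can also be written in the commented free-spacing style used elsewhere in this thread. This is a sketch that is intended to be equivalent to (?-si);(?!\$PATH).* but has not been tested separately; the mainExpr is simply the case-insensitive one discussed earlier:

```xml
<parser
	displayName="CiCode"
	id         ="cicode_function"
	commentExpr="(?x)					# free-spacing for commenting
				(?-s)					# dot does not match line breaks: the zone stays on one line
				(?-i)					# case-sensitive: only the exact string $PATH is excluded
				;						# a semi-colon, at any position of the line
				(?!\$PATH)				# negative look-ahead: not if $PATH follows immediately
				.*						# rest of the line, possibly empty
			"
>
	<function mainExpr="(?i)FUNCTION\s+\K\w+?\(.*?\)" />
</parser>
```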

Cheers

guy038

rossjparker

Thanks once again @guy038. Your valuable suggestions/corrections have been incorporated once again:

		<parser
			displayName="CiCode"
			id         ="cicode_function"
			commentExpr="(?x)												# free-spacing for commenting
							(?s:\x2F\x2A.*?\x2A\x2F)						# Multi Line Comment: /* ... */
						|	(?m-s:\x2F{2}.*$)								# Single Line Comment: // ...
						|	(?m-s:\!.*$)									# Single Line Comment: ! ...
						|	(?s:\x22(?:[^\x22\x5C]|\x5C.)*\x22)				# String Literal - Double Quoted
						|	(?s:\x27(?:[^\x27\x5C]|\x5C.)*\x27)				# String Literal - Single Quoted
						"
		>
			<function
				mainExpr="(?x)												# free-spacing for commenting
							(?i)											# ignore case
							FUNCTION\s+										# CiCode functions marked by keyword 'function' (any case)
							\K												# discard everything matched so far - 'function' keyword not part of function name
							\w+?											# function name itself (non-greedy match)
							\(.*?\)											# function parameter list surrounded by brackets
						"
			/>
		</parser>

This is my latest version of the parser with a proper commentExpr. (CiCode supports C-style comments, with the addition of !<comment-to-end-of-line> being synonymous with //<comment-to-end-of-line>.)

One of my files does indeed have several commented functions - these are now correctly ignored. Thanks again.