Need help with functionlist regex

MaDill

Hi,

For a user-defined language I am trying to use the Function List. This already works for the subs/functions, but I would also like to see some special markers/info lines in the function list.

This is an example:

'---------
' TestFile
'---------

'******************
'# blabla
'* << Infoline 1 >>
'******************
'-------------------------------------------------------------------------------
'TestSub1
'-------------------------------------------------------------------------------
Sub TestSub1()



End Sub 'TestSub1


'-------------------------------------------------------------------------------
'TestSub2
'-------------------------------------------------------------------------------
Sub TestSub2()



End Sub 'TestSub2


'* Info Line 2! balbla

'-------------------------------------------------------------------------------
'TestFunction
'-------------------------------------------------------------------------------
Function TestFunction() As Integer

	TestFunction =

End Function 'TestFunction

With this parser:

			<function
				mainExpr="^\s*(sub|function)\s+\K\w+"
			>
			</function>

I get this list at the moment:

TestSub1
TestSub2
TestFunction

What I want is

<< Infoline 1 >>
TestSub1
TestSub2
Info Line 2! balbla
TestFunction

I tried this one

			<function
				mainExpr="^\s*((sub|function)\s+\K\w+|('*)\s+\K.*)"
			>
			</function>

but without success; nothing shows up in the list anymore. Can someone help me with this? Thank you.

MAPJe71

Try:

            <function
                mainExpr="^\s*((sub|function)\s+\K\w+|'\*\s+\K.*)"
            >
            </function>

The * is a special character in regexes; you need to escape it to match it literally.
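
To illustrate the escaping point, here is a small sketch in Python. It rests on assumptions: Python's re module stands in for the regex engine used by Notepad++, and because re has no \K, only the prefix of each pattern is checked. The names unescaped, escaped, info_line and code_line are made up for this illustration.

```python
import re

# Unescaped: '* means "zero or more single quotes", so the prefix
# matches even a line that contains no quote and no asterisk at all.
unescaped = re.compile(r"^\s*'*\s+")
# Escaped: '\* means "a single quote followed by a literal asterisk".
escaped = re.compile(r"^\s*'\*\s+")

info_line = "'* Info Line 2! balbla"
code_line = "    TestFunction ="

unescaped_hits_code = unescaped.match(code_line) is not None
unescaped_hits_info = unescaped.match(info_line) is not None
escaped_hits_code = escaped.match(code_line) is not None
escaped_hits_info = escaped.match(info_line) is not None
```

In this sketch the unescaped prefix accepts the indented code line and rejects the info line, which is the opposite of what is wanted; the escaped prefix accepts only the info line.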

MaDill

Thank you for the information. But even with the escaped * it is not working. Nothing shows up in the list, not even the sub/function entries anymore. I tried it on https://regex101.com/ and there it highlights what I’m looking for, but it does not work in the Function List. Do you have another hint?

MAPJe71

	<function
		mainExpr="(?m-s)^\h*(?:(?i:sub|function)\s+\K\w+|\x27\*\h+\K.*)"
	/>

MaDill

Thank you again. With this regex the sub/function part is working again, but the '* part still is not. Is the regex style of the Function List somehow special?

MAPJe71

Is the regex style of the Function List somehow special?

AFAIK it isn’t.

MAPJe71

guy038

Hello, @madill, @mapje71,

I found a solution :-)) To test it :

Open, in N++, your active functionList.xml file
Add the line, below, inside the <associationMap> node :

			<association id= "Test" langID="0" />

Add the lines of the Test parser, below, inside the <parsers> node :

			<parser	id ="Test" displayName="Ma_Dill_Test" commentExpr="'(?!\* )(?-s:.+)" >
				<function
					mainExpr="^\s*(sub|function)\s+\K\w+|^'\*\s+\K(?-s:.+)" >
				</function>
			</parser>

Save the changes of functionList.xml
Close and restart Notepad++
Open a new tab
Paste the example text from your first post into this new tab
Open the Function List panel ( View > Function List )

=> You should see your five entries, as expected !!

Notes :

I preferred to add the commentExpr part, which defines the line-comment zones to skip when searching for functions.
As I assumed that lines beginning with '* followed by some space characters define the special markings/infos, comment lines are the lines which :
- Begin with a single quote character ( ' ), NOT followed by an asterisk + a space character => '(?!\* ), where (?!\* ) is a negative look-ahead
- Are followed by all the standard characters of the current line => (?-s:.+). The -s modifier is needed because, by default, the Function List regex engine behaves as if all the text were a single line, i.e. the dot also matches line breaks. ( So the regex .+ would match any non-empty text, even across several lines. That is to say, all the file contents ! )
In the mainExpr regex, I just added the alternative ^'\*\s+\K(?-s:.+), which looks, at the beginning of a line, for :
- A single quote followed by an asterisk ( '\* ) and at least one white space character ( \s+ )
- Then, the \K syntax resets the start of the match, so everything matched so far is left out of the result
- Finally, the part (?-s:.+), again, matches the remainder of the current line only, due to the -s modifier, and this is what is displayed in the Function List panel !
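
The two steps described in these notes can be sketched roughly in Python. This is only an approximation under stated assumptions: Python's re has no \K, so capture groups take its place; re.S imitates the "dot matches line breaks" default; blanking out the comment zones before searching is my guess at how commentExpr is applied; SAMPLE is a shortened copy of the example text, and list_entries is a made-up helper name.

```python
import re

SAMPLE = """'---------
' TestFile
'---------

'******************
'# blabla
'* << Infoline 1 >>
'******************
'TestSub1
Sub TestSub1()
End Sub 'TestSub1

'TestSub2
Sub TestSub2()
End Sub 'TestSub2

'* Info Line 2! balbla

'TestFunction
Function TestFunction() As Integer
End Function 'TestFunction
"""

# re.S imitates the "dot matches line breaks" default described above;
# (?-s:...) switches it off again for the rest-of-line part.
COMMENT = re.compile(r"'(?!\* )(?-s:.+)", re.S)
# Capture groups stand in for \K, which Python's re does not support.
MAIN = re.compile(
    r"^\s*(?:sub|function)\s+(\w+)|^'\*\s+((?-s:.+))",
    re.S | re.M | re.I,
)

def list_entries(text):
    # Step 1 (assumed behaviour): blank out the comment zones.
    stripped = COMMENT.sub("", text)
    # Step 2: keep whichever alternative matched.
    return [m.group(1) or m.group(2) for m in MAIN.finditer(stripped)]

entries = list_entries(SAMPLE)

# Without (?-s:...), ".+" under re.S runs on to the end of the text.
runaway = re.search(r"^'\*\s+(.+)", SAMPLE, re.S | re.M).group(1)
```

Here entries holds the two info lines and the three routine names in document order, while runaway spans several lines, which is the problem the -s modifier avoids.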

BTW, MAPJe71, the difference between your two posts is the (?m-s) part, at the beginning of the regex in your last post ;-))

Best Regards,

guy038

MaDill

@guy038 and @MAPJe71 The version from guy038 is working. I don’t know why the other one works in your screenshot but not here. Thank you both for your time and help. How can I mark the topic as solved?

guy038

Hi, @madill, @mapje71, and All,

Updated on 07-22-17 ( \v syntax added )

MaDill, I slightly changed the mainExpr regex, as below :

(?i)^\h*(?:(?-i:Sub|Function)\s+\K\w+|'\*\s+\K(?-s:.+))

Notes :

At the beginning, the part (?i)^\h* means that the search is, globally, case insensitive and that the key-words ( Sub, Function and '* ) may be preceded by optional tabulation and/or space characters
Then, the general structure which follows is a non-capturing group, made of two alternatives ( (?:.....|......) )
As I thought that the key-words Sub and Function must have that strict case, I decided to create the case-sensitive non-capturing group (?-i:Sub|Function)
Any key-word must be followed by at least one horizontal or vertical white space character ( \s+ )
Finally, after the reset due to the \K syntax, we display, in the Function List panel, either :
- The name of the current subroutine or function ( \w+ ), in case of the key-words Sub/Function
- All the rest of the current line only ( (?-s:.+) ), in case of the key-word '*
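
As a rough check of these notes, the revised expression can be approximated in Python. Two substitutions are assumptions forced by Python's re, not part of the regex above: [ \t] replaces \h, and capture groups replace \K. The helper name entry is made up for this sketch.

```python
import re

# [ \t] stands in for \h and capture groups for \K; both substitutions
# are workarounds for Python's re, not part of the original regex.
REVISED = re.compile(
    r"(?i)^[ \t]*(?:(?-i:Sub|Function)\s+(\w+)|'\*\s+((?-s:.+)))",
    re.M | re.S,
)

def entry(line):
    # Return the text that would be listed, or None if nothing matches.
    m = REVISED.search(line)
    return (m.group(1) or m.group(2)) if m else None

indented_sub = entry("\t  Sub TestSub1()")
lowercase_sub = entry("sub TestSub1()")
indented_info = entry("   '* Info Line 2! balbla")
```

In this sketch the indented Sub and the indented '* line are both found, while the lowercase sub is rejected by the case-sensitive group.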

Do hope, you’ll like this interpretation ;-))

Throughout this discussion, we have used, in regexes, the \s and/or the \h syntaxes. We could also add the \v syntax ! What do they all refer to ?

Well, from the Wiki article :

https://en.wikipedia.org/wiki/Whitespace_character

we learn the White Space definition : any character, or series of characters, that represents horizontal or vertical space in typography. They all have the Unicode property “WSpace=Y”.

So, strictly :

The Shorthand Character Class \s, used in the N++ Boost regex engine, matches any Vertical or Horizontal White Space character, of the list below :

U+0009			CHARACTER TABULATION
U+000A			LINE FEED
U+000B			VERTICAL TABULATION
U+000C			FORM FEED
U+000D			CARRIAGE RETURN
U+0020	 		SPACE
U+0085			NEXT LINE
U+00A0	 		NO-BREAK SPACE
U+1680	 		OGHAM SPACE MARK
U+2000	 		EN QUAD
U+2001	 		EM QUAD
U+2002	 		EN SPACE
U+2003	 		EM SPACE
U+2004	 		THREE-PER-EM SPACE
U+2005	 		FOUR-PER-EM SPACE
U+2006	 		SIX-PER-EM SPACE
U+2007	 		FIGURE SPACE
U+2008	 		PUNCTUATION SPACE
U+2009	 		THIN SPACE
U+200A	 		HAIR SPACE
U+2028	 		LINE SEPARATOR
U+2029	 		PARAGRAPH SEPARATOR
U+202F	 		NARROW NO-BREAK SPACE
U+205F	 		MEDIUM MATHEMATICAL SPACE
U+3000	　		IDEOGRAPHIC SPACE

Moreover, it, also, matches the NON-WhiteSpace character, below :

U+200B			ZERO WIDTH SPACE

The Shorthand Character Class \h, used in the N++ Boost regex engine, matches any Horizontal White Space character, of the list below :

U+0009			CHARACTER TABULATION
U+0020	 		SPACE
U+00A0	 		NO-BREAK SPACE
U+1680	 		OGHAM SPACE MARK
U+2000	 		EN QUAD
U+2001	 		EM QUAD
U+2002	 		EN SPACE
U+2003	 		EM SPACE
U+2004	 		THREE-PER-EM SPACE
U+2005	 		FOUR-PER-EM SPACE
U+2006	 		SIX-PER-EM SPACE
U+2007	 		FIGURE SPACE
U+2008	 		PUNCTUATION SPACE
U+2009	 		THIN SPACE
U+200A	 		HAIR SPACE
U+202F	 		NARROW NO-BREAK SPACE
U+205F	 		MEDIUM MATHEMATICAL SPACE
U+3000	　		IDEOGRAPHIC SPACE

As before, it, also, matches the NON-WhiteSpace character, below :

U+200B			ZERO WIDTH SPACE

The Shorthand Character Class \v, used in the N++ Boost regex engine, matches any Vertical White Space character, of the list below :

U+000A			LINE FEED
U+000B			VERTICAL TABULATION
U+000C			FORM FEED
U+000D			CARRIAGE RETURN
U+0085			NEXT LINE
U+2028	 		LINE SEPARATOR
U+2029	 		PARAGRAPH SEPARATOR

And, logically, the \s character class is identical to the union of the two classes \h and \v !!

Luckily, most of these characters are rarely found in Western scripts. So, in practice, we just have to remember that :

The \s syntax is, generally, identical to the simple class [\t\n\r\x20]
The \h syntax is, generally, identical to the simple class [\t\x20]
The \v syntax is, generally, identical to the simple class [\n\r]
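
The three simplified classes can be tried out with a short Python sketch. Note the assumptions: Python's re has no \h, and its \v means only the vertical-tab character, so the literal classes from the list above are used instead; classify is a made-up helper name.

```python
import re

# Simplified classes quoted above, written out literally because
# Python's re has no \h and treats \v as the vertical-tab character.
S_CLASS = re.compile(r"[\t\n\r\x20]")
H_CLASS = re.compile(r"[\t\x20]")
V_CLASS = re.compile(r"[\n\r]")

def classify(ch):
    # Sort a single character into one of the simplified classes.
    if H_CLASS.fullmatch(ch):
        return "horizontal"
    if V_CLASS.fullmatch(ch):
        return "vertical"
    return "other"

kinds = [classify(ch) for ch in "\t \r\nA"]

# Over the ASCII range, the simplified \s class is exactly the union
# of the simplified \h and \v classes.
union_holds = all(
    bool(S_CLASS.fullmatch(chr(c)))
    == bool(H_CLASS.fullmatch(chr(c)) or V_CLASS.fullmatch(chr(c)))
    for c in range(128)
)
```

This only covers the simplified ASCII classes, not the full Unicode lists given earlier.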

Cheers,

guy038

BTW, MaDill, MAPJe71’s last regex does work properly on my “old” Win XP configuration !

MaDill

Thank you for the explanation.