[FunctionList] Regex OK in Regex101 but broken in N++

Mateos81

Good evening,

I’m editing a regex in N++ for a custom language in order to exclude a few keywords and a specific word construction, using the negative lookahead thingy

As long as I give whole words it’s fine, but if I try to describe a word with a regex, boom, it just breaks N++…

The whole regex (function mainExpr):

^[ \t]*(global |member )?(\b(?!ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|([A-Z]{2,}\_\w+))\b[\w_\.]+)(\ *)\=((\ *)function(\ *)\(([\w_\,\ ]*)\))?\s*{

The part that breaks is ([A-Z]{2,}\_\w+)

N++ simply ignores the {2,}; it will just exclude any string starting with ONE capital letter where at some point further there’s an underscore followed by word characters… and if there are other underscores, same treatment

If I replace [A-Z]{2,} by, say, BUILD, it works… (excludes ofc a string like BUILD_something)

Example script for checking in regex101: https://pastebin.com/Uk9jdfzw

Any idea? Or is it really a N++ bug?

Thank you in advance

Regards

guy038

Hello, @mateos81,

When a regex uses the [A-Z] or the [a-z] syntax to specifically identify either an uppercase or a lowercase letter, the search must be processed in a case-sensitive ( non-insensitive ) way !

So, let’s imagine, for instance, the initial regex \d+[A-Z]+\d+. In order to run a case-sensitive search, there are 3 solutions :

Use the \d+[A-Z]+\d+ regex and tick the Match case option, in any Find dialog tab
Use an in-line modifier at the beginning of the regex => (?-i)\d+[A-Z]+\d+
Use a non-capturing group with a case-sensitive search, inside that group => \d+(?-i:[A-Z]+)\d+

Note that even the \d+[\x41-\x5A]+\d+ regex syntax does not guarantee that the search is carried out in a case-sensitive way ! Indeed, if the Match case option is not ticked, the regex \d+[\x41-\x5A]+\d+ will match any of the strings 12ABCD123, 123Abcd123 and 123abcd123 :-((
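The effect can be reproduced outside N++. The sketch below uses Python's re module, which is a different engine from the one in Notepad++, so it only illustrates the principle. Passing re.IGNORECASE stands in for an unticked Match case option; the variable names are made up for the example.

```python
import re

samples = ["12ABCD123", "123Abcd123", "123abcd123"]

# Case-insensitive search, comparable to leaving "Match case" unticked.
loose = re.compile(r"\d+[A-Z]+\d+", re.IGNORECASE)
# A hexadecimal range is still a letter range, so it folds case the same way.
hex_range = re.compile(r"\d+[\x41-\x5A]+\d+", re.IGNORECASE)
# Scoped modifier: only the letter class is forced back to case-sensitive.
strict = re.compile(r"\d+(?-i:[A-Z]+)\d+", re.IGNORECASE)

def matching(pattern):
    """Return the sample strings that the pattern matches entirely."""
    return [s for s in samples if pattern.fullmatch(s)]

loose_hits = matching(loose)
hex_hits = matching(hex_range)
strict_hits = matching(strict)
print(loose_hits, hex_hits, strict_hits)
```

Only the pattern with the scoped (?-i:...) group rejects the strings that contain lowercase letters.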

So, in order that the regex engine does not get confused about a mix of uppercase and lowercase letters, simply add the (?-i) part at the beginning of your regex. Of course, this also means that, considering the negative look-ahead (?!ExitConditions|f|navigate|MapRoutes|OnEnter|.......), a name such as, let’s say, maProute would no longer be excluded and would be matched !

Now, I think that we could simplify your regex, in some ways :

We can delete some unnecessary escape sequences
We can drop the parentheses around any non-optional group, as no replacement uses the captured groups !
We should move the word boundaries \b, inside the look-ahead
We can reduce some redundant syntaxes

Leading up to this version :

(?-i)^\h*(global\x20|member\x20)?(?!\b(ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+)\b)[\w.]+\x20*=(\x20*function\x20*\([\w,\x20]*\))?\s*{
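As a rough check of this version, here is a Python port. It is a sketch under assumptions: Python's re is not the N++ engine, \h is written as [ \t], the leading (?-i) is dropped because Python is case-sensitive by default, and the final brace is escaped. The last test line is hypothetical; the others use names from this thread.

```python
import re

# Python port of the simplified regex (see the assumptions above).
PATTERN = (
    r"^[ \t]*(global\x20|member\x20)?"
    r"(?!\b(ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+)\b)"
    r"[\w.]+\x20*=(\x20*function\x20*\([\w,\x20]*\))?\s*\{"
)

case_sensitive = re.compile(PATTERN, re.MULTILINE)
# Same pattern with case folding, to mimic an unticked "Match case" option.
case_insensitive = re.compile(PATTERN, re.MULTILINE | re.IGNORECASE)

lines = [
    "Axis_Bridge_Suicide =\n{",       # one capital, then underscore: wanted
    "BUILD_Command_Post =\n{",        # two or more capitals, then underscore: excluded
    "f = {",                          # excluded keyword
    "global Foo = function(a, b) {",  # hypothetical line with a function
]

def listed(regex):
    """Return the part before ' =' of every line the regex accepts."""
    return [text.split(" =")[0] for text in lines if regex.search(text)]

sensitive_hits = listed(case_sensitive)
insensitive_hits = listed(case_insensitive)
print(sensitive_hits, insensitive_hits)
```

With case folding, [A-Z]{2,} also accepts "Axis", so Axis_Bridge_Suicide is wrongly excluded, which is the symptom described in the first post.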

If we use the free-spacing mode, this regex can be split up, as below, for better readability and explanations :

(?x-i)                                                                                   #  Search in FREE-SPACING and NON-INSENSITIVE modes

^\h*                                                                                     #  OPTIONAL HORIZONTAL blank chars ( SPACE, TAB, NBSP )

(global\x20|member\x20)?                                                                 #  OPTIONAL string "global" or "member", with that EXACT case, FOLLOWED with a SPACE char

(?!  \b  (ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+)  \b  )  #  NOT followed with, either, in that EXACT case, the WHOLE WORDS :
                                                                                         #  "ExitConditions", "f", "navigate" "MapRoutes" "OnEnter" "OnExit" or "Wait"
                                                                                         #  or words BEGINNING with, at least, 2 UPPERCASE letters, FOLLOWED with an UNDERSCORE
																					   
[\w.]+    \x20*    =                                                                     #  A word, POSSIBLY containing DOT(S), FOLLOWED with OPTIONAL SPACE char(s) and an EQUAL sign

(  \x20*function\x20*   \(    [\w,\x20]*    \)   )?                                      #  OPTIONAL word "function", with that EXACT case, surrounded with POSSIBLE SPACE char(s)
                                                                                         #  FOLLOWED with  a COUPLE of PARENTHESES, POSSIBLY containing WORD, SPACE or COMMA characters 

\s*{                                                                                     #  ENDED with an OPENING BRACE character, POSSIBLY PRECEDED with SPACE chars, BLANK(S) and/or LINE-BREAK(S)
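One detail of the free-spacing version is worth a small illustration: in that mode, blanks typed in the pattern are ignored, so every significant space has to be written explicitly, here as \x20. The sketch below shows this with Python's re.VERBOSE flag, assuming it behaves like the (?x) mode for this purpose; the patterns are made up for the example.

```python
import re

# In free-spacing mode, a blank typed in the pattern is ignored...
ignored_blank = re.compile(r"global member", re.VERBOSE)
# ...so a significant space must be explicit, e.g. \x20 ; comments start with '#'.
explicit_blank = re.compile(r"global \x20 member  # the space is now part of the match", re.VERBOSE)

blank_is_ignored = ignored_blank.fullmatch("globalmember") is not None
plain_space_fails = ignored_blank.fullmatch("global member") is None
escaped_space_works = explicit_blank.fullmatch("global member") is not None
print(blank_is_ignored, plain_space_fails, escaped_space_works)
```

A regex that is split over several lines without replacing its literal spaces this way would silently stop matching them.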

Finally, I’m a little skeptical about your syntax f, inside the negative look-ahead (?!\b(ExitConditions|f|navigate|....)\b) ?

Indeed, this syntax would only avoid this line :

	f =	{

A bit weird, isn’t it ?

Best regards,

guy038

Alan Kilborn

@Mateos81

Regex OK in Regex101 but broken in N++

So a word of caution: Regex101 and Notepad++ use different regular expression engines, so if your regex gets a bit “tricky”, it would be no surprise that the two produce different results.

Luckily it does not appear to be the case (pun fully intended) this time, if @guy038 's analysis is correct (I didn’t look deep into it).

Mateos81

Thank you for the complete explanation!

Yeah I expected [A-Z] to behave differently compared to [a-z], alright now I know… I thought regex101 without any particular flag was something like a “regular” regex engine, so was kind of trusting it more

“f” is a particular case here, but this is something I plan to escape in a different manner soon :)

As for the split-up part, I was wondering why some regexes had working splits and mine did not when I tried to write it in this particularly readable fashion, so another thing learned :)

Hmm, may I resurrect an old topic I wrote?

Some months ago I tried to achieve: https://imgur.com/1le9ot3

With: https://community.notepad-plus-plus.org/topic/16803/function-list-help-for-adding-a-custom-language

In the end it would be the best to my mind… there again I had regex101 OK but the writing is probably not fine by N++

Would you mind taking a look? ^^’

Mateos81

@guy038 I can’t seem to reply in the other topic because it is closed

For the first part yes I know the comment part in the middle blocks but that’s not a huge issue

My goal there was to use the “class” object to have some kind of tree view, since the GM syntax is basically tables in tables

Thank you for taking the time to have a look :)

guy038

Hello, @mateos81 and All,

Before building up your own class parser, I preferred to simplify the problem and study a very simple parser, where :

A class name is preceded by a # character
A function name, inside a class name, is preceded with the @ character
A function name, outside a class name, is preceded with the & character

I associated this parser to Normal Text with the command <association id="Test_Parser" langID= "0" />, located in the <associationMap> node, for easy tests !

			<parser
				id="Test_Parser"
				displayName="How a Parser work !"
				commentExpr="/\*.*?\*/|//.*?$"
			>
				<classRange
					mainExpr    ="^\h*#\w+\x20*=\s*\{"
					openSymbole ="\{"
					closeSymbole="\}"
				>
					<className>
						<nameExpr expr=".\w+(?=\x20*=)" />
					</className>
					<function
						mainExpr="^\h*@\w+\x20*=\s*\{"
					>
						<functionName>
							<funcNameExpr expr=".\w+(?=\x20*=)"/>
						</functionName>
					</function>
				</classRange>
				<function
					mainExpr="^\h*&amp;\w+\x20*=\s*\{"
				>
					<functionName>
						<nameExpr expr=".\w+(?=\x20*=)" />
					</functionName>
				</function>
			</parser>

I tested this parser against the sample text, below :

#Map =
{
    @Navigation =
    {
        @jump =
        {
            @navigate =
            {
            },
        },
    },

    @Roles =
    {
        @AXIS =
        {
            @AllBots =

            @DEFENDER =
            {
                voice = "Defending the Bridge!",    // Signals voice chat to randomly announce
            },

            @DEFENDER1 =
            {
                voice = "Defending the Main Entrance!", // Signals voice chat to randomly announce
            },
        };
    };

    @Axis_Bridge_Suicide =
    {
        @TriggerOnClass = CLASS.ANYPLAYER,

        @OnEnter =
        {
        },
    },

    @Test =
    {
    },
};

&Function_Outside_Class =
{
    &Funct_3 =
	{
        &Funct_4 =
		{
		},
	},
},

#InitializeRoutes =
{
    @MapRoutes =
    {
        @BUILD_Command_Post =
        {
        },
    },
};

#EmptyClass =
{
};


//    #CommentedOutClassLevelTest_1 =
//    {
//        @Funct_1 =
//    	{
//    	}
//    };


/****************************************

#CommentedOutClassLevelTest_2 =
{
    @Funct_2 =
  	{
  	}
};
*************************************/

The Function List panel displays :


#Map
  |
  \__ @Navigation
  | 
  \__ @jump =
  |
  \__ @navigate =
  |
  \__ @Roles =
  |
  \__ @AXIS =
  |
  \__ @DEFENDER =
  |
  \__ @DEFENDER1 =
  |
  \__ @Axis_Bridge_Suicide =
  |
  \__ @OnEnter =
  |
  \__ @Test =

#InitializeRoutes =
  |
  \__ @MapRoutes =
  |
  \__ @BUILD_Command_Post =

&Function_Outside_Class =

&Funct_3 =

&Funct_4 =
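To see why some lines of the sample text, such as @AllBots or @TriggerOnClass, never show up, the function mainExpr and name expression can be tried on a shortened copy of the sample. This is a sketch with Python's re module ( \h written as [ \t] ), not the N++ engine, so details of the real panel may differ.

```python
import re

# Python ports of the parser's class-function expressions.
CLASS_FUNC = re.compile(r"^[ \t]*@\w+\x20*=\s*\{", re.MULTILINE)
NAME = re.compile(r".\w+(?=\x20*=)")

# Shortened excerpt of the sample text from this thread.
sample = """\
    @Roles =
    {
        @AXIS =
        {
            @AllBots =

            @DEFENDER =
            {
            },
        };
    };
    @Axis_Bridge_Suicide =
    {
        @TriggerOnClass = CLASS.ANYPLAYER,
    },
"""

# Apply the name expression to each block header found by the main expression.
names = [NAME.search(m.group()).group() for m in CLASS_FUNC.finditer(sample)]
print(names)
```

@AllBots and @TriggerOnClass are skipped because no opening brace follows their equal sign, which the main expression requires.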

Up to now, I noticed some points :

I was not able to get a multi-level display. For instance, all nested parts of the Map class are simply listed at the first sub-level (-)
A function outside a class ( with the & identifier ) is correctly listed at the main level (+) but, unfortunately, so are all the functions nested inside it (-)
Block and line comments seem to be correctly ignored (+)
Any empty class { } does not appear in the function List panel (-)

Probably, I did not exploit all the capabilities of the Function List language ! We need experts on that matter, as I haven’t seriously studied any object language, yet !

Best Regards,

guy038

Mateos81

Thank you very much for your investigations!

I guess at this point it would almost take getting the source, reading the code and debugging it to see what’s going on and whether multi-level is possible! Or perhaps there’s no need to go that far, dunno, but that couldn’t be more concrete and would close any debate I guess

Perhaps if someone with that knowledge could drop by here? :D

Thank you again for taking the time to experiment, test various things and report back in such a pleasant way to read ^^

guy038

Hi, @mateos81 ,

I did not pay attention to your picture : https://imgur.com/1le9ot3

I just read what you said :

The goal is to have something like:

Map
    Navigation
        jump
            navigate
    Roles
        Axis
            DEFENDER
            DEFENDER1
    AxisBridgeSuicide
        OnEnter
    Test0
    Test
    Test2
OnMapLoad
OnBotJoin
InitializeRoutes

Upon closer inspection, I’m quite surprised to see a FunctionList panel with a multi-level list of functions !! Seemingly, from your picture and the icons, it is a view of the built-in FunctionList N++ panel, isn’t it ?

So, do you have the specific parser code which can produce this kind of list ? I would be really interested to get it ;-))

Or, maybe, it’s a montage of several images, to show what you wanted ?

Cheers,

guy038

MAPJe71

FYI:

Function List can only display two levels; the main level and one sub-level;
when the end of a class range is not detected (correctly), all subsequent functions will end up being displayed in the main level;
empty classes are not displayed in the Function List panel as there are no functions/methods to select.
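These points can be pictured with a small model. The sketch below is not Notepad++ code; it only mimics the described behaviour (one sub-level at most, empty classes hidden) on a hypothetical nested structure built from names in the sample text above. All function and variable names are invented for the illustration.

```python
# Hypothetical nested structure, using names from the sample text in this thread.
tree = {
    "Map": {
        "Navigation": {"jump": {"navigate": {}}},
        "Roles": {"AXIS": {"DEFENDER": {}, "DEFENDER1": {}}},
    },
    "EmptyClass": {},
}

def descendants(node):
    """Yield every nested name, depth-first, whatever its depth."""
    for name, children in node.items():
        yield name
        yield from descendants(children)

def two_level_view(classes):
    """Collapse each class to one flat list of members; skip empty classes."""
    view = {}
    for class_name, body in classes.items():
        members = list(descendants(body))
        if members:
            view[class_name] = members
    return view

panel = two_level_view(tree)
print(panel)
```

Every nested name ends up directly under Map, as in the panel listing reported earlier, and EmptyClass disappears.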

guy038

Hi, @Mateos81, @MAPJe71 and All,

@MAPJe71, many thanks for these clarifications :-))

So, from your point of view, @Mateos81’s image ( https://imgur.com/1le9ot3 ), showing a multi-level list, cannot be a view from the FunctionList feature of Notepad++ but, rather, comes from another editor or, as I said in my previous post, is simply the result of a photo montage made to express what he wanted ?

BR

guy038

Mateos81

It’s a montage made by a contributor to show me what he’d like to have, and since that was exactly what I wanted myself, I took the picture and forwarded it over here ^^’

You can see there’s a gray line hanging kind-of off, but still I’ve asked him to be sure, and yes it was a montage (else he’d have pushed his script to the repo I guess)

Thanks for the clarifications @MAPJe71 :)

So we can’t push any further for now; thank you very much to all of you for following and helping on this topic!

Take care people