[FunctionList] Regex OK in Regex101 but broken in N++
-
Good evening,
I’m editing a regex in N++ for a custom language in order to exclude a few keywords and a specific word construction, using the negative lookahead thingy
As long as I give whole words it’s fine, but if I try to describe a word with a regex, boom, it just breaks N++…
The whole regex (function mainExpr):
^[ \t]*(global |member )?(\b(?!ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|([A-Z]{2,}\_\w+))\b[\w_\.]+)(\ *)\=((\ *)function(\ *)\(([\w_\,\ ]*)\))?\s*{
The part that breaks is
([A-Z]{2,}\_\w+)
N++ simply ignores the {2,}; it will just exclude any string starting with ONE capital letter where at some point further there’s an underscore followed by word characters… and if there are other underscores, same treatment
If I replace [A-Z]{2,} by, say, BUILD, it works… (excludes ofc a string like BUILD_something)
Example script for checking in regex101: https://pastebin.com/Uk9jdfzw
Any idea? Or is it really a N++ bug?
Thank you in advance
Regards
-
Hello, @mateos81,
When a regex contains the [A-Z] or the [a-z] syntax, in order to specifically match either an uppercase or a lowercase letter, the search must be run in a case-sensitive way!
So, let's imagine, for instance, the initial regex \d+[A-Z]+\d+. In order to run a case-sensitive search, there are 3 solutions:
-
Use the \d+[A-Z]+\d+ regex and tick the Match case option, in any Find dialog tab
-
Use an in-line modifier at the beginning of the regex => (?-i)\d+[A-Z]+\d+
-
Use a non-capturing group with a case-sensitive search inside that group => \d+(?-i:[A-Z]+)\d+
Note that even the \d+[\x41-\x5A]+\d+ regex syntax does not guarantee that the search is carried out in a case-sensitive way! Indeed, if the Match case option is not ticked, the regex \d+[\x41-\x5A]+\d+ will match all of the strings 12ABCD123, 123Abcd123 and 123abcd123 :-((
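The same effect can be reproduced outside Notepad++. Below is a minimal sketch using Python's re module, which is a different engine from the one in N++, so it only illustrates the principle. The re.IGNORECASE flag stands in for an unticked Match case option (an assumption made for this illustration), and the names samples and matching are hypothetical helpers. As far as I know, Python only accepts the scoped form (?-i:...) for switching the flag off, not a leading (?-i).

```python
import re

samples = ["123ABCD123", "123Abcd123", "123abcd123"]

# Case-insensitive search: stands in for "Match case" left unticked.
insensitive = re.compile(r"\d+[A-Z]+\d+", re.IGNORECASE)

# Same class written with hex escapes: the flag still applies to it.
insensitive_hex = re.compile(r"\d+[\x41-\x5A]+\d+", re.IGNORECASE)

# Scoped modifier: only the group inside (?-i:...) is case-sensitive.
scoped = re.compile(r"\d+(?-i:[A-Z]+)\d+", re.IGNORECASE)


def matching(pattern):
    """Return the sample strings that the pattern matches entirely."""
    return [s for s in samples if pattern.fullmatch(s)]


for label, pattern in [("insensitive", insensitive),
                       ("insensitive_hex", insensitive_hex),
                       ("scoped", scoped)]:
    print(label, matching(pattern))
```

Only the scoped variant rejects the strings that contain lowercase letters between the digits.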
So, in order that the regex engine does not get confused about a mix of uppercase and lowercase letters, simply add the (?-i) part at the beginning of your regex. Of course, this also means that, considering the negative look-ahead (?!ExitConditions|f|navigate|MapRoutes|OnEnter|.......), a variable such as, let's say, maProute would not be an exception and would be matched!
Now, I think that we could simplify your regex, in some ways :
-
We can delete some unnecessary escape sequences
-
We can omit any non-optional group, as replacement is not invoked !
-
We should move the word boundaries \b inside the look-ahead
-
We can reduce some redundant syntaxes
Leading up to this version :
(?-i)^\h*(global\x20|member\x20)?(?!\b(ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+)\b)[\w.]+\x20*=(\x20*function\x20*\([\w,\x20]*\))?\s*{
If we use the free-spacing mode, this regex can be split up, as below, for better readability and explanations :
(?x-i)                                  # Search in FREE-SPACING and NON-INSENSITIVE modes
^\h*                                    # OPTIONAL HORIZONTAL blank chars ( SPACE, TAB, NBSP )
(global\x20|member\x20)?                # OPTIONAL string "global" or "member", with that EXACT case, FOLLOWED with a SPACE char
(?! \b (ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+) \b )
                                        # NOT followed with, either, in that EXACT case, the WHOLE WORDS :
                                        # "ExitConditions", "f", "navigate" "MapRoutes" "OnEnter" "OnExit" or "Wait"
                                        # or words BEGINNING with, at least, 2 UPPERCASE letters, FOLLOWED with an UNDERSCORE
[\w.]+ \x20* =                          # A word, POSSIBLY containing DOT(S), FOLLOWED with OPTIONAL SPACE char(s) and an EQUAL sign
( \x20*function\x20* \( [\w,\x20]* \) )?
                                        # OPTIONAL word "function", with that EXACT case, surrounded with POSSIBLE SPACE char(s)
                                        # FOLLOWED with a COUPLE of PARENTHESES, POSSIBLY containing WORD, SPACE or COMMA characters
\s*{                                    # ENDED with an OPENING BRACE character, POSSIBLY PRECEDED with SPACE chars, BLANK(S) and/or LINE-BREAK(S)
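For anyone who wants to try the simplified regex outside N++, here is a rough Python translation. It is a sketch under assumptions, not the exact N++ behaviour: Python's re has no \h, so [ \t] replaces it (which drops NBSP), the final brace is escaped, and re.IGNORECASE plays the role of an unticked Match case option. The names MAIN_EXPR, case_sensitive, case_insensitive and listed are hypothetical helpers for this test only.

```python
import re

# Free-spacing translation of the simplified mainExpr.
# [ \t] replaces \h, which Python's re does not support (assumption).
MAIN_EXPR = r"""
    ^[ \t]*
    (global\x20|member\x20)?
    (?! \b (ExitConditions|f|navigate|MapRoutes|OnEnter|OnExit|Wait|[A-Z]{2,}_\w+) \b )
    [\w.]+ \x20* =
    ( \x20*function\x20* \( [\w,\x20]* \) )?
    \s*\{
"""

case_sensitive = re.compile(MAIN_EXPR, re.VERBOSE)
case_insensitive = re.compile(MAIN_EXPR, re.VERBOSE | re.IGNORECASE)


def listed(pattern, line):
    """True when the line would be picked up as a function header."""
    return pattern.match(line) is not None


test_lines = [
    "Map = {",
    "BUILD_Command_Post = {",
    "Build_thing = {",
    "OnEnter = {",
    "onenter = {",
    "f = {",
    "global foo = function(a, b) {",
]

for line in test_lines:
    print(f"{line!r:35} sensitive={listed(case_sensitive, line)!s:5} "
          f"insensitive={listed(case_insensitive, line)}")
```

In the case-insensitive variant, Build_thing and onenter are excluded as well, which mirrors the symptom described in the first post; in the case-sensitive variant they are listed.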
Finally, I’m a little skeptical about your syntax f, inside the negative look-ahead (?!\b(ExitConditions|f|navigate|....)\b)? Indeed, this syntax would only avoid this line :
f = {
A bit weird, isn’t it ?
Best regards,
guy038
-
-
Regex OK in Regex101 but broken in N++
So a word of caution: Regex101 and Notepad++ use different regular expression engines, so if your regex gets a bit “tricky”, it would be no surprise that the two produce different results.
Luckily it does not appear to be the case (pun fully intended) this time, if @guy038 's analysis is correct (I didn’t look deep into it).
-
Thank you for the complete explanation!
Yeah I expected [A-Z] to behave differently compared to [a-z], alright now I know… I thought regex101 without any particular flag was something like a “regular” regex engine, so was kind of trusting it more
“f” is a particular case here, but this is something I plan to escape in a different manner soon :)
As for the split-up part, I was wondering why some people had working splits and mine did not when I tried to write it in this particularly readable fashion, so another thing learned :)
Hmm, may I resurrect an old topic I wrote?
Some months ago I tried to achieve: https://imgur.com/1le9ot3
With: https://community.notepad-plus-plus.org/topic/16803/function-list-help-for-adding-a-custom-language
In the end it would be the best to my mind… there again I had regex101 OK but the writing is probably not fine by N++
Would you mind taking a look? ^^’
-
@guy038 I can’t seem to reply in the other topic because it is closed
For the first part yes I know the comment part in the middle blocks but that’s not a huge issue
My goal there was to use the “class” object to have some kind of tree view, since the GM syntax is basically tables in tables
Thank you for taking the time to have a look :)
-
Hello, @mateos81 and All,
Before building up your own class parser, I preferred to simplify the problem and study a very simple parser, where :
-
A class name is preceded by a # character
-
A function name, inside a class name, is preceded with the @ character
-
A function name, outside a class name, is preceded with the & character
I associated this parser to Normal Text with the command <association id="Test_Parser" langID= "0" />, located in the <associationMap> node, for easy tests !
<parser id="Test_Parser" displayName="How a Parser work !" commentExpr="/\*.*?\*/|//.*?$" >
    <classRange
        mainExpr ="^\h*#\w+\x20*=\s*\{"
        openSymbole ="\{"
        closeSymbole="\}" >
        <className>
            <nameExpr expr=".\w+(?=\x20*=)" />
        </className>
        <function mainExpr="^\h*@\w+\x20*=\s*\{" >
            <functionName>
                <funcNameExpr expr=".\w+(?=\x20*=)"/>
            </functionName>
        </function>
    </classRange>
    <function mainExpr="^\h*&\w+\x20*=\s*\{" >
        <functionName>
            <nameExpr expr=".\w+(?=\x20*=)" />
        </functionName>
    </function>
</parser>
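To check the two expressions of the class part without restarting N++, something like the Python sketch below may help. It assumes that the name expression is applied to the text matched by mainExpr, which is how I understand the Function List to work; it also replaces \h with [ \t], since Python's re lacks \h. CLASS_MAIN, NAME_EXPR, class_names and sample are hypothetical names for this illustration.

```python
import re

# mainExpr of the classRange node, with \h replaced by [ \t] (assumption).
CLASS_MAIN = re.compile(r"^[ \t]*#\w+\x20*=\s*\{", re.MULTILINE)

# nameExpr of the className node, unchanged.
NAME_EXPR = re.compile(r".\w+(?=\x20*=)")


def class_names(text):
    """Find class headers, then extract the displayed name from each header."""
    names = []
    for header in CLASS_MAIN.finditer(text):
        name = NAME_EXPR.search(header.group(0))
        if name:
            names.append(name.group(0))
    return names


sample = (
    "#Map =\n"
    "{\n"
    "    @Roles =\n"
    "    {\n"
    "    },\n"
    "};\n"
    "#InitializeRoutes = {\n"
    "};\n"
)

print(class_names(sample))
```

The opening brace may sit on the same line as the name or on the next one, because \s* in mainExpr also matches line-breaks.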
I tested this parser against the sample text, below :
#Map =
{
    @Navigation =
    {
        @jump =
        {
            @navigate =
            {
            },
        },
    },
    @Roles =
    {
        @AXIS =
        {
            @AllBots =
            @DEFENDER =
            {
                voice = "Defending the Bridge!", // Signals voice chat to randomly announce
            },
            @DEFENDER1 =
            {
                voice = "Defending the Main Entrance!", // Signals voice chat to randomly announce
            },
        };
    };
    @Axis_Bridge_Suicide =
    {
        @TriggerOnClass = CLASS.ANYPLAYER,
        @OnEnter =
        {
        },
    },
    @Test =
    {
    },
};
&Function_Outside_Class =
{
    &Funct_3 =
    {
        &Funct_4 =
        {
        },
    },
},
#InitializeRoutes =
{
    @MapRoutes =
    {
        @BUILD_Command_Post =
        {
        },
    },
};
#EmptyClass =
{
};
// #CommentedOutClassLevelTest_1 =
// {
//     @Funct_1 =
//     {
//     }
// };
/****************************************
#CommentedOutClassLevelTest_2 =
{
    @Funct_2 =
    {
    }
};
*************************************/
The Function List panel displays :
#Map
 |
 \__ @navigation
 |
 \__ @jump =
 |
 \__ @navigate =
 |
 \__ @Roles =
 |
 \__ @AXIS =
 |
 \__ @DEFENDER =
 |
 \__ @DEFENDER1 =
 |
 \__ @Axis_Bridge_Suicide =
 |
 \__ @OnEnter =
 |
 \__ @Test =
#InitializeRoutes =
 |
 \__ @MapRoutes =
 |
 \__ @BUILD_Command_Post =
&Function_Outside_Class =
&Funct_3 =
&Funct_4 =
Up to now, I noticed some points :
-
I was not able to get a multi-level feature. For instance, all subsequent parts of the Map class are simply listed at the first sub-level (-)
-
A function outside a class ( with the & identifier ) is correctly listed at the main level (+) but, unfortunately, so are all its subsequent functions (-)
-
Block and line comments seem to be correctly ignored (+)
-
Any empty class { } does not appear in the Function List panel (-)
Probably, I did not exploit all the capabilities of the Function List language! We need experts on that matter, as I haven’t seriously studied any object language, yet!
Best Regards,
guy038
-
-
Thank you very much for your investigations!
I guess at this point one would almost have to get the source, read the code and debug it to see what’s going on and whether multi-level is possible! Or perhaps there's no need to go that far, dunno, but that couldn’t be more concrete and would close any debate, I guess
Perhaps if someone with that knowledge could drop by here? :D
Thank you again for taking the time to experiment, test various things and report back in such a pleasant way to read ^^
-
Hi, @mateos81 ,
I did not pay attention to your picture : https://imgur.com/1le9ot3
I just read what you said :
The goal is to have something like:
Map Navigation jump navigate Roles Axis DEFENDER DEFENDER1 AxisBridgeSuicide OnEnter Test0 Test Test2 OnMapLoad OnBotJoin InitializeRoutes
Upon closer inspection, I’m quite surprised to see a FunctionList panel with a multi-level list of functions !! Seemingly, from your picture and the icons, it is a view of the built-in FunctionList N++ panel, isn’t it ?
So, do you have the specific parser code which can produce this kind of list ? I would be really interested to get it ;-))
Or, maybe, it’s a montage of several images, to show what you wanted ?
Cheers,
guy038
-
FYI:
- Function List can only display two levels: the main level and one sub-level;
- when the end of a class range is not detected (correctly), all subsequent functions will end up being displayed in the main level;
- empty classes are not displayed in the Function List panel as there are no functions/methods to select.
-
Hi, @Mateos81, @MAPJe71 and All,
@MAPje71, Many thanks for these clarifications :-))
So, from your point of view, the @Mateos81’s image ( https://imgur.com/1le9ot3 ), showing a multi-level list, cannot be a view from the FunctionList feature of Notepad++ but, rather, from another editor or, as I said in my previous post, simply the result of a photo montage to express what he desired ?
BR
guy038
-
It’s a montage made by a contributor to show me what he’d like to have, and since that was exactly what I wanted myself, I took the picture and forwarded it over here ^^’
You can see there’s a gray line hanging kind-of off, but still I’ve asked him to be sure, and yes it was a montage (else he’d have pushed his script to the repo I guess)
Thanks for the clarifications @MAPJe71 :)
So we can’t push more forward for now; thank you very much to all of you following and helping on this topic!
Take care people