Perl language syntax highlighting troubles (bug or limitation?)
-
Maybe this picture makes it a little bit clearer.
-
Still confused: in ([^"|^']+?), why is there a ‘?’ after the ‘+’? What is this ‘?’ for?
As little as possible: the ‘?’ makes the ‘+’ non-greedy (lazy).
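To see what the lazy ‘?’ changes, here is a minimal sketch using Python's standard re module. That is an assumption for illustration only: the rules in this thread run in Notepad++'s own Boost-based regex engine, but lazy quantifiers behave the same way there. The sample string is made up:

```python
import re

# Made-up sample: a quoted heredoc name followed by more quoted text.
text = '<<"EOT"; body "x"'

# Greedy: .+ grabs as much as it can, then backtracks to the LAST quote.
greedy = re.search(r'"(.+)"', text).group(1)

# Lazy (non-greedy): .+? stops at the FIRST quote that lets the match succeed.
lazy = re.search(r'"(.+?)"', text).group(1)

print(greedy)  # EOT"; body "x
print(lazy)    # EOT
```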
And then \3 would mean the 3rd matching group (the third ‘()’), but in Perl it is used only in substitutions. What is its use here? There are only 2 groups in the regex (only two blocks surrounded by parentheses).
It's a placeholder for whatever was found by match group 3, so that the same EOT is found again at the end.
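A minimal sketch of such a backreference, again with Python's re module and a made-up heredoc. The pattern is a simplified variant of the one in this thread, with only one group, so the reference is \1 instead of \3:

```python
import re

# (\w+) captures the name after <<; \1 later demands exactly the same text,
# which is how the regex finds the matching terminator.
pattern = r'(?s)<<(\w+);.*?\1'

good = '$x=<<EOT;\nPlain text here\nEOT'
bad = '$x=<<EOT;\nPlain text here\nEND'

m = re.search(pattern, good)
print(m.group(1))               # EOT
print(re.search(pattern, bad))  # None: "EOT" never appears again
```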
And there are 3 match groups, or am I missing something?
-
Only 2 sets of parentheses, so where is the third set?
So only 2 match groups.
Can you make this work:
No syntax error on the Python console, but absolutely no result. Where is my bug?
regexes[(3, (255,255,255))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [1])
-
[1]
tells the Python script that only the result of sub-match group 1 should be colored white (255,255,255).
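Groups are numbered by the position of their opening parenthesis, so in this regex sub-match group 1 is the outer group, which spans the whole match, and (<<) is group 2. To paint the whole match, you can use [0].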
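A quick check of how this regex's groups are numbered, sketched with Python's standard re module (not the Boost engine inside Notepad++; the sample heredocs are made up). The last line suggests one possible reason for "absolutely no result": with a quoted name, group 3 captures the quotes too, and the terminator line does not repeat them:

```python
import re

# The regex from the question; it contains only double quotes,
# so the single-quoted raw string is fine.
pattern = re.compile(r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)')

# Groups are numbered by the position of their opening parenthesis.
print(pattern.groups)  # 3

m = pattern.search('$x=<<EOT;\nPlain text here\nEOT')
print(m.group(1) == m.group(0))  # True: the outer group covers the whole match
print(m.group(2))                # <<
print(m.group(3))                # EOT

# Quoted name: group 3 would be '"EOT"', which never appears again.
quoted = '$x=<<"EOT";\nPlain text here\nEOT'
print(pattern.search(quoted))    # None
```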
I’m still confused about 2 vs. 3 match groups.
Am I incorrect when saying that
(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
(<<)
("{0,1}.+"{0,1})
are three match groups?
-
Maybe the confusion comes from the fact that group numbering within a regular expression starts at 1, while the Python script counts match results from 0 (0 being the whole match).
Sorry, but I have to get up early tomorrow and it is already 1am, but I’m really interested in clearing up our (mis)understanding later today (maybe in ~16-18 hours).
-
OK, tomorrow is another day.
‘See’ you tomorrow.
Have a good night.
g
-
You too, see you.
-
OK, so the
(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
is a regex group, not a function call surrounded by parentheses or a logical group provided by the ‘r’ prefix. My mistake.
BUT THEN, is it possible in Python to enclose an instruction such as ?\3,
which means (as far as I understood what you explained to me earlier) a recursive reference to a regexp named ‘3’, the name ‘3’ being given in the expression regexes[(3, (255,255,255))]?
Is that correct? So you can reference an expression within itself before it has been closed: the last parenthesis of expression 3 comes after the \3. Is that what it means?
Python syntax is a bit complicated to me.
-
@Gilles-Maisonneuve said:
Python syntax is a bit complicated to me
It’s not Python syntax, it’s regular expression syntax. It’s just not Perl regular expression syntax. :)
And, BTW, nobody in the history of the world, especially someone coming from a Perl background, has ever uttered the phrase you typed.
-
Hello @gilles-maisonneuve, @eko-palypse and All,
Gilles, could you verify that the two lines below work with your Red, Green and Blue colors?
regexes[(3, (R,G,B))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
regexes[(4, (R,G,B))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
For these two regexes:
- Group 1 = << = a double less-than sign
- Group 2 = ['"]? = an optional single or double quote, for regex id3
- Group 2 = '|" = a mandatory single or double quote, separated from the << characters by blank characters, for regex id4
- Group 3 = \w+? = the shortest run of word characters after the << sign, between possible quotes and before a semicolon character ;, with possible blank characters before and/or after the quote characters
Notes:
- In regex id3, only the << string is highlighted (Group 1).
- In regex id4, the << and the text between quotes are highlighted (Groups 1 and 3).
- I added the -i in-line modifier (=> the (?s-i) leading syntax) to be sure that the ending boundary of the block corresponds exactly to the text between quotes (the search is case-sensitive!).
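Why the case-sensitivity matters can be sketched with Python's re module. Assumptions: \h is spelled [ \t] because Python's re has no \h, and (?s-i) becomes plain (?s) because re is already case-sensitive unless re.IGNORECASE is passed. With case-insensitive matching, the lazy .*?\3 stops at the lowercase "text" in the body:

```python
import re

# Assumed Python spelling of id3: \h -> [ \t], (?s-i) -> (?s).
pattern = r'(?s)(<<)([\'"]?)(\w+?)\2[ \t]*;.*?\3'
sample = '$x=<<TEXT;\nPlain text here\nTEXT'

sensitive = re.search(pattern, sample).group(0)
insensitive = re.search(pattern, sample, re.IGNORECASE).group(0)

print(repr(sensitive))    # '<<TEXT;\nPlain text here\nTEXT'
print(repr(insensitive))  # '<<TEXT;\nPlain text' (ends at the lowercase "text")
```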
So my regex
(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3
(id3) matches any of these six cases below:
$x=<<TEXT;
Plain text here
TEXT

$x=<<'TEXT';
Plain text here
TEXT

$x=<<"TEXT";
Plain text here
TEXT

$x=<<TEXT ;
Plain text here
TEXT

$x=<<'TEXT' ;
Plain text here
TEXT

$x=<<"TEXT" ;
Plain text here
TEXT
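The six cases can be checked outside Notepad++ with a short sketch using Python's re module, assuming the same translation as above (\h becomes [ \t], (?s-i) becomes (?s)). It also prints what each group captures:

```python
import re

# Assumed Python translation of id3: \h -> [ \t], (?s-i) -> (?s).
ID3 = re.compile(r'(?s)(<<)([\'"]?)(\w+?)\2[ \t]*;.*?\3')

cases = [
    "$x=<<TEXT;\nPlain text here\nTEXT",
    "$x=<<'TEXT';\nPlain text here\nTEXT",
    '$x=<<"TEXT";\nPlain text here\nTEXT',
    "$x=<<TEXT ;\nPlain text here\nTEXT",
    "$x=<<'TEXT' ;\nPlain text here\nTEXT",
    '$x=<<"TEXT" ;\nPlain text here\nTEXT',
]
for case in cases:
    m = ID3.search(case)
    # Group 1: "<<", group 2: the optional quote, group 3: the terminator name.
    print(m.groups())
```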
And my regex
(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3
(id4) matches these 4 cases below:
$x=<< 'TEXT';
Plain text here
TEXT

$x=<< "TEXT";
Plain text here
TEXT

$x=<< 'TEXT' ;
Plain text here
TEXT

$x=<< "TEXT" ;
Plain text here
TEXT
Best Regards,
guy038
-
-
Hello Guy,
Could not make it work, sorry.
I mean:
- Added the following lines (replacing the original ones) to EnhancePerlLexer.py from Ekopalypse, according to what you gave me:
regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
- Saved it and restarted npp.
- Still the same coloring, not working.
BUT, good news:
Python console:
Traceback (most recent call last):
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\startup.py", line 1, in <module>
    import EnhancePerlLexer
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 36
    regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
    ^
SyntaxError: EOL while scanning string literal
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
Initialisation took 110ms
Ready.
Can you tell me what I did wrong?
(When I comment out the two lines, I get valid coloring back for the ‘q*’ syntaxes. Yes, I forgot to tell you: this had vanished too…)
-
-
Well, I commented out rule 3 and kept rule 4.
Same kind of error:
regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
^
SyntaxError: EOL while scanning string literal
-
If I modify the rule like this:
regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\2\h*;.*?\3)', [1,3])
I no longer get a syntax error in Python, BUT I get no coloring for the here-doc either…
Any idea?
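The SyntaxError above can be reproduced without Notepad++ by asking Python to compile the rule line, a minimal sketch using only the built-in compile() (the lines are compiled, never run; is_valid_python is a hypothetical helper name). The ' inside ['"] closes the r'...' literal early, and the " that follows opens a string that never ends. Inside a raw string, \' does not end the literal, and the regex engine reads \' as a plain quote:

```python
# The rule line as Python source text, before and after escaping the quote.
broken = r"""regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])"""
fixed = r"""regexes[(3, (224,0,0))] = (r'(?s-i)(<<)([\'"]?)(\w+?)\2\h*;.*?\3', [1])"""


def is_valid_python(source):
    """Hypothetical helper: True if the source compiles (it is never executed)."""
    try:
        compile(source, "<rule>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python(broken))  # False: the ' inside ['"] ends the string early
print(is_valid_python(fixed))   # True: in a raw string, \' does not end it
```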
-
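chcp 1250 >NUL: & perl -e "$var=q(Alan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.); for my $p ('\t','\s') {print qq{\$p=$p},$var=~m/($p)déplaisant\1/x?$var:qq{n'en déplaise},qq{\n} ;};" & chcp 850 >NUL:
$p=\tn'en déplaise
$p=\sAlan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.
(In English, the test string reads "Alan Kilborn is unpleasant in the way he expresses himself, but he is right", and "n'en déplaise" means roughly "no offense". With $p='\s', the \1 requires the same blank after "déplaisant" as before it, so the match succeeds; with $p='\t' there is no tab, so it fails.)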
-
I am so used to using $1, $2, …, which do not work in a simple ‘match’ but only in a ‘substitute’, that I did not know this way of repeating ‘patterns’ while ‘matching’. I learned something.
Duly noted.
-
Lunch break :-)
First, sorry for not telling you that the single quote has to be escaped, because it is also used to delimit the Python string. Good, you already figured it out.
Let me break down the parts of that Python code:
from collections import OrderedDict
regexes = OrderedDict()
regexes[(3, (255,0,0))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [0])
regexes is a variable containing an OrderedDict instance.
OrderedDict is more or less the same as a Perl associative array or hash (one that remembers insertion order).
regexes[] is the Python way to access a key in that hash, like $regexes{} in Perl.
In regexes[(...)], the round brackets denote a Python tuple (immutable); in Perl, a list, I guess.
The tuple contains the items 3 and (255,0,0), which is itself a tuple.
The number 3 is only there to create a unique key; it has nothing to do with the regex itself (and nothing to do with \3).
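A small sketch showing that the key number never takes part in matching: the same pattern is stored under key 3 and under a made-up key 99, and both find the same block. Here \3 means capture group 3 of the regex, nothing else (Python's standard re module stands in for the Notepad++ engine):

```python
import re
from collections import OrderedDict

regexes = OrderedDict()
# One pattern, stored under two keys; 99 is a made-up number for the demo.
pattern = r'(?s)(<<)([\'"]?)(\w+)\2;.*?\3'
regexes[(3, (255, 0, 0))] = (pattern, [0])
regexes[(99, (0, 0, 255))] = (pattern, [0])

text = '$x=<<EOT;\nPlain text here\nEOT'
results = {}
for (key_id, color), (regex, groups) in regexes.items():
    m = re.search(regex, text)
    # \3 refers to capture group 3 of this regex; key_id plays no part.
    results[key_id] = m.group(0) if m else None

print(results[3] == results[99])  # True
print(list(regexes))              # keys come back in insertion order
```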
So regexes[(3, (255,0,0))] refers to the value stored under the key (3, (255,0,0)) in the dict (hash) regexes.
The value is (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [0]).
Again a Python tuple, containing the items r'…' (a raw string) and a list [] (an array in Perl, i.e. mutable).
Everything within the raw string is the regex to search for, and the list tells which match group(s) should be used for coloring.
[0] is always the overall match of the complete regex, [1] would be the result of group 1, [2] of group 2, and [1,2] of groups 1 and 2.
So, in terms of regular expressions, only the value part of the regexes hash/dict is of interest: the raw string for searching, and the list [] for which part gets colored.
Does this make sense to you?
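How the group list selects what gets colored can be sketched like this. spans_to_color is a hypothetical helper, not the actual code of EnhancePerlLexer.py, and Python's re stands in for the Notepad++ engine (\h spelled [ \t], (?s-i) spelled (?s)). It only computes the character ranges a styling step would paint:

```python
import re
from collections import OrderedDict

rule = r'(?s)(<<)[ \t]+(\'|")(\w+?)\2[ \t]*;.*?\3'
regexes = OrderedDict()
regexes[(4, (0, 0, 224))] = (rule, [1, 3])  # paint << and the name
regexes[(5, (224, 0, 0))] = (rule, [0])     # paint the whole match


def spans_to_color(regexes, text):
    """Hypothetical helper: (start, end, color) for each group listed in each rule."""
    found = []
    for (_, color), (pattern, groups) in regexes.items():
        for m in re.finditer(pattern, text):
            for g in groups:
                start, end = m.span(g)
                if start != -1:  # skip groups that did not take part in the match
                    found.append((start, end, color))
    return found


text = "$x=<< 'TEXT';\nPlain text here\nTEXT"
for start, end, color in spans_to_color(regexes, text):
    print(repr(text[start:end]), color)
```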
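The reason why this regex
regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\2\h*;.*?\3)', [1,3])
doesn’t do what you want is that you now have 4 groups, whereas @guy038 removed the outer group brackets:
(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3
With the extra outer group, every group number shifts by one, so \2 now refers to (<<) and \3 to the quote, instead of the quote and the name.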
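The shift in numbering can be sketched with Python's re module (assumed translation: \h becomes [ \t], (?s-i) becomes (?s)). Wrapping the rule in one more pair of parentheses makes \2 point at (<<), so after the name the regex looks for another << that is not there:

```python
import re

text = "$x=<< 'TEXT';\nPlain text here\nTEXT"

# guy038's id4 rule, and the same rule wrapped in an extra outer group.
guy = re.compile(r'(?s)(<<)[ \t]+([\'"])(\w+?)\2[ \t]*;.*?\3')
wrapped = re.compile(r'(?s)((<<)[ \t]+([\'"])(\w+?)\2[ \t]*;.*?\3)')

print(guy.groups, wrapped.groups)  # 3 4
print(guy.search(text).group(3))   # TEXT
# In the wrapped version \2 means (<<) and \3 the quote, so no match.
print(wrapped.search(text))        # None
```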
To make it work, use either
regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+([\'"])(\w+?)\2\h*;.*?\3', [1,3])
or
regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\3\h*;.*?\4)', [2,4])
(with the outer group, << and the name are groups 2 and 4, so the list changes to [2,4] as well)
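Both fixes can be compared in a short sketch (Python's re again, with the same assumed translation of \h and (?s-i)). Without the outer group, << and the name are groups 1 and 3; with it, they are groups 2 and 4, and group 1 is the whole block:

```python
import re

text = "$x=<< 'TEXT';\nPlain text here\nTEXT"

option_a = re.compile(r'(?s)(<<)[ \t]+([\'"])(\w+?)\2[ \t]*;.*?\3')
option_b = re.compile(r'(?s)((<<)[ \t]+([\'"])(\w+?)\3[ \t]*;.*?\4)')

a = option_a.search(text)
b = option_b.search(text)
print([a.group(g) for g in (1, 3)])  # ['<<', 'TEXT']
print([b.group(g) for g in (2, 4)])  # ['<<', 'TEXT']
print(b.group(1) == b.group(0))      # True: group 1 is the whole block
```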
-
No idea what the “chcp 1250…” posting was supposed to be saying to me. :)
This thread gets my vote for the biggest jumbled mess in the history of the community. :)
-
Maybe @Ekopalypse will write a summary manual once this is over… I refuse :)
-
You mean a short manager summary I guess :-D
-
If a short manager summary is, in your eyes, a fully featured guide covering all eventualities, based on all the caveats of the whole topic… then yes 😉
-
LOL - back to business