Perl language syntax highlighting troubles (bug or limitation ?)
-
OK, another one: in Python you must say ["|'] instead of Perl's ["'] (‘either one of the set’)? Is that what it means?
-
No, AFAIK a non-capturing group is (?:pattern).
This, (?s), just tells the engine that the dot . also matches EOLs like \r\n, if I’m right.
-
Just for clarification, the Python script does NOT use the Python regex engine; instead it uses the one Notepad++ offers, boost::regex.
Yes, you can use the enumeration without the pipe, but the pipe sign makes it more visible for me. Or is there a difference if used with the pipe sign or without?
-
or maybe this one might be even better
(?s)(<<)\h+(["'])(\w+?)\2\h*;.*?\3
-
Can’t reply what I wanted, a robot says I’m spamming…
-
Can’t reply what I wanted, a robot says I’m spamming…
I have no idea why this happens sometimes.
By the way, now that you have installed the PythonScript plugin, would you mind clicking Plugins->Python Script->Scripts->Samples->RegexTester?
I know not everyone is recommending it but, personally, I love it.
-
AFAIK, at least in Perl, ["|'] means double quote OR pipe OR single quote; everything between square brackets is literal. Also true in “awk” and C regexp, I think.
I don’t know for Python.
-
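A quick check of the pipe question, as a minimal sketch: it uses Python's own re module, not the boost::regex engine that Notepad++ uses, on the assumption that both treat a character class the same way here. Inside square brackets the pipe is just one more literal character (only a few characters such as a leading ^, -, ] and \ keep a special meaning there), so ["|'] also matches | itself. The sample string is made up for the test.

```python
import re

# Inside [...] the pipe is an ordinary character, not alternation.
with_pipe = re.compile(r'''["|']''')
without_pipe = re.compile(r'''["']''')

# Made-up sample containing a double quote, a pipe and a single quote.
sample = '''a"b|c'd'''

pipe_hits = with_pipe.findall(sample)
plain_hits = without_pipe.findall(sample)

print(pipe_hits)   # the pipe shows up as a match of its own
print(plain_hits)  # only the two quote characters
```

So the pipe is not needed to mean ‘either quote’, and leaving it in makes the class accept a pipe as well.
-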
Now, if I say in Python (attempt to transliterate from Perl):
(r'(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)', [2])
does it mean :
- form REGEXP
- do not match NL with DOT
- matches any horizontal blanks (0 or more), don’t make a group
- matches ‘<<’ make it a group
- matches any horizontal blanks (0 or more), don’t make a group
- matches 0 or 1 text quote (either double or single), no group
- matches a group of any chars that are neither " nor ', one or more time(s) (in Perl it would be [^"'])
- matches 0 or 1 text quote (either double or single), no group
- possible blanks until semi-colon, semi-colon, then possible chars until NL
BUT THEN, what does ?\3 mean? I’m lost there.
-
a
slash m
-
The r at the beginning just informs Python that this is a raw string and every char must be taken literally; otherwise backslashes would be treated as escapes under some circumstances.
The regex string is only this part:
(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
and I would say (but, as said, I am not a regex expert at all):
(?s) means the dot matches newline characters
the first matching group is (\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
the second is (<<)
and the third must be ([^"|^']+?)
if I’m right.
\3 should be the same as $3 in Perl
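To double-check the numbering, here is a small sketch with Python's re module. It is only an approximation of what Notepad++ does: [ \t] stands in for \h, which Python's re does not know, and the heredoc sample is invented. Groups are numbered by the position of their opening parenthesis, so the outer pair is group 1. Note also that in [^"|^'] only the first ^ negates; the class excludes ", |, ^ and '.

```python
import re

# Assumption: [ \t] replaces Boost's \h, which Python's re module lacks.
pattern = re.compile(
    r'''(?s)([ \t]*(<<)[ \t]*["|']?([^"|^']+?)["|']?[ \t]*;.*?\3)'''
)

# Invented Perl heredoc used only for this check.
sample = '$x = <<"EOT";\nPlain text here\nEOT\n'

match = pattern.search(sample)

print(pattern.groups)         # how many capturing groups the regex defines
print(repr(match.group(1)))   # outer parentheses: the whole heredoc
print(repr(match.group(2)))   # the << operator
print(repr(match.group(3)))   # the tag that \3 has to find again
```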
-
still confused:
([^"|^']+?)
why a ‘?’ after the ‘+’, what is this ‘?’ for?
And then \3 would mean the 3rd matching group (third ‘()’), but in Perl it is used only in substitutions. What is the use here? There are only 2 groups in the regex (two blocks surrounded by parentheses only).
-
maybe this picture makes it a little bit clearer
-
> still confused: ([^"|^']+?) why a ‘?’ after the ‘+’ what’s for this ‘?’
As few as possible: non-greedy.
> and then \3 would mean the 3rd matching group (third ‘()’) but in Perl is used only in substitutions. What is the use here? There are only 2 groups in the regex (two blocks surrounded by parentheses only).
It is a placeholder for what was found in match group 3, used to find the EOT at the end.
And there are 3 match groups, or am I missing something?
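Both points can be seen in a tiny sketch. It uses Python's re module rather than boost::regex, and a made-up text with two EOT lines. A ? after a quantifier makes it stop as early as possible, and a backreference inside the pattern must match the same text the group captured.

```python
import re

# Made-up text: one heredoc start, then two lines that both read EOT.
text = "<<EOT;\nfirst\nEOT\nsecond\nEOT\n"

# (\w+) captures the tag; \1 must then match that same text again.
lazy = re.search(r"(?s)<<(\w+);.*?\1", text)    # .*? stops at the first EOT
greedy = re.search(r"(?s)<<(\w+);.*\1", text)   # .* runs on to the last EOT

print(repr(lazy.group(0)))
print(repr(greedy.group(0)))
```

The non-greedy version ends at the first terminator, which is what a heredoc needs.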
-
2 sets of parentheses only, where is the third set? So only 2 match groups.
Can you make this work? No syntax error on the Python console but absolutely no result, where is my bug?
regexes[(3, (255,255,255))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [1])
-
[1] informs the Python script that only the results from sub match group 1 should be colored in white (255,255,255).
Sub match group 1 is the result of (<<).
In order to make it paint everything, you can use [0].
I’m still confused about the 2 to 3 match groups.
Am I incorrect when saying that
(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
(<<)
("{0,1}.+"{0,1})
are three match groups?
Maybe the confusion comes from the fact that references to matches within a regular expression start at 1, but Python starts counting match results at 0.
Sorry, but I have to get up early tomorrow and it is already 1am, but I’m really interested in solving our (mis)understanding later today (maybe in ~16-18 hours)?
-
ok, tomorrow is another day
‘see’ you tomorrow.
have a good night.
g
-
you too - see you
-
OK, so the
(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
is a regex group, not a function call surrounded by parentheses or a logical group provided by the ‘r’ keyword. My mistake.
BUT THEN, is it possible in Python to enclose an instruction such as ?\3, which means (as far as I understood what you explained to me earlier) a recursive reference to a regexp named ‘3’??? The ‘3’ name being given in the expression regexes[(3, (255,255,255))], is that correct? SO you can reference an expression within itself while it has not been closed yet: the last parenthesis of the expression 3 is after the \3. Is that what it means?
Python syntax is a bit complicated to me.
-
@Gilles-Maisonneuve said:
Python syntax is a bit complicated to me
It’s not Python syntax, it’s regular expression syntax. It’s just not Perl regular expression syntax. :)
And, BTW, nobody in the history of the world, especially someone coming from a Perl background, has ever uttered the phrase you typed.
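To illustrate that \3 is regular expression syntax and has nothing to do with the 3 in regexes[(3, ...)], here is a hedged sketch. The dictionary below is only a stand-in for the one in the script, the key 99 and the sample are invented, and [ \t] replaces \h because Python's own re module is used instead of boost::regex.

```python
import re

# Stand-in for the script's dictionary; keys are (id, color) as in the thread.
regexes = {}
pattern = r'''(?s)(<<)(['"]?)(\w+?)\2[ \t]*;.*?\3'''
regexes[(3, (255, 255, 255))] = (pattern, [1])
regexes[(99, (255, 255, 255))] = (pattern, [1])   # hypothetical second id

# Invented sample heredoc.
sample = "$x=<<'TEXT';\nPlain text here\nTEXT\n"

# \3 refers to the third pair of parentheses inside the pattern itself,
# so the result is the same whatever id the dictionary key carries.
results = [re.search(rx, sample).group(3) for rx, _groups in regexes.values()]
print(results)
```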
-
Hello @gilles-maisonneuve, @eko-palypse and All,
Gilles, could you verify that the two lines below work, with your Red, Green and Blue colors?
regexes[(3, (R,G,B))] = (r'''(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3''', [1])
regexes[(4, (R,G,B))] = (r'''(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3''', [1,3])
For these two regexes:
- Group 1 = << = the double less-than sign
- Group 2 = ['"]? = an optional single or double quote, for regex id 3
- Group 2 = '|" = a mandatory single or double quote, separated from the << characters by blank characters, for regex id 4
- Group 3 = \w+? = the shortest run of word characters, after the << sign, between possible quotes and before a semicolon character ;, with possible blank characters before and/or after the quote characters
Notes:
- In regex id 3, only the << string is highlighted (Group 1)
- In regex id 4, the << and the text between quotes are highlighted (Groups 1 and 3)
- I added the -i in-line modifier (=> (?s-i) leading syntax) to be sure that the ending boundary of the block corresponds exactly with the text between quotes (the search is case-sensitive!)
So my regex (?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3 (id 3) matches any of these six cases, below:
$x=<<TEXT;
Plain text here
TEXT

$x=<<'TEXT';
Plain text here
TEXT

$x=<<"TEXT";
Plain text here
TEXT

$x=<<TEXT ;
Plain text here
TEXT

$x=<<'TEXT' ;
Plain text here
TEXT

$x=<<"TEXT" ;
Plain text here
TEXT
And my regex (?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3 (id 4) matches these 4 cases, below:
$x=<< 'TEXT';
Plain text here
TEXT

$x=<< "TEXT";
Plain text here
TEXT

$x=<< 'TEXT' ;
Plain text here
TEXT

$x=<< "TEXT" ;
Plain text here
TEXT
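For anyone who wants to check these cases outside Notepad++, here is a rough sketch with Python's re module. It is an approximation, not the boost::regex engine: [ \t] replaces \h, and (?s-i) is reduced to (?s) because, as far as I know, Python's re does not accept a global -i and is case-sensitive by default anyway. The last lines show why case sensitivity matters: with case ignored, the word "text" in the body already ends the block.

```python
import re

# Assumptions: [ \t] stands in for \h, and (?s) stands in for (?s-i).
ID3 = re.compile(r'''(?s)(<<)(['"]?)(\w+?)\2[ \t]*;.*?\3''')
ID4 = re.compile(r'''(?s)(<<)[ \t]+('|")(\w+?)\2[ \t]*;.*?\3''')

BODY = "\nPlain text here\nTEXT\n"
id3_cases = ["$x=<<TEXT;", "$x=<<'TEXT';", '$x=<<"TEXT";',
             "$x=<<TEXT ;", "$x=<<'TEXT' ;", '$x=<<"TEXT" ;']
id4_cases = ["$x=<< 'TEXT';", '$x=<< "TEXT";',
             "$x=<< 'TEXT' ;", '$x=<< "TEXT" ;']

# Group 3 is the heredoc tag found in each case.
id3_tags = [ID3.search(head + BODY).group(3) for head in id3_cases]
id4_tags = [ID4.search(head + BODY).group(3) for head in id4_cases]

# Regex id 4 needs a blank after <<, so this case is not matched by it.
no_blank = ID4.search("$x=<<'TEXT';" + BODY)

# With case ignored, the backreference stops early at "text" in the body.
loose = re.compile(ID3.pattern, re.IGNORECASE)
early = loose.search(id3_cases[0] + BODY).group(0)

print(id3_tags, id4_tags, no_blank, repr(early))
```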
Best Regards,
guy038
-