Perl language syntax highlighting troubles (bug or limitation ?)
- 
The regexes assume double quotes and a semicolon directly attached to EOT.
Like
print << "EOT";
--------------------- separation line ------------------
EOT
Is there a rule how this is specified?
 - 
I think I found why.
Your regexp says :
r'(?s)((<<)"*(\w+?)"*;.*?\3)'
Wouldn’t it be better as:
r'(?s)(\h*(<<)\h*"*(\w+?)"*\h*;.*?\3)' ???
To answer your question:
Perl allows
- <<TEXT,
 - << TEXT
 - <<'TEXT' / << 'TEXT'
 - <<"TEXT" / << "TEXT"
 
meanings differ in each case…
 - 
To be honest - I’m not a regex expert at all :-D
If you, as a perl developer, say so I would absolutely believe it is :-) - 
In your Python regexp, what’s the meaning of:
- “\3”
 - “, [2]” and “[2,3]” ?
 
If I can understand that, I think I could translate Perl regex code into Python (for this case at least).
 - 
What about using this
(?s)((<<)\h+(["|'])(\w+?)\3\h*;.*?\4) - 
\3 is the boost::regex convention to denote match group 3
and the trailing list ([2] or [2,3]) defines which match groups should actually be painted.
 
Like if you have:
r'(word1)(word2)(word3)', [2,3]
would mean that only word2 and word3 would be painted,
whereas if you specified
r'(word1)(word2)(word3)', [0]
everything would be colored.
Does this make sense to you?
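To make the group list idea concrete, here is a small sketch with Python's own re module. Note the assumptions: `painted` is a made-up helper for illustration only (the lexer script does its colouring through Notepad++ and boost::regex, not like this), and group 0 is taken to mean the whole match, as it does in Python.

```python
import re

# Hypothetical helper, NOT part of the lexer script: returns the text of the
# requested groups, mimicking how the group list selects what gets painted.
def painted(pattern, text, groups):
    m = re.search(pattern, text)
    if m is None:
        return []
    return [m.group(g) for g in groups]

sample = "word1word2word3"
print(painted(r'(word1)(word2)(word3)', sample, [2, 3]))  # only groups 2 and 3
print(painted(r'(word1)(word2)(word3)', sample, [0]))     # group 0 = whole match
```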
 - 
I don’t understand your regexp syntax. Perhaps too ‘pythonized’ for me.
(?s): what does it mean? Is it ‘s///’? Or really a non-capturing group of ‘s’???
\3 \4: are they $3 $4? I don’t think so, as I can’t see a 4th accumulator - 
(?s) is a modifier telling the engine that the dot matches line endings,
and yes, the engine uses \1 and $1.
Here is the link to the documentation, maybe easier for you.
 - 
ooppps
(?s)((<<)\h+(["|'])(\w+?)\3\h*;.*?\3)
:-D
 - 
Ok
Another one: in Python, must you say ["|'] instead of Perl’s ["'] (‘either one of the set’)? Is that what it means? - 
No, afaik a non-capturing group is (?:pattern).
This, (?s), just tells the engine that the dot . also matches
EOLs like \r\n, if I’m right. - 
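A quick way to see what (?s) does, sketched with Python's re module rather than boost::regex (as far as I know the inline flag means the same in both engines; the sample text is made up):

```python
import re

text = "first\nsecond"

# Without (?s) the dot stops at the newline, so the pattern cannot span lines.
no_flag = re.search(r'first.*second', text)

# With (?s) the dot also matches the newline.
with_flag = re.search(r'(?s)first.*second', text)

print(no_flag)
print(repr(with_flag.group(0)))
```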
Just for clarification: the Python script does NOT use the Python regex engine; instead
it uses the one Notepad++ offers, boost::regex.
Yes, you can write the enumeration without the pipe, but the pipe sign makes it more
readable for me. Or is there a difference between using the pipe sign or not? - 
or maybe this one might be even better
(?s)(<<)\h+(["'])(\w+?)\2\h*;.*?\3 - 
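For anyone who wants to try this last proposal outside Notepad++, here is a rough sketch using Python's re module. Two assumptions: \h is not available in Python's re, so it is swapped for [ \t], and the sample texts are made up. This is not the boost::regex engine the lexer script actually uses, so details may differ.

```python
import re

# Assumption: [ \t] stands in for boost's \h (horizontal whitespace),
# which Python's re module does not support.
HEREDOC = re.compile(r'''(?s)(<<)[ \t]+(["'])(\w+?)\2[ \t]*;.*?\3''')

double_quoted = 'print << "EOT";\n--- separation line ---\nEOT\n'
single_quoted = "print << 'END' ;\nsome text\nEND\n"
attached = 'print <<"EOT";\n--- separation line ---\nEOT\n'

for sample in (double_quoted, single_quoted, attached):
    m = HEREDOC.search(sample)
    # group 2 is the quote character, group 3 the terminator word
    print(m.group(2, 3) if m else None)
```

The third sample finds nothing because \h+ demands at least one blank after <<; \h* would accept both spellings. Also note that .*?\3 stops at the first occurrence of the terminator word, even if it sits in the middle of a line.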
Can’t reply what I wanted, a robot says I’m spamming…
 - 
Can’t reply what I wanted, a robot says I’m spamming…
I have no idea why this happens sometimes.
By the way, now that you have installed the PythonScript plugin, would you mind
clicking Plugins->Python Script->Scripts->Samples->RegexTester?
I know not everyone recommends it but, personally, I love it.
 - 
AFAIK, at least in Perl, ["|'] means double-quote OR pipe OR single-quote; everything between square brackets is literal. Also true in “awk” and C regexp I think.
I don’t know for Python. - 
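For what it's worth, Python's own re module treats the pipe inside square brackets as a literal character too. A minimal check (this is Python's engine, not boost::regex, so take it as a hint only):

```python
import re

with_pipe = re.compile(r'''["|']''')     # double quote, pipe or single quote
without_pipe = re.compile(r'''["']''')   # double quote or single quote only

for ch in ['"', "'", '|']:
    print(repr(ch),
          bool(with_pipe.fullmatch(ch)),
          bool(without_pipe.fullmatch(ch)))
```

So the only practical difference is that the version with the pipe would also accept a stray | where a quote is expected.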
Now, if I say in Python (attempt to transliterate from Perl):
(r'(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)', [2])
does it mean:
- form REGEXP
 - do not match NL with DOT
 - matches any horizontal blanks (0 or more), don’t make a group
 - matches ‘<<’ make it a group
 - matches any horizontal blanks (0 or more), don’t make a group
 - matches 0 or 1 text quote (either double or single), no group
 - matches a group of any chars not " nor ' one or more time(s) (in perl it would be [^"'])
 - matches 0 or 1 text quote (either double or single), no group
 - possible blanks until semi-colon, semi-colon, then possible chars until NL
 
BUT THEN, what does
?\3 mean? I’m lost there. - 
a
slash m - 
The
r at the beginning just informs Python that this is a raw string, so
every char is taken literally; otherwise backslashes would be treated
as escapes under some circumstances.
The regex string is only this part:
(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
and I would say (but, as said, I’m not a regex expert at all):
(?s) means the dot matches newline characters,
the first matching group is
(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
the second is
(<<)
and the third must be
([^"|^']+?)
if I’m right.
\3 should be the same as $3 in Perl.
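The group numbering can be checked with a short sketch in Python's re module. Assumptions: [ \t] replaces \h (Python's re does not know \h), the sample input is made up, and Python's engine is only a stand-in for boost::regex here.

```python
import re

# Assumption: [ \t] replaces boost's \h, which Python's re lacks.
pattern = re.compile(
    r'''(?s)([ \t]*(<<)[ \t]*["|']?([^"|^']+?)["|']?[ \t]*;.*?\3)'''
)

sample = 'print << "EOT";\nbody\nEOT\n'   # made-up test input
m = pattern.search(sample)

# Groups are numbered by the position of their opening parenthesis:
# 1 = the outer group, 2 = (<<), 3 = the terminator word that \3 refers back to.
for number in (1, 2, 3):
    print(number, repr(m.group(number)))
```

Group 1 comes out as the whole construct including the leading blank, group 2 as `<<`, and group 3 as the terminator word, which is what \3 has to find again at the end.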