Perl language syntax highlighting troubles (bug or limitation ?)

Ekopalypse

I’m living in DE but have to admit, that I haven’t heard of DWservice before, shame on me :-)

Changes
line 9 to BUILTIN_LEXER = 'perl'
line 111 self.lexer_name = BUILTIN_LEXER.lower()
line 219 to self.doc_is_of_interest = True if editor.getLexerLanguage().lower() == self.lexer_name else False

As python is very picky about whitespaces make sure that you either use
spaces or tabs for indentation only. Best python practice is to set
Settings->Preferences->Language->TabSettings->Python
Tab size = 4 and check replace by space checkbox
(if this isn’t your default setting, of course)

Gilles Maisonneuve

@Ekopalypse

Oops, I must have made some mistakes but can’t find where… Here is the console log:

Traceback (most recent call last):
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\startup.py", line 1, in <module>
    import EnhancePerlLexer
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 283, in <module>
    EnhanceBuiltinLexer().main()
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 280, in main
    self.on_bufferactivated(None)
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 237, in on_bufferactivated
    self.check_lexer()
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 224, in check_lexer
    self.doc_is_of_interest = True if editor.getLexerLanguage().lower() == self.lexer_name else False
AttributeError: 'EnhanceBuiltinLexer' object has no attribute 'lexer_name'
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
Initialisation took 343ms
Ready.

Traceback (most recent call last):
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 73, in <module>
    EnhanceBuiltinLexer().main()
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 280, in main
    self.on_bufferactivated(None)
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 237, in on_bufferactivated
    self.check_lexer()
  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 224, in check_lexer
    self.doc_is_of_interest = True if editor.getLexerLanguage().lower() == self.lexer_name else False
AttributeError: 'EnhanceBuiltinLexer' object has no attribute 'lexer_name'

Line numbers don’t match because I already commented out some of your lines but kept them in the file, and duplicated them with my own changes. But I did do the changes at the places you told me to do them.

Ekopalypse

@Gilles-Maisonneuve

check line 111 - it defines the lexer_name

Gilles Maisonneuve

OOPS, yours :== self.lexer_name, mine :== self_lexer_name, I am really a dumb when dealing with OO programming, can’t realize that ‘self’ is the current object and of course separated by a dot.

Colour has changed for q* keywords and there text (black on dark blue, can’t read but now just need to ajust the colors).
No change for here docs, but don’t know if I properly set the colors, have to check.

Send you a screen copy in a few minutes.

Gilles Maisonneuve

@Ekopalypse

All right, nearly done: with the following regexp in your python code:

regexes[(1, (255,0,128))] = (r'\bq[rwqx]{0,1}\b([^\h]).*?\1|(\bq[rwqx]{0,1}\b\h+(\w).*?\3)', [0])
regexes[(2, (255,0,128))] = (r'\bq[rwqx]{0,1}\b\h*(\(.+?\)|\[.+?\]|\{.+?\})', [0])
regexes[(3, (0,0,0))] = (r'(?s)((<<)"*(\w+?)"*;.*?\3)', [2])
regexes[(4, (0,0,0))] = (r'(?s)((<<)\h+"(\w+?)";.*?\3)', [2,3])

I get the following colors:

q* colors OK, here docs no

Q* colors are good {well I might have an uggly taste in colors but at least they match ;-)) }

Would you have any clue about why the here docs= are still not handled properly ? They should be black, I think.

Ekopalypse

@Gilles-Maisonneuve

the regexes assumes double quotes and semicolon directly attached to EOT.
Like

print << "EOT";

--------------------- separation line ------------------

EOT

Is there a rule how this is specified?

Gilles Maisonneuve

I think I found why.
Your regexp says :
r'(?s)((<<)"*(\w+?)"*;.*?\3)'
would not it be better if :
r'(?s)(\h*(<<)\h*"*(\w+?)"*\h*;.*?\3)'

???

To answer your question:

Perl allows

<<TEXT,
<< TEXT
<<‘TEXT’ / << ‘TEXT’
<<“TEXT” / << “TEXT”

meanings differ in each case…

Ekopalypse

To be honest - I’m not a regex expert at all :-D
If you, as a perl developer, say so I would absolutely believe it is :-)

Gilles Maisonneuve

@Ekopalypse

In your Python regexp, what’s the meaning of:

“\3”
“, [2]” and “[2,3]” ?

If I can understand what I think I could translate a Perl regex code into python (for this case at least).

Ekopalypse

What about using this
(?s)((<<)\h+(["|'])(\w+?)\3\h*;.*?\4)

Ekopalypse

is the boost:regex convention to denote match group 3
and
defines which match group actually should be painted

Like if you have:

r'(word1)(word2)(word3)', [2,3]

would mean that only word2 and word3 would be painted
whereas if you would specify

r'(word1)(word2)(word3)', [0]

everything would be colored.

Does this makes sense to you?

Gilles Maisonneuve

@Ekopalypse

I don’t understand your regexp syntax. Perhaps too ‘pythonized’ for me.

(?s) : what does it mean ? is it ‘s///’ ? or really a non capturing group of ‘s’ ???
\3 \4 : are they $3 $4, I don’t think as I can’t see a 4th accumulator

Ekopalypse

(?s) is a modifier telling the engine that the dot matches line endings
and yes, the engine uses \1 and $1

Here the link to the documentation - maybe easier for you.

Ekopalypse

ooppps

(?s)((<<)\h+(["|'])(\w+?)\3\h*;.*?\3)

:-D

Gilles Maisonneuve

This post is deleted!

Gilles Maisonneuve

@Ekopalypse

Ok
another one: in Python you must say ["|'] instead of Perl ["'] (‘either one of the set’) ? Is that what it means ?

Ekopalypse

No, afaik non-capturing group is (?:pattern)
This, (?s), just tells the engine that the dot . is matching
EOLs like \r\n - if I’m right.

Ekopalypse

Just for clarification, the python script does NOT use the python regex engine instead
it uses the one notepad++ offers, the boost::regex.
Yes, you can use the enumeration without the pipe but makes it more visible for me with
the pipe sign. Or is there a difference if used with pipe sign or without?

Ekopalypse

or maybe this one might be even better
(?s)(<<)\h+(["'])(\w+?)\2\h*;.*?\3

Gilles Maisonneuve

Can’t reply what I wanted, a robot says I’m spamming…