Perl language syntax highlighting troubles (bug or limitation ?)

  • @PeterJones - thx for clarification. So a regex for the problematic keywords would be this?


    Is there a nicer way instead of encasing every keyword by word boundary switch?

    yes, and now type one of the words like q, sub, format … into the area with the correct
    colored keywords and see what happens.

  • @Gilles-Maisonneuve

    All right then, if it’s a Scintilla bug, let’s wait for a possible correction of their lexer.

    I’m not sure this really helps for npp usage: once the Perl lexer gets updated,
    npp needs to pick up that update as well and, as said, npp uses a rather old Scintilla at the moment.

  • @Ekopalypse said:

    Is there a nicer way instead of encasing every keyword by word boundary switch?

    Encase it in bulk?
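
    For what it’s worth, “in bulk” could look like the sketch below. It uses Python’s re module only so that it can be run anywhere; the keyword list is an assumption copied from the test lines in this thread, and bulk_boundary_pattern is a made-up helper name. The \b(?:…)\b construct itself should behave the same way in the Boost regex engine that Notepad++ uses.

```python
import re

# Assumed keyword list, copied from the test document in this thread.
KEYWORDS = ["q", "qq", "qr", "qw", "qx", "tr", "y", "sub", "format", "m", "s", "x"]


def bulk_boundary_pattern(words):
    """Wrap the whole alternation in a single pair of word boundaries."""
    # Longest alternatives first, purely for readability; the trailing \b
    # already keeps "q" from matching inside "qq".
    ordered = sorted(words, key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(w) for w in ordered) + r")\b"


PATTERN = re.compile(bulk_boundary_pattern(KEYWORDS))

sample = "sub qq format subroutine reformat tr"
print(PATTERN.findall(sample))
```

    Words that merely contain a keyword (subroutine, reformat) are left alone, because the boundaries apply to the group as a whole.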


    Though I don’t think that list is right.

    The reason why the “sub” and some of the others didn’t highlight is that some of the quote-like operators require extra symbols to complete them; otherwise, they are not considered keywords:

    As you can see, the sub and format do highlight as keywords. The m// and s/// also highlight properly when they are complete regular-expression notation. The x is an operator, and highlights properly as an operator.

    The non-m and non-s quote-like operators q qq qr qw qx tr y, however, appear to not be coloring at all, even when properly closed, despite the fact that the lexer can recognize that they need to be properly opened and closed.

    I would say the list of keywords that need enhanced-UDL highlighting (until fixed in the lexer) are limited to:


    @Gilles-Maisonneuve ,

    About the __DATA__ and __END__ keywords I think their coloring is wrong as they should be regarded as keywords or text or longquote, so they should not render as “default” text.

    The lexer is applying the same DATASECTION styling to the __DATA__ and __END__ keywords as to the text beyond them: it’s not applying “default” style, when used in proper syntax:
    I’m not sure that’s as much a bug as a difference in opinion on how the DATASECTION lexing should be handled; I think it was probably a design decision, rather than accidentally including them as part of the DATASECTION

    But, as such, I don’t think they should be listed in the INSTRUCTION WORDS (keywords) list – though they also don’t need handling by the enhanced UDL

  • @Ekopalypse ,

    I forgot to include the source-code of my image:

    my $x = m//;
    # if the q is properly bracketed, it allows others to highlight properly
    q(qq qr qw qx tr sub format m x y s);
    q{qq qr qw qx tr sub format m x y s};
    q/qq qr qw qx tr sub format m x y s/;
    sub blah { 'properly highlighted sub' }
    format Somethine =
        Test: @<<<<<<<< @|||| @>>>>>>
              $str,     $%,   '$'.int($num)
    .
    sub another { 'here' }
    # x is actually an operator (highlighted blue), which says "repeat string n times"
    'blah' x 5;
    # these need two matching symbols, or a matched set of paren-like symbols
    q//;    q(); 
    qq//;   qq{}; 
    qr//;   qr(); 
    qw//;   qw[]; 
    qx//;   qx(); 
    # these need three matching symbols, or two matched sets
    tr///;      tr(srch)(repl)opts;
    y///;       y{srch}{repl}opts;
    # m and s are special, in that they actually show up as part of the regex coloring
    m/search/opts;              m(search)opts;
    s/search/replace/opts;      s(search)(replace)opts;
    sub blah {} # still highlights right
    # here, sub won't highlight properly because the q operator isn't complete
    sub blah {

  • With this information, SciTE looks much better, maybe mostly correct.

    Except for the qr line, where the error text is also colored, it should look like this, right?

  • @Ekopalypse
    About typing ‘q’, ‘format’ … in the middle of the block of text containing valid Perl keywords: since this block is in an n++ window where we asked for Perl syntax highlighting, the result seems quite normal:

    • a. q, qq (and I guess qx and qr too, I did not verify…) accept any character as separator. So in, let’s say, “q gethostbyname gethostent”, the separator is the first ‘g’ (the blank is not accepted, it is ignored) and the string runs until the next ‘g’. That is why ‘gethostbyname’ is white (my “default” color); then the ‘g’ of gethostent terminates the q string and is white too. (According to me the two ‘g’ used as separators should be colored as separators, but that would mean the editor really understands the character in the context of execution of Perl… woaw! Not even ActiveState Komodo does that: it treats the first and last separators as a “’” quote, or as a ‘"’ quote when using qq.)
    • b. Same for format: the next word is a format name; it’s not a string, it’s an ID (like the static file ID inside a diamond operator, as in while (<MYFILE>) {…}), since all the formats are created at compile time. Then the coloring goes back to “default” in the coloring scheme. Not nice but admitable (if this adjective exists in English…).
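
    The separator rule from item a can be tried out with a small toy model. This is only a sketch in Python of the rule as described above, not of Perl’s real tokenizer: split_q_string is a hypothetical helper, it only handles a separator that closes with the same character, and it ignores bracket pairs and escapes.

```python
def split_q_string(source):
    """Return (separator, content) for a q operator, or None if it is unterminated."""
    if not source.startswith("q"):
        raise ValueError("expected text starting with the q operator")
    # Blanks between the operator and the separator are skipped.
    rest = source[1:].lstrip()
    if not rest:
        return None
    separator = rest[0]
    # The string runs until the next occurrence of the same character.
    end = rest.find(separator, 1)
    if end == -1:
        return None
    return separator, rest[1:end]


print(split_q_string("q gethostbyname gethostent"))
```

    In this model the q string swallows everything between the two ‘g’ characters, which matches the white stretch described above.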

    I did not test all the other cases, but we may well run into the same kind of syntactic rules.

    My problem with the syntax coloring of all the q* operators is that their content should be colored as TEXT to respect the Perl definition of those operators (Quote and Quote-like Operators), so it should most likely be grey in your case, and in mine a kind of low-light pink. But it falls back to white, the “default” color.

    Look at how this looks in AS Komodo: link text

    All right, I understand: since n++ uses an old version of the Scintilla editor toolkit and migrating to a newer version is not a piece of cake, it won’t be done any time soon, even if a hypothetical newer version of Scintilla corrects it.
    Well, after all it’s not something horrible, just annoying. No need to break all the n++ code to integrate a new Scintilla for that.

    All right for __DATA__ (== __END__); my mistake: my color for them is white in my color scheme, and white is the default color, but it is perhaps the same as text for others. Not really important for me (Komodo deals with it with a different coloring, but no big fuss).

    But the here-document syntax coloring is more annoying: ‘<<’ should be colored either like the other separators or like keywords, according to me. It’s clearly not matching syntactically.

    Thank you both for your answers.


  • @Gilles-Maisonneuve said:

    But the here document syntax coloring is more annoying : ‘<<’ should be colored either as the other separators or keywords according to me

    Yeah. I wouldn’t say keywords; I would think either operators or punctuation would be the appropriate place for the << to get its coloring from.

    @Ekopalypse ,
    Thanks for the SciTE comparison. I’d say in that test doc, it was highlighting in a reasonable manner; maybe not exactly what I thought would be, but at least it’s recognizing and highlighting the syntax

    Since you have access to the SciTE, I’d like to see how it renders these examples of heredoc syntax, if you have time:

    # for completeness, << as shift operator
    $b = (1 << 5);
    # heredoc with quotes
    $x =<<"EOX";
    Something with embedded $y
    EOX
    # heredoc without quotes
    $z =<<EOZ;
    Plain text here
    EOZ
    # heredoc with space highlights as operator in Notepad++
    $z =<< EOZ;
    Plain text here
    EOZ
    # all the heredoc text formats as Notepad++ default, rather than any of the Perl-specific style categories

    I’m curious which of those << SciTE will color, and which it won’t.


  • @Ekopalypse ,

    Wow, lightning fast. :-)

    Except for the last, that’s what I’d actually hope for.

    I just learned something: according to perlop, in order to allow the space between the << and the EOZ, it actually has to be quoted.

    There may not be a space between the << and the identifier, unless the identifier is explicitly quoted.

    Before reading that, I was going to say that the lexer was missing that functionality. But I guess we’d have to check

    $z =<< "EOZ";
    Plain text here
    EOZ

    to see if it knows that exception.
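
    To make the perlop rule concrete, here is a rough sketch in Python (not the lexer’s actual logic) that classifies the opener lines used in this thread. The HEREDOC pattern and the classify helper are my own assumptions; backtick-quoted and indented (<<~) heredocs are deliberately left out.

```python
import re

# Bare identifier: must follow << immediately.
# Quoted identifier: optional blanks are allowed between << and the quote.
HEREDOC = re.compile(
    r"""<<(?:(?P<bare>[A-Za-z_]\w*)|[ \t]*(?P<quote>["'])(?P<quoted>[^"']*)(?P=quote))"""
)


def classify(line):
    """Return (kind, terminator) for a line that may open a heredoc."""
    m = HEREDOC.search(line)
    if m is None:
        return ("no heredoc", None)
    if m.group("bare") is not None:
        return ("bare", m.group("bare"))
    return ("quoted", m.group("quoted"))


for line in ['$b = (1 << 5);', '$x =<<"EOX";', '$z =<<EOZ;', '$z =<< EOZ;', '$z =<< "EOZ";']:
    print(line, "->", classify(line))
```

    Under this reading, the shift operator and the unquoted form with a space are not heredoc openers, while the quoted form with a space is.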

    So, the updated perl lexer in scintilla definitely handles perl highlighting better than the version that’s in Notepad++.

  • @PeterJones

    Is it only me or is the server acting strange today?
    I get 503 and 4s and no updates - have to manually refresh the page …

  • @Ekopalypse

    Is it only me or is the server acting strange today?
    I get 503 and 4s and no updates - have to manually refresh the page …

    yes, the downtimes today are higher than usual.
    i hope it’s not another ddos attack.
    anyone who knows more, please keep us informed.

  • Using these regexes (I know they aren’t optimal yet), we could get something like
    this npp snippet picture. Note, I just used the blue color to show the difference to error text.
    What is a nice regex way to do something like if ( then ) or if [ then ] or if { then } ??
    And of course by creating match groups we could divide the quoting operators from the following “correct” text which then would be colored differently - if wanted.
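
    One possible answer to the “if ( then )” question, sketched with Python’s re module rather than in Notepad++ itself: spell out one alternative per bracket pair and let a backreference handle separators that close with the same character. PAIRS and quote_like_pattern are hypothetical names, the operator list is an assumption, and nested brackets or escaped separators are not handled.

```python
import re

# Opening bracket -> closing bracket; other punctuation closes with itself.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def quote_like_pattern(operators=("q", "qq", "qr", "qw", "qx")):
    """Build a regex with an 'op' group and a 'body' group (separators included)."""
    ops = "|".join(sorted(operators, key=len, reverse=True))
    # One alternative per pair: opener, anything but the closer, closer.
    bracketed = "|".join(
        re.escape(opener) + "[^" + re.escape(closer) + "]*" + re.escape(closer)
        for opener, closer in PAIRS.items()
    )
    # Same-character separators: remember the opener, stop at its next occurrence.
    same_char = r"(?P<delim>[^\s\w(\[{<])(?:(?!(?P=delim)).)*(?P=delim)"
    return re.compile(
        r"\b(?P<op>" + ops + r")\s*(?P<body>" + bracketed + "|" + same_char + ")",
        re.DOTALL,
    )


PATTERN = quote_like_pattern()
m = PATTERN.search("my @a = qw[a b c];")
print(m.group("op"), m.group("body"))
```

    The two named groups also give the split mentioned above: the operator can get one color and the quoted body another.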

    I have to get up early tomorrow - so chrchrchr… :-)


  • @PeterJones said:

    Since you have access to the SciTE…

    Doesn’t everyone have access to it ?

  • @Alan-Kilborn said:

    Doesn’t everyone have access to it ?

    I was originally going to phrase it as “easy access (ie, already installed/available on your machine)”. But what I really should have said was “I am just about to leave for the day, and don’t feel like downloading another piece of software and mussing about with getting it installed or otherwise running, and figuring out how to get it to behave in the manner that Eko has already proved he knows how to make it work”, so stuck with the shorthand of “have access to”. :-)

  • Using this PythonScript results in something like the attached picture
    (using the Obsidian theme)

    # -*- coding: utf-8 -*-
    from Npp import editor, editor1, editor2, notepad, NOTIFICATION, SCINTILLANOTIFICATION, INDICATORSTYLE
    from collections import OrderedDict
    regexes = OrderedDict()
    # ------------------------------------------------- configuration area ---------------------------------------------------
    # id which is returned by editor.getLexer()
    BUILTIN_LEXER_ID = 6  # perl
    # Definition of colors and regular expressions
    #   Note, the order in which regular expressions will be processed is determined by its creation,
    #   that is, the first definition is processed first, then the 2nd, and so on
    #   The basic structure always looks like this
    #   regexes[(a, b)] = (c, d)
    #   regexes = an ordered dictionary which ensures that the regular expressions are always processed in the same order
    #   a = a unique number - suggestion, start with 0 and always increase by one
    #   b = color is either in the form of (r,g,b) or a single integer without round brackets.
    #       It is assumed that a single integer reflects an existing style id of the current lexer
    #       as defined in stylers.xml -> benefit it works with different themes flawlessly (as long as the theme is correct, of course)
    #   c = raw byte string, describes the regular expression. Example r'\w+'
    #   d = list of integers -> using different match group results per regex
    # examples for enhancing the perl lexer
    # color every instance of the quote-like operators q|qq|qr|qw|qx, including the delimited text,
    # with the same color as defined by style id 5 using result from matchgroup 0
    regexes[(1, 5)] = (r'\bq[rwqx]{0,1}\b([^\h]).*?\1|(\bq[rwqx]{0,1}\b\h+(\w).*?\3)', [0])
    regexes[(2, 5)] = (r'\bq[rwqx]{0,1}\b\h*(\(.+?\)|\[.+?\]|\{.+?\})', [0])
    # color the heredoc markers with the explicit rgb color (130,130,170)
    # using the result from matchgroup 2, or matchgroups 2 and 3 for the quoted form with a space
    regexes[(3, (130,130,170))] = (r'(?s)((<<)"*(\w+?)"*;.*?\3)', [2])
    regexes[(4, (130,130,170))] = (r'(?s)((<<)\h+"(\w+?)";.*?\3)', [2,3])
    # Definition of which area should not be styled
    # One needs to check the stylers.xml (or THEMENAME.xml) to be able to see which
    # ids are defined by the lexer in use and what their purposes are
    # Example: perl defines ids 0 to 21 (without 11 and 16) (??? - historical reasons ???)
            # <LexerType name="perl" desc="Perl" ext="">
                # <WordsStyle name="DEFAULT" styleID="0" fgColor="FF0000" ...
                # <WordsStyle name="ERROR" styleID="1" fgColor="FF80C0" ...
                # <WordsStyle name="COMMENT LINE" styleID="2" fgColor="008000" ...
                # <WordsStyle name="POD" styleID="3" fgColor="000000" 
                # <WordsStyle name="NUMBER" styleID="4" fgColor="FF0000" 
                # <WordsStyle name="INSTRUCTION WORD" styleID="5" fgColor="0000FF"
                # <WordsStyle name="STRING" styleID="6" fgColor="808080" 
                # <WordsStyle name="CHARACTER" styleID="7" fgColor="808080" 
                # <WordsStyle name="PUNCTUATION" styleID="8" fgColor="804000" 
                # <WordsStyle name="PREPROCESSOR" styleID="9" fgColor="804000" 
                # <WordsStyle name="OPERATOR" styleID="10" fgColor="000080" 
                # <WordsStyle name="SCALAR" styleID="12" fgColor="FF8000" 
                # <WordsStyle name="ARRAY" styleID="13" fgColor="CF34CF" 
                # <WordsStyle name="HASH" styleID="14" fgColor="8080C0" 
                # <WordsStyle name="SYMBOL TABLE" styleID="15" fgColor="FF0000" 
                # <WordsStyle name="REGEX" styleID="17" fgColor="8080FF" 
                # <WordsStyle name="REGSUBST" styleID="18" fgColor="8080C0" 
                # <WordsStyle name="LONGQUOTE" styleID="19" fgColor="FF8000" 
                # <WordsStyle name="BACKTICKS" styleID="20" fgColor="FFFF00" 
                # <WordsStyle name="DATASECTION" styleID="21" fgColor="808080" 
            # </LexerType>
    # defining 1 and 2 means that a regex match is ignored if the
    # position which should be colored has been styled as ERROR or COMMENT LINE
    excluded_styles = [1, 2]
    # ------------------------------------------------ /configuration area ---------------------------------------------------
    try:
        # on a re-run the class already exists, so just trigger it again
        EnhanceBuiltinLexer().main()
    except NameError:
        SC_INDICVALUEBIT = 0x1000000
        # value as defined in Scintilla.h
        SC_INDICFLAG_VALUEFORE = 1
        class SingletonEnhanceBuiltinLexer(type):
            '''
                Ensures, more or less, that only one
                instance of the main class can be instantiated
            '''
            _instance = None
            def __call__(cls, *args, **kwargs):
                if cls._instance is None:
                    cls._instance = super(SingletonEnhanceBuiltinLexer, cls).__call__(*args, **kwargs)
                return cls._instance
        class EnhanceBuiltinLexer(object):
            '''
                Provides additional color options and should be used in conjunction with the built-in UDL function.
                An indicator is used to avoid style collisions.
                Although the Scintilla documentation states that indicators 0-7 are reserved for the lexers,
                indicator 0 is used because UDL uses none internally.
                Even when using more than one regex, it is not necessary to define more than one indicator
                because the class uses the flag SC_INDICFLAG_VALUEFORE.
                See the Scintilla documentation on indicators for more information on that topic
            '''
            __metaclass__ = SingletonEnhanceBuiltinLexer
            def __init__(self):
                '''
                    Instantiates the class;
                    because of the __metaclass__ = ... usage, this is called only once.
                '''
                editor.callbackSync(self.on_updateui, [SCINTILLANOTIFICATION.UPDATEUI])
                notepad.callback(self.on_langchanged, [NOTIFICATION.LANGCHANGED])
                notepad.callback(self.on_bufferactivated, [NOTIFICATION.BUFFERACTIVATED])
                self.doc_is_of_interest = False
                self.lexer_id = BUILTIN_LEXER_ID
            @staticmethod
            def rgb(r, g, b):
                '''
                    Helper function
                    Retrieves rgb color triple and converts it
                    into its integer representation
                    Args:
                        r = integer, red color value in range of 0-255
                        g = integer, green color value in range of 0-255
                        b = integer, blue color value in range of 0-255
                    Returns:
                        integer
                '''
                return (b << 16) + (g << 8) + r
            @staticmethod
            def paint_it(color, matchgroups, match):
                '''
                    This is where the actual coloring takes place.
                    Color, matchgroups and match object must be provided.
                    Matchgroups define which group(s) is(are) of interest
                    Coloring occurs only if the position is not within the excluded range.
                    Args:
                        color = integer, expected in range of 0-16777215
                        matchgroups = list of integers
                        match = python re.match object
                '''
                for group in matchgroups:
                    pos = match.span(group)[0]
                    # skip groups which did not take part in the match or which start in an excluded style
                    if pos < 0 or editor.getStyleAt(pos) in excluded_styles:
                        continue
                    editor.setIndicatorCurrent(0)
                    editor.setIndicatorValue(color)
                    editor.indicatorFillRange(pos, match.span(group)[1] - pos)
            def style(self):
                '''
                    Calculates the text area to be searched for in the current document.
                    Deletes the old indicators before setting new ones and
                    calls the defined regexes.
                '''
                start_line = editor.docLineFromVisible(editor.getFirstVisibleLine())
                end_line = editor.docLineFromVisible(start_line + editor.linesOnScreen())
                start_position = editor.positionFromLine(start_line)
                end_position = editor.getLineEndPosition(end_line)
                editor.setIndicatorCurrent(0)
                editor.indicatorClearRange(0, editor.getTextLength())
                for color, regex in self.regexes.items():
                    editor.research(regex[0],
                                    lambda match: self.paint_it(color[1],
                                                                regex[1],
                                                                match),
                                    0,
                                    start_position,
                                    end_position)
            def configure(self):
                '''
                    Define basic indicator settings and reformat needed regexes.
                '''
                editor1.indicSetStyle(0, INDICATORSTYLE.TEXTFORE)
                editor1.indicSetFlags(0, SC_INDICFLAG_VALUEFORE)
                editor2.indicSetStyle(0, INDICATORSTYLE.TEXTFORE)
                editor2.indicSetFlags(0, SC_INDICFLAG_VALUEFORE)
                regex_list = []
                for k, v in regexes.items():
                    # an (r,g,b) tuple is used as is, a single integer is looked up as a style id
                    if isinstance(k[1], tuple):
                        fg_color = k[1]
                    else:
                        fg_color = editor.styleGetFore(k[1])
                    regex_list.append(((k[0], self.rgb(*fg_color) | SC_INDICVALUEBIT), v))
                self.regexes = OrderedDict(regex_list)
            def check_lexer(self):
                '''
                    Checks if the current document is of interest
                    and sets the flag accordingly
                '''
                self.doc_is_of_interest = editor.getLexer() == self.lexer_id
            def on_bufferactivated(self, args):
                '''
                    Callback which gets called every time one switches a document.
                    Triggers the check if the document is of interest.
                    Args:
                        provided by notepad object but none are of interest
                '''
                self.check_lexer()
            def on_updateui(self, args):
                '''
                    Callback which gets called every time scintilla
                    (aka the editor) changed something within the document.
                    Triggers the styling function if the document is of interest.
                    Args:
                        provided by scintilla but none are of interest
                '''
                if self.doc_is_of_interest:
                    self.style()
            def on_langchanged(self, args):
                '''
                    Callback gets called every time one uses the Language menu to set a lexer
                    Triggers the check if the document is of interest
                    Args:
                        provided by notepad object but none are of interest
                '''
                self.check_lexer()
            def main(self):
                '''
                    Main function entry point.
                    Simulates two events to enforce detection of current document
                    and potential styling.
                '''
                self.configure()
                self.on_bufferactivated(None)
                self.on_updateui(None)
        EnhanceBuiltinLexer().main()
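
    For anyone wondering about the color handling: the rgb helper and the SC_INDICVALUEBIT flag from the script can be tried outside Notepad++. The following is only a standalone sketch; indicator_value and unpack are hypothetical helpers that are not part of the script, added to show the round trip.

```python
SC_INDICVALUEBIT = 0x1000000  # same constant as used in the script


def rgb(r, g, b):
    """Pack an (r, g, b) triple into one integer with blue in the high byte."""
    return (b << 16) + (g << 8) + r


def indicator_value(r, g, b):
    """Color integer with the value bit set, as the script passes it on."""
    return rgb(r, g, b) | SC_INDICVALUEBIT


def unpack(value):
    """Drop the flag bit and return the (r, g, b) triple again."""
    color = value & 0xFFFFFF
    return (color & 0xFF, (color >> 8) & 0xFF, (color >> 16) & 0xFF)


value = indicator_value(130, 130, 170)
print(hex(value), unpack(value))
```

    The flag sits above the 24 color bits, so the color itself stays within the 0-16777215 range mentioned in the paint_it description.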

  • @Ekopalypse
    Woaw! that seems wonderful.

    Small question however, as I don’t have a clue about all what you wrote: how do I make this work for me? I don’t know Python, have no Python IDE or language installed on my machine. What should I do to make n++ uses this Python “trick” ?

    Thanks anyway for the work.

  • @Gilles-Maisonneuve

    may I ask you to post your debug info? (available under ? menu)
    to see which version of npp you are using and how you have set it up?
    This would make it easier to describe what needs to be done.

  • @Gilles-Maisonneuve

    What should I do to make n++ uses this Python “trick” ?

    If you are already on Notepad++ 7.6.3 or 7.6.4, the first thing you have to do is install the PythonScript plugin
    by following the guide “How to install the PythonScript plugin on Notepad++ 7.6.3, 7.6.4 and above”:

  • @Ekopalypse
    Here it is :
    Notepad++ v7.3.2 (32-bit)
    Build time : Feb 12 2017 - 23:15:39
    Path : C:\Program Files (x86)\Notepad++\notepad++.exe
    Admin mode : OFF
    Local Conf mode : OFF
    OS : Windows 7 (64-bit)
    Plugins : ComparePlugin.dll CustomizeToolbar.dll DSpellCheck.dll HTMLTag_unicode.dll MathPad.dll MenuIcons.dll mimeTools.dll NppCCompletionPlugin.dll NppColumnSort.dll NppExec.dll NppExport.dll NppSaveAsAdmin.dll NppTextFX.dll PluginManager.dll regrexplace.dll SessionMgr.dll

  • @Meta-Chuh

    All right, I am going to migrate right away and I will come back with my new debug info.

    @Ekopalypse : please do not take into account my previous information for the moment.
