    Perl language syntax highlighting troubles (bug or limitation ?)

    • Ekopalypse

      The Perl lexer seems to have some glitches.
      Copy the following into a document and set the lexer to Perl.

      NULL __FILE__ __LINE__ __PACKAGE__ 
      
      AUTOLOAD BEGIN CORE DESTROY END EQ GE GT INIT LE LT NE CHECK abs accept alarm and atan2 bind binmode 
      bless caller chdir chmod chomp chop chown chr chroot close closedir cmp connect continue cos crypt dbmclose 
      dbmopen defined delete die do dump each else elsif endgrent endhostent endnetent endprotoent endpwent 
      endservent eof eq eval exec exists exit exp fcntl fileno flock for foreach fork formline ge getc getgrent 
      getgrgid getgrnam gethostbyaddr gethostbyname gethostent getlogin getnetbyaddr getnetbyname getnetent 
      getpeername getpgrp getppid getpriority getprotobyname getprotobynumber getprotoent getpwent getpwnam 
      getpwuid getservbyname getservbyport getservent getsockname getsockopt glob gmtime goto grep gt hex if index 
      int ioctl join keys kill last lc lcfirst le length link listen local localtime lock log lstat lt map mkdir msgctl msgget 
      msgrcv msgsnd my ne next no not oct open opendir or ord our pack package pipe pop pos print printf prototype 
      push  quotemeta qu rand read readdir readline readlink readpipe recv redo ref rename require reset return 
      reverse rewinddir rindex rmdir scalar seek seekdir select semctl semget semop send setgrent sethostent setnetent 
      setpgrp setpriority setprotoent setpwent setservent setsockopt shift shmctl shmget shmread shmwrite shutdown 
      sin sleep socket socketpair sort splice split sprintf sqrt srand stat study substr symlink syscall sysopen sysread 
      sysseek system syswrite tell telldir tie tied time times truncate uc ucfirst umask undef unless unlink unpack unshift untie until use utime 
      values vec wait waitpid wantarray warn while write  xor 
      
      q qq qr qw qx tr sub format m x y s
      __DATA__ __END__ 
      

      Now start playing with the keywords from the line q qq qr qw qx tr sub format m x y s by copying them in between
      the other keywords. Strange, isn’t it?

      As I don’t know Perl well enough, I would like your opinion: does it make sense to remove these special keywords
      from langs.xml and use something like the enhanced UDL lexer script to work around this issue?

      By the way, I checked with SciTE version 4.0.3 and it looks similarly weird.
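
The first half of that idea, dropping the problematic words from the keyword list, could be sketched in Python roughly like this. It is only an illustration: `PROBLEM_WORDS` is taken from the q qq qr ... line above, the sample string is a shortened stand-in for the real langs.xml keyword list, and `strip_keywords` is a hypothetical helper, not part of Notepad++ or PythonScript.

```python
# Hypothetical helper: remove the special words from a space-separated
# keyword string such as the one stored in langs.xml.
PROBLEM_WORDS = {"q", "qq", "qr", "qw", "qx", "tr", "sub", "format", "m", "x", "y", "s"}


def strip_keywords(keyword_text, unwanted=PROBLEM_WORDS):
    """Return keyword_text without the unwanted words, preserving order."""
    kept = [word for word in keyword_text.split() if word not in unwanted]
    return " ".join(kept)


# Shortened sample, not the full Perl keyword list.
sample = "print printf q qq qr push sub tr uc x xor y"
print(strip_keywords(sample))
```

Whole-word comparison is used on purpose, so that words such as xor or printf survive even though they contain x or tr-like fragments.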

      • Gilles Maisonneuve

        Well, well…

        When I copy your text into an n++ window and then use the menu “Language -> Perl”, I get all the keywords from ‘NULL’ to ‘xor’ in yellow on blue (which is right for Perl keywords in my color scheme, based on “vim Dark Blue”) and ‘q’ to ‘__END__’ in white (meaning “DEFAULT” in the syntax coloring), so the last set of keywords is not recognized.
        For example, I don’t get the color error you get with “quotemeta qu”; mine are all yellow (good). ‘shmctl shmget’ is good for me too.

        (Sorry, I don’t know how to present things as you do, with a colored block of text as in your message above, and I could not find a way to include a small screenshot of my n++ window. I would be glad to know how to do it for next time.)

        So it means that either the Scintilla lexer or the n++ lexer does not recognize these keywords even when they are stored in the keyword list. OK, noted.

        If someday there is a fix, I guess we (Perl users) will be happy to get it. If in the meantime someone knows a workaround, that would be nice.

        Thank you for your time and answer anyway, that’s very kind of you.

        • Ekopalypse

          To post a block of code, I put three ~ before the code and three ~ after it,
          but the coloring you see in this thread has nothing to do with npp.

          To include an image you have to upload it to an image hoster first and then
          use the syntax ![](url_to_uploaded_image)

          Concerning the issue: as this happens in SciTE as well, it lies within the Scintilla lexer.
          A workaround is the one I stated earlier in this thread: using the PythonScript plugin
          and a script similar to the one I posted for the UDL enhancements.
          I could create such a script but, as said, this means one needs to delete the problematic
          keywords from langs.xml and use the PythonScript plugin in addition.

          By the way, the __DATA__ and __END__ keywords also behave strangely: once they appear,
          everything afterwards is colored blue.

          • Gilles Maisonneuve

            All right, I found how you included an image in your text (not the code sample, though) and did the same; I hope it will render here too…
            This is a copy of my n++ with 'buggy' perl code coloring

            Well, it seems it does not display inline the way yours does, but at least I can place a link to an image. Not so user-friendly, but better than nothing.

            And now your text code with my rendering...

            edited: thank you for the image help; I added the ‘!’ before the link as explained, and it works.

            • PeterJones

              @Ekopalypse said:

              By the way, the __DATA__ and __END__ keywords also behave strangely: once they appear,
              everything afterwards is colored blue.

              That’s actually correct Perl: __DATA__ and __END__ are special triggers that tell the perl interpreter to ignore everything else in the document – nothing after either of those is valid Perl code, and the lexer is correct to style those according to the DATASECTION style in the Style Configurator.
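
The effect described here, where everything after the marker stops being code, can be sketched in Python as a simple split. This is a rough illustration under a simplifying assumption (the marker stands alone on its own line); `split_data_section` is a hypothetical name and is not how the Scintilla lexer is implemented.

```python
import re

# Simplifying assumption: the marker stands alone on its own line.
_MARKER = re.compile(r"^(__END__|__DATA__)[ \t]*\r?$", re.MULTILINE)


def split_data_section(source):
    """Split Perl source into (code, marker, data).

    marker is None and data is "" when no marker line is found.
    """
    match = _MARKER.search(source)
    if match is None:
        return source, None, ""
    code = source[:match.start()]
    data = source[match.end():]
    # Drop only the single newline that terminates the marker line.
    if data.startswith("\n"):
        data = data[1:]
    return code, match.group(1), data


perl = "print 1;\n__END__\nsub not_code { }\n"
code, marker, data = split_data_section(perl)
print(repr(code), marker, repr(data))
```

A highlighter following this model would style `code` with the normal Perl rules and give `data` a single flat style, which corresponds to the DATASECTION behaviour described above.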

              • Gilles Maisonneuve

                About the __DATA__ and __END__ keywords: I think their coloring is wrong, as they should be regarded as keywords, text, or longquote, so they should not render as “default” text.

                But the fact that everything after them is greyed or “whited” is normal: in Perl, the remainder is no longer Perl code but pure data, as it was on punched cards when we still used them. The text after those keywords is treated as raw data by perl, so it is not colored by a syntax analysis tool.

                All right then, if it’s a Scintilla bug, let’s wait for a possible correction of their lexer.
                Is there anything we can do to let them know about this little trouble? They might consider fixing it in the future if they know.

                • Ekopalypse @PeterJones

                  @PeterJones - thanks for the clarification. So a regex for the problematic keywords would be this?

                  \bq\b|\bqq\b|\bqr\b|\bqw\b|\bqx\b|\btr\b|\bsub\b|\bformat\b|\bm\b|\bx\b|\by\b|\bs\b
                  

                  Is there a nicer way than encasing every keyword in its own word-boundary pair?

                  @Gilles-Maisonneuve
                  yes, and now type one of the words like q, sub, format … into the area with the correctly
                  colored keywords and see what happens.

                  • Ekopalypse @Gilles Maisonneuve

                    @Gilles-Maisonneuve

                    All right then, if it’s a Scintilla bug, let’s wait for a possible correction of their lexer.

                    Not sure if this really helps for npp usage, because once the Perl lexer gets updated,
                    npp needs to update its copy as well and, as said, npp uses a rather old Scintilla at the moment.

                    • PeterJones

                      @Ekopalypse said:

                      Is there a nicer way than encasing every keyword in its own word-boundary pair?

                      Encase it in bulk?

                      \b(q|qq|qr|qw|qx|tr|sub|format|m|x|y|s)\b
                      

                      Though I don’t think that list is right.

                      The reason why “sub” and some of the others didn’t highlight is that some of the quote-like operators require extra symbols to be complete; otherwise, they are not considered keywords:
                      https://i.imgur.com/zf92skf.png

                      As you can see, the sub and format do highlight as keywords. The m// and s/// also highlight properly when they are complete regular-expression notation. The x is an operator, and highlights properly as an operator.

                      The non-m and non-s quote-like operators q qq qr qw qx tr y, however, appear to not be coloring at all, even when properly closed, despite the fact that the lexer can recognize that they need to be properly opened and closed.

                      I would say the list of keywords that need enhanced-UDL highlighting (until fixed in the lexer) is limited to:

                      \b(q|qq|qr|qw|qx|tr|y)\b
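
A quick way to convince oneself that the bulk form finds the same tokens as the one-boundary-pair-per-keyword form is to compare the match positions, sketched here with Python's `re` module (an assumption for illustration; Notepad++ itself uses a different regex engine, although `\b` and alternation behave the same way for plain ASCII words). The sample line and the helper `spans` are made up for this check.

```python
import re

WORDS = ["q", "qq", "qr", "qw", "qx", "tr", "sub", "format", "m", "x", "y", "s"]

# One \b pair per keyword, as in the first attempt.
per_word = re.compile("|".join(r"\b%s\b" % w for w in WORDS))
# One \b pair around a group, as in the bulk form.
grouped = re.compile(r"\b(?:%s)\b" % "|".join(WORDS))


def spans(pattern, text):
    """Return the (start, end) span of every match."""
    return [m.span() for m in pattern.finditer(text)]


text = "my $s = qq{x y}; sub format_it { tr/a-z/A-Z/ }"
same = spans(per_word, text) == spans(grouped, text)
found = [text[a:b] for a, b in spans(grouped, text)]
print(same, found)
```

Note that the s of the variable $s is matched as well, because `\b` only looks at word characters. A regex-based workaround would probably need extra context checks for cases like that.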
                      

                      @Gilles-Maisonneuve ,

                      About the __DATA__ and __END__ keywords: I think their coloring is wrong, as they should be regarded as keywords, text, or longquote, so they should not render as “default” text.

                      The lexer is applying the same DATASECTION styling to the __DATA__ and __END__ keywords as to the text beyond them: it’s not applying “default” style, when used in proper syntax:
                      https://i.imgur.com/dsOAYIc.png
                      I’m not sure that’s as much a bug as a difference in opinion on how the DATASECTION lexing should be handled; I think it was probably a design decision, rather than accidentally including them as part of the DATASECTION.

                      But, as such, I don’t think they should be listed in the INSTRUCTION WORDS (keywords) list, though they also don’t need handling by the enhanced UDL.

                      • PeterJones

                        @Ekopalypse ,

                        I forgot to include the source-code of my image:

                        my $x = m//;
                        
                        # if the q is properly bracketed, it allows others to highlight properly
                        q(qq qr qw qx tr sub format m x y s);
                        q{qq qr qw qx tr sub format m x y s};
                        q/qq qr qw qx tr sub format m x y s/;
                        sub blah { 'properly highlighted sub' }
                        format Something =
                            Test: @<<<<<<<< @|||| @>>>>>>
                                  $str,     $%,   '$'.int($num)
                        .
                        sub another { 'here' }
                        
                        # x is actually an operator (highlighted blue), which says "repeat string n times"
                        'blah' x 5;
                        
                        # these need two matching symbols, or a matched set of paren-like symbols
                        q//;    q(); 
                        qq//;   qq{}; 
                        qr//;   qr(); 
                        qw//;   qw[]; 
                        qx//;   qx(); 
                        # these need three matching symbols, or two matched sets
                        tr///;      tr(srch)(repl)opts;
                        y///;       y{srch}{repl}opts;
                        # m and s are special, in that they actually show up as part of the regex coloring
                        m/search/opts;              m(search)opts;
                        s/search/replace/opts;      s(search)(replace)opts;
                        
                        sub blah {} # still highlights right
                        
                        # here, sub won't highlight properly because the q operator isn't complete
                        q
                        
                        sub blah {
                        }
                        
                        __END__
                        grey
                        
                        • Ekopalypse

                          With this information, the SciTE rendering looks much better, maybe mostly correct.

                          Except for the qr line, where the error text is also colored, it should look like this, right?

                          • Gilles Maisonneuve

                            @Ekopalypse
                            In the case of using ‘q’, ‘format’ … amid the block of text containing valid Perl keywords: as this block is in an n++ window where we asked for Perl syntax highlighting, the result seems quite normal:

                            • a. q, qq (and I guess qx and qr too; I did not verify…) accept any character as separator. So in, let’s say, “q gethostbyname gethostent”, the separator is ‘g’ (the blank is not accepted, it is ignored) until the next ‘g’: ‘gethostbyname’ is white (my “default” color), and then the ‘g’ of gethostent terminates the q string and is white too. (In my opinion the two ‘g’ used as separators should be colored as separators, but that would mean the editor really understands the character in the context of execution of Perl… wow! Not even ActiveState Komodo does that, as it treats the first and last separators as either a “'” quote, or a ‘"’ quote if using qq.)
                            • b. The same goes for format: the next word is a format name; it’s not a string, it’s an ID (like the static file ID inside a diamond operator, as in while (<MYFILE>) {…}), as all formats are created at compile time. Then the coloring goes back to “default” in the coloring scheme. Not nice but admissible (if this adjective exists in English…).
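
The delimiter rule from item a can be sketched as a tiny scanner in Python. It is a deliberately simplified illustration: `scan_q` is a hypothetical helper that only handles the plain q operator, and it ignores backslash escapes, nested brackets, and the fact that a # after whitespace would start a comment in real Perl.

```python
CLOSERS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def scan_q(text):
    """Scan a q operator at the start of text.

    Returns (delimiter, body, rest), or None if the operator is incomplete.
    """
    if not text.startswith("q"):
        return None
    pos = 1
    # Whitespace between the operator and its delimiter is skipped.
    while pos < len(text) and text[pos].isspace():
        pos += 1
    if pos == len(text):
        return None
    opener = text[pos]
    # A word character can only act as delimiter when whitespace precedes it;
    # otherwise it is simply part of a longer identifier such as quotemeta.
    if (opener.isalnum() or opener == "_") and pos == 1:
        return None
    # Bracket-like openers close with their partner, anything else with itself.
    closer = CLOSERS.get(opener, opener)
    end = text.find(closer, pos + 1)
    if end == -1:
        return None
    return opener, text[pos + 1:end], text[end + 1:]


print(scan_q("q gethostbyname gethostent"))
print(scan_q("q(abc) x"))
print(scan_q("q\n\nsub blah {\n}\n"))
```

The first call treats g as the delimiter and stops at the g of gethostent. The last call finds no closing delimiter, which mirrors the earlier example where an incomplete q spoils the highlighting of the following sub.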

                            I did not test all the other cases, but it may be that we run into the same kind of syntactic rules.

                            My problem with the syntax coloring of all the q* operators is that their content should be colored as TEXT to respect the Perl definition of those operators (Quote and Quote-like Operators), and thus should most likely be grey in your case and a kind of light pink in mine. But they fall back to white, the “default” color.

                            Look at how this looks in AS Komodo: link text

                            All right, I understand: since n++ uses an old version of the Scintilla editor toolkit, and migrating to a newer version is not a piece of cake, it will not happen tomorrow, even if a hypothetical newer version of Scintilla corrects it.
                            Well, after all it’s not something horrible, just annoying. No need to break all the n++ code to integrate a new Scintilla for that.

                            @PeterJones
                            All right for __DATA__ (== __END__); my mistake: my color for them is white in my color scheme, and white is the default color, but perhaps the same as text for others. Not really important for me (Komodo deals with it with a different coloring, but no big fuss).

                            But the here-document syntax coloring is more annoying: ‘<<’ should be colored either like the other separators or like keywords, in my opinion. It’s clearly not syntactically consistent.

                            Thank you both for your answers.

                            Gilles

                            • PeterJones

                              @Gilles-Maisonneuve said:

                              But the here-document syntax coloring is more annoying: ‘<<’ should be colored either like the other separators or like keywords, in my opinion

                              Yeah. I wouldn’t say keywords; I would think either operators or punctuation would be the appropriate place for the << to get its coloring from.

                              @Ekopalypse ,
                              Thanks for the SciTE comparison. I’d say in that test doc, it was highlighting in a reasonable manner; maybe not exactly what I thought it would be, but at least it’s recognizing and highlighting the syntax.

                              Since you have access to the SciTE, I’d like to see how it renders these examples of heredoc syntax, if you have time:

                              # for completeness, << as shift operator
                              $b = (1 << 5);
                              
                              # heredoc with quotes
                              $x =<<"EOX";
                              Something with embedded $y
                              EOX
                              
                              # heredoc without quotes
                              $z =<<EOZ;
                              Plain text here
                              EOZ
                              
                              # heredoc with space highlights as operator in Notepad++
                              $z =<< EOZ;
                              Plain text here
                              EOZ
                              
                              # all the heredoc text formats as Notepad++ default, rather than any of the Perl-specific style categories
                              

                              I’m curious which of those << SciTE will color, and which it won’t.

                              Thanks.

                              • Ekopalypse

                                [Image-only reply; image not preserved.]

                                • PeterJones

                                  @Ekopalypse ,

                                  Wow, lightning fast. :-)

                                  Except for the last, that’s what I’d actually hope for.

                                  I just learned something: according to perlop, in order to allow the space between the << and the EOZ, it actually has to be quoted.

                                  There may not be a space between the << and the identifier, unless the identifier is explicitly quoted.

                                  Before reading that, I was going to say that the lexer was missing that functionality. But I guess we’d have to check

                                  $z =<< "EOZ";
                                  Plain text here
                                  EOZ
                                  

                                  to see if it knows that exception.

                                  So, the updated perl lexer in scintilla definitely handles perl highlighting better than the version that’s in Notepad++.
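
The perlop rule quoted above (no space between << and a bare identifier, space allowed before a quoted one) can be written down as a small Python check. This is a sketch under stated assumptions, not the logic of the Scintilla lexer: `heredoc_tag` is a hypothetical helper, the regex uses Python's `re` syntax, and it does not try to reproduce everything Perl does to tell a heredoc from a shift operator.

```python
import re

HEREDOC = re.compile(
    r"<<(?:"
    r"(?P<bare>[A-Za-z_]\w*)"                                 # <<EOZ
    r"|[ \t]*(?P<quote>[\"'`])(?P<quoted>[^\n]*?)(?P=quote)"  # <<"EOX" or << "EOZ"
    r")"
)


def heredoc_tag(line):
    """Return the heredoc terminator introduced on this line, or None."""
    match = HEREDOC.search(line)
    if match is None:
        return None
    if match.group("bare") is not None:
        return match.group("bare")
    return match.group("quoted")


for line in ['$b = (1 << 5);', '$x =<<"EOX";', '$z =<<EOZ;', '$z =<< EOZ;', '$z =<< "EOZ";']:
    print(line, "->", heredoc_tag(line))
```

Under this model the shift expression and the spaced bare identifier are not reported as heredocs, while the spaced quoted form is, which is the exception discussed above.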

                                  • Ekopalypse @PeterJones

                                    @PeterJones

                                    Is it only me or is the server acting strange today?
                                    I get 503 and 4s and no updates - have to manually refresh the page …

                                    • Meta Chuh (moderator) @Ekopalypse

                                      @Ekopalypse

                                      Is it only me or is the server acting strange today?
                                      I get 503 and 4s and no updates - have to manually refresh the page …

                                      yes, the downtimes today are higher than usual.
                                      i hope it’s not another ddos attack.
                                      anyone who knows more, please keep us informed.

                                      • Ekopalypse

                                        By using these regexes (I know they aren’t optimal yet), we could get something like
                                        this npp snippet picture. Note, I just used the blue color to show the difference from error text.
                                        What is a nice regex way to express something like “if ( then )”, “if [ then ]”, or “if { then }”?
                                        And of course, by creating match groups we could separate the quoting operators from the following “correct” text, which could then be colored differently, if wanted.

                                        I have to get up early tomorrow - so chrchrchr… :-)

                                        \b(q|qq|qr|qw|qx|tr|y)\b
                                        \b(q|qq|qr|qw|qx|tr|y)\b([\W]).*?\2
                                        \b(q|qq|qr|qw|qx|tr|y)\b(\().*?\)
                                        \b(q|qq|qr|qw|qx|tr|y)\b(\[).*?\]
                                        \b(q|qq|qr|qw|qx|tr|y)\b(\{).*?\}
                                        \b(q|qq|qr|qw|qx|tr|y)\b\h+(\w).*?\2
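
One possible answer to the "if ( then )" question is a single pattern with one alternative per bracket pair plus a back-reference for self-closing delimiters. The sketch below uses Python's `re` module purely for illustration (so `\s` stands in for `\h`, which Python lacks); `QUOTE_LIKE` and `find_quote_like` are made-up names, nested brackets and angle brackets are not handled, and for tr and y only the first delimited section is covered.

```python
import re

QUOTE_LIKE = re.compile(
    r"\b(?P<op>q|qq|qr|qw|qx|tr|y)\b\s*"
    r"(?:"
    r"\((?P<paren>[^)]*)\)"       # if ( then )
    r"|\[(?P<square>[^\]]*)\]"    # if [ then ]
    r"|\{(?P<curly>[^}]*)\}"      # if { then }
    # any other symbol closes itself, e.g. q/.../
    r"|(?P<delim>[^\w\s()\[\]{}<>])(?P<same>.*?)(?P=delim)"
    r")"
)


def find_quote_like(text):
    """Return (operator, matched_text) for every complete quote-like construct."""
    return [(m.group("op"), m.group(0)) for m in QUOTE_LIKE.finditer(text)]


sample = "my @a = qw[a b]; my $s = q(x) . qq{y} . q/z/;"
print(find_quote_like(sample))
print(find_quote_like("q\n\nsub blah {\n}"))
```

The named groups (paren, square, curly, same) keep the operator apart from the quoted text, so the two parts could be styled differently, as suggested above.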
                                        

                                        • Alan Kilborn @PeterJones

                                          @PeterJones said:

                                          Since you have access to the SciTE…

                                          Doesn’t everyone have access to it ?

                                          • PeterJones

                                            @Alan-Kilborn said:

                                            Doesn’t everyone have access to it ?

                                            I was originally going to phrase it as “easy access (ie, already installed/available on your machine)”. But what I really should have said was “I am just about to leave for the day, and don’t feel like downloading another piece of software and mussing about with getting it installed or otherwise running, and figuring out how to get it to behave in the manner that Eko has already proved he knows how to make it work”, so stuck with the shorthand of “have access to”. :-)

                                            The Community of users of the Notepad++ text editor.