Perl language syntax highlighting troubles (bug or limitation ?)

  • E
    Ekopalypse @Gilles Maisonneuve
    last edited by Mar 19, 2019, 11:44 PM

    @Gilles-Maisonneuve

    maybe this picture makes it a little bit clearer

    1 Reply Last reply Reply Quote 2
    • E
      Ekopalypse @Gilles Maisonneuve
      last edited by Ekopalypse Mar 19, 2019, 11:49 PM Mar 19, 2019, 11:47 PM

      @Gilles-Maisonneuve

      still confused: ([^"|^']+?) why is there a ‘?’ after the ‘+’? What is this ‘?’ for?

      It makes the ‘+’ match as few characters as possible (non-greedy).

      and then \3 would mean the 3rd matching group (third ‘()’), but in Perl that is used only in substitutions. What is its use here? There are only 2 groups in the regex (only two blocks surrounded by parentheses).

      It is a placeholder for whatever match group 3 captured, used here to find the closing EOT at the end.

      And there are 3 match groups, or am I missing something?
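To make the non-greedy point concrete, here is a minimal sketch using Python's standard `re` module. This is an assumption for illustration only: the script in this thread runs its regexes through Notepad++, not through `re`, but greedy and non-greedy quantifiers behave the same way there. The sample string is made up.

```python
import re

sample = '"first" and "second"'

# Greedy: .+ grabs as much as possible, so it runs up to the LAST quote.
greedy = re.search(r'"(.+)"', sample).group(1)

# Non-greedy: .+? grabs as little as possible, so it stops at the NEXT quote.
lazy = re.search(r'"(.+?)"', sample).group(1)

print(greedy)
print(lazy)
```

The only difference between the two patterns is the `?` after the `+`.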

      G 1 Reply Last reply Mar 20, 2019, 12:03 AM Reply Quote 1
      • G
        Gilles Maisonneuve @Ekopalypse
        last edited by Mar 20, 2019, 12:03 AM

        @Ekopalypse

        Only 2 sets of parentheses; where is the third set?
        So only 2 match groups.

        Can you make this work:

        No syntax error on the Python console but absolutely no result; where is my bug?

        regexes[(3, (255,255,255))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [1])
        
        E 1 Reply Last reply Mar 20, 2019, 12:19 AM Reply Quote 0
        • E
          Ekopalypse @Gilles Maisonneuve
          last edited by Ekopalypse Mar 20, 2019, 12:20 AM Mar 20, 2019, 12:19 AM

          @Gilles-Maisonneuve

          [1] tells the Python script that only the result of sub match group 1 should be colored in white (255,255,255).
          Sub match group 1 is the result of (<<)

          To color the whole match, use [0].

          I’m still confused about the 2 versus 3 match groups.
          Am I incorrect in saying that
          (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
          (<<)
          ("{0,1}.+"{0,1})
          are three match groups?

          Maybe the confusion comes from the fact that group references within a
          regular expression start at 1, whereas Python numbers match results from 0.

          Sorry, but I have to get up early tomorrow and it is already 1am, but I’m really
          interested in resolving our (mis)understanding later today (maybe in ~16-18 hours)?
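One way to settle the 2-versus-3 question is to let a regex engine count the groups. The sketch below uses Python's standard `re` module, which is an assumption: the script itself searches through Notepad++, but capturing groups are numbered by their opening parenthesis in both. The sample here-doc is made up.

```python
import re

# The regex from this thread, unchanged.
pattern = re.compile(r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)')

# Hypothetical sample here-doc, made up for this sketch.
sample = '$x=<<EOT;\nPlain text here\nEOT\n'

m = pattern.search(sample)
group_count = pattern.groups      # number of capturing groups in the regex

whole = m.group(0)    # group 0: the overall match
outer = m.group(1)    # group 1: the outermost ( ... )
marker = m.group(2)   # group 2: (<<)
tag = m.group(3)      # group 3: ("{0,1}.+"{0,1}), the text that \3 repeats

print(group_count, repr(marker), repr(tag), whole == outer)
```

Because the outermost parentheses enclose the entire regex, group 1 and group 0 hold the same text here.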

          1 Reply Last reply Reply Quote 1
          • G
            Gilles Maisonneuve
            last edited by Mar 20, 2019, 12:20 AM

            ok, tomorrow is another day
            ‘see’ you tomorrow.
            have a good night.
            g

            1 Reply Last reply Reply Quote 1
            • E
              Ekopalypse
              last edited by Mar 20, 2019, 12:21 AM

              you too - see you

              G 1 Reply Last reply Mar 20, 2019, 12:27 AM Reply Quote 1
              • G
                Gilles Maisonneuve @Ekopalypse
                last edited by Mar 20, 2019, 12:27 AM

                @Ekopalypse

                OK, so the

                (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
                

                is a regex group, not a function call surrounded by parentheses or a logical group provided by the ‘r’ keyword. My mistake.
                BUT THEN, is it possible in Python to include an instruction such as ?\3, which means (as far as I understood what you explained earlier) a recursive reference to a regexp named ‘3’? The name ‘3’ being given in the expression regexes[(3, (255,255,255))], is that correct? So you can reference an expression within itself while it has not been closed yet: the last parenthesis of expression 3 comes after the \3. Is that what it means?

                Python syntax is a bit complicated to me.

                A 1 Reply Last reply Mar 20, 2019, 12:41 AM Reply Quote 0
                • A
                  Alan Kilborn @Gilles Maisonneuve
                  last edited by Mar 20, 2019, 12:41 AM

                  @Gilles-Maisonneuve said:

                  Python syntax is a bit complicated to me

                  It’s not Python syntax, it’s regular expression syntax. It’s just not Perl regular expression syntax. :)

                  And, BTW, nobody in the history of the world, especially someone coming from a Perl background, has ever uttered the phrase you typed.

                  G 1 Reply Last reply Mar 20, 2019, 8:50 AM Reply Quote 1
                  • G
                    guy038
                    last edited by guy038 Mar 20, 2019, 2:02 AM Mar 20, 2019, 1:04 AM

                    Hello @gilles-maisonneuve, @eko-palypse and All,

                    Gilles, could you verify that the two lines below work with your Red, Green and Blue colors?

                    regexes[(3, (R,G,B))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                    regexes[(4, (R,G,B))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                    

                    For these two regexes :

                    • Group 1 = << = two less-than signs

                    • Group 2 = ['"]? = an optional single or double quote, for regex id 3

                    • Group 2 = '|" = a mandatory single or double quote, separated from the << characters by blank characters, for regex id 4

                    • Group 3 = \w+? = the shortest run of word characters, after the << sign, between possible quotes
                      and before a semicolon character ;, with possible blank characters before and/or after the quote characters

                    Notes :

                    • In regex id 3, only the << string is highlighted ( Group 1 )

                    • In regex id 4, the << and the text between quotes are highlighted ( Groups 1 and 3 )

                    • I added the -i in-line modifier ( => (?s-i) leading syntax ) to be sure that the ending boundary of the block corresponds exactly with the text, between quotes ( search is sensitive to case ! )


                    So my regex (?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3 ( id 3 ) matches any of these six cases, below :

                    $x=<<TEXT;
                    Plain text here
                    TEXT
                    
                    $x=<<'TEXT';
                    Plain text here
                    TEXT
                    
                    $x=<<"TEXT";
                    Plain text here
                    TEXT
                    
                    $x=<<TEXT ;
                    Plain text here
                    TEXT
                    
                    $x=<<'TEXT' ;
                    Plain text here
                    TEXT
                    
                    $x=<<"TEXT" ;
                    Plain text here
                    TEXT
                    

                    And my regex (?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3 ( id 4 ) matches these 4 cases, below :

                    $x=<< 'TEXT';
                    Plain text here
                    TEXT
                    
                    $x=<< "TEXT";
                    Plain text here
                    TEXT
                    
                    $x=<< 'TEXT' ;
                    Plain text here
                    TEXT
                    
                    $x=<< "TEXT" ;
                    Plain text here
                    TEXT
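For anyone who wants to check these cases outside Notepad++, here is a rough sketch with Python's standard `re` module. Two substitutions are assumptions made only for this sketch: `re` does not support `\h`, so `[ \t]` stands in for it, and the `-i` modifier is dropped because `re` is case-sensitive by default. The helper name `check` is hypothetical.

```python
import re

# Triple-quoted raw strings avoid having to escape the quotes in the regex.
REGEX_ID_3 = re.compile(r'''(?s)(<<)(['"]?)(\w+?)\2[ \t]*;.*?\3''')
REGEX_ID_4 = re.compile(r'''(?s)(<<)[ \t]+('|")(\w+?)\2[ \t]*;.*?\3''')

BODY = '\nPlain text here\nTEXT\n'
cases_id_3 = ['$x=<<TEXT;', "$x=<<'TEXT';", '$x=<<"TEXT";',
              '$x=<<TEXT ;', "$x=<<'TEXT' ;", '$x=<<"TEXT" ;']
cases_id_4 = ["$x=<< 'TEXT';", '$x=<< "TEXT";',
              "$x=<< 'TEXT' ;", '$x=<< "TEXT" ;']

def check(regex, headers):
    """Return, for each header line, the tag captured in group 3 (or None)."""
    tags = []
    for header in headers:
        m = regex.search(header + BODY)
        tags.append(m.group(3) if m else None)
    return tags

print(check(REGEX_ID_3, cases_id_3))
print(check(REGEX_ID_4, cases_id_4))
```

The two regexes do not overlap: regex id 3 rejects a blank right after `<<`, and regex id 4 requires one.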
                    

                    Best Regards,

                    guy038

                    G 2 Replies Last reply Mar 20, 2019, 8:09 AM Reply Quote 3
                    • G
                      Gilles Maisonneuve @guy038
                      last edited by Mar 20, 2019, 8:09 AM

                      @guy038

                      Hello Guy,

                      Could not make it work, sorry.

                      I mean:

                      • added (replacing the original ones) the following lines to Ekopalypse’s EnhancePerlLexer.py, according to what you gave me:

                        regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                        regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])

                      • saved it and restarted npp

                      • I still have the same coloring; it is not working.

                      BUT, good news:

                      python console:
                      Traceback (most recent call last):
                      File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\startup.py", line 1, in <module>
                          import EnhancePerlLexer
                      File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 36
                          regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                                                                              ^
                      SyntaxError: EOL while scanning string literal
                      Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
                      Initialisation took 110ms
                      Ready.
                      

                      Can you tell me what I did wrong?
                      (When I comment out the two lines, I get back valid coloring for the ‘q*’ syntaxes. Yes, I forgot to tell you, this had vanished too…)

                      1 Reply Last reply Reply Quote 1
                      • G
                        Gilles Maisonneuve @guy038
                        last edited by Mar 20, 2019, 8:13 AM

                        @guy038

                        Well, I commented out the rule 3 and kept rule 4.
                        Same kind of error:

                         regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                                                                                                ^
                         SyntaxError: EOL while scanning string literal
                        
                        G 1 Reply Last reply Mar 20, 2019, 8:37 AM Reply Quote 0
                        • G
                          Gilles Maisonneuve @Gilles Maisonneuve
                          last edited by Mar 20, 2019, 8:37 AM

                          if I modify the rule like:

                          regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\2\h*;.*?\3)', [1,3])
                          

                          I no longer get a syntax error in Python, BUT I get no coloring for the here-doc either…

                          Any idea ?

                          1 Reply Last reply Reply Quote 0
                          • G
                            Gilles Maisonneuve @Alan Kilborn
                            last edited by Mar 20, 2019, 8:50 AM

                            @Alan-Kilborn

                            chcp 1250 >NUL: & perl -e "$var=q(Alan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.); for my $p ('\t','\s') {print qq{\$p=$p},$var=~m/($p)déplaisant\1/x?$var:qq{n'en déplaise},qq{\n} ;};" & chcp 850 >NUL:
                            
                            $p=\tn'en déplaise
                            $p=\sAlan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.
                            
                            G 1 Reply Last reply Mar 20, 2019, 8:56 AM Reply Quote 0
                            • G
                              Gilles Maisonneuve @Gilles Maisonneuve
                              last edited by Mar 20, 2019, 8:56 AM

                              I am so used to using $1, $2, …, which do not work within a plain ‘match’ but only in a ‘substitute’, that I did not know about this way of repeating matching patterns. I learned something.
                              Duly noted.
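For readers coming from the same background, here is a small sketch of where each form applies in Python's standard `re` module (an assumption for illustration; the thread's script searches through Notepad++ instead). The sample sentence is made up.

```python
import re

sentence = 'He is unpleasant in tone but he is right.'

# Inside the pattern, \1 must match again whatever group 1 captured.
# Group 1 is a single whitespace character, so the word needs the same
# whitespace character on both sides.
m = re.search(r'(\s)unpleasant\1', sentence)
captured = m.group(1)          # after the match, roughly what Perl calls $1

# A tab never occurs in the sentence, so this variant finds nothing.
no_match = re.search(r'(\t)unpleasant\1', sentence)

# In a replacement string, \1 and \2 refer to the captured groups.
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', 'hello world')

print(repr(captured), no_match, swapped)
```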

                              E 1 Reply Last reply Mar 20, 2019, 11:59 AM Reply Quote 0
                              • E
                                Ekopalypse @Gilles Maisonneuve
                                last edited by Ekopalypse Mar 20, 2019, 12:02 PM Mar 20, 2019, 11:59 AM

                                @Gilles-Maisonneuve

                                Lunch break :-)

                                First, I’m sorry for not telling you that the single quote has to be escaped, because it is
                                also used to delimit the Python string. Good, you already figured it out.
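As a sketch of the quoting problem: there are at least two ways to put both quote characters into a Python raw string, and the unescaped form can be shown to fail by compiling it as text. The helper name `is_valid_python` is hypothetical, and standard `re` is used here only to check what the character class matches.

```python
import re

# Option 1: escape the single quote inside a single-quoted raw string.
# The backslash stays in the pattern, where \' simply means a literal '.
escaped = r'[\'"]'

# Option 2: use a triple-quoted raw string, so no escaping is needed.
triple = r'''['"]'''

# The unescaped form, held as source text so that it can be compiled safely.
broken_source = "pattern = r'(<<)(['\"]?)'"

def is_valid_python(source):
    """Return True if the source text compiles, False on a SyntaxError."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:
        return False

for quote in ("'", '"'):
    print(bool(re.fullmatch(escaped, quote)), bool(re.fullmatch(triple, quote)))
print(is_valid_python(broken_source))
```

In the broken form, the `'` inside the character class ends the raw string early, and the `"` that follows opens a second string that is never closed on that line.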

                                Let me break down the parts of that python code

                                regexes = OrderedDict()
                                regexes[(3, (255,0,0))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [0])
                                

                                regexes is a variable containing an OrderedDict class instance.
                                An OrderedDict is more or less the same as a Perl associative array or hash.

                                regexes[] is the Python way to access a key in that hash, like regexes{} in Perl.
                                regexes[()]: the round brackets denote a Python tuple, in Perl a list I guess (immutable).
                                The Python tuple contains the items 3 and (255,0,0), which is again a tuple.
                                The number 3 is only there to create a unique key; it has nothing to do with the regex itself.
                                So, regexes[(3, (255,0,0))] = ... means: store the value under key (3, (255,0,0)) in the dict (hash) regexes.

                                The value is (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [0])
                                Again, a Python tuple, containing the items r'…' (a raw string) and a list [] (in Perl an array, which is mutable).
                                Everything within the raw string is the regex to be searched for, and the list specifies
                                which match groups should be used for coloring.
                                [0] is always the overall match of the complete regex, [1] would be the result from group 1,
                                [2] from group 2, and [1,2] from group 1 and group 2.

                                So, in terms of regular expressions, only the value part of the regexes hash/dict is of interest:
                                the raw string defines what is searched for, and the list [] defines which parts get colored.

                                Does this make sense to you?
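The structure described above can be sketched as follows. This is not the code of EnhancePerlLexer.py: the helper `spans_to_color` is hypothetical, it uses Python's standard `re` module instead of the Notepad++ search, and the regex and sample text are simplified for illustration.

```python
from collections import OrderedDict
import re

regexes = OrderedDict()
# key   = (unique id, (R, G, B) color); the id 3 is unrelated to \3 in any regex
# value = (regex as a raw string, list of group numbers to color)
regexes[(3, (255, 0, 0))] = (r'(?s)(<<)(\w+);.*?\2', [1])

def spans_to_color(text, rules):
    """Hypothetical helper: collect (color, (start, end)) for every requested group."""
    spans = []
    for (_rule_id, color), (pattern, groups) in rules.items():
        for match in re.finditer(pattern, text):
            for group in groups:
                spans.append((color, match.span(group)))
    return spans

sample = '$x=<<EOT;\nhello\nEOT\n'
print(spans_to_color(sample, regexes))

# Switching the list to [0] selects the overall match instead of group 1.
regexes[(3, (255, 0, 0))] = (r'(?s)(<<)(\w+);.*?\2', [0])
print(spans_to_color(sample, regexes))
```

With `[1]` only the `<<` is selected; with `[0]` the span runs from `<<` to the end of the closing `EOT`.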

                                The reason why this regex

                                regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\2\h*;.*?\3)', [1,3])
                                

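                                doesn’t do what you want is that you now use 4 groups, whereas @guy038
                                removed the outer matching group brackets. The outer brackets become group 1, so every
                                inner group number shifts up by one and \2 and \3 no longer point to the quote and the word.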

                                (?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3

                                In order to make it work either use

                                regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+([\'"])(\w+?)\2\h*;.*?\3', [1,3])
                                or
                                regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\3\h*;.*?\4)', [1,3])
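The renumbering can be checked with a short sketch in Python's standard `re` module. As an assumption for this sketch, `[ \t]` replaces `\h` (which `re` does not support) and `-i` is dropped; the variable names and the sample text are made up.

```python
import re

# Without outer brackets: quote = group 2, word = group 3.
flat = re.compile(r'''(?s)(<<)[ \t]+(['"])(\w+?)\2[ \t]*;.*?\3''')

# Outer brackets added but backreferences left unchanged:
# \2 now means "<<" and \3 means the quote, so nothing matches.
wrapped_stale = re.compile(r'''(?s)((<<)[ \t]+(['"])(\w+?)\2[ \t]*;.*?\3)''')

# Outer brackets added and backreferences shifted by one.
wrapped_fixed = re.compile(r'''(?s)((<<)[ \t]+(['"])(\w+?)\3[ \t]*;.*?\4)''')

sample = "$x=<< 'TEXT';\nPlain text here\nTEXT\n"

flat_match = flat.search(sample)
stale_match = wrapped_stale.search(sample)
fixed_match = wrapped_fixed.search(sample)

print(flat_match.group(1), flat_match.group(3))
print(stale_match)
print(fixed_match.group(2), fixed_match.group(4))
```

In the wrapped version the `<<` and the word sit in groups 2 and 4, at least by `re`'s numbering.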

                                G 1 Reply Last reply Mar 20, 2019, 8:10 PM Reply Quote 2
                                • A
                                  Alan Kilborn
                                  last edited by Mar 20, 2019, 12:09 PM

                                  No idea what the “chcp 1250…” posting was supposed to be saying to me. :)

                                  This thread gets my vote for the biggest jumbled mess in the history of the community. :)

                                  M 1 Reply Last reply Mar 20, 2019, 12:21 PM Reply Quote 4
                                  • M
                                    Meta Chuh moderator @Alan Kilborn
                                    last edited by Mar 20, 2019, 12:21 PM

                                    maybe @Ekopalypse will write a summary manual once this is over … I refuse :)

                                    1 Reply Last reply Reply Quote 3
                                    • E
                                      Ekopalypse
                                      last edited by Mar 20, 2019, 12:22 PM

                                      You mean a short manager summary I guess :-D

                                      M 1 Reply Last reply Mar 20, 2019, 12:27 PM Reply Quote 1
                                      • M
                                        Meta Chuh moderator @Ekopalypse
                                        last edited by Mar 20, 2019, 12:27 PM

                                        if a short manager summary is, in your eyes, a fully featured guide, covering all eventualities, based on all caveats of the whole topic … then yes 😉

                                        1 Reply Last reply Reply Quote 3
                                        • E
                                          Ekopalypse
                                          last edited by Mar 20, 2019, 12:30 PM

                                          LOL - back to business

                                          1 Reply Last reply Reply Quote 2