Perl language syntax highlighting troubles (bug or limitation ?)

  • E
    Ekopalypse @Gilles Maisonneuve
    last edited by Mar 19, 2019, 11:19 PM

    @Gilles-Maisonneuve

    Can’t reply what I wanted, a robot says I’m spamming…

    I have no idea why this happens sometimes.

    By the way, now that you have installed the PythonScript plugin, would you mind
    clicking Plugins->Python Script->Scripts->Samples->RegexTester?

    I know not everyone is recommending it but, personally, I love it.

    • G
      Gilles Maisonneuve @Gilles Maisonneuve
      last edited by Mar 19, 2019, 11:22 PM

      AFAIK, at least in Perl, ["|'] means double-quote OR pipe OR single-quote; everything between square brackets is literal. Also true in “awk” and C regexp, I think.
      I don’t know about Python.
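A minimal sketch, assuming Python's standard `re` module (the thread's patterns target the Notepad++ search engine, which may differ in other details): inside a character class the pipe is also literal in Python, so the reading above carries over.

```python
import re

# Inside [...] the pipe is a literal character, not alternation.
quote_or_pipe = re.compile(r'''["|']''')
found = quote_or_pipe.findall('a"b|c\'d')
print(found)

# In the negated class [^"|^'] the second ^ is literal too,
# so the class excludes the four characters " | ^ and '.
pieces = re.findall(r'''[^"|^']+''', 'ab^cd|ef')
print(pieces)
```

To match only the two quote characters, `["']` is enough.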

      • G
        Gilles Maisonneuve @Gilles Maisonneuve
        last edited by Mar 19, 2019, 11:23 PM

        @Ekopalypse

        Now, if I say in Python (attempt to transliterate from Perl):

        (r'(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)', [2])
        

        does it mean :

        1. form REGEXP
        2. do not match NL with DOT
        3. matches any horizontal blanks (0 or more), don’t make a group
        4. matches ‘<<’ make it a group
        5. matches any horizontal blanks (0 or more), don’t make a group
        6. matches 0 or 1 text quote (either double or single), no group
        7. matches a group of one or more chars that are neither " nor ' (in Perl it would be [^"'])
        8. matches 0 or 1 text quote (either double or single), no group
        9. possible blanks until semi-colon, semi-colon, then possible chars until NL

        BUT THEN, what does ?\3 mean? I’m lost there.

        • G
          Gilles Maisonneuve @Gilles Maisonneuve
          last edited by Mar 19, 2019, 11:27 PM

          a slash m

          • E
            Ekopalypse
            last edited by Ekopalypse Mar 19, 2019, 11:36 PM Mar 19, 2019, 11:36 PM

            the r at the beginning just informs python that this is a raw string and
            every char must be taken literally otherwise backslashes would be treated
            as escapes under some circumstances.
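A small sketch of what the `r` prefix changes, in plain Python and independent of any regex engine. The variable names are only for illustration.

```python
plain = "a\tb"    # \t becomes one tab character
raw = r"a\tb"     # backslash and t stay as two characters
plain_ref = "\3"  # octal escape: the single character chr(3)
raw_ref = r"\3"   # backslash followed by 3, left for the regex engine to read

print(len(plain), len(raw))
print(plain_ref == chr(3), raw_ref == "\\3")
```

Without the raw prefix, the `\3` at the end of the pattern would reach the regex engine as a control character rather than as a backreference.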

            The regex string is only this part

            (?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
            

            and I would say, but as said - not a regex expert at all,

            (?s) means Dot matches newline characters
            the first matching group is

            (\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
            

            the second

            (<<)
            

            and the third must be

            ([^"|^']+?)
            

            if I’m right.

            \3 should be the same as $3 in perl

            • G
              Gilles Maisonneuve @Ekopalypse
              last edited by Gilles Maisonneuve Mar 19, 2019, 11:44 PM Mar 19, 2019, 11:43 PM

              @Ekopalypse

              still confused: in ([^"|^']+?), why is there a ‘?’ after the ‘+’? What is this ‘?’ for?

              and then \3 would mean the 3rd matching group (third ‘()’), but in Perl that is used only in substitutions. What is the use here? There are only 2 groups in the regex (two blocks surrounded by parentheses only).

              • E
                Ekopalypse @Gilles Maisonneuve
                last edited by Mar 19, 2019, 11:44 PM

                @Gilles-Maisonneuve

                maybe this picture makes it a little bit clearer

                • E
                  Ekopalypse @Gilles Maisonneuve
                  last edited by Ekopalypse Mar 19, 2019, 11:49 PM Mar 19, 2019, 11:47 PM

                  @Gilles-Maisonneuve

                  still confused: ([^"|^']+?) why a ‘?’ after the ‘+’, what is this ‘?’ for?

                  as few as possible: non-greedy

                  and then \3 would mean the 3rd matching group (third ‘()’) but in Perl is used only in substitutions. What is the use here? There are only 2 groups in the regex (two blocks surrounded by parentheses only).

                  placeholder for what was found in match group 3, to find the EOT at the end

                  and there are 3 match groups or am I missing something??
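A hedged sketch of both answers, using Python's standard `re` module. Assumptions: `[ \t]` stands in for `\h`, which `re` does not support; the character classes are simplified to `["']`; the outer group is dropped, so the tag is group 2 here; and the sample text is invented for the demonstration.

```python
import re

text = 'print <<"EOT";\nfirst line\nEOT\nprint "after";\nEOT again\n'

# Lazy .*? stops at the first repetition of the captured tag (\2).
lazy = re.search(r'(?s)(<<)[ \t]*["\']?([^"\']+?)["\']?[ \t]*;.*?\2', text)

# Greedy .* runs on to the last repetition of the tag.
greedy = re.search(r'(?s)(<<)[ \t]*["\']?([^"\']+?)["\']?[ \t]*;.*\2', text)

print(repr(lazy.group(2)))
print(repr(lazy.group(0)))
print(repr(greedy.group(0)))
```

The backreference repeats whatever text the group captured (here `EOT`), which is what lets the pattern find the end of the here-document.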

                  • G
                    Gilles Maisonneuve @Ekopalypse
                    last edited by Mar 20, 2019, 12:03 AM

                    @Ekopalypse

                    2 sets of parentheses only, where is the third set ?
                    so only 2 match groups

                    can you make this work :

                    no syntax error on the python console but absolutely no result, where is my bug ?

                    regexes[(3, (255,255,255))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [1])
                    
                    • E
                      Ekopalypse @Gilles Maisonneuve
                      last edited by Ekopalypse Mar 20, 2019, 12:20 AM Mar 20, 2019, 12:19 AM

                      @Gilles-Maisonneuve

                      [1] tells the Python script that only the result of sub match group 1 should be colored white (255,255,255).
                      sub match group 1 is the result of (<<)

                      To paint the whole match, you can use [0]

                      I’m still confused about the 2 to 3 match groups.
                      Am I incorrect when saying that
                      (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
                      (<<)
                      ("{0,1}.+"{0,1})
                      are three match groups?

                      Maybe the confusion comes from the fact that references to matches within a
                      regular expression start at 1, but Python starts counting match results at 0.
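A minimal sketch of how groups are numbered, assuming Python's standard `re` module and a simplified pattern (`\w+` for the tag, invented sample text). How the script's `[1]` list maps onto these numbers is not confirmed here.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
pattern = re.compile(r'(?s)(\s*(<<)\s*(\w+)\s*;.*?\3)')
m = pattern.search('$x = <<EOT;\nhello\nEOT\n')

print(pattern.groups)      # number of capturing groups in the pattern
print(repr(m.group(0)))    # group 0 is always the whole match
print(repr(m.group(1)))    # outer parentheses: same text as group 0 here
print(repr(m.group(2)))    # (<<)
print(repr(m.group(3)))    # the tag that \3 refers back to
```

So the outer pair of parentheses counts as a group too, which gives three groups in total.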

                      Sorry, but I have to get up early tomorrow and it is already 1am, but I’m really
                      interested in resolving our (mis)understanding later today (maybe in ~16-18 hours)?

                      • G
                        Gilles Maisonneuve
                        last edited by Mar 20, 2019, 12:20 AM

                        ok, tomorrow is another day
                        ‘see’ you tomorrow.
                        have a good night.
                        g

                        • E
                          Ekopalypse
                          last edited by Mar 20, 2019, 12:21 AM

                          you too - see you

                          • G
                            Gilles Maisonneuve @Ekopalypse
                            last edited by Mar 20, 2019, 12:27 AM

                            @Ekopalypse

                            OK, so the

                            (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
                            

                            is a regex group, not a function call surrounded by parentheses or a logical group provided by the ‘r’ keyword. My mistake.
                            BUT THEN, is it possible in Python to embed an instruction such as ?\3, which means (as far as I understood what you explained to me earlier) a recursive reference to a regexp named ‘3’? The name ‘3’ being given in the expression regexes[(3, (255,255,255))], is that correct? SO you can reference an expression within itself while it has not been closed yet: the last parenthesis of expression 3 comes after the \3. Is that what it means?

                            Python syntax is a bit complicated to me.

                            • A
                              Alan Kilborn @Gilles Maisonneuve
                              last edited by Mar 20, 2019, 12:41 AM

                              @Gilles-Maisonneuve said:

                              Python syntax is a bit complicated to me

                              It’s not Python syntax, it’s regular expression syntax. It’s just not Perl regular expression syntax. :)

                              And, BTW, nobody in the history of the world, especially someone coming from a Perl background, has ever uttered the phrase you typed.

                              • G
                                guy038
                                last edited by guy038 Mar 20, 2019, 2:02 AM Mar 20, 2019, 1:04 AM

                                Hello @gilles-maisonneuve, @eko-palypse and All,

                                Gilles, could you verify that the two lines below work, with your own Red, Green and Blue colors?

                                regexes[(3, (R,G,B))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                regexes[(4, (R,G,B))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                                

                                For these two regexes :

                                • Group 1 = << = two less-than signs

                                • Group 2 = ['"]? = an optional single or double quote, for regex id 3

                                • Group 2 = '|" = a mandatory single or double quote, separated from the << characters by blank characters, for regex id 4

                                • Group 3 = \w+? = the shortest area of word characters, after the << sign, between possible quotes
                                  and before a semicolon character ;, with possible blank characters, before and/or after the quote characters

                                Notes :

                                • In regex id 3, only the << string is highlighted ( Group 1 )

                                • In regex id 4, the << and the text between quotes are highlighted ( Groups 1 and 3 )

                                • I added the -i in-line modifier ( => (?s-i) leading syntax ) to be sure that the ending boundary of the block corresponds exactly with the text, between quotes ( search is sensitive to case ! )


                                So my regex (?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3 ( id 3 ) matches any of these six cases, below :

                                $x=<<TEXT;
                                Plain text here
                                TEXT
                                
                                $x=<<'TEXT';
                                Plain text here
                                TEXT
                                
                                $x=<<"TEXT";
                                Plain text here
                                TEXT
                                
                                $x=<<TEXT ;
                                Plain text here
                                TEXT
                                
                                $x=<<'TEXT' ;
                                Plain text here
                                TEXT
                                
                                $x=<<"TEXT" ;
                                Plain text here
                                TEXT
                                

                                And my regex (?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3 ( id 4 ) matches these 4 cases, below :

                                $x=<< 'TEXT';
                                Plain text here
                                TEXT
                                
                                $x=<< "TEXT";
                                Plain text here
                                TEXT
                                
                                $x=<< 'TEXT' ;
                                Plain text here
                                TEXT
                                
                                $x=<< "TEXT" ;
                                Plain text here
                                TEXT
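The ten cases above can be checked with a short sketch, assuming Python's standard `re` module rather than the Notepad++ engine. Two substitutions are made for that reason: `[ \t]` replaces `\h`, and the `-i` part of `(?s-i)` is dropped because `re` is case-sensitive by default. The names `ID3`, `ID4` and `BODY` are illustrative only.

```python
import re

# Triple-quoted raw strings let the pattern contain both ' and ".
ID3 = re.compile(r'''(?s)(<<)(['"]?)(\w+?)\2[ \t]*;.*?\3''')
ID4 = re.compile(r'''(?s)(<<)[ \t]+('|")(\w+?)\2[ \t]*;.*?\3''')

BODY = "\nPlain text here\nTEXT\n"
id3_cases = ["$x=<<TEXT;", "$x=<<'TEXT';", '$x=<<"TEXT";',
             "$x=<<TEXT ;", "$x=<<'TEXT' ;", '$x=<<"TEXT" ;']
id4_cases = ["$x=<< 'TEXT';", '$x=<< "TEXT";',
             "$x=<< 'TEXT' ;", '$x=<< "TEXT" ;']

id3_hits = [ID3.search(head + BODY) for head in id3_cases]
id4_hits = [ID4.search(head + BODY) for head in id4_cases]

for hit in id3_hits + id4_hits:
    print(repr(hit.group(0)), '-> tag', hit.group(3))
```

In this sketch each regex matches only its own set of cases: id 3 rejects a blank between `<<` and the tag, and id 4 requires one.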
                                

                                Best Regards,

                                guy038

                                • G
                                  Gilles Maisonneuve @guy038
                                  last edited by Mar 20, 2019, 8:09 AM

                                  @guy038

                                  Hello Guy,

                                  Could not make it work, sorry.

                                  I mean:

                                  • added to EnhancePerlLexer.py from Ekopalypse (replacing the original ones) the following lines, according to what you gave me:

                                    regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                    regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])

                                  • saved it and restarted npp

                                  • still have the same coloring, not working.

                                  BUT, good news:

                                  python console:
                                  Traceback (most recent call last):
                                  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\startup.py", line 1, in <module>
                                      import EnhancePerlLexer
                                  File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 36
                                      regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                                                                                          ^
                                  SyntaxError: EOL while scanning string literal
                                  Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
                                  Initialisation took 110ms
                                  Ready.
                                  

                                  Can you tell me what I did wrong?
                                  (When I comment out the two lines I get back a valid coloring for the ‘q*’ syntaxes (yes, I forgot to tell you, this had vanished too…))

                                  • G
                                    Gilles Maisonneuve @guy038
                                    last edited by Mar 20, 2019, 8:13 AM

                                    @guy038

                                    Well, I commented out the rule 3 and kept rule 4.
                                    Same kind of error:

                                     regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                                                                                                            ^
                                     SyntaxError: EOL while scanning string literal
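A likely cause, sketched under the assumption that only Python's string-literal rules are involved: the `r` prefix does not stop a `'` inside the pattern from ending an `r'...'` literal. The snippet compiles a shortened copy of the posted line to show the error, then shows two spellings that parse. Newer Python versions word the error message differently, so only the exception type is checked.

```python
import re

# Shortened copy of the posted line, held as text so it can be compiled.
broken_source = r"""regex = r'(<<)(['"]?)(\w+?)\2'"""
try:
    compile(broken_source, "<demo>", "exec")
    compiled_ok = True
except SyntaxError:
    compiled_ok = False   # the ' inside ['"] closed the literal too early

# Two ways to keep both quote characters inside one raw string.
triple = r'''(<<)(['"]?)(\w+?)\2'''    # triple-quoted raw string
escaped = r'(<<)([\'"]?)(\w+?)\2'      # \' stays in the pattern as an escaped quote

print(compiled_ok)
print(re.search(triple, "<<'TEXT'").group(3))
print(re.search(escaped, '<<"TEXT"').group(3))
```

Both spellings give the regex engine an equivalent character class; the second one carries an extra backslash in the pattern text.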
                                    
                                    • G
                                      Gilles Maisonneuve @Gilles Maisonneuve
                                      last edited by Mar 20, 2019, 8:37 AM

                                      if I modify the rule like:

                                      regexes[(4, (0,0,224))] = (r'(?s-i)((<<)\h+([\'"])(\w+?)\2\h*;.*?\3)', [1,3])
                                      

                                      I no longer get a syntax error in Python, BUT I get no coloring for the here-doc either…

                                      Any idea ?
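One possible explanation, offered as a sketch rather than a confirmed diagnosis: wrapping the whole pattern in an extra pair of parentheses shifts every group number up by one, so `\2` and `\3` no longer point at the quote and the tag. The check below assumes Python's standard `re` module, with `[ \t]` in place of `\h` and an invented sample string; the Notepad++ engine is assumed to number groups the same way.

```python
import re

sample = "$x=<< 'TEXT';\nPlain text here\nTEXT\n"

# Unwrapped: group 2 is the quote, group 3 is the tag.
original = re.search(r'''(?s)(<<)[ \t]+(['"])(\w+?)\2[ \t]*;.*?\3''', sample)

# Wrapped, backreferences unchanged: \2 now means "<<" and \3 means the quote.
wrapped = re.search(r'''(?s)((<<)[ \t]+(['"])(\w+?)\2[ \t]*;.*?\3)''', sample)

# Wrapped, backreferences shifted to \3 (quote) and \4 (tag).
renumbered = re.search(r'''(?s)((<<)[ \t]+(['"])(\w+?)\3[ \t]*;.*?\4)''', sample)

print(original is not None, wrapped is not None, renumbered is not None)
```

If the same shift applies in the script, the list of groups to color would presumably need renumbering as well.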

                                      • G
                                        Gilles Maisonneuve @Alan Kilborn
                                        last edited by Mar 20, 2019, 8:50 AM

                                        @Alan-Kilborn

                                        chcp 1250 >NUL: & perl -e "$var=q(Alan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.); for my $p ('\t','\s') {print qq{\$p=$p},$var=~m/($p)déplaisant\1/x?$var:qq{n'en déplaise},qq{\n} ;};" & chcp 850 >NUL:
                                        
                                        $p=\tn'en déplaise
                                        $p=\sAlan Kilborn est déplaisant dans sa façon de s'exprimer mais il a raison.
                                        
                                        • G
                                          Gilles Maisonneuve @Gilles Maisonneuve
                                          last edited by Mar 20, 2019, 8:56 AM

                                          I am so used to using $1, $2, …, which do not work in a simple ‘match’ but only in a ‘substitute’, that I did not know this way of repeating matching ‘patterns’. I learned something.
                                          Duly noted.

                                          The Community of users of the Notepad++ text editor.