    Perl language syntax highlighting troubles (bug or limitation ?)

    112 Posts 6 Posters 44.0k Views
    • EkopalypseE
      Ekopalypse
      last edited by Ekopalypse

      Just to clarify: the Python script does NOT use Python's regex engine; it uses
      the one Notepad++ provides, boost::regex.
      Yes, you can write the enumeration without the pipe, but the pipe sign makes it
      easier for me to read. Or does it make a difference whether the pipe sign is there or not?

      • EkopalypseE
        Ekopalypse
        last edited by Ekopalypse

        or maybe this one might be even better
        (?s)(<<)\h+(["'])(\w+?)\2\h*;.*?\3

        • Gilles MaisonneuveG
          Gilles Maisonneuve
          last edited by

          Can’t reply what I wanted, a robot says I’m spamming…

          • EkopalypseE
            Ekopalypse @Gilles Maisonneuve
            last edited by

            @Gilles-Maisonneuve

            Can’t reply what I wanted, a robot says I’m spamming…

            I have no idea why this happens sometimes.

            By the way, now that you have installed pythonscript plugin would you mind
            clicking Plugins->Python Script->Scripts->Samples->RegexTester ?

            I know not everyone is recommending it but, personally, I love it.

            • Gilles MaisonneuveG
              Gilles Maisonneuve @Gilles Maisonneuve
              last edited by

              AFAIK, at least in Perl, ["|'] means double-quote OR pipe OR single-quote; everything between square brackets is literal. Also true in “awk” and C regexp, I think.
              I don’t know about Python.
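A small check of that reading of `["|']`, sketched with Python's own `re` module. This is only a stand-in: the script discussed in this thread uses the boost::regex engine of Notepad++, and the assumption here is that plain character classes behave the same way in both.

```python
import re

# Inside a character class the pipe is an ordinary character, not "OR".
with_pipe = re.compile(r"""["|']""")
without_pipe = re.compile(r"""["']""")

for char in ['"', "'", '|']:
    print(repr(char), bool(with_pipe.fullmatch(char)), bool(without_pipe.fullmatch(char)))

# In [^"|^'] only the first ^ negates; the second ^ is just another excluded character.
negated = re.compile(r"""[^"|^']+""")
print(bool(negated.fullmatch('EOT')), bool(negated.fullmatch('a^b')))
```

So the two spellings differ only in whether a literal `|` is also accepted.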

              • Gilles MaisonneuveG
                Gilles Maisonneuve @Gilles Maisonneuve
                last edited by

                @Ekopalypse

                Now, if I say in Python (an attempt to transliterate from Perl):

                (r'(?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)', [2])
                

                does it mean :

                1. form REGEXP
                2. do not match NL with DOT
                3. matches any horizontal blanks (0 or more), don’t make a group
                4. matches ‘<<’ make it a group
                5. matches any horizontal blanks (0 or more), don’t make a group
                6. matches 0 or 1 text quote (either double or single), no group
                7. matches a group of any chars that are neither " nor ', one or more time(s) (in Perl it would be [^"'])
                8. matches 0 or 1 text quote (either double or single), no group
                9. possible blanks until semi-colon, semi-colon, then possible chars until NL

                BUT THEN, what does ?\3 mean? I’m lost there.

                • Gilles MaisonneuveG
                  Gilles Maisonneuve @Gilles Maisonneuve
                  last edited by

                  a slash m

                  • EkopalypseE
                    Ekopalypse
                    last edited by Ekopalypse

                    the r at the beginning just tells Python that this is a raw string, in which
                    backslashes are taken literally; otherwise they would be treated
                    as escapes under some circumstances.
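A minimal illustration of the raw-string point, using the `\3` from the regex above. The variable names are made up for this sketch.

```python
# Without the r prefix, Python itself interprets \3 as an octal escape.
plain = '\3'    # one character: chr(3)
raw = r'\3'     # two characters: a backslash followed by '3'

print(len(plain), len(raw))
print(raw == '\\3')   # the raw form equals the doubled-backslash spelling
```

Note that the r prefix only changes how backslashes are read; a quote character of the same kind as the opening one still ends the string.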

                    The regex string is only this part

                    (?s)(\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
                    

                    and I would say (but, as I said, I’m not a regex expert at all):

                    (?s) means Dot matches newline characters
                    the first matching group is

                    (\h*(<<)\h*["|']?([^"|^']+?)["|']?\h*;.*?\3)
                    

                    the second

                    (<<)
                    

                    and the third must be

                    ([^"|^']+?)
                    

                    if I’m right.

                    \3 should be the same as $3 in perl
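The group numbering can be checked with a small sketch in Python's own `re` module. Two assumptions: `re` is only a stand-in for the boost::regex engine the script really uses, and since `re` has no `\h`, `[ \t]` is substituted for it. The sample text is hypothetical.

```python
import re

# Triple-quoted raw string so that both ' and " can appear unescaped.
# [ \t] replaces \h, which Python's re does not know (assumption of this sketch).
pattern = re.compile(r"""(?s)([ \t]*(<<)[ \t]*["|']?([^"|^']+?)["|']?[ \t]*;.*?\3)""")

# Hypothetical Perl snippet with a heredoc terminated by EOT.
sample = '$x=<<"EOT";\nPlain text here\nEOT\nprint $x;\n'

m = pattern.search(sample)
print(pattern.groups)     # number of capturing groups, counted by opening parenthesis
print(repr(m.group(2)))   # the << operator
print(repr(m.group(3)))   # the heredoc tag that \3 has to find again
print(repr(m.group(0)))   # the whole match, ending at the terminator
```

Inside the pattern, `\3` stands for whatever text group 3 captured, which is what lets the match stop at the closing tag.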

                    • Gilles MaisonneuveG
                      Gilles Maisonneuve @Ekopalypse
                      last edited by Gilles Maisonneuve

                      @Ekopalypse

                      still confused: in ([^"|^']+?), why is there a ‘?’ after the ‘+’? What is this ‘?’ for?

                      and then \3 would mean the 3rd matching group (third ‘()’), but in Perl that is used only in substitutions. What is its use here? There are only 2 groups in the regex (only two blocks surrounded by parentheses).

                      • EkopalypseE
                        Ekopalypse @Gilles Maisonneuve
                        last edited by

                        @Gilles-Maisonneuve

                        maybe this picture makes it a little bit clearer

                        • EkopalypseE
                          Ekopalypse @Gilles Maisonneuve
                          last edited by Ekopalypse

                          @Gilles-Maisonneuve

                          still confused: in ([^"|^']+?), why is there a ‘?’ after the ‘+’? What is this ‘?’ for?

                          as few as possible: non-greedy

                          and then \3 would mean the 3rd matching group (third ‘()’), but in Perl that is used only in substitutions. What is its use here? There are only 2 groups in the regex (only two blocks surrounded by parentheses).

                          placeholder for what was found in match group 3, to find the EOT at the end

                          and there are 3 match groups or am I missing something??
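To make the "as few as possible" behaviour concrete, here is a small comparison of `+` and `+?`, again sketched with Python's `re` as a stand-in engine and a made-up input string.

```python
import re

text = '<<"A"; x "B"; y'

# Greedy: .+ grabs as much as it can, so the match runs to the LAST quote.
greedy = re.search(r'"(.+)"', text).group(1)

# Non-greedy: .+? grabs as little as it can, so the match stops at the NEXT quote.
lazy = re.search(r'"(.+?)"', text).group(1)

print(repr(greedy))
print(repr(lazy))
```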

                          • Gilles MaisonneuveG
                            Gilles Maisonneuve @Ekopalypse
                            last edited by

                            @Ekopalypse

                            2 sets of parentheses only, where is the third set?
                            so only 2 match groups

                            can you make this work :

                            no syntax error on the python console but absolutely no result, where is my bug ?

                            regexes[(3, (255,255,255))] = (r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)', [1])
                            
                            • EkopalypseE
                              Ekopalypse @Gilles Maisonneuve
                              last edited by Ekopalypse

                              @Gilles-Maisonneuve

                              [1] tells the Python script that only the results from sub match group 1 should be colored white (255,255,255).
                              Sub match group 1 is the result of (<<).

                              To paint the whole match, you can use [0].

                              I’m still confused about the 2 to 3 match groups.
                              Am I incorrect when saying that
                              (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
                              (<<)
                              ("{0,1}.+"{0,1})
                              are three match groups?

                              Maybe the confusion comes from the fact that references to matches within a
                              regular expression start at 1, but Python starts counting match results at 0.
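One way to settle the two-versus-three question is to let a regex engine count the groups. The sketch below uses Python's own `re` module, which is an assumption: the script runs on boost::regex, and how its `[1]` list maps onto these numbers is not shown here. The two sample strings are hypothetical. It also hints at one possible reason for the "absolutely no result" case: the quotes sit inside group 3, so `\3` looks for a terminator that includes them.

```python
import re

pattern = re.compile(r'(?s)(\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)')
print(pattern.groups)   # capturing groups, one per opening parenthesis

unquoted = '$x = <<EOT;\nPlain text here\nEOT\n'
quoted = '$x = <<"EOT";\nPlain text here\nEOT\n'

m = pattern.search(unquoted)
# group(0) is the whole match; groups 1..3 follow the opening parentheses.
print([m.group(i) for i in range(4)])

# With a quoted tag, group 3 captures "EOT" including the quotes,
# and that exact text never appears again, so nothing matches.
print(pattern.search(quoted))
```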

                              Sorry, but I have to get up early tomorrow and it is already 1am, but I’m really
                              interested in sorting out our (mis)understanding later today (maybe in ~16-18 hours)?

                              • Gilles MaisonneuveG
                                Gilles Maisonneuve
                                last edited by

                                ok, tomorrow is another day
                                ‘see’ you tomorrow.
                                have a good night.
                                g

                                • EkopalypseE
                                  Ekopalypse
                                  last edited by

                                  you too - see you

                                  • Gilles MaisonneuveG
                                    Gilles Maisonneuve @Ekopalypse
                                    last edited by

                                    @Ekopalypse

                                    OK, so the

                                    (\s*(<<)\s*("{0,1}.+"{0,1})\s*;.*?\3)
                                    

                                    is a regex group, not a function call surrounded by parentheses or a logical group provided by the ‘r’ keyword. My mistake.
                                    BUT THEN, is it possible in Python to embed an instruction such as ?\3, which means (as far as I understood what you explained to me earlier) a recursive reference to a regexp named ‘3’??? The name ‘3’ being given in the expression regexes[(3, (255,255,255))], is that correct? SO you can reference an expression within itself while it has not been closed yet: the last parenthesis of expression 3 comes after the \3. Is that what it means?

                                    Python syntax is a bit complicated to me.

                                    • Alan KilbornA
                                      Alan Kilborn @Gilles Maisonneuve
                                      last edited by

                                      @Gilles-Maisonneuve said:

                                      Python syntax is a bit complicated to me

                                      It’s not Python syntax, it’s regular expression syntax. It’s just not Perl regular expression syntax. :)

                                      And, BTW, nobody in the history of the world, especially someone coming from a Perl background, has ever uttered the phrase you typed.

                                      • guy038G
                                        guy038
                                        last edited by guy038

                                        Hello @gilles-maisonneuve, @eko-palypse and All,

                                        Gilles, could you verify that the two lines below work, with your own Red, Green and Blue colors?

                                        regexes[(3, (R,G,B))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                        regexes[(4, (R,G,B))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                                        

                                        For these two regexes :

                                        • Group 1 = << = two consecutive less-than signs

                                        • Group 2 = ['"]? = an optional single or double quote, for regex id 3

                                        • Group 2 = '|" = a mandatory single or double quote, separated from the << characters by blank characters, for regex id 4

                                        • Group 3 = \w+? = the shortest area of word characters, after the << sign, between possible quotes
                                          and before a semicolon character ;, with possible blank characters, before and/or after the quote characters

                                        Notes :

                                        • In regex id 3, only the << string is highlighted ( Group 1 )

                                        • In regex id 4, the << and the text between quotes are highlighted ( Groups 1 and 3 )

                                        • I added the -i in-line modifier ( => (?s-i) leading syntax ) to be sure that the ending boundary of the block corresponds exactly with the text, between quotes ( search is sensitive to case ! )


                                        So my regex (?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3 ( id 3 ) matches any of these six cases, below :

                                        $x=<<TEXT;
                                        Plain text here
                                        TEXT
                                        
                                        $x=<<'TEXT';
                                        Plain text here
                                        TEXT
                                        
                                        $x=<<"TEXT";
                                        Plain text here
                                        TEXT
                                        
                                        $x=<<TEXT ;
                                        Plain text here
                                        TEXT
                                        
                                        $x=<<'TEXT' ;
                                        Plain text here
                                        TEXT
                                        
                                        $x=<<"TEXT" ;
                                        Plain text here
                                        TEXT
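The six cases can be checked mechanically. This is a sketch under two assumptions: Python's `re` is used as a stand-in for boost::regex, so `[ \t]` replaces `\h`, and the `-i` modifier is dropped because `re` is case-sensitive by default.

```python
import re

# Regex id 3, adapted for Python's re (see the assumptions above).
regex_id_3 = re.compile(r"""(?s)(<<)(['"]?)(\w+?)\2[ \t]*;.*?\3""")

# The six heredoc openers listed above.
heads = ['<<TEXT;', "<<'TEXT';", '<<"TEXT";', '<<TEXT ;', "<<'TEXT' ;", '<<"TEXT" ;']

for head in heads:
    block = '$x=' + head + '\nPlain text here\nTEXT\n'
    m = regex_id_3.search(block)
    # Show the captured quote (group 2) and tag (group 3) for each case.
    print(head, '->', m.group(2, 3) if m else None)
```

The lowercase "text" inside the body is not mistaken for the terminator, because the search stays case-sensitive.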
                                        

                                        And my regex (?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3 ( id 4 ) matches these 4 cases, below :

                                        $x=<< 'TEXT';
                                        Plain text here
                                        TEXT
                                        
                                        $x=<< "TEXT";
                                        Plain text here
                                        TEXT
                                        
                                        $x=<< 'TEXT' ;
                                        Plain text here
                                        TEXT
                                        
                                        $x=<< "TEXT" ;
                                        Plain text here
                                        TEXT
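The same kind of check for regex id 4, with the same assumptions (Python's `re` as a stand-in engine, `[ \t]` instead of `\h`, no `-i` modifier). It prints the character spans of groups 1 and 3, which are the two parts the `[1,3]` list refers to.

```python
import re

# Regex id 4, adapted for Python's re.
regex_id_4 = re.compile(r"""(?s)(<<)[ \t]+('|")(\w+?)\2[ \t]*;.*?\3""")

# The four heredoc openers listed above.
heads = ["<< 'TEXT';", '<< "TEXT";', "<< 'TEXT' ;", '<< "TEXT" ;']

results = {}
for head in heads:
    m = regex_id_4.search('$x=' + head + '\nPlain text here\nTEXT\n')
    # Spans of the << operator (group 1) and of the tag between quotes (group 3).
    results[head] = (m.span(1), m.span(3))
    print(head, '->', results[head])
```

Without at least one blank after `<<`, this pattern does not match, which is why the id 3 regex is kept alongside it.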
                                        

                                        Best Regards,

                                        guy038

                                        • Gilles MaisonneuveG
                                          Gilles Maisonneuve @guy038
                                          last edited by

                                          @guy038

                                          Hello Guy,

                                          Could not make it work, sorry.

                                          I mean:

                                          • added (replacing the original ones) the following lines to the EnhancePerlLexer.py from Ekopalypse, according to what you gave me:

                                            regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                            regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])

                                          • saved it and restarted npp

                                          • still have the same coloring, not working.

                                          BUT, good news:

                                          python console:
                                          Traceback (most recent call last):
                                          File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\startup.py", line 1, in <module>
                                              import EnhancePerlLexer
                                          File "C:\Users\gm\AppData\Roaming\Notepad++\plugins\Config\PythonScript\scripts\EnhancePerlLexer.py", line 36
                                              regexes[(3, (224,0,0))] = (r'(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3', [1])
                                                                                                                  ^
                                          SyntaxError: EOL while scanning string literal
                                          Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 30 2018, 16:30:26) [MSC v.1500 64 bit (AMD64)]
                                          Initialisation took 110ms
                                          Ready.
                                          

                                          Can you tell me what I did wrong?
                                          (When I comment out the two lines I get back a valid coloring for the ‘q*’ syntaxes (yes, forgot to tell you, this had vanished too…))

                                          • Gilles MaisonneuveG
                                            Gilles Maisonneuve @guy038
                                            last edited by

                                            @guy038

                                            Well, I commented out the rule 3 and kept rule 4.
                                            Same kind of error:

                                             regexes[(4, (0,0,224))] = (r'(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3', [1,3])
                                                                                                                    ^
                                             SyntaxError: EOL while scanning string literal
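The SyntaxError is consistent with the Python string literal ending too early: in `r'...(['"]?)...'` the apostrophe inside the character class closes the `r'...'` literal, and the `"` that follows opens a new string that is never closed on that line. A possible workaround, sketched below, is to use triple-quoted raw strings. The `regexes` dictionary here is only a stand-in (assumed to be a plain dict) for the one in EnhancePerlLexer.py, and whether the colouring then behaves as intended in Notepad++ is not verified here.

```python
# Stand-in for the dictionary used in EnhancePerlLexer.py (assumption: a plain dict).
regexes = {}

# Triple-quoted raw strings can contain both ' and " without ending early.
regexes[(3, (224, 0, 0))] = (r"""(?s-i)(<<)(['"]?)(\w+?)\2\h*;.*?\3""", [1])
regexes[(4, (0, 0, 224))] = (r"""(?s-i)(<<)\h+('|")(\w+?)\2\h*;.*?\3""", [1, 3])

for (rule_id, color), (regex, groups) in sorted(regexes.items()):
    print(rule_id, color, regex, groups)
```

The regex text itself is unchanged; only the Python quoting around it differs.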
                                            