    FunctionList.xml Regular Expressions not parsing properly

    • Brian Zelt

      Yet another functionList.xml question…
      I added a Fortran parser by following suggestions from the community; however, it appears that the regular expression is not parsed correctly.

      Using
      “^([^c]).(function|FUNCTION|subroutine|SUBROUTINE)[\s]\K([\w]+)”
      and testing it on
      https://regex101.com/
      the RE above correctly provides the subroutine name for all of the examples below:

      1. " SUBROUTINE bob(a,b,c)"
      2. “SUBROUTINE bob(a,b,c)”
      3. " SUBROUTINE bob()"
      4. " SUBROUTINE bob( )"
      5. " SUBROUTINE bob"
      6. " SUBROUTINE bob "
      7. “c SUBROUTINE bob(a,b,c)” (i.e., no value is returned because the line is commented out)
      8. " PRIVATE SUBROUTINE bob(a,b,c) return(d)"

      However, when the RE is used in functionList.xml, only examples 1, 4, 6, 7, and 8 work.
      The following are not parsed:
      2. “SUBROUTINE bob(a,b,c)”
      3. " SUBROUTINE bob()"
      5. " SUBROUTINE bob"

      Any suggestions, anyone? (of course, the answer should be ‘bob’ for all cases).
      …Brian

      • Brian Zelt

        Sorry, the asterisks were missing when I copied:
        “^([^c]).(function|FUNCTION|subroutine|SUBROUTINE)[\s]\K([\w]+)”

        • MAPJe71

          (?m)^(?!c)\h*(?i:PRIVATE\s+)?(?i:FUNCTION|SUBROUTINE)\s\K\w+

          • MAPJe71

            @brian-zelt
            This is the Fortran parser I currently have in my functionList.xml:

            			<!--
            			|   https://notepad-plus-plus.org/community/topic/11059/custom-functions-list-rules
            			\-->
            			<parser
            				displayName="[TODO] Fortran 90/95/2k - FORmula TRANslation Free Form style"
            				id         ="fortran_function"
            				commentExpr="(?x)                                               # Utilize inline comments (see `RegEx - Pattern Modifiers`)
            								(?m-s:!.*$)                                     # Single Line Comment
            							"
            			>
            				<function
            					mainExpr="(?x)                                              # Utilize inline comments (see `RegEx - Pattern Modifiers`)
            							(?-i:function|subroutine)\s+
            							\K                                                  # keep the text matched so far, out of the overall match
            							\w+
            							\s*\(
            							[^()]*
            							\s*\)
            						"
            				>
            					<!-- comment out the following node to display the method with its parameters -->
            					<functionName>
            						<nameExpr expr="\w+" />
            					</functionName>
            				</function>
            			</parser>
            
            
            • MAPJe71

              I should have tested it with your examples before posting it.
              Note that it uses the exclamation mark as the comment character, though.

              			<!--
              			|   https://notepad-plus-plus.org/community/topic/11059/custom-functions-list-rules
              			\-->
              			<parser
              				displayName="[TODO] Fortran 90/95/2k - FORmula TRANslation Free Form style"
              				id         ="fortran_function"
              				commentExpr="(?x)                                               # Utilize inline comments (see `RegEx - Pattern Modifiers`)
              								(?m-s:!.*$)                                     # Single Line Comment
              							"
              			>
              				<function
              					mainExpr="(?x)                                              # Utilize inline comments (see `RegEx - Pattern Modifiers`)
              							(?i:FUNCTION|SUBROUTINE)\s+
              							\K                                                  # keep the text matched so far, out of the overall match
              							\w+
              							(?:
              								\s*
              								\(
              								[^()]*
              								\)
              							)?
              						"
              				>
              					<!-- comment out the following node to display the method with its parameters -->
              					<functionName>
              						<nameExpr expr="\w+" />
              					</functionName>
              				</function>
              			</parser>
              
              • Brian Zelt

                I guess I’m not doing this correctly.
                Here, {star} stands for an asterisk:
                “^([^c]).{star}(function|FUNCTION|subroutine|SUBROUTINE)[\s]{star}\K([\w]+)”

                • MAPJe71

                  @brian-zelt
                  Enclose the text in backticks instead of double quotes, or read the manual for Markdown syntax.

                  • Brian Zelt

                    Thanks for the example. However, it does not match all of the examples I listed.

                    My point, however, was not to request a Fortran parser (although it is appreciated) but rather that NP++ was not parsing the RE as expected. That is, there appear to be one or more bugs in the implementation of RE in NP++, or in how NP++ sends text strings to be parsed.

                    • MAPJe71

                      “was not requesting a Fortran parser”

                      My bad, retry …

                      I tried your RE ^([^c]).*(function|FUNCTION|subroutine|SUBROUTINE)[\s]*\K([\w]+)
                      on regex101.com with flags/options m and g set (I even tried different combinations) but was not able to get all the subroutine names from the following examples (I presumed the double quotes had to be excluded):

                       SUBROUTINE bob(a,b,c)
                      SUBROUTINE bob(a,b,c)
                       SUBROUTINE bob()
                       SUBROUTINE bob( )
                       SUBROUTINE bob
                       SUBROUTINE bob 
                      c SUBROUTINE bob(a,b,c)
                       PRIVATE SUBROUTINE bob(a,b,c) return(d)
                      

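                      i.e. the second example does not give a match: the leading ([^c]) consumes the S of SUBROUTINE when the keyword starts in the first column, so the keyword can no longer match.

                      This was actually what I expected after reading your RE, i.e. I did not expect regex101.com to “correctly provide the subroutine name for all of examples”, just as N++ does not.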


                      This one will …

                      (?m-s)^(?!c).*(?i:FUNCTION|SUBROUTINE)\s*\K\w+    
                      
                      • (?m-s): Use these options for the whole regular expression
                        • m: ^ and $ match at line breaks
                        • -: The hyphen inverts the meaning of the letters that follow
                        • s: Dot doesn’t match line breaks
                      • ^: Assert position at the beginning of a line (at the beginning of the string or after a line break character: carriage return, line feed, form feed)
                      • (?!c): Assert that it is impossible to match the regex below starting at this position (negative lookahead)
                        • c: Match the character “c” literally (case sensitive)
                      • .*: Match any single character that is NOT a line break character (line feed, carriage return, form feed)
                        • *: Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
                      • (?i:FUNCTION|SUBROUTINE): Match the regex below with these options
                        • i: Case insensitive
                        • FUNCTION: Match this alternative (attempting the next alternative only if this one fails)
                          • FUNCTION: Match the character string “FUNCTION” literally (case insensitive)
                        • SUBROUTINE: Or match this alternative (the entire group fails if this one fails to match)
                          • SUBROUTINE: Match the character string “SUBROUTINE” literally (case insensitive)
                      • \s*: Match a single character that is a “whitespace character” (any space in the active code page, tab, line feed, carriage return, vertical tab, form feed)
                        • *: Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
                      • \K: Keep the text matched so far out of the overall regex match
                      • \w+: Match a single character that is a “word character” (letter, digit, or underscore in the active code page)
                        • +: Between one and unlimited times, as many times as possible, giving back as needed (greedy)

                      Created with RegexBuddy
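
As a sketch of how this expression might be dropped into a functionList.xml function node, the block below spells out the same pattern in free-spacing (?x) form, in the style of the parser posted earlier in this thread. The inline comments are paraphrases and the layout is an assumption; the block is untested and relies on Notepad++ passing the line breaks inside the attribute through to the RE engine, which the earlier parser also depends on.

```xml
<function
	mainExpr="(?x)                            # free-spacing mode: whitespace and comments are ignored
			(?m-s)                            # ^ and $ match at line breaks, dot does not match line breaks
			^                                 # start of a line
			(?!c)                             # the line must not start with a lowercase c (comment line)
			.*                                # anything on the line before the keyword (greedy)
			(?i:FUNCTION|SUBROUTINE)          # the keyword, case insensitive
			\s*                               # optional whitespace after the keyword
			\K                                # keep the text matched so far out of the overall match
			\w+                               # the routine name
		"
>
	<functionName>
		<nameExpr expr="\w+" />
	</functionName>
</function>
```

Only the routine name survives as the match because of \K, so the nameExpr simply passes it through.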

                      • Brian Zelt

                        Thanks. I forgot about RegexBuddy. Correct, the RE I supplied did not work on the one example. I believe I missed an additional asterisk after the initial group when I used the wrong quotes. However, your proposal works better, with the addition of an uppercase C alternative in the lookahead:

                        (?m-s)^(?!c|C).*(?i:FUNCTION|SUBROUTINE)\s*\K\w+

                        The point remains, however, that when this RE is inserted in functionlist.xml, NP++ does not parse the subroutines properly as described in the original inquiry.

                        • Scott Sumner

                          @MAPJe71

                          So…I’ve kinda wondered in the past about what you did in your post, so it is a good time to ask. You posted RegexBuddy output. So people get the benefit of RB’s “wisdom” without paying for it (like we have - :) ). It sorta seems wrong to me, a little bit, but is it OK to do? Maybe a question best for Jan…

                          So here’s a great example. Yesterday I answered a regex question (https://notepad-plus-plus.org/community/topic/13556/replace-last-value-in-row-with-0), and I supplied my own explanation, but I was tempted to use RB to generate the explanation, but in the end I didn’t.

                          • Brian Zelt

                            Thanks S.S., I haven’t read RB’s policies with respect to re-posting the RB output. RB was correctly referenced as the source.

                            Again, the point of this thread is whether or not NP++ has a bug in the RE processing for FunctionList.xml.

                            Has anybody looked at the source code for NP++ for FunctionList?

                            • MAPJe71

                              @Scott-Sumner
                              Uhm…to be honest it hadn’t even crossed my mind to look up RB’s policy on posting a copy of the RegEx Tree.

                              @Brian-Zelt
                              Yes, I’ve looked at NP++'s FunctionList code in the past (SourceForge era). I created a patch for it that never got merged. On request, I am in the process of re-creating that patch on the current code base, in addition to cleaning up functionList.xml and adding RE explanations to it as comments.
                              There are known issues with the RE engine (as explained by @guy038 here). Both “Search (& Replace)” and FunctionList use the same RE engine. When a RE works with “Search (& Replace)” it will work with FunctionList.
                              However, defining a parser in FunctionList can be tricky.

                              Maybe you can post the complete parser so I (or anyone else) can check it.

                              • Brian Zelt

                                Interesting… I played with the commentExpr RE, and now the parser seems to be working. I can’t reproduce the original commentExpr, but it was copied from someone’s posted example that included only the Fortran ! comment and not the start-of-line c comment. The original seemed harmless.

                                So, the final Fortran parser I have is listed below. It contains a possible flaw in that only a single space is allowed between ‘end’ and ‘function’, whereas a good RE should allow for any number of spaces, but the RE engine doesn’t permit \s* inside the lookbehind used in the code below.
                                `

                                		<association langID="25" id="fortran_function"/>
                                
                                		<parser id="fortran_function" displayName="Fortran" commentExpr="(!.*?$|^(?i:c).*?$)">
                                			<function
                                				mainExpr="(?m-s)^(?!c|C).*(?i:(?<!END\s)FUNCTION|(?<!END\s)SUBROUTINE)\s*\K(\w+)"
                                				displayMode="$functionName$">
                                				<functionName>
                                					<nameExpr expr="[\w]+"/>
                                				</functionName>
                                			</function>
                                		</parser>
                                

                                `
                                For testing, the following Fortran code:

                                `
                                c----------------------------------------------------------------------
                                subroutine MYSUB1(iunit0,i,lprt,lnfl2)
                                return
                                end

                                  subroutine MYSUB2( )
                                  return
                                  end
                                
                                  subroutine MYSUB3 (  
                                   )
                                  return
                                  end
                                
                                  subroutine MYSUB4()
                                  return
                                  end
                                
                                  subroutine MYSUB5   
                                  return
                                  end
                                
                                  subroutine MYSUB6
                                  return
                                  end
                                

                                c subroutine MYSUB7(a,b,c)
                                return
                                end

                                  private subroutine MYSUB8(a,b,c) return(d) 
                                
                                  end 
                                

                                subroutine MYSUB9
                                return
                                end

                                  subroutine MYSUB10
                                  return
                                  end subroutine MYSUB10a
                                
                                  
                                  !   subroutine MYSUB11(a,b,c)   
                                  return
                                  end
                                

                                c----------------------------------------------------------------------

                                `

                                should produce the following list:
                                MYSUB1
                                MYSUB2
                                MYSUB3
                                MYSUB4
                                MYSUB5
                                MYSUB6
                                MYSUB8
                                MYSUB9
                                MYSUB10
                                Note that MYSUB7 and MYSUB11 are commented and are correctly not in the list.

                                Using the functionList.xml on existing code files now appears to work for both fixed form and free form Fortran. For some files, however, the parser only works if I select the language ‘fortran free form’ even though the formatting is ‘fortran fixed form’. Copying the contents of such a file to a new file corrects the issue, so I suspect there may be a file encoding error embedded in the file that is not otherwise apparent.

                                So… the solution appeared to be related to the multiple levels of RE in the functionList.xml processing; in this case, perhaps the comment expression versus the function-name expression (although I can’t reproduce my original issue).

                                • MAPJe71

                                  Hi @Brian-Zelt

                                  Are END and SUBROUTINE (or FUNCTION) allowed to be on separate lines?
                                  e.g.

                                  subroutine MYSUB12
                                  return
                                  end 
                                    subroutine MYSUB12a
                                  
                                  

                                  FYI:

                                  1. With the updated commentExpr it’s no longer needed to have (?!c|C) in the mainExpr;
                                  2. Using a (numbered-)capturing group for the identifier ((\w+) in the mainExpr) doesn’t add functionality;
                                  3. The nameExpr can be simplified to \w+;
                                  4. Make sure to encode special XML characters i.e. change < to &lt;
                                  5. Free form style uses langID="25" and fixed form style uses langID="59". You could create separate (dedicated) parsers or have both styles associated with the same parser.
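
Applied together, the five points above might give something like the following. This is an untested sketch based on the parser posted earlier in this thread: the mainExpr keeps the leading ^.* from that parser, and associating both langIDs with one parser is just one of the two options mentioned in point 5.

```xml
<!-- Point 5: inside the associationMap node, both Fortran styles point at the same parser -->
<association langID="25" id="fortran_function"/>  <!-- free form style -->
<association langID="59" id="fortran_function"/>  <!-- fixed form style -->

<!-- Inside the parsers node -->
<parser id="fortran_function" displayName="Fortran" commentExpr="(!.*?$|^(?i:c).*?$)">
	<function
		mainExpr="(?m-s)^.*(?i:(?&lt;!END\s)FUNCTION|(?&lt;!END\s)SUBROUTINE)\s*\K\w+"
		displayMode="$functionName$">
		<!-- Points 1, 2 and 4: no (?!c|C), no capturing group, and each < in the expression written as &lt; -->
		<functionName>
			<!-- Point 3: simplified nameExpr -->
			<nameExpr expr="\w+"/>
		</functionName>
	</function>
</parser>
```

Separate, dedicated parsers for the two styles would instead use two parser nodes with different id values, each referenced by its own association.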
                                  • Brian Zelt

                                    Thanks MAPJe71,
                                    All good points. I couldn’t find a listing of the langID values but assumed there must be one somewhere. Thanks for langID=59.
                                    Yes, “end function” must be on a single line.
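
Because END and the keyword always share a line, one possible way around the single-space limitation noted earlier is to reject any line that begins with END, using a lookahead at the start of the line instead of a lookbehind in front of each keyword. The sketch below is untested and is an assumption rather than something confirmed in this thread. It uses \h (horizontal whitespace), as in the one-line expression suggested earlier, and \b so that names that merely start with “end” are not rejected.

```xml
<parser id="fortran_function" displayName="Fortran" commentExpr="(!.*?$|^(?i:c).*?$)">
	<function
		mainExpr="(?m-s)^(?!\h*(?i:END)\b).*(?i:FUNCTION|SUBROUTINE)\s*\K\w+"
		displayMode="$functionName$">
		<!-- (?!\h*(?i:END)\b) skips lines such as "end   subroutine MYSUB10a", whatever the spacing -->
		<!-- comment lines are left to commentExpr, so (?!c|C) is omitted here -->
		<functionName>
			<nameExpr expr="\w+"/>
		</functionName>
	</function>
</parser>
```

A lookahead has no fixed-length restriction, which is why \h* is acceptable here. A line written as ENDFUNCTION or ENDSUBROUTINE without a space would still slip through, as it would with the lookbehind version.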

                                    • MAPJe71

                                      FYI: The newest functionList.xml has a language ID table as a comment at the top of the associationMap node.
