FunctionList.xml Regular Expressions not parsing properly
- 
 I guess I’m not doing this correctly: 
 Here, {star} means asterisk:
 “^([^c]).{star}(function|FUNCTION|subroutine|SUBROUTINE)[\s]{star}\K([\w]+)”
- 
 @brian-zelt 
 Enclose the text in backticks instead of double quotes, or read the manual for Markdown syntax.
- 
 Thanks for the example. It does not, however, match all of the examples I listed. My point was also not to request a Fortran parser (although it is appreciated), but rather that NP++ was not parsing the RE as expected. That is, there appears to be a bug (or bugs) in NP++'s RE implementation, or in how NP++ passes text strings to be parsed.
- 
 > was not requesting a Fortran parser

 My bad, retry… I tried your RE `^([^c]).*(function|FUNCTION|subroutine|SUBROUTINE)[\s]*\K([\w]+)` on regex101.com with the flags/options `m` and `g` set (I even tried different combinations), but was not able to get all the subroutine names from the following examples (I presumed the double quotes had to be excluded): `SUBROUTINE bob(a,b,c)`, `SUBROUTINE bob(a,b,c)`, `SUBROUTINE bob()`, `SUBROUTINE bob( )`, `SUBROUTINE bob`, `SUBROUTINE bob`, `c SUBROUTINE bob(a,b,c)`, `PRIVATE SUBROUTINE bob(a,b,c)`, `return(d)`. That is, the second example does not give a match. This was actually what I expected after reading your RE: I did not expect regex101.com to return the subroutine name for all of the examples, just as N++ does not.
 This one will… `(?m-s)^(?!c).*(?i:FUNCTION|SUBROUTINE)\s*\K\w+`
- Use these options for the whole regular expression `(?m-s)`
- Assert position at the beginning of a line (at beginning of the string or after a line break character) (carriage return and line feed, form feed) `^`
- Assert that it is impossible to match the regex below starting at this position (negative lookahead) `(?!c)`
- Match any single character that is NOT a line break character (line feed, carriage return, form feed) `.*`
- Match the regex below with these options `(?i:FUNCTION|SUBROUTINE)`
- Match a single character that is a “whitespace character” (any space in the active code page, tab, line feed, carriage return, vertical tab, form feed) `\s*`
- Keep the text matched so far out of the overall regex match `\K`
- Match a single character that is a “word character” (letter, digit, or underscore in the active code page) `\w+`
 Created with RegexBuddy 
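The three expressions discussed so far can be compared outside Notepad++ with a small Python sketch. This is an approximation. Python's `re` module has no `\K`, so each pattern ends in a capturing group and the name is read from that group instead. `(?m-s)` is expressed as `re.MULTILINE` without `re.DOTALL`. The sample lines are adapted from the examples above, and their indentation is an assumption because the forum stripped it.

```python
import re

# Sample lines adapted from the examples in this thread. The indentation is an
# assumption: the forum stripped leading spaces, and fixed-form Fortran code
# normally starts in column 7 while comments start with c/C in column 1.
SAMPLES = [
    "      SUBROUTINE bob(a,b,c)",
    "SUBROUTINE bob(a,b,c)",
    "      Subroutine bob()",
    "c     SUBROUTINE bob(a,b,c)",
    "C     SUBROUTINE bob(a,b,c)",
    "      PRIVATE SUBROUTINE bob(a,b,c)",
    "      return(d)",
]

# Python's re has no \K, so every pattern ends in a capturing group and the
# name is taken from the last group. (?m-s) is expressed via re.MULTILINE
# (and by not passing re.DOTALL).
PATTERNS = {
    "original": r"^([^c]).*(function|FUNCTION|subroutine|SUBROUTINE)[\s]*([\w]+)",
    "lookahead": r"^(?!c).*(?i:FUNCTION|SUBROUTINE)\s*(\w+)",
    "c_or_C": r"^(?!c|C).*(?i:FUNCTION|SUBROUTINE)\s*(\w+)",
}


def captured_name(pattern, line):
    """Return the identifier captured by the pattern's last group, or None."""
    m = re.search(pattern, line, re.MULTILINE)
    return m.groups()[-1] if m else None


for line in SAMPLES:
    results = {label: captured_name(p, line) for label, p in PATTERNS.items()}
    print(f"{line:<36} {results}")
```

The `original` pattern returns `None` for the column-1 `SUBROUTINE bob(a,b,c)`, because `[^c]` consumes the `S`. It also returns `None` for `Subroutine`, because its alternation is case-sensitive. The `lookahead` pattern still accepts the upper-case `C     ...` comment line, which is the case that `(?!c|C)` addresses.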
- 
 Thanks. I forgot about RegexBuddy. Correct, the RE I supplied did not work on that one example. I believe I missed an additional asterisk after the initial group when I used the wrong quotes. However, your proposal works better with the addition of `|C`: `(?m-s)^(?!c|C).*(?i:FUNCTION|SUBROUTINE)\s*\K\w+`. The point remains, however, that when this RE is inserted in functionList.xml, NP++ does not parse the subroutines properly, as described in the original inquiry.
- 
 So… I’ve kinda wondered in the past about what you did in your post, so this is a good time to ask. You posted RegexBuddy output, so people get the benefit of RB’s “wisdom” without paying for it (like we have - :) ). It sorta seems a little bit wrong to me, but is it OK to do? Maybe a question best for Jan… Here’s a great example: yesterday I answered a regex question (https://notepad-plus-plus.org/community/topic/13556/replace-last-value-in-row-with-0) and supplied my own explanation. I was tempted to use RB to generate the explanation, but in the end I didn’t.
- 
 Thanks S.S. I haven’t read RB’s policy with respect to re-posting RB output, but RB was correctly credited as the source. Again, the point of this thread is whether or not NP++ has a bug in its RE processing for FunctionList.xml. Has anybody looked at the NP++ source code for FunctionList?
- 
 @Scott-Sumner 
 Uhm… to be honest, it hadn’t even crossed my mind to look up RB’s policy on posting a copy of the RegEx tree.
 @Brian-Zelt
 Yes, I’ve looked at NP++'s FunctionList code in the past (SourceForge era). I created a patch for it that never got merged. On request, I am in the process of re-creating that patch on the current code base, in addition to cleaning up `functionList.xml` and adding RE explanations to it as comments.
 There are known issues with the RE engine (as explained by @guy038 here). Both “Search (& Replace)” and FunctionList use the same RE engine, so when an RE works with “Search (& Replace)”, it will work with FunctionList.
 However, defining a parser in FunctionList can be tricky. Maybe you can post the complete parser so I (or anyone else) can check it.
- 
 Interesting… I played with the `commentExpr` RE, and now the parser seems to be working. I can’t reproduce the original `commentExpr`, but it was copied from someone’s posted example that included only the Fortran `!` comment and not the start-of-line `c` comment. The original seemed harmless. So, the final Fortran parser I have is listed below. It contains a possible flaw: only a single space is allowed between `end` and `function`. A good RE should allow any number of spaces, but the RE engine doesn't permit `\s*` inside the lookbehind, because a lookbehind must be fixed-length.
```xml
<association langID="25" id="fortran_function"/>
<parser id="fortran_function" displayName="Fortran" commentExpr="(!.*?$|^(?i:c).*?$)">
    <function mainExpr="(?m-s)^(?!c|C).*(?i:(?&lt;!END\s)FUNCTION|(?&lt;!END\s)SUBROUTINE)\s*\K(\w+)" displayMode="$functionName$">
        <functionName>
            <nameExpr expr="[\w]+"/>
        </functionName>
    </function>
</parser>
```
 For testing, the following Fortran code:
```
c----------------------------------------------------------------------
      subroutine MYSUB1(iunit0,i,lprt,lnfl2)
      return
      end
      subroutine MYSUB2( )
      return
      end
      subroutine MYSUB3 ( )
      return
      end
      subroutine MYSUB4()
      return
      end
      subroutine MYSUB5
      return
      end
      subroutine MYSUB6
      return
      end
c     subroutine MYSUB7(a,b,c)
      return
      end
      private subroutine MYSUB8(a,b,c)
      return(d)
      end
      subroutine MYSUB9
      return
      end
      subroutine MYSUB10
      return
      end subroutine MYSUB10a
!     subroutine MYSUB11(a,b,c)
      return
      end
c----------------------------------------------------------------------
```
 should produce the list:
 MYSUB1
 MYSUB2
 MYSUB3
 MYSUB4
 MYSUB5
 MYSUB6
 MYSUB8
 MYSUB9
 MYSUB10
 Note that MYSUB7 and MYSUB11 are commented out and are correctly not in the list. Using this functionList.xml on existing code files now appears to work for both fixed-form and free-form Fortran. For some files, however, the parser only works if I select the language ‘Fortran free form’ even though the file is formatted as ‘Fortran fixed form’. Copying the contents of such a file to a new file corrects the issue, so I suspect there may be an encoding problem embedded in the file that is not otherwise apparent. So… the solution appeared to be related to the multiple levels of RE in the functionList.xml processing. In this case, it was perhaps the interaction between the comment expression and the function-name expression (although I can’t duplicate my original issue).
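The two levels of RE in this parser (`commentExpr`, then `mainExpr`) can be emulated in Python to check the expected list. This is a sketch under stated assumptions. It assumes FunctionList searches `mainExpr` only in the text between `commentExpr` matches, which is my reading of how comments are handled, not a confirmed description of the Notepad++ code. It replaces `\K(\w+)` with a plain capture group, because Python's `re` has no `\K`. It also assumes 6-space fixed-form indentation for the test file. That indentation matters: if a `subroutine` line started in column 1 right after an `end` line, the lookbehind `(?<!END\s)` would see `end` plus the line break and skip it.

```python
import re

# Test file from the post, re-indented as fixed-form Fortran (assumed indentation).
FORTRAN = """\
c----------------------------------------------------------------------
      subroutine MYSUB1(iunit0,i,lprt,lnfl2)
      return
      end
      subroutine MYSUB2( )
      return
      end
      subroutine MYSUB3 ( )
      return
      end
      subroutine MYSUB4()
      return
      end
      subroutine MYSUB5
      return
      end
      subroutine MYSUB6
      return
      end
c     subroutine MYSUB7(a,b,c)
      return
      end
      private subroutine MYSUB8(a,b,c)
      return(d)
      end
      subroutine MYSUB9
      return
      end
      subroutine MYSUB10
      return
      end subroutine MYSUB10a
!     subroutine MYSUB11(a,b,c)
      return
      end
c----------------------------------------------------------------------
"""

# commentExpr and mainExpr from the posted parser, translated for Python's re:
# (?m-s) -> re.MULTILINE without re.DOTALL, and \K(\w+) -> a capture group.
COMMENT_EXPR = re.compile(r"(!.*?$|^(?i:c).*?$)", re.MULTILINE)
MAIN_EXPR = re.compile(
    r"^(?!c|C).*(?i:(?<!END\s)FUNCTION|(?<!END\s)SUBROUTINE)\s*(\w+)", re.MULTILINE
)


def non_comment_zones(text):
    """Yield (start, end) spans of text lying between commentExpr matches."""
    pos = 0
    for m in COMMENT_EXPR.finditer(text):
        yield pos, m.start()
        pos = m.end()
    yield pos, len(text)


def function_names(text):
    """Apply mainExpr to each non-comment zone and collect the captured names."""
    names = []
    for start, end in non_comment_zones(text):
        names.extend(m.group(1) for m in MAIN_EXPR.finditer(text[start:end]))
    return names


print(function_names(FORTRAN))
```

MYSUB7 and MYSUB11 never reach `mainExpr` because they sit inside comment zones. MYSUB10a is rejected by the lookbehind because it follows `end ` on the same line.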
- 
 Hi @Brian-Zelt
 Are `END` and `SUBROUTINE` (or `FUNCTION`) allowed to be on separate lines, e.g. `subroutine MYSUB12`, `return`, `end` and `subroutine MYSUB12a` on consecutive lines?
 FYI:
- With the updated `commentExpr` it’s no longer needed to have `(?!c|C)` in the `mainExpr`;
- Using a (numbered) capturing group for the identifier (`(\w+)` in the `mainExpr`) doesn’t add functionality;
- The `nameExpr` can be simplified to `\w+`;
- Make sure to encode special XML characters, i.e. change `<` to `&lt;`;
- Free form style uses `langID="25"` and fixed form style uses `langID="59"`. You could create separate (dedicated) parsers or have both styles associated with the same parser.
 
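The point about encoding `<` can be checked with any conforming XML parser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in; it is not the XML loader Notepad++ uses. The `<function>` element is cut down to its `mainExpr` attribute.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# mainExpr as typed by a regex author; note the two "<" in the lookbehinds.
POSTED_MAIN_EXPR = r"(?m-s)^(?!c|C).*(?i:(?<!END\s)FUNCTION|(?<!END\s)SUBROUTINE)\s*\K(\w+)"


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Regex pasted verbatim: a raw "<" is not allowed inside an attribute value.
unescaped = f'<function mainExpr="{POSTED_MAIN_EXPR}"/>'

# quoteattr() escapes &, < and > and adds the surrounding quotes.
escaped = f"<function mainExpr={quoteattr(POSTED_MAIN_EXPR)}/>"

print(parses(unescaped))  # False
print(escaped)            # contains (?&lt;!END\s)
# The parser turns &lt; back into "<", so the regex engine sees the original text.
print(ET.fromstring(escaped).get("mainExpr") == POSTED_MAIN_EXPR)  # True
```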
- 
 Thanks MAPje71, 
 All good points. I couldn’t find a listing of the langID values but assumed there must be one somewhere. Thanks for langID=59.
 Yes, “end function” must be on a single line.
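Since `end function`/`end subroutine` is always on one line, the single-space limitation of the lookbehind mentioned earlier might be avoided another way. Instead of a lookbehind, a negative lookahead at the start of the line can reject lines that begin with `end`, blanks, and the keyword. Lookaheads may be variable-length, so `[ \t]*` is allowed there. The sketch below is a hypothetical variant, not something posted in this thread, and is tested only with Python's `re`, not in Notepad++. It also shows that `\s` in the posted lookbehind matches a line break, so a column-1 `subroutine` directly after an `end` line is skipped. I expect Boost to behave the same but have not verified it. An untested Boost-style `mainExpr` along these lines would be `(?im-s)^(?![ \t]*end[ \t]*(?:function|subroutine)\b).*(?:function|subroutine)[ \t]*\K\w+`.

```python
import re

# Hypothetical sample lines (fixed-form indentation assumed).
LINES = (
    "      subroutine MYSUB9\n"
    "      end subroutine MYSUB10a\n"
    "      end     subroutine MYSUB10b\n"
    "      endsubroutine MYSUB10c\n"
    "      integer function F1(n)\n"
    "      END FUNCTION F1\n"
    "end\n"
    "subroutine S2\n"
)

# The posted mainExpr, with \K(\w+) replaced by a capture group for Python.
POSTED = re.compile(
    r"^(?!c|C).*(?i:(?<!END\s)FUNCTION|(?<!END\s)SUBROUTINE)\s*(\w+)", re.MULTILINE
)

# Hypothetical variant: reject lines that start with END, any blanks, then the
# keyword. Only the lookahead anchored at the line start checks for END.
VARIANT = re.compile(
    r"^(?![ \t]*end[ \t]*(?:function|subroutine)\b).*(?:function|subroutine)[ \t]*(\w+)",
    re.IGNORECASE | re.MULTILINE,
)

print("posted: ", POSTED.findall(LINES))   # ['MYSUB9', 'MYSUB10b', 'MYSUB10c', 'F1']
print("variant:", VARIANT.findall(LINES))  # ['MYSUB9', 'F1', 'S2']
```

The posted pattern lets `end     subroutine` and `endsubroutine` through and skips `S2`. The variant rejects every `end ... subroutine`/`function` line regardless of spacing. Like the original, it still relies on `commentExpr` to exclude commented-out code.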
- 
 FYI: The newest `functionList.xml` has a language ID table as a comment at the top of the `associationMap` node.
