Trouble making a functionList parser for MATLAB
-
This is what I get …
with the following <parser> …
<parser
    displayName="MATLAB - MATrix LABoratory"
    id         ="matlab_syntax"
    commentExpr="(?x)                               # free-spacing (see `RegEx - Pattern Modifiers`)
            (?'MLC':                                # Multi Line Comment
                %\{                                 # ...start-of-comment indicator
                (?:                                 # ...followed by zero or more characters
                    [^%]                            # ...not start of start-of-comment indicator
                |   %(?![%{}])                      # ...not being an SLC or a start- or end-of-comment indicator
                )*?
                %\}                                 # ...end-of-comment indicator
            )
        |   (?m-s:\x25.*$)                          # Single Line Comment (SLC)
        |   (?s:\x22(?:[^\x22\x5C]|\x5C.)*\x22)     # String Literal - Double Quoted
        |   (?s:\x27(?:[^\x27\x5C]|\x5C.)*\x27)     # String Literal - Single Quoted
        "
>
    <classRange
        mainExpr    ="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                (?s-i)                              # dot matches at line breaks, case-sensitive
                \bclassdef\b                        # start-of-class indicator
                .*?                                 # whatever, until...
                \bmethods\b                         # ...start-of-class-body indicator
            "
        openSymbole ="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                \b                                  # ensure leading word boundary for start-of-block indicator
                (?-i:                               # case-sensitive start-of-block indicators
                    e(?:numeration|vents)
                |   f(?:or|unction)
                |   if
                |   methods
                |   p(?:arfor|roperties)
                |   switch
                |   try
                |   while
                )
                \b
            "
        closeSymbole="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                \b                                  # ensure leading word boundary for end-of-block indicator
                (?-i:                               # case-sensitive
                    end                             # end-of-block indicator
                )
                \b                                  # ensure trailing word boundary for end-of-block indicator
            "
    >
        <className>
            <nameExpr expr="(?x)                    # free-spacing (see `RegEx - Pattern Modifiers`)
                    \b(?-i:classdef)                # case-sensitive start-of-class indicator
                    \h+                             # required whitespace
                    (?:\([^)]*?\)\h+)?              # optional class-attributes
                    \K                              # discard text matched so far
                    [A-Za-z]\w*                     # valid character combination for identifiers i.e. class name
                "
            />
        </className>
        <function
            mainExpr="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?ms-i)                         # ^, $ and dot match at line breaks, case-sensitive
                    (?:
                        ^                           # a function can be found at start-of-line
                    |   [,;]                        # ...or after a separator
                    )\h*                            # ...optionally followed by whitespace
                    \K                              # discard text matched so far
                    \bfunction                      # ensure word boundary for start-of-function indicator
                    \s+                             # required whitespace separator
                    .*?                             # whatever, until...
                    \bend\b                         # ...the first end-of-block indicator
                "
        >
            <functionName>
                <funcNameExpr expr="(?x)            # free-spacing (see `RegEx - Pattern Modifiers`)
                        function\s+                 # start-of-function indicator
                        (?:\.{3}\s+)?               # optional continuation-line indicator
                        \K                          # discard text matched so far
                        (?:                         # optional return value(s)
                            (?:
                                \w+                 # ...single variable name
                            |   \[[\h,\w]*\]        # ...or one or more variable names in brackets
                            )\h*=\h*                # ...followed by a separator with optional whitespace
                        )?
                        [A-Za-z]\w*                 # valid character combination for identifiers i.e. function name
                        \b                          # ensure word boundary for name
                        (?:                         # optional parameter list
                            \s*                     # ...optional leading whitespace
                            \(                      # ...start-of-parameter-list indicator
                            [\h,\w~]*               # ...with optional parameters
                            (?:
                                \)                  # ...until end-of-parameter-list indicator
                            |   \.{3}               # ...or continuation-line indicator
                            )
                        )?
                    "
                />
                <!-- comment out the following node to display the method with its parameters -->
                <!-- <funcNameExpr expr="(?:(?:\w+|\[[\h,\w]*\])\h*=\h*)?[A-Za-z]\w*" /> -->
            </functionName>
        </function>
    </classRange>
    <function
        mainExpr="(?x)                              # free-spacing (see `RegEx - Pattern Modifiers`)
                (?ms-i)                             # ^, $ and dot match at line breaks, case-sensitive
                ^\h*                                # optional leading whitespace at start-of-line
                function\b                          # start-of-function indicator
                .*?                                 # whatever, until...
                \bend\b                             # ...the first end-of-block indicator
            "
    >
        <functionName>
            <nameExpr expr="(?x)                    # free-spacing (see `RegEx - Pattern Modifiers`)
                    function\s+                     # start-of-function indicator
                    (?:\.{3}\s+)?                   # optional continuation-line indicator
                    \K                              # discard text matched so far
                    (?:                             # optional return value(s)
                        (?:
                            \w+                     # ...single variable name
                        |   \[[\h,\w]*\]            # ...or one or more variable names in brackets
                        )\h*=\h*                    # ...followed by a separator with optional whitespace
                    )?
                    [A-Za-z]\w*                     # valid character combination for identifiers i.e. function name
                    \b                              # ensure word boundary for name
                    (?:                             # optional parameter list
                        \s*                         # ...optional leading whitespace
                        \(                          # ...start-of-parameter-list indicator
                        [\h,\w~]*                   # ...with optional parameters
                        (?:
                            \)                      # ...until end-of-parameter-list indicator
                        |   \.{3}                   # ...or continuation-line indicator
                        )
                    )?
                "
            />
            <!-- comment out the following node to display the method with its parameters -->
            <!-- <nameExpr expr="(?:(?:\w+|\[[\h,\w]*\])\h*=\h*)?[A-Za-z]\w*" /> -->
        </functionName>
    </function>
</parser>
-
@MAPJe71 ,
Your new parser works like a charm. It even handles WeirdUseOfContinuationLine.m. Thank you very much!
-
Sorry, I’m back with more problems. However, the result so far is good enough for me, since I know the limitations. Other Matlab users may think differently.
My goal is that the <parser> produces the intended result for syntactically correct Matlab files. Exceptions are acceptable for rare and smelly cases. It’s good if something is shown for syntactically incorrect Matlab files - the more the better.
The Matlab class syntax, [Class Syntax Guide], includes more than I have “communicated” with the test files in this thread.
Problem: A Matlab file allows one classdef...end block, which in turn may contain zero or more methods...end blocks. classdef shall be the first keyword in the file. The current <parser> displays the functions of the first methods...end block as belonging to the class, i.e. as methods, and erroneously displays the functions of subsequent methods...end blocks as functions. This is illustrated by the attached m-file.
I failed to modify your <parser> to show all functions inside the classdef...end block as methods belonging to the class.
classdef (Sealed) aCompleteClassWithFunction < handle
    properties (Access = private)
        Prop1 = datenum(date)
    end
    properties
        Prop2
    end
    events (ListenAccess = protected)
        StateChanged
    end
    methods
        function obj = aCompleteClassWithFunction(x)
            obj.Prop2 = x;
        end
    end
    methods (Access = {?MyOtherClass})
        function d = myMethod(obj)
            d = obj.Prop1 + x;
        end
    end
end
function myUtilityFcn
    A = 17;
end
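The intended result can be sketched outside Notepad++. The Python snippet below is only an illustration under simplifying assumptions: it counts block keywords to find where the classdef...end block closes and labels every function inside it as a method, regardless of how many methods...end blocks there are. The names OPENERS, TOKEN, FUNC_NAME and classify_functions are made up for this sketch. It is not how the function list is implemented, and it ignores comments, strings and end used as an index.

```python
import re

# Hypothetical helper names; keyword list taken from the parser's openSymbole.
OPENERS = (r"classdef|properties|methods|events|enumeration|function|"
           r"if|for|parfor|switch|try|while")
TOKEN = re.compile(r"\b(?:(" + OPENERS + r")|(end))\b")
FUNC_NAME = re.compile(r"function[ \t]+(?:[^=\n]*=[ \t]*)?([A-Za-z]\w*)")

SAMPLE = """classdef (Sealed) aCompleteClassWithFunction < handle
    properties (Access = private)
        Prop1 = datenum(date)
    end
    methods
        function obj = aCompleteClassWithFunction(x)
            obj.Prop2 = x;
        end
    end
    methods (Access = {?MyOtherClass})
        function d = myMethod(obj)
            d = obj.Prop1 + x;
        end
    end
end
function myUtilityFcn
    A = 17;
end
"""


def classify_functions(source):
    """Return (methods, functions) by tracking block nesting depth."""
    depth = 0
    in_class = False
    methods, functions = [], []
    for tok in TOKEN.finditer(source):
        if tok.group(2):                      # an 'end' closes one block
            depth -= 1
            if in_class and depth == 0:       # the classdef block just closed
                in_class = False
            continue
        keyword = tok.group(1)
        if keyword == "classdef" and depth == 0:
            in_class = True
        elif keyword == "function":
            name = FUNC_NAME.match(source, tok.start())
            if name:
                (methods if in_class else functions).append(name.group(1))
        depth += 1                            # every opener starts a block
    return methods, functions


methods, functions = classify_functions(SAMPLE)
print("methods:  ", methods)
print("functions:", functions)
```

Both functions from the two methods...end blocks end up in the methods list here because membership is decided by the extent of the whole classdef block, not by the first methods block.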
-
Sigh, it was too good to be true ;-)
I’ll look at it tomorrow.
-
I’m obviously not capable of contributing effectively to the code of this <parser> :-(. I lack both knowledge of regular expressions and an understanding of how the <parser> code produces the function list.
How important is a good Matlab-<parser> to Notepad++ ? To me Notepad++ is a valuable complement to Matlab’s editor, which is sluggish and doesn’t support regular expressions, not even wildcards. I use Notepad++ daily. Notepad++ is superior when it comes to large files and browsing large code-bases. Google returns 0.18 million results for “notepad++ matlab”.
I send this post to help you decide rather than to insist on specific things being included in the <parser>. What we have achieved so far is good enough for me!
Is there a means to make the user aware of limitations in the capability of a specific <parser>? Documented limitations are more acceptable.
MLC commentExpr
Is the colon after ‘MLC’ a typo, or does it mean non-capture?
Proposal: replace the MLC code by
(?xms) (?'OMLC' (?>^\h*%\{\h*$)) .*? (?'CMLC' (?>^\h*%\}\h*$))
Justification: Again, Matlab’s syntax [Add Comments to Programs] is permissive. It allows
- nested multi-line comments (candidate for documented limitation)
- single-line comments inside an MLC
- I often comment out code with both SLC and MLC, because Matlab has a feature, Find in Files, which doesn’t honor MLC (bug). MLC allows me to code-fold the commented-out code.
This should be matched by MLC
%{
some code
%% Section
% some other code
%}
This expression is affected by the block-comment problem.
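The difference between the two MLC expressions can be checked with a small Python sketch. This is only an approximation: Python's re module is not the regex engine Notepad++ uses, it has no \h (so [ \t] stands in for it), and the atomic groups and named groups of the proposal are left out. The stray colon after 'MLC' is also omitted from the character-based pattern. The names CHAR_BASED_MLC, LINE_BASED_MLC and find_block_comment are made up for this illustration.

```python
import re

# Character-based MLC, roughly as in the parser's commentExpr (colon removed).
CHAR_BASED_MLC = re.compile(r"%\{(?:[^%]|%(?![%{}]))*?%\}")

# Line-based MLC, roughly as proposed: '%{' and '%}' alone on their lines.
LINE_BASED_MLC = re.compile(
    r"^[ \t]*%\{[ \t]*$.*?^[ \t]*%\}[ \t]*$", re.M | re.S
)

SAMPLE = (
    "x = 1;\n"
    "%{\n"
    "some code\n"
    "%% Section\n"
    "% some other code\n"
    "%}\n"
    "y = 2;\n"
)


def find_block_comment(pattern, source):
    """Return the text of the first block comment found, or None."""
    match = pattern.search(source)
    return match.group(0) if match else None


print(repr(find_block_comment(CHAR_BASED_MLC, SAMPLE)))
print(repr(find_block_comment(LINE_BASED_MLC, SAMPLE)))
```

In this sketch the character-based pattern finds nothing, because it cannot step over the `%%` of the section line, while the line-based pattern returns the whole block from `%{` to `%}`.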
Nested functions
Matlab has nested functions. My previous <parser> showed functions nested in functions as ordinary functions, and functions nested in methods broke the function list (it showed only the file name). Your current <parser> ignores nested functions altogether, which is better. Candidate for documented limitation.
function out = function_scalar_return_nested( val )
    out = 17;
    function nested
    end
end
-
There is another Matlab feature, which is a strong candidate for documented limitation. In files defining functions and subfunctions, but no nested functions, end is optional in the function ... end block. It’s a legacy from a long time ago.
-
Yep, the colon in the MLC commentExpr is a typo.
FYI: failing to detect multi-line/block comments when the parser has a classRange node appears to be a bug.
The only way I can think of right now to document limitations of parsers is to add them as a comment in functionList.xml. I will do so for this parser.
Although I haven’t found a solution for multiple ‘methods’ sections within a class definition yet, I think the parser should support it, presuming it’s used a lot in OO programming.
-
The latest:
<!--
|   Note(s):
|   1) One-line class definitions are not supported;
|   2) Nested functions are not supported;
|   3) All functions require an 'end' statement;
|   4) In some situations where a multi-line/block comment contains function definition(s),
|      the function(s) is/are still displayed as global function(s).
\-->
<parser
    displayName="MATLAB - MATrix LABoratory"
    id         ="matlab_syntax"
    commentExpr="(?x)                               # free-spacing (see `RegEx - Pattern Modifiers`)
            (?'MLC'                                 # Multi Line Comment
                (?m)                                # ...^ and $ match at line breaks
                ^%\{$                               # ...start-of-comment indicator
                (?:                                 # ...followed by zero or more characters
                    [^%]                            # ...not start of start-of-comment indicator
                |   %(?![%{}])                      # ...not being an SLC or a start- or end-of-comment indicator
                |   (?&MLC)
                )*?
                ^%\}$                               # ...end-of-comment indicator
            )
        |   (?m-s:%.*$)                             # Single Line Comment (SLC)
        |   (?s:\x22(?:[^\x22\x5C]|\x5C.)*\x22)     # String Literal - Double Quoted
        |   (?s:\x27(?:[^\x27\x5C]|\x5C.)*\x27)     # String Literal - Single Quoted
        "
>
    <classRange
        mainExpr    ="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                (?s-i)                              # dot matches at line breaks, case-sensitive
                \bclassdef\b                        # start-of-class indicator
                .*?                                 # whatever, up till...
                (?=                                 # ...start-of-class-body indicator
                    \b
                    (?:e(?:numeration|vents)|methods|properties)
                    \b
                )
            "
        openSymbole ="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                \b                                  # ensure leading word boundary for start-of-block indicator
                (?-i:                               # case-sensitive start-of-block indicators
                    e(?:numeration|vents)
                |   f(?:or|unction)
                |   if
                |   methods
                |   p(?:arfor|roperties)
                |   switch
                |   try
                |   while
                )
                \b
            "
        closeSymbole="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                \b                                  # ensure leading word boundary for end-of-block indicator
                (?-i:                               # case-sensitive
                    end                             # end-of-block indicator
                )
                \b                                  # ensure trailing word boundary for end-of-block indicator
            "
    >
        <className>
            <nameExpr expr="(?x)                    # free-spacing (see `RegEx - Pattern Modifiers`)
                    \b(?-i:classdef)                # case-sensitive start-of-class indicator
                    \h+                             # required whitespace
                    (?:\([^)]*?\)\h+)?              # optional class-attributes
                    \K                              # discard text matched so far
                    [A-Za-z]\w*                     # valid character combination for identifiers i.e. class name
                "
            />
        </className>
        <function
            mainExpr="(?x)                          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?ms-i)                         # ^, $ and dot match at line breaks, case-sensitive
                    (?:
                        ^                           # a function can be found at start-of-line
                    |   [,;]                        # ...or after a separator
                    )\h*                            # ...optionally followed by whitespace
                    \K                              # discard text matched so far
                    \bfunction                      # ensure word boundary for start-of-function indicator
                    \s+                             # required whitespace separator
                    .*?                             # whatever, until...
                    \bend\b                         # ...the first end-of-block indicator
                "
        >
            <functionName>
                <funcNameExpr expr="(?x)            # free-spacing (see `RegEx - Pattern Modifiers`)
                        function\s+                 # start-of-function indicator
                        (?:\.{3}\s+)?               # optional continuation-line indicator
                        \K                          # discard text matched so far
                        (?:                         # optional return value(s)
                            (?:
                                \w+                 # ...single variable name
                            |   \[[\h,\w]*\]        # ...or one or more variable names in brackets
                            )\h*=\h*                # ...followed by a separator with optional whitespace
                        )?
                        [A-Za-z]\w*                 # valid character combination for identifiers i.e. function name
                        \b                          # ensure word boundary for name
                        (?:                         # optional parameter list
                            \s*                     # ...optional leading whitespace
                            \(                      # ...start-of-parameter-list indicator
                            [\h,\w~]*               # ...with optional parameters
                            (?:
                                \)                  # ...until end-of-parameter-list indicator
                            |   \.{3}               # ...or continuation-line indicator
                            )
                        )?
                    "
                />
                <!-- comment out the following node to display the method with its parameters -->
                <!-- <funcNameExpr expr="(?:(?:\w+|\[[\h,\w]*\])\h*=\h*)?[A-Za-z]\w*" /> -->
            </functionName>
        </function>
    </classRange>
    <function
        mainExpr="(?x)                              # free-spacing (see `RegEx - Pattern Modifiers`)
                (?ms-i)                             # ^, $ and dot match at line breaks, case-sensitive
                ^\h*                                # optional leading whitespace at start-of-line
                function\b                          # start-of-function indicator
                .*?                                 # whatever, until...
                \bend\b                             # ...the first end-of-block indicator
            "
    >
        <functionName>
            <nameExpr expr="(?x)                    # free-spacing (see `RegEx - Pattern Modifiers`)
                    function\s+                     # start-of-function indicator
                    (?:\.{3}\s+)?                   # optional continuation-line indicator
                    \K                              # discard text matched so far
                    (?:                             # optional return value(s)
                        (?:
                            \w+                     # ...single variable name
                        |   \[[\h,\w]*\]            # ...or one or more variable names in brackets
                        )\h*=\h*                    # ...followed by a separator with optional whitespace
                    )?
                    [A-Za-z]\w*                     # valid character combination for identifiers i.e. function name
                    \b                              # ensure word boundary for name
                    (?:                             # optional parameter list
                        \s*                         # ...optional leading whitespace
                        \(                          # ...start-of-parameter-list indicator
                        [\h,\w~]*                   # ...with optional parameters
                        (?:
                            \)                      # ...until end-of-parameter-list indicator
                        |   \.{3}                   # ...or continuation-line indicator
                        )
                    )?
                "
            />
            <!-- comment out the following node to display the method with its parameters -->
            <!-- <nameExpr expr="(?:(?:\w+|\[[\h,\w]*\])\h*=\h*)?[A-Za-z]\w*" /> -->
        </functionName>
    </function>
</parser>
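Note 3 above (all functions require an 'end' statement) follows from the way the function mainExpr works: it reads lazily from function up to the first end. A rough Python translation shows what happens with a legacy file whose functions are not closed by end. This is a sketch only: Python's re is not the engine Notepad++ uses, [ \t] replaces \h, and FUNCTION_BLOCK and LEGACY_FILE are names invented for the example.

```python
import re

# Rough equivalent of the global function mainExpr: from 'function'
# at the start of a line, lazily up to the first word 'end'.
FUNCTION_BLOCK = re.compile(r"^[ \t]*function\b.*?\bend\b", re.M | re.S)

# Two functions without a closing 'end'; the only 'end' closes the 'if'.
LEGACY_FILE = (
    "function a\n"
    "x = 1;\n"
    "function b\n"
    "if x, y = 2; end\n"
)

blocks = [m.group(0) for m in FUNCTION_BLOCK.finditer(LEGACY_FILE)]
for number, block in enumerate(blocks, start=1):
    print(f"block {number}: {block!r}")
```

Only one block is found here. It starts at function a and runs on to the end that closes the if statement, so function b is swallowed instead of being listed separately.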
-
Thank you for your latest parser.
I’ve tested your latest parser and the parser I attach below with the files in a couple of real Matlab programs.
Your latest parser handles “multiple ‘methods’ sections” well in most cases. I failed to spot what characterizes the few cases in which methods are shown as functions.
One lesson I learned is that I must take more care that the test cases cover all relevant parts of the syntax of the target language. I previously missed that
- the keyword, end, is used as an index, e.g. A(end) is the last element of the vector A.
- the apostrophe, ', is shorthand for the function, ctranspose (identical to transpose for real-valued arrays), e.g. A' is equal to ctranspose(A)
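Both pitfalls are easy to reproduce with rough Python translations of the expressions involved. The sketch below is an approximation only: Python's re is not the engine Notepad++ uses, [ \t] replaces \h, and the names FUNCTION_BLOCK, SINGLE_QUOTED, INDEX_SAMPLE and TRANSPOSE_SAMPLE are invented for the example.

```python
import re

# Lazy "function ... first end" match, as in the function mainExpr.
FUNCTION_BLOCK = re.compile(r"^[ \t]*function\b.*?\bend\b", re.M | re.S)

# Single-quoted string literal, as in the commentExpr.
SINGLE_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'", re.S)

INDEX_SAMPLE = (
    "function y = last(A)\n"
    "    y = A(end);\n"
    "end\n"
)
TRANSPOSE_SAMPLE = "C = A' * B';\n"

# 'end' used as an index is taken for the end of the function block.
truncated_block = FUNCTION_BLOCK.search(INDEX_SAMPLE).group(0)

# Two transpose apostrophes are taken for the quotes of a string literal.
false_string = SINGLE_QUOTED.search(TRANSPOSE_SAMPLE).group(0)

print(repr(truncated_block))
print(repr(false_string))
```

The function block stops at the index `A(end` rather than at the closing end, and the text between the two apostrophes is treated as a string, which would hide real code from the parser.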
My latest parser works as expected, i.e. I haven’t seen it fail yet. However, it puts higher requirements on the target Matlab code; it will fail with more files.
I have included two parsers in functionList.xml, one of them for the UDL matlab2. I switch between the two by the menu entries, Language|matlab2 and Language|M|Matlab, respectively. The menu choice obviously overrides the extension, .m2.
I plan to upload my parsers to Matlab File Exchange to see if I get any interest.
<!-- P09, KISS, full function signature
    Limitations and requirements
    1.1 The keyword, 'classdef', shall start a new line; no indentions
    1.2 The keyword, 'end', which match 'classdef', shall start a new line
    1.3 No keyword, 'end', shall start a new line inside the classdef block
    2.1 The keyword, 'function', optionally indented, shall start a new line
    3.1 The line of the last keyword, 'end', shall be ended by a new line character
    4.1 Nested functions are displayed as methods and sub-functions, respectively.
    5.1 Block comments must not be nested; no block comment inside a block comment
    6.1 Function blocks shall be ended by the keyword, 'end'
    7.1 The file shall contain at least one method or function
/-->
<parser
    displayName="MATLAB - MATrix LABoratory"
    id         ="matlab2_syntax"
    commentExpr=
        "(?x)                                       # free-spacing
        (?m)                                        # ^ and $ match at line breaks
        (?-s)                                       # dot does NOT match new line
            (?:                                     # Multi Line Comment, (MLC)
                (?:^\h*\x25\x7B\h*$)                # '%{' with optional space
                (?s:.*?)                            # lazy: any character up till
                (?:^\h*\x25\x7D\h*$)                # '%}' with optional space
            )                                       #
        |                                           # or
            (?:                                     # Single Line Comment (SLC)
                (?:                                 #
                    \x25                            # '%'
                |                                   # or
                    \x2E{3}                         # '...'
                ).*?$                               # and up till end of line
            )                                       #
        |                                           # or
            (?:\x27(?:.*?)*\x27)                    # String Literal - Single Quoted - on one line
        "
>
    <classRange
        mainExpr =
            "(?x)                                   # free-spacing
            (?m)                                    # ^ and $ match at line breaks
            (?s)                                    # dot matches new line
            (?-i)                                   # case-sensitive
            ^classdef\h+                            # start-of-class indicator at start-of-line
            (?:.*?)                                 # whatever, until...
            ^end                                    # end-of-class indicator at start-of-line
            "
    >
        <className>
            <nameExpr expr="(?x)                    # free-spacing
                    (?-i)                           # case-sensitive
                    \bclassdef                      # case-sensitive start-of-class indicator
                    \h+                             # required whitespace
                    (?:\([^)]*?\)\h+)?              # optional class-attributes
                    \K                              # discard text matched so far
                    [A-Za-z]\w*                     # valid Matlab identifiers; class name
                "
            />
        </className>
        <function
            mainExpr="(?x)                          # free-spacing
                    (?m)                            # ^ and $ match at line breaks
                    (?s)                            # dot matches new line
                    (?-i)                           # case-sensitive
                    ^\h+                            # required intendation
                    function                        # start-of-function indicator
                    \h+                             # required whitespace separator
                    .*?$                            # whatever, up till the end of line
                "
        >
            <functionName>
                <funcNameExpr expr="(?x)            # free-spacing
                        function\h+                 # start-of-function indicator
                        \K                          # discard text matched so far
                        (?:                         # optional return value(s)
                            (?:                     #
                                \w+                 # ...single variable name
                            |                       # or
                                \[[\h,\w]*\]        # ...one or more names in square brackets
                            )                       #
                            \h*=\h*                 # ...a separator with optional whitespace
                        )?                          #
                        \b                          # ensure word boundary
                        [A-Za-z]\w*                 # valid Matlab identifiers; method name
                        \b                          # ensure word boundary for name
                        (?:                         # optional parameter list
                            \h*                     # ...optional leading whitespace
                            \(                      # ...start-of-parameter-list indicator
                            [\h,\w~]*               # ...with optional parameters
                            (?:                     #
                                \)                  # ...up till end-of-parameter-list indicator
                            |                       # or
                                \.{3}               # ...continuation-line indicator
                            )
                        )?
                    "
                />
            </functionName>
        </function>
    </classRange>
    <function
        mainExpr="(?x)                              # free-spacing
                (?m)                                # ^ and $ match at line breaks
                (?s)                                # dot matches new line
                (?-i)                               # case-sensitive
                ^\h*                                # optional intendation
                function                            # start-of-function indicator
                \h+                                 # required whitespace separator
                .*?$                                # whatever, until end of line
            "
    >
        <functionName>
            <nameExpr expr="(?x)                    # free-spacing (see `RegEx - Pattern Modifiers`)
                    function\h+                     # start-of-function indicator
                    (?:\.{3}.*$)?                   # optional continuation-line indicator
                    \K                              # discard text matched so far
                    (?:                             # optional return value(s)
                        (?:                         #
                            \w+                     # ...single variable name
                        |                           # or
                            \[[\h,\w]*\]            # ...one or more variable names in brackets
                        )                           #
                        \h*=\h*                     # ...followed by a separator with optional whitespace
                    )?                              #
                    [A-Za-z]\w*                     # valid Matlab identifiers; function name
                    \b                              # ensure word boundary for name
                    (?:                             # optional parameter list
                        \h*                         # ...optional leading whitespace
                        \(                          # ...start-of-parameter-list indicator
                        [\h,\w~]*                   # ...with optional parameters
                        (?:                         #
                            \)                      # ...until end-of-parameter-list indicator
                        |                           # or
                            \.{3}                   # ...or continuation-line indicator
                        )
                    )?
                "
            />
        </functionName>
    </function>
</parser>
Alternate <functionName> block to show method and function name (without input and output arguments)
<functionName>
    <nameExpr expr="(?x)                            # free-spacing (see `RegEx - Pattern Modifiers`)
            function\h+                             # start-of-function indicator
            (?:\.{3}.*$)?                           # optional continuation-line indicator
            (?:                                     # optional return value(s)
                (?:                                 #
                    \w+                             # ...single variable name
                |                                   # or
                    \[[\h,\w]*\]                    # ...one or more variable names in brackets
                )                                   #
                \h*=\h*                             # ...followed by a separator with optional whitespace
            )?                                      #
            \K                                      # discard text matched so far
            [A-Za-z]\w*                             # valid Matlab identifiers; function name
            \b                                      # ensure word boundary for name
        "
    />
</functionName>
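The difference between the full-signature variant and this name-only variant is where \K sits: before the return values, or after them. A Python sketch can show the two results side by side. It is an approximation under stated assumptions: Python's re has neither \K nor \h, so named groups and [ \t] stand in for them, continuation lines are not handled, and SIGNATURE, list_functions and SAMPLE are names invented for the example.

```python
import re

# One pattern, two named groups: 'signature' corresponds to placing \K
# right after 'function', 'name' to placing it after the return values.
SIGNATURE = re.compile(
    r"^[ \t]*function[ \t]+"
    r"(?P<signature>"
    r"(?:(?:\w+|\[[ \t,\w]*\])[ \t]*=[ \t]*)?"   # optional return value(s)
    r"(?P<name>[A-Za-z]\w*)\b"                   # function or method name
    r"(?:[ \t]*\([ \t,\w~]*\))?"                 # optional parameter list
    r")",
    re.M,
)

SAMPLE = (
    "function [a, b] = split(x, ~)\n"
    "end\n"
    "function myUtilityFcn\n"
    "end\n"
)


def list_functions(source, names_only=False):
    """Return what the function list would show for each function line."""
    group = "name" if names_only else "signature"
    return [m.group(group) for m in SIGNATURE.finditer(source)]


print(list_functions(SAMPLE))                   # full signatures
print(list_functions(SAMPLE, names_only=True))  # names only
```

Because only the function line itself is inspected, this approach does not depend on finding the matching end, which is why an end used as an index does not disturb it.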
-
*indentation ;-)
Glad you found a parser that meets your needs!
I propose to wait for feedback from other Matlab users (I’m not one of them) before including it in an official N++ release.
-
I’ve finally uploaded some <parser>-elements to the Matlab File Exchange.