functionList not ignoring comments
-
Okay, this is going to get a little hairy, as the developer and I have been beating our heads against the wall trying to figure out why two Class declarations weren’t showing up in the functionList while others in the file were. A tweak here and there in the regex enabled more to show up, but not one in particular.
Eventually, I broke the file apart into separate files and the regex worked on the code showing. The developer did some work on his code and commenting that he thought would help, and eventually we came to a joint conclusion: his comment changes helped make it work, and the comment-ignoring code in the functionList regex was not working properly. Why, I hope we can find out here, since as I understand it, nothing inside the comment sections should be used by the functionList parser to open or close a class/function pair.
In this case, at one point the developer did indeed have a faux Class declaration inside comments as a reference for prototyping his class and documenting its references. The apparent result of this clash is that the real class outside of comments was being closed by the endclass keyword of the example class inside the comments.
Since the functionList parser didn’t find a function between the Class outside of comments and the endclass inside of comments, it was apparently ignoring the class.
Consequently, because further commenting was used to denote that the next section would be part of the Class constructor following the comment header, functionList apparently read this as the start of a new class called ‘constructor’. As it read the file, it found the functions/methods that belonged to the original class up until the final endclass for the original class, and it closed and displayed the class in the functionList panel. Prior to a few regex tweaks and the author changing some comments, this wasn’t even displaying in the functionList, which is what prompted us to look for the problem. Some regex tweaks later, his comment/code changes prompted the functionList to finally show what had happened.
This screenshot shows what the functionList shows when the offending code was identified and put into a faux dBASE Plus .wfm file for showing the problem:
Interestingly, in the above code, the UDL language properly colors everything, and knows what is comment, and what is functional code. Yet the functionList parser doesn’t.
The following is the code and the comment faux code that shows the interaction that resulted in the above functionList result:
class ccs_Object(vInstanceId) of Timer() custom
/*
class newClass(vInstanceId) of ccs_Object(vInstanceId)
... class construct ...
... class methods ...
endclass
*/
/*
=============================================================================
Class constructor
=============================================================================
*/
function IamLoaded()
return
endclass
If you follow the above, you can see how the problem finally manifested itself, after changes in the searching regex and the code itself was changed to expose the actual culprit.
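For anyone who wants to see the interaction without Notepad++ at hand, here is a minimal Python sketch of it. It is only a model: the class-range pattern is a heavily simplified stand-in for the real classRange mainExpr, Python's re module is not the regex engine Notepad++ uses, and the line breaks in SAMPLE are assumed. It simply shows what happens when a "class ... endclass" range search runs over the sample with no comment handling at all.

```python
import re

# The sample from the post, with the line layout assumed.
SAMPLE = """\
class ccs_Object(vInstanceId) of Timer() custom
/*
class newClass(vInstanceId) of ccs_Object(vInstanceId)
... class construct ...
... class methods ...
endclass
*/
/*
=============================================================================
Class constructor
=============================================================================
*/
function IamLoaded()
return
endclass
"""

# Simplified stand-in for the classRange mainExpr (an assumption, not the real
# pattern): a line starting with 'class <name>', then the shortest run of text
# up to a line starting with 'endclass'.
CLASS_RANGE = re.compile(
    r"^[ \t]*class[ \t]+(\w+).*?^[ \t]*endclass",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)
FUNCTION = re.compile(r"^[ \t]*function[ \t]+\w+", re.IGNORECASE | re.MULTILINE)

# No comment handling here, so commented keywords count like real ones.
ranges = [(m.group(1), m.group(0)) for m in CLASS_RANGE.finditer(SAMPLE)]

for name, body in ranges:
    has_function = FUNCTION.search(body) is not None
    print(name, "-> contains a function:", has_function)
```

In this model the first range, ccs_Object, ends at the commented endclass and contains no function, and a second range named constructor picks up IamLoaded, which matches what the panel was showing.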
The below is the regex for the commentExpr in the troubleshooting environment I created that duplicates the dBASEPlus UDL/functionList environment so we could work on this. Changing the characters below to their hex equivalents changed nothing, so they’re left as they were when the problem started.
commentExpr="(?x) (?s:/\*.*\*/) (?m-s://.*?$) (?m-s:\&\&.*?$) "
I’m hoping someone may have seen this problem before or know what may have gone wrong, but the result is undeniable. Commented out code was included in the functionList parsing that produced this problem.
-
In Notepad++ versions 8.4.8 and 8.4.9 there were some changes to the code base that were related to function list parsers and function list tree generation. So, to be able to at least try to help you, it is important to know
- the version of Notepad++ you were doing your tests with and
- the complete function list parser you created.
BTW: To clarify the context, who is “the developer” (I guess it’s not DonHo) or what did he develop, respectively?
-
Thanks for the questions. This was a problem in 8.4.8, and the test environment I set up to work on this was in 8.4.6, as it also showed the problem there. I could go back further, but for the moment chose not to. The developer mentioned is one of our advanced dBASE Plus developers, who has worked with the environment going way back to the Borland days and has been instrumental in helping them with code fixes and is a prolific contributor to our community code base, called dBASE Users’ Function Library Project (dUFLP).
I think, after playing with it, and reading (as best I could) NPP code, Scintilla code and Lexilla code as well as Scintillalua (?) documentation, it became clear to me that there is a separate ‘tag’ for single line comments and multi-line comments. When I was messing with this when we finally realized what the problem might be, I commented out code with single line comment characters // that was inside /* */ multi-line comment characters to change how things presented themselves after saving the file and repainting the functionList panel.
So, just a little while ago, I did a column selection of all the code in the affected multi-line comment area and changed the entire column section to multiple single line comment symbols and saved. The proper class name showed up in the functionList panel, as it was supposed to. These findings lead me to think that there is, either in NPP or in the underlying libraries NPP uses, a problem with the multi-line comment code being recognized.
Maybe it’s just the functionList panel functionality in NPP that is the problem, I can’t be sure, but I know that using single line comments instead of multi-line comments, makes the panel function properly, at this point.
So, for now, I seem to have found at least a temporary work around for the problem. I noticed as you pointed out that the javascript (I think) in the Github issue notes/fixes as having been addressed and figured maybe something “broke” with Scintilla/Lexilla change that NPP made recently.
On the other hand, the developer who presented the problem writes some rather advanced code in our language, and I had to make some changes to the UDL as well as the regex to comply with capabilities of the language, shown in his code, that I wasn’t aware of. An interesting side note is that this particular code, which he was using to test the dBASEPlus NPP UDL suite that I recently did, has on occasion presented problems with the development environment’s own editor, which, by the way, is an adaptation of the SciTE editor and uses the same libraries. So at this point, we’re looking at whether this might actually be a library problem rather than just an NPP problem.
We’re looking into the issue and I just posted here to see if anyone else has had a problem of this type, with the comment code not working, and at this point, it appears, the multi-line comment code not being ignored by the functionList panel parser. The comment code in the functionList parser shown above is all that should be involved. As the screenshots show, the UDL properly color codes the syntax. It’s only the functionList parsing, that seems to be the issue at the moment.
It’s getting late, and need to get to sleep, right now since it’s almost 5:00 am here, but I can post here later, if need be, a complete set of files that makes up this suite, but I was figuring if no one has seen the problem before on here, that I will probably have to do a detailed submission of an issue on Github, so was waiting to see if I got any other issues reported here, which so far, there seems to be none.
-
Maybe I should have mentioned that I had the same issue (function list not ignoring code commented-out in block comments) with my Pascal/Delphi function list parser that I released some weeks ago. The solution was a small change in the C++ code of Notepad++ provided in >> this commit << by @mpheath.
That fixes the issue at least for so called mixed function list parsers, i.e. parsers that contain a class parser and a function parser (see >> the Notepad++ user manual << to obtain more infos about the different types of function list parsers).
So, if you use Notepad++ v8.4.9 and above and your function list parser is a mixed parser, chances are high that your issue is already fixed. If it is only a function parser, it is possible to convert it into a mixed parser by adding a dummy class parser.
Maybe it’s also worth knowing that the whole function list feature is independent from Scintilla and has only a limited relationship to lexers/syntax highlighters in general. In order to be able to use the function list for a certain language, a lexer and a function list parser for that language are required, but the only reason for that is that Notepad++ needs this relationship to activate the appropriate function list parser when a file containing code of this language is opened. That means defining what is a line comment or what is a block comment in a UDL lexer is completely independent from defining regular expressions for single-line or multi-line comments in a function list parser.
-
v8.4.9 or later, as @dinkumoil has correctly stated, to invalidate code matched in comment zones.
The function parser reads the xml file for the patterns. The Scintilla functions SCI_SETTARGETRANGE and SCI_SEARCHINTARGET are called by searchInTarget(). The regular expression engine gets the matches from the text in the Scintilla edit control, and these are used to update the Function List panel. Lexilla has no involvement in this process AFAIK, so lexing is unrelated to the issue.
Try this
commentExpr="(?x) (?s:/\*.*?\*/) |(?m-s://.*$) |(?m-s:&&.*$) "
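As a quick illustration of what the .*? in the block-comment group of the pattern above changes, here is a small Python sketch. It uses Python's re module rather than the engine Notepad++ uses, and TEXT is a made-up one-line example, so treat it as a demonstration of greedy versus lazy matching in general and not of Notepad++ itself.

```python
import re

# Hypothetical text with two block comments and real code between them.
TEXT = "/* first */ code_a /* second */ code_b"

greedy = re.compile(r"(?s:/\*.*\*/)")   # .*  runs to the LAST  */
lazy = re.compile(r"(?s:/\*.*?\*/)")    # .*? stops at the FIRST */

greedy_matches = greedy.findall(TEXT)
lazy_matches = lazy.findall(TEXT)

print(greedy_matches)  # one span that swallows code_a as well
print(lazy_matches)    # each comment on its own
```

With the greedy form, everything between the first /* and the last */ is treated as a single comment, so real code in between would be hidden along with it.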
Missing some alternations | between the groups? So I inserted them in the pattern.
.* is very greedy and might be OK for multiline mode because it will stop at the end of a line. Consider .*? for single line mode to get the smallest match, else .* will try to match not the first literal instance of */ but the very last instance of */, which could be at the end of the document. Nothing special about & to escape.
-
@dinkumoil ,
I will try that and see if that is the case. I usually don’t upgrade right away, and since I was concerned about another issue in 8.4.9, I hadn’t installed my UDL package into 8.4.9, nor updated my working version to it.
-
@mpheath ,
Just for completeness, the changes you suggested to commentExpr were originally how the code was done, but the above is the result after making different attempts at trying to get it to work. The above, with the exception of the multi-line issue, works the way I have it, and yes, when I was using Mark to try and work out the regex aspect, it did show greedy selection, which I, maybe erroneously, was attributing to the multi-line aspect actually not working and selecting all text within it to process.
Edit: Also, after checking, I want to point out that in the c.xml functionList panel file, those commentExpr options are not OR’d out either, which is why I removed the OR’s in mine.
-
Edit: You are right, however, it is a mixed parser language.
Unfortunately, it appears to be ineffective in this version of 8.4.9 as well. I’m not sure if the fix was put in the codebase but not included in the last 8.4.9 portable, I have on my machine, per this Debug info:
Notepad++ v8.4.9 (64-bit)
Build time : Jan 27 2023 - 03:11:16
Path : C:\Users\camilee\Documents\Development Tools Downloads\Notepad++ Versions\npp.8.4.9.portable.x64\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : ON
Cloud Config : OFF
OS Name : Windows 10 Home (64-bit)
OS Version : 22H2
OS Build : 19045.2486
Current ANSI codepage : 1252
Plugins : mimeTools (2.9) NppConverter (4.5) NppExport (0.4)
Presuming this version is the latest release version, the following screenshots show the difference I’m referring to between the Orig named file, which was unchanged, and the New file, which had single line comments instead, under the same test files. The endclass in the comment is closing the class at the top of the editor window, and the parser is grabbing the next commented text, which is Class constructor, which is in comments and is showing up in the functionList panel. The New named file shows the proper Class name in the functionList panel by using single line comments to comment out the lines the multi-line comment is not ignoring.
Original file:
New file:
ccs_Object is the real class name in the New file. constructor is the commented out name that is being used in the Orig file.
-
Just downloaded the latest from the website and checked the build date, so apparently the version I used to take these shots, is the current 8.4.9 build. So the issue still does exist then. I’ll probably be submitting an issue then.
-
@Lycan-Thrope said in functionList not ignoring comments:
@mpheath ,
…
Edit: Also, after checking, I want to point out that in the c.xml functionList panel file, those commentExpr options are not OR’d out either, which is why I removed the OR’s in mine.
Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match, I would expect, though @dinkumoil has more experience with Function List patterns so might be able to confirm.
I cannot test bits and pieces so am at a disadvantage to be 100% sure of my advice. I do have a custom Notepad++ build that outputs the pairs and a Python 3 script that uses the pairs to make a hta (html) file showing the matched zones. This might be useful to show the matching pairs visually with info popups like for example.
… when I was using Mark to try and work out the regex aspect …
The searching is done with zone pairs. A discussion about the function unit parser behaviour, though the (Class/Method/Function) mixed parser uses zones a bit differently. It is not exactly like the workings of the Find and Replace dialog, so take note that the latter F&R use is a guide, not a 100% certainty.
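To make the idea of comment zones a bit more concrete, here is a rough Python model. It is not how Notepad++ implements it (that logic lives in its C++ source and may differ in detail). The helper names comment_zones and mask_zones are made up for this sketch, the class pattern is a simplified stand-in for the real one, and the sample layout is assumed. The sketch collects the (start, end) pairs matched by the comment pattern, blanks them out, and only then searches for the class range.

```python
import re

# The sample from the thread, with the line layout assumed.
SAMPLE = """\
class ccs_Object(vInstanceId) of Timer() custom
/*
class newClass(vInstanceId) of ccs_Object(vInstanceId)
... class construct ...
... class methods ...
endclass
*/
/*
=============================================================================
Class constructor
=============================================================================
*/
function IamLoaded()
return
endclass
"""

# Comment pattern with the groups OR'd and a lazy block-comment group.
COMMENT = re.compile(r"(?s:/\*.*?\*/)|(?m-s://.*$)|(?m-s:&&.*$)")

# Simplified stand-in for the class range pattern (an assumption).
CLASS_RANGE = re.compile(
    r"^[ \t]*class[ \t]+(\w+).*?^[ \t]*endclass",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)


def comment_zones(text):
    """Return the (start, end) offset pair of every comment match."""
    return [m.span() for m in COMMENT.finditer(text)]


def mask_zones(text, zones):
    """Blank out each zone but keep newlines, so line starts stay in place."""
    chars = list(text)
    for start, end in zones:
        for i in range(start, end):
            if chars[i] != "\n":
                chars[i] = " "
    return "".join(chars)


zones = comment_zones(SAMPLE)
masked = mask_zones(SAMPLE, zones)
names = [m.group(1) for m in CLASS_RANGE.finditer(masked)]

print(len(zones), names)
```

Once the two block comments are treated as zones, the only class found in this model is ccs_Object, closed by the real endclass at the end of the file.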
I’ll probably be submitting an issue then.
Where is the evidence of a bug? Where are the files to reproduce the issue? Need something to focus on to find a bug. Images are nice to look at though we cannot run the function list parser on images. Suggest to get your stuff sorted here and now and exhaust all options before creating an issue.
-
@mpheath said in functionList not ignoring comments:
Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match that I would expect
That’s exactly the answer I would have given. Regular expressions for matching different types of comments have to be OR’d, otherwise they would not work.
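A small Python sketch of why the groups have to be OR’d, under the assumption that Python's re behaves like the Notepad++ engine for these simple constructs. TEXT is a made-up snippet. Without the | characters the three groups form one sequence, so a match would need a block comment immediately followed by a // comment and then a && comment.

```python
import re

# Hypothetical snippet containing one comment of each kind.
TEXT = """\
/* block comment */
x = 1 // line comment
y = 2 && dBASE-style line comment
"""

# Groups written one after another: treated as ONE pattern in sequence.
SEQUENCE = re.compile(r"""(?x)
    (?s:/\*.*?\*/)
    (?m-s://.*?$)
    (?m-s:&&.*?$)
""")

# Groups separated by | : any ONE of them is enough for a match.
ALTERNATION = re.compile(r"""(?x)
    (?s:/\*.*?\*/)
  | (?m-s://.*?$)
  | (?m-s:&&.*?$)
""")

sequence_match = SEQUENCE.search(TEXT)
alternation_matches = ALTERNATION.findall(TEXT)

print(sequence_match)        # None: no comment is recognised at all
print(alternation_matches)   # all three comments, one match each
```

When the sequence form finds nothing, no comment zones exist, and a keyword such as endclass inside a comment is treated like live code.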
@mpheath said in functionList not ignoring comments:
Where are the files to reproduce the issue? Need something to focus on to find a bug. Images are nice to look at though we cannot run the function list parser on images. Suggest to get your stuff sorted here and now and exhaust all options before creating an issue.
Thumbs up, nothing more to say.
-
@mpheath said in functionList not ignoring comments:
Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match that I would expect, though @dinkumoil has more experience with Function List patterns so might be able to confirm.
Thanks, I was wondering why I couldn’t reproduce one of the original problems again. You were right, the c.xml file commentExpr groups were OR’d, but the | characters were aligned with the indentation indicators, which is why I missed them. As you’ll see from this image, it has removed even the faux class name from the list of the Original file; constructor isn’t there anymore. Nothing is there. Removing the OR showed the issue of the missing class and why it was missing.
I will be preparing these things. The images are just so I can show what I’m seeing, to maybe jog anyone’s memory, other than just @dinkumoil’s, if they’ve seen this behavior, and to show I’m not imagining it.
That’s why I’m posting it here first, to discuss it and to iron out any misgivings, like this one, on my end.
-
@dinkumoil ,
Here’s the code that is being used for the UDL and the functionList parser. The following sample code is modified from the above with the addition of a function outside of the class, to show the mixed parser functionality working.
Sample:
class ccs_Object(vInstanceId) of Timer() custom
/*
class newClass(vInstanceId) of ccs_Object(vInstanceId)
... class construct ...
... class methods ...
endclass
*/
/*
=============================================================================
Class constructor
=============================================================================
*/
function IamLoaded()
return
endclass
function outsideClassTest
junk = nothing
return
testd.xml - UDL
<NotepadPlus> <UserLang name="testd" ext="wfm cfm cdm rep crp prg cc mnu sfm dmd lab" udlVersion="2.1"> <Settings> <Global caseIgnored="yes" allowFoldOfComments="no" foldCompact="no" forcePureLC="2" decimalSeparator="0" /> <Prefix Keywords1="no" Keywords2="no" Keywords3="no" Keywords4="no" Keywords5="no" Keywords6="no" Keywords7="no" Keywords8="no" /> </Settings> <KeywordLists> <Keywords name="Comments">00* 01 02 03/* 04*/</Keywords> <Keywords name="Numbers, prefix1"></Keywords> <Keywords name="Numbers, prefix2">0x</Keywords> <Keywords name="Numbers, extras1">A B C D E F a b c d e f</Keywords> <Keywords name="Numbers, extras2"></Keywords> <Keywords name="Numbers, suffix1">B b O o</Keywords> <Keywords name="Numbers, suffix2"></Keywords> <Keywords name="Numbers, range"></Keywords> <Keywords name="Operators1">= ; | := += -= *= /= %= == <> # > < >= <= ** ++ -- -> & + - * $ % ^ / ,</Keywords> <Keywords name="Operators2"></Keywords> <Keywords name="Folders in code1, open"></Keywords> <Keywords name="Folders in code1, middle"></Keywords> <Keywords name="Folders in code1, close"></Keywords> <Keywords name="Folders in code2, open">"do case" "do while" #if class do do do if if for for function try printjob with</Keywords> <Keywords name="Folders in code2, middle">#elseif else elseif</Keywords> <Keywords name="Folders in code2, close">endcase until #endif endclass enddo while with endif next endfor return endtry endprintjob endwith</Keywords> <Keywords name="Folders in comment, open"></Keywords> <Keywords name="Folders in comment, middle"></Keywords> <Keywords name="Folders in comment, close"></Keywords> <Keywords name="Keywords1">'close procedure' 'set procedure to' additive form local parameter parameters persistent procedure</Keywords> <Keywords name="Keywords2">case catch exit finally loop otherwise throw</Keywords> <Keywords name="Keywords3">AND NEW NOT OR of</Keywords> <Keywords name="Keywords4">'.and.' '.f.' '.not.' '.or.' 
'.t.'</Keywords> <Keywords name="Keywords5"></Keywords> <Keywords name="Keywords6">false true</Keywords> <Keywords name="Keywords7"></Keywords> <Keywords name="Keywords8"></Keywords> <Keywords name="Delimiters">00[ 01 02] 03( 04 05) 06{ 07 08} 09// 09&& 10 11((EOL)) 11((EOL)) 12" 13 14" 15 16 17 18 19 20 21 22 23</Keywords> </KeywordLists> <Styles> <WordsStyle name="DEFAULT" fgColor="FFFF00" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="COMMENTS" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" /> <WordsStyle name="LINE COMMENTS" fgColor="00FF00" bgColor="808080" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" /> <WordsStyle name="NUMBERS" fgColor="00FFFF" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS1" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS3" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS4" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS5" fgColor="FFFFFF" bgColor="000000" fontStyle="0" nesting="0" /> <WordsStyle name="KEYWORDS6" fgColor="00FFFF" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="KEYWORDS7" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" /> <WordsStyle name="KEYWORDS8" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" /> <WordsStyle name="OPERATORS" 
fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="FOLDER IN CODE1" fgColor="000000" bgColor="FFFFFF" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="FOLDER IN CODE2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" /> <WordsStyle name="FOLDER IN COMMENT" fgColor="FFFFFF" bgColor="000000" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="DELIMITERS1" fgColor="00FFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="33554432" /> <WordsStyle name="DELIMITERS2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="83931154" /> <WordsStyle name="DELIMITERS3" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="83886087" /> <WordsStyle name="DELIMITERS4" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" /> <WordsStyle name="DELIMITERS5" fgColor="00FFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" /> <WordsStyle name="DELIMITERS6" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" /> <WordsStyle name="DELIMITERS7" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" /> <WordsStyle name="DELIMITERS8" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" /> </Styles> </UserLang> </NotepadPlus>
testd.xml - functionList
<?xml version="1.0" encoding="UTF-8" ?> <!-- ==========================================================================\ | | To learn how to make your own language parser, please check the following | link: | https://npp-user-manual.org/docs/function-list/ | \=========================================================================== --> <!--Much of this regex is courtesy of Peter Jones of the Notepad++ Community, with some changes made for customization by Lee Grant--> <NotepadPlus> <functionList> <!-- ========================================================= [ testd ] --> <parser displayName="testd" id ="testd" commentExpr="(?x) (?s:/\*.*?\*/) |(?m-s://.*?$) |(?m-s:&&.*?$) " > <classRange mainExpr="(?xi) # Free-spacing mode and inline comments + search sensitive to case ^\h* # Optional leading whitespace chars class # 'class' keyword \h? # Optional whitepace char \w+ # Class name \h? # Following the class name there is the option of parameters, and if so the first entry inside the parens is required, whether there is other # parameters or not, once the parens go up, the first is required. ie: class FrameCtrl(frameObj) (?: # Beginning of the optional parameter(s) part ( Group 1 ) \( # Opening parenthesis (\h*\w+\h*)? # First and required parameter ( ,? \h* \w+\h*)* # Following optional/additional parameters \) # Closing parenthesis )? # End of the optional parameter(s) part # For the rest of the class declaration, after the class name, all other options are part of one big optional set, that follows 'of' \h? # and can be populated by one of several options. (?: # Beginning of the main optional part, in a non-capturing group # The first and most prevalent is the Superclass name that the class is being subclassed from, and it's options of parameters and again, # if it has parameters, at least the first one is required ie.: class ToolButtonFx(oParent) of Toolbutton(oParent). of # Optional 'of' keyword, surrounded by 1 horizontal whitespace char \h? 
\w+ # Superclass name \h? (?: # Beginning of the optional parameter(s) part ( Group 2 ) \( # Opening parenthesis (\h?\w+?\h?)? # First and required parameter ( ,? \h* \w+\h*)* # Following optional/additional parameters \) # Closing parenthesis )? # End of the optional parameter(s) part # The next possible option is that it is a custom object and needs to be in this line so if the object or form is opened up in the dBASE IDE, # the designers in it won't mess up the object by streaming out missing parts or overriding properties or objects and functions. #\h* #( custom )? # Optional 'custom' keyword # The next possible option is that the class is being subclassed from another object that is contained elsewhere and the compiler needs to know # this reference. There are two options for pointing to the file. The first is an Alias path in the IDE that can be accessed by the compiler # in the environment, or second, it is in the current directory and only the name is needed...or it has a path that can be listed here, # but this is bad practice, and an Alias is recommended if the file is in a place other than the current directory. If it is, the name can be # used in quotes as a string that gets passed to the compiler. Both follow the word 'From'. The Alias directory is a name that is enclosed # in two colons, one immediately before the Alias name and one immediately after, no spaces. \h* (?: # Beginning of the optional part, in a non-capturing group from # Optional 'from' keyword, surrounded by 1 horizontal whitespace char \h* (?: # Beginning of a non-capturing group : \w+ : \w+ \. \w+ # First pointing file case | # OR \x22 \w+ \. \w+ \x22 # Second pointing file case ) # End of a non-capturing group )? # End of the optional part \h* (?: custom )? # Optional 'custom' keyword )? 
# End of the main optional part $ # End of current line and end of the class declaration (?si:.*?^\h*endclass) # must match all the way to 'endclass' " > <className> <nameExpr expr="(?xi) # Free-spacing mode and inline comments and search sensible to case \h* # Optional leading whitespace chars class # 'class' keyword \h? # Optional whitepace char \K\w+ # Pure class name " /> </className> <function mainExpr="(?xi-s) ^ \h* (?: function \h+ \w+ | procedure \h+ \w+ | with \h+ \(.*?\) ) \h* " > <functionName> <funcNameExpr expr="(?xi-s) # multiline/comments ^ # trying to keep following keywords from being included in comments \h* # allow leading spaces (?: function # must have word 'function' as first word \h+ # must have at least one horizontal space after function # \K don't keep 'function' in the name of the function in the panel \w+ # the name of the function is the first whole word after 'function' | procedure # must have word 'procedure' as first word \h+ # must have at least one horizontal space after procedure # \K don't keep 'procedure' in the name of the function in the panel (?!to\b)\w+ # the name of the function is the first whole word after 'procedure' - 'to' # so as to exclude any 'set procedure to' statements, needs work though. | with \h+ \K \( \Kthis\.\K(.+)(?=\)) # all but 'this' and the closing parens. | with \h+ \K \( \K(.*?)(?=\)) ) " /> </functionName> </function> </classRange> <function mainExpr="(?xi-s) ^ \h* (?: function \h+ \w+ | procedure \h+ \w+ ) \h* " > <functionName> <nameExpr expr="(?xi-s) # multiline/comments \h* # allow leading spaces (?: function # must have word 'function' as first word \h+ # must have at least one horizontal space after function #\K # don't keep 'function' in the name of the function in the panel \w+ # the name of the function is the first whole word after 'function' | procedure \h+ #\K (?!to\b)\w+ ) " /> </functionName> </function> </parser> </functionList> </NotepadPlus>
As noted above, I have re-OR’d the commentExpr regex in the above functionList parser testd.xml file.
Remove the OR’s in it to reproduce the class constructor, instead of class ccs_Object, showing that the endclass in comments is being read by the parser and used to close off the class ccs_Object with no function inside it.
Alternatively, use single line comments // instead of the block comments /* */ to show the proper class ccs_Object named.
-
@Lycan-Thrope
And here’s the overrideMap.xml that the other message wouldn’t let me put in, since the previous post was so long.

<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|
|   To learn how to make your own language parser, please check the following
|   link:
|       https://npp-user-manual.org/docs/function-list/
|
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <associationMap>
        <!--
            This file is optional (can be removed).
            Each functionlist parse rule links to a language ID ("langID").
            The "id" is the parse rule's default file name, but users can override it.

            Here are the default value they are using:

            <association id= "php.xml" langID= "1" />
            <association id= "c.xml" langID= "2" />
            <association id= "cpp.xml" langID= "3" /> (C++)
            <association id= "cs.xml" langID= "4" /> (C#)
            <association id= "objc.xml" langID= "5" /> (Obective-C)
            <association id= "java.xml" langID= "6" />
            <association id= "rc.xml" langID= "7" /> (Windows Resource file)
            <association id= "html.xml" langID= "8" />
            <association id= "xml.xml" langID= "9" />
            <association id= "makefile.xml" langID= "10"/>
            <association id= "pascal.xml" langID= "11"/>
            <association id= "batch.xml" langID= "12"/>
            <association id= "ini.xml" langID= "13"/>
            <association id= "asp.xml" langID= "16"/>
            <association id= "sql.xml" langID= "17"/>
            <association id= "vb.xml" langID= "18"/>
            <association id= "css.xml" langID= "20"/>
            <association id= "perl.xml" langID= "21"/>
            <association id= "python.xml" langID= "22"/>
            <association id= "lua.xml" langID= "23"/>
            <association id= "tex.xml" langID= "24"/> (TeX)
            <association id= "fortran.xml" langID= "25"/>
            <association id= "bash.xml" langID= "26"/>
            <association id= "actionscript.xml" langID= "27"/>
            <association id= "nsis.xml" langID= "28"/>
            <association id= "tcl.xml" langID= "29"/>
            <association id= "lisp.xml" langID= "30"/>
            <association id= "scheme.xml" langID= "31"/>
            <association id= "asm.xml" langID= "32"/> (Assembly)
            <association id= "diff.xml" langID= "33"/>
            <association id= "props.xml" langID= "34"/>
            <association id= "postscript.xml" langID= "35"/>
            <association id= "ruby.xml" langID= "36"/>
            <association id= "smalltalk.xml" langID= "37"/>
            <association id= "vhdl.xml" langID= "38"/>
            <association id= "kix.xml" langID= "39"/> (KiXtart)
            <association id= "autoit.xml" langID= "40"/>
            <association id= "caml.xml" langID= "41"/>
            <association id= "ada.xml" langID= "42"/>
            <association id= "verilog.xml" langID= "43"/>
            <association id= "matlab.xml" langID= "44"/>
            <association id= "haskell.xml" langID= "45"/>
            <association id= "inno.xml" langID= "46"/> (Inno Setup)
            <association id= "cmake.xml" langID= "48"/>
            <association id= "yaml.xml" langID= "49"/>
            <association id= "cobol.xml" langID= "50"/>
            <association id= "gui4cli.xml" langID= "51"/>
            <association id= "d.xml" langID= "52"/>
            <association id= "powershell.xml" langID= "53"/>
            <association id= "r.xml" langID= "54"/>
            <association id= "jsp.xml" langID= "55"/>
            <association id= "coffeescript.xml" langID= "56"/>
            <association id= "json.xml" langID= "57"/>
            <association id= "javascript.js.xml" langID= "58"/>
            <association id= "fortran77.xml" langID= "59"/>
            <association id= "baanc.xml" langID= "60"/> (BaanC)
            <association id= "srec.xml" langID= "61"/> (Motorola S-Record binary data)
            <association id= "ihex.xml" langID= "62"/> (Intel HEX binary data)
            <association id= "tehex.xml" langID= "63"/> (Tektronix extended HEX binary data)
            <association id= "swift.xml" langID= "64"/>
            <association id= "asn1.xml" langID= "65"/> (Abstract Syntax Notation One)
            <association id= "avs.xml" langID= "66"/> (AviSynth)
            <association id= "blitzbasic.xml" langID= "67"/> (BlitzBasic)
            <association id= "purebasic.xml" langID= "68"/>
            <association id= "freebasic.xml" langID= "69"/>
            <association id= "csound.xml" langID= "70"/>
            <association id= "erlang.xml" langID= "71"/>
            <association id= "escript.xml" langID= "72"/>
            <association id= "forth.xml" langID= "73"/>
            <association id= "latex.xml" langID= "74"/>
            <association id= "mmixal.xml" langID= "75"/>
            <association id= "nimrod.xml" langID= "76"/>
            <association id= "nncrontab.xml" langID= "77"/> (extended crontab)
            <association id= "oscript.xml" langID= "78"/>
            <association id= "rebol.xml" langID= "79"/>
            <association id= "registry.xml" langID= "80"/>
            <association id= "rust.xml" langID= "81"/>
            <association id= "spice.xml" langID= "82"/>
            <association id= "txt2tags.xml" langID= "83"/>
            <association id= "visualprolog.xml" langID= "84"/>
            <association id= "typescript.xml" langID= "85"/>

            If you create your own parse rule of supported languages (above) for your specific need,
            you can copy it without modifying the original one, and make it point to your rule.

            For example, you have created your php parse rule, named "myphp2.xml".
            You add the rule file into the functionlist directory and add the following line in this file:
            <association id= "myphp2.xml" langID= "1" />
            and that's it.
        -->

        <!--
            As there is currently only one langID for COBOL:
            uncomment the following line to change to cobol-free.xml (cobol section free)
            if this is your favourite format
        -->
        <!-- <association id= "cobol-free.xml" langID= "50"/> -->

        <!--
            For User Defined Languages (UDL's) use:

            <association id="my_parser.xml" userDefinedLangName="My UDL Name" />

            If "My UDL Name" doesn't exist yet, you can create it via UDL dialog.
        -->

        <!-- ==================== User Defined Languages ============================ -->
            <association id= "testd.xml" userDefinedLangName="testd"/>
            <association id= "krl.xml" userDefinedLangName="KRL"/>
            <association id= "sinumerik.xml" userDefinedLangName="Sinumerik"/>
            <association id= "universe_basic.xml" userDefinedLangName="UniVerse BASIC"/>
        <!-- ======================================================================== -->
        </associationMap>
    </functionList>
</NotepadPlus>
-
@Lycan-Thrope ,
The results of these files can again be viewed with the versions of the test file above: one with block comments, one with single-line comments, and one with single-line comments and the class expanded to show the internal function/method.
Block Comments:
Single Line Comments:
And this one shows the class opened up to see the internal function/method:
Waiting to hear your feedback.
-
@mpheath ,
Incidentally, you’ll see I did put the original code back in with regard to your points about the `?` in the block comment search. Like I said, I did have them that way originally, but I was trying different things, and one of them is what exposed that the comment handling wasn’t ignoring the `endclass` in comments. It may be the keyword that closes the class definition, but as it is inside comments, it should not be paid any attention. That’s not happening for some reason. In addition, if I take that `?` out, the next comment, which has `Class constructor` in it, gets chosen as a functionList entry.
-
Thanks for the files. This is the output of the zones…
commentZone: 51, 209
commentZone: 213, 409
classZone: 0, 203
classZone: 301, 460
funcZone: 462, 487
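To make the zone output above easier to read, here is a small sketch that checks how the class zones and comment zones relate to each other. The positions are copied from the output; the helper names are made up for illustration, and whether the end positions are inclusive or exclusive is an assumption (it does not change the result for these values).

```python
# Zone positions copied from the debug output, as (start, end) pairs.
comment_zones = [(51, 209), (213, 409)]
class_zones = [(0, 203), (301, 460)]


def overlaps(a, b):
    """True when two (start, end) ranges share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def starts_inside(zone, other):
    """True when the start of `zone` falls inside `other`."""
    return other[0] <= zone[0] < other[1]


def ends_inside(zone, other):
    """True when the end of `zone` falls inside `other`."""
    return other[0] < zone[1] <= other[1]


for cls in class_zones:
    for com in comment_zones:
        if not overlaps(cls, com):
            continue
        where = []
        if starts_inside(cls, com):
            where.append("starts inside")
        if ends_inside(cls, com):
            where.append("ends inside")
        print("classZone", cls, "/ commentZone", com, ":", ", ".join(where))
```

Read this way, the first class zone ends inside the first comment zone, and the second class zone starts inside the second comment zone, which matches the behaviour described in this thread.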
This is the visual (disregard the green at the end, as it is missing a closing span tag). The Class match starts at the beginning of the document (pos `0`) and ends at the `endclass` (pos `203`) inside the comment zone. Code within comments is invalid as a match. This class match is not within a comment zone, so it is a valid match as far as the parser is concerned. Making it invalid because it runs partly into a comment zone would IMO lose the Class from the functionlist panel, which would be a worse result.
Looking at the `pascal.xml` function list file, I do not see patterns trying to locate the `end` keyword of a Class, Function or Procedure. So in `testd.xml`, is it necessary to match to `endclass` to acknowledge a Class definition?
Class definition:
class ccs_Object(vInstanceId) of Timer() custom
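As a rough illustration of the question, here is what a header-only match could look like. The regex is a hypothetical sketch written for Python's `re` module, not the pattern from `testd.xml` or `pascal.xml`, and the class body in the sample is invented; only the class definition line comes from this thread.

```python
import re

# Hypothetical header-only pattern: it captures the class name and stops
# there, so it never has to look for endclass.
CLASS_HEADER = re.compile(r"^[ \t]*class[ \t]+(\w+)", re.IGNORECASE | re.MULTILINE)

# The first line is the class definition quoted in the thread; the body
# below it is made up for this example.
sample = (
    "class ccs_Object(vInstanceId) of Timer() custom\n"
    "   function onTimer()\n"
    "   return\n"
    "endclass\n"
)

names = [m.group(1) for m in CLASS_HEADER.finditer(sample)]
print(names)
```

A match this small cannot run into an `endclass` inside a comment. It does have its own limit: a comment line that itself begins with the word `Class` would still match, so comment handling remains a separate problem.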
I do not know the fine details of recognizing class functions from ordinary functions with the functionlist patterns so @dinkumoil may be able to help with that.
Greediness in the patterns seems to be a problem with the functionlist parsing, so I suggest avoiding greedy patterns. The larger the match, the more vulnerable it is to running into an issue like this one. Try to aim for the smallest match that achieves a successful result. I consider that @dinkumoil has achieved that with `pascal.xml`.
-
@mpheath ,
What am I looking at there? Is that the character/caret positions?
To answer your question, yes, we needed to have the endclass found to close the class. That regex was giving me trouble, as you point out, trying to find only the least amount. @PeterJones , @guy038 , and a few other users here helped formulate the needed regex in that functionList parser file. We needed that to make the regex run the entire length down to the endclass so we could find and close the Class definition.
The dBASE Plus OOP isn’t like C++, where typically, one file = one class. We can have multiple classes defined per file, and thanks to @PeterJones , we also were able to have our objects display like functions inside a Class definition, because they are basically UI objects inside a Class…they technically are other class objects being defined inside a class. So yes, it needs to find the endclass to close one class, and find another.
We can’t use the OpenSymbole/CloseSymbole delimiters like C++ can. We tried putting the terms class/endclass inside those, but I suspect they are only meant for single characters as we never were able to get them to be used for identifying the open/close it needed to frame the Class definitions.
But, again…the endclass should be ignored, inside comments. Period. :)
Edit: Yes, they are character/caret positions, so I see and you see what I’m saying. :) The numbers are off, I suspect for some kind of whitespace offset or newline offset. I finally realized you had a link for me to follow to see that graphic. Yes…how did you produce that?
-
Edit: Yes, they are character/caret positions, so I see and you see what I’m saying. :) The numbers are off, I suspect for some kind of whitespace offset or newline offset. I finally realized you had a link for me to follow to see that graphic. Yes…how did you produce that?
They are the start and end positions of each match. I consider the start pos to be off by 1. They are written to stdout by some debugging code; a Python script captures that output and processes it into HTML. It gives a reasonable visual of the matches, though not a perfect one: the unclosed span tag is probably due to overlapping zones.
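For anyone wanting a similar visual, here is one way such a rendering could be done so that overlapping zones never produce an unclosed tag. This is not the script used for the graphic above; the function name, the tuple layout, and the exclusive end positions are all assumptions made for the sketch.

```python
import html


def render_zones(text, zones):
    """Render `text` as HTML with one span per segment.

    `zones` is a list of (label, start, end) tuples, end exclusive (assumed).
    The text is cut at every zone boundary, so each segment is wrapped in a
    single, self-closed span even when zones overlap.
    """
    cuts = {0, len(text)}
    for _, start, end in zones:
        cuts.update(p for p in (start, end) if 0 <= p <= len(text))
    cuts = sorted(cuts)

    out = []
    for a, b in zip(cuts, cuts[1:]):
        labels = [label for label, start, end in zones if start <= a and b <= end]
        chunk = html.escape(text[a:b])
        if labels:
            out.append('<span class="{}">{}</span>'.format(" ".join(labels), chunk))
        else:
            out.append(chunk)
    return "".join(out)


demo = render_zones("abcdefghij", [("classZone", 0, 6), ("commentZone", 4, 8)])
print(demo)
```

The overlapping part of the two zones ends up in one span carrying both class names, which a stylesheet can colour differently from either zone alone.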
But, again…the endclass should be ignored, inside comments. Period. :)
You can state that, though the reality is that you may need to work with the parser as it is. I could state that it is the pattern that needs fixing, as it tries to handle everything from `class` to `endclass`. Demands have no relevance without reason.
To answer your question, yes, we needed to have the endclass found to close the class.
Not sure why. The pattern searches to the `endclass`, and yet the pattern does not properly handle code in comments. It seems like an issue with the pattern, as it wants to do this rather large match. To blame the parser for the greedy pattern seems unreasonable.
The subject is not the C++ functionlist, so that is opening a new argument. I mentioned `pascal.xml`; Pascal uses `begin` and `end`, though as I mentioned, the Pascal patterns do not look for the `end` keyword.
The way I see it is that the patterns need to work with the parser. The parser will probably not be changed to make up for patterns that fail to work due to being large. The maintainer IMO will not risk all the functionlist patterns in use to make you or your developer happy, even if it were possible. There is a bash functionlist file issue at least 2 years old, and risking the other functionlist files could be a bad option. I suggest you fix the pattern.
-
@mpheath said in functionList not ignoring comments:
You can state that, though the reality is that you may need to work with the parser as it is. I could state that it is the pattern that needs fixing, as it tries to handle everything from class to endclass. Demands have no relevance without reason.
I will have to respectfully disagree.
The pattern works. It has been working. It continues to work.
The problem is the parser, and in particular, the commentZones code, in this instance.
If developers don’t comment their code, the parser is safe, but that’s not realistic either. The simple fact is, the parser doesn’t recognize comments inside a class. It recognizes them before or after, but not inside, particularly when those comments contain actual keywords. Perhaps you could say it’s the Zone that is the problem, as it doesn’t recognize when something (a class in this instance) has not ended but has comments starting inside it. Multi-line comments, in this case, and it is the parser’s weakness that it doesn’t realize it’s still inside a class. That’s why the classRange element has these:
openSymbole ="\{" closeSymbole="\}"
And the parser makes calls like this:
bool FunctionParsersManager::getZonePaserParameters(TiXmlNode *classRangeParser, generic_string &mainExprStr, generic_string &openSymboleStr, generic_string &closeSymboleStr, std::vector<generic_string> &classNameExprArray, generic_string &functionExprStr, std::vector<generic_string> &functionNameExprArray)
It can’t possibly parse a zone, if it doesn’t know the beginning and end of it.
It’s because the parser doesn’t allow keywords to be used in the open and close `Symbole` that having to find `endclass` became necessary.
Another way to work around this is not to put block comments inside a class, because the parser doesn’t work like a real parser. It’s a quasi-parser. That kind of encourages not commenting code, which seems like encouraging a bad habit that is already rampant. Either way, you can’t reasonably expect the parser to work properly if it doesn’t allow the parserLanguage.xml to define its open and close pattern, and that’s what this parser is doing, and so this pattern was improvised to make the parser function.
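To illustrate what an open/close pair buys you, here is a generic balanced-delimiter scan in Python. It is only a sketch of the idea and makes no claim about how Notepad++ implements `openSymbole` and `closeSymbole`; the function name and the choice to accept regex patterns (so that keywords work as well as single characters) are assumptions for the example.

```python
import re


def find_zone_end(text, start, open_pat, close_pat):
    """Return the offset just past the close that balances the first open
    found at or after `start`, or -1 if the zone never closes."""
    token = re.compile("(?P<open>{})|(?P<close>{})".format(open_pat, close_pat))
    depth = 0
    for m in token.finditer(text, start):
        if m.group("open") is not None:
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                return m.end()
    return -1


# Single-character symbols, as in the classRange attributes quoted above.
braces = "class A { if (x) { y } } class B { }"
brace_end = find_zone_end(braces, 0, r"\{", r"\}")
print(braces[braces.index("{"):brace_end])

# Keyword symbols, which is what class/endclass would need.
keywords = "class A\n x\nendclass\nclass B\nendclass\n"
keyword_end = find_zone_end(keywords, 0, r"\bclass\b", r"\bendclass\b")
print(keywords[:keyword_end])
```

With a scan like this the zone boundaries come from counting opens and closes rather than from one long regex match, though comments containing the keywords would still need to be skipped or masked first.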
So please, don’t disparage the pattern. I’d like to think some really talented people created it, sans myself, right @PeterJones ? :)