functionList not ignoring comments

mpheath

@Lycan-Thrope Try changing line 91 in testd.xml

						  (?si:.*?^\h*endclass)           #  must match all the way to 'endclass'

to

						  (?si:.*?^\h*(?!//\h*)endclass(?!\s*\*/))           #  must match all the way to 'endclass'

if endclass is preceded by // or followed by */ then it is probably commented.

commentZone: 51, 209
commentZone: 213, 409
funcZone: 411, 432
classZone: 0, 460
funcZone: 462, 487

Instead of 2 classZones as before, now there is only 1 classZone.

PeterJones

@Lycan-Thrope said in functionList not ignoring comments:

So please, don’t disparage the pattern. I’d like to think some really talented people created it, sans myself, right @PeterJones ? :)

But those talented people aren’t infallible (at least, I know I’m not), and might not have considered all edge cases.

You seem to be expecting the parser to be running two regex simultaneously – the comment regex and the class regex. Because you seem to expect that it can see the beginning of a class, then, while parsing the single class regex to grab the whole class from class to endclass, that it simultaneously sees a comment using another regex while the class-regex is still active, and can recognize that the endclass is commented out, to prevent the class-parsing regex from seeing it. Running two regex simultaneously would be a pretty awesome design if you could make it work… but I doubt that’s what was implemented.

Alternately, one could develop a system where the entire source file is parsed using the comment regex, and the comments are all deleted from the in-memory version of the file. Then that shrunken/de-commented in-memory file is run through the class parser, which then couldn’t see anything that used to be in the comments. If I were to have all the experience I have now with Notepad++'s Function List parser, and were to be writing a Function List-like parser from scratch, I think this is the direction I would go (if I could remember to do so). But based on my previous experience with FunctionList, and your descriptions above, that’s obviously not how Notepad++'s parser is written.

Thus, we’re left with the situation as implemented, where it seems (without my having studied N++'s FunctionList source code) that the comment regex is used to avoid starting a new class (or, I believe, standalone function), but that once it starts the class’s (or function’s) regex, it is up to the regex expression to make sure that comments inside that block are ignored while trying to determine the end of the block. (I think it’s not as much of a big deal for functions, because usually the expressions for those are looking for just the function name, not the whole function block). Hence, @mpheath has encouraged you to edit the regex in such a way as to ignore comments when looking for endclass, and even shown an example of how to do that.

Lycan Thrope

@mpheath said in functionList not ignoring comments:

(?si:.?^\h(?!//\h*)endclass(?!\s**/))

1.) Thank you. It works.
2.) I humbly, thank you and now, see what you were referring to with regard handling the comments inside the regex…

I was again trying last night/this morning before I retired, to try different things like you did, but obviously don’t have the chops that I thought I had. :)

@PeterJones , in his follow-up message almost points out how I thought the parser worked.

My impression, was that when it was inside the class zone, it would would read until the endclass and if a commmentZone was encountered, it would ignore reading anything inside, regardless what was in it until it found the end of the commentZone, and then would continue on with it’s previous Zone since it did not find it’s end. In that regard, it would be like a switch, with it’s recursion, like how a function in code works (It seemed like the logical explanation) where it would stop basic operation there until it finished the commentZone, and then return to it’s previous task. Maybe, that is the way it works, but the weak point was, we’re not able to use a keyword, only a character symbol, so I can see, now, why you meant it needed to deal with comments in the regex pattern itself. My thoughts were that was why we defind the comment regex prior to the pattern searching.

So I accept that this was, again, user error. Mine. :)

Thank you again.

Lycan Thrope

@PeterJones ,

Never expected you to be perfect, Peter, but you’re pretty close. :)

As in my reply to @mpheath , you’re mostly right about how I thought the parser worked. The exception was that I didn’t necessarily think it was running two regex simultaneously, but that it would know where the commentZone was, perhaps based on the commentZone character positions it uses, and turn off the reading/consumption mode of the class regex while inside a marked commentZone, basically ignore it until it came to the */ mark, and then continue reading the class.

I guess I was just confused about how I was supposed to alter the regex to deal with comments, as I wasn’t sure what he meant, since he seemed to be saying I needed to stop the pattern from seeking the endclass, which we couldn’t, because otherwise the class pattern would never be recognized. I know, I tried it last night, as that’s what I thought he was referring to, and all the functions inside and out showed up, but not the classes, so I knew that endclass keyword had to stay. I’ll have to chalk this misunderstanding to my lack of experience, which, thanks to his above solution, I can be a little wiser…not much…but a little. :)

Lycan Thrope

@mpheath ,
Just a futher follow up. After the initial fix worked for the example file presented, we have again found the issue rearing it’s head elsewhere, so while it temporarily fixed the issue, I feel the only real fix for this issue, will need to be the code allowing for keywords like class/endclass for the open and close demarcations that the classrange element expects for locating those zones.

I’m trying to get myself up to speed somewhat in C++ so I can at least read the code with a little more knowledge and see if I can’t figure out how to find and change what I think needs to be changed and I guess do a pull request and muck it up. :)

Thanks anyway, and I think this subject here will at least let people know if they find this issue in their functionList implementation of their language, if it has a mixed parser, this is a possible explanation for the problem.

Lycan Thrope

@mpheath ,

After going over a lot of stuff, including reexamining the FunctionList FAQ, I have to agree with you, that at present, I may need to work with the parser, as is. I must have missed that ‘embedded comments’ section, or glossed over it, or didn’t relate it to the kind of comments we do as the example seemed to be an example of excessive use of inline ‘block’ comment style for an inline comment. That may be what threw me as regards that example in the FAQ.

Moving the opening class declaration line after any block comments immediately following it, fixes the problem completely, with my recent testing. It appears, however, that as I discovered before, line comments inside a class taking up a whole line, or after code, inside a class/endclass block does indeed work as it should.

So I guess until I get up to speed with C++ and take a crack at a fix for the block comments giving problems inside a class/endclass code block, block comments will have to be avoided.

I do, however, feel that the simple fix for this, should be just allowing a longer string other than \{ and \} symbol characters would be a proper fix, since being able to use class and endclass in the open/close symbole elements, it would be part of the actual parser demarcation ability to know where a class starts and ends, rather than having to use regex to find all the way to the end like this regex had to do to be able to outline the structure of the class. For now, the crisis is averted until I can revisit this issue, hopefully with some C++ skills at a later date. :)