Community

    functionList not ignoring comments

    Help wanted
      Lycan Thrope
      last edited by Lycan Thrope

      Okay, this is going to get a little hairy. The developer and I have been beating our heads against the wall trying to figure out why two Class declarations weren’t showing up in the functionList while others in the file were. A tweak here and there in the regex got more of them to show up, but not one in particular.

      Eventually, I broke the file apart into separate files, and the regex worked on the code in those files. The developer did some work on his code and comments that he thought would help, and eventually we came to a joint conclusion: his comment changes helped make it work, and the comment-ignoring part of the functionList regex was not working properly. Why, I hope we can find out here, since, as I understand it, the functionList parser should not use anything inside a comment section to create or close a class/function pair.

      In this case, at one point the developer did indeed have a faux Class declaration inside comments, as a reference for prototyping his class and documenting its references. The apparent result of this clash is that the real class outside the comments was being closed by the endclass keyword of the example class inside the comments.

      Since the functionList parser didn’t find a function between the real Class declaration (outside the comments) and the endclass inside the comments, it apparently ignored the class.

      Consequently, when further comments were used to mark the following section as the Class constructor, functionList apparently read the “Class constructor” text in that comment header as the start of a new class called ‘constructor’. Reading on, it found the functions/methods that belonged to the original class, up to the original class’s final endclass, where it closed the range and displayed it in the functionList panel. Before a few regex tweaks and the author changing some comments, this class wasn’t displayed in the functionList at all, which is what prompted us to look for the problem. After some regex tweaks and his comment/code changes, the functionList finally showed what had happened.

      This screenshot shows what the functionList displays once the offending code was identified and put into a faux dBASE Plus .wfm file to demonstrate the problem:
      commenterrorissue.PNG

      Interestingly, the UDL colors everything in that code properly and knows what is a comment and what is functional code. Yet the functionList parser doesn’t.

      The following is the code, including the faux code inside the comments, whose interaction produced the above functionList result:

      class ccs_Object(vInstanceId) of Timer() custom
      
      /*
            class newClass(vInstanceId) of ccs_Object(vInstanceId)
      
               ... class construct ...
      
               ... class methods ...
      
            endclass
      
      */
      
      /*
         =============================================================================
      
         Class constructor
      
         =============================================================================
      */
         function IamLoaded()
            return
      
      endclass
      
      

      If you follow the above, you can see how the problem finally manifested itself once the search regex and the code itself were changed enough to expose the actual culprit.
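The mechanism described in this post (a faux endclass inside a comment closing the real class, and the “Class constructor” header opening a new range) can be modeled in a few lines of Python. This is a simplified sketch, not Notepad++’s C++ implementation: the CLASS_OPEN/CLASS_CLOSE patterns and the rule “the first endclass after an opener closes it” are assumptions for illustration, and Python’s re module only stands in for the regex engine Notepad++ uses.

```python
import re

# Sample trimmed from the thread: a faux class/endclass inside a block comment,
# then a "Class constructor" header comment, then the real method and endclass.
SAMPLE = """class ccs_Object(vInstanceId) of Timer() custom

/*
      class newClass(vInstanceId) of ccs_Object(vInstanceId)
      endclass
*/

/*
   Class constructor
*/
   function IamLoaded()
      return

endclass
"""

# Comment patterns OR'd together, with a lazy block-comment group.
COMMENT_EXPR = r"""(?x)
    (?s:/\*.*?\*/)
    |(?m-s://.*$)
    |(?m-s:&&.*$)
"""

# Hypothetical opener/closer patterns (assumptions, not the thread's parser).
# Python's re has no \h, so [ \t] stands in for horizontal whitespace.
CLASS_OPEN = re.compile(r"^[ \t]*class[ \t]+(\w+)", re.IGNORECASE | re.MULTILINE)
CLASS_CLOSE = re.compile(r"^[ \t]*endclass\b", re.IGNORECASE | re.MULTILINE)


def comment_zones(text):
    """(start, end) offsets of every comment matched by COMMENT_EXPR."""
    return [m.span() for m in re.finditer(COMMENT_EXPR, text)]


def in_zones(pos, zones):
    """True if pos falls inside any half-open (start, end) zone."""
    return any(start <= pos < end for start, end in zones)


def class_names(text, zones):
    """Pair each class opener with the next endclass, skipping tokens in zones."""
    tokens = [(m.start(), "open", m.group(1)) for m in CLASS_OPEN.finditer(text)]
    tokens += [(m.start(), "close", "") for m in CLASS_CLOSE.finditer(text)]
    tokens = sorted((t for t in tokens if not in_zones(t[0], zones)),
                    key=lambda t: t[0])
    names, current = [], None
    for _, kind, name in tokens:
        if kind == "open" and current is None:
            current = name            # a class range starts here
        elif kind == "close" and current is not None:
            names.append(current)     # the first endclass seen closes it
            current = None
    return names


print(class_names(SAMPLE, []))                     # ['ccs_Object', 'constructor']
print(class_names(SAMPLE, comment_zones(SAMPLE)))  # ['ccs_Object']
```

With no comment zones, the faux endclass closes ccs_Object early and “Class constructor” opens a range named constructor that runs to the real endclass, which matches the screenshot (the thread reports the panel then hid the first range because it held no function). Once comment zones are excluded, only ccs_Object remains.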

      Below is the commentExpr regex from the troubleshooting environment I created to duplicate the dBASE Plus UDL/functionList environment so we could work on this. Changing the characters below to their hex equivalents changed nothing, so they’re left as they were when the problem started.

      			commentExpr="(?x)
      							(?s:/\*.*\*/)
      							(?m-s://.*?$)
      							(?m-s:\&\&.*?$)
      							"
      

      I’m hoping someone has seen this problem before or knows what may have gone wrong, but the result is undeniable: commented-out code was included in the functionList parsing, and that produced this problem.

        dinkumoil @Lycan Thrope
        last edited by

        @Lycan-Thrope

        In Notepad++ versions 8.4.8 and 8.4.9 there were some changes to the code base that were related to function list parsers and function list tree generation. So, to be able to at least try to help you, it is important to know

        • the version of Notepad++ you were doing your tests with and
        • the complete function list parser you created.

        BTW: To clarify the context, who is “the developer” (I guess it’s not DonHo), and what did he develop?

          Lycan Thrope @dinkumoil
          last edited by Lycan Thrope

          @dinkumoil ,

          Thanks for the questions. This was a problem in 8.4.8, and the test environment I set up to work on this uses 8.4.6, since the problem also shows there. I could go back further, but chose not to for the moment. The developer mentioned is one of our advanced dBASE Plus developers, who has worked with the environment going back to the Borland days, has been instrumental in helping them with code fixes, and is a prolific contributor to our community code base, the dBASE Users’ Function Library Project (dUFLP).

          After playing with it and reading (as best I could) the NPP, Scintilla and Lexilla code, as well as the Scintillua (?) documentation, it became clear to me that there is a separate ‘tag’ for single-line comments and multi-line comments. When we finally realized what the problem might be, I experimented by commenting out the code inside the /* */ multi-line comment with // single-line comment characters, to see how that changed what the functionList panel showed after saving the file and repainting the panel.

          So, a little while ago, I made a column selection over all the code in the affected multi-line comment area, converted the entire block to single-line comments, and saved. The proper class name showed up in the functionList panel, as it was supposed to. These findings lead me to think there is a problem with recognizing the multi-line comment, either in NPP or in the underlying libraries NPP uses.

          Maybe the problem is just the functionList panel functionality in NPP; I can’t be sure. But at this point I know that using single-line comments instead of multi-line comments makes the panel work properly.

          So, for now, I seem to have found at least a temporary workaround for the problem. Following your pointer, I noticed that the GitHub issue notes list this (for JavaScript, I think) as having been fixed, and figured maybe something “broke” with the Scintilla/Lexilla change that NPP made recently.

          On the other hand, the developer who presented the problem writes some rather advanced code in our language, and I had to make some changes to the UDL as well as the regex to handle language capabilities his code used that I wasn’t aware of. An interesting side note: this particular code, which he was using to test the dBASE Plus NPP UDL suite I recently did, has occasionally caused problems in the development environment’s own editor, which, by the way, is an adaptation of the SciTE editor and uses the same libraries. So at this point, we’re looking at whether this might actually be a library problem rather than just an NPP problem.

          We’re looking into the issue, and I posted here to see if anyone else has had a problem of this type, where the comment handling doesn’t work; at this point it appears that multi-line comments are not being ignored by the functionList panel parser. The commentExpr in the functionList parser shown above is all that should be involved. As the screenshots show, the UDL color-codes the syntax properly. Only the functionList parsing seems to be the issue at the moment.

          It’s getting late and I need to get to sleep, since it’s almost 5:00 am here. If need be, I can post the complete set of files that make up this suite later. I figured that if no one here has seen the problem before, I will probably have to submit a detailed issue on GitHub, so I was waiting to see whether anyone reported similar issues here; so far, there seem to be none.

            dinkumoil @Lycan Thrope
            last edited by dinkumoil

            @Lycan-Thrope

            Maybe I should have mentioned that I had the same issue (function list not ignoring code commented-out in block comments) with my Pascal/Delphi function list parser that I released some weeks ago. The solution was a small change in the C++ code of Notepad++ provided in >> this commit << by @mpheath.

            That fixes the issue, at least for so-called mixed function list parsers, i.e. parsers that contain both a class parser and a function parser (see >> the Notepad++ user manual << for more info about the different types of function list parsers).

            So, if you use Notepad++ v8.4.9 and above and your function list parser is a mixed parser, chances are high that your issue is already fixed. If it is only a function parser, it is possible to convert it into a mixed parser by adding a dummy class parser.

            It may also be worth knowing that the whole function list feature is independent from Scintilla and has only a limited relationship to lexers/syntax highlighters in general. In order to use the function list for a certain language, a lexer and a function list parser for that language are required, but the only reason for that is that Notepad++ needs this relationship to activate the appropriate function list parser when a file containing code in that language is opened. That means defining what is a line comment or a block comment in a UDL lexer is completely independent from defining regular expressions for single-line or multi-line comments in a function list parser.

              mpheath @Lycan Thrope
              last edited by

              @Lycan-Thrope

              As @dinkumoil has correctly stated, v8.4.9 or later is needed to invalidate code matched in comment zones.

              The function parser reads the patterns from the XML file. The Scintilla functions SCI_SETTARGETRANGE and SCI_SEARCHINTARGET are called by searchInTarget(). The regular expression engine gets the matches from the text in the Scintilla edit control, and these matches are used to update the Function List panel. Lexilla has no involvement in this process AFAIK, so lexing is unrelated to the issue.

              Try this

              			commentExpr="(?x)
              							(?s:/\*.*?\*/)
              							|(?m-s://.*$)
              							|(?m-s:&&.*$)
              							"
              

              Are you missing some alternations (|) between the groups? I inserted them in the pattern.

              .* is very greedy. That might be OK in the (?m-s:...) groups, because without the s flag the dot cannot cross the end of a line. Consider .*? in the (?s:...) group to get the smallest match; otherwise .* will try to match not the first literal instance of */ but the very last one, which could be at the end of the document. There is nothing special about & that needs escaping.
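Both points above (missing alternation, greedy .*) can be checked outside Notepad++ with Python’s re module, which accepts the same scoped inline-flag groups used in these patterns, such as (?s:...) and (?m-s:...). This is only a stand-in for the engine Notepad++ uses, and the SAMPLE text (including the trailing /* end of file */ comment) is made up for illustration.

```python
import re

# Illustrative sample: a class whose body holds two block comments,
# plus a block comment at the very end of the file.
SAMPLE = """class ccs_Object(vInstanceId) of Timer() custom
/*
   class newClass(vInstanceId) of ccs_Object(vInstanceId)
   endclass
*/
/*
   Class constructor
*/
   function IamLoaded()
      return
endclass
/* end of file */
"""

# commentExpr from the first post: groups concatenated, greedy block group.
ORIGINAL = r"""(?x)
    (?s:/\*.*\*/)
    (?m-s://.*?$)
    (?m-s:\&\&.*?$)
"""

# Alternations added, but the block-comment group is still greedy.
GREEDY_ORED = r"""(?x)
    (?s:/\*.*\*/)
    |(?m-s://.*$)
    |(?m-s:&&.*$)
"""

# commentExpr as suggested in the reply: alternations plus a lazy .*? group.
SUGGESTED = r"""(?x)
    (?s:/\*.*?\*/)
    |(?m-s://.*$)
    |(?m-s:&&.*$)
"""


def comment_zones(pattern, text):
    """Return the text of every non-overlapping comment match, in order."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for label, pattern in [("original", ORIGINAL), ("greedy", GREEDY_ORED),
                       ("suggested", SUGGESTED)]:
    zones = comment_zones(pattern, SAMPLE)
    hides_code = any("IamLoaded" in z for z in zones)
    print(label, "zones:", len(zones), "hides IamLoaded:", hides_code)
```

With this sample, the original pattern finds no comment zones at all, because it only matches a block comment immediately followed by a // comment and then a && comment, so nothing gets excluded. The OR’d but greedy variant finds a single zone from the first /* to the last */, which swallows IamLoaded. The suggested pattern finds three separate zones and leaves the real code alone.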

                Lycan Thrope @dinkumoil
                last edited by

                @dinkumoil ,
                I will try that and see if that is the case. I usually don’t upgrade right away, and since I was concerned about another issue in 8.4.9, I hadn’t yet installed my UDL package into 8.4.9, nor updated my working version to it.

                  Lycan Thrope @mpheath
                  last edited by Lycan Thrope

                  @mpheath ,

                  Just for completeness: the changes you suggested to commentExpr are how the code was originally written; what I posted above is the result of various attempts to get it to work. With the exception of the multi-line issue, it works the way I have it above. And yes, when I was using Mark to work out the regex, it did show a greedy selection, which I, maybe erroneously, attributed to the multi-line comment handling not working and selecting all the text within it to process.

                  Edit: Also, after checking, I want to point out that in the c.xml functionList panel file, those commentExpr options are not OR’d out either, which is why I removed the OR’s in mine.

                    Lycan Thrope @dinkumoil
                    last edited by Lycan Thrope

                    @dinkumoil ,

                    Edit: You are right; it is a mixed parser.

                    Unfortunately, the fix appears to be ineffective in this 8.4.9 build as well. I’m not sure whether the fix was committed to the codebase but not included in the 8.4.9 portable build I have on my machine; here is its Debug Info:

                    Notepad++ v8.4.9   (64-bit)
                    Build time : Jan 27 2023 - 03:11:16
                    Path : C:\Users\camilee\Documents\Development Tools Downloads\Notepad++ Versions\npp.8.4.9.portable.x64\notepad++.exe
                    Command Line : 
                    Admin mode : OFF
                    Local Conf mode : ON
                    Cloud Config : OFF
                    OS Name : Windows 10 Home (64-bit) 
                    OS Version : 22H2
                    OS Build : 19045.2486
                    Current ANSI codepage : 1252
                    Plugins : 
                        mimeTools (2.9)
                        NppConverter (4.5)
                        NppExport (0.4)
                    
                    

                    Presuming this is the latest release version, the following screenshots show the difference I’m referring to between the Orig file, which is unchanged, and the New file, which uses single-line comments instead, under the same test setup. In the Orig file, the endclass in the comment closes the class at the top of the editor window, and the parser then picks up the next commented text, Class constructor, which shows up in the functionList panel even though it is inside a comment. The New file shows the proper class name in the functionList panel because single-line comments are used for the lines the multi-line comment was failing to exclude.

                    Original file:
                    testdOrigFile.PNG

                    New file:
                    testdNewFile.PNG

                    ccs_Object is the real class name in the New file. constructor is the commented out name that is being used in the Orig file.

                      Lycan Thrope @Lycan Thrope
                      last edited by

                      @Lycan-Thrope ,

                      I just downloaded the latest from the website and checked the build date, so apparently the version I used to take these shots is the current 8.4.9 build, and the issue still exists. I’ll probably be submitting an issue then.

                        mpheath @Lycan Thrope
                        last edited by

                        @Lycan-Thrope said in functionList not ignoring comments:

                        @mpheath ,
                        …
                        Edit: Also, after checking, I want to point out that in the c.xml functionList panel file, those commentExpr options are not OR’d out either, which is why I removed the OR’s in mine.

                        https://github.com/notepad-plus-plus/notepad-plus-plus/blob/f38195a0da5657f447ccc949bc77f38dba06aa61/PowerEditor/installer/functionList/c.xml#L15-L20

                        Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match that I would expect, though @dinkumoil has more experience with Function List patterns so might be able to confirm.

                        I cannot test with only bits and pieces, so I am at a disadvantage in being 100% sure of my advice. I do have a custom Notepad++ build that outputs the pairs, and a Python 3 script that uses the pairs to make an hta (HTML) file showing the matched zones. This might be useful to show the matching pairs visually, with info popups, for example.

                        … when I was using Mark to try and work out the regex aspect …

                        The searching is done with zone pairs. There is a discussion about the function unit parser behaviour; the (Class/Method/Function) mix parser also uses zones, though a bit differently. It does not work exactly like the Find and Replace dialog, so treat the latter’s results as a guide, not a 100% certainty.
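To illustrate the general idea of searching with zone pairs, here is a hedged Python sketch: comment zones are found first, the gaps between them are treated as code zones, and the function pattern is searched only inside those gaps using Pattern.search(text, pos, endpos). The names invert_zones, function_names and FUNCTION_EXPR, and the sample text, are hypothetical; Notepad++’s real zone handling (especially in the mixed parser) differs in detail.

```python
import re

# Hypothetical sample: one real function, one inside a block comment, another real one.
SAMPLE = """function realOne()
   return
/*
function commentedOut()
   return
*/
function realTwo()
   return
"""

COMMENT_EXPR = re.compile(r"""(?x)
    (?s:/\*.*?\*/)
    |(?m-s://.*$)
    |(?m-s:&&.*$)
""")

# Illustrative function pattern (an assumption, not the parser from this thread).
FUNCTION_EXPR = re.compile(r"^[ \t]*function[ \t]+(\w+)", re.IGNORECASE | re.MULTILINE)


def invert_zones(zones, begin, end):
    """Return the (start, end) gaps in [begin, end) not covered by sorted zones."""
    gaps, pos = [], begin
    for start, stop in zones:
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, stop)
    if pos < end:
        gaps.append((pos, end))
    return gaps


def function_names(text, skip_comments=True):
    """Search FUNCTION_EXPR only inside the code zones between comments."""
    comments = [m.span() for m in COMMENT_EXPR.finditer(text)] if skip_comments else []
    names = []
    for start, stop in invert_zones(comments, 0, len(text)):
        pos = start
        # endpos makes the search behave as if the text ended at the zone's end.
        while (m := FUNCTION_EXPR.search(text, pos, stop)) is not None:
            names.append(m.group(1))
            pos = m.end()
    return names


print(function_names(SAMPLE))                       # ['realOne', 'realTwo']
print(function_names(SAMPLE, skip_comments=False))  # ['realOne', 'commentedOut', 'realTwo']
```

Searching only the gaps means a match can never start inside a comment, which is the behaviour the thread expects from commentExpr; a broken commentExpr that matches nothing collapses the gaps into one zone covering the whole file.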

                        I’ll probably be submitting an issue then.

                        Where is the evidence of a bug? Where are the files to reproduce the issue? Need something to focus on to find a bug. Images are nice to look at though we cannot run the function list parser on images. Suggest to get your stuff sorted here and now and exhaust all options before creating an issue.

                          dinkumoil @mpheath
                          last edited by

                          @Lycan-Thrope

                          @mpheath said in functionList not ignoring comments:

                          Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match that I would expect

                          That’s exactly the answer I would have given. Regular expressions for matching different types of comments have to be OR’d, otherwise they would not work.

                          @mpheath said in functionList not ignoring comments:

                          Where are the files to reproduce the issue? Need something to focus on to find a bug. Images are nice to look at though we cannot run the function list parser on images. Suggest to get your stuff sorted here and now and exhaust all options before creating an issue.

                          Thumbs up, nothing more to say.

                            Lycan Thrope @mpheath
                            last edited by Lycan Thrope

                            @mpheath said in functionList not ignoring comments:

                            Can you look again? I see | characters separating the groups in c.xml. Otherwise the patterns of the groups would be treated as 1 whole pattern to try to match that I would expect, though @dinkumoil has more experience with Function List patterns so might be able to confirm.

                            Thanks; I was wondering why I couldn’t reproduce one of the original problems again. You were right: the c.xml commentExpr groups were OR’d, but the | characters were aligned with the indentation guides, which is why I missed them. As you’ll see from this image, restoring the ORs removed even the faux class name from the list for the Original file; constructor isn’t there anymore. Nothing is there. Removing the OR is what exposed the missing class and why it was missing.

                            testdORComment.PNG

                            I will be preparing these things. The images are just to show what I’m seeing, to maybe jog the memory of anyone other than @dinkumoil who has seen this behavior, and to show I’m not imagining it.

                            That’s why I’m posting it here first, to discuss it and to iron out any misgivings, like this one, on my end.

                              Lycan Thrope @dinkumoil
                              last edited by Lycan Thrope

                              @dinkumoil ,
                              Here’s the code that is being used for the UDL and the functionList parser. The following sample code is modified from the above with the addition of a function outside of the class, to show the mixed parser functionality working.

                              Sample:

                              
                              class ccs_Object(vInstanceId) of Timer() custom
                              
                              /*
                                    class newClass(vInstanceId) of ccs_Object(vInstanceId)
                              
                                       ... class construct ...
                              
                                       ... class methods ...
                              
                                    endclass
                              
                              */
                              
                              /*
                                 =============================================================================
                              
                                 Class constructor
                              
                                 =============================================================================
                              */
                                 function IamLoaded()
                                    return
                              
                              endclass
                              function outsideClassTest
                              junk = nothing
                              return
                              
                              
                              

                              testd.xml - UDL

                              <NotepadPlus>
                                  <UserLang name="testd" ext="wfm cfm cdm rep crp prg cc mnu sfm dmd lab" udlVersion="2.1">
                                      <Settings>
                                          <Global caseIgnored="yes" allowFoldOfComments="no" foldCompact="no" forcePureLC="2" decimalSeparator="0" />
                                          <Prefix Keywords1="no" Keywords2="no" Keywords3="no" Keywords4="no" Keywords5="no" Keywords6="no" Keywords7="no" Keywords8="no" />
                                      </Settings>
                                      <KeywordLists>
                                          <Keywords name="Comments">00* 01 02 03/* 04*/</Keywords>
                                          <Keywords name="Numbers, prefix1"></Keywords>
                                          <Keywords name="Numbers, prefix2">0x</Keywords>
                                          <Keywords name="Numbers, extras1">A B C D E F a b c d e f</Keywords>
                                          <Keywords name="Numbers, extras2"></Keywords>
                                          <Keywords name="Numbers, suffix1">B b O o</Keywords>
                                          <Keywords name="Numbers, suffix2"></Keywords>
                                          <Keywords name="Numbers, range"></Keywords>
                                          <Keywords name="Operators1">= ; | := += -= *= /= %= == <> # > < >= <= ** ++ -- -> & + - * $ % ^ / ,</Keywords>
                                          <Keywords name="Operators2"></Keywords>
                                          <Keywords name="Folders in code1, open"></Keywords>
                                          <Keywords name="Folders in code1, middle"></Keywords>
                                          <Keywords name="Folders in code1, close"></Keywords>
                                          <Keywords name="Folders in code2, open">"do case" "do while" #if class do do do if if for for function try printjob with</Keywords>
                                          <Keywords name="Folders in code2, middle">#elseif else elseif</Keywords>
                                          <Keywords name="Folders in code2, close">endcase until #endif endclass enddo while with endif next endfor return endtry endprintjob endwith</Keywords>
                                          <Keywords name="Folders in comment, open"></Keywords>
                                          <Keywords name="Folders in comment, middle"></Keywords>
                                          <Keywords name="Folders in comment, close"></Keywords>
                                          <Keywords name="Keywords1">'close procedure' 'set procedure to' additive form local parameter parameters persistent procedure</Keywords>
                                          <Keywords name="Keywords2">case catch exit finally loop otherwise throw</Keywords>
                                          <Keywords name="Keywords3">AND NEW NOT OR of</Keywords>
                                          <Keywords name="Keywords4">'.and.' '.f.' '.not.' '.or.' '.t.'</Keywords>
                                          <Keywords name="Keywords5"></Keywords>
                                          <Keywords name="Keywords6">false true</Keywords>
                                          <Keywords name="Keywords7"></Keywords>
                                          <Keywords name="Keywords8"></Keywords>
                                          <Keywords name="Delimiters">00[ 01 02] 03( 04 05) 06{ 07 08} 09// 09&& 10 11((EOL)) 11((EOL)) 12" 13 14" 15 16 17 18 19 20 21 22 23</Keywords>
                                      </KeywordLists>
                                      <Styles>
                                          <WordsStyle name="DEFAULT" fgColor="FFFF00" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="COMMENTS" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" />
                                          <WordsStyle name="LINE COMMENTS" fgColor="00FF00" bgColor="808080" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" />
                                          <WordsStyle name="NUMBERS" fgColor="00FFFF" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS1" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS3" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS4" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS5" fgColor="FFFFFF" bgColor="000000" fontStyle="0" nesting="0" />
                                          <WordsStyle name="KEYWORDS6" fgColor="00FFFF" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="KEYWORDS7" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
                                          <WordsStyle name="KEYWORDS8" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
                                          <WordsStyle name="OPERATORS" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="FOLDER IN CODE1" fgColor="000000" bgColor="FFFFFF" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="FOLDER IN CODE2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="1" fontSize="11" nesting="0" />
                                          <WordsStyle name="FOLDER IN COMMENT" fgColor="FFFFFF" bgColor="000000" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="DELIMITERS1" fgColor="00FFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="33554432" />
                                          <WordsStyle name="DELIMITERS2" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="83931154" />
                                          <WordsStyle name="DELIMITERS3" fgColor="FFFFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="83886087" />
                                          <WordsStyle name="DELIMITERS4" fgColor="00FF00" bgColor="FFFFFF" colorStyle="1" fontName="Source Code Pro" fontStyle="3" fontSize="11" nesting="0" />
                                          <WordsStyle name="DELIMITERS5" fgColor="00FFFF" bgColor="000000" colorStyle="1" fontName="Source Code Pro" fontStyle="0" fontSize="11" nesting="0" />
                                          <WordsStyle name="DELIMITERS6" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
                                          <WordsStyle name="DELIMITERS7" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
                                          <WordsStyle name="DELIMITERS8" fgColor="000000" bgColor="FFFFFF" fontStyle="0" nesting="0" />
                                      </Styles>
                                  </UserLang>
                              </NotepadPlus>
                              
                              

                              testd.xml - functionList

                              <?xml version="1.0" encoding="UTF-8" ?>
                              <!-- ==========================================================================\
                              |
                              |   To learn how to make your own language parser, please check the following
                              |   link:
                              |       https://npp-user-manual.org/docs/function-list/
                              |
                              \=========================================================================== -->
                              
                              <!--Much of this regex is courtesy of Peter Jones of the Notepad++ Community,
                              with some changes made for customization by Lee Grant-->
                              
                              <NotepadPlus>
                              	<functionList>
                              		<!-- ========================================================= [ testd ] -->
                              		<parser
                              			displayName="testd"
                              			id         ="testd"
                              			commentExpr="(?x)
                              							(?s:/\*.*?\*/)
                              							|(?m-s://.*?$)
                              							|(?m-s:&&.*?$)
                              							"
                              		>
                              			<classRange
                              				mainExpr="(?xi)                        #  Free-spacing mode and inline comments + case-insensitive search
                              
                              						  ^\h*                          #  Optional leading whitespace chars
                              						  class                         #  'class' keyword
                              						  \h?                           #  Optional whitespace char
                              						  \w+                           #  Class name
                                                  \h?
                              														#  Following the class name there may be parameters. If the parens are present, the first entry inside them is required,
                              														#  whether or not other parameters follow, e.g.: class FrameCtrl(frameObj)
                              
                              						  (?:                             #  Beginning of the optional parameter(s) part  ( Group 1 )
                              							\(                      #    Opening parenthesis
                              							(\h*\w+\h*)?                         #    First and required parameter
                              							( ,? \h* \w+\h*)*               #    Following optional/additional parameters
                              							\)                          #    Closing  parenthesis
                              						  )?                            #  End of the optional parameter(s) part
                              
                              														#  For the rest of the class declaration, after the class name, all other options are part of one big optional set, that follows 'of'
                              							\h?							#  and can be populated by one of several options.
                              
                              						  (?:                           #  Beginning of the main optional part, in a non-capturing group
                              
                              														#    The first and most prevalent is the name of the superclass the class is subclassed from, with its optional parameters. Again,
                              														#    if it has parameters, at least the first one is required, e.g.: class ToolButtonFx(oParent) of Toolbutton(oParent).
                              
                              							of                     #    Optional 'of' keyword, surrounded by 1 horizontal whitespace char
                                                    \h?
                              							\w+                         #    Superclass name
                                                   \h?
                              						  (?:                             #  Beginning of the optional parameter(s) part  ( Group 2 )
                              							\(                      #    Opening parenthesis
                              							(\h?\w+?\h?)?                         #    First and required parameter
                              							( ,? \h* \w+\h*)*               #    Following optional/additional parameters
                              							\)                          #    Closing  parenthesis
                              						  )?                            #  End of the optional parameter(s) part
                              
                              														#    The next possible option is that it is a custom object and needs to be in this line so if the object or form is opened up in the dBASE IDE,
                              														#    the designers in it won't mess up the object by streaming out missing parts or overriding properties or objects and functions.
                                                    #\h*
                              							#( custom )?              #    Optional 'custom' keyword 
                              
                              														#    The next possible option is that the class is being subclassed from another object that is contained elsewhere and the compiler needs to know
                              														#    this reference. There are two options for pointing to the file. The first is an Alias path in the IDE that can be accessed by the compiler
                              														#    in the environment, or second, it is in the current directory and only the name is needed...or it has a path that can be listed here,
                              														#    but this is bad practice, and an Alias is recommended if the file is in a place other than the current directory. If it is, the name can be
                              														#    used in quotes as a string that gets passed to the compiler. Both follow the word 'From'. The Alias directory is a name that is enclosed
                              														#    in two colons, one immediately before the Alias name and one immediately after, no spaces.
                                                   \h*
                              							(?:                         #    Beginning of the optional part, in a non-capturing group
                              							  from                 #      Optional 'from' keyword, surrounded by 1 horizontal whitespace char
                                                   \h*
                              							  (?:                       #    Beginning of a non-capturing group
                              								  : \w+ : \w+ \. \w+    #        First pointing file case
                              								|                       #      OR
                              								  \x22 \w+ \. \w+ \x22  #        Second pointing file case
                              							  )                         #    End of a non-capturing group
                              
                              							)?                          #    End of the optional part
                                                   \h*
                                                   (?: custom )?              #    Optional 'custom' keyword 
                              
                              						  )?                            #  End of the main optional part
                              
                              						  $                             #  End of current line and end of the class declaration
                              
                              						  (?si:.*?^\h*endclass)           #  must match all the way to 'endclass'
                              
                              
                              						 "
                              
                              						 
                              			>
                              				<className>
                              					<nameExpr
                              						expr="(?xi)                    #  Free-spacing mode and inline comments and case-insensitive search
                              						      \h*                       #  Optional leading whitespace chars
                              						      class                     #  'class' keyword
                              						      \h?                       #  Optional whitespace char
                              						      \K\w+                     #  Pure class name
                              						     "
                              					/>
                              					
                              				</className>
                              			<function
                              					mainExpr="(?xi-s) 
                              									^
                              									\h* 
                              									(?:
                              									
                              									function \h+ \w+
                              									|
                              									procedure \h+ \w+
                              									|
                              									with \h+ \(.*?\)
                              								)
                              								\h*
                              							"
                              				>
                              					<functionName>
                              						<funcNameExpr 
                              								expr="(?xi-s)					# multiline/comments
                              									^							# trying to keep following keywords from being included in comments
                              									\h*							# allow leading spaces
                              									(?:
                              									
                              										function				# must have word 'function' as first word
                              										\h+						# must have at least one horizontal space after function
                              										 						# \K don't keep 'function' in the name of the function in the panel
                              										\w+						# the name of the function is the first whole word after 'function'
                              									|
                              										procedure				# must have word 'procedure' as first word
                              										\h+      				# must have at least one horizontal space after procedure
                              																# \K don't keep 'procedure' in the name of the function in the panel
                              										(?!to\b)\w+ 			# the name of the function is the first whole word after 'procedure' - 'to'
                              																# so as to exclude any 'set procedure to' statements, needs work though.
                              									|
                              										with
                              										\h+
                              										\K
                              										\(
                              										\Kthis\.\K(.+)(?=\))			# all but 'this' and the closing parens.
                              									|
                              										with
                              										\h+
                              										\K
                              										\(
                              										\K(.*?)(?=\))
                              									)
                              									"
                              						/>
                              					</functionName>
                              				</function>
                              			</classRange>
                              			<function
                              					mainExpr="(?xi-s) 
                              									^
                              									\h* 
                              									(?:
                              									function \h+ \w+
                              									|
                              									procedure \h+ \w+
                              								)
                              								\h*
                              							"
                              				>
                              					<functionName>
                              						<nameExpr 
                              								expr="(?xi-s)				# multiline/comments
                              								
                              								\h*							# allow leading spaces
                              								(?:
                              									function				# must have word 'function' as first word
                              									\h+						# must have at least one horizontal space after function
                              									#\K 						# don't keep 'function' in the name of the function in the panel
                              									\w+						# the name of the function is the first whole word after 'function'
                              									|
                              									procedure
                              									\h+
                              									#\K
                              									(?!to\b)\w+
                              								)
                              									"
                              						/>
                              					</functionName>
                              				</function>
                              		</parser>
                              	</functionList>
                              </NotepadPlus>
                              

                              As noted above, I have re-OR’d the commentExpr regex in the above functionList parser testd.xml file.

                              Remove the ORs in it to reproduce the problem: the class shows up as constructor instead of ccs_Object, which shows that the endclass inside the comments is being read by the parser and used to close class ccs_Object with no function inside it.

                              Alternatively, use single-line comments // instead of the block comments /* */ to see class ccs_Object named correctly.
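A rough way to see why the two comment styles behave differently is to run a simplified version of the classRange pattern outside Notepad++. This sketch uses Python's re module, which has no \h, so [ \t] stands in for it; the pattern, the sample dBASE source and the names Demo and Init are hypothetical simplifications, not the actual test file. With a block comment, the lazy run stops at the first line that begins with endclass, which is the one inside /* */. With // comments, that line begins with //, so the ^\h*endclass anchor cannot match it and the run continues to the real endclass.

```python
import re

# Simplified, hypothetical stand-in for the classRange mainExpr in testd.xml:
# the 'class' declaration line, then a lazy run up to the first line whose
# first non-blank word is 'endclass'. Python's re has no \h, so [ \t] is used.
CLASS_RANGE = re.compile(
    r"^[ \t]*class[ \t]+(\w+)[^\n]*$(?s:.*?)^[ \t]*endclass",
    re.IGNORECASE | re.MULTILINE,
)

# Hypothetical sample: a reference class kept inside a block comment.
BLOCK = """class ccs_Object(vInstanceId) of Timer() custom
/*
   Example:
   class Demo of Timer()
   endclass
*/
   function Init()
      return
endclass
"""

# The same sample, with the reference class in // comments instead.
LINE = """class ccs_Object(vInstanceId) of Timer() custom
// Example:
// class Demo of Timer()
// endclass
   function Init()
      return
endclass
"""


def class_span(source):
    """Return (class name, matched text) for the first class range found."""
    m = CLASS_RANGE.search(source)
    return m.group(1), m.group(0)


for label, source in (("block comments", BLOCK), ("// comments", LINE)):
    name, text = class_span(source)
    # True only if the matched range reaches the real method.
    print(label, name, "function Init" in text)
# block comments ccs_Object False
# // comments ccs_Object True
```

This runs the pattern on its own; it does not model how Notepad++ applies commentExpr to the matches afterwards.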

                              • Lycan Thrope
                                Lycan Thrope @Lycan Thrope
                                last edited by Lycan Thrope

                                @Lycan-Thrope
                                And here’s the overrideMap.xml, which I couldn’t include in the previous post because it was already so long.

                                <?xml version="1.0" encoding="UTF-8" ?>
                                <!-- ==========================================================================\
                                |
                                |   To learn how to make your own language parser, please check the following
                                |   link:
                                |       https://npp-user-manual.org/docs/function-list/
                                |
                                \=========================================================================== -->
                                <NotepadPlus>
                                	<functionList>
                                		<associationMap>
                                		<!--
                                			This file is optional (can be removed).
                                			Each functionlist parse rule links to a language ID ("langID").
                                			The "id" is the parse rule's default file name, but users can override it.
                                			Here are the default values they use:
                                
                                			<association id= "php.xml"			 langID= "1" />
                                			<association id= "c.xml"			 langID= "2" />	
                                			<association id= "cpp.xml"			 langID= "3" />		(C++)
                                			<association id= "cs.xml"			 langID= "4" />		(C#)
                                			<association id= "objc.xml"			 langID= "5" />		(Objective-C)
                                			<association id= "java.xml"			 langID= "6" />
                                			<association id= "rc.xml"			 langID= "7" />		(Windows Resource file)
                                			<association id= "html.xml"			 langID= "8" />
                                			<association id= "xml.xml"			 langID= "9" />
                                			<association id= "makefile.xml"		 langID= "10"/>
                                			<association id= "pascal.xml"		 langID= "11"/>
                                			<association id= "batch.xml"		 langID= "12"/>	
                                			<association id= "ini.xml"			 langID= "13"/>
                                			<association id= "asp.xml"			 langID= "16"/>
                                			<association id= "sql.xml"			 langID= "17"/>
                                			<association id= "vb.xml"			 langID= "18"/>
                                			<association id= "css.xml"			 langID= "20"/>
                                			<association id= "perl.xml"			 langID= "21"/>
                                			<association id= "python.xml"		 langID= "22"/>
                                			<association id= "lua.xml"			 langID= "23"/>
                                			<association id= "tex.xml"			 langID= "24"/>		(TeX)
                                			<association id= "fortran.xml"		 langID= "25"/>
                                			<association id= "bash.xml"			 langID= "26"/>
                                			<association id= "actionscript.xml"  langID= "27"/>
                                			<association id= "nsis.xml"			 langID= "28"/>
                                			<association id= "tcl.xml"			 langID= "29"/>
                                			<association id= "lisp.xml"			 langID= "30"/>
                                			<association id= "scheme.xml"		 langID= "31"/>
                                			<association id= "asm.xml"			 langID= "32"/>		(Assembly)
                                			<association id= "diff.xml"			 langID= "33"/>
                                			<association id= "props.xml"		 langID= "34"/>	
                                			<association id= "postscript.xml"	 langID= "35"/>
                                			<association id= "ruby.xml"			 langID= "36"/>
                                			<association id= "smalltalk.xml"	 langID= "37"/>	
                                			<association id= "vhdl.xml"			 langID= "38"/>
                                			<association id= "kix.xml"			 langID= "39"/>		(KiXtart)
                                			<association id= "autoit.xml"		 langID= "40"/>
                                			<association id= "caml.xml"			 langID= "41"/>
                                			<association id= "ada.xml"			 langID= "42"/>
                                			<association id= "verilog.xml"		 langID= "43"/>
                                			<association id= "matlab.xml"		 langID= "44"/>
                                			<association id= "haskell.xml"		 langID= "45"/>
                                			<association id= "inno.xml"			 langID= "46"/>		(Inno Setup)
                                			<association id= "cmake.xml"		 langID= "48"/>	
                                			<association id= "yaml.xml"			 langID= "49"/>
                                			<association id= "cobol.xml"		 langID= "50"/>	
                                			<association id= "gui4cli.xml"		 langID= "51"/>
                                			<association id= "d.xml"			 langID= "52"/>	
                                			<association id= "powershell.xml"	 langID= "53"/>
                                			<association id= "r.xml"			 langID= "54"/>	
                                			<association id= "jsp.xml"			 langID= "55"/>
                                			<association id= "coffeescript.xml"  langID= "56"/>
                                			<association id= "json.xml"			 langID= "57"/>
                                			<association id= "javascript.js.xml" langID= "58"/>
                                			<association id= "fortran77.xml"	 langID= "59"/>	
                                			<association id= "baanc.xml"		 langID= "60"/>		(BaanC)
                                			<association id= "srec.xml"			 langID= "61"/>		(Motorola S-Record binary data)
                                			<association id= "ihex.xml"			 langID= "62"/>		(Intel HEX binary data)
                                			<association id= "tehex.xml"		 langID= "63"/>		(Tektronix extended HEX binary data)
                                			<association id= "swift.xml"		 langID= "64"/>	
                                			<association id= "asn1.xml"			 langID= "65"/>		(Abstract Syntax Notation One)
                                			<association id= "avs.xml"			 langID= "66"/>		(AviSynth)
                                			<association id= "blitzbasic.xml"	 langID= "67"/>		(BlitzBasic)
                                			<association id= "purebasic.xml"	 langID= "68"/>	
                                			<association id= "freebasic.xml"	 langID= "69"/>	
                                			<association id= "csound.xml"		 langID= "70"/>
                                			<association id= "erlang.xml"		 langID= "71"/>
                                			<association id= "escript.xml"		 langID= "72"/>
                                			<association id= "forth.xml"		 langID= "73"/>	
                                			<association id= "latex.xml"		 langID= "74"/>	
                                			<association id= "mmixal.xml"		 langID= "75"/>
                                			<association id= "nimrod.xml"		 langID= "76"/>
                                			<association id= "nncrontab.xml"	 langID= "77"/>		(extended crontab)
                                			<association id= "oscript.xml"		 langID= "78"/>
                                			<association id= "rebol.xml"		 langID= "79"/>	
                                			<association id= "registry.xml"		 langID= "80"/>
                                			<association id= "rust.xml"			 langID= "81"/>
                                			<association id= "spice.xml"		 langID= "82"/>	
                                			<association id= "txt2tags.xml"		 langID= "83"/>
                                			<association id= "visualprolog.xml"  langID= "84"/>
                                			<association id= "typescript.xml"  	 langID= "85"/>
                                
                                			If you create your own parse rule of supported languages (above) for your specific need,
                                			you can copy it without modifying the original one, and make it point to your rule.
                                			
                                			For example, you have created your php parse rule, named "myphp2.xml". You add the rule file
                                			into the functionlist directory and add the following line in this file:
                                			<association id= "myphp2.xml"		langID= "1" />
                                			and that's it.
                                		-->
                                		
                                
                                		<!--
                                		   As there is currently only one langID for COBOL:
                                		   uncomment the following line to change to cobol-free.xml (cobol section free)
                                		   if this is your favourite format
                                		-->
                                		<!--
                                		<association id= "cobol-free.xml"		langID= "50"/>
                                		-->
                                		
                                		<!--
                                			For User Defined Languages (UDL's) use:
                                			
                                			<association id="my_parser.xml" userDefinedLangName="My UDL Name" />
                                			
                                			If "My UDL Name" doesn't exist yet, you can create it via UDL dialog.
                                		-->
                                			<!-- ==================== User Defined Languages ============================ -->
                                			<association id= "testd.xml"			userDefinedLangName="testd"/>
                                			<association id= "krl.xml"				userDefinedLangName="KRL"/>
                                			<association id= "sinumerik.xml"		userDefinedLangName="Sinumerik"/>
                                			<association id= "universe_basic.xml"	userDefinedLangName="UniVerse BASIC"/>
                                			<!-- ======================================================================== -->
                                		</associationMap>
                                	</functionList>
                                </NotepadPlus>
                                
                                
                                • Lycan Thrope
                                  Lycan Thrope @Lycan Thrope
                                  last edited by Lycan Thrope

                                  @Lycan-Thrope ,
                                  The results of these files can again be viewed with the version of the test file above, in three screenshots: one with block comments, one with single-line comments, and one with single-line comments and the class expanded to show the internal function/method.

                                  Block Comments:
                                  blockcommenttest.PNG

                                  Single Line Comments:
                                  singlecommenttest.PNG

                                  And this one shows the class opened up to see the internal function/method:
                                  singlecommenttestClassOpened.PNG

                                  Waiting to hear your feedback.

                                  • Lycan Thrope
                                    Lycan Thrope @mpheath
                                    last edited by Lycan Thrope

                                    @mpheath ,
                                    Incidentally, you’ll see that I put the original code back in, per your point about the ? in the block comment search. As I said, I originally had it that way, but I was trying different things, and one of them exposed that the endclass inside comments wasn’t being ignored. It may be the keyword that closes the class/endclass pair, but since it is inside a comment, it should be ignored. For some reason, that’s not happening. In addition, if I take that ? out, the next comment, which contains "Class constructor", gets picked up as a functionList entry.

                                    • mpheath
                                      mpheath @Lycan Thrope
                                      last edited by mpheath

                                      @Lycan-Thrope

                                      Thanks for the files. This is the output of the zones…

                                      commentZone: 51, 209
                                      commentZone: 213, 409
                                      classZone: 0, 203
                                      classZone: 301, 460
                                      funcZone: 462, 487
                                      

                                      This is the visual (disregard the green at the end; it is missing a closing span tag). The class match runs from the start of the document (pos 0) and ends at endclass (pos 203) inside the comment zone. Code within comments is invalid as a match, but this class match is not inside a comment zone (it only ends in one), so as far as the parser is concerned it is a valid match. Making it invalid because it extends partly into a comment zone would, IMO, drop the class from the functionlist panel, which would be a worse result.
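The zone positions above can be checked mechanically. This small Python sketch (the helper names are made up for illustration) takes the (start, end) pairs from the debug output and reports, for each class zone, whether its start or its end falls inside a comment zone. The first class zone starts at 0, outside any comment, but ends at 203, inside the comment zone 51 to 209; the second starts at 301, inside the comment zone 213 to 409.

```python
# (start, end) pairs copied from the debug output above.
COMMENT_ZONES = [(51, 209), (213, 409)]
CLASS_ZONES = [(0, 203), (301, 460)]


def in_any(pos, zones):
    """True if pos lies within any (start, end) zone, ends inclusive."""
    return any(start <= pos <= end for start, end in zones)


def classify(class_zones, comment_zones):
    """For each class zone, report whether its start and end are in a comment."""
    return [
        ((start, end), in_any(start, comment_zones), in_any(end, comment_zones))
        for start, end in class_zones
    ]


for zone, starts_in, ends_in in classify(CLASS_ZONES, COMMENT_ZONES):
    print(zone, "starts in comment:", starts_in, "ends in comment:", ends_in)
# (0, 203) starts in comment: False ends in comment: True
# (301, 460) starts in comment: True ends in comment: False
```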

                                      Looking at the pascal.xml function list file, I do not see patterns trying to locate the end keyword of a Class, Function or Procedure. So in testd.xml, is it necessary to match to endclass to acknowledge a Class definition?

                                      Class definition:

                                      class ccs_Object(vInstanceId) of Timer() custom
                                      

                                      I do not know the fine details of recognizing class functions from ordinary functions with the functionlist patterns so @dinkumoil may be able to help with that.

                                      Greediness in the patterns seems to be a problem with functionlist parsing, so I suggest avoiding greedy patterns. The larger the match, the more vulnerable it is to running into an issue like this one. Try to aim for the smallest match that achieves a successful result. I consider that @dinkumoil has achieved that with pascal.xml.
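As a rough illustration of aiming for a smaller match, the sketch below matches only the class declaration line, using the ccs_Object declaration above and the ToolButtonFx example from the testd.xml comments as input, and never reaches for endclass, so it cannot straddle a comment. It is written with Python's re rather than the Boost syntax Notepad++ uses, the pattern name CLASS_DECL is hypothetical, and it does not answer the open question of which functions belong to which class.

```python
import re

# Hypothetical declaration-only pattern: it stops at the end of the 'class'
# line instead of running on to 'endclass'.
CLASS_DECL = re.compile(
    r"^[ \t]*class[ \t]+(?P<name>\w+)"                # keyword and class name
    r"(?:[ \t]*\([^)\n]*\))?"                         # optional (parameters)
    r"(?:[ \t]+of[ \t]+\w+(?:[ \t]*\([^)\n]*\))?)?"   # optional 'of Superclass(...)'
    r"(?:[ \t]+from[ \t]+\S+)?"                       # optional 'from' file reference
    r"(?:[ \t]+custom)?"                              # optional 'custom' keyword
    r"[ \t]*$",                                       # nothing else on the line
    re.IGNORECASE | re.MULTILINE,
)

samples = [
    "class ccs_Object(vInstanceId) of Timer() custom",
    "class ToolButtonFx(oParent) of Toolbutton(oParent)",
]
for line in samples:
    m = CLASS_DECL.search(line)
    print(m.group("name"))
# ccs_Object
# ToolButtonFx
```

Because the match ends at the end of the declaration line, a commented-out endclass further down has nothing to attach to.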

                                      • Lycan Thrope
                                        Lycan Thrope @mpheath
                                        last edited by Lycan Thrope

                                        @mpheath ,
                                        What am I looking at there? Is that the character/caret positions?

                                        To answer your question, yes, we needed to find the endclass to close the class. As you point out, that regex gave me trouble because it tried to match as little as possible. @PeterJones, @guy038, and a few other users here helped formulate the needed regex in that functionList parser file. We needed the regex to run all the way down to the endclass so we could find and close the Class definition.

                                        The dBASE Plus OOP isn’t like C++, where typically one file = one class. We can have multiple classes defined per file, and thanks to @PeterJones, we were also able to have our objects display like functions inside a Class definition, because they are basically UI objects inside a Class; technically, they are other class objects being defined inside a class. So yes, it needs to find the endclass to close one class and find the next.

                                        We can’t use the openSymbole/closeSymbole delimiters like C++ can. We tried putting the terms class/endclass in those, but I suspect they are only meant for single characters, as we were never able to get them to recognize the open/close it needed to frame the Class definitions.

                                        But, again…the endclass should be ignored, inside comments. Period. :)

                                        Edit: Yes, they are character/caret positions, so I see and you see what I’m saying. :) The numbers are off, I suspect for some kind of whitespace offset or newline offset. I finally realized you had a link for me to follow to see that graphic. Yes…how did you produce that?

                                        • mpheath
                                          mpheath @Lycan Thrope
                                          last edited by

                                          @Lycan-Thrope

                                          Edit: Yes, they are character/caret positions, so I see and you see what I’m saying. :) The numbers are off, I suspect for some kind of whitespace offset or newline offset. I finally realized you had a link for me to follow to see that graphic. Yes…how did you produce that?

                                          They are the start and end positions of each match. I consider the start position to be off by 1. The output comes from some debugging code that writes to stdout. The Python script captures that output and processes it into HTML. It gives a reasonable visual of the matches. It is not perfect, though: there is an unclosed span tag, probably due to overlapping zones.

                                          But, again…the endclass should be ignored, inside comments. Period. :)

                                          You can state that, though the reality is that you may need to work with the parser as it is. I could equally state that it is the pattern that needs fixing, as it tries to handle everything from class to endclass. Demands have no relevance without reason.

                                          To answer your question, yes, we needed to have the endclass found to close the class.

                                          Not sure why. The pattern searches up to the endclass, yet it does not properly handle code in comments. That seems like an issue with the pattern, since it tries to make such a large match. Blaming the parser for a greedy pattern seems unreasonable.

                                          The subject is not the C++ functionList, so bringing it up opens a new argument. I mentioned pascal.xml. Pascal uses begin and end, though, as I mentioned, the Pascal patterns do not look for the end keyword.

                                          The way I see it is that the patterns need to work with the parser. The parser will probably not be changed to make up for patterns that fail to work due to being large. The maintainer IMO will not risk all the functionlist patterns in use to make you or your developer happy even if it were possible. There is a bash functionlist file issue at least 2 years old and risking the other functionlist files could be a bad option. Suggest you fix the pattern.

                                            Lycan Thrope @mpheath

                                            @mpheath said in functionList not ignoring comments:

                                            You can state that, though the reality is that you may need to work with the parser as it is. I could equally state that it is the pattern that needs fixing, as it tries to handle everything from class to endclass. Demands have no relevance without reason.

                                            I will have to respectfully disagree.
                                            The pattern works. It has been working. It continues to work.

                                            The problem is the parser, and in particular, the commentZones code, in this instance.

                                            If developers don’t comment their code, the parser is safe, but that’s not realistic either. The simple fact is that the parser doesn’t recognize comments inside a class. It recognizes them before or after a class, but not inside it, particularly when those comments contain actual keywords. Perhaps you could say the zone is the problem. The parser doesn’t recognize that something (a class, in this instance) has not yet ended and has comments starting inside it. These are multi-line comments in this case, and it is the parser’s weakness that it doesn’t realize it’s still inside a class. That’s why the classRange element has these:

                                            openSymbole ="\{"
                                            closeSymbole="\}"
                                            

                                            And the parser makes calls like this:

                                            bool FunctionParsersManager::getZonePaserParameters(TiXmlNode *classRangeParser, generic_string &mainExprStr, generic_string &openSymboleStr, generic_string &closeSymboleStr, std::vector<generic_string> &classNameExprArray, generic_string &functionExprStr, std::vector<generic_string> &functionNameExprArray)
                                            

                                            It can’t possibly parse a zone, if it doesn’t know the beginning and end of it.

                                            Having to find endclass became necessary because the parser doesn’t allow keywords to be used as the openSymbole and closeSymbole.

                                            Another way to work around this is to not put block comments inside a class, because the parser doesn’t work like a real parser; it’s a quasi-parser. That effectively encourages not commenting code, a bad habit that is already rampant. Either way, you can’t reasonably expect the parser to work properly if it doesn’t let the parserLanguage.xml define its open and close pattern. That is what this parser does, so this pattern was improvised to make the parser function.

                                            So please, don’t disparage the pattern. I’d like to think some really talented people created it, sans myself, right @PeterJones ? :)

                                            The Community of users of the Notepad++ text editor.