Community
    • Login

    Need help with functionlist regex

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    11 Posts 3 Posters 4.0k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • MAPJe71M
      MAPJe71
      last edited by

      Try:

                  <function
                      mainExpr="^\s*((sub|function)\s+\K\w+|'\*\s+\K.*)"
                  >
                  </function>
      

      The * is a special character in RegEx’s, you need to escape it to match it literally.

      1 Reply Last reply Reply Quote 0
      • MaDillM
        MaDill
        last edited by

        Thank you for the information. But even with the * it is not working. Nothing is show up in the list. Even not the sub|function anymore. I tried it on https://regex101.com/ and there it highlight what I’m looking for, but not in the functionlist. Do you have another hint?

        1 Reply Last reply Reply Quote 0
        • MAPJe71M
          MAPJe71
          last edited by

          	<function
          		mainExpr="(?m-s)^\h*(?:(?i:sub|function)\s+\K\w+|\x27\*\h+\K.*)"
          	/>
          
          1 Reply Last reply Reply Quote 0
          • MaDillM
            MaDill
            last edited by

            Thank you again. With this regex the sub/function is working again. But the '* still not. Is the reqex style of the functionlist somehow special?

            1 Reply Last reply Reply Quote 0
            • MAPJe71M
              MAPJe71
              last edited by

              Is the reqex style of the functionlist somehow special?

              AFAIK it isn’t.

              1 Reply Last reply Reply Quote 0
              • MAPJe71M
                MAPJe71
                last edited by

                1 Reply Last reply Reply Quote 0
                • guy038G
                  guy038
                  last edited by guy038

                  Hello, @madill, @mapje71,

                  I found a solution :-)) To test it :

                  • Open, in N++, your active functionList.xml file

                  • Add the line, below, inside the <associationMap> node :

                  			<association id= "Test" langID="0" />
                  
                  • Add the lines of the Test parser, below, inside the <parsers> node :
                  			<parser	id ="Test" displayName="Ma_Dill_Test" commentExpr="'(?!\* )(?-s:.+)" >
                  				<function
                  					mainExpr="^\s*(sub|function)\s+\K\w+|^'\*\s+\K(?-s:.+)" >
                  				</function>
                  			</parser>
                  
                  • Save the changes of functionList.xml

                  • Close and re-start Notepad++

                  • Open a new tab

                  • Copy your example text of your first post, in this new tab

                  • Open the Function List panel ( View > Function List )

                  => You should see your five functions, as you expect to !!


                  Notes :

                  • I preferred to add the commentExpr part, which defines the line-comment zones to avoid, for further search of functions !

                  • As I supposed that lines, beginning with '* followed by some space characters, define special marking/infos
                    it becomes obvious that comment lines are lines which :

                    • Begin by a single quote character ( ' ), NOT followed by an asterisk + a “space” character '(?!\* ), which is a negative look-ahead

                    • And followed by all standard characters of the current line => (?-s:.+). The -s modifier is needed, because, by default, the Function List Regex engine considers all the text as a single line. ( So the regex .+ would match any non-empty text, even on several lines. That is to say, all file contents ! )

                  • In the mainExpr regex, I just add the alternative ^'\*\s+\K(?-s).+, which looks, after beginning of line, for :

                    • A single quote, followed with an *asterisk ( '\* ) and, at least, one character, of type = “space” ( \s+ )

                    • Then, the syntax \K resets the regex search

                    • Finally, the part (?-s:.+), again, looks for the remainder of the current line, only, due to the -s modifier, and is simply displayed, in the functionList panel !


                  BTW, Mapje71, the differences between your two posts, are the part (?m-s), at beginning of the regex, in your last post ;-))

                  Best Regards,

                  guy038

                  1 Reply Last reply Reply Quote 1
                  • MaDillM
                    MaDill
                    last edited by MaDill

                    @guy038 and @MAPJe71 The version from guy038 is working. I don’t know why the other one is working on your screenshot but not here. Thank you to both for your time and help. How can I set the topic to solved?

                    1 Reply Last reply Reply Quote 0
                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @madill, @mapje71, and All,

                      Updated on 07-22-17 ( \v syntax added )

                      MaDill, I, slightly, changed the mainExpr regex, as below :

                      (?i)^\h*(?:(?-i:Sub|Function)\s+\K\w+|'\*\s+\K(?-s:.+))

                      Notes :

                      • At beginning, the part (?i)^\h* means that the search is, globally, case insensitive and that the key-words ( Sub, Function and '* may be preceded by optional tabulation and/or space characters

                      • Then, the general structure, which follows, is a non-capturing group, made of two alternatives ( (?:.....|......) )

                      , As I thought that the key-words Sub and Function must have that strict case, I decided to create the sensitive non-capturing group (?-i:Sub|Function)

                      • Any key-words must be followed by, at least, one, horizontal or vertical, White Space character ( \s+ )

                      • Finally, after the reset behaviour, due to the \K syntax, we display, in the Function List panel, either :

                        • The name of current subroutine or function ( \w+, in case of key-words Sub/Function

                        • All the rest of current line, only, (?-s:.+), in case of key-word '\*

                      Do hope, you’ll like this interpretation ;-))


                      In all this discussion, we’re using, in regexes, either, the \s and/or the \h syntaxes. We could also add the \v syntax ! What they, all, refer to ?

                      Well, from the Wiki article :

                      https://en.wikipedia.org/wiki/Whitespace_character

                      we hear of the White Space definition, which is any character or series of characters, that represent horizontal or vertical space in typography. They, all, have the Unicode property “WSpace=Y”.


                      So, strictly :

                      • The Shorthand Character Class \s, used in the N++ Boost regex engine, matches any Vertical or Horizontal White Space character, of the list below :
                      U+0009			CHARACTER TABULATION
                      U+000A	
                      				LINE FEED
                      U+000B			VERTICAL TABULATION
                      U+000C			FORM FEED
                      U+000D	
                      				CARRIAGE RETURN
                      U+0020	 		SPACE
                      U+0085	…		NEXT LINE
                      U+00A0	 		NO-BREAK SPACE
                      U+1680	 		OGHAM SPACE MARK
                      U+2000	 		EN QUAD
                      U+2001	 		EM QUAD
                      U+2002	 		EN SPACE
                      U+2003	 		EM SPACE
                      U+2004	 		THREE-PER-EM SPACE
                      U+2005	 		FOUR-PER-EM SPACE
                      U+2006	 		SIX-PER-EM SPACE
                      U+2007	 		FIGURE SPACE
                      U+2008	 		PUNCTUATION SPACE
                      U+2009	 		THIN SPACE
                      U+200A	 		HAIR SPACE
                      U+2028	
		LINE SEPARATOR
                      U+2029	
		PARAGRAPH SEPARATOR
                      U+202F	 		NARROW NO-BREAK SPACE
                      U+205F	 		MEDIUM MATHEMATICAL SPACE
                      U+3000	 		IDEOGRAPHIC SPACE
                      

                      Moreover, it, also, matches the NON-WhiteSpace character, below :

                      U+200B	​		ZERO WIDTH SPACE
                      
                      • The Shorthand Character Class \h, used in the N++ Boost regex engine, matches any Horizontal White Space character, of the list below :
                      U+0009			CHARACTER TABULATION
                      U+0020	 		SPACE
                      U+00A0	 		NO-BREAK SPACE
                      U+1680	 		OGHAM SPACE MARK
                      U+2000	 		EN QUAD
                      U+2001	 		EM QUAD
                      U+2002	 		EN SPACE
                      U+2003	 		EM SPACE
                      U+2004	 		THREE-PER-EM SPACE
                      U+2005	 		FOUR-PER-EM SPACE
                      U+2006	 		SIX-PER-EM SPACE
                      U+2007	 		FIGURE SPACE
                      U+2008	 		PUNCTUATION SPACE
                      U+2009	 		THIN SPACE
                      U+200A	 		HAIR SPACE
                      U+202F	 		NARROW NO-BREAK SPACE
                      U+205F	 		MEDIUM MATHEMATICAL SPACE
                      U+3000	 		IDEOGRAPHIC SPACE
                      

                      As before, it, also, matches the NON-WhiteSpace character, below :

                      U+200B	​		ZERO WIDTH SPACE
                      
                      • The Shorthand Character Class \v, used in the N++ Boost regex engine, matches any Vertical White Space character, of the list below :
                      U+000A	
                      				LINE FEED
                      U+000B			VERTICAL TABULATION
                      U+000C			FORM FEED
                      U+000D	
                      				CARRIAGE RETURN
                      U+0085	…		NEXT LINE
                      U+2028	
		LINE SEPARATOR
                      U+2029	
		PARAGRAPH SEPARATOR
                      

                      And, logically, the \s class character is identical to the union of the two classes \h and \v !!


                      Luckily, most of these characters are never found, in Western scripts. Then, practically, we, just, have to remember that :

                      • The \s syntax is, generally, identical to the simple class [\t\n\r\x20]

                      • The \h syntax is, generally, identical to the simple class [\t\x20]

                      • The \v syntax is, generally, identical to the simple class [\n\r]

                      Cheers,

                      guy038

                      BTW, MaDill, the MAPJe71’s last regex does work, properly, on my “old” Win WP configuration !

                      1 Reply Last reply Reply Quote 1
                      • MaDillM
                        MaDill
                        last edited by

                        Thank you for the explanation.

                        1 Reply Last reply Reply Quote 0
                        • First post
                          Last post
                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors