Hello, @peterjones,
First, please read this post to @coises, where I discuss the Unicode concept of identifiers, especially in Perl!
As explained at the end of that post, I've created a second version of my perl.xml parser, which should work correctly without any significant delay!
In short:
I do NOT use any atomic structure (no atomic groups and no possessive quantifiers such as *+ or ?+)!
In the mainExpr of the classRange, I do NOT use a named group; I simply repeat the part ^ (?: package | class ) \b twice!
I changed your prototype / signature syntax (?:\([^()]*+\)\s*+)?+ to (?: \( [\x20-\x7E\w]* \) \s* )?
I changed your attributes syntax (?:\:[^{]+)?+ to (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?
In both syntaxes above, I simply added \w to each character class, so that non-ASCII word characters, such as accented letters, are accepted too.
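To see what adding \w changes, here is a small Python sketch (not Notepad++ itself) that tries the signature class with and without \w on a signature containing accented letters. It assumes Python's re module is a fair stand-in for the Boost engine used by Notepad++: for str patterns its \w is Unicode-aware. The sample signature is hypothetical.

```python
import re

# Signature class without and with \w (free-spacing mode, as in the XML).
PROTO_ASCII = re.compile(r"\( [\x20-\x7E]* \)", re.X)
PROTO = re.compile(r"\( [\x20-\x7E\w]* \)", re.X)
# Attributes class: every printable ASCII char except '{' (\x7B), plus \w.
ATTRIB = re.compile(r": [\x20-\x7A\x7C-\x7E\w]+", re.X)

signature = "($café, $été)"  # hypothetical signature with accented names
print(bool(PROTO_ASCII.fullmatch(signature)))  # False: 'é' is outside \x20-\x7E
print(bool(PROTO.fullmatch(signature)))        # True: \w accepts 'é'

# The attributes part stops right before the '{' that opens the body.
print(repr(ATTRIB.match(":prototype($) { return 1 }").group()))  # ':prototype($) '
```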
Note that, according to this article https://www.effectiveperlprogramming.com/2015/04/use-v5-20-subroutine-signatures/, the following syntax seems to be valid:
sub animals ( $cat, $auto_id = get_id() ) {
say "$auto_id: The cat is $cat";
}
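So, for the prototype / signature syntax, I’ve allowed parentheses within the outer parentheses (the range \x20-\x7E includes both ( and )). If this example does not seem relevant, use this alternate syntax instead, which excludes ( and ) from the character class:
(?: \( [\x20-\x27\x2A-\x7E\w]* \) \s* )?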
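The practical difference between the two signature classes appears on the article's example, which contains get_id(). The Python sketch below assumes that the intended alternate class is [\x20-\x27\x2A-\x7E\w]. That class covers every printable ASCII character except ( (\x28) and ) (\x29), plus \w. The sketch also uses Python's re as a stand-in for Boost.

```python
import re

NESTED_OK = re.compile(r"\( [\x20-\x7E\w]* \)", re.X)                 # main syntax
NO_INNER_PARENS = re.compile(r"\( [\x20-\x27\x2A-\x7E\w]* \)", re.X)  # alternate syntax

for sig in ["($)", "( $cat, $auto_id = get_id() )"]:
    print(sig, bool(NESTED_OK.fullmatch(sig)), bool(NO_INNER_PARENS.fullmatch(sig)))
# ($) True True
# ( $cat, $auto_id = get_id() ) True False
```

With the alternate class, the optional signature group is simply skipped on such a line, and \{ then faces the opening parenthesis. A sub like animals would therefore most likely not appear in the Function List at all.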
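Finally, I changed the className regex (?x)\s\K[^;{]+ to the following one, which no longer includes the extra leading spaces or the trailing spaces around the name:
(?x) \s+ \K .+? (?= \x20* [;{] )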
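Here is a quick Python comparison of the old and new className regexes on a few headers. Python's re has no \K, so a capture group stands in for "discard text matched so far". The header class   Spaced { is a hypothetical example.

```python
import re

headers = ["package NameSpace::Block {", "package Foo;", "class   Spaced {", "class Chaîne{"]

OLD = re.compile(r"\s([^;{]+)")                        # (?x)\s\K[^;{]+
NEW = re.compile(r"(?x) \s+ (.+?) (?= \x20* [;{] )")   # (?x) \s+ \K .+? (?= \x20* [;{] )

for h in headers:
    print(repr(OLD.search(h).group(1)), repr(NEW.search(h).group(1)))
# 'NameSpace::Block ' 'NameSpace::Block'
# 'Foo' 'Foo'
# '  Spaced ' 'Spaced'
# 'Chaîne' 'Chaîne'
```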
BTW, my parser currently contains 13 occurrences of \s. Maybe the \h or even the [\t\x20] syntax would be more appropriate in some places?
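The practical difference between \s and [\t\x20] (or \h) is whether a match may cross a line break. Here is a tiny Python check. Python's re has no \h, so [\t\x20] stands in for it, and the input text is hypothetical.

```python
import re

text = "sub\nfoo {"  # keyword and name on two different lines
print(bool(re.match(r"sub\s+\w+", text)))        # True: \s also matches "\n"
print(bool(re.match(r"sub[\t\x20]+\w+", text)))  # False: only tabs and spaces
```

Where a line break is legitimately expected, such as the \s* before ^ in the classRange look-ahead, \s seems the natural choice. Elsewhere, \h would keep a match on a single line.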
<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|
| To learn how to make your own language parser, please check the following
| link:
| https://npp-user-manual.org/docs/function-list/
|
\=========================================================================== -->
<NotepadPlus>
<functionList>
<!-- ======================================================== [ PERL ] -->
<!-- Perl - functions and packages, including fully-qualified subroutine names -->
<parser
displayName="Perl" id="perl_syntax"
commentExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-s: # 'Multi-lines' mode ( ^ and $ match at line-breaks ) / 'Dot' char does NOT match line-breaks
\x23 .* # Single Line Comment ( #................ )
) #
| # OR
(?s: # 'Single line' mode (letter s optional as mode set by DEFAULT)
__ (?: END | DATA ) __ # String '__END__' or '__DATA__'
.* # ANY character(s), including line-breaks, till...
\Z # Last line-break, included
)
"
>
<classRange
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Case-sensitive' mode
^ # NO leading white-space at start of line
(?: package | class ) \b # Header : word 'package' or 'class', in LOWER case
(?s: # 'Single line' mode (letter s optional as mode set by DEFAULT)
.+? # ANY character(s), including line-breaks, till...
) # Section below, excluded
(?= # Start of look-ahead
\s* # Optional white-space(s) before
^ # NO leading white-space at start of line
(?: package | class ) \b # Next header : word 'package' or 'class', in LOWER case
| # OR
\Z # last line-break
) # End of look-ahead
"
>
<className>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
\s+ # Leading white-space(s)
\K # Discard text matched so far
.+? # ANY character(s) till...
(?= \x20* [;{] ) # First semi-colon or left brace, excluded
"
/>
</className>
<function
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Case-sensitive' mode
^ \h* # Optional leading spaces or tabs
(?: sub | method ) \b # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
(?: \w+ :: )* # Optional list of words, EACH followed by ::
\w+ # Word character(s)
\s* # Optional white-space character(s)
(?: \( [\x20-\x7E\w]* \) \s* )? # Optional Prototype or Signature section
(?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )? # Optional Attributes section
\{ # Start of function body
"
>
<functionName>
<funcNameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired)
(?: \w+ :: )* # Optional prefix:: part ( package:: / names:: )
\w+ # Word character(s)
"
/>
</functionName>
</function>
</classRange>
<function
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Case-sensitive' mode
^ \h* # Optional leading spaces or tabs
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
(?: \w+ :: )* # Optional list of words, EACH followed by ::
\w+ # Word character(s)
\s* # Optional white-space character(s)
(?: \( [\x20-\x7E\w]* \) \s* )? # Optional Prototype or Signature section
(?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )? # Optional Attributes section
\{ # Start of function body
"
>
<functionName>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far (move this line right before \w+ if the 'prefix::' part is NOT desired)
(?: \w+ :: )* # Optional prefix:: part ( package:: / names:: )
\w+ # Word character(s)
"
/>
</functionName>
<className>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far
\w+ # Word character(s)
(?: :: \w+ )* # Optional list of words, EACH preceded by ::
(?= :: \w ) # Till a last string ':: + word char' excluded
"
/>
</className>
</function>
</parser>
</functionList>
</NotepadPlus>
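The top-level function part of the parser can be approximated in Python for experiments outside Notepad++. This is only a rough sketch, and it makes several assumptions:
- \K is replaced by capture groups and \h by [ \t], because Python's re module supports neither.
- The commentExpr is simplified.
- Comments are just deleted before searching, which only approximates how Notepad++ skips commented ranges.

The sample lines are taken from the unitTest additions in this post.

```python
import re

# Rough, hypothetical Python port of the top-level <function> section.
COMMENT = re.compile(r"(?x) \x23 [^\n]* | __ (?: END | DATA ) __ [\s\S]*")

FUNC_MAIN = re.compile(r"""(?mx)
    ^ [ \t]*                                  # optional leading spaces or tabs
    (?: sub | method )                        # keyword, lower case
    \s+
    (?: \w+ :: )* \w+                         # optional prefix:: part, then name
    \s*
    (?: \( [\x20-\x7E\w]* \) \s* )?           # optional prototype or signature
    (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?      # optional attributes
    \{                                        # start of function body
""")
FUNC_NAME = re.compile(r"(?x) (?: sub | method ) \s+ ( (?: \w+ :: )* \w+ )")
CLASS_NAME = re.compile(r"(?x) (?: sub | method ) \s+ ( \w+ (?: :: \w+ )* ) (?= :: \w )")


def list_functions(text):
    """Return (class name or None, function name) pairs in document order."""
    code = COMMENT.sub("", text)  # crude: drop comments and the __END__ part
    entries = []
    for match in FUNC_MAIN.finditer(code):
        header = match.group()
        name = FUNC_NAME.search(header).group(1)
        owner = CLASS_NAME.search(header)
        entries.append((owner.group(1) if owner else None, name))
    return entries


sample = """\
# sub commented_out { }
sub animals ( $cat, $auto_id = get_id() ) {
    say "$auto_id: the cat is $cat";
}
sub _function_été {
    return 1
}
sub grâce::Hôte { return 'running' }
__END__
sub after_end { }
"""
print(list_functions(sample))
# [(None, 'animals'), (None, '_function_été'), ('grâce', 'grâce::Hôte')]
```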
In the https://github.com/notepad-plus-plus/notepad-plus-plus/blob/a91b22bd8337465e04c1afa30cb71f7909340293/PowerEditor/Test/FunctionList/perl/unitTest file, I added text at several locations:
Before the line
############### Start ###############
################ Added by guy038 to test Notepad++'s FunctionList
sub animals ( $cat, $auto_id = get_id() ) {
say "$auto_id: the cat is $cat";
}
sub _function_été {
return 1
}
Before the line
package NameSpace::Block {
################ Added by guy038 to test Notepad++'s FunctionList
sub grâce::Hôte { return 'running' }
sub grâce::Son_ø { return 'stopped' }
#################################################################
At the very end of the file:
################ Added by guy038 to test Notepad++'s FunctionList
class NewClassSyntax {
method inBlock { return 1 }
method inBlockProto($) { return $_[0] }
method inBlockAttrib :prototype($) { return $_[0] }
}
class Chaîne{
method inBlock { return 1 }
method Dûment($) { return $_[0] }
method ƒ_Hameçon :prototype($) { return $_[0] }
}
#################################################################
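The classRange part can be tried the same way on the class test text above. Here is a hypothetical Python port under the same assumptions: capture groups replace \K and [ \t] replaces \h. The scoped (?s: ... ) group needs Python 3.6 or later.

```python
import re

# Hypothetical Python port of the <classRange> section.
CLASS_RANGE = re.compile(r"""(?mx)
    ^ (?: package | class ) \b                 # header at start of line
    (?s: .+? )                                 # anything, line breaks included, till...
    (?= \s* ^ (?: package | class ) \b | \Z )  # ...the next header or the end of text
""")
CLASS_NAME = re.compile(r"(?x) \s+ ( .+? ) (?= \x20* [;{] )")
METHOD_MAIN = re.compile(r"""(?mx)
    ^ [ \t]* (?: sub | method ) \b \s+
    (?: \w+ :: )* \w+ \s*
    (?: \( [\x20-\x7E\w]* \) \s* )?
    (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?
    \{
""")
METHOD_NAME = re.compile(r"(?x) (?: sub | method ) \s+ ( (?: \w+ :: )* \w+ )")


def list_classes(text):
    """Map each package/class name to the sub/method names of its section."""
    result = {}
    for section in CLASS_RANGE.finditer(text):
        body = section.group()
        name = CLASS_NAME.search(body).group(1)
        result[name] = [METHOD_NAME.search(m.group()).group(1)
                        for m in METHOD_MAIN.finditer(body)]
    return result


sample = """\
class NewClassSyntax {
    method inBlock { return 1 }
    method inBlockProto($) { return $_[0] }
    method inBlockAttrib :prototype($) { return $_[0] }
}
class Chaîne{
    method inBlock { return 1 }
    method Dûment($) { return $_[0] }
    method ƒ_Hameçon :prototype($) { return $_[0] }
}
"""
print(list_classes(sample))
# {'NewClassSyntax': ['inBlock', 'inBlockProto', 'inBlockAttrib'], 'Chaîne': ['inBlock', 'Dûment', 'ƒ_Hameçon']}
```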
In terms of speed, the Function List panel seems to be displayed quickly. I also ran a test in which I copied UniTest.txt twice and then, using a regex, added _1, _2 and _3 to the end of the different names: the Function List panel still appeared without delay!
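Here is one possible way to build a similar stress-test file outside Notepad++, sketched in Python. The helper stress_text and the DECL regex are hypothetical; this is not the exact regex used for the test above. DECL only suffixes names that follow sub, method, package or class at the start of a line.

```python
import re

# Hypothetical: declared name = keyword, white-space, optional prefix::, word.
DECL = re.compile(r"(?m)^([ \t]*(?:sub|method|package|class)\s+(?:\w+::)*\w+)")


def stress_text(text, copies=3):
    """Concatenate `copies` versions of `text`, suffixing names with _1, _2, ..."""
    return "".join(DECL.sub(lambda m: f"{m.group(1)}_{i}", text)
                   for i in range(1, copies + 1))


print(stress_text("sub foo { }\nclass Bar {\n}\n"), end="")
# sub foo_1 { }
# class Bar_1 {
# }
# sub foo_2 { }
# class Bar_2 {
# }
# sub foo_3 { }
# class Bar_3 {
# }
```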
Best Regards,
guy038