Hello, @peterjones,
First, read this post to @coises, where I discuss the Unicode concept of identifiers, particularly in Perl !
Thus, as explained at the end of that post, I created a second version of my perl.xml file parser which should work correctly without significant delay !
In short :
I do NOT use any atomic structure !
In mainExpr of the class range, I do NOT use a named group but, simply, use the part ^ (?: package | class ) \b, twice !
I changed your prototype / signature syntax (?:\([^()]*+\)\s*+)?+ to (?: \( [\x20-\x7E\w]* \) \s* )?
I changed your attributes syntax (?:\:[^{]+)?+ to (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?
In the two syntaxes above, I simply added \w within each character class
Note that, from this article https://www.effectiveperlprogramming.com/2015/04/use-v5-20-subroutine-signatures/, the following syntax seems possible :
sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: The cat is $cat"; }Thus, for prototype / signature syntax, I’ve allowed parentheses within the outer parentheses. If this example seems not pertinent, use the alternate syntax :
(?: \( [\x20-\x27\x2A\x7E\w]* \) \s* )?
Finally, I changed the regex class name (?x)\s\K[^;{]+ to (?x) \s+ \K .+? (?= \x20* [;{] )BTW, my parser presently contains 13 strings \s. May be, the \h or even the [\t\x20] syntax should be more appropriate, in some parts ?
<?xml version="1.0" encoding="UTF-8" ?> <!-- ==========================================================================\ | | To learn how to make your own language parser, please check the following | link: | https://npp-user-manual.org/docs/function-list/ | \=========================================================================== --> <NotepadPlus> <functionList> <!-- ======================================================== [ PERL ] --> <!-- Perl - functions and packages, including fully-qualtified subroutine names --> <parser displayName="Perl" id="perl_syntax" commentExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?m-s: # 'Multi-lines' mode ( ^ and $ match at line-breaks ) / 'Dot' char does NOT match line-breaks \x23 .* # Single Line Comment ( #................ ) ) # | # OR (?s: # 'Single line' mode (letter s optional as mode set by DEFAULT) __ (?: END | DATA ) __ # String '__END__' or '__DATA__' .* # ANY character(s), including line-breaks, till... \Z # Last line-break, included ) " > <classRange mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode ^ # NO leading white-space at start of line (?: package | class ) \b # Header : word 'package' or 'clas', in LOWER case (?s: # 'Single line' mode (letter s optional as mode set by DEFAULT) .+? # ANY character(s), including line-breaks, till... ) # Section below, excluded (?= # Start of look-ahead \s* # Optional leading white-space of ^ # NO leading white-space at start of line (?: package | class ) \b # Next header : word 'package' or 'clas', in LOWER case | # OR \Z # last line-break ) # End of look-ahead " > <className> <nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) \s+ # Leading white-space(s) \K # Discard text matched so far .+? # ANY character(s) till... (?= \x20* [;{] ) # First semi-colon or left brace, excluded " /> </className> <function mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?m-i) # 'Mutli-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode ^ \h* # Optional leading spaces or tabulations (?: sub | method ) \b # Word 'sub' or 'method', in LOWER case \s+ # White-space character(s) (?: \w+ :: )* # Optional list of words EACH followed with :: \w+ # Word character(s) \s* # Optional white-space character(s) (?: \( [\x20-\x7E\w]* \) \s* )? # Optional Prototype or Signature section (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )? # Optional Attributes section \{ # Start of function body " > <functionName> <funcNameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?: sub | method ) # Word 'sub' or 'method', in LOWER case \s+ # White-space character(s) \K # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired) (?: \w+ :: )* # Optional prefix:: part ( package:: / names:: ) \w+ # Word character(s) " /> </functionName> </function> </classRange> <function mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?m-i) # 'Mutli-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode ^ \h* # Optional leading spaces or tabulations (?: sub | method ) # Word 'sub' or 'method', in LOWER case \s+ # White-space character(s) (?: \w+ :: )* # Optional list of words, EACH followed with :: \w+ # Word character(s) \s* # Optional white-space character(s) (?: \( [\x20-\x7E\w]* \) \s* )? # Optional Prototype or Signature section (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )? # Optional Attributes section \{ # Start of function body " > <functionName> <nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?: sub | method ) # Word 'sub' or 'method', in LOWER case \s+ # White-space character(s) \K # Discard text matched, so far ( move this line right before \w+ if part 'prefix::' NOT desired (?: \w+ :: )* # Optional prefix:: part ( package:: / names:: ) \w+ # Word character(s) " /> </functionName> <className> <nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`) (?: sub | method ) # Word 'sub' or 'method', in LOWER case \s+ # White-space character(s) \K # Discard text matched, so far \w+ # Word character(s) ( :: \w+ )* # Optional list of words, EACH preceded with :: (?= :: \w ) # Till a last string ':: + word char' excluded " /> </className> </function> </parser> </functionList> </NotepadPlus>In the https://github.com/notepad-plus-plus/notepad-plus-plus/blob/a91b22bd8337465e04c1afa30cb71f7909340293/PowerEditor/Test/FunctionList/perl/unitTest file, I added text at various locations :
Before the line ############### Start ############### ################ Added by guy038 to test Notepad++'s FunctionList sub animals ( $cat, $autoid = get_id() ) { say "$auto_id: the cat is $cat"; } sub _function_été { return 1 } Before the line package NameSpace::Block { ################ Added by guy038 to test Notepad++'s FunctionList sub grâce::Hôte { return 'running' } sub grâce::Son_ø { return 'stopped' } ################################################################# At the very end of file : ################ Added by guy038 to test Notepad++'s FunctionList class NewClassSyntax { method inBlock { return 1 } method inBlockProto($) { return $_[0] } method inBlockAttrib :prototype($) { return $_[0] } } class Chaîne{ method inBlock { return 1 } method Dûment($) { return $_[0] } method ƒ_Hameçon :prototype($) { return $_[0] } } #################################################################In terms of speed, the Function List panel seems quickly displayed. I also did a test copying UniTest.txt twice, and then adding, by regex, _1, _2 and _3 at end of the different names, the Function List panel still appeared without delay !
Best Regards,
guy038