Hello, @peterjones and All,
I finally succeeded in building a new perl.xml parser file !
In short :
I do NOT use any atomic construction !
In mainExpr of the class range, I do NOT use a named group but, simply, use the part ^ (?: package | class ) \b, twice !
I changed your prototype / signature syntax (?:\([^()]*+\)\s*+)?+ to (?: \( [\x20-\x27\x2A-\x7E]* \) \s* )?
I changed your attributes syntax (?:\:[^{]+)?+ to (?: : [\x20-\x7A\x7C-\x7E]+ \s* )?
So, for these two syntaxes, I simply assumed that only standard ASCII characters are used, from \x20 to \x7E, except for \x28 and \x29 in the first part and \x7B in the second part ! Maybe \t should be part of each character class, too !
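To double-check which characters these two classes leave out, here is a small Python sketch. It only tests the character classes themselves, not the whole parser, and the variable names are mine, chosen for illustration:

```python
import re

# The two character classes used in the prototype / attributes parts
PROTO_CLASS = re.compile(r"[\x20-\x27\x2A-\x7E]")
ATTR_CLASS = re.compile(r"[\x20-\x7A\x7C-\x7E]")

# All printable ASCII characters, from \x20 to \x7E
printable = [chr(code) for code in range(0x20, 0x7F)]

# Characters of that range which each class does NOT accept
proto_excluded = [ch for ch in printable if not PROTO_CLASS.fullmatch(ch)]
attr_excluded = [ch for ch in printable if not ATTR_CLASS.fullmatch(ch)]

# The tabulation is outside \x20-\x7E, so neither class accepts it
tab_in_proto = bool(PROTO_CLASS.fullmatch("\t"))
tab_in_attr = bool(ATTR_CLASS.fullmatch("\t"))

print(proto_excluded, attr_excluded, tab_in_proto, tab_in_attr)
```

So a signature or an attribute list containing a tabulation would not be matched, unless \t is added to both classes.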
I changed the class name regex from (?x)\s\K[^;{]+ to (?x)\s+\K.+?(?=[;{])
BTW, my parser presently contains 13 \s strings. Maybe \h, or even [\t\x20], would be more appropriate in some parts ?
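The practical difference is that \s also matches line-breaks, whereas [\t\x20] does not. A minimal Python illustration follows. Note that Python's re module does not support \h, so [\t\x20] is used as a stand-in, and the sample text is made up:

```python
import re

# A 'sub' keyword and its name split over two lines (made-up sample)
text = "sub\nfoo {"

# \s+ accepts the line-break between 'sub' and 'foo'
with_s = re.search(r"sub\s+foo", text)

# [\t\x20]+ only accepts spaces and tabulations, so it does not match here
with_blank = re.search(r"sub[\t\x20]+foo", text)

print(with_s is not None, with_blank is not None)
```

Whether a name on the next line should still be detected is a design choice, so each \s of the parser would need to be looked at separately.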
Also, how many optional parts (?: \w+ :: )* may exist, before the mandatory \w+ of the function name ?
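As far as I understand, Perl package names can be nested to any depth, so the * quantifier, which puts no limit on the number of prefixes, seems the safe choice. A quick Python check of that sub-pattern, with made-up names:

```python
import re

# Same sub-pattern as in the parser, without the free-spacing layout
NAME_RE = re.compile(r"(?:\w+::)*\w+")

candidates = ["foo", "Foo::bar", "A::B::C::D::e", "Foo::"]

# fullmatch() tells whether the WHOLE candidate is a valid qualified name
accepted = [name for name in candidates if NAME_RE.fullmatch(name)]

print(accepted)
```

The last candidate is rejected because the mandatory \w+ after the final :: is missing.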
Anyway, this is a first draft. As I’m definitely not a Perl expert, I probably missed a lot !
So, here is the first version of my Perl.xml parser :
<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|
| To learn how to make your own language parser, please check the following
| link:
| https://npp-user-manual.org/docs/function-list/
|
\=========================================================================== -->
<NotepadPlus>
<functionList>
<!-- ======================================================== [ PERL ] -->
<!-- Perl - functions and packages, including fully-qualified subroutine names -->
<parser
displayName="Perl" id="perl_syntax"
commentExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-s: # 'Multi-lines' mode ( ^ and $ match at line-breaks ) / 'Dot' char does NOT match line-breaks
\x23 .* # Single Line Comment ( #................ )
) #
| # OR
(?s: # 'Single line' mode (letter s optional as mode set by DEFAULT)
__ (?: END | DATA ) __ # String '__END__' or '__DATA__'
.* # ANY character(s), including line-breaks, till...
\Z # Last line-break, included
)
"
>
<classRange
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
^ # NO leading white-space at start of line
(?: package | class ) \b # Header : word 'package' or 'class', in LOWER case
(?s: # 'Single line' mode (letter s optional as mode set by DEFAULT)
.+? # ANY character(s), including line-breaks, till...
) # Section below, excluded
(?= # Start of look-ahead
\s* # Optional leading white-space of
^ # NO leading white-space at start of line
(?: package | class ) \b # Next header : word 'package' or 'class', in LOWER case
| # OR
\Z # last line-break
) # End of look-ahead
"
>
<className>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
\s+ # Leading white-space(s)
\K # Discard text matched so far
.+? # ANY character(s) till...
(?= [;{] ) # First semi-colon or left brace, excluded
"
/>
</className>
<function
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
^ \h* # Optional leading spaces or tabulations
(?: sub | method ) \b # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
(?: \w+ :: )* # Optional list of words EACH followed with ::
\w+ # Word character(s)
\s* # Optional white-space character(s)
(?: \( [\x20-\x27\x2A-\x7E]* \) \s* )? # Optional Prototype or Signature section
(?: : [\x20-\x7A\x7C-\x7E]+ \s* )? # Optional Attributes section
\{ # Start of function body
"
>
<functionName>
<funcNameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired)
(?: \w+ :: )* # Optional prefix:: part ( package:: / names:: )
\w+ # Word character(s)
"
/>
</functionName>
</function>
</classRange>
<function
mainExpr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?m-i) # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
^ \h* # Optional leading spaces or tabulations
(?: sub | method ) \b # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
(?: \w+:: )* # Optional list of words, EACH followed with ::
\w+ # Word character(s)
\s* # Optional white-space character(s)
(?: \( [\x20-\x27\x2A-\x7E]* \) \s* )? # Optional Prototype or Signature section
(?: : [\x20-\x7A\x7C-\x7E]+ \s* )? # Optional Attributes section
\{ # Start of function body
"
>
<functionName>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired)
(?: \w+ :: )* # Optional prefix:: part ( package:: / names:: )
\w+ # Word character(s)
"
/>
</functionName>
<className>
<nameExpr expr="(?x) # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
(?: sub | method ) # Word 'sub' or 'method', in LOWER case
\s+ # White-space character(s)
\K # Discard text matched, so far
\w+ # Word character(s)
(?: :: \w+ )* # Optional list of words, EACH preceded with ::
(?= :: \w ) # Till a last string ':: + word char' excluded
"
/>
</className>
</function>
</parser>
</functionList>
</NotepadPlus>
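For a quick test outside Notepad++, the function mainExpr can be approximated in Python. This is only a rough sketch, under two assumptions: Python's re module supports neither \h nor \K, so I replaced \h with [\t\x20] and \K with a capture group around the name. The sample Perl lines are made up:

```python
import re

# Approximation of the function mainExpr (see the assumptions above)
FUNC_RE = re.compile(r"""
    ^ [\t\x20]*                               # Optional leading spaces or tabulations
    (?: sub | method ) \b                     # Word 'sub' or 'method'
    \s+                                       # White-space character(s)
    ( (?: \w+ :: )* \w+ )                     # Captured name, with its optional prefix:: parts
    \s*                                       # Optional white-space character(s)
    (?: \( [\x20-\x27\x2A-\x7E]* \) \s* )?    # Optional Prototype or Signature section
    (?: : [\x20-\x7A\x7C-\x7E]+ \s* )?        # Optional Attributes section
    \{                                        # Start of function body
""", re.VERBOSE | re.MULTILINE)

SAMPLE = (
    "package Foo::Bar;\n"
    "sub new {\n"
    "sub Foo::Bar::baz ($x, $y) {\n"
    "    method run :lvalue {\n"
    "sub declared_only;\n"
)

# With a single capture group, findall() returns the captured names only
names = FUNC_RE.findall(SAMPLE)
print(names)
```

The forward declaration on the last line is skipped, as expected, because no { follows the name.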
Maybe it would be interesting to compare my version to yours, in terms of speed. To my mind, it seems similar !?
Best Regards,
guy038