Community
    • U

      Localization problem

      Translation
      1 Votes
      4 Posts
      73 Views
      PeterJones

      @Uwo222777 said in Localization problem:

      I use my own version of localization (but not the one that comes “bundled” with the Notepad++ release).
      …
      But, after restarting Notepad++, two languages are consistently present in the search window. And this effect is repeated very steadily.

      Not for me. It probably has something to do with your custom localization. Are you sure that your localization has all the fields that the most recent official localizations do? Because if any of those are missing, then it will default to the English terms for those fields. (The Find in Projects entries are in another location of the file, not in the <ProjectManager> section, so I’d look for those values, to make sure your localization has those defined.)
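      One way to check for missing fields is to diff the attribute keys of the official English file against the custom one. The following is only a minimal sketch: `collect_keys` and `missing_entries` are hypothetical helper names, the two XML snippets are made-up stand-ins rather than real Notepad++ localization content, and it assumes that entries are identified by their element path plus an optional `id` attribute.

```python
import xml.etree.ElementTree as ET

def collect_keys(xml_text):
    """Return one 'path@attribute' key per translatable attribute in the tree."""
    keys = set()

    def walk(node, path):
        ident = node.get("id")
        here = f"{path}/{node.tag}" + (f"[id={ident}]" if ident is not None else "")
        for attr in node.attrib:
            if attr != "id":          # "id" identifies the entry, it is not translated
                keys.add(f"{here}@{attr}")
        for child in node:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return keys

def missing_entries(reference_xml, custom_xml):
    """Keys present in the reference localization but absent from the custom one."""
    return sorted(collect_keys(reference_xml) - collect_keys(custom_xml))

# Made-up sample data (NOT taken from a real localization file).
REFERENCE = """<NotepadPlus><Native-Langue name="English"><Dialog>
  <Find title="Find">
    <Item id="1" name="Find Next"/>
    <Item id="2" name="Find All in Projects"/>
  </Find>
</Dialog></Native-Langue></NotepadPlus>"""

CUSTOM = """<NotepadPlus><Native-Langue name="Custom"><Dialog>
  <Find title="Rechercher">
    <Item id="1" name="Suivant"/>
  </Find>
</Dialog></Native-Langue></NotepadPlus>"""

print(missing_entries(REFERENCE, CUSTOM))
```

      Any key reported by such a comparison is a candidate for a field that would fall back to the English term.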

    • donho

      Notepad++ v8.9.2 Release

      Announcements
      1 Votes
      11 Posts
      5k Views
      Coises

      @PeterJones said in Notepad++ v8.9.2 Release:

      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/17540

      Thanks. I should know better… I forgot to search closed issues, not just open ones.

    • J

      Perl keywords "class" and "method" not recognised by Function List

      Help wanted · · · – – – · · ·
      0 Votes
      13 Posts
      854 Views
      guy038

      Hello, @peterjones,

      First, read this post to @coises, where I discuss the Unicode concept of identifiers, particularly in Perl !

      Thus, as explained at the end of that post, I created a second version of my perl.xml file parser which should work correctly without significant delay !

      In short :

      I do NOT use any atomic structure !

      In mainExpr of the class range, I do NOT use a named group but, simply, use the part ^ (?: package | class ) \b, twice !

      I changed your prototype / signature syntax (?:\([^()]*+\)\s*+)?+ to (?: \( [\x20-\x7E\w]* \) \s* )?

      I changed your attributes syntax (?:\:[^{]+)?+ to (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?

      In the two syntaxes above, I simply added \w within each character class

      Note that, from this article https://www.effectiveperlprogramming.com/2015/04/use-v5-20-subroutine-signatures/, the following syntax seems possible :

      sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: The cat is $cat"; }

      Thus, for the prototype / signature syntax, I’ve allowed parentheses within the outer parentheses. If this example does not seem pertinent, use the alternate syntax, which excludes the \x28 and \x29 parentheses from the character class :

      (?: \( [\x20-\x27\x2A-\x7E\w]* \) \s* )?

      Finally, I changed the regex class name (?x)\s\K[^;{]+ to (?x) \s+ \K .+? (?= \x20* [;{] )

      BTW, my parser presently contains 13 occurrences of \s. Maybe the \h, or even the [\t\x20], syntax would be more appropriate in some parts ?

      <?xml version="1.0" encoding="UTF-8" ?>
      <!-- ==========================================================================\
      |
      |   To learn how to make your own language parser, please check the following
      |   link:
      |       https://npp-user-manual.org/docs/function-list/
      |
      \=========================================================================== -->
      <NotepadPlus>
          <functionList>
              <!-- ======================================================== [ PERL ] -->
              <!-- Perl - functions and packages, including fully-qualified subroutine names -->
              <parser
                  displayName="Perl"
                  id="perl_syntax"
                  commentExpr="(?x)                                   # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                               (?m-s:                                 # 'Multi-lines' mode ( ^ and $ match at line-breaks ) / 'Dot' char does NOT match line-breaks
                                   \x23 .*                            # Single Line Comment ( #................ )
                               )                                      #
                               |                                      # OR
                               (?s:                                   # 'Single line' mode (letter s optional as mode set by DEFAULT)
                                   __ (?: END | DATA ) __             # String '__END__' or '__DATA__'
                                   .*                                 # ANY character(s), including line-breaks, till...
                                   \Z                                 # Last line-break, included
                               )
                              "
              >
                  <classRange
                      mainExpr="(?x)                                  # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                (?m-i)                                # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                                ^                                     # NO leading white-space at start of line
                                (?: package | class ) \b              # Header : word 'package' or 'class', in LOWER case
                                (?s:                                  # 'Single line' mode (letter s optional as mode set by DEFAULT)
                                    .+?                               # ANY character(s), including line-breaks, till...
                                )                                     # Section below, excluded
                                (?=                                   # Start of look-ahead
                                    \s*                               # Optional leading white-space of
                                    ^                                 # NO leading white-space at start of line
                                    (?: package | class ) \b          # Next header : word 'package' or 'class', in LOWER case
                                    |                                 # OR
                                    \Z                                # last line-break
                                )                                     # End of look-ahead
                               "
                  >
                      <className>
                          <nameExpr expr="(?x)                        # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                          \s+                         # Leading white-space(s)
                                          \K                          # Discard text matched so far
                                          .+?                         # ANY character(s) till...
                                          (?= \x20* [;{] )            # First semi-colon or left brace, excluded
                                         "
                          />
                      </className>
                      <function
                          mainExpr="(?x)                                          # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                    (?m-i)                                        # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                                    ^ \h*                                         # Optional leading spaces or tabulations
                                    (?: sub | method ) \b                         # Word 'sub' or 'method', in LOWER case
                                    \s+                                           # White-space character(s)
                                    (?: \w+ :: )*                                 # Optional list of words EACH followed with ::
                                    \w+                                           # Word character(s)
                                    \s*                                           # Optional white-space character(s)
                                    (?: \( [\x20-\x7E\w]* \) \s* )?               # Optional Prototype or Signature section
                                    (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?          # Optional Attributes section
                                    \{                                            # Start of function body
                                   "
                      >
                          <functionName>
                              <funcNameExpr expr="(?x)                # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                                  (?: sub | method )  # Word 'sub' or 'method', in LOWER case
                                                  \s+                 # White-space character(s)
                                                  \K                  # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired)
                                                  (?: \w+ :: )*       # Optional prefix:: part ( package:: / names:: )
                                                  \w+                 # Word character(s)
                                                 "
                              />
                          </functionName>
                      </function>
                  </classRange>
                  <function
                      mainExpr="(?x)                                          # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                (?m-i)                                        # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                                ^ \h*                                         # Optional leading spaces or tabulations
                                (?: sub | method )                            # Word 'sub' or 'method', in LOWER case
                                \s+                                           # White-space character(s)
                                (?: \w+ :: )*                                 # Optional list of words, EACH followed with ::
                                \w+                                           # Word character(s)
                                \s*                                           # Optional white-space character(s)
                                (?: \( [\x20-\x7E\w]* \) \s* )?               # Optional Prototype or Signature section
                                (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?          # Optional Attributes section
                                \{                                            # Start of function body
                               "
                  >
                      <functionName>
                          <nameExpr expr="(?x)                        # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                          (?: sub | method )          # Word 'sub' or 'method', in LOWER case
                                          \s+                         # White-space character(s)
                                          \K                          # Discard text matched, so far ( move this line right before \w+ if part 'prefix::' NOT desired )
                                          (?: \w+ :: )*               # Optional prefix:: part ( package:: / names:: )
                                          \w+                         # Word character(s)
                                         "
                          />
                      </functionName>
                      <className>
                          <nameExpr expr="(?x)                        # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                          (?: sub | method )          # Word 'sub' or 'method', in LOWER case
                                          \s+                         # White-space character(s)
                                          \K                          # Discard text matched, so far
                                          \w+                         # Word character(s)
                                          ( :: \w+ )*                 # Optional list of words, EACH preceded with ::
                                          (?= :: \w )                 # Till a last string ':: + word char' excluded
                                         "
                          />
                      </className>
                  </function>
              </parser>
          </functionList>
      </NotepadPlus>

      In the https://github.com/notepad-plus-plus/notepad-plus-plus/blob/a91b22bd8337465e04c1afa30cb71f7909340293/PowerEditor/Test/FunctionList/perl/unitTest file, I added text at various locations :

      Before the line ############### Start ###############

      ################ Added by guy038 to test Notepad++'s FunctionList
      sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: the cat is $cat"; }
      sub _function_été { return 1 }

      Before the line package NameSpace::Block {

      ################ Added by guy038 to test Notepad++'s FunctionList
      sub grâce::Hôte { return 'running' }
      sub grâce::Son_ø { return 'stopped' }
      #################################################################

      At the very end of file :

      ################ Added by guy038 to test Notepad++'s FunctionList
      class NewClassSyntax {
          method inBlock { return 1 }
          method inBlockProto($) { return $_[0] }
          method inBlockAttrib :prototype($) { return $_[0] }
      }
      class Chaîne{
          method inBlock { return 1 }
          method Dûment($) { return $_[0] }
          method ƒ_Hameçon :prototype($) { return $_[0] }
      }
      #################################################################

      In terms of speed, the Function List panel seems to display quickly. I also ran a test where I copied the unitTest contents twice and then, by regex, added _1, _2 and _3 at the end of the different names : the Function List panel still appeared without delay !
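      For readers who want to experiment outside Notepad++, here is a rough Python sketch of the function header regex applied to a few of the test lines above. It is not the Boost engine: Python's `re` has no `\h` or `\K`, so `[\x20\t]` and a capture group stand in for them, and `HEADER_TEMPLATE`, `build` and `names` are names invented for this sketch. It compares the signature class that allows nested parentheses with the alternate one that excludes `\x28` and `\x29`.

```python
import re

# Assumption of this sketch: [\x20\t] replaces \h and a capture group replaces \K,
# because Python's re module supports neither.
HEADER_TEMPLATE = r"""
    ^ [\x20\t]*                             # Optional leading spaces or tabs
    (?: sub | method ) \b                   # Word 'sub' or 'method'
    \s+                                     # White-space character(s)
    ( (?: \w+ :: )* \w+ )                   # Captured name, with optional prefix parts
    \s*                                     # Optional white-space character(s)
    (?: \( SIGNATURE \) \s* )?              # Optional prototype or signature section
    (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?    # Optional attributes section
    \{                                      # Start of function body
"""

def build(signature_class):
    """Compile the header regex with the given signature character class."""
    pattern = HEADER_TEMPLATE.replace("SIGNATURE", signature_class)
    return re.compile(pattern, re.VERBOSE | re.MULTILINE)

NESTED_OK = build(r"[\x20-\x7E\w]*")            # parentheses allowed inside the signature
NO_NESTING = build(r"[\x20-\x27\x2A-\x7E\w]*")  # \x28 and \x29 excluded

SAMPLE = """\
sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: the cat is $cat"; }
sub _function_été { return 1 }
sub grâce::Hôte { return 'running' }
    method inBlockProto($) { return $_[0] }
    method ƒ_Hameçon :prototype($) { return $_[0] }
"""

def names(pattern, text):
    """Return the captured function names, in order of appearance."""
    return [m.group(1) for m in pattern.finditer(text)]

print(names(NESTED_OK, SAMPLE))
print(names(NO_NESTING, SAMPLE))
```

      In this sketch, only the first variant finds `animals`, because its signature contains the nested `get_id()` call; the other four names are found by both variants.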

      Best Regards,

      guy038

    • Coises

      Columns++ version 1.3: All Unicode, all the time

      Notepad++ & Plugin Development
      5 Votes
      21 Posts
      2k Views
      guy038

      Hello, @coises, @thomas-knoefel, @peterjones and All,

      @coises, many thanks for your additional info. But, please, don’t be too upset by these regex oddities ! Of course, some class definitions seem different but, in all cases, Columns++ gives more accurate results than the native N++ search, anyway !

      In fact, I did all this research on the Unicode world because I wanted to clarify the status of identifiers, particularly in Perl, in order to find a simplified formulation for the Function List Perl parser, created by @peterjones and improved, with your help, by using atomic structures !

      My first attempt was clearly insufficient because I only took ASCII characters into account. Peter advised me to refer to the article below :

      https://perldoc.perl.org/perldata#Identifier-parsing

      which explains that, when using UTF-8, the Perl identifier syntax should be :

      / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
        (?[ ( \p{Word} & \p{XID_Continue} ) ]) *    /x

      or in a SINGLE line

      (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])(?[ ( \p{Word} & \p{XID_Continue} ) ]) *

      Although the properties \p{XID_Start} and \p{XID_Continue} are NOT part of the General Category list and are not functional with the Boost regex engine, this Perl syntax could be expressed, in theory, with our Boost regex engine as :

      (?:(?=\p{XID_Start})\w|_)(?:(?=\p{XID_Continue})\w)*
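      To try this rule outside Boost, here is a small Python sketch. It relies on the fact that `str.isidentifier()` is defined in terms of XID_Start (plus `_`) and XID_Continue; `\p{Word}` is approximated with the general categories used in the WORD breakdown of this post (L*, M*, Nd, Pc, plus U+200C and U+200D), which is an assumption of this sketch and not Perl's exact definition. All function names are made up for illustration.

```python
import unicodedata

def is_word_char(ch):
    """Approximate \\p{Word}: L*, M*, Nd, Pc, plus ZWNJ (U+200C) and ZWJ (U+200D)."""
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat in ("Nd", "Pc") or ch in "\u200c\u200d"

def is_xid_start(ch):
    # A one-character string is an identifier if it is "_" or an XID_Start character.
    return ch != "_" and ch.isidentifier()

def is_xid_continue(ch):
    # After a known start character, only the XID_Continue test on ch remains.
    return ("a" + ch).isidentifier()

def is_perl_identifier(name):
    """First char: ( Word & XID_Start ) or "_" ; other chars: Word & XID_Continue."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    starts_ok = first == "_" or (is_word_char(first) and is_xid_start(first))
    return starts_ok and all(is_word_char(c) and is_xid_continue(c) for c in rest)

for candidate in ["_function_été", "ƒ_Hameçon", "9lives", "a-b"]:
    print(candidate, is_perl_identifier(candidate))
```

      The results depend on the Unicode version bundled with the Python interpreter in use.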

      Now, with the v17.0 release of the BabelMap software, I was able to get the complete and exact list of characters for these properties : \p{WORD}, \p{ID_Start}, \p{ID_Continue}, \p{XID_Start} and \p{XID_Continue}.

      Then, from these lists, I could deduce the Unicode characters count of the regexes (?:(?=\p{XID_Start})\w|_) and (?=\p{XID_Continue})\w. Refer below :

      #  ==================================================================================================
      #
      #  Unicode 17.0.0
      #
      #  From article https://unicode.org/reports/tr18/tr18-23.html#word
      #
      #  Derived Property WORD :
      #
      #      Lu + Ll + Lt + Lm + Lo =          #  L*   145,672   = \p{letter} or [[:alpha:]]
      #
      #    + Decimal_Number                    #  Nd       770   = \p{Decimal Digit Number}
      #                                             -----------
      #                            Total :          146,442   = Columns++ WORD chars - \x{005F}
      #
      #    + Mc + Me + Mn                      #  M*     2,543   = \p{Mark}
      #
      #    + Connector_Punctuation             #  Pc        10   ( including the LOW LINE character \x{005F} )
      #
      #    + 200C ; Other_ID_Continue          #  Cf         1   ZERO WIDTH NON-JOINER ( JOIN-CONTROL character )
      #
      #    + 200D ; Other_ID_Continue          #  Cf         1   ZERO WIDTH JOINER ( JOIN-CONTROL character )
      #
      #    => Total = 148,997 characters
      #
      #  ==================================================================================================
      #
      #  From file 'DerivedCoreProperties.txt' :
      #
      #  https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt
      #
      #  Derived Property ID_Start :
      #
      #      Lu + Ll + Lt + Lm + Lo =          #  L*   145,672   ( = [[:alpha:]] )
      #
      #    + Letter_Number                     #  Nl       239
      #
      #    + 1885 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI BALUDA
      #
      #    + 1886 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA
      #
      #    + 2118 ; Other_ID_Start             #  Sm         1   SCRIPT CAPITAL P
      #
      #    + 212E ; Other_ID_Start             #  So         1   ESTIMATED SYMBOL
      #
      #    + 309B ; Other_ID_Start             #  Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
      #
      #    + 309C ; Other_ID_Start             #  Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
      #
      #    - 2E2F ;                            #  Lm         1   VERTICAL TILDE ( as INCLUDED in L* )
      #
      #    => Total = 145,916 characters
      #
      #  ==================================================================================================
      #
      #  Derived Property XID_Start ( ID_Start MODIFIED for closure under NFKx ) :
      #
      #      ID_Start                                  145,916
      #
      #    - 037A ; ID_Start                   #  Lm         1   GREEK YPOGEGRAMMENI
      #
      #    - 0E33 ; ID_Start                   #  Lo         1   THAI CHARACTER SARA AM
      #
      #    - 0EB3 ; ID_Start                   #  Lo         1   LAO VOWEL SIGN AM
      #
      #    - 309B ; Other_ID_Start             #  Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
      #
      #    - 309C ; Other_ID_Start             #  Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
      #
      #    - FC5E ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
      #    - FC5F ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
      #    - FC60 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
      #    - FC61 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
      #    - FC62 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
      #    - FC63 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
      #
      #    - FDFA ; ID_Start                   #  Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
      #    - FDFB ; ID_Start                   #  Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
      #
      #    - FE70 ; ID_Start                   #  Lm         1   ARABIC FATHATAN ISOLATED FORM
      #    - FE72 ; ID_Start                   #  Lo         1   ARABIC DAMMATAN ISOLATED FORM
      #    - FE74 ; ID_Start                   #  Lo         1   ARABIC KASRATAN ISOLATED FORM
      #    - FE76 ; ID_Start                   #  Lo         1   ARABIC FATHA ISOLATED FORM
      #    - FE78 ; ID_Start                   #  Lo         1   ARABIC DAMMA ISOLATED FORM
      #    - FE7A ; ID_Start                   #  Lo         1   ARABIC KASRA ISOLATED FORM
      #    - FE7C ; ID_Start                   #  Lo         1   ARABIC SHADDA ISOLATED FORM
      #    - FE7E ; ID_Start                   #  Lo         1   ARABIC SUKUN ISOLATED FORM
      #
      #    - FF9E ; ID_Start                   #  Lm         1   HALFWIDTH KATAKANA VOICED SOUND MARK
      #    - FF9F ; ID_Start                   #  Lm         1   HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
      #
      #    => Total = 145,893 characters
      #
      #  ==================================================================================================
      #
      #  Derived Property ID_Continue :
      #
      #      ID_Start =                                145,916
      #
      #    - 1885 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI BALUDA
      #
      #    - 1886 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA
      #
      #      The TWO characters above must be SUBTRACTED because they are, both, INCLUDED in 'Other_ID_Start' and in 'Nonspacing Mark'
      #
      #    + Nonspacing_Mark                   #  Mn     2,059
      #
      #    + Spacing_Mark                      #  Mc       471
      #
      #    + Decimal_Number                    #  Nd       770
      #
      #    + Connector_Punctuation             #  Pc        10   ( including the LOW LINE char : 005F _ )
      #
      #    + 00B7 ; Other_ID_Continue          #  Po         1   MIDDLE DOT
      #    + 0387 ; Other_ID_Continue          #  Po         1   GREEK ANO TELEIA
      #    + 1369 ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT ONE
      #    + 136A ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT TWO
      #    + 136B ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT THREE
      #    + 136C ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT FOUR
      #    + 136D ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT FIVE
      #    + 136E ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT SIX
      #    + 136F ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT SEVEN
      #    + 1370 ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT EIGHT
      #    + 1371 ; Other_ID_Continue          #  No         1   ETHIOPIC DIGIT NINE
      #    + 19DA ; Other_ID_Continue          #  No         1   NEW TAI LUE THAM DIGIT ONE
      #    + 200C ; Other_ID_Continue          #  Cf         1   ZERO WIDTH NON-JOINER
      #    + 200D ; Other_ID_Continue          #  Cf         1   ZERO WIDTH JOINER
      #    + 30FB ; Other_ID_Continue          #  Po         1   KATAKANA MIDDLE DOT
      #    + FF65 ; Other_ID_Continue          #  Po         1   HALFWIDTH KATAKANA MIDDLE DOT
      #
      #    => Total = 149,240 characters
      #
      #  ==================================================================================================
      #
      #  Derived Property XID_Continue ( ID_Continue MODIFIED for closure under NFKx ) :
      #
      #      ID_Continue                               149,240
      #
      #    - 037A ; ID_Continue                #  Lm         1   GREEK YPOGEGRAMMENI
      #
      #    - 309B ; ID_Continue                #  Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
      #
      #    - 309C ; ID_Continue                #  Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
      #
      #    - FC5E ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
      #    - FC5F ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
      #    - FC60 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
      #    - FC61 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
      #    - FC62 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
      #    - FC63 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
      #
      #    - FDFA ; ID_Continue                #  Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
      #    - FDFB ; ID_Continue                #  Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
      #
      #    - FE70 ; ID_Continue                #  Lm         1   ARABIC FATHATAN ISOLATED FORM
      #    - FE72 ; ID_Continue                #  Lo         1   ARABIC DAMMATAN ISOLATED FORM
      #    - FE74 ; ID_Continue                #  Lo         1   ARABIC KASRATAN ISOLATED FORM
      #    - FE76 ; ID_Continue                #  Lo         1   ARABIC FATHA ISOLATED FORM
      #    - FE78 ; ID_Continue                #  Lo         1   ARABIC DAMMA ISOLATED FORM
      #    - FE7A ; ID_Continue                #  Lo         1   ARABIC KASRA ISOLATED FORM
      #    - FE7C ; ID_Continue                #  Lo         1   ARABIC SHADDA ISOLATED FORM
      #    - FE7E ; ID_Continue                #  Lo         1   ARABIC SUKUN ISOLATED FORM
      #
      #    => Total = 149,221 characters
      #
      #  ==================================================================================================
      #
      #  From https://perldoc.perl.org/perldata#Identifier-parsing
      #
      #  Intersection of WORD and XID_Start properties + LOW LINE char :
      #
      #      Lu + Ll + Lt + Lm + Lo =          #  L*   145,672   ( = \p{letter} or [[:alpha:]] )
      #
      #    + 005F ; Connector_Punctuation      #  Pc         1   LOW LINE
      #
      #    + 1885 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI BALUDA ( NON-SPACING mark, common in WORD and XID_Start )
      #
      #    + 1886 ; Other_ID_Start             #  Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA ( NON-SPACING mark, common in WORD and XID_Start )
      #
      #    - 037A ; ID_Start                   #  Lm         1   GREEK YPOGEGRAMMENI
      #
      #    - 0E33 ; ID_Start                   #  Lo         1   THAI CHARACTER SARA AM
      #
      #    - 0EB3 ; ID_Start                   #  Lo         1   LAO VOWEL SIGN AM
      #
      #    - 2E2F ;                            #  Lm         1   VERTICAL TILDE ( as ALREADY included in L* )
      #
      #    - FC5E ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
      #    - FC5F ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
      #    - FC60 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
      #    - FC61 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
      #    - FC62 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
      #    - FC63 ; ID_Start                   #  Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
      #
      #    - FDFA ; ID_Start                   #  Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
      #    - FDFB ; ID_Start                   #  Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
      #
      #    - FE70 ; ID_Start                   #  Lm         1   ARABIC FATHATAN ISOLATED FORM
      #    - FE72 ; ID_Start                   #  Lo         1   ARABIC DAMMATAN ISOLATED FORM
      #    - FE74 ; ID_Start                   #  Lo         1   ARABIC KASRATAN ISOLATED FORM
      #    - FE76 ; ID_Start                   #  Lo         1   ARABIC FATHA ISOLATED FORM
      #    - FE78 ; ID_Start                   #  Lo         1   ARABIC DAMMA ISOLATED FORM
      #    - FE7A ; ID_Start                   #  Lo         1   ARABIC KASRA ISOLATED FORM
      #    - FE7C ; ID_Start                   #  Lo         1   ARABIC SHADDA ISOLATED FORM
      #    - FE7E ; ID_Start                   #  Lo         1   ARABIC SUKUN ISOLATED FORM
      #
      #    - FF9E ; ID_Start                   #  Lm         1   HALFWIDTH KATAKANA VOICED SOUND MARK
      #    - FF9F ; ID_Start                   #  Lm         1   HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
      #
      #    => Total = 145,653 characters, which can START an IDENTIFIER
      #
      #  ==================================================================================================
      #
      #  From https://perldoc.perl.org/perldata#Identifier-parsing
      #
      #  Intersection of WORD and XID_Continue properties :
      #
      #      Lu + Ll + Lt + Lm + Lo =          #  L*   145,672   ( = \p{letter} or [[:alpha:]] )
      #
      #    + Nonspacing_Mark                   #  Mn     2,059
      #
      #    + Spacing_Mark                      #  Mc       471
      #
      #    + Decimal_Number                    #  Nd       770
      #
      #    + Connector_Punctuation             #  Pc        10   ( including the LOW LINE char : 005F _ )
      #
      #    + 200C ; Other_ID_Continue          #  Cf         1   ZERO WIDTH NON-JOINER ( FORMAT character, common in WORD and XID_Continue )
      #
      #    + 200D ; Other_ID_Continue          #  Cf         1   ZERO WIDTH JOINER ( FORMAT character, common in WORD and XID_Continue )
      #
      #    - 037A ; ID_Continue                #  Lm         1   GREEK YPOGEGRAMMENI
      #
      #    - 2E2F ;                            #  Lm         1   VERTICAL TILDE ( as ALREADY included in L* )
      #
      #    - FC5E ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
      #    - FC5F ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
      #    - FC60 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
      #    - FC61 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
      #    - FC62 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
      #    - FC63 ; ID_Continue                #  Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
      #
      #    - FDFA ; ID_Continue                #  Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
      #    - FDFB ; ID_Continue                #  Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
      #
      #    - FE70 ; ID_Continue                #  Lm         1   ARABIC FATHATAN ISOLATED FORM
      #    - FE72 ; ID_Continue                #  Lo         1   ARABIC DAMMATAN ISOLATED FORM
      #    - FE74 ; ID_Continue                #  Lo         1   ARABIC KASRATAN ISOLATED FORM
      #    - FE76 ; ID_Continue                #  Lo         1   ARABIC FATHA ISOLATED FORM
      #    - FE78 ; ID_Continue                #  Lo         1   ARABIC DAMMA ISOLATED FORM
      #    - FE7A ; ID_Continue                #  Lo         1   ARABIC KASRA ISOLATED FORM
      #    - FE7C ; ID_Continue                #  Lo         1   ARABIC SHADDA ISOLATED FORM
      #    - FE7E ; ID_Continue                #  Lo         1   ARABIC SUKUN ISOLATED FORM
      #
      #    => Total = 148,966 characters, which can CONTINUE an IDENTIFIER
      #

      However, the last two results, for (?:(?=\p{XID_Start})\w|_) and (?=\p{XID_Continue})\w, above, hold ONLY IF the regex engine respects all the Unicode properties. Unfortunately, this is not the case of the Boost regex engine, which :

      Only considers that word characters are all in the BMP

      Generally considers that word characters are those defined prior to the Unicode 5.3 release !

      I verified that, presently, only 47,681 characters can begin a Perl identifier and only 48,011 characters can continue a Perl identifier !
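      To get a feel for how much a BMP-only view removes, the two rules can be counted in Python, first over the BMP and then over all planes. This is only an illustrative sketch: it does not reproduce the Boost counts above, `\p{Word}` is approximated by the L*, M*, Nd, Pc, U+200C and U+200D breakdown used in this post, the XID tests rely on `str.isidentifier()`, and the totals depend on the Unicode version shipped with the Python interpreter. The function names are made up for this sketch.

```python
import sys
import unicodedata

def is_word_char(ch):
    """Approximate \\p{Word}: L*, M*, Nd, Pc, plus ZWNJ (U+200C) and ZWJ (U+200D)."""
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat in ("Nd", "Pc") or ch in "\u200c\u200d"

def can_start(ch):
    # ( Word & XID_Start ) or the LOW LINE character
    return ch == "_" or (is_word_char(ch) and ch.isidentifier())

def can_continue(ch):
    # Word & XID_Continue
    return is_word_char(ch) and ("a" + ch).isidentifier()

def count_identifier_chars(limit):
    """Count code points below `limit` that can start / continue an identifier."""
    starts = continues = 0
    for cp in range(limit):
        ch = chr(cp)
        starts += can_start(ch)
        continues += can_continue(ch)
    return starts, continues

bmp_counts = count_identifier_chars(0x10000)              # BMP only
all_counts = count_identifier_chars(sys.maxunicode + 1)   # all planes

print("Unicode data version :", unicodedata.unidata_version)
print("BMP only   (start, continue) :", bmp_counts)
print("All planes (start, continue) :", all_counts)
```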

      So, @Peterjones, in all cases, the regex rules, used in Function List for Perl, are a rough approximation of what they should be !

      Now, Peter, the goal is to get a Perl parser that uses the approximate Boost \w definition, without the help of atomic structures.

      Refer to https://community.notepad-plus-plus.org/post/104861

      Best Regards,

      guy038

    • U

      Possible error or not?

      Translation
      1 Votes
      4 Posts
      86 Views
      xomx

      @PeterJones said in Possible error or not?:

      the old 4096 Mb limit was actually causing a crash, so it had to be lowered to a limit that was 2046 Mb.

      It’s true, and 2046 is currently the correct maximum value of the ‘Define Large File Size’ threshold for the N++ Scintilla syntax highlighting.

      Note:
      Now we could easily shift that back to the previous, larger 4096 threshold, although I don’t think it’s a good idea: the enabled syntax highlighting and the associated features substantially slow down handling and double the memory consumption needed for such large files. The shift is possible because, in the meantime, I finally persuaded Don, and the SC_DOCUMENTOPTION_TEXT_LARGE Scintilla document option flag is now used by default everywhere (this effectively removes the previous crash possibility, at the price of a small increase in memory consumption). Some details in:
      https://github.com/notepad-plus-plus/notepad-plus-plus/issues/14944
      https://github.com/notepad-plus-plus/notepad-plus-plus/pull/14982