  • 0 Votes
    2 Posts
    61 Views
    PeterJonesP

    @P-A ,

    I’ve never seen that happen.

    What version of Notepad++ are you using? If v8.9.2, there is a known bug that causes it to crash when entering keywords with too many characters; it might be you’ve found a new symptom of the same or similar bug. Giving an exact sequence of events would allow us to replicate, and to see if the upcoming fix for the crash also fixes your issue.

    If it’s with v8.9.1 or older, I’d be very surprised. But again, giving an exact sequence of events would allow us to replicate.

    Please share your ? menu’s Debug Info, and the exact steps to replicate your problem.

  • 0 Votes
    5 Posts
    230 Views
    guy038G

    Hi, @kjell-rilbe, @peterjones, @Coises and All,

    In my previous post, I said :

    I did not need to use the atomic forms *+

    I did additional tests and there is a difference in execution time between the two solutions : greedy quantifiers vs atomic quantifiers

    If I use the same test file, containing 524,288 correct lines, so with 0 match :

    line 1 : one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;end
    line 524288 : one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;end

    The regex ^[^;\r\n]*(?:;[^;\r\n]*){12}$(*SKIP)(*F)|^.+$ displays the message Mark: 0 matches in entire file after between 1.65s and 1.71s

    The regex ^[^;\r\n]*+(?:;[^;\r\n]*+){12}+$(*SKIP)(*F)|^.+ displays the message Mark: 0 matches in entire file after between 1.45s and 1.51s

    Now, if I add the six incorrect lines below, at the very end of the test file :

    line 524289 : one;two;three;four;five;six;seven;eight;nine;ten;eleven;end
    line 524290 : one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;thirteen;end
    line 524291 : one;two;three;four;five;six;seven;eight;nine;ten;end
    line 524292 : one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;thirteen;fourteen;end
    line 524293 : one
    line 524294 : ;two
    line 524295 :

    The regex ^[^;\r\n]*(?:;[^;\r\n]*){12}$(*SKIP)(*F)|^.+$ displays the message : Mark: 6 matches in entire file after between 1.58s and 1.65s

    The regex ^[^;\r\n]*+(?:;[^;\r\n]*+){12}+$(*SKIP)(*F)|^.+ displays the message Mark: 6 matches in entire file after between 1.45s and 1.51s

    Note that, if the Match case option is not checked, the execution time increases significantly ( between 6.1s and 6.2s ) :-((

    I repeated each test many times to obtain average values !

    Best Regards,

    guy038

    Of course, the . matches new line option is not checked and the Wrap around option is checked
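
    As a cross-check outside Notepad++, the rule behind these regexes (a line is correct when it holds exactly 13 fields, i.e. 12 semicolons, and every other non-empty line gets marked) can be sketched in Python. This is only an illustration: Python's `re` module has no `(*SKIP)(*F)` verbs, so the "skip correct lines" part is done with an ordinary `if`, and the names `CORRECT_LINE` and `incorrect_lines` are made up for this example. It says nothing about the timings measured above.

```python
import re

# A "correct" line: 13 fields separated by exactly 12 semicolons.
# Same character classes as in the Notepad++ regex; fullmatch() plays the role of ^...$.
CORRECT_LINE = re.compile(r"[^;\r\n]*(?:;[^;\r\n]*){12}")


def incorrect_lines(lines):
    """Return (line_number, text) for each non-empty line that is not a 13-field line."""
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if text and not CORRECT_LINE.fullmatch(text)
    ]


good = "one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;end"
sample = [
    good,
    "one;two;three;four;five;six;seven;eight;nine;ten;eleven;end",
    "one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;thirteen;end",
    "one;two;three;four;five;six;seven;eight;nine;ten;end",
    "one;two;three;four;five;six;seven;eight;nine;ten;eleven;twelve;thirteen;fourteen;end",
    "one",
    ";two",
    "",  # an empty line is never marked, just like with ^.+
]

for number, text in incorrect_lines(sample):
    print(number, text)
```

    The empty last line is skipped for the same reason that `^.+` cannot match it, which is why seven added lines give only 6 matches.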

  • 0 Votes
    4 Posts
    176 Views
    Terry RT

    @Linen-Gray said in Adblock360Updater Batch File Keeps Appearing:

    but wanted to know if anyone else had experienced this happening and if so how they took care of it.

    Well if you are certain your system isn’t infected then that is a step in the right direction. Just understand that according to the bat file’s contents the “malware” had been apparently residing in the location referenced in almost every line. I would still take a look at that location to be absolutely sure it is gone.

    I am re-reading your initial post and trying to understand what is occurring. You say this “bat” file is opening regularly. What is the app that is opening this file? If it is Notepad++ (the “bat” file shows within a Notepad++ tab), then try to identify the location of that file. It should show the location if you move the mouse pointer over the tab’s title line. Then open that location to have a better look around, you should be able to right click on the tab’s title line and select Open into… and select the Explorer line. Once you are happy that you can delete the file, just close it in Notepad++ and then remove it from the location.

    So in terms of the question has any one else experienced this issue, the answer is no, no one else has mentioned this on this forum. You could easily do a forum search on the string “adblock” but you won’t find this specific one, just mentions of “real” adblock apps.

    Terry

  • 0 Votes
    3 Posts
    141 Views
    M Andre Z EckenrodeM

    @Coises said in Regex matching anomaly:

    If the line endings in the file aren’t consistent, it could mismatch.

    Huh. Right you are. Ironically, that possibility had actually occurred to me, and I even thought I’d checked for it adequately by enabling View > Show Symbol > Show EOL, but apparently I failed to pick the single LF out of all the CRLF. Thanks.

  • 0 Votes
    38 Posts
    1m Views
    Alan KilbornA

    @Chris-Richardson said:

    TextPad++

    I see that there is a fee for this app on the appstore.
    I’d think for that reason alone you won’t get too many users trying this app out.

  • 0 Votes
    4 Posts
    236 Views
    Vitalii DovganV

    Just 6 votes… Not many.

    Anyway, I’m continuing to improve the HTML version of the Manual, this time with close help of Gemini.
    I still don’t understand the HTML/JS/CSS things well enough, but with the trials and re-trials guided by the AI, I think I’ve already reached a level of flexibility and complexity that rdipardo and Joseph Samuel (who originally helped significantly with bringing the Manual online) may be proud of :)
    Today I seem to have achieved the same behavior in the offline (local) and online (web) versions of the HTML Manual, so you may try it.
    The most important changes have been made around the “Search Topics” logic:

    Now the search results (left frame) and the document content (right frame) both listen to the ‘mouseenter’ event. Once this event happens, the corresponding frame becomes focused, allowing e.g. scrolling by the arrow keys.
    While working with the search results list (such as changing the selected item via the mouse or the keyboard), the focus remains in the results list, thus allowing you to navigate through different documents.
    Pressing Enter or double-clicking the search results list brings the focus to the document content (right frame).
    The search results list can be closed by Esc.
    The http and https links in the documents are blocked for the CHM version of the manual and allowed otherwise.
  • 0 Votes
    4 Posts
    118 Views
    PeterJonesP

    @PeterJones said in File type associations not working:

    it’s at the mercy of the OS as to whether the OS will propagate such settings to the user, or completely bypass them

    It worked reasonably for me. I ran my installed Notepad++ v8.9.1 As Administrator, then went to the File Association setting, picked customize, typed .pcj (which is a file extension that didn’t have any associated app or filetype), then clicked -> to move it to the Registered extensions column. When I then exited Notepad++ (to get out of Admin mode) and double-clicked on blah.pcj in Windows Explorer, it opened in Notepad++. (Looking at the registry, I can confirm it added HKCR\.pcj to point to Notepad++_file, and HKCR\Notepad++_file sets the shell\open\command as expected.) But maybe you didn’t think it “worked” at this point, because it doesn’t necessarily change the text of the “file type” column in Explorer. Even if it doesn’t, the double-click did what I expected.

    When I tried with the preferences dialog misc > .nfo, it edited the existing HKCR\.nfo to point to Notepad++_file (with a Notepad++_backup entry pointing to the original MSInfoFile). When I double-click on an NFO file, Windows actually pops up a Select an App to open this .nfo file dialog, which includes the “Notepad++ (New)” entry (because Windows has been trained to not fully allow applications to hijack extensions, because users hate it when an app does that without their permission) – and from there, you can choose whether you really want to.

    But by doing it through the Windows OS Open With interface to begin with, you make sure Windows knows it’s you who wants the change, not the app, and so lets you do it more easily there.

    IOW: it works for me on Windows 11 as Notepad++ tries it, with the caveats that Windows 11 is trying to protect me from nefarious apps, so might require a confirmation; and when I do things the way Windows OS wants, it works as expected rather than having to do the extra steps.

  • Real-time search results

    Notepad++ & Plugin Development
    2
    0 Votes
    2 Posts
    74 Views
    Mark OlsonM

    @Pawan-Sharma
    If I had to guess, two words: race conditions (and an opposite-ish problem, deadlocks).
    Iteratively updating the results while searching seems like a great way to introduce endless difficult-to-reproduce bugs.

  • survey: Incremental search usefulness

    General Discussion
    82
    1 Votes
    82 Posts
    47k Views
    William4565W

    @PeterJones sure.

  • How come I have two types of fonts in my sentences?

    General Discussion
    9
    0 Votes
    9 Posts
    2k Views
    William4565W

    You can use sites that transform text: simply put in your text, transform it, and place the result in your sentence. That is how you end up with two types of fonts.

  • Localization problem

    Translation
    9
    1 Votes
    9 Posts
    354 Views
    U

    @xomx
    Thank you very much for the work you have done, which will lead to improvements in Notepad++ in the future.
    I am very grateful to you.

  • 3 Votes
    4 Posts
    5k Views
    ThosRTannerT

    Updated linter++ to v1.0.3

    Two changes of significance here:

    Deal properly with raw UTF8 characters in checkstyle output (mainly from jshint)
    Added two items to the plugin menu:
        Help, which opens the Readme on github pages
        About, which produces a small modal dialogue with the version and a clickable link to the project github repo.
  • 0 Votes
    18 Posts
    18k Views
    O

    For future users, you need to make your own variation of the pre-installed Markdown language.

    Go to Languages > User Defined Languages > Define Your Language.

    Then click on a Styler button. I’m using dark mode and just prefer white text for formatting (italics, bold, etc.), so I set the foreground to transparent; this means no colour override, so the dark mode settings are used.


    Then go to Save As and give it a name.
    You’ll need to activate the new language variant which will be available in the Languages menu.

  • 0 Votes
    3 Posts
    254 Views
    H

    @Coises

    Thank you, Coises, for your helpful reply. I truly appreciate your support and guidance.

    Regards,
    Harmandeep Singh Kandhari

  • Poll: How Long Have You Used Notepad++?

    Blogs
    7
    8 Votes
    7 Posts
    7k Views
    William4565W

    I’ve been using Notepad++ for 5 years, and for me it was and still is a very useful tool.

  • need to edit text

    General Discussion
    2
    0 Votes
    2 Posts
    105 Views
    PeterJonesP

    @Joc-Bedenčič ,

    Based on my guess as to what you meant,
    FIND = (^#EXTINF:0,).*$
    REPLACE = $1
    SEARCH MODE = Regular Expression

    That gives the result,

    #EXTINF:0,
    #EXTTV:Mpeg2;slv;
    udp://@232.2.1.1:5002
    #EXTINF:0,
    #EXTTV:Mpeg2;slv;
    udp://@232.2.1.2:5002

    Because that’s my guess as to what you meant by “remove everything behind #EXTINFO:0,”

    If that isn’t what you wanted, you will want to give both “before” and “after” data (“only channel names” has no meaning to someone who doesn’t know the format)
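
    For anyone who wants to try the same substitution outside Notepad++, here is a minimal Python sketch of that FIND/REPLACE pair. The channel titles in `sample` are invented placeholders (the original "before" data was not given), and `strip_extinf_titles` is just a name chosen for this example. It also assumes `\n` line endings, because Python's `.` would swallow a trailing `\r`.

```python
import re


def strip_extinf_titles(playlist: str) -> str:
    """Keep '#EXTINF:0,' and drop whatever follows it on the same line."""
    # (?m) makes ^ and $ work per line, like Notepad++ regex mode does by default.
    return re.sub(r"(?m)(^#EXTINF:0,).*$", r"\1", playlist)


# Hypothetical input: the text after '#EXTINF:0,' is made up for illustration.
sample = (
    "#EXTINF:0,Channel One\n"
    "#EXTTV:Mpeg2;slv;\n"
    "udp://@232.2.1.1:5002\n"
    "#EXTINF:0,Channel Two\n"
    "#EXTTV:Mpeg2;slv;\n"
    "udp://@232.2.1.2:5002\n"
)

print(strip_extinf_titles(sample))
```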

    ----

    Useful References
    Please Read Before Posting
    Template for Search/Replace Questions
    Formatting Forum Posts
    Notepad++ Online User Manual: Searching/Regex
    FAQ: Where to find other regular expressions (regex) documentation
  • Notepad++ v8.9.2 Release

    Pinned Announcements
    11
    1 Votes
    11 Posts
    7k Views
    CoisesC

    @PeterJones said in Notepad++ v8.9.2 Release:

    https://github.com/notepad-plus-plus/notepad-plus-plus/issues/17540

    Thanks. I should know better… I forgot to search closed issues, not just open ones.

  • 0 Votes
    13 Posts
    959 Views
    guy038G

    Hello, @peterjones,

    First, read this post to @coises, where I discuss the Unicode concept of identifiers, particularly in Perl !

    Thus, as explained at the end of that post, I created a second version of my perl.xml file parser which should work correctly without significant delay !

    In short :

    I do NOT use any atomic structure !

    In mainExpr of the class range, I do NOT use a named group but, simply, use the part ^ (?: package | class ) \b, twice !

    I changed your prototype / signature syntax (?:\([^()]*+\)\s*+)?+ to (?: \( [\x20-\x7E\w]* \) \s* )?

    I changed your attributes syntax (?:\:[^{]+)?+ to (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?

    In the two syntaxes above, I simply added \w within each character class

    Note that, from this article https://www.effectiveperlprogramming.com/2015/04/use-v5-20-subroutine-signatures/, the following syntax seems possible :

    sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: The cat is $cat"; }

    Thus, for prototype / signature syntax, I’ve allowed parentheses within the outer parentheses. If this example seems not pertinent, use the alternate syntax :

    (?: \( [\x20-\x27\x2A-\x7E\w]* \) \s* )?

    Finally, I changed the regex class name (?x)\s\K[^;{]+ to (?x) \s+ \K .+? (?= \x20* [;{] )

    BTW, my parser presently contains 13 \s strings. Maybe the \h, or even the [\t\x20], syntax would be more appropriate in some parts ?

    <?xml version="1.0" encoding="UTF-8" ?>
    <!-- ==========================================================================\
    |
    |   To learn how to make your own language parser, please check the following
    |   link:
    |       https://npp-user-manual.org/docs/function-list/
    |
    \=========================================================================== -->
    <NotepadPlus>
        <functionList>
            <!-- ======================================================== [ PERL ] -->
            <!-- Perl - functions and packages, including fully-qualified subroutine names -->
            <parser
                displayName="Perl"
                id="perl_syntax"
                commentExpr="(?x)                               # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                             (?m-s:                             # 'Multi-lines' mode ( ^ and $ match at line-breaks ) / 'Dot' char does NOT match line-breaks
                                 \x23 .*                        # Single Line Comment ( #................ )
                             )                                  #
                             |                                  # OR
                             (?s:                               # 'Single line' mode (letter s optional as mode set by DEFAULT)
                                 __ (?: END | DATA ) __         # String '__END__' or '__DATA__'
                                 .*                             # ANY character(s), including line-breaks, till...
                                 \Z                             # Last line-break, included
                             )
                            "
            >
                <classRange
                    mainExpr="(?x)                              # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                              (?m-i)                            # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                              ^                                 # NO leading white-space at start of line
                              (?: package | class ) \b          # Header : word 'package' or 'class', in LOWER case
                              (?s:                              # 'Single line' mode (letter s optional as mode set by DEFAULT)
                                  .+?                           # ANY character(s), including line-breaks, till...
                              )                                 # Section below, excluded
                              (?=                               # Start of look-ahead
                                  \s*                           # Optional leading white-space
                                  ^                             # NO leading white-space at start of line
                                  (?: package | class ) \b      # Next header : word 'package' or 'class', in LOWER case
                                  |                             # OR
                                  \Z                            # Last line-break
                              )                                 # End of look-ahead
                             "
                >
                    <className>
                        <nameExpr expr="(?x)                    # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                        \s+                     # Leading white-space(s)
                                        \K                      # Discard text matched so far
                                        .+?                     # ANY character(s) till...
                                        (?= \x20* [;{] )        # First semi-colon or left brace, excluded
                                       "
                        />
                    </className>
                    <function
                        mainExpr="(?x)                                          # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                  (?m-i)                                        # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                                  ^ \h*                                         # Optional leading spaces or tabulations
                                  (?: sub | method ) \b                         # Word 'sub' or 'method', in LOWER case
                                  \s+                                           # White-space character(s)
                                  (?: \w+ :: )*                                 # Optional list of words, EACH followed with ::
                                  \w+                                           # Word character(s)
                                  \s*                                           # Optional white-space character(s)
                                  (?: \( [\x20-\x7E\w]* \) \s* )?               # Optional Prototype or Signature section
                                  (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?          # Optional Attributes section
                                  \{                                            # Start of function body
                                 "
                    >
                        <functionName>
                            <funcNameExpr expr="(?x)                    # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                                (?: sub | method )      # Word 'sub' or 'method', in LOWER case
                                                \s+                     # White-space character(s)
                                                \K                      # Discard text matched, so far (move this line right before \w+ if 'prefix::' part NOT desired)
                                                (?: \w+ :: )*           # Optional prefix:: part ( package:: / names:: )
                                                \w+                     # Word character(s)
                                               "
                            />
                        </functionName>
                    </function>
                </classRange>
                <function
                    mainExpr="(?x)                                          # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                              (?m-i)                                        # 'Multi-lines' mode (^ and $ match at line-breaks) / 'Sensitive case' mode
                              ^ \h*                                         # Optional leading spaces or tabulations
                              (?: sub | method )                            # Word 'sub' or 'method', in LOWER case
                              \s+                                           # White-space character(s)
                              (?: \w+ :: )*                                 # Optional list of words, EACH followed with ::
                              \w+                                           # Word character(s)
                              \s*                                           # Optional white-space character(s)
                              (?: \( [\x20-\x7E\w]* \) \s* )?               # Optional Prototype or Signature section
                              (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?          # Optional Attributes section
                              \{                                            # Start of function body
                             "
                >
                    <functionName>
                        <nameExpr expr="(?x)                    # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                        (?: sub | method )      # Word 'sub' or 'method', in LOWER case
                                        \s+                     # White-space character(s)
                                        \K                      # Discard text matched, so far (move this line right before \w+ if part 'prefix::' NOT desired)
                                        (?: \w+ :: )*           # Optional prefix:: part ( package:: / names:: )
                                        \w+                     # Word character(s)
                                       "
                        />
                    </functionName>
                    <className>
                        <nameExpr expr="(?x)                    # 'Free-spacing' mode (see `RegEx - Pattern Modifiers`)
                                        (?: sub | method )      # Word 'sub' or 'method', in LOWER case
                                        \s+                     # White-space character(s)
                                        \K                      # Discard text matched, so far
                                        \w+                     # Word character(s)
                                        ( :: \w+ )*             # Optional list of words, EACH preceded with ::
                                        (?= :: \w )             # Till a last string ':: + word char' excluded
                                       "
                        />
                    </className>
                </function>
            </parser>
        </functionList>
    </NotepadPlus>

    In the https://github.com/notepad-plus-plus/notepad-plus-plus/blob/a91b22bd8337465e04c1afa30cb71f7909340293/PowerEditor/Test/FunctionList/perl/unitTest file, I added text at various locations :

    Before the line ############### Start ###############

    ################ Added by guy038 to test Notepad++'s FunctionList
    sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: the cat is $cat"; }
    sub _function_été { return 1 }

    Before the line package NameSpace::Block {

    ################ Added by guy038 to test Notepad++'s FunctionList
    sub grâce::Hôte { return 'running' }
    sub grâce::Son_ø { return 'stopped' }
    #################################################################

    At the very end of file :

    ################ Added by guy038 to test Notepad++'s FunctionList
    class NewClassSyntax {
        method inBlock { return 1 }
        method inBlockProto($) { return $_[0] }
        method inBlockAttrib :prototype($) { return $_[0] }
    }
    class Chaîne{
        method inBlock { return 1 }
        method Dûment($) { return $_[0] }
        method ƒ_Hameçon :prototype($) { return $_[0] }
    }
    #################################################################

    In terms of speed, the Function List panel seems to be displayed quickly. I also did a test where I copied the contents of the unitTest file twice and then added, by regex, _1, _2 and _3 at the end of the different names : the Function List panel still appeared without delay !
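
    To experiment with the function regex outside Notepad++, here is a rough Python adaptation. It is only a sketch under stated assumptions: Python's `re` is not the Boost engine, it has no `\h` (replaced here by `[\t\x20]`) and no `\K` (replaced by a capture group), and its `\w` is Unicode-aware for `str`, so results on non-ASCII names may differ from what Notepad++ gives. The names `FUNC` and `perl_function_names` are invented for this example.

```python
import re

# Python adaptation of the 'function' mainExpr; the name is captured in group 1 instead of using \K.
FUNC = re.compile(r"""(?xm)
    ^ [\t\x20]*                              # Optional leading spaces or tabulations (\h is not available)
    (?: sub | method ) \b                    # Word 'sub' or 'method'
    \s+                                      # White-space character(s)
    ( (?: \w+ :: )* \w+ )                    # Optional 'prefix::' parts, then the name
    \s*                                      # Optional white-space character(s)
    (?: \( [\x20-\x7E\w]* \) \s* )?          # Optional prototype or signature section
    (?: : [\x20-\x7A\x7C-\x7E\w]+ \s* )?     # Optional attributes section
    \{                                       # Start of function body
""")


def perl_function_names(source: str) -> list[str]:
    """Return the sub/method names found by the adapted regex, in order."""
    return FUNC.findall(source)


sample = """\
sub animals ( $cat, $auto_id = get_id() ) { say "$auto_id: The cat is $cat"; }
sub grâce::Hôte { return 'running' }
class Chaîne{
    method Dûment($) { return $_[0] }
    method inBlockAttrib :prototype($) { return $_[0] }
}
"""

print(perl_function_names(sample))
```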

    Best Regards,

    guy038

  • 5 Votes
    21 Posts
    2k Views
    guy038G

    Hello, @coises, @thomas-knoefel, @peterjones and All,

    @coises, many thanks for your additional info. But, please, don’t be too upset by these regex oddities ! Of course, some class definitions seem different but, in all cases, Columns++ gives more accurate results than the native N++ search, anyway !

    In fact, I did all this research into the Unicode world because I wanted to clarify the status of identifiers, particularly in Perl, in order to find a simplified formulation for the Function List Perl parser, which was created by @peterjones and improved, with your help, by using atomic structures !

    My first attempt was clearly insufficient because I only took ASCII characters into account. Peter advised me to refer to the article below :

    https://perldoc.perl.org/perldata#Identifier-parsing

    which explains that, when using UTF-8, the Perl identifier syntax should be :

    / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])
      (?[ ( \p{Word} & \p{XID_Continue} ) ]) * /x

    or, in a SINGLE line :

    (?[ ( \p{Word} & \p{XID_Start} ) + [_] ])(?[ ( \p{Word} & \p{XID_Continue} ) ]) *

    Although the properties \p{XID_Start} and \p{XID_Continue} are NOT part of the General Category list and are not functional with the Boost regex engine, this Perl syntax could be expressed, in theory, with our Boost regex engine as :

    (?:(?=\p{XID_Start})\w|_)(?:(?=\p{XID_Continue})\w)*
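
    The "look-ahead on a property, then \w" idea can be tried out in Python, which is used here only as a stand-in. This sketch rests on several assumptions: Python's `re` has no `\p{...}` either, so `str.isidentifier()` (which follows the XID_Start / XID_Continue rules of the Unicode version bundled with the interpreter) plays the role of the look-ahead, and Python's `\w` is not the same set as Perl's `\p{Word}` or Boost's `\w` (for instance, it does not include combining marks). The function names are invented for the example, and the counts it would produce are not expected to match the figures in this post.

```python
import re

WORD_CHAR = re.compile(r"\w")  # Python's \w, only an approximation of \p{Word}


def can_start(ch: str) -> bool:
    """Rough analogue of (?:(?=\\p{XID_Start})\\w|_) for a single character."""
    return ch == "_" or (ch.isidentifier() and WORD_CHAR.fullmatch(ch) is not None)


def can_continue(ch: str) -> bool:
    """Rough analogue of (?=\\p{XID_Continue})\\w for a single character."""
    # Prefixing a known start character turns isidentifier() into an XID_Continue test.
    return ("a" + ch).isidentifier() and WORD_CHAR.fullmatch(ch) is not None


def is_identifier_like(name: str) -> bool:
    """One start character followed by any number of continue characters."""
    return bool(name) and can_start(name[0]) and all(can_continue(c) for c in name[1:])


for candidate in ["_function_été", "Son_ø", "ƒ_Hameçon", "9lives", ""]:
    print(repr(candidate), is_identifier_like(candidate))
```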

    Now, with the v17.0 release of BabelMap software, I was able to get the complete and exact list of these properties : \p{WORD}, \p{ID_Start}, \p{ID_Continue}, \p{XID_Start}, \p{XID_Continue},

    Then, from these lists, I could deduce the Unicode characters count of the regexes (?:(?=\p{XID_Start})\w|_) and (?=\p{XID_Continue})\w. Refer below :

    # ==================================================================================================
    #
    #  Unicode 17.0.0
    #
    #  From article https://unicode.org/reports/tr18/tr18-23.html#word
    #
    #  Derived Property WORD :
    #
    #      Lu + Ll + Lt + Lm + Lo =        # L*   145,672   = \p{letter} or [[:alpha:]]
    #    + Decimal_Number                  # Nd       770   = \p{Decimal Digit Number}
    #                                            -----------
    #                            Total :          146,442   = Columns++ WORD chars - \x{005F}
    #    + Mc + Me + Mn                    # M*     2,543   = \p{Mark}
    #    + Connector_Punctuation           # Pc        10   ( including the LOW LINE character \x{005F} )
    #    + 200C ; Other_ID_Continue        # Cf         1   ZERO WIDTH NON-JOINER ( JOIN-CONTROL character )
    #    + 200D ; Other_ID_Continue        # Cf         1   ZERO WIDTH JOINER ( JOIN-CONTROL character )
    #
    #    => Total = 148,997 characters
    #
    # ==================================================================================================
    #
    #  From file 'DerivedCoreProperties.txt' :
    #
    #  https://www.unicode.org/Public/UCD/latest/ucd/DerivedCoreProperties.txt
    #
    #  Derived Property ID_Start :
    #
    #      Lu + Ll + Lt + Lm + Lo =        # L*   145,672   ( = [[:alpha:]] )
    #    + Letter_Number                   # Nl       239
    #    + 1885 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI BALUDA
    #    + 1886 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA
    #    + 2118 ; Other_ID_Start           # Sm         1   SCRIPT CAPITAL P
    #    + 212E ; Other_ID_Start           # So         1   ESTIMATED SYMBOL
    #    + 309B ; Other_ID_Start           # Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
    #    + 309C ; Other_ID_Start           # Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
    #    - 2E2F ;                          # Lm         1   VERTICAL TILDE ( as INCLUDED in L* )
    #
    #    => Total = 145,916 characters
    #
    # ==================================================================================================
    #
    #  Derived Property XID_Start ( ID_Start MODIFIED for closure under NFKx ) :
    #
    #      ID_Start                               145,916
    #    - 037A ; ID_Start                 # Lm         1   GREEK YPOGEGRAMMENI
    #    - 0E33 ; ID_Start                 # Lo         1   THAI CHARACTER SARA AM
    #    - 0EB3 ; ID_Start                 # Lo         1   LAO VOWEL SIGN AM
    #    - 309B ; Other_ID_Start           # Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
    #    - 309C ; Other_ID_Start           # Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
    #    - FC5E ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
    #    - FC5F ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
    #    - FC60 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
    #    - FC61 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
    #    - FC62 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
    #    - FC63 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
    #    - FDFA ; ID_Start                 # Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
    #    - FDFB ; ID_Start                 # Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
    #    - FE70 ; ID_Start                 # Lm         1   ARABIC FATHATAN ISOLATED FORM
    #    - FE72 ; ID_Start                 # Lo         1   ARABIC DAMMATAN ISOLATED FORM
    #    - FE74 ; ID_Start                 # Lo         1   ARABIC KASRATAN ISOLATED FORM
    #    - FE76 ; ID_Start                 # Lo         1   ARABIC FATHA ISOLATED FORM
    #    - FE78 ; ID_Start                 # Lo         1   ARABIC DAMMA ISOLATED FORM
    #    - FE7A ; ID_Start                 # Lo         1   ARABIC KASRA ISOLATED FORM
    #    - FE7C ; ID_Start                 # Lo         1   ARABIC SHADDA ISOLATED FORM
    #    - FE7E ; ID_Start                 # Lo         1   ARABIC SUKUN ISOLATED FORM
    #    - FF9E ; ID_Start                 # Lm         1   HALFWIDTH KATAKANA VOICED SOUND MARK
    #    - FF9F ; ID_Start                 # Lm         1   HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
    #
    #    => Total = 145,893 characters
    #
    # ==================================================================================================
    #
    #  Derived Property ID_Continue :
    #
    #      ID_Start =                             145,916
    #    - 1885 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI BALUDA
    #    - 1886 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA
    #
    #      The TWO characters above must be SUBTRACTED because they are, both, INCLUDED in 'Other_ID_Start' and in 'Nonspacing Mark'
    #
    #    + Nonspacing_Mark                 # Mn     2,059
    #    + Spacing_Mark                    # Mc       471
    #    + Decimal_Number                  # Nd       770
    #    + Connector_Punctuation           # Pc        10   ( including the LOW LINE char : 005F _ )
    #    + 00B7 ; Other_ID_Continue        # Po         1   MIDDLE DOT
    #    + 0387 ; Other_ID_Continue        # Po         1   GREEK ANO TELEIA
    #    + 1369 ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT ONE
    #    + 136A ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT TWO
    #    + 136B ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT THREE
    #    + 136C ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT FOUR
    #    + 136D ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT FIVE
    #    + 136E ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT SIX
    #    + 136F ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT SEVEN
    #    + 1370 ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT EIGHT
    #    + 1371 ; Other_ID_Continue        # No         1   ETHIOPIC DIGIT NINE
    #    + 19DA ; Other_ID_Continue        # No         1   NEW TAI LUE THAM DIGIT ONE
    #    + 200C ; Other_ID_Continue        # Cf         1   ZERO WIDTH NON-JOINER
    #    + 200D ; Other_ID_Continue        # Cf         1   ZERO WIDTH JOINER
    #    + 30FB ; Other_ID_Continue        # Po         1   KATAKANA MIDDLE DOT
    #    + FF65 ; Other_ID_Continue        # Po         1   HALFWIDTH KATAKANA MIDDLE DOT
    #
    #    => Total = 149,240 characters
    #
    # ==================================================================================================
    #
    #  Derived Property XID_Continue ( ID_Continue MODIFIED for closure under NFKx ) :
    #
    #      ID_Continue                            149,240
    #    - 037A ; ID_Continue              # Lm         1   GREEK YPOGEGRAMMENI
    #    - 309B ; ID_Continue              # Sk         1   KATAKANA-HIRAGANA VOICED SOUND MARK
    #    - 309C ; ID_Continue              # Sk         1   KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
    #    - FC5E ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
    #    - FC5F ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
    #    - FC60 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
    #    - FC61 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
    #    - FC62 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
    #    - FC63 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
    #    - FDFA ; ID_Continue              # Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
    #    - FDFB ; ID_Continue              # Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
    #    - FE70 ; ID_Continue              # Lm         1   ARABIC FATHATAN ISOLATED FORM
    #    - FE72 ; ID_Continue              # Lo         1   ARABIC DAMMATAN ISOLATED FORM
    #    - FE74 ; ID_Continue              # Lo         1   ARABIC KASRATAN ISOLATED FORM
    #    - FE76 ; ID_Continue              # Lo         1   ARABIC FATHA ISOLATED FORM
    #    - FE78 ; ID_Continue              # Lo         1   ARABIC DAMMA ISOLATED FORM
    #    - FE7A ; ID_Continue              # Lo         1   ARABIC KASRA ISOLATED FORM
    #    - FE7C ; ID_Continue              # Lo         1   ARABIC SHADDA ISOLATED FORM
    #    - FE7E ; ID_Continue              # Lo         1   ARABIC SUKUN ISOLATED FORM
    #
    #    => Total = 149,221 characters
    #
    # ==================================================================================================
    #
    #  From https://perldoc.perl.org/perldata#Identifier-parsing
    #
    #  Intersection of WORD and XID_Start properties + LOW LINE char :
    #
    #      Lu + Ll + Lt + Lm + Lo =        # L*   145,672   ( = \p{letter} or [[:alpha:]] )
    #    + 005F ; Connector_Punctuation    # Pc         1   LOW LINE
    #    + 1885 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI BALUDA ( NON-SPACING mark, common in WORD and XID_Start )
    #    + 1886 ; Other_ID_Start           # Mn         1   MONGOLIAN LETTER ALI GALI THREE BALUDA ( NON-SPACING mark, common in WORD and XID_Start )
    #    - 037A ; ID_Start                 # Lm         1   GREEK YPOGEGRAMMENI
    #    - 0E33 ; ID_Start                 # Lo         1   THAI CHARACTER SARA AM
    #    - 0EB3 ; ID_Start                 # Lo         1   LAO VOWEL SIGN AM
    #    - 2E2F ;                          # Lm         1   VERTICAL TILDE ( as ALREADY included in L* )
    #    - FC5E ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
    #    - FC5F ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
    #    - FC60 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
    #    - FC61 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
    #    - FC62 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
    #    - FC63 ; ID_Start                 # Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
    #    - FDFA ; ID_Start                 # Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
    #    - FDFB ; ID_Start                 # Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
    #    - FE70 ; ID_Start                 # Lm         1   ARABIC FATHATAN ISOLATED FORM
    #    - FE72 ; ID_Start                 # Lo         1   ARABIC DAMMATAN ISOLATED FORM
    #    - FE74 ; ID_Start                 # Lo         1   ARABIC KASRATAN ISOLATED FORM
    #    - FE76 ; ID_Start                 # Lo         1   ARABIC FATHA ISOLATED FORM
    #    - FE78 ; ID_Start                 # Lo         1   ARABIC DAMMA ISOLATED FORM
    #    - FE7A ; ID_Start                 # Lo         1   ARABIC KASRA ISOLATED FORM
    #    - FE7C ; ID_Start                 # Lo         1   ARABIC SHADDA ISOLATED FORM
    #    - FE7E ; ID_Start                 # Lo         1   ARABIC SUKUN ISOLATED FORM
    #    - FF9E ; ID_Start                 # Lm         1   HALFWIDTH KATAKANA VOICED SOUND MARK
    #    - FF9F ; ID_Start                 # Lm         1   HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
    #
    #    => Total = 145,653 characters, which can START an IDENTIFIER
    #
    # ==================================================================================================
    #
    #  From https://perldoc.perl.org/perldata#Identifier-parsing
    #
    #  Intersection of WORD and XID_Continue properties :
    #
    #      Lu + Ll + Lt + Lm + Lo =        # L*   145,672   ( = \p{letter} or [[:alpha:]] )
    #    + Nonspacing_Mark                 # Mn     2,059
    #    + Spacing_Mark                    # Mc       471
    #    + Decimal_Number                  # Nd       770
    #    + Connector_Punctuation           # Pc        10   ( including the LOW LINE char : 005F _ )
    #    + 200C ; Other_ID_Continue        # Cf         1   ZERO WIDTH NON-JOINER ( FORMAT character, common in WORD and XID_Continue )
    #    + 200D ; Other_ID_Continue        # Cf         1   ZERO WIDTH JOINER ( FORMAT character, common in WORD and XID_Continue )
    #    - 037A ; ID_Continue              # Lm         1   GREEK YPOGEGRAMMENI
    #    - 2E2F ;                          # Lm         1   VERTICAL TILDE ( as ALREADY included in L* )
    #    - FC5E ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
    #    - FC5F ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH KASRATAN ISOLATED FORM
    #    - FC60 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH FATHA ISOLATED FORM
    #    - FC61 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH DAMMA ISOLATED FORM
    #    - FC62 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH KASRA ISOLATED FORM
    #    - FC63 ; ID_Continue              # Lo         1   ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
    #    - FDFA ; ID_Continue              # Lo         1   ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
    #    - FDFB ; ID_Continue              # Lo         1   ARABIC LIGATURE JALLAJALALOUHOU
    #    - FE70 ; ID_Continue              # Lm         1   ARABIC FATHATAN ISOLATED FORM
    #    - FE72 ; ID_Continue              # Lo         1   ARABIC DAMMATAN ISOLATED FORM
    #    - FE74 ; ID_Continue              # Lo         1   ARABIC KASRATAN ISOLATED FORM
    #    - FE76 ; ID_Continue              # Lo         1   ARABIC FATHA ISOLATED FORM
    #    - FE78 ; ID_Continue              # Lo         1   ARABIC DAMMA ISOLATED FORM
    #    - FE7A ; ID_Continue              # Lo         1   ARABIC KASRA ISOLATED FORM
    #    - FE7C ; ID_Continue              # Lo         1   ARABIC SHADDA ISOLATED FORM
    #    - FE7E ; ID_Continue              # Lo         1   ARABIC SUKUN ISOLATED FORM
    #
    #    => Total = 148,966 characters, which can CONTINUE an IDENTIFIER
    #
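
    The totals in the listing above are plain sums and differences, so they can be re-checked with a few lines of Python. All category counts are copied from the listing; the small integers (6, 23, 16, 19, 22, 18) are simply the number of individually listed code points in each section, and the variable names are chosen for this example only.

```python
# Category counts, as given in the listing (Unicode 17.0.0)
letters = 145_672      # L*  = Lu + Ll + Lt + Lm + Lo
nd = 770               # Decimal_Number
marks = 2_543          # M*  = Mc + Me + Mn
mn, mc = 2_059, 471    # Nonspacing_Mark, Spacing_Mark
pc = 10                # Connector_Punctuation
nl = 239               # Letter_Number

word = letters + nd + marks + pc + 2                   # + 200C and 200D
id_start = letters + nl + 6 - 1                        # + 6 Other_ID_Start, - 2E2F
xid_start = id_start - 23                              # 23 code points removed for NFKx closure
id_continue = id_start - 2 + mn + mc + nd + pc + 16    # - 1885/1886, + 16 Other_ID_Continue
xid_continue = id_continue - 19                        # 19 code points removed for NFKx closure
perl_start = letters + 1 + 2 - 22                      # + LOW LINE, + 1885/1886, - 22 listed code points
perl_continue = letters + mn + mc + nd + pc + 2 - 18   # + 200C/200D, - 18 listed code points

for name, value in [
    ("WORD", word),
    ("ID_Start", id_start),
    ("XID_Start", xid_start),
    ("ID_Continue", id_continue),
    ("XID_Continue", xid_continue),
    ("WORD & XID_Start + '_'", perl_start),
    ("WORD & XID_Continue", perl_continue),
]:
    print(f"{name:<24} {value:>9,}")
```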

    However, the last two results, for (?:(?=\p{XID_Start})\w|_) and (?=\p{XID_Continue})\w, above, would be true ONLY IF the regex engine respected all Unicode properties. Unfortunately, our Boost regex engine :

    Only considers that word characters are all in the BMP

    Generally considers that word characters are those defined prior to the Unicode 5.3 release !

    I verified that, presently, only 47,681 characters can begin a Perl identifier and only 48,011 characters can continue a Perl identifier !

    So, @Peterjones, in all cases, the regex rules used in Function List for Perl are a rough approximation of what they should be !

    Now, Peter, the goal is to get a Perl parser that uses the approximate BOOST \w definition, without the help of atomic structures.

    Refer to https://community.notepad-plus-plus.org/post/104861

    Best Regards,

    guy038