<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Emulation of the &quot;View &gt; Summary&quot; feature with a Python script]]></title><description><![CDATA[<p dir="auto">Hello <strong>All</strong>,</p>
<p dir="auto"><strong>Before</strong> Feb. 18 2024, the <strong>title</strong> of this topic was : <code>Tests and impressions on the "View &gt; Summary..." feature</code></p>
<p dir="auto">Recently, I’ve been looking at the <strong>results</strong> given by the N++ <strong>Summary</strong> feature ( <strong><code>View &gt; Summary...</code></strong> ). And I must say that <strong>numerous</strong> things are really <strong>weird</strong> !</p>
<p dir="auto">For the tests, I used contents with a lot of <strong>Unicode</strong> characters, in the <strong>Basic Multilingual Plane</strong> and, sometimes, <strong>over</strong> the <strong><code>BMP</code></strong> too, saved in files using the <strong><code>4</code></strong> N++ <strong>Unicode</strong> encodings, as well as an <strong><code>ANSI</code></strong> file containing the <strong><code>256</code></strong> characters of the <strong><code>Windows-1252</code></strong> encoding :</p>
<ul>
<li><strong><code>ANSI</code></strong></li>
<li><strong><code>UTF-8</code></strong></li>
<li><strong><code>UTF-8-BOM</code></strong></li>
<li><strong><code>UCS-2 BE BOM</code></strong></li>
<li><strong><code>UCS-2 LE BOM</code></strong></li>
</ul>
<hr />
<p dir="auto">To my mind, there are <strong><code>3</code></strong> <strong>major</strong> problems and some <strong>minor</strong> points :</p>
<ul>
<li>
<p dir="auto">The first and <strong>worst</strong> problem is that, when a <strong><code>UTF-8[-BOM]</code></strong> file, containing various <strong>Unicode</strong> chars ( of the <strong><code>BMP</code></strong> only : this point is <strong>important</strong> ! ) is <strong>copied</strong> into a <strong><code>UCS-2 BE BOM</code></strong> or <strong><code>UCS-2 LE BOM</code></strong> <strong>encoded</strong> file, some results, given by the <strong><code>Summary</code></strong> feature for these <strong>new</strong> files, are <strong>totally</strong> wrong :</p>
<ul>
<li>
<p dir="auto">The <strong><code>characters( without line endings )</code></strong> value seems to be the number of <strong>bytes</strong> used in the <strong>corresponding</strong> <strong><code>UTF-8[-BOM]</code></strong> file</p>
</li>
<li>
<p dir="auto">The <strong><code>Document length</code></strong> value seems to be the document length of the <strong>corresponding</strong> <strong><code>UTF-8[-BOM]</code></strong> file and is also displayed, unfortunately, in the <strong>status bar</strong> !</p>
</li>
</ul>
</li>
<li>
<p dir="auto">The second problem is that the definition of a <strong>word</strong> char, by the <strong><code>Summary</code></strong> feature, is definitely <strong>NOT</strong> the same as the <strong>definition</strong> of the regex <strong><code>\w</code></strong>, as explained further on !</p>
</li>
<li>
<p dir="auto">Thus, the third problem is that the number of <strong>words</strong> given is <strong>totally</strong> inaccurate ! And, anyway, the <strong>number</strong> of words, although <strong>well enough</strong> defined for an <strong><code>English / American</code></strong> text, is rather a <strong>vague</strong> notion for a lot of texts written in <strong>other</strong> languages, especially <strong>Asian</strong> ones ! ( See further on )</p>
</li>
<li>
<p dir="auto">Some <strong>minor</strong> things :</p>
<ul>
<li>
<p dir="auto">The number of <strong>lines</strong> given is, most of the time, <strong>increased</strong> by <strong>one</strong> unit</p>
</li>
<li>
<p dir="auto">Presently, the <strong>Summary</strong> feature displays the <strong>document length</strong> in the Notepad++ buffer. I think it would be good to display, as well, the <strong>actual</strong> document length saved on <strong>disk</strong>. Incidentally, for just <strong>saved</strong> documents, it would give, by <strong>difference</strong>, the length of the possible <strong><code>Byte Order Mark</code></strong>, if its <strong>size</strong> is not <strong>explicitly</strong> displayed !</p>
</li>
<li>
<p dir="auto">For <strong><code>UTF-8</code></strong> or <strong><code>UTF-8-BOM</code></strong> encoded files, a decomposition, giving the <strong>number</strong> of chars coded with <strong><code>1</code></strong>, <strong><code>2</code></strong>, <strong><code>3</code></strong> and, for chars <strong>over</strong> the <strong><code>BMP</code></strong>, <strong><code>4</code></strong> bytes, would be <strong>welcome</strong> !</p>
</li>
</ul>
</li>
</ul>
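<p dir="auto">The decomposition suggested in the last minor point, and the gap between a character count and a UTF-8 byte count, can be sketched in plain Python. This is only an illustration and not how Notepad++ computes its values; the function name <code>utf8_length_histogram</code> and the sample strings are hypothetical.</p>

```python
def utf8_length_histogram(text):
    """Count the characters of text by the number of bytes each one needs in UTF-8."""
    histogram = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        histogram[len(ch.encode("utf-8"))] += 1
    return histogram


# Hypothetical sample: 'A', e-acute, euro sign and one emoji located over the BMP
sample = "A\u00e9\u20ac\U0001F600"
print(utf8_length_histogram(sample))

# For BMP-only text, the number of characters and the number of UTF-8 bytes differ
bmp_only = "\u00e9\u20ac"
print(len(bmp_only), len(bmp_only.encode("utf-8")))
```

<p dir="auto">A <strong><code>UCS-2</code></strong> file holding the same BMP-only text should report the first of the two printed numbers as its character count, not the second.</p>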
<p dir="auto">So, in brief, in the <strong>present</strong> <strong><code>Summary</code></strong> window :</p>
<ul>
<li>
<p dir="auto">The <strong><code>Characters (without line endings):</code></strong> number is <strong>wrong</strong> for the <strong><code>UCS-2 BE BOM</code></strong> or <strong><code>UCS-2 LE BOM</code></strong> encodings</p>
</li>
<li>
<p dir="auto">The <strong><code>Words</code></strong> number is totally <strong>wrong</strong>, given the <strong>regex</strong> definition of a <strong>word</strong> character, <strong>whatever</strong> the encoding used</p>
</li>
<li>
<p dir="auto">The <strong><code>Lines:</code></strong> number is <strong>wrong</strong>, by <strong>one</strong> unit, if a <strong>line-break</strong> ends the <strong>last</strong> line of the current file, in <strong>any</strong> encoding</p>
</li>
<li>
<p dir="auto">The <strong><code>Document length</code></strong> value, in N++ <strong>buffer</strong>, is <strong>wrong</strong> for the <strong><code>UCS-2 BE BOM</code></strong> or <strong><code>UCS-2 LE BOM</code></strong> encodings, as well as the <strong><code>Length:</code></strong> indication in the <strong>status</strong> bar</p>
</li>
</ul>
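<p dir="auto">As a rough cross-check of the <strong>length</strong> and <strong>lines</strong> values listed above, here is a minimal Python sketch. The mapping from the N++ encoding names to Python codecs is an assumption, and <code>disk_length</code> and <code>line_count</code> are hypothetical helper names, not part of Notepad++.</p>

```python
import codecs
import re

# Assumed mapping from the N++ encoding names to (Python codec, BOM bytes)
ENCODINGS = {
    "UTF-8": ("utf-8", b""),
    "UTF-8-BOM": ("utf-8", codecs.BOM_UTF8),
    "UCS-2 BE BOM": ("utf-16-be", codecs.BOM_UTF16_BE),
    "UCS-2 LE BOM": ("utf-16-le", codecs.BOM_UTF16_LE),
}


def disk_length(text, encoding_name):
    """Return (payload bytes, BOM bytes) for text saved with the given encoding."""
    codec, bom = ENCODINGS[encoding_name]
    return len(text.encode(codec)), len(bom)


def line_count(text):
    """Count lines without counting the empty 'line' after a final line-break."""
    parts = re.split(r"\r\n|\r|\n", text)
    if parts[-1] == "":
        parts.pop()
    return len(parts)


text = "\u00e9\u20ac\r\n"   # hypothetical one-line sample, ended by a line-break
for name in ENCODINGS:
    print(name, disk_length(text, name), line_count(text))
```

<p dir="auto">The payload size changes with the encoding while the number of lines stays the same, which is the behaviour one would expect from a corrected summary.</p>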
<p dir="auto">Note, that I’m about to create an <strong>issue</strong> for the <strong>wrong</strong> results returned for <strong><code>UCS-2 BE BOM</code></strong> and <strong><code>UCS-2 LE BOM</code></strong> encoded files !</p>
<hr />
<p dir="auto">To begin with, let me develop the… <strong>second</strong> bug ! After <strong>numerous</strong> tests, I determined that, in the <strong>present</strong> <strong><code>View &gt; Summary...</code></strong> feature, the characters considered as <strong>word</strong> characters are :</p>
<ul>
<li>
<p dir="auto">The <strong>C0 control</strong> characters, except for the <strong>Tabulation</strong> ( <strong><code>\x{0009}</code></strong> ) and the <strong>two EOL</strong> ( <strong><code>\x{000a}</code></strong> and <strong><code>\x{000d}</code></strong> ), so the regex <strong><code>(?![\t\r\n])[\x00-\x1F]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>number</strong> sign <strong><code>#</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>10</code></strong> <strong>digits</strong>, so the regex <strong><code>[0-9]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>26</code></strong> <strong>uppercase</strong> and <strong>lowercase</strong> letters, so the regex <strong><code>(?i)[A-Z]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>low line</strong> character <strong><code>_</code></strong></p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters, of the <strong>Basic Multilingual Plane</strong> ( <strong><code>BMP</code></strong> ), with code-point <strong>over</strong> <strong><code>\x{007E}</code></strong>, so the regex <strong><code>(?![\x{D800}-\x{DFFF}])[\x{007F}-\x{FFFF}]</code></strong> for a <strong><code>Unicode</code></strong> encoded file or <strong><code>[\x7F-\xFF]</code></strong> for an <strong><code>ANSI</code></strong> encoded file</p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters, <strong>over</strong> the <strong>Basic Multilingual Plane</strong>, so the regex <strong><code>(?-s).[\x{D800}-\x{DFFF}]</code></strong> for a <strong><code>Unicode</code></strong> encoded file, <strong>only</strong></p>
</li>
</ul>
<p dir="auto">To <strong>simulate</strong> the present <strong><code>Words:</code></strong> number ( which is <strong>erroneous</strong> ! ), given by the <strong>summary</strong> feature, <strong>whatever</strong> the file <strong>encoding</strong>, simply use the regex below :</p>
<pre><code class="language-z">[^\t\n\r\x20!"$%&amp;'()*+,\-./:;&lt;=&gt;?@\x5B\x5C\x5D^\x60{|}~]+
</code></pre>
<p dir="auto">and click on the <strong><code>Count</code></strong> button of the <strong>Find</strong> dialog, with the <strong><code>Wrap around</code></strong> option <strong>ticked</strong></p>
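<p dir="auto">Outside Notepad++, the same count can be approximated with Python’s <code>re</code> module. This is only a sketch : Python’s regex engine stands in for the N++ one, and the names <code>SUMMARY_WORD</code> and <code>summary_word_count</code> are hypothetical.</p>

```python
import re

# Same character class as the regex above, written as a raw triple-quoted string
SUMMARY_WORD = re.compile(r'''[^\t\n\r\x20!"$%&'()*+,\-./:;<=>?@\x5B\x5C\x5D^\x60{|}~]+''')


def summary_word_count(text):
    """Emulate the present 'Words:' value : count runs of non-excluded characters."""
    return len(SUMMARY_WORD.findall(text))


def regex_word_count(text):
    """Count words as runs of regex word characters."""
    return len(re.findall(r"\w+", text))


sample = "a#b c-d"   # hypothetical sample : '#' glues 'a' and 'b' together
print(summary_word_count(sample), regex_word_count(sample))
```

<p dir="auto">On this sample, the first count sees <code>a#b</code> as one word whereas <code>\w+</code> sees two, which illustrates the discrepancy between the two definitions.</p>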
<p dir="auto">Obviously, this is <strong>not</strong> exact as a single <strong>word</strong> character is matched with the <strong><code>\w</code></strong> regex, which is the class <strong><code>[\u\l\d_]</code></strong>, where <strong><code>\u</code></strong>, <strong><code>\l</code></strong> and <strong><code>\d</code></strong> represent any <strong>Unicode</strong> <strong><code>uppercase</code></strong>, <strong><code>lowercase</code></strong> and <strong><code>digit</code></strong> char or a <strong>related</strong> char, so, finally, <strong>much more</strong> than the simple <strong><code>[A-Za-z0-9]</code></strong> set !</p>
<p dir="auto">But, worse, it’s the very notion of <strong>word</strong> which is, in practice, <strong>not consistent</strong> most of the time ! Indeed, for instance, if we consider the <strong>French</strong> expression <strong><code>l'école</code></strong> ( the school ), the regex <strong><code>\w+</code></strong> would return <strong><code>2</code></strong> words, which is <strong>correct</strong> as this expression can be mentally decomposed as <strong><code>la école</code></strong>. However, this regex would <strong>wrongly</strong> say that the <strong>single</strong> word <strong><code>aujourd'hui</code></strong> ( today ) is a <strong>two-word</strong> expression. Of course, you could change the regex to <strong><code>[\w']+</code></strong> which would return <strong><code>1</code></strong> word, but, this time, the expression <strong><code>l'école</code></strong> would <strong>wrongly</strong> be considered as a <strong>one-word</strong> string !</p>
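<p dir="auto">The two French examples can be checked quickly with Python’s <code>re</code> module, used here as a stand-in for the N++ regex engine. The helper name <code>count_words</code> is hypothetical.</p>

```python
import re


def count_words(pattern, text):
    """Number of non-overlapping matches of pattern in text."""
    return len(re.findall(pattern, text))


for expression in ("l'\u00e9cole", "aujourd'hui"):
    print(expression, count_words(r"\w+", expression), count_words(r"[\w']+", expression))
```

<p dir="auto">Neither regex gives the linguistically expected answer for <strong>both</strong> expressions.</p>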
<p dir="auto">In addition, what can be said about languages that do <strong>not</strong> use the <strong><code>Space</code></strong> character or where the use of the <strong><code>Space</code></strong> is <strong>discretionary</strong> ? Then, <strong>counting</strong> words is impossible or, rather, <strong>non-significant</strong> ! This is developed in the article by <strong>Martin Haspelmath</strong>, below :</p>
<p dir="auto"><a href="https://zenodo.org/record/225844/files/WordSegmentationFL.pdf" rel="nofollow ugc">https://zenodo.org/record/225844/files/WordSegmentationFL.pdf</a></p>
<blockquote>
<p dir="auto">At the end of section <strong>5</strong>, it is said : … On such a view, the claim that “all languages have words” (Radford et al. 1999: 145) would be interpretable only in the weaker sense that “<strong>all languages have a unit which falls between the minimal sign and the phrase</strong>” …</p>
</blockquote>
<blockquote>
<p dir="auto">And : … The basic problem remains the same: The units are defined in a <strong>language-specific</strong> way and cannot be <strong>equated across languages</strong>, and there is <strong>no</strong> reason to give <strong>special</strong> status to a unit called <strong>‘word’</strong>. …</p>
</blockquote>
<blockquote>
<p dir="auto">At the beginning of section <strong>7</strong> : … Linguists have <strong>no good basis for identifying words</strong> across languages …</p>
</blockquote>
<blockquote>
<p dir="auto">And in the <strong>conclusion</strong>, section <strong>10</strong> : … I conclude, from the arguments presented in this article, that there is <strong>no definition of ‘word’</strong> that can be applied to <strong>any</strong> language and that would yield <strong>consistent</strong> results …</p>
</blockquote>
<hr />
<p dir="auto">Now, the <strong>Unicode</strong> definition of a <strong>word</strong> character is :</p>
<p dir="auto"><strong><code>\p{Alphabetic} | \p{gc=Mark} | \p{gc=Decimal_Number} | \p{gc=Connector_Punctuation} | \p{Join_Control}</code></strong></p>
<p dir="auto"><a href="https://stackoverflow.com/questions/5555613/does-w-match-all-alphanumeric-characters-defined-in-the-unicode-standard" rel="nofollow ugc">https://stackoverflow.com/questions/5555613/does-w-match-all-alphanumeric-characters-defined-in-the-unicode-standard</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries" rel="nofollow ugc">https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries</a></p>
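<p dir="auto">Python’s standard <code>unicodedata</code> module only exposes the <strong>General_Category</strong> and not the derived <strong>Alphabetic</strong> property, so the definition above can only be approximated with it. The sketch below is such an approximation, with a hypothetical function name <code>is_word_char</code> : it treats all letters and letter numbers as alphabetic and, therefore, misses the <strong>Other_Symbol</strong> members of <strong>Other_Alphabetic</strong>.</p>

```python
import unicodedata

JOIN_CONTROL = {"\u200C", "\u200D"}   # ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER


def is_word_char(ch):
    """Approximate the Unicode word-character test from the General_Category only."""
    category = unicodedata.category(ch)
    return (
        category[0] == "L"       # Lu, Ll, Lt, Lm, Lo ( approximation of Alphabetic )
        or category == "Nl"      # Letter_Number ( also part of Alphabetic )
        or category[0] == "M"    # Mark
        or category == "Nd"      # Decimal_Number
        or category == "Pc"      # Connector_Punctuation
        or ch in JOIN_CONTROL    # Join_Control
    )


for ch in ("\u00e9", "_", "\u0663", "#", " "):
    print(repr(ch), unicodedata.category(ch), is_word_char(ch))
```

<p dir="auto">The results depend on the Unicode version shipped with the Python interpreter, so they may differ slightly from the counts given for a specific Unicode version.</p>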
<p dir="auto">So, in theory, the <strong><code>word_character</code></strong> class should include :</p>
<ul>
<li>
<p dir="auto"><strong>All</strong> values of the <strong>derived</strong> category <strong>Alphabetic</strong> ( = <strong><code>alpha</code></strong> = <strong><code>\p{alphabetic}</code></strong> ) so <strong><code>132,875 chars</code></strong>, from the <strong>DerivedCoreProperties.txt</strong> file, which can be decomposed into :</p>
<ul>
<li>
<p dir="auto"><strong>Uppercase_Letter</strong> (<strong><code>Lu</code></strong>) + <strong>Lowercase_Letter</strong> (<strong><code>Ll</code></strong>) + <strong>Titlecase_Letter</strong> (<strong><code>Lt</code></strong>) + Modifier_Letter (<strong><code>Lm</code></strong>) + <strong>Other_Letter</strong> (<strong><code>Lo</code></strong>) + <strong>Letter_Number</strong> (<strong><code>Nl</code></strong>) + <strong>Other_Alphabetic</strong>, so the characters sum <strong><code>1,791 + 2,155  + 31 + 260 + 127,004 + 236 + 1,398</code></strong></p>
</li>
<li>
<p dir="auto"><strong>Note</strong> : The last property <strong>Other_Alphabetic</strong>, from the <strong>PropList.txt</strong> file, contains some, but <strong>not all</strong>, characters from the <strong><code>3</code></strong> General_Categories <strong>Spacing_Mark</strong> ( <strong><code>Mc</code></strong> ), <strong>Nonspacing_Mark</strong> ( <strong><code>Mn</code></strong> ) and <strong>Other_Symbol</strong> ( <strong><code>So</code></strong> ), so the characters sum <strong><code>417 + 851 + 130</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto"><strong>All</strong> values with <strong>General_Category</strong> = <strong><code>Decimal_Number</code></strong>, from the <strong>DerivedGeneralCategory.txt</strong> file, so <strong><code>650</code></strong> characters</p>
<p dir="auto">( These are characters with <strong>defined</strong> values in the <strong>three</strong> fields <strong><code>6</code></strong>, <strong><code>7</code></strong> and <strong><code>8</code></strong> of the <strong>UnicodeData.txt</strong> file )</p>
</li>
<li>
<p dir="auto"><strong>All</strong> values with <strong>General_Category</strong> = <strong><code>Connector_Punctuation</code></strong>, from the <strong>DerivedGeneralCategory.txt</strong> file, so <strong><code>10</code></strong> characters</p>
</li>
<li>
<p dir="auto"><strong>All</strong> values with the <strong>binary</strong> Property <strong><code>Join_Control</code></strong>, from the <strong>PropList.txt</strong> file, so <strong><code>2</code></strong> characters</p>
</li>
</ul>
<p dir="auto">So, if we include all <strong>Unicode</strong> languages, even <strong>historical</strong> ones :</p>
<p dir="auto">=&gt; <strong>Total</strong> number of Unicode <strong>word</strong> characters = <strong><code>132,875 + 650 + 10 + 2</code></strong> = <strong><code>133,537</code></strong> characters, with <strong>Unicode</strong> version <strong><code>13.0.0</code></strong> !!</p>
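<p dir="auto">The sums above can be re-checked in a few lines of Python. The figures are simply those quoted in this post for Unicode <strong><code>13.0.0</code></strong> ; the variable names are hypothetical.</p>

```python
# Counts quoted above for Unicode 13.0.0
alphabetic_parts = [1791, 2155, 31, 260, 127004, 236, 1398]   # Lu, Ll, Lt, Lm, Lo, Nl, Other_Alphabetic
other_alphabetic_parts = [417, 851, 130]                       # Mc, Mn, So

alphabetic = sum(alphabetic_parts)
total = alphabetic + 650 + 10 + 2   # + Decimal_Number + Connector_Punctuation + Join_Control

print(sum(other_alphabetic_parts), alphabetic, total)
```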
<p dir="auto"><strong>Notes</strong> :</p>
<ul>
<li>The <strong>different</strong> files mentioned can be downloaded from the <strong>Unicode Character Database</strong> ( <strong><code>UCD</code></strong> ) or <strong>sub-directories</strong>, below :</li>
</ul>
<p dir="auto"><a href="http://www.unicode.org/Public/UCD/latest/ucd/" rel="nofollow ugc">http://www.unicode.org/Public/UCD/latest/ucd/</a></p>
<ul>
<li>And refer to the <strong>sites</strong>, below, for <strong>additional</strong> information on this topic :</li>
</ul>
<p dir="auto"><a href="https://www.unicode.org/reports/tr18/#Compatibility_Properties" rel="nofollow ugc">https://www.unicode.org/reports/tr18/#Compatibility_Properties</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr29/#Word_Boundaries" rel="nofollow ugc">https://www.unicode.org/reports/tr29/#Word_Boundaries</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr31/" rel="nofollow ugc">https://www.unicode.org/reports/tr31/</a>    for tables <strong><code>4</code></strong>, <strong><code>5</code></strong> and <strong><code>6</code></strong> of section <strong><code>2.4</code></strong></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr44/#UnicodeData.txt" rel="nofollow ugc">https://www.unicode.org/reports/tr44/#UnicodeData.txt</a></p>
<hr />
<p dir="auto">Anyone who clicked on the links to the <strong>Unicode Consortium</strong>, above, will have understood, very quickly, that the notions of <strong>word</strong> characters and word <strong>boundaries</strong> are a real <strong>nightmare</strong> !</p>
<p dir="auto">Even if we <strong>restrict</strong> the definition of <strong>word</strong> chars to Unicode <strong>living</strong> scripts, forgetting all the <strong>historical</strong> scripts not in use, and also leaving <strong>aside</strong> all scripts which do <strong>not</strong> use the <strong>space</strong> char to, systematically, <strong>delimit</strong> words, we still have a list of about <strong><code>21,000</code></strong> characters which should be considered as <strong>word</strong> characters ! I tried to build up such a list, with the <strong>help</strong> of these sites :</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries" rel="nofollow ugc">https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries</a></p>
<p dir="auto"><a href="https://linguistlist.org/issues/6/6-1302/" rel="nofollow ugc">https://linguistlist.org/issues/6/6-1302/</a></p>
<p dir="auto"><a href="https://unicode-org.github.io/cldr-staging/charts/37/supplemental/scripts_and_languages.html" rel="nofollow ugc">https://unicode-org.github.io/cldr-staging/charts/37/supplemental/scripts_and_languages.html</a></p>
<p dir="auto"><a href="https://scriptsource.org/cms/scripts/page.php?item_id=script_overview" rel="nofollow ugc">https://scriptsource.org/cms/scripts/page.php?item_id=script_overview</a></p>
<p dir="auto"><a href="https://r12a.github.io/scripts/featurelist/" rel="nofollow ugc">https://r12a.github.io/scripts/featurelist/</a></p>
<p dir="auto">And I ended up with this list of <strong><code>46</code></strong> <strong>living</strong> scripts which always use a <strong><code>Space</code></strong> character between <strong>words</strong> :</p>
<pre><code class="language-diff">•------------------------•----------------•-------------------•-----------------•
|                        |    SCRIPT      |   SPACE between   |  UNICODE Script |
|                        |      Type :    |      Words :      |     Class :     |
|                        •----------------•-------------------•-----------------•
|           SCRIPT       |  (L)iving      |  (Y)es            |  (R)ecommended  |
|                        |                |  (U)nspecified    |  (L)imited      |
|                        |  (H)istorical  |  (D)iscretionary  |  (E)xcluded     |
|                        |                |  (N)o             |                 |
•------------------------•----------------•-------------------•-----------------•
|  ARMENIAN              |       L        |         Y         |        R        |
|  ADLAM                 |       L        |         Y         |        L        |
|  ARABIC                |       L        |         Y         |        R        |
|  BAMUM                 |       L        |         Y         |        L        |
|  BASSA VAH             |       L        |         Y         |        E        |
|  BENGALI ( Assamese )  |       L        |         Y         |        R        |
|  BOPOMOFO              |       L        |         Y         |        R        |
|  BUGINESE              |       L        |         D         |        E        |
|  CANADIAN SYLLABICS    |       L        |         Y         |        L        |
|  CHEROKEE              |       L        |         Y         |        L        |
|  CYRILLIC              |       L        |         Y         |        R        |
|  DEVANAGARI            |       L        |         Y         |        R        |
|  ETHIOPIC (Ge'ez)      |       L        |         Y         |        R        |
|  GEORGIAN              |       L        |         Y         |        R        |
|  GREEK                 |       L        |         Y         |        R        |
|  GUJARATI              |       L        |         Y         |        R        |
|  GURMUKHI              |       L        |         Y         |        R        |
|  HANGUL                |       L        |         Y         |        R        |
|  HANIFI ROHINGYA       |       L        |         Y         |        L        |
|  HEBREW                |       L        |         Y         |        R        |
|  KANNADA               |       L        |         Y         |        R        |
|  KAYAH LI              |       L        |         Y         |        L        |
|  LATIN                 |       L        |         Y         |        R        |
|  LIMBU                 |       L        |         Y         |        L        |
|  MALAYALAM             |       L        |         D         |        R        |
|  MANDAIC               |       H        |         Y         |        L        |
|  MEETEI MAYEK          |       L        |         Y         |        L        |
|  MIAO (Pollard)        |       L        |         Y         |        L        |
|  MONGOLIAN             |       L        |         Y         |        E        |
|  NEWA                  |       L        |         Y         |        L        |
|  NKO                   |       L        |         Y         |        L        |
|  OL CHIKI              |       L        |         Y         |        L        |
|  ORIYA (Odia)          |       L        |         Y         |        R        |
|  OSAGE                 |       L        |         Y         |        L        |
|  SINHALA               |       L        |         Y         |        R        |
|  SUNDANESE             |       L        |         Y         |        L        |
|  SYLOTI NAGRI          |       L        |         Y         |        L        |
|  SYRIAC                |       L        |         Y         |        L        |
|  TAI VIET              |       L        |         Y         |        L        |
|  TAMIL                 |       L        |         Y         |        R        |
|  TELUGU                |       L        |         Y         |        R        |
|  THAANA                |       L        |         D         |        R        |
|  TIFINAGH (Berber)     |       L        |         Y         |        L        |
|  VAI                   |       L        |         Y         |        L        |
|  WANCHO                |       L        |         Y         |        L        |
|  YI                    |       L        |         Y         |        L        |
•------------------------•----------------•-------------------•-----------------•
</code></pre>
<p dir="auto">These scripts involve <strong><code>101</code></strong> <strong>Unicode</strong> blocks, from <strong>Basic Latin</strong> ( <strong><code>0000 - 007F</code></strong> ) to <strong>Symbols for Legacy Computing</strong> ( <strong><code>1FB00 - 1FBFF</code></strong> )</p>
<hr />
<p dir="auto">You may, also, have a look at these sites for <strong>general</strong> information :</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/List_of_Unicode_characters" rel="nofollow ugc">https://en.wikipedia.org/wiki/List_of_Unicode_characters</a></p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Scriptio_continua#Decline" rel="nofollow ugc">https://en.wikipedia.org/wiki/Scriptio_continua#Decline</a></p>
<p dir="auto"><a href="https://glottolog.org/glottolog/language" rel="nofollow ugc">https://glottolog.org/glottolog/language</a>    especially to <strong>locate</strong> the area where a <strong>language</strong> is used</p>
<p dir="auto"><strong>Continued</strong> discussion in the <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/20218/emulation-of-the-view-summary-feature-with-a-python-script</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 15:40:56 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/20218.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 25 Oct 2020 19:32:15 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 24 Mar 2024 11:53:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said in <a href="/post/93770">Emulation of the "View &gt; Summary" feature with a Python script</a>:</p>
<blockquote>
<p dir="auto">did you receive my e-mail to you, on March, 21, with an attached zip archive to possibly reproduce the problem ?</p>
</blockquote>
<p dir="auto">Hi Guy.  Yes, I did receive it but hadn’t had time to work with it.  Because of your prompting, however, I have just finished evaluating it.</p>
<p dir="auto">I believe that what is happening in the buggy case is that <a href="https://github.com/bruderstein/PythonScript/issues/248" rel="nofollow ugc">THIS</a> PS bug is manifesting (side note: it’s a bug that <em><strong>I</strong></em> reported).  When the caret is at the first location in the file (aka position 0) – which is one of your test cases – then the bug kicks in.</p>
<p dir="auto">The bug has been fixed, but I don’t believe there has been a release of PS2 <em>after</em> the fixing, so only PS3 contains the fix (which is why I – running PS3 – did not see a problem with your script code that did not include the <code>bytes_count</code> check against <code>0</code>).</p>
<p dir="auto">I hope this clears it up.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93771</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93771</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Sun, 24 Mar 2024 11:53:48 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 24 Mar 2024 11:09:27 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a>,</p>
<p dir="auto">BTW, regarding the <strong>bug</strong> that you <strong>cannot</strong> identify, did you receive my <strong>e-mail</strong> to you, on March, <strong>21</strong>, with an <strong>attached</strong> zip archive to <strong>possibly</strong> reproduce the problem ?</p>
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93770</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93770</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 24 Mar 2024 11:09:27 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Mon, 04 Mar 2024 13:02:14 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said:</p>
<blockquote>
<p dir="auto">You should see, in the SELECTION(S) line, a non-null number of words</p>
</blockquote>
<p dir="auto">Well, I tried, using both PS3 and PS2, using license file and code change of: <code>#if Bytes_count != 0:</code>, and I still see in the output:</p>
<p dir="auto"><code>SELECTION(S)      :  0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range</code></p>
]]></description><link>https://community.notepad-plus-plus.org/post/93387</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93387</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 04 Mar 2024 13:02:14 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 03 Mar 2024 14:47:08 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>All</strong>,</p>
<p dir="auto">Ah… OK. No problem ! So, this script will work with both <strong>Python script</strong> <strong><code>2</code></strong> and <strong><code>3</code></strong>, nice !</p>
<hr />
<p dir="auto">Regarding the <strong>bug</strong>, I can reproduce it very <strong>easily</strong> !</p>
<p dir="auto">So, we use this part of the script, relative to <strong>selections</strong>, where I commented out the line <strong><code>if Bytes_count != 0:</code></strong> :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
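#        if Bytes_count != 0:    # guard commented out on purpose, to reproduce the bug
        # 'number' is the per-match callback defined elsewhere in the script
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))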
        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<p dir="auto">Then :</p>
<ul>
<li>
<p dir="auto">Open, let’s say, the <strong><code>license.txt</code></strong> file</p>
</li>
<li>
<p dir="auto">Move the <strong>caret</strong> to the <strong>very beginning</strong> of the <strong><code>license.txt</code></strong> file ( so, before the letter <strong>C</strong> of the word <strong><code>COPYING</code></strong> )</p>
</li>
<li>
<p dir="auto">Do <strong>not</strong> make any selection</p>
</li>
<li>
<p dir="auto">Run the script</p>
</li>
</ul>
<p dir="auto">=&gt; You should see, in the <strong><code>SELECTION(S)</code></strong> line, a <strong>non-null</strong> number of words :</p>
<pre><code class="language-diff"> SELECTION(S)      :  0 selected char, 5822 selected words (0 selected byte) in 1 EMPTY range
</code></pre>
<ul>
<li>
<p dir="auto">Now, just move the caret <strong>one</strong> character on the <strong>right</strong> ( so, between the <strong>C</strong> and the <strong>O</strong> letters of the word <strong><code>COPYING</code></strong> )</p>
</li>
<li>
<p dir="auto">Do <strong>not</strong> make any selection, again</p>
</li>
<li>
<p dir="auto"><strong>Re</strong>-run the script</p>
</li>
</ul>
<p dir="auto">=&gt; This time, we get, for the <strong><code>SELECTION(S)</code></strong> line, the <strong>expected</strong> results :</p>
<pre><code class="language-diff"> SELECTION(S)      :  0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range
</code></pre>
<p dir="auto">At first sight, this bug occurs <strong>only</strong> when the caret is at the <strong>very beginning</strong> of the <strong>current</strong> file !</p>
<p dir="auto">Once you find an <strong>explanation</strong> ( if any ! ), I will post the <strong>new</strong> version of the script.</p>
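<p dir="auto">A possible workaround, offered as a sketch and not as an explanation of the bug : test each selection's <strong>own</strong> length, instead of the running <code>Bytes_count</code> total, and skip the search when the range is empty. The function <code>count_words_in_ranges</code> below is hypothetical and uses Python's <code>re</code> module on a plain string as a stand-in for <code>editor.research</code>, so its offsets are string indices, not Scintilla byte positions :</p>

```python
import re


def count_words_in_ranges(text, ranges):
    """Count r'\w+' matches inside each (start, end) range of 'text'.

    Empty ranges (start == end) are skipped individually, so a bare caret
    never triggers a search, wherever it sits in the document.
    """
    total = 0
    for start, end in ranges:
        if start == end:          # zero-length selection : nothing to count
            continue
        total += len(re.findall(r'\w+', text[start:end]))
    return total


# A bare caret at offset 0, then one real selection followed by a bare caret
caret_only = count_words_in_ranges('COPYING this file', [(0, 0)])
mixed = count_words_in_ranges('COPYING this file', [(0, 7), (8, 8)])
```

<p dir="auto">In the script itself, the equivalent test would compare <code>editor.getSelectionNStart(n)</code> with <code>editor.getSelectionNEnd(n)</code> before calling <code>editor.research</code>.</p>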
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> : Maybe this bug does <strong>not</strong> occur with <strong><code>Python script 3</code></strong> ?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93357</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93357</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 03 Mar 2024 14:47:08 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 03 Mar 2024 12:16:36 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said :</p>
<blockquote>
<p dir="auto">But if this unique zero-length selection was on a non-empty line, it would wrongly write…</p>
</blockquote>
<p dir="auto">I removed the <code>if Bytes_count != 0:</code> and tried to replicate the problem you indicated, but did not see the same issue.  Can you provide more detail on your “steps to reproduce”?</p>
<hr />
<p dir="auto">Also, this line of your script gave me an error under Python3:</p>
<p dir="auto"><code>File_name = notepad.getCurrentFilename().decode('utf-8')</code></p>
<p dir="auto">Here’s a way to make it work under Python2 or 3:</p>
<pre><code>import sys
python3 = sys.version_info.major == 3
if python3:
    File_name = notepad.getCurrentFilename()
else:
    File_name = notepad.getCurrentFilename().decode('utf-8')
</code></pre>
]]></description><link>https://community.notepad-plus-plus.org/post/93354</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93354</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Sun, 03 Mar 2024 12:16:36 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 03 Mar 2024 11:02:27 GMT]]></title><description><![CDATA[<p dir="auto">Hi <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>all</strong>,</p>
<p dir="auto"><strong>Continuation</strong> of version <strong><code>v1.2</code></strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

print ('START')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = editor.getLength()

Total_chars = editor.countCharacters(0, editor.getLength())

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\n|\r', number)

Total_EOL = num

print ('EOL')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\t|\x20', number)

Blank_chars = num

print ('BLANK')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_standard = Total_chars - Total_EOL

True_chars = Total_chars - Total_EOL - Blank_chars

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':

    Total_BMP = Total_standard
    
    Total_1_byte = Total_BMP

    Total_2_bytes = 0

    Total_3_bytes = 0

    Total_4_bytes = 0

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':

    num = 0
    editor.research(r'[\x{0080}-\x{07FF}]', number)

    Total_2_bytes = num

    print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)

    Total_3_bytes = num

    print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------

    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) // 3  #  Integer division, so the count stays an int under Python 3

    Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes

    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------


if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':

    num = 0
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars different from '\r' and '\n'

    Total_2_bytes = num

    Total_4_bytes = Total_standard - Total_2_bytes

    Total_BMP = Total_2_bytes

    Total_1_byte = 0

    Total_3_bytes = 0

    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    print ('2-BYTES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\d', number)

Number_chars = num

print ('NUMBERS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'_', number)

Lowline_chars = num

print ('LOW_LINES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w', number)

Word_chars = num

print ('WORDS')

Letter_chars = Word_chars - Number_chars - Lowline_chars

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_total = num

print ('WORDS+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex_non_space = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_regex_non_space = True

Non_space_count = num

print ('NON-SPACE+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex_sentence = False

num = 0

try:
    editor.research(r'(?-s)(?:\A|(?&lt;=[\h\r\n.?!])).+?(?:(?=[.?!](\h|\R|\z))|(?=\R|\z))', number)
except RuntimeError:
    Err_regex_sentence = True

Sentence_count = num

print ('SENTENCES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_regex_paragraph = False

num = 0

try:
    editor.research(r'(?-s)(?:(?:.[\x{D800}-\x{DFFF}]?)+(?:\r\n|\n|\r))+(?:\r\n|\n|\r){1,}(?:(?:.[\x{D800}-\x{DFFF}]?)+\z)?|(?:.[\x{D800}-\x{DFFF}]?)+\z', number)
except RuntimeError:
    Err_regex_paragraph = True

Paragraph_count = num

print ('PARAGRAPHS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^(?:\r\n|\n|\r)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\n|\r)', number)

Special_empty = num

num = 0
editor.research(r'^(?:\r\n|\n|\r)', number)

Default_empty = num

Empty_lines = Default_empty - Special_empty

print ('EMPTY lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\n|\r|\z)', number)

Special_blank = num

num = 0
editor.research(r'^[\t\x20]+(?:\r\n|\n|\r|\z)', number)

Default_blank = num

Blank_lines = Default_blank - Special_blank

print ('BLANK lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_lines = editor.getLineCount()

num = 0
editor.research(r'(?-s)^.+\z', number)

if num == 0:
    Total_lines = Total_lines - 1  #  Because LAST line totally EMPTY

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    if Chars_count &lt; 2:
        Txt_chars = ' selected char, '
    else:
        Txt_chars = ' selected chars, '

    if Words_count &lt; 2:
        Txt_words = ' selected word ('
    else:
        Txt_words = ' selected words ('

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

console.hide()

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path    :  ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append (' CREATION     Date :  ' + Creation_date)

    line_list.append (' MODIFICATION Date :  ' + Modif_date + Line_end)

    line_list.append (' READ-ONLY flag    :  ' + RO_flag)

line_list.append (' READ-ONLY editor  :  ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW      :  ' + Curr_view + Line_end)

line_list.append (' Current ENCODING  :  ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END  :  ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard) + Line_end * 2)

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL))

line_list.append (' SPC &amp; TAB  Chars  :  ' + str(Blank_chars))

line_list.append (' TRUE       Chars  :  ' + str(True_chars) + Line_end)

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + Line_end * 2)
else:
    if Line_end == '\r\n':
        line_list.append (Line_end)

line_list.append (' NUMBER     Chars  :  ' + str(Number_chars) + '\t(*)')

line_list.append (' LOW_LINE   Chars  :  ' + str(Lowline_chars))

line_list.append (' LETTER     Chars  :  ' + str(Letter_chars) + '\t(*)' + Line_end)

line_list.append (' WORD       Chars  :  ' + str(Word_chars) + '\t(*)' + Line_end * 2)

line_list.append (' WORDS      Count  :  ' + str(Words_total) + '\t(*)' + Line_end)

if Err_regex_non_space == False:
    line_list.append (' NON-SPACE  Count  :  ' + str(Non_space_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' NON-SPACE  Count  :  ' + str(Non_space_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

if Err_regex_sentence == False:
    line_list.append (' SENTENCES  Count  :  ' + str(Sentence_count) + '\t(**)' + Line_end)
else:
    line_list.append (' SENTENCES  Count  :  ' + str(Sentence_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end)

if Err_regex_paragraph == False:
    line_list.append (' PARAGRAPHS Count  :  ' + str(Paragraph_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' PARAGRAPHS Count  :  ' + str(Paragraph_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + '\r\n' + Line_end)

line_list.append (' (*)   Our BOOST regex engine ignores all WORD, NUMBER and LETTER characters over the BMP and may ignore some others within the BMP !')

line_list.append (' (**)  The results may NOT be very accurate for "technical" or "non-regular" files !' + Line_end)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
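<p dir="auto">The <code>Total_4_bytes</code> formula of the <code>UTF-8</code> branch follows from two identities : the byte length is <code>n1 + 2*n2 + 3*n3 + 4*n4</code> and the char count is <code>n1 + n2 + n3 + n4</code> ( <strong>EOL</strong> chars being counted in <code>n1</code> ), so their difference is <code>n2 + 2*n3 + 3*n4</code>. Here is a small stand-alone check of that arithmetic, assuming <strong>Python 3</strong> strings ; the helper names are illustrative and are <strong>not</strong> part of the script :</p>

```python
def utf8_size_counts(text):
    """Return {size_in_bytes: number_of_chars} for a string encoded as UTF-8."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        counts[len(ch.encode('utf-8'))] += 1
    return counts


def four_byte_chars(bytes_length, total_chars, two_bytes, three_bytes):
    """Same arithmetic as the script's Total_4_bytes, with integer division."""
    return (bytes_length - total_chars - two_bytes - 2 * three_bytes) // 3


# One 1-byte, one 2-byte, one 3-byte and one 4-byte char, then CR + LF
sample = u'A\u00e9\u2600\U0001F600\r\n'
counts = utf8_size_counts(sample)
derived = four_byte_chars(len(sample.encode('utf-8')), len(sample),
                          counts[2], counts[3])
```

<p dir="auto">Here, <code>derived</code> is obtained only from the byte length and the <code>2-byte</code> / <code>3-byte</code> counts, the way the script does it, and can be compared with the directly counted <code>counts[4]</code>.</p>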
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93353</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93353</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 03 Mar 2024 11:02:27 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 03 Mar 2024 11:01:02 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>All</strong>,</p>
<p dir="auto">Below, the <strong><code>v1.2</code></strong> version of the <strong>Python</strong> script for an <strong>enhanced</strong> <strong><code>Summary</code></strong> feature :</p>
<ul>
<li>
<p dir="auto">I decomposed the <strong>total</strong> number of chars in <strong><code>3</code></strong> parts : <strong>EOL</strong> chars, <strong>Space and Tab</strong> chars and <strong>True</strong> chars ( <strong><code>[^\t\x20\r\n]</code></strong> )</p>
</li>
<li>
<p dir="auto">I also decomposed the <strong>total</strong> number of <strong>word</strong> chars in <strong><code>3</code></strong> parts : <strong>letters</strong> chars, <strong>digits</strong> chars and <strong>low_line</strong> chars</p>
</li>
<li>
<p dir="auto">I added a count of the <strong>paragraphs</strong> ( You may <strong>adapt</strong> the corresponding regex to your needs )</p>
</li>
<li>
<p dir="auto">I added a count of the <strong>sentences</strong> ( You may <strong>adapt</strong> the corresponding regex to your needs )</p>
</li>
<li>
<p dir="auto">I added some <strong>remarks</strong> at the end of the <strong>summary</strong> report, regarding the global <strong>accuracy</strong> of some results !</p>
</li>
</ul>
<hr />
<p dir="auto">Now, <strong>Alan</strong>, I needed to change this part, regarding the <strong>selections</strong> :</p>
<pre><code class="language-py">    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num
</code></pre>
<p dir="auto">by this one :</p>
<pre><code class="language-py">    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num
</code></pre>
<p dir="auto">Because, if the unique <strong>zero-length</strong> selection was on a <strong>pure empty</strong> line, it did write, as <strong>expected</strong>, the message :</p>
<pre><code class="language-diff">0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range
</code></pre>
<p dir="auto">But if this unique <strong>zero-length</strong> selection was on a <strong>non-empty</strong> line, it would <strong>wrongly</strong> write, for example :</p>
<pre><code class="language-diff">0 selected char, **`568`** selected words (0 selected byte) in 1 EMPTY range
</code></pre>
<p dir="auto">given that the <strong>whole</strong> file contains <strong><code>568</code></strong> words.</p>
<hr />
<p dir="auto">So, here is the <strong><code>v1.2</code></strong> version of my script, split on <strong>two</strong> posts :</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v1.2 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length)
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

console.show()

console.clear()

Start_time = time.time()

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename().decode('utf-8')

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

</code></pre>
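<p dir="auto">The chain of <code>if</code> tests on <code>Curr_encoding</code> could also be written as a lookup table. A minimal sketch, using the same five names as in the script ; <code>ENCODING_LABELS</code> and <code>encoding_label</code> are hypothetical names, and any unknown value is passed through unchanged :</p>

```python
# Internal encoding name (as a string)  ->  label shown in the report
ENCODING_LABELS = {
    'ENC8BIT': 'ANSI',
    'COOKIE':  'UTF-8',
    'UTF8':    'UTF-8-BOM',
    'UCS2BE':  'UTF-16 BE BOM',
    'UCS2LE':  'UTF-16 LE BOM',
}


def encoding_label(raw_name):
    """Return the report label for 'raw_name', or 'raw_name' itself if unknown."""
    return ENCODING_LABELS.get(raw_name, raw_name)
```

<p dir="auto">In the script, this would be called as <code>Curr_encoding = encoding_label(str(notepad.getEncoding()))</code>.</p>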
<p dir="auto">Continuation on <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93352</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93352</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 03 Mar 2024 11:01:02 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Wed, 21 Feb 2024 03:36:45 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a>,</p>
<p dir="auto">Many thanks for the <strong>tip</strong> ! I did some <strong>Google</strong> searches before, but only found some <strong>obscure</strong> explanations. But, right now, trying again with this <strong>question</strong> :</p>
<p dir="auto"><strong><code>How to get "os.path.isfile(Filename)" == True: when Filename contains "NON ASCII" chars ?</code></strong></p>
<p dir="auto">And in the <strong>first</strong> result, an article named <em>“python - UnicodeEncodeError on joining file name”</em>, dated Jan. 05 2010, on the <strong><code>Stack Overflow</code></strong> site, the following is said, <strong>verbatim</strong>, in the middle of the article :</p>
<p dir="auto"><strong><code>So I would first try filename = filename.decode('utf-8') -- that should allow the os.path.join to work</code></strong></p>
<hr />
<p dir="auto">Now, I won’t bother to re-edit my script with a <strong>new</strong> version number ! I just changed, in my <strong><code>v1.1</code></strong> version, above, the line :</p>
<pre><code class="language-py">File_name = notepad.getCurrentFilename()
</code></pre>
<p dir="auto">by this one :</p>
<pre><code class="language-py">File_name = notepad.getCurrentFilename().decode('utf-8')
</code></pre>
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93067</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93067</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 21 Feb 2024 03:36:45 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Tue, 20 Feb 2024 15:48:27 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said in <a href="/post/93034">Emulation of the "View &gt; Summary" feature with a Python script</a>:</p>
<blockquote>
<p dir="auto">how to recognize the filename even if current file or current path contain NON-ASCII characters ?</p>
</blockquote>
<p dir="auto">Short answer:  This is better done with Python3, i.e., PythonScript 3.x.  Then things “just work”.  :-)</p>
<p dir="auto">But, for Python2, (and PS 2.x) you can make a call to <code>.encode('utf-8')</code> or <code>.decode('utf-8')</code> – depending upon your circumstance (I’m not commenting on your specific code) – in order to get what you need.</p>
<p dir="auto">Basically, if you have a Python2 string (in a variable <code>s</code>) and you want to get a Unicode string (for things like Windows pathnames with non-trivial characters), use <code>s.decode('utf-8')</code> and to go the other way, where you have a Unicode str (in a variable <code>u</code>) and you want a Python2 str, do <code>u.encode('utf-8')</code>.</p>
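<p dir="auto">As a sketch of that rule which runs unchanged under Python2 and Python3, a hypothetical helper <code>to_text</code> can decode only when it is handed a byte string ( in Python2, <code>bytes</code> is just another name for <code>str</code> ). The sample name below is an illustration, not taken from a real path :</p>

```python
def to_text(value, encoding='utf-8'):
    """Return a Unicode string, decoding only if 'value' is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw_name = b'Num\xc3\xa9ros.txt'     # UTF-8 bytes of a name with an accented char
text_name = to_text(raw_name)        # Unicode string, usable for Windows pathnames
same_name = to_text(text_name)       # already Unicode : returned as is
```

<p dir="auto">Going the other way is <code>text_name.encode('utf-8')</code>, which gives back the original bytes.</p>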
]]></description><link>https://community.notepad-plus-plus.org/post/93035</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93035</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Tue, 20 Feb 2024 15:48:27 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Tue, 20 Feb 2024 14:58:03 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and Python <strong>gurus</strong>,</p>
<p dir="auto">I’ve just found a <strong>bug</strong> when trying to run my script against a <strong>“French”</strong> file called <strong><code>Numéros</code></strong> ( which means <strong><code>Numbers</code></strong> ) :-((</p>
<hr />
<p dir="auto">The Python <strong>section</strong> of my script, below, detects whether the <strong>current</strong> tab is associated with a <strong>true</strong> file, <strong>saved</strong> on disk, or refers to a <strong><code>new #</code></strong> file, <strong>not</strong> saved yet :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<hr />
<p dir="auto">And <strong>unfortunately</strong>, if the current name contains <strong>accented</strong> characters, like <strong><code>Numéros</code></strong>, it <strong>wrongly</strong> supposes it’s a <strong><code>new #</code></strong> file !</p>
<p dir="auto">As soon as it is <strong>renamed</strong> as <strong><code>Numeros</code></strong>, everything is <strong>OK</strong> again.</p>
<p dir="auto">So, how can the script recognize the <strong>filename</strong> even if the <strong>current</strong> file or the <strong>current</strong> path contains <strong><code>NON-ASCII</code></strong> characters ?</p>
<p dir="auto">TIA</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93034</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93034</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Tue, 20 Feb 2024 14:58:03 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Wed, 21 Feb 2024 01:49:34 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>Alan</strong> and <strong>all</strong>,</p>
<p dir="auto"><strong>Continuation</strong> of version <strong><code>v1.1</code></strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

console.show()

console.clear()

Start_time = time.time()

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

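File_name = notepad.getCurrentFilename()

if isinstance(File_name, bytes):  #  Byte string under PythonScript 2.x ( a Python 3 str has NO 'decode' method )
    File_name = File_name.decode('utf-8')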

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

print ('START')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = editor.getLength()

Total_chars = editor.countCharacters(0, editor.getLength())

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

print ('EOL')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_standard = Total_chars - Total_EOL

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':

    Total_BMP = Total_standard
    
    Total_1_byte = Total_BMP

    Total_2_bytes = 0

    Total_3_bytes = 0

    Total_4_bytes = 0

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':

    num = 0
    editor.research(r'[\x{0080}-\x{07FF}]', number)

    Total_2_bytes = num

    print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)

    Total_3_bytes = num

    print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------

    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) // 3  #  '//' keeps an INTEGER result with Python 3 too

    Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes

    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------


if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':

    num = 0
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars different from '\r' and '\n'

    Total_2_bytes = num

    Total_4_bytes = Total_standard - Total_2_bytes

    Total_BMP = Total_2_bytes

    Total_1_byte = 0

    Total_3_bytes = 0

    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    print ('2-BYTES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\t|\x20', number)

Non_blank_chars = Total_standard - num

print ('NON-BLANK')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_total = num

print ('WORDS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_regex = True

Non_space_count = num

print ('NON-SPACE')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\r|\n)', number)

Special_empty = num

num = 0
editor.research(r'^(?:\r\n|\r|\n)', number)

Default_empty = num

Empty_lines = Default_empty - Special_empty

print ('EMPTY lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Special_blank = num

num = 0
editor.research(r'^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Default_blank = num

Blank_lines = Default_blank - Special_blank

print ('BLANK lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_lines = editor.getLineCount()

num = 0
editor.research(r'(?-s)^.+\z', number)

if num == 0:
    Total_lines = Total_lines - 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    if Chars_count &lt; 2:
        Txt_chars = ' selected char, '
    else:
        Txt_chars = ' selected chars, '

    if Words_count &lt; 2:
        Txt_words = ' selected word ('
    else:
        Txt_words = ' selected words ('

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

console.hide()

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path    :  ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append (' CREATION     Date :  ' + Creation_date)

    line_list.append (' MODIFICATION Date :  ' + Modif_date + Line_end)

    line_list.append (' READ-ONLY flag    :  ' + RO_flag)

line_list.append (' READ-ONLY editor  :  ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW      :  ' + Curr_view + Line_end)

line_list.append (' Current ENCODING  :  ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END  :  ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + Line_end)

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + Line_end * 2)
else:
    if Line_end == '\r\n':
        line_list.append (Line_end)

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + Line_end)

line_list.append (' WORDS     Count   :  ' + str(Words_total) + ' (Caution !)' + Line_end)

if Err_regex == False:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + Line_end * 2)
else:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + ' (Caution as " RuntimeError " occurred !)' + Line_end * 2)


line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + Line_end)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
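<p dir="auto">As a side note, the <strong>4-byte</strong> count of the <strong><code>UTF-8</code></strong> branch is not searched for, but deduced from the buffer length. Here is a minimal sketch of that arithmetic, in plain Python 3, outside Notepad++. The function name <code>utf8_classes</code> is hypothetical, <code>len()</code> and <code>encode()</code> merely stand in for <code>editor.countCharacters()</code> and <code>editor.getLength()</code>, and the <strong>EOL</strong> chars are not set apart :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-
# Plain Python 3 sketch ; 'utf8_classes' is a hypothetical name, NOT part of the script above

def utf8_classes(text):
    """Count the chars of 'text' by UTF-8 length, deducing the 4-byte count from the byte length."""
    bytes_length = len(text.encode('utf-8'))   #  Stands in for editor.getLength()
    total_chars = len(text)                    #  Stands in for editor.countCharacters()
    two = sum(1 for c in text if ord(c) in range(0x0080, 0x0800))
    three = sum(1 for c in text if ord(c) in range(0x0800, 0x10000))
    #  bytes_length = total_chars + two + 2 * three + 3 * four, hence :
    four = (bytes_length - total_chars - two - 2 * three) // 3
    one = total_chars - two - three - four
    return one, two, three, four

#  ONE char of each UTF-8 length : 1, 2, 3 and 4 bytes
print(utf8_classes(u'A\u00e9\u2600\U0001D71C'))
</code></pre>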
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93031</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93031</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Wed, 21 Feb 2024 01:49:34 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Tue, 20 Feb 2024 12:05:15 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>All</strong>,</p>
<p dir="auto">Following your <strong>advice</strong>, I added the number of <strong>selected</strong> words ( <strong><code>\w+</code></strong> ) to the <strong>last</strong> line of the <strong><code>summary</code></strong> report, which deals with the <strong>different</strong> selections.</p>
<p dir="auto">If needed, the OP may prefer this <strong>second</strong> syntax, which also treats the <strong>hyphen</strong>, the <strong>apostrophe</strong> and the <strong>Right Single Quotation Mark</strong> as <strong>true word</strong> chars, when they are surrounded by <strong>word</strong> chars !</p>
<p dir="auto">SEARCH <strong><code>(?:(?&lt;=\w)[-'’](?=\w)|\w)+</code></strong></p>
<p dir="auto">And thus, <strong>replace</strong> the line</p>
<pre><code class="language-py">        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
</code></pre>
<p dir="auto">by this one :</p>
<pre><code class="language-py">        editor.research(r'(?:(?&lt;=\w)[-'’](?=\w)|\w)+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
</code></pre>
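<p dir="auto">To get an idea of the <strong>difference</strong> between the two syntaxes, here is a small sketch in plain Python, outside Notepad++. Python’s <code>re</code> module is not the Boost engine used by <code>editor.research()</code>, so take it as an approximation only. The <code>EXTENDED</code> pattern is a lookaround-free restatement of the second syntax, assumed to match the same spans when a text is scanned from its very beginning, and the sample text is invented :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-
import re

SIMPLE = re.compile(u"\\w+", re.UNICODE)

#  Lookaround-free restatement of the SECOND syntax ( an assumption, see above )
#  \u2019 is the Right Single Quotation Mark
EXTENDED = re.compile(u"\\w+(?:[-'\u2019]\\w+)*", re.UNICODE)

text = u"It's a well-known fact, isn\u2019t it ?"

print(len(SIMPLE.findall(text)))     #  9 : It, s, a, well, known, fact, isn, t, it
print(len(EXTENDED.findall(text)))   #  6 : It's, a, well-known, fact, isn’t, it
</code></pre>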
<hr />
<p dir="auto">So, here is the <strong><code>v1.1</code></strong> version of my script, split across <strong>two</strong> posts :</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v1.1 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 : room for the terminating NUL that SB_GETTEXTW writes
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

</code></pre>
<p dir="auto">Continuation on <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93030</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93030</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Tue, 20 Feb 2024 12:05:15 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Mon, 19 Feb 2024 12:19:55 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a></p>
<p dir="auto">I was considering recommending your script as a basis for the solution to <a href="https://community.notepad-plus-plus.org/topic/25493/word-count-in-status-bar-on-text-selection">THIS</a> inquiry, but then I noticed that your script doesn’t report word-count in selected text – perhaps it should do that as well?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/93003</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/93003</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 19 Feb 2024 12:19:55 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Mon, 19 Feb 2024 07:44:31 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>all</strong>,</p>
<p dir="auto"><strong>Continuation</strong> of version <strong><code>v1.0</code></strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

console.show()

console.clear()

Start_time = time.time()

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

print ('START')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = editor.getLength()

Total_chars = editor.countCharacters(0, editor.getLength())

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

print ('EOL')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_standard = Total_chars - Total_EOL

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':

    Total_BMP = Total_standard
    
    Total_1_byte = Total_BMP

    Total_2_bytes = 0

    Total_3_bytes = 0

    Total_4_bytes = 0

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':

    num = 0
    editor.research(r'[\x{0080}-\x{07FF}]', number)

    Total_2_bytes = num

    print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)

    Total_3_bytes = num

    print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------

    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) // 3  #  '//' keeps an INTEGER result with Python 3 too

    Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes

    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------


if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':

    num = 0
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars different from '\r' and '\n'

    Total_2_bytes = num

    Total_4_bytes = Total_standard - Total_2_bytes

    Total_BMP = Total_2_bytes

    Total_1_byte = 0

    Total_3_bytes = 0

    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    print ('2-BYTES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\t|\x20', number)

Non_blank_chars = Total_standard - num

print ('NON-BLANK')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

print ('WORDS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_regex = True

Non_space_count = num

print ('NON-SPACE')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\r|\n)', number)

Special_empty = num

num = 0
editor.research(r'^(?:\r\n|\r|\n)', number)

Default_empty = num

Empty_lines = Default_empty - Special_empty

print ('EMPTY lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Special_blank = num

num = 0
editor.research(r'^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Default_blank = num

Blank_lines = Default_blank - Special_blank

print ('BLANK lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_lines = editor.getLineCount()

num = 0
editor.research(r'(?-s)^.+\z', number)

if num == 0:
    Total_lines = Total_lines - 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count &lt; 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('


    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range\n'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

console.hide()

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path    :  ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append (' CREATION     Date :  ' + Creation_date)

    line_list.append (' MODIFICATION Date :  ' + Modif_date + Line_end)

    line_list.append (' READ-ONLY flag    :  ' + RO_flag)

line_list.append (' READ-ONLY editor  :  ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW      :  ' + Curr_view + Line_end)

line_list.append (' Current ENCODING  :  ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END  :  ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + Line_end)

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + Line_end * 2)
else:
    if Line_end == '\r\n':
        line_list.append (Line_end)

line_list.append (' NON-Blank Count   :  ' + str(Non_blank_chars) + Line_end)

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)' + Line_end)

if Err_regex == False:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + Line_end * 2)
else:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + ' (Caution as " RuntimeError " occurred !)' + Line_end * 2)


line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<hr />
<p dir="auto">Remember that you can get a <strong>shorter</strong> <strong><code>summary</code></strong> report by changing the line :</p>
<pre><code class="language-py">Line_end = '\r\n'
</code></pre>
<p dir="auto">by this one :</p>
<pre><code class="language-py">Line_end = ''
</code></pre>
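<p dir="auto">To see what this switch does, here is a minimal sketch in plain <strong><code>Python 3</code></strong>, which runs outside N++. The <code>build_report()</code> helper and the figures in the lines are hypothetical, only there to mimic how the script appends <code>Line_end</code> to some lines before joining them with <code>\r\n</code> :</p>
<pre><code class="language-py"># Sketch only : hypothetical helper and made-up figures, NOT part of the script

def build_report(line_end):
    """Build a tiny report the same way the script does."""
    line_list = []
    line_list.append(' 1-BYTE  Chars     :  10')
    line_list.append(' 3-BYTES Chars     :  2' + line_end)       # one separator line
    line_list.append(' TOTAL characters  :  12' + line_end * 2)  # two separator lines
    line_list.append(' NON-Blank Count   :  9')
    return '\r\n'.join(line_list)

long_report = build_report('\r\n')   # default value of Line_end
short_report = build_report('')      # shorter report

print(len(long_report.split('\r\n')))   # number of lines with separators
print(len(short_report.split('\r\n')))  # number of lines without separators
</code></pre>
<p dir="auto">With an <strong>empty</strong> <code>Line_end</code>, every separator line simply disappears and only the <strong>data</strong> lines remain.</p>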
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92984</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92984</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Mon, 19 Feb 2024 07:44:31 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 18 Feb 2024 19:42:13 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto">You’ll find, below, the <strong><code>v1.0</code></strong> version of my script. I changed a <strong>lot</strong> of things :</p>
<ul>
<li>
<p dir="auto">I added a timer to get the <strong>execution time</strong> of the script, which is written right <strong>after</strong> the current <strong>date</strong>, at the <strong>beginning</strong> of the summary</p>
</li>
<li>
<p dir="auto">I modified some <strong>regexes</strong> in order to <strong>improve</strong> their performance and I also changed the <strong>order</strong> in which they are run</p>
</li>
<li>
<p dir="auto">I used the <strong>PythonScript</strong> methods <strong><code>editor.getLength()</code></strong>, <strong><code>editor.countCharacters(0, editor.getLength())</code></strong> and <strong><code>editor.getLineCount()</code></strong> to get, respectively, the <strong>bytes length</strong> value ( <strong>without</strong> a possible <strong><code>BOM</code></strong> ), the <strong>Total_chars</strong> value and the <strong>Total_lines</strong> value. Note that, in the case of a <strong><code>UTF-8</code></strong> or <strong><code>UTF-8-BOM</code></strong> encoded file, we get <strong>two</strong> relations :</p>
<ul>
<li>(<strong>A</strong>) <strong><code>Bytes_length - Total_EOL - Total_1_byte - 2 × Total_2_bytes - 3 × Total_3_bytes = 4 × Total_4_bytes</code></strong></li>
<li>(<strong>B</strong>) <strong><code>Total_chars  - Total_EOL - Total_1_byte - Total_2_bytes - Total_3_bytes           = Total_4_bytes</code></strong></li>
</ul>
</li>
</ul>
<p dir="auto">So, by subtracting (<strong>B</strong>) from (<strong>A</strong>), we can deduce the equation :</p>
<p dir="auto"><strong><code>Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 × Total_3_bytes ) / 3</code></strong></p>
<p dir="auto">and then :</p>
<p dir="auto"><strong><code>Total_1_byte = Total_chars - Total_EOL - Total_2_bytes - Total_3_bytes - Total_4_bytes</code></strong></p>
<p dir="auto">Thus, <strong>after</strong> counting the number of <strong><code>Total_2_bytes</code></strong> and <strong><code>Total_3_bytes</code></strong>, the <strong>two</strong> results <strong><code>Total_4_bytes</code></strong> and <strong><code>Total_1_byte</code></strong> are <strong>easily</strong> deduced. This new approach cuts the <strong>execution</strong> time of the script by a factor of <strong><code>2</code></strong> to <strong><code>3</code></strong>, because, most of the time, the file contains <strong>only</strong> <strong><code>1-byte</code></strong> chars, which no longer need to be counted with a regex :-))</p>
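<p dir="auto">As a quick check of these two equations, here is a small sketch in plain <strong><code>Python 3</code></strong>, independent of N++ and <strong>PythonScript</strong>. The function names and the sample string are my own assumptions. It counts each class of chars directly, then verifies that the deduced values match :</p>
<pre><code class="language-py"># Sketch only : plain Python 3, hypothetical helper names, NOT part of the script

def utf8_class_counts(text):
    """Return (eol, one, two, three, four) character counts for text."""
    eol = one = two = three = four = 0
    for ch in text:
        size = len(ch.encode('utf-8'))   # number of UTF-8 bytes of this char
        if ch in '\r\n':
            eol += 1
        elif size == 1:
            one += 1
        elif size == 2:
            two += 1
        elif size == 3:
            three += 1
        else:
            four += 1
    return eol, one, two, three, four


def deduce_4_and_1(bytes_length, total_chars, total_eol, total_2, total_3):
    """Apply the two equations : first Total_4_bytes, then Total_1_byte."""
    total_4 = (bytes_length - total_chars - total_2 - 2 * total_3) // 3
    total_1 = total_chars - total_eol - total_2 - total_3 - total_4
    return total_4, total_1


# 'A' (1 byte), U+00E9 (2 bytes), U+2600 (3 bytes), U+1D71C (4 bytes), CR LF, 'abc'
sample = 'A\u00e9\u2600\U0001D71C\r\nabc'

eol, one, two, three, four = utf8_class_counts(sample)
deduced = deduce_4_and_1(len(sample.encode('utf-8')), len(sample), eol, two, three)

print(deduced == (four, one))
</code></pre>
<p dir="auto">For this sample, the bytes length is <code>15</code> and the total of chars is <code>9</code>, so <code>Total_4_bytes = ( 15 - 9 - 1 - 2 ) / 3 = 1</code> and <code>Total_1_byte = 9 - 2 - 1 - 1 - 1 = 4</code>.</p>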
<p dir="auto">However, the <strong><code>Buffer_length</code></strong> value <strong>wrongly</strong> remains the <strong>same</strong>, in the case of a <strong><code>UTF-16 BE BOM</code></strong> or <strong><code>UTF-16 LE BOM</code></strong> encoded file. Thus, I needed to compute the <strong><code>Total_4_bytes</code></strong> and <strong><code>Bytes_length</code></strong> values, from the number of <strong><code>Total_2_bytes</code></strong>, with the relations :</p>
<p dir="auto"><strong><code>Total_4_bytes = Total_chars - Total_EOL - Total_2_bytes</code></strong></p>
<p dir="auto"><strong><code>Bytes_length = 2 × Total_EOL + 2 × Total_2_bytes + 4 × Total_4_bytes</code></strong></p>
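<p dir="auto">The same kind of check can be sketched for the <strong><code>UTF-16</code></strong> relations, again in plain <strong><code>Python 3</code></strong> and outside N++. I assume here that <code>Total_2_bytes</code> counts <strong>all</strong> the <strong><code>BMP</code></strong> chars other than <code>CR</code> and <code>LF</code>. The function name and the sample are hypothetical :</p>
<pre><code class="language-py"># Sketch only : plain Python 3, hypothetical helper name, NOT part of the script

def utf16_totals(text):
    """Return (Total_4_bytes, Bytes_length) deduced for a UTF-16 file."""
    total_chars = len(text)
    total_eol = sum(1 for ch in text if ch in '\r\n')
    # BMP chars ( except CR and LF ) take exactly 2 bytes in UTF-16
    total_2 = sum(1 for ch in text
                  if ch not in '\r\n' and len(ch.encode('utf-16-le')) == 2)
    total_4 = total_chars - total_eol - total_2
    bytes_length = 2 * total_eol + 2 * total_2 + 4 * total_4
    return total_4, bytes_length


# 'A', U+00E9, U+2600, U+1D71C ( over the BMP ), CR LF, 'abc'
sample = 'A\u00e9\u2600\U0001D71C\r\nabc'

total_4, bytes_length = utf16_totals(sample)

# The deduced length should match a real UTF-16 encoding without BOM
print(bytes_length == len(sample.encode('utf-16-le')))
</code></pre>
<p dir="auto">For this sample, <code>Total_4_bytes = 9 - 2 - 6 = 1</code> and <code>Bytes_length = 2 × 2 + 2 × 6 + 4 × 1 = 20</code>.</p>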
<ul>
<li>
<p dir="auto">Now, because some <strong>huge</strong> files may take a <strong>long</strong> time before the <strong><code>Summary</code></strong> results appear ( even with the <strong>native</strong> N++ version, BTW ! ), you can follow the progress of the different <strong>searches</strong> on the <strong><code>Python</code></strong> <strong>console</strong>, which is automatically <strong>shown</strong> at the beginning of the script and <strong>hidden</strong> right <strong>before</strong> outputting the results</p>
</li>
<li>
<p dir="auto">At the <strong>end</strong> of the script, I replaced the <strong><code>notepad.prompt</code></strong> method with the <strong><code>notepad.messageBox</code></strong> method in order to display the <strong>warning</strong> ( more <strong>logical</strong> ! )</p>
</li>
</ul>
<hr />
<p dir="auto"><strong>IMPORTANT</strong> :</p>
<ul>
<li>
<p dir="auto"><strong>Never</strong> switch to <strong>another</strong> tab while this script is running. Otherwise, you’ll probably get <strong>unpredictable</strong> or <strong>negative</strong> results !</p>
</li>
<li>
<p dir="auto">If, while watching the <strong>console</strong> messages, you feel that the results are taking too <strong>long</strong> for a <strong>specific</strong> file and you prefer to <strong>abort</strong> its <strong><code>Summary</code></strong> report, simply <strong>stop</strong> the <strong>current</strong> <strong><code>Python</code></strong> script with the classical <strong><code>Plugins &gt; Python Script &gt; Stop script</code></strong> menu option</p>
</li>
</ul>
<hr />
<p dir="auto">Now, I was a bit upset by some <strong>inconsistent</strong> results regarding the number of <strong><code>NON-SPACE</code></strong> strings, when the <strong>current</strong> file, with a <strong><code>Unicode</code></strong> encoding, contains some characters <strong>over</strong> the <strong><code>BMP</code></strong></p>
<p dir="auto">So, I searched among <strong>all</strong> my posts, since <strong>2013</strong>, as well as some others used as <strong>documentation</strong>, for <strong>only</strong> those containing some <strong><code>four-bytes</code></strong> characters and here is the <strong>list</strong> of these files with the <strong>reported</strong> results :</p>
<pre><code class="language-diff">•=============================•===========•=================•==================•============•================•
|                             |           |    Expected     |  Summary Report  |            |                |
|           Filename          |   4_BYTES |         NON-SPACE count            | Difference |    Encoding    |
|                             |           | (?:(?!\s).[\x{D800}-\x{DFFF}]?)+   |            |                |
•=============================•===========•=================•==================•============•================•
|  Symbola_Monospacified.txt  |   11,951  |     199,891     |      199,882     |      - 9   |  UTF-8-BOM     |
|  Total_Chars.txt            |  262,136  |           9     |           18     |      + 9   |  UTF-8-BOM     |
•=============================•===========•=================•==================•============•================•
|  Caractères.txt             |    2,901  |       7,361     |        7,358     |      - 3   |  UTF-8-BOM     |
|  Test_2.txt                 |    1,276  |           8     |            9     |      + 1   |  UTF-8         |
|  Test_1.txt                 |      881  |           8     |            9     |      + 1   |  UTF-8         |
|  Plane_0.txt                |        0  |           9     |           10     |      + 1   |  UCS-2 BE BOM  |
|  Clemens.txt                |    3,968  |       2,816     |        2,818     |      + 2   |  UTF-8-BOM     |
|  Planes_0+1.txt             |   65,534  |           9     |           12     |      + 3   |  UTF-8-BOM     |
•=============================•===========•=================•==================•============•================•
|  Chars_Over_BMP.txt         |       28  |         455     |          455     |        0   |  UTF-8-BOM     |
|  Entites_by_Name.txt        |      133  |      15,968     |       15,968     |        0   |  UTF-8         |
|  Entites_by_Number.txt      |      133  |      15,968     |       15,968     |        0   |  UTF-8         |
|  Invisible_chars.txt        |       31  |       3,459     |        3,459     |        0   |  UTF-8-BOM     |
|  Osmanya_Tout.txt           |      119  |         605     |          605     |        0   |  UTF-8-BOM     |
|  Smileys.txt                |    1,031  |      10,157     |       10,157     |        0   |  UTF-8-BOM     |
|  Alan_K.txt                 |      114  |      46,082     |       46,082     |        0   |  UTF-8         |
|  Alexolog.txt               |       13  |       2,199     |        2,199     |        0   |  UTF-8         |
|  André_Z.txt                |        8  |       5,860     |        5,860     |        0   |  UTF-8         |
|  Bidule.txt                 |        1  |         327     |          327     |        0   |  UTF-8         |
|  Carypt.txt                 |        1  |       3,551     |        3,551     |        0   |  UTF-8         |
|  Dean_Corso.txt             |      761  |       9,632     |        9,632     |        0   |  UTF-8         |
|  Don_Ho.txt                 |        2  |      41,426     |       41,426     |        0   |  UTF-8         |
|  Durkin.txt                 |      144  |       4,638     |        4,638     |        0   |  UTF-8         |
|  Dylan.txt                  |       34  |       2,180     |        2,180     |        0   |  UTF-8         |
|  Furek.txt                  |       20  |         499     |          499     |        0   |  UTF-8         |
|  Gary_2.txt                 |        2  |         458     |          458     |        0   |  UTF-8         |
|  Haleba.txt                 |        5  |         817     |          817     |        0   |  UTF-8         |
|  ImSpecial.txt              |        1  |         161     |          161     |        0   |  UTF-8         |
|  Joss.txt                   |        6  |         105     |          105     |        0   |  UTF-8         |
|  JR.txt                     |       39  |       1,735     |        1,735     |        0   |  UTF-8         |
|  Mark_Olson.txt             |        1  |       3,652     |        3,652     |        0   |  UTF-8         |
|  Minus_Majus.txt            |       62  |       9,931     |        9,931     |        0   |  UTF-8         |
|  Niting-jain.txt            |        4  |         537     |          537     |        0   |  UTF-8         |
|  PeterCJ.txt                |       31  |      37,323     |       37,323     |        0   |  UTF-8         |
|  Petr_jaja.txt              |       14  |       3,168     |        3,168     |        0   |  UTF-8         |
|  Pintas.txt                 |        4  |         614     |          614     |        0   |  UTF-8         |
|  Register.txt               |       20  |         242     |          242     |        0   |  UTF-8         |
|  Scott_3.txt                |        4  |      42,552     |       42,552     |        0   |  UTF-8         |
|  Skevich.txt                |        6  |         715     |          715     |        0   |  UTF-8         |
|  Statistiques.txt           |        7  |       9,012     |        9,012     |        0   |  UTF-8         |
|  Summary.txt                |        7  |       4,322     |        4,322     |        0   |  UTF-8         |
|  Summary_NEW.txt            |       10  |       8,903     |        8,903     |        0   |  UTF-8         |
|  Uzivatel.txt               |        2  |         873     |          873     |        0   |  UTF-8         |
|  Xavier_mdq.txt             |       13  |       3,652     |        3,652     |        0   |  UTF-8         |
|  Text.txt                   |    2,400  |       1,000     |        1,000     |        0   |  UTF-8         |
•=============================•===========•=================•==================•============•================•
</code></pre>
<p dir="auto">From that list, I deduced that the <strong>NON-SPACE</strong> count is <strong>erroneous</strong> in very <strong>rare</strong> cases only, especially when the current file contains, <strong>consecutively</strong> :</p>
<ul>
<li>
<p dir="auto"><strong>All</strong> the characters of a font</p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters of an <strong><code>Unicode</code></strong> range</p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters of <strong>all</strong> <strong><code>Unicode</code></strong> ranges</p>
</li>
</ul>
<p dir="auto">Luckily, in all the <strong>other</strong> cases, with a <strong>random</strong> position of these <strong><code>four-bytes</code></strong> chars, the <strong><code>Summary</code></strong> report <strong>always</strong> gives the <strong>right</strong> results, regarding the <strong><code>NON-SPACE</code></strong> count !</p>
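<p dir="auto">If you want an <strong>independent</strong> figure to compare with, here is a minimal sketch in plain <strong><code>Python 3</code></strong>, run outside N++. It relies on the fact that a <code>Python 3</code> string handles a char <strong>over</strong> the <strong><code>BMP</code></strong> as a <strong>single</strong> char, so the simple <code>\S+</code> regex is enough. The function name and the sample are hypothetical and the <code>\s</code> class of the <code>re</code> module may differ slightly from the N++ one, so take the result as a <strong>cross-check</strong> only :</p>
<pre><code class="language-py"># Sketch only : plain Python 3, hypothetical helper name, NOT part of the script

import re

def non_space_count(text):
    """Count the runs of consecutive NON-SPACE characters in text."""
    return len(re.findall(r'\S+', text))


# 'abc', two consecutive emoji, 'x', 'y' and one char over the BMP
sample = 'abc \U0001F600\U0001F601 x\ty\r\n\U0001D71C'

print(non_space_count(sample))
</code></pre>
<p dir="auto">Here, the two <strong>consecutive</strong> emoji count as a <strong>single</strong> NON-SPACE string, as expected.</p>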
<hr />
<p dir="auto">Here is the <strong><code>v1.0</code></strong> version of my script, split across <strong>two</strong> posts :</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v1.0 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 for the terminating NUL char written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

</code></pre>
<p dir="auto">Continuation on <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92983</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92983</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 18 Feb 2024 19:42:13 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 11 Feb 2024 02:18:19 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>all</strong>,</p>
<p dir="auto"><strong>Continuation</strong> of version <strong><code>v0.8</code></strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  #  By default

if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_EOL + Total_standard

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_Regex = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_Regex = True

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count &lt; 2:
        Txt_chars = ' selected char ('

    else:
        Txt_chars = ' selected chars ('


    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '

    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range\n'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path    :  ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION     Date :  ' + Creation_date)

    line_list.append(' MODIFICATION Date :  ' + Modif_date + Line_end)

    line_list.append(' READ-ONLY flag    :  ' + RO_flag )

line_list.append (' READ-ONLY editor  :  ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW      :  ' + Curr_view + Line_end)

line_list.append (' Current ENCODING  :  ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END  :  ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + Line_end)

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + Line_end * 2)
else:
    line_list.append (Line_end)

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + Line_end)

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)' + Line_end)

if Err_Regex == False:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + Line_end * 2)
else:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + ' (ERROR : Ran out of stack space trying to match the regular expression !)' + Line_end * 2)

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<hr />
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92824</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92824</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 11 Feb 2024 02:18:19 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sun, 11 Feb 2024 02:16:28 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto">I realized that the <strong>line endings</strong> in the <strong><code>Summary</code></strong> report were a <strong>mess</strong>. So I defined a <strong><code>Line_end</code></strong> variable equal to <strong><code>\r\n</code></strong>, and the results are now more <strong>consistent</strong> !</p>
<p dir="auto">One <strong>advantage</strong> : if you do <strong>not</strong> want any <strong>additional</strong> line break in the <strong><code>Summary</code></strong> report, simply change the line :</p>
<pre><code class="language-py">Line_end = '\r\n'
</code></pre>
<p dir="auto">to this one :</p>
<pre><code class="language-py">Line_end = ''
</code></pre>
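<p dir="auto">As a minimal sketch of that effect, runnable outside Notepad++ : the <code>build_report</code> helper and its two report lines are hypothetical, made up for illustration only, but the items are built and joined the same way as <code>line_list</code> in the script.</p>
<pre><code class="language-py"># Hypothetical helper : builds a two-item report the same way the script fills line_list
def build_report(line_end):
    lines = []
    lines.append(' Current VIEW      :  MAIN View' + line_end)
    lines.append(' Current ENCODING  :  UTF-8' + line_end)
    return '\r\n'.join(lines)

spaced = build_report('\r\n')   # each item is followed by an additional line break
compact = build_report('')      # items are separated by the join() line break only

print(repr(spaced))
print(repr(compact))
</code></pre>
<p dir="auto">With <code>'\r\n'</code>, every item carries its own extra line break, so the report is more airy. With <code>''</code>, only the separator added by <code>join()</code> remains.</p>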
<p dir="auto">So, here is the <strong><code>v0.8</code></strong> version of my script :</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.8 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 : room for the terminating NUL written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

</code></pre>
<p dir="auto">Continuation on <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92823</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92823</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 11 Feb 2024 02:16:28 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 15:01:00 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>all</strong>,</p>
<p dir="auto"><strong>Continuation</strong> of version <strong><code>v0.7</code></strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  #  By default

if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_EOL + Total_standard

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_Regex = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_Regex = True

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count &lt; 2:
        Txt_chars = ' selected char ('

    else:
        Txt_chars = ' selected chars ('


    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '

    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range\n'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path    :  ' + File_name + '\n')

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION     Date :  ' + Creation_date)

    line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')

    line_list.append(' READ-ONLY flag    :  ' + RO_flag )

line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')

line_list.append (' Current VIEW      :  ' + Curr_view + '\n')

line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')

line_list.append (' Current Line END  :  ' + Curr_eol + '\n')

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')

if Err_Regex == False:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')
else:
    line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + ' (ERROR : Ran out of stack space trying to match the regular expression !)\n\n')

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<hr />
<p dir="auto">So, just <strong>test</strong> this script against <strong>any</strong> file, to uncover any possible <strong>bug</strong> or <strong>limitation</strong> !!</p>
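<p dir="auto">One possible way to cross-check the <strong>byte-length</strong> arithmetic of the script, outside Notepad++ : the sketch below assumes <strong>Python 3</strong>, and the <code>utf8_classes</code> helper and the sample string are hypothetical, chosen for illustration only. Unlike the script, it does not count the <code>CR</code> and <code>LF</code> characters separately.</p>
<pre><code class="language-py"># Hypothetical helper : counts the characters of a string per UTF-8 encoded size ( 1 to 4 bytes )
def utf8_classes(text):
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        counts[len(ch.encode('utf-8'))] += 1
    return counts

# One char of each class : 'A', U+00E9, U+2600 and U+1D71C ( this last one is over the BMP )
sample = u'A\u00e9\u2600\U0001d71c'

counts = utf8_classes(sample)

# Same arithmetic as the script : 1b + 2 x 2b + 3 x 3b + 4 x 4b for UTF-8
utf8_length = sum(size * n for size, n in counts.items())

# ... and 2 bytes per BMP char + 4 bytes per char over the BMP for UTF-16
total_bmp = counts[1] + counts[2] + counts[3]
utf16_length = 2 * total_bmp + 4 * counts[4]

print(counts, utf8_length, utf16_length)

# Both sums must agree with what Python itself produces when encoding the string
assert utf8_length == len(sample.encode('utf-8'))
assert utf16_length == len(sample.encode('utf-16-le'))
</code></pre>
<p dir="auto">If one of the two final checks fails for a given text, the per-class counts are the first thing to look at.</p>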
<p dir="auto">I’ve also heard of <strong>compiled</strong> regexes in <strong>Python</strong>. Would that be <strong>interesting</strong> for <strong>this</strong> script ?</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92802</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92802</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 15:01:00 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 15:14:53 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <strong>All</strong>,</p>
<ul>
<li>
<p dir="auto">So, I followed <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/26710">@mark-olson</a>’s <strong>excellent</strong> suggestion to bypass the <strong>clipboard</strong> !</p>
</li>
<li>
<p dir="auto">Now, when searching for the <strong>NON-SPACE</strong> count of characters, a possible <strong><code>RuntimeError</code></strong> is caught by an <strong>exception</strong> handler, which sets the <strong><code>Err_Regex</code></strong> variable to <strong>True</strong>, so that a <strong>warning</strong> message is displayed in the report. But, even when the <strong><code>Err_Regex</code></strong> variable is <strong>False</strong>, the result is not <strong>totally</strong> guaranteed either, if the <strong>analyzed</strong> file contains characters <strong>over</strong> the <strong><code>BMP</code></strong>.</p>
</li>
</ul>
<p dir="auto">So, overall, whatever the <strong><code>Err_Regex</code></strong> status, the <strong><code>NON-SPACE count</code></strong> value may be off by <strong><code>1</code></strong>, in either direction, in some cases that are still <strong>unclear</strong> !</p>
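<p dir="auto">The exception handling described above can be sketched outside Notepad++ as follows. This is only an illustration : <code>fake_research</code> is a hypothetical stand-in for <code>editor.research</code>, which reports three matches and then fails, and <code>count_matches</code> is a hypothetical helper, not part of the script.</p>
<pre><code class="language-py">def count_matches(research, pattern):
    """Count the matches and tell whether the regex engine gave up on the way."""
    counter = {'num': 0}

    def number(match):          # callback run once per match, like number() in the script
        counter['num'] += 1

    try:
        research(pattern, number)
        err_regex = False
    except RuntimeError:
        err_regex = True        # the count obtained so far may be incomplete

    return counter['num'], err_regex


# Hypothetical stand-in for editor.research : 3 matches, then the engine fails
def fake_research(pattern, callback):
    for _ in range(3):
        callback(None)
    raise RuntimeError('Ran out of stack space trying to match the regular expression.')


print(count_matches(fake_research, r'\S+'))   # prints (3, True)
</code></pre>
<p dir="auto">Returning the flag together with the count lets the report show the partial figure along with the warning.</p>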
<hr />
<p dir="auto">Here is the <strong><code>v0.7</code></strong> version of my script ( I have indeed given a <strong>version</strong> number to each of my <strong>successive</strong> attempts ! )</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.7 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 : room for the terminating NUL written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

</code></pre>
<p dir="auto">Continuation on <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92801</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92801</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 15:14:53 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 04:16:01 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said in <a href="/post/92797">Tests and impressions on the "View &gt; Summary..." functionality</a>:</p>
<blockquote>
<p dir="auto">editor.copyText (‘\r\n’.join(line_list))</p>
<p dir="auto">notepad.new()</p>
<p dir="auto">editor.paste()</p>
<p dir="auto">editor.copyText(‘’)</p>
</blockquote>
<p dir="auto">Couldn’t you just do</p>
<pre><code>notepad.new()
editor.setText('\r\n'.join(line_list))
</code></pre>
<p dir="auto">and thus avoid overwriting the user’s clipboard?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92799</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92799</guid><dc:creator><![CDATA[Mark Olson]]></dc:creator><pubDate>Sat, 10 Feb 2024 04:16:01 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 03:00:09 GMT]]></title><description><![CDATA[<p dir="auto"><strong>Continuation</strong> of the script :</p>
<pre><code class="language-py"># --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP vchars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  #  By default

if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_EOL + Total_standard

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?&lt;!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?&lt;![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count &lt; 2:
        Txt_chars = ' selected char ('

    else:
        Txt_chars = ' selected chars ('


    if Bytes_count &lt; 2:
        Txt_bytes = ' selected byte) in '

    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel &lt; 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel &lt; 2 and Bytes_count &gt; 0:
        Txt_ranges = ' range\n'

    if Num_sel &gt; 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel &gt; 1 and Bytes_count &gt; 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path    :  ' + File_name + '\n')

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION     Date :  ' + Creation_date)

    line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')

    line_list.append(' READ-ONLY flag    :  ' + RO_flag )

line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')

line_list.append (' Current VIEW      :  ' + Curr_view + '\n')

line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')

line_list.append (' Current Line END  :  ' + Curr_eol + '\n')

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')

line_list.append (' CHARS w/o CR &amp; LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')

line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

notepad.new()

editor.paste()

editor.copyText('')

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding &gt; Character Set &gt; ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =&gt;  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
</code></pre>
<hr />
<p dir="auto">The way to use this script is quite <strong>self-explanatory</strong>. Just <strong>three</strong> points to <strong>emphasize</strong> :</p>
<ul>
<li>
<p dir="auto">On the <strong><code>BYTES Length</code></strong> line, the values between <strong>parentheses</strong> :</p>
<ul>
<li>
<p dir="auto"><strong>Always</strong> begin with the number of <strong><code>EOL</code></strong> ( I <strong>omitted</strong> the <strong><code>b</code></strong> after <strong><code>x 1</code></strong>, on purpose ! )</p>
<ul>
<li>
<p dir="auto">Followed with the number of the <strong><code>1-BYTE</code></strong> for an <strong><code>ANSI</code></strong> <strong>encoded</strong> file</p>
</li>
<li>
<p dir="auto">Followed with the numbers of the <strong><code>1-BYTE</code></strong>, <strong><code>2-BYTES</code></strong>, <strong><code>3-BYTES</code></strong> and <strong><code>4-BYTES</code></strong>, for an <strong><code>UTF-8</code></strong> or <strong><code>UTF-8-BOM</code></strong> <strong>encoded</strong> file</p>
</li>
<li>
<p dir="auto">Followed with the numbers of the <strong><code>2-BYTES</code></strong> and <strong><code>4-BYTES</code></strong>, for an <strong><code>UTF-16 BE BOM</code></strong> or <strong><code>UTF-16 LE BOM</code></strong> <strong>encoded</strong> file</p>
</li>
</ul>
</li>
</ul>
</li>
<li>
<p dir="auto">Normally, when a file is <strong>saved</strong>, the values <strong><code>BUFFER Length</code></strong> and <strong><code>Length on DISK</code></strong> should always be <strong>equal</strong>. If <strong>not</strong>, <strong>two</strong> cases are possible :</p>
<ul>
<li>
<p dir="auto">The file has been <strong>modified</strong> since its last save ( <strong>trivial</strong> case )</p>
</li>
<li>
<p dir="auto">The file is <strong>not</strong> identified with a <strong><code>BOM</code></strong> and has been <strong>re-interpreted</strong> with another <strong>NON-Unicode</strong> encoding. Then, <strong>apply</strong> the actions indicated in the <strong>pop-up</strong> message !</p>
</li>
</ul>
</li>
<li>
<p dir="auto">For a <strong><code>new #</code></strong> file, some values are <strong>obviously</strong> absent. These are the <strong><code>MODIFICATION date</code></strong>, the <strong><code>CREATION date</code></strong>, the <strong><code>READ-ONLY</code></strong> flag and the <strong><code>Length on DISK</code></strong> ( <strong>size</strong> ) values</p>
</li>
</ul>
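<p dir="auto">As a cross-check of the arithmetic behind the <strong><code>BYTES Length</code></strong> and <strong><code>BUFFER Length</code></strong> lines, here is a minimal stand-alone sketch ( Python 3, run outside Notepad++ ). The helper names <code>bytes_length</code> and <code>buffer_length</code> and the sample string are only illustrative assumptions and are <strong>not</strong> part of the script above; the formulas simply mirror those of the script.</p>
<pre><code class="language-py"># Stand-alone sketch ( Python 3 ) : NOT part of the script above
# bytes_length() and buffer_length() are hypothetical helper names

BOM_SIZE = {'ANSI': 0, 'UTF-8': 0, 'UTF-8-BOM': 3, 'UTF-16 BE BOM': 2, 'UTF-16 LE BOM': 2}

def bytes_length(encoding, eol, n1=0, n2=0, n3=0, n4=0):
    """Byte count of the text, BOM excluded ( n1..n4 = chars needing 1..4 bytes in UTF-8 )."""
    if encoding == 'ANSI':
        return eol + n1
    if encoding in ('UTF-8', 'UTF-8-BOM'):
        return eol + n1 + 2 * n2 + 3 * n3 + 4 * n4
    if encoding in ('UTF-16 BE BOM', 'UTF-16 LE BOM'):
        # Each EOL char and each BMP char takes 2 bytes, each char over the BMP takes 4 bytes
        return 2 * eol + 2 * (n1 + n2 + n3) + 4 * n4
    raise ValueError('Unknown encoding : ' + encoding)

def buffer_length(encoding, eol, n1=0, n2=0, n3=0, n4=0):
    """Byte count of the text, BOM included."""
    return bytes_length(encoding, eol, n1, n2, n3, n4) + BOM_SIZE[encoding]

# Sample : 'A' (1 byte), U+00E9 (2 bytes), U+2600 (3 bytes), U+1D71C (4 bytes), then CR LF
sample = u'A\u00e9\u2600\U0001D71C\r\n'

# Each line prints the computed value, then the value given by Python's own encoder
print(bytes_length('UTF-8', 2, 1, 1, 1, 1), len(sample.encode('utf-8')))
print(buffer_length('UTF-8-BOM', 2, 1, 1, 1, 1), len(sample.encode('utf-8-sig')))
print(bytes_length('UTF-16 LE BOM', 2, 1, 1, 1, 1), len(sample.encode('utf-16-le')))
</code></pre>
<p dir="auto">On each printed line, the two numbers should be <strong>identical</strong>. If they differ for a real file, the file was probably <strong>re-interpreted</strong> with another encoding, as described above.</p>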
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92797</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92797</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 03:00:09 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 10:35:54 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto"><strong>Remarks</strong> : Although most of the <strong>regexes</strong>, above, can be <strong>easily</strong> understood, here are some <strong>additional</strong> elements :</p>
<ul>
<li>
<p dir="auto">The regex <strong><code>(?-s).[\x{D800}-\x{DFFF}]</code></strong> is the <strong>sole</strong> correct syntax, with our <strong>Boost</strong> regex engine, to count <strong>all</strong> the characters <strong>over</strong> the <strong><code>BMP</code></strong>. But it may fail with the message <strong><code>Ran out of stack space trying to match the regular expression.</code></strong> Luckily, I do <strong>not</strong> need it because this count can be deduced from the difference <strong><code>Total_standard - Total_BMP</code></strong></p>
</li>
<li>
<p dir="auto">The regex <strong><code>(?s)((?!\s).[\x{D800}-\x{DFFF}]?)+</code></strong>, to count <strong>all</strong> the <strong><code>Non_Space</code></strong> strings, was explained <strong>before</strong> but may fail with the message <strong><code>Ran out of stack space trying to match the regular expression.</code></strong></p>
</li>
<li>
<p dir="auto">In <strong>all</strong> the regexes, relative to the counting of <strong>lines</strong>, you probably noticed the character class <strong><code>[\f\x{0085}\x{2028}\x{2029}]</code></strong>. It <strong>must</strong> be present because each of the <strong>four</strong> characters <strong><code>\f</code></strong>, <strong><code>\x{0085}</code></strong>, <strong><code>\x{2028}</code></strong> and <strong><code>\x{2029}</code></strong> is seen as <strong>both</strong> an <strong><code>End</code></strong> and a <strong><code>Start</code></strong> of line, so the <strong>assertions</strong> <strong><code>^</code></strong> and <strong><code>$</code></strong> match around it !</p>
<ul>
<li>For instance, if, in a <strong>new</strong> file, you insert one <strong>Next_Line</strong> char ( <strong><code>NEL</code></strong> ), of code-point <strong><code>\x{0085}</code></strong> and hit the <strong><code>Enter</code></strong> key, this <strong>sole</strong> line is <strong>wrongly</strong> seen as an <strong>empty</strong> line by the simple regex <strong><code>^(?:\r\n|\r|\n)</code></strong> which matches the <strong>line-break</strong> after the <strong><code>Next_Line</code></strong> char !</li>
</ul>
</li>
</ul>
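<p dir="auto">As an <strong>analogy</strong> only, here is a short stand-alone sketch ( Python 3, run outside Notepad++ ). It relies on Python's own <code>str.splitlines()</code> method, which is <strong>not</strong> the <strong>Boost</strong> engine but also treats these <strong>four</strong> characters as line boundaries. The helper name <code>pieces</code> is hypothetical.</p>
<pre><code class="language-py"># Stand-alone sketch ( Python 3 ) : an analogy with Python's str.splitlines(), NOT the Boost engine

SPECIAL = [u'\f', u'\x85', u'\u2028', u'\u2029']  # FF, NEL, LS and PS

def pieces(text):
    """Split text into lines, keeping the line-ending characters ( hypothetical helper )."""
    return text.splitlines(True)

for ch in SPECIAL:
    # The special char closes a first 'line', so the CR LF which follows looks like an EMPTY line
    print(repr(ch), pieces(u'abc' + ch + u'\r\n'))
</code></pre>
<p dir="auto">In each case, <strong>two</strong> pieces are returned : the text ended by the special char, then the <strong>line-break</strong> alone. This is the same trap as the one avoided by the <strong>look-behind</strong> <strong><code>(?&lt;![\f\x{0085}\x{2028}\x{2029}])</code></strong> in the script.</p>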
<hr />
<p dir="auto">Here is the <strong>Python</strong> script, split across <strong>two</strong> posts</p>
<pre><code class="language-py"># encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.6 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number &lt;= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode &amp; 0xFFFF
        type = (retcode &gt;&gt; 16) &amp; 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 : room for the terminating NUL char written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

</code></pre>
<p dir="auto">See <strong>next</strong> post for continuation !</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92796</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92796</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 10:35:54 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 02:49:09 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>All</strong>,</p>
<p dir="auto">Then, with the help of the <strong>excellent</strong> <strong><code>Babel Map</code></strong> software, updated for Unicode <strong>v13.0</strong></p>
<p dir="auto"><a href="https://www.babelstone.co.uk/Software/BabelMap.html" rel="nofollow ugc">https://www.babelstone.co.uk/Software/BabelMap.html</a></p>
<p dir="auto">I succeeded in creating a list of the <strong><code>21,143</code></strong> <strong>remaining</strong> characters, from the <strong>living</strong> scripts, above, which should be <strong>truly</strong> considered as <strong>word</strong> characters, <strong>without</strong> any ambiguity</p>
<p dir="auto">On the other hand, with the help of my <strong><code>Total_Chars.txt</code></strong> file, which contains <strong><code>325,590</code></strong> characters, I detected <strong><code>48,031</code></strong> <strong>word</strong> chars with the simple search of the <strong><code>\w</code></strong> regex. This number seems large but includes <strong>all</strong> the <strong><code>Chinese</code></strong> characters and equivalent chars which cannot be <strong>truly</strong> counted as <strong>word</strong> chars because of their <strong>vertical</strong> / <strong>horizontal</strong> way of writing !</p>
<p dir="auto">In addition, when applying the regex <strong><code>\t\w\t</code></strong> against this list above, I got a total of <strong><code>17,307</code></strong> <strong>word</strong> characters, <strong>only</strong>, because, probably, Notepad++ does <strong>not</strong> use the <strong>Boost</strong> regex library with <em>FULL</em> <strong>Unicode</strong> support</p>
<p dir="auto">Indeed, after some <strong>verifications</strong> :</p>
<ul>
<li>
<p dir="auto">The <strong>Boost</strong> definition of the regex <strong><code>\w</code></strong> does <strong>not</strong> consider <strong>all</strong> the characters <strong>over</strong> the <strong><code>BMP</code></strong></p>
</li>
<li>
<p dir="auto">Some characters of the <strong><code>BMP</code></strong>, although <strong>alphabetic</strong>, are <strong>not</strong> considered, yet, as <strong>word</strong> chars</p>
</li>
</ul>
<p dir="auto">For instance, in this short list, below, each <strong>Unicode</strong> char, <strong>surrounded</strong> with two <strong><code>tabulation</code></strong> chars, <strong>cannot</strong> be found with the regex <strong><code>\t\w\t</code></strong>, although each char is, indeed, seen as a <strong>word</strong> char by the <strong>Unicode Consortium</strong> :-((</p>
<pre><code class="language-z"> 24B6   Ⓐ     ; Other_Symbol     # So         CIRCLED LATIN CAPITAL LETTER A
1D400   𝐀     ; Uppercase_Letter # Lu         MATHEMATICAL BOLD CAPITAL A
1D70B   𝜋     ; Lowercase_Letter # Ll         MATHEMATICAL ITALIC SMALL PI
1F150   🅐     ; Other_Symbol     # So         NEGATIVE CIRCLED LATIN CAPITAL LETTER A
</code></pre>
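<p dir="auto">The <strong><code>General_Category</code></strong> values of the list above can be checked with a short stand-alone sketch ( Python 3, run outside Notepad++ ), based on the standard <code>unicodedata</code> module. The helper name <code>describe</code> is hypothetical. The last column shows what Python's <strong>own</strong> <code>re</code> module answers for <strong><code>\w</code></strong>. This is a <strong>different</strong> engine from <strong>Boost</strong>, so its answers are given for comparison only.</p>
<pre><code class="language-py"># Stand-alone sketch ( Python 3 ) : NOT the Boost engine of Notepad++
import re
import unicodedata

# The four characters of the list above, written with escapes
SAMPLES = [u'\u24B6', u'\U0001D400', u'\U0001D70B', u'\U0001F150']

def describe(ch):
    """Return ( code-point, General_Category, name, matched by Python's \\w ? ) for one char."""
    code_point = u'%04X' % ord(ch)
    is_word = re.fullmatch(r'\w', ch) is not None
    return (code_point, unicodedata.category(ch), unicodedata.name(ch), is_word)

for ch in SAMPLES:
    print(describe(ch))
</code></pre>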
<p dir="auto">To my mind, for <strong>all</strong> these reasons, as we <strong>cannot</strong> rely on the <strong>word</strong> notion, the <strong><code>View &gt; Summary...</code></strong> feature should just <strong>ignore</strong> the number of <strong>words</strong> or, at least, add the indication <strong><code>With caution</code></strong> !</p>
<hr />
<p dir="auto">By contrast, I think that it would be <strong>useful</strong> to count the number of <strong><code>Non_Space</code></strong> strings, determined with the regex <strong><code>\S+</code></strong>. Indeed, we would get more <strong>reliable</strong> results ! The <strong>boundaries</strong> of <strong><code>Non_Space</code></strong> strings, which are the <strong><code>Space</code></strong> characters, belong to the <strong>well-defined</strong> list of the <strong><code>25</code></strong> <strong>Unicode</strong> characters with the <strong>binary</strong> property <strong><code>White_Space</code></strong>, from the <strong><code>PropList.txt</code></strong> file. Refer to the <strong>very beginning</strong> of this file :</p>
<p dir="auto"><a href="http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt" rel="nofollow ugc">http://www.unicode.org/Public/UCD/latest/ucd/PropList.txt</a></p>
<p dir="auto">As a reminder, the regex <strong><code>\s</code></strong> is <strong>identical</strong> to <strong><code>\h|\v</code></strong>. So, it represents the <strong>complete</strong> character class <strong><code>[\t\x20\xA0\x{1680}\x{2000}-\x{200B}\x{202F}\x{3000}]|[\n\x0B\f\r\x85\x{2028}\x{2029}]</code></strong> which can be <strong>re</strong>-ordered as :</p>
<p dir="auto"><strong><code>\s</code></strong> = <strong><code>[\t\n\x0B\f\r\x20\x85\xA0\x{1680}\x{2000}-\x{200B}\x{2028}\x{2029}\x{202F}\x{3000}]</code></strong></p>
<p dir="auto">Note that, in practice, the <strong><code>\s</code></strong> regex is mainly <strong>equivalent</strong> to the simple regex  <strong><code>[\t\n\r\x20]</code></strong></p>
<p dir="auto">Here is the list of all <strong>Unicode</strong> characters with the property <strong><code>White_Space</code></strong>, with their <strong>name</strong> and their <strong><code>General_Category</code></strong> value :</p>
<pre><code class="language-diff">0009  TAB  ; White_Space    # Cc    TABULATION  &lt;control-0009&gt;
000A  LF   ; White_Space    # Cc    LINE FEED  &lt;control-000A&gt;
000B       ; White_Space    # Cc    VERTICAL TABULATION  &lt;control-000B&gt;
000C    ; White_Space    # Cc    FORM FEED  &lt;control-000C&gt;
000D  CR   ; White_Space    # Cc    CARRIAGE RETURN  &lt;control-000D&gt;
0020       ; White_Space    # Zs    SPACE
0085    ; White_Space    # Cc    NEXT LINE  &lt;control-0085&gt;
00A0       ; White_Space    # Zs    NO-BREAK SPACE
1680       ; White_Space    # Zs    OGHAM SPACE MARK
2000       ; White_Space    # Zs    EN QUAD
2001       ; White_Space    # Zs    EM QUAD
2002       ; White_Space    # Zs    EN SPACE
2003       ; White_Space    # Zs    EM SPACE
2004       ; White_Space    # Zs    THREE-PER-EM SPACE
2005       ; White_Space    # Zs    FOUR-PER-EM SPACE
2006       ; White_Space    # Zs    SIX-PER-EM SPACE
2007       ; White_Space    # Zs    FIGURE SPACE
2008       ; White_Space    # Zs    PUNCTUATION SPACE
2009       ; White_Space    # Zs    THIN SPACE
200A       ; White_Space    # Zs    HAIR SPACE
2028     ; White_Space    # Zl    LINE SEPARATOR
2029     ; White_Space    # Zp    PARAGRAPH SEPARATOR
202F       ; White_Space    # Zs    NARROW NO-BREAK SPACE
205F       ; White_Space    # Zs    MEDIUM MATHEMATICAL SPACE
3000   　  ; White_Space    # Zs    IDEOGRAPHIC SPACE
</code></pre>
<p dir="auto">Note that I used the <strong>notations</strong> <em>TAB</em>, <em>LF</em> and <em>CR</em>, standing for the <strong>three</strong> characters <strong><code>\t</code></strong>, <strong><code>\n</code></strong> and <strong><code>\r</code></strong>, instead of the chars themselves</p>
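<p dir="auto">To compare the <strong><code>\s</code></strong> character class, as written above, with the <strong><code>White_Space</code></strong> list, here is a small stand-alone sketch ( Python 3 ). Both sets are transcribed <strong>by hand</strong> from this post, so the result is only as accurate as that transcription. The names <code>WHITE_SPACE</code> and <code>BACKSLASH_S</code> are illustrative.</p>
<pre><code class="language-py"># Stand-alone sketch ( Python 3 ) : both sets are transcribed by hand from this post

COMMON = [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680, 0x2028, 0x2029, 0x202F, 0x3000]

# The 25 code-points of the White_Space list ( U+2000 to U+200A, plus U+205F )
WHITE_SPACE = set(COMMON + list(range(0x2000, 0x200B)) + [0x205F])

# The code-points of the \s class, as written above ( U+2000 to U+200B, no U+205F )
BACKSLASH_S = set(COMMON + list(range(0x2000, 0x200C)))

def as_hex(code_points):
    """Format a set of code-points as a sorted list of U+XXXX strings."""
    return ['U+%04X' % cp for cp in sorted(code_points)]

print('White_Space count :', len(WHITE_SPACE))
print('Only in \\s        :', as_hex(BACKSLASH_S - WHITE_SPACE))
print('Only in the list  :', as_hex(WHITE_SPACE - BACKSLASH_S))
</code></pre>
<p dir="auto">With this transcription, the two sets differ by <strong>two</strong> code-points only : <strong><code>\x{200B}</code></strong> ( <strong>ZERO WIDTH SPACE</strong> ), present in the class but <strong>not</strong> in the list, and <strong><code>\x{205F}</code></strong>, present in the list but <strong>not</strong> in the class.</p>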
<p dir="auto">So, in order to get the number of <strong><code>Non_Space</code></strong> strings, we should, <strong>normally</strong>, use the simple regex <strong><code>\S+</code></strong>. However, it does <strong>not</strong> give the <strong>right</strong> number. Indeed, when <strong>several</strong> characters, with code-point <strong>over</strong> the <strong><code>BMP</code></strong>, are <strong>consecutive</strong>, they are <strong>not</strong> seen as a <strong>global</strong> <strong><code>Non_Space</code></strong> string but as <strong>individual</strong> characters :-((</p>
<p dir="auto">You may test my statement with this <strong>string</strong>, composed of <strong>four</strong> consecutive <strong><code>emoji</code></strong> chars 👨👩👦👧. The regex <strong><code>\S+</code></strong> returns <strong>four</strong> <strong><code>Non_Space</code></strong> strings, whereas I would have <strong>expected</strong> only <strong>one</strong> string !</p>
<p dir="auto">Consequently, I verified that, when the number of <strong>four bytes</strong> chars is <strong><code>&gt; 0</code></strong>, the <strong>suitable</strong> regex to <strong>count</strong> all the <strong><code>Non_Space</code></strong> strings of a file, <strong>whatever</strong> their <strong>Unicode</strong> code-point, is rather the regex <strong><code>((?!\s).[\x{D800}-\x{DFFF}]?)+</code></strong> ( longer, I agree but <strong>exact</strong> ! )</p>
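<p dir="auto">The optional <strong><code>[\x{D800}-\x{DFFF}]?</code></strong> part of this regex refers to the <strong>surrogate</strong> range : in <strong>UTF-16</strong>, each character <strong>over</strong> the <strong><code>BMP</code></strong> is stored as a <strong>pair</strong> of surrogate code units. The stand-alone sketch below ( Python 3, run outside Notepad++ ) shows this arithmetic for the <strong>four</strong> <strong><code>emoji</code></strong> chars above. The helper names are hypothetical. Python's <strong>own</strong> <code>re</code> module, which is <strong>not</strong> the <strong>Boost</strong> engine, is used for comparison only.</p>
<pre><code class="language-py"># Stand-alone sketch ( Python 3 ) : NOT the Boost engine of Notepad++
import re

# The four emoji chars of the test string above, written with escapes
family = u'\U0001F468\U0001F469\U0001F466\U0001F467'

def utf16_units(text):
    """Number of 16-bit code units needed to store text in UTF-16 ( hypothetical helper )."""
    return len(text.encode('utf-16-le')) // 2

def to_surrogates(ch):
    """Return the ( high, low ) surrogate pair of ONE char over the BMP ( hypothetical helper )."""
    high, low = divmod(ord(ch) - 0x10000, 0x400)
    return (0xD800 + high, 0xDC00 + low)

print(len(family), 'characters stored as', utf16_units(family), 'UTF-16 code units')

for ch in family:
    print(['%04X' % unit for unit in to_surrogates(ch)])

# Python 3 works on whole code-points, so its \S+ sees ONE Non_Space string
print(len(re.findall(r'\S+', family)))
</code></pre>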
<hr />
<p dir="auto">So, I would like to propose a <strong>new</strong> layout of a <strong>summary</strong> feature, which should be more <strong>informative</strong>. It contains a list of <strong>regexes</strong> which allow you to <strong>count</strong> different <strong>subsets</strong> of characters from the <strong>current</strong> file contents. Of course, <strong>tick</strong> the <strong><code>Wrap around</code></strong> option, in the <strong><code>Find</code></strong> dialog and click on the <strong><code>Count</code></strong> button for <strong>tests</strong> !</p>
<p dir="auto"><em>IMPORTANT</em> : In the <strong>list</strong> below, <strong>any</strong> text, before the <strong>colon</strong> character of <strong>each</strong> line, is the <strong>name</strong> which should be displayed in the <strong>new</strong> <strong><code>Summary</code></strong> dialog !</p>
<pre><code class="language-diff"> FULL File Path    :  X:\....\....\

 CREATION     Date :  Name Month Day 22-05-26 Year
 MODIFICATION Date :  Name Month Day 22-05-26 Year

 READ-ONLY flag    :  YES / NO
 READ-ONLY editor  :  YES / NO


 Current VIEW      :  MAIN view / SECONDARY view

 Current ENCODING  :  UTF-... / ANSI

 Current LANGUAGE  :  TXT ( Normal txt file) / ...

 Current Line END  :  Windows (CR LF) / Macintosh (CR) / Unix (LF)

 Current WRAPPING  :  YES / NO

•------------------------------------------------•----------------------------------------------------------------------------•------------------------------------------------•-----------------------------------•
                                                 |                                 UTF-8 [-BOM]                               |             UCS-2/UTF-16 BE/LE BOM             |                ANSI
•------------------------------------------------•----------------------------------------------------------------------------•------------------------------------------------•-----------------------------------•
                                                 |                                                                            |                                                |
 1-BYTE  Chars     :  N1                         | (?![\r\n])[\x{0000}-\x{007F}]                                              |                        0                       |               [^\r\n]
 2-BYTES Chars     :  N2                         | [\x{0080}-\x{07FF}]                                                        | (?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}] |                  0
 3-BYTES Chars     :  N3                         | (?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]                                 |                        0                       |                  0
                                                 |                                                                            |                                                |
 Sum BMP Chars     :  N1 + N2 + N3               | (?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}] or [^\r\n\x{D800}-\x{DFFF}] |                      idem                      |               [^\r\n]
 4-BYTES Chars     :  N4                         | (?-s).[\x{D800}-\x{DFFF}]  or  [\x{D800}-\x{DFFF}]                         |                      idem                      |                  0
                                                 |                                                                            |                                                |
 Chars w/o CR|LF   :  N1 + N2 + N3 + N4          | [^\r\n]                                                                    |                      idem                      |                idem
 EOL ( CR or LF )  :  N0                         | \r|\n                                                                      |                      idem                      |                idem
                                                 |                                                                            |                                                |
 TOTAL Characters  :  N0 + N1 + N2 + N3 + N4     | (?s).                                                                      |                      idem                      |                idem
                                                 |                                                                            |                                                |
                                                 |                                                                            |                                                |
 BYTE Length       :                             | N0 + N1 + 2 × N2 + 3 × N3 + 4 × N4                                         |            N0 × 2 + N2 × 2 + N4 × 4            |               N0 + N1
                                                 |                                                                            |                                                |
 Byte Order Mark   :                             | 0 ( UTF-8)  or  3 ( UTF-8-BOM )                                            |                        2                       |                  0
                                                 |                                                                            |                                                |
 BUFFER Length     :  BYTE length  +  BOM        |                                                                            |                                                |
                                                 |                                                                            |                                                |
 Length on DISK    :  Length CURRENT file on DISK|                                                                            |                                                |
                                                 |                                                                            |                                                |
                                                 |                                                                            |                                                |
 NON BLANK chars   :                             | [^\r\n\t\x20]                                                              |                       idem                     |                idem
                                                 |                                                                            |                                                |
 WORDS     count   :     (Caution !)             | \w+                                                                        |                       idem                     |                idem
                                                 |                                                                            |                                                |
 NON-SPACE count   :                             | (?:(?!\s).[\x{D800}-\x{DFFF}]?)+  or  \S+                                  |                       idem                     |                \S+
                                                 |                                                                            |                                                |
                                                 |                                                                            |                                                |
 True EMPTY lines  :  L1                         | (?&lt;![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)                           |                       idem                     | (?&lt;!\f)^(?:\r\n|\r|\n)
                                                 |                                                                            |                                                |
 True BLANK lines  :  L2                         | (?&lt;![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)               |                       idem                     | (?&lt;!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)
                                                 |                                                                            |                                                |
                                                 |                                                                            |                                                |
 EMPTY/BLANK lines :  L1 + L2                    | (?&lt;![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]*(?:\r\n|\r|\n|\z)               |                       idem                     | (?&lt;!\f)^[\t\x20]*(?:\r\n|\r|\n|\z)
                                                 |                                                                            |                                                |
 NON-BLANK lines   :                             | (?-s)(?!^[\t\x20]+$)^(?:.|[\f\x{0085}\x{2028}\x{2029}])+(?:\r\n|\r|\n|\z)  |                       idem                     | (?-s)(?!^[\t\x20]+$)^(?:.|\f)+(?:\r\n|\r|\n|\z)
                                                 |                                                                            |                                                |
 TOTAL lines       :                             | (?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z                       |                       idem                     | (?-s)\r\n|\r|\n|(?:.|\f)\z
                                                 |                                                                            |                                                |
                                                 |                                                                            |                                                |
 SELECTION(S)      :  X characters (Y bytes) in Z ranges                                                                      |                        idem                    |                idem
•------------------------------------------------•----------------------------------------------------------------------------•------------------------------------------------•------------------------------------•
</code></pre>
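<p dir="auto">The counts <strong><code>N0</code></strong> to <strong><code>N4</code></strong> and the two <strong>BYTE Length</strong> formulas of the table can be sketched in Python as follows. This is only an illustration under assumptions : the names <strong><code>summary_counts</code></strong>, <strong><code>utf8_length</code></strong> and <strong><code>utf16_length</code></strong> are hypothetical, each character is classified by its UTF-8 length rather than by a regex, and the text is supposed to contain no lone surrogate. Because of that classification, all the BMP characters are spread over <strong><code>N1</code></strong>, <strong><code>N2</code></strong> and <strong><code>N3</code></strong>, and the UTF-16 formula adds them up.</p>

```python
def summary_counts(text):
    """Count EOL chars (N0) and the chars coded with 1 to 4 bytes in UTF-8 (N1..N4)."""
    counts = {"N0": 0, "N1": 0, "N2": 0, "N3": 0, "N4": 0}
    for ch in text:
        if ch in "\r\n":
            counts["N0"] += 1  # CR and LF are counted separately
        else:
            # Length of the character once encoded in UTF-8 : 1, 2, 3 or 4 bytes
            counts["N%d" % len(ch.encode("utf-8"))] += 1
    return counts

def utf8_length(c):
    """BYTE length of a UTF-8 buffer, without BOM."""
    return c["N0"] + c["N1"] + 2 * c["N2"] + 3 * c["N3"] + 4 * c["N4"]

def utf16_length(c):
    """BYTE length of a UTF-16 buffer, without BOM : 2 bytes per BMP char, 4 otherwise."""
    return 2 * (c["N0"] + c["N1"] + c["N2"] + c["N3"]) + 4 * c["N4"]

sample = "abc\r\n\u00e9\u20ac\U0001F468\n"
c = summary_counts(sample)
print(c)
# The formulas are compared with the lengths given by Python's own encoders
print(utf8_length(c), len(sample.encode("utf-8")))
print(utf16_length(c), len(sample.encode("utf-16-le")))
```

<p dir="auto">Comparing each formula with the length returned by the encoder is a quick way to check the table on any test file content.</p>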
<p dir="auto"><strong>Continued</strong> discussion in the <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92795</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92795</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 02:49:09 GMT</pubDate></item><item><title><![CDATA[Reply to Emulation of the &quot;View &gt; Summary&quot; feature with a Python script on Sat, 10 Feb 2024 02:46:37 GMT]]></title><description><![CDATA[<p dir="auto">Hello <strong>All</strong>,</p>
<p dir="auto">Here is the <strong>updated</strong> version of my <strong>previous</strong> posts regarding the <strong>present</strong> N++ <strong>Summary</strong> feature ( <strong><code>View &gt; Summary...</code></strong> ). And I must say that <strong>numerous</strong> things are still <strong>weird</strong> !</p>
<p dir="auto">For tests, I used various files as well as my <strong><code>Total_Chars.txt</code></strong> file, written with the <strong><code>4</code></strong> N++ <strong>Unicode</strong> encodings and also an <strong><code>ANSI</code></strong> file, containing the <strong><code>256</code></strong> characters of the <strong><code>Windows-1252</code></strong> encoding :</p>
<ul>
<li><strong><code>ANSI</code></strong></li>
<li><strong><code>UTF-8</code></strong></li>
<li><strong><code>UTF-8-BOM</code></strong></li>
<li><strong><code>UTF-16 BE BOM</code></strong></li>
<li><strong><code>UTF-16 LE BOM</code></strong></li>
</ul>
<hr />
<p dir="auto">To my mind, there are <strong><code>3</code></strong> <strong>major</strong> problems and some <strong>minor</strong> points :</p>
<ul>
<li>
<p dir="auto">The first and <strong>worst</strong> problem is the fact that, when a <strong><code>UTF-8[-BOM]</code></strong> file, containing various <strong>Unicode</strong> chars ( of the <strong><code>BMP</code></strong> only : this point is <strong>important</strong> ! ) is <strong>copied</strong> into a <strong><code>UCS-2 BE BOM</code></strong> or <strong><code>UCS-2 LE BOM</code></strong> <strong>encoded</strong> file, some results, given by the <strong><code>Summary</code></strong> feature for these <strong>new</strong> files, are <strong>totally</strong> wrong :</p>
<ul>
<li>
<p dir="auto">The <strong><code>characters( without line endings )</code></strong> value seems to be the number of <strong>bytes</strong> used in the <strong>corresponding</strong> <strong><code>UTF-8[-BOM]</code></strong> file</p>
</li>
<li>
<p dir="auto">The <strong><code>Document length</code></strong> value seems to be the document length of the <strong>corresponding</strong> <strong><code>UTF-8[-BOM]</code></strong> file and is also displayed, unfortunately, in the <strong>status bar</strong> !!</p>
</li>
</ul>
</li>
<li>
<p dir="auto">The second problem is that the definition of a <strong>word</strong> char, by the <strong><code>Summary</code></strong> feature, is definitely <strong>NOT</strong> the same as the <strong>definition</strong> of the regex <strong><code>\w</code></strong>, as explained further on !</p>
</li>
<li>
<p dir="auto">Thus, the third problem is that the given number of <strong>words</strong> is <strong>totally</strong> inaccurate ! And, anyway, the <strong>number</strong> of words, although <strong>well enough</strong> defined for an <strong><code>English / American</code></strong> text, is rather a <strong>vague</strong> notion, for a lot of texts written in <strong>other</strong> languages, especially <strong>Asian</strong> ones ! ( See further on )</p>
</li>
<li>
<p dir="auto">Some <strong>minor</strong> things :</p>
<ul>
<li>
<p dir="auto">The number of <strong>lines</strong> given is, most of the time, <strong>increased</strong> by <strong>one</strong> unit</p>
</li>
<li>
<p dir="auto">Presently, the <strong>Summary</strong> feature displays the <strong>document length</strong> in the Notepad++ buffer. I think it would be good to display, as well, the <strong>actual</strong> document length saved on <strong>disk</strong>. Incidentally, for just <strong>saved</strong> documents, it would give, by <strong>difference</strong>, the length of the possible <strong><code>Byte Order Mark</code></strong>, if its <strong>size</strong> were not <strong>explicitly</strong> displayed !</p>
</li>
<li>
<p dir="auto">For any <strong>encoded</strong> file, a decomposition, giving the <strong>number</strong> of chars coded with <strong><code>1</code></strong>, <strong><code>2</code></strong>, <strong><code>3</code></strong> and <strong><code>4</code></strong> bytes would be <strong>welcome</strong> !</p>
</li>
</ul>
</li>
</ul>
<p dir="auto">So, in brief, in the <strong>present</strong> <strong><code>Summary</code></strong> window :</p>
<ul>
<li>
<p dir="auto">The <strong><code>Characters (without line endings):</code></strong> number is <strong>wrong</strong> for the <strong><code>UTF-16 BE BOM</code></strong> or <strong><code>UTF-16 LE BOM</code></strong> encodings</p>
</li>
<li>
<p dir="auto">The <strong><code>Words</code></strong> number is totally <strong>wrong</strong>, given the <strong>regex</strong> definition of a <strong>word</strong> character, <strong>whatever</strong> the encoding used</p>
</li>
<li>
<p dir="auto">The <strong><code>Lines:</code></strong> number is <strong>wrong</strong>, by <strong>one</strong> unit, if a <strong>line-break</strong> ends the <strong>last</strong> line of current file, in <strong>any</strong> encoding</p>
</li>
<li>
<p dir="auto">The <strong><code>Document length</code></strong> value, in N++ <strong>buffer</strong>, is <strong>wrong</strong> for the <strong><code>UTF-16 BE BOM</code></strong> or <strong><code>UTF-16 LE BOM</code></strong> encodings, as well as the <strong><code>Length:</code></strong> indication in the <strong>status</strong> bar</p>
</li>
</ul>
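<p dir="auto">The <strong>lines</strong> and <strong>document length</strong> points listed above can be checked with a minimal Python sketch. It is an illustration only : the helper name <strong><code>count_lines</code></strong> is hypothetical, a line is supposed to end at <strong><code>CR LF</code></strong>, <strong><code>CR</code></strong> or <strong><code>LF</code></strong> only, and a final line-break is supposed not to open a new line.</p>

```python
import re

def count_lines(text):
    """Count lines : each line-break ends a line, plus a possible last line without one."""
    breaks = len(re.findall(r"\r\n|\r|\n", text))
    ends_with_break = text.endswith(("\r", "\n"))
    has_unterminated_last_line = bool(text) and not ends_with_break
    return breaks + (1 if has_unterminated_last_line else 0)

# Two lines, both ended by CR LF, with BMP characters only
text = "caf\u00e9 \u20ac\r\nabc\r\n"

print(count_lines(text))                  # number of lines, final line-break included
print(len(text))                          # number of characters, EOL included
print(len(text.encode("utf-8")))          # length of a UTF-8 buffer, without BOM
print(len(text.encode("utf-16-le")))      # length of a UTF-16 LE buffer, without BOM
```

<p dir="auto">For a text like this one, the two buffer lengths differ, so a UTF-16 file reporting the UTF-8 value is easy to spot.</p>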
<hr />
<p dir="auto">To begin with, let me develop the… <strong>second</strong> bug ! After <strong>numerous</strong> tests, I determined that, in the <strong>present</strong> <strong><code>View &gt; Summary...</code></strong> feature, the characters considered as <strong>word</strong> characters are :</p>
<ul>
<li>
<p dir="auto">The <strong>C0 control</strong> characters, except for the <strong>Tabulation</strong> ( <strong><code>\x{0009}</code></strong> ) and the <strong>two EOL</strong> ( <strong><code>\x{000a}</code></strong> and <strong><code>\x{000d}</code></strong> ), so the regex <strong><code>(?![\t\r\n])[\x00-\x1F]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>number</strong> sign <strong><code>#</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>10</code></strong> <strong>digits</strong>, so the regex <strong><code>[0-9]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>26</code></strong> <strong>uppercase</strong> and <strong>lowercase</strong> letters, so the regex <strong><code>(?i)[A-Z]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>low line</strong> character <strong><code>_</code></strong></p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters, of the <strong>Basic Multilingual Plane</strong> ( <strong><code>BMP</code></strong> ), with code-point <strong>over</strong> <strong><code>\x{007E}</code></strong>, so the regex <strong><code>(?![\x{D800}-\x{DFFF}])[\x{007F}-\x{FFFF}]</code></strong> for a <strong><code>Unicode</code></strong> encoded file or <strong><code>[\x7F-\xFF]</code></strong> for an <strong><code>ANSI</code></strong> encoded file</p>
</li>
<li>
<p dir="auto"><strong>All</strong> the characters, <strong>over</strong> the <strong>Basic Multilingual Plane</strong>, so the regex <strong><code>(?-s).[\x{D800}-\x{DFFF}]</code></strong> for a <strong><code>Unicode</code></strong> encoded file, <strong>only</strong></p>
</li>
</ul>
<p dir="auto">To <strong>simulate</strong> the present <strong><code>Words:</code></strong> number ( which is <strong>erroneous</strong> ! ), given by the <strong>summary</strong> feature, <strong>whatever</strong> the file <strong>encoding</strong>, simply use the regex below :</p>
<pre><code class="language-z">[^\t\n\r\x20!"$%&amp;'()*+,\-./:;&lt;=&gt;?@\x5B\x5C\x5D^\x60{|}~]+
</code></pre>
<p dir="auto">and click on the <strong><code>Count</code></strong> button of the <strong>Find</strong> dialog, with the <strong><code>Wrap around</code></strong> option <strong>ticked</strong></p>
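<p dir="auto">The same simulation can be sketched in Python. This is an illustration under assumptions : the name <strong><code>summary_words</code></strong> is hypothetical, and the separator set is built from <strong><code>string.punctuation</code></strong> without <strong><code>#</code></strong> and <strong><code>_</code></strong>, plus the tabulation, the two EOL chars and the space, which gives the same characters as the negated class of the regex above.</p>

```python
import re
import string

# ASCII punctuation, except '#' and '_' which the Summary feature treats as word chars,
# plus the tabulation, LF, CR and the space
separators = "\t\n\r\x20" + "".join(ch for ch in string.punctuation if ch not in "#_")

# Equivalent of the negated class : one or more chars which are not separators
summary_word = re.compile("[^" + re.escape(separators) + "]+")

def summary_words(text):
    """Return the strings that the simulated Words: number would count."""
    return summary_word.findall(text)

sample = "l'\u00e9cole #1: foo_bar, 3.14\r\n"
print(summary_words(sample))
print(len(summary_words(sample)))

# A C0 control character, other than tabulation and EOL, also counts as a word
print(len(summary_words("\x01")))
```

<p dir="auto">Building the class from <strong><code>string.punctuation</code></strong> keeps the list of excluded characters readable and avoids escaping each one by hand.</p>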
<p dir="auto">Obviously, this is <strong>not</strong> exact as a single <strong>word</strong> character is matched with the <strong><code>\w</code></strong> regex, which is the class <strong><code>[\u\l\d_]</code></strong>, where <strong><code>\u</code></strong>, <strong><code>\l</code></strong> and <strong><code>\d</code></strong> represent any <strong>Unicode</strong> <strong><code>uppercase</code></strong>, <strong><code>lowercase</code></strong> and <strong><code>digit</code></strong> char or a <strong>related</strong> char, so, finally, <strong>much more</strong> than the simple <strong><code>[A-Za-z0-9]</code></strong> set !</p>
<p dir="auto">But, worse, it’s the notion of <strong>word</strong> itself which is, in practice, <strong>not consistent</strong>, most of the time ! Indeed, for instance, if we consider the <strong>French</strong> expression <strong><code>l'école</code></strong> ( the school ), the regex <strong><code>\w+</code></strong> would return <strong><code>2</code></strong> words, which is <strong>correct</strong> as this expression can be mentally decomposed as <strong><code>la école</code></strong>. However, this regex would <strong>wrongly</strong> say that the <strong>single</strong> word <strong><code>aujourd'hui</code></strong> ( today ) is a <strong>two-word</strong> expression. Of course, you could change the regex to <strong><code>[\w']+</code></strong> which would return <strong><code>1</code></strong> word, but, this time, the expression <strong><code>l'école</code></strong> would <strong>wrongly</strong> be considered as a <strong>one-word</strong> string !</p>
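<p dir="auto">The two French examples can be reproduced with a short Python sketch. It is only an illustration : Python's <strong><code>\w</code></strong> is not guaranteed to be identical to the <strong><code>\w</code></strong> of the Notepad++ regex engine, but it behaves the same way for these two expressions.</p>

```python
import re

counts = {}
for expression in ("l'\u00e9cole", "aujourd'hui"):
    plain = re.findall(r"\w+", expression)          # the apostrophe splits the string
    with_quote = re.findall(r"[\w']+", expression)  # the apostrophe is part of the word
    counts[expression] = (len(plain), len(with_quote))
    print(expression, len(plain), len(with_quote))
```

<p dir="auto">Neither regex gives the wanted count for both expressions at once, which is the whole point.</p>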
<p dir="auto">In addition, what can be said about languages that do <strong>not</strong> use the <strong><code>Space</code></strong> character or where the use of the <strong><code>Space</code></strong> is <strong>discretionary</strong> ? Then, <strong>counting</strong> words is impossible or, rather, <strong>non-significant</strong> ! This is developed in <strong>Martin Haspelmath</strong>’s article, below :</p>
<p dir="auto"><a href="https://zenodo.org/record/225844/files/WordSegmentationFL.pdf" rel="nofollow ugc">https://zenodo.org/record/225844/files/WordSegmentationFL.pdf</a></p>
<blockquote>
<p dir="auto">At the end of section <strong>5</strong>, it is said : … On such a view, the claim that “all languages have words” (Radford et al. 1999: 145) would be interpretable only in the weaker sense that “<strong>all languages have a unit which falls between the minimal sign and the phrase</strong>” …</p>
</blockquote>
<blockquote>
<p dir="auto">And : … The basic problem remains the same: The units are defined in a <strong>language-specific</strong> way and cannot be <strong>equated across languages</strong>, and there is <strong>no</strong> reason to give <strong>special</strong> status to a unit called <strong>‘word’</strong>. …</p>
</blockquote>
<blockquote>
<p dir="auto">At the beginning of section <strong>7</strong> : … Linguists have <strong>no good basis for identifying words</strong> across languages …</p>
</blockquote>
<blockquote>
<p dir="auto">And in the <strong>conclusion</strong>, section <strong>10</strong> : … I conclude, from the arguments presented in this article, that there is <strong>no definition of ‘word’</strong> that can be applied to <strong>any</strong> language and that would yield <strong>consistent</strong> results …</p>
</blockquote>
<hr />
<p dir="auto">Now, the <strong>Unicode</strong> definition of a <strong>word</strong> character is :</p>
<p dir="auto"><strong><code>\p{gc=Alphabetic} | \p{gc=Mark} | \p{gc=Decimal_Number} | \p{gc=Connector_Punctuation} | \p{Join_Control}</code></strong></p>
<p dir="auto"><a href="https://stackoverflow.com/questions/5555613/does-w-match-all-alphanumeric-characters-defined-in-the-unicode-standard" rel="nofollow ugc">https://stackoverflow.com/questions/5555613/does-w-match-all-alphanumeric-characters-defined-in-the-unicode-standard</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries" rel="nofollow ugc">https://www.unicode.org/reports/tr18/#Simple_Word_Boundaries</a></p>
<p dir="auto">So, in theory, the <strong><code>word_character</code></strong> class should include :</p>
<ul>
<li>
<p dir="auto"><strong>All</strong> values of the <strong>derived</strong> category <strong>Alphabetic</strong> ( = <strong><code>alpha</code></strong> = <strong><code>\p{alphabetic}</code></strong> ) so <strong><code>132,875 chars</code></strong>, from the <strong>DerivedCoreProperties.txt</strong> file, which can be decomposed into :</p>
<ul>
<li>
<p dir="auto"><strong>Uppercase_Letter</strong> (<strong><code>Lu</code></strong>) + <strong>Lowercase_Letter</strong> (<strong><code>Ll</code></strong>) + <strong>Titlecase_Letter</strong> (<strong><code>Lt</code></strong>) + <strong>Modifier_Letter</strong> (<strong><code>Lm</code></strong>) + <strong>Other_Letter</strong> (<strong><code>Lo</code></strong>) + <strong>Letter_Number</strong> (<strong><code>Nl</code></strong>) + <strong>Other_Alphabetic</strong>, so the characters sum <strong><code>1,791 + 2,155 + 31 + 260 + 127,004 + 236 + 1,398</code></strong></p>
</li>
<li>
<p dir="auto"><strong>Note</strong> : The last property <strong>Other_Alphabetic</strong>, from the <strong>PropList.txt</strong> file, contains some, but <strong>not all</strong>, characters from the <strong><code>3</code></strong> General_Categories <strong>Spacing_Mark</strong> ( <strong><code>Mc</code></strong> ), <strong>Nonspacing_Mark</strong> ( <strong><code>Mn</code></strong> ) and <strong>Other_Symbol</strong> ( <strong><code>So</code></strong> ), so the characters sum <strong><code>417 + 851 + 130</code></strong></p>
</li>
</ul>
</li>
<li>
<p dir="auto"><strong>All</strong> values with <strong>General_Category</strong> = <strong><code>Decimal_Number</code></strong>, from the <strong>DerivedGeneralCategory.txt</strong> file, so <strong><code>650</code></strong> characters</p>
<p dir="auto">( These are characters with <strong>defined</strong> values in the <strong>three</strong> fields <strong><code>6</code></strong>, <strong><code>7</code></strong> and <strong><code>8</code></strong> of the <strong>UnicodeData.txt</strong> file )</p>
</li>
<li>
<p dir="auto"><strong>All</strong> values with <strong>General_Category</strong> = <strong><code>Connector_Punctuation</code></strong>, from the <strong>DerivedGeneralCategory.txt</strong> file, so <strong><code>10</code></strong> characters</p>
</li>
<li>
<p dir="auto"><strong>All</strong> values with the <strong>binary</strong> Property <strong><code>Join_Control</code></strong>, from the <strong>PropList.txt</strong> file, so <strong><code>2</code></strong> characters</p>
</li>
</ul>
<p dir="auto">So, if we include all <strong>Unicode</strong> languages, even <strong>historical</strong> ones :</p>
<p dir="auto">=&gt; <strong>Total</strong> number of Unicode <strong>word</strong> characters = <strong><code>132,875 + 650 + 10 + 2</code></strong> = <strong><code>133,537</code></strong> characters, with version <strong>UNICODE</strong> <strong><code>13.0.0</code></strong> !!</p>
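<p dir="auto">The arithmetic above can be re-checked with a few lines of Python. The second part of this sketch counts the <strong><code>Decimal_Number</code></strong> and <strong><code>Connector_Punctuation</code></strong> characters with the standard <strong><code>unicodedata</code></strong> module; note that it uses the Unicode version bundled with the running Python, so its figures may differ from the <strong><code>13.0.0</code></strong> ones, and that this module does not expose the <strong>Alphabetic</strong> property.</p>

```python
import sys
import unicodedata

# Figures quoted above for Unicode 13.0.0
alphabetic_parts = [1791, 2155, 31, 260, 127004, 236, 1398]  # Lu, Ll, Lt, Lm, Lo, Nl, Other_Alphabetic
other_alphabetic_parts = [417, 851, 130]                     # Mc, Mn, So
alphabetic = sum(alphabetic_parts)
total_word_chars = alphabetic + 650 + 10 + 2
print(alphabetic, sum(other_alphabetic_parts), total_word_chars)

# Counts by General_Category, for the Unicode version bundled with this Python
category_counts = {"Nd": 0, "Pc": 0}
for code_point in range(sys.maxunicode + 1):
    category = unicodedata.category(chr(code_point))
    if category in category_counts:
        category_counts[category] += 1
print(unicodedata.unidata_version, category_counts)
```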
<p dir="auto"><strong>Notes</strong> :</p>
<ul>
<li>The <strong>different</strong> files mentioned can be downloaded from the <strong>Unicode Character Database</strong> ( <strong><code>UCD</code></strong> ) or <strong>sub-directories</strong>, below :</li>
</ul>
<p dir="auto"><a href="http://www.unicode.org/Public/UCD/latest/ucd/" rel="nofollow ugc">http://www.unicode.org/Public/UCD/latest/ucd/</a></p>
<ul>
<li>And refer to the <strong>sites</strong>, below, for <strong>additional</strong> information to this topic :</li>
</ul>
<p dir="auto"><a href="https://www.unicode.org/reports/tr18/#Compatibility_Properties" rel="nofollow ugc">https://www.unicode.org/reports/tr18/#Compatibility_Properties</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr29/#Word_Boundaries" rel="nofollow ugc">https://www.unicode.org/reports/tr29/#Word_Boundaries</a></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr31/" rel="nofollow ugc">https://www.unicode.org/reports/tr31/</a>    for tables <strong><code>4</code></strong>, <strong><code>5</code></strong> and <strong><code>6</code></strong> of section <strong><code>2.4</code></strong></p>
<p dir="auto"><a href="https://www.unicode.org/reports/tr44/#UnicodeData.txt" rel="nofollow ugc">https://www.unicode.org/reports/tr44/#UnicodeData.txt</a></p>
<hr />
<p dir="auto">Anyone who clicked on the links to the <strong>Unicode Consortium</strong>, above, understood, very quickly, that the notions of <strong>word</strong> characters and word <strong>boundaries</strong> are a real <strong>nightmare</strong> !</p>
<p dir="auto">Even if we <strong>restrict</strong> the definition of <strong>word</strong> chars to Unicode <strong>living</strong> scripts, forgetting all the <strong>historical</strong> scripts not in use, and also leaving <strong>aside</strong> all scripts which do <strong>not</strong> use the <strong>space</strong> char to, systematically, <strong>delimit</strong> words, we still have a list of about <strong><code>21,000</code></strong> characters which should be considered as <strong>word</strong> characters ! I tried to build up such a list, with the <strong>help</strong> of these sites :</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries" rel="nofollow ugc">https://en.wikipedia.org/wiki/Category:Writing_systems_without_word_boundaries</a></p>
<p dir="auto"><a href="https://linguistlist.org/issues/6/6-1302/" rel="nofollow ugc">https://linguistlist.org/issues/6/6-1302/</a></p>
<p dir="auto"><a href="https://unicode-org.github.io/cldr-staging/charts/37/supplemental/scripts_and_languages.html" rel="nofollow ugc">https://unicode-org.github.io/cldr-staging/charts/37/supplemental/scripts_and_languages.html</a></p>
<p dir="auto"><a href="https://scriptsource.org/cms/scripts/page.php?item_id=script_overview" rel="nofollow ugc">https://scriptsource.org/cms/scripts/page.php?item_id=script_overview</a></p>
<p dir="auto"><a href="https://r12a.github.io/scripts/featurelist/" rel="nofollow ugc">https://r12a.github.io/scripts/featurelist/</a></p>
<p dir="auto">And I ended up with this list of <strong><code>46</code></strong> <strong>living</strong> scripts which always use a <strong><code>Space</code></strong> character between <strong>words</strong> :</p>
<pre><code class="language-diff">•------------------------•----------------•-------------------•-----------------•
|                        |    SCRIPT      |   SPACE between   |  UNICODE Script |
|                        |      Type :    |      Words :      |     Class :     |
|                        •----------------•-------------------•-----------------•
|           SCRIPT       |  (L)iving      |  (Y)es            |  (R)ecommended  |
|                        |                |  (U)nspecified    |  (L)imited      |
|                        |  (H)istorical  |  (D)iscretionary  |  (E)xcluded     |
|                        |                |  (N)o             |                 |
•------------------------•----------------•-------------------•-----------------•
|  ARMENIAN              |       L        |         Y         |        R        |
|  ADLAM                 |       L        |         Y         |        L        |
|  ARABIC                |       L        |         Y         |        R        |
|  BAMUM                 |       L        |         Y         |        L        |
|  BASSA VAH             |       L        |         Y         |        E        |
|  BENGALI ( Assamese )  |       L        |         Y         |        R        |
|  BOPOMOFO              |       L        |         Y         |        R        |
|  BUGINESE              |       L        |         D         |        E        |
|  CANADIAN SYLLABICS    |       L        |         Y         |        L        |
|  CHEROKEE              |       L        |         Y         |        L        |
|  CYRILLIC              |       L        |         Y         |        R        |
|  DEVANAGARI            |       L        |         Y         |        R        |
|  ETHIOPIC (Ge'ez)      |       L        |         Y         |        R        |
|  GEORGIAN              |       L        |         Y         |        R        |
|  GREEK                 |       L        |         Y         |        R        |
|  GUJARATI              |       L        |         Y         |        R        |
|  GURMUKHI              |       L        |         Y         |        R        |
|  HANGUL                |       L        |         Y         |        R        |
|  HANIFI ROHINGYA       |       L        |         Y         |        L        |
|  HEBREW                |       L        |         Y         |        R        |
|  KANNADA               |       L        |         Y         |        R        |
|  KAYAH LI              |       L        |         Y         |        L        |
|  LATIN                 |       L        |         Y         |        R        |
|  LIMBU                 |       L        |         Y         |        L        |
|  MALAYALAM             |       L        |         D         |        R        |
|  MANDAIC               |       H        |         Y         |        L        |
|  MEETEI MAYEK          |       L        |         Y         |        L        |
|  MIAO (Pollard)        |       L        |         Y         |        L        |
|  MONGOLIAN             |       L        |         Y         |        E        |
|  NEWA                  |       L        |         Y         |        L        |
|  NKO                   |       L        |         Y         |        L        |
|  OL CHIKI              |       L        |         Y         |        L        |
|  ORIYA (Odia)          |       L        |         Y         |        R        |
|  OSAGE                 |       L        |         Y         |        L        |
|  SINHALA               |       L        |         Y         |        R        |
|  SUNDANESE             |       L        |         Y         |        L        |
|  SYLOTI NAGRI          |       L        |         Y         |        L        |
|  SYRIAC                |       L        |         Y         |        L        |
|  TAI VIET              |       L        |         Y         |        L        |
|  TAMIL                 |       L        |         Y         |        R        |
|  TELUGU                |       L        |         Y         |        R        |
|  THAANA                |       L        |         D         |        R        |
|  TIFINAGH (Berber)     |       L        |         Y         |        L        |
|  VAI                   |       L        |         Y         |        L        |
|  WANCHO                |       L        |         Y         |        L        |
|  YI                    |       L        |         Y         |        L        |
•------------------------•----------------•-------------------•-----------------•
</code></pre>
<p dir="auto">These scripts involve <strong><code>101</code></strong> legal <strong>Unicode</strong> scripts, from <strong>Basic Latin</strong> ( <strong><code>0000 - 007F</code></strong> ) to <strong>Symbols for Legacy Computing</strong> ( <strong><code>1FB00 - 1FBFF</code></strong> )</p>
<hr />
<p dir="auto">You may also have a look at these sites for <strong>general</strong> information :</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/List_of_Unicode_characters" rel="nofollow ugc">https://en.wikipedia.org/wiki/List_of_Unicode_characters</a></p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Scriptio_continua#Decline" rel="nofollow ugc">https://en.wikipedia.org/wiki/Scriptio_continua#Decline</a></p>
<p dir="auto"><a href="https://glottolog.org/glottolog/language" rel="nofollow ugc">https://glottolog.org/glottolog/language</a> ( especially useful to <strong>locate</strong> the area where a <strong>language</strong> is used )</p>
<p dir="auto"><strong>Continued</strong> discussion in the <strong>next</strong> post</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/92794</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/92794</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 10 Feb 2024 02:46:37 GMT</pubDate></item></channel></rss>