<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Word frequency list]]></title><description><![CDATA[<p dir="auto">Hello, <strong>All</strong>,</p>
<p dir="auto">In this post, I’ll show you how to generate a <strong>word frequency</strong> list, from a stream <strong>selection</strong>, using <em>ONLY</em> Notepad++'s regular expressions.</p>
<p dir="auto">Of course, this topic has been solved many times, mainly through <strong>Python</strong> scripts. I haven’t searched our <strong><code>NodeBB</code></strong> website yet but I <strong>do</strong> know there are a few !</p>
<hr />
<p dir="auto">Now, my proposed <strong>macro</strong>, to get this <strong>word frequency</strong> list, has some <strong>limitations</strong> :</p>
<ul>
<li>
<p dir="auto">It’s <strong>mandatory</strong> to do a <em>SINGLE</em> <strong>stream</strong> selection of part or <strong>all</strong> of the current file contents <em>BEFORE</em> using that macro</p>
</li>
<li>
<p dir="auto">You need to select, at least, an <strong>entire</strong> line of your <strong>current</strong> file ( selecting only a <strong>few</strong> words leads to <strong>incoherent</strong> results ! )</p>
</li>
<li>
<p dir="auto">By default, this search is <strong>sensitive</strong> to case</p>
</li>
<li>
<p dir="auto">For each <strong>word</strong> of that list, the <strong>maximum</strong> number of occurrences is <strong><code>99,999</code></strong> ( However, there is a <strong>trick</strong> to get <strong>right</strong> results above this <strong>limit</strong> )</p>
</li>
<li>
<p dir="auto">Your selection ( or the <strong>entire</strong> file ) should <strong>not</strong> exceed <strong><code>10 MB</code></strong> ( Note that the message <strong><code>N++ has stop responding</code></strong> may be displayed : in this case, use the <strong><code>Task manager</code></strong> or be <strong>patient</strong> for a few more minutes ! )</p>
</li>
</ul>
<hr />
<p dir="auto">In this macro, which characters do I consider to be <strong>word</strong> characters ?</p>
<ul>
<li>
<p dir="auto">Of course, <strong>all</strong> the characters which satisfy the <strong><code>\w</code></strong> regex</p>
</li>
<li>
<p dir="auto">The <strong>comma</strong>, <em>ONLY IF</em> surrounded by <strong><code>digit</code></strong> characters</p>
</li>
<li>
<p dir="auto">The <strong>underscore</strong>, <em>ONLY IF</em> surrounded by <strong><code>word</code></strong> chars</p>
</li>
<li>
<p dir="auto">The <strong><code>hyphen-minus</code></strong> character</p>
</li>
<li>
<p dir="auto">The <strong>apostrophe</strong>, as in strings <strong><code>Shouldn't</code></strong>, <strong><code>program's</code></strong> and <strong><code>authors'</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>right single quotation mark</strong>, as in strings <strong><code>Shouldn’t</code></strong>, <strong><code>program’s</code></strong> and <strong><code>authors’</code></strong></p>
</li>
<li>
<p dir="auto">For words of length <strong><code>1</code></strong>, I just consider any <strong>digit</strong> and the letters <strong><code>[AaIO]</code></strong></p>
</li>
</ul>
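<p dir="auto">As a cross-check outside Notepad++, the rules above can be approximated with a short <strong>Python</strong> sketch. It is <em>NOT</em> the macro itself : the names <strong><code>word_frequency</code></strong> and <strong><code>keep_if_surrounded</code></strong> are hypothetical, and Python may classify a few <strong>Unicode</strong> characters differently from the Notepad++ regex engine. Note that the macro's first regex also keeps the <strong><code>$</code></strong>, <strong><code>£</code></strong> and <strong><code>€</code></strong> signs, so the sketch does the same.</p>
<pre><code class="language-python">import re
from collections import Counter

# Everything that is NOT a candidate word character is a separator
# (same character class as the macro's first regex, line breaks included)
SEPARATORS = re.compile(r"[^\w,$£€'’-]+")


def is_digit(ch):
    return ch.isdigit()


def is_word_char(ch):
    # Approximation of the regex class \w
    return ch.isalnum() or ch == "_"


def keep_if_surrounded(token, char, accept):
    """Keep `char` only where both neighbours satisfy `accept`."""
    kept = []
    for i, ch in enumerate(token):
        if ch == char:
            before = token[i - 1] if i else ""
            after = token[i + 1:i + 2]
            if not (accept(before) and accept(after)):
                continue  # drop this comma or underscore
        kept.append(ch)
    return "".join(kept)


def word_frequency(text):
    counts = Counter()
    for token in SEPARATORS.split(text):
        token = keep_if_surrounded(token, ",", is_digit)
        token = keep_if_surrounded(token, "_", is_word_char)
        if not token:
            continue
        # Words of length 1 : only digits and the letters A, a, I, O
        if len(token) == 1 and not (token.isdigit() or token in "AaIO"):
            continue
        counts[token] += 1
    return counts


sample = "Fix x86 installer. Fix read-only files, 1,000 files!"
for word, count in sorted(word_frequency(sample).items()):
    print(word, count)
</code></pre>
<p dir="auto">Like the macro, this sketch is <strong>sensitive</strong> to case and lists the words in <strong>code-point</strong> order ( digits, then uppercase, then lowercase letters ).</p>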
<hr />
<ul>
<li>
<p dir="auto">First of all, <strong>backup</strong> your <strong>active</strong> <strong><code>shortcuts.xml</code></strong> file ( one never knows ! )</p>
</li>
<li>
<p dir="auto">Open your <strong><code>Shortcuts.xml</code></strong> file</p>
</li>
<li>
<p dir="auto">In the <strong>macros</strong> section, right <strong>before</strong> the line <strong><code>&lt;/Macros&gt;</code></strong>, insert all the <strong>new</strong> macro contents below :</p>
</li>
</ul>
<pre><code class="language-diff">        &lt;Macro name="Word_Frequency" Ctrl="no" Alt="no" Shift="no" Key="0"&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^\w,$£€'’\r\n-]+" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="\r\n" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?&lt;= \d ) , (?= \d ) (*SKIP) (*F) | ," /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?&lt;= \w ) _ (?= \w ) (*SKIP) (*F) | _" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) ^ (?! [AaIO\d] ) .? \R | , ’? $" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="2" message="0" wParam="42059" lParam="0" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) (?: (?&lt;= × ) | (?&lt;= ^ ) ) ( .+ ) \R (?= ^ \1 \R | ^ \1 \z )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="×" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^×\r\n]+" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="×$0" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) (×+) (.+)" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="\2                                                  : \1" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) ^ .{51} \K \x20+ (?=:)" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10000}" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="¶" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="×{1000}" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="¤" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="×{100}" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="•" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10}" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="÷" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¶¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶) | (¶¶¶¶¶¶) | (¶¶¶¶¶) | (¶¶¶¶) | (¶¶¶) | (¶¶) | (¶) ) (?= ¤ | (•) | (÷) | (×) | ($) )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)(?{13}0000)" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¤¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤) | (¤¤¤¤¤¤) | (¤¤¤¤¤) | (¤¤¤¤) | (¤¤¤) | (¤¤) | (¤) ) (?= • | (÷) | (×) | ($) )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (•••••••••) | (••••••••) | (•••••••) | (••••••) | (•••••) | (••••) | (•••) | (••) | (•) ) (?= ÷ | (×) | ($) )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (÷÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷) | (÷÷÷÷÷÷) | (÷÷÷÷÷) | (÷÷÷÷) | (÷÷÷) | (÷÷) | (÷) ) (?= × | ($) )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (×××××××××) | (××××××××) | (×××××××) | (××××××) | (×××××) | (××××) | (×××) | (××) | (×) )" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;

            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?&lt;= : [ ] ) (?: ( \d{5} ) | ( \d{4} ) | ( \d{3} )  | ( \d\d ) | (\d ) ) $" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="(?1 )(?2  )(?3   )(?4    )(?5     )$0" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;
        &lt;/Macro&gt;
</code></pre>
<ul>
<li>
<p dir="auto">Save the changes to your <strong>active</strong> <strong><code>Shortcuts.xml</code></strong></p>
</li>
<li>
<p dir="auto">Close and <strong>re</strong>-open Notepad++</p>
</li>
</ul>
<p dir="auto">=&gt; You should see the <strong><code>Word_Frequency</code></strong> macro among all your <strong>other</strong> macros !</p>
<p dir="auto">See the <strong>next</strong> post for the <strong>end</strong> of this story !</p>
<p dir="auto">BR</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/27546/word-frequency-list</link><generator>RSS for Node</generator><lastBuildDate>Tue, 26 May 2026 11:39:58 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/27546.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 26 May 2026 08:33:08 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Word frequency list on Tue, 26 May 2026 08:39:44 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>All</strong>,</p>
<p dir="auto"><strong>Second</strong> and <strong>last</strong> post regarding the <strong><code>Word_Frequency</code></strong> macro !</p>
<p dir="auto">Now, a simple example :</p>
<ul>
<li>
<p dir="auto">Open the <strong><code>change.log</code></strong> file of the <strong>latest</strong> release <strong><code>v8.9.6</code></strong></p>
</li>
<li>
<p dir="auto">Do a <strong>stream</strong> selection of <strong>all</strong> the points of the v8.9.6 release, <em>ONLY</em>. So, the lines below :</p>
</li>
</ul>
<pre><code class="language-diff"> 1. Fix vulnerability (CVE-2026-46710) of v8.9.4 &amp; v8.9.5 installer.
 2. Fix x86 installer regression of not showing installation entry in Control Panel's "Unstall a program".
 3. Fix x86 installer regression where context menu not installed or uninstalled correctly.
 4. Fix UAC prompt display regression (“Notepad++ installer” instead of “Notepad++”) for Notepad++ v8.9.5.
 5. Fix incorrect bevaviour when saving dirty read-only files.
 6. Fix regression where saving a UDL file removed XML declaration.
</code></pre>
<p dir="auto">Run the <strong><code>Word_Frequency</code></strong> macro. You should get, at <strong>once</strong>, this <em>OUTPUT</em> text :</p>
<pre><code class="language-xml">1                                                  :      1
2                                                  :      1
3                                                  :      1
4                                                  :      2
5                                                  :      3
6                                                  :      1
9                                                  :      3
CVE-2026-46710                                     :      1
Control                                            :      1
Fix                                                :      6
Notepad                                            :      3
Panel's                                            :      1
UAC                                                :      1
UDL                                                :      1
Unstall                                            :      1
XML                                                :      1
a                                                  :      2
bevaviour                                          :      1
context                                            :      1
correctly                                          :      1
declaration                                        :      1
dirty                                              :      1
display                                            :      1
entry                                              :      1
file                                               :      1
files                                              :      1
for                                                :      1
in                                                 :      1
incorrect                                          :      1
installation                                       :      1
installed                                          :      1
installer                                          :      4
instead                                            :      1
menu                                               :      1
not                                                :      2
of                                                 :      3
or                                                 :      1
program                                            :      1
prompt                                             :      1
read-only                                          :      1
regression                                         :      4
removed                                            :      1
saving                                             :      2
showing                                            :      1
uninstalled                                        :      1
v8                                                 :      3
vulnerability                                      :      1
when                                               :      1
where                                              :      2
x86                                                :      2
</code></pre>
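<p dir="auto">The layout of each output line follows from the last steps of the macro : the word is padded with <strong>spaces</strong> up to column <strong><code>51</code></strong>, then comes <strong><code>: </code></strong> and the count, right-aligned over <strong><code>6</code></strong> columns. A minimal <strong>Python</strong> sketch of that layout, assuming words shorter than <strong><code>51</code></strong> characters and counts of at most <strong><code>5</code></strong> digits ( the name <strong><code>format_line</code></strong> is hypothetical ) :</p>
<pre><code class="language-python">def format_line(word, count):
    # Word padded to 51 columns, then ": ", then the count right-aligned on 6 columns
    return word.ljust(51) + ": " + str(count).rjust(6)


print(format_line("Fix", 6))
print(format_line("installer", 4))
</code></pre>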
<hr />
<p dir="auto">If you prefer an ordered list <strong>ignoring</strong> the <strong>case</strong>, simply insert the <strong>regex</strong> replacement below :</p>
<pre><code class="language-xml">            &lt;Action type="3" message="1700" wParam="0" lParam="0" sParam="" /&gt;
            &lt;Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-i) \u+" /&gt;
            &lt;Action type="3" message="1625" wParam="0" lParam="2" sParam="" /&gt;
            &lt;Action type="3" message="1602" wParam="0" lParam="0" sParam="\L$0" /&gt;
            &lt;Action type="3" message="1702" wParam="0" lParam="640" sParam="" /&gt;
            &lt;Action type="3" message="1701" wParam="0" lParam="1609" sParam="" /&gt;
</code></pre>
<p dir="auto">Right before the <strong>sort</strong> line :</p>
<pre><code class="language-xml">            &lt;Action type="2" message="0" wParam="42059" lParam="0" sParam="" /&gt;
</code></pre>
<hr />
<p dir="auto">Here is the <strong>trick</strong> to get the <strong>right</strong> number of occurrences when <strong><code>&gt; 99,999</code></strong>.</p>
<ul>
<li>Search for any remaining <strong><code>¶</code></strong> character with the regex <strong><code>¶+</code></strong> . Let’s suppose you have this line :</li>
</ul>
<pre><code class="language-diff">the                                                :      ¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶97371
</code></pre>
<p dir="auto">As the number of <strong>consecutive</strong> <strong><code>¶</code></strong> is <strong><code>23</code></strong>, the <strong>exact</strong> number of occurrences of the word <strong><code>the</code></strong> is : <strong><code>23 × 10000 + 97,371</code></strong> i.e. <strong><code>327,371</code></strong> occurrences</p>
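<p dir="auto">The same arithmetic can be written as a tiny <strong>Python</strong> helper, given here as a sketch only ( the name <strong><code>exact_count</code></strong> is hypothetical ). It takes the text found after the colon, counts the remaining <strong><code>¶</code></strong> characters, each one standing for <strong><code>10,000</code></strong> occurrences, and adds the decimal part :</p>
<pre><code class="language-python">def exact_count(field):
    """Rebuild the real count from a field such as '¶¶¶97371'."""
    field = field.strip()
    leftover = field.count("¶")      # each remaining ¶ stands for 10,000
    digits = field.replace("¶", "")  # decimal part already produced by the macro
    return leftover * 10000 + int(digits)


print(exact_count("¶" * 23 + "97371"))  # 327371
</code></pre>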
<hr />
<p dir="auto">Remember that the <strong>first</strong> thing to do, before running the <strong><code>Word_Frequency</code></strong> macro, is to <strong>select</strong> part or all of the current file contents !</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/105534</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/105534</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Tue, 26 May 2026 08:39:44 GMT</pubDate></item></channel></rss>