Word frequency list
-
Hello, All,
In this post, I’ll show you how to generate a word frequency list, from a stream selection, using ONLY Notepad++'s regular expressions.
Of course, this topic has been solved many times, mainly through Python scripts. I haven’t searched our NodeBB website yet, but I do know there are a few !
Now, my proposed macro, to get this word frequency list, has some limitations :
- It’s mandatory to do a SINGLE stream selection of part or all of the current file contents BEFORE using that macro
- You need to select, at least, one entire line of your current file ( selecting only a few words leads to incoherent results ! )
- By default, this search is case-sensitive
- For each word of that list, the maximum number of occurrences is 99,999 ( However, there is a trick to get the right results above this limit )
- Your selection ( or the entire file ) should not exceed 10 MB ( Note that it may display the message "N++ has stop responding" : in this case, use the Task Manager or be patient for a few more minutes ! )
In this macro, which characters do I consider to be word characters ?
- Of course, all the characters which match the \w regex
- The comma, ONLY IF surrounded by digit characters
- The underscore, ONLY IF surrounded by word chars
- The hyphen-minus character
- The apostrophe, as in the strings Shouldn't, program's and authors'
- The right single quotation mark, as in the strings Shouldn’t, program’s and authors’
- For words of length 1, I just consider any digit and the letters [AaIO]
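For readers who want to compare the macro's results with a script, here is a minimal Python sketch of the word rules listed above. It is only an approximation written from that list, not a translation of the macro's regexes: the names `word_frequencies`, `SEPARATORS`, `STRAY_COMMA` and `STRAY_UNDERSCORE` are hypothetical, and a stray comma or underscore is assumed to act as a separator.

```python
import re
from collections import Counter

# Anything that is not a word character, comma, apostrophe, right single
# quotation mark (U+2019) or hyphen-minus separates two words.
SEPARATORS = re.compile(r"[^\w,'\u2019-]+")
# A comma is kept only between two digits; an underscore only between two
# word characters. Every other one is treated here as a separator (assumption).
STRAY_COMMA = re.compile(r"(?<!\d),|,(?!\d)")
STRAY_UNDERSCORE = re.compile(r"(?<!\w)_|_(?!\w)")


def word_frequencies(text: str) -> Counter:
    """Count words using an approximation of the rules listed above."""
    text = STRAY_COMMA.sub(" ", text)
    text = STRAY_UNDERSCORE.sub(" ", text)
    words = [w for w in SEPARATORS.split(text) if w]
    # One-character words: keep only digits and the letters A, a, I and O.
    kept = [w for w in words if len(w) > 1 or w.isdigit() or w in "AaIO"]
    return Counter(kept)


sample = "Fix x86 installer, Fix read-only files in Control Panel's list of 1,234 items."
for word, count in sorted(word_frequencies(sample).items()):
    print(f"{word} : {count}")
```

The plain `sorted()` call orders digits first, then uppercase, then lowercase letters, which is case-sensitive like the macro's default behaviour.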
- First of all, back up your active shortcuts.xml file ( one never knows ! )
- Open your shortcuts.xml file
- In the macros section, right before the line </Macros>, insert all the new macro contents, below :
<Macro name="Word_Frequency" Ctrl="no" Alt="no" Shift="no" Key="0">
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^\w,$£€'’\r\n-]+" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\r\n" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= \d ) , (?= \d ) (*SKIP) (*F) | ," />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= \w ) _ (?= \w ) (*SKIP) (*F) | _" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) ^ (?! [AaIO\d] ) .? \R | , ’? $" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="2" message="0" wParam="42059" lParam="0" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) (?: (?<= × ) | (?<= ^ ) ) ( .+ ) \R (?= ^ \1 \R | ^ \1 \z )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="×" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^×\r\n]+" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="×$0" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) (×+) (.+)" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\2 : \1" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) ^ .{51} \K \x20+ (?=:)" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10000}" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="¶" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{1000}" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="¤" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{100}" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="•" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10}" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="÷" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¶¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶) | (¶¶¶¶¶¶) | (¶¶¶¶¶) | (¶¶¶¶) | (¶¶¶) | (¶¶) | (¶) ) (?= ¤ | (•) | (÷) | (×) | ($) )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)(?{13}0000)" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¤¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤) | (¤¤¤¤¤¤) | (¤¤¤¤¤) | (¤¤¤¤) | (¤¤¤) | (¤¤) | (¤) ) (?= • | (÷) | (×) | ($) )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (•••••••••) | (••••••••) | (•••••••) | (••••••) | (•••••) | (••••) | (•••) | (••) | (•) ) (?= ÷ | (×) | ($) )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (÷÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷) | (÷÷÷÷÷÷) | (÷÷÷÷÷) | (÷÷÷÷) | (÷÷÷) | (÷÷) | (÷) ) (?= × | ($) )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (×××××××××) | (××××××××) | (×××××××) | (××××××) | (×××××) | (××××) | (×××) | (××) | (×) )" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= : [ ] ) (?: ( \d{5} ) | ( \d{4} ) | ( \d{3} ) | ( \d\d ) | (\d ) ) $" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?1 )(?2 )(?3 )(?4 )(?5 )$0" />
    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
</Macro>
- Save the changes to your active shortcuts.xml file
- Close and re-open Notepad++
=> You should see the Word_Frequency macro among all your other macros !
See next post to get the end of this story !
BR
guy038
-
-
Hi, All,
Second and last post regarding the Word_Frequency macro !
Now, a simple example :
- Open the change.log file of the latest release, v8.9.6
- Do a stream selection of all the points of the v8.9.6 release, ONLY. So, the lines below :
1. Fix vulnerability (CVE-2026-46710) of v8.9.4 & v8.9.5 installer.
2. Fix x86 installer regression of not showing installation entry in Control Panel's "Unstall a program".
3. Fix x86 installer regression where context menu not installed or uninstalled correctly.
4. Fix UAC prompt display regression (“Notepad++ installer” instead of “Notepad++”) for Notepad++ v8.9.5.
5. Fix incorrect bevaviour when saving dirty read-only files.
6. Fix regression where saving a UDL file removed XML declaration.
- Run the Word_Frequency macro. You should get, at once, this OUTPUT text :
1 : 1
2 : 1
3 : 1
4 : 2
5 : 3
6 : 1
9 : 3
CVE-2026-46710 : 1
Control : 1
Fix : 6
Notepad : 3
Panel's : 1
UAC : 1
UDL : 1
Unstall : 1
XML : 1
a : 2
bevaviour : 1
context : 1
correctly : 1
declaration : 1
dirty : 1
display : 1
entry : 1
file : 1
files : 1
for : 1
in : 1
incorrect : 1
installation : 1
installed : 1
installer : 4
instead : 1
menu : 1
not : 2
of : 3
or : 1
program : 1
prompt : 1
read-only : 1
regression : 4
removed : 1
saving : 2
showing : 1
uninstalled : 1
v8 : 3
vulnerability : 1
when : 1
where : 2
x86 : 2
If you prefer an ordered list ignoring case, simply insert the regex replacement, below :
<Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
<Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-i) \u+" />
<Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
<Action type="3" message="1602" wParam="0" lParam="0" sParam="\L$0" />
<Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
<Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
Right before the sort line :
<Action type="2" message="0" wParam="42059" lParam="0" sParam="" />
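To see why lowercasing must happen before the sort and count, here is a small Python sketch of the same idea. It only illustrates the effect and is not part of the macro; the helper name `lower_uppercase` and the sample word list are made up for the example.

```python
from collections import Counter


def lower_uppercase(text: str) -> str:
    """Lowercase every uppercase letter, leaving other characters unchanged."""
    return "".join(ch.lower() if ch.isupper() else ch for ch in text)


words = ["Fix", "fix", "UDL", "file", "File"]

# Case-sensitive counting keeps "Fix" and "fix" as two different words.
case_sensitive = Counter(words)
# Lowercasing first merges them into a single entry.
case_insensitive = Counter(lower_uppercase(w) for w in words)

for word, count in sorted(case_insensitive.items()):
    print(f"{word} : {count}")
```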
Here is the trick to get the right number of occurrences when it is > 99,999 :
- Search for any remaining ¶ character with the regex ¶+. Let’s suppose you have this line :
the : ¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶97371
As the number of consecutive ¶ is 23, and each remaining ¶ stands for 10,000 occurrences, the exact number of occurrences of the word the is 23 × 10,000 + 97,371, i.e. 327,371 occurrences
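The arithmetic of this trick can be checked with a few lines of Python. This is only a sketch of the calculation described above: the function name `true_count` is hypothetical, and it assumes the value after the colon is made of zero or more ¶ characters followed by plain digits.

```python
import re


def true_count(displayed: str) -> int:
    """Recover the real count from a value such as '¶¶¶97371'."""
    match = re.fullmatch(r"(¶*)(\d+)", displayed.strip())
    if match is None:
        raise ValueError(f"unexpected count format: {displayed!r}")
    pilcrows, digits = match.groups()
    # Each remaining ¶ stands for 10,000 occurrences.
    return len(pilcrows) * 10_000 + int(digits)


line = "the : " + "¶" * 23 + "97371"
word, value = line.split(" : ")
print(word, ":", true_count(value))
```

For the line above, the 23 remaining ¶ characters contribute 230,000 and the digits contribute 97,371.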
Remember that the first thing to do, before running the Word_Frequency macro, is to select part or all of the current file contents !
Best Regards,
guy038