
    Word frequency list

    guy038

      Hello, All,

      In this post, I’ll show you how to generate a word frequency list from a stream selection, using ONLY Notepad++'s regular expressions.

      Of course, this topic has been covered many times, mainly through Python scripts. I haven’t searched our NodeBB forum yet, but I do know there are a few!


      Now, my proposed macro for building this word frequency list has some limitations:

      • It’s mandatory to make a SINGLE stream selection of part or all of the current file's contents BEFORE running the macro

      • You need to select at least one entire line of the current file (selecting only a few words leads to incoherent results!)

      • By default, the search is case-sensitive

      • For each word in the list, the maximum number of occurrences is 99,999 (however, there is a trick to get the right result above this limit)

      • Your selection (or the entire file) should not exceed 10 MB. Note that, with large selections, Notepad++ may report that it has stopped responding: in that case, either use the Task Manager or be patient for a few more minutes!


      In this macro, which characters do I consider to be word characters?

      • Of course, all the characters matched by the \w regex

      • The comma, ONLY IF surrounded by digit characters

      • The underscore, ONLY IF surrounded by word chars

      • The hyphen-minus character

      • The dollar, pound and euro signs ($, £ and €)

      • The apostrophe, as in the strings Shouldn't, program's and authors'

      • The right single quotation mark, as in the strings Shouldn’t, program’s and authors’

      • For words of length 1, I only keep the digits and the letters [AaIO]
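For readers who want to check these rules outside Notepad++, here is a minimal Python sketch of the first four replacements of the macro. It is only an illustration, not part of the macro: the names tokenize and SINGLE_CHAR_WORD are hypothetical, and Python's re module merely approximates the Boost regex engine used by Notepad++ (the \w class, and how look-behinds behave during a Replace All, may differ on edge cases).

```python
import re

# One-character words worth keeping: A, a, I, O or any digit.
SINGLE_CHAR_WORD = re.compile(r"[AaIO\d]")


def tokenize(text: str) -> list[str]:
    """Split text into words, mimicking the macro's first four replacements."""
    text = text.replace("\r\n", "\n")
    # 1) Any run of non-word characters becomes a line break. Word characters
    #    are \w plus , $ £ € ' ’ and the hyphen-minus (last in the class).
    text = re.sub(r"[^\w,$£€'’\n-]+", "\n", text)
    # 2) Delete every comma that is not between two digits (keeps 1,000).
    text = re.sub(r"(?<!\d),|,(?!\d)", "", text)
    # 3) Delete every underscore that is not between two word characters.
    text = re.sub(r"(?<!\w)_|_(?!\w)", "", text)
    # 4) Drop empty lines and one-character lines other than [AaIO\d].
    return [
        word
        for word in text.split("\n")
        if len(word) > 1 or SINGLE_CHAR_WORD.fullmatch(word)
    ]


print(tokenize("Fix x86 installer, v8.9.5 & Notepad++"))
```

On that sample line, the sketch returns Fix, x86, installer, v8, 9, 5 and Notepad, which agrees with how these tokens appear in the output of the next post (v8.9.5 is split at each dot and the ++ is dropped).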


      • First of all, back up your active shortcuts.xml file (one never knows!)

      • Open your active shortcuts.xml file

      • In the <Macros> section, right before the closing </Macros> line, insert the whole macro below:

              <Macro name="Word_Frequency" Ctrl="no" Alt="no" Shift="no" Key="0">
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^\w,$£€'’\r\n-]+" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="\r\n" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= \d ) , (?= \d ) (*SKIP) (*F) | ," />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= \w ) _ (?= \w ) (*SKIP) (*F) | _" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) ^ (?! [AaIO\d] ) .? \R | , ’? $" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="2" message="0" wParam="42059" lParam="0" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-si) (?: (?<= × ) | (?<= ^ ) ) ( .+ ) \R (?= ^ \1 \R | ^ \1 \z )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="×" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) [^×\r\n]+" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="×$0" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) (×+) (.+)" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="\2                                                  : \1" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-s) ^ .{51} \K \x20+ (?=:)" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10000}" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="¶" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{1000}" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="¤" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{100}" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="•" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="×{10}" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="÷" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¶¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶¶) | (¶¶¶¶¶¶¶) | (¶¶¶¶¶¶) | (¶¶¶¶¶) | (¶¶¶¶) | (¶¶¶) | (¶¶) | (¶) ) (?= ¤ | (•) | (÷) | (×) | ($) )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)(?{13}0000)" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (¤¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤¤) | (¤¤¤¤¤¤¤) | (¤¤¤¤¤¤) | (¤¤¤¤¤) | (¤¤¤¤) | (¤¤¤) | (¤¤) | (¤) ) (?= • | (÷) | (×) | ($) )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)(?{12}000)" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (•••••••••) | (••••••••) | (•••••••) | (••••••) | (•••••) | (••••) | (•••) | (••) | (•) ) (?= ÷ | (×) | ($) )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)(?{11}00)" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (÷÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷÷) | (÷÷÷÷÷÷÷) | (÷÷÷÷÷÷) | (÷÷÷÷÷) | (÷÷÷÷) | (÷÷÷) | (÷÷) | (÷) ) (?= × | ($) )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)(?{10}0)" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?: (×××××××××) | (××××××××) | (×××××××) | (××××××) | (×××××) | (××××) | (×××) | (××) | (×) )" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}9)(?{2}8)(?{3}7)(?{4}6)(?{5}5)(?{6}4)(?{7}3)(?{8}2)(?{9}1)" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
      
                  <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                  <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x) (?<= : [ ] ) (?: ( \d{5} ) | ( \d{4} ) | ( \d{3} )  | ( \d\d ) | (\d ) ) $" />
                  <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                  <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?1 )(?2  )(?3   )(?4    )(?5     )$0" />
                  <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                  <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
              </Macro>
      
      • Save the changes to your active shortcuts.xml file

      • Close and re-open Notepad++

      => You should now see the Word_Frequency macro among your other macros, in the Macro menu!
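How does the macro turn a group of identical lines into a number? After the sort, each group of n identical lines is collapsed into a single line prefixed with n × signs. Then ×{10000}, ×{1000}, ×{100} and ×{10} are replaced by ¶, ¤, • and ÷, and every run of 1 to 9 identical markers becomes one digit, with a 0 written for each missing lower rank. The Python sketch below models those two phases on an integer count; the names MARKERS, to_markers and markers_to_digits are mine, for illustration only.

```python
# Marker used for each decimal rank in the macro, from 10,000 down to 1.
MARKERS = [("¶", 10_000), ("¤", 1_000), ("•", 100), ("÷", 10), ("×", 1)]


def to_markers(count: int) -> str:
    """Model ×{10000} -> ¶, then ×{1000} -> ¤, ... applied to count × signs."""
    parts = []
    for marker, weight in MARKERS:
        run, count = divmod(count, weight)
        parts.append(marker * run)
    return "".join(parts)


def markers_to_digits(tally: str) -> str:
    """Model the final replacements: each run of 1 to 9 markers -> one digit.

    A rank that is missing below the highest one present yields a 0, which is
    what the (?{n}0), (?{n}00), ... parts of the replacements take care of.
    """
    runs = [tally.count(marker) for marker, _ in MARKERS]
    if runs[0] > 9:
        raise ValueError("more than 99,999 occurrences: see the ¶ trick")
    while len(runs) > 1 and runs[0] == 0:
        runs.pop(0)  # no leading zeros
    return "".join(str(run) for run in runs)


print(to_markers(10203), markers_to_digits(to_markers(10203)))
```

For example, 10,203 occurrences give ¶••××× (no ¤ and no ÷), which reads back as 10203 because each absent rank contributes a 0. This sketch also shows where the 99,999 limit comes from: only the ¶ rank can hold more than nine markers.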

      See the next post for the end of this story!

      BR

      guy038

      guy038

        Hi, All,

        Second and last post about the Word_Frequency macro!

        Now, a simple example:

        • Open the change.log file of the latest release, v8.9.6

        • Make a stream selection of the v8.9.6 release items ONLY, i.e. the lines below:

         1. Fix vulnerability (CVE-2026-46710) of v8.9.4 & v8.9.5 installer.
         2. Fix x86 installer regression of not showing installation entry in Control Panel's "Unstall a program".
         3. Fix x86 installer regression where context menu not installed or uninstalled correctly.
         4. Fix UAC prompt display regression (“Notepad++ installer” instead of “Notepad++”) for Notepad++ v8.9.5.
         5. Fix incorrect bevaviour when saving dirty read-only files.
         6. Fix regression where saving a UDL file removed XML declaration.
        

        Run the Word_Frequency macro. You should immediately get this OUTPUT text:

        1                                                  :      1
        2                                                  :      1
        3                                                  :      1
        4                                                  :      2
        5                                                  :      3
        6                                                  :      1
        9                                                  :      3
        CVE-2026-46710                                     :      1
        Control                                            :      1
        Fix                                                :      6
        Notepad                                            :      3
        Panel's                                            :      1
        UAC                                                :      1
        UDL                                                :      1
        Unstall                                            :      1
        XML                                                :      1
        a                                                  :      2
        bevaviour                                          :      1
        context                                            :      1
        correctly                                          :      1
        declaration                                        :      1
        dirty                                              :      1
        display                                            :      1
        entry                                              :      1
        file                                               :      1
        files                                              :      1
        for                                                :      1
        in                                                 :      1
        incorrect                                          :      1
        installation                                       :      1
        installed                                          :      1
        installer                                          :      4
        instead                                            :      1
        menu                                               :      1
        not                                                :      2
        of                                                 :      3
        or                                                 :      1
        program                                            :      1
        prompt                                             :      1
        read-only                                          :      1
        regression                                         :      4
        removed                                            :      1
        saving                                             :      2
        showing                                            :      1
        uninstalled                                        :      1
        v8                                                 :      3
        vulnerability                                      :      1
        when                                               :      1
        where                                              :      2
        x86                                                :      2
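The layout of this output (the word padded to 51 characters, then ": ", then the count right-aligned on 6 characters) and its sort order can be reproduced with a short Python sketch. It assumes that the sort action of the macro (wParam="42059", which I believe is Notepad++'s lexicographic ascending line sort) compares lines by code point; that seems consistent with the output above, where digits come first, then uppercase words, then lowercase ones. The name format_report is hypothetical.

```python
from collections import Counter


def format_report(words: list[str]) -> list[str]:
    """One line per distinct word: word padded to 51 chars, ': ', count on 6."""
    counts = Counter(words)
    # sorted() orders str by code point (digits < uppercase < lowercase),
    # assumed here to match Notepad++'s lexicographic ascending sort.
    return [f"{word:<51}: {count:>6}" for word, count in sorted(counts.items())]


for line in format_report(["installer", "Fix", "x86", "Fix", "a", "Notepad"]):
    print(line)
```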
        

        If you prefer a sorted list that ignores case, simply insert the regex replacement below

                    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?x-i) \u+" />
                    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\L$0" />
                    <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
        

        Right before the sort line:

                    <Action type="2" message="0" wParam="42059" lParam="0" sParam="" />
        

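In Python terms, this extra step lowercases every word before the sort, so that, for instance, Fix and fix end up on the same line of the list. A minimal sketch, assuming str.lower() is close enough to the \L replacement for the characters of your text (the two may differ on a few special Unicode letters); case_insensitive_counts is a hypothetical name:

```python
from collections import Counter


def case_insensitive_counts(words: list[str]) -> Counter:
    # Rough equivalent of the extra (?-i)\u+ -> \L$0 replacement:
    # every word is lowercased before it is counted.
    return Counter(word.lower() for word in words)


counts = case_insensitive_counts(["Fix", "fix", "UAC", "x86", "Fix"])
print(sorted(counts.items()))  # [('fix', 3), ('uac', 1), ('x86', 1)]
```

Note that, once everything is lowercased, the "uppercase words first" ordering of the default output naturally disappears.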
        Here is the trick to get the right number of occurrences when it exceeds 99,999:

        • Search for any remaining ¶ characters with the regex ¶+. Let’s suppose you find this line:
        the                                                :      ¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶¶97371
        

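        As there are 23 consecutive ¶ characters, and each remaining ¶ stands for 10,000 occurrences not already counted in the displayed number, the exact number of occurrences of the word "the" is 23 × 10,000 + 97,371, i.e. 327,371 occurrences.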
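Why does this work? The ¶ regex only handles runs of 1 to 9 markers, and it converts the rightmost nine ¶ (those directly followed by ¤, •, ÷, × or the end of line), so every extra ¶ stays in front of a number that begins with 9. The hypothetical Python helpers below model that behaviour and the correction, assuming the count is the only thing after the colon.

```python
def macro_display(count: int) -> str:
    """Hypothetical model of the number the macro leaves for a given count."""
    # The ¶ regex converts at most nine ¶ (the rightmost ones) into a digit,
    # so any ¶ beyond nine stays on the line.
    leftover = max(0, count // 10_000 - 9)
    return "¶" * leftover + str(count - leftover * 10_000)


def true_count(display: str) -> int:
    """Each leftover ¶ is worth 10,000 on top of the displayed number."""
    digits = display.lstrip("¶")
    leftover = len(display) - len(digits)
    return leftover * 10_000 + int(digits)


print(true_count("¶" * 23 + "97371"))  # 327371
```

With 327,371 real occurrences, the model leaves 23 ¶ followed by 97371, which is exactly the line shown above.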


        Remember that the first thing to do, before running the Word_Frequency macro, is to select part or all of the current file's contents!

        Best Regards,

        guy038

        The Community of users of the Notepad++ text editor.