Show a list of same words before replacement



  • Guys,
    Is there an automatic word list that shows the count of the most used / repeated words in the document?
    That is, I would like to open a window that lists, from top to bottom, the most repeated words in the specific document, like this:

    the - 354
    train - 253
    cool - 128
    hope - 89
    etc.

    That would help me a lot with translation, instead of guessing or typing words manually into the search box.

    Thanks



  • @Patrik-Spacek

    Here’s a PythonScript I had on hand that copies a word histogram like the one you show to the clipboard, based on the word content of the current document; obviously it requires the PythonScript plugin.

    It’s not perfect as it stands (e.g., it would count s as a word when it scans It's), but as a starting point it’s not bad.

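    # -*- coding: utf-8 -*-
    # Word histogram of the current document, copied to the clipboard as "word=count" lines.
    # Requires the PythonScript plugin.

    from Npp import editor

    word_matches = []

    def match_found(m):
        # m is the match object for one hit; fetch the matched text from the document
        start, end = m.span(0)
        word_matches.append(editor.getTextRange(start, end))

    # Collect every run of "word" characters in the active document
    editor.research(r'\w+', match_found)

    # Count the occurrences of each word
    histogram_dict = {}
    for word in word_matches:
        histogram_dict[word] = histogram_dict.get(word, 0) + 1

    # One "word=count" line per distinct word, then sort the lines
    output_list = []
    for k in histogram_dict: output_list.append('{}={}'.format(k, histogram_dict[k]))
    output_list.sort()

    # Copy the result to the clipboard; paste it wherever you want it
    editor.copyText('\r\n'.join(output_list))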
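    The counting part of the script doesn't really depend on Notepad++. As a sketch, assuming a plain string instead of the live document, and using the standard library's collections.Counter and re.findall in place of editor.research, the same histogram can be built and tried outside the editor. Python's re and the regex engine Notepad++ uses may not agree exactly on what \w matches for non-ASCII text. The optional pattern argument shows one possible tweak (an assumption, not part of the original script) that keeps contractions like It's together as a single word.

```python
import re
from collections import Counter

def word_histogram(text, pattern=r'\w+'):
    """Return 'word=count' lines, sorted by word, for every regex match in text."""
    counts = Counter(re.findall(pattern, text))
    return ['{}={}'.format(word, n) for word, n in sorted(counts.items())]

sample = "It's the train, the cool train. I hope the train is cool."

# Same word definition as the script: "It's" is split into "It" and "s"
print('\r\n'.join(word_histogram(sample)))

# Assumed tweak: allow apostrophe-joined parts, so "It's" counts as one word
print('\r\n'.join(word_histogram(sample, r"\w+(?:'\w+)*")))
```

    Sorting is case-sensitive, so capitalized words come before lowercase ones, as in the script's own output.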
    


  • Hi, @patrik-spacek, @alan-kilborn,

    Alan, I’ve just tried your Python script, which I named Counting_Words.py, and… wow, very nice! It’s typically the kind of task for a computer ;-))

    Now, to get better output:

    • Move the caret to the beginning of the list

    • Open the Replace dialog (Ctrl + H)

    • SEARCH (=)|^.{18}\K\.+

    • REPLACE (?1.............................. )

    • Untick the Wrap around option

    • Click TWICE on the Replace All button
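    Why two clicks? Here's a rough emulation of the two Replace All passes using Python's re module, as a sketch only: Python's re has no \K, so a capture group stands in for it, and a replacement function stands in for the (?1...) conditional. On the first pass no line contains dots yet, so only the = alternative can match; on the second pass there is no = left, and the dots past column 18 are trimmed. The exact number of dots inserted doesn't matter, as long as it pushes every line past column 18.

```python
import re

# Python's re has no \K, so group 2 captures (and keeps) the first 18 characters instead
PATTERN = re.compile(r'(=)|^(.{18})\.+', re.MULTILINE)

def replace_all(text):
    # Mimics the conditional replacement: if group 1 (the "=") matched,
    # insert a long run of dots plus a space; otherwise keep only the first 18 characters
    return PATTERN.sub(lambda m: '.' * 30 + ' ' if m.group(1) else m.group(2), text)

text = 'a=5\nbc=1\ndef=150\nghij=17'
once = replace_all(text)   # first click: each "=" becomes dots + a space
twice = replace_all(once)  # second click: dots beyond column 18 are removed
print(twice)
```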


    I still plan to “play” with your script and I’ll let you know my impressions soon ;-))

    Best Regards,

    guy038



  • @guy038 said in Show a list of same words before replacement:

    Now, to get a better output :

    Hmm, not sure what you were going for here, but when I tried the regex replacement you offered, nothing happened to the file (0 replacements). ??

    I’m guessing that you meant for something like this to happen:

    My original output:

    After=1
    Alice=3
    Angeles=1
    Bob=7
    But=1
    Cannon=1
    Carol=6
    Culp=1
    Disturbed=1
    Don=1
    

    to be transformed into something like:

    After.............1
    Alice.............3
    Angeles...........1
    Bob...............7
    But...............1
    Cannon............1
    Carol.............6
    Culp..............1
    Disturbed.........1
    Don...............1
    

    If so, we can do this easily, right in the Python code; let me walk you through it…

    See this part of the original script:

    '{}={}'.format(

    I was a bit lazy with that. I should have done this:

    '{0}={1}'.format(

    This means, take the two parameters supplied to the format function, and insert them into the {} placeholders in the format string, according to their position. 0 would be the first parameter to format, 1 would be the second. Thus we get simple Bob=7 output for our problem at hand.

    Note that if numbers (e.g. the 0 and the 1) are not supplied, the numbering is implied, left to right, starting at zero and counting up by one each time a {} is encountered.
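    A quick way to see these numbering rules in action (plain Python, nothing Notepad++-specific; the automatic {} numbering needs Python 2.7 or later):

```python
# {0} and {1} pick the format() arguments by position
explicit = '{0}={1}'.format('Bob', 7)
# Positions can be reordered or reused freely
reordered = '{1} appears {0} times ({1}!)'.format(7, 'Bob')
# Without numbers, the {} fields are numbered 0, 1, ... from left to right
implicit = '{}={}'.format('Bob', 7)
print(explicit)
print(reordered)
print(implicit)
```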

    Okay, so that’s all fine, but if we want special formatting (this is the format function after all), we can add a colon after the number and supply some formatting codes, e.g.:

    '{0:.<18}{1}'.format(

    This means: for the first parameter to format, reserve a field 18 characters wide, left-justify the parameter value in this field (specified with the <), and pad any remaining space in the field with the . character.

    And this indeed does generate the “transformed” output I showed above, when the script is changed and re-run.
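    Applied to a few words from the sample output above, the padded format produces the dotted layout directly. The dictionary here is just hand-typed sample data standing in for the script's histogram_dict:

```python
# Hand-typed sample data standing in for histogram_dict
histogram_dict = {'After': 1, 'Alice': 3, 'Angeles': 1, 'Bob': 7, 'But': 1}

# Field 0: 18 characters wide, left-justified (<), padded with dots (.)
lines = ['{0:.<18}{1}'.format(word, histogram_dict[word]) for word in sorted(histogram_dict)]
print('\n'.join(lines))
```

    Other alignment codes work the same way: > right-justifies and ^ centers within the field, so something like {1:>5} would line the counts up as well.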

    More on the format() function and what it can do is here:
    https://docs.python.org/2.7/library/string.html#format-string-syntax
    It’s a bit heavy, but scrolling down one finds an “examples” section, which maybe helps.



  • Hi, @alan-kilborn,

    I don’t understand! For instance, with the previous search regex:

    • SEARCH (=)|^.{18}\K\.+

    • REPLACE (?1.............................. )

    and TWO consecutive clicks on the Replace All button

    this text :

    a=5
    bc=1
    def=150
    ghij=17
    

    is changed into :

    a................. 5
    bc................ 1
    def............... 150
    ghij.............. 17
    

    Anyway, you guessed right and, thanks to your explanations about Python formatting, I chose the '{0:.<18} {1}'.format syntax, with a space between the last dot and the number. Super!
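    One detail worth knowing about the 18: it is a minimum width, not a maximum. Shorter words get padded with dots, but a longer word is printed in full with no dots at all, so the extra space in this variant is what keeps such a word from running into its count:

```python
# 18 is a minimum field width: short words are padded, longer words are never cut
short = '{0:.<18} {1}'.format('train', 253)
long_with_space = '{0:.<18} {1}'.format('internationalization', 3)  # 20 letters, so no dots
long_no_space = '{0:.<18}{1}'.format('internationalization', 3)    # word runs into the count
print(short)
print(long_with_space)
print(long_no_space)
```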

    Best Regards,

    guy038



  • @guy038

    By the way, it may pay dividends to become familiar with the format function’s formatting codes – I’ll be using that same syntax as a user-input parameter for my upcoming script posting for “replacing using an incrementing count value”.

    Reference (but no, nothing new there YET!)



  • @guy038 said in Show a list of same words before replacement:

    I don’t understand ! For instance, with the previous search regex

    I found out the problem.
    I neglected to follow this part of the original instruction:

    Move the caret at beginning of the list

    After doing that, the regex replacement works fine for me. Sorry for the confusion.



  • @Alan-Kilborn Ooh, thank you for sharing your Python script; this is the stuff I expect Npp to do by default. Yes, thank you, literature researchers want to know these facts, don’t they?



  • @carypt said in Show a list of same words before replacement:

    this is the stuff I expect Npp to do by default

    WHY would you expect this??



  • @Alan-Kilborn Why would I? Because whenever it comes to tricky needs, people get directed to Npp, because you can do “everything” with it. Fine, yes, if you can code a bit, as it seems, or with the help of others. This text-analysing, quenching, reformatting, experimenting stuff is what I expect Npp to be able to do: finding rhymes, wrapping everything around, giving negatives of characters or tempered chords of syllables, counting rythm, finding copy-pasted phrases in dissertations, plagiarism, finding sentence length, word historam (sic!), sentence structure analysis. Book-eating stuff. Playing with words.



  • @carypt

    No. With a period after the “no”.

    Word-histogramming (as I showed with the script) and the other things you so interestingly described are more functions of word-processing software or grammar-analyzing software, or, to the point, special-purpose software.

    A text editor is, by definition, a fairly general purpose tool.
    N++ provides some advanced capability, but it centers around general-purpose needs. Examples include sorting lines, removing duplicate lines…

    “…finding rhymes, negatives of characters, tempered chords, rhythm (which I can spell correctly), plagiarism, structure analysis…” – all very special purpose.

    historam (sic!)

    I know what sic means, but I certainly don’t know what you’re implying with its usage here.



  • @Alan-Kilborn ok , meh



  • @Alan-Kilborn But… thanks for guiding me to “text mining”. :)



  • Thanks guys for all those suggestions, but I am not a coder… and apps like “word frequency counter” show lots of errors if I paste text with 9000 lines…

    I don’t know Python or any coding… Could you create a .bat file or some small app runner where I can paste my file or text and just run it?

    Some simple version of an app that I don’t have to code, and so on?
    BTW, this kind of addition must be added to Notepad++ in general :/

    Thanks



  • @Patrik-Spacek said in Show a list of same words before replacement:

    could you create a .bat file or some small app runner where I can paste my file or text and just run it?
    Some simple version of an app that I don’t have to code, and so on?

    No need.

    Thanks guys for all those suggestions, but I am not a coder…
    I don’t know Python or any coding…

    Fortunately, you don’t have to know a whit of Python, because @Alan-Kilborn already gave you the code that will work to accomplish this solution in Notepad++:

    1. Go to Plugins > Plugins Admin, and install the PythonScript plugin.
    2. Restart Notepad++ as necessary
    3. Plugins > Python Script > New Script, give it the name WordCount.py or similar
    4. Copy/paste the script that Alan posted earlier
    5. Plugins > Python Script > Scripts > WordCount.py will run the script for you; then use Paste / Ctrl+V to paste the count data wherever you want it (a new file, whatever)

    Once you know it works,

    1. Plugins > Python Script > Configuration…
    2. Select WordCount.py in the User Scripts list
    3. Click the left Add button to add that script to the Menu Items list
    4. Make sure Initialisation is set to ATSTARTUP
    5. Click OK
    6. Exit Notepad++ and restart
    7. Settings > Shortcut Mapper > Plugin Commands
    8. Filter for WordCount
    9. Select that line, click Modify, and set the keyboard shortcut you desire, and close the dialogs.

    From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

    No programming required for you, because Alan already did the work.



  • @Patrik-Spacek said in Show a list of same words before replacement:

    this kind of addition must be added to Notepad++ in general

    You seem to think like @carypt
    Kindly refer to my previous post which started out with No. With a period after the…

    @PeterJones

    Some people are just paralyzed with fear about something like you describe. :-)



  • @PeterJones thanks Peter,

    nothing is happening…




  • @Patrik-Spacek

    As I said,

    From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

    Did you paste somewhere after running the script?



  • @PeterJones Ouch, I missed the part about pasting it somewhere… Yes, it’s working!

    What about adding a command that opens a new doc in Notepad++ and pastes it right away?
    And could it also sort by count, from highest to lowest?

    @Alan-Kilborn



  • @Patrik-Spacek said in Show a list of same words before replacement:

    what about adding a command that opens a new doc in Notepad++

    notepad.new()

    and pastes it right away?

    editor.paste()

    You may not be as far from programming as you think, if you try your hand at adding the above (hint: at the end of the script) :-)
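    Putting the two hints above together, and adding the highest-to-lowest ordering that was also asked about, a modified script might look something like this. It's a sketch under a few assumptions: it swaps in collections.Counter for the counting, sorts by descending count with alphabetical tie-breaking, reuses the '{0:.<18} {1}' layout from earlier in the thread, and wraps the Npp import in try/except only so that the counting function can be tried outside Notepad++.

```python
# -*- coding: utf-8 -*-
# Sketch: word histogram sorted by count (highest first), pasted into a new tab.
# The Npp part only runs inside the PythonScript plugin.
from collections import Counter

def histogram_by_count(words):
    counts = Counter(words)
    # Highest count first; words with equal counts are listed alphabetically
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ['{0:.<18} {1}'.format(word, n) for word, n in ordered]

try:
    from Npp import editor, notepad  # provided by the PythonScript plugin
except ImportError:                   # e.g. when run outside Notepad++
    editor = notepad = None

if editor is not None:
    words = []
    # Same word search as the original script
    editor.research(r'\w+', lambda m: words.append(editor.getTextRange(*m.span(0))))
    editor.copyText('\r\n'.join(histogram_by_count(words)))
    notepad.new()   # open a new tab; editor follows the active tab
    editor.paste()  # paste the histogram into the new tab
```

    Sorting with the key (-count, word) gives descending counts while keeping ties in a predictable alphabetical order.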

