    Show a list of same words before replacement

    24 Posts 5 Posters 4.4k Views
    • Patrik SpacekP
      Patrik Spacek
      last edited by

      Guys,
      is there an automatic word list that shows the count of the most used / repeated words in a document?
      i.e. I would like to open a list window that shows me, from top to bottom, the most repeated words in a specific doc.

      the - 354
      train - 253
      cool - 128
      hope - 89
      etc.

      that helps me a lot with translation instead of guessing or typing manually in search.

      Thanks

      • Alan KilbornA
        Alan Kilborn @Patrik Spacek
        last edited by

        @Patrik-Spacek

        Here’s a PythonScript script I had on hand that copies to the clipboard a word histogram like you show, based on the word content of the current document; obviously it requires the PythonScript plugin.

        It’s not perfect the way it is, e.g. it would show s as a word when it scans It's, but as a base thing, it’s not bad.

        # -*- coding: utf-8 -*-
        
        from Npp import editor
        
        # Collect every run of word characters found in the current document.
        word_matches = []
        def match_found(m):
            word_matches.append(editor.getTextRange(m.span(0)[0], m.span(0)[1]))
        editor.research(r'\w+', match_found)
        
        # Count how many times each word occurs.
        histogram_dict = {}
        for word in word_matches:
            histogram_dict[word] = histogram_dict.get(word, 0) + 1
        
        # Build "word=count" lines, sort them alphabetically, and copy them to the clipboard.
        output_list = ['{}={}'.format(k, histogram_dict[k]) for k in histogram_dict]
        output_list.sort()
        editor.copyText('\r\n'.join(output_list))
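The `s`-from-`It's` caveat mentioned above can probably be avoided by letting an apostrophe join two runs of word characters. What follows is only a sketch in plain Python (standard `re` and `collections.Counter`, not the PythonScript `editor` object); the names `WORD_RE` and `word_histogram` are made up for illustration, and whether the same pattern behaves identically when passed to `editor.research` is an untested assumption.

```python
import re
from collections import Counter

# Hypothetical pattern: word characters, optionally followed by
# apostrophe + more word characters (so "It's" stays one word).
WORD_RE = re.compile(r"\w+(?:'\w+)*")

def word_histogram(text):
    """Return a Counter mapping each word in `text` to its number of occurrences."""
    return Counter(WORD_RE.findall(text))

counts = word_histogram("It's cool. It's a train, isn't it?")
for word, n in sorted(counts.items()):
    print('{0}={1}'.format(word, n))
```

Only the plain ASCII apostrophe is handled here; typographic apostrophes would need to be added to the pattern.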
        
        • guy038G
          guy038
          last edited by guy038

          Hi, @patrik-spacek, @alan-kilborn,

          Alan, I’ve just tried your Python script, that I named Counting_Words.py and …Wow, very nice ! It’s typically the kind of task for a computer ;-))

          Now, to get a better output :

          • Move the caret at beginning of the list

          • Open the Replace dialog ( Ctrl + H )

          • SEARCH (=)|^.{18}\K\.+

          • REPLACE (?1.............................. )

          • Untick the Wrap around option

          • Click TWICE on the Replace All button


          I still plan to “play” with your script and I’ll let you know my impressions soon ;-))

          Best Regards,

          guy038

          • Alan KilbornA
            Alan Kilborn
            last edited by

            @guy038 said in Show a list of same words before replacement:

            Now, to get a better output :

            Hmm, not sure what you were going for here, but when I tried the regex replacement you offered, nothing happened to the file (0 replacements). ??

            I’m guessing that you meant for something like this to happen:

            My original output:

            After=1
            Alice=3
            Angeles=1
            Bob=7
            But=1
            Cannon=1
            Carol=6
            Culp=1
            Disturbed=1
            Don=1
            

            to be transformed into something like:

            After.............1
            Alice.............3
            Angeles...........1
            Bob...............7
            But...............1
            Cannon............1
            Carol.............6
            Culp..............1
            Disturbed.........1
            Don...............1
            

            If so, we can do this easily, right in the Python; let me walk you through it…

            See this part of the original script:

            '{}={}'.format(…

            I was a bit lazy with that. I should have done this:

            '{0}={1}'.format(…

            This means, take the two parameters supplied to the format function, and insert them into the {} placeholders in the format string, according to their position. 0 would be the first parameter to format, 1 would be the second. Thus we get simple Bob=7 output for our problem at hand.

            Note that if numbers (e.g. the 0 and the 1) are not supplied, the numbering is implied, left to right, starting at zero and counting up by one each time a {} is encountered.

            Okay, so that’s all fine, but if we want special formatting (this is the format function after all), we can add a colon after the number and supply some formatting codes, e.g.:

            '{0:.<18}{1}'.format(…

            This means: for the first parameter to format, reserve a field 18 characters wide, left-justify the parameter value in this field (specified with the <), and pad any remaining empty space in the field with the . character.

            And this indeed does generate the “transformed” output I showed above, when the script is changed and re-run.

            More on the format() function and what it can do is here:
            https://docs.python.org/2.7/library/string.html#format-string-syntax
            It’s a bit heavy, but scrolling down one finds an “examples” section, which maybe helps.
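For anyone who wants to try the format strings discussed above outside Notepad++, here is a small sketch in plain Python. The variable names are only illustrative, and the last line (right-justification with `>`) is an extra variation that was not part of the original explanation.

```python
# Implicit numbering: placeholders are filled left to right.
implicit = '{}={}'.format('Bob', 7)

# Explicit numbering: 0 is the first argument to format, 1 is the second.
explicit = '{0}={1}'.format('Bob', 7)

# Explicit numbers may be reordered.
swapped = '{1}={0}'.format('Bob', 7)

# Fill with '.', left-justify ('<') in a field 18 characters wide, then the count.
padded = '{0:.<18}{1}'.format('Bob', 7)

# Variation (not from the thread): right-justify the word instead.
right = '{0:.>18}{1}'.format('Bob', 7)

for line in (implicit, explicit, swapped, padded, right):
    print(line)
```

The first two lines print the same `Bob=7`; the padded form always occupies 18 characters for the word field, which is what lines the counts up in a column.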

            • guy038G
              guy038
              last edited by guy038

              Hi, @alan-kilborn,

              I don’t understand ! For instance, with the previous search regex :

              • SEARCH (=)|^.{18}\K\.+

              • REPLACE (?1.............................. )

              and TWO consecutive clicks on the Replace All button

              this text :

              a=5
              bc=1
              def=150
              ghij=17
              

              is changed into :

              a................. 5
              bc................ 1
              def............... 150
              ghij.............. 17
              

              Anyway, you guessed well and, thanks to your explanations about Python formatting, I chose the '{0:.<18} {1}'.format syntax with a space between the last dot and the number. Super !
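To see why the search/replace quoted above needs two Replace All clicks, the two passes can be imitated in plain Python. This is only an approximation: Python's `re` module has neither `\K` nor the `(?1...)` conditional replacement that Notepad++ offers, so a capture group stands in for `\K`, and the function name `pad_with_dots` is made up for illustration.

```python
import re

def pad_with_dots(text, width=18):
    """Imitate the two Replace All passes on 'word=count' lines."""
    # Pass 1: every '=' becomes a long run of dots followed by a space.
    step1 = text.replace('=', '.' * 30 + ' ')
    # Pass 2: keep the first `width` characters of each line and drop
    # whatever remains of the dot run (the group plays the role of \K).
    return re.sub(r'(?m)^(.{%d})\.+' % width, r'\1', step1)

sample = 'a=5\nbc=1\ndef=150\nghij=17'
print(pad_with_dots(sample))
```

Pass 1 makes every line longer than needed; pass 2 trims each dot run back so the numbers start in the same column.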

              Best Regards,

              guy038

              • Alan KilbornA
                Alan Kilborn @guy038
                last edited by

                @guy038

                By the way, it may pay dividends to become familiar with the format function’s formatting codes – I’ll be using that same syntax as a user-input parameter for my upcoming script posting for “replacing using an incrementing count value”.

                Reference (but no, nothing new there YET!)

                • Alan KilbornA
                  Alan Kilborn @guy038
                  last edited by

                  @guy038 said in Show a list of same words before replacement:

                  I don’t understand ! For instance, with the previous search regex

                  I found out the problem.
                  I neglected to follow this part of the original instruction:

                  Move the caret at beginning of the list

                  After doing that, the regex replacement works fine for me. Sorry for the confusion.

                  • caryptC
                    carypt
                    last edited by

                    @Alan-Kilborn ooh , thank you for sharing your python-script , this is the stuff i expect npp to do by default . yes , thank you , literature researchers want to know these facts , don't they ?

                    • Alan KilbornA
                      Alan Kilborn @carypt
                      last edited by

                      @carypt said in Show a list of same words before replacement:

                      this is the stuff i expect npp to do by default

                      WHY would you expect this??

                      • caryptC
                        carypt
                        last edited by carypt

                        @Alan-Kilborn why would i ? because everywhere when it comes into tricky needs , people get directed to npp , because you can do “everything” with it . fine , yes , if you can code a bit as it seems , or by help of others . this text-analysing , quenching , reformatting , experimenting stuff i do expect npp to be able . finding rhymes , wrap everything around , giving negatives of characters , or tempered chords of syllables , counting rythm , finding copypasted phrases in dissertaions , plagiarism . finding sentence lenght , word historam (sic !) , sentence structure analysis . book eating stuff . playing with words .

                        • Alan KilbornA
                          Alan Kilborn @carypt
                          last edited by

                          @carypt

                          No. With a period after the “no”.

                          Word-histogramming (as I showed with the script) and the other things you so interestingly described are more functions of word-processing software or grammar-analyzing software, or, to the point, special-purpose software.

                          A text editor is, by definition, a fairly general purpose tool.
                          N++ provides some advanced capability, but it centers around general-purpose needs. Examples include sorting lines, removing duplicate lines…

                          “…finding rhymes, negatives of characters, tempered chords, rhythm (which I can spell correctly), plagiarism, structure analysis…” – all very special purpose.

                          historam (sic!)

                          I know what sic means, but I certainly don’t know what you’re implying with its usage here.

                          • caryptC
                            carypt
                            last edited by

                            @Alan-Kilborn ok , meh

                            • caryptC
                              carypt
                              last edited by

                              @Alan-Kilborn but … ty for guiding me to “text-mining”. )

                              • Patrik SpacekP
                                Patrik Spacek
                                last edited by

                                Thanks guys for all those suggestions, but i am not a coder… and apps like “word frequency counter” show lots of errors if I paste text with 9000 lines…

                                I dont know python or any coding…could you create a .bat file or some small app runner where i can paste my file or text and just run it?

                                Some simple version of app that I dont have to code and so on?
                                BTW this kind of addition must be added to notepad++ in general :/.

                                Thanks

                                • PeterJonesP
                                  PeterJones @Patrik Spacek
                                  last edited by PeterJones

                                  @Patrik-Spacek said in Show a list of same words before replacement:

                                  could you create a .bat file or some small app runner where i can paste my file or text and just run it?
                                  Some simple version of app that I dont have to code and so on?

                                  No need.

                                  Thanks guys for all those suggestions, but i am not a coder…
                                  I dont know python or any coding…

                                  Fortunately, you don’t have to know a whit of python, because @Alan-Kilborn already gave you the code that will work to accomplish this solution in Notepad++:

                                  1. Go to Plugins > Plugins Admin, and install the PythonScript plugin.
                                  2. Restart Notepad++ as necessary
                                  3. Plugins > Python Script > New Script, give it the name WordCount.py or similar
                                  4. Copy/paste the script that Alan posted earlier
                                  5. Plugins > Python Script > Scripts > WordCount.py will run the script for you, and use Paste / Ctrl+V to paste the count data wherever you want it (a new file, whatever)

                                  Once you know it works,

                                  1. Plugins > Python Script > Configuration…
                                  2. Select WordCount.py in the UserScripts
                                  3. Click the left Add to add that script to the Menu Items list
                                  4. Make sure Initialisation is set to ATSTARTUP
                                  5. Click OK
                                  6. Exit Notepad++ and restart
                                  7. Settings > Shortcut Mapper > Plugin Commands
                                  8. Filter for WordCount
                                  9. Select that line, click Modify, and set the keyboard shortcut you desire, and close the dialogs.

                                  From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

                                  No programming required for you, because Alan already did the work.

                                  • Alan KilbornA
                                    Alan Kilborn @Patrik Spacek
                                    last edited by Alan Kilborn

                                    @Patrik-Spacek said in Show a list of same words before replacement:

                                    this kind of addition must be added to notepad++ in general

                                    You seem to think like @carypt
                                    Kindly refer to my previous post which started out with No. With a period after the…

                                    @PeterJones

                                    Some people are just paralyzed with fear about something like you describe. :-)

                                    • Patrik SpacekP
                                      Patrik Spacek @PeterJones
                                      last edited by

                                      @PeterJones thanks Peter,

                                      nothing is happening…

                                      [screenshot: ea6f6976-7a92-4534-8dd8-c8161e7e079a-image.png]

                                      • PeterJonesP
                                        PeterJones @Patrik Spacek
                                        last edited by

                                        @Patrik-Spacek

                                        As I said,

                                        From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

                                        Did you paste somewhere after running the script?

                                        • Patrik SpacekP
                                          Patrik Spacek @PeterJones
                                          last edited by Patrik Spacek

                                          @PeterJones ouch, i missed that paste to somewhere… yes, its working!

                                          what about add command that opens a new doc in notepad++ and paste it right away?
                                          also set in count order from highest to lowest?

                                          @Alan-Kilborn

                                          • Alan KilbornA
                                            Alan Kilborn @Patrik Spacek
                                            last edited by

                                            @Patrik-Spacek said in Show a list of same words before replacement:

                                            what about add command that opens a new doc in notepad++

                                            notepad.new()

                                            and paste it right away?

                                            editor.paste()

                                            You may not be as far from programming as you think, if you try your hand at adding the above (hint: at the end of the script) :-)
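For the "highest to lowest" part of the request, here is one possible sketch. It does not reuse the original script: it reads the whole document with `editor.getText()` and counts with the standard `re` and `collections.Counter` modules, which is a swapped-in approach. The helper names `count_words` and `histogram_lines` are hypothetical. The `Npp` import is guarded so the counting and sorting logic can also be tried in an ordinary Python interpreter; the Notepad++ part has not been tested here.

```python
import re
from collections import Counter

def count_words(text):
    """Count runs of word characters in `text`."""
    return Counter(re.findall(r'\w+', text))

def histogram_lines(counts):
    """Return formatted lines, highest count first (ties sorted alphabetically)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ['{0:.<18} {1}'.format(word, n) for word, n in ordered]

try:
    # Only available when run from the PythonScript plugin inside Notepad++.
    from Npp import editor, notepad
except ImportError:
    editor = notepad = None

if editor is not None:
    text = editor.getText()   # read the current document
    notepad.new()             # open a new, empty document
    # Assumption: `editor` now refers to the newly opened document.
    editor.addText('\r\n'.join(histogram_lines(count_words(text))))
```

Writing the text directly with `editor.addText` sidesteps the clipboard, so nothing needs to be pasted afterwards.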
