Show a list of same words before replacement

24 Posts 5 Posters 4.6k Views
  • P
    Patrik Spacek
    last edited by Jan 14, 2021, 6:48 PM

    Guys,
    is there an automatic word list that shows the count of the most used / repeated words in the document?
    i.e. I would like to open a list window that shows, from top to bottom, the most repeated words in a specific doc.

    the - 354
    train - 253
    cool - 128
    hope - 89
    etc.

    That would help me a lot with translation, instead of guessing or typing each word manually in search.

    Thanks

    • A
      Alan Kilborn @Patrik Spacek
      last edited by Jan 14, 2021, 7:31 PM

      @Patrik-Spacek

      Here’s a PythonScript I had on hand that copies to the clipboard a word histogram like the one you show, based on the word content of the current document; it obviously requires the PythonScript plugin.

      It’s not perfect the way it is, e.g. it would show s as a word when it scans It's, but as a base thing, it’s not bad.

      # -*- coding: utf-8 -*-

      from Npp import editor

      word_matches = []

      def match_found(m):
          # Called once per regex match; store the matched word's text
          word_matches.append(editor.getTextRange(m.span(0)[0], m.span(0)[1]))

      # Raw string so that \w reaches the regex engine unchanged
      editor.research(r'\w+', match_found)

      # Count how many times each distinct word occurs
      histogram_dict = {}
      for word in word_matches:
          histogram_dict[word] = histogram_dict.get(word, 0) + 1

      # Build "word=count" lines, sort them alphabetically, copy to the clipboard
      output_list = []
      for k in histogram_dict:
          output_list.append('{}={}'.format(k, histogram_dict[k]))
      output_list.sort()
      editor.copyText('\r\n'.join(output_list))
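
For anyone who wants to try the counting logic outside Notepad++, here is a minimal standalone sketch of the same histogram idea. It is not Alan's script: the names `WORD_RE`, `word_histogram` and `format_histogram` are made up for illustration, and the regex is an assumption that keeps a straight apostrophe inside a word so that It's is not split into It and s.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of word characters that may contain
# internal straight apostrophes, so "It's" stays one token.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def word_histogram(text):
    """Return a Counter mapping each word in text to its occurrence count."""
    return Counter(WORD_RE.findall(text))


def format_histogram(hist):
    """Return sorted 'word=count' lines, like the script's clipboard output."""
    return sorted('{}={}'.format(word, count) for word, count in hist.items())


if __name__ == '__main__':
    sample = "It's a train. It's a cool train."
    print('\n'.join(format_histogram(word_histogram(sample))))
```

Like the original, this is case-sensitive, so The and the are counted separately.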
      
      • G
        guy038
        last edited by guy038 Jan 15, 2021, 11:16 AM Jan 14, 2021, 11:02 PM

        Hi, @patrik-spacek, @alan-kilborn,

        Alan, I’ve just tried your Python script, that I named Counting_Words.py and …Wow, very nice ! It’s typically the kind of task for a computer ;-))

        Now, to get a better output :

        • Move the caret at beginning of the list

        • Open the Replace dialog ( Ctrl + H )

        • SEARCH (=)|^.{18}\K\.+

        • REPLACE (?1.............................. )

        • Untick the Wrap around option

        • Click TWICE on the Replace All button
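
The two clicks correspond to two passes: the first turns each = into a long run of dots plus a space, and the second trims every line's dots back to column 18. Here is a rough Python sketch of that effect. The function name `dot_pad` is hypothetical, and since Python's `re` module has neither `\K` nor the `(?1...)` conditional replacement, a capture group and a backreference stand in for them.

```python
import re


def dot_pad(text, width=18):
    """Mimic the two Replace All passes on 'word=count' lines."""
    # Pass 1: each "=" becomes 30 dots plus a space (the "(?1...)" branch).
    padded = text.replace('=', '.' * 30 + ' ')
    # Pass 2: keep the first `width` characters of each line and delete
    # the run of dots that follows them (what "^.{18}\K\.+" removes).
    pattern = r'^(.{%d})\.+' % width
    return re.sub(pattern, r'\1', padded, flags=re.M)


if __name__ == '__main__':
    print(dot_pad('a=5\nbc=1\ndef=150\nghij=17'))
```

A word of 18 characters or more is left with its full run of dots, because the second pattern no longer finds dots right after column 18.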


        I still plan to “play” with your script and I’ll let you know my impressions soon ;-))

        Best Regards,

        guy038

        • A
          Alan Kilborn
          last edited by Jan 15, 2021, 1:21 PM

          @guy038 said in Show a list of same words before replacement:

          Now, to get a better output :

          Hmm, not sure what you were going for here, but when I tried the regex replacement you offered, nothing happened to the file (0 replacements). ??

          I’m guessing that you meant for something like this to happen:

          My original output:

          After=1
          Alice=3
          Angeles=1
          Bob=7
          But=1
          Cannon=1
          Carol=6
          Culp=1
          Disturbed=1
          Don=1
          

          to be transformed into something like:

          After.............1
          Alice.............3
          Angeles...........1
          Bob...............7
          But...............1
          Cannon............1
          Carol.............6
          Culp..............1
          Disturbed.........1
          Don...............1
          

          If so, we can do this easily, right in the Python; let me walk you through it…

          See this part of the original script:

          '{}={}'.format(…

          I was a bit lazy with that. I should have done this:

          '{0}={1}'.format(…

          This means, take the two parameters supplied to the format function, and insert them into the {} placeholders in the format string, according to their position. 0 would be the first parameter to format, 1 would be the second. Thus we get simple Bob=7 output for our problem at hand.

          Note that if numbers (e.g. the 0 and the 1) are not supplied, the numbering is implied, left to right, starting at zero and counting up by one each time a {} is encountered.

          Okay, so that’s all fine, but if we want special formatting (this is the format function after all), we can add a colon after the number and supply some formatting codes, e.g.:

          '{0:.<18}{1}'.format(…

          This means, for the first parameter to format, reserve a field width 18 wide, left-justify the parameter value in this field (specified with the <), and pad any remaining empty space in the field with the . character.

          And this indeed does generate the “transformed” output I showed above, when the script is changed and re-run.
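
The format codes described above can be tried in any Python interpreter, with no Notepad++ needed. A small sketch, where the helper name `dotted` is made up for illustration:

```python
# Implicit ({}) and explicit ({0}, {1}) numbering give the same result.
assert '{}={}'.format('Bob', 7) == '{0}={1}'.format('Bob', 7) == 'Bob=7'


def dotted(word, count):
    """Left-justify word in an 18-wide field padded with dots, then the count."""
    return '{0:.<18}{1}'.format(word, count)


if __name__ == '__main__':
    print(dotted('Bob', 7))        # Bob...............7
    print(dotted('Disturbed', 1))  # Disturbed.........1
```

A word longer than 18 characters is not truncated; the field simply grows and no dots are added.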

          More on the format() function and what it can do is here:
          https://docs.python.org/2.7/library/string.html#format-string-syntax
          It’s a bit heavy, but scrolling down one finds an “examples” section, which maybe helps.

          • G
            guy038
            last edited by guy038 Jan 15, 2021, 3:20 PM Jan 15, 2021, 3:18 PM

            Hi, @alan-kilborn,

            I don’t understand ! For instance, with the previous search regex :

            • SEARCH (=)|^.{18}\K\.+

            • REPLACE (?1.............................. )

            and TWO consecutive clicks on the Replace All button

            this text :

            a=5
            bc=1
            def=150
            ghij=17
            

            is changed into :

            a................. 5
            bc................ 1
            def............... 150
            ghij.............. 17
            

            Anyway, you guessed well and, thanks to your explanations about Python formatting, I chose the '{0:.<18} {1}'.format syntax, with a space between the last dot and the number. Super !

            Best Regards,

            guy038

            • A
              Alan Kilborn @guy038
              last edited by Jan 15, 2021, 6:15 PM

              @guy038

              By the way, it may pay dividends to become familiar with the format function’s formatting codes – I’ll be using that same syntax as a user-input parameter for my upcoming script posting for “replacing using an incrementing count value”.

              Reference (but no, nothing new there YET!)

              • A
                Alan Kilborn @guy038
                last edited by Jan 15, 2021, 6:27 PM

                @guy038 said in Show a list of same words before replacement:

                I don’t understand ! For instance, with the previous search regex

                I found out the problem.
                I neglected to follow this part of the original instruction:

                Move the caret at beginning of the list

                After doing that, the regex replacement works fine for me. Sorry for the confusion.

                • C
                  carypt
                  last edited by Jan 15, 2021, 10:45 PM

                  @Alan-Kilborn ooh , thank you for sharing your python-script , this is the stuff i expect npp to do by default . yes , thank you , literature researchers want to know these facts , don't they ?

                  • A
                    Alan Kilborn @carypt
                    last edited by Jan 16, 2021, 12:12 AM

                    @carypt said in Show a list of same words before replacement:

                    this is the stuff i expect npp to do by default

                    WHY would you expect this??

                    • C
                      carypt
                      last edited by carypt Jan 17, 2021, 12:59 PM Jan 17, 2021, 12:58 PM

                      @Alan-Kilborn why would i ? because everywhere when it comes into tricky needs , people get directed to npp , because you can do “everything” with it . fine , yes , if you can code a bit as it seems , or by help of others . this text-analysing , quenching , reformatting , experimenting stuff i do expect npp to be able . finding rhymes , wrap everything around , giving negatives of characters , or tempered chords of syllables , counting rythm , finding copypasted phrases in dissertaions , plagiarism . finding sentence lenght , word historam (sic !) , sentence structure analysis . book eating stuff . playing with words .

                      • A
                        Alan Kilborn @carypt
                        last edited by Jan 17, 2021, 1:08 PM

                        @carypt

                        No. With a period after the “no”.

                        Word-histogramming (as I showed with the script) and the other things you so interestingly described are more functions of word-processing software or grammar-analyzing software, or, to the point, special-purpose software.

                        A text editor is, by definition, a fairly general purpose tool.
                        N++ provides some advanced capability, but it centers around general-purpose needs. Examples include sorting lines, removing duplicate lines…

                        “…finding rhymes, negatives of characters, tempered chords, rhythm (which I can spell correctly), plagiarism, structure analysis…” – all very special purpose.

                        historam (sic!)

                        I know what sic means, but I certainly don’t know what you’re implying with its usage here.

                        • C
                          carypt
                          last edited by Jan 17, 2021, 1:15 PM

                          @Alan-Kilborn ok , meh

                          • C
                            carypt
                            last edited by Jan 17, 2021, 1:29 PM

                            @Alan-Kilborn but … ty for guiding me to “text-mining”. )

                            • P
                              Patrik Spacek
                              last edited by Jan 19, 2021, 8:59 PM

                              Thanks guys for all those suggestions, but i am not a coder… and apps like “word frequency counter” show lots of errors if I paste text with 9000 lines…

                              I dont know python or any coding…could you create a .bat file or some small app runner where i can paste my file or text and just run it?

                              Some simple version of app that I dont have to code and so on?
                              BTW this kind of addition must be added to notepad++ in general :/.

                              Thanks

                              • P
                                PeterJones @Patrik Spacek
                                last edited by PeterJones Jan 19, 2021, 9:12 PM Jan 19, 2021, 9:12 PM

                                @Patrik-Spacek said in Show a list of same words before replacement:

                                could you create a .bat file or some small app runner where i can paste my file or text and just run it?
                                Some simple version of app that I dont have to code and so on?

                                No need.

                                Thanks guys for all those suggestions, but i am not a coder…
                                I dont know python or any coding…

                                Fortunately, you don’t have to know a whit of python, because @Alan-Kilborn already gave you the code that will work to accomplish this solution in Notepad++:

                                1. Go to Plugins > Plugins Admin, and install the PythonScript plugin.
                                2. Restart Notepad++ as necessary
                                3. Plugins > Python Script > New Script, give it the name WordCount.py or similar
                                4. Copy/paste the script that Alan posted earlier
                                5. Plugins > Python Script > Scripts > WordCount.py will run the script for you, and use Paste / Ctrl+V to paste the count data wherever you want it (a new file, whatever)

                                Once you know it works,

                                1. Plugins > Python Script > Configuration…
                                2. Select WordCount.py in the UserScripts
                                3. Click the left Add to add that script to Menu Items list
                                4. Make sure Initialisation is set to ATSTARTUP
                                5. Click OK
                                6. Exit Notepad++ and restart
                                7. Settings > Shortcut Mapper > Plugin Commands
                                8. Filter for WordCount
                                9. Select that line, click Modify, and set the keyboard shortcut you desire, and close the dialogs.

                                From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

                                No programming required for you, because Alan already did the work.

                                • A
                                  Alan Kilborn @Patrik Spacek
                                  last edited by Alan Kilborn Jan 19, 2021, 9:22 PM Jan 19, 2021, 9:20 PM

                                  @Patrik-Spacek said in Show a list of same words before replacement:

                                  this kind of addition must be added to notepad++ in general

                                  You seem to think like @carypt
                                  Kindly refer to my previous post which started out with No. With a period after the…

                                  @PeterJones

                                  Some people are just paralyzed with fear about something like you describe. :-)

                                  • P
                                    Patrik Spacek @PeterJones
                                    last edited by Jan 19, 2021, 9:49 PM

                                    @PeterJones thanks Peter,

                                    nothing is happening…

                                    [Screenshot: ea6f6976-7a92-4534-8dd8-c8161e7e079a-image.png]

                                    • P
                                      PeterJones @Patrik Spacek
                                      last edited by Jan 19, 2021, 9:50 PM

                                      @Patrik-Spacek

                                      As I said,

                                      From now on, that keyboard shortcut will run the histogram (or you can go to Plugins > Python Script > WordCount), and it will put the word count in your clipboard buffer. Then all you have to do is paste the result somewhere.

                                      Did you paste somewhere after running the script?

                                      • P
                                        Patrik Spacek @PeterJones
                                        last edited by Patrik Spacek Jan 19, 2021, 10:01 PM Jan 19, 2021, 9:58 PM

                                        @PeterJones ouch, i missed that paste to somewhere… yes, its working!

                                        what about add command that opens a new doc in notepad++ and paste it right away?
                                        also set in count order from highest to lowest?

                                        @Alan-Kilborn

                                        • A
                                          Alan Kilborn @Patrik Spacek
                                          last edited by Jan 19, 2021, 10:05 PM

                                          @Patrik-Spacek said in Show a list of same words before replacement:

                                          what about add command that opens a new doc in notepad++

                                          notepad.new()

                                          and paste it right away?

                                          editor.paste()

                                          You may not be as far from programming as you think, if you try your hand at adding the above (hint: at the end of the script) :-)
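
For the other request, ordering from highest count to lowest, one possible approach is to sort the dictionary items by count before formatting them. This is only a sketch: `lines_by_count` is a hypothetical helper, not part of the script posted earlier, and the PythonScript lines in the comment are untested.

```python
def lines_by_count(histogram_dict):
    """Return formatted lines ordered from highest count to lowest.

    Ties are broken alphabetically so the order is the same on every run.
    """
    ordered = sorted(histogram_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return ['{0:.<18} {1}'.format(word, count) for word, count in ordered]


# Inside Notepad++, the end of the script might then look like this
# (untested sketch; needs "from Npp import editor, notepad" at the top):
#
#     editor.copyText('\r\n'.join(lines_by_count(histogram_dict)))
#     notepad.new()     # open a new, empty document
#     editor.paste()    # paste the clipboard contents into it

if __name__ == '__main__':
    demo = {'hope': 89, 'the': 354, 'cool': 128, 'train': 253}
    print('\n'.join(lines_by_count(demo)))
```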
