    Looking for a freelancer to develop a plugin: Misspelled Word Counter

    Notepad++ & Plugin Development
    • EkopalypseE
      Ekopalypse @Miguel Lescano
      last edited by

      @Miguel-Lescano

      Good testing, I will check that as soon as I get home.

      • EkopalypseE
        Ekopalypse @Miguel Lescano
        last edited by Ekopalypse

        @Miguel-Lescano

        I found the two issues.
        First, a typo and second, a greeting from the past.

        The typo is responsible for the behavior when nothing follows a word: the brackets were misplaced, so the alternatives ended up inside a single character class.

        editor.research('[[:alpha:]]+(?=[\h|[:punct:]|\R|\Z])', lambda m: words.append(m.group()))
        

        must be

        editor.research('[[:alpha:]]+(?=\h|[[:punct:]]|\R|\Z)', lambda m: words.append(m.group()))
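
The same kind of mistake can be reproduced with Python's own re module. This is only a sketch under assumptions: re does not support \h, \R or [[:punct:]], so the patterns below are simplified stand-ins, and the names buggy and fixed are purely illustrative.

```python
import re

text = "hola mundo"

# Everything inside [...] is one character class, so the lookahead
# demands that some character (space, '|', '.' or ',') follows the word.
buggy = re.compile(r"[a-z]+(?=[ |.,])")

# Real alternation: a space, punctuation, a line break, or the end of the text.
fixed = re.compile(r"[a-z]+(?= |[.,]|\n|\Z)")

buggy_words = buggy.findall(text)   # the last word has nothing after it
fixed_words = fixed.findall(text)
print(buggy_words, fixed_words)
```

With the character class, the last word of the text is dropped because no character follows it; with the alternation, the end-of-text branch lets it match.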
        

        The second problem has to do with the fact that Python 2 and Python 3 handle Unicode text differently.
        Since I have been working only with Python 3 for quite some time now, I completely overlooked the fact that, for example, ‘Ü’ is a bytes object in Python 2 and a Unicode object in Python 3.
        This means that we have to work with Unicode in our code as well, so that functions like .lower() work.
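
To make the difference concrete, here is a small sketch. It is written for Python 3, where a bytes object stands in for Python 2's plain str; the variable names are only for illustration.

```python
# Python 3 sketch: bytes plays the role of Python 2's plain str here.
raw = u'\u00dc'.encode('utf8')      # 'Ü' as UTF-8: two bytes, no ASCII letters

lowered_bytes = raw.lower()         # bytes.lower() only touches ASCII letters
lowered_text = raw.decode('utf8').lower()

print(lowered_bytes == raw)         # True: nothing was lowercased
print(lowered_text == u'\u00fc')    # True: the decoded text became 'ü'
```

The full script applies the same decoding to the dictionary entries and to each matched word.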

        from Npp import notepad, editor, NOTIFICATION, SCINTILLANOTIFICATION, STATUSBARSECTION, MODIFICATIONFLAGS
        import os
        
        
        class WORD_CHECKER:
            
            def __init__(self):
                print('__init__')
                self.report = ('Total: {0:<5}  '
                               'Unique: {1:<5}  '
                               'Total non-misspelled: {2:<5}({3:.1%})  '
                               'Total misspelled: {4:<4}({5:.1%})  '
                               'Unique misspelled: {6:<4}({7:.1%})')
        
                editor.callbackSync(self.on_modified, [SCINTILLANOTIFICATION.MODIFIED])
                notepad.callback(self.on_buffer_activated, [NOTIFICATION.BUFFERACTIVATED])
                current_dict_path = os.path.join(notepad.getPluginConfigDir(), 'Hunspell')
                current_dict_file = os.path.join(current_dict_path, 'ES-5000.dic')
                with open(current_dict_file, 'r') as f:
                    # decode to unicode (Python 2) and keep a set for fast lookups
                    self.current_dict = set(word.decode('utf8')
                                            for word in f.read().splitlines()[1:])   # skip length entry
        
                self.DEBUG_MODE = False
                self.on_buffer_activated({})  # must be last line here as it triggers check_words
                
        
            def check_words(self):
                words = []
                editor.research('[[:alpha:]]+(?=\h|[[:punct:]]|\R|\Z)',
                                lambda m: words.append(m.group().decode('utf8')))
                if self.DEBUG_MODE:
                    print(u'words contains:\n  {}'.format('  '.join(words)))
                    
                error_words = [word.lower() 
                               for word in words 
                               if word.lower() not in self.current_dict and    # insensitive word check
                               not word.isupper()  # ignore all uppercase only words
                               ]
                if self.DEBUG_MODE:
                    print(u'error_words contains:\n  {}'.format('  '.join(error_words)))
                    print(u'error_words unique contains:\n  {}'.format('  '.join(set(error_words))))
                
                total = len(words)
                unique = len(set(words))
                misspelled = len(error_words)
                misspelled_unique = len(set(error_words))
                notepad.setStatusBar(STATUSBARSECTION.DOCTYPE, 
                                     self.report.format(total,
                                                        unique,
                                                        total-misspelled,  # non-misspelled
                                                        (float(total-misspelled) / total) if misspelled else 1,  # non-misspelled %
                                                        misspelled,
                                                        (float(misspelled) / total) if misspelled else 0,
                                                        misspelled_unique,
                                                        (float(misspelled_unique) / total) if misspelled_unique else 0))
        
        
            def on_modified(self, args):
                if ((args['modificationType'] & MODIFICATIONFLAGS.INSERTTEXT) or 
                    (args['modificationType'] & MODIFICATIONFLAGS.DELETETEXT)):
                    self.check_words()
        
        
            def on_buffer_activated(self, args):
                self.check_words()
        
        
        WORD_CHECKER()
        

        The new code has an additional DEBUG_MODE which, if set to True,
        prints the contents of the words, error_words and unique error_words lists
        to the Python Script console.

        At the moment I am checking whether it is possible, and whether it makes sense, to read DSpellCheck.ini and apply its settings automatically.
        For example, one might want all-uppercase words to be treated as misspelled.
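
For anyone who wants to try the counting logic from check_words outside Notepad++, here is a minimal sketch without the Npp module. The helpers word_stats and ratio and the ignore_upper switch are hypothetical names for this example only; the switch just shows how the all-uppercase rule could be made optional.

```python
def word_stats(words, dictionary, ignore_upper=True):
    """Return (total, unique, misspelled, unique_misspelled) for a word list."""
    known = set(dictionary)
    errors = [w.lower() for w in words
              if w.lower() not in known                  # case-insensitive check
              and not (ignore_upper and w.isupper())]    # optionally skip ALL CAPS
    return len(words), len(set(words)), len(errors), len(set(errors))


def ratio(part, total):
    """Fraction of total, or 0.0 for an empty document."""
    return float(part) / total if total else 0.0


words = ['Hola', 'mundo', 'mundo', 'xyzzy', 'NASA']
stats = word_stats(words, ['hola', 'mundo'])
strict = word_stats(words, ['hola', 'mundo'], ignore_upper=False)
print(stats, ratio(stats[2], stats[0]))
```

With ignore_upper left at True, 'NASA' is skipped as in the script above; with False it is counted as misspelled.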

        • Miguel LescanoM
          Miguel Lescano @Ekopalypse
          last edited by

          @Ekopalypse Thanks a lot! I can’t find the typo to save my life…
          I’ve been using your script today to write the cues for my very first crossword puzzle for Spanish learners:
          http://crossword.info/spanishinput/Spanish_Input_Level_1_Puzzle_001
          All the cues use only words from the top 1000, except for proper names.

          BTW, I have a couple of special requests, so feel free to charge me for this. I know this is taking up your time, and I’m grateful for it:
          Is there a way to add comment lines that are completely ignored from the calculation? I mean, not even counted in the word total. Maybe lines that start with // or with an asterisk or something like that.

          Yes, I’ve been thinking about the All caps letters thing… Sometimes it does make sense to treat them the same as the other words. Still thinking…

          Thanks a lot!

          • Bas de ReuverB
            Bas de Reuver @Ekopalypse
            last edited by

            @Ekopalypse said in Looking for a freelancer to develop a plugin: Misspelled Word Counter:

            @Miguel-Lescano

            Maybe you could convince predelnik to implement it!? My approach could serve as a template.
            Maybe another tab called Statistics under Settings… with an option to display this in realtime in the DocType field!?

            I agree with this. You could request a new feature here:
            https://github.com/Predelnik/DSpellCheck/issues

            So like “feature request: count misspelled words” or “report/stats of misspelled words” or something like that, and refer to this forum thread.

            • EkopalypseE
              Ekopalypse @Miguel Lescano
              last edited by

              @Miguel-Lescano said in Looking for a freelancer to develop a plugin: Misspelled Word Counter:

              I’ve been using your script today to write the cues for my very first crossword puzzle for Spanish learners:

              Cool :-)

              BTW, I have a couple of special requests, so feel free to charge me for this. I know this is taking up your time, and I’m grateful for it:

              As long as I have time to do it and enjoy making it work, no problem.

              Is there a way to add comment lines that are completely ignored from the calculation? I mean, not even counted in the word total. Maybe lines that start with // or with an asterisk or something like that.

              Yes, this is possible, but it will mean that the misspelled-word synchronization with DSpellCheck no longer works, correct?

              • Miguel LescanoM
                Miguel Lescano @Ekopalypse
                last edited by

                @Ekopalypse Yes, this would kinda break things with DSpellCheck, but it’s not a problem.
                I’d love to be able to add notes between Spanish dialogues. The notes would not actually be recorded for my students. I have a YouTube channel where I publish recordings of my stories:
                https://www.youtube.com/watch?v=XfpbG_5Im9Q

                In the future, I plan to learn to use Unreal Engine to create short animations, so the notes would also include scene descriptions.

                • EkopalypseE
                  Ekopalypse @Miguel Lescano
                  last edited by Ekopalypse

                  @Miguel-Lescano

                  The easiest way would be if a comment always starts at the
                  beginning of a line. A comment at the end of a line can also be handled.
                  What wouldn’t be so nice is something like text, comment, text on a single
                  line, or a comment that runs over several lines without each new line
                  having a comment character at the beginning.

                  Assuming we use // as the “comment sign”

                  Relatively easy

                  // Comment
                  Text  // Comment
                  

                  Not so easy:

                  // Comment
                  still comment //
                  Text
                  
                  Text //comment
                  comment// Text
                  

                  What do you think?

                  • Miguel LescanoM
                    Miguel Lescano @Ekopalypse
                    last edited by

                    @Ekopalypse Hi!
                    Yes, my plan is to have comment-only lines that could start with //, so the “relatively easy” option is what I’m looking for. I guess the line can be as long as I want it to be, right?

                    • EkopalypseE
                      Ekopalypse @Miguel Lescano
                      last edited by

                      @Miguel-Lescano said in Looking for a freelancer to develop a plugin: Misspelled Word Counter:

                      I guess the line can be as long as I want it to be, right?

                      Theoretically yes, but there is a known problem with Scintilla and the handling of “really” long lines. I assume your comments
                      won’t be longer than 1000 characters, right?

                      OK, I’ll give it a try.
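
A rough idea of how the “relatively easy” case could work, assuming // is the comment sign. This is plain Python rather than the script itself, and strip_comments is a hypothetical helper name, not something that exists in the code above.

```python
def strip_comments(text, marker='//'):
    """Drop comment-only lines and cut trailing comments from the rest."""
    kept = []
    for line in text.splitlines():
        if line.lstrip().startswith(marker):
            continue                      # the whole line is a comment
        # keep only the part before the first marker, if there is one
        kept.append(line.split(marker, 1)[0].rstrip())
    return '\n'.join(kept)


sample = "// Comment\nText  // Comment\nMore text"
cleaned = strip_comments(sample)
print(cleaned)
```

The words would then be counted on the cleaned text only. One caveat of this simple approach: anything else containing //, such as a URL, would be cut at that point as well.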

                      • Miguel LescanoM
                        Miguel Lescano @Ekopalypse
                        last edited by

                        @Ekopalypse Yes, my comments would not be too long. Just explanations and descriptions.

                        • EkopalypseE
                          Ekopalypse @Miguel Lescano
                          last edited by

                          @Miguel-Lescano

                          Sorry for the delay, I got distracted by some other cool projects.
                          I opened a GitHub page to release some of my scripts, and I suggest
                          we use the issue tracker on GitHub to avoid cluttering the forum here.

                          • Miguel LescanoM
                            Miguel Lescano @Ekopalypse
                            last edited by

                            @Ekopalypse Thanks! I’ll check it out.
