    How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?

    16 Posts 4 Posters 1.4k Views
    • PeterJones @asadMarmash

      @asadMarmash ,

      There aren’t any plugins (that I know of) that re-order the auto-completion list based on word frequency, sorry.

      However, v8.5 (and newer) added Settings > Preferences > Auto-Completion > ☑ Make auto-completion list brief, which will at least trim down the list based on how many characters you’ve typed. So just having typed ab will list all words starting with ab, whereas typing an e after the auto-completion list has popped up will limit the list to words starting with abe. So it might help, even if it doesn’t add frequency-sorting.
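As a rough illustration of the narrowing described above, here is a small sketch of the idea only. It is not Notepad++'s actual implementation; `narrow_completions` and the sample word list are made up for this example.

```python
def narrow_completions(words, typed):
    """Return the candidate words that start with what has been typed so far."""
    return sorted(w for w in set(words) if w.startswith(typed))


words = ["abacus", "abed", "abet", "able", "about"]
print(narrow_completions(words, "ab"))   # every word starting with "ab"
print(narrow_completions(words, "abe"))  # only the words starting with "abe"
```

Each extra character shortens the list, but the order stays alphabetical, which is why this setting does not replace frequency sorting.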

      If you’d like to request that the developers add an option to sort the word-list by frequency, you could follow the FAQ about feature requests to put in the request – though my guess is that the developer won’t be interested in that (though I sometimes guess wrong which features will interest him).

      • asadMarmash @PeterJones

        @PeterJones The brief list could be a bit more visually appealing, so thanks for the suggestion.
        I doubt that the feature would be implemented since Notepad++ is primarily a code editor and not a “text” editor.
        Do you know any text editor that has this feature? I thought I would experiment with Vim or Emacs, but the learning curve is quite steep.

        • Alan Kilborn

          Perhaps something could be “scripted” for this desire?

          I don’t like Notepad++'s auto-completion feature myself, so I don’t use it. And thus I’m not versed in how it could be scripted, nor would I have interest in writing/using such a script.

          But it seems like some others have recently produced scripts that manipulate auto-completion, so maybe they would be interested in this opportunity to show off some script-brilliance?

          • asadMarmash @Alan Kilborn

            @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using python. I may look into creating one in the near future.

            • PeterJones @asadMarmash

              @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

              @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using python. I may look into creating one in the near future.

              Indeed.

              For some examples that will give you ideas:

              • this discussion led to @Mark-Olson creating a script for PythonScript that uses the dictionary from the DSpellCheck plugin for populating the auto-complete (it thus shows how to do a custom auto-complete, which will be helpful to you)
              • you can search the forum for “frequency” or maybe better “pythonscript frequency”, and may find scripts that help with finding the frequency of words in your document.
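To give an idea of what the frequency part involves, here is a minimal sketch in plain Python, assuming the document text is already available as a string (inside PythonScript it would presumably come from something like `editor.getText()`). The function name `word_frequencies` and the word pattern are assumptions for this example, not taken from any of the forum scripts.

```python
import re
from collections import Counter


def word_frequencies(text, word_pattern=r"[\w-]+"):
    """Count how often each word (as defined by word_pattern) occurs in text."""
    return Counter(re.findall(word_pattern, text))


sample = "the cat and the hat and the bat"
freqs = word_frequencies(sample)
# most_common() lists (word, count) pairs from most to least frequent
print(freqs.most_common(2))
```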
              • Mark Olson @PeterJones

                @PeterJones
                Here’s a simple way to sort autocompletions by frequency.

                Fair warning: this script does some heavy computation every time a character is added, and both the computation and the memory consumption scale with the size of the file. Since this is Python and not C++, you shouldn’t be surprised if it’s quite a bit slower than normal autocompletion, and it’s probably unusable for files over a few kilobytes.

                There is definitely room for efficiency gains. I wrote this using a very simple, obvious algorithm because I didn’t want to fuss. I can try to implement this more efficiently within my dictionary autocompletion plugin, but it’s a nontrivial task.
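One possible direction for such efficiency gains, sketched in plain Python outside of PythonScript: count all words once, keep them in a sorted list, and answer each prefix query with a binary search instead of rescanning the whole document on every keystroke. This is only an assumption about how it could be done, not the approach used in the script; `FrequencyIndex` is a hypothetical helper, and a real version would also have to update the index as the document changes.

```python
import bisect
import re
from collections import Counter


class FrequencyIndex(object):
    """Hypothetical helper: count words once, then answer prefix queries."""

    def __init__(self, text, word_pattern=r"[\w-]+"):
        self.counts = Counter(re.findall(word_pattern, text))
        self.sorted_words = sorted(self.counts)

    def completions(self, prefix):
        """Return the words starting with prefix, most frequent first."""
        # All words sharing a prefix sit next to each other in the sorted list,
        # so find the first candidate with a binary search and walk forward.
        start = bisect.bisect_left(self.sorted_words, prefix)
        matches = []
        for w in self.sorted_words[start:]:
            if not w.startswith(prefix):
                break
            matches.append(w)
        # sorted() is stable, so words with equal counts stay alphabetical
        return sorted(matches, key=lambda w: -self.counts[w])


index = FrequencyIndex("apple apply apple ape banana apply apple")
print(index.completions("app"))
```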


                from Npp import *
                
                # BEGIN SETTINGS
                AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                # END_SETTINGS
                
                def on_match(m, ctr, startswith):
                    word = m.group(0)
                    if not word.startswith(startswith):
                        return
                    ctr.setdefault(word, 0)
                    ctr[word] += 1
                
                def getWordRangeUnderCaret():
                    pos            = editor.getCurrentPos()
                    word_start_pos = editor.wordStartPosition(pos, True)
                    word_end_pos   = editor.wordEndPosition(pos, True)
                    return word_start_pos, word_end_pos
                
                def onCharInsert(notif):
                    word_start_pos, word_end_pos = getWordRangeUnderCaret()
                    word_length    = word_end_pos - word_start_pos
                    word = editor.getRangePointer(word_start_pos, word_length).strip()
                    if word_length < AUTOCOMPLETION_MIN_LEN:
                        return
                    ctr = {}
                    editor.research('(%s+)' % CHARS_TO_MATCH,
                        lambda m: on_match(m, ctr, word))
                    autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                    autocomp_str = ' '.join(autocomp)
                    editor.autoCShow(word_length, autocomp_str)
                
                
                if __name__ == '__main__':
                    try:
                        CALLBACK_ADDED
                    except NameError:
                        CALLBACK_ADDED = 1
                        editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                

                Edits after initial post:

                1. Add setting to customize what are considered word chars
                2. Get rid of f-string syntax in onCharInsert function to make it (hopefully) compatible with the Python2 version of PythonScript.
                • Mark Olson

                  This new version may now ignore case for autocompletions, depending on the lexer language setting (e.g., it will ignore case for SQL but not for Python).

                  I would not particularly recommend this version because it may ignore case in some situations where I think it’s rather inappropriate (e.g., text documents) and because of how the algorithm works, the autocompletions it produces are always in all caps. The version in my previous post may be more intuitive and useful.
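To make the all-caps effect concrete, here is a small stand-alone sketch. `count_upper` mirrors the upper-casing that `on_match` does in this post's script; `count_keep_spelling` is an alternative that is not part of the script, shown only as one assumed way to keep a natural spelling while still counting case-insensitively.

```python
from collections import Counter


def count_upper(words):
    """Fold every word to upper case before counting, so only caps survive."""
    return Counter(w.upper() for w in words)


def count_keep_spelling(words):
    """Count case-insensitively, but report each word using the spelling
    that occurs most often in the text."""
    totals = Counter()
    spellings = {}
    for w in words:
        key = w.upper()
        totals[key] += 1
        spellings.setdefault(key, Counter())[w] += 1
    return {spellings[key].most_common(1)[0][0]: n for key, n in totals.items()}


words = ["select", "SELECT", "select", "From", "from", "from"]
print(count_upper(words))
print(count_keep_spelling(words))
```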

                  from Npp import *
                  
                  # BEGIN SETTINGS
                  AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                  CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                  USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                  # END_SETTINGS
                  
                  def on_match(m, ctr, ignorecase):
                      '''increase the count of the current word by 1
                      if ignorecase, store only the uppercase version of each word'''
                      word = m.group(0)
                      if ignorecase:
                          word = word.upper()
                      ctr.setdefault(word, 0)
                      ctr[word] += 1
                  
                  def getWordRangeUnderCaret():
                      '''get the start and end of the word under the caret'''
                      pos            = editor.getCurrentPos()
                      word_start_pos = editor.wordStartPosition(pos, True)
                      word_end_pos   = editor.wordEndPosition(pos, True)
                      return word_start_pos, word_end_pos
                  
                  def onCharInsert(notif):
                      '''Find all words in the document prefixed by the word under the caret
                          and show those words for autocompletion sorted by their frequency.
                          Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                      May ignore the case of words based on the lexer language
                          (e.g., will ignore case in SQL but not in Python)'''
                      word_start_pos, word_end_pos = getWordRangeUnderCaret()
                      word_length    = word_end_pos - word_start_pos
                      word = editor.getRangePointer(word_start_pos, word_length).strip()
                      if word_length < AUTOCOMPLETION_MIN_LEN:
                          return
                      ctr = {}
                      # anything preceded by a non-word-char and starting with the current word
                      match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                      ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                      if ignorecase:
                          # match case-insensitively if that's the language default
                          match_pat = '(?i)' + match_pat
                      editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                      autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                      autocomp_str = ' '.join(autocomp)
                      editor.autoCShow(word_length, autocomp_str)
                  
                  
                  if __name__ == '__main__':
                      try:
                          CALLBACK_ADDED
                      except NameError:
                          CALLBACK_ADDED = 1
                          editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                  

                  Edit: added a setting to choose whether to use lexer language to decide whether to ignore case

                  • Mark Olson

                    One thing I should note about my scripts that I don’t think anyone has mentioned: there is no way to intercept Scintilla’s default autocompletion list (which includes language keywords) between when it is generated and when it is shown to the user. Thus, my script overrides the default autocompletions and I have no (easy) way to avoid this.

                    The upshot of this would seem to be (if I read the relevant docs correctly) that this is more of a job for the Scintilla devs than the Notepad++ devs.

                    • Mark Olson

                      OK, (probably) final update to my little script.
                      Since programmers don’t want their default autocompletions to be overridden for programming languages, and I’m concerned about the script being too CPU-hungry for very large files, I’ve added two new settings: one disables it for very big files, and the other restricts it to files with extensions from a predetermined list.

                      I’ve tried this version out on a decently large (90 kb) JSON file I got from an API, and I was pretty pleased with the results.
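The two guards can be pictured as a single yes/no check, sketched here in plain Python. `should_autocomplete` is a made-up name, the extension set is shortened for the example, and `os.path.splitext` is used here instead of the script's own `getExtension` helper.

```python
import os.path

ENABLED_EXTENSIONS = {"", "txt", "md", "json"}  # shortened list for the example
MAX_FILE_SIZE = 200000  # bytes


def should_autocomplete(filename, doc_length):
    """Return True if the document is small enough and has an enabled extension."""
    if doc_length > MAX_FILE_SIZE:
        return False
    ext = os.path.splitext(filename)[1][1:]  # 'notes.txt' -> 'txt', 'new 1' -> ''
    return ext in ENABLED_EXTENSIONS


print(should_autocomplete("notes.txt", 90000))
print(should_autocomplete("script.py", 90000))
print(should_autocomplete("huge.json", 5000000))
```

Checking the size first keeps the cheapest test in front, so large files are rejected before any string handling happens.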

                      from Npp import *
                      
                      # BEGIN SETTINGS
                      AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                      CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                      USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                      ENABLED_EXTENSIONS = {
                          '', # files with no extension yet
                          'csv',
                          'txt',
                          'md',
                          'xml',
                          'json',
                          'tsv',
                          'log',
                          'dump',
                          'yaml',
                          'yml',
                      } # only use for files with these extensions
                      MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                      # END_SETTINGS
                      
                      def on_match(m, ctr, ignorecase):
                          '''increase the count of the current word by 1
                          if ignorecase, store only the uppercase version of each word'''
                          word = m.group(0)
                          if ignorecase:
                              word = word.upper()
                          ctr.setdefault(word, 0)
                          ctr[word] += 1
                      
                      def getWordRangeUnderCaret():
                          '''get the start and end of the word under the caret'''
                          pos            = editor.getCurrentPos()
                          word_start_pos = editor.wordStartPosition(pos, True)
                          word_end_pos   = editor.wordEndPosition(pos, True)
                          return word_start_pos, word_end_pos
                          
                      def getExtension(fname):
                          '''get the extension of fname without the leading dot
                          ('' if there is no extension)'''
                          import os.path
                          return os.path.splitext(fname)[1][1:]
                      
                      def onCharInsert(notif):
                          '''Find all words in the document prefixed by the word under the caret
                              and show those words for autocompletion sorted by their frequency.
                              Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                          May ignore the case of words based on the lexer language
                              (e.g., will ignore case in SQL but not in Python)'''
                          if editor.getLength() > MAX_FILE_SIZE:
                              return
                          ext = getExtension(notepad.getCurrentFilename())
                          if ext not in ENABLED_EXTENSIONS:
                              return
                          word_start_pos, word_end_pos = getWordRangeUnderCaret()
                          word_length    = word_end_pos - word_start_pos
                          word = editor.getRangePointer(word_start_pos, word_length).strip()
                          if word_length < AUTOCOMPLETION_MIN_LEN:
                              return
                          ctr = {}
                          # anything preceded by a non-word-char and starting with the current word
                          match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                          ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                          if ignorecase:
                              # match case-insensitively if that's the language default
                              match_pat = '(?i)' + match_pat
                          editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                          autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                          autocomp_str = ' '.join(autocomp)
                          editor.autoCShow(word_length, autocomp_str)
                      
                      
                      if __name__ == '__main__':
                          try:
                              CALLBACK_ADDED
                          except NameError:
                              CALLBACK_ADDED = 1
                              editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                      
                      • asadMarmash @Mark Olson

                        @Mark-Olson I honestly didn’t expect someone to write me a plugin, so thanks a lot dude!

                        But I have a question: how can I get it to suggest the word I am currently writing at the top of the suggestions, but only if it is present in the text?
                        So for example, if I have a file that contains this:

                        abc
                        abc
                        abcd
                        

                        Writing “abc” will only suggest “abcd”, while the native autocomplete returns both “abc” and “abcd”. I think it would be appropriate to return “abc” first regardless of its frequency, because it was found in the text and is not truly an “autocomplete”.
                        I tried coding this myself and failed :(

                        • PeterJones @asadMarmash

                          @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                          and not truly an “autocomplete”.

                          Exactly. It’s not an autocomplete. If you want to exit out of the auto-complete interface and just accept the word as-is, hit the ESC key.

                          • Mark Olson @asadMarmash

                            @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                            But I have a question: how can I get it to suggest the word I am currently writing at the top of the suggestions, but only if it is present in the text?
                            So for example, if I have a file that contains this:

                            New setting (CURRENT_WORD_ONLY_IF_IN_TEXT).
                            With the setting on and the word present in the text:
                            [screenshot]
                            With the setting on and the word absent from the preceding text:
                            [screenshot]
                            So if the setting is on, the current word is left out of the autocompletion list when the word being typed is its only occurrence, and it shows up first when it is also present in the preceding text.

                            from Npp import *
                            
                            # BEGIN SETTINGS
                            AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                            CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                            USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                            DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                            ENABLED_EXTENSIONS = {
                                '', # files with no extension yet
                                'csv',
                                'txt',
                                'md',
                                'xml',
                                'json',
                                'tsv',
                                'log',
                                'dump',
                                'yaml',
                                'yml',
                            } # only use for files with these extensions
                            MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                            CURRENT_WORD_ONLY_IF_IN_TEXT = True
                            # END_SETTINGS
                            
                            def on_match(m, ctr, ignorecase):
                                '''increase the count of the current word by 1
                                if ignorecase, store only the uppercase version of each word'''
                                word = m.group(0)
                                if ignorecase:
                                    word = word.upper()
                                ctr.setdefault(word, 0)
                                ctr[word] += 1
                            
                            def getWordRangeUnderCaret():
                                '''get the start and end of the word under the caret'''
                                pos            = editor.getCurrentPos()
                                word_start_pos = editor.wordStartPosition(pos, True)
                                word_end_pos   = editor.wordEndPosition(pos, True)
                                return word_start_pos, word_end_pos
                                
                            def getExtension(fname):
                                for ii in range(len(fname) - 1, -1, -1):
                                    if fname[ii] == '.':
                                        break
                                if ii == 0:
                                    return ''
                                return fname[ii + 1:]
                            
                            def onCharInsert(notif):
                                '''Find all words in the document prefixed by the word under the caret
                                    and show those words for autocompletion sorted by their frequency.
                                    Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                                May ignore the case of words based on the lexer language
                                    (e.g., will ignore case in SQL but not in Python)'''
                                if editor.getLength() > MAX_FILE_SIZE:
                                    return
                                ext = getExtension(notepad.getCurrentFilename())
                                if ext not in ENABLED_EXTENSIONS:
                                    return
                                word_start_pos, word_end_pos = getWordRangeUnderCaret()
                                word_length    = word_end_pos - word_start_pos
                                word = editor.getRangePointer(word_start_pos, word_length).strip()
                                if word_length < AUTOCOMPLETION_MIN_LEN:
                                    return
                                ctr = {}
                                # anything preceded by a non-word-char and starting with the current word
                                match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                                ignorecase = DEFAULT_IGNORECASE
                                if USE_LANGUAGE_IGNORECASE:
                                    ignorecase = editor.autoCGetIgnoreCase()
                                else:
                                    editor.autoCSetIgnoreCase(ignorecase)
                                if ignorecase:
                                    # match case-insensitively if that's the language default
                                    match_pat = '(?i)' + match_pat
                                editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                                word_was_in_text = False
                                if ignorecase:
                                    word_was_in_text = ctr[word.upper()] > 1
                                else:
                                    word_was_in_text = ctr[word] > 1
                                if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                    del ctr[word]
                                autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                                if CURRENT_WORD_ONLY_IF_IN_TEXT and word_was_in_text:
                                    if ignorecase:
                                        autocomp.insert(0, word.upper())
                                    else:
                                        autocomp.insert(0, word)
                                autocomp_str = ' '.join(autocomp)
                                editor.autoCShow(word_length, autocomp_str)
                            
                            
                            if __name__ == '__main__':
                                try:
                                    CALLBACK_ADDED
                                except NameError:
                                    CALLBACK_ADDED = 1
                                    editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                            
                            • asadMarmash @Mark Olson

                              @Mark-Olson This is brilliant! Thank you Mark, I truly appreciate your effort <3

                              • Mark Olson

                                Disregard the version in my previous post: it is buggy and does not work.

                                This version contains a proper working implementation of a feature in which the current word is boosted to the top if it previously occurred and is otherwise not shown.
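The boost-or-drop rule can be tried out on its own with a plain dictionary of counts. The sketch below is an illustration under assumptions, not the script's code: `rank_completions` is a hypothetical name, and it places the current word explicitly at the front of the list, whereas the script gets the same effect by assigning the word a very large count before sorting.

```python
def rank_completions(counts, current_word):
    """Order words by descending frequency, treating the word being typed specially.

    counts includes the occurrence that is being typed right now, so a count
    of 1 means the word appears nowhere else in the text.
    """
    others = sorted((w for w in counts if w != current_word),
                    key=lambda w: counts[w], reverse=True)
    if counts.get(current_word, 0) > 1:
        return [current_word] + others  # seen elsewhere: put it first
    return others                       # only being typed now: leave it out


print(rank_completions({"abc": 3, "abcd": 1, "abcde": 5}, "abc"))
print(rank_completions({"abc": 1, "abcd": 1, "abcde": 5}, "abc"))
```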

                                from Npp import *
                                
                                # BEGIN SETTINGS
                                AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                                CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                                USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                                DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                                ENABLED_EXTENSIONS = {
                                    '', # files with no extension yet
                                    'csv',
                                    'txt',
                                    'md',
                                    'xml',
                                    'json',
                                    'tsv',
                                    'log',
                                    'dump',
                                    'yaml',
                                    'yml',
                                } # only use for files with these extensions
                                MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                                CURRENT_WORD_ONLY_IF_IN_TEXT = True
                                # END_SETTINGS
                                
                                def on_match(m, ctr, ignorecase):
                                    '''increase the count of the current word by 1
                                    if ignorecase, store only the uppercase version of each word'''
                                    word = m.group(0)
                                    if ignorecase:
                                        word = word.upper()
                                    ctr.setdefault(word, 0)
                                    ctr[word] += 1
                                
                                def getWordRangeUnderCaret():
                                    '''get the start and end of the word under the caret'''
                                    pos            = editor.getCurrentPos()
                                    word_start_pos = editor.wordStartPosition(pos, True)
                                    word_end_pos   = editor.wordEndPosition(pos, True)
                                    return word_start_pos, word_end_pos
                                    
                                def getExtension(fname):
                                    '''get the extension of fname without the leading dot
                                    ('' if there is no extension)'''
                                    import os.path
                                    return os.path.splitext(fname)[1][1:]
                                
                                def onCharInsert(notif):
                                    '''Find all words in the document prefixed by the word under the caret
                                        and show those words for autocompletion sorted by their frequency.
                                        Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                                    May ignore the case of words based on the lexer language
                                        (e.g., will ignore case in SQL but not in Python)'''
                                    if editor.getLength() > MAX_FILE_SIZE:
                                        return
                                    ext = getExtension(notepad.getCurrentFilename())
                                    if ext not in ENABLED_EXTENSIONS:
                                        return
                                    word_start_pos, word_end_pos = getWordRangeUnderCaret()
                                    word_length    = word_end_pos - word_start_pos
                                    word = editor.getRangePointer(word_start_pos, word_length).strip()
                                    if word_length < AUTOCOMPLETION_MIN_LEN:
                                        return
                                    ctr = {}
                                    # anything preceded by a non-word-char and starting with the current word
                                    match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                                    ignorecase = DEFAULT_IGNORECASE
                                    if USE_LANGUAGE_IGNORECASE:
                                        ignorecase = editor.autoCGetIgnoreCase()
                                    else:
                                        editor.autoCSetIgnoreCase(ignorecase)
                                    if ignorecase:
                                        # match case-insensitively if that's the language default
                                        match_pat = '(?i)' + match_pat
                                    editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                                    if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                        # the word being typed is normally counted too, so a count of 1
                                        # usually means it does not occur anywhere else in the text
                                        if ignorecase:
                                            upword = word.upper()
                                            if upword in ctr:
                                                if ctr[upword] > 1: # word elsewhere in text, move to front
                                                    ctr[upword] = 10000000000
                                                else:
                                                    del ctr[upword] # word not elsewhere in text, remove
                                        elif word in ctr:
                                            if ctr[word] > 1:
                                                ctr[word] = 10000000000
                                            else:
                                                del ctr[word]
                                    autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                                    autocomp_str = ' '.join(autocomp)
                                    editor.autoCShow(word_length, autocomp_str)
                                
                                
                                if __name__ == '__main__':
                                    try:
                                        CALLBACK_ADDED
                                    except NameError:
                                        CALLBACK_ADDED = 1
                                        editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                                