Community
    • Login

    How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    16 Posts 4 Posters 1.4k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • asadMarmashA
      asadMarmash @PeterJones
      last edited by

      @PeterJones The brief list could be a bit more visually appealing, so thanks for the suggestion.
      I doubt that the feature would be implemented since Notepad++ is primarily a code editor and not a “text” editor.
      Do you know any text editor that has this feature? I though I would experiment with Vim or Emacs, but the learning curve is quite steep.

      1 Reply Last reply Reply Quote 0
      • Alan KilbornA
        Alan Kilborn
        last edited by

        Perhaps something could be “scripted” for this desire?

        I don’t like Notepad++'s auto-completion feature myself, so I don’t use it. And thus I’m not versed in how it could be scripted, nor would I have interest in writing/using such a script.

        But it seems like some others have recently produced scripts that manipulate auto-completion, so maybe they would be interested in this opportunity to show off some script-brilliance?

        asadMarmashA 1 Reply Last reply Reply Quote 2
        • asadMarmashA
          asadMarmash @Alan Kilborn
          last edited by

          @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using python. I may look into creating one in the near future.

          PeterJonesP 1 Reply Last reply Reply Quote 0
          • PeterJonesP
            PeterJones @asadMarmash
            last edited by PeterJones

            @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

            @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using python. I may look into creating one in the near future.

            Indeed.

            For some examples that will give you ideas:

            • this discussion led to @Mark-Olson creating a script for PythonScript that uses the dictionary from the DSpellCheck plugin for populating the auto-complete (it thus shows how to do a custom auto-complete, which will be helpful to you)
            • you can search the forum for “frequency” or maybe better “pythonscript frequency”, and may find scripts that help with finding the frequency of words in your document.
            Mark OlsonM 1 Reply Last reply Reply Quote 3
            • Mark OlsonM
              Mark Olson @PeterJones
              last edited by Mark Olson

              @PeterJones
              Here’s a simple way to sort autocompletions by frequency.

              Fair warning: this script engages in some heavy computation every time a char is added, and the amount of computation and memory consumption scales with the size of the file. Since this is Python and not C++, you shouldn’t be surprised if it’s quite a bit slower than normal autocompletion, and it’s probably unusable for files over a kilobytes.

              There is definitely room for efficiency gains. I wrote this using a very simple, obvious algorithm because I didn’t want to fuss. I can try to implement this more efficiently within my dictionary autocompletion plugin, but it’s a nontrivial task.

              a0829b5d-319e-4cbe-8762-9f501be1f6f8-image.png

              from Npp import *
              
              # BEGIN SETTINGS
              AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
              CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
              # END_SETTINGS
              
              def on_match(m, ctr, startswith):
                  word = m.group(0)
                  if not word.startswith(startswith):
                      return
                  ctr.setdefault(word, 0)
                  ctr[word] += 1
              
              def getWordRangeUnderCaret():
                  pos            = editor.getCurrentPos()
                  word_start_pos = editor.wordStartPosition(pos, True)
                  word_end_pos   = editor.wordEndPosition(pos, True)
                  return word_start_pos, word_end_pos
              
              def onCharInsert(notif):
                  word_start_pos, word_end_pos = getWordRangeUnderCaret()
                  word_length    = word_end_pos - word_start_pos
                  word = editor.getRangePointer(word_start_pos, word_length).strip()
                  if word_length < AUTOCOMPLETION_MIN_LEN:
                      return
                  ctr = {}
                  editor.research('(%s+)' % CHARS_TO_MATCH,
                      lambda m: on_match(m, ctr, word))
                  autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                  autocomp_str = ' '.join(autocomp)
                  editor.autoCShow(word_length, autocomp_str)
              
              
              if __name__ == '__main__':
                  try:
                      CALLBACK_ADDED
                  except NameError:
                      CALLBACK_ADDED = 1
                      editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
              

              Edits after initial post:

              1. Add setting to customize what are considered word chars
              2. Get rid of f-string syntax in onCharInsert function to make it (hopefully) compatible with the Python2 version of PythonScript.
              1 Reply Last reply Reply Quote 1
              • Mark OlsonM
                Mark Olson
                last edited by Mark Olson

                This new version now may ignore case for autocompletions depending on the lexer language setting (e.g., will ignore case for SQL but not Python)

                I would not particularly recommend this version because it may ignore case in some situations where I think it’s rather inappropriate (e.g., text documents) and because of how the algorithm works, the autocompletions it produces are always in all caps. The version in my previous post may be more intuitive and useful.

                from Npp import *
                
                # BEGIN SETTINGS
                AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                # END_SETTINGS
                
                def on_match(m, ctr, ignorecase):
                    '''increase the count of the current word by 1
                    if ignorecase, store only the uppercase version of each word'''
                    word = m.group(0)
                    if ignorecase:
                        word = word.upper()
                    ctr.setdefault(word, 0)
                    ctr[word] += 1
                
                def getWordRangeUnderCaret():
                    '''get the start and end of the word under the caret'''
                    pos            = editor.getCurrentPos()
                    word_start_pos = editor.wordStartPosition(pos, True)
                    word_end_pos   = editor.wordEndPosition(pos, True)
                    return word_start_pos, word_end_pos
                
                def onCharInsert(notif):
                    '''Find all words in the document prefixed by the word under the caret
                        and show those words for autocompletion sorted by their frequency.
                        Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                    May ignore the case of words based on the lexer language
                        (e.g., will ignore case in SQL but not in Python)'''
                    word_start_pos, word_end_pos = getWordRangeUnderCaret()
                    word_length    = word_end_pos - word_start_pos
                    word = editor.getRangePointer(word_start_pos, word_length).strip()
                    if word_length < AUTOCOMPLETION_MIN_LEN:
                        return
                    ctr = {}
                    # anything preceded by a non-word-char and starting with the current word
                    match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                    ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                    if ignorecase:
                        # match case-insenstively if that's the language default
                        match_pat = '(?i)' + match_pat
                    editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                    autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                    autocomp_str = ' '.join(autocomp)
                    editor.autoCShow(word_length, autocomp_str)
                
                
                if __name__ == '__main__':
                    try:
                        CALLBACK_ADDED
                    except NameError:
                        CALLBACK_ADDED = 1
                        editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                

                Edit: added a setting to choose whether to use lexer language to decide whether to ignore case

                1 Reply Last reply Reply Quote 1
                • Mark OlsonM
                  Mark Olson
                  last edited by Mark Olson

                  One thing I should note about my scripts that I don’t think anyone has mentioned: there is no way to intercept Scintilla’s default autocompletion list (which includes language keywords) between when it is generated and when it is shown to the user. Thus, my script overrides the default autocompletions and I have no (easy) way to avoid this.

                  The upshot of this would seem to be (if I read the relevant docs correctly) that this is more of a job for the Scintilla devs than the Notepad++ devs.

                  1 Reply Last reply Reply Quote 1
                  • Mark OlsonM
                    Mark Olson
                    last edited by

                    OK, (probably) final update to my little script.
                    Since programmers don’t want their default autocompletions to be overriden for programming languages, and I’m concerned about it being a bit too CPU-hungry for very large files, I’ve added two new settings so that it doesn’t work on very big files and it only autocompletes for files with extensions from a predetermined list.

                    I’ve tried this version out on a decently large (90 kb) JSON file I got from an API, and I was pretty pleased with the results.

                    from Npp import *
                    
                    # BEGIN SETTINGS
                    AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                    CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                    USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                    ENABLED_EXTENSIONS = {
                        '', # files with no extension yet
                        'csv',
                        'txt',
                        'md',
                        'xml',
                        'json',
                        'tsv',
                        'log',
                        'dump',
                        'yaml',
                        'yml',
                    } # only use for files with these extensions
                    MAX_FILE_SIZE = 200_000 # do not try autocompleting for files with more bytes than this
                    # END_SETTINGS
                    
                    def on_match(m, ctr, ignorecase):
                        '''increase the count of the current word by 1
                        if ignorecase, store only the uppercase version of each word'''
                        word = m.group(0)
                        if ignorecase:
                            word = word.upper()
                        ctr.setdefault(word, 0)
                        ctr[word] += 1
                    
                    def getWordRangeUnderCaret():
                        '''get the start and end of the word under the caret'''
                        pos            = editor.getCurrentPos()
                        word_start_pos = editor.wordStartPosition(pos, True)
                        word_end_pos   = editor.wordEndPosition(pos, True)
                        return word_start_pos, word_end_pos
                        
                    def getExtension(fname):
                        for ii in range(len(fname) - 1, -1, -1):
                            if fname[ii] == '.':
                                break
                        if ii == 0:
                            return ''
                        return fname[ii + 1:]
                    
                    def onCharInsert(notif):
                        '''Find all words in the document prefixed by the word under the caret
                            and show those words for autocompletion sorted by their frequency.
                            Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                        May ignore the case of words based on the lexer language
                            (e.g., will ignore case in SQL but not in Python)'''
                        if editor.getLength() > MAX_FILE_SIZE:
                            return
                        ext = getExtension(notepad.getCurrentFilename())
                        if ext not in ENABLED_EXTENSIONS:
                            return
                        word_start_pos, word_end_pos = getWordRangeUnderCaret()
                        word_length    = word_end_pos - word_start_pos
                        word = editor.getRangePointer(word_start_pos, word_length).strip()
                        if word_length < AUTOCOMPLETION_MIN_LEN:
                            return
                        ctr = {}
                        # anything preceded by a non-word-char and starting with the current word
                        match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                        ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                        if ignorecase:
                            # match case-insenstively if that's the language default
                            match_pat = '(?i)' + match_pat
                        editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                        autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                        autocomp_str = ' '.join(autocomp)
                        editor.autoCShow(word_length, autocomp_str)
                    
                    
                    if __name__ == '__main__':
                        try:
                            CALLBACK_ADDED
                        except NameError:
                            CALLBACK_ADDED = 1
                            editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                    
                    asadMarmashA 1 Reply Last reply Reply Quote 4
                    • asadMarmashA
                      asadMarmash @Mark Olson
                      last edited by

                      @Mark-Olson I honestly didn’t expect someone to write me a plugin, so thanks a lot dude!

                      But I have a question, how can I get it to suggest me the word I am currently writing at the top of the suggestions only if it is present in the text.
                      So for example if I have a file than contains this:

                      abc
                      abc
                      abcd
                      

                      Writing “abc” will only suggest “abcd”. While the native autocomplete returns both “abc” and “abcd”. I think it would be appropriate to return “abc” first regardless of the frequency of repetition, this is because it was found in the text, and not truely a “autocomplete”.
                      I tried coding this myself and failed :(

                      PeterJonesP Mark OlsonM 2 Replies Last reply Reply Quote 0
                      • PeterJonesP
                        PeterJones @asadMarmash
                        last edited by

                        @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                        and not truely a “autocomplete”.

                        Exactly. It’s not an autocomplete. If you want to exit out of the auto-complete interface and just accept the word as-is, hit the ESC key.

                        1 Reply Last reply Reply Quote 0
                        • Mark OlsonM
                          Mark Olson @asadMarmash
                          last edited by

                          @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                          But I have a question, how can I get it to suggest me the word I am currently writing at the top of the suggestions only if it is present in the text.
                          So for example if I have a file than contains this:

                          New setting.
                          With setting on and word present in text:
                          27d84825-b79c-4f61-8370-65f7de0166a1-image.png
                          With setting on and word absent in preceding text:
                          9a3f4c7b-68ef-40b7-9d6b-66a6d582cc57-image.png
                          So if the setting is on, the current word is absent from the autocompletion if the current word is its only occurrence, and it shows up first if the current word is present in the preceding text.

                          from Npp import *
                          
                          # BEGIN SETTINGS
                          AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                          CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                          USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                          DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                          ENABLED_EXTENSIONS = {
                              '', # files with no extension yet
                              'csv',
                              'txt',
                              'md',
                              'xml',
                              'json',
                              'tsv',
                              'log',
                              'dump',
                              'yaml',
                              'yml',
                          } # only use for files with these extensions
                          MAX_FILE_SIZE = 200_000 # do not try autocompleting for files with more bytes than this
                          CURRENT_WORD_ONLY_IF_IN_TEXT = True
                          # END_SETTINGS
                          
                          def on_match(m, ctr, ignorecase):
                              '''increase the count of the current word by 1
                              if ignorecase, store only the uppercase version of each word'''
                              word = m.group(0)
                              if ignorecase:
                                  word = word.upper()
                              ctr.setdefault(word, 0)
                              ctr[word] += 1
                          
                          def getWordRangeUnderCaret():
                              '''get the start and end of the word under the caret'''
                              pos            = editor.getCurrentPos()
                              word_start_pos = editor.wordStartPosition(pos, True)
                              word_end_pos   = editor.wordEndPosition(pos, True)
                              return word_start_pos, word_end_pos
                              
                          def getExtension(fname):
                              for ii in range(len(fname) - 1, -1, -1):
                                  if fname[ii] == '.':
                                      break
                              if ii == 0:
                                  return ''
                              return fname[ii + 1:]
                          
                          def onCharInsert(notif):
                              '''Find all words in the document prefixed by the word under the caret
                                  and show those words for autocompletion sorted by their frequency.
                                  Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                              May ignore the case of words based on the lexer language
                                  (e.g., will ignore case in SQL but not in Python)'''
                              if editor.getLength() > MAX_FILE_SIZE:
                                  return
                              ext = getExtension(notepad.getCurrentFilename())
                              if ext not in ENABLED_EXTENSIONS:
                                  return
                              word_start_pos, word_end_pos = getWordRangeUnderCaret()
                              word_length    = word_end_pos - word_start_pos
                              word = editor.getRangePointer(word_start_pos, word_length).strip()
                              if word_length < AUTOCOMPLETION_MIN_LEN:
                                  return
                              ctr = {}
                              # anything preceded by a non-word-char and starting with the current word
                              match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                              ignorecase = DEFAULT_IGNORECASE
                              if USE_LANGUAGE_IGNORECASE:
                                  ignorecase = editor.autoCGetIgnoreCase()
                              else:
                                  editor.autoCSetIgnoreCase(ignorecase)
                              if ignorecase:
                                  # match case-insenstively if that's the language default
                                  match_pat = '(?i)' + match_pat
                              editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                              word_was_in_text = False
                              if ignorecase:
                                  word_was_in_text = ctr[word.upper()] > 1
                              else:
                                  word_was_in_text = ctr[word] > 1
                              if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                  del ctr[word]
                              autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                              if CURRENT_WORD_ONLY_IF_IN_TEXT and word_was_in_text:
                                  if ignorecase:
                                      autocomp.insert(0, word.upper())
                                  else:
                                      autocomp.insert(0, word)
                              autocomp_str = ' '.join(autocomp)
                              editor.autoCShow(word_length, autocomp_str)
                          
                          
                          if __name__ == '__main__':
                              try:
                                  CALLBACK_ADDED
                              except NameError:
                                  CALLBACK_ADDED = 1
                                  editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                          
                          asadMarmashA 1 Reply Last reply Reply Quote 3
                          • asadMarmashA
                            asadMarmash @Mark Olson
                            last edited by asadMarmash

                            @Mark-Olson This is brilliant! Thank you Mark, I truly appreciate your effort <3

                            1 Reply Last reply Reply Quote 0
                            • Mark OlsonM
                              Mark Olson
                              last edited by

                              Disregard the version I posted in my previous post. It is bugged and does not work.

                              This version contains a proper working implementation of a feature in which the current word is boosted to the top if it previously occurred and is otherwise not shown.

                              from Npp import *
                              
                              # BEGIN SETTINGS
                              AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                              CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                              USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                              DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                              ENABLED_EXTENSIONS = {
                                  '', # files with no extension yet
                                  'csv',
                                  'txt',
                                  'md',
                                  'xml',
                                  'json',
                                  'tsv',
                                  'log',
                                  'dump',
                                  'yaml',
                                  'yml',
                              } # only use for files with these extensions
                              MAX_FILE_SIZE = 200_000 # do not try autocompleting for files with more bytes than this
                              CURRENT_WORD_ONLY_IF_IN_TEXT = True
                              # END_SETTINGS
                              
                              def on_match(m, ctr, ignorecase):
                                  '''increase the count of the current word by 1
                                  if ignorecase, store only the uppercase version of each word'''
                                  word = m.group(0)
                                  if ignorecase:
                                      word = word.upper()
                                  ctr.setdefault(word, 0)
                                  ctr[word] += 1
                              
                              def getWordRangeUnderCaret():
                                  '''get the start and end of the word under the caret'''
                                  pos            = editor.getCurrentPos()
                                  word_start_pos = editor.wordStartPosition(pos, True)
                                  word_end_pos   = editor.wordEndPosition(pos, True)
                                  return word_start_pos, word_end_pos
                                  
                              def getExtension(fname):
                                  for ii in range(len(fname) - 1, -1, -1):
                                      if fname[ii] == '.':
                                          break
                                  if ii == 0:
                                      return ''
                                  return fname[ii + 1:]
                              
                              def onCharInsert(notif):
                                  '''Find all words in the document prefixed by the word under the caret
                                      and show those words for autocompletion sorted by their frequency.
                                      Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                                  May ignore the case of words based on the lexer language
                                      (e.g., will ignore case in SQL but not in Python)'''
                                  if editor.getLength() > MAX_FILE_SIZE:
                                      return
                                  ext = getExtension(notepad.getCurrentFilename())
                                  if ext not in ENABLED_EXTENSIONS:
                                      return
                                  word_start_pos, word_end_pos = getWordRangeUnderCaret()
                                  word_length    = word_end_pos - word_start_pos
                                  word = editor.getRangePointer(word_start_pos, word_length).strip()
                                  if word_length < AUTOCOMPLETION_MIN_LEN:
                                      return
                                  ctr = {}
                                  # anything preceded by a non-word-char and starting with the current word
                                  match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                                  ignorecase = DEFAULT_IGNORECASE
                                  if USE_LANGUAGE_IGNORECASE:
                                      ignorecase = editor.autoCGetIgnoreCase()
                                  else:
                                      editor.autoCSetIgnoreCase(ignorecase)
                                  if ignorecase:
                                      # match case-insenstively if that's the language default
                                      match_pat = '(?i)' + match_pat
                                  editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                                  if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                      if ignorecase:
                                          upword = word.upper()
                                          if upword in ctr:
                                              if ctr[upword] > 1: # word earlier in text, move to front
                                                  ctr[upword] = 10_000_000_000
                                              else:
                                                  del ctr[upword] # word not in text, remove
                                      elif word in ctr:
                                          if ctr[word] > 1:
                                              ctr[word] = 10_000_000_000
                                          else:
                                              del ctr[word]
                                  autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                                  autocomp_str = ' '.join(autocomp)
                                  editor.autoCShow(word_length, autocomp_str)
                              
                              
                              if __name__ == '__main__':
                                  try:
                                      CALLBACK_ADDED
                                  except NameError:
                                      CALLBACK_ADDED = 1
                                      editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                              
                              1 Reply Last reply Reply Quote 0
                              • First post
                                Last post
                              The Community of users of the Notepad++ text editor.
                              Powered by NodeBB | Contributors