    How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?

    Help wanted · 16 Posts · 4 Posters · 1.4k Views
    • Alan Kilborn

      Perhaps something could be “scripted” for this desire?

      I don’t like Notepad++'s auto-completion feature myself, so I don’t use it. And thus I’m not versed in how it could be scripted, nor would I have interest in writing/using such a script.

      But it seems like some others have recently produced scripts that manipulate auto-completion, so maybe they would be interested in this opportunity to show off some script-brilliance?

      • asadMarmash @Alan Kilborn

        @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using Python. I may look into creating one in the near future.

        • PeterJones @asadMarmash
          last edited by PeterJones

          @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

          @Alan-Kilborn I have no experience with C++, but I found a way to create a “plugin” using Python. I may look into creating one in the near future.

          Indeed.

          For some examples that will give you ideas:

          • this discussion led to @Mark-Olson creating a script for PythonScript that uses the dictionary from the DSpellCheck plugin for populating the auto-complete (it thus shows how to do a custom auto-complete, which will be helpful to you)
          • you can search the forum for “frequency” or maybe better “pythonscript frequency”, and may find scripts that help with finding the frequency of words in your document.
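For a sense of what such a frequency count involves, here is a minimal plain-Python sketch. It runs outside Notepad++ and does not use PythonScript's editor object. The word pattern is equivalent to the [\w_-] class used by the scripts later in this thread, and the function name word_frequencies is made up for illustration.

```python
import re
from collections import Counter

# Equivalent to the scripts' CHARS_TO_MATCH = r'[\w_-]' (\w already includes '_'):
# letters, digits, underscores and dashes.
WORD_RE = re.compile(r'[\w-]+')


def word_frequencies(text):
    '''Return a Counter mapping each word in text to its number of occurrences.'''
    return Counter(WORD_RE.findall(text))


if __name__ == '__main__':
    counts = word_frequencies('abc abc abcd foo-bar abc')
    # most_common() lists words from most to least frequent
    for word, n in counts.most_common():
        print(word, n)
```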
          • Mark Olson @PeterJones
            last edited by Mark Olson

            @PeterJones
            Here’s a simple way to sort autocompletions by frequency.

            Fair warning: this script does some heavy computation every time a character is added, and both the computation and the memory consumption scale with the size of the file. Since this is Python and not C++, don’t be surprised if it’s quite a bit slower than normal autocompletion; it’s probably unusable for large files.

            There is definitely room for efficiency gains. I wrote this using a very simple, obvious algorithm because I didn’t want to fuss. I can try to implement this more efficiently within my dictionary autocompletion plugin, but it’s a nontrivial task.
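As a rough illustration of where such efficiency gains could come from (a hypothetical sketch, not Mark's code): count every word once, keep the distinct words in alphabetical order, and answer each prefix query with a binary search instead of rescanning the whole document on every keystroke. The class name PrefixIndex is an assumption, and the sketch is plain Python that runs outside Notepad++.

```python
import bisect
import re
from collections import Counter

WORD_RE = re.compile(r'[\w-]+')  # letters, digits, underscores, dashes


class PrefixIndex(object):
    '''Hypothetical helper: count every word once, then answer prefix
    queries without rescanning the document.'''

    def __init__(self, text):
        self.counts = Counter(WORD_RE.findall(text))
        # distinct words in alphabetical order, so words sharing a prefix
        # sit next to each other and can be located with bisect
        self.sorted_words = sorted(self.counts)

    def completions(self, prefix, min_len=2):
        '''Words starting with prefix, most frequent first.'''
        if len(prefix) < min_len:
            return []
        lo = bisect.bisect_left(self.sorted_words, prefix)
        hi = lo
        while hi < len(self.sorted_words) and self.sorted_words[hi].startswith(prefix):
            hi += 1
        matches = self.sorted_words[lo:hi]
        # sorted() is stable even with reverse=True, so ties stay alphabetical
        return sorted(matches, key=lambda w: self.counts[w], reverse=True)


if __name__ == '__main__':
    index = PrefixIndex('abc abc abcd abx ab')  # built once
    print(index.completions('ab'))              # queried per keystroke
    print(index.completions('abc'))
```

The catch is that the index has to be rebuilt, or updated incrementally, whenever the document changes. This sketch does not handle that, so it only pays off when several keystrokes can reuse one index.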


            from Npp import *
            
            # BEGIN SETTINGS
            AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
            CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
            # END_SETTINGS
            
            def on_match(m, ctr, startswith):
                word = m.group(0)
                if not word.startswith(startswith):
                    return
                ctr.setdefault(word, 0)
                ctr[word] += 1
            
            def getWordRangeUnderCaret():
                pos            = editor.getCurrentPos()
                word_start_pos = editor.wordStartPosition(pos, True)
                word_end_pos   = editor.wordEndPosition(pos, True)
                return word_start_pos, word_end_pos
            
            def onCharInsert(notif):
                word_start_pos, word_end_pos = getWordRangeUnderCaret()
                word_length    = word_end_pos - word_start_pos
                word = editor.getRangePointer(word_start_pos, word_length).strip()
                if word_length < AUTOCOMPLETION_MIN_LEN:
                    return
                ctr = {}
                editor.research('(%s+)' % CHARS_TO_MATCH,
                    lambda m: on_match(m, ctr, word))
                autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                autocomp_str = ' '.join(autocomp)
                editor.autoCShow(word_length, autocomp_str)
            
            
            if __name__ == '__main__':
                try:
                    CALLBACK_ADDED
                except NameError:
                    CALLBACK_ADDED = 1
                    editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
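To experiment with the sorting logic outside Notepad++, the core of onCharInsert above can be imitated in plain Python, with re.finditer standing in for editor.research. This is a sketch under that substitution; frequency_sorted_completions is a hypothetical name.

```python
import re

CHARS_TO_MATCH = r'[\w_-]'  # same setting as in the script above


def frequency_sorted_completions(text, prefix):
    '''Count every word in text that starts with prefix and return the words
    sorted by count, most frequent first (mirrors on_match + sorted() above).'''
    ctr = {}
    # re.finditer plays the role of editor.research(pattern, callback)
    for m in re.finditer('(%s+)' % CHARS_TO_MATCH, text):
        word = m.group(0)
        if word.startswith(prefix):
            ctr[word] = ctr.get(word, 0) + 1
    return sorted(ctr, key=lambda w: ctr[w], reverse=True)


if __name__ == '__main__':
    print(frequency_sorted_completions('abc abc abcd abx abc abcd', 'ab'))
```

Words with equal counts come out in the dictionary's iteration order: first-seen order on Python 3.7+, but arbitrary on Python 2. Using key=lambda w: (-ctr[w], w) without reverse=True would break ties alphabetically on both.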
            

            Edits after initial post:

            1. Added a setting to customize which characters count as word characters.
            2. Replaced the f-string syntax in the onCharInsert function with %-formatting to make it (hopefully) compatible with the Python 2 version of PythonScript.
            • Mark Olson
              last edited by Mark Olson

              This new version may ignore case for autocompletions, depending on the lexer language setting (e.g., it will ignore case for SQL but not for Python).

              I would not particularly recommend this version: it may ignore case in situations where I think that’s rather inappropriate (e.g., text documents), and, because of how the algorithm works, the autocompletions it produces when ignoring case are always in all caps. The version in my previous post may be more intuitive and useful.

              from Npp import *
              
              # BEGIN SETTINGS
              AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
              CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
              USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
              # END_SETTINGS
              
              def on_match(m, ctr, ignorecase):
                  '''increase the count of the current word by 1
                  if ignorecase, store only the uppercase version of each word'''
                  word = m.group(0)
                  if ignorecase:
                      word = word.upper()
                  ctr.setdefault(word, 0)
                  ctr[word] += 1
              
              def getWordRangeUnderCaret():
                  '''get the start and end of the word under the caret'''
                  pos            = editor.getCurrentPos()
                  word_start_pos = editor.wordStartPosition(pos, True)
                  word_end_pos   = editor.wordEndPosition(pos, True)
                  return word_start_pos, word_end_pos
              
              def onCharInsert(notif):
                  '''Find all words in the document prefixed by the word under the caret
                      and show those words for autocompletion sorted by their frequency.
                      Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                  May ignore the case of words based on the lexer language
                      (e.g., will ignore case in SQL but not in Python)'''
                  word_start_pos, word_end_pos = getWordRangeUnderCaret()
                  word_length    = word_end_pos - word_start_pos
                  word = editor.getRangePointer(word_start_pos, word_length).strip()
                  if word_length < AUTOCOMPLETION_MIN_LEN:
                      return
                  ctr = {}
                  # anything preceded by a non-word-char and starting with the current word
                  match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                  ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                  if ignorecase:
                      # match case-insensitively if that's the language default
                      match_pat = '(?i)' + match_pat
                  editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                  autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                  autocomp_str = ' '.join(autocomp)
                  editor.autoCShow(word_length, autocomp_str)
              
              
              if __name__ == '__main__':
                  try:
                      CALLBACK_ADDED
                  except NameError:
                      CALLBACK_ADDED = 1
                      editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
              

              Edit: added a setting to choose whether to use the lexer language to decide whether to ignore case.
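One way to avoid the all-caps output described above, sketched in plain Python under the assumption that showing each word's most common spelling is acceptable: still match and count case-insensitively, but remember how each word was actually written. The pattern has the same shape as match_pat in the script (a lookbehind for a non-word character, the typed prefix, then more word characters). Python's re module stands in for editor.research, re.escape is an addition that guards against regex metacharacters in the prefix, and completions_ignoring_case is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

CHARS_TO_MATCH = r'[\w_-]'


def completions_ignoring_case(text, prefix):
    '''Hypothetical variant: match and count case-insensitively, but return
    each word in its most common spelling instead of in all caps.'''
    # (?i) must come first; the rest mirrors match_pat in the script
    pat = '(?i)(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, re.escape(prefix))
    spellings = defaultdict(Counter)  # upper-cased word -> Counter of spellings
    for m in re.finditer(pat, text):
        word = m.group(0)
        spellings[word.upper()][word] += 1
    # total occurrences per case-insensitive word
    totals = {key: sum(c.values()) for key, c in spellings.items()}
    ranked = sorted(totals, key=lambda k: totals[k], reverse=True)
    # pick the most frequent spelling of each word for display
    return [spellings[k].most_common(1)[0][0] for k in ranked]


if __name__ == '__main__':
    text = 'SELECT select Select selection FROM x select preselect'
    print(completions_ignoring_case(text, 'sel'))
```

The lookbehind keeps "preselect" out of the results, and the trailing + means an exact match of the typed prefix is never offered, just as in the script.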

              • Mark Olson
                last edited by Mark Olson

                One thing I should note about my scripts that I don’t think anyone has mentioned: there is no way to intercept Scintilla’s default autocompletion list (which includes language keywords) between when it is generated and when it is shown to the user. Thus, my script overrides the default autocompletions and I have no (easy) way to avoid this.

                The upshot of this would seem to be (if I read the relevant docs correctly) that this is more of a job for the Scintilla devs than the Notepad++ devs.
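Short of a change in Scintilla, one workaround is to supply the keywords yourself and append them to the frequency-sorted list before calling autoCShow. This is a hypothetical plain-Python sketch: merge_with_keywords and the keyword list are assumptions, not something Notepad++ or PythonScript provides.

```python
def merge_with_keywords(frequency_words, keywords, prefix):
    '''Return the frequency-sorted words followed by any keywords that start
    with prefix and are not already in the list (keywords sorted alphabetically).'''
    seen = set(frequency_words)
    extra = sorted(k for k in keywords if k.startswith(prefix) and k not in seen)
    return list(frequency_words) + extra


if __name__ == '__main__':
    # words found in the document, most frequent first
    found = ['selection', 'select']
    # hypothetical keyword list for the current language
    keywords = ['select', 'self', 'set']
    # the scripts pass a space-separated string to editor.autoCShow
    print(' '.join(merge_with_keywords(found, keywords, 'sel')))
```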

                • Mark Olson

                  OK, (probably) final update to my little script.
                  Since programmers don’t want their default autocompletions overridden in programming languages, and I’m concerned about the script being too CPU-hungry for very large files, I’ve added two new settings: the script now skips files above a maximum size, and it only autocompletes in files whose extensions are on a predetermined list.

                  I’ve tried this version out on a decently large (90 KB) JSON file I got from an API, and I was pretty pleased with the results.

                  from Npp import *
                  
                  # BEGIN SETTINGS
                  AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                  CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                  USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                  ENABLED_EXTENSIONS = {
                      '', # files with no extension yet
                      'csv',
                      'txt',
                      'md',
                      'xml',
                      'json',
                      'tsv',
                      'log',
                      'dump',
                      'yaml',
                      'yml',
                  } # only use for files with these extensions
                  MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                  # END_SETTINGS
                  
                  def on_match(m, ctr, ignorecase):
                      '''increase the count of the current word by 1
                      if ignorecase, store only the uppercase version of each word'''
                      word = m.group(0)
                      if ignorecase:
                          word = word.upper()
                      ctr.setdefault(word, 0)
                      ctr[word] += 1
                  
                  def getWordRangeUnderCaret():
                      '''get the start and end of the word under the caret'''
                      pos            = editor.getCurrentPos()
                      word_start_pos = editor.wordStartPosition(pos, True)
                      word_end_pos   = editor.wordEndPosition(pos, True)
                      return word_start_pos, word_end_pos
                      
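                  def getExtension(fname):
                      '''return the file's extension, lowercased and without the dot ('' if it has none)'''
                      import os.path  # splitext only looks at the last path component, so dotted folder names are ignored
                      return os.path.splitext(fname)[1][1:].lower()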
                  
                  def onCharInsert(notif):
                      '''Find all words in the document prefixed by the word under the caret
                          and show those words for autocompletion sorted by their frequency.
                          Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                      May ignore the case of words based on the lexer language
                          (e.g., will ignore case in SQL but not in Python)'''
                      if editor.getLength() > MAX_FILE_SIZE:
                          return
                      ext = getExtension(notepad.getCurrentFilename())
                      if ext not in ENABLED_EXTENSIONS:
                          return
                      word_start_pos, word_end_pos = getWordRangeUnderCaret()
                      word_length    = word_end_pos - word_start_pos
                      word = editor.getRangePointer(word_start_pos, word_length).strip()
                      if word_length < AUTOCOMPLETION_MIN_LEN:
                          return
                      ctr = {}
                      # anything preceded by a non-word-char and starting with the current word
                      match_pat = '(?<!{0}){1}({0}+)'.format(CHARS_TO_MATCH, word)
                      ignorecase = USE_LANGUAGE_IGNORECASE and editor.autoCGetIgnoreCase()
                      if ignorecase:
                          # match case-insensitively if that's the language default
                          match_pat = '(?i)' + match_pat
                      editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                      autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                      autocomp_str = ' '.join(autocomp)
                      editor.autoCShow(word_length, autocomp_str)
                  
                  
                  if __name__ == '__main__':
                      try:
                          CALLBACK_ADDED
                      except NameError:
                          CALLBACK_ADDED = 1
                          editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
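The two new checks at the top of onCharInsert amount to a small predicate that can be tested without Notepad++. In this sketch should_autocomplete is a hypothetical name, doc_length stands in for editor.getLength() and filename for notepad.getCurrentFilename(). It uses os.path.splitext, which only looks at the last path component, and it lowercases the extension (an assumption about the desired behavior, so that REPORT.CSV also qualifies).

```python
import os.path

# Same values as the settings in the script above
ENABLED_EXTENSIONS = {'', 'csv', 'txt', 'md', 'xml', 'json', 'tsv',
                      'log', 'dump', 'yaml', 'yml'}
MAX_FILE_SIZE = 200000  # bytes


def should_autocomplete(filename, doc_length):
    '''True if a document of doc_length bytes named filename passes both checks.'''
    if doc_length > MAX_FILE_SIZE:
        return False
    # splitext('notes.txt') -> ('notes', '.txt'); drop the leading dot
    ext = os.path.splitext(filename)[1][1:].lower()
    return ext in ENABLED_EXTENSIONS


if __name__ == '__main__':
    for name, size in [('notes.txt', 1000), ('new 1', 10), ('script.py', 10)]:
        print(name, should_autocomplete(name, size))
```

An unsaved Notepad++ tab such as "new 1" has no extension, so it falls under the '' entry.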
                  
                  • asadMarmash @Mark Olson

                    @Mark-Olson I honestly didn’t expect someone to write me a plugin, so thanks a lot dude!

                    But I have a question: how can I get it to suggest the word I am currently writing at the top of the suggestions, but only if it is present in the text?
                    For example, if I have a file that contains this:

                    abc
                    abc
                    abcd
                    

                    Writing “abc” only suggests “abcd”, while the native autocomplete returns both “abc” and “abcd”. I think it would be appropriate to return “abc” first regardless of its frequency, because it was found in the text and is not truly an “autocomplete”.
                    I tried coding this myself and failed :(
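The behavior described here follows from the pattern in the script: match_pat ends in ({0}+), so at least one more word character must follow the typed prefix, and an exact "abc" can never match. A plain-Python check with re (standing in for editor.research; prefixed_words is a hypothetical helper) shows the difference between + and *:

```python
import re

CHARS_TO_MATCH = r'[\w_-]'
TEXT = 'abc\nabc\nabcd\n'  # the example file from this post


def prefixed_words(text, prefix, quantifier):
    '''Words in text matched by the script's match_pat shape, with quantifier
    '+' (at least one more word char) or '*' (zero or more).'''
    pat = '(?<!{0}){1}({0}{2})'.format(CHARS_TO_MATCH, prefix, quantifier)
    return [m.group(0) for m in re.finditer(pat, text)]


if __name__ == '__main__':
    print(prefixed_words(TEXT, 'abc', '+'))
    print(prefixed_words(TEXT, 'abc', '*'))
```

With *, the occurrence of the word that is being typed is counted too, so "also present elsewhere in the text" corresponds to a count greater than one.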

                    • PeterJones @asadMarmash

                      @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                      and is not truly an “autocomplete”.

                      Exactly. It’s not an autocomplete. If you want to exit out of the auto-complete interface and just accept the word as-is, hit the ESC key.

                      • Mark Olson @asadMarmash

                        @asadMarmash said in How can I get autocomplete to be sorted by frequency of repetition instead of alphabetically?:

                        But I have a question: how can I get it to suggest the word I am currently writing at the top of the suggestions, but only if it is present in the text?
                        For example, if I have a file that contains this:

                        New setting: CURRENT_WORD_ONLY_IF_IN_TEXT.
                        With the setting on and the word present in the text: [screenshot]
                        With the setting on and the word absent from the rest of the text: [screenshot]
                        So if the setting is on, the current word is left out of the autocompletion list when the occurrence being typed is its only one, and it shows up first when it also appears elsewhere in the text.

                        from Npp import *
                        
                        # BEGIN SETTINGS
                        AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                        CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                        USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                        DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                        ENABLED_EXTENSIONS = {
                            '', # files with no extension yet
                            'csv',
                            'txt',
                            'md',
                            'xml',
                            'json',
                            'tsv',
                            'log',
                            'dump',
                            'yaml',
                            'yml',
                        } # only use for files with these extensions
                        MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                        CURRENT_WORD_ONLY_IF_IN_TEXT = True
                        # END_SETTINGS
                        
                        def on_match(m, ctr, ignorecase):
                            '''increase the count of the current word by 1
                            if ignorecase, store only the uppercase version of each word'''
                            word = m.group(0)
                            if ignorecase:
                                word = word.upper()
                            ctr.setdefault(word, 0)
                            ctr[word] += 1
                        
                        def getWordRangeUnderCaret():
                            '''get the start and end of the word under the caret'''
                            pos            = editor.getCurrentPos()
                            word_start_pos = editor.wordStartPosition(pos, True)
                            word_end_pos   = editor.wordEndPosition(pos, True)
                            return word_start_pos, word_end_pos
                            
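                        def getExtension(fname):
                            '''return the file's extension, lowercased and without the dot ('' if it has none)'''
                            import os.path  # splitext only looks at the last path component, so dotted folder names are ignored
                            return os.path.splitext(fname)[1][1:].lower()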
                        
                        def onCharInsert(notif):
                            '''Find all words in the document prefixed by the word under the caret
                                and show those words for autocompletion sorted by their frequency.
                                Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                            May ignore the case of words based on the lexer language
                                (e.g., will ignore case in SQL but not in Python)'''
                            if editor.getLength() > MAX_FILE_SIZE:
                                return
                            ext = getExtension(notepad.getCurrentFilename())
                            if ext not in ENABLED_EXTENSIONS:
                                return
                            word_start_pos, word_end_pos = getWordRangeUnderCaret()
                            word_length    = word_end_pos - word_start_pos
                            word = editor.getRangePointer(word_start_pos, word_length).strip()
                            if word_length < AUTOCOMPLETION_MIN_LEN:
                                return
                            ctr = {}
                            # anything preceded by a non-word-char and starting with the current word
                            match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                            ignorecase = DEFAULT_IGNORECASE
                            if USE_LANGUAGE_IGNORECASE:
                                ignorecase = editor.autoCGetIgnoreCase()
                            else:
                                editor.autoCSetIgnoreCase(ignorecase)
                            if ignorecase:
                                # match case-insensitively if that's the language default
                                match_pat = '(?i)' + match_pat
                            editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                            word_was_in_text = False
                            if ignorecase:
                                word_was_in_text = ctr[word.upper()] > 1
                            else:
                                word_was_in_text = ctr[word] > 1
                            if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                # bug: if ignorecase is True, ctr's keys are upper-cased, so this raises KeyError
                                del ctr[word]
                            autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                            if CURRENT_WORD_ONLY_IF_IN_TEXT and word_was_in_text:
                                if ignorecase:
                                    autocomp.insert(0, word.upper())
                                else:
                                    autocomp.insert(0, word)
                            autocomp_str = ' '.join(autocomp)
                            editor.autoCShow(word_length, autocomp_str)
                        
                        
                        if __name__ == '__main__':
                            try:
                                CALLBACK_ADDED
                            except NameError:
                                CALLBACK_ADDED = 1
                                editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
                        
                        • asadMarmash @Mark Olson
                          last edited by asadMarmash

                          @Mark-Olson This is brilliant! Thank you Mark, I truly appreciate your effort <3

                          • Mark Olson

                            Disregard the version I posted in my previous post. It is bugged and does not work.

                            This version contains a working implementation of the feature in which the current word is boosted to the top if it also occurs elsewhere in the text, and is otherwise not shown.

                            from Npp import *
                            
                            # BEGIN SETTINGS
                            AUTOCOMPLETION_MIN_LEN = 2 # min length of word to trigger autocompletion
                            CHARS_TO_MATCH = r'[\w_-]' # characters that can be in "words" (by default most letters, digits, underscores, and dashes)
                            USE_LANGUAGE_IGNORECASE = True # use the document's lexer language setting for ignoring case
                            DEFAULT_IGNORECASE = False # should case be ignored if not using language ignorecase?
                            ENABLED_EXTENSIONS = {
                                '', # files with no extension yet
                                'csv',
                                'txt',
                                'md',
                                'xml',
                                'json',
                                'tsv',
                                'log',
                                'dump',
                                'yaml',
                                'yml',
                            } # only use for files with these extensions
                            MAX_FILE_SIZE = 200000 # do not try autocompleting for files with more bytes than this
                            CURRENT_WORD_ONLY_IF_IN_TEXT = True
                            # END_SETTINGS
                            
                            def on_match(m, ctr, ignorecase):
                                '''increase the count of the current word by 1
                                if ignorecase, store only the uppercase version of each word'''
                                word = m.group(0)
                                if ignorecase:
                                    word = word.upper()
                                ctr.setdefault(word, 0)
                                ctr[word] += 1
                            
                            def getWordRangeUnderCaret():
                                '''get the start and end of the word under the caret'''
                                pos            = editor.getCurrentPos()
                                word_start_pos = editor.wordStartPosition(pos, True)
                                word_end_pos   = editor.wordEndPosition(pos, True)
                                return word_start_pos, word_end_pos
                                
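                            def getExtension(fname):
                                '''return the file's extension, lowercased and without the dot ('' if it has none)'''
                                import os.path  # splitext only looks at the last path component, so dotted folder names are ignored
                                return os.path.splitext(fname)[1][1:].lower()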
                            
                            def onCharInsert(notif):
                                '''Find all words in the document prefixed by the word under the caret
                                    and show those words for autocompletion sorted by their frequency.
                                    Ignore words with length less than AUTOCOMPLETION_MIN_LEN.
                                May ignore the case of words based on the lexer language
                                    (e.g., will ignore case in SQL but not in Python)'''
                                if editor.getLength() > MAX_FILE_SIZE:
                                    return
                                ext = getExtension(notepad.getCurrentFilename())
                                if ext not in ENABLED_EXTENSIONS:
                                    return
                                word_start_pos, word_end_pos = getWordRangeUnderCaret()
                                word_length    = word_end_pos - word_start_pos
                                word = editor.getRangePointer(word_start_pos, word_length).strip()
                                if word_length < AUTOCOMPLETION_MIN_LEN:
                                    return
                                ctr = {}
                                # anything preceded by a non-word-char and starting with the current word
                                match_pat = '(?<!{0}){1}({0}*)'.format(CHARS_TO_MATCH, word)
                                ignorecase = DEFAULT_IGNORECASE
                                if USE_LANGUAGE_IGNORECASE:
                                    ignorecase = editor.autoCGetIgnoreCase()
                                else:
                                    editor.autoCSetIgnoreCase(ignorecase)
                                if ignorecase:
                                    # match case-insensitively if that's the language default
                                    match_pat = '(?i)' + match_pat
                                editor.research(match_pat, lambda m: on_match(m, ctr, ignorecase))
                                if CURRENT_WORD_ONLY_IF_IN_TEXT:
                                    if ignorecase:
                                        upword = word.upper()
                                        if upword in ctr:
                                            if ctr[upword] > 1: # word also occurs elsewhere in the text, move to front
                                                ctr[upword] = float('inf')
                                            else:
                                                del ctr[upword] # word only occurs where it is being typed, remove
                                    elif word in ctr:
                                        if ctr[word] > 1:
                                            ctr[word] = float('inf')
                                        else:
                                            del ctr[word]
                                autocomp = sorted(ctr, key = lambda x: ctr[x], reverse=True)
                                autocomp_str = ' '.join(autocomp)
                                editor.autoCShow(word_length, autocomp_str)
                            
                            
                            if __name__ == '__main__':
                                try:
                                    CALLBACK_ADDED
                                except NameError:
                                    CALLBACK_ADDED = 1
                                    editor.callback(onCharInsert, [SCINTILLANOTIFICATION.CHARADDED])
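The CURRENT_WORD_ONLY_IF_IN_TEXT step can be pulled out into a pure function and checked against the abc/abcd example from earlier in the thread. This sketch mirrors the logic above, using float('inf') as the "move to front" value; rank_completions is a hypothetical name, and ctr is assumed to include the occurrence being typed, as a search over the whole document does.

```python
def rank_completions(ctr, word, ignorecase=False):
    '''Sort words by count, most frequent first; put word first if it also
    occurs elsewhere (count > 1), drop it if the only occurrence is the one
    being typed. Keys of ctr are upper-cased when ignorecase is True.'''
    ctr = dict(ctr)  # work on a copy so the caller's counts are not changed
    key = word.upper() if ignorecase else word
    if key in ctr:
        if ctr[key] > 1:
            ctr[key] = float('inf')  # sorts ahead of every real count
        else:
            del ctr[key]
    return sorted(ctr, key=lambda w: ctr[w], reverse=True)


if __name__ == '__main__':
    # the document contains abc, abc, abcd and the user is typing a third "abc"
    print(rank_completions({'abc': 3, 'abcd': 1}, 'abc'))
```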
                            
                            The Community of users of the Notepad++ text editor.
                            Powered by NodeBB