Sort by the largest number of the same results (frequency)
-
Hello
I have a question about sorting.
I’m currently using Notepad++, but it is difficult to sort the way I want.
For example, I have these lines:
10AndrewJNVR
10Andrpt1Pf7
10Anuiot18g2
10Andrew1H54
10Andrew17yb
10Andre614uw
10Andr321EUb
10And48n1Q2K
10And1in15tA
10Andrew13gG
10An987ppKeP
10Andrew1Dom
How can I sort text in Notepad++ (or with an add-on) as shown below?
After sorting, the addresses whose names repeat most often, or are closest to each other, should be at the very top,
followed by the next most frequent ones, and so on, as in the example below.
This is the order I want:
10Andrew1Dom
10Andrew1H54
10Andrew13gG
10Andrew17yb
10AndrewJNVR
10Andre614uw
10Andr321EUb
10Andrpt1Pf7
10And1in15tA
10And48n1Q2K
10Ana87ppKeP
10Anuiot18g2
please help
-
That’s less of a general purpose sorting function, so it’s not going to be native to Notepad++, and probably not already in any major plugin.
In general, such tasks should be solved in a programming language of your choice. Using one of Notepad++’s scripting plugins, like PythonScript, you could implement the algorithm in such a way that it makes use of the active file in Notepad++… but it’s still a general-purpose coding question, without much that’s specific to Notepad++.
-
To give you a leg up, here’s the Notepad++/PythonScript-plugin-specific aspect. You will just have to replace the indicated two lines from the script with the python algorithm that actually implements your specific sort. My example just sorts alphabetically, which I know isn’t what you want.
# encoding=utf-8
"""in response to https://notepad-plus-plus.org/community/topic/19099/

This does not solve the problem.
This just gives the Notepad++ and PythonScript specific parts of the answer

The actual implementation of the sorting algorithm is a general Python programming exercise,
and whether or not Notepad++ exists has no bearing on that part of the coding
(thus not a question for this forum)
"""

# step 0: assume data is active file in editor1 (main/left view)

# debug: console.clear()

# step 1: grab all the data from the editor1; keep the newline sequence, since I'll be printing it out later
contentsArray = []
def grabContentsArray(contents, lineNumber, totalLines):
    contentsArray.append(contents)

editor1.forEachLine( grabContentsArray )

# step 2: define a function that implements _your_ sort algorithm;
#   it's a generic programming exercise, nothing Notepad++ or PythonScript-plugin specific,
#   so left for you to implement
def sortTheContents( inputArray ):
    # these next two lines should be replaced by the real algorithm
    returnArray = list(inputArray)  # this will have to be replaced by your actual algorithm
    returnArray.sort()              # in-place alphabetical sort

    # once the algorithm is done, return the result here
    return returnArray

sortedContents = sortTheContents( contentsArray )

# step 3: replace the entire file's contents with the sorted data
editor1.beginUndoAction()
editor1.clearAll()
# debug: console.show()
for s in sortedContents:
    # debug: console.write(s)
    editor1.addText(s)
editor1.endUndoAction()
-
If we’re just providing “a leg up”, then I’ll donate this bit, which I had started when Peter’s reply showed up:
# -*- coding: utf-8 -*-

def custom_sort_function(line_content):
    #
    # do your custom logic here
    # to return a pseudo-key
    # that will be used to determine
    # one line's placement relative
    # to another
    #
    return line_content  # <--------- this example merely returns original line, so we'll get a normal alpha sort

lines_list = editor.getText().splitlines()
lines_list.sort(key=custom_sort_function)
eol = ['\r\n', '\r', '\n'][editor.getEOLMode()]
editor.beginUndoAction()
editor.setText(eol.join(lines_list) + eol)
editor.endUndoAction()
–
Moderator EDIT (2024-Jan-14): The author of the script has found a fairly serious bug with the code published here for those that use Mac-style or Linux-style line-endings in their files. The logic for Mac and Linux was reversed, and thus if the script was used on one type of file, the line-endings for the opposite type of file could end up in the file after the script is run. This is insidious, because unless one works with visible line-endings turned on, this is likely not noticed. Some detail on the problem is HERE. The script above has been corrected per that instruction.
-
I’m sorry, but I don’t understand much of this.
I inserted your code and executed it but it doesn’t work
Could you guide me through it,
step by step, with my example?
-
@Martin-X said in Sort by the largest number of the same results (frequency):
I inserted your code and executed it but it doesn’t work
We told you it wouldn’t. This forum isn’t a generic code-writing service. We gave you two ways to interface between the script and Notepad++. But you (or someone you hire – and this isn’t a jobs-board, either) will have to write the code that does the specific sort you want. Your sort algorithm does not exist in any prepackaged sort function that I know of. And that algorithm has nothing to do with Notepad++; once you get the data into the programming language (those two examples both got it into python), the problem is a general programming exercise, and nothing specific to this editor, so it’s off topic for the forum.
Honestly, if I were programming this,
- I would skip the Notepad++ interface, and just use the programming language’s file I/O, because that’s often simpler
- I would force you to give a much better definition of your requirements, and of what is and isn’t present in the data set.
But the algorithm you somewhat defined does not exist in Notepad++ or any of its plugins that I know of, and isn’t likely to be needed in the general Notepad++ usage. This is a perfect task for a programming language, and not something that is the topic of this forum.
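As a rough illustration of the file I/O route, here is a minimal standalone Python 3 sketch. It assumes one possible reading of the request: lines are ranked by how many leading characters they share with a reference string that the user supplies, with ties broken alphabetically. The names `shared_prefix_len`, `sort_by_reference`, `sort_file` and the `reference` parameter are made up for this example and do not come from the thread. The original post does not define the order inside a group, so this is not a confirmed solution.

```python
import os


def shared_prefix_len(line, reference):
    """Number of leading characters that line and reference have in common."""
    return len(os.path.commonprefix([line, reference]))


def sort_by_reference(lines, reference):
    """Longest shared prefix with `reference` first; ties sorted alphabetically.

    ASSUMPTION: this ranking rule is only one interpretation of the request.
    """
    return sorted(lines, key=lambda s: (-shared_prefix_len(s, reference), s))


def sort_file(in_path, out_path, reference):
    """Read in_path, sort its lines, and write the result to out_path."""
    # Default text mode reads any line-ending style and writes the
    # platform's default line ending.
    with open(in_path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(sort_by_reference(lines, reference)) + "\n")
```

With hypothetical file names, calling `sort_file("input.txt", "sorted.txt", "10Andrew")` on the twelve sample lines would put the five `10Andrew` lines first, then `10Andre614uw`, then the `10Andr`, `10And` and `10An` pairs.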
-
Hello, @martin-x, @alan-kilborn, @peterjones and All,
In addition to the points made by @peterjones and @alan-kilborn, I don’t clearly understand your sort algorithm, nor your output list below, where I added some space chars for readability :
10Andrew 1Dom
10Andrew 1H54
10Andrew 13gG
10Andrew 17yb
10Andrew JNVR
10Andr e614uw
10Andr 321EUb
10Andr pt1Pf7
10And 1in15tA
10And 48n1Q2K
10An 987ppKeP ( and NOT 10An a87ppKeP ! )
10An uiot18g2
Seemingly, the varying parts, at the end of the lines, follow neither your initial order nor any particular order !
For instance, if you had followed the alphabetic order for the ending parts, you should get the following list :
10Andrew 13gG
10Andrew 17yb
10Andrew 1Dom
10Andrew 1H54
10Andrew JNVR
10Andr 321EUb
10Andr e614uw
10Andr pt1Pf7
10And 1in15tA
10And 48n1Q2K
10An 987ppKeP
10An uiot18g2
Secondly, if I try to get close to your sort algorithm, your list should be, strictly speaking :
10Andrew1 3gG
10Andrew1 7yb
10Andrew1 Dom
10Andrew1 H54
10Andre 614uw
10Andre wJNVR
10Andr 321EUb
10Andr pt1Pf7
10And 1in15tA
10And 48n1Q2K
10An 987ppKeP
10An uiot18g2
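The "strict" list above can be read as a greedy procedure: repeatedly take the longest prefix shared by at least two of the remaining lines, output every line that starts with it, and continue with what is left. Below is a minimal Python sketch of that reading. The function names are hypothetical, and this is only one interpretation, which happens to reproduce the list above for the twelve sample lines.

```python
import os


def common_prefix_len(a, b):
    """Number of leading characters shared by strings a and b."""
    return len(os.path.commonprefix([a, b]))


def group_by_longest_shared_prefix(lines):
    """Greedy grouping: longest shared prefix first (ASSUMED interpretation)."""
    remaining = sorted(lines)
    result = []
    while len(remaining) > 1:
        # In sorted order, the pair with the longest common prefix is
        # always adjacent, so only neighbours need to be compared.
        # The -i term makes ties resolve to the earliest pair.
        best_len, neg_idx = max(
            (common_prefix_len(remaining[i], remaining[i + 1]), -i)
            for i in range(len(remaining) - 1)
        )
        if best_len == 0:
            break  # no two remaining lines share even one character
        prefix = remaining[-neg_idx][:best_len]
        # Emit the whole group (already alphabetical), then drop it.
        result.extend(s for s in remaining if s.startswith(prefix))
        remaining = [s for s in remaining if not s.startswith(prefix)]
    # Whatever is left shares nothing; append it in alphabetical order.
    result.extend(remaining)
    return result
```

Unlike a plain `key=` sort, this needs to look at all lines together, because the grouping prefix depends on the whole data set rather than on one line at a time.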
Finally, I tried to think about a solution, involving regular expressions, without any valuable result, yet :-((
Best Regards,
guy038
-
Hi, @martin-x,
I managed to devise a process which, however, needs some regex searches and some other regex replacements, and is not easy to summarize in a post, at first sight :-(
So, if your data are not confidential and/or not personal, can you e-mail me an example, of an average size, to :
Of course, may I ask you to add, for your example, the output text you’re expecting ? Moreover, could you describe, as accurately as possible, the customized sort algorithm used ?
Best Regards,
guy038
-
@guy038
Nice :)
OK, I will send you an example.
Thanks
-
If you’ve used a script in this thread, you might want to double check your copy of it for a bug I’ve discovered.
Look at the previous postings in this topic thread where the script has been changed: find the text "moderator edit (2024-Jan-14)".
There’s a link there that describes the bug in more detail, and shows what needs to be changed in an old copy (or you can simply grab a copy of the current version).