    File sorting

    20 Posts 8 Posters 4.9k Views
    • Alan Kilborn @Alan Kilborn

      @Alan-Kilborn said:

      The Python len function is apparently simple-minded in this case

      Perhaps this new one-liner is better, for the case where the OP has Unicode data:

      editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=lambda x:len(unicode(x,'utf-8')))))
      

      Of course, this still rests on the big assumption that the OP is using (or is willing to use) PythonScript! ;)
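
To see why the `unicode(x, 'utf-8')` step matters, here is a minimal sketch in plain Python 3 (not PythonScript). The function name `sort_by_char_length` and the sample names are only illustrative, and the sketch assumes the text arrives as UTF-8 encoded bytes, which is why a plain `len` would count bytes rather than characters.

```python
def sort_by_char_length(raw: bytes, eol: bytes = b"\r\n") -> bytes:
    """Sort UTF-8 encoded lines by their length in characters, not bytes."""
    lines = raw.split(eol)
    # Decode each line before measuring it, as unicode(x, 'utf-8') does in Python 2.
    lines.sort(key=lambda line: len(line.decode("utf-8")))
    return eol.join(lines)


sample = "Nile\r\nIçá\r\nOb".encode("utf-8")

# "Içá" is 3 characters but 5 bytes in UTF-8, so a byte count ranks it after "Nile".
by_bytes = b"\r\n".join(sorted(sample.split(b"\r\n"), key=len))
by_chars = sort_by_char_length(sample)

print(by_bytes.decode("utf-8").split("\r\n"))
print(by_chars.decode("utf-8").split("\r\n"))
```

The first list is ordered by byte count, the second by character count; only the second puts "Içá" among the three-letter names.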

      • guy038

        @dave-pruce, @alan-kilborn,

        Yes, Alan, your new attempt is the right solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters!

        As for me, I was thinking about the opposite solution: converting UTF-8 files to ANSI. However, with this approach, some characters may turn into question marks or be replaced with an approximate character, because they do not belong to the corresponding ANSI table of 256 characters!

        For instance, in my previous list of rivers, the Turkish Kızılırmak river, which contains the Latin small letter dotless i, ı (code point \x{0131}), is changed into the approximate name Kizilirmak after conversion to ANSI!
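
As a rough illustration of that loss, here is a small Python 3 sketch. It assumes the ANSI code page is Windows-1252 (`cp1252`), and `to_ansi` is just an illustrative helper name. Python's `errors="replace"` handler only produces question marks; it does not reproduce the approximate substitution (ı to i) described above.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> str:
    """Round-trip text through a single-byte code page.

    Characters missing from the code page come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


# The dotless i (U+0131) is not in cp1252, so it is lost.
print(to_ansi("Kızılırmak"))
# The accented a (U+00E1) is in cp1252, so this name survives unchanged.
print(to_ansi("Paraná"))
```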

        Anyway, we just did our best to solve the OP’s problem ;-))

        BR

        guy038

        • freezer2022 @Dave Pruce

          @Dave-Pruce said:

          Is it possible to sort a file by line length??

          Yes, not natively, but there is a Notepad++ plugin for it, Linesort v1.1 (only for 32-bit Notepad++):

          https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
          

          linesort.png

            • mkupper @guy038

              @guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

              That reminds me of the stories about the first computers: "computer" was originally a human job title, for people who did the computing and also had to handle the flow control! As ways were found to do parts of the job, first by mechanical and then by electronic means, the resulting machines came to be known as computers.

                • guy038

                Hello, All,

                Thanks to @mkupper, who recently posted a comment, and exactly three years later, I am going to simplify the way to sort lines by length and, secondarily, by line contents too!

                Like in my previous post, I will use this list of rivers, below :

                https://en.wikipedia.org/wiki/List_of_rivers_by_length

                After removing some duplicates, we get an INPUT text of 238 river names:

                Nile
                White Nile
                Kagera
                Nyabarongo
                Mwogo
                Rukarara
                Amazon
                Ucayali
                Tambo
                Ene
                Mantaro
                Yangtze
                Mississippi
                Missouri
                Jefferson
                Beaverhead
                Red Rock
                Hell Roaring
                Yenisei
                Angara
                Selenge
                Ider
                Yellow River
                Ob
                Irtysh
                Río de la Plata
                Paraná
                Congo
                Chambeshi
                Amur
                Argun
                Kherlen
                Lena
                Mekong
                Mackenzie
                Slave
                Peace
                Finlay
                Niger
                Brahmaputra
                Tsangpo
                Murray
                Darling
                Culgoa
                Balonne
                Condamine
                Tocantins
                Araguaia
                Volga
                Indus
                Sênggê Zangbo
                Shatt al-Arab
                Euphrates
                Murat
                Madeira
                Mamoré
                Caine
                Rocha
                Purús
                Yukon
                São Francisco
                Syr Darya
                Naryn
                Salween
                Saint Lawrence
                Niagara
                Detroit
                Saint Clair
                Saint Marys
                Saint Louis
                North
                Nizhnyaya Tunguska
                Danube
                Breg
                Zambezi
                Vilyuy
                Ganges
                Hooghly
                Padma
                Amu Darya
                Panj
                Japurá
                Nelson
                Saskatchewan
                Paraguay
                Kolyma
                Pilcomayo
                Biya
                Katun
                Ishim
                Juruá
                Ural
                Arkansas
                Colorado
                Olenyok
                Dnieper
                Aldan
                Ubangi
                Uele
                Negro
                Columbia
                Zhujiang
                Red
                Ayeyarwady
                Kasai
                Ohio
                Allegheny
                Orinoco
                Tarim
                Xingu
                Orange
                Salado
                Vitim
                Tigris
                Songhua
                Tapajós
                Don
                Podkamennaya Tunguska
                Pechora
                Kama
                Limpopo
                Chulym
                Guaporé
                Indigirka
                Snake
                Senegal
                Uruguay
                Blue Nile
                Churchill
                Khatanga
                Okavango
                Volta
                Beni
                Platte
                Tobol
                Alazeya
                Jubba
                Shebelle
                Içá
                Magdalena
                Han
                Kura
                Oka
                Guaviare
                Pecos
                Murrumbidgee
                Godavari
                Río Grande
                Belaya
                Cooper
                Barcoo
                Marañón
                Dniester
                Benue
                Ili
                Warburton
                Georgina
                Sutlej
                Yamuna
                Vyatka
                Fraser
                Brazos
                Liao
                Lachlan
                Yalong
                Iguaçu
                Olyokma
                Northern Dvina
                Sukhona
                Krishna
                Iriri
                Narmada
                Lomami
                Ottawa
                Lerma
                Grande de Santiago
                Elbe
                Vltava
                Zeya
                Juruena
                Rhine
                Athabasca
                Canadian
                North Saskatchewan
                Vistula
                Bug
                Vaal
                Shire
                Ogooué
                Nen
                Kızılırmak
                Markha
                Green
                Milk
                Chindwin
                Sankuru
                Wu
                James
                Kapuas
                Desna
                Helmand
                Madre de Dios
                Tietê
                Vychegda
                Sepik
                Cimarron
                Anadyr
                Paraíba do Sul
                Jialing
                Liard
                Cumberland
                White
                Huallaga
                Kwango
                Draa
                Gambia
                Tyung
                Chenab
                Yellowstone
                Ghaghara
                Huai
                Aras
                Chu
                Seversky Donets
                Bermejo
                Fly
                Kuskokwim
                Tennessee
                Oder
                Warta
                Aruwimi
                Daugava
                Gila
                Loire
                Essequibo
                Khoper
                Tagus
                Flinders
                
                • At the end of the first line, we add space characters up to column 100

                • Then, with a zero-length (column-mode) selection at column 100 spanning all lines of the list, we insert an exclamation mark ( ! ) at the end of every line:

                => We get this temporary text ( I just listed the first lines and the last lines ) :

                Nile                                                                                               !
                White Nile                                                                                         !
                Kagera                                                                                             !
                Nyabarongo                                                                                         !
                Mwogo                                                                                              !
                Rukarara                                                                                           !
                Amazon                                                                                             !
                Ucayali                                                                                            !
                Tambo                                                                                              !
                Ene                                                                                                !
                Mantaro                                                                                            !
                Yangtze                                                                                            !
                Mississippi                                                                                        !
                Missouri                                                                                           !
                ......                                                                                             !
                ......                                                                                             !
                ......                                                                                             !
                ......                                                                                             !
                Seversky Donets                                                                                    !
                Bermejo                                                                                            !
                Fly                                                                                                !
                Kuskokwim                                                                                          !
                Tennessee                                                                                          !
                Oder                                                                                               !
                Warta                                                                                              !
                Aruwimi                                                                                            !
                Daugava                                                                                            !
                Gila                                                                                               !
                Loire                                                                                              !
                Essequibo                                                                                          !
                Khoper                                                                                             !
                Tagus                                                                                              !
                Flinders                                                                                           !
                
                
                • Now, we perform this regex S/R :

                  • SEARCH ^([\w -]+?)(\x20+)(?=!)

                  • REPLACE \2\1

                => Again, we get this temporary text ( I just listed the first lines and the last lines ) :

                                                                                                               Nile!
                                                                                                         White Nile!
                                                                                                             Kagera!
                                                                                                         Nyabarongo!
                                                                                                              Mwogo!
                                                                                                           Rukarara!
                                                                                                             Amazon!
                                                                                                            Ucayali!
                                                                                                              Tambo!
                                                                                                                Ene!
                                                                                                            Mantaro!
                                                                                                            Yangtze!
                                                                                                        Mississippi!
                                                                                                           Missouri!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                    Seversky Donets!
                                                                                                            Bermejo!
                                                                                                                Fly!
                                                                                                          Kuskokwim!
                                                                                                          Tennessee!
                                                                                                               Oder!
                                                                                                              Warta!
                                                                                                            Aruwimi!
                                                                                                            Daugava!
                                                                                                               Gila!
                                                                                                              Loire!
                                                                                                          Essequibo!
                                                                                                             Khoper!
                                                                                                              Tagus!
                                                                                                           Flinders!
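
For readers who want to try this step outside Notepad++, the same search and replace can be sketched with Python's `re` module. This is only an approximation: Notepad++ uses a different regex engine, the helper name `move_padding_to_front` is made up for the example, and a width of 12 is used instead of 100 to keep the sample short.

```python
import re

# Same pattern as the SEARCH field; re.MULTILINE makes ^ match at every line start.
PATTERN = re.compile(r"^([\w -]+?)(\x20+)(?=!)", re.MULTILINE)


def move_padding_to_front(text: str) -> str:
    # \2\1 writes the run of spaces (group 2) before the name (group 1).
    return PATTERN.sub(r"\2\1", text)


names = ["Nile", "White Nile", "Kagera"]
# Left-aligned names padded with spaces, followed by the '!' marker.
padded = "\n".join(name.ljust(12) + "!" for name in names)

print(move_padding_to_front(padded))
```

The lazy `+?` in the first group lets the name stop as early as possible, and the lookahead `(?=!)` makes sure the run of spaces is the one that ends at the marker, so a name containing a space such as "White Nile" is kept whole.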
                
                • Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

                ==> Here is our sorted text ( I just listed the first lines and the last lines ) :

                                                                                                                 Ob!
                                                                                                                 Wu!
                                                                                                                Bug!
                                                                                                                Chu!
                                                                                                                Don!
                                                                                                                Ene!
                                                                                                                Fly!
                                                                                                                Han!
                                                                                                                Ili!
                                                                                                                Içá!
                                                                                                                Nen!
                                                                                                                Oka!
                                                                                                                Red!
                                                                                                               Amur!
                                                                                                               Aras!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                             ......!
                                                                                                       Saskatchewan!
                                                                                                       Yellow River!
                                                                                                      Madre de Dios!
                                                                                                      Shatt al-Arab!
                                                                                                      São Francisco!
                                                                                                      Sênggê Zangbo!
                                                                                                     Northern Dvina!
                                                                                                     Paraíba do Sul!
                                                                                                     Saint Lawrence!
                                                                                                    Río de la Plata!
                                                                                                    Seversky Donets!
                                                                                                 Grande de Santiago!
                                                                                                 Nizhnyaya Tunguska!
                                                                                                 North Saskatchewan!
                                                                                              Podkamennaya Tunguska!
                
                • Finally, let’s run this last regex S/R

                  • SEARCH ^\x20+|!$

                  • REPLACE Leave EMPTY

                => What remains is our expected OUTPUT text, sorted by line length:

                Ob
                Wu
                Bug
                Chu
                Don
                Ene
                Fly
                Han
                Ili
                Içá
                Nen
                Oka
                Red
                Amur
                Aras
                Beni
                Biya
                Breg
                Draa
                Elbe
                Gila
                Huai
                Ider
                Kama
                Kura
                Lena
                Liao
                Milk
                Nile
                Oder
                Ohio
                Panj
                Uele
                Ural
                Vaal
                Zeya
                Aldan
                Argun
                Benue
                Caine
                Congo
                Desna
                Green
                Indus
                Iriri
                Ishim
                James
                Jubba
                Juruá
                Kasai
                Katun
                Lerma
                Liard
                Loire
                Murat
                Mwogo
                Naryn
                Negro
                Niger
                North
                Padma
                Peace
                Pecos
                Purús
                Rhine
                Rocha
                Sepik
                Shire
                Slave
                Snake
                Tagus
                Tambo
                Tarim
                Tietê
                Tobol
                Tyung
                Vitim
                Volga
                Volta
                Warta
                White
                Xingu
                Yukon
                Amazon
                Anadyr
                Angara
                Barcoo
                Belaya
                Brazos
                Chenab
                Chulym
                Cooper
                Culgoa
                Danube
                Finlay
                Fraser
                Gambia
                Ganges
                Iguaçu
                Irtysh
                Japurá
                Kagera
                Kapuas
                Khoper
                Kolyma
                Kwango
                Lomami
                Mamoré
                Markha
                Mekong
                Murray
                Nelson
                Ogooué
                Orange
                Ottawa
                Paraná
                Platte
                Salado
                Sutlej
                Tigris
                Ubangi
                Vilyuy
                Vltava
                Vyatka
                Yalong
                Yamuna
                Alazeya
                Aruwimi
                Balonne
                Bermejo
                Darling
                Daugava
                Detroit
                Dnieper
                Guaporé
                Helmand
                Hooghly
                Jialing
                Juruena
                Kherlen
                Krishna
                Lachlan
                Limpopo
                Madeira
                Mantaro
                Marañón
                Narmada
                Niagara
                Olenyok
                Olyokma
                Orinoco
                Pechora
                Salween
                Sankuru
                Selenge
                Senegal
                Songhua
                Sukhona
                Tapajós
                Tsangpo
                Ucayali
                Uruguay
                Vistula
                Yangtze
                Yenisei
                Zambezi
                Araguaia
                Arkansas
                Canadian
                Chindwin
                Cimarron
                Colorado
                Columbia
                Dniester
                Flinders
                Georgina
                Ghaghara
                Godavari
                Guaviare
                Huallaga
                Khatanga
                Missouri
                Okavango
                Paraguay
                Red Rock
                Rukarara
                Shebelle
                Vychegda
                Zhujiang
                Allegheny
                Amu Darya
                Athabasca
                Blue Nile
                Chambeshi
                Churchill
                Condamine
                Essequibo
                Euphrates
                Indigirka
                Jefferson
                Kuskokwim
                Mackenzie
                Magdalena
                Pilcomayo
                Syr Darya
                Tennessee
                Tocantins
                Warburton
                Ayeyarwady
                Beaverhead
                Cumberland
                Kızılırmak
                Nyabarongo
                Río Grande
                White Nile
                Brahmaputra
                Mississippi
                Saint Clair
                Saint Louis
                Saint Marys
                Yellowstone
                Hell Roaring
                Murrumbidgee
                Saskatchewan
                Yellow River
                Madre de Dios
                Shatt al-Arab
                São Francisco
                Sênggê Zangbo
                Northern Dvina
                Paraíba do Sul
                Saint Lawrence
                Río de la Plata
                Seversky Donets
                Grande de Santiago
                Nizhnyaya Tunguska
                North Saskatchewan
                Podkamennaya Tunguska
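
The whole pad, sort and strip procedure can also be sketched in a few lines of Python 3. This is an illustration of the idea, not what Notepad++ does internally: the function name is made up, it assumes no line is longer than `width` and no line starts with a space, and Python compares strings by code point, which may order some non-ASCII characters differently from the Notepad++ sort command.

```python
def sort_by_length_via_padding(lines, width=100):
    """Right-align every line, sort as plain strings, then strip the padding."""
    # Right-align to a fixed width and add the '!' end marker, as in the steps above.
    padded = [line.rjust(width) + "!" for line in lines]
    # A space sorts before any letter, so shorter (more padded) lines come first;
    # lines of equal length are then ordered by their contents.
    padded.sort()
    # Drop the trailing '!' and the leading spaces again.
    return [line[:-1].lstrip(" ") for line in padded]


rivers = ["Nile", "White Nile", "Ob", "Içá", "Ili", "Wu"]
print(sort_by_length_via_padding(rivers))
```

In Python the same ordering is available directly with `sorted(rivers, key=lambda s: (len(s), s))`; the padding trick is what makes it possible with only a plain lexicographic sort.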
                

                That’s all! Neat, isn’t it?

                Best Regards,

                guy038

                  • Coises

                  @Thomas-Knoefel

                  I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

                  I believe multi-replace can be set up to find ^.*$ and replace with set(string.len(MATCH).." "..MATCH).

                  Then Edit | Line operations | Sort Lines As Integers Ascending will sort the lines in order by length, and then ^\d+\x20 replaced with nothing would remove the lengths.
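
The three steps above (prefix each line with its length, sort on that number, remove the prefix) can be sketched in Python 3 as follows. The function name is hypothetical, and `len` here counts characters; the thread does not say whether the length computed inside MultiReplace counts characters or bytes, so results may differ for non-ASCII text.

```python
import re


def sort_by_length_prefix(lines):
    # Step 1: prefix every line with its length and one space.
    decorated = [f"{len(line)} {line}" for line in lines]
    # Step 2: sort on the leading integer (sorted/sort is stable, so ties keep their order).
    decorated.sort(key=lambda s: int(s.split(" ", 1)[0]))
    # Step 3: remove the prefix, the same as replacing ^\d+\x20 with nothing.
    return [re.sub(r"^\d+\x20", "", s) for s in decorated]


print(sort_by_length_prefix(["abc", "a", "", "ab"]))
```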

                    • Mark Olson

                    With JsonTools v6.0 or higher: open the tree view for the document, go to REGEX mode, and enter the query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
                    Hopefully the syntax is reasonably easy to understand: split the file by \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of string-joining the list back together with \r\n.

                    This converts

                    abcdefg
                    ab
                    abcdefgh
                    a
                    abcdefghi
                    abcde
                    abcd
                    abc
                    

                    into

                    a
                    ab
                    abc
                    abcd
                    abcde
                    abcdefg
                    abcdefgh
                    abcdefghi
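
For comparison, the same split, sort and join logic reads like this in Python 3. This is only a rough equivalent of the query for illustration, not JsonTools code, and the function name is made up.

```python
def sort_lines_by_length(text: str, eol: str = "\r\n") -> str:
    # Split on the line ending, sort the lines by length, join them back together.
    return eol.join(sorted(text.split(eol), key=len))


source = "\r\n".join(
    ["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"]
)
print(sort_lines_by_length(source).split("\r\n"))
```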
                    
                      • Coises

                      In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

                      Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

                      Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

                      The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.

                      1 Reply Last reply Reply Quote 4
                      • Mahmoud MadkourM
                        Mahmoud Madkour @Mark Olson
                        last edited by

                        @Mark-Olson, your proposed solution seems very easy, but could you please elaborate a bit more:
                        1- how to open the file in tree view
                        2- how to go to REGEX mode to enter the query

                        Many thanks

                        Mark OlsonM 1 Reply Last reply Reply Quote 0
                        • Mark OlsonM
                          Mark Olson @Mahmoud Madkour
                          last edited by

                          @Mahmoud-Madkour
                          To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
                          Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.

                          The Community of users of the Notepad++ text editor.