    File sorting

    • guy038

      Hi, @dave-pruce, @alan-kilborn and All,

      My previous list of rivers contained 5 duplicate names:

      Red, Murray, Yenisei, Araguaia and Colorado

      But this doesn’t matter for our problem anyway!

      As you can see, @dave-pruce, Alan’s Python solution is neater, isn’t it?


      Now, Alan, I’ve just tested your one-line script and, in my opinion, there are two problems:

      • Within a group of river names of the same length, the names are not sorted alphabetically!

      • Second, some names containing accented characters, for instance the Içá river, end up outside their group, as shown below:

      Snake
      Volta
      Tobol
      Jubba
      Içá
      Pecos
      Benue
      Iriri
      Lerma
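For anyone who wants to reproduce both points outside Notepad++, here is a minimal plain-Python 3 sketch (not Alan's PythonScript; the sample is just the names listed above). In Python 3, len() on a str counts characters, and a tuple key (length, name) adds the missing alphabetical tie-break:

```python
# Names quoted above, in their original order.
names = ["Snake", "Volta", "Tobol", "Jubba", "Içá", "Pecos", "Benue", "Iriri", "Lerma"]

# Length only: Python's sort is stable, so equal-length names keep their input order.
by_length = sorted(names, key=len)

# Length first, then the name itself, so each length group is also alphabetical.
by_length_then_name = sorted(names, key=lambda name: (len(name), name))

print(by_length)
print(by_length_then_name)
```

Note that comparing str values follows code-point order, so uppercase sorts before lowercase and letters such as ç sort after z; dictionary-style ordering would need a locale-aware key.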
      

      Cheers,

      guy038

      • Alan Kilborn @guy038

        @guy038 said:

        names are not sorted alphabetically

        This is outside the scope of the originally stated problem! :)

        containing accentuated characters…are located outside their section

        The Python len function is apparently simple-minded in this case (using a simple byte count for the length of these strings containing multibyte characters).
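A quick plain-Python 3 illustration of byte count versus character count, assuming UTF-8 text: Içá is 3 characters but 5 bytes, which would put it among the five-letter names, exactly where it showed up. The helper name lengths is made up for this example.

```python
def lengths(name):
    """Return (characters, UTF-8 bytes) for a string."""
    # len() on a str counts code points; on the encoded bytes it counts bytes.
    return len(name), len(name.encode("utf-8"))

# "ç", "á" and "ı" each take two bytes in UTF-8.
print(lengths("Içá"))         # (3, 5)
print(lengths("Kızılırmak"))  # (10, 13)
```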

        • Alan Kilborn @Alan Kilborn

          @Alan-Kilborn said:

          The Python len function is apparently simple-minded in this case

          Perhaps this new one-liner is better, for the case where the OP has Unicode data:

          editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=lambda x:len(unicode(x,'utf-8')))))
          

          Of course, this still rests on the big assumption that the OP is using (or is willing to use) PythonScript! ;)
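To try the one-liner's logic without Notepad++, it can be written as an ordinary Python 3 function. This is a sketch: sort_lines_by_length and EOL_BY_MODE are made-up names, and the editor calls are replaced by a text argument and an EOL-mode number. The ['\r\n','\r','\n'] lookup works because Scintilla numbers its EOL modes 0 = CRLF, 1 = CR, 2 = LF.

```python
# Index = Scintilla EOL mode (0 = CRLF, 1 = CR, 2 = LF), as in the one-liner's list.
EOL_BY_MODE = ["\r\n", "\r", "\n"]

def sort_lines_by_length(text, eol_mode):
    """Hypothetical helper: sort the lines of `text` by character length."""
    # Python 3 str is already Unicode, so no unicode(x, 'utf-8') decode is needed.
    lines = text.splitlines()
    lines.sort(key=len)
    return EOL_BY_MODE[eol_mode].join(lines)

print(repr(sort_lines_by_length("Pecos\r\nIçá\r\nOb\r\n", 0)))
```

As with the one-liner, a final line ending is not restored: splitlines() drops it, and join() only puts separators between lines.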

          • guy038

            @dave-pruce, @alan-kilborn,

            Yes, Alan, your new attempt is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters!

            As for me, I was thinking about the opposite solution: converting UTF-8 files to ANSI. However, with this approach, some characters may turn into question marks or be replaced by an approximate character, because they do not belong to the corresponding 256-character ANSI table!

            For instance, in my previous list of rivers, the Turkish Kızılırmak river, which contains the Latin small letter dotless i, ı (code point \x{0131}), becomes the approximate name Kizilirmak after conversion to ANSI!
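Python's codecs show the same loss. This sketch assumes the ANSI code page is Windows-1252 (cp1252 in Python); other Windows locales use other code pages. Python's encoder only offers error handlers such as strict, replace or ignore, so ı becomes ?; the approximate Kizilirmak presumably comes from Windows' own best-fit conversion, which this does not reproduce.

```python
name = "Kızılırmak"

# "ı" (U+0131, dotless i) is not in Windows-1252, so errors="replace" writes "?".
print(name.encode("cp1252", errors="replace"))  # b'K?z?l?rmak'

# Letters that do exist in Windows-1252 survive, at one byte per character,
# so len() on the converted bytes equals the character count.
converted = "Içá".encode("cp1252")
print(converted, len(converted))  # b'I\xe7\xe1' 3
```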

            Anyway, we just did our best to solve the OP’s problem ;-))

            BR

            guy038

            • freezer2022 @Dave Pruce

              @Dave-Pruce said:

              Is it possible to sort a file by line length??

              Yes. Not natively, but there is a Notepad++ plugin for it, Linesort v1.1 (32-bit Notepad++ only):

              https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
              

              [Image: linesort.png]

                  • mkupper @guy038

                  @guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

                  That reminds me of the stories about the first computers: “computer” was a human job title, for those who computed but also had to handle flow control! As ways were found to do parts of the job, first by mechanical means and then electronically, the resulting machines came to be known as computers.

                    • guy038

                    Hello, All,

                    Thanks to @mkupper, who recently posted a comment, I’m going, exactly three years later, to simplify the way to sort lines by length first and then by line contents!

                    As in my previous post, I’ll use this list of rivers:

                    https://en.wikipedia.org/wiki/List_of_rivers_by_length

                    After removing some duplicates, we get an INPUT text of 238 river names:

                    Nile
                    White Nile
                    Kagera
                    Nyabarongo
                    Mwogo
                    Rukarara
                    Amazon
                    Ucayali
                    Tambo
                    Ene
                    Mantaro
                    Yangtze
                    Mississippi
                    Missouri
                    Jefferson
                    Beaverhead
                    Red Rock
                    Hell Roaring
                    Yenisei
                    Angara
                    Selenge
                    Ider
                    Yellow River
                    Ob
                    Irtysh
                    Río de la Plata
                    Paraná
                    Congo
                    Chambeshi
                    Amur
                    Argun
                    Kherlen
                    Lena
                    Mekong
                    Mackenzie
                    Slave
                    Peace
                    Finlay
                    Niger
                    Brahmaputra
                    Tsangpo
                    Murray
                    Darling
                    Culgoa
                    Balonne
                    Condamine
                    Tocantins
                    Araguaia
                    Volga
                    Indus
                    Sênggê Zangbo
                    Shatt al-Arab
                    Euphrates
                    Murat
                    Madeira
                    Mamoré
                    Caine
                    Rocha
                    Purús
                    Yukon
                    São Francisco
                    Syr Darya
                    Naryn
                    Salween
                    Saint Lawrence
                    Niagara
                    Detroit
                    Saint Clair
                    Saint Marys
                    Saint Louis
                    North
                    Nizhnyaya Tunguska
                    Danube
                    Breg
                    Zambezi
                    Vilyuy
                    Ganges
                    Hooghly
                    Padma
                    Amu Darya
                    Panj
                    Japurá
                    Nelson
                    Saskatchewan
                    Paraguay
                    Kolyma
                    Pilcomayo
                    Biya
                    Katun
                    Ishim
                    Juruá
                    Ural
                    Arkansas
                    Colorado
                    Olenyok
                    Dnieper
                    Aldan
                    Ubangi
                    Uele
                    Negro
                    Columbia
                    Zhujiang
                    Red
                    Ayeyarwady
                    Kasai
                    Ohio
                    Allegheny
                    Orinoco
                    Tarim
                    Xingu
                    Orange
                    Salado
                    Vitim
                    Tigris
                    Songhua
                    Tapajós
                    Don
                    Podkamennaya Tunguska
                    Pechora
                    Kama
                    Limpopo
                    Chulym
                    Guaporé
                    Indigirka
                    Snake
                    Senegal
                    Uruguay
                    Blue Nile
                    Churchill
                    Khatanga
                    Okavango
                    Volta
                    Beni
                    Platte
                    Tobol
                    Alazeya
                    Jubba
                    Shebelle
                    Içá
                    Magdalena
                    Han
                    Kura
                    Oka
                    Guaviare
                    Pecos
                    Murrumbidgee
                    Godavari
                    Río Grande
                    Belaya
                    Cooper
                    Barcoo
                    Marañón
                    Dniester
                    Benue
                    Ili
                    Warburton
                    Georgina
                    Sutlej
                    Yamuna
                    Vyatka
                    Fraser
                    Brazos
                    Liao
                    Lachlan
                    Yalong
                    Iguaçu
                    Olyokma
                    Northern Dvina
                    Sukhona
                    Krishna
                    Iriri
                    Narmada
                    Lomami
                    Ottawa
                    Lerma
                    Grande de Santiago
                    Elbe
                    Vltava
                    Zeya
                    Juruena
                    Rhine
                    Athabasca
                    Canadian
                    North Saskatchewan
                    Vistula
                    Bug
                    Vaal
                    Shire
                    Ogooué
                    Nen
                    Kızılırmak
                    Markha
                    Green
                    Milk
                    Chindwin
                    Sankuru
                    Wu
                    James
                    Kapuas
                    Desna
                    Helmand
                    Madre de Dios
                    Tietê
                    Vychegda
                    Sepik
                    Cimarron
                    Anadyr
                    Paraíba do Sul
                    Jialing
                    Liard
                    Cumberland
                    White
                    Huallaga
                    Kwango
                    Draa
                    Gambia
                    Tyung
                    Chenab
                    Yellowstone
                    Ghaghara
                    Huai
                    Aras
                    Chu
                    Seversky Donets
                    Bermejo
                    Fly
                    Kuskokwim
                    Tennessee
                    Oder
                    Warta
                    Aruwimi
                    Daugava
                    Gila
                    Loire
                    Essequibo
                    Khoper
                    Tagus
                    Flinders
                    
                    • At the end of the first line, we add space characters up to column 100.

                    • Then, with a zero-length (column) selection at column 100, we insert an exclamation mark (!) at the end of every line of the list:

                    => We get this temporary text (only the first and last lines are listed):

                    Nile                                                                                               !
                    White Nile                                                                                         !
                    Kagera                                                                                             !
                    Nyabarongo                                                                                         !
                    Mwogo                                                                                              !
                    Rukarara                                                                                           !
                    Amazon                                                                                             !
                    Ucayali                                                                                            !
                    Tambo                                                                                              !
                    Ene                                                                                                !
                    Mantaro                                                                                            !
                    Yangtze                                                                                            !
                    Mississippi                                                                                        !
                    Missouri                                                                                           !
                    ......                                                                                             !
                    ......                                                                                             !
                    ......                                                                                             !
                    ......                                                                                             !
                    Seversky Donets                                                                                    !
                    Bermejo                                                                                            !
                    Fly                                                                                                !
                    Kuskokwim                                                                                          !
                    Tennessee                                                                                          !
                    Oder                                                                                               !
                    Warta                                                                                              !
                    Aruwimi                                                                                            !
                    Daugava                                                                                            !
                    Gila                                                                                               !
                    Loire                                                                                              !
                    Essequibo                                                                                          !
                    Khoper                                                                                             !
                    Tagus                                                                                              !
                    Flinders                                                                                           !
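In plain Python, this first step amounts to padding each line with spaces to 99 characters and appending the marker. A sketch (pad_to_marker is a hypothetical name); it counts characters, which presumably matches the editor's columns only when lines contain no tabs or double-width characters:

```python
def pad_to_marker(lines, column=100, marker="!"):
    """Pad each line with spaces so that `marker` ends up in 1-based `column`."""
    # ljust() adds spaces up to column - 1 characters; longer lines are left as-is.
    return [line.ljust(column - 1) + marker for line in lines]

padded = pad_to_marker(["Nile", "White Nile", "Içá"])
print([len(line) for line in padded])  # [100, 100, 100]
```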
                    
                    
                    • Now, we perform this regex S/R :

                      • SEARCH ^([\w -]+?)(\x20+)(?=!)

                      • REPLACE \2\1

                    => Again, we get this temporary text (only the first and last lines are listed):

                                                                                                                   Nile!
                                                                                                             White Nile!
                                                                                                                 Kagera!
                                                                                                             Nyabarongo!
                                                                                                                  Mwogo!
                                                                                                               Rukarara!
                                                                                                                 Amazon!
                                                                                                                Ucayali!
                                                                                                                  Tambo!
                                                                                                                    Ene!
                                                                                                                Mantaro!
                                                                                                                Yangtze!
                                                                                                            Mississippi!
                                                                                                               Missouri!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                        Seversky Donets!
                                                                                                                Bermejo!
                                                                                                                    Fly!
                                                                                                              Kuskokwim!
                                                                                                              Tennessee!
                                                                                                                   Oder!
                                                                                                                  Warta!
                                                                                                                Aruwimi!
                                                                                                                Daugava!
                                                                                                                   Gila!
                                                                                                                  Loire!
                                                                                                              Essequibo!
                                                                                                                 Khoper!
                                                                                                                  Tagus!
                                                                                                               Flinders!
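The same search/replace can be tried with Python's re module. This is a sketch, not Notepad++ itself: Notepad++ uses the Boost regex engine, so edge cases may differ, although in Python too \w matches accented letters in str patterns. A narrow padding is used instead of column 100 to keep the example readable:

```python
import re

# Group 1: the name. It is lazy, so all the trailing spaces go to group 2
# (a greedy group would keep all but one of them).
# The lookahead (?=!) requires the padding to run right up to the "!" marker.
RIGHT_ALIGN = re.compile(r"^([\w -]+?)(\x20+)(?=!)", re.MULTILINE)

def right_align(text):
    """Swap name and padding on each line, so every name ends just before '!'."""
    return RIGHT_ALIGN.sub(r"\2\1", text)

# Pad to width 10, then add the marker; "White Nile" has no padding,
# so that line does not match and is left unchanged.
padded = "\n".join(name.ljust(10) + "!" for name in ["Nile", "White Nile", "Içá"])
print(right_align(padded))
```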
                    
                    • Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

                    ==> Here is our sorted text (only the first and last lines are listed):

                                                                                                                     Ob!
                                                                                                                     Wu!
                                                                                                                    Bug!
                                                                                                                    Chu!
                                                                                                                    Don!
                                                                                                                    Ene!
                                                                                                                    Fly!
                                                                                                                    Han!
                                                                                                                    Ili!
                                                                                                                    Içá!
                                                                                                                    Nen!
                                                                                                                    Oka!
                                                                                                                    Red!
                                                                                                                   Amur!
                                                                                                                   Aras!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                                 ......!
                                                                                                           Saskatchewan!
                                                                                                           Yellow River!
                                                                                                          Madre de Dios!
                                                                                                          Shatt al-Arab!
                                                                                                          São Francisco!
                                                                                                          Sênggê Zangbo!
                                                                                                         Northern Dvina!
                                                                                                         Paraíba do Sul!
                                                                                                         Saint Lawrence!
                                                                                                        Río de la Plata!
                                                                                                        Seversky Donets!
                                                                                                     Grande de Santiago!
                                                                                                     Nizhnyaya Tunguska!
                                                                                                     North Saskatchewan!
                                                                                                  Podkamennaya Tunguska!
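Why a plain lexicographic sort works here: all lines now have the same width, so a shorter name has more leading spaces, and a space (U+0020) sorts before any letter; names of equal length share the same padding and are compared on their own characters. A Python sketch, assuming Notepad++ compares by code point (the order above, with Ili before Içá, is consistent with that):

```python
# Right-aligned lines of equal width, as after the previous step.
lines = [name.rjust(6) + "!" for name in ["Nile", "Amazon", "Ob", "Amur", "Içá", "Ili"]]

# str comparison is by code point: " " < "A".."Z" < "a".."z" < "ç".
for line in sorted(lines):
    print(line)
```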
                    
                    • Finally, let’s run this last regex S/R

                      • SEARCH ^\x20+|!$

                      • REPLACE Leave EMPTY

                    => What remains is our expected OUTPUT text, sorted by line length and then by line contents:

                    Ob
                    Wu
                    Bug
                    Chu
                    Don
                    Ene
                    Fly
                    Han
                    Ili
                    Içá
                    Nen
                    Oka
                    Red
                    Amur
                    Aras
                    Beni
                    Biya
                    Breg
                    Draa
                    Elbe
                    Gila
                    Huai
                    Ider
                    Kama
                    Kura
                    Lena
                    Liao
                    Milk
                    Nile
                    Oder
                    Ohio
                    Panj
                    Uele
                    Ural
                    Vaal
                    Zeya
                    Aldan
                    Argun
                    Benue
                    Caine
                    Congo
                    Desna
                    Green
                    Indus
                    Iriri
                    Ishim
                    James
                    Jubba
                    Juruá
                    Kasai
                    Katun
                    Lerma
                    Liard
                    Loire
                    Murat
                    Mwogo
                    Naryn
                    Negro
                    Niger
                    North
                    Padma
                    Peace
                    Pecos
                    Purús
                    Rhine
                    Rocha
                    Sepik
                    Shire
                    Slave
                    Snake
                    Tagus
                    Tambo
                    Tarim
                    Tietê
                    Tobol
                    Tyung
                    Vitim
                    Volga
                    Volta
                    Warta
                    White
                    Xingu
                    Yukon
                    Amazon
                    Anadyr
                    Angara
                    Barcoo
                    Belaya
                    Brazos
                    Chenab
                    Chulym
                    Cooper
                    Culgoa
                    Danube
                    Finlay
                    Fraser
                    Gambia
                    Ganges
                    Iguaçu
                    Irtysh
                    Japurá
                    Kagera
                    Kapuas
                    Khoper
                    Kolyma
                    Kwango
                    Lomami
                    Mamoré
                    Markha
                    Mekong
                    Murray
                    Nelson
                    Ogooué
                    Orange
                    Ottawa
                    Paraná
                    Platte
                    Salado
                    Sutlej
                    Tigris
                    Ubangi
                    Vilyuy
                    Vltava
                    Vyatka
                    Yalong
                    Yamuna
                    Alazeya
                    Aruwimi
                    Balonne
                    Bermejo
                    Darling
                    Daugava
                    Detroit
                    Dnieper
                    Guaporé
                    Helmand
                    Hooghly
                    Jialing
                    Juruena
                    Kherlen
                    Krishna
                    Lachlan
                    Limpopo
                    Madeira
                    Mantaro
                    Marañón
                    Narmada
                    Niagara
                    Olenyok
                    Olyokma
                    Orinoco
                    Pechora
                    Salween
                    Sankuru
                    Selenge
                    Senegal
                    Songhua
                    Sukhona
                    Tapajós
                    Tsangpo
                    Ucayali
                    Uruguay
                    Vistula
                    Yangtze
                    Yenisei
                    Zambezi
                    Araguaia
                    Arkansas
                    Canadian
                    Chindwin
                    Cimarron
                    Colorado
                    Columbia
                    Dniester
                    Flinders
                    Georgina
                    Ghaghara
                    Godavari
                    Guaviare
                    Huallaga
                    Khatanga
                    Missouri
                    Okavango
                    Paraguay
                    Red Rock
                    Rukarara
                    Shebelle
                    Vychegda
                    Zhujiang
                    Allegheny
                    Amu Darya
                    Athabasca
                    Blue Nile
                    Chambeshi
                    Churchill
                    Condamine
                    Essequibo
                    Euphrates
                    Indigirka
                    Jefferson
                    Kuskokwim
                    Mackenzie
                    Magdalena
                    Pilcomayo
                    Syr Darya
                    Tennessee
                    Tocantins
                    Warburton
                    Ayeyarwady
                    Beaverhead
                    Cumberland
                    Kızılırmak
                    Nyabarongo
                    Río Grande
                    White Nile
                    Brahmaputra
                    Mississippi
                    Saint Clair
                    Saint Louis
                    Saint Marys
                    Yellowstone
                    Hell Roaring
                    Murrumbidgee
                    Saskatchewan
                    Yellow River
                    Madre de Dios
                    Shatt al-Arab
                    São Francisco
                    Sênggê Zangbo
                    Northern Dvina
                    Paraíba do Sul
                    Saint Lawrence
                    Río de la Plata
                    Seversky Donets
                    Grande de Santiago
                    Nizhnyaya Tunguska
                    North Saskatchewan
                    Podkamennaya Tunguska
                    

                    That’s all! Neat, isn’t it?
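For anyone who wants to check the last step outside Notepad++, here is the final S/R as a Python re sketch (strip_markers is a hypothetical name; re.MULTILINE makes ^ and $ apply per line). Because equal-length names share the same padding, the whole procedure should produce the same order as sorting on the key (length, name), provided no name starts with a space; the last line checks this on a small sample:

```python
import re

# Leading padding at the start of a line, or the "!" marker at its end.
MARKERS = re.compile(r"^\x20+|!$", re.MULTILINE)

def strip_markers(text):
    """Remove the alignment spaces and the trailing '!' from every line."""
    return MARKERS.sub("", text)

# Output of the sort step for a few names (equal width, right-aligned before "!").
ordered = ["Ob", "Ili", "Içá", "Amur", "Nile", "Amazon"]
sorted_text = "\n".join(name.rjust(6) + "!" for name in ordered)
result = strip_markers(sorted_text).split("\n")
print(result)

# Should match a direct sort on (length, name).
names = ["Nile", "Amazon", "Ob", "Amur", "Içá", "Ili"]
print(result == sorted(names, key=lambda s: (len(s), s)))  # True
```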

                    Best Regards,

                    guy038

                    • Coises

                      @Thomas-Knoefel

                      I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

                      I believe MultiReplace can be set up to find ^.*$ (as a regular expression) and replace it with set(string.len(MATCH).." "..MATCH).

                      Then Edit | Line Operations | Sort Lines As Integers Ascending will sort the lines by length, and then replacing ^\d+\x20 with nothing will remove the lengths.
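Outside Notepad++, the same prefix/sort/strip idea can be sketched in Python 3 to check the logic. This only illustrates the three steps; it is not MultiReplace's implementation, and the function name is hypothetical. One difference worth knowing: Python's len() counts characters, whereas Lua's string.len counts bytes, so if MultiReplace hands MATCH over as UTF-8 (an assumption I haven't verified), lines with accented characters such as Içá may be counted as longer than they look, the same problem reported with the earlier Python one-liner.

```python
import re


def sort_by_length_via_prefix(text):
    """Mimic the three steps: prefix with length, sort as integers, strip the prefix."""
    lines = text.splitlines()
    # Step 1: like replacing ^.*$ with "<length> <line>".
    # len() counts code points here (Lua's string.len would count bytes).
    prefixed = [f"{len(line)} {line}" for line in lines]
    # Step 2: sort numerically on the leading integer.
    # Python's sort is stable, so equal lengths keep their original order.
    prefixed.sort(key=lambda s: int(s.split(" ", 1)[0]))
    # Step 3: like replacing ^\d+\x20 with nothing (only the first prefix is removed,
    # so a line that itself starts with digits and a space survives intact).
    stripped = [re.sub(r"^\d+\x20", "", s) for s in prefixed]
    return "\n".join(stripped)


if __name__ == "__main__":
    print(sort_by_length_via_prefix("Yellowstone\nIçá\nTobol"))
```

I haven't checked whether Notepad++'s integer sort keeps equal-length lines in their original order the way Python's stable sort does.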

                      • Mark Olson
                        last edited by Mark Olson

                        With JsonTools v6.0 or higher, open a tree view for the document in REGEX mode and enter the query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
                        Hopefully the syntax is reasonably easy to understand: split the file on \r\n, sort the list of lines by string length, then set the document's text (@) to the result of joining the list back together with \r\n. (This assumes Windows-style \r\n line endings.)

                        This converts

                        abcdefg
                        ab
                        abcdefgh
                        a
                        abcdefghi
                        abcde
                        abcd
                        abc
                        

                        into

                        a
                        ab
                        abc
                        abcd
                        abcde
                        abcdefg
                        abcdefgh
                        abcdefghi
                        
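For readers without JsonTools, here is a rough Python 3 equivalent of that query, step for step, assuming Windows \r\n line endings by default. It is not JsonTools' implementation, and the function name is hypothetical.

```python
def sort_lines_by_length(text, sep="\r\n"):
    # s_split(@, `\r\n`): split the document into lines
    lines = text.split(sep)
    # sort_by(..., s_len(@)): order the lines by length (Python's sort is stable)
    lines.sort(key=len)
    # s_join(`\r\n`, ...): join the lines back into one string
    return sep.join(lines)


if __name__ == "__main__":
    sample = "\r\n".join(["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"])
    print(sort_lines_by_length(sample).split("\r\n"))
```

In this sketch, a file that ends with a line break produces an empty last element, which sorts to the top; I haven't checked whether JsonTools behaves the same way.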
                        • Coises
                          last edited by

                          In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

                          Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

                          Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

                          The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.
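To illustrate the point about width versus character count, here is a small Python 3 sketch of what "width" reduces to in the simple monospaced case, where a tab jumps to the next tab stop (a tab size of 4 is assumed). It is not how Columns++ works internally: Columns++ measures the rendered text, which this sketch cannot do for proportional fonts, elastic tabstops or double-width characters. The function names are hypothetical.

```python
def display_width(line, tab_size=4):
    # Monospaced-font assumption: each code point takes one column,
    # and a tab advances to the next multiple of tab_size.
    return len(line.expandtabs(tab_size))


def sort_by_width(lines, descending=False, tab_size=4):
    # Sort whole lines by their on-screen width under the assumptions above.
    return sorted(lines, key=lambda s: display_width(s, tab_size), reverse=descending)


if __name__ == "__main__":
    # "a\tb" has 3 characters but is 5 columns wide with 4-column tab stops.
    print(sort_by_width(["a\tb", "abcd", "ab"]))
```

Without tabs, display_width is just len(), which matches the remark that the results are the same as counting characters.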

                          • Mahmoud Madkour @Mark Olson
                            last edited by

                            @Mark-Olson, your proposed solution seems easy, but could you please elaborate on:
                            1- how to open the file in tree view
                            2- how to switch to REGEX mode to enter the query

                            Many thanks

                            • Mark Olson @Mahmoud Madkour
                              last edited by

                              @Mahmoud-Madkour
                              To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
                              Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.

                              The Community of users of the Notepad++ text editor.
                              Powered by NodeBB | Contributors