    File sorting

    • Alan Kilborn @guy038

      @guy038 said:

      names are not sorted alphabetically

      This is outside the scope of the originally stated problem! :)

      containing accentuated characters…are located outside their section

      The Python len function is apparently simple-minded in this case (using a simple byte count for the length of these strings containing multibyte characters).
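A minimal sketch of the difference described above. It is written in Python 3 so that it runs standalone (an assumption; the PythonScript one-liners in this thread are Python 2), and the helper name `byte_and_char_len` is hypothetical. It compares the UTF-8 byte count of a name with its count of characters (code points).

```python
def byte_and_char_len(text):
    """Return (UTF-8 byte count, code point count) for a string."""
    return len(text.encode("utf-8")), len(text)


# A few names from the river list, with and without non-ASCII letters.
for name in ["Nile", "Paraná", "Kızılırmak"]:
    print(name, byte_and_char_len(name))
```

For pure ASCII names the two counts agree; for names with accented letters the byte count is larger, which is why such names ended up outside their length group.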

      • Alan Kilborn @Alan Kilborn

        @Alan-Kilborn said:

        The Python len function is apparently simple-minded in this case

        Perhaps this new one-liner is better, for the case where the OP has Unicode data:

        editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=lambda x:len(unicode(x,'utf-8')))))
        

        Of course, still big assumption that the OP is using (or is willing to use) Pythonscript! ;)
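For readers without PythonScript, here is a standalone sketch of the same idea in Python 3. It is not the one-liner above: the function name `sort_lines_by_length` is hypothetical, the line ending is guessed from the text instead of being read from `editor.getEOLMode()`, and ties within one length are broken by code point order, which is an addition of this sketch.

```python
def sort_lines_by_length(text):
    """Sort lines by character count, then by code point order within a length."""
    # Reuse the line ending found in the text; default to "\n" if there is none.
    if "\r\n" in text:
        eol = "\r\n"
    elif "\r" in text:
        eol = "\r"
    else:
        eol = "\n"
    lines = text.splitlines()
    # In Python 3, len() on a str counts code points, so no decode step is needed.
    return eol.join(sorted(lines, key=lambda line: (len(line), line)))


print(sort_lines_by_length("Paraná\r\nOb\r\nNile\r\nAmur"))
```

Like the one-liner, this drops a trailing line ending if the text had one.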

        • guy038

          @dave-pruce, @alan-kilborn,

          Yes, Alan, your new attempt is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters!

          As for me, I was thinking about the opposite solution: converting UTF-8 files to ANSI. However, with that approach, some characters may end up as question marks or be replaced with an approximate character, because they do not belong to the corresponding ANSI table of 256 characters!

          For instance, in my previous list of rivers, the Turkish Kızılırmak river, which contains the Latin small letter dotless i, ı (code point \x{0131}), is changed into the approximate name Kizilirmak after conversion to ANSI!
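A small sketch of why the ANSI route loses information, assuming "ANSI" means the Windows-1252 code page (the thread does not say which code page was in use) and using a hypothetical helper `to_ansi`. Python's `replace` error handler only produces question marks; the best-fit substitution to Kizilirmak reported above comes from the editor's conversion and is not reproduced here.

```python
def to_ansi(text, codepage="cp1252"):
    """Encode to a single-byte code page; unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")


# The dotless i (U+0131) has no slot in Windows-1252, while a-acute does.
print(to_ansi("Kızılırmak"))
print(to_ansi("Paraná"))
```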

          Anyway, we just did our best to solve the OP’s problem ;-))

          BR

          guy038

          • freezer2022 @Dave Pruce

            @Dave-Pruce said:

            Is it possible to sort a file by line length??

            Yes. Not natively, but there is a Notepad++ plugin for it, Linesort v1.1 (32-bit Notepad++ only):

            https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
            

            linesort.png

                • mkupper @guy038

                @guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

                That reminds me of the stories about the first computers. "Computer" was originally a human job title for people who did the computing and also had to handle the flow control! As ways were found to do parts of the job, first by mechanical and then by electronic means, the resulting machines came to be known as computers.

                • guy038

                  Hello, All,

                  Thanks to @mkupper, who recently posted a comment, and exactly three years later, I am going to simplify the way to sort by line length and, secondarily, by line contents too!

                  As in my previous post, I will use the list of rivers below:

                  https://en.wikipedia.org/wiki/List_of_rivers_by_length

                  After removing some duplicates, we get an INPUT text of 238 river names:

                  Nile
                  White Nile
                  Kagera
                  Nyabarongo
                  Mwogo
                  Rukarara
                  Amazon
                  Ucayali
                  Tambo
                  Ene
                  Mantaro
                  Yangtze
                  Mississippi
                  Missouri
                  Jefferson
                  Beaverhead
                  Red Rock
                  Hell Roaring
                  Yenisei
                  Angara
                  Selenge
                  Ider
                  Yellow River
                  Ob
                  Irtysh
                  Río de la Plata
                  Paraná
                  Congo
                  Chambeshi
                  Amur
                  Argun
                  Kherlen
                  Lena
                  Mekong
                  Mackenzie
                  Slave
                  Peace
                  Finlay
                  Niger
                  Brahmaputra
                  Tsangpo
                  Murray
                  Darling
                  Culgoa
                  Balonne
                  Condamine
                  Tocantins
                  Araguaia
                  Volga
                  Indus
                  Sênggê Zangbo
                  Shatt al-Arab
                  Euphrates
                  Murat
                  Madeira
                  Mamoré
                  Caine
                  Rocha
                  Purús
                  Yukon
                  São Francisco
                  Syr Darya
                  Naryn
                  Salween
                  Saint Lawrence
                  Niagara
                  Detroit
                  Saint Clair
                  Saint Marys
                  Saint Louis
                  North
                  Nizhnyaya Tunguska
                  Danube
                  Breg
                  Zambezi
                  Vilyuy
                  Ganges
                  Hooghly
                  Padma
                  Amu Darya
                  Panj
                  Japurá
                  Nelson
                  Saskatchewan
                  Paraguay
                  Kolyma
                  Pilcomayo
                  Biya
                  Katun
                  Ishim
                  Juruá
                  Ural
                  Arkansas
                  Colorado
                  Olenyok
                  Dnieper
                  Aldan
                  Ubangi
                  Uele
                  Negro
                  Columbia
                  Zhujiang
                  Red
                  Ayeyarwady
                  Kasai
                  Ohio
                  Allegheny
                  Orinoco
                  Tarim
                  Xingu
                  Orange
                  Salado
                  Vitim
                  Tigris
                  Songhua
                  Tapajós
                  Don
                  Podkamennaya Tunguska
                  Pechora
                  Kama
                  Limpopo
                  Chulym
                  Guaporé
                  Indigirka
                  Snake
                  Senegal
                  Uruguay
                  Blue Nile
                  Churchill
                  Khatanga
                  Okavango
                  Volta
                  Beni
                  Platte
                  Tobol
                  Alazeya
                  Jubba
                  Shebelle
                  Içá
                  Magdalena
                  Han
                  Kura
                  Oka
                  Guaviare
                  Pecos
                  Murrumbidgee
                  Godavari
                  Río Grande
                  Belaya
                  Cooper
                  Barcoo
                  Marañón
                  Dniester
                  Benue
                  Ili
                  Warburton
                  Georgina
                  Sutlej
                  Yamuna
                  Vyatka
                  Fraser
                  Brazos
                  Liao
                  Lachlan
                  Yalong
                  Iguaçu
                  Olyokma
                  Northern Dvina
                  Sukhona
                  Krishna
                  Iriri
                  Narmada
                  Lomami
                  Ottawa
                  Lerma
                  Grande de Santiago
                  Elbe
                  Vltava
                  Zeya
                  Juruena
                  Rhine
                  Athabasca
                  Canadian
                  North Saskatchewan
                  Vistula
                  Bug
                  Vaal
                  Shire
                  Ogooué
                  Nen
                  Kızılırmak
                  Markha
                  Green
                  Milk
                  Chindwin
                  Sankuru
                  Wu
                  James
                  Kapuas
                  Desna
                  Helmand
                  Madre de Dios
                  Tietê
                  Vychegda
                  Sepik
                  Cimarron
                  Anadyr
                  Paraíba do Sul
                  Jialing
                  Liard
                  Cumberland
                  White
                  Huallaga
                  Kwango
                  Draa
                  Gambia
                  Tyung
                  Chenab
                  Yellowstone
                  Ghaghara
                  Huai
                  Aras
                  Chu
                  Seversky Donets
                  Bermejo
                  Fly
                  Kuskokwim
                  Tennessee
                  Oder
                  Warta
                  Aruwimi
                  Daugava
                  Gila
                  Loire
                  Essequibo
                  Khoper
                  Tagus
                  Flinders
                  
                  • At the end of the first line, we add space characters up to column 100

                  • Then, with a zero-length selection at column 100, we insert an exclamation mark ( ! ) at the end of every line of the list:

                  => We get this temporary text (only the first and last lines are shown):

                  Nile                                                                                               !
                  White Nile                                                                                         !
                  Kagera                                                                                             !
                  Nyabarongo                                                                                         !
                  Mwogo                                                                                              !
                  Rukarara                                                                                           !
                  Amazon                                                                                             !
                  Ucayali                                                                                            !
                  Tambo                                                                                              !
                  Ene                                                                                                !
                  Mantaro                                                                                            !
                  Yangtze                                                                                            !
                  Mississippi                                                                                        !
                  Missouri                                                                                           !
                  ......                                                                                             !
                  ......                                                                                             !
                  ......                                                                                             !
                  ......                                                                                             !
                  Seversky Donets                                                                                    !
                  Bermejo                                                                                            !
                  Fly                                                                                                !
                  Kuskokwim                                                                                          !
                  Tennessee                                                                                          !
                  Oder                                                                                               !
                  Warta                                                                                              !
                  Aruwimi                                                                                            !
                  Daugava                                                                                            !
                  Gila                                                                                               !
                  Loire                                                                                              !
                  Essequibo                                                                                          !
                  Khoper                                                                                             !
                  Tagus                                                                                              !
                  Flinders                                                                                           !
                  
                  
                  • Now, we perform this regex S/R, which moves the padding spaces in front of each name:

                    • SEARCH ^([\w -]+?)(\x20+)(?=!)

                    • REPLACE \2\1

                  => Again, we get a temporary text (only the first and last lines are shown):

                                                                                                                 Nile!
                                                                                                           White Nile!
                                                                                                               Kagera!
                                                                                                           Nyabarongo!
                                                                                                                Mwogo!
                                                                                                             Rukarara!
                                                                                                               Amazon!
                                                                                                              Ucayali!
                                                                                                                Tambo!
                                                                                                                  Ene!
                                                                                                              Mantaro!
                                                                                                              Yangtze!
                                                                                                          Mississippi!
                                                                                                             Missouri!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                      Seversky Donets!
                                                                                                              Bermejo!
                                                                                                                  Fly!
                                                                                                            Kuskokwim!
                                                                                                            Tennessee!
                                                                                                                 Oder!
                                                                                                                Warta!
                                                                                                              Aruwimi!
                                                                                                              Daugava!
                                                                                                                 Gila!
                                                                                                                Loire!
                                                                                                            Essequibo!
                                                                                                               Khoper!
                                                                                                                Tagus!
                                                                                                             Flinders!
                  
                  • Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

                  ==> Here is our sorted text (only the first and last lines are shown):

                                                                                                                   Ob!
                                                                                                                   Wu!
                                                                                                                  Bug!
                                                                                                                  Chu!
                                                                                                                  Don!
                                                                                                                  Ene!
                                                                                                                  Fly!
                                                                                                                  Han!
                                                                                                                  Ili!
                                                                                                                  Içá!
                                                                                                                  Nen!
                                                                                                                  Oka!
                                                                                                                  Red!
                                                                                                                 Amur!
                                                                                                                 Aras!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                               ......!
                                                                                                         Saskatchewan!
                                                                                                         Yellow River!
                                                                                                        Madre de Dios!
                                                                                                        Shatt al-Arab!
                                                                                                        São Francisco!
                                                                                                        Sênggê Zangbo!
                                                                                                       Northern Dvina!
                                                                                                       Paraíba do Sul!
                                                                                                       Saint Lawrence!
                                                                                                      Río de la Plata!
                                                                                                      Seversky Donets!
                                                                                                   Grande de Santiago!
                                                                                                   Nizhnyaya Tunguska!
                                                                                                   North Saskatchewan!
                                                                                                Podkamennaya Tunguska!
                  
                  • Finally, let’s run this last regex S/R:

                    • SEARCH ^\x20+|!$

                    • REPLACE Leave EMPTY

                  => What remains is our expected OUTPUT text, sorted by line length:

                  Ob
                  Wu
                  Bug
                  Chu
                  Don
                  Ene
                  Fly
                  Han
                  Ili
                  Içá
                  Nen
                  Oka
                  Red
                  Amur
                  Aras
                  Beni
                  Biya
                  Breg
                  Draa
                  Elbe
                  Gila
                  Huai
                  Ider
                  Kama
                  Kura
                  Lena
                  Liao
                  Milk
                  Nile
                  Oder
                  Ohio
                  Panj
                  Uele
                  Ural
                  Vaal
                  Zeya
                  Aldan
                  Argun
                  Benue
                  Caine
                  Congo
                  Desna
                  Green
                  Indus
                  Iriri
                  Ishim
                  James
                  Jubba
                  Juruá
                  Kasai
                  Katun
                  Lerma
                  Liard
                  Loire
                  Murat
                  Mwogo
                  Naryn
                  Negro
                  Niger
                  North
                  Padma
                  Peace
                  Pecos
                  Purús
                  Rhine
                  Rocha
                  Sepik
                  Shire
                  Slave
                  Snake
                  Tagus
                  Tambo
                  Tarim
                  Tietê
                  Tobol
                  Tyung
                  Vitim
                  Volga
                  Volta
                  Warta
                  White
                  Xingu
                  Yukon
                  Amazon
                  Anadyr
                  Angara
                  Barcoo
                  Belaya
                  Brazos
                  Chenab
                  Chulym
                  Cooper
                  Culgoa
                  Danube
                  Finlay
                  Fraser
                  Gambia
                  Ganges
                  Iguaçu
                  Irtysh
                  Japurá
                  Kagera
                  Kapuas
                  Khoper
                  Kolyma
                  Kwango
                  Lomami
                  Mamoré
                  Markha
                  Mekong
                  Murray
                  Nelson
                  Ogooué
                  Orange
                  Ottawa
                  Paraná
                  Platte
                  Salado
                  Sutlej
                  Tigris
                  Ubangi
                  Vilyuy
                  Vltava
                  Vyatka
                  Yalong
                  Yamuna
                  Alazeya
                  Aruwimi
                  Balonne
                  Bermejo
                  Darling
                  Daugava
                  Detroit
                  Dnieper
                  Guaporé
                  Helmand
                  Hooghly
                  Jialing
                  Juruena
                  Kherlen
                  Krishna
                  Lachlan
                  Limpopo
                  Madeira
                  Mantaro
                  Marañón
                  Narmada
                  Niagara
                  Olenyok
                  Olyokma
                  Orinoco
                  Pechora
                  Salween
                  Sankuru
                  Selenge
                  Senegal
                  Songhua
                  Sukhona
                  Tapajós
                  Tsangpo
                  Ucayali
                  Uruguay
                  Vistula
                  Yangtze
                  Yenisei
                  Zambezi
                  Araguaia
                  Arkansas
                  Canadian
                  Chindwin
                  Cimarron
                  Colorado
                  Columbia
                  Dniester
                  Flinders
                  Georgina
                  Ghaghara
                  Godavari
                  Guaviare
                  Huallaga
                  Khatanga
                  Missouri
                  Okavango
                  Paraguay
                  Red Rock
                  Rukarara
                  Shebelle
                  Vychegda
                  Zhujiang
                  Allegheny
                  Amu Darya
                  Athabasca
                  Blue Nile
                  Chambeshi
                  Churchill
                  Condamine
                  Essequibo
                  Euphrates
                  Indigirka
                  Jefferson
                  Kuskokwim
                  Mackenzie
                  Magdalena
                  Pilcomayo
                  Syr Darya
                  Tennessee
                  Tocantins
                  Warburton
                  Ayeyarwady
                  Beaverhead
                  Cumberland
                  Kızılırmak
                  Nyabarongo
                  Río Grande
                  White Nile
                  Brahmaputra
                  Mississippi
                  Saint Clair
                  Saint Louis
                  Saint Marys
                  Yellowstone
                  Hell Roaring
                  Murrumbidgee
                  Saskatchewan
                  Yellow River
                  Madre de Dios
                  Shatt al-Arab
                  São Francisco
                  Sênggê Zangbo
                  Northern Dvina
                  Paraíba do Sul
                  Saint Lawrence
                  Río de la Plata
                  Seversky Donets
                  Grande de Santiago
                  Nizhnyaya Tunguska
                  North Saskatchewan
                  Podkamennaya Tunguska
                  

                  That’s all ! Neat, isn’t it ?

                  Best Regards,

                  guy038
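The pad, sort, and strip steps of the post above can be sketched in Python 3 as follows. This is an illustration, not the procedure itself: the function name `sort_by_right_justifying` is hypothetical, and it assumes every line is shorter than `width`, has no leading spaces, and contains no characters that sort before a space.

```python
def sort_by_right_justifying(lines, width=100):
    """Sort by length using padding and a plain lexicographic sort."""
    # Right-justify each line and append the "!" marker, so that shorter
    # lines carry more leading spaces.
    padded = [line.rjust(width) + "!" for line in lines]
    # A space sorts before any letter, so lines with more padding (shorter
    # names) come first; equal-length names are then ordered by content.
    padded.sort()
    # Remove the trailing marker and the leading padding.
    return [p[:-1].lstrip(" ") for p in padded]


print(sort_by_right_justifying(["Nile", "White Nile", "Ob", "Amur", "Nyabarongo"]))
```

Because the comparison falls through to the name itself once the padding is equal, the length sort and the secondary sort by contents come out of a single pass.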

                  • Coises

                    @Thomas-Knoefel

                    I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

                    I believe multi-replace can be set up to find ^.*$ and replace with set(string.len(MATCH).." "..MATCH).

                    Then Edit | Line operations | Sort Lines As Integers Ascending will sort the lines in order by length, and then ^\d+\x20 replaced with nothing would remove the lengths.
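The three steps just described (prefix each line with its length, sort on that integer, strip the prefix) amount to a decorate, sort, undecorate pattern. Here is a hedged Python 3 sketch with a hypothetical function name; it counts code points with `len()`, and whether MultiReplace's `string.len` counts bytes or characters is not stated in this thread. Python's sort is stable, so lines of equal length keep their original order here, which may differ from how Notepad++ orders ties.

```python
import re


def sort_by_length_prefix(lines):
    """Decorate each line with its length, sort numerically, then undecorate."""
    # Prefix every line with its length and a single space.
    decorated = ["{} {}".format(len(line), line) for line in lines]
    # Sort on the leading integer only.
    decorated.sort(key=lambda s: int(s.split(" ", 1)[0]))
    # Strip the numeric prefix again, mirroring the ^\d+\x20 replacement.
    return [re.sub(r"^\d+\x20", "", s) for s in decorated]


print(sort_by_length_prefix(["abcdefg", "ab", "abcdefgh", "a"]))
```

Only the first run of digits and the space after it are removed, so a line that itself starts with a number survives intact.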

                    • Mark Olson

                      With JsonTools v6.0 or higher, open the tree view for the document, go to REGEX mode, and enter the query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
                      Hopefully the syntax is reasonably easy to understand: split the file by \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of string-joining the list back together with \r\n.

                      This converts

                      abcdefg
                      ab
                      abcdefgh
                      a
                      abcdefghi
                      abcde
                      abcd
                      abc
                      

                      into

                      a
                      ab
                      abc
                      abcd
                      abcde
                      abcdefg
                      abcdefgh
                      abcdefghi
                      
                      • Coises

                        In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

                        Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

                        Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

                        The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.

                        1 Reply Last reply Reply Quote 4
                        • Mahmoud MadkourM
                          Mahmoud Madkour @Mark Olson
                          last edited by

                          @Mark-Olson, your proposed solution seems very easy, but can you please elaborate on:
                          1- how to open the file in tree view
                          2- how to go to REGEX mode to enter the query

                          Many thanks

                          • Mark Olson @Mahmoud Madkour

                            @Mahmoud-Madkour
                            To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
                            Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.
