    File sorting

    • Alan KilbornA
      Alan Kilborn @guy038
      last edited by Alan Kilborn

      @guy038 said:

      Hey, guys, it’s not a competition, OK !

      Haha. No, definitely not. A support forum is about giving posters options for solving problems where there is no single clear answer. It seems we’ve done that so far in this thread! :)

      BTW, that was what I anticipated: A lot of manual steps. :)

      • guy038G
        guy038
        last edited by

        Hi, @dave-pruce, @alan-kilborn and All,

        My previous list of rivers contained 5 duplicate names :

        Red, Murray, Yenisei, Araguaia and Colorado

        But this is not important, regarding our problem, anyway !

        As you can see, @dave-pruce, Alan’s Python solution is neater ! Isn’t it ?


        Now, Alan, I’ve just tested your one-line script and, to my mind, there are two problems :

        • Inside a section of river names of the same length, the names are not sorted alphabetically !

        • Secondly, some names containing accented characters, for instance the Içá river, are located outside their section, as shown below :

        Snake
        Volta
        Tobol
        Jubba
        Içá
        Pecos
        Benue
        Iriri
        Lerma
        

        Cheers,

        guy038

        • Alan KilbornA
          Alan Kilborn @guy038
          last edited by

          @guy038 said:

          names are not sorted alphabetically

          This is outside the scope of the originally stated problem! :)

          containing accented characters…are located outside their section

          The Python len function is apparently simple-minded in this case (using a simple byte count for the length of these strings containing multibyte characters).
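
To make the byte-count issue concrete, here is a minimal standalone sketch in Python 3 (not PythonScript itself). It assumes the file is UTF-8 encoded and produces the bytes explicitly with `.encode`, since a Python 3 `str` already counts characters. The three names are taken from guy038's excerpt above.

```python
# Python 3 sketch: character count vs. UTF-8 byte count.
names = ["Snake", "Içá", "Pecos"]

for name in names:
    utf8 = name.encode("utf-8")
    # "Içá" has 3 characters but 5 bytes, because ç and á take 2 bytes each.
    print(name, len(name), len(utf8))

# Sorting by byte length leaves "Içá" among the 5-letter names.
by_bytes = sorted(names, key=lambda s: len(s.encode("utf-8")))
# Sorting by character length moves it ahead of them.
by_chars = sorted(names, key=len)
```

With a byte-based key, all three names compare as length 5, so the stable sort keeps their input order and "Içá" stays between "Snake" and "Pecos".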

          • Alan KilbornA
            Alan Kilborn @Alan Kilborn
            last edited by

            @Alan-Kilborn said:

            The Python len function is apparently simple-minded in this case

            Perhaps this new one-liner is better, for the case where the OP has Unicode data:

            editor.setText(['\r\n','\r','\n'][editor.getEOLMode()].join(sorted(editor.getText().splitlines(),key=lambda x:len(unicode(x,'utf-8')))))
            

            Of course, this still rests on the big assumption that the OP is using (or is willing to use) PythonScript! ;)
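
For readers who want both points raised by guy038 handled at once (length measured in characters, and alphabetical order inside each length group), here is a minimal sketch of the same idea in plain Python 3. It works on an ordinary string rather than on the Notepad++ `editor` object, and the function name `sort_by_length` is hypothetical, chosen only for this illustration.

```python
def sort_by_length(text, eol="\n"):
    """Sort lines by character count, then alphabetically within each length."""
    lines = text.splitlines()
    # A tuple key compares length first and falls back to the line itself on ties.
    return eol.join(sorted(lines, key=lambda line: (len(line), line)))


sample = "Snake\nVolta\nIçá\nOb\nBenue"
result = sort_by_length(sample)
print(result)
```

The tie-break uses plain code-point order, so accented letters sort after unaccented ones; a locale-aware collation would need a different key.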

            • guy038G
              guy038
              last edited by guy038

              @dave-pruce, @alan-kilborn,

              Yes, your new attempt, Alan, is the solution when working with UTF-8 encoded files, which may contain multi-byte encoded characters !

              As for me, I was thinking about the opposite solution : converting UTF-8 files to ANSI. However, with this solution, some characters may turn into question marks or be replaced with an approximate character, because they do not belong to the corresponding ANSI table of 256 characters !

              For instance, in my previous list of rivers, the Turkish Kızılırmak river, containing the Latin small letter dotless i, ı ( of code-point \x{0131} ), is changed into the approximate name Kizilirmak, after conversion to ANSI !
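
The loss described above can be reproduced in a small Python 3 sketch. It assumes the "ANSI" code page is Windows-1252 (`cp1252`); the actual page depends on the Windows locale. Note that Python's `errors="replace"` handler substitutes a question mark, whereas the conversion guy038 describes yielded the approximate letter `i`.

```python
# Python 3 sketch: converting UTF-8 text to an assumed ANSI code page (cp1252).
name = "Kızılırmak"

# U+0131 (dotless i) has no slot in cp1252, so each occurrence is replaced by "?".
ansi = name.encode("cp1252", errors="replace")
round_trip = ansi.decode("cp1252")
print(round_trip)

# Characters that do exist in cp1252 survive and take one byte each.
ica = "Içá".encode("cp1252")
print(len(ica))
```

So the ANSI route does make byte length equal character length, but only for names whose characters all exist in the chosen code page.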

              Anyway, we just did our best to solve the OP’s problem ;-))

              BR

              guy038

              • F
                freezer2022 @Dave Pruce
                last edited by freezer2022

                @Dave-Pruce said:

                Is it possible to sort a file by line length??

                Yes, though not natively: there is a Notepad++ plugin for it, Linesort v1.1 (available only for 32-bit Notepad++):

                https://webarchive.org/web/20200207125518/http://www.scout-soft.com/linesort/
                


                  • mkupperM
                    mkupper @guy038
                    last edited by

                    @guy038 You essentially did “programming” with a human computer doing the evaluations and flow control. :-)

                    That reminds me of the stories about the first computers. “Computer” was originally a human job title, for people who did the computing and also had to handle the flow control! As ways were found to do parts of the job, first by mechanical means and then by electronic ones, the resulting machines came to be known as computers.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, All,

                      Thanks to @mkupper, who recently posted a comment, and exactly three years later, I’m going to simplify the way to sort lines by length and, secondarily, by line contents, too !

                      Like in my previous post, I will use this list of rivers, below :

                      https://en.wikipedia.org/wiki/List_of_rivers_by_length

                      After removing some duplicates, we get an INPUT text of 238 river names :

                      Nile
                      White Nile
                      Kagera
                      Nyabarongo
                      Mwogo
                      Rukarara
                      Amazon
                      Ucayali
                      Tambo
                      Ene
                      Mantaro
                      Yangtze
                      Mississippi
                      Missouri
                      Jefferson
                      Beaverhead
                      Red Rock
                      Hell Roaring
                      Yenisei
                      Angara
                      Selenge
                      Ider
                      Yellow River
                      Ob
                      Irtysh
                      Río de la Plata
                      Paraná
                      Congo
                      Chambeshi
                      Amur
                      Argun
                      Kherlen
                      Lena
                      Mekong
                      Mackenzie
                      Slave
                      Peace
                      Finlay
                      Niger
                      Brahmaputra
                      Tsangpo
                      Murray
                      Darling
                      Culgoa
                      Balonne
                      Condamine
                      Tocantins
                      Araguaia
                      Volga
                      Indus
                      Sênggê Zangbo
                      Shatt al-Arab
                      Euphrates
                      Murat
                      Madeira
                      Mamoré
                      Caine
                      Rocha
                      Purús
                      Yukon
                      São Francisco
                      Syr Darya
                      Naryn
                      Salween
                      Saint Lawrence
                      Niagara
                      Detroit
                      Saint Clair
                      Saint Marys
                      Saint Louis
                      North
                      Nizhnyaya Tunguska
                      Danube
                      Breg
                      Zambezi
                      Vilyuy
                      Ganges
                      Hooghly
                      Padma
                      Amu Darya
                      Panj
                      Japurá
                      Nelson
                      Saskatchewan
                      Paraguay
                      Kolyma
                      Pilcomayo
                      Biya
                      Katun
                      Ishim
                      Juruá
                      Ural
                      Arkansas
                      Colorado
                      Olenyok
                      Dnieper
                      Aldan
                      Ubangi
                      Uele
                      Negro
                      Columbia
                      Zhujiang
                      Red
                      Ayeyarwady
                      Kasai
                      Ohio
                      Allegheny
                      Orinoco
                      Tarim
                      Xingu
                      Orange
                      Salado
                      Vitim
                      Tigris
                      Songhua
                      Tapajós
                      Don
                      Podkamennaya Tunguska
                      Pechora
                      Kama
                      Limpopo
                      Chulym
                      Guaporé
                      Indigirka
                      Snake
                      Senegal
                      Uruguay
                      Blue Nile
                      Churchill
                      Khatanga
                      Okavango
                      Volta
                      Beni
                      Platte
                      Tobol
                      Alazeya
                      Jubba
                      Shebelle
                      Içá
                      Magdalena
                      Han
                      Kura
                      Oka
                      Guaviare
                      Pecos
                      Murrumbidgee
                      Godavari
                      Río Grande
                      Belaya
                      Cooper
                      Barcoo
                      Marañón
                      Dniester
                      Benue
                      Ili
                      Warburton
                      Georgina
                      Sutlej
                      Yamuna
                      Vyatka
                      Fraser
                      Brazos
                      Liao
                      Lachlan
                      Yalong
                      Iguaçu
                      Olyokma
                      Northern Dvina
                      Sukhona
                      Krishna
                      Iriri
                      Narmada
                      Lomami
                      Ottawa
                      Lerma
                      Grande de Santiago
                      Elbe
                      Vltava
                      Zeya
                      Juruena
                      Rhine
                      Athabasca
                      Canadian
                      North Saskatchewan
                      Vistula
                      Bug
                      Vaal
                      Shire
                      Ogooué
                      Nen
                      Kızılırmak
                      Markha
                      Green
                      Milk
                      Chindwin
                      Sankuru
                      Wu
                      James
                      Kapuas
                      Desna
                      Helmand
                      Madre de Dios
                      Tietê
                      Vychegda
                      Sepik
                      Cimarron
                      Anadyr
                      Paraíba do Sul
                      Jialing
                      Liard
                      Cumberland
                      White
                      Huallaga
                      Kwango
                      Draa
                      Gambia
                      Tyung
                      Chenab
                      Yellowstone
                      Ghaghara
                      Huai
                      Aras
                      Chu
                      Seversky Donets
                      Bermejo
                      Fly
                      Kuskokwim
                      Tennessee
                      Oder
                      Warta
                      Aruwimi
                      Daugava
                      Gila
                      Loire
                      Essequibo
                      Khoper
                      Tagus
                      Flinders
                      
                      • At the end of the first line, we add space characters up to column 100

                      • Then, with a zero-width column selection at column 100 spanning all the lines, we insert an exclamation mark ( ! ) at the end of every line of the list :

                      => We get this temporary text ( I just listed the first lines and the last lines ) :

                      Nile                                                                                               !
                      White Nile                                                                                         !
                      Kagera                                                                                             !
                      Nyabarongo                                                                                         !
                      Mwogo                                                                                              !
                      Rukarara                                                                                           !
                      Amazon                                                                                             !
                      Ucayali                                                                                            !
                      Tambo                                                                                              !
                      Ene                                                                                                !
                      Mantaro                                                                                            !
                      Yangtze                                                                                            !
                      Mississippi                                                                                        !
                      Missouri                                                                                           !
                      ......                                                                                             !
                      ......                                                                                             !
                      ......                                                                                             !
                      ......                                                                                             !
                      Seversky Donets                                                                                    !
                      Bermejo                                                                                            !
                      Fly                                                                                                !
                      Kuskokwim                                                                                          !
                      Tennessee                                                                                          !
                      Oder                                                                                               !
                      Warta                                                                                              !
                      Aruwimi                                                                                            !
                      Daugava                                                                                            !
                      Gila                                                                                               !
                      Loire                                                                                              !
                      Essequibo                                                                                          !
                      Khoper                                                                                             !
                      Tagus                                                                                              !
                      Flinders                                                                                           !
                      
                      
                      • Now, we perform this regex S/R :

                        • SEARCH ^([\w -]+?)(\x20+)(?=!)

                        • REPLACE \2\1

                      => Again, we get this temporary text ( I just listed the first lines and the last lines ) :

                                                                                                                     Nile!
                                                                                                               White Nile!
                                                                                                                   Kagera!
                                                                                                               Nyabarongo!
                                                                                                                    Mwogo!
                                                                                                                 Rukarara!
                                                                                                                   Amazon!
                                                                                                                  Ucayali!
                                                                                                                    Tambo!
                                                                                                                      Ene!
                                                                                                                  Mantaro!
                                                                                                                  Yangtze!
                                                                                                              Mississippi!
                                                                                                                 Missouri!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                          Seversky Donets!
                                                                                                                  Bermejo!
                                                                                                                      Fly!
                                                                                                                Kuskokwim!
                                                                                                                Tennessee!
                                                                                                                     Oder!
                                                                                                                    Warta!
                                                                                                                  Aruwimi!
                                                                                                                  Daugava!
                                                                                                                     Gila!
                                                                                                                    Loire!
                                                                                                                Essequibo!
                                                                                                                   Khoper!
                                                                                                                    Tagus!
                                                                                                                 Flinders!
                      
                      • Then, we run the Edit > Line Operations > Sort Lines Lexicographically Ascending option

                      ==> Here is our sorted text ( I just listed the first lines and the last lines ) :

                                                                                                                       Ob!
                                                                                                                       Wu!
                                                                                                                      Bug!
                                                                                                                      Chu!
                                                                                                                      Don!
                                                                                                                      Ene!
                                                                                                                      Fly!
                                                                                                                      Han!
                                                                                                                      Ili!
                                                                                                                      Içá!
                                                                                                                      Nen!
                                                                                                                      Oka!
                                                                                                                      Red!
                                                                                                                     Amur!
                                                                                                                     Aras!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                                   ......!
                                                                                                             Saskatchewan!
                                                                                                             Yellow River!
                                                                                                            Madre de Dios!
                                                                                                            Shatt al-Arab!
                                                                                                            São Francisco!
                                                                                                            Sênggê Zangbo!
                                                                                                           Northern Dvina!
                                                                                                           Paraíba do Sul!
                                                                                                           Saint Lawrence!
                                                                                                          Río de la Plata!
                                                                                                          Seversky Donets!
                                                                                                       Grande de Santiago!
                                                                                                       Nizhnyaya Tunguska!
                                                                                                       North Saskatchewan!
                                                                                                    Podkamennaya Tunguska!
                      
                      • Finally, let’s run this last regex S/R

                        • SEARCH ^\x20+|!$

                        • REPLACE Leave EMPTY

                      => What remains is our expected OUTPUT text, sorted by line length and, within each length, alphabetically :

                      Ob
                      Wu
                      Bug
                      Chu
                      Don
                      Ene
                      Fly
                      Han
                      Ili
                      Içá
                      Nen
                      Oka
                      Red
                      Amur
                      Aras
                      Beni
                      Biya
                      Breg
                      Draa
                      Elbe
                      Gila
                      Huai
                      Ider
                      Kama
                      Kura
                      Lena
                      Liao
                      Milk
                      Nile
                      Oder
                      Ohio
                      Panj
                      Uele
                      Ural
                      Vaal
                      Zeya
                      Aldan
                      Argun
                      Benue
                      Caine
                      Congo
                      Desna
                      Green
                      Indus
                      Iriri
                      Ishim
                      James
                      Jubba
                      Juruá
                      Kasai
                      Katun
                      Lerma
                      Liard
                      Loire
                      Murat
                      Mwogo
                      Naryn
                      Negro
                      Niger
                      North
                      Padma
                      Peace
                      Pecos
                      Purús
                      Rhine
                      Rocha
                      Sepik
                      Shire
                      Slave
                      Snake
                      Tagus
                      Tambo
                      Tarim
                      Tietê
                      Tobol
                      Tyung
                      Vitim
                      Volga
                      Volta
                      Warta
                      White
                      Xingu
                      Yukon
                      Amazon
                      Anadyr
                      Angara
                      Barcoo
                      Belaya
                      Brazos
                      Chenab
                      Chulym
                      Cooper
                      Culgoa
                      Danube
                      Finlay
                      Fraser
                      Gambia
                      Ganges
                      Iguaçu
                      Irtysh
                      Japurá
                      Kagera
                      Kapuas
                      Khoper
                      Kolyma
                      Kwango
                      Lomami
                      Mamoré
                      Markha
                      Mekong
                      Murray
                      Nelson
                      Ogooué
                      Orange
                      Ottawa
                      Paraná
                      Platte
                      Salado
                      Sutlej
                      Tigris
                      Ubangi
                      Vilyuy
                      Vltava
                      Vyatka
                      Yalong
                      Yamuna
                      Alazeya
                      Aruwimi
                      Balonne
                      Bermejo
                      Darling
                      Daugava
                      Detroit
                      Dnieper
                      Guaporé
                      Helmand
                      Hooghly
                      Jialing
                      Juruena
                      Kherlen
                      Krishna
                      Lachlan
                      Limpopo
                      Madeira
                      Mantaro
                      Marañón
                      Narmada
                      Niagara
                      Olenyok
                      Olyokma
                      Orinoco
                      Pechora
                      Salween
                      Sankuru
                      Selenge
                      Senegal
                      Songhua
                      Sukhona
                      Tapajós
                      Tsangpo
                      Ucayali
                      Uruguay
                      Vistula
                      Yangtze
                      Yenisei
                      Zambezi
                      Araguaia
                      Arkansas
                      Canadian
                      Chindwin
                      Cimarron
                      Colorado
                      Columbia
                      Dniester
                      Flinders
                      Georgina
                      Ghaghara
                      Godavari
                      Guaviare
                      Huallaga
                      Khatanga
                      Missouri
                      Okavango
                      Paraguay
                      Red Rock
                      Rukarara
                      Shebelle
                      Vychegda
                      Zhujiang
                      Allegheny
                      Amu Darya
                      Athabasca
                      Blue Nile
                      Chambeshi
                      Churchill
                      Condamine
                      Essequibo
                      Euphrates
                      Indigirka
                      Jefferson
                      Kuskokwim
                      Mackenzie
                      Magdalena
                      Pilcomayo
                      Syr Darya
                      Tennessee
                      Tocantins
                      Warburton
                      Ayeyarwady
                      Beaverhead
                      Cumberland
                      Kızılırmak
                      Nyabarongo
                      Río Grande
                      White Nile
                      Brahmaputra
                      Mississippi
                      Saint Clair
                      Saint Louis
                      Saint Marys
                      Yellowstone
                      Hell Roaring
                      Murrumbidgee
                      Saskatchewan
                      Yellow River
                      Madre de Dios
                      Shatt al-Arab
                      São Francisco
                      Sênggê Zangbo
                      Northern Dvina
                      Paraíba do Sul
                      Saint Lawrence
                      Río de la Plata
                      Seversky Donets
                      Grande de Santiago
                      Nizhnyaya Tunguska
                      North Saskatchewan
                      Podkamennaya Tunguska
                      

                      That’s all ! Neat, isn’t it ?
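
For readers who would rather script it, the ordering in the list above (shortest names first, alphabetical within each length) can be sketched in plain Python with a two-part sort key. This is only an illustration: the function name is made up, and the comparison is by Unicode code point rather than by any locale-aware collation, so accented names may land differently than a dictionary would place them.

```python
def sort_by_length_then_name(names):
    """Sort names by character count, then by code point within each length."""
    # The key is a tuple: Python compares the length first and only
    # falls back to the name itself when two lengths are equal.
    return sorted(names, key=lambda name: (len(name), name))


rivers = ["Yenisei", "Red Rock", "Guaporé", "Araguaia", "Zambezi", "Colorado"]
for river in sort_by_length_then_name(rivers):
    print(river)
```

Because `len()` counts characters rather than bytes, a name such as Guaporé stays in the 7-character group.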

                      Best Regards,

                      guy038

                      • CoisesC
                        Coises
                        last edited by Coises

                        @Thomas-Knoefel

                        I received a feature request related to this post. It doesn’t quite feel like a good fit for Columns++ to me, but I think your MultiReplace plugin can assist in making this possible in a reasonable number of steps.

                        I believe MultiReplace can be set up to find ^.*$ and replace with set(string.len(MATCH).." "..MATCH), which prefixes each line with its length followed by a space.

                        Then Edit | Line operations | Sort Lines As Integers Ascending will sort the lines in order by length, and then ^\d+\x20 replaced with nothing would remove the lengths.
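
To see the idea outside the editor, the three steps above (prefix each line with its length, sort on that number, strip the prefix) can be sketched in plain Python. This only illustrates the technique and is not what MultiReplace or Notepad++ does internally; the function name is hypothetical. Note that `len()` counts characters, whereas Lua's `string.len` counts bytes, so the two may disagree on lines with accented characters.

```python
import re


def sort_by_length_prefix(text):
    """Decorate each line with its length, sort numerically, then undecorate."""
    lines = text.splitlines()
    # Step 1: prefix every line with its length and a single space.
    decorated = [f"{len(line)} {line}" for line in lines]
    # Step 2: sort on the leading integer (Python's sort is stable, so
    # lines of equal length keep their original relative order).
    decorated.sort(key=lambda s: int(s.split(" ", 1)[0]))
    # Step 3: remove the prefix, mirroring the ^\d+\x20 replacement.
    return "\n".join(re.sub(r"^\d+\x20", "", s) for s in decorated)


print(sort_by_length_prefix("Volta\nIçá\nNile\nYangtze"))
```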

                        • Mark OlsonM
                          Mark Olson
                          last edited by Mark Olson

                          With JsonTools v6.0 or higher, open the tree view for the document, go to REGEX mode, and enter the query @ = s_join(`\r\n`, sort_by(s_split(@, `\r\n`), s_len(@)))
                          Hopefully the syntax is reasonably easy to understand: split the file by \r\n, sort the list of lines by string length, then set the document’s text (@) to the result of joining the list back together with \r\n.

                          This converts

                          abcdefg
                          ab
                          abcdefgh
                          a
                          abcdefghi
                          abcde
                          abcd
                          abc
                          

                          into

                          a
                          ab
                          abc
                          abcd
                          abcde
                          abcdefg
                          abcdefgh
                          abcdefghi
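
For comparison, the same split, sort, and join sequence can be written in a few lines of Python. This is a rough equivalent for illustration, not JsonTools code; the function name and the `eol` parameter are assumptions made for the sketch.

```python
def sort_lines_by_length(text, eol="\r\n"):
    """Split text on eol, sort the lines by length, and join them again."""
    lines = text.split(eol)
    # key=len sorts by character count; the sort is stable, so lines of
    # equal length stay in their original order.
    lines.sort(key=len)
    return eol.join(lines)


sample = "\r\n".join(
    ["abcdefg", "ab", "abcdefgh", "a", "abcdefghi", "abcde", "abcd", "abc"]
)
print(sort_lines_by_length(sample).split("\r\n"))
```

In this Python version, a document that ends with a line ending yields an empty final item after the split, and that empty string sorts to the top.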
                          
                          • CoisesC
                            Coises
                            last edited by

                            In case anyone comes across this topic looking for a way to sort lines by length, Columns++ release 1.0.1 can do this.

                            Select Sort… from the Columns++ menu and let it enclose the entire document in a rectangular selection (or make your own selection first).

                            Use Whole lines, Ascending or Descending as desired, and Width. You can then sort on Entire column, unless you wish to use one of the other options.

                            The sort is based on the on-screen width of text in the current font. Columns++ is meant to deal with data in columns using tabs, including elastic tabstops and proportionally-spaced fonts; I found that using the width, rather than a count of characters, was the most consistent way to deal with all the variations in a way that makes intuitive sense for users. For files using monospaced fonts and no tabs, the results are the same as counting characters.
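
A small Python sketch can show why width and character count diverge once tabs are involved. It is only an approximation: Columns++ measures the rendered width in the current font, while this sketch assumes a monospaced font with fixed tab stops, and both `column_width` and `tab_size` are hypothetical names.

```python
def column_width(line, tab_size=8):
    """Approximate on-screen width in a monospaced font with fixed tab stops."""
    # expandtabs replaces each tab with enough spaces to reach the next stop.
    return len(line.expandtabs(tab_size))


lines = ["a\tb", "abcdef", "abc"]

# By character count, "a\tb" is 3 characters long and sorts first.
by_chars = sorted(lines, key=len)

# By width, the tab pushes "b" out to column 9, so "a\tb" sorts last.
by_width = sorted(lines, key=column_width)

print(by_chars)
print(by_width)
```

With no tabs in the text, `column_width` and `len` return the same number, which matches the monospaced, tab-free case described above.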

                            • Mahmoud MadkourM
                              Mahmoud Madkour @Mark Olson
                              last edited by

                              @Mark-Olson, your proposed solution looks very easy, but could you please elaborate on:
                              1. how to open the file in tree view
                              2. how to go to REGEX mode to enter the query

                              Many thanks

                              • Mark OlsonM
                                Mark Olson @Mahmoud Madkour
                                last edited by

                                @Mahmoud-Madkour
                                To open a tree view for a file in REGEX mode, just use the Regex search to JSON command from the JsonTools plugin menu.
                                Once the tree view is open, you can paste the query into the text box at the top right corner of the tree view, and click the Submit query button next to the text box.

                                The Community of users of the Notepad++ text editor.