Sort a list of filenames alphabetically but grouped by their extension



  • I have a text file which goes like this:

    apple.c
    orange.hpp
    plum.h
    mango.c
    banana.cpp
    grapes.hpp

    I want the above text sorted alphabetically but grouped by extension, i.e.,

    apple.c
    mango.c
    banana.cpp
    plum.h
    grapes.hpp
    orange.hpp

    How can I do this in Notepad++?



  • Hello, Anu Anand Premji,

    Very easily, indeed ! You just need two search/replace ( S/R ) operations in regex mode and a classical sort :-))

    So, from your original text, below :

    apple.c
    orange.hpp
    plum.h
    mango.c
    banana.cpp
    grapes.hpp
    
    • First, move back to the very beginning of your file, containing a list of filenames ( Ctrl + Home )

    • Open the Replace dialog ( Ctrl + H )

    • Select the Regular expression search mode

    • Fill in the two fields Find what: and Replace with: with the values below :

    SEARCH (?-s)^.+\.(.+)

    REPLACE \1\t\t$0

    • Click on the Replace all button

    => You should obtain the changed text, below :

    c		apple.c
    hpp		orange.hpp
    h		plum.h
    c		mango.c
    cpp		banana.cpp
    hpp		grapes.hpp
    
    • Now, perform a classical sort with the menu option Edit > Line operations > Sort Lines Lexicographically Ascending

    => We get the sorted text, below :

    c		apple.c
    c		mango.c
    cpp		banana.cpp
    h		plum.h
    hpp		grapes.hpp
    hpp		orange.hpp
    
    • Again, open the Replace dialog, type the regex below and click on the Replace all button :

    SEARCH (?-s)^.+\t+

    REPLACE Leave the field EMPTY

    Yeah ! Here is our final text, sorted as expected :-))

    apple.c
    mango.c
    banana.cpp
    plum.h
    grapes.hpp
    orange.hpp
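
    For anyone who would rather script this than click through the dialogs, the same three steps ( prefix each line with its extension, sort, strip the prefix ) can be sketched in Python. This is only an illustration under a few assumptions: the variable names are made up, the sample text is the list from the question, and Python's re module stands in for the Notepad++ regex engine. The (?-s) modifier is left out because Python's dot does not match a newline by default, and \g<0> plays the role of $0.

```python
import re

# Sample input: the list from the question (assumed to be one filename per line).
text = "apple.c\norange.hpp\nplum.h\nmango.c\nbanana.cpp\ngrapes.hpp"

# Step 1: prefix each line with its extension and two tabs.
# \1 is the captured extension, \g<0> is the whole match (like $0 in Notepad++).
decorated = re.sub(r"^.+\.(.+)", r"\1\t\t\g<0>", text, flags=re.MULTILINE)

# Step 2: plain lexicographic sort of the prefixed lines.
sorted_lines = sorted(decorated.split("\n"))

# Step 3: delete everything up to and including the tabs.
result = re.sub(r"^.+\t+", "", "\n".join(sorted_lines), flags=re.MULTILINE)

print(result)
```

    The tab sorts before any letter, which is why the "c" lines end up ahead of the "cpp" line and the "h" line ahead of the "hpp" lines.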
    

    I’ve got to be out for a while ! But, when I’m home, I’ll add some explanations of my regexes, in case you’re not acquainted with regular expressions !

    Best Regards,

    guy038



  • Wow!!! That worked perfectly!! Thank you!!
    Saved me a lot of work!!



  • Hi Anu Anand Premji and All,

    Well, I’m back home and here are some explanations on the two S/R :

    • Both search regexes begin with the modifier (?-s) ( No PCRE_DOTALL ), which ensures that the dot special character . matches only a single standard character and never an EOL character

    • Then the part ^.+\. matches, from the beginning of the current line ^, the longest possible NON-empty range of standard characters, followed by a literal dot ( which must be escaped with a backslash \ in order to be taken literally ). Because .+ is greedy, that literal dot is the last dot of the line. So, if the filename were, for instance, “abc.def.ghi.cpp”, the .+ part would match the string abc.def.ghi

    • The remainder of the search regex (.+) catches the remaining standard characters of the current line ( the extension part ), which are stored as group 1, thanks to the pair of parentheses

    • In the replacement, \1\t\t$0 rewrites group 1 ( the extension part ), then two tabulation characters and, finally, the whole match $0, that is to say the complete name of each file, with its extension

    • I preferred to use the tabulation character, instead of some spaces, because, with the tabulation character, the complete filenames look all aligned :-)

    • In the second S/R, the part ^.+ selects, from the beginning of each line ^, the longest possible NON-empty range of standard characters that is still followed by a non-empty range of tabulation characters \t+

    • As the replacement field is empty this time, all these characters, tabulations included, are simply deleted !
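
    To see the greedy behaviour of the first regex on a name with several dots, here is a small Python sketch. The helper name split_extension is hypothetical, and an extra pair of parentheses is added around the first .+ only so that both parts can be inspected; it is not part of the original search regex.

```python
import re

def split_extension(filename):
    """Return (name, extension) using the same pattern idea as the first S/R.

    The first .+ is greedy, so it runs up to the LAST dot of the line.
    Returns None when the line contains no "name.extension" shape.
    """
    match = re.match(r"^(.+)\.(.+)", filename)
    return match.groups() if match else None

print(split_extension("abc.def.ghi.cpp"))  # ('abc.def.ghi', 'cpp')
print(split_extension("Makefile"))         # None
```

    A line without a dot, such as Makefile here, does not match at all, so the first Replace all would leave such a line unchanged.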

    Cheers,

    guy038

