Sorting multiple chunks of non-connected text



  • How do I sort disparate chunks of text across a large document?
    Is a macro going to be the best option?

    Bounds would be (everything between):
    Member of groups:

    to


    #########################

    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R


    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    SPURCHASE_R
    RDOC_R
    RDOC_RW
    SRECEIVE_RWX
    RDOC_RWX
    SRECEIVE_R


    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R


    Name: grt
    Comment: name
    Member of groups:
    TRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    Lease_R


    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R


    Name: jkl1
    Comment: name
    Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R




  • I would do it in a three-step process:

    1. Search and replace (with Search Mode set to Regular expression)

      • Find What : \R(?!^-)

        • find every end-of-line that is not followed by a - at the beginning of the next line. If you need to allow some non-divider lines to start with a hyphen, you can put in however many hyphens your dividers really have; the divider got rendered as a horizontal rule in the forum, so I cannot tell exactly how many hyphens you had on each divider line.
      • Replace with : ☺

        • pick some character that doesn’t occur elsewhere in your document. You might want to pick a Unicode character that’s not likely to occur in your text (I picked the smiley ☺), just to avoid collisions with common ASCII characters.
      • Replace All

    2. Edit > Line Operations > Sort Lines Lexicographically Ascending

    3. Search and replace

      • Find What : ☺

      • Replace With : \r\n

      • Replace All

    If you wanted to sort by something other than name, you’ll have to do some fancier manipulation in the original search-and-replace to easily sort it inside Notepad++. In theory, you could also use the PythonScript plugin to do more manipulation.

    Personally, if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language that you’re familiar with (Perl is my go-to for text manipulation, but nowadays, more people know Python than Perl – and, as I mentioned, there is a PythonScript plugin that will allow you to keep things “inside” Notepad++): then you could parse the document into a meaningful structure, and then deal with the structured data, to sort with or rearrange, as you please.
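As a rough illustration of the "parse into a meaningful structure" idea, here is a minimal sketch in plain Python (run outside Notepad++). It assumes the layout from the original question: records start with a "Name:" line and every line after "Member of groups:" is a group. The helper names `parse_records` and `format_records`, and the choice to order users by how many groups they have, are made up for this example.

```python
SAMPLE = """\
Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
"""

def parse_records(text):
    """Return one dict per "Name:" record (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # blank lines only separate records
        if line.startswith("Name:"):
            records.append({"name": line[len("Name:"):].strip(),
                            "comment": "", "groups": []})
        elif not records:
            continue                      # ignore anything before the first record
        elif line.startswith("Comment:"):
            records[-1]["comment"] = line[len("Comment:"):].strip()
        elif line.lower().startswith("member of groups:"):
            continue                      # header line, carries no data
        else:
            records[-1]["groups"].append(line)
    return records

def format_records(records):
    """Write the records back out in the original layout."""
    blocks = []
    for rec in records:
        lines = ["Name: " + rec["name"],
                 "Comment: " + rec["comment"],
                 "Member of groups:"]
        lines.extend(rec["groups"])
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

records = parse_records(SAMPLE)
# Example of sorting by something other than the name:
# users with the most groups first, ties broken by name.
records.sort(key=lambda rec: (-len(rec["groups"]), rec["name"]))
print(format_records(records))
```

Once the data is in a list of dicts, any other ordering is just a different `key` function. Note that this sketch would treat a divider line as a group, so dividers would need their own rule.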

    Or, if you don’t want to learn another language, you might do a search-and-replace to get it into an official CSV (being careful of lines that contain commas, and merging all the members of the groups with a different delimiter than you use between fields), then open it with your favorite spreadsheet or database, and sort on the appropriate column(s).

    For details on regular expressions, this FAQ Desk entry points you to plenty of good sites for reference material.



  • Thank you Peter. :)

    It was actually 80 underscores, not dashes. But I do see now that they show as a page break.
    Also, what didn’t show up was that each of the items I wanted to sort had 5 spaces in front of them:

    Name: jkl1
    Comment: name
    Member of groups:
        TRAVEL_R
        STRAVEL_R
        PDOC_R
        SPURCHASE_R
        VOUCHER_R
        SVOUCHER_R
        RDOC_R
        SRECEIVE_R
        CONTRACT_R
    _______________________________________________________________________________
    

    The above suggestion caused all groups of the same name to move together and moved all the users.
    I apologize for not stating this better earlier: I was hoping to maintain each user’s own groups, but sorted alphabetically down a long list.
    Each “Name” block needs to remain as:

    Name:
    Comment:
    Member Of Groups:
    A
    B
    C
    ...
    _______________________________________________________________________________


  • Hmm, in my experiments (with a hyphen separator), it did keep the indenting, I thought. Maybe I’m wrong.

    This does the sorting that I think you wanted, but with the underscore separators, and I have confirmed indentation stayed consistent:

    1. Find ([^_])\r\n Replace \1☺ (this assumes Windows CRLF line endings: the generic \R matched too much)
    2. Sort (Edit > Line Operations > Sort Lines Lexicographically Ascending)
    3. Find ☺ Replace \r\n

    So

    Name: jkl1
    Comment: name
    Member of groups:
        TRAVEL_R
        STRAVEL_R
        PDOC_R
    _______________________________________________________________________________
    Name: abc3
    Comment: name
    Member of groups:
        SPURCHASE_R
        VOUCHER_R
        SVOUCHER_R
    _______________________________________________________________________________
    Name: xyz9
    Comment: name
    Member of groups:
        RDOC_R
        SRECEIVE_R
        CONTRACT_R
    _______________________________________________________________________________
    Name: abc1
    Comment: name
    Member of groups:
        TRAVEL_R
        VOUCHER_R
        SVOUCHER_R
    _______________________________________________________________________________
    

    became

    Name: abc1
    Comment: name
    Member of groups:
        TRAVEL_R
        VOUCHER_R
        SVOUCHER_R
    _______________________________________________________________________________
    Name: abc3
    Comment: name
    Member of groups:
        SPURCHASE_R
        VOUCHER_R
        SVOUCHER_R
    _______________________________________________________________________________
    Name: jkl1
    Comment: name
    Member of groups:
        TRAVEL_R
        STRAVEL_R
        PDOC_R
    _______________________________________________________________________________
    Name: xyz9
    Comment: name
    Member of groups:
        RDOC_R
        SRECEIVE_R
        CONTRACT_R
    _______________________________________________________________________________
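For anyone who wants to see the mechanics of the three steps above outside the editor, here is a small sketch in plain Python. It only approximates what Notepad++ does: Python's `sorted` order may differ from the editor's lexicographic sort for some characters, and the function name `sort_records` is invented for the example. Like the regex above, it assumes CRLF line endings and that only divider lines end in an underscore.

```python
import re

PLACEHOLDER = "\u263a"  # the smiley; any character absent from the text works

def sort_records(text):
    # Step 1: glue each record onto one line by replacing every CRLF that
    # does not follow an underscore with the placeholder.
    joined = re.sub(r"([^_])\r\n", r"\1" + PLACEHOLDER, text)
    # Step 2: sort the remaining lines (one per record).
    lines = sorted(ln for ln in joined.split("\r\n") if ln)
    # Step 3: turn the placeholders back into line breaks.
    return "\r\n".join(lines).replace(PLACEHOLDER, "\r\n") + "\r\n"

sample = (
    "Name: jkl1\r\nComment: name\r\nMember of groups:\r\n"
    "    TRAVEL_R\r\n    PDOC_R\r\n"
    "________\r\n"
    "Name: abc3\r\nComment: name\r\nMember of groups:\r\n"
    "    VOUCHER_R\r\n"
    "________\r\n"
)
result = sort_records(sample)
print(result)
```

Because each record becomes a single line that starts with "Name: ...", the sort orders whole records by name and leaves the group lines inside each record in their original order.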
    

    … but now I’m confused: Do you want to alphabetize by name? Or do you want to leave the records all in the same order, and just sort the group names within an individual record? Or do you want both (i.e., records going alphabetical by name, followed by groups going alphabetical inside its “member of groups” section)?

    If what I’ve shown doesn’t give what you want, then show at least two records (sets of lines) with at least two groups listed in each record, and show a before and after scenario. (And make sure you include any exceptions or edge cases you think might crop up)

    That said, if you want to alphabetize only the group lines within a record, or something more complicated, I will reiterate: “if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language…”. Of course, @guy038 can probably work his magic on even something more complicated, but you’ll have to give a sufficient example to include all edge cases.



  • Hello, @jeff-test,

    Do you expect an output text like below, where each block of member names, containing the _ character, is sorted ?

    #########################
    
    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    VOUCHER_R
    
    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    RDOC_RW
    RDOC_RWX
    SPURCHASE_R
    SRECEIVE_R
    SRECEIVE_RWX
    
    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R
    
    Name: grt
    Comment: name
    Member of groups:
    CONTRACT_R
    Lease_R
    PDOC_R
    RDOC_R
    SPURCHASE_R
    SRECEIVE_R
    SVOUCHER_R
    TRAVEL_R
    VOUCHER_R
    
    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    VOUCHER_R
    
    Name: jkl1
    Comment: name
    Member of groups:
    CONTRACT_R
    PDOC_R
    RDOC_R
    SPURCHASE_R
    SRECEIVE_R
    STRAVEL_R
    SVOUCHER_R
    TRAVEL_R
    VOUCHER_R
    

    See you later,

    Best regards,

    guy038



  • Yes. It doesn’t matter what order the blocks are in, as long as each user remains with their listed access. However, each little list of ‘groups’ needs to be sorted, as @guy038 has shown.



  • So, assuming you have the Python Script Plugin

    1. Create the script: Plugins > Python Script > New Script

    2. Paste the script, and save with meaningful name: sortGroups.py

       #https://notepad-plus-plus.org/community/topic/15971/sorting-multiple-chunks-of-non-connected-text/4
       # works on active view / file
       import sys
       from Npp import *
      
       console.clear()
       #console.show()
      
       # for debug, change view and document index
       #i = notepad.getCurrentDocIndex(1)
       #notepad.activateIndex(1,i)
      
       keepGoing = True
      
       editor.documentEnd()               # go to the last position
       end2 = editor.getCurrentPos()      # record the position
       #console.write("editor.end = " + str(end2)+"\n")
       editor.documentStart()             # back to the beginning
       start2 = editor.getCurrentPos()    # record the position
       #console.write("editor.start = " + str(start2)+"\n")
      
       while keepGoing:
           # find the next "groups:" header at the end of a line
           position = editor.findText( FINDOPTION.REGEXP, start2, end2, "groups:$")
           if position is None:
               break
           #console.write("editor: findText @ " + str(position[0]) + ":" + str(position[1]) + "\n")
      
           # find ___ (or EOF)
           underscore = editor.findText( FINDOPTION.REGEXP, position[1], end2, "^_+$")
           if underscore is None:
               keepGoing = False
               underscore = (end2, end2)
      
           # select the text
           #console.write("editor: findText @ " + str(underscore[0]) + ":" + str(underscore[1]) + "\n")
           editor.setSelectionStart(position[1]+2)    # start after the newline (+2 assumes CRLF line endings)
           editor.setSelectionEnd(underscore[0]-2)    # end before the newline (-2 assumes CRLF line endings)
      
           # okay, now the first match is highlighted... need to run the Edit > Line Operations > Sort Lines Lexicographically Ascending...
           # maybe notepad.menuCommand() or notepad.runMenuCommand()
           notepad.menuCommand(42059)  # got from https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
           #console.write(str(keepGoing))
      
           # start at the end of the last group
           start2 = underscore[0]
      
       # want nothing selected at end
       editor.clearSelections()
      
    3. Input file (make sure this is the active window / view / document in Notepad++)

       Name: jkl1
       Comment: name
       Member of groups:
           STRAVEL_R
           PDOC_R
           TRAVEL_R
       _______________________________________________________________________________
       Name: abc3
       Comment: name
       Member of groups:
           SPURCHASE_R
           VOUCHER_R
           SVOUCHER_R
       _______________________________________________________________________________
       Name: xyz9
       Comment: name
       Member of groups:
           RDOC_R
           SRECEIVE_R
           CONTRACT_R
       _______________________________________________________________________________
       Name: abc1
       Comment: name
       Member of groups:
           TRAVEL_R
           VOUCHER_R
           SVOUCHER_R
       _______________________________________________________________________________
      
    4. Plugins > Python Script > Scripts > sortGroups.py

    5. Result:

       Name: jkl1
       Comment: name
       Member of groups:
           PDOC_R
           STRAVEL_R
           TRAVEL_R
       _______________________________________________________________________________
       Name: abc3
       Comment: name
       Member of groups:
           SPURCHASE_R
           SVOUCHER_R
           VOUCHER_R
       _______________________________________________________________________________
       Name: xyz9
       Comment: name
       Member of groups:
           CONTRACT_R
           RDOC_R
           SRECEIVE_R
       _______________________________________________________________________________
       Name: abc1
       Comment: name
       Member of groups:
           SVOUCHER_R
           TRAVEL_R
           VOUCHER_R
       _______________________________________________________________________________
      

    I think this does what you want.
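If the PythonScript plugin is not available, the same per-record sort can be sketched in plain Python and run on a saved copy of the file. This is only an illustration under a few assumptions: a group list starts after a line ending in "groups:" and stops at a divider made only of underscores, a blank line, or the next "Name:" line. The function name `sort_group_lines` is hypothetical, and line endings are normalized to "\n" on output.

```python
def sort_group_lines(text):
    """Sort the lines between each "groups:" header and the next divider."""
    out = []          # lines of the result
    pending = []      # group lines collected for the current record
    in_groups = False
    for line in text.splitlines():
        stripped = line.strip()
        is_divider = stripped != "" and set(stripped) == {"_"}
        if in_groups and (is_divider or stripped == ""
                          or stripped.startswith("Name:")):
            out.extend(sorted(pending))   # flush the sorted group list
            pending = []
            in_groups = False
        if in_groups:
            pending.append(line)
        else:
            out.append(line)
            if stripped.endswith("groups:"):
                in_groups = True
    out.extend(sorted(pending))           # the last record may lack a divider
    return "\n".join(out) + "\n"

sample = (
    "Name: jkl1\n"
    "Comment: name\n"
    "Member of groups:\n"
    "    STRAVEL_R\n"
    "    PDOC_R\n"
    "    TRAVEL_R\n"
    "________\n"
    "Name: abc3\n"
    "Comment: name\n"
    "Member of groups:\n"
    "    SPURCHASE_R\n"
    "    VOUCHER_R\n"
    "    SVOUCHER_R\n"
    "________\n"
)
sorted_text = sort_group_lines(sample)
print(sorted_text)
```

The records stay in their original order; only the indented group lines inside each one are rearranged.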



  • Hi, @jeff-test, @peterjones and All,

    Sorry for this late reply, but I was with my sister and brother-in-law, a couple of days !


    So, here is, below, my solution, just using regex S/R and a N++ sort ! Of course, the Python script, from Peter is easier to use and, probably, quicker than all my stuff ! Just one advantage : this method can be processed with a minimalist N++ package Ah, Ah -;))

    So, starting with this sample text, below, in a new N++ tab :

    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R
    
    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    SPURCHASE_R
    RDOC_R
    RDOC_RW
    SRECEIVE_RWX
    RDOC_RWX
    SRECEIVE_R
    
    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R
    
    Name: grt
    Comment: name
    Member of groups:
    TRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    Lease_R
    
    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R
    
    Name: jkl1
    Comment: name
    Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    

    Firstly, add, if necessary, an empty line at the very end of the list


    Now, if we use the following regex S/R, which copies the name of each user block, preceded by the # symbol, in a new line, located after the corresponding block :

    SEARCH (?s-i)^Name:\x20(.+?\R).+?\R(?=\R|\z)

    REPLACE $0#\1

    we get the text :

    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R
    #asd0
    
    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    SPURCHASE_R
    RDOC_R
    RDOC_RW
    SRECEIVE_RWX
    RDOC_RWX
    SRECEIVE_R
    #wr3
    
    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R
    #ft3
    
    Name: grt
    Comment: name
    Member of groups:
    TRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    Lease_R
    #grt
    
    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R
    #asdq
    
    Name: jkl1
    Comment: name
    Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    #jkl1
    

    Then, using this second regex S/R, which adds a prefix to each line of the list, depending on its type :

    SEARCH (?-si)^((?:(Name:)|(Comment:)|(Member\x20)|(.+_)).+)(?=(?:(?s).+?)#(.+))|^#(.+)\R?

    REPLACE (?2\6_1_\1)(?3\6_2_\1)(?4\6_3_\1)(?5\6_4_\1)(?7\7_5_)

    we obtain :

    asd0_1_Name: asd0
    asd0_2_Comment: name
    asd0_3_Member of groups:
    asd0_4_PDOC_R
    asd0_4_VOUCHER_R
    asd0_4_RDOC_R
    asd0_5_
    wr3_1_Name: wr3
    wr3_2_Comment: name
    wr3_3_Member of groups:
    wr3_4_PDOC_R
    wr3_4_SPURCHASE_R
    wr3_4_RDOC_R
    wr3_4_RDOC_RW
    wr3_4_SRECEIVE_RWX
    wr3_4_RDOC_RWX
    wr3_4_SRECEIVE_R
    wr3_5_
    ft3_1_Name: ft3
    ft3_2_Comment: name
    ft3_3_Member of groups:
    ft3_4_PDOC_R
    ft3_4_RDOC_R
    ft3_4_SORMPURCHASING_R
    ft3_4_SORMRECEIVING_R
    ft3_4_SORMVOUCHERS_R
    ft3_4_VOUCHER_R
    ft3_5_
    grt_1_Name: grt
    grt_2_Comment: name
    grt_3_Member of groups:
    grt_4_TRAVEL_R
    grt_4_PDOC_R
    grt_4_SPURCHASE_R
    grt_4_VOUCHER_R
    grt_4_SVOUCHER_R
    grt_4_RDOC_R
    grt_4_SRECEIVE_R
    grt_4_CONTRACT_R
    grt_4_Lease_R
    grt_5_
    asdq_1_Name: asdq
    asdq_2_Comment: name
    asdq_3_Member of groups:
    asdq_4_PDOC_R
    asdq_4_VOUCHER_R
    asdq_4_RDOC_R
    asdq_5_
    jkl1_1_Name: jkl1
    jkl1_2_Comment: name
    jkl1_3_Member of groups:
    jkl1_4_TRAVEL_R
    jkl1_4_STRAVEL_R
    jkl1_4_PDOC_R
    jkl1_4_SPURCHASE_R
    jkl1_4_VOUCHER_R
    jkl1_4_SVOUCHER_R
    jkl1_4_RDOC_R
    jkl1_4_SRECEIVE_R
    jkl1_4_CONTRACT_R
    jkl1_5_
    

    Now, we perform the usual sort, using the menu command Edit > Line Operations > Sort Lines Lexicographically Ascending

    which gives the following text :

    asd0_1_Name: asd0
    asd0_2_Comment: name
    asd0_3_Member of groups:
    asd0_4_PDOC_R
    asd0_4_RDOC_R
    asd0_4_VOUCHER_R
    asd0_5_
    asdq_1_Name: asdq
    asdq_2_Comment: name
    asdq_3_Member of groups:
    asdq_4_PDOC_R
    asdq_4_RDOC_R
    asdq_4_VOUCHER_R
    asdq_5_
    ft3_1_Name: ft3
    ft3_2_Comment: name
    ft3_3_Member of groups:
    ft3_4_PDOC_R
    ft3_4_RDOC_R
    ft3_4_SORMPURCHASING_R
    ft3_4_SORMRECEIVING_R
    ft3_4_SORMVOUCHERS_R
    ft3_4_VOUCHER_R
    ft3_5_
    grt_1_Name: grt
    grt_2_Comment: name
    grt_3_Member of groups:
    grt_4_CONTRACT_R
    grt_4_Lease_R
    grt_4_PDOC_R
    grt_4_RDOC_R
    grt_4_SPURCHASE_R
    grt_4_SRECEIVE_R
    grt_4_SVOUCHER_R
    grt_4_TRAVEL_R
    grt_4_VOUCHER_R
    grt_5_
    jkl1_1_Name: jkl1
    jkl1_2_Comment: name
    jkl1_3_Member of groups:
    jkl1_4_CONTRACT_R
    jkl1_4_PDOC_R
    jkl1_4_RDOC_R
    jkl1_4_SPURCHASE_R
    jkl1_4_SRECEIVE_R
    jkl1_4_STRAVEL_R
    jkl1_4_SVOUCHER_R
    jkl1_4_TRAVEL_R
    jkl1_4_VOUCHER_R
    jkl1_5_
    wr3_1_Name: wr3
    wr3_2_Comment: name
    wr3_3_Member of groups:
    wr3_4_PDOC_R
    wr3_4_RDOC_R
    wr3_4_RDOC_RW
    wr3_4_RDOC_RWX
    wr3_4_SPURCHASE_R
    wr3_4_SRECEIVE_R
    wr3_4_SRECEIVE_RWX
    wr3_5_
    

    And, finally, we get rid of all the prefixes, in all the lines of the list. Very easy with the simple S/R, below :

    SEARCH ^.+_\d_

    REPLACE Leave EMPTY
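To see what this last search does to individual lines, the same pattern can be tried with Python's `re` module, which handles this particular expression the same way as far as I know. The sketch also shows one limitation worth checking against real data: the group name `ZONE_2_R` below is a made-up example of a name that itself contains an underscore, a digit, and another underscore.

```python
import re

PREFIX = re.compile(r"^.+_\d_", re.MULTILINE)

decorated = "asd0_4_PDOC_R\nasd0_4_RDOC_R\nasd0_5_\nwr3_1_Name: wr3\n"
stripped = PREFIX.sub("", decorated)
print(stripped)

# Caveat: ".+" is greedy, so the match runs up to the LAST "_<digit>_"
# on the line. A hypothetical group called ZONE_2_R would lose too much.
risky = PREFIX.sub("", "asd0_4_ZONE_2_R")
print(risky)
```

With the sample data in this thread no group name has that shape, so the simple pattern is enough.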

    So, we are left with :

    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    VOUCHER_R
    
    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    VOUCHER_R
    
    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R
    
    Name: grt
    Comment: name
    Member of groups:
    CONTRACT_R
    Lease_R
    PDOC_R
    RDOC_R
    SPURCHASE_R
    SRECEIVE_R
    SVOUCHER_R
    TRAVEL_R
    VOUCHER_R
    
    Name: jkl1
    Comment: name
    Member of groups:
    CONTRACT_R
    PDOC_R
    RDOC_R
    SPURCHASE_R
    SRECEIVE_R
    STRAVEL_R
    SVOUCHER_R
    TRAVEL_R
    VOUCHER_R
    
    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    RDOC_RW
    RDOC_RWX
    SPURCHASE_R
    SRECEIVE_R
    SRECEIVE_RWX
    

    Remark :

    Just notice that, both :

    • The different user blocks are alphabetically sorted by name

    • The list of all the groups, in each block, is also alphabetically sorted
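The whole prefix, sort, and strip sequence above is a form of what is often called decorate-sort-undecorate. A minimal sketch of the same idea in plain Python follows; it uses a tuple as the sort key instead of a text prefix, so it is an analogy rather than a translation of the regexes, and it can order names slightly differently when one name is a prefix of another. The function name `sort_blocks_and_groups` is invented for the example.

```python
def sort_blocks_and_groups(text):
    keyed = []                # (user name, line-type rank, original line)
    name = ""
    for line in text.splitlines():
        if not line.strip():
            continue          # blank separators are re-added on output
        if line.startswith("Name:"):
            name = line[len("Name:"):].strip()
            rank = 1
        elif line.startswith("Comment:"):
            rank = 2
        elif line.startswith("Member "):
            rank = 3
        else:
            rank = 4          # a group line
        keyed.append((name, rank, line))
    keyed.sort()              # tuples compare by name, then rank, then line
    out = []
    for i, (name, rank, line) in enumerate(keyed):
        if rank == 1 and i > 0:
            out.append("")    # blank line between user blocks
        out.append(line)
    return "\n".join(out) + "\n"

sample = (
    "Name: wr3\nComment: name\nMember of groups:\nSPURCHASE_R\nPDOC_R\n"
    "\n"
    "Name: asd0\nComment: name\nMember of groups:\nVOUCHER_R\nPDOC_R\nRDOC_R\n"
)
result = sort_blocks_and_groups(sample)
print(result)
```

As with the regex version, two blocks that share the same name would be merged into each other by the sort, so names need to be unique.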

    Cheers,

    guy038



  • That was utter brilliance @guy038 . I, I can’t even. Hats off on that.
    (This is not to say that the py script from @PeterJones is out the window, this will just help me convince folks that Notepad++ is the best.) :)

    +++Unrelated+++
    So, is there a book, like NoStarchPress or something for Notepad++? How did you learn all these fancy bits for the regex? (Is it Perl?)
    Python vs. Perl? (I’d like to learn one, or both. My aim is system administration/automation. Recommendations?)

