    Sorting multiple chunks of non-connected text

    • Jeff Test

      How do I sort disparate chunks of text across a large document?
      Is a macro going to be the best option?

      Bounds would be (everything between):
      Member of groups:

      to


      #########################

      Name: asd0
      Comment: name
      Member of groups:
      PDOC_R
      VOUCHER_R
      RDOC_R


      Name: wr3
      Comment: name
      Member of groups:
      PDOC_R
      SPURCHASE_R
      RDOC_R
      RDOC_RW
      SRECEIVE_RWX
      RDOC_RWX
      SRECEIVE_R


      Name: ft3
      Comment: name
      Member of groups:
      PDOC_R
      RDOC_R
      SORMPURCHASING_R
      SORMRECEIVING_R
      SORMVOUCHERS_R
      VOUCHER_R


      Name: grt
      Comment: name
      Member of groups:
      TRAVEL_R
      PDOC_R
      SPURCHASE_R
      VOUCHER_R
      SVOUCHER_R
      RDOC_R
      SRECEIVE_R
      CONTRACT_R
      Lease_R


      Name: asdq
      Comment: name
      Member of groups:
      PDOC_R
      VOUCHER_R
      RDOC_R


      Name: jkl1
      Comment: name
      Member of groups:
      TRAVEL_R
      STRAVEL_R
      PDOC_R
      SPURCHASE_R
      VOUCHER_R
      SVOUCHER_R
      RDOC_R
      SRECEIVE_R
      CONTRACT_R


      • PeterJones

        I would do it in a three-step process:

        1. Search and replace (with Search Mode set to Regular expression)

          • Find What : \R(?!^-)

            • find all the end-of-lines that are not followed by a - at the beginning of the next line (if some non-divider lines can start with a hyphen, put however many hyphens your divider has into the pattern; the divider got rendered as a horizontal rule in the forum, so I cannot tell exactly how many hyphens you had on each divider line)
          • Replace with : ☺

            • pick some character that doesn’t occur elsewhere in your document. A Unicode character that’s unlikely to occur in your text (I picked the smiley ☺) avoids collisions with common ASCII characters.
          • Replace All

        2. Edit > Line Operations > Sort Lines Lexicographically Ascending

        3. Search and replace

          • Find What : ☺

          • Replace With : \r\n

          • Replace All

        If you want to sort by something other than name, you’ll need some fancier manipulation in the original search-and-replace before you can sort it inside Notepad++. In theory, you could also use the PythonScript plugin to do more manipulation.

        Personally, if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language that you’re familiar with (Perl is my go-to for text manipulation, but nowadays, more people know Python than Perl – and, as I mentioned, there is a PythonScript plugin that will allow you to keep things “inside” Notepad++): then you could parse the document into a meaningful structure, and then deal with the structured data, to sort with or rearrange, as you please.
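To make the "parse the document into a meaningful structure" idea concrete, here is a minimal sketch in plain Python (it does not use the PythonScript plugin). It assumes records look like the samples in this thread: a Name: line, a Comment: line, a Member of groups: line, then one group per line, with records separated by a blank line or a line of underscores. The names parse_records and format_records are made up for this example.

```python
def parse_records(text):
    """Turn the text into a list of dicts with 'name', 'comment' and 'groups'."""
    records = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Name:"):
            current = {"name": stripped[len("Name:"):].strip(),
                       "comment": "", "groups": []}
            records.append(current)
        elif current is None or not stripped or set(stripped) == {"_"}:
            continue  # text before the first record, blank lines, divider lines
        elif stripped.startswith("Comment:"):
            current["comment"] = stripped[len("Comment:"):].strip()
        elif stripped.startswith("Member of groups:"):
            continue  # header line only; the groups follow on their own lines
        else:
            current["groups"].append(stripped)
    return records


def format_records(records, divider="_" * 79, indent="    "):
    """Write the records back out in the layout used in this thread."""
    out = []
    for rec in records:
        out.append("Name: " + rec["name"])
        out.append("Comment: " + rec["comment"])
        out.append("Member of groups:")
        out.extend(indent + group for group in rec["groups"])
        out.append(divider)
    return "\n".join(out) + "\n"


sample = (
    "Name: jkl1\nComment: name\nMember of groups:\n    TRAVEL_R\n    PDOC_R\n____\n"
    "Name: abc3\nComment: name\nMember of groups:\n    VOUCHER_R\n    SPURCHASE_R\n____\n"
)
records = parse_records(sample)
records.sort(key=lambda rec: rec["name"])   # order the records by name
for rec in records:
    rec["groups"].sort()                     # order the groups inside each record
print(format_records(records, divider="____"))
```

Once the data is in a list of dicts, sorting by a different field is just a different key function.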

        Or, if you don’t want to learn another language, you might do a search-and-replace to get it into an official CSV (being careful of lines with commas, and merging all the members of the groups with a different delimiter than you use between fields), then open it with your favorite spreadsheet or database, and sort on the appropriate column(s).
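If you do end up scripting the CSV conversion instead of doing it by search-and-replace, a sketch could look like the following. It uses Python's standard csv module, which quotes fields that contain commas, and joins the groups with a semicolon so they stay in one column. The function name records_to_csv, the column headers, and the sample rows are assumptions made up for this example.

```python
import csv
import io


def records_to_csv(records, group_sep=";"):
    """records: iterable of (name, comment, groups) tuples; returns CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Name", "Comment", "Groups"])
    for name, comment, groups in records:
        # All groups go into a single field, separated by group_sep
        writer.writerow([name, comment, group_sep.join(groups)])
    return buf.getvalue()


rows = [
    ("jkl1", "name, with comma", ["TRAVEL_R", "PDOC_R"]),
    ("abc3", "name", ["SPURCHASE_R"]),
]
print(records_to_csv(rows))
```

The comment containing a comma comes out wrapped in double quotes, so a spreadsheet will still see three columns.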

        For details on regular expressions, this FAQ Desk entry points you to plenty of good sites for reference material.

        • Jeff Test

          Thank you Peter. :)

          It was actually 80 underscores, not dashes. But I do see now that they show as a page break.
          Also, what didn’t show up was that each of the items I wanted to sort had 5 spaces in front of them:

          Name: jkl1
          Comment: name
          Member of groups:
              TRAVEL_R
              STRAVEL_R
              PDOC_R
              SPURCHASE_R
              VOUCHER_R
              SVOUCHER_R
              RDOC_R
              SRECEIVE_R
              CONTRACT_R
          _______________________________________________________________________________
          

          The above suggestion caused all groups of the same name to move together and moved all the users.
          I apologize for not stating this better earlier: I was hoping to keep each user’s own groups together, but sorted alphabetically down a long list.
          Each “Name” group needs to remain as:

          Name:
          Comment:
          Member Of Groups:
          A
          B
          C
          ...
          _______________________________________________________________________________
          
          • PeterJones

            Hmm, in my experiments (with a hyphen separator), it did keep the indenting, I thought. Maybe I’m wrong.

            This does the sorting that I think you wanted, but with the underscore separators, and I have confirmed indentation stayed consistent:

            1. Find ([^_])\r\n Replace \1☺ (this assumes Windows CRLF line endings: the generic \R matched too much)
            2. Sort
            3. Find ☺ Replace \r\n

            So

            Name: jkl1
            Comment: name
            Member of groups:
                TRAVEL_R
                STRAVEL_R
                PDOC_R
            _______________________________________________________________________________
            Name: abc3
            Comment: name
            Member of groups:
                SPURCHASE_R
                VOUCHER_R
                SVOUCHER_R
            _______________________________________________________________________________
            Name: xyz9
            Comment: name
            Member of groups:
                RDOC_R
                SRECEIVE_R
                CONTRACT_R
            _______________________________________________________________________________
            Name: abc1
            Comment: name
            Member of groups:
                TRAVEL_R
                VOUCHER_R
                SVOUCHER_R
            _______________________________________________________________________________
            

            became

            Name: abc1
            Comment: name
            Member of groups:
                TRAVEL_R
                VOUCHER_R
                SVOUCHER_R
            _______________________________________________________________________________
            Name: abc3
            Comment: name
            Member of groups:
                SPURCHASE_R
                VOUCHER_R
                SVOUCHER_R
            _______________________________________________________________________________
            Name: jkl1
            Comment: name
            Member of groups:
                TRAVEL_R
                STRAVEL_R
                PDOC_R
            _______________________________________________________________________________
            Name: xyz9
            Comment: name
            Member of groups:
                RDOC_R
                SRECEIVE_R
                CONTRACT_R
            _______________________________________________________________________________
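For anyone who wants to check the join / sort / split idea outside Notepad++, here is a rough plain-Python sketch of the same three steps. It assumes, as above, CRLF line endings and that only divider lines end with an underscore; the function name sort_records is made up for this example, and Python's sort compares by code point, which may differ in detail from the Notepad++ menu sort.

```python
import re


def sort_records(text, placeholder="\u263a", newline="\r\n"):
    """Sort whole records by joining each one onto a single line first."""
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")
    # Step 1: replace every line ending that does not follow an underscore,
    # so each record (up to and including its divider) becomes one line.
    joined = re.sub("([^_])" + re.escape(newline),
                    lambda m: m.group(1) + placeholder, text)
    # Step 2: sort the joined lines (empty trailing piece is dropped).
    lines = sorted(line for line in joined.split(newline) if line)
    # Step 3: turn the placeholders back into line endings.
    return newline.join(lines).replace(placeholder, newline) + newline


text = (
    "Name: jkl1\r\nComment: name\r\nMember of groups:\r\n    TRAVEL_R\r\n____\r\n"
    "Name: abc1\r\nComment: name\r\nMember of groups:\r\n    VOUCHER_R\r\n____\r\n"
)
print(sort_records(text))
```

As in the manual steps, this only reorders the records; the group lines inside each record stay in their original order.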
            

            … but now I’m confused: Do you want to alphabetize by name? Or do you want to leave the records all in the same order, and just sort the group names within an individual record? Or do you want both (i.e., records going alphabetical by name, followed by groups going alphabetical inside its “member of groups” section)?

            If what I’ve shown doesn’t give what you want, then show at least two records (sets of lines) with at least two groups listed in each record, and show a before and after scenario. (And make sure you include any exceptions or edge cases you think might crop up)

            That said, if you want to alphabetize only the group lines within a record, or something more complicated, I will reiterate: “if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language…”. Of course, @guy038 can probably work his magic on even something more complicated, but you’ll have to give a sufficient example to include all edge cases.

            • guy038

              Hello, @jeff-test,

              Do you expect an output text like below, where each block of member names, containing the _ character, is sorted ?

              #########################
              
              Name: asd0
              Comment: name
              Member of groups:
              PDOC_R
              RDOC_R
              VOUCHER_R
              
              Name: wr3
              Comment: name
              Member of groups:
              PDOC_R
              RDOC_R
              RDOC_RW
              RDOC_RWX
              SPURCHASE_R
              SRECEIVE_R
              SRECEIVE_RWX
              
              Name: ft3
              Comment: name
              Member of groups:
              PDOC_R
              RDOC_R
              SORMPURCHASING_R
              SORMRECEIVING_R
              SORMVOUCHERS_R
              VOUCHER_R
              
              Name: grt
              Comment: name
              Member of groups:
              CONTRACT_R
              Lease_R
              PDOC_R
              RDOC_R
              SPURCHASE_R
              SRECEIVE_R
              SVOUCHER_R
              TRAVEL_R
              VOUCHER_R
              
              Name: asdq
              Comment: name
              Member of groups:
              PDOC_R
              RDOC_R
              VOUCHER_R
              
              Name: jkl1
              Comment: name
              Member of groups:
              CONTRACT_R
              PDOC_R
              RDOC_R
              SPURCHASE_R
              SRECEIVE_R
              STRAVEL_R
              SVOUCHER_R
              TRAVEL_R
              VOUCHER_R
              

              See you later,

              Best regards,

              guy038

              • Jeff Test

                Yes. Each block: it doesn’t matter what order they are in, as long as each user remains with their listed access. However, each little list of ‘groups’ needs to be sorted, as @guy038 has stated/shown.

                • PeterJones

                  So, assuming you have the Python Script Plugin…

                  1. Create the script: Plugins > Python Script > New Script

                  2. Paste the script, and save with meaningful name: sortGroups.py

                     #https://notepad-plus-plus.org/community/topic/15971/sorting-multiple-chunks-of-non-connected-text/4
                     # works on active view / file
                     import sys
                     from Npp import *
                    
                     console.clear()
                     #console.show()
                    
                     # for debug, change view and document index
                     #i = notepad.getCurrentDocIndex(1)
                     #notepad.activateIndex(1,i)
                    
                     keepGoing = True
                    
                     editor.documentEnd()               # go to the last position
                     end2 = editor.getCurrentPos()      # record the position
                     #console.write("editor.end = " + str(end2)+"\n")
                     editor.documentStart()             # back to the beginning
                     start2 = editor.getCurrentPos()    # record the position
                     #console.write("editor.start = " + str(start2)+"\n")
                    
                     while keepGoing:
                         # find the group:\R prefix
                         position = editor.findText( FINDOPTION.REGEXP, start2, end2, "groups:$")
                         if position is None:
                             break
                         #console.write("editor: findText @ " + str(position[0]) + ":" + str(position[1]) + "\n")
                    
                         # find ___ (or EOF)
                         underscore = editor.findText( FINDOPTION.REGEXP, position[1], end2, "^_+$")
                         if underscore is None:
                             keepGoing = False
                             underscore = (end2, end2)
                    
                         # select the text
                         #console.write("editor: findText @ " + str(underscore[0]) + ":" + str(underscore[1]) + "\n")
                         editor.setSelectionStart(position[1]+2)    # start after the newline (+2 assumes Windows CRLF line endings)
                         editor.setSelectionEnd(underscore[0]-2)    # end before the newline (-2 assumes Windows CRLF line endings)
                    
                         # okay, now the first match is highlighted... need to run the Edit > Line Operations > Sort Lines Lexicographically Ascending...
                         # maybe notepad.menuCommand() or notepad.runMenuCommand()
                         notepad.menuCommand(42059)  # got from https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
                         #console.write(str(keepGoing))
                    
                         # start at the end of the last group
                         start2 = underscore[0]
                    
                     # want nothing selected at end
                     editor.clearSelections()
                    
                  3. Input file (make sure this is the active window / view / document in Notepad++)

                     Name: jkl1
                     Comment: name
                     Member of groups:
                         STRAVEL_R
                         PDOC_R
                         TRAVEL_R
                     _______________________________________________________________________________
                     Name: abc3
                     Comment: name
                     Member of groups:
                         SPURCHASE_R
                         VOUCHER_R
                         SVOUCHER_R
                     _______________________________________________________________________________
                     Name: xyz9
                     Comment: name
                     Member of groups:
                         RDOC_R
                         SRECEIVE_R
                         CONTRACT_R
                     _______________________________________________________________________________
                     Name: abc1
                     Comment: name
                     Member of groups:
                         TRAVEL_R
                         VOUCHER_R
                         SVOUCHER_R
                     _______________________________________________________________________________
                    
                  4. Plugins > Python Script > Scripts > sortGroups.py

                  5. Result:

                     Name: jkl1
                     Comment: name
                     Member of groups:
                         PDOC_R
                         STRAVEL_R
                         TRAVEL_R
                     _______________________________________________________________________________
                     Name: abc3
                     Comment: name
                     Member of groups:
                         SPURCHASE_R
                         SVOUCHER_R
                         VOUCHER_R
                     _______________________________________________________________________________
                     Name: xyz9
                     Comment: name
                     Member of groups:
                         CONTRACT_R
                         RDOC_R
                         SRECEIVE_R
                     _______________________________________________________________________________
                     Name: abc1
                     Comment: name
                     Member of groups:
                         SVOUCHER_R
                         TRAVEL_R
                         VOUCHER_R
                     _______________________________________________________________________________
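For comparison, the same "sort only the group lines inside each record" logic can be sketched in plain Python, which is handy for testing on a copy of the file outside Notepad++. This is not the PythonScript version above, just an illustration; the name sort_groups is made up, and it assumes every group list starts after a line ending in groups: and stops at a blank line, a line of underscores, or the end of the text.

```python
def sort_groups(text):
    """Sort the lines of each group list in place; record order is unchanged."""
    out, block, in_groups = [], [], False

    def flush():
        # Sort the line bodies but keep each line ending in its original slot,
        # so a final line without a newline does not get glued to its neighbour.
        bodies = sorted(line.rstrip("\r\n") for line in block)
        endings = [line[len(line.rstrip("\r\n")):] for line in block]
        out.extend(body + end for body, end in zip(bodies, endings))
        del block[:]

    for line in text.splitlines(True):       # True keeps the line endings
        stripped = line.strip()
        if in_groups and (not stripped or set(stripped) == {"_"}):
            flush()                           # blank line or divider ends the list
            in_groups = False
        if in_groups:
            block.append(line)
        else:
            out.append(line)
            if stripped.endswith("groups:"):
                in_groups = True
    flush()                                   # a list that runs to the end of the text
    return "".join(out)


text = (
    "Name: jkl1\nComment: name\nMember of groups:\n"
    "    STRAVEL_R\n    PDOC_R\n    TRAVEL_R\n____\n"
    "Name: abc3\nComment: name\nMember of groups:\n"
    "    VOUCHER_R\n    SVOUCHER_R"
)
print(sort_groups(text))
```

Because whole lines are compared, the indentation takes part in the sort, so the group lines within one record should all be indented the same way.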
                    

                  I think this does what you want.

                  • guy038

                    Hi, @jeff-test, @peterjones and All,

                    Sorry for this late reply, but I was with my sister and brother-in-law, a couple of days !


                    So, here is, below, my solution, just using regex S/R and a N++ sort ! Of course, the Python script, from Peter is easier to use and, probably, quicker than all my stuff ! Just one advantage : this method can be processed with a minimalist N++ package Ah, Ah -;))

                    So, starting with this sample text, below, in a new N++ tab :

                    Name: asd0
                    Comment: name
                    Member of groups:
                    PDOC_R
                    VOUCHER_R
                    RDOC_R
                    
                    Name: wr3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    SPURCHASE_R
                    RDOC_R
                    RDOC_RW
                    SRECEIVE_RWX
                    RDOC_RWX
                    SRECEIVE_R
                    
                    Name: ft3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    SORMPURCHASING_R
                    SORMRECEIVING_R
                    SORMVOUCHERS_R
                    VOUCHER_R
                    
                    Name: grt
                    Comment: name
                    Member of groups:
                    TRAVEL_R
                    PDOC_R
                    SPURCHASE_R
                    VOUCHER_R
                    SVOUCHER_R
                    RDOC_R
                    SRECEIVE_R
                    CONTRACT_R
                    Lease_R
                    
                    Name: asdq
                    Comment: name
                    Member of groups:
                    PDOC_R
                    VOUCHER_R
                    RDOC_R
                    
                    Name: jkl1
                    Comment: name
                    Member of groups:
                    TRAVEL_R
                    STRAVEL_R
                    PDOC_R
                    SPURCHASE_R
                    VOUCHER_R
                    SVOUCHER_R
                    RDOC_R
                    SRECEIVE_R
                    CONTRACT_R
                    

                    Firstly, add, if necessary, an empty line at the very end of the list


                    Now, if we use the following regex S/R, which copies the name of each record, preceded by the # symbol, onto a new line located after the corresponding record :

                    SEARCH (?s-i)^Name:\x20(.+?\R).+?\R(?=\R|\z)

                    REPLACE $0#\1

                    we get the text :

                    Name: asd0
                    Comment: name
                    Member of groups:
                    PDOC_R
                    VOUCHER_R
                    RDOC_R
                    #asd0
                    
                    Name: wr3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    SPURCHASE_R
                    RDOC_R
                    RDOC_RW
                    SRECEIVE_RWX
                    RDOC_RWX
                    SRECEIVE_R
                    #wr3
                    
                    Name: ft3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    SORMPURCHASING_R
                    SORMRECEIVING_R
                    SORMVOUCHERS_R
                    VOUCHER_R
                    #ft3
                    
                    Name: grt
                    Comment: name
                    Member of groups:
                    TRAVEL_R
                    PDOC_R
                    SPURCHASE_R
                    VOUCHER_R
                    SVOUCHER_R
                    RDOC_R
                    SRECEIVE_R
                    CONTRACT_R
                    Lease_R
                    #grt
                    
                    Name: asdq
                    Comment: name
                    Member of groups:
                    PDOC_R
                    VOUCHER_R
                    RDOC_R
                    #asdq
                    
                    Name: jkl1
                    Comment: name
                    Member of groups:
                    TRAVEL_R
                    STRAVEL_R
                    PDOC_R
                    SPURCHASE_R
                    VOUCHER_R
                    SVOUCHER_R
                    RDOC_R
                    SRECEIVE_R
                    CONTRACT_R
                    #jkl1
                    

                    Then, using this second regex S/R, which adds a prefix to each line of the list, depending on its type :

                    SEARCH (?-si)^((?:(Name:)|(Comment:)|(Member\x20)|(.+_)).+)(?=(?:(?s).+?)#(.+))|^#(.+)\R?

                    REPLACE (?2\6_1_\1)(?3\6_2_\1)(?4\6_3_\1)(?5\6_4_\1)(?7\7_5_)

                    we obtain :

                    asd0_1_Name: asd0
                    asd0_2_Comment: name
                    asd0_3_Member of groups:
                    asd0_4_PDOC_R
                    asd0_4_VOUCHER_R
                    asd0_4_RDOC_R
                    asd0_5_
                    wr3_1_Name: wr3
                    wr3_2_Comment: name
                    wr3_3_Member of groups:
                    wr3_4_PDOC_R
                    wr3_4_SPURCHASE_R
                    wr3_4_RDOC_R
                    wr3_4_RDOC_RW
                    wr3_4_SRECEIVE_RWX
                    wr3_4_RDOC_RWX
                    wr3_4_SRECEIVE_R
                    wr3_5_
                    ft3_1_Name: ft3
                    ft3_2_Comment: name
                    ft3_3_Member of groups:
                    ft3_4_PDOC_R
                    ft3_4_RDOC_R
                    ft3_4_SORMPURCHASING_R
                    ft3_4_SORMRECEIVING_R
                    ft3_4_SORMVOUCHERS_R
                    ft3_4_VOUCHER_R
                    ft3_5_
                    grt_1_Name: grt
                    grt_2_Comment: name
                    grt_3_Member of groups:
                    grt_4_TRAVEL_R
                    grt_4_PDOC_R
                    grt_4_SPURCHASE_R
                    grt_4_VOUCHER_R
                    grt_4_SVOUCHER_R
                    grt_4_RDOC_R
                    grt_4_SRECEIVE_R
                    grt_4_CONTRACT_R
                    grt_4_Lease_R
                    grt_5_
                    asdq_1_Name: asdq
                    asdq_2_Comment: name
                    asdq_3_Member of groups:
                    asdq_4_PDOC_R
                    asdq_4_VOUCHER_R
                    asdq_4_RDOC_R
                    asdq_5_
                    jkl1_1_Name: jkl1
                    jkl1_2_Comment: name
                    jkl1_3_Member of groups:
                    jkl1_4_TRAVEL_R
                    jkl1_4_STRAVEL_R
                    jkl1_4_PDOC_R
                    jkl1_4_SPURCHASE_R
                    jkl1_4_VOUCHER_R
                    jkl1_4_SVOUCHER_R
                    jkl1_4_RDOC_R
                    jkl1_4_SRECEIVE_R
                    jkl1_4_CONTRACT_R
                    jkl1_5_
                    

                    Now, we perform the usual sort, using the menu command Edit > Line Operations > Sort Lines Lexicographically Ascending

                    which gives the following text :

                    asd0_1_Name: asd0
                    asd0_2_Comment: name
                    asd0_3_Member of groups:
                    asd0_4_PDOC_R
                    asd0_4_RDOC_R
                    asd0_4_VOUCHER_R
                    asd0_5_
                    asdq_1_Name: asdq
                    asdq_2_Comment: name
                    asdq_3_Member of groups:
                    asdq_4_PDOC_R
                    asdq_4_RDOC_R
                    asdq_4_VOUCHER_R
                    asdq_5_
                    ft3_1_Name: ft3
                    ft3_2_Comment: name
                    ft3_3_Member of groups:
                    ft3_4_PDOC_R
                    ft3_4_RDOC_R
                    ft3_4_SORMPURCHASING_R
                    ft3_4_SORMRECEIVING_R
                    ft3_4_SORMVOUCHERS_R
                    ft3_4_VOUCHER_R
                    ft3_5_
                    grt_1_Name: grt
                    grt_2_Comment: name
                    grt_3_Member of groups:
                    grt_4_CONTRACT_R
                    grt_4_Lease_R
                    grt_4_PDOC_R
                    grt_4_RDOC_R
                    grt_4_SPURCHASE_R
                    grt_4_SRECEIVE_R
                    grt_4_SVOUCHER_R
                    grt_4_TRAVEL_R
                    grt_4_VOUCHER_R
                    grt_5_
                    jkl1_1_Name: jkl1
                    jkl1_2_Comment: name
                    jkl1_3_Member of groups:
                    jkl1_4_CONTRACT_R
                    jkl1_4_PDOC_R
                    jkl1_4_RDOC_R
                    jkl1_4_SPURCHASE_R
                    jkl1_4_SRECEIVE_R
                    jkl1_4_STRAVEL_R
                    jkl1_4_SVOUCHER_R
                    jkl1_4_TRAVEL_R
                    jkl1_4_VOUCHER_R
                    jkl1_5_
                    wr3_1_Name: wr3
                    wr3_2_Comment: name
                    wr3_3_Member of groups:
                    wr3_4_PDOC_R
                    wr3_4_RDOC_R
                    wr3_4_RDOC_RW
                    wr3_4_RDOC_RWX
                    wr3_4_SPURCHASE_R
                    wr3_4_SRECEIVE_R
                    wr3_4_SRECEIVE_RWX
                    wr3_5_
                    

                    And, finally, we get rid of all the prefixes, in all the lines of the list. Very easy with the simple S/R, below :

                    SEARCH ^.+_\d_

                    REPLACE Leave EMPTY

                    So, we are left with :

                    Name: asd0
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    VOUCHER_R
                    
                    Name: asdq
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    VOUCHER_R
                    
                    Name: ft3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    SORMPURCHASING_R
                    SORMRECEIVING_R
                    SORMVOUCHERS_R
                    VOUCHER_R
                    
                    Name: grt
                    Comment: name
                    Member of groups:
                    CONTRACT_R
                    Lease_R
                    PDOC_R
                    RDOC_R
                    SPURCHASE_R
                    SRECEIVE_R
                    SVOUCHER_R
                    TRAVEL_R
                    VOUCHER_R
                    
                    Name: jkl1
                    Comment: name
                    Member of groups:
                    CONTRACT_R
                    PDOC_R
                    RDOC_R
                    SPURCHASE_R
                    SRECEIVE_R
                    STRAVEL_R
                    SVOUCHER_R
                    TRAVEL_R
                    VOUCHER_R
                    
                    Name: wr3
                    Comment: name
                    Member of groups:
                    PDOC_R
                    RDOC_R
                    RDOC_RW
                    RDOC_RWX
                    SPURCHASE_R
                    SRECEIVE_R
                    SRECEIVE_RWX
                    

                    Remark :

                    Just notice that, both :

                    • The different records are alphabetically sorted by name

                    • The list of all the groups, in each record, is also alphabetically sorted
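The prefix / sort / strip trick above is a decorate-sort-undecorate pattern, and for readers who think more easily in code than in regex, here is a rough plain-Python sketch of the same idea. It is not a translation of the regexes: instead of text prefixes it attaches a (name, rank, line) tuple to every line, sorts, and then drops the keys. The function name sort_records_and_groups is made up, and it assumes unindented lines and unique names, as in the sample text.

```python
def sort_records_and_groups(text):
    """Sort records by name and, inside each record, the group lines."""
    decorated = []
    name = ""
    for line in text.splitlines():
        if not line.strip():
            continue                          # blank separators are re-created below
        if line.startswith("Name:"):
            name = line[len("Name:"):].strip()
            rank = 1
            decorated.append((name, 5, ""))   # blank line that closes this record
        elif line.startswith("Comment:"):
            rank = 2
        elif line.startswith("Member "):
            rank = 3
        else:
            rank = 4                          # a group line
        decorated.append((name, rank, line))
    decorated.sort()                          # by name, then rank, then line text
    return "\n".join(line for _, _, line in decorated) + "\n"


text = (
    "Name: wr3\nComment: name\nMember of groups:\nPDOC_R\nSPURCHASE_R\nRDOC_R\n\n"
    "Name: asd0\nComment: name\nMember of groups:\nPDOC_R\nVOUCHER_R\nRDOC_R\n"
)
print(sort_records_and_groups(text))
```

The rank plays the role of the _1_ to _5_ part of the prefixes: it keeps Name, Comment and Member of groups in their fixed order, while only the rank-4 group lines are actually reordered within a record.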

                    Cheers,

                    guy038

                    • Jeff Test

                      That was utter brilliance @guy038 . I, i can’t even. Hats off on that.
                      (This is not to say that the py script from @PeterJones is out the window, this will just help me convince folks that Notepad++ is the best.) :)

                      +++Unrelated+++
                      So, is there a book, like NoStarchPress or something for Notepad++? How did you learn all these fancy bits for the regex? (Is it Perl?)
                      Python vs. Perl? (I’d like to learn one, or both. My aim is system administration/automation. Recommendations?)
