Sorting multiple chunks of non-connected text

  • J
    Jeff Test
    last edited by Jun 26, 2018, 10:58 PM

    How do I sort disparate chunks of text across a large document?
    Is a macro going to be the best option?

    Bounds would be (everything between):
    Member of groups:

    to


    #########################

    Name: asd0
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R


    Name: wr3
    Comment: name
    Member of groups:
    PDOC_R
    SPURCHASE_R
    RDOC_R
    RDOC_RW
    SRECEIVE_RWX
    RDOC_RWX
    SRECEIVE_R


    Name: ft3
    Comment: name
    Member of groups:
    PDOC_R
    RDOC_R
    SORMPURCHASING_R
    SORMRECEIVING_R
    SORMVOUCHERS_R
    VOUCHER_R


    Name: grt
    Comment: name
    Member of groups:
    TRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
    Lease_R


    Name: asdq
    Comment: name
    Member of groups:
    PDOC_R
    VOUCHER_R
    RDOC_R


    Name: jkl1
    Comment: name
    Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R


    • P
      PeterJones
      last edited by Jun 27, 2018, 1:40 PM

      I would do it in a three-step process:

      1. Search and replace (with Regular expression search mode)

        • Find What : \R(?!^-)

          • find all the end-of-lines that are not followed by a - at the beginning of the next line. (If some non-divider lines may also start with a hyphen, put the full run of divider hyphens in the lookahead instead. Your divider got rendered as a horizontal rule in the forum, so I cannot tell exactly how many hyphens you had on each divider line.)
        • Replace with : ☺

          • pick some character that doesn’t occur elsewhere in your document. You might want to pick a unicode character that’s not likely to occur in your text (I picked the smiley ☺), just to avoid collisions with common ASCII characters.
        • Replace All

      2. Edit > Line Operations > Sort Lines Lexicographically Ascending

      3. Search and replace

        • Find What : ☺

        • Replace With : \r\n

        • Replace All

      If you wanted to sort by something other than name, you’ll have to do some fancier manipulation in the original search-and-replace to easily sort it inside Notepad++. In theory, you could also use the PythonScript plugin to do more manipulation.

      Personally, if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language that you’re familiar with (Perl is my go-to for text manipulation, but nowadays, more people know Python than Perl – and, as I mentioned, there is a PythonScript plugin that will allow you to keep things “inside” Notepad++): then you could parse the document into a meaningful structure, and then deal with the structured data, to sort with or rearrange, as you please.
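A minimal sketch of the "parse it into a meaningful structure" idea, in plain Python run outside Notepad++. The function names `parse_records` and `render`, the dictionary keys, the four-space indent, and the 80-underscore divider are assumptions made for illustration; they are not taken from the thread's files.

```python
def parse_records(text):
    """Turn the report into a list of dicts: name, comment, groups."""
    records = []
    current = None
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("Name:"):
            current = {"name": s[5:].strip(), "comment": "", "groups": []}
            records.append(current)
        elif current is None or not s or set(s) <= set("_-#"):
            continue  # skip blank lines and divider lines
        elif s.startswith("Comment:"):
            current["comment"] = s[8:].strip()
        elif s.startswith("Member of groups:"):
            continue  # header line, carries no data
        else:
            current["groups"].append(s)  # anything else is a group name
    return records


def render(records, divider="_" * 80):
    """Write the records back out, sorted by name, groups sorted inside."""
    out = []
    for rec in sorted(records, key=lambda r: r["name"]):
        out.append("Name: " + rec["name"])
        out.append("Comment: " + rec["comment"])
        out.append("Member of groups:")
        out.extend("    " + g for g in sorted(rec["groups"]))
        out.append(divider)
    return "\n".join(out) + "\n"
```

Once the data is in a list of dictionaries, changing the sort key (for example, by number of groups) is a one-line change in `render`. Python's `sorted` compares by code point, so mixed-case names may order differently than in the editor's sort.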

      Or, if you don’t want to learn another language, you might do a search-and-replace to get it into an official CSV (being careful of lines containing commas, and merging all the members of the groups with a different delimiter than you use between fields), then open it with your favorite spreadsheet or database, and sort on the appropriate column(s).
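For comparison, here is a hedged sketch of the CSV route done with Python's standard `csv` module instead of search-and-replace. The function name `records_to_csv`, the column headers, and the `;` group delimiter are assumptions chosen for illustration. The `csv` writer quotes any field that contains a comma, which covers the "careful of commas" concern.

```python
import csv
import io


def records_to_csv(text, group_sep=";"):
    """Return CSV text with one row per record: name, comment, groups."""
    rows = []
    row = None
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("Name:"):
            row = [s[5:].strip(), "", []]
            rows.append(row)
        elif row is None or not s or set(s) <= set("_-#"):
            continue  # blank or divider line
        elif s.startswith("Member of groups:"):
            continue  # header line
        elif s.startswith("Comment:"):
            row[1] = s[8:].strip()
        else:
            row[2].append(s)  # a group name

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "comment", "groups"])
    for name, comment, groups in rows:
        # groups are merged with a delimiter different from the field comma
        writer.writerow([name, comment, group_sep.join(groups)])
    return buf.getvalue()
```

The result can be saved to a `.csv` file and opened in a spreadsheet, where sorting on the name column is a built-in operation.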

      For details on regular expressions, this FAQ Desk entry points you to plenty of good sites for reference material.

      • J
        Jeff Test
        last edited by Jeff Test Jun 29, 2018, 3:12 PM

        Thank you Peter. :)

        It was actually 80 underscores, not dashes. But I do see now that they show as a page break.
        Also, what didn’t show up, was that each of the items I wanted to sort had 5 spaces in front of them:

        Name: jkl1
        Comment: name
        Member of groups:
            TRAVEL_R
            STRAVEL_R
            PDOC_R
            SPURCHASE_R
            VOUCHER_R
            SVOUCHER_R
            RDOC_R
            SRECEIVE_R
            CONTRACT_R
        _______________________________________________________________________________
        

        The above suggestion caused all groups of the same name to move together and moved all the users.
        I apologize for not stating this better earlier: I was hoping to maintain each user’s own groups, but sorted alphabetically down a long list.
        Each “Name” group needs to remain as:

        Name:
        Comment:
        Member Of Groups:
        A
        B
        C
        ...
        _______________________________________________________________________________
        
        • P
          PeterJones
          last edited by Jun 29, 2018, 5:11 PM

          Hmm, in my experiments (with a hyphen separator), it did keep the indenting, I thought. Maybe I’m wrong.

          This does the sorting that I think you wanted, but with the underscore separators, and I have confirmed indentation stayed consistent:

          1. Find ([^_])\r\n Replace \1☺ (this assumes Windows CRLF line endings: the generic \R matched too much)
          2. Sort (Edit > Line Operations > Sort Lines Lexicographically Ascending)
          3. Find ☺ Replace \r\n
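The three steps can be mimicked outside the editor to see why they keep each record together. This is only a sketch in plain Python, assuming CRLF line endings and divider lines that end in an underscore; `sort_records` and `PLACEHOLDER` are names made up for the example.

```python
import re

PLACEHOLDER = "\u263a"  # the smiley; must not occur anywhere in the text


def sort_records(text):
    # Step 1: every CRLF that is not preceded by "_" becomes the placeholder,
    # so each record collapses onto one physical line ending in the divider.
    joined = re.sub(r"([^_])\r\n", r"\1" + PLACEHOLDER, text)

    # Step 2: sort the collapsed lines (one line per record).
    lines = joined.split("\r\n")
    has_final_eol = lines[-1] == ""
    if has_final_eol:
        lines.pop()  # do not sort the empty piece after the last CRLF
    lines.sort()
    result = "\r\n".join(lines) + ("\r\n" if has_final_eol else "")

    # Step 3: expand the placeholder back into real line endings.
    return result.replace(PLACEHOLDER, "\r\n")
```

Because a whole record is one line during the sort, its group lines travel with its Name line and keep their original order and indentation.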

          So

          Name: jkl1
          Comment: name
          Member of groups:
              TRAVEL_R
              STRAVEL_R
              PDOC_R
          _______________________________________________________________________________
          Name: abc3
          Comment: name
          Member of groups:
              SPURCHASE_R
              VOUCHER_R
              SVOUCHER_R
          _______________________________________________________________________________
          Name: xyz9
          Comment: name
          Member of groups:
              RDOC_R
              SRECEIVE_R
              CONTRACT_R
          _______________________________________________________________________________
          Name: abc1
          Comment: name
          Member of groups:
              TRAVEL_R
              VOUCHER_R
              SVOUCHER_R
          _______________________________________________________________________________
          

          became

          Name: abc1
          Comment: name
          Member of groups:
              TRAVEL_R
              VOUCHER_R
              SVOUCHER_R
          _______________________________________________________________________________
          Name: abc3
          Comment: name
          Member of groups:
              SPURCHASE_R
              VOUCHER_R
              SVOUCHER_R
          _______________________________________________________________________________
          Name: jkl1
          Comment: name
          Member of groups:
              TRAVEL_R
              STRAVEL_R
              PDOC_R
          _______________________________________________________________________________
          Name: xyz9
          Comment: name
          Member of groups:
              RDOC_R
              SRECEIVE_R
              CONTRACT_R
          _______________________________________________________________________________
          

          … but now I’m confused: Do you want to alphabetize by name? Or do you want to leave the records all in the same order, and just sort the group names within an individual record? Or do you want both (i.e., records going alphabetical by name, followed by groups going alphabetical inside its “member of groups” section)?

          If what I’ve shown doesn’t give what you want, then show at least two records (sets of lines) with at least two groups listed in each record, and show a before and after scenario. (And make sure you include any exceptions or edge cases you think might crop up)

          That said, if you want to alphabetize only the group lines within a record, or something more complicated, I will reiterate: “if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language…”. Of course, @guy038 can probably work his magic on even something more complicated, but you’ll have to give a sufficient example to include all edge cases.

          • G
            guy038
            last edited by Jun 30, 2018, 3:00 AM

            Hello, @jeff-test,

            Do you expect an output text like below, where each block of group names (the lines containing the _ character) is sorted ?

            #########################
            
            Name: asd0
            Comment: name
            Member of groups:
            PDOC_R
            RDOC_R
            VOUCHER_R
            
            Name: wr3
            Comment: name
            Member of groups:
            PDOC_R
            RDOC_R
            RDOC_RW
            RDOC_RWX
            SPURCHASE_R
            SRECEIVE_R
            SRECEIVE_RWX
            
            Name: ft3
            Comment: name
            Member of groups:
            PDOC_R
            RDOC_R
            SORMPURCHASING_R
            SORMRECEIVING_R
            SORMVOUCHERS_R
            VOUCHER_R
            
            Name: grt
            Comment: name
            Member of groups:
            CONTRACT_R
            Lease_R
            PDOC_R
            RDOC_R
            SPURCHASE_R
            SRECEIVE_R
            SVOUCHER_R
            TRAVEL_R
            VOUCHER_R
            
            Name: asdq
            Comment: name
            Member of groups:
            PDOC_R
            RDOC_R
            VOUCHER_R
            
            Name: jkl1
            Comment: name
            Member of groups:
            CONTRACT_R
            PDOC_R
            RDOC_R
            SPURCHASE_R
            SRECEIVE_R
            STRAVEL_R
            SVOUCHER_R
            TRAVEL_R
            VOUCHER_R
            

            See you later,

            Best regards,

            guy038

            • J
              Jeff Test
              last edited by Jul 3, 2018, 3:36 PM

              Yes. Each block: it doesn’t matter what order they are in, as long as each user remains with their listed access. However, each little list of ‘groups’ needs to be sorted, such as @guy038 has stated/shown.

              • P
                PeterJones
                last edited by Jul 3, 2018, 5:38 PM

                So, assuming you have the Python Script Plugin …

                1. Create the script: Plugins > Python Script > New Script

                2. Paste the script, and save with a meaningful name: sortGroups.py

                   #https://notepad-plus-plus.org/community/topic/15971/sorting-multiple-chunks-of-non-connected-text/4
                   # works on active view / file
                   import sys
                   from Npp import *
                  
                   console.clear()
                   #console.show()
                  
                   # for debug, change view and document index
                   #i = notepad.getCurrentDocIndex(1)
                   #notepad.activateIndex(1,i)
                  
                   keepGoing = True
                  
                   editor.documentEnd()               # go to the last position
                   end2 = editor.getCurrentPos()      # record the position
                   #console.write("editor.end = " + str(end2)+"\n")
                   editor.documentStart()             # back to the beginning
                   start2 = editor.getCurrentPos()    # record the position
                   #console.write("editor.start = " + str(start2)+"\n")
                  
                   while keepGoing:
                       # find the group:\R prefix
                       position = editor.findText( FINDOPTION.REGEXP, start2, end2, "groups:$")
                       if position is None:
                           break
                       #console.write("editor: findText @ " + str(position[0]) + ":" + str(position[1]) + "\n")
                  
                       # find ___ (or EOF)
                       underscore = editor.findText( FINDOPTION.REGEXP, position[1], end2, "^_+$")
                       if underscore is None:
                           keepGoing = False
                           underscore = (end2, end2)
                  
                       # select the text
                       #console.write("editor: findText @ " + str(underscore[0]) + ":" + str(underscore[1]) + "\n")
                       editor.setSelectionStart(position[1]+2)    # start after the newline (+2 assumes CRLF line endings)
                       editor.setSelectionEnd(underscore[0]-2)    # end before the newline (-2 assumes CRLF line endings)
                  
                       # okay, now the first match is highlighted... need to run the Edit > Line Operations > Sort Lines Lexicographically Ascending...
                       # maybe notepad.menuCommand() or notepad.runMenuCommand()
                       notepad.menuCommand(42059)  # got from https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
                       #console.write(str(keepGoing))
                  
                       # start at the end of the last group
                       start2 = underscore[0]
                  
                   # want nothing selected at end
                   editor.clearSelections()
                  
                3. Input file (make sure this is the active window / view / document in Notepad++)

                   Name: jkl1
                   Comment: name
                   Member of groups:
                       STRAVEL_R
                       PDOC_R
                       TRAVEL_R
                   _______________________________________________________________________________
                   Name: abc3
                   Comment: name
                   Member of groups:
                       SPURCHASE_R
                       VOUCHER_R
                       SVOUCHER_R
                   _______________________________________________________________________________
                   Name: xyz9
                   Comment: name
                   Member of groups:
                       RDOC_R
                       SRECEIVE_R
                       CONTRACT_R
                   _______________________________________________________________________________
                   Name: abc1
                   Comment: name
                   Member of groups:
                       TRAVEL_R
                       VOUCHER_R
                       SVOUCHER_R
                   _______________________________________________________________________________
                  
                4. Plugins > Python Script > Scripts > sortGroups.py

                5. Result:

                   Name: jkl1
                   Comment: name
                   Member of groups:
                       PDOC_R
                       STRAVEL_R
                       TRAVEL_R
                   _______________________________________________________________________________
                   Name: abc3
                   Comment: name
                   Member of groups:
                       SPURCHASE_R
                       SVOUCHER_R
                       VOUCHER_R
                   _______________________________________________________________________________
                   Name: xyz9
                   Comment: name
                   Member of groups:
                       CONTRACT_R
                       RDOC_R
                       SRECEIVE_R
                   _______________________________________________________________________________
                   Name: abc1
                   Comment: name
                   Member of groups:
                       SVOUCHER_R
                       TRAVEL_R
                       VOUCHER_R
                   _______________________________________________________________________________
                  

                I think this does what you want.
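For anyone without the plugin, the same per-record sort can be sketched in plain Python on a string. This is an illustration under assumptions, not a port of the script above: `sort_group_blocks` and `BLOCK` are made-up names, each group list is assumed to start after a line ending in `groups:` and to end at a line of underscores (or at end of text), and every group line is assumed to end with a newline.

```python
import re

# group 1: the "groups:" line ending; group 2: the lines up to the divider
BLOCK = re.compile(r"(groups:\r?\n)(.*?)(?=^_+\r?$|\Z)", re.DOTALL | re.MULTILINE)


def sort_group_blocks(text):
    """Sort the group lines inside each record; leave record order alone."""
    def sort_one(match):
        lines = match.group(2).splitlines(keepends=True)
        return match.group(1) + "".join(sorted(lines))
    return BLOCK.sub(sort_one, text)
```

Like the script, this sorts whole lines including their indentation, so the leading spaces stay in place as long as every group line uses the same indent.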

                • G
                  guy038
                  last edited by Jul 5, 2018, 6:44 PM

                  Hi, @jeff-test, @peterjones and All,

                  Sorry for this late reply, but I was with my sister and brother-in-law, a couple of days !


                  So, here is, below, my solution, just using regex S/R and a N++ sort ! Of course, the Python script, from Peter is easier to use and, probably, quicker than all my stuff ! Just one advantage : this method can be processed with a minimalist N++ package Ah, Ah -;))

                  So, starting with this sample text, below, in a new N++ tab :

                  Name: asd0
                  Comment: name
                  Member of groups:
                  PDOC_R
                  VOUCHER_R
                  RDOC_R
                  
                  Name: wr3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  SPURCHASE_R
                  RDOC_R
                  RDOC_RW
                  SRECEIVE_RWX
                  RDOC_RWX
                  SRECEIVE_R
                  
                  Name: ft3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  SORMPURCHASING_R
                  SORMRECEIVING_R
                  SORMVOUCHERS_R
                  VOUCHER_R
                  
                  Name: grt
                  Comment: name
                  Member of groups:
                  TRAVEL_R
                  PDOC_R
                  SPURCHASE_R
                  VOUCHER_R
                  SVOUCHER_R
                  RDOC_R
                  SRECEIVE_R
                  CONTRACT_R
                  Lease_R
                  
                  Name: asdq
                  Comment: name
                  Member of groups:
                  PDOC_R
                  VOUCHER_R
                  RDOC_R
                  
                  Name: jkl1
                  Comment: name
                  Member of groups:
                  TRAVEL_R
                  STRAVEL_R
                  PDOC_R
                  SPURCHASE_R
                  VOUCHER_R
                  SVOUCHER_R
                  RDOC_R
                  SRECEIVE_R
                  CONTRACT_R
                  

                  Firstly, add, if necessary, an empty line at the very end of the list


                  Now, if we use the following regex S/R, which copies the name from each block’s Name: line, preceded by the # symbol, onto a new line located at the end of the corresponding block :

                  SEARCH (?s-i)^Name:\x20(.+?\R).+?\R(?=\R|\z)

                  REPLACE $0#\1
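To experiment with this first S/R outside the editor, here is a rough Python translation. It is an approximation under assumptions: Python's `re` has no `\R` or `\z`, so the sketch uses `\n` and `\Z` and therefore expects LF line endings; `TAG_BLOCK` and `append_name_tag` are illustrative names.

```python
import re

# (?s) becomes re.DOTALL; ^ needs re.MULTILINE to match at each line start
TAG_BLOCK = re.compile(r"^Name: (.+?\n).+?\n(?=\n|\Z)", re.DOTALL | re.MULTILINE)


def append_name_tag(text):
    """After each block, add a line '#<name>' (equivalent of $0#\\1)."""
    return TAG_BLOCK.sub(lambda m: m.group(0) + "#" + m.group(1), text)
```

Group 1 captures the name together with its line ending, which is why the appended `#name` line arrives already terminated.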

                  we get the text :

                  Name: asd0
                  Comment: name
                  Member of groups:
                  PDOC_R
                  VOUCHER_R
                  RDOC_R
                  #asd0
                  
                  Name: wr3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  SPURCHASE_R
                  RDOC_R
                  RDOC_RW
                  SRECEIVE_RWX
                  RDOC_RWX
                  SRECEIVE_R
                  #wr3
                  
                  Name: ft3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  SORMPURCHASING_R
                  SORMRECEIVING_R
                  SORMVOUCHERS_R
                  VOUCHER_R
                  #ft3
                  
                  Name: grt
                  Comment: name
                  Member of groups:
                  TRAVEL_R
                  PDOC_R
                  SPURCHASE_R
                  VOUCHER_R
                  SVOUCHER_R
                  RDOC_R
                  SRECEIVE_R
                  CONTRACT_R
                  Lease_R
                  #grt
                  
                  Name: asdq
                  Comment: name
                  Member of groups:
                  PDOC_R
                  VOUCHER_R
                  RDOC_R
                  #asdq
                  
                  Name: jkl1
                  Comment: name
                  Member of groups:
                  TRAVEL_R
                  STRAVEL_R
                  PDOC_R
                  SPURCHASE_R
                  VOUCHER_R
                  SVOUCHER_R
                  RDOC_R
                  SRECEIVE_R
                  CONTRACT_R
                  #jkl1
                  

                  Then, using this second regex S/R, which adds a prefix to each line of the list, depending on its type :

                  SEARCH (?-si)^((?:(Name:)|(Comment:)|(Member\x20)|(.+_)).+)(?=(?:(?s).+?)#(.+))|^#(.+)\R?

                  REPLACE (?2\6_1_\1)(?3\6_2_\1)(?4\6_3_\1)(?5\6_4_\1)(?7\7_5_)
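The conditional replacement `(?2...)(?3...)` is a Notepad++ (Boost) feature that Python's `re` does not have, so a Python sketch has to make the same decision in a replacement function. The following is an approximation assuming LF line endings; `PREFIX_RE` and `add_prefix` are illustrative names.

```python
import re

# Groups: 1 = whole line, 2..5 = which kind of line, 6 = name found by
# looking ahead to the next "#name" tag, 7 = the name on a "#name" line.
PREFIX_RE = re.compile(
    r"^((?:(Name:)|(Comment:)|(Member )|(.+_)).+)(?=(?s:.+?)#(.+))|^#(.+)\n?",
    re.MULTILINE,
)


def add_prefix(match):
    if match.group(7) is not None:
        # the "#name" tag line (and its line ending) becomes "name_5_"
        return match.group(7) + "_5_"
    for group_no, rank in ((2, "1"), (3, "2"), (4, "3"), (5, "4")):
        if match.group(group_no) is not None:
            return "%s_%s_%s" % (match.group(6), rank, match.group(1))
    return match.group(0)


def prefix_lines(text):
    return PREFIX_RE.sub(add_prefix, text)
```

The rank digit is what keeps Name, Comment and the header in their original order during the later sort, while all group lines share rank 4 and so get sorted among themselves.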

                  we obtain :

                  asd0_1_Name: asd0
                  asd0_2_Comment: name
                  asd0_3_Member of groups:
                  asd0_4_PDOC_R
                  asd0_4_VOUCHER_R
                  asd0_4_RDOC_R
                  asd0_5_
                  wr3_1_Name: wr3
                  wr3_2_Comment: name
                  wr3_3_Member of groups:
                  wr3_4_PDOC_R
                  wr3_4_SPURCHASE_R
                  wr3_4_RDOC_R
                  wr3_4_RDOC_RW
                  wr3_4_SRECEIVE_RWX
                  wr3_4_RDOC_RWX
                  wr3_4_SRECEIVE_R
                  wr3_5_
                  ft3_1_Name: ft3
                  ft3_2_Comment: name
                  ft3_3_Member of groups:
                  ft3_4_PDOC_R
                  ft3_4_RDOC_R
                  ft3_4_SORMPURCHASING_R
                  ft3_4_SORMRECEIVING_R
                  ft3_4_SORMVOUCHERS_R
                  ft3_4_VOUCHER_R
                  ft3_5_
                  grt_1_Name: grt
                  grt_2_Comment: name
                  grt_3_Member of groups:
                  grt_4_TRAVEL_R
                  grt_4_PDOC_R
                  grt_4_SPURCHASE_R
                  grt_4_VOUCHER_R
                  grt_4_SVOUCHER_R
                  grt_4_RDOC_R
                  grt_4_SRECEIVE_R
                  grt_4_CONTRACT_R
                  grt_4_Lease_R
                  grt_5_
                  asdq_1_Name: asdq
                  asdq_2_Comment: name
                  asdq_3_Member of groups:
                  asdq_4_PDOC_R
                  asdq_4_VOUCHER_R
                  asdq_4_RDOC_R
                  asdq_5_
                  jkl1_1_Name: jkl1
                  jkl1_2_Comment: name
                  jkl1_3_Member of groups:
                  jkl1_4_TRAVEL_R
                  jkl1_4_STRAVEL_R
                  jkl1_4_PDOC_R
                  jkl1_4_SPURCHASE_R
                  jkl1_4_VOUCHER_R
                  jkl1_4_SVOUCHER_R
                  jkl1_4_RDOC_R
                  jkl1_4_SRECEIVE_R
                  jkl1_4_CONTRACT_R
                  jkl1_5_
                  

                  Now, we perform the usual sort, using the menu command Edit > Line Operations > Sort Lines Lexicographically Ascending

                  which gives the following text :

                  asd0_1_Name: asd0
                  asd0_2_Comment: name
                  asd0_3_Member of groups:
                  asd0_4_PDOC_R
                  asd0_4_RDOC_R
                  asd0_4_VOUCHER_R
                  asd0_5_
                  asdq_1_Name: asdq
                  asdq_2_Comment: name
                  asdq_3_Member of groups:
                  asdq_4_PDOC_R
                  asdq_4_RDOC_R
                  asdq_4_VOUCHER_R
                  asdq_5_
                  ft3_1_Name: ft3
                  ft3_2_Comment: name
                  ft3_3_Member of groups:
                  ft3_4_PDOC_R
                  ft3_4_RDOC_R
                  ft3_4_SORMPURCHASING_R
                  ft3_4_SORMRECEIVING_R
                  ft3_4_SORMVOUCHERS_R
                  ft3_4_VOUCHER_R
                  ft3_5_
                  grt_1_Name: grt
                  grt_2_Comment: name
                  grt_3_Member of groups:
                  grt_4_CONTRACT_R
                  grt_4_Lease_R
                  grt_4_PDOC_R
                  grt_4_RDOC_R
                  grt_4_SPURCHASE_R
                  grt_4_SRECEIVE_R
                  grt_4_SVOUCHER_R
                  grt_4_TRAVEL_R
                  grt_4_VOUCHER_R
                  grt_5_
                  jkl1_1_Name: jkl1
                  jkl1_2_Comment: name
                  jkl1_3_Member of groups:
                  jkl1_4_CONTRACT_R
                  jkl1_4_PDOC_R
                  jkl1_4_RDOC_R
                  jkl1_4_SPURCHASE_R
                  jkl1_4_SRECEIVE_R
                  jkl1_4_STRAVEL_R
                  jkl1_4_SVOUCHER_R
                  jkl1_4_TRAVEL_R
                  jkl1_4_VOUCHER_R
                  jkl1_5_
                  wr3_1_Name: wr3
                  wr3_2_Comment: name
                  wr3_3_Member of groups:
                  wr3_4_PDOC_R
                  wr3_4_RDOC_R
                  wr3_4_RDOC_RW
                  wr3_4_RDOC_RWX
                  wr3_4_SPURCHASE_R
                  wr3_4_SRECEIVE_R
                  wr3_4_SRECEIVE_RWX
                  wr3_5_
                  

                  And, finally, we get rid of all the prefixes, in all the lines of the list. Very easy with the simple S/R, below :

                  SEARCH ^.+_\d_

                  REPLACE Leave EMPTY
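The sort and the prefix removal can also be tried in plain Python. A small sketch, assuming LF line endings and that Python's code-point ordering is an acceptable stand-in for the editor's lexicographic sort; `sort_and_strip` is an illustrative name.

```python
import re


def sort_and_strip(prefixed_text):
    """Sort the prefixed lines, then delete each 'name_N_' prefix."""
    lines = prefixed_text.splitlines()
    lines.sort()
    # same pattern as the final S/R; each line is handled on its own
    stripped = [re.sub(r"^.+_\d_", "", line) for line in lines]
    return "\n".join(stripped) + "\n"
```

Note that `.+` is greedy, so a group name that itself contains an underscore, a digit and another underscore in a row would be cut at the wrong place. The sample data has no such names.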

                  So, we are left with :

                  Name: asd0
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  VOUCHER_R
                  
                  Name: asdq
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  VOUCHER_R
                  
                  Name: ft3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  SORMPURCHASING_R
                  SORMRECEIVING_R
                  SORMVOUCHERS_R
                  VOUCHER_R
                  
                  Name: grt
                  Comment: name
                  Member of groups:
                  CONTRACT_R
                  Lease_R
                  PDOC_R
                  RDOC_R
                  SPURCHASE_R
                  SRECEIVE_R
                  SVOUCHER_R
                  TRAVEL_R
                  VOUCHER_R
                  
                  Name: jkl1
                  Comment: name
                  Member of groups:
                  CONTRACT_R
                  PDOC_R
                  RDOC_R
                  SPURCHASE_R
                  SRECEIVE_R
                  STRAVEL_R
                  SVOUCHER_R
                  TRAVEL_R
                  VOUCHER_R
                  
                  Name: wr3
                  Comment: name
                  Member of groups:
                  PDOC_R
                  RDOC_R
                  RDOC_RW
                  RDOC_RWX
                  SPURCHASE_R
                  SRECEIVE_R
                  SRECEIVE_RWX
                  

                  Remark :

                  Just notice that, both :

                  • The different blocks are alphabetically sorted by Name

                  • The list of all the groups, in each block, is also alphabetically sorted

                  Cheers,

                  guy038

                  • J
                    Jeff Test
                    last edited by Jul 6, 2018, 8:29 PM

                    That was utter brilliance @guy038 . I, i can’t even. Hats off on that.
                    (This is not to say that the py script from @PeterJones is out the window, this will just help me convince folks that Notepad++ is the best.) :)

                    +++Unrelated+++
                    So, is there a book, like NoStarchPress or something for Notepad++? How did you learn all these fancy bits for the regex? (it’s Perl?)
                    Python vs. Perl? (I’d like to learn one, or both. My aim is system administration/automation. Recommendations?)
