Sorting multiple chunks of non-connected text

Jeff Test

How do I sort disparate chunks of text across a large document?
Is a macro going to be the best option?

Bounds would be (everything between):
Member of groups:

to

#########################

Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R

PeterJones

I would do it in a three-step process:

1. Search and replace (with Search Mode set to Regular expression)
   - Find What : \R(?!^-)
     - find every end-of-line that is not followed by a - at the beginning of the next line (if some non-divider lines may also start with a hyphen, spell out however many hyphens your divider line has; it was rendered as a horizontal rule in the forum, so I cannot tell exactly how many hyphens you had on each divider line)
   - Replace with : ☺
     - pick some character that doesn’t occur elsewhere in your document. You might want to pick a unicode character that’s not likely to occur in your text (I picked the smiley ☺), just to avoid collisions with common ASCII characters.
   - Replace All
2. Edit > Line Operations > Sort Lines Lexicographically Ascending
3. Search and replace
   - Find What : ☺
   - Replace With : \r\n
   - Replace All
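
For anyone who wants to see the join / sort / split idea outside Notepad++, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the text uses "\n" line endings, every record starts with its own divider line beginning with a hyphen, and `PLACEHOLDER` and `sort_records` are made-up names. Python's `re` module has no `\R`, so a plain `\n` stands in for it.

```python
import re

PLACEHOLDER = "\u263a"  # the smiley; any character absent from the text works


def sort_records(text):
    """Join each record onto one line, sort the lines, then split them again."""
    # Step 1: replace every line ending that is NOT followed by a divider
    # line (assumed here to start with "-") with the placeholder.
    joined = re.sub(r"\n(?!-)", PLACEHOLDER, text)
    # Step 2: sort the remaining physical lines (one per record).
    lines = sorted(joined.split("\n"))
    # Step 3: turn the placeholders back into line breaks.
    return "\n".join(lines).replace(PLACEHOLDER, "\n")
```

Because each joined line begins with the divider followed by the Name line, the sort effectively orders whole records by name and leaves the lines inside a record untouched.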

If you wanted to sort by something other than name, you’ll have to do some fancier manipulation in the original search-and-replace to easily sort it inside Notepad++. In theory, you could also use the PythonScript plugin to do more manipulation.

Personally, if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language that you’re familiar with (Perl is my go-to for text manipulation, but nowadays, more people know Python than Perl – and, as I mentioned, there is a PythonScript plugin that will allow you to keep things “inside” Notepad++): then you could parse the document into a meaningful structure, and then deal with the structured data, to sort with or rearrange, as you please.
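
As a rough idea of what "parse the document into a meaningful structure" could look like, here is a small Python sketch. The function name `parse_records` and the dictionary keys are my own choices for illustration, and it assumes the Name: / Comment: / Member of groups: layout shown in this thread, with divider lines made only of `_`, `-` or `#`.

```python
def parse_records(text):
    """Parse 'Name:/Comment:/Member of groups:' blocks into a list of dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            # A Name: line opens a new record.
            current = {"name": line[len("Name:"):].strip(), "comment": "", "groups": []}
            records.append(current)
        elif current is None or not line or set(line) <= {"_", "-", "#"}:
            continue  # text before the first record, blank lines, divider lines
        elif line.startswith("Comment:"):
            current["comment"] = line[len("Comment:"):].strip()
        elif line.lower().startswith("member of groups:"):
            continue  # header line only, carries no data
        else:
            current["groups"].append(line)  # anything else is a group name
    return records
```

Once the data is in this shape, sorting is ordinary list work, for example `records.sort(key=lambda r: r["name"])` for the records and `r["groups"].sort()` for the groups of one record.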

Or, if you don’t want to learn another language, you might do a search-and-replace to get it into an official CSV (being careful of lines containing commas, and merging all the members of the groups with a different delimiter than you use between fields), then open it with your favorite spreadsheet or database, and sort on the appropriate column(s).
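
If a script is acceptable just for the export step, the CSV idea might be sketched like this with Python's standard `csv` module. The function name `records_to_csv`, the column names, and the `;` used between groups are assumptions for illustration; the `csv` writer takes care of quoting any field that contains a comma.

```python
import csv
import io


def records_to_csv(records, group_sep=";"):
    """Write one CSV row per record; all groups share one field, joined by group_sep."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "comment", "groups"])
    for name, comment, groups in records:
        # Groups use a different delimiter than the comma between fields.
        writer.writerow([name, comment, group_sep.join(groups)])
    return buffer.getvalue()
```

Each record here is a `(name, comment, groups)` tuple, so `records_to_csv([("asd0", "name", ["PDOC_R", "RDOC_R"])])` yields a header row plus one data row that a spreadsheet can sort on any column.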

For details on regular expressions, this FAQ Desk entry points you to plenty of good sites for reference material.

Jeff Test

Thank you Peter. :)

It was actually 80 underscores, not dashes. But I do see now that they show as a page break.
Also, what didn’t show up, was that each of the items I wanted to sort had 5 spaces in front of them:

Name: jkl1
Comment: name
Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
_______________________________________________________________________________

The above suggestion caused all groups of the same name to move together and moved all the users.
I apologize for not stating this better earlier: I was hoping to maintain each user’s own groups, but sorted alphabetically down a long list.
Each “Name” group needs to remain as:

Name:
Comment:
Member Of Groups:
A
B
C
...
_______________________________________________________________________________

PeterJones

Hmm, in my experiments (with a hyphen separator), it did keep the indenting, I thought. Maybe I’m wrong.

This does the sorting that I think you wanted, but with the underscore separators, and I have confirmed indentation stayed consistent:

Find ([^_])\r\n Replace \1☺ (this assumes windows CRLF line endings: the generic \R matched too much)
Sort
Find ☺ Replace \r\n

So

Name: jkl1
Comment: name
Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
_______________________________________________________________________________
Name: abc1
Comment: name
Member of groups:
    TRAVEL_R
    VOUCHER_R
    SVOUCHER_R
_______________________________________________________________________________

became

Name: abc1
Comment: name
Member of groups:
    TRAVEL_R
    VOUCHER_R
    SVOUCHER_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
    SPURCHASE_R
    VOUCHER_R
    SVOUCHER_R
_______________________________________________________________________________
Name: jkl1
Comment: name
Member of groups:
    TRAVEL_R
    STRAVEL_R
    PDOC_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
    RDOC_R
    SRECEIVE_R
    CONTRACT_R
_______________________________________________________________________________

… but now I’m confused: Do you want to alphabetize by name? Or do you want to leave the records all in the same order, and just sort the group names within an individual record? Or do you want both (ie, records going alphabetical by name, followed by groups going alphabetical inside its “member of groups” section)?

If what I’ve shown doesn’t give what you want, then show at least two records (sets of lines) with at least two groups listed in each record, and show a before and after scenario. (And make sure you include any exceptions or edge cases you think might crop up)

That said, if you want to alphabetize only the group lines within a record, or something more complicated, I will reiterate: “if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language…”. Of course, @guy038 can probably work his magic on even something more complicated, but you’ll have to give a sufficient example to include all edge cases.

guy038

Hello, @jeff-test,

Do you expect an output text like below, where each block of member names, containing the _ character, is sorted ?

#########################

Name: asd0
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
RDOC_R
RDOC_RW
RDOC_RWX
SPURCHASE_R
SRECEIVE_R
SRECEIVE_RWX

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
CONTRACT_R
Lease_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: jkl1
Comment: name
Member of groups:
CONTRACT_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
STRAVEL_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

See you later,

Best regards,

guy038

Jeff Test

Yes. Each block: it doesn’t matter what order they are in, as long as each user remains with their listed access. However, each little list of ‘groups’ needs to be sorted, such as @guy038 has stated/shown.

PeterJones

So, assuming you have the Python Script Plugin…

Create the script: Plugins > Python Script > New Script

Paste the script, and save with meaningful name: sortGroups.py

 #https://notepad-plus-plus.org/community/topic/15971/sorting-multiple-chunks-of-non-connected-text/4
 # works on active view / file
 import sys
 from Npp import *

 console.clear()
 #console.show()

 # for debug, change view and document index
 #i = notepad.getCurrentDocIndex(1)
 #notepad.activateIndex(1,i)

 keepGoing = True

 editor.documentEnd()               # go to the last position
 end2 = editor.getCurrentPos()      # record the position
 #console.write("editor.end = " + str(end2)+"\n")
 editor.documentStart()             # back to the beginning
 start2 = editor.getCurrentPos()    # record the position
 #console.write("editor.start = " + str(start2)+"\n")

 while keepGoing:
     # find the group:\R prefix
     position = editor.findText( FINDOPTION.REGEXP, start2, end2, "groups:$")
     if position is None:
         break
     #console.write("editor: findText @ " + str(position[0]) + ":" + str(position[1]) + "\n")

     # find ___ (or EOF)
     underscore = editor.findText( FINDOPTION.REGEXP, position[1], end2, "^_+$")
     if underscore is None:
         keepGoing = False
         underscore = (end2, end2)

     # select the text
     #console.write("editor: findText @ " + str(underscore[0]) + ":" + str(underscore[1]) + "\n")
     editor.setSelectionStart(position[1]+2)    # start after the newline (+2 assumes Windows CRLF line endings)
     editor.setSelectionEnd(underscore[0]-2)    # end before the newline (-2 likewise assumes CRLF)

     # okay, now the first match is highlighted... need to run the Edit > Line Operations > Sort Lines Lexicographically Ascending...
     # maybe notepad.menuCommand() or notepad.runMenuCommand()
     notepad.menuCommand(42059)  # got from https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
     #console.write(str(keepGoing))

     # start at the end of the last group
     start2 = underscore[0]

 # want nothing selected at end
 editor.clearSelections()

Input file (make sure this is the active window / view / document in Notepad++)

 Name: jkl1
 Comment: name
 Member of groups:
     STRAVEL_R
     PDOC_R
     TRAVEL_R
 _______________________________________________________________________________
 Name: abc3
 Comment: name
 Member of groups:
     SPURCHASE_R
     VOUCHER_R
     SVOUCHER_R
 _______________________________________________________________________________
 Name: xyz9
 Comment: name
 Member of groups:
     RDOC_R
     SRECEIVE_R
     CONTRACT_R
 _______________________________________________________________________________
 Name: abc1
 Comment: name
 Member of groups:
     TRAVEL_R
     VOUCHER_R
     SVOUCHER_R
 _______________________________________________________________________________

Plugins > Python Script > Scripts > sortGroups.py

Result:

 Name: jkl1
 Comment: name
 Member of groups:
     PDOC_R
     STRAVEL_R
     TRAVEL_R
 _______________________________________________________________________________
 Name: abc3
 Comment: name
 Member of groups:
     SPURCHASE_R
     SVOUCHER_R
     VOUCHER_R
 _______________________________________________________________________________
 Name: xyz9
 Comment: name
 Member of groups:
     CONTRACT_R
     RDOC_R
     SRECEIVE_R
 _______________________________________________________________________________
 Name: abc1
 Comment: name
 Member of groups:
     SVOUCHER_R
     TRAVEL_R
     VOUCHER_R
 _______________________________________________________________________________

I think this does what you want.
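
If you want to check the logic without Notepad++ open, the same "sort only the lines between groups: and the divider" idea can be sketched in plain Python. This is not the PythonScript API, just an illustration on a string; `sort_group_blocks` is a made-up name, and it assumes a group list ends at a line of underscores, at a blank line, or at the end of the text.

```python
def sort_group_blocks(text):
    """Sort the lines between each 'groups:' header and the next divider."""
    out = []
    block = None  # None while outside a group list, else the lines collected so far
    for line in text.splitlines():
        stripped = line.strip()
        is_divider = stripped != "" and set(stripped) == {"_"}
        if block is not None and (is_divider or stripped == ""):
            out.extend(sorted(block))  # flush the finished group list
            block = None
        if block is not None:
            block.append(line)  # still inside a group list
        else:
            out.append(line)
            if stripped.endswith("groups:"):
                block = []  # the next lines belong to this record's groups
    if block is not None:  # text ended inside a group list
        out.extend(sorted(block))
    # Keep the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Records stay in their original order and the indentation of each group line is kept, because whole lines are sorted as they are.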

guy038

Hi, @jeff-test, @peterjones and All,

Sorry for this late reply, but I was with my sister and brother-in-law, a couple of days !

So, here is, below, my solution, just using regex S/R and a N++ sort ! Of course, the Python script, from Peter is easier to use and, probably, quicker than all my stuff ! Just one advantage : this method can be processed with a minimalist N++ package Ah, Ah -;))

So, starting with this sample text, below, in a new N++ tab :

Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R

Firstly, add, if necessary, an empty line at the very end of the list

Now, if we use the following regex S/R, which copies the name of each group, preceded by the # symbol, in a new line, located after the corresponding group :

SEARCH (?s-i)^Name:\x20(.+?\R).+?\R(?=\R|\z)

REPLACE $0#\1

we get the text :

Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
#asd0

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R
#wr3

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R
#ft3

Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R
#grt

Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
#asdq

Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
#jkl1

Then, using this second regex S/R, which adds a prefix to each line of the list, depending on its type :

SEARCH (?-si)^((?:(Name:)|(Comment:)|(Member\x20)|(.+_)).+)(?=(?:(?s).+?)#(.+))|^#(.+)\R?

REPLACE (?2\6_1_\1)(?3\6_2_\1)(?4\6_3_\1)(?5\6_4_\1)(?7\7_5_)

we obtain :

asd0_1_Name: asd0
asd0_2_Comment: name
asd0_3_Member of groups:
asd0_4_PDOC_R
asd0_4_VOUCHER_R
asd0_4_RDOC_R
asd0_5_
wr3_1_Name: wr3
wr3_2_Comment: name
wr3_3_Member of groups:
wr3_4_PDOC_R
wr3_4_SPURCHASE_R
wr3_4_RDOC_R
wr3_4_RDOC_RW
wr3_4_SRECEIVE_RWX
wr3_4_RDOC_RWX
wr3_4_SRECEIVE_R
wr3_5_
ft3_1_Name: ft3
ft3_2_Comment: name
ft3_3_Member of groups:
ft3_4_PDOC_R
ft3_4_RDOC_R
ft3_4_SORMPURCHASING_R
ft3_4_SORMRECEIVING_R
ft3_4_SORMVOUCHERS_R
ft3_4_VOUCHER_R
ft3_5_
grt_1_Name: grt
grt_2_Comment: name
grt_3_Member of groups:
grt_4_TRAVEL_R
grt_4_PDOC_R
grt_4_SPURCHASE_R
grt_4_VOUCHER_R
grt_4_SVOUCHER_R
grt_4_RDOC_R
grt_4_SRECEIVE_R
grt_4_CONTRACT_R
grt_4_Lease_R
grt_5_
asdq_1_Name: asdq
asdq_2_Comment: name
asdq_3_Member of groups:
asdq_4_PDOC_R
asdq_4_VOUCHER_R
asdq_4_RDOC_R
asdq_5_
jkl1_1_Name: jkl1
jkl1_2_Comment: name
jkl1_3_Member of groups:
jkl1_4_TRAVEL_R
jkl1_4_STRAVEL_R
jkl1_4_PDOC_R
jkl1_4_SPURCHASE_R
jkl1_4_VOUCHER_R
jkl1_4_SVOUCHER_R
jkl1_4_RDOC_R
jkl1_4_SRECEIVE_R
jkl1_4_CONTRACT_R
jkl1_5_

Now, we perform the usual sort, using the menu command Edit > Line Operations > Sort Lines Lexicographically Ascending

which gives the following text :

asd0_1_Name: asd0
asd0_2_Comment: name
asd0_3_Member of groups:
asd0_4_PDOC_R
asd0_4_RDOC_R
asd0_4_VOUCHER_R
asd0_5_
asdq_1_Name: asdq
asdq_2_Comment: name
asdq_3_Member of groups:
asdq_4_PDOC_R
asdq_4_RDOC_R
asdq_4_VOUCHER_R
asdq_5_
ft3_1_Name: ft3
ft3_2_Comment: name
ft3_3_Member of groups:
ft3_4_PDOC_R
ft3_4_RDOC_R
ft3_4_SORMPURCHASING_R
ft3_4_SORMRECEIVING_R
ft3_4_SORMVOUCHERS_R
ft3_4_VOUCHER_R
ft3_5_
grt_1_Name: grt
grt_2_Comment: name
grt_3_Member of groups:
grt_4_CONTRACT_R
grt_4_Lease_R
grt_4_PDOC_R
grt_4_RDOC_R
grt_4_SPURCHASE_R
grt_4_SRECEIVE_R
grt_4_SVOUCHER_R
grt_4_TRAVEL_R
grt_4_VOUCHER_R
grt_5_
jkl1_1_Name: jkl1
jkl1_2_Comment: name
jkl1_3_Member of groups:
jkl1_4_CONTRACT_R
jkl1_4_PDOC_R
jkl1_4_RDOC_R
jkl1_4_SPURCHASE_R
jkl1_4_SRECEIVE_R
jkl1_4_STRAVEL_R
jkl1_4_SVOUCHER_R
jkl1_4_TRAVEL_R
jkl1_4_VOUCHER_R
jkl1_5_
wr3_1_Name: wr3
wr3_2_Comment: name
wr3_3_Member of groups:
wr3_4_PDOC_R
wr3_4_RDOC_R
wr3_4_RDOC_RW
wr3_4_RDOC_RWX
wr3_4_SPURCHASE_R
wr3_4_SRECEIVE_R
wr3_4_SRECEIVE_RWX
wr3_5_

And, finally, we get rid of all the prefixes, in all the lines of the list. Very easy with the simple S/R, below :

SEARCH ^.+_\d_

REPLACE Leave EMPTY

So, we are left with :

Name: asd0
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
CONTRACT_R
Lease_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

Name: jkl1
Comment: name
Member of groups:
CONTRACT_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
STRAVEL_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
RDOC_R
RDOC_RW
RDOC_RWX
SPURCHASE_R
SRECEIVE_R
SRECEIVE_RWX

Remark :

Just notice that, both :

The different groups are alphabetically sorted
The list of all the members, in each group, is also, alphabetically sorted
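
For readers more at home in a scripting language, the prefix / sort / strip idea above corresponds to what is often called decorate-sort-undecorate. Here is a minimal Python sketch of it. It is a variation, not a translation of the regexes: the sort key is kept as a tuple next to each line instead of being glued onto the text, so no final stripping step is needed. The name `sort_records_and_groups` is hypothetical, and the sketch assumes the text starts with a Name: line and that records are separated by blank lines.

```python
def sort_records_and_groups(text):
    """Tag each line with (record name, line type), sort, then drop the tags."""
    decorated = []
    name = ""
    for line in text.splitlines():
        if line.startswith("Name: "):
            name, rank = line[len("Name: "):], 1
        elif line.startswith("Comment:"):
            rank = 2
        elif line.startswith("Member "):
            rank = 3
        elif line.strip() == "":
            rank = 5  # the blank separator sorts last within its record
        else:
            rank = 4  # a group line
        decorated.append((name, rank, line))
    decorated.sort()  # by record name, then line type, then the line itself
    return "\n".join(line for _, _, line in decorated)
```

As with the regex version, an empty line is needed after the last record as well, otherwise that record ends up without a separator once it moves away from the end.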

Cheers,

guy038

Jeff Test

That was utter brilliance @guy038 . I, i can’t even. Hats off on that.
(This is not to say that the py script from @PeterJones is out the window, this will just help me convince folks that Notepad++ is the best.) :)

+++Unrelated+++
So, is there a book, like NoStarchPress or something for Notepad++? How did you learn all these fancy bits for the regex? (it’s Perl?)
Python vs. Perl? (I’d like to learn one, or both. My aim is system administration/automation. Recommendations?)