Sorting multiple chunks of non-connected text
-
How do I sort disparate chunks of text across a large document?
Is a macro going to be the best option? Bounds would be (everything between):
Member of groups: to
#########################
Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R
Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R
Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R
Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
-
I would do it in a three-step process:
-
Search and replace (with Search Mode set to Regular expression)
-
Find What :
\R(?!^-)
- finds every end-of-line that is not followed by a - at the beginning of the next line (if some non-divider lines may also start with a hyphen, put the full run of divider hyphens in the lookahead instead: your divider got rendered as a horizontal rule in the forum, so I cannot tell exactly how many hyphens you had on each divider line)
-
Replace with :
☺
- pick some character that doesn’t occur elsewhere in your document. You might want to pick a unicode character that’s not likely to occur in your text (I picked the smiley ☺), just to avoid collisions with common ASCII characters.
-
Replace All
-
-
Edit > Line Operations > Sort Lines Lexicographically Ascending
-
Search and replace
-
Find What :
☺
-
Replace With :
\r\n
-
Replace All
-
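The three steps above can also be sketched outside Notepad++. This is only a minimal Python illustration, assuming LF line endings and divider lines that start with a hyphen; the function name `sort_records` is hypothetical, and because Python's `re` module has no `\R`, a plain `\n` stands in for it.

```python
import re

PLACEHOLDER = "\u263a"  # the smiley; any character absent from the text works


def sort_records(text):
    """Join each record onto one line, sort the lines, then split them again."""
    # Step 1: replace every newline that is NOT followed by a divider hyphen.
    joined = re.sub(r"\n(?!-)", PLACEHOLDER, text)
    # Step 2: sort the (now one-per-record) lines lexicographically.
    lines = sorted(joined.split("\n"))
    # Step 3: turn the placeholder back into real newlines.
    return "\n".join(lines).replace(PLACEHOLDER, "\n")


sample = (
    "-----\nName: wr3\nMember of groups:\nPDOC_R\n"
    "-----\nName: asd0\nMember of groups:\nVOUCHER_R"
)
print(sort_records(sample))
```

Because every joined line starts with the divider, the sort effectively falls through to the `Name:` value, which is why whole records move together.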
If you want to sort by something other than name, you’ll have to do some fancier manipulation in the original search-and-replace to make it easy to sort inside Notepad++. In theory, you could also use the PythonScript plugin to do more manipulation.
Personally, if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language that you’re familiar with (Perl is my go-to for text manipulation, but nowadays, more people know Python than Perl – and, as I mentioned, there is a PythonScript plugin that will allow you to keep things “inside” Notepad++): then you could parse the document into a meaningful structure, and then deal with the structured data, to sort with or rearrange, as you please.
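As a rough idea of what "parse the document into a meaningful structure" could look like, here is a small Python sketch. The function name `parse_records` and the dictionary keys are made up for illustration, and it assumes every record starts with a `Name:` line and that divider lines consist only of `_`, `-` or `#` characters.

```python
def parse_records(text):
    """Parse 'Name:/Comment:/Member of groups:' blocks into a list of dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            current = {"name": line[len("Name:"):].strip(), "comment": "", "groups": []}
            records.append(current)
        elif current is None or not line or set(line) <= {"_", "-", "#"}:
            continue  # skip blank lines, divider lines, and text before the first record
        elif line.startswith("Comment:"):
            current["comment"] = line[len("Comment:"):].strip()
        elif line.startswith("Member of groups:"):
            continue  # header only; the groups follow on their own lines
        else:
            current["groups"].append(line)
    return records


sample = """Name: wr3
Comment: name
Member of groups:
     SPURCHASE_R
     PDOC_R
_____
Name: asd0
Comment: name
Member of groups:
     VOUCHER_R
     PDOC_R
"""
records = parse_records(sample)
# Once the data is structured, either kind of sort is a one-liner.
for rec in sorted(records, key=lambda r: r["name"]):
    print(rec["name"], sorted(rec["groups"]))
```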
Or, if you don’t want to learn another language, you might do a search-and-replace to get it into a proper CSV (being careful of lines that contain commas, and merging all the members of the groups with a different delimiter than the one you use between fields), then open it with your favorite spreadsheet or database, and sort on the appropriate column(s).
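For the CSV route, a sketch of the target layout might help. The records below are hypothetical, already-parsed sample data, and the `;` delimiter between groups is just one possible choice; Python's standard `csv` module quotes any field that happens to contain a comma.

```python
import csv
import io

# Hypothetical, already-parsed records: (name, comment, list of groups).
records = [
    ("wr3", "name", ["PDOC_R", "SPURCHASE_R", "RDOC_R"]),
    ("asd0", "name", ["PDOC_R", "VOUCHER_R", "RDOC_R"]),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "comment", "groups"])
for name, comment, groups in records:
    # Join the groups with ';' so they stay inside a single CSV field.
    writer.writerow([name, comment, ";".join(groups)])

csv_text = buffer.getvalue()
print(csv_text)
```

Each user then occupies exactly one row, so a spreadsheet can sort on the name column without separating users from their groups.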
For details on regular expressions, this FAQ Desk entry points you to plenty of good sites for reference material.
-
-
Thank you Peter. :)
It was actually 80 underscores, not dashes. But I do see now that they show as a page break.
Also, what didn’t show up was that each of the items I wanted to sort had 5 spaces in front of them:
Name: jkl1
Comment: name
Member of groups:
     TRAVEL_R
     STRAVEL_R
     PDOC_R
     SPURCHASE_R
     VOUCHER_R
     SVOUCHER_R
     RDOC_R
     SRECEIVE_R
     CONTRACT_R
_______________________________________________________________________________
The above suggestion caused all groups of the same name to move together and moved all the users.
I apologize for not stating this better earlier: I was hoping to keep each user’s own groups together, but sorted alphabetically down a long list.
Each “Name” block needs to remain as:
Name:
Comment:
Member Of Groups:
     A
     B
     C
     ...
_______________________________________________________________________________
-
Hmm, in my experiments (with a hyphen separator), it did keep the indenting, I thought. Maybe I’m wrong.
This does the sorting that I think you wanted, but with the underscore separators, and I have confirmed indentation stayed consistent:
- Find
([^_])\r\n
Replace
\1☺
(this assumes Windows CRLF line endings: the generic \R matched too much)
- Sort
- Find
☺
Replace
\r\n
So
Name: jkl1
Comment: name
Member of groups:
     TRAVEL_R
     STRAVEL_R
     PDOC_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
     SPURCHASE_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
     RDOC_R
     SRECEIVE_R
     CONTRACT_R
_______________________________________________________________________________
Name: abc1
Comment: name
Member of groups:
     TRAVEL_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
became
Name: abc1
Comment: name
Member of groups:
     TRAVEL_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
     SPURCHASE_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
Name: jkl1
Comment: name
Member of groups:
     TRAVEL_R
     STRAVEL_R
     PDOC_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
     RDOC_R
     SRECEIVE_R
     CONTRACT_R
_______________________________________________________________________________
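For anyone who wants to try the underscore variant outside Notepad++, here is a hedged Python sketch of the same idea. It assumes CRLF line endings and that only divider lines end in an underscore followed directly by the line break; `sort_records_crlf` is a made-up name.

```python
import re

PLACEHOLDER = "\u263a"  # the smiley; must not occur anywhere in the text


def sort_records_crlf(text):
    """Join every line that does not end in '_' with the next one, sort, restore."""
    # Keep the captured last character and swap only the CRLF for the placeholder.
    joined = re.sub(r"([^_])\r\n", "\\1" + PLACEHOLDER, text)
    # Each remaining CRLF follows a divider, so each piece is one whole record.
    lines = [line for line in joined.split("\r\n") if line]
    lines.sort()
    return "".join(line.replace(PLACEHOLDER, "\r\n") + "\r\n" for line in lines)


sample = (
    "Name: jkl1\r\nMember of groups:\r\n     TRAVEL_R\r\n_____\r\n"
    "Name: abc1\r\nMember of groups:\r\n     VOUCHER_R\r\n_____\r\n"
)
print(sort_records_crlf(sample))
```

The leading spaces are part of the joined line, so the indentation comes back unchanged when the placeholder is replaced.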
… but now I’m confused: Do you want to alphabetize by name? Or do you want to leave the records all in the same order, and just sort the group names within an individual record? Or do you want both (i.e., records going alphabetical by name, followed by groups going alphabetical inside each record’s “member of groups” section)?
If what I’ve shown doesn’t give what you want, then show at least two records (sets of lines) with at least two groups listed in each record, and show a before and after scenario. (And make sure you include any exceptions or edge cases you think might crop up)
That said, if you want to alphabetize only the group lines within a record, or something more complicated, I will reiterate: “if it gets more complicated than just sorting simply by name, I think it would be worth it to pick some scripting language…”. Of course, @guy038 can probably work his magic on even something more complicated, but you’ll have to give a sufficient example to include all edge cases.
-
Hello, @jeff-test,
Do you expect an output text like below, where each block of member names, containing the _ character, is sorted ?
#########################
Name: asd0
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R
Name: wr3
Comment: name
Member of groups:
PDOC_R
RDOC_R
RDOC_RW
RDOC_RWX
SPURCHASE_R
SRECEIVE_R
SRECEIVE_RWX
Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R
Name: grt
Comment: name
Member of groups:
CONTRACT_R
Lease_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R
Name: asdq
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R
Name: jkl1
Comment: name
Member of groups:
CONTRACT_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
STRAVEL_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R
See you later,
Best regards,
guy038
-
Yes. Each block: it doesn’t matter what order they are in, as long as each user remains with their listed access. However, each little list of ‘groups’ needs to be sorted, as @guy038 has shown.
-
So, assuming you have the Python Script Plugin…
-
Create the script:
Plugins > Python Script > New Script
-
Paste the script, and save with meaningful name:
sortGroups.py
# https://notepad-plus-plus.org/community/topic/15971/sorting-multiple-chunks-of-non-connected-text/4
# works on active view / file
import sys
from Npp import *

console.clear()
#console.show()

# for debug, change view and document index
#i = notepad.getCurrentDocIndex(1)
#notepad.activateIndex(1,i)

keepGoing = True

editor.documentEnd()                # go to the last position
end2 = editor.getCurrentPos()       # record the position
#console.write("editor.end = " + str(end2)+"\n")
editor.documentStart()              # back to the beginning
start2 = editor.getCurrentPos()     # record the position
#console.write("editor.start = " + str(start2)+"\n")

while keepGoing:
    # find the group:\R prefix
    position = editor.findText( FINDOPTION.REGEXP, start2, end2, "groups:$")
    if position is None:
        break
    #console.write("editor: findText @ " + str(position[0]) + ":" + str(position[1]) + "\n")

    # find ___ (or EOF)
    underscore = editor.findText( FINDOPTION.REGEXP, position[1], end2, "^_+$")
    if underscore is None:
        keepGoing = False
        underscore = (end2, end2)

    # select the text
    #console.write("editor: findText @ " + str(underscore[0]) + ":" + str(underscore[1]) + "\n")
    editor.setSelectionStart(position[1]+2)     # start after the newline
    editor.setSelectionEnd(underscore[0]-2)     # end before the newline

    # okay, now the first match is highlighted... need to run the Edit > Line Operations > Sort Lines Lexicographically Ascending...
    #   maybe notepad.menuCommand() or notepad.runMenuCommand()
    notepad.menuCommand(42059)      # got from https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/PowerEditor/installer/nativeLang/english.xml
    #console.write(str(keepGoing))

    # start at the end of the last group
    start2 = underscore[0]

# want nothing selected at end
editor.clearSelections()
-
Input file (make sure this is the active window / view / document in Notepad++)
Name: jkl1
Comment: name
Member of groups:
     STRAVEL_R
     PDOC_R
     TRAVEL_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
     SPURCHASE_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
     RDOC_R
     SRECEIVE_R
     CONTRACT_R
_______________________________________________________________________________
Name: abc1
Comment: name
Member of groups:
     TRAVEL_R
     VOUCHER_R
     SVOUCHER_R
_______________________________________________________________________________
-
Plugins > Python Script > Scripts > sortGroups.py
-
Result:
Name: jkl1
Comment: name
Member of groups:
     PDOC_R
     STRAVEL_R
     TRAVEL_R
_______________________________________________________________________________
Name: abc3
Comment: name
Member of groups:
     SPURCHASE_R
     SVOUCHER_R
     VOUCHER_R
_______________________________________________________________________________
Name: xyz9
Comment: name
Member of groups:
     CONTRACT_R
     RDOC_R
     SRECEIVE_R
_______________________________________________________________________________
Name: abc1
Comment: name
Member of groups:
     SVOUCHER_R
     TRAVEL_R
     VOUCHER_R
_______________________________________________________________________________
I think this does what you want.
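If you would rather test the logic without Notepad++ at all, the same per-record sort can be sketched in plain Python. This is not the PythonScript API, just an illustration under the assumptions that group lists start after a line ending in `groups:` and end at a line made only of underscores (or at a blank line / end of text); `sort_groups` is a hypothetical name.

```python
import re


def sort_groups(text):
    """Sort the lines between each 'groups:' header and the next divider."""
    out = []
    pending = None  # group lines of the current record, or None outside a list
    for line in text.split("\n"):
        if pending is not None and (re.fullmatch(r"_+", line) or line == ""):
            # A divider or blank line closes the current group list.
            out.extend(sorted(pending))
            pending = None
        if pending is not None:
            pending.append(line)
        else:
            out.append(line)
            if line.endswith("groups:"):
                pending = []
    if pending:
        out.extend(sorted(pending))  # text ended without a divider
    return "\n".join(out)


sample = (
    "Name: jkl1\nComment: name\nMember of groups:\n"
    "     STRAVEL_R\n     PDOC_R\n     TRAVEL_R\n___\n"
    "Name: abc3\nComment: name\nMember of groups:\n"
    "     SPURCHASE_R\n     VOUCHER_R\n     SVOUCHER_R\n___\n"
)
print(sort_groups(sample))
```

Records stay in their original order; only the lines inside each group list move.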
-
-
Hi, @jeff-test, @peterjones and All,
Sorry for this late reply, but I was with my sister and brother-in-law, a couple of days !
So, here is, below, my solution, just using regex S/R and a N++ sort ! Of course, the Python script, from Peter is easier to use and, probably, quicker than all my stuff ! Just one advantage : this method can be processed with a minimalist N++ package Ah, Ah -;))
So, starting with this sample text, below, in a new N++ tab :
Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R

Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Firstly, add, if necessary, an empty line at the very end of the list
Now, if we use the following regex S/R, which copies the name of each group, preceded by the # symbol, in a new line, located after the corresponding group :
SEARCH
(?s-i)^Name:\x20(.+?\R).+?\R(?=\R|\z)
REPLACE
$0#\1
we get the text :
Name: asd0
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
#asd0

Name: wr3
Comment: name
Member of groups:
PDOC_R
SPURCHASE_R
RDOC_R
RDOC_RW
SRECEIVE_RWX
RDOC_RWX
SRECEIVE_R
#wr3

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R
#ft3

Name: grt
Comment: name
Member of groups:
TRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
Lease_R
#grt

Name: asdq
Comment: name
Member of groups:
PDOC_R
VOUCHER_R
RDOC_R
#asdq

Name: jkl1
Comment: name
Member of groups:
TRAVEL_R
STRAVEL_R
PDOC_R
SPURCHASE_R
VOUCHER_R
SVOUCHER_R
RDOC_R
SRECEIVE_R
CONTRACT_R
#jkl1
Then, using this second regex S/R, which adds a prefix to each line of the list, depending on its type :
SEARCH
(?-si)^((?:(Name:)|(Comment:)|(Member\x20)|(.+_)).+)(?=(?:(?s).+?)#(.+))|^#(.+)\R?
REPLACE
(?2\6_1_\1)(?3\6_2_\1)(?4\6_3_\1)(?5\6_4_\1)(?7\7_5_)
we obtain :
asd0_1_Name: asd0
asd0_2_Comment: name
asd0_3_Member of groups:
asd0_4_PDOC_R
asd0_4_VOUCHER_R
asd0_4_RDOC_R
asd0_5_
wr3_1_Name: wr3
wr3_2_Comment: name
wr3_3_Member of groups:
wr3_4_PDOC_R
wr3_4_SPURCHASE_R
wr3_4_RDOC_R
wr3_4_RDOC_RW
wr3_4_SRECEIVE_RWX
wr3_4_RDOC_RWX
wr3_4_SRECEIVE_R
wr3_5_
ft3_1_Name: ft3
ft3_2_Comment: name
ft3_3_Member of groups:
ft3_4_PDOC_R
ft3_4_RDOC_R
ft3_4_SORMPURCHASING_R
ft3_4_SORMRECEIVING_R
ft3_4_SORMVOUCHERS_R
ft3_4_VOUCHER_R
ft3_5_
grt_1_Name: grt
grt_2_Comment: name
grt_3_Member of groups:
grt_4_TRAVEL_R
grt_4_PDOC_R
grt_4_SPURCHASE_R
grt_4_VOUCHER_R
grt_4_SVOUCHER_R
grt_4_RDOC_R
grt_4_SRECEIVE_R
grt_4_CONTRACT_R
grt_4_Lease_R
grt_5_
asdq_1_Name: asdq
asdq_2_Comment: name
asdq_3_Member of groups:
asdq_4_PDOC_R
asdq_4_VOUCHER_R
asdq_4_RDOC_R
asdq_5_
jkl1_1_Name: jkl1
jkl1_2_Comment: name
jkl1_3_Member of groups:
jkl1_4_TRAVEL_R
jkl1_4_STRAVEL_R
jkl1_4_PDOC_R
jkl1_4_SPURCHASE_R
jkl1_4_VOUCHER_R
jkl1_4_SVOUCHER_R
jkl1_4_RDOC_R
jkl1_4_SRECEIVE_R
jkl1_4_CONTRACT_R
jkl1_5_
Now, we perform the usual sort, using the menu command
Edit > Line Operations > Sort Lines Lexicographically Ascending
which gives the following text :
asd0_1_Name: asd0
asd0_2_Comment: name
asd0_3_Member of groups:
asd0_4_PDOC_R
asd0_4_RDOC_R
asd0_4_VOUCHER_R
asd0_5_
asdq_1_Name: asdq
asdq_2_Comment: name
asdq_3_Member of groups:
asdq_4_PDOC_R
asdq_4_RDOC_R
asdq_4_VOUCHER_R
asdq_5_
ft3_1_Name: ft3
ft3_2_Comment: name
ft3_3_Member of groups:
ft3_4_PDOC_R
ft3_4_RDOC_R
ft3_4_SORMPURCHASING_R
ft3_4_SORMRECEIVING_R
ft3_4_SORMVOUCHERS_R
ft3_4_VOUCHER_R
ft3_5_
grt_1_Name: grt
grt_2_Comment: name
grt_3_Member of groups:
grt_4_CONTRACT_R
grt_4_Lease_R
grt_4_PDOC_R
grt_4_RDOC_R
grt_4_SPURCHASE_R
grt_4_SRECEIVE_R
grt_4_SVOUCHER_R
grt_4_TRAVEL_R
grt_4_VOUCHER_R
grt_5_
jkl1_1_Name: jkl1
jkl1_2_Comment: name
jkl1_3_Member of groups:
jkl1_4_CONTRACT_R
jkl1_4_PDOC_R
jkl1_4_RDOC_R
jkl1_4_SPURCHASE_R
jkl1_4_SRECEIVE_R
jkl1_4_STRAVEL_R
jkl1_4_SVOUCHER_R
jkl1_4_TRAVEL_R
jkl1_4_VOUCHER_R
jkl1_5_
wr3_1_Name: wr3
wr3_2_Comment: name
wr3_3_Member of groups:
wr3_4_PDOC_R
wr3_4_RDOC_R
wr3_4_RDOC_RW
wr3_4_RDOC_RWX
wr3_4_SPURCHASE_R
wr3_4_SRECEIVE_R
wr3_4_SRECEIVE_RWX
wr3_5_
And, finally, we get rid of all the prefixes, in all the lines of the list. Very easy with the simple S/R, below :
SEARCH
^.+_\d_
REPLACE
Leave EMPTY
So, we are left with :
Name: asd0
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: asdq
Comment: name
Member of groups:
PDOC_R
RDOC_R
VOUCHER_R

Name: ft3
Comment: name
Member of groups:
PDOC_R
RDOC_R
SORMPURCHASING_R
SORMRECEIVING_R
SORMVOUCHERS_R
VOUCHER_R

Name: grt
Comment: name
Member of groups:
CONTRACT_R
Lease_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

Name: jkl1
Comment: name
Member of groups:
CONTRACT_R
PDOC_R
RDOC_R
SPURCHASE_R
SRECEIVE_R
STRAVEL_R
SVOUCHER_R
TRAVEL_R
VOUCHER_R

Name: wr3
Comment: name
Member of groups:
PDOC_R
RDOC_R
RDOC_RW
RDOC_RWX
SPURCHASE_R
SRECEIVE_R
SRECEIVE_RWX
Remark :
Just notice that, both :
-
The different groups are alphabetically sorted
-
The list of all the members, in each group, is also, alphabetically sorted
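The prefix trick used above (decorate each line with a sort key, sort, then strip the key) can be sketched in Python too. This is only an illustration of the idea, not a translation of the regexes: Python's `re` module has no conditional replacement syntax, so the prefixes are built in a loop. It assumes records are separated by exactly one empty line, each record starts with `Name: `, and no group name contains an underscore-digit-underscore sequence; `sort_names_and_groups` is a hypothetical name.

```python
import re


def sort_names_and_groups(text):
    """Decorate each line with '<name>_<rank>_', sort, then strip the decoration."""
    decorated = []
    for block in text.strip("\n").split("\n\n"):
        lines = block.split("\n")
        name = lines[0][len("Name: "):]  # first line of a record is 'Name: <name>'
        for line in lines:
            if line.startswith("Name:"):
                rank = 1
            elif line.startswith("Comment:"):
                rank = 2
            elif line.startswith("Member "):
                rank = 3
            else:
                rank = 4  # a group line
            decorated.append("%s_%d_%s" % (name, rank, line))
        decorated.append("%s_5_" % name)  # stands in for the empty separator line
    decorated.sort()
    # Same idea as the final S/R: drop everything up to the '_<digit>_' marker.
    stripped = [re.sub(r"^.+_\d_", "", line) for line in decorated]
    return "\n".join(stripped)


sample = (
    "Name: wr3\nComment: name\nMember of groups:\nSPURCHASE_R\nPDOC_R\n\n"
    "Name: asd0\nComment: name\nMember of groups:\nVOUCHER_R\nPDOC_R\n"
)
print(sort_names_and_groups(sample))
```

The rank digit keeps the three header lines in order and ahead of the group lines, so one plain lexicographic sort handles both levels at once.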
Cheers,
guy038
-
-
That was utter brilliance, @guy038. I, I can’t even. Hats off on that.
(This is not to say that the py script from @PeterJones is out the window; this will just help me convince folks that Notepad++ is the best.) :)
+++Unrelated+++
So, is there a book, like NoStarchPress or something, for Notepad++? How did you learn all these fancy bits for the regex? (Is it Perl?)
Python vs. Perl? (I’d like to learn one, or both. My aim is system administration/automation. Recommendations?)