Column-aligning jagged data
-
Note, doing an involved regex replacement (with potentially lots of precalculation) isn’t what I’m after here, sorry @guy038
Tab characters aren’t in play either.
-
Personally, this looks like a question begging for a scripting environment solution, not a pure-NPP solution.
I’m sure @Claudia-Frank could whip up the PythonScript solution in no time.
Since I’m not a python guru, I’d choose my go-to language of Perl, and send it through a one-liner:
perl -lpi.bak -e "$_ = sprintf qq(%-256s %s), split ' ', $_, 2" $(FULL_CURRENT_PATH)
(It will run in NppExec or even through Run > Run.)
That one-liner assumes the first word is never more than 256 characters; you can use any width you want by replacing %-256s. (Nifty aside: if you don’t know the maximum length of the first-word strings, but don’t care about characters beyond N [say, for example, 40 characters], you could use %-40.40s, which would then pad to 40 characters when the first word is short, and truncate to 40 if it’s longer than 40.)
That word-length assumption possibly violates your rejection of “precalculation”, but I am not a miracle worker, sorry; that’s why I picked 256: I doubted there would be a “word” longer than that, except in binary or genetic data. I could have made a two-pass script, but depending on file size, that might involve a lot of memory. I could have opened/parsed the file twice to avoid keeping it in memory and to auto-pre-compute the maximum width of the first word. But since it’s unlikely a random NPP user has perl installed, I won’t bother with a more-complicated perl solution, unless you ask for it.
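For reference, the two-pass idea can be sketched in plain Python 3. This is only an illustration under stated assumptions: it works on a list of strings rather than on a file or the Notepad++ editor, it keeps every line in memory (the cost noted above), and the name align_first_column is made up for this example.

```python
def align_first_column(lines):
    """Pad the first word of each line to the width of the longest first word.

    Hypothetical helper for illustration; not part of Perl, PythonScript or Notepad++.
    """
    # Pass 1: split each line into "first word" and "rest", and find the
    # longest first word. Blank lines produce an empty list and are skipped.
    split_lines = [line.split(None, 1) for line in lines]
    width = max((len(parts[0]) for parts in split_lines if parts), default=0)

    # Pass 2: left-justify the first word to that width, then one space,
    # then the rest of the line. One-word and blank lines are left alone.
    out = []
    for parts in split_lines:
        if len(parts) == 2:
            out.append("{:<{w}} {}".format(parts[0], parts[1], w=width))
        else:
            out.append(parts[0] if parts else "")
    return out
```

Calling align_first_column(["trade  Ground", "thousand strong"]) pads "trade" to the eight characters of "thousand", so no width has to be guessed in advance.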
(Golf challenge: I’d like to see if Claudia, or other python guru, could make the Python Script or python.exe one-liner shorter…)
-
I think I can’t make it shorter ;-)
editor.setText('\n'.join(['{:<20} {}'.format(*x.split()) for x in editor.getText().splitlines()]))
where 20 is assumed to be the length of the longest string in the first column.
Cheers
Claudia
-
Hi, @alan-kilborn, @claudia-frank, @peterjones, and All,
Nevertheless, it’s quite simple, indeed !! I propose to you 3 different regex S/R :
SEARCH ^.{12}\K +
( with a space before the plus sign )
REPLACE EMPTY
or
SEARCH (?<=^.{12}) +
( with a space before the plus sign )
REPLACE EMPTY
or
SEARCH ^(.{12}) +
( with a space before the plus sign )
REPLACE \1
Notes :
- For the first two S/R, you must use the Replace All button only ( The step by step replacement does NOT work, due to the \K syntax or the look-behind )
- The last S/R accepts hitting the Replace button, too !
- Note that these regexes require that the blank character be, exclusively, the space character !
Now, Alan, let’s try something more tricky : I simply copy all your list again, on the right, using the column mode !
----|----1----|----2----|----3----|----4----|----5----|----6----|----7----|----8----|----9----|----A----|----B
trade Ground trade Ground
list Cry list Cry
free print free print
Told Supply Told Supply
square stood square stood
metal do metal do
held shine held shine
large boy large boy
map table map table
book car book car
process also process also
thank young thank young
held if held if
ship atom ship atom
Have game Have game
thousand strong thousand strong
case most case most
head Tube head Tube
those wait those wait
sudden triangle sudden triangle
while feed while feed
human order human order
paint sight paint sight
mouth rope mouth rope
Hair suffix Hair suffix
want this want this
hot salt hot salt
call house call house
similar experiment similar experiment
count rub count rub
quite won't quite won't
opposite no opposite no
note low note low
process term process term
to Fine to Fine
Solution Season Solution Season
band block band block
among direct among direct
who These who These
between sugar between sugar
ice leg ice leg
took symbol took symbol
between Leg between Leg
Design Share Design Share
quotient segment quotient segment
Then :
- Place your cursor just under the ruler and before the first item trade
- Open the Replace dialog
- Leave the Replace with: zone EMPTY
- Type, in the Find what: zone, the regex (?-s)^.{12}\K + , with a space before the plus sign
- Click on the Replace All button
=> The second column is aligned :-)) Of course, the third and fourth ones are not aligned
- Now, change the number 12 to the number 27, in the Find what: zone
- Click, again, on the Replace All button
=> The third column is now aligned :-))
- Now, change the number 27 to 43, in the Find what: zone
- Click, a last time, on the Replace All button
=> All the columns are well aligned…, as below. Et voilà ! Note that the columns begin at positions 12+1, 27+1 and 43+1
----|----1----|----2----|----3----|----4----|----5----|----6----|----7----|----8----|----9----|----A----|----B
trade       Ground         trade           Ground
list        Cry            list            Cry
free        print          free            print
Told        Supply         Told            Supply
square      stood          square          stood
metal       do             metal           do
held        shine          held            shine
large       boy            large           boy
map         table          map             table
book        car            book            car
process     also           process         also
thank       young          thank           young
held        if             held            if
ship        atom           ship            atom
Have        game           Have            game
thousand    strong         thousand        strong
case        most           case            most
head        Tube           head            Tube
those       wait           those           wait
sudden      triangle       sudden          triangle
while       feed           while           feed
human       order          human           order
paint       sight          paint           sight
mouth       rope           mouth           rope
Hair        suffix         Hair            suffix
want        this           want            this
hot         salt           hot             salt
call        house          call            house
similar     experiment     similar         experiment
count       rub            count           rub
quite       won't          quite           won't
opposite    no             opposite        no
note        low            note            low
process     term           process         term
to          Fine           to              Fine
Solution    Season         Solution        Season
band        block          band            block
among       direct         among           direct
who         These          who             These
between     sugar          between         sugar
ice         leg            ice             leg
took        symbol         took            symbol
between     Leg            between         Leg
Design      Share          Design          Share
quotient    segment        quotient        segment
Of course, I just evaluated, roughly, at each step, where the next column should begin, according to the longest string of the previous column. I don’t know, Alan, if you consider this way as a lot of pre-calculation steps !!
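The successive Replace All passes can also be mimicked outside Notepad++. Here is a minimal sketch in plain Python 3, assuming the capture-group form of the regex (Python's built-in re module does not support \K) and a made-up helper name, align_columns_at. The offsets are still chosen by eye, exactly as in the steps above.

```python
import re

def align_columns_at(text, offsets):
    """Apply the 'keep the first n characters, drop the spaces after them'
    replacement once per offset, from the leftmost offset to the rightmost.

    Hypothetical helper for illustration only.
    """
    for n in sorted(offsets):
        # ^(.{n}) +  ->  group 1 : same effect as the third S/R form.
        # re.MULTILINE makes ^ match at the start of every line.
        text = re.sub(r"^(.{%d}) +" % n, r"\1", text, flags=re.MULTILINE)
    return text

# One jagged row: every word starts at or after its target column.
row = "trade" + " " * 9 + "Ground" + " " * 12 + "trade" + " " * 14 + "Ground"
aligned = align_columns_at(row, [12, 27, 43])
```

As with the regex itself, a line is only changed when the next word starts after the given offset; a word that already overlaps the offset is left where it is.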
Cheers,
guy038
-
-
I will play reverse-golf and make @Claudia-Frank 's version longer but IMO better…and still one line:
editor.setText(['\r\n', '\r', '\n'][notepad.getFormatType()].join([('{:<' + str(editor.getColumn(editor.getCurrentPos())-1) + '} {}').format(*x.split()) for x in editor.getText().splitlines()]))
Two changes:
- do correct line-endings, not Linux–sorry Claudia!–line-endings
- start the aligned data in the column the caret is in when the script is run (be sure to leave the caret in a column greater than the longest entry in the leftmost data “column”!)
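Outside PythonScript the editor and notepad objects are not available, so the two changes can only be illustrated with a stand-in. The sketch below is plain Python 3 and rests on assumptions: the function name align_at_column is invented, column is taken as the 0-based index where the second field should start, and each line keeps whatever ending it already had instead of asking Notepad++ for the file's format.

```python
import re

def align_at_column(text, column):
    """Start the second field of every line at 0-based `column`,
    leaving each line's own ending (\\r\\n, \\r or \\n) untouched.

    Hypothetical helper for illustration only.
    """
    # Splitting with a capturing group keeps the line endings:
    # line bodies land at even indexes, endings at odd indexes.
    pieces = re.split(r"(\r\n|\r|\n)", text)
    for i in range(0, len(pieces), 2):
        parts = pieces[i].split(None, 1)
        if len(parts) == 2:
            # Pad the first word to column-1 characters, then one space,
            # so a too-long first word is still separated from the second.
            pieces[i] = parts[0].ljust(column - 1) + " " + parts[1]
    return "".join(pieces)
```

Lines with fewer than two words are left as they are. Note that split() drops leading whitespace, so indented lines would lose their indentation in this sketch.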
-
I deserve what I get because I didn’t quite ask in the right way. I was sort of looking for the solution to the general case. But in presenting example text I got specific answers to solve that specific thing (2 columns, whole file). Don’t get me wrong, the answers I got were awesome!–thanks to responders! Good ideas, all!
Of the answers I think Scott’s (put caret in column…and then run script) starts getting at the interactivity I was hoping for. Another clarifying situation might be what if I want this to only affect certain lines, or only after a certain column point on specific lines…
So I guess the main answer is that something like this is best served by scripting, although in the end I did like Guy’s regexes (although I did try to head off his enthusiasm for them with my earlier post).
-
Hi, @alan-kilborn,
Another clarifying situation might be what if I want this to only affect certain lines, or only after a certain column point on specific lines…
- Concerning the possibility to change text after a specific column point c, simply use the regex ^.{c+ε}\K\x20+
- Concerning restricting the change to a specific block of lines, do a normal selection of your range of lines first. Then, when you open the Replace dialog, the In selection option is automatically ticked, and the Replace All operation is performed on the selection only :-))
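Both restrictions can be combined in a short plain Python 3 sketch. It is only an approximation of the dialog behaviour: the function name trim_spaces_after and its parameters are invented here, the line range stands in for the selection, and the capture-group form replaces \K, which Python's re module lacks.

```python
import re

def trim_spaces_after(text, n, first_line, last_line):
    """On lines first_line..last_line (1-based, inclusive), delete the run of
    spaces that directly follows the first n characters of the line.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"^(.{%d})\x20+" % n)
    lines = text.split("\n")
    # Clamp the range so an over-long "selection" does not raise IndexError.
    for i in range(max(first_line, 1) - 1, min(last_line, len(lines))):
        lines[i] = pattern.sub(r"\1", lines[i])
    return "\n".join(lines)
```

For example, trim_spaces_after(text, 12, 5, 20) would touch only lines 5 to 20 and only the spaces that follow column 12.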
Cheers,
guy038
-
-
The old man wasn’t invited to the tournament. Nevertheless, he ambled over to the tee box and took a swing with an ancient wooden driver that has been meticulously maintained for more than 40 years:
gawk "{printf \"%-256s%s\n\",$1,$2}" $(FULL_CURRENT_PATH)
:-)
-
Somewhat tangential, but possibly a solution, is the Elastic Tabstops plugin. It would only require a single tab between columns, but has the disadvantage of only working within Notepad++ itself.
-
Neither was the simpleton invited to the tournament but he stumbled up to the tee and out from his bag fell a TextFX plugin and hideous python script that would make a crow blush:
# coding: iso-8859-1
selected = editor.getSelText()
selStart = editor.getSelectionStart()
#replace any existing commas with a weird char
selected = selected.replace(",", chr(174))
#replace the double spaces (>= 0 so a double space at the very start also counts)
while ( selected.find("  ") >= 0 ):
    selected = selected.replace("  ", " ")
#replace the spaces with commas since our 'line up' function uses commas
selected = selected.replace(" ", ",")
selEnd = len(selected)
editor.replaceSel(selected)
#re-select the selection
editor.setSelectionStart(selStart)
editor.setSelectionEnd(selStart + selEnd)
notepad.runMenuCommand("TextFX Edit", "Line up multiple lines by (,)")
notepad.runMenuCommand("TextFX Edit", "E:Line up multiple lines by (,)")
selected = editor.getSelText()
#take out the lineup commas
selected = selected.replace(",", " ")
#put back any original commas
selected = selected.replace(chr(174), ",")
editor.replaceSel(selected)
This works for any number of columns, and only on lines in the current selection. It makes the columns as narrow as possible. I’m not really sure how you would line up things after a certain column point though.
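The "any number of columns, as narrow as possible" behaviour can be sketched without TextFX. The following plain Python 3 function is a swapped-in illustration, not what the plugin does internally: the name align_all_columns and the gap parameter are assumptions, and it treats any run of whitespace as a column break.

```python
def align_all_columns(lines, gap=1):
    """Left-align every whitespace-separated column, each just wide enough
    for its longest entry, with `gap` spaces between columns.

    Hypothetical helper for illustration only.
    """
    rows = [line.split() for line in lines]
    ncols = max((len(row) for row in rows), default=0)
    # Width of column i = longest cell found in that column on any row.
    widths = [
        max((len(row[i]) for row in rows if i < len(row)), default=0)
        for i in range(ncols)
    ]
    out = []
    for row in rows:
        cells = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        # rstrip() removes the padding added after the last cell of a row.
        out.append((" " * gap).join(cells).rstrip())
    return out
```

Passing only the selected lines to the function gives the "current selection" restriction; lining up after a certain column point would need the text before that point to be set aside first.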
-
Hello, @cipher-1024, and All,
I’m thinking about another solution, which still uses the TextFX plugin but which avoids this [ hideous :-D ] Python Script !
- First, use the following regex S/R :
SEARCH \x20+
REPLACE \x60
Note : I specially chose the Unicode Grave Accent character ( U+0060 ) as a dummy character, because it is both rarely used in programming languages ( AFAIK ! ) and part of all character encodings, as it belongs to the international ASCII encoding ( from Unicode U+0000 to U+007F )
- Copy a single ` ( Grave Accent ) to the clipboard, with the Ctrl + C shortcut ( IMPORTANT )
- Now, do a normal selection of the text which is to be aligned
- Click on the menu choice TextFX > TextFX Edit > Line up multiple lines by (Clipboard Character)
- Finally, use the regex below to delete the dummy Grave Accent character ` and add some space characters between columns, with a possible delimiter character !
SEARCH \x60
REPLACE \x20\x20\x20
OR, for instance :
SEARCH \x60
REPLACE \x20\x20|\x20\x20
Cheers,
guy038
-
Other than “rarely used in programming languages,” I like that answer.
Perl uses a pair of Grave Accents (aka “backticks”) as an often-used alternate for the qx// quote-like syntax for running a shell command and placing the command’s output in a string.
SQL uses backticks for denoting identifiers, such as field names.
Markdown uses it for embedding inline fixed width text, like: embedding `inline` fixed width text
But if you know your text has no backticks, then it’s a great choice.
If your data might have backticks, I would use U+001C ( \x1c ), the Field Separator FS character, which is a control code found in ASCII. (I won’t make the claim that it’s “rarely used” in text files or programming language source code… but I’ve never seen it intentionally used in such. :-) )
I think this style of solution meets the original requirements of not requiring complicated S/R regex or precomputing, which is nice.
-
Hi, @PeterJones and All,
So, I strongly apologize ! My programming skills are weaker than those of most N++ users :-D.
BTW, Peter, just have a look at the link below :
https://en.wikipedia.org/wiki/C0_and_C1_control_codes
It seems that the C0 Control character ( \x1C ) rather refers to the File Separator control character ! Anyway, your idea about using a Control character is great ! And, if we follow the description notes, it would be logical to prefer the US Control character \x1F :-D
Cheers,
guy038
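Whichever dummy character is preferred, it only works if the text does not already contain it. A small plain Python 3 sketch of that check follows; the function names pick_delimiter and mark_columns and the candidate order (backtick, then US \x1f, then FS \x1c) are assumptions made for this example, not something Notepad++ or TextFX provides.

```python
import re

def pick_delimiter(text, candidates=("\x60", "\x1f", "\x1c")):
    """Return the first candidate character that does not occur in text.

    Hypothetical helper for illustration only.
    """
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("every candidate delimiter already occurs in the text")

def mark_columns(text):
    """Replace each run of spaces with a delimiter that is safe for this
    text (the \\x20+ search step), and report which delimiter was used."""
    delim = pick_delimiter(text)
    return delim, re.sub(r"\x20+", delim, text)
```

One caveat if the marked text is processed further in Python: str.splitlines() treats \x1c as a line boundary, so text.split("\n") is the safer way to split lines when FS is the delimiter.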
-
IMO the ultimate solution to the question I originally posed is found HERE.
-
I didn’t check the date of the original posting but did immediately say "that’s a job for BetterMultiSelection". It was very satisfying to be able to figure out how to solve that problem. (It took me a few attempts, but that’s why Ctrl-Z exists.)
Thank you to @Alan-Kilborn for a wonderful lesson. It really helped drive home @astrosofista’s examples.
-
…It really helped drive home @astrosofista’s examples
This was not my intention; perhaps you misunderstood.
I was linking directly to a posting, not the larger thread, for the awesome solution to the problem posed here in this thread.
The linked posting discusses using Ctrl+Delete, not any plugin.
Plugins (including Better Multiselection) are great, but even better is when something available natively is the solution to something. And the Ctrl+Delete technique is available natively.