Is there a plugin for deleting spaces in blocks of data?
-
I am working with log data taken from sqlplus from a remote server. Once I get a short number of rows for checking the information, I got columns really long which make it hard to read when you have many columns at the right.
I would like to delete all those white spaces and also the dashes between the column names in order to make it easier for reading.
I was checking the TextFX v0.26 plugin, but I can not see something which can achieve that automatically.I think it will be easier to understand my goal, watching this image.
I posted the question in stackoverflow as well, in case you want to earn something on that site.
https://stackoverflow.com/q/48384421/7573996PS: I know I can set the column length from sqlplus with a code like this:
column c1 heading “Column_Name” format a15
but I want to achieve that from notepad++ -
I get the impression that your “columns” should be reformatted based on the “header line”.
If this is the case, then a python script, like this, could do what you want
# how should the columns be separated separator = ' ' # create a list of lines lines = editor.getText().splitlines() # create a list with the length(width) of each column format_param = [len(x) for x in lines[0].split()] # create the width formats width_per_column = {'width%s'%k:v if v > 0 else 1 for k, v in enumerate(format_param)} # create the necessary line formatter format_specifier = ''.join(['{:{width%s}}%s' % (x,separator) for x in range(len(format_param))]) # get rid of the last separator format_specifier = format_specifier[:-len(separator)] # build the reformatted text as a list new_lines = [format_specifier.format(*line.split(), **width_per_column) for line in lines] # set undo start point editor.beginUndoAction() # write the reformatted text editor.setText('\r\n'.join(new_lines)) # set undo end point editor.endUndoAction()
To make this work you need to
- install the python script plugin,
- create a new script,
- give it a meaningful name
- paste the code from above into it
- save it
- open your data
- execute the script
Cheers
Claudia -
Hi @Claudia-Frank .
This is the best approach so far. Thank you so much for your time.
However, it is just working when there are no spaces in the column values. If the column values are longer than the column name length, unfortunately it shows the information wrongly :(
But again, thank you very much for your time, it shows me I should prepare the script manually. -
we could create a python script which resizes the column based on the
longest value and takes missing values into account if you want,
but if you want to go another way I’m fine with it as well :-)Cheers
Claudia -
Hi @Claudia-Frank !
It should be great if you can create the python script for achieving this!
Could you please help me with that?
This should be your test data:Kindly remove the first character in all the data: {
{COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
{--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
{ 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
{ 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676And this is the desired output value:
{COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
{--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
{ 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
{ 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676Thank you very much for your time!
-
Just to make sure if I understand correctly, what needs to be done
{COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- ----------- { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975 { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676
remove { from each line
read dash line (second line) to see where each column starts and ends
read all lines (except second) to see which column can be shortened
reformat textCOLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO --------- --------- --------- --------- --------- ----------------------------- ------------------ ------- 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676
Is my understanding correct?
If so, I try to create a script which would do this.Cheers
Claudia -
if I understood correctly, then this script should do what you are looking for.
I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.from Npp import editor # create list of lines from whole text lines = editor.getText().splitlines() # column width calculated based on dash line (second line) max_width = [len(x)+1 for x in lines[1][1:].split()] # start calulating needed width per column # header line start = 1 column_width = [] for width in max_width: column_width.append(len(lines[0][start:start+width].strip())) start += width # skip second line # rest of lines for line in lines[2:]: start = 1 _l = [] for width in max_width: _l.append(len(line[start:start+width].strip())) start += width column_width = [max(x,y) for x,y in zip(column_width,_l)] # start reformat process def reformat_line(_line): tmp_lines = [] start = 1 for i, width in enumerate(max_width): cell = _line[start:start+width] if cell.startswith(' '): cell = cell.strip() cell = '{}{}'.format(' '*(column_width[i]-len(cell)),cell ) else: cell = cell.strip() cell = '{}{}'.format(cell,' '*(column_width[i]-len(cell))) tmp_lines.append(cell) start += width return tmp_lines new_lines = [] # header line new_lines.append(' '.join(reformat_line(lines[0]))) # secodn line = dash line new_lines.append(' '.join(['-'*x for x in column_width])) # rest of the lines for line in lines[2:]: new_lines.append(' '.join(reformat_line(line))) # set undo start point if something goes wrong editor.beginUndoAction() # set new text editor.setText('\r\n'.join(new_lines)) # set end undo point editor.endUndoAction() # cleanup del lines del new_lines
Cheers
Claudia -
Hello, @claudia-frank, and All,
I’ve just tried your last script version, and it’s just working fine ;-))
I did some tests with these two ranges of text :
{NAME PRODUCTNAME NAMES_OF_PRODUCTS ABCDE {----------------- ----------------- ----------------- ----- {FIRST FIRST FIRST 12345 {SECOND SECOND SECOND 12345 {THIRD THIRD THIRD 12345 {FOURTH FOURTH FOURTH 12345 {FIFTH FIFTH FIFTH 12345 {SIXTH SIXTH SIXTH 12345 {SEVENTH SEVENTH SEVENTH 12345
and :
{ VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE {----------------- ----------------- ----------------- ----- { 1 1 1 12345 { 12 12 12 12345 { 123 123 123 12345 { 1234 1234 1234 12345 { 12345 12345 12345 12345 { 123456 123456 123456 12345 { 1234567 1234567 1234567 12345
Each column is 17 characters wide :
-
In the first column, some column values are greater than the header value
-
In the second column the header value is greater than all column values
-
In the third column, the header value is, exactly, the width of the column
-
The fourth column is, simply, a pre-formatted column
After running your script, we obtain the two results, below :
NAME PRODUCTNAME NAMES_OF_PRODUCTS ABCDE ------- ----------- ----------------- ----- FIRST FIRST FIRST 12345 SECOND SECOND SECOND 12345 THIRD THIRD THIRD 12345 FOURTH FOURTH FOURTH 12345 FIFTH FIFTH FIFTH 12345 SIXTH SIXTH SIXTH 12345 SEVENTH SEVENTH SEVENTH 12345
and
VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE ------- ------------ ----------------- ----- 1 1 1 12345 12 12 12 12345 123 123 123 12345 1234 1234 1234 12345 12345 12345 12345 12345 123456 123456 123456 12345 1234567 1234567 1234567 12345
Remark : Quite important to notice that your list must begin your file, with the header values in line
1
and the dash ranges in line2
!! Perhaps, Claudia, a test to verify that ranges of dashes, which define column width, are, really, in line2
would be sensible ?Claudia, I also verified, that your script does not mind about the first character of each line, which may be any symbol, instead of an opening brace character, even a single space character ! It will be deleted after running your script.
So, could you create a similar version, which does not need that extra character, at beginning of all lines ? Thanks for your investigation !
Cheers,
guy038
-
-
Guy, thank you for taking the time to test this script.
Yes, regardless which char is the first one, it will be ignored. :-)
This is achieved by the three instances ofstart = 1
and the slicing [1:] done here
max_width = [len(x)+1 for x in lines[1][1:].split()]
So in order to have a script which doesn’t check for the first char
one needs to replace start=1 with start=0 and remove the slice part from
max_width calculation code likemax_width = [len(x)+1 for x in lines[1].split()]
That’s it.
Concerning the dash line, yes, we could add a check but does it really makes sense?
The line itself must not contain any dash sign at all, it could be even mixed chars.
The only must have is that this is the line which specifies the width of each column
by using chars separated by a space. Whether this line full fills the requirement
can’t be really checked.In order to have a more general script which reduces the amount of used spaces
we need to ask for three parameters I guess.- Which line should be used to calculate the column width
- Which char is used to separate the columns
- Is there a need to ignore some chars at the beginning of EACH line
What do you think? Something I forgot?
Cheers
Claudia -
Hi, @claudia-frank, and All,
Great, indeed ! Changing the three lines :
start = 1
with
start = 0
and the line :
max_width = [len(x)+1 for x in lines[1][1:].split()]
with
max_width = [len(x)+1 for x in lines[1].split()]
does the job :-))
Now, I noticed a nice side_effect of this new script ! Assuming the text, below :
Text to be preserved ! NAME PRODUCTNAME NAMES_OF_PRODUCTS ABCDE Text to be preserved ! ----------------- ----------------- ----------------- ----- Text to be preserved ! FIRST FIRST FIRST 12345 Text to be preserved ! SECOND SECOND SECOND 12345 Text to be preserved ! THIRD THIRD THIRD 12345 Text to be preserved ! FOURTH FOURTH FOURTH 12345 Text to be preserved ! FIFTH FIFTH FIFTH 12345 Text to be preserved ! SIXTH SIXTH SIXTH 12345 Text to be preserved ! SEVENTH SEVENTH SEVENTH 12345
This new script version gives, automatically, the text :
Text to be preserved ! NAME PRODUCTNAME NAMES_OF_PRODUCTS ABCDE ---- -- -- --------- - ------- ----------- ----------------- ----- Text to be preserved ! FIRST FIRST FIRST 12345 Text to be preserved ! SECOND SECOND SECOND 12345 Text to be preserved ! THIRD THIRD THIRD 12345 Text to be preserved ! FOURTH FOURTH FOURTH 12345 Text to be preserved ! FIFTH FIFTH FIFTH 12345 Text to be preserved ! SIXTH SIXTH SIXTH 12345 Text to be preserved ! SEVENTH SEVENTH SEVENTH 12345
Wow ! Just note how the beginning of line
2
is modified !!and the same for right-justified text :
Text to be preserved ! VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE Text to be preserved ! ----------------- ----------------- ----------------- ----- Text to be preserved ! 1 1 1 12345 Text to be preserved ! 12 12 12 12345 Text to be preserved ! 123 123 123 12345 Text to be preserved ! 1234 1234 1234 12345 Text to be preserved ! 12345 12345 12345 12345 Text to be preserved ! 123456 123456 123456 12345 Text to be preserved ! 1234567 1234567 1234567 12345
which is changed as below :
Text to be preserved ! VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE ---- -- -- --------- - ------- ------------ ----------------- ----- Text to be preserved ! 1 1 1 12345 Text to be preserved ! 12 12 12 12345 Text to be preserved ! 123 123 123 12345 Text to be preserved ! 1234 1234 1234 12345 Text to be preserved ! 12345 12345 12345 12345 Text to be preserved ! 123456 123456 123456 12345 Text to be preserved ! 1234567 1234567 1234567 12345
So, your point
3
is, implicitly, realized ! No need for any change ;-))
Now, I think that point
1
, which would ask the user, about the number of the line, identifying the different columns, would be enough. Indeed, let’s suppose the text, below, beginning at line, say,46
NAME PRODUCTNAME NAMES_OF_PRODUCTS ABCDE %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%% FIRST FIRST FIRST 12345 SECOND SECOND SECOND 12345 THIRD THIRD THIRD 12345 FOURTH FOURTH FOURTH 12345 FIFTH FIFTH FIFTH 12345 SIXTH SIXTH SIXTH 12345 SEVENTH SEVENTH SEVENTH 12345
If you tell the script that the “key” line is the line
47
, you, automatically know that the user character is the%
symbol :-)) Of course, this symbol must not be a space character !Now, in order to avoid changing text, located after the given list or even a second list, built on the same way, I think that your script should consider that any true empty line, after the first list, stops the process.
If a second list occurs, afterwards, the user just has to re-run your script, telling about the number
n
of the new line of symbols. So, this second list would begin at linen-1
and end at the last line, before a true empty lineBest Regards,
guy038
-
Hi Guy,
I had something different in mind but must admit I wasn’t very clear.
But this revealed that there might be even more questions/features waiting.What I had in mind was something like this
Text I would to get rid of 1 1 1 1 1 1 Text I would to get rid of 22 22 22 22 22 22 Text I would to get rid of 333 333 333 333 333 333 Text I would to get rid of 4444 44444 4444 44444 4444 44444 Text I would to get rid of 55555 5555 55555 5555 55555 5555 Text I would to get rid of 6666 666 6666 666 6666 666 Text I would to get rid of 777 77 777 77 777 77 Text I would to get rid of 88 8 88 8 88 8 Text I would to get rid of 9 9999 9 9999 9 9999 Text I would to get rid of ####### ###### ####### ###### ####### ###### Text I would to get rid of 1 1 1 1 1 1 Text I would to get rid of 22 22 22 22 22 22 Text I would to get rid of 333 333 333 333 333 333 Text I would to get rid of 4444 44444 4444 44444 4444 44444 Text I would to get rid of 55555 5555 55555 5555 55555 5555 Text I would to get rid of 6666 666 6666 666 6666 666 Text I would to get rid of 777 77 777 77 777 77 Text I would to get rid of 88 8 88 8 88 8 Text I would to get rid of 9 9999 9 9999 9 9999 Text I would to get rid of ####### ###### ####### ###### ####### ###### . . .
So there is no real header, just some raw data and then suddenly (line 10)
some divider which could be used to calculate the column width in this case.in this case I would be looking for a result like this
1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999 ##### ##### ##### ##### ##### ##### 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999 ##### ##### ##### ##### ##### ##### . . .
(not that it is readable anymore - but …)
Your idea, of having text in between text that should
be reformatted without changing the other text (WHATT??? text always text)
could be added if we consider selections.So if nothing is selected, whole document should be reformatted
but if user has selected something, only this should be reformatted.I guess if we still think about it, we can even find more ways of
reformatting text in other different ways :-)Cheers
Claudia -
Hi, @claudia-frank, and All,
First, I would object that, given your recent example text, where I suppressed the common text, at beginning :
ABCDEFG ABCDEF ABCDEFG ABCDEF ABCDEFG ABCDEF ------- ------ ------- ------ ------- ------ 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999
Your script changes it into the form, below :
ABCDEFG ABC DEF ABCDEFG ABC DEF ------- --- --- ------- --- --- 1 1 1 1 22 22 22 22 333 333 333 333 4444 44 444 4444 44 444 55555 5 555 55555 5 555 6666 666 6666 666 777 77 777 77 88 8 88 8 9 9 999 9 9 999 1 1 1 1 22 22 22 22 333 333 333 333 4444 44 444 4444 44 444 55555 5 555 55555 5 555 6666 666 6666 666 777 77 777 77 88 8 88 8 9 9 999 9 9 999
Not exactly what it’s expected, isn’t it ?
I think that, with the present modified script, we need that all ranges of dashes are separated by a single space character, only !
So, after modifying the initial text, as below :
ABC ABC ABC ABC ABC ABC ------- ------ ------- ------ ------- ------ 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999
Your present script does give, as expected, the text :
ABC ABC ABC ABC ABC ABC ----- ----- ----- ----- ----- ----- 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999 1 1 1 1 1 1 22 22 22 22 22 22 333 333 333 333 333 333 4444 44444 4444 44444 4444 44444 55555 5555 55555 5555 55555 5555 6666 666 6666 666 6666 666 777 77 777 77 777 77 88 8 88 8 88 8 9 9999 9 9999 9 9999
Great !
Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection ! Never mind : so, if no normal selection exists, all document would be reformatted. On the contrary, only the selected text would be changed !
Now, regarding the text to get rid of, at beginning of each line, just ask for the number
n
of characters to delete. We already know how to do ;-))- Change the
3
lines :
start = n # instead of start = 0
and the line :
max_width = [len(x)+1 for x in lines[1][n:].split()] # instead of max_width = [len(x)+1 for x in lines[1].split()]
Finally, about the line which identifies the columns width, you would have to scan all lines till a line, built of ranges of the same NON-word character, separated by a single space character !
Cheers,
guy038
- Change the
-
Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection !
hehe, me too until I found out that there is a nice call
editor.getUserLineSelection()
which returns the start and end line number of the selected text and instead there isn’t something selected,
it returns the start and end line number of the whole text. So, regardless what the user does, a singlestart_line, end_line = editor.getUserLineSelection()
returns what is needed. :-)
I have to admit, I didn’t test the code with my text - just used it for illustration.
But you are right, the critical part in the script is to calculate the column width correctly,
the rest is just reformat what you already have.Cheers
Claudia -
Hi all!
Thank you very much for your support on this. I really appreciate your effort during the weekend.
I already tested with the Claudia’s script, and adding as well thestart = 0
changes and deleting the[1:]
and it works really great.
I just placed the { character at the beginning for sharing the content. No need to take it in consideration.
I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. The Claudia’s support was awesome to make the script work in Notepad++.
Thank you really so much again to you two, and happy coding!
=)