Is there a plugin for deleting spaces in blocks of data?

osg174

I am working with log data taken from sqlplus from a remote server. Once I get a short number of rows for checking the information, I got columns really long which make it hard to read when you have many columns at the right.
I would like to delete all those white spaces and also the dashes between the column names in order to make it easier for reading.
I was checking the TextFX v0.26 plugin, but I can not see something which can achieve that automatically.

I think it will be easier to understand my goal, watching this image.

I posted the question in stackoverflow as well, in case you want to earn something on that site.
https://stackoverflow.com/q/48384421/7573996

PS: I know I can set the column length from sqlplus with a code like this:
column c1 heading “Column_Name” format a15
but I want to achieve that from notepad++

Claudia Frank

@osg174

I get the impression that your “columns” should be reformatted based on the “header line”.

If this is the case, then a python script, like this, could do what you want

# how should the columns be separated
separator = ' '
# create a list of lines
lines = editor.getText().splitlines()
# create a list with the length(width) of each column
format_param = [len(x) for x in lines[0].split()]
# create the width formats
width_per_column = {'width%s'%k:v if v > 0 else 1 for k, v in enumerate(format_param)}
# create the necessary line formatter
format_specifier = ''.join(['{:{width%s}}%s' % (x,separator) for x in range(len(format_param))])
# get rid of the last separator
format_specifier = format_specifier[:-len(separator)]
# build the reformatted text as a list
new_lines = [format_specifier.format(*line.split(), **width_per_column) for line in lines]

# set undo start point
editor.beginUndoAction()
# write the reformatted text
editor.setText('\r\n'.join(new_lines))
# set undo end point
editor.endUndoAction()

To make this work you need to

install the python script plugin,
create a new script,
give it a meaningful name
paste the code from above into it
save it
open your data
execute the script

Cheers
Claudia

osg174

Hi @Claudia-Frank .
This is the best approach so far. Thank you so much for your time.
However, it is just working when there are no spaces in the column values. If the column values are longer than the column name length, unfortunately it shows the information wrongly :(
But again, thank you very much for your time, it shows me I should prepare the script manually.

Claudia Frank

@osg174

we could create a python script which resizes the column based on the
longest value and takes missing values into account if you want,
but if you want to go another way I’m fine with it as well :-)

Cheers
Claudia

osg174

Hi @Claudia-Frank !
It should be great if you can create the python script for achieving this!
Could you please help me with that?
This should be your test data:

Kindly remove the first character in all the data: {

{COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
{--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
{ 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
{ 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

And this is the desired output value:

{COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
{--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
{ 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
{ 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

Thank you very much for your time!

Claudia Frank

@osg174

Just to make sure if I understand correctly, what needs to be done

{COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
{--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
{    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
{    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676

remove { from each line
read dash line (second line) to see where each column starts and ends
read all lines (except second) to see which column can be shortened
reformat text

COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
    38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
    38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676

Is my understanding correct?
If so, I try to create a script which would do this.

Cheers
Claudia

Claudia Frank

@osg174

if I understood correctly, then this script should do what you are looking for.
I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.

from Npp import editor

# create list of lines from whole text
lines = editor.getText().splitlines()

# column width calculated based on dash line (second line)
max_width = [len(x)+1 for x in lines[1][1:].split()]
 
# start calulating needed width per column
# header line
start = 1
column_width = []
for width in max_width:
    column_width.append(len(lines[0][start:start+width].strip()))
    start += width

# skip second line
# rest of lines
for line in lines[2:]:
    start = 1
    _l = []
    for width in max_width:
        _l.append(len(line[start:start+width].strip()))
        start += width
    column_width = [max(x,y) for x,y in zip(column_width,_l)]
    
# start reformat process

def reformat_line(_line):
    tmp_lines = []
    start = 1
    for i, width in enumerate(max_width):
        cell = _line[start:start+width]
        if cell.startswith(' '):
            cell = cell.strip()
            cell = '{}{}'.format(' '*(column_width[i]-len(cell)),cell )
        else:
            cell = cell.strip()
            cell = '{}{}'.format(cell,' '*(column_width[i]-len(cell)))
        tmp_lines.append(cell)
        start += width    
    return tmp_lines

new_lines = []
# header line
new_lines.append(' '.join(reformat_line(lines[0])))

# secodn line = dash line
new_lines.append(' '.join(['-'*x for x in column_width]))

# rest of the lines
for line in lines[2:]:   
    new_lines.append(' '.join(reformat_line(line)))

# set undo start point if something goes wrong
editor.beginUndoAction()
# set new text
editor.setText('\r\n'.join(new_lines))
# set end undo point
editor.endUndoAction()

# cleanup
del lines
del new_lines

Cheers
Claudia

guy038

Hello, @claudia-frank, and All,

I’ve just tried your last script version, and it’s just working fine ;-))

I did some tests with these two ranges of text :

{NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
{----------------- ----------------- ----------------- -----
{FIRST             FIRST             FIRST             12345
{SECOND            SECOND            SECOND            12345
{THIRD             THIRD             THIRD             12345
{FOURTH            FOURTH            FOURTH            12345
{FIFTH             FIFTH             FIFTH             12345
{SIXTH             SIXTH             SIXTH             12345
{SEVENTH           SEVENTH           SEVENTH           12345

and :

{            VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
{----------------- ----------------- ----------------- -----
{                1                 1                 1 12345
{               12                12                12 12345
{              123               123               123 12345
{             1234              1234              1234 12345
{            12345             12345             12345 12345
{           123456            123456            123456 12345
{          1234567           1234567           1234567 12345

Each column is 17 characters wide :

In the first column, some column values are greater than the header value
In the second column the header value is greater than all column values
In the third column, the header value is, exactly, the width of the column
The fourth column is, simply, a pre-formatted column

After running your script, we obtain the two results, below :

NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
------- ----------- ----------------- -----
FIRST   FIRST       FIRST             12345
SECOND  SECOND      SECOND            12345
THIRD   THIRD       THIRD             12345
FOURTH  FOURTH      FOURTH            12345
FIFTH   FIFTH       FIFTH             12345
SIXTH   SIXTH       SIXTH             12345
SEVENTH SEVENTH     SEVENTH           12345

and

  VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
------- ------------ ----------------- -----
      1            1                 1 12345
     12           12                12 12345
    123          123               123 12345
   1234         1234              1234 12345
  12345        12345             12345 12345
 123456       123456            123456 12345
1234567      1234567           1234567 12345

Remark : Quite important to notice that your list must begin your file, with the header values in line 1 and the dash ranges in line 2 !! Perhaps, Claudia, a test to verify that ranges of dashes, which define column width, are, really, in line 2 would be sensible ?

Claudia, I also verified, that your script does not mind about the first character of each line, which may be any symbol, instead of an opening brace character, even a single space character ! It will be deleted after running your script.

So, could you create a similar version, which does not need that extra character, at beginning of all lines ? Thanks for your investigation !

Cheers,

guy038

Claudia Frank

@guy038

Guy, thank you for taking the time to test this script.
Yes, regardless which char is the first one, it will be ignored. :-)
This is achieved by the three instances of

start = 1

and the slicing [1:] done here

max_width = [len(x)+1 for x in lines[1][1:].split()]

So in order to have a script which doesn’t check for the first char
one needs to replace start=1 with start=0 and remove the slice part from
max_width calculation code like

max_width = [len(x)+1 for x in lines[1].split()]

That’s it.

Concerning the dash line, yes, we could add a check but does it really makes sense?
The line itself must not contain any dash sign at all, it could be even mixed chars.
The only must have is that this is the line which specifies the width of each column
by using chars separated by a space. Whether this line full fills the requirement
can’t be really checked.

In order to have a more general script which reduces the amount of used spaces
we need to ask for three parameters I guess.

Which line should be used to calculate the column width
Which char is used to separate the columns
Is there a need to ignore some chars at the beginning of EACH line

What do you think? Something I forgot?

Cheers
Claudia

guy038

Hi, @claudia-frank, and All,

Great, indeed ! Changing the three lines :

start = 1

with

start = 0

and the line :

max_width = [len(x)+1 for x in lines[1][1:].split()]

with

max_width = [len(x)+1 for x in lines[1].split()]

does the job :-))

Now, I noticed a nice side_effect of this new script ! Assuming the text, below :

Text to be preserved ! NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
Text to be preserved ! ----------------- ----------------- ----------------- -----
Text to be preserved ! FIRST             FIRST             FIRST             12345
Text to be preserved ! SECOND            SECOND            SECOND            12345
Text to be preserved ! THIRD             THIRD             THIRD             12345
Text to be preserved ! FOURTH            FOURTH            FOURTH            12345
Text to be preserved ! FIFTH             FIFTH             FIFTH             12345
Text to be preserved ! SIXTH             SIXTH             SIXTH             12345
Text to be preserved ! SEVENTH           SEVENTH           SEVENTH           12345

This new script version gives, automatically, the text :

Text to be preserved ! NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
---- -- -- --------- - ------- ----------- ----------------- -----
Text to be preserved ! FIRST   FIRST       FIRST             12345
Text to be preserved ! SECOND  SECOND      SECOND            12345
Text to be preserved ! THIRD   THIRD       THIRD             12345
Text to be preserved ! FOURTH  FOURTH      FOURTH            12345
Text to be preserved ! FIFTH   FIFTH       FIFTH             12345
Text to be preserved ! SIXTH   SIXTH       SIXTH             12345
Text to be preserved ! SEVENTH SEVENTH     SEVENTH           12345

Wow ! Just note how the beginning of line 2 is modified !!

and the same for right-justified text :

Text to be preserved !             VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
Text to be preserved ! ----------------- ----------------- ----------------- -----
Text to be preserved !                 1                 1                 1 12345
Text to be preserved !                12                12                12 12345
Text to be preserved !               123               123               123 12345
Text to be preserved !              1234              1234              1234 12345
Text to be preserved !             12345             12345             12345 12345
Text to be preserved !            123456            123456            123456 12345
Text to be preserved !           1234567           1234567           1234567 12345

which is changed as below :

Text to be preserved !   VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
---- -- -- --------- - ------- ------------ ----------------- -----
Text to be preserved !       1            1                 1 12345
Text to be preserved !      12           12                12 12345
Text to be preserved !     123          123               123 12345
Text to be preserved !    1234         1234              1234 12345
Text to be preserved !   12345        12345             12345 12345
Text to be preserved !  123456       123456            123456 12345
Text to be preserved ! 1234567      1234567           1234567 12345

So, your point 3 is, implicitly, realized ! No need for any change ;-))

Now, I think that point 1, which would ask the user, about the number of the line, identifying the different columns, would be enough. Indeed, let’s suppose the text, below, beginning at line, say, 46

NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%
FIRST             FIRST             FIRST             12345
SECOND            SECOND            SECOND            12345
THIRD             THIRD             THIRD             12345
FOURTH            FOURTH            FOURTH            12345
FIFTH             FIFTH             FIFTH             12345
SIXTH             SIXTH             SIXTH             12345
SEVENTH           SEVENTH           SEVENTH           12345

If you tell the script that the “key” line is the line 47, you, automatically know that the user character is the % symbol :-)) Of course, this symbol must not be a space character !

Now, in order to avoid changing text, located after the given list or even a second list, built on the same way, I think that your script should consider that any true empty line, after the first list, stops the process.

If a second list occurs, afterwards, the user just has to re-run your script, telling about the number n of the new line of symbols. So, this second list would begin at line n-1 and end at the last line, before a true empty line

Best Regards,

guy038

Claudia Frank

@guy038

Hi Guy,

I had something different in mind but must admit I wasn’t very clear.
But this revealed that there might be even more questions/features waiting.

What I had in mind was something like this

Text I would to get rid of 1                1     1                1     1                1
Text I would to get rid of 22              22     22              22     22              22
Text I would to get rid of 333            333     333            333     333            333
Text I would to get rid of 4444         44444     4444         44444     4444         44444
Text I would to get rid of 55555         5555     55555         5555     55555         5555
Text I would to get rid of 6666           666     6666           666     6666           666
Text I would to get rid of 777             77     777             77     777             77
Text I would to get rid of 88               8     88               8     88               8
Text I would to get rid of 9             9999     9             9999     9             9999
Text I would to get rid of #######     ######     #######     ######     #######     ######
Text I would to get rid of 1                1     1                1     1                1
Text I would to get rid of 22              22     22              22     22              22
Text I would to get rid of 333            333     333            333     333            333
Text I would to get rid of 4444         44444     4444         44444     4444         44444
Text I would to get rid of 55555         5555     55555         5555     55555         5555
Text I would to get rid of 6666           666     6666           666     6666           666
Text I would to get rid of 777             77     777             77     777             77
Text I would to get rid of 88               8     88               8     88               8
Text I would to get rid of 9             9999     9             9999     9             9999
Text I would to get rid of #######     ######     #######     ######     #######     ######
.
.
.

So there is no real header, just some raw data and then suddenly (line 10)
some divider which could be used to calculate the column width in this case.

in this case I would be looking for a result like this

1         1 1         1 1         1
22       22 22       22 22       22
333     333 333     333 333     333
4444  44444 4444  44444 4444  44444
55555  5555 55555  5555 55555  5555
6666    666 6666    666 6666    666
777      77 777      77 777      77
88        8 88        8 88        8
9      9999 9      9999 9      9999
##### ##### ##### ##### ##### #####
1         1 1         1 1         1
22       22 22       22 22       22
333     333 333     333 333     333
4444  44444 4444  44444 4444  44444
55555  5555 55555  5555 55555  5555
6666    666 6666    666 6666    666
777      77 777      77 777      77
88        8 88        8 88        8
9      9999 9      9999 9      9999
##### ##### ##### ##### ##### #####
.
.
.

(not that it is readable anymore - but …)

Your idea, of having text in between text that should
be reformatted without changing the other text (WHATT??? text always text)
could be added if we consider selections.

So if nothing is selected, whole document should be reformatted
but if user has selected something, only this should be reformatted.

I guess if we still think about it, we can even find more ways of
reformatting text in other different ways :-)

Cheers
Claudia

guy038

Hi, @claudia-frank, and All,

First, I would object that, given your recent example text, where I suppressed the common text, at beginning :

ABCDEFG     ABCDEF     ABCDEFG     ABCDEF     ABCDEFG     ABCDEF
-------     ------     -------     ------     -------     ------
1                1     1                1     1                1
22              22     22              22     22              22
333            333     333            333     333            333
4444         44444     4444         44444     4444         44444
55555         5555     55555         5555     55555         5555
6666           666     6666           666     6666           666
777             77     777             77     777             77
88               8     88               8     88               8
9             9999     9             9999     9             9999
1                1     1                1     1                1
22              22     22              22     22              22
333            333     333            333     333            333
4444         44444     4444         44444     4444         44444
55555         5555     55555         5555     55555         5555
6666           666     6666           666     6666           666
777             77     777             77     777             77
88               8     88               8     88               8
9             9999     9             9999     9             9999

Your script changes it into the form, below :

ABCDEFG ABC DEF ABCDEFG ABC DEF
------- --- --- ------- --- ---
1             1 1             1
22           22 22           22
333         333 333         333
4444     44 444 4444     44 444
55555     5 555 55555     5 555
6666        666 6666        666
777          77 777          77
88            8 88            8
9         9 999 9         9 999
1             1 1             1
22           22 22           22
333         333 333         333
4444     44 444 4444     44 444
55555     5 555 55555     5 555
6666        666 6666        666
777          77 777          77
88            8 88            8
9         9 999 9         9 999

Not exactly what it’s expected, isn’t it ?

I think that, with the present modified script, we need that all ranges of dashes are separated by a single space character, only !

So, after modifying the initial text, as below :

ABC        ABC ABC        ABC ABC        ABC
------- ------ ------- ------ ------- ------
1            1 1            1 1            1
22          22 22          22 22          22
333        333 333        333 333        333
4444     44444 4444     44444 4444     44444
55555     5555 55555     5555 55555     5555
6666       666 6666       666 6666       666
777         77 777         77 777         77
88           8 88           8 88           8
9         9999 9         9999 9         9999
1            1 1            1 1            1
22          22 22          22 22          22
333        333 333        333 333        333
4444     44444 4444     44444 4444     44444
55555     5555 55555     5555 55555     5555
6666       666 6666       666 6666       666
777         77 777         77 777         77
88           8 88           8 88           8
9         9999 9         9999 9         9999

Your present script does give, as expected, the text :

ABC     ABC ABC     ABC ABC     ABC
----- ----- ----- ----- ----- -----
1         1 1         1 1         1
22       22 22       22 22       22
333     333 333     333 333     333
4444  44444 4444  44444 4444  44444
55555  5555 55555  5555 55555  5555
6666    666 6666    666 6666    666
777      77 777      77 777      77
88        8 88        8 88        8
9      9999 9      9999 9      9999
1         1 1         1 1         1
22       22 22       22 22       22
333     333 333     333 333     333
4444  44444 4444  44444 4444  44444
55555  5555 55555  5555 55555  5555
6666    666 6666    666 6666    666
777      77 777      77 777      77
88        8 88        8 88        8
9      9999 9      9999 9      9999

Great !

Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection ! Never mind : so, if no normal selection exists, all document would be reformatted. On the contrary, only the selected text would be changed !

Now, regarding the text to get rid of, at beginning of each line, just ask for the number n of characters to delete. We already know how to do ;-))

Change the 3 lines :

start = n   #  instead of start = 0

and the line :

max_width = [len(x)+1 for x in lines[1][n:].split()]   #  instead of max_width = [len(x)+1 for x in lines[1].split()]

Finally, about the line which identifies the columns width, you would have to scan all lines till a line, built of ranges of the same NON-word character, separated by a single space character !

Cheers,

guy038

Claudia Frank

@guy038

Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection !

hehe, me too until I found out that there is a nice call

 editor.getUserLineSelection()

which returns the start and end line number of the selected text and instead there isn’t something selected,
it returns the start and end line number of the whole text. So, regardless what the user does, a single

start_line, end_line = editor.getUserLineSelection()

returns what is needed. :-)

I have to admit, I didn’t test the code with my text - just used it for illustration.
But you are right, the critical part in the script is to calculate the column width correctly,
the rest is just reformat what you already have.

Cheers
Claudia

osg174

Hi all!
Thank you very much for your support on this. I really appreciate your effort during the weekend.
I already tested with the Claudia’s script, and adding as well the start = 0 changes and deleting the [1:] and it works really great.
I just placed the { character at the beginning for sharing the content. No need to take it in consideration.
I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. The Claudia’s support was awesome to make the script work in Notepad++.
Thank you really so much again to you two, and happy coding!
=)