Is there a plugin for deleting spaces in blocks of data?



  • I am working with log data taken from sqlplus from a remote server. Once I get a short number of rows for checking the information, I got columns really long which make it hard to read when you have many columns at the right.
    I would like to delete all those white spaces and also the dashes between the column names in order to make it easier for reading.
    I was checking the TextFX v0.26 plugin, but I can not see something which can achieve that automatically.

    I think it will be easier to understand my goal, watching this image.

    I posted the question in stackoverflow as well, in case you want to earn something on that site.
    https://stackoverflow.com/q/48384421/7573996

    PS: I know I can set the column length from sqlplus with a code like this:
    column c1 heading “Column_Name” format a15
    but I want to achieve that from notepad++



  • @osg174

    I get the impression that your “columns” should be reformatted based on the “header line”.

    If this is the case, then a python script, like this, could do what you want

    # how should the columns be separated
    separator = ' '
    # create a list of lines
    lines = editor.getText().splitlines()
    # create a list with the length(width) of each column
    format_param = [len(x) for x in lines[0].split()]
    # create the width formats
    width_per_column = {'width%s'%k:v if v > 0 else 1 for k, v in enumerate(format_param)}
    # create the necessary line formatter
    format_specifier = ''.join(['{:{width%s}}%s' % (x,separator) for x in range(len(format_param))])
    # get rid of the last separator
    format_specifier = format_specifier[:-len(separator)]
    # build the reformatted text as a list
    new_lines = [format_specifier.format(*line.split(), **width_per_column) for line in lines]
    
    # set undo start point
    editor.beginUndoAction()
    # write the reformatted text
    editor.setText('\r\n'.join(new_lines))
    # set undo end point
    editor.endUndoAction()
    

    To make this work you need to

    • install the python script plugin,
    • create a new script,
    • give it a meaningful name
    • paste the code from above into it
    • save it
    • open your data
    • execute the script

    Cheers
    Claudia



  • Hi @Claudia-Frank .
    This is the best approach so far. Thank you so much for your time.
    However, it is just working when there are no spaces in the column values. If the column values are longer than the column name length, unfortunately it shows the information wrongly :(
    But again, thank you very much for your time, it shows me I should prepare the script manually.



  • @osg174

    we could create a python script which resizes the column based on the
    longest value and takes missing values into account if you want,
    but if you want to go another way I’m fine with it as well :-)

    Cheers
    Claudia



  • Hi @Claudia-Frank !
    It should be great if you can create the python script for achieving this!
    Could you please help me with that?
    This should be your test data:

    Kindly remove the first character in all the data: {

    {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
    {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
    { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
    { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

    And this is the desired output value:

    {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
    {--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
    { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
    { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

    Thank you very much for your time!



  • @osg174

    Just to make sure if I understand correctly, what needs to be done

    {COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
    {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
    {    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
    {    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676
    

    remove { from each line
    read dash line (second line) to see where each column starts and ends
    read all lines (except second) to see which column can be shortened
    reformat text

    COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
    --------- --------- --------- --------- --------- ----------------------------- ------------------ -------
        38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
        38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676
    

    Is my understanding correct?
    If so, I try to create a script which would do this.

    Cheers
    Claudia



  • @osg174

    if I understood correctly, then this script should do what you are looking for.
    I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.

    from Npp import editor
    
    # create list of lines from whole text
    lines = editor.getText().splitlines()
    
    # column width calculated based on dash line (second line)
    max_width = [len(x)+1 for x in lines[1][1:].split()]
     
    # start calulating needed width per column
    # header line
    start = 1
    column_width = []
    for width in max_width:
        column_width.append(len(lines[0][start:start+width].strip()))
        start += width
    
    # skip second line
    # rest of lines
    for line in lines[2:]:
        start = 1
        _l = []
        for width in max_width:
            _l.append(len(line[start:start+width].strip()))
            start += width
        column_width = [max(x,y) for x,y in zip(column_width,_l)]
        
    # start reformat process
    
    def reformat_line(_line):
        tmp_lines = []
        start = 1
        for i, width in enumerate(max_width):
            cell = _line[start:start+width]
            if cell.startswith(' '):
                cell = cell.strip()
                cell = '{}{}'.format(' '*(column_width[i]-len(cell)),cell )
            else:
                cell = cell.strip()
                cell = '{}{}'.format(cell,' '*(column_width[i]-len(cell)))
            tmp_lines.append(cell)
            start += width    
        return tmp_lines
    
    new_lines = []
    # header line
    new_lines.append(' '.join(reformat_line(lines[0])))
    
    # secodn line = dash line
    new_lines.append(' '.join(['-'*x for x in column_width]))
    
    # rest of the lines
    for line in lines[2:]:   
        new_lines.append(' '.join(reformat_line(line)))
    
    # set undo start point if something goes wrong
    editor.beginUndoAction()
    # set new text
    editor.setText('\r\n'.join(new_lines))
    # set end undo point
    editor.endUndoAction()
    
    # cleanup
    del lines
    del new_lines
    

    Cheers
    Claudia



  • Hello, @claudia-frank, and All,

    I’ve just tried your last script version, and it’s just working fine ;-))

    I did some tests with these two ranges of text :

    {NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
    {----------------- ----------------- ----------------- -----
    {FIRST             FIRST             FIRST             12345
    {SECOND            SECOND            SECOND            12345
    {THIRD             THIRD             THIRD             12345
    {FOURTH            FOURTH            FOURTH            12345
    {FIFTH             FIFTH             FIFTH             12345
    {SIXTH             SIXTH             SIXTH             12345
    {SEVENTH           SEVENTH           SEVENTH           12345
    

    and :

    {            VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
    {----------------- ----------------- ----------------- -----
    {                1                 1                 1 12345
    {               12                12                12 12345
    {              123               123               123 12345
    {             1234              1234              1234 12345
    {            12345             12345             12345 12345
    {           123456            123456            123456 12345
    {          1234567           1234567           1234567 12345
    

    Each column is 17 characters wide :

    • In the first column, some column values are greater than the header value

    • In the second column the header value is greater than all column values

    • In the third column, the header value is, exactly, the width of the column

    • The fourth column is, simply, a pre-formatted column


    After running your script, we obtain the two results, below :

    NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
    ------- ----------- ----------------- -----
    FIRST   FIRST       FIRST             12345
    SECOND  SECOND      SECOND            12345
    THIRD   THIRD       THIRD             12345
    FOURTH  FOURTH      FOURTH            12345
    FIFTH   FIFTH       FIFTH             12345
    SIXTH   SIXTH       SIXTH             12345
    SEVENTH SEVENTH     SEVENTH           12345
    

    and

      VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
    ------- ------------ ----------------- -----
          1            1                 1 12345
         12           12                12 12345
        123          123               123 12345
       1234         1234              1234 12345
      12345        12345             12345 12345
     123456       123456            123456 12345
    1234567      1234567           1234567 12345
    

    Remark : Quite important to notice that your list must begin your file, with the header values in line 1 and the dash ranges in line 2 !! Perhaps, Claudia, a test to verify that ranges of dashes, which define column width, are, really, in line 2 would be sensible ?

    Claudia, I also verified, that your script does not mind about the first character of each line, which may be any symbol, instead of an opening brace character, even a single space character ! It will be deleted after running your script.

    So, could you create a similar version, which does not need that extra character, at beginning of all lines ? Thanks for your investigation !

    Cheers,

    guy038



  • @guy038

    Guy, thank you for taking the time to test this script.
    Yes, regardless which char is the first one, it will be ignored. :-)
    This is achieved by the three instances of

    start = 1
    

    and the slicing [1:] done here

    max_width = [len(x)+1 for x in lines[1][1:].split()]
    

    So in order to have a script which doesn’t check for the first char
    one needs to replace start=1 with start=0 and remove the slice part from
    max_width calculation code like

    max_width = [len(x)+1 for x in lines[1].split()]
    

    That’s it.

    Concerning the dash line, yes, we could add a check but does it really makes sense?
    The line itself must not contain any dash sign at all, it could be even mixed chars.
    The only must have is that this is the line which specifies the width of each column
    by using chars separated by a space. Whether this line full fills the requirement
    can’t be really checked.

    In order to have a more general script which reduces the amount of used spaces
    we need to ask for three parameters I guess.

    1. Which line should be used to calculate the column width
    2. Which char is used to separate the columns
    3. Is there a need to ignore some chars at the beginning of EACH line

    What do you think? Something I forgot?

    Cheers
    Claudia



  • Hi, @claudia-frank, and All,

    Great, indeed ! Changing the three lines :

    start = 1
    

    with

    start = 0
    

    and the line :

    max_width = [len(x)+1 for x in lines[1][1:].split()]
    

    with

    max_width = [len(x)+1 for x in lines[1].split()]
    

    does the job :-))


    Now, I noticed a nice side_effect of this new script ! Assuming the text, below :

    Text to be preserved ! NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
    Text to be preserved ! ----------------- ----------------- ----------------- -----
    Text to be preserved ! FIRST             FIRST             FIRST             12345
    Text to be preserved ! SECOND            SECOND            SECOND            12345
    Text to be preserved ! THIRD             THIRD             THIRD             12345
    Text to be preserved ! FOURTH            FOURTH            FOURTH            12345
    Text to be preserved ! FIFTH             FIFTH             FIFTH             12345
    Text to be preserved ! SIXTH             SIXTH             SIXTH             12345
    Text to be preserved ! SEVENTH           SEVENTH           SEVENTH           12345
    

    This new script version gives, automatically, the text :

    Text to be preserved ! NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
    ---- -- -- --------- - ------- ----------- ----------------- -----
    Text to be preserved ! FIRST   FIRST       FIRST             12345
    Text to be preserved ! SECOND  SECOND      SECOND            12345
    Text to be preserved ! THIRD   THIRD       THIRD             12345
    Text to be preserved ! FOURTH  FOURTH      FOURTH            12345
    Text to be preserved ! FIFTH   FIFTH       FIFTH             12345
    Text to be preserved ! SIXTH   SIXTH       SIXTH             12345
    Text to be preserved ! SEVENTH SEVENTH     SEVENTH           12345
    

    Wow ! Just note how the beginning of line 2 is modified !!

    and the same for right-justified text :

    Text to be preserved !             VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
    Text to be preserved ! ----------------- ----------------- ----------------- -----
    Text to be preserved !                 1                 1                 1 12345
    Text to be preserved !                12                12                12 12345
    Text to be preserved !               123               123               123 12345
    Text to be preserved !              1234              1234              1234 12345
    Text to be preserved !             12345             12345             12345 12345
    Text to be preserved !            123456            123456            123456 12345
    Text to be preserved !           1234567           1234567           1234567 12345
    

    which is changed as below :

    Text to be preserved !   VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
    ---- -- -- --------- - ------- ------------ ----------------- -----
    Text to be preserved !       1            1                 1 12345
    Text to be preserved !      12           12                12 12345
    Text to be preserved !     123          123               123 12345
    Text to be preserved !    1234         1234              1234 12345
    Text to be preserved !   12345        12345             12345 12345
    Text to be preserved !  123456       123456            123456 12345
    Text to be preserved ! 1234567      1234567           1234567 12345
    

    So, your point 3 is, implicitly, realized ! No need for any change ;-))


    Now, I think that point 1, which would ask the user, about the number of the line, identifying the different columns, would be enough. Indeed, let’s suppose the text, below, beginning at line, say, 46

    NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
    %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%
    FIRST             FIRST             FIRST             12345
    SECOND            SECOND            SECOND            12345
    THIRD             THIRD             THIRD             12345
    FOURTH            FOURTH            FOURTH            12345
    FIFTH             FIFTH             FIFTH             12345
    SIXTH             SIXTH             SIXTH             12345
    SEVENTH           SEVENTH           SEVENTH           12345
    

    If you tell the script that the “key” line is the line 47, you, automatically know that the user character is the % symbol :-)) Of course, this symbol must not be a space character !

    Now, in order to avoid changing text, located after the given list or even a second list, built on the same way, I think that your script should consider that any true empty line, after the first list, stops the process.

    If a second list occurs, afterwards, the user just has to re-run your script, telling about the number n of the new line of symbols. So, this second list would begin at line n-1 and end at the last line, before a true empty line

    Best Regards,

    guy038



  • @guy038

    Hi Guy,

    I had something different in mind but must admit I wasn’t very clear.
    But this revealed that there might be even more questions/features waiting.

    What I had in mind was something like this

    Text I would to get rid of 1                1     1                1     1                1
    Text I would to get rid of 22              22     22              22     22              22
    Text I would to get rid of 333            333     333            333     333            333
    Text I would to get rid of 4444         44444     4444         44444     4444         44444
    Text I would to get rid of 55555         5555     55555         5555     55555         5555
    Text I would to get rid of 6666           666     6666           666     6666           666
    Text I would to get rid of 777             77     777             77     777             77
    Text I would to get rid of 88               8     88               8     88               8
    Text I would to get rid of 9             9999     9             9999     9             9999
    Text I would to get rid of #######     ######     #######     ######     #######     ######
    Text I would to get rid of 1                1     1                1     1                1
    Text I would to get rid of 22              22     22              22     22              22
    Text I would to get rid of 333            333     333            333     333            333
    Text I would to get rid of 4444         44444     4444         44444     4444         44444
    Text I would to get rid of 55555         5555     55555         5555     55555         5555
    Text I would to get rid of 6666           666     6666           666     6666           666
    Text I would to get rid of 777             77     777             77     777             77
    Text I would to get rid of 88               8     88               8     88               8
    Text I would to get rid of 9             9999     9             9999     9             9999
    Text I would to get rid of #######     ######     #######     ######     #######     ######
    .
    .
    .
    

    So there is no real header, just some raw data and then suddenly (line 10)
    some divider which could be used to calculate the column width in this case.

    in this case I would be looking for a result like this

    1         1 1         1 1         1
    22       22 22       22 22       22
    333     333 333     333 333     333
    4444  44444 4444  44444 4444  44444
    55555  5555 55555  5555 55555  5555
    6666    666 6666    666 6666    666
    777      77 777      77 777      77
    88        8 88        8 88        8
    9      9999 9      9999 9      9999
    ##### ##### ##### ##### ##### #####
    1         1 1         1 1         1
    22       22 22       22 22       22
    333     333 333     333 333     333
    4444  44444 4444  44444 4444  44444
    55555  5555 55555  5555 55555  5555
    6666    666 6666    666 6666    666
    777      77 777      77 777      77
    88        8 88        8 88        8
    9      9999 9      9999 9      9999
    ##### ##### ##### ##### ##### #####
    .
    .
    .
    

    (not that it is readable anymore - but …)

    Your idea, of having text in between text that should
    be reformatted without changing the other text (WHATT??? text always text)
    could be added if we consider selections.

    So if nothing is selected, whole document should be reformatted
    but if user has selected something, only this should be reformatted.

    I guess if we still think about it, we can even find more ways of
    reformatting text in other different ways :-)

    Cheers
    Claudia



  • Hi, @claudia-frank, and All,

    First, I would object that, given your recent example text, where I suppressed the common text, at beginning :

    ABCDEFG     ABCDEF     ABCDEFG     ABCDEF     ABCDEFG     ABCDEF
    -------     ------     -------     ------     -------     ------
    1                1     1                1     1                1
    22              22     22              22     22              22
    333            333     333            333     333            333
    4444         44444     4444         44444     4444         44444
    55555         5555     55555         5555     55555         5555
    6666           666     6666           666     6666           666
    777             77     777             77     777             77
    88               8     88               8     88               8
    9             9999     9             9999     9             9999
    1                1     1                1     1                1
    22              22     22              22     22              22
    333            333     333            333     333            333
    4444         44444     4444         44444     4444         44444
    55555         5555     55555         5555     55555         5555
    6666           666     6666           666     6666           666
    777             77     777             77     777             77
    88               8     88               8     88               8
    9             9999     9             9999     9             9999
    

    Your script changes it into the form, below :

    ABCDEFG ABC DEF ABCDEFG ABC DEF
    ------- --- --- ------- --- ---
    1             1 1             1
    22           22 22           22
    333         333 333         333
    4444     44 444 4444     44 444
    55555     5 555 55555     5 555
    6666        666 6666        666
    777          77 777          77
    88            8 88            8
    9         9 999 9         9 999
    1             1 1             1
    22           22 22           22
    333         333 333         333
    4444     44 444 4444     44 444
    55555     5 555 55555     5 555
    6666        666 6666        666
    777          77 777          77
    88            8 88            8
    9         9 999 9         9 999
    

    Not exactly what it’s expected, isn’t it ?

    I think that, with the present modified script, we need that all ranges of dashes are separated by a single space character, only !

    So, after modifying the initial text, as below :

    ABC        ABC ABC        ABC ABC        ABC
    ------- ------ ------- ------ ------- ------
    1            1 1            1 1            1
    22          22 22          22 22          22
    333        333 333        333 333        333
    4444     44444 4444     44444 4444     44444
    55555     5555 55555     5555 55555     5555
    6666       666 6666       666 6666       666
    777         77 777         77 777         77
    88           8 88           8 88           8
    9         9999 9         9999 9         9999
    1            1 1            1 1            1
    22          22 22          22 22          22
    333        333 333        333 333        333
    4444     44444 4444     44444 4444     44444
    55555     5555 55555     5555 55555     5555
    6666       666 6666       666 6666       666
    777         77 777         77 777         77
    88           8 88           8 88           8
    9         9999 9         9999 9         9999
    

    Your present script does give, as expected, the text :

    ABC     ABC ABC     ABC ABC     ABC
    ----- ----- ----- ----- ----- -----
    1         1 1         1 1         1
    22       22 22       22 22       22
    333     333 333     333 333     333
    4444  44444 4444  44444 4444  44444
    55555  5555 55555  5555 55555  5555
    6666    666 6666    666 6666    666
    777      77 777      77 777      77
    88        8 88        8 88        8
    9      9999 9      9999 9      9999
    1         1 1         1 1         1
    22       22 22       22 22       22
    333     333 333     333 333     333
    4444  44444 4444  44444 4444  44444
    55555  5555 55555  5555 55555  5555
    6666    666 6666    666 6666    666
    777      77 777      77 777      77
    88        8 88        8 88        8
    9      9999 9      9999 9      9999
    

    Great !


    Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection ! Never mind : so, if no normal selection exists, all document would be reformatted. On the contrary, only the selected text would be changed !

    Now, regarding the text to get rid of, at beginning of each line, just ask for the number n of characters to delete. We already know how to do ;-))

    • Change the 3 lines :
    start = n   #  instead of start = 0
    

    and the line :

    max_width = [len(x)+1 for x in lines[1][n:].split()]   #  instead of max_width = [len(x)+1 for x in lines[1].split()]
    

    Finally, about the line which identifies the columns width, you would have to scan all lines till a line, built of ranges of the same NON-word character, separated by a single space character !

    Cheers,

    guy038



  • @guy038

    Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection !

    hehe, me too until I found out that there is a nice call

     editor.getUserLineSelection()
    

    which returns the start and end line number of the selected text and instead there isn’t something selected,
    it returns the start and end line number of the whole text. So, regardless what the user does, a single

    start_line, end_line = editor.getUserLineSelection()
    

    returns what is needed. :-)

    I have to admit, I didn’t test the code with my text - just used it for illustration.
    But you are right, the critical part in the script is to calculate the column width correctly,
    the rest is just reformat what you already have.

    Cheers
    Claudia



  • Hi all!
    Thank you very much for your support on this. I really appreciate your effort during the weekend.
    I already tested with the Claudia’s script, and adding as well the start = 0 changes and deleting the [1:] and it works really great.
    I just placed the { character at the beginning for sharing the content. No need to take it in consideration.
    I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. The Claudia’s support was awesome to make the script work in Notepad++.
    Thank you really so much again to you two, and happy coding!
    =)


Log in to reply