Community
    • Login

    Is there a plugin for deleting spaces in blocks of data?

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    sqlcolumns
    14 Posts 3 Posters 3.7k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • osg174O
      osg174
      last edited by

      Hi @Claudia-Frank .
      This is the best approach so far. Thank you so much for your time.
      However, it is just working when there are no spaces in the column values. If the column values are longer than the column name length, unfortunately it shows the information wrongly :(
      But again, thank you very much for your time, it shows me I should prepare the script manually.

      Claudia FrankC 1 Reply Last reply Reply Quote 0
      • Claudia FrankC
        Claudia Frank @osg174
        last edited by

        @osg174

        we could create a python script which resizes the column based on the
        longest value and takes missing values into account if you want,
        but if you want to go another way I’m fine with it as well :-)

        Cheers
        Claudia

        1 Reply Last reply Reply Quote 0
        • osg174O
          osg174
          last edited by osg174

          Hi @Claudia-Frank !
          It should be great if you can create the python script for achieving this!
          Could you please help me with that?
          This should be your test data:

          Kindly remove the first character in all the data: {

          {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
          {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
          { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
          { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

          And this is the desired output value:

          {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
          {--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
          { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
          { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

          Thank you very much for your time!

          Claudia FrankC 2 Replies Last reply Reply Quote 0
          • Claudia FrankC
            Claudia Frank @osg174
            last edited by Claudia Frank

            @osg174

            Just to make sure if I understand correctly, what needs to be done

            {COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
            {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
            {    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
            {    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676
            

            remove { from each line
            read dash line (second line) to see where each column starts and ends
            read all lines (except second) to see which column can be shortened
            reformat text

            COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
            --------- --------- --------- --------- --------- ----------------------------- ------------------ -------
                38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
                38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676
            

            Is my understanding correct?
            If so, I try to create a script which would do this.

            Cheers
            Claudia

            1 Reply Last reply Reply Quote 1
            • Claudia FrankC
              Claudia Frank @osg174
              last edited by

              @osg174

              if I understood correctly, then this script should do what you are looking for.
              I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.

              from Npp import editor
              
              # create list of lines from whole text
              lines = editor.getText().splitlines()
              
              # column width calculated based on dash line (second line)
              max_width = [len(x)+1 for x in lines[1][1:].split()]
               
              # start calulating needed width per column
              # header line
              start = 1
              column_width = []
              for width in max_width:
                  column_width.append(len(lines[0][start:start+width].strip()))
                  start += width
              
              # skip second line
              # rest of lines
              for line in lines[2:]:
                  start = 1
                  _l = []
                  for width in max_width:
                      _l.append(len(line[start:start+width].strip()))
                      start += width
                  column_width = [max(x,y) for x,y in zip(column_width,_l)]
                  
              # start reformat process
              
              def reformat_line(_line):
                  tmp_lines = []
                  start = 1
                  for i, width in enumerate(max_width):
                      cell = _line[start:start+width]
                      if cell.startswith(' '):
                          cell = cell.strip()
                          cell = '{}{}'.format(' '*(column_width[i]-len(cell)),cell )
                      else:
                          cell = cell.strip()
                          cell = '{}{}'.format(cell,' '*(column_width[i]-len(cell)))
                      tmp_lines.append(cell)
                      start += width    
                  return tmp_lines
              
              new_lines = []
              # header line
              new_lines.append(' '.join(reformat_line(lines[0])))
              
              # secodn line = dash line
              new_lines.append(' '.join(['-'*x for x in column_width]))
              
              # rest of the lines
              for line in lines[2:]:   
                  new_lines.append(' '.join(reformat_line(line)))
              
              # set undo start point if something goes wrong
              editor.beginUndoAction()
              # set new text
              editor.setText('\r\n'.join(new_lines))
              # set end undo point
              editor.endUndoAction()
              
              # cleanup
              del lines
              del new_lines
              

              Cheers
              Claudia

              1 Reply Last reply Reply Quote 2
              • guy038G
                guy038
                last edited by guy038

                Hello, @claudia-frank, and All,

                I’ve just tried your last script version, and it’s just working fine ;-))

                I did some tests with these two ranges of text :

                {NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                {----------------- ----------------- ----------------- -----
                {FIRST             FIRST             FIRST             12345
                {SECOND            SECOND            SECOND            12345
                {THIRD             THIRD             THIRD             12345
                {FOURTH            FOURTH            FOURTH            12345
                {FIFTH             FIFTH             FIFTH             12345
                {SIXTH             SIXTH             SIXTH             12345
                {SEVENTH           SEVENTH           SEVENTH           12345
                

                and :

                {            VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
                {----------------- ----------------- ----------------- -----
                {                1                 1                 1 12345
                {               12                12                12 12345
                {              123               123               123 12345
                {             1234              1234              1234 12345
                {            12345             12345             12345 12345
                {           123456            123456            123456 12345
                {          1234567           1234567           1234567 12345
                

                Each column is 17 characters wide :

                • In the first column, some column values are greater than the header value

                • In the second column the header value is greater than all column values

                • In the third column, the header value is, exactly, the width of the column

                • The fourth column is, simply, a pre-formatted column


                After running your script, we obtain the two results, below :

                NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
                ------- ----------- ----------------- -----
                FIRST   FIRST       FIRST             12345
                SECOND  SECOND      SECOND            12345
                THIRD   THIRD       THIRD             12345
                FOURTH  FOURTH      FOURTH            12345
                FIFTH   FIFTH       FIFTH             12345
                SIXTH   SIXTH       SIXTH             12345
                SEVENTH SEVENTH     SEVENTH           12345
                

                and

                  VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
                ------- ------------ ----------------- -----
                      1            1                 1 12345
                     12           12                12 12345
                    123          123               123 12345
                   1234         1234              1234 12345
                  12345        12345             12345 12345
                 123456       123456            123456 12345
                1234567      1234567           1234567 12345
                

                Remark : Quite important to notice that your list must begin your file, with the header values in line 1 and the dash ranges in line 2 !! Perhaps, Claudia, a test to verify that ranges of dashes, which define column width, are, really, in line 2 would be sensible ?

                Claudia, I also verified, that your script does not mind about the first character of each line, which may be any symbol, instead of an opening brace character, even a single space character ! It will be deleted after running your script.

                So, could you create a similar version, which does not need that extra character, at beginning of all lines ? Thanks for your investigation !

                Cheers,

                guy038

                Claudia FrankC 1 Reply Last reply Reply Quote 2
                • Claudia FrankC
                  Claudia Frank @guy038
                  last edited by

                  @guy038

                  Guy, thank you for taking the time to test this script.
                  Yes, regardless which char is the first one, it will be ignored. :-)
                  This is achieved by the three instances of

                  start = 1
                  

                  and the slicing [1:] done here

                  max_width = [len(x)+1 for x in lines[1][1:].split()]
                  

                  So in order to have a script which doesn’t check for the first char
                  one needs to replace start=1 with start=0 and remove the slice part from
                  max_width calculation code like

                  max_width = [len(x)+1 for x in lines[1].split()]
                  

                  That’s it.

                  Concerning the dash line, yes, we could add a check but does it really makes sense?
                  The line itself must not contain any dash sign at all, it could be even mixed chars.
                  The only must have is that this is the line which specifies the width of each column
                  by using chars separated by a space. Whether this line full fills the requirement
                  can’t be really checked.

                  In order to have a more general script which reduces the amount of used spaces
                  we need to ask for three parameters I guess.

                  1. Which line should be used to calculate the column width
                  2. Which char is used to separate the columns
                  3. Is there a need to ignore some chars at the beginning of EACH line

                  What do you think? Something I forgot?

                  Cheers
                  Claudia

                  1 Reply Last reply Reply Quote 1
                  • guy038G
                    guy038
                    last edited by guy038

                    Hi, @claudia-frank, and All,

                    Great, indeed ! Changing the three lines :

                    start = 1
                    

                    with

                    start = 0
                    

                    and the line :

                    max_width = [len(x)+1 for x in lines[1][1:].split()]
                    

                    with

                    max_width = [len(x)+1 for x in lines[1].split()]
                    

                    does the job :-))


                    Now, I noticed a nice side_effect of this new script ! Assuming the text, below :

                    Text to be preserved ! NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                    Text to be preserved ! ----------------- ----------------- ----------------- -----
                    Text to be preserved ! FIRST             FIRST             FIRST             12345
                    Text to be preserved ! SECOND            SECOND            SECOND            12345
                    Text to be preserved ! THIRD             THIRD             THIRD             12345
                    Text to be preserved ! FOURTH            FOURTH            FOURTH            12345
                    Text to be preserved ! FIFTH             FIFTH             FIFTH             12345
                    Text to be preserved ! SIXTH             SIXTH             SIXTH             12345
                    Text to be preserved ! SEVENTH           SEVENTH           SEVENTH           12345
                    

                    This new script version gives, automatically, the text :

                    Text to be preserved ! NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
                    ---- -- -- --------- - ------- ----------- ----------------- -----
                    Text to be preserved ! FIRST   FIRST       FIRST             12345
                    Text to be preserved ! SECOND  SECOND      SECOND            12345
                    Text to be preserved ! THIRD   THIRD       THIRD             12345
                    Text to be preserved ! FOURTH  FOURTH      FOURTH            12345
                    Text to be preserved ! FIFTH   FIFTH       FIFTH             12345
                    Text to be preserved ! SIXTH   SIXTH       SIXTH             12345
                    Text to be preserved ! SEVENTH SEVENTH     SEVENTH           12345
                    

                    Wow ! Just note how the beginning of line 2 is modified !!

                    and the same for right-justified text :

                    Text to be preserved !             VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
                    Text to be preserved ! ----------------- ----------------- ----------------- -----
                    Text to be preserved !                 1                 1                 1 12345
                    Text to be preserved !                12                12                12 12345
                    Text to be preserved !               123               123               123 12345
                    Text to be preserved !              1234              1234              1234 12345
                    Text to be preserved !             12345             12345             12345 12345
                    Text to be preserved !            123456            123456            123456 12345
                    Text to be preserved !           1234567           1234567           1234567 12345
                    

                    which is changed as below :

                    Text to be preserved !   VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
                    ---- -- -- --------- - ------- ------------ ----------------- -----
                    Text to be preserved !       1            1                 1 12345
                    Text to be preserved !      12           12                12 12345
                    Text to be preserved !     123          123               123 12345
                    Text to be preserved !    1234         1234              1234 12345
                    Text to be preserved !   12345        12345             12345 12345
                    Text to be preserved !  123456       123456            123456 12345
                    Text to be preserved ! 1234567      1234567           1234567 12345
                    

                    So, your point 3 is, implicitly, realized ! No need for any change ;-))


                    Now, I think that point 1, which would ask the user, about the number of the line, identifying the different columns, would be enough. Indeed, let’s suppose the text, below, beginning at line, say, 46

                    NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                    %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%
                    FIRST             FIRST             FIRST             12345
                    SECOND            SECOND            SECOND            12345
                    THIRD             THIRD             THIRD             12345
                    FOURTH            FOURTH            FOURTH            12345
                    FIFTH             FIFTH             FIFTH             12345
                    SIXTH             SIXTH             SIXTH             12345
                    SEVENTH           SEVENTH           SEVENTH           12345
                    

                    If you tell the script that the “key” line is the line 47, you, automatically know that the user character is the % symbol :-)) Of course, this symbol must not be a space character !

                    Now, in order to avoid changing text, located after the given list or even a second list, built on the same way, I think that your script should consider that any true empty line, after the first list, stops the process.

                    If a second list occurs, afterwards, the user just has to re-run your script, telling about the number n of the new line of symbols. So, this second list would begin at line n-1 and end at the last line, before a true empty line

                    Best Regards,

                    guy038

                    Claudia FrankC 1 Reply Last reply Reply Quote 1
                    • Claudia FrankC
                      Claudia Frank @guy038
                      last edited by Claudia Frank

                      @guy038

                      Hi Guy,

                      I had something different in mind but must admit I wasn’t very clear.
                      But this revealed that there might be even more questions/features waiting.

                      What I had in mind was something like this

                      Text I would to get rid of 1                1     1                1     1                1
                      Text I would to get rid of 22              22     22              22     22              22
                      Text I would to get rid of 333            333     333            333     333            333
                      Text I would to get rid of 4444         44444     4444         44444     4444         44444
                      Text I would to get rid of 55555         5555     55555         5555     55555         5555
                      Text I would to get rid of 6666           666     6666           666     6666           666
                      Text I would to get rid of 777             77     777             77     777             77
                      Text I would to get rid of 88               8     88               8     88               8
                      Text I would to get rid of 9             9999     9             9999     9             9999
                      Text I would to get rid of #######     ######     #######     ######     #######     ######
                      Text I would to get rid of 1                1     1                1     1                1
                      Text I would to get rid of 22              22     22              22     22              22
                      Text I would to get rid of 333            333     333            333     333            333
                      Text I would to get rid of 4444         44444     4444         44444     4444         44444
                      Text I would to get rid of 55555         5555     55555         5555     55555         5555
                      Text I would to get rid of 6666           666     6666           666     6666           666
                      Text I would to get rid of 777             77     777             77     777             77
                      Text I would to get rid of 88               8     88               8     88               8
                      Text I would to get rid of 9             9999     9             9999     9             9999
                      Text I would to get rid of #######     ######     #######     ######     #######     ######
                      .
                      .
                      .
                      

                      So there is no real header, just some raw data and then suddenly (line 10)
                      some divider which could be used to calculate the column width in this case.

                      in this case I would be looking for a result like this

                      1         1 1         1 1         1
                      22       22 22       22 22       22
                      333     333 333     333 333     333
                      4444  44444 4444  44444 4444  44444
                      55555  5555 55555  5555 55555  5555
                      6666    666 6666    666 6666    666
                      777      77 777      77 777      77
                      88        8 88        8 88        8
                      9      9999 9      9999 9      9999
                      ##### ##### ##### ##### ##### #####
                      1         1 1         1 1         1
                      22       22 22       22 22       22
                      333     333 333     333 333     333
                      4444  44444 4444  44444 4444  44444
                      55555  5555 55555  5555 55555  5555
                      6666    666 6666    666 6666    666
                      777      77 777      77 777      77
                      88        8 88        8 88        8
                      9      9999 9      9999 9      9999
                      ##### ##### ##### ##### ##### #####
                      .
                      .
                      .
                      

                      (not that it is readable anymore - but …)

                      Your idea, of having text in between text that should
                      be reformatted without changing the other text (WHATT??? text always text)
                      could be added if we consider selections.

                      So if nothing is selected, whole document should be reformatted
                      but if user has selected something, only this should be reformatted.

                      I guess if we still think about it, we can even find more ways of
                      reformatting text in other different ways :-)

                      Cheers
                      Claudia

                      1 Reply Last reply Reply Quote 0
                      • guy038G
                        guy038
                        last edited by

                        Hi, @claudia-frank, and All,

                        First, I would object that, given your recent example text, where I suppressed the common text, at beginning :

                        ABCDEFG     ABCDEF     ABCDEFG     ABCDEF     ABCDEFG     ABCDEF
                        -------     ------     -------     ------     -------     ------
                        1                1     1                1     1                1
                        22              22     22              22     22              22
                        333            333     333            333     333            333
                        4444         44444     4444         44444     4444         44444
                        55555         5555     55555         5555     55555         5555
                        6666           666     6666           666     6666           666
                        777             77     777             77     777             77
                        88               8     88               8     88               8
                        9             9999     9             9999     9             9999
                        1                1     1                1     1                1
                        22              22     22              22     22              22
                        333            333     333            333     333            333
                        4444         44444     4444         44444     4444         44444
                        55555         5555     55555         5555     55555         5555
                        6666           666     6666           666     6666           666
                        777             77     777             77     777             77
                        88               8     88               8     88               8
                        9             9999     9             9999     9             9999
                        

                        Your script changes it into the form, below :

                        ABCDEFG ABC DEF ABCDEFG ABC DEF
                        ------- --- --- ------- --- ---
                        1             1 1             1
                        22           22 22           22
                        333         333 333         333
                        4444     44 444 4444     44 444
                        55555     5 555 55555     5 555
                        6666        666 6666        666
                        777          77 777          77
                        88            8 88            8
                        9         9 999 9         9 999
                        1             1 1             1
                        22           22 22           22
                        333         333 333         333
                        4444     44 444 4444     44 444
                        55555     5 555 55555     5 555
                        6666        666 6666        666
                        777          77 777          77
                        88            8 88            8
                        9         9 999 9         9 999
                        

                        Not exactly what it’s expected, isn’t it ?

                        I think that, with the present modified script, we need that all ranges of dashes are separated by a single space character, only !

                        So, after modifying the initial text, as below :

                        ABC        ABC ABC        ABC ABC        ABC
                        ------- ------ ------- ------ ------- ------
                        1            1 1            1 1            1
                        22          22 22          22 22          22
                        333        333 333        333 333        333
                        4444     44444 4444     44444 4444     44444
                        55555     5555 55555     5555 55555     5555
                        6666       666 6666       666 6666       666
                        777         77 777         77 777         77
                        88           8 88           8 88           8
                        9         9999 9         9999 9         9999
                        1            1 1            1 1            1
                        22          22 22          22 22          22
                        333        333 333        333 333        333
                        4444     44444 4444     44444 4444     44444
                        55555     5555 55555     5555 55555     5555
                        6666       666 6666       666 6666       666
                        777         77 777         77 777         77
                        88           8 88           8 88           8
                        9         9999 9         9999 9         9999
                        

                        Your present script does give, as expected, the text :

                        ABC     ABC ABC     ABC ABC     ABC
                        ----- ----- ----- ----- ----- -----
                        1         1 1         1 1         1
                        22       22 22       22 22       22
                        333     333 333     333 333     333
                        4444  44444 4444  44444 4444  44444
                        55555  5555 55555  5555 55555  5555
                        6666    666 6666    666 6666    666
                        777      77 777      77 777      77
                        88        8 88        8 88        8
                        9      9999 9      9999 9      9999
                        1         1 1         1 1         1
                        22       22 22       22 22       22
                        333     333 333     333 333     333
                        4444  44444 4444  44444 4444  44444
                        55555  5555 55555  5555 55555  5555
                        6666    666 6666    666 6666    666
                        777      77 777      77 777      77
                        88        8 88        8 88        8
                        9      9999 9      9999 9      9999
                        

                        Great !


                        Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection ! Never mind : so, if no normal selection exists, all document would be reformatted. On the contrary, only the selected text would be changed !

                        Now, regarding the text to get rid of, at beginning of each line, just ask for the number n of characters to delete. We already know how to do ;-))

                        • Change the 3 lines :
                        start = n   #  instead of start = 0
                        

                        and the line :

                        max_width = [len(x)+1 for x in lines[1][n:].split()]   #  instead of max_width = [len(x)+1 for x in lines[1].split()]
                        

                        Finally, about the line which identifies the columns width, you would have to scan all lines till a line, built of ranges of the same NON-word character, separated by a single space character !

                        Cheers,

                        guy038

                        Claudia FrankC 1 Reply Last reply Reply Quote 0
                        • Claudia FrankC
                          Claudia Frank @guy038
                          last edited by Claudia Frank

                          @guy038

                          Personally, I thought that detecting the first true empty line-break, after the list, was more simple to code that managing selection !

                          hehe, me too until I found out that there is a nice call

                           editor.getUserLineSelection()
                          

                          which returns the start and end line number of the selected text and instead there isn’t something selected,
                          it returns the start and end line number of the whole text. So, regardless what the user does, a single

                          start_line, end_line = editor.getUserLineSelection()
                          

                          returns what is needed. :-)

                          I have to admit, I didn’t test the code with my text - just used it for illustration.
                          But you are right, the critical part in the script is to calculate the column width correctly,
                          the rest is just reformat what you already have.

                          Cheers
                          Claudia

                          1 Reply Last reply Reply Quote 1
                          • osg174O
                            osg174
                            last edited by

                            Hi all!
                            Thank you very much for your support on this. I really appreciate your effort during the weekend.
                            I already tested with the Claudia’s script, and adding as well the start = 0 changes and deleting the [1:] and it works really great.
                            I just placed the { character at the beginning for sharing the content. No need to take it in consideration.
                            I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. The Claudia’s support was awesome to make the script work in Notepad++.
                            Thank you really so much again to you two, and happy coding!
                            =)

                            1 Reply Last reply Reply Quote 1
                            • First post
                              Last post
                            The Community of users of the Notepad++ text editor.
                            Powered by NodeBB | Contributors