
    Is there a plugin for deleting spaces in blocks of data?

    Help wanted
    sqlcolumns
    14 Posts 3 Posters 3.7k Views
    • osg174
      last edited by osg174

      Hi @Claudia-Frank!
      It would be great if you could create a Python script to achieve this!
      Could you please help me with that?
      This is my test data:

      Kindly remove the first character of every line (the {):

      {COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
      {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
      {    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
      {    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676

      And this is the desired output:

      {COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
      {--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
      {    38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
      {    38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676

      Thank you very much for your time!

      • Claudia Frank @osg174
        last edited by Claudia Frank

        @osg174

        Just to make sure I understand correctly what needs to be done:

        {COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
        {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
        {    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
        {    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676
        

        1. remove { from each line
        2. read the dash line (second line) to see where each column starts and ends
        3. read all lines (except the second) to see how much each column can be shortened
        4. reformat the text

        COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
        --------- --------- --------- --------- --------- ----------------------------- ------------------ -------
            38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
            38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676
        

        Is my understanding correct?
        If so, I’ll try to create a script that does this.

        Cheers
        Claudia
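Steps 2 and 3 of the plan above can be sketched in plain Python, outside Notepad++. This is a minimal sketch, not the script posted later in this thread: it assumes each column is marked by one unbroken run of non-space characters on the dash line, and the helper names `column_spans` and `needed_widths` are hypothetical.

```python
import re

def column_spans(dash_line):
    """Step 2: (start, end) position of every run of non-space
    characters on the dash line, i.e. one span per column."""
    return [(m.start(), m.end()) for m in re.finditer(r'\S+', dash_line)]

def needed_widths(lines, spans):
    """Step 3: width of the longest stripped cell found in each column."""
    widths = [0] * len(spans)
    for line in lines:
        for i, (start, end) in enumerate(spans):
            widths[i] = max(widths[i], len(line[start:end].strip()))
    return widths

# Illustrative data with fixed 17/17/5 column widths
rows = ['{:<17} {:<17} {:<5}'.format(*cells) for cells in [
    ('NAME', 'PRODUCTNAME', 'ABCDE'),
    ('-' * 17, '-' * 17, '-' * 5),
    ('FIRST', 'FIRST', '12345'),
    ('SEVENTH', 'SEVENTH', '12345'),
]]

spans = column_spans(rows[1])
print(spans)                                       # [(0, 17), (18, 35), (36, 41)]
print(needed_widths([rows[0]] + rows[2:], spans))  # [7, 11, 5]
```

Because the spans are taken from the real positions on the dash line, this variant reads each column at the same offsets no matter how many spaces separate the runs.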

        • Claudia Frank @osg174
          last edited by

          @osg174

          If I understood correctly, this script (written for the PythonScript plugin) should do what you are looking for.
          It replaces the whole document text. I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.

          from Npp import editor
          
          # create list of lines from whole text
          lines = editor.getText().splitlines()
          
          # column width calculated based on dash line (second line)
          max_width = [len(x)+1 for x in lines[1][1:].split()]
           
          # start calculating the width needed per column
          # header line
          start = 1
          column_width = []
          for width in max_width:
              column_width.append(len(lines[0][start:start+width].strip()))
              start += width
          
          # skip second line
          # rest of lines
          for line in lines[2:]:
              start = 1
              _l = []
              for width in max_width:
                  _l.append(len(line[start:start+width].strip()))
                  start += width
              column_width = [max(x,y) for x,y in zip(column_width,_l)]
              
          # start reformat process
          
          def reformat_line(_line):
              tmp_lines = []
              start = 1
              for i, width in enumerate(max_width):
                  cell = _line[start:start+width]
                  if cell.startswith(' '):
                      # leading space: value was right-aligned, keep it right-aligned
                      cell = cell.strip().rjust(column_width[i])
                  else:
                      # otherwise left-align the value
                      cell = cell.strip().ljust(column_width[i])
                  tmp_lines.append(cell)
                  start += width    
              return tmp_lines
          
          new_lines = []
          # header line
          new_lines.append(' '.join(reformat_line(lines[0])))
          
          # second line = dash line, rebuilt from the new widths
          new_lines.append(' '.join(['-'*x for x in column_width]))
          
          # rest of the lines
          for line in lines[2:]:   
              new_lines.append(' '.join(reformat_line(line)))
          
          # group the change into one undo action, so a single Ctrl+Z restores the original
          editor.beginUndoAction()
          # replace the whole text; lines are joined with Windows (CRLF) line endings
          editor.setText('\r\n'.join(new_lines))
          editor.endUndoAction()
          
          # cleanup
          del lines
          del new_lines
          

          Cheers
          Claudia
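To try the same logic outside Notepad++ (for example on sample data in a plain Python interpreter), the script can be wrapped in a pure function. This is a hedged sketch that mirrors the script above rather than replacing it: `reformat_table` is a hypothetical name, the `skip` parameter stands in for the `start = 1` / `[1:]` prefix handling, and the result is joined with `\n` instead of being written to the editor.

```python
def reformat_table(text, skip=1):
    """Npp-free port of the script above (hypothetical helper).

    Line 0 is the header, line 1 the dash line, the rest are data rows.
    `skip` is the number of leading characters dropped from every line:
    1 for the '{' prefix used in this thread, 0 for plain data.
    """
    lines = text.splitlines()
    # width of each column plus its one-space separator, from the dash line
    max_width = [len(x) + 1 for x in lines[1][skip:].split()]

    def cells(line):
        # cut a line into raw cells using the widths from the dash line
        out, start = [], skip
        for width in max_width:
            out.append(line[start:start + width])
            start += width
        return out

    # narrowest width each column needs (header and data, not the dash line)
    column_width = [0] * len(max_width)
    for line in [lines[0]] + lines[2:]:
        for i, cell in enumerate(cells(line)):
            column_width[i] = max(column_width[i], len(cell.strip()))

    def reformat_line(line):
        out = []
        for i, cell in enumerate(cells(line)):
            # a leading space means the value was right-aligned
            if cell.startswith(' '):
                out.append(cell.strip().rjust(column_width[i]))
            else:
                out.append(cell.strip().ljust(column_width[i]))
        return ' '.join(out)

    new_lines = [reformat_line(lines[0])]
    new_lines.append(' '.join('-' * w for w in column_width))
    new_lines += [reformat_line(line) for line in lines[2:]]
    return '\n'.join(new_lines)


# guy038's right-justified test case, rebuilt with str.format
header = '{' + '{:>17} {:>17} {:>17} {}'.format(
    'VALUE', 'PRODUCTVALUE', 'VALUES OF PRODUCT', 'ABCDE')
dashes = '{' + ' '.join('-' * w for w in (17, 17, 17, 5))
rows = ['{' + '{:>17} {:>17} {:>17} 12345'.format(n, n, n)
        for n in ('1', '12', '1234567')]
print(reformat_table('\n'.join([header, dashes] + rows)))
```

Calling it with `skip=0` corresponds to the `start = 0` variant discussed further down.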

          • guy038
            last edited by guy038

            Hello, @claudia-frank, and All,

            I’ve just tried the latest version of your script, and it works just fine ;-))

            I ran some tests with these two blocks of text:

            {NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
            {----------------- ----------------- ----------------- -----
            {FIRST             FIRST             FIRST             12345
            {SECOND            SECOND            SECOND            12345
            {THIRD             THIRD             THIRD             12345
            {FOURTH            FOURTH            FOURTH            12345
            {FIFTH             FIFTH             FIFTH             12345
            {SIXTH             SIXTH             SIXTH             12345
            {SEVENTH           SEVENTH           SEVENTH           12345
            

            and:

            {            VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
            {----------------- ----------------- ----------------- -----
            {                1                 1                 1 12345
            {               12                12                12 12345
            {              123               123               123 12345
            {             1234              1234              1234 12345
            {            12345             12345             12345 12345
            {           123456            123456            123456 12345
            {          1234567           1234567           1234567 12345
            

            Each column is 17 characters wide:

            • In the first column, some values are longer than the header

            • In the second column, the header is longer than all the values

            • In the third column, the header is exactly as wide as the column

            • The fourth column is already at its minimal width


            After running your script, I get the two results below:

            NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
            ------- ----------- ----------------- -----
            FIRST   FIRST       FIRST             12345
            SECOND  SECOND      SECOND            12345
            THIRD   THIRD       THIRD             12345
            FOURTH  FOURTH      FOURTH            12345
            FIFTH   FIFTH       FIFTH             12345
            SIXTH   SIXTH       SIXTH             12345
            SEVENTH SEVENTH     SEVENTH           12345
            

            and

              VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
            ------- ------------ ----------------- -----
                  1            1                 1 12345
                 12           12                12 12345
                123          123               123 12345
               1234         1234              1234 12345
              12345        12345             12345 12345
             123456       123456            123456 12345
            1234567      1234567           1234567 12345
            

            Remark: it is quite important to notice that your list must be at the very beginning of the file, with the header values in line 1 and the dash ranges in line 2!! Perhaps, Claudia, a test to verify that the dash ranges, which define the column widths, really are in line 2 would be sensible?

            Claudia, I also verified that your script does not care about the first character of each line: it may be any symbol instead of an opening brace, even a single space character! It is simply deleted when the script runs.

            So, could you create a similar version that does not need that extra character at the beginning of every line? Thanks for looking into this!

            Cheers,

            guy038

            • Claudia Frank @guy038
              last edited by

              @guy038

              Guy, thank you for taking the time to test this script.
              Yes, whatever the first char is, it is ignored. :-)
              This is achieved by the three instances of

              start = 1
              

              and the slicing [1:] done here

              max_width = [len(x)+1 for x in lines[1][1:].split()]
              

              So, to get a script that doesn’t skip the first char,
              replace start = 1 with start = 0 (all three instances) and remove the slice from
              the max_width calculation, like this:

              max_width = [len(x)+1 for x in lines[1].split()]
              

              That’s it.

              Concerning the dash line: yes, we could add a check, but does it really make sense?
              The line doesn’t need to contain any dash sign at all; it could even be a mix of chars.
              The only requirement is that this is the line which specifies the width of each column,
              using runs of chars separated by a space. Whether a line fulfills that requirement
              can’t really be checked.

              To get a more general script that removes surplus spaces,
              we need to ask for three parameters, I guess:

              1. Which line should be used to calculate the column widths?
              2. Which char is used to separate the columns?
              3. Is there a need to ignore some chars at the beginning of EACH line?
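The three parameters could be threaded through the column detection roughly like this. A minimal sketch, assuming 0-based line indexes; `find_columns` and its parameter names are hypothetical, and `itertools.groupby` is just one way to split the key line into runs.

```python
from itertools import groupby

def find_columns(lines, key_line, sep=' ', skip=0):
    """Return (start, end) positions of the columns.

    key_line: 0-based index of the line defining the columns (point 1)
    sep:      character separating the columns on that line (point 2)
    skip:     characters ignored at the start of each line (point 3)
    """
    spans, pos = [], skip
    # group consecutive characters into separator runs and column runs
    for is_sep, run in groupby(lines[key_line][skip:], key=lambda ch: ch == sep):
        length = len(list(run))
        if not is_sep:
            spans.append((pos, pos + length))
        pos += length
    return spans

lines = ['{AB CDE', '{--|---', '{x  yyy']
spans = find_columns(lines, key_line=1, sep='|', skip=1)
print(spans)                                       # [(1, 3), (4, 7)]
print([lines[2][s:e].strip() for s, e in spans])   # ['x', 'yyy']
```

Each cell is then simply `line[start:end]`, so the reformatting step no longer needs to know which separator or prefix was used.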

              What do you think? Something I forgot?

              Cheers
              Claudia

              • guy038
                last edited by guy038

                Hi, @claudia-frank, and All,

                Great, indeed! Changing the three lines:

                start = 1
                

                with

                start = 0
                

                and the line:

                max_width = [len(x)+1 for x in lines[1][1:].split()]
                

                with

                max_width = [len(x)+1 for x in lines[1].split()]
                

                does the job :-))


                Now, I noticed a nice side effect of this new script! Take the text below:

                Text to be preserved ! NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                Text to be preserved ! ----------------- ----------------- ----------------- -----
                Text to be preserved ! FIRST             FIRST             FIRST             12345
                Text to be preserved ! SECOND            SECOND            SECOND            12345
                Text to be preserved ! THIRD             THIRD             THIRD             12345
                Text to be preserved ! FOURTH            FOURTH            FOURTH            12345
                Text to be preserved ! FIFTH             FIFTH             FIFTH             12345
                Text to be preserved ! SIXTH             SIXTH             SIXTH             12345
                Text to be preserved ! SEVENTH           SEVENTH           SEVENTH           12345
                

                This new script version automatically gives the text:

                Text to be preserved ! NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
                ---- -- -- --------- - ------- ----------- ----------------- -----
                Text to be preserved ! FIRST   FIRST       FIRST             12345
                Text to be preserved ! SECOND  SECOND      SECOND            12345
                Text to be preserved ! THIRD   THIRD       THIRD             12345
                Text to be preserved ! FOURTH  FOURTH      FOURTH            12345
                Text to be preserved ! FIFTH   FIFTH       FIFTH             12345
                Text to be preserved ! SIXTH   SIXTH       SIXTH             12345
                Text to be preserved ! SEVENTH SEVENTH     SEVENTH           12345
                

                Wow! Just note how the beginning of line 2 is modified!!

                And the same for right-justified text:

                Text to be preserved !             VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
                Text to be preserved ! ----------------- ----------------- ----------------- -----
                Text to be preserved !                 1                 1                 1 12345
                Text to be preserved !                12                12                12 12345
                Text to be preserved !               123               123               123 12345
                Text to be preserved !              1234              1234              1234 12345
                Text to be preserved !             12345             12345             12345 12345
                Text to be preserved !            123456            123456            123456 12345
                Text to be preserved !           1234567           1234567           1234567 12345
                

                which is changed as below:

                Text to be preserved !   VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
                ---- -- -- --------- - ------- ------------ ----------------- -----
                Text to be preserved !       1            1                 1 12345
                Text to be preserved !      12           12                12 12345
                Text to be preserved !     123          123               123 12345
                Text to be preserved !    1234         1234              1234 12345
                Text to be preserved !   12345        12345             12345 12345
                Text to be preserved !  123456       123456            123456 12345
                Text to be preserved ! 1234567      1234567           1234567 12345
                

                So, your point 3 is implicitly covered! No need for any change ;-))


                Now, I think that point 1, asking the user for the number of the line that identifies the different columns, would be enough. Indeed, suppose the text below begins at, say, line 46:

                NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%
                FIRST             FIRST             FIRST             12345
                SECOND            SECOND            SECOND            12345
                THIRD             THIRD             THIRD             12345
                FOURTH            FOURTH            FOURTH            12345
                FIFTH             FIFTH             FIFTH             12345
                SIXTH             SIXTH             SIXTH             12345
                SEVENTH           SEVENTH           SEVENTH           12345
                

                If you tell the script that the “key” line is line 47, it automatically knows that the character used is the % symbol :-)) Of course, this symbol must not be a space character!

                Now, to avoid changing text located after the given list, or even a second list built the same way, I think your script should stop at the first truly empty line after the list.

                If a second list follows, the user just has to re-run your script, giving the number n of its line of symbols. This second list would then begin at line n-1 and end at the last line before a truly empty line.
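This stopping rule can be sketched as a small helper that turns the user's line number into the block to reformat. A minimal sketch, assuming the user types the 1-based line number that Notepad++ displays and that a "truly empty" line contains no characters at all (not even spaces); `find_block` is a hypothetical name.

```python
def find_block(lines, key_line_no):
    """Return (first, last) 0-based, inclusive indices of the list.

    key_line_no: 1-based number of the line of symbols (e.g. 47).
    The header is the line just above it; the list ends before the
    first truly empty line, or at the end of the text.
    """
    key = key_line_no - 1                 # 1-based -> 0-based
    if key < 1 or key >= len(lines):
        raise ValueError('key line number out of range')
    first, last = key - 1, key
    # extend the block until the next line is completely empty
    while last + 1 < len(lines) and lines[last + 1] != '':
        last += 1
    return first, last

text = ['intro', '', 'NAME  X', '%%%%% %', 'a     1', 'b     2', '', 'other']
print(find_block(text, 4))   # (2, 5)
```

Only `lines[first:last + 1]` would then be handed to the reformatting code; everything before and after is written back unchanged.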

                Best Regards,

                guy038

                • Claudia Frank @guy038
                  last edited by Claudia Frank

                  @guy038

                  Hi Guy,

                  I had something different in mind, but I must admit I wasn’t very clear.
                  Still, this revealed that there might be even more questions/features waiting.

                  What I had in mind was something like this:

                  Text I would like to get rid of 1                1     1                1     1                1
                  Text I would like to get rid of 22              22     22              22     22              22
                  Text I would like to get rid of 333            333     333            333     333            333
                  Text I would like to get rid of 4444         44444     4444         44444     4444         44444
                  Text I would like to get rid of 55555         5555     55555         5555     55555         5555
                  Text I would like to get rid of 6666           666     6666           666     6666           666
                  Text I would like to get rid of 777             77     777             77     777             77
                  Text I would like to get rid of 88               8     88               8     88               8
                  Text I would like to get rid of 9             9999     9             9999     9             9999
                  Text I would like to get rid of #######     ######     #######     ######     #######     ######
                  Text I would like to get rid of 1                1     1                1     1                1
                  Text I would like to get rid of 22              22     22              22     22              22
                  Text I would like to get rid of 333            333     333            333     333            333
                  Text I would like to get rid of 4444         44444     4444         44444     4444         44444
                  Text I would like to get rid of 55555         5555     55555         5555     55555         5555
                  Text I would like to get rid of 6666           666     6666           666     6666           666
                  Text I would like to get rid of 777             77     777             77     777             77
                  Text I would like to get rid of 88               8     88               8     88               8
                  Text I would like to get rid of 9             9999     9             9999     9             9999
                  Text I would like to get rid of #######     ######     #######     ######     #######     ######
                  .
                  .
                  .
                  

                  So there is no real header, just some raw data, and then suddenly (line 10)
                  a divider which could be used to calculate the column widths.

                  In this case I would be looking for a result like this:

                  1         1 1         1 1         1
                  22       22 22       22 22       22
                  333     333 333     333 333     333
                  4444  44444 4444  44444 4444  44444
                  55555  5555 55555  5555 55555  5555
                  6666    666 6666    666 6666    666
                  777      77 777      77 777      77
                  88        8 88        8 88        8
                  9      9999 9      9999 9      9999
                  ##### ##### ##### ##### ##### #####
                  1         1 1         1 1         1
                  22       22 22       22 22       22
                  333     333 333     333 333     333
                  4444  44444 4444  44444 4444  44444
                  55555  5555 55555  5555 55555  5555
                  6666    666 6666    666 6666    666
                  777      77 777      77 777      77
                  88        8 88        8 88        8
                  9      9999 9      9999 9      9999
                  ##### ##### ##### ##### ##### #####
                  .
                  .
                  .
                  

                  (not that it is readable anymore - but …)

                  Your idea of reformatting text that sits between other text, without changing
                  that other text (WHAT??? text, always text), could be added if we consider selections.

                  So if nothing is selected, the whole document should be reformatted,
                  but if the user has selected something, only the selection should be.

                  I guess if we keep thinking about it, we will find even more ways
                  of reformatting text :-)

                  Cheers
                  Claudia

                  • guy038
                    last edited by

                    Hi, @claudia-frank, and All,

                    First, I would object that, given your recent example text, in which I removed the common text at the beginning of the lines:

                    ABCDEFG     ABCDEF     ABCDEFG     ABCDEF     ABCDEFG     ABCDEF
                    -------     ------     -------     ------     -------     ------
                    1                1     1                1     1                1
                    22              22     22              22     22              22
                    333            333     333            333     333            333
                    4444         44444     4444         44444     4444         44444
                    55555         5555     55555         5555     55555         5555
                    6666           666     6666           666     6666           666
                    777             77     777             77     777             77
                    88               8     88               8     88               8
                    9             9999     9             9999     9             9999
                    1                1     1                1     1                1
                    22              22     22              22     22              22
                    333            333     333            333     333            333
                    4444         44444     4444         44444     4444         44444
                    55555         5555     55555         5555     55555         5555
                    6666           666     6666           666     6666           666
                    777             77     777             77     777             77
                    88               8     88               8     88               8
                    9             9999     9             9999     9             9999
                    

                    Your script changes it into the form below:

                    ABCDEFG ABC DEF ABCDEFG ABC DEF
                    ------- --- --- ------- --- ---
                    1             1 1             1
                    22           22 22           22
                    333         333 333         333
                    4444     44 444 4444     44 444
                    55555     5 555 55555     5 555
                    6666        666 6666        666
                    777          77 777          77
                    88            8 88            8
                    9         9 999 9         9 999
                    1             1 1             1
                    22           22 22           22
                    333         333 333         333
                    4444     44 444 4444     44 444
                    55555     5 555 55555     5 555
                    6666        666 6666        666
                    777          77 777          77
                    88            8 88            8
                    9         9 999 9         9 999
                    

                    Not exactly what is expected, is it?

                    I think that, with the present modified script, all ranges of dashes must be separated by a single space character only!

                    So, after modifying the initial text as below:

                    ABC        ABC ABC        ABC ABC        ABC
                    ------- ------ ------- ------ ------- ------
                    1            1 1            1 1            1
                    22          22 22          22 22          22
                    333        333 333        333 333        333
                    4444     44444 4444     44444 4444     44444
                    55555     5555 55555     5555 55555     5555
                    6666       666 6666       666 6666       666
                    777         77 777         77 777         77
                    88           8 88           8 88           8
                    9         9999 9         9999 9         9999
                    1            1 1            1 1            1
                    22          22 22          22 22          22
                    333        333 333        333 333        333
                    4444     44444 4444     44444 4444     44444
                    55555     5555 55555     5555 55555     5555
                    6666       666 6666       666 6666       666
                    777         77 777         77 777         77
                    88           8 88           8 88           8
                    9         9999 9         9999 9         9999
                    

                    Your present script does give, as expected, the text:

                    ABC     ABC ABC     ABC ABC     ABC
                    ----- ----- ----- ----- ----- -----
                    1         1 1         1 1         1
                    22       22 22       22 22       22
                    333     333 333     333 333     333
                    4444  44444 4444  44444 4444  44444
                    55555  5555 55555  5555 55555  5555
                    6666    666 6666    666 6666    666
                    777      77 777      77 777      77
                    88        8 88        8 88        8
                    9      9999 9      9999 9      9999
                    1         1 1         1 1         1
                    22       22 22       22 22       22
                    333     333 333     333 333     333
                    4444  44444 4444  44444 4444  44444
                    55555  5555 55555  5555 55555  5555
                    6666    666 6666    666 6666    666
                    777      77 777      77 777      77
                    88        8 88        8 88        8
                    9      9999 9      9999 9      9999
                    

                    Great !
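The reason for the single-space requirement can be seen by comparing where the script assumes each column starts with where the dash runs really start. A short illustration on a dash line like the one above; `split_starts` and `real_starts` are hypothetical helper names. `str.split()` throws away the width of each gap, so with wider gaps every column after the first is sliced too early.

```python
import re

def split_starts(dash_line):
    """Column starts as max_width implies them: previous start + run length + 1."""
    starts, pos = [], 0
    for run in dash_line.split():
        starts.append(pos)
        pos += len(run) + 1
    return starts

def real_starts(dash_line):
    """Column starts where the runs of dashes actually begin."""
    return [m.start() for m in re.finditer(r'\S+', dash_line)]

# '-------     ------     -------' with five spaces between runs
dash = (' ' * 5).join(['-' * 7, '-' * 6, '-' * 7])
print(split_starts(dash))  # [0, 8, 15]
print(real_starts(dash))   # [0, 12, 23]
```

With exactly one space between runs both lists agree, which is why the modified text works; slicing by the real positions would be one possible way to lift the restriction.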


                    Personally, I thought that detecting the first truly empty line after the list was simpler to code than managing a selection! Never mind: so, if there is no normal selection, the whole document is reformatted; otherwise, only the selected text is changed!

                    Now, regarding the text to get rid of at the beginning of each line, just ask for the number n of characters to delete. We already know how to do that ;-))

                    • Change the 3 lines:
                    start = n   #  instead of start = 0
                    

                    and the line:

                    max_width = [len(x)+1 for x in lines[1][n:].split()]   #  instead of max_width = [len(x)+1 for x in lines[1].split()]
                    

                    Finally, to find the line that identifies the column widths, you would have to scan the lines until one consists of runs of the same NON-word character, separated by single space characters!
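This rule for spotting the key line translates fairly directly into a regular expression. A minimal sketch using Python's `re` module with a backreference, so that every run must use the same character; `find_key_line`, `KEY_LINE` and the `skip` parameter are hypothetical. A line holding a single run, such as a lone `.`, would also match, so the rule may need tightening on real data.

```python
import re

# One character that is neither a word char nor whitespace, repeated,
# then further runs of that same character, each after exactly one space.
KEY_LINE = re.compile(r'([^\w\s])\1*(?: \1+)*')

def find_key_line(lines, skip=0):
    """Return the 0-based index of the first line that looks like a
    column-defining line, or None if no line qualifies."""
    for i, line in enumerate(lines):
        # ignore the first `skip` chars and any trailing blanks
        if KEY_LINE.fullmatch(line[skip:].rstrip()):
            return i
    return None

sample = ['NAME              PRODUCTNAME',
          '%%%%%%%%%%%%%%%%% %%%%%%%%%%%',
          'FIRST             FIRST']
print(find_key_line(sample))   # 1
```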

                    Cheers,

                    guy038

                    • Claudia Frank @guy038
                      last edited by Claudia Frank

                      @guy038

                      Personally, I thought that detecting the first truly empty line after the list was simpler to code than managing a selection!

                      Hehe, me too, until I found out that there is a nice call

                       editor.getUserLineSelection()
                      

                      which returns the start and end line numbers of the selected text or, if nothing is selected,
                      the start and end line numbers of the whole text. So, whatever the user does, a single

                      start_line, end_line = editor.getUserLineSelection()
                      

                      returns what is needed. :-)
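The selection handling can be kept separate from the reformatting itself. A minimal sketch, assuming `getUserLineSelection()` returns 0-based, inclusive line numbers (as Scintilla line numbers are); the helper `apply_to_line_range` is hypothetical and works on plain strings so it can be tried outside Notepad++.

```python
def apply_to_line_range(text, start_line, end_line, transform):
    """Replace lines start_line..end_line (0-based, inclusive) with
    transform(list_of_those_lines); all other lines stay untouched.

    In a PythonScript script the two numbers would come from
        start_line, end_line = editor.getUserLineSelection()
    Note: the result is joined with LF line endings.
    """
    lines = text.splitlines()
    lines[start_line:end_line + 1] = transform(lines[start_line:end_line + 1])
    return '\n'.join(lines)

doc = 'keep me\nname x\n---- -\nab   1\nkeep me too'
print(apply_to_line_range(doc, 1, 3, lambda block: [line.upper() for line in block]))
```

Any of the reformatting functions sketched in this thread could be passed as `transform`, so whole-document and selection-only runs share the same code path.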

                      I have to admit I didn’t test the code with my text; I just used it for illustration.
                      But you are right: the critical part of the script is calculating the column widths correctly;
                      the rest is just reformatting what you already have.

                      Cheers
                      Claudia

                      • osg174
                        last edited by

                        Hi all!
                        Thank you very much for your support on this. I really appreciate your effort over the weekend.
                        I have already tested Claudia’s script, with the start = 0 changes and the [1:] slice removed, and it works really great.
                        I only added the { character at the beginning of each line when sharing the content, so there is no need to take it into consideration.
                        I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. Claudia’s support was awesome in getting the script to work in Notepad++.
                        Thank you both so much again, and happy coding!
                        =)

                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors