Is there a plugin for deleting spaces in blocks of data?

Help wanted
sqlcolumns
14 Posts 3 Posters 3.7k Views
  • O
    osg174
    last edited by Jan 25, 2018, 12:55 PM

    Hi @Claudia-Frank .
    This is the best approach so far. Thank you so much for your time.
    However, it only works when there are no spaces in the column values. If a column value is longer than the column name, unfortunately the output is wrong :(
    But again, thank you very much for your time; it shows me I should prepare the script manually.

    • C
      Claudia Frank @osg174
      last edited by Jan 25, 2018, 12:59 PM

      @osg174

      we could create a python script which resizes the column based on the
      longest value and takes missing values into account if you want,
      but if you want to go another way I’m fine with it as well :-)

      Cheers
      Claudia

      • O
        osg174
        last edited by osg174 Jan 26, 2018, 6:50 AM Jan 26, 2018, 6:49 AM

        Hi @Claudia-Frank !
        It would be great if you could create the Python script to achieve this!
        Could you please help me with that?
        Here is the test data:

        Kindly remove the first character ({) from every line of the data.

        {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
        {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
        { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
        { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

        And this is the desired output value:

        {COLUMN_01 COLUMN_02 STARTDATE ENDDATE PRODUCTID PRODUCTNAME PRODUCTDESCRIPTION SEQNO
        {--------- --------- --------- --------- --------- ----------------------------- ------------------ -------
        { 38826 14757 134 PRODUCT NAME VALUE FIRST ROW 1 DESCRIPTION 32975
        { 38826 14757 01-DEC-17 31-JAN-20 114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION 1043676

        Thank you very much for your time!

        • C
          Claudia Frank @osg174
          last edited by Claudia Frank Jan 26, 2018, 2:08 PM Jan 26, 2018, 2:05 PM

          @osg174

          Just to make sure I understand correctly what needs to be done:

          {COLUMN_01 COLUMN_02 STARTDATE ENDDATE    PRODUCTID PRODUCTNAME                     PRODUCTDESCRIPTION               SEQNO
          {--------- --------- --------- --------- ---------- ------------------------------- -------------------------- -----------
          {    38826     14757                            134 PRODUCT NAME VALUE FIRST ROW    1 DESCRIPTION                    32975
          {    38826     14757 01-DEC-17 31-JAN-20        114 PRODUCT NAME VALUE SECOND ROW   2 DESCRIPTION                  1043676
          

          1. Remove { from each line
          2. Read the dash line (second line) to see where each column starts and ends
          3. Read all lines (except the second) to see which columns can be shortened
          4. Reformat the text

          COLUMN_01 COLUMN_02 STARTDATE ENDDATE   PRODUCTID PRODUCTNAME                   PRODUCTDESCRIPTION   SEQNO
          --------- --------- --------- --------- --------- ----------------------------- ------------------ -------
              38826     14757                           134 PRODUCT NAME VALUE FIRST ROW  1 DESCRIPTION        32975
              38826     14757 01-DEC-17 31-JAN-20       114 PRODUCT NAME VALUE SECOND ROW 2 DESCRIPTION      1043676
          

          Is my understanding correct?
          If so, I’ll try to create a script that does this.

          Cheers
          Claudia

          • C
            Claudia Frank @osg174
            last edited by Jan 26, 2018, 5:55 PM

            @osg174

            if I understood correctly, then this script should do what you are looking for.
            I haven’t tested it on really large data, so maybe start with a couple of hundred lines first.

            from Npp import editor
            
            # create list of lines from whole text
            lines = editor.getText().splitlines()
            
            # column width calculated based on dash line (second line)
            max_width = [len(x)+1 for x in lines[1][1:].split()]
             
            # start calculating needed width per column
            # header line
            start = 1
            column_width = []
            for width in max_width:
                column_width.append(len(lines[0][start:start+width].strip()))
                start += width
            
            # skip second line
            # rest of lines
            for line in lines[2:]:
                start = 1
                _l = []
                for width in max_width:
                    _l.append(len(line[start:start+width].strip()))
                    start += width
                column_width = [max(x,y) for x,y in zip(column_width,_l)]
                
            # start reformat process
            
            def reformat_line(_line):
                tmp_lines = []
                start = 1
                for i, width in enumerate(max_width):
                    cell = _line[start:start+width]
                    if cell.startswith(' '):
                        cell = cell.strip()
                        cell = '{}{}'.format(' '*(column_width[i]-len(cell)),cell )
                    else:
                        cell = cell.strip()
                        cell = '{}{}'.format(cell,' '*(column_width[i]-len(cell)))
                    tmp_lines.append(cell)
                    start += width    
                return tmp_lines
            
            new_lines = []
            # header line
            new_lines.append(' '.join(reformat_line(lines[0])))
            
            # second line = dash line
            new_lines.append(' '.join(['-'*x for x in column_width]))
            
            # rest of the lines
            for line in lines[2:]:   
                new_lines.append(' '.join(reformat_line(line)))
            
            # set undo start point if something goes wrong
            editor.beginUndoAction()
            # set new text
            editor.setText('\r\n'.join(new_lines))
            # set end undo point
            editor.endUndoAction()
            
            # cleanup
            del lines
            del new_lines
            

            Cheers
            Claudia

            • G
              guy038
              last edited by guy038 Jan 27, 2018, 3:16 PM Jan 27, 2018, 2:59 PM

              Hello, @claudia-frank, and All,

              I’ve just tried your latest script version, and it works just fine ;-))

              I did some tests with these two blocks of text :

              {NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
              {----------------- ----------------- ----------------- -----
              {FIRST             FIRST             FIRST             12345
              {SECOND            SECOND            SECOND            12345
              {THIRD             THIRD             THIRD             12345
              {FOURTH            FOURTH            FOURTH            12345
              {FIFTH             FIFTH             FIFTH             12345
              {SIXTH             SIXTH             SIXTH             12345
              {SEVENTH           SEVENTH           SEVENTH           12345
              

              and :

              {            VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
              {----------------- ----------------- ----------------- -----
              {                1                 1                 1 12345
              {               12                12                12 12345
              {              123               123               123 12345
              {             1234              1234              1234 12345
              {            12345             12345             12345 12345
              {           123456            123456            123456 12345
              {          1234567           1234567           1234567 12345
              

              Each column is 17 characters wide :

              • In the first column, some column values are longer than the header value

              • In the second column, the header value is longer than all column values

              • In the third column, the header value is exactly as wide as the column

              • The fourth column is, simply, a pre-formatted column


              After running your script, we obtain the two results, below :

              NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
              ------- ----------- ----------------- -----
              FIRST   FIRST       FIRST             12345
              SECOND  SECOND      SECOND            12345
              THIRD   THIRD       THIRD             12345
              FOURTH  FOURTH      FOURTH            12345
              FIFTH   FIFTH       FIFTH             12345
              SIXTH   SIXTH       SIXTH             12345
              SEVENTH SEVENTH     SEVENTH           12345
              

              and

                VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
              ------- ------------ ----------------- -----
                    1            1                 1 12345
                   12           12                12 12345
                  123          123               123 12345
                 1234         1234              1234 12345
                12345        12345             12345 12345
               123456       123456            123456 12345
              1234567      1234567           1234567 12345
              

              Remark : It is important to note that the list must start at the very beginning of the file, with the header values in line 1 and the dash ranges in line 2 !! Perhaps, Claudia, it would be sensible to test that the ranges of dashes, which define the column widths, really are in line 2 ?

              Claudia, I also verified that your script does not care about the first character of each line, which may be any symbol instead of an opening brace, even a single space character ! It is deleted when the script runs.

              So, could you create a similar version which does not need that extra character at the beginning of every line ? Thanks for your investigation !

              Cheers,

              guy038

              • C
                Claudia Frank @guy038
                last edited by Jan 27, 2018, 7:49 PM

                @guy038

                Guy, thank you for taking the time to test this script.
                Yes, regardless which char is the first one, it will be ignored. :-)
                This is achieved by the three instances of

                start = 1
                

                and the slicing [1:] done here

                max_width = [len(x)+1 for x in lines[1][1:].split()]
                

                So, in order to have a script which doesn’t skip the first char,
                one needs to replace start=1 with start=0 and remove the slice part from
                the max_width calculation, like

                max_width = [len(x)+1 for x in lines[1].split()]
                

                That’s it.

                Concerning the dash line, yes, we could add a check, but does it really make sense?
                The line itself need not contain any dashes at all; it could even be mixed chars.
                The only requirement is that this is the line which specifies the width of each column
                by using runs of chars separated by a space. Whether this line fulfills that requirement
                can’t really be checked.

                In order to have a more general script which reduces the amount of used spaces
                we need to ask for three parameters I guess.

                1. Which line should be used to calculate the column width
                2. Which char is used to separate the columns
                3. Is there a need to ignore some chars at the beginning of EACH line
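
The three parameters listed above could be exposed roughly as follows. This is only a sketch, not the script from this thread: the function name `shrink_columns` and its parameters `key_index`, `separator` and `skip` are hypothetical, and it assumes that the columns of the key line are separated by exactly one separator character. It works on a plain list of lines, so it can be tried outside Notepad++.

```python
def shrink_columns(lines, key_index=1, separator=' ', skip=0):
    """Shrink fixed-width columns to the widest value found in each column.

    key_index: 0-based index of the line whose runs define the column widths
    separator: single character between columns (assumed to occur exactly once)
    skip:      number of leading characters to drop from EACH line
    """
    # Original width of every column, taken from the key line.
    widths = [len(part) for part in lines[key_index][skip:].split(separator)]

    def cells(line):
        # Cut one line into cells using the original column widths.
        result, start = [], skip
        for width in widths:
            result.append(line[start:start + width])
            start += width + len(separator)
        return result

    # Needed width per column: longest stripped value, key line excluded.
    needed = [0] * len(widths)
    for index, line in enumerate(lines):
        if index != key_index:
            lengths = [len(cell.strip()) for cell in cells(line)]
            needed = [max(a, b) for a, b in zip(needed, lengths)]

    new_lines = []
    for index, line in enumerate(lines):
        row = []
        for cell, size in zip(cells(line), needed):
            if index == key_index:
                row.append(cell[:1] * size)           # rebuild the key run
            elif cell.startswith(' '):
                row.append(cell.strip().rjust(size))  # right-aligned value
            else:
                row.append(cell.strip().ljust(size))  # left-aligned value
        new_lines.append(separator.join(row))
    return new_lines


# Usage with a two-column table and one ignored leading character.
sample = [
    '{' + 'NAME'.ljust(10) + ' ' + 'VALUE'.rjust(10),
    '{' + '-' * 10 + ' ' + '-' * 10,
    '{' + 'FIRST'.ljust(10) + ' ' + '1'.rjust(10),
    '{' + 'SEVENTH'.ljust(10) + ' ' + '1234567'.rjust(10),
]
for new_line in shrink_columns(sample, key_index=1, separator=' ', skip=1):
    print(new_line)
```

As in the script above, a cell that starts with a space is treated as right-aligned, and the key line is rebuilt from its own first character rather than from a hard-coded dash.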

                What do you think? Something I forgot?

                Cheers
                Claudia

                • G
                  guy038
                  last edited by guy038 Jan 27, 2018, 11:55 PM Jan 27, 2018, 11:41 PM

                  Hi, @claudia-frank, and All,

                  Great, indeed ! Changing the three lines :

                  start = 1
                  

                  with

                  start = 0
                  

                  and the line :

                  max_width = [len(x)+1 for x in lines[1][1:].split()]
                  

                  with

                  max_width = [len(x)+1 for x in lines[1].split()]
                  

                  does the job :-))


                  Now, I noticed a nice side effect of this new script ! Assuming the text, below :

                  Text to be preserved ! NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                  Text to be preserved ! ----------------- ----------------- ----------------- -----
                  Text to be preserved ! FIRST             FIRST             FIRST             12345
                  Text to be preserved ! SECOND            SECOND            SECOND            12345
                  Text to be preserved ! THIRD             THIRD             THIRD             12345
                  Text to be preserved ! FOURTH            FOURTH            FOURTH            12345
                  Text to be preserved ! FIFTH             FIFTH             FIFTH             12345
                  Text to be preserved ! SIXTH             SIXTH             SIXTH             12345
                  Text to be preserved ! SEVENTH           SEVENTH           SEVENTH           12345
                  

                  This new script version gives, automatically, the text :

                  Text to be preserved ! NAME    PRODUCTNAME NAMES_OF_PRODUCTS ABCDE
                  ---- -- -- --------- - ------- ----------- ----------------- -----
                  Text to be preserved ! FIRST   FIRST       FIRST             12345
                  Text to be preserved ! SECOND  SECOND      SECOND            12345
                  Text to be preserved ! THIRD   THIRD       THIRD             12345
                  Text to be preserved ! FOURTH  FOURTH      FOURTH            12345
                  Text to be preserved ! FIFTH   FIFTH       FIFTH             12345
                  Text to be preserved ! SIXTH   SIXTH       SIXTH             12345
                  Text to be preserved ! SEVENTH SEVENTH     SEVENTH           12345
                  

                  Wow ! Just note how the beginning of line 2 is modified !!

                  and the same for right-justified text :

                  Text to be preserved !             VALUE      PRODUCTVALUE VALUES OF PRODUCT ABCDE
                  Text to be preserved ! ----------------- ----------------- ----------------- -----
                  Text to be preserved !                 1                 1                 1 12345
                  Text to be preserved !                12                12                12 12345
                  Text to be preserved !               123               123               123 12345
                  Text to be preserved !              1234              1234              1234 12345
                  Text to be preserved !             12345             12345             12345 12345
                  Text to be preserved !            123456            123456            123456 12345
                  Text to be preserved !           1234567           1234567           1234567 12345
                  

                  which is changed as below :

                  Text to be preserved !   VALUE PRODUCTVALUE VALUES OF PRODUCT ABCDE
                  ---- -- -- --------- - ------- ------------ ----------------- -----
                  Text to be preserved !       1            1                 1 12345
                  Text to be preserved !      12           12                12 12345
                  Text to be preserved !     123          123               123 12345
                  Text to be preserved !    1234         1234              1234 12345
                  Text to be preserved !   12345        12345             12345 12345
                  Text to be preserved !  123456       123456            123456 12345
                  Text to be preserved ! 1234567      1234567           1234567 12345
                  

                  So, your point 3 is, implicitly, realized ! No need for any change ;-))


                  Now, I think that point 1, asking the user for the number of the line that identifies the different columns, would be enough. Indeed, let’s suppose the text below begins at line, say, 46 :

                  NAME              PRODUCTNAME       NAMES_OF_PRODUCTS ABCDE
                  %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%% %%%%%
                  FIRST             FIRST             FIRST             12345
                  SECOND            SECOND            SECOND            12345
                  THIRD             THIRD             THIRD             12345
                  FOURTH            FOURTH            FOURTH            12345
                  FIFTH             FIFTH             FIFTH             12345
                  SIXTH             SIXTH             SIXTH             12345
                  SEVENTH           SEVENTH           SEVENTH           12345
                  

                  If you tell the script that the “key” line is line 47, it automatically knows that the column character is the % symbol :-)) Of course, this symbol must not be a space character !

                  Now, in order to avoid changing text located after the given list, or even a second list built the same way, I think that your script should consider that any truly empty line after the first list stops the process.

                  If a second list occurs afterwards, the user just has to re-run your script, giving the number n of the new line of symbols. This second list would then begin at line n-1 and end at the last line before a truly empty line.
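
The "empty line stops the process" rule could be sketched like this. The helper name `table_range` is hypothetical, and it uses 0-based indexes (as Python lists do) rather than the 1-based line numbers mentioned above. It assumes the header sits directly above the key line.

```python
def table_range(lines, key_index):
    """Return (first, last) 0-based indexes of the list around the key line.

    The header is assumed to sit directly above the key line, and the list
    is assumed to end just before the first truly empty line (or at the end
    of the text if there is none).
    """
    first = key_index - 1
    last = key_index
    while last + 1 < len(lines) and lines[last + 1] != '':
        last += 1
    return first, last


text = [
    'Some introduction',
    '',
    'NAME  ABCDE',
    '%%%%% %%%%%',
    'FIRST 12345',
    'THIRD 12345',
    '',
    'Text after the list, left untouched',
]
first, last = table_range(text, key_index=3)
block = text[first:last + 1]   # only these lines would be reformatted
```

A line containing only spaces does not count as empty here, which matches the "true empty line" wording; whether that is the desired behaviour is a design choice.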

                  Best Regards,

                  guy038

                  • C
                    Claudia Frank @guy038
                    last edited by Claudia Frank Jan 28, 2018, 12:43 AM Jan 28, 2018, 12:40 AM

                    @guy038

                    Hi Guy,

                    I had something different in mind but must admit I wasn’t very clear.
                    But this revealed that there might be even more questions/features waiting.

                    What I had in mind was something like this

                    Text I would to get rid of 1                1     1                1     1                1
                    Text I would to get rid of 22              22     22              22     22              22
                    Text I would to get rid of 333            333     333            333     333            333
                    Text I would to get rid of 4444         44444     4444         44444     4444         44444
                    Text I would to get rid of 55555         5555     55555         5555     55555         5555
                    Text I would to get rid of 6666           666     6666           666     6666           666
                    Text I would to get rid of 777             77     777             77     777             77
                    Text I would to get rid of 88               8     88               8     88               8
                    Text I would to get rid of 9             9999     9             9999     9             9999
                    Text I would to get rid of #######     ######     #######     ######     #######     ######
                    Text I would to get rid of 1                1     1                1     1                1
                    Text I would to get rid of 22              22     22              22     22              22
                    Text I would to get rid of 333            333     333            333     333            333
                    Text I would to get rid of 4444         44444     4444         44444     4444         44444
                    Text I would to get rid of 55555         5555     55555         5555     55555         5555
                    Text I would to get rid of 6666           666     6666           666     6666           666
                    Text I would to get rid of 777             77     777             77     777             77
                    Text I would to get rid of 88               8     88               8     88               8
                    Text I would to get rid of 9             9999     9             9999     9             9999
                    Text I would to get rid of #######     ######     #######     ######     #######     ######
                    .
                    .
                    .
                    

                    So there is no real header, just some raw data and then suddenly (line 10)
                    some divider which could be used to calculate the column width in this case.

                    in this case I would be looking for a result like this

                    1         1 1         1 1         1
                    22       22 22       22 22       22
                    333     333 333     333 333     333
                    4444  44444 4444  44444 4444  44444
                    55555  5555 55555  5555 55555  5555
                    6666    666 6666    666 6666    666
                    777      77 777      77 777      77
                    88        8 88        8 88        8
                    9      9999 9      9999 9      9999
                    ##### ##### ##### ##### ##### #####
                    1         1 1         1 1         1
                    22       22 22       22 22       22
                    333     333 333     333 333     333
                    4444  44444 4444  44444 4444  44444
                    55555  5555 55555  5555 55555  5555
                    6666    666 6666    666 6666    666
                    777      77 777      77 777      77
                    88        8 88        8 88        8
                    9      9999 9      9999 9      9999
                    ##### ##### ##### ##### ##### #####
                    .
                    .
                    .
                    

                    (not that it is readable anymore - but …)

                    Your idea, of having text in between text that should
                    be reformatted without changing the other text (WHATT??? text always text)
                    could be added if we consider selections.

                    So if nothing is selected, whole document should be reformatted
                    but if user has selected something, only this should be reformatted.

                    I guess if we still think about it, we can even find more ways of
                    reformatting text in other different ways :-)

                    Cheers
                    Claudia

                    • G
                      guy038
                      last edited by Jan 28, 2018, 10:20 PM

                      Hi, @claudia-frank, and All,

                      First, I would object that, given your recent example text, from which I removed the common text at the beginning :

                      ABCDEFG     ABCDEF     ABCDEFG     ABCDEF     ABCDEFG     ABCDEF
                      -------     ------     -------     ------     -------     ------
                      1                1     1                1     1                1
                      22              22     22              22     22              22
                      333            333     333            333     333            333
                      4444         44444     4444         44444     4444         44444
                      55555         5555     55555         5555     55555         5555
                      6666           666     6666           666     6666           666
                      777             77     777             77     777             77
                      88               8     88               8     88               8
                      9             9999     9             9999     9             9999
                      1                1     1                1     1                1
                      22              22     22              22     22              22
                      333            333     333            333     333            333
                      4444         44444     4444         44444     4444         44444
                      55555         5555     55555         5555     55555         5555
                      6666           666     6666           666     6666           666
                      777             77     777             77     777             77
                      88               8     88               8     88               8
                      9             9999     9             9999     9             9999
                      

                      Your script changes it into the form, below :

                      ABCDEFG ABC DEF ABCDEFG ABC DEF
                      ------- --- --- ------- --- ---
                      1             1 1             1
                      22           22 22           22
                      333         333 333         333
                      4444     44 444 4444     44 444
                      55555     5 555 55555     5 555
                      6666        666 6666        666
                      777          77 777          77
                      88            8 88            8
                      9         9 999 9         9 999
                      1             1 1             1
                      22           22 22           22
                      333         333 333         333
                      4444     44 444 4444     44 444
                      55555     5 555 55555     5 555
                      6666        666 6666        666
                      777          77 777          77
                      88            8 88            8
                      9         9 999 9         9 999
                      

                      Not exactly what is expected, is it ?

                      I think that, with the present modified script, all ranges of dashes must be separated by exactly one space character !

                      So, after modifying the initial text, as below :

                      ABC        ABC ABC        ABC ABC        ABC
                      ------- ------ ------- ------ ------- ------
                      1            1 1            1 1            1
                      22          22 22          22 22          22
                      333        333 333        333 333        333
                      4444     44444 4444     44444 4444     44444
                      55555     5555 55555     5555 55555     5555
                      6666       666 6666       666 6666       666
                      777         77 777         77 777         77
                      88           8 88           8 88           8
                      9         9999 9         9999 9         9999
                      1            1 1            1 1            1
                      22          22 22          22 22          22
                      333        333 333        333 333        333
                      4444     44444 4444     44444 4444     44444
                      55555     5555 55555     5555 55555     5555
                      6666       666 6666       666 6666       666
                      777         77 777         77 777         77
                      88           8 88           8 88           8
                      9         9999 9         9999 9         9999
                      

                      Your present script does give, as expected, the text :

                      ABC     ABC ABC     ABC ABC     ABC
                      ----- ----- ----- ----- ----- -----
                      1         1 1         1 1         1
                      22       22 22       22 22       22
                      333     333 333     333 333     333
                      4444  44444 4444  44444 4444  44444
                      55555  5555 55555  5555 55555  5555
                      6666    666 6666    666 6666    666
                      777      77 777      77 777      77
                      88        8 88        8 88        8
                      9      9999 9      9999 9      9999
                      1         1 1         1 1         1
                      22       22 22       22 22       22
                      333     333 333     333 333     333
                      4444  44444 4444  44444 4444  44444
                      55555  5555 55555  5555 55555  5555
                      6666    666 6666    666 6666    666
                      777      77 777      77 777      77
                      88        8 88        8 88        8
                      9      9999 9      9999 9      9999
                      

                      Great !
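
The garbled result further up comes from the width calculation `len(x)+1`, which assumes exactly one space between the dash ranges. One possible way to lift that restriction is to record where each range starts and ends instead of only how long it is. This is a sketch, not part of Claudia's script; `column_spans` and `split_cells` are hypothetical helper names.

```python
import re


def column_spans(key_line):
    """Return the (start, end) offsets of every run of non-space characters."""
    return [match.span() for match in re.finditer(r'\S+', key_line)]


def split_cells(line, spans):
    """Cut a line into cells at the offsets found in the key line."""
    return [line[start:end] for start, end in spans]


# Two columns of 7 and 6 characters, separated by FIVE spaces.
key_line = '-' * 7 + ' ' * 5 + '-' * 6
data_line = '4444'.ljust(7) + ' ' * 5 + '44444'.rjust(6)

spans = column_spans(key_line)
cells = split_cells(data_line, spans)
```

The cells can then be stripped and padded exactly as before; only the way the column boundaries are found changes.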


                      Personally, I thought that detecting the first true empty line-break, after the list, was simpler to code than managing a selection ! Never mind : so, if no normal selection exists, the whole document would be reformatted. Otherwise, only the selected text would be changed !

                      Now, regarding the text to get rid of, at the beginning of each line, just ask for the number n of characters to delete. We already know how to do that ;-))

                      • Change the 3 lines :
                      start = n   #  instead of start = 0
                      

                      and the line :

                      max_width = [len(x)+1 for x in lines[1][n:].split()]   #  instead of max_width = [len(x)+1 for x in lines[1].split()]
                      

                      Finally, to find the line which defines the column widths, you would have to scan all lines until you reach a line built of ranges of the same NON-word character, separated by a single space character !
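
Such a scan could be sketched with a regular expression. The names `KEY_LINE` and `find_key_line` are hypothetical, and the pattern is just one reading of "ranges of the same non-word character, separated by a single space".

```python
import re

# One non-word, non-space character, repeated, in runs separated by ONE space.
KEY_LINE = re.compile(r'([^\w\s])\1*(?: \1+)*$')


def find_key_line(lines):
    """Return the 0-based index of the first key line, or None if not found."""
    for index, line in enumerate(lines):
        if KEY_LINE.match(line):
            return index
    return None


table = [
    'NAME  ABCDE',
    '%%%%% %%%%%',
    'FIRST 12345',
]
key_index = find_key_line(table)
```

Lines that mix different symbols, or that separate the ranges with more than one space, are deliberately not recognised by this pattern.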

                      Cheers,

                      guy038

                      • C
                        Claudia Frank @guy038
                        last edited by Claudia Frank Jan 28, 2018, 10:36 PM Jan 28, 2018, 10:36 PM

                        @guy038

                        Personally, I thought that detecting the first true empty line-break, after the list, was simpler to code than managing a selection !

                        hehe, me too until I found out that there is a nice call

                         editor.getUserLineSelection()
                        

                        which returns the start and end line numbers of the selected text; if nothing is selected,
                        it returns the start and end line numbers of the whole text. So, regardless of what the user does, a single

                        start_line, end_line = editor.getUserLineSelection()
                        

                        returns what is needed. :-)
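
A rough sketch of how that call might be combined with the rest of the script follows. The helpers `replace_line_range` and `strip_trailing` are hypothetical, and `strip_trailing` only stands in for the real column reformatting. The sketch also assumes that the returned line numbers are 0-based and inclusive. Apart from `getUserLineSelection`, it only uses editor calls that already appear in the script above.

```python
def replace_line_range(lines, start_line, end_line, transform):
    """Apply transform to lines[start_line..end_line] (inclusive) only."""
    block = lines[start_line:end_line + 1]
    return lines[:start_line] + transform(block) + lines[end_line + 1:]


def strip_trailing(block):
    # Placeholder for the real column reformatting of the selected lines.
    return [line.rstrip() for line in block]


try:
    from Npp import editor   # only importable inside Notepad++ PythonScript
except ImportError:
    editor = None            # lets the helper be tried outside Notepad++

if editor is not None:
    # Assumption: 0-based, inclusive line numbers; whole text if no selection.
    start_line, end_line = editor.getUserLineSelection()
    lines = editor.getText().splitlines()
    new_lines = replace_line_range(lines, start_line, end_line, strip_trailing)
    editor.beginUndoAction()
    editor.setText('\r\n'.join(new_lines))
    editor.endUndoAction()
```

Lines outside the selection are passed through unchanged, so text before and after the list is preserved.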

                        I have to admit, I didn’t test the code with my text - I just used it for illustration.
                        But you are right, the critical part of the script is calculating the column widths correctly;
                        the rest is just reformatting what you already have.

                        Cheers
                        Claudia

                        • O
                          osg174
                          last edited by Jan 30, 2018, 8:39 AM

                          Hi all!
                          Thank you very much for your support on this. I really appreciate your effort during the weekend.
                          I have already tested Claudia’s script, with the start = 0 changes and the [1:] removed, and it works really well.
                          I only placed the { character at the beginning to share the content. No need to take it into consideration.
                          I have never worked with Python [I’m a PL/SQL and database developer], and now I’d like to try to make magic with it. Claudia’s support was awesome in getting the script to work in Notepad++.
                          Thank you both so much again, and happy coding!
                          =)
