Community
    • Login

    How to Remove a Column when Rows are Uneven

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    7 Posts 3 Posters 895 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Ryan McLaughlinR
      Ryan McLaughlin
      last edited by Ryan McLaughlin

      Hi Everyone!

      I’m new to NPP (but I love it already). Is there a way to remove a column when the rows are uneven? I’ve tried the shift + alt options but because the row length is inconsistent, it doesn’t work to select only the column I want removed. The other issue is that if I open the file in Excel, I lose my leading zeros for some numbers and I need the leading zeros. (I’m also trying to find a solution to this but it seems like the data Excel loads doesn’t have any record of the leading zeros. I couldn’t find an answer to this in the forums so I apologize if this is a dup question.

      Here’s an example of the data, but I’m working with thousands of lines so I can’t manually remove it effectively.

      “ClientNumber”,“ClientStatus”,“FirstName”,“MiddleName”,“LastName”,“SSN”,“Gender”,“DOB”,“Email”,“AddressLine1”,
      “AddressLine2”,“ZipCode”,“PhoneNo”,“ApprovedDate”,“DateOfAssessment”,“MANumber”,“ClientDischargeDate”,“Case Manager(EMPID)”
      5,“InActive”,“XXXX”,“”,“XXXXXX”,“”,“F”,“1001-01-01”,“”,“1111 XXXXXXX XX”,“11”,“11111”,“111111111”,“1001-01-01”,“1001-
      01-01”,“11111111”,“1001-01-01”,“”
      6,“Active”,“XXXXX”,“X”,“XXXX”,“”,“F”,“1001-01-01”,“”,“111 XXXXXXXX XX X”,“111”,“11111”,“111111111”,“1001-01-
      01”,“1001-01-01”,“11111111”,“”,“”
      7,“InActive”,“XXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 X 11XX XX”,“111”,“11111”,“1001-01-01”,“1001-01-
      01”,“11111111”,“1001-01-01”,“”
      8,“InActive”,“XXXXXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 XX 1XX XX”,“X1111”,“11111”,“111111111”,“1001-01-
      01”,“1001-01-01”,“11111111”,“1001-01-01”,“”

      Sorry the formatting is kind of clunky. I would have added a screenshot but the info is sensitive. Specifically I’m looking to remove the MANumber column. The column always has 8 digits so that’s nice, but it’s not always lined up vertically to highlight.

      Thanks so much for any ideas you have!

      PeterJonesP 2 Replies Last reply Reply Quote 0
      • PeterJonesP
        PeterJones @Ryan McLaughlin
        last edited by

        @Ryan-McLaughlin ,

        Notepad++'s concept of “columns” is based on character, so each character position is a separate column; so using a shift+alt for column-mode selection, the columns are based on character position, not based on your CSV/spreadsheet “column”.

        However, if your spreadsheet-columns do not contain any extra commas, then it’s possible to do a regex to delete the nth spreadsheet “column”. So in your example, you want to delete the 16th spreadsheet-column, so you could use a regex like

        FIND=^((.*?,){15})(.*?,)
        REPLACE=$1
        SEARCH MODE=regular expression

        which would change

        "ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","MANumber","ClientDischargeDate","Case Manager(EMPID)"
        5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01",""
        6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","11111111","",""
        7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111","1001-01-01",""
        8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01",""
        

        to

        "ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","ClientDischargeDate","Case Manager(EMPID)"
        5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","1001-01-01",""
        6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","",""
        7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111",""
        8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","1001-01-01",""
        
        1 Reply Last reply Reply Quote 2
        • PeterJonesP
          PeterJones @Ryan McLaughlin
          last edited by PeterJones

          PS:

          BTW: an explanation of my regex: it grabs 15 groups of any non-comma followed by a comma and stores those 15 groups into memory#1, then grabs one more group after that; the replacement keeps the memory#1 contents by using $1, but then does not include the contents of that last group. That’s how it deletes the 16th column. If you wanted to delete the Nth column instead, just replace the 15 in the regex with N-1… so the fourth column would have 3 in the regex.

          Bonus Info: even though this isn’t an excel forum: if you import the CSV instead of directly opening it with Excel, it will give you some options which allow you to define columns as text instead of as numbers, so it will keep the leading zeros. (In the Excel Help/Search bar, type Text Import Wizard and use the Get data from text to open the file, pick your CSV file, then go through the prompts; the third screen allows you to change columns from General to Text (or whatever you want).)

          Ryan McLaughlinR 1 Reply Last reply Reply Quote 1
          • Ryan McLaughlinR
            Ryan McLaughlin @PeterJones
            last edited by

            @PeterJones My goodness. You’re a genius. Thank you so much for this. It worked perfectly! And I will check out the solution with the Excel import. I need to start learning a lot more about regex apparently. Thanks again!

            PeterJonesP 1 Reply Last reply Reply Quote 0
            • PeterJonesP
              PeterJones @Ryan McLaughlin
              last edited by

              @Ryan-McLaughlin said in How to Remove a Column when Rows are Uneven:

              I need to start learning a lot more about regex apparently.

              FAQ; Where to find Regular Expression (regex) Documentation, which includes links to the online Notepad++ User Manual and many external regex resources.

              Ryan McLaughlinR 1 Reply Last reply Reply Quote 0
              • Ryan McLaughlinR
                Ryan McLaughlin @PeterJones
                last edited by

                @PeterJones Wonderful! Thank you! I’m so excited to dig into ++ more!

                1 Reply Last reply Reply Quote 1
                • guy038G
                  guy038
                  last edited by guy038

                  Hello, @ryan-mclaughlin, @peterjones and All,

                  We must take care whether the ranges between "........." may or not contain comma(s), themselves !

                  if we consider the general case, where :

                  • The field separator is the comma character

                  • Some fields are surrounded by doubles quotes, containing possible commas or could be empty fields ( "")

                  • Some fields do not contain any double-quote and could be empty too

                  • The field to delete is the last field of each row, possibly followed with a comma

                  Then, the following regex S/R is the right solution, especially when the total number of fields of a row is inferior to the number of the field to delete !!


                  • SEARCH (?-s)^(?:"[^"\r\n]*",|[^",\r\n]*,){N}\K(?:"[^"\r\n]*",?|[^",\r\n]*,?)    where N is the number of fields right before the field to delete

                  • REPLACE Leave EMPTY

                  • Tick preferably the Wrap around option

                  • Click only on the Replace All button


                  Test it against this two-lines text :

                  "ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Field to delete,"ClientDischargeDate","Case Manager(EMPID)",
                  
                  "ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Last field to delete,
                  

                  Note that field 12 ( "1,234.56" ) contains a decimal number with a comma separator => I surrounded this field with double-quotes to keep its meaning

                  Best Regards,

                  guy038

                  1 Reply Last reply Reply Quote 3
                  • First post
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors