• Login
Community
  • Login

How to Remove a Column when Rows are Uneven

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
7 Posts 3 Posters 717 Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • R
    Ryan McLaughlin
    last edited by Ryan McLaughlin Oct 22, 2021, 3:43 PM Oct 22, 2021, 3:42 PM

    Hi Everyone!

    I’m new to NPP (but I love it already). Is there a way to remove a column when the rows are uneven? I’ve tried the shift + alt options but because the row length is inconsistent, it doesn’t work to select only the column I want removed. The other issue is that if I open the file in Excel, I lose my leading zeros for some numbers and I need the leading zeros. (I’m also trying to find a solution to this but it seems like the data Excel loads doesn’t have any record of the leading zeros. I couldn’t find an answer to this in the forums so I apologize if this is a dup question.

    Here’s an example of the data, but I’m working with thousands of lines so I can’t manually remove it effectively.

    “ClientNumber”,“ClientStatus”,“FirstName”,“MiddleName”,“LastName”,“SSN”,“Gender”,“DOB”,“Email”,“AddressLine1”,
    “AddressLine2”,“ZipCode”,“PhoneNo”,“ApprovedDate”,“DateOfAssessment”,“MANumber”,“ClientDischargeDate”,“Case Manager(EMPID)”
    5,“InActive”,“XXXX”,“”,“XXXXXX”,“”,“F”,“1001-01-01”,“”,“1111 XXXXXXX XX”,“11”,“11111”,“111111111”,“1001-01-01”,“1001-
    01-01”,“11111111”,“1001-01-01”,“”
    6,“Active”,“XXXXX”,“X”,“XXXX”,“”,“F”,“1001-01-01”,“”,“111 XXXXXXXX XX X”,“111”,“11111”,“111111111”,“1001-01-
    01”,“1001-01-01”,“11111111”,“”,“”
    7,“InActive”,“XXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 X 11XX XX”,“111”,“11111”,“1001-01-01”,“1001-01-
    01”,“11111111”,“1001-01-01”,“”
    8,“InActive”,“XXXXXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 XX 1XX XX”,“X1111”,“11111”,“111111111”,“1001-01-
    01”,“1001-01-01”,“11111111”,“1001-01-01”,“”

    Sorry the formatting is kind of clunky. I would have added a screenshot but the info is sensitive. Specifically I’m looking to remove the MANumber column. The column always has 8 digits so that’s nice, but it’s not always lined up vertically to highlight.

    Thanks so much for any ideas you have!

    P 2 Replies Last reply Oct 22, 2021, 3:56 PM Reply Quote 0
    • P
      PeterJones @Ryan McLaughlin
      last edited by Oct 22, 2021, 3:56 PM

      @Ryan-McLaughlin ,

      Notepad++'s concept of “columns” is based on character, so each character position is a separate column; so using a shift+alt for column-mode selection, the columns are based on character position, not based on your CSV/spreadsheet “column”.

      However, if your spreadsheet-columns do not contain any extra commas, then it’s possible to do a regex to delete the nth spreadsheet “column”. So in your example, you want to delete the 16th spreadsheet-column, so you could use a regex like

      FIND=^((.*?,){15})(.*?,)
      REPLACE=$1
      SEARCH MODE=regular expression

      which would change

      "ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","MANumber","ClientDischargeDate","Case Manager(EMPID)"
      5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01",""
      6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","11111111","",""
      7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111","1001-01-01",""
      8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01",""
      

      to

      "ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","ClientDischargeDate","Case Manager(EMPID)"
      5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","1001-01-01",""
      6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","",""
      7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111",""
      8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","1001-01-01",""
      
      1 Reply Last reply Reply Quote 2
      • P
        PeterJones @Ryan McLaughlin
        last edited by PeterJones Oct 22, 2021, 4:17 PM Oct 22, 2021, 4:11 PM

        PS:

        BTW: an explanation of my regex: it grabs 15 groups of any non-comma followed by a comma and stores those 15 groups into memory#1, then grabs one more group after that; the replacement keeps the memory#1 contents by using $1, but then does not include the contents of that last group. That’s how it deletes the 16th column. If you wanted to delete the Nth column instead, just replace the 15 in the regex with N-1… so the fourth column would have 3 in the regex.

        Bonus Info: even though this isn’t an excel forum: if you import the CSV instead of directly opening it with Excel, it will give you some options which allow you to define columns as text instead of as numbers, so it will keep the leading zeros. (In the Excel Help/Search bar, type Text Import Wizard and use the Get data from text to open the file, pick your CSV file, then go through the prompts; the third screen allows you to change columns from General to Text (or whatever you want).)

        R 1 Reply Last reply Oct 22, 2021, 4:18 PM Reply Quote 1
        • R
          Ryan McLaughlin @PeterJones
          last edited by Oct 22, 2021, 4:18 PM

          @PeterJones My goodness. You’re a genius. Thank you so much for this. It worked perfectly! And I will check out the solution with the Excel import. I need to start learning a lot more about regex apparently. Thanks again!

          P 1 Reply Last reply Oct 22, 2021, 4:20 PM Reply Quote 0
          • P
            PeterJones @Ryan McLaughlin
            last edited by Oct 22, 2021, 4:20 PM

            @Ryan-McLaughlin said in How to Remove a Column when Rows are Uneven:

            I need to start learning a lot more about regex apparently.

            FAQ; Where to find Regular Expression (regex) Documentation, which includes links to the online Notepad++ User Manual and many external regex resources.

            R 1 Reply Last reply Oct 22, 2021, 4:28 PM Reply Quote 0
            • R
              Ryan McLaughlin @PeterJones
              last edited by Oct 22, 2021, 4:28 PM

              @PeterJones Wonderful! Thank you! I’m so excited to dig into ++ more!

              1 Reply Last reply Reply Quote 1
              • G
                guy038
                last edited by guy038 Oct 22, 2021, 8:11 PM Oct 22, 2021, 8:06 PM

                Hello, @ryan-mclaughlin, @peterjones and All,

                We must take care whether the ranges between "........." may or not contain comma(s), themselves !

                if we consider the general case, where :

                • The field separator is the comma character

                • Some fields are surrounded by doubles quotes, containing possible commas or could be empty fields ( "")

                • Some fields do not contain any double-quote and could be empty too

                • The field to delete is the last field of each row, possibly followed with a comma

                Then, the following regex S/R is the right solution, especially when the total number of fields of a row is inferior to the number of the field to delete !!


                • SEARCH (?-s)^(?:"[^"\r\n]*",|[^",\r\n]*,){N}\K(?:"[^"\r\n]*",?|[^",\r\n]*,?)    where N is the number of fields right before the field to delete

                • REPLACE Leave EMPTY

                • Tick preferably the Wrap around option

                • Click only on the Replace All button


                Test it against this two-lines text :

                "ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Field to delete,"ClientDischargeDate","Case Manager(EMPID)",
                
                "ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Last field to delete,
                

                Note that field 12 ( "1,234.56" ) contains a decimal number with a comma separator => I surrounded this field with double-quotes to keep its meaning

                Best Regards,

                guy038

                1 Reply Last reply Reply Quote 3
                3 out of 7
                • First post
                  3/7
                  Last post
                The Community of users of the Notepad++ text editor.
                Powered by NodeBB | Contributors