How to Remove a Column when Rows are Uneven
-
Hi Everyone!
I’m new to NPP (but I love it already). Is there a way to remove a column when the rows are uneven? I’ve tried the shift + alt options but because the row length is inconsistent, it doesn’t work to select only the column I want removed. The other issue is that if I open the file in Excel, I lose my leading zeros for some numbers and I need the leading zeros. (I’m also trying to find a solution to this but it seems like the data Excel loads doesn’t have any record of the leading zeros. I couldn’t find an answer to this in the forums so I apologize if this is a dup question.
Here’s an example of the data, but I’m working with thousands of lines so I can’t manually remove it effectively.
“ClientNumber”,“ClientStatus”,“FirstName”,“MiddleName”,“LastName”,“SSN”,“Gender”,“DOB”,“Email”,“AddressLine1”,
“AddressLine2”,“ZipCode”,“PhoneNo”,“ApprovedDate”,“DateOfAssessment”,“MANumber”,“ClientDischargeDate”,“Case Manager(EMPID)”
5,“InActive”,“XXXX”,“”,“XXXXXX”,“”,“F”,“1001-01-01”,“”,“1111 XXXXXXX XX”,“11”,“11111”,“111111111”,“1001-01-01”,“1001-
01-01”,“11111111”,“1001-01-01”,“”
6,“Active”,“XXXXX”,“X”,“XXXX”,“”,“F”,“1001-01-01”,“”,“111 XXXXXXXX XX X”,“111”,“11111”,“111111111”,“1001-01-
01”,“1001-01-01”,“11111111”,“”,“”
7,“InActive”,“XXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 X 11XX XX”,“111”,“11111”,“1001-01-01”,“1001-01-
01”,“11111111”,“1001-01-01”,“”
8,“InActive”,“XXXXXXX”,“X”,“XXXXXXX”,“”,“M”,“1001-01-01”,“”,“1111 XX 1XX XX”,“X1111”,“11111”,“111111111”,“1001-01-
01”,“1001-01-01”,“11111111”,“1001-01-01”,“”Sorry the formatting is kind of clunky. I would have added a screenshot but the info is sensitive. Specifically I’m looking to remove the MANumber column. The column always has 8 digits so that’s nice, but it’s not always lined up vertically to highlight.
Thanks so much for any ideas you have!
-
Notepad++'s concept of “columns” is based on character, so each character position is a separate column; so using a shift+alt for column-mode selection, the columns are based on character position, not based on your CSV/spreadsheet “column”.
However, if your spreadsheet-columns do not contain any extra commas, then it’s possible to do a regex to delete the nth spreadsheet “column”. So in your example, you want to delete the 16th spreadsheet-column, so you could use a regex like
FIND=
^((.*?,){15})(.*?,)
REPLACE=$1
SEARCH MODE=regular expressionwhich would change
"ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","MANumber","ClientDischargeDate","Case Manager(EMPID)" 5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01","" 6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","11111111","","" 7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111","1001-01-01","" 8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","11111111","1001-01-01",""
to
"ClientNumber","ClientStatus","FirstName","MiddleName","LastName","SSN","Gender","DOB","Email","AddressLine1","AddressLine2","ZipCode","PhoneNo","ApprovedDate","DateOfAssessment","ClientDischargeDate","Case Manager(EMPID)" 5,"InActive","XXXX","","XXXXXX","","F","1001-01-01","","1111 XXXXXXX XX","11","11111","111111111","1001-01-01","1001-01-01","1001-01-01","" 6,"Active","XXXXX","X","XXXX","","F","1001-01-01","","111 XXXXXXXX XX X","111","11111","111111111","1001-01-01","1001-01-01","","" 7,"InActive","XXXX","X","XXXXXXX","","M","1001-01-01","","1111 X 11XX XX","111","11111","1001-01-01","1001-01-01","11111111","" 8,"InActive","XXXXXXX","X","XXXXXXX","","M","1001-01-01","","1111 XX 1XX XX","X1111","11111","111111111","1001-01-01","1001-01-01","1001-01-01",""
-
PS:
BTW: an explanation of my regex: it grabs 15 groups of any non-comma followed by a comma and stores those 15 groups into memory#1, then grabs one more group after that; the replacement keeps the memory#1 contents by using
$1
, but then does not include the contents of that last group. That’s how it deletes the 16th column. If you wanted to delete the Nth column instead, just replace the 15 in the regex with N-1… so the fourth column would have 3 in the regex.Bonus Info: even though this isn’t an excel forum: if you import the CSV instead of directly opening it with Excel, it will give you some options which allow you to define columns as text instead of as numbers, so it will keep the leading zeros. (In the Excel Help/Search bar, type
Text Import Wizard
and use theGet data from text
to open the file, pick your CSV file, then go through the prompts; the third screen allows you to change columns from General to Text (or whatever you want).) -
@PeterJones My goodness. You’re a genius. Thank you so much for this. It worked perfectly! And I will check out the solution with the Excel import. I need to start learning a lot more about regex apparently. Thanks again!
-
@Ryan-McLaughlin said in How to Remove a Column when Rows are Uneven:
I need to start learning a lot more about regex apparently.
FAQ; Where to find Regular Expression (regex) Documentation, which includes links to the online Notepad++ User Manual and many external regex resources.
-
@PeterJones Wonderful! Thank you! I’m so excited to dig into ++ more!
-
Hello, @ryan-mclaughlin, @peterjones and All,
We must take care whether the ranges between
"........."
may or not contain comma(s), themselves !if we consider the general case, where :
-
The field separator is the comma character
-
Some fields are surrounded by doubles quotes, containing possible commas or could be empty fields (
""
) -
Some fields do not contain any double-quote and could be empty too
-
The field to delete is the last field of each row, possibly followed with a comma
Then, the following regex S/R is the right solution, especially when the total number of fields of a row is inferior to the number of the field to delete !!
-
SEARCH
(?-s)^(?:"[^"\r\n]*",|[^",\r\n]*,){
N}\K(?:"[^"\r\n]*",?|[^",\r\n]*,?)
where N is the number of fields right before the field to delete -
REPLACE
Leave EMPTY
-
Tick preferably the
Wrap around
option -
Click only on the
Replace All
button
Test it against this two-lines text :
"ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Field to delete,"ClientDischargeDate","Case Manager(EMPID)", "ClientNumber","","First,Name",123456789,"Last,Name",2021-12-31,"Gender","","Email","AddressLine1",,"1,234.56","PhoneNo","ApprovedDate",01-34-56-78,Last field to delete,
Note that field
12
("1,234.56"
) contains a decimal number with a comma separator => I surrounded this field with double-quotes to keep its meaningBest Regards,
guy038
-