Finding and Removing LF within a comma delimited comments field
-
Hi, I have a comma delimited file with double quotes as a text qualifier. There are five columns. The last is a “comments” field where some users entered line feeds. That’s causing fragmented lines in my output file. I can’t fix the source, so I need to fix the output. A line of text basically goes:
“John”,“Smith”,“Chicago”,“Illinois”,“Likes to bicycle and [LF]
swim.”[LF]
I tried to Find using ([^"])$\n as a Regular Expression and replace with SPACE, but for some reason that selects the entire file (then replaced with a blank space, ha! fail). I appreciate any help!! -
You have a real problem here. If I had this problem I might try replacing a non-double-quote (which I capture into group #1) followed by a
\nwith the group #1 character. Here’s how to tell Notepad++ to do it:Find what box:
([^"])\n
Replace with box:\1
Search mode: Regular expressionI’m not sure your text always follows that specification, but from your description it appears as if it might.
Actually, looking back at what you were trying, I must say you were pretty close to what I came up with.
We always like to see those asking for help here show us what they’ve already tried and failed with, so hat’s off to ya for that! :)
-
That worked perfectly! Thank you so much Alan!!
-
Hello, @john-ditzel and All,
I tried to re-visit the general problem, when a list of records, in a delimited file, is completely unformatted and may contain :
-
Line-break(s) and/or blank character(s), possibly consecutive, inserted before and/or **after any field separator symbol ( generally a comma, semi-colon or colon char )
-
Blank character(s), possibly consecutive, located at beginning or end of any field contents, right before and/or after the delimiter symbol ( generally a double or single quote char )
-
Line-break(s), possibly consecutive, located inside any field contents, between two standard characters, different from, both, the delimiter and the separator symbols
-
Additional line-break(s) and/or blank character(s), possibly consecutive, between two records
-
Unnecessary space characters char(s)( or tabulation chars, when not a delimiter ) inside field contents
So, here is a way to get a neat formatted list :
-
Let SEP be the field separator ( The
,character, in your case ) -
Let DEL be the starting AND ending character delimiter ( The
"character, in your case )
IMPORTANT : The separator character is NEVER part of any field value !
-
IF the separator is not the tabulation character, use these four generic regex S/R, below, in that order :
-
SEARCH 1
[\h\r\n]*(SEP)[\h\r\n]*|DEL\K[\h\r\n]*(?=DEL|\z) -
REPLACE 1
?1SEP:\r\n -
SEARCH 2
[^DELSEP\h\r\n]\K(\h*\R+)+\h*(?=[^DELSEP\h\r\n]) -
REPLACE 2
\x20 -
SEARCH 3
(^|SEP)DEL\K[\h\r\n]+|[\h\r\n]+(?=DEL(SEP|$)) -
REPLACE 3
Leave EMPTY -
SEARCH 4
(['-])\h+|\x20{2,}|\t+ -
REPLACE 4
?1\1:\x20
-
-
IF the separator is the tabulation character, use these four generic regex S/R, below, in that order :
-
SEARCH 1
[\x20\r\n]*(\t)[\x20\r\n]*|DEL\K[\x20\r\n]*(?=DEL|\z) -
REPLACE 1
?1\t:\r\n -
SEARCH 2
[^DEL\h\r\n]\K(\x20*\R+)+\x20*(?=[^DEL\h\r\n]) -
REPLACE 2
\x20 -
SEARCH 3
(^|\t)DEL\K[\x20\r\n]+|[\x20\r\n]+(?=DEL(\t|$)) -
REPLACE 3
Leave EMPTY -
SEARCH 4
(['-])\x20+|\x20{2,} -
REPLACE 4
?1\1:\x20
-
Practical example :
Assuming the sample list, below, containing
3records, and, really, very, very twisted !"John","Smith","Chicago","Illinois","Likes to bicycle and swim. " " Mary Elisabeth " , " Ditzel " ," Los Angeles" , "California"," Love skiing and snow- boarding" "Andrew " , " Mc Millan " , "San Francisco " , " California" , " This is a small Test "With the first regex S/R, below,
-
We delete any range of blank and/or EOL characters, located outside fields and which surrounds separators
-
We replace any range of blank and/or EOL characters; between two records, with a single line-break
SEARCH
[\h\r\n]*(,)[\h\r\n]*|"\K[\h\r\n]*(?="|\z)REPLACE
?1,:\r\nSo, the text is changed into :
"John","Smith","Chicago","Illinois","Likes to bicycle and swim. " " Mary Elisabeth "," Ditzel "," Los Angeles","California"," Love skiing and snow- boarding" "Andrew "," Mc Millan ","San Francisco "," California"," This is a small Test "Now, after running the second regex S/R, below, which replaces any range of blank and/or EOL chars, containing, at least, one EOL char, between
2non-delimiter/blank/EOL characters, with a single space character :SEARCH
[^",\h\r\n]\K(\h*\R+)+\h*(?=[^",\h\r\n])REPLACE
\x20We get the changed list, below :
"John","Smith","Chicago","Illinois","Likes to bicycle and swim. " " Mary Elisabeth "," Ditzel "," Los Angeles","California"," Love skiing and snow- boarding" "Andrew "," Mc Millan ","San Francisco "," California"," This is a small Test "Then, using this third regex S/R, which deletes any consecutive range of blank and/or EOL characters, beginning or ending any field :
SEARCH
(^|,)"\K[\h\r\n]+|[\h\r\n]+(?="(,|$))REPLACE
Leave EMPTYWe get :
"John","Smith","Chicago","Illinois","Likes to bicycle and swim." "Mary Elisabeth","Ditzel","Los Angeles","California","Love skiing and snow- boarding" "Andrew","Mc Millan","San Francisco","California","This is a small Test"From now on, we almost have our neat list, without unnecessary line-breaks ! Finally, with that fourth regex S/R, we normalize any consecutive range of blank characters to a single space character, except when right after a dash or apostrophe. In that case, the space char, wrongly added, by the second S/R above, is, then, deleted !
SEARCH
(['-])\h+|\x20{2,}|\t+REPLACE
?1\1:\x20And, here is our final and expected list :
"John","Smith","Chicago","Illinois","Likes to bicycle and swim." "Mary Elisabeth","Ditzel","Los Angeles","California","Love skiing and snow-boarding" "Andrew","Mc Millan","San Francisco","California","This is a small Test"Regards
guy038
-
Hello! It looks like you're interested in this conversation, but you don't have an account yet.
Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.
With your input, this post could be even better 💗
Register Login