Replacing in specific columns (More difficult than you'd think)

Reply to Replacing in specific columns (More difficult than you'd think) on Fri, 25 Jun 2021 17:15:01 GMT

LingEd — Fri, 25 Jun 2021 17:15:01 GMT

@PeterJones can’t believe I didn’t think of that. Thanks!

PeterJones — Fri, 25 Jun 2021 15:47:24 GMT

@LingEd said in Replacing in specific columns (More difficult than you'd think):

I altered the regex code to replace the first 5 commas instead of the first 4

Congratulations. That means you understood what was going on. Knowing that people learn from what I write, rather than just copy/pasting and moving on, is always a good feeling.

@PeterJones OK, sorry to bother, but new problem. When I import the altered files into Excel with comma as delimiter it creates too many columns when the tweets contain commas themselves. I was thinking what I could do is replace the 6th+ instances of commas (…) in a line with a very uncommon character like “ɤ” and then search/replace that character in excel after the fact. The thing is, I don’t know how to write the Regex code to replace not just the 6th instance, but the 7th, 8th, 9th, etc instances. Thanks for the help!

That’s one good idea. If I were to do it that way, step 1 would be to just replace all commas with ɤ. Step 2 would be your 5-space-to-comma replacement from above.
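For anyone who wants to script that outside Notepad++, here is a minimal Python sketch of those two steps. It assumes the 5-space-to-comma regex is just the four-group pattern with one more `(\S+)` added, which is my guess at what was altered; the function name and the sample line are made up for illustration.

```python
import re

PLACEHOLDER = "ɤ"  # the "very uncommon character" suggested in the thread

# Assumed five-field version of the four-field FIND pattern; \x20 is a space.
FIVE_SPACES = re.compile(r"^(\S+) (\S+) (\S+) (\S+) (\S+)\x20")

def protect_commas_then_split(line: str) -> str:
    # Step 1: replace every comma with the placeholder, so none are left.
    protected = line.replace(",", PLACEHOLDER)
    # Step 2: turn only the first five spaces into commas.
    return FIVE_SPACES.sub(r"\1,\2,\3,\4,\5,", protected)

# Hypothetical line with five leading fields and commas in the text.
sample = "id1 user 2021-06-25 17:15:01 -0400 Hello, world, again"
print(protect_commas_then_split(sample))
```

After this, the only commas left in a line are the five separators, so the placeholder can be swapped back to a comma once the data is in Excel.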

But since you’re trying to make valid CSV to open in Excel, there is a more direct route: CSV lets you put quotes around a field so that any commas inside are treated as part of the text, not as field separators. The catch is that any quotes already in the text would then get messed up. The way around that is to escape each quote first by changing every " to "". So here is my procedure for what I think you want with your data:

Search Mode = Regular expression (for both steps)
Step 1, to escape the quotes:
FIND = "
REPLACE = ""
Step 2, to change the spaces to commas and to put quotes around the text:
FIND = ^(\S+) (\S+) (\S+) (\S+) (\S+) (.*$)
REPLACE = $1,$2,$3,$4,$5,"$6"
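If you would rather run the same procedure as a script, the following Python sketch mirrors the two replacements. The function name and the sample line are hypothetical, and the final check with Python's `csv` module is only there to show that a CSV parser reads the result back as six fields; Excel's importer is assumed to follow the same quoting rules.

```python
import csv
import re

# Same pattern as the second FIND: five space-separated fields, then the rest.
FIVE_FIELDS = re.compile(r"^(\S+) (\S+) (\S+) (\S+) (\S+) (.*)$")

def to_csv_line(line: str) -> str:
    # First replacement: escape quotes by doubling them.
    escaped = line.replace('"', '""')
    # Second replacement: spaces to commas, and quote the free text.
    return FIVE_FIELDS.sub(r'\1,\2,\3,\4,\5,"\6"', escaped)

# Hypothetical line whose text contains both commas and quotes.
sample = 'id1 user 2021-06-25 17:15:01 -0400 He said "hi", then left'
converted = to_csv_line(sample)
print(converted)

# Parse it back: the commas and quotes stay inside the sixth field.
fields = next(csv.reader([converted]))
print(len(fields), fields[-1])
```

As with the Notepad++ version, the quote doubling runs over the whole line, so this assumes the first five fields never contain a quote character.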

LingEd — Fri, 25 Jun 2021 15:32:02 GMT

@PeterJones OK, sorry to bother, but new problem. When I import the altered files into Excel with comma as delimiter it creates too many columns when the tweets contain commas themselves. I was thinking what I could do is replace the 6th+ instances of commas (I altered the regex code to replace the first 5 commas instead of the first 4) in a line with a very uncommon character like “ɤ” and then search/replace that character in excel after the fact. The thing is, I don’t know how to write the Regex code to replace not just the 6th instance, but the 7th, 8th, 9th, etc instances. Thanks for the help!

LingEd — Fri, 25 Jun 2021 15:17:28 GMT

@PeterJones Thanks! Works like a charm

PeterJones — Fri, 25 Jun 2021 15:00:43 GMT

@LingEd ,

data:

123456786 1999-12-31 23:59:57 -0400  Three! with more spaces
123456787 1999-12-31 23:59:58 -0400  Two! with more spaces
123456788 1999-12-31 23:59:59 -0400  One! with more spaces
123456789 1900-01-01 00:00:00 -0400  Happy Y2K Bug!

FIND = ^(\S+) (\S+) (\S+) (\S+)\x20
(I ended the regex with \x20, which is equivalent to a space character, so that it is obvious there is something at the end and so that you get it when you copy/paste; if you are typing the regex, you can just use a space after the last parenthesis.)
REPLACE = $1,$2,$3,$4,
Search Mode = Regular expression

123456786,1999-12-31,23:59:57,-0400, Three! with more spaces
123456787,1999-12-31,23:59:58,-0400, Two! with more spaces
123456788,1999-12-31,23:59:59,-0400, One! with more spaces
123456789,1900-01-01,00:00:00,-0400, Happy Y2K Bug!
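
The same replacement can be sketched in Python for anyone processing many files outside Notepad++. This is only an illustration of the regex above, not a tested batch tool; the function name is hypothetical, and Python writes `\1` where the Notepad++ REPLACE field uses `$1`.

```python
import re

# Same pattern as the FIND above; MULTILINE makes ^ match at every line start.
FIRST_FOUR = re.compile(r"^(\S+) (\S+) (\S+) (\S+)\x20", flags=re.MULTILINE)

def first_four_spaces_to_commas(text: str) -> str:
    # Only the four separators matched by the pattern become commas;
    # any later spaces in the line are left alone.
    return FIRST_FOUR.sub(r"\1,\2,\3,\4,", text)

data = (
    "123456786 1999-12-31 23:59:57 -0400  Three! with more spaces\n"
    "123456789 1900-01-01 00:00:00 -0400  Happy Y2K Bug!\n"
)
print(first_four_spaces_to_commas(data))
```

Lines that do not start with four space-separated fields are not matched and pass through unchanged.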