Search for a wrong comma in csv file
-
Hello together,
i have a lot of very large csv files, which i will import to an Historian system.
The seperator of the csv files is a comma. here an Example:
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
The bold text are the historized values.In some the historized value is text, which contains a comma:
LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192In this case the import will interupted and i must search the text with the wrong comma.
It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines)
Is it possible to search for lines with more than 6 comma?
I can’t use wildcard in the search like * in notepad ++Has anyone a solution for my question?
Regards
Torsten -
using regular expression and something like
(.+,){7}
might do the job.
This matches lines likea,b,c,d,e,f,g,hbut NOT lines like
a,b,c,d,e,f,,h a,,c,d,e,f,g,h -
@ekopalypse said in Search for a wrong comma in csv file:
(.+,){7}
Hi,
i tested it, and it works fine.
Thanks for your support.Regards
Torsten -
Come to think of it, this regex
(.*?,){7}should also work and has the advantage that it also fits lines likea,,c,d,e,f,g,h. -
It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for:
Find:
(?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
Search mode: Regular expression -
Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All,
Let’s consider a regular
CSVfile, will all records containing the same number ofcommascharsFor instance, the @torsten-froese’s example contains exactly
6commas, so7fieldsRegarding this example, I suppose that it would be better to :
- Firstly, Mark lines matching the regex :
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$Because all these lines, with the template, below, are correct :
Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma- Secondly, perform the operation
Search > Bookmark > Inverse Bookmarks
So, we could easily find out any line which do no have the same template, as above !
Indeed, your solution, Alan, works only for more than
6comma chars, found in text, but it misses text containing less than6commas !You need to perform an additional search :
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$In that case, all wrong cases are detected !
Test it against this text, below, where :
- The first line contains
6commas, (OK) - The second line contains
4commas (KO) - The third line contains
6commas, (OK) - The fourth line contains
8commas (KO)
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192 LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192 LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192 LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
So, we could generalize and say :
Let’s suppose a regular
CSVfile, containingnone-char separators (,;:or else ) andn+1fields-
To search all the lines which do not contain exactly
nseparators : -
Open the Mark dialog (
Ctrl + M)-
MARK
(?-s)^(?:((?:(?!(SEP)).)*?)(?2)){n}(?1)$ -
Untick all options
-
Tick the
Bookmark line,Purge fore each searchandWrap aroundoptions -
Select the
Regular expressionsearch mode -
Click on the
Mark Allbutton -
Close the Mark dialog (
Esc)
-
-
Use the
Search > Bookmark > Inverse Bookmarksoption -
Use the
F2and/orMaj + F2shortcuts to navigate between all the remaining lines -
Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly
nseparators for all the records
Best Regards,
guy038
-
@guy038 said in Search for a wrong comma in csv file:
your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !
Yes (but actually it is your – generic – solution I used!)
I just try to give them what they ask for:
Is it possible to search for lines with more than 6 comma?
And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though).
Hello! It looks like you're interested in this conversation, but you don't have an account yet.
Getting fed up of having to scroll through the same posts each visit? When you register for an account, you'll always come back to exactly where you were before, and choose to be notified of new replies (either via email, or push notification). You'll also be able to save bookmarks and upvote posts to show your appreciation to other community members.
With your input, this post could be even better 💗
Register Login