Search for a wrong comma in csv file
-
Hello together,
i have a lot of very large csv files, which i will import to an Historian system.
The seperator of the csv files is a comma. here an Example:
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192
LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192
The bold text are the historized values.In some the historized value is text, which contains a comma:
LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRAUN (EU),192In this case the import will interupted and i must search the text with the wrong comma.
It is not possible to open the csv files in excel, because they are to large (> 1.000.000 lines)
Is it possible to search for lines with more than 6 comma?
I can’t use wildcard in the search like * in notepad ++Has anyone a solution for my question?
Regards
Torsten -
using regular expression and something like
(.+,){7}
might do the job.
This matches lines likea,b,c,d,e,f,g,h
but NOT lines like
a,b,c,d,e,f,,h a,,c,d,e,f,g,h
-
@ekopalypse said in Search for a wrong comma in csv file:
(.+,){7}
Hi,
i tested it, and it works fine.
Thanks for your support.Regards
Torsten -
Come to think of it, this regex
(.*?,){7}
should also work and has the advantage that it also fits lines likea,,c,d,e,f,g,h
. -
It’s probably overkill, especially since the problem is already solved, but the way I would solve this one is to use the formula provided HERE and, adapting it for the problem at hand I would search for:
Find:
(?i-s)^(?:((?:(?!(,)).)*?)(?2)){7,}(?1)$
Search mode: Regular expression -
Hello, @torsten-froese, @aekopalypse, @alan-kiborn and All,
Let’s consider a regular
CSV
file, will all records containing the same number ofcommas
charsFor instance, the @torsten-froese’s example contains exactly
6
commas, so7
fieldsRegarding this example, I suppose that it would be better to :
- Firstly, Mark lines matching the regex :
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){6}(?1)$
Because all these lines, with the template, below, are correct :
Field_1 w/o comma,Field_2 w/o comma,Field_3 w/o comma,Field_4 w/o comma,Field_5 w/o comma,Field_6 w/o comma,field_7 w/o comma
- Secondly, perform the operation
Search > Bookmark > Inverse Bookmarks
So, we could easily find out any line which do no have the same template, as above !
Indeed, your solution, Alan, works only for more than
6
comma chars, found in text, but it misses text containing less than6
commas !You need to perform an additional search :
MARK
(?-s)^(?:((?:(?!(,)).)*?)(?2)){1,5}(?1)$
In that case, all wrong cases are detected !
Test it against this text, below, where :
- The first line contains
6
commas, (OK) - The second line contains
4
commas (KO) - The third line contains
6
commas, (OK) - The fourth line contains
8
commas (KO)
LS4_Produktion.job_id_current.pv,0,2021/06/07,12:00:00.000,0,555555555,192 LS4_Produktion.job_id_current.pv,2021/06/07,12:00:00.000,555555555,192 LS4_Produktion.job_qty_current.pv,0,2021/06/07,12:00:00.000,0,3000,192 LS4_Produktion.product_group_current.pv,0,2021/06/07,12:00:00.000,0,1582 COBAN ,2IN BRA,UN (EU),192
So, we could generalize and say :
Let’s suppose a regular
CSV
file, containingn
one-char separators (,
;
:
or else ) andn+1
fields-
To search all the lines which do not contain exactly
n
separators : -
Open the Mark dialog (
Ctrl + M
)-
MARK
(?-s)^(?:((?:(?!(
SEP)).)*?)(?2)){
n}(?1)$
-
Untick all options
-
Tick the
Bookmark line
,Purge fore each search
andWrap around
options -
Select the
Regular expression
search mode -
Click on the
Mark All
button -
Close the Mark dialog (
Esc
)
-
-
Use the
Search > Bookmark > Inverse Bookmarks
option -
Use the
F2
and/orMaj + F2
shortcuts to navigate between all the remaining lines -
Replace these extra separators with an other char and/or add the missing separators if any, in order to get exactly
n
separators for all the records
Best Regards,
guy038
-
@guy038 said in Search for a wrong comma in csv file:
your solution, Alan, works only for more than 6 comma chars, found in text, but it misses text containing less than 6 commas !
Yes (but actually it is your – generic – solution I used!)
I just try to give them what they ask for:
Is it possible to search for lines with more than 6 comma?
And, truly, I think the OP is looking for the case where some program output csv-like text without regard for embedded commas in the data, so it is very likely that the less-than-6-commas case never occurs. (Certainly good to check for it, though).