CSV file first field check



  • I have to clean .csv files where the first field (comma separated)of each line must ALWAYS be 6 numerals, wrapped in quotes followed a comma. Example: “123456”, The data I generally get randomly entered commas or quotes causing a false line wrap. I have not figured out a proper search for this and have to this point, looking at line by line, eyeballing the first field of each record line through sometimes 1000s of lines trying to find the lone or few offenders, which will causing problems for the program that has to receive the .csv file as an import. Does anybody know how I can do a Find for this condition (when it is NOT “123456”, (6 random numerals wrapped by quotes, followed by a comma)?



  • @Wally-Kempf

    I would do a regular expression search mode Mark operation (Mark tab of Find window) and search for ^"\d{6}", and tick the Bookmark line option. Then I would invert the bookmarks using Search (menu) > Bookmark > Invert Bookmarked Lines. Then I’d jump between the bookmarks on lines that have problems.

    There are other ways as well.



  • Another way: This would produce a hit for the Find-result panel for every non-match of your criterion: ^(?!"\d{6}",).

    It matches on the first character, but that’s probably sufficient for the need.



  • Thanks for the quick responses Alan. I tried both of your suggestions on a file that has the offending records in it, but neither suggestion found those particular lines. I don’t know if it is permissible or possible to send you a screen shot of what I’m talking about, to see if it sparks another idea.



  • @Wally-Kempf

    Best would be both a screenshot and a small section of data. Paste the data here, highlight it and then press the </> button right above the compose box.

    Your data would then look like this (and most importantly wouldn’t be messed with by any “interpretation”:

    I'm some data
    I'm line 2
    etc
    

    Just a wild guess: Perhaps your double quotes are not the simple kind?
    Like maybe they are U+2018, U+2019, U+201C or U+201D ; most likely some combination.



  • “040339”,“0006”,“013”,“Amherst at City View”,"","","","","",“HOUSTON”,“TX”,“77060”,“17103 IMPERIAL VALLEY DR”,“227”,“75.00”,“8.000000”,“AI”,“gate code 2070”,"",“1”,"","","","","","","",“29.946850”,"-95.396846",“1.00”,"","",“174350”
    “041217”,“0001”,“001”,“PALM BEACH ESTATES “,””,"","","","",“HOUSTON”,“TX”,“77060”,“16818 CITY VIEW PL”,“227”,“76.00”,“8.000000”,“AI”,“GATE CODE 1454 SECTION 1
    GATE CODE 1453 ACROSS THE STREET
    ENTRANCE IS ON CITY VIEW OFF BENMAR”,"",“1”,"","","","","","","",“29.934018”,"-95.396125",“1.00”,"","",“174325”

    I couldn’t paste the screen shot as it wouldn’t let me and I don’t see a compose box to do the rest of what you said. But the above is line 1 is good, line 2 is the start of the first bad record, meaning that there are carriage returns breaking the record of line 2 into lines 3 and line 4. I have to remove the carriage returns so that lines 3 and 4 move back up to line 2 as one complete record. I appreciate your help and hope I’m explaining this well enough. But it’s the “junk” record lines similar to lines 3 and 4 that I need to find since that line doesn’t contain “123456”,



  • @Wally-Kempf said in CSV file first field check:

    I couldn’t paste the screen shot

    Really? You just copy your screenshot into the clipboard and hit ctrl+v while you are composing a post here. Super easy. But forget the screenshot, it is not that important.

    I really can’t draw a conclusion from what you’ve posted, because you didn’t follow directions, so I don’t know if that is actually your data, or a version of your data that has been messed with by this website.

    Paste your data here, select it here, and press this button (the one that I highlighted in yellow), right above the box you are composing in (called the “Compose” box):

    4510921d-5a1a-4df8-b577-7a15c73c2c52-image.png

    If you don’t get a black-boxed version of your data, you haven’t done it right.



  • I’m sorry Alan. I won’t take up more of your time. I use Snag-IT to copy the part of the screen and when I right click within the reply box, the paste function is not active, although the copy is and using control v won’t paste it either. Also, as I’m replying, I don’t get any of the compose option that your screen shot shows (they don’t appear anywhere on my screen while replying in this forum). Apparently I don’t have “rights” to give you the complete information. Thank you for trying.



  • @Wally-Kempf

    Hmmm. I use Snag-it as well sometimes and have never had a problem pasting image data here. You can try this with your data, too, to format it properly (you may want to zoom in to see the bracketing characters better):

    71320e90-2e88-4ed0-9590-9e743ac88244-image.png

    But really, I’ve never heard of so much difficulty creating posts with special stuff in them here.

    I won’t take up more of your time.

    Don’t worry about that. It is very unsatisfying to start something and not finish it, so…



  • "040339","0006","013","Amherst at City View","","","","","","HOUSTON","TX","77060","17103 IMPERIAL VALLEY DR","227","75.00","8.000000","AI","gate code 2070","","1","","","","","","","","29.946850","-95.396846","1.00","","","174350"
    "041217","0001","001","PALM BEACH ESTATES ","","","","","","HOUSTON","TX","77060","16818 CITY VIEW PL","227","76.00","8.000000","AI","GATE CODE 1454  SECTION 1 
    GATE CODE 1453  ACROSS THE STREET
    ENTRANCE IS ON CITY VIEW OFF BENMAR","","1","","","","","","","","29.934018","-95.396125","1.00","","","174325"
    

    8a50ed7e-50c4-458b-a7ab-185e06badb72-image.png

    Sorry- I figured out the issue. Now I’m using my local desktop. Previously I was using a remote server and apparently permissions are different. Thanks again.



  • @Wally-Kempf

    @Alan-Kilborn 's regex seems to work for me

    ed9884e5-b0cd-42d9-90ae-d8ca149b5fcd-image.png



  • Hi @Wally-Kempf

    Maybe I’m assuming too much, but if you want to get rid of those false line wraps, extend @Alan-Kilborn 's regex as follows:

    Search:		(?s)(\R)(?!"\d{6}",)
    Replace:	\x20
    

    which produces this two-lines outcome:

    "040339","0006","013","Amherst at City View","","","","","","HOUSTON","TX","77060","17103 IMPERIAL VALLEY DR","227","75.00","8.000000","AI","gate code 2070","","1","","","","","","","","29.946850","-95.396846","1.00","","","174350"
    "041217","0001","001","PALM BEACH ESTATES ","","","","","","HOUSTON","TX","77060","16818 CITY VIEW PL","227","76.00","8.000000","AI","GATE CODE 1454  SECTION 1  GATE CODE 1453  ACROSS THE STREET ENTRANCE IS ON CITY VIEW OFF BENMAR","","1","","","","","","","","29.934018","-95.396125","1.00","","","174325"
    

    Best Regards.


Log in to reply