regex to conditionally delete unnecessary lines in .csv files?



  • Can I use regex expression and replace function to delete lines that have 0 in the fourth data field of a .csv file?

    Before
    20180829014727,40.393018,-117.974365,14.54421,163.1
    20180829014740,40.590137,-117.952637,12.82776,162.3
    20180829014754,40.187256,-117.505029,13.20208,162.5
    20180829014832,40.372559,-117.538574,41.76501,162.5
    20180829014836,40.584375,-117.960938,0,162.8
    20180829014840,40.152100,-117.909180,0,162.7
    20180829014845,40.346338,-117.683301,0,162.7

    After
    20180829014727,40.393018,-117.974365,14.54421,163.1
    20180829014740,40.590137,-117.952637,12.82776,162.3
    20180829014754,40.187256,-117.505029,13.20208,162.5
    20180829014832,40.372559,-117.538574,41.76501,162.5



  • @Brandon-Tibbitts
    Certainly and quite simply. The following regex will filter out those lines.
    Find What:^([^,]+,){3}0.+\R*
    Replace With:empty field <—nothing in this field.

    Have search mode as “regular expression” and wraparound ticked.
    As a brief explanation:
    [^,] means any character that is NOT a comma
    ([^,]+,) means any number of non-commas followed by a comma.
    {3} means three sets of the expression preceding (inside brackets).
    0.+\R* means a zero followed by as many characters as possible followed by the carriage return line if it exists (this caters for the last line).

    As the replace with field is empty is just removes that line. When complete check the very last line as it will likely be blank, so you may need to remove that.

    Terry



  • @Brandon-Tibbitts
    I just realised that although my regex works well on your example there is an instance where it won’t. If the 4th field contains something like 0.76 then my regex will still pick it. The correct regex is:
    ^([^,]+,){3}0,.+\R*
    Note the addition of a comma after the zero.

    Sorry about that.

    Terry



  • Thank you Terry,

    That helps me out a ton.

    Is there also a way to do arithmetic on that same field(after the zero values/lines removed), so for example if I wanted to add to or multiply the fourth field value in every line across a file

    I suspect the first part of the regex would be the same ^([^,]+,){3}0,. but rather with a filter to recognize any value in that field instead of the 0, then a replace function similar but to replace it with a modified value

    Or should I post a new topic to cover that one separately?

    Brandon



  • @Brandon-Tibbitts

    You can’t do arithmetic in this (regex) manner. You’re going to need a programming language for that, or perhaps, since you appear to have .csv files, pulling the data into Excel to do the math would work. Good luck.


Log in to reply