Delete all rows with a specific content occuring in a specific column



  • Hello,

    I can’t work out how to delete rows which have a specific content in a specific column. In the example below I’d like to delete all rows which have a ‘.’ in column 5, so that after the deletion I would only have the rows with ‘T’ and ‘A’ in it. Can anybody help?

    Thank you

    Markus

    32 18 . G T 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 1/1:0,0,12,0 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 19 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 20 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 21 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 22 . A . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,10,0,0 0/0:0,12,0,0 ./.:0,0,0,0 0/0:0,13,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 23 . T . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,10,0 0/0:0,0,12,0 ./.:0,0,0,0 0/0:0,0,13,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 24 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 25 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 26 . G A 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/1:0,5,0,8 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 27 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 28 . G A 13 PASS NS=4;DP=41 GT:CATG 0/1:0,5,0,5 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
    32 29 . T . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,10,0 0/0:0,0,12,0 ./.:0,0,0,0 0/0:0,0,13,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0



  • @Markus-Ruhsam

    Perhaps you could search for something like this:

    ^([^ ]+ ){4}\. .*\R 
    

    and replace it with nothing?

    The search begins at the start of a line (^) and is for 4 occurrences ({4}) of 1 or more non-blanks ([^ ]+) followed by a blank () followed by a period and space (\.) followed by any other characters (.*) up to the end of the line, including the newline character(s) (\R).

    If the replace box is empty, the lines matching the search will be deleted.



  • This did not quite work but adjusting the search string to (^\d+\t\d+\t.\t\w\t…*\n) did. Thanks for that!



  • Hello, Markus-Ruhsam,

    Of course, it’s a bit late, by now, but here is a shorter regex, that works perfectly well !

    SEARCH (?-s)^(?=.{10}\.).+\R

    REPLACE EMPTY

    Et voilà !

    Notes :

    • The first part (?-s) is an in-line modifier, which ensures that the dot meta-character will match only standard characters, even if you previously checked the . matches newline option

    • From the beginning of line ^, the form (?=.{10}\.), also called a positive look-ahead, verifies if the condition Does exist a DOT character, in column 11, of the current line, is true.

    • The important thing to remember about look-arounds is that, as soon as the condition is evaluated, the regex engine position is reset to the position before evaluating the look-around condition. So, in our case, to the position just before the first character of each line

    • Consequently, if this condition is true, the following part of the regex .+ matches all the standard characters of the current line

    • Finally, the \R syntax matches any kind of EOL characters ( \r\n ) in Windows files, ( \n ) in Unix files or ( \r ) in Mac files

    • Now, as the replacement string is empty, that means that any complete line ( with a dot character at column 11 ) will be, automatically, deleted !

    Best Regards,

    guy038


Log in to reply