Delete all rows with a specific content occuring in a specific column
-
Hello,
I can’t work out how to delete rows which have a specific content in a specific column. In the example below I’d like to delete all rows which have a ‘.’ in column 5, so that after the deletion I would only have the rows with ‘T’ and ‘A’ in it. Can anybody help?
Thank you
Markus
32 18 . G T 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 1/1:0,0,12,0 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 19 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 20 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 21 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 22 . A . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,10,0,0 0/0:0,12,0,0 ./.:0,0,0,0 0/0:0,13,0,0 0/0:0,6,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 23 . T . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,10,0 0/0:0,0,12,0 ./.:0,0,0,0 0/0:0,0,13,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 24 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 25 . C . 13 PASS NS=4;DP=41 GT:CATG 0/0:10,0,0,0 0/0:12,0,0,0 ./.:0,0,0,0 0/0:13,0,0,0 0/0:6,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 26 . G A 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/1:0,5,0,8 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 27 . G . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,0,10 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 28 . G A 13 PASS NS=4;DP=41 GT:CATG 0/1:0,5,0,5 0/0:0,0,0,12 ./.:0,0,0,0 0/0:0,0,0,13 0/0:0,0,0,6 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0
32 29 . T . 13 PASS NS=4;DP=41 GT:CATG 0/0:0,0,10,0 0/0:0,0,12,0 ./.:0,0,0,0 0/0:0,0,13,0 0/0:0,0,6,0 ./.:0,0,0,0 ./.:0,0,0,0 ./.:0,0,0,0 -
Perhaps you could search for something like this:
^([^ ]+ ){4}\. .*\R
and replace it with nothing?
The search begins at the start of a line (
^
) and is for 4 occurrences ({4}
) of 1 or more non-blanks ([^ ]+
) followed by a blank (\.
) followed by any other characters (.*
) up to the end of the line, including the newline character(s) (\R
).If the replace box is empty, the lines matching the search will be deleted.
-
This did not quite work but adjusting the search string to (^\d+\t\d+\t.\t\w\t…*\n) did. Thanks for that!
-
Hello, Markus-Ruhsam,
Of course, it’s a bit late, by now, but here is a shorter regex, that works perfectly well !
SEARCH
(?-s)^(?=.{10}\.).+\R
REPLACE
EMPTY
Et voilà !
Notes :
-
The first part
(?-s)
is an in-line modifier, which ensures that the dot meta-character will match only standard characters, even if you previously checked the . matches newline option -
From the beginning of line
^
, the form(?=.{10}\.)
, also called a positive look-ahead, verifies if the conditionDoes exist a DOT character, in column 11
, of the current line, is true. -
The important thing to remember about look-arounds is that, as soon as the condition is evaluated, the regex engine position is reset to the position before evaluating the look-around condition. So, in our case, to the position just before the first character of each line
-
Consequently, if this condition is true, the following part of the regex
.+
matches all the standard characters of the current line -
Finally, the
\R
syntax matches any kind of EOL characters (\r\n
) in Windows files, (\n
) in Unix files or (\r
) in Mac files -
Now, as the replacement string is empty, that means that any complete line ( with a dot character at column 11 ) will be, automatically, deleted !
Best Regards,
guy038
-