Delete lines in multiple text/DAT files that contain specific characters
-
Oh, sorry, I missed the multi-file aspect of your question! Must not be my day.
-
@supasillyass It worked! Thank you!
-
@supasillyass It worked for files that had 1 digit in front of the text, but some of the files have 2, 3, or 4 digits, for example:
11N90-SS9035X 00000000
311N90-SS9035X 00000000
6001N90-SS9035X 00000000
Unfortunately, I am not sure what the switches do, or whether I need a different variant of this expression:
\r\n[ ]*.N90-.*00000000$
-
How about:
\r\n\s+\d{1,4}N90-.*\s+00000000$
The \r\n matches a Windows carriage return/line feed pair. If you're not using Windows (CR/LF) line endings but Unix (LF) ones, just remove the \r.
The \s+ means match whitespace at least once, taking as much as possible (you said there is leading space on each line).
The \d{1,4} means match a digit at least once, but not more than 4 times; you said "Some of the files have 2, 3, and 4 digits".
The N90- is self-explanatory.
The .* means match any character (.) zero or more times (*).
The \s+ is the spacing again, before the trailing 0s, which are themselves self-explanatory.
Finally, the $ anchors the match at the end of the line.
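To sanity-check the expression outside the editor, here is a minimal Python sketch of the LF (Unix) variant mentioned above, i.e. with the \r removed. It assumes Python's re module behaves closely enough to the editor's regex engine for this pattern; one known difference is that Python's $ (with re.MULTILINE) only stops before \n, not before \r\n, which is why the LF variant is used. The sample lines come from the question; the leading space on each line and the HEADER line are assumptions for illustration.

```python
import re

# LF variant of the suggested pattern: the leading "\r" has been dropped.
# re.MULTILINE makes "$" match just before each "\n".
PATTERN = re.compile(r"\n\s+\d{1,4}N90-.*\s+00000000$", re.MULTILINE)

# Sample lines from the question; the leading space and HEADER line are hypothetical.
text = (
    "HEADER\n"
    " 11N90-SS9035X 00000000\n"
    " 311N90-SS9035X 00000000\n"
    " 6001N90-SS9035X 00000000\n"
    " 10DTP-1040K 00000000\n"
)

# Each match starts at the line break *before* an N90- line and stops at the
# end of that line, so replacing it with "" removes the whole line.
result = PATTERN.sub("", text)
print(result)
```

Note that the match consumes the line break in front of the target line rather than the one after it, which is also why a target on the very first line of a file needs separate handling.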
-
Using PREGGER:
PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
The regular expression:
(?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \r                       '\r' (carriage return)
----------------------------------------------------------------------
  \n                       '\n' (newline)
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \d{1,4}                  digits (0-9) (between 1 and 4 times
                           (matching the most amount possible))
----------------------------------------------------------------------
  N90-                     'N90-'
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  00000000                 '00000000'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
-
The indicated dot matches exactly one character:
\r\n[ ]*.N90-.*00000000$
        ^
So change it to match any number of digits:
\r\n[ ]*[0-9]*N90-.*00000000$
        ^^^^^^
There's also an edge case this doesn't match: when the very first line of the file contains N90-, there is no \r\n in front of it. So follow up with a second replacement using:
^[ ]*[0-9]*N90-.*00000000\r\n
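The difference between the single-character dot and the digit run, plus the first-line edge case, can be checked with a small Python sketch. It uses the LF variants of the expressions (the \r dropped) because Python's re treats \r\n line endings differently from the editor; the sample lines (AAA, BBB, CCC, KEEP) are hypothetical.

```python
import re

# LF variants of the expressions above ("\r" dropped for Unix line endings).
ORIGINAL = re.compile(r"\n[ ]*.N90-.*00000000$", re.MULTILINE)    # "." = exactly one character
FIXED = re.compile(r"\n[ ]*[0-9]*N90-.*00000000$", re.MULTILINE)  # any number of digits
# "^" matches at every line start here, so this also catches an N90- line at the
# very top of the file, which has no line break in front of it.
FIRST_LINE = re.compile(r"^[ ]*[0-9]*N90-.*00000000\n", re.MULTILINE)

# Hypothetical sample: an N90- line on the first line, a line to keep,
# then N90- lines with 1 and 3 leading digits.
text = (
    "12N90-AAA 00000000\n"
    "KEEP 00000000\n"
    " 1N90-BBB 00000000\n"
    " 311N90-CCC 00000000\n"
)

# The original pattern only removes the line with a single digit before N90-.
original_result = ORIGINAL.sub("", text)
print(original_result)

# First pass: N90- lines that follow a line break.
# Second pass: the N90- line sitting at the very start of the file.
cleaned = FIRST_LINE.sub("", FIXED.sub("", text))
print(cleaned)
```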
-
@Michael-Vincent Thank you! I believe this worked correctly. One question about "match a digit at least once": does this include leading zeros? For example, what if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?
-
@Adam-Bowsky said:
For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?
Leading zeros get no special treatment: zeros (0) are digits, so they count towards the maximum of 4 ({…,4}). You're correct that with 4 leading zeros (5 digits in total, as in 00001), you would need
\d{1,5}
to match it. I like to make my regexes as precise as possible so they don't catch anything they shouldn't; I'd rather be cautious than aggressive when doing a bulk replace like this. Alternatively, you could just use
\d+
which matches one or more digits in a row, as many as there are (similar to the \s+ we've been using).
Cheers.
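A quick way to see how leading zeros count towards the limit is to try the three quantifiers on the hypothetical 00001N90- line from the question. This Python sketch anchors each pattern with ^ at the start of the line instead of a preceding \r\n, purely to keep the test to a single line; the leading space is assumed.

```python
import re

# Hypothetical sample line with four leading zeros (five digits before N90-).
line = " 00001N90-SS9035X 00000000"

patterns = {
    r"\d{1,4}": r"^\s+\d{1,4}N90-.*\s+00000000$",  # at most 4 digits
    r"\d{1,5}": r"^\s+\d{1,5}N90-.*\s+00000000$",  # at most 5 digits
    r"\d+": r"^\s+\d+N90-.*\s+00000000$",          # 1 or more digits, no upper limit
}

matches = {name: bool(re.search(p, line)) for name, p in patterns.items()}
for name, matched in matches.items():
    print(f"{name:8} matches: {matched}")
```

\d{1,4} fails because \s+ must be followed directly by the digits, and no run of at most 4 digits starting there is followed by N90-; \d{1,5} and \d+ both succeed.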
-
@supasillyass thanks!
-
@Michael-Vincent thanks again!
-
Re: Delete lines in multiple text/DAT files that contain specific characters
Hello,
I have been using this process since you were kind enough to help me, and just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that contains N90-, it is also deleting the line above it. In the example below, the first line was deleted along with the line I actually wanted to delete. This happens in every file where N90- is present. Do you have any idea why?
10DTP-1040K 00000000      (this should not have been deleted, but was)
10N90-SS7784X 00000000    (this was deleted correctly)
-
Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,
Personally, I would use the following regex search/replace, which should work in all the cases discussed!
I simply assume that the N90- string, with this exact case, is preceded by at least one digit!
SEARCH
(?-si)^\h*\d+N90-.*\R?
REPLACE
Leave EMPTY
Of course, the Regular expression search mode must be selected and the Wrap around option ticked.
Give it a try!
I'll give you some explanations once everything works ;-))
Best Regards
guy038
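For anyone who wants to test guy038's SEARCH expression outside the editor, here is a rough Python translation. Python's re module has no \h or \R and no global (?-si) switch, so those parts are approximated as noted in the comments; treat it as a sketch, not an exact equivalent of the editor's regex engine. The sample text reuses the example from the follow-up question, with Windows (CRLF) line endings and a hypothetical END line.

```python
import re

# Python approximation of:  (?-si)^\h*\d+N90-.*\R?
#   (?-si) -> Python's defaults are already case-sensitive with "." not matching "\n"
#   \h     -> approximated by [ \t] (horizontal whitespace)
#   .*     -> [^\r\n]* so the match stops before either kind of line ending
#   \R?    -> optional (?:\r\n|\n|\r), i.e. the matched line's own line break
PATTERN = re.compile(r"^[ \t]*\d+N90-[^\r\n]*(?:\r\n|\n|\r)?", re.MULTILINE)

# Example from the follow-up question, CRLF line endings; END line is hypothetical.
text = "10DTP-1040K 00000000\r\n10N90-SS7784X 00000000\r\nEND 00000000\r\n"

# REPLACE is left empty, so matches are simply deleted.
result = PATTERN.sub("", text)
print(repr(result))

# Because the trailing line break is optional, a last line without one is removed too.
last_line = PATTERN.sub("", "KEEP 00000000\n 6001N90-SS9035X 00000000")
print(repr(last_line))
```

Since this pattern starts at the beginning of the N90- line and consumes that line's own line break, the line above is left untouched.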