Delete lines in multiple text/DAT files that contain specific characters



  • @supasillyass It worked! thank you!



  • @supasillyass It worked for files that had 1 digit in front of the text, but some of the files have 2, 3, or 4 digits, for example:

    11N90-SS9035X 00000000
    311N90-SS9035X 00000000
    6001N90-SS9035X 00000000

    Unfortunately, I am not sure what the switches do, or whether there is a different variant I need to use.

    \r\n[ ]*.N90-.*00000000$



  • @Adam-Bowsky

    How about:

    \r\n\s+\d{1,4}N90-.*\s+00000000$
    

    The \r\n matches a Windows line ending (carriage return, line feed). If your files use Unix line endings (LF) rather than Windows (CR/LF), just remove the ‘\r’.

    The \s+ means match white space at least once but get as many as possible (you said there is preceding space on each line).

    The \d{1,4} means match a digit at least once, but not more than 4 times - you said “Some of the files have 2, 3, and 4 digits”.

    The N90- is self explanatory

    The .* means match any character (.) zero or more times (*).

    The \s+ is spacing again before all the trailing 0s, which themselves are self-explanatory.

    Finally, the $ anchors the match at the end of the line.
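
The pieces described above can be tried outside the editor with a small Python sketch. It is only an approximation: Python's `re` module does not treat `\r` as part of a line ending for `$`, so the sketch swaps `$` for a lookahead, and the sample lines (including the `DTP` and `ABC` ones) are made up for illustration.

```python
import re

# Pattern from the post, adapted for Python: "$" is replaced by a lookahead
# because Python's "$" would not match in front of "\r\n".
PATTERN = re.compile(r"\r\n\s+\d{1,4}N90-.*\s+00000000(?=\r\n|\Z)")

# Hypothetical sample data with Windows (CR/LF) line endings.
sample = (
    "    10DTP-1040K 00000000\r\n"
    "    11N90-SS9035X 00000000\r\n"
    "    311N90-SS9035X 00000000\r\n"
    "    6001N90-SS9035X 00000000\r\n"
    "    20ABC-1234 00000000\r\n"
)

# Replacing each match with nothing removes the N90- line together with
# the line break that precedes it.
cleaned = PATTERN.sub("", sample)
print(repr(cleaned))
```

In this sample only the three N90- lines are removed; the lines before and after them are left alone.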



  • Using PREGGER:

    PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
    The regular expression:
    
    (?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
    
    matches as follows:
    
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \r                       '\r' (carriage return)
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      \d{1,4}                  digits (0-9) (between 1 and 4 times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      N90-                     'N90-'
    ----------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                               more times (matching the most amount
                               possible))
    ----------------------------------------------------------------------
      00000000                 '00000000'
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    
    PS VinsWorldcom@:~>
    


  • @Adam-Bowsky

    The dot marked below matches only a single character:

    \r\n[ ]*.N90-.*00000000$
            ^
    

    So change it to match a string of digits:

    \r\n[ ]*[0-9]*N90-.*00000000$
            ^^^^^^
    

    There’s also an edge case this does not match: when the very first line of the file has N90-, there is no preceding \r\n. So follow up with: ^[ ]*[0-9]*N90-.*00000000\r\n
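
A hedged Python sketch of the two-pass idea above, using invented sample lines. As an adaptation for Python's `re` module, `$` is replaced by a lookahead (Python's `$` does not match in front of `\r\n`), and `^` is used without `re.MULTILINE`, so it only matches at the very start of the text.

```python
import re

# Original attempt: the single "." allows exactly one character before N90-.
one_char = re.compile(r"\r\n[ ]*.N90-.*00000000(?=\r\n|\Z)")
# Suggested fix: "[0-9]*" allows any run of digits before N90-.
any_digits = re.compile(r"\r\n[ ]*[0-9]*N90-.*00000000(?=\r\n|\Z)")
# Follow-up for a first line, which has no preceding CR/LF.
first_line = re.compile(r"^[ ]*[0-9]*N90-.*00000000\r\n")

# Hypothetical sample: an N90- line first, then a line to keep, then a
# three-digit N90- line.
sample = (
    "1N90-AAA 00000000\r\n"
    "  10DTP-1040K 00000000\r\n"
    "  311N90-SS9035X 00000000\r\n"
)

one_char_hits = one_char.findall(sample)         # nothing matches here
after_pass_1 = any_digits.sub("", sample)        # removes the 311N90- line
after_pass_2 = first_line.sub("", after_pass_1)  # removes the first line
print(one_char_hits)
print(repr(after_pass_2))
```

The first pass cannot touch the opening line because nothing precedes it, which is why the second pattern is needed.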



  • @Michael-Vincent thank you! I believe this worked correctly. One question about “match a digit at least once”: does this include leading zeros? For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?



  • @Adam-Bowsky said:

    For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

    As written, it would not match that line. Zeros (0) are digits, so they count towards the 4 maximum ( { …, 4} ), and 00001 has five digits. You’re correct that if you had 4 leading zeros, then \d{1,5} would match it.

    I like to be as precise as possible in my RegEx so that I don’t catch anything I shouldn’t. I’d rather be cautious than aggressive when doing a bulk replace like this. You could just use \d+, which matches one or more digits in a row (similar to the \s+ we’ve been using).

    Cheers.
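
To make the difference between the quantifiers concrete, here is a minimal Python sketch that tests the three variants against the five-digit example from the question. The test string is a single made-up line, so Python's `$` (end of string) behaves the same as the end-of-line anchor here.

```python
import re

# One hypothetical line with four leading zeros, preceded by a CR/LF.
line = "\r\n    00001N90-SS9035X 00000000"

results = {}
for quantifier in (r"\d{1,4}", r"\d{1,5}", r"\d+"):
    # Same pattern as in the thread, with only the digit quantifier swapped.
    pattern = re.compile(r"\r\n\s+" + quantifier + r"N90-.*\s+00000000$")
    results[quantifier] = bool(pattern.search(line))

print(results)
```

Only `\d{1,4}` fails to match, because all five digits, zeros included, must fit between the leading spaces and N90-.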



  • @supasillyass thanks!



  • @Michael-Vincent thanks again!



  • @Michael-Vincent

    Hello,

    I have been using this process since you were kind enough to help me, and I just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that has the N90-, it is also deleting the line above it, as in the example below. This is happening on every file where N90- is present. Do you have any idea why this is happening?

      10DTP-1040K           00000000  This should not have been deleted, but was.
      10N90-SS7784X         00000000 This was deleted correctly.


  • Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,

    Personally, I would use the following regex S/R, which should work in all the cases discussed!

    I simply assume that the N90- string, with this exact case, is preceded by at least one digit!

    SEARCH (?-si)^\h*\d+N90-.*\R?

    REPLACE Leave EMPTY

    Of course, the Regular expression search mode must be selected and the Wrap around option ticked.

    Give it a try!

    I’ll give you some explanations when everything is right ;-))

    Best Regards

    guy038
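
For anyone who wants to experiment with this search/replace outside the editor, the following Python sketch is a rough approximation only. Python's `re` module has no `\h` or `\R` and does not accept a leading `(?-si)`, so the sketch substitutes `[ \t]*`, `[^\r\n]*` and an explicit line-break alternation, and relies on Python being case-sensitive by default. The sample lines are hypothetical.

```python
import re

# Approximation of (?-si)^\h*\d+N90-.*\R? for Python:
#   \h*  -> [ \t]*            (horizontal blanks)
#   .*   -> [^\r\n]*          (rest of the line, never crossing a line break)
#   \R?  -> (?:\r\n|\n|\r)?   (optional line break of any common kind)
PATTERN = re.compile(r"^[ \t]*\d+N90-[^\r\n]*(?:\r\n|\n|\r)?", re.MULTILINE)

sample = (
    "      10DTP-1040K           00000000\r\n"
    "      10N90-SS7784X         00000000\r\n"
    "      00001N90-SS9035X 00000000\r\n"
    "      20ABC-1 00000000"
)

# Each match covers one whole N90- line plus its own trailing line break,
# so the line above it is never part of the match.
cleaned = PATTERN.sub("", sample)
print(repr(cleaned))
```

Because the match starts at the beginning of the N90- line rather than at the preceding line break, a first-line N90- entry is handled by the same pattern.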

