Delete lines in multiple text/DAT files that contain specific characters

17 Posts 5 Posters 2.0k Views
  • S
    supasillyass @Adam Bowsky
    last edited by Aug 21, 2019, 9:05 PM

    @Adam-Bowsky

    1. Search
    2. Find in Files…
    3. Search Mode: Regular expression
    4. Find what: \r\n[ ]*.N90-.*00000000$ (Windows EOL)
    5. Replace with: (empty string)
    6. Set file filters and directory as appropriate
    7. Replace in Files
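For readers who want to script the same multi-file clean-up outside Notepad++, here is a minimal Python sketch of the steps above. The helper name `delete_n90_lines` and the `*.dat` filter are hypothetical, and the files are assumed to use Windows (CR/LF) line endings. Python's `re` module is not the engine Notepad++ uses, so the trailing `$` is swapped for a lookahead, because Python's `$` does not match just before `\r\n`.

```python
import re
from pathlib import Path

# Pattern from the post, with `$` replaced by a lookahead (adaptation for Python's re).
LINE_RE = re.compile(r"\r\n[ ]*.N90-.*00000000(?=\r\n|\Z)")


def delete_n90_lines(directory, file_filter="*.dat"):
    """Hypothetical helper: rewrite matching files in place and return the changed paths."""
    changed = []
    for path in sorted(Path(directory).glob(file_filter)):
        # newline="" keeps "\r\n" as-is instead of translating it to "\n".
        with open(path, newline="", encoding="utf-8") as fh:
            text = fh.read()
        cleaned = LINE_RE.sub("", text)
        if cleaned != text:
            with open(path, "w", newline="", encoding="utf-8") as fh:
                fh.write(cleaned)
            changed.append(path)
    return changed
```

As with the Replace in Files button, this overwrites files without confirmation, so it is worth trying on a copy of the directory first.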
    • A
      Alan Kilborn @Adam Bowsky
      last edited by Aug 21, 2019, 9:10 PM

      @Adam-Bowsky

      Oh, sorry I missed the multi-file aspect of your question! Must not be my day.

      • A
        Adam Bowsky @supasillyass
        last edited by Aug 21, 2019, 9:11 PM

        @supasillyass It worked! thank you!

        • A
          Adam Bowsky
          last edited by Aug 22, 2019, 4:00 PM

          @supasillyass It worked for files that had 1 digit in front of the text. Some of the files have 2, 3, or 4 digits, for example:

          11N90-SS9035X 00000000
          311N90-SS9035X 00000000
          6001N90-SS9035X 00000000

          Unfortunately, I am not sure what the switches do, or whether there is a different variant I need to use.

          \r\n[ ]*.N90-.*00000000$

          • M
            Michael Vincent
            last edited by Aug 22, 2019, 4:11 PM

            @Adam-Bowsky

            How about:

            \r\n\s+\d{1,4}N90-.*\s+00000000$
            

            The \r\n matches a Windows line ending: carriage return, then line feed. If your files use Unix (LF) line endings instead of Windows (CR/LF), just remove the ‘\r’.

            The \s+ means match whitespace at least once, taking as much as possible (you said there is preceding space on each line).

            The \d{1,4} means match a digit at least once but no more than 4 times; you said “Some of the files have 2, 3, and 4 digits”.

            The N90- is self-explanatory: it matches that literal text.

            The .* means match any character (.) zero or more times (*).

            The second \s+ is the spacing before the trailing '0's, which are themselves self-explanatory.

            Finally, the $ anchors the match at the end of the line.
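The breakdown above can be tried out in a short Python sketch. It uses the Unix (LF) form of the pattern, with the leading `\r` removed as described, and a made-up sample with a `HEADER` line and leading spaces. Python's `re` is an assumption here, not the engine Notepad++ uses, so treat it as an illustration of the pattern's parts rather than a guarantee of identical behavior.

```python
import re

# LF variant of the pattern from the post; re.MULTILINE lets $ match at each line end.
PATTERN = re.compile(r"\n\s+\d{1,4}N90-.*\s+00000000$", re.MULTILINE)

# Made-up sample: three N90- lines with 2, 3 and 4 leading digits, one unrelated line.
sample = (
    "HEADER\n"
    "   11N90-SS9035X 00000000\n"
    "   311N90-SS9035X 00000000\n"
    "   6001N90-SS9035X 00000000\n"
    "   10DTP-1040K 00000000\n"
)

cleaned = PATTERN.sub("", sample)
print(cleaned)
```

Because each match starts at the line break before the N90- line, a matching line that is the very first line of a file is not removed by this pattern.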

            • M
              Michael Vincent
              last edited by Aug 22, 2019, 4:15 PM

              Using PREGGER:

              PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
              The regular expression:
              
              (?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
              
              matches as follows:
              
              NODE                     EXPLANATION
              ----------------------------------------------------------------------
              (?-imsx:                 group, but do not capture (case-sensitive)
                                       (with ^ and $ matching normally) (with . not
                                       matching \n) (matching whitespace and #
                                       normally):
              ----------------------------------------------------------------------
                \r                       '\r' (carriage return)
              ----------------------------------------------------------------------
                \n                       '\n' (newline)
              ----------------------------------------------------------------------
                \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                         more times (matching the most amount
                                         possible))
              ----------------------------------------------------------------------
                \d{1,4}                  digits (0-9) (between 1 and 4 times
                                         (matching the most amount possible))
              ----------------------------------------------------------------------
                N90-                     'N90-'
              ----------------------------------------------------------------------
                .*                       any character except \n (0 or more times
                                         (matching the most amount possible))
              ----------------------------------------------------------------------
                \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                         more times (matching the most amount
                                         possible))
              ----------------------------------------------------------------------
                00000000                 '00000000'
              ----------------------------------------------------------------------
                $                        before an optional \n, and the end of the
                                         string
              ----------------------------------------------------------------------
              )                        end of grouping
              ----------------------------------------------------------------------
              
              PS VinsWorldcom@:~>
              
              • S
                supasillyass @Adam Bowsky
                last edited by Aug 22, 2019, 4:21 PM

                @Adam-Bowsky

                The dot marked below matches exactly one character:

                \r\n[ ]*.N90-.*00000000$
                        ^
                

                So change it to match a string of digits:

                \r\n[ ]*[0-9]*N90-.*00000000$
                        ^^^^^^
                

                There’s also an edge case this does not cover: if the very first line of a file has N90-, there is no preceding \r\n to match. So follow up with a second pass: ^[ ]*[0-9]*N90-.*00000000\r\n
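The first-line edge case can be reproduced in a small Python sketch. The sample text is made up, and both patterns are written in their LF form (`\n` instead of `\r\n`) to sidestep differences in how Python's `re` treats `$` next to `\r`. This is an assumption-laden illustration, not a copy of what Notepad++ does internally.

```python
import re

# Made-up sample where the very first line is an N90- line.
text = (
    "1N90-AAA 00000000\n"
    "  10DTP-1040K 00000000\n"
    "  22N90-BBB 00000000\n"
)

# Pass 1: needs a line break in front of the match, so line 1 survives.
pass1 = re.sub(r"\n[ ]*[0-9]*N90-.*00000000$", "", text, flags=re.MULTILINE)

# Pass 2: anchored at the start of a line and consuming the trailing line break.
pass2 = re.sub(r"^[ ]*[0-9]*N90-.*00000000\n", "", pass1, flags=re.MULTILINE)

print(repr(pass1))
print(repr(pass2))
```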

                • A
                  Adam Bowsky @Michael Vincent
                  last edited by Aug 22, 2019, 5:54 PM

                  @Michael-Vincent thank you! I believe this worked correctly. 1 question… “the match a digit at least once”… does this include preceding zeros? For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                  • M
                    Michael Vincent
                    last edited by Aug 22, 2019, 6:13 PM

                    @Adam-Bowsky said:

                    For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                    Leading zeros get no special treatment. Zeros (0) are digits, so they count toward the maximum of 4 ( { …, 4} ). You’re correct that if you had 4 leading zeros, as in 00001, you would need \d{1,5} to match it.

                    I like to be as precise as possible in my RegEx so that I don’t catch anything I shouldn’t. I’d rather be cautious than aggressive when doing a bulk replace like this. You could also just use \d+, which matches one or more consecutive digits (similar to the \s+ we’ve been using).
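A quick way to compare the three quantifiers is the Python sketch below. The helper `prefix_matches` and the sample line are hypothetical, and only the leading part of the pattern (whitespace, digits, `N90-`) is tested.

```python
import re


def prefix_matches(quantifier, line):
    """Return True if leading whitespace, digits (with the given quantifier) and 'N90-' match."""
    pattern = re.compile(r"^\s+\d" + quantifier + r"N90-")
    return pattern.search(line) is not None


line = "  00001N90-SS9035X 00000000"  # made-up line with four leading zeros
for quantifier in ("{1,4}", "{1,5}", "+"):
    print(quantifier, prefix_matches(quantifier, line))
```

With five digits in front of `N90-`, the `{1,4}` form fails because the digits must directly follow the whitespace, while `{1,5}` and `+` both match.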

                    Cheers.

                    • A
                      Adam Bowsky @supasillyass
                      last edited by Aug 22, 2019, 6:15 PM

                      @supasillyass thanks!

                      • A
                        Adam Bowsky @Michael Vincent
                        last edited by Aug 22, 2019, 7:20 PM

                        @Michael-Vincent thanks again!

                        • A
                          Adam Bowsky
                          last edited by Jan 21, 2020, 5:07 PM

                          Re: Delete lines in multiple text/DAT files that contain specific characters

                          Hello,

                          I have been using this process since you were kind enough to help me, and I just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that has the N90-, it is also deleting the line above it, as in the example below. This is happening in every file where N90- is present. Do you have any idea why this is happening?

                                10DTP-1040K           00000000  This should not have been deleted, but was.
                                10N90-SS7784X         00000000 This was deleted correctly.
                          
                            • G
                              guy038
                              last edited by guy038 Jan 23, 2020, 2:36 AM Jan 21, 2020, 9:45 PM

                              Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,

                              Personally, I would use the following regex S/R, which should work in all the cases discussed!

                              I simply assume that the N90- string, with this exact case, is preceded by at least one digit!

                              SEARCH (?-si)^\h*\d+N90-.*\R?

                              REPLACE Leave EMPTY

                              Of course, the Regular expression search mode must be selected and the Wrap around option ticked.

                              Give it a try!

                              I’ll give you some explanations when everything is right ;-))

                              Best Regards

                              guy038
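The idea behind this search can be approximated in Python, with some hedged substitutions. Python's `re` has no `\h` or `\R`, so `[ \t]` stands in for horizontal whitespace and `(?:\r\n|\n|\r)` for a line break. The `(?-si)` prefix is dropped because Python is case-sensitive and keeps `.` from matching `\n` by default. The sample text is made up from the lines quoted earlier in the thread, plus one extra line.

```python
import re

# Rough Python equivalent of (?-si)^\h*\d+N90-.*\R? (see the substitutions above).
N90_LINE = re.compile(r"^[ \t]*\d+N90-.*(?:\r\n|\n|\r)?", re.MULTILINE)

# Made-up CR/LF sample: only the middle line should be removed.
text = (
    "      10DTP-1040K           00000000\r\n"
    "      10N90-SS7784X         00000000\r\n"
    "      11DTP-2000K           00000000\r\n"
)

cleaned = N90_LINE.sub("", text)
print(repr(cleaned))
```

Anchoring at the start of the line and consuming the line break at the end means the match never reaches back into the previous line, and an N90- line at the very top of a file is handled in the same pass.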

                              The Community of users of the Notepad++ text editor.