
Delete lines in multiple text/DAT files that contain specific characters

Help wanted
17 Posts 5 Posters 2.0k Views
  • Adam Bowsky
    last edited by Aug 21, 2019, 8:15 PM

    I have 730+ DAT files, and I need to delete certain lines of text from all of them. An example is below; I've shown only 3 lines here, but some files contain over 1,000 lines.

       1CONT-15507900000    00000000
       1N90-SS9035X         00000000
       1BFG-89442           00000000
    

    I need to delete the complete line that contains the text “N90-”. Please note that there is spacing preceding the text, and the spacing between the text and zeros cannot change. Also, the number preceding the text changes in each file, so the only constant is the “N90-”.

    I tried Googling it, but could not find what I needed. I heard there was a way to do it with Notepad++, but I was unsuccessful.

    • Alan Kilborn @Adam Bowsky
      posted Aug 21, 2019, 8:35 PM, last edited by Alan Kilborn Aug 21, 2019, 8:37 PM

      @Adam-Bowsky

      You could use a Search > Mark... operation to bookmark lines containing N90-. Then delete the bookmarked lines using Search > Bookmark > Remove Bookmarked Lines.

      There are other ways as well.
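Conceptually, that Mark + Remove Bookmarked Lines approach is a two-step filter. Below is a minimal Python sketch of the same idea for one file's text; it is an illustration, not Notepad++'s implementation. The function name remove_marked_lines is hypothetical, and the sketch assumes a plain, case-sensitive substring match on N90- (like Mark in Normal mode with Match case ticked).

```python
def remove_marked_lines(text: str, needle: str = "N90-") -> str:
    """Sketch of Mark + Remove Bookmarked Lines for one file's contents.

    Step 1 "bookmarks" every line containing `needle` (a plain,
    case-sensitive substring); step 2 rebuilds the text without them.
    """
    lines = text.splitlines(keepends=True)  # each line keeps its own EOL
    # Step 1: remember the indices of the lines to delete.
    marked = {i for i, line in enumerate(lines) if needle in line}
    # Step 2: keep every other line exactly as it was, spacing included.
    return "".join(line for i, line in enumerate(lines) if i not in marked)


sample = (
    "   1CONT-15507900000    00000000\r\n"
    "   1N90-SS9035X         00000000\r\n"
    "   1BFG-89442           00000000\r\n"
)
print(remove_marked_lines(sample), end="")
```

Because splitlines(keepends=True) keeps each line ending, the surviving lines keep their spacing and CRLF endings. Note that splitlines also treats a few rarer characters, such as form feeds, as line breaks.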

      • Adam Bowsky
        last edited by Aug 21, 2019, 8:45 PM

        I think that only works in a single file at a time… there is no option to do that in the Find in Files tab. I am a novice, sorry.

        • supasillyass @Adam Bowsky
          last edited by Aug 21, 2019, 9:05 PM

          @Adam-Bowsky

          1. Search
          2. Find in Files…
          3. Search Mode: Regular expression
          4. Find what: \r\n[ ]*.N90-.*00000000$ (Windows EOL)
          5. Replace with: (empty string)
          6. Set file filters and directory as appropriate
          7. Replace in Files
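Those steps can be sketched outside Notepad++ in Python, which may help show what Replace in Files does to each file. This is a sketch under stated assumptions, not a drop-in equivalent: the function name replace_in_files, the *.DAT filter and the latin-1 encoding (picked so that any byte round-trips unchanged) are assumptions. Notepad++'s $ matches just before a CRLF line break, whereas Python's $ (even with re.MULTILINE) matches only before \n, so the sketch replaces $ with the lookahead (?=\r\n|\Z).

```python
import re
from pathlib import Path

# supasillyass's pattern, with the trailing $ swapped for a lookahead on the
# CRLF line break (or end of file), because Python's $ does not match before \r.
PATTERN = re.compile(r"\r\n[ ]*.N90-.*00000000(?=\r\n|\Z)")


def replace_in_files(directory: str, file_filter: str = "*.DAT") -> int:
    """Hypothetical helper: apply the replacement to every file under
    `directory` (recursively) matching `file_filter`; return how many changed."""
    changed = 0
    for path in Path(directory).rglob(file_filter):
        # newline="" keeps the CRLF line endings exactly as they are on disk.
        with open(path, encoding="latin-1", newline="") as f:
            original = f.read()
        updated = PATTERN.sub("", original)  # "Replace with" left empty
        if updated != original:
            with open(path, "w", encoding="latin-1", newline="") as f:
                f.write(updated)
            changed += 1
    return changed
```

For example, replace_in_files(r"C:\data\dat_files") would rewrite every changed .DAT file in that folder tree and return the count; the path is a placeholder. Outside Windows, the *.DAT filter is case-sensitive.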
          • Alan Kilborn @Adam Bowsky
            last edited by Aug 21, 2019, 9:10 PM

            @Adam-Bowsky

            Oh, sorry I missed the multi-file aspect of your question! Must not be my day.

            • Adam Bowsky @supasillyass
              last edited by Aug 21, 2019, 9:11 PM

              @supasillyass It worked! Thank you!

              • Adam Bowsky
                last edited by Aug 22, 2019, 4:00 PM

                @supasillyass It worked for files that had 1 digit in front of the text, but some of the files have 2, 3, or 4 digits, e.g.:

                11N90-SS9035X 00000000
                311N90-SS9035X 00000000
                6001N90-SS9035X 00000000

                Unfortunately, I'm not sure what the switches do, or whether there is a different variant I need to use:

                \r\n[ ]*.N90-.*00000000$

                • Michael Vincent
                  last edited by Aug 22, 2019, 4:11 PM

                  @Adam-Bowsky

                  How about:

                  \r\n\s+\d{1,4}N90-.*\s+00000000$
                  

                  The \r\n matches a Windows carriage return + line feed. If your files use Unix (LF) line endings rather than Windows (CR/LF), just remove the \r.

                  The \s+ means match white space at least once but get as many as possible (you said there is preceding space on each line).

                  The \d{1,4} means match a digit at least once, but not more than 4 times - you said “Some of the files have 2, 3, and 4 digits”.

                  The N90- is self-explanatory.

                  The .* means match any character (.) zero or more times (*).

                  The \s+ is the spacing again, this time before the trailing 0s, which are themselves self-explanatory.

                  Finally, the $ anchors the match at the end of the line.
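Python's re.VERBOSE flag allows the same pattern to be written with one comment per token, mirroring the breakdown above. A minimal sketch, assuming the text has Unix (LF) line endings, so the \r is dropped as suggested; re.MULTILINE makes $ match at the end of every line, which is Notepad++'s behavior. The name n90_line and the sample lines are illustrative.

```python
import re

n90_line = re.compile(
    r"""
    \n          # the line break ending the previous line (\r dropped for LF files)
    \s+         # one or more whitespace characters: the leading spacing
    \d{1,4}     # between 1 and 4 digits
    N90-        # the literal text N90-
    .*          # any character except \n, zero or more times
    \s+         # the spacing before the trailing zeros
    00000000    # the literal trailing zeros
    $           # end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)

sample = (
    "   1CONT-15507900000    00000000\n"
    "   11N90-SS9035X        00000000\n"
    "   311N90-SS9035X       00000000\n"
    "   6001N90-SS9035X      00000000\n"
    "   1BFG-89442           00000000\n"
)
print(n90_line.sub("", sample), end="")
```

Each match starts at the line break before an N90- line and stops before that line's own break, so consecutive N90- lines are all removed and the CONT and BFG lines survive unchanged.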

                  • Michael Vincent
                    last edited by Aug 22, 2019, 4:15 PM

                    Using PREGGER:

                    PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
                    The regular expression:
                    
                    (?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
                    
                    matches as follows:
                    
                    NODE                     EXPLANATION
                    ----------------------------------------------------------------------
                    (?-imsx:                 group, but do not capture (case-sensitive)
                                             (with ^ and $ matching normally) (with . not
                                             matching \n) (matching whitespace and #
                                             normally):
                    ----------------------------------------------------------------------
                      \r                       '\r' (carriage return)
                    ----------------------------------------------------------------------
                      \n                       '\n' (newline)
                    ----------------------------------------------------------------------
                      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                               more times (matching the most amount
                                               possible))
                    ----------------------------------------------------------------------
                      \d{1,4}                  digits (0-9) (between 1 and 4 times
                                               (matching the most amount possible))
                    ----------------------------------------------------------------------
                      N90-                     'N90-'
                    ----------------------------------------------------------------------
                      .*                       any character except \n (0 or more times
                                               (matching the most amount possible))
                    ----------------------------------------------------------------------
                      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                               more times (matching the most amount
                                               possible))
                    ----------------------------------------------------------------------
                      00000000                 '00000000'
                    ----------------------------------------------------------------------
                      $                        before an optional \n, and the end of the
                                               string
                    ----------------------------------------------------------------------
                    )                        end of grouping
                    ----------------------------------------------------------------------
                    
                    PS VinsWorldcom@:~>
                    
                    • supasillyass @Adam Bowsky
                      last edited by Aug 22, 2019, 4:21 PM

                      @Adam-Bowsky

                      The dot indicated matches a single character:

                      \r\n[ ]*.N90-.*00000000$
                              ^
                      

                      So change it to match a run of digits (zero or more, because of the *):

                      \r\n[ ]*[0-9]*N90-.*00000000$
                              ^^^^^^
                      

                      There's also an edge case this doesn't match: if the first line of a file contains N90-, there is no \r\n in front of it. So follow up with a second pass using: ^[ ]*[0-9]*N90-.*00000000\r\n
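The difference between . and [0-9]*, and the first-line edge case, can be checked with a short Python sketch. It assumes CRLF text and, because Python's $ does not match before a \r, replaces the trailing $ of the first two patterns with the lookahead (?=\r\n|\Z); the variable names and sample lines are illustrative.

```python
import re

# The original pattern: "." allows exactly one character before N90-.
single_char = re.compile(r"\r\n[ ]*.N90-.*00000000(?=\r\n|\Z)")
# The corrected pattern: [0-9]* allows any run of digits (including none).
digit_run = re.compile(r"\r\n[ ]*[0-9]*N90-.*00000000(?=\r\n|\Z)")
# Follow-up pass: the first line has no \r\n in front of it, so this pattern
# starts at ^ and removes the line break that follows the line instead.
first_line = re.compile(r"^[ ]*[0-9]*N90-.*00000000\r\n", re.MULTILINE)

lines = [
    "   1N90-SS9035X         00000000",  # first line: the edge case
    "   1CONT-15507900000    00000000",
    "   2N90-SS9035X         00000000",  # one digit
    "   11N90-SS9035X        00000000",  # two digits
    "   311N90-SS9035X       00000000",  # three digits
    "   1BFG-89442           00000000",
]
text = "\r\n".join(lines) + "\r\n"

after_single = single_char.sub("", text)  # removes only the 2N90- line
after_digits = digit_run.sub("", text)    # removes every N90- line but the first
final = first_line.sub("", after_digits)  # second pass removes the first line

print(final, end="")
```

After both passes only the CONT and BFG lines remain, which is why the follow-up pass is needed when N90- can appear on line 1.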

                      • Adam Bowsky @Michael Vincent
                        last edited by Aug 22, 2019, 5:54 PM

                        @Michael-Vincent Thank you! I believe this worked correctly. One question about “match a digit at least once”: does this include leading zeros? For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                        • Michael Vincent
                          last edited by Aug 22, 2019, 6:13 PM

                          @Adam-Bowsky said:

                          For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                          There's no special treatment for leading zeros: a zero (0) is a digit like any other, so each one counts toward the maximum of 4 in {1,4}. You're correct that with 4 leading zeros (00001, five digits in total), \d{1,5} would match it.

                          I like my RegEx to be as precise as possible so it doesn't catch anything it shouldn't; I'd rather be cautious than aggressive when doing a bulk replace like this. You could instead use \d+, which matches one or more digits in a row, as many as possible (similar to the \s+ we've been using).
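A quick way to see how leading zeros interact with the quantifier is to try the three options on a five-digit example. In this Python sketch, ^\s+ stands in for the \r\n\s+ prefix of the full pattern (an assumption made so a single line can be tested); the sample line is illustrative.

```python
import re

line = "   00001N90-SS9035X      00000000"

results = {}
for quantifier in (r"\d{1,4}", r"\d{1,5}", r"\d+"):
    # ^\s+ pins the digits right after the leading spacing; each 0 is
    # simply a digit and counts toward the quantifier's maximum.
    pattern = re.compile(r"^\s+" + quantifier + r"N90-.*\s+00000000$")
    results[quantifier] = bool(pattern.search(line))
    print(quantifier, "->", results[quantifier])
```

It prints False for \d{1,4} and True for \d{1,5} and \d+: the four leading zeros plus the 1 make five digits, so the quantifier needs room for five.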

                          Cheers.

                          • Adam Bowsky @supasillyass
                            last edited by Aug 22, 2019, 6:15 PM

                            @supasillyass thanks!

                            • Adam Bowsky @Michael Vincent
                              last edited by Aug 22, 2019, 7:20 PM

                              @Michael-Vincent thanks again!

                              • Adam Bowsky
                                last edited by Jan 21, 2020, 5:07 PM

                                Re: Delete lines in multiple text/DAT files that contain specific characters

                                Hello,

                                I have been using this process since you were kind enough to help me, and just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that contains N90-, it is also deleting the line above it. For example, in the two lines below, the line above was deleted in addition to the line that I wanted to delete. This is happening on every file where N90- is present. Do you have any idea why this is happening?

                                      10DTP-1040K           00000000  This should not have been deleted, but was.
                                      10N90-SS7784X         00000000 This was deleted correctly.
                                
                                  • guy038
                                    posted Jan 21, 2020, 9:45 PM, last edited by guy038 Jan 23, 2020, 2:36 AM

                                    Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,

                                    Personally, I would use the following regex S/R (search/replace), which should work in all the cases discussed!

                                    I simply assume that the N90- string, with this exact case, is preceded by at least one digit!

                                    SEARCH (?-si)^\h*\d+N90-.*\R?

                                    REPLACE Leave EMPTY

                                    Of course, the Regular expression search mode must be selected and the Wrap around option ticked.
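For readers who want to experiment with this search/replace outside Notepad++, here is an assumed Python translation. Python's re module rejects \h and \R, so [ \t] (spaces and tabs only, narrower than Boost's \h) and (?:\r\n|\r|\n) stand in for them; \d becomes [0-9] because Python's \d also matches non-ASCII digits. (?-si) needs no equivalent, since case-sensitive matching and a dot that does not match \n are already Python's defaults. The function name delete_n90_lines and the sample text are illustrative.

```python
import re

# Assumed Python translation of guy038's Notepad++ (Boost) regex
#     (?-si)^\h*\d+N90-.*\R?
# \h -> [ \t], \d -> [0-9], \R -> (?:\r\n|\r|\n). re.MULTILINE lets ^ match
# at the start of every line, as it does in Notepad++.
N90_LINE = re.compile(r"^[ \t]*[0-9]+N90-.*(?:\r\n|\r|\n)?", re.MULTILINE)


def delete_n90_lines(text: str) -> str:
    """Remove every whole line, with its line break, that starts with
    optional spaces/tabs, then one or more digits, then N90-."""
    return N90_LINE.sub("", text)


sample = (
    "      10DTP-1040K           00000000\r\n"
    "      10N90-SS7784X         00000000\r\n"
    "      311N90-SS9035X        00000000"  # last line, no line break
)
print(repr(delete_n90_lines(sample)))
```

Because the match starts at ^ and ends with the optional line break, it removes each matching line together with its own EOL, including a matching first line or a last line without a line break, and leaves the line above untouched.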

                                    Give it a try!

                                    I’ll give you some explanations when everything is right ;-))

                                    Best Regards

                                    guy038

                                    The Community of users of the Notepad++ text editor.