    Delete lines in multiple text/DAT files that contain specific characters

    • Alan KilbornA
      Alan Kilborn @Adam Bowsky
      last edited by Alan Kilborn

      @Adam-Bowsky

      You could use a Search > Mark... operation to bookmark lines containing N90-. Then delete the bookmarked lines using Search > Bookmark > Remove Bookmarked Lines.

      There are other ways as well.

      • Adam BowskyA
        Adam Bowsky
        last edited by

        I think that only works in a single file at a time… there is no option to do that in the Find in Files tab. I am a novice, sorry.

        • supasillyassS
          supasillyass @Adam Bowsky
          last edited by

          @Adam-Bowsky

          1. Search
          2. Find in Files…
          3. Search Mode: Regular expression
          4. Find what: \r\n[ ]*.N90-.*00000000$ (Windows EOL)
          5. Replace with: (empty string)
          6. Set file filters and directory as appropriate
          7. Replace in Files
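For anyone who would rather script this than click through the dialog, here is a minimal Python sketch of the same idea. It is not anything Notepad++ itself runs. The helper names `strip_n90_lines` and `replace_in_files` and the `*.DAT` filter are assumptions made up for illustration. One adaptation: Python's `$` only matches before `\n`, not before `\r\n`, so the sketch substitutes the lookahead `(?=\r?$)` for the plain `$` used in Notepad++.

```python
import re
from pathlib import Path

# Pattern from step 4, assuming Windows (CRLF) line endings.
# The trailing `$` is replaced by a lookahead because Python's `$`
# does not match in front of `\r\n`.
PATTERN = re.compile(r"\r\n[ ]*.N90-.*00000000(?=\r?$)", re.MULTILINE)


def strip_n90_lines(text: str) -> str:
    """Remove each matching line together with the line break before it."""
    return PATTERN.sub("", text)


def replace_in_files(directory: str, file_filter: str = "*.DAT") -> None:
    """Hypothetical helper: rewrite every file in `directory` matching the filter."""
    for path in Path(directory).glob(file_filter):
        # newline="" keeps \r\n intact instead of translating it to \n
        with open(path, "r", newline="") as handle:
            original = handle.read()
        cleaned = strip_n90_lines(original)
        if cleaned != original:
            with open(path, "w", newline="") as handle:
                handle.write(cleaned)


sample = "  1ABC-1 00000000\r\n  1N90-SS9035X 00000000\r\n  2DEF-2 00000000\r\n"
print(repr(strip_n90_lines(sample)))
```

As with Replace in Files, the files are rewritten in place, so it is worth trying this on a copy of the folder first.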
          • Alan KilbornA
            Alan Kilborn @Adam Bowsky
            last edited by

            @Adam-Bowsky

            Oh, sorry I missed the multi-file aspect of your question! Must not be my day.

            • Adam BowskyA
              Adam Bowsky @supasillyass
              last edited by

              @supasillyass It worked! thank you!

              • Adam BowskyA
                Adam Bowsky
                last edited by

                @supasillyass It worked for files that had 1 digit in front of the text. Some of the files have 2, 3, or 4 digits, for example:

                11N90-SS9035X 00000000
                311N90-SS9035X 00000000
                6001N90-SS9035X 00000000

                Unfortunately, I am not sure what the switches do, or whether there is a different variant I need to use.

                \r\n[ ]*.N90-.*00000000$

                • Michael VincentM
                  Michael Vincent
                  last edited by

                  @Adam-Bowsky

                  How about:

                  \r\n\s+\d{1,4}N90-.*\s+00000000$
                  

                  The \r\n matches a Windows line ending (carriage return, line feed). If your files use Unix (LF) line endings rather than Windows (CR/LF), just remove the ‘\r’.

                  The \s+ means match white space at least once but get as many as possible (you said there is preceding space on each line).

                  The \d{1,4} means match a digit at least once, but not more than 4 times - you said “Some of the files have 2, 3, and 4 digits”.

                  The N90- is self-explanatory.

                  The .* means match any character (.) zero or more times (*).

                  The \s+ is spacing again before all the trailing 0s, which themselves are self-explanatory.

                  Finally, the $ anchors the match at the end of the line.
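To see the pieces working together outside Notepad++, here is a small Python sketch run against made-up sample lines (the `HEADER` and `DTP` lines are invented for illustration). It assumes CRLF line endings, and because Python's `$` does not match in front of `\r\n`, the final `$` is swapped for the lookahead `(?=\r?$)`.

```python
import re

# The pattern explained above, with `$` adapted for Python (see lead-in).
PATTERN = re.compile(r"\r\n\s+\d{1,4}N90-.*\s+00000000(?=\r?$)", re.MULTILINE)

# Invented sample: one to four digits in front of N90-, plus lines to keep.
sample = (
    "HEADER\r\n"
    "  1N90-SS9035X 00000000\r\n"
    "  11N90-SS9035X 00000000\r\n"
    "  311N90-SS9035X 00000000\r\n"
    "  6001N90-SS9035X 00000000\r\n"
    "  10DTP-1040K 00000000\r\n"
)

# Each match is "line break + N90 line", so removing it leaves no empty line.
result = PATTERN.sub("", sample)
print(repr(result))  # 'HEADER\r\n  10DTP-1040K 00000000\r\n'
```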

                  • Michael VincentM
                    Michael Vincent
                    last edited by

                    Using PREGGER:

                    PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
                    The regular expression:
                    
                    (?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
                    
                    matches as follows:
                    
                    NODE                     EXPLANATION
                    ----------------------------------------------------------------------
                    (?-imsx:                 group, but do not capture (case-sensitive)
                                             (with ^ and $ matching normally) (with . not
                                             matching \n) (matching whitespace and #
                                             normally):
                    ----------------------------------------------------------------------
                      \r                       '\r' (carriage return)
                    ----------------------------------------------------------------------
                      \n                       '\n' (newline)
                    ----------------------------------------------------------------------
                      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                               more times (matching the most amount
                                               possible))
                    ----------------------------------------------------------------------
                      \d{1,4}                  digits (0-9) (between 1 and 4 times
                                               (matching the most amount possible))
                    ----------------------------------------------------------------------
                      N90-                     'N90-'
                    ----------------------------------------------------------------------
                      .*                       any character except \n (0 or more times
                                               (matching the most amount possible))
                    ----------------------------------------------------------------------
                      \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                               more times (matching the most amount
                                               possible))
                    ----------------------------------------------------------------------
                      00000000                 '00000000'
                    ----------------------------------------------------------------------
                      $                        before an optional \n, and the end of the
                                               string
                    ----------------------------------------------------------------------
                    )                        end of grouping
                    ----------------------------------------------------------------------
                    
                    PS VinsWorldcom@:~>
                    
                    • supasillyassS
                      supasillyass @Adam Bowsky
                      last edited by

                      @Adam-Bowsky

                      The dot marked below matches only a single character:

                      \r\n[ ]*.N90-.*00000000$
                              ^
                      

                      So change it to match a string of digits:

                      \r\n[ ]*[0-9]*N90-.*00000000$
                              ^^^^^^
                      

                      There’s also an edge case that this does not match: if the very first line of a file contains N90-, there is no preceding \r\n, so follow up with: ^[ ]*[0-9]*N90-.*00000000\r\n
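The first-line edge case is easy to reproduce in a short Python sketch. The sample text is invented, CRLF line endings are assumed, and the trailing `$` of the main pattern is replaced by `(?=\r?$)` because Python's `$` does not match in front of `\r\n`. Note also that without the multiline flag, Python's `^` matches only at the very start of the text, which is exactly the case being handled here.

```python
import re

# Main pattern: needs a line break *before* the N90 line.
MAIN = re.compile(r"\r\n[ ]*[0-9]*N90-.*00000000(?=\r?$)", re.MULTILINE)
# Follow-up pattern: catches an N90 line at the very start of the text.
FIRST_LINE = re.compile(r"^[ ]*[0-9]*N90-.*00000000\r\n")

sample = (
    "  11N90-SS9035X 00000000\r\n"
    "  10DTP-1040K 00000000\r\n"
    "  311N90-SS9035X 00000000\r\n"
)

after_main = MAIN.sub("", sample)
print(repr(after_main))  # the first N90 line is still there

after_both = FIRST_LINE.sub("", after_main)
print(repr(after_both))  # only the DTP line remains
```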

                      • Adam BowskyA
                        Adam Bowsky @Michael Vincent
                        last edited by

                        @Michael-Vincent thank you! I believe this worked correctly. One question about “match a digit at least once”: does this include preceding zeros? For example, what if the line had looked like this: 00001N90-SS9035X? Would I change \d{1,4} to \d{1,5}?

                        • Michael VincentM
                          Michael Vincent
                          last edited by

                          @Adam-Bowsky said:

                          For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                          \d{1,4} makes no special allowance for preceding zeros. Zeros (0) are digits, so they count towards the maximum of 4 ( { …, 4} ). You’re correct that if you had 4 leading zeros (5 digits in total, as in 00001N90-), then \d{1,5} would match it.

                          I like to be as precise as possible in my RegEx so that it does not catch anything it shouldn’t. I’d rather be cautious than aggressive when doing a bulk replace like this. Alternatively, you could just use \d+, which matches one or more digits in a row (similar to the \s+ we’ve been using).
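A quick way to compare the three quantifiers is to try them on a few prefixes. This is only a Python sketch with invented test strings; the leading `^\s+` stands in for the `\r\n\s+` part of the full expression.

```python
import re

# Invented prefixes: 1 digit, 4 digits, and 5 digits with leading zeros.
candidates = ["1N90-", "6001N90-", "00001N90-"]

results = {}
for quantifier in (r"\d{1,4}", r"\d{1,5}", r"\d+"):
    pattern = re.compile(r"^\s+" + quantifier + r"N90-")
    # Two leading spaces mimic the indentation in the data files.
    results[quantifier] = [c for c in candidates if pattern.search("  " + c)]
    print(quantifier, results[quantifier])

# \d{1,4} skips 00001N90- (five digits); \d{1,5} and \d+ match all three.
```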

                          Cheers.

                          • Adam BowskyA
                            Adam Bowsky @supasillyass
                            last edited by

                            @supasillyass thanks!

                            • Adam BowskyA
                              Adam Bowsky @Michael Vincent
                              last edited by

                              @Michael-Vincent thanks again!

                              • Adam BowskyA
                                Adam Bowsky
                                last edited by

                                Re: Delete lines in multiple text/DAT files that contain specific characters

                                Hello,

                                I have been using this process since you were kind enough to help me, and just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that contains N90-, it is also deleting the line above it. For example, of the two lines below, the first was deleted in addition to the second, which was the one I wanted to delete. This is happening on every file where N90- is present. Do you have any idea why this is happening?

                                      10DTP-1040K           00000000  This should not have been deleted, but was.
                                      10N90-SS7784X         00000000 This was deleted correctly.
                                
                                  • guy038G
                                    guy038
                                    last edited by guy038

                                    Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,

                                    Personally, I would use the following regex S/R, which should work in all the cases discussed!

                                    I simply assume that the N90- string, with this exact case, is preceded by at least one digit!

                                    SEARCH (?-si)^\h*\d+N90-.*\R?

                                    REPLACE Leave EMPTY

                                    Of course, the Regular expression search mode is selected and the Wrap around option is ticked
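For comparison, here is a rough Python equivalent of this search, offered as a sketch only. Python's `re` module has no `\h` or `\R`, so they are approximated here with `[ \t]` and `(?:\r\n|\n|\r)`; the leading `(?-si)` is dropped because case-sensitive matching and a dot that stops at line breaks are already Python's defaults. The sample lines are adapted from the ones posted earlier in the thread.

```python
import re

# Approximation of (?-si)^\h*\d+N90-.*\R? for Python's re module.
PATTERN = re.compile(r"^[ \t]*\d+N90-[^\r\n]*(?:\r\n|\n|\r)?", re.MULTILINE)

sample = (
    "      10N90-SS7784X         00000000\r\n"
    "      10DTP-1040K           00000000\r\n"
    "      10N90-SS7784X         00000000\r\n"
    "6001N90-SS9035X 00000000"
)

# Each match is the N90 line plus its *own* line break (if any), so the
# first line, the last line, and unindented lines are all handled.
result = PATTERN.sub("", sample)
print(repr(result))  # '      10DTP-1040K           00000000\r\n'
```

Because the match starts at the beginning of the line and takes the line break that follows it, the line above is never touched.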

                                    Give it a try!

                                    I’ll give you some explanations when everything is right ;-))

                                    Best Regards

                                    guy038
