    Delete lines in multiple text/DAT files that contain specific characters

    • Alan Kilborn @Adam Bowsky

      @Adam-Bowsky

      Oh, sorry I missed the multi-file aspect of your question! Must not be my day.

      • Adam Bowsky @supasillyass

        @supasillyass It worked! thank you!

        • Adam Bowsky

          @supasillyass It worked for files that had 1 digit in front of the text. Some of the files have 2, 3, and 4 digits, EX:

          11N90-SS9035X 00000000
          311N90-SS9035X 00000000
          6001N90-SS9035X 00000000

          Unfortunately, I am not sure what the switches do, or whether I need to use a different variant.

          \r\n[ ]*.N90-.*00000000$

          • Michael Vincent

            @Adam-Bowsky

            How about:

            \r\n\s+\d{1,4}N90-.*\s+00000000$
            

            The \r\n matches a Windows line ending: carriage return, then line feed. If your files use Unix line endings (LF) rather than Windows ones (CR/LF), just remove the ‘\r’.

            The \s+ means match white space at least once but get as many as possible (you said there is preceding space on each line).

            The \d{1,4} means match a digit at least once, but not more than 4 times - you said “Some of the files have 2, 3, and 4 digits”.

            The N90- is self-explanatory.

            The .* means match any character (.) zero or more times (*).

            The \s+ is spacing again before all the trailing ‘0’s, which themselves are self-explanatory.

            Finally, the $ anchors the match at the end of the line.
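
The pieces described above can be tried out with a minimal Python sketch. It uses Python's `re` module rather than the engine built into Notepad++, so treat it as an approximation. The sample lines and their leading spaces are assumptions made up for illustration, and the text uses Unix (LF) line endings, so the leading `\r` is dropped as the post suggests. `re.MULTILINE` is needed so that `$` matches at the end of each line.

```python
import re

# Unix (LF) variant of the pattern from the post: the leading \r is dropped.
PATTERN = re.compile(r"\n\s+\d{1,4}N90-.*\s+00000000$", re.MULTILINE)

# Hypothetical sample data; the two leading spaces per line are an assumption.
sample = (
    "  10DTP-1040K 00000000\n"
    "  11N90-SS9035X 00000000\n"
    "  311N90-SS9035X 00000000\n"
    "  6001N90-SS9035X 00000000\n"
    "  20ABC-1234X 00000000"
)

# Each match starts at the line break *before* an N90- line,
# so replacing it with nothing removes that whole line.
cleaned = PATTERN.sub("", sample)
print(cleaned)
```

Because each match begins with the preceding line break, an N90- entry on the very first line of a file has nothing in front of it to match and is left in place.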

            • Michael Vincent

              Using PREGGER:

              PS VinsWorldcom@:~> pregger "/\r\n\s+\d{1,4}N90-.*\s+00000000$/"
              The regular expression:
              
              (?-imsx:\r\n\s+\d{1,4}N90-.*\s+00000000$)
              
              matches as follows:
              
              NODE                     EXPLANATION
              ----------------------------------------------------------------------
              (?-imsx:                 group, but do not capture (case-sensitive)
                                       (with ^ and $ matching normally) (with . not
                                       matching \n) (matching whitespace and #
                                       normally):
              ----------------------------------------------------------------------
                \r                       '\r' (carriage return)
              ----------------------------------------------------------------------
                \n                       '\n' (newline)
              ----------------------------------------------------------------------
                \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                         more times (matching the most amount
                                         possible))
              ----------------------------------------------------------------------
                \d{1,4}                  digits (0-9) (between 1 and 4 times
                                         (matching the most amount possible))
              ----------------------------------------------------------------------
                N90-                     'N90-'
              ----------------------------------------------------------------------
                .*                       any character except \n (0 or more times
                                         (matching the most amount possible))
              ----------------------------------------------------------------------
                \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                         more times (matching the most amount
                                         possible))
              ----------------------------------------------------------------------
                00000000                 '00000000'
              ----------------------------------------------------------------------
                $                        before an optional \n, and the end of the
                                         string
              ----------------------------------------------------------------------
              )                        end of grouping
              ----------------------------------------------------------------------
              
              PS VinsWorldcom@:~>
              
              • supasillyass @Adam Bowsky

                @Adam-Bowsky

                The dot indicated matches a single character:

                \r\n[ ]*.N90-.*00000000$
                        ^
                

                So change it to match a string of digits:

                \r\n[ ]*[0-9]*N90-.*00000000$
                        ^^^^^^
                

                There’s also an edge case not matched where the first line has N90-, so follow up with: ^[ ]*[0-9]*N90-.*00000000\r\n
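
The difference between the single dot and the digit run, and the first-line edge case, can be illustrated with a short Python sketch. It is an approximation using Python's `re` module, not Notepad++ itself. The sample text is hypothetical, has no leading spaces, and uses LF line endings, so the `\r` is dropped from each pattern.

```python
import re

# LF variants of the patterns from the post (the \r is dropped).
single_char = re.compile(r"\n[ ]*.N90-.*00000000$", re.MULTILINE)
digit_run = re.compile(r"\n[ ]*[0-9]*N90-.*00000000$", re.MULTILINE)
# Follow-up pattern for an N90- entry that sits on the first line.
first_line = re.compile(r"^[ ]*[0-9]*N90-.*00000000\n", re.MULTILINE)

# Hypothetical sample data for illustration only.
sample = (
    "311N90-SS9035X 00000000\n"
    "1N90-SS9035X 00000000\n"
    "10DTP-1040K 00000000\n"
    "6001N90-SS9035X 00000000"
)

# The dot consumes exactly one character, so only "1N90-" is removed.
after_single = single_char.sub("", sample)
# [0-9]* accepts any number of digits, so "6001N90-" is removed as well.
after_digits = digit_run.sub("", sample)
# The first line has no line break in front of it; the follow-up handles it.
after_both = first_line.sub("", after_digits)

print(after_single)
print(after_digits)
print(after_both)
```

Note that `[0-9]*` also accepts zero digits, so a line that starts directly with N90- would be removed too.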

                • Adam Bowsky @Michael Vincent

                  @Michael-Vincent thank you! I believe this worked correctly. One question: does “match a digit at least once” include preceding zeros? For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                  • Michael Vincent

                    @Adam-Bowsky said:

                    For example, if the line had looked like this: 00001N90-SS9035X? If so, would I change \d{1,4} to \d{1,5}?

                    There is no special treatment for preceding zeros. Zeros (0) are digits like any other, so they count towards the maximum of 4 ( {…,4} ). A line starting with 00001 has five digits, so \d{1,4} would not match it; you’re correct that \d{1,5} would.

                    I like to be as precise as possible in my RegEx so that I don’t catch anything I shouldn’t. I’d rather be cautious than aggressive when doing a bulk replace like this. You could also just use \d+, which matches at least one digit and as many consecutive digits as possible (similar to the \s+ we’ve been using).

                    Cheers.
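
The effect of the different digit quantifiers can be checked with a small Python sketch. The helper `line_matches` and the two sample lines are hypothetical, introduced only for illustration, and Python's `re` module stands in for the Notepad++ engine.

```python
import re

def line_matches(digit_part: str, line: str) -> bool:
    """Check a whole line: whitespace, the digit part, N90-, then the zeros."""
    pattern = r"\s+" + digit_part + r"N90-.*\s+00000000"
    return re.fullmatch(pattern, line) is not None

# Hypothetical lines with a four-digit and a five-digit prefix.
four_digits = "  6001N90-SS9035X 00000000"
five_digits = "  00001N90-SS9035X 00000000"

for digit_part in (r"\d{1,4}", r"\d{1,5}", r"\d+"):
    for line in (four_digits, five_digits):
        print(f"{digit_part:8} {line.strip():28} {line_matches(digit_part, line)}")
```

Only the combination of `\d{1,4}` with the five-digit prefix fails to match; leading zeros count as digits like any others.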

                    • Adam Bowsky @supasillyass

                      @supasillyass thanks!

                      • Adam Bowsky @Michael Vincent

                        @Michael-Vincent thanks again!

                        • Adam Bowsky

                          Re: Delete lines in multiple text/DAT files that contain specific characters

                          Hello,

                          I have been using this process since you were kind enough to help me, and I just noticed that I am running into a problem with this expression: \r\n\s+\d{1,4}N90-.*\s+00000000$. In addition to deleting the line that contains N90-, it is also deleting the line above it. For example, in the excerpt below the first line was deleted along with the line that I wanted to delete. This is happening in every file where N90- is present. Do you have any idea why this is happening?

                                10DTP-1040K           00000000  This should not have been deleted, but was.
                                10N90-SS7784X         00000000 This was deleted correctly.
                          
                            • guy038

                              Hello, @adam-bowsky, @michael-vincent, @supasillyass and All,

                              Personally, I would use the following regex S/R, which should work in all the discussed cases!

                              I simply assume that the N90- string, with this exact case, is preceded by at least one digit!

                              SEARCH (?-si)^\h*\d+N90-.*\R?

                              REPLACE Leave EMPTY

                              Of course, the Regular expression search mode must be selected and the Wrap around option ticked.

                              Give it a try!

                              I’ll give you some explanations when everything is right ;-))

                              Best Regards

                              guy038
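
For anyone who wants to run the same clean-up outside Notepad++, here is a rough Python sketch of the search above applied to several files. It is an approximate translation, not the Notepad++ behaviour itself: Python's `re` module has no `\h` or `\R`, so `[ \t]` and `\r?\n` are swapped in, and `[^\r\n]*` replaces `.*` so the match stops before either line-ending character. The function names, the `*.dat` glob and the sample file are assumptions for illustration. Files are handled as bytes so that CR/LF endings and the file encoding are left untouched.

```python
import re
import tempfile
from pathlib import Path

# Approximate translation of (?-si)^\h*\d+N90-.*\R? :
#   \h -> [ \t]    .* -> [^\r\n]*    \R -> \r?\n (CR/LF or LF only)
N90_LINE = re.compile(rb"^[ \t]*\d+N90-[^\r\n]*(?:\r?\n)?", re.MULTILINE)

def strip_n90_lines(data: bytes) -> bytes:
    """Remove each line that starts with optional blanks, digits, then N90-."""
    return N90_LINE.sub(b"", data)

def clean_folder(folder: Path, glob_pattern: str = "*.dat") -> list:
    """Rewrite matching files in place and return the paths that changed."""
    changed = []
    for path in sorted(folder.glob(glob_pattern)):
        original = path.read_bytes()
        cleaned = strip_n90_lines(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed.append(path)
    return changed

if __name__ == "__main__":
    # Demo in a throwaway directory, using the two lines quoted in the thread.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.dat"
        sample.write_bytes(
            b"      10DTP-1040K           00000000\r\n"
            b"      10N90-SS7784X         00000000\r\n"
        )
        print(clean_folder(Path(tmp)))
        print(sample.read_bytes())
```

Because the match is anchored at the start of a line and consumes the line ending that follows it, the line above an N90- entry is never part of the match. It is worth running this on a copy of the files first.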

                              The Community of users of the Notepad++ text editor.