Finding multiple lines in multiple files and deleting just those lines

  • S
    Steve Wilson
    last edited by Aug 4, 2018, 10:19 PM

    I apologize if this is addressed elsewhere. Actually I’m sure it must be, but I couldn’t find it. Anyway

    I have a directory full of text files. Most of them contain some lines that I need to delete. I know I can do this by hitting Shift-Ctrl-F, typing one of the lines I want to delete in the “Find what” box, leaving the “Replace with” box empty, making sure the “Directory” box is set properly, and then clicking on “Replace in files”. But I have a LOT of files that I need to do this with. I had this issue before and I don’t think the answer was complicated. I just can’t find the answer and I can’t remember it. I SEEM to remember having to make sure “Search mode” was set to extended, but no matter what syntax I try and use now I can’t get it to work. Any suggestions would be very much appreciated.

    Thanks

    Steve

    • T
      Terry R
      last edited by Terry R Aug 4, 2018, 10:51 PM Aug 4, 2018, 10:49 PM

      Possibly you are using syntax meant for the regular expression mode, in which case select that search mode.

      More info on the various search modes can be found here:
      http://docs.notepad-plus-plus.org/index.php/Searching_And_Replacing

      I suspect you may be using the wrong mode. Regular expression mode on a folder full of files works well for me: I recently took 30 or so text files and ran 20 different macros across them, with a number of different searches and replacements. All of that was done in the regular expression search mode, on the ‘Find in Files’ tab.

      If you are just typing text only try the normal search mode.

      Terry

      • S
        Steve Wilson
        last edited by Aug 5, 2018, 11:41 AM

        I’m probably not explaining myself very well. I created a simple test file to try. It was just

        this is line 1
        this is line 2
        this is line 3
        this is line 4

        Trying for test purposes to get rid of lines 1 and 3, I put “this is line 1 \n this is line 3” in the “Find what” box. In the “Replace with” box I just clicked the mouse and hit the spacebar. Then I clicked on “Replace in files”. I set search mode to regular expression and extended. I tried with no spaces before and after the \n. And I tried with using \n, instead of \n. I’m doing something wrong, but damned if I can figure out what.

        Thanks again

        • S
          Scott Sumner @Steve Wilson
          last edited by Aug 5, 2018, 12:52 PM

          @Steve-Wilson

          I’m doing something wrong, but damned if I can figure out what.

          You are doing a lot of things wrong… :-)

          Say you had this text in a file:

          abcdefghijklmnop
          

          What you are currently trying in your “for test purposes” data is the equivalent of using a Find what of abcghi on the above text…and wondering why it doesn’t match.
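The point can be sketched in a few lines of Python. This is only an illustration: Python's `re` module stands in for the Notepad++ search engine, and the final alternation pattern is an assumption about what the poster ultimately wants, not something proposed at this point in the thread.

```python
import re

text = "abcdefghijklmnop"

# "abcghi" never occurs as one contiguous run of characters, so nothing matches.
no_match = re.search("abcghi", text)

# The same applies to whole lines: line 1 and line 3 are not adjacent,
# so "line 1, line break, line 3" cannot match the test file either.
sample = "this is line 1\nthis is line 2\nthis is line 3\nthis is line 4\n"
joined = re.search(r"this is line 1\nthis is line 3", sample)

# An alternation, by contrast, matches each unwanted line on its own,
# wherever it happens to sit in the file.
cleaned = re.sub(r"(?m)^this is line (?:1|3)\n", "", sample)
print(cleaned)
```

The first two searches return `None`; only the alternation removes both lines.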

          We could give you something that would delete your desired lines in your sample text, but that would probably just elicit from you a “That’s great but the same technique doesn’t work on my real data”.

          So why don’t you cough up some real data and someone will probably help you do what you want…?

          • S
            Steve Wilson
            last edited by Aug 5, 2018, 1:22 PM

            I’m not sure what you’re talking about here. I used that test file just to simplify and show what I was trying to do. I’m NEVER trying to remove characters or parts of a line. I’m trying to remove lines that may be in some files and may not be in other files. Complete lines. The files are subtitles. For instance, a directory containing 200 subtitles. SOME of these subtitles contain the lines. I’d want to keep lines 1 and 2, but I’d want to get rid of line 3.

            6
            00:01:42,614 --> 00:01:52,950
            <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>

            Other subtitle files might contain the lines

            16
            00:01:22,611 --> 00:01:32,611
            Sync by yyets.net - corrected by chamallow35

            Again, I’d want to remove line 3 of that, but leave the other two lines.

            So all I’m looking for is a way to tell NP++ to remove
            <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>
            and
            Sync by yyets.net - corrected by chamallow35

            and any other COMPLETE lines unnecessary in the file without my having to search and replace one line at a time. I’m NEVER trying to remove text from within a line.

            Thanks again

            • S
              Scott Sumner @Steve Wilson
              last edited by Aug 5, 2018, 1:33 PM

              @Steve-Wilson

              It’s cool. The large font didn’t help my understanding (still quite confused), but maybe someone else will jump in. I’m out. Cheers, bro.

              • S
                Steve Wilson
                last edited by Aug 5, 2018, 1:49 PM

                Sorry. I’ve got no idea what caused the large font. I assure you I didn’t do it on purpose.

                But thanks again!

                • S
                  Steve Wilson
                  last edited by Aug 5, 2018, 2:45 PM

                  Okay, I think I’m making my request/explanation confusing and over-complicated. Say I have the following file:

                  1
                  00:00:06,000 --> 00:00:12,074
                  <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font> [line to remove]
                  <font color="#ffff00">www.Addic7ed.Com </font> [line to remove]

                  2
                  00:00:12,920 --> 00:00:14,420
                  Now we’re talking. Yeah, please.

                  3
                  00:00:15,870 --> 00:00:16,980
                  Right here, baby. Aw…

                  4
                  00:00:20,580 --> 00:00:21,480
                  Over here.

                  5
                  00:00:21,480 --> 00:00:23,140
                  Over here, yeah, yeah.

                  6
                  00:00:32,020 --> 00:00:32,990
                  Over here, Kelli.

                  7
                  00:00:33,990 --> 00:00:35,810
                  Sync by yyets.net - corrected by chamallow35 [line to remove]
                  www.addic7ed.com [line to remove]

                  8
                  00:00:36,010 --> 00:00:38,390
                  Over here, Kelli. You
                  look beautiful. Right here.

                  9
                  00:00:38,390 --> 00:00:40,190
                  Please rate this subtitle at www.osdb.link/6hdjt [line to remove]
                  Help other users to choose the best subtitles [line to remove]

                  And in that file I’m trying to remove just the lines that I’ve marked with [line to remove].

                  I’m just trying to figure out what I would put in the “Find what” box, the “Replace with” box and what I would have the “Search mode” set to. I can’t figure it out.

                  Thanks

                  10
                  00:00:44,180 --> 00:00:45,170
                  Bupkes.

                  • G
                    guy038
                    last edited by guy038 Aug 5, 2018, 5:34 PM Aug 5, 2018, 5:30 PM

                    @steve-wilson, @scott-sumner, @terry-r and All,

                    Thanks for your last post, which gives us useful information. However, one point is still unclear !

                    You previously said that you wanted to get rid of line 3. But, from your last post, it seems that you also want to get rid of all the lines located after line 3 ! Am I right about it ?


                    Anyway, the regex S/R, below, supposes that you want to get rid of all lines :

                    • Containing the Sync by string, with that exact case

                    OR

                    • Containing the string www., of an Internet address

                    as well as any subsequent lines, until a true empty line


                    So, assuming your example, placed in a N++ new tab :

                    1
                    00:00:06,000 --> 00:00:12,074
                    <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font> [line to remove]
                    <font color="#ffff00">www.Addic7ed.Com</font> [line to remove]
                    
                    2
                    00:00:12,920 --> 00:00:14,420
                    Now we’re talking. Yeah, please.
                    
                    3
                    00:00:15,870 --> 00:00:16,980
                    Right here, baby. Aw…
                    
                    4
                    00:00:20,580 --> 00:00:21,480
                    Over here.
                    
                    5
                    00:00:21,480 --> 00:00:23,140
                    Over here, yeah, yeah.
                    
                    6
                    00:00:32,020 --> 00:00:32,990
                    Over here, Kelli.
                    
                    7
                    00:00:33,990 --> 00:00:35,810
                    Sync by yyets.net - corrected by chamallow35 [line to remove]
                    www.addic7ed.com [line to remove]
                    
                    8
                    00:00:36,010 --> 00:00:38,390
                    Over here, Kelli. You
                    look beautiful. Right here.
                    
                    9
                    00:00:38,390 --> 00:00:40,190
                    Please rate this subtitle at www.osdb.link/6hdjt [line to remove]
                    Help other users to choose the best subtitles [line to remove]
                    
                    
                    10
                    00:00:44,180 --> 00:00:45,170
                    Bupkes.
                    
                    • Open the Replace dialog ( CTRL + H )

                    • Type, or copy/paste the regex (?-s)^.*\b(Sync by\x20|www\.).*\R(.+\R)+ in the Find what: zone

                    • Leave the Replace with: zone EMPTY

                    • Tick the Wrap around option

                    • Select the Regular expression search mode

                    • Click once, on the Replace All button

                    You should obtain the expected text :

                    1
                    00:00:06,000 --> 00:00:12,074
                    
                    2
                    00:00:12,920 --> 00:00:14,420
                    Now we’re talking. Yeah, please.
                    
                    3
                    00:00:15,870 --> 00:00:16,980
                    Right here, baby. Aw…
                    
                    4
                    00:00:20,580 --> 00:00:21,480
                    Over here.
                    
                    5
                    00:00:21,480 --> 00:00:23,140
                    Over here, yeah, yeah.
                    
                    6
                    00:00:32,020 --> 00:00:32,990
                    Over here, Kelli.
                    
                    7
                    00:00:33,990 --> 00:00:35,810
                    
                    8
                    00:00:36,010 --> 00:00:38,390
                    Over here, Kelli. You
                    look beautiful. Right here.
                    
                    9
                    00:00:38,390 --> 00:00:40,190
                    
                    
                    10
                    00:00:44,180 --> 00:00:45,170
                    Bupkes.
                    

                    Voilà !
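For readers who want to try this search outside Notepad++, here is a rough Python sketch of the same idea. It is an approximation, not the Notepad++ engine: Python's `re` module has no `\R`, and its `.` also matches `\r`, so both are spelled out by hand. The names `EOL`, `DOT` and `BLOCK_RE` are made up for this illustration, and the sketch assumes files with `\n` or `\r\n` line endings.

```python
import re

# Hand-written stand-ins for \R and for "." under (?-s).
# Assumption: line endings are "\n" or "\r\n".
EOL = r"\r?\n"
DOT = r"[^\r\n]"

# A line containing "Sync by " or "www.", plus every following non-empty line.
BLOCK_RE = re.compile(
    rf"^{DOT}*\b(?:Sync by\x20|www\.){DOT}*{EOL}(?:{DOT}+{EOL})+",
    re.MULTILINE,
)

sample = (
    "1\n"
    "00:00:06,000 --> 00:00:12,074\n"
    '<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\n'
    '<font color="#ffff00">www.Addic7ed.Com</font>\n'
    "\n"
    "2\n"
    "00:00:12,920 --> 00:00:14,420\n"
    "Now we're talking. Yeah, please.\n"
)

cleaned = BLOCK_RE.sub("", sample)
print(cleaned)
```

Note that the trailing `(.+\R)+` requires at least one further non-empty line after the matching line. That holds for every block in the sample, but a lone matching line followed directly by an empty line would not be matched.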


                    If we’re not far from the goal, I could, next time, explain my search regex !

                    Best Regards,

                    guy038

                    • S
                      Steve Wilson
                      last edited by Aug 5, 2018, 5:45 PM

                      No. I’m simply trying to get rid of the lines I’ve marked in that example with [line to remove] at the end of the line. <sigh> I’m not making myself clear. What I’m trying to do is, I think, really simple. I simply want to remove multiple lines of text from within files without having to remove them one at a time.

                      I said I wanted to get rid of line 3 simply to indicate that it was the third line of text that I was attempting to remove. Not the third and fourth, fifth, etc. Multiple lines containing specific text strings. In my last example I indicated those lines by appending [line to remove] to the end of the line/string that I was wanting to be gone.

                      Thanks

                      • A
                        Alan Kilborn
                        last edited by Aug 5, 2018, 6:07 PM

                        I would go with a find field of ^.+?\[line to remove\].*?\R and a replace box that is totally empty. That should eliminate all of the desired to be deleted lines.

                        • G
                          guy038
                          last edited by Aug 5, 2018, 6:34 PM

                          Hi, @steve-wilson, @alan-kilborn, @scott-sumner, @terry-r and All,

                          Many thanks, Alan ! Oh my god, so simple ! Then, Steve, actually, you would like to get rid of all lines containing the literal string [line to remove], wouldn’t you ? Is it, really, the single rule needed for the regex ?

                          If so, of course, Alan’s regex works fine. You could also use the regex (?-is)^.+\[line to remove\].*\R, which :

                          • Catches single-line text, only, due to the (?-s) modifier

                          • Matches the literal string [line to remove], with that exact case, due to the (?-i) modifier

                          Remember that the Replace with: zone remains Empty
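As a hedged illustration, the same deletion can be tried in Python. Python's defaults (case-sensitive, `.` not matching a line feed) already correspond to `(?-is)`, so the modifiers are simply omitted; `\R` is replaced by `\r?\n`, which assumes `\n` or `\r\n` line endings. The name `MARKED_LINE` and the extra upper-case line in the sample are invented for this sketch.

```python
import re

EOL = r"\r?\n"  # stand-in for \R; assumes "\n" or "\r\n" line endings
MARKED_LINE = re.compile(rf"^.+\[line to remove\].*{EOL}", re.MULTILINE)

sample = (
    "7\n"
    "00:00:33,990 --> 00:00:35,810\n"
    "Sync by yyets.net - corrected by chamallow35 [line to remove]\n"
    "www.addic7ed.com [line to remove]\n"
    "This one stays [LINE TO REMOVE]\n"  # different case, so it is not matched
    "\n"
    "8\n"
)

# An empty replacement deletes each matched line together with its line break.
cleaned = MARKED_LINE.sub("", sample)
print(cleaned)
```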

                          Cheers

                          guy038

                          • S
                            Steve Wilson
                            last edited by Aug 5, 2018, 6:42 PM

                            Could I use the regex “(?-is)^.+[first line to remove][second line to remove].*\R” (etc.) on the lines to remove? There are probably at least a dozen or more lines I’m trying to remove from a lot of files. I just want to avoid having to do it one file (or one line) at a time.

                            And, many thanks.

                            • S
                              Steve Wilson
                              last edited by Aug 5, 2018, 7:14 PM

                              So, if I want to get rid of each of the following lines that may be contained
                              "
                              <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>
                              <font color="#ffff00">www.Addic7ed.Com </font>
                              Sync by yyets.net - corrected by chamallow35
                              www.addic7ed.com
                              Please rate this subtitle at www.osdb.link/6hdjt
                              Help other users to choose the best subtitles
                              "
                              could I just use the regex (?-is)^.+<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\<font color="#ffff00">www.Addic7ed.Com </font>\Sync by yyets.net - corrected by chamallow35\www.addic7ed.com \Please rate this subtitle at www.osdb.link/6hdjt\Help other users to choose the best subtitles.*\R

                              • G
                                guy038
                                last edited by Aug 5, 2018, 7:19 PM

                                Hi, @steve-wilson,

                                Ah, yes ! Of course, if you already placed strings, like First line to remove or Second line to remove…, in your files, just change my previous regex, as below :

                                (?-is)^.+\[.*line to remove\].*\R

                                Notes :

                                • The modifiers (?-is) were explained previously

                                • Then the regex matches, from beginning of line ( ^ ), any non-empty range of standard characters ( .+ ), ending with an opening square bracket symbol ( \[ )

                                • Then matching any range, possibly empty, of standard characters ( .*), ending with the string line to remove, with that exact case, and the ending square bracket symbol ( \] )

                                • And, finally, matching any remaining range of characters, possibly empty, of the current line ( .* ) , along with its End of Line characters ( \R ), which may be \r\n for Windows files, \n for Unix files or \r for Macintosh files

                                • And, as the Replacement field is empty, the complete matched line, with its line-break, is, thus, deleted

                                Remarks :

                                • The square bracket symbols, being regex symbols, must be escaped to be considered as literals !

                                • Any syntax [...... line to remove], whatever the text between the [ symbol and the string line to remove, will be taken into account by the regex, and the corresponding line selected for deletion
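The remark about escaping can be checked with a small Python sketch (Python's `re` treats square brackets the same way in this respect; the sample line is made up for the illustration):

```python
import re

line = "Sync by yyets.net - corrected by chamallow35 [first line to remove]"

# Escaped: \[ and \] are literal brackets, so the whole marker is found.
escaped = re.search(r"\[.*line to remove\]", line)
print(escaped.group())

# Unescaped: [line to remove] is a character class that matches ONE
# character out of the set l, i, n, e, space, t, o, r, m, v.
unescaped = re.search(r"[line to remove]", line)
print(unescaped.group())
```

The escaped pattern finds the full `[first line to remove]` marker, while the unescaped one stops at the first single character that belongs to the class, here the `n` of `Sync`.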

                                Cheers,

                                guy038

                                • S
                                  Steve Wilson
                                  last edited by Aug 5, 2018, 7:48 PM

                                  I REALLY do appreciate all the time you’ve put in here. I’m just not getting it. I’m going to need to input at least a dozen lines that I want gone, but if you could show me an example of a regex to remove the following six lines, it’d be terribly helpful.

                                  <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>
                                  <font color="#ffff00">www.Addic7ed.Com </font>
                                  Sync by yyets.net - corrected by chamallow35
                                  www.addic7ed.com
                                  Please rate this subtitle at www.osdb.link/6hdjt
                                  Help other users to choose the best subtitles

                                  I DO appreciate it. I’m just not getting it.

                                  • T
                                    Terry R
                                    last edited by Terry R Aug 5, 2018, 7:55 PM Aug 5, 2018, 7:55 PM

                                    I’ll quickly wade in here.
                                    Of the examples provided, and knowing that the subtitle files are for movies, I’d suggest you can group some of the lines you wish to remove. For example, it’s very unlikely that any dialogue would include www. or <font color or even Sync by. So, in effect, you may not need to write out the lines in full. You just need enough information to uniquely identify the lines you want to remove.

                                    May I also suggest you combine ALL the files together, sort them and remove duplicates. Look at what’s left. This could quickly identify what you’re trying to remove. Then, using that information, you can build a regex to run over the original files.
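One possible way to do that survey outside the editor is sketched below in Python. Everything here is an assumption made for illustration: the function name `recurring_lines`, the `*.srt` file pattern, the UTF-8 encoding (undecodable bytes are replaced rather than raising an error), and the threshold of two files.

```python
from collections import Counter
from pathlib import Path


def recurring_lines(directory, pattern="*.srt", min_files=2):
    """Return (line, file_count) pairs for lines found in at least `min_files` files."""
    counts = Counter()
    for path in Path(directory).glob(pattern):
        text = path.read_text(encoding="utf-8", errors="replace")
        # A set counts each distinct non-empty line only once per file.
        unique = {line.strip() for line in text.splitlines() if line.strip()}
        counts.update(unique)
    # most_common() lists the most widespread lines first.
    return [(line, n) for line, n in counts.most_common() if n >= min_files]
```

Calling `recurring_lines("path/to/subtitles")` would list candidate lines together with the number of files they appear in. Subtitle index numbers such as `1` and `2` will show up as well, so the result still needs to be read by eye before a regex is built from it.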

                                    Terry

                                    • G
                                      guy038
                                      last edited by guy038 Aug 5, 2018, 8:25 PM Aug 5, 2018, 8:12 PM

                                      @steve-wilson, and All,

                                      Very sorry, because we posted, rather simultaneously :-((

                                      Your last post goes towards a completely different direction ! The syntax, that you described, cannot be used in that form !! The regex would be quite invalid :-((

                                      So, first, are you searching these six sentences, below, with that exact syntax ?

                                      <font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>
                                      <font color="#ffff00">www.Addic7ed.Com</font>
                                      Sync by yyets.net - corrected by chamallow35
                                      www.addic7ed.com
                                      Please rate this subtitle at www.osdb.link/6hdjt
                                      Help other users to choose the best subtitles
                                      

                                      I mean, could it be that, sometimes, you get lines with chamallow73 instead of chamallow35, OR these six lines never change, in all your files ?


                                      If these 6 lines have a fixed form, the generic regex to use is :

                                      SEARCH (?-is)^(.*\Q...Line 1 Contents...\E.*|.*\Q...Line 2 Contents...\E.*|.*\Q...Line 3 Contents...\E.*|..........|.*\Q...Line 6 Contents...\E.*)\R

                                      Notes :

                                      • ...Line #n Contents... represents the exact part of the nth line that you want to search for

                                      • The \Q and \E escape sequences ensure that any text placed between these two boundaries is taken literally

                                      • The | symbol is a regex symbol that separates the different alternatives to search for, simultaneously

                                      • The .* syntaxes, located before \Q and after \E, stand for the areas, possibly empty, located before and after your different sentences to search for. Note that, if a sentence represents all the contents of a line, you may suppress these .* syntaxes in the corresponding alternative

                                      • Finally, every alternative, between the parentheses, must begin at the start of a line ( ^ ) and end with its line-break characters ( \R )


                                      If we apply this generic regex to your real example, we get the following regex :

                                      (?-is)^(.*\Q<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\E.*|.*\Q<font color="#ffff00">www.Addic7ed.Com</font>\E.*|.*\QSync by yyets.net - corrected by chamallow35\E.*|.*\Qwww.addic7ed.com\E.*|.*\QPlease rate this subtitle at www.osdb.link/6hdjt\E.*|.*\QHelp other users to choose the best subtitles\E.*)\R
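Writing such a long alternation by hand is error-prone, so here is a hedged Python sketch that builds an equivalent pattern from a list of literal strings. Python's `re` has no `\Q...\E`, so `re.escape` plays that role; `\R` is approximated by `\r?\n` (assuming `\n` or `\r\n` line endings), and the shared `.*` parts are factored out of the alternation, which matches the same lines. The names `build_line_remover` and `UNWANTED` are invented for this example.

```python
import re

EOL = r"\r?\n"  # stand-in for \R; assumes "\n" or "\r\n" line endings


def build_line_remover(literals):
    """Compile a regex matching any whole line that contains one of `literals`."""
    # re.escape makes every string literal, like \Q...\E in Notepad++.
    alternatives = "|".join(re.escape(s) for s in literals)
    return re.compile(rf"^.*(?:{alternatives}).*{EOL}", re.MULTILINE)


UNWANTED = [
    '<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>',
    '<font color="#ffff00">www.Addic7ed.Com</font>',
    "Sync by yyets.net - corrected by chamallow35",
    "www.addic7ed.com",
    "Please rate this subtitle at www.osdb.link/6hdjt",
    "Help other users to choose the best subtitles",
]

remover = build_line_remover(UNWANTED)

sample = (
    "9\n"
    "00:00:38,390 --> 00:00:40,190\n"
    "Please rate this subtitle at www.osdb.link/6hdjt\n"
    "Help other users to choose the best subtitles\n"
    "\n"
    "10\n"
    "00:00:44,180 --> 00:00:45,170\n"
    "Bupkes.\n"
)

cleaned = remover.sub("", sample)
print(cleaned)
```

Extending the list to a dozen or more strings only means adding entries to `UNWANTED`; the pattern is rebuilt from the list.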

                                      Et voilà :-))

                                      Cheers,

                                      guy038

                                      P.S. :

                                      I strongly advise you to read this FAQ post, on regexes, below :

                                      https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation/1

                                      • T
                                        Terry R
                                        last edited by Terry R Aug 5, 2018, 8:16 PM Aug 5, 2018, 8:14 PM

                                        To remove the 6 lines in your example the following regex would work.
                                        Find what: ^((?|<font color|Sync by|Please rate|Help other|www\.).+\R)
                                        Replace with: (leave this field empty)

                                        The search mode is regular expression and wrap around is ticked.

                                        Note that I haven’t included the complete line as I think the strings I’m searching for are unique enough. The .+\R sequence at the end means as long as it starts with one of the strings, also grab the remainder of the line. The ^ at the start makes sure we are starting a search at the start of a line. Thus if these strings are NOT at the start of the line, they will not be removed.

                                        The regex includes a pipe character (|) between the different strings. This allows the regex to look for different strings all within the one expression, so you would only need to run it once to get all those alternatives removed. You can extend the regex by adding more pipe characters and other strings to search for.
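For comparison, a scripted equivalent of running this over a whole directory might look like the Python sketch below. It is not what Notepad++ does internally; the names `PREFIXES`, `LINE_RE` and `clean_directory`, the `*.srt` pattern and the UTF-8 encoding are all assumptions, and `\R` is approximated by `\r?\n`. Since the files are rewritten in place, it would be prudent to try it on a copy of the directory first.

```python
import re
from pathlib import Path

EOL = r"\r?\n"  # stand-in for \R; assumes "\n" or "\r\n" line endings
PREFIXES = ["<font color", "Sync by", "Please rate", "Help other", "www."]

# A line that STARTS with one of the prefixes, plus the rest of that line.
LINE_RE = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in PREFIXES) + ").+" + EOL,
    re.MULTILINE,
)


def clean_directory(directory, pattern="*.srt", encoding="utf-8"):
    """Rewrite matching files in place; return the names of the files that changed."""
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        # newline="" leaves the original line endings untouched on read and write.
        with open(path, encoding=encoding, newline="") as fh:
            original = fh.read()
        cleaned = LINE_RE.sub("", original)
        if cleaned != original:
            with open(path, "w", encoding=encoding, newline="") as fh:
                fh.write(cleaned)
            changed.append(path.name)
    return changed
```

More strings can be added to `PREFIXES` in the same way that more alternatives can be added after a pipe character in the regex.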

                                        Terry

                                        • S
                                          Steve Wilson
                                          last edited by Aug 5, 2018, 9:08 PM

                                          Many Many thanks. I’ve managed to remove a few hundred bothersome lines from a few hundred srt files. Much faster than doing it one by one.

                                          I DO appreciate it.

                                          Steve
