Finding multiple lines in multiple files and deleting just those lines
-
OK. Many thanks again. And I DID read that documentation. It explained a lot, but I didn’t see an answer to this:
I’ve got the following regex:
(?-is)^(.*\Q<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\E.*|.*\Q<font color="#ffff00">www.Addic7ed.Com</font>\E.*|.*\QSync by yyets.net - corrected by chamallow35\E.*|.*\Qwww.addic7ed.com\E.*|.*\QPlease rate this subtitle at www.osdb.link/6hdjt\E.*|.*\QSync & corrections by\E.*|.*\Q*www.addic7ed.com\E.*|.*\QPlease rate this subtitle\E.*|.*\Q== sync, corrected by <font color="#00FF00">elderman</font> ==\E.*|.*\Q <font color="#00FFFF">@elder_man\E.*|.*\Q<font color="#00FFFF">@elder_man</font> \E.*|.*\QWWW.MY-SUBS.COM\E.*|.*\QAdvertise your product or brand here\E.*|.*\Qcontact www.OpenSubtitles.org today\E.*|.*\QAmericasCardroom.com brings poker back\E.*|.*\QMillion Dollar Sunday Tournament every Sunday\E.*|.*\QSynced & corrected by\E.*|.*\QSynced and corrected by Octavia\E.*|.*\QHelp other users to choose the best subtitles\E.*)\R
It works, but as the list of lines I’d like to remove grows, for the sake of simplicity I’d like to use this regex instead. It has the same expressions, but the lines are separate. Any reason that wouldn’t work?
(?-is)^(.*\Q
<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\E.*|.*\Q
<font color="#ffff00">www.Addic7ed.Com</font>\E.*|.*\Q
Sync by yyets.net - corrected by chamallow35\E.*|.*\Q
www.addic7ed.com\E.*|.*\Q
Please rate this subtitle at www.osdb.link/6hdjt\E.*|.*\Q
Sync & corrections by\E.*|.*\Q*
www.addic7ed.com\E.*|.*\Q
Please rate this subtitle\E.*|.*\Q
== sync, corrected by <font color="#00FF00">elderman</font> ==\E.*|.*\Q
<font color="#00FFFF">@elder_man\E.*|.*\Q
<font color="#00FFFF">@elder_man</font> \E.*|.*\Q
WWW.MY-SUBS.COM\E.*|.*\Q
Advertise your product or brand here\E.*|.*\Q
contact www.OpenSubtitles.org today\E.*|.*\Q
AmericasCardroom.com brings poker back\E.*|.*\Q
Million Dollar Sunday Tournament every Sunday\E.*|.*\Q
Synced & corrected by\E.*|.*\Q
Synced and corrected by Octavia\E.*|.*\Q
Help other users to choose the best subtitles\E.*)\R
That way I could easily scan to see whether a line already exists and, if not, just add it before the last line, followed by a macro for \E.*|.*\Q
Thanks
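Here is a minimal Python sketch of the problem being asked about. Python is used only because the thread has no programming language of its own.

Assumptions:
- Python's re module stands in for the Notepad++ regex engine.
- re.escape() stands in for \Q...\E.
- re.VERBOSE stands in for (?x).
- The sample text is made up.

Without free-spacing mode, a line break typed into the pattern is a literal newline. The file would have to contain that newline at exactly that spot. In the layout above, the breaks fall just after each \Q, inside the quoted text. The sketch puts them between alternatives instead, so it only shows the basic effect.

```python
import re

# Made-up subtitle text; only the middle line should be matched.
text = "Hello\nWWW.MY-SUBS.COM\nBye\n"

# re.escape() plays the role of N++'s \Q...\E quoting.
a = ".*" + re.escape("Please rate this subtitle") + ".*"
b = ".*" + re.escape("WWW.MY-SUBS.COM") + ".*"

# All alternatives on one line: the middle line is found.
one_line = re.compile("^(" + a + "|" + b + ")$", re.MULTILINE)

# One alternative per line, no free-spacing mode: every line break in the
# pattern is now a literal newline that the text itself would have to contain.
split_plain = re.compile("^(\n" + a + "|\n" + b + "\n)$", re.MULTILINE)

# Same layout with free-spacing (re.VERBOSE, i.e. (?x)): unescaped whitespace
# in the pattern, including those line breaks, is ignored.
split_verbose = re.compile("^(\n" + a + "|\n" + b + "\n)$", re.MULTILINE | re.VERBOSE)

for name, pat in [("one line", one_line), ("split, plain", split_plain), ("split, (?x)", split_verbose)]:
    m = pat.search(text)
    print(name, "->", m.group(1) if m else None)
```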
-
The following regex will match every one of those lines, and it is case-insensitive.
(?i)^.*?(<font|sync.*?correct|www\.|help other.*?subtitle|please rate|advertise your|\.com|million dollar).*?\R
So the search is case-insensitive: it doesn’t care whether a character is an ‘a’ or an ‘A’. So, first off, that saves you from looking for ‘WWW.’ and ‘www.’ with 2 sub-expressions. You see, there’s no need to type every single line you’re searching for individually. If you tried, you would quickly find you had exceeded the regex length limit. My example identifies a complete line as long as it contains the characters defined within any one of the sub-expressions, which are separated by the ‘|’ characters.
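To see the case-insensitivity at work, here is a rough Python translation of the regex above. It assumes Python's re behaves like the N++ engine for these constructs.

Two changes were needed for Python:
- Python has no \R, so an explicit (?:\r\n|\n|\r) takes its place.
- The m flag is added because, without it, Python's ^ only matches at the very start of the text.

The sample text is made up.

```python
import re

# Terry's pattern; i = case-insensitive, m = ^ matches at every line start.
# (?:\r\n|\n|\r) replaces N++'s \R, which Python's re does not support.
pattern = re.compile(
    r"(?im)^.*?(<font|sync.*?correct|www\.|help other.*?subtitle|please rate"
    r"|advertise your|\.com|million dollar).*?(?:\r\n|\n|\r)"
)

text = "Hello there.\nWWW.MY-SUBS.COM\nwww.addic7ed.com\nWhere were you?\n"

# Both 'WWW.' and 'www.' lines are removed by the single 'www\.' test.
cleaned = pattern.sub("", text)
print(repr(cleaned))  # 'Hello there.\nWhere were you?\n'
```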
As a couple of the lines are very similar to possible dialogue, I’ve made those sub-expressions look for 2 words with ‘something’ between them. I don’t care what the ‘something’ is, only that the 2 words appear on the same line. This may also be something you wish to try.
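Here is a small sketch of the ‘2 words with something between them’ idea, again using Python's re as a stand-in for the N++ engine. The first three sample lines come from this thread; the last two are invented dialogue. The sketch also shows the limit of the approach: a dialogue line that happens to contain both words is still caught.

```python
import re

# "sync", then as little as possible (on the same line), then "correct".
two_words = re.compile(r"(?i)sync.*?correct")

samples = [
    "Sync by yyets.net - corrected by chamallow35",
    "Synced & corrected by",
    '== sync, corrected by <font color="#00FF00">elderman</font> ==',
    "The sync is off, correct it.",  # invented dialogue: still matches
    "I'll sync the files later.",    # invented dialogue: no "correct", no match
]

results = [bool(two_words.search(s)) for s in samples]
print(results)  # [True, True, True, True, False]
```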
The only issue with my example is that it will NOT grab the very last line of the file if that is one of the lines you want. That’s because the last line doesn’t finish on a ‘\R’ (a line break). I don’t think that would be a problem, though; these ‘advertising’ lines would generally be in the first 100 or so lines of each subtitle file, I think.
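The last-line limitation can be reproduced in Python. This assumes Python's re stands in for N++, with (?:\r\n|\n|\r) in place of \R. Adding end-of-text as an extra alternative (\Z in Python) lets the final line match even without a line break after it. In N++ the analogous form would presumably be (?:\R|\z), but that is untested here.

```python
import re

ALTS = (r"(<font|sync.*?correct|www\.|help other.*?subtitle|please rate"
        r"|advertise your|\.com|million dollar)")

# Line break required (like ...\R): an unterminated last line is missed.
needs_break = re.compile(r"(?im)^.*?" + ALTS + r".*?(?:\r\n|\n|\r)")
# Line break OR end of text (\Z in Python): the last line is caught too.
break_or_end = re.compile(r"(?im)^.*?" + ALTS + r".*?(?:\r\n|\n|\r|\Z)")

text = "Hello.\nPlease rate this subtitle"  # no newline at the end
print(repr(needs_break.sub("", text)))   # 'Hello.\nPlease rate this subtitle'
print(repr(break_or_end.sub("", text)))  # 'Hello.\n'
```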
The last 3 tests (‘advertise your’, ‘\.com’ and ‘million dollar’) might occur within dialogue, so you may want to expand on those, but you still should not need to include the WHOLE line.
As to why your expression didn’t work, I found it hard to read: far too much text. I just found it easier to write an example for you, maybe also because I think you are trying too hard to identify the lines. Regex is all about bunching groups/strings of characters into neat buckets; that’s where its power lies. In your case, you’re removing most of that power by searching for each unique line individually.
Terry
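The risk with the last 3 short tests can be made concrete with a quick Python check. The ‘longer’ versions below are hypothetical expansions, each built from one of the advertising lines quoted in this thread. The dialogue lines are invented.

```python
import re

# The three short tests from the regex above, as one case-insensitive alternation.
short = re.compile(r"(?i)advertise your|\.com|million dollar")
# Hypothetical longer tests, each taken from one of the ad lines in this thread.
longer = re.compile(r"(?i)advertise your product|americascardroom\.com|million dollar sunday")

ads = [
    "Advertise your product or brand here",
    "AmericasCardroom.com brings poker back",
    "Million Dollar Sunday Tournament every Sunday",
]
# Invented dialogue lines that happen to contain the short tests.
dialogue = [
    "We can't advertise your name.",
    "I read it on some .com site.",
    "That's the million dollar question.",
]

for line in ads + dialogue:
    print(bool(short.search(line)), bool(longer.search(line)), line)
```

Both versions catch the ads, but only the short tests also remove the dialogue lines.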
-
Hi, @steve-wilson, @terry-r and All,
Indeed, you can use separate lines if you include the (?x) modifier, which enables the free-spacing regex mode! So, copy/paste the search regex below into the Find what: zone, with, of course, the two options Regular expression and Wrap around ticked.
Notes:
- The \R syntax does not work in free-spacing mode. So, you must write \r\n (or \n for Unix files).
- In free-spacing mode:
  - The # is the comment-line symbol. To use it literally, simply write \#
  - The space character is not taken into account. To use it literally, simply write \x20 or [ ]
- However, when using the \Q.......\E syntax, both the # and the space symbols are searched as literals!
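The free-spacing rules in these notes can be tried in Python's re.VERBOSE mode. For the cases below, it follows the same rules for spaces, # and [ ].

Assumptions:
- Python stands in for the N++ engine.
- Python has no \Q...\E, so re.escape() plays that role. It escapes both # and spaces, so they stay literal under (?x).

Python's re has no \R in any mode, so the first note does not carry over.

```python
import re

checks = [
    # A plain space is ignored in verbose mode, so " a b " is just "ab".
    re.fullmatch(r"(?x) a b ", "ab"),
    # \x20 or [ ] brings a literal space back.
    re.fullmatch(r"(?x) a\x20b", "a b"),
    re.fullmatch(r"(?x) a[ ]b", "a b"),
    # An unescaped # starts a comment; \# is a literal hash.
    re.fullmatch(r"(?x) color=\#ff  # everything from here on is a comment", "color=#ff"),
    # re.escape(), Python's nearest thing to \Q...\E, escapes both # and
    # spaces, so they stay literal under (?x).
    re.fullmatch("(?x)" + re.escape("color=#ff ff"), "color=#ff ff"),
]
print([m is not None for m in checks])  # [True, True, True, True, True]
```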
So, your multi-line search regex could be:
(?x) (?-is)
^(
.*\Q<font color="#ffff00">Sync by honeybunny - corrected by chamallow35</font>\E.*|
.*\Q<font color="#ffff00">www.Addic7ed.Com</font>\E.*|
.*\QSync by yyets.net - corrected by chamallow35\E.*|
.*\Qwww.addic7ed.com\E.*|
.*\QPlease rate this subtitle at www.osdb.link/6hdjt\E.*|
.*\QSync & corrections by\E.*|
.*\QPlease rate this subtitle\E.*|
.*\Q== sync, corrected by <font color="#00FF00">elderman</font> ==\E.*|
.*\Q<font color="#00FFFF">@elder_man\E.*|
.*\Q<font color="#00FFFF">@elder_man</font> \E.*|
.*\QWWW.MY-SUBS.COM\E.*|
.*\QAdvertise your product or brand here\E.*|
.*\Qcontact www.OpenSubtitles.org today\E.*|
.*\QAmericasCardroom.com brings poker back\E.*|
.*\QMillion Dollar Sunday Tournament every Sunday\E.*|
.*\QSynced & corrected by\E.*|
.*\QSynced and corrected by Octavia\E.*|
.*\QHelp other users to choose the best subtitles\E.*
)\r\n
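If the list keeps growing, one option is to keep the unwanted lines in a plain list and generate the free-spacing regex from it. The Python sketch below is only an illustration.

Assumptions:
- The list name is hypothetical.
- re.escape() stands in for \Q...\E.
- (?-is) is dropped because case-sensitive matching and a dot that does not match newlines are already Python's defaults.
- The 2,046-character figure is the limit given in the remarks that follow.

Each line break goes between alternatives, outside the quoted text, as in the (?x) regex above. Going by the note that \Q...\E keeps spaces literal even in free-spacing mode, a break placed just after \Q would presumably be kept as a literal newline too.

```python
import re

# Hypothetical list: add a new entry here instead of editing the regex by hand.
UNWANTED = [
    '<font color="#ffff00">www.Addic7ed.Com</font>',
    "Please rate this subtitle",
    "WWW.MY-SUBS.COM",
    "Help other users to choose the best subtitles",
]

# One alternative per line; re.escape() stands in for \Q...\E, and the line
# breaks sit between alternatives, where (?x) / re.VERBOSE ignores them.
body = "|\n".join(".*" + re.escape(s) + ".*" for s in UNWANTED)
pattern_text = "(?mx)\n^(\n" + body + "\n)\\r\\n"
pattern = re.compile(pattern_text)

print(pattern_text)
print(len(pattern_text) <= 2046)  # True: well under the limit mentioned below

text = "Hello.\r\n Help other users to choose the best subtitles\r\nBye.\r\n"
print(repr(pattern.sub("", text)))  # 'Hello.\r\nBye.\r\n'
```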
Remarks:
- The entire search regex must not exceed 2,046 characters.
- Unfortunately, multi-line replacement is NOT allowed with our N++ regex engine!
Et voilà!
Cheers,
guy038
-
OK. I’ve read all the help files and understand a lot more, but I’m finding that in a lot of the subtitles I’ve edited, the regex is missing lines. For instance, the line:
Help other users to choose the best subtitles
might be in the subtitle, and the regex would find it and remove it. But in some of the subtitles, that line has a space in front of it:
 Help other users to choose the best subtitles
That leading space is stopping the regex from catching the line. And of course, that’s just one of the lines it happens to. I’ve tried changing the line in the regex to .*\QHelp other users to choose the best subtitles\E.*|, but that doesn’t seem to have any effect. I’ve also tried .*Help other users to choose the best subtitles| and .*?Help other users to choose the best subtitles|, and neither of those seems to help either. I have no idea what I’m missing. I just want to remove any line in the subtitle that contains Help other users to choose the best subtitles. I don’t care what (if anything) precedes it.
-
OK, I understand a little better what is happening. I think. If I run the regex on a subtitle file containing 100 lines, single lines of text that are in the regex ARE removed. But where a section of the subtitle has 2 consecutive lines of text listed in the regex, only the first line is removed, and a space is prefixed to the next line. If I run the regex a second time, the second line IS removed. If there is a third consecutive line listed in the regex (a rare occurrence), that third line gets TWO spaces prefixed, and running the regex a third time catches it. The exception is when there are multiple consecutive lines at the very end of the file: in that case the last line just doesn’t get removed, no matter how many times I run the regex.
It’s much better than doing this all by hand though, so thanks again.
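The effect of a leading space can be checked in Python, standing in for the N++ engine. The sample text is made up. A pattern that requires the text at the very start of the line misses the spaced version, while a leading .* absorbs whatever precedes the text. This does not reproduce the N++ behaviour described in these posts, where extra spaces appear after a replacement, so it may not explain that part.

```python
import re

lit = re.escape("Help other users to choose the best subtitles")  # like \Q...\E
eol = r"(?:\r\n|\n|\r)"  # stands in for N++'s \R

# The line must START with the text: a leading space makes it fail.
anchored = re.compile(r"(?m)^" + lit + r".*" + eol)
# .* before the text accepts anything (or nothing) in front of it.
anywhere = re.compile(r"(?m)^.*" + lit + r".*" + eol)

text = "1\n Help other users to choose the best subtitles\n2\n"
print(repr(anchored.sub("", text)))  # unchanged
print(repr(anywhere.sub("", text)))  # '1\n2\n'
```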
-
Like I said, I understand MUCH better what the regex is supposed to do and is doing. This is the regex I’m using:
(?i)^.*?(<font.*?|\sync.*?correct|www.|.*(?=Help other users)|please rate|Professional Translation|advertise your|== for|WWW.MY-SUBS.COM|SLY@Moon|subXpacio|www.opensubtitles.org|Open Subtitles MKV Player|Subtitles by|Synchr:AA|Support us and become VIP|to remove all ads|AmericasCardroom|.*(?=Synced and corrected by BLuk)|Subrip by|million dollar).*?\R
It works flawlessly. If I run this regex on a directory of files and some of the files have consecutive lines listed for removal in the regex, I’ll get a message at the end like “Replace in Files: 26 occurrences replaced”. If I keep running the regex until it says “Replace in Files: 0 occurrences replaced”, all of the lines (including the multiple consecutive lines) will have been removed. Except that occasionally there will be multiple consecutive lines at the very end of the subtitle I’m editing. In those instances, the very last line won’t have been removed. I can solve that fairly simply by loading every subtitle into NP++ and making sure that there’s a blank line (simply a CR/LF) at the end. I’ve tried to create a regex to do just that, but my attempts aren’t working. It’s not a huge deal, and this is MUCH better; I’m very appreciative. But if what I’m looking for is a trivial thing and anyone has suggestions, they’d be very welcome.
As always, thanks very much
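Here is a Python sketch of the ‘make sure the file ends with a line break’ idea. It assumes Python's re as a stand-in for N++ and uses a hypothetical subset of the alternatives above.

With Python's re.sub, consecutive matching lines are all removed in a single pass. The only line missed is a final one with no line break after it, and appending the missing line break first fixes that.

clean_folder() is a hypothetical batch helper that assumes UTF-8 files named *.srt. It reads and writes bytes so existing CR/LF endings are preserved. It has not been tried on real subtitle files.

```python
import re
from pathlib import Path

# Hypothetical subset of the alternatives above; (?:\r\n|\n|\r) replaces \R.
UNWANTED = re.compile(
    r"(?im)^.*?(please rate|www\.|advertise your|million dollar).*?(?:\r\n|\n|\r)"
)

def ensure_final_newline(text: str, eol: str = "\r\n") -> str:
    """Append a line break if the text is non-empty and doesn't already end with one."""
    if text and not text.endswith(("\n", "\r")):
        return text + eol
    return text

def clean(text: str, eol: str = "\r\n") -> str:
    """Add the missing final line break, then remove every matching line in one pass."""
    return UNWANTED.sub("", ensure_final_newline(text, eol))

def clean_folder(folder: str, glob: str = "*.srt", encoding: str = "utf-8") -> None:
    """Hypothetical batch helper: folder, glob pattern and encoding are assumptions."""
    for path in Path(folder).glob(glob):
        # Bytes in, bytes out, so existing CR/LF line endings are left untouched.
        text = path.read_bytes().decode(encoding)
        path.write_bytes(clean(text).encode(encoding))

sample = ("1\nPlease rate this subtitle\nwww.addic7ed.com\nBye\n"
          "Advertise your product or brand here\nwww.addic7ed.com")
print(repr(UNWANTED.sub("", sample)))  # '1\nBye\nwww.addic7ed.com'
print(repr(clean(sample, eol="\n")))   # '1\nBye\n'
```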