How to detect lines lacking "..." at end?
-
I use Notepad++ as my primary program for editing movie subtitles.
But for years I have been struggling with one issue: the subtitles often lack the three dots (…) at the end of the dialog breaks. How can I tell Notepad++ to detect the lines lacking the dots, and fix them?
For example, we have 2 dialogs, broken in 2 parts each, resulting in four subtitle lines:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist
2
00:00:15,000 --> 00:00:18,000
on earth’s surface.
3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans
4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.
Subtitle lines 2 and 4 are fine, so we ignore them. However, subtitle lines 1 and 3 are lacking the three dots (…) at their end. How can I tell Notepad++ to add “…”, like this:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…
3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…
-
You seem to be asking for adding “…” to any subtitle that doesn’t already end in a “.”. However, studying your data, I am guessing you really want “add an ellipsis to any subtitle that doesn’t end with a sentence-ender”. Unfortunately, detecting all possible sentence-enders might be more difficult than you think. I mean, there’s ., but there’s also ! and ?. But there are also lines that end in quotes – and an endquote doesn’t necessarily mean end of sentence…
I also cannot tell in your example data whether there’s a blank line between subtitles, but there probably is, and I’m going to assume that all subtitles end with a blank line.
I think, after you clarify that, we’ll be able to come up with a better solution.
-----
My naive guess for just “subtitles that don’t end with a period need an ellipsis” would have been
- FIND = [^\.]\K(?=\R\R\d+$)
- replace = ...
- mode = regular expression
but that seems to add two sets of ... for each, and I cannot quickly figure out why. But that’s a starting point. And if you need real end-of-sentence rather than just .-end-of-sentence detection, we can probably improve things… but it won’t be perfect. (though @guy038 will come close to perfection, I am sure.)
(Like that! That last parenthetical sentence is the end of a sentence, and wouldn’t need an ellipsis, but ends with something that I hadn’t thought of as being an allowable sentence-ender.)
-
What’s the definitive pattern? A lack of a trailing period on the last one of a sequence?
-
@Silent-Resident
My suggestion:
Find: [^[:punct:]]$\K(?=\R\R\d+$)
Replace: ...
Mode: Regular expression
@PeterJones
Due to the addition of $ it doesn’t add two sets of .... The [:punct:] character class should match all characters in question.
-
Hello @silent-resident, and All,
Quite easy! For instance, assuming your example, below, where I added 4 subtitle lines (from 5 to 8):
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.
Note: The text included in the last 4 subtitles comes from: https://en.wikipedia.org/wiki/10th_edition_of_Systema_Naturae#Animals
So, in plain language, your request could be: Add a … sign at the end of any line which does not end with a dot and which is followed by at least 2 line-breaks. In that case, here is my solution:
- Open the Replace dialog (Ctrl + H)
- SEARCH [^.\r\n]\K(?=\R{2,})
- REPLACE … OR \x{2026}
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button, exclusively (Do not use the Replace button!)
And you’ll get:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",…

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom…

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :…

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.
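For anyone who wants to try the same replacement outside Notepad++, here is a minimal Python sketch of this first S/R. It is an approximation, not the Boost regex itself: Python's `re` module supports neither `\K` nor `\R`, so a lookbehind stands in for `[^.\r\n]\K` and `(?:\r?\n){2,}` stands in for `\R{2,}` (assuming the file only uses `\n` or `\r\n` line endings). The names `ELLIPSIS_GAP` and `add_ellipses` are made up for illustration.

```python
import re

# Approximation of  [^.\r\n]\K(?=\R{2,})  for Python's re module:
# the lookbehind replaces \K, and (?:\r?\n) replaces \R
# (assumption: line endings are \n or \r\n only).
ELLIPSIS_GAP = re.compile(r"(?<=[^.\r\n])(?=(?:\r?\n){2,})")


def add_ellipses(text: str) -> str:
    """Insert U+2026 wherever a line not ending in a dot is followed by a blank line."""
    return ELLIPSIS_GAP.sub("\u2026", text)


# First four subtitles of the example, with Windows line endings.
sample = (
    "1\r\n00:00:10,000 --> 00:00:13,000\r\n"
    "All the living organisms\r\ncan co-exist\r\n\r\n"
    "2\r\n00:00:15,000 --> 00:00:18,000\r\n"
    "on earth's surface.\r\n\r\n"
    "3\r\n00:00:20,000 --> 00:00:23,000\r\n"
    "Earth is home to more than 7 billion humans\r\n\r\n"
    "4\r\n00:00:25,000 --> 00:00:28,000\r\n"
    "who live together, forming communities.\r\n"
)

result = add_ellipses(sample)
print(result)
```

In this sketch, running `add_ellipses` a second time on its own output would add another `…`, because `…` is itself a character other than a dot.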
Now, if you do not want to add a horizontal ellipsis (…) after a punctuation sign, prefer the following regex S/R:
SEARCH \w\K(?=\R{2,})
REPLACE … OR \x{2026}
This time, the text will be changed into:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom…

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.
Best Regards,
guy038
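As a rough check of the `\w` variant outside Notepad++, the following Python sketch applies an equivalent pattern to subtitles 5 to 8. It is only an approximation: a lookbehind replaces `\w\K`, `(?:\r?\n){2,}` replaces `\R{2,}`, and Python's `\w` may not classify every character exactly as Notepad++'s regex engine does. The function name `add_ellipses_after_word_chars` is hypothetical.

```python
import re

# Approximation of  \w\K(?=\R{2,})  for Python's re module
# (assumption: line endings are \n or \r\n only).
WORD_GAP = re.compile(r"(?<=\w)(?=(?:\r?\n){2,})")


def add_ellipses_after_word_chars(text: str) -> str:
    """Insert U+2026 only where a line ending in a word character precedes a blank line."""
    return WORD_GAP.sub("\u2026", text)


# Subtitles 5 to 8 of the example, here with Unix line endings.
sample = (
    '5\n00:00:30,000 --> 00:00:33,000\n'
    'In the first volume of the 10th edition of "Systema Naturae",\n\n'
    '6\n00:00:36,000 --> 00:00:39,000\n'
    'written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom\n\n'
    '7\n00:00:42,000 --> 00:00:45,000\n'
    'is broken down into six original classes of animals :\n\n'
    '8\n00:00:48,000 --> 00:00:51,000\n'
    'Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.\n'
)

result = add_ellipses_after_word_chars(sample)
print(result)
```

Only subtitle 6 ends in a word character before a blank line, so it is the only one changed; the lines ending in `,` and `:` are left alone.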
-
-
Amazing! Thank you all very very much, I am grateful. This is what I was looking for.
-
Hi, @silent-resident, @peterjones, @alan-kilborn, @dinkumoil, and All,
Peter, your search regex [^\.]\K(?=\R\R\d+$) (which could be written [^.]\K(?=\R\R\d+$)) does not work as expected! Why?
- Well, considering the first subtitle, in a Windows file, your regex first matches the zero-length gap between the letter t and the \r EOL character, and it correctly adds the … symbol. OK!
- Now, the regex engine location is between the two characters … and \r. At this location, is there a single char, different from a dot, which can be followed by two line-breaks? The answer is YES: it’s just the \r EOL char, which is followed by \n (so the first \R) and the \r\n couple (so the second \R). Thus, it adds a ... symbol between the \r and the \n of the first line-break.
- This situation cannot occur again right after: if it had chosen the \r char (as [^.]), then the only remaining EOL character would be the \n character, which cannot match the \R\R part of your regex! So, the next match happens, wrongly, between the \r and \n EOL characters, at the end of the line on earth’s surface. and so on... :-((
Luckily, 3 solutions are possible to get the right behaviour:
- The [^.\r\n]\K(?=\R\R\d+$) syntax, which forces the character before the EOL characters to be different from EOL chars!
- The [^.]$\K(?=\R\R\d+$) syntax. Adding the $ anchor forces the [^.] character to be located right before an end of line (so, obviously, not between the two EOL characters \r and \n). It’s the solution adopted by @dinkumoil!
- Finally, use the exact Windows EOL definition, with the [^.]\K(?=\r\n\r\n\d+$) syntax
Cheers,
guy038
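The double insertion described above can be reproduced, approximately, with Python's `re` module. This is a sketch under several assumptions, not the Boost engine used by Notepad++: `\K` is emulated with a lookbehind, `\R` with the hypothetical `LINE_BREAK` fragment (written so that a `\r\n` pair is never split into two breaks), and `\d+$` gets an optional `\r` because Python's `$` only recognises `\n`. The `$`-anchored variant is left out for that same reason.

```python
import re

# Stand-in for \R: one line break. The (?!\n) keeps the engine from
# treating the "\r" of a "\r\n" pair as a line break on its own.
LINE_BREAK = r"(?:\r\n|\n|\r(?!\n))"
# Stand-in for \d+$ on Windows files: Python's $ only sees "\n".
NEXT_INDEX = r"\d+\r?$"

# Emulation of  [^.]\K(?=\R\R\d+$)  (the pattern that misbehaves)
naive = re.compile(r"(?<=[^.])(?=" + LINE_BREAK + "{2}" + NEXT_INDEX + ")", re.MULTILINE)
# Emulation of  [^.\r\n]\K(?=\R\R\d+$)
no_eol = re.compile(r"(?<=[^.\r\n])(?=" + LINE_BREAK + "{2}" + NEXT_INDEX + ")", re.MULTILINE)
# Emulation of  [^.]\K(?=\r\n\r\n\d+$)
explicit_crlf = re.compile(r"(?<=[^.])(?=\r\n\r\n" + NEXT_INDEX + ")", re.MULTILINE)

# End of subtitle 1 through the index of subtitle 3, Windows line endings.
sample = (
    "can co-exist\r\n\r\n"
    "2\r\n00:00:15,000 --> 00:00:18,000\r\n"
    "on earth's surface.\r\n\r\n"
    "3\r\n"
)

naive_out = naive.sub("...", sample)
no_eol_out = no_eol.sub("...", sample)
explicit_out = explicit_crlf.sub("...", sample)

# repr() makes the \r and \n characters visible.
print(repr(naive_out))
print(repr(no_eol_out))
print(repr(explicit_out))
```

On this sample the naive pattern inserts `...` three times: once correctly after `co-exist`, and twice between a `\r` and its `\n`. The other two patterns insert it only once, after `co-exist`.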
-