How to detect lines lacking "..." at end?
- 
I use Notepad++ as my primary program for editing movie subtitles.
For years I have been struggling with one issue: the subtitles often lack the three dots (…) at the end of dialog breaks. How can I tell Notepad++ to detect the lines lacking the dots, and fix them?
For example, we have 2 dialogs, each broken into 2 parts, resulting in four subtitles:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist
2
00:00:15,000 --> 00:00:18,000
on earth’s surface.
3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans
4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.
Subtitles 2 and 4 are fine, so we ignore them. However, subtitles 1 and 3 are lacking the three dots (…) at their end. How can I tell Notepad++ to add “…”, like this:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…
3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…
- 
You seem to be asking to add “…” to any subtitle that doesn’t already end in a “.”. However, studying your data, I am guessing you really want to “add an ellipsis to any subtitle that doesn’t end with a sentence-ender”. Unfortunately, detecting all possible sentence-enders might be more difficult than you think. I mean, there’s . but there’s also ! and ?. And there are also lines that end in quotes – and an endquote doesn’t necessarily mean end of sentence…
I also cannot tell from your example data whether there’s a blank line between subtitles, but there probably is, and I’m going to assume that all subtitles end with a blank line.
I think, after you clarify that, we’ll be able to come up with a better solution.
-----
My naive guess for just “subtitles that don’t end with a period need an ellipsis” would have been
- FIND = [^\.]\K(?=\R\R\d+$)
- replace = ...
- mode = regular expression

but that seems to add two sets of ... for each, and I cannot quickly figure out why. But that’s a starting point. And if you need real end-of-sentence rather than just .-end-of-sentence detection, we can probably improve things… but it won’t be perfect. (though @guy038 will come close to perfection, I am sure.)
(Like that! That last parenthetical sentence is the end of a sentence, and wouldn’t need an ellipsis, but ends with something that I hadn’t thought of as being an allowable sentence-ender.)
 - 
What’s the definitive pattern? A lack of a trailing period on the last one of a sequence?
 - 
@Silent-Resident
My suggestion:
Find: [^[:punct:]]$\K(?=\R\R\d+$)
Replace: ...
Mode: Regular expression
@PeterJones
Due to the addition of $, it doesn’t add two sets of “...”. The [:punct:] character class should match all characters in question.
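For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same logic. It is only an approximation under stated assumptions: Python's re module has no \K, \R or [:punct:], so the sketch captures the last character of the line instead, spells out the line breaks by hand, and treats "punctuation" as ASCII punctuation plus the Unicode P* categories, which may not match what [:punct:] covers in Notepad++. The names NEWLINE, EOL, CANDIDATE, is_punct and add_dots are made up for this illustration, and CRLF or LF line endings are assumed.

```python
import re
import string
import unicodedata

# Stand-in for \R: one line break, never splitting a \r\n pair.
NEWLINE = r"(?:\r\n|\n|\r(?!\n))"
# Stand-in for $: end of a line, but not the gap between \r and \n.
EOL = r"(?<!\r)(?=\r?\n|\Z)"
# Last character of a line that is followed by a blank line and a subtitle number.
CANDIDATE = re.compile(r"(.)" + EOL + r"(?=" + NEWLINE + r"{2}\d+" + EOL + r")")

def is_punct(ch):
    # Assumption: approximates [:punct:] with ASCII punctuation + Unicode P* categories.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def add_dots(text):
    # Append "..." only when the last character is not punctuation.
    return CANDIDATE.sub(
        lambda m: m.group(1) + ("" if is_punct(m.group(1)) else "..."), text)

SAMPLE = "\r\n".join([
    "1", "00:00:10,000 --> 00:00:13,000", "All the living organisms", "can co-exist", "",
    "2", "00:00:15,000 --> 00:00:18,000", "on earth’s surface.", "",
    "3", "00:00:20,000 --> 00:00:23,000", "Earth is home to more than 7 billion humans", "",
    "4", "00:00:25,000 --> 00:00:28,000", "who live together, forming communities.", "",
])

fixed = add_dots(SAMPLE)
print(fixed)
```

On this sample only subtitles 1 and 3 gain the three dots; lines ending in a period are left alone.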
Hello @silent-resident, and All,
Quite easy! For instance, assuming your example, below, where I added 4 subtitles (from 5 to 8):
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.

Note: The text included in the last 4 subtitles comes from: https://en.wikipedia.org/wiki/10th_edition_of_Systema_Naturae#Animals
So, in plain language, your request could be: add a … sign at the end of any line which does not end with a dot and which is followed by at least 2 line-breaks. In that case, here is my solution:
- Open the Replace dialog (Ctrl + H)
- SEARCH [^.\r\n]\K(?=\R{2,})
- REPLACE … OR \x{2026}
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button, exclusively (do not use the Replace button!)
And you’ll get:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",…

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom…

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :…

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.
Now, if you do not want to add a horizontal ellipsis (…) after a punctuation sign, prefer the following regex S/R:
SEARCH \w\K(?=\R{2,})
REPLACE … OR \x{2026}
This time, the text will be changed into:
1
00:00:10,000 --> 00:00:13,000
All the living organisms
can co-exist…

2
00:00:15,000 --> 00:00:18,000
on earth’s surface.

3
00:00:20,000 --> 00:00:23,000
Earth is home to more than 7 billion humans…

4
00:00:25,000 --> 00:00:28,000
who live together, forming communities.

5
00:00:30,000 --> 00:00:33,000
In the first volume of the 10th edition of "Systema Naturae",

6
00:00:36,000 --> 00:00:39,000
written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom…

7
00:00:42,000 --> 00:00:45,000
is broken down into six original classes of animals :

8
00:00:48,000 --> 00:00:51,000
Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.

Best Regards,
guy038
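As a quick way to check which lines the \w variant would touch, here is a hedged Python sketch. It is an approximation: \K and \R do not exist in Python's re module, so a lookbehind on \w and an explicit line-break alternation are used instead. The helper build_srt, the placeholder timings and the names NEWLINE and WORD_END are assumptions made for this illustration only.

```python
import re

# Stand-in for \R: one line break, never splitting a \r\n pair.
NEWLINE = r"(?:\r\n|\n|\r(?!\n))"
# Position right after a word character, followed by 2 or more line breaks.
WORD_END = re.compile(r"(?<=\w)(?=" + NEWLINE + r"{2,})")

def build_srt(texts):
    # Hypothetical helper: numbered blocks with placeholder timings, CRLF endings.
    blocks = [f"{i}\r\n00:00:00,000 --> 00:00:01,000\r\n{t}"
              for i, t in enumerate(texts, 1)]
    return "\r\n\r\n".join(blocks) + "\r\n"

TEXTS = [
    "All the living organisms\r\ncan co-exist",
    "on earth’s surface.",
    "Earth is home to more than 7 billion humans",
    "who live together, forming communities.",
    'In the first volume of the 10th edition of "Systema Naturae",',
    "written by the Swedish naturalist Carolus Linnaeus, the Animal kingdom",
    "is broken down into six original classes of animals :",
    "Mammalia, Aves, Amphibia, Pisces, Insecta, & Vermes.",
]

result = WORD_END.sub("\u2026", build_srt(TEXTS))
print(result)
```

Only the subtitles ending in a letter or digit receive the ellipsis here; the ones ending in a comma, a colon or a period stay untouched.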
 - 
 - 
Amazing! Thank you all very very much, I am grateful. This is what I was looking for.
 - 
Hi, @silent-resident, @peterjones, @alan-kilborn, @dinkumoil, and All,
Peter, your search regex [^\.]\K(?=\R\R\d+$) (which could be written [^.]\K(?=\R\R\d+$)) does not work as expected! Why?
- Well, considering the first subtitle, in a Windows file, your regex first matches the zero-length gap between the letter t and the \r EOL character, and it correctly adds the … symbol. OK!
- Now, the regex engine location is between the two characters … and \r. At this location, is there a single char, different from a dot, which can be followed by two line-breaks? The answer is YES: it’s just the \r EOL char, which is followed by \n (so, the first \R) and the \r\n couple (so, the second \R). Thus, it adds a ... symbol between the \r and the \n of the first line-break
- This situation cannot occur again right afterwards: if it had chosen the \r char (as [^.]), then the remaining EOL characters would be only the \n character, which cannot match the \R\R part of your regex! So, the next match happens, wrongly, between the \r and \n EOL characters at the end of the line "on earth’s surface." and so on... :-((

Luckily, 3 solutions are possible to get the right behaviour:
- The [^.\r\n]\K(?=\R\R\d+$) syntax, which forces the character before the EOL characters to be different from EOL chars!
- The [^.]$\K(?=\R\R\d+$) syntax. Adding the $ anchor forces the [^.] character to be located right before an end of line (so, obviously, not between the two EOL characters \r and \n). It’s the solution adopted by @dinkumoil!
- Finally, use the exact Windows EOL definition, with the [^.]\K(?=\r\n\r\n\d+$) syntax
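The double insertion and the three fixes can be reproduced in a small Python sketch. It is only an emulation: Python's re module lacks \K and \R, and its $ does not treat \r\n as one line end, so a lookbehind, a hand-written line-break alternation and a custom end-of-line assertion are used in their place. The names NEWLINE, EOL, TAIL and PATTERNS are made up for this illustration, and the sample is the original four subtitles with CRLF line endings.

```python
import re

# Stand-in for \R: one line break, never splitting a \r\n pair.
NEWLINE = r"(?:\r\n|\n|\r(?!\n))"
# Stand-in for $: end of a line, but not the gap between \r and \n.
EOL = r"(?<!\r)(?=\r?\n|\Z)"
TAIL = r"\d+" + EOL  # the subtitle number that follows the blank line

PATTERNS = {
    "naive":            r"(?<=[^.])(?=" + NEWLINE * 2 + TAIL + r")",
    "exclude EOL chars": r"(?<=[^.\r\n])(?=" + NEWLINE * 2 + TAIL + r")",
    "$ anchor":         r"(?<=[^.])" + EOL + r"(?=" + NEWLINE * 2 + TAIL + r")",
    "explicit CRLF":    r"(?<=[^.])(?=\r\n\r\n" + TAIL + r")",
}

SAMPLE = "\r\n".join([
    "1", "00:00:10,000 --> 00:00:13,000", "All the living organisms", "can co-exist", "",
    "2", "00:00:15,000 --> 00:00:18,000", "on earth’s surface.", "",
    "3", "00:00:20,000 --> 00:00:23,000", "Earth is home to more than 7 billion humans", "",
    "4", "00:00:25,000 --> 00:00:28,000", "who live together, forming communities.", "",
])

# Number of insertions each variant would make on the sample.
counts = {name: re.subn(p, "...", SAMPLE)[1] for name, p in PATTERNS.items()}
naive_out = re.sub(PATTERNS["naive"], "...", SAMPLE)
fixed_out = re.sub(PATTERNS["exclude EOL chars"], "...", SAMPLE)

for name, n in counts.items():
    print(f"{name}: {n} insertion(s)")
print(repr(naive_out[:120]))
```

In this emulation the naive pattern also matches right after each \r, including the one at the end of the line that already ends with a period, while the three corrected variants insert only after subtitles 1 and 3.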
Cheers,
guy038
 -