Capitalization challenge with srt subtitle file - ignoring strings of text possible?
-
Seems simple (I think) to describe, but I’m not sure if it’s possible to solve it…
I have an .srt file with speaker diarization and it has capitalization errors that I’d like to auto correct.
So, I have a sentence like this:
12
00:00:43,110 --> 00:00:47,970
Speaker 0: and it took weeks to create this five second shot.
And I want to change it to:
12
00:00:43,110 --> 00:00:47,970
Speaker 0: And it took weeks to create this five second shot.
In this example, just the “and” would be changed to “And”.
With the “Convert Case to” -> “Sentence case (blend)” option, it considers “Speaker 0” to be the first word of the sentence and therefore does not capitalize “and”.
Is there a way to get it to ignore text in strings or something like that? So for example, if it could ignore text in brackets [ ], then I could search and replace "Speaker 0: " with “[Speaker 0: ]”, do the case conversion, then search and replace “[Speaker 0: ]” with "Speaker 0: " to change it back.
Otherwise, is there maybe a way to accomplish this with RegEx?
-
Regex could do it, but I wonder if it might be simpler to insert a carriage return/line feed after the colon (:) so that the sentence starts on a new line, then use the Convert Case option, then remove that carriage return/line feed. The only reason I say that is that the 3rd line might vary enough that a regex could either miss some lines or change others that shouldn’t be adjusted.
Terry
PS Of course, it would help to know the parameters of the 3rd line: does it always start with the word Speaker, and is that always followed by a number and a colon?
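If you'd rather script it than do the three manual steps, the same split-then-capitalize idea can be sketched in Python. This is only a sketch: the function name is made up for illustration, and it assumes every line to fix starts with the word Speaker and separates the label from the sentence with a colon and a space, which the single example suggests but doesn't guarantee.

```python
def capitalize_after_label(line: str, sep: str = ": ") -> str:
    """Capitalize the first character after a 'Speaker N: ' label.

    Assumption (not confirmed in the thread): labels always start with
    'Speaker' and are followed by a colon and a space.
    """
    # Split once at the first separator, like putting the sentence on its own line.
    label, found, text = line.partition(sep)
    if not found or not label.startswith("Speaker"):
        return line  # index lines, timestamps, and anything else stay untouched
    # Uppercase only the first character, then glue the pieces back together.
    return label + sep + text[:1].upper() + text[1:]


line = "Speaker 0: and it took weeks to create this five second shot."
print(capitalize_after_label(line))
```

Lines without the label (the cue number and the timestamp line) come back unchanged, so it should be safe to run over every line of the file.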
-
I don’t have NPP handy to test this right now but (famous last words), wouldn’t replacing
^([^:\r\n]+:\h*)([a-z])
with
${1}\u${2}
do the job? That will capitalize a single ASCII letter that follows the start of a line, then a speaker name, then a colon, then any amount of non-newline whitespace.
AFAICT the only way that doesn’t work is if the name of a speaker could have a literal : character in it, which it probably can’t.
-
@Jacob-Holmes said in Capitalization challenge with srt subtitle file - ignoring strings of text possible?:
Otherwise, is there maybe a way to accomplish this with RegEx?
So given the single example it might not be a good solution, but:
Using the Replace function:
Find What: (?-is)^Speaker\s*\d*:\s*\K(.)
Replace With: \U\1
Of course, the search mode must be Regular Expression, and you need to click on the “Replace All” button. It's best if the cursor is at the start of the file.
The \K will cause issues if you want to detect and replace one line at a time, hence the need for “Replace All”. So obviously you’d consider working on a copy of the file, just in case it causes more problems. If you try this and it fails, we’d need to know more about the issues. Then we either adjust this solution or start with a new idea.
Terry
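For anyone who wants to run this outside Notepad++, here is a rough Python port of the two regexes from this thread. It is a sketch under a few assumptions: Python's re module supports neither \K nor \h, so the label is captured and written back instead, and [ \t]* stands in for the whitespace part (which, unlike \s*, never runs on to the next line). The names SPEAKER_ONLY, ANY_LABEL and capitalize_first are made up for illustration.

```python
import re

# Port of (?-is)^Speaker\s*\d*:\s*\K(.) : the label is captured as group 1
# and re-emitted, because Python's re has no \K.
SPEAKER_ONLY = re.compile(r"^(Speaker\s*\d*:[ \t]*)(.)", re.MULTILINE)

# Port of ^([^:\r\n]+:\h*)([a-z]) : any label up to the first colon on a line.
ANY_LABEL = re.compile(r"^([^:\r\n]+:[ \t]*)([a-z])", re.MULTILINE)


def capitalize_first(pattern: re.Pattern, text: str) -> str:
    """Uppercase the character captured in group 2, keeping group 1 as is."""
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)


sample = (
    "12\n"
    "00:00:43,110 --> 00:00:47,970\n"
    "Speaker 0: and it took weeks to create this five second shot.\n"
)

fixed = capitalize_first(SPEAKER_ONLY, sample)
print(fixed)
```

The timestamp line is left alone by both patterns: it doesn't start with Speaker, and the character after its first colon is a digit rather than a lowercase letter. As with the Notepad++ version, try it on a copy of the file first.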