    Capitalization challenge with srt subtitle file - ignoring strings of text possible?

      Jacob Holmes

      Seems simple (I think) to describe, but I’m not sure if it’s possible to solve it…

      I have an .srt file with speaker diarization, and it has capitalization errors that I’d like to auto-correct.

      So, I have a sentence like this:

      12
      00:00:43,110 --> 00:00:47,970
      Speaker 0: and it took weeks to create this five second shot.
      

      And I want to change it to:

      12
      00:00:43,110 --> 00:00:47,970
      Speaker 0: And it took weeks to create this five second shot.
      

      In this example, just the “and” would be changed to “And”.

      With the “Convert Case to” -> “Sentence case (blend)” option, it considers “Speaker 0” to be the first word of the sentence and therefore does not capitalize “and”.

      Is there a way to get it to ignore certain strings of text? For example, if it could ignore text in brackets [ ], then I could search and replace “Speaker 0: ” with “[Speaker 0: ]”, do the case conversion, then search and replace “[Speaker 0: ]” with “Speaker 0: ” to change it back.

      Otherwise, is there maybe a way to accomplish this with RegEx?

        Terry R @Jacob Holmes

        @Jacob-Holmes

        Regex could do it, but I wonder whether it would be simpler to insert a carriage return/line feed after the : so that the sentence starts on a new line, then use the Convert Case option, then remove that carriage return/line feed.

        The only reason I say that is that the 3rd line may vary enough that a regex could miss some lines or change others that shouldn’t be adjusted.
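For anyone who would rather script this than do it by hand in the editor, here is a minimal Python sketch of the same split-at-the-colon idea. It is only an illustration under stated assumptions: the label is taken to end at the first colon followed by a space (timestamp lines never contain that pair), and a plain "uppercase the first letter" step stands in for the Convert Case menu option. The function name `capitalize_after_label` is made up for this example.

```python
def capitalize_after_label(srt_text: str) -> str:
    """Capitalize the first letter after the first ': ' on each line.

    Assumption: the speaker label ends at the first colon-plus-space.
    Timestamps such as 00:00:43,110 have no colon followed by a space,
    so those lines pass through unchanged.
    """
    fixed = []
    for line in srt_text.splitlines(keepends=True):
        # Split the line into label, separator, and sentence.
        label, sep, sentence = line.partition(": ")
        if sep and sentence[:1].islower():
            # Stand-in for the editor's case conversion: first letter only.
            line = label + sep + sentence[0].upper() + sentence[1:]
        fixed.append(line)
    return "".join(fixed)


sample = (
    "12\n"
    "00:00:43,110 --> 00:00:47,970\n"
    "Speaker 0: and it took weeks to create this five second shot.\n"
)
print(capitalize_after_label(sample))
```

Note that a subtitle line with a colon inside the sentence (for example "He said: hello") would also be changed, which is the kind of variability mentioned above.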

        Terry

        PS Of course, it would help to know the parameters of the 3rd line: does it always start with the word Speaker, and is that always followed by a number and a colon?

          Mark Olson

          I don’t have NPP handy to test this right now but (famous last words), wouldn’t replacing ^([^:\r\n]+:\h*)([a-z]) with ${1}\u${2} do the job? That will capitalize a single lowercase ASCII letter that follows, in order, the start of a line, a speaker name, a colon, and any amount of non-newline whitespace.

          AFAICT the only way that doesn’t work is if the name of a speaker could have a literal : character in it, which it probably can’t.
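If you want to try the same pattern outside Notepad++, the following is a rough Python translation, not a tested drop-in for the editor. Python's `re` module has no `\h` class and no `\u` case operator in replacement strings, so this sketch assumes `[ \t]` is close enough to "horizontal whitespace" and uses a callback to uppercase the letter. The names `LABEL_THEN_LOWER` and `capitalize_after_colon` are made up for illustration.

```python
import re

# Group 1: everything up to and including the first colon on the line,
# plus any spaces or tabs. Group 2: one lowercase ASCII letter.
# re.MULTILINE makes ^ match at the start of every line.
LABEL_THEN_LOWER = re.compile(r"^([^:\r\n]+:[ \t]*)([a-z])", re.MULTILINE)


def capitalize_after_colon(text: str) -> str:
    # The callback plays the role of ${1}\u${2} in the editor's replacement.
    return LABEL_THEN_LOWER.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )


sample = (
    "12\n"
    "00:00:43,110 --> 00:00:47,970\n"
    "Speaker 0: and it took weeks to create this five second shot.\n"
)
print(capitalize_after_colon(sample))
```

The timestamp line is left alone because the character after its first colon is a digit, not a lowercase letter. Any other label works too, so "Narrator: hello" would become "Narrator: Hello".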

            Terry R @Jacob Holmes

            @Jacob-Holmes said in Capitalization challenge with srt subtitle file - ignoring strings of text possible?:

            Otherwise, is there maybe a way to accomplish this with RegEx?

            So given the single example it might not be a good solution, but:
            Using the Replace function:
            Find What: (?-is)^Speaker\s*\d*:\s*\K(.)
            Replace With: \U\1

            Of course the search mode is Regular Expression and you need to click on the “Replace All” button. Best if the cursor is at the start of the file.

            The \K will cause issues if you want to detect and replace one line at a time, hence the need for the “Replace All”. So obviously you’d consider working on a copy of the file, just in case it caused more problems. If you did try this and it failed we’d need to know more about the issues. Then either we adjust this solution or start with a new idea.
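For comparison, here is a hedged Python sketch of the same stricter pattern, in case scripting it is easier than working in the editor. Python's `re` module does not support `\K`, so this version assumes a capture group for the label is an acceptable substitute, and it drops `(?-is)` because case-sensitive matching and a dot that stops at newlines are already Python's defaults. The names `SPEAKER_FIRST_CHAR` and `capitalize_speaker_lines` are made up for illustration.

```python
import re

# Group 1 replaces \K: the "Speaker <number>:" label is captured and put
# back unchanged. Group 2 is the first character of the sentence.
SPEAKER_FIRST_CHAR = re.compile(r"^(Speaker\s*\d*:\s*)(.)", re.MULTILINE)


def capitalize_speaker_lines(text: str) -> str:
    # Uppercase only the character right after the speaker label.
    return SPEAKER_FIRST_CHAR.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )


sample = (
    "12\n"
    "00:00:43,110 --> 00:00:47,970\n"
    "Speaker 0: and it took weeks to create this five second shot.\n"
)
print(capitalize_speaker_lines(sample))
```

Because the pattern insists on the literal word Speaker, a line such as "Narrator: hello" is left untouched, which is the trade-off of matching the label exactly.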

            Terry

            The Community of users of the Notepad++ text editor.