    Captions for video - Find and Replace across time stamps

    Help wanted
    30 Posts 5 Posters 1.8k Views
    • astrosofista @MaximillianM

      @MaximillianM, @PeterJones

      One solution could be as simple as adding a space before “like a” and capturing the second separator (where the timestamp may be) as a group in @PeterJones’ regex. Then, in the replacement expression, insert a reference to that newly created group, followed by the new string, as follows:

      Search: (?x-s) \x20like (\x20 | \R | \R* \d{1,2} : .*\R) a ((?1)) fire
      Replace: $2liquefy
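      For readers who want to check what each group holds, here is a rough port to plain Python 3 (not PythonScript). Python's built-in `re` module has no `\R` and no `(?1)`, so, as an assumption, `\R` is approximated by the usual line endings and the separator pattern is simply repeated; the names `NL` and `SEP` are mine, not from the thread.

```python
import re

# \R is not available in Python's re; approximate it with the common line endings.
NL = r'(?:\r\n|\n|\r)'
# Same three alternatives as the Notepad++ group: a space, a line break,
# or line break(s) followed by a timestamp line.
SEP = rf'(\x20|{NL}|{NL}*\d{{1,2}}:.*{NL})'

# Python has no (?1) subroutine call, so the separator pattern is repeated.
pattern = re.compile(rf'\x20like{SEP}a{SEP}fire')

text = ('0:00:17.680,0:00:20.400\nvaporize like a\n\n'
        '0:00:19.840,0:00:22.400\nfire\n')

m = pattern.search(text)
print(repr(m.group(1)))  # ' '  (the separator between "like" and "a")
print(repr(m.group(2)))  # '\n\n0:00:19.840,0:00:22.400\n'  (the bridged timestamp)

# Equivalent of  Replace: $2liquefy
result = pattern.sub(r'\g<2>liquefy', text)
print(result)
```

      Substituting `\g<1>` instead would put back only the single space between “like” and “a”, so the timestamp would be lost.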
      

      Hope this helps

      • Alan Kilborn @astrosofista

        I think what the OP is looking for here is basically a way to search for a phrase that could have a timestamp inserted into it at any word position, i.e., only between words; a timestamp will never be inserted into the middle of a word.

        Then the OP wants the ability to replace under the same criterion.
        It also appears that the replacement can occur wholly after the timestamp (the portion of the search match before the timestamp is removed) and things still work.

        I think what might be best here is a script; the script would still use regular expressions, but it would allow the user to very naturally search for a simple phrase (user specifies “like a fire”) and replace it with a phrase (user specifies “liquefy”), and the script takes care of the rest.

        • guy038

          Hello, @MaximillianM, @peterjones, @alan-kilborn, @astrosofista and All,

          Let’s expand the problem a bit and imagine this text :

          0:00:17.680,0:00:20.400
          The licenses for most software are designed to
          
          0:00:19.840,0:00:22.400
          take away your freedom to share and change it.
          

          And suppose that the OP wants to :

          • Find the string designed to take away

          • Replace it with the string generally made to always suppress

          Right now, the timestamp is located between the two strings designed to and take away

          Now, let’s use the | symbol to represent the timestamp line

          Which replacement is expected by @MaximillianM ?

          designed to | take away => generally | made to always suppress    ( Case A )
          designed to | take away => generally made | to always suppress    ( Case B )
          designed to | take away => generally made to | always suppress    ( Case C )
          designed to | take away => generally made to always | suppress    ( Case D )
          

          To my mind, the most logical one would be Case B, as the number of words before the timestamp would be the same after the replacement as before it ( two ) !
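          The Case B rule can be stated mechanically: put as many replacement words before the timestamp as there were search words before it, and the rest after. A minimal Python sketch of that rule; `split_case_b` is a hypothetical helper name, not something from the thread:

```python
def split_case_b(before_find, replacement):
    """Split `replacement` so that the number of words before the timestamp
    stays the same as in `before_find` (the "Case B" rule)."""
    n = len(before_find.split())
    words = replacement.split()
    # Surplus words go after the timestamp; if the replacement is shorter,
    # everything stays before it and the part after is empty.
    return ' '.join(words[:n]), ' '.join(words[n:])


before, after = split_case_b('designed to', 'generally made to always suppress')
print(before, '|', after)  # generally made | to always suppress
```

          With a one-word replacement such as “liquefy”, this rule keeps the word before the timestamp, whereas placing it after the timestamp is what MaximillianM later describes as the preferred default.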

          …

          Best Regards,

          guy038

          • Alan Kilborn @guy038

            @guy038 said in Captions for video - Find and Replace across time stamps:

            To my mind, the more logical one would be Case B

            But we really don’t know what the OP’s needs in this regard are.
            I mean, well, if you want to have fun, go off and solve any problem you’d like. :-)

            For me, I’ll wait to see if OP returns again, clarifies need, and expresses interest in the scripted solution.

            The scripted solution I envision would prompt with an input box for the string to search for, then would prompt with an input box for the replacement. The script would run and use some variant of Peter’s solution behind the scenes.

            • MaximillianM

              Thank you all so much! You are great! This really helps.

              I tested putting the replacement text after the second timestamp using $2, and it works. Placing it after the second timestamp is probably also the best default option. The only real “error” would be if the replacement left a caption line completely blank.
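              That blank-caption case can at least be detected after the replacements have run. A minimal Python 3 sketch, assuming the timestamp format shown in this thread (`0:00:19.840,0:00:22.400`) always starts its own line and the caption text is on the next line; `blank_captions` is a hypothetical name:

```python
import re

# Assumed timestamp format, e.g. 0:00:19.840,0:00:22.400
TIMESTAMP = r'\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}'
# A timestamp line whose following line is empty (or missing at end of file).
BLANK_CAPTION = re.compile(
    rf'^({TIMESTAMP})[ \t]*\r?\n[ \t]*(?:\r?\n|\Z)', re.MULTILINE)


def blank_captions(text):
    """Return the timestamps whose caption line is empty."""
    return BLANK_CAPTION.findall(text)


sample = ('0:00:17.680,0:00:20.400\nvaporize\n\n'
          '0:00:19.840,0:00:22.400\n\n'
          '0:00:22.400,0:00:24.300\nnext phrase\n')
print(blank_captions(sample))  # ['0:00:19.840,0:00:22.400']
```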

              I hadn’t known about the scripting option and have only used recorded macros. Macros do work, but once created they are not that easy to edit.

              Since you have been so helpful, I’ll outline the full desired outcome, which might call for a script.

              1-Find and replace a list of words. Words might be added to or removed from this list in the future. My current list has about 90 words.

              Most are just simple words, so a one-to-one replacement on the same line works.
              For example, replace “cuz” with “because”, or replace “um” with nothing (blank). Ideally, this list could be viewed and edited:

              WordToFind, Word to Replace
              cuz, because
              um, [blank]
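              A list in that “find, replace” shape is easy to read and apply from a script. A minimal Python 3 sketch, run outside Notepad++, assuming one pair per line split at the first comma and the literal `[blank]` meaning “replace with nothing” (`parse_word_list` and `apply_word_list` are hypothetical names):

```python
import re

# Hypothetical contents of the word list; "[blank]" means "replace with nothing".
WORD_LIST = """\
cuz, because
um, [blank]
"""


def parse_word_list(text):
    """Turn 'find, replace' lines into a list of (find, replace) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        find, repl = (part.strip() for part in line.split(',', 1))
        pairs.append((find, '' if repl == '[blank]' else repl))
    return pairs


def apply_word_list(text, pairs):
    for find, repl in pairs:
        # \b ... \b matches whole words only; the lambda inserts repl literally.
        text = re.sub(r'\b' + re.escape(find) + r'\b', lambda _m, r=repl: r, text)
    return text


pairs = parse_word_list(WORD_LIST)
print(pairs)
print(repr(apply_word_list('um I did it cuz I could', pairs)))
```

              Note the leading space left where “um” was removed; a real script would probably also tidy doubled or leading spaces. The `\b` word boundaries keep “um” from matching inside words such as “umbrella”, and the search is case-sensitive as written.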

              2-Then phrases that could span lines: the regular-expression ones.
              These could go in the same list/table as above, with the full regex, or in a separate list/table.
              WordToFind-----------Word to Replace
              (?x-s) \x20like (\x20 | \R | \R* \d{1,2} : .*\R) a ((?1)) fire-----------------$2liquefy

              Or, since you really know your stuff, maybe something fancier that takes the phrase
              “like a fire” and generates the code (?x-s) \x20like (\x20 | \R | \R* \d{1,2} : .*\R) a ((?1)) fire
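              Generating the pattern from the phrase is mostly string joining: escape each word and put the separator pattern between the words. A minimal sketch in plain Python 3; because Python's `re` lacks `\R` and `(?1)`, the separator is written out in full (the names `SEP` and `phrase_to_regex` are assumptions for illustration):

```python
import re

NL = r'(?:\r\n|\n|\r)'  # stand-in for \R, which Python's re lacks
# Between two words: a space, a line break, or line break(s) plus a timestamp line.
SEP = rf'(?:\x20|{NL}|{NL}*\d{{1,2}}:.*{NL})'


def phrase_to_regex(phrase):
    """Turn a plain phrase such as 'like a fire' into a pattern that also
    matches when caption breaks or timestamps fall between its words."""
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(SEP.join(words))


text = ('0:00:17.680,0:00:20.400\nvaporize like a\n\n'
        '0:00:19.840,0:00:22.400\nfire\n')
m = phrase_to_regex('like a fire').search(text)
print(repr(m.group(0)))  # 'like a\n\n0:00:19.840,0:00:22.400\nfire'
```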

              Thanks again! You have really been helping me out :-)

              • Alan Kilborn @MaximillianM

                @MaximillianM said in Captions for video - Find and Replace across time stamps:

                “like a fire” and creates the code (?x-s) \x20like (\x20 | \R | \R* \d{1,2} : .*\R) a ((?1)) fire

                Yes, something like that was what I had in mind.

                Would your search “phrase” ever have TWO+ embedded timestamps in it? Or is that a case that we don’t need to consider…?

                Let me absorb your specs and see if I can put together a reasonable demo.

                • MaximillianM @Alan Kilborn

                  @Alan-Kilborn Thanks! Spanning two timestamps is a possibility, and it would be great to consider for some longer phrases. You keep exceeding my expectations :-) Though if it is too complicated, spanning one is still really good.

                  • Alan Kilborn

                    @MaximillianM

                    So let’s start simple and go slowly. :-)

                    Here’s a test PythonScript that can demo the functionality:

                    # -*- coding: utf-8 -*-
                    
                    from Npp import editor, notepad
                    
                    class T1(object):
                    
                        def __init__(self):
                            search_phrase = 'like a fire'
                            while True:
                                search_phrase = notepad.prompt('\r\nEnter search phrase and press OK to find next:', '', search_phrase)
                                if search_phrase == None or len(search_phrase) == 0: return  # quit
                                word_list = search_phrase.strip().split()
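                                # group 1 is only DEFINE'd (never matched on its own): a space, a line break,
                                # or line break(s) followed by a timestamp line; (?1) reuses it between the words
                                regex = r'(?-is)(?(DEFINE)(\x20|\R|\R*\d{1,2}:.*\R))' + '(?1)'.join(word_list)
                                matches = []
                                # search from the caret to the end of the document, stopping after the first match
                                editor.research(regex, lambda m: matches.append(m.span(0)), 0, editor.getCurrentPos(), editor.getLength(), 1)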
                                if len(matches) == 0:
                                    notepad.messageBox('No (more) matches', '')
                                    return
                                else:
                                    (match_start, match_end) = matches[0]
                                    editor.scrollRange(match_end, match_start)
                                    editor.setSelection(match_end, match_start)
                    
                    if __name__ == '__main__': T1()
                    

                    When you run it, it will prompt you for your search phrase.

                    When you enter it and press OK, you will be shown your first match (it will become selected) and it will prompt again.

                    It will keep moving downward through the file until there are no more matches or you press Cancel.

                    Peter has some good instructions for getting started with the PythonScript plugin HERE.
                    See if you can get what I’ve shown working for you, and then we’ll talk about where to take it from here.

                    • MaximillianM @Alan Kilborn

                      @Alan-Kilborn Thanks, yes, I was able to get it to work!

                      • MaximillianM @Alan Kilborn

                        @Alan-Kilborn Hi Alan, thanks for your help on this project. Can you point me in the right direction for the next step: adding code that uses a preset list of words to find and replace? Thanks again :-)

                        • Alan Kilborn @MaximillianM

                          @MaximillianM said in Captions for video - Find and Replace across time stamps:

                          use a preset list of words to find and replace

                          Sure; again let’s start small…

                          Let’s just put the list inside the code, but maybe make it look like it is coming from a file (because you may want to go there later).

                          So I’d suggest a list like this:

                          ::findable_you:replaceable_you
                          :I can contain spaces:So I see
                          

                          where the format is:

                          • one-character delimiter
                          • find word/phrase
                          • delimiter (as previously defined by first character)
                          • replace word/phrase
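                          Parsing one entry of that format takes only a couple of lines. A small sketch in plain Python (`parse_definition` is a hypothetical name); it splits at the first delimiter after the find text only, so the replacement may itself contain the delimiter character:

```python
def parse_definition(definition):
    """Split an entry such as ':find:replace' into (find, replace).

    The first character is the delimiter for this entry only.
    """
    delim = definition[0]
    # maxsplit=1: everything after the second delimiter is replacement text.
    find_what, repl_with = definition[1:].split(delim, 1)
    return find_what, repl_with


for entry in [':findable_you:replaceable_you',
              ':I can contain spaces:So I see',
              '!a:b!c:d']:
    print(parse_definition(entry))
```

                          An entry without a second delimiter raises `ValueError`, which a real script might want to report rather than crash on.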

                          In Python we might do it like this:

                          the_list = [
                              ':findable_you:replaceable_you',
                              ':I can contain spaces:So I see',
                              ':look_for_me:really_want_to_be_you',
                              '!simple!complex',
                              '$fire$liquify',
                          ]
                          

                          We need some code to process that list:

                          # -*- coding: utf-8 -*-
                          
                          from Npp import editor
                          
                          class T2(object):
                          
                              def __init__(self):
                          
                                  the_list = [
                                      ':findable_you:replaceable_you',
                                      ':I can contain spaces:So I see',
                                      ':look_for_me:really_want_to_be_you',
                                      '!simple!complex',
                                      '$fire$liquify',
                                  ]
                          
                                  editor.beginUndoAction()
                                  for definition in the_list:
                                      delim = definition[0]
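                                      # maxsplit=1: only the first delimiter after the find text splits the entry,
                                      # so the replacement part may itself contain the delimiter
                                      (find_what, repl_with) = definition[1:].split(delim, 1)
                                      # plain-text (non-regex) replace throughout the document
                                      editor.replace(find_what, repl_with)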
                                  editor.endUndoAction()
                          
                          if __name__ == '__main__': T2()
                          
                          • MaximillianM @Alan Kilborn

                            @Alan-Kilborn Thanks! Yes, I like the idea of making it a separate file for ease of updates.
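                            Moving the list into its own file could look like the following minimal Python 3 sketch, which reads the same delimiter-first format, one entry per line, and skips empty lines. The `load_definitions` helper and the temporary demo file are hypothetical; inside Notepad++ the resulting pairs would still be fed to `editor.replace` as in the script above.

```python
import os
import tempfile


def load_definitions(path):
    """Read one <delim>find<delim>replace entry per line; skip empty lines."""
    pairs = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\r\n')
            if not line:
                continue
            delim = line[0]
            find_what, repl_with = line[1:].split(delim, 1)
            pairs.append((find_what, repl_with))
    return pairs


# Demo with a throwaway file standing in for the real list file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write(':findable_you:replaceable_you\n!simple!complex\n\n$fire$liquefy\n')
definitions = load_definitions(tmp.name)
os.remove(tmp.name)
print(definitions)
```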

                            I experimented with the code.

                            What is the difference between using the ! or $ for the word in the list? They seem to do a similar find/replace in the example, at least in my small test. But I’m probably missing something.

                            The next step seems to be adding the multi-line search code from your earlier example. How can I do that?

                            Thanks!

                            • Alan Kilborn @MaximillianM

                              @MaximillianM said in Captions for video - Find and Replace across time stamps:

                              What is the difference between using the ! or $ for the word in the list? They seem to do a similar find/replace in the example, at least in my small test. But I’m probably missing something.

                              So the “delimiter” variability is useful if the data itself contains the delimiter.

                              Say we hardcoded the delimiter to be a colon (:).
                              Then if you wanted to replace something like a:b with c:d it would be difficult.

                              The way I defined it, you could just use a different delimiter for this case, e.g. !a:b!c:d.

                              • Alan Kilborn @MaximillianM

                                @MaximillianM said in Captions for video - Find and Replace across time stamps:

                                The next step seems to be adding the search over multiple lines code as from your earlier example. How can I do that?

                                This is the point where I’m having trouble envisioning how it would work.
                                I know you said something about it before, but I didn’t quite understand it.

                                Would you put a special symbol in the replacement part that you’d want the timestamp to be replaced by?
                                Maybe a more in-depth walk-through (example(s)) of what is wanted?

                                I’m certainly willing to do it, or at least help you get started…

                                • MaximillianM @Alan Kilborn

                                  @Alan-Kilborn Thanks again. I see you are helping many other people, so I should have put in a summary to help you :-)

                                  The problem
                                  1-Simple Find and replace with a list of words (your most recent code does this)
                                  2-Find and replace multi-word string that goes across a timestamp
                                  Find “like a fire” replace with $2liquefy

                                  0:00:17.680,0:00:20.400
                                  vaporize like a

                                  0:00:19.840,0:00:22.400
                                  fire

                                  I would like to combine the most recent code with this one (minus the manual entry box, so that it uses the list as in your most recent code) so that it searches across the timestamp.

                                  # -*- coding: utf-8 -*-

                                  from Npp import editor, notepad

                                  class T1(object):

                                      def __init__(self):
                                          search_phrase = 'like a fire'
                                          while True:
                                              search_phrase = notepad.prompt('\r\nEnter search phrase and press OK to find next:', '', search_phrase)
                                              if search_phrase == None or len(search_phrase) == 0: return  # quit
                                              word_list = search_phrase.strip().split()
                                              regex = r'(?-is)(?(DEFINE)(\x20|\R|\R*\d{1,2}:.*\R))' + '(?1)'.join(word_list)
                                              matches = []
                                              editor.research(regex, lambda m: matches.append(m.span(0)), 0, editor.getCurrentPos(), editor.getLength(), 1)
                                              if len(matches) == 0:
                                                  notepad.messageBox('No (more) matches', '')
                                                  return
                                              else:
                                                  (match_start, match_end) = matches[0]
                                                  editor.scrollRange(match_end, match_start)
                                                  editor.setSelection(match_end, match_start)

                                  if __name__ == '__main__': T1()

                                  Thanks again :-)
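                                  One possible way to combine the two ideas, sketched in plain Python 3 rather than as a PythonScript: build a pattern from each list phrase, capture every separator between its words, and put the replacement after any separator that contains a line break (and therefore any bridged timestamp), mirroring the `$2liquefy` default. All names here are hypothetical, and the behaviour is an assumption about what is wanted; with two bridged timestamps, for example, the caption between them is left empty.

```python
import re

NL = r'(?:\r\n|\n|\r)'  # stand-in for \R, which Python's re lacks
# One captured separator between words: a space, a line break,
# or line break(s) followed by a timestamp line.
SEP = rf'(\x20|{NL}|{NL}*\d{{1,2}}:.*{NL})'


def phrase_pattern(phrase):
    words = [re.escape(w) for w in phrase.split()]
    # Group 1 is an optional leading space; the lookarounds keep whole words.
    return re.compile(r'(\x20?)(?<!\w)' + SEP.join(words) + r'(?!\w)')


def replace_phrase(text, phrase, replacement):
    def _sub(m):
        lead, seps = m.group(1), m.groups()[1:]
        # Separators containing a line break carry the caption/timestamp
        # structure, so keep them and put the replacement after them.
        kept = ''.join(s for s in seps if '\n' in s or '\r' in s)
        if kept:
            return kept + replacement
        # No line break inside the match: keep the leading space,
        # unless the phrase is being deleted outright.
        return (lead if replacement else '') + replacement
    return phrase_pattern(phrase).sub(_sub, text)


def apply_list(text, pairs):
    for phrase, replacement in pairs:
        text = replace_phrase(text, phrase, replacement)
    return text


pairs = [('like a fire', 'liquefy'), ('cuz', 'because'), ('um', '')]
text = ('0:00:17.680,0:00:20.400\nvaporize like a\n\n'
        '0:00:19.840,0:00:22.400\nfire\n')
print(apply_list(text, pairs))
```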

                                  • Alan Kilborn @MaximillianM

                                    @MaximillianM said in Captions for video - Find and Replace across time stamps:

                                    Find “like a fire” replace with $2liquefy
                                    0:00:17.680,0:00:20.400
                                    vaporize like a
                                    0:00:19.840,0:00:22.400
                                    fire

                                    Yes, you used this example before, but I didn’t fully understand it.
                                    I take it the $2 represents the bridged timestamp, and where it would appear in the replacement text.

                                    BTW, why $2?
                                    Is it because a search match could possibly bridge two timestamps?
                                    And $1 might possibly appear in the replace expression as well?
                                    Or $3 etc?

                                    • MaximillianM @Alan Kilborn

                                      @Alan-Kilborn Hi, I’m a beginner and was just using the $2 that astrosofista suggested in this post, so I don’t fully understand it.

                                      He suggested
                                      Search: (?x-s) \x20like (\x20 | \R | \R* \d{1,2} : .*\R) a ((?1)) fire
                                      Replace: $2liquefy

                                      If I don’t put $2 before the replacement word, the timestamp is removed by the replacement. As a test, I tried $1 and $3 before the replacement word, and the timestamp was removed in both cases.

                                      Putting the replacement text in the second part of the string (after the second timestamp) is preferable, as that is the most likely scenario.

                                      Just one bridged timestamp is the basic requirement; in the future I might look at spanning multiple timestamps.

                                      There is a blank line between each timestamp/phrase block, as in the example below.
                                      $1, $2, $3 would never be present in the text itself, so they are OK to use in our expression.

                                      Find “like a fire” replace (end of expression) with liquefy

                                      0:00:17.680,0:00:20.400
                                      vaporize like a

                                      0:00:19.840,0:00:22.400
                                      fire

                                      0:00:22.400,0:00:24.300
                                      next phrase

                                      Thanks :-)

                                      • Alan Kilborn @MaximillianM

                                        @MaximillianM said in Captions for video - Find and Replace across time stamps:

                                        I’m a beginner and was just using the $2 that astrosofista suggested in this post so I don’t fully understand it.

                                        I’m not a beginner, but I don’t see how this is going to work in the bigger scheme of things. I mean, well, maybe I can see it if I squint at it, but I don’t have the desire/time to sort out the regexes needed down to the Nth level so that every situation is covered.

                                        I think finding the matches is one level of difficulty (which has already been conquered), but replacing them introduces a whole new level of complexity. Even a single bridged timestamp can be nuanced when you really think about some of the examples a generic replace could encounter.

                                        I’m sorry if I misrepresented that I would do the whole solution for your “list” based replacement. My intent was to demo a few things to show what’s possible with scripting, not come up with a full-blown solution for some very specific data.

                                        If someone else (@guy038 loves to do this sort of thing, or maybe @PeterJones since he got the original ball rolling) is willing to do it, I can certainly help put together the final script using the information. What is needed is a find/replace regex pair that would walk through a document doing the replacements desired.

                                        • guy038

                                          Hello, @maximillianm, @peterjones, @alan-kilborn, @astrosofista and All,

                                          @maximillianm, I assume that the timestamp always begins a line of your file, without any leading blank characters ! If that’s not the case, just tell me !

                                          Here is a generic regex S/R which searches for any range of text containing one timestamp line, and replaces it with any range of text, still containing the same timestamp :

                                          SEARCH (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)|(?-i)Before_Find_Text((?1))After_Find_Text

                                          REPLACE Before_Replace_Text\2After_Replace_Text

                                          where :

                                          • Before_Find_Text represents the text to search, located BEFORE the time-stamp line

                                          • After_Find_Text represents the text to search, located AFTER the time-stamp line

                                          • Before_Replace_Text represents the text to replace BEFORE the time-stamp line

                                          • After_Replace_Text represents the text to replace AFTER the time-stamp line
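                                          Since only the four texts change, the SEARCH/REPLACE pair can be generated by string concatenation. A small Python sketch that does this (the helper names are hypothetical); as an added precaution not in the original recipe, regex metacharacters in the two find texts are backslash-escaped, while the two replace texts are used as-is, so a `$` or `\` in them would still need escaping by hand:

```python
# The timestamp group of the generic SEARCH regex, copied verbatim (Boost syntax).
TS_GROUP = r'(\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)'
META = set('\\^$.|?*+()[]{}')


def npp_escape(text):
    """Backslash-escape regex metacharacters so `text` is matched literally."""
    return ''.join('\\' + c if c in META else c for c in text)


def build_sr(before_find, after_find, before_repl, after_repl):
    search = (TS_GROUP + '|(?-i)' + npp_escape(before_find)
              + '((?1))' + npp_escape(after_find))
    replace = before_repl + r'\2' + after_repl
    return search, replace


search, replace = build_sr('vaporize like a', 'fire', 'vaporize', 'liquefy')
print('SEARCH ', search)
print('REPLACE', replace)
```

                                          The printed pair is the same as the SEARCH/REPLACE pair of the first example.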


                                          First example :

                                          Given your initial text :

                                          0:00:17.680,0:00:20.400
                                          vaporize like a
                                          
                                          0:00:19.840,0:00:22.400
                                          fire
                                          

                                          And the expected result :

                                          0:00:17.680,0:00:20.400
                                          vaporize
                                          
                                          0:00:19.840,0:00:22.400
                                          liquefy
                                          

                                          The different variable parts of the generic regex S/R are :

                                          • Before_Find_Text = vaporize like a

                                          • After_Find_Text = fire

                                          • Before_Replace_Text = vaporize

                                          • After_Replace_Text = liquefy

                                          which gives the functional regex S/R :

                                          SEARCH (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)|(?-i)vaporize like a((?1))fire

                                          REPLACE vaporize\2liquefy


                                          Second example :

                                          Given this initial example, taken from my previous post :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software are designed to
                                          
                                          0:00:19.840,0:00:22.400
                                          take away your freedom to share and change it.
                                          

                                          And the expected result :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software are generally made to
                                          
                                          0:00:19.840,0:00:22.400
                                          always suppress your freedom to share and change it.
                                          

                                          The different variable parts of the generic regex S/R are, this time :

                                          • Before_Find_Text = designed to

                                          • After_Find_Text = take away

                                          • Before_Replace_Text = generally made to

                                          • After_Replace_Text = always suppress

                                          which gives the functional regex S/R :

                                          SEARCH (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)|(?-i)designed to((?1))take away

                                          REPLACE generally made to\2always suppress


                                          Third example :

                                          Given, again, this initial example, taken from my previous post :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software are designed to
                                          
                                          0:00:19.840,0:00:22.400
                                          take away your freedom to share and change it.
                                          

                                          And the expected result :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software are
                                          
                                          0:00:19.840,0:00:22.400
                                          designed to suppress your freedom.
                                          

                                          The different variable parts of the generic regex S/R are, this time :

                                          • Before_Find_Text = are designed to

                                          • After_Find_Text = take away your freedom to share and change it

                                          • Before_Replace_Text = are

                                          • After_Replace_Text = designed to suppress your freedom

                                          which gives the functional regex S/R :

                                          SEARCH (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)|(?-i)are designed to((?1))take away your freedom to share and change it

                                          REPLACE are\2designed to suppress your freedom


                                          Fourth example :

                                          Given this initial example :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software are designed to
                                          
                                          0:00:19.840,0:00:22.400
                                          take away your freedom to share and change it.
                                          

                                          And the expected result :

                                          0:00:17.680,0:00:20.400
                                          The licenses for most software
                                          
                                          0:00:19.840,0:00:22.400
                                          prevent you from sharing and changing it.
                                          

                                          The different variable parts of the generic regex S/R are, this time :

                                          • Before_Find_Text = software are designed to

                                          • After_Find_Text = take away your freedom to share and change

                                          • Before_Replace_Text = software

                                          • After_Replace_Text = prevent you from sharing and changing

                                          which gives the functional regex S/R :

                                          SEARCH (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F)|(?-i)software are designed to((?1))take away your freedom to share and change

                                          REPLACE software\2prevent you from sharing and changing
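To try the fourth example outside Notepad++, here is a minimal Python sketch of the same four-part generic S/R. It is not guy038's method as such, only a translation under some assumptions: Python's `re` module has no `\R`, no subroutine call `(?1)` and no `(*F)` verb, so the timestamp-block pattern is written once as a string and interpolated, and `\R` is spelled out as `(?:\r\n|\n|\r)`. The helper name `generic_sr` is hypothetical.

```python
import re

# Notepad++ (Boost) syntax has \R, (?1) and (*F); Python's re has none of
# them, so the timestamp-block pattern is built once as a string and reused.
LINE_BREAK = r"(?:\r\n|\n|\r)"          # stands in for \R
TIME = r"\d{1,2}:\d\d:\d\d\.\d{3}"      # e.g. 0:00:19.840
TIMESTAMP_BLOCK = rf"{LINE_BREAK}+{TIME},{TIME}{LINE_BREAK}"


def generic_sr(text, before_find, after_find, before_replace, after_replace,
               ignore_case=False):
    """Hypothetical helper mirroring the four-part generic regex S/R.

    Finds Before_Find_Text + timestamp block + After_Find_Text and rewrites
    it as Before_Replace_Text + the same timestamp block + After_Replace_Text.
    """
    pattern = (re.escape(before_find) + f"({TIMESTAMP_BLOCK})"
               + re.escape(after_find))
    flags = re.IGNORECASE if ignore_case else 0   # like (?i) vs (?-i)
    # Group 1 here plays the role of group 2 (\2) in the Notepad++ version.
    return re.sub(pattern,
                  lambda m: before_replace + m.group(1) + after_replace,
                  text, flags=flags)


captions = ("0:00:17.680,0:00:20.400\n"
            "The licenses for most software are designed to\n"
            "\n"
            "0:00:19.840,0:00:22.400\n"
            "take away your freedom to share and change it.\n")

print(generic_sr(captions,
                 "software are designed to",
                 "take away your freedom to share and change",
                 "software",
                 "prevent you from sharing and changing"))
```

A function replacement is used instead of a `\1`-style template so that backslashes in user-supplied replacement text are never misread as group references.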


                                          Notes :

                                          • The first alternative of this search regex, (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R)(*F), can never produce a match, because the backtracking control verb (*F) forces the match attempt to fail.

                                          • However, the regex (\R+\d{1,2}:\d\d:\d\d\.\d{3},\d{1,2}:\d\d:\d\d\.\d{3}\R), which would match one or more line breaks followed by a complete timestamp line, is stored as group 1, for later reuse in the second alternative of the search regex, through the subroutine call (?1).

                                          • As you can see, the timestamp 0:00:19.840,0:00:22.400 is kept after replacement, because it is captured in group 2: the text currently matched by the subroutine call (?1), inside the regex part ((?1)).

                                          • If you prefer a case-insensitive search, simply replace the (?-i) part with (?i)
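The first and last notes can be checked in Python as a small sketch. Assumptions: Python's `re` has no `(*F)`, so the empty negative lookahead `(?!)`, which also always fails, stands in for it; Python is case-sensitive by default, which is the effect of `(?-i)`, and a leading `(?i)` switches that off.

```python
import re

# Same timestamp-block pattern as group 1 above, with \R spelled out,
# because Python's re module has no \R escape.
LINE_BREAK = r"(?:\r\n|\n|\r)"
TIME = r"\d{1,2}:\d\d:\d\d\.\d{3}"
TIMESTAMP_BLOCK = rf"{LINE_BREAK}+{TIME},{TIME}{LINE_BREAK}"

text = ("The licenses for most software are designed to\n"
        "\n"
        "0:00:19.840,0:00:22.400\n"
        "take away your freedom to share and change it.\n")

# On its own, the timestamp block is found in the text...
found_alone = re.search(TIMESTAMP_BLOCK, text) is not None
# ...but followed by (?!), which always fails like (*F), it never matches.
found_with_fail = re.search(TIMESTAMP_BLOCK + r"(?!)", text) is not None

# Case-sensitive by default; a leading (?i) makes the search insensitive.
phrase = r"SOFTWARE are designed to"
sensitive = re.search(phrase, text) is not None
insensitive = re.search(r"(?i)" + phrase, text) is not None

print(found_alone, found_with_fail, sensitive, insensitive)  # True False False True
```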

                                          Best regards

                                          guy038

                                          • Alan Kilborn
                                            Alan Kilborn

                                            @guy038 said:

                                            Before_Find_Text = vaporize like a
                                            After_Find_Text = fire

                                            Before_Find_Text = designed to
                                            After_Find_Text = take away

                                            Before_Find_Text = are designed to
                                            After_Find_Text = take away your freedom to share and change it

                                            Before_Find_Text = software are designed to
                                            After_Find_Text = take away your freedom to share and change

                                            But the OP wants to simply specify the following for each of those searches:

                                            • vaporize like a fire
                                            • designed to take away
                                            • are designed to take away your freedom to share and change it
                                            • software are designed to take away your freedom to share and change

                                            In other words, the timestamp, and where it occurs, should be transparent to the user.
                                            Thus, a timestamp could occur at any one of the following points (denoted by TS):

                                            • vaporizeTSlikeTSaTSfire
                                            • designedTStoTStakeTSaway
                                            • areTSdesignedTStoTStakeTSawayTSyourTSfreedomTStoTSshareTSandTSchangeTSit
                                            • softwareTSareTSdesignedTStoTStakeTSawayTSyourTSfreedomTStoTSshareTSandTSchange
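The TS points above amount to a simple rule: every gap between two words may be a space or a whole timestamp block, but never the inside of a word. A minimal Python sketch of building such a search from a plain phrase follows. Assumptions: `phrase_pattern` is a hypothetical helper, the `H:MM:SS.mmm,H:MM:SS.mmm` timestamp format is taken from the examples, the bare line-break gap is borrowed from the earlier regex in the thread, and `\R` is spelled out because Python's `re` lacks it.

```python
import re

# \R is spelled out because Python's re module has no \R escape.
LINE_BREAK = r"(?:\r\n|\n|\r)"
TIME = r"\d{1,2}:\d\d:\d\d\.\d{3}"
TIMESTAMP_BLOCK = rf"{LINE_BREAK}+{TIME},{TIME}{LINE_BREAK}"
# A "TS" point: one space, a full timestamp block, or a bare line break.
GAP = rf"(?:\x20|{TIMESTAMP_BLOCK}|{LINE_BREAK})"


def phrase_pattern(phrase):
    """Hypothetical helper: compile a regex for a plain phrase so that a
    timestamp block may sit at any word boundary, but never inside a word."""
    return re.compile(GAP.join(re.escape(word) for word in phrase.split()))


captions = ("0:00:17.680,0:00:20.400\n"
            "The licenses for most software are designed to\n"
            "\n"
            "0:00:19.840,0:00:22.400\n"
            "take away your freedom to share and change it.\n")

for phrase in ("designed to take away",
               "software are designed to take away your freedom to share and change"):
    print(bool(phrase_pattern(phrase).search(captions)))  # True, then True
```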

                                            And this problem has already been solved by Peter, way above.
                                            What is needed now is a replacement regex that works, for all cases of a generic substitution, to accompany the original regex scheme.

                                            Now, ok, it is fine if the original search regex mutates somewhat to meet this need, but the “spirit” of it needs to be retained.

                                            And it may be simpler than I think it is, truly. But what I don’t want to happen is the usual: people put a lot of time into it, and then it turns out that a different problem was solved than the one that was wanted.
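A script along the lines suggested earlier in the thread might let the user type a plain find phrase and a plain replacement. Here is a minimal Python sketch of that idea, not a tested solution. It assumes, as observed earlier, that the whole replacement may go after the timestamp, so the matched words before it are dropped. `replace_across_timestamps` is a hypothetical name, and Python's `re` is used instead of Notepad++'s Boost engine, so `\R` is spelled out.

```python
import re

LINE_BREAK = r"(?:\r\n|\n|\r)"          # stands in for \R
TIME = r"\d{1,2}:\d\d:\d\d\.\d{3}"
TIMESTAMP_BLOCK = rf"{LINE_BREAK}+{TIME},{TIME}{LINE_BREAK}"
GAP = rf"(?:\x20|{TIMESTAMP_BLOCK}|{LINE_BREAK})"


def replace_across_timestamps(text, find_phrase, replacement):
    """Hypothetical helper: replace a plain phrase, even when timestamp
    blocks interrupt it at word boundaries, keeping those timestamps."""
    body = GAP.join(re.escape(w) for w in find_phrase.split())
    # Optionally swallow one space before the phrase (group "lead").
    pattern = re.compile(rf"(?P<lead>\x20?){body}")

    def rebuild(m):
        stamps = re.findall(TIMESTAMP_BLOCK, m.group(0))
        if not stamps:
            # No timestamp inside the match: plain in-line replacement.
            return m.group("lead") + replacement
        # Keep the timestamp block(s) and put the whole replacement after
        # them; the space before the phrase is dropped (no trailing blank).
        return "".join(stamps) + replacement

    return pattern.sub(rebuild, text)


captions = ("0:00:17.680,0:00:20.400\n"
            "The licenses for most software are designed to\n"
            "\n"
            "0:00:19.840,0:00:22.400\n"
            "take away your freedom to share and change it.\n")

print(replace_across_timestamps(
    captions,
    "software are designed to take away your freedom to share and change",
    "software prevent you from sharing and changing"))
```

If a single match spans more than one timestamp, this sketch keeps all of them back to back, which leaves the intermediate cues empty; whether that is acceptable is a design choice the thread has not settled.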

                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors