Automatic text transform with incremental numbering



  • Hello there!
    I’m struggling to find an automated process/workflow for a large set of text input. I work in documentary film editing and have to bring supplied texts into a current film as subtitles.

    I’m practically new to this kind of text editing, so please excuse my limited knowledge of RegEx and Notepad++ scripts.

    My input data usually looks like this:

    _**00:00:03,000_++
    Text is written here.
    _**00:00:07,031_++
    More text is written here.
    _**00:00:11,028_++
    Text is written here.
    

    The goal is to transform that into this:

    1
    00:00:00,000 --> 00:00:03,000
    Text is written here.
    2
    00:00:03,000 --> 00:00:07,031
    More text is written here.
    3
    00:00:07,031 --> 00:00:11,028
    Text is written here.
    

    The first steps are quite obvious:

    • search & replace all “_++” with nothing
    • search & replace all “_**” with “##\n#”

    Now I have “##” as a placeholder for the incremental numbering in the next step, and “#” as another placeholder for later.
    The next step would be to read the provided timestamp (XX:XX:XX,XXX) and write it into my placeholder “#”, one row at a time.
    This is where my understanding of simple automation ends, because I don’t know enough about Notepad++ and RegEx.

    Could you help me out, or do I need to use another tool/write a script in another language?
    Any lead is highly appreciated :)



  • _\*\*([:,\d]+)_\+\+ .+? _\*\*([:,\d]+)_\+\+ captures the time range in $1 and $2. You have to write the timestamps back in the same format first, and then convert the timestamp format in a following step.
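
    To see what a pattern like this captures, here is a minimal sketch in plain Python (outside Notepad++). It assumes the spaces in the pattern stand for line breaks, and it moves the second marker into a lookahead so that consecutive matches can share a timestamp. That lookahead is a change of mine, not part of the pattern as posted. SAMPLE, PAIR and timestamp_pairs are names made up for illustration.

```python
import re

# Assumed sample input, in the format shown in the first post.
SAMPLE = (
    "_**00:00:03,000_++\n"
    "Text is written here.\n"
    "_**00:00:07,031_++\n"
    "More text is written here.\n"
    "_**00:00:11,028_++\n"
    "Text is written here.\n"
)

# One marker line, the text below it, and the next marker (peeked at, not consumed).
PAIR = re.compile(
    r"_\*\*([:,\d]+)_\+\+\n"        # group 1: timestamp of this marker
    r"(.*?)"                        # group 2: the text below it (lazy)
    r"(?=_\*\*([:,\d]+)_\+\+|\Z)",  # group 3: next timestamp, or end of input
    re.DOTALL,
)

def timestamp_pairs(text):
    """Return (this_timestamp, next_timestamp) for every marker in text.

    The last marker has no successor, so its second element is None.
    """
    return [(m.group(1), m.group(3)) for m in PAIR.finditer(text)]

for pair in timestamp_pairs(SAMPLE):
    print(pair)
```

    Without the lookahead, each match would consume the second marker and every other range would be skipped.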



  • @Maxim-Abrossimow said in Automatic text transform with incremental numbering:

    00:00:00,000

    A Python script solution might look like this:

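    from Npp import editor
    import re
    
    transformed_text = ''
    counter = 1
    # start time of the first subtitle
    current_timestamp = '00:00:00,000'
    replace_with = '''{}
    {} --> {}
    '''
    
    # splitlines(True) keeps the line endings of the original text
    lines = editor.getText().splitlines(True)
    for line in lines:
        # raw string, so that \d is passed to the regex engine unchanged
        m = re.search(r'(\d\d:\d\d:\d\d,\d\d\d)', line)
        if m:
            # replace the timestamp line with "number" + "start --> end"
            line = replace_with.format(counter,
                                       current_timestamp,
                                       m.group())
            counter += 1
            # this end time is the start time of the next subtitle
            current_timestamp = m.group()
        transformed_text += line
    
    editor.setText(transformed_text)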
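
    If you want to try the same logic outside Notepad++, a standalone sketch could look like the following. It only assumes a plain Python interpreter. The function name to_srt and its start parameter are made up for illustration and are not part of the PythonScript plugin.

```python
import re

# Timestamp in the form HH:MM:SS,mmm, as in the input shown in the first post.
TIMESTAMP = re.compile(r"\d\d:\d\d:\d\d,\d\d\d")

def to_srt(text, start="00:00:00,000"):
    """Replace every timestamp line with a running number and a
    'previous --> current' line; all other lines are copied unchanged."""
    out = []
    counter = 1
    previous = start
    for line in text.splitlines(True):  # True keeps the line endings
        m = TIMESTAMP.search(line)
        if m:
            out.append("{}\n{} --> {}\n".format(counter, previous, m.group()))
            counter += 1
            previous = m.group()
        else:
            out.append(line)
    return "".join(out)

sample = (
    "_**00:00:03,000_++\n"
    "Text is written here.\n"
    "_**00:00:07,031_++\n"
    "More text is written here.\n"
)
print(to_srt(sample))
```

    Like the target shown in the question, this does not put a blank line between entries. Subtitle players usually expect one, so you may need to add it to the format string.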
    


  • cheater (:



  • @Ekopalypse Well, this certainly does the trick. Thanks a lot!



  • @WinterSilence Thank you, too, for your input! I tried it with RegEx, but it is just so much easier with Python…

