Automatic text transform with incremental numbering

Maxim Abrossimow

Hello there!
I’m struggling to find an automated workflow for a large set of text input. I work in documentary film editing and have to bring provided texts into a current film as subtitles.

I’m practically new to this kind of text editing, so please excuse my limited knowledge of RegEx and NP++ scripts.

My input data usually looks like this:

_**00:00:03,000_++
Text is written here.
_**00:00:07,031_++
More text is written here.
_**00:00:11,028_++
Text is written here.

The goal is to transform that into this:

1
00:00:00,000 --> 00:00:03,000
Text is written here.
2
00:00:03,000 --> 00:00:07,031
More text is written here.
3
00:00:07,031 --> 00:00:11,028
Text is written here.

The first steps are quite obvious:

search & replace all “_++” with nothing
search & replace all “_**” with “##\nv*#*”

Now I have “##” as a variable placeholder for incremental numbering in the next step. And “#” as another variable for later.
The next step would be to read the provided timestamp (XX:XX:XX,XXX) and write it into my variable #, one row at a time.
This is where my understanding of simple automation ends, given my lack of knowledge of NP++ and RegEx.

Could you help me out, or do I need to use another tool/write a script in another language?
Any lead is highly appreciated :)

WinterSilence

_\*\*([:,\d]+)_\+\+ .+? _\*\*([:,\d]+)_\+\+ - the time range is captured in $1 and $2, but you must replace it in the same format and convert the timestamp format in a next step
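
To see what the capturing group in that pattern picks up, the single-marker part of it can be tried outside Notepad++ with Python's re module. This is only a rough sketch: the sample text is copied from the question, and the names sample, marker and timestamps are made up for the illustration.

```python
import re

# Sample input copied from the question.
sample = (
    "_**00:00:03,000_++\n"
    "Text is written here.\n"
    "_**00:00:07,031_++\n"
    "More text is written here.\n"
)

# One marker: literal "_**", a run of digits/colons/commas, literal "_++".
marker = re.compile(r"_\*\*([:,\d]+)_\+\+")

# findall returns only the captured group, i.e. the bare timestamps.
timestamps = marker.findall(sample)
print(timestamps)  # ['00:00:03,000', '00:00:07,031']
```

Each captured value is what $1 would hold for one marker line.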

Ekopalypse

@Maxim-Abrossimow said in Automatic text transform with incremental numbering:

00:00:00,000

A python script solution might look like this

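from Npp import editor
import re

transformed_text = ''
counter = 1  # running subtitle number
current_timestamp = '00:00:00,000'  # start time of the first subtitle
replace_with = '''{}
{} --> {}
'''

# Keep the line endings so untouched lines are copied back unchanged.
lines = editor.getText().splitlines(True)
for line in lines:
    # Raw string so the \d escapes reach the regex engine as written.
    m = re.search(r'\d\d:\d\d:\d\d,\d\d\d', line)
    if m:
        # Replace the marker line with "<number>\n<previous> --> <this>".
        line = replace_with.format(counter,
                                   current_timestamp,
                                   m.group())
        counter += 1
        current_timestamp = m.group()  # this end time starts the next subtitle
    transformed_text += line

editor.setText(transformed_text)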
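
If you want to try the logic without Notepad++, the same loop can be sketched as a plain function that works on a string. This is only an illustration under a few assumptions: to_srt, TIMESTAMP and the start parameter are names invented here, and the sample is the input from the first post.

```python
import re

# Same timestamp shape as in the script above: HH:MM:SS,mmm
TIMESTAMP = re.compile(r"\d\d:\d\d:\d\d,\d\d\d")


def to_srt(text, start="00:00:00,000"):
    """Replace each timestamp marker line with a number line and a range line."""
    out = []
    counter = 1
    previous = start  # start time of the next subtitle
    for line in text.splitlines():
        m = TIMESTAMP.search(line)
        if m:
            out.append(str(counter))
            out.append("{} --> {}".format(previous, m.group()))
            counter += 1
            previous = m.group()
        else:
            out.append(line)  # subtitle text is kept as it is
    return "\n".join(out) + "\n"


sample = (
    "_**00:00:03,000_++\n"
    "Text is written here.\n"
    "_**00:00:07,031_++\n"
    "More text is written here.\n"
)
result = to_srt(sample)
print(result)
```

For the sample this prints the two numbered blocks from the goal in the first post, with 00:00:00,000 as the start of the first range.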

WinterSilence

cheater (:

Maxim Abrossimow

@Ekopalypse well, this certainly does the trick. Thank you a lot!

Maxim Abrossimow

@WinterSilence Thank you, too, for your input! I tried with RegEx, but it is just so much easier with Python…