Regex replace only on the first group not the others
-
First, I’m not a regex expert, but here is the scenario.
I have several .ASS subtitle files that I need to edit in a batch-like way. Googling around, I found that Notepad++ can do multiple replacements using regex, which I believe is perfect for my case.
The problem is that I have multiple lines across all my files like the following sample:
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
I need to replace ONLY the “Dialogue:” word on lines containing “Song - Romaji” OR “Song - Translation”.
I tried the regex -> Dialogue:|(Song - (Romaji,|Translation,))
But it matches the Default lines too.
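To see why that first attempt also catches the Default lines, here is a small Python sketch. It assumes Python's re module as a stand-in for the Notepad++ regex engine, and the variable names are only illustrative. The two alternatives around | are tried independently, so the left one, Dialogue:, matches on every line whether or not a Song style follows.

```python
import re

# The three sample lines from the question.
lines = [
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line",
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line",
]

# First attempt: two independent alternatives joined by "|".
first_attempt = re.compile(r"Dialogue:|(Song - (Romaji,|Translation,))")

# Collect every piece of text the pattern matches on each line.
matches_per_line = [
    [m.group(0) for m in first_attempt.finditer(line)] for line in lines
]

for found in matches_per_line:
    print(found)
```

The Default line still yields a match for Dialogue:, which is why a plain alternation cannot express "Dialogue only when a Song style follows".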
I also tried -> ^(?=.*?\bDialogue\b)(?=.*?\bSong\b).*$
It does match the correct target lines, but I couldn’t work out what to put in the Replace field.
-
From the info given, I assume the following regex should do the job.
^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)
You are looking for lines which start with the word Dialogue
followed by (but not counted as part of the match)
- any chars (as few as possible), followed by
- Song - followed by (but again not counted) either
- Romaji or
- Translation, followed by
- any chars until the end of the line
In the Replace field, put the string you want.
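The "not counted as match" part can be checked with a short Python sketch. This assumes Python's re module behaves like the editor's engine for this pattern, uses re.MULTILINE so that ^ and $ work per line, and uses NEW as a placeholder replacement string.

```python
import re

# The pattern from the post above, unchanged.
pattern = re.compile(
    r"^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)", re.MULTILINE
)

song_line = "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line"
default_line = "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"

# The look-ahead is checked but not consumed, so the match is just the word.
matched_text = pattern.search(song_line).group(0)

# No Song style on this line, so the look-ahead fails and nothing matches.
no_match = pattern.search(default_line)

# Only the consumed text ("Dialogue") is replaced; the rest of the line stays.
replaced = pattern.sub("NEW", song_line)
print(matched_text, no_match, replaced, sep="\n")
```

Because the look-ahead consumes nothing, the replacement string only has to stand in for the word Dialogue itself.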
Cheers
Claudia
-
Hello, Danilo and Claudia,
No problem, Danilo, regexes can do miracles, indeed !
Claudia, like you, I thought about a look-ahead structure. I just shortened it a bit ;-)
Indeed, we don’t need a second look-ahead to verify that the Romaji or Translation string is present. Nor do we need to test for the range, possibly empty, of standard characters up to the end of the line ( .*$ ). This won’t change the main test ( Does the string Song - Romaji OR the string Song - Translation exist, further on, in the current line ? ) anyway !
So, my regex attempt would be :
SEARCH
(?-s)^Dialogue(?=.+Song - (Romaji|Translation))
REPLACE
Any Text you want
Notes :
- First, the modifier (?-s) ensures that the dot special character matches ONLY a single standard character ( and not an EOL character )
- The second part, Dialogue, anchored at the beginning of the line by ^, is the string to match
- The ending part, (?=.+Song - (Romaji|Translation)), called a positive look-ahead, is a condition which must be true in order to validate the overall regex !
- The condition to test is : After the word Dialogue, is there, further on, a string Song - Romaji OR a string Song - Translation, in the current line ?
- If that condition is true, the search match ( Dialogue ) is then replaced by the contents of the Replace with field
So, Danilo, let’s consider the original text, of nine lines, below :
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
If the Replace with: field contains the string Test_001, then, after clicking on the Replace All button, you should obtain the changed text, below :
Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
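The same Replace All can be approximated outside the editor. The following is a sketch in Python, not the Notepad++ engine: the leading (?-s) is left out because Python's re module does not accept a global flag removal, and its dot already excludes newlines by default. re.MULTILINE is assumed so that ^ matches at the start of every line.

```python
import re

# The nine-line sample text from the post above.
original = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"""

# Same search as above, minus the (?-s) prefix (see the note before the code).
search = re.compile(r"^Dialogue(?=.+Song - (Romaji|Translation))", re.MULTILINE)

# subn returns the new text and the number of replacements made.
changed, count = search.subn("Test_001", original)
changed_lines = changed.splitlines()

print(count)
print(changed)
```

Lines whose style is Song - TEST or Song - XXXXXXXX keep their Dialogue prefix, since the look-ahead requires Romaji or Translation right after Song - .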
Best Regards,
guy038
-
-
Hello Danilo and guy038,
you are right, second lookahead isn’t necessary, so the modified version is
^Dialogue(?=.*?Song - (Romaji|Translation).*$)
which is a char less than your example ;-D
and also a little bit faster ;-D
regex((?-s)^Dialogue(?=.+Song - (Romaji|Translation))) -> took 0.010000 seconds
regex(^Dialogue(?=.?Song - (Romaji|Translation).$)) -> took 0.008000 seconds
Your turn ;-)
Cheers
Claudia
-
Hi, Claudia,
First of all, due to the Markdown syntax, in our site, I suppose that two star symbols are missing, in your regex. So the exact regexes are :
(?-s)^Dialogue(?=.+Song - (Romaji|Translation))    Me
^Dialogue(?=.*?Song - (Romaji|Translation).*$)    You
Oh ! I’ve never thought about timing regex’s execution, yet ! So, I’ve lost for 0.002s only :-(( I’ll never recover after such an event !
Out of curiosity, could you time this similar regex
^Dialogue(?=.+Song - (Romaji|Translation))
? I just omitted the modifier (?-s) at the beginning. Of course, this implies that the . matches newline option must be unchecked, in the Replace dialog, before performing the S/R
Two remarks :
- I still think that the block .*$ , at the end of your regex, is not necessary for knowing if either the string Song - Romaji OR Song - Translation occurs in the current line !
- As these two strings may be located anywhere after the word Dialogue, I don’t think, either, that the lazy quantifier *? is necessary at the beginning of the look-ahead !
Finally, your regex could be shortened to
^Dialogue(?=.*Song - (Romaji|Translation))
which is quite similar to my regex syntax
^Dialogue(?=.+Song - (Romaji|Translation))
, without the (?-s) modifier !
Cheers,
guy038
-
-
@guy038 @Claudia-Frank
Hi buddies, I didn’t test each of the expressions you mentioned, only the first one, which did the job.
Now I’m trying to understand each of these characters for future use. Is there any good documentation you can point me to?
Thanks to both of you for your help :)
-
Hi, Danilo,
First, don’t worry about the choice of the regex. Our regexes are quite similar, anyway ! Just the pleasure of discussing with Claudia !
I just forgot to give you some information for improving your knowledge of regular expressions !
Begin with that article, in N++ Wiki :
http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++ since its 6.0 version, at the TWO addresses below :
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
- The FIRST link explains the syntax of regular expressions in the SEARCH part
- The SECOND link explains the syntax of regular expressions in the REPLACEMENT part
You may also look for valuable information on the sites below :
http://www.regular-expressions.info
http://perldoc.perl.org/perlre.html
Be aware that, like any documentation, it may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))
-
-
Hi to everyone,
Guy, after doing some tests,
I would say it doesn’t matter whether you use (?-s) or not,
because sometimes it was faster and sometimes not.
It seems that external influences have much more significance.
What can be said, though, is that non-greedy beats greedy regexes.
Out of interest I did a test with two regexes which describe the pattern more concretely,
as I thought this could be faster, but it turned out it wasn’t.
I assume it is related to the fact that each char needs to be checked anyway.
Results of that simple test (10,000 iterations per regex)
(?-s)^Dialogue(?=.+Song - (Romaji|Translation)) -> took 14.708000 seconds
(?-s)^Dialogue(?=.*Song - (Romaji|Translation)) -> took 14.631000 seconds
(?-s)^Dialogue(?=.+Song - (Romaji|Translation).+$) -> took 15.046000 seconds
(?-s)^Dialogue(?=.*Song - (Romaji|Translation).*$) -> took 15.035000 seconds
^Dialogue(?=.+Song - (Romaji|Translation)) -> took 14.635000 seconds
^Dialogue(?=.*Song - (Romaji|Translation)) -> took 14.697000 seconds
^Dialogue(?=.+?Song - (Romaji|Translation)) -> took 13.568000 seconds
^Dialogue(?=.*?Song - (Romaji|Translation)) -> took 13.575000 seconds
^Dialogue(?=.+Song - (Romaji|Translation).+$) -> took 14.885000 seconds
^Dialogue(?=.*Song - (Romaji|Translation).*$) -> took 14.947000 seconds
^Dialogue(?=.+?Song - (Romaji|Translation).+?$) -> took 13.928000 seconds
^Dialogue(?=.*?Song - (Romaji|Translation).*?$) -> took 13.972000 seconds
^Dialogue(?=: \d,\d\:\d\d\:\d\d\.\d\d,\d\:\d\d\:\d\d\.\d\d,Song - (Romaji|Translation)) -> took 16.183000 seconds
^Dialogue(?=: \d,\d\:\d{2}\:\d{2}\.\d{2},\d\:\d{2}\:\d{2}\.\d{2},Song - (Romaji|Translation)) -> took 20.889000 seconds
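For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch in Python. It is not the script used for the figures above: it assumes Python's re and timeit modules, a three-line sample text, and illustrative names (variants, time_variant). Python's engine is not the Boost engine used by Notepad++, so absolute numbers and even the ranking may differ.

```python
import re
import timeit

# Small sample text; a real test would use a full subtitle file.
sample = "\n".join([
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line",
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line",
])

# A few of the variants discussed in this thread.
variants = {
    "greedy .+": r"^Dialogue(?=.+Song - (Romaji|Translation))",
    "greedy .*": r"^Dialogue(?=.*Song - (Romaji|Translation))",
    "lazy .+?": r"^Dialogue(?=.+?Song - (Romaji|Translation))",
    "lazy .*?": r"^Dialogue(?=.*?Song - (Romaji|Translation))",
    "tail .*$": r"^Dialogue(?=.*Song - (Romaji|Translation).*$)",
}

compiled = {name: re.compile(rx, re.MULTILINE) for name, rx in variants.items()}

# Sanity check data: every variant should produce the same replaced text.
outputs = {name: rx.sub("Test_001", sample) for name, rx in compiled.items()}


def time_variant(rx, iterations=10_000):
    """Return the seconds needed to run the replacement `iterations` times."""
    return timeit.timeit(lambda: rx.sub("Test_001", sample), number=iterations)


if __name__ == "__main__":
    for name, rx in compiled.items():
        print(f"{name:10s} -> took {time_variant(rx):.6f} seconds")
```

Checking that all variants give identical output before timing them keeps the comparison honest: only the speed differs, not the result.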
Cheers
Claudia
-
Hi, Danilo and Claudia,
Claudia, your series of tests on regexes is very interesting, indeed ! We can deduce some facts :
- 1) If a quantifier, with syntax Regex{n}, has a small value, it’s better to use the syntax RegexRegex…Regex than Regex{n} ! ( Refer to the time difference between your two last examples : the regex containing \d\d and the one containing \d{2} )
- 2) When a range of text is NOT needed for the results of the regex, replace it with .* or .+ if needed text is located after it in the regex, ELSE don’t add it at all !
- 3) When it does not matter whether you use a lazy quantifier or a greedy one for the results of the regex, always prefer the lazy form !
- 4) If a range of text cannot be a zero-length string, prefer the + quantifier ( same as {1,x} ) to the * quantifier ( same as {0,x} )
Finally, Danilo, thanks to Claudia’s timing tests, and according to the rules above, the fastest and shortest regex for your case seems to be :
^Dialogue(?=.+?Song - (Romaji|Translation)) ( 13.568000 seconds for 10,000 iterations )
But, as this regex does not contain the (?-s) modifier at the beginning, just be sure that the . matches newline option is not enabled in the Replace dialog !
Cheers,
guy038
-
-
Hi Guy,
I would still use the more descriptive version with (?-s), because there isn’t really a difference
(?-s)^Dialogue(?=.+Song - (Romaji|Translation)) -> took 14.708000 seconds
(?-s)^Dialogue(?=.*Song - (Romaji|Translation)) -> took 14.631000 seconds
^Dialogue(?=.+Song - (Romaji|Translation)) -> took 14.635000 seconds
^Dialogue(?=.*Song - (Romaji|Translation)) -> took 14.697000 seconds
but it has the advantage of setting the s switch explicitly - so you’re sure about what should be done.
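What the explicit switch protects against can be shown with a small Python sketch. The assumptions: re.DOTALL plays the role of a checked ". matches newline" box, and the scoped group (?-s:...) stands in for the leading (?-s), which Python's re module does not accept in that global form.

```python
import re

# A Default line followed by a Song line: only the second should change.
text = (
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line\n"
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line"
)

plain = r"^Dialogue(?=.+Song - (Romaji|Translation))"
# Same pattern, with "dot matches newline" switched off inside the pattern.
explicit = r"^Dialogue(?=(?-s:.+)Song - (Romaji|Translation))"

# Dot does not match newline: the look-ahead stays on the current line.
dot_off = re.sub(plain, "Test_001", text, flags=re.MULTILINE)

# Dot matches newline: the look-ahead on the Default line reaches the
# "Song - Romaji" of the NEXT line, so the Default line changes too.
dot_on = re.sub(plain, "Test_001", text, flags=re.MULTILINE | re.DOTALL)

# The explicit switch overrides the external setting.
dot_on_explicit = re.sub(explicit, "Test_001", text, flags=re.MULTILINE | re.DOTALL)

print(dot_off, dot_on, dot_on_explicit, sep="\n---\n")
```

With the switch written into the pattern, the result no longer depends on how the dialog option happens to be set.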
Cheers
Claudia