    Regex replace only on the first group not the others

      Danilo de Queiroz

      First, I’m not a regex expert, but here is the scenario.
      I have several .ASS subtitle files that I need to edit in a batch-like approach. Googling around, I found that Notepad++ can do multiple replacements using regex, which I believe is perfect for my case.

      The problem is that I have multiple lines across all my files like the following sample:

      Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
      Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
      Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line

      I need to replace ONLY the word “Dialogue:” on lines containing “Song - Romaji” OR “Song - Translation”.

      I tried the regex -> Dialogue:|(Song - (Romaji,|Translation,))
      But it matches the Default lines too.
      I also tried -> ^(?=.*?\bDialogue\b)(?=.*?\bSong\b).*$
      That one matches the correct target lines, but I couldn’t figure out what to put in the Replace field.
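To see why the first attempt also hits the Default lines, here is a minimal sketch. It uses Python's `re` module as a stand-in for the Notepad++ regex engine, which is an assumption made only for illustration; the syntax involved is plain enough that both should behave alike here. Because the pattern is an alternation, the left branch `Dialogue:` matches by itself on every line, whatever follows it.

```python
import re

sample = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"""

# First attempt from the question: an alternation, so either branch
# can match independently of the other.
first_try = re.compile(r"Dialogue:|(Song - (Romaji,|Translation,))")

# Collect every match found on each line.
matches_per_line = [
    [m.group(0) for m in first_try.finditer(line)]
    for line in sample.splitlines()
]

for line, found in zip(sample.splitlines(), matches_per_line):
    print(found, "<-", line)
```

The Default line still yields a `Dialogue:` match, so a Replace All would change it too. What is needed is a condition on the rest of the line, not a second alternative.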

        Claudia Frank @Danilo de Queiroz

        @Danilo-de-Queiroz

        From the information given, I assume the following regex should do the job.

        ^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)
        

        You are looking for lines which start with the word Dialogue
        followed by, but not counted as part of the match,

        • any chars (as few as possible) followed by
        • Song - followed by (but again not counted) either
        • Romaji or
        • Translation followed by
        • any chars until the end of the line

        In replace put the string you want.
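As a rough check of the description above, the same pattern can be tried outside the editor. This sketch assumes Python's `re` module in place of Notepad++'s engine, and `Comment` is a hypothetical replacement string picked only for the example.

```python
import re

sample = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"""

# The look-ahead pattern from the post. re.MULTILINE makes ^ and $
# anchor at every line, not only at the start and end of the text.
pattern = re.compile(
    r"^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)",
    re.MULTILINE,
)

# "Comment" is a hypothetical replacement, not taken from the thread.
result = pattern.sub("Comment", sample)
print(result)
```

Only the word `Dialogue` is consumed by the match; everything inside the look-aheads is checked but left in place, which is why the rest of each line survives the replacement.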

        Cheers
        Claudia

          guy038

          Hello, Danilo and Claudia,

          No problem, Danilo, regexes can do miracles, indeed !

          Claudia, like you, I thought of a look-ahead structure. I just shortened it a bit ;-)

          Indeed, we don’t need a second look-ahead to verify that the Romaji or Translation strings are present. Nor do we need to test for the possibly empty range of standard characters up to the end of the line ( .*$ ). Neither changes the main test ( does the string Song - Romaji OR the string Song - Translation exist, further on, in the current line ? ) anyway !

          So, my regex attempt would be :

          SEARCH (?-s)^Dialogue(?=.+Song - (Romaji|Translation))

          REPLACE Any Text you want

          Notes :

          • First, the modifier (?-s) ensures that the dot special character matches ONLY a single standard character ( and not an EOL character )

          • The second part, Dialogue, is the string to match

          • The final part, (?=.+Song - (Romaji|Translation)), called a positive look-ahead, is a condition which must be true in order to validate the overall regex !

          • The condition to test is : after the word Dialogue, is there, further on, a string Song - Romaji OR a string Song - Translation, in the current line ?

          • If that condition is true, the search match ( Dialogue ) is then replaced by the contents of the Replace with field

          So, Danilo, let’s consider the original text, of nine lines, below :

          Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
          

          If the Replace with: field contains the string Test_001, then, after clicking on the Replace All button, you should obtain the changed text, below :

          Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
          Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
          Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
          Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
          Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
          Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
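The nine-line example can also be reproduced with a short script. This is only a sketch that assumes Python's `re` module rather than Notepad++ itself. It drops the `(?-s)` prefix, because in Python the dot already excludes newlines unless `re.DOTALL` is set, and it adds `re.MULTILINE` so that `^` anchors at each line.

```python
import re

original = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"""

# Same search as in the post, minus the (?-s) prefix (see the note above).
pattern = re.compile(r"^Dialogue(?=.+Song - (Romaji|Translation))", re.MULTILINE)

# subn returns the new text and the number of replacements made.
changed, replacements = pattern.subn("Test_001", original)

print(changed)
print("replacements:", replacements)
```

Four of the nine lines are rewritten, namely the ones whose style field is Song - Romaji or Song - Translation. The Song - TEST and Song - XXXXXXXX lines fail the look-ahead and stay untouched.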
          

          Best Regards,

          guy038

            Claudia Frank

            Hello Danilo and guy038,

            You are right, the second lookahead isn’t necessary, so the modified version is

            ^Dialogue(?=.*?Song - (Romaji|Translation).*$)
            

            which is a char less than your example ;-D
            and also a little bit faster ;-D

            regex((?-s)^Dialogue(?=.+Song - (Romaji|Translation))) -> took 0.010000 seconds
            regex(^Dialogue(?=.?Song - (Romaji|Translation).$)) -> took 0.008000 seconds

            Your turn ;-)

            Cheers
            Claudia

              guy038

              Hi, Claudia,

              First of all, I suppose that, due to the Markdown syntax on our site, two star symbols went missing from your regex. So the exact regexes are :

              (?-s)^Dialogue(?=.+Song - (Romaji|Translation))      Me
              
              ^Dialogue(?=.*?Song - (Romaji|Translation).*$)       You
              

              Oh ! I’ve never thought about timing a regex’s execution before ! So, I lost by only 0.002s :-(( I’ll never recover from such an event !

              Out of curiosity, could you time this similar regex : ^Dialogue(?=.+Song - (Romaji|Translation)) ? I just omitted the (?-s) modifier at the beginning. Of course, this implies that the . matches newline option must be unchecked in the Replace dialog before performing the S/R

              Two remarks :

              • I still think that the block .*$, at the end of your regex, is not necessary to know whether either string, Song - Romaji OR Song - Translation, occurs in the current line !

              • As these two strings may be located anywhere after the word Dialogue, I don’t think that the lazy quantifier *? is necessary either, at the beginning of the look-ahead !

              Finally, your regex could be shortened to ^Dialogue(?=.*Song - (Romaji|Translation))

              which is quite similar to my regex syntax ^Dialogue(?=.+Song - (Romaji|Translation)), without the (?-s) modifier !
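The claim that these variants select the same lines can be checked mechanically. The following is a small sketch, assuming Python's `re` module and testing each line separately, so neither `(?-s)` nor a multiline flag is needed. The helper name `matching_line_numbers` is made up for this example.

```python
import re

lines = [
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line",
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line",
]

# The variants discussed in the thread, from longest to shortest.
variants = [
    r"^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)",
    r"^Dialogue(?=.*?Song - (Romaji|Translation).*$)",
    r"^Dialogue(?=.*Song - (Romaji|Translation))",
    r"^Dialogue(?=.+Song - (Romaji|Translation))",
]


def matching_line_numbers(pattern_text, text_lines):
    """Return the 1-based numbers of the lines the pattern matches."""
    compiled = re.compile(pattern_text)
    return [n for n, line in enumerate(text_lines, start=1) if compiled.search(line)]


selected = {v: matching_line_numbers(v, lines) for v in variants}
for variant, numbers in selected.items():
    print(numbers, variant)
```

On this sample every variant picks lines 1 and 2 only, so the choice between them comes down to readability and speed, not to different results.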

              Cheers,

              guy038

                Danilo de Queiroz

                @guy038 @Claudia-Frank
                Hi buddies, I didn’t test each of the expressions you mentioned, only the first one, which did the job.
                Now I’m trying to understand each of these characters for future use. Is there any good documentation you can point me to?
                Thanks to both of you for your help :)

                  guy038

                  Hi, Danilo,

                  First, don’t worry about the choice of regex. Our regexes are quite similar, anyway ! It was just the pleasure of discussing with Claudia !

                  I just forgot to give you some information to improve your knowledge of regular expressions !

                  Begin with this article, in the N++ Wiki :

                  http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                  In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

                  http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                  http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                  • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                  • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                  You may also find valuable information on the sites below :

                  http://www.regular-expressions.info

                  http://www.rexegg.com

                  http://perldoc.perl.org/perlre.html

                  Be aware that, like any documentation, these may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))

                    Claudia Frank

                    Hi to everyone,

                    Guy, after doing some tests,
                    I would say it doesn’t matter whether (?-s) is used or not,
                    because sometimes it was faster and sometimes not.
                    It seems that external influences have much more significance.

                    What can be said, though, is that the non-greedy regexes beat the greedy ones.

                    Out of interest, I also tested two regexes which describe the pattern more concretely,
                    as I thought this could be faster, but it turned out it wasn’t.
                    I assume it is related to the fact that each char needs to be checked anyway.

                    Results of that simple test (10,000 iterations per regex)

                    (?-s)^Dialogue(?=.+Song - (Romaji|Translation))       -> took 14.708000 seconds
                    (?-s)^Dialogue(?=.*Song - (Romaji|Translation))       -> took 14.631000 seconds
                    
                    (?-s)^Dialogue(?=.+Song - (Romaji|Translation).+$)    -> took 15.046000 seconds
                    (?-s)^Dialogue(?=.*Song - (Romaji|Translation).*$)    -> took 15.035000 seconds
                    
                    ^Dialogue(?=.+Song - (Romaji|Translation))            -> took 14.635000 seconds
                    ^Dialogue(?=.*Song - (Romaji|Translation))            -> took 14.697000 seconds
                    
                    ^Dialogue(?=.+?Song - (Romaji|Translation))           -> took 13.568000 seconds    
                    ^Dialogue(?=.*?Song - (Romaji|Translation))           -> took 13.575000 seconds
                    
                    ^Dialogue(?=.+Song - (Romaji|Translation).+$)         -> took 14.885000 seconds
                    ^Dialogue(?=.*Song - (Romaji|Translation).*$)         -> took 14.947000 seconds
                    
                    ^Dialogue(?=.+?Song - (Romaji|Translation).+?$)       -> took 13.928000 seconds
                    ^Dialogue(?=.*?Song - (Romaji|Translation).*?$)       -> took 13.972000 seconds
                    
                    ^Dialogue(?=: \d,\d\:\d\d\:\d\d\.\d\d,\d\:\d\d\:\d\d\.\d\d,Song - (Romaji|Translation)) -> took 16.183000 seconds
                    ^Dialogue(?=: \d,\d\:\d{2}\:\d{2}\.\d{2},\d\:\d{2}\:\d{2}\.\d{2},Song - (Romaji|Translation)) -> took 20.889000 seconds
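The thread does not show how these timings were taken. For anyone who wants to repeat the comparison, here is one possible sketch using Python's `timeit` and `re` modules. Both are assumptions, since Notepad++ uses a different engine, so absolute and even relative numbers may not match the table above. The helper name `time_pattern` is made up for this example.

```python
import re
import timeit

text = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"""

# A few of the candidates from the table above.
candidates = {
    "greedy": r"^Dialogue(?=.+Song - (Romaji|Translation))",
    "lazy": r"^Dialogue(?=.+?Song - (Romaji|Translation))",
    "explicit, repeated digits": r"^Dialogue(?=: \d,\d\:\d\d\:\d\d\.\d\d,\d\:\d\d\:\d\d\.\d\d,Song - (Romaji|Translation))",
    "explicit, {2} quantifier": r"^Dialogue(?=: \d,\d\:\d{2}\:\d{2}\.\d{2},\d\:\d{2}\:\d{2}\.\d{2},Song - (Romaji|Translation))",
}


def time_pattern(pattern_text, subject, iterations=10_000):
    """Time `iterations` replace-all runs of one pattern, in seconds."""
    compiled = re.compile(pattern_text, re.MULTILINE)
    return timeit.timeit(lambda: compiled.sub("Test_001", subject), number=iterations)


timings = {name: time_pattern(p, text) for name, p in candidates.items()}
for name, seconds in timings.items():
    print(f"{name:28s} -> took {seconds:.6f} seconds")
```

Differences of a few percent are easily swamped by whatever else the machine is doing, so it is worth running the loop several times before drawing conclusions.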
                    

                    Cheers
                    Claudia

                      guy038

                      Hi, Danilo and Claudia,

                      Claudia, your series of regex tests is very interesting, indeed ! We can deduce some facts :

                      • 1) If a quantifier of the form Regex{n} has a small value of n, it’s better to write RegexRegex…Regex than Regex{n} ! ( Refer to the time difference between your last two examples : the regex containing \d\d and the one containing \d{2} )

                      • 2) When a range of text is NOT needed for the results of the regex, replace it with .* or .+ if needed text is located after it in the regex, ELSE don’t add it at all !

                      • 3) When it makes no difference to the results whether a lazy or a greedy quantifier is used, always prefer the lazy form !

                      • 4) If a range of text cannot be a zero-length string, prefer the + quantifier ( same as {1,} ) to the * quantifier ( same as {0,} )


                      Finally, Danilo, thanks to Claudia’s timing tests, and according to the rules above, the fastest and shortest regex, for your case, seems to be :

                      ^Dialogue(?=.+?Song - (Romaji|Translation))        ( 13.568000 seconds for 10,000 iterations )
                      

                      But, as this regex does not contain the (?-s) modifier at the beginning, just be sure that the . matches newline option is not enabled in the Replace dialog !
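The warning about the . matches newline option can be illustrated with a small sketch. It assumes Python's `re` module, where the `re.DOTALL` flag plays a role comparable to that option; the actual Notepad++ dialog is not involved here.

```python
import re

# A Default line followed by a Song line.
text = """\
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line"""

search = r"^Dialogue(?=.+?Song - (Romaji|Translation))"

# Dot stops at the end of the line: the look-ahead only sees the current line.
line_bound = re.sub(search, "Test_001", text, flags=re.MULTILINE)

# Dot also matches newlines: the look-ahead can run into the following lines.
dot_all = re.sub(search, "Test_001", text, flags=re.MULTILINE | re.DOTALL)

print(line_bound)
print("---")
print(dot_all)
```

With the dot allowed to cross line ends, the Default line is rewritten as well, because the look-ahead finds Song - Romaji on the next line. Keeping the dot line-bound avoids that.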

                      Cheers,

                      guy038

                        Claudia Frank @guy038

                        Hi Guy,

                        I would still use the more descriptive version with (?-s), because there isn’t really a difference

                        (?-s)^Dialogue(?=.+Song - (Romaji|Translation))       -> took 14.708000 seconds
                        (?-s)^Dialogue(?=.*Song - (Romaji|Translation))       -> took 14.631000 seconds
                        
                        ^Dialogue(?=.+Song - (Romaji|Translation))            -> took 14.635000 seconds
                        ^Dialogue(?=.*Song - (Romaji|Translation))            -> took 14.697000 seconds
                        

                        but it has the advantage of setting the s switch explicitly - so you’re sure about what should be done.

                        Cheers
                        Claudia
