Regex replace only on the first group not the others

    Danilo de Queiroz
    last edited by Jan 7, 2017, 11:37 AM

    First, I’m not a regex expert, but here is the scenario.
    I have several .ASS subtitle files that I need to edit in a batch-like way. Googling about it, I found that N++ can do multiple replacements using regex, which I believe is perfect for my case, so:

    The problem is that I have multiple lines across all my files like the following sample:

    Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
    Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
    Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line

    I need to replace ONLY the “Dialogue:” word on lines containing “Song - Romaji” OR “Song - Translation”.

    I tried the regex -> Dialogue:|(Song - (Romaji,|Translation,))
    But it’s matching the default lines too.
    I also tried -> ^(?=.*?\bDialogue\b)(?=.*?\bSong\b).*$
    This one matches the correct target lines, but I couldn’t figure out what to put in the Replace field.
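For anyone who wants to see why the first attempt also hits the Default lines, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for the Notepad++ regex engine (the two are not identical), and the variable names are only illustrative.

```python
import re

SAMPLE = """\
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
"""

# The alternation means "Dialogue:" OR the style name, so every
# "Dialogue:" is a match on its own, whatever follows it on the line.
first_attempt = re.compile(r"Dialogue:|(Song - (Romaji,|Translation,))")

matches_per_line = [
    [m.group(0) for m in first_attempt.finditer(line)]
    for line in SAMPLE.splitlines()
]

for line_matches in matches_per_line:
    print(line_matches)
```

The third (Default) line still yields a `Dialogue:` match, which is the behaviour described above.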

      Claudia Frank @Danilo de Queiroz
      last edited by Jan 7, 2017, 4:09 PM

      @Danilo-de-Queiroz

      From the information given, I assume the following regex should do the job.

      ^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)
      

      You are looking for lines which start with the word Dialogue,
      followed by (but not counted as part of the match)

      • any chars (as few as possible) followed by
      • Song - followed by (but again not counted) either
      • Romaji or
      • Translation followed by
      • any chars until end of line

      In replace put the string you want.
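A small sketch of the "not counted as match" idea, assuming Python's `re` module in place of Notepad++ (look-ahead behaves the same way for this pattern, but the engines differ elsewhere). The variable names are hypothetical.

```python
import re

song_line = "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line"
default_line = "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line"

pattern = re.compile(r"^Dialogue(?=.*?Song - (?=Romaji|Translation).*$)")

m = pattern.search(song_line)
matched_text = m.group(0)   # only the text outside the look-aheads
matched_span = m.span()     # character range that a replace would touch

no_match = pattern.search(default_line)  # the look-ahead condition fails here

print(matched_text, matched_span, no_match)
```

Only the word `Dialogue` is part of the match, so only that word gets replaced; the rest of the line is merely checked.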

      Cheers
      Claudia

        guy038
        last edited by Jan 7, 2017, 8:54 PM

        Hello, Danilo and Claudia,

        No problem, Danilo, regexes can do miracles, indeed !

        Claudia, as you, I thought about a look-ahead structure. I just shortened it, a bit ;-)

        Indeed, we don’t need a second look-ahead to verify that the Romaji or Translation strings are present. Nor do we need to test for the range, possibly empty, of standard characters up to the end of the line ( .*$ ). This won’t change the main test ( Does the string Song - Romaji OR the string Song - Translation exist, further on, in the current line ? ) anyway !

        So, my regex attempt would be :

        SEARCH (?-s)^Dialogue(?=.+Song - (Romaji|Translation))

        REPLACE Any Text you want

        Notes :

        • First, the modifier, (?-s), ensures that the dot special character will match, ONLY, a single standard character ( and not an EOL character )

        • The second part, Dialogue, is the string to match

        • The ending part, (?=.+Song - (Romaji|Translation)), called a positive look-ahead, is a condition, which must be true, in order to validate the overall regex !

        • The condition to test is : After the word Dialogue, is there, further on, a string Song - Romaji OR a string Song - Translation, in the current line ?

        • If that condition is true, the search match ( Dialogue ) is, then, replaced by the contents of the Replace with field

        So, Danilo, let’s consider the original text, of nine lines, below :

        Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
        

        If the Replace with: field contains the string Test_001, then, after clicking on the Replace All button, you should obtain the changed text, below :

        Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
        Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line
        Test_001: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
        Test_001: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line
        Dialogue: 0,0:00:06.43,0:00:10.49,Song - XXXXXXXX,0,0,0,Another Subtitle Line
        Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line
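The same replacement can be tried outside Notepad++. This is only a sketch under assumptions: it uses Python's `re` module, which does not accept a leading `(?-s)`, so the modifier is dropped (in Python the dot already stops at line ends unless `re.DOTALL` is set), and `re.MULTILINE` is needed so that `^` matches at every line start. The nine lines are rebuilt from their style names with shared timestamps and text, which is a simplification of the sample above.

```python
import re

# Style column of the nine sample lines, in order.
STYLES = [
    "Song - Romaji", "Song - Translation", "Default",
    "Song - TEST", "Song - Translation", "Default",
    "Song - Romaji", "Song - XXXXXXXX", "Default",
]

# Simplification: every line shares the same timestamps and text.
ORIGINAL = "\n".join(
    f"Dialogue: 0,0:00:06.43,0:00:10.49,{style},0,0,0,Subtitle Line"
    for style in STYLES
)

pattern = re.compile(r"^Dialogue(?=.+Song - (Romaji|Translation))", re.MULTILINE)

# subn returns the new text and the number of replacements made.
result, count = pattern.subn("Test_001", ORIGINAL)

changed = [i for i, line in enumerate(result.splitlines())
           if line.startswith("Test_001")]
print(count, changed)
```

Lines 1, 2, 5 and 7 (indexes 0, 1, 4 and 6) are the ones that change, as in the expected text above.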
        

        Best Regards,

        guy038

          Claudia Frank
          last edited by Claudia Frank Jan 7, 2017, 9:42 PM Jan 7, 2017, 9:41 PM

          Hello Danilo and guy038,

          you are right, second lookahead isn’t necessary, so the modified version is

          ^Dialogue(?=.*?Song - (Romaji|Translation).*$)
          

          which is a char less than your example ;-D
          and also a little bit faster ;-D

          regex((?-s)^Dialogue(?=.+Song - (Romaji|Translation))) -> took 0.010000 seconds
          regex(^Dialogue(?=.?Song - (Romaji|Translation).$)) -> took 0.008000 seconds

          Your turn ;-)

          Cheers
          Claudia

            guy038
            last edited by guy038 Jan 8, 2017, 10:40 AM Jan 8, 2017, 10:36 AM

            Hi, Claudia,

            First of all, due to the Markdown syntax, in our site, I suppose that two star symbols are missing, in your regex. So the exact regexes are :

            (?-s)^Dialogue(?=.+Song - (Romaji|Translation))      Me
            
            ^Dialogue(?=.*?Song - (Romaji|Translation).*$)       You
            

            Oh ! I’ve never thought about timing a regex’s execution, yet ! So, I lost by only 0.002s :-(( I’ll never recover after such an event !

            Out of curiosity, could you time this similar regex ^Dialogue(?=.+Song - (Romaji|Translation)) ? I just omitted the modifier (?-s), at the beginning. Of course, this implies that the . matches newline option must be unchecked, in the Replace dialog, before performing the S/R

            Two remarks :

            • I still think that the block .*$, at the end of your regex, is not necessary for knowing if, either, the string Song - Romaji OR Song - Translation occurs, in the current line !

            • As these two strings may be located, anywhere, after the word Dialogue, I don’t think, also, that the lazy quantifier *? is necessary, at the beginning of the look-ahead !

            Finally, your regex could be shortened to ^Dialogue(?=.*Song - (Romaji|Translation))

            which is quite similar to my regex syntax ^Dialogue(?=.+Song - (Romaji|Translation)), without the (?-s) modifier !
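A quick way to check that the shortened forms select the same lines is to run them side by side. The sketch below assumes Python's `re` module as a substitute for the Notepad++ engine, and `matching_indexes` is a hypothetical helper written for this illustration.

```python
import re

LINES = [
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line",
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - TEST,0,0,0,Subtitle Line",
]

VARIANTS = [
    r"^Dialogue(?=.*?Song - (Romaji|Translation).*$)",  # lazy, with trailing .*$
    r"^Dialogue(?=.*Song - (Romaji|Translation))",      # greedy, no trailing part
    r"^Dialogue(?=.+Song - (Romaji|Translation))",      # same, with + instead of *
]

def matching_indexes(regex):
    """Return the indexes of the lines on which the regex finds a match."""
    return [i for i, line in enumerate(LINES) if re.search(regex, line)]

results = {regex: matching_indexes(regex) for regex in VARIANTS}
for regex, indexes in results.items():
    print(indexes, regex)
```

On this sample all three variants pick the first two lines only, so the differences are about length and speed, not about what gets replaced.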

            Cheers,

            guy038

              Danilo de Queiroz
              last edited by Jan 8, 2017, 12:22 PM

              @guy038 @Claudia-Frank
              Hi buddies, I didn’t test each of the expressions you mentioned, only the first one, which just did the job.
              Now I’m trying to understand each of these characters for future use. Is there any good documentation you can point me to?
              Thanks for both your help :)

                guy038
                last edited by Jan 8, 2017, 3:24 PM

                Hi, Danilo,

                First, don’t worry about the choice of the regex. Our regexes are quite similar, anyway ! It’s just the pleasure of discussing with Claudia !


                I just forgot to give you some information for improving your knowledge of regular expressions !

                Begin with that article, in N++ Wiki :

                http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                You may, also, look for valuable information, on the sites, below :

                http://www.regular-expressions.info

                http://www.rexegg.com

                http://perldoc.perl.org/perlre.html

                Be aware that, like any documentation, it may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))

                  Claudia Frank
                  last edited by Jan 8, 2017, 4:47 PM

                  Hi to everyone,

                  Guy, after doing some tests,
                  I would say it doesn’t matter if using the (?-s) or not,
                  because sometimes it was faster and sometimes not.
                  Seems that external influence has much more significance.

                  What can be said, though, is that non-greedy regexes beat greedy ones.

                  Out of interest I did a test with two regexes which describe the pattern more concretely,
                  as I thought this could be faster, but it turned out it wasn’t.
                  I assume it is related to the fact that each char needs to be checked anyway.

                  Results of that simple test (10,000 iterations per regex)

                  (?-s)^Dialogue(?=.+Song - (Romaji|Translation))       -> took 14.708000 seconds
                  (?-s)^Dialogue(?=.*Song - (Romaji|Translation))       -> took 14.631000 seconds
                  
                  (?-s)^Dialogue(?=.+Song - (Romaji|Translation).+$)    -> took 15.046000 seconds
                  (?-s)^Dialogue(?=.*Song - (Romaji|Translation).*$)    -> took 15.035000 seconds
                  
                  ^Dialogue(?=.+Song - (Romaji|Translation))            -> took 14.635000 seconds
                  ^Dialogue(?=.*Song - (Romaji|Translation))            -> took 14.697000 seconds
                  
                  ^Dialogue(?=.+?Song - (Romaji|Translation))           -> took 13.568000 seconds    
                  ^Dialogue(?=.*?Song - (Romaji|Translation))           -> took 13.575000 seconds
                  
                  ^Dialogue(?=.+Song - (Romaji|Translation).+$)         -> took 14.885000 seconds
                  ^Dialogue(?=.*Song - (Romaji|Translation).*$)         -> took 14.947000 seconds
                  
                  ^Dialogue(?=.+?Song - (Romaji|Translation).+?$)       -> took 13.928000 seconds
                  ^Dialogue(?=.*?Song - (Romaji|Translation).*?$)       -> took 13.972000 seconds
                  
                  ^Dialogue(?=: \d,\d\:\d\d\:\d\d\.\d\d,\d\:\d\d\:\d\d\.\d\d,Song - (Romaji|Translation)) -> took 16.183000 seconds
                  ^Dialogue(?=: \d,\d\:\d{2}\:\d{2}\.\d{2},\d\:\d{2}\:\d{2}\.\d{2},Song - (Romaji|Translation)) -> took 20.889000 seconds
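The thread does not show the script used for these timings, so the following is only a guess at how such a test could be set up. It assumes Python's `re` and `timeit` modules, so the absolute numbers, and possibly even the ranking, will differ from what the Notepad++ engine gives. `TEXT`, `PATTERNS` and `time_pattern` are hypothetical names, and the `\:` escapes are written as plain `:`.

```python
import re
import timeit

TEXT = "\n".join([
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line",
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Translation,0,0,0,Another Subtitle Line",
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line",
])

PATTERNS = {
    "greedy": r"^Dialogue(?=.+Song - (Romaji|Translation))",
    "lazy": r"^Dialogue(?=.+?Song - (Romaji|Translation))",
    "digits": r"^Dialogue(?=: \d,\d:\d\d:\d\d\.\d\d,\d:\d\d:\d\d\.\d\d,Song - (Romaji|Translation))",
    "counted": r"^Dialogue(?=: \d,\d:\d{2}:\d{2}\.\d{2},\d:\d{2}:\d{2}\.\d{2},Song - (Romaji|Translation))",
}

def time_pattern(regex, iterations=10_000):
    """Time `iterations` full replace passes over TEXT, in seconds."""
    compiled = re.compile(regex, re.MULTILINE)
    return timeit.timeit(lambda: compiled.sub("Test_001", TEXT), number=iterations)

# Sanity check first: every pattern must replace the same number of lines.
counts = {label: re.compile(regex, re.MULTILINE).subn("Test_001", TEXT)[1]
          for label, regex in PATTERNS.items()}

# A smaller iteration count keeps the sketch quick to run.
timings = {label: time_pattern(regex, iterations=1_000)
           for label, regex in PATTERNS.items()}

for label, seconds in timings.items():
    print(f"{label:8} -> took {seconds:.6f} seconds ({counts[label]} replacements)")
```

As noted above, run-to-run noise can easily exceed the differences between patterns, so repeating the measurement several times is advisable before drawing conclusions.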
                  

                  Cheers
                  Claudia

                    guy038
                    last edited by guy038 Jan 8, 2017, 6:53 PM Jan 8, 2017, 6:51 PM

                    Hi, Danilo and Claudia,

                    Claudia, your series of tests, on regexes, is very interesting, indeed ! We can deduce some facts :

                    • 1) If a quantifier, with syntax Regex{n} has a small value, it’s better to use the syntax RegexRegex… Regex than Regex{n} ! ( Refer to the time difference between your two last examples : the regex containing \d\d and the one containing \d{2} )

                    • 2) When a range of text is NOT needed, for the results of the regex, replace it by .* or .+, if needed text is located, after, in the regex, ELSE don’t add it, at all !

                    • 3) When it does not matter between using a lazy quantifier and a greedy one, for the results of the regex, always prefer the lazy form !

                    • 4) If a range of text cannot be a null length string, prefer the + quantifier( idem {1,x} ) to the * quantifier ( idem {0,x} )


                    Finally, Danilo, thanks to Claudia’s tests about timing, and according to the rules above, the fastest, and still short, regex, for your case, seems to be :

                    ^Dialogue(?=.+?Song - (Romaji|Translation))        ( 13.568000 seconds for 10,000 iterations )
                    

                    But, as this regex does not contain the (?-s) modifier, at beginning, just be sure that the . matches newline option is not enabled, in the Replace dialog !
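To illustrate why that option matters, here is a minimal sketch, assuming Python's `re` module, where the `re.DOTALL` flag plays roughly the role of the . matches newline checkbox. The variable names are made up for the example.

```python
import re

TEXT = (
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line\n"
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line\n"
)
REGEX = r"^Dialogue(?=.+?Song - (Romaji|Translation))"

# Dot stops at the end of the line: the look-ahead stays on the current line.
dot_stops_at_eol = re.compile(REGEX, re.MULTILINE)

# Dot also matches newlines: the look-ahead can reach "Song - Romaji"
# on a LATER line, so the Default line is wrongly matched too.
dot_matches_newline = re.compile(REGEX, re.MULTILINE | re.DOTALL)

safe = dot_stops_at_eol.sub("Test_001", TEXT)
unsafe = dot_matches_newline.sub("Test_001", TEXT)

print(safe)
print(unsafe)
```

With the dot allowed to cross line ends, the Default line is replaced as well, which is exactly what the warning above is meant to prevent.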

                    Cheers,

                    guy038

                      Claudia Frank @guy038
                      last edited by Jan 9, 2017, 6:29 PM

                      Hi Guy,

                      I would still use the more descriptive version of (?-s) because there isn’t really a difference

                      (?-s)^Dialogue(?=.+Song - (Romaji|Translation))       -> took 14.708000 seconds
                      (?-s)^Dialogue(?=.*Song - (Romaji|Translation))       -> took 14.631000 seconds
                      
                      ^Dialogue(?=.+Song - (Romaji|Translation))            -> took 14.635000 seconds
                      ^Dialogue(?=.*Song - (Romaji|Translation))            -> took 14.697000 seconds
                      

                      but has the advantage of setting the s switch explicitly - so you’re sure about what should be done.
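The same "be explicit" idea can be sketched in Python, with one caveat: Python's `re` module does not accept a leading `(?-s)`, only the scoped form `(?-s:...)` (Python 3.6 or later), so this is an analogy rather than the Notepad++ syntax. The example deliberately compiles with `re.DOTALL` to mimic a dialog where . matches newline is checked.

```python
import re

TEXT = (
    "Dialogue: 0,0:02:37.90,0:02:40.36,Default,0,0,0,Some Default style line\n"
    "Dialogue: 0,0:00:06.43,0:00:10.49,Song - Romaji,0,0,0,Subtitle Line\n"
)

# (?-s:.+) switches "dot matches newline" off for that group only,
# whatever flags the pattern was compiled with.
explicit = re.compile(
    r"^Dialogue(?=(?-s:.+)Song - (Romaji|Translation))",
    re.MULTILINE | re.DOTALL,
)

result = explicit.sub("Test_001", TEXT)
print(result)
```

Because the switch is part of the pattern itself, the outcome no longer depends on an external setting: only the Song - Romaji line is changed.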

                      Cheers
                      Claudia

                      The Community of users of the Notepad++ text editor.