    Regex: check for a specific word/string across multiple files, then insert a string below

    10 Posts 4 Posters 432 Views
    • juon2nd

      Hello,
      I have blocks of dialogue lines reoccurring hundreds of times across many files, and I want to insert the same word pattern of Name-linebreak-quotationmarks:

      Pucchi
      ""
      

      in the same specific place: in the empty line between TRANSLATION and END STRING.

      ぷっち
      「いいなぁ。ほしいなぁ。」
      # TRANSLATION 
      
      # END STRING
      

      After reading up on regex a little, I tried searching for

      ぷっち[^"]*TRANSLATION
      

      but that clearly doesn’t work: it just swallows everything nearby. What regex search correctly identifies the name above the 「 bracket and then inserts the text on the empty line between TRANSLATION and END STRING?

      So that this:

      # TEXT STRING
      # UNTRANSLATED
      # CONTEXT : Dialogue/Message/FaceUnknown
      # ADVICE : 49 char limit (35 if face)
      ぷっち
      「いいなぁ。ほしいなぁ。」
      # TRANSLATION 
      
      # END STRING
      
      # TEXT STRING
      # UNTRANSLATED
      # CONTEXT : Dialogue/Message/FaceUnknown
      # ADVICE : 49 char limit (35 if face)
      初期ぷっち
      「わぅーん♪」
      # TRANSLATION 
      
      # END STRING
      

      turns into this:

      # TEXT STRING
      # UNTRANSLATED
      # CONTEXT : Dialogue/Message/FaceUnknown
      # ADVICE : 49 char limit (35 if face)
      ぷっち
      「いいなぁ。ほしいなぁ。」
      # TRANSLATION 
      Pucchi
      ""
      # END STRING
      
      # TEXT STRING
      # UNTRANSLATED
      # CONTEXT : Dialogue/Message/FaceUnknown
      # ADVICE : 49 char limit (35 if face)
      初期ぷっち
      「わぅーん♪」
      # TRANSLATION 
      
      # END STRING
      

      without destroying any nearby lines which also contain END STRING, but start with a different name.

      Cheers

      • Terry R

        @juon2nd said in Regex: check for a specific word/string across multiple files, then insert a string below:

        inserts the word on the next empty line between TRANSLATION and END STRING?

        First off, welcome to the NPP forum. You almost had it. Your find text does match, as far as it goes. To add text, though, whatever the find expression selects must be written back in the replacement, otherwise it will be removed.
        So my idea for the replace function is:
        Find What: (ぷっち[^"]*TRANSLATION\h*\R)(\R)
        Replace With: \1Pucchi\2""\2

        Notice I extended what your find text selects and also put brackets around the parts. Each bracketed part is what we call a “capture” group. The replace field can then refer to the groups as \1 and \2 to write them back.
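A minimal sketch of the capture-group idea in Python, for readers who want to try it outside Notepad++. It is not Terry's exact expression: Python's `re` module has no `\h` or `\R`, so `[ \t]` and `\n` stand in for them here, and the sample text is made up for illustration.

```python
import re

# Sample text with one empty line between the TRANSLATION and END STRING lines.
text = "# TRANSLATION \n\n# END STRING\n"

# Group 1 is the end of the TRANSLATION line, group 2 is the empty line's line break.
# [ \t]* and \n are stand-ins for Notepad++'s \h* and \R (an assumption of this port).
pattern = re.compile(r"(TRANSLATION[ \t]*\n)(\n)")

# \1 and \2 write the captured text back; anything matched but not
# referenced in the replacement would simply disappear.
result = pattern.sub(r'\1Pucchi\2""\2', text)
print(result)
```

The replacement reuses group 2 twice, so the inserted lines end with the same kind of line break the file already uses.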

        Have a go with my answer and see if that helps. Remember to have a backup of the file just in case it doesn’t work.

        Terry

        • juon2nd @Terry R

          @Terry-R

          Hello,
          thanks for the quick reply, but that didn’t work for me, since it finds too much text and indiscriminately inserts the dialogue string into any empty line instead of only the ones I want, see attached pic:
          Screenshot 2020-04-02 23.40.43.png
          (instead of a backslash there’s a yen symbol because of Japanese system locale, not sure if that is a problem)

          Is the problem maybe that the text contains a Japanese bracket「」?

          cheers

          • Terry R

            @juon2nd said in Regex: check for a specific word/string across multiple files, then insert a string below:

            but that didn’t work for me, since it finds too much text

            Sorry about that, I didn’t have multiple instances of the text to search. That meant your original find text was ALSO selecting too much text.

            Try this (I did a bit more testing this time).
            Find What: (?-i)^(ぷっち(\R.*){2}TRANSLATION\h*\R)(\R)
            Replace With: \1Pucchi\3""\3

            Note again that there are more characters in the Find What field. I added the ^ near the start so that it forces a match on that text only at the START of a line, not as a subset of any other name (your example shows those characters as a subset of the next name, 初期ぷっち).
            Also the Replace field had some changes as a result. Let us know if this helps.
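The effect of the ^ anchor can be checked with a small Python sketch. The two names come from the example in this thread; `re.MULTILINE` is specific to the Python port and is needed so that ^ matches at the start of every line, which Notepad++ does by default.

```python
import re

# The short name, followed by a longer name that contains it.
lines = "ぷっち\n初期ぷっち\n"

# Without an anchor the name is also found inside the longer name.
unanchored = re.findall(r"ぷっち", lines)

# With ^ and re.MULTILINE only the line that starts with the name matches.
anchored = re.findall(r"^ぷっち", lines, flags=re.MULTILINE)

print(len(unanchored), len(anchored))
```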

            Terry

            • Terry R

              @Terry-R said in Regex: check for a specific word/string across multiple files, then insert a string below:

              Find What:(?-i)^(

              Your regex had an inherent “failure” that I temporarily forgot about, I’m going to say it was because I was in awe of the “weird” characters. Actually the [^"] is a dangerous piece of coding. It would appear you do know something about regexes, however this piece of coding has a hidden side to it.

              You will note I included a (?-i); actually that was also wrong, it should have been (?-s), which means the . can be any character EXCEPT an ‘end of line marker’. However, the [^"] says anything EXCEPT ", and that includes end of line markers. In your regex this allowed it to select multiple lines unchecked.

              So my amended answer is
              Find What: (?-s)^(ぷっち(\R.*){2}TRANSLATION\h*\R)(\R)
              Replace With: \1Pucchi\3""\3
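The difference between `[^"]` and `.` described above can be reproduced in Python as a rough sketch. The sample text is a shortened, made-up version of the file format in this thread. In Python, `.` stops at line breaks unless `re.DOTALL` is set, which corresponds to the (?-s) behaviour.

```python
import re

# Two blocks; the text contains no double quote character anywhere.
text = (
    "ぷっち\n「line one」\n# TRANSLATION \n\n# END STRING\n\n"
    "初期ぷっち\n「line two」\n# TRANSLATION \n\n# END STRING\n"
)

# [^"]* happily crosses line breaks and, being greedy, runs on to the
# LAST occurrence of TRANSLATION in the text.
greedy = re.search(r'ぷっち[^"]*TRANSLATION', text)

# A plain . (no DOTALL) cannot cross a line break, so this finds nothing:
# TRANSLATION is never on the same line as the name.
per_line = re.search(r"ぷっち.*TRANSLATION", text)

print(greedy.group().count("TRANSLATION"), per_line)
```

This is why the amended expression counts lines explicitly with (\R.*){2} instead of relying on a negated character class.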

              Terry

              • juon2nd @Terry R

                @Terry-R
                actually I really don’t know jack about regex except what I’ve learned by looking at a tutorial for only 1 hour - so processing all you’ve said is tough for me.
                The last regex you’ve provided got me very close to the goal, but in this example it doesn’t catch the lower text block - is it because the brackets are on two different lines?
                「なんか気になります。
                 博士、ついていきましょう。」

                Here’s highlighted text from editpad for comparison’s sake, showing that only the above block is matched by the regex
                111.PNG

                I’m out of time now but I will try again tomorrow. Thanks so far.

                • Alan Kilborn @juon2nd

                  @juon2nd

                  Since the latest screenshot doesn’t appear to be Notepad++, this thread has truly devolved into something that should be on a regex forum rather than here?

                  • Terry R

                    @juon2nd said in Regex: check for a specific word/string across multiple files, then insert a string below:

                    what I’ve learned by looking at a tutorial for only 1 hour

                    Right, yes there were some unknowns, one being how many lines those “weird” brackets used up. From the latest info you provided they may be spread over several lines. That involves an almost complete redesign of my regex.
                    I will explain the regex (broken down) as best I can in the hope you can alter any minor issues you find in the next test.

                    Find What: (?s-i)^ぷっち.*?(?=TRANSLATION)\K(TRANSLATION\h*\R)(\R)
                    Replace With: \1Pucchi\2""\2

                    The (?s-i) states that the . will also accept “end of line” markers, and the -i means a case-sensitive test, so ‘TRANSLATION’ is NOT equal to ‘translation’.
                    ^ぷっち means find these characters at the start of a line.
                    .*?(?=TRANSLATION) means (along with the (?s)) any characters, but non-greedily, so that we stop as soon as we see TRANSLATION directly ahead.
                    \K means we forget everything matched so far. The match restarts at this point, so the text before it is left untouched by the replacement.
                    (TRANSLATION\h*\R)(\R) is the REAL search we want; we capture the parts we wish to return in the Replace With field along with the additional text.
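The breakdown above can be tried in Python as a rough port, under a few assumptions. Python's `re` has no `\K`, `\h` or `\R`, so the text that `\K` would discard is kept as an extra group 1 and written back, `[ \t]` replaces `\h`, and `NL` is a hypothetical helper standing in for `\R`. The lookahead is dropped because the lazy `.*?` followed directly by TRANSLATION stops in the same place. The sample pairs the two-line dialogue from this thread with the name ぷっち purely for illustration.

```python
import re

NL = r"(?:\r\n|\n|\r)"  # hypothetical stand-in for Notepad++'s \R

pattern = re.compile(
    r"(^ぷっち.*?)"                      # group 1: what \K would have discarded
    r"(TRANSLATION[ \t]*" + NL + r")"    # group 2: rest of the TRANSLATION line
    r"(" + NL + r")",                    # group 3: the empty line's line break
    flags=re.MULTILINE | re.DOTALL,      # ^ per line, and . crosses line breaks
)

sample = (
    "ぷっち\n"
    "「なんか気になります。\n"
    " 博士、ついていきましょう。」\n"
    "# TRANSLATION \n"
    "\n"
    "# END STRING\n"
    "\n"
    "初期ぷっち\n"
    "「わぅーん♪」\n"
    "# TRANSLATION \n"
    "\n"
    "# END STRING\n"
)

# Groups 1 and 2 are written back unchanged, then the new lines follow.
result = pattern.sub(r'\1\2Pucchi\3""\3', sample)
print(result)
```

Only the block whose name line starts with ぷっち receives the inserted lines, even though its dialogue spans two lines. The 初期ぷっち block is left alone.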

                    Terry

                    • guy038

                      Hi, @juon2nd, @Terry-r, @alan-kilborn and All,

                      An alternate generic regex S/R could be :

                      SEARCH (?x-i)     ^     Your Japanese Expression\R    (?s:「.+?」)\R     \#\h*TRANSLATION\h*(\R)

                      REPLACE $0Your translation\1""


                      which gives, for your specific example, the following S/R :

                      SEARCH (?x-i) ^ ぷっち\R (?s:「.+?」)\R \#\h*TRANSLATION\h*(\R)

                      REPLACE $0Pucchi\1""

                      Notes :

                      • The first part (?x-i) is shorthand for (?x)(?-i), two in-line modifiers which mean :

                        • Any subsequent space char is ignored by the overall regex, and anything after a # char is treated as a comment, also independent of the overall regex ! ( (?x) ). This is why the literal # is written \# further on

                        • The search is processed in a case-sensitive way ( (?-i) )

                      • The ^ assertion matches a beginning of line

                      • Then, ぷっち\R is your Japanese expression, followed with its line-break

                      • Now the (?s:「.+?」)\R part is a non-capturing group containing the shortest range, even in multi-lines, of any character, even EOL ones, between a left and a right corner brackets, followed with its line-break

                      • Finally the part \#\h*TRANSLATION\h*(\R) looks for a literal # char, followed with possible horizontal blank characters ( Tab, Space ), then the string TRANSLATION, with this exact case and ending with possible blank chars, again and a line-break, stored as group 1, due to the surrounding parentheses

                      • In replacement, we first rewrite the $0 part, which represents the overall match, so all the lines from the name line, located above the CJK brackets line, up to and including the entire # TRANSLATION line

                      • Then the part Pucchi\1 adds your translation string Pucchi, followed with a line-break ( group 1 )

                      • Finally, the part "" just writes two consecutive double quote characters. No need for an extra line-break, which already exists ( the empty line between the lines # TRANSLATION and # END STRING )
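The notes above can be followed step by step in a Python sketch of the same search and replace. This is a port under assumptions, not the Notepad++ expression itself: `re.VERBOSE` plays the role of (?x), `re.MULTILINE` is needed for ^, explicit alternations stand in for `\R`, `[ \t]` for `\h`, and Python writes the whole match as `\g<0>` rather than `$0`. The sample text is illustrative.

```python
import re

pattern = re.compile(
    r"""
    ^ ぷっち (?:\r\n|\n|\r)          # the name on its own line, then a line break
    (?s:「.+?」) (?:\r\n|\n|\r)      # shortest bracketed range, may span several lines
    \# [ \t]* TRANSLATION [ \t]*    # literal hash, blanks, the exact word, blanks
    ( \r\n | \n | \r )              # group 1: line break ending that line
    """,
    flags=re.VERBOSE | re.MULTILINE,
)

sample = (
    "ぷっち\n"
    "「なんか気になります。\n"
    " 博士、ついていきましょう。」\n"
    "# TRANSLATION \n"
    "\n"
    "# END STRING\n"
    "\n"
    "初期ぷっち\n"
    "「わぅーん♪」\n"
    "# TRANSLATION \n"
    "\n"
    "# END STRING\n"
)

# Whole match, then the name, one line break (group 1), then the two quotes.
# The line break of the existing empty line ends the quotes line.
result = pattern.sub(r'\g<0>Pucchi\1""', sample)
print(result)
```

Because the brackets are matched explicitly, this version only fires when the name line is directly followed by a 「…」 range and then the # TRANSLATION line.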

                      Best Regards,

                      guy038

                      • juon2nd

                        Thank you, thank you, thank you! Terry’s last solution did the trick. Thanks for the elaborate explanations, I’ll learn more about regex later when I have the time. For now, this simple search has saved me at least a thousand repetitive Ctrl+V pastes and I can smoothly proceed with my translation.
                        Cheers
