Regex: check for a specific word/string across multiple files, then insert a string below
I have blocks of dialogue lines reoccurring hundreds of times across many files, and I want to insert the same word pattern of Name-linebreak-quotationmarks:
in the same specific place: in the empty line between TRANSLATION and END STRING.
ぷっち
「いいなぁ。ほしいなぁ。」
# TRANSLATION

# END STRING
After reading up on regex a little, I tried to search and replace for
but that’s clearly not working and just cannibalizes everything nearby. What’s a regex search that correctly identifies the name above the 「 bracket and then inserts the word on the next empty line between TRANSLATION and END STRING?
So that this:
# TEXT STRING
# UNTRANSLATED
# CONTEXT : Dialogue/Message/FaceUnknown
# ADVICE : 49 char limit (35 if face)
ぷっち
「いいなぁ。ほしいなぁ。」
# TRANSLATION

# END STRING
# TEXT STRING
# UNTRANSLATED
# CONTEXT : Dialogue/Message/FaceUnknown
# ADVICE : 49 char limit (35 if face)
初期ぷっち
「わぅーん♪」
# TRANSLATION

# END STRING
turns into this:
# TEXT STRING
# UNTRANSLATED
# CONTEXT : Dialogue/Message/FaceUnknown
# ADVICE : 49 char limit (35 if face)
ぷっち
「いいなぁ。ほしいなぁ。」
# TRANSLATION
Pucchi
“”
# END STRING
# TEXT STRING
# UNTRANSLATED
# CONTEXT : Dialogue/Message/FaceUnknown
# ADVICE : 49 char limit (35 if face)
初期ぷっち
「わぅーん♪」
# TRANSLATION

# END STRING
without destroying any nearby lines which also contain END STRING, but start with a different name.
inserts the word on the next empty line between TRANSLATION and END STRING?
First off, welcome to the NPP forum. You almost had it. Your find text was correct insofar as it matches something. To add text, though, whatever the find text selects must be written back by the replace text, otherwise it is removed.
So my idea of the replace function is:
Notice I extended what your find text selects and also put parentheses around it. This is what we call a “capture” group. Using it allows us to refer to the captured text in the replace text field and so return it.
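To illustrate the capture-group idea outside Notepad++, here is a minimal sketch using Python's `re` module. The sample string and the patterns are assumptions made up for this illustration, not the find/replace text from the post above; it only shows that matched text disappears unless the replacement writes it back.

```python
import re

# Hypothetical sample: one dialogue block with the empty line to be filled.
text = "ぷっち\n「いいなぁ。ほしいなぁ。」\n# TRANSLATION\n\n# END STRING\n"

# Without a capture group, the matched "# TRANSLATION" line is replaced, i.e. lost.
lost = re.sub(r"# TRANSLATION\n", "Pucchi\n", text)

# With a capture group, \1 writes the matched line back before the new text.
kept = re.sub(r"(# TRANSLATION\n)", r"\1Pucchi", text)

print(lost)
print(kept)
```

In the second call the line break of the formerly empty line is left untouched, so `Pucchi` ends up on that line.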
Have a go with my answer and see if that helps. Remember to have a backup of the file just in case it doesn’t work.
Thanks for the quick reply, but that didn’t work for me, since it finds too much text and indiscriminately inserts the dialogue string into any empty space instead of only the ones I want, see attached pic:
(instead of a backslash there’s a yen symbol because of Japanese system locale, not sure if that is a problem)
Is the problem maybe that the text contains a Japanese bracket「」?
but that didn’t work for me, since it finds too much text
Sorry about that, I didn’t have multiple instances of the text to search. That meant your original find text was ALSO selecting too much text.
Try this (I did a bit more testing this time).
Note again that there are more characters in the Find what field. I added the ^ near the start so that it forces looking for that text only at the START of a line, not as a subset of any other (your example showed those characters as a subset of the next name).
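A small sketch of why the anchor matters, written in Python rather than Notepad++ (an assumption for illustration; in Python the `^` only matches at every line start when `re.MULTILINE` is set, whereas Notepad++ does this by default). The two names are taken from the example in the question.

```python
import re

names = "初期ぷっち\nぷっち\n"

# Unanchored: also matches the tail of the longer name 初期ぷっち.
unanchored = re.findall(r"ぷっち", names)

# Anchored: only a line that starts with ぷっち matches.
anchored = re.findall(r"^ぷっち", names, flags=re.MULTILINE)

print(len(unanchored), len(anchored))  # 2 1
```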
Also the Replace field had some changes as a result. Let us know if this helps.
Terry R last edited by
Your regex had an inherent “failure” that I temporarily forgot about; I’m going to say it was because I was in awe of the “weird” characters. Actually the [^"] is a dangerous piece of coding. It would appear you do know something about regexes, however this piece of coding has a hidden side to it.
You will note I included a (?-i). Actually that was also wrong: it should have been (?-s), which means the . can be any character EXCEPT an ‘end of line marker’. However, the [^"] means anything EXCEPT " and that includes an end of line marker. In your regex this allowed it to select multiple lines unchecked.
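The difference between the two can be seen in a short Python sketch (the sample string is made up for this illustration; Python's default behaviour for `.` corresponds to `(?-s)` here):

```python
import re

text = 'say "one\ntwo" end'

# . stops at the line break, because "dot matches newline" is off.
dot = re.search(r'".*', text).group(0)

# [^"] is a negated class, so it consumes the line break as well.
negated = re.search(r'"[^"]*', text).group(0)

print(repr(dot))
print(repr(negated))
```

The negated class keeps going until the next `"` no matter how many lines lie in between, which is how a match can swallow neighbouring blocks.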
So my amended answer is
actually I really don’t know jack about regex except what I’ve learned by looking at a tutorial for only 1 hour - so processing all you’ve said is tough for me.
The last regex you’ve provided got me very close to the goal, but in this example it doesn’t catch the lower text block - is it because the brackets are on two different lines?
Here’s highlighted text from editpad for comparison’s sake, showing that only the above block is matched by the regex
I’m out of time now but I will try again tomorrow. Thanks so far.
Alan Kilborn last edited by
Since the latest screenshot doesn’t appear to be Notepad++, this thread has truly devolved into something that should be on a regex forum rather than here?
what I’ve learned by looking at a tutorial for only 1 hour
Right, yes there were some unknowns, one being how many lines those “weird” brackets used up. From the latest info you provided they may be spread over several lines. That involves an almost complete redesign of my regex.
I will explain the regex (broken down) as best I can in the hope you can alter any minor issues you find in the next test.
(?s-i) states that the . will also accept “end of line” markers; the -i means a non-insensitive (case-sensitive) test, so ‘TRANSLATION’ is NOT equal to ‘translation’.
^ぷっち means find these characters at the start of a line.
.*?(?=TRANSLATION) means (along with the (?s)) any characters, but non-greedily, so that we stop as soon as we see TRANSLATION directly ahead.
\K means we forget any characters we previously searched for and possibly captured, so they are not replaced. This allows us to recommence our search from that point.
(TRANSLATION\h*\R)(\R) is the REAL search we want; we capture those parts we wish to return in the Replace with field along with the additional text.
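For readers who want to try the pieces together outside Notepad++, here is a rough Python sketch assembled from the breakdown above. Several things are assumptions: Python's `re` has no `\K`, `\h` or `\R`, so the prefix is captured in group 1 instead of being forgotten, `[ \t]` stands in for `\h` and `\r?\n` for `\R`; the replacement string is my guess at what produces the desired output and is not taken from the post; the sample text is shortened and splits the bracketed line in two to mimic the multi-line case.

```python
import re

sample = (
    "ぷっち\n"
    "「いいなぁ。\n"
    "ほしいなぁ。」\n"
    "# TRANSLATION\n"
    "\n"
    "# END STRING\n"
    "初期ぷっち\n"
    "「わぅーん♪」\n"
    "# TRANSLATION\n"
    "\n"
    "# END STRING\n"
)

# Group 1: from the name at the start of a line up to and including the
# TRANSLATION line (non-greedy). Group 2: the line break of the empty line.
pattern = re.compile(
    r"(^ぷっち.*?TRANSLATION[ \t]*\r?\n)(\r?\n)",
    re.DOTALL | re.MULTILINE,
)

# Write group 1 back, then the name, a line break, two quotes, a line break.
terry_result = pattern.sub(r'\1Pucchi\2""\2', sample)
print(terry_result)
```

Only the block whose name line starts with ぷっち is changed; the 初期ぷっち block keeps its empty line.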
guy038 last edited by guy038
An alternate generic regex S/R could be :
(?x-i) ^Your Japanese Expression\R (?s:「.+?」)\R \#\h*TRANSLATION\h*(\R)
which gives, for your specific example, the following S/R :
(?x-i) ^ ぷっち\R (?s:「.+?」)\R \#\h*TRANSLATION\h*(\R)
The first part, (?x-i), is shorthand for (?x)(?-i), two in-line modifiers which mean:
Any subsequent space char is not taken into account by the overall regex, and any range after a # char is considered a comment, independent of the overall regex, too ( (?x) )
The search is processed in a non-insensitive, i.e. case-sensitive, way ( (?-i) )
Then the ^ assertion matches the beginning of a line
The ぷっち\R part is your Japanese expression, followed by its line-break
The (?s:「.+?」)\R part is a non-capturing group containing the shortest range, even across several lines, of any characters, even EOL ones, between a left and a right corner bracket, followed by its line-break
Finally, the part \#\h*TRANSLATION\h*(\R) looks for a literal # char (escaped, because a bare # starts a comment in (?x) mode), followed by possible horizontal blank characters ( Tab, Space ), then the string TRANSLATION, with this exact case, followed by possible blank chars again and a line-break, stored as group 1 due to the surrounding parentheses
In replacement, we first rewrite the $0 part, which represents the overall match, so all lines from the line located above the CJK brackets line down to the entire line # TRANSLATION
Then the part Pucchi\1 adds your translation string Pucchi, followed by a line-break ( group 1 )
Finally, the part "" just writes two consecutive double-quote characters. No need for an extra line-break, which already exists ( the empty line between the lines # TRANSLATION and # END STRING )
Thank you, thank you, thank you! Terry’s last solution did the trick. Thanks for the elaborate explanations. I’ll learn more about regex later when I have the time; for now, this simple search has saved me at least a thousand repetitive Ctrl+V presses and I can smoothly proceed with my translation.
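For anyone who would rather script this than run it in Notepad++, guy038's free-spacing search and the replacement he describes ($0, then Pucchi\1, then two double quotes) could be ported to Python roughly as follows. This is a sketch under assumptions: Python has no `\R` or `\h`, so `\r?\n` and `[ \t]` are used instead; `(?x)` becomes the `re.VERBOSE` flag, `-i` is dropped because Python is case-sensitive by default, and `re.MULTILINE` is needed for `^` to match at line starts. The sample text is a shortened, made-up version of the blocks in the question.

```python
import re

blocks = (
    "ぷっち\n"
    "「いいなぁ。\n"
    "ほしいなぁ。」\n"
    "# TRANSLATION\n"
    "\n"
    "# END STRING\n"
    "初期ぷっち\n"
    "「わぅーん♪」\n"
    "# TRANSLATION\n"
    "\n"
    "# END STRING\n"
)

search = re.compile(
    r"""
    ^ぷっち\r?\n                        # the name alone on its line
    (?s:「.+?」)\r?\n                   # shortest bracketed range, may span lines
    \#[ \t]*TRANSLATION[ \t]*(\r?\n)   # the TRANSLATION line; group 1 is its line break
    """,
    re.VERBOSE | re.MULTILINE,
)

# \g<0> is the whole match (Notepad++ $0); the empty line's own line break
# is not part of the match, so it stays in place after the two quotes.
guy_result = search.sub(r'\g<0>Pucchi\1""', blocks)
print(guy_result)
```

To cover many files, the same `sub` call could be applied to the contents of each file in turn; keep backups, as advised earlier in the thread.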