search and replace: replace 66 terms
-
Hi, I have hundreds of articles in Chinese (Unicode characters) full of Biblical references. I need to replace the Chinese references with English ones. For example:
(創1:1) need to replace "(創" with "(Gen "
(出2:2) need to replace "(出" with "(Exo "
etc. As there are 66 books in the Bible, I need to do the search and replace 66 times. I tried recording a macro, but I kept making mistakes and found it very difficult to get right.
I understand that this may be possible using regular expressions, but I have difficulty constructing the right syntax.
Can anyone show me how this can be done?
Doing a search for 66 items in one go may be too much, and I am prepared to break it down into smaller chunks.
Thanks in advance for any constructive suggestions.
-
Hello @michael-hsu, and All
I think that changing these 66 items in one go is quite possible :-)) Here is the general rule to apply:
- Your SEARCH regex is simply the list of all your Chinese characters, each surrounded by parentheses to define a capturing group (group 1, group 2, and so on), and separated by the | symbol to create multiple alternatives, which are tested one after another, from the leftmost to the rightmost.
- Your REPLACE regex is a list, in any order, of conditional replacements of the form (?#ABCD), which means that if group # is matched by the search regex, it is replaced with the text ABCD. At the end of all these, just add \x20, which is simply a space character.
So, assuming your text is:
(創1:1) (出2:2)
SEARCH
(創)|(出)
REPLACE
(?1Gen)(?2Exo)\x20
Of course, in the Replace dialog, the Regular expression search mode must be selected and, preferably, the Wrap around option ticked. Then, after clicking the Replace All button, you should obtain:
(Gen 1:1) (Exo 2:2)
So the general syntax is :
SEARCH
(Search_1)|(Search_2)|(Search_3)|..............|(Search_n)
REPLACE
(?1Replacement_1)(?2Replacement_2)(?3Replacement_3)...............(?nReplacement_n)
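For anyone who wants to try the same one-pass idea outside Notepad++, here is a minimal Python sketch. It is an illustration only, not the method used in this thread: the two-entry BOOKS dictionary and the translate function are hypothetical names, and since Python's re.sub replacement templates have no (?nText) conditional form, the sketch looks each match up in a dictionary instead of testing group numbers.

```python
import re

# Hypothetical two-entry mapping, taken from the example text above.
BOOKS = {"創": "Gen", "出": "Exo"}

# One alternation of all search terms, tried left to right.
# re.escape guards against regex metacharacters in the keys.
PATTERN = re.compile("|".join(re.escape(key) for key in BOOKS))


def translate(text: str) -> str:
    """Replace each matched term with its mapped value plus a space."""
    return PATTERN.sub(lambda m: BOOKS[m.group(0)] + " ", text)


print(translate("(創1:1) (出2:2)"))  # (Gen 1:1) (Exo 2:2)
```

The trailing space plays the role of the \x20 at the end of the REPLACE expression.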
BTW, if you have a list of your 66 Chinese items and a second list of the 66 English items, I could first generate your entire S/R regex… with another regex!!
Best Regards,
guy038
-
-
Thank you very much for your prompt and helpful reply.
I managed to construct the required S/R with your assistance.
For the record, I did two S&R, one for the Old Testament references, and another one for the New Testament.
Search: (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)|(\(得)|(\(撒上)|(\(撒下)|(\(王上)|(\(王下)|(\(代上)|(\(代下)|(\(拉)|(\(尼)|(\(斯)|(\(伯)|(\(詩)|(\(箴)|(\(傳)|(\(歌)|(\(賽)|(\(耶)|(\(哀)|(\(結)|(\(但)|(\(何)|(\(珥)|(\(摩)|(\(俄)|(\(拿)|(\(彌)|(\(鴻)|(\(哈)|(\(番)|(\(該)|(\(亞)|(\(瑪)
Replace: (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)(?8\(Ruth)(?9\(1 Sam)(?10\(2 Sam)(?11\(1 Kgs)(?12\(2 Kgs)(?13\(1 Ch)(?14\(2 Ch)(?15\(Ezr)(?16\(Neh)(?17\(Est)(?18\(Job)(?19\(Ps)(?20\(Prov)(?21\(Ecc)(?22\(Song)(?23\(Isa)(?24\(Jer)(?25\(Lam)(?26\(Ezk)(?27\(Dan)(?28\(Hos)(?29\(Joel)(?30\(Amos)(?31\(Obad)(?32\(Jon)(?33\(Mic)(?34\(Nah)(?35\(Hab)(?36\(Zep)(?37\(Hag)(?38\(Zec)(?39\(Mal)\x20
Search: (\(啟)|(\(猶)|(\(約參)|(\(約貳)|(\(約壹)|(\(彼後)|(\(彼前)|(\(雅)|(\(來)|(\(門)|(\(多)|(\(提後)|(\(提前)|(\(帖後)|(\(帖前)|(\(西)|(\(腓)|(\(弗)|(\(加)|(\(林後)|(\(林前)|(\(羅)|(\(徒)|(\(約)|(\(路)|(\(可)|(\(太)|(\(啓)
Replace: (?1\(Rev)(?2\(Jud)(?3\(3 Jn)(?4\(2 Jn)(?5\(1 Jn)(?6\(2 Pet)(?7\(1 Pet)(?8\(Jas)(?9\(Heb)(?10\(Phm)(?11\(Tit)(?12\(2 Tim)(?13\(1 Tim)(?14\(2 Th)(?15\(1 Th)(?16\(Col)(?17\(Phil)(?18\(Eph)(?19\(Gal)(?20\(2 Cor)(?21\(1 Cor)(?22\(Rom)(?23\(Act)(?24\(Jn)(?25\(Lk)(?26\(Mk)(?27\(Mat)(?28\(Rev)\x20
Just one more question: can you clarify what you meant by the last sentence? How would I generate a regex with another regex?
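A side note on ordering: because alternatives are tried from left to right, a short term such as 約 has to come after longer terms that start with it, such as 約壹. The New Testament list above already does this. The Python sketch below illustrates the effect with a four-entry subset; the names BOOKS, build_pattern and translate are hypothetical, and Python's re module is assumed to stand in for the editor's regex engine only with respect to this left-to-right behaviour.

```python
import re

# Subset of the New Testament mapping from the post above.
BOOKS = {"約": "Jn", "約壹": "1 Jn", "約貳": "2 Jn", "約參": "3 Jn"}


def build_pattern(keys):
    """Join keys into one alternation, each preceded by a literal '('."""
    return re.compile("|".join(r"\(" + re.escape(key) for key in keys))


def translate(pattern, text):
    # m.group(0)[1:] drops the leading '(' to recover the dictionary key.
    return pattern.sub(lambda m: "(" + BOOKS[m.group(0)[1:]] + " ", text)


text = "(約壹1:1) (約3:16)"

# Dictionary order: the short key comes first and shadows the longer ones.
naive = build_pattern(BOOKS)
# Longest keys first, so a prefix can never win over a longer term.
safe = build_pattern(sorted(BOOKS, key=len, reverse=True))

print(translate(naive, text))  # (Jn 壹1:1) (Jn 3:16)
print(translate(safe, text))   # (1 Jn 1:1) (Jn 3:16)
```

Sorting by length is just one way to get a safe order; listing the longer terms first by hand works equally well.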
-
Hi, @michael-hsu, and All
Glad that everything went fine :-)) I just noticed that I did not include the opening parenthesis ( in my regexes. And I guess that your Chinese characters may occur, in your file, without parentheses, too! In that case, of course, the regexes become, for instance:
SEARCH
(\(創)|(\(出)
REPLACE
(?1\(Gen)(?2\(Ex)
Now, in the last point of my previous post, I just wanted to point out that, from your complete list of 66 Chinese chars and the 66 English words, it's possible to generate your big search/replace regexes with specific regexes. Below is an example with the first 7 items of your list, but it would work for any number of items. Actually, not exactly, because the total size of the search and replace zones must not exceed 2046 characters!
So from the list :
創
出
利
民
申
書
士
the regex :
SEARCH
(\w)\R(\R)?
REPLACE
\(\\\($1\)(?2:|)
would produce your final SEARCH regex :
(\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)
Voilà !
Similarly, with that given list :
1Gen
2Ex
3Lev
4Num
5Deut
6Jos
7Jdg
The following regex :
SEARCH
(\d+)(\w+)\R
REPLACE
\(?$1\\\($2\)
would generate your final REPLACEMENT regex
(?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)
Whaoou! And the process would be identical for your 66 items ;-))
Cheers,
guy038
P.S. :
Thus, from your original text :
(創1:1) (出2:2) (利3:3) (民4:4) (申5:5) (書6:6) (士7:7)
and the two generated regexes, below :
SEARCH
(\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)
REPLACE
(?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)
It would give your expected list !
(Gen1:1) (Ex2:2) (Lev3:3) (Num4:4) (Deut5:5) (Jos6:6) (Jdg7:7)
-
In the interim between your last two posts, I was wondering how you were going to get the numbers into the replace expression. I thought some new regex trick was coming, but no, the numbers are part of the data, they don’t get created by the regex. :-)
However, if all one had was a list of the un-numbered replacement values, one could easily use Notepad++'s Edit (menu) -> Column Editor… to add the numbers before running the regex to create the regex.
-
@guy038 ,
So let’s try RegexBuddy on one of your regexes from an earlier post in this thread, this time with links!:
FIND
(\(創)|(\(出)
- Match this alternative (attempting the next alternative only if this one fails)
(\(創)
- Or match this alternative (the entire match attempt fails if this one fails to match)
(\(出)
REPLACE
(?1\(Gen)(?2\(Ex)
- Check whether capturing group number 1 was matched
(?1\(Gen)
- Check whether capturing group number 2 was matched
(?2\(Ex)
Created with RegexBuddy
-
Hi, @michael-hsu, @scott-sumner,
Aaaah! I wish I could have been a magician and produced a regex that generates an automatic numbering list but, unfortunately, I'm not :-(( Don't be sad: that's just one of the advantages of scripting languages such as Python and Lua :-))
Indeed, regexes are no good at arithmetic, and operations such as i+=1 are all but impossible to produce. So I preceded your list of English words with a list of numbers, which can be generated, as Scott suggested, with the command Edit > Column Editor…
Cheers,
guy038
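To illustrate the scripting route mentioned above, here is a rough Python sketch that builds both expressions from two plain, un-numbered lists and lets enumerate supply the group numbers. It is not taken from the thread: the names chinese, english, build_search and build_replace are hypothetical, only the first seven books are shown, and the terms are assumed to contain no regex metacharacters.

```python
# First seven books from the thread; the real lists would hold all 66.
chinese = ["創", "出", "利", "民", "申", "書", "士"]
english = ["Gen", "Ex", "Lev", "Num", "Deut", "Jos", "Jdg"]


def build_search(terms):
    r"""Return (\(term1)|(\(term2)|... with a literal '(' in each group."""
    return "|".join(rf"(\({term})" for term in terms)


def build_replace(words):
    r"""Return (?1\(word1)(?2\(word2)... numbered from 1 by enumerate."""
    return "".join(rf"(?{i}\({word})" for i, word in enumerate(words, start=1))


search = build_search(chinese)
replace = build_replace(english)

print(search)
print(replace)
# Lengths, to compare against the 2046-character limit quoted earlier.
print(len(search), len(replace))
```

The two printed strings can then be pasted into the Find what and Replace with fields of the Replace dialog, with the Regular expression search mode selected.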
-
Hi, @guy038 & @scott-sumner
Wow, you guys are amazing.
Thanks very much for the assistance.
The next task is to replace the Chinese numbering system with Roman numerals. Using two lists to generate the S/R syntax is certainly most helpful.
Once again, thanks a million
-
Just to clarify my last post: I mean replacing Chinese numerals with Arabic numerals, in case someone is wondering :-)