search and replace: replace 66 terms



  • Hi, I have hundreds of articles in Chinese (Unicode characters) full of Biblical references. I need to replace the Chinese references with English ones. An example would be:

    (創1:1) need to replace "(創" with "(Gen "
    (出2:2) need to replace "(出" with "(Exo "
    etc.

    As there are 66 books in the Bible, I need to do the search and replace 66 times. I tried recording a macro, but I kept making mistakes and found it very difficult to get it right.

    I understand that this may be possible using regular expressions, but I have difficulties constructing the right syntax.

    Can anyone show me how this can be done?

    Doing a search for 66 items in one go may be too much, and I am prepared to break it down into smaller chunks.

    Thanks in advance for any constructive suggestions.



  • Hello @michael-hsu, and All

    I think that changing these 66 items, in one go, is quite possible :-)) I'll just give you the general rule to apply :

    • Your SEARCH regex is simply the list of all your Chinese characters, each surrounded with parentheses in order to define a capturing group ( 1, 2, and so on ), and separated with the | symbol to create multiple alternatives, which are tested one after another, from the leftmost one to the rightmost one

    • Your REPLACE regex is a list, in any order, of conditional replacements of the form (?#ABCD), which means that, if group # is matched in the search regex, it is replaced with the ABCD text. After all these conditionals, just add the \x20 syntax, which is just… a space char

    So assuming your text

    (創1:1)
    (出2:2)
    

    SEARCH (創)|(出)

    REPLACE (?1Gen)(?2Exo)\x20

    Of course, in the Replace dialog, the Regular expression search mode must be selected and, preferably, the Wrap around option should be ticked. Then, after clicking on the Replace All button, you should obtain :

    (Gen 1:1)
    (Exo 2:2)
    

    So the general syntax is :

    SEARCH (Search_1)|(Search_2)|(Search_3)|..............|(Search_n)

    REPLACE (?1Replacement_1)(?2Replacement_2)(?3Replacement_3)...............(?nReplacement_n)
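
    For anyone who prefers to script this outside Notepad++, the same rule can be sketched in Python. This is only an illustration under assumptions: BOOKS and replace_books are hypothetical names, and only the two books from the example above are listed. The callback checks which capturing group matched, which is roughly the job the (?nReplacement_n) conditionals do in the replacement.

```python
import re

# Illustrative subset of the book table (hypothetical name); extend to all 66 entries.
BOOKS = {"創": "Gen", "出": "Exo"}

# One capturing group per Chinese name, mirroring (Search_1)|(Search_2)|...
PATTERN = re.compile("|".join(f"({re.escape(name)})" for name in BOOKS))
REPLACEMENTS = list(BOOKS.values())


def replace_books(text: str) -> str:
    """Replace each matched book name with its English form plus a space."""
    def pick(match: re.Match) -> str:
        # match.lastindex is the number of the group that matched,
        # so it selects the replacement the way (?1...)(?2...) would.
        return REPLACEMENTS[match.lastindex - 1] + " "
    return PATTERN.sub(pick, text)


print(replace_books("(創1:1)\n(出2:2)"))
```

    Keeping the pairs in one table avoids having to keep group numbers and replacements in sync by hand.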


    BTW, if you have a list of your 66 Chinese items and a second list of the 66 English items, I could, first, generate the totality of your regex S/R… with another regex !!

    Best Regards,

    guy038



  • @guy038

    Thank you very much for your prompt and helpful reply.

    I managed to construct the required S/R with your assistance.

    For the record, I did two S&R, one for the Old Testament references, and another one for the New Testament.

    Search: (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)|(\(得)|(\(撒上)|(\(撒下)|(\(王上)|(\(王下)|(\(代上)|(\(代下)|(\(拉)|(\(尼)|(\(斯)|(\(伯)|(\(詩)|(\(箴)|(\(傳)|(\(歌)|(\(賽)|(\(耶)|(\(哀)|(\(結)|(\(但)|(\(何)|(\(珥)|(\(摩)|(\(俄)|(\(拿)|(\(彌)|(\(鴻)|(\(哈)|(\(番)|(\(該)|(\(亞)|(\(瑪)

    Replace: (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)(?8\(Ruth)(?9\(1 Sam)(?10\(2 Sam)(?11\(1 Kgs)(?12\(2 Kgs)(?13\(1 Ch)(?14\(2 Ch)(?15\(Ezr)(?16\(Neh)(?17\(Est)(?18\(Job)(?19\(Ps)(?20\(Prov)(?21\(Ecc)(?22\(Song)(?23\(Isa)(?24\(Jer)(?25\(Lam)(?26\(Ezk)(?27\(Dan)(?28\(Hos)(?29\(Joel)(?30\(Amos)(?31\(Obad)(?32\(Jon)(?33\(Mic)(?34\(Nah)(?35\(Hab)(?36\(Zep)(?37\(Hag)(?38\(Zec)(?39\(Mal)\x20

    Search: (\(啟)|(\(猶)|(\(約參)|(\(約貳)|(\(約壹)|(\(彼後)|(\(彼前)|(\(雅)|(\(來)|(\(門)|(\(多)|(\(提後)|(\(提前)|(\(帖後)|(\(帖前)|(\(西)|(\(腓)|(\(弗)|(\(加)|(\(林後)|(\(林前)|(\(羅)|(\(徒)|(\(約)|(\(路)|(\(可)|(\(太)|(\(啓)

    Replace: (?1\(Rev)(?2\(Jud)(?3\(3 Jn)(?4\(2 Jn)(?5\(1 Jn)(?6\(2 Pet)(?7\(1 Pet)(?8\(Jas)(?9\(Heb)(?10\(Phm)(?11\(Tit)(?12\(2 Tim)(?13\(1 Tim)(?14\(2 Th)(?15\(1 Th)(?16\(Col)(?17\(Phil)(?18\(Eph)(?19\(Gal)(?20\(2 Cor)(?21\(1 Cor)(?22\(Rom)(?23\(Act)(?24\(Jn)(?25\(Lk)(?26\(Mk)(?27\(Mat)(?28\(Rev)\x20
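
    A note on ordering in the New Testament list: the longer names such as 約參 come before the plain 約. Since the alternatives are tried from left to right, a short name placed first would match the beginning of a longer one. The Python sketch below illustrates the effect with just two names; build_pattern is a hypothetical helper, and Python's re module is assumed to behave like Notepad++'s engine on this point.

```python
import re


def build_pattern(names):
    # One capturing group per name, each starting with a literal "(".
    return re.compile("|".join(f"(\\({re.escape(n)})" for n in names))


text = "(約壹1:1)"

# Short name first: the engine accepts "(約" and never tries "(約壹".
short_first = build_pattern(["約", "約壹"])
print(short_first.match(text).group(0))

# Longest names first, so the more specific alternative wins.
long_first = build_pattern(sorted(["約", "約壹"], key=len, reverse=True))
print(long_first.match(text).group(0))
```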

    Just one more question: can you clarify what you meant by the last sentence? How would I generate a regex with another regex?



  • Hi, @michael-hsu, and All

    Glad that everything went fine :-)) I just noticed that I did not include the opening parenthesis ( in my regexes. And I guess that your Chinese characters may occur, in your file, without parentheses, too ! In that case, of course, the regexes become, for instance :

    SEARCH (\(創)|(\(出)

    REPLACE (?1\(Gen)(?2\(Ex)


    Now, in the last point of my previous post, I just wanted to point out that, from your complete list of 66 Chinese chars and the 66 English words, it's possible to generate your big search / replace regexes with specific regexes. Here is, below, an example with the first 7 items of your list, but it would work for any number of items. Actually, not exactly, because the total size of the search and replace zones must not exceed 2046 characters !

    So from the list :

    創
    出
    利
    民
    申
    書
    士
    

    the regex :

    SEARCH (\w)\R(\R)?

    REPLACE \(\\\($1\)(?2:|)

    would produce your final SEARCH regex :

    (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)
    

    Voilà !

    Similarly, with that given list :

    1Gen
    2Ex
    3Lev
    4Num
    5Deut
    6Jos
    7Jdg
    

    The following regex :

    SEARCH (\d+)(\w+)\R

    REPLACE \(?$1\\\($2\)

    would generate your final REPLACEMENT regex

    (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)
    

    Whaoou ! And the process would be identical for your 66 items ;-))
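
    The same two strings could also be built with a short script instead of a regex. Here is a minimal Python sketch, assuming the two lists are already available in matching order; build_search, build_replace and fits_dialog are hypothetical names, and the 2046 figure is simply the limit quoted above.

```python
chinese = ["創", "出", "利", "民", "申", "書", "士"]
english = ["Gen", "Ex", "Lev", "Num", "Deut", "Jos", "Jdg"]

MAX_FIELD_LENGTH = 2046  # limit mentioned in this post, not checked elsewhere


def build_search(names):
    # Each alternative is a capturing group starting with an escaped "(".
    return "|".join(f"(\\({name})" for name in names)


def build_replace(names):
    # (?N\(Text) writes "(Text" when group N took part in the match.
    return "".join(f"(?{i}\\({name})" for i, name in enumerate(names, start=1))


def fits_dialog(expression):
    # True when the expression stays within the quoted size limit.
    return len(expression) <= MAX_FIELD_LENGTH


search = build_search(chinese)
replace = build_replace(english)
print(search)
print(replace)
print(fits_dialog(search), fits_dialog(replace))
```

    Because enumerate supplies the group numbers, the English list does not need to be numbered beforehand.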

    Cheers,

    guy038

    P.S. :

    Thus, from your original text :

    (創1:1)
    (出2:2)
    (利3:3)
    (民4:4)
    (申5:5)
    (書6:6)
    (士7:7)
    

    and the two generated regexes, below :

    SEARCH (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)

    REPLACE (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)

    It would give your expected list !

    (Gen1:1)
    (Ex2:2)
    (Lev3:3)
    (Num4:4)
    (Deut5:5)
    (Jos6:6)
    (Jdg7:7)
    


  • @guy038

    In the interim between your last two posts, I was wondering how you were going to get the numbers into the replace expression. I thought some new regex trick was coming, but no, the numbers are part of the data, they don’t get created by the regex. :-)

    However, if all one had was a list of the un-numbered replacement values, one could easily use Notepad++'s Edit (menu) -> Column Editor… to add the numbers before running the regex to create the regex.





  • Hi, @michael-hsu, @scott-sumner,

    @scott-sumner :

    Aaaah ! I wish I could have been a magician and produced a regex which could generate an automatic numbering list, but, unfortunately, I'm not :-(( Don't be sad : It's just one of the advantages of scripting languages such as Python and Lua :-))

    @michael-hsu

    Indeed, regexes are no good for arithmetic, and operations such as i+=1 are practically impossible to express with them. So, I preceded the list of your English words with a list of numbers, which can be generated, as Scott suggested, with the command Edit > Column Editor…
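
    For comparison, this is the kind of counting that a scripting language handles easily. A minimal Python sketch, where number_items is a hypothetical helper that produces the numbered list used above :

```python
def number_items(items):
    # Prefix each item with its 1-based position, e.g. "Gen" becomes "1Gen".
    return [f"{position}{item}" for position, item in enumerate(items, start=1)]


numbered = number_items(["Gen", "Ex", "Lev", "Num", "Deut", "Jos", "Jdg"])
print("\n".join(numbered))
```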

    Cheers,

    guy038



  • Hi, @guy038 & @scott-sumner

    Wow, you guys are amazing.

    Thanks very much for the assistance.

    The next task is to replace the Chinese numbering system with Roman numerals. Using two lists to generate the S/R syntax is certainly most helpful.

    Once again, thanks a million



  • Just to clarify my last post, I meant replacing Chinese numerals with Arabic numerals, just in case someone is wondering :-)

