search and replace: replace 66 terms

  • M
    Michael Hsu
    last edited by Jul 26, 2018, 8:43 PM

    Hi, I have hundreds of articles in Chinese (unicode characters) full of Biblical references. I need to replace the Chinese reference to English. An example would be:

    (創1:1) need to replace “(創” with "(Gen "
    (出2:2) need to replace “(出” with "(Exo "
    etc.

    As there are 66 books in the Bible, I need to do the search and replace 66 times. I tried recording a macro, but I kept making mistakes and found it very difficult to get right.

    I understand that this may be possible with regular expressions, but I have difficulty constructing the right syntax.

    Can anyone show me how this can be done?

    Doing a search for 66 items in one go may be too much, and I am prepared to break it down into smaller chunks.

    Thanks in advance for any constructive suggestions.

    • G
      guy038
      last edited by guy038 Jul 27, 2018, 8:15 AM Jul 26, 2018, 9:37 PM

      Hello @michael-hsu, and All

      I think that changing these 66 items in one go is quite possible :-)) Here is the general rule to apply:

      • Your SEARCH regex is simply the list of all your Chinese characters, each surrounded with parentheses to define a capturing group (1, 2, and so on) and separated with the | symbol to create alternatives, which are tested one after another, from the leftmost to the rightmost

      • Your REPLACE regex is a list, in any order, of conditional replacements of the form (?#ABCD), which means that if group # matched in the search regex, the match is replaced with the text ABCD. After all these conditionals, just add \x20, which is simply a space character

      So assuming your text

      (創1:1)
      (出2:2)
      

      SEARCH (創)|(出)

      REPLACE (?1Gen)(?2Exo)\x20

      Of course, in the Replace dialog, the Regular expression search mode must be selected and, preferably, the Wrap around option ticked. Then, after clicking the Replace All button, you should obtain:

      (Gen 1:1)
      (Exo 2:2)
      

      So the general syntax is :

      SEARCH (Search_1)|(Search_2)|(Search_3)|..............|(Search_n)

      REPLACE (?1Replacement_1)(?2Replacement_2)(?3Replacement_3)...............(?nReplacement_n)
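
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the general syntax above. It is only an illustration: Python's `re` module does not support the `(?1...)` conditional replacement used in the Replace field, so a callback that reads `match.lastindex` stands in for it. The `PAIRS` list and the `pick` function are hypothetical names, and only the two books from the example are included.

```python
import re

# Hypothetical two-item sample; the real list would have 66 entries.
PAIRS = [("創", "Gen"), ("出", "Exo")]

# One capturing group per alternative, like (Search_1)|(Search_2)|...
search = "|".join("(" + re.escape(zh) + ")" for zh, _ in PAIRS)

def pick(match):
    # match.lastindex is the number of the group that matched, so it plays
    # the role of the (?n...) conditionals; the trailing space mirrors \x20.
    return PAIRS[match.lastindex - 1][1] + " "

result = re.sub(search, pick, "(創1:1)\n(出2:2)")
print(result)
```

With the two-line sample text from the post, this should print `(Gen 1:1)` and `(Exo 2:2)`.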


      BTW, if you have a list of your 66 Chinese items and a second list of the 66 English items, I could first generate your entire S/R regex… with another regex!!

      Best Regards,

      guy038

      • M
        Michael Hsu
        last edited by Jul 27, 2018, 3:12 PM

        @guy038

        Thank you very much for your prompt and helpful reply.

        I managed to construct the required S/R with your assistance.

        For the record, I did two S&R, one for the Old Testament references, and another one for the New Testament.

        Search: (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)|(\(得)|(\(撒上)|(\(撒下)|(\(王上)|(\(王下)|(\(代上)|(\(代下)|(\(拉)|(\(尼)|(\(斯)|(\(伯)|(\(詩)|(\(箴)|(\(傳)|(\(歌)|(\(賽)|(\(耶)|(\(哀)|(\(結)|(\(但)|(\(何)|(\(珥)|(\(摩)|(\(俄)|(\(拿)|(\(彌)|(\(鴻)|(\(哈)|(\(番)|(\(該)|(\(亞)|(\(瑪)

        Replace: (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)(?8\(Ruth)(?9\(1 Sam)(?10\(2 Sam)(?11\(1 Kgs)(?12\(2 Kgs)(?13\(1 Ch)(?14\(2 Ch)(?15\(Ezr)(?16\(Neh)(?17\(Est)(?18\(Job)(?19\(Ps)(?20\(Prov)(?21\(Ecc)(?22\(Song)(?23\(Isa)(?24\(Jer)(?25\(Lam)(?26\(Ezk)(?27\(Dan)(?28\(Hos)(?29\(Joel)(?30\(Amos)(?31\(Obad)(?32\(Jon)(?33\(Mic)(?34\(Nah)(?35\(Hab)(?36\(Zep)(?37\(Hag)(?38\(Zec)(?39\(Mal)\x20

        Search: (\(啟)|(\(猶)|(\(約參)|(\(約貳)|(\(約壹)|(\(彼後)|(\(彼前)|(\(雅)|(\(來)|(\(門)|(\(多)|(\(提後)|(\(提前)|(\(帖後)|(\(帖前)|(\(西)|(\(腓)|(\(弗)|(\(加)|(\(林後)|(\(林前)|(\(羅)|(\(徒)|(\(約)|(\(路)|(\(可)|(\(太)|(\(啓)

        Replace: (?1\(Rev)(?2\(Jud)(?3\(3 Jn)(?4\(2 Jn)(?5\(1 Jn)(?6\(2 Pet)(?7\(1 Pet)(?8\(Jas)(?9\(Heb)(?10\(Phm)(?11\(Tit)(?12\(2 Tim)(?13\(1 Tim)(?14\(2 Th)(?15\(1 Th)(?16\(Col)(?17\(Phil)(?18\(Eph)(?19\(Gal)(?20\(2 Cor)(?21\(1 Cor)(?22\(Rom)(?23\(Act)(?24\(Jn)(?25\(Lk)(?26\(Mk)(?27\(Mat)(?28\(Rev)\x20

        Just one more question: can you clarify what you meant by the last sentence? How would I generate a regex with another regex?

        • G
          guy038
          last edited by guy038 Jul 27, 2018, 6:00 PM Jul 27, 2018, 5:59 PM

          Hi, @michael-hsu, and All

          Glad that everything went fine :-)) I just noticed that I did not include the opening parenthesis ( in my regexes. And I guess that your Chinese characters may occur, in your file, without parentheses, too ! In that case, of course, the regexes become, for instance :

          SEARCH (\(創)|(\(出)

          REPLACE (?1\(Gen)(?2\(Ex)


          Now, in the last point of my previous post, I just wanted to point out that, from your complete list of the 66 Chinese chars and the 66 English words, it’s possible to generate your big search / replace regexes with specific regexes. Below is an example with the first 7 items of your list, but it would work for any number of items. Actually, not exactly, because the total size of the search and replace zones must not exceed 2046 characters!

          So from the list :

          創
          出
          利
          民
          申
          書
          士
          

          the regex :

          SEARCH (\w)\R(\R)?

          REPLACE \(\\\($1\)(?2:|)

          would produce your final SEARCH regex :

          (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)
          

          Voilà !

          Similarly, with that given list :

          1Gen
          2Ex
          3Lev
          4Num
          5Deut
          6Jos
          7Jdg
          

          The following regex :

          SEARCH (\d+)(\w+)\R

          REPLACE \(?$1\\\($2\)

          would generate your final REPLACEMENT regex

          (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)
          

          Whaoou ! And process would be identical for your 66 items ;-))

          Cheers,

          guy038

          P.S. :

          Thus, from your original text :

          (創1:1)
          (出2:2)
          (利3:3)
          (民4:4)
          (申5:5)
          (書6:6)
          (士7:7)
          

          and the two generated regexes, below :

          SEARCH (\(創)|(\(出)|(\(利)|(\(民)|(\(申)|(\(書)|(\(士)

          REPLACE (?1\(Gen)(?2\(Ex)(?3\(Lev)(?4\(Num)(?5\(Deut)(?6\(Jos)(?7\(Jdg)

          It would give your expected list !

          (Gen1:1)
          (Ex2:2)
          (Lev3:3)
          (Num4:4)
          (Deut5:5)
          (Jos6:6)
          (Jdg7:7)
          
          • S
            Scott Sumner @guy038
            last edited by Jul 27, 2018, 6:19 PM

            @guy038

            In the interim between your last two posts, I was wondering how you were going to get the numbers into the replace expression. I thought some new regex trick was coming, but no, the numbers are part of the data, they don’t get created by the regex. :-)

            However, if all one had was a list of the un-numbered replacement values, one could easily use Notepad++'s Edit (menu) -> Column Editor… to add the numbers before running the regex to create the regex.

            • S
              Scott Sumner
              last edited by Jul 27, 2018, 6:25 PM

              @guy038 ,

              So let’s try RegexBuddy on one of your regexes from an earlier post in this thread, this time with links!:

              FIND (\(創)|(\(出)

              • Match this alternative (attempting the next alternative only if this one fails) (\(創)
                • Match the regex below and capture its match into backreference number 1 (\(創)
                  • Match the opening parenthesis character \(
                  • Match the character “創” literally 創
              • Or match this alternative (the entire match attempt fails if this one fails to match) (\(出)
                • Match the regex below and capture its match into backreference number 2 (\(出)
                  • Match the opening parenthesis character \(
                  • Match the character “出” literally 出

              REPLACE (?1\(Gen)(?2\(Ex)

              • Check whether capturing group number 1 was matched (?1\(Gen)
                • If the group was matched then insert the following \(Gen
                  • Insert an opening parenthesis \(
                  • Insert the character string “Gen” literally Gen
              • Check whether capturing group number 2 was matched (?2\(Ex)
                • If the group was matched then insert the following \(Ex
                  • Insert an opening parenthesis \(
                  • Insert the character string “Ex” literally Ex

              Created with RegexBuddy

              • G
                guy038
                last edited by guy038 Jul 27, 2018, 6:56 PM Jul 27, 2018, 6:47 PM

                Hi, @michael-hsu, @scott-sumner,

                @scott-sumner :

                Aaaah! I wish I were a magician and could produce a regex that generates an automatically numbered list but, unfortunately, I’m not :-(( Don’t be sad: that is just one of the advantages of scripting languages such as Python and Lua :-))

                @michael-hsu

                Indeed, regexes are no good at arithmetic, and operations such as i += 1 are practically impossible to express. So I prefixed your list of English words with a list of numbers, which can be generated, as Scott suggested, with the command Edit > Column Editor…

                Cheers,

                guy038
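
As a rough illustration of the scripting route mentioned in this post, the following Python sketch builds both Notepad++ fields from two parallel lists and lets `enumerate` supply the group numbers. The function names `build_npp_regexes` and `within_limit` are hypothetical, the items are assumed to contain no regex metacharacters, and the 2046-character figure is taken from the earlier post (checking each field separately is an assumption about how that limit applies).

```python
MAX_FIELD_LEN = 2046  # limit quoted earlier in the thread

def build_npp_regexes(pairs, trailing_space=False):
    """Return (search, replace) strings for the Notepad++ Replace dialog."""
    # Search: (\(X1)|(\(X2)|... with one capturing group per item.
    search = "|".join("(\\(" + zh + ")" for zh, _ in pairs)
    # Replace: (?1\(Y1)(?2\(Y2)... numbered from 1 by enumerate.
    replace = "".join(
        "(?%d\\(%s)" % (n, en) for n, (_, en) in enumerate(pairs, start=1)
    )
    if trailing_space:
        replace += "\\x20"  # optional space after the book name
    return search, replace

def within_limit(search, replace, limit=MAX_FIELD_LEN):
    # Assumption: the limit is checked per field, not on the combined length.
    return len(search) <= limit and len(replace) <= limit

pairs = [("創", "Gen"), ("出", "Ex"), ("利", "Lev"), ("民", "Num"),
         ("申", "Deut"), ("書", "Jos"), ("士", "Jdg")]
search, replace = build_npp_regexes(pairs)
print(search)
print(replace)
print(within_limit(search, replace))
```

For the seven items used in the thread, the two printed strings should be the same SEARCH and REPLACE regexes shown in the earlier post. Multi-character names such as 撒上 or "1 Sam" need no special handling here, since the lists are joined as plain strings.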

                • M
                  Michael Hsu
                  last edited by Jul 30, 2018, 9:06 AM

                  Hi, @guy038 & @scott-sumner

                  Wow, you guys are amazing.

                  Thanks very much for the assistance.

                  The next task is to replace the Chinese numbering system with Roman numerals. Using two lists to generate the S/R syntax is certainly most helpful.

                  Once again, thanks a million

                  • M
                    Michael Hsu
                    last edited by Jul 30, 2018, 9:14 AM

                    Just to clarify my last post: I mean replacing Chinese numerals with Arabic numerals, in case anyone is wondering :-)

                    The Community of users of the Notepad++ text editor.