Add comma between two sentences on a two language document

    Ali Jafari
    last edited by Oct 6, 2020, 9:27 AM

    Does anybody know how to insert a comma at the end of the English part of a line when working in a two-language document? Suppose we have a sentence like “this is a testاین یک آزمایش است”, with English text followed by Persian text (the Persian part also means “this is a test”). I would like to turn this sentence into “this is a test,این یک آزمایش است”. Actually, I need to do this throughout a large document, using the “Search and Replace” command.

      Terry R
      last edited by Oct 6, 2020, 9:50 AM

      This post is deleted!
        Ekopalypse @Ali Jafari
        last edited by Oct 6, 2020, 10:13 AM

        This post is deleted!
          guy038
          last edited by guy038 Oct 6, 2020, 9:32 PM (posted Oct 6, 2020, 5:52 PM)

          Hello, @ali-jafari, @terry-r, @ekopalypse and All,

          As you may know, Arabic-script characters (Persian letters included) belong to one of these 5 Unicode blocks:

          • Arabic : [\x{0600}-\x{06FF}]

          • Arabic Supplement : [\x{0750}-\x{077F}]

          • Arabic Extended-A : [\x{08A0}-\x{08FF}]

          • Arabic Presentation Forms-A : [\x{FB50}-\x{FDFF}]

          • Arabic Presentation Forms-B : [\x{FE70}-\x{FEFF}]

          Refer to the Unicode code charts:

          http://www.unicode.org/charts/PDF/U0600.pdf
          http://www.unicode.org/charts/PDF/U0750.pdf
          http://www.unicode.org/charts/PDF/U08A0.pdf
          http://www.unicode.org/charts/PDF/UFB50.pdf
          http://www.unicode.org/charts/PDF/UFE70.pdf
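To check which of these blocks a given character falls into, a small lookup table is enough. The Python sketch below is only illustrative: `ARABIC_BLOCKS` and `arabic_block` are hypothetical names, and the Arabic Supplement row uses the range printed on the U0750 chart (U+0750 to U+077F). For instance, the Persian letter ی (U+06CC, ARABIC LETTER FARSI YEH) lands in the main Arabic block.

```python
from typing import Optional

# (block name, first code point, last code point) for the five blocks listed above.
# Hypothetical table; Arabic Supplement uses its chart range U+0750..U+077F.
ARABIC_BLOCKS = [
    ("Arabic", 0x0600, 0x06FF),
    ("Arabic Supplement", 0x0750, 0x077F),
    ("Arabic Extended-A", 0x08A0, 0x08FF),
    ("Arabic Presentation Forms-A", 0xFB50, 0xFDFF),
    ("Arabic Presentation Forms-B", 0xFE70, 0xFEFF),
]


def arabic_block(ch: str) -> Optional[str]:
    """Return the name of the Arabic-script block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in ARABIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None


if __name__ == "__main__":
    for ch in ["\u06CC", "\uFEFB", "t"]:
        print(f"U+{ord(ch):04X}", arabic_block(ch))
```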


          So, here is the road map:

          • Open the Replace dialog ( Ctrl + H )

          • SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{07FF}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])

          • REPLACE \1,

          • Tick the Wrap around option

          • Select the Regular expression search mode

          • Click on the Replace All button
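For readers who want to run the same search/replace outside Notepad++, here is a rough Python equivalent, assuming Python's `re` module as a stand-in for the N++ regex engine. `re` has no `\x{HHHH}` syntax, so the same ranges are written as `\uHHHH`; `add_comma` is a hypothetical helper name.

```python
import re

# Same five ranges as the N++ SEARCH regex, in Python's \uHHHH notation
ARABIC = r"\u0600-\u06FF\u0750-\u07FF\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# ([\x21-\x7E]) : one printable ASCII char (space excluded), captured as group 1
# \x20?         : an optional space, consumed but not captured
# (?=[...])     : lookahead, an Arabic-script char must come next
PATTERN = re.compile(r"([\x21-\x7E])\x20?(?=[" + ARABIC + r"])")


def add_comma(text: str) -> str:
    """Put a comma between the last ASCII char and the following Arabic-script text."""
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    for line in ["this is a testاین یک آزمایش است", "this is a test این یک آزمایش است"]:
        print(add_comma(line))
```

Run as a script, it prints both sample lines from the question with the comma inserted and the separating space, if any, removed.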


          Notes:

          • The part ([\x{0021}-\x{007E}]) matches any single ASCII character from \x{0021} = ! to \x{007E} = ~ and, because of the parentheses, stores it as group 1. This character may be followed by a space char (\x20?), and the match succeeds ONLY IF the next character is an Arabic one, from one of the five blocks described above (due to the lookahead construction)

          • In the replacement, the English (ASCII) char is simply rewritten (\1), followed by a comma. Any space matched by \x20? lies outside group 1, so it is dropped
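The difference between the whole match and group 1 is what removes the space. A small Python sketch makes it visible, again with `re` standing in for the N++ engine (`matched_parts` is a hypothetical name): for the spaced sample line, the whole match is "t " while group 1 is only "t".

```python
import re
from typing import List, Tuple

# Same ranges and structure as the N++ SEARCH regex
ARABIC = r"\u0600-\u06FF\u0750-\u07FF\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"
PATTERN = re.compile(r"([\x21-\x7E])\x20?(?=[" + ARABIC + r"])")


def matched_parts(line: str) -> List[Tuple[str, str]]:
    """For each hit, return (whole match, group 1); only group 1 survives \\1,."""
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(line)]


if __name__ == "__main__":
    print(matched_parts("this is a test این یک آزمایش است"))
```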

          For instance, the two lines:

          this is a testاین یک آزمایش است
          this is a test این یک آزمایش است
          

          would be changed into:

          this is a test,این یک آزمایش است
          this is a test,این یک آزمایش است
          

          Best Regards,

          guy038

          P.S.:

          Note that the sub-regex which matches the English character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}]! Indeed, as the Arabic text itself contains space chars too, we would get some false-positive matches inside the Arabic text ;-))
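The false positives mentioned in the P.S. can be reproduced with a short Python sketch, using `re` as a stand-in for the N++ engine (`GOOD` and `BAD` are hypothetical names). With a class starting at \x20, every space inside the Persian text is itself followed by an Arabic letter, so each of those spaces also receives a comma.

```python
import re

ARABIC = r"\u0600-\u06FF\u0750-\u07FF\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# Recommended class: starts at \x21, so a space can never become group 1
GOOD = re.compile(r"([\x21-\x7E])\x20?(?=[" + ARABIC + r"])")
# Rejected class: starts at \x20, so spaces between Persian words match too
BAD = re.compile(r"([\x20-\x7E])\x20?(?=[" + ARABIC + r"])")

if __name__ == "__main__":
    line = "this is a testاین یک آزمایش است"
    print(GOOD.sub(r"\1,", line))  # one comma, after "test"
    print(BAD.sub(r"\1,", line))   # extra commas inside the Persian text
```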

            Ali Jafari
            last edited by Oct 7, 2020, 8:26 AM

            @guy038 said in Add comma between two sentences on a two language document:

            \1,

            Dear friend,

            Thanks for your great support. I tried it and it worked.

            All the Best.

              Ali Jafari
              last edited by Oct 7, 2020, 8:31 AM

              @guy038 said in Add comma between two sentences on a two language document:

              \1,

              My dear friend,

              Could you please tell me whether this method also works in Microsoft Office Word, or whether I need to do something else?

              All the Best.

                guy038
                last edited by guy038 Oct 7, 2020, 11:27 AM (posted Oct 7, 2020, 11:25 AM)

                Hi, @ali-jafari and All,

                Unfortunately, I cannot give you reliable information :-(( I’m currently using the Microsoft Office Suite … 2002 on my old XP laptop, which is good enough for my Word work!

                SEARCH (for Word): ([^0033-^0126])^0032*([^1536-^1791])

                REPLACE (for Word): \1,\2

                In Word 2002, when you tick the Use wildcards search option, any character below \x0100 must be written as ^####, where #### is the decimal value of its code point. So \x{0021} becomes ^0033, \x{007E} becomes ^0126, and so on…

                Unfortunately, codes above \xFF, such as the main Arabic range [^1536-^1791] (that is to say [\x{0600}-\x{06FF}] in N++), are definitely not accepted :-((
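The hex-to-decimal conversion described above is mechanical, so it can be scripted. This Python sketch is a hypothetical helper, not a Word feature: it only rewrites N++ \x{HHHH} escapes into ^#### decimal codes, and, as noted above, Word 2002 still refuses the resulting codes above ^0255.

```python
import re


def npp_to_word(pattern: str) -> str:
    r"""Rewrite each N++ \x{HHHH} escape as a Word ^#### decimal code (at least 4 digits)."""
    return re.sub(
        r"\\x\{([0-9A-Fa-f]+)\}",                 # a literal \x{...} with hex digits
        lambda m: "^%04d" % int(m.group(1), 16),  # hex -> zero-padded decimal
        pattern,
    )


if __name__ == "__main__":
    print(npp_to_word(r"([\x{0021}-\x{007E}])"))
    print(npp_to_word(r"([\x{0600}-\x{06FF}])"))
```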

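                Moreover, the quantifier syntax {0,n} after the optional space char does not work either; only {1,n} seems valid! So I preferred to use the usual * syntax. Note, however, that in Word's wildcard mode * means "any string of characters", not "zero or more of the previous character", so ^0032* can also match text lying between a space and the Arabic part.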

                Presumably, recent versions of Word allow searching for any character of the BMP (from \x{0000} to \x{FFFF}). If so, the proposed regex S/R should work correctly!

                Alternatively, why not process the text with the N++ regex engine first, and then paste the updated text into Word?

                Cheers,

                guy038

                The Community of users of the Notepad++ text editor.