    Add comma between two sentences on a two language document

    • Ali Jafari

      Does anybody know how to insert a comma at the point where one language ends and the other begins, on a line of a two-language document? Suppose we have a line like “this is a testاین یک آزمایش است”, with English text followed by Persian text. I would like it to become “this is a test,این یک آزمایش است”. I need to do this throughout a large document with the “Search and Replace” command.

            • guy038

            Hello, @ali-jafari, @terry-r, @ekopalypse and All,

            As you may know, Arabic characters belong to one of these 5 Unicode blocks :

            • Arabic : [\x{0600}-\x{06FF}]

            • Arabic Supplement : [\x{0750}-\x{077F}]

            • Arabic Extended-A : [\x{08A0}-\x{08FF}]

            • Arabic Presentation Forms-A : [\x{FB50}-\x{FDFF}]

            • Arabic Presentation Forms-B : [\x{FE70}-\x{FEFF}]

            Refer to :

            http://www.unicode.org/charts/PDF/U0600.pdf
            http://www.unicode.org/charts/PDF/U0750.pdf
            http://www.unicode.org/charts/PDF/U08A0.pdf
            http://www.unicode.org/charts/PDF/UFB50.pdf
            http://www.unicode.org/charts/PDF/UFE70.pdf
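
To check which of these blocks a given character falls into, a small Python sketch can help. The names ARABIC_BLOCKS and arabic_block_of are hypothetical, chosen only for this illustration, and the ranges are the ones published in the charts linked above.

```python
# Block ranges as published in the Unicode charts linked above.
# Note: Arabic Supplement ends at U+077F (U+0780 onwards is Thaana).
# ARABIC_BLOCKS and arabic_block_of are illustrative names only.
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def arabic_block_of(ch):
    """Return the name of the Arabic block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in ARABIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


if __name__ == "__main__":
    for ch in "aاﻻ":
        print(f"U+{ord(ch):04X}", arabic_block_of(ch))
```

This is handy for finding out which ranges a document really uses before building the character class for the search.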


            So, here is the road map :

            • Open the Replace dialog ( Ctrl + H )

            • SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])

            • REPLACE \1,

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Replace All button


            Notes :

            • The part ([\x{0021}-\x{007E}]) matches any single ASCII character from \x{0021} = ! to \x{007E} = ~ and stores it as group 1, because of the parentheses. It may be followed by a space char ( \x20? ). The match succeeds ONLY IF an Arabic char, from one of the five blocks listed above, comes next ( this is the lookahead construction, which does not consume that char )

            • In the replacement, the ASCII char is rewritten ( \1 ), followed by a comma. The optional space is part of the match but not of group 1, so it is dropped

            For instance the two lines :

            this is a testاین یک آزمایش است
            this is a test این یک آزمایش است
            

            would be changed into :

            this is a test,این یک آزمایش است
            this is a test,این یک آزمایش است
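
For readers who want to try the same search/replace outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module, which is not the regex engine Notepad++ uses, so the escapes are written the Python way. The names ARABIC, PATTERN and add_comma are made up for this illustration.

```python
import re

# Character-class body for the five Arabic blocks listed earlier in this post.
# Arabic Supplement is taken as U+0750..U+077F, as in the Unicode chart.
ARABIC = r"\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# Group 1: one visible ASCII char (! to ~), then an optional space,
# then a lookahead requiring an Arabic char (not consumed by the match).
PATTERN = re.compile(r"([\x21-\x7E])\x20?(?=[" + ARABIC + r"])")


def add_comma(text):
    """Insert a comma between an ASCII char and the Arabic text that follows."""
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    for line in ("this is a testاین یک آزمایش است",
                 "this is a test این یک آزمایش است"):
        print(add_comma(line))
```

Both sample lines come out with exactly one comma, placed right after "test", and the optional space is dropped because it sits outside group 1.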
            

            Best Regards,

            guy038

            P.S. :

            Note that the sub-regex which matches the ASCII character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}] ! Indeed, as the Arabic text itself contains space chars too, the second form would produce false positive matches inside the Arabic text ;-))
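
The false positives are easy to see by counting commas. The sketch below again assumes Python's re module rather than the Notepad++ engine, and uses only the main Arabic block to keep it short. GOOD, BAD and count_commas are hypothetical names.

```python
import re

ARABIC = r"\u0600-\u06FF"  # main Arabic block only, for brevity

# GOOD starts the ASCII class at "!" (\x21); BAD starts it at the space (\x20).
GOOD = re.compile(r"([\x21-\x7E])\x20?(?=[" + ARABIC + r"])")
BAD = re.compile(r"([\x20-\x7E])\x20?(?=[" + ARABIC + r"])")


def count_commas(pattern, text):
    """Apply the replacement and count the commas it produced."""
    return pattern.sub(r"\1,", text).count(",")


if __name__ == "__main__":
    line = "this is a test این یک آزمایش است"
    print(count_commas(GOOD, line))  # 1 comma, after "test"
    print(count_commas(BAD, line))   # 4 commas: 3 extra ones between Arabic words
```

With the space included in the class, every space that precedes an Arabic word is itself captured as group 1 and gets a comma appended.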

              • Ali Jafari

              @guy038 said in Add comma between two sentences on a two language document:

              \1,

              Dear friend,

              Thanks for your great support. I tried it and it worked.

              All the Best.

                • Ali Jafari

                @guy038 said in Add comma between two sentences on a two language document:

                \1,

                Dear friend,

                Could you please tell me whether this approach also works in Microsoft Office Word, or do I need to do something else?

                All the Best.

                  • guy038

                  Hi, @ali-jafari and All,

                  Unfortunately, I cannot give you very reliable information :-(( I’m presently using, on my old XP laptop, the Microsoft Office Suite … 2002, which is good enough for my work in Word !

                  SEARCH ( for Word ) : ([^0033-^0126])^0032*([^1536-^1791])

                  REPLACE ( for Word ) : \1,\2

                  In Word 2002, when you tick the search option Use generic characters, any non-Unicode char ( code-point below \x0100 ) must be written as ^####, where #### represents the decimal value of the code-point. So, \x{0021} must be changed into ^0033, \x{007E} into ^0126 and so on…
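
The hexadecimal to decimal conversion can be sketched in a few lines of Python. The ^#### form is taken from the description above for Word 2002 and has not been checked against other Word versions. The helper name word_code is hypothetical.

```python
def word_code(code_point):
    """Format a code-point as ^#### : its decimal value, zero-padded to 4 digits.

    The ^#### notation is the one described in this post for Word 2002;
    treat it as an assumption for any other version.
    """
    return f"^{code_point:04d}"


if __name__ == "__main__":
    for cp in (0x20, 0x21, 0x7E, 0x0600, 0x06FF):
        print(f"\\x{{{cp:04X}}} -> {word_code(cp)}")
```

For example, \x{0600} and \x{06FF} give ^1536 and ^1791, the bounds of the main Arabic range used in the Word search shown above.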

                  Unfortunately, syntaxes for code-points over \xFF, such as the main Arabic range [^1536-^1791] ( that is to say [\x{0600}-\x{06FF}] in N++ ), are definitely not valid there :-((

                  Moreover, the quantifier syntax {0,n}, after a possible space char, does not work either. Only the {1,n} form seems valid ! So I preferred to use the usual * syntax

                  Presumably, the recent versions of Word allow searching for any character of the BMP ( so from \x{0000} to \x{FFFF} ). If so, the proposed regex S/R should work correctly !

                  On the other hand, why not process your text with the N++ regex engine first and then paste the updated text into Word ?

                  Cheers,

                  guy038

                  The Community of users of the Notepad++ text editor.