Add comma between two sentences on a two language document

  • Does anybody know how we can insert a comma at the end of the English part of a line when we are working in a two-language document? Suppose we have a sentence like “this is a testاین یک آزمایش است”: it contains English text followed by Persian text (the Persian part also means “this is a test”). I would like to turn it into “this is a test,این یک آزمایش است”. In fact, I would like to do this throughout a large document with the “Search and Replace” command.

  • Hello, @ali-jafari, @terry-r, @ekopalypse and All,

    As you may know, Arabic-script characters (Persian letters included) mainly belong to one of these 5 Unicode blocks :

    • Arabic : [\x{0600}-\x{06FF}]

    • Arabic Supplement : [\x{0750}-\x{077F}]

    • Arabic Extended-A : [\x{08A0}-\x{08FF}]

    • Arabic Presentation Forms-A : [\x{FB50}-\x{FDFF}]

    • Arabic Presentation Forms-B : [\x{FE70}-\x{FEFF}]
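    For reference, here is a small Python sketch (Python is used only for illustration; the thread itself works with the N++ regex engine) that encodes these five blocks as inclusive code-point ranges and reports which one a character falls in. The names ARABIC_BLOCKS and arabic_block are hypothetical, and the sketch deliberately recognises only the blocks listed above.

```python
from typing import Optional

# The five Arabic-script blocks listed above, as inclusive code-point ranges.
# Note: Arabic Supplement stops at U+077F (U+0780 starts the Thaana block).
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def arabic_block(ch: str) -> Optional[str]:
    """Return the name of the listed Arabic block containing ch, or None."""
    cp = ord(ch)
    for name, (start, end) in ARABIC_BLOCKS.items():
        if start <= cp <= end:
            return name
    return None


if __name__ == "__main__":
    # 't', then ALEF (U+0627) and FARSI YEH (U+06CC), the start of "این"
    for ch in "t\u0627\u06CC":
        print(f"U+{ord(ch):04X} -> {arabic_block(ch)}")
```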

    Refer to :

    So, here is the road map :

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])

    • REPLACE \1,

    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Click on the Replace All button

    Notes :

    • The part ([\x{0021}-\x{007E}]) matches any single printable, non-space ASCII character, from \x{0021} = ! to \x{007E} = ~, and stores it as group 1 because of the parentheses. It may be followed by a space character ( \x20? ), and the whole match succeeds ONLY IF it is followed by an Arabic-script character from one of the five blocks described above ( due to the lookahead construct (?=...) )

    • In the replacement, the ASCII character is simply rewritten ( \1 ), followed by a comma. The optional space, being part of the match but not of group 1, is deleted

    For instance the two lines :

    this is a testاین یک آزمایش است
    this is a test این یک آزمایش است

    would be changed as :

    this is a test,این یک آزمایش است
    this is a test,این یک آزمایش است
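    The same S/R can be reproduced outside N++ as a quick sanity check. Below is a minimal Python sketch, assuming Python's re module as a stand-in for the N++ regex engine: the \x{HHHH} escapes become \uHHHH, while the structure (group 1, optional space, lookahead) is unchanged. PATTERN and add_comma are hypothetical names.

```python
import re

# Arabic-script character class: the five blocks listed above.
# N++ writes \x{0600}; Python's re module writes \u0600.
ARABIC_CLASS = (
    r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF"
    r"\uFB50-\uFDFF\uFE70-\uFEFF]"
)

PATTERN = re.compile(
    r"([\x21-\x7E])"                  # group 1: one printable, non-space ASCII char
    + r"\x20?"                        # optional space, consumed (so it is removed)
    + r"(?=" + ARABIC_CLASS + r")"    # only if an Arabic-script char follows
)


def add_comma(text: str) -> str:
    """Apply the equivalent of REPLACE \\1, to every match in text."""
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    lines = [
        "this is a testاین یک آزمایش است",
        "this is a test این یک آزمایش است",
    ]
    for line in lines:
        print(add_comma(line))
```

    Both sample lines should come out as the expected “this is a test,…” form, because the optional space is consumed by the match rather than captured.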

    Best Regards,


    P.S. :

    Note that the sub-regex matching the English character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}] ! Indeed, as the Arabic text itself contains space characters, including \x{0020} would give false positive matches inside the Arabic text: every space between two Persian words is followed by an Arabic letter, so it would get a comma too ;-))
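    These false positives are easy to reproduce. Here is a minimal Python sketch (the helper name add_comma_with and the sample line are illustrative assumptions) that runs the same S/R twice, once with [\x21-\x7E] and once with [\x20-\x7E] as group 1:

```python
import re

ARABIC_CLASS = r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]"


def add_comma_with(first_class: str, text: str) -> str:
    """Run the S/R with a configurable class for the English character."""
    pattern = "(" + first_class + r")\x20?(?=" + ARABIC_CLASS + ")"
    return re.sub(pattern, r"\1,", text)


if __name__ == "__main__":
    line = "this is a test این یک آزمایش است"
    # Space excluded: only the English/Persian boundary gets a comma.
    print(add_comma_with(r"[\x21-\x7E]", line))
    # Space included: each space between two Persian words is also followed
    # by an Arabic letter, so commas appear inside the Persian text as well.
    print(add_comma_with(r"[\x20-\x7E]", line))
```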

  • @guy038 said in Add comma between two sentences on a two language document:


    Dear friend,

    Thanks for your great support. I tried it and it worked.

    All the Best.

  • @guy038 said in Add comma between two sentences on a two language document:


    Dear friend,

    Could you please tell me whether this method also works in Microsoft Office Word, or whether I need to do something else?

    All the Best.

  • Hi, @ali-jafari and All,

    Unfortunately, I cannot give you much useful information :-(( I’m presently using the Microsoft Office 2002 suite on my old XP laptop, which is good enough for my work in Word !

    SEARCH ( for Word ) : ([^0033-^0126])^0032*([^1536-^1791])

    REPLACE ( for Word ) : \1,\2

    In Word 2002, when you tick the search option Use wildcards, any single-byte character ( below \x{0100} ) must be written as ^####, where #### is the decimal value of its code point. So \x{0021} must be written as ^0033, \x{007E} as ^0126, and so on…
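    The hex-to-decimal conversion is mechanical, so it can be scripted. A minimal Python sketch, assuming the ^#### notation with four zero-padded decimal digits shown above (npp_class_to_word is a hypothetical helper name). It only rewrites the notation: whether Word accepts the resulting codes, especially those above 255, depends on the Word version.

```python
import re

# Matches an escape such as \x{0021} as written in an N++ regex.
NPP_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")


def npp_class_to_word(npp_class: str) -> str:
    r"""Replace each \x{HHHH} escape by Word's ^DDDD decimal code."""
    return NPP_ESCAPE.sub(
        lambda m: "^" + format(int(m.group(1), 16), "04d"), npp_class
    )


if __name__ == "__main__":
    print(npp_class_to_word(r"[\x{0021}-\x{007E}]"))  # [^0033-^0126]
    print(npp_class_to_word(r"\x{0020}"))             # ^0032
    print(npp_class_to_word(r"[\x{0600}-\x{06FF}]"))  # [^1536-^1791]
```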

    Unfortunately, code points above \xFF, such as the main Arabic range [^1536-^1791] ( that is to say [\x{0600}-\x{06FF}] in N++ ), are definitely not valid syntax :-((

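    Moreover, the quantifier syntax {0,n}, after a possible space character, does not work either. Only {1,n} seems valid ! So I preferred to use the * syntax. ( Beware, however: in Word’s wildcard mode, * means “any string of characters”, not “zero or more occurrences of the preceding character”, so ^0032* does not behave exactly like \x20* in N++. )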

    Presumably, recent versions of Word allow searching for any character of the BMP ( from \x{0000} to \x{FFFF} ). If so, the proposed regex S/R should work correctly !

    Alternatively, why not process your text with the N++ regex engine first, and then paste the updated text into Word ?
