Add comma between two sentences on a two language document

Ali Jafari

Does anybody know how to insert a comma at the point where the language changes within a line of a two-language document? Suppose we have a sentence like “this is a testاین یک آزمایش است”, with English text followed by Persian text. I would like it to become "this is a test,این یک آزمایش است". I need to do this throughout a large document with the “Search and Replace” command.

guy038

Hello, @ali-jafari, @terry-r, @ekopalypse and All,

As you may know, the different Arabic characters belong to one of these 5 Unicode blocks :

Arabic : [\x{0600}-\x{06FF}]
Arabic Supplement : [\x{0750}-\x{077F}]
Arabic Extended-A : [\x{08A0}-\x{08FF}]
Arabic Presentation Forms-A : [\x{FB50}-\x{FDFF}]
Arabic Presentation Forms-B : [\x{FE70}-\x{FEFF}]

Refer to :

http://www.unicode.org/charts/PDF/U0600.pdf
http://www.unicode.org/charts/PDF/U0750.pdf
http://www.unicode.org/charts/PDF/U08A0.pdf
http://www.unicode.org/charts/PDF/UFB50.pdf
http://www.unicode.org/charts/PDF/UFE70.pdf
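
As a quick check of the list above, here is a minimal Python sketch (not part of the original answer) that reports which of these blocks a character falls in. The function name `arabic_block` is hypothetical, and the Arabic Supplement range is taken as U+0750 to U+077F, as shown in the U0750 chart.

```python
# Block ranges as (name, first code point, last code point).
ARABIC_BLOCKS = [
    ("Arabic", 0x0600, 0x06FF),
    ("Arabic Supplement", 0x0750, 0x077F),
    ("Arabic Extended-A", 0x08A0, 0x08FF),
    ("Arabic Presentation Forms-A", 0xFB50, 0xFDFF),
    ("Arabic Presentation Forms-B", 0xFE70, 0xFEFF),
]


def arabic_block(ch):
    """Return the name of the Arabic block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in ARABIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None
```

The Persian letters of the sample sentence, such as U+06CC, fall in the main Arabic block, so a Persian document is covered by the same ranges.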

So, here is the road map :

Open the Replace dialog ( Ctrl + H )
SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])
REPLACE \1,
Tick the Wrap around option
Select the Regular expression search mode
Click on the Replace All button

Notes :

The part ([\x{0021}-\x{007E}]) matches any single ASCII character from \x{0021} = ! to \x{007E} = ~ and stores it as group 1, due to the parentheses. It may be followed by a space char ( \x20? ), and the match succeeds ONLY IF the next char is an Arabic one, from one of the five blocks described above ( due to the lookahead construction, which does not consume that char )
In replacement, the ASCII char is simply rewritten ( \1 ), followed by a comma. The optional space, if any, is dropped

For instance the two lines :

this is a testاین یک آزمایش است
this is a test این یک آزمایش است

would be changed into :

this is a test,این یک آزمایش است
this is a test,این یک آزمایش است
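
For readers who prefer to script the same search and replace outside N++, here is a minimal sketch with Python's `re` module. It is an adaptation, not the N++ engine: Python writes code points as `\uXXXX` instead of `\x{XXXX}`, the helper name `add_comma` is hypothetical, and the Arabic Supplement range is taken as U+0750 to U+077F.

```python
import re

# The five Arabic blocks, written with Python's \uXXXX escapes.
ARABIC = r"\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# Group 1: one visible ASCII char; then an optional space;
# then a lookahead requiring an Arabic char (not consumed).
PATTERN = re.compile(r"([\u0021-\u007E])\x20?(?=[" + ARABIC + r"])")


def add_comma(text):
    """Insert a comma where ASCII text is directly followed by Arabic text."""
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    print(add_comma("this is a testاین یک آزمایش است"))
    print(add_comma("this is a test این یک آزمایش است"))
```

Because the lookahead does not consume the Arabic character, only the last ASCII character and the optional space are rewritten.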

Best Regards,

guy038

P.S. :

Note that the sub-regex which matches the ASCII character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}] ! Indeed, as the Arabic text itself contains space chars too, we would get some false positive matches inside the Arabic text ;-))
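
The false positives mentioned in this P.S. can be reproduced with a small Python sketch. It assumes Python's `re` module rather than the N++ engine, uses only the main Arabic block for brevity, and the names `NARROW` and `WIDE` are hypothetical.

```python
import re

ARABIC = r"\u0600-\u06FF"  # main Arabic block only, for brevity

# Group 1 starts at "!" (U+0021): spaces cannot be captured.
NARROW = re.compile(r"([\u0021-\u007E])\x20?(?=[" + ARABIC + r"])")
# Group 1 starts at the space (U+0020): spaces between Arabic words match too.
WIDE = re.compile(r"([\u0020-\u007E])\x20?(?=[" + ARABIC + r"])")

# "test" followed by two Persian words separated by a space.
sample = "test" + "\u0627\u06CC\u0646" + " " + "\u06CC\u06A9"

narrow_result = NARROW.sub(r"\1,", sample)  # one comma, after "test"
wide_result = WIDE.sub(r"\1,", sample)      # extra comma after the inner space
```

With the wider class, the space between the two Persian words is captured as group 1, so a second, unwanted comma is inserted after it.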

Ali Jafari

@guy038 said in Add comma between two sentences on a two language document:

\1,

Dear friend,

Thanks for your great support. I tried it and it worked.

All the Best.

Ali Jafari

@guy038 said in Add comma between two sentences on a two language document:

\1,

My dear friend,

Could you please tell me whether this approach also works in Microsoft Office Word, or do I need to do something else?

All the Best.

guy038

Hi, @ali-jafari and All,

Unfortunately, I cannot give you much useful information :-(( On my old XP laptop, I’m still using the Microsoft Office suite … 2002, which is good enough for my Word work !

SEARCH ( for Word ) : ([^0033-^0126])^0032*([^1536-^1791])

REPLACE ( for Word ) : \1,\2

In Word 2002, when you tick the search option Use generic characters, any non-Unicode char ( below \x0100 ) must be written as ^####, where #### represents the decimal value of the code-point. So, \x{0021} must be changed to ^0033, \x{007E} to ^0126 and so on…
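
The hexadecimal to decimal conversion behind this ^#### notation can be sketched in Python. This only illustrates the arithmetic described in this post; the helper name `word_code` is hypothetical and nothing here has been checked against Word itself.

```python
def word_code(code_point):
    """Format a code point as ^#### with its decimal value on 4 digits."""
    return f"^{code_point:04d}"


# The values used in the SEARCH field for Word
space = word_code(0x20)          # space char
ascii_first = word_code(0x21)    # "!"
ascii_last = word_code(0x7E)     # "~"
arabic_first = word_code(0x0600)
arabic_last = word_code(0x06FF)
```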

Unfortunately, the syntax for code-points over \xFF, such as the main Arabic range [^1536-^1791] ( that is to say [\x{0600}-\x{06FF}] in N++ ), is definitely not valid :-((

Moreover, the quantifier syntax {0,n}, after a possible space char, does not work either. Only the {1,n} form seems valid ! So I preferred to use the usual * syntax

Presumably, recent versions of Word allow searching for any character of the BMP ( so from \x{0000} to \x{FFFF} ). If so, the proposed regex S/R should work correctly !

On the other hand, why not process your text with the N++ regex engine first and then paste the updated text into Word ?

Cheers,

guy038