Add comma between two sentences on a two language document
-
Does anybody know how to insert a comma at the end of the English part of a line when working in a two-language document? Suppose we have a sentence like “this is a testاین یک آزمایش است”, with an English text followed by a Persian text. I would like this sentence to become “this is a test,این یک آزمایش است”. I need to do this throughout a large document with the “Search and Replace” command.
-
Hello, @ali-jafari, @terry-r, @ekopalypse and All,
As you may know, the different Arabic characters belong to one of these 5 Unicode blocks :
- Arabic : [\x{0600}-\x{06FF}]
- Arabic Supplement : [\x{0750}-\x{077F}]
- Arabic Extended-A : [\x{08A0}-\x{08FF}]
- Arabic Presentation Forms-A : [\x{FB50}-\x{FDFF}]
- Arabic Presentation Forms-B : [\x{FE70}-\x{FEFF}]
Refer to :
http://www.unicode.org/charts/PDF/U0600.pdf
http://www.unicode.org/charts/PDF/U0750.pdf
http://www.unicode.org/charts/PDF/U08A0.pdf
http://www.unicode.org/charts/PDF/UFB50.pdf
http://www.unicode.org/charts/PDF/UFE70.pdf
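To check which of these blocks a given character falls in, here is a minimal Python sketch. The names ARABIC_BLOCKS and arabic_block are made up for this illustration, and the boundaries are the ones printed in the Unicode charts linked above.

```python
# Block boundaries as printed in the Unicode code charts.
# Note: Arabic Supplement ends at U+077F in the chart.
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def arabic_block(ch):
    """Return the name of the listed block that contains ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in ARABIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


print(arabic_block("\u06CC"))  # Arabic (Farsi yeh, U+06CC)
print(arabic_block("t"))       # None (plain ASCII letter)
```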
So, here is the road map :
- Open the Replace dialog ( Ctrl + H )
- SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])
- REPLACE \1,
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
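If you prefer to try the same search and replace outside Notepad++, the steps above can be sketched in Python. This is only an approximation under stated assumptions: Python's re module does not accept the \x{....} notation, so the code points are written as \uXXXX, and the names PATTERN, FA and add_comma are invented for this example.

```python
import re

# Arabic ranges; Arabic Supplement is taken as U+0750-U+077F,
# its extent in the Unicode chart.
ARABIC = r"\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# One printable ASCII char (group 1), an optional space,
# then a lookahead that requires an Arabic char without consuming it.
PATTERN = re.compile(r"([\u0021-\u007E])\u0020?(?=[" + ARABIC + "])")

# The Persian sample from the question, written with escapes
# to avoid any copy/paste or bidi display issue.
FA = "\u0627\u06CC\u0646 \u06CC\u06A9 \u0622\u0632\u0645\u0627\u06CC\u0634 \u0627\u0633\u062A"


def add_comma(text):
    """Rewrite the ASCII char and put a comma before the Arabic text."""
    return PATTERN.sub(r"\1,", text)


# Both input forms (with and without a space) give the same result.
print(add_comma("this is a test" + FA) == "this is a test," + FA)   # True
print(add_comma("this is a test " + FA) == "this is a test," + FA)  # True
```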
Notes :
- The part ([\x{0021}-\x{007E}]) searches for any single ASCII character from \x{0021} = ! to \x{007E} = ~, stored as group 1 because of the parentheses. It may be followed by a space char ( \x20? ), and the match succeeds ONLY IF an Arabic char, from one of the five blocks described above, comes next ( due to the lookahead construction, which does not consume that Arabic char )
- In replacement, the ASCII char is simply rewritten ( \1 ), followed by a comma. The optional space, if any, is dropped
For instance, the two lines :
this is a testاین یک آزمایش است
this is a test این یک آزمایش است
would be changed as :
this is a test,این یک آزمایش است
this is a test,این یک آزمایش است
Best Regards,
guy038
P.S. :
Note that the sub-regex which matches the ASCII character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}] ! Indeed, as the Arabic text itself contains space chars too, we would get some false positive matches inside the Arabic text ;-))
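The false positives mentioned in this P.S. can be reproduced with a small Python sketch. Same caveats as before: Python writes the code points as \uXXXX rather than \x{....}, and the names STRICT, LOOSE, FA and TEXT are made up for the demonstration.

```python
import re

# Arabic ranges; Arabic Supplement is taken as U+0750-U+077F,
# its extent in the Unicode chart.
ARABIC = r"\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF"

# STRICT starts the ASCII class at ! (0x21); LOOSE starts it at space (0x20).
STRICT = re.compile(r"([\u0021-\u007E])\u0020?(?=[" + ARABIC + "])")
LOOSE = re.compile(r"([\u0020-\u007E])\u0020?(?=[" + ARABIC + "])")

# The Persian sample from the question: four words, three spaces.
FA = "\u0627\u06CC\u0646 \u06CC\u06A9 \u0622\u0632\u0645\u0627\u06CC\u0634 \u0627\u0633\u062A"
TEXT = "this is a test" + FA

# STRICT only matches the last English letter.
print(STRICT.findall(TEXT))  # ['t']

# LOOSE also matches every space that precedes a Persian word.
print(LOOSE.findall(TEXT))   # ['t', ' ', ' ', ' ']
```

With the loose class, a Replace All would therefore insert three unwanted commas inside the Persian sentence.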
-
@guy038 said in Add comma between two sentences on a two language document:
\1,
Dear friend,
Thanks for your great support. I tried it and it worked.
All the Best.
-
@guy038 said in Add comma between two sentences on a two language document:
\1,
My dear friend,
Could you please tell me whether this approach also works in Microsoft Office Word, or do I need to do something else?
All the Best.
-
Hi, @ali-jafari and All,
Unfortunately, I cannot give you valuable information :-(( I’m presently using, on my old XP laptop, the Microsoft Office Suite … 2002, which is good enough for my Word work !
SEARCH ( for Word ) : ([^0033-^0126])^0032*([^1536-^1791])
REPLACE ( for Word ) : \1,\2
In Word 2002, when you tick the search option Use generic characters, any char below \x0100 must be written as ^####, where #### represents the decimal value of the code-point. So, \x{0021} must be changed as ^0033, \x{007E} as ^0126 and so on…
Unfortunately, syntaxes over \xFF, as for the main Arabic range [^1536-^1791] ( that is to say [\x{0600}-\x{06FF}] in N++ ), are definitely not valid there :-((
Moreover, the quantifier syntax {0,n}, after a possible space char, does not work either. Only {1,n} seems valid ! So I preferred to use the usual * syntax.
Presumably, the recent versions of Word allow searching for characters of the BMP ( so from \x{0000} to \x{FFFF} ). If so, the proposed regex S/R should work correctly !
On the other hand, why not process the text with the N++ regex engine first and, then, paste your updated text into Word ?
Cheers,
guy038