Add comma between two sentences on a two language document
Does anybody know how we can insert a comma at the point where one language ends and the other begins, when working in a two-language document? Suppose we have a sentence like “this is a testاین یک آزمایش است”, with English text followed by Persian text at the end. I would like to turn it into “this is a test,این یک آزمایش است”. Actually, I would like to do this throughout a large document with the Search and Replace command.
Hello, @ali-jafari, @terry-r, @ekopalypse and All,
As you may know, the different Arabic characters belong to one of these 5 Unicode blocks:
- Arabic: [\x{0600}-\x{06FF}]
- Arabic Supplement: [\x{0750}-\x{077F}]
- Arabic Extended-A: [\x{08A0}-\x{08FF}]
- Arabic Presentation Forms-A: [\x{FB50}-\x{FDFF}]
- Arabic Presentation Forms-B: [\x{FE70}-\x{FEFF}]
Refer to :
http://www.unicode.org/charts/PDF/U0600.pdf
http://www.unicode.org/charts/PDF/U0750.pdf
http://www.unicode.org/charts/PDF/U08A0.pdf
http://www.unicode.org/charts/PDF/UFB50.pdf
http://www.unicode.org/charts/PDF/UFE70.pdf
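To double-check which block a given character falls into, the list above can be turned into a small lookup table. This is only a sketch in Python (a language the thread itself never uses); the names `ARABIC_BLOCKS` and `arabic_block` are hypothetical, and the ranges are the ones from the Unicode charts referenced above.

```python
from typing import Optional

# Inclusive (start, end) code-point ranges of the five Arabic-script blocks.
ARABIC_BLOCKS = {
    "Arabic": (0x0600, 0x06FF),
    "Arabic Supplement": (0x0750, 0x077F),
    "Arabic Extended-A": (0x08A0, 0x08FF),
    "Arabic Presentation Forms-A": (0xFB50, 0xFDFF),
    "Arabic Presentation Forms-B": (0xFE70, 0xFEFF),
}


def arabic_block(ch: str) -> Optional[str]:
    """Return the name of the Arabic-script block containing ch, or None."""
    cp = ord(ch)
    for name, (start, end) in ARABIC_BLOCKS.items():
        if start <= cp <= end:
            return name
    return None


if __name__ == "__main__":
    # Print each character of a mixed English/Persian sample with its block.
    for ch in "test\u0627\u06cc\u0646":
        print(f"U+{ord(ch):04X}", arabic_block(ch))
```

Persian letters such as ی ( U+06CC ) and ک ( U+06A9 ) fall in the main Arabic block, which is why that block matters most for this thread.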
So, here is the road map:

- Open the Replace dialog ( Ctrl + H )
- SEARCH ([\x{0021}-\x{007E}])\x20?(?=[\x{0600}-\x{06FF}\x{0750}-\x{077F}\x{08A0}-\x{08FF}\x{FB50}-\x{FDFF}\x{FE70}-\x{FEFF}])
- REPLACE \1,
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
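To try the same S/R outside Notepad++, the regex can be ported to Python's `re` module. This is a sketch under the assumption that `re` behaves like the N++ regex engine for this simple pattern; the only syntax change is that `re` does not accept `\x{....}`, so code points are written as `\xHH` and `\uHHHH`. The name `add_comma` is hypothetical.

```python
import re

# Any char from the five Arabic-script blocks (N++ \x{....} becomes \uXXXX).
ARABIC = r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]"

# Group 1: one printable ASCII char (! to ~), then an optional space,
# but only when an Arabic char follows (lookahead, so it is not consumed).
PATTERN = re.compile(r"([\x21-\x7E])\x20?(?=" + ARABIC + r")")


def add_comma(text: str) -> str:
    """Insert a comma between the last ASCII char and the following Arabic text."""
    # \1, rewrites the ASCII char followed by a comma; the optional space,
    # matched outside group 1, is dropped.
    return PATTERN.sub(r"\1,", text)


if __name__ == "__main__":
    print(add_comma("this is a testاین یک آزمایش است"))
    print(add_comma("this is a test این یک آزمایش است"))
```

Both calls should give the same comma-separated line, whether or not a space separated the two languages.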
Notes:

- The part ([\x{0021}-\x{007E}]) searches for any single ASCII character from \x{0021} = ! to \x{007E} = ~, stored as group 1 due to the parentheses, possibly followed by a space char ( \x20? ), ONLY IF followed by an Arabic char from one of the five blocks described above ( due to the lookahead construction, which checks that char without consuming it )
- In the replacement, the English-American char is simply rewritten ( \1 ), followed by a comma. As the optional space is matched outside group 1, it is deleted
For instance, the two lines:

this is a testاین یک آزمایش است
this is a test این یک آزمایش است

would be changed as:

this is a test,این یک آزمایش است
this is a test,این یک آزمایش است
Best Regards,
guy038
P.S.:
Note that the sub-regex which matches the English-American character is [\x{0021}-\x{007E}] and not [\x{0020}-\x{007E}]! Indeed, as the Arabic text itself contains space chars too, we would get some false-positive matches inside the Arabic text ;-))
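The false-positive risk can be shown by running the lookahead regex twice, once with each character class. This is a Python sketch; `with_class` is a hypothetical helper, and Python's `re` stands in for the N++ engine.

```python
import re

# Same Arabic-script class as in the N++ regex, in Python re syntax.
ARABIC = r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]"


def with_class(ascii_class: str, text: str) -> str:
    """Apply the S/R, using ascii_class as the 'English char' sub-regex."""
    pattern = re.compile("(" + ascii_class + r")\x20?(?=" + ARABIC + ")")
    return pattern.sub(r"\1,", text)


if __name__ == "__main__":
    sample = "test\u0627\u06cc\u0646 \u06cc\u06a9"  # "testاین یک"
    print(with_class(r"[\x21-\x7E]", sample))  # one comma, after "test"
    # With \x20 included, the space between the two Persian words is taken as
    # the "English char", so a spurious comma appears inside the Persian text.
    print(with_class(r"[\x20-\x7E]", sample))
```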
@guy038 said in Add comma between two sentences on a two language document:
\1,
Dear friend,
Thanks for your great support. I have tried it and it worked.
All the Best.
@guy038 said in Add comma between two sentences on a two language document:
\1,
Dear friend,
Could you please tell me whether this approach also works in Microsoft Office Word, or whether I need to do something else?
All the Best.
Hi, @ali-jafari and All,

Unfortunately, I cannot give you much valuable information :-(( On my old XP laptop, I'm presently using the Microsoft Office Suite … 2002, which is good enough for my Word work!

SEARCH ( for Word ):
([^0033-^0126])^0032*([^1536-^1791])

REPLACE ( for Word ):
\1,\2
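Presumably because Word's wildcard search offers no lookahead, the Word version captures the Arabic char as group 2 and writes it back with \2. That capture-and-restore idea can be sketched with Python's `re`; this only emulates the logic and assumes Word's wildcard engine treats this pattern the same way. `add_comma_word_style` is a hypothetical name.

```python
import re

# Group 1: one ASCII char from ! to ~ (like [^0033-^0126]).
# " *": zero or more spaces (like ^0032*), matched outside the groups.
# Group 2: one char of the main Arabic block U+0600-U+06FF (like [^1536-^1791]),
# consumed here and restored by \2, instead of being checked by a lookahead.
WORD_STYLE = re.compile(r"([\x21-\x7E]) *([\u0600-\u06FF])")


def add_comma_word_style(text: str) -> str:
    """Apply the Word-style S/R: \\1,\\2."""
    return WORD_STYLE.sub(r"\1,\2", text)


if __name__ == "__main__":
    print(add_comma_word_style("this is a test   این یک آزمایش است"))
```

Unlike the N++ regex, ^0032* ( here " *" ) also removes several consecutive spaces, not just one.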
In Word 2002, when you tick the search option Use wildcards, any char below \x0100 must be written as ^####, where #### represents the decimal value of its code-point. So \x{0021} must be changed into ^0033, \x{007E} into ^0126, and so on…

Unfortunately, syntaxes over \xFF, as for the main Arabic range [^1536-^1791] ( that is to say [\x{0600}-\x{06FF}] in N++ ), are definitely not valid :-((

Moreover, the quantifier syntax {0,n}, after a possible space char, does not work either. Only {1,n} seems valid! So I preferred to use the usual * syntax.

Probably, recent versions of Word allow searching for any character of the BMP ( so from \x{0000} to \x{FFFF} ). If so, the proposed regex S/R should work correctly!

On the other hand, why not process your text with the N++ regex engine first, and then paste the updated text into Word?
Cheers,
guy038