Cyrillic UCase to Lcase in the middle and end of words
-
Hi there,
I have a text file with a bunch of words in Cyrillic with capital letters in the middle or end of the words like this:
абигЭль
зИппо
клубЫ
бегОм
надЕяться…
I’m trying the following Regex to lowercase ONLY the capitalized characters:
Find: ( \w+)([\x{0410}-\x{042F}])(\w+)
Replace with: \1\L\2\E\3
Apparently, the Find expression works but the same is not true for Replace with.
Can you help with this one? Thanks!!!
-
Hello, @Mrsimurq,
There is no solution, indeed :-(( The version of the Boost C++ Regex library used by N++ still contains some bugs, and this is one of them!
The case modifiers (\l, \u, \L and \U), used in the replacement part, work only on characters with a Unicode code point no higher than \x{007F}, that is to say, only on the unaccented letters [A-Za-z] :-(( Really bad!
For instance, consider the French text below, all in upper-case, pasted in a new tab:
C'EST LÀ, PRÈS DE LA FORÊT, DANS UN GÎTE, OÙ RÉGNAIT UN GRAND CAPHARNAÜM, QUE L'AÏEUL ÔTA SA FLÛTE ET SON BÂTON DE SON CANOË
The regex S/R, with SEARCH (?s).+ and REPLACE \L$0, would give the text:
c'est lÀ, prÈs de la forÊt, dans un gÎte, oÙ rÉgnait un grand capharnaÜm, que l'aÏeul Ôta sa flÛte et son bÂton de son canoË
Note that all the accented characters are still in upper-case!
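To make the difference concrete, here is a minimal sketch in Python 3 (an assumption, it is not part of Notepad++). The helper `ascii_only_lower` is a made-up name that only imitates the behaviour described above by touching A-Z; it is not the Boost implementation. Python's built-in `str.lower()` stands in for a Unicode-aware conversion.

```python
def ascii_only_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z (imitates the behaviour described above)."""
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


french = (
    "C'EST LÀ, PRÈS DE LA FORÊT, DANS UN GÎTE, OÙ RÉGNAIT UN GRAND "
    "CAPHARNAÜM, QUE L'AÏEUL ÔTA SA FLÛTE ET SON BÂTON DE SON CANOË"
)

# ASCII-only conversion: the accented capitals are left untouched.
ascii_result = ascii_only_lower(french)

# Unicode-aware conversion: the accented capitals are lowercased too.
unicode_result = french.lower()

if __name__ == "__main__":
    print(ascii_result)
    print(unicode_result)
```

The first printed line should match the result shown above, the second is what one would expect from a fully Unicode-aware \L.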
Now, assuming the Cyrillic text :
Upper-Case    Lower-case    Your example
АБИГЭЛЬ       абигэль       абигЭль
ЗИППО         зиппо         зИппо
КЛУБЫ         клубы         клубЫ
БЕГОМ         бегом         бегОм
НАДЕЯТЬСЯ…    надеяться…    надЕяться…
The above S/R would get :
upper-case    lower-case    your example
АБИГЭЛЬ       абигэль       абигЭль
ЗИППО         зиппо         зИппо
КЛУБЫ         клубы         клубЫ
БЕГОМ         бегом         бегОм
НАДЕЯТЬСЯ…    надеяться…    надЕяться…
Ironically, only the title line is lower-cased! All the other Cyrillic characters, with Unicode values between \x{0400} and \x{04FF}, are not converted.
However, if you manually select any amount of text, with either a normal or a rectangular selection, you can change it:
- to UPPER-case, with the menu command Edit > Convert Case to > UPPERCASE or Ctrl + Shift + U
- to lower-case, with the menu command Edit > Convert Case to > lowercase or Ctrl + U
Best Regards,
guy038
-
guy038, thanks for your prompt and plain reply! :))
I can only hope that this issue will be solved ASAP…
As a temporary workaround, I just S/R the above capitals, which in fact are vowels only, one by one. Eight in total, not that difficult… So, my Replace with regex for Э -> э looks like: \1э\3
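For anyone who can run the file through a script instead, the one-by-one replacement can be avoided. The following is only a minimal sketch, assuming Python 3 is available; the name `lower_inner_capitals` is made up for illustration. It uses the same \x{0410}-\x{042F} range as the Find expression above, so Ё (U+0401) is not covered.

```python
import re

# Capital Cyrillic letters U+0410..U+042F, the same range as in the Find
# expression above. The lookbehind (?<=\w) requires a preceding word
# character, so a capital at the very start of a word is left alone.
INNER_CAPITAL = re.compile(r"(?<=\w)[\u0410-\u042F]")


def lower_inner_capitals(text: str) -> str:
    """Lowercase capital Cyrillic letters that are not the first letter of a word."""
    return INNER_CAPITAL.sub(lambda m: m.group(0).lower(), text)


if __name__ == "__main__":
    for word in ["абигЭль", "зИппо", "клубЫ", "бегОм", "надЕяться…"]:
        print(lower_inner_capitals(word))
```

Because the match is a single letter preceded by a word character, capitals at the end of a word (as in клубЫ) are handled as well as those in the middle.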
Thanks!