    Cyrillic UCase to Lcase in the middle and end of words

    • MrSimurq

      Hi there,

      I have a text file with a bunch of words in Cyrillic with capital letters in the middle or end of the words like this:

      абигЭль
      зИппо
      клубЫ
      бегОм
      надЕяться…

      I’m trying the following Regex to lowercase ONLY the capitalized characters:

      Find: ( \w+)([\x{0410}-\x{042F}])(\w+)
      Replace with: \1\L\2\E\3

      The Find expression seems to match as expected, but the Replace with expression does not lowercase the captured letter.

      Can you help with this one? Thanks!!!

      • guy038

        Hello, @Mrsimurq,

        Unfortunately, there is no regex solution :-(( The N++ version of the Boost C++ Regex library still contains some bugs, and this is one of them!

        The case modifiers ( \l, \u, \L and \U ), used in the replacement part, work only on characters with a Unicode code point up to \x{007F}, that is to say, only on the unaccented set of letters [A-Za-z] :-(( Really bad!

        For instance, consider the French text below, all in upper-case, pasted in a new tab:

        C'EST LÀ, PRÈS DE LA FORÊT, DANS UN GÎTE, OÙ RÉGNAIT UN GRAND CAPHARNAÜM, QUE L'AÏEUL ÔTA SA FLÛTE ET SON BÂTON DE SON CANOË
        

        The regex S/R, with SEARCH (?s).+ and REPLACE \L$0, would give the text:

        c'est lÀ, prÈs de la forÊt, dans un gÎte, oÙ rÉgnait un grand capharnaÜm, que l'aÏeul Ôta sa flÛte et son bÂton de son canoË
        

        Note that all the accented characters are still in upper-case!
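
The result above can be imitated in a few lines of Python, purely as an illustration. The helper name `ascii_only_lower` is hypothetical, and this is not how Boost implements `\L`; it simply lowercases A-Z and leaves every other character alone, which is what the observed output looks like.

```python
def ascii_only_lower(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other character as-is."""
    return "".join(ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


french = "C'EST LÀ, PRÈS DE LA FORÊT"

# ASCII-only conversion: the accented capitals survive, as in the N++ result.
print(ascii_only_lower(french))   # c'est lÀ, prÈs de la forÊt

# Unicode-aware conversion, for comparison: every letter is lowercased.
print(french.lower())             # c'est là, près de la forêt
```

With Cyrillic input such as ЗИППО, `ascii_only_lower` changes nothing at all, since none of the letters fall in A-Z.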


        Now, take the Cyrillic text:

            Upper-Case         Lower-case           Your example
                                                 
            АБИГЭЛЬ            абигэль              абигЭль
            ЗИППО              зиппо                зИппо
            КЛУБЫ              клубы                клубЫ
            БЕГОМ              бегом                бегОм
            НАДЕЯТЬСЯ…         надеяться…           надЕяться…
        

        The above S/R would give:

            upper-case         lower-case           your example
                                                 
            АБИГЭЛЬ            абигэль              абигЭль
            ЗИППО              зиппо                зИппо
            КЛУБЫ              клубы                клубЫ
            БЕГОМ              бегом                бегОм
            НАДЕЯТЬСЯ…         надеяться…           надЕяться…
        

        Ironically, only the title line is lower-cased! None of the Cyrillic characters, with Unicode values between \x{0400} and \x{04FF}, are converted.


        However, if you manually select any amount of text, with either a normal or a rectangular selection, you can change its case:

        • to UPPER-case, with the menu command Edit > Convert Case to > UPPERCASE or Ctrl + Shift + U

        • to lower-case, with the menu command Edit > Convert Case to > lowercase or Ctrl + U
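
If the file can be processed outside Notepad++, a short script is another option. The following is a minimal Python sketch, not something tested against the original file. The name `lower_inner_capitals` is made up for this example, and it assumes the capitals come only from the basic range А-Я (\x{0410}-\x{042F}, as in the original Find expression), so Ё (\x{0401}) is not covered.

```python
import re

# One capital from the basic range U+0410-U+042F that is preceded by a word
# character. The lookbehind keeps word-initial capitals untouched.
INNER_CAPITAL = re.compile(r"(?<=\w)[\u0410-\u042F]")


def lower_inner_capitals(text: str) -> str:
    """Lowercase Cyrillic capitals found in the middle or at the end of words."""
    # str.lower() is Unicode-aware, so the Cyrillic letters are converted.
    return INNER_CAPITAL.sub(lambda m: m.group(0).lower(), text)


for word in ["абигЭль", "зИппо", "клубЫ", "бегОм", "надЕяться…"]:
    print(lower_inner_capitals(word))
```

Note that a fully upper-case word such as ЗИППО becomes Зиппо, because every capital that follows another word character is lowercased; only the first letter of a word is left alone.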

        Best Regards,

        guy038

        • MrSimurq

          guy038, thanks for your prompt and plain reply! :))

          I can only hope that this issue will be solved asap…

          As a temporary workaround, I simply S/R the capitals above, which in fact are all vowels, one by one. There are eight in total, so it is not that difficult… For example, my Replace with expression for Э -> э is: \1э\3

          Thanks!

          The Community of users of the Notepad++ text editor.