    Cyrillic UCase to Lcase in the middle and end of words

    • MrSimurq

      Hi there,

      I have a text file with a bunch of words in Cyrillic with capital letters in the middle or end of the words like this:

      абигЭль
      зИппо
      клубЫ
      бегОм
      надЕяться…

      I’m trying the following Regex to lowercase ONLY the capitalized characters:

      Find: ( \w+)([\x{0410}-\x{042F}])(\w+)
      Replace with: \1\L\2\E\3

      Apparently, the Find expression works but the same is not true for Replace with.

      Can you help with this one? Thanks!!!

      • guy038

        Hello, @Mrsimurq,

        Unfortunately, there is no regex-only solution :-(( The N++ build of the Boost C++ Regex library still contains some bugs, and this is one of them!

        The case modifiers ( \l, \u, \L and \U ), used in the replacement part, work only on characters with a Unicode code point < \x{007F}, that is to say, only on the unaccented letters [A-Za-z] :-(( Really bad!

        For instance, consider the French text below, all in upper-case, pasted in a new tab:

        C'EST LÀ, PRÈS DE LA FORÊT, DANS UN GÎTE, OÙ RÉGNAIT UN GRAND CAPHARNAÜM, QUE L'AÏEUL ÔTA SA FLÛTE ET SON BÂTON DE SON CANOË
        

        The regex S/R with SEARCH (?s).+ and REPLACE \L$0 would give the text:

        c'est lÀ, prÈs de la forÊt, dans un gÎte, oÙ rÉgnait un grand capharnaÜm, que l'aÏeul Ôta sa flÛte et son bÂton de son canoË
        

        Note that all the accented characters are still in upper-case!


        Now, assuming the Cyrillic text :

            Upper-Case         Lower-case           Your example
                                                 
            АБИГЭЛЬ            абигэль              абигЭль
            ЗИППО              зиппо                зИппо
            КЛУБЫ              клубы                клубЫ
            БЕГОМ              бегом                бегОм
            НАДЕЯТЬСЯ…         надеяться…           надЕяться…
        

        The above S/R would give:

            upper-case         lower-case           your example
                                                 
            АБИГЭЛЬ            абигэль              абигЭль
            ЗИППО              зиппо                зИппо
            КЛУБЫ              клубы                клубЫ
            БЕГОМ              бегом                бегОм
            НАДЕЯТЬСЯ…         надеяться…           надЕяться…
        

        Ironically, only the title line is lower-cased! None of the Cyrillic characters, whose Unicode values lie between \x{0400} and \x{04FF}, are converted.


        However, if you manually select any amount of text, with either a normal or a rectangular selection, you can convert it:

        • in UPPER-case, with the command menu Edit > Convert Case to > UPPERCASE or Ctrl + Shift + U

        • in lower-case, with the command menu Edit > Convert Case to > lowercase or Ctrl + U
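
        If the file can be processed outside N++, a short script is another possible workaround. The sketch below assumes Python 3 and a hypothetical helper name, `lower_inner_capitals`. It lowercases a capital Cyrillic letter only when a word character precedes it, so word-initial capitals are kept. The range \x{0410}-\x{042F} comes from the original Find expression; adding Ё (\x{0401}) is an assumption.

```python
import re

# Capital letters from the original Find range (U+0410..U+042F), plus Ё (U+0401),
# matched only when preceded by a word character, i.e. never word-initial.
INNER_UPPER = re.compile(r"(?<=\w)[\u0410-\u042F\u0401]")


def lower_inner_capitals(text: str) -> str:
    """Lowercase Cyrillic capitals found in the middle or at the end of words."""
    return INNER_UPPER.sub(lambda m: m.group(0).lower(), text)


sample = "абигЭль зИппо клубЫ бегОм надЕяться…"
print(lower_inner_capitals(sample))  # абигэль зиппо клубы бегом надеяться…
```

        Note that a word written entirely in capitals would keep only its first capital with this approach, so it suits word lists like the one above rather than arbitrary text.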

        Best Regards,

        guy038

        • MrSimurq

          guy038, thanks for your prompt and clear reply! :))

          I can only hope that this issue will be fixed asap…

          As a temporary workaround, I just S/R the above capitals one by one; they are in fact all vowels. Eight in total, not that difficult… So, my Replace with regex for Э -> э looks like: \1э\3
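
          The same letter-by-letter idea can be sketched in Python: one literal replacement per capital, with no case modifier involved. This is only an illustration; `replace_one_by_one` and `VOWEL_PAIRS` are hypothetical names, and the mapping lists just the five capitals visible in the sample words, since the eight vowels are not enumerated here.

```python
import re

# Hypothetical mapping: only the capitals that appear in the sample words.
# The full set of eight vowels is not listed in the thread.
VOWEL_PAIRS = {"Э": "э", "И": "и", "Ы": "ы", "О": "о", "Е": "е"}


def replace_one_by_one(text: str, pairs: dict = VOWEL_PAIRS) -> str:
    """Run one literal search/replace per capital letter, skipping word-initial ones."""
    for upper, lower in pairs.items():
        # The lookbehind requires a preceding word character (not word-initial).
        text = re.sub(rf"(?<=\w){upper}", lower, text)
    return text


words = ["абигЭль", "зИппо", "клубЫ", "бегОм", "надЕяться…"]
print([replace_one_by_one(w) for w in words])
```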

          Thanks!

          The Community of users of the Notepad++ text editor.