    Convert to lowercase in xml text

      Stefan Krumov

      Hi! I need to change Cyrillic letters to lowercase without changing the XML tags (example: <Heart.label></Heart.label>)

      Example:
      BEFORE:
      <Heart.label>Сърце</Heart.label>
      <LeftLung.label>Ляв бял дроб</LeftLung.label>
      <RightLung.label>Десен бял дроб</RightLung.label>
      <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
      <RightKidney.label>Десен бъбрек</RightKidney.label>
      <Liver.label>Черен дроб</Liver.label>
      <Stomach.label>Стомах</Stomach.label>

      AFTER:
      <Heart.label>сърце</Heart.label>
      <LeftLung.label>ляв бял дроб</LeftLung.label>
      <RightLung.label>десен бял дроб</RightLung.label>
      <LeftKidney.label>ляв бъбрек</LeftKidney.label>
      <RightKidney.label>десен бъбрек</RightKidney.label>
      <Liver.label>черен дроб</Liver.label>
      <Stomach.label>стомах</Stomach.label>

      Is it possible to do this in a quick way?
      Sorry for my bad English!
      Thanks in advance!

        guy038

        Hello, @stefan-krumov,

        Logically, the straightforward way to do it would be a regular-expression search/replacement (S/R), like the one below, which replaces the text between the tags <xxx.label> and </xxx.label> with the same text in lowercase:

        SEARCH (?-i)<(.+\.label)>(.+?)</\1>
        REPLACE <\1>\L\2\E</\1>

        Notes :

        • The (?-i) modifier forces the search to be case-sensitive

        • Then, the part <(.+\.label)> stores the tag name as group 1, which is re-used to match the ending tag </\1>

        • And the (.+?) middle part matches the shortest range of characters between these two tags, that is to say the Cyrillic text, stored as group 2

        • In the replacement, we first write the opening tag <\1>, then the Cyrillic text \2, preceded by the lowercase modifier \L and followed by the end-of-case-change modifier \E, and finally the closing tag </\1>
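For comparison, the intended result of this S/R can be sketched in Python, whose str.lower() handles Cyrillic as well as ASCII letters. This is only an illustration outside Notepad++, assuming one label per line as in the example; lowercase_labels is a hypothetical helper name. Python's re module is case-sensitive by default, so (?-i) is dropped, and because Python replacement strings have no \L / \E, a callback does the lowercasing instead.

```python
import re

# Same pattern as the N++ search, minus (?-i): Python's re is case-sensitive
# by default. Group 1 = tag name (e.g. "Heart.label"), group 2 = the text.
LABEL_RE = re.compile(r"<(.+\.label)>(.+?)</\1>")


def lowercase_labels(text: str) -> str:
    """Lowercase only the text between <xxx.label> and </xxx.label>."""
    # No \L...\E in Python replacements, so a callback rebuilds the match;
    # str.lower() is Unicode-aware, so Cyrillic letters are converted too.
    return LABEL_RE.sub(
        lambda m: f"<{m.group(1)}>{m.group(2).lower()}</{m.group(1)}>", text
    )


if __name__ == "__main__":
    sample = (
        "<Heart.label>Сърце</Heart.label>\n"
        "<LeftLung.label>Ляв бял дроб</LeftLung.label>"
    )
    print(lowercase_labels(sample))
```

The tag names themselves are written back unchanged, so only the text between the tags is affected.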


        Unfortunately, due to a bug in the Boost regex engine used internally by N++, the case modifiers (\L, \U, \l and \u) in the replacement part do NOT change the case of characters whose Unicode value is greater than \x{007F}. Too bad, indeed :-((

        By contrast, the above S/R works fine for ASCII letters. For instance, the line:

        <Heart.label>Young Man Heart</Heart.label>
        

        would correctly be changed into:

        <Heart.label>young man heart</Heart.label>
        

        So, Stefan, we cannot use this method to lowercase your Cyrillic characters! However, note that the Edit > Convert Case to > lowercase command (Ctrl + U) works as expected and changes any letter, of any alphabet, to its lowercase counterpart :-))


        Therefore, Stefan, here is a workaround that gets you what you want easily enough! The general idea is:

        • First, surround all the Cyrillic text between the two tags <xxx.label> and </xxx.label> with some tab characters

        • Secondly, make a rectangular selection (column mode) of all your Cyrillic text

        • Thirdly, use the Edit > Convert Case to > lowercase command (or the Ctrl + U shortcut)

        • Finally, delete all the temporary tab characters

        So, starting from your initial example below:

        <Heart.label>Сърце</Heart.label>
        <LeftLung.label>Ляв бял дроб</LeftLung.label>
        <RightLung.label>Десен бял дроб</RightLung.label>
        <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
        <RightKidney.label>Десен бъбрек</RightKidney.label>
        <Liver.label>Черен дроб</Liver.label>
        <Stomach.label>Стомах</Stomach.label>
        
        • Open the Replace dialog ( Ctrl + H )

        SEARCH (?-i)<(.+\.label)>\K.+?(?=</\1>)

        REPLACE \t\t\t\t$0\t\t\t\t

        OPTIONS Wrap around and Regular expression

        After clicking the Replace All button (do NOT use the Replace button!), you get:

        <Heart.label>				Сърце				</Heart.label>
        <LeftLung.label>				Ляв бял дроб				</LeftLung.label>
        <RightLung.label>				Десен бял дроб				</RightLung.label>
        <LeftKidney.label>				Ляв бъбрек				</LeftKidney.label>
        <RightKidney.label>				Десен бъбрек				</RightKidney.label>
        <Liver.label>				Черен дроб				</Liver.label>
        <Stomach.label>				Стомах				</Stomach.label>
        

        Remarks :

        • You may immediately re-run this S/R to isolate your Cyrillic text even more

        • You may change the number of \t tokens surrounding $0 (the entire matched text) in the replacement part
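The padding step can likewise be sketched in Python, under the same one-label-per-line assumption. Python's re module has no \K, so this sketch captures the tag name and writes the opening tag back itself, while the lookahead (?=</\1>) keeps the closing tag out of the match, just as in the N++ version. pad_label_text and its tabs parameter are hypothetical names, with tabs standing in for the number of \t tokens in the replacement.

```python
import re

# Group 1 = tag name, group 2 = the text; the lookahead checks for the
# matching closing tag without consuming it, so it is left untouched.
PAD_RE = re.compile(r"<(.+\.label)>(.+?)(?=</\1>)")


def pad_label_text(text: str, tabs: int = 4) -> str:
    """Surround the text between <xxx.label> and </xxx.label> with tabs."""
    pad = "\t" * tabs
    # Write the opening tag back explicitly, since there is no \K to keep it.
    return PAD_RE.sub(lambda m: f"<{m.group(1)}>{pad}{m.group(2)}{pad}", text)


if __name__ == "__main__":
    print(repr(pad_label_text("<Heart.label>Сърце</Heart.label>")))
```

Running it a second time on its own output adds another run of tabs on each side, which mirrors the first remark above about re-running the S/R.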

        Now, make a rectangular selection of all the Cyrillic text and press Ctrl + U to lowercase it:

        <Heart.label>				сърце				</Heart.label>
        <LeftLung.label>				ляв бял дроб				</LeftLung.label>
        <RightLung.label>				десен бял дроб				</RightLung.label>
        <LeftKidney.label>				ляв бъбрек				</LeftKidney.label>
        <RightKidney.label>				десен бъбрек				</RightKidney.label>
        <Liver.label>				черен дроб				</Liver.label>
        <Stomach.label>				стомах				</Stomach.label>
        

        Finally, get rid of all the tab characters with this simple S/R:

        SEARCH \t

        REPLACE Leave EMPTY

        And you get the expected text :

        <Heart.label>сърце</Heart.label>
        <LeftLung.label>ляв бял дроб</LeftLung.label>
        <RightLung.label>десен бял дроб</RightLung.label>
        <LeftKidney.label>ляв бъбрек</LeftKidney.label>
        <RightKidney.label>десен бъбрек</RightKidney.label>
        <Liver.label>черен дроб</Liver.label>
        <Stomach.label>стомах</Stomach.label>
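Note that searching for \t with an empty replacement deletes every tab in the file, including any tabs used for indentation. If the real file is indented with tabs, a narrower pattern that only removes the tab runs touching a .label tag may be safer. A minimal Python sketch of that variant, assuming the same tag layout as above (strip_label_padding is a hypothetical name):

```python
import re

# First alternative: an opening <xxx.label> tag (kept as group 1) followed by
# tabs. Second alternative: tabs directly followed by a closing </xxx.label>.
# [^/<>] stops the first alternative from matching a closing tag.
STRIP_RE = re.compile(r"(<[^/<>][^<>]*\.label>)\t+|\t+(?=</[^<>]+\.label>)")


def strip_label_padding(text: str) -> str:
    """Remove only the tab runs adjacent to <xxx.label> / </xxx.label> tags."""
    # When the second alternative matched, group 1 is None: drop the tabs.
    return STRIP_RE.sub(lambda m: m.group(1) or "", text)


if __name__ == "__main__":
    line = "\t<Heart.label>\t\t\t\tсърце\t\t\t\t</Heart.label>"
    print(repr(strip_label_padding(line)))  # leading indentation tab is kept
```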
        

        Best Regards,

        guy038

          Stefan Krumov

          Wow man! You made me very happy indeed :) Thanks so much

          The Community of users of the Notepad++ text editor.