    Convert to lowercase in xml text

    • Stefan Krumov

      Hi! I need to convert Cyrillic text to lowercase without changing the XML tags around it (for example: <Heart.label></Heart.label> )

      Example:
      BEFORE:
      <Heart.label>Сърце</Heart.label>
      <LeftLung.label>Ляв бял дроб</LeftLung.label>
      <RightLung.label>Десен бял дроб</RightLung.label>
      <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
      <RightKidney.label>Десен бъбрек</RightKidney.label>
      <Liver.label>Черен дроб</Liver.label>
      <Stomach.label>Стомах</Stomach.label>

      AFTER:
      <Heart.label>сърце</Heart.label>
      <LeftLung.label>ляв бял дроб</LeftLung.label>
      <RightLung.label>десен бял дроб</RightLung.label>
      <LeftKidney.label>ляв бъбрек</LeftKidney.label>
      <RightKidney.label>десен бъбрек</RightKidney.label>
      <Liver.label>черен дроб</Liver.label>
      <Stomach.label>стомах</Stomach.label>

      Is it possible to do this in a quick way?
      Sorry for bad English!
      Thanks in advance!

      • guy038

        Hello, @stefan-krumov,

        Logically, the most direct way to do it would be a search/replace ( S/R ) with regular expressions, as below, which replaces any text between the tags <xxx.label> and </xxx.label> with the same text in lower case !

        SEARCH (?-i)<(.+\.label)>(.+?)</\1>
        REPLACE <\1>\L\2\E</\1>

        Notes :

        • The (?-i) modifier forces the search to be case-sensitive

        • Then, the part <(.+\.label)> stores the tag name as group 1, which is re-used to match the ending tag </\1>

        • And the (.+?) middle part is the shortest range of characters between these two tags, that is to say the Cyrillic text, stored as group 2

        • In the replacement, we first rewrite the starting tag <\1>, then the Cyrillic text \2, preceded by the lower-case modifier \L and followed by \E, which ends the case conversion, and, finally, the ending tag </\1>
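For readers who can process the file outside Notepad++, the same S/R can be sketched in Python. This is only an illustration, not part of the Notepad++ method: the function name `lowercase_labels` is made up for this example, and since Python's `re` module has no `\L` ... `\E` replacement modifiers, a replacement function calls `str.lower()` on group 2 instead. Python's `re` is case-sensitive by default, so no `(?-i)` prefix is used.

```python
import re

# Same pattern as the SEARCH field above: group 1 is the tag name,
# group 2 is the shortest text up to the matching closing tag.
LABEL_RE = re.compile(r"<(.+\.label)>(.+?)</\1>")


def lowercase_labels(text):
    """Lower-case the text between <xxx.label> and </xxx.label>, leaving tags untouched."""
    def repl(m):
        # Rebuild: opening tag, lower-cased content, closing tag.
        return f"<{m.group(1)}>{m.group(2).lower()}</{m.group(1)}>"
    return LABEL_RE.sub(repl, text)


if __name__ == "__main__":
    sample = (
        "<Heart.label>Сърце</Heart.label>\n"
        "<LeftLung.label>Ляв бял дроб</LeftLung.label>"
    )
    print(lowercase_labels(sample))
```

Because `str.lower()` is Unicode-aware, the Cyrillic text is converted while the mixed-case tag names stay as they are.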


        Unfortunately, due to a bug in the Boost regex engine used internally by N++, the case modifiers ( \L, \U, \l and \u ) of the replacement part do NOT change the case of characters with a Unicode value greater than \x{007F}. Too bad, indeed :-((

        By contrast, the above S/R works fine on plain ASCII text. For instance, the line :

        <Heart.label>Young Man Heart</Heart.label>
        

        would, correctly, be changed into :

        <Heart.label>young man heart</Heart.label>
        

        So, Stefan, we cannot use this method to lower-case your Cyrillic characters ! However, note that the Edit > Convert Case to > lowercase command ( Ctrl + U ) works as expected and does change any letter, of any alphabet, to its lower-case counterpart :-))


        Therefore, Stefan, here is a work-around, to get what you want, easily enough ! The general idea is :

        • First, to surround any Cyrillic text, between the two tags <xxx.label> and </xxx.label>, with some tabulation characters

        • Secondly, to get a rectangular selection ( Column mode ) of all your Cyrillic text

        • Thirdly, to use the Edit > Convert Case to > lowercase command ( or the Ctrl + U shortcut )

        • Finally, to delete all the temporary tabulation characters

        So, from your initial example, below :

        <Heart.label>Сърце</Heart.label>
        <LeftLung.label>Ляв бял дроб</LeftLung.label>
        <RightLung.label>Десен бял дроб</RightLung.label>
        <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
        <RightKidney.label>Десен бъбрек</RightKidney.label>
        <Liver.label>Черен дроб</Liver.label>
        <Stomach.label>Стомах</Stomach.label>
        
        • Open the Replace dialog ( Ctrl + H )

        SEARCH (?-i)<(.+\.label)>\K.+?(?=</\1>)

        REPLACE \t\t\t\t$0\t\t\t\t

        OPTIONS Wrap around and Regular expression

        And, after a click on the Replace All button ( do NOT use the Replace button ! ), you get :

        <Heart.label>				Сърце				</Heart.label>
        <LeftLung.label>				Ляв бял дроб				</LeftLung.label>
        <RightLung.label>				Десен бял дроб				</RightLung.label>
        <LeftKidney.label>				Ляв бъбрек				</LeftKidney.label>
        <RightKidney.label>				Десен бъбрек				</RightKidney.label>
        <Liver.label>				Черен дроб				</Liver.label>
        <Stomach.label>				Стомах				</Stomach.label>
        

        Remarks :

        • You may immediately re-run this S/R to add more tabs and better isolate your Cyrillic text

        • You may change, in the replacement part, the number of \t escapes surrounding $0 ( the entire matched text )
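To make the effect of this padding S/R concrete, here is a rough Python equivalent, offered only as a sketch. Python's `re` module does not support `\K`, so this version captures the opening tag and writes it back instead; the names `PAD` and `pad_label_text` are invented for the example.

```python
import re

PAD = "\t" * 4  # four tabs on each side, as in the REPLACE field; the count is arbitrary

# No \K in Python's re: group 1 is the whole opening tag, group 2 the tag name,
# group 3 the text, and the lookahead leaves the closing tag untouched.
PAD_RE = re.compile(r"(<(.+\.label)>)(.+?)(?=</\2>)")


def pad_label_text(text):
    """Surround the text between <xxx.label> and </xxx.label> with tabs."""
    return PAD_RE.sub(lambda m: f"{m.group(1)}{PAD}{m.group(3)}{PAD}", text)


if __name__ == "__main__":
    once = pad_label_text("<Heart.label>Сърце</Heart.label>")
    twice = pad_label_text(once)  # re-running adds another four tabs per side
    print(repr(once))
    print(repr(twice))
```

Running the function a second time mirrors the first remark above: each pass adds another run of tabs on both sides of the text.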

        Now, create a rectangular selection of all the Cyrillic characters ( Alt + mouse drag, or Alt + Shift + arrow keys ) and press Ctrl + U to get this text lower-cased :

        <Heart.label>				сърце				</Heart.label>
        <LeftLung.label>				ляв бял дроб				</LeftLung.label>
        <RightLung.label>				десен бял дроб				</RightLung.label>
        <LeftKidney.label>				ляв бъбрек				</LeftKidney.label>
        <RightKidney.label>				десен бъбрек				</RightKidney.label>
        <Liver.label>				черен дроб				</Liver.label>
        <Stomach.label>				стомах				</Stomach.label>
        

        Finally, with the simple S/R, below, get rid of all the tabulation characters :

        SEARCH \t

        REPLACE Leave EMPTY

        And you get the expected text :

        <Heart.label>сърце</Heart.label>
        <LeftLung.label>ляв бял дроб</LeftLung.label>
        <RightLung.label>десен бял дроб</RightLung.label>
        <LeftKidney.label>ляв бъбрек</LeftKidney.label>
        <RightKidney.label>десен бъбрек</RightKidney.label>
        <Liver.label>черен дроб</Liver.label>
        <Stomach.label>стомах</Stomach.label>
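On a long file it can be reassuring to check that nothing was missed. The following Python sketch is one possible check, not something the steps above require; the helper name `leftover_problems` is hypothetical. It lists every label whose text still contains a tab or an upper-case letter.

```python
import re

# Same tag-matching pattern as in the first S/R of this post.
LABEL_RE = re.compile(r"<(.+\.label)>(.+?)</\1>")


def leftover_problems(text):
    """Return (tag, content) pairs whose content still has a tab or an upper-case letter."""
    return [
        (m.group(1), m.group(2))
        for m in LABEL_RE.finditer(text)
        if "\t" in m.group(2) or m.group(2) != m.group(2).lower()
    ]


if __name__ == "__main__":
    result = (
        "<Heart.label>сърце</Heart.label>\n"
        "<Liver.label>Черен дроб</Liver.label>"
    )
    for tag, content in leftover_problems(result):
        print(f"still needs attention: <{tag}> {content!r}")
```

An empty list means every label is lower-cased and free of the temporary tabs.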
        

        Best Regards,

        guy038

        • Stefan Krumov

          Wow man! You made me very happy indeed :) Thanks so much
