Convert to lowercase in xml text



  • Hi! I need to change Cyrillic letters to lowercase without touching the XML tags (example: <Heart.label></Heart.label> )

    Example:
    BEFORE:
    <Heart.label>Сърце</Heart.label>
    <LeftLung.label>Ляв бял дроб</LeftLung.label>
    <RightLung.label>Десен бял дроб</RightLung.label>
    <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
    <RightKidney.label>Десен бъбрек</RightKidney.label>
    <Liver.label>Черен дроб</Liver.label>
    <Stomach.label>Стомах</Stomach.label>

    AFTER:
    <Heart.label>сърце</Heart.label>
    <LeftLung.label>ляв бял дроб</LeftLung.label>
    <RightLung.label>десен бял дроб</RightLung.label>
    <LeftKidney.label>ляв бъбрек</LeftKidney.label>
    <RightKidney.label>десен бъбрек</RightKidney.label>
    <Liver.label>черен дроб</Liver.label>
    <Stomach.label>стомах</Stomach.label>

    Is it possible to do this in a quick way?
    Sorry for bad English!
    Thanks in advance!



  • Hello, @stefan-krumov,

    Logically, the right and straightforward way to do it would be a S/R with regular expressions, as below, which replaces the text between the tags <xxx.label> and </xxx.label> with the same text, lower-cased !

    SEARCH (?-i)<(.+\.label)>(.+?)</\1>
    REPLACE <\1>\L\2\E</\1>

    Notes :

    • The (?-i) modifier forces the search to be performed in a case-sensitive way

    • Then, the part <(.+\.label)> stores the tag name as group 1, which is re-used to match the ending tag </\1>

    • And the (.+?) middle part is the shortest range of characters between these two tags, that is to say the Cyrillic text, stored as group 2

    • In replacement, we rewrite, first, the starting tag <\1>, then the Cyrillic text \2, preceded by the global lower-case modifier \L and followed by the ending case modifier \E and, finally, the ending tag </\1>
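
    For readers who want to try the same idea outside Notepad++, here is a minimal sketch in Python. It reuses the SEARCH pattern above, minus the leading (?-i), since Python's re module is already case-sensitive by default. The function name lower_labels is only an illustration, and the lower-casing is done by a callback with str.lower() rather than by the \L ... \E modifiers.

```python
import re

# Same pattern as the SEARCH field above, without the leading (?-i):
# Python's re module is case-sensitive unless told otherwise.
LABEL_RE = re.compile(r"<(.+\.label)>(.+?)</\1>")


def lower_labels(text):
    """Lower-case the text between matching <xxx.label> ... </xxx.label> tags."""
    def repl(m):
        tag, body = m.group(1), m.group(2)
        # Rewrite the opening tag, the lower-cased text, then the closing tag.
        return f"<{tag}>{body.lower()}</{tag}>"
    return LABEL_RE.sub(repl, text)


sample = (
    "<Heart.label>Сърце</Heart.label>\n"
    "<LeftLung.label>Ляв бял дроб</LeftLung.label>"
)
print(lower_labels(sample))
```

    The tag names are written back unchanged, so only the text between the tags is affected.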


    Unfortunately, due to a bug, in the BOOST Regex Engine, used internally by N++, the different case modifiers ( \L, \U, \l and \u ), used in the Replacement part, do NOT change the case of characters, with Unicode value greater than \x{007F}. Too bad, indeed :-((

    By contrast, with the above S/R, an ASCII-only line such as :

    <Heart.label>Young Man Heart</Heart.label>
    

    would, correctly, be changed into :

    <Heart.label>young man heart</Heart.label>
    

    So, Stefan, we cannot use this method to lower-case your Cyrillic characters ! However, just note that the Edit > Convert Case to > lowercase command ( Ctrl + U ) works as expected and does change any letter, of any alphabet, to its lower-cased counterpart :-))


    Therefore, Stefan, here is a work-around, to get what you want, easily enough ! The general idea is :

    • First, to surround any Cyrillic text, between the two tags <xxx.label> and </xxx.label>, with some tabulation characters

    • Secondly, to get a rectangular selection ( Column mode ) of all your Cyrillic text

    • Thirdly, to use the Edit > Convert Case to > lowercase command ( or the Ctrl + U shortcut )

    • Finally, to delete all the temporary tabulation characters

    So, from your initial example, below :

    <Heart.label>Сърце</Heart.label>
    <LeftLung.label>Ляв бял дроб</LeftLung.label>
    <RightLung.label>Десен бял дроб</RightLung.label>
    <LeftKidney.label>Ляв бъбрек</LeftKidney.label>
    <RightKidney.label>Десен бъбрек</RightKidney.label>
    <Liver.label>Черен дроб</Liver.label>
    <Stomach.label>Стомах</Stomach.label>
    
    • Open the Replace dialog ( Ctrl + H )

    SEARCH (?-i)<(.+\.label)>\K.+?(?=</\1>)

    REPLACE \t\t\t\t$0\t\t\t\t

    OPTIONS Wrap around and Regular expression

    And, after a click on the Replace All button ( do NOT use the Replace button ! ), you get :

    <Heart.label>				Сърце				</Heart.label>
    <LeftLung.label>				Ляв бял дроб				</LeftLung.label>
    <RightLung.label>				Десен бял дроб				</RightLung.label>
    <LeftKidney.label>				Ляв бъбрек				</LeftKidney.label>
    <RightKidney.label>				Десен бъбрек				</RightKidney.label>
    <Liver.label>				Черен дроб				</Liver.label>
    <Stomach.label>				Стомах				</Stomach.label>
    

    Remarks :

    • You may, immediately, RE-run this S/R, to better isolate your Cyrillic text

    • You may change, in the Replacement part, the number of \t forms, surrounding the $0 syntax ( the entire matched expression )
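
    To see what this padding step does without opening Notepad++, here is a rough Python equivalent. Python's re module has no \K, so this sketch captures the opening tag and writes it back instead. The function name wrap_in_tabs and its parameter n are made up for the illustration.

```python
import re

# \K is not supported by Python's re, so the opening tag is captured
# as group 1 and re-emitted; the closing tag stays in a look-ahead.
WRAP_RE = re.compile(r"<(.+\.label)>(.+?)(?=</\1>)")


def wrap_in_tabs(text, n=4):
    """Surround the text between <xxx.label> tags with n tabs on each side."""
    pad = "\t" * n
    return WRAP_RE.sub(
        lambda m: f"<{m.group(1)}>{pad}{m.group(2)}{pad}", text
    )


print(repr(wrap_in_tabs("<Heart.label>Сърце</Heart.label>")))
```

    Changing n plays the same role as changing the number of \t forms in the Replacement part.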

    Now, create a rectangular selection of all the Cyrillic characters and execute a Ctrl + U action, to get this text lower-cased :

    <Heart.label>				сърце				</Heart.label>
    <LeftLung.label>				ляв бял дроб				</LeftLung.label>
    <RightLung.label>				десен бял дроб				</RightLung.label>
    <LeftKidney.label>				ляв бъбрек				</LeftKidney.label>
    <RightKidney.label>				десен бъбрек				</RightKidney.label>
    <Liver.label>				черен дроб				</Liver.label>
    <Stomach.label>				стомах				</Stomach.label>
    

    Finally, with the simple S/R, below, get rid of all the tabulation characters :

    SEARCH \t

    REPLACE Leave EMPTY

    And you get the expected text :

    <Heart.label>сърце</Heart.label>
    <LeftLung.label>ляв бял дроб</LeftLung.label>
    <RightLung.label>десен бял дроб</RightLung.label>
    <LeftKidney.label>ляв бъбрек</LeftKidney.label>
    <RightKidney.label>десен бъбрек</RightKidney.label>
    <Liver.label>черен дроб</Liver.label>
    <Stomach.label>стомах</Stomach.label>
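
    The last two steps (lower-casing the isolated text, then deleting the tabs) can also be sketched in Python. This is only an illustration: it assumes, as the work-around does, that the file contains no tabulation characters other than the temporary ones, and the function names are hypothetical.

```python
import re


def lower_tab_delimited(text):
    """Lower-case only the text that sits between two tabs on the same line."""
    return re.sub(
        r"(?<=\t)[^\t\n]+(?=\t)",
        lambda m: m.group(0).lower(),
        text,
    )


def strip_tabs(text):
    """Delete every tabulation character (the SEARCH \\t / empty REPLACE step)."""
    return text.replace("\t", "")


padded = "<Heart.label>\t\t\t\tСърце\t\t\t\t</Heart.label>"
print(strip_tabs(lower_tab_delimited(padded)))
```

    The tag names are never touched, because they are not enclosed by tabs on both sides.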
    

    Best Regards,

    guy038



  • Wow man! You made me very happy indeed :) Thanks so much

