Convert to lowercase in xml text
-
Hi! I need to change Cyrillic letters to lowercase without touching the XML tags (example: <Heart.label></Heart.label> )
Example:
BEFORE:
<Heart.label>Сърце</Heart.label>
<LeftLung.label>Ляв бял дроб</LeftLung.label>
<RightLung.label>Десен бял дроб</RightLung.label>
<LeftKidney.label>Ляв бъбрек</LeftKidney.label>
<RightKidney.label>Десен бъбрек</RightKidney.label>
<Liver.label>Черен дроб</Liver.label>
<Stomach.label>Стомах</Stomach.label>
AFTER:
<Heart.label>сърце</Heart.label>
<LeftLung.label>ляв бял дроб</LeftLung.label>
<RightLung.label>десен бял дроб</RightLung.label>
<LeftKidney.label>ляв бъбрек</LeftKidney.label>
<RightKidney.label>десен бъбрек</RightKidney.label>
<Liver.label>черен дроб</Liver.label>
<Stomach.label>стомах</Stomach.label>
Is it possible to do this in a quick way?
Sorry for bad English!
Thanks in advance!
-
Hello, @stefan-krumov,
Logically, the right and straightforward way to do it would be to use a S/R, with regular expressions, as below, which replaces any part, between the tags <xxx.label> and </xxx.label>, with the same part, lower-cased !
SEARCH (?-i)<(.+\.label)>(.+?)</\1>
REPLACE <\1>\L\2\E</\1>
Notes :
- The (?-i) modifier forces the search to be performed in a case-sensitive way
- Then, the part <(.+\.label)> stores the tag name as group 1, which is re-used to match the ending tag </\1>
- And the (.+?) middle part is the shortest range of characters between these two tags, that is to say the Cyrillic text, stored as group 2
- In the replacement, we rewrite, first, the starting tag <\1>, then the Cyrillic text \2, preceded by the global lower-case modifier \L and followed by the ending case modifier \E and, finally, the ending tag </\1>
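For readers who can run a script outside Notepad++, the two groups described in the notes above can be tried in Python. This is only a sketch, not part of the N++ procedure: Python's re module has no \L ... \E replacement modifiers, so a callback calling str.lower() stands in for them, and the helper name lowercase_labels is made up for this illustration.

```python
import re

# Same pattern as the SEARCH field. Python's re is case-sensitive by default,
# so the leading (?-i) is not needed here.
LABEL = re.compile(r"<(.+\.label)>(.+?)</\1>")

def lowercase_labels(text):
    """Lower-case the text between <xxx.label> and </xxx.label> (hypothetical helper)."""
    def repl(m):
        # group 1 = tag name, group 2 = text between the two tags
        return "<{0}>{1}</{0}>".format(m.group(1), m.group(2).lower())
    return LABEL.sub(repl, text)

sample = (
    "<Heart.label>Сърце</Heart.label>\n"
    "<LeftLung.label>Ляв бял дроб</LeftLung.label>"
)
print(lowercase_labels(sample))
```

The tag names keep their original case because only group 2 goes through lower().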
Unfortunately, due to a bug in the Boost regex engine, used internally by N++, the different case modifiers ( \L, \U, \l and \u ), used in the Replacement part, do NOT change the case of characters with a Unicode value greater than \x{007F}. Too bad, indeed :-((
By contrast, plain ASCII text is handled correctly. With the above S/R, for instance, the line :
<Heart.label>Young Man Heart</Heart.label>
would, correctly, be changed into :
<Heart.label>young man heart</Heart.label>
So, Stefan, we cannot use this method to lower-case your Cyrillic characters ! However, just note that the Edit > Convert Case to > lowercase command ( Ctrl + U ) works as expected and does change any letter, of any alphabet, to its lower-cased counterpart :-))
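To picture the symptom, here is a small Python sketch that contrasts an ASCII-only lower-casing with a Unicode-aware one. It only mimics the behaviour reported above; it is not how the Boost engine is implemented, and ascii_only_lower is a hypothetical name.

```python
def ascii_only_lower(s):
    """Lower-case A-Z only; anything above U+007F is left untouched (illustration)."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)

print(ascii_only_lower("Young Man Heart"))  # ASCII letters are lower-cased
print(ascii_only_lower("Черен дроб"))       # Cyrillic letters stay as they were
print("Черен дроб".lower())                 # a Unicode-aware conversion handles them
```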
Therefore, Stefan, here is a work-around, to get what you want, easily enough ! The general idea is :
- First, to surround any Cyrillic text, between the two tags <xxx.label> and </xxx.label>, with some tabulation characters
- Secondly, to get a rectangular selection ( Column mode ) of all your Cyrillic text
- Thirdly, to use the Edit > Convert Case to > lowercase command ( or the Ctrl + U shortcut )
- Finally, to delete all the temporary tabulation characters
So, from your initial example, below :
<Heart.label>Сърце</Heart.label>
<LeftLung.label>Ляв бял дроб</LeftLung.label>
<RightLung.label>Десен бял дроб</RightLung.label>
<LeftKidney.label>Ляв бъбрек</LeftKidney.label>
<RightKidney.label>Десен бъбрек</RightKidney.label>
<Liver.label>Черен дроб</Liver.label>
<Stomach.label>Стомах</Stomach.label>
- Open the Replace dialog ( Ctrl + H )
SEARCH (?-i)<(.+\.label)>\K.+?(?=</\1>)
REPLACE \t\t\t\t$0\t\t\t\t
OPTIONS Wrap around and Regular expression
And, after a click on the Replace All button ( do NOT use the Replace button ! ), you get :
<Heart.label> Сърце </Heart.label>
<LeftLung.label> Ляв бял дроб </LeftLung.label>
<RightLung.label> Десен бял дроб </RightLung.label>
<LeftKidney.label> Ляв бъбрек </LeftKidney.label>
<RightKidney.label> Десен бъбрек </RightKidney.label>
<Liver.label> Черен дроб </Liver.label>
<Stomach.label> Стомах </Stomach.label>
Remarks :
- You may, immediately, RE-run this S/R, to better isolate your Cyrillic text
- You may change, in the Replacement part, the number of \t forms surrounding the $0 syntax ( the entire matched expression )
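The effect of this padding S/R, including what a second run does, can be checked with a short Python sketch. Two assumptions to keep in mind: Python's re module does not support \K, so the opening tag is captured and written back instead, and the names PAD and pad_label_text are invented for the example.

```python
import re

PAD = "\t" * 4  # number of tabs on each side; adjustable, as noted in the remarks

def pad_label_text(text, pad=PAD):
    """Surround the text between <xxx.label> and </xxx.label> with tabs (hypothetical helper)."""
    # No \K in Python's re: group 1 (the tag name) is re-emitted in the replacement.
    # The look-ahead (?=</\1>) leaves the closing tag out of the match, as in the S/R.
    return re.sub(
        r"<(.+\.label)>(.+?)(?=</\1>)",
        lambda m: "<{}>{}{}{}".format(m.group(1), pad, m.group(2), pad),
        text,
    )

once = pad_label_text("<Heart.label>Сърце</Heart.label>")
twice = pad_label_text(once)  # a second run adds another four tabs on each side
print(repr(once))
print(twice.count("\t"))
```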
Now, create a rectangular selection of all the Cyrillic characters and execute a Ctrl + U action, to get this text lower-cased :
<Heart.label> сърце </Heart.label>
<LeftLung.label> ляв бял дроб </LeftLung.label>
<RightLung.label> десен бял дроб </RightLung.label>
<LeftKidney.label> ляв бъбрек </LeftKidney.label>
<RightKidney.label> десен бъбрек </RightKidney.label>
<Liver.label> черен дроб </Liver.label>
<Stomach.label> стомах </Stomach.label>
Finally, with the simple S/R, below, get rid of all the tabulation characters :
SEARCH \t
REPLACE Leave EMPTY
And you get the expected text :
<Heart.label>сърце</Heart.label>
<LeftLung.label>ляв бял дроб</LeftLung.label>
<RightLung.label>десен бял дроб</RightLung.label>
<LeftKidney.label>ляв бъбрек</LeftKidney.label>
<RightKidney.label>десен бъбрек</RightKidney.label>
<Liver.label>черен дроб</Liver.label>
<Stomach.label>стомах</Stomach.label>
Best Regards,
guy038
-
-
Wow man! You made me very happy indeed :) Thanks so much