Convert to lowercase in xml text
-
Hi! I need to change Cyrillic letters to lowercase without touching the XML tags (example: <Heart.label></Heart.label>)
Example:
BEFORE:
<Heart.label>Сърце</Heart.label>
<LeftLung.label>Ляв бял дроб</LeftLung.label>
<RightLung.label>Десен бял дроб</RightLung.label>
<LeftKidney.label>Ляв бъбрек</LeftKidney.label>
<RightKidney.label>Десен бъбрек</RightKidney.label>
<Liver.label>Черен дроб</Liver.label>
<Stomach.label>Стомах</Stomach.label>
AFTER:
<Heart.label>сърце</Heart.label>
<LeftLung.label>ляв бял дроб</LeftLung.label>
<RightLung.label>десен бял дроб</RightLung.label>
<LeftKidney.label>ляв бъбрек</LeftKidney.label>
<RightKidney.label>десен бъбрек</RightKidney.label>
<Liver.label>черен дроб</Liver.label>
<Stomach.label>стомах</Stomach.label>
Is it possible to do this in a quick way?
Sorry for bad English!
Thanks in advance! -
Hello, @stefan-krumov,
Logically, the most direct way to do it would be a search/replace (S/R) with regular expressions, as below, which replaces any text between the tags <xxx.label> and </xxx.label> with the same text, lower-cased!
SEARCH
(?-i)<(.+\.label)>(.+?)</\1>
REPLACE
<\1>\L\2\E</\1>
Notes:
- The (?-i) modifier forces the search to be performed in a case-sensitive way
- Then, the part <(.+\.label)> stores the tag name as group 1, which is re-used to match the ending tag </\1>
- And the (.+?) middle part is the shortest range of characters between these two tags, that is to say the Cyrillic text, stored as group 2
- In the replacement, we rewrite, first, the starting tag <\1>, then the Cyrillic text \2, preceded by the global lower-case modifier \L and followed by the ending case modifier \E and, finally, the ending tag </\1>
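To make the roles of group 1 and group 2 concrete, here is a minimal Python sketch of what this S/R is meant to produce. It runs outside Notepad++ and uses Python's own `re` module, not the Boost engine, so it only illustrates the intent of the regex. The function name `lowercase_label_text` is hypothetical, and the leading `(?-i)` is dropped because Python matches case-sensitively by default.

```python
import re

# Same pattern as the SEARCH field, minus (?-i):
# group 1 = tag name ending in ".label", group 2 = text between the tags.
LABEL_RE = re.compile(r"<(.+\.label)>(.+?)</\1>")


def lowercase_label_text(text: str) -> str:
    """Lower-case only the text between <xxx.label> and </xxx.label>."""

    def rebuild(match: re.Match) -> str:
        tag, body = match.group(1), match.group(2)
        # Equivalent of <\1>\L\2\E</\1>: tags untouched, body lower-cased.
        return f"<{tag}>{body.lower()}</{tag}>"

    return LABEL_RE.sub(rebuild, text)


if __name__ == "__main__":
    sample = (
        "<Heart.label>Сърце</Heart.label>\n"
        "<LeftLung.label>Ляв бял дроб</LeftLung.label>"
    )
    print(lowercase_label_text(sample))
```

Note that the tag names keep their capital letters, since only group 2 goes through `lower()`.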
Unfortunately, due to a bug in the Boost regex engine, used internally by N++, the different case modifiers (\L, \U, \l and \u), used in the replacement part, do NOT change the case of characters with a Unicode value greater than \x{007F}. Too bad, indeed :-((
By contrast, the above S/R does work for plain ASCII text. For instance, the line:
<Heart.label>Young Man Heart</Heart.label>
would, correctly, be changed into :
<Heart.label>young man heart</Heart.label>
So, Stefan, we cannot use this method to lower-case your Cyrillic characters! However, just note that the Edit > Convert Case to > lowercase command (Ctrl + U) works as expected and does change any letter, of any alphabet, to its lower-case counterpart :-))
Therefore, Stefan, here is a work-around to get what you want easily enough! The general idea is:
- First, surround the Cyrillic text, between the two tags <xxx.label> and </xxx.label>, with some tab characters
- Second, make a rectangular selection (column mode) of all your Cyrillic text
- Third, use the Edit > Convert Case to > lowercase command (or the Ctrl + U shortcut)
- Finally, delete all the temporary tab characters
So, from your initial example, below :
<Heart.label>Сърце</Heart.label>
<LeftLung.label>Ляв бял дроб</LeftLung.label>
<RightLung.label>Десен бял дроб</RightLung.label>
<LeftKidney.label>Ляв бъбрек</LeftKidney.label>
<RightKidney.label>Десен бъбрек</RightKidney.label>
<Liver.label>Черен дроб</Liver.label>
<Stomach.label>Стомах</Stomach.label>
- Open the Replace dialog (Ctrl + H)
SEARCH
(?-i)<(.+\.label)>\K.+?(?=</\1>)
REPLACE
\t\t\t\t$0\t\t\t\t
OPTIONS
Wrap around and Regular expression
And, after a click on the Replace All button (do NOT use the Replace button!), you get:
<Heart.label>    Сърце    </Heart.label>
<LeftLung.label>    Ляв бял дроб    </LeftLung.label>
<RightLung.label>    Десен бял дроб    </RightLung.label>
<LeftKidney.label>    Ляв бъбрек    </LeftKidney.label>
<RightKidney.label>    Десен бъбрек    </RightKidney.label>
<Liver.label>    Черен дроб    </Liver.label>
<Stomach.label>    Стомах    </Stomach.label>
Remarks:
- You may immediately RE-run this S/R to better isolate your Cyrillic text
- In the replacement part, you may change the number of \t escapes surrounding the $0 syntax (the entire matched expression)
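For readers who want to see what the padding S/R does to each line, here is a rough Python equivalent. It is only a sketch under assumptions: Python's `re` module has no `\K`, so the opening tag is captured as a group and written back instead, and the names `pad_label_text` and `PAD` are hypothetical.

```python
import re

PAD = "\t" * 4  # same four tabs as in the REPLACE field; adjust as needed

# Group 1 = whole opening tag, group 2 = tag name, group 3 = text to pad.
# The lookahead (?=</\2>) leaves the closing tag out of the match,
# as in the original SEARCH expression.
PAD_RE = re.compile(r"(<(.+\.label)>)(.+?)(?=</\2>)")


def pad_label_text(text: str) -> str:
    """Surround the text between the label tags with tab characters."""
    return PAD_RE.sub(lambda m: m.group(1) + PAD + m.group(3) + PAD, text)


if __name__ == "__main__":
    line = "<Liver.label>Черен дроб</Liver.label>"
    print(repr(pad_label_text(line)))
```

Running the function a second time on its own output adds another layer of tabs, which mirrors the effect of re-running the S/R.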
Now, create a rectangular selection of all the Cyrillic characters and press Ctrl + U to get this text lower-cased:
<Heart.label>    сърце    </Heart.label>
<LeftLung.label>    ляв бял дроб    </LeftLung.label>
<RightLung.label>    десен бял дроб    </RightLung.label>
<LeftKidney.label>    ляв бъбрек    </LeftKidney.label>
<RightKidney.label>    десен бъбрек    </RightKidney.label>
<Liver.label>    черен дроб    </Liver.label>
<Stomach.label>    стомах    </Stomach.label>
Finally, with the simple S/R, below, get rid of all the tabulation characters :
SEARCH
\t
REPLACE
Leave EMPTY
And you get the expected text :
<Heart.label>сърце</Heart.label>
<LeftLung.label>ляв бял дроб</LeftLung.label>
<RightLung.label>десен бял дроб</RightLung.label>
<LeftKidney.label>ляв бъбрек</LeftKidney.label>
<RightKidney.label>десен бъбрек</RightKidney.label>
<Liver.label>черен дроб</Liver.label>
<Stomach.label>стомах</Stomach.label>
Best Regards,
guy038
-
-
Wow man! You made me very happy indeed :) Thanks so much