Search and replace special characters ANSI-UTF
-
Hello, how can I search and replace special ANSI-UTF characters like:
xE2, xEE, xCE, x80, x9D, x9E, etc.
I tried to search and replace in normal mode and with regex, but nothing happens:
xE2 = (â)
xEE = (î)
xCE = (Î)
-
Hello, Vasile,
Post, updated on 12-11-2016, at 21h30 ( French TZ ) !
The accented characters, whose Unicode code points are between \x00c0 and \x00ff, can be easily searched with the following syntaxes:
A) If your current file has a Unicode encoding (UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM):
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
B) If your current file has the ANSI encoding:
- \xmn, where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression
C) If your current file has a NON-Unicode encoding (from Encoding > Character Sets):
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
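To see why the same \xmn value can behave differently depending on the file encoding, here is a small Python sketch. It is only an illustration: Python's `re` module is not the regex engine used by Notepad++, and I assume that "ANSI" means the Windows-1252 code page (`cp1252`). It shows that â, î and Î each have a single code point in the \x00c0-\x00ff range, stored as one byte in ANSI but as two bytes in UTF-8.

```python
import re

text = "â î Î"

# One code point per accented letter, all between 0xC0 and 0xFF.
code_points = [hex(ord(ch)) for ch in text if ch != " "]

# The bytes stored on disk depend on the encoding of the file.
# Assumption: "ANSI" is Windows-1252 (cp1252).
ansi_bytes = text.encode("cp1252")   # one byte per letter
utf8_bytes = text.encode("utf-8")    # two bytes per letter

# A search by code point (\xe2) finds the letter in the decoded text,
# whatever bytes were used to store it.
match = re.search(r"\xe2", text)

print(code_points)
print(ansi_bytes)
print(utf8_bytes)
print(match.group())
```

So a search written as \xE2 is about the character â, not about a raw byte 0xE2 in a UTF-8 file.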
Of course, from your example:
xE2 = (â) xEE = (î) xCE = (Î)
As you have both the upper-case letter Î and the lower-case î, you'll need to check the Match case option, or put the (?-i) modifier in front of \x.., in order to get the right letter only!
Best Regards,
guy038
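The Match case remark can be tried out with a short Python sketch. This is an analogy only, under the assumption that Python's `re` case folding is close enough to illustrate the point; it is not Notepad++ itself. Python does not accept a leading (?-i), so the scoped form `(?-i:...)` stands in for it here.

```python
import re

text = "Î and î"

# Case-insensitive search: \xce (Î) also matches \xee (î).
insensitive = re.findall(r"\xce", text, flags=re.IGNORECASE)

# Case-sensitive search, comparable to ticking Match case.
sensitive = re.findall(r"\xce", text)

# Scoped modifier: case-insensitivity is switched off inside the group,
# even though the IGNORECASE flag is set for the whole pattern.
scoped = re.findall(r"(?-i:\xce)", text, flags=re.IGNORECASE)

print(insensitive)
print(sensitive)
print(scoped)
```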
-
-
Thanks guy038, you are always perfect!
-
Hi Vasile and All,
Since my previous post, I noticed some odd things :
- Firstly, in most cases, in Extended search mode, searching an ANSI-encoded file for the syntax \xmn between \x80 and \x9f (which represents the range of Unicode C1 Control characters) matches the classic question mark (\x3F), instead of reporting 0 matches, which should be the correct answer!
Refer to: http://www.unicode.org/charts/PDF/U0080.pdf
- Secondly, when using the Regular expression search mode, the behaviour of the search in an ANSI-encoded file seems different from the behaviour with another NON-Unicode encoding chosen from the menu option Encoding > Character Sets!?
Therefore, I updated my previous post, to reflect these restrictions !
Cheers,
guy038
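As a side illustration of the \x80-\x9f range, here is a Python sketch. It assumes that "ANSI" is Windows-1252 and uses Python's `cp1252` codec, so it shows the encoding tables only and does not reproduce what Notepad++ does internally. The bytes in that range decode to printable characters whose code points lie above \xff, so a code-point search for [\x80-\x9f] finds nothing in the decoded text.

```python
import re

# Three bytes from the 0x80-0x9F range, plus 0xE2 for comparison.
ansi_bytes = bytes([0x80, 0x9E, 0x9F, 0xE2])

# Assumption: the ANSI code page is Windows-1252.
text = ansi_bytes.decode("cp1252")

# Code points of the decoded characters.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Search for C1 control code points in the decoded text.
c1_matches = re.findall(r"[\x80-\x9f]", text)

print(text)
print(code_points)
print(c1_matches)
```

Only the last byte (0xE2) keeps the same value as its code point; the other three map to €, ž and Ÿ.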
-
I use this regex to find ANSI characters in all my documents:
FIND:
¾|Ð|¼|°|Ñ|Ä|¢|º|ª|Å|Ÿ|ž|È|æ|Ã|¢|£|®|º|©|€|§|®|™|¢
These signs are repeated in almost all ANSI characters:
¾|Ð|¼|°|Ñ
But I use that longer regex, to make sure I don’t miss anything.
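For anyone who wants to run the same kind of check outside the editor, a rough Python sketch follows. It rests on an assumption of mine: that these signs show up when UTF-8 text is read as ANSI (Windows-1252). The sample string is hypothetical, built by wrongly decoding the letter ș.

```python
import re

# Signs copied from the regex above; duplicates dropped, order kept.
signs = "¾ Ð ¼ ° Ñ Ä ¢ º ª Å Ÿ ž È æ Ã ¢ £ ® º © € § ® ™ ¢".split()
pattern = re.compile("|".join(re.escape(s) for s in dict.fromkeys(signs)))

# Hypothetical sample: UTF-8 bytes wrongly decoded as Windows-1252.
garbled = "ș".encode("utf-8").decode("cp1252")
clean = "plain ASCII text"

found = pattern.findall(garbled)
clean_hit = pattern.search(clean)

print(garbled, found, clean_hit)
```

Note that a text which legitimately contains °, © or € would also be flagged, so the hits still need a look by eye.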