Search and replace special characters ANSI-UTF
-
Hello, how can I search and replace special ANSI-UTF characters like:
xE2, xEE, xCE, x80, x9D, x9E, etc.?
I tried to search and replace in Normal mode and with regex, but nothing happens:
xE2 = (â)
xEE = (î)
xCE = (Î)
-
Hello, Vasile,
Post, updated on 12-11-2016, at 21h30 ( French TZ ) !
The accented characters, whose Unicode code points are between \x00c0 and \x00ff, can easily be searched for with the following syntaxes :
A) If your current file has a Unicode encoding ( UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM ) :
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
B) If your current file has the ANSI encoding :
- \xmn, where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression
C) If your current file has a NON-Unicode encoding ( from Encoding > Character Sets ) :
- \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
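To illustrate why the file encoding matters for these syntaxes, here is a minimal Python sketch. It uses Python's own codecs, not Notepad++, and it assumes that "ANSI" means Windows-1252 (cp1252), which is only the case on Western European Windows setups. The helper name describe is made up for this illustration.

```python
# Assumption: "ANSI" is Windows-1252 (cp1252); on other locales it differs.

def describe(ch):
    """Return (code point, ANSI bytes, UTF-8 bytes) for one character."""
    return ord(ch), ch.encode("cp1252"), ch.encode("utf-8")

for ch in "âîÎ":
    code_point, ansi_bytes, utf8_bytes = describe(ch)
    # The code point is the same in both encodings; only the stored bytes differ.
    print(f"{ch}  U+{code_point:04X}  ANSI={ansi_bytes.hex()}  UTF-8={utf8_bytes.hex()}")
```

For â, the code point is E2 in both cases, but the file holds the single byte E2 in ANSI and the two bytes C3 A2 in UTF-8. This is consistent with \xE2 addressing the character, rather than the raw bytes, in a Unicode encoded file.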
Of course, from your example :
xE2 = (â)
xEE = (î)
xCE = (Î)
As you have both the upper-case letter Î and the lower-case î, you'll need to check the Match case option, or to put the (?-i) modifier in front of \x.., in order to get the right letter only !
Best Regards,
guy038
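The Match case remark can be reproduced outside Notepad++ with a small Python sketch. Note the assumptions: this is Python's re module, not the regex engine of Notepad++, and Python is case-sensitive by default, so the insensitive behaviour is switched on explicitly with re.IGNORECASE to mimic an unchecked Match case option.

```python
import re

text = "Îî"  # U+00CE followed by U+00EE

# Case-sensitive search: only the lower-case letter is found.
sensitive = re.findall(r"\xEE", text)

# Case-insensitive search: \xEE also matches its upper-case form Î.
insensitive = re.findall(r"\xEE", text, re.IGNORECASE)

# A scoped (?-i:...) group turns case sensitivity back on for that part only.
scoped = re.findall(r"(?-i:\xEE)", text, re.IGNORECASE)

print(sensitive, insensitive, scoped)
```

In Python the modifier has to be written as a scoped group, (?-i:...), whereas the post above uses a leading (?-i) in Notepad++.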
-
-
thanks guy038, you are always perfect !
-
Hi Vasile and All,
Since my previous post, I noticed some odd things :
- Firstly, in most cases, in Extended search mode, searching an ANSI encoded file for the syntax \xmn, between \x80 and \x9f ( which represents the range of Unicode C1 Control characters ), finds the classical question mark (\x3F), instead of reporting 0 matches, which should be the correct answer !
Refer to :
http://www.unicode.org/charts/PDF/U0080.pdf
- Secondly, when using the Regular expression search mode, the behaviour of the search in an ANSI encoded file seems different from the behaviour with another NON-Unicode encoding, chosen from the menu option Encoding > Character Sets !?
Therefore, I updated my previous post, to reflect these restrictions !
Cheers,
guy038
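A question mark is also what a lossy conversion to a single-byte code page typically produces. The Python sketch below shows this with Python's cp1252 codec. It is only an analogy: it assumes that "ANSI" is Windows-1252, the helper name to_ansi is made up, and whether Notepad++ does something similar internally is a guess, not something confirmed in this thread.

```python
# Assumption: "ANSI" is Windows-1252 (cp1252).

def to_ansi(code_point):
    """Encode one Unicode code point to cp1252, substituting '?' when it has no mapping."""
    return chr(code_point).encode("cp1252", errors="replace")

print(to_ansi(0xE2))  # â exists in cp1252, so it is kept as the byte E2
print(to_ansi(0x85))  # U+0085 is a C1 control with no cp1252 mapping, so '?' (3F) comes back

# The byte 80 itself is a printable character in cp1252, not the control U+0080.
euro = b"\x80".decode("cp1252")
print(euro, hex(ord(euro)))
```

So, under this assumption, the bytes 80 to 9F of an ANSI file hold characters such as the euro sign, whose Unicode code points lie outside the \x80 to \x9f range.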
-
I use this regex to find ANSI characters in all my documents:
FIND:
¾|Ð|¼|°|Ñ|Ä|¢|º|ª|Å|Ÿ|ž|È|æ|Ã|¢|£|®|º|©|€|§|®|™|¢
In almost all ANSI characters these signs are repeated:
¾|Ð|¼|°|Ñ
But I use that longer regex, to make sure I don't miss anything.
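For scanning text outside Notepad++, the same idea can be sketched in Python. This is only an illustration: the names SUSPECTS, SUSPECT_RE and find_suspects are made up, the character list is the one from the FIND expression above with duplicates removed, and the sample string is invented.

```python
import re

# Characters from the FIND expression above, duplicates removed.
SUSPECTS = "¾Ð¼°ÑÄ¢ºªÅŸžÈæÃ£®©€§™"

# For single characters, a character class [abc] matches the same as a|b|c.
SUSPECT_RE = re.compile("[" + re.escape(SUSPECTS) + "]")

def find_suspects(text):
    """Return (position, character) for every suspect character in text."""
    return [(m.start(), m.group()) for m in SUSPECT_RE.finditer(text)]

print(find_suspects("cafÃ© au lait"))
```

Keep in mind that characters such as °, © or È also occur in perfectly valid text, so a hit is only a hint that something may be wrong, not a proof.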