Search and replace special characters ANSI-UTF

Vasile Caraus

Hello, how can I search and replace special ANSI-UTF characters such as xE2, xEE, xCE, x80, x9D, x9E, etc.?

I tried to search and replace in Normal mode and with regex, but nothing happens:

xE2 = (â)
xEE = (î)
xCE = (Î)

guy038

Hello, Vasile,

Post, updated on 12-11-2016, at 21h30 ( French TZ ) !

The accented characters, whose Unicode code points are between \x00c0 and \x00ff, can easily be searched for with the following syntaxes :

A) If your current file has a Unicode encoding ( UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM ) :
- \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

B) If your current file has the ANSI encoding :
- \xmn , where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended
- \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression

C) If your current file has a NON Unicode encoding ( from Encoding > Character Sets ) :
- \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression
- \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
- \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
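
To see why the same \xmn value can address a letter whatever the file encoding, here is a minimal Python sketch. It assumes that "ANSI" means the Windows-1252 code page (it may be a different code page on your system), and it uses Python's re module only as a stand-in, since Notepad++ has its own regex engine. The helper name describe is made up for this illustration.

```python
import re

def describe(char):
    """Return the code point, the Windows-1252 byte and the UTF-8 bytes of a character."""
    code_point = f"U+{ord(char):04X}"
    ansi_bytes = char.encode("cp1252").hex()   # assumption: ANSI = Windows-1252
    utf8_bytes = char.encode("utf-8").hex()
    return code_point, ansi_bytes, utf8_bytes

# The three letters from the question: a-circumflex, i-circumflex, capital I-circumflex.
for char in ("\u00e2", "\u00ee", "\u00ce"):
    print(char, describe(char))

# The escape names the code point, not the bytes stored on disk,
# so it finds the letter in any correctly decoded text.
text = "c\u00e2ine"
for pattern in (r"\xe2", r"\u00e2"):   # Python has \uXXXX rather than \x{XXXX}
    match = re.search(pattern, text)
    print(pattern, "->", match.group(0), "at index", match.start())
```

In Windows-1252 the single byte equals the code point ( E2, EE, CE ), whereas UTF-8 stores two bytes for each of these letters.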

So, for the characters of your example :

xE2 = (â)
xEE = (î)
xCE = (Î)

As your text contains both the upper-case letter Î and the lower-case î, you’ll need to check the Match case option, or to put the (?-i) modifier in front of \x.., in order to match the intended letter only !
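
The effect of case sensitivity on \xEE can be sketched in Python. This is only an analogy: Python's re module is not the engine used by Notepad++, and it accepts the "turn case-insensitivity off" modifier only in its scoped form (?-i:...), not as a bare (?-i).

```python
import re

# "în Încă": contains both î (U+00EE) and Î (U+00CE)
text = "\u00een \u00cenc\u0103"

# Case-insensitive search: \xee matches the lower-case AND the upper-case letter.
insensitive = re.findall(r"\xee", text, flags=re.IGNORECASE)

# Case-sensitive search (the equivalent of ticking "Match case"): lower-case only.
sensitive = re.findall(r"\xee", text)

# Scoped modifier: switches case-insensitivity off for this part of the pattern,
# even though the IGNORECASE flag is set for the whole search.
scoped = re.findall(r"(?-i:\xee)", text, flags=re.IGNORECASE)

print(insensitive, sensitive, scoped)
```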

Best Regards,

guy038

Vasile Caraus

thanks guy038, you are always perfect !

guy038

Hi Vasile and All,

Since my previous post, I noticed some odd things :

Firstly, in most cases, in Extended search mode, searching an ANSI encoded file for the syntax \xmn, between \x80 and \x9f ( which is the range of the Unicode C1 Control characters ), matches the ordinary Question Mark ( \x3F ), instead of reporting 0 matches, which would be the correct answer !

Refer to :

http://www.unicode.org/charts/PDF/U0080.pdf
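
A possible explanation, sketched in Python under the assumption that the ANSI file uses the Windows-1252 code page: the bytes 0x80 to 0x9F of that code page do not stand for the code points U+0080 to U+009F, and a lossy conversion of such a code point yields a question mark. Whether Notepad++ performs this exact conversion internally is an assumption, not something confirmed here. The helper name ansi_byte_to_char is made up for this illustration.

```python
def ansi_byte_to_char(byte_value):
    """Decode one Windows-1252 byte; return None if that slot is undefined."""
    try:
        return bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        return None

# Three of the values from the original question.
for byte_value in (0x80, 0x9D, 0x9E):
    char = ansi_byte_to_char(byte_value)
    if char is None:
        print(f"byte {byte_value:02X}: undefined in Windows-1252")
    else:
        print(f"byte {byte_value:02X}: {char} = U+{ord(char):04X}")

# The C1 control U+0080 has no Windows-1252 byte of its own,
# so a lossy conversion substitutes the question mark (byte 0x3F).
lossy = "\x80".encode("cp1252", errors="replace")
print(lossy)
```

With Python's cp1252 codec, byte 80 is the euro sign ( U+20AC ), byte 9E is ž ( U+017E ) and byte 9D is undefined.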

Secondly, when using the Regular expression search mode, the behaviour of the search in an ANSI encoded file seems to differ from the behaviour with another NON-Unicode encoding, chosen from the menu option Encoding > Character Sets !?

Therefore, I updated my previous post, to reflect these restrictions !

Cheers,

guy038

rodica F

I use this regex to find ANSI characters in all my documents:

These signs recur in almost all of the ANSI characters: ¾|Ð|¼|°|Ñ

But I use that longer regex, to make sure I don’t miss anything.
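
For illustration only, here is a Python sketch built on the short list of signs quoted above, not on the longer regex, which is not shown in this thread. It assumes that the signs appear when UTF-8 text (Cyrillic, in this example) is read as Windows-1252. The names MARKERS, has_marker and garble are made up for this sketch.

```python
import re

# The five signs quoted above: ¾ Ð ¼ ° Ñ (written as escapes to avoid encoding trouble).
MARKERS = re.compile("[\u00be\u00d0\u00bc\u00b0\u00d1]")

def has_marker(line):
    """Return True if the line contains at least one of the five signs."""
    return MARKERS.search(line) is not None

def garble(text):
    """Simulate UTF-8 text being opened as Windows-1252 (assumed cause of the signs)."""
    return text.encode("utf-8").decode("cp1252")

sample = garble("\u043c\u0438\u0440")   # the Cyrillic word "мир"
print(sample, has_marker(sample))
print(has_marker("plain ASCII text"))
```

A check like this flags every line containing one of the signs, so it also reports lines where a degree sign or a fraction is used legitimately.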