Search and replace special characters ANSI-UTF



  • Hello, how can I search and replace special ANSI/UTF characters such as xE2, xEE, xCE, x80, x9D, x9E, etc.?

    I tried to search and replace in Normal mode and with regex, but nothing happens:

    xE2 = (â)
    xEE = (î)
    xCE = (Î)



  • Hello, Vasile,

    Post updated on 12-11-2016, at 21:30 (French time zone)!

    Accented characters, whose Unicode code points lie between \x00c0 and \x00ff, can easily be searched with the following syntaxes:

    • A) If your current file has a Unicode encoding (UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM):

      • \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

      • \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

      • \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression


    • B) If your current file has the ANSI encoding:

      • \xmn, where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended

      • \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

      • \x{mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression

      • \x{00mn}, where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression


    • C) If your current file has a non-Unicode encoding (chosen from Encoding > Character Sets):

      • \xmn, where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

      • \x{mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

      • \x{00mn}, where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
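    To make the three cases A, B and C easier to compare, here is a small Python sketch that simply transcribes them into a lookup function. The function name valid_syntaxes and its "unicode" / "ansi" / "other" and "extended" / "regex" labels are hypothetical, and the rules are the ones observed in the tests above, not taken from official Notepad++ documentation.

```python
def valid_syntaxes(value: int, encoding: str, mode: str) -> list[str]:
    """Hypothetical helper: which of \\xmn, \\x{mn}, \\x{00mn} should find `value`.

    encoding: "unicode" (case A), "ansi" (case B) or "other" (case C)
    mode:     "extended" or "regex"
    Rules transcribed from the forum post, not from official docs.
    """
    if not 0x00 <= value <= 0xFF:
        raise ValueError("value must fit in two hex digits (mn)")
    if encoding not in ("unicode", "ansi", "other"):
        raise ValueError(f"unknown encoding: {encoding}")
    if mode not in ("extended", "regex"):
        raise ValueError(f"unknown search mode: {mode}")

    short = f"\\x{value:02X}"          # \xmn
    braced = f"\\x{{{value:02X}}}"     # \x{mn}
    padded = f"\\x{{00{value:02X}}}"   # \x{00mn}

    if mode == "extended":
        # Extended mode only accepts \xmn; in ANSI files m may not be
        # 8 or 9, i.e. the range 0x80-0x9F is excluded.
        if encoding == "ansi" and 0x80 <= value <= 0x9F:
            return []
        return [short]

    # Regular expression mode: in ANSI files the braced forms only
    # work when m belongs to [0-7], i.e. value < 0x80.
    if encoding == "ansi" and value >= 0x80:
        return [short]
    return [short, braced, padded]


print(valid_syntaxes(0xE2, "unicode", "regex"))
print(valid_syntaxes(0xE2, "ansi", "regex"))
print(valid_syntaxes(0x9D, "ansi", "extended"))
```

    For example, valid_syntaxes(0xE2, "ansi", "regex") returns only the \xE2 form, because m = E falls outside [0-7] for the braced syntaxes in an ANSI file.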


    So, for your example:

    xE2 = (â)
    xEE = (î)
    xCE = (Î)
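    The value after \x is the character's code point, not the raw bytes stored in the file. The Python sketch below, assuming "ANSI" means the Windows-1252 code page (the usual one on a Western European Windows system), shows that the same letter is a single byte in ANSI but two bytes in UTF-8, which is why \xE2 can still find â in a UTF-8 file.

```python
# Code points from the example above
codes = (0xE2, 0xEE, 0xCE)

rows = []
for code in codes:
    ch = chr(code)  # the character whose Unicode code point is `code`
    rows.append((f"x{code:02X}", ch, ch.encode("cp1252"), ch.encode("utf-8")))

for name, ch, ansi, utf8 in rows:
    # bytes.hex(" ") prints the bytes separated by spaces (Python 3.8+)
    print(f"{name} = {ch}  cp1252: {ansi.hex(' ')}  utf-8: {utf8.hex(' ')}")
```

    This prints "xE2 = â  cp1252: e2  utf-8: c3 a2", then the same pattern for î (c3 ae) and Î (c3 8e).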
    

    As your text contains both the upper-case letter Î and the lower-case î, you’ll need to check the Match case option, or put the (?-i) modifier in front of \x.. (Regular expression mode), so that only the right letter is matched!
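    The same case-sensitivity issue can be reproduced outside Notepad++ with Python's re module, used here only as an analogy. Python accepts \xCE (or \u00CE) but not the \x{..} syntax, and it rejects a global (?-i); the closest equivalents are leaving out re.IGNORECASE or using a scoped (?-i:...) group.

```python
import re

# Sample text containing both Î (U+00CE) and î (U+00EE)
text = "Île, île, Î, î"

# Case-sensitive by default, like ticking "Match case": only Î is found
upper_only = re.findall(r"\xCE", text)

# Case-insensitive, like leaving "Match case" unticked: Î and î both match
both = re.findall(r"\xCE", text, flags=re.IGNORECASE)

# A scoped (?-i:...) group switches case-insensitivity off for that part
# of an otherwise case-insensitive pattern (Python 3.6+)
scoped = re.findall(r"(?-i:\xCE)", text, flags=re.IGNORECASE)

print(upper_only)  # ['Î', 'Î']
print(both)        # ['Î', 'î', 'Î', 'î']
print(scoped)      # ['Î', 'Î']
```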

    Best Regards,

    guy038



  • Thanks guy038, you are always perfect!



  • Hi Vasile and All,

    Since my previous post, I have noticed some odd things:

    • Firstly, in Extended search mode, searching an ANSI-encoded file for \xmn with a value between \x80 and \x9f (the range of the Unicode C1 control characters) matches, in most cases, the ordinary question mark (\x3F) instead of reporting 0 matches, which would be the correct answer!

    Refer to:

    http://www.unicode.org/charts/PDF/U0080.pdf
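    One possible explanation for the question marks, offered only as a hypothesis: if the search string is converted to the file's ANSI code page before searching, the C1 code points have no equivalent there and would be replaced by "?". The Python sketch below checks this, assuming the ANSI code page is Windows-1252 (cp1252); it is not a description of Notepad++'s actual code.

```python
import unicodedata

# U+0080..U+009F are the Unicode C1 control characters (category "Cc")
c1 = [chr(cp) for cp in range(0x80, 0xA0)]
all_controls = all(unicodedata.category(ch) == "Cc" for ch in c1)

# Assumption: ANSI = Windows-1252. None of the C1 code points exists in
# that code page, so a lossy conversion turns every one of them into "?"
converted = {ch.encode("cp1252", errors="replace") for ch in c1}

# In Windows-1252 the bytes 0x80..0x9F are mostly printable characters
# instead; for example, byte 0x80 decodes to the euro sign
euro = b"\x80".decode("cp1252")

print(all_controls)  # True
print(converted)     # {b'?'}
print(euro)          # €
```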

    • Secondly, in Regular expression search mode, searching an ANSI-encoded file seems to behave differently than with another non-Unicode encoding chosen from the menu option Encoding > Character Sets!?

    Therefore, I have updated my previous post to reflect these restrictions!

    Cheers,

    guy038

