    Search and replace special characters ANSI-UTF

    • Vasile Caraus

      Hello, how can I search for and replace special ANSI/UTF characters such as xE2, xEE, xCE, x80, x9D, x9E, etc.?

      I tried to search and replace in Normal mode and with regex, but nothing happens:

      xE2 = (â)
      xEE = (î)
      xCE = (Î)

      • guy038

        Hello, Vasile,

        Post updated on 12-11-2016, at 21h30 ( French TZ ) !

        Accented characters whose Unicode code points lie between \x00C0 and \x00FF can easily be searched for with the following syntaxes :

        • A) If your current file has a Unicode encoding ( UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM ) :

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

          • \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression


        • B) If your current file has the ANSI encoding :

          • \xmn , where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression


        • C) If your current file has a non-Unicode encoding ( chosen from Encoding > Character Sets ) :

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

          • \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
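
        To see why the file encoding matters for the syntaxes above, here is a minimal Python sketch (not Notepad++ code). It assumes that "ANSI" means Windows-1252, which depends on the system locale and is not stated in the thread; the helper name `describe` is hypothetical. It shows that the code point stays the same while the bytes stored in the file differ between the two encodings.

        ```python
        ANSI = "cp1252"  # assumption: "ANSI" is Windows-1252 on this system


        def describe(ch):
            """Return (code point, ANSI bytes as hex, UTF-8 bytes as hex) for one character."""
            return (f"U+{ord(ch):04X}", ch.encode(ANSI).hex(), ch.encode("utf-8").hex())


        for ch in "âîÎ":
            print(ch, *describe(ch))
        ```

        In both encodings the search term \xE2 refers to the code point U+00E2; only the bytes on disk change (one byte in ANSI, two bytes in UTF-8).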


        Of course, from your example :

        xE2 = (â)
        xEE = (î)
        xCE = (Î)
        

        As you have both the upper-case letter Î and the lower-case letter î, you’ll need to check the Match case option, or to put the (?-i) modifier in front of \x.., in order to match only the intended letter !
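
        The case-sensitivity point can be illustrated with Python's `re` module. This is only a sketch of the same idea: Python's regex engine is not the one used by Notepad++, it does not accept the \x{mn} form, and it needs the scoped form `(?-i:...)` rather than a leading `(?-i)`. The sample text is made up for the illustration.

        ```python
        import re

        # Code points from the thread: \xE2 = â, \xEE = î, \xCE = Î
        sample = "câmp, în, Încă"  # hypothetical sample text

        # Case-sensitive search: only the lower-case letter is found.
        case_sensitive = re.findall(r"\xEE", sample)

        # Case-insensitive search: both î and Î are found.
        case_insensitive = re.findall(r"\xEE", sample, re.IGNORECASE)

        # Case sensitivity switched back on for this one escape only.
        scoped = re.findall(r"(?-i:\xEE)", sample, re.IGNORECASE)

        print(case_sensitive, case_insensitive, scoped)
        ```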

        Best Regards,

        guy038

        • Vasile Caraus

          Thanks guy038, you are always perfect !

          • guy038

            Hi Vasile and All,

            Since my previous post, I noticed some odd things :

            • Firstly, in most cases, in Extended search mode, searching an ANSI-encoded file for the syntax \xmn, between \x80 and \x9f ( the range of the Unicode C1 control characters ), matches the classic question mark ( \x3F ) instead of reporting 0 matches, which would be the correct answer !

            Refer to :

            http://www.unicode.org/charts/PDF/U0080.pdf

            • Secondly, in Regular expression search mode, the behaviour of the search in an ANSI-encoded file seems to differ from that in a file with another non-Unicode encoding, chosen from the menu option Encoding > Character Sets !?

            Therefore, I updated my previous post, to reflect these restrictions !
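
            One possible explanation for the question-mark result, offered only as a guess and not confirmed anywhere in this thread: if "ANSI" is Windows-1252 (an assumption), the code points \x80 to \x9f have no byte in that code page, and a lossy conversion substitutes `?` for them. The Python sketch below shows that substitution; the helper name `to_ansi` is hypothetical, and nothing here claims to reproduce what Notepad++ does internally.

            ```python
            ANSI = "cp1252"  # assumption: "ANSI" means Windows-1252


            def to_ansi(code_point):
                """Encode one code point to ANSI, substituting '?' when it has no ANSI byte."""
                return chr(code_point).encode(ANSI, errors="replace")


            # Every C1 control code point (U+0080 to U+009F) collapses to the same byte.
            c1_results = {to_ansi(cp) for cp in range(0x80, 0xA0)}
            print(c1_results)

            # An accented letter such as â keeps its own byte.
            print(to_ansi(0xE2))

            # The ANSI byte 0x80 itself is a different character (the euro sign, U+20AC).
            print(b"\x80".decode(ANSI))
            ```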

            Cheers,

            guy038

            • rodica F

              I use this regex to find ANSI characters in all my documents:

              FIND: ¾|Ð|¼|°|Ñ|Ä|¢|º|ª|Å|Ÿ|ž|È|æ|Ã|¢|£|®|º|©|€|§|®|™|¢

              These signs recur in almost all of those characters: ¾|Ð|¼|°|Ñ

              But I use the longer regex, to make sure I don’t miss anything.
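
              The signs in that regex look like what appears when UTF-8 text is read as Windows-1252; that reading is an assumption, not something stated in the post. Under it, a minimal Python sketch can flag such text and try to repair it. The names `SUSPECT`, `has_suspect` and `repair` are hypothetical, and the character class is just the de-duplicated set of signs from the FIND expression above.

              ```python
              import re

              # De-duplicated signs from the FIND expression, as one character class.
              SUSPECT = re.compile("[¾Ð¼°ÑÄ¢ºªÅŸžÈæÃ£®©€§™]")


              def has_suspect(text):
                  """True if the text contains any of the listed signs."""
                  return SUSPECT.search(text) is not None


              def repair(text):
                  """Undo a UTF-8-read-as-Windows-1252 mix-up; return text unchanged if that fails."""
                  try:
                      return text.encode("cp1252").decode("utf-8")
                  except (UnicodeEncodeError, UnicodeDecodeError):
                      return text


              garbled = "î".encode("utf-8").decode("cp1252")  # simulate the mix-up
              print(garbled, has_suspect(garbled), repair(garbled))
              ```

              Because `repair` falls back to the original string, text that was never garbled is left alone rather than damaged.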
