    Search and replace special characters ANSI-UTF

    • Vasile Caraus

      Hello, how can I search and replace special ANSI/UTF characters such as xE2, xEE, xCE, x80, x9D, x9E, etc.?

      I tried to search and replace in Normal mode and with regex, but nothing happens:

      xE2 = (â)
      xEE = (î)
      xCE = (Î)
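
The byte-to-character mapping in the question can be checked with a short Python sketch. It assumes that "ANSI" means Windows-1252 (`cp1252`), which the thread does not state; `ansi_char` is a hypothetical helper name used only for illustration.

```python
# Assumption: "ANSI" is Windows-1252 (cp1252); the thread does not name the code page.
def ansi_char(byte_value: int, codepage: str = "cp1252") -> str:
    """Decode one byte as a character of a single-byte code page."""
    return bytes([byte_value]).decode(codepage)

for value in (0xE2, 0xEE, 0xCE):
    ch = ansi_char(value)
    # The same character takes two bytes once the file is saved as UTF-8.
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    print(f"x{value:02X} = ({ch}) in ANSI, stored as {utf8_hex} in UTF-8")
```

The point of the sketch is that one ANSI byte such as xE2 and the UTF-8 byte sequence for the same letter are different, so the file's encoding decides which search syntax applies.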

      • guy038

        Hello, Vasile,

        Post, updated on 12-11-2016, at 21h30 ( French TZ ) !

        Accented characters whose Unicode code points lie between \x00c0 and \x00ff can easily be searched for with the following syntaxes:

        • A) If your current file has a Unicode encoding ( UTF-8, UTF-8 BOM, UCS-2 BE BOM or UCS-2 LE BOM ) :

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

          • \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression


        • B) If your current file has the ANSI encoding :

          • \xmn , where m belongs to [0-7A-Fa-f] and n belongs to [0-9A-Fa-f], if search mode = Extended

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m belongs to [0-7] and n belongs to [0-9A-Fa-f], if search mode = Regular expression


        • C) If your current file has a non-Unicode encoding, chosen from Encoding > Character Sets :

          • \xmn , where m and n belong to [0-9A-Fa-f], if search mode = Extended OR Regular expression

          • \x{mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression

          • \x{00mn} , where m and n belong to [0-9A-Fa-f], if search mode = Regular expression
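
As a rough illustration of the \xmn idea outside Notepad++, here is a minimal Python sketch. It uses Python's `re` module only as a stand-in: Notepad++ has its own regex engine, and Python's `re` accepts `\xhh` but not the braced `\x{hh}` form. The name `find_accented` and the sample text are made up for the example.

```python
import re

# Stand-in only: this is Python's `re`, not the Notepad++ search engine.
# The class covers code points U+00C0..U+00FF; note that this range also
# contains the two symbols U+00D7 and U+00F7, which are not letters.
ACCENTED = re.compile(r"[\xc0-\xff]")

def find_accented(text: str) -> list[str]:
    """Return every character of `text` that lies in U+00C0..U+00FF."""
    return ACCENTED.findall(text)

sample = "c\u00e2t, \u00een, \u00cenalt, plain"  # contains â, î and Î
print(find_accented(sample))
```

The search runs on decoded text, so it behaves like case A above: the pattern addresses code points, not the raw bytes stored on disk.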


        Taking your example:

        xE2 = (â)
        xEE = (î)
        xCE = (Î)
        

        Since you have both the upper-case letter Î and the lower-case î, you'll need to check the Match case option, or put the (?-i) modifier in front of \x.., so that only the intended letter is matched.
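
The effect of case sensitivity on \xEE can be sketched in Python. This is an analogy, not Notepad++ itself: Python's `re` is case-sensitive by default, so the sketch switches on `re.IGNORECASE` to imitate a search with Match case unchecked, and it uses the scoped form `(?-i:...)` because Python does not accept a bare `(?-i)` in front of a pattern. `find_all` is a hypothetical helper.

```python
import re

def find_all(pattern: str, text: str, ignore_case: bool) -> list[str]:
    """Run `pattern` over `text`, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.findall(pattern, text, flags)

text = "\u00ce \u00ee"  # upper-case Î followed by lower-case î

# Case-insensitive search: \xee matches both letters.
print(find_all(r"\xee", text, ignore_case=True))

# Case sensitivity restored for this group only: just the lower-case î matches.
print(find_all(r"(?-i:\xee)", text, ignore_case=True))
```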

        Best Regards,

        guy038

        • Vasile Caraus

          Thanks guy038, you are always perfect!

          • guy038

            Hi Vasile and All,

            Since my previous post, I have noticed some odd things:

            • Firstly, in most cases, in Extended search mode, searching an ANSI-encoded file for the syntax \xmn between \x80 and \x9f ( the range of the Unicode C1 Control characters ) matches the ordinary question mark ( \x3F ) instead of reporting 0 matches, which would be the correct answer.

            Refer to :

            http://www.unicode.org/charts/PDF/U0080.pdf

            • Secondly, in Regular expression search mode, a search in an ANSI-encoded file seems to behave differently from a search in a file with another non-Unicode encoding chosen from the menu option Encoding > Character Sets.

            Therefore, I updated my previous post to reflect these restrictions.
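
One possible reason the \x80 to \x9f range is awkward can be seen with a small Python sketch. It assumes "ANSI" is Windows-1252, which is not confirmed in this thread, and it does not reproduce Notepad++ behaviour: it only shows that in that code page these byte values either map to characters far outside U+0080..U+009F or are not assigned at all. `describe_ansi_byte` is a hypothetical helper.

```python
def describe_ansi_byte(byte_value: int, codepage: str = "cp1252") -> str:
    """Return the code point a byte decodes to, or 'undefined' if unassigned."""
    try:
        ch = bytes([byte_value]).decode(codepage)
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few byte values unassigned.
        return "undefined"
    return f"U+{ord(ch):04X}"

# Walk the range that Unicode reserves for the C1 control characters.
for value in range(0x80, 0xA0):
    print(f"\\x{value:02X} -> {describe_ansi_byte(value)}")
```

For instance, byte \x80 decodes to the euro sign (U+20AC) and \x9E to ž (U+017E), while \x9D has no assignment in Python's cp1252 table. A search expressed in code points and a search expressed in byte values therefore cannot agree on this range.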

            Cheers,

            guy038

            • rodica F

              I use this regex to find ANSI characters in all my documents:

              FIND: ¾|Ð|¼|°|Ñ|Ä|¢|º|ª|Å|Ÿ|ž|È|æ|Ã|¢|£|®|º|©|€|§|®|™|¢

              These signs recur in almost all of the ANSI characters: ¾|Ð|¼|°|Ñ

              But I use the longer regex to make sure I don't miss anything.
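
A regex like this flags the suspicious characters, and a further step could attempt to undo them. The Python sketch below assumes the characters arose because UTF-8 text was read as Windows-1252, which the post does not say. `SUSPECT`, `looks_suspect` and `try_repair` are hypothetical names, and the pattern is only the short list quoted above.

```python
import re

# Short marker list taken from the post; it is illustrative, not exhaustive.
SUSPECT = re.compile("\u00be|\u00d0|\u00bc|\u00b0|\u00d1")  # ¾|Ð|¼|°|Ñ

def looks_suspect(text: str) -> bool:
    """True if the text contains one of the marker characters."""
    return SUSPECT.search(text) is not None

def try_repair(text: str) -> str:
    """Reverse a UTF-8-read-as-cp1252 mix-up; return the text unchanged on failure."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not representable in cp1252, or not valid UTF-8 afterwards.
        return text

garbled = "\u00d0\u00b0"  # the two characters Ð and °
print(looks_suspect(garbled), try_repair(garbled))
```

Text that is already correct, such as a lone î, fails the round trip and comes back unchanged, so the repair is conservative. It is still worth testing on a copy of the files first.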
