
    Remove unicode characters within range

    • Dan Wier

      I have a large text document that includes accented characters like æøåáäĺćçčéđńőöřůýţžš. I am trying to remove all Unicode characters between 0 and 96, leaving only these characters behind, so that I know which special characters I need to be able to handle when I process text like this.

      I would expect this regular expression to work, but the accented letters are still removed. I presume it’s using the Unicode hex code rather than the decimal number?

      [\u0001-\u0096,-]
      

      I don’t see a Notepad++ character class that would work either. Any suggestions?

      • PeterJones @Dan Wier

        @Dan-Wier

        As the manual says, \u#### notation belongs to Extended search mode; it is not the regular expression notation for matching a character by its code. Extended search mode does not have range notation. Make sure you use each notation in the right mode.

        In a regular expression, you use \x{####} for four-nibble Unicode characters.
        [\x{0001}-\x{0096}] will match from 'START OF HEADING' (U+0001) to 'START OF GUARDED AREA' (U+0096) … an odd range to pick for your stated goals, but whatever makes you happy on that.

        And, BTW, the ,- is useless, since comma and hyphen are already in that Unicode range.
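To see why the comma and hyphen are redundant, and why the accented letters sit outside the range, it may help to compare code points directly. This is only a rough Python sketch, not anything Notepad++ runs; `in_range`, `LOW`, and `HIGH` are hypothetical names made up for the illustration. Note that the bounds are hexadecimal, so the upper bound 0x96 is 150 in decimal, not 96.

```python
# Hypothetical helper, not part of Notepad++: report whether a character
# falls inside the inclusive code point range U+0001..U+0096.
LOW, HIGH = 0x0001, 0x0096  # hexadecimal bounds; 0x96 is 150 in decimal


def in_range(ch: str) -> bool:
    """Return True if the code point of ch lies within LOW..HIGH."""
    return LOW <= ord(ch) <= HIGH


# Print each sample character with its code point and range membership.
for ch in [",", "-", "z", "æ", "š"]:
    print(f"U+{ord(ch):04X} {ch!r} in range: {in_range(ch)}")
```

Comma (U+002C) and hyphen (U+002D) fall inside the range, while æ (U+00E6) and š (U+0161) fall outside it.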

        With Regular expression search mode selected and the regular expression syntax above, it matches the right characters.
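For anyone who wants to try the same range outside the editor, here is a minimal Python sketch. It assumes Python's `re` module rather than the regex engine Notepad++ uses, so the escape syntax differs (`\x01` and `\x96` instead of `\x{0001}` and `\x{0096}`); `strip_range` is a hypothetical helper name.

```python
import re

# Same character range as [\x{0001}-\x{0096}], written in Python's re syntax.
# In a str pattern, \x96 denotes the code point U+0096.
RANGE = re.compile(r"[\x01-\x96]")


def strip_range(text: str) -> str:
    """Delete every character in U+0001..U+0096, keeping everything else."""
    return RANGE.sub("", text)


sample = "Café, naïve - æøå\n"
# Only the characters above U+0096 survive the replacement.
print(strip_range(sample))
```

Replacing the matches with nothing leaves only the characters above U+0096, which is the "what is left behind" list the original question was after.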

        • Dan Wier @PeterJones

          @PeterJones Thank you so much, not only for the answer but also for explaining the difference in notation. I may need to adjust my range, but it felt like a good place to start to see what I get back from these documents.
