Remove unicode characters within range

    Dan Wier
    last edited by Aug 6, 2022, 5:00 PM

    I have a large text document that includes accented characters like æøåáäĺćçčéđńőöřůýţžš. I am trying to remove all Unicode characters between 0 and 96, leaving only these accented characters behind, so that I know which special characters I need to handle when I process text like this.

    I would expect this regular expression to work, but the accented letters are still removed. I presume it’s using the Unicode hex code rather than the decimal number?

    [\u0001-\u0096,-]
    

    I don’t see a Notepad++ character class that would work either. Any suggestions?

      PeterJones @Dan Wier
      last edited by PeterJones Aug 6, 2022, 5:58 PM

      @Dan-Wier

      As the manual says, \u#### notation belongs to Extended Search Notation, not to the regular expression notation for matching by character code. Extended search has no range notation, so make sure you use each notation in the right search mode.

      In a regular expression, you use \x{####} for Unicode characters written as four hex digits.
      [\x{0001}-\x{0096}] will match from 'START OF HEADING' (U+0001) to 'START OF GUARDED AREA' (U+0096) … an odd range to pick for your stated goals, but whatever makes you happy on that.

      And, BTW, the ,- is useless, since comma and hyphen are already in that Unicode range.

      With regular expression mode and regular expression syntax, it matches the right characters.
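For anyone who wants to try the same range outside Notepad++, here is a minimal Python sketch. It is an illustration only, not part of the answer above: the function name `strip_low_range` and the sample string are made up, and Python's `re` module spells the code points as `\u0001` and `\u0096` rather than `\x{0001}` and `\x{0096}`. It also shows that the digits are hexadecimal and that the comma and hyphen already sit inside the range.

```python
import re

# Python's re module writes code points as \xhh or \uhhhh. The \x{hhhh}
# form quoted above is Notepad++ regex syntax and is not valid in Python.
LOW_RANGE = re.compile(r"[\u0001-\u0096]")


def strip_low_range(text):
    """Remove every character from U+0001 through U+0096 (hex) inclusive."""
    return LOW_RANGE.sub("", text)


# The digits are hexadecimal: 0x0096 is 150 in decimal, and decimal 96 is 0x60.
upper_bound_decimal = int("0096", 16)   # 150
decimal_96_as_hex = hex(96)             # '0x60'

# Comma (U+002C) and hyphen (U+002D) are already covered by the range,
# so listing them again inside the brackets changes nothing.
comma_and_hyphen_covered = all(0x01 <= ord(c) <= 0x96 for c in ",-")

# Hypothetical sample: only the accented letters survive.
remaining = strip_low_range("abc, æøå - xyz")   # 'æøå'
```

If the goal were really decimal 0 to 96, the upper bound would be written as `\u0060` here, or `\x{0060}` in the Notepad++ expression.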

        Dan Wier @PeterJones
        last edited by Aug 6, 2022, 7:06 PM

        @PeterJones Thank you so much, not only for the answer but also for explaining the difference in notation. I may need to adjust my range, but it felt like a good place to start to see what I get back from these documents.
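As a rough way to "see what I get back" without deleting anything, the characters above the cutoff can be listed directly. This is a sketch under assumptions: the helper name `special_characters` and its `cutoff` parameter are hypothetical, the default cutoff simply mirrors the U+0096 upper bound used in this thread, and it uses only Python's standard `collections.Counter` and `unicodedata.name`.

```python
from collections import Counter
import unicodedata


def special_characters(text, cutoff=0x96):
    """List each distinct character above `cutoff` as (code point, name, count)."""
    counts = Counter(ch for ch in text if ord(ch) > cutoff)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in sorted(counts.items())
    ]


# Hypothetical sample text; each entry names one character that needs handling.
report = special_characters("caf\u00e9 na\u00efve caf\u00e9")
```

Adjusting the range then only means changing `cutoff`, for example to `0x7F` to keep everything outside ASCII.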

        The Community of users of the Notepad++ text editor.