    Option to display all zero-width characters?

    General Discussion
    security unicode programming
    • Daniel Bragg

      As outlined on Krebs on Security and elsewhere, we’re now starting to see Unicode control characters used to make the text that is displayed differ from the text that is actually in the file. This can be exploited with malicious intent, and without a tool to check for these zero-width characters, we developers can unwittingly pull malicious code into the libraries we use.

      For example, if you have the text:

      class M‮{public static void main(String[]a‭){System.out.print(new char[]
      {'H','e','l','l','o',' ','W','o','r','l','d','!'});}}
      

      And you paste it into an empty Notepad++ document, it recognizes it as UTF-8, No Language (Normal Text). If you change the language to C#, it changes the output to be:

      [Screenshot: the pasted text as displayed by Notepad++ with the language set to C#]

      This highlights the complexity of the problem. Someone reading that will see very reasonable text. A compiler, bypassing the zero-width characters, will see something different entirely, and nobody is the wiser if the sterilized text is still valid code.

      I’m looking for a feature in Notepad++ that will display ALL zero-width characters as their Unicode code point (as in “[U+202e]”), so that a reviewer can at least make a cursory check that every such character is placed and used correctly.
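
To illustrate the kind of display being asked for, here is a minimal sketch in Python, outside Notepad++. It is not a Notepad++ feature or plugin. It assumes that "zero-width characters" can be approximated by the Unicode general category Cf (format characters), which covers the bidirectional overrides such as U+202E as well as zero-width spaces and joiners. The function names `reveal_invisible` and `find_invisible` are hypothetical.

```python
import unicodedata


def is_invisible(ch):
    """Assumption: treat every Unicode 'Cf' (format) character as invisible."""
    return unicodedata.category(ch) == "Cf"


def reveal_invisible(text):
    """Return text with each invisible character replaced by a [U+xxxx] marker."""
    return "".join(
        "[U+{:04x}]".format(ord(ch)) if is_invisible(ch) else ch
        for ch in text
    )


def find_invisible(text):
    """List (line, column, code point, name) for each invisible character.

    Line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if is_invisible(ch):
                hits.append((
                    line_no,
                    col_no,
                    "U+{:04x}".format(ord(ch)),
                    unicodedata.name(ch, "UNKNOWN"),
                ))
    return hits


if __name__ == "__main__":
    # Escapes are used here so the control characters are visible in this listing.
    sample = "class M\u202e{public static void main(String[]a\u202d){}}"
    print(reveal_invisible(sample))
    for hit in find_invisible(sample):
        print(hit)
```

Running this over a source file before review would at least flag where such characters sit; whether each one is legitimate (for example, in right-to-left text) still needs a human look.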

      • PeterJones

        @daniel-bragg ,

        Natively, that feature is not exposed. However, there are API messages which will allow you to set the representation for specific characters, which can be called from Plugins.

        Rather than writing a custom plugin just for that, you can use one of the scripting plugins to send those messages and thus change the representation of the characters. @Alan-Kilborn shared such a script in the “Invisible Characters Unwanted” thread. You would install the PythonScript plugin, create a new script, and paste in the code from that discussion. When you run the script, it changes the representation of those characters to little black boxes like the ones used for CR and LF. (Make sure you read down the thread to see whether characters were added later that you want to include in your copy of the script.)

        • Alan Kilborn

          @peterjones

          Wow, cool popup preview if you hover over the blue “Invisible Characters Unwanted” text … apparently another aspect of the recent NodeBB update!
