
    Unicode 'ÿ' , problem converting to Hex 'FF'

    38 Posts 8 Posters 11.0k Views 3 Watching
    • L Offline
      LanceMarchetti @guy038
      last edited by

      @guy038 Wow - Thanks for this macro!

      • L Offline
        LanceMarchetti @LanceMarchetti
        last edited by

        @LanceMarchetti said in Unicode 'ÿ' , problem converting to Hex 'FF':

        “NOTE: Using a space ‘20’ to replace a NUL ‘00’ will still result in a 32x32 Gif image,”
        Correction: I just noticed that the GIF file format repeats the image dimension block later on in the binary, just before the image data begins.
        So I tried using Hex 20 20 20 20 in the second dimension sector as well. The result was INDEED an 8224x8224 image!
        Changing it back to 20 00 20 00 resulted in a 32x32 image.
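The numbers above follow from how GIF stores dimensions: each of width and height is an unsigned 16-bit little-endian field. A minimal sketch of that arithmetic in plain Python (`gif_dimensions` is a hypothetical helper for illustration, not part of any Notepad++ tool):

```python
import struct

def gif_dimensions(four_bytes):
    """Decode (width, height) from two little-endian unsigned 16-bit fields."""
    return struct.unpack("<HH", four_bytes)

# 20 00 20 00 -> 0x0020 = 32 for both fields
print(gif_dimensions(b"\x20\x00\x20\x00"))  # (32, 32)

# 20 20 20 20 -> 0x2020 = 8224 for both fields
print(gif_dimensions(b"\x20\x20\x20\x20"))  # (8224, 8224)
```

So replacing the NUL high byte with a space changes each field from 0x0020 to 0x2020, which is why the image balloons to 8224x8224.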

        Just wanted to set that straight. Thanks.

        • PeterJonesP Offline
          PeterJones @LanceMarchetti
          last edited by

          @LanceMarchetti ,

          I was surprised that no one had suggested submitting a bug report to the Converter Plugin repo’s issue list regarding the ÿ portion, because even though most of your request is abusing a text editor to edit a non-text file, Notepad++'s Converter Plugin should be able to handle converting ÿ in a valid ANSI file (win1252 or ISO-8859-1 or similar codepage).

          But when I went looking through open and closed issues, I found it’s actually already been reported multiple times – most recently, in #11… and #11 even came with a fix in PR#16, which was rejected because Don didn’t have the Steps to Reproduce.

          Based on this conversation, I was able to come up with a minimal set of Steps to Reproduce, and asked Don to re-open that issue. Hopefully he will then see whether the fix in #16 works, or will come up with one that does.

          The right answer to your full query is “Notepad++ is not a binary-file editor, so you need to accept hacky suggestions”. But ÿ should be able to be handled by the Converter plugin, so I poked the ÿ issue.

          • Alan KilbornA Offline
            Alan Kilborn
            last edited by Alan Kilborn

            I didn’t look super closely at it (due to limited interest), but I’d guess this line/block is the cause of the problem:

            https://github.com/npp-plugins/converter/blob/37d62a54a476c935f21bdd3aa37e04f3495adcb3/source/PluginDefinition.cpp#L390

            At least it seems suspicious. 0xFF can look a lot like -1. :-)
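To illustrate the suspicion, here is a minimal Python sketch, not the plugin's code. It assumes the byte is read through a signed 8-bit type and that -1 doubles as an error sentinel; `as_signed_char` is a hypothetical helper.

```python
import struct

def as_signed_char(byte_value):
    """Reinterpret an unsigned byte (0-255) as a signed 8-bit value."""
    return struct.unpack("b", struct.pack("B", byte_value))[0]

ERROR_SENTINEL = -1  # assumption: the caller treats -1 as "no character"

for byte_value in (0x41, 0x7F, 0x80, 0xFF):
    signed = as_signed_char(byte_value)
    collides = signed == ERROR_SENTINEL
    print("0x%02X -> %4d  collides with sentinel: %s" % (byte_value, signed, collides))
```

Only 0xFF maps to -1, which would explain why ÿ alone trips the check while other high bytes do not.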

            • PeterJonesP Offline
              PeterJones @Alan Kilborn
              last edited by

              @Alan-Kilborn said in Unicode 'ÿ' , problem converting to Hex 'FF':

              At least it seems suspicious. 0xFF can look a lot like -1. :-)

              Given that PR#16 changed the getChar return value, which is what that -1 is comparing against, I’d say you’re probably right.

              • Alan KilbornA Offline
                Alan Kilborn @PeterJones
                last edited by

                @PeterJones said in Unicode 'ÿ' , problem converting to Hex 'FF':

                Given that PR#16 changed the getChar return value, which is what that -1 is comparing against, I’d say you’re probably right.

                I didn’t look beyond the current code.

                • L Offline
                  LanceMarchetti @PeterJones
                  last edited by

                  @PeterJones Thanks Peter. Appreciated. I also emailed Don last week about the ÿ issue, but have not heard back yet. So I’m grateful that you chipped in to hopefully get it sorted.

                  I’m not a programmer, as stated before, but I think you guys are on the right track with ‘-1’, because Mistral AI also identified it as such.

                  /chat.mistral.ai/chat/ec2b9a74-2da7-4bda-bfb3-e2b8681c435f/

                  I see that the provided code snippet is part of the Ascii to Hex plugin for Notepad++. Based on my analysis, the issue likely arises from the fact that the ‘ÿ’ character (Alt+0255) corresponds to the unsigned char value 255, which is interpreted as a negative value (-1) when stored in a signed char. The code then treats this negative value as an error and refuses to convert the whole block of highlighted text.

                  To fix this issue, you can modify the code to use unsigned characters instead of signed characters. Here’s an updated version of the relevant part of the code:

                  void ascii2hex(bool insertSpace, bool isMaj, size_t nbCharPerLine)
                  {
                      // ... (previous code remains unchanged)
                  
                      unsigned char *pDestText = new unsigned char[textLen*(inc+eolNbChar)+1];
                  
                      // ... (previous code remains unchanged)
                  
                      const char *format = "";
                      int val = static_cast<unsigned char>(selText.getChar(i));
                      if (val == -1)
                      {
                          delete [] pDestText;
                          return;
                      }
                  
                      // ... (remaining code)
                  }
                  

                  By changing the pDestText type to unsigned char and casting the result of selText.getChar(i) to unsigned char, the code should now properly handle the ‘ÿ’ character and convert it to ‘FF’ in hexadecimal.

                  • rdipardoR Offline
                    rdipardo @LanceMarchetti
                    last edited by

                    @LanceMarchetti said in Unicode 'ÿ' , problem converting to Hex 'FF':

                        int val = static_cast<unsigned char>(selText.getChar(i));
                    

                    I’m afraid this type cast will corrupt data if any character ordinal is greater than 0xFF, e.g. ġ. See, for example:

                    Fix character truncation bug that lead to ‘ġ’ styled as an operator since its low 8 bits are equal to ‘!’

                    selText.getChar already returns an int, which is 32 bits wide, so more than enough for any ASCII or Unicode ordinal.

                    The real issue is the declaration of SelectedString::_str as char *, which limits every representable “byte” to a maximum ordinal value of 0x7F. It should be the generic TCHAR * type, which becomes wchar_t * when the _UNICODE compile-time definition is set (as it always has been ever since Notepad++ became a Unicode application about 17 years ago). A wchar_t is 16 bits, which would accommodate “extended” 8-bit ASCII as well as the East Asian double-byte character sets that remain popular with some users.
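The truncation hazard mentioned above can be checked with a few lines of Python. This is only a sketch of the arithmetic, with `truncate_to_byte` as a hypothetical stand-in for what a cast to an 8-bit type does to a wider ordinal:

```python
def truncate_to_byte(ch):
    """Keep only the low 8 bits of a character's ordinal, as an 8-bit cast would."""
    return ord(ch) & 0xFF

g_dot = u"\u0121"  # 'ġ', LATIN SMALL LETTER G WITH DOT ABOVE

print(hex(ord(g_dot)))               # 0x121
print(hex(truncate_to_byte(g_dot)))  # 0x21
print(truncate_to_byte(g_dot) == ord("!"))  # True
```

After truncation, ġ is indistinguishable from `!`, which is the kind of corruption the cast would introduce for any ordinal above 0xFF.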

                    • Lance MarchettiL Offline
                      Lance Marchetti
                      last edited by PeterJones

                      Re: Unicode 'ÿ' – problem converting to Hex 'FF'

                      Wow, I can’t believe it’s been 2 years since I ran into this issue…anyway I finally figured out a working fix.

                      To fix the signed integer anomaly, we apply a bitwise AND mask (& 0xFF). This converts the negative integers Scintilla returns for bytes above 0x7F back into true unsigned byte values (0 to 255) before string formatting. The conversion happens entirely in memory and replaces the target range in a single, atomic undo action.
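The effect of the mask can be tried outside Notepad++ with plain Python. This is a minimal sketch that assumes the editor hands back signed values in the range -128 to 127; `to_unsigned_byte` and `to_hex` are hypothetical helper names.

```python
def to_unsigned_byte(value):
    """Map a signed 8-bit value (-128..127) to its unsigned form (0..255)."""
    return value & 0xFF

def to_hex(signed_values):
    """Format each value as two uppercase hex digits."""
    return "".join("%02X" % to_unsigned_byte(v) for v in signed_values)

# -1 is what 0xFF (ÿ) looks like as a signed char; 0 is NUL
print(to_hex([-1, 0, 65, -128]))  # FF004180
```

Values that are already in 0 to 127 pass through the mask unchanged, so it is safe to apply to every byte.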

                      How to Use:
                      Open a binary file, highlight the bytes you want to convert, then run the script. All ÿ and NUL bytes will successfully be converted.

                      Cheers :)

                      # -*- coding: utf-8 -*-
                      # Notepad++ PythonScript: convert the selected bytes to two-digit uppercase hex,
                      # including 0xFF (ÿ) and 0x00 (NUL).
                      
                      from Npp import editor
                      
                      def main():
                          start = editor.getSelectionStart()
                          end = editor.getSelectionEnd()
                          if start == end:
                              return  # nothing selected
                      
                          hex_parts = []
                          for pos in range(start, end):
                              # getCharAt can return a negative int for bytes above 0x7F;
                              # masking with 0xFF recovers the unsigned 0-255 value.
                              b = editor.getCharAt(pos) & 0xFF
                              # "%02X" already yields "FF" for 255 and "00" for NUL,
                              # so no special cases are needed.
                              hex_parts.append("%02X" % b)
                      
                          # Replace the selection with the hex text as a single undo step
                          editor.beginUndoAction()
                          try:
                              editor.setTargetStart(start)
                              editor.setTargetEnd(end)
                              editor.replaceTarget(''.join(hex_parts))
                          finally:
                              editor.endUndoAction()
                      
                      if __name__ == '__main__':
                          main()
                      
                      • PeterJonesP Offline
                        PeterJones @Lance Marchetti
                        last edited by

                        ( @Lance-Marchetti , you originally posted this as a separate topic: I merged it into the original, so it’s easier for people to follow the conversation. )


                        The Community of users of the Notepad++ text editor.