Community
    • Login

    Unicode 'ÿ' , problem converting to Hex 'FF'

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    36 Posts 7 Posters 3.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • mkupperM
      mkupper @LanceMarchetti
      last edited by

      @LanceMarchetti said in Unicode 'ÿ' , problem converting to Hex 'FF':

      But I only noticed today that when encoding a string that has NUL byte values , it encodes them to ‘20’ (space) instead of ‘00’ (NUL). Can you please show me how to fix that.?Hex-NUL-Issue.png

      NUL does not appear in UTF-8 encoded Unicode unless the intent is to have codepoint U+0000 characters in your files. Keep in mind that NUL is also the string terminator for many things, including the Windows copy/paste of text mechanism.

      While you can construct files that contain U+0000 I would only do so when testing edge conditions. I would expect interesting behavior for U+0000 such as them turning into spaces. Notepad++ is a text editor, not a binary data editor.

      NUL, as a byte value, can and will appear in UTF-16 encoded Unicode for code points U+0000 and U+0001 to U+00FF. However, you are then not supposed to be looking at the bytes and thus should never need to deal with NUL bytes unless you have U+0000 in your files.

      Alan KilbornA 1 Reply Last reply Reply Quote 1
      • Alan KilbornA
        Alan Kilborn @mkupper
        last edited by

        @mkupper said in Unicode 'ÿ' , problem converting to Hex 'FF':

        Notepad++ is a text editor, not a binary data editor.

        OP said, initially:

        Understanding that Notepad++ is strictly a text editor

        But, I question if he really understands that what he’s attempting to do isn’t advisable.

        1 Reply Last reply Reply Quote 1
        • PeterJonesP
          PeterJones @LanceMarchetti
          last edited by

          @LanceMarchetti ,

          The editor.getSelText() and anything else that uses Notepad++'s normal interface for accessing the text characters will convert the NUL to a space. (The same things if you Copy a NUL inside Notepad++'s GUI.)

          As is said, Notepad++ was designed as a text editor, not a hex editor.

          That said, if you want to shoot yourself in the foot while trying to edit a PNG or other binary file with a text editor (which is obviously a bad idea), there are other commands available to PythonScript and similar.

          For example, editor.getCharAt(p) does properly return 0 when p is a variable containing the 0-based character position of a NUL inside the document. So if instead of having the custom convert_to_hex_lines() use the getSelText(), you could instead rewrite it to pass in the convert_to_hex_lines(selstart,selend), and then change to iterating over that range of offsets; instead of hex(ord(char)), you would use hex(editor.getCharAt(p)). I’m not going to write it completely for you (nor should you expect anyone here to do it for you), but I have given you enough that if you want to turn this into a learning exercise, I am sure you could figure out how to take what’s above, combined with my description here, and eventually get it to work.

          L 1 Reply Last reply Reply Quote 4
          • Mark OlsonM
            Mark Olson
            last edited by

            If NULs are involved, I would kick the problem over to VSCode, which does not have such issues with NULs.

            Alan KilbornA mkupperM L 3 Replies Last reply Reply Quote 3
            • Alan KilbornA
              Alan Kilborn @Mark Olson
              last edited by Alan Kilborn

              @Mark-Olson said :

              I would kick the problem over to VSCode, which does not have such issues with NULs

              There really is nothing intrinsically wrong with a text editor supporting NULs in documents, and, if VSCode indeed supports this, then bully for it. It was designed in “modern times”.

              Notepad++ is bound by a 20+ year old legacy, when null characters were (only) used to terminate C-strings. They are still used as C-string terminators, just not in such a “willy nilly” application as the “olden days”, when the coders were aghast at the possibility of a null character in a document (and thus made compromises based upon this).

              I think Notepad++ will get better with this as time continues to march on, but it is going to be slow in happening. Best thing to do is not to do anything in Notepad++ with files that need/have these characters.

              1 Reply Last reply Reply Quote 3
              • mkupperM
                mkupper @Mark Olson
                last edited by

                @Mark-Olson said in Unicode 'ÿ' , problem converting to Hex 'FF':

                If NULs are involved, I would kick the problem over to VSCode, which does not have such issues with NULs.

                Notepad++ and Scintilla support NULs within text files. The problems are in the details such as plugins. As Windows does not support NULs within text strings you can’t copy/paste strings containing NULs to or from any editor.

                Alan KilbornA 1 Reply Last reply Reply Quote 0
                • Alan KilbornA
                  Alan Kilborn @mkupper
                  last edited by

                  @mkupper said in Unicode 'ÿ' , problem converting to Hex 'FF':

                  Notepad++ and Scintilla support NULs within text files.

                  Not fully (as I was trying to say in my previous post).
                  Anytime some text data is pulled into a C-string for processing, NUL bytes in that data is going to cause a problem, because it will look like the end-of-text right at the NUL byte, instead of the full length.
                  There are MANY examples of this in the Notepad++ source.
                  A very recent example (yesterday) of a user-reported problem of this nature is HERE.

                  mkupperM 1 Reply Last reply Reply Quote 2
                  • mkupperM
                    mkupper @Alan Kilborn
                    last edited by

                    @Alan-Kilborn said in Unicode 'ÿ' , problem converting to Hex 'FF':

                    Not fully (as I was trying to say in my previous post).
                    Anytime some text data is pulled into a C-string for processing, NUL bytes in that data is going to cause a problem, because it will look like the end-of-text right at the NUL byte, instead of the full length.
                    There are MANY examples of this in the Notepad++ source.
                    A very recent example (yesterday) of a user-reported problem of this nature is HERE .

                    Thank you - I see that one is a big issue. I did some testing.

                    • The convert case functions are ok.
                    • Splitting and joining lines are ok.

                    Besides sorting, these Notepad++ operations have issues:

                    • The reverse and randomize line order functions truncate the selection at the NUL.
                    • Remove Duplicate lines does nothing if there are no duplicates before a NUL… If there are duplicate lines before line(s) with NULs then the file or selection gets truncated at the first NUL and then the dupe-removal happens. I guess the initial scan for dupes is truncated at the NULL and it decides to do nothing. If there are dupes then those are removed and the NUL truncated buffer is written back to Scintilla.

                    I discovered that the Edit / Line Operations / Paste Special / copy and paste binary content work for text data containing NULs. That’s useful to keep in mind testing this sort of stuff.

                    Alan KilbornA 1 Reply Last reply Reply Quote 0
                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @lancemarchetti, @mark-olson, @alan-kilborn, @rdipardo and All,

                      In my previous post, I said :

                      • Firstly, the important thing to note is that any character over the BMP, so with code char > x{FFFF}, is strictly forbidden in files encoded with the UCS-2 BE BOM or the UCS-2 LE BOM encoding !

                      This assumption is true but, in a N++ point of vue, is not accurate anymore because, since the v8.0 version of N++, these encodings have been improved to the two new encodings UTF-16 BE BOM and UTF-16 LE BOM

                      And, indeed, these new encodings do support characters over the BMP ! Note that all the characters over the BMP are encoded with their surrogate representation, containing two 16-bytes code units. For instance the Unicode char \x{1F4A6} is, therefore, encoded with its surrogate representation \x{D83D}\x{DCA6}


                      It’s important to point out that, within N++, you CANNOT search the character \x{1F4A6} itself but you can search it via its surrogate representation which is \x{D83D}\x{DCA6}

                      BTW, if someone is interessed, I created a macro which does the translation \x{[#]#####} to its corresponding surrogate pair \x{####}\x{####}, for any character over the BMP, between \x{10000} and \x{10FFFF}, of the current selection !


                      So, as an example :

                      • For a file containing ONLY the over BMP char SPLASHING SWEAT SYMBOL : 💦, of Unicode value U + 1F4A6, the UTF-16 BE BOM encoded file would contain the bytes :

                      FE FF D8 3D DC A6 ( So, the BOM and the char, with the MSB byte first, and then, the LSB byte )

                      • For a file containing ONLY the over BMP char “SPLASHING SWEAT SYMBOL” : 💦, of Unicode value U + 1F4A6, the UTF-16 LE BOM encoded file would contain the bytes :

                      FF FE 3D D8 A6 DC ( So the BOM and the char, with the LSB byte first, and then, the MSB byte )

                      Best Regards,

                      guy038

                      Alan KilbornA 1 Reply Last reply Reply Quote 2
                      • Alan KilbornA
                        Alan Kilborn @mkupper
                        last edited by

                        @mkupper said in Unicode 'ÿ' , problem converting to Hex 'FF':

                        these Notepad++ operations have issues

                        The list is endless (well, it IS finite, but perhaps doesn’t seem like it). :-)
                        HERE is another interesting one.

                        1 Reply Last reply Reply Quote 2
                        • Alan KilbornA
                          Alan Kilborn @guy038
                          last edited by

                          @guy038 said in Unicode 'ÿ' , problem converting to Hex 'FF':

                          if someone is interessed, I created a macro which does the translation

                          I suppose the interest might be small, but why not post the macro?

                          1 Reply Last reply Reply Quote 1
                          • L
                            LanceMarchetti @PeterJones
                            last edited by

                            @PeterJones Thanks Peter. I will try my best to figure it out. I’m currently 55 and have zero programming background, but find computers and binary fascinating.

                            Just to recap, I will try to keep this a short and as comprehensive as possible.
                            Example Of The ASCII to HEX Issue I am Currently Experiencing in NP++:

                            Encoding=UTF-8
                            StringDataType=Image/Binary
                            TotalBytes=10

                            UseCase: The manual editing of the dimensions sector in the image binary allows for the creation of CTF challenges. Manual BOM manipulation allows for forced cropping and the concealment of color-data in the rest of the image. It can also be used to create pixelated patterns for unraveling. Notepad++ has met all these requirements for me so far. So I am pretty impressed at what I can get away with in various image formats.

                            THIS EXAMPLE :


                            Binary: GIF89a NUL NUL
                            ASCII -> Hex: GIF89a20002000
                            Bytes: 1-6 , Header
                            Bytes: 7-10 , Dimensions


                            The above ASCII to Hex conversion is 100% correct and was performed using the ASCII->Hex converter Ver4.3 by Don Ho which utilizes NppConverter.dll
                            However this converter does not convert ‘ÿ’ to ‘FF’ for some reason. I therefore requested help on the Notepad++ forum and a py script was suggested to help with the ‘ÿ’ to ‘FF’ issue. The script worked successfully, but unfortunately landed up converting NUL bytes ‘00’ to spaces ‘20’.

                            This is not helpful due to the fact that a space in ASCII is represented by the value ‘32’ (Hex ‘20’).
                            So when it comes to image dimensions as in the GIF example above 20 00 20 00 , indicates a 32x32 pixel image.
                            NOTE: Using a space ‘20’ to replace a NUL ‘00’ will still result in a 32x32 Gif image,

                            HOWEVER, doing this to the dimension sector of a Jpeg will result in a 32x32px becoming a 8244x8224px image! Because the Big-Endian byte order of the 4-byte dimension block in Jpeg will translate 20202020 as 8244x8244 instead of 32x32 as in the case with GIF.

                            This is why I need the script to correctly translate NUL bytes to ‘00’ and NOT ‘20’, otherwise it will create outrageous dimensions in formats like Jpeg, Webp and PNG.
                            If NppConverter.dll is actually doing it right, then why can we not replicate it in Python?

                            Thanks once again.
                            Lance Marchetti

                            ascii-hex-NppConvter.dll.png


                            ascii-hex-PythonScript.dll.png

                            L 1 Reply Last reply Reply Quote 0
                            • L
                              LanceMarchetti @Mark Olson
                              last edited by

                              @Mark-Olson I did not know that about VSCode, thanks, Ill look into it. Mark.

                              1 Reply Last reply Reply Quote 0
                              • guy038G
                                guy038
                                last edited by guy038

                                Hello, @alan_kilborn and All,

                                So, here is the contents of this macro :

                                        <Macro name="Surrogates Pairs in Selection" Ctrl="no" Alt="no" Shift="no" Key="0">
                                            <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                                            <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?-i)\\x\{(10|[[:xdigit:]])[[:xdigit:]]{4}" />
                                            <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                                            <Action type="3" message="1602" wParam="0" lParam="0" sParam="$0\x1F" />
                                            <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                                            <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                                
                                            <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                                            <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?i)(?:(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(A)|(B)|(C)|(D)|(E)|(F)|(10))(?=[[:xdigit:]]{4}\x1F\})|(?:(0)|(1)|(2)|(3)|(4)|(5)|(6)|(7)|(8)|(9)|(A)|(B)|(C)|(D)|(E)|(F))(?=[[:xdigit:]]{0,3}\x1F\})" />
                                            <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                                            <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}0000)(?{2}0001)(?{3}0010)(?{4}0011)(?{5}0100)(?{6}0101)(?{7}0110)(?{8}0111)(?{9}1000)(?{10}1001)(?{11}1010)(?{12}1011)(?{13}1100)(?{14}1101)(?{15}1110)(?{16}1111)(?{17}0000)(?{18}0001)(?{19}0010)(?{20}0011)(?{21}0100)(?{22}0101)(?{23}0110)(?{24}0111)(?{25}1000)(?{26}1001)(?{27}1010)(?{28}1011)(?{29}1100)(?{30}1101)(?{31}1110)(?{32}1111)" />
                                            <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                                            <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                                
                                            <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                                            <Action type="3" message="1601" wParam="0" lParam="0" sParam="([01]{10})([01]{10})(?=\x1F)" />
                                            <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                                            <Action type="3" message="1602" wParam="0" lParam="0" sParam="110110\1\x1F}\\x{110111\2" />
                                            <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                                            <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                                
                                            <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
                                            <Action type="3" message="1601" wParam="0" lParam="0" sParam="(?:(0000)|(0001)|(0010)|(0011)|(0100)|(0101)|(0110)|(0111)|(1000)|(1001)|(1010)|(1011)|(1100)|(1101)|(1110)|(1111))(?=[[:xdigit:]]*\x1F\})|\x1F" />
                                            <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
                                            <Action type="3" message="1602" wParam="0" lParam="0" sParam="(?{1}0)(?{2}1)(?{3}2)(?{4}3)(?{5}4)(?{6}5)(?{7}6)(?{8}7)(?{9}8)(?{10}9)(?11A)(?12B)(?13C)(?14D)(?15E)(?16F)" />
                                            <Action type="3" message="1702" wParam="0" lParam="640" sParam="" />
                                            <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
                                        </Macro>
                                

                                How this macro works :

                                • You select a true Unicode value of a character over the BMP : \x{#####} or \x{######}, with 5 or 6 hexadecimal digits

                                • You run the macro

                                • You get its surrogate pair : \x{####}\x{####}


                                Note that your normal selection can encompass several \x{#####} syntaxes in one go. For example, after selection of the text, below and execution of the macro :

                                
                                \x{10000} is the FIRST char over the BMP
                                
                                This SECOND char \x{14A9B} is concerned by this macro, too
                                
                                A THIRD char \x{14A9D}, very closed to the PREVIOUS one
                                
                                A FOURTH example, at the VERY END of the UNICODE range = \x{10FFFF}
                                
                                

                                This text will be changed as :

                                
                                \x{D800}\x{DC00} is the FIRST char over the BMP
                                
                                This SECOND char \x{D812}\x{DE9B} is concerned by this macro, too
                                
                                A THIRD char \x{D812}\x{DE9D}, very closed to the PREVIOUS one
                                
                                A FOURTH example, at the VERY END of the UNICODE range = \x{DBFF}\x{DFFF}
                                
                                

                                Now, Alan, I will concede that, for a single char, this macro is not very useful as you can, instead, just select this character over the `BMP, open the search dialog and search for all occurrences of this specific character !

                                But, let’s suppose you would scan your files for those containing, let’s say, the Unicode Emoticons range of chars, which lays between the \x{1F600} and the \x{1F64F} characters. Refer to :

                                http://www.unicode.org/charts/PDF/U1F600.pdf

                                And which is reported, below :

                                😀 \x{1F600} = \x{D83D}\x{DE00}
                                😁 \x{1F601} = \x{D83D}\x{DE01}
                                😂 \x{1F602} = \x{D83D}\x{DE02}
                                😃 \x{1F603} = \x{D83D}\x{DE03}
                                😄 \x{1F604} = \x{D83D}\x{DE04}
                                😅 \x{1F605} = \x{D83D}\x{DE05}
                                😆 \x{1F606} = \x{D83D}\x{DE06}
                                😇 \x{1F607} = \x{D83D}\x{DE07}
                                😈 \x{1F608} = \x{D83D}\x{DE08}
                                😉 \x{1F609} = \x{D83D}\x{DE09}
                                😊 \x{1F60A} = \x{D83D}\x{DE0A}
                                😋 \x{1F60B} = \x{D83D}\x{DE0B}
                                😌 \x{1F60C} = \x{D83D}\x{DE0C}
                                😍 \x{1F60D} = \x{D83D}\x{DE0D}
                                😎 \x{1F60E} = \x{D83D}\x{DE0E}
                                😏 \x{1F60F} = \x{D83D}\x{DE0F}
                                😐 \x{1F610} = \x{D83D}\x{DE10}
                                😑 \x{1F611} = \x{D83D}\x{DE11}
                                😒 \x{1F612} = \x{D83D}\x{DE12}
                                😓 \x{1F613} = \x{D83D}\x{DE13}
                                😔 \x{1F614} = \x{D83D}\x{DE14}
                                😕 \x{1F615} = \x{D83D}\x{DE15}
                                😖 \x{1F616} = \x{D83D}\x{DE16}
                                😗 \x{1F617} = \x{D83D}\x{DE17}
                                😘 \x{1F618} = \x{D83D}\x{DE18}
                                😙 \x{1F619} = \x{D83D}\x{DE19}
                                😚 \x{1F61A} = \x{D83D}\x{DE1A}
                                😛 \x{1F61B} = \x{D83D}\x{DE1B}
                                😜 \x{1F61C} = \x{D83D}\x{DE1C}
                                😝 \x{1F61D} = \x{D83D}\x{DE1D}
                                😞 \x{1F61E} = \x{D83D}\x{DE1E}
                                😟 \x{1F61F} = \x{D83D}\x{DE1F}
                                😠 \x{1F620} = \x{D83D}\x{DE20}
                                😡 \x{1F621} = \x{D83D}\x{DE21}
                                😢 \x{1F622} = \x{D83D}\x{DE22}
                                😣 \x{1F623} = \x{D83D}\x{DE23}
                                😤 \x{1F624} = \x{D83D}\x{DE24}
                                😥 \x{1F625} = \x{D83D}\x{DE25}
                                😦 \x{1F626} = \x{D83D}\x{DE26}
                                😧 \x{1F627} = \x{D83D}\x{DE27}
                                😨 \x{1F628} = \x{D83D}\x{DE28}
                                😩 \x{1F629} = \x{D83D}\x{DE29}
                                😪 \x{1F62A} = \x{D83D}\x{DE2A}
                                😫 \x{1F62B} = \x{D83D}\x{DE2B}
                                😬 \x{1F62C} = \x{D83D}\x{DE2C}
                                😭 \x{1F62D} = \x{D83D}\x{DE2D}
                                😮 \x{1F62E} = \x{D83D}\x{DE2E}
                                😯 \x{1F62F} = \x{D83D}\x{DE2F}
                                😰 \x{1F630} = \x{D83D}\x{DE30}
                                😱 \x{1F631} = \x{D83D}\x{DE31}
                                😲 \x{1F632} = \x{D83D}\x{DE32}
                                😳 \x{1F633} = \x{D83D}\x{DE33}
                                😴 \x{1F634} = \x{D83D}\x{DE34}
                                😵 \x{1F635} = \x{D83D}\x{DE35}
                                😶 \x{1F636} = \x{D83D}\x{DE36}
                                😷 \x{1F637} = \x{D83D}\x{DE37}
                                😸 \x{1F638} = \x{D83D}\x{DE38}
                                😹 \x{1F639} = \x{D83D}\x{DE39}
                                😺 \x{1F63A} = \x{D83D}\x{DE3A}
                                😻 \x{1F63B} = \x{D83D}\x{DE3B}
                                😼 \x{1F63C} = \x{D83D}\x{DE3C}
                                😽 \x{1F63D} = \x{D83D}\x{DE3D}
                                😾 \x{1F63E} = \x{D83D}\x{DE3E}
                                😿 \x{1F63F} = \x{D83D}\x{DE3F}
                                🙀 \x{1F640} = \x{D83D}\x{DE40}
                                🙁 \x{1F641} = \x{D83D}\x{DE41}
                                🙂 \x{1F642} = \x{D83D}\x{DE42}
                                🙃 \x{1F643} = \x{D83D}\x{DE43}
                                🙄 \x{1F644} = \x{D83D}\x{DE44}
                                🙅 \x{1F645} = \x{D83D}\x{DE45}
                                🙆 \x{1F646} = \x{D83D}\x{DE46}
                                🙇 \x{1F647} = \x{D83D}\x{DE47}
                                🙈 \x{1F648} = \x{D83D}\x{DE48}
                                🙉 \x{1F649} = \x{D83D}\x{DE49}
                                🙊 \x{1F64A} = \x{D83D}\x{DE4A}
                                🙋 \x{1F64B} = \x{D83D}\x{DE4B}
                                🙌 \x{1F64C} = \x{D83D}\x{DE4C}
                                🙍 \x{1F64D} = \x{D83D}\x{DE4D}
                                🙎 \x{1F64E} = \x{D83D}\x{DE4E}
                                🙏 \x{1F64F} = \x{D83D}\x{DE4F}

                                In this case, knowing the first and last surrogate values of the Emoticons range, which are \x{D83D}\x{DE00} and \x{D83D}\x{DE4F}, it’s easier to get the right regex to search for, which is \x{D83D}[\x{DE00}-\x{DE4F}] !


                                And, indeed :

                                • Open a new tab

                                • Paste the 80 characters of the Emoticons range, with their associated value, above, in this new tab

                                • Open the Mark dialog ( Ctrl + M )

                                • Tick the Wrap around box and select the Regular expression mode

                                • Enter the regex \x{D83D}[\x{DE00}-\x{DE4F}] in the Search zone

                                • Click on the Mark All button

                                => The 80 characters, of this Unicode range, should be marked in red-orange, all at once !

                                Best Regards,

                                guy038

                                L 1 Reply Last reply Reply Quote 3
                                • L
                                  LanceMarchetti @guy038
                                  last edited by

                                  @guy038 Wow - Thanks for this macro!

                                  1 Reply Last reply Reply Quote 0
                                  • L
                                    LanceMarchetti @LanceMarchetti
                                    last edited by

                                    @LanceMarchetti said in Unicode 'ÿ' , problem converting to Hex 'FF':

                                    “NOTE: Using a space ‘20’ to replace a NUL ‘00’ will still result in a 32x32 Gif image,”
                                    Correction: I just noticed that the GIF file format repeats the image dimension block later on in the binary, just before the image data begins.
                                    So I tried using Hex 20 20 20 20 in the second dimension sector as well. The result was INDEED a 8224x8224 image!
                                    Changing it back to 20 00 20 00 resulted in a 32x32 image.

                                    Just wanted to set that straight. Thanks.

                                    1 Reply Last reply Reply Quote 0
                                    • PeterJonesP
                                      PeterJones @LanceMarchetti
                                      last edited by

                                      @LanceMarchetti ,

                                      I was surprised that no one had suggested submitting a bug report to the Converter Plugin repo’s issue list regarding the ÿ portion, because even though most of your request is abusing a text editor to edit a non-text file, Notepad++'s Converter Plugin should be able to handle converting ÿ in a valid ANSI file (win1252 or ISO-8859-1 or similar codepage).

                                      But when I went looking through open and closed issues, I found it’s actually already been reported multiple times – most recently, in #11… and #11 even came with a fix in PR#16, which was rejected because Don didn’t have the Steps to Reproduce.

                                      Based on this conversation, I was able to come up with a minimal Steps to Reproduce, and asked Don to re-open that issue, and hopefully he will then see whether the fix #16 will work, or will come up with one that will.

                                      The right answer to your full query is “Notepad++ is not a binary-file editor, so you need to accept hacky suggestions”. But ÿ should be able to be handled by the Converter plugin, so I poked the ÿ issue.

                                      L 1 Reply Last reply Reply Quote 3
                                      • Alan KilbornA
                                        Alan Kilborn
                                        last edited by Alan Kilborn

                                        I didn’t look super closely at it (due to limited interest), but I’d guess this line/block is the cause of the problem:

                                        https://github.com/npp-plugins/converter/blob/37d62a54a476c935f21bdd3aa37e04f3495adcb3/source/PluginDefinition.cpp#L390

                                        At least it seems suspicious. 0xFF can look a lot like -1. :-)

                                        PeterJonesP 1 Reply Last reply Reply Quote 2
                                        • PeterJonesP
                                          PeterJones @Alan Kilborn
                                          last edited by

                                          @Alan-Kilborn said in Unicode 'ÿ' , problem converting to Hex 'FF':

                                          At least it seems suspicious. 0xFF can look a lot like -1. :-)

                                          Given that PR#16 changed the getChar return value, which is what that -1 is comparing against, I’d say you’re probably right.

                                          Alan KilbornA 1 Reply Last reply Reply Quote 1
                                          • Alan KilbornA
                                            Alan Kilborn @PeterJones
                                            last edited by

                                            @PeterJones said in Unicode 'ÿ' , problem converting to Hex 'FF':

                                            Given that PR#16 changed the getChar return value, which is what that -1 is comparing against, I’d say you’re probably right.

                                            I didn’t look beyond the current code.

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors