Community
    • Login

    Another encoding issue

    Scheduled Pinned Locked Moved General Discussion
    4 Posts 3 Posters 255 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Alan KilbornA
      Alan Kilborn
      last edited by

      So, encoding has been a theme here recently in the Community.
      Or, maybe it is just me bringing it up all the time. :-)
      I’m not even working with obscure encodings, just UTF-8 (no BOM).

      Here’s the latest:

      Confirm my encoding setting:

      6415c532-08e8-4302-885d-9d41269b7615-image.png

      I had some text I was working on, which contained this UTF-8 character ➤ :

      d66bf2d3-fbd7-485a-92ae-55df46d34dda-image.png

      And I used the arrow key to caret up from the http line so that I could put some text after the “right arrow” on the line above, and what I was seeing changed to:

      020cab4f-bd02-49a2-b460-938775b06ac7-image.png

      I can’t say for sure much more than that.
      I could undo to get the original text back for the screenshot, but I couldn’t say what my actions were before this happened.
      Meaning, I don’t know what position I was on on line 247 when I pressed up-arrow and got the weirdness.
      (And no matter what I’ve tried, I can’t reproduce it.)

      But, I know that the “x” position on a line is retained sometimes so that when a move up or down is made, the x position can be maintained. Perhaps in this case this somehow caused the caret to end up in the middle of the UTF-8 character?

      This is with 7.9, BTW.

      PeterJonesP 1 Reply Last reply Reply Quote 1
      • PeterJonesP
        PeterJones @Alan Kilborn
        last edited by

        @Alan-Kilborn ,

        Not sure. I cannot get anything like that to happen.

        Even in this example:

         ➤ 
         x 
        
            ➤ 
            https://com
            ➤ 
            ➤ 
        

        64c34357-ada1-4ce6-a497-3435eb933a87-image.png
        … where the cursor is at offset 18, shown in red: if I try to go to offset 19,20, or 21, it places the cursor after the ➤ (those are the three byte offsets for the UTF8 encoding of that character), and 22 places it after the space.

        So I don’t know how you convinced it to break that character apart. Also, ➤ is U+27A4, so it’s three bytes should be 0xE2, 0x9E, 0xA4… so your screenshot shows that it took the two outer bytes, but the central byte is apparently missing.

        Ooh, that gave me a hint: in my example, if I go to offset 19 then hit DEL, it changed to
        9c3adeb1-fb67-4603-9e99-26f2d5f79240-image.png

        So I am guessing what happened is that you somehow got it to the central offset in the multi-byte character and deleted it – though for me, UNDO works to fix that. So maybe you triggered a script which deleted the byte from the character but plays with the UNDO history so UNDO didn’t work.

        Alan KilbornA 1 Reply Last reply Reply Quote 1
        • Alan KilbornA
          Alan Kilborn @PeterJones
          last edited by

          @PeterJones

          So first, thanks for your thoughts and your experimentation.

          maybe you triggered a script which deleted the byte from the character

          I suppose this IS possible, although I don’t have any scripts that do any deleting when I’m just careting around. :-)

          It seems like even scripts should be somewhat insulated from getting into the middle of multibyte characters. Sure, it should be possible for those that need it (not me), but I pretty much always want to deal with things at a character level. So such a thing should be “difficult” to have happen, if truly caused by a script.

          But, alas, I don’t have more data on this, so that’s the end of conclusions.

          Regarding “character level”, it is a bit disturbing to me that Notepad++ allows the user to jump to an offset right in the middle of a multibyte character. Again, I would expect to be restricted to the character level by this.

          1 Reply Last reply Reply Quote 2
          • guy038G
            guy038
            last edited by

            Hello, @alan-kilborn, @peterjones and All,

            So I created a new issue about this disturbing behavior ;-))

            Best regards,

            guy038

            1 Reply Last reply Reply Quote 2
            • First post
              Last post
            The Community of users of the Notepad++ text editor.
            Powered by NodeBB | Contributors