Another encoding issue

  • A
    Alan Kilborn
    last edited by Nov 4, 2020, 1:52 PM

    So, encoding has been a theme here recently in the Community.
    Or, maybe it is just me bringing it up all the time. :-)
    I’m not even working with obscure encodings, just UTF-8 (no BOM).

    Here’s the latest:

    Confirm my encoding setting:

    6415c532-08e8-4302-885d-9d41269b7615-image.png

    I had some text I was working on, which contained this UTF-8 character ➤ :

    d66bf2d3-fbd7-485a-92ae-55df46d34dda-image.png

    And I used the arrow key to caret up from the http line so that I could put some text after the “right arrow” on the line above, and what I was seeing changed to:

    020cab4f-bd02-49a2-b460-938775b06ac7-image.png

    I can’t say for sure much more than that.
    I could undo to get the original text back for the screenshot, but I couldn’t say what my actions were before this happened.
    Meaning, I don’t know what position I was at on line 247 when I pressed up-arrow and got the weirdness.
    (And no matter what I’ve tried, I can’t reproduce it.)

    But, I know that the “x” position on a line is retained sometimes so that when a move up or down is made, the x position can be maintained. Perhaps in this case this somehow caused the caret to end up in the middle of the UTF-8 character?

    This is with 7.9, BTW.

    • P
      PeterJones @Alan Kilborn
      last edited by Nov 4, 2020, 2:41 PM

      @Alan-Kilborn ,

      Not sure. I cannot get anything like that to happen.

      Even in this example:

       ➤ 
       x 
      
          ➤ 
          https://com
          ➤ 
          ➤ 
      

      64c34357-ada1-4ce6-a497-3435eb933a87-image.png
      … where the cursor is at offset 18, shown in red: if I try to go to offset 19, 20, or 21, it places the cursor after the ➤ (those are the three byte offsets for the UTF-8 encoding of that character), and 22 places it after the space.
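
The snapping behavior described above can be illustrated with a minimal sketch. This is not Notepad++ or Scintilla code; `snap_to_char_boundary` is a hypothetical helper, and it assumes the rule is simply "skip forward past UTF-8 continuation bytes", which may differ from what the editor actually does.

```python
def snap_to_char_boundary(data: bytes, offset: int) -> int:
    """Move a byte offset forward until it no longer points at a UTF-8 continuation byte."""
    # Clamp the requested offset into the valid range first.
    offset = max(0, min(offset, len(data)))
    # Continuation bytes have the bit pattern 10xxxxxx.
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


sample = "a➤ b".encode("utf-8")  # bytes: 61 E2 9E A4 20 62
snapped = [snap_to_char_boundary(sample, i) for i in range(len(sample) + 1)]
print(snapped)
```

In this sketch, offsets 2 and 3 fall inside the three-byte arrow, so both are moved to offset 4, just after the character.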

      So I don’t know how you convinced it to break that character apart. Also, ➤ is U+27A4, so its three bytes should be 0xE2, 0x9E, 0xA4… so your screenshot shows that it took the two outer bytes, but the central byte is apparently missing.
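
The byte values quoted above are easy to double-check outside the editor. A small Python sketch (the variable names are just for illustration):

```python
arrow = "\u27a4"                  # the ➤ character, U+27A4
encoded = arrow.encode("utf-8")   # encode to its UTF-8 byte sequence
hex_bytes = [f"0x{b:02X}" for b in encoded]
print(hex_bytes)  # ['0xE2', '0x9E', '0xA4']
```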

      Ooh, that gave me a hint: in my example, if I go to offset 19 then hit DEL, it changed to
      9c3adeb1-fb67-4603-9e99-26f2d5f79240-image.png

      So I am guessing what happened is that you somehow got it to the central offset in the multi-byte character and deleted it – though for me, UNDO works to fix that. So maybe you triggered a script which deleted the byte from the character but plays with the UNDO history so UNDO didn’t work.
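
To illustrate the guess above: removing the central byte from the three-byte sequence leaves something that is no longer valid UTF-8. The sketch below only shows the byte-level effect in Python; it assumes nothing about how the byte actually got deleted in the editor.

```python
original = "\u27a4".encode("utf-8")      # b"\xe2\x9e\xa4"
damaged = original[:1] + original[2:]    # drop the central byte 0x9E

try:
    damaged.decode("utf-8")              # strict decoding
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False                   # the leftover bytes are not valid UTF-8

print(damaged.hex(" "), decoded_ok)
```

The two leftover bytes are the "outer" ones, 0xE2 and 0xA4, matching what the screenshot appears to show.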

      • A
        Alan Kilborn @PeterJones
        last edited by Nov 4, 2020, 3:35 PM

        @PeterJones

        So first, thanks for your thoughts and your experimentation.

        “maybe you triggered a script which deleted the byte from the character”

        I suppose this IS possible, although I don’t have any scripts that do any deleting when I’m just careting around. :-)

        It seems like even scripts should be somewhat insulated from getting into the middle of multibyte characters. Sure, it should be possible for those that need it (not me), but I pretty much always want to deal with things at a character level. So such a thing should be “difficult” to have happen, if truly caused by a script.

        But, alas, I don’t have more data on this, so that’s the end of conclusions.

        Regarding “character level”, it is a bit disturbing to me that Notepad++ allows the user to jump to an offset right in the middle of a multibyte character. Again, I would expect to be restricted to the character level by this.

        • G
          guy038
          last edited by Nov 4, 2020, 9:29 PM

          Hello, @alan-kilborn, @peterjones and All,

          So I created a new issue about this disturbing behavior ;-))

          Best regards,

          guy038

          The Community of users of the Notepad++ text editor.