Another encoding issue

Alan Kilborn

So, encoding has been a theme here recently in the Community.
Or, maybe it is just me bringing it up all the time. :-)
I’m not even working with obscure encodings, just UTF-8 (no BOM).

Here’s the latest:

Confirm my encoding setting:

I had some text I was working on, which contained this UTF-8 character ➤ :

And I used the arrow key to caret up from the http line so that I could put some text after the “right arrow” on the line above, and what I was seeing changed to:

I can’t say for sure much more than that.
I could undo to get the original text back for the screenshot, but I couldn’t say what my actions were before this happened.
Meaning, I don’t know what position I was at on line 247 when I pressed up-arrow and got the weirdness.
(And no matter what I’ve tried, I can’t reproduce it.)

But, I know that the “x” position on a line is sometimes retained, so that when the caret moves up or down it can stay in the same column. Perhaps in this case this somehow caused the caret to end up in the middle of the UTF-8 character?

This is with 7.9, BTW.

PeterJones

@Alan-Kilborn ,

Not sure. I cannot get anything like that to happen.

Even in this example:

 ➤ 
 x 

    ➤ 
    https://com
    ➤ 
    ➤

… where the cursor is at offset 18, shown in red: if I try to go to offset 19, 20, or 21, it places the cursor after the ➤ (those are the three byte offsets for the UTF-8 encoding of that character), and 22 places it after the space.
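To illustrate the snapping behavior described above, here is a minimal Python sketch. It is not Notepad++ or Scintilla code: the function name `snap_forward` and the sample text are made up for illustration, and the only assumption is that the buffer is valid UTF-8, where continuation bytes always have the bit pattern `10xxxxxx`.

```python
def snap_forward(data: bytes, offset: int) -> int:
    """Return the nearest character boundary at or after `offset`.

    Hypothetical helper: it only mimics the observed behavior, it is
    not how the editor is known to implement it.
    """
    # Clamp the requested offset to the buffer.
    offset = max(0, min(offset, len(data)))
    # UTF-8 continuation bytes look like 10xxxxxx; skip past them.
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


# Illustrative sample: 'x', space, U+27A4 (3 bytes), space, 'y'
sample = "x \u27a4 y".encode("utf-8")

for requested in range(len(sample) + 1):
    print(requested, "->", snap_forward(sample, requested))
```

In this sample the arrow occupies byte offsets 2 through 4, so a request for offset 3 or 4 lands on offset 5, just after the character.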

So I don’t know how you convinced it to break that character apart. Also, ➤ is U+27A4, so its three bytes should be 0xE2, 0x9E, 0xA4… so your screenshot shows that it took the two outer bytes, but the central byte is apparently missing.
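For anyone who wants to check those three bytes, a short Python sketch follows. It compares Python's own encoder with the three-byte UTF-8 layout (`1110xxxx 10xxxxxx 10xxxxxx`) worked out by hand; the variable names are just for illustration.

```python
cp = 0x27A4  # code point of the right-arrow character

# Python's built-in encoder.
encoded = chr(cp).encode("utf-8")
hex_bytes = [f"0x{b:02X}" for b in encoded]
print(hex_bytes)

# The same three bytes built by hand from the 16 code point bits.
b1 = 0xE0 | (cp >> 12)           # lead byte: top 4 bits
b2 = 0x80 | ((cp >> 6) & 0x3F)   # continuation: middle 6 bits
b3 = 0x80 | (cp & 0x3F)          # continuation: low 6 bits
manual = bytes([b1, b2, b3])
print(manual == encoded)
```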

Ooh, that gave me a hint: in my example, if I go to offset 19 then hit DEL, it changed to

So I am guessing what happened is that you somehow got it to the central offset in the multi-byte character and deleted it – though for me, UNDO works to fix that. So maybe you triggered a script which deleted the byte from the character but plays with the UNDO history so UNDO didn’t work.
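The effect of losing the central byte can be reproduced outside the editor. This is only a sketch of the byte-level result in Python, not of whatever actually happened in Notepad++; how the leftover bytes are displayed depends on the application.

```python
original = "\u27a4".encode("utf-8")   # the three bytes of U+27A4

# Remove the byte at index 1 (the central byte), keeping the outer two.
damaged = original[:1] + original[2:]
print(damaged)

# The remaining two bytes are no longer a valid UTF-8 sequence.
try:
    damaged.decode("utf-8")
    is_valid = True
except UnicodeDecodeError as exc:
    is_valid = False
    print("strict decode failed:", exc.reason)

# A lenient decoder substitutes U+FFFD for the broken sequence.
lenient = damaged.decode("utf-8", errors="replace")
print(repr(lenient))
```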

Alan Kilborn

@PeterJones

So first, thanks for your thoughts and your experimentation.

“maybe you triggered a script which deleted the byte from the character”

I suppose this IS possible, although I don’t have any scripts that do any deleting when I’m just careting around. :-)

It seems like even scripts should be somewhat insulated from getting into the middle of multibyte characters. Sure, it should be possible for those that need it (not me), but I pretty much always want to deal with things at a character level. So such a thing should be “difficult” to have happen, if truly caused by a script.

But, alas, I don’t have more data on this, so that’s the end of conclusions.

Regarding “character level”, it is a bit disturbing to me that Notepad++ allows the user to jump to an offset right in the middle of a multibyte character. Again, I would expect to be restricted to the character level by this.

guy038

Hello, @alan-kilborn, @peterjones and All,

So I created a new issue about this disturbing behavior ;-))

Best regards,

guy038