UnicodeEncodeError: 'gbk' codec can't encode character '\u2649' in position 4: illegal multibyte sequence

mabangde0

As shown in the figure, when running Python code, notepad++ reports an error “UnicodeEncodeError: ‘gbk’ codec can’t encode character ‘\u2649’ in position 4: illegal multibyte sequence”.

The same code will not report an error when running on vscode. What is the reason? How should notepad++ be set? Thank you for your answers.

屏幕截图 2024-08-17 190206.png
屏幕截图 2024-08-17 190143.png

PeterJones

@mabangde0 ,

it’s python.exe, not Notepad++, which is giving you that error. However, since Notepad++'s use of encoding might be the culprit, it’s still on-topic here.

Possibly, the file is not saved with the encoding that you think it is in Notepad++. You will want to look at Notepad++'s Encoding menu’s selection, and/or the status bar (which you didn’t show in the screenshot):

Also, look to see what encoding VSCode claims it is.

If I remember correctly, without an # encoding=... or similar, python3 assumes UTF-8. I know that the one I showed has an error when I do UTF-8 – but I just used OCR on your screenshot to try to get similar characters… I have no idea how to enter the exact characters. (it would have been nice if you’d clicked on </> in the forum, then put your example code, with the right characters, in between the ``` lines, so we could copy/paste.) So it might be that by pasting from OCR, I got characters that won’t encode correctly.

When looking at the error I get when I do Notepad++ in UTF-8, I thought Notepad++ might be using the surrogate pairs (U+D800–U+DFFF) to encode the U+1xxxx, and then encodes those into UTF-8 in the file. I don’t think UTF-8 normally needs/uses surrogate pairs, so that might be confusing the Python interpreter. (I think “modified UTF-8” allows it, but maybe not normal UTF-8.) And since Python 3.12 codecs says that it doesn’t support surrogates for even UTF-16 (which are the encodings for which surrogate pairs are defined), it wouldn’t surprise me if Python doesn’t accept the surrogates for UTF-8 either. But someone who knows more about Python’s encoding rules would really need to chime in for that. But before posting, while re-reading, I noticed that it wasn’t actually listing anything in the surrogate-pair range, so this paragraph was probably wrong – at least for the characters I used; maybe your characters do, I’m not sure.

Since your error message included “gbk”, I tried setting # encoding=gbk (which was listed in Python’s codecs), but then couldn’t find a Notepad++ Encoding (at least not in the Encoding > Character Set > Chinese) that matched GBK . I tried # encoding=gb2312 and using that encoding in Notepad++ gives me SyntaxError: encoding problem: gb2312 instead…

(But I’m not an expert on Chinese encodings – everything that seems right in anything I said above specifically about those encodings was only because google apparently gave me good answers; anything that’s wrong is probably my fault for not understanding/interpreting correctly.)

I guess I might just be wasting everyone’s time, other than suggesting that you double-check encodings on both applications to see if there’s a difference. Sorry for the rambling. Maybe there’s someone else who has more experience with Chinese characters and python and Notepad++ all working together, who can come give you a real answer.

mabangde0

@PeterJones Thank you for your answer, here is the original code:

n = eval(input(“请输入一个数字:”))
print(“{:+^11}”.format(chr(n-1)+chr(n)+chr(n+1)))

After running, enter 9802 in the console to reproduce.

Ekopalypse

@mabangde0

see here for a solution to your problem