UnicodeEncodeError: 'gbk' codec can't encode character '\u2649' in position 4: illegal multibyte sequence
-
As shown in the figure, when running Python code, notepad++ reports an error “UnicodeEncodeError: ‘gbk’ codec can’t encode character ‘\u2649’ in position 4: illegal multibyte sequence”.
The same code will not report an error when running on vscode. What is the reason? How should notepad++ be set? Thank you for your answers.
-
it’s python.exe, not Notepad++, which is giving you that error. However, since Notepad++'s use of encoding might be the culprit, it’s still on-topic here.
Possibly, the file is not saved with the encoding that you think it is in Notepad++. You will want to look at Notepad++'s Encoding menu’s selection, and/or the status bar (which you didn’t show in the screenshot):
Also, look to see what encoding VSCode claims it is.
If I remember correctly, without an
# encoding=...
or similar, python3 assumes UTF-8. I know that the one I showed has an error when I do UTF-8 – but I just used OCR on your screenshot to try to get similar characters… I have no idea how to enter the exact characters. (it would have been nice if you’d clicked on</>
in the forum, then put your example code, with the right characters, in between the```
lines, so we could copy/paste.) So it might be that by pasting from OCR, I got characters that won’t encode correctly.When looking at the error I get when I do Notepad++ in UTF-8, I thought Notepad++ might be using the surrogate pairs (But before posting, while re-reading, I noticed that it wasn’t actually listing anything in the surrogate-pair range, so this paragraph was probably wrong – at least for the characters I used; maybe your characters do, I’m not sure.U+D800
–U+DFFF
) to encode theU+1xxxx
, and then encodes those into UTF-8 in the file. I don’t think UTF-8 normally needs/uses surrogate pairs, so that might be confusing the Python interpreter. (I think “modified UTF-8” allows it, but maybe not normal UTF-8.) And since Python 3.12 codecs says that it doesn’t support surrogates for even UTF-16 (which are the encodings for which surrogate pairs are defined), it wouldn’t surprise me if Python doesn’t accept the surrogates for UTF-8 either. But someone who knows more about Python’s encoding rules would really need to chime in for that.Since your error message included “gbk”, I tried setting
# encoding=gbk
(which was listed in Python’s codecs), but then couldn’t find a Notepad++ Encoding (at least not in the Encoding > Character Set > Chinese) that matched GBK . I tried# encoding=gb2312
and using that encoding in Notepad++ gives meSyntaxError: encoding problem: gb2312
instead…(But I’m not an expert on Chinese encodings – everything that seems right in anything I said above specifically about those encodings was only because google apparently gave me good answers; anything that’s wrong is probably my fault for not understanding/interpreting correctly.)
I guess I might just be wasting everyone’s time, other than suggesting that you double-check encodings on both applications to see if there’s a difference. Sorry for the rambling. Maybe there’s someone else who has more experience with Chinese characters and python and Notepad++ all working together, who can come give you a real answer.
-
@PeterJones Thank you for your answer, here is the original code:
n = eval(input(“请输入一个数字:”))
print(“{:+^11}”.format(chr(n-1)+chr(n)+chr(n+1)))After running, enter 9802 in the console to reproduce.
-
see here for a solution to your problem