Incorrect display of UTF-8 in the console.



  • Hello. I want to execute Python code so that the execution output is in the console. Configured as shown in the YouTube video “Setting up Notepad ++ for Python”. However, Russian words are garbled in the console when I run the file. The script file is encoded as UTF-8.
    Notepad++ v7.9.1 (64-bit)
    Build time : Nov 2 2020 - 01:07:46
    Path : C:\soft\Notepad++\notepad++.exe
    Admin mode : OFF
    Local Conf mode : ON
    OS Name : Windows 10 Home (64-bit)
    OS Version : 2009
    OS Build : 19042.1052
    Current ANSI codepage : 1251
    Plugins : mimeTools.dll NppConverter.dll NppExec.dll NppExport.dll NppRegExTractorPlugin.dll NppScripts.dll PreviewHTML.dll PythonScript.dll RegexTrainer.dll Remove Duplicate Lines.dll



  • @Александр-К-ш ,

    Could you show an example script that you think should output Russian text in the PythonScript console, and show a screenshot of what actually did get output?



  • @Александр-К-ш said in Incorrect display of UTF-8 in the console.:

    Configured as shown in the YouTube video “Setting up Notepad ++ for Python”

    Nobody’s going to watch a video so that they can answer your question here.



  • @Александр-К-ш ,

    For example, if I use the PythonScript v1.54 plugin, and use the PythonScript console’s immediate command to run console.write("Здравствуйте"), I get:
    [screenshot: the Russian text displayed correctly in the PythonScript console]

    (I snagged that Russian text from your “Russian Translation”-topic post of the same problem)

    If I paste the same text in a new editor window in Notepad++ (UTF-8), and use console.write(editor.getText()) to print the contents of the active file to the console window, I also get the correct display:

    [screenshot: the same text displayed correctly in the PythonScript console]

    Thus, if used properly, Notepad++ and PythonScript v1.54 have no difficulty writing Russian text.

    Show us your code that produces the problem in your environment, and one of the Python/PythonScript experts here will probably quickly see how you are mishandling encoding in Python.



  • Correction: I meant PythonScript version 1.5.4 – forgot the second dot.

    Addendum: The experiment above was in Notepad++ v8.1. However, I unzipped a fresh Notepad++ v7.9.1-64bit portable and installed PythonScript 1.5.4, and it behaved identically.



  • @Александр-К-ш said in Incorrect display of UTF-8 in the console.:

    I want to execute Python code so that the execution output is in the console.

    I’d read this as wanting to run an external python via NppExec and having the output in the NppExec console window, but that’s for the OP to elaborate on.
    Of course, maybe someone watched the video…and it is indeed confirmed to be PythonScript.



  • @Alan-Kilborn said in Incorrect display of UTF-8 in the console.:

    maybe someone watched the video

    Since the OP didn’t link to the video, but just expected our search to come up with exactly the same video he found…

    @Александр-К-ш ,

    At this point, the questions that you need to answer / things you need to understand:

    • What exactly are you trying to do?
    • Which plugin are you using to try to do it?
    • We are not going to watch a video to know how you think you’ve set up your system
    • If you are using PythonScript,
      • try the experiments I showed, and show your results
      • share the PythonScript code you are using, and a screenshot of results, showing that it doesn’t work
    • If you are using NppExec
      • Show an example of the code you ran, with a screenshot of the “bad” results
      • Show the output of npe_console, which will tell us what input and output encodings you are using: for example, mine is [screenshot of the npe_console output]
        If you are using NppExec, and you are using it out-of-the-box without changing your encoding settings, it might not be properly set up for Unicode output.
    • If you are using something other than PythonScript or NppExec, you will have to show us what you’re really doing, with example code/scripts and screenshots.


  • If you are using NppExec…

    I didn’t have a python executable installed on this machine (pythonscript’s DLL is sufficient for my python needs), so when I had a couple spare minutes, I downloaded the “Windows embeddable package” (the minimalistic Python3 executable, which can be embedded, or is otherwise “portable”).

    #!python3
    # encoding=utf-8
    
    print("Hello World")
    print("Здравствуйте")
    

    When I ran from the command line, I got

    [screenshot: both lines printed correctly in cmd.exe]

    showing that the code was good.

    When I tried to run from NppExec’s console, I got a UnicodeEncodeError. When I changed my python command line to invoke the -X utf8 option, then it worked inside NppExec’s console as well. Both these runs are shown in the screen cap below:

    [screenshot: the UnicodeEncodeError run and the successful -X utf8 run in the NppExec console]
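
    To illustrate why a UnicodeEncodeError can appear at all, here is a minimal sketch. The thread does not say which encoding Python chose for its output under NppExec, so the list of encodings below is an assumption for illustration only; can_encode and GREETING are hypothetical names, not part of any library.

```python
# Sketch: check which encodings can represent the Russian greeting.
# The encodings listed here are illustrative assumptions, not what
# NppExec or python.exe necessarily selected in the run above.

def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


GREETING = "Здравствуйте"

for name in ("utf-8", "cp1251", "cp1252", "ascii"):
    # Only ASCII is printed here, so this line is safe on any console.
    print(name, can_encode(GREETING, name))
```

    If Python's output encoding is one that cannot represent Cyrillic (cp1252 or ascii in this sketch), print() raises UnicodeEncodeError. An encoding such as cp1251 avoids the error, but the text can still look garbled if the console decodes the bytes with a different encoding.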

    So, if I use the NppExec script

    cd "$(CURRENT_DIRECTORY)"
    C:\usr\local\apps\python-3.9.6-embed-amd64\python.exe -X utf8 "$(FULL_CURRENT_PATH)"
    

    … then it properly outputs UTF-8 text for me.
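
    If changing the command line is not convenient, a similar effect can possibly be achieved from inside the script. The following is a sketch only, assuming Python 3.7 or later, where text streams offer reconfigure(). force_utf8 is a hypothetical helper name, and this variant was not part of the experiment described above.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); return it."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        # Available on io.TextIOWrapper since Python 3.7.
        reconfigure(encoding="utf-8")
    return stream


if __name__ == "__main__":
    force_utf8(sys.stdout)
    print("Здравствуйте")
```

    This only changes how Python encodes its output. The console that receives the bytes still has to decode them as UTF-8 for the text to display correctly.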

    When I first got this error, I hopped over to my normal cmd.exe window, and verified that the code works under normal circumstances (as shown above); then I ran python -h to find command line options, and looked through until I found:

    python -h
    ...
             -X utf8: enable UTF-8 mode for operating system interfaces, overriding the default
                 locale-aware mode. -X utf8=0 explicitly disables UTF-8 mode (even when it would
                 otherwise activate automatically)
    

    That is how I found the option which enabled it to work.

    I am sharing this detailed process with you, because when you are dealing with command lines, and embedding one process inside another, encoding can get confusing, even if none of the individual pieces is technically misbehaving. You sometimes have to be willing to run experiments and make guesses, trying to troubleshoot it yourself, before you can claim that a specific piece is causing the problem. It appears that running from NppExec provides a different locale from the standard cmd.exe environment, at least with my version of python.exe v3.9.6.
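
    One such experiment is to have Python report the encodings it selected, and to compare the output from cmd.exe with the output from the NppExec console. The sketch below assumes Python 3.7 or later (sys.flags.utf8_mode does not exist before that). encoding_report is a hypothetical helper name.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python chose for this process."""
    return {
        # sys.stdout or sys.stdin can be None in some embedded setups.
        "stdout": getattr(sys.stdout, "encoding", None),
        "stdin": getattr(sys.stdin, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

    Running the same file in both environments, with and without -X utf8, shows which of these values differ.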



  • I had the same problem.
    You have to use Unicode strings and convert the files.
    My solution is below.

    # -*- coding: utf-8 -*-
    # PythonScript (Python 2) script: open each .txt file under a folder in
    # Notepad++, convert it to UTF-8, and save a copy with a new extension.
    import os
    from Npp import notepad  # import it first!

    filePathSrc = ur"c:\stash\test"  # Path to the folder with files to convert
    for root, dirs, files in os.walk(filePathSrc):
        for fn in files:
            if fn.endswith(u'.txt'):  # Specify type of the files
                path = os.path.join(root, fn)
                notepad.open(path.encode('utf8'))
                notepad.runMenuCommand("Encoding", "Convert to UTF-8")
                # Save next to the original, e.g. name.txt.utf8_txt
                notepad.saveAs((path + u'.utf8_txt').encode('utf8'))
                notepad.close()
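
    For comparison, a similar batch conversion can be sketched in plain Python 3 without Notepad++. This is not the poster's method. The source encoding cp1251 is an assumption (taken from the ANSI codepage in the first post), and convert_tree_to_utf8 is a hypothetical helper name.

```python
import os


def convert_tree_to_utf8(folder, source_encoding="cp1251", suffix=".utf8_txt"):
    """Write a UTF-8 copy of every .txt file under folder; return the new paths."""
    written = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.endswith(".txt"):
                continue
            src = os.path.join(root, name)
            # newline="" keeps the original line endings untouched.
            with open(src, "r", encoding=source_encoding, newline="") as f:
                text = f.read()
            dst = src + suffix
            with open(dst, "w", encoding="utf-8", newline="") as f:
                f.write(text)
            written.append(dst)
    return written
```

    A call such as convert_tree_to_utf8(r"c:\stash\test") would mirror the folder used in the script above. Unlike Notepad++, this sketch does not detect the source encoding, so a wrong source_encoding produces wrong text rather than an error in many cases.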
    


  • @Sergey-Titkov

    I’m not sure you or any of the previous posters have answered the question posed by the OP.
    But, since the OP hasn’t returned to post and clarify, we don’t know (and it doesn’t matter since OP was the one asking for help).

