    Help for an ANSI file

    34 Posts 7 Posters 7.3k Views
    • Ekopalypse @andrecool-68

      @andrecool-68

      Is your Windows system also set up with the Russian language pack?
      If so, with “Autodetect character encoding” disabled you shouldn’t have any issues with cp1251-encoded files.
      AFAIK Npp uses your OS encoding unless you have configured something else.

      • andrecool-68

        @Ekopalypse
        Automatic encoding detection is always disabled for me. When I need to reopen a file in the desired encoding, Notepad++ re-saves the document with no possibility of rollback. So in that situation I have to do it in another editor, so as not to lose the file irrevocably. For example, the AkelPad4 editor never breaks the Russian encoding.

        • Ekopalypse @andrecool-68

          @andrecool-68

          When you open a file, what does Npp show in the status bar: ANSI or cp1251?

          • andrecool-68

            @Ekopalypse
            Example:
            The file is originally encoded as OEM 866.

            Notepad++ opens it with a different encoding each time: Macintosh, Windows-1251, ANSI, or UTF-8.

            • Ekopalypse @andrecool-68

              @andrecool-68

              Hmmm … why should npp do this if you have automatic encoding detection disabled? Strange, it does not do this for me.
              Is OEM866 very different from cp1251?

              • Alan Kilborn @Ekopalypse

                @Ekopalypse @andrecool-68

                I am interested in how this conversation turns out.
                I would like to know if there is a bug with this or not.
                Note that I would not consider N++'s failure to autodetect what the user thinks is the correct encoding a bug, but I would if the user sets an encoding, saves a file, and somehow N++ messes that up.

                • andrecool-68

                  @Ekopalypse
                  @Alan-Kilborn
                  My file is OEM 866, and Notepad++ opens it as Macintosh, Windows-1251, ANSI, or UTF-8 (the result is different every time).

                  For .bat files, I need exactly the OEM 866 encoding.
                  (Sorry for my Google Translate.)

                  • Ekopalypse

                    @andrecool-68

                    If you have disabled autodetection and use a character set that conflicts with your “ANSI” setting, Notepad++ cannot display the document correctly.

                    Here is an example:
                    I use OEM850 and saved this text: “Неприятности в раю” (“Trouble in paradise”).
                    If I now start Notepad++ and open the file, Notepad++ shows me this.

                    45836fed-1a5b-4b7d-abf8-c6260e0d4b6a-image.png

                    It uses the operating system's ANSI setting, which for me is CP1251.
                    That is fine, but Notepad++ always reports ANSI and nothing else.
                    The fact that you report different things is what confuses me about your statement.
                    Why is ANSI not always displayed? Strange.
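
The kind of mismatch described in this post can be reproduced outside Notepad++. The following is a minimal Python sketch, assuming the sample text is stored as OEM 866 and then read back with the Windows-1251 "ANSI" code page. That pairing is chosen for illustration only (it matches the file discussed earlier in the thread; the post itself mentions OEM850).

```python
# Sample text from the post; the code-page pairing below is an assumption.
text = "Неприятности в раю"

# Bytes as a DOS/OEM 866 tool (for example a .bat file editor) would store them.
raw = text.encode("cp866")

# Read the same bytes with the matching and with a non-matching code page.
as_oem = raw.decode("cp866")
as_ansi = raw.decode("cp1251", errors="replace")

print(as_oem)   # the original text
print(as_ansi)  # mojibake: same bytes, different character table
```

Both code pages are single-byte, so the file size is the same either way; only the mapping from byte values to Cyrillic letters differs.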

                    • Ekopalypse @Ekopalypse

                      Of course, my code page is 1252, not 1251.

                      • gstavi

                        In my humble opinion this is a user interface failure.
                        What does disabling “Autodetect character encoding” mean? If Notepad++ does not autodetect, then it must assume some default. What is this default? I tested, and it is not the new-file encoding.

                        The UI should have a radio button that selects one of two options:

                        • Autodetect character encoding.
                        • Assume any opened file is <combo box>

                        There could be more advanced features like letting the user select a group of acceptable encodings for his region where Notepad++ must guess one of them. But that goes beyond UI.

                        • Ekopalypse

                          @gstavi

                          If Notepad++ does not autodetect then it must assume some default.

                          I thought it is then ANSI, which depends on what GetACP returns for the current setup.
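
For readers who have not met GetACP before: it is the Win32 call that returns the system's active ANSI code page number. A minimal Python sketch for checking the value on your own machine follows. The helper name `ansi_code_page` is made up for this example, and the function returns None on non-Windows systems, where the call does not exist.

```python
import ctypes
import sys


def ansi_code_page():
    """Return the Windows ANSI code page number, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    # GetACP lives in kernel32 and takes no arguments.
    return ctypes.windll.kernel32.GetACP()


acp = ansi_code_page()
print(acp)  # for example 1251 or 1252 on Windows, depending on the system locale
```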

                          • gstavi @Ekopalypse

                            @Ekopalypse
                            It is the first time I ever heard of GetACP and I wonder how a typical user should anticipate the behavior when he disables autodetect.
                            And it is obviously still broken because a user should be allowed to instruct Notepad++ to assume some specific UNICODE encoding rather than codepage.

                            • Alan Kilborn @gstavi

                              @gstavi said in Help for an ANSI file:

                              user should be allowed to instruct Notepad++ to assume some specific UNICODE encoding rather than codepage

                              This might be relevant to that:

                              HERE @PeterJones says:

                              1. In the Settings > Preferences > New Document settings, if UTF-8 is chosen as your default encoding, you can also choose to always apply UTF-8 interpretation to files that Notepad++ opens and guesses are ANSI, not just to new files.

                              It seems a bit strange, or downright bad, that this option is buried in with the “New Document” settings?

                              • Ekopalypse

                                @gstavi said in Help for an ANSI file:

                                I am also not convinced that it works 100%, and I have tried to understand this part of the code, but I have to admit that it is quite confusing for me.

                                I agree, it would be nice to be able to force an encoding, but
                                what I would really like is to tie a lexer to a specific encoding.
                                Like batch to OEM850 and Python to UTF-8 …

                                • Alan Kilborn

                                  I did some more tangential playing around with this.

                                  I found that N++ will open a “7-bit ASCII” file (not sure how to really say that!) that has a NUL character in it, as ANSI. All other characters are your typical A-z0-9.
                                  But if the NUL is replaced with a SOH character, N++ opens it as UTF-8.
                                  Curious about why it does it differently.

                                  Of course, I’m mostly set up (I think) to have it work with UTF-8, but I’m less and less sure as the discussion goes on, what I should have selected in the Preferences to do this. :-)

                                  • Ekopalypse

                                    My understanding, when having autodetection disabled, is the following:

                                    A Scintilla buffer is initialized with _codepage = ::GetACP().
                                    The entry point is

                                    Notepad_plus::doOpen(const generic_string& fileName, bool isRecursive, bool isReadOnly, int encoding, const TCHAR *backupFileName, FILETIME fileNameTimestamp)
                                    

                                    The following steps are performed

                                    1. Npp checks whether the file is an HTML or XML file and whether the encoding can be read from the prolog.
                                    2. If the file is loaded from a session, it gets the encoding that was used before.
                                    3. Otherwise, Npp tries to find out whether it is Unicode or ANSI (I don’t understand this part of the code).
                                      If it is Unicode, the encoding is set accordingly;
                                      otherwise Npp checks whether “open ANSI as utf8” is configured and sets either ANSI or UTF-8.
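
The order of precedence in these steps can be sketched in Python. This is only an illustration of the description above, not Notepad++'s actual code: `choose_encoding` and all its parameters are hypothetical, only an XML prolog is handled in step 1, and the result of the Unicode-vs-ANSI check is passed in by the caller because the post does not explain how that check works.

```python
import re
from typing import Optional

# Matches an XML prolog such as <?xml version="1.0" encoding="windows-1251"?>
_PROLOG = re.compile(rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def choose_encoding(head: bytes,
                    session_encoding: Optional[str] = None,
                    detected_unicode: Optional[str] = None,
                    open_ansi_as_utf8: bool = False) -> str:
    """Pick an encoding in the order described in the post (hypothetical helper)."""
    # Step 1: an XML prolog that names its encoding wins.
    match = _PROLOG.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # Step 2: a file restored from a session keeps its previous encoding.
    if session_encoding is not None:
        return session_encoding
    # Step 3: use the outcome of the Unicode-vs-ANSI check, if it found Unicode.
    if detected_unicode is not None:
        return detected_unicode
    # Fallback: ANSI, or UTF-8 when "open ANSI as utf8" is configured.
    return "utf-8" if open_ansi_as_utf8 else "ansi"


print(choose_encoding(b"ABCD"))                          # ansi
print(choose_encoding(b"ABCD", open_ansi_as_utf8=True))  # utf-8
```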

                                    • guy038

                                      Hello, @alan-kilborn and All,

                                      Well, Alan, I think I have guessed the problem, and there is a real bug !


                                      First, I suppose that, in your Settings > Preferences... > New Document > Encoding :

                                      • The UTF-8 encoding ( Not the UTF-8 with BOM one ) is selected

                                      • The Apply to opened ANSI files option is selected

                                      And in Settings > Preferences... > New Document > MISC. :

                                      • The Autodetect character encoding option is UNCHECKED

                                      Note Alan, that is also my own configuration, too !


                                      Now, let’s suppose that you open a new N++ file => So, in the status bar, the UTF-8 encoding is displayed : logical !

                                      Now just write the string ABCD, save this new file as Test.txt and close Notepad++

                                      When opening this file, no editor can tell, without any other indication, which encoding is the right one :

                                      • It could be encoded with four bytes 41424344 in an ANSI file ( so any Windows encoding such as Win-1252, Win-1251, … because the ASCII part, from 00 to 7F, is identical )

                                      • It could be encoded, also, with four bytes 41424344 in an N++ UTF-8 file ( so without a BOM ). Indeed, with the UTF-8 encoding, any character with a code-point under \x{0080} is coded in 1 byte only, from 00 to 7F
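
The ambiguity is easy to confirm with a short Python check, shown here as a minimal sketch. The three codecs are picked as examples of the encodings named above.

```python
text = "ABCD"

utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")
cp1251_bytes = text.encode("cp1251")

print(utf8_bytes.hex())  # 41424344
print(utf8_bytes == cp1252_bytes == cp1251_bytes)  # True
```

Since the bytes are identical, nothing in the file itself can tell an editor which of these encodings was intended.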

                                      But, as we have the setting Apply to opened ANSI files set, when you re-open the Test.txt file, again, you should see the UTF-8 indication in the status bar

                                      And, adding the SOH character ( \x{01} ) , or any character till \x{1F} ( I verified ), between AB and CD does not change anything. The encoding will remain UTF-8 !

                                      But adding the NUL character does change the encoding to ANSI, which is in contradiction with our user settings ! However, this particular case ( NUL char + pure ASCII chars, only ) does not really matter, as the current file contents do not change when switching from ANSI to UTF-8 and vice-versa, anyway !


                                      Now, what’s more annoying is that the presence of the NUL character still forces the ANSI encoding, even if a character with a code over \x{007F} is added to the file :-(( For instance, if you add the very common French char é, to get the string ABNULCDé, and save this file with a UTF-8 encoding, when you re-open this file, the encoding is wrongly changed to ANSI. So, the wrong string ABNULCDÃ© is displayed !

                                      Remember that the contents of Test.txt file, the string ABNULCDé, after saving, are 4142004344C3A9 with the UTF-8 encoding ( This same string, would be coded 4142004344E9 in an ANSI file )
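
These byte sequences, and the garbled result of reading the UTF-8 bytes as ANSI, can be checked with a few lines of Python. This is a minimal sketch that assumes the ANSI code page is Windows-1252.

```python
text = "AB\x00CD\u00e9"  # "AB", NUL, "CD", then é (U+00E9)

utf8_bytes = text.encode("utf-8")
ansi_bytes = text.encode("cp1252")  # assumption: the ANSI code page is Windows-1252

print(utf8_bytes.hex().upper())  # 4142004344C3A9
print(ansi_bytes.hex().upper())  # 4142004344E9

# What an editor shows when it reads the UTF-8 bytes as Windows-1252:
# the two bytes C3 A9 become two separate characters instead of one é.
misread = utf8_bytes.decode("cp1252")
print(repr(misread))
```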

                                      So, although NUL characters are not common in classical text files, I suppose that this bug needs an issue to be created. What is your feeling about it ?

                                      Best Regards,

                                      guy038

                                      • Alan Kilborn @guy038

                                        @guy038 said in Help for an ANSI file:

                                        First, I suppose that, in your Settings > Preferences… > New Document > Encoding

                                        Right on the settings assumptions, except for me The Autodetect character encoding option is CHECKED

                                        So, although NUL characters are not common in classical text files, I suppose that this bug needs an issue to be created. What is your feeling about it ?

                                        Well, I was just sort of experimenting around. NUL characters are not something I typically use. Although I do have the feeling that if Scintilla allows them in the buffer (and clearly it does because I can see a black-boxed “NUL”), then Notepad++ itself should try and “do the right thing” (whatever that is) about them.

                                        • Alan Kilborn

                                          But…
                                          It does seem like I as a user should be able to tell the software: "If a file can’t officially be identified via a BOM, then open it as ‘xxxxxxx’ " (UTF-8 for me! but YMMV).
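
The rule wished for here is simple to state in code. The following Python sketch is an illustration only, not how Notepad++ behaves: `encoding_for` and its `fallback` parameter are hypothetical names, and the codec names are Python's.

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_for(data: bytes, fallback: str = "utf-8") -> str:
    """Return the encoding named by a BOM, else the user's fallback (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback


print(encoding_for(b"AB\x00CD\xc3\xa9"))            # utf-8
print(encoding_for(b"ABCD", fallback="cp866"))      # cp866
```

With a rule like this, a NUL byte in the content would have no influence on the choice: only a BOM or the user's stated preference decides.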

                                          • andrecool-68 @Ekopalypse

                                            @Ekopalypse An example of an error:

                                            oem-866.png
