Community
    • 登入

    UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?

    已排程 已置頂 已鎖定 已移動 General Discussion
    20 貼文 5 Posters 2.1k 瀏覽
    正在載入更多貼文
    • 從舊到新
    • 從新到舊
    • 最多點贊
    回覆
    • 在新貼文中回覆
    登入後回覆
    此主題已被刪除。只有擁有主題管理權限的使用者可以查看。
    • Alan KilbornA
      Alan Kilborn @PeterJones
      最後由 編輯

      @PeterJones said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

      while you did find the section of source-code that deals with 0-byte files and that option, it also applies that option to non-0-byte files

      Yes, I did that because that (empty files) was what the OP was asking about.
      There’s one other usage of _openAnsiAsUtf8 in the source code, and it deals with the other situation you show.

      So, okay, the UI is “okay”.
      So, okay, the user manual is also “okay”.
      But I will put some thought into it…about how to make it better.

      Maybe what makes me feel weird about it, and this may be a nod to Notepad++ 's history, is it feels like it wants to favor ANSI over UTF-8, which, in today’s world, doesn’t feel right. Maybe I’d feel better if the logic were inverted and the wording were more of ‘“downgrade” to ANSI’. :-)

      I still have more “encoding” related exploration of the source code to do, so maybe I’ll change my mind.

      1 條回覆 最後回覆 回覆 引用 1
      • Alan KilbornA
        Alan Kilborn
        最後由 Alan Kilborn 編輯

        I will say that an option that isn’t related to Notepad++ 's creation of a new document but located in the New Document section of the Preferences is a bit odd. Maybe that contributes to my “weird feeling” about this Apply to opened ANSI files setting.

        BTW…a lot of my “feelings” exposed in these last 2 posts. :-)

        1 條回覆 最後回覆 回覆 引用 1
        • Alan KilbornA
          Alan Kilborn
          最後由 編輯

          Maybe this (changing my UI text) makes me feel better too:

          a242f4d4-9c32-4bdd-a2a6-d35c0b8f3b61-image.png

          PeterJonesP 1 條回覆 最後回覆 回覆 引用 0
          • PeterJonesP
            PeterJones @Alan Kilborn
            最後由 編輯

            @Alan-Kilborn ,

            Interesting idea. That works for someone who understands basic encoding ideas (like you). But a newbie wouldn’t be expected to know what a “7bit file” is.

            Regarding UI: I might be tempted to move the … MISC > Autodetect character encoding option to the New Document page, and rename that page Encodings / Line Endings or something – maybe File Interpretation ? I don’t know what’s the best phrasing.

            ccd9e26b-29a0-440c-a388-03e66e7d12a7-image.png

            I don’t know. UI Design is hard.

            Alan KilbornA 1 條回覆 最後回覆 回覆 引用 1
            • Alan KilbornA
              Alan Kilborn @PeterJones
              最後由 Alan Kilborn 編輯

              @PeterJones said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

              That works for someone who understands basic encoding ideas (like you). But a newbie wouldn’t be expected to know what a “7bit file” is.

              This is true, and it DOES help me, because in about a week (or perhaps less) I will forget again what Apply to opened ANSI files means. :-) Joys of being “old”.

              BTW, I said “changing MY UI text”. I wasn’t suggesting it as a change to Notepad++.

              1 條回覆 最後回覆 回覆 引用 1
              • ImSpecialI
                ImSpecial @Ekopalypse
                最後由 編輯

                @Ekopalypse said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                @ImSpecial

                Afaik this is the point where apply to opened ANSI files comes into play.

                Yes but I don’t want to “upgrade” an existing files if they are already in ANSI, only for new 0 byte files, when again, are not really anything yet.

                I like @PeterJones mockup UI, or at least those options makes more sense to me, ie. Default Encoding: UTF8 / Apply “Default” Encoding to 0-byte files / but use ANSI if the existing file is already encoded in ANSI.

                Alan KilbornA 1 條回覆 最後回覆 回覆 引用 1
                • Alan KilbornA
                  Alan Kilborn @ImSpecial
                  最後由 編輯

                  @ImSpecial said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                  I don’t want to “upgrade” an existing files if they are already in ANSI

                  It shouldn’t happen.
                  Try this test with the UTF-8 radio button chosen and the checkbox ticked.

                  Create a disk file containing only abc and perhaps a line-ending.
                  Open the file in N++.
                  It should be UTF-8. Why? Because there is nothing about its contents that indicate ANSI–and in a modern world this is how it should be.
                  You can call it “upgraded” if you want, but I wouldn’t.

                  Create another disk file containing abc followed by a byte with value 255. May want to use a hex editor to do this.
                  Open the file in N++.
                  Notepad++ should indicate ANSI.
                  Nothing was “upgraded”.

                  Do you have a different scenario in mind where this Notepad++ behavior would give you trouble?

                  Alan KilbornA ImSpecialI 2 條回覆 最後回覆 回覆 引用 0
                  • Alan KilbornA
                    Alan Kilborn @Alan Kilborn
                    最後由 編輯

                    @Alan-Kilborn said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                    Create another disk file containing abc followed by a byte with value 255. May want to use a hex editor to do this.

                    BTW, I tried to create this file within N++ by making sure my status bar said “ANSI” and then double-clicking the 255 line in the Character Panel to insert the character. Unexpectedly (to me at least) what I actually achieved was two F characters inserted into my file, instead of one byte with value 255.

                    PeterJonesP 1 條回覆 最後回覆 回覆 引用 0
                    • PeterJonesP
                      PeterJones @Alan Kilborn
                      最後由 PeterJones 編輯

                      @Alan-Kilborn said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                      Unexpectedly (to me at least) what I actually achieved was two F characters inserted into my file, instead of one byte with value 255.

                      Where you click in that panel determines what you get. If you click on FF, you get FF. If you click on ÿ, you get ÿ. If you click on ÿ, you get ÿ

                      7d667da8-8db3-4d7a-a757-7f2fcc86304b-image.png

                      Alan KilbornA 1 條回覆 最後回覆 回覆 引用 3
                      • Alan KilbornA
                        Alan Kilborn @PeterJones
                        最後由 編輯

                        @PeterJones said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                        Where you click in that panel determines what you get.

                        Ha. Amazing. I never knew that.
                        Of course, I have never used this panel before today.
                        I have very close to zero use for “ANSI”; that’s probably the reason.
                        Thanks for the good info!
                        :-)

                        1 條回覆 最後回覆 回覆 引用 1
                        • guy038G
                          guy038
                          最後由 編輯

                          Hello, @alan-kilborn, @imspecial, @ekopalypse, @peterjones and All,

                          Alan, I think that, in your post, the term 7 Bit is not appropriate as ANSI encoded files use a 1-byte encoding, i.e. a 8-bit encoding !

                          Eventually, it would be better to change the phrase Apply to opened ANSI files in section Settings > Preferences... > New Document > Encoding with :

                          • Encode, in UTF-8, any ANSI file, with ASCII characters only (NO char > \x7F)

                          Best Regards

                          guy038

                          Alan KilbornA 1 條回覆 最後回覆 回覆 引用 0
                          • Alan KilbornA
                            Alan Kilborn @guy038
                            最後由 編輯

                            @guy038

                            the term 7 Bit is not appropriate as ANSI encoded files use a 1-byte encoding, i.e. a 8-bit encoding !

                            We are really talking about how N++ works to “classify” a file it is loading. When I mentioned “7 bit” I meant, at the point in the classification process, N++ has seen a file that contains only bytes with no most-significant-bits set, i.e., only 7-bit data.

                            Here’s the relevant piece of code:

                            https://github.com/notepad-plus-plus/notepad-plus-plus/blob/2d4640ce42dcf0c2878bb42f92a31182e4986cd1/PowerEditor/src/ScintillaComponent/Buffer.cpp#L695

                            So in my mind, the checkbox we are talking about comes into play when we have either of these situations:

                            • zero-byte file
                            • file with bytes with only 7-bit data

                            That was my rationale in using the term “7 bit”.
                            It doesn’t mean that the file would never contain characters from 128-255, just at the current time, no characters in the files meet that criterion.
                            Does it make sense?

                            1 條回覆 最後回覆 回覆 引用 2
                            • guy038G
                              guy038
                              最後由 編輯

                              Hello , @alan-kilborn,

                              OK, I understand what you mean and, actually, my formulation (NO char > \x7F) and yours 7 bit files do mean the same thing, namely that the eighth bit is not used ( so = 0 ) in existing ANSI files which are automaticaly encoded in UTF-8 on opening, if the squared box is ticked !


                              Now, I think that my formulation, of my previous post, should be improved as :

                              • Encode, in UTF-8, any opened ANSI file, with ASCII characters only (NO char > \x7F)

                              Best Regards,

                              guy038

                              Alan KilbornA 1 條回覆 最後回覆 回覆 引用 0
                              • Alan KilbornA
                                Alan Kilborn @guy038
                                最後由 Alan Kilborn 編輯

                                @guy038 said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                                Encode, in UTF-8, any opened ANSI file, with ASCII characters only (NO char > \x7F)

                                One problem is, that text is very long.
                                For UI text, that is.

                                (NO char > \x7F) and yours 7 bit files do mean the same thing

                                Yes! I thought this, I just didn’t type it. :-)

                                1 條回覆 最後回覆 回覆 引用 0
                                • ImSpecialI
                                  ImSpecial @Alan Kilborn
                                  最後由 編輯

                                  @Alan-Kilborn said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                                  Do you have a different scenario in mind where this Notepad++ behavior would give you trouble?

                                  My concern with upgrading files other then 0 byte ones, is that I very often will have my own copy of something and like to compare it against the original, which might be in ANSI, and when doing compares with the notepad++'s Compare plugin, it will complain that the encodings are different when doing so.

                                  Alan KilbornA 1 條回覆 最後回覆 回覆 引用 0
                                  • Alan KilbornA
                                    Alan Kilborn @ImSpecial
                                    最後由 編輯

                                    @ImSpecial said in UTF-8 encoding for new documents - This a bug, an oversight or intended behavior?:

                                    My concern with upgrading files other then 0 byte ones, is that I very often will have my own copy of something and like to compare it against the original, which might be in ANSI, and when doing compares with the notepad++'s Compare plugin, it will complain that the encodings are different when doing so.

                                    So, again, this “upgrade” will only take place if there are no characters in the file that would make the file ANSI. Meaning, there are no characters with byte values from 128 to 255.

                                    If you intentionally work in ANSI most of the time, presumably your files WILL contain characters with byte values from 128 to 255 (because otherwise, why work in ANSI).

                                    Thus, with your “compare” scenario, your fears probably aren’t realized, as you pull the first (base) file in, it is detected and shown as ANSI. Same thing when you pull the second (changed) file in. So when you go to do the compare there is no encoding difference.

                                    Perhaps that’s not the reality of it, but that’s how I see it. :-)

                                    1 條回覆 最後回覆 回覆 引用 1
                                    • 第一個貼文
                                      最後的貼文
                                    The Community of users of the Notepad++ text editor.
                                    Powered by NodeBB | Contributors