
Names of character sets

General Discussion
character set
22 Posts 4 Posters 3.0k Views
This topic has been deleted. Only users with topic management privileges can see it.
  • Alan Kilborn @Paul Wormer
    Nov 17, 2022, 1:10 PM (last edited Nov 17, 2022, 1:12 PM)

    @Paul-Wormer said in Names of character sets:

    CP 437 (or OEM 437) would be more telling than OEM-US

    See HERE for some also-known-as’s.
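    One concrete place to check such also-known-as names is Python's codec registry. The minimal sketch below assumes Python 3 and uses only the standard `codecs` module. Python's alias table is its own and is not what Notepad++ uses for its Encoding menu labels, so "OEM-US" is expected not to resolve there.

```python
import codecs

def canonical_name(alias: str) -> str:
    """Return Python's canonical codec name for an encoding alias."""
    return codecs.lookup(alias).name

# Several spellings of the same two code pages each resolve to one codec.
for alias in ("cp437", "437", "ibm437", "windows-1252", "cp1252"):
    print(f"{alias:>13} -> {canonical_name(alias)}")

# Notepad++'s "OEM-US" is a menu label, not a Python codec alias.
try:
    canonical_name("OEM-US")
except LookupError:
    print("OEM-US is not a registered Python codec name")
```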

    It is interesting to find the entry “OEM-US” under the heading “Western European”

    I found that strange as well; I have no explanation for why it was slotted under that.

    • Paul Wormer @PeterJones
      Nov 17, 2022, 1:53 PM

      @PeterJones As Alan Kilborn pointed out, one can change code pages within N++ (under Encoding). I played around a little with it and found that often Edit > Character Panel followed the setting of the code page, but not consistently. Is there some systematic rule that I am overlooking?

      • Alan Kilborn @Paul Wormer
        Nov 17, 2022, 1:55 PM

        @Paul-Wormer said in Names of character sets:

        found that often Edit > Character Panel followed the setting of the code page, but not consistently

        Please give an example of the inconsistency.

        • Paul Wormer @Alan Kilborn
          Nov 17, 2022, 2:12 PM (last edited Nov 17, 2022, 2:13 PM)

          @Alan-Kilborn
          [Two screenshots of the Character Panel from one editing session: first with UTF-8 encoding, then with OEM-US encoding]

          As you know, 0x80 is a good character to focus on. As the Unicode code point U+0080 it is a control character, in CP1252 it is €, and in CP 437 it is Ç. The screenshots show results from one editing session: first UTF-8, whose panel shows the CP1252 characters, and then OEM-US, whose panel shows CP 437.
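          The three readings of 0x80 can be checked with Python's standard codecs. This is a minimal sketch assuming Python 3; `cp1252` and `cp437` are Python's codec names for CP1252 and CP 437 (OEM-US). Character names are printed instead of the glyphs so the output does not depend on the console encoding.

```python
import unicodedata

raw = bytes([0x80])  # the single byte 0x80

as_cp1252 = raw.decode("cp1252")               # Windows-1252 reading
as_cp437 = raw.decode("cp437")                 # CP 437 (OEM-US) reading
c1_category = unicodedata.category(chr(0x80))  # U+0080 as a Unicode code point

print(unicodedata.name(as_cp1252))  # EURO SIGN
print(unicodedata.name(as_cp437))   # LATIN CAPITAL LETTER C WITH CEDILLA
print(c1_category)                  # Cc (control character)

# As a lone byte, 0x80 is not valid UTF-8: in UTF-8 it can only appear
# as a continuation byte inside a multi-byte sequence.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```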

          • Alan Kilborn @Paul Wormer
            Nov 17, 2022, 2:24 PM

            @Paul-Wormer

            Hmm, not exactly sure what you’re getting at, but if I use the Character Panel to insert the “0x80” character into a UTF-8 file (and I think the accurate way to say that is, insert a U+0080 character), I visually obtain the € character (as I would expect), and if I save the file and open it in a hex editor I see the 3-byte combination E2 82 AC (which I also expect). So I don’t see anything unexpected or inconsistent here, but maybe I’m just missing what you’re trying to say.

            • Paul Wormer @Alan Kilborn
              Nov 17, 2022, 2:31 PM (last edited Nov 17, 2022, 2:45 PM)

              @Alan-Kilborn I would like to see a character panel that agrees with my choice of character set. For instance, if I choose UTF-8, I would like to see a panel with control codes (no letters) between 0x80 and 0xA0. Or, as in Windows charmap, the control characters could simply be skipped in the panel. In other words, I would like to see more or less the same character panels as in charmap.
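              One way to picture the charmap-style panel described here, with control characters skipped, is to filter the range 0x80 to 0xFF by Unicode general category. This is only a sketch in Python; `panel_cells` is a hypothetical helper, not anything Notepad++ provides.

```python
from __future__ import annotations

import unicodedata

def panel_cells(start: int = 0x80, end: int = 0x100) -> list[tuple[int, str]]:
    """Return (code point, character) pairs for a hypothetical panel range,
    skipping control characters (Unicode general category "Cc")."""
    cells = []
    for cp in range(start, end):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cc":
            continue  # U+0080..U+009F are C1 controls: nothing to show
        cells.append((cp, ch))
    return cells

cells = panel_cells()
print(len(cells))                  # 96
print(f"first: U+{cells[0][0]:04X}")  # first: U+00A0 (no-break space)
```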

              Now the panel makes it look as if the € sign has the code 0x80 in UTF-8. But, as you correctly point out, it has a 3-byte code, not a 1-byte code, in UTF-8. BTW, the official Unicode code point of € is U+20AC; the three bytes E2 82 AC are its UTF-8 encoding.
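              The three bytes follow directly from the UTF-8 bit layout. The hand-written encoder below is a minimal illustrative sketch (it does not reject surrogates or values above U+10FFFF); it checks itself against Python's built-in codec and also shows that the control character U+0080 encodes as two bytes, C2 80, rather than as E2 82 AC.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one code point by hand using the UTF-8 bit layout.
    Illustrative only: surrogates and values above 0x10FFFF are not rejected."""
    if cp < 0x80:        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for cp in (0x20AC, 0x0080):
    encoded = utf8_bytes(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()}")
# U+20AC -> E2 82 AC
# U+0080 -> C2 80
```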

              • PeterJones @Paul Wormer
                Nov 17, 2022, 2:40 PM

                This post is deleted!
                • PeterJones @Paul Wormer
                  Nov 17, 2022, 2:50 PM (last edited Nov 17, 2022, 2:51 PM)

                  @Paul-Wormer,

                  UTF-8 and UTF-16 seem to be “oddities”. The single-byte character sets all just show what’s at the 256 individual codepoints (0x00 to 0xFF) for that character set. But my guess for the UTF-# encodings, because “unicode” handling was an afterthought about halfway through the Notepad++ lifecycle, is that Don just left the UTF-# character panels showing the character from the nth position in Win-1252, but mapped it to the correct codepoint in Unicode, with the correct bytes for the UTF-# encoding. That would have made it easier for people to transition from Win-1252 to Unicode/UTF-8, while still being able to find their “favorite” characters at the same location in the Notepad++ Character Panel.
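                  If that guess is right, each cell of a UTF-8/UTF-16 panel would behave roughly like the Python sketch below: take the Win-1252 character at position n, then emit the bytes of its real Unicode code point in the target encoding. `unicode_panel_cell` is hypothetical, not Notepad++ code, and UTF-16 is assumed to be little-endian. Python's `cp1252` codec leaves five positions undefined, which the sketch reports as `None`.

```python
from typing import Optional, Tuple

def unicode_panel_cell(n: int) -> Optional[Tuple[str, str, bytes, bytes]]:
    """Hypothetical panel cell at position n: show the Win-1252 character,
    but insert it via its real Unicode code point and encoding bytes."""
    try:
        ch = bytes([n]).decode("cp1252")
    except UnicodeDecodeError:
        return None  # Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined
    return (ch, f"U+{ord(ch):04X}", ch.encode("utf-8"), ch.encode("utf-16-le"))

for n in (0x41, 0x80, 0x81, 0xE9):
    cell = unicode_panel_cell(n)
    if cell is None:
        print(f"0x{n:02X}: (undefined in cp1252)")
    else:
        _, cp, u8, u16 = cell
        print(f"0x{n:02X}: {cp}  UTF-8 {u8.hex(' ')}  UTF-16LE {u16.hex(' ')}")
```

Position 0x80, for example, shows € but would insert U+20AC, i.e. E2 82 AC in a UTF-8 file, which matches what Alan observed in the hex editor.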

                  Personally, it doesn’t annoy me, because I haven't found the Character Panel very helpful for the Unicode encodings; I have a Notepad++ Run-menu entry to bring up charmap.exe instead. If I were going to recommend changes to the Character Panel (other than fixing its name in the compiled-in strings and in english.xml and english_customizable.xml to be just “Character Panel”, as Alan calls it), it would be to allow multiple “subpages” in the UTF-8/UTF-16 encodings, just like charmap.exe's “group by unicode subpage” does, so that you can access more than 256 characters in the Character Panel.
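                  A rough Python sketch of the “subpage” idea, assuming (hypothetically) fixed pages of 256 code points so that each page fits the existing 256-cell panel. The helper names are invented for illustration, and which code points count as unassigned depends on the Unicode version bundled with Python.

```python
from __future__ import annotations

import unicodedata

PAGE_SIZE = 256  # hypothetical: one page = one panel's worth of cells

def page_of(cp: int) -> tuple[int, int]:
    """Return (page number, cell within that page) for a code point."""
    return divmod(cp, PAGE_SIZE)

def page_chars(page: int) -> list[str]:
    """Characters a panel page would show, skipping control (Cc),
    unassigned (Cn) and surrogate (Cs) code points."""
    chars = []
    for cp in range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE):
        if unicodedata.category(chr(cp)) not in ("Cc", "Cn", "Cs"):
            chars.append(chr(cp))
    return chars

page, cell = page_of(0x20AC)                     # the euro sign
print(f"page 0x{page:02X}, cell 0x{cell:02X}")   # page 0x20, cell 0xAC
print(len(page_chars(0x00)))                     # 191 (256 minus 65 controls)
```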

                  • Paul Wormer @PeterJones
                    Nov 17, 2022, 3:00 PM

                    @PeterJones said in Names of character sets:

                    because “unicode” handling was an afterthought about halfway through the Notepad++ lifecycle

                    Well, the transition from a 1-byte character set to a variable-byte character set seems to my layman’s eye more than an afterthought.

                    I don’t know anything about coding of Windows apps, but couldn’t N++ simply interface with charmap.exe?

                    • PeterJones @Paul Wormer
                      Nov 17, 2022, 3:13 PM (last edited Nov 17, 2022, 3:15 PM)

                      @Paul-Wormer said in Names of character sets:

                      @PeterJones said in Names of character sets:

                      because “unicode” handling was an afterthought about halfway through the Notepad++ lifecycle

                      Well, the transition from a 1-byte character set to a variable-byte character set seems to my layman’s eye more than an afterthought.

                      It was a detailed afterthought… but it wasn’t part of the original design of Notepad++ (in the old days, you used to have to download a separate unicode executable vs the standard ANSI executable). And the unicode updates to the character panel do look to me like they were just trying to make something look the same as the WIN-1252/ANSI panel, but mapping those same characters to the UTF-8 or UTF-16 encoding.

                      I don’t know anything about coding of Windows apps, but couldn’t N++ simply interface with charmap.exe?

                      I don’t think the Windows Win32 API provides the charmap.exe interface as a widget, unlike the color picker or other such tools. And since you can easily save a Run-menu entry and assign a keyboard shortcut to run charmap.exe, doing fancy stuff to embed it seems a waste of time.

                      • Harry Arthur
                        Dec 6, 2022, 1:34 PM

                        Thanks for the discussion, guys. It really helped me figure out my problem.

                        The Community of users of the Notepad++ text editor.