Extended ASCII ALT+xxx char Display issue



  • Hello,

    I am having some issues using Extended ASCII codes (128-255) in a UTF-8 text file.
    Some characters are displayed as a “void square”.

    1st case :

    • With NotePAD++ Installer v7.x : KO
    • After updating to last v8 (v8.1.3) : KO
    • After removing all extensions : KO

    2nd case :

    • By using NotePAD++ Portable v8.1.4 : OK

    3rd case :

    • By using Microsoft NotePAD : OK

    Rem. : As you can see, the ASCII Code panel display ALT+0xxx char values (BAD Extended ASCII Chars) and not ALT+000 char value

    P.S. : I don’t know how to join some pictures to illustrate the issue.



  • @David-Tcheki Update :
    The issue concerns the following 17 characters:

    • 176-178 (x3)
    • 185-188 (x4)
    • 200-206 (x7)
    • 219-220 (x2)
    • 223 (x1)


  • @David-Tcheki said in Extended ASCII ALT+xxx char Display issue:

    First, responding to some side points:

    1st case :

    • With NotePAD++ Installer v7.x : KO
    • After updating to last v8 (v8.1.3) : KO
    • After removing all extensions : KO

    By “KO” I assume you mean it doesn’t work, for some definition of “work”

    • By using NotePAD++ Portable v8.1.4 : OK

    By “OK” I assume you mean it does work.

    3rd case :

    • By using Microsoft NotePAD : OK

    FYI: neither product capitalizes the PAD – Microsoft Notepad and Don Ho’s Notepad++.

    Rem. : As you can see,

    Sorry, I cannot see that from your post.

    P.S. : I don’t know how to join some pictures to illustrate the issue.

    Copy the image (Windows standard feature: old-fashioned Alt+PrintScreen, or the new Snip & Sketch tool), then paste it in your post here.


    Now back to the meat of your question.

    I am having some issues using Extended ASCII codes (128-255) in a UTF-8 text file.
    Some characters are displayed as a “void square”.

    Could you give a more-specific example? Screenshots would have helped. Or copy/paste the actual text you are trying to display, so we know which characters you mean.

    Do you mean you’re really getting the unknown-glyph symbol 𖡄 or 𖡄 ? (Sometimes that glyph is rendered with a ? inside and sometimes not)
    867a2429-fab3-4da4-8b24-bf89b1a05992-image.png d90c1cf0-c46e-476a-be0a-fdfbc6d1770a-image.png

    … That symbol means your chosen font doesn’t have a glyph for that character. Though I would doubt that any of the codepoints in the 127-255 range would give you that in a modern Windows environment. You can see/change your chosen font in Settings > Style Configurator > Global Styles > Default Style:
    0128e383-d1ac-4479-a08f-61aa89d21482-image.png

    Sometimes toggling the setting of Settings > Preferences > MISC > Use DirectWrite will help Windows/Notepad++ find glyphs for all of your characters, or display those glyphs better. But sometimes it makes it harder for some people to read. You will have to choose whichever toggle on that setting works best for you.

    But given all the mentions of ALT+xxx and the specific characters you mentioned, I wonder if you meant the old box-drawing shaded boxes like ░ ▒ ▓ 9561ede9-1b65-4630-a7c2-e75180a09e16-image.png
    I am going to go with that for the rest of the post.

    the ASCII Code panel display ALT+0xxx char values (BAD Extended ASCII Chars) and not ALT+000 char value

    903e4094-3cb9-42d6-8828-6273e023dfcc-image.png
    Microsoft’s documentation

    When Microsoft documentation tells you to use ALT+0xxx to get a particular character, it is the right way of doing it, not the “BAD” way.

    The OLD, 1980s-technology way of entering characters from the OEM-US character set is to use ALT+xxx for the codepoint within that 256-character set. The correct way in modern Windows is to prefix with the 0 (ALT+0xxx), which selects the character from the Windows code page instead.

    Here’s an external resource that shows the ways of typing the degree symbol:
    6214e7ce-f452-49a3-8be8-798d55305b53-image.png

    And Wikipedia’s Alt Codes entry concurs.

    Note that in Notepad++, if you use Edit > Character Panel to show the ASCII codes insertion panel, you can see that 176 is the degree symbol when the encoding is UTF-8:
    76068162-637b-4b8c-ad49-39f9ea9b46bf-image.png

    OTOH, if you create a file, and use Encoding > Character Sets > Western European > OEM-US to get the old so-called “extended ASCII” or “box drawing” character set, then codepoint 176 is a box-drawing character (░).
    b9f663ee-c061-4952-84c4-e1423c109476-image.png

    But it doesn’t matter which encoding you are in in Notepad++, if you type ALT+176 it will do the box-drawing character; if you do ALT+0176 it will do the degree symbol. The same is true in MS Notepad as well:
    e5f24d12-b5f1-42f6-86ad-0e21134e6256-image.png
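The two lookups can be sketched in Python for anyone who wants to check them outside the editor. This is only an illustration of which table each ALT sequence consults, not how Windows implements it. It assumes the OEM code page is CP437 (OEM-US) and the Windows code page is Windows-1252; both depend on the system locale, and the function names `alt_oem` and `alt_windows` are made up for this example.

```python
import unicodedata

# Assumptions: OEM code page = CP437, Windows ("ANSI") code page = CP1252.
# Both vary with the system locale; pass another codec name to try a different one.

def alt_oem(code, oem_codepage="cp437"):
    """Character for ALT+xxx (no leading zero): looked up in the OEM code page."""
    return bytes([code]).decode(oem_codepage)

def alt_windows(code, windows_codepage="cp1252"):
    """Character for ALT+0xxx (leading zero): looked up in the Windows code page.

    A few byte values are undefined in CP1252 and raise UnicodeDecodeError.
    """
    return bytes([code]).decode(windows_codepage)

for code in (176, 177, 178):
    oem, win = alt_oem(code), alt_windows(code)
    print(f"ALT+{code}  -> U+{ord(oem):04X} {unicodedata.name(oem)}")
    print(f"ALT+0{code} -> U+{ord(win):04X} {unicodedata.name(win)}")
```

The same number therefore lands on two different Unicode characters depending on the leading zero, whatever encoding the file itself is saved in.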



  • @PeterJones
    First a big Thanks for your complete answer.

    I don’t know why, but I only have 3 minutes in which to edit my post, so I can’t make any corrections now (for example, using “Notepad” instead of “NotePAD”).

    UPDATE :

    • OK means “No Issue”
    • KO means “Issue appears”

    About the screenshots, I had already taken all the ones I needed.
    But I expected that selecting the “image” icon would let me choose a file from my computer and upload it into the message.
    Instead, it asks for a link, so I didn’t know what to do.
    (Habits…)

    Finally :
    The issue comes from the font used :

    1. It seems there is no Default Style defined :
      79d2b33b-e01f-44a5-8837-cfb999501655-image.png

    2. I can’t see DejaVu Sans Mono font in the list, but as I recall Consolas is also a monospaced font.
      930e26d7-fd48-4216-a4bb-2303f2a3c582-image.png

    3. In the Portable version of Notepad++ (v8.1.4), there is indeed a Default Style defined.
      So I don’t know what happened with the Installer version (even after the update, there still seems to be no Default Style defined)
      0050c95b-a596-45be-929d-0ada136d38df-image.png



  • @PeterJones

    UPDATE

    Please see below the initial screenshots I wanted to share:

    1. With Notepad++ Installer v7.x, and also after updating to the latest v8 (v8.1.3) -> KO
      1dcc9fc9-2b3e-4886-a986-48fc3b15e9c3-image.png

    2. By using Notepad++ Portable v8.1.4 -> OK
      5a904c4f-32e0-47e7-b72b-528d96bd76ff-image.png

    3. By using Microsoft Notepad -> OK
      037b8111-1f1c-44e4-9caf-2710955fd819-image.png



  • @David-Tcheki

    SOLUTION :
    It seems the issue comes from the Obsidian theme which has no Default Style Font defined.



  • @PeterJones

    About Notepad++ char setting - OEM-850 vs UTF-8

    In any case, if I want to display an extended ASCII char, I have to type the ALT+xxx sequence (not ALT+0xxx), in either OEM-850 or UTF-8 encoding.

    e9a22c41-7b0c-4d23-8339-0967fe40d4dd-image.png

    a8824240-01d2-4542-a628-319436ab5059-image.png

    Two things are still not clear to me:

    • What is the binary “char” code of an extended ASCII char (128-255) in UTF-8 format? (I mean, do Extended ASCII chars have their own codes in Unicode?)
    • Where are the Extended ASCII chars encoded in the Windows code page (ALT+0xxx “new” format)?


  • @David-Tcheki said,

    I can’t see DejaVu Sans Mono font in the list

    No, I had to install that myself. I prefer it to the default Courier New; it has a lot more of the technical Unicode glyphs that I use, and I like the look of the font. Consolas is a reasonable choice (it comes default with modern Windows, and has more glyphs than Courier New), though I don’t like its “look” quite as much as DejaVu Sans Mono (personal preference).

    SOLUTION : It seems the issue comes from the Obsidian theme which has no Default Style Font defined.

    Oi. I’m surprised. But yes, if there is no font defined, then Windows probably goes through its fallback choices, which can get confusing. The DirectWrite option I mentioned might have made it pick a better font… but it’s better to define a font.

    Re: “surprised”: Ah, looking through the GitHub “blame” on the Obsidian theme, it shows that the Default style was fixed to include a font name in commit 6dacca9, which has been in effect since v7.3.2. So apparently your upgrade path started from v7.3.1 or earlier. (Notepad++ doesn’t overwrite theme files when you update, because people would complain that styles they had customized were lost – for example, if they had tweaked Obsidian to use DejaVu Sans Mono instead of nothing or Courier New).
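For anyone who wants to check an old theme file for the same problem, here is a rough Python sketch. It assumes the theme XML stores the Default Style as a `WidgetStyle` element with a `fontName` attribute inside `GlobalStyles`; that layout is recalled from the stock stylers files and should be verified against your own theme. The `SAMPLE_THEME` string is a hypothetical, trimmed-down stand-in, not the real Obsidian file, and `default_style_font` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down theme content for illustration only.
# Element and attribute names are an assumption about the theme format.
SAMPLE_THEME = """<NotepadPlus>
  <GlobalStyles>
    <WidgetStyle name="Default Style" styleID="32" fgColor="E0E2E4" bgColor="293134" fontName="" fontStyle="0" fontSize="10" />
  </GlobalStyles>
</NotepadPlus>"""

def default_style_font(theme_xml):
    """Return the Default Style font name, or None if it is missing or empty."""
    root = ET.fromstring(theme_xml)
    for style in root.iter("WidgetStyle"):
        if style.get("name") == "Default Style":
            return style.get("fontName") or None
    return None

font = default_style_font(SAMPLE_THEME)
print("Default Style font:", font if font else "(none defined)")
```

To test a real theme, read the file's text and pass it to the helper in place of `SAMPLE_THEME`.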

    In any case, if I want to display an extended ASCII char, I have to type the ALT+xxx sequence (not ALT+0xxx), in either OEM-850 or UTF-8 encoding.

    Yes, as I said (emphasis added): “it doesn’t matter which encoding you are in in Notepad++, if you type ALT+176 it will do the box-drawing character; if you do ALT+0176 it will do the degree symbol.”

    What is the binary “char” code of an extended ASCII char (128-255) in UTF-8 format? (I mean, do Extended ASCII chars have their own codes in Unicode?)

    Easy enough to look up: the “extended ASCII” OEM-850 is well documented… for example, Wikipedia’s Code page 850 entry shows the upper 128 characters; the position in CP850 is noted by the rows and columns; the 4 hex-digit Unicode codepoint is listed in each box:
    3c0a3a85-0171-4a77-8025-f168c644228a-image.png
    … so, for example, ALT+176 = CP850#176 is U+2591 (so it’s at Unicode codepoint 0x2591 = decimal 9617).
    And that screenshot, or the Wiki article, can be used to look up any of the others.
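The same lookup can be done programmatically. The sketch below uses Python's built-in `cp850` codec to print the Unicode codepoint and the UTF-8 byte sequence for each of the 17 codes listed earlier in the thread. The helper name `describe` is invented for this example, and the codec's mapping is assumed to match the Wikipedia table.

```python
import unicodedata

# The 17 codes reported earlier in the thread.
CODES = [*range(176, 179), *range(185, 189), *range(200, 207), 219, 220, 223]

def describe(code, codepage="cp850"):
    """Return (U+XXXX, UTF-8 bytes as hex, Unicode name) for one code-page position."""
    ch = bytes([code]).decode(codepage)
    utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X}", utf8_hex, unicodedata.name(ch)

for code in CODES:
    codepoint, utf8_hex, name = describe(code)
    print(f"CP850 #{code}: {codepoint}  UTF-8 bytes {utf8_hex}  {name}")
```

So a character that occupies the single byte 176 in an OEM-850 file is stored as the three bytes E2 96 91 in a UTF-8 file.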

    Where are the Extended ASCII chars encoded in the Windows code page (ALT+0xxx “new” format)?

    The original “external resource” I linked has a “How to type in Microsoft Windows” link which explains – and even the screenshot I included above showed three ways of typing the degree symbol; that site has a page for each Unicode character, and will show the ALT sequences for each.

    Further, Wikipedia’s Alt Codes entry, which I also linked, says how to type the ALT sequences for any Unicode hex codepoint (so the #### from the U+#### notation or from the #### at the bottom of each of the cells in the table shown above).



  • @David-Tcheki wrote,

    OEM-850

    BTW: OEM-850 / CP850 was the default codepage in Western Europe. US computers default to CP437 (OEM-US), so their table of ALT+### is a bit different:

    a4eebea5-5263-452d-b9b6-9d9424215403-image.png

    As the Wiki: Alt Codes page points out,

    The familiar Alt+number combinations produced codes from the OEM code page (for example, CP437 in the United States)[c], matching the results from MS-DOS. But prefixing a leading zero (0) to the number (usually meaning 4 digits) produced the character specified by the newer Windows code page, allowing them to be typed as well.

    So future readers on a US machine would want to use this table as their map, not the CP850 table shown previously.
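To see exactly where the two tables diverge, a short Python sketch can compare the built-in `cp437` and `cp850` codecs position by position. It also checks the 17 codes from the original report; the variable names are illustrative, and the codecs are assumed to match the tables shown in the screenshots.

```python
# Positions 128-255 where CP437 (OEM-US) and CP850 map to different characters.
differing = [
    code
    for code in range(128, 256)
    if bytes([code]).decode("cp437") != bytes([code]).decode("cp850")
]

# The 17 codes from the original report.
reported = [*range(176, 179), *range(185, 189), *range(200, 207), 219, 220, 223]

print(len(differing), "positions differ between CP437 and CP850")
print("Reported codes that differ:", [c for c in reported if c in differing])
```

With these codecs, none of the 17 reported codes is in the differing set, so ALT+176 and the rest give the same shade, block, and double-line box characters on either code page. Other positions, such as 181, do differ.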

