
    How to maintain special ASCII characters

    • 3d1l

      Hi,

      I’m using Notepad++ to keep notes. I set the language to C because it lets me create sections of text, delimited by { }, that I can expand and collapse. I’m using special ASCII codes for arrows, bullets and other symbols, like ALT 16 ►, ALT 17 ◄ and ←. After I save the file and re-open it, Notepad++ replaces these characters with DLE, DC1, ETB, etc. The only ones it keeps are ALT 254 ■ and ALT 251 √. Is there a way to keep the special characters?

      • guy038

        Hi, 3d1l,

        I suppose that your Windows OEM code page is OEM 437 ( Encoding > Character Sets > Western European > OEM US )

        If you open the Character Panel ( Edit > Character Panel ), it is easy to verify that characters such as the symbols ► and ◄ do NOT exist in the OEM 437 encoding: the bytes 0x10 and 0x11 (decimal 16 and 17) are simply read as the C0 control characters DLE and DC1 (Unicode U+0010 and U+0011)!
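        The same check can be sketched in Python, assuming Python's built-in cp437 codec follows the same OEM 437 table that Notepad++ uses. It also suggests why ■ and √ survived: they have their own slots in code page 437 (bytes FE and FB, i.e. ALT 254 and ALT 251), while ► and ◄ do not.

```python
# Minimal sketch, assuming Python's built-in "cp437" codec matches the
# OEM 437 table Notepad++ uses. It decodes bytes 0x10 and 0x11 as the
# C0 control characters DLE and DC1, not as the arrow glyphs.
decoded = bytes([0x10, 0x11]).decode("cp437")
print([f"U+{ord(c):04X}" for c in decoded])

# Symbols that have their own slot in code page 437 round-trip fine.
for ch in "■√":
    print(ch, "->", ch.encode("cp437").hex())

# Symbols with no slot in code page 437 cannot be stored at all.
for ch in "►◄←":
    try:
        ch.encode("cp437")
    except UnicodeEncodeError:
        print(ch, "has no OEM 437 byte")
```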

        You should use a Unicode encoding ( UTF-8, UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM ), the only encodings able to represent a huge number of characters/symbols, provided that your current N++ font can display them!

        Refer to the end of my post on SourceForge.net, below:

        https://sourceforge.net/p/notepad-plus/discussion/331753/thread/e5b72494/#b5c1

        I would advise using the universal Unicode UTF-8 encoding, which can encode any character of any language in the world! Of course, depending on your current font, some glyphs may not be displayed and are then replaced with a small white square or a question mark!
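        A minimal Python sketch of that claim: UTF-8 stores any Unicode character in 1 to 4 bytes, so a save/re-open round trip loses nothing. Whether each glyph is actually drawn still depends on the font, which code cannot show.

```python
# Minimal sketch: UTF-8 can encode every Unicode character, using 1 to 4
# bytes per character, so nothing is lost on a save/re-open round trip.
sample = "A ñ ► ◄ ← ☺"

encoded = sample.encode("utf-8")
print(encoded.decode("utf-8") == sample)

# Show each character's code point and its UTF-8 bytes.
for ch in sample.split():
    print(ch, f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```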

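        So, once your text (in its current encoding and containing your specific symbols) is written, just use the N++ option Encoding > Convert to UTF-8, then save your file with this new encoding. After restarting N++, your file should display all your symbols, as expected :-) ( Note that I said Convert to UTF-8. Don’t use the option Encode in UTF-8! Convert to UTF-8 rewrites the current text as UTF-8 bytes, whereas Encode in UTF-8 only reinterprets the existing bytes as if they were already UTF-8, which garbles the non-ASCII characters. )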
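        The difference between the two menu actions can be modelled in a few lines of Python. This is a hypothetical sketch, not Notepad++ code: the helper names are made up, and it assumes the file's bytes are currently OEM 437.

```python
# Hypothetical model of the two Notepad++ menu actions (names are illustrative).

def convert_to_utf8(raw: bytes, current: str) -> bytes:
    """'Convert to UTF-8': decode with the current encoding, then
    re-encode as UTF-8. The text stays the same; the bytes change."""
    return raw.decode(current).encode("utf-8")


def encode_in_utf8(raw: bytes) -> str:
    """'Encode in UTF-8': keep the bytes, just reinterpret them as UTF-8.
    Invalid bytes become U+FFFD here (Notepad++ shows them as xNN boxes)."""
    return raw.decode("utf-8", errors="replace")


oem_bytes = "ó ñ".encode("cp437")  # OEM 437 bytes A2, 20, A4
print(convert_to_utf8(oem_bytes, "cp437"))
print(encode_in_utf8(oem_bytes))
```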

        BTW, the default N++ font, Courier New, is able to display the 31 symbols below ( from ALT + 1 to ALT + 31 ):

        ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀ ♪ ♫ ☼ ► ◄ ↕ ‼ ¶ § ▬ ↨ ↑ ↓ → ← ∟ ↔ ▲ ▼
        

        Best Regards

        guy038

        • 3d1l

          guy038, thanks for your response. I have only taken a quick glance at your answer; I will look into it. At the moment I’m using the Consolas font because it makes the zero (0) different from the letter O.

          • 3d1l

            OK, I read your messages, but I did something wrong and messed up big time. Now I have lost all the accented characters (á, é, í, ú, ó, ñ, Ñ), hundreds of them. Some are replaced with “words” like [xA2] and others with the “?” character.

            Actually, I’m concerned, because my idea in using Notepad++ was to take notes, in a single file, in plain “vanilla” ASCII or text form, so I can open the file anywhere without worrying about proprietary formats (like OneNote or Evernote). Now you explain to me that there isn’t really a plain text format. I like to use the Consolas font (even though it is not available on all platforms) because the zero and the letter O are different. I set the language to C, not because I’m coding, but because with the curly braces { } I can keep the document indented and organized, and N++ lets me fold and unfold sections of the text.

            I don’t know if it made a difference, but I pressed CTRL-A to select all the text, then went to Encoding and selected Convert to UTF-8-BOM. Then I went to Edit > Character Panel, but there was no difference (the ASCII values still say NULL, SOH, BEL, DC1, etc.). I typed several special characters and they were displayed properly, then saved the file. When I reopened the file, not only were the arrows and dots replaced, but I had also lost the accented characters. I retyped some of them and saved the document, but after reopening they were replaced again. I tried to use Find/Replace, but after I selected the weird [xA2] word, the Replace window showed a ? inside a black diamond instead, so it could not find it.

            Is there a way to recover all the accented characters? And how exactly do I set up the program so that it keeps the special ASCII characters?

            Thanks again.

            • PeterJones

              First, to correct a misconception: there is a plain “vanilla” ASCII. It’s a 7-bit encoding that hasn’t technically been used for decades. It involves only 128 code points, the first 32 of which are control characters, which are not guaranteed to have any specific glyph associated with them. They are control characters that are supposed to do fancy things to physical (and, by extension, virtual) terminals. For codes 16 and 17 decimal (10h and 11h), your ancient font happened to assign a glyph under certain circumstances, but those are not guaranteed displayed values under all circumstances, not even under all “plain ASCII” circumstances.

              Next, accented characters. Even in the old days of MS-DOS, those were not part of ASCII, so if you were really using plain vanilla ASCII, they would not be possible. In the MS-DOS world, they were part of the “IBM PC” 8-bit “extended ASCII”, which was different from various other 8-bit extensions of ASCII throughout the world. The OEM 437 (aka CP437, “code page 437”) that @guy038 mentioned is the encoding / code page for IBM PC extended ASCII characters. But that’s only “plain vanilla encoding” if you happen to be using a machine that defaults to CP437.
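              A minimal Python sketch of why the 8-bit code pages are not interchangeable: the same byte decodes to different characters depending on the code page. In OEM 437 the byte A2 is ó, so the [xA2] boxes 3d1l reported are likely OEM 437 bytes being read as UTF-8 (an assumption, not something the thread confirms). The sample word is hypothetical, and decoding only helps if the original bytes are still intact in the file.

```python
# Sketch: one byte, three different 8-bit "extended ASCII" code pages.
raw = b"\xa2"
for codec in ("cp437", "cp1252", "latin-1"):
    print(codec, "->", raw.decode(codec))

# If the original OEM 437 bytes are still intact, decoding them with the
# right code page recovers the accented letter (hypothetical sample word).
recovered = b"canci\xa2n".decode("cp437")
print(recovered)

# Plain 7-bit ASCII has no accented letters at all.
try:
    "á".encode("ascii")
except UnicodeEncodeError as err:
    print("ascii:", err.reason)
```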

              (Unicode and character-encoding pedants would probably find holes in my explanation…)

              Now, on to your actual problem: go to Settings > Preferences > New Documents and set Encoding to ☑ UTF-8 with ☑ Apply to opened ANSI files. Close that dialog. The first checkmark means that new files will be encoded in UTF-8 (without the BOM, the Byte Order Mark that goes at the beginning of the file); the second means that ANSI files (files without any BOM or other internal indication of the encoding) will be assumed to be UTF-8.

              Now create a new file (File > New). The Encoding menu should now show “Encode in UTF-8” selected. Enter some accented characters and some others: “á, é, í, ú, ó, ñ, Ñ, ☑, →, ▶, ◀”. Note that those last two are NOT code points 16 and 17; they are U+25B6 “Black Right-Pointing Triangle” and U+25C0 “Black Left-Pointing Triangle”.

              The easiest way to get them into Notepad++ is to copy them from someplace else. I often use the FileFormat.Info Unicode Character Search, because you can type the name of a character, or part of a name like “right”, and find all the Unicode characters with that in the name. I also often use the Windows Character Map (WIN+R, charmap.exe): select your font of choice to make sure the Unicode character you want is in your font. (BTW: I would recommend a more complete Unicode font, such as DejaVu Sans Mono, which still differentiates between O and 0 and between 1 and l, but has a wide selection of Unicode characters.) Then ☑ Advanced View, Character Set = Unicode, Group By = Unicode Subrange. Selecting a subrange gives you an organized list of characters; double-click a character to put it in Characters to copy, and hit Copy to put it on the Windows clipboard. Then paste into NPP as usual. (The arrows I showed are in the “Block Elements & Geometric Shapes” subrange, BTW. But you should really get to know the general subranges yourself, to help you find the character you want.)
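              A name-based search like the one on FileFormat.Info can be sketched with Python's standard unicodedata module. The search helper is hypothetical and only scans U+2190 to U+25FF (Arrows through Geometric Shapes) to keep it quick. It also shows that the old code page 437 arrows ► ◄ are U+25BA and U+25C4 “pointers”, distinct from the ▶ ◀ “triangles” used above.

```python
import unicodedata

# Look up a character by its official Unicode name.
right = unicodedata.lookup("BLACK RIGHT-POINTING TRIANGLE")
left = unicodedata.lookup("BLACK LEFT-POINTING TRIANGLE")
print(right, f"U+{ord(right):04X}", left, f"U+{ord(left):04X}")


def search(fragment: str, start: int = 0x2190, stop: int = 0x2600):
    """Hypothetical helper: yield (char, code point, name) for characters
    whose name contains fragment, scanning only range(start, stop)."""
    fragment = fragment.upper()
    for cp in range(start, stop):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if fragment in name:
            yield chr(cp), f"U+{cp:04X}", name


for ch, cp, name in search("right-pointing"):
    print(ch, cp, name)
```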

              Save and close this file. Exit and restart Notepad++, then re-open the file. The fancy characters should be preserved, and the Encoding menu should still show “UTF-8” selected. If you select Encoding > Convert to UTF-8-BOM, save, exit and restart, the Encoding menu should now say “UTF-8-BOM”, and the Unicode characters should still show up (the file should also be two bytes longer because of the BOM).
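              The same save/re-open test can be sketched in Python, with the codecs "utf-8" and "utf-8-sig" standing in for Notepad++'s UTF-8 and UTF-8-BOM (the file names are hypothetical). Measuring the size difference shows the UTF-8 BOM is three bytes, which is the point guy038 corrects further down the thread.

```python
import os
import tempfile

text = "á, é, í, ú, ó, ñ, Ñ, ☑, →, ▶, ◀"

with tempfile.TemporaryDirectory() as folder:
    plain = os.path.join(folder, "notes-utf8.txt")      # hypothetical names
    bom = os.path.join(folder, "notes-utf8-bom.txt")

    # "utf-8" writes no BOM; "utf-8-sig" writes the BOM bytes EF BB BF first.
    with open(plain, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    with open(bom, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)

    # Re-open both files: the characters survive the round trip.
    with open(plain, encoding="utf-8") as f:
        assert f.read() == text
    with open(bom, encoding="utf-8-sig") as f:
        assert f.read() == text

    size_difference = os.path.getsize(bom) - os.path.getsize(plain)
    print("BOM adds", size_difference, "bytes")
```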

              Let us know if this doesn’t work for you.

              • guy038

                Hi, Peter,

                Many thanks for your very detailed post !

                Just a small correction: a UTF-8-BOM encoded file should be three bytes longer than the same UTF-8 encoded file!

                Indeed, in a UTF-8 encoded file, the invisible BOM character ( code point \x{FEFF} ) is written as the three bytes EF BB BF ;-)

                Refer to :

                https://en.wikipedia.org/wiki/Byte_order_mark#Byte_order_marks_by_encoding

                With a UCS-2 BE encoding ( Universal Coded Character Set-2 ), the BOM is written as the two bytes FE FF

                With a UCS-2 LE encoding ( Universal Coded Character Set-2 ), the BOM is written as the two bytes FF FE
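                A minimal Python sketch of how those BOM bytes let an editor recognise the encoding. The sniff_bom helper is hypothetical (not how Notepad++ is implemented), Python's UTF-16 codecs stand in for UCS-2 (they agree for characters in the Basic Multilingual Plane), and UTF-32 BOMs are deliberately ignored.

```python
import codecs

# Hypothetical helper (not Notepad++ code): guess the encoding from a BOM.
# Python's UTF-16 codecs stand in for UCS-2; they agree for BMP characters.
# UTF-32 BOMs are deliberately ignored in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(raw: bytes):
    """Return (codec name, BOM length), or (None, 0) when there is no BOM."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


sample = codecs.BOM_UTF16_LE + "▶".encode("utf-16-le")
name, skip = sniff_bom(sample)
print(name, skip, sample[skip:].decode(name))
```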

                Best Regards,

                guy038

                • PeterJones

                  Oh, right. I forgot the BOM is encoded in its own encoding. Thanks for the correction.

                  • 3d1l

                    Wow!

                    Peter, thanks. I followed what you said and it is working. The only problem is that it seems I lost the accents for good. I have a backup, but it was not up to date :-(

                    Thanks for the character search web page, very handy. The DejaVu font is impressive, but if I get used to it and then open the file on another platform without the font, I will not see some of the characters. Funny, it uses a dot instead of a forward slash for the zero.

                    Peter thanks as well, very helpful comments.

                    The Community of users of the Notepad++ text editor.