
    How to show ascii value of one selected character or a double byte character?

    12 Posts 5 Posters 14.5k Views
    • C BaccaC
      C Bacca
      last edited by

      Hi,
      I sometimes copy and paste text from a web page into a text file whose name ends with “.md”. Sometimes the text contains extended ASCII characters with decimal values above 127. I’d like to do a search and replace to swap an extended left double quote for a normal double quote from the keyboard. Some text editors show the ASCII value of that extended character in the status bar when the cursor is to the left of the character, or when you select that character, but NPP doesn’t seem to do that.

      Here’s my info:
      Notepad++ v8.3.3 (64-bit)
      Build time : Mar 13 2022 - 17:20:02
      Path : C:\Apps\NPP\notepad++.exe
      Command Line : “H:\pandocbooks\00ebookwip\libnews\liblog.md”
      Admin mode : OFF
      Local Conf mode : ON
      Cloud Config : OFF
      OS Name : Windows 11 (64-bit)
      OS Version : 2009
      OS Build : 22000.675
      Current ANSI codepage : 1252
      Plugins : mimeTools.dll NppConverter.dll NppExport.dll

      NPP says the file is UTF-8 in the status bar.

      When I try to use the DELETE (not BACKSPACE) key to delete these extended characters it deletes one byte of the character but not the other and NPP seems to get confused. I am unable to see the other byte of the character to delete it.

      NOTE: Some of these extended characters that end up in the .md text file have 2 bytes. So my request has 2 parts:

      • When hidden characters are displayed I’d like to see these extended ascii characters as 2 bytes, or some other way, so I can delete them both.
      • I’d like to have NPP show the value of the character I select in the status bar even if the character is composed of 2 bytes.

      Additional info:
      Most of the extended characters I find on web pages are left and right double quotes, and the apostrophe. However, there are a few other extended ASCII characters, like accented letters, that also sometimes show up, plus the occasional copyright symbol. I would like to convert all these types of symbols to low-ASCII text with a decimal value less than 128.
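The conversion described above can be sketched outside Notepad++ in plain Python 3. This is only an illustration under stated assumptions: the `PUNCT_MAP` table and the `to_low_ascii` function are hypothetical names, the choice of `(c)` for the copyright sign is arbitrary, and any character that cannot be mapped is replaced with `?` so it stays visible.

```python
import unicodedata

# Hypothetical mapping for punctuation that has no ASCII decomposition.
PUNCT_MAP = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / typographic apostrophe
    "\u00a9": "(c)",  # copyright sign (replacement text is an arbitrary choice)
}

def to_low_ascii(text):
    """Return text with common typographic characters replaced by ASCII."""
    # Replace the explicitly mapped punctuation first.
    for src, dst in PUNCT_MAP.items():
        text = text.replace(src, dst)
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still above 127 becomes "?" so it can be found and fixed by hand.
    return stripped.encode("ascii", "replace").decode("ascii")

if __name__ == "__main__":
    print(to_low_ascii("\u201cCaf\u00e9\u201d \u00a9 2022"))
```

Handling the mapped punctuation before normalization keeps the replacements explicit; the accent stripping is lossy by design, so it is worth reviewing the result before saving.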

      PSPad can do this but I can no longer use it because sometimes it deletes data in the file while it is editing the file. It has simply become too unstable.

      Thank you!

      • PeterJonesP
        PeterJones @C Bacca
        last edited by PeterJones

        @c-bacca ,

        If you are willing to use the PythonScript plugin, then install this script following these instructions to give yourself an on-demand reference as to what Unicode character is at the current cursor position.

        But if you are pasting “smart quotes” into a UTF8 tab in Notepad++ and it is not showing up as smart quotes but as multiple characters, then I am confused, because that shouldn’t happen. Please share a screenshot with those characters and the full status bar shown.

        Addendum: for example, here is an animated screenshot of my pasting in the “smart quotes” from above, showing the underlying character using the script I linked to, and showing that it only takes a single backspace to delete each smart quote:

        • C BaccaC
          C Bacca @PeterJones
          last edited by C Bacca

          @peterjones When I paste text that has smart quotes (as an example of a 2-byte character) into NPP, the smart quotes appear in NPP. The problem is that I don’t want smart quotes in my UTF-8 file. So I put my cursor to the right of the smart quote and hit the BACKSPACE key once, and the character is deleted, EXCEPT there is now an invisible character (the first byte of the double-byte character) that is still there and needs to be deleted by hitting BACKSPACE a second time.

          If I forget to delete that hidden first byte, this causes more problems when I try to use the DELETE or BACKSPACE key in that line.

          The details here are: there appear to be at least 2 pairs of extended ASCII smart-quote characters. One pair has this problem; the other doesn’t. One set appears to be a single byte of an extended ASCII character above the value of 128. The other appears to be a double-byte character, which has the problem I mentioned above.
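For reference, how many bytes a character occupies depends on the file's encoding, not on the character alone. The following Python 3 sketch (the `byte_counts` helper is a hypothetical name) compares the Windows-1252 "ANSI" code page reported in the debug info with UTF-8. In standard UTF-8 the curly quotes take three bytes each, while accented Latin letters and the copyright sign take two.

```python
def byte_counts(ch, codecs=("cp1252", "utf-8")):
    """Return {codec: number of bytes needed to store ch in that codec}."""
    return {codec: len(ch.encode(codec)) for codec in codecs}

if __name__ == "__main__":
    samples = {
        "left double quote": "\u201c",
        "e with acute": "\u00e9",
        "copyright sign": "\u00a9",
    }
    for label, ch in samples.items():
        print(label, byte_counts(ch))
```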

          Unfortunately I cannot tell which characters have the problem just by looking at them.

          Does that make sense?

          p.s. If I find the website which produces double-byte smart quotes in NPP I will try to update you here so you can do what I do.

          • Alan KilbornA
            Alan Kilborn @C Bacca
            last edited by

            @c-bacca

            You really haven’t given any more info than your first posting.
            Have you done everything Peter suggested that you do?

            • PeterJonesP
              PeterJones @C Bacca
              last edited by

              @c-bacca ,

              I cannot find any of the “quotation marks” in Unicode that show that problem in Notepad++.

              I am wondering if you have an encoding problem: maybe whatever you are copying from is putting a UTF16 sequence as two raw bytes into the clipboard, and the clipboard is pasting the two raw bytes into Notepad++… but if that were the case, I wouldn’t expect the visible character to be “correct” as a smart quote.
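The reasoning in that last sentence can be checked with a small Python 3 sketch. It assumes little-endian UTF-16 (the usual byte order on Windows), and the `misread_utf16_as_bytes` helper is a hypothetical name. If the two UTF-16 bytes of a left smart quote were pasted as raw single-byte characters, the result would be a control character followed by a space, not something that looks like a smart quote.

```python
def misread_utf16_as_bytes(ch):
    """Encode ch as UTF-16-LE, then read each byte back as its own character."""
    raw = ch.encode("utf-16-le")
    # latin-1 maps every byte 0x00-0xFF to the code point with the same value.
    return raw, raw.decode("latin-1")

if __name__ == "__main__":
    raw, text = misread_utf16_as_bytes("\u201c")
    print(" ".join("{:02X}".format(b) for b in raw))
    print([hex(ord(c)) for c in text])
```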

              Using my WhatUniChar.py script linked above will help identify what character(s) are being put into Notepad++. And the script here will put little black boxes with text for any (most) invisible characters, so that could also help identify what is going on. But either of those requires you to install PythonScript and actually run the script to tell us more details.
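As a rough stand-in for readers who cannot install PythonScript, the same kind of diagnosis can be done on copied text in standalone Python 3. This is not the WhatUniChar.py script; `find_suspects` is a hypothetical helper that lists every character that is either outside ASCII or an invisible control/format character.

```python
import unicodedata

def find_suspects(text):
    """Return (index, code point, name) for non-ASCII or invisible characters."""
    results = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        # Cc = control, Cf = format (e.g. zero-width space); skip normal whitespace.
        is_invisible = category in ("Cc", "Cf") and ch not in "\r\n\t"
        if ord(ch) > 127 or is_invisible:
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed>")
            results.append((i, "U+{:04X}".format(ord(ch)), name))
    return results

if __name__ == "__main__":
    for hit in find_suspects("say \u201chello\u201d\u200b now"):
        print(hit)
```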

              • Shridhar KumarS
                Shridhar Kumar @C Bacca
                last edited by

                @c-bacca, you could also try the GotoLineCol plugin. The plugin will display the ANSI byte value, the UTF-8 byte sequence and the Unicode code point. All this info is displayed in the side panel and as a calltip when you navigate the doc using the side panel controls. See the sample clip below.

                When I find some rare emojis being used on Twitter, I copy & paste them into NPP and use GotoLineCol to ascertain the Unicode info.
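The three values mentioned above (ANSI byte, UTF-8 byte sequence, Unicode code point) can be computed for a single character in standalone Python 3. This sketch does not reflect how the plugin is implemented; `char_info` is a hypothetical helper, and it assumes Windows-1252 as the ANSI code page.

```python
def char_info(ch, ansi_codec="cp1252"):
    """Return the ANSI byte (if any), UTF-8 bytes, and code point of ch."""
    try:
        ansi = "0x{:02X}".format(ch.encode(ansi_codec)[0])
    except UnicodeEncodeError:
        # The character does not exist in the ANSI code page (e.g. an emoji).
        ansi = None
    utf8 = " ".join("{:02X}".format(b) for b in ch.encode("utf-8"))
    return {"ansi": ansi, "utf8": utf8, "codepoint": "U+{:04X}".format(ord(ch))}

if __name__ == "__main__":
    print(char_info("\u00e9"))       # accented letter
    print(char_info("\U0001F600"))   # emoji outside the Basic Multilingual Plane
```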

                ac778117-9f0f-4e12-83e0-b401a5af9130-image.png

                [DISCLAIMER: I am the author of this plugin.]

                • guy038G
                  guy038
                  last edited by guy038

                  Hello @peterjones and All,

                  Peter, regarding your WhatUniChar.py script, could you do a quick test?

                  • Open any text file

                  • With the Edit > Character Panel menu option, insert a NUL character, roughly, near the middle of current file

                  • Move the cursor right before this NUL char

                  • Run your WhatUniChar.py script

                  Did you ever notice this case? Of course, it is quite easy to recognize that the next char, in reverse video, is \x{0000}!

                  But, to be rigorous, I changed the end of line 20 of your script into:

                  if c != 0 else 'END-OF-FILE / NUL')
                  

                  Best Regards,

                  guy038

                  • PeterJonesP
                    PeterJones @guy038
                    last edited by

                    @guy038 said in How to show ascii value of one selected character or a double byte character?:

                    Did you ever notice this case ?

                    Nope.

                    I decided to figure out how to tell the difference between EOF and NUL, so it is now changed to:

                            is_eof = (editor.getCurrentPos()==editor.getLength())
                            info = "'{1}' = HEX:0x{0:04X} = DEC:{0} ".format(c, s.encode('utf-8') if c not in [13, 10, 0] else 'LINE-ENDING' if c != 0 else 'END-OF-FILE' if is_eof else 'NUL')
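The same decision logic can be exercised without the editor. The standalone Python 3 sketch below uses a hypothetical `label_at` function on a `bytes` buffer and assumes, as the exchange above implies, that reading past the end of the document yields 0, which is why the position has to be compared with the length to tell end-of-file from a real NUL.

```python
def label_at(data, pos):
    """Describe the byte at pos, distinguishing a real NUL from end-of-file."""
    is_eof = pos >= len(data)
    # Assumption: like the editor call in the script, reading at EOF gives 0.
    c = 0 if is_eof else data[pos]
    if c in (13, 10):
        return "LINE-ENDING"
    if c == 0:
        return "END-OF-FILE" if is_eof else "NUL"
    # Single bytes only; multi-byte UTF-8 sequences are out of scope here.
    return chr(c)

if __name__ == "__main__":
    buf = b"a\x00b\r\n"
    for pos in range(len(buf) + 1):
        print(pos, label_at(buf, pos))
```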
                    
                    • Alan KilbornA
                      Alan Kilborn
                      last edited by

                      As fine as the script is, I like @Shridhar-Kumar 's plugin for this, as more convenient. Now if it could only convert the text at the caret into hex/dec/bin equivalents (something I seem to have to do endlessly these days), I’d be truly excited.
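To make the request concrete, here is a standalone Python 3 sketch of the kind of conversion being asked for. It is not part of GotoLineCol; `number_forms` is a hypothetical helper that takes a numeric token such as the text at the caret and returns its decimal, hexadecimal, and binary spellings.

```python
def number_forms(token):
    """Parse a numeric token and return its dec/hex/bin representations."""
    # Base 0 lets int() honor the 0x, 0o and 0b prefixes; a plain token is decimal.
    # Note: decimal tokens with leading zeros (e.g. "010") raise ValueError.
    value = int(token.strip(), 0)
    return {"dec": str(value), "hex": hex(value), "bin": bin(value)}

if __name__ == "__main__":
    for token in ("255", "0x1F", "0b101"):
        print(token, number_forms(token))
```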

                      • Shridhar KumarS
                        Shridhar Kumar @Alan Kilborn
                        last edited by

                        @alan-kilborn, I will be happy to implement your suggested enhancement. I will keep you posted. It might take a while since I am totally occupied with a couple of other projects.

                        • Alan KilbornA
                          Alan Kilborn @Shridhar Kumar
                          last edited by

                          @shridhar-kumar said in How to show ascii value of one selected character or a double byte character?:

                          I will be happy to implement your suggested enhancement.

                          That’s great. I will create an issue on your github page describing what I’m looking for. If you choose to implement it, great! :-)

                          • Alan KilbornA
                            Alan Kilborn @Alan Kilborn
                            last edited by

                            @alan-kilborn said in How to show ascii value of one selected character or a double byte character?:

                            I will create an issue on your github page

                            I did this HERE.

                            (Sorry for getting off-topic.)

                            The Community of users of the Notepad++ text editor.