    What is BUFFERENCODING.COOKIE ??

    • Alan Kilborn

      I’m aware of what is referred to as “baking cookies” on this site, but I’ve encountered a different type of cookie and I’d like to know where it comes from and what it means.

      In PythonScript, when I run these commands with a UTF-8 encoded file active, I get the results shown:

      >>> notepad.getEncoding()
      Npp.BUFFERENCODING.COOKIE
      >>> notepad.getEncoding(notepad.getCurrentBufferID())
      Npp.BUFFERENCODING.COOKIE
      

      I kinda expected to get BUFFERENCODING.UTF8 back instead.
      Anyone have ideas of the whys and the hows on this?

      • PeterJones @Alan Kilborn

        @Alan-Kilborn ,

        I don’t know what it’s meant for. I don’t see it in the list of IDM_FORMAT_xxx constants, which are what I thought the NPPM_GETBUFFERENCODING message was supposed to return. Hmm, I might have a bug in my Perl library, because according to the source code those values actually come from the enum UniMode, which is enum UniMode {uni8Bit=0, uniUTF8=1, uni16BE=2, uni16LE=3, uniCookie=4, uni7Bit=5, uni16BE_NoBOM=6, uni16LE_NoBOM=7, uniEnd};

        Running a quick experiment, if I set Encoding > UTF-8, notepad.getEncoding() shows Npp.BUFFERENCODING.COOKIE, but if I set Encoding > UTF-8-BOM, it returns Npp.BUFFERENCODING.UTF8.
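For quick reference, the UniMode values quoted above can be written as a lookup table. This is only an illustrative sketch: the names `UNI_MODE_NAMES` and `uni_mode_name` are made up for this example and are not part of PythonScript or Notepad++; only the numeric values come from the enum as quoted.

```python
# Values copied from the UniMode enum quoted in this post.
# UNI_MODE_NAMES and uni_mode_name are hypothetical helper names.
UNI_MODE_NAMES = {
    0: 'uni8Bit',
    1: 'uniUTF8',
    2: 'uni16BE',
    3: 'uni16LE',
    4: 'uniCookie',
    5: 'uni7Bit',
    6: 'uni16BE_NoBOM',
    7: 'uni16LE_NoBOM',
}


def uni_mode_name(value):
    """Return the UniMode name for a raw integer, or None if it is unknown."""
    return UNI_MODE_NAMES.get(int(value))
```

With this table, a raw value of 4 maps to `uniCookie`, which is presumably where the COOKIE name in PythonScript comes from.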

        • PeterJones @PeterJones

          I ran more experiments

          Encoding >       | notepad.getEncoding() retval
          -----------------+-------------------------------
          ANSI             | Npp.BUFFERENCODING.ENC8BIT
          UTF-8            | Npp.BUFFERENCODING.COOKIE
          UTF-8-BOM        | Npp.BUFFERENCODING.UTF8
          UCS-2 BE BOM     | Npp.BUFFERENCODING.UCS2BE
          UCS-2 LE BOM     | Npp.BUFFERENCODING.UCS2LE
          CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
          

          I didn’t actually try all the character sets, but I tried a few, and all the ones I tried gave me COOKIE.
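The table can also be inverted to see which menu choices share a return value. The sketch below only records the observations listed above; `OBSERVED` and `menu_entries_for` are hypothetical names for illustration, and the character-set row stands for the few sets that were actually tried.

```python
# Observations from the experiment table: (Encoding menu entry, getEncoding() name).
# OBSERVED and menu_entries_for are hypothetical names, not PythonScript API.
OBSERVED = [
    ('ANSI', 'ENC8BIT'),
    ('UTF-8', 'COOKIE'),
    ('UTF-8-BOM', 'UTF8'),
    ('UCS-2 BE BOM', 'UCS2BE'),
    ('UCS-2 LE BOM', 'UCS2LE'),
    ('CharacterSets>*', 'COOKIE'),
]


def menu_entries_for(retval):
    """List every observed menu entry that produced the given return value."""
    return [menu for menu, value in OBSERVED if value == retval]
```

Calling `menu_entries_for('COOKIE')` returns two entries, so the return value alone cannot separate UTF-8 without BOM from a character set.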

            • PeterJones @PeterJones

              I tried digging into the GitHub blame and code history, and as far as I can tell, for at least the last 12 years, he has mapped uniCookie to IDM_FORMAT_AS_UTF_8 (as opposed to uniUTF8 => IDM_FORMAT_UTF_8):

              • current: https://github.com/notepad-plus-plus/notepad-plus-plus/blob/8426c9ccd98157d2712e1b54d54f498a32e6481f/PowerEditor/src/Notepad_plus.cpp#L4439-L4453

              • 12 years ago: https://github.com/notepad-plus-plus/notepad-plus-plus/blob/4dd3b257e06c56093d109f055911addfe93770f7/PowerEditor/src/Notepad_plus.cpp#L5484-L5497

              I cannot find a reason in the commit comments for why he called one “cookie”…

              • Alan Kilborn

                So I guess it is fine, if a bit strange, that someone wants to call UTF-8 without BOM “cookie” encoding in their software…

                But, paraphrasing the above table:

                Encoding >       | notepad.getEncoding() retval
                -----------------+-------------------------------
                UTF-8            | Npp.BUFFERENCODING.COOKIE
                CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
                

                How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

                • PeterJones @Alan Kilborn

                  @Alan-Kilborn ,

                  I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either.

                  But I’m often bad at finding the things in PythonScript that others find easily; maybe @Ekopalypse has insight on this one.

                  • Ekopalypse @PeterJones

                    @PeterJones

                    Sorry, but I have no idea.

                    • PeterJones @PeterJones

                      @PeterJones said in What is BUFFERENCODING.COOKIE ??:

                      I might have a bug in my Perl library

                      I did. It’s been fixed. :-)

                      • Alan Kilborn @PeterJones

                        @alan-kilborn said in What is BUFFERENCODING.COOKIE ??:

                        How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

                        @peterjones said in What is BUFFERENCODING.COOKIE ??:

                        I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either. …maybe @Ekopalypse has insight on this one.

                        @ekopalypse said in What is BUFFERENCODING.COOKIE ??:

                        Sorry, but I have no idea.


                        So…it’s a bit cheesy, but one can read the status bar field that shows the encoding type to get this information (as its text string).

                        We’ve previously discussed in this forum how to read the status bar data from a PythonScript, but I’ve revamped my code for doing it, so I’ll post the whole thing here.

                        Here’s NotepadGetStatusBar.py with a demo at the end to show the current file’s encoding:

                        # -*- coding: utf-8 -*-
                        from __future__ import print_function
                        
                        from Npp import *
                        import ctypes
                        from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT
                        
                        class NGSB(object):  # implements a "getStatusBar" function (complement to notepad.setStatusBar())
                        
                            def __init__(self):
                                self.SendMessageW = ctypes.windll.user32.SendMessageW
                                LRESULT = LPARAM
                                self.SendMessageW.restype = LRESULT
                                self.SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
                                self.create_unicode_buffer = ctypes.create_unicode_buffer
                                self.curr_class_256 = self.create_unicode_buffer(256)
                                self.STATUSBAR_HANDLE = None
                                self._determine_statusbar_handle()
                                assert self.STATUSBAR_HANDLE
                        
                            def get_statusbar_by_section(self, statusbar_item_number):
                                # statusbar_item_number can be integer 0 thru 5 or one of:
                                #  STATUSBARSECTION.DOCTYPE
                                #  STATUSBARSECTION.DOCSIZE
                                #  STATUSBARSECTION.CURPOS
                                #  STATUSBARSECTION.EOFFORMAT
                                #  STATUSBARSECTION.UNICODETYPE
                                #  STATUSBARSECTION.TYPINGMODE
                                return self._get_statusbar_section(statusbar_item_number)
                        
                            def get_statusbar_as_tuple(self):
                                section_list = [
                                    STATUSBARSECTION.DOCTYPE,
                                    STATUSBARSECTION.DOCSIZE,
                                    STATUSBARSECTION.CURPOS,
                                    STATUSBARSECTION.EOFFORMAT,
                                    STATUSBARSECTION.UNICODETYPE,
                                    STATUSBARSECTION.TYPINGMODE,
                                ]
                                return tuple(self._get_statusbar_section(x) for x in section_list)
                        
                            def _determine_statusbar_handle(self):
                        
                                WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
                                FindWindowW = ctypes.windll.user32.FindWindowW
                                FindWindowExW = ctypes.windll.user32.FindWindowExW
                                EnumChildWindows = ctypes.windll.user32.EnumChildWindows
                                GetClassNameW = ctypes.windll.user32.GetClassNameW
                        
                                self.STATUSBAR_HANDLE = None
                        
                                def enum_callback_fn(hwnd, lparam):
                                    GetClassNameW(hwnd, self.curr_class_256, 256)
                                    if self.curr_class_256.value.lower() == "msctls_statusbar32":
                                        self.STATUSBAR_HANDLE = hwnd
                                        return False  # stop the enumeration
                                    return True  # continue the enumeration
                        
                                npp_hwnd = FindWindowW(u"Notepad++", None)
                                #print('npp_hwnd:', npp_hwnd)
                                EnumChildWindows(npp_hwnd, WNDENUMPROC(enum_callback_fn), 0)
                                #print('self.STATUSBAR_HANDLE:', self.STATUSBAR_HANDLE)
                        
                            def _get_statusbar_section(self, statusbar_item_number):
                                assert statusbar_item_number <= 5
                                WM_USER = 0x400
                                SB_GETTEXTLENGTHW = WM_USER + 12
                                SB_GETTEXTW = WM_USER + 13
                                SBT_OWNERDRAW = 0x1000
                                retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
                                length = retcode & 0xFFFF  # low word: text length in characters
                                draw_type = (retcode >> 16) & 0xFFFF  # high word: how the section is drawn
                                assert (draw_type != SBT_OWNERDRAW)
                                # SB_GETTEXTW writes a null-terminated string, so leave room for the terminator
                                text_buffer = self.create_unicode_buffer(length + 1)
                                retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
                                retval = '{}'.format(text_buffer[:length])
                                return retval
                        
                        if __name__ == '__main__':
                            print(NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE))
                        

                        Running the script will print the current file’s encoding (from the status bar information) to the PythonScript console window.
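Once the status bar text is available, telling the two COOKIE cases apart is a string comparison. The sketch below is a guess at how that might look, not a tested recipe: `describe_cookie_buffer` is a hypothetical helper, and it assumes the status bar shows exactly "UTF-8" for a UTF-8 file without BOM, which may differ between Notepad++ versions and localizations.

```python
def describe_cookie_buffer(statusbar_text):
    """Classify a buffer for which notepad.getEncoding() returned COOKIE.

    statusbar_text is the encoding field read from the status bar.
    """
    label = statusbar_text.strip()
    # ASSUMPTION: the status bar shows exactly "UTF-8" for UTF-8 without BOM.
    if label.upper() == 'UTF-8':
        return 'UTF-8 without BOM'
    # Anything else is taken to be one of the character sets.
    return 'character set: ' + label
```

Inside PythonScript, the argument would be the string returned by `NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE)` from the script above.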
