What is BUFFERENCODING.COOKIE ??

  • A
    Alan Kilborn
    last edited by Oct 12, 2020, 2:37 PM

    I’m aware of what is referred to as “baking cookies” on this site, but I’ve encountered a different type of cookie and I’d like to know where it comes from and what it means.

    In PythonScript, when I run these commands with a UTF-8 encoded file as the active one, I get the results shown:

    >>> notepad.getEncoding()
    Npp.BUFFERENCODING.COOKIE
    >>> notepad.getEncoding(notepad.getCurrentBufferID())
    Npp.BUFFERENCODING.COOKIE
    

    I kinda expected to get BUFFERENCODING.UTF8 back instead.
    Anyone have ideas of the whys and the hows on this?

    • P
      PeterJones @Alan Kilborn
      last edited by Oct 12, 2020, 2:53 PM

      @Alan-Kilborn ,

      I don’t know what it’s meant for. I don’t see it in the list of IDM_FORMAT_xxx constants, which are what I thought the NPPM_GETBUFFERENCODING message was supposed to return. Hmm, I might have a bug in my Perl library, because it appears the message actually returns values from the UniMode enum, which according to the source code is enum UniMode {uni8Bit=0, uniUTF8=1, uni16BE=2, uni16LE=3, uniCookie=4, uni7Bit=5, uni16BE_NoBOM=6, uni16LE_NoBOM=7, uniEnd};

      Running a quick experiment, if I set Encoding > UTF-8, notepad.getEncoding() shows Npp.BUFFERENCODING.COOKIE, but if I set Encoding > UTF-8-BOM, it returns Npp.BUFFERENCODING.UTF8.

      • P
        PeterJones @PeterJones
        last edited by Oct 12, 2020, 3:12 PM

        I ran more experiments

        Encoding >       | notepad.getEncoding() retval
        -----------------+-------------------------------
        ANSI             | Npp.BUFFERENCODING.ENC8BIT
        UTF-8            | Npp.BUFFERENCODING.COOKIE
        UTF-8-BOM        | Npp.BUFFERENCODING.UTF8
        UCS-2 BE BOM     | Npp.BUFFERENCODING.UCS2BE
        UCS-2 LE BOM     | Npp.BUFFERENCODING.UCS2LE
        CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
        

        I didn’t actually try all the character sets, but I tried a few, and all the ones I tried gave me COOKIE.
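The table above can be restated as a small lookup, which makes the ambiguity of COOKIE easy to see. This is only a sketch of the observed results: the dictionary and the helper names `menu_entries_for` and `is_ambiguous` are made up for illustration and are not part of PythonScript, and only the rows that were actually tested are included.

```python
# Observed in the experiments above: name returned by notepad.getEncoding()
# -> Encoding menu entries that produced it. Untested values are omitted.
OBSERVED = {
    "ENC8BIT": ["ANSI"],
    "COOKIE": ["UTF-8", "CharacterSets>*"],
    "UTF8": ["UTF-8-BOM"],
    "UCS2BE": ["UCS-2 BE BOM"],
    "UCS2LE": ["UCS-2 LE BOM"],
}


def menu_entries_for(encoding_name):
    """Return the menu entries seen for a name such as 'Npp.BUFFERENCODING.COOKIE'."""
    # Accept either the full console form or just the last component.
    short_name = encoding_name.rsplit(".", 1)[-1]
    return OBSERVED.get(short_name, [])


def is_ambiguous(encoding_name):
    """True when more than one menu entry produced the same return value."""
    return len(menu_entries_for(encoding_name)) > 1
```

With this lookup, COOKIE is the only value that is ambiguous, since both plain UTF-8 and the character sets report it.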

        • A
          Alan Kilborn
          last edited by Oct 12, 2020, 3:31 PM

          [image: a718156b-df1e-47aa-a6c1-d11397249c58-image.png]

          • P
            PeterJones @PeterJones
            last edited by PeterJones Oct 12, 2020, 3:32 PM

            I tried digging into the GitHub blame and code history, and as far as I can tell, for at least the last 12 years the code has mapped uniCookie to IDM_FORMAT_AS_UTF_8 (as opposed to uniUTF8 => IDM_FORMAT_UTF_8):

            • current: https://github.com/notepad-plus-plus/notepad-plus-plus/blob/8426c9ccd98157d2712e1b54d54f498a32e6481f/PowerEditor/src/Notepad_plus.cpp#L4439-L4453

            • 12 years ago: https://github.com/notepad-plus-plus/notepad-plus-plus/blob/4dd3b257e06c56093d109f055911addfe93770f7/PowerEditor/src/Notepad_plus.cpp#L5484-L5497

            I cannot find a reason in the commit comments for why one of them is called “cookie”…

            • A
              Alan Kilborn
              last edited by Oct 12, 2020, 6:56 PM

              So I guess it is fine, if a bit strange, that someone wants to call UTF-8 without BOM “cookie” encoding in their software…

              But, paraphrasing the above table:

              Encoding >       | notepad.getEncoding() retval
              -----------------+-------------------------------
              UTF-8            | Npp.BUFFERENCODING.COOKIE
              CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
              

              How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

              • P
                PeterJones @Alan Kilborn
                last edited by Oct 12, 2020, 7:16 PM

                @Alan-Kilborn ,

                I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either.

                But I’m often bad at finding the things in PythonScript that others find easily; maybe @Ekopalypse has insight on this one.

                • E
                  Ekopalypse @PeterJones
                  last edited by Oct 12, 2020, 7:26 PM

                  @PeterJones

                  Sorry, but I have no idea.

                  • P
                    PeterJones @PeterJones
                    last edited by Oct 17, 2020, 7:55 PM

                    @PeterJones said in What is BUFFERENCODING.COOKIE ??:

                    I might have a bug in my Perl library

                    I did. It’s been fixed. :-)

                    • A
                      Alan Kilborn @PeterJones
                      last edited by Dec 21, 2021, 4:05 PM

                      @alan-kilborn said in What is BUFFERENCODING.COOKIE ??:

                      How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

                      @peterjones said in What is BUFFERENCODING.COOKIE ??:

                      I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either. …maybe @Ekopalypse has insight on this one.

                      @ekopalypse said in What is BUFFERENCODING.COOKIE ??:

                      Sorry, but I have no idea.


                      So…it’s a bit cheesy, but one can read the status bar field that shows the encoding type to get this information (as its text string).

                      We’ve discussed before in this forum how to get the status bar data using PythonScript, but I’ve revamped my code for doing it, so I’ll post the whole thing here.

                      Here’s NotepadGetStatusBar.py with a demo at the end to show the current file’s encoding:

                      # -*- coding: utf-8 -*-
                      from __future__ import print_function
                      
                      from Npp import *
                      import ctypes
                      from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT
                      
                      class NGSB(object):  # implements a "getStatusBar" function (complement to notepad.setStatusBar())
                      
                          def __init__(self):
                              self.SendMessageW = ctypes.windll.user32.SendMessageW
                              LRESULT = LPARAM
                              self.SendMessageW.restype = LRESULT
                              self.SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
                              self.create_unicode_buffer = ctypes.create_unicode_buffer
                              self.curr_class_256 = self.create_unicode_buffer(256)
                              self.STATUSBAR_HANDLE = None
                              self._determine_statusbar_handle()
                              assert self.STATUSBAR_HANDLE
                      
                          def get_statusbar_by_section(self, statusbar_item_number):
                              # statusbar_item_number can be integer 0 thru 5 or one of:
                              #  STATUSBARSECTION.DOCTYPE
                              #  STATUSBARSECTION.DOCSIZE
                              #  STATUSBARSECTION.CURPOS
                              #  STATUSBARSECTION.EOFFORMAT
                              #  STATUSBARSECTION.UNICODETYPE
                              #  STATUSBARSECTION.TYPINGMODE
                              return self._get_statusbar_section(statusbar_item_number)
                      
                          def get_statusbar_as_tuple(self):
                              section_list = [
                                  STATUSBARSECTION.DOCTYPE,
                                  STATUSBARSECTION.DOCSIZE,
                                  STATUSBARSECTION.CURPOS,
                                  STATUSBARSECTION.EOFFORMAT,
                                  STATUSBARSECTION.UNICODETYPE,
                                  STATUSBARSECTION.TYPINGMODE,
                              ]
                              return tuple(self._get_statusbar_section(x) for x in section_list)
                      
                          def _determine_statusbar_handle(self):
                      
                              WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
                              FindWindowW = ctypes.windll.user32.FindWindowW
                              FindWindowExW = ctypes.windll.user32.FindWindowExW
                              EnumChildWindows = ctypes.windll.user32.EnumChildWindows
                              GetClassNameW = ctypes.windll.user32.GetClassNameW
                      
                              self.STATUSBAR_HANDLE = None
                      
                              def enum_callback_fn(hwnd, lparam):
                                  GetClassNameW(hwnd, self.curr_class_256, 256)
                                  if self.curr_class_256.value.lower() == "msctls_statusbar32":
                                      self.STATUSBAR_HANDLE = hwnd
                                      return False  # stop the enumeration
                                  return True  # continue the enumeration
                      
                              npp_hwnd = FindWindowW(u"Notepad++", None)
                              #print('npp_hwnd:', npp_hwnd)
                              EnumChildWindows(npp_hwnd, WNDENUMPROC(enum_callback_fn), 0)
                              #print('self.STATUSBAR_HANDLE:', self.STATUSBAR_HANDLE)
                      
                          def _get_statusbar_section(self, statusbar_item_number):
                              assert statusbar_item_number <= 5
                              WM_USER = 0x400
                              SB_GETTEXTLENGTHW = WM_USER + 12
                              SB_GETTEXTW = WM_USER + 13
                              SBT_OWNERDRAW = 0x1000
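                              # SB_GETTEXTLENGTHW returns the text length in the low word and the drawing type in the high word
                              retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
                              length = retcode & 0xFFFF
                              section_type = (retcode >> 16) & 0xFFFF  # renamed so it no longer shadows the built-in type()
                              assert (section_type != SBT_OWNERDRAW)
                              # SB_GETTEXTW writes a null-terminated string, so allocate one extra character for the terminator
                              text_buffer = self.create_unicode_buffer(length + 1)
                              self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
                              retval = '{}'.format(text_buffer[:length])
                              return retval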
                      
                      if __name__ == '__main__':
                          print(NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE))
                      

                      Running the script will print the current file’s encoding (from the status bar information) to the PythonScript console window.
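To answer the earlier question directly, the status bar text can then be compared against the expected label. The following is a minimal sketch, assuming the status bar shows the same labels as the Encoding menu (for example "UTF-8" and "UTF-8-BOM"); that assumption should be checked against your own Notepad++ version and localization. The helper name `is_utf8_without_bom` is hypothetical.

```python
def is_utf8_without_bom(status_text):
    """Return True if the status bar encoding text denotes UTF-8 without BOM.

    Assumes the status bar uses the Encoding menu labels, so plain "UTF-8"
    means no BOM and "UTF-8-BOM" means a BOM is present.
    """
    return status_text.strip().upper() == "UTF-8"


# Intended use inside Notepad++ with the NGSB class above (not run here):
#   text = NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE)
#   print(is_utf8_without_bom(text))
```

Because the comparison is on the whole label rather than a prefix, a character set such as ISO 8859-1 and the UTF-8-BOM label are both rejected.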
