What is BUFFERENCODING.COOKIE ??

Alan Kilborn

I’m aware of what is referred to as “baking cookies” on this site, but I’ve encountered a different type of cookie and I’d like to know where it comes from and what it means.

In a PythonScript, when I run these commands with a UTF-8 encoded file as the active one, I get the results shown:

>>> notepad.getEncoding()
Npp.BUFFERENCODING.COOKIE
>>> notepad.getEncoding(notepad.getCurrentBufferID())
Npp.BUFFERENCODING.COOKIE

I kinda expected to get BUFFERENCODING.UTF8 back instead.
Anyone have ideas of the whys and the hows on this?

PeterJones

@Alan-Kilborn ,

I don’t know what it’s meant for – I don’t see it in the list of IDM_FORMAT_xxx constants, which are what I thought the NPPM_GETBUFFERENCODING message was supposed to return. Hmm, I might have a bug in my Perl library, because it appears the values actually come from the enum UniMode, which according to the source code is:

enum UniMode {uni8Bit=0, uniUTF8=1, uni16BE=2, uni16LE=3, uniCookie=4, uni7Bit=5, uni16BE_NoBOM=6, uni16LE_NoBOM=7, uniEnd};

Running a quick experiment, if I set Encoding > UTF-8, notepad.getEncoding() shows Npp.BUFFERENCODING.COOKIE, but if I set Encoding > UTF-8-BOM, it returns Npp.BUFFERENCODING.UTF8.
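For reference, the UniMode values quoted above can be mirrored in a small lookup table. This is only a sketch: the names UNI_MODE_NAMES and uni_mode_name are made up for illustration, and it assumes the raw integer really is a UniMode value as the quoted enum suggests.

```python
# Mirror of the UniMode enum quoted above (values copied from that quote;
# uniEnd is omitted because it only marks the end of the enum).
UNI_MODE_NAMES = {
    0: "uni8Bit",
    1: "uniUTF8",
    2: "uni16BE",
    3: "uni16LE",
    4: "uniCookie",
    5: "uni7Bit",
    6: "uni16BE_NoBOM",
    7: "uni16LE_NoBOM",
}


def uni_mode_name(value):
    """Return the UniMode name for a raw integer, or a placeholder if unknown."""
    value = int(value)
    return UNI_MODE_NAMES.get(value, "unknown({})".format(value))
```

With this table, the experiment above reads as: Encoding > UTF-8 gives value 4 (uniCookie), while Encoding > UTF-8-BOM gives value 1 (uniUTF8).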

PeterJones

I ran more experiments:

Encoding >       | notepad.getEncoding() retval
-----------------+-------------------------------
ANSI             | Npp.BUFFERENCODING.ENC8BIT
UTF-8            | Npp.BUFFERENCODING.COOKIE
UTF-8-BOM        | Npp.BUFFERENCODING.UTF8
UCS-2 BE BOM     | Npp.BUFFERENCODING.UCS2BE
UCS-2 LE BOM     | Npp.BUFFERENCODING.UCS2LE
CharacterSets>*  | Npp.BUFFERENCODING.COOKIE

I didn’t actually try all the character sets, but I tried a few, and all the ones I tried gave me COOKIE.
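To see which return values cannot be mapped back to a single Encoding menu entry, the table can be grouped by result. This is a minimal sketch using only the observations listed above; OBSERVED and ambiguous_results are illustrative names, and "CharacterSets>*" stands in for the handful of character sets that were actually tried.

```python
from collections import defaultdict

# (Encoding menu entry, name of the BUFFERENCODING value that came back)
OBSERVED = [
    ("ANSI", "ENC8BIT"),
    ("UTF-8", "COOKIE"),
    ("UTF-8-BOM", "UTF8"),
    ("UCS-2 BE BOM", "UCS2BE"),
    ("UCS-2 LE BOM", "UCS2LE"),
    ("CharacterSets>*", "COOKIE"),
]


def ambiguous_results(observations):
    """Return {result: [menu entries]} for results shared by several entries."""
    by_result = defaultdict(list)
    for menu_entry, result in observations:
        by_result[result].append(menu_entry)
    return {result: entries for result, entries in by_result.items()
            if len(entries) > 1}
```

Calling ambiguous_results(OBSERVED) leaves only the COOKIE entry, shared by UTF-8 and the character sets.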

PeterJones

I tried digging into the GitHub blame history, and as far as I can tell, for at least the last 12 years, uniCookie has been mapped to IDM_FORMAT_AS_UTF_8 (as opposed to uniUTF8 => IDM_FORMAT_UTF_8).

I cannot find anything in the commit comments that explains why one of them was called “cookie”…

Alan Kilborn

So I guess it is fine, if a bit strange, that someone wants to call UTF-8 without BOM “cookie” encoding in their software…

But, paraphrasing the above table:

Encoding >       | notepad.getEncoding() retval
-----------------+-------------------------------
UTF-8            | Npp.BUFFERENCODING.COOKIE
CharacterSets>*  | Npp.BUFFERENCODING.COOKIE

How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

PeterJones

@Alan-Kilborn ,

I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either.

But I’m often bad at finding the things in PythonScript that others find easily; maybe @Ekopalypse has insight on this one.

Ekopalypse

@PeterJones

Sorry, but I have no idea.

PeterJones

@PeterJones said in What is BUFFERENCODING.COOKIE ??:

I might have a bug in my Perl library

I did. It’s been fixed. :-)

Alan Kilborn

@alan-kilborn said in What is BUFFERENCODING.COOKIE ??:

How is one to tell, in PythonScript, whether the current file’s encoding is UTF-8 without BOM or, for example, ISO-8859-1?

@peterjones said in What is BUFFERENCODING.COOKIE ??:

I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either. …maybe @Ekopalypse has insight on this one.

@ekopalypse said in What is BUFFERENCODING.COOKIE ??:

Sorry, but I have no idea.

So…it’s a bit cheesy, but one can read the status bar field that shows the encoding type to get this information (as its text string).

We’ve discussed in this forum before how to get the status bar data using a PythonScript, but I’ve revamped my code for doing it, so I’ll post the whole thing here.

Here’s NotepadGetStatusBar.py with a demo at the end to show the current file’s encoding:

# -*- coding: utf-8 -*-
from __future__ import print_function

from Npp import *
import ctypes
from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

class NGSB(object):  # implements a "getStatusBar" function (complement to notepad.setStatusBar())

    def __init__(self):
        self.SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        self.SendMessageW.restype = LRESULT
        self.SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        self.create_unicode_buffer = ctypes.create_unicode_buffer
        self.curr_class_256 = self.create_unicode_buffer(256)
        self.STATUSBAR_HANDLE = None
        self._determine_statusbar_handle()
        assert self.STATUSBAR_HANDLE

    def get_statusbar_by_section(self, statusbar_item_number):
        # statusbar_item_number can be integer 0 thru 5 or one of:
        #  STATUSBARSECTION.DOCTYPE
        #  STATUSBARSECTION.DOCSIZE
        #  STATUSBARSECTION.CURPOS
        #  STATUSBARSECTION.EOFFORMAT
        #  STATUSBARSECTION.UNICODETYPE
        #  STATUSBARSECTION.TYPINGMODE
        return self._get_statusbar_section(statusbar_item_number)

    def get_statusbar_as_tuple(self):
        section_list = [
            STATUSBARSECTION.DOCTYPE,
            STATUSBARSECTION.DOCSIZE,
            STATUSBARSECTION.CURPOS,
            STATUSBARSECTION.EOFFORMAT,
            STATUSBARSECTION.UNICODETYPE,
            STATUSBARSECTION.TYPINGMODE,
        ]
        return tuple(self._get_statusbar_section(x) for x in section_list)

    def _determine_statusbar_handle(self):

        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        FindWindowW.restype = HWND  # without this, ctypes treats the handle as a C int
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        EnumChildWindows.argtypes = [HWND, WNDENUMPROC, LPARAM]
        GetClassNameW = ctypes.windll.user32.GetClassNameW

        self.STATUSBAR_HANDLE = None

        def enum_callback_fn(hwnd, lparam):
            GetClassNameW(hwnd, self.curr_class_256, 256)
            if self.curr_class_256.value.lower() == "msctls_statusbar32":
                self.STATUSBAR_HANDLE = hwnd
                return False  # stop the enumeration
            return True  # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        #print('npp_hwnd:', npp_hwnd)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(enum_callback_fn), 0)
        #print('self.STATUSBAR_HANDLE:', self.STATUSBAR_HANDLE)

    def _get_statusbar_section(self, statusbar_item_number):
        assert statusbar_item_number <= 5
        WM_USER = 0x400
        SB_GETTEXTLENGTHW = WM_USER + 12
        SB_GETTEXTW = WM_USER + 13
        SBT_OWNERDRAW = 0x1000
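        retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF  # low word: text length in characters
        section_type = (retcode >> 16) & 0xFFFF  # high word: how the section is drawn
        assert section_type != SBT_OWNERDRAW  # owner-drawn sections have no text to read
        # SB_GETTEXTW also writes a terminating null, so allocate one extra character
        text_buffer = self.create_unicode_buffer(length + 1)
        self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval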

if __name__ == '__main__':
    print(NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE))

Running the script prints the current file’s encoding (taken from the status bar) to the PythonScript console window.
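Once that status bar string is available, the original question (UTF-8 without BOM versus a character set such as ISO-8859-1) could be answered with a small check like the one below. Treat it as a rough sketch: looks_like_utf8_without_bom is a made-up helper, and it assumes the status bar shows the plain text "UTF-8" for UTF-8 without BOM, matching the Encoding menu name used in this thread. Other Notepad++ versions or translations may show different text.

```python
def looks_like_utf8_without_bom(is_cookie, statusbar_text):
    """Guess whether a COOKIE buffer is UTF-8 without BOM.

    is_cookie      -- True if notepad.getEncoding() returned BUFFERENCODING.COOKIE
    statusbar_text -- text of the encoding section of the status bar
    """
    if not is_cookie:
        return False
    # Assumption: UTF-8 without BOM is displayed exactly as "UTF-8";
    # "UTF-8-BOM" and character-set names do not match.
    return statusbar_text.strip().upper() == "UTF-8"


# Inside PythonScript (not runnable outside Notepad++), usage might look like:
#   is_cookie = notepad.getEncoding() == BUFFERENCODING.COOKIE
#   text = NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE)
#   print(looks_like_utf8_without_bom(is_cookie, text))
```

The function is kept free of any Npp calls so it can be tried outside Notepad++; only the commented usage lines depend on PythonScript and on the NGSB class above.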