What is BUFFERENCODING.COOKIE ??
-
I’m aware of what is referred to as “baking cookies” on this site, but I’ve encountered a different type of cookie and I’d like to know where it comes from and what it means.
In a Pythonscript, when I run these commands with a UTF-8 encoded file as the active one, I get the result indicated:
>>> notepad.getEncoding()
Npp.BUFFERENCODING.COOKIE
>>> notepad.getEncoding(notepad.getCurrentBufferID())
Npp.BUFFERENCODING.COOKIE
I kinda expected to get
BUFFERENCODING.UTF8
back instead.
Anyone have ideas of the whys and the hows on this?
-
I don’t know what it’s meant for – I don’t see it in the list of IDM_FORMAT_xxx constants, which are what I thought the NPPM_GETBUFFERENCODING message was supposed to return. Hmm, I might have a bug in my Perl library, because it appears those values actually come from the enum UniMode, which, according to the source code, is
enum UniMode {uni8Bit=0, uniUTF8=1, uni16BE=2, uni16LE=3, uniCookie=4, uni7Bit=5, uni16BE_NoBOM=6, uni16LE_NoBOM=7, uniEnd};
Running a quick experiment: if I set Encoding > UTF-8, notepad.getEncoding() shows Npp.BUFFERENCODING.COOKIE, but if I set Encoding > UTF-8-BOM, it returns Npp.BUFFERENCODING.UTF8.
-
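For anyone wanting to turn a raw NPPM_GETBUFFERENCODING value back into a name, here is a minimal sketch that mirrors the UniMode enum quoted above. The names `UNI_MODE_NAMES` and `uni_mode_name` are made up for illustration; only the enum members and their numeric values come from the quoted source, and I'm assuming `uniEnd` is just an end-of-list sentinel.

```python
# Names of the UniMode members, in the order quoted from the Notepad++ source.
# The index of each name is its numeric value (uni8Bit=0 ... uni16LE_NoBOM=7).
# uniEnd (8) is assumed to be a sentinel, so it is deliberately left out.
UNI_MODE_NAMES = (
    "uni8Bit",        # 0
    "uniUTF8",        # 1
    "uni16BE",        # 2
    "uni16LE",        # 3
    "uniCookie",      # 4
    "uni7Bit",        # 5
    "uni16BE_NoBOM",  # 6
    "uni16LE_NoBOM",  # 7
)


def uni_mode_name(value):
    """Return the UniMode member name for a raw integer value (hypothetical helper)."""
    if 0 <= value < len(UNI_MODE_NAMES):
        return UNI_MODE_NAMES[value]
    raise ValueError("not a UniMode value: {!r}".format(value))
```

With this, `uni_mode_name(4)` gives `"uniCookie"`, which lines up with the experiment above: Encoding > UTF-8 reports the "cookie" member rather than `uniUTF8`.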
I ran more experiments
Encoding >       | notepad.getEncoding() retval
-----------------+-------------------------------
ANSI             | Npp.BUFFERENCODING.ENC8BIT
UTF-8            | Npp.BUFFERENCODING.COOKIE
UTF-8-BOM        | Npp.BUFFERENCODING.UTF8
UCS-2 BE BOM     | Npp.BUFFERENCODING.UCS2BE
UCS-2 LE BOM     | Npp.BUFFERENCODING.UCS2LE
CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
I didn’t actually try all the character sets, but I tried a few, and all the ones I tried gave me COOKIE.
-
I tried digging into the GitHub blame history, and as far as I can tell, for at least the last 12 years, he has mapped uniCookie to IDM_FORMAT_AS_UTF_8 (as opposed to uniUTF8 => IDM_FORMAT_UTF_8).
I cannot find a reason that he called one “cookie” in the commit comments…
-
So I guess it is fine, if a bit strange, that someone wants to call UTF-8 without BOM “cookie” encoding in their software…
But, paraphrasing the above table:
Encoding >       | notepad.getEncoding() retval
-----------------+-------------------------------
UTF-8            | Npp.BUFFERENCODING.COOKIE
CharacterSets>*  | Npp.BUFFERENCODING.COOKIE
How is one to tell the difference, in Pythonscript, if the current file’s encoding is UTF-8 without BOM, or for example, ISO-8859-1 ?
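To make the ambiguity concrete, here is a small sketch that groups the observations from the experiments table by return value. The names `OBSERVED`, `menu_entries_by_retval` and `ambiguous_retvals` are invented for this illustration; the data is just the table, with the `Npp.BUFFERENCODING.` prefix dropped.

```python
from collections import OrderedDict

# (Encoding menu entry, name returned by notepad.getEncoding()),
# copied from the experiments table in this thread.
OBSERVED = [
    ("ANSI", "ENC8BIT"),
    ("UTF-8", "COOKIE"),
    ("UTF-8-BOM", "UTF8"),
    ("UCS-2 BE BOM", "UCS2BE"),
    ("UCS-2 LE BOM", "UCS2LE"),
    ("CharacterSets>*", "COOKIE"),
]


def menu_entries_by_retval(observed):
    """Group menu entries by the value getEncoding() reported for them."""
    grouped = OrderedDict()
    for menu_entry, retval in observed:
        grouped.setdefault(retval, []).append(menu_entry)
    return grouped


def ambiguous_retvals(observed):
    """Keep only the return values shared by more than one menu entry."""
    grouped = menu_entries_by_retval(observed)
    return {retval: entries for retval, entries in grouped.items() if len(entries) > 1}
```

Only `COOKIE` shows up in `ambiguous_retvals(OBSERVED)`, which is exactly the problem: the return value alone cannot separate UTF-8 without BOM from the character sets.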
-
I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either.
But I’m often bad at finding the things in PythonScript that others find easily; maybe @Ekopalypse has insight on this one.
-
Sorry, but I have no idea.
-
@PeterJones said in What is BUFFERENCODING.COOKIE ??:
I might have a bug in my Perl library
I did. It’s been fixed. :-)
-
@alan-kilborn said in What is BUFFERENCODING.COOKIE ??:
How is one to tell the difference, in Pythonscript, if the current file’s encoding is UTF-8 without BOM, or for example, ISO-8859-1 ?
@peterjones said in What is BUFFERENCODING.COOKIE ??:
I thought there would be a way to read back the character set, distinct from the GETBUFFERENCODING message… but I cannot find a message that does that. Barring that, I thought that the scintilla object might have that info somewhere, but I cannot find it there, either. …maybe @Ekopalypse has insight on this one.
@ekopalypse said in What is BUFFERENCODING.COOKIE ??:
Sorry, but I have no idea.
So…it’s a bit cheesy, but one can read the status bar field that shows the encoding type to get this information (as its text string).
We’ve discussed in this forum how to get the status bar data before using a PythonScript, but I’ve revamped my code for doing it so I’ll post the whole thing here.
Here’s NotepadGetStatusBar.py with a demo at the end to show the current file’s encoding:

# -*- coding: utf-8 -*-
from __future__ import print_function

from Npp import *
import ctypes
from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT


class NGSB(object):
    # implements a "getStatusBar" function (complement to notepad.setStatusBar())

    def __init__(self):
        self.SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        self.SendMessageW.restype = LRESULT
        self.SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        self.create_unicode_buffer = ctypes.create_unicode_buffer
        self.curr_class_256 = self.create_unicode_buffer(256)
        self.STATUSBAR_HANDLE = None
        self._determine_statusbar_handle()
        assert self.STATUSBAR_HANDLE

    def get_statusbar_by_section(self, statusbar_item_number):
        # statusbar_item_number can be integer 0 thru 5 or one of:
        #  STATUSBARSECTION.DOCTYPE
        #  STATUSBARSECTION.DOCSIZE
        #  STATUSBARSECTION.CURPOS
        #  STATUSBARSECTION.EOFFORMAT
        #  STATUSBARSECTION.UNICODETYPE
        #  STATUSBARSECTION.TYPINGMODE
        return self._get_statusbar_section(statusbar_item_number)

    def get_statusbar_as_tuple(self):
        section_list = [
            STATUSBARSECTION.DOCTYPE,
            STATUSBARSECTION.DOCSIZE,
            STATUSBARSECTION.CURPOS,
            STATUSBARSECTION.EOFFORMAT,
            STATUSBARSECTION.UNICODETYPE,
            STATUSBARSECTION.TYPINGMODE,
        ]
        return tuple(list(map(lambda x: self._get_statusbar_section(x), section_list)))

    def _determine_statusbar_handle(self):
        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        GetClassNameW = ctypes.windll.user32.GetClassNameW
        self.STATUSBAR_HANDLE = None

        def enum_callback_fn(hwnd, lparam):
            GetClassNameW(hwnd, self.curr_class_256, 256)
            if self.curr_class_256.value.lower() == "msctls_statusbar32":
                self.STATUSBAR_HANDLE = hwnd
                return False  # stop the enumeration
            return True  # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        #print('npp_hwnd:', npp_hwnd)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(enum_callback_fn), 0)
        #print('self.STATUSBAR_HANDLE:', self.STATUSBAR_HANDLE)

    def _get_statusbar_section(self, statusbar_item_number):
        assert statusbar_item_number <= 5
        WM_USER = 0x400
        SB_GETTEXTLENGTHW = WM_USER + 12
        SB_GETTEXTW = WM_USER + 13
        SBT_OWNERDRAW = 0x1000
        retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF              # low word: text length in characters
        draw_type = (retcode >> 16) & 0xFFFF   # high word: drawing type of the section
        assert (draw_type != SBT_OWNERDRAW)
        # SB_GETTEXTW writes a null-terminated string, so leave room for the terminator
        text_buffer = self.create_unicode_buffer(length + 1)
        retcode = self.SendMessageW(self.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval


if __name__ == '__main__':
    print(NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE))
Running the script will print the current file’s encoding (from statusbar information) to the Pythonscript console window.
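As a rough sketch of how the two sources of information could be combined: use the status bar text only when getEncoding() reports the ambiguous COOKIE value, and otherwise trust the enum name. The helper `describe_encoding` is hypothetical, and the status bar strings in the examples are placeholders; I haven't verified the exact text Notepad++ shows for each character set.

```python
def describe_encoding(buffer_encoding_name, statusbar_text):
    """Return a label for the current encoding (hypothetical helper).

    buffer_encoding_name: the getEncoding() result as text, either the short
        form ("COOKIE") or the long form ("Npp.BUFFERENCODING.COOKIE").
    statusbar_text: the text of the encoding section of the status bar.
    """
    # Accept both the short and the fully qualified spelling of the enum name.
    short_name = buffer_encoding_name.rsplit(".", 1)[-1]
    if short_name == "COOKIE":
        # COOKIE covers UTF-8 without BOM *and* the character sets,
        # so fall back to what the status bar says.
        return statusbar_text.strip()
    return short_name
```

Inside PythonScript the call would presumably look like `describe_encoding(str(notepad.getEncoding()), NGSB().get_statusbar_by_section(STATUSBARSECTION.UNICODETYPE))`, using the class from the script above.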
-