"Summary" feature improvement
-
Hello, @alan-kilborn and All,
Alan, first of all, I could have told you that I didn’t want to bother modifying your script and that I’d integrated it as is. But the truth is that I’m still a long way from understanding your script and seeing any possible simplifications :-((
So, here is my final version of this Python script, which can be used instead of the View > Summary feature. It contains the @alan-kilborn section, which reads the right part of the status bar, relative to the current encoding.
I have to split this script into two consecutive posts !
# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.4 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime                # datetime is needed by the 'SUMMARY on ...' title line

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length)
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )
See next post for continuation !
-
Hi @alan-kilborn and All,
Continuation of my script :
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'
if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'
if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF8-BOM'
if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UCS-2 BE BOM'
if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UCS-2 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'
if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'
if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)
if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)
Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n]', number)
Total_Standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  # By default
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)
Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_Standard + Total_EOL

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = Total_EOL + Total_1_byte  # Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  # Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3
if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)
Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)
Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)
Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count < 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'
    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'
    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'
    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)
line_list.append (' ' * ((Line_title - 37) // 2) + 'SUMMARY on ' + str(datetime.datetime.now()))  # '//' keeps an integer count in Python 2 and 3
line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path : ' + File_name + '\n')

if os.path.isfile(File_name) == True:
    line_list.append(' CREATION Date : ' + Creation_date)
    line_list.append(' MODIFICATION Date : ' + Modif_date + '\n')
    line_list.append(' READ-ONLY flag : ' + RO_flag )

line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n')

line_list.append (' Current VIEW : ' + Curr_view + '\n')
line_list.append (' Current ENCODING : ' + Curr_encoding + '\n')
line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n')
line_list.append (' Current Line END : ' + Curr_eol + '\n')
line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n')
line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n')
line_list.append (' CHARS w/o CR & LF : ' + str(Total_Standard))
line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n')
line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n')

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' * 1 + ' + str(Total_1_byte) + ' * 1b + '\
                      + str(Total_2_bytes) + ' * 2b + ' + str(Total_3_bytes) + ' * 3b + ' + str(Total_4_bytes) + ' * 4b)')

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 2b)')

if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 1b)')

line_list.append (' Byte Order Mark : ' + str(BOM) + '\n')
line_list.append (' BUFFER Length : ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n')
line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)\n')
line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines : ' + str(Empty_lines))
line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n')
line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n')
line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

notepad.new()

editor.paste()

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UCS-2 BE BOM' and St_bar != 'UCS-2 LE BOM':
    if Curr_encoding == 'UTF-8':  # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
-
Hi, Alan and All,
( Continuation of the previous post )
Now, I’ve come across a problem with the encodings !
Have you ever noticed that, when you decide to re-interpret the present encoding of a file with the Encoding > Character Set > ... feature, there are two possible scenarios ?
- A) The present encoding is a Unicode encoding with a BOM ( Byte Order Mark ). So, either the UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM encoding
- B) The present encoding is ANSI or UTF-8, so without a BOM
In the first case, whatever the new encoding chosen ( one-byte or two-byte encoding ), the file contents do not change and my script just respects the real encoding of the current file.
For example, with a UCS-2 LE BOM encoded file, if I change its encoding with Encoding > Character Set > Western European > OEM-US, my new summary just considers that it’s still a true UCS-2 LE BOM encoded file, leading to a correct summary report !
In the second case, the new encoding chosen does modify the current file contents in the editor window. In addition, the current file is then automatically assumed to be a UTF-8 encoded file, leading to erroneous results in the summary report :-( However, the current file contents, saved on disk, still seem unchanged !!
For instance :
- Open a new tab
- Use the Encoding > Convert to UTF-8 feature, if necessary
- Enter the four chars Aé☀𝜜, without any line-break at the end
- Save this file as Test-UTF8.txt
- Using my script, you get, in a new tab :
---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:50:23.656000
---------------------------------------------------------------------------------------------

 FULL File Path     :  D:\@@\792\Test-UTF8.txt

 CREATION Date      :  Mon Feb 5 16:45:24 2024
 MODIFICATION Date  :  Mon Feb 5 15:17:02 2024

 READ-ONLY flag     :  NO
 READ-ONLY editor   :  NO


 Current VIEW       :  MAIN View

 Current ENCODING   :  UTF-8

 Current LANGUAGE   :  TXT (Normal text file)

 Current Line END   :  Windows (CR LF)

 Current WRAPPING   :  YES


 1-BYTE Chars       :  1
 2-BYTES Chars      :  1
 3-BYTES Chars      :  1

 Sum BMP Chars      :  3
 4-BYTES Chars      :  1

 CHARS w/o CR & LF  :  4
 EOL ( CR or LF )   :  0

 TOTAL characters   :  4


 BYTES Length       :  10 (0 * 1 + 1 * 1b + 1 * 2b + 1 * 3b + 1 * 4b)
 Byte Order Mark    :  0

 BUFFER Length      :  10
 Length on DISK     :  10


 NON-Blank Chars    :  4

 WORDS Count        :  1 (Caution !)

 NON-SPACE Count    :  1


 True EMPTY lines   :  0
 True BLANK lines   :  0

 EMPTY/BLANK lines  :  0

 NON-BLANK lines    :  1
 TOTAL Lines        :  1


 SELECTION(S)       :  0 selected char (0 selected byte) in 1 EMPTY range
Everything is OK ( BUFFER Length and Length on DISK are identical and the BYTES Length description shows one char for each number of bytes, without any EOL )
- Now, switch back to the Test-UTF8.txt file
- Run the Encoding > Character Set > Western European > OEM-US feature
- Re-run my script. This time, in another new tab, you get :
---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:51:16.937000
---------------------------------------------------------------------------------------------

 FULL File Path     :  D:\@@\792\Test-UTF8.txt

 CREATION Date      :  Mon Feb 5 16:45:24 2024
 MODIFICATION Date  :  Mon Feb 5 15:17:02 2024

 READ-ONLY flag     :  NO
 READ-ONLY editor   :  NO


 Current VIEW       :  MAIN View

 Current ENCODING   :  UTF-8

 Current LANGUAGE   :  TXT (Normal text file)

 Current Line END   :  Windows (CR LF)

 Current WRAPPING   :  YES


 1-BYTE Chars       :  1
 2-BYTES Chars      :  6
 3-BYTES Chars      :  3

 Sum BMP Chars      :  10
 4-BYTES Chars      :  0

 CHARS w/o CR & LF  :  10
 EOL ( CR or LF )   :  0

 TOTAL characters   :  10


 BYTES Length       :  22 (0 * 1 + 1 * 1b + 6 * 2b + 3 * 3b + 0 * 4b)
 Byte Order Mark    :  0

 BUFFER Length      :  22
 Length on DISK     :  10


 NON-Blank Chars    :  10

 WORDS Count        :  2 (Caution !)

 NON-SPACE Count    :  1


 True EMPTY lines   :  0
 True BLANK lines   :  0

 EMPTY/BLANK lines  :  0

 NON-BLANK lines    :  1
 TOTAL Lines        :  1


 SELECTION(S)       :  0 selected char (0 selected byte) in 1 EMPTY range
And, at the same time, a prompt displays this warning :
CURRENT file re-interpreted as OEM-US => Possible ERRONEOUS results
So, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script
Indeed, this time, as the file contents on disk are unchanged, the Length on DISK is still correct but the BUFFER Length is wrong, due to the re-interpretation of the characters by the OEM-US encoding. That’s why I preferred to add this warning at the end of the script !
Now, do as it is said :
- Close the Test-UTF8.txt file ( Ctrl + W )
- Restore it ( Ctrl + Shift + T )
- Again, you get the UTF-8 indication, for the Test-UTF8.txt file, at the right of the status bar
- Re-run my script
=> This time, we get a correct summary again, without any prompt !
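The figures of the second report ( 1 one-byte, 6 two-byte and 3 three-byte chars, so 22 bytes ) can be reproduced outside N++. Here is a minimal Python 3 sketch, assuming that OEM-US corresponds to code page 437 ( the `cp437` codec ) and that the re-interpretation simply decodes the 10 bytes saved on disk with that code page. The name `utf8_histogram` is a hypothetical helper and not part of the script above :

```python
# -*- coding: utf-8 -*-
from collections import Counter

text = u"A\u00e9\u2600\U0001D71C"      # the four chars  Aé☀𝜜
raw = text.encode("utf-8")             # the bytes as saved on disk


def utf8_histogram(s):
    """Count the characters of s by their UTF-8 encoded length (1 to 4 bytes)."""
    counts = Counter(len(ch.encode("utf-8")) for ch in s)
    return {n: counts.get(n, 0) for n in (1, 2, 3, 4)}


# ASSUMPTION : 'OEM-US' is code page 437 and each byte on disk becomes one char
reinterpreted = raw.decode("cp437")

original_hist = utf8_histogram(text)
reinterp_hist = utf8_histogram(reinterpreted)
buffer_len = len(reinterpreted.encode("utf-8"))

print(len(raw), original_hist)         # length on disk and per-length counts
print(buffer_len, reinterp_hist)       # buffer length after re-interpretation
```

Under these assumptions, each of the 10 bytes on disk becomes one character of the re-interpreted buffer, which would explain why the character count jumps from 4 to 10 while the length on disk stays at 10.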
Alan or other Python gurus, feel free to improve this last version and/or to test, on various files, whether all the numbers shown are coherent !
Best Regards,
guy038
-
-
Hi All,
I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…
So, I’m first going to update my last portable version, on my W10 laptop, from v8.5.4 to v8.6.2. Then I will update my script and redo all the tests.
See you later !
BR
guy038
-
@guy038 said in Improved version of the "Summary" feature, ...:
I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…
:-(
You ought to close out these ancient versions…permanently.
-
Hello, @alan-kilborn and All,
I’ve just discovered that, since the v8.0 N++ version, the UCS-2 BE BOM and UCS-2 LE BOM encodings are able to handle all the characters over the BMP. Thus, these encodings were renamed, respectively, as UTF-16 BE BOM and UTF-16 LE BOM !
Note that, with these two encodings, each character with code > \x{FFFF} is built with the surrogate pair mechanism, so with two 16-bit code units. Consequently, the total number of bytes in the buffer = 2 ( BOM ) + number of chars <= \x{FFFF} x 2 + number of chars > \x{FFFF} x 4
For example, the simple string Aé☀𝜜, without any EOL, in a UTF-16 BE BOM encoded file, is coded with 12 bytes as :
FE FF   00 41   00 E9   26 00   D8 35 DF 1C
-----   -----   -----   -----   -----------
 BOM      A       é       ☀          𝜜
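The byte formula and the 12-byte example can be cross-checked with plain Python 3, outside N++. This is only a sketch : `utf16_buffer_length` is a hypothetical helper name, and the standard `utf-16-be` codec plus `codecs.BOM_UTF16_BE` are used here as a stand-in for what N++ writes in a UTF-16 BE BOM file :

```python
# -*- coding: utf-8 -*-
import codecs

text = u"A\u00e9\u2600\U0001D71C"      # the four chars  Aé☀𝜜


def utf16_buffer_length(s):
    """2 bytes of BOM + 2 bytes per BMP char + 4 bytes per char over the BMP."""
    bmp = sum(1 for ch in s if ord(ch) <= 0xFFFF)
    over_bmp = len(s) - bmp            # chars needing a surrogate pair
    return 2 + 2 * bmp + 4 * over_bmp


# Stand-in for the file contents : BOM followed by the big-endian code units
encoded = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
hex_dump = " ".join("{:02X}".format(b) for b in bytearray(encoded))

print(hex_dump)
print(utf16_buffer_length(text), len(encoded))
```

The helper counts code points, so it relies on Python 3 strings, where a character over the BMP is a single element of the string.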
So, here is my final and updated version of the script, which works in all versions since v8.0 !

# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.5 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length)
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )
See next post for continuation !
-
Hi, @alan-kilborn and All,
Continuation of the script :
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'
if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'
if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'
if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'
if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'
if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'
if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)
if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  # ALL BMP vchars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)
Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n]', number)
Total_standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  # By default
if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)
Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_EOL + Total_standard

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  # Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3
if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)
Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)
Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'((?!\s).)+', number)
else:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count < 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'
    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'
    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'
    if Num_sel > 
1 and Bytes_count > 0: Txt_ranges = ' ranges (EMPTY or NOT)\n' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- line_list = [] # empty list line_list.append ('-' * Line_title) line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now())) line_list.append ('-' * Line_title +'\n') line_list.append (' FULL File Path : ' + File_name + '\n') if os.path.isfile(File_name) == True: line_list.append(' CREATION Date : ' + Creation_date) line_list.append(' MODIFICATION Date : ' + Modif_date + '\n') line_list.append(' READ-ONLY flag : ' + RO_flag ) line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n') line_list.append (' Current VIEW : ' + Curr_view + '\n') line_list.append (' Current ENCODING : ' + Curr_encoding + '\n') line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n') line_list.append (' Current Line END : ' + Curr_eol + '\n') line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n') line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte)) line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes)) line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n') line_list.append (' Sum BMP Chars : ' + str(Total_BMP)) line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n') line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard)) line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n') line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n') if Curr_encoding == 'ANSI': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)') if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\ + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + 
' + str(Total_4_bytes) + ' x 4b)') if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)') line_list.append (' Byte Order Mark : ' + str(BOM) + '\n') line_list.append (' BUFFER Length : ' + str(Buffer_length)) if os.path.isfile(File_name) == True: line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n') else: line_list.append ('\n') line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n') line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)\n') line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n') line_list.append (' True EMPTY lines : ' + str(Empty_lines)) line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n') line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n') line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines)) line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n') line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges) editor.copyText ('\r\n'.join(line_list)) notepad.new() editor.paste() editor.copyText('') if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM': if Curr_encoding == 'UTF-8': # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \ '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '') # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
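The byte-length arithmetic used in the script above can be cross-checked outside Notepad++. The following is only a minimal sketch, assuming plain Python 3 (where a string is iterated one code point at a time); `utf8_width` and `byte_lengths` are hypothetical helper names, not part of the script. For simplicity, the CR and LF characters are counted here as ordinary 1-byte (UTF-8) or 2-byte (UTF-16) characters instead of being tracked separately as Total_EOL, and no BOM is included.

```python
# -*- coding: utf-8 -*-
# Sketch only: plain Python 3, no Notepad++ / PythonScript objects involved.


def utf8_width(ch):
    """Return how many bytes one code point needs in UTF-8."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def byte_lengths(text):
    """Return (utf8_bytes, utf16_bytes) for text, both WITHOUT a BOM."""
    widths = [utf8_width(ch) for ch in text]
    total_4_bytes = widths.count(4)            # characters over \x{FFFF}
    total_bmp = len(widths) - total_4_bytes    # 1-, 2- and 3-byte characters
    utf8_bytes = sum(widths)
    utf16_bytes = 2 * total_bmp + 4 * total_4_bytes
    return utf8_bytes, utf16_bytes


# Same test characters as in the last comment line of the script, plus CR LF
sample = "Aé☀𝜜\r\n"
utf8_bytes, utf16_bytes = byte_lengths(sample)

# Compare with what Python's own encoders produce
print(utf8_bytes, len(sample.encode("utf-8")))
print(utf16_bytes, len(sample.encode("utf-16-le")))
```

Adding 3 bytes (UTF-8-BOM) or 2 bytes (UTF-16 BE/LE BOM) to these figures mirrors the Buffer_length = Bytes_length + BOM step of the script.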
If you’re still working or running tests with an N++ version prior to v8.0:
-
First, change any sub-string UTF-16 to UCS-2 in the Python script
-
And, of course, do not forget to get rid of any character over \x{FFFF} in your UCS-2 BE/LE BOM encoded files before using this script
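To locate the characters over \x{FFFF} before removing them, something like the following could help. It is only a sketch, assuming plain Python 3 run on the text of the file outside Notepad++; `find_non_bmp` is a hypothetical helper name, not something provided by the script or by PythonScript.

```python
# -*- coding: utf-8 -*-
# Sketch only: plain Python 3, where each item of a str is one code point.


def find_non_bmp(text):
    """Return (index, character) pairs for every code point above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


sample = "Aé☀𝜜"
hits = find_non_bmp(sample)

for index, ch in hits:
    # Report the position and the code point in the usual U+XXXX notation
    print(index, "U+%04X" % ord(ch))
```

An empty result means that every character fits in a single 16-bit unit, which is the condition stated above for UCS-2 BE/LE BOM files.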
Note that the encoding problem described two posts ago, which occurs when a file without a BOM is re-interpreted with an Encoding > Character Set > ... encoding, still remains. Thus, the warning prompt is still present at the end of this final version!
Now, I’m going to update an old post where I explained the poor performance of the present summary feature. I’ll take the opportunity to include the instructions for understanding this improved script!
Best Regards,
guy038
-
-
You have this line in your script:
line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
I would suggest changing it to:
line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
This is because, without the int, under Python3 we see the following error:
TypeError: can't multiply sequence by non-int of type 'float'
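As a small illustration of that error and of a possible alternative, floor division with // yields an integer under both Python 2 and Python 3, so it can stand in for the int(... / 2) form. This is only a sketch; the value given to Line_title here is an assumption for the sake of the example, since its real value is defined elsewhere in the script.

```python
# Sketch only: Line_title = 95 is an assumed value, for illustration.
Line_title = 95

try:
    # Under Python 3, '/' returns a float, so the repetition fails
    ' ' * ((Line_title - 37) / 2)
    float_pad_failed = False
except TypeError:
    float_pad_failed = True

# Floor division gives an int under both Python 2 and Python 3
padding = ' ' * ((Line_title - 37) // 2)

print(float_pad_failed, len(padding))
```

For the non-negative widths involved here, // and int(... / 2) give the same padding.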
-
Hi, @alan-kilborn and All,
Just follow this link to find out why I decided to improve the View > Summary feature and to get the latest version of the Python script, which gives us a decent and exact Summary feature!
https://community.notepad-plus-plus.org/post/92794 ( 4 posts )
BR
guy038
-
@guy038 said:
Just follow this link
I’m MIGHTY confused as to why you felt the need to reanimate a several-years-old topic/thread to continue discussing what you dedicated this current thread to…
Why not just keep talking here?
-
Hello, @alan-kilborn,
Sorry to have confused you. I’ll try to explain why I wanted to continue on the other thread!
-
Firstly, I wanted to show where my script came from and why: the whole logic of the View > Summary feature needed to be completely rebuilt :-((
-
Secondly, I wanted to update these old posts. Indeed, at that time, the v7.9.1 N++ version had just been released. So, I recently ran some tests to check whether, following the encoding improvements of the v8.0 version, the global logic of the summary had been improved. Unfortunately, the View > Summary feature still gives wrong results, especially when the present file is a UTF-16 BE BOM or UTF-16 LE BOM encoded file :-((
-
Thus, it seemed obvious to me to continue on that older thread and add the successive versions of my script!
Now, I realize that I could have stayed with this new thread and put a link to my initial post, to help people understand the reasons for this Python script!
So, unless you’re terribly upset by my decision ( undoing it would need a lot of modifications ), I suppose I’ll go on posting any new versions of my script on the other thread!
To make things clearer, I could simply rename this present thread as Summary feature improvement and rename the other thread as Emulation of the "Summary" feature with Python script
Alan, what do you think?
Best Regards,
guy038
-
-
@guy038 said in Improved version of the "Summary" feature, ...:
what do you think?
I wouldn’t bother trying to rename things at this point.
It’s no problem simply because I was confused (that’s MY problem). :-)
Carry on… :-)