Emulation of the "View > Summary" feature with a Python script
- 
 Hi, All,

 Remarks : Although most of the regexes, above, can be easily understood, here are some additional elements :

- The regex (?-s).[\x{D800}-\x{DFFF}] is the sole correct syntax, with our Boost regex engine, to count all the characters over the BMP. But it may fail with the message "Ran out of stack space trying to match the regular expression.". Luckily, I do not use it because this count can be deduced from the difference Total_standard - Total_BMP

- The regex (?s)((?!\s).[\x{D800}-\x{DFFF}]?)+, to count all the Non_Space strings, was explained before but may fail with the same message "Ran out of stack space trying to match the regular expression."

- In all the regexes, relative to the counting of lines, you probably noticed the character class [\f\x{0085}\x{2028}\x{2029}]. It must be present because each of the four characters \f, \x{0085}, \x{2028} and \x{2029} is considered as both a start and an end of line, like the assertions ^ and $ !

  - For instance, if, in a new file, you insert one Next_Line char ( NEL ), of code-point \x{0085}, and hit the Enter key, this sole line is wrongly seen as an empty line by the simple regex ^(?:\r\n|\r|\n), which matches the line-break after the Next_Line char !
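 The NEL case above can be reproduced outside Notepad++. The following is only a rough sketch, assuming Python 3: it does not call the Boost regex engine, the helper name count_empty_lines is made up for the illustration, and the start-of-line rule is re-implemented by hand from the description above. Python's own str.splitlines() happens to treat \f, \x85, \u2028 and \u2029 as line boundaries too, which gives a quick sanity check.

```python
import re

# Extra characters that, per the remark above, act as both start and end of line
SPECIAL = "\f\x85\u2028\u2029"

def count_empty_lines(text, exclude_special):
    """Count line breaks (CRLF, CR or LF) located at a start of line.

    A position counts as a start of line when it is the very beginning of
    the text or when it follows CR, LF or one of the SPECIAL characters.
    With exclude_special=True, breaks that directly follow a SPECIAL
    character are skipped, like the negative look-behind used in the script.
    """
    count = 0
    for match in re.finditer(r"\r\n|\r|\n", text):
        start = match.start()
        prev = text[start - 1] if start > 0 else None
        after_special = prev is not None and prev in SPECIAL
        at_line_start = prev is None or prev in "\r\n" or after_special
        if at_line_start and not (exclude_special and after_special):
            count += 1
    return count

sample = "\x85\r\n"   # one NEL char, then the Enter key (CR LF)
print(sample.splitlines())                                # ['', '']
print(count_empty_lines(sample, exclude_special=False))   # 1 (the wrong "empty line")
print(count_empty_lines(sample, exclude_special=True))    # 0
```

 With exclude_special=False the helper behaves like the simple regex and reports one empty line; with exclude_special=True it behaves like the look-behind version and reports none.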
 
 Here is the Python script, split over two posts :

    # encoding=utf-8

    #-------------------------------------------------------------------------
    #                    STATISTICS about the CURRENT file ( v0.6 )
    #-------------------------------------------------------------------------

    from __future__ import print_function    # for Python2 compatibility

    from Npp import *

    import re

    import os, time, datetime

    import ctypes

    from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    # From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    def npp_get_statusbar(statusbar_item_number):

        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        FindWindowExW = ctypes.windll.user32.FindWindowExW
        SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        SendMessageW.restype = LRESULT
        SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        GetClassNameW = ctypes.windll.user32.GetClassNameW
        create_unicode_buffer = ctypes.create_unicode_buffer

        SBT_OWNERDRAW = 0x1000
        WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

        npp_get_statusbar.STATUSBAR_HANDLE = None

        def get_result_from_statusbar(statusbar_item_number):
            assert statusbar_item_number <= 5
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
            length = retcode & 0xFFFF
            type = (retcode >> 16) & 0xFFFF
            assert (type != SBT_OWNERDRAW)
            text_buffer = create_unicode_buffer(length)
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
            retval = '{}'.format(text_buffer[:length])
            return retval

        def EnumCallback(hwnd, lparam):
            curr_class = create_unicode_buffer(256)
            GetClassNameW(hwnd, curr_class, 256)
            if curr_class.value.lower() == "msctls_statusbar32":
                npp_get_statusbar.STATUSBAR_HANDLE = hwnd
                return False    # stop the enumeration
            return True    # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
        if npp_get_statusbar.STATUSBAR_HANDLE:
            return get_result_from_statusbar(statusbar_item_number)
        assert False

    St_bar = npp_get_statusbar(4)    # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

 See next post for continuation !
- 
- 
 Continuation of the script :

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    def number(occ):
        global num
        num += 1

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_encoding = str(notepad.getEncoding())

    if Curr_encoding == 'ENC8BIT':
        Curr_encoding = 'ANSI'
    if Curr_encoding == 'COOKIE':
        Curr_encoding = 'UTF-8'
    if Curr_encoding == 'UTF8':
        Curr_encoding = 'UTF-8-BOM'
    if Curr_encoding == 'UCS2BE':
        Curr_encoding = 'UTF-16 BE BOM'
    if Curr_encoding == 'UCS2LE':
        Curr_encoding = 'UTF-16 LE BOM'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Line_title = 95
    else:
        Line_title = 75

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    File_name = notepad.getCurrentFilename()

    if os.path.isfile(File_name) == True:
        Creation_date = time.ctime(os.path.getctime(File_name))
        Modif_date = time.ctime(os.path.getmtime(File_name))
        Size_length = os.path.getsize(File_name)
        RO_flag = 'YES'
        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    RO_editor = 'NO'
    if editor.getReadOnly() == True:
        RO_editor = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if notepad.getCurrentView() == 0:
        Curr_view = 'MAIN View'
    else:
        Curr_view = 'SECONDARY view'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_lang = notepad.getCurrentLang()
    Lang_desc = notepad.getLanguageDesc(Curr_lang)

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if editor.getEOLMode() == 0:
        Curr_eol = 'Windows (CR LF)'
    if editor.getEOLMode() == 1:
        Curr_eol = 'Macintosh (CR)'
    if editor.getEOLMode() == 2:
        Curr_eol = 'Unix (LF)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_wrap = 'NO'
    if editor.getWrapMode() == 1:
        Curr_wrap = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'[^\r\n]', number)
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
    Total_1_byte = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'[\x{0080}-\x{07FF}]', number)
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)    # ALL BMP vchars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)
    Total_2_bytes = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
    Total_3_bytes = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'[^\r\n]', number)
    Total_standard = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_4_bytes = 0    # By default
    if Curr_encoding != 'ANSI':
        Total_4_bytes = Total_standard - Total_BMP

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\r|\n', number)
    Total_EOL = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_chars = Total_EOL + Total_standard

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'ANSI':
        Bytes_length = Total_EOL + Total_1_byte
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    BOM = 0    # Default ANSI and UTF-8
    if Curr_encoding == 'UTF-8-BOM':
        BOM = 3
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        BOM = 2

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Buffer_length = Bytes_length + BOM

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'[^\r\n\t\x20]', number)
    Non_blank_chars = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\w+', number)
    Words_count = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
        editor.research(r'\S+', number)
    else:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    Non_space_count = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
    else:
        editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
    Empty_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    else:
        editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    Blank_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Emp_blk_lines = Empty_lines + Blank_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
    else:
        editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
    Total_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Non_blk_lines = Total_lines - Emp_blk_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Num_sel = editor.getSelections()    # Get ALL selections ( EMPTY or NOT )

    # print ('Res = ', Num_sel)

    if Num_sel != 0:
        Bytes_count = 0
        Chars_count = 0
        for n in range(Num_sel):
            Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
            Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Chars_count < 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'
    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'
    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'
    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    line_list = []    # empty list

    line_list.append ('-' * Line_title)
    line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
    line_list.append ('-' * Line_title +'\n')

    line_list.append (' FULL File Path : ' + File_name + '\n')

    if os.path.isfile(File_name) == True:
        line_list.append(' CREATION Date : ' + Creation_date)
        line_list.append(' MODIFICATION Date : ' + Modif_date + '\n')
        line_list.append(' READ-ONLY flag : ' + RO_flag )

    line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n')

    line_list.append (' Current VIEW : ' + Curr_view + '\n')
    line_list.append (' Current ENCODING : ' + Curr_encoding + '\n')
    line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n')
    line_list.append (' Current Line END : ' + Curr_eol + '\n')
    line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n')

    line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
    line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
    line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n')
    line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
    line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n')
    line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard))
    line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n')
    line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n')

    if Curr_encoding == 'ANSI':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
                          + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

    line_list.append (' Byte Order Mark : ' + str(BOM) + '\n')
    line_list.append (' BUFFER Length : ' + str(Buffer_length))

    if os.path.isfile(File_name) == True:
        line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n')
    else:
        line_list.append ('\n')

    line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n')
    line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)\n')
    line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n')

    line_list.append (' True EMPTY lines : ' + str(Empty_lines))
    line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n')
    line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n')
    line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
    line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n')

    line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

    editor.copyText ('\r\n'.join(line_list))
    notepad.new()
    editor.paste()
    editor.copyText('')

    if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
        if Curr_encoding == 'UTF-8':    # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
            notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                            '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

    # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
 The way to use this script is quite self-explanatory. Just three points to emphasize :

- On the BYTES Length line, the values between parentheses :

  - Always begin with the number of EOL ( I omitted the b after x 1 or x 2, on purpose ! )

  - Followed with the number of 1-BYTE chars, for an ANSI encoded file

  - Followed with the numbers of 1-BYTE, 2-BYTES, 3-BYTES and 4-BYTES chars, for a UTF-8 or UTF-8-BOM encoded file

  - Followed with the numbers of 2-BYTES and 4-BYTES chars, for a UTF-16 BE BOM or UTF-16 LE BOM encoded file

- Normally, when a file is saved, the values BUFFER Length and Length on DISK should always be equal. If not, two cases are possible :

  - The file has been recently modified ( trivial case )

  - The file is not identified with a BOM and has been re-interpreted with another NON-Unicode encoding. Then, apply the actions indicated in the pop-up message !

- For a new # file, some values are obviously absent. These are the CREATION date, the MODIFICATION date, the READ-ONLY flag and the Length on DISK ( size ) values
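 The arithmetic shown between parentheses can be checked independently of Notepad++. This is a minimal sketch, assuming Python 3 and a text already available as a Unicode string; the function name count_by_size and the dictionary keys are invented for the illustration. It classifies characters by their UTF-8 size, deduces the characters over the BMP by difference as the script does, and compares the computed lengths with what Python's own encoders produce.

```python
def count_by_size(text):
    """Classify the characters of `text` the way the script's totals do."""
    eol = sum(1 for ch in text if ch in "\r\n")
    standard = [ch for ch in text if ch not in "\r\n"]

    one = sum(1 for ch in standard if ord(ch) <= 0x7F)
    two = sum(1 for ch in standard if 0x80 <= ord(ch) <= 0x7FF)
    three = sum(1 for ch in standard if 0x800 <= ord(ch) <= 0xFFFF)

    bmp = one + two + three
    four = len(standard) - bmp   # characters over the BMP, by difference

    return {
        "eol": eol,
        "one": one,
        "two": two,
        "three": three,
        "bmp": bmp,
        "four": four,
        # Same formulas as Bytes_length in the script, BOM not included
        "utf8_bytes": eol + one + 2 * two + 3 * three + 4 * four,
        "utf16_bytes": 2 * eol + 2 * bmp + 4 * four,
    }

# 'A', 'e' with acute accent, a sun symbol, one character over the BMP, then CR LF
sample = "A\u00e9\u2600\U0001d71c\r\n"
totals = count_by_size(sample)

print(totals["utf8_bytes"], len(sample.encode("utf-8")))       # 12 12
print(totals["utf16_bytes"], len(sample.encode("utf-16-le")))  # 14 14
```

 The sample deliberately contains one character of each UTF-8 size, like the last comment line of the script, so every term of the formula is exercised.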
 Best Regards, guy038 
- 
- 
 @guy038 said in Tests and impressions on the "View > Summary..." functionality:

    editor.copyText ('\r\n'.join(line_list))
    notepad.new()
    editor.paste()
    editor.copyText('')

 Couldn't you just do

    notepad.new()
    editor.setText('\r\n'.join(line_list))

 and thus avoid overwriting the user's clipboard?
- 
 Hello, All,

- So, I followed the excellent @mark-olson's suggestion to bypass the clipboard functionality !

- Now, in case of a RuntimeError, when searching for the NON-SPACE count, I catch the exception and set the Err_Regex variable to True, which adds a warning message to the report. But, even when the Err_Regex variable is False, the result is not totally guaranteed either, if the analyzed file contains characters over the BMP.

 So, globally, whatever the Err_Regex status, the NON-SPACE count value may be increased or decreased by 1, in some cases ( still unclear ) !
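 Since the NON-SPACE count is the one value that remains uncertain, a cross-check that avoids the regex engine altogether may help to spot the off-by-one cases. This is only a sketch under assumptions: it supposes Python 3 and a text already held as a Unicode string ( how the text is obtained from the editor is left out ), the name non_space_count is made up, and Python's definition of whitespace is not guaranteed to be identical to the \s class of the Boost engine, so small differences are possible.

```python
def non_space_count(text):
    """Number of maximal runs of non-whitespace characters in `text`.

    str.split() without argument splits on runs of whitespace and drops
    empty strings, so its length is the number of non-space runs. Python 3
    strings are sequences of code points, so a character over the BMP is
    one item and needs no surrogate handling.
    """
    return len(text.split())

# Two characters over the BMP, mixed with spaces, a tab and a CR LF
sample = "A\U0001d71c  \U0001d71c\tz\r\n"
print(non_space_count(sample))   # 3
```

 If this value differs from the one reported by the script on the same text, the difference is likely to come either from the surrogate handling in the regex or from a whitespace character that the two engines classify differently.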
 Here is the v0.7 version of my script ( I indeed gave a version number to my successive attempts ! )

    # encoding=utf-8

    #-------------------------------------------------------------------------
    #                    STATISTICS about the CURRENT file ( v0.7 )
    #-------------------------------------------------------------------------

    from __future__ import print_function    # for Python2 compatibility

    from Npp import *

    import re

    import os, time, datetime

    import ctypes

    from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    # From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    def npp_get_statusbar(statusbar_item_number):

        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        FindWindowExW = ctypes.windll.user32.FindWindowExW
        SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        SendMessageW.restype = LRESULT
        SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        GetClassNameW = ctypes.windll.user32.GetClassNameW
        create_unicode_buffer = ctypes.create_unicode_buffer

        SBT_OWNERDRAW = 0x1000
        WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

        npp_get_statusbar.STATUSBAR_HANDLE = None

        def get_result_from_statusbar(statusbar_item_number):
            assert statusbar_item_number <= 5
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
            length = retcode & 0xFFFF
            type = (retcode >> 16) & 0xFFFF
            assert (type != SBT_OWNERDRAW)
            text_buffer = create_unicode_buffer(length)
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
            retval = '{}'.format(text_buffer[:length])
            return retval

        def EnumCallback(hwnd, lparam):
            curr_class = create_unicode_buffer(256)
            GetClassNameW(hwnd, curr_class, 256)
            if curr_class.value.lower() == "msctls_statusbar32":
                npp_get_statusbar.STATUSBAR_HANDLE = hwnd
                return False    # stop the enumeration
            return True    # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
        if npp_get_statusbar.STATUSBAR_HANDLE:
            return get_result_from_statusbar(statusbar_item_number)
        assert False

    St_bar = npp_get_statusbar(4)    # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

 Continuation on next post

 guy038
- 
- 
 Hi all,

 Continuation of version v0.7 of the script :

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    def number(occ):
        global num
        num += 1

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_encoding = str(notepad.getEncoding())

    if Curr_encoding == 'ENC8BIT':
        Curr_encoding = 'ANSI'
    if Curr_encoding == 'COOKIE':
        Curr_encoding = 'UTF-8'
    if Curr_encoding == 'UTF8':
        Curr_encoding = 'UTF-8-BOM'
    if Curr_encoding == 'UCS2BE':
        Curr_encoding = 'UTF-16 BE BOM'
    if Curr_encoding == 'UCS2LE':
        Curr_encoding = 'UTF-16 LE BOM'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Line_title = 95
    else:
        Line_title = 75

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    File_name = notepad.getCurrentFilename()

    if os.path.isfile(File_name) == True:
        Creation_date = time.ctime(os.path.getctime(File_name))
        Modif_date = time.ctime(os.path.getmtime(File_name))
        Size_length = os.path.getsize(File_name)
        RO_flag = 'YES'
        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    RO_editor = 'NO'
    if editor.getReadOnly() == True:
        RO_editor = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if notepad.getCurrentView() == 0:
        Curr_view = 'MAIN View'
    else:
        Curr_view = 'SECONDARY view'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_lang = notepad.getCurrentLang()
    Lang_desc = notepad.getLanguageDesc(Curr_lang)

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if editor.getEOLMode() == 0:
        Curr_eol = 'Windows (CR LF)'
    if editor.getEOLMode() == 1:
        Curr_eol = 'Macintosh (CR)'
    if editor.getEOLMode() == 2:
        Curr_eol = 'Unix (LF)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_wrap = 'NO'
    if editor.getWrapMode() == 1:
        Curr_wrap = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'[^\r\n]', number)
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
    Total_1_byte = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'[\x{0080}-\x{07FF}]', number)
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)    # ALL BMP vchars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)
    Total_2_bytes = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
    Total_3_bytes = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'[^\r\n]', number)
    Total_standard = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_4_bytes = 0    # By default
    if Curr_encoding != 'ANSI':
        Total_4_bytes = Total_standard - Total_BMP

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\r|\n', number)
    Total_EOL = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_chars = Total_EOL + Total_standard

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'ANSI':
        Bytes_length = Total_EOL + Total_1_byte
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    BOM = 0    # Default ANSI and UTF-8
    if Curr_encoding == 'UTF-8-BOM':
        BOM = 3
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        BOM = 2

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Buffer_length = Bytes_length + BOM

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'[^\r\n\t\x20]', number)
    Non_blank_chars = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\w+', number)
    Words_count = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Err_Regex = False

    num = 0
    if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
        editor.research(r'\S+', number)
    else:
        try:
            editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
        except RuntimeError:
            Err_Regex = True
    Non_space_count = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
    else:
        editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
    Empty_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    else:
        editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    Blank_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Emp_blk_lines = Empty_lines + Blank_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
    else:
        editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
    Total_lines = num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Non_blk_lines = Total_lines - Emp_blk_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Num_sel = editor.getSelections()    # Get ALL selections ( EMPTY or NOT )

    # print ('Res = ', Num_sel)

    if Num_sel != 0:
        Bytes_count = 0
        Chars_count = 0
        for n in range(Num_sel):
            Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
            Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Chars_count < 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'
    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'
    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'
    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    line_list = []    # empty list

    line_list.append ('-' * Line_title)
    line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
    line_list.append ('-' * Line_title +'\n')

    line_list.append (' FULL File Path : ' + File_name + '\n')

    if os.path.isfile(File_name) == True:
        line_list.append(' CREATION Date : ' + Creation_date)
        line_list.append(' MODIFICATION Date : ' + Modif_date + '\n')
        line_list.append(' READ-ONLY flag : ' + RO_flag )

    line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n')

    line_list.append (' Current VIEW : ' + Curr_view + '\n')
    line_list.append (' Current ENCODING : ' + Curr_encoding + '\n')
    line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n')
    line_list.append (' Current Line END : ' + Curr_eol + '\n')
    line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n')

    line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
    line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
    line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n')
    line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
    line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n')
    line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard))
    line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n')
    line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n')

    if Curr_encoding == 'ANSI':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
                          + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

    line_list.append (' Byte Order Mark : ' + str(BOM) + '\n')
    line_list.append (' BUFFER Length : ' + str(Buffer_length))

    if os.path.isfile(File_name) == True:
        line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n')
    else:
        line_list.append ('\n')

    line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n')
    line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)\n')

    if Err_Regex == False:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n')
    else:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + ' (ERROR : Ran out of stack space trying to match the regular expressions !)\n\n')

    line_list.append (' True EMPTY lines : ' + str(Empty_lines))
    line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n')
    line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n')
    line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
    line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n')

    line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

    notepad.new()
    editor.setText('\r\n'.join(line_list))

    if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
        if Curr_encoding == 'UTF-8':    # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
            notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                            '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

    # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
So, just test this script against any file, to uncover any bug or limitation !!

I’ve also heard of compiled regexes in Python. Would that be interesting for this script ?

Best Regards,

guy038
- 
Hi, All,

I realized that the line endings in the Summary report were a mess. By defining a Line_end variable equal to \r\n, the results are more consistent !

One advantage : if you do not want any supplementary line-break in the Summary report, simply change the line :

Line_end = '\r\n'

to this one :

Line_end = ''

So, here is the v0.8 version of my script :

# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.8 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length)
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE:
        return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

Continuation on next post

guy038
- 
Hi all,

Continuation of version v0.8 of the script :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
def number(occ):
    global num
    num += 1
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:
    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  # ALL BMP chars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)

Total_2_bytes = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_standard = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Total_4_bytes = 0  # By default

if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'\r|\n', number)

Total_EOL = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Total_chars = Total_EOL + Total_standard
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
BOM = 0  # Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Buffer_length = Bytes_length + BOM
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'\w+', number)

Words_count = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_Regex = False
num = 0
if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_Regex = True

Non_space_count = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Emp_blk_lines = Empty_lines + Blank_lines
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Non_blk_lines = Total_lines - Emp_blk_lines
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:
    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Chars_count < 2:
    Txt_chars = ' selected char ('
else:
    Txt_chars = ' selected chars ('

if Bytes_count < 2:
    Txt_bytes = ' selected byte) in '
else:
    Txt_bytes = ' selected bytes) in '
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Num_sel < 2 and Bytes_count == 0:
    Txt_ranges = ' EMPTY range\n'

if Num_sel < 2 and Bytes_count > 0:
    Txt_ranges = ' range\n'

if Num_sel > 1 and Bytes_count == 0:
    Txt_ranges = ' EMPTY ranges\n'

if Num_sel > 1 and Bytes_count > 0:
    Txt_ranges = ' ranges (EMPTY or NOT)\n'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)
line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path : ' + File_name + Line_end)

if os.path.isfile(File_name) == True:
    line_list.append(' CREATION Date : ' + Creation_date)
    line_list.append(' MODIFICATION Date : ' + Modif_date + Line_end)
    line_list.append(' READ-ONLY flag : ' + RO_flag )

line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW : ' + Curr_view + Line_end)
line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end)
line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end)
line_list.append (' Current Line END : ' + Curr_eol + Line_end)
line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end)
line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end)
line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard))
line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + Line_end)
line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
                    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end)
line_list.append (' BUFFER Length : ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2)
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + Line_end)
line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)' + Line_end)

if Err_Regex == False:
    line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + Line_end * 2)
else:
    line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + ' (ERROR : Ran out of stack space trying to match the regular expressions !)' + Line_end * 2)

line_list.append (' True EMPTY lines : ' + str(Empty_lines))
line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end)
line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end)
line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

notepad.new()
editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
    if Curr_encoding == 'UTF-8':  # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
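The effect of the Line_end variable can be tried outside Notepad++. This is only a sketch, assuming Python 3: build_report and the sample rows are hypothetical names and values, not part of the script, and they merely imitate the way the script appends Line_end to some rows before joining everything with '\r\n'.

```python
def build_report(rows, line_end="\r\n"):
    """Join report rows the way the script does.

    rows is a list of (text, extra_breaks) pairs (hypothetical layout):
    extra_breaks says how many times line_end is appended to that row.
    """
    line_list = [text + line_end * extra for text, extra in rows]
    return "\r\n".join(line_list)


# Made-up sample rows, for illustration only
rows = [
    (" 1-BYTE Chars : 5", 0),
    (" 3-BYTES Chars : 1", 1),   # followed by one supplementary line-break
    (" Sum BMP Chars : 7", 0),
]

spaced = build_report(rows)                 # Line_end = '\r\n'
compact = build_report(rows, line_end="")   # Line_end = ''

print(spaced)
print(compact)
```

With Line_end = '' the rows are still separated by the '\r\n' of the join, so only the supplementary blank lines disappear.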
 Best Regards, guy038 
- 
Hi, All,

You’ll find, below, the v1.0 version of my script. I changed a lot of things :

- I added a timer to get the execution time of the script, which is written right after the current date, at the beginning of the summary

- I modified some regexes in order to improve their performance, as well as the order in which they are searched

- I used the PythonScript methods editor.getLength(), editor.countCharacters(0, editor.getLength()) and editor.getLineCount() to get, respectively, the bytes length ( without a possible BOM ), the Total_chars value and the Total_lines value. Note that, in case of a UTF-8 or UTF-8-BOM encoded file, we get two relations :

    (A) Bytes_length - Total_EOL - Total_1_byte - 2 × Total_2_bytes - 3 × Total_3_bytes = 4 × Total_4_bytes

    (B) Total_chars - Total_EOL - Total_1_byte - Total_2_bytes - Total_3_bytes = Total_4_bytes

  So, we can deduce, from the difference (A) - (B), the equation :

    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 × Total_3_bytes ) / 3

  and then :

    Total_1_byte = Total_chars - Total_EOL - Total_2_bytes - Total_3_bytes - Total_4_bytes

  Thus, after counting the number of Total_2_bytes and Total_3_bytes, the two results Total_4_bytes and Total_1_byte are easily deduced. This new way decreases the execution time of the script by a factor of 2 to 3 because, most of the time, the file contains only 1-byte chars :-))

  However, the Bytes_length value wrongly remains the same in case of a UTF-16 BE BOM or UTF-16 LE BOM encoded file. Thus, I needed to calculate the Total_4_bytes and Bytes_length values from the number of Total_2_bytes, with the relations :

    Total_4_bytes = Total_chars - Total_EOL - Total_2_bytes

    Bytes_length = 2 × Total_EOL + 2 × Total_2_bytes + 4 × Total_4_bytes

- Now, because some huge files may take a long time before giving the Summary results ( even with the native N++ version, BTW ! ), you can follow the progression of the different searches on the Python console, which is automatically opened at the beginning of the script and closed right before outputting the results

- At the end of the script, I just replaced the notepad.prompt method by the notepad.messageBox method in order to display the warning ( more logical ! )
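The UTF-8 relations (A) and (B) above, and the UTF-16 relation, can be checked outside Notepad++ on a small string. This is only a sketch, assuming Python 3 (where len() of a str counts code points): utf8_class_counts, deduce_from_lengths and the sample text are hypothetical names and data, not part of the script.

```python
def utf8_class_counts(text):
    """Directly count EOL chars and 1/2/3/4-byte UTF-8 code points."""
    counts = {"eol": 0, 1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        if ch in "\r\n":
            counts["eol"] += 1
        else:
            counts[len(ch.encode("utf-8"))] += 1
    return counts


def deduce_from_lengths(bytes_length, total_chars, total_eol, total_2, total_3):
    """Apply (A) - (B): only the 2-byte and 3-byte counts need a search."""
    total_4 = (bytes_length - total_chars - total_2 - 2 * total_3) // 3
    total_1 = total_chars - total_eol - total_2 - total_3 - total_4
    return total_1, total_4


# Made-up sample mixing 1-, 2-, 3- and 4-byte characters with CR and LF
sample = "Aé☀𝜜\r\nabc 𝜜\n"

direct = utf8_class_counts(sample)
bytes_length = len(sample.encode("utf-8"))   # what a byte length would be
total_chars = len(sample)                    # code points, EOL chars included

total_1, total_4 = deduce_from_lengths(
    bytes_length, total_chars, direct["eol"], direct[2], direct[3]
)

# UTF-16 relation: every BMP char and EOL takes 2 bytes, every other char 4
total_bmp = direct[1] + direct[2] + direct[3]
utf16_length = 2 * direct["eol"] + 2 * total_bmp + 4 * direct[4]

print(total_1, total_4, utf16_length)
```

The deduced values agree with the direct per-character count for this sample, which is the property the script relies on to skip the 1-byte and 4-byte searches.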
 
IMPORTANT :

- Never switch to another tab while this script is running. Otherwise, you’ll probably get unpredictable or negative results !

- While viewing the console messages, if you think that the results are taking too long for a specific file and you prefer to abort its Summary report, simply stop the current Python script with the classical Plugins > Python Script > Stop script menu option
 
Now, I was a bit upset by some inconsistent results regarding the number of NON-SPACE strings, when the current file, with a Unicode encoding, contains some characters over the BMP

So, I searched among all my posts, since 2013, as well as some others used as documentation, for only those containing some four-bytes characters, and here is the list of these files with the reported results :

•=============================•===========•=================•==================•============•================•
|                             |           |    Expected     |  Summary Report  |            |                |
|          Filename           |  4_BYTES  |          NON-SPACE count           | Difference |    Encoding    |
|                             |           |  (?:(?!\s).[\x{D800}-\x{DFFF}]?)+  |            |                |
•=============================•===========•=================•==================•============•================•
| Symbola_Monospacified.txt   |    11,951 |         199,891 |          199,882 |        - 9 | UTF-8-BOM      |
| Total_Chars.txt             |   262,136 |               9 |               18 |        + 9 | UTF-8-BOM      |
•=============================•===========•=================•==================•============•================•
| Caractères.txt              |     2,901 |           7,361 |            7,358 |        - 3 | UTF-8-BOM      |
| Test_2.txt                  |     1,276 |               8 |                9 |        + 1 | UTF-8          |
| Test_1.txt                  |       881 |               8 |                9 |        + 1 | UTF-8          |
| Plane_0.txt                 |         0 |               9 |               10 |        + 1 | UCS-2 BE BOM   |
| Clemens.txt                 |     3,968 |           2,816 |            2,818 |        + 2 | UTF-8-BOM      |
| Planes_0+1.txt              |    65,534 |               9 |               12 |        + 3 | UTF-8-BOM      |
•=============================•===========•=================•==================•============•================•
| Chars_Over_BMP.txt          |        28 |             455 |              455 |          0 | UTF-8-BOM      |
| Entites_by_Name.txt         |       133 |          15,968 |           15,968 |          0 | UTF-8          |
| Entites_by_Number.txt       |       133 |          15,968 |           15,968 |          0 | UTF-8          |
| Invisible_chars.txt         |        31 |           3,459 |            3,459 |          0 | UTF-8-BOM      |
| Osmanya_Tout.txt            |       119 |             605 |              605 |          0 | UTF-8-BOM      |
| Smileys.txt                 |     1,031 |          10,157 |           10,157 |          0 | UTF-8-BOM      |
| Alan_K.txt                  |       114 |          46,082 |           46,082 |          0 | UTF-8          |
| Alexolog.txt                |        13 |           2,199 |            2,199 |          0 | UTF-8          |
| André_Z.txt                 |         8 |           5,860 |            5,860 |          0 | UTF-8          |
| Bidule.txt                  |         1 |             327 |              327 |          0 | UTF-8          |
| Carypt.txt                  |         1 |           3,551 |            3,551 |          0 | UTF-8          |
| Dean_Corso.txt              |       761 |           9,632 |            9,632 |          0 | UTF-8          |
| Don_Ho.txt                  |         2 |          41,426 |           41,426 |          0 | UTF-8          |
| Durkin.txt                  |       144 |           4,638 |            4,638 |          0 | UTF-8          |
| Dylan.txt                   |        34 |           2,180 |            2,180 |          0 | UTF-8          |
| Furek.txt                   |        20 |             499 |              499 |          0 | UTF-8          |
| Gary_2.txt                  |         2 |             458 |              458 |          0 | UTF-8          |
| Haleba.txt                  |         5 |             817 |              817 |          0 | UTF-8          |
| ImSpecial.txt               |         1 |             161 |              161 |          0 | UTF-8          |
| Joss.txt                    |         6 |             105 |              105 |          0 | UTF-8          |
| JR.txt                      |        39 |           1,735 |            1,735 |          0 | UTF-8          |
| Mark_Olson.txt              |         1 |           3,652 |            3,652 |          0 | UTF-8          |
| Minus_Majus.txt             |        62 |           9,931 |            9,931 |          0 | UTF-8          |
| Niting-jain.txt             |         4 |             537 |              537 |          0 | UTF-8          |
| PeterCJ.txt                 |        31 |          37,323 |           37,323 |          0 | UTF-8          |
| Petr_jaja.txt               |        14 |           3,168 |            3,168 |          0 | UTF-8          |
| Pintas.txt                  |         4 |             614 |              614 |          0 | UTF-8          |
| Register.txt                |        20 |             242 |              242 |          0 | UTF-8          |
| Scott_3.txt                 |         4 |          42,552 |           42,552 |          0 | UTF-8          |
| Skevich.txt                 |         6 |             715 |              715 |          0 | UTF-8          |
| Statistiques.txt            |         7 |           9,012 |            9,012 |          0 | UTF-8          |
| Summary.txt                 |         7 |           4,322 |            4,322 |          0 | UTF-8          |
| Summary_NEW.txt             |        10 |           8,903 |            8,903 |          0 | UTF-8          |
| Uzivatel.txt                |         2 |             873 |              873 |          0 | UTF-8          |
| Xavier_mdq.txt              |        13 |           3,652 |            3,652 |          0 | UTF-8          |
| Text.txt                    |     2,400 |           1,000 |            1,000 |          0 | UTF-8          |
•=============================•===========•=================•==================•============•================•

From that list, I deduced that the NON-SPACE count is erroneous in very rare cases only, especially when the current file contains, consecutively :

- All the characters of a font

- All the characters of a Unicode range

- All the characters of all Unicode ranges

Luckily, in all the other cases, with a random position of these four-bytes chars, the Summary report always gives the right results, regarding the NON-SPACE count !
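For a suspicious file, the NON-SPACE count can be cross-checked outside the Boost engine. This is only a sketch, assuming Python 3, where a str is a sequence of code points, so a character over the BMP is a single item and \S+ needs no surrogate handling. non_space_count and the sample text are hypothetical, and Python's definition of whitespace may differ slightly from the Boost one for some rare Unicode characters, so small differences are possible.

```python
import re


def non_space_count(text):
    """Count the maximal runs of non-whitespace code points in text."""
    return len(re.findall(r"\S+", text))


# Made-up sample: chars over the BMP inside and between the runs
sample = "A𝜜B \t𝜜𝜜\r\n☀ end"

# Runs found: 'A𝜜B', '𝜜𝜜', '☀', 'end'
print(non_space_count(sample))
```

To use it on a real file, the text would first have to be read and decoded with the file's actual encoding, for example with open(path, encoding='utf-8').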
Here is the v1.0 version of my script, split on two posts :

# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v1.0 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length)
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE:
        return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

Continuation on next post

guy038
- 
- 
Hi all,

Continuation of version v1.0 of the script :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
def number(occ):
    global num
    num += 1

console.show()
console.clear()

Start_time = time.time()
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:
    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
print ('START')
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Bytes_length = editor.getLength()
Total_chars = editor.countCharacters(0, editor.getLength())
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'\r|\n', number)

Total_EOL = num
print ('EOL')
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Total_standard = Total_chars - Total_EOL
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'ANSI':
    Total_BMP = Total_standard
    Total_1_byte = Total_BMP
    Total_2_bytes = 0
    Total_3_bytes = 0
    Total_4_bytes = 0
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    num = 0
    editor.research(r'[\x{0080}-\x{07FF}]', number)

    Total_2_bytes = num
    print ('2-BYTES')
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)

    Total_3_bytes = num
    print ('3-BYTES')
    # -----------------------------------------------------------------------------------------------------------------------------
    # Integer division ( // ), so that the counts stay integers with Python 3 too
    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) // 3
    Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes
    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    num = 0
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  # ALL BMP chars different from '\r' and '\n'

    Total_2_bytes = num
    Total_4_bytes = Total_standard - Total_2_bytes
    Total_BMP = Total_2_bytes
    Total_1_byte = 0
    Total_3_bytes = 0
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes
    print ('2-BYTES')
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
BOM = 0  # Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2
# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Buffer_length
= Bytes_length + BOM # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\t|\x20', number) Non_blank_chars = Total_standard - num print ('NON-BLANK') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\w+', number) Words_count = num print ('WORDS') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Err_regex = False num = 0 if Curr_encoding == 'ANSI' or Total_4_bytes == 0: editor.research(r'\S+', number) else: try: editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number) except RuntimeError: Err_regex = True Non_space_count = num print ('NON-SPACE') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if Curr_encoding == 'ANSI': editor.research(r'\f^(?:\r\n|\r|\n)', number) else: editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\r|\n)', number) Special_empty = num num = 0 editor.research(r'^(?:\r\n|\r|\n)', number) Default_empty = num Empty_lines = Default_empty - Special_empty print ('EMPTY lines') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if Curr_encoding == 'ANSI': editor.research(r'\f^[\t\x20]+(?:\r\n|\r|\n|\z)', number) else: editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\r|\n|\z)', number) Special_blank = num num = 0 editor.research(r'^[\t\x20]+(?:\r\n|\r|\n|\z)', number) Default_blank = num Blank_lines = Default_blank - Special_blank print ('BLANK lines') # 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- Emp_blk_lines = Empty_lines + Blank_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Total_lines = editor.getLineCount() num = 0 editor.research(r'(?-s)^.+\z', number) if num == 0: Total_lines = Total_lines - 1 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Non_blk_lines = Total_lines - Emp_blk_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Num_sel = editor.getSelections() # Get ALL selections ( EMPTY or NOT ) if Num_sel != 0: Bytes_count = 0 Chars_count = 0 for n in range(Num_sel): Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n) Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n)) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Chars_count < 2: Txt_chars = ' selected char (' else: Txt_chars = ' selected chars (' if Bytes_count < 2: Txt_bytes = ' selected byte) in ' else: Txt_bytes = ' selected bytes) in ' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Num_sel < 2 and Bytes_count == 0: Txt_ranges = ' EMPTY range\n' if Num_sel < 2 and Bytes_count > 0: Txt_ranges = ' range\n' if Num_sel > 1 and Bytes_count == 0: Txt_ranges = ' EMPTY ranges\n' if Num_sel > 1 and Bytes_count > 0: Txt_ranges = ' ranges (EMPTY or NOT)\n' # 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- console.hide() line_list = [] # empty list Line_end = '\r\n' line_list.append ('-' * Line_title) line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )') line_list.append ('-' * Line_title + Line_end) line_list.append (' FULL File Path : ' + File_name + Line_end) if os.path.isfile(File_name) == True: line_list.append (' CREATION Date : ' + Creation_date) line_list.append (' MODIFICATION Date : ' + Modif_date + Line_end) line_list.append (' READ-ONLY flag : ' + RO_flag) line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2) line_list.append (' Current VIEW : ' + Curr_view + Line_end) line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end) line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end) line_list.append (' Current Line END : ' + Curr_eol + Line_end) line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2) line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte)) line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes)) line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end) line_list.append (' Sum BMP Chars : ' + str(Total_BMP)) line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end) line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard)) line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + Line_end) line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2) if Curr_encoding == 'ANSI': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)') if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + 
str(Total_1_byte) + ' x 1b + '\ + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)') if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)') line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end) line_list.append (' BUFFER Length : ' + str(Buffer_length)) if os.path.isfile(File_name) == True: line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2) else: if Line_end == '\r\n': line_list.append (Line_end) line_list.append (' NON-Blank Count : ' + str(Non_blank_chars) + Line_end) line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)' + Line_end) if Err_regex == False: line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + Line_end * 2) else: line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + ' (Caution as " RuntimeError " occured !)' + Line_end * 2) line_list.append (' True EMPTY lines : ' + str(Empty_lines)) line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end) line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end) line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines)) line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2) line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges) notepad.new() editor.setText('\r\n'.join(line_list)) if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM': if Curr_encoding == 'UTF-8': # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' 
feature notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \ '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!') # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
Remember that you can get a shorter summary report by changing the line:

    Line_end = '\r\n'

to this one:

    Line_end = ''

Best Regards,

guy038
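
The way the script deduces the number of 4-byte characters in a UTF-8 file, without searching the surrogate range at all, can be checked outside Notepad++. Below is a minimal sketch in plain Python 3, not PythonScript: the helper name `utf8_breakdown` is hypothetical, and it uses Python's own `re` module instead of the Boost engine. Unlike the script, it counts the EOL characters among the 1-byte characters. Since bytes = n1 + 2.n2 + 3.n3 + 4.n4 and chars = n1 + n2 + n3 + n4, the difference bytes - chars equals n2 + 2.n3 + 3.n4, which gives n4 once n2 and n3 are known.

```python
import re

def utf8_breakdown(text):
    """Hypothetical helper: count the chars of `text` by UTF-8 encoded width."""
    bytes_length = len(text.encode('utf-8'))   # plays the role of editor.getLength()
    total_chars = len(text)                    # plays the role of editor.countCharacters()
    total_2 = len(re.findall(u'[\u0080-\u07FF]', text))
    total_3 = len(re.findall(u'[\u0800-\uD7FF\uE000-\uFFFF]', text))
    # Same arithmetic as in the script: no regex over the surrogate range is needed
    total_4 = (bytes_length - total_chars - total_2 - 2 * total_3) // 3
    total_1 = total_chars - total_2 - total_3 - total_4   # EOL chars included here
    return total_1, total_2, total_3, total_4

# 'A', 'é', '☀', '𝜜' followed by a Windows line-break
sample = u'A\u00e9\u2600\U0001D71C\r\n'
print(utf8_breakdown(sample))
```

With this sample, the four values should be 3, 1, 1 and 1: the three 1-byte characters are `A`, `\r` and `\n`.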
- 
Hello, @alan-kilborn and All,

Following your advice, I included the number of selected words ( \w+ ) in the last line of the summary report, regarding the different selections.

If needed, the OP may choose this second syntax, which treats the hyphen, the apostrophe and the Right Single Quotation Mark, when surrounded by word chars, as true word chars!

SEARCH (?:(?<=\w)[-'’](?=\w)|\w)+

And thus, replace the line:

    editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

by this one (note the double quotes around the raw string, needed because the regex itself contains an apostrophe):

    editor.research(r"(?:(?<=\w)[-'’](?=\w)|\w)+", number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
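
To see the difference between the two word syntaxes without opening Notepad++, here is a small sketch in plain Python 3. It relies on Python's `re` module, not on the Boost engine used by `editor.research`, so it is only an approximation of what the script does; the names `SIMPLE_WORD`, `EXTENDED_WORD` and `count_words` are hypothetical.

```python
import re

# The default syntax used by the script
SIMPLE_WORD = re.compile(r"\w+")

# The second syntax: hyphen, apostrophe and Right Single Quotation Mark
# count as word chars only when surrounded by word chars
EXTENDED_WORD = re.compile(r"(?:(?<=\w)[-'’](?=\w)|\w)+")

def count_words(text, pattern):
    """Hypothetical helper: number of matches of `pattern` in `text`."""
    return len(pattern.findall(text))

sample = u"It’s a well-known trick, isn't it?"
print(count_words(sample, SIMPLE_WORD))     # It, s, a, well, known, trick, isn, t, it
print(count_words(sample, EXTENDED_WORD))   # It’s, a, well-known, trick, isn't, it
```

On this sample, the first syntax finds 9 words and the second one 6.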
So, here is the v1.1 version of my script, split on two posts:

    # encoding=utf-8

    #-------------------------------------------------------------------------
    # STATISTICS about the CURRENT file ( v1.1 )
    #-------------------------------------------------------------------------

    from __future__ import print_function  # for Python2 compatibility

    from Npp import *

    import re

    import os, time, datetime

    import ctypes

    from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    # From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    def npp_get_statusbar(statusbar_item_number):

        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        FindWindowExW = ctypes.windll.user32.FindWindowExW
        SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        SendMessageW.restype = LRESULT
        SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        GetClassNameW = ctypes.windll.user32.GetClassNameW
        create_unicode_buffer = ctypes.create_unicode_buffer

        SBT_OWNERDRAW = 0x1000
        WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

        npp_get_statusbar.STATUSBAR_HANDLE = None

        def get_result_from_statusbar(statusbar_item_number):
            assert statusbar_item_number <= 5
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
            length = retcode & 0xFFFF
            type = (retcode >> 16) & 0xFFFF
            assert (type != SBT_OWNERDRAW)
            text_buffer = create_unicode_buffer(length)
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
            retval = '{}'.format(text_buffer[:length])
            return retval

        def EnumCallback(hwnd, lparam):
            curr_class = create_unicode_buffer(256)
            GetClassNameW(hwnd, curr_class, 256)
            if curr_class.value.lower() == "msctls_statusbar32":
                npp_get_statusbar.STATUSBAR_HANDLE = hwnd
                return False  # stop the enumeration
            return True  # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
        if npp_get_statusbar.STATUSBAR_HANDLE:
            return get_result_from_statusbar(statusbar_item_number)
        assert False

    St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

Continuation on next post

guy038
- 
Hi Alan and all,

Continuation of version v1.1 of the script:

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    def number(occ):
        global num
        num += 1

    console.show()
    console.clear()

    Start_time = time.time()

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_encoding = str(notepad.getEncoding())

    if Curr_encoding == 'ENC8BIT':
        Curr_encoding = 'ANSI'
    if Curr_encoding == 'COOKIE':
        Curr_encoding = 'UTF-8'
    if Curr_encoding == 'UTF8':
        Curr_encoding = 'UTF-8-BOM'
    if Curr_encoding == 'UCS2BE':
        Curr_encoding = 'UTF-16 BE BOM'
    if Curr_encoding == 'UCS2LE':
        Curr_encoding = 'UTF-16 LE BOM'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Line_title = 95
    else:
        Line_title = 75

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    File_name = notepad.getCurrentFilename().decode('utf-8')

    if os.path.isfile(File_name) == True:
        Creation_date = time.ctime(os.path.getctime(File_name))
        Modif_date = time.ctime(os.path.getmtime(File_name))
        Size_length = os.path.getsize(File_name)

        RO_flag = 'YES'
        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    RO_editor = 'NO'
    if editor.getReadOnly() == True:
        RO_editor = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if notepad.getCurrentView() == 0:
        Curr_view = 'MAIN View'
    else:
        Curr_view = 'SECONDARY view'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_lang = notepad.getCurrentLang()
    Lang_desc = notepad.getLanguageDesc(Curr_lang)

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if editor.getEOLMode() == 0:
        Curr_eol = 'Windows (CR LF)'
    if editor.getEOLMode() == 1:
        Curr_eol = 'Macintosh (CR)'
    if editor.getEOLMode() == 2:
        Curr_eol = 'Unix (LF)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Curr_wrap = 'NO'
    if editor.getWrapMode() == 1:
        Curr_wrap = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    print ('START')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Bytes_length = editor.getLength()
    Total_chars = editor.countCharacters(0, editor.getLength())

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\r|\n', number)
    Total_EOL = num
    print ('EOL')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_standard = Total_chars - Total_EOL

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'ANSI':
        Total_BMP = Total_standard
        Total_1_byte = Total_BMP
        Total_2_bytes = 0
        Total_3_bytes = 0
        Total_4_bytes = 0

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        num = 0
        editor.research(r'[\x{0080}-\x{07FF}]', number)
        Total_2_bytes = num
        print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
        num = 0
        editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)
        Total_3_bytes = num
        print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------
        Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) / 3
        Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes
        Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        num = 0
        editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  # ALL BMP chars different from '\r' and '\n'
        Total_2_bytes = num
        Total_4_bytes = Total_standard - Total_2_bytes
        Total_BMP = Total_2_bytes
        Total_1_byte = 0
        Total_3_bytes = 0
        Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes
        print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    BOM = 0  # Default ANSI and UTF-8
    if Curr_encoding == 'UTF-8-BOM':
        BOM = 3
    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        BOM = 2

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Buffer_length = Bytes_length + BOM

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\t|\x20', number)
    Non_blank_chars = Total_standard - num
    print ('NON-BLANK')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    editor.research(r'\w+', number)
    Words_total = num
    print ('WORDS')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Err_regex = False
    num = 0
    if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
        editor.research(r'\S+', number)
    else:
        try:
            editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
        except RuntimeError:
            Err_regex = True
    Non_space_count = num
    print ('NON-SPACE')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'\f^(?:\r\n|\r|\n)', number)
    else:
        editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\r|\n)', number)
    Special_empty = num

    num = 0
    editor.research(r'^(?:\r\n|\r|\n)', number)
    Default_empty = num

    Empty_lines = Default_empty - Special_empty
    print ('EMPTY lines')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'\f^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    else:
        editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    Special_blank = num

    num = 0
    editor.research(r'^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
    Default_blank = num

    Blank_lines = Default_blank - Special_blank
    print ('BLANK lines')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Emp_blk_lines = Empty_lines + Blank_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Total_lines = editor.getLineCount()

    num = 0
    editor.research(r'(?-s)^.+\z', number)
    if num == 0:
        Total_lines = Total_lines - 1

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Non_blk_lines = Total_lines - Emp_blk_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

    if Num_sel != 0:
        Bytes_count = 0
        Chars_count = 0
        Words_count = 0

        for n in range(Num_sel):
            Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
            Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
            num = 0
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
            Words_count += num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
        if Bytes_count < 2:
            Txt_bytes = ' selected byte) in '
        else:
            Txt_bytes = ' selected bytes) in '

        if Chars_count < 2:
            Txt_chars = ' selected char, '
        else:
            Txt_chars = ' selected chars, '

        if Words_count < 2:
            Txt_words = ' selected word ('
        else:
            Txt_words = ' selected words ('

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
        if Num_sel < 2 and Bytes_count == 0:
            Txt_ranges = ' EMPTY range'
        if Num_sel < 2 and Bytes_count > 0:
            Txt_ranges = ' range'
        if Num_sel > 1 and Bytes_count == 0:
            Txt_ranges = ' EMPTY ranges'
        if Num_sel > 1 and Bytes_count > 0:
            Txt_ranges = ' ranges (EMPTY or NOT)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    console.hide()

    line_list = []  # empty list

    Line_end = '\r\n'

    line_list.append ('-' * Line_title)
    line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')
    line_list.append ('-' * Line_title + Line_end)

    line_list.append (' FULL File Path : ' + File_name + Line_end)

    if os.path.isfile(File_name) == True:
        line_list.append (' CREATION Date : ' + Creation_date)
        line_list.append (' MODIFICATION Date : ' + Modif_date + Line_end)
        line_list.append (' READ-ONLY flag : ' + RO_flag)

    line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2)

    line_list.append (' Current VIEW : ' + Curr_view + Line_end)
    line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end)
    line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end)
    line_list.append (' Current Line END : ' + Curr_eol + Line_end)
    line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2)

    line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
    line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
    line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end)
    line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
    line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end)
    line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard))
    line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + Line_end)
    line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2)

    if Curr_encoding == 'ANSI':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
                          + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

    line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end)
    line_list.append (' BUFFER Length : ' + str(Buffer_length))

    if os.path.isfile(File_name) == True:
        line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2)
    else:
        if Line_end == '\r\n':
            line_list.append (Line_end)

    line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + Line_end)
    line_list.append (' WORDS Count : ' + str(Words_total) + ' (Caution !)' + Line_end)

    if Err_regex == False:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + Line_end * 2)
    else:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + ' (Caution as " RuntimeError " occured !)' + Line_end * 2)

    line_list.append (' True EMPTY lines : ' + str(Empty_lines))
    line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end)
    line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end)
    line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
    line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2)

    line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + Line_end)

    notepad.new()
    editor.setText('\r\n'.join(line_list))

    if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
        if Curr_encoding == 'UTF-8':  # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
            notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                                '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

    # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------

Best Regards,

guy038
- 
Hello, @alan-kilborn and Python gurus,

I’ve just found a bug when trying to run my script against a “French” file called Numéros (which means Numbers) :-((

The Python section of my script, below, detects whether the current tab is associated with a true file, saved on disk, or refers to a "new #" file, not saved yet:

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    File_name = notepad.getCurrentFilename()

    if os.path.isfile(File_name) == True:
        Creation_date = time.ctime(os.path.getctime(File_name))
        Modif_date = time.ctime(os.path.getmtime(File_name))
        Size_length = os.path.getsize(File_name)

        RO_flag = 'YES'
        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

And unfortunately, if the current name contains accented characters, like Numéros, it wrongly supposes that it’s a "new #" file! As soon as the file is renamed as Numeros, everything is OK again.

So, how to recognize the filename even if the current file name or the current path contains NON-ASCII characters?

TIA

guy038
- 
@guy038 said in Emulation of the "View > Summary" feature with a Python script:

> how to recognize the filename even if current file or current path contain NON-ASCII characters ?

Short answer: This is better done with Python3, i.e., PythonScript 3.x. Then things “just work”. :-)

But, for Python2 (and PS 2.x), you can make a call to .encode('utf-8') or .decode('utf-8') – depending upon your circumstance (I’m not commenting on your specific code) – in order to get what you need.

Basically, if you have a Python2 str (in a variable s) and you want to get a Unicode string (for things like Windows pathnames with non-trivial characters), use s.decode('utf-8'); and to go the other way, where you have a Unicode string (in a variable u) and you want a Python2 str, do u.encode('utf-8').
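
As an illustration of that round trip, here is a minimal sketch that should run unchanged under Python 2 and Python 3. The byte string `raw` is only an assumption about what a Python 2 `str` holding a UTF-8 encoded filename would contain; nothing here calls the PythonScript API.

```python
# -*- coding: utf-8 -*-

# UTF-8 bytes of the name "Numéros.txt", as a Python 2 str would hold them
raw = b'Num\xc3\xa9ros.txt'

# bytes -> Unicode text: the form to hand to os.path functions under Python 2
name = raw.decode('utf-8')

# Unicode text -> bytes: the way back
back = name.encode('utf-8')

print(len(raw), len(name))   # the 'é' takes 2 bytes but is a single character
print(back == raw)
```

The byte string is 12 bytes long whereas the decoded text is 11 characters long, which is why the two forms cannot be used interchangeably as soon as a name is not pure ASCII.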
- 
Hi, @alan-kilborn,

Many thanks for the tip!

I did some Google searches before, but only found some obscure explanations. But, right now, I tried again with this question:

How to get "os.path.isfile(Filename)" == True: when Filename contains "NON ASCII" chars ?

And, in the first article found, named “python - UnicodeEncodeError on joining file name”, dated Jan. 05 2010, from the Stack Overflow site, it is literally said, in the middle of the article:

> So I would first try filename = filename.decode('utf-8') -- that should allow the os.path.join to work
Now, I won’t bother to re-edit my script with a new version number! I just changed, in my v1.1 version, above, the line:

    File_name = notepad.getCurrentFilename()

to this one:

    File_name = notepad.getCurrentFilename().decode('utf-8')

BR

guy038
- 
Hello, @alan-kilborn and All,

Below, the v1.2 version of the Python script for an enhanced Summary feature:

- I decomposed the total number of chars into 3 parts: EOL chars, Space and Tab chars and True chars ( [^\t\x20\r\n] )
- I also decomposed the total number of word chars into 3 parts: letter chars, digit chars and low_line chars
- I added a count of the paragraphs ( You may adapt the corresponding regex to your needs )
- I added a count of the sentences ( You may adapt the corresponding regex to your needs )
- I added some remarks at the end of the summary report, regarding the global accuracy of some results!

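
The two decompositions listed above can be sketched in plain Python 3, with the standard `re` module standing in for the Boost engine. The helper names `char_breakdown` and `word_char_breakdown` are hypothetical and this is not the code of the v1.2 script, just an illustration of how the three parts of each total add up.

```python
import re

def char_breakdown(text):
    """Hypothetical helper: split all chars into EOL, Space/Tab and True chars."""
    eol = len(re.findall(r'[\r\n]', text))
    blank = len(re.findall(r'[\t\x20]', text))
    true_chars = len(re.findall(r'[^\t\x20\r\n]', text))
    return eol, blank, true_chars

def word_char_breakdown(text):
    """Hypothetical helper: split the word chars into letters, digits and low_lines."""
    letters = len(re.findall(r'[^\W\d_]', text))   # word chars that are neither digits nor '_'
    digits = len(re.findall(r'\d', text))
    low_lines = text.count('_')
    return letters, digits, low_lines

sample = 'ab_1 c\t2\r\n'
eol, blank, true_chars = char_breakdown(sample)
print(eol, blank, true_chars)          # the three parts sum to len(sample)
print(word_char_breakdown(sample))
```

For this sample, the first helper should give 2 EOL chars, 2 Space/Tab chars and 6 True chars, and the second one 3 letters, 2 digits and 1 low_line.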
Now, Alan, I needed to change this part, regarding the selections:

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        num = 0
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

to this one:

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        num = 0
        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

Because, if the unique zero-length selection was on a pure empty line, it did write, as expected, the message:

    0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

But if this unique zero-length selection was on a non-empty line, it would wrongly write, for example:

    0 selected char, 568 selected words (0 selected byte) in 1 EMPTY range

given that the whole file contains 568 words.
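
The idea of the guard can be tried outside Notepad++ with a small sketch in plain Python 3. The function `count_selected_words` and its list of `(start, end)` offsets are hypothetical stand-ins for the selection loop; note that, unlike the script, which tests the cumulative `Bytes_count`, this sketch tests the length of each range separately.

```python
import re

def count_selected_words(text, selections):
    """Hypothetical stand-in for the selection loop.

    `selections` is a list of (start, end) offsets into `text`.
    """
    words = 0
    for start, end in selections:
        if end > start:   # never search a zero-length range
            words += len(re.findall(r'\w+', text[start:end]))
    return words

text = 'one two three'
print(count_selected_words(text, [(4, 4)]))           # a single empty range
print(count_selected_words(text, [(0, 7)]))           # 'one two'
print(count_selected_words(text, [(0, 3), (8, 8)]))   # 'one' plus an empty range
```

The three calls should report 0, 2 and 1 selected words respectively, so an empty range never contributes to the total.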
So, here is the v1.2 version of my script, split over two posts:

    # encoding=utf-8

    #-------------------------------------------------------------------------
    #                    STATISTICS about the CURRENT file ( v1.2 )
    #-------------------------------------------------------------------------

    from __future__ import print_function    # for Python2 compatibility

    from Npp import *

    import re

    import os, time, datetime

    import ctypes

    from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------
    #  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    def npp_get_statusbar(statusbar_item_number):

        WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
        FindWindowW = ctypes.windll.user32.FindWindowW
        FindWindowExW = ctypes.windll.user32.FindWindowExW
        SendMessageW = ctypes.windll.user32.SendMessageW
        LRESULT = LPARAM
        SendMessageW.restype = LRESULT
        SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
        EnumChildWindows = ctypes.windll.user32.EnumChildWindows
        GetClassNameW = ctypes.windll.user32.GetClassNameW
        create_unicode_buffer = ctypes.create_unicode_buffer

        SBT_OWNERDRAW = 0x1000
        WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

        npp_get_statusbar.STATUSBAR_HANDLE = None

        def get_result_from_statusbar(statusbar_item_number):
            assert statusbar_item_number <= 5
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
            length = retcode & 0xFFFF
            type = (retcode >> 16) & 0xFFFF
            assert (type != SBT_OWNERDRAW)
            text_buffer = create_unicode_buffer(length)
            retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
            retval = '{}'.format(text_buffer[:length])
            return retval

        def EnumCallback(hwnd, lparam):
            curr_class = create_unicode_buffer(256)
            GetClassNameW(hwnd, curr_class, 256)
            if curr_class.value.lower() == "msctls_statusbar32":
                npp_get_statusbar.STATUSBAR_HANDLE = hwnd
                return False    # stop the enumeration
            return True    # continue the enumeration

        npp_hwnd = FindWindowW(u"Notepad++", None)
        EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
        if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
        assert False

    St_bar = npp_get_statusbar(4)    # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    def number(occ):
        global num
        num += 1

    console.show()
    console.clear()

    Start_time = time.time()

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Curr_encoding = str(notepad.getEncoding())

    if Curr_encoding == 'ENC8BIT':
        Curr_encoding = 'ANSI'

    if Curr_encoding == 'COOKIE':
        Curr_encoding = 'UTF-8'

    if Curr_encoding == 'UTF8':
        Curr_encoding = 'UTF-8-BOM'

    if Curr_encoding == 'UCS2BE':
        Curr_encoding = 'UTF-16 BE BOM'

    if Curr_encoding == 'UCS2LE':
        Curr_encoding = 'UTF-16 LE BOM'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        Line_title = 95
    else:
        Line_title = 75

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    File_name = notepad.getCurrentFilename().decode('utf-8')

    if os.path.isfile(File_name) == True:

        Creation_date = time.ctime(os.path.getctime(File_name))

        Modif_date = time.ctime(os.path.getmtime(File_name))

        Size_length = os.path.getsize(File_name)

        RO_flag = 'YES'

        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    RO_editor = 'NO'

    if editor.getReadOnly() == True:
        RO_editor = 'YES'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if notepad.getCurrentView() == 0:
        Curr_view = 'MAIN View'
    else:
        Curr_view = 'SECONDARY view'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Curr_lang = notepad.getCurrentLang()

    Lang_desc = notepad.getLanguageDesc(Curr_lang)

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if editor.getEOLMode() == 0:
        Curr_eol = 'Windows (CR LF)'

    if editor.getEOLMode() == 1:
        Curr_eol = 'Macintosh (CR)'

    if editor.getEOLMode() == 2:
        Curr_eol = 'Unix (LF)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Curr_wrap = 'NO'

    if editor.getWrapMode() == 1:
        Curr_wrap = 'YES'

Continuation in the next post.

guy038
- 
- 
Hi @alan-kilborn and all,

Continuation of version v1.2 of the script:

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    print ('START')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Bytes_length = editor.getLength()

    Total_chars = editor.countCharacters(0, editor.getLength())

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'\n|\r', number)
    Total_EOL = num

    print ('EOL')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'\t|\x20', number)
    Blank_chars = num

    print ('BLANK')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Total_standard = Total_chars - Total_EOL

    True_chars = Total_chars - Total_EOL - Blank_chars

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Curr_encoding == 'ANSI':

        Total_BMP = Total_standard

        Total_1_byte = Total_BMP
        Total_2_bytes = 0
        Total_3_bytes = 0
        Total_4_bytes = 0

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':

        num = 0
        editor.research(r'[\x{0080}-\x{07FF}]', number)
        Total_2_bytes = num

        print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

        num = 0
        editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)
        Total_3_bytes = num

        print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------

        Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) / 3

        Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes

        Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':

        num = 0
        editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)    # ALL BMP chars different from '\r' and '\n'
        Total_2_bytes = num

        Total_4_bytes = Total_standard - Total_2_bytes

        Total_BMP = Total_2_bytes

        Total_1_byte = 0
        Total_3_bytes = 0

        Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

        print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    BOM = 0    # Default ANSI and UTF-8

    if Curr_encoding == 'UTF-8-BOM':
        BOM = 3

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        BOM = 2

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Buffer_length = Bytes_length + BOM

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'\d', number)
    Number_chars = num

    print ('NUMBERS')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'_', number)
    Lowline_chars = num

    print ('LOW_LINES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'\w', number)
    Word_chars = num

    print ('WORDS')

    Letter_chars = Word_chars - Number_chars - Lowline_chars

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'\w+', number)
    Words_total = num

    print ('WORDS+')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Err_regex_non_space = False

    num = 0
    if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
        editor.research(r'\S+', number)
    else:
        try:
            editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
        except RuntimeError:
            Err_regex_non_space = True
    Non_space_count = num

    print ('NON-SPACE+')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Err_regex_sentence = False

    num = 0
    try:
        editor.research(r'(?-s)(?:\A|(?<=[\h\r\n.?!])).+?(?:(?=[.?!](\h|\R|\z))|(?=\R|\z))', number)
    except RuntimeError:
        Err_regex_sentence = True
    Sentence_count = num

    print ('SENTENCES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Err_regex_paragraph = False

    num = 0
    try:
        editor.research(r'(?-s)(?:(?:.[\x{D800}-\x{DFFF}]?)+(?:\r\n|\n|\r))+(?:\r\n|\n|\r){1,}(?:(?:.[\x{D800}-\x{DFFF}]?)+\z)?|(?:.[\x{D800}-\x{DFFF}]?)+\z', number)
    except RuntimeError:
        Err_regex_paragraph = True
    Paragraph_count = num

    print ('PARAGRAPHS')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'\f^(?:\r\n|\n|\r)', number)
    else:
        editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\n|\r)', number)
    Special_empty = num

    num = 0
    editor.research(r'^(?:\r\n|\n|\r)', number)
    Default_empty = num

    Empty_lines = Default_empty - Special_empty

    print ('EMPTY lines')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    if Curr_encoding == 'ANSI':
        editor.research(r'\f^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
    else:
        editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
    Special_blank = num

    num = 0
    editor.research(r'^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
    Default_blank = num

    Blank_lines = Default_blank - Special_blank

    print ('BLANK lines')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Emp_blk_lines = Empty_lines + Blank_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Total_lines = editor.getLineCount()

    num = 0
    editor.research(r'(?-s)^.+\z', number)
    if num == 0:
        Total_lines = Total_lines - 1    # Because LAST line totally EMPTY

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Non_blk_lines = Total_lines - Emp_blk_lines

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Num_sel = editor.getSelections()    # Get ALL selections ( EMPTY or NOT )

    if Num_sel != 0:

        Bytes_count = 0
        Chars_count = 0
        Words_count = 0

        for n in range(Num_sel):

            Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

            Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

            num = 0
            if Bytes_count != 0:
                editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
            Words_count += num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    if Chars_count < 2:
        Txt_chars = ' selected char, '
    else:
        Txt_chars = ' selected chars, '

    if Words_count < 2:
        Txt_words = ' selected word ('
    else:
        Txt_words = ' selected words ('

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range'

    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range'

    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges'

    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)'

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    console.hide()

    line_list = []    # empty list

    Line_end = '\r\n'

    line_list.append ('-' * Line_title)

    line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

    line_list.append ('-' * Line_title + Line_end)

    line_list.append (' FULL File Path : ' + File_name + Line_end)

    if os.path.isfile(File_name) == True:

        line_list.append (' CREATION Date : ' + Creation_date)

        line_list.append (' MODIFICATION Date : ' + Modif_date + Line_end)

        line_list.append (' READ-ONLY flag : ' + RO_flag)

    line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2)

    line_list.append (' Current VIEW : ' + Curr_view + Line_end)

    line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end)

    line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end)

    line_list.append (' Current Line END : ' + Curr_eol + Line_end)

    line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2)

    line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))

    line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))

    line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end)

    line_list.append (' Sum BMP Chars : ' + str(Total_BMP))

    line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end)

    line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard) + Line_end * 2)

    line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL))

    line_list.append (' SPC & TAB Chars : ' + str(Blank_chars))

    line_list.append (' TRUE Chars : ' + str(True_chars) + Line_end)

    line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2)

    if Curr_encoding == 'ANSI':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

    if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
        + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

    if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
        line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

    line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end)

    line_list.append (' BUFFER Length : ' + str(Buffer_length))

    if os.path.isfile(File_name) == True:
        line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2)
    else:
        if Line_end == '\r\n':
            line_list.append (Line_end)

    line_list.append (' NUMBER Chars : ' + str(Number_chars) + '\t(*)')

    line_list.append (' LOW_LINE Chars : ' + str(Lowline_chars))

    line_list.append (' LETTER Chars : ' + str(Letter_chars) + '\t(*)' + Line_end)

    line_list.append (' WORD Chars : ' + str(Word_chars) + '\t(*)' + Line_end * 2)

    line_list.append (' WORDS Count : ' + str(Words_total) + '\t(*)' + Line_end)

    if Err_regex_non_space == False:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\t(**)' + Line_end * 2)
    else:
        line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\t(Caution : a " RuntimeError " occured !)' + Line_end * 2)

    if Err_regex_sentence == False:
        line_list.append (' SENTENCES Count : ' + str(Sentence_count) + '\t(**)' + Line_end)
    else:
        line_list.append (' SENTENCES Count : ' + str(Sentence_count) + '\t(Caution : a " RuntimeError " occured !)' + Line_end)

    if Err_regex_paragraph == False:
        line_list.append (' PARAGRAPHS Count : ' + str(Paragraph_count) + '\t(**)' + Line_end * 2)
    else:
        line_list.append (' PARAGRAPHS Count : ' + str(Paragraph_count) + '\t(Caution : a " RuntimeError " occured !)' + Line_end * 2)

    line_list.append (' True EMPTY lines : ' + str(Empty_lines))

    line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end)

    line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end)

    line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))

    line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2)

    line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + '\r\n' + Line_end)

    line_list.append (' (*) Our BOOST regex engine ignore all WORD, NUMBER and LETTER characters over the BMP and may ignore some others within the BMP !')

    line_list.append (' (**) The results may NOT be very accurate for "technical" or "non-regular" files !' + Line_end)

    notepad.new()

    editor.setText('\r\n'.join(line_list))

    if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
        if Curr_encoding == 'UTF-8':    # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
            notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
            '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

    # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------

Best Regards,

guy038
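The `Total_4_bytes` line in the UTF-8 branch of the script is worth a short check. If n1 to n4 are the counts of 1- to 4-byte characters, then bytes = n1 + 2·n2 + 3·n3 + 4·n4 and chars = n1 + n2 + n3 + n4, so bytes − chars − n2 − 2·n3 = 3·n4. Below is a minimal sketch of that arithmetic, assuming Python 3 (where `len()` counts code points) and using the four test characters from the last comment line of the script. The function name `utf8_breakdown` is made up, and here the 1-byte count includes EOL chars, unlike `Total_1_byte` in the script.

```python
def utf8_breakdown(text):
    """Return the counts of 1-, 2-, 3- and 4-byte characters of text in UTF-8."""
    bytes_length = len(text.encode('utf-8'))
    total_chars = len(text)                                        # code points ( Python 3 )
    two = sum(1 for ch in text if 0x0080 <= ord(ch) <= 0x07FF)     # 2-byte chars
    three = sum(1 for ch in text if 0x0800 <= ord(ch) <= 0xFFFF)   # 3-byte chars
    # The 4-byte count is deduced, not searched, as in the script
    four = (bytes_length - total_chars - two - 2 * three) // 3
    one = total_chars - two - three - four
    return one, two, three, four

# 'A' ( 1 byte ), U+00E9 ( 2 bytes ), U+2600 ( 3 bytes ), U+1D71C ( 4 bytes )
counts = utf8_breakdown(u"A\u00e9\u2600\U0001D71C")
print(counts)
```

The integer division `//` is used in the sketch so that the counts stay integers under Python 3.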
- 
@guy038 said: "But if this unique zero-length selection was on a non-empty line, it would wrongly write…"

I removed the `if Bytes_count != 0:` line and tried to replicate the problem you indicated, but did not see the same issue. Can you provide more detail on your “steps to reproduce”?

Also, this line of your script gave me an error under Python3:

    File_name = notepad.getCurrentFilename().decode('utf-8')

Here’s a way to make it work under Python2 or 3:

    import sys
    python3 = sys.version_info.major == 3
    if python3:
        File_name = notepad.getCurrentFilename()
    else:
        File_name = notepad.getCurrentFilename().decode('utf-8')
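An alternative to the version test, offered only as a sketch: check the type of the returned value rather than the interpreter version. The helper name `to_text` is hypothetical and not part of PythonScript; it relies on `bytes` being an alias of `str` under Python 2.

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings to text; return text values unchanged."""
    if isinstance(value, bytes):
        # Python 2 'str' and Python 3 'bytes' both end up here
        return value.decode(encoding)
    return value

decoded = to_text(b'caf\xc3\xa9.txt')   # byte string, as in Python 2
unchanged = to_text(u'caf\xe9.txt')     # already text, as in Python 3
print(decoded == unchanged)
```

In the script, this would be used as `File_name = to_text(notepad.getCurrentFilename())`, assuming the call returns either a byte string or a text string depending on the PythonScript version.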
- 
Hi, @alan-kilborn and All,

Ah… OK. No problem! So, this script will work with both PythonScript 2 and 3, nice!

Regarding the bug, I can reproduce it very easily! So, we use this part of the script, relative to selections, where I commented out the line `if Bytes_count != 0:` :

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    Num_sel = editor.getSelections()    # Get ALL selections ( EMPTY or NOT )

    if Num_sel != 0:

        Bytes_count = 0
        Chars_count = 0
        Words_count = 0

        for n in range(Num_sel):

            Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

            Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

            num = 0
    #       if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
            Words_count += num

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

Then:

- Open, let’s say, the license.txt file
- Move the caret to the very beginning of the license.txt file (so, before the letter C of the word COPYING)
- Do not make any selection
- Run the script

=> You should see, in the SELECTION(S) line, a non-null number of words:

    SELECTION(S) : 0 selected char, 5822 selected words (0 selected byte) in 1 EMPTY range

- Now, just move the caret one character to the right (so, between the C and the O letters of the word COPYING)
- Do not make any selection, again
- Re-run the script

=> This time, we get, for the SELECTION(S) line, the expected results:

    SELECTION(S) : 0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

At first sight, this bug occurs only when the caret is at the very beginning of the current file! Once you find an explanation (if any!), I will post the new version of the script.

BR

guy038

P.S.: Maybe this bug does not occur with PythonScript 3?
- 
- 
@guy038 said: "You should see, in the SELECTION(S) line, a non-null number of words"

Well, I tried, using both PS3 and PS2, with the license file and the code change `#if Bytes_count != 0:`, and I still see in the output:

    SELECTION(S) : 0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

