Emulation of the "View > Summary" feature with a Python script
-
Hi Alan and all,
Continuation of version v1.1
of the script :# -------------------------------------------------------------------------------------------------------------------------------------------------------------- def number(occ): global num num += 1 console.show() console.clear() Start_time = time.time() # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_encoding = str(notepad.getEncoding()) if Curr_encoding == 'ENC8BIT': Curr_encoding = 'ANSI' if Curr_encoding == 'COOKIE': Curr_encoding = 'UTF-8' if Curr_encoding == 'UTF8': Curr_encoding = 'UTF-8-BOM' if Curr_encoding == 'UCS2BE': Curr_encoding = 'UTF-16 BE BOM' if Curr_encoding == 'UCS2LE': Curr_encoding = 'UTF-16 LE BOM' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': Line_title = 95 else: Line_title = 75 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- File_name = notepad.getCurrentFilename().decode('utf-8') if os.path.isfile(File_name) == True: Creation_date = time.ctime(os.path.getctime(File_name)) Modif_date = time.ctime(os.path.getmtime(File_name)) Size_length = os.path.getsize(File_name) RO_flag = 'YES' if os.access(File_name, os.W_OK): RO_flag = 'NO' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- RO_editor = 'NO' if editor.getReadOnly() == True: RO_editor = 'YES' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if notepad.getCurrentView() == 0: Curr_view = 'MAIN View' else: Curr_view = 'SECONDARY view' # 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_lang = notepad.getCurrentLang() Lang_desc = notepad.getLanguageDesc(Curr_lang) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if editor.getEOLMode() == 0: Curr_eol = 'Windows (CR LF)' if editor.getEOLMode() == 1: Curr_eol = 'Macintosh (CR)' if editor.getEOLMode() == 2: Curr_eol = 'Unix (LF)' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_wrap = 'NO' if editor.getWrapMode() == 1: Curr_wrap = 'YES' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- print ('START') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Bytes_length = editor.getLength() Total_chars = editor.countCharacters(0, editor.getLength()) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\r|\n', number) Total_EOL = num print ('EOL') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Total_standard = Total_chars - Total_EOL # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'ANSI': Total_BMP = Total_standard Total_1_byte = Total_BMP Total_2_bytes = 0 Total_3_bytes = 0 Total_4_bytes = 0 # 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': num = 0 editor.research(r'[\x{0080}-\x{07FF}]', number) Total_2_bytes = num print ('2-BYTES') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number) Total_3_bytes = num print ('3-BYTES') # ----------------------------------------------------------------------------------------------------------------------------- Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) / 3 Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': num = 0 editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number) # ALL BMP chars different from '\r' and '\n' Total_2_bytes = num Total_4_bytes = Total_standard - Total_2_bytes Total_BMP = Total_2_bytes Total_1_byte = 0 Total_3_bytes = 0 Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes print ('2-BYTES') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- BOM = 0 # Default ANSI and UTF-8 if Curr_encoding == 'UTF-8-BOM': BOM = 3 if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': BOM = 2 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Buffer_length 
= Bytes_length + BOM # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\t|\x20', number) Non_blank_chars = Total_standard - num print ('NON-BLANK') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\w+', number) Words_total = num print ('WORDS') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Err_regex = False num = 0 if Curr_encoding == 'ANSI' or Total_4_bytes == 0: editor.research(r'\S+', number) else: try: editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number) except RuntimeError: Err_regex = True Non_space_count = num print ('NON-SPACE') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if Curr_encoding == 'ANSI': editor.research(r'\f^(?:\r\n|\r|\n)', number) else: editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\r|\n)', number) Special_empty = num num = 0 editor.research(r'^(?:\r\n|\r|\n)', number) Default_empty = num Empty_lines = Default_empty - Special_empty print ('EMPTY lines') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if Curr_encoding == 'ANSI': editor.research(r'\f^[\t\x20]+(?:\r\n|\r|\n|\z)', number) else: editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\r|\n|\z)', number) Special_blank = num num = 0 editor.research(r'^[\t\x20]+(?:\r\n|\r|\n|\z)', number) Default_blank = num Blank_lines = Default_blank - Special_blank print ('BLANK lines') # 
-------------------------------------------------------------------------------------------------------------------------------------------------------------- Emp_blk_lines = Empty_lines + Blank_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Total_lines = editor.getLineCount() num = 0 editor.research(r'(?-s)^.+\z', number) if num == 0: Total_lines = Total_lines - 1 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Non_blk_lines = Total_lines - Emp_blk_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Num_sel = editor.getSelections() # Get ALL selections ( EMPTY or NOT ) if Num_sel != 0: Bytes_count = 0 Chars_count = 0 Words_count = 0 for n in range(Num_sel): Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n) Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n)) num = 0 editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n)) Words_count += num # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Bytes_count < 2: Txt_bytes = ' selected byte) in ' else: Txt_bytes = ' selected bytes) in ' if Chars_count < 2: Txt_chars = ' selected char, ' else: Txt_chars = ' selected chars, ' if Words_count < 2: Txt_words = ' selected word (' else: Txt_words = ' selected words (' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Num_sel < 2 and Bytes_count == 0: Txt_ranges = ' EMPTY range' if Num_sel < 2 and Bytes_count > 0: 
Txt_ranges = ' range' if Num_sel > 1 and Bytes_count == 0: Txt_ranges = ' EMPTY ranges' if Num_sel > 1 and Bytes_count > 0: Txt_ranges = ' ranges (EMPTY or NOT)' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- console.hide() line_list = [] # empty list Line_end = '\r\n' line_list.append ('-' * Line_title) line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )') line_list.append ('-' * Line_title + Line_end) line_list.append (' FULL File Path : ' + File_name + Line_end) if os.path.isfile(File_name) == True: line_list.append (' CREATION Date : ' + Creation_date) line_list.append (' MODIFICATION Date : ' + Modif_date + Line_end) line_list.append (' READ-ONLY flag : ' + RO_flag) line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2) line_list.append (' Current VIEW : ' + Curr_view + Line_end) line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end) line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end) line_list.append (' Current Line END : ' + Curr_eol + Line_end) line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2) line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte)) line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes)) line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end) line_list.append (' Sum BMP Chars : ' + str(Total_BMP)) line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end) line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard)) line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + Line_end) line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2) if Curr_encoding == 'ANSI': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)') if 
Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\ + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)') if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)') line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end) line_list.append (' BUFFER Length : ' + str(Buffer_length)) if os.path.isfile(File_name) == True: line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2) else: if Line_end == '\r\n': line_list.append (Line_end) line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + Line_end) line_list.append (' WORDS Count : ' + str(Words_total) + ' (Caution !)' + Line_end) if Err_regex == False: line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + Line_end * 2) else: line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + ' (Caution as " RuntimeError " occured !)' + Line_end * 2) line_list.append (' True EMPTY lines : ' + str(Empty_lines)) line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end) line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end) line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines)) line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2) line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + Line_end) notepad.new() editor.setText('\r\n'.join(line_list)) if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM': if Curr_encoding == 'UTF-8': # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 
'Encoding > Character Set > ...' feature notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \ '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!') # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
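In the UTF-8 branch of the script, the 4-byte count is derived by arithmetic rather than by a search. Since bytes = n1 + 2·n2 + 3·n3 + 4·n4 and chars = n1 + n2 + n3 + n4, the difference bytes − chars equals n2 + 2·n3 + 3·n4, which gives the formula used for `Total_4_bytes`. Here is a minimal sketch of that arithmetic outside Notepad++, assuming Python 3 strings (one item per code point) as a stand-in for `editor.countCharacters()`; `utf8_breakdown` is a hypothetical helper name, not part of the script:

```python
def utf8_breakdown(text):
    """Count 1-, 2-, 3- and 4-byte characters of a text once encoded as UTF-8."""
    bytes_length = len(text.encode('utf-8'))   # stand-in for editor.getLength()
    total_chars = len(text)                    # stand-in for editor.countCharacters()

    # Same ranges as the two regex searches of the script
    two = sum(1 for ch in text if 0x0080 <= ord(ch) <= 0x07FF)
    three = sum(1 for ch in text
                if 0x0800 <= ord(ch) <= 0xFFFF
                and not 0xD800 <= ord(ch) <= 0xDFFF)

    # bytes - chars = two + 2 * three + 3 * four  =>  solve for four
    four = (bytes_length - total_chars - two - 2 * three) // 3

    # Unlike the script, EOL chars are NOT subtracted here: they count as 1-byte chars
    one = total_chars - two - three - four
    return {'1-byte': one, '2-byte': two, '3-byte': three, '4-byte': four}


# One char of each size, followed by CR LF
sample = u'a\u00e9\u2600\U0001D71C\r\n'
print(utf8_breakdown(sample))
```

Note the `//` operator: the script relies on Python 2 integer division with `/`, which would return a float under Python 3.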
Best Regards,
guy038
-
Hello, @alan-kilborn and Python gurus,
I’ve just found a bug when running my script against a “French” file called Numéros (which means Numbers) :-((

The Python section of my script, below, detects whether the current tab is associated with a true file, saved on disk, or refers to a new # file, not saved yet:

    # --------------------------------------------------------------------------------
    File_name = notepad.getCurrentFilename()
    if os.path.isfile(File_name) == True:
        Creation_date = time.ctime(os.path.getctime(File_name))
        Modif_date = time.ctime(os.path.getmtime(File_name))
        Size_length = os.path.getsize(File_name)
        RO_flag = 'YES'
        if os.access(File_name, os.W_OK):
            RO_flag = 'NO'
    # --------------------------------------------------------------------------------

Unfortunately, if the current name contains accented characters, like Numéros, the script wrongly supposes that it’s a new # file! As soon as the file is renamed Numeros, everything is OK again.

So, how can I recognize the filename even if the current file name or the current path contains NON-ASCII characters?

TIA
guy038
-
@guy038 said in Emulation of the "View > Summary" feature with a Python script:
how to recognize the filename even if current file or current path contain NON-ASCII characters ?
Short answer: This is better done with Python3, i.e., PythonScript 3.x. Then things “just work”. :-)
But for Python2 (and PS 2.x), you can make a call to .encode('utf-8') or .decode('utf-8'), depending upon your circumstance (I’m not commenting on your specific code), in order to get what you need.

Basically, if you have a Python2 string (in a variable s) and you want to get a Unicode string (for things like Windows pathnames with non-trivial characters), use s.decode('utf-8'); to go the other way, where you have a Unicode string (in a variable u) and you want a Python2 str, do u.encode('utf-8').
-
Hi, @alan-kilborn,
Many thanks for the tip! I did some Google searches before, but only found some obscure explanations. Right now, though, I tried again with this question:

How to get "os.path.isfile(Filename)" == True: when Filename contains "NON ASCII" chars ?

The first result, an article named “python - UnicodeEncodeError on joining file name”, dated Jan. 05 2010, on the Stack Overflow site, says verbatim, in the middle of the article:

So I would first try filename = filename.decode('utf-8') -- that should allow the os.path.join to work

Now, I won’t bother to re-edit my script with a new version number! In my v1.1 version, above, I just replaced the line:

    File_name = notepad.getCurrentFilename()

with this one:

    File_name = notepad.getCurrentFilename().decode('utf-8')
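The effect of that one-line change can be tried outside Notepad++. This is only a sketch: `to_unicode_path` is a hypothetical helper, and the byte string below is a made-up stand-in for what `notepad.getCurrentFilename()` returns under PythonScript 2.x (UTF-8 encoded bytes):

```python
def to_unicode_path(raw_name):
    """Return a Unicode path, decoding UTF-8 bytes when needed."""
    if isinstance(raw_name, bytes):        # Python 2 'str' or Python 3 'bytes'
        return raw_name.decode('utf-8')
    return raw_name                        # already a Unicode string


# 'Numéros.txt' as UTF-8 bytes: the 'é' occupies two bytes (0xC3 0xA9)
raw = b'C:\\docs\\Num\xc3\xa9ros.txt'
path = to_unicode_path(raw)
print(repr(path))
```

The decoded value is what functions like `os.path.isfile()` need under Python 2 on Windows in order to find a file whose name contains non-ASCII characters.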
BR
guy038
-
-
Hello, @alan-kilborn and All,
Below is the v1.2 version of the Python script for an enhanced Summary feature:
- I split the total number of chars into 3 parts: EOL chars, Space and Tab chars, and True chars ([^\t\x20\r\n])
- I also split the total number of word chars into 3 parts: letter chars, digit chars and low_line chars
- I added a count of the paragraphs (you may adapt the corresponding regex to your needs)
- I added a count of the sentences (you may adapt the corresponding regex to your needs)
- I added some remarks at the end of the summary report, regarding the overall accuracy of some results!
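The two decompositions listed above can be sketched with Python's own `re` module. This is an illustration only: the script itself uses `editor.research()` with the Boost regex engine, whose `\w` and `\d` classes may not match exactly the same characters, and `char_breakdown` is a hypothetical name:

```python
import re


def char_breakdown(text):
    """Split a text into EOL / blank / true chars, and word chars into 3 kinds."""
    total = len(text)
    eol = len(re.findall(r'[\r\n]', text))        # each CR or LF counts separately
    blank = len(re.findall(r'[\t\x20]', text))    # tabulations and spaces
    true_chars = total - eol - blank              # i.e. [^\t\x20\r\n]

    word = len(re.findall(r'\w', text))           # letters + digits + low_lines
    digits = len(re.findall(r'\d', text))
    low_lines = text.count('_')
    letters = word - digits - low_lines

    return {'total': total, 'eol': eol, 'blank': blank, 'true': true_chars,
            'letters': letters, 'digits': digits, 'low_lines': low_lines}


print(char_breakdown('ab_1 c\r\n\td2'))
```

As in the script, the letter count is obtained by subtraction, so it needs only three searches instead of four.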
Now, Alan, I needed to change this part, regarding the selections:

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        num = 0
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

to this one:

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        num = 0
        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

Indeed, if the unique zero-length selection was on a pure empty line, the script wrote, as expected, the message:

0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

But if this unique zero-length selection was on a non-empty line, it would wrongly write, for example:

0 selected char, 568 selected words (0 selected byte) in 1 EMPTY range

given that the whole file contains 568 words.
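The idea behind that change can be sketched without Notepad++. Here, a plain string and a list of (start, end) offsets are assumed stand-ins for the editor buffer and its selections, and `count_selected_words` is a hypothetical helper, not part of the script:

```python
import re


def count_selected_words(text, selections):
    """Count the \\w+ runs inside each (start, end) range, skipping EMPTY ranges."""
    total = 0
    for start, end in selections:
        if end - start == 0:
            continue                     # zero-length selection: nothing to search
        total += len(re.findall(r'\w+', text[start:end]))
    return total


text = 'one two three'
print(count_selected_words(text, [(3, 3)]))            # a single caret, no selection
print(count_selected_words(text, [(0, 7)]))            # 'one two'
print(count_selected_words(text, [(0, 3), (5, 5)]))    # 'one' + an empty range
```

One design difference is worth noting. The sketch tests the length of each range, whereas the posted loop tests the running total `Bytes_count`. So, in the posted loop, an empty range that comes after a non-empty one would still trigger the search. Whether that matters depends on how often mixed multi-selections occur.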
So, here is the v1.2
version of my script, split on two posts :# encoding=utf-8 #------------------------------------------------------------------------- # STATISTICS about the CURRENT file ( v1.2 ) #------------------------------------------------------------------------- from __future__ import print_function # for Python2 compatibility from Npp import * import re import os, time, datetime import ctypes from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT # -------------------------------------------------------------------------------------------------------------------------------------------------------------- # From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- def npp_get_statusbar(statusbar_item_number): WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM) FindWindowW = ctypes.windll.user32.FindWindowW FindWindowExW = ctypes.windll.user32.FindWindowExW SendMessageW = ctypes.windll.user32.SendMessageW LRESULT = LPARAM SendMessageW.restype = LRESULT SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ] EnumChildWindows = ctypes.windll.user32.EnumChildWindows GetClassNameW = ctypes.windll.user32.GetClassNameW create_unicode_buffer = ctypes.create_unicode_buffer SBT_OWNERDRAW = 0x1000 WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13 npp_get_statusbar.STATUSBAR_HANDLE = None def get_result_from_statusbar(statusbar_item_number): assert statusbar_item_number <= 5 retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0) length = retcode & 0xFFFF type = (retcode >> 16) & 0xFFFF assert (type != SBT_OWNERDRAW) text_buffer = create_unicode_buffer(length) retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, 
ctypes.addressof(text_buffer)) retval = '{}'.format(text_buffer[:length]) return retval def EnumCallback(hwnd, lparam): curr_class = create_unicode_buffer(256) GetClassNameW(hwnd, curr_class, 256) if curr_class.value.lower() == "msctls_statusbar32": npp_get_statusbar.STATUSBAR_HANDLE = hwnd return False # stop the enumeration return True # continue the enumeration npp_hwnd = FindWindowW(u"Notepad++", None) EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0) if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number) assert False St_bar = npp_get_statusbar(4) # Zone 4 ( STATUSBARSECTION.UNICODETYPE ) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- def number(occ): global num num += 1 console.show() console.clear() Start_time = time.time() # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_encoding = str(notepad.getEncoding()) if Curr_encoding == 'ENC8BIT': Curr_encoding = 'ANSI' if Curr_encoding == 'COOKIE': Curr_encoding = 'UTF-8' if Curr_encoding == 'UTF8': Curr_encoding = 'UTF-8-BOM' if Curr_encoding == 'UCS2BE': Curr_encoding = 'UTF-16 BE BOM' if Curr_encoding == 'UCS2LE': Curr_encoding = 'UTF-16 LE BOM' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': Line_title = 95 else: Line_title = 75 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- File_name = notepad.getCurrentFilename().decode('utf-8') if os.path.isfile(File_name) == True: Creation_date = time.ctime(os.path.getctime(File_name)) Modif_date = 
time.ctime(os.path.getmtime(File_name)) Size_length = os.path.getsize(File_name) RO_flag = 'YES' if os.access(File_name, os.W_OK): RO_flag = 'NO' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- RO_editor = 'NO' if editor.getReadOnly() == True: RO_editor = 'YES' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if notepad.getCurrentView() == 0: Curr_view = 'MAIN View' else: Curr_view = 'SECONDARY view' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_lang = notepad.getCurrentLang() Lang_desc = notepad.getLanguageDesc(Curr_lang) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if editor.getEOLMode() == 0: Curr_eol = 'Windows (CR LF)' if editor.getEOLMode() == 1: Curr_eol = 'Macintosh (CR)' if editor.getEOLMode() == 2: Curr_eol = 'Unix (LF)' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Curr_wrap = 'NO' if editor.getWrapMode() == 1: Curr_wrap = 'YES'
Continuation on next post
guy038
-
-
Hi @alan-kilborn and all,
Continuation of version v1.2
of the script :# -------------------------------------------------------------------------------------------------------------------------------------------------------------- print ('START') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Bytes_length = editor.getLength() Total_chars = editor.countCharacters(0, editor.getLength()) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\n|\r', number) Total_EOL = num print ('EOL') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\t|\x20', number) Blank_chars = num print ('BLANK') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Total_standard = Total_chars - Total_EOL True_chars = Total_chars - Total_EOL - Blank_chars # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'ANSI': Total_BMP = Total_standard Total_1_byte = Total_BMP Total_2_bytes = 0 Total_3_bytes = 0 Total_4_bytes = 0 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM': num = 0 editor.research(r'[\x{0080}-\x{07FF}]', number) Total_2_bytes = num print ('2-BYTES') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 
editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number) Total_3_bytes = num print ('3-BYTES') # ----------------------------------------------------------------------------------------------------------------------------- Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) / 3 Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': num = 0 editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number) # ALL BMP chars different from '\r' and '\n' Total_2_bytes = num Total_4_bytes = Total_standard - Total_2_bytes Total_BMP = Total_2_bytes Total_1_byte = 0 Total_3_bytes = 0 Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes print ('2-BYTES') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- BOM = 0 # Default ANSI and UTF-8 if Curr_encoding == 'UTF-8-BOM': BOM = 3 if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM': BOM = 2 # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Buffer_length = Bytes_length + BOM # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'\d', number) Number_chars = num print ('NUMBERS') # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 editor.research(r'_', number) Lowline_chars = num print 
('LOW_LINES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'\w', number)
Word_chars = num

print ('WORDS')

Letter_chars = Word_chars - Number_chars - Lowline_chars

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'\w+', number)
Words_total = num

print ('WORDS+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_regex_non_space = False

num = 0
if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_regex_non_space = True

Non_space_count = num

print ('NON-SPACE+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_regex_sentence = False

num = 0
try:
    editor.research(r'(?-s)(?:\A|(?<=[\h\r\n.?!])).+?(?:(?=[.?!](\h|\R|\z))|(?=\R|\z))', number)
except RuntimeError:
    Err_regex_sentence = True

Sentence_count = num

print ('SENTENCES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_regex_paragraph = False

num = 0
try:
    editor.research(r'(?-s)(?:(?:.[\x{D800}-\x{DFFF}]?)+(?:\r\n|\n|\r))+(?:\r\n|\n|\r){1,}(?:(?:.[\x{D800}-\x{DFFF}]?)+\z)?|(?:.[\x{D800}-\x{DFFF}]?)+\z', number)
except RuntimeError:
    Err_regex_paragraph = True

Paragraph_count = num

print ('PARAGRAPHS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^(?:\r\n|\n|\r)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\n|\r)', number)
Special_empty = num

num = 0
editor.research(r'^(?:\r\n|\n|\r)', number)
Default_empty = num

Empty_lines = Default_empty - Special_empty

print ('EMPTY lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
Special_blank = num

num = 0
editor.research(r'^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
Default_blank = num

Blank_lines = Default_blank - Special_blank

print ('BLANK lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Total_lines = editor.getLineCount()

num = 0
editor.research(r'(?-s)^.+\z', number)

if num == 0:
    Total_lines = Total_lines - 1  # Because LAST line totally EMPTY

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0

        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Bytes_count < 2:
    Txt_bytes = ' selected byte) in '
else:
    Txt_bytes = ' selected bytes) in '

if Chars_count < 2:
    Txt_chars = ' selected char, '
else:
    Txt_chars = ' selected chars, '

if Words_count < 2:
    Txt_words = ' selected word ('
else:
    Txt_words = ' selected words ('

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
if Num_sel < 2 and Bytes_count == 0:
    Txt_ranges = ' EMPTY range'

if Num_sel < 2 and Bytes_count > 0:
    Txt_ranges = ' range'

if Num_sel > 1 and Bytes_count == 0:
    Txt_ranges = ' EMPTY ranges'

if Num_sel > 1 and Bytes_count > 0:
    Txt_ranges = ' ranges (EMPTY or NOT)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
console.hide()

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path : ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append (' CREATION Date : ' + Creation_date)

    line_list.append (' MODIFICATION Date : ' + Modif_date + Line_end)

    line_list.append (' READ-ONLY flag : ' + RO_flag)

line_list.append (' READ-ONLY editor : ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW : ' + Curr_view + Line_end)

line_list.append (' Current ENCODING : ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END : ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING : ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars : ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR & LF : ' + str(Total_standard) + Line_end * 2)

line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL))

line_list.append (' SPC & TAB Chars : ' + str(Blank_chars))

line_list.append (' TRUE Chars : ' + str(True_chars) + Line_end)

line_list.append (' TOTAL characters : ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark : ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length : ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK : ' + str(Size_length) + Line_end * 2)
else:
    if Line_end == '\r\n':
        line_list.append (Line_end)

line_list.append (' NUMBER Chars : ' + str(Number_chars) + '\t(*)')

line_list.append (' LOW_LINE Chars : ' + str(Lowline_chars))

line_list.append (' LETTER Chars : ' + str(Letter_chars) + '\t(*)' + Line_end)

line_list.append (' WORD Chars : ' + str(Word_chars) + '\t(*)' + Line_end * 2)

line_list.append (' WORDS Count : ' + str(Words_total) + '\t(*)' + Line_end)

if Err_regex_non_space == False:
    line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

if Err_regex_sentence == False:
    line_list.append (' SENTENCES Count : ' + str(Sentence_count) + '\t(**)' + Line_end)
else:
    line_list.append (' SENTENCES Count : ' + str(Sentence_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end)

if Err_regex_paragraph == False:
    line_list.append (' PARAGRAPHS Count : ' + str(Paragraph_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' PARAGRAPHS Count : ' + str(Paragraph_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

line_list.append (' True EMPTY lines : ' + str(Empty_lines))

line_list.append (' True BLANK lines : ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines : ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + '\r\n' + Line_end)

line_list.append (' (*) Our BOOST regex engine ignores all WORD, NUMBER and LETTER characters over the BMP and may ignore some others within the BMP !')

line_list.append (' (**) The results may NOT be very accurate for "technical" or "non-regular" files !' + Line_end)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature

        notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
Best Regards,
guy038
-
@guy038 said :
But if this unique zero-length selection was on a non-empty line, it would wrongly write…
I removed the if Bytes_count != 0: line and tried to replicate the problem you indicated, but did not see the same issue. Can you provide more detail on your “steps to reproduce”?
Also, this line of your script gave me an error under Python3:
File_name = notepad.getCurrentFilename().decode('utf-8')
Here’s a way to make it work under Python2 or 3:
import sys

python3 = sys.version_info.major == 3

if python3:
    File_name = notepad.getCurrentFilename()
else:
    File_name = notepad.getCurrentFilename().decode('utf-8')
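A variant of the same idea, sketched under the assumption that the only difference between the two plugin versions is the returned type (a byte string under PS2, a text string under PS3), is to test the value itself rather than the interpreter version. Note that to_text is a hypothetical helper name, not something provided by PythonScript:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it only when it is a byte string."""
    # Under Python 2, bytes is an alias of str, so the decode branch is taken;
    # under Python 3, a str value is returned unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Hypothetical usage inside the script (needs the Notepad++ 'notepad' object):
# File_name = to_text(notepad.getCurrentFilename())
```

This keeps the version check out of the script body, at the cost of one extra small function.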
-
Hi, @alan-kilborn and All,
Ah… OK. No problem! So this script will work with both Python Script 2 and 3, nice!
Regarding the bug, I can reproduce it very easily !
So, we use this part of the script, relative to selections, where I commented out the line if Bytes_count != 0: :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0

#        if Bytes_count != 0:
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Then:

- Open, let’s say, the license.txt file
- Move the caret to the very beginning of the license.txt file (so, before the letter C of the word COPYING)
- Do not make any selection
- Run the script

=> You should see, in the SELECTION(S) line, a non-zero number of words:

SELECTION(S) : 0 selected char, 5822 selected words (0 selected byte) in 1 EMPTY range

- Now, just move the caret one character to the right (so, between the letters C and O of the word COPYING)
- Again, do not make any selection
- Re-run the script

=> This time, we get, for the SELECTION(S) line, the expected results:

SELECTION(S) : 0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range
At first sight, this bug occurs only when the caret is at the very beginning of the current file!
Once you find an explanation (if any!), I will post the new version of the script.
BR
guy038
P.S.: Maybe this bug does not occur with Python Script 3?
-
@guy038 said:
You should see, in the SELECTION(S) line, a non-null number of words
Well, I tried with both PS3 and PS2, using the license file and the code change #if Bytes_count != 0: and I still see in the output:
SELECTION(S) : 0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range
-
Hello, @alan-kilborn,
BTW, regarding the bug that you cannot reproduce, did you receive the e-mail I sent you on March 21, with an attached zip archive to possibly reproduce the problem?
BR
guy038
-
@guy038 said in Emulation of the "View > Summary" feature with a Python script:
did you receive my e-mail to you, on March, 21, with an attached zip archive to possibly reproduce the problem ?
Hi Guy. Yes, I did receive it but hadn’t had time to work with it. Because of your prompting, however, I have just finished evaluating it.
I believe that what is happening in the buggy case is that THIS PS bug is manifesting (side note: it’s a bug that I reported). When the caret is at the first location in the file (aka position 0) – which is one of your test cases – then the bug kicks in.
The bug has been fixed, but I don’t believe there has been a release of PS2 since the fix, so only PS3 contains it (which is why I, running PS3, did not see a problem with your script code that did not include the Bytes_count check against 0).
I hope this clears it up.
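For anyone still on PS2, the guard can be sketched outside of Notepad++ as follows. This is only an illustration under stated assumptions: it uses Python's own re module on a plain string instead of editor.research, character offsets instead of Scintilla byte positions, and count_words_in_selections is a hypothetical helper name. The point it shows is that testing each selection for emptiness (start == end), rather than the cumulative Bytes_count, never launches a search on a zero-length range, wherever the caret is:

```python
import re

def count_words_in_selections(text, selections):
    """Count \\w+ matches inside each (start, end) range, skipping empty ranges."""
    total = 0
    for start, end in selections:
        if start == end:
            # Zero-length selection (a bare caret): never search it,
            # even when it sits at position 0
            continue
        total += len(re.findall(r'\w+', text[start:end]))
    return total
```

In the real script, the equivalent test would presumably compare editor.getSelectionNStart(n) with editor.getSelectionNEnd(n) before calling editor.research. Compared with the cumulative check, it may also cover the case where an empty selection at position 0 comes after a non-empty one.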