Emulation of the "View > Summary" feature with a Python script

guy038

Continuation of version v1.2 of the script :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

print ('START')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = editor.getLength()

Total_chars = editor.countCharacters(0, editor.getLength())

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\n|\r', number)

Total_EOL = num

print ('EOL')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\t|\x20', number)

Blank_chars = num

print ('BLANK')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_standard = Total_chars - Total_EOL

True_chars = Total_chars - Total_EOL - Blank_chars

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':

    Total_BMP = Total_standard
    
    Total_1_byte = Total_BMP

    Total_2_bytes = 0

    Total_3_bytes = 0

    Total_4_bytes = 0

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':

    num = 0
    editor.research(r'[\x{0080}-\x{07FF}]', number)

    Total_2_bytes = num

    print ('2-BYTES')

    # --------------------------------------------------------------------------------------------------------------------------------------------------------------

    num = 0
    editor.research(r'[\x{0800}-\x{D7FF}\x{E000}-\x{FFFF}]', number)

    Total_3_bytes = num

    print ('3-BYTES')

    # -----------------------------------------------------------------------------------------------------------------------------

    Total_4_bytes = ( Bytes_length - Total_chars - Total_2_bytes - 2 * Total_3_bytes ) // 3  #  '//' keeps an INTEGER result under Python 2 AND 3

    Total_1_byte = Total_standard - Total_2_bytes - Total_3_bytes - Total_4_bytes

    Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------


if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':

    num = 0
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars different from '\r' and '\n'

    Total_2_bytes = num

    Total_4_bytes = Total_standard - Total_2_bytes

    Total_BMP = Total_2_bytes

    Total_1_byte = 0

    Total_3_bytes = 0

    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

    print ('2-BYTES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\d', number)

Number_chars = num

print ('NUMBERS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'_', number)

Lowline_chars = num

print ('LOW_LINES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w', number)

Word_chars = num

print ('WORDS')

Letter_chars = Word_chars - Number_chars - Lowline_chars

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_total = num

print ('WORDS+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex_non_space = False

num = 0

if Curr_encoding == 'ANSI' or Total_4_bytes == 0:
    editor.research(r'\S+', number)
else:
    try:
        editor.research(r'(?:(?!\s).[\x{D800}-\x{DFFF}]?)+', number)
    except RuntimeError:
        Err_regex_non_space = True

Non_space_count = num

print ('NON-SPACE+')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Err_regex_sentence = False

num = 0

try:
    editor.research(r'(?-s)(?:\A|(?<=[\h\r\n.?!])).+?(?:(?=[.?!](\h|\R|\z))|(?=\R|\z))', number)
except RuntimeError:
    Err_regex_sentence = True

Sentence_count = num

print ('SENTENCES')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
Err_regex_paragraph = False

num = 0

try:
    editor.research(r'(?-s)(?:(?:.[\x{D800}-\x{DFFF}]?)+(?:\r\n|\n|\r))+(?:\r\n|\n|\r){1,}(?:(?:.[\x{D800}-\x{DFFF}]?)+\z)?|(?:.[\x{D800}-\x{DFFF}]?)+\z', number)
except RuntimeError:
    Err_regex_paragraph = True

Paragraph_count = num

print ('PARAGRAPHS')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^(?:\r\n|\n|\r)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^(?:\r\n|\n|\r)', number)

Special_empty = num

num = 0
editor.research(r'^(?:\r\n|\n|\r)', number)

Default_empty = num

Empty_lines = Default_empty - Special_empty

print ('EMPTY lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'\f^[\t\x20]+(?:\r\n|\n|\r|\z)', number)
else:
    editor.research(r'[\f\x{0085}\x{2028}\x{2029}]^[\t\x20]+(?:\r\n|\n|\r|\z)', number)

Special_blank = num

num = 0
editor.research(r'^[\t\x20]+(?:\r\n|\n|\r|\z)', number)

Default_blank = num

Blank_lines = Default_blank - Special_blank

print ('BLANK lines')

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_lines = editor.getLineCount()

num = 0
editor.research(r'(?-s)^.+\z', number)

if num == 0:
    Total_lines = Total_lines - 1  #  Because LAST line totally EMPTY

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
        if Bytes_count != 0:
            editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    if Chars_count < 2:
        Txt_chars = ' selected char, '
    else:
        Txt_chars = ' selected chars, '

    if Words_count < 2:
        Txt_words = ' selected word ('
    else:
        Txt_words = ' selected words ('

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range'

    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range'

    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges'

    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

console.hide()

line_list = []  # empty list

Line_end = '\r\n'

line_list.append ('-' * Line_title)

line_list.append (' ' * int((Line_title - 54) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()) + ' ( ' + str(time.time() - Start_time) + ' )')

line_list.append ('-' * Line_title + Line_end)

line_list.append (' FULL File Path    :  ' + File_name + Line_end)

if os.path.isfile(File_name) == True:

    line_list.append (' CREATION     Date :  ' + Creation_date)

    line_list.append (' MODIFICATION Date :  ' + Modif_date + Line_end)

    line_list.append (' READ-ONLY flag    :  ' + RO_flag)

line_list.append (' READ-ONLY editor  :  ' + RO_editor + Line_end * 2)

line_list.append (' Current VIEW      :  ' + Curr_view + Line_end)

line_list.append (' Current ENCODING  :  ' + Curr_encoding + Line_end)

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')' + Line_end)

line_list.append (' Current Line END  :  ' + Curr_eol + Line_end)

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + Line_end * 2)

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + Line_end)

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + Line_end)

line_list.append (' CHARS w/o CR & LF :  ' + str(Total_standard) + Line_end * 2)

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL))

line_list.append (' SPC & TAB  Chars  :  ' + str(Blank_chars))

line_list.append (' TRUE       Chars  :  ' + str(True_chars) + Line_end)

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + Line_end * 2)

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + Line_end)

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + Line_end * 2)
else:
    if Line_end == '\r\n':
        line_list.append (Line_end)

line_list.append (' NUMBER     Chars  :  ' + str(Number_chars) + '\t(*)')

line_list.append (' LOW_LINE   Chars  :  ' + str(Lowline_chars))

line_list.append (' LETTER     Chars  :  ' + str(Letter_chars) + '\t(*)' + Line_end)

line_list.append (' WORD       Chars  :  ' + str(Word_chars) + '\t(*)' + Line_end * 2)

line_list.append (' WORDS      Count  :  ' + str(Words_total) + '\t(*)' + Line_end)

if Err_regex_non_space == False:
    line_list.append (' NON-SPACE  Count  :  ' + str(Non_space_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' NON-SPACE  Count  :  ' + str(Non_space_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

if Err_regex_sentence == False:
    line_list.append (' SENTENCES  Count  :  ' + str(Sentence_count) + '\t(**)' + Line_end)
else:
    line_list.append (' SENTENCES  Count  :  ' + str(Sentence_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end)

if Err_regex_paragraph == False:
    line_list.append (' PARAGRAPHS Count  :  ' + str(Paragraph_count) + '\t(**)' + Line_end * 2)
else:
    line_list.append (' PARAGRAPHS Count  :  ' + str(Paragraph_count) + '\t(Caution : a " RuntimeError " occurred !)' + Line_end * 2)

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + Line_end)

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + Line_end)

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + Line_end * 2)

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Words_count) + Txt_words + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges + '\r\n' + Line_end)

line_list.append (' (*)   Our BOOST regex engine ignores all WORD, NUMBER and LETTER characters over the BMP and may ignore some others within the BMP !')

line_list.append (' (**)  The results may NOT be very accurate for "technical" or "non-regular" files !' + Line_end)

notepad.new()

editor.setText('\r\n'.join(line_list))

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature

        notepad.messageBox ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
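
As a sanity check of the UTF-8 arithmetic used in the script for Total_4_bytes, here is a minimal sketch in plain Python 3 (no Notepad++ needed). The helper names utf8_class_counts and four_byte_count are made up for this illustration and are not part of the script. It assumes Python 3, where iterating over a str yields whole code points. The idea is that the byte length minus the character count equals n2 + 2*n3 + 3*n4, so once the 2-byte and 3-byte counts are known, the 4-byte count follows.

```python
def utf8_class_counts(text):
    """Return (n1, n2, n3, n4): code points needing 1, 2, 3 or 4 UTF-8 bytes."""
    counts = [0, 0, 0, 0]
    for ch in text:
        counts[len(ch.encode('utf-8')) - 1] += 1
    return tuple(counts)


def four_byte_count(bytes_length, total_chars, two_bytes, three_bytes):
    """Derive the number of 4-byte characters from the other totals.

    bytes_length - total_chars = n2 + 2*n3 + 3*n4, hence the division by 3.
    """
    return (bytes_length - total_chars - two_bytes - 2 * three_bytes) // 3


# One character of each UTF-8 length, followed by a CR + LF line ending
sample = 'A\u00e9\u2600\U0001D71C\r\n'

n1, n2, n3, n4 = utf8_class_counts(sample)
derived = four_byte_count(len(sample.encode('utf-8')), len(sample), n2, n3)

print(n1, n2, n3, n4)   # the two EOL characters are counted among the 1-byte ones
print(derived == n4)
```

With this sample, the byte length is 12 and the character count is 6, so (12 - 6 - 1 - 2) // 3 gives 1, which matches the single 4-byte character.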

Best Regards,

guy038

Alan Kilborn

@guy038 said :

But if this unique zero-length selection was on a non-empty line, it would wrongly write…

I removed the if Bytes_count != 0: and tried to replicate the problem you indicated, but did not see the same issue. Can you provide more detail on your “steps to reproduce”?

Also, this line of your script gave me an error under Python3:

File_name = notepad.getCurrentFilename().decode('utf-8')

Here’s a way to make it work under Python2 or 3:

import sys
python3 = sys.version_info.major == 3
if python3:
    File_name = notepad.getCurrentFilename()
else:
    File_name = notepad.getCurrentFilename().decode('utf-8')
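
Another possible approach, offered only as a sketch, is to look at the type of the returned value instead of the interpreter version. The helper name to_text is hypothetical and not part of PythonScript. It relies on the fact that bytes is an alias of str under Python 2, so the decode step happens only there.

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Stand-ins for what a file-name call might return under each Python version
print(to_text(b'license.txt'))
print(to_text(u'license.txt'))
```

In the script, this would presumably be used as File_name = to_text(notepad.getCurrentFilename()).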

guy038

Hi, @alan-kilborn and All,

Ah… OK. No problem ! So, this script will work with both PythonScript 2 and 3, nice !

Regarding the bug, I can reproduce it very easily !

So, we use the part of the script that deals with selections, with the line if Bytes_count != 0: commented out :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0
    Words_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

        num = 0
#        if Bytes_count != 0:
        editor.research(r'\w+', number, 0, editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
        Words_count += num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Then :

Open, let’s say, the license.txt file
Move the caret to the very beginning of the license.txt file ( so, before the letter C of the word COPYING )
Do not do any selection
Run the script

=> You should see, in the SELECTION(S) line, a non-null number of words :

 SELECTION(S)      :  0 selected char, 5822 selected words (0 selected byte) in 1 EMPTY range

Now, just move the caret one character to the right ( so, between the C and the O letters of the word COPYING )
Do not do any selection, again
Re-run the script

=> This time, we get, for the SELECTION(S) line, the expected results :

 SELECTION(S)      :  0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

At first sight, this bug occurs only when the caret is at the very beginning of the current file !

Once you find an explanation ( if any ! ), I will post the new version of the script.

BR

guy038

P.S. : Maybe this bug does not occur with PythonScript 3 ?

Alan Kilborn

@guy038 said:

You should see, in the SELECTION(S) line, a non-null number of words

Well, I tried, using both PS3 and PS2, with the license file and the code change #if Bytes_count != 0:, and I still see in the output:

SELECTION(S) : 0 selected char, 0 selected word (0 selected byte) in 1 EMPTY range

guy038

Hello, @alan-kilborn,

BTW, regarding the bug that you cannot identify, did you receive my e-mail to you, on March 21, with an attached zip archive to possibly reproduce the problem ?

BR

guy038

Alan Kilborn

@guy038 said in Emulation of the "View > Summary" feature with a Python script:

did you receive my e-mail to you, on March 21, with an attached zip archive to possibly reproduce the problem ?

Hi Guy. Yes, I did receive it but haven’t had time to work with it. Because of your prompting, however, I have just finished evaluating it.

I believe that what is happening in the buggy case is that THIS PS bug is manifesting (side note: it’s a bug that I reported). When the caret is at the first location in the file (aka position 0) – which is one of your test cases – then the bug kicks in.

The bug has been fixed, but I don’t believe there has been a release of PS2 since the fix, so only PS3 contains it (which is why I – running PS3 – did not see a problem with your script code that did not include the Bytes_count check against 0).

I hope this clears it up.
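
If a guard stays in the script for PS2 users, it may be safer to test each selection's own length rather than the running Bytes_count total. Otherwise an empty selection at position 0 that follows a non-empty one would still reach the search. Here is a minimal sketch of that idea in plain Python, using the re module on an ordinary string as a stand-in for editor.research. The helper name count_words_in_ranges is made up, and character offsets stand in for the byte positions the editor really uses.

```python
import re


def count_words_in_ranges(text, ranges):
    """Count \\w+ matches inside each (start, end) range, skipping empty ranges."""
    total = 0
    for start, end in ranges:
        if end <= start:
            # Empty selection (a bare caret): never run the search for it
            continue
        total += len(re.findall(r'\w+', text[start:end]))
    return total


text = 'COPYING terms apply'

print(count_words_in_ranges(text, [(0, 0)]))           # caret at position 0, nothing selected
print(count_words_in_ranges(text, [(0, 7), (8, 8)]))   # one real selection plus one bare caret
```

In the script itself, the equivalent test would presumably be editor.getSelectionNEnd(n) > editor.getSelectionNStart(n), placed just before the editor.research call.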