"Summary" feature improvement

Alan Kilborn

…returns Npp.BUFFERENCODING.COOKIE, although I would have expected Npp.BUFFERENCODING.OEM-US or perhaps just OEM-US…

Rather than relying on a kludged read of the Notepad++ status bar, perhaps you should open a GitHub issue against Notepad++, stating that NPPM_GETBUFFERENCODING doesn’t provide the data you need/expect; maybe this plugin command will be enhanced for you?

guy038

Hello, @alan-kilborn and All,

Alan, first of all, I could have told you that I didn’t want to bother modifying your script and that I’d integrated it, as is. But, the truth is that I’m still a long way from understanding your script and seeing any possible simplifications :-((

So, here is my final version of this Python script, which can be used instead of the View > Summary feature. It contains the @alan-kilborn section, which reads the right part of the status bar to get the current encoding.

I have to split this script into two consecutive posts !

# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.4 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime    # datetime is needed for the report title

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 for the terminating NUL written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

See next post for continuation !

guy038

Hi @alan-kilborn and All,

Continuation of my script :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UCS-2 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UCS-2 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_Standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  #  By default

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_Standard + Total_EOL

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = Total_EOL + Total_1_byte  #  Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count < 2:
        Txt_chars = ' selected char ('

    else:
        Txt_chars = ' selected chars ('


    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '

    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'

    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)

line_list.append (' ' * ((Line_title - 37) // 2) + 'SUMMARY on ' + str(datetime.datetime.now()))  # '//' keeps an integer in Python 2 and 3

line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path    :  ' + File_name + '\n')

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION     Date :  ' + Creation_date)

    line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')

    line_list.append(' READ-ONLY flag    :  ' + RO_flag )

line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')

line_list.append (' Current VIEW      :  ' + Curr_view + '\n')

line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')

line_list.append (' Current Line END  :  ' + Curr_eol + '\n')

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')

line_list.append (' CHARS w/o CR & LF :  ' + str(Total_Standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' * 1 + ' + str(Total_1_byte) + ' * 1b + '\
    + str(Total_2_bytes) + ' * 2b + ' + str(Total_3_bytes) + ' * 3b + ' + str(Total_4_bytes) + ' * 4b)')

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 2b)')

if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 1b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')

line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

notepad.new()

editor.paste()

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UCS-2 BE BOM' and St_bar != 'UCS-2 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------

guy038

Hi, Alan and All,

( Continuation of the previous post )

Now, I’ve come across a problem with the encodings !

Have you ever noticed that, when you decide to re-interpret the present encoding of a file with the Encoding > Character Set > ... feature, there are two possible scenarios ?

A) - The present encoding is a Unicode encoding with a BOM ( Byte Order Mark ). So, either the UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM encoding
B) - The present encoding is ANSI or UTF-8, so without a BOM
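The two scenarios differ only in whether the file starts with a BOM. As a rough illustration in plain Python 3, outside Notepad++, the leading bytes can be checked as follows. The helper name `detect_bom` and the label strings are hypothetical; they only mirror the encoding names and BOM lengths ( 3 or 2 bytes ) used in the script.

```python
import codecs

# BOM signatures, labelled with the encoding names used in this thread.
# UTF-32 is not considered in this sketch.
BOMS = [
    (codecs.BOM_UTF8, 'UTF-8-BOM'),
    (codecs.BOM_UTF16_BE, 'UCS-2 BE BOM'),
    (codecs.BOM_UTF16_LE, 'UCS-2 LE BOM'),
]

def detect_bom(data):
    """Return (label, bom_length) for the leading bytes, or (None, 0) if no BOM."""
    for signature, label in BOMS:
        if data.startswith(signature):
            return label, len(signature)
    return None, 0  # scenario B: ANSI or UTF-8 without a BOM

print(detect_bom(b'\xef\xbb\xbfabc'))  # scenario A, 3-byte BOM
print(detect_bom(b'abc'))              # scenario B, no BOM
```
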

In the first case, whatever the new encoding chosen ( one-byte or two-byte encoding ), the file contents do not change and my script just respects the real encoding of the current file

For example, with a UCS-2 LE BOM encoded file, if I change its encoding with Encoding > Character Set > Western European > OEM-US, my new summary just considers that it’s still a true UCS-2 LE BOM encoded file, leading to a correct summary report !

In the second case, the new encoding chosen does modify the current file contents in the editor window. In addition, the file is then automatically treated as a UTF-8 encoded file, leading to erroneous results in the summary report :-( However, the current file contents, saved on the disk, still seem unchanged !!

For instance :

Open a new tab
Use the Encoding > Convert to UTF-8 feature, if necessary
Enter the four chars Aé☀𝜜, without any line-break, at the end
Save this file as Test-UTF8.txt
Using my script, you get, in a new tab :

---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:50:23.656000
---------------------------------------------------------------------------------------------

 FULL File Path    :  D:\@@\792\Test-UTF8.txt

 CREATION     Date :  Mon Feb  5 16:45:24 2024
 MODIFICATION Date :  Mon Feb  5 15:17:02 2024

 READ-ONLY flag    :  NO
 READ-ONLY editor  :  NO


 Current VIEW      :  MAIN View

 Current ENCODING  :  UTF-8

 Current LANGUAGE  :  TXT  (Normal text file)

 Current Line END  :  Windows (CR LF)

 Current WRAPPING  :  YES


 1-BYTE  Chars     :  1
 2-BYTES Chars     :  1
 3-BYTES Chars     :  1

 Sum BMP Chars     :  3
 4-BYTES Chars     :  1

 CHARS w/o CR & LF :  4
 EOL ( CR or LF )  :  0

 TOTAL characters  :  4


 BYTES Length      :  10 (0 * 1 + 1 * 1b + 1 * 2b + 1 * 3b + 1 * 4b)
 Byte Order Mark   :  0

 BUFFER Length     :  10
 Length on DISK    :  10


 NON-Blank Chars   :  4

 WORDS     Count   :  1 (Caution !)

 NON-SPACE Count   :  1


 True EMPTY lines  :  0
 True BLANK lines  :  0

 EMPTY/BLANK lines :  0

 NON-BLANK lines   :  1
 TOTAL Lines       :  1


 SELECTION(S)      :  0 selected char (0 selected byte) in 1 EMPTY range

Everything is OK ( buffer length and length on disk are identical and the bytes length description shows one char for each number of bytes, without any EOL )
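The byte breakdown of this report can be cross-checked outside Notepad++. The following is only a sketch in plain Python 3 ( where iterating a string yields whole code points ); the function name `utf8_buckets` is hypothetical, and the ranges are the same as those of the script's regexes, except that EOL characters are not counted separately here.

```python
def utf8_buckets(text):
    """Count the characters of `text` by the number of bytes each needs in UTF-8."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            counts[1] += 1      # \x{0000}-\x{007F}
        elif cp <= 0x7FF:
            counts[2] += 1      # \x{0080}-\x{07FF}
        elif cp <= 0xFFFF:
            counts[3] += 1      # \x{0800}-\x{FFFF}
        else:
            counts[4] += 1      # characters over the BMP
    return counts

sample = 'A\u00e9\u2600\U0001d71c'    # the four test characters Aé☀𝜜
counts = utf8_buckets(sample)
total_bytes = sum(size * n for size, n in counts.items())

print(counts)        # {1: 1, 2: 1, 3: 1, 4: 1}
print(total_bytes)   # 10, same as len(sample.encode('utf-8'))
```
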

Now, switch back to the Test-UTF8.txt file
Run the Encoding > Character Set > Western European > OEM-US feature
Re-run my script. This time, in another new tab, you get :

---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:51:16.937000
---------------------------------------------------------------------------------------------

 FULL File Path    :  D:\@@\792\Test-UTF8.txt

 CREATION     Date :  Mon Feb  5 16:45:24 2024
 MODIFICATION Date :  Mon Feb  5 15:17:02 2024

 READ-ONLY flag    :  NO
 READ-ONLY editor  :  NO


 Current VIEW      :  MAIN View

 Current ENCODING  :  UTF-8

 Current LANGUAGE  :  TXT  (Normal text file)

 Current Line END  :  Windows (CR LF)

 Current WRAPPING  :  YES


 1-BYTE  Chars     :  1
 2-BYTES Chars     :  6
 3-BYTES Chars     :  3

 Sum BMP Chars     :  10
 4-BYTES Chars     :  0

 CHARS w/o CR & LF :  10
 EOL ( CR or LF )  :  0

 TOTAL characters  :  10


 BYTES Length      :  22 (0 * 1 + 1 * 1b + 6 * 2b + 3 * 3b + 0 * 4b)
 Byte Order Mark   :  0

 BUFFER Length     :  22
 Length on DISK    :  10


 NON-Blank Chars   :  10

 WORDS     Count   :  2 (Caution !)

 NON-SPACE Count   :  1


 True EMPTY lines  :  0
 True BLANK lines  :  0

 EMPTY/BLANK lines :  0

 NON-BLANK lines   :  1
 TOTAL Lines       :  1


 SELECTION(S)      :  0 selected char (0 selected byte) in 1 EMPTY range

And, at the same time, a prompt displays this warning :

CURRENT file re-interpreted as OEM-US => Possible ERRONEOUS results
So, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script

Indeed, this time, as the file contents are unchanged, the length on DISK is still correct but the BUFFER length is wrong, due to the re-interpretation of the characters by the OEM-US encoding. That’s why I preferred to add this warning at the end of the script !
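The numbers of this second report can be reproduced in plain Python 3. This is only a sketch under two assumptions: that OEM-US corresponds to code page 437 ( Python codec `cp437` ), and that, as the report suggests, the re-interpreted characters are then held in the buffer as UTF-8.

```python
from collections import Counter

original = 'A\u00e9\u2600\U0001d71c'       # the four test characters Aé☀𝜜
on_disk = original.encode('utf-8')         # the 10 bytes saved in Test-UTF8.txt

# Assumption: OEM-US = code page 437. Each byte on disk becomes one character.
reinterpreted = on_disk.decode('cp437')

# Assumption: the buffer stores these characters as UTF-8.
in_buffer = reinterpreted.encode('utf-8')

# Number of characters per UTF-8 byte length, as in the report.
by_length = Counter(len(ch.encode('utf-8')) for ch in reinterpreted)

print(len(on_disk), len(reinterpreted), len(in_buffer))   # 10 10 22
print(by_length[1], by_length[2], by_length[3])           # 1 6 3
```

So the 10 bytes on disk turn into 10 characters ( 1 one-byte, 6 two-byte and 3 three-byte chars ), hence the 22-byte buffer length.
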

Now, do as the prompt says :

Close the Test-UTF8.txt file ( Ctrl + W )
Restore it ( Ctrl + Shift + T )
Again, you get the UTF-8 indication, for the Test-UTF8.txt file, at the right of the status bar
Re-run my script

=> This time, we get again a correct summary, without any prompt !

Alan or other Python gurus, feel free to improve this last version and/or test it on various files to check that all the numbers shown are consistent !

Best Regards,

guy038

guy038

Hi All,

I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…

So, I’m first going to update my last portable version, on my W10 laptop, from v8.5.4 to the v8.6.2 version and I will update my script and redo all the tests

See you later !

BR

guy038

Alan Kilborn

@guy038 said in Improved version of the "Summary" feature, ...:

I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…

:-(

You ought to close out these ancient versions…permanently.

guy038

Hello, @alan-kilborn and All,

I’ve just discovered that, since the v8.0 N++ version, the UCS-2 BE BOM and UCS-2 LE BOM encodings are able to handle all the characters over the BMP. Thus, these encodings were renamed, respectively, as UTF-16 BE BOM and UTF-16 LE BOM !

Note that, with these two encodings, each character with code > \x{FFFF} is built with the surrogate pair mechanism, so with two 16-bit code units ( 4 bytes ). Consequently, the total number of bytes in the buffer = 2 (BOM) + number of chars <= \x{FFFF} x 2 + number of chars > \x{FFFF} x 4

For example, the simple string Aé☀𝜜, without any EOL, in a UTF-16 BE BOM encoded file, is coded with 12 bytes as :


FE FF 00 41 00 E9 26 00 D8 35 DF 1C
----- ----- ----- ----- -----------
 BOM    A     é     ☀       𝜜
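This byte layout and the length formula can be verified with a few lines of plain Python 3 ( a sketch run outside Notepad++; the variable names are only illustrative ) :

```python
import codecs

text = 'A\u00e9\u2600\U0001d71c'    # Aé☀𝜜

# The 'utf-16-be' codec does not emit a BOM by itself, so prepend it explicitly.
encoded = codecs.BOM_UTF16_BE + text.encode('utf-16-be')

bmp = sum(1 for ch in text if ord(ch) <= 0xFFFF)   # chars coded with 2 bytes
astral = len(text) - bmp                           # chars coded with a surrogate pair
expected = 2 + 2 * bmp + 4 * astral                # BOM + BMP chars + chars over the BMP

print(' '.join('{:02X}'.format(b) for b in encoded))   # FE FF 00 41 00 E9 26 00 D8 35 DF 1C
print(len(encoded), expected)                           # 12 12
```
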

So, here is my final and updated version of the script, which works in all versions since the v8.0 one !

# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.5 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF
        type = (retcode >> 16) & 0xFFFF
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)  # + 1 for the terminating NUL written by SB_GETTEXTW
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )

See next post for continuation !

guy038

Hi, @alan-kilborn and All,

Continuation of the script :

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF-8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UTF-16 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UTF-16 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Line_title = 95
else:
    Line_title = 75

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))

    Modif_date = time.ctime(os.path.getmtime(File_name))

    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'

    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'

if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()

Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'

if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'[^\r\n]', number)

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP chars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  #  By default

if Curr_encoding != 'ANSI':
    Total_4_bytes = Total_standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_EOL + Total_standard

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Curr_encoding == 'ANSI':
    Bytes_length = Total_EOL + Total_1_byte

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  #  Default ANSI and UTF-8

if Curr_encoding == 'UTF-8-BOM':
    BOM = 3

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0

if Curr_encoding == 'ANSI':
    editor.research(r'((?!\s).)+', number)
else:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if Curr_encoding == 'ANSI':
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)

        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Chars_count < 2:
        Txt_chars = ' selected char ('

    else:
        Txt_chars = ' selected chars ('


    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '

    else:
        Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'

    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)

line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path    :  ' + File_name + '\n')

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION     Date :  ' + Creation_date)

    line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')

    line_list.append(' READ-ONLY flag    :  ' + RO_flag )

line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')

line_list.append (' Current VIEW      :  ' + Curr_view + '\n')

line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')

line_list.append (' Current Line END  :  ' + Curr_eol + '\n')

line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')

line_list.append (' CHARS w/o CR & LF :  ' + str(Total_standard))

line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')

if Curr_encoding == 'ANSI':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')

if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')

if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')

line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')

line_list.append (' BUFFER Length     :  ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')

line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')

line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))

line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

notepad.new()

editor.paste()

editor.copyText('')

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':

    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
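
The byte-length arithmetic near the end of the script (the UTF-8 and UTF-16 branches of Bytes_length) can be cross-checked outside Notepad++. The following is only a minimal sketch, assuming plain Python 3; the helper names utf8_class_counts, utf8_length and utf16_length are hypothetical and are not part of the script. It sorts each non-EOL character by its UTF-8 size and compares the weighted sums with what Python's own encoders report:

```python
# Stand-alone sketch for Python 3 (no Notepad++ or PythonScript needed).
# All helper names below are hypothetical, chosen for illustration only.

def utf8_class_counts(text):
    """Return (eol_count, counts) where counts[n] is the number of
    non-EOL characters that need n bytes in UTF-8."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    eol = 0
    for ch in text:
        if ch in '\r\n':
            eol += 1
        else:
            counts[len(ch.encode('utf-8'))] += 1
    return eol, counts

def utf8_length(eol, counts):
    # Same weighting as the script's UTF-8 branch of Bytes_length
    return eol + counts[1] + 2 * counts[2] + 3 * counts[3] + 4 * counts[4]

def utf16_length(eol, counts):
    # In UTF-16, every BMP character takes 2 bytes and every other one takes 4
    bmp = counts[1] + counts[2] + counts[3]
    return 2 * eol + 2 * bmp + 4 * counts[4]

# One character from each UTF-8 length class, followed by CR LF
sample = 'A\u00e9\u2600\U0001d71c\r\n'

eol, counts = utf8_class_counts(sample)
print(utf8_length(eol, counts), len(sample.encode('utf-8')))       # 12 12
print(utf16_length(eol, counts), len(sample.encode('utf-16-le')))  # 14 14
```

Neither figure includes the BOM, which the script adds separately when it computes Buffer_length.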

If you’re still working or doing tests with an N++ version prior to v8.0 :

First, replace every UTF-16 sub-string with UCS-2 in the Python script
And, of course, do not forget to remove any character above \x{FFFF} from your UCS-2 BE/LE BOM encoded files before using this script
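
To locate the characters above \x{FFFF} mentioned here, something like the following may help. It is a rough sketch, assuming the file contents are already available as a Python 3 string outside Notepad++; the function name non_bmp_chars is hypothetical:

```python
# Stand-alone sketch for Python 3; non_bmp_chars is a hypothetical helper name.

def non_bmp_chars(text):
    """Return (index, character) pairs for every character above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]

sample = 'A\u00e9\u2600\U0001d71c'   # only the last character lies outside the BMP

for index, ch in non_bmp_chars(sample):
    print(index, 'U+%04X' % ord(ch))
```

An empty result means that the text contains only BMP characters.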

Note that the encoding problem described two posts ago, which occurs when a file without a BOM is re-interpreted with an Encoding > Character Set > ... encoding, still remains. Thus, the warning prompt is still present at the end of this final version !

Now, I’m going to update an old post where I explained the poor performance of the present summary feature. I’ll take the opportunity to include the instructions for understanding this improved script !

Best Regards,

guy038

Alan Kilborn

@guy038

You have this line in your script:

line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

I would suggest changing it to:

line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

This is because, without the int, under Python3 we see the following error:

TypeError: can't multiply sequence by non-int of type 'float'
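
The reason is that, in Python 3, the / operator always returns a float, whereas under Python 2 it floors the result when both operands are integers. As a small sketch, runnable under Python 3 (the name centre_padding and the width 95 are assumptions for illustration; the real Line_title value is set earlier in the script), floor division with // is an alternative to wrapping the expression in int():

```python
# Sketch for Python 3; centre_padding and the width 95 are hypothetical.

def centre_padding(line_width, text_width):
    """Number of leading spaces needed to centre text_width columns in line_width."""
    return (line_width - text_width) // 2   # '//' yields an int on Python 2 and 3

Line_title = 95   # assumed value, for illustration only

print(type((Line_title - 37) / 2))       # a float under Python 3
print(int((Line_title - 37) / 2))        # 29
print(centre_padding(Line_title, 37))    # 29

header = ' ' * centre_padding(Line_title, 37) + 'SUMMARY on ...'
```

For non-negative values, int(x / 2) and x // 2 give the same result, so either form works here.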

guy038

Hi, @alan-kilborn and All,

Just follow this link to find out why I decided to improve the View > Summary feature and to get the latest version of the Python script, which gives us a decent and exact Summary feature !

https://community.notepad-plus-plus.org/post/92794 ( 4 posts )

BR

guy038

Alan Kilborn

@guy038 said:

Just follow this link

I’m MIGHTY confused as to why you felt the need to reanimate a several-years-old topic/thread to continue discussing what you dedicated this current thread to…
Why not just keep talking here?

guy038

Hello, @alan-kilborn,

Sorry to have confused you. I’ll try to explain why I wanted to continue on the other thread !

Firstly, I wanted to show where my script came from and why : the whole logic of the View > Summary feature needed to be completely rebuilt :-((
Secondly, I wanted to update these old posts. Indeed, at that time, the v7.9.1 N++ version had just been released. So, I recently did some tests to verify whether, following the encoding improvements of the v8.0 version, the global logic of the summary had been improved. Unfortunately, the View > Summary feature still gives wrong results, especially when the current file is a UTF-16 BE BOM or UTF-16 LE BOM encoded file :-((
Thus, it seemed obvious to me to continue on that thread and add the successive versions of my script !

Now, I realize that I could have stayed with this new thread and put a link to my initial post to help people understand the reasons for this Python script !

So, unless you’re terribly upset by my decision ( reversing it would need a lot of modifications ), I suppose that I’ll go on posting any new versions of my script on the other thread !

To make things clearer, I could simply rename this present thread as Summary feature improvement and rename the other thread as Emulation of the "Summary" feature with Python script

Alan, what do you think of ?

Best Regards,

guy038

Alan Kilborn

@guy038 said in Improved version of the "Summary" feature, ...:

what do you think of ?

I wouldn’t bother trying to rename things at this point.
It’s no problem simply because I was confused (that’s MY problem). :-)
Carry on… :-)