"Summary" feature improvement
-
@guy038 said:
Insert the Creation Date and the Modification Date at the beginning of the results
import os, time

cfn = notepad.getCurrentFilename()

ti_m = os.path.getmtime(cfn)
m_ti = time.ctime(ti_m)
mt_obj = time.strptime(m_ti)
mts = time.strftime("%Y-%m-%d %H:%M:%S", mt_obj)
print('Modified time:', mts)

ti_c = os.path.getctime(cfn)
c_ti = time.ctime(ti_c)
ct_obj = time.strptime(c_ti)
cts = time.strftime("%Y-%m-%d %H:%M:%S", ct_obj)
print('Created time:', cts)
-
Of course, you’ll want to make sure that what you are dealing with for cfn is actually a real file on disk and not a soft-named file, e.g. a new 4 file. If you don’t do that kind of check, the OS won’t know what to make of new 4 as a pathname.
Do a check for that with:
if os.path.isfile(cfn): ...
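The two steps above (check that the path is a real file, then format its timestamps) can be combined in one helper. This is only a sketch: the name file_times is hypothetical, it formats via time.localtime instead of the ctime/strptime round trip, and os.path.getctime returns the creation time on Windows but the metadata-change time on most Unix systems.

```python
import os
import time

def file_times(path):
    """Return (created, modified) as 'YYYY-MM-DD HH:MM:SS' strings.

    Returns None when `path` is not a real file on disk, e.g. an
    unsaved 'new 4' tab. `file_times` is a hypothetical helper name.
    """
    if not os.path.isfile(path):
        return None
    fmt = "%Y-%m-%d %H:%M:%S"
    # getctime: creation time on Windows, metadata-change time on Unix
    created = time.strftime(fmt, time.localtime(os.path.getctime(path)))
    modified = time.strftime(fmt, time.localtime(os.path.getmtime(path)))
    return created, modified
```

Inside PythonScript, the argument would be notepad.getCurrentFilename(), as in the fragment above.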
-
@guy038 said:
my poor first version
Well, I really like how you used the number function as the real workhorse of the script!
-
@guy038 said :
X characters (Y bytes) in Z ranges)
“Z” (number of selection(s) ) is perhaps easiest, so let’s deal with it first:
z = editor.getSelections()
However, if you don’t want to count any empty selections (i.e., caret(s) ), then:
z = 0
for n in range(editor.getSelections()):
    if editor.getSelectionNStart(n) != editor.getSelectionNEnd(n):
        z += 1
“Y” (number of bytes in selection(s) ) is next-easiest, and could be calculated as:
y = 0
for n in range(editor.getSelections()):
    y += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
“X” (number of characters in selection(s) ) could be done with:
x = 0
for n in range(editor.getSelections()):
    x += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
The caveat here is that it lets Scintilla determine what counts as a character; should be fine?
Scintilla seems to count line-ending characters as one for \r and one for \n; maybe that is not wanted here? This script fragment would eliminate line-ending characters from the character count:
import re  # needed for re.findall below

x = 0
for n in range(editor.getSelections()):
    x += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
    t = editor.getTextRange(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
    count_of_line_ending_chars = len(re.findall(r'[\r\n]', t))
    x -= count_of_line_ending_chars
EDIT: Looking more at your script listing, I think you probably want finer control over what a character is than just “asking Scintilla”…hmmm…
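To experiment with the X / Y / Z arithmetic outside Notepad++, the same counting can be sketched on a plain UTF-8 byte string. Everything here is an assumption for illustration: selection_stats is a hypothetical name, the document is modelled as Python 3 bytes, each range is a (start, end) pair of byte offsets falling on character boundaries, and a character is taken to be one Unicode code point, which may or may not match what Scintilla reports.

```python
def selection_stats(buf, ranges):
    """Return (chars, nbytes, non_empty) for byte ranges in a UTF-8 buffer.

    buf    : bytes holding the whole document, UTF-8 encoded (assumption)
    ranges : list of (start, end) byte offsets, one per selection
    Line-ending characters are left out of the character count and
    empty selections (bare carets) are not counted as ranges.
    """
    chars = 0
    nbytes = 0
    non_empty = 0
    for start, end in ranges:
        if start == end:
            continue  # a caret without any selected text
        text = buf[start:end].decode("utf-8")
        chars += sum(1 for c in text if c not in "\r\n")
        nbytes += end - start
        non_empty += 1
    return chars, nbytes, non_empty
```

In the real script, the offsets would come from editor.getSelectionNStart(n) and editor.getSelectionNEnd(n).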
-
Hello, @alan-kilborn and All,
First of all, many thanks, Alan, for your additional posts regarding the way to insert the creation and modification dates, as well as the selection ranges.
Luckily, after some Google searching, I was able to get my own version to output the full file name, its size and the two dates :-))
Thus, I am going to study your last post, regarding the selection ranges!
In the meantime, here is my second version of the Python script, which outputs a new summary of the current file:
#----------------------------------------------------------------------------
#    STATISTICS about the CURRENT file ( v0.1 )
#----------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

import re
import os, time

# ---------------------------------------------------------------------------
def number(occ):
    # Callback for editor.research(): counts one more match
    global num
    num = num + 1

# ---------------------------------------------------------------------------
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# ---------------------------------------------------------------------------
File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:
    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)

Total_2_bytes = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# ---------------------------------------------------------------------------
Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_Standard = num

# ---------------------------------------------------------------------------
Total_4_bytes = 0  # By default
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# ---------------------------------------------------------------------------
Total_chars = Total_Standard + Total_EOL

# ---------------------------------------------------------------------------
Bytes_length = Total_EOL + Total_1_byte  # Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# ---------------------------------------------------------------------------
BOM = 0  # Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# ---------------------------------------------------------------------------
Buffer_length = Bytes_length + BOM

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\w+', number)

Words_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)

Non_space_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# ---------------------------------------------------------------------------
Emp_blk_lines = Empty_lines + Blank_lines

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# ---------------------------------------------------------------------------
Non_blk_lines = Total_lines - Emp_blk_lines

# ---------------------------------------------------------------------------
console.show()
console.clear()

print ('-' * Line_title)
print (' ' * ((Line_title - 7) // 2) + 'Summary')  # '//' keeps the count an integer on Python 3 too
print ('-' * Line_title,'\n')

print (' Full File Path :' , File_name, '\n')

if os.path.isfile(File_name) == True:
    print (' Creation Date :' , Creation_date)
    print (' Modification Date :' , Modif_date,'\n\n')
else:
    print('\n')

print (' 1-Byte Chars : ', Total_1_byte)
print (' 2-Bytes Chars : ', Total_2_bytes)
print (' 3-Bytes Chars : ', Total_3_bytes, '\n')

print (' Sum BMP Chars : ', Total_BMP)
print (' 4-Bytes Chars : ', Total_4_bytes, '\n')

print (' Chars w/o CR & LF : ', Total_Standard)
print (' EOL ( CR or LF ) : ', Total_EOL,'\n')

print (' TOTAL characters : ', Total_chars, '\n\n')

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    print (' BYTES Length : ', Bytes_length, '( 1 *', Total_1_byte, '+ 1 *', Total_EOL , '+ 2 *', Total_2_bytes, '+ 3 *', Total_3_bytes, '+ 4 *', Total_4_bytes, ')')

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    print (' BYTES Length : ', Bytes_length, '( 2 *', Total_chars, ')')

if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    print (' BYTES Length : ', Bytes_length, '( 1 *', Total_chars, ')')

print (' Byte Order Mark : ', BOM, '\n')

print (' BUFFER Length : ', Buffer_length)

if os.path.isfile(File_name) == True:
    print (' Length on disk : ', Size_length,'\n\n')
else:
    print ('\n\n')

print (' NON-Blank Chars : ', Non_blank_chars,'\n')

print (' Words Count : ', Words_count, '\n')

print (' NON-Space Count : ', Non_space_count,'\n\n')

print (' True EMPTY lines : ', Empty_lines)
print (' True BLANK lines : ', Blank_lines, '\n')

print (' EMPTY/BLANK lines : ', Emp_blk_lines,'\n')

print (' NON-BLANK lines : ', Non_blk_lines)
print (' TOTAL Lines : ', Total_lines,'\n\n')

print (' Selection(s) : ')  # X characters (Y bytes) in Z ranges)

# ---------------------------------------------------------------------------
Just test it! I may have missed some points!
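One way to test the byte arithmetic of the script outside Notepad++ is to classify code points by their UTF-8 length and compare the weighted sum with Python's own encoder. This is a minimal sketch, assuming Python 3 (where a str is a sequence of code points); the names utf8_byte_classes and utf8_length are hypothetical. Unlike the script, it counts \r and \n in the 1-byte class instead of keeping a separate EOL total.

```python
def utf8_byte_classes(text):
    """Count the characters of `text` by UTF-8 encoded length (1 to 4 bytes)."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:
            counts[1] += 1   # ASCII, including \r and \n
        elif cp <= 0x7FF:
            counts[2] += 1
        elif cp <= 0xFFFF:
            counts[3] += 1   # rest of the BMP
        else:
            counts[4] += 1   # characters outside the BMP
    return counts

def utf8_length(counts):
    """Total byte length implied by the per-class counts, without any BOM."""
    return sum(size * n for size, n in counts.items())
```

For any text without lone surrogates, utf8_length(utf8_byte_classes(text)) should equal len(text.encode("utf-8")), which is the same sum the script reports as BYTES Length for UTF-8 files.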
Now, in my previous post, I said :
In addition, I presently just output the results on the python console and I, of course, would prefer to paste all these results directly on the clipboard !
Can someone help me with this specific point?
Many thanks in advance!
Best Regards,
guy038
-
@guy038 said in Improved version of the "Summary" feature, ...:
I, of course, would prefer to paste all these results directly on the clipboard !
If you mean you want to put the text you’ve created into the clipboard, so you can later paste it, editor.copyText(stringVar) will put the contents of stringVar into the Windows clipboard.
-
Hi, @alan-kilborn, @peterjones and All,
Peter, thanks for the tip! I understand that I could replace every print line this way. For example, the line:
print (' TOTAL Lines : ', Total_lines,'\n\n')
by these two:
l = ' TOTAL Lines : ' + str(Total_lines) + '\n\n'
editor.copyText(l)
But I was thinking about a way to store all the results first in, let’s say, the text variable and then use a single editor.copyText(text) command! Might that be quicker to execute?
BR
guy038
-
Hi, @peterjones, and All,
I think I understood the way to do it :
Just replace, for instance, these 3 lines:
print (' True EMPTY lines : ', Empty_lines)
print (' True BLANK lines : ', Blank_lines, '\n')
print (' EMPTY/BLANK lines : ', Emp_blk_lines,'\n')
by these 4 lines:
t = ' True EMPTY lines : ' + str(Empty_lines) + '\n'
t = t + ' True BLANK lines : ' + str(Blank_lines) + '\n\n'
t = t + ' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n\n'
editor.copyText(t)
Is that right?
BR
guy038
-
@guy038 ,
Yes, I was thinking something along those lines.
-
Or… you could do a list of lines, adding a line to the list each time it is calculated, then join the lines together into a string and copy that to the clipboard:
line_list = []  # empty list
line_list.append(' True EMPTY lines : ' + str(Empty_lines))
line_list.append(' True BLANK lines : ' + str(Blank_lines))
line_list.append(' EMPTY/BLANK lines : ' + str(Emp_blk_lines))
editor.copyText('\r\n'.join(line_list))
The line endings don’t have to match the current document, since they are going to the clipboard.
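The list-then-join idea can also take care of label alignment, so each line does not need hand-padded strings. A minimal sketch under stated assumptions: build_summary is a hypothetical helper, and stats is a list of (label, value) pairs collected as each figure is calculated.

```python
def build_summary(stats, eol="\r\n"):
    """Join (label, value) pairs into a single aligned text block.

    Labels are left-justified to the width of the longest one, so the
    ' : ' separators line up. Returns '' for an empty list.
    """
    if not stats:
        return ""
    width = max(len(label) for label, _ in stats)
    lines = ["{0:<{1}} : {2}".format(label, width, value)
             for label, value in stats]
    return eol.join(lines)
```

Inside PythonScript, the returned string could then go to the clipboard with a single editor.copyText(...) call, as shown above.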
-
Hi, All,
I’m off, tomorrow, with a dozen friends from the ski club, for three days, full board, at the ‘Villages Vacances Familles’ in Monetier-les-bains, in one of France’s great ski areas, near Briançon!
The weather forecast for Sunday, Monday and Tuesday is fine with sunshine most of the time, but cloudy on Tuesday, and the snow, which is a bit hard, is on the cards: from 116 cm on the summits, at 2800 m down to 2100 m, and with 40 cm in the resorts.
To whet your appetite, here’s an interactive piste map. Once you’ve switched to full screen, you can even ignore certain markers (pistes, ski lifts or other details) by clicking on the icon at the top left!
https://www.serre-chevalier.com/en/ski-area/interactive-trail-map
So, see you on next Tuesday evening or Wednesday !
BR
guy038
You may also have a look at this site:
-
@guy038 said in Improved version of the "Summary" feature, ...:
I’m off … with a dozen friends from the ski club, for three days
Oh, great… starts writing an interesting script, publishes it here partially-finished, then goes on vacation… :-(
Enjoy, but I hope the script doesn’t get forgotten about!
-
Hello, @alan-kilborn and All,
Alan, rest assured, I’m back. Our stay went well, although there was some very hard snow, following rain in the previous days. And, on the first day, on the Luc Alphand black run, at Chante-Merle, I was not very proud! Basically, it was OK between 10.00 am and 2.30 pm max!
Now, let’s get back to our beloved editor !
So, here is the third (and still incomplete) version of my Python script, which improves the View > Summary feature:
# encoding=utf-8

#----------------------------------------------------------------------------
#    STATISTICS about the CURRENT file ( v0.2 )
#----------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

import re
import os, time

# ---------------------------------------------------------------------------
def number(occ):
    # Callback for editor.research(): counts one more match
    global num
    num += 1

# ---------------------------------------------------------------------------
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# ---------------------------------------------------------------------------
File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:
    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

# ---------------------------------------------------------------------------
Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UCS-2 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UCS-2 LE BOM'

# ---------------------------------------------------------------------------
Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)

Total_2_bytes = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# ---------------------------------------------------------------------------
Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_Standard = num

# ---------------------------------------------------------------------------
Total_4_bytes = 0  # By default
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# ---------------------------------------------------------------------------
Total_chars = Total_Standard + Total_EOL

# ---------------------------------------------------------------------------
Bytes_length = Total_EOL + Total_1_byte  # Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# ---------------------------------------------------------------------------
BOM = 0  # Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# ---------------------------------------------------------------------------
Buffer_length = Bytes_length + BOM

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\w+', number)

Words_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)

Non_space_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# ---------------------------------------------------------------------------
Emp_blk_lines = Empty_lines + Blank_lines

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# ---------------------------------------------------------------------------
Non_blk_lines = Total_lines - Emp_blk_lines

# ---------------------------------------------------------------------------
Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

if Num_sel != 0:
    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):
        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

    if Chars_count < 2:
        Txt_chars = ' selected char ('
    else:
        Txt_chars = ' selected chars ('

    if Bytes_count < 2:
        Txt_bytes = ' selected byte) in '
    else:
        Txt_bytes = ' selected bytes) in '

    if Num_sel < 2 and Bytes_count == 0:
        Txt_ranges = ' EMPTY range\n'

    if Num_sel < 2 and Bytes_count > 0:
        Txt_ranges = ' range\n'

    if Num_sel > 1 and Bytes_count == 0:
        Txt_ranges = ' EMPTY ranges\n'

    if Num_sel > 1 and Bytes_count > 0:
        Txt_ranges = ' ranges (EMPTY or NOT)\n'

# ----Aé☀𝜜----------------------------------------------------------------------
line_list = []  # empty list

line_list.append ('-' * Line_title)
line_list.append (' ' * ((Line_title - 7) // 2) + 'Summary')  # '//' keeps the count an integer on Python 3 too
line_list.append ('-' * Line_title +'\n')

line_list.append (' Full File Path : ' + File_name + '\n')

if os.path.isfile(File_name) == True:
    line_list.append(' Creation Date : ' + Creation_date)
    line_list.append(' Modification Date : ' + Modif_date + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' Current ENCODING : ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n\n')

line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))
line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))
line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars : ' + str(Total_BMP))
line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n')

line_list.append (' Chars w/o CR & LF : ' + str(Total_Standard))
line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n')

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (1 * ' + str(Total_1_byte) + ' + 1 * ' + str(Total_EOL) + ' + 2 * ' + str(Total_2_bytes) + ' + 3 * ' + str(Total_3_bytes) + ' + 4 * ' + str(Total_4_bytes) + ')')

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (2 * ' + str(Total_chars) + ')')

if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (1 * ' + str(Total_chars) + ')')

line_list.append (' Byte Order Mark : '+ str(BOM) + '\n')

line_list.append (' BUFFER Length : '+ str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK : '+ str(Size_length) + '\n\n')
else:
    line_list.append ('\n\n')

line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n')

line_list.append (' Words Count : ' + str(Words_count) + '\n')

line_list.append (' NON-Space Count : ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines : ' + str(Empty_lines))
line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))
line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

# ---------------------------------------------------------------------------
Now, two points are still not clear:
- A To get the current language of the current file, I use the notepad.getCurrentLang() function. But I also saw the notepad.getLangType() function, which seems to return the same string!? Which function would be best for this specific script?
- B For the current encoding, I would like to get the zone at the right of the status bar. I did see the notepad.setStatusBar(statusBarSection, ...) function, but I don’t see the counterpart notepad.getStatusBar(statusBarSection)!
For instance, if you have a UTF-8 default file, typing notepad.getEncoding() on the Python console returns Npp.BUFFERENCODING.COOKIE
Now, if you decide to change the way this file’s bytes are interpreted with, for example, the Encoding > Character Set > Western European > OEM-US encoding, typing notepad.getEncoding() again on the Python console still returns Npp.BUFFERENCODING.COOKIE, although I would have expected Npp.BUFFERENCODING.OEM-US or perhaps just OEM-US!
TIA for any hint!
Best Regards,
guy038
-
@guy038 said:
notepad.getCurrentLang() versus notepad.getLangType()
If you’re interested in the currently active tab, you can use either one.
If you wanted a tab that is not the active one, notepad.getLangType() allows you to specify a buffer id for that other tab.
but I don’t see the counterpart notepad.getStatusBar(statusBarSection)
My script for that is HERE.
…still returns Npp.BUFFERENCODING.COOKIE, although I would have expected …
Some good “cookie” discussion is HERE, and it also includes the get-status-bar technique.
Some more good discussions about encoding are found in these threads:
- https://community.notepad-plus-plus.org/topic/25175/how-can-i-get-the-encoding-of-current-document
- https://community.notepad-plus-plus.org/topic/24560/new-plugin-multireplace
Quoting @Coises from one of those threads:
Notepad++ doesn’t handle character sets internally the way it appears to a user. For editing, everything is either in the user default code page (“ANSI”) or in UTF-8. Whenever you’re not using the default code page, Notepad++ uses UTF-8 (so it is possible to enter and see characters that aren’t in the code page). Translation to other code pages is done when reading and writing the file.
This could be the reason you aren’t obtaining the results you expect. Now, Notepad++ could, if it wanted to, provide the info that it itself knows about (from the status bar). For whatever reason, it chooses not to (probably lack of developer attention to this detail).
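The point in the quote (UTF-8 in the editing buffer, translation to the code page only when reading and writing) can be illustrated in plain Python. This is a sketch, not Notepad++ behaviour verified byte for byte: encoded_sizes is a hypothetical helper, and it assumes that Python's cp437 codec corresponds to the OEM-US character set.

```python
def encoded_sizes(text, disk_codec):
    """Return (bytes in a UTF-8 buffer, bytes once written with disk_codec)."""
    buffer_bytes = text.encode("utf-8")    # what the editor works on
    disk_bytes = text.encode(disk_codec)   # what a save would write
    return len(buffer_bytes), len(disk_bytes)

# 'café': the é takes 2 bytes in UTF-8 but a single byte in cp437
sizes = encoded_sizes(u"caf\u00e9", "cp437")
```

Under these assumptions, the script's BUFFER Length and Length on DISK figures can legitimately differ for such a file, even though notepad.getEncoding() keeps reporting COOKIE.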
-
Hello, @alan-kilborn and All,
I’m still improving my Python script, which can be used instead of the View > Summary feature. I added:
- The current date and time in the title
- The Read-only file flag status
- The Notepad++ Read only status
- The current view, the current Line End and the current wrap mode
- The script now automatically opens a new tab and pastes the contents of the summary into this new tab
Here is the fourth ( and still incomplete ) version :
# encoding=utf-8

#-------------------------------------------------------------------------
#    STATISTICS about the CURRENT file ( v0.3 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

import re
import os, time

# ---------------------------------------------------------------------------
def number(occ):
    global num
    num += 1

# ---------------------------------------------------------------------------
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# ---------------------------------------------------------------------------
File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:
    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# ---------------------------------------------------------------------------
RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'

# ---------------------------------------------------------------------------
if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# ---------------------------------------------------------------------------
Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UCS-2 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UCS-2 LE BOM'

# ---------------------------------------------------------------------------
Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)

# ---------------------------------------------------------------------------
if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# ---------------------------------------------------------------------------
Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)

Total_2_bytes = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# ---------------------------------------------------------------------------
Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n]', number)

Total_Standard = num

# ---------------------------------------------------------------------------
Total_4_bytes = 0  # By default
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# ---------------------------------------------------------------------------
Total_chars = Total_Standard + Total_EOL

# ---------------------------------------------------------------------------
Bytes_length = Total_EOL + Total_1_byte  # Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# ---------------------------------------------------------------------------
BOM = 0  # Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# ---------------------------------------------------------------------------
Buffer_length = Bytes_length + BOM

# ---------------------------------------------------------------------------
num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# ---------------------------------------------------------------------------
num = 0
editor.research(r'\w+', number)

Words_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)

Non_space_count = num

# ---------------------------------------------------------------------------
num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

#
-------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if notepad.getEncoding() == BUFFERENCODING.ENC8BIT: editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number) else: editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number) Blank_lines = num # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Emp_blk_lines = Empty_lines + Blank_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- num = 0 if notepad.getEncoding() == BUFFERENCODING.ENC8BIT: editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number) else: editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number) Total_lines = num # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Non_blk_lines = Total_lines - Emp_blk_lines # -------------------------------------------------------------------------------------------------------------------------------------------------------------- Num_sel = editor.getSelections() # Get ALL selections ( EMPTY or NOT ) # print ('Res = ', Num_sel) if Num_sel != 0: Bytes_count = 0 Chars_count = 0 for n in range(Num_sel): Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n) Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n)) # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Chars_count < 2: Txt_chars = ' selected char (' else: Txt_chars = ' selected chars (' if Bytes_count < 2: Txt_bytes = ' selected byte) in ' else: 
Txt_bytes = ' selected bytes) in ' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- if Num_sel < 2 and Bytes_count == 0: Txt_ranges = ' EMPTY range\n' if Num_sel < 2 and Bytes_count > 0: Txt_ranges = ' range\n' if Num_sel > 1 and Bytes_count == 0: Txt_ranges = ' EMPTY ranges\n' if Num_sel > 1 and Bytes_count > 0: Txt_ranges = ' ranges (EMPTY or NOT)\n' # -------------------------------------------------------------------------------------------------------------------------------------------------------------- line_list = [] # empty list line_list.append ('-' * Line_title) line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now())) line_list.append ('-' * Line_title +'\n') line_list.append (' FULL File Path : ' + File_name + '\n') if os.path.isfile(File_name) == True: line_list.append(' CREATION Date : ' + Creation_date) line_list.append(' MODIFICATION Date : ' + Modif_date + '\n') line_list.append(' READ-ONLY flag : ' + RO_flag ) line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n') line_list.append (' Current VIEW : ' + Curr_view + '\n') line_list.append (' Current ENCODING : ' + Curr_encoding + '\n') line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n') line_list.append (' Current Line END : ' + Curr_eol + '\n') line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n') line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte)) line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes)) line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n') line_list.append (' Sum BMP Chars : ' + str(Total_BMP)) line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n') line_list.append (' CHARS w/o CR & LF : ' + str(Total_Standard)) line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n') line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n') if 
notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE: line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (1 * ' + str(Total_1_byte) + ' + 1 * ' + str(Total_EOL) + ' + 2 * ' + str(Total_2_bytes) + ' + 3 * ' + str(Total_3_bytes) + ' + 4 * ' + str(Total_4_bytes) + ')') if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE: line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (2 * ' + str(Total_chars) + ')') if notepad.getEncoding() == BUFFERENCODING.ENC8BIT: line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (1 * ' + str(Total_chars) + ')') line_list.append (' Byte Order Mark : ' + str(BOM) + '\n') line_list.append (' BUFFER Length : ' + str(Buffer_length)) if os.path.isfile(File_name) == True: line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n') else: line_list.append ('\n') line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n') line_list.append (' WORDS Count : ' + str(Words_count) + '\n') line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n') line_list.append (' True EMPTY lines : ' + str(Empty_lines)) line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n') line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n') line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines)) line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n') line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges) editor.copyText ('\r\n'.join(line_list)) notepad.new() editor.paste() # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
Now, Alan, I could of course incorporate your script, partially shown below, into mine, in order to get the exact encoding used by the current file!
# -*- coding: utf-8 -*-
from __future__ import print_function
from Npp import *
import ctypes
from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    .....
    .....

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

print(npp_get_statusbar(4))  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )
But, given that I’m only interested in the fourth zone of the Status Bar, couldn’t we use a simplified version of your script to do so?
TIA again!
Best Regards,
guy038
-
-
@guy038 said in Improved version of the "Summary" feature, ...:
given that I’m only interested in the fourth zone of the Status Bar, can’t we use a simplified version of your script to do so ?
I’m not sure how it could be simplified, but if you have ideas on that, please do it and publish it as part of your script.
It could be put in its own module, and imported if you’d like…
-
@guy038 said :
…returns Npp.BUFFERENCODING.COOKIE, although I would have expected Npp.BUFFERENCODING.OEM-US or perhaps just OEM-US…
Rather than relying on a kludged read of the Notepad++ status bar, perhaps you should open a GitHub issue against Notepad++, stating that NPPM_GETBUFFERENCODING doesn’t provide the data you need/expect, and maybe this plugin command will be enhanced for you?
-
Hello, @alan-kilborn and All,
Alan, first of all, I could have told you that I didn’t want to bother modifying your script and that I’d integrated it as is. But the truth is that I’m still a long way from understanding your script and seeing any possible simplifications :-((
So, here is my final version of this Python script, which can be used instead of the View > Summary feature. It contains the @alan-kilborn section, which reads the right part of the status bar, relative to the current encoding.
I have to split this script into two consecutive posts!
# encoding=utf-8

#-------------------------------------------------------------------------
#                    STATISTICS about the CURRENT file ( v0.4 )
#-------------------------------------------------------------------------

from __future__ import print_function    # for Python2 compatibility

from Npp import *

import re

import os, time, datetime                # datetime is needed for the 'SUMMARY on ...' title line

import ctypes

from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT

# --------------------------------------------------------------------------------------------------------------------------------------------------------------
#  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def npp_get_statusbar(statusbar_item_number):

    WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
    FindWindowW = ctypes.windll.user32.FindWindowW
    FindWindowExW = ctypes.windll.user32.FindWindowExW
    SendMessageW = ctypes.windll.user32.SendMessageW
    LRESULT = LPARAM
    SendMessageW.restype = LRESULT
    SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
    EnumChildWindows = ctypes.windll.user32.EnumChildWindows
    GetClassNameW = ctypes.windll.user32.GetClassNameW
    create_unicode_buffer = ctypes.create_unicode_buffer

    SBT_OWNERDRAW = 0x1000
    WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13

    npp_get_statusbar.STATUSBAR_HANDLE = None

    def get_result_from_statusbar(statusbar_item_number):
        assert statusbar_item_number <= 5
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
        length = retcode & 0xFFFF           # low word  : text length, in characters
        type = (retcode >> 16) & 0xFFFF     # high word : drawing type
        assert (type != SBT_OWNERDRAW)
        text_buffer = create_unicode_buffer(length + 1)    # + 1 leaves room for the terminating NUL
        retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
        retval = '{}'.format(text_buffer[:length])
        return retval

    def EnumCallback(hwnd, lparam):
        curr_class = create_unicode_buffer(256)
        GetClassNameW(hwnd, curr_class, 256)
        if curr_class.value.lower() == "msctls_statusbar32":
            npp_get_statusbar.STATUSBAR_HANDLE = hwnd
            return False  # stop the enumeration
        return True  # continue the enumeration

    npp_hwnd = FindWindowW(u"Notepad++", None)
    EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
    if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
    assert False

St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )
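One step of the function above that may not be obvious is how the value returned by SB_GETTEXTLENGTHW is split: the low 16 bits hold the text length and the high 16 bits hold the drawing type. Here is a minimal pure-Python sketch of that unpacking, which runs outside Notepad++. The helper name `split_sb_textlength` and the sample return values are made up for illustration only.

```python
SBT_OWNERDRAW = 0x1000  # same constant as in the function above

def split_sb_textlength(retcode):
    """Split an SB_GETTEXTLENGTH-style result into (length, draw_type)."""
    length = retcode & 0xFFFF              # low word: number of characters
    draw_type = (retcode >> 16) & 0xFFFF   # high word: drawing type flags
    return length, draw_type

# Hypothetical return value: 5 characters, no special drawing flag
length, draw_type = split_sb_textlength(0x00000005)
print(length, draw_type != SBT_OWNERDRAW)

# Hypothetical owner-drawn part: its text cannot be read back as a string,
# which is why the function above asserts against SBT_OWNERDRAW
length_od, draw_type_od = split_sb_textlength((SBT_OWNERDRAW << 16) | 12)
print(length_od, draw_type_od == SBT_OWNERDRAW)
```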
See next post for continuation !
-
Hi @alan-kilborn and All,
Continuation of my script :
# --------------------------------------------------------------------------------------------------------------------------------------------------------------

def number(occ):
    global num
    num += 1

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Line_title = 93
else:
    Line_title = 71

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

File_name = notepad.getCurrentFilename()

if os.path.isfile(File_name) == True:

    Creation_date = time.ctime(os.path.getctime(File_name))
    Modif_date = time.ctime(os.path.getmtime(File_name))
    Size_length = os.path.getsize(File_name)

    RO_flag = 'YES'
    if os.access(File_name, os.W_OK):
        RO_flag = 'NO'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

RO_editor = 'NO'
if editor.getReadOnly() == True:
    RO_editor = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if notepad.getCurrentView() == 0:
    Curr_view = 'MAIN View'
else:
    Curr_view = 'SECONDARY view'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_encoding = str(notepad.getEncoding())

if Curr_encoding == 'ENC8BIT':
    Curr_encoding = 'ANSI'

if Curr_encoding == 'COOKIE':
    Curr_encoding = 'UTF-8'

if Curr_encoding == 'UTF8':
    Curr_encoding = 'UTF8-BOM'

if Curr_encoding == 'UCS2BE':
    Curr_encoding = 'UCS-2 BE BOM'

if Curr_encoding == 'UCS2LE':
    Curr_encoding = 'UCS-2 LE BOM'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_lang = notepad.getCurrentLang()
Lang_desc = notepad.getLanguageDesc(Curr_lang)

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if editor.getEOLMode() == 0:
    Curr_eol = 'Windows (CR LF)'

if editor.getEOLMode() == 1:
    Curr_eol = 'Macintosh (CR)'

if editor.getEOLMode() == 2:
    Curr_eol = 'Unix (LF)'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Curr_wrap = 'NO'
if editor.getWrapMode() == 1:
    Curr_wrap = 'YES'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'[^\r\n]', number)

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)

Total_1_byte = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'[\x{0080}-\x{07FF}]', number)

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    editor.research(r'[^\r\n]', number)

Total_2_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)

Total_3_bytes = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n]', number)

Total_Standard = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_4_bytes = 0  # By default
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Total_4_bytes = Total_Standard - Total_BMP

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\r|\n', number)

Total_EOL = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Total_chars = Total_Standard + Total_EOL

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Bytes_length = Total_EOL + Total_1_byte  # Default ANSI

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    Bytes_length = 2 * Total_chars

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

BOM = 0  # Default ANSI and UTF-8

if notepad.getEncoding() == BUFFERENCODING.UTF8:
    BOM = 3

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
    BOM = 2

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Buffer_length = Bytes_length + BOM

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'[^\r\n\t\x20]', number)

Non_blank_chars = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
editor.research(r'\w+', number)

Words_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
else:
    editor.research(r'((?!\s).)+', number)

Non_space_count = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)

Empty_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
else:
    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)

Blank_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Emp_blk_lines = Empty_lines + Blank_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

num = 0
if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
else:
    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)

Total_lines = num

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Non_blk_lines = Total_lines - Emp_blk_lines

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )

# print ('Res = ', Num_sel)

if Num_sel != 0:

    Bytes_count = 0
    Chars_count = 0

    for n in range(Num_sel):

        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Chars_count < 2:
    Txt_chars = ' selected char ('
else:
    Txt_chars = ' selected chars ('

if Bytes_count < 2:
    Txt_bytes = ' selected byte) in '
else:
    Txt_bytes = ' selected bytes) in '

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

if Num_sel < 2 and Bytes_count == 0:
    Txt_ranges = ' EMPTY range\n'

if Num_sel < 2 and Bytes_count > 0:
    Txt_ranges = ' range\n'

if Num_sel > 1 and Bytes_count == 0:
    Txt_ranges = ' EMPTY ranges\n'

if Num_sel > 1 and Bytes_count > 0:
    Txt_ranges = ' ranges (EMPTY or NOT)\n'

# --------------------------------------------------------------------------------------------------------------------------------------------------------------

line_list = []  # empty list

line_list.append ('-' * Line_title)

# '//' keeps the padding an integer under Python 3 as well as Python 2
line_list.append (' ' * ((Line_title - 37) // 2) + 'SUMMARY on ' + str(datetime.datetime.now()))

line_list.append ('-' * Line_title +'\n')

line_list.append (' FULL File Path : ' + File_name + '\n')

if os.path.isfile(File_name) == True:

    line_list.append(' CREATION Date : ' + Creation_date)

    line_list.append(' MODIFICATION Date : ' + Modif_date + '\n')

    line_list.append(' READ-ONLY flag : ' + RO_flag )

line_list.append (' READ-ONLY editor : ' + RO_editor + '\n\n')

line_list.append (' Current VIEW : ' + Curr_view + '\n')

line_list.append (' Current ENCODING : ' + Curr_encoding + '\n')

line_list.append (' Current LANGUAGE : ' + str(Curr_lang) + ' (' + Lang_desc + ')\n')

line_list.append (' Current Line END : ' + Curr_eol + '\n')

line_list.append (' Current WRAPPING : ' + Curr_wrap + '\n\n')

line_list.append (' 1-BYTE Chars : ' + str(Total_1_byte))

line_list.append (' 2-BYTES Chars : ' + str(Total_2_bytes))

line_list.append (' 3-BYTES Chars : ' + str(Total_3_bytes) + '\n')

line_list.append (' Sum BMP Chars : ' + str(Total_BMP))

line_list.append (' 4-BYTES Chars : ' + str(Total_4_bytes) + '\n')

line_list.append (' CHARS w/o CR & LF : ' + str(Total_Standard))

line_list.append (' EOL ( CR or LF ) : ' + str(Total_EOL) + '\n')

line_list.append (' TOTAL characters : ' + str(Total_chars) + '\n\n')

if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:

    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' * 1 + ' + str(Total_1_byte) + ' * 1b + '\
                      + str(Total_2_bytes) + ' * 2b + ' + str(Total_3_bytes) + ' * 3b + ' + str(Total_4_bytes) + ' * 4b)')

if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:

    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 2b)')

if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:

    line_list.append (' BYTES Length : ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 1b)')

line_list.append (' Byte Order Mark : ' + str(BOM) + '\n')

line_list.append (' BUFFER Length : ' + str(Buffer_length))

if os.path.isfile(File_name) == True:
    line_list.append (' Length on DISK : ' + str(Size_length) + '\n\n')
else:
    line_list.append ('\n')

line_list.append (' NON-Blank Chars : ' + str(Non_blank_chars) + '\n')

line_list.append (' WORDS Count : ' + str(Words_count) + ' (Caution !)\n')

line_list.append (' NON-SPACE Count : ' + str(Non_space_count) + '\n\n')

line_list.append (' True EMPTY lines : ' + str(Empty_lines))

line_list.append (' True BLANK lines : ' + str(Blank_lines) + '\n')

line_list.append (' EMPTY/BLANK lines : ' + str(Emp_blk_lines) + '\n')

line_list.append (' NON-BLANK lines : ' + str(Non_blk_lines))

line_list.append (' TOTAL Lines : ' + str(Total_lines) + '\n\n')

line_list.append (' SELECTION(S) : ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)

editor.copyText ('\r\n'.join(line_list))

notepad.new()

editor.paste()

if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UCS-2 BE BOM' and St_bar != 'UCS-2 LE BOM':

    if Curr_encoding == 'UTF-8':  # SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature

        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + ' => Possible ERRONEOUS results' + \
                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')

# ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
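The final test of the script (a chain of `!=` comparisons on the status-bar text) could perhaps be expressed as a set membership test. This is only a sketch of the same logic, runnable without Notepad++; the names `NATIVE_ENCODINGS` and `needs_warning` are hypothetical and not part of the script.

```python
# Status-bar texts treated as "native" encodings,
# copied from the last test of the script
NATIVE_ENCODINGS = {'ANSI', 'UTF-8', 'UTF-8-BOM', 'UCS-2 BE BOM', 'UCS-2 LE BOM'}

def needs_warning(st_bar, curr_encoding):
    """Return True when the file looks re-interpreted, so the summary may be wrong."""
    return st_bar not in NATIVE_ENCODINGS and curr_encoding == 'UTF-8'

print(needs_warning('OEM-US', 'UTF-8'))   # re-interpreted file: warn
print(needs_warning('UTF-8', 'UTF-8'))    # plain UTF-8 file: no warning
```

In the script itself, the call would then be `needs_warning(St_bar, Curr_encoding)`, guarding the `notepad.prompt` line.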
-
Hi, Alan and All,
( Continuation of the previous post )
Now, I’ve come across a problem with the encodings!
Have you ever noticed that, when you decide to re-interpret the present encoding of a file with the Encoding > Character Set > ... feature, there are two possible scenarios?
-
A) - The present encoding is a Unicode encoding with a BOM (Byte Order Mark): either the UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM encoding
-
B) - The present encoding is ANSI or UTF-8, so without a BOM
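To tell the two scenarios apart outside Notepad++, one could look at the first bytes of the file on disk. The sketch below is an illustration under assumptions: the helper name `detect_bom` is hypothetical, and it only knows the three BOMs listed in scenario A.

```python
import codecs

# BOM signatures for the three encodings of scenario A
BOMS = [
    (codecs.BOM_UTF8, 'UTF-8-BOM'),
    (codecs.BOM_UTF16_BE, 'UCS-2 BE BOM'),
    (codecs.BOM_UTF16_LE, 'UCS-2 LE BOM'),
]

def detect_bom(raw):
    """Return (encoding_name, bom_length) for the leading bytes, or (None, 0)."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0

print(detect_bom(b'\xef\xbb\xbfA'))    # scenario A, UTF-8 with BOM
print(detect_bom(b'\xff\xfeA\x00'))    # scenario A, UCS-2 LE with BOM
print(detect_bom(b'A\xc3\xa9'))        # scenario B, no BOM
```

For a real file, the input would be something like `open(path, 'rb').read(4)`. The BOM lengths found this way (3 and 2 bytes) are the same values the script adds to the BYTES Length to get the BUFFER Length.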
In the first case, whatever new encoding is chosen (one-byte or two-byte encoding), the file contents do not change and my script simply respects the real encoding of the current file.
For example, with a UCS-2 LE BOM encoded file, if I change its encoding with Encoding > Character Set > Western European > OEM-US, my new summary still considers it a true UCS-2 LE BOM encoded file, leading to a correct summary report!
In the second case, the new encoding chosen does modify the current file contents in the editor window. In addition, the file is then automatically reported as a UTF-8 encoded file, leading to erroneous results in the summary report :-( However, the contents of the file saved on disk still seem unchanged!!
For instance:
-
Open a new tab
-
Use the Encoding > Convert to UTF-8 feature, if necessary
-
Enter the four chars Aé☀𝜜, without any line-break at the end
-
Save this file as Test-UTF8.txt
-
Using my script, you get, in a new tab:
---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:50:23.656000
---------------------------------------------------------------------------------------------

FULL File Path : D:\@@\792\Test-UTF8.txt

CREATION Date : Mon Feb 5 16:45:24 2024
MODIFICATION Date : Mon Feb 5 15:17:02 2024

READ-ONLY flag : NO
READ-ONLY editor : NO

Current VIEW : MAIN View

Current ENCODING : UTF-8

Current LANGUAGE : TXT (Normal text file)

Current Line END : Windows (CR LF)

Current WRAPPING : YES

1-BYTE Chars : 1
2-BYTES Chars : 1
3-BYTES Chars : 1

Sum BMP Chars : 3
4-BYTES Chars : 1

CHARS w/o CR & LF : 4
EOL ( CR or LF ) : 0

TOTAL characters : 4

BYTES Length : 10 (0 * 1 + 1 * 1b + 1 * 2b + 1 * 3b + 1 * 4b)
Byte Order Mark : 0

BUFFER Length : 10
Length on DISK : 10

NON-Blank Chars : 4

WORDS Count : 1 (Caution !)

NON-SPACE Count : 1

True EMPTY lines : 0
True BLANK lines : 0

EMPTY/BLANK lines : 0

NON-BLANK lines : 1
TOTAL Lines : 1

SELECTION(S) : 0 selected char (0 selected byte) in 1 EMPTY range
Everything is OK (the buffer length and the length on disk are identical, and the bytes length description shows one char for each number of bytes, without any EOL)
-
Now, switch back to the Test-UTF8.txt file
-
Run the Encoding > Character Set > Western European > OEM-US feature
-
Re-run my script. This time, in another new tab, you get:
---------------------------------------------------------------------------------------------
                            SUMMARY on 2024-02-05 16:51:16.937000
---------------------------------------------------------------------------------------------

FULL File Path : D:\@@\792\Test-UTF8.txt

CREATION Date : Mon Feb 5 16:45:24 2024
MODIFICATION Date : Mon Feb 5 15:17:02 2024

READ-ONLY flag : NO
READ-ONLY editor : NO

Current VIEW : MAIN View

Current ENCODING : UTF-8

Current LANGUAGE : TXT (Normal text file)

Current Line END : Windows (CR LF)

Current WRAPPING : YES

1-BYTE Chars : 1
2-BYTES Chars : 6
3-BYTES Chars : 3

Sum BMP Chars : 10
4-BYTES Chars : 0

CHARS w/o CR & LF : 10
EOL ( CR or LF ) : 0

TOTAL characters : 10

BYTES Length : 22 (0 * 1 + 1 * 1b + 6 * 2b + 3 * 3b + 0 * 4b)
Byte Order Mark : 0

BUFFER Length : 22
Length on DISK : 10

NON-Blank Chars : 10

WORDS Count : 2 (Caution !)

NON-SPACE Count : 1

True EMPTY lines : 0
True BLANK lines : 0

EMPTY/BLANK lines : 0

NON-BLANK lines : 1
TOTAL Lines : 1

SELECTION(S) : 0 selected char (0 selected byte) in 1 EMPTY range
And, at the same time, a prompt displays this warning :
CURRENT file re-interpreted as OEM-US => Possible ERRONEOUS results
So, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script
Indeed, this time, as the file contents on disk are unchanged, the Length on DISK is still correct but the BUFFER Length is wrong, due to the re-interpretation of the characters by the OEM-US encoding. That’s why I preferred to add this warning at the end of the script!
Now, do as it says:
-
Close the Test-UTF8.txt file (Ctrl + W)
-
Restore it (Ctrl + Shift + T)
-
Again, you get the UTF-8 indication for the Test-UTF8.txt file, at the right of the status bar
-
Re-run my script
=> This time, we get a correct summary again, without any prompt!
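The numbers of the second report can be reproduced outside Notepad++, which may help to see where the 22 bytes come from. This is a minimal Python 3 sketch, assuming that OEM-US corresponds to Python's `cp437` codec; the helper name `utf8_length_histogram` is made up for this illustration.

```python
from collections import Counter

def utf8_length_histogram(text):
    """Count characters by the number of bytes each one needs in UTF-8."""
    return Counter(len(ch.encode('utf-8')) for ch in text)

original = u'A\u00e9\u2600\U0001d71c'    # the four test chars: A é ☀ 𝜜
raw = original.encode('utf-8')           # the bytes as saved on disk

# Assumption: OEM-US is the same code page as Python's 'cp437'
reinterpreted = raw.decode('cp437')      # each byte becomes one character

for label, text in (('UTF-8 ', original), ('OEM-US', reinterpreted)):
    hist = utf8_length_histogram(text)
    print(label, ':', len(text), 'chars,', len(text.encode('utf-8')), 'bytes,',
          [(n, hist[n]) for n in sorted(hist)])
```

The 10 bytes on disk become 10 characters once read as OEM-US: 1 of them needs one byte in UTF-8, 6 need two bytes and 3 need three bytes, hence 1 + 12 + 9 = 22 bytes, which matches the BUFFER Length of the second report, while the Length on DISK stays at 10.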
Alan or other Python gurus, feel free to improve this last version and/or test on various files whether all the numbers shown are coherent!
Best Regards,
guy038
-