    "Summary" feature improvement

    • guy038

      Hi @alan-kilborn and All,

      Continuation of my script :

      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      def number(occ):
          global num
          num += 1
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          Line_title = 93
      else:
          Line_title = 71
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      File_name = notepad.getCurrentFilename()
      
      if os.path.isfile(File_name) == True:
      
          Creation_date = time.ctime(os.path.getctime(File_name))
      
          Modif_date = time.ctime(os.path.getmtime(File_name))
      
          Size_length = os.path.getsize(File_name)
      
          RO_flag = 'YES'
      
          if os.access(File_name, os.W_OK):
              RO_flag = 'NO'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      RO_editor = 'NO'
      
      if editor.getReadOnly() == True:
          RO_editor = 'YES'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      if notepad.getCurrentView() == 0:
          Curr_view = 'MAIN View'
      else:
          Curr_view = 'SECONDARY view'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Curr_encoding = str(notepad.getEncoding())
      
      if Curr_encoding == 'ENC8BIT':
          Curr_encoding = 'ANSI'
      
      if Curr_encoding == 'COOKIE':
          Curr_encoding = 'UTF-8'
      
      if Curr_encoding == 'UTF8':
          Curr_encoding = 'UTF-8-BOM'
      
      if Curr_encoding == 'UCS2BE':
          Curr_encoding = 'UCS-2 BE BOM'
      
      if Curr_encoding == 'UCS2LE':
          Curr_encoding = 'UCS-2 LE BOM'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Curr_lang = notepad.getCurrentLang()
      
      Lang_desc = notepad.getLanguageDesc(Curr_lang)
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      if editor.getEOLMode() == 0:
          Curr_eol = 'Windows (CR LF)'
      
      if editor.getEOLMode() == 1:
          Curr_eol = 'Macintosh (CR)'
      
      if editor.getEOLMode() == 2:
          Curr_eol = 'Unix (LF)'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Curr_wrap = 'NO'
      
      if editor.getWrapMode() == 1:
          Curr_wrap = 'YES'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
          editor.research(r'[^\r\n]', number)
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
      
      Total_1_byte = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          editor.research(r'[\x{0080}-\x{07FF}]', number)
      
      if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
          editor.research(r'[^\r\n]', number)
      
      Total_2_bytes = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
      
      Total_3_bytes = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      num = 0
      editor.research(r'[^\r\n]', number)
      
      Total_Standard = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Total_4_bytes = 0  #  By default
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          Total_4_bytes = Total_Standard - Total_BMP
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      editor.research(r'\r|\n', number)
      
      Total_EOL = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Total_chars = Total_Standard + Total_EOL
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Bytes_length = Total_EOL + Total_1_byte  #  Default ANSI
      
      if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
          Bytes_length = 2 * Total_chars
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      BOM = 0  #  Default ANSI and UTF-8
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8:
          BOM = 3
      
      if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
          BOM = 2
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Buffer_length = Bytes_length + BOM
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      editor.research(r'[^\r\n\t\x20]', number)
      
      Non_blank_chars = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      editor.research(r'\w+', number)
      
      Words_count = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
      else:
          editor.research(r'((?!\s).)+', number)
      
      Non_space_count = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
          editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
      else:
          editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
      
      Empty_lines = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
          editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
      else:
          editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
      
      Blank_lines = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Emp_blk_lines = Empty_lines + Blank_lines
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      num = 0
      if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
          editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
      else:
          editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
      
      Total_lines = num
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Non_blk_lines = Total_lines - Emp_blk_lines
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )
      
      # print ('Res = ', Num_sel)
      
      if Num_sel != 0:
      
          Bytes_count = 0
          Chars_count = 0
      
          for n in range(Num_sel):
      
              Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
      
              Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
          if Chars_count < 2:
              Txt_chars = ' selected char ('
      
          else:
              Txt_chars = ' selected chars ('
      
      
          if Bytes_count < 2:
              Txt_bytes = ' selected byte) in '
      
          else:
              Txt_bytes = ' selected bytes) in '
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
          if Num_sel < 2 and Bytes_count == 0:
              Txt_ranges = ' EMPTY range\n'
      
          if Num_sel < 2 and Bytes_count > 0:
              Txt_ranges = ' range\n'
      
          if Num_sel > 1 and Bytes_count == 0:
              Txt_ranges = ' EMPTY ranges\n'
      
          if Num_sel > 1 and Bytes_count > 0:
              Txt_ranges = ' ranges (EMPTY or NOT)\n'
      
      # --------------------------------------------------------------------------------------------------------------------------------------------------------------
      
      line_list = []  # empty list
      
      line_list.append ('-' * Line_title)
      
      line_list.append (' ' * ((Line_title - 37) // 2) + 'SUMMARY on ' + str(datetime.datetime.now()))  #  '//' keeps the padding an integer under Python 3 too
      
      line_list.append ('-' * Line_title +'\n')
      
      line_list.append (' FULL File Path    :  ' + File_name + '\n')
      
      if os.path.isfile(File_name) == True:
      
          line_list.append(' CREATION     Date :  ' + Creation_date)
      
          line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')
      
          line_list.append(' READ-ONLY flag    :  ' + RO_flag )
      
      line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')
      
      line_list.append (' Current VIEW      :  ' + Curr_view + '\n')
      
      line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')
      
      line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')
      
      line_list.append (' Current Line END  :  ' + Curr_eol + '\n')
      
      line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')
      
      line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))
      
      line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))
      
      line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')
      
      line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))
      
      line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')
      
      line_list.append (' CHARS w/o CR & LF :  ' + str(Total_Standard))
      
      line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')
      
      line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')
      
      if notepad.getEncoding() == BUFFERENCODING.UTF8 or notepad.getEncoding() == BUFFERENCODING.COOKIE:
          line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' * 1 + ' + str(Total_1_byte) + ' * 1b + '\
          + str(Total_2_bytes) + ' * 2b + ' + str(Total_3_bytes) + ' * 3b + ' + str(Total_4_bytes) + ' * 4b)')
      
      if notepad.getEncoding() == BUFFERENCODING.UCS2BE or notepad.getEncoding() == BUFFERENCODING.UCS2LE:
          line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 2b)')
      
      if notepad.getEncoding() == BUFFERENCODING.ENC8BIT:
          line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_chars) + ' * 1b)')
      
      line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')
      
      line_list.append (' BUFFER Length     :  ' + str(Buffer_length))
      
      if os.path.isfile(File_name) == True:
          line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
      else:
          line_list.append ('\n')
      
      line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')
      
      line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')
      
      line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')
      
      line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))
      
      line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')
      
      line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')
      
      line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))
      
      line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')
      
      line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)
      
      editor.copyText ('\r\n'.join(line_list))
      
      notepad.new()
      
      editor.paste()
      
      if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UCS-2 BE BOM' and St_bar != 'UCS-2 LE BOM':
      
          if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
      
              notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
                              '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')
      
      # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
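The 1-byte / 2-byte / 3-byte / 4-byte totals that the script above collects with regular expressions can be cross-checked outside Notepad++. What follows is only a minimal sketch in plain Python 3, not part of guy038's script: the helper name `utf8_width_counts` is hypothetical, and, unlike the script, it does not set CR and LF apart from the other 1-byte characters.

```python
def utf8_width_counts(text):
    """Tally characters by the number of bytes each one needs in UTF-8."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        cp = ord(ch)
        if cp <= 0x7F:        # same range as [\x{0000}-\x{007F}] in the script
            counts[1] += 1
        elif cp <= 0x7FF:     # same range as [\x{0080}-\x{07FF}]
            counts[2] += 1
        elif cp <= 0xFFFF:    # rest of the BMP
            counts[3] += 1
        else:                 # characters above the BMP
            counts[4] += 1
    return counts

sample = "A\u00e9\u2600\U0001d71c"   # the test string used in this topic
counts = utf8_width_counts(sample)

# Same weighted sum as the script's Bytes_length for UTF-8 files
total_bytes = sum(width * n for width, n in counts.items())
print(counts, total_bytes)
```

For any string without lone surrogates, `total_bytes` should equal `len(text.encode('utf-8'))`, which gives a quick sanity check of the script's BYTES Length line.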
      
      • guy038

        Hi, Alan and All,

        ( Continuation of the previous post )

        Now, I’ve come across a problem with the encodings !

        Have you ever noticed that, when you re-interpret the present encoding of a file with the Encoding > Character Set > ... feature, there are two possible scenarios ?

        • A) - The present encoding is a Unicode encoding with a BOM ( Byte Order Mark ), so either the UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM encoding

        • B) - The present encoding is ANSI or UTF-8, so without a BOM

        In the first case, whatever the new encoding chosen ( one-byte or two-byte encoding ), the file contents do not change and my script just respects the real encoding of the current file

        For example, with a UCS-2 LE BOM encoded file, if I change its encoding with Encoding > Character Set > Western European > OEM-US, my new summary still considers it a true UCS-2 LE BOM encoded file, leading to a correct summary report !

        In the second case, the new encoding chosen does modify the current file contents in the editor window. In addition, the current file is then automatically taken to be a UTF-8 encoded file, leading to erroneous results in the summary report :-( However, the current file contents, saved on the disk, still seem unchanged !!

        For instance :

        • Open a new tab

        • Use the Encoding > Convert to UTF-8 feature, if necessary

        • Enter the four chars Aé☀𝜜, without any line-break, at the end

        • Save this file as Test-UTF8.txt

        • Using my script, you get, in a new tab :

        ---------------------------------------------------------------------------------------------
                                    SUMMARY on 2024-02-05 16:50:23.656000
        ---------------------------------------------------------------------------------------------
        
         FULL File Path    :  D:\@@\792\Test-UTF8.txt
        
         CREATION     Date :  Mon Feb  5 16:45:24 2024
         MODIFICATION Date :  Mon Feb  5 15:17:02 2024
        
         READ-ONLY flag    :  NO
         READ-ONLY editor  :  NO
        
        
         Current VIEW      :  MAIN View
        
         Current ENCODING  :  UTF-8
        
         Current LANGUAGE  :  TXT  (Normal text file)
        
         Current Line END  :  Windows (CR LF)
        
         Current WRAPPING  :  YES
        
        
         1-BYTE  Chars     :  1
         2-BYTES Chars     :  1
         3-BYTES Chars     :  1
        
         Sum BMP Chars     :  3
         4-BYTES Chars     :  1
        
         CHARS w/o CR & LF :  4
         EOL ( CR or LF )  :  0
        
         TOTAL characters  :  4
        
        
         BYTES Length      :  10 (0 * 1 + 1 * 1b + 1 * 2b + 1 * 3b + 1 * 4b)
         Byte Order Mark   :  0
        
         BUFFER Length     :  10
         Length on DISK    :  10
        
        
         NON-Blank Chars   :  4
        
         WORDS     Count   :  1 (Caution !)
        
         NON-SPACE Count   :  1
        
        
         True EMPTY lines  :  0
         True BLANK lines  :  0
        
         EMPTY/BLANK lines :  0
        
         NON-BLANK lines   :  1
         TOTAL Lines       :  1
        
        
         SELECTION(S)      :  0 selected char (0 selected byte) in 1 EMPTY range
        

        Everything is OK ( buffer length and length on disk are identical and the bytes length description shows one char for each number of bytes, without any EOL )

        • Now, switch back to the Test-UTF8.txt file

        • Run the Encoding > Character Set > Western European > OEM-US feature

        • Re-run my script. This time, in another new tab, you get :

        ---------------------------------------------------------------------------------------------
                                    SUMMARY on 2024-02-05 16:51:16.937000
        ---------------------------------------------------------------------------------------------
        
         FULL File Path    :  D:\@@\792\Test-UTF8.txt
        
         CREATION     Date :  Mon Feb  5 16:45:24 2024
         MODIFICATION Date :  Mon Feb  5 15:17:02 2024
        
         READ-ONLY flag    :  NO
         READ-ONLY editor  :  NO
        
        
         Current VIEW      :  MAIN View
        
         Current ENCODING  :  UTF-8
        
         Current LANGUAGE  :  TXT  (Normal text file)
        
         Current Line END  :  Windows (CR LF)
        
         Current WRAPPING  :  YES
        
        
         1-BYTE  Chars     :  1
         2-BYTES Chars     :  6
         3-BYTES Chars     :  3
        
         Sum BMP Chars     :  10
         4-BYTES Chars     :  0
        
         CHARS w/o CR & LF :  10
         EOL ( CR or LF )  :  0
        
         TOTAL characters  :  10
        
        
         BYTES Length      :  22 (0 * 1 + 1 * 1b + 6 * 2b + 3 * 3b + 0 * 4b)
         Byte Order Mark   :  0
        
         BUFFER Length     :  22
         Length on DISK    :  10
        
        
         NON-Blank Chars   :  10
        
         WORDS     Count   :  2 (Caution !)
        
         NON-SPACE Count   :  1
        
        
         True EMPTY lines  :  0
         True BLANK lines  :  0
        
         EMPTY/BLANK lines :  0
        
         NON-BLANK lines   :  1
         TOTAL Lines       :  1
        
        
         SELECTION(S)      :  0 selected char (0 selected byte) in 1 EMPTY range
        

        And, at the same time, a prompt displays this warning :

        CURRENT file re-interpreted as OEM-US => Possible ERRONEOUS results
        So, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script

        Indeed, this time, as the file contents are unchanged, the length on DISK is still correct but the BUFFER length is wrong, due to the re-interpretation of the characters by the OEM-US encoding. That’s why I preferred to add this warning at the end of the script !
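The figures of this second report can be reproduced outside Notepad++. The sketch below is plain Python 3 and rests on two assumptions that the posts do not confirm: that Python's `cp437` codec matches the OEM-US character set of Notepad++, and that the re-interpreted text is held in the buffer as UTF-8. Under these assumptions, each of the 10 bytes on disk becomes one character, and re-encoding these characters gives the larger buffer.

```python
original = "A\u00e9\u2600\U0001d71c"          # the four test characters
disk_bytes = original.encode("utf-8")          # what is saved on disk
reinterpreted = disk_bytes.decode("cp437")     # every byte read as one OEM-US character
buffer_bytes = reinterpreted.encode("utf-8")   # assumed contents of the editor buffer

# UTF-8 width of each re-interpreted character
widths = [len(ch.encode("utf-8")) for ch in reinterpreted]

print(len(disk_bytes), len(reinterpreted), len(buffer_bytes))   # 10 10 22
print(widths.count(1), widths.count(2), widths.count(3))        # 1 6 3
```

These values agree with the report above ( 1, 6 and 3 chars of 1, 2 and 3 bytes, for a BUFFER length of 22 against a length on DISK of 10 ).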

        Now, do as it is said :

        • Close the Test-UTF8.txt file ( Ctrl + W )

        • Restore it ( Ctrl + Shift + T )

        • Again, you get the UTF-8 indication, for the Test-UTF8.txt file, at the right of the status bar

        • Re-run my script

        => This time, we get again a correct summary, without any prompt !


        Alan or other python gurus, feel free to improve this last version and/or test on various files if all the numbers shown are coherent !

        Best Regards,

        guy038

          • guy038

          Hi All,

          I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…

          So, I’m first going to update my last portable version, on my W10 laptop, from v8.5.4 to the v8.6.2 version and I will update my script and redo all the tests

          See you later !

          BR

          guy038

            • Alan Kilborn @guy038

            @guy038 said in Improved version of the "Summary" feature, ...:

            I’ve just realized that, up to now, I simply improved my script with an old version of N++ ( v7.9.2 ). I apologize…

            :-(

            You ought to close out these ancient versions…permanently.

              • guy038

              Hello, @alan-kilborn and All,

              I’ve just discovered that, since the v8.0 N++ version, the UCS-2 BE BOM and UCS-2 LE BOM encodings are able to handle all the characters over the BMP. Thus, these encodings were renamed, respectively, as UTF-16 BE BOM and UTF-16 LE BOM !

              Note that, with these two encodings, each character with code > \x{FFFF} is built with the surrogate pair mechanism, so with two 16-bit code units. Consequently, the total number of bytes in the buffer = 2 (BOM) + number of chars <= \x{FFFF} x 2 + number of chars > \x{FFFF} x 4

              For example, the simple string Aé☀𝜜, without any EOL, in a UTF-16 BE BOM encoded file, is coded with 12 bytes as :

              
              FE FF 00 41 00 E9 26 00 D8 35 DF 1C
              ----- ----- ----- ----- -----------
               BOM    A     é     ☀       𝜜
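The byte layout above and the length formula can be verified with a few lines of plain Python 3. This is only an illustrative sketch, independent of the PythonScript plugin: the variable names are made up for the example, and the BOM is added by hand because the `utf-16-be` codec does not write one.

```python
import codecs

sample = "A\u00e9\u2600\U0001d71c"   # the four test characters

# BOM (FE FF) followed by the big-endian UTF-16 code units
encoded = codecs.BOM_UTF16_BE + sample.encode("utf-16-be")

bmp_chars = sum(1 for ch in sample if ord(ch) <= 0xFFFF)   # one 16-bit unit each
astral_chars = len(sample) - bmp_chars                     # one surrogate pair each

# 2 (BOM) + chars <= \x{FFFF} x 2 + chars > \x{FFFF} x 4
expected_length = 2 + 2 * bmp_chars + 4 * astral_chars

hex_dump = " ".join("{:02X}".format(b) for b in encoded)
print(hex_dump)                        # FE FF 00 41 00 E9 26 00 D8 35 DF 1C
print(len(encoded), expected_length)   # 12 12
```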
              
              

              So, here is my final and updated version of the script, which works in all versions since the v8.0 one !

              # encoding=utf-8
              
              #-------------------------------------------------------------------------
              #                    STATISTICS about the CURRENT file ( v0.5 )
              #-------------------------------------------------------------------------
              
              from __future__ import print_function    # for Python2 compatibility
              
              from Npp import *
              
              import re
              
              import os, time, datetime
              
              import ctypes
              
              from ctypes.wintypes import BOOL, HWND, WPARAM, LPARAM, UINT
              
              # --------------------------------------------------------------------------------------------------------------------------------------------------------------
              #  From @alan-kilborn, in post https://community.notepad-plus-plus.org/topic/21733/pythonscript-different-behavior-in-script-vs-in-immediate-mode/4
              # --------------------------------------------------------------------------------------------------------------------------------------------------------------
              
              def npp_get_statusbar(statusbar_item_number):
              
                  WNDENUMPROC = ctypes.WINFUNCTYPE(BOOL, HWND, LPARAM)
                  FindWindowW = ctypes.windll.user32.FindWindowW
                  FindWindowExW = ctypes.windll.user32.FindWindowExW
                  SendMessageW = ctypes.windll.user32.SendMessageW
                  LRESULT = LPARAM
                  SendMessageW.restype = LRESULT
                  SendMessageW.argtypes = [ HWND, UINT, WPARAM, LPARAM ]
                  EnumChildWindows = ctypes.windll.user32.EnumChildWindows
                  GetClassNameW = ctypes.windll.user32.GetClassNameW
                  create_unicode_buffer = ctypes.create_unicode_buffer
              
                  SBT_OWNERDRAW = 0x1000
                  WM_USER = 0x400; SB_GETTEXTLENGTHW = WM_USER + 12; SB_GETTEXTW = WM_USER + 13
              
                  npp_get_statusbar.STATUSBAR_HANDLE = None
              
                  def get_result_from_statusbar(statusbar_item_number):
                      assert statusbar_item_number <= 5
                      retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTLENGTHW, statusbar_item_number, 0)
                      length = retcode & 0xFFFF
                      type = (retcode >> 16) & 0xFFFF
                      assert (type != SBT_OWNERDRAW)
                      # SB_GETTEXTW also writes a terminating null, so one extra character is allocated
                      text_buffer = create_unicode_buffer(length + 1)
                      retcode = SendMessageW(npp_get_statusbar.STATUSBAR_HANDLE, SB_GETTEXTW, statusbar_item_number, ctypes.addressof(text_buffer))
                      retval = '{}'.format(text_buffer[:length])
                      return retval
              
                  def EnumCallback(hwnd, lparam):
                      curr_class = create_unicode_buffer(256)
                      GetClassNameW(hwnd, curr_class, 256)
                      if curr_class.value.lower() == "msctls_statusbar32":
                          npp_get_statusbar.STATUSBAR_HANDLE = hwnd
                          return False  # stop the enumeration
                      return True  # continue the enumeration
              
                  npp_hwnd = FindWindowW(u"Notepad++", None)
                  EnumChildWindows(npp_hwnd, WNDENUMPROC(EnumCallback), 0)
                  if npp_get_statusbar.STATUSBAR_HANDLE: return get_result_from_statusbar(statusbar_item_number)
                  assert False
              
              St_bar = npp_get_statusbar(4)  # Zone 4 ( STATUSBARSECTION.UNICODETYPE )
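In `get_result_from_statusbar`, the value returned for SB_GETTEXTLENGTHW is unpacked into two 16-bit halves: the low word is used as the text length and the high word as the drawing type. The sketch below isolates that bit arithmetic in plain Python, so that it can be tried without any window handle. The helper name `split_statusbar_retcode` is hypothetical and the sample values are invented for the demonstration.

```python
SBT_OWNERDRAW = 0x1000   # same constant as in the script above

def split_statusbar_retcode(retcode):
    """Return (text_length, draw_type) from a packed 32-bit return code."""
    text_length = retcode & 0xFFFF          # low word
    draw_type = (retcode >> 16) & 0xFFFF    # high word
    return text_length, draw_type

# A plain text section of 5 characters : high word is 0
plain = split_statusbar_retcode(5)

# An owner-drawn section : the script's assert would reject this one
owner_drawn = split_statusbar_retcode((SBT_OWNERDRAW << 16) | 12)

print(plain, owner_drawn)
```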
              

              See next post for continuation !

                • guy038

                Hi, @alan-kilborn and All,

                Continuation of the script :

                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                def number(occ):
                    global num
                    num += 1
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Curr_encoding = str(notepad.getEncoding())
                
                if Curr_encoding == 'ENC8BIT':
                    Curr_encoding = 'ANSI'
                
                if Curr_encoding == 'COOKIE':
                    Curr_encoding = 'UTF-8'
                
                if Curr_encoding == 'UTF8':
                    Curr_encoding = 'UTF-8-BOM'
                
                if Curr_encoding == 'UCS2BE':
                    Curr_encoding = 'UTF-16 BE BOM'
                
                if Curr_encoding == 'UCS2LE':
                    Curr_encoding = 'UTF-16 LE BOM'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    Line_title = 95
                else:
                    Line_title = 75
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                File_name = notepad.getCurrentFilename()
                
                if os.path.isfile(File_name) == True:
                
                    Creation_date = time.ctime(os.path.getctime(File_name))
                
                    Modif_date = time.ctime(os.path.getmtime(File_name))
                
                    Size_length = os.path.getsize(File_name)
                
                    RO_flag = 'YES'
                
                    if os.access(File_name, os.W_OK):
                        RO_flag = 'NO'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                RO_editor = 'NO'
                
                if editor.getReadOnly() == True:
                    RO_editor = 'YES'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                if notepad.getCurrentView() == 0:
                    Curr_view = 'MAIN View'
                else:
                    Curr_view = 'SECONDARY view'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Curr_lang = notepad.getCurrentLang()
                
                Lang_desc = notepad.getLanguageDesc(Curr_lang)
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                if editor.getEOLMode() == 0:
                    Curr_eol = 'Windows (CR LF)'
                
                if editor.getEOLMode() == 1:
                    Curr_eol = 'Macintosh (CR)'
                
                if editor.getEOLMode() == 2:
                    Curr_eol = 'Unix (LF)'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Curr_wrap = 'NO'
                
                if editor.getWrapMode() == 1:
                    Curr_wrap = 'YES'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'ANSI':
                    editor.research(r'[^\r\n]', number)
                
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    editor.research(r'(?![\r\n])[\x{0000}-\x{007F}]', number)
                
                Total_1_byte = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    editor.research(r'[\x{0080}-\x{07FF}]', number)
                
                if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
                    editor.research(r'(?![\r\n\x{D800}-\x{DFFF}])[\x{0000}-\x{FFFF}]', number)  #  ALL BMP vchars ( With PYTHON, the [^\r\n\x{D800}-\x{DFFF}] syntax does NOT work properly !)
                
                Total_2_bytes = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    editor.research(r'(?![\x{D800}-\x{DFFF}])[\x{0800}-\x{FFFF}]', number)
                
                Total_3_bytes = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Total_BMP = Total_1_byte + Total_2_bytes + Total_3_bytes
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                num = 0
                editor.research(r'[^\r\n]', number)
                
                Total_standard = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Total_4_bytes = 0  #  By default
                
                if Curr_encoding != 'ANSI':
                    Total_4_bytes = Total_standard - Total_BMP
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                editor.research(r'\r|\n', number)
                
                Total_EOL = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Total_chars = Total_EOL + Total_standard
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                if Curr_encoding == 'ANSI':
                    Bytes_length = Total_EOL + Total_1_byte
                
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    Bytes_length = Total_EOL + Total_1_byte + 2 * Total_2_bytes + 3 * Total_3_bytes + 4 * Total_4_bytes
                
                if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
                    Bytes_length = 2 * Total_EOL + 2 * Total_BMP + 4 * Total_4_bytes
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                BOM = 0  #  Default ANSI and UTF-8
                
                if Curr_encoding == 'UTF-8-BOM':
                    BOM = 3
                
                if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
                    BOM = 2
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Buffer_length = Bytes_length + BOM
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                editor.research(r'[^\r\n\t\x20]', number)
                
                Non_blank_chars = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                editor.research(r'\w+', number)
                
                Words_count = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                
                if Curr_encoding == 'ANSI':
                    editor.research(r'((?!\s).)+', number)
                else:
                    editor.research(r'((?!\s).[\x{D800}-\x{DFFF}]?)+', number)
                
                Non_space_count = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'ANSI':
                    editor.research(r'(?<!\f)^(?:\r\n|\r|\n)', number)
                else:
                    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^(?:\r\n|\r|\n)', number)
                
                Empty_lines = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'ANSI':
                    editor.research(r'(?<!\f)^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
                else:
                    editor.research(r'(?<![\f\x{0085}\x{2028}\x{2029}])^[\t\x20]+(?:\r\n|\r|\n|\z)', number)
                
                Blank_lines = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Emp_blk_lines = Empty_lines + Blank_lines
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                num = 0
                if Curr_encoding == 'ANSI':
                    editor.research(r'(?-s)\r\n|\r|\n|(?:.|\f)\z', number)
                else:
                    editor.research(r'(?-s)\r\n|\r|\n|(?:.|[\f\x{0085}\x{2028}\x{2029}])\z', number)
                
                Total_lines = num
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Non_blk_lines = Total_lines - Emp_blk_lines
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                Num_sel = editor.getSelections()  # Get ALL selections ( EMPTY or NOT )
                
                # print ('Res = ', Num_sel)
                
                if Num_sel != 0:
                
                    Bytes_count = 0
                    Chars_count = 0
                
                    for n in range(Num_sel):
                
                        Bytes_count += editor.getSelectionNEnd(n) - editor.getSelectionNStart(n)
                
                        Chars_count += editor.countCharacters(editor.getSelectionNStart(n), editor.getSelectionNEnd(n))
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                    if Chars_count < 2:
                        Txt_chars = ' selected char ('
                
                    else:
                        Txt_chars = ' selected chars ('
                
                
                    if Bytes_count < 2:
                        Txt_bytes = ' selected byte) in '
                
                    else:
                        Txt_bytes = ' selected bytes) in '
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                    if Num_sel < 2 and Bytes_count == 0:
                        Txt_ranges = ' EMPTY range\n'
                
                    if Num_sel < 2 and Bytes_count > 0:
                        Txt_ranges = ' range\n'
                
                    if Num_sel > 1 and Bytes_count == 0:
                        Txt_ranges = ' EMPTY ranges\n'
                
                    if Num_sel > 1 and Bytes_count > 0:
                        Txt_ranges = ' ranges (EMPTY or NOT)\n'
                
                # --------------------------------------------------------------------------------------------------------------------------------------------------------------
                
                line_list = []  # empty list
                
                line_list.append ('-' * Line_title)
                
                line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
                
                line_list.append ('-' * Line_title +'\n')
                
                line_list.append (' FULL File Path    :  ' + File_name + '\n')
                
                if os.path.isfile(File_name) == True:
                
                    line_list.append(' CREATION     Date :  ' + Creation_date)
                
                    line_list.append(' MODIFICATION Date :  ' + Modif_date + '\n')
                
                    line_list.append(' READ-ONLY flag    :  ' + RO_flag )
                
                line_list.append (' READ-ONLY editor  :  ' + RO_editor + '\n\n')
                
                line_list.append (' Current VIEW      :  ' + Curr_view + '\n')
                
                line_list.append (' Current ENCODING  :  ' + Curr_encoding + '\n')
                
                line_list.append (' Current LANGUAGE  :  ' + str(Curr_lang) + '  (' + Lang_desc + ')\n')
                
                line_list.append (' Current Line END  :  ' + Curr_eol + '\n')
                
                line_list.append (' Current WRAPPING  :  ' + Curr_wrap + '\n\n')
                
                line_list.append (' 1-BYTE  Chars     :  ' + str(Total_1_byte))
                
                line_list.append (' 2-BYTES Chars     :  ' + str(Total_2_bytes))
                
                line_list.append (' 3-BYTES Chars     :  ' + str(Total_3_bytes) + '\n')
                
                line_list.append (' Sum BMP Chars     :  ' + str(Total_BMP))
                
                line_list.append (' 4-BYTES Chars     :  ' + str(Total_4_bytes) + '\n')
                
                line_list.append (' CHARS w/o CR & LF :  ' + str(Total_standard))
                
                line_list.append (' EOL ( CR or LF )  :  ' + str(Total_EOL) + '\n')
                
                line_list.append (' TOTAL characters  :  ' + str(Total_chars) + '\n\n')
                
                if Curr_encoding == 'ANSI':
                    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b)')
                
                if Curr_encoding == 'UTF-8' or Curr_encoding == 'UTF-8-BOM':
                    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 1 + ' + str(Total_1_byte) + ' x 1b + '\
                    + str(Total_2_bytes) + ' x 2b + ' + str(Total_3_bytes) + ' x 3b + ' + str(Total_4_bytes) + ' x 4b)')
                
                if Curr_encoding == 'UTF-16 BE BOM' or Curr_encoding == 'UTF-16 LE BOM':
                    line_list.append (' BYTES Length      :  ' + str(Bytes_length) + ' (' + str(Total_EOL) + ' x 2 + ' + str(Total_BMP) + ' x 2b + ' + str(Total_4_bytes) + ' x 4b)')
                
                line_list.append (' Byte Order Mark   :  ' + str(BOM) + '\n')
                
                line_list.append (' BUFFER Length     :  ' + str(Buffer_length))
                
                if os.path.isfile(File_name) == True:
                    line_list.append (' Length on DISK    :  ' + str(Size_length) + '\n\n')
                else:
                    line_list.append ('\n')
                
                line_list.append (' NON-Blank Chars   :  ' + str(Non_blank_chars) + '\n')
                
                line_list.append (' WORDS     Count   :  ' + str(Words_count) + ' (Caution !)\n')
                
                line_list.append (' NON-SPACE Count   :  ' + str(Non_space_count) + '\n\n')
                
                line_list.append (' True EMPTY lines  :  ' + str(Empty_lines))
                
                line_list.append (' True BLANK lines  :  ' + str(Blank_lines) + '\n')
                
                line_list.append (' EMPTY/BLANK lines :  ' + str(Emp_blk_lines) + '\n')
                
                line_list.append (' NON-BLANK lines   :  ' + str(Non_blk_lines))
                
                line_list.append (' TOTAL Lines       :  ' + str(Total_lines) + '\n\n')
                
                line_list.append (' SELECTION(S)      :  ' + str(Chars_count) + Txt_chars + str(Bytes_count) + Txt_bytes + str(Num_sel) + Txt_ranges)
                
                editor.copyText ('\r\n'.join(line_list))
                
                notepad.new()
                
                editor.paste()
                
                editor.copyText('')
                
                if St_bar != 'ANSI' and St_bar != 'UTF-8' and St_bar != 'UTF-8-BOM' and St_bar != 'UTF-16 BE BOM' and St_bar != 'UTF-16 LE BOM':
                
                    if Curr_encoding == 'UTF-8':  #  SAME value for both an 'UTF-8' or 'ANSI' file, when RE-INTERPRETED with the 'Encoding > Character Set > ...' feature
                
                        notepad.prompt ('CURRENT file re-interpreted as ' + St_bar + '  =>  Possible ERRONEOUS results' + \
                                        '\nSo, CLOSE the file WITHOUT saving, RESTORE it (CTRL + SHIFT + T) and RESTART script', '!!! WARNING !!!', '')
                
                # ----Aé☀𝜜-----------------------------------------------------------------------------------------------------------------------------------------------------
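
To cross-check the BYTES Length formula outside Notepad++, here is a minimal stand-alone sketch in plain Python 3 (not PythonScript). It sorts the characters of a string into the same size classes that the script's regexes use for UTF-8 files, then compares the result with Python's own encoder. The names `utf8_byte_classes` and `utf8_length` are hypothetical helpers for this illustration and are not part of the script above.

```python
def utf8_byte_classes(text):
    """Count characters by UTF-8 encoded size, keeping CR and LF apart."""
    counts = {"eol": 0, 1: 0, 2: 0, 3: 0, 4: 0}
    for ch in text:
        cp = ord(ch)
        if ch in "\r\n":
            counts["eol"] += 1      # EOL characters, counted separately
        elif cp <= 0x7F:
            counts[1] += 1          # 1-byte chars
        elif cp <= 0x7FF:
            counts[2] += 1          # 2-byte chars
        elif cp <= 0xFFFF:
            counts[3] += 1          # 3-byte chars (rest of the BMP)
        else:
            counts[4] += 1          # 4-byte chars (outside the BMP)
    return counts


def utf8_length(counts):
    """Same arithmetic as the script's Bytes_length for UTF-8 files."""
    return (counts["eol"] + counts[1] + 2 * counts[2]
            + 3 * counts[3] + 4 * counts[4])


sample = "Aé☀𝜜\r\n"   # one char of each size class, plus CR LF
counts = utf8_byte_classes(sample)

# The formula should agree with what the encoder really produces
assert utf8_length(counts) == len(sample.encode("utf-8"))
```

The test string reuses the four characters from the last comment line of the script, which cover the 1, 2, 3 and 4 byte cases.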
                

                If you’re still working or testing with an N++ version prior to v8.0 :

                • First, replace every UTF-16 sub-string with UCS-2 in the Python script

                • And, of course, do not forget to get rid of any character over \x{FFFF} in your UCS-2 BE/LE BOM encoded files, before using this script
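
A quick way to spot the characters over \x{FFFF} mentioned in the second point, sketched in plain Python 3 outside Notepad++. The helper name `non_bmp_chars` is hypothetical, and the sketch assumes the file content has already been read into a string.

```python
def non_bmp_chars(text):
    """Return (index, character) pairs for every code point above U+FFFF."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


sample = "Aé☀𝜜"
found = non_bmp_chars(sample)

# Only the last character lies outside the Basic Multilingual Plane
for index, ch in found:
    print("Non-BMP character at index", index, ": U+%04X" % ord(ch))
```

An empty result means the text only holds BMP characters, which is what a true UCS-2 file requires.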


                Note that the encoding problem described two posts ago, which occurs when any file without a BOM is re-interpreted with an Encoding > Character Set > ... encoding, still remains. Thus, the warning prompt is still present at the end of this final version !


                Now, I’m going to update an old post where I explained the poor performance of the present summary feature. I’ll take the opportunity to include the instructions for understanding this improved script !

                Best Regards,

                guy038

                • Alan KilbornA
                  Alan Kilborn @guy038
                  last edited by

                  @guy038

                  You have this line in your script:

                  line_list.append (' ' * ((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
                  

                  I would suggest changing it to:

                  line_list.append (' ' * int((Line_title - 37) / 2) + 'SUMMARY on ' + str(datetime.datetime.now()))
                  

                  This is because, without the int, under Python3 we see the following error:

                  TypeError: can't multiply sequence by non-int of type 'float'
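
A minimal stand-alone sketch of the difference, assuming Python 3. The variable names `line_title` and `label_width` are hypothetical stand-ins for `Line_title` and the constant 37 in the script. Floor division with `//` is another way to keep the result an integer; for non-negative values it gives the same padding as wrapping the division in `int`.

```python
line_title = 95     # width used by the script for UTF-8 files
label_width = 37    # constant subtracted in the script's title line

# Under Python 3, "/" always returns a float, so the repetition fails
try:
    padding = ' ' * ((line_title - label_width) / 2)
    error = None
except TypeError as exc:
    error = str(exc)

# Either of these keeps the repeat count an integer
padding_int = ' ' * int((line_title - label_width) / 2)
padding_floor = ' ' * ((line_title - label_width) // 2)

assert padding_int == padding_floor
```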
                  
                  • guy038G
                    guy038
                    last edited by

                    Hi, @alan-kilborn and All,

                    Just follow this link to find out why I decided to improve the View > Summary feature and to get the latest version of the Python script, which gives us a decent and exact Summary feature !

                    https://community.notepad-plus-plus.org/post/92794 ( 4 posts )

                    BR

                    guy038

                    • Alan KilbornA
                      Alan Kilborn @guy038
                      last edited by

                      @guy038 said:

                      Just follow this link

                      I’m MIGHTY confused as to why you felt the need to reanimate a several-years-old topic/thread to continue discussing what you dedicated this current thread to…
                      Why not just keep talking here?

                      • guy038G
                        guy038
                        last edited by guy038

                        Hello, @alan-kilborn,

                        Sorry to get you confused. I’ll try to explain why I wanted to continue on the other thread !

                        • Firstly, I wanted to show where my script came from and why : the whole logic of the View > Summary feature needed to be completely rebuilt :-((

                        • Secondly, I wanted to update these old posts. Indeed, at that time, the v7.9.1 N++ version had just been released. So, I recently ran some tests to check whether, following the encoding improvements of the v8.0 version, the overall logic of the summary had been improved. Unfortunately, the View > Summary feature still gives wrong results, especially when the present file is a UTF-16 BE BOM or UTF-16 LE BOM encoded file :-((

                        • Thus, it seemed obvious to me to continue on this thread and add the successive versions of my script !
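
For reference, the length that the script expects for a UTF-16 BE BOM or UTF-16 LE BOM file can be sketched in plain Python 3, outside Notepad++. This follows the script's own arithmetic (2 bytes per BMP character, EOL included, 4 bytes per character outside the BMP, plus a 2-byte BOM). The helper name `utf16_buffer_length` is hypothetical.

```python
def utf16_buffer_length(text, bom_bytes=2):
    """Expected size in bytes of text stored as UTF-16 with a BOM."""
    bmp = sum(1 for ch in text if ord(ch) <= 0xFFFF)   # 2 bytes each
    non_bmp = len(text) - bmp                          # surrogate pair, 4 bytes each
    return 2 * bmp + 4 * non_bmp + bom_bytes


sample = "Aé☀𝜜\r\n"

# Python's "utf-16" codec prepends a 2-byte BOM, like the BOM encodings in N++
expected = len(sample.encode("utf-16"))

assert utf16_buffer_length(sample) == expected
```

Comparing this value with the length on disk is a simple way to see whether a reported summary is plausible.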


                        Now, I realize that I could have stayed with this new thread and put a link to my initial post to help people understand the reasons for this Python script !

                        So, unless you’re terribly upset by my decision ( changing it would need a lot of modifications ), I suppose that I’ll go on posting any new versions of my script on the other thread !

                        To make things clearer, I could simply rename this present thread as Summary feature improvement and rename the other thread as Emulation of the "Summary" feature with Python script

                        Alan, what do you think of ?

                        Best Regards,

                        guy038

                        • Alan KilbornA
                          Alan Kilborn @guy038
                          last edited by

                          @guy038 said in Improved version of the "Summary" feature, ...:

                          what do you think of ?

                          I wouldn’t bother trying to rename things at this point.
                          It’s no problem simply because I was confused (that’s MY problem). :-)
                          Carry on… :-)
