    How Np++ handles char count?

    8 Posts 4 Posters 1.4k Views
    • pintassilgoP
      pintassilgo
      last edited by pintassilgo

      Please, could someone point me to the piece of code that handles the char count displayed in statusbar (“length”)? Not the displaying of the counter, but the code that does the counting.

      Context:

      I switched to Linux a year ago and, as much as Np++ works through Wine, a native app is obviously preferred. I spent all this time without finding a proper Linux replacement for Np++ until I discovered CudaText last week. It is super customizable, has powerful plugins similar to the Np++ ones, and its UI can be simplified to a simple notepad app with tabs instead of the scary UI of complete IDEs… And it has good performance with big files and files with very long lines, where many editors fail. Even Np++ has a little delay when I write in big files.

      I asked for a few improvements to the app and the dev quickly added them. Now, my last request was to add char counter to statusbar. The dev said it wasn’t implemented for performance reasons. I’d like to show code examples or at least approaches used by other FOSS editors to convince him that a char counter is harmless.

      Thanks.

      • Alan KilbornA
        Alan Kilborn @pintassilgo
        last edited by Alan Kilborn

        @pintassilgo said in How Np++ handles char count?:

        I asked for a few improvements to the app and the dev quickly added them

        Wow. I’d be really grateful for that if I were you. This type of thing does not typically happen in Notepad++ land. :-)

        add char counter to statusbar. The dev said it wasn’t implemented for performance reasons

        And you just don’t trust the dev on this?
        Wouldn’t you think he knows his code and gave you a good answer to your inquiry?
        And you think you can “encourage” him by telling him how “easy” it is for another editor to do it?
        Software authors love it when people that don’t know anything start bandying about the word “easy”.

        You’re on a dubious course of action.
        Perhaps you will burn the goodwill that he seems to have towards you.
        Ah, but no matter, I suppose…


        To answer your direct question, you can search the codebase for exactly this: length : ; note there is a space before and a space after the :, just as in the N++ status bar.

        • pintassilgoP
          pintassilgo @pintassilgo
          last edited by pintassilgo

          @Alan-Kilborn said in How Np++ handles char count?:

          And you just don’t trust the dev on this?
          Wouldn’t you think he knows his code and gave you a good answer to your inquiry?

          Actually, asking here was his suggestion, he is open and kind and is apparently willing to add the feature if it’s not so hard to code.

          He’s just interested to see how it’s done in other editors, because I gave the example of some others that display a char count with apparently no performance impact, even for big files. The examples I mentioned were Np++, SciTE and Kate.

          • Alan KilbornA
            Alan Kilborn @pintassilgo
            last edited by

            @pintassilgo

            asking here was his suggestion, he is open and kind and is apparently willing to add the feature

            Well…nice.

            I think you’ll find that Notepad++ itself doesn’t keep track of character count; it’s Scintilla (Notepad++'s underlying editing component) that does it.

            • Mark OlsonM
              Mark Olson
              last edited by

              Can’t point to the exact piece of code, but as a plugin developer who has to worry about this stuff I can say with confidence that the character count in a document is the number of bytes in the UTF-8 representation of that document.

              This is actually pretty simple:

              ASCII characters (anything with char code less than 128) get one byte, so if you move your cursor across such a character, the position increments by 1.
              Any character with char code between 128 and 2047 inclusive (for example, Я) gets 2 bytes, so when the cursor moves across such a character, the position increments by 2.
              Any character with char code between 2048 and 65535 inclusive (for example, ồ, 草) gets 3 bytes, so when the cursor moves across such a character, the position increments by 3.
              Any character bigger than that (mostly emojis like 😀) gets 4 bytes apiece, so the cursor increments by 4 when it moves across such a character.
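
The four ranges above can be checked with a short Python sketch. The function names `utf8_len` and `utf8_total` are made up for this illustration; they are not part of Notepad++ or Scintilla. The sketch only restates the rule and compares it against Python's own UTF-8 encoder.

```python
def utf8_len(ch: str) -> int:
    """Bytes needed to encode one code point in UTF-8 (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x80:       # char code below 128
        return 1
    if cp < 0x800:      # 128 to 2047
        return 2
    if cp < 0x10000:    # 2048 to 65535
        return 3
    return 4            # 65536 and above


def utf8_total(text: str) -> int:
    """Sum of the per-character byte sizes."""
    return sum(utf8_len(ch) for ch in text)


sample = "A\u042f\u8349\U0001f600"  # A, Я, 草, 😀
print([utf8_len(ch) for ch in sample])                    # [1, 2, 3, 4]
print(utf8_total(sample) == len(sample.encode("utf-8")))  # True
```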

              Copy and paste this into your editor to see what I mean:

              [
                  "Я",
                  "◐",
                  "ồ",
                  "ェ",
                  "草", // Chinese character for "grass"
                  "😀",
                  // below are the same characters written as \u escapes of their UTF-16 code units (the escape form used by JSON and JavaScript)
                  "\u042f",
                  "\u25d0",
                  "\u1ed3",
                  "\uff6a",
                  "\u8349",
                  "\ud83d\ude00"
              ]
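
As a small illustration of the last entry in that list, here is a Python sketch showing that the two escapes `\ud83d\ude00` form one character once a JSON parser combines them. It assumes only the standard `json` module and is a demonstration of the encoding arithmetic, not of anything Notepad++ does internally.

```python
import json

escaped = '"\\ud83d\\ude00"'    # JSON text holding a surrogate-pair escape
decoded = json.loads(escaped)   # the JSON parser joins the pair into one code point

print(len(decoded))                           # 1 code point
print(len(decoded.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
print(len(decoded.encode("utf-8")))           # 4 UTF-8 bytes
```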
              
              • Alan KilbornA
                Alan Kilborn
                last edited by Alan Kilborn

                @Mark-Olson said in How Np++ handles char count?:

                "😀",
                

                So, yea, Mark makes a good point here.
                In Notepad++, the length : in the status bar isn’t the number of characters in your document, it is the number of bytes that make up the file buffer.
                So if your UTF-8 file is made up entirely of characters that are all emojis, Notepad++ will show you the number of emoji characters, times 4.

                So really, it depends upon what you want in such an informative field.
                As a user, I’d think that I want to see the number of characters.

                In a really off-the-cuff kind of way, I’d think that neither number is processor-intensive to obtain. I’d think it is calculated once when the file is loaded (OK, that could be intensive, but users are used to waiting when huge files are loaded) and then updated incrementally with typical editing adds/deletes (which would just be pluses/minuses to the number the editor already holds), which shouldn’t be a processing hog. But talk is cheap; the only way to truly know is to dig into the code and/or experiment.
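
The "count once at load, then add and subtract" idea can be sketched roughly as follows. This is a minimal illustration under assumed names (`ByteCounter`, `on_insert`, `on_delete` are hypothetical); it is not how Notepad++ or Scintilla is actually implemented.

```python
class ByteCounter:
    """Keeps a running UTF-8 byte count (hypothetical sketch)."""

    def __init__(self, text: str) -> None:
        # One full pass over the text, done once when the file is loaded.
        self.length = len(text.encode("utf-8"))

    def on_insert(self, inserted: str) -> None:
        # Cost depends only on the size of the inserted text.
        self.length += len(inserted.encode("utf-8"))

    def on_delete(self, deleted: str) -> None:
        # Cost depends only on the size of the deleted text.
        self.length -= len(deleted.encode("utf-8"))


counter = ByteCounter("abc")
counter.on_insert("\U0001f600")  # typing 😀 adds 4 bytes
counter.on_delete("b")           # deleting one ASCII char removes 1 byte
print(counter.length)            # 6
```

With this approach each edit costs time proportional to the edited text, not to the whole document.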

                • guy038G
                  guy038
                  last edited by guy038

                  Hi, @pintassilgo, @alan-kilborn, @mark-olson and All,

                  Well, in order to recapitulate :

                  Let’s start with this string AЯ◐ồェ草😀, without an ending line-break, written in a new tab

                  • If the current encoding is not UTF-8, use the Encoding > Convert to UTF-8 option

                  Then :

                  • In case of NO selection :

                    • If the cursor is right before the A char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 1 Pos: 1
                    • If the cursor is right before the Я char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 2 Pos: 2
                    • If the cursor is right before the ◐ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 3 Pos: 4
                    • If the cursor is right before the ồ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 4 Pos: 7
                    • If the cursor is right before the ェ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 5 Pos: 10
                    • If the cursor is right before the 草 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 6 Pos: 13
                    • If the cursor is right before the 😀 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 7 Pos: 16
                    • If the cursor is right after the 😀 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Pos: 20

                  • In case of selection of a single char of the line :

                    • If the A char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 2 Sel: 1 | 1
                    • If the Я char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 3 Sel: 1 | 1
                    • If the ◐ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 4 Sel: 1 | 1
                    • If the ồ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 5 Sel: 1 | 1
                    • If the ェ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 6 Sel: 1 | 1
                    • If the 草 char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 7 Sel: 1 | 1
                    • If the 😀 char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Sel: 1 | 1

                  • In case of a CTRL + A action :

                    • The status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Sel: 7 | 1

                  Notes :

                  • As this new file does not have a BOM header, note that the total length is simply : 1 + 2 + 3 + 3 + 3 + 3 + 4 bytes = 19 bytes

                  • In case of NO selection, the difference between two consecutive Pos: values gives the number of bytes of the character the cursor has just passed over

                  • In case of a selection of a single character, the Sel: value is always the number of characters : only 1

                  • In case of a Ctrl + A action, the sel: value represents the total number of characters of the file

                  • In all cases, the Length: value represents the total number of bytes of the file, without the possible BOM bytes !
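
The Length: and Pos: values listed above can be reproduced with a short Python sketch. The function name `status_bar_positions` is invented for this illustration, and the test string is built from the escapes in Mark Olson's post; nothing here is taken from the Notepad++ source.

```python
from typing import List


def status_bar_positions(line: str) -> List[int]:
    """1-based byte position before each char, plus the position after the last one."""
    positions = [1]
    for ch in line:
        positions.append(positions[-1] + len(ch.encode("utf-8")))
    return positions


# A followed by the six characters from Mark Olson's list
line = "A\u042f\u25d0\u1ed3\uff6a\u8349\U0001f600"

print(len(line.encode("utf-8")))   # 19 bytes, the Length: value
print(len(line))                   # 7 characters, the Sel: value after CTRL + A
print(status_bar_positions(line))  # [1, 2, 4, 7, 10, 13, 16, 20]
```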

                  Best Regards,

                  guy038

                  • pintassilgoP
                    pintassilgo
                    last edited by

                    Thanks for all the replies. Byte count looks good enough for me. There is already a dedicated plugin for more refined statistics such as counting chars and words, so for the statusbar I believe a byte count is enough.

                    However, the other editor I’m talking about (CudaText) doesn’t use Scintilla. It has its own editing component (ATSynEdit) that doesn’t use a simple buffer storage from which to retrieve the byte count; it stores the text as a “list of lines”, according to the dev, so the count would be slow.

                    An optimization is possible by caching the size of each line, adding them up and then updating with edit events (typing one char sums 1…). But at the moment he doesn’t plan to work on it because it’s not that trivial, as the editor supports multiple carets, multiple selections and so on.
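
The caching idea described above might look roughly like this for a "list of lines" storage. This is only a sketch: `LineStore` and its methods are hypothetical and have nothing to do with ATSynEdit's real code, and the total deliberately ignores line-break bytes to keep the example short.

```python
from typing import List


class LineStore:
    """List-of-lines storage with cached per-line UTF-8 sizes (hypothetical sketch)."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        self.sizes = [len(s.encode("utf-8")) for s in self.lines]
        self.total = sum(self.sizes)  # assumption: line-break bytes are not counted

    def replace_line(self, index: int, new_text: str) -> None:
        # Only the edited line is re-measured; the total is adjusted by the difference.
        new_size = len(new_text.encode("utf-8"))
        self.total += new_size - self.sizes[index]
        self.lines[index] = new_text
        self.sizes[index] = new_size

    def insert_line(self, index: int, text: str) -> None:
        size = len(text.encode("utf-8"))
        self.lines.insert(index, text)
        self.sizes.insert(index, size)
        self.total += size

    def delete_line(self, index: int) -> None:
        self.total -= self.sizes.pop(index)
        del self.lines[index]


store = LineStore(["ab", "\u042f"])
store.replace_line(0, "ab\U0001f600")  # typing 😀 at the end of line 0
print(store.total)                     # 8
```

Under this design, an edit made with several carets would simply be several `replace_line` calls, one per affected line, so the bookkeeping stays per-line. Whether that fits the real editor's edit events is something only its developer can judge.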

                    Just adding: another text editor I have installed here, Kate, displays not only the char count (precisely, not bytes) but also the word count in the statusbar. However, these counts aren’t instant; I can see they wait for about half a second without changes to the text before updating. Clearly it’s done this way because it’s not cheap enough to be done instantly.
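
The "wait until the text has been idle for a while" behaviour can be sketched as a simple debounce. This is a guess at the general technique, not Kate's actual implementation; `DebouncedStats`, `on_edit` and `poll` are hypothetical names, and the caller passes the time in explicitly so the example stays deterministic.

```python
from typing import Optional


class DebouncedStats:
    """Recomputes char/word counts only after a quiet period (hypothetical sketch)."""

    def __init__(self, delay: float = 0.5) -> None:
        self.delay = delay                       # seconds of inactivity required
        self.last_edit: Optional[float] = None   # time of the most recent edit, if pending
        self.chars = 0
        self.words = 0

    def on_edit(self, now: float) -> None:
        # Every edit restarts the waiting period.
        self.last_edit = now

    def poll(self, text: str, now: float) -> bool:
        # Called periodically (e.g. from a UI timer); returns True if counts were refreshed.
        if self.last_edit is None or now - self.last_edit < self.delay:
            return False
        self.chars = len(text)          # code points, not bytes
        self.words = len(text.split())  # whitespace-separated words
        self.last_edit = None
        return True


stats = DebouncedStats(delay=0.5)
stats.on_edit(now=0.0)
stats.on_edit(now=1.0)                   # user is still typing
print(stats.poll("a b", now=1.25))       # False: only 0.25 s of quiet
print(stats.poll("a b", now=2.0))        # True: 1 s of quiet, counts refreshed
print(stats.chars, stats.words)          # 3 2
```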
