How Np++ handles char count?
-
Could someone please point me to the piece of code that handles the char count displayed in the status bar (“length”)? Not the code that displays the counter, but the code that does the counting.
Context:
I switched to Linux a year ago and, as much as Np++ works through Wine, a native app is obviously preferred. I spent all that time without finding a proper Linux replacement for Np++ until I discovered CudaText last week. It’s super customizable, has powerful plugins similar to the Np++ ones, and the UI can be simplified to a simple notepad-like app with tabs instead of the scary UI of full IDEs… And it has good performance with big files and files with very long lines, where many editors fail. Even Np++ has a little delay when I type in big files.
I asked for a few improvements to the app and the dev quickly added them. My latest request was to add a char counter to the statusbar. The dev said it wasn’t implemented for performance reasons. I’d like to show code examples, or at least the approaches used by other FOSS editors, to convince him that a char counter is harmless.
Thanks.
-
@pintassilgo said in How Np++ handles char count?:
I asked for a few improvements to the app and the dev quickly added them
Wow. I’d be really grateful for that if I were you. This type of thing does not typically happen in Notepad++ land. :-)
add char counter to statusbar. The dev said it wasn’t implemented for performance reasons
And you just don’t trust the dev on this?
Wouldn’t you think he knows his code and gave you a good answer to your inquiry?
And you think you’ll “encourage” him by telling him how “easy” it is for another editor to do it?
Software authors love it when people who don’t know anything start bandying about the word “easy”. You’re on a dubious course of action.
Perhaps you will burn the goodwill that he seems to have towards you.
Ah, but no matter, I suppose…
To answer your direct question, you can search the codebase for exactly this:
"length : "
(note there is a space before and a space after the ":", just as in the N++ status bar).
-
@Alan-Kilborn said in How Np++ handles char count?:
And you just don’t trust the dev on this?
Wouldn’t you think he knows his code and gave you a good answer to your inquiry?
Actually, asking here was his suggestion, he is open and kind and is apparently willing to add the feature if it’s not too hard to code.
He’s just interested in seeing how it’s done in other editors, because I gave him examples of other editors that display a char count with apparently no impact, even for big files. The examples I mentioned were Np++, SciTE and Kate.
-
asking here was his suggestion, he is open and kind and is apparently willing to add the feature
Well…nice.
I think you’ll find that Notepad++ itself doesn’t keep track of character count; it’s Scintilla (Notepad++'s underlying editing component) that does it.
-
Can’t point to the exact piece of code, but as a plugin developer who has to worry about this stuff I can say with confidence that the character count in a document is the number of bytes in the UTF-8 representation of that document.
This is actually pretty simple:
ASCII characters (anything with char code less than 128) get one byte, so if you move your cursor across such a character, the position increments by 1.
Any character with char code between 128 and 2047 inclusive (for example, Я) gets 2 bytes, so when the cursor moves across such a character, the position increments by 2.
Any character with char code between 2048 and 65535 inclusive (for example, ồ, 草) gets 3 bytes, so when the cursor moves across such a character, the position increments by 3.
Any character bigger than that (mostly emojis like 😀) gets 4 bytes apiece, so the cursor increments by 4 when it moves across such a character.
Copy and paste this into your editor to see what I mean:
[
    "Я",
    "◐",
    "ồ",
    "ェ",
    "草", // Taiwanese char for "grass"
    "😀",
    // below are the same characters written as UTF-16 escape sequences (as used in JavaScript and JSON strings)
    "\u042f",
    "\u25d0",
    "\u1ed3",
    "\uff6a",
    "\u8349",
    "\ud83d\ude00"
]
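A minimal Python sketch of the four byte-count rules above. utf8_len and utf8_byte_count are hypothetical helper names; the rule is cross-checked against Python's own str.encode("utf-8"). Note that in a Python string literal, 😀 is written as the single escape \U0001F600 rather than the UTF-16 surrogate pair used in JavaScript.

```python
def utf8_len(ch: str) -> int:
    """Number of UTF-8 bytes for one code point, using the four ranges above."""
    cp = ord(ch)
    if cp < 0x80:      # 0..127 (ASCII): 1 byte
        return 1
    if cp < 0x800:     # 128..2047, e.g. Я: 2 bytes
        return 2
    if cp < 0x10000:   # 2048..65535, e.g. ồ, 草: 3 bytes
        return 3
    return 4           # 65536 and above, e.g. 😀: 4 bytes


def utf8_byte_count(text: str) -> int:
    """Total UTF-8 size of a string, i.e. what a byte-based 'length' would report."""
    return sum(utf8_len(ch) for ch in text)


# Same characters as the list above (Я ◐ ồ ｪ 草 😀), written with Python escapes.
sample = "\u042f\u25d0\u1ed3\uff6a\u8349\U0001F600"

for ch in sample:
    # The hand-written rule must agree with Python's real UTF-8 encoder.
    assert utf8_len(ch) == len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X}", utf8_len(ch), "byte(s)")

print(len(sample), "characters,", utf8_byte_count(sample), "bytes")  # 6 characters, 18 bytes
```

The gap between the two printed numbers is exactly the difference between a character count and a byte count.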
-
@Mark-Olson said in How Np++ handles char count?:
"😀",
So, yea, Mark makes a good point here.
In Notepad++, the "length :" value in the status bar isn’t the number of characters in your document; it is the number of bytes that make up the file buffer.
So if your UTF-8 file is made up entirely of emoji characters, Notepad++ will show you the number of emoji characters times 4.
So really, it depends upon what you want in such an informative field.
As a user, I’d think that I want to see the number of characters.
In a really off-the-cuff kind of way, I’d think that neither number is processor-intensive to obtain. I’d think it is calculated once when the file is loaded (OK, that could be intensive, but users are used to waiting when huge files are loaded), and then maintained incrementally: typical edits (adds/deletes) would just be plusses/minuses to the number the editor already holds, so they wouldn’t be a processing hog. But talk is cheap; the only way to truly know is to dig into the code and/or experiment.
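The "count once on load, then adjust on each edit" idea can be sketched in Python roughly as follows. CountedBuffer is a hypothetical class, not Notepad++ or Scintilla code, and the plain Python string only stands in for an editor's real text storage; the point is that updating the totals costs time proportional to the edited text, not to the whole document.

```python
class CountedBuffer:
    """Text buffer that keeps character and UTF-8 byte totals up to date (hypothetical sketch).

    The full count is paid once when the text is loaded; after that, each edit
    only adjusts the totals by the size of the text inserted or deleted.
    """

    def __init__(self, text: str = "") -> None:
        self.text = text                        # stand-in for the editor's real storage
        self.chars = len(text)                  # one-time O(n) count on load
        self.bytes = len(text.encode("utf-8"))  # one-time O(n) count on load

    def insert(self, index: int, s: str) -> None:
        self.text = self.text[:index] + s + self.text[index:]
        # Updating the totals costs O(len(s)), independent of document size.
        self.chars += len(s)
        self.bytes += len(s.encode("utf-8"))

    def delete(self, index: int, length: int) -> None:
        removed = self.text[index:index + length]
        self.text = self.text[:index] + self.text[index + length:]
        self.chars -= len(removed)
        self.bytes -= len(removed.encode("utf-8"))


buf = CountedBuffer("A\u8349")   # "A草": 2 characters, 4 bytes
buf.insert(2, "\U0001F600")      # append 😀: 3 characters, 8 bytes
buf.delete(0, 1)                 # remove "A": 2 characters, 7 bytes
print(buf.chars, buf.bytes)      # 2 7
```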
-
Hi, @pintassilgo, @alan-kilborn, @mark-olson and All,
Well, in order to recapitulate :
Let’s start with this string:
AЯ◐ồェ草😀
written, without an ending line-break, in a new tab. If the current encoding is not UTF-8, use the Encoding > Convert to UTF-8 option.
Then :
- In case of NO selection :
  - If the cursor is right before the A char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 1 Pos: 1
  - If the cursor is right before the Я char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 2 Pos: 2
  - If the cursor is right before the ◐ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 3 Pos: 4
  - If the cursor is right before the ồ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 4 Pos: 7
  - If the cursor is right before the ェ char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 5 Pos: 10
  - If the cursor is right before the 草 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 6 Pos: 13
  - If the cursor is right before the 😀 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 7 Pos: 16
  - If the cursor is right after the 😀 char, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Pos: 20
- In case of selection of a single char of the line :
  - If the A char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 2 Sel: 1 | 1
  - If the Я char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 3 Sel: 1 | 1
  - If the ◐ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 4 Sel: 1 | 1
  - If the ồ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 5 Sel: 1 | 1
  - If the ェ char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 6 Sel: 1 | 1
  - If the 草 char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 7 Sel: 1 | 1
  - If the 😀 char is selected, the status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Sel: 1 | 1
- In case of a CTRL + A action :
  - The status bar shows Length: 19 lines: 1 ..... Ln : 1 Col: 8 Sel: 7 | 1
Notes :
- As this new file does not have a BOM header, note that the total length is simply : 1 + 2 + 3 + 3 + 3 + 3 + 4 bytes = 19 bytes
- In case of NO selection, the difference between two consecutive Pos: values shows the number of bytes of the character the cursor passed over
- In case of a selection of a single character, the Sel: value is always the number of characters : only 1
- In case of a Ctrl + A action, the Sel: value represents the total number of characters of the file
- In all cases, the Length: value represents the total number of bytes of the file, without the possible BOM bytes !
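The numbers in these notes can be recomputed from the UTF-8 encoding of the string with a short Python sketch. status_bar_values is a hypothetical helper that models the values listed above; it is not Notepad++'s code.

```python
# guy038's test string "AЯ◐ồェ草😀", written with escapes; no BOM, no trailing line break.
TEXT = "A\u042f\u25d0\u1ed3\u30a7\u8349\U0001F600"


def status_bar_values(text: str):
    """Hypothetical helper reproducing the Length / Col / Pos / Sel numbers above."""
    length = len(text.encode("utf-8"))   # "Length:" = UTF-8 bytes (BOM not counted)
    stops = []                           # (Col, Pos) at each caret stop, left to right
    col, pos = 1, 1                      # both are 1-based in the status bar
    for ch in text:
        stops.append((col, pos))
        col += 1                         # Col advances by one per character
        pos += len(ch.encode("utf-8"))   # Pos advances by the character's byte size
    stops.append((col, pos))             # caret after the last character
    sel_all = len(text)                  # Ctrl+A "Sel:" = number of characters
    return length, stops, sel_all


length, stops, sel_all = status_bar_values(TEXT)
print("Length:", length)                 # Length: 19
for col, pos in stops:
    print(f"Col: {col} Pos: {pos}")
print("Sel after Ctrl+A:", sel_all)      # Sel after Ctrl+A: 7
```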
Best Regards,
guy038
-
Thanks for all the replies. A byte count looks good enough for me: there is already a dedicated plugin for more refined statistics like character and word counts, and for the status bar I believe a byte count is enough.
However, the other editor I’m talking about (CudaText) doesn’t use Scintilla; it has its own editing component (ATSynEdit), which doesn’t store the text in a simple buffer from which the byte count can be read directly. According to the dev, it keeps a “list of lines”, so the count would be slow.
An optimization is possible by caching the size of each line, adding them up and then updating on edit events (typing one char adds 1…). But at the moment he doesn’t plan to work on it, because it’s not trivial: the editor supports multiple carets, multiple selections and so on.
Just to add: another text editor I have installed here, Kate, displays not only a character count (actual characters, not bytes) but also a word count in the status bar. However, these counts aren’t instant: I can see they wait for about half a second without any change in the text before updating. Clearly it’s done this way because it’s not cheap enough to do instantly.
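The optimization described here (cache each line's size, sum once, then update on edit events) might look roughly like this in Python. LineListDocument and its methods are hypothetical and are not ATSynEdit's API; the sketch assumes LF line endings (1 byte per break) and treats a multi-caret edit as a batch of per-line replacements.

```python
class LineListDocument:
    """Hypothetical 'list of lines' document that caches each line's UTF-8 size.

    Assumes LF line endings (1 byte per line break); CRLF would need 2.
    """

    def __init__(self, text: str = "") -> None:
        self.lines = text.split("\n")
        self.sizes = [len(line.encode("utf-8")) for line in self.lines]
        # One full pass on load: line sizes plus one byte per line break.
        self.total = sum(self.sizes) + len(self.lines) - 1

    def set_line(self, i: int, new_text: str) -> None:
        """Replace line i (new_text has no line breaks); only that line is re-measured."""
        new_size = len(new_text.encode("utf-8"))
        self.total += new_size - self.sizes[i]
        self.lines[i] = new_text
        self.sizes[i] = new_size

    def insert_line(self, i: int, text: str) -> None:
        size = len(text.encode("utf-8"))
        self.lines.insert(i, text)
        self.sizes.insert(i, size)
        self.total += size + 1             # the line itself plus its line break

    def delete_line(self, i: int) -> None:
        if len(self.lines) == 1:           # keep at least one (possibly empty) line
            self.set_line(0, "")
            return
        self.total -= self.sizes[i] + 1    # the line itself plus its line break
        del self.lines[i]
        del self.sizes[i]

    def apply_multi_caret(self, edits: list[tuple[int, str]]) -> None:
        """Each caret's edit is just one more per-line update of the cached total."""
        for i, new_text in edits:
            self.set_line(i, new_text)
```

One caveat for files with very long lines: set_line re-encodes the whole edited line, so a finer-grained version would adjust the total by the size of just the typed or deleted text.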
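Kate's half-second wait looks like a debounce: the counts are recomputed only after the text has stopped changing for a while. The Python sketch below is a guess based on that observed behavior, not Kate's actual code; DebouncedStats is a hypothetical name, the 0.5 s delay comes from the description above, and the current time is passed in explicitly so the logic can be tested without real timers.

```python
class DebouncedStats:
    """Recount characters and words only after the text has been idle for `delay` seconds.

    Hypothetical sketch: in an editor, `now` would come from a clock and
    on_tick would be called from a periodic UI timer.
    """

    def __init__(self, delay: float = 0.5) -> None:
        self.delay = delay
        self.pending_since = None   # time of the latest unprocessed edit, or None
        self.chars = 0
        self.words = 0

    def on_edit(self, now: float) -> None:
        # Each edit restarts the waiting period, so fast typing never triggers a recount.
        self.pending_since = now

    def on_tick(self, now: float, text: str) -> bool:
        """Recount if an edit is pending and the idle delay has passed; return True if it did."""
        if self.pending_since is None or now - self.pending_since < self.delay:
            return False
        self.chars = len(text)            # full O(n) pass, but only once per pause
        self.words = len(text.split())    # whitespace-separated words
        self.pending_since = None
        return True


stats = DebouncedStats(delay=0.5)
stats.on_edit(now=0.0)
stats.on_edit(now=0.4)                     # still typing
print(stats.on_tick(0.8, "hello world"))   # False: only 0.4 s idle
print(stats.on_tick(1.0, "hello world"))   # True: 0.6 s idle, recounted
print(stats.chars, stats.words)            # 11 2
```

The trade-off is the one visible in Kate: the expensive full count never runs while you type, at the cost of the numbers lagging briefly behind the text.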