Colored "Find what:" zone

PeterJones

The only time I don’t have regex mode enabled is when I do a fresh unzip to help someone on the forum. And given that many of those are with old versions of Notepad++, a new feature wouldn’t help me. ;-)

That said, I would not mind some immediate indicator, like the colors you showed. Though I might suggest that Normal should be white (ie, the way it is now), Extended be one of the yellows you show, and Regex be the green; that way, it would tell us power users “green means you are in the right mode, yellow means you are in a weird mode, and boring white means default”.

guy038

Hello All,

Peter’s advice is quite judicious ! Indeed :

Fewer colors are used than with my solution
For [ noob ] people, the Normal mode would stay unchanged
For people more involved with regexes, the color green is definitely more relaxing than any kind of red, anyway !

So, I updated my previous picture, as below :

Best regards,

guy038

P.S :

Of course, @peterjones, like you and many others, I mainly use the Regular expression mode. However I also use the Normal mode when comparing two text zones which should be identical … in the end ! So I select the concerned zone, even on multiple lines, choose the Normal mode and tick the Match case option.

I, then, hit the Ctrl + F3 shortcut and any other selected area is, necessarily, strictly identical to the initial one. I use this method when I want to compare the expected result of an OP and the results obtained after running a specific S/R !

Alan Kilborn

I don’t like color-coding these fields. One reason is that it is not obvious what the colors are for. I suppose experimentation by a user would reveal it.

I would much rather colors be used for an error condition (red), a non-error condition (green), or a warning condition (yellow). Not sure what these could mean in this context, but that’s what I think of for colors.

Why not just change the text to the left? Instead of “Find what”, how about “Find Regex” or “Find Ext” when in those modes. This moves the info closer to point-of-use.

I do agree that it can be a bit tedious when you accidentally search in the wrong mode, but I find that I accidentally search with “whole word” or “match case” on just as often as the wrong mode. So I think maybe the answer is pay closer attention to the UI, but that may not be a great answer.

Like Peter, I stay in regex mode most of the time. If I want normal, I just pop a quick \Q on the front of my field (no need for a trailing \E). Although still in regex mode technically, it is effectively normal mode.

And “Extended” mode – I say “bah! What good is that?”

I also use the Normal mode when comparing two text zones which should be identical …

Why not use the Search > Mark All feature for that?
If 2+ sections are identical, they instantly get the same colorization.
From this:

To this at the press of one keycombo:

mere-human

@guy038 said in Colored "Find what:" zone:

I also thought about a specific shortcut to open the Find dialog with the Regular expression automatically ticked. But, in that case, what about the Replace, Find in Files and Mark dialogs ?

How about opening the needed search dialog (e.g. Ctrl + F) and then pressing Alt + G to switch to the Regex mode?

guy038

Hello, @alan-kilborn, @mere-human and All,

So, you don’t fancy the color-coding feature ?!

You said :

Why not just change the text to the left? Instead of “Find what”, how about “Find Regex” or “Find Ext”

Well, one might say, with reason, just give a quick look at the search mode area, at the bottom of the dialogs !! I don’t think that a change of the field label would be noticed any better :-(

You also said :

but I find that I accidentally search with “whole word” or “match case” on just as often as the wrong mode

You score one point, there !

And :

If I want normal, I just pop a quick \Q on the front of my field

I do use this form, sometimes, if I want to search, for instance, a text like /* this a ( simple ) test of the \Q syntax ! */ in regex mode ! However for multi-lines blocks of text, I prefer the Ctrl + F3 way. But, anyway, I need to know the current search mode !

Finally, you added :

Why not use the Search > Mark All feature for that?

As I’ve just said, above, you must be sure that the current search mode is Normal before running the Mark process !

So a general solution does not seem easy ! Many times, when building a regex, I’m obviously not 100% sure that it’s the right one. So, when I get a Can't find message, I know that my try is syntactically correct but not exact, and five minutes may pass before I realize that I am, stupidly, in Normal mode :-(

Now, a solution could be to slightly change the look of the Find and Replace buttons, according to the current search mode

A last idea, in the picture below :

BR

guy038

@mere-human :

Oh… yes, really interesting ! I’ve never used it before but I suppose, as a shortcuts-maniac person, that I will get used to it very easily ;-))

In addition, the G key just follows the F key, on any keyboard, all over the world !

Alan Kilborn

@guy038 said in Colored "Find what:" zone:

I don’t think that this is your day for understanding my posts, or maybe I’m just not clear enough.

When I said:

Why not just change the text to the left? Instead of “Find what”, how about “Find Regex” or “Find Ext”

I didn’t mean change this as a user; just like your colors feature, having the text change would be done by the developers, when a mode is selected. Changing the text to something constant, as a user can do, is pointless here. Or maybe I misunderstand you as well.

But, having something much closer to the find box that shows the mode is better than at the bottom, no?

When I said:

Why not use the Search > Mark All feature for that?

I think you interpreted that as Search > Mark… instead, despite my orange (not red) coloring in my screenshot. The Mark All commands always work in “normal” mode on literal text; no regexes involved. (I think I remember something about a tie-in to Match case however?)

How about opening the needed search dialog (e.g. Ctrl + F) and then pressing Alt + G to switch to the Regex mode?

This is doable, but quite awkward. If it was Ctrl+f, Ctrl+g it would flow much better. But it’s not.

A last idea, in the picture below :

I’m not understanding that picture fully. Does it suggest to move search-mode into a one-letter dropdown, between the Find what text and the box itself? I don’t think that’s what you had in mind, but I kind of like that. :-)

If a user didn’t know what those N/E/R were they could hover and a speech callout could explain it.

But overall, sorry, no, I for one wouldn’t be on-board with the “colors” idea. But not everyone agrees with everyone else all the time (surely this site has reinforced that idea!).

It must be “the week” for people to come up with ideas to change the Find windows – right Terry? :-)

astrosofista

@mere-human said in Colored "Find what:" zone:

@guy038 said in Colored "Find what:" zone:

I also thought about a specific shortcut to open the Find dialog with the Regular expression automatically ticked. But, in that case, what about the Replace, Find in Files and Mark dialogs ?

How about opening the needed search dialog (e.g. Ctrl + F) and then pressing Alt + G to switch to the Regex mode?

Since sometimes the same thing happened to me as @guy038 tells, I solved the issue with a script. I couldn’t use Alt + G, as you suggested, because I usually switch between two localizations, so I had to use another approach. Using RegEx, of course!

The demo shows the English version of this AutoHotkey script, which can be easily adapted to other languages by adding or replacing the corresponding terms:

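; Window titles below are treated as regular expressions
SetTitleMatchMode, RegEx

; Ctrl+F, Ctrl+H and Ctrl+M keep their native function (~ prefix) and all run the code below
~^f::
~^h::
~^m::
; Wait until the Find, Replace or Mark dialog is active
WinWaitActive, Find|Replace|Mark
; Read the state of button18 (the Regular expression radio button in the English dialog)
ControlGet, status, Checked,, button18, Find|Replace|Mark
If (status = 0) {
	SetControlDelay -1
	ControlClick, button18, A ; checks Regular expression
}
return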

Take care and have fun!

guy038

Hello, @peterjones, @alan-kilborn, @mere-human, @astrosofista and All,

OMG, I must have been very tired last night and had my head elsewhere. Let’s start again :

First, in the picture of my previous post, just pay attention to the small box, colored or not, between the mention Find what: and the field where you type in your search regex. This box would not be a button at all, just a visual and/or colored mark, located near the zone to fill in, which would help us to remember the current search mode. ( Think of it as the colored boxes of the options Search > Mark All and Search > Unmark All, with, in addition, the uppercase N, E or R letter, inside )

I did understand that this small box would be updated, to the present search mode, by some new program code (not by the user), when :
- You open one of the Find dialogs
- You manually click on a radio button of the Search mode section

Now, regarding your Mark idea, I’m terribly sorry ! Indeed, I did not notice that you spoke about the five Mark Style # ( with style 2 in your picture ). Indeed, I do like this solution, too ! But you’re right : when using the Search > Mark All option, the search is done with the current state of the Match Whole word only and Match case options. So, in order to detect strictly identical areas of text, I suppose that :
- The Match Whole word only option should be unset
- The Match case option should be set

Now, in short, I finally think that it would not be valuable to submit such an idea, given @mere-human’s and @Astrosofista’s workarounds and @alan-kilborn’s suggestion of the Mark Style feature !

And, anyway, I suppose that, for at least April, I will remember to check the current search mode, just because I’ve written all this stuff ;-))

Best Regards,

guy038

Alan Kilborn

@guy038 said in Colored "Find what:" zone:

just pay attention to the small box, colored or not, between the mention Find what: and the field where you type in your search regex. This box would not be a button at all

I think it actually has more value to truly BE a button, or, more accurately, a horizontally-narrow dropdown, where one could select, for example:

N for normal mode search
E for extended mode search
R for regular expression mode search (with . does not match newline)
R. for regular expression mode search (with . matches newline)

Of course, when dropped, it could appear like this for maximum user help:

N : Normal Search Mode
E : Extended Search Mode
R : Regular Expression Search Mode (. does not match newline)
R. : Regular Expression Search Mode (. matches newline)

With such a change, the Search Mode box near the bottom of the window could be eliminated, making room for future searching goodie options! :-)

Most users could just ignore it, and if it is small that is easy to do (just leave it set at N). But for users that switch, something small and to the left of the text to find (Guy’s idea!) seems definitely worth it.

I can’t believe I would get excited about a Find UI change that I know devs won’t like – because they don’t like any of these types of things.

I did not notice that you spoke about the five Mark Style # ( with style 2 in your picture ). Indeed, I do like this solution, too !

I suppose I had forgotten that “styling” is controlled by match-case and match-whole-word settings. I think maybe that makes my suggested solution somewhat less great than I first thought. :-(

Somewhat like @astrosofista , I use automation to help keep these boxes clear when invoking a new Find, so I think that’s why I forgot.

guy038

Hi, @alan-kilborn and All,

Regarding the Mark style feature, unfortunately, it cannot take into account a range of more than 2,047 characters either, just like the Find what: zone :-(

Just duplicate a zone of, let’s say, 5,000 characters or so ! I had never done such a test before ;-))

BR

guy038

Alan Kilborn

@guy038 said in Colored "Find what:" zone:

Regarding the Mark style feature, unfortunately, it cannot take into account a range of more than 2,047 characters, just like the Find what: zone

Well, I had not encountered this before (having not attempted such a large “styling”) but it does not surprise me because clearly “styling” is going to involve Notepad++'s internal find routines to do its job. And those, as we know, have this 2047 limit.

But really, it isn’t much of a limitation to what we’re discussing (your usage when doing before/after regex replacement “compares”), right? Perhaps you were thinking that the “styling” method may be better because it might not have this limitation?

As an alternative for such a mechanism for such compares, what I do is to use an independent compare utility. The utility can do quick-to-invoke compares on the last two things copied to the clipboard. Thus it is perfect for your described application. Everyone touts the N++ Compare plugin, but I find a separate utility outside of N++, with possibly some hooks “into” N++ (via PythonScript) to be even more useful. Nothing against N++'s Compare plugin, however.

But, back to the 2047 (or is it 2046? I can’t remember) limit…

Is it truly 2047 characters, or is it 2047 bytes? Also a “can’t remember” for me. If it is “bytes” then, worst case for UTF-8 data, it might be as little as 2047-divided-by-4, or roughly 512 characters.
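As a quick check of that worst-case arithmetic, here is a minimal Python sketch. It only assumes the hypothesis raised above, namely that the 2047 limit counts UTF-8 bytes; the function name is made up for illustration.

```python
# Hypothesis from the post: the limit is expressed in bytes, not characters.
LIMIT_BYTES = 2047

def guaranteed_chars(limit_bytes, bytes_per_char):
    """Number of whole characters that fit when every character needs bytes_per_char bytes."""
    return limit_bytes // bytes_per_char

# UTF-8 uses between 1 and 4 bytes per character.
for width in (1, 2, 3, 4):
    print(width, "byte(s) per char ->", guaranteed_chars(LIMIT_BYTES, width), "chars")
```

With four-byte characters only, 511 whole characters fit, which is the "roughly 512" mentioned above.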

But even if it is characters, is such a limit “too small” for today’s conditions?
Maybe lobbying the N++ devs for an increase in this number is a reasonable thing to do?

Alan Kilborn

@Alan-Kilborn said in Colored "Find what:" zone:

I think it actually has more value to truly BE a button, or, more accurately, a horizontally-narrow dropdown,

One thing that my earlier proposal does NOT consider, is changing modes via keyboard.
I don’t currently have a great idea for this, without increasing the size on the UI.
But probably it is all pointless anyway, as Find UI changes are rarely considered by the devs.

guy038

Hi, @alan-kilborn and All,

I did a series of tests and I’ve found out an interesting point about the Find What: filling zone !!

A) First case :

Make a normal selection of some text or use the current selection
Hit the Ctrl + F, Ctrl + H, Ctrl + Shift + F or Ctrl + M shortcut. So, this selection usually fills in the Find What: zone, automatically

=> In this case, the maximum size of this zone is 2,046 bytes, whatever the characters stored and the number of bytes needed to encode each character. For instance, the string Aé▣🎷 contains 1 + 2 + 3 + 4 bytes, in a UTF-8 file. So, 10 bytes are inserted in the Find what: zone

B) Second case :

Copy the current selection in the clipboard, with Ctrl + C
Cancel the current selection
Hit the Ctrl + F, Ctrl + H, Ctrl + Shift + F or Ctrl + M shortcut
Delete the contents of the Find What: zone, whatever it is
Paste the contents of the clipboard with Ctrl + V, in the Find what: zone

=> In that case, the maximum size of this zone is 2,046 chars and :

Each character, with Unicode code-point <= U+FFFF, stands for one character
Each character, with Unicode code-point > U+FFFF, stands for two characters !

So the same string Aé▣🎷 contains 1 + 1 + 1 + 2 chars. Thus, 5 pseudo characters are inserted in the Find what: zone
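Both counts can be reproduced with a short Python sketch. It assumes (the thread does not confirm this) that the "chars" of case B) are UTF-16 code units, which would explain why a character beyond U+FFFF counts twice. The helper names are only illustrative.

```python
def utf8_bytes(text):
    """Size of the text in bytes once encoded as UTF-8 (the count seen in case A)."""
    return len(text.encode("utf-8"))

def utf16_units(text):
    """Size of the text in 16-bit code units (assumed to be the count seen in case B)."""
    return len(text.encode("utf-16-le")) // 2

# A (U+0041), é (U+00E9), ▣ (U+25A3), 🎷 (U+1F3B7), written with escapes to avoid ambiguity
sample = "A\u00e9\u25a3\U0001F3B7"

print(utf8_bytes(sample))   # 1 + 2 + 3 + 4 = 10
print(utf16_units(sample))  # 1 + 1 + 1 + 2 = 5
```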

Remark that this case B) occurs, also, for the Replace with: zone, as we need, necessarily, to fill in this zone with clipboard contents, anyway ! Therefore, the maximum size of the Replace with: zone is 2,046 characters, too, with the above distinction between characters within or outside the BMP !

BTW, we get the same results whatever the current search mode used !

Best Regards,

guy038

P.S. :

For a quick test, note the differences between cases A) and B) with the one-line text of the ▣ character ( U+25A3 ), below :

▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣
▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣▣

First case :

Select all the above text ( 2,109 chars )
Open the Find dialog ( The Find what: zone is filled in, automatically )
Tick the Wrap around option
Click once only on the Find Next button

=> 682 characters ▣ are selected ( each char is coded with three bytes E2 96 A3 => 682 * 3 = 2,046 bytes )

Second case :

Select again all this text ( 2,109 chars )
Hit Ctrl + C
Cancel the selection
Open the Find dialog
Delete anything in the Find what: zone
Paste the clipboard contents with Ctrl + V, in the Find what: zone
Tick the Wrap around option
Click on the Find Next button

=> 2,046 characters ▣ are selected ( each char counts for itself, as its code-point is <= U+FFFF )
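The two results of this quick test can be simulated with a small Python sketch. It is only a model, not Notepad++ code: it assumes that the field simply keeps as many whole characters as fit in 2,046 units, counted either as UTF-8 bytes (first case) or as UTF-16 code units (second case). The function names are hypothetical.

```python
LIMIT = 2046  # value reported in the tests above

def utf8_size(ch):
    """Cost of one character in UTF-8 bytes."""
    return len(ch.encode("utf-8"))

def utf16_size(ch):
    """Cost of one character in UTF-16 code units (2 outside the BMP)."""
    return len(ch.encode("utf-16-le")) // 2

def fit(text, size_of, limit=LIMIT):
    """Keep the longest prefix of text whose total cost does not exceed limit."""
    kept = []
    used = 0
    for ch in text:
        cost = size_of(ch)
        if used + cost > limit:
            break
        kept.append(ch)
        used += cost
    return "".join(kept)

block = "\u25a3" * 2109  # the test text: 2,109 times the U+25A3 character

print(len(fit(block, utf8_size)))   # 682 characters, i.e. 682 * 3 = 2,046 bytes
print(len(fit(block, utf16_size)))  # 2046 characters, one code unit each
```

Under the same assumptions, a text made only of characters outside the BMP, such as 🎷, would be cut after 511 characters in the first case and after 1,023 in the second.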

Alan Kilborn

@guy038 said in Colored "Find what:" zone:

I did a series of tests and I’ve found out an interesting point about the Find What: filling zone !!

This sounds like more than “interesting” behavior.
It sounds like “buggy” behavior.
And it sounds like possibly several bugs. :-(

guy038

Hi, @alan-kilborn and All,

Yeah, I admit that it’s really borderline ! Now, which case seems more logical and which case seems more interesting ?

I would say that :

The first case seems more logical as it considers the total amount of bytes inserted in the Find what: zone, which is strictly equal to the total amount of bytes of the current selection, before calling the Find dialog
Now, the second case, pasting contents in the Find what: zone, is more interesting, of course, because we can search for a greater range of characters ( at least, 2 times more ) !

Best Regards,

guy038

Alan Kilborn

@guy038

I would say, that if the developers are going to set some kind of limit (and often in software a limit must be set), then for user convenience and understanding, it should be a character limit. Users don’t understand characters versus bytes (unless those numbers are strictly the same, and with UTF-8 and other encodings they are NOT).

And, different methods of entry should of course not alter the amount of data that can be accepted.

guy038

Hello, @alan-kilborn and All,

Yes, Alan, the character point of view should be preferred to the byte one, like, for instance, the sel : number, in status bar, which refers to characters ( not bytes ) !

So, the second case should, therefore, be preferred. However, note that, presently, there is still a difference between chars in the BMP, counting for one char and characters outside the BMP, counting for two. A bit weird, BTW ?

BR

guy038

Alan Kilborn

@guy038 said in Colored "Find what:" zone:

the character point of view should be preferred to the byte one, like, for instance, the sel : number, in status bar, which refers to characters ( not bytes ) !

And the Pos : number, in the status bar, bothers me somewhat, as it seems intuitively like it should be one character = one “position” change as you cursor over it. But for multibyte characters it is NOT a change of one.

The Pythonscript programmer in me sort of understands this, however. Meaning how Scintilla deals with “position”.

chars in the BMP, counting for one char and characters outside the BMP, counting for two.

You might understand that way better than me.
The “two” makes me think of “surrogate pairs” but here is where I back off because I don’t know what I’m talking about. :-)

guy038

Hello, @Alan-kilborn and All,

Thanks, Alan, for your feedback !

Yes, I know that the Pos number, in the status bar, refers to the exact position (starting from 0), in the current file, of the first byte of the sequence needed to write a character, in a specific encoding ! For instance, the UTF-8 sequence of the 🎷 character, representing a saxophone, is the four-byte sequence ( F0 9F 8E B7 ). So, if you insert, in a new tab, the string A🎷Z0, you can jump, with the Search > Go to... feature, when the Offset radio button is set, to :

Pos 0, right before the A letter
Pos 1, right before the 🎷 character
Pos 5, right before the Z letter
Pos 6, right before the 0 digit

And, if you try the offset 2, 3 or 4, which are all within the UTF-8 encoding of the 🎷 character, you would just jump to the next Z char !
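These offsets can be checked with a minimal Python sketch. It only mimics the behavior described above for a UTF-8 file (an offset falling inside a multi-byte sequence lands on the next character); it is not how Notepad++ or Scintilla implement it, and the function names are made up for the example.

```python
def utf8_offsets(text):
    """Return (byte offset, character) pairs for text encoded as UTF-8."""
    offsets = []
    pos = 0
    for ch in text:
        offsets.append((pos, ch))
        pos += len(ch.encode("utf-8"))
    return offsets

def snap_forward(text, offset):
    """Return the first character-start offset that is >= offset (or the end of the text)."""
    for start, _ in utf8_offsets(text):
        if start >= offset:
            return start
    return len(text.encode("utf-8"))

sample = "A\U0001F3B7Z0"  # A🎷Z0

for start, ch in utf8_offsets(sample):
    print("Pos", start, "right before", ch)   # Pos 0, 1, 5 and 6

for requested in (2, 3, 4):
    print(requested, "->", snap_forward(sample, requested))  # all three land on 5, before Z
```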

This behavior is now correct, because I created an issue about this problem. Refer to this issue !

Now, I think that you’re right regarding your assumption about the two chars counted for a character over the BMP : this has really something to do with the two 16-bit units of the surrogate mechanism !

For instance :

To match the 🎷 character with a regex, use its surrogate pair \x{D83C}\x{DFB7} ( as we cannot use its complete hexadecimal code \x{1F3B7} )
The general regex (?-s).[\x{D800}-\x{DFFF}] finds any character over the BMP ( Basic Multilingual Plane ), so with code-point over \x{FFFF}
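For any other character over the BMP, the surrogate pair can be derived with the standard UTF-16 formula. The Python sketch below shows the computation; the helper names are hypothetical, and the output string simply follows the \x{....}\x{....} notation used in the regex above.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points from U+10000 to U+10FFFF have a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low

def as_regex(code_point):
    """Write the surrogate pair in the \\x{....}\\x{....} form used above."""
    return "\\x{%04X}\\x{%04X}" % surrogate_pair(code_point)

print(as_regex(0x1F3B7))  # the saxophone character: \x{D83C}\x{DFB7}
```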

BR

guy038