Columns++ version 1.2: better Unicode search
-
I decided to release Columns++ version 1.2 now, mark it stable and put in a pull request for the plugins list. Hopefully it will make it into the list for Notepad++ 8.7.8. I hope calling it “stable” won’t prove to be hubris.
This version is focused on improving the behavior of regular expressions for Unicode documents:
- Matching is now based on Unicode code points rather than UTF-16 code units. Each Unicode code point is a single regular expression “character”; surrogate pairs are not used.
- The hexadecimal representation for code points beyond the basic multilingual plane can be entered directly (e.g., \x{1F642} for 🙂) in both the find and replace fields.
- The character classes documented for Unicode work, with the exception of Cs/Surrogate. (Unpaired surrogates cannot yield valid UTF-8; Scintilla displays attempts to encode them, aka WTF-8, as three invalid bytes, and this regular expression implementation treats them the same way.)
- These escapes are added:
  - \i matches invalid UTF-8 bytes. (You can also use [[:invalid:]].)
  - \o matches ASCII characters (code points 0-127).
  - \y matches defined characters (all except unassigned, invalid and private use; you can also use [[:defined:]]).
  - \I, \O and \Y match the complements of those classes.
- \X reliably matches a “grapheme cluster” (what normal people call a character) regardless of how many Unicode code points (what the regular expression engine sees as a “character”) comprise it.
- The Unicode character classes, named character classes and the \l and \u escapes are always case-sensitive, even when Match case is not checked or the (?i) modifier is used.
- All the control character and non-printing character abbreviations that are shown in reverse colors (depending on View | Show Symbol settings) can be used as symbolic character names: e.g., [[.NBSP.]] will find non-breaking spaces.
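To make the difference between code points, UTF-16 code units and UTF-8 bytes concrete, here is a small Python sketch. It is not Columns++ code; Python strings happen to be indexed by code point, which makes the comparison easy. It shows that 🙂 (U+1F642, what \x{1F642} matches) is one code point but a two-unit surrogate pair in UTF-16, that encoding an unpaired surrogate the WTF-8 way yields three bytes a strict UTF-8 decoder rejects, and that a decomposed é is two code points even though \X would treat it as one grapheme cluster. The helper name utf16_units is hypothetical.

```python
# Sketch only: contrasts code points, UTF-16 code units and UTF-8 bytes.
# utf16_units is a hypothetical helper, not part of Columns++.

def utf16_units(s: str) -> list[int]:
    """Split s into its UTF-16 code units (16-bit values)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


smiley = "\U0001F642"  # 🙂, entered as \x{1F642} in Columns++

print(len(smiley))                            # 1 code point
print([hex(u) for u in utf16_units(smiley)])  # ['0xd83d', '0xde42'] (a surrogate pair)
print(smiley.encode("utf-8").hex(" "))        # f0 9f 99 82

# Encoding a lone high surrogate "WTF-8 style" gives three bytes...
lone = "\ud83d"
wtf8 = lone.encode("utf-8", "surrogatepass")
print(wtf8.hex(" "))                          # ed a0 bd

# ...which strict UTF-8 decoding rejects, so they can only be treated as invalid bytes.
try:
    wtf8.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

# e + COMBINING ACUTE ACCENT: two code points, but one grapheme cluster for \X.
decomposed = "e\u0301"
print(len(decomposed))                        # 2
```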
In the Search in indicated region dialog, for all documents and search modes:
- Columns++ shows a progress dialog if the estimated time for a multiple search action (Count, Select, Replace All/Before/After) exceeds about two seconds.
- When nothing is selected, no search region is set, and a stepwise find or replace is initiated with Auto set checked (causing the search region to be set to the entire document), the search now starts from the caret position instead of from the beginning or end of the document.
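The caret change affects only where a stepwise Find begins, not which matches exist. A minimal Python sketch of the idea, assuming the match start offsets are already known and sorted, that a match starting exactly at the caret is found first, and that a forward Find wraps around at the end of the region. The function name next_match_order is hypothetical, and this is not how Columns++ is actually implemented.

```python
from bisect import bisect_left


def next_match_order(match_starts: list[int], caret: int) -> list[int]:
    """Order in which repeated forward Finds would visit the matches,
    starting at the first match at or after the caret and wrapping around.

    Assumes match_starts is sorted in ascending order.
    """
    i = bisect_left(match_starts, caret)
    return match_starts[i:] + match_starts[:i]


matches = [10, 50, 90]
print(next_match_order(matches, 0))    # [10, 50, 90]  (start from the top)
print(next_match_order(matches, 40))   # [50, 90, 10]  (start from the caret)
print(sorted(next_match_order(matches, 40)) == matches)  # True
```

Because the set of matches is the same either way, actions such as Count, Select or Replace All give the same results; only the starting point of the cycle moves.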
-
Hi, @Coises,
Many thanks for your new Columns++ version 1.2. So, you just anticipated my last reply, which confirmed, to my mind, that your last experimental release was mature ;-))
I was a bit confused by your last sentence:
the search now starts from the caret position instead of from the beginning or end of the document.
I was initially afraid that it would just, for example, count from the caret position to the end of the file. But by comparing your last version and the present one, I understood that the results are identical, as long as nothing was selected beforehand and the Auto set option was checked. Only the starting point of the cycle through the matches is different!
Now, may I request one useful improvement? The font used in the two drop-down lists Find what: and Replace with: is visibly a proportional font. To be convinced of this, enter the string WWWWWIIIII in the Find what zone! To my mind, it would be nice, as in Notepad++, to be able to choose a monospaced font instead (maybe as an option!).
A second possibility would be to allow the selection of a font from a drop-down list of all the installed fonts.
A third possibility would be to have an option, in the dialog, to enlarge these two zones, temporarily or not. I suppose that this last solution would be more difficult to implement!
For now, I just use the Microsoft magnifier feature (300 %) to solve this problem!
Best Regards,
guy038
-
@guy038 said in Columns++ version 1.2: better Unicode search:
It’s just the start of the cycle among the matches which is different !
While testing things, I kept making the mistake of placing the caret just before something I wanted to check, then opening search, clicking Find (not noticing that it said Find First and not Find Next) and having it bounce to the start of the document. I figured that if it’s counter-intuitive to me, it’s surely surprising to everyone else. Losing one’s place in a large document seems much more annoying than having to press Ctrl+Home if you want to start from the beginning, so I judged this to be a change that will do more good than harm.
Now, may I request one useful improvement? The font used in the two drop-down lists Find what: and Replace with: is visibly a proportional font. To be convinced of this, enter the string WWWWWIIIII in the Find what zone! To my mind, it would be nice, as in Notepad++, to be able to choose a monospaced font instead (maybe as an option!).
A second possibility would be to allow the selection of a font from a drop-down list of all the installed fonts.
A third possibility would be to have an option, in the dialog, to enlarge these two zones, temporarily or not. I suppose that this last solution would be more difficult to implement!
All good ideas. I hadn’t thought about the monospaced font. (I forgot that Notepad++ has that option — I remember that I liked it except that it takes up more space, so I can see less of what I’ve typed without making the dialog obscure even more of the document.)
A thought I’ve had for some time is to have a button that opens a second dialog, or an extended “pane” attached to the search dialog, that’s just for entering a regular expression or a replacement. My “vision” (and it’s only that — I’ve done no coding or even a mock-up yet) is that the expression entry areas would be Scintilla controls which would, at least by default, reflect the font and size used in the document; they could contain multiple lines and possibly have appropriate syntax highlighting. Ideally there would be some kind of a “builder” to help people who are less familiar with regular expressions know what they can enter (escapes, class names, symbolic character names, quantifiers — and those formulas I process in the replacement), and an area where users could save frequently-used expressions.
I’ve also wondered if search should be a dockable panel — so results of a find don’t get hidden behind the dialog, which I find an annoying occurrence. Dockable dialogs are kind of strange, though, and from what I’ve seen (I’m still learning), some of the control one has with an ordinary dialog is lost when it becomes dockable (such as that setting height and width constraints don’t seem to work, even when the dialog is undocked).
Either of those ideas is getting so far from the nominal purposes of Columns++, though, that it seems it would really be time to make a separate plugin. (Yes, @Alan-Kilborn, hoping someday it could be part of the main program. But far less “aggressive” changes have caused consternation when made to Notepad++; at the least, I think anything so dramatic should have a considerable test period to demonstrate its value and stability before I would dream of suggesting it as a replacement for existing functionality.)
-
Hi, @Coises and All,
Luckily, I do not need the Microsoft magnifier in my everyday work on my Windows 10 laptop!! But sometimes, as the font of your search dialog seems a bit small, it helped me to clearly see which kind of regex I had typed in during the tests of your experimental versions. For instance, I just used the default N++ zoom to prepare this post!
Note that regular expressions use a lot of characters that are not easy to distinguish, like the . character, the ( and ) characters, the [ and ] characters, the { and } characters, and so on, which look very thin in the present proportional font! So, whatever you plan to do about my request in the future, it should be better than the present situation. No doubt about it!
Best Regards,
guy038
-
@Coises said:
it seems it would really be time to make a separate plugin
I would go so far as to suggest something that looks and operates like the Notepad++ Find dialog and its tabs.
Then someone using the plugin would not have to learn anything new, and would feel “right at home”.
You know why I suggest this, right? :-)
-
@Alan-Kilborn said in Columns++ version 1.2: better Unicode search:
I would go so far as to suggest something that looks and operates like the Notepad++ Find dialog and its tabs.
Then someone using the plugin would not have to learn anything new, and would feel “right at home”.
You know why I suggest this, right? :-)
I think I do, but to be honest, if and when I take on such a project, non-trivial user interface changes would be the whole point. Given that, I’m not sure I’d want to tie myself to recreating a legacy user interface and using it as an underlying model. Familiarity would be a plus, but I am unlikely to impose it on myself as a constraint.
This is all far enough down the road that someone else might well get to it before I do, anyway. I have at least two other self-assignments that would come first, and that’s just in the realm of computer programming.