Columns++ version 1.3: All Unicode, all the time
-
Columns++ version 1.3 brings the enhancements for regular expressions in Unicode documents to ANSI documents as well:
- Regular expressions now match based on Unicode code points in all documents, so the syntax and semantics of regular expressions no longer depend on the underlying representation in Scintilla. The features added in version 1.2 for Unicode documents now work in all documents.
- In version 1.2, regular expressions did not work properly in ANSI documents when the system default code page was 932 (Japanese, Shift-JIS), 936 (Chinese Simplified, GB2312), 949 (Korean, Windows-949 / Unified Hangul Code) or 950 (Chinese Traditional, Big5). Regular expressions now match these documents based on Unicode code points.
- Character information tables for regular expressions were updated to reflect Unicode 17.0.0 (released on 2025-09-09).
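As a rough illustration of why the underlying representation matters for the CJK code pages: they are double-byte encodings, so a pattern applied to raw bytes and a pattern applied to code points see a different number of "characters". The sketch below uses Python's `re` module purely as a stand-in (it is not the engine used by Columns++ or Notepad++), and both the sample string and the helper name `count_units` are made up for the example. It does not claim to reproduce what actually went wrong in version 1.2.

```python
import re

def count_units(text: str, codepage: str) -> tuple[int, int]:
    """Count what `.` matches on the encoded bytes vs. on the code points."""
    raw = text.encode(codepage)
    per_byte = len(re.findall(rb".", raw, re.DOTALL))       # byte-based view
    per_code_point = len(re.findall(r".", text, re.DOTALL))  # code-point view
    return per_byte, per_code_point

# Hypothetical sample: one Hiragana letter followed by an ASCII letter.
sample = "あA"

# In code page 932 the Hiragana letter takes two bytes, the ASCII letter one.
print(count_units(sample, "cp932"))   # (3, 2)
```

Matching on code points gives the same count whatever code page the file is stored in, which is the property the list above describes.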
I haven’t yet set this as “stable” (and of course it isn’t yet in the plugins admin list). Anyone feeling adventuresome is welcome to try it and see if it behaves. There is documentation describing how Unicode-based matching in Columns++ differs from Notepad++ matching.
If anyone reading this routinely has one of the CJK code pages as their system default code page, I would be very interested to know if regular expressions in this version appear to work as expected on your locale’s ANSI files.
-
-
Hello, @coises and All,
I’ve just tried your latest ColumnsPlusPlus v1.3 release and, indeed, the search is now a true Unicode search, whatever the individual encoding of each file!
Let’s consider this simple UTF-8 text:
This ‟ is a † very • small ‰ text ‱ for › test 201F 2020 2022 2030 2031 203A in Unicode UTF-8 encoding
And this ANSI text:
This ? is a † very • small ‰ text ? for › test ? 0086 0095 0089 ? 009B in Windows-1252 encoding
IMPORTANT: when this second text is opened in N++, don’t forget to run the Encoding > Convert to ANSI option first!
Now, we can create the following table, which recapitulates the non-ASCII characters used in my examples:
•--------•-----------------•-----------------•
|        |  Windows-1252   |     Unicode     |
|        •--------•--------•--------•--------•
|  Char  |  Dec   |  Hex   |  Dec   |  Hex   |
•--------•--------•--------•--------•--------•
|   ‟    |   ?    |   ?    |  8223  |  201F  |
|        |        |        |        |        |
|   †    |  0134  |  0086  |  8224  |  2020  |
|        |        |        |        |        |
|   •    |  0149  |  0095  |  8226  |  2022  |
|        |        |        |        |        |
|   ‰    |  0137  |  0089  |  8240  |  2030  |
|        |        |        |        |        |
|   ‱    |   ?    |   ?    |  8241  |  2031  |
|        |        |        |        |        |
|   ›    |  0155  |  009B  |  8250  |  203A  |
•--------•--------•--------•--------•--------•
- In Notepad++:
  - Within an ANSI file, the regexes [†-‰] or [\x86-\x89] would only find the characters † and ‰, but not the •, whose Win-1252 code (\x95) is after \x89
  - Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code point is between 2020 and 2030
- In Columns++:
  - Within an ANSI file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code point is between 2020 and 2030
  - Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code point is between 2020 and 2030
Note that with the range [†-›] within an ANSI file, a N++ search for the • char would have been successful, as its Win-1252 code (\x95) lies between \x86 and \x9B, just as its code point (2022) lies between 2020 and 203A!
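The difference between the two behaviours can be mimicked outside the editor. The sketch below uses Python's `re` module as a stand-in for both engines (an assumption made only for illustration; neither Notepad++ nor Columns++ uses it): a bytes pattern over the Windows-1252 bytes plays the role of the N++ ANSI search, and a str pattern plays the role of a code-point search. The helper names are hypothetical.

```python
import re

# The ANSI sample text, cut before the hex digits for brevity.
text = "This ? is a † very • small ‰ text ? for › test"
ansi_bytes = text.encode("cp1252")

def byte_matches(pattern: bytes) -> str:
    """Match on raw Windows-1252 bytes, then decode the hits for display."""
    return b"".join(re.findall(pattern, ansi_bytes)).decode("cp1252")

def code_point_matches(pattern: str) -> str:
    """Match on Unicode code points."""
    return "".join(re.findall(pattern, text))

print(byte_matches(rb"[\x86-\x89]"))    # †‰   (• is \x95, outside the range)
print(code_point_matches("[†-‰]"))      # †•‰  (2022 lies between 2020 and 2030)
print(byte_matches(rb"[\x86-\x9B]"))    # †•‰› (\x95 lies between \x86 and \x9B)
```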
Now, @coises, I cannot easily test the CJK behaviour of your new search engine, as I obviously do not have a default CJK code page, which such a study needs! However, I do not see why your new search behaviour couldn’t be applied to any kind of Unicode chars ;-)
Best Regards,
guy038
-