Hello, @coises and All,
I’ve just tried your latest ColumnsPlusPlus v1.3 release and, indeed, the search now behaves as a true Unicode search, whatever the individual encoding of each file !
Let’s consider this simple UTF-8 text :
This ‟ is a † very • small ‰ text ‱ for › test 201F 2020 2022 2030 2031 203A in Unicode UTF-8 encoding

And this ANSI text :
This ? is a † very • small ‰ text ? for › test ? 0086 0095 0089 ? 009B in Windows-1252 encoding

IMPORTANT : Once this second text is opened in N++, don’t forget to run the Encoding > Convert to ANSI option first !
Now, we can create the following table, which recapitulates the Non-ASCII characters used in my examples :
•--------•-----------------•-----------------•
|        |  Windows-1252   |     Unicode     |
|        •--------•--------•--------•--------•
|  Char  |  Dec   |  Hex   |  Dec   |  Hex   |
•--------•--------•--------•--------•--------•
|   ‟    |   ?    |   ?    |  8223  |  201F  |
|        |        |        |        |        |
|   †    |  0134  |  0086  |  8224  |  2020  |
|        |        |        |        |        |
|   •    |  0149  |  0095  |  8226  |  2022  |
|        |        |        |        |        |
|   ‰    |  0137  |  0089  |  8240  |  2030  |
|        |        |        |        |        |
|   ‱    |   ?    |   ?    |  8241  |  2031  |
|        |        |        |        |        |
|   ›    |  0155  |  009B  |  8250  |  203A  |
•--------•--------•--------•--------•--------•

In Notepad++ :
Within an ANSI file, the regexes [†-‰] or [\x86-\x89] would only find the characters † and ‰ but not the • whose Win-1252 code ( \x95 ) is after \x89
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the • whose Unicode code-point ( 2022 ) is between 2020 and 2030
In Columns++ :
Within an ANSI file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the • whose Unicode code-point is between 2020 and 2030
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the • whose Unicode code-point is between 2020 and 2030
Note that, using the range [†-›] within an ANSI file, a N++ search would also have found the • char, as its Win-1252 code ( \x95 ) lies within the \x86 to \x9B range !
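For anyone who wants to check the two ways of reading a range outside of N++, here is a minimal Python sketch. It is only an emulation under my own assumptions: a byte-level regex over Windows-1252 bytes stands in for the N++ search in an ANSI file, and a regex over Unicode text stands in for the UTF-8 case and for the Columns++ behaviour described above. The variable names are mine and nothing here calls Notepad++ or Columns++.

```python
import re

# The four test characters that exist in Windows-1252:
# U+2020 dagger, U+2022 bullet, U+2030 per mille, U+203A right angle quote
chars = "\u2020\u2022\u2030\u203A"

# Code-point view: the range is compared on Unicode code-points
unicode_hits = re.findall("[\u2020-\u2030]", chars)

# Byte view: the same characters as Windows-1252 bytes (86, 95, 89, 9B)
ansi = chars.encode("cp1252")
narrow_hits = re.findall(rb"[\x86-\x89]", ansi)  # byte range for dagger..per mille
wide_hits = re.findall(rb"[\x86-\x9B]", ansi)    # byte range for dagger..angle quote

# U+201F and U+2031 have no Windows-1252 code, hence the ? in the ANSI text
unmapped = "\u201F\u2031".encode("cp1252", errors="replace")

print(unicode_hits)                                # dagger, bullet, per mille
print([b.decode("cp1252") for b in narrow_hits])   # dagger, per mille only
print([b.decode("cp1252") for b in wide_hits])     # all four characters
print(unmapped)                                    # two question marks
```

The narrow byte range misses the bullet because 95 is greater than 89, while the wide byte range catches it because 95 lies between 86 and 9B.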
Now, @coises, I cannot easily test the CJK behaviour of your new search engine, as I obviously do not have a default CJK code page, which is needed for such a study ! However, I do not see why your new search behaviour couldn’t be applied to any kind of Unicode chars ;-)
Best Regards,
guy038