Hello, @coises and All,
I’ve just tried your last ColumnsPlusPlus v1.3 release and indeed, the search is now considered as a true Unicode search, whatever the individual encoding of each file !
Let’s consider this simple UTF-8 text :
This ‟ is a † very • small ‰ text ‱ for › test
201F 2020 2022 2030 2031 203A in Unicode UTF-8 encoding
And this ANSI text :
This ? is a † very • small ‰ text ? for › test
? 0086 0095 0089 ? 009B in Windows-1252 encoding
IMPORTANT : When this second text is opened in N++, don’t forget to run the Encoding > Convert to ANSI option first !
Now, we can create the following table, which summarizes the non-ASCII characters used in my examples :
•--------•-----------------•-----------------•
|        |  Windows-1252   |     Unicode     |
|        •--------•--------•--------•--------•
|  Char  |  Dec   |  Hex   |  Dec   |  Hex   |
•--------•--------•--------•--------•--------•
|   ‟    |   ?    |   ?    |  8223  |  201F  |
|        |        |        |        |        |
|   †    |  0134  |  0086  |  8224  |  2020  |
|        |        |        |        |        |
|   •    |  0149  |  0095  |  8226  |  2022  |
|        |        |        |        |        |
|   ‰    |  0137  |  0089  |  8240  |  2030  |
|        |        |        |        |        |
|   ‱    |   ?    |   ?    |  8241  |  2031  |
|        |        |        |        |        |
|   ›    |  0155  |  009B  |  8250  |  203A  |
•--------•--------•--------•--------•--------•
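For anyone who wants to double-check the values in this table, here is a small Python sketch. It is not related to N++ or Columns++ themselves : it only relies on Python's built-in `cp1252` codec, and the helper name `codes` is my own, purely for illustration.

```python
def codes(ch):
    """Return (Windows-1252 byte or None, Unicode code point) for one character."""
    try:
        ansi = ch.encode("cp1252")[0]
    except UnicodeEncodeError:
        ansi = None  # the character has no Windows-1252 equivalent
    return ansi, ord(ch)


for ch in "‟†•‰‱›":
    ansi, uni = codes(ch)
    # Show "?" when the character cannot be encoded in Windows-1252
    ansi_txt = "?" if ansi is None else f"{ansi:04d} / {ansi:04X}"
    print(f"{ch}   Win-1252 : {ansi_txt:<12} Unicode : {uni} / {uni:04X}")
```

The characters ‟ and ‱ come out as `?` on the Windows-1252 side, which matches the two `?` rows of the table.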
In Notepad++ :
Within an ANSI file, the regexes [†-‰] or [\x86-\x89] would only find the characters † and ‰, but not the •, whose Win-1252 code ( \x95 ) is after \x89
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code-point is between 2020 and 2030
In Columns++ :
Within an ANSI file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code-point is between 2020 and 2030
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰ and also the •, whose Unicode code-point is between 2020 and 2030
Note that, using the range [†-›] within an ANSI file, a N++ search of the • char would have been successful, as its Win-1252 code ( \x95 ) lies within the \x86 to \x9B range !
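The difference between the two ways of comparing a range can be sketched outside of N++ with Python's `re` module. Note that this is only an analogy : I am assuming that a byte-wise search on the Windows-1252 form behaves like the N++ ANSI case and that a code-point search behaves like the Unicode case. It does not use the actual N++ or Columns++ engines.

```python
import re

text = "This ‟ is a † very • small ‰ text ‱ for › test"

# Unicode-aware search : the range is compared by code point (U+2020 to U+2030)
unicode_hits = re.findall("[\u2020-\u2030]", text)

# Windows-1252 form of the text : ‟ and ‱ cannot be encoded and become "?"
ansi_bytes = text.encode("cp1252", errors="replace")

# Byte-wise search : the range [†-‰] is compared by byte value (0x86 to 0x89)
byte_hits = [b.decode("cp1252") for b in re.findall(rb"[\x86-\x89]", ansi_bytes)]

# The wider range [†-›] is 0x86 to 0x9B byte-wise, which does include • (0x95)
wide_hits = [b.decode("cp1252") for b in re.findall(rb"[\x86-\x9B]", ansi_bytes)]

print(unicode_hits)
print(byte_hits)
print(wide_hits)
```

In this sketch, the code-point search returns †, • and ‰, the narrow byte-wise search returns only † and ‰, and the wide byte-wise search returns †, •, ‰ and ›.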
Now, @coises, I cannot easily test the CJK behaviour of your new search engine, as I obviously do not have a default CJK code-page, which is needed for such a study ! However, I do not see why your new search behaviour couldn’t be applied to any kind of Unicode chars ;-)
Best Regards,
guy038