Hello, @coises and All,
I’ve just tried your latest ColumnsPlusPlus v1.3 release and, indeed, the search is now a true Unicode search, whatever the encoding of each individual file!
Let’s consider this simple UTF-8 text:
This ‟ is a † very • small ‰ text ‱ for › test
   201F    2020   2022   2030    2031    203A  in Unicode (UTF-8 encoding)
And this ANSI text:
This ? is a † very • small ‰ text ? for › test
     ?     0086   0095    0089    ?    009B   in Windows-1252 encoding
IMPORTANT: when this second text is opened in N++, don’t forget to run the Encoding > Convert to ANSI option first!
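For those who want to check these byte values outside N++, here is a minimal Python sketch of that conversion. It assumes the ANSI code page is Windows-1252 (Python codec `cp1252`) and that unmappable characters are replaced by `?`, which is what we see for ‟ and ‱ above. The names `SAMPLE` and `to_ansi` are just illustrative, not N++ internals.

```python
# Sample line, written with \u escapes: ‟ † • ‰ ‱ ›
SAMPLE = "This \u201f is a \u2020 very \u2022 small \u2030 text \u2031 for \u203a test"


def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text with an ANSI code page; unmappable characters become b'?'."""
    return text.encode(codepage, errors="replace")


ansi = to_ansi(SAMPLE)
# The bytes repr shows the Windows-1252 codes: † = \x86, • = \x95, ‰ = \x89, › = \x9b
print(ansi)
# → b'This ? is a \x86 very \x95 small \x89 text ? for \x9b test'
```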
Now, we can build the following table, which summarizes the non-ASCII characters used in my examples:
    •--------•-----------------•-----------------•
    |        |   Windows-1252  |     Unicode     |
    |        •--------•--------•--------•--------•
    |  Char  |   Dec  |   Hex  |   Dec  |   Hex  |
    •--------•--------•--------•--------•--------•
    |   ‟    |   ?    |   ?    |  8223  |  201F  |
    |        |        |        |        |        |
    |   †    |  0134  |  0086  |  8224  |  2020  |
    |        |        |        |        |        |
    |   •    |  0149  |  0095  |  8226  |  2022  |
    |        |        |        |        |        |
    |   ‰    |  0137  |  0089  |  8240  |  2030  |
    |        |        |        |        |        |
    |   ‱    |   ?    |   ?    |  8241  |  2031  |
    |        |        |        |        |        |
    |   ›    |  0155  |  009B  |  8250  |  203A  |
    •--------•--------•--------•--------•--------•
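The values of this table can be cross-checked with a few lines of Python, again assuming that Python's `cp1252` codec matches the Windows-1252 code page used by N++. `table_row` is a hypothetical helper; a `?` means the character has no Windows-1252 code.

```python
# ‟ † • ‰ ‱ › as \u escapes
CHARS = "\u201f\u2020\u2022\u2030\u2031\u203a"


def table_row(ch: str) -> tuple:
    """Return (Win-1252 dec, Win-1252 hex, Unicode dec, Unicode hex) for one char."""
    try:
        code = ch.encode("cp1252")[0]  # strict mode: raises if there is no mapping
        win_dec, win_hex = f"{code:04d}", f"{code:04X}"
    except UnicodeEncodeError:
        win_dec = win_hex = "?"
    cp = ord(ch)
    return (win_dec, win_hex, str(cp), f"{cp:04X}")


for ch in CHARS:
    # ascii() keeps the output printable on any console
    print(ascii(ch), *table_row(ch))
```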
In Notepad++:
Within an ANSI file, the regexes [†-‰] or [\x86-\x89] would only find the characters † and ‰, but not •, whose Windows-1252 code (\x95) comes after \x89
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰, and also •, whose Unicode code-point (2022) lies between 2020 and 2030
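To picture the N++ behaviour, here is a rough Python model. It is only an assumption about how things work, with Python's `re` standing in for the Boost regex engine used by N++: in an ANSI document, the class range is compared on Windows-1252 byte values, while in a UTF-8 document it is compared on Unicode code points.

```python
import re

# Sample line: ‟ † • ‰ ‱ ›
SAMPLE = "This \u201f is a \u2020 very \u2022 small \u2030 text \u2031 for \u203a test"
ansi = SAMPLE.encode("cp1252", errors="replace")

# ANSI document (modelled): the range [\x86-\x89] is checked against byte values,
# so • (\x95) falls outside it.
ansi_hits = re.findall(rb"[\x86-\x89]", ansi)

# UTF-8 document (modelled): the range is checked against code points,
# so • (U+2022) falls inside it. Python writes \x{2020} as \u2020.
utf8_hits = re.findall(r"[\u2020-\u2030]", SAMPLE)

print(ansi_hits)                             # [b'\x86', b'\x89']
print([f"{ord(c):04X}" for c in utf8_hits])  # ['2020', '2022', '2030']
```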
In Columns++:
Within an ANSI file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰, and also •, whose Unicode code-point (2022) lies between 2020 and 2030
Within a UTF-8 file, the regexes [†-‰] or [\x{2020}-\x{2030}] would find the characters † and ‰, and also •, whose Unicode code-point (2022) lies between 2020 and 2030
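The Columns++ behaviour can be modelled the same way: decode the document to Unicode first, whatever its encoding, then run the regex on code points. This is only a hypothetical picture of the idea, not the actual Columns++ code, and `unicode_search` is an invented name.

```python
import re

# Sample line: ‟ † • ‰ ‱ ›
SAMPLE = "This \u201f is a \u2020 very \u2022 small \u2030 text \u2031 for \u203a test"


def unicode_search(raw: bytes, encoding: str, pattern: str) -> list:
    """Hypothetical model: decode the whole document, then match on code points."""
    text = raw.decode(encoding)
    return re.findall(pattern, text)


utf8_doc = SAMPLE.encode("utf-8")
ansi_doc = SAMPLE.encode("cp1252", errors="replace")  # ‟ and ‱ become ?

for raw, enc in ((ansi_doc, "cp1252"), (utf8_doc, "utf-8")):
    hits = unicode_search(raw, enc, r"[\u2020-\u2030]")
    print(enc, [f"{ord(c):04X}" for c in hits])
# cp1252 ['2020', '2022', '2030']
# utf-8 ['2020', '2022', '2030']
```

Both encodings give the same three matches, which is exactly the point: the range is evaluated on code points, not on the bytes stored in the file.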
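Note that, using the range [†-›] within an ANSI file, an N++ search would have found the • char, because its Windows-1252 code (\x95) lies within the \x86 to \x9B range! Its Unicode code-point (2022) also lies within the 2020 to 203A range, so, in this specific case, both search engines agree.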
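With the same byte-level model as before (an assumption, not N++'s real code), the ANSI case of [†-›] can be checked: once converted to Windows-1252, the pattern becomes the byte range [\x86-\x9B], and \x95 lies inside it.

```python
import re

# Sample line: ‟ † • ‰ ‱ ›
SAMPLE = "This \u201f is a \u2020 very \u2022 small \u2030 text \u2031 for \u203a test"
ansi = SAMPLE.encode("cp1252", errors="replace")

# Convert the pattern [†-›] itself to Windows-1252, as an ANSI document would hold it.
pattern = "[\u2020-\u203a]".encode("cp1252")  # b'[\x86-\x9b]'
hits = re.findall(pattern, ansi)

print(hits)  # [b'\x86', b'\x95', b'\x89', b'\x9b']
```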
Now, @coises, I cannot easily test the CJK behaviour of your new search engine, as I obviously do not have a CJK default code page, which such a study would need! However, I do not see why your new search behaviour couldn’t be applied to any kind of Unicode character ;-)
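For what it's worth, the decode-then-search principle can at least be modelled outside N++ with a CJK code page. The sketch below assumes Python's `cp936` codec (Simplified Chinese GBK) as the ANSI code page and uses the CJK Unified Ideographs block (U+4E00 to U+9FFF) as the range; it says nothing about how Columns++ actually behaves on a system with a real CJK code page.

```python
import re

# Hypothetical sample line with two CJK ideographs (U+4E2D, U+6587),
# stored with a double-byte CJK code page.
raw = "Columns++ \u4e2d\u6587 test".encode("cp936")

# Each ideograph takes two bytes here, so a single-byte class cannot match it;
# after decoding, a plain code-point range does the job.
text = raw.decode("cp936")
hits = re.findall(r"[\u4e00-\u9fff]", text)

print([f"{ord(c):04X}" for c in hits])  # ['4E2D', '6587']
```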
Best Regards,
guy038