@guy038 said in Columns++ version 1.3: All Unicode, all the time:
As you can see, a lot of Format characters return an erroneous result of 3,240 occurrences. But we’re not going to bother about these wrong equivalence classes, as long as the similar collating names, with the [[.XXX.]] syntax, are totally correct !
Luckily, all the other equivalence classes are also correct, except for [[=ls=]] which returns 2 matches \x{2028} and \x{FE47} ??
Still looking into this, I find this statement in the Boost::regex documentation (emphasis mine):
An expression of the form [[=col=]], matches any character or collating element whose primary sort key is the same as that for collating element col, as with collating elements the name col may be a symbolic name. A primary sort key is one that ignores case, accentation, or locale-specific tailorings; so for example [[=a=]] matches any of the characters: a, À, Á, Â, Ã, Ä, Å, A, à, á, â, ã, ä and å. Unfortunately implementation of this is reliant on the platform’s collation and localisation support; this feature can not be relied upon to work portably across all platforms, or even all locales on one platform.

I am using:

LCMapStringEx(locale.data(), LCMAP_SORTKEY | LINGUISTIC_IGNOREDIACRITIC | NORM_IGNORECASE | NORM_IGNOREKANATYPE | NORM_IGNOREWIDTH | NORM_LINGUISTIC_CASING, ...

as my best guess at how to do this.
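To make the Boost wording concrete: two characters fall in the same equivalence class when their primary sort keys compare equal. The sketch below illustrates only that comparison, in portable C++. The table inside `toyPrimaryKey` is hypothetical, hand-written data standing in for whatever the platform's collation support would return, and both function names are invented for this example. It is not how Columns++ or Boost actually implements the feature.

```cpp
#include <cassert>        // used by the checks in the usage note
#include <unordered_map>

// Hypothetical primary-key lookup: maps a code point to the base letter it
// is assumed to sort with at the primary level (case and accents ignored).
// The entries are illustrative only, NOT real collation data; a real
// implementation would obtain a sort key from the platform instead.
char32_t toyPrimaryKey(char32_t c) {
    static const std::unordered_map<char32_t, char32_t> table = {
        {U'a', U'a'}, {U'A', U'a'},
        {U'\u00E0', U'a'},  // à
        {U'\u00C0', U'a'},  // À
        {U'\u00E4', U'a'},  // ä
        {U'k', U'k'}, {U'K', U'k'},
    };
    auto it = table.find(c);
    // Characters missing from the toy table act as their own primary key.
    return it == table.end() ? c : it->second;
}

// True when `candidate` would match [[=col=]] under the toy table, i.e.
// when both characters share the same primary key.
bool inEquivalenceClass(char32_t candidate, char32_t col) {
    return toyPrimaryKey(candidate) == toyPrimaryKey(col);
}
```

With this table, `assert(inEquivalenceClass(U'\u00C0', U'a'))` holds and `assert(!inEquivalenceClass(U'b', U'a'))` holds. The point of the sketch is that every result depends entirely on where the keys come from, which is why different key sources can disagree about which characters belong to a class.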
There are some differences other than the format characters between my search and Notepad++. For example, [[=k=]] matches Ʞ (U+A7B0) in Columns++ search, but not in Notepad++ native search; though both match its lower-case counterpart, ʞ (U+029E).
I do wonder why [[=ls=]] matches ﹇ (U+FE47) as well as U+2028. Though Notepad++ native search does not accept the [[=ls=]] syntax, substituting the actual U+2028 character, [[= =]] (you can copy that even though you can’t see it), yields 12 matches, including U+FE47.
Do you know if there is a precise definition of what should count as an equivalence class in Unicode regular expressions? It is unclear to me for what target I should be aiming.