@guy038 said in Columns++ version 1.3: All Unicode, all the time:
As you can see, a lot of Format characters return an erroneous result of 3,240 occurrences. But we’re not going to bother about these wrong equivalence classes, as long as the similar collating names, with the [[.XXX.]] syntax, are totally correct !
Luckily, all the other equivalence classes are also correct, except for [[=ls=]] which returns 2 matches \x{2028} and \x{FE47} ??
Thank you for the observation. I will have to look into this more closely. I believe the Boost::regex engine uses the transform_primary member function of the character traits class to determine equivalence: if the sort key returned by that function for two characters is the same, then they are equivalent. I implemented transform_primary using LCMapStringEx, as that is normally how one does Unicode sorting. But how is sorting relevant to regular expressions?
It could be — despite the documented requirement for the function — that what is needed from transform_primary isn’t a sort key, but rather a case folding followed by a compatibility decomposition.
Again, thank you for all your testing, and for calling this to my attention.