Hello, @coises and All,
Continuation and end of my post
I also tested the whole equivalence class feature :
You can use ANY equivalent character to get the total number of matches of its equivalence class. For example, [[=ª=]] = [[=Å=]] = [[=ã=]] = …
Below is the list of the equivalence classes of every char of the Windows-1252 code-page, tested against the Total_ANSI.txt file. Note that I did not consider the equivalence classes which return only one match !
[[=1=]] = [[=one=]] => 2 [1¹]
[[=2=]] = [[=two=]] => 2 [2²]
[[=3=]] = [[=three=]] => 2 [3³]
[[=A=]] => 15 [AaªÀÁÂÃÄÅàáâãäå]
[[=B=]] => 2 [Bb]
[[=C=]] => 4 [CcÇç]
[[=D=]] => 4 [DdÐð]
[[=E=]] => 10 [EeÈÉÊËèéêë]
[[=F=]] => 3 [Ffƒ]
[[=G=]] => 2 [Gg]
[[=H=]] => 2 [Hh]
[[=I=]] => 10 [IiÌÍÎÏìíîï]
[[=J=]] => 2 [Jj]
[[=K=]] => 2 [Kk]
[[=L=]] => 2 [Ll]
[[=M=]] => 2 [Mm]
[[=N=]] => 4 [NnÑñ]
[[=O=]] => 15 [OoºÒÓÔÕÖØòóôõöø]
[[=P=]] => 2 [Pp]
[[=Q=]] => 2 [Qq]
[[=R=]] => 2 [Rr]
[[=S=]] => 4 [SsŠš]
[[=T=]] => 2 [Tt]
[[=U=]] => 10 [UuÙÚÛÜùúûü]
[[=V=]] => 2 [Vv]
[[=W=]] => 2 [Ww]
[[=X=]] => 2 [Xx]
[[=Y=]] => 6 [YyÝýÿŸ]
[[=Z=]] => 4 [ZzŽž]
[[=^=]] = [[=circumflex=]] => 2 [^ˆ] = [\x5E\x{02C6}]
[[=Œ=]] => 2 [Œœ] = [\x{0152}\x{0153}]
[[==]] => 2 [[.NUL.][.SHY.]] = [\x00\xAD]
[[=Þ=]] => 2 [Þþ] = [\xDE\xFE]
Some double-letter characters equivalences :
[[=AE=]] = [[=Ae=]] = [[=ae=]] => 2 [Ææ] = [\xC6\xE6]
[[=SS=]] = [[=Ss=]] = [[=ss=]] => 1 [ß] = [\xDF]
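To make the idea of an equivalence class more concrete, here is a rough Python sketch that groups the Windows-1252 characters by their "base" form. It is only an approximation built on Unicode compatibility decomposition (NFKD) plus case folding, which is an assumption of this sketch and not how the Boost regex engine computes its classes. It reproduces the counts for classes such as A, E, Y, 1 and ss, but it does not fold Ø, Ð or ƒ into O, D and F as the list above does. The helper names `base_key`, `cp1252_chars` and `group_by_base` are hypothetical.

```python
import unicodedata
from collections import defaultdict

def base_key(ch):
    """Approximate 'primary' key: decompose, drop combining marks, fold case."""
    decomposed = unicodedata.normalize("NFKD", ch)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def cp1252_chars():
    """All characters defined in Windows-1252 (the 5 undefined bytes are skipped)."""
    return bytes(range(256)).decode("cp1252", errors="ignore")

def group_by_base():
    """Map each approximate key to the list of Windows-1252 chars sharing it."""
    groups = defaultdict(list)
    for ch in cp1252_chars():
        groups[base_key(ch)].append(ch)
    return groups

groups = group_by_base()
for key in ("a", "e", "y", "1", "ss"):
    print(key, len(groups[key]), "".join(groups[key]))
```

Under these assumptions, the keys `a`, `e` and `y` collect 15, 10 and 6 characters, which agrees with the [[=A=]], [[=E=]] and [[=Y=]] lines, and `ß` lands alone under the key `ss`.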
An example : let’s suppose that we run the regex [A-F[:lower:]] against my Total_ANSI.txt file. It gives 69 matches : 6 UPPER letters + 63 LOWER letters
The regexes [[:upper:]]|[[:lower:]] and [[:upper:][:lower:]] match letters of either case and both return 123 matches ( So 60 UPPER letters + 63 LOWER letters )
The regexes (?=\u)\l and (?=\l)\u do not find anything. This implies that the sets of UPPER and LOWER letters, in Total_ANSI.txt, are totally disjoint
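The arithmetic behind these three counts can be cross-checked with a small Python sketch. It assumes that Total_ANSI.txt holds each Windows-1252 character exactly once, and it approximates [[:upper:]] and [[:lower:]] with the Unicode general categories Lu and Ll. That mapping is an assumption of this sketch and is not necessarily how the Boost engine classifies letters, although the totals happen to agree.

```python
import unicodedata

# Every character defined in Windows-1252 (undefined bytes are skipped).
chars = bytes(range(256)).decode("cp1252", errors="ignore")

# Assumption: Lu stands in for [[:upper:]] and Ll for [[:lower:]].
upper = {c for c in chars if unicodedata.category(c) == "Lu"}
lower = {c for c in chars if unicodedata.category(c) == "Ll"}
a_to_f = set("ABCDEF")

print(len(a_to_f | lower))   # counterpart of [A-F[:lower:]]
print(len(upper | lower))    # counterpart of [[:upper:][:lower:]]
print(len(upper & lower))    # counterpart of (?=\u)\l : empty if disjoint
```

With these assumptions the three sizes are 69, 123 and 0, the last one being the set-based way of saying that no character is both UPPER and LOWER.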
Best Regards
guy038
P.S. :
BTW, I forgot to list the equivalence classes which return more than one match, for the Control C0/C1 and Format characters, against the Total_Chars.txt file ! Here are the results, below :
[[=nul=]] => 3,240 [\x{0000}\x{00AD}....] Cc
[[= =]] => 3 [\x{0020}\x{205F}\x{3000}] Zs
[[=mmsp=]] => 3 [\x{0020}\x{205F}\x{3000}] Zs
[[=idsp=]] => 3 [\x{0020}\x{205F}\x{3000}] Zs
[[=shy=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=alm=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=sam=]] => 2 [\x{070F}\x{2E1A}] Po
[[=nqsp=]] => 2 [\x{2000}\x{2002}] Zs
[[=ensp=]] => 2 [\x{2000}\x{2002}] Zs
[[=mqsp=]] => 2 [\x{2001}\x{2003}] Zs
[[=emsp=]] => 2 [\x{2001}\x{2003}] Zs
[[=zwnj=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=zwj=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=lrm=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=rlm=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=ls=]] => 2 [\x{2028}\x{FE47}] Zl
[[=lre=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=rle=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=pdf=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=lro=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=rlo=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=wj=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=(fa)=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=(it)=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=(is)=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=(ip)=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=lri=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=rli=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=fsi=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=pdi=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=iss=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=ass=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=iafs=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=aafs=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=nads=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=nods=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=zwnbsp=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=iaa=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=ias=]] => 3,240 [\x{0000}\x{00AD}....] Cf
[[=iat=]] => 3,240 [\x{0000}\x{00AD}....] Cf
As you can see, a lot of Format characters return an erroneous result of 3,240 occurrences. But we’re not going to bother about these wrong equivalence classes, as long as the matching collating names, with the [[.XXX.]] syntax, are totally correct !
Luckily, all the other equivalence classes are correct, except for [[=ls=]], which returns 2 matches : \x{2028} and \x{FE47} ??
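To see at a glance why a pairing like this one looks suspicious, the Unicode general category and name of each matched code point can be printed with Python's standard `unicodedata` module. This is only an inspection aid, not a reproduction of the regex engine's behaviour, and the helper name `describe` is hypothetical.

```python
import unicodedata

def describe(code_points):
    """Return (U+XXXX, general category, name) for each code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        # Control characters have no name, hence the fallback string.
        name = unicodedata.name(ch, "<no name>")
        rows.append((f"U+{cp:04X}", unicodedata.category(ch), name))
    return rows

# Code points taken from the [[=nul=]] and [[=ls=]] lines above.
for row in describe([0x0000, 0x00AD, 0x2028, 0xFE47]):
    print(*row)
```

Two code points that truly belong to the same equivalence class would normally be expected to share a category, so a mismatch in this output is a quick hint that a class is wrong.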