Hi, @coises,
If you need my Total_Chars.txt file, simply extract it from the Unicode.zip archive in my Google Drive account:
https://drive.google.com/file/d/1kYtbIGPRLdypY7hNMI-vAJXoE7ilRMOC/view?usp=sharing
You do not need the other files in this archive, as the main information is described below.
The Total_Chars.txt file is a genuine UTF-8 file with a BOM. It contains every assigned and unassigned Unicode code point exactly once, from \x{0000} to \x{EFFFD}, apart from the excluded zones listed further down.
Physically, it contains 3 lines:
• A first line, from \x{0000} to \x{0009}, ended by the \x{000A} line-break
• A second line, from \x{000B} to \x{000C}, ended by the \x{000D} line-break
• A third, very LONG line with all the remaining characters, from \x{000E} to \x{EFFFD}, minus the excluded zones (see below)
In UTF-8 terms, the Total_Chars.txt file breaks down as follows:
• [\x{0000}-\x{007F}]                  128 chars coded with 1 byte   =>        128 bytes
• [\x{0080}-\x{07FF}]                1,920 chars coded with 2 bytes  =>      3,840 bytes
• [\x{0800}-\x{FFFD}]               61,406 chars coded with 3 bytes  =>    184,218 bytes
• Planes 1, 2, 3, 14 (4 × 65,534)  262,136 chars coded with 4 bytes  =>  1,048,544 bytes
                                   -------------                         ---------------
                                   325,590 chars                         1,236,730 bytes
• BOM                                                                            3 bytes
                                   -------------                         ---------------
                                   325,590 chars                         1,236,733 bytes
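As a cross-check, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch: Python is an arbitrary choice, the variable names are illustrative, and the 3-byte row assumes that the surrogates (\x{D800} to \x{DFFF}) and the 32 code points from \x{FDD0} to \x{FDEF} are the only gaps in [\x{0800}-\x{FFFD}], as described in this post.

```python
# Code points missing from the 3-byte range [0800-FFFD] (assumption taken from this post)
SURROGATES = 0xDFFF - 0xD800 + 1        # 2,048 code points
NONCHARS_FDD0 = 0xFDEF - 0xFDD0 + 1     # 32 code points

# (bytes per character in UTF-8, number of characters in the file)
rows = [
    (1, 0x007F - 0x0000 + 1),
    (2, 0x07FF - 0x0080 + 1),
    (3, 0xFFFD - 0x0800 + 1 - SURROGATES - NONCHARS_FDD0),
    (4, 4 * (0xFFFD - 0x0000 + 1)),     # planes 1, 2, 3, 14: offsets 0000..FFFD in each
]

total_chars = sum(count for _, count in rows)
payload_bytes = sum(width * count for width, count in rows)
BOM_BYTES = 3                           # EF BB BF
file_bytes = payload_bytes + BOM_BYTES

for width, count in rows:
    print(f"{count:>9,} chars x {width} byte(s) = {width * count:>11,} bytes")
print(f"{total_chars:>9,} chars, {payload_bytes:,} bytes, {file_bytes:,} bytes with BOM")
```

If the description is right, the last line should report the same 325,590 chars and 1,236,733 bytes as the summary above.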
As mentioned above, the Total_Chars.txt file does NOT contain the following zones:
• The SURROGATES block, from \x{D800} to \x{DFFF}
• The 32 NOT-Unicode chars (noncharacters), from \x{FDD0} to \x{FDEF}
• The two NOT-Unicode chars ending plane 0, \x{FFFE} and \x{FFFF}
• The two NOT-Unicode chars ending plane 1, \x{1FFFE} and \x{1FFFF}
• The two NOT-Unicode chars ending plane 2, \x{2FFFE} and \x{2FFFF}
• The two NOT-Unicode chars ending plane 3, \x{3FFFE} and \x{3FFFF}
• The COMPLETE planes 4 to 13, from \x{40000} to \x{DFFFF}
• The two NOT-Unicode chars ending plane 14, \x{EFFFE} and \x{EFFFF}
• The PRIVATE-USE planes 15 and 16, from \x{F0000} to \x{10FFFF}
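For anyone who prefers to rebuild an equivalent file rather than download it, here is a minimal Python sketch based on the exclusions listed above. It is not necessarily how the original file was produced: the range list and the build_total_chars name are illustrative, and the result is kept in memory (the final write is left commented out).

```python
# Inclusive code-point ranges that remain once the excluded zones are removed
INCLUDED_RANGES = [
    (0x0000, 0xD7FF),      # plane 0, before the surrogates
    (0xE000, 0xFDCF),      # plane 0, between the surrogates and FDD0..FDEF
    (0xFDF0, 0xFFFD),      # plane 0, up to (not including) FFFE..FFFF
    (0x10000, 0x1FFFD),    # plane 1
    (0x20000, 0x2FFFD),    # plane 2
    (0x30000, 0x3FFFD),    # plane 3
    (0xE0000, 0xEFFFD),    # plane 14
]

def build_total_chars() -> str:
    """Return every included code point once, in ascending order."""
    return "".join(
        chr(cp) for low, high in INCLUDED_RANGES for cp in range(low, high + 1)
    )

text = build_total_chars()
data = text.encode("utf-8-sig")      # the "utf-8-sig" codec prepends the 3-byte BOM

# The physical layout: split on the single \x{000A}, then on the single \x{000D}
first, rest = text.split("\n", 1)
second, third = rest.split("\r", 1)

print(len(text), "chars,", len(data), "bytes")
print(len(first), len(second), len(third), "chars in lines 1, 2 and 3")

# To write the file to disk (the path is up to you):
# with open("Total_Chars.txt", "wb") as f:
#     f.write(data)
```

The split on "\n" and "\r" mirrors the three physical lines described earlier; an editor that also treats other characters as line breaks may display more lines.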
Below is the list of all the INCLUDED planes, followed by all the zones EXCLUDED from the Total_Chars.txt file:
•=========================================•=======================================•
| Zones INCLUDED in 'Total_Chars.txt'     |     Range      |  Plane  |    # Chars |
•=========================================•================•=========•============•
|                                         |     0000..FFFD |    0    |     63,454 |
•-----------------------------------------•----------------•---------•------------•
|                                         |   10000..1FFFD |    1    |     65,534 |
•-----------------------------------------•----------------•---------•------------•
|                                         |   20000..2FFFD |    2    |     65,534 |
•-----------------------------------------•----------------•---------•------------•
|                                         |   30000..3FFFD |    3    |     65,534 |
•-----------------------------------------•----------------•---------•------------•
|                                         |   E0000..EFFFD |   14    |     65,534 |
•=========================================•================•=========•============•
| Total INCLUDED characters               |                |         |    325,590 |
•=========================================•================•=========•============•

•=========================================•================•=========•===========•
| Zones EXCLUDED from 'Total_Chars.txt'   |     Range      |  Plane  |   # Chars |
•=========================================•================•=========•===========•
| Surrogates                              |     D800..DFFF |    0    |     2,048 |
| Not Unicode                             |     FDD0..FDEF |    0    |        32 |
| Not Unicode                             |     FFFE..FFFF |    0    |         2 |
•----------------------------------------------------------•---------•-----------•
| Not Unicode                             |   1FFFE..1FFFF |    1    |         2 |
•----------------------------------------------------------•---------•-----------•
| Not Unicode                             |   2FFFE..2FFFF |    2    |         2 |
•----------------------------------------------------------•---------•-----------•
| Not Unicode                             |   3FFFE..3FFFF |    3    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   40000..4FFFD |    4    |    65,534 |
| Not Unicode                             |   4FFFE..4FFFF |    4    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   50000..5FFFD |    5    |    65,534 |
| Not Unicode                             |   5FFFE..5FFFF |    5    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   60000..6FFFD |    6    |    65,534 |
| Not Unicode                             |   6FFFE..6FFFF |    6    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   70000..7FFFD |    7    |    65,534 |
| Not Unicode                             |   7FFFE..7FFFF |    7    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   80000..8FFFD |    8    |    65,534 |
| Not Unicode                             |   8FFFE..8FFFF |    8    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   90000..9FFFD |    9    |    65,534 |
| Not Unicode                             |   9FFFE..9FFFF |    9    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   A0000..AFFFD |   10    |    65,534 |
| Not Unicode                             |   AFFFE..AFFFF |   10    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   B0000..BFFFD |   11    |    65,534 |
| Not Unicode                             |   BFFFE..BFFFF |   11    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   C0000..CFFFD |   12    |    65,534 |
| Not Unicode                             |   CFFFE..CFFFF |   12    |         2 |
•----------------------------------------------------------•---------•-----------•
| Unassigned                              |   D0000..DFFFD |   13    |    65,534 |
| Not Unicode                             |   DFFFE..DFFFF |   13    |         2 |
•----------------------------------------------------------•---------•-----------•
| Not Unicode                             |   EFFFE..EFFFF |   14    |         2 |
•----------------------------------------------------------•---------•-----------•
| Supplementary_Private_Use_Area-A        |   F0000..FFFFD |   15    |    65,534 |
| Not Unicode                             |   FFFFE..FFFFF |   15    |         2 |
•----------------------------------------------------------•---------•-----------•
| Supplementary_Private_Use_Area-B        | 100000..10FFFD |   16    |    65,534 |
| Not Unicode                             | 10FFFE..10FFFF |   16    |         2 |
•=========================================•================•=========•===========•
| Total EXCLUDED characters               |                |         |   788,522 |
•=========================================•================•=========•===========•

•-----------------------------------------•----------------•---------•-----------•
| Total UNICODE characters                |   0000..10FFFF |  0 - 16 | 1,114,112 |
•-----------------------------------------•----------------•---------•-----------•
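The per-plane figures in these tables can be re-derived by walking the whole Unicode range once. The Python sketch below does this under the rules stated in this post; the is_included helper is a hypothetical name introduced here for illustration only.

```python
def is_included(cp: int) -> bool:
    """True if the code point belongs to an INCLUDED zone of the tables above."""
    plane = cp >> 16
    if plane not in (0, 1, 2, 3, 14):
        return False                    # planes 4 to 13, 15 and 16 are left out
    if (cp & 0xFFFF) >= 0xFFFE:
        return False                    # last two code points of every plane
    if 0xD800 <= cp <= 0xDFFF:
        return False                    # surrogates
    if 0xFDD0 <= cp <= 0xFDEF:
        return False                    # the 32 "Not Unicode" code points
    return True

included = [0] * 17                     # one counter per plane, 0 to 16
excluded = [0] * 17
for cp in range(0x110000):
    plane = cp >> 16
    if is_included(cp):
        included[plane] += 1
    else:
        excluded[plane] += 1

for plane in range(17):
    print(f"Plane {plane:2d}: {included[plane]:>7,} included  {excluded[plane]:>7,} excluded")
print(f"Total   : {sum(included):>7,} included  {sum(excluded):>7,} excluded")
```

Note that the excluded counter of each plane lumps together all of that plane's excluded rows, so plane 0 shows 2,048 + 32 + 2 and planes 4 to 13 show 65,534 + 2.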
Best Regards,
guy038