Failed to display text using simultaneously combined Chinese fonts characters Correctly
-
Notepad++ v8.5.8 (64-bit)
Build time : Oct 15 2023 - 21:43:56
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line :
Admin mode : ON
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 11 Pro (64-bit)
OS Version : 22H2
OS Build : 22621.2715
Current ANSI codepage : 1252
Plugins :
ComparePlugin (2.0.2)
ComparePlus (1.1)
mimeTools (2.9)
NppConverter (4.5)
NppExport (0.4)
NppMenuSearch (0.9.6)
PythonScript (2)
-
@mkupper said in Failed to display text using simultaneously combined Chinese fonts characters Correctly:
How are you changing the fonts used by Notepad++?
-
@mkupper said in Failed to display text using simultaneously combined Chinese fonts characters Correctly:
My guess is that the underlying fonts used by Windows on our computers do not support those characters. It’s probably more of a Windows configuration than Notepad++. Are you using a Chinese version of Windows?
I’m using Brave v1.61.101 right now and there is no problem viewing the complete text. I guess the website I am taking the text from uses some web fonts or something like that.
-
@mkupper said in Failed to display text using simultaneously combined Chinese fonts characters Correctly:
Are you using a Chinese version of Windows?
No, it’s the default English (US) version.
-
I found a temporary solution for those PUA (Private Use Area) characters:
U+E810 is 时, U+E820 is 她, U+E830 is 能, U+E840 is 没, U+E850 is 总
U+E811 is 大, U+E821 is 出, U+E831 is 对, U+E841 is 成, U+E851 is 从
U+E812 is 地, U+E822 is 也, U+E832 is 小, U+E842 is 只, U+E852 is 无
U+E813 is 为, U+E823 is 得, U+E833 is 多, U+E843 is 如, U+E853 is 情
U+E814 is 子, U+E824 is 里, U+E834 is 然, U+E844 is 事, U+E854 is 己
U+E815 is 中, U+E825 is 后, U+E835 is 于, U+E845 is 把, U+E855 is 面
U+E816 is 你, U+E826 is 自, U+E836 is 心, U+E846 is 还, U+E856 is 最
U+E817 is 说, U+E827 is 以, U+E837 is 学, U+E847 is 用, U+E857 is
U+E818 is 生, U+E828 is 会, U+E838 is 么, U+E848 is 第, U+E858 is 但
U+E819 is 国, U+E829 is 家, U+E839 is 之, U+E849 is 样, U+E859 is 现
U+E81A is 年, U+E82a is 可, U+E83a is 都, U+E84a is 道, U+E85a is 前
U+E81b is 着, U+E82b is 下, U+E83b is 好, U+E84b is 想, U+E85b is 些
U+E81C is 就, U+E82c is 而, U+E83c is 看, U+E84c is 作, U+E85c is 所
U+E81d is 那, U+E82d is 过, U+E83d is 起, U+E84d is 种, U+E85d is 同
U+E81e is 和, U+E82e is 天, U+E83e is 发, U+E84e is 开, U+E85e is 日
U+E81f is 要, U+E82f is 去, U+E83F is 当, U+E85f is 手, U+E84f is
U+E860 is 又, U+E861 is 行, U+E862 is 意, U+E863 is 动
Note: if you want to use Find and Replace, you need to write the code point in the following format.
For example, to find and replace the following Unicode code point
U+E85e
search for it like this
\x{E85e}
with the **Regular Expression** search mode enabled.
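To avoid typing every search expression by hand, the table above can be turned into find/replace pairs with a few lines of Python. This is only a sketch: the function name `to_npp_patterns` and the sample string are made up for illustration, and it assumes the table is available as plain text in the "U+XXXX is 字" form used in this post. Entries with no character after "is" are skipped.

```python
import re

# Matches entries such as "U+E810 is 时". The negative lookahead skips
# entries whose character is missing (where the next "U+..." follows directly).
ENTRY = re.compile(r"U\+([0-9A-Fa-f]{4,6}) is (?!U\+)(\S)")


def to_npp_patterns(table_text):
    """Return (find, replace) pairs for Notepad++ Regular Expression mode."""
    pairs = []
    for hex_code, char in ENTRY.findall(table_text):
        # \x{HHHH} is the code point syntax shown above for Notepad++ regex search.
        pairs.append((r"\x{%s}" % hex_code.upper(), char))
    return pairs


if __name__ == "__main__":
    # Hypothetical excerpt of the table, including one entry with no character.
    sample = "U+E85e is 日 U+E839 is 之 U+E857 is U+E818 is 生"
    for find, replace in to_npp_patterns(sample):
        print(find, "->", replace)
```

Each printed pair can then be pasted into the Find what and Replace with fields, one at a time.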
-
Aside: unsolicited advice
It’s not the focus of the topic, but seeing your screenshot where you set the Global override font instead of the Default Style font, I am thinking you have never read the User Manual paragraphs on Global Override. You should do so. In short: use Global Override if you want to ignore syntax highlighting; use Default Style if you want proper inheritance so that it will still syntax highlight but will use the font you specify.
-
-
@2dmnGood4u said in Failed to display text using simultaneously combined Chinese fonts characters Correctly:
it was done temporarily just to give it a go ;)
That’s fine: you can do it however you want. However, it’s actually easier to “give it a go” using the Default Style, as there you don’t need an extra checkbox to make it apply the font.
-
It looks like the answer is that this can’t be fixed for you in Notepad++. Both the Brave and Firefox web browsers seem to be doing something special that allows those characters to display correctly. I suspect the browsers detect that most of the surrounding text is Chinese ideographs and, when they see a private use area character that the font does not support, they either use a font built into the browser to display it or translate it on the fly into the valid Unicode character code for the corresponding Chinese ideograph.
You posted a table of translations from various private use area character codes into valid Unicode characters. Assuming that table is widely known, it’s likely the browsers are doing the substitution on the fly, meaning there may well be no font that directly supports the characters.
I suspect Notepad++ with Python could be set up to also automatically translate those private use area character codes. That’s outside the scope of what I’m personally familiar with.
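As a rough sketch of what such an automatic translation could look like, here is plain Python 3 run outside Notepad++ on text that has already been copied or saved. The names `PUA_TO_HANZI`, `translate_pua` and `find_pua` are made up for illustration, and the mapping contains only a few entries taken from the table posted earlier in this thread; the full table would have to be filled in. How to hook this into the PythonScript plugin is not shown here.

```python
# Partial mapping, copied from the table posted earlier in this thread.
# Keys are private use area code points, values are the intended characters.
PUA_TO_HANZI = {
    0xE810: "时",
    0xE811: "大",
    0xE816: "你",
    0xE839: "之",
    0xE85E: "日",
}


def translate_pua(text):
    """Replace every mapped private use area character with its ideograph."""
    return text.translate(PUA_TO_HANZI)


def find_pua(text):
    """List the private use area code points (U+E000..U+F8FF) found in text."""
    found = {ord(ch) for ch in text if 0xE000 <= ord(ch) <= 0xF8FF}
    return ["U+%04X" % cp for cp in sorted(found)]


if __name__ == "__main__":
    # Hypothetical sample: U+E839 is in the mapping, U+E857 is not.
    original = "计昭南\ue839枪\ue857"
    fixed = translate_pua(original)
    print(fixed)
    print("still unmapped:", find_pua(fixed))
```

Listing the code points that remain after translation makes it easy to spot entries that are still missing from the table.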
Following are some notes that led me to the conclusion I posted above:
@2dmnGood4u had written:
I’m using Brave v1.61.101 right now and there is no problem viewing the complete text. I guess the website I am taking the text from uses some web fonts or something like that.
I get the same results as you when I view https://www.xianquwx.com/phpipr/0filli1k.html in Firefox. For example, U+E839 looks like 之 in Firefox. When I select all and copy/paste into Notepad++, the 之 is no longer displayed correctly.
I then downloaded the https://www.xianquwx.com/phpipr/0filli1k.html file and found something curious in the HTML which is:
姜望在长城<i></i>内,<i></i>敢剑劈皇夜羽,计昭南<i></i><i></i><i></i>枪<i></i><i></i>龙。9
Each and every time they wanted to display one of those private use area characters, they used &#xXXXX style encoding, though without the trailing semicolon. Maybe that is valid HTML; either way, the web browsers figured out what to do. The usage of &#xXXXX encoding is more of a curiosity.
When I view the 0filli1k.html page I had downloaded using Firefox, U+E839 no longer displays as 之. I also tried Save-As HTML from Firefox of the original page, and U+E839 still does not display as 之.
At this point I realized that the web browsers have support for presumably legacy character codes that appear on web sites. Whatever that support is seems well beyond what we expect of a text editor such as Notepad++.
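For anyone who wants to check which code points those semicolon-less &#xXXXX references stand for, Python’s standard `html.unescape` function decodes numeric character references even when the trailing semicolon is missing. The sketch below uses a made-up fragment, not the actual page source, and `decode_char_refs` is just an illustrative wrapper name.

```python
import html


def decode_char_refs(fragment):
    """Decode character references such as &#xE839, with or without ';'."""
    return html.unescape(fragment)


if __name__ == "__main__":
    # Hypothetical fragment in the style described above (no trailing semicolon).
    fragment = "<i>&#xE839</i>枪"
    decoded = decode_char_refs(fragment)
    # ascii() shows non-ASCII characters as escapes, so the private use area
    # code point is visible even if no installed font can draw it.
    print(ascii(decoded))
```

One caveat: without the semicolon, any hexadecimal digit that directly follows the reference is read as part of the number, so a reference followed by a letter such as "a" would decode to a different code point.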
I also tried copy/pasting from Firefox into Microsoft Word and it’s like Notepad++ in that Word does not know what to do with the private use area characters. Word displays them as a box with a question mark inside.
I also discovered it’s not easy to see how all of the fonts installed on a Windows machine will display a single character such as U+E839, in hopes of finding a font that displays U+E839 in a desirable way while also supporting standard Chinese ideographs. Thus, while we can pick a font for Notepad++ to use when displaying text, it’s not easy to find out which font we should use.
-
Well, thank you very much for the time you have spent on it.
It was an analysis I would not have thought of myself. I don’t have much knowledge in this field; maybe browsers or web pages like this use WOFF or something like that.
By the way, TYSM.