Problem adding and copy-pasting UTF characters.



  • I am using Notepad++ v6.8.3 on Windows 10. OK, that is an oldish version, but the answer to my question might still be the same as in the newest version.

    I have a PHP file, encoded as UTF-8 (without BOM), in which I convert some hand-picked UTF characters in a text string into the corresponding HTML entities. The commands look like this, and they work fine:

            $api_data = str_replace('[xF3]', '&oacute;', $api_data);
            $api_data = str_replace('[xD3]', '&Oacute;', $api_data);
    

    What is written above as [xF3] is not actually a sequence of five ASCII characters “[”, “x”, “F”, “3”, “]”. Rather, it is one character (the UTF character “ó”, I assume), which Notepad++ shows me as a grey square with the letters “xF3” inside it.

    My first problem: I need to expand the code and add similar handling for the character ø, which would look like this:

            $api_data = str_replace('[xF8]', '&oslash;', $api_data);
    

    But I cannot find a way to create this grey-square character [xF8] in Notepad++. I wrote the original code a year ago, so I must have been able to create those characters back then, but I no longer remember how I did it.

    If I write the desired ø character in some other program and copy-paste it into this PHP file, it appears in the code as the actual glyph rather than as a grey square, and the code does not work: that character in the input string does not get converted into “&oslash;”.
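    A possible workaround, sketched under the assumption that the grey square stands for the single raw byte 0xF8 in the input (which is what the “xF8” label suggests): PHP double-quoted strings accept `\xHH` escapes, so the byte can be written in plain ASCII and never has to be typed or pasted into the editor. The sample input string below is made up for illustration.

```php
<?php
// Assumption: the API delivers single-byte (Latin-1 style) text,
// so "ø" arrives as the one byte 0xF8. The sample string is hypothetical.
$api_data = "Troms\xF8";

// "\xF8" inside double quotes is the raw byte 0xF8, written in plain ASCII.
// Single quotes would NOT work here: '\xF8' is four literal characters.
$api_data = str_replace("\xF8", '&oslash;', $api_data);

echo $api_data, "\n";
```

    Because the source file then contains only ASCII, the line can also be copied and pasted without the byte being reinterpreted.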

    My second problem is that I cannot copy-paste these code lines in Notepad++ into a different part of the code. If I copy the two code lines shown at the start of this post and paste them elsewhere in the same PHP file, what I get from the paste command is not identical to the lines I copied. Rather, they look like this:

            $api_data = str_replace('󧬠'&oacute;', $api_data);
            $api_data = str_replace('ӧ, '&Oacute;', $api_data);
    

    What happens during the copy-paste process is that "[xF3]', " (the grey-square character plus three basic ASCII characters) becomes "󧬠'" (one strange character plus one basic ASCII character), and on the second line "[xD3]', " (the grey-square character plus three basic ASCII characters) becomes "ӧ, " (one special character, which happens to look like a lowercase version of the uppercase character being encoded, plus two basic ASCII characters).
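    The second substitution is at least consistent with the bytes being re-read by a lenient UTF-8 decoder: 0xD3 has the bit pattern of a two-byte lead byte, and the apostrophe after it (0x27) would be absorbed as if it were a continuation byte. This is only an assumption about what happens during the paste, not confirmed Notepad++ behaviour, and the helper name below is hypothetical. A small sketch of the arithmetic:

```php
<?php
// Hypothetical helper: decode two bytes as a UTF-8 two-byte sequence
// WITHOUT checking that the second byte is a valid continuation byte
// (a strict decoder would reject this input).
function lenient_decode_2byte(string $bytes): int
{
    $lead = ord($bytes[0]); // e.g. 0xD3
    $next = ord($bytes[1]); // e.g. 0x27, the apostrophe
    // Keep the low 5 bits of the lead byte and the low 6 bits of the next byte.
    return (($lead & 0x1F) << 6) | ($next & 0x3F);
}

// The raw byte 0xD3 followed by the closing apostrophe from the code line.
$codepoint = lenient_decode_2byte("\xD3'");
printf("U+%04X\n", $codepoint); // prints U+04E7, the code point of "ӧ"
```

    The same arithmetic applied to 0xC4 followed by an apostrophe gives U+0127 (“ħ”), which matches the pasted line for the Ä entity in the sample below.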

    Below is a broader sample of the weird output I get from the Paste command. In each copied line, the first parameter of str_replace was a grey-square UTF character (in single quotation marks) corresponding to the HTML entity code given as the second parameter:

            $api_data = str_replace('䧬 '&auml;', $api_data);
            $api_data = str_replace('ħ, '&Auml;', $api_data);
            $api_data = str_replace('��'&ouml;', $api_data);
            $api_data = str_replace('֧, '&Ouml;', $api_data);
            $api_data = str_replace('姬 '&aring;', $api_data);
            $api_data = str_replace('ŧ, '&Aring;', $api_data);
            $api_data = str_replace('��'&uuml;', $api_data);
            $api_data = str_replace('ܧ, '&Uuml;', $api_data);
            $api_data = str_replace('᧬ '&aacute;', $api_data);
            $api_data = str_replace('g, '&Aacute;', $api_data);
            $api_data = str_replace('駬 '&eacute;', $api_data);
            $api_data = str_replace('ɧ, '&Eacute;', $api_data);
            $api_data = str_replace('� '&iacute;', $api_data);
            $api_data = str_replace('ͧ, '&Iacute;', $api_data);
