<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[UTF-8 doc becomes ANSI doc !]]></title><description><![CDATA[<p dir="auto">Hi,<br />
I create a new document in utf-8 without BOM.<br />
I write some words with “normal” ASCII characters : no é, no ï for example<br />
I save it as a txt doc and close it<br />
I open it<br />
Notepad++ tells me the doc is in ANSI !<br />
I didn’t notice it the first time, so I didn’t change the encoding<br />
I write some more words in it, one of them with an é for example<br />
I save the doc and close it<br />
I open the doc : now Notepad++ tells me that the doc is UTF-8 and my é is the character xE9 !!!</p>
<p dir="auto">In the preferences I’ve tried the “detect the encoding automatically” option (I don’t know if this is the exact wording, I use the French version : “détecter l’encodage automatiquement”) but it changes nothing.</p>
<p dir="auto">How can I get a doc that is saved as UTF-8 and opened as UTF-8 even if it contains no special chars ?</p>
<p dir="auto">Debug info :<br />
Notepad++ v6.9.2<br />
Build time : May 18 2016 - 00:34:05<br />
Path : C:\Program Files (x86)\Notepad++\notepad++.exe<br />
Admin mode : OFF<br />
Local Conf mode : OFF<br />
OS : Windows 8.1<br />
Plugins : mimeTools.dll NppConverter.dll NppExport.dll NppFTP.dll PluginManager.dll</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/12226/utf-8-doc-becomes-ansi-doc</link><generator>RSS for Node</generator><lastBuildDate>Sun, 07 Jun 2026 10:52:45 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/12226.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 15 Aug 2016 22:51:27 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Tue, 23 Aug 2016 22:49:20 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/gerdb42" aria-label="Profile: gerdb42">@<bdi>gerdb42</bdi></a></p>
<p dir="auto">I agree that this would break the principle but on the other hand it could be beneficial as well.<br />
But now, as I’m typing, I’m thinking: when this conversion takes place and you don’t know which encoding the document came from,<br />
you might corrupt the document without knowing how to fix it.<br />
Yes - bad idea.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17526</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17526</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Tue, 23 Aug 2016 22:49:20 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Tue, 23 Aug 2016 07:36:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/claudia-frank" aria-label="Profile: Claudia-Frank">@<bdi>Claudia-Frank</bdi></a> said:<br />
Not quite an error, but</p>
<blockquote>
<p dir="auto">I would also find it very useful if the setting<br />
New Document-&gt;Encoding: UTF-8 and Apply to opened ANSI files (or any other configured encoding)<br />
would force npp to treat all new opened documents as “configured encoding” when<br />
auto detection of encoding has been disabled.</p>
</blockquote>
<p dir="auto">would require an implicit conversion to UTF-8. And besides breaking the principle of not making changes without user action, it would raise a whole bunch of other issues.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17500</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17500</guid><dc:creator><![CDATA[gerdb42]]></dc:creator><pubDate>Tue, 23 Aug 2016 07:36:03 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Mon, 22 Aug 2016 17:25:26 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/gerdb42" aria-label="Profile: gerdb42">@<bdi>gerdb42</bdi></a></p>
<p dir="auto">I assume we have the same understanding so I’m interested to know<br />
what I have written that could be misunderstood?<br />
Could you point me to my error?</p>
<p dir="auto">Thank you and cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17483</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17483</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Mon, 22 Aug 2016 17:25:26 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Mon, 22 Aug 2016 09:49:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/claudia-frank" aria-label="Profile: Claudia-Frank">@<bdi>Claudia-Frank</bdi></a></p>
<p dir="auto">Let’s assume a file contains the byte sequence 20-A9-20 (in ANSI this would be Space-Copyright-Space). This sequence is <strong>invalid in UTF-8</strong>, so NPP has no alternative other than assuming a single-byte encoding. And since it never changes a file’s content on its own, it is left to treat such a file as ANSI (or whatever your favorite single-byte encoding is).</p>
<p dir="auto">This is not a shortcoming of NPP but part of that single-Byte heritage we still have to deal with today.</p>
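<p dir="auto">As a quick illustration of the byte sequence discussed above, here is a minimal Python sketch. It is not what NPP runs internally, and it assumes that “ANSI” means Windows-1252 (<code>cp1252</code>); other code pages map A9 differently.</p>

```python
# Bytes from the example above: space, 0xA9, space.
data = bytes([0x20, 0xA9, 0x20])

# 0xA9 is a UTF-8 continuation byte with no lead byte in front of it,
# so a strict UTF-8 decode has to fail.
try:
    data.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Assumption: "ANSI" is Windows-1252 here.
as_ansi = data.decode("cp1252")

print(valid_utf8)     # False
print(repr(as_ansi))  # space, copyright sign, space
```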
]]></description><link>https://community.notepad-plus-plus.org/post/17472</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17472</guid><dc:creator><![CDATA[gerdb42]]></dc:creator><pubDate>Mon, 22 Aug 2016 09:49:40 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Sun, 21 Aug 2016 23:27:27 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/mahabarata" aria-label="Profile: Mahabarata">@<bdi>Mahabarata</bdi></a></p>
<p dir="auto">you are correct, there is no way to always guess the correct encoding.<br />
Regarding PHP’s mb_detect_encoding function: when “Autodetect character encoding” is checked,<br />
npp uses Mozilla’s chardet library,<br />
so it has such functionality but, as you already found out, it cannot guess the<br />
correct encoding every time.</p>
<p dir="auto">I would also find it very useful if the setting<br />
New Document-&gt;Encoding: UTF-8 and Apply to opened ANSI files (or any other configured encoding)<br />
would force npp to treat all new opened documents as “configured encoding” when<br />
auto detection of encoding has been disabled.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17458</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17458</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Sun, 21 Aug 2016 23:27:27 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Sat, 20 Aug 2016 16:47:19 GMT]]></title><description><![CDATA[<p dir="auto">For me the problem is rather a problem of npp than anything.</p>
<p dir="auto">The encoding of a text with no special character could be anything : iso-8859-1, iso-8859-2, iso-8859-15, windows-1252, utf-8, ASCII and I imagine a lot of others.<br />
The encoding of a file with only normal characters can’t be known just by looking at it !!!</p>
<p dir="auto">In php, there is a function : mb_detect_encoding($txt, array(encoding1, encoding2)).<br />
When you use it, PHP tries to work out which encoding is used in the string $txt : it starts with encoding1; if that fits, the function answers encoding1; if it doesn’t, the function tries the second encoding, and so on.</p>
<p dir="auto">So if there is no special character in the $txt, the function will tell you the encoding is encoding1. In your case, with encoding1=utf-8, the function will tell you that the $txt is a utf-8 even if it’s impossible to know !</p>
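<p dir="auto">The “first candidate that fits wins” idea can be sketched in Python. This is only an illustration of the principle described above, not PHP’s actual algorithm; the helper name <code>detect_in_order</code> is made up for this example.</p>

```python
def detect_in_order(data, candidates):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


# Pure ASCII text fits every candidate, so the answer is simply
# whichever encoding is listed first.
ascii_only = "hello world".encode("ascii")
first = detect_in_order(ascii_only, ["utf-8", "iso-8859-2"])
swapped = detect_in_order(ascii_only, ["iso-8859-2", "utf-8"])

# With a non-ASCII byte (e-acute stored as the single byte E9) the UTF-8
# candidate is rejected and the next one in the list is returned.
with_accent = detect_in_order("\u00e9".encode("iso-8859-1"), ["utf-8", "iso-8859-1"])

print(first, swapped, with_accent)
```

<p dir="auto">The first two results differ only because the order of the candidate list differs, which is exactly why the user has to supply that order.</p>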
<p dir="auto">npp has <strong>no option</strong> to do the same (it is what I was looking for between my first post and my second one) except “Settings - Preferences… - New Document - Apply to opened ANSI files” (thanks again Claudia).<br />
You can’t tell npp that your docs are <strong>always</strong> ISO-8859-1, ISO-8859-2 or ASCII ! For me it’s a big problem, because a Polish guy will probably use the ISO-8859-2 encoding, and npp will tell him that some docs are in ANSI (that is windows-1252) and others are in ISO-8859-2 !</p>
<p dir="auto"><strong>So the problem is not only with UTF-8/ANSI but with a lot of encodings !</strong></p>
<p dir="auto">I think it would be a <strong>good</strong> evolution of npp to add an option that says what to do when a doc is opened and its encoding is impossible to know : <strong>only the user can tell; npp, however clever it is, never will !</strong></p>
]]></description><link>https://community.notepad-plus-plus.org/post/17434</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17434</guid><dc:creator><![CDATA[Mahabarata]]></dc:creator><pubDate>Sat, 20 Aug 2016 16:47:19 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Fri, 19 Aug 2016 22:42:56 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a></p>
<p dir="auto">I still think this is the wrong way to solve the problem, because you lose the automation by having to interact<br />
with npp to run the script, but since you insist on the Python Script plugin solution, the lines in question are</p>
<pre><code># Python Script plugin: convert the active document to UTF-8, then save it.
notepad.runMenuCommand("Encoding", "Convert to UTF-8")
notepad.save()
</code></pre>
<p dir="auto">But this only works if you have an English UI; if you use a different language, then you have to replace<br />
“Encoding” and “Convert to UTF-8” with the menu names from your language.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17423</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17423</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Fri, 19 Aug 2016 22:42:56 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Fri, 19 Aug 2016 11:41:33 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a><br />
Sorry, I can’t help you with Python, but it seems like this part of the process:</p>
<ul>
<li><em>A job picks up the data. Converts to xml and saves the xml in text file.</em></li>
</ul>
<p dir="auto">is broken because it does not (always) create valid XML. The XML is invalid any time it claims to be in UTF-8 but includes characters encoded in a single byte (e.g. a “£” encoded as 0xA3) that require multiple bytes to be properly encoded in UTF-8 (e.g. “£” should be encoded as 0xC2 0xA3).</p>
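<p dir="auto">To find out where a file that claims to be UTF-8 goes wrong, a small check like the following Python sketch may help. The function name and the sample bytes are invented for illustration; the sample stands in for a fragment of one of the generated files.</p>

```python
def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8, or None if all is well."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# A pound sign written as the single byte A3 (not valid UTF-8) ...
single_byte = b"cost \xa3 10"
# ... and the same text with the pound sign properly encoded as C2 A3.
proper_utf8 = b"cost \xc2\xa3 10"

print(first_invalid_utf8_offset(single_byte))  # 5
print(first_invalid_utf8_offset(proper_utf8))  # None
```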
]]></description><link>https://community.notepad-plus-plus.org/post/17409</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17409</guid><dc:creator><![CDATA[Jim Dailey]]></dc:creator><pubDate>Fri, 19 Aug 2016 11:41:33 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Fri, 19 Aug 2016 02:35:15 GMT]]></title><description><![CDATA[<p dir="auto">Jim, all the XMLs have<br />
&lt;?xml version=“1.0” encoding=“UFT-8” ?&gt;<br />
I am working on the source as well to see if anything can be done to save as UTF-8. I would like to try the script option (Python), but at this time I have no idea how it is done.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17405</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17405</guid><dc:creator><![CDATA[Bahram Yaghobinia]]></dc:creator><pubDate>Fri, 19 Aug 2016 02:35:15 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 20:12:21 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a></p>
<p dir="auto">Do the XML files contain something like this as their first line:</p>
<pre><code>&lt;?xml version="1.0" encoding="???" ?&gt;
</code></pre>
<p dir="auto">If so, can you provide that line to us?</p>
<p dir="auto">If there is no such line, or if it does not include information about the encoding method, then UTF-8 is assumed.</p>
<p dir="auto">I think that means that if the file contains a £ encoded as “A3” instead of “C2 A3”, it isn’t technically valid XML (because it isn’t encoded as UTF-8).</p>
<p dir="auto">Guy, Claudia, or anyone else who has a better understanding please correct me if I am wrong.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17398</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17398</guid><dc:creator><![CDATA[Jim Dailey]]></dc:creator><pubDate>Thu, 18 Aug 2016 20:12:21 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 19:23:24 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a></p>
<p dir="auto">I agree with guy, but I would say that the process which creates the xml needs to take care<br />
of this, as UTF-8 is the default encoding for xml. If this process was designed for writing xml,<br />
it should have an option to save as UTF-8. Did you double check this?</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17396</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17396</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Thu, 18 Aug 2016 19:23:24 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 19:21:23 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>Bahram-Yaghobinia</strong>,</p>
<p dir="auto">I was quite <strong>surprised</strong> and really <strong>sorry</strong> that your problem isn’t solved at all and seems even <strong>worse</strong> than before :-(( But, after some minutes, I realized that this behaviour is quite <strong>logical</strong> :</p>
<ul>
<li>
<p dir="auto">As you changed the <strong>default</strong> encoding, for a <strong>new</strong> document, to <strong>UTF-8 BOM</strong>, which hasn’t, of course, an option relative to opened ANSI files, Notepad++ will <strong>never</strong> try to turn an <strong>ANSI-style</strong> file that it reads into a <strong>true UTF-8</strong> file !</p>
</li>
<li>
<p dir="auto">The contents of your file, opened with your <strong>default</strong> editor N++ <strong>6.9</strong>, seem to be, only, <strong>one-byte</strong> characters. So, as well as characters with value <strong><code>&lt; \x80</code></strong>, the characters, as, for instance, <strong><code>£ © ® — or €</code></strong>, are, also, written with a <strong>one-byte</strong> sequence, between <strong><code>\x80</code></strong> and <strong><code>\xFF</code></strong>. Therefore, N++ always saved it, with its present <strong>ANSI</strong> encoding, without any encoding <strong>conversion</strong> !</p>
</li>
</ul>
<p dir="auto">( See the list of all of them, using the N++ menu option <strong>Edit - Character Panel</strong>, for values &gt; <strong><code>127</code></strong> )</p>
<p dir="auto">So, a solution would be to run a simple <strong>script</strong>, when starting N++, which :</p>
<ul>
<li>
<p dir="auto">would apply the menu option <strong>Encoding - Convert to UTF-8 BOM</strong></p>
</li>
<li>
<p dir="auto">would save the new <strong>UTF-8</strong> contents of your file</p>
</li>
</ul>
<p dir="auto">I think that a <strong>Python</strong> or <strong>NppExec</strong> script should do that job, easily !</p>
<hr />
<p dir="auto">For information, let’s suppose the <strong>exact</strong> text <strong><code>£ , © , ® , — or €</code></strong>, in a <strong>new</strong> file, with an <strong><code>ANSI</code></strong> encoding. This text would produce the sequence of  bytes :</p>
<ul>
<li><strong>A3</strong> 20 2C 20 <strong>A9</strong> 20 2C 20 <strong>AE</strong> 20 2C 20 <strong>97</strong> 20 6F 72 20 <strong>80</strong> ( <strong>18</strong> bytes )</li>
</ul>
<p dir="auto">Once this text is converted to the <strong><code>UTF-8 BOM</code></strong> encoding, it gives the sequence of bytes below :</p>
<ul>
<li><strong><code>EF BB BF</code></strong> <strong>C2 A3</strong> 20 2C 20 <strong>C2 A9</strong> 20 2C 20 <strong>C2 AE</strong> 20 2C 20 <strong>E2 80 94</strong> 20 6F 72 20 <strong>E2 82 AC</strong> ( <strong>28</strong> bytes )</li>
</ul>
<p dir="auto">I just indicated, in <strong>bold</strong>, the values of the <strong>five</strong> characters, which have a <strong>different</strong> representation in <strong><code>ANSI</code></strong> and <strong><code>UTF-8 BOM</code></strong> encodings, as well as the value of the <strong><code>BOM</code></strong>, at the beginning of the <strong>second</strong> sequence, in <strong><code>red</code></strong></p>
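<p dir="auto">The two byte sequences above can be reproduced with a few lines of Python. This sketch assumes that <code>ANSI</code> means Windows-1252 (<code>cp1252</code>) and uses Python’s <code>utf-8-sig</code> codec, which writes the three BOM bytes in front of the UTF-8 data.</p>

```python
# The sample text: pound, copyright, registered, em dash and euro signs,
# written with escapes so the script does not depend on its own file encoding.
text = "\u00a3 , \u00a9 , \u00ae , \u2014 or \u20ac"


def hex_dump(data):
    """Format bytes as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


ansi_bytes = text.encode("cp1252")         # assumption: ANSI = Windows-1252
utf8_bom_bytes = text.encode("utf-8-sig")  # UTF-8 with a leading BOM

print(hex_dump(ansi_bytes), len(ansi_bytes))
print(hex_dump(utf8_bom_bytes), len(utf8_bom_bytes))
```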
<p dir="auto">Cheers,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17393</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17393</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Thu, 18 Aug 2016 19:21:23 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 18:34:38 GMT]]></title><description><![CDATA[<p dir="auto">•	Data is sitting in the queue.<br />
•	A job picks up the data. Converts to xml and saves the xml in text file.<br />
•	Brings it to the server and saves it on disk<br />
•	Right now Notepad++ is the default text editor on the server<br />
•	All text files are saved as UTF-8 except the files that have special characters (ALT ####); these files are saved as ANSI.<br />
•	Notepad++ set up:<br />
o	Settings --&gt; Preferences --&gt; New Document: I have UTF-8 selected, and “Apply to opened ANSI files” is checked.<br />
o	Unchecked “Autodetect character encoding”.<br />
I am trying to add a script that will do this for me, but I have no idea how it works. Something like notepad.runMenuCommand("Encoding", "Convert to UTF-8").</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17391</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17391</guid><dc:creator><![CDATA[Bahram Yaghobinia]]></dc:creator><pubDate>Thu, 18 Aug 2016 18:34:38 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 17:20:16 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a> and all,</p>
<p dir="auto">maybe I misunderstood the topic, but I thought the problem is that an xml file,<br />
which has been created outside of npp, gets loaded, manipulated and saved as txt.<br />
Is this the case? If not, why not explain the steps in detail? Otherwise we<br />
are fishing in the dark, trying to find a solution for you.</p>
<p dir="auto">In addition, I totally agree with what guy wrote about npp detecting UTF-8 files,<br />
but unfortunately there is also a reason not to use UTF-8 with BOM.<br />
If your manipulated data gets loaded/processed by other applications,<br />
e.g. databases, web servers, web frameworks etc., it might be that those apps<br />
can’t handle the data correctly.<br />
Unfortunately, there are still many such applications in use<br />
which don’t support UTF-8 BOM files.<br />
I don’t know if this is the case for you, so this is just for information.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17384</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17384</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Thu, 18 Aug 2016 17:20:16 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 16:01:34 GMT]]></title><description><![CDATA[<p dir="auto">Thank you for all the details. I tested your logic and all went well.<br />
I changed my setup to Settings - Preferences… - New Document - Encoding =&gt; Option UTF-8 BOM.<br />
Still the same issue, but now all my files are saved with ANSI encoding.<br />
I think the only way to do this is to write a few lines of code to convert from ANSI to UTF-8.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17381</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17381</guid><dc:creator><![CDATA[Bahram Yaghobinia]]></dc:creator><pubDate>Thu, 18 Aug 2016 16:01:34 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 17:36:08 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <strong>Bahram-Yaghobinia</strong> and <strong>All</strong>,</p>
<p dir="auto">The <strong>encoding</strong> problems, generally speaking, coupled with the <strong>default</strong> Notepad++ behaviour, are <strong>difficult</strong> enough to understand :-((</p>
<p dir="auto">In the <strong>last</strong> part of this post, I give you a <strong>possible solution</strong> to your problem !?</p>
<p dir="auto">But, before <strong>testing</strong> some situations, here are <strong>my usual</strong> parameters, relative to <strong>encoding</strong>, of the last N++ version <strong>6.9.2</strong> :</p>
<ul>
<li>
<p dir="auto">In <strong>Settings - Preferences… - New Document - Encoding</strong>  =&gt; Options <strong>UTF-8</strong> and <strong>Apply to opened ANSI files</strong> <strong><code>CHECKED</code></strong></p>
</li>
<li>
<p dir="auto">In <strong>Settings - Preferences… - MISC</strong> =&gt; Option <strong>Autodetect character encoding</strong>  <strong><code>UNCHECKED</code></strong></p>
</li>
</ul>
<p dir="auto">In addition, I, generally, have the option <strong>Remember current session for next launch</strong>, in <strong>Settings - Preferences… - Backup</strong>, <strong><code>CHECKED</code></strong></p>
<hr />
<p dir="auto">OK. Now, open Notepad++ and let’s make some simple <strong>tests</strong> :</p>
<ul>
<li>
<p dir="auto">Open a new file ( <strong>CTRL + N</strong> ) =&gt; Information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Type an <strong>uppercase</strong> letter <strong><code>A</code></strong></p>
</li>
<li>
<p dir="auto">Save the file, with name <strong>Test.txt</strong> =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would show a <strong>one-byte</strong> file, of value <strong><code>41</code></strong> )</p>
<ul>
<li>
<p dir="auto">Replace the letter <strong>A</strong> by the <strong>Euro</strong> character <strong>€</strong>, of Unicode value = <strong><code>20AC</code></strong> ( &gt; <strong><code>\x7F</code></strong> )</p>
</li>
<li>
<p dir="auto">Save the changes of <strong>Test.txt</strong> =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would show a <strong>three-byte</strong> file, of value <strong><code>e2 82 ac</code></strong>, which is the <strong>UTF-8</strong> representation of the Unicode code-point <strong><code>20AC</code></strong> of the <strong>Euro</strong> sign )</p>
<ul>
<li>
<p dir="auto">Now, <strong>delete</strong> the <strong>Euro</strong> sign</p>
</li>
<li>
<p dir="auto">Save this <strong>empty</strong> file <strong>Test.txt</strong> =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; This time, we get, for this <strong>empty</strong> file, the <strong><code>ANSI</code></strong> information in the <strong>status</strong> bar !!</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would, effectively, show a <strong>zero-byte</strong> file )</p>
<p dir="auto"><strong>REMARK</strong> :</p>
<p dir="auto">Although I can understand that N++ <strong>cannot</strong> decide about the <strong>right</strong> encoding to choose ( as the file is just <strong>empty</strong> ), the logical behaviour would have been to fall back on the <strong>default user</strong> choice, which is <strong>UTF-8</strong> !! Let’s go on :</p>
<ul>
<li>
<p dir="auto">Type, again, an <strong>uppercase</strong> letter <strong>A</strong></p>
</li>
<li>
<p dir="auto">Re-save the file <strong>Test.txt</strong> =&gt; Still information <strong><code>ANSI</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; We have, again, the information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would show a <strong>one-byte</strong> file, of value <strong><code>41</code></strong> )</p>
<ul>
<li>
<p dir="auto">For the last time, delete the <strong>A</strong> character</p>
</li>
<li>
<p dir="auto">Save, again, this <strong>empty</strong> file <strong>Test.txt</strong> =&gt; Still information <strong><code>UTF-8</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; Again, we get, for this <strong>empty</strong> file, the <strong><code>ANSI</code></strong> information in the <strong>status</strong> bar !!</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would, effectively, show a <strong>zero-byte</strong> file )</p>
<ul>
<li>
<p dir="auto">Type, for the last time, the <strong>Euro</strong> sign <strong>€</strong></p>
</li>
<li>
<p dir="auto">Save the changes of <strong>Test.txt</strong> =&gt; Still information <strong><code>ANSI</code></strong>, in the <strong>status</strong> bar</p>
</li>
<li>
<p dir="auto">Close and restart N++</p>
</li>
<li>
<p dir="auto">Click, if necessary, on the <strong>Test.txt</strong> tab  =&gt; This time, the information remains <strong><code>ANSI</code></strong>, in the <strong>status</strong> bar !!</p>
</li>
</ul>
<p dir="auto">( A <strong>hex</strong> editor would show a <strong>one-byte</strong> file, of value <strong><code>80</code></strong>, as the <strong>ANSI-1252</strong> value, for the <strong>Euro</strong> character, is just … <strong><code>\x80</code></strong> )</p>
<p dir="auto">As you, certainly, would like to obtain an <strong>UTF-8</strong> encoding, you would have to use the menu option <strong>Encoding - Convert to UTF-8</strong> and <strong>re-save</strong> the file <strong>Test.txt</strong>, so that N++ writes it, once again, as a <strong>three-byte</strong> file, with values <strong><code>e2 82 ac</code></strong> !</p>
<p dir="auto">Be careful not to use the option <strong>Encoding - Encode in UTF-8</strong>, which only tries to <strong>re-interpret</strong> the present contents of the file, in the <strong>UTF-8</strong> encoding =&gt; You get an unknown one-byte <strong><code>x80</code></strong> character, which is an <strong>invalid UTF-8</strong> value !</p>
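<p dir="auto">The difference between the two menu options can be modelled in Python. This is a conceptual sketch only, not Notepad++’s implementation, and it assumes that the ANSI code page is Windows-1252 (<code>cp1252</code>).</p>

```python
# The one-byte ANSI file from the last test: a Euro sign stored as 0x80.
saved = b"\x80"

# "Convert to UTF-8": decode with the encoding the bytes really use,
# then re-encode the same characters as UTF-8.
converted = saved.decode("cp1252").encode("utf-8")

# "Encode in UTF-8": keep the bytes as they are and merely read them as UTF-8.
# 0x80 alone is not valid UTF-8, so the Euro sign is lost; here Python
# substitutes the replacement character U+FFFD.
reinterpreted = saved.decode("utf-8", errors="replace")

print(converted.hex())      # e282ac
print(repr(reinterpreted))
```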
<hr />
<p dir="auto">So, from above, <strong>Bahram-Yaghobinia</strong>, it’s obvious that the simple <strong>UTF-8</strong> encoding, ( formally named <strong>UTF-8 without BOM</strong> ) must <strong>NOT</strong> be used. I, <strong>strongly</strong>, advise you to adopt the <strong>UTF-8 BOM</strong>, instead ! Why ?</p>
<p dir="auto">Just because the <strong>invisible <code>BOM</code></strong> ( for <strong>Byte Order Mark</strong> ) identifies, <strong>without any ambiguity</strong> the encoding of a file !</p>
<p dir="auto">When you save, from <strong>within</strong> Notepad++, a file, with the <strong>UTF-8 BOM</strong> encoding, three <strong>invisible</strong> bytes ( <strong><code>EF BB BF</code></strong> ) are added, at the <strong>very beginning</strong> of your file. These <strong>three</strong> bytes are, simply, the <strong>UTF-8</strong> representation of the <strong>Byte Order Mark</strong> (BOM) of <strong>Unicode</strong> code-point <strong><code>FEFF</code></strong></p>
<p dir="auto">Thus, <strong>each</strong> time that <strong>N++</strong>, ( or any <strong>modern editor</strong> ! ) opens this file, it <strong>automatically</strong> understands that it’s a <strong>true UTF-8</strong> file, due to these invisible three bytes <strong><code>EF BB BF</code></strong>, located at its beginning !</p>
<hr />
<p dir="auto">If you reproduce <strong>all</strong> the tests above, on a new file, with the <strong>UTF-8 BOM</strong> encoding ( instead of <strong>UTF-8</strong> ), this encoding will remain <strong>UTF-8 BOM</strong>, throughout <strong>all</strong> the tests, even when the <strong>test</strong> file is <strong>empty</strong> ( just note that, in this <strong>specific</strong> case, the file is <strong>not entirely</strong> empty, as it, still, contains the <strong>three</strong> bytes of the <strong>Byte Order Mark</strong> !! )</p>
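<p dir="auto">A BOM check of this kind is easy to sketch in Python. The helper name <code>has_utf8_bom</code> is made up for this example; <code>codecs.BOM_UTF8</code> is the standard-library constant for the bytes EF BB BF.</p>

```python
import codecs


def has_utf8_bom(data):
    """True if the data starts with the three BOM bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


with_bom = "A".encode("utf-8-sig")       # BOM followed by 41
without_bom = "A".encode("utf-8")        # just 41: could be ANSI or UTF-8
empty_with_bom = "".encode("utf-8-sig")  # an "empty" file still carries the BOM

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
print(len(empty_with_bom))  # 3
```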
<p dir="auto">Further information on  :</p>
<p dir="auto"><a href="http://en.wikipedia.org/wiki/Byte_order_mark" rel="nofollow ugc">http://en.wikipedia.org/wiki/Byte_order_mark</a></p>
<p dir="auto"><a href="http://en.wikipedia.org/wiki/Unicode_Specials" rel="nofollow ugc">http://en.wikipedia.org/wiki/Unicode_Specials</a></p>
<p dir="auto"><a href="http://en.wikipedia.org/wiki/Endianness" rel="nofollow ugc">http://en.wikipedia.org/wiki/Endianness</a></p>
<p dir="auto">Best regards</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17374</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17374</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Thu, 18 Aug 2016 17:36:08 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 07:52:41 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a></p>
<p dir="auto">Files in ANSI and UTF-8 w/o BOM cannot be distinguished if they do not contain any non-ASCII characters. So the settings Claudia mentioned tell NPP what to assume when it can’t decide upon the encoding (this becomes important once you later add non-ASCII characters).</p>
<p dir="auto">On the other hand, if a file <strong>does</strong> contain non-ASCII characters, NPP can clearly detect whether it is valid UTF-8 or not, and <strong>it will not perform an automatic conversion</strong>.</p>
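<p dir="auto">The three situations described above can be written down as a small Python sketch. The function name and the labels it returns are illustrative only, and “UTF-8” here just means that the bytes form valid UTF-8; it says nothing about how NPP’s own detection is implemented.</p>

```python
def classify(data):
    """Roughly sort file contents into the three cases discussed above."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Some single-byte encoding; which one cannot be told from the bytes alone.
        return "not UTF-8"
    if data.isascii():  # bytes.isascii() needs Python 3.7 or later
        # Identical bytes in ANSI and in UTF-8 without BOM.
        return "ambiguous (pure ASCII)"
    return "UTF-8"


print(classify(b"hello"))                  # ambiguous (pure ASCII)
print(classify("\u20ac".encode("utf-8")))  # UTF-8
print(classify(b"\x80"))                   # not UTF-8
```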
<p dir="auto">However, I also don’t get how NPP fits into an automated process. How do you control its actions? Since you know you receive ANSI and need UTF-8 how about throwing together a few lines of C# or powershell?</p>
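<p dir="auto">Such a conversion step could look roughly like this, sketched here in Python rather than C# or PowerShell. The function name, the file name <code>sample.xml</code> and the assumption that the incoming files are Windows-1252 (<code>cp1252</code>) are all illustrative; adjust the source encoding to whatever the upstream job really writes.</p>

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite `path` as UTF-8 without BOM; return True if the file was changed."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the file really is in `source_encoding`.
        # (This raises if a byte is undefined in that code page.)
        path.write_bytes(raw.decode(source_encoding).encode("utf-8"))
        return True
    # Already valid UTF-8 (pure ASCII included): leave the file untouched.
    return False


# Demo on a throwaway file that holds a pound sign as the single byte A3.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.xml"  # hypothetical file name
    sample.write_bytes(b"\xa3 10")
    changed_first = convert_to_utf8(sample)
    changed_again = convert_to_utf8(sample)  # second run is a no-op
    result = sample.read_bytes()

print(changed_first, changed_again, result)
```

<p dir="auto">Because files that are already valid UTF-8 are skipped, the script can be run repeatedly over the same folder without double-encoding anything.</p>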
]]></description><link>https://community.notepad-plus-plus.org/post/17369</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17369</guid><dc:creator><![CDATA[gerdb42]]></dc:creator><pubDate>Thu, 18 Aug 2016 07:52:41 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 01:17:14 GMT]]></title><description><![CDATA[<p dir="auto">Thank you for the reply Claudia.<br />
Our process uses the default text editor on the server. Right now the default text editor is NP++ 6.9.<br />
NP++ is saving all the text files as UTF-8 except the files with special characters (ALT ####).<br />
I was hoping that NP++ could save everything as UTF-8. If not, are there any third party solutions that I can look for?<br />
Your help will be greatly appreciated.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17368</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17368</guid><dc:creator><![CDATA[Bahram Yaghobinia]]></dc:creator><pubDate>Thu, 18 Aug 2016 01:17:14 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Thu, 18 Aug 2016 00:19:30 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/bahram-yaghobinia" aria-label="Profile: Bahram-Yaghobinia">@<bdi>Bahram-Yaghobinia</bdi></a></p>
<p dir="auto">First let me clarify that, currently, there is no way to<br />
force npp to open/save documents as utf-8 ALWAYS.<br />
(I’m talking about builtin functionality - not about 3rd party solutions)<br />
The best we can do is to minimize the number of “false” encoding hints.</p>
<p dir="auto">Regarding your question, sorry, I don’t get the point.<br />
If this is an automated process, how does npp come into play?</p>
<p dir="auto">If the automated process is creating the file, then that process is<br />
responsible for making sure that everything is utf-8 encoded;<br />
if it isn’t, npp seems to decide which encoding should be used.</p>
<p dir="auto">If you use my settings and you create a file with copyright and registered trademark chars,<br />
save it, close it and reopen it, it should still be utf-8 (this is what I get).<br />
Create a file with only the euro char, save it, close it and reopen it: still utf-8.<br />
Close it again and open it with a hex editor: you should see only three bytes.<br />
In the hex editor, delete those bytes and put in a single 0x80 byte, save it, close it and reopen it<br />
with npp -&gt; the file should now be opened as ANSI encoded. You get my point?</p>
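<p dir="auto">The same experiment can be sketched in Python instead of a hex editor, assuming “ANSI” here means Windows-1252 (cp1252). The <code>hex_dump</code> helper is made up for this example.</p>

```python
def hex_dump(data):
    """Format bytes as space-separated hex pairs (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in data)


euro = "\u20ac"                      # the euro sign
utf8_bytes = euro.encode("utf-8")    # three bytes in UTF-8
ansi_bytes = euro.encode("cp1252")   # one byte in Windows-1252

print(hex_dump(utf8_bytes))
print(hex_dump(ansi_bytes))

# A lone 0x80 byte is not valid UTF-8, so an editor that sees it
# has good reason to fall back to ANSI.
try:
    ansi_bytes.decode("utf-8")
except UnicodeDecodeError:
    print("0x80 alone is not valid UTF-8")
```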
<p dir="auto">If I misunderstood your question, please explain your steps in detail.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17365</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17365</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Thu, 18 Aug 2016 00:19:30 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Wed, 17 Aug 2016 01:51:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/claudia-frank" aria-label="Profile: Claudia-Frank">@<bdi>Claudia-Frank</bdi></a><br />
Hi, I have the same issue in Notepad++ 6.9.<br />
We have a job that reads some data, places the result in XML format and saves it as a .txt file. If I do not have any special characters in the XML, the file is successfully saved as UTF-8.<br />
When I have some special characters (£©®—€) in the XML, the file is saved as ANSI. If I manually go to Encoding and select Convert to UTF-8, then the file is saved as UTF-8 (this is not acceptable because our process is an automated process).</p>
<p dir="auto">Please advise how to save the files with special characters as UTF-8 automatically.</p>
<p dir="auto">My settings:<br />
Settings --&gt; Preferences --&gt; New Document: I have UTF-8 selected, and “Apply to opened ANSI files” is checked.<br />
I took your advice and unchecked “Autodetect character encoding”.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17347</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17347</guid><dc:creator><![CDATA[Bahram Yaghobinia]]></dc:creator><pubDate>Wed, 17 Aug 2016 01:51:03 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Wed, 17 Aug 2016 00:40:54 GMT]]></title><description><![CDATA[<p dir="auto">Lol.<br />
I’ve spent all my day trying to find where the problem was.<br />
And then I’ve spent all my day trying to find another text editor because I didn’t find the problem.</p>
<p dir="auto">Half an hour ago, I found another great piece of software, so I came here to say I would put NPP in my trash!<br />
And I just read your good answer: it works! Thanks!</p>
<p dir="auto">But now I don’t know what to do: stay with npp or switch to the new one…</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17344</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17344</guid><dc:creator><![CDATA[Mahabarata]]></dc:creator><pubDate>Wed, 17 Aug 2016 00:40:54 GMT</pubDate></item><item><title><![CDATA[Reply to UTF-8 doc becomes ANSI doc ! on Tue, 16 Aug 2016 23:46:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/mahabarata" aria-label="Profile: Mahabarata">@<bdi>Mahabarata</bdi></a></p>
<p dir="auto">I often do the same, and it works for me.<br />
Personally I have the following setup.</p>
<p dir="auto">Settings-&gt;Preferences-&gt;New Document-&gt;Encoding<br />
UTF-8 and Apply to opened ANSI files</p>
<p dir="auto">Settings-&gt;Preferences-&gt;MISC.<br />
unchecked (not used) Autodetect character encoding</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17343</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17343</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Tue, 16 Aug 2016 23:46:09 GMT</pubDate></item></channel></rss>