Encoding

Franckybleu

Hello,
Since version 7.6 of Notepad++, there seems to be a problem with the detection of file encodings!
Example: Notepad++ tells me that this file (revision 113136) https://zone.spip.net/trac/spip-zone/browser/spip-zone/_plugins/mosaique/trunk/paquet.xml is CRLF and Windows-1258,
whereas Geany reports CRLF but UTF-8, as did the Notepad++ versions before 7.6.
My PC runs Windows 10 (1803) in French.
Franck

Franckybleu

https://zone.spip.net/trac/spip-zone/browser/spip-zone/plugins/mosaique/trunk/paquet.xml
You need to add an underscore (_) before and after the word plugins in the link; otherwise the link does not work.

Eko palypse

@Franckybleu

Interestingly, a commit was made with 7.6 which should improve the char detection area.
Not sure if this commit has side effects on the other side.
Did you try to disable the auto detection at all? (Settings->Preferences->MISC.->Autodetect character encoding)

Meta Chuh

@Franckybleu

i can confirm the wrong detection as windows-1258 (vietnamese) instead of utf-8 in notepad++ 7.6.2
it seems to be triggered by the ï in Mosaïque
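To illustrate why the ï can be ambiguous for a detector, here is a minimal Python sketch. It is an illustration only and not Notepad++'s detection code; `cp1258` is Python's codec name for Windows-1258. It shows that the UTF-8 bytes of "Mosaïque" also decode without error as Windows-1258, so a detector has to guess between two technically valid readings.

```python
# Illustration only: this is NOT how Notepad++ detects encodings.
text = "Mosaïque"
raw = text.encode("utf-8")        # the bytes as stored in a UTF-8 file (no BOM)

as_utf8 = raw.decode("utf-8")     # the intended reading
as_cp1258 = raw.decode("cp1258")  # also succeeds, but the two bytes of "ï" become two other characters

print(raw)
print(as_utf8)
print(as_cp1258)
```

Since both decodes succeed, the byte sequence alone does not prove which encoding was meant; the choice comes down to the detector's heuristics.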

here are the direct links to your paquet.xml if any one else likes to test it:

page link to paquet.xml

direct download link to paquet.xml

Franckybleu

Thanks for your help!!! :-)
Yes, the problem comes from the auto-detection (Settings > Preferences > MISC.).
When “Autodetect character encoding” is checked, the problem occurs!

Another example:
The file paquet.xml at revision 111406 from https://zone.spip.net/trac/spip-zone/browser/spip-zone/plugins/reservation_communication/trunk/paquet.xml?rev=111406
Geany reports UTF-8!
Notepad++ 7.2 with auto-detection checked reports Windows-1258
Notepad++ 7.2 with auto-detection unchecked reports UTF-8
https://zone.spip.net/trac/spip-zone/export/111406/spip-zone/plugins/reservation_communication/trunk/paquet.xml

So this is the same problem as in this topic :-(
https://notepad-plus-plus.org/community/topic/16828/encoding
Franck

Eko palypse

@Franckybleu

Encoding is a difficult area, there is not really a safe way to ensure
that the correct encoding is detected always.
Personally, I don’t use the automatic encoding at all.
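As a rough illustration of why detection is a heuristic, here is a small Python sketch of one common approach: try a strict UTF-8 decode first and fall back to guessing only if it fails. This is an assumption for illustration, not what Notepad++ does, and `looks_like_utf8` is a hypothetical helper name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict decoding: raises on any invalid sequence
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "Réservation".encode("utf-8")     # "é" stored as two bytes
legacy_bytes = "Réservation".encode("cp1252")  # "é" stored as a single byte

print(looks_like_utf8(utf8_bytes))
print(looks_like_utf8(legacy_bytes))
```

Even this check is not a proof: valid UTF-8 bytes can also be valid in a single-byte code page, so the result only says that UTF-8 is a plausible reading.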

Meta Chuh

@Franckybleu

i can confirm all

So this is the same problem as in this topic :-(
https://notepad-plus-plus.org/community/topic/16828/encoding

yes, but thanks to you, the devs have real life example files now. 👍
the topic you mentioned did not provide us with any file(s) i asked for that can be tested.

i’ve opened a new issue #5202 at github:
Auto Detect UTF-8 Encoding for French is broken in Notepad++ 7.6.x

Franckybleu

If needed, I can provide more examples; I ran into the problem on about 15 to 20 files!

guy038

Hello, @meta-chuh, @franckybleu, @eko-palypse and All,

@meta-chuh, I read your #5202 issue from :

https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5202

And I agree that, for instance, the phrase “Cette mosaïque est jolie” ( so, in English : This mosaic is pretty ), in a new UTF-8-encoded file, is wrongly detected as the Windows-1258 encoding :-((

Let me add 3 remarks :

If we simply change this phrase to “Cette mosaïque était jolie” ( This mosaic was pretty ), the UTF-8 encoding is, this time, preserved ! Quite logical, as this auto-detection needs some “material” in order to work correctly !
You may, of course, just disable the auto-detection of encodings, in Settings > Preferences... > MISC.
You may, also, convert your file to the UTF-8-BOM encoding ( Encoding > Convert to UTF-8 BOM ), before saving it and you will not have any encoding problem, anymore, for that file, thanks to the 3 invisible bytes of the BOM ;-))
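For readers wondering what the 3 invisible bytes are, here is a minimal Python sketch. It assumes Python's `utf-8-sig` codec as a stand-in for what an editor writes when saving as UTF-8 with BOM; it is not taken from Notepad++ itself.

```python
import codecs

text = "Cette mosaïque est jolie"

plain = text.encode("utf-8")         # UTF-8 without BOM
with_bom = text.encode("utf-8-sig")  # UTF-8 with the BOM prepended

print(codecs.BOM_UTF8)                       # the three BOM bytes
print(with_bom.startswith(codecs.BOM_UTF8))  # the BOM variant starts with them
print(plain.startswith(codecs.BOM_UTF8))     # the plain variant does not
print(len(with_bom) - len(plain))            # size difference in bytes
```

A program that checks for these leading bytes can identify the file as UTF-8 without any guessing, which is why the BOM sidesteps the detection problem for that file.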

Best Regards

guy038

Meta Chuh

@guy038

there’s more than that broken with the current auto detection !
same vietnamese detection happens if you save the word “Réservation” to a new utf-8 file and reopen,

so only the combination of ï and é in the same document is detected correctly, é only will detect it as vietnamese

this did not happen prior to this commit that @Eko-palypse mentioned.

You may, also, convert your file to the UTF-8-BOM encoding

this is not possible in many cases, because those files like in this case are in an open source repository.

same with spanish, except if i have an ñ in my documents, so i didn’t notice it before, as most of them have an ñ.
german characters work ok so far.