    How to auto-convert text (Umlaute) when changing file encoding from ANSI to UTF-8 BOM?

    • Claudia Svenson

      Assume I open a new, empty file with file encoding ANSI.

      I type some German text that contains umlauts such as äöü.

      Later I decide to switch the file encoding to UTF-8 via the menu

      Encoding -> UTF-8 BOM

      Yes, the file encoding is now UTF-8 BOM.

      But the umlauts äöü now appear as
      xE4 xF6 xFC (with a black background).

      How can I tell NP++ to automatically convert all umlauts to the corresponding UTF-8 bytes when switching the file encoding from ANSI to UTF-8?

      If this is not possible automatically:
      how can I select the text and do it manually?

      • Coises @Claudia Svenson

        Use the bottom section of the Encoding menu (for example, Convert to UTF-8-BOM) when you want to convert. Those commands re-encode the existing text instead of just reinterpreting its bytes.

        There are usually only two times you should use the top section:

        • when you have a completely empty new file and you want to change from the default encoding to something else before you start adding text;

        • when you have just opened a file and the encoding Notepad++ determined it to be is wrong, so you want to change it and have Notepad++ reread the file as a different encoding.
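
The difference between the two menu sections can be sketched outside Notepad++. This is only an illustration in Python, not how Notepad++ is implemented, and it assumes that "ANSI" means Windows-1252 (the usual ANSI code page on Western European systems; yours may differ). Reinterpreting keeps the bytes and changes the decoder, while converting keeps the characters and changes the bytes.

```python
import codecs

# Assumption: "ANSI" is Windows-1252 (cp1252).
text = "äöü"
ansi_bytes = text.encode("cp1252")  # the bytes E4 F6 FC

# Top section of the menu: same bytes, read with a different decoder.
# E4 F6 FC is not a valid UTF-8 sequence, which is why an editor can
# only show the raw byte values xE4 xF6 xFC.
try:
    ansi_bytes.decode("utf-8")
    reinterpreted_ok = True
except UnicodeDecodeError:
    reinterpreted_ok = False

# Bottom section ("Convert to ..."): same characters, new bytes.
utf8_bytes = ansi_bytes.decode("cp1252").encode("utf-8")
utf8_bom_bytes = codecs.BOM_UTF8 + utf8_bytes

print(ansi_bytes.hex(" "))      # e4 f6 fc
print(reinterpreted_ok)         # False
print(utf8_bytes.hex(" "))      # c3 a4 c3 b6 c3 bc
print(utf8_bom_bytes.hex(" "))  # ef bb bf c3 a4 c3 b6 c3 bc
```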

        • Claudia Svenson @Coises

          @Coises

          Works. Thanks.

          • Claudia Svenson @Claudia Svenson

            It works only partially.

            Assume I started with an empty file and ANSI encoding.
            I wrote some text.

            Later I copied some UTF-8 encoded text from a browser web page or from another document into this ANSI file.

            Now this file contains two types of text:
            one part is ANSI encoded, the other UTF-8 encoded.

            No matter whether I switch the file encoding or convert the text,
            part of the file content does not match the encoding.

            What I need is a smarter convert feature:

            If I select part of the text and click “Convert to UTF-8 BOM”, then NP++ should…

            …check whether some text is selected. If yes, only the selected text should be converted; otherwise the full text.

            Can this be implemented in the next release?

            • Coises @Claudia Svenson

              @Claudia-Svenson said:
              Assume I started with an empty file and ANSI encoding.
              I wrote some text.

              Later I copied some UTF-8 encoded text from a browser web page or from another document into this ANSI file.

              Now this file contains two types of text:
              one part is ANSI encoded, the other UTF-8 encoded.

              No matter whether I switch the file encoding or convert the text,
              part of the file content does not match the encoding.

              Have you actually tried this? Can you show a minimal demonstration? I can’t reproduce it.

              When you paste text from the Windows clipboard into a document, the text should be converted right then to match the current encoding Scintilla (the control used to display documents in Notepad++) is using. (That encoding is not always the same as the file encoding that will be saved; it will always be either ANSI, if the file encoding is ANSI, or else UTF-8; anything else is converted when reading or writing the file.) There cannot be two different encodings in the same document window in Notepad++.

              Does the text appear wrong in Notepad++ when you paste it? Or are you saying that it looks good when you paste it, but when you reload the file the text you pasted is corrupted?

              If the text appears wrong when you paste, it is probably a problem with the application from which you are copying the text. If it is a common application that some of us might have, please tell us and give an example of how to reproduce the problem; but I suspect it will be out of Notepad++’s control.

              If the text appears good when you paste it but is corrupt when you reload, then you are probably pasting characters that are not in the codepage you are using. That can happen if you are using a named legacy codepage (not ANSI, but something like ISO-8859-15), because internally Notepad++ uses UTF-8 when you have anything other than ANSI. The pasted characters look fine, because they exist in UTF-8, but they can’t be converted to the codepage when you save if they aren’t in the codepage.
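
The "looks fine until you save and reload" case can be reproduced in a few lines of Python. This is a sketch under assumptions: the sample string and the `can_save_as` helper are made up for illustration, and Python's `errors="replace"` only stands in for whatever substitution an editor performs when a character has no slot in the target codepage.

```python
# Hypothetical pasted text: German plus Cyrillic.
pasted = "Grüße, Привет"

def can_save_as(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codepage."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# In memory (UTF-8) everything is representable, so the paste looks fine.
assert pasted.encode("utf-8").decode("utf-8") == pasted

print(can_save_as("Grüße", "iso8859_15"))  # True: the umlauts and ß exist in ISO-8859-15
print(can_save_as(pasted, "iso8859_15"))   # False: ISO-8859-15 has no Cyrillic letters

# Forcing the save anyway loses the characters that do not fit.
lossy = pasted.encode("iso8859_15", errors="replace").decode("iso8859_15")
print(lossy)  # Grüße, ??????
```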

              • Claudia Svenson @Coises

                @Coises

                You want a sample. OK, here it is.
                Download the following simplified text file with UTF-8 BOM encoding.
                I zipped it to prevent conversion by the web server.

                https://mega.nz/file/RMQlzCTD#LhDRpJSoWAL4Vi6EP8-XlUyDeHpfp1-_aRFLlCMzICk

                The first two lines contain an English sentence with German umlauts.
                The last two lines contain Russian (Cyrillic) text.

                If I switch the encoding to ANSI, I can see the umlauts, but the Russian text is scrambled.

                How can I convert only a part (e.g. the first two lines) from ANSI to UTF-8 BOM?

                • Coises @Claudia Svenson

                  1. Select the lines or characters that are incorrectly encoded (ANSI characters in a UTF-8-BOM file).

                  2. Switch to ANSI. (Encoding|ANSI).

                  3. Copy the highlighted characters (which should now appear correctly) to the clipboard.

                  4. Switch to UTF-8-BOM. (Encoding|UTF-8-BOM).

                  5. Paste.

                  6. You can now save the corrected file as UTF-8-BOM.

                  With your example file, the highlighted section persists perfectly when changing encoding. I wouldn’t trust that to be the case always: watch what is happening to be sure the right section is highlighted both when copying and when pasting.

                  (Earlier in this thread I said you usually shouldn’t use the top section of the Encoding menu except in the two particular cases that I listed. This is one of the rare other cases where you need to use the top section.)
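
If a file has many such mixed sections, the same repair can be roughed out in a script instead of by hand. The following Python sketch is not a Notepad++ feature. It assumes that "ANSI" means Windows-1252 and that each line is wholly in one encoding, and `repair_mixed` is a hypothetical helper name. It tries UTF-8 first for every line and falls back to the ANSI codepage only when the bytes are not valid UTF-8.

```python
import codecs

def repair_mixed(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a file whose lines are either UTF-8 or a legacy codepage."""
    # Drop a leading UTF-8 BOM so it does not end up in the text.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    fixed = []
    for line in raw.split(b"\n"):
        try:
            fixed.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            # Assumption: anything that is not valid UTF-8 is ANSI text.
            fixed.append(line.decode(fallback, errors="replace"))
    return "\n".join(fixed)

# A made-up file in the same state as the one discussed above:
# a UTF-8 BOM, one ANSI line with umlauts, one UTF-8 line in Cyrillic.
sample = (
    codecs.BOM_UTF8
    + "Grüße".encode("cp1252") + b"\r\n"
    + "Привет".encode("utf-8") + b"\r\n"
)

repaired = repair_mixed(sample)
print(repaired)

# Write the result back out as consistent UTF-8 with BOM.
output_bytes = codecs.BOM_UTF8 + repaired.encode("utf-8")
```

A short ANSI byte sequence can occasionally happen to be valid UTF-8, so the output of a script like this still deserves the same visual check as the manual steps.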
