Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM

    Vasile Caraus @Alan Kilborn
    last edited by May 23, 2021, 4:50 AM

    @Alan-Kilborn said in Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM:

    @Vasile-Caraus said in Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM:

    I got this error, after running the Python script

    You’re probably not running it under the Notepad++ plugin called PythonScript.

    Good morning, @Alan-Kilborn. Yes, you are right: I had to run the script with the PythonScript plugin from Notepad++. It works! Thank you.

      guy038
      last edited by guy038 May 23, 2021, 10:02 AM May 23, 2021, 9:54 AM

      Hi, @Vasile-caraus, @alan-kilborn and All,

      Oh, I’m really silly! You can simply do it within Notepad++ itself, without any restriction!

      So, I assume that all your .html files, in your directory, are UTF-8 encoded ( and not UTF-8-BOM ! )

      In this case, here is the road map :

      • First back-up the directory containing all the .html files to modify ( Wise ! )

      • Start Notepad++

      • If some .html files, located in this specific directory, are opened in N++, it’s best to close all these files

      • Now, open the Find in Files dialog ( Ctrl + Shift + F )

        • SEARCH \A

        • REPLACE \x{FEFF}

        • FILTERS *.html

        • DIRECTORY Your SPECIFIC folder

        • Tick the Regular expression search mode

        • Click on the Replace in Files button

        • Confirm the Are you sure? dialog

      Voila ! Now, all your .html files, in this specific folder, should be UTF-8-BOM encoded ;-))
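To see why this replacement yields a UTF-8-BOM file, it helps to look at the bytes. The sketch below is standalone Python (not the PythonScript plugin API) and works on an in-memory string; `prepend_bom` and the sample text are hypothetical names used only for illustration.

```python
import codecs

# U+FEFF is the character inserted by the replacement \x{FEFF}.
# Encoded as UTF-8 it becomes the three bytes EF BB BF, i.e. the UTF-8 BOM.
BOM_CHAR = "\ufeff"
assert BOM_CHAR.encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def prepend_bom(text: str) -> bytes:
    # Mimics replacing the start-of-file position with U+FEFF,
    # then saving the result as UTF-8.
    return (BOM_CHAR + text).encode("utf-8")


sample = "<html>caf\u00e9</html>"  # hypothetical file content
encoded = prepend_bom(sample)
print(encoded[:3])  # the first three bytes of the resulting file
```

Decoding `encoded` with Python's `utf-8-sig` codec gives back the original text, which is how a BOM-aware reader would treat the file.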

      Best Regards,

      guy038

      P.S. :

      Note that the opposite manipulation, changing a UTF-8-BOM encoded file into a UTF-8 encoded file, is not possible with a regex!

      Indeed, as \A matches the location between the BOM ( the three bytes \xEF\xBB\xBF ) and the very first byte of the actual contents of your UTF-8-BOM file, the Byte Order Mark is never part of the text that a regex can match, so you cannot delete it this way!
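Outside the editor's regex engine, the BOM is just three bytes at the start of the file, so a script that works on raw bytes can drop it. A minimal sketch in standalone Python (again, not the PythonScript plugin API); `strip_utf8_bom` is a hypothetical helper name.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    # Remove one leading UTF-8 BOM (EF BB BF) if present;
    # return the data unchanged otherwise.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Applied to a file, this would mean reading it in binary mode, passing the bytes through the helper, and writing the result back, ideally on a backup copy first.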

        Vasile Caraus
        last edited by May 23, 2021, 10:14 AM

        @guy038 said in Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM:

        \x{FEFF}

        hello @guy038

        I tested your solution; this is the screenshot. I can tell you that it is not working: nothing has changed. Only the Python script from @Alan-Kilborn works!

        [screenshot]

          Alan Kilborn
          last edited by May 23, 2021, 11:13 AM

          @guy038's solution works for me.

          But, it has some problems:

          • it doesn’t check to see if a BOM is present before adding one
          • if run multiple times it will keep inserting more and more BOM byte sequences at the start of file

          So, if you’re sure that NONE of your files already has a BOM, it seems like the regex replacement approach will work.
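Both problems listed above go away if the BOM is added only when it is missing. The following is a rough sketch in standalone Python, not the script discussed earlier in this thread; it assumes the files are already valid UTF-8 and that the folder has been backed up. The function names and the `*.html` default pattern are illustrative.

```python
import codecs
from pathlib import Path


def add_bom_if_missing(path: Path) -> bool:
    """Prepend a UTF-8 BOM unless the file already starts with one.

    Returns True if the file was modified.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already UTF-8-BOM, leave it alone
    path.write_bytes(codecs.BOM_UTF8 + data)
    return True


def add_bom_to_folder(folder: Path, pattern: str = "*.html") -> int:
    """Process matching files directly inside `folder`; return how many changed."""
    return sum(add_bom_if_missing(p) for p in sorted(folder.glob(pattern)))
```

Because of the check, running it a second time on the same folder changes nothing.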

            Vasile Caraus
            last edited by Vasile Caraus May 23, 2021, 12:38 PM May 23, 2021, 12:37 PM

            @guy038 said in Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM:

            \A

            Indeed, @guy038's solution is good, except when I press Cancel the first time. For it to work, after pressing “Replace All” I must press OK immediately, rather than cancelling and then pressing “Replace All” and OK again.

              Alan Kilborn @Vasile Caraus
              last edited by May 23, 2021, 12:57 PM

              @Vasile-Caraus

              If you end up with extra BOM sequences at the start of your files, you won’t see them: each one is a zero-width no-break space (U+FEFF). They’ll be there, but you won’t know it. I don’t know what that will do to the “integrity” of your files; probably nothing. But I always like to know what’s in my files.
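Since the extra characters are invisible in the editor, one way to find out what is in a file is to count the BOM byte sequences at its start. A small sketch in standalone Python; `count_leading_boms` is a hypothetical helper and operates on the raw bytes of a file.

```python
import codecs


def count_leading_boms(data: bytes) -> int:
    # Count how many consecutive UTF-8 BOM sequences (EF BB BF)
    # appear at the very start of the data.
    bom = codecs.BOM_UTF8
    count = 0
    while data.startswith(bom, count * len(bom)):
        count += 1
    return count
```

A result of 0 means plain UTF-8 (or some other encoding), 1 means a normal UTF-8-BOM file, and anything higher means the replacement was applied more than once.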

                Hellena Crainicu @guy038
                last edited by Hellena Crainicu Feb 22, 2022, 8:07 AM Feb 22, 2022, 8:06 AM

                @guy038 said in Change / Save encoding How to convert 800 txt files UTF-8 to UTF-8-BOM:

                SEARCH \A
                REPLACE \x{FEFF}
                FILTERS *.html

                This doesn’t work anymore; I just tested it!
                It no longer converts ANSI files to UTF-8-BOM.
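Note that the recipe earlier in the thread was stated for files that are already UTF-8. At the byte level, turning an ANSI file into UTF-8-BOM takes more than a BOM in front: the bytes have to be decoded with the ANSI code page and re-encoded. The sketch below is standalone Python and assumes that "ANSI" means Windows-1252, which depends on the system; `ansi_to_utf8_bom` is a hypothetical helper name.

```python
def ansi_to_utf8_bom(data: bytes, ansi_codec: str = "cp1252") -> bytes:
    # Decode using the assumed ANSI code page (cp1252 is only a guess),
    # then re-encode as UTF-8; the "utf-8-sig" codec writes the BOM first.
    text = data.decode(ansi_codec)
    return text.encode("utf-8-sig")
```

For example, the single ANSI byte E9 (é in Windows-1252) becomes the two UTF-8 bytes C3 A9 after the BOM.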

                  Ekopalypse @Hellena Crainicu
                  last edited by Feb 22, 2022, 9:13 AM

                  @hellena-crainicu

                  Why do you think this is so? How did you test?
                  A quick test from my side seems to show that it still works.
                  The two lines in the PythonScript console show the state before and after the replace action was executed.

                  [screenshot]

                    Hellena Crainicu @Ekopalypse
                    last edited by Feb 28, 2022, 7:08 AM

                    @ekopalypse OK, try it with “Find in Files”, on multiple files.

                      Ekopalypse @Hellena Crainicu
                      last edited by Feb 28, 2022, 8:21 AM

                      @hellena-crainicu

                      That works for me too.

                      [screenshot]
