Hello, All,
In my last post, I forgot some important points ! So, please refer to the corrected version below and forget the previous one !
When I began to “follow” Notepad++, in 2012, I was using N++ ANSI versions. At that time, N++'s forums were located on SourceForge.Net and, of course, my first replies to users were stored as ANSI files.
Recently, I was wondering how many of the roughly 1,800 posts stored on my laptop have a true ANSI encoding, in order to re-encode them as UTF-8 with the two commands :
Encoding > Convert to UTF-8
File > Save
I succeeded in finding a way to achieve it ! You just need two command-line tools : iconv.exe and xxd.exe
Notes :
Using the xxd.exe program alone, which is a hexdump utility, is not enough to determine the correct encoding. From the first bytes of a file, it can only identify, without error :
True UTF-8-BOM files with the first three bytes EFBBBF
True UTF-16 BE BOM files with the first two bytes FEFF
True UTF-16 LE BOM files with the first two bytes FFFE
But it cannot show, at first sight, any difference between an ANSI file and a UTF-8 file without a BOM, unless you scan the files completely !
Remember also that, if all the bytes of a file have a value below \x80, Notepad++ considers, by default, that it’s a pseudo UTF-8 file !
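To illustrate these notes, here is a minimal Python sketch of the same reasoning : look at the BOM first, then check whether all bytes are below \x80, and only then scan the whole content. The function name sniff_encoding and the labels it returns are my own choices for illustration, not anything used by Notepad++, and the final "ANSI" answer only means "not valid UTF-8".

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Classify raw file bytes following the notes above (hypothetical helper)."""
    # The three BOM cases that a hexdump of the first bytes shows reliably.
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "UTF-8-BOM"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "UTF-16 BE BOM"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "UTF-16 LE BOM"
    # Only bytes below \x80 : identical in ANSI and in UTF-8.
    if all(byte < 0x80 for byte in data):
        return "ASCII"
    # Otherwise the complete content must be scanned.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"   # assumption: any non-UTF-8 8-bit content is called ANSI here
    return "UTF-8"
```

For instance, sniff_encoding(b"caf\xe9") gives "ANSI", because a lone \xE9 byte is not a valid UTF-8 sequence, whereas the same word saved as UTF-8 gives "UTF-8".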
Thus,
Download the binaries, dependencies and documentation archives, relative to the iconv.exe program, from :
https://gnuwin32.sourceforge.net/packages/libiconv.htm
Move to the directory where you extracted the iconv.exe program and its 3 dependencies : libcharset1.dll, libiconv2.dll and libintl3.dll
Now, in this same directory, download the xxd archive from :
https://sourceforge.net/projects/xxd-for-windows/files/xxd-1.11_win32(static).zip/download
Extract the xxd.exe file
Open a DOS window
Run, first, the command chcp 1252 ( in order to get an ANSI output for this DOS window ! )
Move to the folder containing, both, iconv.exe and xxd.exe
Then, run the command iconv -f UTF-8 -t UTF-8 <Full_Directory_Path>\*.* 1> NUL 2>> results.txt
Of course, you must replace the <Full_Directory_Path> placeholder with the path of the folder containing all the files to scan for encoding !
=> The 1> NUL part discards the converted text and the 2>> results.txt part appends the iconv error messages to the results.txt file. So, with this command, you are certain that the OUTPUT file ( results.txt ) only lists the files which are not presently encoded in UTF-8 ( so, in N++, they can be true ANSI files, with some bytes between \x80 and \xFF, or true UTF-16 files with a BOM )
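For those who prefer to see the idea behind this iconv command, here is a rough Python sketch of the same test : try to decode each file as strict UTF-8 and keep the files that fail. The function name not_utf8_files is hypothetical, and Python's UTF-8 decoder is not guaranteed to agree with iconv.exe on every edge case, so take it as an illustration rather than a replacement.

```python
from pathlib import Path

def not_utf8_files(folder):
    """Return the files of `folder` whose content is not valid UTF-8."""
    rejected = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # sub-folders are skipped in this sketch
        try:
            # Same idea as "iconv -f UTF-8 -t UTF-8" : the decoded text is
            # thrown away, only the success or failure matters.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            rejected.append(path)
    return rejected
```

As with the iconv run, a UTF-16 file with a BOM is rejected too, because the \xFF and \xFE bytes never occur in valid UTF-8, while pure ASCII files and UTF-8-BOM files pass.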
Now, open the results.txt file in N++
Open the Replace dialog ( Ctrl + H )
SEARCH (?-si)^Permission denied\R|^.*iconv:\x20|: cannot convert$
REPLACE Leave EMPTY
Tick the Wrap around option
Select the Regular expression search mode
Click on the Replace All button
=> We keep the full pathnames, only
Save the results.txt file
Open, again, the Replace dialog ( Ctrl + H )
SEARCH (?-s)^.+
REPLACE \( echo | set /p="$0 " & xxd.exe -l2 -u "$0" \) >> results.txt
Tick the Wrap around option
Select the Regular expression search mode
Click on the Replace All button
Now, add the line @echo OFF at the very beginning of the file
And rename this file as results.bat
If the current encoding of this batch file is UTF-8, run the Encoding > Convert to ANSI option ( IMPORTANT, as the DOS window is ANSI too )
Save this new results.bat file
Run the results.bat file, in the DOS window
Back to Notepad++, open the new results.txt file
Open the Replace dialog ( Ctrl + H )
SEARCH (?-s)^.+(FFFE|FEFF).+\R|\x2000000000:.+
REPLACE Leave EMPTY
Tick the Wrap around option
Select the Regular expression search mode
Click on the Replace All button
=> We get rid of the UTF-16 files which have a BOM equal to FEFF or FFFE
Save the results.txt file
=> Thus, you should get, only, the true ANSI files, containing some bytes between \x80 and \xFF, that still need to be converted to UTF-8 or UTF-8-BOM !
Delete the results.bat file
Finally, close the DOS window
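Once the list of true ANSI files is known, the Encoding > Convert to UTF-8 and File > Save commands can also be imitated by a small script. Here is a hedged Python sketch, assuming that "ANSI" means Windows-1252, as with chcp 1252 above. The function name ansi_to_utf8 and its parameters are hypothetical. The file is overwritten in place, so work on a copy first !

```python
from pathlib import Path

def ansi_to_utf8(path, ansi_codec="cp1252", with_bom=False):
    """Re-encode one ANSI file as UTF-8, in place (hypothetical helper)."""
    raw = Path(path).read_bytes()
    # Assumption: the file is Windows-1252. Python raises UnicodeDecodeError
    # on the few bytes that this code page leaves undefined (\x81, \x8D, ...).
    text = raw.decode(ansi_codec)
    # "utf-8-sig" writes the EF BB BF BOM, plain "utf-8" does not.
    target = "utf-8-sig" if with_bom else "utf-8"
    # Working on bytes keeps the original line endings untouched.
    Path(path).write_bytes(text.encode(target))
```

Calling ansi_to_utf8 on each pathname left in results.txt would then convert the whole set, with with_bom=True for those who prefer UTF-8-BOM files.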
Best Regards,
guy038