Find BOMs
- 
 Hello: 
 Is it possible to use Find in Files to find which files in a folder have the Byte Order Mark in them? I have two test files in a folder; both are UTF-8 encoded, one has the BOM and the other doesn’t. I tried using the regex \xEF\xBB\xBF in the Find What box, but the search returned no results. Am I doing something wrong? Is it not possible? Is there another way to find BOMs? Thanks, 
 Brig
- 
 hi @Brigham-Narins 
 this is a very good and intriguing question. 👍
 “Am I doing something wrong?” no, you are doing everything correctly. apparently any notepad++ search only begins after the bom. 
 this applies to any search mode, regardless of whether it is a normal search within the current document or a find in files search. 
 so far i have not found any way to find e.g. ef bb bf (utf-8-bom) with the built-in functions. 
 i only found some ps, batch, and python scripts that list all bom files externally, but you have probably seen them as well (stackoverflow). 
 i think i/we need some more time to figure out something simple (e.g. a custom batch script at the run menu that searches all files at the path of the current active document, or a python script if you have this plugin installed). 
 ps: if you are faster in implementing something like this, please share it. 
 it would be an enrichment.
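 One way to do the external check mentioned above is a short standalone Python script that reads the first bytes of every file in a folder. This is only a sketch under some assumptions: the names `detect_bom` and `find_bom_files` are made up for illustration, it runs with a regular Python interpreter rather than inside Notepad++, and it only looks for the UTF-8 and UTF-16 (UCS-2) BOMs. A UTF-32 LE file, whose BOM also begins with FF FE, would be reported as UCS-2 little endian.

```python
import codecs
import os

# BOM signatures taken from the standard codecs module.
BOMS = [
    (codecs.BOM_UTF8, 'utf-8 bom'),                    # EF BB BF
    (codecs.BOM_UTF16_BE, 'ucs-2 big endian bom'),     # FE FF
    (codecs.BOM_UTF16_LE, 'ucs-2 little endian bom'),  # FF FE
]


def detect_bom(path):
    """Return a label for the BOM at the start of the file, or None."""
    with open(path, 'rb') as f:
        head = f.read(3)  # the longest BOM checked here is 3 bytes
    for bom, label in BOMS:
        if head.startswith(bom):
            return label
    return None


def find_bom_files(folder):
    """Return sorted (path, label) pairs for files under folder that start with a BOM."""
    hits = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            label = detect_bom(path)
            if label is not None:
                hits.append((path, label))
    return sorted(hits)
```

 Calling `find_bom_files` with the folder that holds the two test files should then return only the file that was saved with a BOM.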
- 
 @Meta-Chuh said: “apparently any notepad++ search will only begin after the bom” And this seems right, as the BOM is meta. 
- 
 i am bom, i am bom ;-) 
- 
 Thanks @Meta-Chuh. And thanks @Alan-Kilborn. I really appreciate your interest in this. 
 @Meta-Chuh said: “i only found some ps, batch, python scripts that list all bom files externally, but you have probably seen them as well (stackoverflow)” 
 I did see those, yes. Ideally I’d like to come up with a solution inside Notepad++, because these outside scripts and such seem to call for expertise and programs I don’t have. 
 “ps: if you are faster in implementing something like this, please share it. it would be an enrichment.” 
 I’ll do my best and keep you posted, but I came to you for enrichment and enlightenment! :) 
- 
 @Brigham-Narins said: “I’d like to come up with a solution inside Notepad++” 
 I understand why you’d want this. My earlier comment was intended to mean that I believe the BOM stuff is “consumed” when a file is opened, and thus isn’t “obtainable” later. I haven’t done any investigation, so I could be totally wrong about this… As for “inside Notepad++”, I’m sure you could write a PythonScript that opens files in binary mode and detects the BOM. That may or may not qualify as “inside Notepad++”, and of course it might be more effort than you were hoping to put in… 
- 
 I’m waiting for a Python program to do its work, so I started playing. Here’s a PythonScript that does what I mentioned, operating on all files currently open within Notepad++. It seemed to work for the little bit of testing I did with it.

    for (filename, bufferID, index, view) in notepad.getFiles():
        inf = open(filename, 'rb')
        data_at_start_of_file = inf.read(3)
        inf.close()
        if len(data_at_start_of_file) >= 3 and ord(data_at_start_of_file[0]) == 0xEF and ord(data_at_start_of_file[1]) == 0xBB and ord(data_at_start_of_file[2]) == 0xBF:
            print(filename, ': found utf-8 bom')
        elif len(data_at_start_of_file) >= 2 and ord(data_at_start_of_file[0]) == 0xFE and ord(data_at_start_of_file[1]) == 0xFF:
            print(filename, ': found ucs-2 big endian bom')
        elif len(data_at_start_of_file) >= 2 and ord(data_at_start_of_file[0]) == 0xFF and ord(data_at_start_of_file[1]) == 0xFE:
            print(filename, ': found ucs-2 little endian bom')
- 
 Hello, @brigham_narins, @meta-chuh, @alan-kilborn and All, 
 To answer your question simply, I would say that, among all files created from within N++, the files having a BOM (a Byte Order Mark) are:
 - The files with UTF8-BOM encoding, which have a 3-byte invisible BOM (EF BB BF)
 - The files with UCS-2 BE BOM encoding, which have a 2-byte invisible BOM (FE FF)
 - The files with UCS-2 LE BOM encoding, which have a 2-byte invisible BOM (FF FE)
 In all the other encodings, a BOM does not exist!
 Here is another way to verify the presence of a BOM:
 - Click on the View > Summary... menu option
 - Calculate the difference File length (in byte) - Current document length
 You’ve just got the BOM length, which should be 2 or 3 bytes, depending on the file encoding. 
 Best Regards, guy038 
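 The list above can be written down as a small lookup, which also gives the 2 or 3 bytes that the length comparison is meant to reveal. This is a sketch only: `BOM_TABLE` and `bom_length` are hypothetical names, not Notepad++ or PythonScript functions, and it assumes the difference described above is simply the number of BOM bytes at the start of the file.

```python
# Encoding names as used in the post above, with their BOM bytes.
BOM_TABLE = {
    'UTF8-BOM': b'\xEF\xBB\xBF',
    'UCS-2 BE BOM': b'\xFE\xFF',
    'UCS-2 LE BOM': b'\xFF\xFE',
}


def bom_length(raw):
    """Return (encoding name, BOM length in bytes) for raw file bytes, or (None, 0)."""
    for name, bom in BOM_TABLE.items():
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


# Bytes of a small UTF-8 file saved with a BOM (made-up sample data).
raw = b'\xEF\xBB\xBFhello'
name, n = bom_length(raw)
content = raw[n:]                      # the bytes without the BOM
difference = len(raw) - len(content)   # same number as the BOM length
```

 None of the three signatures is a prefix of another, so the order in which the table is checked does not matter.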
- 
 @Alan-Kilborn said: “Here’s a Pythonscript that does what I mentioned, operating on all files currently open within Notepad++.” 
 Thanks for that framework. My thought process was that I wanted to see whether the scintilla buffer contained the BOM or whether it was filtered out before then. With this framework, I added some scintilla-buffer editor.xxx commands, and found that no, the BOM is not in the scintilla buffer:

    firstBufferID = notepad.getCurrentBufferID()
    for (filename, bufferID, index, view) in notepad.getFiles():
        inf = open(filename, 'rb')
        data_at_start_of_file = inf.read(3)
        inf.close()
        if len(data_at_start_of_file) >= 3 and ord(data_at_start_of_file[0]) == 0xEF and ord(data_at_start_of_file[1]) == 0xBB and ord(data_at_start_of_file[2]) == 0xBF:
            console.write(filename+': found utf-8 bom'+'\n')
        elif len(data_at_start_of_file) >= 2 and ord(data_at_start_of_file[0]) == 0xFE and ord(data_at_start_of_file[1]) == 0xFF:
            console.write(filename+': found ucs-2 big endian bom'+'\n')
        elif len(data_at_start_of_file) >= 2 and ord(data_at_start_of_file[0]) == 0xFF and ord(data_at_start_of_file[1]) == 0xFE:
            console.write(filename+': found ucs-2 little endian bom'+'\n')
        # addendum: show the first three characters of the scintilla buffer
        notepad.activateBufferID( bufferID )
        text = editor.getText()  # named text rather than str, to avoid shadowing the builtin
        console.write('buffer: length = {}\n'.format(len(text)))
        for i in range(3):
            console.write('\t#{}: {} => {}\n'.format(i, text[i], ord(text[i])))
    # return to the buffer that was active at the start
    notepad.activateBufferID( firstBufferID )

 Which results in:

    C:\Users\peter.jones\...\Peter's Scratchpad.md: found ucs-2 little endian bom
    buffer: length = 10861
        #0: ~ => 126
        #1: ~ => 126
        #2: ~ => 126
    C:\usr\local\apps\notepad++\plugins\Config\PythonScript\scripts\NppForumPythonScripts\17244-utf-bom-reader.py: found utf-8 bom
    buffer: length = 1513
        #0: # => 35
        #1:   => 32
        #2: e => 101

 (And no, normally my scratchpad is in UTF8-BOM, not in UCS-2 LE BOM; I just changed its encoding temporarily to test out the other BOM detections.) 
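 The same “consumed on open” effect can be reproduced in plain Python, which may help picture the result above. This is an analogy only, not a description of Notepad++ internals: Python’s `utf-8-sig` codec strips a leading BOM while decoding, so the decoded text never contains it, whereas the raw bytes still do. The sample bytes are made up for illustration.

```python
import codecs

# Raw bytes of a small file saved as UTF-8 with a BOM.
raw = codecs.BOM_UTF8 + u'# encoding test'.encode('utf-8')

# On disk (raw bytes), the BOM is present.
bom_on_disk = raw.startswith(codecs.BOM_UTF8)

# Decoded with utf-8-sig, the BOM is consumed by the decoder,
# so the text starts with the first real character.
text = raw.decode('utf-8-sig')

# Decoded with plain utf-8, the BOM survives as U+FEFF at index 0.
kept = raw.decode('utf-8')

print(bom_on_disk, text[0] == u'#', kept[0] == u'\ufeff')
```

 A search over `text` can never match the BOM, because the BOM is no longer part of what is being searched.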
- 
 @PeterJones said: “and found that no, the BOM is not in the scintilla buffer” …we’re back to what I postulated in the beginning: meta! 
- 
 it’s my real name. 
 unfortunately our family has generations of such strange names.
 my brothers, for example, are called pikachuh and raichuh. here’s a family picture of us: 😄 
 seriously: i got meta as a nickname ages ago because, when i was little, i started to use anything for everything, beyond what specific items were originally intended or designed to be used for … and through the years, more and more of that actually started to work out, without anybody (including me) understanding why. 😉 



