
    Find BOMs

    • Alan Kilborn @Meta Chuh

      @Meta-Chuh said:

      apparently any notepad++ search will only begin after the bom

      And this seems right as BOM is meta

      • Meta Chuh (moderator) @Alan Kilborn

        @Alan-Kilborn

        i am bom, i am bom ;-)

        • Brigham Narins @Meta Chuh

          Thanks @Meta-Chuh. And thanks @Alan-Kilborn. I really appreciate your interest in this.

          @Meta-Chuh said:

          i only found some ps, batch, python scripts that list all bom files externally, but you have probably seen them as well (stackoverflow)

          I did see those, yes. Ideally I’d like to come up with a solution inside Notepad++, because these outside scripts and such seem to call for expertise and programs I don’t have.

          ps: if you are faster in implementing something like this, please share it.
          it would be an enrichment.

          I’ll do my best and keep you posted, but I came to you for enrichment and enlightenment! :)

          • Alan Kilborn @Brigham Narins

            @Brigham-Narins said:

            I’d like to come up with a solution inside Notepad++

            I understand why you’d want this. My earlier comment was intended to mean that I believe the BOM stuff is “consumed” when a file is opened, and thus isn’t “obtainable” later. I haven’t done any investigation, so could be totally wrong about this…

            By “inside Notepad++”, I’m sure you could write a PythonScript that could open files in binary mode and detect a BOM. That may or may not qualify as “inside Notepad++” and of course might be more effort than you were hoping to put in…

            • Alan Kilborn

              I’m waiting for a Python program to do its work, so I started playing. Here’s a PythonScript that does what I mentioned, operating on all files currently open within Notepad++. It seemed to work for the little bit of testing I did with it.

              import os

              # Runs inside the PythonScript plugin, which provides the `notepad` object.
              for (filename, bufferID, index, view) in notepad.getFiles():
                  # skip buffers that have no file on disk (e.g. unsaved "new 1" tabs)
                  if not os.path.isfile(filename):
                      continue
                  # read the raw bytes from disk, since the check is on the file itself
                  with open(filename, 'rb') as inf:
                      data_at_start_of_file = inf.read(3)
                  # startswith() on bytes behaves the same under Python 2 and Python 3
                  if data_at_start_of_file.startswith(b'\xEF\xBB\xBF'):
                      print(filename + ': found utf-8 bom')
                  elif data_at_start_of_file.startswith(b'\xFE\xFF'):
                      print(filename + ': found ucs-2 big endian bom')
                  elif data_at_start_of_file.startswith(b'\xFF\xFE'):
                      print(filename + ': found ucs-2 little endian bom')
              
              • guy038

                Hello, @brigham_narins, @meta-chuh, @alan-kilborn and All,

                To simply answer your question, I would say that, among all files created from within N++, the files having a BOM ( a Byte Order Mark ) are :

                • The files with UTF8-BOM encoding, which have a 3-byte invisible BOM ( EF BB BF )

                • The files with UCS-2 BE BOM encoding, which have a 2-byte invisible BOM ( FE FF )

                • The files with UCS-2 LE BOM encoding, which have a 2-byte invisible BOM ( FF FE )

                In all the other encodings, BOM does not exist !
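
For checking files outside Notepad++, the three signatures listed above can be matched against the first bytes of a file. Here is a minimal sketch in plain Python, using the BOM constants from the standard `codecs` module. The function name `detect_bom` and the label strings are made up for illustration; they are not part of Notepad++ or PythonScript, and the sketch only knows the three signatures from the list.

```python
import codecs

# (signature, label) pairs for the three BOMs listed above.
# The labels are illustrative; they mirror the Notepad++ encoding names.
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, 'UTF-8-BOM'),         # EF BB BF
    (codecs.BOM_UTF16_BE, 'UCS-2 BE BOM'),  # FE FF
    (codecs.BOM_UTF16_LE, 'UCS-2 LE BOM'),  # FF FE
]


def detect_bom(data):
    """Return a label for the BOM at the start of `data` (bytes), or None."""
    for signature, label in BOM_SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

Passing it the first three bytes read from a file opened in `'rb'` mode is enough, since no signature in the table is longer than that.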


                Here is another way to verify the presence of a BOM :

                • Click on the View > Summary... menu option

                • Calculate the difference File length (in byte) - Current document length !

                You’ve just got the BOM length, which should be 2 or 3 bytes, depending on the file encoding
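
The subtraction above can be mimicked outside Notepad++ for the UTF-8-BOM case. This is a small sketch in plain Python, assuming (this is not confirmed in the thread) that the document length counts only the UTF-8 content bytes. The sample text and the temporary file are purely illustrative.

```python
import os
import tempfile

text = u'hello'  # illustrative content

# Write the text the way a UTF-8-BOM save would:
# the 'utf-8-sig' codec prepends EF BB BF when encoding.
with tempfile.NamedTemporaryFile(delete=False, suffix='.txt') as tmp:
    tmp.write(text.encode('utf-8-sig'))
    path = tmp.name

file_length = os.path.getsize(path)          # bytes on disk, BOM included
document_length = len(text.encode('utf-8'))  # content bytes only (assumed)
bom_length = file_length - document_length
os.remove(path)

print(bom_length)
```

The sketch covers only UTF-8-BOM; it makes no claim about how the two lengths compare for UCS-2 files.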

                Best Regards,

                guy038

                • PeterJones

                  @Alan-Kilborn said:

                  Here’s a PythonScript that does what I mentioned, operating on all files currently open within Notepad++.

                  Thanks for that framework. My thought process was that I wanted to see whether the scintilla buffer contained the BOM or whether it was filtered out before then. With this framework, I added some scintilla-buffer editor.xxx commands, and found that no, the BOM is not in the scintilla buffer:

                  import os

                  # remember the active buffer so it can be restored at the end
                  firstBufferID = notepad.getCurrentBufferID()
                  for (filename, bufferID, index, view) in notepad.getFiles():
                      # skip buffers that have no file on disk (e.g. unsaved "new 1" tabs)
                      if not os.path.isfile(filename):
                          continue
                      # check the raw bytes on disk for a BOM
                      with open(filename, 'rb') as inf:
                          data_at_start_of_file = inf.read(3)
                      if data_at_start_of_file.startswith(b'\xEF\xBB\xBF'):
                          console.write(filename+': found utf-8 bom'+'\n')
                      elif data_at_start_of_file.startswith(b'\xFE\xFF'):
                          console.write(filename+': found ucs-2 big endian bom'+'\n')
                      elif data_at_start_of_file.startswith(b'\xFF\xFE'):
                          console.write(filename+': found ucs-2 little endian bom'+'\n')
                  
                      # addendum: dump the first characters of the scintilla buffer
                      notepad.activateBufferID( bufferID )
                      text = editor.getText()  # named `text` to avoid shadowing the built-in `str`
                      console.write('buffer: length = {}\n'.format(len(text)))
                      # slicing keeps this safe for buffers shorter than 3 characters
                      for i, ch in enumerate(text[:3]):
                          console.write('\t#{}: {} => {}\n'.format(i, ch, ord(ch)))
                  
                  notepad.activateBufferID( firstBufferID )
                  

                  Which results in:

                  C:\Users\peter.jones\...\Peter's Scratchpad.md: found ucs-2 little endian bom
                  buffer: length = 10861
                      #0: ~ => 126
                      #1: ~ => 126
                      #2: ~ => 126
                  C:\usr\local\apps\notepad++\plugins\Config\PythonScript\scripts\NppForumPythonScripts\17244-utf-bom-reader.py: found utf-8 bom
                  buffer: length = 1513
                      #0: # => 35
                      #1:   => 32
                      #2: e => 101
                  

                  (And no, normally my scratchpad is in UTF8-BOM, not in UCS-2 LE BOM; I just changed its encoding temporarily to test out the other BOM-detections.)
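
As a rough analogy outside Notepad++ (not a description of its internals), Python's own `utf-8-sig` codec behaves in a similar way: the BOM is consumed while decoding and never shows up in the resulting text. The sample bytes below are made up for illustration.

```python
import codecs

# Illustrative file content: a UTF-8 BOM followed by some text.
raw = codecs.BOM_UTF8 + b'# encoding test'

decoded = raw.decode('utf-8-sig')  # the codec strips the BOM while decoding
plain = raw.decode('utf-8')        # the BOM survives as the character U+FEFF

print(repr(decoded[0]))
print(repr(plain[0]))
```

With `utf-8-sig` the first character is the `#`, much like the first buffer character in the output above; with plain `utf-8` the first character is U+FEFF.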

                  • Alan Kilborn @PeterJones

                    @PeterJones said:

                    and found that no, the BOM is not in the scintilla buffer

                    …we’re back to what I postulated in the beginning: meta!

                    • Meta Chuh (moderator) @Alan Kilborn

                      @Alan-Kilborn

                      …we’re back to what I postulated in the beginning: meta!

                      yes … you were calling ? ;-)

                      • Alan Kilborn @Meta Chuh

                        @Meta-Chuh

                        LOL

                        Okay, that has me thinking…what does your username actually mean?

                        • Meta Chuh (moderator) @Alan Kilborn

                          @Alan-Kilborn

                          it’s my real name.
                          unfortunately our family has generations of such strange names.
                          my brothers for example are called pikachuh and raichuh.

                          here’s a family picture of us:

                          Imgur

                          😄

                          seriously: i got meta as a nickname ages ago, because when i was little, i started to use anything for everything, beyond what specific items were originally intended or designed to be used for … and through the years, more and more of that actually started to work out, without anybody (including me) understanding why. 😉
