Can't decode byte 0x89 in position 0: ordinal not in range(128)

LanceMarchetti

Hey guys, I need some help again with binary to hex. It seems I’m back to square one when trying to create a working Python script that translates the Unicode to binary before converting to hex. The error reads: UnicodeDecodeError: 'ascii' codec can't decode byte 0x89 in position 0: ordinal not in range(128)


import re

# Use a very large number of chars per line to keep it on one line.
CHAR_PER_LINE = 1 << 63

def convert_to_hex_lines(binary_data):
    # Convert binary data to hex representation
    raw_hex = ''.join('%02X' % ord(byte) for byte in binary_data)
    if CHAR_PER_LINE >= len(raw_hex):
        return raw_hex
    return '\r\n'.join(re.findall('.{1,%s}' % CHAR_PER_LINE, raw_hex))

# Read the binary data from the selected text in Notepad++
selstart = editor.getSelectionStart()
selend = editor.getSelectionEnd()

if selstart == selend:
    # If no selection, read the entire document as binary
    editor.beginUndoAction()
    editor.setReadOnly(False)
    editor.selectAll()  # Select all text
    binary_data = editor.getText().encode('latin-1')  # Encode as binary
    hex_output = convert_to_hex_lines(binary_data)
    editor.setText(hex_output)
    editor.endUndoAction()
else:
    # If there is a selection, convert only the selected text
    editor.beginUndoAction()
    editor.setReadOnly(False)
    selected_text = editor.getSelText().encode('latin-1')  # Encode as binary
    hex_output = convert_to_hex_lines(selected_text)
    editor.replaceSel(hex_output)
    editor.endUndoAction()

Any help would be appreciated
Thanks.

Coises

I don’t know Python, so I could be mistaken, but I think your approach cannot work reliably. I believe editor.getText() and editor.getSelText() return Python strings, which are always UTF-8. That immediately causes a problem: 0x89 is not a valid first byte of a UTF-8 character, so if the document encoding is set to UTF-8, the result will be invalid; if it’s something else, the transformation might not be what you expect. I suspect that your document encoding is set to UTF-8, and Python is balking when the code behind editor.getSelText() tries to put that into a Python string.
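For reference, the exact error text from the first post can be reproduced outside Notepad++. This is only a minimal sketch, and it rests on an assumption: that the script runs under Python 2, where calling .encode() on a byte string first decodes it implicitly as ASCII. The explicit decode below stands in for that implicit step, and the sample bytes are made up for illustration.

```python
# Sample bytes that start like a PNG file; 0x89 is outside 7-bit ASCII.
data = b'\x89PNG\r\n'

try:
    # Assumption: under Python 2, str.encode() triggers this ASCII decode
    # implicitly before it encodes. Here the decode is written out explicitly.
    data.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If that assumption holds, the failure happens in the .encode('latin-1') call, before any hex conversion is attempted.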

You might have better luck with editor.getCharAt(), which, according to the documentation, returns an integer rather than a string. Loop through the selection one byte at a time and convert each byte to hex.
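The byte-at-a-time loop suggested above might look roughly like the sketch below. It has not been tested inside Notepad++. FakeEditor and selection_to_hex are hypothetical names used only so the example runs on its own, and the masking with 0xFF assumes that getCharAt() may hand back a signed value for bytes above 0x7F.

```python
def selection_to_hex(get_char_at, start, end):
    """Return uppercase hex for the bytes at positions start..end-1."""
    parts = []
    for pos in range(start, end):
        # Mask to 0..255 in case the value comes back as a signed char.
        parts.append('%02X' % (get_char_at(pos) & 0xFF))
    return ''.join(parts)


class FakeEditor(object):
    """Hypothetical stand-in for the PythonScript editor object."""

    def __init__(self, data):
        self._data = bytearray(data)

    def getCharAt(self, pos):
        value = self._data[pos]
        # Mimic a signed char: bytes above 0x7F come back negative.
        return value - 256 if value > 127 else value


fake = FakeEditor(b'\x89PNG')
hex_text = selection_to_hex(fake.getCharAt, 0, 4)
print(hex_text)  # 89504E47
```

Inside a PythonScript script the call would presumably be selection_to_hex(editor.getCharAt, editor.getSelectionStart(), editor.getSelectionEnd()), which avoids building a Python string from the raw bytes at all.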

Alan Kilborn

@LanceMarchetti said in Can't decode byte 0x89 in position 0: ordinal not in range(128):

Any help would be appreciated

The best help I can give would be to tell you that I would not do what you are trying to do.

Mark Olson

Don’t use editor.getText with binary files. Notepad++ treats the file’s text as a NUL-terminated string for the purposes of that command, and most other text-retrieving commands. Apparently there are some ways for a plugin to get around this, but they are undoubtedly much more arcane.
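To illustrate what NUL-terminated handling does to binary data in general, here is a small sketch using ctypes. It shows ordinary C-string behavior only and makes no claim about Notepad++ internals; the sample bytes are made up.

```python
import ctypes

# Made-up binary data with an embedded NUL byte after the first four bytes.
data = b'\x89PNG\x00IHDR'

buf = ctypes.create_string_buffer(data)

# .value reads the buffer as a C string, so it stops at the first NUL.
as_c_string = buf.value

# .raw returns every byte in the buffer (plus the trailing NUL ctypes adds).
all_bytes = buf.raw[:len(data)]

print(len(as_c_string), len(all_bytes))
```

Anything after the first zero byte is invisible to the C-string view, which is why text-retrieving calls are a poor fit for binary files.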

Have I ever mentioned how irritated I am that Don Ho refuses to implement a system for warning people that NPP is an inappropriate app for working with binary files?