Is there a better way to unpack a binary string in Python - python

At the moment I have a byte stream of a string that is received by my Python code and must be converted into a string. For now I managed to extract each character, convert them and append them to a string individually. The code looks something like this:
import struct
# The byte stream is received and stored in byte_stream
text = ''
i = 0
while i < len(byte_stream):
    text = text + struct.unpack('c', byte_stream[i])[0]
    i += 1
print(text)
But that surely cannot be the most efficient way. Is there a more elegant way to achieve the same result?

From Convert bytes to a Python string:
byte_stream = [112, 52, 52]
''.join(map(chr, byte_stream))
>> p44
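
If the data is already a bytes object (or a list of integer byte values, as above), a more direct route is probably to let bytes.decode() do the conversion in one call. The following is only a minimal sketch: the sample values are the ones from the answer, and the 'ascii' encoding is an assumption that should be replaced by whatever encoding the sender actually uses.

```python
# Sample values taken from the answer above; a real stream would be received elsewhere.
byte_values = [112, 52, 52]

# bytes() accepts any iterable of integers in range(256).
raw = bytes(byte_values)

# decode() converts the whole byte string at once.
# 'ascii' is an assumption; use the encoding your data really has.
text = raw.decode('ascii')
print(text)
```

If byte_stream is already of type bytes, the bytes() call can be skipped and decode() applied to it directly.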

Related

Unknown pdf encoding from JSON response

I have an API that returns a PDF inside a JSON response, but the PDF comes back as a long string of integers like the following:
[{"status":"SUCCESS"},{"data":"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46,...
...,1,32,49,55,10,47,82,111,111,116,32,56,32,48,32,82,10,47,73,110,102,111,32,49,32,48,32,82,62,62,10,115,116,97,114,116,120,114,101,102,10,54,55,54,56,53,10,37,37,69,79,70"}
My questions are:
What is this encoding?
How to convert this into a pdf using python?
P.S: Here is the endpoint to get the full response.
The beginning of data is a hint that you actually have a list of the bytes values of the PDF file: it starts with the byte values of '%PDF-1.4'.
So you must first extract that curious string:
data = json_data[1]['data']
to have:
"37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10,49,32,48,32,111,98,106,10,60,60,47,84,105,116,108,101,32,40,49,49,32,67,83,45,73,73,32,32,83,117,98,106,101,99,116,105,118,101,32,81,46, ..."
Convert it to a list of ints first, then to a byte string (the expression i if i >= 0 else i + 256 maps the negative, signed values back into the 0-255 range that bytes requires):
intlist = [int(i) for i in data.split(",")]
b = bytes(i if i >=0 else i+256 for i in intlist)
to get b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (11 CS-II Subjective Q...'
And finally save that to a file:
with open('file.pdf', 'wb') as fd:
    fd.write(b)
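
The steps above could be wrapped in one small helper. This is a sketch under assumptions: the function name signed_csv_to_bytes is made up for illustration, and the sample string is just the first few values quoted in the question. It uses the modulo operator instead of the conditional expression, which gives the same result for values between -128 and 255.

```python
def signed_csv_to_bytes(csv_text):
    """Turn a comma-separated string of signed byte values into bytes."""
    # int(v) % 256 maps -128..-1 onto 128..255 and leaves 0..255 unchanged.
    return bytes(int(v) % 256 for v in csv_text.split(","))

# First values from the question's "data" field (the rest is omitted here).
sample = "37,80,68,70,45,49,46,52,10,37,-45,-21,-23,-31,10"
pdf_head = signed_csv_to_bytes(sample)
print(pdf_head)
```

The result can then be written to a file opened in 'wb' mode, exactly as in the answer.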

Hex String to Image File from varbinary(max)

I have a table in a database which stores image files in a varbinary(max) column. I would like to extract, convert and save the image file. To extract it, I cast the column to varchar(max):
cast([IMG_FILE] as varchar(max))
The result of this cast looks like a hex string (I've removed part of the string to protect the person's privacy):
\
I tried this hex string in an online tool (https://codepen.io/abdhass/full/jdRNdj), and the image is displayed correctly (remember that I've cut part of the string to preserve the person's privacy):
Then I tried to convert this hex string to an image file using Python 3. I've tried a lot of things (most of them found here), but so far I couldn't save the correct file.
Saving directly doesn't generate the image.
with open(photo_path + 'file.jpg', 'wb') as new_jpg:
    new_jpg.write(hexString)
Using binascii.unhexlify returns "Non-hexadecimal digit found"
binascii.unhexlify(hexString)
Converting to int/bin returns invalid literal for int() with base 16:
bin(int(hexString, 16))[2:]
I would like to know how to solve this problem. That is, I would like to take this hex string and save an image file on my computer.
If the string has no \x prefix, I can convert every two characters to an integer value, build a bytearray and save it:
text = ''
integers = []
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
If the string starts with \x, then the first \xff is treated as a single character, so I have to use ord() to convert it to an integer.
text = '\'
integers = []
value = ord(text[0])
integers.append(value)
text = text[1:]
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
Because the string in your question is incomplete, it creates an incomplete image.
But with the data from the link it creates a correct JPG file.
EDIT:
It seems you have a raw string, so \x is treated as two ordinary characters rather than as part of the byte \xff, and you have to remove the \x at the start using text = text[2:]
text = r'\xffd8ff...'
integers = []
text = text[2:]
while text:
    value = int(text[:2], 16)
    integers.append(value)
    text = text[2:]
data = bytearray(integers)
with open('output.jpg', 'wb') as fh:
    print(fh.write(data))
EDIT:
Simpler version with the standard module codecs. You still need to remove \x from the string.
If you have bytes:
text = b'\\xffd8ff...' # bytes
import codecs
text = text[2:] # remove `\x`
data = codecs.decode(text, 'hex_codec')
with open('output.jpg', 'wb') as fh:
    fh.write(data)
If you have a string, you first have to encode() it to bytes:
text = '\\xffd8ff...' # string
import codecs
text = text.encode() # bytes
text = text[2:] # remove `\x`
data = codecs.decode(text, 'hex_codec')
with open('output-1.jpg', 'wb') as fh:
    fh.write(data)
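
As a possible alternative to the loop and to codecs, Python 3's built-in bytes.fromhex() does the same pairwise conversion. The sketch below makes several assumptions: the helper name hex_text_to_bytes is hypothetical, the sample string is made up (it is not the data from the question), and the input is taken to be plain hex digits with at most one literal \x prefix.

```python
def hex_text_to_bytes(hex_text):
    # Strip a literal backslash + 'x' prefix (two ordinary characters) if present.
    if hex_text.startswith('\\x'):
        hex_text = hex_text[2:]
    # bytes.fromhex() turns each pair of hex digits into one byte.
    return bytes.fromhex(hex_text)

# Made-up sample: a raw string, so the backslash is a real character.
sample = r'\xffd8ffe0'
data = hex_text_to_bytes(sample)
print(data)
```

The returned bytes can be written with open('output.jpg', 'wb') as in the versions above. bytes.fromhex() raises ValueError if the text contains anything other than hex digits and whitespace, which can help locate stray characters in the input.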

Convert String to hex and send via serial in Python

I want to convert the string 400AM49L01 to a hexadecimal form (and then into bytes) b'\x34\x30\x30\x41\x4d\x34\x39\x4c\x30\x31', so I can write it with pySerial.
I already tried to convert the elements of a list, which contains the single hexadecimals like 0x34 (the character 4), into bytes, but this results in b'400AM49L01'.
import binascii

device = '400AM49L01'
device = device.encode()
device = bytes(device)
device = str(binascii.hexlify(device), 'ascii')
code = '0x'
text = []
count = 0
for i in device:
    if count % 2 == 0 and count != 0:
        text.append(code)
        code = '0x'
        count = 0
    code += i
    count += 1
text.append((code))
result = bytes([int(x, 0) for x in text])
Really looking forward to your help!
The following code will give the result you are expecting.
my_str = '400AM49L01'
"".join(hex(ord(c)) for c in my_str).encode()
# Output
# b'0x340x300x300x410x4d0x340x390x4c0x300x31'
What is it doing?
To convert a string to hex, you convert each character to its integer value from the ASCII table using ord().
Convert each int value to hex using the function hex().
Concatenate all hex value generated using join().
Encode the str to bytes using .encode().
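
The steps listed above can be run side by side with a plain encode() to see what each produces. This is only an illustrative sketch: the variable names hex_text and raw are made up, and 'ascii' is assumed as the encoding. It also shows that a bytes literal written with \x escapes and the one Python prints as b'400AM49L01' are the same value, which is why the attempt in the question appeared to "result in b'400AM49L01'".

```python
my_str = '400AM49L01'

# The four steps from the answer: ord() -> hex() -> join() -> encode().
# The result is the text "0x34..." stored as ASCII bytes.
hex_text = "".join(hex(ord(c)) for c in my_str).encode()
print(hex_text)

# Encoding directly gives one byte per character.
raw = my_str.encode('ascii')
print(raw)

# The \x-escaped literal is just another spelling of the same bytes.
print(raw == b'\x34\x30\x30\x41\x4d\x34\x39\x4c\x30\x31')
```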
Regards!

Some conversion issues between the byte and strings

Here is what I am trying:
import struct
#binary_data = open("your_binary_file.bin","rb").read()
#your binary data would show up as a big string like this one when you .read()
binary_data = '\x44\x69\x62\x65\x6e\x7a\x6f\x79\x6c\x70\x65\x72\x6f\x78\x69\x64\x20\x31\
\x32\x30\x20\x43\x20\x30\x33\x2e\x30\x35\x2e\x31\x39\x39\x34\x20\x31\x34\x3a\x32\
\x34\x3a\x33\x30'
def search(text):
    #convert the text to binary first
    s = ""
    for c in text:
        s+=struct.pack("b", ord(c))
    results = binary_data.find(s)
    if results == -1:
        print ("no results found")
    else:
        print ("the string [%s] is found at position %s in the binary data"%(text, results))
search("Dibenzoylperoxid")
search("03.05.1994")
And this is the error I am getting:
Traceback (most recent call last):
  File "dec_new.py", line 22, in <module>
    search("Dibenzoylperoxid")
  File "dec_new.py", line 14, in search
    s+=struct.pack("b", ord(c))
TypeError: Can't convert 'bytes' object to str implicitly
Kindly let me know what I can do to make it function properly.
I am using Python 3.5.0.
s = ""
for c in text:
    s+=struct.pack("b", ord(c))
This won't work because s is a string, and struct.pack returns a bytes, and you can't add a string and a bytes.
One possible solution is to make s a bytes.
s = b""
... But it seems like a lot of work to convert a string to a bytes this way. Why not just use encode()?
def search(text):
    #convert the text to binary first
    s = text.encode()
    results = binary_data.find(s)
    #etc
Also, "your binary data would show up as a big string like this one when you .read()" is not, strictly speaking, true. The binary data won't show up as a big string, because it is a bytes, not a string. If you want to create a bytes literal that resembles what might be returned by open("your_binary_file.bin","rb").read(), use the bytes literal syntax binary_data = b'\x44\x69<...etc...>\x33\x30'
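
Putting both suggestions together, a complete version might look like the sketch below. Some details are assumptions made for illustration: the bytes literal is written in readable form (it decodes to the same characters as the \x escapes in the question), search() returns the offset instead of printing, and 'ascii' is assumed as the encoding of the search term.

```python
# Stand-in for open("your_binary_file.bin", "rb").read(); note the b prefix.
binary_data = b'Dibenzoylperoxid 120 C 03.05.1994 14:24:30'

def search(text, data=binary_data):
    """Return the offset of text inside data, or -1 if it is absent."""
    # Both sides of find() must be bytes, so encode the search term first.
    needle = text.encode('ascii')
    return data.find(needle)

for term in ("Dibenzoylperoxid", "03.05.1994", "missing"):
    position = search(term)
    if position == -1:
        print("no results found for [%s]" % term)
    else:
        print("the string [%s] is found at position %s" % (term, position))
```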

How to convert a string of numbers back into binary hex (\x values) type?

Edited:
The code below is reading a file (an image in this example) in binary mode:
with open("img_80px.png", mode='rb') as file:
    file_content = file.read()
binary_data = []
for i in file_content:
    binary_data.append(i)
Now, printing out file_content will give us a string of hex values in binary b' ' format:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00P\x00\x00\x00<
\x08\x06\x00\x00\x00\xf1\'=\x8c\x00\x00\x00\tpHYs\x00\x00\x0fa
\x00\x00\x0fa\x01\xa8?\xa7i\x00\x009\xeeiTXtXML:com.adobe.xmp
\x00\\x00\x00\x00\x00<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>
\n<x:xmpmeta xmlns:x="adobe:ns:meta/"
x:xmptk="Adobe XMP Core 5.6-c138 79.159824, 2016/09/14-01:09:01
....'
So, the code converts this binary string into a list of numbers by going through file_content and appending each byte value to binary_data (not sure if it's the best way, not sure why it even works), so we're getting this:
[137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13 ....]
The question is, how do I convert this list back into that b'' hex binary string or whatever it is? As you can see, it has \x values and metadata in plain text there. Not sure how to convert it back.
If this way of converting is destructive, could you suggest another way to convert binary data into a list of integers and back?
I tried doing this:
binary_data_string = "".join(map(str, binary_data))
with open("edited_img_80px.png", mode='wb') as edited_file:
    binary_hex = bytes.fromhex(binary_data_string)
    edited_file.write(binary_hex)
it throws an error:
ValueError: non-hexadecimal number found in fromhex() arg at position 58313
And I also tried to not convert it to a string to preserve the information about each converted item in the list and be able to convert it back into the binary, but I get:
TypeError: fromhex() argument must be str, not list
Since you're using Python 3, you can do this:
>>>numbers = [222, 173, 190, 239]
>>>bytes(numbers)
b'\xde\xad\xbe\xef'
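
To illustrate the full round trip, here is a minimal sketch. It uses the 8-byte PNG signature as a stand-in for file.read() (an assumption made so the example runs without a file), and the name restored is made up. Iterating over a bytes object in Python 3 yields integers from 0 to 255, which is why the loop in the question works; list() does the same in one call.

```python
# Stand-in for the file contents: the 8-byte PNG signature.
file_content = b'\x89PNG\r\n\x1a\n'

# Iterating over bytes yields ints, so list() replaces the append loop.
binary_data = list(file_content)
print(binary_data)

# bytes() reverses the conversion; no hex string is needed in between.
restored = bytes(binary_data)
print(restored == file_content)
```

The restored object can be passed straight to edited_file.write() on a file opened with mode='wb'. Joining the numbers into one string, as in the attempt above, loses the boundaries between values, which is probably why fromhex() could not parse it.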
Cheers!
