Struct.unpack and Length of Byte Object - python

I have the following code (data is a byte object):
v = sum(struct.unpack('!%sH' % int(len(data)/2), data))
The part that confuses me is the %sH in the format string and the % int(len(data)/2).
How exactly is this part of the code working? What is the length of a byte object? And what exactly is this taking the sum of?

Assuming you have a byte string data such as:
>>> data = b'\x01\x02\x03\x04'
>>> data
b'\x01\x02\x03\x04'
The length is the number of bytes (or characters) in the byte string:
>>> len(data)
4
So this is equivalent to your code:
>>> import struct
>>> struct.unpack('!2H', data)
(258, 772)
This tells the struct module to use the following format characters:
! - use network (big endian) mode
2H - unpack 2 x unsigned shorts (16 bits each)
And it returns two integers which correspond to the data we supplied:
>>> '%04x' % 258
'0102'
>>> '%04x' % 772
'0304'
All your code does is calculate the number of unsigned shorts on the fly:
>>> struct.unpack('!%sH' % int(len(data)/2), data)
(258, 772)
But the int() call isn't needed if you use floor division (//), which already returns an int. Plain / returns a float in Python 3, and %s would turn 2.0 into '2.0' and break the format. It's also clearer to use the %d placeholder, which is meant for integers, rather than %s, which is for string substitution:
>>> struct.unpack('!%dH' % (len(data)//2), data)
(258, 772)
So unpack returns two integers from unpacking 2 unsigned shorts from the data byte string, and sum() then adds them together:
>>> sum(struct.unpack('!%dH' % (len(data)//2), data))
1030
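A minimal sketch of the same computation wrapped in a function, assuming the data should always hold a whole number of 16-bit values (the function names here are hypothetical). The second variant uses struct.iter_unpack, which applies a single '!H' format repeatedly instead of building a count into the format string.

```python
import struct

def sum_be_shorts(data: bytes) -> int:
    """Sum the big-endian unsigned 16-bit values packed in data."""
    if len(data) % 2:
        # struct would raise struct.error anyway; fail with a clearer message
        raise ValueError("data length must be even, got %d" % len(data))
    count = len(data) // 2                      # number of 2-byte shorts
    values = struct.unpack('!%dH' % count, data)
    return sum(values)

def sum_be_shorts_iter(data: bytes) -> int:
    """Same result via struct.iter_unpack, which yields one 1-tuple per '!H' chunk."""
    return sum(value for (value,) in struct.iter_unpack('!H', data))

data = b'\x01\x02\x03\x04'
print(sum_be_shorts(data))        # 1030
print(sum_be_shorts_iter(data))   # 1030
```

iter_unpack also raises struct.error when the buffer length is not a multiple of 2, so both variants reject odd-length input.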

How your code works:
You are interpreting the byte structure of data
struct.unpack uses a string to determine the byte format of the data you want to interpret
Given that format, struct.unpack returns a tuple of the interpreted values.
You then sum that tuple.
Byte Formatting
To interpret the data you are passing, you build a format string that tells Python what form the data comes in. Specifically, %sH is a template for "some number of unsigned shorts": the %s is filled in with the exact number of unsigned shorts you want.
In this case the number is:
int(len(data) / 2)
because, with the '!' prefix, struct uses standard sizes and an unsigned short is exactly 2 bytes wide. If len(data) is odd, the format covers one byte fewer than the buffer and struct.unpack raises struct.error.
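A small sketch of how the format string is assembled and checked, assuming a hypothetical helper make_format. struct.calcsize confirms how many bytes a format consumes, and a length mismatch shows up as struct.error.

```python
import struct

def make_format(n_bytes: int) -> str:
    """Build a '!<count>H' format covering n_bytes of data (hypothetical helper)."""
    return '!%dH' % (n_bytes // 2)

fmt = make_format(6)
print(fmt)                   # !3H
print(struct.calcsize(fmt))  # 6: with '!', each H is exactly 2 bytes

try:
    # 5 bytes of data, but the format only covers 4, so unpack rejects the buffer
    struct.unpack(make_format(5), b'\x00' * 5)
except struct.error as exc:
    print('struct.error:', exc)
```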

Related

Converting integer to a pair of bytes produces unexpected format?

I am using Python 3.8.5, and trying to convert from an integer in the range (0,65535) to a pair of bytes. I am currently using the following code:
import struct
input_integer = 2111
bytes_val = input_integer.to_bytes(2, 'little')
# 'B' is an unsigned byte (0-255); signed 'b' would raise struct.error for bytes >= 0x80
output_data = struct.pack('BB', bytes_val[1], bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ASCII. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ASCII instead of hex. It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytes object of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a bytes object.
Any ideas?
All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In the repr of a bytes object, bytes that are printable ASCII characters (32-126) are shown as those characters, while control codes (0-31 and 127) and bytes 128-255 are shown as hex escapes (\t, \n and \r use their usual backslash escapes). Python str reprs use the same convention for ASCII control codes, which you can see with print(repr(''.join(map(chr, range(128))))). Unlike a str, though, indexing a bytes object gives an integer: output_data[0] is 8, not b'\x08'.
If you want to represent everything as hex:
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
Since Python 3.8, bytes.hex() supports optional sep and bytes_per_sep parameters to insert separators between groups of bytes in the hex output:
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
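A short sketch tying these observations back to the question (the variable names are illustrative). int.to_bytes with 'big' order already puts the most significant byte first, so the manual swap is not needed, and slicing rather than indexing gives the length-1 bytes objects that the 'c' format expects.

```python
import struct

value = 2111                          # 0x083F
big = value.to_bytes(2, 'big')        # most significant byte first
print(big)                            # b'\x08?'  (0x3F is the ASCII code for '?')
print(big.hex())                      # 083f

print(big[0], type(big[0]))           # 8 <class 'int'>: indexing gives an int
print(big[0:1], type(big[0:1]))       # b'\x08' <class 'bytes'>: slicing gives bytes

# Equivalent ways to build the same two bytes
assert struct.pack('>H', value) == big               # big-endian unsigned short
assert struct.pack('BB', big[0], big[1]) == big      # two unsigned ints 0-255
assert struct.pack('cc', big[0:1], big[1:2]) == big  # 'c' needs length-1 bytes
```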

How to turn a binary string into a byte?

If I take the letter 'à' and encode it in UTF-8 I obtain the following result:
>>> 'à'.encode('utf-8')
b'\xc3\xa0'
Now from a bytearray I would like to convert 'à' into a binary string and turn it back into 'à'. To do so I execute the following code:
byte = bytearray('à', 'utf-8')
for x in byte:
    print(bin(x))
I get 0b11000011 and 0b10100000, which are 195 and 160. Then I fuse them together and take the 0b part out. Now I execute this code:
s = '1100001110100000'
value1 = s[0:8].encode('utf-8')
value2 = s[9:16].encode('utf-8')
value = value1 + value2
print(chr(int(value, 2)))
>> 憠
No matter how I develop the later part, I get symbols and never seem to be able to get back my 'à'. I would like to know why that is, and how I can get an 'à'.
>>> bytes(int(s[i:i+8], 2) for i in range(0, len(s), 8)).decode('utf-8')
'à'
There are multiple parts to this. The bytes constructor creates a byte string from a sequence of integers. The integers are formed from strings using int with a base of 2. The range combined with the slicing peels off 8 characters at a time. Finally decode converts those bytes back into Unicode characters.
You need your second byte's bits to be s[8:16] (or just s[8:]); otherwise you get 0100000, because s[9:16] skips bit 8.
You also need to convert your "bit string" back to an integer, e.g. with int("0010101", 2), before thinking of it as a byte:
s = '1100001110100000'
value1 = bytearray([int(s[:8],2), # bits 0..7 (8 total)
int(s[8:],2)] # bits 8..15 (8 total)
)
print(value1.decode("utf8"))
Convert the base-2 string back to an integer with int(s, 2), turn that integer into len(s)//8 bytes with int.to_bytes using big-endian order so the bytes stay in the right order, then .decode() it (the default codec in Python 3 is UTF-8):
>>> s = '1100001110100000'
>>> int(s,2)
50080
>>> int(s,2).to_bytes(len(s)//8,'big')
b'\xc3\xa0'
>>> int(s,2).to_bytes(len(s)//8,'big').decode()
'à'
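A sketch of the full round trip, assuming hypothetical helper names text_to_bits and bits_to_text. format(byte, '08b') keeps the leading zeros that bin() drops, and the decoding side uses the same bytes-from-8-bit-slices technique as the one-liner above.

```python
def text_to_bits(text: str, encoding: str = 'utf-8') -> str:
    """Encode text and render each byte as exactly 8 binary digits."""
    return ''.join(format(byte, '08b') for byte in text.encode(encoding))

def bits_to_text(bits: str, encoding: str = 'utf-8') -> str:
    """Inverse of text_to_bits: parse 8-bit groups back into bytes, then decode."""
    if len(bits) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode(encoding)

s = text_to_bits('à')
print(s)                  # 1100001110100000
print(bits_to_text(s))    # à
```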

Python - Packing string representing short int into 2 byte string

I do some sort of calculation and in the end I have an int that needs at most 16 bits to represent. I want to pack it into a string in unsigned short int format. For example, if I have 1223, I want to store 0000010011000111.
I tried using:
from struct import pack
n = pack('H', 1223)
When I tried to print it (in binary representation) I got:
11000111 100
But I want the leading zeros to also be encoded into n, how can I do it elegantly?
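A sketch of one way to get this, assuming you want the most significant byte first. pack('H', ...) uses the machine's native byte order, and on a little-endian machine (which the printed 11000111 first suggests) the low byte 0xC7 comes first. The leading zeros are already stored in the 2 bytes; bin() simply does not print them, while format(..., '08b') pads each byte.

```python
import struct

n = struct.pack('>H', 1223)   # '>' = big-endian, so the high byte comes first
print(n)                      # b'\x04\xc7'
print(len(n))                 # 2: the leading zero bits are already stored

# bin() drops leading zeros; format(..., '08b') pads each byte to 8 digits
print(' '.join(format(byte, '08b') for byte in n))   # 00000100 11000111
print(format(1223, '016b'))                          # 0000010011000111
```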

Python 3.5 - Convert bytes object to 16bit hex string (b'\x07\x89' -> '0x0789')

Some device returns data (hex bytes) in the form of:
data = b'\x07\x89\x00\x00\x12\x34'
How can I convert this to something in the form of the following?
['0x0789', '0x0000', '0x1234']
I already tried compositions of hexlify. I am using Python 3.5.
Take groups of two from your bytes object. Multiply the first value of each group by 16**2 (256) and add the second value. Use hex on the result to convert it to its string representation.
>>> [hex(data[i]*16**2 + data[i+1]) for i in range(0,len(data),2)]
['0x789', '0x0', '0x1234']
I assume that you don't need your strings padded with useless zeros for now.
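If the zero padding shown in the question is needed after all, a minimal variation of the same arithmetic is to replace hex() with a format spec. '#06x' means: hex digits, '0x' prefix, zero-padded to 6 characters in total (the prefix counts toward the width).

```python
data = b'\x07\x89\x00\x00\x12\x34'

# '#06x' = hex with '0x' prefix, zero-padded to 6 characters total
result = [format(data[i] * 16**2 + data[i + 1], '#06x')
          for i in range(0, len(data), 2)]
print(result)   # ['0x0789', '0x0000', '0x1234']
```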
Use the struct module; its unpack function lets you specify the chunk size (byte, 2-byte, 4-byte) and the endianness of the data. If what you have is big-endian, half-word-sized data chunks, then the right format key is ">H".
To parse all the data at once, add a count to the format specifier: for example ">3H" for your input array. You can also write the number of fields dynamically.
Full example:
import struct
data = b'\x07\x89\x00\x00\x12\x34'
d = struct.unpack(">{}H".format(len(data) // 2), data) # extract fields
result = [hex(x) for x in d]  # convert to strings; hex() does not zero-pad: ['0x789', '0x0', '0x1234']
There are two steps:
Chunk the input bytestring into 2-byte sequences
Display the sequences as hex literals.
You could use array module from stdlib:
import sys
from array import array
a = array('H', data)
if sys.byteorder != 'big':
a.byteswap() # use big-endian order
result = ['0x%04X' % i for i in a]
# -> ['0x0789', '0x0000', '0x1234']
It is efficient, especially if you need to read data from a file.
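The same two steps (chunk, then display) can also be sketched with only int.from_bytes, which takes the byte order explicitly and so does not depend on the platform's native order or on array's 'H' item size. This is an alternative, not what the array answer does; it avoids f-strings since the question targets Python 3.5.

```python
data = b'\x07\x89\x00\x00\x12\x34'

# Step 1: chunk into 2-byte slices
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]
# Step 2: read each chunk as a big-endian int and format it as 4 zero-padded hex digits
result = ['0x%04x' % int.from_bytes(chunk, 'big') for chunk in chunks]
print(result)   # ['0x0789', '0x0000', '0x1234']
```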

How to convert hexadecimal string to bytes in Python?

I have a long Hex string that represents a series of values of different types. I need to convert this Hex String into bytes or bytearray so that I can extract each value from the raw data. How can I do this?
For example, the string "ab" should convert to the bytes b"\xab" or equivalent byte array. Longer example:
>>> # what to use in place of `convert` here?
>>> convert("8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b")
b'\x8eq\xc6\x1d\xe6\xa22\x136\x18O\x813y\xeck\xf4\xa3\xfby\xe6<\xd1+'
Suppose your hex string is something like
>>> hex_string = "deadbeef"
Convert it to a bytearray (Python 3 and 2.7):
>>> bytearray.fromhex(hex_string)
bytearray(b'\xde\xad\xbe\xef')
Convert it to a bytes object (Python 3):
>>> bytes.fromhex(hex_string)
b'\xde\xad\xbe\xef'
Note that bytes is an immutable version of bytearray.
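A quick check against the longer example from the question: bytes.fromhex produces exactly the expected output, .hex() reverses it, and bytearray.fromhex gives a mutable object that compares equal to the bytes result.

```python
hex_string = "8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b"

raw = bytes.fromhex(hex_string)
print(raw[:4])              # b'\x8eq\xc6\x1d'
print(len(raw))             # 24: two hex digits per byte
assert raw.hex() == hex_string                 # .hex() is the inverse of fromhex
assert bytearray.fromhex(hex_string) == raw    # mutable copy with the same contents
```

fromhex raises ValueError if the string has an odd number of hex digits or contains non-hex characters (other than whitespace between bytes).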
Convert it to a string (Python ≤ 2.7):
>>> hex_data = hex_string.decode("hex")
>>> hex_data
"\xde\xad\xbe\xef"
bytearray has a built-in class method that does what you intend:
bytearray.fromhex("de ad be ef 00")
It returns a bytearray and it reads hex strings with or without space separator.
Provided I understood correctly, you should look at binascii.unhexlify:
import binascii
a = '45222e'
s = binascii.unhexlify(a)
b = list(s)  # iterating bytes yields ints in Python 3: [69, 34, 46]
Assuming you have a byte string like so
b'\x12\x45\x00\xAB'
and you know the number of bytes and their types, you can also use this approach:
import struct
data = b'\x12\x45\x00\xAB'  # a bytes literal; naming it bytes would shadow the built-in
val = struct.unpack('<BBH', data)
# val == (18, 69, 43776)
Because the format string starts with '<' (little-endian), the two bytes of the H field are read low byte first, so \x00\xAB becomes 0xAB00:
0x12 = 18
0x45 = 69
0xAB00 = 43776
B is equal to one byte (8 bit) unsigned
H is equal to two bytes (16 bit) unsigned
More format characters and their byte sizes are listed in the struct module documentation.
The advantages are:
You can specify more than one byte and the endianness of the values.
Disadvantages:
You really need to know the type and length of the data you're dealing with.
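A sketch combining the two steps the question asks for, assuming the hex string "124500ab" (the same bytes as above, chosen for illustration): decode the hex with bytes.fromhex, check that the format's size matches the data with struct.calcsize, then unpack. struct.unpack_from reads a single field at a byte offset without slicing.

```python
import struct

hex_string = "124500ab"
raw = bytes.fromhex(hex_string)          # b'\x12E\x00\xab'

fmt = '<BBH'                             # two unsigned bytes, then a little-endian unsigned short
assert struct.calcsize(fmt) == len(raw)  # check the layout matches the data length first
print(struct.unpack(fmt, raw))           # (18, 69, 43776)

# unpack_from reads a field at a byte offset without slicing
(short_value,) = struct.unpack_from('<H', raw, 2)
print(short_value)                       # 43776
```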
You can use the codecs module in the Python Standard Library, e.g.:
import codecs
codecs.decode(hexstring, 'hex_codec')
You should be able to build the binary data using something like:
data = "fef0babe"
bits = bytearray()
for x in range(0, len(data), 2):
    bits.append(int(data[x:x+2], 16))  # each pair of hex digits becomes one byte
This is probably not the fastest way (one append per byte), but it is quite simple and uses only core Python.
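A one-liner is:
byte_list = map(ord, hex_string)
This will iterate over each char in the string and run it through the ord() function, which gives the character codes of the hex digits themselves (e.g. ord('d') == 100), not the decoded byte values, so it does not parse pairs of hex digits. Only tested on Python 2.6; in Python 3, map() returns a lazy iterator rather than a list.
-Josh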
