python struct unpack - python

I'm trying to convert the following perl code:
unpack(.., "Z*")
to Python; however, the lack of a "*" repeat count in struct.unpack() seems to make this impossible. Is there a way I can do this in Python?
P.S. From perldoc, on the "*" modifier: "Supplying a * for the repeat count instead of a number means to use however many items are left, ..."
So although python has a numeric repeat count like perl, it seems to lack a * repeat count.

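Python's struct.unpack doesn't have the Z format, which perldoc describes as:
Z A null-terminated (ASCIZ) string, will be null padded.
I think this
unpack(.., "Z*")
would roughly be:
data.split(b'\x00')
although that strips the nulls, and it returns every NUL-separated piece rather than only the first string.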
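If you only want what Perl's "Z*" gives you (the bytes up to the first NUL, or everything if there is no NUL), limiting the split to one cut is enough. This is a minimal sketch; unpack_asciz is a hypothetical helper name, not part of struct:

```python
def unpack_asciz(data):
    """Rough equivalent of Perl's unpack("Z*", data): the bytes before the
    first NUL, or all of data if it contains no NUL (hypothetical helper)."""
    return data.split(b'\x00', 1)[0]

print(unpack_asciz(b'hello\x00world\x00'))   # b'hello'
print(b'hello\x00world\x00'.split(b'\x00'))  # [b'hello', b'world', b'']
```

The second line shows why a plain split() is not quite the same thing: it returns every piece, plus an empty one after the trailing NUL.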

I am assuming that you create the struct datatype and know the size of the struct. If that is the case, you can allocate a buffer for that struct and then pack the value into the buffer. When unpacking, you can unpack directly from the same buffer by just specifying the starting offset.
For example:
import ctypes
import struct
s = struct.Struct('I')
b = ctypes.create_string_buffer(s.size)
s.pack_into(b, 0, 42)
s.unpack_from(b, 0)

You must calculate the repeat count yourself:
n = len(s) // struct.calcsize(your_fmt_string)  # integer division
f = '%d%s' % (n, your_fmt_string)
data = struct.unpack(f, s)  # the format comes first, then the data
I am assuming your_fmt_string describes a single element (with no byte-order prefix such as '<'), and that len(s) is an exact multiple of that element's packed size.
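A runnable sketch of the same idea, assuming the format is one format character with an optional byte-order prefix (which has to stay at the front of the computed format). unpack_all is a hypothetical name; struct.iter_unpack (Python 3.4+) is shown as a standard-library alternative that also requires the data length to be a multiple of the item size:

```python
import struct

def unpack_all(fmt, data):
    """Unpack as many copies of a single-item format as data holds,
    imitating Perl's '*' repeat count (a sketch, not a struct feature)."""
    prefix, code = '', fmt
    if fmt and fmt[0] in '@=<>!':       # byte-order prefix must stay first
        prefix, code = fmt[0], fmt[1:]
    size = struct.calcsize(fmt)
    if len(data) % size:
        raise ValueError('data length is not a multiple of the item size')
    count = len(data) // size           # integer division, not /
    return struct.unpack('%s%d%s' % (prefix, count, code), data)

data = struct.pack('<3H', 1, 2, 3)
print(unpack_all('<H', data))                            # (1, 2, 3)
print([v for (v,) in struct.iter_unpack('<H', data)])    # [1, 2, 3]
```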

Related

How do I pack bits from one byte array to another efficiently in python3?

I have a fairly large byte array in python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), also the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so the source index zero goes to the MSB of the first output byte etc.
For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
Desired output would be b'\x89\xe2\xff'.
I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.
As a follow-up question, I might also want the source byte array to contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?
In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:
def unbitify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')
Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.
And you can extend it to 2-bit values pretty easily:
def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')
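Here is the same int-parsing trick folded into one self-contained function and checked against the example from the question. pack_values_via_int is a hypothetical name; it only handles 1-bit and 2-bit values, because those are the widths whose values are single decimal digits that int() accepts in base 2 or 4:

```python
def pack_values_via_int(src, bits_per_value=1):
    """Pack 1-bit or 2-bit values into bytes, first value in the most
    significant position, by parsing them as a base-2 or base-4 number."""
    if bits_per_value not in (1, 2):
        raise ValueError('only 1- or 2-bit values are single digits here')
    per_byte = 8 // bits_per_value
    if len(src) % per_byte:
        raise ValueError('length must be a multiple of %d' % per_byte)
    if not src:
        return b''                      # int('', base) would raise ValueError
    n = int(''.join(map(str, src)), 2 ** bits_per_value)
    return n.to_bytes(len(src) // per_byte, 'big')

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(pack_values_via_int(src))                 # b'\x89\xe2\xff'
print(pack_values_via_int([2, 1, 0, 3], 2))     # b'\x93'
```

An out-of-range value (for example a 2 in 1-bit mode) makes int() raise ValueError instead of silently corrupting neighbouring bits.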
If you want something more flexible, but possibly slower, here's a simple solution using ctypes.
If you know C, you can probably see that a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
import ctypes
class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src into groups of 8 in big-endian order:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:
np.packbits(src)
(A bytearray works just fine as an "array-like".)
It's also hard to beat for speed.
For comparison, here are some measurements:
60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
60ns/byte + 1.9µs: np.packbits
440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
570ns/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
610ns/byte + 9.1µs: bitarray
800ns/byte + 2.9µs: gmpy.mpz(…)…
1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
16µs/byte + 0.9µs: for and bit-twiddling
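For reference, a plain "for and bit-twiddling" baseline could look like the sketch below. It is an assumption about what such a loop looks like, not the code that was timed above. Because it tests each element for truthiness, it also covers the general question of packing arbitrary true/false elements 8 at a time:

```python
def pack_bits_loop(src):
    """Pack elements 8 at a time into bytes, first element in the MSB;
    any truthy element becomes a 1 bit, any falsy one a 0 bit."""
    if len(src) % 8:
        raise ValueError('length must be a multiple of 8')
    out = bytearray()
    for i in range(0, len(src), 8):
        byte = 0
        for bit in src[i:i + 8]:
            byte = (byte << 1) | (1 if bit else 0)   # shift left, OR in the next bit
        out.append(byte)
    return bytes(out)

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(pack_bits_loop(src))   # b'\x89\xe2\xff'
```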
Using numpy, with test code and comments:
#!/usr/bin/env python3
import numpy as np
def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')
def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)
if __name__ == '__main__':
    main()
Similar code exists for other deinterleaving, but since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

Saving and loading bits/bytes in Python

I've been studying compression algorithms recently, and I'm trying to understand how I can store integers as bits in Python to save space.
So first I save '1' and '0' as strings in Python.
import os
import numpy as np
array= np.random.randint(0, 2, size = 200)
string = [str(i) for i in array]
with open('testing_int.txt', 'w') as f:
    for i in string:
        f.write(i)
print(os.path.getsize('testing_int.txt'))
I get back 200 bytes, which makes sense, since each char is represented by one byte in ASCII (and in UTF-8 as well if the characters are Latin?).
Now if trying to save these ones and zeroes as bits, I should only take up around 25 bytes right?
200 bits/8 = 25 bytes.
However, when I try the following code below, I get 105 bytes.
Am I doing something wrong?
Using the same 'array variable' as above I tried this:
bytes_string = [bytes(i) for i in array]
with open('testing_bytes.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
Then I tried this:
bin_string = [bin(i) for i in array]
with open('testing_bin.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
This also takes up around 105 bytes.
So I tried looking at the text files, and I noticed that
both the 'bytes.txt' and 'bin.txt' are blank.
So I tried to read the 'bytes.txt' file via this code:
with open(r"C:\Users\Moondra\Desktop\testing_bytes\testing_bytes.txt", 'rb') as f:
    x = f.read()
Now I get back this:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
So I tried these commands:
>>> int.from_bytes(x, byteorder='big')
0
>>> int.from_bytes(x, byteorder='little')
0
>>>
So apparently I'm doing multiple things incorrectly.
I can't figure out:
1) Why am I not getting a text file that is 25 bytes?
2) Why can't I read back the bytes file correctly?
Thank you.
bytes_string = [bytes(i) for i in array]
It looks like you expect bytes(x) to give you a one-byte bytes object with the value of x. Follow the documentation, and you'll see that bytes() is initialized like bytearray(), and bytearray() says this about its argument:
If it is an integer, the array will have that size and will be initialized with null bytes.
So bytes(0) gives you an empty bytes object, and bytes(1) gives you a single byte with the ordinal zero. That's why bytes_string is about half the size of array and is made up completely of zero bytes.
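A quick way to see the difference: passing an int to bytes() sets the length, while passing a list of ints sets the values. The variable names below are only for illustration:

```python
values = [0, 1, 1, 0]

wrong = b''.join(bytes(v) for v in values)    # bytes(0) is b'', bytes(1) is b'\x00'
right = b''.join(bytes([v]) for v in values)  # one byte holding each value

print(wrong)   # b'\x00\x00'
print(right)   # b'\x00\x01\x01\x00'
```

Note that even the corrected version still spends a whole byte on each 0 or 1; it fixes the zero bytes, not the file size.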
As for why the bin() example didn't work, it looks like a simple case of copy-pasting and forgetting to change bytes_string to bin_string in the for loop.
This all still doesn't accomplish your goal of treating 0 or 1 value integers as bits. Python doesn't really have that sort of functionality built in. There are third-party modules that allow you to work at the bit level, but I can't speak to any of them specifically. Personally I would probably just roll my own specific to the application.
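It looks like you're trying to bit shift all the values into a single byte. For example, you expect the integer values [0,1,0,1,0,1,0,1] to be packed into a byte that looks like the following binary number: 0b01010101. To do this, you need to use the bitwise shift and bitwise or operators along with the struct module to pack the values into an unsigned char which represents the sequence of int values you have. (Note that the code below puts the first value in the least significant bit, so [0,1,0,1,0,1,0,1] actually becomes 0b10101010; use shifts from 7 down to 0 instead if you want the first value in the most significant bit.)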
The code below takes the array of random integers in the range [0,1] and shifts them together to make a binary number that can be packed into a single byte. I used 256 ints for convenience, so the file should be 32 bytes (256/8), and when you run it you will see that this is indeed what you get.
import struct
import numpy as np
import os
a = np.random.randint(0, 2, size = 256)
bool_data = []
bin_vals = []
for i in range(0, len(a), 8):
    bin_val = (a[i] << 0) | (a[i+1] << 1) | \
              (a[i+2] << 2) | (a[i+3] << 3) | \
              (a[i+4] << 4) | (a[i+5] << 5) | \
              (a[i+6] << 6) | (a[i+7] << 7)
    bin_vals.append(struct.pack('B', bin_val))
with open("output.txt", 'wb') as f:
    for val in bin_vals:
        f.write(val)
print(os.path.getsize('output.txt'))
Please note, however, that this will only work for values of integers in the range [0,1] since if they are bigger it will shift more non-zeros and wreck the structure of the generated byte. The binary number may also exceed 1 byte in size in this case.
It seems like you're just using Python in an attempt to generate an array of bits for demonstration purposes, and to that end I would say that Python probably isn't best suited for this. I would recommend a lower-level language such as C/C++, which has more direct access to data types than Python does.
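For completeness, here is a sketch of the full round trip the question asks about: 200 bits written as 25 bytes and read back. It assumes the first value goes in the most significant bit, uses the random module instead of numpy so it is self-contained, and uses hypothetical helper names. Since a partial last byte is zero-padded, the number of bits has to be known (or stored) separately to read it back exactly:

```python
import os
import random
import tempfile

def bits_to_bytes(bits):
    """Pack 0/1 values into bytes, first value in the MSB; the last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = list(bits[i:i + 8])
        chunk += [0] * (8 - len(chunk))        # pad a final partial byte
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def bytes_to_bits(data, count):
    """Inverse of bits_to_bytes; count is how many bits were originally stored."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    return bits[:count]

bits = [random.randint(0, 1) for _ in range(200)]
path = os.path.join(tempfile.mkdtemp(), 'testing_bits.bin')
with open(path, 'wb') as f:
    f.write(bits_to_bytes(bits))
print(os.path.getsize(path))   # 25
with open(path, 'rb') as f:
    restored = bytes_to_bits(f.read(), len(bits))
print(restored == bits)        # True
```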

Convert a list of ints to a float

I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it back together into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in python is a much more simple conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
In the Python code, byte_array is a list of ints; I've renamed it (to int_list) to avoid misunderstanding. I cannot just use the float() function, because it will not preserve the sign of the number.
I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two Python ints. This data represents a big-endian 16-bit signed integer, which you want to extract and turn into a float, e.g. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
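The same sign handling can also be done with plain integer arithmetic, which mirrors the C cast to int16_t: if the top bit of the 16-bit value is set, subtract 2**16. int16_from_bytes is a hypothetical helper name; int.from_bytes with signed=True is the Python 3 standard-library way to get the same result:

```python
def int16_from_bytes(hi, lo):
    """Combine two unsigned bytes (big-endian) into a signed 16-bit value."""
    value = (hi << 8) | lo
    if value >= 0x8000:        # top bit set: negative in two's complement
        value -= 0x10000
    return value

print(float(int16_from_bytes(0xFF, 0xFE)))                 # -2.0
print(int.from_bytes(bytes([255, 254]), 'big', signed=True))  # -2 (Python 3 only)
```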
I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
The result may surprise you: unpack returns a tuple. So to get to the value you can do
result[0]
or modify the line that sets result to be
result = struct.unpack('=f', somebytes)[0]
I personally dislike that, so I use the following instead:
result, = struct.unpack('=f', somebytes)  # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because 'f' stores only a 32-bit single-precision float, and unpack widens it back to Python's native 64-bit double, so the rounding error of the 32-bit value shows up in the printed digits.
(This is python3 btw, adjust for using old versions of python as appropriate)
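A small check of that precision point: the '=f' round trip gives back a nearby value rather than the original one, while the 8-byte '=d' format round-trips exactly. The variable names are illustrative:

```python
import struct

sample = 42.13
packed = struct.pack('=f', sample)     # 4 bytes, rounded to single precision
value, = struct.unpack('=f', packed)   # widened back to a Python (double) float

print(len(packed))                     # 4
print(value == sample)                 # False: 42.13 has no exact 32-bit representation
print(abs(value - sample) < 1e-5)      # True: it is only off in the low digits
print(struct.unpack('=d', struct.pack('=d', sample))[0] == sample)  # True
```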
I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float, = struct.unpack(">f", stream)  # unpack returns a 1-tuple
Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to reinterpret the first byte as a signed value (subtract 256 when it is above 127) and then proceed much like you did.
float(((byte_array[0] - 256) if byte_array[0] > 127 else byte_array[0])*256 + byte_array[1])

How to create a fixed size (unsigned) integer in python?

I want to create a fixed size integer in python, for example 4 bytes. Coming from a C background, I expected that all the primitive types will occupy a constant space in memory, however when I try the following in python:
import sys
print sys.getsizeof(1000)
print sys.getsizeof(100000000000000000000000000000000000000000000000000000000)
I get
>>>24
>>>52
respectively.
How can I create a fixed size (unsigned) integer of 4 bytes in python? I need it to be 4 bytes regardless if the binary representation uses 3 or 23 bits, since later on I will have to do byte level memory manipulation with Assembly.
You can use struct.pack with the I format character (unsigned int). In the older Python version used below it only warns when the integer does not fit in four bytes; current versions raise struct.error instead:
>>> from struct import *
>>> pack('I', 1000)
'\xe8\x03\x00\x00'
>>> pack('I', 10000000)
'\x80\x96\x98\x00'
>>> pack('I', 1000000000000000)
sys:1: DeprecationWarning: 'I' format requires 0 <= number <= 4294967295
'\x00\x80\xc6\xa4'
You can also specify endianness.
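For example, in Python 3 the byte-order prefix controls the layout, and an out-of-range value raises struct.error rather than being silently truncated:

```python
import struct

little = struct.pack('<I', 1000)   # b'\xe8\x03\x00\x00'
big = struct.pack('>I', 1000)      # b'\x00\x00\x03\xe8'
print(little, big, len(little))

overflow_raised = False
try:
    struct.pack('<I', 2 ** 32)     # needs 33 bits, so it does not fit in 4 bytes
except struct.error as exc:
    overflow_raised = True
    print('struct.error:', exc)
```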
The way I do this (usually to ensure a fixed-width integer before sending it to some hardware) is via ctypes:
from ctypes import c_ushort
def hex16(self, data):
    '''16bit int->hex converter'''
    return '0x%04x' % (c_ushort(data).value)
#------------------------------------------------------------------------------
def int16(self, data):
    '''16bit hex->int converter'''
    return c_ushort(int(data,16)).value
Otherwise, struct can do it:
from struct import pack, unpack
pack_type = {'signed':'>h','unsigned':'>H',}
pack(pack_type[sign_type], data)
You are missing something here, I think.
When you send a character you will be sending 1 byte, so even though
sys.getsizeof('\x05')
reports far more than one byte, you are still only sending a single byte when you send it. The extra overhead is the bookkeeping Python attaches to every object (reference count, type pointer, length and so on), and that does not get transmitted.
You complained about getsizeof for the struct.pack answer but accepted the c_ushort answer, so I figured I would show you this:
>>> sys.getsizeof(struct.pack("I",15))
28
>>> sys.getsizeof(c_ushort(15))
80
That said, both of the answers should do exactly what you want.
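I have no idea if there's a better way to do this, but here's my naive approach. Note that it clamps (saturates) instead of wrapping around like C does, and num_bits counts bits, so 4 bytes would be num_bits=32:
def intn(n, num_bits=4):
    return min(2 ** num_bits - 1, n)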
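If you want C-style unsigned wraparound rather than clamping, masking with 0xFFFFFFFF reduces any Python int modulo 2**32, and int.to_bytes then gives exactly 4 bytes. This is a sketch with hypothetical helper names; the Python int object itself still is not 4 bytes in memory, only its serialized form is:

```python
MASK32 = 0xFFFFFFFF

def wrap_uint32(n):
    """Reduce n modulo 2**32, like unsigned 32-bit arithmetic in C."""
    return n & MASK32

def uint32_bytes(n, byteorder='little'):
    """Always exactly 4 bytes, whatever the magnitude of n."""
    return wrap_uint32(n).to_bytes(4, byteorder)

print(wrap_uint32(2 ** 32 + 5))   # 5
print(wrap_uint32(-1))            # 4294967295
print(uint32_bytes(1000))         # b'\xe8\x03\x00\x00'
```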

Python 2,3 Convert Integer to "bytes" Cleanly

The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter, the default encoding is used ('ascii' in Python 2, 'utf-8' in Python 3), which is always good enough for numeric digits.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
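This works the same on both Python 2 and Python 3 (the outputs above are from Python 2; Python 3 displays them as bytes literals such as b'\x00\x00\x00\x01').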
Note: the inverse operation (bytes to int) can be done with unpack.
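For example, unpacking with the same format restores the number; unpack always returns a tuple, and reading the bytes with the other byte order gives a different value:

```python
import struct

packed = struct.pack('>I', 1)              # b'\x00\x00\x00\x01'
(value,) = struct.unpack('>I', packed)     # unpack always returns a tuple
print(value)                               # 1

# Reading the same bytes as little-endian by mistake gives a different number:
print(struct.unpack('<I', packed)[0])      # 16777216
```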
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in python 2. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
>>> n = 5
>>> bytes([n])
b'\x05'
I guess that's better than messing around with strings.
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
From int to bytes:
bytes_string = int_v.to_bytes(length, endian)
where length is 1/2/3/4..., and endian is either 'big' or 'little'.
From bytes to int:
int_v = int.from_bytes(bytes_string, endian)
(list(bytes_string) would instead give you a list of the individual byte values.)
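A Python 3 sketch of that round trip, computing the minimal length with integer arithmetic instead of math.ceil on a float. The helper names are hypothetical; negative numbers need signed=True and an explicit length:

```python
def int_to_bytes(n, byteorder='big'):
    """Shortest byte string holding a non-negative int (at least one byte)."""
    length = max(1, (n.bit_length() + 7) // 8)   # ceil(bits / 8) without floats
    return n.to_bytes(length, byteorder)

def bytes_to_int(data, byteorder='big'):
    """Inverse of int_to_bytes."""
    return int.from_bytes(data, byteorder)

data = int_to_bytes(0x1234)
print(list(data))                 # [18, 52]: list() gives the individual byte values
print(hex(bytes_to_int(data)))    # 0x1234

neg = (-2).to_bytes(2, 'big', signed=True)
print(neg)                                       # b'\xff\xfe'
print(int.from_bytes(neg, 'big', signed=True))   # -2
```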
When converting old Python 2 code, you often have "%s" % number; for Python 3 this can be converted to b"%d" % number (b"%s" % number does not work).
The format b"%d" % number is also another clean way to convert an int to a byte string:
b"%d" % number
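A short comparison, assuming Python 3.5 or later (where %-formatting of bytes was added): b"%d" gives the same ASCII digits as encoding str(n), while b"%s" rejects a plain int with TypeError:

```python
number = 42
as_bytes = b"%d" % number            # b'42'
via_encode = str(number).encode()    # b'42'
print(as_bytes == via_encode)        # True

s_failed = False
try:
    b"%s" % number                   # %s on bytes wants a bytes-like object, not an int
except TypeError as exc:
    s_failed = True
    print('TypeError:', exc)
```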
