Construct parsing for unaligned int field? - python

I am using this nice little package "construct" for binary data parsing. However, I ran into a case where the format is defined as:
 31       24 23                      0
+-----------+------------------------+
|  status   |     an int number      |
+-----------+------------------------+
Basically, the high 8 bits are used for status, and the remaining 3 bytes hold an integer: an int type with the high bits masked off. I am a bit lost on what the proper way of defining the format is:
The brute-force way is to define it as ULInt32 and do the bit masking myself (a sketch of that is below)
Is there any way I can use BitStruct to save the trouble?
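For reference, the brute-force version might look like this (just a sketch, assuming construct 2.x's ULInt32 and little-endian data):
from construct import ULInt32

raw = ULInt32("raw").parse("\x01\x01\x01\xff")  # little-endian sample
status = (raw >> 24) & 0xff   # high 8 bits  -> 255
number = raw & 0x00ffffff     # low 24 bits  -> 65793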
Edit
Assuming little endian, and based on jterrace's example and the swapped=True suggestion, I think this is what will work in my case:
sample = "\xff\x01\x01\x01"
c = BitStruct("foo", BitField("i", 24, swapped=True), BitField("status", 8))
c.parse(sample)
Container({'i': 66047, 'status': 1})
Thanks
Oliver

This would be easy if construct contained an Int24 type, but it doesn't. Instead, you can specify the bit lengths yourself, like this:
>>> from construct import BitStruct, BitField
>>> sample = "\xff\x01\x01\x01"
>>> c = BitStruct("foo", BitField("status", 8), BitField("i", 24))
>>> c.parse(sample)
Container({'status': 255, 'i': 65793})
Note: The value \x01\x01\x01 is 65536 + 256 + 1 = 65793

BitStruct("foo",
BitField("status", 8),
BitField("number", 24))

Related

Packing an integer number to 3 bytes in Python

With background knowledge of C, I want to serialize an integer number to 3 bytes. I searched a lot and found out that I should use struct packing. I want something like this:
number = 1195855
buffer = struct.pack("format_string", number)
Now I expect buffer to be something like ['\x12', '\x3F', '\x4F']. Is it also possible to set the endianness?
It is possible, using either > or < in your format string:
import struct

number = 1195855

def print_buffer(buffer):
    print(''.join(["%02x" % ord(b) for b in buffer]))  # Python 2
    # print(buffer.hex())  # Python 3

# Little Endian
buffer = struct.pack("<L", number)
print_buffer(buffer)  # 4f3f1200

# Big Endian
buffer = struct.pack(">L", number)
print_buffer(buffer)  # 00123f4f
2.x docs
3.x docs
Note, however, that you're going to have to figure out how you want to get rid of the extra zero byte in the buffer, since L gives you 4 bytes and you only want 3.
Something like:
buffer = struct.pack("<L", number)
print_buffer(buffer[:3]) # 4f3f12
# Big Endian
buffer = struct.pack(">L", number)
print_buffer(buffer[-3:]) # 123f4f
would be one way.
Another way is to manually pack the bytes:
>>> import struct
>>> number = 1195855
>>> data = struct.pack('BBB',
... (number >> 16) & 0xff,
... (number >> 8) & 0xff,
... number & 0xff,
... )
>>> data
b'\x12?O'
>>> list(data)
[18, 63, 79]
As just the 3 bytes, it's a bit redundant, since the last 3 arguments to struct.pack are the data itself. But this worked well in my case because I had header and footer bytes surrounding the unsigned 24-bit integer.
Whether this method or slicing is more elegant is up to your application. I found this cleaner for my project.
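If Python 3 is available, int.to_bytes avoids the extra byte entirely; a minimal sketch:
number = 1195855
data = number.to_bytes(3, byteorder='big')    # b'\x12?O'
back = int.from_bytes(data, byteorder='big')  # 1195855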

How to pack & unpack 64 bits of data?

I have a 64-bit data structure as follows:
HHHHHHHHHHHHHHHHGGGGGGGGGGGGFFFEEEEDDDDCCCCCCCCCCCCBAAAAAAAAAAAA
A: 12 bits (unsigned)
B: 1 bit
C: 12 bits (unsigned)
D: 4 bits (unsigned)
E: 4 bits (unsigned)
F: 3 bits (unsigned)
G: 12 bits (unsigned)
H: 16 bits (unsigned)
Using Python, I am trying to determine which module (preferably native Python 3.x) I should be using. I am looking at BitVector but having trouble figuring some things out.
For ease of use, I want to be able to do something like the following:
# To set the `A` bits, use a mapped mask 'objectId'
bv = BitVector(size=64)
bv['objectId'] = 1
I'm not sure that BitVector actually works the way I want it to. Whatever module I end up implementing, the data structure will be encapsulated in a class that reads and writes to the structure via property getters/setters.
I will also be using constants (or enums) for some of the bit values and it would be convenient to be able to set a mapped mask using something like:
# To set the 'B' bit, use the constant flag to set the mapped mask 'visibility'
bv['visibility'] = PUBLIC
print(bv['visibility']) # output: 1
# To set the 'G' bits, use a mapped mask 'imageId'
bv['imageId'] = 238
Is there a Python module in 3.x that will help me achieve this goal? If BitVector will (or should) work, some helpful hints (e.g. examples) would be appreciated. It seems that BitVector wants to force everything into an 8-bit format, which is not ideal for my application (IMHO).
Based on the recommendation to use bitarray, I have come up with the following implementation with two utility methods:
def test_bitvector_set_block_id_slice(self):
    bv = bitvector(VECTOR_SIZE)
    bv.setall(False)
    print("BitVector[{len}]: {bv}".format(len=bv.length(), bv=bv.to01()))
    print("set block id: current {bid} --> {nbid}".format(
        bid=bv[52:VECTOR_SIZE].to01(),
        nbid=inttobitvector(12, 1).to01()))
    # set blockVector.blockId (last 12 bits)
    bv[52:VECTOR_SIZE] = inttobitvector(12, 1)
    block_id = bv[52:VECTOR_SIZE]
    self.assertTrue(bitvectortoint(block_id) == 1)
    print("BitVector[{len}] set block id: {bin} [{val}]".format(
        len=bv.length(), bin=block_id.to01(), val=bitvectortoint(block_id)))
    print("BitVector[{len}]: {bv}".format(len=bv.length(), bv=bv.to01()))
    print()

# utility methods
def bitvectortoint(bitvec):
    out = 0
    for bit in bitvec:
        out = (out << 1) | bit
    return out

def inttobitvector(size, n):
    bits = "{0:0{size}b}".format(n, size=size)
    print("int[{n}] --> binary: {bits}".format(n=n, bits=bits))
    return bitvector(bits)
The output is as follows:
BitVector[64]: 0000000000000000000000000000000000000000000000000000000000000000
int[1] --> binary: 000000000001
set block id: current 000000000000 --> 000000000001
int[1] --> binary: 000000000001
BitVector[64] set block id: 000000000001 [1]
BitVector[64]: 0000000000000000000000000000000000000000000000000000000000000001
If there are improvements to the utility methods, I am more than willing to take some advice.
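One possible improvement, assuming a recent bitarray release: its bitarray.util module provides ba2int and int2ba, which can replace both helpers:
from bitarray import bitarray
from bitarray.util import ba2int, int2ba

bv = bitarray(64)
bv.setall(False)
bv[52:64] = int2ba(1, length=12)  # set the last 12 bits (the block id)
assert ba2int(bv[52:64]) == 1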

Python bitstring uint seen as long

I have the following:
I read 30 bits from a bitstream:
MMSI = b.readlist('uint:30')
This seems to work normally except when the values get higher.
MMSI = b.readlist('uint:30')
p = 972128254
# repr(MMSI)[:-1]
print p
print "MMSI :"
print MMSI
if MMSI == p:
The code above outputs:
972128254
MMSI :
[972128254L]
The whole if MMSI == p: branch is skipped because the two are not equal for some reason.
I do not understand why; the value is far below maxint:
>>> import sys
>>> sys.maxint
2147483647
I do not understand why I get a long returned and not a uint.
If the value returned is 244123456, it works like a charm.
2147483647 is maxint, but an int is 32 bits and you're using 30 bits. So your max is 1/4 of that, or about 500 million.
Values will be 'long' if an intermediate value was a long. So for example 2**1000 / 2**999 will equal 2L. This is just to do with the internals of the method you called and shouldn't affect most code.
The real problem is that the comparison in your code compares an int to a list, which is not what you want to do. You can either use the read method rather than readlist to return a single item, or take the first element of the returned list: if MMSI[0] == p:
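A minimal sketch of both options (b is the bitstream from the question):
MMSI = b.readlist('uint:30')  # returns a list, e.g. [972128254L]
if MMSI[0] == p:              # compare the element, not the list
    pass

b.pos = 0
MMSI = b.read('uint:30')      # returns a single number
if MMSI == p:
    pass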
Aha. I figured that if I read 30 bits and returned them as a uint value, it would automatically be a 32-bit value.
So what you are saying is that I would have to add leading zeros to make a 32-bit value, and then it should work.
So that is what I have tested, and now I am lost.
I figured I'd encode the value I want to compare against in the same way. So this is what I did:
from bitarray import bitarray
from bitstring import BitArray, BitStream,pack
from time import sleep
import string
MMSI = b.readlist('uint:30')
x = pack('uint:30',972000000)
p = x.readlist('uint:30')
y = pack('uint:30',972999999)
q = y.read('uint:30')
print p
print q
print x
print y
print MMSI
resulting in:
p = [972000000L]
q = 972999999
x = 0b111001111011111000101100000000
y = 0b111001111111101100110100111111
MMSI = [972128254L]
How can it be that the higher value 972999999 is not a long?
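This matches the point above: the long-ness comes from the method's internals, not from the magnitude. Here p went through readlist while q came from read. A sketch that isolates the difference (same bitstring version as in the question assumed):
from bitstring import pack

x = pack('uint:30', 972000000)
print x.readlist('uint:30')  # [972000000L] -- the list element is a long
x.pos = 0
print x.read('uint:30')      # 972000000   -- a plain int, same magnitude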

How can I store a list of numbers inside a number in Python?

I need to save a tuple of 4 numbers inside a column that only accepts numbers (ints or floats).
I have a list of 4 numbers like -0.0123445552, -29394.2393339, 0.299393333, 0.00002345556.
How can I "store" all these numbers inside a single number and be able to retrieve the original tuple in Python?
Thanks
Following up on @YevgenYampolskiy's idea of using numpy:
You could use numpy to convert the numbers to 16-bit floats, and then view the array as one 64-bit int:
import numpy as np
data = np.array((-0.0123445552, -29394.2393339, 0.299393333, 0.00002345556))
stored_int = data.astype('float16').view('int64')[0]
print(stored_int)
# 110959187158999634
recovered = np.array([stored_int], dtype='int64').view('float16')
print(recovered)
# [ -1.23443604e-02 -2.93920000e+04 2.99316406e-01 2.34842300e-05]
Note: This requires numpy version 1.6 or better, as this was the first version to support 16-bit floats.
If by int you mean the Python int datatype (which is unbounded in current versions), you may use the following solution:
>>> x
(-0.0123445552, -29394.2393339, 0.299393333, 2.345556e-05)
>>> def encode(data):
...     sz_data = str(data)
...     import base64
...     b64_data = base64.b16encode(sz_data)
...     int_data = int(b64_data, 16)
...     return int_data
...
>>> encode(x)
7475673073900173755504583442986834619410853148159171975880377161427327210207077083318036472388282266880288275998775936614297529315947984169L
>>> def decode(data):
...     int_data = data
...     import base64
...     hex_data = hex(int_data)[2:].upper()
...     if hex_data[-1] == 'L':
...         hex_data = hex_data[:-1]
...     b64_data = base64.b16decode(hex_data)
...     import ast
...     sz_data = ast.literal_eval(b64_data)
...     return sz_data
...
>>> decode(encode(x))
(-0.0123445552, -29394.2393339, 0.299393333, 2.345556e-05)
You can combine 4 integers into a single integer, or two floats into a double, using the struct module:
from struct import *
s = pack('hhhh', 1, -2, 3, -4)
i = unpack('Q', s)
print i
print unpack('hhhh', s)
s = pack('ff', 1.12, -2.32)
f = unpack('d', s)
print f
print unpack('ff', pack('d', f[0]))
prints
(18445618190982447105L,)
(1, -2, 3, -4)
(-5.119999879002571,)
(1.1200000047683716, -2.319999933242798)
Basically, in this example the tuple (1, -2, 3, -4) gets packed into the integer 18445618190982447105, and the tuple (1.12, -2.32) gets packed into -5.119999879002571.
To pack 4 floats into a single float you will need half-floats, but that is a problem here: there is no native half-float support in Python as of now:
http://bugs.python.org/issue11734
However, the numpy module does have some support for half-floats (http://docs.scipy.org/doc/numpy/user/basics.types.html). Maybe you can use it to pack 4 floats into a single float; a sketch follows.
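A numpy-based sketch of that idea, viewing four half-floats as one 64-bit double (beware: some bit patterns read back as NaN, and a database may not round-trip them exactly):
import numpy as np

vals = np.array([1.5, -2.25, 0.5, 3.0], dtype='float16')
as_double = vals.view('float64')[0]           # one Python float carrying all 64 bits
back = np.array([as_double]).view('float16')  # recover the four half-floats
print(back)                                   # [ 1.5  -2.25  0.5   3.  ]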
This does not really answer your question, but what you're trying to do violates 1NF. Is changing the DB schema to introduce an intersection table really not an option?
My idea is weird, but will it work?
In [31]: nk = "-0.0123445552, -29394.2393339, 0.299393333, 0.00002345556"

In [32]: nk1 = "".join(str(ord(x)) for x in nk)

In [33]: nk1
Out[33]: '454846484950515252535353504432455057515752465051575151515744324846505757515751515151443248464848484850515253535354'

In [34]: import math

In [35]: math.log(long(nk1), 1000)
Out[35]: 37.885954947611985

In [36]: math.pow(1000, _)
Out[36]: 4.548464849505043e+113
You can easily unpack this string (Out[33]); for example, split it at '32', which is the code for the space separator. A fuller decoding sketch is below.
This string is also very long; we can reduce it to a small number with math.log, as in Out[35] (though the log is a lossy float, so the exact digits cannot be recovered from it alone).
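For instance, using nk1 from In[32] above (a sketch that relies on every character in nk having a two-digit ord code, including the space at 32):
decoded = ''.join(chr(int(nk1[i:i+2])) for i in range(0, len(nk1), 2))
# decoded == "-0.0123445552, -29394.2393339, 0.299393333, 0.00002345556"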

Read 32-bit signed value from an "unsigned" bytestream

I want to extract data from a file whose information is stored big-endian and always unsigned. How does the "cast" from unsigned int to int affect the actual decimal value? Am I correct that the leftmost bit decides whether the value is positive or negative?
I want to parse that file-format with python, and reading and unsigned value is easy:
def toU32(bits):
    return ord(bits[0]) << 24 | ord(bits[1]) << 16 | ord(bits[2]) << 8 | ord(bits[3])
but what would the corresponding toS32 function look like?
Thanks for the info about the struct module, but I am still interested in the answer to my actual question.
I would use struct.
import struct

def toU32(bits):
    return struct.unpack_from(">I", bits)[0]

def toS32(bits):
    return struct.unpack_from(">i", bits)[0]
The format string ">I" means: read a big-endian (">") unsigned integer ("I") from the string bits. For signed integers, you can use ">i" instead.
EDIT
I had to look at another Stack Overflow answer to remember how to "convert" an unsigned integer to a signed one in Python. Though it is less of a conversion and more of a reinterpretation of the bits.
import struct

def toU32(bits):
    return ord(bits[0]) << 24 | ord(bits[1]) << 16 | ord(bits[2]) << 8 | ord(bits[3])

def toS32(bits):
    candidate = toU32(bits)
    if candidate >> 31:  # is the sign bit set?
        return -0x80000000 + (candidate & 0x7fffffff)  # "cast" it to signed
    return candidate

for x in range(-5, 5):
    bits = struct.pack(">i", x)
    print toU32(bits)
    print toS32(bits)
I would use the struct module's pack and unpack methods.
See Endianness of integers in Python for some examples.
The non-conditional version of toS32(bits) could be something like:
def toS32(bits):
    decoded = toU32(bits)
    return -(decoded & 0x80000000) + (decoded & 0x7fffffff)
You can pre-compute the mask for any other bit size too of course.
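A parameterized sketch of that (the to_signed name is mine):
def to_signed(value, bits):
    # reinterpret an unsigned two's-complement pattern of width `bits` as signed
    sign = 1 << (bits - 1)
    return -(value & sign) + (value & (sign - 1))

assert to_signed(0xFFFFFFFF, 32) == -1
assert to_signed(0x80000000, 32) == -2147483648
assert to_signed(0x7FFFFFFF, 32) == 2147483647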
