I need to store and retrieve some data that can be described by a known format string. The data must be 16-bit aligned, because the storage device does not support any other alignment. To store the data efficiently, I am packing it in a known format using struct.pack. How should I align the data to 16 bits so that storing and retrieving doesn't corrupt it?
For example:
data = [12,b'c', 100009, b"string", 3.45]
stringformat of data = "icl6sd"
packed data = b'\x0c\x00\x00\x00c\x00\x00\x00\xa9\x86\x01\x00\x00\x00\x00\x00string\x00\x00\x9a\x99\x99\x99\x99\x99\x0b@'
How do I convert this data to be 16 bit aligned?
struct.pack knows all about C-compiler byte-alignment and will add padding bytes to your data without being asked. That is why your structure is 32 bytes long, even though your data calls for 4 + 1 + 4 + 6 + 8 (=23) bytes of data. Look closely and you will see (for example) that string has been padded out on the right with two extra null bytes so that the double will occupy the last 8 bytes of the structure. So I'm not convinced that the alignment is wrong.
You can take charge of the padding yourself (format x) but I suggest that struct.pack knows exactly what it is doing.
If your packed data isn't being interpreted as you expect, it is much more likely that there is a problem with the struct specification or that the byte order is not what you expect.
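As a rough illustration of taking charge of the padding yourself, the sketch below compares native mode with standard mode and adds one explicit pad byte with format x. It assumes that "16 bit aligned" means the record length must be a multiple of two bytes; if every field has to start on an even offset, more x pad bytes would be needed between the fields.

```python
import struct

values = (12, b"c", 100009, b"string", 3.45)

# Native mode ("@", the default) inserts compiler-style padding,
# so its size depends on the platform.
native_size = struct.calcsize("@icl6sd")

# Standard mode ("<" = little-endian) uses fixed sizes and no implicit padding.
compact = struct.pack("<icl6sd", *values)

# A trailing "x" adds one explicit pad byte, giving an even record length.
aligned = struct.pack("<icl6sdx", *values)

print(native_size, len(compact), len(aligned))

# Pad bytes are skipped on unpacking, so the same format reads the record back.
unpacked = struct.unpack("<icl6sdx", aligned)
print(unpacked)
```

Using an explicit byte-order prefix also pins down the byte order, which removes one of the likely causes of misread data mentioned above.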
I have this code which uses zlib to compress a binary string:
import zlib
f = open("compressed.bin", "wb")
x = zlib.compress(b"Some text to be compressed")
f.write(x)
f.close()
The data always gets compressed with fixed Huffman codes, regardless of the input string, whereas I want the data to be compressed with dynamic Huffman codes. I even tried using a string with several repetitions (such as 100 'A's, followed by 100 'B's, followed by 100 'A's, and so on) but it also ends up compressed with fixed codes.
Edit: the result is the same if I specify level=9.
How do I specify that the data should be compressed with dynamic codes regardless of which method achieves better compression?
Give it more data. At least a few tens of kilobytes.
Fixed coding is only used when a fixed block would be smaller than a dynamic block for the same data. That is generally only the case when a small amount of data is being compressed, or for the last deflate block where only a small amount of data remains.
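One way to check which kind of block was produced is to read the deflate block header directly. The sketch below assumes a plain zlib stream with a 2-byte header and no preset dictionary, which is what zlib.compress() emits; the helper name first_block_type and the sample input are made up for illustration.

```python
import zlib

BLOCK_TYPES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman"}

def first_block_type(stream):
    """Return the BTYPE field of the first deflate block in a zlib stream."""
    first = stream[2]            # byte following the 2-byte zlib header
    return (first >> 1) & 0b11   # bit 0 is BFINAL, bits 1-2 are BTYPE

small = zlib.compress(b"Some text to be compressed")
print(BLOCK_TYPES.get(first_block_type(small), "reserved"))

# Hypothetical larger input: roughly 60 KB of text-like data.
large_input = b"".join(b"line %d of some sample text\n" % i for i in range(2000))
large = zlib.compress(large_input)
print(BLOCK_TYPES.get(first_block_type(large), "reserved"))
```

Only the first block is inspected here; a long stream may contain several blocks of different types.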
So I did a stupid thing, and forgot to explicitly type-convert some values I was putting into an SQLite database (using Python's SQLAlchemy). The column was set up to store an INT, whereas the input was actually a numpy.int64 dtype.
The values I am getting back out of the database look like:
b'\x15\x00\x00\x00\x00\x00\x00\x00'
It seems that SQLite has gone and stored the binary representation for these values, rather than the integer itself.
Is there a way to decode these values in Python, or am I stuck with loading all my data again (not a trivial exercise at this point)?
You can use struct.unpack():
>>> import struct
>>> value = struct.unpack('<q', b'\x15\x00\x00\x00\x00\x00\x00\x00')
>>> value
(21,)
>>> value[0]
21
That assumes that the data was stored little endian as specified by the < in the unpack() format string, and that it is a signed "long long" (8 bytes) as specified by the q. If the data is big endian:
>>> struct.unpack('>q', b'\x15\x00\x00\x00\x00\x00\x00\x00')
(1513209474796486656,)
I imagine that little endian is more likely to be correct in this case.
P.S. I have just confirmed that when a numpy.int64 is inserted into a SQLite int field it can be retrieved using struct.unpack() as shown above.
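On Python 3, int.from_bytes() does the same job without a format string. The following is a small sketch under the same assumptions as above (little endian, signed, 8 bytes); the helper name decode_int64_blob and the sample rows are hypothetical.

```python
def decode_int64_blob(blob):
    """Interpret an 8-byte blob as a signed little-endian integer."""
    return int.from_bytes(blob, byteorder="little", signed=True)

# Sample values as they might come back from the database.
rows = [b"\x15\x00\x00\x00\x00\x00\x00\x00", b"\xff\xff\xff\xff\xff\xff\xff\xff", 7]

# Leave values that were stored as real integers untouched.
decoded = [decode_int64_blob(v) if isinstance(v, bytes) else v for v in rows]
print(decoded)  # [21, -1, 7]
```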
Just as a preamble I am using python 3 and the bitstring library.
So Arinc429 words are 32 bit data words.
Bits 1-8 are used to store the label. Say for example I want the word to set the latitude, according to the label docs, set latitude is set to the octal
041
I can model this in python by doing:
label = BitArray(oct='041')
print(label.bin)
>> 000100001
The next two bits can be used to send a source, or extend the label by giving an equipment ID. Equipment IDs are given in hex, the one I wish to use is
002
So again, I add it to a new BitArray object and convert it to binary
>> 000000010
Next comes the data field which spans from bits 11-29. Say I want to set the latitude to the general area of London (51.5072). This is where I'm getting stuck as floats can only be 32/64 bits long.
There are 2 other parts of the word, but before I go there I am just wondering, if I am going along the right track, or way off how you would construct such a word?
Thanks.
I think you're on the right track, but you need to either know or decide the format for your data field.
If the 19 bits you want to represent a float are documented somewhere as being a float then look how that conversion is done (as that's not at all a standard number of bits for a floating point number). If those bits are free-form and you can choose both the encode and decode then just pick something appropriate.
There is a standard for 16-bit floats which is occasionally used, but if you only want to represent a latitude I'd go for something simpler. As it can only go from 0 to 360, just scale that to an integer from 0 to 2^19 and store the integer.
So 51.5072 becomes (51.5072/360*(2**19)) = 75012
Then store this as an unsigned integer
> latitude = BitArray(uint=75012, length=19)
This gives you a resolution of about 0.0007 degrees, which is the best you can hope for. To convert back:
> latitude.uint*360.0/2**19
51.50665283203125
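The same scaling can be sketched in plain Python without bitstring. The names FIELD_BITS, FULL_SCALE, encode_angle and decode_angle are invented for this example, and the fractional part is truncated to match the figure of 75012 above; this is a free-form encoding, not necessarily the one the label documentation prescribes.

```python
FIELD_BITS = 19      # width of the data field in bits
FULL_SCALE = 360.0   # range of the angle in degrees

def encode_angle(degrees):
    """Scale an angle in [0, 360) to an unsigned FIELD_BITS-bit integer."""
    return int(degrees / FULL_SCALE * 2 ** FIELD_BITS)

def decode_angle(raw):
    """Convert the stored integer back to degrees."""
    return raw * FULL_SCALE / 2 ** FIELD_BITS

raw = encode_angle(51.5072)
bits = format(raw, "019b")   # zero-padded 19-character binary string

print(raw)                   # 75012
print(bits)
print(decode_angle(raw))     # 51.50665283203125
```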
I am trying to delta compress a list of pixels and store them in a binary file. I have managed to do this however the method I found takes ~4 minutes a frame.
def getByte_List(self):
    values = BitArray("")
    for I in range(len(self.delta_values)):
        temp = Bits(int= self.delta_values[I], length=self.num_bits_pixel)
        values.append(temp)
    ##start_time = time.time()
    bit_stream = pack("uint:16, uint:5, bits", self.intial_value, self.num_bits_pixel, values)
    ##end_time = time.time()
    ##print(end_time - start_time)
    # Make sure that the list of bits contains a multiple of 8 values
    if (len(bit_stream) % 8):
        bit_stream.append(Bits(uint=0, length = (8-(len(bit_stream) % 8)))) #####Append? On a pack? (Only work on bitarray? bit_stream = BitArray("")
    # Create a list of unsigned integer values to represent each byte in the stream
    fmt = (len(bit_stream)//8) * ["uint:8"]
    return bit_stream.unpack(fmt)
This is my code. I take the initial value, the number of bits per pixel and the delta values and convert them into bits. I then byte-align the result, take the integer representation of each byte and use it elsewhere. The problem areas are where I convert each delta value to bits (3 min) and where I pack (1 min). Is it possible to do this faster, or is there another way to pack the values straight into integers representing bytes?
From a quick Google of the classes you're instantiating, it looks like you're using the bitstring module. This is written in pure Python, so it's not a big surprise that it's pretty slow. You might look at one or more of the following:
struct - a module that comes with Python that will let you pack and unpack C structures into constituent values
bytearray - a built-in type that lets you accumulate, well, an array of bytes, and has both list-like and string-like operations
bin(x), int(x, 2) - conversion of numbers to a binary representation as a string, and back - string manipulations can sometimes be a reasonably efficient way to do this
bitarray - native (C) module for bit manipulation, looks like it has similar functionality to bitstring but should be much faster. It is available as source suitable for compiling on Linux, or pre-compiled for Windows.
numpy - fast manipulation of arrays of various types including single bytes. Kind of the go-to module for this sort of thing, frankly. http://www.numpy.org/
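As a rough sketch of the string-manipulation route from the list above, the function below builds the whole bit stream as text and converts it to bytes in one step. It assumes Python 3 (for int.to_bytes), the same layout as the question (16-bit initial value, 5-bit width, then signed deltas, zero-padded to a byte boundary), and that every delta fits in num_bits; the name pack_deltas is made up for this example.

```python
def pack_deltas(initial_value, num_bits, deltas):
    """Return the packed stream as a list of byte values (0-255)."""
    mask = (1 << num_bits) - 1
    parts = [format(initial_value, "016b"), format(num_bits, "05b")]
    # Masking turns a negative delta into its two's-complement bit pattern.
    parts.extend(format(d & mask, "0%db" % num_bits) for d in deltas)
    bits = "".join(parts)
    # Pad on the right with zeros up to a whole number of bytes.
    bits += "0" * (-len(bits) % 8)
    return list(int(bits, 2).to_bytes(len(bits) // 8, "big"))

print(pack_deltas(1, 3, [1, -1]))  # [0, 1, 25, 224]
```

Whether this is fast enough depends on the frame size; for very large frames numpy is likely to be the better choice.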
I'm reading in a binary file into a list and parsing the binary data. I'm using unpack() to extract certain parts of the data as primitive data types, and I want to edit that data and insert it back into the original list of bytes. Using pack_into() would make it easy, except that I'm using Python 2.4, and pack_into() wasn't introduced until 2.5
Does anyone know of a good way to go about serializing the data this way so that I can accomplish essentially the same functionality as pack_into()?
Have you looked at the bitstring module? It's designed to make the construction, parsing and modification of binary data easier than using the struct and array modules directly.
It's especially made for working at the bit level, but will work with bytes just as well. It will also work with Python 2.4.
from bitstring import BitString
s = BitString(filename='somefile')
# replace byte range with new values
# The step of '8' signifies byte rather than bit indices.
s[10:15:8] = '0x001122'
# Search and replace byte value with two bytes
s.replace('0xcc', '0xddee', bytealigned=True)
# Different interpretations of the data are available through properties
if s[5:7:8].int > 1000:
s[5:7:8] = 1000
# Use the bytes property to get back to a Python string
open('newfile', 'wb').write(s.bytes)
The underlying data stored in the BitString is just an array object, but with a comprehensive set of functions and special methods to make it simple to modify and interpret.
Do you mean editing data in a buffer object? Documentation on manipulating those at all from Python directly is fairly scarce.
If you just want to edit bytes in a string, it's simple enough, though; struct.pack_into is new to 2.5, but struct.pack isn't:
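import struct
# Read the whole file as a byte string (binary mode avoids newline translation)
s = open("file", "rb").read()
ofs = 1024   # byte offset of the record to edit
fmt = "Ih"   # an unsigned int followed by a short, native alignment
size = struct.calcsize(fmt)
# Split the string around the record
before, data, after = s[0:ofs], s[ofs:ofs+size], s[ofs+size:]
values = list(struct.unpack(fmt, data))
values[0] += 5
values[1] //= 2   # integer division, so the value can be repacked as a short
# Repack the edited values and stitch the string back together
data = struct.pack(fmt, *values)
s = "".join([before, data, after])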
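If the data lives in a mutable buffer rather than a string, the same idea can be wrapped in a small helper that writes the packed bytes in place. This is only a sketch: pack_into_compat is a made-up name, the example is written for Python 3, and whether array slice assignment behaves identically on an interpreter as old as 2.4 is an assumption that has not been checked here.

```python
import array
import struct

def pack_into_compat(fmt, buf, offset, *values):
    """Pack values and overwrite buf (an array of unsigned bytes) at offset."""
    packed = struct.pack(fmt, *values)
    # Replace exactly len(packed) bytes, so the buffer keeps its length.
    buf[offset:offset + len(packed)] = array.array("B", packed)

buf = array.array("B", b"\x00" * 8)
pack_into_compat("<Ih", buf, 1, 5, -2)
print(buf.tobytes())  # b'\x00\x05\x00\x00\x00\xfe\xff\x00'
```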