Converting data to binary using struct in Python

I have the following dict which I want to write to a file in binary:
data = {(7, 190, 0): {0: 0, 1: 101, 2: 7, 3: 0, 4: 0},
(7, 189, 0): {0: 10, 1: 132, 2: 17, 3: 20, 4: 40}}
I went ahead to use the struct module in this way:
import struct
packed = []
for ssd, add_val in data.iteritems():
    # I am trying to use 0xcafe as a marker to tell me where to grab the keys
    pack_ssd = struct.pack('HBHB', 0xcafe, *ssd)
    packed.append(pack_ssd)
    for add, val in data[ssd].iteritems():
        pack_add_val = struct.pack('HH', add, val)
        packed.append(pack_add_val)
The output of this is packed = ['\xfe\xca\x07\x00\xbe\x00\x00', '\x00\x00\x00\x00', '\x01\x00e\x00', '\x02\x00\x07\x00', '\x03\x00\x00\x00', '\x04\x00\x00\x00', '\xfe\xca\x07\x00\xbd\x00\x00', '\x00\x00\n\x00', '\x01\x00\x84\x00', '\x02\x00\x11\x00', '\x03\x00\x14\x00', '\x04\x00(\x00']
After which I write this as a binary file :
ifile = open('test.bin', 'wb')
for pack in packed:
    ifile.write(pack)
Here is what the binary file looks like:
'\xfe\xca\x07\x00\xbe\x00\x00\x00\x00\x00\x00\x01\x00e\x00\x02\x00\x07\x00\x03\x00\x00\x00\x04\x00\x00\x00\xfe\xca\x07\x00\xbd\x00\x00\x00\x00\n\x00\x01\x00\x84\x00\x02\x00\x11\x00\x03\x00\x14\x00\x04\x00(\x00'
It was all OK until I tried to unpack the data. Now I want to read the contents of the binary file and arrange it back into how my dict looked in the first place. This is how I tried to unpack it, but I always got an error:
unpack = []
while True:
    chunk = ifile.read(log_size)
    if len(chunk) == log_size:
        str = struct.unpack('HBHB', chunk)
        unpack.append(str)
        chunk = ifile.read(log1_size)
        str = struct.unpack('HH', chunk)
        unpack.append(str)
Traceback (most recent call last):
  File "<interactive input>", line 7, in ?
error: unpack str size does not match format
I realize the method I tried to unpack will always run into problems, but I can't seem to find a good way in unpacking the contents of the binary file. Any help is much appreciated..

If you need to write something custom, I would suggest doing the following:
1) 64-bit integer: number of keys
2) 64-bit integer * 3 * number of keys: key tuple data
for i in number of keys:
    3i) 64-bit integer: number of keys for dictionary i
    4i) 64-bit integer * 2 * number of keys for i: key data, value data, key data, value data...
After that, just make sure you read and write with the same endianness, and that an invalid length at any point (too high or too low) doesn't crash your program, and you are good.
The idea is that in any state the unpacker is expecting either a length or a known amount of data, so it is 100% unambiguous where everything starts and ends as long as you follow the format.
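The layout above can be sketched as follows. This is a minimal sketch, assuming Python 3, little-endian unsigned 64-bit integers ("<Q"), and non-negative integer keys and values. The names dump_nested and load_nested are hypothetical helpers made up for illustration, not part of the struct module.

```python
import struct

def dump_nested(data):
    """Serialize {(a, b, c): {k: v}} using the length-prefixed layout."""
    out = [struct.pack("<Q", len(data))]           # 1) number of keys
    for key in data:
        out.append(struct.pack("<3Q", *key))       # 2) key tuple data
    for key in data:
        inner = data[key]
        out.append(struct.pack("<Q", len(inner)))  # 3i) entries in dictionary i
        for k, v in inner.items():
            out.append(struct.pack("<2Q", k, v))   # 4i) key, value pairs
    return b"".join(out)

def load_nested(blob):
    """Inverse of dump_nested; raises ValueError on truncated input."""
    offset = 0

    def read(fmt):
        nonlocal offset
        size = struct.calcsize(fmt)
        if offset + size > len(blob):
            # A bad length field ends up here instead of crashing later.
            raise ValueError("truncated or corrupt input")
        values = struct.unpack_from(fmt, blob, offset)
        offset += size
        return values

    (n_keys,) = read("<Q")
    keys = [read("<3Q") for _ in range(n_keys)]
    result = {}
    for key in keys:
        (n_items,) = read("<Q")
        result[key] = dict(read("<2Q") for _ in range(n_items))
    return result

data = {(7, 190, 0): {0: 0, 1: 101, 2: 7, 3: 0, 4: 0},
        (7, 189, 0): {0: 10, 1: 132, 2: 17, 3: 20, 4: 40}}
blob = dump_nested(data)
restored = load_nested(blob)
```

Because every read is either a count or a fixed-size record, the reader never needs a marker such as 0xcafe to find where a key starts.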

Related

struct.pack_into requires more bytes than specified in format

Python struct.pack_into with format char 'x' requires more bytes.
I am trying to learn about Python byte arrays so that I can write my own IP, TCP, and UDP headers. I use the struct module to pack and unpack binary data into the types specified by the format string.
ba2 = bytearray(2)
print(ba2, "The size: ", ba2.__len__())
struct.pack_into(">bx", ba2, 1, 1)
print(struct.unpack(">bx", ba2))
Now when I try to pack into a buffer of length 2 with ">bx" as format, according to above code, I get the error:
bytearray(b'\x00\x00') The size: 2
Traceback (most recent call last):
  File "D:/User/Documents/Python/Network/Main.py", line 58, in <module>
    bitoperations_bytes_bytearrays_test()
  File "D:/User/Documents/Python/Network/Main.py", line 49, in bitoperations_bytes_bytearrays_test
    struct.pack_into(">bx", ba2, 1, 1)
struct.error: pack_into requires a buffer of at least 2 bytes
but I have a byte array of 2 bytes.
What am I doing wrong?
And please point me to the relevant documentation if I have missed it (I have read the Python docs, but may have overlooked it).
Edit:
Sorry if I was unclear, but I just want to change the second byte in the byte array, hence the 'x' pad in the format.
Silly of me: the fix is simply to drop the 'x' from the format, like this:
struct.pack_into(">b", ba2, 1, 1)
and the packing comes out right, with this output:
bytearray(b'\x00\x00') The size: 2
A pack with one byte shift: 0001
(0, 1)
The third argument to pack_into() is mandatory: it is the offset into the target buffer (refer to https://docs.python.org/2/library/struct.html). In your call the offset is 1 and the only value packed is 1. The format ">bx" is two bytes long (one signed byte plus one pad byte), so writing it at offset 1 needs a buffer of at least three bytes. The following code packs two bytes starting at offset 0 instead:
import struct
ba2 = bytearray(2)
print(ba2, "The size: ", ba2.__len__())
struct.pack_into("bb", ba2, 0, 1, 1)
print(struct.unpack("bb", ba2))
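To see where the size mismatch comes from, here is a small sketch, assuming Python 3. It checks the format size with struct.calcsize, shows that the original call fails, and then writes only the second byte. The variable names fmt_size, failed and result are made up for illustration.

```python
import struct

ba2 = bytearray(2)

# ">bx" is one signed byte plus one pad byte, so it occupies 2 bytes.
fmt_size = struct.calcsize(">bx")

# Writing 2 bytes at offset 1 would need a 3-byte buffer, so this fails.
try:
    struct.pack_into(">bx", ba2, 1, 1)
    failed = False
except struct.error:
    failed = True

# Packing a single byte at offset 1 changes only the second byte.
struct.pack_into(">b", ba2, 1, 1)
result = struct.unpack(">bb", ba2)
print(fmt_size, failed, result)
```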

2's complement of a byte in Python

I am working on the following python code:
import wave
from bitstring import BitArray
w = wave.open('file.wav','rb')
totalFrames = w.getnframes() #Total number of samples
bytesData = w.readframes(totalFrames)
binData = BitArray(bytesData)
bin2Data = (binData.bin)
The file.wav has 88200 samples at a sampling rate of 44.1 kHz.
My goal is to get the 2's complement of the binary data I obtain from file.wav. 'binData.bin' gives me the binary form of the bytes (\x00\x00N\x00n\xff..) obtained through w.readframes, but as a string.
I was using this to obtain the 2's complement:
2comp = ~(bin2Data) + 0b1
but in vain. It would show the following error:
Traceback (most recent call last):
File "speaker_bin.py", line 16, in <module>
bin2Data = ~((binData.bin)) + 0b1
TypeError: bad operand type for unary ~: 'str'
I tried int(bin2Data) to convert it but it would not work (It would not print anything at all. I guess because of the size of the data.)
What am I doing wrong?
I would really appreciate any feedback. (even a simple nudge in the right direction)
You need to use
int(binData.bin, 2)
int() takes the base as an optional second argument. The default is base 10, which is why a string of binary digits has to be converted with base 2.
The 0b1 literal is already an int, so it needs no conversion.
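Putting the conversion together, a fixed-width two's complement of a bit string might look like the sketch below. It assumes Python 3 and that the width is the length of the string; the sample width of file.wav is not stated in the question, so the string would need to be split per sample first. The function name twos_complement is hypothetical.

```python
def twos_complement(bits):
    """Return the two's complement of a bit string, keeping its width."""
    width = len(bits)
    value = int(bits, 2)            # parse the string as base 2
    mask = (1 << width) - 1         # keep only `width` bits of the result
    negated = (~value + 1) & mask   # invert, add one, truncate
    return format(negated, "0{}b".format(width))

print(twos_complement("00000001"))
print(twos_complement("01001110"))
```

The mask matters because Python integers are unbounded: without it, ~value + 1 is simply the negative number -value rather than a fixed-width bit pattern.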

Writing to binary file as int in python

I'm working on a project where the output size is very important. As my outputs are numbers between 0 and 100, I'm trying to write them as bytes (or unsigned chars).
However, I'm getting errors when trying to read them.
Here is a simple example:
from numpy import ones
test_filename='test.b'
g=(3*ones(shape=[1000])).astype('c')
g.tofile(test_filename)
with open(test_filename, "rb") as f:
    bytes = f.read(1)
    num = int(bytes.encode('hex'), 1)
    print num
Here is the error I get; somehow the bytes.encode call expects a binary string or something of that sort (I am not sure, of course):
ValueError Traceback (most recent call last)
<ipython-input-43-310a447041fe> in <module>()
----> 1 num = int(bytes.encode('hex'), 1)
2 print num
ValueError: int() base must be >= 2 and <= 36
I should state that I would later need to read the output files in C++.
Thanks in advance,
Gil
The details depend on the version of Python you are using.
If it is Python 2, which I assume because of the print statement, the main problem is that read() returns a str, so if the stored value is, say, 50, printing it shows the ASCII character '2'. You need to tell Python to interpret those bits as an int rather than a str, and a simple cast does not do that.
I personally would use the struct package and do the following:
import struct
with open(test_filename, "rb") as f:
    bytes = f.read(1)
    num = struct.unpack("B", bytes)[0]
    print num
Another option would be to encode the string to hex and parse it as a hex string (which looks like what you are trying to do):
num = int(bytes.encode("hex_codec"), 16)
print num
One final option would be to put the string in a bytearray and pull the first byte:
num = bytearray(bytes)[0]
print num
If you are actually using Python 3, this is simpler because you get back a bytes object (in that case don't name a variable bytes, which is very confusing). With a bytes object you can just index the first element, which comes out as an int:
num = bytes[0]
print(num)
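For completeness, here is a stdlib-only sketch of the whole round trip, assuming Python 3. It does not use NumPy, and the temporary path and the sample values are made up for illustration. Each value between 0 and 100 takes exactly one byte on disk, which a C++ reader could in principle load as unsigned char.

```python
import os
import struct
import tempfile

values = [3, 50, 100]  # hypothetical sample outputs, each in 0..255

path = os.path.join(tempfile.mkdtemp(), "test.b")
with open(path, "wb") as f:
    f.write(bytes(values))           # one unsigned byte per value

with open(path, "rb") as f:
    raw = f.read()

first = struct.unpack("B", raw[:1])[0]   # explicit unpack of one byte
all_values = list(raw)                   # indexing bytes yields ints
print(first, all_values)
```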

python conversion int into arbitrary number of bytes

I am facing a little corner case of the famous struct.pack.
The situation is the following: I have a DLL with a thin Python wrapper layer. One of the Python methods in the wrapper accepts a byte array as an argument. This byte array is the representation of a register on a specific hardware bus. Each bus has a different register width, typically 8, 16 or 24 bits (alignment is the same in all cases).
When calling this method I need to convert my value (whatever that is) to a byte array of 8, 16 or 24 bits. Such a conversion is relatively easy for 8 or 16 bits using struct.pack:
byteList = struct.pack( '>B', regValue ) # For 8 bits case
byteList = struct.pack( '>H', regValue ) # for 16 bits case
I am now looking to make it flexible enough for all three cases (8, 16 and 24 bits). I could use a mix of the two previous lines to handle the three cases, but I find it quite ugly.
I was hoping this would work:
packformat = ">{0}B".format(regSize)
byteList = struct.pack( packformat, regValue )
But that is not the case, as struct.pack expects one argument per format item.
Any idea how I can (neatly) convert my register value into an arbitrary number of bytes?
You are always packing unsigned integers, and only big endian to boot. Take a look at what happens when you pack them:
>>> import struct
>>> struct.pack('>B', 255)
'\xff'
>>> struct.pack('>H', 255)
'\x00\xff'
>>> struct.pack('>I', 255)
'\x00\x00\x00\xff'
Essentially the value is padded with null bytes at the start. Use this to your advantage:
>>> struct.pack('>I', 255)[-3:]
'\x00\x00\xff'
>>> struct.pack('>I', 255)[-2:]
'\x00\xff'
>>> struct.pack('>I', 255)[-1:]
'\xff'
You won't get an exception now, if your value is too large, but it would simplify your code enormously. You can always add a separate validation step:
def packRegister(value, size):
    if value < 0 or value.bit_length() > size:
        raise ValueError("Value won't fit in register of size {} bits".format(size))
    return struct.pack('>I', value)[-(size // 8):]
Demo:
>>> packRegister(255, 8)
'\xff'
>>> packRegister(1023, 16)
'\x03\xff'
>>> packRegister(324353, 24)
'\x04\xf3\x01'
>>> packRegister(324353, 8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in packRegister
ValueError: Value won't fit in register of size 8 bits
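On Python 3 there is an alternative to slicing the struct.pack output: the built-in int.to_bytes method. The sketch below swaps that technique in for the one above. The name pack_register is hypothetical, and the function assumes the register width is a whole number of bytes.

```python
def pack_register(value, size_bits):
    """Pack an unsigned value into size_bits // 8 big-endian bytes."""
    if size_bits % 8:
        raise ValueError("Register size must be a multiple of 8 bits")
    try:
        # to_bytes raises OverflowError for negative or oversized values.
        return value.to_bytes(size_bits // 8, byteorder="big")
    except OverflowError:
        raise ValueError(
            "Value won't fit in register of size {} bits".format(size_bits))

print(pack_register(255, 8))
print(pack_register(1023, 16))
print(pack_register(324353, 24))
```

This keeps the same behaviour as the demo above for the 8, 16 and 24 bit cases, and it is not limited to the four bytes that '>I' provides.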

py2neo rel() list indices must be integer not float

I'm trying to import nodes into Neo4j in a batch. But when I try to execute it, it throws an error: List indices must be integers, not float. I don't really understand which list items are meant; I do have floats, but these are cast to strings...
Partial code:
graph_db = neo4j.GraphDatabaseService("http://127.0.0.1:7474/db/data/")
batch = neo4j.WriteBatch(graph_db)
for ngram, one_grams in data.items():
    ngram_rank = int(one_grams['_rank'])
    ngram_prob = '%.16f' % float(one_grams['_prob'])
    ngram_id = 'a'+str(n)
    ngram_node = batch.create(node({"word": ngram, "rank": str(ngram_rank), "prob": str(ngram_prob)}))
    for one_gram, two_grams in one_grams.items():
        one_rank = int(two_grams['_rank'])
        one_prob = '%.16f' % float(two_grams['_prob'])
        one_node = batch.create(node({"word": one_gram, "rank": str(one_rank), "prob": one_prob}))
        batch.create(rel((ngram_node, "FOLLOWED_BY", one_node))) #line 81 throwing error
results = batch.submit()
Full traceback
Traceback (most recent call last):
  File "Ngram_neo4j.py", line 81, in probability_items
    batch.create(rel((ngram_node, "FOLLOWED_BY", one_node)))
  File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2692, in create
    uri = self._uri_for(entity.start_node, "relationships")
  File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2537, in _uri_for
    uri = "{{{0}}}".format(self.find(resource))
  File "virtenv\lib\site-packages\py2neo\neo4j.py", line 2525, in find
    for i, req in pendulate(self._requests):
  File "virtenv\lib\site-packages\py2neo\util.py", line 161, in pendulate
    yield index, collection[index]
TypeError: list indices must be integers, not float
running neo4j 2.0, py2neo 1.6.1, Windows 7/64bit, python 3.3/64bit
--EDIT--
Did some testing; the error lies in the referencing of nodes.
Oversimplified sample code:
for key, dict in data.items(): #string, dictionary
    batch = neo4j.WriteBatch(graph_db)
    three_gram_node = batch.create(node({"word": key}))
    pprint(three_gram_node)
    batch.add_labels(three_gram_node, "3gram") # must be int, not float
    for k,v in dict.items(): #string, string
        four_gram_node = batch.create(node({"word": k}))
        batch.create_path(three_gram_node, "FOLLOWED_BY", four_gram_node)
        # cannot cast node from BatchRequest obj
batch.submit()
When a node is created with batch.create(node({props})), pprint returns a P2Neo.neo4j. batchrequest object.
At the add_labels() line, it gives the same error as when trying to create a relation: List indices must be integers, not float.
At the batch.create_path() line, it throws an error saying it can't cast a node from a P2Neo.neo4j. batchrequest object.
I'm trying the dirty debug now to understand the indices.
--Dirty Debug Edit--
I've been poking around in the pendulate(collection) function.
Although I don't really understand how it fits in or how it's used, the following is happening:
Whenever i is odd, index becomes a float (which seems weird, since it is computed as count - ((i + 1) / 2), where i is odd and i + 1 is therefore even). This float then triggers the list indices error. Some prints:
count: 3
i= 0
index: 0
(int)index: 0
i= 1 # i = uneven
index: 2.0 # a float appears
(int)index: 2 # this is a safe cast
This results in the list indices error. It also happens when i = 0. As this is a common case, I added an extra if to short-circuit it (a possible speedup?). Although I have not unit tested this, it seems that index can safely be cast to an int...
The pendulate function as used:
def pendulate(collection):
    count = len(collection)
    print("count: ", count)
    for i in range(count):
        print("i=", i)
        if i == 0:
            index = 0
        elif i % 2 == 0:
            index = i / 2
        else:
            index = count - ((i + 1) / 2)
        print("index:", index)
        index = int(index)
        print("(int)index:", index)
        yield index, collection[index]
Soft debug: print ngram_node and one_node to see what they contain.
Dirty debug: modify File "virtenv\lib\site-packages\py2neo\util.py", adding a line before line 161:
print(index)
You are indexing a collection (a Python list, given the traceback), so index must be an integer :)
Printing it will probably help you understand why the exception is raised.
(Don't forget to remove your dirty debug afterwards ;))
While it is currently possible for WriteBatch objects to be executed multiple times with edits in between, it is inadvisable to use them in this way and this will be restricted in the next version of py2neo. This is because objects created during one execution will not be available during a subsequent execution and it is not easy to detect when this is being requested.
Without looking back at the underlying code, I'm unsure why you are seeing this exact error but I would suggest refactoring your code so that each WriteBatch creation is paired with one and only one execution call (submit). You can probably achieve this by putting your batch creation within your outer loop and moving your submit call out of the inner loop into the outer loop as well.
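The float index reported in the dirty-debug output is what Python 3's true division produces: the / operator returns a float even when both operands are ints and the result is whole, while // keeps the result an int. The sketch below is an illustration of that difference based on the pendulate function quoted in the question. It is not py2neo's own fix, and the sample list is made up.

```python
def pendulate(collection):
    """Yield items alternately from the front and the back of a list."""
    count = len(collection)
    for i in range(count):
        if i % 2 == 0:
            index = i // 2                    # floor division stays an int
        else:
            index = count - ((i + 1) // 2)
        yield index, collection[index]

true_div = 4 / 2      # float under Python 3
floor_div = 4 // 2    # int
order = list(pendulate(["a", "b", "c"]))
print(true_div, floor_div, order)
```

With floor division, the int(index) cast and the special case for i == 0 are no longer needed.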
