I am trying to delta compress a list of pixels and store them in a binary file. I have managed to do this; however, the method I found takes ~4 minutes per frame.
def getByte_List(self):
    values = BitArray("")
    for I in range(len(self.delta_values)):
        temp = Bits(int=self.delta_values[I], length=self.num_bits_pixel)
        values.append(temp)
    ##start_time = time.time()
    bit_stream = pack("uint:16, uint:5, bits", self.intial_value, self.num_bits_pixel, values)
    ##end_time = time.time()
    ##print(end_time - start_time)
    # Make sure that the list of bits contains a multiple of 8 values
    if (len(bit_stream) % 8):
        bit_stream.append(Bits(uint=0, length=(8 - (len(bit_stream) % 8))))  # Append? On a pack? (Only works on BitArray? bit_stream = BitArray(""))
    # Create a list of unsigned integer values to represent each byte in the stream
    # (// keeps the count an integer on Python 3 as well)
    fmt = (len(bit_stream) // 8) * ["uint:8"]
    return bit_stream.unpack(fmt)
This is my code. I take the initial value, the number of bits per pixel and the delta values and convert them into bits. I then byte align, take the integer representation of the bytes and use it elsewhere. The problem areas are where I convert each delta value to bits (3 min) and where I pack (1 min). Is it possible to do what I am doing faster, or is there another way to pack them straight into integers representing bytes?
From a quick Google of the classes you're instantiating, it looks like you're using the bitstring module. This is written in pure Python, so it's not a big surprise that it's pretty slow. You might look at one or more of the following:
struct - a module that comes with Python that will let you pack and unpack C structures into constituent values
bytearray - a built-in type that lets you accumulate, well, an array of bytes, and has both list-like and string-like operations
bin(x), int(x, 2) - conversion of numbers to a binary representation as a string, and back - string manipulations can sometimes be a reasonably efficient way to do this
bitarray - native (C) module for bit manipulation, looks like it has similar functionality to bitstring but should be much faster. It is available as source suitable for compiling on Linux, or pre-compiled for Windows.
numpy - fast manipulation of arrays of various types including single bytes. Kind of the go-to module for this sort of thing, frankly. http://www.numpy.org/
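As a rough illustration of the bin(x) / int(x, 2) idea from the list above, here is a minimal sketch. It assumes Python 3 and the same layout as the question (a 16-bit initial value, a 5-bit width, then each delta as a two's-complement field); the name pack_deltas is made up for this example, and whether it is fast enough for your frames is something you would have to measure.

```python
def pack_deltas(initial_value, num_bits, deltas):
    """Return the packed stream as a list of byte values (0-255).

    Layout (assumed from the question): uint:16, uint:5, then one
    two's-complement field of num_bits bits per delta, zero-padded
    at the end to a whole number of bytes.
    """
    mask = (1 << num_bits) - 1
    field_fmt = "0%db" % num_bits
    # Build the stream as a string of '0'/'1' characters.
    fields = [format(initial_value, "016b"), format(num_bits, "05b")]
    # Masking turns a negative delta into its two's-complement bit pattern.
    fields.extend(format(d & mask, field_fmt) for d in deltas)
    bits = "".join(fields)
    bits += "0" * (-len(bits) % 8)  # pad up to a byte boundary
    # Big-endian conversion keeps the bytes in stream order.
    return list(int(bits, 2).to_bytes(len(bits) // 8, "big"))


packed = pack_deltas(1, 3, [1, -1])
print(packed)
```

The string is built once with join rather than appended to in a loop, which avoids creating one object per pixel the way the Bits(...) loop does.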
Related
To save memory, I want to use fewer bytes (4) for each int I have instead of 24.
I looked at structs, but I don't really understand how to use them.
https://docs.python.org/3/library/struct.html
When I do the following:
myInt = struct.pack('I', anInt)
sys.getsizeof(myInt) doesn't return 4 like I expected.
Is there something that I am doing wrong? Is there another way for Python to save memory for each variable?
ADDED: I have 750,000,000 integers in an array that I wish to be able to use given an index.
If you want to hold many integers in an array, use a numpy ndarray. Numpy is a very popular third-party package that stores arrays far more compactly than plain Python does. It is not part of the standard library, which lets it be updated more often than Python itself, although adding it to the standard library was considered. Numpy is one of the reasons Python has become so popular for data science and other scientific uses.
Numpy's np.int32 type uses four bytes for an integer. Declare your array full of zeros with
import numpy as np
myarray = np.zeros((750000000,), dtype=np.int32)
Or if you just want the array and do not want to spend any time initializing the values,
myarray = np.empty((750000000,), dtype=np.int32)
You then fill and use the array as you like. There is some Python overhead for the complete array, so the array's size will be slightly larger than 4 * 750000000, but the size will be close.
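To see why sys.getsizeof did not report 4 in the question, and what the array actually costs per element, here is a small sketch. It uses a 1000-element array purely to keep the example cheap; the variable names are illustrative.

```python
import struct
import sys

import numpy as np

packed = struct.pack("I", 12345)
print(len(packed))            # size of the packed payload in bytes
print(sys.getsizeof(packed))  # payload plus the bytes object's own header

arr = np.zeros(1000, dtype=np.int32)
print(arr.itemsize)  # bytes used per element
print(arr.nbytes)    # bytes used by the element data (excludes the array header)
```

sys.getsizeof measures the whole Python object, so every individually packed value carries its own object overhead, whereas the array pays that overhead once.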
I have different uint64 numbers which I want to send via CAN bus with SocketCAN in Python. I need to divide each large number into 8 bytes, so I can assign the values to the CAN data bytes. But I'm struggling with the implementation.
I am grateful for any help or suggestion.
Thanks for your help!
When it comes to converting numbers to their byte representation, the struct module is your friend:
import struct
i = 65537
print(hex(i))
bigendian = struct.pack(">Q", i)
littleendian = struct.pack("<Q", i)
print(repr(bigendian))
print(repr(littleendian))
The output is as expected:
0x10001
b'\x00\x00\x00\x00\x00\x01\x00\x01'
b'\x01\x00\x01\x00\x00\x00\x00\x00'
That means that you can easily use the individual bytes (in the order you need) to send them via CAN-bus
Assuming you are using Python 3 you can simply use Python int's to_bytes method like so:
i = 65357
print(hex(i))
print(i.to_bytes(8, 'big'))
print(i.to_bytes(8, 'little'))
Output:
0xff4d
b'\x00\x00\x00\x00\x00\x00\xffM'
b'M\xff\x00\x00\x00\x00\x00\x00'
Not sure if you're using the python-can library, but if you are, you can pass either bytes, a list of ints or a bytearray as the data of a can.Message.
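If you need the eight data bytes as a plain list of ints, and want to rebuild the number on the receiving side, a minimal sketch using only built-ins (Python 3 assumed; the value and the choice of big-endian order are just for illustration) could look like this:

```python
value = 0x0123456789ABCDEF  # any number that fits in an unsigned 64-bit integer

# Split into 8 bytes, most significant byte first.
data = list(value.to_bytes(8, "big"))
print(data)

# Rebuild the original number from the 8 received bytes.
restored = int.from_bytes(bytes(data), "big")
print(hex(restored))
```

Both ends have to agree on the byte order; swapping "big" for "little" on both sides works just as well.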
I read (int)32 bit audio data (given as string by previous commands) into a numpy.int32 array with :
myarray = numpy.fromstring(data, dtype=numpy.int32)
But then I want to store it in memory as int16 (I know this will decrease the bit depth / resolution / sound quality) :
myarray = myarray >> 16
my_16bit_array = myarray.astype('int16')
It works very well, but is there a faster solution? Here I use a string buffer, one int32 array and one int16 array; I wanted to know if it's possible to save one step.
How about this?
np.fromstring(data, dtype=np.int16)[0::2]
Note however, that overhead of the kind you describe here is common when working with numpy, and cannot always be avoided. If this kind of overhead isn't acceptable for your application, make sure that you plan ahead to write extension modules for the performance critical parts.
Note: it should be 0::2 or 1::2 depending on the endianness of your platform
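To make the slicing trick concrete, here is a small sketch that compares it with the shift-and-cast version from the question. It assumes the samples are stored little-endian (stated explicitly in the dtypes so the result does not depend on the machine), in which case the high 16 bits of each sample are the second half-word, i.e. 1::2. It uses np.frombuffer, and a signed 16-bit view so that negative samples keep their sign.

```python
import numpy as np

# Four little-endian int32 samples standing in for the audio data.
samples = np.array([0x12345678, -65536, -1, 0x7FFF0000], dtype="<i4")
data = samples.tobytes()

# Version from the question: shift down, then cast.
shifted = (np.frombuffer(data, dtype="<i4") >> 16).astype(np.int16)

# Slicing version: view the same bytes as int16 and keep the high halves.
high_halves = np.frombuffer(data, dtype="<i2")[1::2]

print(shifted)
print(high_halves)
```

The slice is a strided view onto the original buffer rather than a copy, so call .copy() on it if you need a contiguous array or want to release the 32-bit data.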
Let's say I need to save a matrix (each line corresponds to one row) that could be loaded from Fortran later. What method should I prefer? Is converting everything to strings the only approach?
You can save them in binary format as well. Please see the documentation on the struct standard module; it has a pack function for converting Python objects into binary data.
For example:
import struct
value = 3.141592654
data = struct.pack('d', value)
open('file.ext', 'wb').write(data)
You can convert each element of your matrix and write to a file. Fortran should be able to load that binary data. You can speed up the process by converting a row as a whole, like this:
row_data = struct.pack('d' * len(matrix_row), *matrix_row)
Please note, that 'd' * len(matrix_row) is a constant for your matrix size, so you need to calculate that format string only once.
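Putting the row-at-a-time idea together, a minimal sketch might look like the following. The file name, the temporary directory and the explicit little-endian "<" prefix are choices made for this example; the reading side must use the same byte order and element size.

```python
import os
import struct
import tempfile

matrix = [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]]

# Built once: little-endian, one 8-byte double per column.
row_fmt = "<%dd" % len(matrix[0])
row_size = struct.calcsize(row_fmt)

path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
with open(path, "wb") as f:
    for row in matrix:
        f.write(struct.pack(row_fmt, *row))

# Read the file back row by row to check the round trip.
with open(path, "rb") as f:
    raw = f.read()
restored = [list(struct.unpack_from(row_fmt, raw, offset))
            for offset in range(0, len(raw), row_size)]
print(restored)
```

The file holds nothing but the raw values in row order, so the reader has to know the matrix dimensions in advance or you have to write them as a small header.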
I don't know fortran, so it's hard to tell what is easy for you to perform on that side for parsing.
It sounds like your options are either saving the doubles in plaintext (meaning 'converting' them to strings), or in binary (using struct and the like). Which one is better depends on your situation.
I would go with the plaintext solution, as it means the files will be easily readable, and you won't have to deal with details such as endianness and default double sizes.
But, there are cases where binary is better (for example, if you have a really big list of doubles and space is of importance, or if it is easier for you to parse it and you need the optimization) - but this is likely not your case.
You can use JSON
import json
matrix = [[2.3452452435, 3.34134], [4.5, 7.9]]
data = json.dumps(matrix)
open('file.ext', 'w').write(data)
File content will look like:
[[2.3452452435, 3.3413400000000002], [4.5, 7.9000000000000004]]
If legibility and ease of access is important (and file size is reasonable), Fortran can easily parse a simple array of numbers, at least if it knows the size of the matrix beforehand (with something like READ(FILE_ID, '2(F)'), I think):
1.234 5.6789e4
3.1415 9.265358978
42 ...
Two nested for loops in your Python code can easily write your matrix in this form.
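A minimal sketch of those two loops follows. The function name write_matrix is made up for this example, and an in-memory buffer stands in for the real output file so the snippet can be run as is.

```python
import io


def write_matrix(matrix, out):
    """Write one matrix row per line, values separated by spaces."""
    for row in matrix:
        cells = []
        for value in row:
            cells.append(repr(value))  # repr keeps the full float precision
        out.write(" ".join(cells) + "\n")


buffer = io.StringIO()  # stands in for open('file.ext', 'w')
write_matrix([[1.234, 56789.0], [3.1415, 9.265358978]], buffer)
print(buffer.getvalue())
```

Passing an open text file instead of the buffer writes the same lines to disk.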
I'm reading in a binary file into a list and parsing the binary data. I'm using unpack() to extract certain parts of the data as primitive data types, and I want to edit that data and insert it back into the original list of bytes. Using pack_into() would make it easy, except that I'm using Python 2.4, and pack_into() wasn't introduced until 2.5
Does anyone know of a good way to go about serializing the data this way so that I can accomplish essentially the same functionality as pack_into()?
Have you looked at the bitstring module? It's designed to make the construction, parsing and modification of binary data easier than using the struct and array modules directly.
It's especially made for working at the bit level, but will work with bytes just as well. It will also work with Python 2.4.
from bitstring import BitString
s = BitString(filename='somefile')
# replace byte range with new values
# The step of '8' signifies byte rather than bit indices.
s[10:15:8] = '0x001122'
# Search and replace byte value with two bytes
s.replace('0xcc', '0xddee', bytealigned=True)
# Different interpretations of the data are available through properties
if s[5:7:8].int > 1000:
    s[5:7:8] = 1000
# Use the bytes property to get back to a Python string
open('newfile', 'wb').write(s.bytes)
The underlying data stored in the BitString is just an array object, but with a comprehensive set of functions and special methods to make it simple to modify and interpret.
Do you mean editing data in a buffer object? Documentation on manipulating those at all from Python directly is fairly scarce.
If you just want to edit bytes in a string, it's simple enough, though; struct.pack_into is new to 2.5, but struct.pack isn't:
import struct
s = open("file", "rb").read()  # binary mode, so the bytes are read unchanged
ofs = 1024
fmt = "Ih"
size = struct.calcsize(fmt)
before, data, after = s[0:ofs], s[ofs:ofs+size], s[ofs+size:]
values = list(struct.unpack(fmt, data))
values[0] += 5
values[1] //= 2  # floor division keeps the value an int, as the "h" format requires
data = struct.pack(fmt, *values)
s = "".join([before, data, after])