I have a huge binary file (several GB) with the following data format:
4 consecutive bytes form one composite datapoint (32 bits), which consists of:
b0-b3: 4 flag bits
b4-b17: 14-bit signed integer
b18-b31: 14-bit signed integer
I need to access both signed integers and the flag bits separately and append them to a list or some smarter data structure (not yet decided). At the moment I'm using the following code to read it in:
from collections import namedtuple

DataPackage = namedtuple('DataPackage', ['ie', 'if1', 'if2', 'if3', 'quad2', 'quad1'])

def _unpack_integer(bits):
    value = int(bits, 2)
    if bits[0] == '1':  # negative in two's complement
        value -= (1 << len(bits))
    return value

def unpack(data):
    bits = ''.join(['{0:08b}'.format(b) for b in bytearray(data)])
    flags = [bits[i] == '1' for i in range(4)]  # bool(bits[i]) would always be True for '0'/'1'
    quad2 = _unpack_integer(bits[4:18])
    quad1 = _unpack_integer(bits[18:])
    return DataPackage(flags[0], flags[1], flags[2], flags[3], quad2, quad1)
def read_file(filename, datapoints=None):
    data = []
    i = 0
    with open(filename, 'rb') as fh:
        value = fh.read(4)
        while value:
            dp = unpack(value)
            data.append(dp)
            value = fh.read(4)
            i += 1
            if i % 10000 == 0:
                print('Read: %d kB' % (float(i) * 4.0 / 1000.0))
            if datapoints and i == datapoints:
                break
    return data
if __name__ == '__main__':
    data = read_file('test.dat')
This code works, but it's too slow for my purposes (2 s for 100k datapoints of 4 bytes each). I need at least a factor of 10 in speed.
The profiler says the code spends its time mostly in the string formatting (to get the bits) and in _unpack_integer().
Unfortunately I'm not sure how to proceed. I'm thinking about either using Cython or writing some C code to do the read-in directly. I also tried PyPy, and it gave me a huge performance gain, but unfortunately the code needs to be compatible with a bigger project that doesn't work with PyPy.
I would recommend trying ctypes if you already have a C/C++ library that recognizes the data structure. The benefit is that the data structures are still available to your Python code while the loading is fast. If you already have a C library to load the data, you can call a function from that library to do the heavy lifting and just map the data into your Python structures. I'm sorry I won't be able to try out and provide proper code for your example (perhaps someone else can), but here are a couple of tips to get you started.
My take on how one might create bit vectors in Python:
https://stackoverflow.com/a/40364970/262108
The approach I mentioned above, which I applied to a similar problem to the one you described. Here I use ctypes to create a ctypes data structure (thus enabling me to use the object as any other Python object) while also being able to pass it along to a C library:
https://gist.github.com/lonetwin/2bfdd41da41dae326afb
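To make that tip concrete, here is a minimal sketch (my own, untested against the real data) of decoding one 4-byte word into a ctypes structure; the field names mirror the question's quad1/quad2:

import ctypes
import struct

class DataPoint(ctypes.Structure):
    # A plain ctypes struct: usable as a normal Python object, and it
    # can also be handed to a C library expecting the same layout.
    _fields_ = [("flags", ctypes.c_uint8),
                ("quad2", ctypes.c_int16),
                ("quad1", ctypes.c_int16)]

def datapoint_from_bytes(buf):
    word, = struct.unpack('>I', buf)  # 4 bytes -> big-endian uint32
    quad2 = (word >> 14) & 0x3fff
    quad1 = word & 0x3fff
    if quad2 & 0x2000:                # sign-extend the 14-bit fields
        quad2 -= 0x4000
    if quad1 & 0x2000:
        quad1 -= 0x4000
    return DataPoint(word >> 28, quad2, quad1)

dp = datapoint_from_bytes(b'\xe7\xe7\xe7\xe7')
print(dp.flags, dp.quad2, dp.quad1)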
Thanks to the hint by Jean-François Fabre I found a suitable solution using bitmasks, which gives a speedup of a factor of 6 over the code in the question. It now has a throughput of around 300k datapoints/s.
I also dropped the admittedly nice named tuple and replaced it with a plain tuple, because I found it was also a bottleneck.
The code now looks like:
import struct

masks = [2**(31 - i) for i in range(4)]

def unpack3(data):
    data = struct.unpack('>I', data)[0]
    quad2 = (data & 0xfffc000) >> 14
    quad1 = data & 0x3fff
    # sign-extend the 14-bit two's-complement values
    if (quad2 & (1 << (14 - 1))) != 0:
        quad2 = quad2 - (1 << 14)
    if (quad1 & (1 << (14 - 1))) != 0:
        quad1 = quad1 - (1 << 14)
    flag0 = data & masks[0]
    flag1 = data & masks[1]
    flag2 = data & masks[2]
    flag3 = data & masks[3]
    return flag0, flag1, flag2, flag3, quad2, quad1
The line profiler says:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
58                                           @profile
59 def unpack3(data):
60 1000000 3805727 3.8 12.3 data = struct.unpack('>I', data)[0]
61 1000000 2670576 2.7 8.7 quad2 = (data & 0xfffc000) >> 14
62 1000000 2257150 2.3 7.3 quad1 = data & 0x3fff
63 1000000 2634679 2.6 8.5 if (quad2 & (1 << (14 - 1))) != 0:
64 976874 2234091 2.3 7.2 quad2 = quad2 - (1 << 14)
65 1000000 2660488 2.7 8.6 if (quad1 & (1 << (14 - 1))) != 0:
66 510978 1218965 2.4 3.9 quad1 = quad1 - (1 << 14)
67 1000000 3099397 3.1 10.0 flag0 = data & masks[0]
68 1000000 2583991 2.6 8.4 flag1 = data & masks[1]
69 1000000 2486619 2.5 8.1 flag2 = data & masks[2]
70 1000000 2473058 2.5 8.0 flag3 = data & masks[3]
71 1000000 2742228 2.7 8.9 return flag0, flag1, flag2, flag3, quad2, quad1
So there is no single clear bottleneck anymore. This is probably as fast as it gets in pure Python. Does anyone have an idea for a further speedup?
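One further idea (a sketch, assuming numpy is an acceptable dependency; it isn't mentioned in the question): apply the same masks to the whole file at once instead of per datapoint, which typically gains another large factor:

import numpy as np

def read_file_np(filename):
    # Read the entire file as big-endian 32-bit words in one call.
    raw = np.fromfile(filename, dtype='>u4').astype(np.uint32)
    quad2 = ((raw >> 14) & 0x3fff).astype(np.int32)
    quad1 = (raw & 0x3fff).astype(np.int32)
    # Vectorized sign extension of the 14-bit two's-complement fields.
    quad2[(quad2 & 0x2000) != 0] -= 1 << 14
    quad1[(quad1 & 0x2000) != 0] -= 1 << 14
    flags = (raw >> 28).astype(np.uint8)  # the four flag bits, packed
    return flags, quad2, quad1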
Related
I recently bought the PTZ camera controller from ArduCam and found a kind of API/controller software on GitHub (https://github.com/ArduCAM/PTZ-Camera-Controller). The module is integrated over I2C and has several functions on different registers. I tried to understand the Python program, but I was confused by one line that occurred several times in it:
def read(self, I2C_address, register_address):
    value = self.bus.read_word_data(I2C_address, register_address)
    value = ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)  # This one
    return value

def write(self, I2C_address, register_address, value):
    if value < 0:
        value = 0
    value = ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)  # And this one
    return self.bus.write_word_data(I2C_address, register_address, value)
These can be used to, for instance, read from or write to the camera's zoom motor. The motor has 2317 steps, and the default span for the zoom value is 0-18000.
Which makes some sense, I guess, because if you for instance try writing 18000 = 0x4650, you get:
value = ((0x4650 & 0x00FF) << 8) | ((0x4650 & 0xFF00) >> 8)
which equals 0x5046 = 20550 by my calculation. However:
Why would they do this, instead of just having the input span from 0 to 2317?
It looks like that's flipping the order of the two bytes in a 16-bit integer.
That's probably to convert between big-endian and little-endian, two different ways of representing integers. I think the controller uses big-endian while PCs use little-endian (https://en.wikipedia.org/wiki/Endianness).
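A quick sketch to illustrate: the shift-and-mask line is exactly a 16-bit byte swap, which struct confirms:

import struct

value = 0x4650  # 18000
swapped = ((value & 0x00FF) << 8) | ((value & 0xFF00) >> 8)
print(hex(swapped))  # 0x5046

# Same swap via struct: pack little-endian, reinterpret as big-endian.
assert swapped == struct.unpack('>H', struct.pack('<H', value))[0]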
Problem
In Julia, one can easily see the value of an intermediate variable using the @show macro for debugging purposes. For example:
for i in 1:5
    @show i ^ 2
end
which will output
i ^ 2 = 1
i ^ 2 = 4
i ^ 2 = 9
i ^ 2 = 16
i ^ 2 = 25
However, to show an intermediate value in Python, one has to write print("<variable> = " + ...), which is too much work when debugging a large project. I am wondering if there is any way to give Python functionality similar to Julia's.
I previously saw people use a decorator to measure a program's runtime (see here), which is quite similar to a Julia macro. But I do not know how to get that to work for showing intermediate variables.
Could someone help me? Thank you in advance!
Why not just make a function (using Python 3)?
def show(s):
    print(s + ' = ' + str(eval(s)))

for i in range(10):
    show(f"{i} ** 2")
This will output:
0 ** 2 = 0
1 ** 2 = 1
2 ** 2 = 4
3 ** 2 = 9
4 ** 2 = 16
5 ** 2 = 25
6 ** 2 = 36
7 ** 2 = 49
8 ** 2 = 64
9 ** 2 = 81
Still not as simple as Julia, but slightly easier to write than full-on concatenation. Note that using eval() may be considered bad practice (see here and here).
As for decorators, they can only be applied to functions or classes.
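For what it's worth, since Python 3.8 the f-string = specifier gives much the same effect without eval():

for i in range(5):
    print(f"{i ** 2 = }")
# i ** 2 = 0
# i ** 2 = 1
# i ** 2 = 4
# i ** 2 = 9
# i ** 2 = 16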
You can use Spyder, Visual Studio Code, or any IDE that supports a debug mode.
I've googled this issue for the last 2 weeks and wasn't able to find an algorithm or solution. I have a short .wav file, but it has MULAW compression and Python doesn't seem to have a function inside wave.py that can successfully decompress it. So I've taken it upon myself to build a decoder in Python.
I've found some info about MULAW in basic elements:
Wikipedia
A-law u-Law comparison
Some C-esque codec library
So I need some guidance, since I don't know how to approach getting from a signed short integer to a full wave signal. This is my initial thought from what I've gathered so far:
So from the wiki I've got the equations for u-law compression and decompression (with µ = 255):
compression: F(x) = sgn(x) · ln(1 + µ|x|) / ln(1 + µ), for −1 ≤ x ≤ 1
decompression: F⁻¹(y) = sgn(y) · ((1 + µ)^|y| − 1) / µ, for −1 ≤ y ≤ 1
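Transcribed literally (a sketch of the equations above, assuming normalized floats in [−1, 1]; not the final decoder):

import math

MU = 255

def ulaw_compress(x):
    # F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu)
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def ulaw_expand(y):
    # F^-1(y) = sgn(y) * ((1 + mu)^|y| - 1) / mu
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)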
Judging by the compression equation, the output is limited to the float range −1 to +1, while a signed short integer ranges from −32,768 to 32,767, so it looks like I need to convert from short int to a float in that specific range.
Now, to be honest, I've heard of quantisation before, but I'm not sure whether I should first dequantize and then decompress, or the other way around, or whether in this case they are the same thing... the tutorials/documentation can be a bit tricky with terminology.
The wave file I'm working with is supposed to contain an 'A' sound, as for speech synthesis. I could verify success by comparing the two waveforms in some audio software and a custom wave analyzer, but I would really like to cut down on the trial-and-error part of this process.
So this is what I had in mind:

from struct import unpack

u = 0xff
data_chunk = b'\xe7\xe7'  # -6169 as a signed short
data_to_r1 = unpack('h', data_chunk)[0] / 0xffff  # I suspect this is wrong,
                                                  # but I don't know what else
u_law = (-1 if data_to_r1 < 0 else 1) * (pow(1 + u, abs(data_to_r1)) - 1) / u
So is there some sort of algorithm, or a sequence of crucial steps I need to take, in the form of first: decompression, second: quantisation, third: ...?
Everything I find on Google is about reading a PCM-modulated .wav file, not how to manage it when compression arises.
So, after scouring Google, the solution was found on GitHub (go figure). I searched through many, many algorithms and found one that is within the error bounds for lossy compression: for u-law, from 30 down to 1 for positive values and from −32 to −1 for negative values.
To be honest, I think this solution is adequate, though not quite true to the equation per se, but it is the best solution for now. This code is transcribed to Python directly from the gcc9108 audio codec:
def uLaw_d(i8bit):
    bias = 33
    sign = pos = 0
    decoded = 0
    i8bit = ~i8bit
    if i8bit & 0x80:
        i8bit &= ~(1 << 7)
        sign = -1
    pos = ((i8bit & 0xf0) >> 4) + 5
    decoded = ((1 << pos) | ((i8bit & 0x0F) << (pos - 4)) | (1 << (pos - 5))) - bias
    return decoded if sign else ~decoded
def uLaw_e(i16bit):
    MAX = 0x1fff
    BIAS = 33
    mask = 0x1000
    sign = lsb = 0
    pos = 12
    if i16bit < 0:
        i16bit = -i16bit
        sign = 0x80
    i16bit += BIAS
    if i16bit > MAX:
        i16bit = MAX
    for x in reversed(range(pos)):
        if i16bit & mask != mask and pos >= 5:
            pos = x
            break
    lsb = (i16bit >> (pos - 4)) & 0xf
    return (~(sign | (pos << 4) | lsb))
With test:
print( 'normal :\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(0xff) )
print( 'encoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_e(0xff)) )
print( 'decoded:\t{0}\t|\t{0:2X}\t:\t{0:016b}'.format(uLaw_d(uLaw_e(0xff))) )
and output:
normal : 255 | FF : 0000000011111111
encoded: -179 | -B3 : -000000010110011
decoded: 263 | 107 : 0000000100000111
And as you can see, 263 − 255 = 8, which is within bounds. When I tried to implement the seeemmmm method described in G.711 that the kind user Oliver Charlesworth suggested I look into, the decoded value for the maximum in the data was −8036, which is close to the maximum of the u-law spec, but I couldn't reverse-engineer the decoding function to get the binary equivalent of the function from Wikipedia.
Lastly, I must say that I'm currently disappointed that the Python standard library doesn't support all kinds of compression algorithms, since it is not just a tool that people use; it is also a resource Python users learn from, as most material for a deeper dive into the code isn't readily available or understandable.
EDIT
After decoding the data and writing a wav file via wave.py, I successfully produced a new raw linear PCM file. This works... even though I was sceptical at first.
EDIT 2: you can find the real solution in compressions.py
I find this helpful for converting to/from u-law with numpy arrays:

import audioop
import numpy as np

def numpy_audioop_helper(x, xdtype, func, width, ydtype):
    '''helper function for using audioop buffer conversion on numpy arrays'''
    xi = np.asanyarray(x).astype(xdtype)
    if np.any(x != xi):
        xinfo = np.iinfo(xdtype)
        raise ValueError("input must be %s [%d..%d]" % (xdtype, xinfo.min, xinfo.max))
    y = np.frombuffer(func(xi.tobytes(), width), dtype=ydtype)
    return y.reshape(xi.shape)

def audioop_ulaw_compress(x):
    return numpy_audioop_helper(x, np.int16, audioop.lin2ulaw, 2, np.uint8)

def audioop_ulaw_expand(x):
    return numpy_audioop_helper(x, np.uint8, audioop.ulaw2lin, 2, np.int16)
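Example usage (a quick sketch):

x = np.array([-32768, -1, 0, 32767], dtype=np.int16)
codes = audioop_ulaw_compress(x)   # uint8 u-law codes
back = audioop_ulaw_expand(codes)  # int16 again (the round trip is lossy)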
Python actually supports decoding u-law out of the box:

audioop.ulaw2lin(fragment, width)
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments. u-LAW encoding always uses 8-bit samples, so width refers only to the sample width of the output fragment here.

https://docs.python.org/3/library/audioop.html#audioop.ulaw2lin
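For example (note that audioop was deprecated in Python 3.11 and removed in 3.13, so on current versions you'd need a backport such as audioop-lts):

import audioop

ulaw_frames = b'\x00\x7f\x80\xff'         # four 8-bit u-law samples
pcm16 = audioop.ulaw2lin(ulaw_frames, 2)  # width=2 -> 16-bit linear PCM
print(len(pcm16))                         # 8 bytes = four int16 samples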
I know that pdb is an interactive system and it is very helpful.
My ultimate goal is to gather the memory state after executing each command in a certain function, command by command. For example, with this code snippet:
0: def foo():
1:     if True:
2:         x = 1
3:     else:
4:         x = 2
5:     x
then the memory state of each command is
0: empty
1: empty
2: x = 1
3: x = 1
4: (not taken)
5: x = 1
To do this, I'd like to write a script that interacts with the pdb class. I know that s is the command to step to the next statement and that print var (in the above case, var is x) prints the value of a given variable, so I can gather the variables at each command. Then I want to run a script like this:
import pdb
pdb.run('foo()')
while not pdb.end():
    pdb.s()
    pdb.print('x')
But I cannot find any way to implement this functionality. Can anybody help me?
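For reference, a minimal sketch of the machinery pdb itself is built on (sys.settrace in CPython), which can record the locals after each line without driving pdb interactively:

import sys

def trace_lines(frame, event, arg):
    if event == 'line':
        # Report the line about to execute and the locals at that point.
        print(frame.f_lineno, dict(frame.f_locals))
    return trace_lines

def foo():
    if True:
        x = 1
    else:
        x = 2
    x

sys.settrace(trace_lines)
foo()
sys.settrace(None)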
Try memory_profiler:

The line-by-line memory usage mode is used much in the same way as line_profiler: first decorate the function you would like to profile with @profile and then run the script with a special script (in this case with specific arguments to the Python interpreter).
Line # Mem usage Increment Line Contents
==============================================
3                           @profile
4 5.97 MB 0.00 MB def my_func():
5 13.61 MB 7.64 MB a = [1] * (10 ** 6)
6 166.20 MB 152.59 MB b = [2] * (2 * 10 ** 7)
7 13.61 MB -152.59 MB del b
8 13.61 MB 0.00 MB return a
Or Heapy:
The aim of Heapy is to support debugging and optimization regarding
memory related issues in Python programs.
Partition of a set of 132527 objects. Total size = 8301532 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 35144 27 2140412 26 2140412 26 str
1 38397 29 1309020 16 3449432 42 tuple
2 530 0 739856 9 4189288 50 dict (no owner)
NB: This is my first foray into memory profiling with Python, so perhaps I'm asking the wrong question here. Advice on improving the question appreciated.
I'm working on some code where I need to store a few million small strings in a set. This, according to top, uses ~3x the amount of memory reported by heapy. I'm not clear on what all this extra memory is used for, or how I can go about figuring out whether I can, and if so how to, reduce the footprint.
memtest.py:
from guppy import hpy
import gc
hp = hpy()
# do setup here - open files & init the class that holds the data
print 'gc', gc.collect()
hp.setrelheap()
raw_input('relheap set - enter to continue') # top shows 14MB resident for python
# load data from files into the class
print 'gc', gc.collect()
h = hp.heap()
print h
raw_input('enter to quit') # top shows 743MB resident for python
The output is:
$ python memtest.py
gc 5
relheap set - enter to continue
gc 2
Partition of a set of 3197065 objects. Total size = 263570944 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 3197061 100 263570168 100 263570168 100 str
1 1 0 448 0 263570616 100 types.FrameType
2 1 0 280 0 263570896 100 dict (no owner)
3 1 0 24 0 263570920 100 float
4 1 0 24 0 263570944 100 int
So in summary, heapy shows 264MB while top shows 743MB. What's using the extra 500MB?
Update:
I'm running 64-bit Python on Ubuntu 12.04 in VirtualBox on Windows 7.
I installed guppy as per the answer here:
sudo pip install https://guppy-pe.svn.sourceforge.net/svnroot/guppy-pe/trunk/guppy