Speed up a sympy-lambdified and vectorized function - python

I am using sympy to generate some functions for numerical calculations. Therefore I lambdify an expression and vectorize it to use it with numpy arrays. Here is an example:
import numpy as np
import sympy as sp

def numpy_function():
    x, y, z = np.mgrid[0:1:40*1j, 0:1:40*1j, 0:1:40*1j]
    T = (1 - np.cos(2*np.pi*x))*(1 - np.cos(2*np.pi*y))*np.sin(np.pi*z)*0.1
    return T

def sympy_function():
    x, y, z = sp.Symbol("x"), sp.Symbol("y"), sp.Symbol("z")
    T = (1 - sp.cos(2*sp.pi*x))*(1 - sp.cos(2*sp.pi*y))*sp.sin(sp.pi*z)*0.1
    lambda_function = np.vectorize(sp.lambdify((x, y, z), T, "numpy"))
    x, y, z = np.mgrid[0:1:40*1j, 0:1:40*1j, 0:1:40*1j]
    T = lambda_function(x, y, z)
    return T
The problem with the sympy version, compared to the pure numpy version, is its speed:
In [3]: timeit test.numpy_function()
100 loops, best of 3: 11.9 ms per loop
vs.
In [4]: timeit test.sympy_function()
1 loops, best of 3: 634 ms per loop
So is there any way to get closer to the speed of the numpy version?
I think np.vectorize is pretty slow, but somehow some part of my code does not work without it. Thank you for any suggestions.
EDIT:
So I found the reason why the vectorize call is necessary, i.e.:
In [35]: y = np.arange(10)
In [36]: f = sp.lambdify(x, sp.sin(x), "numpy")
In [37]: f(y)
Out[37]:
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
This seems to work fine; however:
In [38]: y = np.arange(10)
In [39]: f = sp.lambdify(x,1,"numpy")
In [40]: f(y)
Out[40]: 1
So for a simple expression like 1, this function doesn't return an array.
Is there a way to fix this, and isn't this some kind of bug, or at least an inconsistent design?

lambdify returns a single value for constants because no numpy functions are involved. This is because of the way lambdify works (see https://stackoverflow.com/a/25514007/161801).
But this is typically not a problem because a constant will automatically broadcast to the correct shape in any operation that you use it in with an array. On the other hand, if you explicitly worked with an array of the same constant, it would be much less efficient because you would compute the same operations multiple times.
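A small sketch of what that means in practice (reusing the f and y names from the EDIT above; np.full_like is only needed if you explicitly want an array of the constant):
import numpy as np
import sympy as sp

x = sp.Symbol("x")
f = sp.lambdify(x, 1, "numpy")   # constant expression, so f just returns 1
y = np.arange(10)

f(y)                   # -> 1 (a scalar, not an array)
f(y) + 0*y             # -> array([1, 1, ..., 1]); the scalar broadcasts in any array operation
np.full_like(y, f(y))  # -> explicit array of the constant, if one is really required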

Using np.vectorize() in this case is like looping over the first dimension of x, y and z, and that's why it becomes slower. You don't need np.vectorize() if you tell lambdify() to use NumPy's functions, which is exactly what you are doing. Then, using:
def sympy_function():
    x, y, z = sp.Symbol("x"), sp.Symbol("y"), sp.Symbol("z")
    T = (1 - sp.cos(2*sp.pi*x))*(1 - sp.cos(2*sp.pi*y))*sp.sin(sp.pi*z)*0.1
    lambda_function = sp.lambdify((x, y, z), T, "numpy")
    x, y, z = np.mgrid[0:1:40*1j, 0:1:40*1j, 0:1:40*1j]
    T = lambda_function(x, y, z)
    return T
makes the performance comparable:
In [26]: np.allclose(numpy_function(), sympy_function())
Out[26]: True
In [27]: timeit numpy_function()
100 loops, best of 3: 4.08 ms per loop
In [28]: timeit sympy_function()
100 loops, best of 3: 5.52 ms per loop

Related

Optimizing nested for loops in Python for numpy arrays

I am trying to optimize a nested for loop in python. Here is the code (Note: generating the input data does not need to be optimized):
import numpy

Y = numpy.zeros(44100)
for i in range(len(Y)):
    Y[i] = numpy.sin(i/len(Y))
### /Data^^
Z = numpy.zeros(len(Y))
for i in range(len(Y)):
    for j in range(len(Y)):
        Z[i] += Y[j]*numpy.sinc(i-j)
How to best optimize code written for numpy arrays when nested for loops are involved?
EDIT: For clarity.
I guess this only makes sense to do if you multiply the argument to sinc by some factor f. But then you can use numpy.convolve:
def orig(Y, f):
    Z = numpy.zeros(len(Y))
    for i in range(len(Y)):
        for j in range(len(Y)):
            Z[i] += Y[j]*numpy.sinc((i-j)*f)
    return Z

def new(Y, f):
    sinc = np.sinc(np.arange(1-len(Y), len(Y)) * f)
    return np.convolve(Y, sinc, 'valid')
In [111]: Y = numpy.zeros(441)
     ...: for i in range(len(Y)):
     ...:     Y[i] = numpy.sin(i/len(Y))
In [112]: %time Z = orig(Y, 0.9)
Wall time: 2.81 s
In [113]: %timeit Z = new(Y, 0.9)
The slowest run took 5.56 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 109 µs per loop
For really good speed, have a look at scipy.signal.fftconvolve.
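A minimal sketch of that variant (it assumes the same sinc kernel as in new() above; new_fft is a hypothetical name, and the result should agree with np.convolve up to floating-point error):
import numpy as np
from scipy.signal import fftconvolve

def new_fft(Y, f):
    # same kernel as in new(), only the convolution routine changes
    sinc = np.sinc(np.arange(1 - len(Y), len(Y)) * f)
    return fftconvolve(Y, sinc, 'valid')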

Fast sequential addition to rectangular subarray in numpy

I have come across a problem which is to rewrite a piece of code in vectorized form. The code shown below is a simplified illustration of the initial problem:
import numpy as np

K = 20
h, w = 15, 20
H, W = 1000-h, 2000-w
q = np.random.randint(0, 20, size=(H, W, K))  # random just for illustration
Q = np.zeros((H+h, W+w, K))
for n in range(H):
    for m in range(W):
        Q[n:n+h, m:m+w, :] += q[n, m, :]
This code takes a long time to execute, and it seems to me simple enough to allow a vectorized implementation.
I am aware of numpy's s_ object, which allows one to construct slices, which in turn can help in vectorizing code. But because every single element in Q is the result of multiple subsequent additions of elements from q, I found it difficult to proceed in that simple way.
I guess that np.add.at could be useful to cope with the sequential addition, but I have spent much time trying to make these functions work for me and decided to ask for help, because for every attempt I make I constantly get an
IndexError: failed to coerce slice entry of type numpy.ndarray to integer
Maybe there is some other numpy magic which I am unaware of that could help me with my task, but it seems extremely difficult to google for it.
Well, you are basically summing across sliding windows along the first and second axes, which in the signal processing domain is termed convolution. For two axes, that would be 2D convolution. Now, SciPy has this implemented as convolve2d, which can be used for each slice along the third axis.
Thus, we would have an implementation with it, like so -
from scipy.signal import convolve2d

kernel = np.ones((h, w), dtype=int)
m, n, r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
out = np.empty((m, n, r), dtype=q.dtype)
for i in range(r):
    out[..., i] = convolve2d(q[..., i], kernel)
As it turns out, we can use fftconvolve from the same module, which allows us to work with higher-dimensional arrays. This gets us the output in a fully vectorized way, like so -
from scipy.signal import fftconvolve
out = fftconvolve(q, np.ones((h, w, 1), dtype=int))
Runtime test
Function definitions -
def original_app(q, K, h, w, H, W):
    Q = np.zeros((H+h-1, W+w-1, K))
    for n in range(H):
        for m in range(W):
            Q[n:n+h, m:m+w, :] += q[n, m, :]
    return Q

def convolve2d_app(q, K, h, w, H, W):
    kernel = np.ones((h, w), dtype=int)
    m, n, r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
    out = np.empty((m, n, r), dtype=q.dtype)
    for i in range(r):
        out[..., i] = convolve2d(q[..., i], kernel)
    return out

def fftconvolve_app(q, K, h, w, H, W):
    return fftconvolve(q, np.ones((h, w, 1), dtype=int))
Timings and verification -
In [128]: # Setup inputs
...: K = 20
...: h, w = 15, 20
...: H, W = 200-h, 400-w
...: q = np.random.randint(0, 20, size=(H, W, K))
...:
In [129]: %timeit original_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop
In [130]: %timeit convolve2d_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop
In [131]: %timeit fftconvolve_app(q,K,h,w,H,W)
1 loops, best of 3: 233 ms per loop
In [132]: np.allclose(original_app(q,K,h,w,H,W),convolve2d_app(q,K,h,w,H,W))
Out[132]: True
In [133]: np.allclose(original_app(q,K,h,w,H,W),fftconvolve_app(q,K,h,w,H,W))
Out[133]: True
So, it seems the fftconvolve-based approach is doing really well there!

Python Multiple Simple Linear Regression

Note this is not a question about multiple regression, it is a question about doing simple (single-variable) regression multiple times in Python/NumPy (2.7).
I have two m x n arrays x and y. The rows correspond to each other, and each pair is the set of (x,y) points for a measurement. That is, plt.plot(x.T, y.T, '.') would plot each of m datasets/measurements.
I'm wondering what the best way to perform the m linear regressions is. Currently I loop over the rows and use scipy.stats.linregress(). (Assume I don't want solutions based on doing linear algebra with the matrices but instead want to work with this function, or an equivalent black-box function.) I could try np.vectorize, but the docs indicate it also loops.
With some experimenting, I've also found a way to use list comprehensions with map() and get correct results. I've put both solutions below. In IPython, %%timeit returns the following, using a small dataset (commented out in the code below):
(loop) 1000 loops, best of 3: 642 µs per loop
(map) 1000 loops, best of 3: 634 µs per loop
To try magnifying this, I made a much bigger random dataset (dimension trials x trials):
(loop, trials = 1000) 1 loops, best of 3: 299 ms per loop
(loop, trials = 10000) 1 loops, best of 3: 5.64 s per loop
(map, trials = 1000) 1 loops, best of 3: 256 ms per loop
(map, trials = 10000) 1 loops, best of 3: 2.37 s per loop
That's a decent speedup on a really big set, but I was expecting a bit more. Is there a better way?
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

np.random.seed(42)

#y = np.array(((0,1,2,3),(1,2,3,4),(2,4,6,8)))
#x = np.tile(np.arange(4), (3,1))
trials = 1000
y = np.random.rand(trials, trials)
x = np.tile(np.arange(trials), (trials, 1))

num_rows = np.shape(y)[0]
slope = np.zeros(num_rows)
inter = np.zeros(num_rows)
for k, xrow in enumerate(x):
    yrow = y[k, :]
    slope[k], inter[k], t1, t2, t3 = stats.linregress(xrow, yrow)

#plt.plot(x.T, y.T, '.')
#plt.hold = True
#plt.plot(x.T, x.T*slope + inter)

# Can the loop be removed?
tempx = [x[k, :] for k in range(num_rows)]
tempy = [y[k, :] for k in range(num_rows)]
results = np.array(map(stats.linregress, tempx, tempy))
slope_vec = results[:, 0]
inter_vec = results[:, 1]

#plt.plot(x.T, y.T, '.')
#plt.hold = True
#plt.plot(x.T, x.T*slope_vec + inter_vec)

print "Slopes equal by both methods?: ", np.allclose(slope, slope_vec)
print "Inters equal by both methods?: ", np.allclose(inter, inter_vec)
Single-variable linear regression is simple enough to vectorize manually:
def multiple_linregress(x, y):
    x_mean = np.mean(x, axis=1, keepdims=True)
    x_norm = x - x_mean
    y_mean = np.mean(y, axis=1, keepdims=True)
    y_norm = y - y_mean

    slope = (np.einsum('ij,ij->i', x_norm, y_norm) /
             np.einsum('ij,ij->i', x_norm, x_norm))
    intercept = y_mean[:, 0] - slope * x_mean[:, 0]

    return np.column_stack((slope, intercept))
With some made up data:
m = 1000
n = 1000
x = np.random.rand(m, n)
y = np.random.rand(m, n)
it outperforms your looping options by a fair margin:
%timeit multiple_linregress(x, y)
100 loops, best of 3: 14.1 ms per loop
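As a quick sanity check (a sketch only; it assumes the multiple_linregress function and the x, y, m defined above), the vectorized results should match a per-row scipy.stats.linregress:
import numpy as np
import scipy.stats as stats

# slope and intercept per row, computed the slow way for comparison
ref = np.array([stats.linregress(x[k], y[k])[:2] for k in range(m)])
print(np.allclose(multiple_linregress(x, y), ref))   # expected: True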

Faster way to do this using Python, Numpy?

I have a function in Python that I would like to make a lot faster. Does anyone have any tips?
import numpy as np

def garchModel(e2, omega=0.01, beta=0.1, gamma=0.8):
    sigma = np.empty(len(e2))
    sigma[0] = omega
    for i in np.arange(1, len(e2)):
        sigma[i] = omega + beta * sigma[i-1] + gamma * e2[i-1]
    return sigma
The following code works, but there's a lot of trickery going on, and I am not sure it doesn't depend on some undocumented implementation detail that could eventually break:
from numpy.lib.stride_tricks import as_strided
from numpy.core.umath_tests import inner1d

def garch_model(e2, omega=0.01, beta=0.1, gamma=0.8):
    n = len(e2)
    sigma = np.empty((n,))
    sigma[:] = omega
    sigma[1:] += gamma * e2[:-1]
    # Overlapping length-2 windows over sigma: row i is (sigma[i], sigma[i+1])
    sigma_view = as_strided(sigma, shape=(n-1, 2), strides=sigma.strides*2)
    # Row-wise dot products with [beta, 1], written back into sigma in order,
    # so each step sees the previously updated value (this is the fragile part)
    inner1d(sigma_view, [beta, 1], out=sigma[1:])
    return sigma
In [75]: e2 = np.random.rand(1e6)
In [76]: np.allclose(garchModel(e2), garch_model(e2))
Out[76]: True
In [77]: %timeit garchModel(e2)
1 loops, best of 3: 6.93 s per loop
In [78]: %timeit garch_model(e2)
100 loops, best of 3: 17.5 ms per loop
I have tried using Numba, which for my dataset gives a 200x improvement!
Thanks for all the suggestions above, but I can't get them to give me the correct answer. I will try to read up about linear filters, but it's Friday night now and I'm a bit too tired to take in any more information.
from numba import autojit

@autojit
def garchModel2(e2, beta=0.1, gamma=0.8, omega=0.01):
    sigma = np.empty(len(e2))
    sigma[0] = omega
    for i in range(1, len(e2)):
        sigma[i] = omega + beta * sigma[i-1] + gamma * e2[i-1]
    return sigma
This is a solution based on @stx2's idea. One potential problem is that beta**N may cause floating-point overflow when N becomes large (the same goes for cumprod).
>>> def garchModel2(e2, omega=0.01, beta=0.1, gamma=0.8):
        wt0 = cumprod(array([beta,]*(len(e2)-1)))
        wt1 = cumsum(hstack((0., wt0))) + 1
        wt2 = hstack((wt0[::-1], 1.))*gamma
        wt3 = hstack((1, wt0))[::-1]*beta
        pt1 = hstack((0., (array(e2)*wt2)[:-1]))
        pt2 = wt1*omega
        return cumsum(pt1)/wt3 + pt2
>>> garchModel([1,2,3,4,5])
array([ 0.01 , 0.811 , 1.6911 , 2.57911 , 3.467911])
>>> garchModel2([1,2,3,4,5])
array([ 0.01 , 0.811 , 1.6911 , 2.57911 , 3.467911])
>>> f1=lambda: garchModel2(range(5))
>>> f=lambda: garchModel(range(5))
>>> T=timeit.Timer('f()', 'from __main__ import f')
>>> T1=timeit.Timer('f1()', 'from __main__ import f1')
>>> T.timeit(1000)
0.01588106868331031
>>> T1.timeit(1000) # When e2 dimension is small, garchModel2 is slower
0.04536693909403766
>>> f1=lambda: garchModel2(range(10000))
>>> f=lambda: garchModel(range(10000))
>>> T.timeit(1000)
35.745981961394534
>>> T1.timeit(1000) #When e2 dimension is large, garchModel2 is faster
1.7330512676890066
>>> f1=lambda: garchModel2(range(1000000))
>>> f=lambda: garchModel(range(1000000))
>>> T.timeit(50)
167.33835501439427
>>> T1.timeit(50) #The difference is even bigger.
8.587259274572716
I didn't use beta**N but cumprod instead. ** will probably slow it down a lot.
Your calculation is a linear filter of the sequence omega + gamma*e2, so you can use scipy.signal.lfilter. Here's a version of your calculation, with appropriate tweaks of the initial conditions and input of the filter to generate the same output as garchModel:
import numpy as np
from scipy.signal import lfilter

def garch_lfilter(e2, omega=0.01, beta=0.1, gamma=0.8):
    # Linear filter coefficients:
    b = [1]
    a = [1, -beta]
    # Initial condition for the filter:
    zi = np.array([beta*omega])
    # Preallocate the output array, and set the first value to omega:
    sigma = np.empty(len(e2))
    sigma[0] = omega
    # Apply the filter to omega + gamma*e2[:-1]
    sigma[1:], zo = lfilter(b, a, omega + gamma*e2[:-1], zi=zi)
    return sigma
Verify that it gives the same result as @Jaime's function:
In [6]: e2 = np.random.rand(1e6)
In [7]: np.allclose(garch_model(e2), garch_lfilter(e2))
Out[7]: True
It is a lot faster than garchModel, but not as fast as @Jaime's function.
Timing for @Jaime's garch_model:
In [8]: %timeit garch_model(e2)
10 loops, best of 3: 21.6 ms per loop
Timing for garch_lfilter:
In [9]: %timeit garch_lfilter(e2)
10 loops, best of 3: 26.8 ms per loop
As @Jaime shows, there's a way. However, I don't know if there's a way to rewrite the function, make it much faster, and keep it simple.
An alternative approach, then, is to use optimization "magic" such as Cython or Numba.
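A minimal sketch of the Numba route (this assumes a recent numba with njit is available; garch_numba is a hypothetical name, and the recurrence is copied unchanged from garchModel):
import numpy as np
from numba import njit

@njit
def garch_numba(e2, omega=0.01, beta=0.1, gamma=0.8):
    # identical recurrence to garchModel, compiled to machine code by numba
    sigma = np.empty(len(e2))
    sigma[0] = omega
    for i in range(1, len(e2)):
        sigma[i] = omega + beta * sigma[i-1] + gamma * e2[i-1]
    return sigma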

Vectorizing ndimage functions for my code

I want to be able to vectorize this code:
import numpy as np
from scipy import ndimage

def sobHypot(rec):
    a, b, c = rec.shape
    hype = np.ones((a, b, c))
    for i in xrange(c):
        x = ndimage.sobel(abs(rec[..., i])**2, axis=0, mode='constant')
        y = ndimage.sobel(abs(rec[..., i])**2, axis=1, mode='constant')
        hype[..., i] = np.hypot(x, y)
        hype[..., i] = hype[..., i].mean()
    index = hype.argmax()
    return index
where rec.shape returns (1024, 1024, 20).
Here's how you can avoid the for-loop with the sobel filter:
import numpy as np
from scipy.ndimage import sobel

def sobHypot_vec(rec):
    r = np.abs(rec)
    x = sobel(r, 0, mode='constant')
    y = sobel(r, 1, mode='constant')
    h = np.hypot(x, y)
    h = np.apply_over_axes(np.mean, h, [0, 1])
    return h.argmax()
I'm not sure if the sobel filter is particularly necessary in your application, and this is hard to test without your particular 20-layer 'image', but you could try using np.gradient instead of running the sobel twice. The advantage is that gradient runs in three dimensions. You can ignore the component in the third, and take the hypot of just the first two. This seems wasteful but is actually still faster in my tests.
For a variety of randomly generated images, e.g. r = np.random.rand(1024,1024,20) + np.random.rand(1024,1024,20)*1j, this gives the same answer as your code, but test it to be sure, and possibly fiddle with the dx, dy arguments of np.gradient:
def grad_max(rec):
    g = np.gradient(np.abs(rec))[:2]  # ignore derivative in third dimension
    h = np.hypot(*g)
    h = np.apply_over_axes(np.mean, h, [0, 1])  # mean along first and second dimensions
    return h.argmax()
Using this code for timing:
def sobHypot_clean(rec):
    rs = rec.shape
    hype = np.ones(rs)
    r = np.abs(rec)
    for i in xrange(rs[-1]):
        ri = r[..., i]
        x = sobel(ri, 0, mode='constant')
        y = sobel(ri, 1, mode='constant')
        hype[..., i] = np.hypot(x, y).mean()
    return hype.argmax()
Timing:
In [1]: r = np.random.rand(1024,1024,20) + np.random.rand(1024,1024,20)*1j
# Original Post
In [2]: timeit sobHypot(r)
1 loops, best of 3: 9.85 s per loop
#cleaned up a bit:
In [3]: timeit sobHypot_clean(r)
1 loops, best of 3: 7.64 s per loop
# vectorized:
In [4]: timeit sobHypot_vec(r)
1 loops, best of 3: 5.98 s per loop
# using np.gradient:
In [5]: timeit grad_max(r)
1 loops, best of 3: 4.12 s per loop
Please test any of these functions on your own images to be sure they give the desired output, since different types of arrays could react differently from the simple random tests I did.
