I am trying to speed up the following code, which computes a kernel function [equation given as an image in the original post], where I only need to compute this function for x > y, both running from 0 to 1 (but I need very fine discretization, like dt = 0.001). I have vectorized my solution, but it is still not fast enough (I really need something like a 10x improvement). Any ideas? (I tried Cython, but it stayed slow because of the nature of the vectorization.)
import numpy as np
import time

def solveF(x, f, lam):
    nx = len(x)
    res = np.zeros((nx, nx))
    for i in range(0, nx):
        for j in range(0, nx):
            if i > j:
                res[i][j] = f*np.exp(lam*(x[i]-x[j]))
    return res
def fastKernelCalc(f, x, dx):
    nx = len(x)
    kappa = np.zeros((nx, nx))
    f2 = f.transpose()
    for i in range(nx):
        t1 = time.time()
        for j, xj in enumerate(x):
            kernel = 0
            if i-j > 0 and j != 0:
                kernel -= sum(np.diagonal(f, offset=j-i)[0:j])*dx
                for k in range(0, j):
                    kernel += sum(f2[k][k:k+i-j]*kappa[i-j+k][k:k+i-j])*dx*dx
            kappa[i][j] = kernel
    return kappa
X = 1
dx = 0.001
nx = int(round(X/dx))+1
spatial = np.linspace(0, X, nx)
f = solveF(spatial, 5, 5)
kernel = fastKernelCalc(f, spatial, dx)
My first thought was that if speed is paramount, you should probably use C or Fortran for numerical stuff. Python is great, but not fast.
Things that do pop out:
In the double loop in solveF, you could do

for j in range(0, i)

since nothing happens when j >= i. That won't save you much time, because no calculations were done in those iterations anyway, but it is something that could be improved.
Could you rewrite your equation so that you don't calculate the transpose of f? That could be computationally intensive if f is big.
I'm not a python expert, so this might be stupid, but I would avoid using "sum" and "diagonal". Sometimes (take this with a grain of salt) these generic functions have to do a lot of checks to ensure the operation can be done.
If this is of the utmost importance and worth the effort, I would add timers at different parts of the code to find out which part is the bottleneck, if there is a bottleneck.
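One more concrete idea: the double loop in solveF can be replaced entirely by broadcasting. A minimal sketch (it should produce the same matrix as the original solveF; solveF_vectorized is just an illustrative name):

import numpy as np

def solveF_vectorized(x, f, lam):
    # Pairwise differences x[i] - x[j] for all (i, j) at once
    diff = x[:, None] - x[None, :]
    res = f * np.exp(lam * diff)
    # Keep only the strictly lower triangle (i > j), zeros elsewhere
    return np.tril(res, k=-1)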
Hope this helps.
I am trying to do a quite memory-intensive multiplication and it seems that I am always filling up my RAM. The idea is that I have a 2D Gaussian centered at (0,0), and another 2D Gaussian whose distance from the (0,0) point changes in time. For each time step I need to compute the product of the two Gaussians on a specific grid and sum over all the indices: at each time step I should get sum_ij g1[i,j] * g2[i,j](t), so I end up with a 1D array of the same length as the time axis.
The code here is just pseudo-code. The problem is the creation of a 1001x1001x25000 array by xx[:,:,np.newaxis], which is huge:
import numpy as np
import numexpr as ne

def gaussian2d(x, y, x0, y0, x_std, y_std):
    return np.exp(-(x-x0)**2/x_std**2 - (y-y0)**2/y_std**2)

x = np.linspace(-5, 5, 1001)
y = np.linspace(-5, 5, 1001)
xx, yy = np.meshgrid(x, y)
g1 = gaussian2d(xx, yy, 0, 0, 0.25, 0.25)
x0 = np.random.rand(25000)
y0 = np.random.rand(25000)
X = np.subtract(xx[:,:,np.newaxis], x0)
Y = np.subtract(yy[:,:,np.newaxis], y0)
X_std = 0.75
Y_std = 0.75
temp = ne.evaluate('exp(-(X)**2/(2*X_std**2)-(Y)**2/(2*Y_std**2))')
final = np.sum(np.multiply(temp.T, g1), axis=(1,2))
A very slow alternative would be to just loop over the length of x0, but in the future x0 may be as long as 100000 points. The other solution would be to reduce the grid, but then I would lose resolution, and if the fixed function is not a Gaussian but something different, that may affect the calculations. Any suggestions?
There is no need for all such HUGE arrays. xx and yy, as well as X and Y, contain the same line/column repeated 1001 times, which is a huge waste of memory! RAM is a very scarce resource (both in throughput and in space), so you should avoid operating on very large arrays, and instead work at a scale that fits in the CPU caches, which are far faster, or even in CPU registers. You can rewrite the code using loops and use a JIT compiler like Numba (or a static compiler like Cython) so as to run this efficiently without any of the big arrays. In fact, thinking in terms of loops can help to optimize the operation further, even in pure NumPy (see later), so Numba/Cython is not strictly needed. Here is a naive implementation:
import numpy as np
import numba as nb

@nb.njit('(f8[:], f8[:], i8, i8, f8[:,::1])', parallel=True)
def compute(x0, y0, N, T, g1):
    X_std = 0.75
    Y_std = 0.75
    final = np.empty(T)
    for i in nb.prange(T):
        s = 0.0
        for k in range(N):
            for j in range(N):
                X = x[k] - x0[i]
                Y = y[j] - y0[i]
                temp = np.exp(-(X)**2/(2*X_std**2) - (Y)**2/(2*Y_std**2))
                s += g1[k, j] * temp
        final[i] = s
    return final
N = 1001
T = 25000
# [...] (same code as in the question without the big temporary arrays)
final = compute(x0, y0, N, T, g1)
This code is much faster and uses only a tiny amount of RAM compared to the initial code, which could not even run on a regular PC. The same strategy can be used to compute g1, so as not to create xx and yy either (a sketch follows).
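For instance, a minimal sketch of building g1 without meshgrid, broadcasting the 1D axes against each other (it should match gaussian2d(xx, yy, 0, 0, 0.25, 0.25) from the question):

# g1[i, j] == exp(-x[j]**2/0.25**2 - y[i]**2/0.25**2), matching meshgrid's 'xy' layout
g1 = np.exp(-(x[None, :]**2) / 0.25**2 - (y[:, None]**2) / 0.25**2)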
Actually, compute above is not even optimal. One can split the exponential expression into two parts so as to pre-compute partial results using only basic math. The computation can then be factorized to reduce the number of mathematical operations even more. Here is a better implementation:
@nb.njit('(f8[:], f8[:], i8, i8, f8[:,::1])', parallel=True)
def compute_clever(x0, y0, N, T, g1):
    X_std = 0.75
    Y_std = 0.75
    final = np.empty(T)
    exp1 = np.empty((T, N))
    exp2 = np.empty((T, N))
    for i in nb.prange(T):
        for k in range(N):
            X = x[k] - x0[i]
            exp1[i, k] = np.exp(-(X)**2/(2*X_std**2))
    for i in nb.prange(T):
        for j in range(N):
            Y = y[j] - y0[i]
            exp2[i, j] = np.exp(-(Y)**2/(2*Y_std**2))
    for i in nb.prange(T):
        s = 0.0
        for k in range(N):
            s2 = 0.0
            for j in range(N):
                s2 += g1[k, j] * exp2[i, j]
            s += s2 * exp1[i, k]
        final[i] = s
    return final
Here are the results with N=1001 and T=250 on my 6-core machine:

Naive Numpy: 2380 ms (uses about 4 GiB of RAM)
compute: 374 ms (uses a few MiB of RAM)
compute_clever: 55 ms (uses a few MiB of RAM)
Note that the code can be further optimized using register blocking, though it would make the code more complex. Also note that the last kernel can certainly be computed efficiently using np.einsum, and exp1 and exp2 can also be computed with basic NumPy operations (though a bit less efficiently). Thus, you could even solve this in pure NumPy, as sketched below.
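A minimal sketch of that pure-NumPy idea, assuming x, y, x0, y0 and g1 are defined as in the question (optimize=True lets einsum pick a good contraction order):

X_std = Y_std = 0.75
exp1 = np.exp(-(x[None, :] - x0[:, None])**2 / (2 * X_std**2))  # shape (T, N)
exp2 = np.exp(-(y[None, :] - y0[:, None])**2 / (2 * Y_std**2))  # shape (T, N)
# final[i] = sum over k and j of exp1[i, k] * g1[k, j] * exp2[i, j]
final = np.einsum('ik,kj,ij->i', exp1, g1, exp2, optimize=True)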
So, I need help minimizing the time it takes to run this code with large amounts of data, using only NumPy. I think the for loops make my code inefficient, but I do not know how to turn them into something like a list comprehension, which might help it run faster.
def lagrange(p, node, n, x):
    m = []
    # base lagrange polynomial
    for i in range(n):
        for j in range(p+1):
            L = 1
            for k in range(p+1):
                if k != j:
                    L = L*(x[i] - node[k])/(node[j] - node[k])
            m.append(L)
    lagrange = np.array(m).reshape(n, p+1)
    return lagrange
def interpolant(a, b, p, n, x, f):
    m = []
    node = np.linspace(a, b, p+1)
    for j in range(n):
        polynomial = 0
        for i in range(p+1):
            polynomial += f(node[i]) * lagrange(p, node, n, x)[j, i]
        m.append(polynomial)
    interpolant = np.array(m)
    return interpolant
It appears the value of lagrange_poly(...) is recomputed n*(p+1) times for no reason, which is very, very expensive! You can compute it once before the loop, store it in a variable, and reuse that variable later.
Here is the fixed code:
def uniform_poly_interpolation(a, b, p, n, x, f, produce_fig):
    inter = []
    xhat = np.linspace(a, b, p+1)
    # Compute the Lagrange matrix once, before the loops
    mat = lagrange_poly(p, xhat, n, x, 1e-10)[0]
    # use for loop to iterate interpolant
    for j in range(n):
        po = 0
        for i in range(p+1):
            po += f(xhat[i]) * mat[j, i]
        inter.append(po)
    interpolant = np.array(inter)
    return interpolant
This should be much much faster.
Moreover, the execution is slow because accessing scalar values of NumPy arrays from CPython is very slow. NumPy is designed to work with whole arrays, not to extract scalar values in loops. Additionally, loops in the CPython interpreter are relatively slow. You can solve this problem efficiently with Numba, which compiles your code to very fast native code using a JIT compiler.
Here is the Numba code:
import numpy as np
import numba as nb

@nb.njit
def lagrange_poly(p, xhat, n, x, tol):
    error_flag = 0
    er = 1
    lagrange_matrix = np.empty((n, p+1), dtype=np.float64)
    # Flag nearly-coincident nodes
    for l in range(p):
        if abs(xhat[l] - xhat[l+1]) < tol:
            error_flag = er
    # Base lagrange polynomial
    for i in range(n):
        for j in range(p+1):
            L = 1.0
            for k in range(p+1):
                if k != j:
                    L = L * (x[i] - xhat[k]) / (xhat[j] - xhat[k])
            lagrange_matrix[i, j] = L
    return lagrange_matrix, error_flag
Overall, this should be several orders of magnitude faster.
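A quick usage sketch, with hypothetical sample inputs:

xhat = np.linspace(0, 1, 5)   # p+1 = 5 interpolation nodes
x = np.linspace(0, 1, 1000)   # n = 1000 evaluation points
mat, err = lagrange_poly(4, xhat, 1000, x, 1e-10)
print(mat.shape, err)         # (1000, 5) 0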
I tried implementing the forward substitution method, a solving process for the problem Lx = b, where L is a lower triangular matrix and x, b are vectors.
This was an easy task:
def tri_solve(L, b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0] / L[0, 0]
    for i in range(1, n):
        comp = 0
        for k in range(0, i):
            index = L[i, k]
            preSolution = x[k]
            comp = comp + index * preSolution
        x[i] = 1/L[i, i] * (b[i] - comp)
    return x
Now I compared my calculation times for different-sized matrices several times against linalg.solve from the SciPy module, and it turns out that it is much faster. This makes sense to some extent, since SciPy is written in C and C++, but I still expected similar or better calculation times for matrices up to 10x10. Beginning with 6x6 matrices, linalg.solve becomes slightly faster on average.
Is there a way to improve my rather simple solution?
You could try scipy.linalg.solve_triangular.
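A minimal usage sketch (lower=True tells it the matrix is lower triangular, so it performs forward substitution in compiled code; the example matrix is made well conditioned by boosting the diagonal):

import numpy as np
from scipy.linalg import solve_triangular

N = 20
L = np.tril(np.random.randn(N, N)) + N*np.eye(N)  # well-conditioned lower triangular matrix
b = np.random.randn(N)
x = solve_triangular(L, b, lower=True)
assert np.allclose(L @ x, b)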
If you want to accelerate your code, you could vectorize the inner loop:

def tri_solve(L, b):
    n = len(b)
    x = np.zeros(n)
    x[0] = b[0] / L[0, 0]
    for i in range(1, n):
        comp = np.sum(L[i, :i] * x[:i])
        x[i] = 1/L[i, i] * (b[i] - comp)
    return x
Edit: How to use it
You have to pass a square lower triangular matrix as the first argument and a 1D array as the second argument:
N = 20
A = np.tril(np.random.randn(N, N))
b = np.random.randn(N)
assert np.allclose(np.linalg.solve(A, b), tri_solve(A, b))
Of course, this is a naive implementation and is not numerically stable; you can't use it to solve very large or ill-conditioned systems.
I am trying to solve the dynamics of a network composed of N=400 neurons.
That means I have 400 coupled equations that obey the following rules:
i = 0,1,2...399
J(i,j) = some function of i and j (where j is a dummy variable)
I(i) = some function of i
dr(i,t)/dt = -r(i,t) + sum over j from 0 to 399 [J(i,j)*r(j)] + I(i)

How do I solve this? I know how to handle a system of 3 ODEs: define the 3 ODEs and the initial conditions, then apply odeint. Is there a better way to proceed in this case?
So far I tried the following code (it isn't good since it enters an infinite loop):
import numpy as np
from scipy.integrate import odeint

N = 400
t = np.linspace(0, 20, 1000)
J0 = 0.5
J1 = 2.5
I0 = 0.5
I1 = 0.001
i = np.arange(0, 400, 1)
theta = (2*np.pi)*i/N
I = I0 + I1*np.cos(theta)
r = np.zeros(400)
drdt = np.zeros(400)
x0 = [np.random.rand() for ii in i]

def threshold(y):
    if y > 0:
        return y
    else:
        return 0

def vectors(x, t):
    for ii in i:
        r[ii] = x[ii]
    for ii in i:
        drdt[ii] = -r[ii] + threshold(I[ii] + sum(r[iii]*(J0 + J1*np.cos(theta[ii] - theta[iii]))/N for iii in i))
    return drdt

x = odeint(vectors, x0, t)
After making what I think are the obvious corrections and additions to your code, I was able to run it. It was not actually in an infinite loop; it was just very slow. You can greatly improve the performance by "vectorizing" your calculations as much as possible, which allows the loops to be computed in C code rather than Python. A hint that there is room for a lot of improvement is in the expression sum over j from 0 to 399 [J(i,j)*r(j)]. That is another way of expressing the product of a matrix J and a vector r, so we should really have something like J @ r in the code, and not all those explicit Python loops.
After some more tweaking, here's a modified version of your code. It is significantly faster than the original. I also reorganized a bit, and added a plot.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def computeIJ(N):
    i = np.arange(N)
    theta = (2*np.pi)*i/N
    I0 = 0.5
    I1 = 0.001
    I = I0 + I1*np.cos(theta)
    J0 = 0.5
    J1 = 2.5
    delta_theta = np.subtract.outer(theta, theta)
    J = J0 + J1*np.cos(delta_theta)
    return I, J / N

def vectors2(r, t, I, J):
    s = J @ r
    drdt = -r + np.maximum(I + s, 0)
    return drdt

N = 400
I, J = computeIJ(N)
np.random.seed(123)
r0 = np.random.rand(N)
t = np.linspace(0, 20, 1000)
r = odeint(vectors2, r0, t, args=(I, J))

for i in [0, 100, 200, 300, 399]:
    plt.plot(t, r[:, i], label='i = %d' % i)
plt.xlabel('t')
plt.legend(shadow=True)
plt.grid()
plt.show()
The plot generated by the script (not reproduced here) shows r(t) for i = 0, 100, 200, 300 and 399.
I have 10k data points like this:
0.010222
0.010345
0.010465
0.010611
0.010768
0.010890
0.011049
0.011206
0.011329
0.011465
0.011613
0.11763
0.011888
0.012015
0.012154
0.012282
0.012408
0.012524
....
I want to calculate the Lyapunov exponent for this series. This is what I've done so far:

import math
import numpy as np

# data holds the series above; N = len(data)
lyapunovs = []
eps = 0.0001
for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(1, min(N - i, N - j)):
                d0 = np.abs(data[i] - data[j])
                dn = np.abs(data[i + k] - data[j + k])
                lyapunovs.append(math.log(dn) - math.log(d0))  # problem
My problem is that I don't know whether the first Lyapunov exponent is the average of all the lyapunovs when k = 1, or the average of all the lyapunovs taken the first time that data[i] - data[j] < eps.
Is this the right implementation of the Lyapunov exponent?
[The question also included an image of the formula for the numerical calculation of the Lyapunov exponent.]
I would calculate the Lyapunov exponent in this way, and then output the results as tuples to a file (see this blog post: https://blog.abhranil.net/2014/07/22/calculating-the-lyapunov-exponent-of-a-time-series-with-python-code/):
from math import log
import numpy as np

with open('data.txt', 'r') as f:
    data = [float(i) for i in f.read().split()]

N = len(data)
eps = 0.001
lyapunovs = [[] for i in range(N)]

for i in range(N):
    for j in range(i + 1, N):
        if np.abs(data[i] - data[j]) < eps:
            for k in range(min(N - i, N - j)):
                lyapunovs[k].append(log(np.abs(data[i+k] - data[j+k])))

with open('lyapunov.txt', 'w') as f:
    for i in range(len(lyapunovs)):
        if len(lyapunovs[i]):
            string = str((i, sum(lyapunovs[i]) / len(lyapunovs[i])))
            f.write(string + '\n')
I see from the chosen loop structure in the question that a triangle of the Cartesian product of the points is being used. This might improve the estimate of the derivatives, which are susceptible to noise, but it is not explicitly part of the Lyapunov exponent. See this example of the calculation on a known function in the absence of measurement error. Feel free to look into that aspect more, but below I will assume the comparison of signal points that are adjacent in time.
Your original question uses NumPy, so I will also make use of it. One of the rules of thumb for using NumPy well is to avoid loops, although it is possible to vectorize functions that contain loops. With no explicit time measurements and no repeated values, you could simply do:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
np.mean(np.log(np.abs(np.diff(x))))
Or if the signal is paired with an array of timepoints, then the numerical derivative can involve time:
import numpy as np
x = np.random.normal(0,1,size=10**4) # Mock signal data
t = np.arange(10**4) # Mock time data
np.mean(np.log(np.abs(np.diff(x) / np.diff(t))))
However, in some datasets it is possible for adjacent values to repeat! This can occur when you've measured the signal only to a few decimal places, and it is a problem because it leads to np.log(0) (= -np.inf), which will blow up your calculation. A simple solution is to remove the duplicated values (a sketch follows), but this will only be suitable if duplicates are relatively rare and you have a large sample size. It is possible to estimate an upper bound on the L-exponent by considering the precision of your measurements, but that is not an estimate of the L-exponent itself.
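A minimal sketch of the duplicate-removal idea, using mock low-precision data:

import numpy as np

x = np.round(np.random.normal(0, 1, size=10**4), 2)  # mock signal, rounded to 2 decimals
# Keep the first point plus every point that differs from its predecessor,
# so np.diff never produces an exact zero
mask = np.insert(np.diff(x) != 0, 0, True)
x_dedup = x[mask]
print(np.mean(np.log(np.abs(np.diff(x_dedup)))))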
I just want to mention that knowing the literal expression of the system is best. I will take the logistic map equation as an example:
import numpy as np
import matplotlib.pyplot as plt

def logisticmap(x_init, r, length):
    x = [x_init]
    for t in range(length):
        x.append(r*x[-1]*(1-x[-1]))
    return np.array(x)
Now let's generate the data:

x = logisticmap(0.2, 3.92, 1000)
plt.plot(x)
plt.show()
Here is the solution proposed by Galan:

np.mean(np.log(abs(np.diff(x))))

which gives: -1.0379
When you instead derive the Lyapunov exponent from the logistic map equation itself:

r = 3.92  # the parameter used to generate the series
np.mean(np.log(abs(r*(1-2*x))))

it gives: 0.538296
This is the actual true value of the Lyapunov exponent: since the system is in its chaotic regime, it must be positive. So I guess the evaluation from data points is not working in this example; you can try with more data points, but it will still give you a negative LE.
Unfortunately I don't know enough to guide you towards a better estimation of the Lyapunov exponent when you can't derive a mathematical expression, but I would be interested to know!
I tried to reduce the computational complexity with NumPy vectorization.

import numpy as np

def lyapunov_exponent(series: np.ndarray, threshold: float) -> np.ndarray:
    N = len(series)
    eps = threshold
    L = [np.array([0]*N)]
    for i in range(1, N):
        diff = np.abs(series[i:] - series[:-i])
        dist = np.log(diff)
        L.append(np.concatenate([[0]*i, dist]))
    L = np.array(L)
    tf_L = np.where(L < eps, 1, 0)
    count_L = np.zeros_like(tf_L)
    for i in range(N):
        indices = (np.array(range(0, N-i)), np.array(range(i, N)))
        count_L[indices] = np.cumsum(tf_L[indices])
    avg = np.sum(count_L * L, axis=0) / np.sum(count_L, axis=0)
    return avg
If there is room for improvement, or if you get a different result from those already posted, please reply.