I am doing a double loop to sum a function that has mesh grids as an input. The problem is that it runs very slow... I want to optimize the code with an alternative procedure, maybe using vectorize function of numpy, but I don't see how can be implemented. I show you the code that I have:
import numpy as np
import time
Lxx = 2.
Lyy = 1.0
dxx = dyy = 0.01
nxx = 100
nyy = 100
XX, YY = np.meshgrid(np.arange(0, Lxx+dxx, dxx), np.arange(0, Lyy+dyy, dyy)) #mesh grid
def solution(xx,yy,nnmax,mmmax):
sol = 0.
for m in range(nnmax):
for n in range(mmmax):
sol = sol+np.sin(XX*0.356*n)+np.cos(YY*2.3*m)
return sol
start = time.time()
solution(XX,YY,nxx,nyy)
end = time.time()
print ("TIME", end-start)
What I want is to make the sum for large values in nxx, nyy. But of course then it takes a lot of time...This is the reason why I want optimize the code.
If you notice, the terms of the sum are completely separable: they don't share any loop variables. You can therefore create independent (smaller) arrays for the sum over XX, n and YY, m, and take the trig functions and sum of those. The final grid can be accumulated by broadcasting.
To begin, don't bother making the grid:
x = np.arange(0, Lxx+dxx, dxx)
y = np.arange(0, Lyy+dyy, dyy)
Compute a single sum using broadcasting:
n = np.arange(nyy)[:, None]
m = np.arange(nxx)[:, None]
sumx = np.sin(x + 0.356 * n).sum(0)
sumy = np.cos(y + 2.3 * m).sum(0)
You can use the same broadcasting trick to get the final sum in a grid:
result = sumx[:, None] + sumy
Related
I want to write a program which turns a 2nd order differential equation into two ordinary differential equations but I don't know how I can do that in Python.
I am getting lots of errors, please help in writing the code correctly.
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
import numpy as np
N = 30 # Number of coupled oscillators.
alpha=0.25
A = 1.0
# Initial positions.
y[0] = 0 # Fix the left-hand side at zero.
y[N+1] = 0 # Fix the right-hand side at zero.
# The range(1,N+1) command only prints out [1,2,3, ... N].
for p in range(1, N+1): # p is particle number.
y[p] = A * np.sin(3 * p * np.pi /(N+1.0))
####################################################
# Initial velocities.
####################################################
v[0] = 0 # The left and right boundaries are
v[N+1] = 0 # clamped and don't move.
# This version sets them all the particle velocities to zero.
for p in range(1, N+1):
v[p] = 0
w0 = [v[p], y[p]]
def accel(t,w):
v[p], y[p] = w
global a
a[0] = 0.0
a[N+1] = 0.0
# This version loops explicitly over all the particles.
for p in range(1,N+1):
a[p] = [v[p], y(p+1)+y(p-1)-2*y(p)+ alpha * ((y[p+1] - y[p])**2 - (y[p] - y[p-1])**2)]
return a
duration = 50
t = np.linspace(0, duration, 800)
abserr = 1.0e-8
relerr = 1.0e-6
solution = solve_ivp(accel, [0, duration], w0, method='RK45', t_eval=t,
vectorized=False, dense_output=True, args=(), atol=abserr, rtol=relerr)
Most general-purpose solvers do not do structured state objects. They just work with a flat array as representation of the state space points. From the construction of the initial point you seem to favor the state space ordering
[ v[0], v[1], ... v[N+1], y[0], y[1], ..., y[N+1] ]
This allows to simply split both and to assemble the derivatives vector from the velocity and acceleration arrays.
Let's keep things simple and separate functionality in small functions
a = np.zeros(N+2)
def accel(y):
global a ## initialized to the correct length with zero, avoids repeated allocation
a[1:-1] = y[2:]+y[:-2]-2*y[1:-1] + alpha*((y[2:]-y[1:-1])**2-(y[1:-1]-y[:-2])**2)
return a
def derivs(t,w):
v,y = w[:N+2], w[N+2:]
return np.concatenate([accel(y), v])
or keeping the theme of avoiding allocations
dwdt = np.zeros(2*N+4)
def derivs(t,w):
global dwdt
v,y = w[:N+2], w[N+2:]
dwdt[:N+2] = accel(y)
dwdt[N+2:] = v
return dwdt
Now you only need to set
w0=np.concatenate([v,y])
to rapidly get to a more interesting class of errors.
The problem is that I would like to be able to integrate the differential equations starting for each point of the grid at once instead of having to loop over the scipy integrator for each coordinate. (I'm sure there's an easy way)
As background for the code I'm trying to solve the trajectories of a Couette flux alternating the direction of the velocity each certain period, that is a well known dynamical system that produces chaos. I don't think the rest of the code really matters as the part of the integration with scipy and my usage of the meshgrid function of numpy.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, writers
from scipy.integrate import solve_ivp
start_T = 100
L = 1
V = 1
total_run_time = 10*3
grid_points = 10
T_list = np.arange(start_T, 1, -1)
x = np.linspace(0, L, grid_points)
y = np.linspace(0, L, grid_points)
X, Y = np.meshgrid(x, y)
condition = True
totals = np.zeros((start_T, total_run_time, 2))
alphas = np.zeros(start_T)
i = 0
for T in T_list:
alphas[i] = L / (V * T)
solution = np.array([X, Y])
for steps in range(int(total_run_time/T)):
t = steps*T
if condition:
def eq(t, x):
return V * np.sin(2 * np.pi * x[1] / L), 0.0
condition = False
else:
def eq(t, x):
return 0.0, V * np.sin(2 * np.pi * x[1] / L)
condition = True
time_steps = np.arange(t, t + T)
xt = solve_ivp(eq, time_steps, solution)
solution = np.array([xt.y[0], xt.y[1]])
totals[i][t: t + T][0] = solution[0]
totals[i][t: t + T][1] = solution[1]
i += 1
np.save('alphas.npy', alphas)
np.save('totals.npy', totals)
The error given is :
ValueError: y0 must be 1-dimensional.
And it comes from the 'solve_ivp' function of scipy because it doesn't accept the format of the numpy function meshgrid. I know I could run some loops and get over it but I'm assuming there must be a 'good' way to do it using numpy and scipy. I accept advice for the rest of the code too.
Yes, you can do that, in several variants. The question remains if it is advisable.
To implement a generally usable ODE integrator, it needs to be abstracted from the models. Most implementations do that by having the state space a flat-array vector space, some allow a vector space engine to be passed as parameter, so that structured vector spaces can be used. The scipy integrators are not of this type.
So you need to translate the states to flat vectors for the integrator, and back to the structured state for the model.
def encode(X,Y): return np.concatenate([X.flatten(),Y.flatten()])
def decode(U): return U.reshape([2,grid_points,grid_points])
Then you can implement the ODE function as
def eq(t,U):
X,Y = decode(U)
Vec = V * np.sin(2 * np.pi * x[1] / L)
if int(t/T)%2==0:
return encode(Vec, np.zeros(Vec.shape))
else:
return encode(np.zeros(Vec.shape), Vec)
with initial value
U0 = encode(X,Y)
Then this can be directly integrated over the whole time span.
Why this might be not such a good idea: Thinking of each grid point and its trajectory separately, each trajectory has its own sequence of adapted time steps for the given error level. In integrating all simultaneously, the adapted step size is the minimum over all trajectories at the given time. Thus while the individual trajectories might have only short intervals with very small step sizes amid long intervals with sparse time steps, these can overlap in the ensemble to result in very small step sizes everywhere.
If you go beyond the testing stage, switch to a more compiled solver implementation, odeint is a Fortran code with wrappers, so half a solution. JITcode translates to C code and links with the compiled solver behind odeint. Leaving python you get sundials, the diffeq module of julia-lang, or boost::odeint.
TL;DR
I don't think you can "integrate the differential equations starting for each point of the grid at once".
MWE
Please try to provide a MWE to reproduce your problem, like you said : "I don't think the rest of the code really matters", and it makes it harder for people to understand your problem.
Understanding how to talk to the solver
Before answering your question, there are several things that seem to be misunderstood :
by defining time_steps = np.arange(t, t + T) and then calling solve_ivp(eq, time_steps, solution) : the second argument of solve_ivp is the time span you want the solution for, ie, the "start" and "stop" time as a 2-uple. Here your time_steps is 30-long (for the first loop), so I would probably replace it by (t, t+T). Look for t_span in the doc.
from what I understand, it seems like you want to control each iteration of the numerical resolution : that's not how solve_ivp works. More over, I think you want to switch the function "eq" at each iteration. Since you have to pass the "the right hand side" of the equation, you need to wrap this behavior inside a function. It would not work (see right after) but in terms of concept something like this:
def RHS(t, x):
# unwrap your variables, condition is like an additional variable of your problem,
# with a very simple differential equation
x0, x1, condition = x
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x[1] / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x[1] / L)
# compute new result for condition
condition_out = not(condition)
return [x0_out, x1_out, condition_out]
This would not work because the evolution of condition doesn't satisfy some mathematical properties of derivation/continuity. So condition is like a boolean switch that parametrizes the model, we can use global to control the state of this boolean :
condition = True
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
finaly, and this is the ValueError you mentionned in your post : you define solution = np.array([X, Y]) which actually is initial condition and supposed to be "y0: array_like, shape (n,)" where n is the number of variable of the problem (in the case of [x0_out, x1_out] that would be 2)
A MWE for a single initial condition
All that being said, lets start with a simple MWE for a single starting point (0.5,0.5), so we have a clear view of how to use the solver :
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
# initial conditions for x0, x1, and condition
initial = [0.5, 0.5]
condition = True
# time span
t_span = (0, 100)
# constants
V = 1
L = 1
# define the "model", ie the set of equations of t
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
solution = solve_ivp(RHS_eq, # Right Hand Side of the equation(s)
t_span, # time span, a 2-uple
initial, # initial conditions
)
fig, ax = plt.subplots()
ax.plot(solution.t,
solution.y[0],
label="x0")
ax.plot(solution.t,
solution.y[1],
label="x1")
ax.legend()
Final answer
Now, what we want is to do the exact same thing but for various initial conditions, and from what I understand, we can't : again, quoting the doc
y0 : array_like, shape (n,) : Initial state. . The solver's initial condition only allows one starting point vector.
So to answer the initial question : I don't think you can "integrate the differential equations starting for each point of the grid at once".
I tried to model voxels of 3D cylinder with the following code:
import math
import numpy as np
R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)
def density_f(x, y, z):
r_xy = math.sqrt(x ** 2 + y ** 2)
if r_xy <= R0 and -hz <= z <= hz:
return 1
else:
return 0
density = np.vectorize(density_f)(xx, yy, zz)
and it took many minutes to compute.
Equivalent suboptimal Java code runs 10-15 seconds.
How to make Python compute voxels at the same speed? Where to optimize?
Please do not use .vectorize(..), it is not efficient since it will still do the processing at the Python level. .vectorize() should only be used as a last resort if for example the function can not be calculated in "bulk" because its "structure" is too complex.
But you do not need to use .vectorize here, you can implement your function to work over arrays with:
r_xy = np.sqrt(xx ** 2 + yy ** 2)
density = (r_xy <= R0) & (-hz <= zz) & (zz <= hz)
or even a bit faster:
r_xy = xx * xx + yy * yy
density = (r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)
This will construct a 2000×2000×20 array of booleans. We can use:
intdens = density.astype(int)
to construct an array of ints.
Printing the array here is quite combersome, but it contains a total of 2'356'047 ones:
>>> density.astype(int).sum()
2356047
Benchmarks: If I run this locally 10 times, I get:
>>> timeit(f, number=10)
18.040479518999973
>>> timeit(f2, number=10) # f2 is the optimized variant
13.287886952000008
So on average, we calculate this matrix (including casting it to ints) in 1.3-1.8 seconds.
You can also use a compiled version of the function to calculate density. You can use cython or numba for that. I use numba to jit compile the density calculation function in the ans, as it is as easy as putting in a decorator.
Pros :
You can write if conditions as you mention in your comments
Slightly faster than the numpy version mentioned in the ans by
#Willem Van Onsem, as we have to iterate through the boolean array to
calculate the sum in density.astype(int).sum().
Cons:
Write an ugly three level loop. Looses the beauty of the singlish liner numpy solution.
Code:
import numba as nb
#nb.jit(nopython=True, cache=True)
def calc_density(xx, yy, zz, R0, hz):
threshold = R0 * R0
dimensions = xx.shape
density = 0
for i in range(dimensions[0]):
for j in range(dimensions[1]):
for k in range(dimensions[2]):
r_xy = xx[i][j][k] ** 2 + yy[i][j][k] ** 2
if(r_xy <= threshold and -hz <= zz[i][j][k] <= hz):
density+=1
return density
Running times:
Willem Van Onsem solution, f2 variant : 1.28s without sum, 2.01 with sum.
Numba solution( calc_density, on second run, to discount the compile time) : 0.48s.
As suggested in the comments, we need not calculate the meshgrid also. We can directly pass the x, y, z to the function. Thus:
#nb.jit(nopython=True, cache=True)
def calc_density2(x, y, z, R0, hz):
threshold = R0 * R0
dimensions = len(x), len(y), len(z)
density = 0
for i in range(dimensions[0]):
for j in range(dimensions[1]):
for k in range(dimensions[2]):
r_xy = x[i] ** 2 + y[j] ** 2
if(r_xy <= threshold and -hz <= z[k] <= hz):
density+=1
return density
Now, for fair comparison, we also include the time of np.meshgrid in #Willem Van Onsem's ans.
Running times:
Willem Van Onsem solution, f2 variant(np.meshgrid time included) : 2.24s
Numba solution( calc_density2, on second run, to discount the compile time) : 0.079s.
This is meant as a lengthy comment on the answer of Deepak Saini.
The main change is to not use the coordinates generated by np.meshgrid which contains unecessary repetitions. This isn't recommandable if you can avoid it (both in terms of memory usage and performance)
Code
import numba as nb
import numpy as np
#nb.jit(nopython=True,parallel=True)
def calc_density_2(x, y, z,R0,hz):
threshold = R0 * R0
density = 0
for i in nb.prange(y.shape[0]):
for j in range(x.shape[0]):
r_xy = x[j] ** 2 + y[i] ** 2
for k in range(z.shape[0]):
if(r_xy <= threshold and -hz <= z[k] <= hz):
density+=1
return density
Timings
R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)
#after the first call (compilation overhead)
#calc_density_2 9.7 ms
#calc_density_2 parallel 3.9 ms
##Deepak Saini 115 ms
I am trying to implement a finite difference approximation to solve the Heat Equation, u_t = k * u_{xx}, in Python using NumPy.
Here is a copy of the code I am running:
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# parameters
L = 1 # legnth of the rod
T = 10 # terminal time
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
t[m] = m * dt
# Initial Conditions
for j in xrange(0, N):
x[j] = j * dx
# definition of solution u(x,t) to u_t = k * u_xx
u = np.zeros((N, M+1)) # array to store values of the solution
# Finite Difference Scheme:
u[:,0] = x**2 #initial condition
for m in xrange(0, M):
for j in xrange(1, N-1):
if j == 1:
u[j-1,m] = 0 # Boundary condition
elif j == N-1:
u[j+1,m] = 0
else:
u[j,m+1] = u[j,m] + s * ( u[j+1,m] -
2 * u[j,m] + u[j-1,m] )
print u, #t, x
plt.plot(u, t)
#plt.show()
I think my code is working properly and it is producing an output. I want to plot the output of the solution u versus t (my time vector). If I can plot the graph then I am able to check if my numerical approximation agrees with the expected phenomena for the Heat Equation. However, I am getting the error that "x and y must have same first dimension". How can I correct this issue?
An additional question: Am I better off attempting to make an animation with matplotlib.animation instead of using matplotlib.plyplot ???
Thanks so much for any and all help! It is very greatly appreciated!
Okay so I had a "brain dump" and tried plotting u vs. t sort of forgetting that u, being the solution to the Heat Equation (u_t = k * u_{xx}), is defined as u(x,t) so it has values for time. I made the following correction to my code:
print u #t, x
plt.plot(u)
plt.show()
And now my programming is finally displaying an image. And here it is:
It is absolutely beautiful, isn't it?
I wonder if there is a possibility to specify the shift expressed by k variable for the cross-correlation of two 1D arrays. Because with the numpy.correlate function and its mode parameter set to 'full' I will get cross-correlate coefficients for each k shift for whole length of the taken array (assuming that both arrays are the same size). Let me show you what I mean exactly on below example:
import numpy as np
# Define signal 1.
signal_1 = np.array([1, 2 ,3])
# Define signal 2.
signal_2 = np.array([1, 2, 3])
# Other definitions.
Xi = signal_1
Yi = signal_2
N = np.size(Xi)
k = 3
Xs = np.average(Xi)
Ys = np.average(Yi)
# Cross-covariance coefficient function.
def crossCovariance(Xi, Yi, N, k, Xs, Ys, forCorrelation = False):
autoCov = 0
for i in np.arange(0, N-k):
autoCov += ((Xi[i+k])-Xs)*(Yi[i]-Ys)
if forCorrelation == True:
return autoCov/N
else:
return (1/(N-1))*autoCov
# Expected value function.
def E(X, P):
expectedValue = 0
for i in np.arange(0, np.size(X)):
expectedValue += X[i] * (P[i] / np.size(X))
return expectedValue
# Cross-correlation coefficient function.
def crossCorrelation(Xi, Yi, k):
# Calculate the covariance coefficient.
cov = crossCovariance(Xi, Yi, N, k, Xs, Ys, forCorrelation = True)
# Calculate standard deviations.
EX = E(Xi, np.ones(np.size(Xi)))
SDX = (E((Xi - EX) ** 2, np.ones(np.size(Xi)))) ** (1/2)
EY = E(Yi, np.ones(np.size(Yi)))
SDY = (E((Yi - EY) ** 2, np.ones(np.size(Yi)))) ** (1/2)
# Calculate correlation coefficient.
return cov / (SDX * SDY)
# Express cross-covariance or cross-correlation function in a form of a 1D vector.
def array(k, norm = True):
# If norm = True, return array of autocorrelation coefficients.
# If norm = False, return array of autocovariance coefficients.
vector = np.array([])
shifts = np.abs(np.arange(-k, k+1, 1))
for i in shifts:
if norm == True:
vector = np.append(crossCorrelation(Xi, Yi, i), vector)
else:
vector = np.append(crossCovariance(Xi, Yi, N, i, Xs, Ys), vector)
return vector
In my example, calling the method array(k, norm = True) for different values of k will give resuslt as I shown below:
k = 3, [ 0. -0.5 0. 1. 0. -0.5 0. ]
k = 2, [-0.5 0. 1. 0. -0.5]
k = 1, [ 0. 1. 0.]
k = 0, [ 1.]
My approach is good for the learning purposes but I need to move to the native numpy functions in order to speed up my analysis. How one could specify the k shift value while using the native numpy.correlate function? PS k parameter specify the "time" shift between two arrays. Thank you in advance.
Whilst I'm not aware of any built-in function for computing the cross-correlation for a particular range of signal lags, you can speed your version up a lot by vectorization, i.e. performing operations on arrays rather than single elements in an array.
This version uses only a single Python loop over the lags:
import numpy as np
def xcorr(x, y, k, normalize=True):
n = x.shape[0]
# initialize the output array
out = np.empty((2 * k) + 1, dtype=np.double)
lags = np.arange(-k, k + 1)
# pre-compute E(x), E(y)
mu_x = x.mean()
mu_y = y.mean()
# loop over lags
for ii, lag in enumerate(lags):
# use slice indexing to get 'shifted' views of the two input signals
if lag < 0:
xi = x[:lag]
yi = y[-lag:]
elif lag > 0:
xi = x[:-lag]
yi = y[lag:]
else:
xi = x
yi = y
# x - mu_x; y - mu_y
xdiff = xi - mu_x
ydiff = yi - mu_y
# E[(x - mu_x) * (y - mu_y)]
out[ii] = xdiff.dot(ydiff) / n
# NB: xdiff.dot(ydiff) == (xdiff * ydiff).sum()
if normalize:
# E[(x - mu_x) * (y - mu_y)] / (sigma_x * sigma_y)
out /= np.std(x) * np.std(y)
return lags, out
Some more general points of advice:
As I mentioned in the comments, you should try to give your functions names that are informative, and that aren't likely to conflict with other things in your namespace (e.g. array vs np.array).
It's much better to make your functions self-contained. In your version, N, k, Xs and Ys are defined outside the main function. In this situation you might accidentally modify or overwrite one of these variables, and it can get tricky to debug errors caused by this sort of thing.
Appending to numpy arrays (e.g. using np.append or np.concatenate) is slow, so avoid it whenever you can. If, as in this case, you know the size of the output ahead of time, it's much faster to pre-allocate the output array (e.g. using np.empty or np.zeros), then fill in the elements. If you absolutely have to do concatenation, it's often faster to append to a normal Python list, then convert it to a numpy array at the end.
It's available by specifying maxlags:
import matplotlib.pyplot as plt
xcorr = plt.xcorr(signal_1, signal_2, maxlags=1)
Documentation can be found here. This implementation is based on np.correlate.