This problem is similar to an earlier question, Fast interpolation over 3D array, but the answers there do not solve my problem.
I have a 4D array y with dimensions (time, altitude, latitude, longitude), i.e. y.shape = (nt, nalt, nlat, nlon). The x coordinate is altitude and changes with (time, latitude, longitude), so x.shape = (nt, nalt, nlat, nlon) as well. I want to interpolate in altitude for every (nt, nlat, nlon). The interpolated x_new is 1D and does not change with (time, latitude, longitude).
I have tried numpy.interp and scipy.interpolate.interp1d, and thought about the answers in the earlier post, but I cannot reduce the loops with those answers.
I can only do it like this:
# y is a 4D ndarray
# x is a 4D ndarray
# new_y is a 4D array
for i in range(nlon):
    for j in range(nlat):
        for k in range(nt):
            y_new[k, :, j, i] = np.interp(new_x, x[k, :, j, i], y[k, :, j, i])
These loops make the interpolation too slow to be practical. Does anyone have a better idea? Any help would be highly appreciated.
Here is my solution using numba; it is about 3x faster.
First create the test data. x needs to be in ascending order:
import numpy as np
rows = 200000
cols = 66
new_cols = 69
x = np.random.rand(rows, cols)
x.sort(axis=-1)
y = np.random.rand(rows, cols)
nx = np.random.rand(new_cols)
nx.sort()
Do 200,000 interpolations with numpy:
%%time
ny = np.empty((x.shape[0], len(nx)))
for i in range(len(x)):
    ny[i] = np.interp(nx, x[i], y[i])
I use a merge method instead of binary search, because nx is sorted and its length is about the same as that of x.
interp() uses binary search, so its time complexity is O(len(nx) * log2(len(x))).
The merge method has time complexity O(len(nx) + len(x)).
Here is the numba code:
import numba

@numba.jit("f8[::1](f8[::1], f8[::1], f8[::1], f8[::1])")
def interp2(x, xp, fp, f):
    n = len(x)
    n2 = len(xp)
    j = 0
    i = 0
    # points below the range of xp get the first sample value
    while x[i] <= xp[0]:
        f[i] = fp[0]
        i += 1
    slope = (fp[j+1] - fp[j]) / (xp[j+1] - xp[j])
    while i < n:
        if x[i] >= xp[j] and x[i] < xp[j+1]:
            f[i] = slope * (x[i] - xp[j]) + fp[j]
            i += 1
            continue
        j += 1
        if j + 1 == n2:
            break
        slope = (fp[j+1] - fp[j]) / (xp[j+1] - xp[j])
    # points above the range of xp get the last sample value
    while i < n:
        f[i] = fp[n2-1]
        i += 1
@numba.jit("f8[:, ::1](f8[::1], f8[:, ::1], f8[:, ::1])")
def multi_interp(x, xp, fp):
    nrows = xp.shape[0]
    f = np.empty((nrows, x.shape[0]))
    for i in range(nrows):
        interp2(x, xp[i, :], fp[i, :], f[i, :])
    return f
Then call the numba function:
%%time
ny2 = multi_interp(nx, x, y)
To check the result:
np.allclose(ny, ny2)
On my PC, the times are:
python version: 3.41 s
numba version: 1.04 s
This method needs an array whose last axis is the axis to interpolate over.
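For the original 4D problem, one possible (untested) sketch is to move the altitude axis to the end, flatten the other axes into rows, call multi_interp, and then restore the shape. Here nt, nalt, nlat, nlon, x, y and new_x are the names from the question, and the arrays are made contiguous float64 since the jit signature expects that:
# hedged sketch: reshape the (nt, nalt, nlat, nlon) problem into (rows, cols)
x2d = np.ascontiguousarray(np.moveaxis(x, 1, -1), dtype=np.float64).reshape(-1, nalt)
y2d = np.ascontiguousarray(np.moveaxis(y, 1, -1), dtype=np.float64).reshape(-1, nalt)
ny2d = multi_interp(np.ascontiguousarray(new_x, dtype=np.float64), x2d, y2d)
y_new = np.moveaxis(ny2d.reshape(nt, nlat, nlon, -1), -1, 1)  # back to (nt, len(new_x), nlat, nlon)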
I'm trying to implement a differential in python via numpy that can accept a scalar, a vector, or a matrix.
import numpy as np
def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df

def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df

def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)

if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?
I highly doubt there is a NumPy function that generates the second value returned by your functions. That being said, you can use features of NumPy and Python to vectorize this and make the function faster: first generate the indices, then allocate the target array and fill it. Note that operating on generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The * unpacking operator is used to pass N parameters.
def foo_generic(x):
    f = x ** 2
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    idx = tuple(np.concatenate((idx, idx)).reshape(2*x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
Note that foo_generic does not support scalars, and it would be very inefficient to use it for them anyway, but you can add a condition to handle this special case separately (see the sketch below).
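For instance, a minimal sketch of such a wrapper (assuming scalars are detected with np.ndim) could be:
def foo(x):
    if np.ndim(x) == 0:
        # scalar case: the derivative is just a scalar
        return x ** 2, 2 * x
    return foo_generic(np.asarray(x))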
The df array very quickly becomes huge for higher-order inputs, so I strongly advise against dense storage: even in the matrix case the number of zeros is already huge compared to the number of non-zero values. For a 5x5 input, more than 95% of the entries of df are zeros, and filling an enormous array with zeros is not efficient. Sparse matrices fix this.
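As a rough sketch of that idea (using scipy.sparse, which is not part of the question): the Jacobian of the element-wise square is diagonal in the flattened indices, so it can be stored as a sparse diagonal matrix instead of a dense N-dimensional array:
from scipy.sparse import diags

def foo_generic_sparse(x):
    f = x ** 2
    # Jacobian w.r.t. the flattened x: only the diagonal entries 2*x are non-zero
    df = diags(2 * np.asarray(x).ravel())
    return f, df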
I tried to model the voxels of a 3D cylinder with the following code:
import math
import numpy as np
R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)
def density_f(x, y, z):
    r_xy = math.sqrt(x ** 2 + y ** 2)
    if r_xy <= R0 and -hz <= z <= hz:
        return 1
    else:
        return 0
density = np.vectorize(density_f)(xx, yy, zz)
and it took many minutes to compute.
Equivalent suboptimal Java code runs 10-15 seconds.
How to make Python compute voxels at the same speed? Where to optimize?
Please do not use np.vectorize(..): it is not efficient, since it still does the processing at the Python level. np.vectorize() should only be used as a last resort, for example if the function cannot be calculated in "bulk" because its "structure" is too complex.
But you do not need np.vectorize here; you can implement your function to work over arrays with:
r_xy = np.sqrt(xx ** 2 + yy ** 2)
density = (r_xy <= R0) & (-hz <= zz) & (zz <= hz)
or even a bit faster:
r_xy = xx * xx + yy * yy
density = (r_xy <= R0 * R0) & (-hz <= zz) & (zz <= hz)
This will construct a 2000×2000×20 array of booleans. We can use:
intdens = density.astype(int)
to construct an array of ints.
Printing the array here is quite cumbersome, but it contains a total of 2,356,047 ones:
>>> density.astype(int).sum()
2356047
Benchmarks: If I run this locally 10 times, I get:
>>> timeit(f, number=10)
18.040479518999973
>>> timeit(f2, number=10) # f2 is the optimized variant
13.287886952000008
So on average, we calculate this matrix (including casting it to ints) in 1.3-1.8 seconds.
You can also use a compiled version of the function to calculate density, with Cython or numba. I use numba to JIT-compile the density calculation function in this answer, as it is as easy as adding a decorator.
Pros:
You can write if conditions, as you mention in your comments.
Slightly faster than the numpy version in the answer by @Willem Van Onsem, since that version has to iterate through the boolean array again to calculate the sum in density.astype(int).sum().
Cons:
You have to write an ugly three-level loop, losing the beauty of the one-liner numpy solution.
Code:
import numba as nb

@nb.jit(nopython=True, cache=True)
def calc_density(xx, yy, zz, R0, hz):
    threshold = R0 * R0
    dimensions = xx.shape
    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = xx[i][j][k] ** 2 + yy[i][j][k] ** 2
                if r_xy <= threshold and -hz <= zz[i][j][k] <= hz:
                    density += 1
    return density
Running times:
Willem Van Onsem's solution, f2 variant: 1.28s without sum, 2.01s with sum.
Numba solution (calc_density, on the second run, to exclude compile time): 0.48s.
As suggested in the comments, we need not calculate the meshgrid at all; we can pass x, y, z directly to the function. Thus:
@nb.jit(nopython=True, cache=True)
def calc_density2(x, y, z, R0, hz):
    threshold = R0 * R0
    dimensions = len(x), len(y), len(z)
    density = 0
    for i in range(dimensions[0]):
        for j in range(dimensions[1]):
            for k in range(dimensions[2]):
                r_xy = x[i] ** 2 + y[j] ** 2
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density
Now, for a fair comparison, we also include the time of np.meshgrid in @Willem Van Onsem's answer.
Running times:
Willem Van Onsem's solution, f2 variant (np.meshgrid time included): 2.24s
Numba solution (calc_density2, on the second run, to exclude compile time): 0.079s.
This is meant as a lengthy comment on the answer by Deepak Saini.
The main change is to avoid the coordinates generated by np.meshgrid, which contain unnecessary repetitions. Using them isn't advisable if you can avoid it, both in terms of memory usage and performance.
Code:
import numba as nb
import numpy as np

@nb.jit(nopython=True, parallel=True)
def calc_density_2(x, y, z, R0, hz):
    threshold = R0 * R0
    density = 0
    for i in nb.prange(y.shape[0]):
        for j in range(x.shape[0]):
            r_xy = x[j] ** 2 + y[i] ** 2
            for k in range(z.shape[0]):
                if r_xy <= threshold and -hz <= z[k] <= hz:
                    density += 1
    return density
Timings
R0 = 500
hz = 1
x = np.arange(-1000, 1000, 1)
y = np.arange(-1000, 1000, 1)
z = np.arange(-10, 10, 1)
xx, yy, zz = np.meshgrid(x, y, z)
# after the first call (to exclude compilation overhead)
# calc_density_2            9.7 ms
# calc_density_2 parallel   3.9 ms
# @Deepak Saini's version   115 ms
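As a hedged usage sketch (x, y, z, R0, hz as defined above): the function is called directly with the 1D coordinate arrays, so the xx, yy, zz meshgrid is only needed for the other versions:
n_inside = calc_density_2(x, y, z, R0, hz)
# should match density.astype(int).sum() from the NumPy version, i.e. 2356047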
I have the following code:
# positions: np.ndarray of shape(N,d)
# fitness: np.ndarray of shape(N,)
# mass: np.ndarray of shape(N,)
iteration = 1
while iteration <= maxiter:
    K = round((iteration-maxiter)*(N-1)/(1-maxiter) + 1)
    for i in range(N):
        displacement = positions[:K]-positions[i]
        dist = np.linalg.norm(displacement, axis=-1)
        if i < K:
            dist[i] = 1.0  # prevent 1/0
        force_i = (mass[:K]/dist)[:, np.newaxis]*displacement
        rand = np.random.rand(K, 1)
        force[i] = np.sum(np.multiply(rand, force_i), axis=0)
So I have an array that stores the coordinates of N particles in d dimensions. I need to first calculate the Euclidean distance between particle i and the first K particles, and then calculate the 'force' due to each of the K particles. Then I need to sum over the K particles to find the total force acting on particle i, and repeat for all N particles. This is only part of the code, but profiling shows it is the most time-critical step.
So my question is how I can optimize the above code. I have tried to vectorize it as much as possible, and I'm not sure if there is still room for improvement. The profiling results say that {method 'reduce' of 'numpy.ufunc' objects}, fromnumeric.py:1778(sum) and linalg.py:2103(norm) take the longest to run. Is the first one due to array broadcasting? How can I optimize these three function calls?
We can keep the loops, but try to optimize by pre-computing certain things:
from scipy.spatial.distance import cdist
iteration = 1
while iteration <= maxiter:
    K = round((iteration-maxiter)*(N-1)/(1-maxiter) + 1)
    posd = cdist(positions, positions)
    np.fill_diagonal(posd, 1)
    rands = np.random.rand(N, K)
    s = rands*(mass[:K]/posd[:, :K])
    for i in range(N):
        displacement = positions[:K]-positions[i]
        force[i] = s[i].dot(displacement)
I had to make some adjustments since your code was missing a few parts. But the first optimization would be to get rid of the for i in range(N) loop:
import numpy as np
np.random.seed(42)
N = 10
d = 3
maxiter = 50
positions = np.random.random((N, d))
force = np.random.random((N, d))
fitness = np.random.random(N)
mass = np.random.random(N)
iteration = 1
while iteration <= maxiter:
    K = round((iteration-maxiter)*(N-1)/(1-maxiter) + 1)
    displacement = positions[:K, None]-positions[None, :]
    dist = np.linalg.norm(displacement, axis=-1)
    dist[dist == 0] = 1
    force = np.sum((mass[:K, None, None]/dist[:, :, None])*displacement * np.random.rand(K, N, 1), axis=0)
    iteration += 1
Other improvements would be to try faster implementations of the norm, such as scipy's cdist or numpy.einsum.
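For example, a possible sketch of an einsum-based norm for the displacement array above (same shapes as in the loop, not benchmarked here) is:
# sum of squares over the last axis, then square root; avoids some temporaries
dist = np.sqrt(np.einsum('ijk,ijk->ij', displacement, displacement))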
I am trying to implement a finite difference approximation to solve the Heat Equation, u_t = k * u_{xx}, in Python using NumPy.
Here is a copy of the code I am running:
## This program is to implement a Finite Difference method approximation
## to solve the Heat Equation, u_t = k * u_xx,
## in 1D w/out sources & on a finite interval 0 < x < L. The PDE
## is subject to B.C: u(0,t) = u(L,t) = 0,
## and the I.C: u(x,0) = f(x).
import numpy as np
import matplotlib.pyplot as plt
# parameters
L = 1 # length of the rod
T = 10 # terminal time
N = 10
M = 100
s = 0.25
# uniform mesh
x_init = 0
x_end = L
dx = float(x_end - x_init) / N
x = np.arange(x_init, x_end, dx)
x[0] = x_init
# time discretization
t_init = 0
t_end = T
dt = float(t_end - t_init) / M
t = np.arange(t_init, t_end, dt)
t[0] = t_init
# Boundary Conditions
for m in xrange(0, M):
    t[m] = m * dt

# Initial Conditions
for j in xrange(0, N):
    x[j] = j * dx
# definition of solution u(x,t) to u_t = k * u_xx
u = np.zeros((N, M+1)) # array to store values of the solution
# Finite Difference Scheme:
u[:,0] = x**2 #initial condition
for m in xrange(0, M):
    for j in xrange(1, N-1):
        if j == 1:
            u[j-1, m] = 0  # Boundary condition
        elif j == N-1:
            u[j+1, m] = 0
        else:
            u[j, m+1] = u[j, m] + s * (u[j+1, m] -
                                       2 * u[j, m] + u[j-1, m])

print u, #t, x
plt.plot(u, t)
#plt.show()
I think my code is working properly and producing output. I want to plot the solution u versus t (my time vector); if I can plot the graph, I can check whether my numerical approximation agrees with the expected behaviour of the Heat Equation. However, I am getting the error "x and y must have same first dimension". How can I correct this issue?
An additional question: am I better off making an animation with matplotlib.animation instead of using matplotlib.pyplot?
Thanks so much for any and all help! It is greatly appreciated!
Okay, so I had a "brain dump" and tried plotting u vs. t, forgetting that u, being the solution to the Heat Equation (u_t = k * u_{xx}), is defined as u(x,t) and so already carries its time dependence. I made the following correction to my code:
print u #t, x
plt.plot(u)
plt.show()
And now my program finally displays an image. And here it is:
It is absolutely beautiful, isn't it?
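If the goal is still a curve against time, one hedged option (an assumption, not part of the original code) is to plot a single spatial point of u against a time vector of matching length, since u has M+1 time columns:
t_plot = np.linspace(t_init, t_end, M + 1)  # length matches u.shape[1]
plt.plot(t_plot, u[N // 2, :])              # temperature at the middle grid point
plt.xlabel('t')
plt.show()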
I have 4-dimensional data, say for the temperature, in an numpy.ndarray.
The shape of the array is (ntime, nheight_in, nlat, nlon).
I have corresponding 1D arrays for each of the dimensions that tell me which time, height, latitude, and longitude a certain value corresponds to; for this example the relevant one is height_in, giving the height in metres.
Now I need to bring it onto a different height dimension, height_out, with a different length.
The following seems to do what I want:
ntime, nheight_in, nlat, nlon = t_in.shape
nheight_out = len(height_out)
t_out = np.empty((ntime, nheight_out, nlat, nlon))
for time in range(ntime):
    for lat in range(nlat):
        for lon in range(nlon):
            t_out[time, :, lat, lon] = np.interp(
                height_out, height_in, t_in[time, :, lat, lon]
            )
But with 3 nested loops, and lots of switching between python and numpy, I don't think this is the best way to do it.
Any suggestions on how to improve this? Thanks
scipy's interp1d can help:
import numpy as np
from scipy.interpolate import interp1d
ntime, nheight_in, nlat, nlon = (10, 20, 30, 40)
heights = np.linspace(0, 1, nheight_in)
t_in = np.random.normal(size=(ntime, nheight_in, nlat, nlon))
f_out = interp1d(heights, t_in, axis=1)
nheight_out = 50
new_heights = np.linspace(0, 1, nheight_out)
t_out = f_out(new_heights)
I was looking for a similar function that works with irregularly spaced coordinates, and ended up writing my own function. As far as I see, the interpolation is handled nicely and the performance in terms of memory and speed is also quite good. I thought I'd share it here in case anyone else comes across this question looking for a similar function:
import numpy as np
import warnings
def interp_along_axis(y, x, newx, axis, inverse=False, method='linear'):
    """ Interpolate vertical profiles, e.g. of atmospheric variables,
    using vectorized numpy operations.

    This function assumes that the x-coordinate increases monotonically.

    ps:
    * Updated to work with irregularly spaced x-coordinate.
    * Updated to work with irregularly spaced newx-coordinate.
    * Updated to easily inverse the direction of the x-coordinate.
    * Updated to fill with nans outside extrapolation range.
    * Updated to include a linear interpolation method as well
      (it was initially written for a cubic function).

    Peter Kalverla
    March 2018

    --------------------
    More info:
    Algorithm from: http://www.paulinternet.nl/?page=bicubic
    It approximates y = f(x) = ax^3 + bx^2 + cx + d
    where y may be an ndarray input vector.
    Returns f(newx).

    The algorithm uses the derivative f'(x) = 3ax^2 + 2bx + c
    and uses the fact that:
    f(0) = d
    f(1) = a + b + c + d
    f'(0) = c
    f'(1) = 3a + 2b + c

    Rewriting this yields expressions for a, b, c, d:
    a = 2f(0) - 2f(1) + f'(0) + f'(1)
    b = -3f(0) + 3f(1) - 2f'(0) - f'(1)
    c = f'(0)
    d = f(0)

    These can be evaluated at two neighbouring points in x and
    as such constitute the piecewise cubic interpolator.
    """
    # View of x and y with axis as first dimension
    if inverse:
        _x = np.moveaxis(x, axis, 0)[::-1, ...]
        _y = np.moveaxis(y, axis, 0)[::-1, ...]
        _newx = np.moveaxis(newx, axis, 0)[::-1, ...]
    else:
        _y = np.moveaxis(y, axis, 0)
        _x = np.moveaxis(x, axis, 0)
        _newx = np.moveaxis(newx, axis, 0)

    # Sanity checks
    if np.any(_newx[0] < _x[0]) or np.any(_newx[-1] > _x[-1]):
        # raise ValueError('This function cannot extrapolate')
        warnings.warn("Some values are outside the interpolation range. "
                      "These will be filled with NaN")
    if np.any(np.diff(_x, axis=0) < 0):
        raise ValueError('x should increase monotonically')
    if np.any(np.diff(_newx, axis=0) < 0):
        raise ValueError('newx should increase monotonically')

    # Cubic interpolation needs the gradient of y in addition to its values
    if method == 'cubic':
        # For now, simply use a numpy function to get the derivatives.
        # This produces the largest memory overhead of the function and
        # could alternatively be done in passing.
        ydx = np.gradient(_y, axis=0, edge_order=2)

    # This will later be concatenated with a dynamic '0th' index
    ind = [i for i in np.indices(_y.shape[1:])]

    # Allocate the output array
    original_dims = _y.shape
    newdims = list(original_dims)
    newdims[0] = len(_newx)
    newy = np.zeros(newdims)

    # Set initial bounds
    i_lower = np.zeros(_x.shape[1:], dtype=int)
    i_upper = np.ones(_x.shape[1:], dtype=int)
    x_lower = _x[0, ...]
    x_upper = _x[1, ...]

    for i, xi in enumerate(_newx):
        # Start at the 'bottom' of the array and work upwards.
        # This only works if x and newx increase monotonically.

        # Update bounds where necessary and possible
        needs_update = (xi > x_upper) & (i_upper+1 < len(_x))
        # print x_upper.max(), np.any(needs_update)
        while np.any(needs_update):
            i_lower = np.where(needs_update, i_lower+1, i_lower)
            i_upper = i_lower + 1
            x_lower = _x[[i_lower]+ind]
            x_upper = _x[[i_upper]+ind]

            # Check again
            needs_update = (xi > x_upper) & (i_upper+1 < len(_x))

        # Express the position of xi relative to its neighbours
        xj = (xi-x_lower)/(x_upper - x_lower)

        # Determine where there is a valid interpolation range
        within_bounds = (_x[0, ...] < xi) & (xi < _x[-1, ...])

        if method == 'linear':
            f0, f1 = _y[[i_lower]+ind], _y[[i_upper]+ind]
            a = f1 - f0
            b = f0
            newy[i, ...] = np.where(within_bounds, a*xj + b, np.nan)
        elif method == 'cubic':
            f0, f1 = _y[[i_lower]+ind], _y[[i_upper]+ind]
            df0, df1 = ydx[[i_lower]+ind], ydx[[i_upper]+ind]
            a = 2*f0 - 2*f1 + df0 + df1
            b = -3*f0 + 3*f1 - 2*df0 - df1
            c = df0
            d = f0
            newy[i, ...] = np.where(within_bounds,
                                    a*xj**3 + b*xj**2 + c*xj + d, np.nan)
        else:
            raise ValueError("invalid interpolation method "
                             "(choose 'linear' or 'cubic')")

    if inverse:
        newy = newy[::-1, ...]

    return np.moveaxis(newy, 0, axis)
And this is a small example to test it:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d as scipy1d
# toy coordinates and data
nx, ny, nz = 25, 30, 10
x = np.arange(nx)
y = np.arange(ny)
z = np.tile(np.arange(nz), (nx,ny,1)) + np.random.randn(nx, ny, nz)*.1
testdata = np.random.randn(nx,ny,nz) # x,y,z
# Desired z-coordinates (must be between bounds of z)
znew = np.tile(np.linspace(2,nz-2,50), (nx,ny,1)) + np.random.randn(nx, ny, 50)*0.01
# Inverse the coordinates for testing
z = z[..., ::-1]
znew = znew[..., ::-1]
# Now use own routine
ynew = interp_along_axis(testdata, z, znew, axis=2, inverse=True)
# Check some random profiles
for i in range(5):
    randx = np.random.randint(nx)
    randy = np.random.randint(ny)

    checkfunc = scipy1d(z[randx, randy], testdata[randx, randy], kind='cubic')
    checkdata = checkfunc(znew)

    fig, ax = plt.subplots()
    ax.plot(testdata[randx, randy], z[randx, randy], 'x', label='original data')
    ax.plot(checkdata[randx, randy], znew[randx, randy], label='scipy')
    ax.plot(ynew[randx, randy], znew[randx, randy], '--', label='Peter')
    ax.legend()
    plt.show()
Following the convention of numpy.interp, one can assign the left/right boundary values to points outside the range by adding these lines after within_bounds = ...:
out_lbound = (xi <= _x[0,...])
out_rbound = (_x[-1,...] <= xi)
and
newy[i, out_lbound] = _y[0, out_lbound]
newy[i, out_rbound] = _y[-1, out_rbound]
after newy[i, ...] = ....
If I have understood the strategy used by @Peter9192 correctly, I think these changes follow the same approach. I've checked it a little, but some edge cases might not work properly.
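Putting it together, a sketch of how the linear branch of the loop body might then look (variable names as in the function above, not verified for all edge cases):
within_bounds = (_x[0, ...] < xi) & (xi < _x[-1, ...])
out_lbound = (xi <= _x[0, ...])
out_rbound = (_x[-1, ...] <= xi)

if method == 'linear':
    f0, f1 = _y[[i_lower]+ind], _y[[i_upper]+ind]
    newy[i, ...] = np.where(within_bounds, (f1 - f0)*xj + f0, np.nan)

# fill points outside the range with the boundary values, like numpy.interp
newy[i, out_lbound] = _y[0, out_lbound]
newy[i, out_rbound] = _y[-1, out_rbound]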