I have a 3-dimensional NumPy array A. I would like to multiply every element A[i,j,k] by w*( i / Lx + j / Ly + k / Lz ), where w, Lx, Ly and Lz are real numbers (floats). Doing this in a Python for loop is impractical, since I need this to scale to large arrays and a triple loop over the indices i, j, k is O(N^3).
Is there an efficient way to perform an operation on each element of a numpy array that cares about index?
You can use broadcasting -

M,N,R = A.shape
p1 = np.arange(M)[:,None,None]/Lx   # shape (M,1,1)
p2 = np.arange(N)[:,None]/Ly        # shape (N,1)
p3 = np.arange(R)/Lz                # shape (R,)
out = A*w*(p1 + p2 + p3)            # p1+p2+p3 broadcasts to (M,N,R)
You can also use np.ix_ for a more elegant solution -

M,N,R = A.shape
X,Y,Z = np.ix_(np.arange(M),np.arange(N),np.arange(R))
out = A*w*((X/Lx) + (Y/Ly) + (Z/Lz))
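As a quick sanity check (a minimal sketch with arbitrary sizes), each output element should match the hand-computed value at its index:

A = np.random.rand(3, 4, 5)
w, Lx, Ly, Lz = 2.3, 3.2, 4.2, 5.2
M, N, R = A.shape
X, Y, Z = np.ix_(np.arange(M), np.arange(N), np.arange(R))
out = A*w*((X/Lx) + (Y/Ly) + (Z/Lz))
i, j, k = 1, 2, 3
print(np.isclose(out[i,j,k], A[i,j,k]*w*(i/Lx + j/Ly + k/Lz)))  # True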
Runtime tests and output verification -
Function definitions:
def vectorized_app1(A, w, Lx, Ly, Lz):
    M,N,R = A.shape
    p1 = np.arange(M)[:,None,None]/Lx
    p2 = np.arange(N)[:,None]/Ly
    p3 = np.arange(R)/Lz
    return A*w*(p1 + p2 + p3)

def vectorized_app2(A, w, Lx, Ly, Lz):
    M,N,R = A.shape
    X,Y,Z = np.ix_(np.arange(M),np.arange(N),np.arange(R))
    return A*w*((X/Lx) + (Y/Ly) + (Z/Lz))

def original_app(A, w, Lx, Ly, Lz):
    out = np.empty_like(A)
    M,N,R = A.shape
    for i in range(M):
        for j in range(N):
            for k in range(R):
                out[i,j,k] = A[i,j,k]*w*( (i / Lx) + (j / Ly) + (k / Lz) )
    return out
Timings:
In [197]: # Inputs
...: A = np.random.rand(100,100,100)
...: w, Lx, Ly, Lz = 2.3, 3.2, 4.2, 5.2
...:
In [198]: np.allclose(original_app(A,w,Lx,Ly,Lz),vectorized_app1(A,w,Lx,Ly,Lz))
Out[198]: True
In [199]: np.allclose(original_app(A,w,Lx,Ly,Lz),vectorized_app2(A,w,Lx,Ly,Lz))
Out[199]: True
In [200]: %timeit original_app(A, w, Lx, Ly, Lz )
1 loops, best of 3: 1.39 s per loop
In [201]: %timeit vectorized_app1(A, w, Lx, Ly, Lz )
10 loops, best of 3: 24.6 ms per loop
In [202]: %timeit vectorized_app2(A, w, Lx, Ly, Lz )
10 loops, best of 3: 24.2 ms per loop
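For completeness, np.fromfunction can express the same index-dependent weight directly; it builds the full index grids internally, so memory-wise it is closer to a meshgrid approach than to the broadcasting one above (a sketch):

weights = np.fromfunction(lambda i, j, k: w*(i/Lx + j/Ly + k/Lz), A.shape)
out = A * weights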
This is a fairly direct question; I will generalize it a bit at the end.
I am trying to implement this function in NumPy. I have been successful using nested for loops, but I can't think of a NumPy way to do it.
My implementation:
import numpy as np
from scipy.special import softmax  # available in SciPy >= 1.2

bs = 10 # batch size
nb = 8  # number of bounding boxes
nc = 15 # number of classes
bbox = np.random.random(size=(bs, nb, 4))  # model output bounding boxes
p = np.random.random(size=(bs, nb, nc))    # model output probability
p = softmax(p, axis=-1)
s_rand = np.random.random(size=(nc, nc))
s = (s_rand + s_rand.T)/2                  # similarity matrix (symmetric)
pp = np.random.random(size=(bs, nb, nc))   # proposed probability
pp = softmax(pp, axis=-1)
first_term = 0
for b in range(nb):
    for b_1 in range(nb):
        if b_1 == b:
            continue
        for l in range(nc):
            for l_1 in range(nc):
                first_term += (s[l, l_1] * (pp[:, b, l] - pp[:, b_1, l_1])**2)

second_term = 0
for b in range(nb):
    for l in range(nc):
        second_term += (np.linalg.norm(s[l, :], ord=1) * (pp[:, b, l] - p[:, b, l])**2)
second_term *= nb

epsilon = 0.5
output = ((1 - epsilon) * first_term) + (epsilon * second_term)
I have tried hard to remove the loops with np.tile and np.repeat instead, but can't think of a way to do it. I have also searched for exercises that would teach me these kinds of NumPy conversions, without success.
P_hat.shape is (B, L), S.shape is (L, L), P.shape is (B, L); this works on one batch element at a time, with B the number of boxes and L the number of classes.

array_before_sum = S[None,:,None,:] * (P_hat[:,:,None,None] - P_hat[None,None,:,:])**2
array_after_sum = array_before_sum.sum(axis=(1,3))
array_sum_again = (array_after_sum * (1 - np.eye(B))).sum()   # zero out the b == b_1 diagonal
first_term = (1-epsilon) * array_sum_again
second_term = epsilon * (B * np.abs(S).sum(axis=1)[None,:] * (P_hat - P)**2).sum()
I think you can do both with einsum -

first_term = np.einsum('km, ijklm -> i', s, (pp[..., None, None] - pp[:, None, None, ...])**2 )
second_term = nb * np.einsum('k, ijk -> i', np.linalg.norm(s, ord=1, axis=1), (pp - p)**2 )

(Note the ord=1 norm and the factor of nb, to match the loops in the question. Also, unlike the loops, this first_term includes the b_1 == b terms; subtract np.einsum('km, ijkm -> i', s, (pp[:,:,:,None] - pp[:,:,None,:])**2) if you need them skipped.)

Now there's a problem: that ijklm tensor in first_term is going to get huge if nb and nc get large. You should probably distribute the square so that you work with three much smaller contractions:

first_term = nb * np.einsum('km, ijk, ijk -> i', s, pp, pp) +\
             nb * np.einsum('km, ilm, ilm -> i', s, pp, pp) -\
             2 * np.einsum('km, ijk, ilm -> i', s, pp, pp)

This takes advantage of the fact that (a-b)**2 = a**2 + b**2 - 2ab, breaking the problem into three parts that can each be done in one einsum call.
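A minimal numeric check of that identity on throwaway 1D arrays (sizes are arbitrary):

a = np.random.rand(5)
b = np.random.rand(7)
direct = ((a[:, None] - b[None, :])**2).sum()
distributed = len(b)*(a**2).sum() + len(a)*(b**2).sum() - 2*a.sum()*b.sum()
print(np.allclose(direct, distributed))  # True

The len(b) and len(a) factors here are the same nb factors attached to the first two einsum terms above.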
Maximally optimized code (removal of the first two loops is inspired by L.Iridium's answer):

squared_diff = (pp[:, :, None, :, None] - pp[:, None, :, None, :]) ** 2
weighted_diff = s * squared_diff
b_eq_b_1_removed = weighted_diff.sum(axis=(3,4)) * (1 - np.eye(nb))   # zero out b == b_1
first_term = b_eq_b_1_removed.sum(axis=(1,2))

normalized_s = np.linalg.norm(s, ord=1, axis=1)
squared_diff = (pp - p)**2
second_term = nb * (normalized_s * squared_diff).sum(axis=(1,2))

loss = ((1 - epsilon) * first_term) + (epsilon * second_term)
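A quick usage check against the loop version from the question (run in the same session, so pp, p, s and epsilon are still in scope):

print(np.allclose(output, loss))  # True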
Timings for the vectorized code:
512 µs ± 13 µs per loop
Timings for the code posted in the question:
62.5 ms ± 197 µs per loop
That's a huge improvement.
Given two NumPy arrays, say:
import numpy as np
import numpy.random as rand
n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))
Is there a more efficient way to compute X in the following:
X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])
Approach #1 : Input arrays with 0s & 1s
For input arrays with only 0s and 1s, we can reduce each row to a single scalar: with s = 2**np.arange(n), the dot product x.dot(s) reads a row as the bits of an integer, so two rows are equal exactly when their codes are equal. That reduces the inputs to 1D arrays, and broadcasting does the rest -
n = x.shape[1]
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)
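A tiny illustration of the encoding (for rows much wider than ~60 columns, watch out for int64 overflow):

s = 2**np.arange(3)                         # [1, 2, 4]
print(np.array([[1,0,1], [0,1,1]]).dot(s))  # [5 6] -> distinct rows give distinct codes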
Approach #2 : Generic case
For a generic case, we can view each row as a single np.void scalar, so whole-row equality becomes one scalar comparison -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)
Runtime test
# Setup
In [287]: np.random.seed(0)
...: n = 1000
...: x = rand.binomial(n=1, p=.5, size=(n, 10))
...: y = rand.binomial(n=1, p=.5, size=(n, 10))
# Original approach
In [288]: %%timeit
     ...: X = np.zeros((n, n))
     ...: for i in range(n):
     ...:     for j in range(n):
     ...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop
# Approach #1
In [290]: %%timeit
...: n = x.shape[1]
...: s = 2**np.arange(n)
...: x1D = x.dot(s)
...: y1D = y.dot(s)
...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop
# Approach #2
In [291]: %%timeit
...: x1D, y1D = view1D(x, y)
...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop
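For this row width (10 columns), a direct broadcasted comparison is also an option; it skips the encoding step at the cost of a (1000, 1000, 10) boolean temporary (a sketch):

Xout = (x[:, None, :] == y[None, :, :]).all(axis=-1).astype(float)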
I am working on a personal project which involves predicting weather pattern movements from radar. I have three n by m numpy arrays; one with precipitation intensity values, one with the movement (in pixels) in the X direction of that precipitation and one with the movement (in pixels) in the Y direction of that precipitation. I want to use these three arrays to determine the location of the precipitation pixels using the offsets in the other two arrays.
xMax = currentReflectivity.shape[0]
yMax = currentReflectivity.shape[1]
for x in xrange(currentReflectivity.shape[0]):
    for y in xrange(currentReflectivity.shape[1]):
        targetPixelX = xOffsetArray[x,y] + x
        targetPixelY = yOffsetArray[x,y] + y
        targetPixelX = int(targetPixelX)
        targetPixelY = int(targetPixelY)
        if targetPixelX < xMax and targetPixelY < yMax:
            interpolatedReflectivity[targetPixelX,targetPixelY] = currentReflectivity[x,y]
I can't think of a way to vectorize this; any ideas?
Here's a vectorized approach making use of broadcasting -
x_arr = np.arange(currentReflectivity.shape[0])[:,None]
y_arr = np.arange(currentReflectivity.shape[1])
targetPixelX_arr = (xOffsetArray[x_arr, y_arr] + x_arr).astype(int)
targetPixelY_arr = (yOffsetArray[x_arr, y_arr] + y_arr).astype(int)
valid_mask = (targetPixelX_arr < xMax) & (targetPixelY_arr < yMax)
R = targetPixelX_arr[valid_mask]
C = targetPixelY_arr[valid_mask]
interpolatedReflectivity[R,C] = currentReflectivity[valid_mask]
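One subtlety worth knowing: when several source pixels map to the same target, NumPy's fancy-indexed assignment keeps the value from the last occurrence in R and C, which matches the row-major order the loop writes in. A tiny illustration:

a = np.zeros(4)
a[np.array([1, 1, 3])] = [10, 20, 30]
print(a)  # [ 0. 20.  0. 30.] -- the later write to index 1 wins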
Runtime test
Approaches -
def org_app(currentReflectivity, xOffsetArray, yOffsetArray):
    m,n = currentReflectivity.shape
    interpolatedReflectivity = np.zeros((m,n))
    xMax = currentReflectivity.shape[0]
    yMax = currentReflectivity.shape[1]
    for x in xrange(currentReflectivity.shape[0]):
        for y in xrange(currentReflectivity.shape[1]):
            targetPixelX = xOffsetArray[x,y] + x
            targetPixelY = yOffsetArray[x,y] + y
            targetPixelX = int(targetPixelX)
            targetPixelY = int(targetPixelY)
            if targetPixelX < xMax and targetPixelY < yMax:
                interpolatedReflectivity[targetPixelX,targetPixelY] = \
                                        currentReflectivity[x,y]
    return interpolatedReflectivity

def broadcasting_app(currentReflectivity, xOffsetArray, yOffsetArray):
    m,n = currentReflectivity.shape
    interpolatedReflectivity = np.zeros((m,n))
    xMax, yMax = m,n
    x_arr = np.arange(currentReflectivity.shape[0])[:,None]
    y_arr = np.arange(currentReflectivity.shape[1])
    targetPixelX_arr = (xOffsetArray[x_arr, y_arr] + x_arr).astype(int)
    targetPixelY_arr = (yOffsetArray[x_arr, y_arr] + y_arr).astype(int)
    valid_mask = (targetPixelX_arr < xMax) & (targetPixelY_arr < yMax)
    R = targetPixelX_arr[valid_mask]
    C = targetPixelY_arr[valid_mask]
    interpolatedReflectivity[R,C] = currentReflectivity[valid_mask]
    return interpolatedReflectivity
Timings and verification -
In [276]: # Setup inputs
...: m,n = 100,110 # currentReflectivity.shape
...: max_r = 120 # xOffsetArray's extent
...: max_c = 130 # yOffsetArray's extent
...:
...: currentReflectivity = np.random.rand(m, n)
...: xOffsetArray = np.random.randint(0,max_r,(m, n))
...: yOffsetArray = np.random.randint(0,max_c,(m, n))
...:
In [277]: out1 = org_app(currentReflectivity, xOffsetArray, yOffsetArray)
...: out2 = broadcasting_app(currentReflectivity, xOffsetArray, yOffsetArray)
...: print np.allclose(out1, out2)
...:
True
In [278]: %timeit org_app(currentReflectivity, xOffsetArray, yOffsetArray)
100 loops, best of 3: 6.86 ms per loop
In [279]: %timeit broadcasting_app(currentReflectivity, xOffsetArray, yOffsetArray)
1000 loops, best of 3: 212 µs per loop
In [280]: 6860.0/212 # Speedup number
Out[280]: 32.35849056603774
I am pretty sure that you can vectorize this by taking everything out of the loop:

targetPixelX = (xOffsetArray + np.arange(xMax).reshape(xMax, 1)).astype(int)
targetPixelY = (yOffsetArray + np.arange(yMax)).astype(int)
mask = (targetPixelX < xMax) & (targetPixelY < yMax)
interpolatedReflectivity[targetPixelX[mask], targetPixelY[mask]] = currentReflectivity[mask]
This will be much faster but more memory intensive. Basically, targetPixelX and targetPixelY are now arrays containing the values for each pixel that were before computed on a per-iteration basis.
Only the masked (in-bounds) entries are scattered into interpolatedReflectivity, which plays the role of the if statement inside the loop.
I have a 3D numpy array of shape (t, n1, n2):
x = np.random.rand(10, 2, 4)
I need to calculate another 3D array y which is of shape (t, n1, n1) such that:
y[0] = np.cov(x[0,:,:])
...and so on for all slices along the first axis.
So, a loopy implementation would be:
y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])
Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
But it didn't work.
Hacking into the numpy.cov source code and using its default parameters, np.cov(x[i,:,:]) turns out to be simply:

N = x.shape[2]
m = x[i,:,:]
m = m - np.sum(m, axis=1, keepdims=True) / N   # out-of-place, so x isn't mutated through the view
cov = np.dot(m, m.T) / (N - 1)
So, the task was to vectorize this loop over i and process all of x in one go. The mean subtraction extends to all slices at once through broadcasting, and the final step is a sum-reduction along the last axis for every slice, which is efficiently expressed with np.einsum. Thus, the final implementation comes to this -
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
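For what it's worth, that einsum is just a batched matrix product, so on NumPy 1.10+ an equivalent spelling is:

y_out = np.matmul(m1, m1.transpose(0, 2, 1)) / (N - 1)

np.matmul broadcasts over the leading axis, multiplying each (n1, N) slice by its own transpose.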
Runtime test
In [155]: def original_app(x):
     ...:     n = x.shape[0]
     ...:     y = np.zeros((n,2,2))
     ...:     for i in np.arange(x.shape[0]):
     ...:         y[i] = np.cov(x[i,:,:])
     ...:     return y
     ...:
     ...: def proposed_app(x):
     ...:     N = x.shape[2]
     ...:     m1 = x - x.sum(2,keepdims=1)/N
     ...:     out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1)
     ...:     return out
     ...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Huge speedup there!
I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:
from scipy.spatial import distance
import numpy

def _func(a, b, radius):
    "Return 1 if a is within radius of b, otherwise return 0"
    if distance.euclidean(a, b) < radius:
        return 1
    else:
        return 0

def _make_mask(volume, roi, radius):
    mask = numpy.zeros(volume.shape)
    for x in range(volume.shape[0]):
        for y in range(volume.shape[1]):
            for z in range(volume.shape[2]):
                mask[x, y, z] = _func((x, y, z), roi, radius)
    return mask
where volume.shape is (182, 218, 200), roi.shape is (3,) (both are ndarrays), and radius is an int.
Approach #1
Here's a vectorized approach -
m,n,r = volume.shape
x,y,z = np.mgrid[0:m,0:n,0:r]
X = x - roi[0]
Y = y - roi[1]
Z = z - roi[2]
mask = X**2 + Y**2 + Z**2 < radius**2
Possible improvement: we can probably speed up the last step with the numexpr module -
import numexpr as ne
mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
Approach #2
We can also build three 1D ranges matching the shape parameters and subtract the three elements of roi on the fly, without ever materializing the meshes that np.mgrid created earlier; broadcasting then combines them efficiently. The implementation looks like this -
m,n,r = volume.shape
vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
mask = vals < radius**2
Simplified version: thanks to @Bi Rico for suggesting an improvement, as np.ogrid performs those operations a bit more concisely -
m,n,r = volume.shape
x,y,z = np.ogrid[0:m,0:n,0:r]-roi
mask = (x**2+y**2+z**2) < radius**2
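Note: np.ogrid[0:m, 0:n, 0:r] returns a list of three differently shaped arrays, and subtracting roi from that list relies on NumPy coercing it to an object array, which newer NumPy versions refuse. Unpacking first is a safer spelling of the same idea:

x, y, z = np.ogrid[0:m, 0:n, 0:r]
mask = ((x - roi[0])**2 + (y - roi[1])**2 + (z - roi[2])**2) < radius**2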
Runtime test
Function definitions -
def vectorized_app1(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return X**2 + Y**2 + Z**2 < radius**2

def vectorized_app1_improved(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.mgrid[0:m,0:n,0:r]
    X = x - roi[0]
    Y = y - roi[1]
    Z = z - roi[2]
    return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')

def vectorized_app2(volume, roi, radius):
    m,n,r = volume.shape
    vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \
           ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)
    return vals < radius**2

def vectorized_app2_simplified(volume, roi, radius):
    m,n,r = volume.shape
    x,y,z = np.ogrid[0:m,0:n,0:r]-roi
    return (x**2+y**2+z**2) < radius**2
Timings -
In [106]: # Setup input arrays
...: volume = np.random.rand(90,110,100) # Half of original input sizes
...: roi = np.random.rand(3)
...: radius = 3.4
...:
In [107]: %timeit _make_mask(volume, roi, radius)
1 loops, best of 3: 41.4 s per loop
In [108]: %timeit vectorized_app1(volume, roi, radius)
10 loops, best of 3: 62.3 ms per loop
In [109]: %timeit vectorized_app1_improved(volume, roi, radius)
10 loops, best of 3: 47 ms per loop
In [110]: %timeit vectorized_app2(volume, roi, radius)
100 loops, best of 3: 4.26 ms per loop
In [139]: %timeit vectorized_app2_simplified(volume, roi, radius)
100 loops, best of 3: 4.36 ms per loop
So, as always, broadcasting shows its magic: an almost 10,000x speedup over the original loopy code, and more than 10x over the full-mesh versions, simply by using on-the-fly broadcasted operations!
Say you first build an xyz array of all index triples:
import itertools
xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))]
Now, using numpy.linalg.norm,
np.linalg.norm(xyz - roi, axis=1) < radius
checks whether the distance for each tuple from roi is smaller than radius.
Finally, just reshape the result to the dimensions you need.
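Putting the last two steps together (a sketch; the reshape works because itertools.product enumerates indices in the same C order that reshape assumes):

mask = (np.linalg.norm(xyz - roi, axis=1) < radius).reshape(volume.shape)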