How can I vectorize this for loop in numpy? - python

The code is below:
import numpy as np
X = np.array(range(15)).reshape(5,3) # X's element value is meaningless
flag = np.random.randn(5,4)
y = np.array([0, 1, 2, 3, 0]) # Y's element value in range(flag.shape[1]) and Y.shape[0] equals X.shape[0]
dW = np.zeros((3, 4)) # dW.shape equals (X.shape[1], flag.shape[1])
for i in range(5):
    for j in range(4):
        if flag[i,j] > 0:
            dW[:,j] += X[i,:].T
            dW[:,y[i]] -= X[i,:].T
How can I vectorize this for loop to compute dW more efficiently?

Here's how I'd do it:
# has shape (X.shape[1],) + flag.shape
masked = np.where(flag > 0, X.T[...,np.newaxis], 0)
# sum over the i index
dW = masked.sum(axis=1)
# sum over the j index
np.subtract.at(dW, np.s_[:,y], masked.sum(axis=2))
# dW[:,y] -= masked.sum(axis=2) does not work here
See the documentation of ufunc.at for an explanation of that last comment.
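The last comment is easier to see with a small check. The following is a minimal sketch rather than part of the answer: it uses the arrays from the question, but with hand-picked `flag` values (my own assumption, so the result is reproducible), and compares the original loop with the masked sums. A fancy-indexed `-=` applies a repeated column index only once, whereas `np.subtract.at` accumulates every occurrence.

```python
import numpy as np

X = np.arange(15).reshape(5, 3)
# Hand-picked signs instead of random values (assumption, for reproducibility)
flag = np.array([[ 1., -1.,  1., -1.],
                 [-1.,  1., -1.,  1.],
                 [ 1.,  1., -1., -1.],
                 [-1., -1.,  1.,  1.],
                 [ 1., -1., -1.,  1.]])
y = np.array([0, 1, 2, 3, 0])            # column 0 appears twice

# Reference: the original double loop
dW_loop = np.zeros((3, 4))
for i in range(5):
    for j in range(4):
        if flag[i, j] > 0:
            dW_loop[:, j] += X[i, :]
            dW_loop[:, y[i]] -= X[i, :]

# masked[k, i, j] equals X[i, k] where flag[i, j] > 0, else 0
masked = np.where(flag > 0, X.T[..., np.newaxis], 0)

# Unbuffered subtraction: repeated entries of y are all applied
dW_vec = masked.sum(axis=1).astype(float)
np.subtract.at(dW_vec, np.s_[:, y], masked.sum(axis=2))

# Buffered subtraction: for a repeated entry of y only one update survives
dW_buffered = masked.sum(axis=1).astype(float)
dW_buffered[:, y] -= masked.sum(axis=2)

print(np.allclose(dW_loop, dW_vec))
print(np.allclose(dW_loop, dW_buffered))
```

The first comparison should agree with the loop, and the second should not, because rows 0 and 4 both map to column 0.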

Here's a vectorized approach based upon np.add.reduceat -
# --------------------- Setup output array ----------------------------------
dWOut = np.zeros((X.shape[1], flag.shape[1]))
# ------ STAGE #1 : Vectorize calculations for "dW[:,j] += X[i,:].T" --------
# Get indices where flag's transposed version has > 0
idx1 = np.argwhere(flag.T > 0)
# Row-extended version of X using idx1's col2 that corresponds to i-iterator
X_ext1 = X[idx1[:,1]]
# Get the indices at which the column index changes
shift_idx1 = np.append(0,np.where(np.diff(idx1[:,0])>0)[0]+1)
# Use the changing indices as boundaries for add.reduceat to add
# groups of rows from extended version of X
dWOut[:,np.unique(idx1[:,0])] += np.add.reduceat(X_ext1,shift_idx1,axis=0).T
# ------ STAGE #2 : Vectorize calculations for "dW[:,y[i]] -= X[i,:].T" -------
# Repeat the same philosophy for this second stage, except we need to index into y.
# So, that would involve sorting and also the iterator involved is just "i".
idx2 = idx1[idx1[:,1].argsort()]
cols_idx1 = y[idx2[:,1]]
X_ext2 = X[idx2[:,1]]
sort_idx = (y[idx2[:,1]]).argsort()
X_ext2 = X_ext2[sort_idx]
shift_idx2 = np.append(0,np.where(np.diff(cols_idx1[sort_idx])>0)[0]+1)
dWOut[:,np.unique(cols_idx1)] -= np.add.reduceat(X_ext2,shift_idx2,axis=0).T
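To make the `np.add.reduceat` step less opaque, here is a small sketch on made-up data. The arrays `rows` and `group_ids` are hypothetical and not taken from the answer; the group boundaries are found the same way as `shift_idx1` above.

```python
import numpy as np

rows = np.array([[1, 2],
                 [3, 4],
                 [5, 6],
                 [7, 8],
                 [9, 10]])
group_ids = np.array([0, 0, 2, 2, 2])    # already sorted; group 1 is empty

# Start index of each run of equal ids
starts = np.append(0, np.where(np.diff(group_ids) > 0)[0] + 1)

# Sum the rows between consecutive start indices
sums = np.add.reduceat(rows, starts, axis=0)

out = np.zeros((3, 2))
out[np.unique(group_ids)] += sums        # empty groups keep their zeros
print(out)
```

Indexing the output with `np.unique(group_ids)` is what lets empty groups be skipped, which is why the answer does the same with `np.unique(idx1[:,0])`.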

You can do this:
ff = (flag > 0) * 1
ff = ff.reshape((5, 4, 1, 1))
XX = ff * X
[ii, jj] = np.meshgrid(np.arange(5), np.arange(4))
dW[:, jj] += XX[ii, jj, ii, :].transpose((2, 0, 1))
dW[:, y[ii]] -= XX[ii, jj, ii, :].transpose((2, 0, 1))
You can further merge and fold these expressions to get a one-liner but it won't add any more performance.
Update #1: Yep, sorry this is not giving correct results, I had a typo in my check


Faster way to threshold a 4-D numpy array

I have a 4D numpy array of size (98,359,256,269) that I want to threshold.
Right now, I have two separate lists that hold the coordinates for the first two dimensions and the last two dimensions (mag_ang for the first two and indices for the last two).
size of indices : (61821,2)
size of mag_ang : (35182,2)
Currently, my code looks like this:
inner_points = []
for k in indices:
    x = k[0]
    y = k[1]
    for i,ctr in enumerate(mag_ang):
        mag = ctr[0]
        ang = ctr[1]
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
This code works, but it's pretty slow, and I wonder if there's a more pythonic/faster way to do this?
(EDIT: added a second alternate method)
Use numpy multi-array indexing:
import time
import numpy as np
n_mag, n_ang, n_x, n_y = 10, 12, 5, 6
shape = n_mag, n_ang, n_x, n_y
X = np.random.random_sample(shape) * 20
nb_indices = 100 # 61821
indices = np.c_[np.random.randint(0, n_x, nb_indices), np.random.randint(0, n_y, nb_indices)]
nb_mag_ang = 50 # 35182
mag_ang = np.c_[np.random.randint(0, n_mag, nb_mag_ang), np.random.randint(0, n_ang, nb_mag_ang)]
# original method
inner_points = []
start = time.time()
for x, y in indices:
    for mag, ang in mag_ang:
        if X[mag][ang][x][y] > 10:
            inner_points.append((y, x))
end = time.time()
print(end - start)
# faster method 1:
inner_points_faster1 = []
start = time.time()
for x, y in indices:
    if np.any(X[mag_ang[:, 0], mag_ang[:, 1], x, y] > 10):
        inner_points_faster1.append((y, x))
end = time.time()
print(end - start)
# faster method 2:
start = time.time()
# note: depending on the real size of mag_ang and indices, you may wish to do this the other way round ?
found = X[:, :, indices[:, 0], indices[:, 1]][mag_ang[:, 0], mag_ang[:, 1], :] > 10
# 'found' shape is (nb_mag_ang x nb_indices)
assert found.shape == (nb_mag_ang, nb_indices)
matching_indices_mask = found.any(axis=0)
inner_points_faster2 = indices[matching_indices_mask, :]
end = time.time()
print(end - start)
# finally assert equality of findings
inner_points = np.unique(np.array(inner_points))
inner_points_faster1 = np.unique(np.array(inner_points_faster1))
inner_points_faster2 = np.unique(inner_points_faster2)
assert np.array_equal(inner_points, inner_points_faster1)
assert np.array_equal(inner_points, inner_points_faster2)
(of course if you increase the shape the time will not be zero for the second and third)
Final note: here I use "unique" at the end, but it would maybe be wise to do it upfront for the indices and mag_ang arrays (except if you are sure that they are unique already)
Use numpy directly. If indices and mag_ang are numpy arrays of two columns each for the appropriate coordinate:
(x, y), (mag, ang) = indices.T, mag_ang.T
index_matrix = np.meshgrid(mag, ang, x, y).T.reshape(-1,4)
inner_mag, inner_ang, inner_x, inner_y = np.where(X[index_matrix] > 10)
Now the inner_... variables hold arrays for each coordinate. To get a single list of pairs, you can zip inner_y and inner_x.
Here are a few vectorized ways leveraging broadcasting -
thresh = 10
mask = X[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]>thresh
r = np.where(mask)[0]
inner_points_out = indices[r][:,::-1]
For larger arrays, we can compare first and then index to get the mask -
mask = (X>thresh)[mag_ang[:,0],mag_ang[:,1],indices[:,0,None],indices[:,1,None]]
If you are only interested in the unique coordinates from indices, use the mask directly -
inner_points_out = indices[mask.any(1)][:,::-1]
For large arrays, we can also leverage multi-cores with numexpr module.
Thus, first off import the module -
import numexpr as ne
Then, replace (X>thresh) with ne.evaluate('X>thresh') in the computation(s) listed earlier.
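As a sanity check of the broadcasting approach, here is a minimal sketch on small random data. The sizes and the fixed seed are my own assumptions, chosen only to keep the loop reference fast. It shows that indexing with arrays of shape (50,) and (100, 1) yields one row per entry of indices and one column per entry of mag_ang.

```python
import numpy as np

rng = np.random.RandomState(0)           # fixed seed, assumption for reproducibility
X = rng.random_sample((10, 12, 5, 6)) * 20
indices = np.c_[rng.randint(0, 5, 100), rng.randint(0, 6, 100)]
mag_ang = np.c_[rng.randint(0, 10, 50), rng.randint(0, 12, 50)]
thresh = 10

# Loop reference: an (x, y) pair is a hit if any (mag, ang) exceeds the threshold
expected = np.array([
    any(X[mag, ang, x, y] > thresh for mag, ang in mag_ang)
    for x, y in indices
])

# Index arrays of shape (50,) and (100, 1) broadcast to a (100, 50) result
mask = X[mag_ang[:, 0], mag_ang[:, 1],
         indices[:, 0, None], indices[:, 1, None]] > thresh
hits = mask.any(axis=1)

print(mask.shape)
print(np.array_equal(hits, expected))
```

Reducing with `any(axis=1)` collapses the mag_ang axis, so `indices[hits]` keeps each matching coordinate once per row of indices.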
Use np.where:
inner = np.where(X > 10)
# np.where returns one index array per dimension
a, b, x, y = inner
inner_points = np.vstack([y, x]).T

How to searchsorted 2D arrays for rows' insertion indices in Python?

I am searching a sorted array for the proper insertion indices of new data so that it remains sorted. Although searchsorted2d by @Divakar works great for insertions along columns, it does not work along rows. Is there a way to do the same along the rows?
The first idea that comes to mind is to adapt searchsorted2d for the desired behavior. However, that does not seem as easy as it appears. Here is my attempt at adapting it, but it still does not work when axis is set to 0.
import numpy as np
# By Divakar
# See
def searchsorted2d(a, b, axis=0):
    shape = list(a.shape)
    shape[axis] = 1
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = np.ceil(max_num) * np.arange(a.shape[1-axis]).reshape(shape)
    p = np.searchsorted((a + r).ravel(), (b + r).ravel()).reshape(b.shape)
    return p #- a.shape[axis] * np.arange(a.shape[1-axis]).reshape(shape)
axis = 0 # Operate along which axis?
n = 16 # vector size
# Initial array
a = np.random.rand(n).reshape((n, 1) if axis else (1, n))
insert_into_a = np.random.rand(n).reshape((n, 1) if axis else (1, n))
indices = searchsorted2d(a, insert_into_a, axis=axis)
a = np.insert(a, indices.ravel(), insert_into_a.ravel()).reshape(
(n, -1) if axis else (-1, n))
assert(np.all(a == np.sort(a, axis=axis))), 'Failed :('
print('Success :)')
I expect that the assertion passes in both cases (axis = 0 and axis = 1).

How to do vector-matrix multiplication with conditions?

I want to obtain a list (or array, doesn't matter) of A from the following formula:
A_i = X_(k!=i) * S_(k!=i) * X'_(k!=i)
X is a vector (and X' is the transpose of X), S is a matrix, and the subscript k is defined as {k=1,2,3,...n| k!=i}.
X = [x1, x2, ..., xn]
S = [[s11,s12,...,s1n],
[... ... ... ..]
I take the following as an example:
X = [0.1,0.2,0.3,0.5]
S = [[0.4,0.1,0.3,0.5],
So, eventually, I would get a list of four values for A.
I did this:
import numpy as np
x = np.array([0.1,0.2,0.3,0.5])
s = np.matrix([[0.4,0.1,0.3,0.5],[1,2,1.5,2.4,0.6],[0.4,0.1,0.3,0.5],[1,2,1.5,2.4,0.6]])
for k in range(x) if k!=i
A = (
print (A)
I am confused with how to use a conditional 'for' loop. Could you please help me to solve it? Thanks.
Just to explain more. If you take i=1, then the formula will be:
A_1 = X_(k!=1) * S_(k!=1) * X'_(k!=1)
So any array (or value) associated with subscript 1 will be deleted in X and S. like:
X = [0.2,0.3,0.5]
S = [[1.5,2.4,0.6]
Step 1: correctly calculate A_i
Step 2: collect them into A
I assume what you want to calculate is
An easy way to do so is to mask away the entries using masked arrays. This way we don't need to delete or copy any matrices.
# sample
x = np.array([1,2,3,4])
s = np.diag([4,5,6,7])
# we will use masked arrays to remove k=i
vec_mask = np.zeros_like(x)
matrix_mask = np.zeros_like(s)
i = 0 # start
# set masks
vec_mask[i] = 1
matrix_mask[i] = matrix_mask[:,i] = 1
s_mask = np.ma.array(s, mask=matrix_mask)
x_mask = np.ma.array(x, mask=vec_mask)
# reduced product, remember to use np.ma.inner instead of np.inner
Ai = np.ma.inner(np.ma.inner(x_mask, s_mask), x_mask.T)
vec_mask[i] = 0
matrix_mask[i] = matrix_mask[:,i] = 0
As terms of 0 don't add to the sum, we actually can ignore masking the matrix and just mask the vector:
# we will use masked arrays to remove k=i
mask = np.zeros_like(x)
i = 0 # start
# set masks
mask[i] = 1
x_mask = np.ma.array(x, mask=mask)
# reduced product
Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
# unset mask
mask[i] = 0
The final step is to assemble A out of the A_is, so in total we get
x = np.array([1,2,3,4])
s = np.diag([4,5,6,7])
mask = np.zeros_like(x)
x_mask = np.ma.array(x, mask=mask)
A = []
for i in range(len(x)):
    x_mask.mask[i] = 1
    Ai = np.ma.inner(np.ma.inner(x_mask, s), x_mask.T)
    A.append(Ai)
    x_mask.mask[i] = 0
A_vec = np.array(A)
Implementing a matrix/vector product using loops will be rather slow in Python. Therefore, I suggest actually deleting the rows/columns/elements at the given index and performing the fast built-in dot product without any explicit loops:
i = 0 # don't forget Python's indices are zero-based
x_ = np.delete(X, i) # remove element
s_ = np.delete(S, i, axis=0) # remove row
s_ = np.delete(s_, i, axis=1) # remove column
result = x_.dot(s_).dot(x_) # no need to transpose a 1-D array
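Repeating the deletion for every index gives the full list of A_i. The following is a sketch under my own assumptions: it reuses the diagonal sample matrix from the masked-array answer, and the helper name `leave_one_out` is hypothetical. It also cross-checks the result by zeroing out x[i] instead of deleting anything.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
s = np.diag([4.0, 5.0, 6.0, 7.0])

def leave_one_out(x, s, i):
    """Quadratic form x_ S_ x_ with element i of x and row/column i of S removed."""
    x_ = np.delete(x, i)
    s_ = np.delete(np.delete(s, i, axis=0), i, axis=1)
    return x_.dot(s_).dot(x_)

A = np.array([leave_one_out(x, s, i) for i in range(len(x))])

# Cross-check: setting x[i] to zero removes the same terms from the sum
A_zeroed = []
for i in range(len(x)):
    xz = x.copy()
    xz[i] = 0.0
    A_zeroed.append(xz.dot(s).dot(xz))

print(A)                                 # [186. 170. 136.  78.]
print(np.allclose(A, A_zeroed))
```

The loop here runs only over i; each product itself is a single vectorized dot call.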

Numpy convert list of 3D variable size volumes to 4D array

I'm working on a neural network where I am augmenting data via rotation and varying the size of each input volume.
Let me back up: the input to the network is a 3D volume. I generate variable-size 3D volumes and then pad each volume with zeros so that the input volume size is constant. Check here for an issue I was having with padding (now resolved).
I generate a variable-size 3D volume, append it to a list, and then convert the list into a numpy array. At this point, padding hasn't occurred, so converting it into a 4D array makes no sense...
input_augmented_matrix = []
label_augmented_matrix = []
for i in range(n_volumes):
    if i % 50 == 0:
        print ("Augmenting step #" + str(i))
    slice_index = randint(0,n_input)
    z_max = randint(5,n_input)
    z_rand = randint(3,5)
    z_min = z_max - z_rand
    x_max = randint(75, n_input_x)
    x_rand = randint(60, 75)
    x_min = x_max - x_rand
    y_max = randint(75, n_input_y)
    y_rand = randint(60, 75)
    y_min = y_max - y_rand
    random_rotation = randint(1,4) * 90
    for j in range(2):
        temp_volume = np.empty((z_rand, x_rand, y_rand))
        k = 0
        for z in range(z_min, z_max):
            l = 0
            for x in range(x_min, x_max):
                m = 0
                for y in range(y_min, y_max):
                    if j == 0:
                        #input volume
                        temp_volume[k][l][m] = input_matrix[z][x][y]
                    else:
                        #ground truth volume
                        temp_volume[k][l][m] = label_matrix[z][x][y]
                    m = m + 1
                l = l + 1
            k = k + 1
        temp_volume = np.asarray(temp_volume)
        temp_volume = np.rot90(temp_volume,random_rotation)
        if j == 0:
            input_augmented_matrix.append(temp_volume)
        else:
            label_augmented_matrix.append(temp_volume)
input_augmented_matrix = np.asarray(input_augmented_matrix)
label_augmented_matrix = np.asarray(label_augmented_matrix)
The dimensions of input_augmented_matrix at this point is (N,)
Then I pad with the following code...
for i in range(n_volumes):
    print("Padding volume #" + str(i))
    input_augmented_matrix[i] = np.lib.pad(input_augmented_matrix[i], ((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
                                           (0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
                                           (0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
                                           'constant', constant_values=0)
    label_augmented_matrix[i] = np.lib.pad(label_augmented_matrix[i], ((0,n_input_z - int(label_augmented_matrix[i][:,0,0].shape[0])),
                                           (0,n_input_x - int(label_augmented_matrix[i][0,:,0].shape[0])),
                                           (0,n_input_y - int(label_augmented_matrix[i][0,0,:].shape[0]))),
                                           'constant', constant_values=0)
At this point, the dimensions are still (N,) even though every element of the list now has the same shape. For example, input_augmented_matrix[0].shape == input_augmented_matrix[1].shape
Currently I just loop through and create a new array, but it takes too long and I would prefer some sort of method that automates this. I do it with the following code...
input_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
label_4d = np.empty((n_volumes, n_input_z, n_input_x, n_input_y))
for i in range(n_volumes):
    print("Converting to 4D tuple #" + str(i))
    for j in range(n_input_z):
        for k in range(n_input_x):
            for l in range(n_input_y):
                input_4d[i][j][k][l] = input_augmented_matrix[i][j][k][l]
                label_4d[i][j][k][l] = label_augmented_matrix[i][j][k][l]
Is there a cleaner and faster way to do this?
As I understand it, in this part
k = 0
for z in range(z_min, z_max):
    l = 0
    for x in range(x_min, x_max):
        m = 0
        for y in range(y_min, y_max):
            if j == 0:
                #input volume
                temp_volume[k][l][m] = input_matrix[z][x][y]
            else:
                #ground truth volume
                temp_volume[k][l][m] = label_matrix[z][x][y]
            m = m + 1
        l = l + 1
    k = k + 1
You just want to do this
temp_input = input_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
temp_label = label_matrix[z_min:z_max, x_min:x_max, y_min:y_max]
and then
temp_input = np.rot90(temp_input, random_rotation)
temp_label = np.rot90(temp_label, random_rotation)
input_augmented_matrix[i] = np.lib.pad(
    input_augmented_matrix[i],
    ((0,n_input_z - int(input_augmented_matrix[i][:,0,0].shape[0])),
     (0,n_input_x - int(input_augmented_matrix[i][0,:,0].shape[0])),
     (0,n_input_y - int(input_augmented_matrix[i][0,0,:].shape[0]))),
    'constant', constant_values=0)
It is better to write the padding call like this, because the shape attribute already gives the size of the array along every dimension:
ia_shape = input_augmented_matrix[i].shape
input_augmented_matrix[i] = np.lib.pad(
    input_augmented_matrix[i],
    ((0, n_input_z - ia_shape[0]),
     (0, n_input_x - ia_shape[1]),
     (0, n_input_y - ia_shape[2])),
    'constant', constant_values=0)
I guess now you're ready to refactor the last part of your code with magic indexing of NumPy.
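For the final conversion, one possible refactoring is sketched below. It is an illustration under assumptions, not the asker's actual pipeline: the target shape, the random volumes and the helper `pad_to` are all made up. Each volume is zero-padded at the end of every axis and `np.stack` then builds the 4D array in one call instead of copying element by element.

```python
import numpy as np

# Stand-in for (n_input_z, n_input_x, n_input_y); assumed values
target_shape = (5, 8, 8)
rng = np.random.RandomState(0)
volumes = [rng.rand(3, 6, 7), rng.rand(4, 8, 5), rng.rand(5, 7, 8)]

def pad_to(volume, shape):
    """Zero-pad a 3-D volume at the end of each axis up to `shape`."""
    widths = [(0, t - s) for s, t in zip(volume.shape, shape)]
    return np.pad(volume, widths, mode='constant', constant_values=0)

# np.stack adds the new leading axis; all inputs must share one shape
input_4d = np.stack([pad_to(v, target_shape) for v in volumes])
print(input_4d.shape)                    # (3, 5, 8, 8)
```

Because every padded volume has the same shape, the result is a regular float array rather than an object array of shape (N,).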
My general suggestions:
use functions for repeated parts of the code to avoid deep indentation like in your cascade of loops;
if you really need that many nested loops and cannot do without them, think about recursion;
explore the abilities of NumPy in the official documentation: they're really exciting ;) For example, indexing is helpful for this task;
use the PyLint and Flake8 packages to inspect the quality of your code.
Do you want to write a neural network yourself, or do you just want to solve a pattern recognition task? The SciPy library may contain what you need, and it's based on NumPy.

Efficient double iteration over array

I have the following code, where points is a list of lists with many rows and 3 columns, coorRadius is the radius within which I want to find the local coordinate maxima, and localCoordinateMaxima is an array where I store the indices i of these maxima:
for i,x in enumerate(points):
    check = 1
    for j,y in enumerate(points):
        if linalg.norm(x-y) <= coorRadius and x[2] < y[2]:
            check = 0
    if check == 1:
        localCoordinateMaxima.append(i)
print(localCoordinateMaxima)
Unfortunately, this takes forever when I have several thousand points, so I am looking for a way to speed it up. I tried to do it with an all() condition, but I couldn't get it to work, and I am not even sure it would be more efficient. Could you suggest a way to make it faster?
Your search for neighbors is best done using a KDTree.
from scipy.spatial import cKDTree
tree = cKDTree(points)
pairs = tree.query_pairs(coorRadius)
Now pairs is a set of two item tuples (i, j), where i < j and points[i] and points[j] are within coorRadius of each other. You can now simply iterate over these, which will likely be a much smaller set than the len(points)**2 you are currently iterating over:
is_maximum = [True] * len(points)
for i, j in pairs:
    if points[i][2] < points[j][2]:
        is_maximum[i] = False
    elif points[j][2] < points[i][2]:
        is_maximum[j] = False
localCoordinateMaxima, = np.nonzero(is_maximum)
This can be further sped up by vectorizing it:
pairs = np.array(list(pairs))
pairs = np.vstack((pairs, pairs[:, ::-1]))
pairs = pairs[np.argsort(pairs[:, 0])]
# True where the first point of a pair is not lower than its neighbor
is_z_not_smaller = points[pairs[:, 0], 2] >= points[pairs[:, 1], 2]
bins, = np.nonzero(pairs[:-1, 0] != pairs[1:, 0])
bins = np.concatenate(([0], bins+1))
# a point is a maximum only if none of its neighbors is higher
is_maximum = np.logical_and.reduceat(is_z_not_smaller, bins)
localCoordinateMaxima, = np.nonzero(is_maximum)
The above code assumes that every point has at least one neighbor within coorRadius. If that is not the case, you need to slightly complicate things:
pairs = np.array(list(pairs))
pairs = np.vstack((pairs, pairs[:, ::-1]))
pairs = pairs[np.argsort(pairs[:, 0])]
is_z_not_smaller = points[pairs[:, 0], 2] >= points[pairs[:, 1], 2]
# boolean mask marking the first row of each run of equal pairs[:, 0]
group_start = np.concatenate(([True], pairs[:-1, 0] != pairs[1:, 0]))
has_neighbors = pairs[group_start, 0]
bins, = np.nonzero(group_start)
is_maximum = np.ones((len(points),), bool)
is_maximum[has_neighbors] &= np.logical_and.reduceat(is_z_not_smaller, bins)
localCoordinateMaxima, = np.nonzero(is_maximum)
Here is the version of your code just tightened-up a bit:
for i, x in enumerate(points):
    x2 = x[2]
    for y in points:
        if linalg.norm(x-y) <= coorRadius and x2 < y[2]:
            break
    else:
        localCoordinateMaxima.append(i)
print(localCoordinateMaxima)
Factor-out the x[2] lookup.
The j variable was unused.
Add a break for an early-out
Use a for-else construct instead of a flag variable
With numpy this is not too hard. You can do it with a single (long) expression, if you want:
import numpy as np
points = np.array(points)
localCoordinateMaxima = np.where(np.all((np.linalg.norm(points-points[:,None], axis=-1) >
                                         coorRadius) |
                                        (points[:,2] >= points[:,None,2]),
                                        axis=0))
The algorithm your current code implements is essentially where(not(any(w <= x and y < z))). If you distribute the not through the logical operations inside of it (using De Morgan's laws), you can avoid one level of nesting by flipping the inequalities, getting where(all(w > x or y >= z)).
w is a matrix of norms applied to the differences of the points broadcast together. x is a constant. y and z are both arrays with the third coordinates of the points, shaped so that they broadcast together into the same shape as w.
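The description above can be checked against a brute-force loop. This is a sketch on random data; the point count, radius and seed are assumptions of mine. The intermediate arrays are named explicitly rather than folded into one expression, and they are indexed so that row i refers to the candidate point and column j to its potential neighbour.

```python
import numpy as np

rng = np.random.RandomState(0)           # fixed seed, assumption for reproducibility
points = rng.rand(100, 3)
coorRadius = 0.2

# Brute-force reference with the same logic as the question's double loop
expected = [
    i for i, p in enumerate(points)
    if not any(np.linalg.norm(p - q) <= coorRadius and p[2] < q[2]
               for q in points)
]

# w[i, j]: distance between points i and j, shape (N, N)
w = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
# not_lower[i, j]: third coordinate of point i is not below that of point j
not_lower = points[:, None, 2] >= points[None, :, 2]

# where(all(w > x or y >= z)), reduced over the neighbour axis
maxima, = np.where(np.all((w > coorRadius) | not_lower, axis=1))
print(np.array_equal(maxima, expected))
```

The pairwise arrays need memory proportional to N squared, so for many thousands of points a neighbour search structure is likely the better fit.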

