The situation is I'd like to take the following Python / NumPy code:
import numpy as np
# Procure some data:
z = np.zeros((32,32))
step = 2  # block size; implied by the (256, 2, 2) result below
chunks = []
for i in range(0,32,step):
    for j in range(0,32,step):
        chunks.append( z[i:i+step,j:j+step] )
chunks = np.array(chunks)
chunks.shape # (256, 2, 2)
And vectorize it / remove the for loops. Is this possible? I don't mind much about ordering of the final array, e.g. 256,2,2 vs 2,2,256, as long as the spatial structure remains the same. That is, blocks of 2x2 from the original array.
Perhaps some magic using :: in addition to regular indexing can do this? Any NumPy masters here?
You may need transpose:
a = np.arange(1024).reshape(32,32)
a.reshape(16,2,16,2).transpose((0,2,1,3)).reshape(-1,2,2)
Output:
array([[[ 0, 1],
[ 32, 33]],
[[ 2, 3],
[ 34, 35]],
[[ 4, 5],
[ 36, 37]],
...,
[[ 986, 987],
[1018, 1019]],
[[ 988, 989],
[1020, 1021]],
[[ 990, 991],
[1022, 1023]]])
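In case it helps, here is the same reshape/transpose trick wrapped in a small function for an arbitrary block size, checked against the explicit double loop from the question. The name split_into_blocks is made up for this sketch, and it assumes both array dimensions are exact multiples of the block size.

```python
import numpy as np

def split_into_blocks(arr, step):
    """Split a 2-D array into non-overlapping step x step blocks (hypothetical helper)."""
    rows, cols = arr.shape
    if rows % step or cols % step:
        raise ValueError("array dimensions must be divisible by step")
    # (rows, cols) -> (block_row, row_in_block, block_col, col_in_block)
    # then bring the two block axes to the front and merge them
    return (arr.reshape(rows // step, step, cols // step, step)
               .transpose(0, 2, 1, 3)
               .reshape(-1, step, step))

a = np.arange(1024).reshape(32, 32)
blocks = split_into_blocks(a, 2)

# Reference result built with the explicit double loop from the question
looped = np.array([a[i:i + 2, j:j + 2]
                   for i in range(0, 32, 2)
                   for j in range(0, 32, 2)])

print(blocks.shape)
print(np.array_equal(blocks, looped))
```

The transpose is what keeps each 2x2 block spatially intact; reshaping straight to (-1, 2, 2) without it would just cut each row into consecutive pairs.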
Let's say I create a 4x4 NumPy matrix. What is the best way to apply a function to all elements in the matrix, without looping through each element if possible?
import numpy as np
import numpy.matlib  # np.matlib is only available after this explicit import
def myFunction(x):
    return (x * 2) + 3
myMatrix = np.matlib.zeros((4, 4))
# What is the best way to apply myFunction to each element in myMatrix?
EDIT: The current solutions proposed work great if the function is matrix-friendly, but what if it's a function like this that deals with scalars only?
import random
def randomize():
    x = random.randrange(0, 10)
    if x < 5:
        x = -1
    return x
Would the only way be to loop through the matrix and apply the function to each scalar inside the matrix? I'm not looking for a specific solution (like how to randomize the matrix), but rather a general solution to apply a function over the matrix. Hope this helps!
This shows two possible ways of doing maths on a whole Numpy array without using an explicit loop:
import numpy as np
# Make a simple array with unique elements
m = np.arange(12).reshape((4,3))
# Looks like:
# array([[ 0, 1, 2],
# [ 3, 4, 5],
# [ 6, 7, 8],
# [ 9, 10, 11]])
# Apply formula to all elements without loop
m = m*2 + 3
# Looks like:
# array([[ 3, 5, 7],
# [ 9, 11, 13],
# [15, 17, 19],
# [21, 23, 25]])
# Define a function
def f(x):
    return (x*2) + 3
# Apply function to all elements
f(m)
# Looks like:
# array([[ 9, 13, 17],
# [21, 25, 29],
# [33, 37, 41],
# [45, 49, 53]])
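For the scalar-only case raised in the question's edit, one option is np.vectorize, which wraps a scalar function so it can be called on an array. It is a convenience wrapper and still loops in Python internally, so when the rule can be written as an array expression (for example with np.where) that is usually the better choice. The function clamp_low below is a made-up stand-in for a scalar function with a branch; it is not the randomize() from the question.

```python
import numpy as np

def clamp_low(x):
    # Hypothetical scalar-only function: the `if` would fail on a whole array
    if x < 5:
        return -1
    return x

m = np.arange(12).reshape((4, 3))

# Wrap the scalar function so it is applied element by element
via_vectorize = np.vectorize(clamp_low)(m)

# The same rule expressed directly as an array operation
via_where = np.where(m < 5, -1, m)

print(via_vectorize)
print(np.array_equal(via_vectorize, via_where))
```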
If I have an ndarray like this:
>>> a = np.arange(27).reshape(3,3,3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know I can get the maximum along a certain axis using np.max(axis=...):
>>> a.max(axis=2)
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
Alternatively, I could get the indices along that axis which correspond to the maximum values from:
>>> indices = a.argmax(axis=2)
>>> indices
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
My question -- Given the array indices and the array a, is there an elegant way to reproduce the array returned by a.max(axis=2)?
This would probably work:
import itertools as it
import numpy as np
def apply_mask(field,indices):
    data = np.empty(indices.shape)
    #It seems highly likely that there is a more numpy-approved way to do this.
    idx = [range(i) for i in indices.shape]
    for idx_tup,zidx in zip(it.product(*idx),indices.flat):
        data[idx_tup] = field[idx_tup+(zidx,)]
    return data
But, it seems pretty hacky/inefficient. It also doesn't allow for me to use this with any axis other than the "last" axis. Is there a numpy function (or some use of magical numpy indexing) to make this work? The naive a[:,:,a.argmax(axis=2)] doesn't work.
UPDATE:
It seems the following also works (and is a little nicer):
import numpy as np
def apply_mask(field,indices):
    data = np.empty(indices.shape)
    for idx_tup,zidx in np.ndenumerate(indices):
        data[idx_tup] = field[idx_tup+(zidx,)]
    return data
I would like to do this because I would like to extract the indices based on the data in 1 array (typically using argmax(axis=...)) and use those indices to pull data out of a bunch of other (equivalently shaped) arrays. I'm open to alternative ways to accomplish this (e.g. using boolean masked arrays). However, I like the "safety" that I get using these "index" arrays. With this I am guaranteed to have the right number of elements to create a new array which looks like a 2d "slice" through the 3d field.
Here is some magic numpy indexing that will do what you want, but unfortunately it's pretty unreadable.
def apply_mask(a, indices, axis):
    magic_index = [np.arange(i) for i in indices.shape]
    magic_index = np.ix_(*magic_index)
    magic_index = magic_index[:axis] + (indices,) + magic_index[axis:]
    return a[magic_index]
or equally unreadable:
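def apply_mask(a, indices, axis):
    # list(...) so insert() works whether ogrid returns a list or a tuple
    magic_index = list(np.ogrid[tuple(slice(i) for i in indices.shape)])
    magic_index.insert(axis, indices)
    # index with a tuple; a plain list is not treated as a multi-axis index
    return a[tuple(magic_index)]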
I use index_at() to create the full index:
import numpy as np
def index_at(idx, shape, axis=-1):
    if axis<0:
        axis += len(shape)
    shape = shape[:axis] + shape[axis+1:]
    index = list(np.ix_(*[np.arange(n) for n in shape]))
    index.insert(axis, idx)
    return tuple(index)
a = np.random.randint(0, 10, (3, 4, 5))
axis = 1
idx = np.argmax(a, axis=axis)
print(a[index_at(idx, a.shape, axis=axis)])
print(np.max(a, axis=axis))
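On newer NumPy (1.15 and later, as far as I know) there is also np.take_along_axis, which does this kind of lookup directly. A minimal sketch, where the wrapper name pick_along_axis and the second array b are made up for illustration:

```python
import numpy as np

def pick_along_axis(field, indices, axis=-1):
    """Select field values at `indices` along `axis` (hypothetical helper)."""
    # take_along_axis needs indices with the same number of dimensions as field,
    # so put back the axis that argmax removed, then drop it again afterwards
    expanded = np.expand_dims(indices, axis=axis)
    return np.take_along_axis(field, expanded, axis=axis).squeeze(axis=axis)

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(3, 4, 5))
b = rng.random((3, 4, 5))  # another array with the same shape

idx = a.argmax(axis=1)
print(np.array_equal(pick_along_axis(a, idx, axis=1), a.max(axis=1)))

# The same indices can pull the matching entries out of b
print(pick_along_axis(b, idx, axis=1).shape)
```

This works for any axis, which covers the case the loop-based versions in the question could not handle.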
I have a three dimensional ndarray of 2D coordinates, for example:
[[[1704 1240]
[1745 1244]
[1972 1290]
[2129 1395]
[1989 1332]]
[[1712 1246]
[1750 1246]
[1964 1286]
[2138 1399]
[1989 1333]]
[[1721 1249]
[1756 1249]
[1955 1283]
[2145 1399]
[1990 1333]]]
The ultimate goal is to remove the point closest to a given point ([1989 1332]) from each "group" of 5 coordinates. My thought was to produce a similarly shaped array of distances, and then using argmin to determine the indices of the values to be removed. However, I am not certain how to go about applying a function, like one to calculate a distance to a given point, to every element in an ndarray, at least in a NumPythonic way.
List comprehensions are a very inefficient way to deal with numpy arrays. They're an especially poor choice for the distance calculation.
To find the difference between your data and a point, you'd just do data - point. You can then calculate the distance using np.hypot, or if you'd prefer, square it, sum it, and take the square root.
It's a bit easier if you make it an Nx2 array for the purposes of the calculation though.
Basically, you want something like this:
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print(dist)
This yields:
array([[[ 299.48121811],
[ 259.38388539],
[ 45.31004304],
[ 153.5219854 ],
[ 0. ]],
[[ 290.04310025],
[ 254.0019685 ],
[ 52.35456045],
[ 163.37074401],
[ 1. ]],
[[ 280.55837182],
[ 247.34186868],
[ 59.6405902 ],
[ 169.77926846],
[ 1.41421356]]])
Now, removing the closest element is a bit harder than simply getting the closest element.
With numpy, you can use boolean indexing to do this fairly easily.
However, you'll need to worry a bit about the alignment of your axes.
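The key is to understand that numpy broadcasting lines up array shapes starting from the last axis. In this case, we want to broadcast the per-group minimum along the middle axis, so the two arrays being compared need shapes (3, 5) and (3, 1).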
Also, -1 can be used as a placeholder for the size of an axis. Numpy will calculate the permissible size when -1 is put in as the size of an axis.
What we'd need to do would look a bit like this:
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
You could make that a single line, I'm just breaking it down for readability. The key is that dist != something yields a boolean array which you can then use to index the original array.
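A tiny illustration of why the final reshape is needed: indexing a 3D array with a 2D boolean mask collapses the first two axes into one, so the groups have to be restored afterwards. The toy array and mask here are made up for the example.

```python
import numpy as np

data = np.arange(12).reshape(2, 3, 2)    # 2 groups of 3 points, 2 coords each
mask = np.array([[True, False, True],
                 [True, True, False]])   # keep 2 points in each group

selected = data[mask]                    # groups are flattened: shape (4, 2)
regrouped = selected.reshape(data.shape[0], -1, data.shape[2])

print(selected.shape)
print(regrouped.shape)
```

The reshape only works because the mask keeps the same number of points in every group.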
So, putting it all together:
import numpy as np
data = np.array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395],
[1989, 1332]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399],
[1989, 1333]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399],
[1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
print(filtered)
Yields:
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
On a side note, if more than one point is equally close, this won't work. Numpy arrays have to have the same number of elements along each dimension, so you'll need to re-do your grouping in that case.
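If ties are a concern, one possible workaround is to build the mask from argmin instead of comparing against the minimum. argmin returns only the first minimum in each group, so exactly one point per group is dropped and the result stays rectangular. This is a sketch under that assumption (dropping the first of several equally close points is acceptable), and the function name remove_closest is made up.

```python
import numpy as np

def remove_closest(data, point):
    """Drop the single closest point from each group (hypothetical helper)."""
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])     # shape (groups, points)
    closest = dist.argmin(axis=1)                   # first minimum per group
    keep = np.ones(dist.shape, dtype=bool)
    keep[np.arange(dist.shape[0]), closest] = False # exactly one False per row
    return data[keep].reshape(data.shape[0], -1, data.shape[2])

# Toy data: the first group has two points at the same distance from the origin
tied = np.array([[[1, 0], [-1, 0], [3, 0]],
                 [[2, 2], [0, 1], [5, 5]]])
result = remove_closest(tied, [0, 0])
print(result)
```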
If I understand your question correctly, I think you're looking for apply_along_axis. Using numpy's built-in broadcasting, we can simply subtract the point from the array:
>>> a - numpy.array([1989, 1332])
array([[[-285, -92],
[-244, -88],
[ -17, -42],
[ 140, 63],
[ 0, 0]],
[[-277, -86],
[-239, -86],
[ -25, -46],
[ 149, 67],
[ 0, 1]],
[[-268, -83],
[-233, -83],
[ -34, -49],
[ 156, 67],
[ 1, 1]]])
Then we can apply numpy.linalg.norm to it:
>>> dist = a - numpy.array([1989, 1332])
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811, 259.38388539, 45.31004304,
153.5219854 , 0. ],
[ 290.04310025, 254.0019685 , 52.35456045,
163.37074401, 1. ],
[ 280.55837182, 247.34186868, 59.6405902 ,
169.77926846, 1.41421356]])
Finally, some boolean mask trickery, along with a couple of reshape calls:
>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
[1745, 1244],
[1972, 1290],
[2129, 1395]],
[[1712, 1246],
[1750, 1246],
[1964, 1286],
[2138, 1399]],
[[1721, 1249],
[1756, 1249],
[1955, 1283],
[2145, 1399]]])
Joe Kington's answer is faster though. Oh well. I'll leave this for posterity.
import numpy as np
def joes(data, point):
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))
def mine(a, point):
    dist = a - point
    normed = np.apply_along_axis(np.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
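Much of that gap probably comes from apply_along_axis calling the norm function once per point from Python. np.linalg.norm accepts an axis argument, so the distances can be computed in a single call instead. A short sketch using the data from the question (I have not re-run the timings with it):

```python
import numpy as np

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290],
                  [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286],
                  [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283],
                  [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

# Norm over the coordinate axis: one distance per point, shape (3, 5)
normed = np.linalg.norm(data - point, axis=2)

# Same values as the apply_along_axis version above
slow = np.apply_along_axis(np.linalg.norm, 2, data - point)
print(np.allclose(normed, slow))
```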
There are multiple ways to do this, but here is one using list comprehensions:
Distance function:
In [35]: from numpy.linalg import norm
In [36]: dist = lambda x,y:norm(x-y)
Input data:
In [39]: GivenMatrix = numpy.random.rand(3, 5, 2)
In [40]: GivenMatrix
Out[40]:
array([[[ 0.83798666, 0.90294439],
[ 0.8706959 , 0.88397176],
[ 0.91879085, 0.93512921],
[ 0.15989245, 0.57311869],
[ 0.82896003, 0.53589968]],
[[ 0.0207089 , 0.9521768 ],
[ 0.94523963, 0.31079109],
[ 0.41929482, 0.88559614],
[ 0.87885236, 0.45227422],
[ 0.58365369, 0.62095507]],
[[ 0.14757177, 0.86101539],
[ 0.58081214, 0.12632764],
[ 0.89958321, 0.73660852],
[ 0.3408943 , 0.45420989],
[ 0.42656333, 0.42770216]]])
In [41]: q = numpy.random.rand(2)
In [42]: q
Out[42]: array([ 0.03280889, 0.71057403])
Compute output distances:
In [44]: distances = [[dist(x, q) for x in SubMatrix]
for SubMatrix in GivenMatrix]
In [45]: distances
Out[45]:
[[0.82783910695733931,
0.85564093542511577,
0.91399620574915652,
0.18720096539588818,
0.81508758596405939],
[0.24190557184498068,
0.99617079746515047,
0.42426891258164884,
0.88459501973012633,
0.55808740166908177],
[0.18921712490174292,
0.80103146210692744,
0.86716521557255788,
0.40079819635686459,
0.48482888965287363]]
To rank the results for each submatrix:
In [46]: numpy.argsort(distances)
Out[46]:
array([[3, 4, 0, 1, 2],
[0, 2, 4, 3, 1],
[0, 3, 4, 1, 2]])
As for the deletion, I personally think that's easiest by converting GivenMatrix to a list, then using del:
>>> GivenList = GivenMatrix.tolist()
>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix