I have a 3D stack of masked arrays. I'd like to sample all arrays in the stack at the same fixed locations.
stack.ma_stack.shape
(1461, 390, 327)
#Indices to be sampled
x = np.array([ 117.38670304, 119.1220485 ])
y = np.array([ 209.98120554, 210.37202372])
The following is very efficient, but only works for integer indices:
x_int = np.rint(x).astype(int)
y_int = np.rint(y).astype(int)
samp = stack.ma_stack[:,y_int,x_int]
samp.shape
(1461, 2)
I'm trying to implement the scipy.ndimage.map_coordinates interpolated sampling for float indices, but I can't seem to figure out how to format the coordinates properly.
Most examples use map_coordinates to sample a single array, and the following works for a single array from the stack:
map_coord = np.array([[y,], [x,]])
samp = scipy.ndimage.map_coordinates(stack.ma_stack[0], map_coord, order=1)
samp.shape
(1, 2)
I can loop through each array in the stack, but I know there is a simple indexing trick that will sample the entire stack in a single call. I read about mgrid, and did some experimentation, but couldn't find the right solution (I'm still learning advanced indexing). I know somebody out there will know the answer right away. Thanks.
On a related note: Anybody know how to do this for masked arrays without replacing missing data with fill_value or np.nan? The ndimage interpolation doesn't play nicely with masked arrays:
https://github.com/scipy/scipy/issues/1682
There must be a way to get it to broadcast automatically... in the meantime, you can force the broadcasting with np.arange(...) to get one point from each 2d array in the stack:
map_coords = np.broadcast_arrays(np.arange(stack.ma_stack.shape[0])[:, None], y, x)
samp = ndimage.map_coordinates(stack.ma_stack, map_coords, order=1)
This is inefficient though, because the "broadcasting" is done in advance (and presumably copies all that data), but it's still quite a bit faster than the loop:
In [88]: a = np.random.rand(1461, 390, 327)
In [89]: x = np.array([ 117.38670304, 119.1220485 ])
In [90]: y = np.array([ 209.98120554, 210.37202372])
In [107]: %%timeit
.....: map_coord = np.array([[y,], [x,]])
.....: np.concatenate([ndimage.map_coordinates(ai, map_coord, order=1) for ai in a])
.....:
10 loops, best of 3: 33.1 ms per loop
In [108]: %%timeit
.....: map_coords = np.broadcast_arrays(np.arange(a.shape[0])[:, None], y, x)
.....: ndimage.map_coordinates(a, map_coords, order=1)
.....:
100 loops, best of 3: 4.67 ms per loop
In [109]: samp_OP = np.concatenate([ndimage.map_coordinates(ai, map_coord, order=1) for ai in a])
In [110]: samp_chan = ndimage.map_coordinates(a, map_coords, order=1)
In [111]: np.allclose(samp_chan, samp_OP)
Out[111]: True
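As for the masked-array part of the question: ndimage still won't understand the mask, but a common workaround (just a sketch, assuming you want any sample whose interpolation stencil touches a masked cell to come back masked) is to interpolate the filled data and the mask separately, then rebuild a masked array:
import numpy as np
from scipy import ndimage

# Fill masked cells (NaN works for float data; otherwise pick a fill_value).
data = stack.ma_stack.filled(np.nan)
mask = np.ma.getmaskarray(stack.ma_stack).astype(float)

map_coords = np.broadcast_arrays(np.arange(data.shape[0])[:, None], y, x)
samp_data = ndimage.map_coordinates(data, map_coords, order=1)
samp_mask = ndimage.map_coordinates(mask, map_coords, order=1)

# Mask every sample that picked up any weight from a masked cell.
samp = np.ma.masked_array(samp_data, mask=samp_mask > 0)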
I want to calculate a large distance matrix based on a higher-dimensional vector. For instance, I have 1000 instances, each represented by 20 vectors of length 10. The distance between two instances is the mean of the pairwise distances between the 20 vectors of one instance and the 20 vectors of the other. So I want to go from a 1000 by 20 by 10 array to a 1000 by 1000 (lower-triangular) matrix. Because these calculations can get slow, I want to use Dask distributed to block the algorithm and spread it over several CPUs. Below is how far I've gotten:
Preamble
import itertools
import random
import numpy as np
import dask.array
from dask.distributed import Client
The distance function is defined by
def distance(u, v):
    result = np.empty([int((len(u)*(len(u)+1))/2)], dtype=float)
    for i, j in itertools.product(range(len(u)), range(len(v))):
        if j <= i:
            differences = []
            k = int(((i*(i+1))/2 + j - 1) + 1)
            for x, y in itertools.product(u[i], v[j]):
                difference = np.abs(np.array(x) - np.array(y)).sum(axis=1)
                differences.append(difference)
            result[k] = np.mean(differences)
    return result
and returns an array of length n*(n+1)/2 to describe the lower triangular matrix for this block of the distance matrix.
def distance_matrix(X):
    X = np.asarray(X, dtype=object)
    X = dask.array.from_array(X, (100, 20, 10)).astype(float)
    print("chunksize: ", X.chunksize)
    resulting_length = [int((X.chunksize[0]*(X.chunksize[0])+1)/2)]
    result = dask.array.map_blocks(distance, X, X, chunks=(resulting_length), drop_axis=[1,2], dtype=float)
    return result.compute()
I split up the input array into chunks and use dask.array.map_blocks to apply the distance calculation to all the blocks.
if __name__ == '__main__':
    workers = 6
    X = np.array([[[random.random() for _ in range(10)] for _ in range(20)] for _ in range(1000)])
    client = Client(n_workers=workers)
    results = distance_matrix(X)
    client.close()
    print(results)
Unfortunately, this approach returns an array of the wrong length at the end of the process. Would somebody be willing to help me out here? I don't have much experience in distributed computing.
I'm a big fan of dask, but this problem is way too small to need it. The runtime issue you're seeing is because you are looping through each element in python rather than using vectorized operations in numpy.
As with many packages in Python, numpy relies on highly efficient compiled code written in other, faster languages such as C to carry out array operations. When you write an array operation like A + B, numpy calls these fast routines once, and the whole operation runs inside a highly optimized C routine. There is some overhead in making that call, but it is overwhelmed by the performance gain of handing the entire loop to fast compiled code. If instead you loop over every element and add cell-wise, the (slow) Python interpreter makes a separate call into the C code for each element, so you pay that overhead once per element of the array. Because of this, you would actually be better off not using numpy at all if you're going to operate on one element at a time.
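To make that concrete, here is a minimal sketch of the two styles; the vectorized call hands the whole loop to compiled code, while the Python loop pays the call overhead once per element:
import numpy as np

A = np.random.random(1_000_000)
B = np.random.random(1_000_000)

# One call into compiled code: the loop over elements happens in C.
C = A + B

# Element-wise Python loop: the interpreter makes a separate small call
# for every element, so the per-call overhead dominates the runtime.
C_loop = np.empty_like(A)
for i in range(len(A)):
    C_loop[i] = A[i] + B[i]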
To implement this in a vectorized manner, you can exploit numpy's broadcasting rules: give each array a new axis so that the first dimension of one array is paired against the first dimension of the other. I don't totally understand what's going on in your distance function, but you could extend this simple version to do whatever you want:
In [1]: import numpy as np
In [2]: A = np.random.random((1000, 20))
...: B = np.random.random((1000, 20))
In [3]: distance = np.abs(A[:, np.newaxis, :] - B[np.newaxis, :, :]).sum(axis=-1)
In [4]: distance
Out[4]:
array([[7.22985776, 7.76185666, 5.61824886, ..., 7.62092039, 6.35189562,
7.06365986],
[5.73359499, 5.8422105 , 7.2644021 , ..., 5.72230353, 6.79390303,
5.03074007],
[7.27871151, 8.6856818 , 5.97489449, ..., 8.86620029, 7.49875638,
6.57389575],
...,
[7.67783107, 7.24419076, 4.17941596, ..., 8.68674754, 6.65078093,
5.67279811],
[7.1550136 , 6.10590227, 5.75417987, ..., 7.05953998, 5.8306628 ,
6.55112672],
[5.81748615, 6.79246838, 6.95053088, ..., 7.63994705, 6.77720511,
7.5663236 ]])
In [5]: distance.shape
Out[5]: (1000, 1000)
The performance difference can be seen clearly against a looped implementation:
In [6]: %%timeit
...: np.abs(A[:, np.newaxis, :] - B[np.newaxis, :, :]).sum(axis=-1)
...:
...:
45 ms ± 326 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %%timeit
...: distances = np.empty((1000, 1000))
...: for i in range(1000):
...:     for j in range(1000):
...:         distances[i, j] = np.abs(A[i, :] - B[j, :]).sum()
...:
2.42 s ± 7.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The looped version takes more than 50x as long!
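If you do end up needing the full pairwise-of-vectors version from the question, the same broadcasting idea extends to the (n, 20, 10) data. The sketch below assumes the distance between two instances is the mean, over all 20x20 vector pairs, of the summed absolute differences; the intermediate array grows as n*n*20*20*10, so for n = 1000 you would want to process it in row blocks:
import numpy as np

X = np.random.random((100, 20, 10))   # smaller n than 1000 to keep memory modest

# Differences between every vector of every pair of instances:
# shape (n, n, 20, 20, 10).
diff = np.abs(X[:, None, :, None, :] - X[None, :, None, :, :])

# Sum over the length-10 axis, then average over the 20x20 vector pairs.
dist = diff.sum(axis=-1).mean(axis=(-1, -2))   # shape (n, n)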
I have been reading in multiple places (e.g. here) that numpy.append() should never be used.
For example, if one wants to stack multiple arrays together, it is much better to do so via an intermediate Python list:
import numpy as np
def stacker(arrs):
    result = arrs[0][None, ...]
    for arr in arrs[1:]:
        result = np.append(result, arr[None, ...], 0)
    return result
n = 1000
shape = (100, 100)
x = [np.random.randint(0, n, shape) for _ in range(n)]
%timeit np.array(x)
# 100 loops, best of 3: 17.6 ms per loop
%timeit np.concatenate([arr[None, ...] for arr in x])
# 100 loops, best of 3: 17.7 ms per loop
%timeit np.stack(x)
# 100 loops, best of 3: 18.3 ms per loop
%timeit stacker(x)
# 1 loop, best of 3: 12.5 s per loop
I understand that np.append() creates a copy of both its NumPy array inputs and this is much more inefficient than list.append() or list.extend() in this use-case. However, I find it hard to believe that NumPy developers just added a useless function.
So, what is the use-case for numpy.append()?
Look at its code:
def append(arr, values, axis=None):
    arr = asanyarray(arr)
    if axis is None:
        if arr.ndim != 1:
            arr = arr.ravel()
        values = ravel(values)
        axis = arr.ndim-1
    return concatenate((arr, values), axis=axis)
It's just a thin wrapper around concatenate. With axis it's a direct call to concatenate. Without it, it ravels the inputs first, which is often a source of problems. It also converts scalars to arrays.
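The ravel behaviour is worth a quick demonstration, since it's the usual source of surprise:
import numpy as np

a = np.zeros((2, 3))
b = np.ones((2, 3))

# Without axis, both inputs are flattened and the result is 1-D:
np.append(a, b).shape          # (12,)

# With axis, it's exactly np.concatenate:
np.append(a, b, axis=0).shape  # (4, 3)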
If you have a 1d array, then it is an easy way to add one value:
In [8]: np.append(np.arange(3), 10)
Out[8]: array([ 0, 1, 2, 10])
but hstack is just as nice:
In [10]: np.hstack([np.arange(3), 10])
Out[10]: array([ 0, 1, 2, 10])
People write functions that seem like a good idea at the time, usually with a specific use in mind. But the actual uses (and misuses) may differ from what was anticipated.
np.stack is a more recent, and useful, addition.
For a while there was a note in the docs urging us to use concatenate and stack and to avoid all the other *stack functions, but that's been toned down. Now they just have:
This function makes most sense for arrays with up to 3 dimensions. For instance, for pixel-data with a height (first axis), width (second axis), and r/g/b channels (third axis). The functions concatenate, stack and block provide more general stacking and concatenation operations.
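To see what those more general functions buy you, a small sketch: concatenate joins along an existing axis, while stack adds a new one.
import numpy as np

img1 = np.zeros((4, 5, 3))   # pretend r/g/b images
img2 = np.ones((4, 5, 3))

np.concatenate([img1, img2], axis=0).shape  # (8, 5, 3): join along an existing axis
np.stack([img1, img2], axis=0).shape        # (2, 4, 5, 3): add a new leading axis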
I have a one-dimensional numpy array, which is quite large. For each entry of the array, I need to produce a linearly spaced sub-array up to that entry value. Here is what I have as an example.
import numpy as np
a = np.array([2, 3])
b = np.array([np.linspace(0, i, 4) for i in a])
In this case each sub-array is a linear space of size 4. The last statement in the above code involves a for loop, which is rather slow if a is very large. Is there a trick to implement this in numpy itself?
You can phrase this as an outer product:
In [37]: a = np.arange(100000)
In [38]: %timeit np.array([np.linspace(0, i, 4) for i in a])
1 loop, best of 3: 1.3 s per loop
In [39]: %timeit np.outer(a, np.linspace(0, 1, 4))
1000 loops, best of 3: 1.44 ms per loop
The idea is to take a unit linspace and then scale it separately by each element of a.
As you can see, this gives ~1000x speed up for n=100000.
For completeness, I'll mention that this code has slightly different roundoff properties than your original version (likely not an issue in practical applications):
In [52]: np.max(np.abs(np.array([np.linspace(0, i, 4) for i in a]) -
...: np.outer(a, np.linspace(0, 1, 4))))
Out[52]: 1.4551915228366852e-11
P. S. An alternative way to express the idea is by using element-wise multiplication with broadcasting (based on a suggestion by @Scott Gigante):
In [55]: %timeit a[:, np.newaxis] * np.linspace(0, 1, 4)
1000 loops, best of 3: 1.48 ms per loop
P. P. S. See the comments below for further ideas on making this faster.
This sounds simple, and I think I'm overcomplicating this in my mind.
I want to make an array whose elements are generated from two source arrays of the same shape, depending on which element in the source arrays is greater.
To illustrate:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = (insert magic)
>> array([2, 5, 0])
I can't work out how to produce array3, which should take, element by element, the greater of the two corresponding values from array1 and array2.
Any help would be much appreciated. Thanks.
We could use the NumPy built-in np.maximum, made exactly for that purpose -
np.maximum(array1, array2)
Another way would be to use np.max on a 2D stacked array and max-reduce along the first axis (axis=0) -
np.max([array1,array2],axis=0)
Timings on arrays of 1 million elements -
In [271]: array1 = np.random.randint(0,9,(1000000))
In [272]: array2 = np.random.randint(0,9,(1000000))
In [274]: %timeit np.maximum(array1, array2)
1000 loops, best of 3: 1.25 ms per loop
In [275]: %timeit np.max([array1, array2],axis=0)
100 loops, best of 3: 3.31 ms per loop
# @Eric Duminil's soln1
In [276]: %timeit np.where( array1 > array2, array1, array2)
100 loops, best of 3: 5.15 ms per loop
# @Eric Duminil's soln2
In [277]: magic = lambda x,y : np.where(x > y , x, y)
In [278]: %timeit magic(array1, array2)
100 loops, best of 3: 5.13 ms per loop
Extending to other supporting ufuncs
Similarly, there's np.minimum for finding element-wise minimum values between two arrays of same or broadcastable shapes. So, to find element-wise minimum between array1 and array2, we would have :
np.minimum(array1, array2)
For a complete list of ufuncs that support this feature, please refer to the docs and look for the keyword: element-wise. Grepping for those, I got the following ufuncs:
add, subtract, multiply, divide, logaddexp, logaddexp2, true_divide,
floor_divide, power, remainder, mod, fmod, divmod, heaviside, gcd,
lcm, arctan2, hypot, bitwise_and, bitwise_or, bitwise_xor, left_shift,
right_shift, greater, greater_equal, less, less_equal, not_equal,
equal, logical_and, logical_or, logical_xor, maximum, minimum, fmax,
fmin, copysign, nextafter, ldexp
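These ufuncs also expose a .reduce method, which generalizes the two-array case to any number of arrays; a quick sketch:
import numpy as np

a = np.array([2, 3, 0])
b = np.array([1, 5, 0])
c = np.array([4, 1, 2])

# Element-wise maximum across all three arrays in one call.
np.maximum.reduce([a, b, c])   # array([4, 5, 2])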
If your condition ever becomes more complex, you could use np.where:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = np.where( array1 > array2, array1, array2)
# array([2, 5, 0])
You could replace array1 > array2 with any condition. If all you want is the maximum, go with @Divakar's answer.
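For example, a sketch with a slightly more involved (made-up) condition: take array1 only where it is non-zero and strictly larger, otherwise fall back to array2:
array3 = np.where((array1 != 0) & (array1 > array2), array1, array2)
# array([2, 5, 0])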
And just for fun:
magic = lambda x,y : np.where(x > y , x, y)
magic(array1, array2)
# array([2, 5, 0])
I am trying to optimize some code, and by profiling I noticed that this particular loop takes a lot of time. Can you help me write it faster?
import numpy as np
rows_a, rows_v, cols = (10, 15, 3)
a = np.arange(rows_a*cols).reshape(rows_a,cols)
v = np.arange(rows_v*cols).reshape(rows_v,cols)
c = 0
for i in range(len(v)):
    D = ((a-v[i])**2).sum(axis=-1)
    c += D.min()
print(c)
Is there any numpy function that can do this efficiently?
import numpy as np
rows_a, rows_v, cols = (10, 15, 3)
a = np.arange(rows_a*cols).reshape(rows_a,cols)
v = np.arange(rows_v*cols).reshape(rows_v,cols)
def using_loop():
    c = 0
    for i in range(len(v)):
        D = ((a-v[i])**2).sum(axis=-1)
        c += D.min()
    return c
def using_broadcasting():
    return ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).min(axis=0).sum()
In [106]: %timeit using_loop()
1000 loops, best of 3: 233 µs per loop
In [107]: %timeit using_broadcasting()
10000 loops, best of 3: 29.1 µs per loop
In [108]: assert using_loop() == using_broadcasting()
When using NumPy it usually helps to eliminate for-loops (if possible) and express the calculation with operations done on entire arrays -- or at least on arrays that are as large as possible. By doing so, you off-load more of the work to fast algorithms written in C or Fortran without intermediate Python code.
In the original code, D has shape (10,) for each iteration of the loop. Since there are 15 iterations of the loop, if we could express all the values for D from all 15 iterations at once as one big array, then D would have shape (10, 15). In fact, we can do that:
Since a has shape (10,3), a[:, np.newaxis, :] has shape (10,1,3).
Using NumPy broadcasting, since v has shape (15,3),
a[:,np.newaxis,:]-v
has shape (10,15,3). Squaring, then summing on the last axis gives an array of shape (10, 15). This is the new D:
In [109]: ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).shape
Out[109]: (10, 15)
Once you have D, the rest of the calculation follows naturally.
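Concretely, the remaining steps mirror the loop: take the minimum over the a axis (axis=0), giving one value per row of v, then sum those minima:
D = ((a[:, np.newaxis, :] - v)**2).sum(axis=-1)   # shape (10, 15)
c = D.min(axis=0).sum()                           # same value the loop computes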