How to conditionally combine two numpy arrays of the same shape - python

This sounds simple, and I think I'm overcomplicating this in my mind.
I want to make an array whose elements are generated from two source arrays of the same shape, depending on which element in the source arrays is greater.
To illustrate:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = (insert magic)
# array([2, 5, 0])
I can't work out how to produce an array3 that combines the elements of array1 and array2 to produce an array where only the greater of the two array1/array2 element values is taken.
Any help would be much appreciated. Thanks.

We could use the NumPy built-in np.maximum, made exactly for this purpose -
np.maximum(array1, array2)
Another way would be to stack the two arrays into a 2D array and max-reduce along the first axis (axis=0) with np.max -
np.max([array1,array2],axis=0)
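Since np.maximum is a ufunc, its reduce method on the stacked pair gives the same result; a minimal sketch:
import numpy as np

array1 = np.array((2, 3, 0))
array2 = np.array((1, 5, 0))

# reduce applies np.maximum pairwise along axis 0 of the stacked pair
np.maximum.reduce([array1, array2])
# array([2, 5, 0])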
Timings on arrays of 1 million elements -
In [271]: array1 = np.random.randint(0,9,(1000000))
In [272]: array2 = np.random.randint(0,9,(1000000))
In [274]: %timeit np.maximum(array1, array2)
1000 loops, best of 3: 1.25 ms per loop
In [275]: %timeit np.max([array1, array2],axis=0)
100 loops, best of 3: 3.31 ms per loop
# @Eric Duminil's soln1
In [276]: %timeit np.where( array1 > array2, array1, array2)
100 loops, best of 3: 5.15 ms per loop
# @Eric Duminil's soln2
In [277]: magic = lambda x,y : np.where(x > y , x, y)
In [278]: %timeit magic(array1, array2)
100 loops, best of 3: 5.13 ms per loop
Extending to other supporting ufuncs
Similarly, there's np.minimum for finding element-wise minimum values between two arrays of the same or broadcastable shapes. So, to find the element-wise minimum between array1 and array2, we would have:
np.minimum(array1, array2)
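Since these ufuncs broadcast, one operand can also be a scalar; for instance, a small sketch that uses np.minimum to cap values at an upper bound:
import numpy as np

a = np.array([2, 7, 4, 9])

# Broadcasting a scalar: element-wise minimum against 5 caps every value at 5.
np.minimum(a, 5)
# array([2, 5, 4, 5])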
For a complete list of ufuncs that support this feature, please refer to the docs and look for the keyword: element-wise. Grepping for those, I got the following ufuncs:
add, subtract, multiply, divide, logaddexp, logaddexp2, true_divide,
floor_divide, power, remainder, mod, fmod, divmod, heaviside, gcd,
lcm, arctan2, hypot, bitwise_and, bitwise_or, bitwise_xor, left_shift,
right_shift, greater, greater_equal, less, less_equal, not_equal,
equal, logical_and, logical_or, logical_xor, maximum, minimum, fmax,
fmin, copysign, nextafter, ldexp
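One distinction worth noting from that list: np.maximum/np.minimum propagate NaNs, while np.fmax/np.fmin ignore them where possible. A small sketch:
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

np.maximum(a, b)  # NaN propagates: array([ 2., nan, nan])
np.fmax(a, b)     # NaN skipped when the other value is valid: array([2., 2., 3.])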

If your condition ever becomes more complex, you could use np.where:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = np.where( array1 > array2, array1, array2)
# array([2, 5, 0])
You could replace array1 > array2 with any condition. If all you want is the maximum, go with @Divakar's answer.
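For example, a sketch with a hypothetical richer condition - keep whichever element is larger in absolute value:
import numpy as np

a = np.array([-7, 2, 3])
b = np.array([1, -5, 0])

# Pick the element with the larger magnitude from each pair.
np.where(np.abs(a) > np.abs(b), a, b)
# array([-7, -5,  3])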
And just for fun:
magic = lambda x,y : np.where(x > y , x, y)
magic(array1, array2)
# array([2, 5, 0])

When to use `numpy.append()`?

I have been reading in multiple places (e.g. here) that numpy.append() should never be used.
For example, if one wants to stack multiple arrays together, it is supposedly much better to do so via an intermediate Python list than with a helper like this:
import numpy as np
def stacker(arrs):
    result = arrs[0][None, ...]
    for arr in arrs[1:]:
        result = np.append(result, arr[None, ...], 0)
    return result
n = 1000
shape = (100, 100)
x = [np.random.randint(0, n, shape) for _ in range(n)]
%timeit np.array(x)
# 100 loops, best of 3: 17.6 ms per loop
%timeit np.concatenate([arr[None, ...] for arr in x])
# 100 loops, best of 3: 17.7 ms per loop
%timeit np.stack(x)
# 100 loops, best of 3: 18.3 ms per loop
%timeit stacker(x)
# 1 loop, best of 3: 12.5 s per loop
I understand that np.append() creates a copy of both of its NumPy array inputs, and that this is much more inefficient than list.append() or list.extend() in this use case. However, I find it hard to believe that the NumPy developers just added a useless function.
So, what is the use-case for numpy.append()?
Look at its code:
arr = asanyarray(arr)
if axis is None:
    if arr.ndim != 1:
        arr = arr.ravel()
    values = ravel(values)
    axis = arr.ndim-1
return concatenate((arr, values), axis=axis)
It's just a simple interface to concatenate. With axis it's a direct call to concatenate; without it, it ravels the inputs first, which often causes problems. It also converts scalars to arrays.
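To see the ravel behaviour concretely, a quick sketch with 2D inputs:
import numpy as np

a = np.ones((2, 2))
b = np.zeros((2, 2))

# Without axis, both inputs are raveled, so the 2D structure is lost:
np.append(a, b)
# array([1., 1., 1., 1., 0., 0., 0., 0.])

# With axis, it is a direct concatenate, so the shapes must already align:
np.append(a, b, axis=0).shape
# (4, 2)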
If you have a 1d array, then it is an easy way to add one value:
In [8]: np.append(np.arange(3), 10)
Out[8]: array([ 0, 1, 2, 10])
but hstack is just as nice:
In [10]: np.hstack([np.arange(3), 10])
Out[10]: array([ 0, 1, 2, 10])
People write functions that seem to be a good idea at the time, usually with a specific use in mind. But the actual use (and misuses) may be different than anticipated.
np.stack is a more recent, and useful addition.
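For reference, np.stack joins arrays along a new axis, while concatenate joins along an existing one; a minimal sketch:
import numpy as np

a = np.arange(3)
b = np.arange(3, 6)

np.stack([a, b]).shape        # (2, 3) - new leading axis
np.concatenate([a, b]).shape  # (6,)   - existing axis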
For a while there was a note in the docs urging us to use concatenate and stack and to avoid all the other stack functions, but that's been toned down. Now they just have:
This function makes most sense for arrays with up to 3 dimensions. For
instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions concatenate, stack and
block provide more general stacking and concatenation operations.

Is there a faster implementation of the following code?

I have a one-dimensional numpy array, which is quite large in size. For each entry of the array, I need to produce a linearly spaced sub-array up to that entry value. Here is what I have as an example.
import numpy as np
a = np.array([2, 3])
b = np.array([np.linspace(0, i, 4) for i in a])
In this case the linear space has size 4. The last statement in the above code involves a for loop, which is rather slow if a is very large. Is there a trick to implement this in NumPy itself?
You can phrase this as an outer product:
In [37]: a = np.arange(100000)
In [38]: %timeit np.array([np.linspace(0, i, 4) for i in a])
1 loop, best of 3: 1.3 s per loop
In [39]: %timeit np.outer(a, np.linspace(0, 1, 4))
1000 loops, best of 3: 1.44 ms per loop
The idea is to take a unit linspace and then scale it separately by each element of a.
As you can see, this gives ~1000x speed up for n=100000.
For completeness, I'll mention that this code has slightly different roundoff properties than your original version (likely not an issue in practical applications):
In [52]: np.max(np.abs(np.array([np.linspace(0, i, 4) for i in a]) -
...: np.outer(a, np.linspace(0, 1, 4))))
Out[52]: 1.4551915228366852e-11
P. S. An alternative way to express the idea is by using element-wise multiplication with broadcasting (based on a suggestion by @Scott Gigante):
In [55]: %timeit a[:, np.newaxis] * np.linspace(0, 1, 4)
1000 loops, best of 3: 1.48 ms per loop
P. P. S. See the comments below for further ideas on making this faster.

Numpy assign columns of a 2d array as sum of columns of indices of another array

I want to represent the columns of one 2D array as the sum of a subset of the columns of another matrix. What is the most efficient way?
Right now, what I do is:
for i in xrange(U.shape[1]):
    U[:,i] = X[:,np.random.choice(X.shape[1], 10)].sum(axis=1)/10.0
Is there a faster and better non-loop method?
Generate all the indices in one go, index into the input 2D array to give us a 3D array, and finally sum along the last axis (axis=2), like so -
indx = np.random.randint(0,X.shape[1], (U.shape[1],10))
Uout = X[:,indx].sum(axis=2)/10.0
If you want to use np.random.choice to get indx -
np.random.choice(X.shape[1], size=(U.shape[1],10))
Indexing with np.take seems faster for large arrays -
In [63]: X = np.random.rand(1000,1000)
In [64]: indx = np.random.randint(0,1000, (1000,10))
In [67]: %timeit X[:,indx]
10 loops, best of 3: 69.9 ms per loop
In [68]: %timeit np.take(X, indx, axis=1)
10 loops, best of 3: 22.9 ms per loop
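Putting the pieces together, a sketch of the full computation using np.take (assuming the same shapes and 10-column averaging as above):
import numpy as np

X = np.random.rand(1000, 1000)
U = np.empty((1000, 1000))

# Draw all column indices at once, gather the columns with np.take,
# then sum over the sampled columns and average.
indx = np.random.randint(0, X.shape[1], (U.shape[1], 10))
Uout = np.take(X, indx, axis=1).sum(axis=2) / 10.0
print(Uout.shape)  # (1000, 1000)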

Efficiently compute columnwise sum of sparse array where every non-zero element is 1

I have a bunch of data in SciPy compressed sparse row (CSR) format. Of course the majority of the elements are zero, and I further know that all non-zero elements have a value of 1. I want to compute sums over different subsets of the rows of my matrix. At the moment I am doing the following:
import numpy as np
import scipy as sp
import scipy.sparse
# create some data with sparsely distributed ones
data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
data = sp.sparse.csr_matrix(data, dtype='int8')
# generate column-wise sums over random subsets of rows
nrand = 1000
for k in range(nrand):
    inds = np.random.choice(data.shape[0], size=100, replace=False)
    # 60% of time is spent here
    extracted_rows = data[inds]
    # 20% of time is spent here
    row_sum = extracted_rows.sum(axis=0)
The last few lines there are the bottleneck in a larger computational pipeline. As I annotated in the code, 60% of the time is spent slicing the data at the random indices, and 20% is spent computing the actual sum.
It seems to me I should be able to use my knowledge about the data in the array (i.e., any non-zero value in the sparse matrix will be 1; no other values present) to compute these sums more efficiently. Unfortunately, I cannot figure out how. Dealing with just data.indices perhaps? I have tried other sparsity structures (e.g. CSC matrix), as well as converting to dense array first, but these approaches were all slower than this CSR matrix approach.
It is well known that indexing of sparse matrices is relatively slow, and there have been SO questions about getting around that by accessing the data attributes directly.
But first some timings. Using data and inds as you show, I get
In [23]: datad=data.A # times at 3.76 ms per loop
In [24]: timeit row_sumd=datad[inds].sum(axis=0)
1000 loops, best of 3: 529 µs per loop
In [25]: timeit row_sum=data[inds].sum(axis=0)
1000 loops, best of 3: 890 µs per loop
In [26]: timeit d=datad[inds]
10000 loops, best of 3: 55.9 µs per loop
In [27]: timeit d=data[inds]
1000 loops, best of 3: 617 µs per loop
The sparse version is slower than the dense one, but not by a lot. The sparse indexing is much slower, but its sum is somewhat faster.
The sparse sum is done with a matrix product:
def sparse.spmatrix.sum
    ....
    return np.asmatrix(np.ones((1, m), dtype=res_dtype)) * self
That suggests a faster way - turn inds into an appropriate array of 1s and multiply:
In [49]: %%timeit
....: b=np.zeros((1,data.shape[0]),'int8')
....: b[:,inds]=1
....: rowmul=b*data
....:
1000 loops, best of 3: 587 µs per loop
That makes the sparse operation about as fast as the equivalent dense one (but converting to dense first is much slower).
==================
The last timing test is missing the np.asmatrix that is present in the sparse sum. But the times are similar, and the results are the same:
In [232]: timeit b=np.zeros((1,data.shape[0]),'int8'); b[:,inds]=1; x1=np.asmatrix(b)*data
1000 loops, best of 3: 661 µs per loop
In [233]: timeit b=np.zeros((1,data.shape[0]),'int8'); b[:,inds]=1; x2=b*data
1000 loops, best of 3: 605 µs per loop
One produces a matrix, the other an array. But both are doing a matrix product, the 2nd dim of b against the 1st of data. Even though b is an array, the task is actually delegated to data and its matrix product - in a not so transparent way.
In [234]: x1
Out[234]: matrix([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)
In [235]: x2
Out[235]: array([[9, 9, 5, ..., 9, 5, 3]], dtype=int8)
b*data.A is element multiplication and raises an error; np.dot(b,data.A) works but is slower.
Newer numpy/python has a matmul operator, @. I see the same time pattern:
In [280]: timeit b@dataA   # dense product
100 loops, best of 3: 2.64 ms per loop
In [281]: timeit b@data.A  # slower due to the `.A` conversion
100 loops, best of 3: 6.44 ms per loop
In [282]: timeit b@data    # sparse product
1000 loops, best of 3: 571 µs per loop
np.dot may also delegate the action to sparse, though you have to be careful. I just hung my machine with np.dot(csr_matrix(b), data.A).
Here's a vectorized approach: convert data to a dense array and also get all those inds in a vectorized manner, using an argpartition-based method -
# Number of selections as a parameter
n = 100
# Get inds across all iterations in a vectorized manner as a 2D array.
inds2D = np.random.rand(nrand,data.shape[0]).argpartition(n)[:,:n]
# Convert data to a dense NumPy array, index into it with those 2D array
# indices, then reshape and sum-reduce to get the final output.
out = np.array(data.todense())[inds2D.ravel()].reshape(nrand,n,-1).sum(1)
Runtime test -
1) Function definitions:
def org_app(nrand,n):
    out = np.zeros((nrand,data.shape[1]),dtype=int)
    for k in range(nrand):
        inds = np.random.choice(data.shape[0], size=n, replace=False)
        extracted_rows = data[inds]
        out[k] = extracted_rows.sum(axis=0)
    return out

def vectorized_app(nrand,n):
    inds2D = np.random.rand(nrand,data.shape[0]).argpartition(n)[:,:n]
    return np.array(data.todense())[inds2D.ravel()].reshape(nrand,n,-1).sum(1)
Timings:
In [205]: # create some data with sparsely distributed ones
...: data = np.random.choice((0, 1), size=(1000, 2000), p=(0.95, 0.05))
...: data = sp.sparse.csr_matrix(data, dtype='int8')
...:
...: # generate column-wise sums over random subsets of rows
...: nrand = 1000
...: n = 100
...:
In [206]: %timeit org_app(nrand,n)
1 loops, best of 3: 1.38 s per loop
In [207]: %timeit vectorized_app(nrand,n)
1 loops, best of 3: 826 ms per loop

Efficiently sample all arrays in ndarray using scipy.ndimage.map_coordinates

I have a 3D stack of masked arrays. I'd like to sample all arrays in the stack at the same fixed locations.
stack.ma_stack.shape
(1461, 390, 327)
# Indices to be sampled
x = np.array([ 117.38670304, 119.1220485 ])
y = np.array([ 209.98120554, 210.37202372])
The following is very efficient, but only works for integer indices:
x_int = np.rint(x).astype(int)
y_int = np.rint(y).astype(int)
samp = stack.ma_stack[:,y_int,x_int]
samp.shape
(1461, 2)
I'm trying to implement the scipy.ndimage.map_coordinates interpolated sampling for float indices, but I can't seem to figure out how to format the coordinates properly.
Most examples use map_coordinates to sample a single array, and the following works for a single array from the stack:
map_coord = np.array([[y,], [x,]])
samp = scipy.ndimage.map_coordinates(stack.ma_stack[0], map_coord, order=1)
samp.shape
(1, 2)
I can loop through each array in the stack, but I know there is a simple indexing trick that will sample the entire stack in a single call. I read about mgrid, and did some experimentation, but couldn't find the right solution (I'm still learning advanced indexing). I know somebody out there will know the answer right away. Thanks.
On a related note: Anybody know how to do this for masked arrays without replacing missing data with fill_value or np.nan? The ndimage interpolation doesn't play nicely with masked arrays:
https://github.com/scipy/scipy/issues/1682
There must be a way to get it to broadcast automatically... in the meantime, you can force the broadcasting with np.arange(...) to get one point from each 2d array in the stack:
map_coords = np.broadcast_arrays(np.arange(stack.ma_stack.shape[0])[:, None], y, x)
samp = ndimage.map_coordinates(stack.ma_stack, map_coords, order=1)
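To make the coordinate layout explicit (a sketch assuming the example dimensions above): map_coordinates expects one coordinate array per axis of the input, so each broadcast array here is (1461, 2) - the plane index, row, and column for every sampled point:
import numpy as np
from scipy import ndimage

a = np.random.rand(1461, 390, 327)
x = np.array([117.38670304, 119.1220485])
y = np.array([209.98120554, 210.37202372])

# Three (1461, 2) coordinate arrays: plane index, row (y), column (x).
map_coords = np.broadcast_arrays(np.arange(a.shape[0])[:, None], y, x)
samp = ndimage.map_coordinates(a, map_coords, order=1)
print(samp.shape)  # (1461, 2)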
This is inefficient though, because the "broadcasting" is done in advance (and presumably copies all that data), but it's still quite a bit faster than the loop:
In [88]: a = np.random.rand(1461, 390, 327)
In [89]: x = np.array([ 117.38670304, 119.1220485 ])
In [90]: y = np.array([ 209.98120554, 210.37202372])
In [107]: %%timeit
.....: map_coord = np.array([[y,], [x,]])
.....: np.concatenate([ndimage.map_coordinates(ai, map_coord, order=1) for ai in a])
.....:
10 loops, best of 3: 33.1 ms per loop
In [108]: %%timeit
.....: map_coords = np.broadcast_arrays(np.arange(a.shape[0])[:, None], y, x)
.....: ndimage.map_coordinates(a, map_coords, order=1)
.....:
100 loops, best of 3: 4.67 ms per loop
In [109]: samp_OP = np.concatenate([ndimage.map_coordinates(ai, map_coord, order=1) for ai in a])
In [110]: samp_chan = ndimage.map_coordinates(a, map_coords, order=1)
In [111]: np.allclose(samp_chan, samp_OP)
Out[111]: True
