Is there a quicker way to do this in NumPy?

I want to generate a 3D array in NumPy. The code is:
import numpy as np

mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5
b = np.ones((h, w, 1), dtype=np.float32) * np.reshape(mean_value, [1, 1, 3])
print(b.shape)  # (5, 5, 3)
Is there any quicker way to generate b? Thanks.

For efficiency (memory and performance), simply broadcast with np.broadcast_to to get a view as the output -
np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
Being a view, it has no memory overhead and is hence virtually free at runtime.
Let's verify the performance part -
In [45]: mean_value = np.array([1, 2, 3], dtype=np.float32)
...: h, w = 5, 5
In [46]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.21 µs per loop
In [47]: mean_value = np.random.rand(10000)
...: h, w = 5000, 5000
In [48]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.22 µs per loop
And memory part (being a view) -
In [49]: np.shares_memory(mean_value,np.broadcast_to(mean_value,(h,w,)+mean_value.shape))
Out[49]: True
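As a side note, np.broadcast_to returns a read-only view, so if b needs to be written to afterwards, materialize it with a copy. A minimal sketch (b_view is just an illustrative name) -
import numpy as np

mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5

b_view = np.broadcast_to(mean_value, (h, w) + mean_value.shape)
# b_view[0, 0, 0] = 9.0  # ValueError: assignment destination is read-only

b = b_view.copy()  # contiguous, writable (5, 5, 3) array
b[0, 0, 0] = 9.0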

Related

find where values in one numpy array fall between values in another numpy array

Is there a more numpythonic way to do this?
import numpy as np

# example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
# get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin -
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample.
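A side note not covered in the answers above: values below arr[0] behave differently across the solutions. The list comprehension raises IndexError, while searchsorted minus one quietly yields -1, which can serve as a sentinel. A small sketch -
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5, -1.0])  # -1.0 falls below arr[0]

idx = np.searchsorted(arr, values) - 1
print(idx)  # [ 0  2  1 -1] -> -1 flags an out-of-range value

# The default side='left' matches the strict `i > arr` condition:
# for 3.0 it gives 2, the index of the last element strictly less than 3.0.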

Vectorized assignment for numpy array with repeated indices (d[i,j,i,j] = s[i,j])

How can I set
d[i,j,i,j] = s[i,j]
using "NumPy" and without for loop?
I've tried the follow:
l1=range(M)
l2=range(N)
d[l1,l2,l1,l2] = s[l1,l2]
If you think about it, that would be the same as creating a 2D array of shape (m*n, m*n) and assigning the values from s into its diagonal positions. To get the final output as 4D, we just need a reshape at the end. That's basically what's implemented below -
m,n = s.shape
d = np.zeros((m*n,m*n),dtype=s.dtype)
d.ravel()[::m*n+1] = s.ravel()
d.shape = (m,n,m,n)
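For reference, the same assignment can also be written directly with broadcasted integer index arrays; a small sketch (illustrative shapes) that also cross-checks the flattened-diagonal trick -
import numpy as np

m, n = 3, 4
s = np.random.rand(m, n)

# Direct assignment with broadcasted index arrays
d1 = np.zeros((m, n, m, n), dtype=s.dtype)
i = np.arange(m)[:, None]  # shape (m, 1)
j = np.arange(n)[None, :]  # shape (1, n)
d1[i, j, i, j] = s

# Flattened-diagonal trick from above
d2 = np.zeros((m*n, m*n), dtype=s.dtype)
d2.ravel()[::m*n+1] = s.ravel()
assert np.array_equal(d1, d2.reshape(m, n, m, n))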
Runtime test
Approaches -
# @MSeifert's solution
def assign_vals_ix(s):
    d = np.zeros((m, n, m, n), dtype=s.dtype)
    l1 = range(m)
    l2 = range(n)
    d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
    return d
# Proposed in this post
def assign_vals(s):
    m,n = s.shape
    d = np.zeros((m*n,m*n),dtype=s.dtype)
    d.ravel()[::m*n+1] = s.ravel()
    return d.reshape(m,n,m,n)
# Using a strides based approach
def assign_vals_strides(a):
    m,n = a.shape
    p,q = a.strides
    d = np.zeros((m,n,m,n),dtype=a.dtype)
    # Strides (in bytes) that land exactly on d[i,j,i,j] as (i,j) vary:
    # stepping i moves along axes 0 and 2, stepping j along axes 1 and 3
    out_strides = (q*(n*m*n+n),(m*n+1)*q)
    d_view = np.lib.stride_tricks.as_strided(d, (m,n), out_strides)
    d_view[:] = a
    return d
Timings -
In [285]: m,n = 10,10
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
...:
In [286]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 21.3 µs per loop
In [287]: %timeit assign_vals_strides(s)
100000 loops, best of 3: 9.37 µs per loop
In [288]: %timeit assign_vals(s)
100000 loops, best of 3: 4.13 µs per loop
In [289]: m,n = 20,20
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
In [290]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 60.2 µs per loop
In [291]: %timeit assign_vals_strides(s)
10000 loops, best of 3: 41.8 µs per loop
In [292]: %timeit assign_vals(s)
10000 loops, best of 3: 35.5 µs per loop
You can use integer array indexing (creating the broadcasted indices with np.ix_):
d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
This time the indices have to be duplicated (you want [i, j, i, j] instead of just [i, j]); that's why I multiplied the tuple returned by np.ix_ by 2.
For example:
>>> d = np.zeros((10, 10, 10, 10), dtype=int)
>>> s = np.arange(100).reshape(10, 10)
>>> l1 = range(3)
>>> l2 = range(5)
>>> d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
And to make sure that the correct values were assigned:
>>> # Assert equality for the given condition
>>> for i in l1:
...     for j in l2:
...         assert d[i, j, i, j] == s[i, j]
>>> # Interactive tests
>>> d[0, 0, 0, 0], s[0, 0]
(0, 0)
>>> d[1, 2, 1, 2], s[1, 2]
(12, 12)
>>> d[2, 0, 2, 0], s[2, 0]
(20, 20)
>>> d[2, 4, 2, 4], s[2, 4]
(24, 24)

numpy.argmin for elements greater than a threshold

I'm interested in getting the location of the minimum value in a 1-D NumPy array that meets a certain condition (in my case, a minimum threshold). For example:
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
I'd like to effectively mask all numbers in a that are under the limit, such that the result of np.argmin would be 6. Is there a computationally cheap way to mask values that don't meet a condition and then apply np.argmin?
You could store the valid indices and use those both to select the valid elements from a and to map the argmin() among the selected elements back to an index in the full array. The implementation would look something like this -
valid_idx = np.where(a >= limit)[0]
out = valid_idx[a[valid_idx].argmin()]
Sample run -
In [32]: limit = 3
...: a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
...:
In [33]: valid_idx = np.where(a >= limit)[0]
In [34]: valid_idx[a[valid_idx].argmin()]
Out[34]: 6
Runtime test -
For performance benchmarking, this section compares the masked-array based solution from the other answer against the regular-array based solution proposed earlier in this post, across various data sizes.
def masked_argmin(a,limit): # Defining func for regular array based soln
    valid_idx = np.where(a >= limit)[0]
    return valid_idx[a[valid_idx].argmin()]
In [52]: # Inputs
...: a = np.random.randint(0,1000,(10000))
...: limit = 500
...:
In [53]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 233 µs per loop
In [54]: %timeit masked_argmin(a,limit)
10000 loops, best of 3: 101 µs per loop
In [55]: # Inputs
...: a = np.random.randint(0,1000,(100000))
...: limit = 500
...:
In [56]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 1.73 ms per loop
In [57]: %timeit masked_argmin(a,limit)
1000 loops, best of 3: 1.03 ms per loop
This can simply be accomplished using NumPy's MaskedArray:
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
b = np.ma.MaskedArray(a, a<limit)
np.ma.argmin(b) # == 6
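Another compact option, added here as a hedged alternative (not from the original answers): replace the invalid entries with +inf and take a plain argmin; note the intermediate array is promoted to float -
import numpy as np

limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])

# Entries below the limit become +inf, so they can never win the argmin
idx = np.where(a >= limit, a, np.inf).argmin()
print(idx)  # 6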

fast way of computing diagonals of XMX^T in python

I need to compute the diagonal of XMX^T without a for loop, or in other words, replacing the following loop:
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
I know somehow I should be using numpy.einsum, but I have not been able to figure out how.
Many thanks!
Sure there is an np.einsum way, like so -
np.einsum('ij,ij->i',X.dot(M),X)
This leverages fast matrix multiplication at the first level with X.dot(M), and then uses np.einsum to keep the first axis and sum-reduce the second axis.
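Equivalently, the same row-wise reduction can be spelled out without einsum, at the cost of an explicit temporary; a quick sketch to make the reduction transparent -
import numpy as np

X = np.random.randn(1000, 100)
M = np.random.rand(100, 100)

out_einsum = np.einsum('ij,ij->i', X.dot(M), X)
out_plain = (X.dot(M) * X).sum(axis=1)  # elementwise product, then row sums
assert np.allclose(out_einsum, out_plain)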
Runtime test -
This section compares all the approaches posted thus far to solve the problem.
In [132]: # Setup input arrays
...: X = np.random.randn(10000, 100)
...: M = np.random.rand(100, 100)
...:
...: def original_app(X,M):
...:     out = np.zeros(10000)
...:     for n in range(10000):
...:         out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
...:     return out
...:
In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True
In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop
In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace() # @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop
In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop
In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop
Here is a simpler example:
M = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
X = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
What you are looking for - the sum of the diagonal elements - is more commonly known as the trace in mathematics. You can obtain the trace of your matrix product, without a loop, by:
In [102]: np.dot(X, np.dot(M,X.T)).trace()
Out[102]: 692
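A caution on this approach (added note): the question asks for the vector of diagonal elements, not their sum, and forming the full (n, n) product just to read off its diagonal is wasteful for large n. A sketch relating the two -
import numpy as np

X = np.random.randn(50, 10)
M = np.random.rand(10, 10)

full = X.dot(M).dot(X.T)  # (50, 50): only the diagonal is actually needed
out = full.diagonal()     # the per-row values the question asks for
assert np.isclose(full.trace(), out.sum())
assert np.allclose(out, np.einsum('ij,ij->i', X.dot(M), X))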
In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))
In [212]: out=np.zeros(4)
In [213]: for n in range(4):
   .....:     out[n] = np.dot(np.dot(X[n,:],M), X[n,:])
   .....:
In [214]: out
Out[214]: array([ 9., 144., 441., 900.])
One einsum approach:
In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([ 9., 144., 441., 900.])
Comparing the other einsum:
In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop
In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop
This one is a bit faster here, but results may differ at your larger sizes.
einsum does save calculating a lot of unnecessary values (compared to the diagonal or trace approaches).
Similar use of einsum - Combine Einsum Expressions
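One more option worth knowing (assumes NumPy 1.12+; not part of the original answers): the optimize keyword lets einsum choose a better contraction order, which can close much of the gap with the manual X.dot(M) split on large inputs -
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)

out = np.einsum('ij,jk,ik->i', X, M, X, optimize=True)
assert np.allclose(out, np.einsum('ij,ij->i', X.dot(M), X))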

Is there a better way to broadcast arrays?

I want to broadcast an array b to the shape it would take if it were in an arithmetic operation with another array a.
For example, if a.shape = (3,3) and b was a scalar, I want to get an array whose shape is (3,3) and is filled with the scalar.
One way to do this is like this:
>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = 1 + a*0
>>> b
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
Although this works practically, I can't help but feel it looks a bit weird, and wouldn't be obvious to someone else looking at the code what I was trying to do.
Is there any more elegant way to do this? I've looked at the documentation for np.broadcast, but it's orders of magnitude slower.
In [1]: a = np.arange(10000).reshape((100,100))
In [2]: %timeit 1 + a*0
10000 loops, best of 3: 31.9 us per loop
In [3]: %timeit bc = np.broadcast(a,1);np.fromiter((v for u, v in bc),float).reshape(bc.shape)
100 loops, best of 3: 5.2 ms per loop
In [4]: 5.2e-3/32e-6
Out[4]: 162.5
If you just want to fill an array with a scalar, fill is probably the best choice. But it sounds like you want something more generalized. Rather than using broadcast you can use broadcast_arrays to get the result that (I think) you want.
>>> a = numpy.arange(9).reshape(3, 3)
>>> numpy.broadcast_arrays(a, 1)[1]
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
This generalizes to any two broadcastable shapes:
>>> numpy.broadcast_arrays(a, [1, 2, 3])[1]
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
It's not quite as fast as your ufunc-based method, but it's still on the same order of magnitude:
>>> %timeit 1 + a * 0
10000 loops, best of 3: 23.2 us per loop
>>> %timeit numpy.broadcast_arrays(a, 1)[1]
10000 loops, best of 3: 52.3 us per loop
But for scalars, fill is still the clear front-runner:
>>> %timeit b = numpy.empty_like(a, dtype='i8'); b.fill(1)
100000 loops, best of 3: 6.59 us per loop
Finally, further testing shows that the fastest approach -- in at least some cases -- is to multiply by ones:
>>> %timeit numpy.broadcast_arrays(a, numpy.arange(100))[1]
10000 loops, best of 3: 53.4 us per loop
>>> %timeit (1 + a * 0) * numpy.arange(100)
10000 loops, best of 3: 45.9 us per loop
>>> %timeit b = numpy.ones_like(a, dtype='i8'); b * numpy.arange(100)
10000 loops, best of 3: 28.9 us per loop
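Added note: on NumPy 1.10+, np.broadcast_to produces the broadcast result as a zero-copy, read-only view, which is cheaper than all of the above whenever a read-only result is acceptable -
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.broadcast_to(np.array([1, 2, 3]), a.shape)  # read-only (3, 3) view, no copy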
The fastest and cleanest solution I know is:
b_arr = numpy.empty(a.shape) # Empty array
b_arr.fill(b) # Filling with one value
fill sounds like the simplest way:
>>> a = np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a.fill(10)
>>> a
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])
EDIT: As @EOL points out, you don't need arange if you want to create a new array; np.empty((100,100)) (or whatever shape) is better for this.
Timings:
In [3]: a = np.arange(10000).reshape((100,100))
In [4]: %timeit 1 + a*0
100000 loops, best of 3: 19.9 us per loop
In [5]: a = np.arange(10000).reshape((100,100))
In [6]: %timeit a.fill(1)
100000 loops, best of 3: 3.73 us per loop
If you just need to broadcast a scalar to some arbitrary shape, you can do something like this:
a = b*np.ones(shape=(3,3))
Edit: np.tile is more general. You can use it to duplicate any scalar/vector in any number of dimensions:
b = 1
N = 100
a = np.tile(b, reps=(N, N))
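For completeness (added note; assumes NumPy 1.8+): np.full and np.full_like state the intent directly -
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.full(a.shape, 7)  # new (3, 3) array filled with 7
c = np.full_like(a, 7)   # same shape and dtype as a, filled with 7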
