Is there a quicker way to do this in NumPy?

I want to generate a 3D array in NumPy. The code is:
import numpy as np

mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5
b = np.ones((h, w, 1), dtype=np.float32) * np.reshape(mean_value, [1, 1, 3])
print(b.shape)  # (5, 5, 3)
Is there any quicker way to generate b? Thanks.

For efficiency (memory and performance), simply broadcast with np.broadcast_to to get a view as the output -
np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
Being a view, it has no memory overhead and is hence virtually free at runtime.
Let's verify the performance part -
In [45]: mean_value = np.array([1, 2, 3], dtype=np.float32)
...: h, w = 5, 5
In [46]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.21 µs per loop
In [47]: mean_value = np.random.rand(10000)
...: h, w = 5000, 5000
In [48]: %timeit np.broadcast_to(mean_value,(h,w,)+mean_value.shape)
100000 loops, best of 3: 3.22 µs per loop
And memory part (being a view) -
In [49]: np.shares_memory(mean_value,np.broadcast_to(mean_value,(h,w,)+mean_value.shape))
Out[49]: True
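As a side note, np.broadcast_to returns a read-only view, so if b needs to be written to afterwards, materialize it with a copy. A minimal sketch (b_view is just an illustrative name) -
import numpy as np

mean_value = np.array([1, 2, 3], dtype=np.float32)
h, w = 5, 5

b_view = np.broadcast_to(mean_value, (h, w) + mean_value.shape)
# b_view[0, 0, 0] = 9.0  # ValueError: assignment destination is read-only

b = b_view.copy()  # contiguous, writable (5, 5, 3) array
b[0, 0, 0] = 9.0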

Related

find where values in one numpy array fall between values in another numpy array

Is there a more numpythonic way to do this?
import numpy as np

# example arrays
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5])
# get the indices where each value falls between values in arr
between = [np.nonzero(i > arr)[0][-1] for i in values]
For sorted arr, we can use np.searchsorted for performance -
In [67]: np.searchsorted(arr,values)-1
Out[67]: array([0, 2, 1])
Timings on large dataset -
In [81]: np.random.seed(0)
...: arr = np.unique(np.random.randint(0,10000, 10000))
...: values = np.random.randint(0,10000, 1000)
# @Andy L.'s soln
In [84]: %timeit np.argmin(values > arr[:,None], axis=0) - 1
10 loops, best of 3: 28.2 ms per loop
# Original soln
In [82]: %timeit [np.nonzero(i > arr)[0][-1] for i in values]
100 loops, best of 3: 8.68 ms per loop
# From this post
In [83]: %timeit np.searchsorted(arr,values)-1
10000 loops, best of 3: 57.8 µs per loop
Use broadcasting and argmin -
np.argmin(values > arr[:,None], axis=0) - 1
Out[32]: array([0, 2, 1], dtype=int32)
Note: I assume arr is monotonically increasing, as in the sample.
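A side note not covered in the answers above: values below arr[0] behave differently across the solutions. The list comprehension raises IndexError, while searchsorted minus one quietly yields -1, which can serve as a sentinel. A small sketch -
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=np.float32)
values = np.array([0.2, 3.0, 1.5, -1.0])  # -1.0 falls below arr[0]

idx = np.searchsorted(arr, values) - 1
print(idx)  # [ 0  2  1 -1] -> -1 flags an out-of-range value

# The default side='left' matches the strict `i > arr` condition:
# for 3.0 it gives 2, the index of the last element strictly less than 3.0.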

Vectorized assignment for numpy array with repeated indices (d[i,j,i,j] = s[i,j])

How can I set
d[i,j,i,j] = s[i,j]
using "NumPy" and without for loop?
I've tried the follow:
l1=range(M)
l2=range(N)
d[l1,l2,l1,l2] = s[l1,l2]
If you think about it, that would be the same as creating a 2D array of shape (m*n, m*n) and assigning the values from s into its diagonal positions. To get the final output as 4D, we just need a reshape at the end. That's basically what's implemented below -
m,n = s.shape
d = np.zeros((m*n,m*n),dtype=s.dtype)
d.ravel()[::m*n+1] = s.ravel()
d.shape = (m,n,m,n)
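For reference, the same assignment can also be written directly with broadcasted integer index arrays; a small sketch (illustrative shapes) that also cross-checks the flattened-diagonal trick -
import numpy as np

m, n = 3, 4
s = np.random.rand(m, n)

# Direct assignment with broadcasted index arrays
d1 = np.zeros((m, n, m, n), dtype=s.dtype)
i = np.arange(m)[:, None]  # shape (m, 1)
j = np.arange(n)[None, :]  # shape (1, n)
d1[i, j, i, j] = s

# Flattened-diagonal trick from above
d2 = np.zeros((m*n, m*n), dtype=s.dtype)
d2.ravel()[::m*n+1] = s.ravel()
assert np.array_equal(d1, d2.reshape(m, n, m, n))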
Runtime test
Approaches -
# @MSeifert's solution
def assign_vals_ix(s):
    d = np.zeros((m, n, m, n), dtype=s.dtype)
    l1 = range(m)
    l2 = range(n)
    d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
    return d
# Proposed in this post
def assign_vals(s):
    m,n = s.shape
    d = np.zeros((m*n,m*n),dtype=s.dtype)
    d.ravel()[::m*n+1] = s.ravel()
    return d.reshape(m,n,m,n)
# Using a strides based approach
def assign_vals_strides(a):
    m,n = a.shape
    p,q = a.strides
    d = np.zeros((m,n,m,n),dtype=a.dtype)
    # Strides (in bytes) that land exactly on d[i,j,i,j] as (i,j) vary:
    # stepping i moves along axes 0 and 2, stepping j along axes 1 and 3
    out_strides = (q*(n*m*n+n),(m*n+1)*q)
    d_view = np.lib.stride_tricks.as_strided(d, (m,n), out_strides)
    d_view[:] = a
    return d
Timings -
In [285]: m,n = 10,10
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
...:
In [286]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 21.3 µs per loop
In [287]: %timeit assign_vals_strides(s)
100000 loops, best of 3: 9.37 µs per loop
In [288]: %timeit assign_vals(s)
100000 loops, best of 3: 4.13 µs per loop
In [289]: m,n = 20,20
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
In [290]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 60.2 µs per loop
In [291]: %timeit assign_vals_strides(s)
10000 loops, best of 3: 41.8 µs per loop
In [292]: %timeit assign_vals(s)
10000 loops, best of 3: 35.5 µs per loop
You can use integer array indexing (creating the broadcasted indices with np.ix_):
d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
This time the indices have to be duplicated (you want [i, j, i, j] instead of just [i, j]); that's why I multiplied the tuple returned by np.ix_ by 2.
For example:
>>> d = np.zeros((10, 10, 10, 10), dtype=int)
>>> s = np.arange(100).reshape(10, 10)
>>> l1 = range(3)
>>> l2 = range(5)
>>> d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
And to make sure that the correct values were assigned:
>>> # Assert equality for the given condition
>>> for i in l1:
...     for j in l2:
...         assert d[i, j, i, j] == s[i, j]
>>> # Interactive tests
>>> d[0, 0, 0, 0], s[0, 0]
(0, 0)
>>> d[1, 2, 1, 2], s[1, 2]
(12, 12)
>>> d[2, 0, 2, 0], s[2, 0]
(20, 20)
>>> d[2, 4, 2, 4], s[2, 4]
(24, 24)

numpy.argmin for elements greater than a threshold

I'm interested in getting the location of the minimum value in a 1-D NumPy array that meets a certain condition (in my case, a minimum threshold). For example:
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
I'd like to effectively mask all numbers in a that are under the limit, such that the result of np.argmin would be 6. Is there a computationally cheap way to mask values that don't meet a condition and then apply np.argmin?
You could store the valid indices and use those both to select the valid elements from a and to map the argmin() among the selected elements back to an index in the full array. The implementation would look something like this -
valid_idx = np.where(a >= limit)[0]
out = valid_idx[a[valid_idx].argmin()]
Sample run -
In [32]: limit = 3
...: a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
...:
In [33]: valid_idx = np.where(a >= limit)[0]
In [34]: valid_idx[a[valid_idx].argmin()]
Out[34]: 6
Runtime test -
For performance benchmarking, this section compares the masked-array based solution from the other answer against the regular-array based solution proposed earlier in this post, across various data sizes.
def masked_argmin(a,limit): # Defining func for regular array based soln
    valid_idx = np.where(a >= limit)[0]
    return valid_idx[a[valid_idx].argmin()]
In [52]: # Inputs
...: a = np.random.randint(0,1000,(10000))
...: limit = 500
...:
In [53]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 233 µs per loop
In [54]: %timeit masked_argmin(a,limit)
10000 loops, best of 3: 101 µs per loop
In [55]: # Inputs
...: a = np.random.randint(0,1000,(100000))
...: limit = 500
...:
In [56]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 1.73 ms per loop
In [57]: %timeit masked_argmin(a,limit)
1000 loops, best of 3: 1.03 ms per loop
This can simply be accomplished using NumPy's MaskedArray:
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
b = np.ma.MaskedArray(a, a<limit)
np.ma.argmin(b) # == 6
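Another compact option, added here as a hedged alternative (not from the original answers): replace the invalid entries with +inf and take a plain argmin; note the intermediate array is promoted to float -
import numpy as np

limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])

# Entries below the limit become +inf, so they can never win the argmin
idx = np.where(a >= limit, a, np.inf).argmin()
print(idx)  # 6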

fast way of computing diagonals of XMX^T in python

I need to compute the diagonal of XMX^T without a for loop, or in other words, replacing the following loop:
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
I know somehow I should be using numpy.einsum, but I have not been able to figure out how.
Many thanks!
Sure there is an np.einsum way, like so -
np.einsum('ij,ij->i',X.dot(M),X)
This leverages fast matrix multiplication at the first level with X.dot(M), and then uses np.einsum to keep the first axis and sum-reduce the second axis.
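Equivalently, the same row-wise reduction can be spelled out without einsum, at the cost of an explicit temporary; a quick sketch to make the reduction transparent -
import numpy as np

X = np.random.randn(1000, 100)
M = np.random.rand(100, 100)

out_einsum = np.einsum('ij,ij->i', X.dot(M), X)
out_plain = (X.dot(M) * X).sum(axis=1)  # elementwise product, then row sums
assert np.allclose(out_einsum, out_plain)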
Runtime test -
This section compares all the approaches posted thus far to solve the problem.
In [132]: # Setup input arrays
...: X = np.random.randn(10000, 100)
...: M = np.random.rand(100, 100)
...:
...: def original_app(X,M):
...:     out = np.zeros(10000)
...:     for n in range(10000):
...:         out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
...:     return out
...:
In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True
In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop
In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace() # @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop
In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop
In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop
Here is a simpler example:
M = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
X = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
What you are looking for - the sum of the diagonal elements - is more commonly known as the trace in mathematics. You can obtain the trace of your matrix product, without a loop, by:
In [102]: np.dot(X, np.dot(M,X.T)).trace()
Out[102]: 692
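A caution on this approach (added note): the question asks for the vector of diagonal elements, not their sum, and forming the full (n, n) product just to read off its diagonal is wasteful for large n. A sketch relating the two -
import numpy as np

X = np.random.randn(50, 10)
M = np.random.rand(10, 10)

full = X.dot(M).dot(X.T)  # (50, 50): only the diagonal is actually needed
out = full.diagonal()     # the per-row values the question asks for
assert np.isclose(full.trace(), out.sum())
assert np.allclose(out, np.einsum('ij,ij->i', X.dot(M), X))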
In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))
In [212]: out=np.zeros(4)
In [213]: for n in range(4):
   .....:     out[n] = np.dot(np.dot(X[n,:],M), X[n,:])
   .....:
In [214]: out
Out[214]: array([ 9., 144., 441., 900.])
One einsum approach:
In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([ 9., 144., 441., 900.])
Comparing the other einsum:
In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop
In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop
This one is a bit faster here, but results may differ at your larger sizes.
einsum does save calculating a lot of unnecessary values (compared to the diagonal or trace approaches).
Similar use of einsum - Combine Einsum Expressions
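One more option worth knowing (assumes NumPy 1.12+; not part of the original answers): the optimize keyword lets einsum choose a better contraction order, which can close much of the gap with the manual X.dot(M) split on large inputs -
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)

out = np.einsum('ij,jk,ik->i', X, M, X, optimize=True)
assert np.allclose(out, np.einsum('ij,ij->i', X.dot(M), X))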

Is there a better way to broadcast arrays?

I want to broadcast an array b to the shape it would take if it were in an arithmetic operation with another array a.
For example, if a.shape = (3,3) and b was a scalar, I want to get an array whose shape is (3,3) and is filled with the scalar.
One way to do this is like this:
>>> import numpy as np
>>> a = np.arange(9).reshape((3,3))
>>> b = 1 + a*0
>>> b
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
Although this works practically, I can't help but feel it looks a bit weird, and wouldn't be obvious to someone else looking at the code what I was trying to do.
Is there any more elegant way to do this? I've looked at the documentation for np.broadcast, but it's orders of magnitude slower.
In [1]: a = np.arange(10000).reshape((100,100))
In [2]: %timeit 1 + a*0
10000 loops, best of 3: 31.9 us per loop
In [3]: %timeit bc = np.broadcast(a,1);np.fromiter((v for u, v in bc),float).reshape(bc.shape)
100 loops, best of 3: 5.2 ms per loop
In [4]: 5.2e-3/32e-6
Out[4]: 162.5
If you just want to fill an array with a scalar, fill is probably the best choice. But it sounds like you want something more generalized. Rather than using broadcast you can use broadcast_arrays to get the result that (I think) you want.
>>> a = numpy.arange(9).reshape(3, 3)
>>> numpy.broadcast_arrays(a, 1)[1]
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
This generalizes to any two broadcastable shapes:
>>> numpy.broadcast_arrays(a, [1, 2, 3])[1]
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
It's not quite as fast as your ufunc-based method, but it's still on the same order of magnitude:
>>> %timeit 1 + a * 0
10000 loops, best of 3: 23.2 us per loop
>>> %timeit numpy.broadcast_arrays(a, 1)[1]
10000 loops, best of 3: 52.3 us per loop
But for scalars, fill is still the clear front-runner:
>>> %timeit b = numpy.empty_like(a, dtype='i8'); b.fill(1)
100000 loops, best of 3: 6.59 us per loop
Finally, further testing shows that the fastest approach -- in at least some cases -- is to multiply by ones:
>>> %timeit numpy.broadcast_arrays(a, numpy.arange(100))[1]
10000 loops, best of 3: 53.4 us per loop
>>> %timeit (1 + a * 0) * numpy.arange(100)
10000 loops, best of 3: 45.9 us per loop
>>> %timeit b = numpy.ones_like(a, dtype='i8'); b * numpy.arange(100)
10000 loops, best of 3: 28.9 us per loop
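Added note: on NumPy 1.10+, np.broadcast_to produces the broadcast result as a zero-copy, read-only view, which is cheaper than all of the above whenever a read-only result is acceptable -
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.broadcast_to(np.array([1, 2, 3]), a.shape)  # read-only (3, 3) view, no copy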
The fastest and cleanest solution I know is:
b_arr = numpy.empty(a.shape) # Empty array
b_arr.fill(b) # Filling with one value
fill sounds like the simplest way:
>>> a = np.arange(9).reshape((3,3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a.fill(10)
>>> a
array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])
EDIT: As @EOL points out, you don't need arange if you want to create a new array; np.empty((100,100)) (or whatever shape) is better for this.
Timings:
In [3]: a = np.arange(10000).reshape((100,100))
In [4]: %timeit 1 + a*0
100000 loops, best of 3: 19.9 us per loop
In [5]: a = np.arange(10000).reshape((100,100))
In [6]: %timeit a.fill(1)
100000 loops, best of 3: 3.73 us per loop
If you just need to broadcast a scalar to some arbitrary shape, you can do something like this:
a = b*np.ones(shape=(3,3))
Edit: np.tile is more general. You can use it to duplicate any scalar/vector in any number of dimensions:
b = 1
N = 100
a = np.tile(b, reps=(N, N))
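For completeness (added note; assumes NumPy 1.8+): np.full and np.full_like state the intent directly -
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.full(a.shape, 7)  # new (3, 3) array filled with 7
c = np.full_like(a, 7)   # same shape and dtype as a, filled with 7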
