The dot product of two vectors can be computed via numpy.dot. Now I want to compute the dot product for each vector in an array of vectors:
>>> numpy.arange(15).reshape((5, 3))
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
The vectors are row vectors and the output should be a 1d-array containing the results from the dot products:
array([ 5, 50, 149, 302, 509])
For the cross product (numpy.cross) this can easily be achieved by specifying the axis keyword. However, numpy.dot doesn't have such an option, and passing it two 2d-arrays results in the ordinary matrix product. I also had a look at numpy.tensordot, but this doesn't seem to do the job either (being an extended matrix product).
I know that I can compute the dot product per element for 2d-arrays via
>>> numpy.einsum('ij, ji -> i', array2d, array2d.T)
However, this solution doesn't work for 1d-arrays (i.e. just a single vector). I would like to obtain a solution that works for both 1d-arrays (returning a scalar) and arrays of 1d-arrays (aka 2d-arrays) (returning a 1d-array).
Use np.einsum with an ellipsis (...) to account for arrays with a variable number of dimensions, like so -
np.einsum('...i,...i ->...', a, a)
Quoting the docs on it -
To enable and control broadcasting, use an ellipsis. Default
NumPy-style broadcasting is done by adding an ellipsis to the left of
each term, like np.einsum('...ii->...i', a). To take the trace along
the first and last axes, you can do np.einsum('i...i', a), or to do a
matrix-matrix product with the left-most indices instead of rightmost,
you can do np.einsum('ij...,jk...->ik...', a, b).
Sample runs on 2D and 1D arrays -
In [88]: a
Out[88]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
In [89]: np.einsum('...i,...i ->...', a, a) # On 2D array
Out[89]: array([ 5, 50, 149, 302, 509])
In [90]: b = a[:,0]
In [91]: b
Out[91]: array([ 0, 3, 6, 9, 12])
In [92]: np.einsum('...i,...i ->...', b,b) # On 1D array
Out[92]: 270
Runtime test -
Since we need to keep one axis aligned (at least with 2D arrays), np.einsum, np.matmul, or the newer @ operator would be efficient here.
In [95]: a = np.random.rand(1000,1000)
# @unutbu's soln
In [96]: %timeit (a*a).sum(axis=-1)
100 loops, best of 3: 3.63 ms per loop
In [97]: %timeit np.einsum('...i,...i ->...', a, a)
1000 loops, best of 3: 944 µs per loop
In [98]: a = np.random.rand(1000)
# @unutbu's soln
In [99]: %timeit (a*a).sum(axis=-1)
100000 loops, best of 3: 9.11 µs per loop
In [100]: %timeit np.einsum('...i,...i ->...', a, a)
100000 loops, best of 3: 5.59 µs per loop
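For reference, the np.matmul / @ route mentioned above can also be written to handle both 1D and 2D inputs by adding singleton dimensions; a minimal sketch (the helper name batched_self_dot is mine, not from the answer) -
import numpy as np

def batched_self_dot(a):
    # Row-wise dot product via matmul: a 1D vector gives a scalar,
    # a 2D array of row vectors gives a 1D array of dot products.
    return np.matmul(a[..., None, :], a[..., :, None])[..., 0, 0]

a2d = np.arange(15).reshape(5, 3)
print(batched_self_dot(a2d))        # [  5  50 149 302 509]
print(batched_self_dot(a2d[:, 0]))  # 270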
I'm interested in getting the location of the minimum value in a 1-d NumPy array that meets a certain condition (in my case, a medium threshold). For example:
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
I'd like to effectively mask all numbers in a that are under the limit, such that the result of np.argmin would be 6. Is there a computationally cheap way to mask values that don't meet a condition and then apply np.argmin?
You could store the valid indices and use them both to select the valid elements from a and to map the argmin() among those selected elements back to an index in the original array. Thus, the implementation would look something like this -
valid_idx = np.where(a >= limit)[0]
out = valid_idx[a[valid_idx].argmin()]
Sample run -
In [32]: limit = 3
...: a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
...:
In [33]: valid_idx = np.where(a >= limit)[0]
In [34]: valid_idx[a[valid_idx].argmin()]
Out[34]: 6
Runtime test -
For performance benchmarking, this section compares the masked-array solution from the other answer against the regular-array based solution proposed earlier in this post, across various data sizes.
def masked_argmin(a, limit):  # Defining func for regular array based soln
    valid_idx = np.where(a >= limit)[0]
    return valid_idx[a[valid_idx].argmin()]
In [52]: # Inputs
...: a = np.random.randint(0,1000,(10000))
...: limit = 500
...:
In [53]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 233 µs per loop
In [54]: %timeit masked_argmin(a,limit)
10000 loops, best of 3: 101 µs per loop
In [55]: # Inputs
...: a = np.random.randint(0,1000,(100000))
...: limit = 500
...:
In [56]: %timeit np.argmin(np.ma.MaskedArray(a, a<limit))
1000 loops, best of 3: 1.73 ms per loop
In [57]: %timeit masked_argmin(a,limit)
1000 loops, best of 3: 1.03 ms per loop
This can simply be accomplished using numpy's MaskedArray
import numpy as np
limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])
b = np.ma.MaskedArray(a, a<limit)
np.ma.argmin(b) # == 6
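For completeness, a variation on the same masking idea that avoids numpy.ma altogether is to replace invalid entries with +inf before calling argmin; a sketch, assuming at least one element satisfies the condition -
import numpy as np

limit = 3
a = np.array([1, 2, 4, 5, 2, 5, 3, 6, 7, 9, 10])

# Values below the limit become +inf, so they can never win the argmin.
# Note this upcasts the working array to float.
print(np.argmin(np.where(a >= limit, a, np.inf)))  # 6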
I need to compute the diagonal entries of XMX^T without a for-loop; in other words, I want to replace the following for-loop:
import numpy as np

X = np.random.randn(10000, 100)
M = np.random.rand(100, 100)
out = np.zeros(10000)
for n in range(10000):
    out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
I know that somehow I should be using numpy.einsum, but I have not been able to figure out how.
Many thanks!
Sure there is an np.einsum way, like so -
np.einsum('ij,ij->i',X.dot(M),X)
This exploits the fast matrix multiplication at the first level with X.dot(M) and then uses np.einsum to keep the first axis and sum-reduce the second axis.
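To see why this works, note that 'ij,ij->i' is just an elementwise product followed by a row-wise sum; a small sketch of that equivalence (with smaller sizes than the question, for brevity) -
import numpy as np

X = np.random.randn(1000, 100)
M = np.random.rand(100, 100)

# einsum keeps axis i and sum-reduces axis j of the two (1000, 100) operands
out_einsum = np.einsum('ij,ij->i', X.dot(M), X)
out_plain = (X.dot(M) * X).sum(axis=1)
assert np.allclose(out_einsum, out_plain)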
Runtime test -
This section compares all the approaches posted thus far to solve the problem.
In [132]: # Setup input arrays
...: X = np.random.randn(10000, 100)
...: M = np.random.rand(100, 100)
...:
...: def original_app(X,M):
...:     out = np.zeros(10000)
...:     for n in range(10000):
...:         out[n] = np.dot(np.dot(X[n, :], M), X[n, :])
...:     return out
...:
In [133]: np.allclose(original_app(X,M),np.einsum('ij,ij->i',X.dot(M),X))
Out[133]: True
In [134]: %timeit original_app(X,M) # Original solution
10 loops, best of 3: 97.8 ms per loop
In [135]: %timeit np.dot(X, np.dot(M,X.T)).trace()# @Colonel Beauvel's solution
1 loops, best of 3: 2.24 s per loop
In [136]: %timeit np.einsum('ij,jk,ik->i', X, M, X) # @hpaulj's solution
1 loops, best of 3: 442 ms per loop
In [137]: %timeit np.einsum('ij,ij->i',X.dot(M),X) # Proposed in this post
10 loops, best of 3: 28.1 ms per loop
Here is a simpler example:
M = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])

X = array([[ 0,  4,  8],
           [ 1,  5,  9],
           [ 2,  6, 10],
           [ 3,  7, 11]])
What you are looking for - the sum of the diagonal elements - is more commonly known as the trace in maths. You can obtain the trace of your matrix product, without a loop, by:
In [102]: np.dot(X, np.dot(M,X.T)).trace()
Out[102]: 692
In [210]: X=np.arange(12).reshape(4,3)
In [211]: M=np.ones((3,3))
In [212]: out=np.zeros(4)
In [213]: for n in range(4):
   .....:     out[n] = np.dot(np.dot(X[n,:], M), X[n,:])
   .....:
In [214]: out
Out[214]: array([ 9., 144., 441., 900.])
One einsum approach:
In [215]: np.einsum('ij,jk,ik->i', X, M, X)
Out[215]: array([ 9., 144., 441., 900.])
Comparing the other einsum:
In [218]: timeit np.einsum('ij,jk,ik->i', X, M, X)
100000 loops, best of 3: 8.98 µs per loop
In [219]: timeit np.einsum('ij,ij->i',X.dot(M),X)
100000 loops, best of 3: 11.9 µs per loop
This is a bit faster here, but results may differ at your larger sizes.
einsum does save calculating a lot of unnecessary values (compared to the diagonal or trace approaches).
Similar use of einsum - Combine Einsum Expressions
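To make that "unnecessary values" point concrete, here is a rough sketch (smaller sizes than the question) of what the two routes actually compute -
import numpy as np

X = np.random.randn(1000, 100)
M = np.random.rand(100, 100)

# The trace/diagonal route materialises the full (1000, 1000) product,
# although only its 1000 diagonal entries are wanted.
full = X.dot(M).dot(X.T)
# einsum computes just those 1000 values directly.
diag_only = np.einsum('ij,ij->i', X.dot(M), X)
assert np.allclose(np.diag(full), diag_only)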
I've got a numpy array of strictly increasing "cutoff" values of length m, and a pandas series of length n (though the index isn't important and this could be cast to a numpy array).
I need to come up with an efficient way of producing a length-m vector of counts, where the jth entry is the number of elements in the pandas series less than the jth element of the "cutoff" array.
I could do this via a list comprehension:
output = array([(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar])
but I was wondering if there is any way to do this that leverages more of numpy's speed, as I have to do this quite a few times inside multiple loops and it keeps crashing my computer.
Thanks!
Is this what you are looking for?
In [36]: a = np.random.random(20)
In [37]: a
Out[37]:
array([ 0.68574307,  0.15743428,  0.68006876,  0.63572484,  0.26279663,
        0.14346269,  0.56267286,  0.47250091,  0.91168387,  0.98915746,
        0.22174062,  0.11930722,  0.30848231,  0.1550406 ,  0.60717858,
        0.23805205,  0.57718675,  0.78075297,  0.17083826,  0.87301963])
In [38]: b = np.array((0.3,0.7))
In [39]: np.sum(a[:,None]<b[None,:], axis=0)
Out[39]: array([ 8, 16])
In [40]: np.sum(a[:,None]<b, axis=0) # b's new axis above is unnecessary...
Out[40]: array([ 8, 16])
In [41]: (a[:,None]<b).sum(axis=0) # even simpler
Out[41]: array([ 8, 16])
Timings are always well received (here for a longish array of 2e6 elements):
In [47]: a = np.random.random(2000000)
In [48]: %timeit (a[:,None]<b).sum(axis=0)
10 loops, best of 3: 78.2 ms per loop
In [49]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
1 loop, best of 3: 448 ms per loop
For a smaller array
In [50]: a = np.random.random(2000)
In [51]: %timeit (a[:,None]<b).sum(axis=0)
10000 loops, best of 3: 89 µs per loop
In [52]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
The slowest run took 4.86 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 141 µs per loop
Edit
Divakar says that things may be different for lengthy b arrays; let's see:
In [71]: a = np.random.random(2000)
In [72]: b =np.random.random(200)
In [73]: %timeit (a[:,None]<b).sum(axis=0)
1000 loops, best of 3: 1.44 ms per loop
In [74]: %timeit np.searchsorted(a, b, 'right',sorter=a.argsort())
10000 loops, best of 3: 172 µs per loop
Quite different indeed! Thank you for prompting my curiosity.
The OP should probably benchmark their own use case: is the sample very long relative to the cutoff sequence or not, and where is the break-even point?
Edit #2
I made a blooper in my timings, I forgot the axis=0 argument to .sum()...
I've edited the timings with the corrected statement and, of course, the corrected timing. My apologies.
You can use np.searchsorted for some NumPy magic -
# Convert to numpy array for some "magic"
pan_series_arr = np.array(pan_series)
# Let the magic begin!
sortidx = pan_series_arr.argsort()
out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
Explanation
You are performing [(pan_series < cutoff_val).sum() for cutoff_val in cutoff_ar], i.e. for each element in cutoff_ar, counting the number of pan_series elements that are less than it. With np.searchsorted, we instead ask where each element of cutoff_ar would have to be inserted into a sorted pan_series_arr so that it lands to the 'right' of any equal elements. Each insertion index equals the number of pan_series elements at or below the current cutoff_ar element (with side='left' it would be the strictly-below count), which gives us our desired output.
Sample run
In [302]: cutoff_ar
Out[302]: array([ 1, 3, 9, 44, 63, 90])
In [303]: pan_series_arr
Out[303]: array([ 2, 8, 69, 55, 97])
In [304]: [(pan_series_arr < cutoff_val).sum() for cutoff_val in cutoff_ar]
Out[304]: [0, 1, 2, 2, 3, 4]
In [305]: sortidx = pan_series_arr.argsort()
...: out = np.searchsorted(pan_series_arr,cutoff_ar,'right',sorter=sortidx)
...:
In [306]: out
Out[306]: array([0, 1, 2, 2, 3, 4])
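Since the sorter argument effectively sorts pan_series_arr on every call anyway, an equivalent variant (a sketch, not from the answer above) sorts once up front; note that side='left' gives the strict less-than count, which matters if a cutoff can exactly equal a series value -
import numpy as np

pan_series_arr = np.array([2, 8, 69, 55, 97])
cutoff_ar = np.array([1, 3, 9, 44, 63, 90])

# Sort once, then count elements strictly below each cutoff.
sorted_vals = np.sort(pan_series_arr)
out = np.searchsorted(sorted_vals, cutoff_ar, side='left')
print(out)  # [0 1 2 2 3 4]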
I have an ndarray of shape (z, y, x) containing values. I am trying to index this array with another ndarray of shape (y, x) that contains the z-index of the value I am interested in.
import numpy as np
val_arr = np.arange(27).reshape(3,3,3)
z_indices = np.array([[1, 0, 2],
                      [0, 0, 1],
                      [2, 0, 1]])
Since my arrays are rather large, I tried to use np.take to avoid unnecessary copies of the array, but I just can't wrap my head around indexing 3-dimensional arrays with it.
How do I have to index val_arr with z_indices to get the values at the desired z-axis position? The expected outcome would be:
result_arr = np.array([[ 9,  1, 20],
                       [ 3,  4, 14],
                       [24,  7, 17]])
You can use choose to make the selection:
>>> z_indices.choose(val_arr)
array([[ 9,  1, 20],
       [ 3,  4, 14],
       [24,  7, 17]])
The function choose is incredibly useful, but can be somewhat tricky to make sense of. Essentially, given an array of indices (z_indices), choose picks, at each position, the value from the corresponding sub-array of val_arr along its first axis.
Also: any fancy indexing operation will create a new array rather than a view of the original data. It is not possible to index val_arr with z_indices without creating a brand new array.
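A quick way to confirm that copy-not-view point, as a small sketch -
import numpy as np

val_arr = np.arange(27).reshape(3, 3, 3)
z_indices = np.array([[1, 0, 2],
                      [0, 0, 1],
                      [2, 0, 1]])

result = z_indices.choose(val_arr)
# choose, like any fancy-indexing operation, returns a brand new array.
print(np.shares_memory(result, val_arr))  # False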
For readability, np.choose definitely looks great.
If performance is of the essence, you can calculate the linear indices and then use np.take (or index a flattened view obtained with .ravel()) to extract those specific elements from val_arr. The implementation would look something like this -
def linidx_take(val_arr, z_indices):
    # Get the sizes of the last two axes (y and x) of the values array
    _, nY, nX = val_arr.shape
    # Compute linear indices into the flattened array and extract with np.take
    idx = nY*nX*z_indices + nX*np.arange(nY)[:,None] + np.arange(nX)
    return np.take(val_arr, idx)  # Or val_arr.ravel()[idx]
Runtime tests and verification of results -
The ogrid-based solution from here is made into a generic version for these tests, like so:
In [182]: def ogrid_based(val_arr,z_indices):
...:     v_shp = val_arr.shape
...:     y, x = np.ogrid[0:v_shp[1], 0:v_shp[2]]
...:     return val_arr[z_indices, y, x]
...:
Case #1: Smaller datasize
In [183]: val_arr = np.random.rand(30,30,30)
...: z_indices = np.random.randint(0,30,(30,30))
...:
In [184]: np.allclose(z_indices.choose(val_arr),ogrid_based(val_arr,z_indices))
Out[184]: True
In [185]: np.allclose(z_indices.choose(val_arr),linidx_take(val_arr,z_indices))
Out[185]: True
In [187]: %timeit z_indices.choose(val_arr)
1000 loops, best of 3: 230 µs per loop
In [188]: %timeit ogrid_based(val_arr,z_indices)
10000 loops, best of 3: 54.1 µs per loop
In [189]: %timeit linidx_take(val_arr,z_indices)
10000 loops, best of 3: 30.3 µs per loop
Case #2: Bigger datasize
In [191]: val_arr = np.random.rand(300,300,300)
...: z_indices = np.random.randint(0,300,(300,300))
...:
In [192]: z_indices.choose(val_arr) # Seems like there is some limitation here with bigger arrays.
Traceback (most recent call last):
File "<ipython-input-192-10c3bb600361>", line 1, in <module>
z_indices.choose(val_arr)
ValueError: Need between 2 and (32) array objects (inclusive).
In [194]: np.allclose(linidx_take(val_arr,z_indices),ogrid_based(val_arr,z_indices))
Out[194]: True
In [195]: %timeit ogrid_based(val_arr,z_indices)
100 loops, best of 3: 3.67 ms per loop
In [196]: %timeit linidx_take(val_arr,z_indices)
100 loops, best of 3: 2.04 ms per loop
If you have numpy >= 1.15.0 you could use numpy.take_along_axis. In your case:
result_array = numpy.take_along_axis(val_arr, z_indices.reshape((1, 3, 3)), axis=0)[0]
That should give you the result you want in one neat line of code. Note the shape of the indices array: it needs the same number of dimensions as your val_arr, with length 1 along the axis being indexed (here axis 0, the z-axis) and the same sizes as val_arr in the last two dimensions; the trailing [0] drops that length-1 axis again.
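A quick self-contained check of that call against the arrays from the question (a sketch) -
import numpy as np

val_arr = np.arange(27).reshape(3, 3, 3)
z_indices = np.array([[1, 0, 2],
                      [0, 0, 1],
                      [2, 0, 1]])

# Index along axis 0 (the z-axis) and drop the leading length-1 dimension.
result = np.take_along_axis(val_arr, z_indices.reshape((1, 3, 3)), axis=0)[0]
print(result)
# [[ 9  1 20]
#  [ 3  4 14]
#  [24  7 17]]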
Inspired by this thread, using np.ogrid:
y,x = np.ogrid[0:3, 0:3]
print [z_indices, y, x]
[array([[1, 0, 2],
        [0, 0, 1],
        [2, 0, 1]]),
 array([[0],
        [1],
        [2]]),
 array([[0, 1, 2]])]
print val_arr[z_indices, y, x]
[[ 9  1 20]
 [ 3  4 14]
 [24  7 17]]
I have to admit that multidimensional fancy indexing can be messy and confusing :)
Given a numpy array x of shape (m,) and a numpy array y of shape (m/n,), how do I efficiently multiply each length-n block of x by the corresponding element of y?
Here's my best attempt:
In [13]: x = np.array([1, 5, 3, 2, 9, 1])
In [14]: y = np.array([2, 4, 6])
In [15]: n = 2
In [16]: (y[:, np.newaxis] * x.reshape((-1, n))).flatten()
Out[16]: array([ 2, 10, 12, 8, 54, 6])
Your solution looks pretty good to me.
If you wanted to speed it up slightly, you could:
Use ravel() instead of flatten() (the former will return a view if possible, the latter always returns a copy).
Reshape x in Fortran order to avoid the overhead of another indexing operation on y (although subsequent timings suggest this speedup is negligible)
So, rewritten, the multiplication becomes:
>>> (x.reshape((2, -1), order='f') * y).ravel('f')
array([ 2, 10, 12, 8, 54, 6])
Timings:
>>> %timeit (y[:, np.newaxis] * x.reshape((-1, n))).flatten()
100000 loops, best of 3: 7.4 µs per loop
>>> %timeit (x.reshape((n, -1), order='f') * y).ravel('f')
100000 loops, best of 3: 4.98 µs per loop
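For what it's worth, the same block-wise multiply can also be written with np.repeat (a sketch; it builds a temporary of length m, so it is not necessarily faster than the reshape approach) -
import numpy as np

x = np.array([1, 5, 3, 2, 9, 1])
y = np.array([2, 4, 6])
n = 2

# Expand y so each element lines up with its block of n consecutive elements of x.
print(x * np.repeat(y, n))  # [ 2 10 12  8 54  6]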