Say I have two 3 dimensional matrices, like so (taken from this matlab example http://www.mathworks.com/help/matlab/ref/dot.html):
A = cat(3,[1 1;1 1],[2 3;4 5],[6 7;8 9])
B = cat(3,[2 2;2 2],[10 11;12 13],[14 15; 16 17])
If I want to take pairwise dot products along the third dimension, I could do so like this in matlab:
C = dot(A,B,3)
Which would give the result:
C =
106 140
178 220
What would be the equivalent operation in numpy? I would prefer a vectorized option, to avoid having to write a double for loop through the entire array. I can't seem to make sense of what np.tensordot or np.inner are supposed to do, but they might be options.
In [169]:
A = np.dstack([[[1, 1],[1 ,1]],[[2 ,3],[4, 5]],[[6, 7],[8, 9]]])
B = np.dstack([[[2, 2],[2, 2]],[[10, 11],[12, 13]],[[14, 15], [16, 17]]])
c=np.tensordot(A, B.T,1)
np.vstack([np.diag(c[:,i,i]) for i in range(A.shape[0])]).T
Out[169]:
array([[106, 140],
[178, 220]])
But surprisingly it is the slowest:
In [170]:
%%timeit
c=np.tensordot(A, B.T,1)
np.vstack([np.diag(c[:,i,i]) for i in range(A.shape[0])]).T
10000 loops, best of 3: 95.2 µs per loop
In [171]:
%timeit np.einsum('i...,i...',a,b)
100000 loops, best of 3: 6.93 µs per loop
In [172]:
%timeit inner1d(A,B)
100000 loops, best of 3: 4.51 µs per loop
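A likely reason the tensordot route is the slowest: np.tensordot(A, B.T, 1) pairs every vector in A with every vector in B, and the np.diag step then discards most of those products. Here is a minimal sketch of that, where the 'ijji->ij' subscript is my own shorthand for the np.diag loop and is not part of the original timing:

```python
import numpy as np

A = np.dstack([[[1, 1], [1, 1]], [[2, 3], [4, 5]], [[6, 7], [8, 9]]])
B = np.dstack([[[2, 2], [2, 2]], [[10, 11], [12, 13]], [[14, 15], [16, 17]]])

# Contract the last axis of A with the first axis of B.T.
c = np.tensordot(A, B.T, 1)
print(c.shape)  # (2, 2, 2, 2): 16 dot products, of which only 4 are wanted

# c[i, j, k, l] pairs A[i, j, :] with B[l, k, :];
# keep only the entries where k == j and l == i.
C = np.einsum('ijji->ij', c)
print(C)
```

For larger arrays the intermediate grows with the square of the number of vectors, which is why einsum, which never builds it, should scale better.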
Using np.einsum:
In [9]: B = np.array([[[2, 2],[2, 2]],[[10, 11],[12, 13]],[[14, 15],[16, 17]]])
In [10]: A = np.array([[[1, 1],[1, 1]],[[2, 3],[4, 5]],[[6, 7],[8, 9]]])
In [11]: np.einsum('i...,i...',A,B)
Out[11]:
array([[106, 140],
[178, 220]])
Or here's another fun one:
In [37]: from numpy.core.umath_tests import inner1d
In [38]: inner1d(A,B)
Out[38]:
array([[106, 140],
[178, 220]])
Edit in response to @flebool's comment, inner1d works for both (2,2,3) and (3,2,2) shaped arrays:
In [41]: A = dstack([[[1, 1],[1 ,1]],[[2 ,3],[4, 5]],[[6, 7],[8, 9]]])
In [42]: B = dstack([[[2, 2],[2, 2]],[[10, 11],[12, 13]],[[14, 15], [16, 17]]])
In [43]: inner1d(A,B)
Out[43]:
array([[106, 140],
[178, 220]])
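Which axis gets summed depends on how the arrays were built, so it may help to make that explicit. The sketch below assumes the two layouts used in this thread: np.array of the three 2x2 slices gives shape (3, 2, 2) with the slices on the first axis, while np.dstack gives shape (2, 2, 3) with the slices on the last axis. It uses only np.einsum, so it does not depend on the private numpy.core.umath_tests module.

```python
import numpy as np

slices_a = [[[1, 1], [1, 1]], [[2, 3], [4, 5]], [[6, 7], [8, 9]]]
slices_b = [[[2, 2], [2, 2]], [[10, 11], [12, 13]], [[14, 15], [16, 17]]]

A_first, B_first = np.array(slices_a), np.array(slices_b)   # shape (3, 2, 2)
A_last, B_last = np.dstack(slices_a), np.dstack(slices_b)   # shape (2, 2, 3)

# Slices on the first axis: sum over the leading index.
C_first = np.einsum('i...,i...->...', A_first, B_first)

# Slices on the last axis: sum over the trailing index.
C_last = np.einsum('...i,...i->...', A_last, B_last)

print(np.array_equal(C_first, C_last))
print(C_last)
```

Both calls reduce over the axis that holds the three slices, so they should agree with the MATLAB dot(A,B,3) result.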
Here's a solution:
import numpy as np
A = np.dstack([[[1, 1],[1 ,1]],[[2 ,3],[4, 5]],[[6, 7],[8, 9]]])
B = np.dstack([[[2, 2],[2, 2]],[[10, 11],[12, 13]],[[14, 15], [16, 17]]])
C = np.einsum('...k,...k',A,B)
Basically, np.dstack concatenates along the third axis, and np.einsum, the Einstein summation tool provided by numpy, then sums the products along that axis.
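If you want something that reads like MATLAB's dot(A,B,dim), a small wrapper is one option. This is only a sketch: dot_along_axis is a hypothetical helper name, it assumes real-valued inputs, and it uses a plain multiply-and-sum instead of einsum.

```python
import numpy as np

def dot_along_axis(a, b, axis=-1):
    """Sum of elementwise products along one axis (real-valued inputs assumed)."""
    a = np.asarray(a)
    b = np.asarray(b)
    return np.sum(a * b, axis=axis)

A = np.dstack([[[1, 1], [1, 1]], [[2, 3], [4, 5]], [[6, 7], [8, 9]]])
B = np.dstack([[[2, 2], [2, 2]], [[10, 11], [12, 13]], [[14, 15], [16, 17]]])

# NumPy axes are zero-based, so axis=2 corresponds to MATLAB's dimension 3.
C = dot_along_axis(A, B, axis=2)
print(C)
```

The multiply-and-sum form builds a temporary array the size of the inputs, which einsum avoids, so expect it to be somewhat slower on large arrays.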
The dot product of two vectors can be computed via numpy.dot. Now I want to compute the dot product of an array of vectors:
>>> numpy.arange(15).reshape((5, 3))
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
The vectors are row vectors and the output should be a 1d-array containing the results from the dot products:
array([ 5, 50, 149, 302, 509])
For the cross product (numpy.cross) this can be easily achieved specifying the axis keyword. However numpy.dot doesn't have such an option and passing it two 2d-arrays will result in the ordinary matrix product. I also had a look at numpy.tensordot but this doesn't seem to do the job either (being an extended matrix product).
I know that I can compute the dot product per element for 2d-arrays via
>>> numpy.einsum('ij, ji -> i', array2d, array2d.T)
However this solution doesn't work for 1d-arrays (i.e. just a single element). I would like to obtain a solution that works for both 1d-arrays (returning a scalar) and arrays of 1d-arrays (aka 2d-arrays) (returning a 1d-array).
Use np.einsum with ellipsis(...) to account for arrays with variable number of dimensions, like so -
np.einsum('...i,...i ->...', a, a)
Quoting the docs on it -
To enable and control broadcasting, use an ellipsis. Default
NumPy-style broadcasting is done by adding an ellipsis to the left of
each term, like np.einsum('...ii->...i', a). To take the trace along
the first and last axes, you can do np.einsum('i...i', a), or to do a
matrix-matrix product with the left-most indices instead of rightmost,
you can do np.einsum('ij...,jk...->ik...', a, b).
Sample runs on 2D and 1D arrays -
In [88]: a
Out[88]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [89]: np.einsum('...i,...i ->...', a, a) # On 2D array
Out[89]: array([ 5, 50, 149, 302, 509])
In [90]: b = a[:,0]
In [91]: b
Out[91]: array([ 0, 3, 6, 9, 12])
In [92]: np.einsum('...i,...i ->...', b,b) # On 1D array
Out[92]: 270
Runtime test -
Since we need to keep one axis aligned, at least with 2D arrays, one of np.einsum, np.matmul or the newer @ operator should be efficient.
In [95]: a = np.random.rand(1000,1000)
# @unutbu's soln
In [96]: %timeit (a*a).sum(axis=-1)
100 loops, best of 3: 3.63 ms per loop
In [97]: %timeit np.einsum('...i,...i ->...', a, a)
1000 loops, best of 3: 944 µs per loop
In [98]: a = np.random.rand(1000)
# @unutbu's soln
In [99]: %timeit (a*a).sum(axis=-1)
100000 loops, best of 3: 9.11 µs per loop
In [100]: %timeit np.einsum('...i,...i ->...', a, a)
100000 loops, best of 3: 5.59 µs per loop
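For completeness, here is how the @ operator mentioned in the runtime note could express the same reduction. This is a sketch and was not part of the timings; rowwise_dot_einsum and rowwise_dot_matmul are hypothetical helper names.

```python
import numpy as np

def rowwise_dot_einsum(a):
    # Sum of squares along the last axis, for any number of leading axes.
    return np.einsum('...i,...i->...', a, a)

def rowwise_dot_matmul(a):
    # Treat each vector as a (1, n) row times an (n, 1) column,
    # then strip the two length-1 axes that matmul leaves behind.
    return (a[..., None, :] @ a[..., :, None])[..., 0, 0]

a2d = np.arange(15).reshape(5, 3)
a1d = a2d[:, 0]

print(rowwise_dot_einsum(a2d))   # values 5, 50, 149, 302, 509
print(rowwise_dot_matmul(a2d))   # same values
print(rowwise_dot_einsum(a1d), rowwise_dot_matmul(a1d))   # 270 270
```

Both forms handle the 1D case (returning a scalar) and the 2D case (returning a 1D array) with the same code.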
I have some Python code that I would like to speed up by using Cython. I use a lot of Numpy operations in my script, like np.reshape and np.sum, when working with ndarrays. When I use the operations in my code I don't know how to make them not interact with python, so they slow my Cython code down to about the speed Python is taking.
Here is an example. Here is Python code that bins arrays by an arbitrary amount.
import numpy as np
def binarray(array, nbin):
    temp=array.reshape(int(array.shape[0]/nbin),nbin,int(array.shape[1]/nbin),nbin)
    temp=temp.sum(axis=(3,1))
    return temp
I have defined the data member types and declared the np.ndarray instances as described in the Cython documentation.
import numpy as np
cimport numpy as np
DTYPE = np.int_  # NumPy dtype matching the np.int_t C type declared in the ctypedef
ctypedef np.int_t DTYPE_t
def binarray(np.ndarray[DTYPE_t, ndim=2] array, int nbin):
    cdef int x0 = int(array.shape[0]/nbin)
    cdef int x2 = int(array.shape[1]/nbin)
    cdef np.ndarray[DTYPE_t, ndim=4] temp = np.zeros([x0,nbin,x2,nbin], dtype=DTYPE)
    temp = array.reshape(x0,nbin,x2,nbin)
    return temp.sum(axis=(3,1))
But I can't find anywhere how I might use a Numpy operation in Cython. This is my first time using Cython. As I understand it, Numpy arrays and the operations on them are compiled in C, so I thought I would be able to use them in Cython. Is this the case, or will I have to rewrite these functions myself?
So this function is supposed to do something like this:
In [474]: arr = np.arange(24).reshape(4,6)
In [475]: arr
Out[475]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
In [476]: binarray(arr,2)
Out[476]:
array([[14, 22, 30],
[62, 70, 78]])
Times don't look too shabby to me (this is on a relatively old laptop):
In [483]: timeit binarray(arr,2)
The slowest run took 4.93 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 14.6 µs per loop
In [484]: arr=np.ones((1000,1000),int)
In [485]: timeit binarray(arr,50)
100 loops, best of 3: 2.96 ms per loop
In [486]: timeit binarray(arr,100)
100 loops, best of 3: 2.78 ms per loop
In [487]: timeit binarray(arr,10)
100 loops, best of 3: 4.91 ms per loop
For reference, sum without reshaping:
In [489]: timeit arr.sum()
100 loops, best of 3: 2.28 ms per loop
And the minor reshape cost:
In [551]: %%timeit arr=np.ones((1000,1000));nbin=100
temp = arr.reshape(int(arr.shape[0]/nbin), nbin, int(arr.shape[1]/nbin), nbin)
...:
...
100000 loops, best of 3: 2.55 µs per loop
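The reshape is presumably this cheap because, for a contiguous array, it returns a view onto the same buffer and copies nothing, so nearly all of the time goes into the sum. A quick way to check that, as a sketch on the small 4x6 example:

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)
nbin = 2

# Floor division keeps the shape arguments as plain integers.
temp = arr.reshape(arr.shape[0] // nbin, nbin, arr.shape[1] // nbin, nbin)

# True when the reshaped array is a view on arr's data rather than a copy.
print(np.shares_memory(arr, temp))

# The reduction itself already runs in compiled code.
binned = temp.sum(axis=(3, 1))
print(binned)
```

If the input were non-contiguous (a transposed or strided slice, say), reshape might have to copy, and the cost would no longer be negligible.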
======================
I'd suggest working through this nditer tutorial, all the way to the cython implementation:
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
Here's a proof of concept - using nditer to sum on 2 axes:
In [543]: arr = np.arange(24).reshape(4,6)
In [544]: res = np.zeros((2,3),int)
In [545]: it = np.nditer([arr.reshape(2,2,3,2),res.reshape(2,1,3,1)],
flags=['reduce_ok','external_loop'],
op_flags=[['readonly'],['readwrite']])
In [546]: for x,y in it:
...:     print(x,y)
...:     y[...] += x
...:
[0 1] [0 0]
[2 3] [0 0]
[4 5] [0 0]
[6 7] [1 1]
[8 9] [5 5]
[10 11] [9 9]
[12 13] [0 0]
[14 15] [0 0]
[16 17] [0 0]
[18 19] [25 25]
[20 21] [29 29]
[22 23] [33 33]
In [547]: res
Out[547]:
array([[14, 22, 30],
[62, 70, 78]])
In Cython that has a chance of being as fast as temp.sum(axis=(3,1)).
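Wrapped up as a function, the nditer idea might look like the sketch below. binarray_nditer is a hypothetical name, and for simplicity it drops the external_loop flag so that each step handles a single element. In pure Python this loop is far slower than temp.sum(axis=(3,1)); it is only meant as a starting point for a Cython port.

```python
import numpy as np

def binarray_nditer(arr, nbin):
    """Hypothetical helper: bin a 2-D array by summing nbin x nbin blocks."""
    r, c = arr.shape[0] // nbin, arr.shape[1] // nbin
    res = np.zeros((r, c), dtype=arr.dtype)
    # res is broadcast across the two bin axes, so every output cell is
    # visited once per input element in its block; 'reduce_ok' permits that.
    with np.nditer([arr.reshape(r, nbin, c, nbin), res.reshape(r, 1, c, 1)],
                   flags=['reduce_ok'],
                   op_flags=[['readonly'], ['readwrite']]) as it:
        for x, y in it:
            y[...] += x
    return res

arr = np.arange(24).reshape(4, 6)
binned = binarray_nditer(arr, 2)
print(binned)
```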
I'm sure this question is Googleable, but I don't know what keywords to use. I'm curious about a specific case, but also about how to do it in general. Let's say I have an RGB image as an array of shape (width, height, 3) and I want to find all the pixels where the red channel is greater than 100. I feel like image > [100, 0, 0] should give me an array of indices (and would if I were comparing against a scalar and using a greyscale image), but this compares each element with the list. How do I compare over the first two dimensions, where each "element" is the last dimension?
To check the red channel only, you can do something like this -
np.argwhere(image[:,:,0] > threshold)
Explanation :
Compare the red-channel with the threshold to give us a boolean array of same shape as the input image without the third axis (color channel).
Use np.argwhere to get the indices of successful matches.
For a case when you want to see if any channel is above some threshold, use .any(-1) (any elements that satisfy the condition along the last axis/color channel).
np.argwhere((image > threshold).any(-1))
Sample run
Input image :
In [76]: image
Out[76]:
array([[[118, 94, 109],
[ 36, 122, 6],
[ 85, 91, 58],
[ 30, 2, 23]],
[[ 32, 47, 50],
[ 1, 105, 141],
[ 91, 120, 58],
[129, 127, 111]]], dtype=uint8)
In [77]: threshold
Out[77]: 100
Case #1: Red-channel only
In [69]: np.argwhere(image[:,:,0] > threshold)
Out[69]:
array([[0, 0],
[1, 3]])
In [70]: image[0,0]
Out[70]: array([118, 94, 109], dtype=uint8)
In [71]: image[1,3]
Out[71]: array([129, 127, 111], dtype=uint8)
Case #2: Any-channel
In [72]: np.argwhere((image > threshold).any(-1))
Out[72]:
array([[0, 0],
[0, 1],
[1, 1],
[1, 2],
[1, 3]])
In [73]: image[0,1]
Out[73]: array([ 36, 122, 6], dtype=uint8)
In [74]: image[1,1]
Out[74]: array([ 1, 105, 141], dtype=uint8)
In [75]: image[1,2]
Out[75]: array([ 91, 120, 58], dtype=uint8)
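To tie this back to the image > [100, 0, 0] attempt in the question: that comparison broadcasts the list against the last axis, so it produces one boolean per channel instead of one per pixel. The sketch below reuses the sample image and also shows that the per-pixel mask can index the image directly when the pixels themselves are wanted and not their coordinates.

```python
import numpy as np

image = np.array([[[118, 94, 109], [36, 122, 6], [85, 91, 58], [30, 2, 23]],
                  [[32, 47, 50], [1, 105, 141], [91, 120, 58], [129, 127, 111]]],
                 dtype=np.uint8)
threshold = 100

# Broadcasting against the channel axis: one boolean per channel value.
per_channel = image > np.array([100, 0, 0])
print(per_channel.shape)   # (2, 4, 3)

# Selecting the channel first gives one boolean per pixel.
red_mask = image[:, :, 0] > threshold
rows, cols = np.nonzero(red_mask)
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 0), (1, 3)]

# The mask can also pull out the matching pixels directly.
red_pixels = image[red_mask]
print(red_pixels)
```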
Faster alternative to np.any in np.einsum
np.einsum could be tricked to perform np.any's work and as it turns out is a tad faster.
Thus, boolean_arr.any(-1) would be equivalent to np.einsum('ijk->ij',boolean_arr).
Here are the associated runtimes across various datasizes -
In [105]: image = np.random.randint(0,255,(30,30,3)).astype('uint8')
...: %timeit np.argwhere((image > threshold).any(-1))
...: %timeit np.argwhere(np.einsum('ijk->ij',image>threshold))
...: out1 = np.argwhere((image > threshold).any(-1))
...: out2 = np.argwhere(np.einsum('ijk->ij',image>threshold))
...: print np.allclose(out1,out2)
...:
10000 loops, best of 3: 79.2 µs per loop
10000 loops, best of 3: 56.5 µs per loop
True
In [106]: image = np.random.randint(0,255,(300,300,3)).astype('uint8')
...: %timeit np.argwhere((image > threshold).any(-1))
...: %timeit np.argwhere(np.einsum('ijk->ij',image>threshold))
...: out1 = np.argwhere((image > threshold).any(-1))
...: out2 = np.argwhere(np.einsum('ijk->ij',image>threshold))
...: print np.allclose(out1,out2)
...:
100 loops, best of 3: 5.47 ms per loop
100 loops, best of 3: 3.69 ms per loop
True
In [107]: image = np.random.randint(0,255,(3000,3000,3)).astype('uint8')
...: %timeit np.argwhere((image > threshold).any(-1))
...: %timeit np.argwhere(np.einsum('ijk->ij',image>threshold))
...: out1 = np.argwhere((image > threshold).any(-1))
...: out2 = np.argwhere(np.einsum('ijk->ij',image>threshold))
...: print np.allclose(out1,out2)
...:
1 loops, best of 3: 833 ms per loop
1 loops, best of 3: 640 ms per loop
True
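The reason the einsum trick works, as far as I can tell, is that summing a boolean array in boolean arithmetic acts as a logical OR, so reducing the last axis with 'ijk->ij' gives the same mask as .any(-1). A small self-contained check, using a seeded random image as a stand-in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 255, size=(30, 30, 3), dtype=np.uint8)
threshold = 100
mask = image > threshold

any_mask = mask.any(-1)
# Boolean "sum" over the channel axis k behaves like a logical OR.
einsum_mask = np.einsum('ijk->ij', mask)

print(einsum_mask.dtype)
print(np.array_equal(any_mask, einsum_mask))
```

Whether the speed advantage holds on current NumPy versions and hardware is worth re-measuring before relying on it.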
Let's say I have a tensor of the following form:
import numpy as np
a = np.array([ [[1,2],
[3,4]],
[[5,6],
[7,3]]
])
# a.shape : (2,2,2) is a tensor containing 2x2 matrices
indices = np.argmax(a, axis=2)
#print indices
for mat in a:
    max_i = np.argmax(mat,axis=1)
    # Not really working I would like to
    # change 4 in the first matrix to -1
    # and 3 in the last to -1
    mat[max_i] = -1
print(a)
Now what I would like to do is use indices as a mask on a to replace every max element with, say, -1. Is there a numpy way of doing this? So far all I have figured out is using for loops.
Here's one way using linear indexing in 3D -
m,n,r = a.shape
offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
np.put(a,indices + offset,-1)
Sample run -
In [92]: a
Out[92]:
array([[[28, 59, 26, 70],
[57, 28, 71, 49],
[33, 6, 10, 90]],
[[24, 16, 83, 67],
[96, 16, 72, 56],
[74, 4, 71, 81]]])
In [93]: indices = np.argmax(a, axis=2)
In [94]: m,n,r = a.shape
...: offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
...: np.put(a,indices + offset,-1)
...:
In [95]: a
Out[95]:
array([[[28, 59, 26, -1],
[57, 28, -1, 49],
[33, 6, 10, -1]],
[[24, 16, -1, 67],
[-1, 16, 72, 56],
[74, 4, 71, -1]]])
Here's another way with linear indexing again, but in 2D -
m,n,r = a.shape
a.reshape(-1,r)[np.arange(m*n),indices.ravel()] = -1
Runtime tests and verify output -
In [156]: def vectorized_app1(a,indices): # 3D linear indexing
...:     m,n,r = a.shape
...:     offset = n*r*np.arange(m)[:,None] + r*np.arange(n)
...:     np.put(a,indices + offset,-1)
...:
...: def vectorized_app2(a,indices): # 2D linear indexing
...:     m,n,r = a.shape
...:     a.reshape(-1,r)[np.arange(m*n),indices.ravel()] = -1
...:
In [157]: # Generate random 3D array and the corresponding indices array
...: a = np.random.randint(0,99,(100,100,100))
...: indices = np.argmax(a, axis=2)
...:
...: # Make copies for feeding into functions
...: ac1 = a.copy()
...: ac2 = a.copy()
...:
In [158]: vectorized_app1(ac1,indices)
In [159]: vectorized_app2(ac2,indices)
In [160]: np.allclose(ac1,ac2)
Out[160]: True
In [161]: # Make copies for feeding into functions
...: ac1 = a.copy()
...: ac2 = a.copy()
...:
In [162]: %timeit vectorized_app1(ac1,indices)
1000 loops, best of 3: 311 µs per loop
In [163]: %timeit vectorized_app2(ac2,indices)
10000 loops, best of 3: 145 µs per loop
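Newer NumPy versions (1.15 and later) also provide np.take_along_axis and np.put_along_axis, which do this index bookkeeping for you. This is a different technique from the two linear-indexing approaches timed above, and I have not benchmarked it against them. A minimal sketch on the array from the question:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]],
              [[5, 6], [7, 3]]])

# One argmax per row, kept as a length-1 last axis so it lines up with a.
indices = np.argmax(a, axis=2)[..., None]

# Read the row maxima ...
maxima = np.take_along_axis(a, indices, axis=2)[..., 0]
print(maxima)

# ... and overwrite them in place.
np.put_along_axis(a, indices, -1, axis=2)
print(a)
```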
You can use indices to index into the last dimension of a, provided that you also supply index arrays for the first two dimensions that broadcast against indices (a column of row numbers and a row of column numbers):
import numpy as np
a = np.array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 3]]])
indices = np.argmax(a, axis=2)
# i has shape (2, 1) and j has shape (2,), so (i, j, indices) broadcast to (2, 2)
i = np.arange(a.shape[0])[:, None]
j = np.arange(a.shape[1])
print(repr(a[i, j, indices]))
# array([[2, 4],
# [6, 7]])
a[i, j, indices] = -1
print(repr(a))
# array([[[ 1, -1],
# [ 3, -1]],
# [[ 5, -1],
# [-1, 3]]])
Suppose I have a numpy array
a = np.array([0, 8, 25, 78, 68, 98, 1])
and a mask array b = [0, 1, 1, 0, 1]
Is there an easy way to get the following array:
[8, 25, 68] - which is the first, second and fourth element of the original array (zero-based). This sounds like a mask to me.
The most obvious way I have tried is a[b], but this does not yield a desirable result.
After this I tried to look into masked operations in numpy but it looks like it guides me in the wrong direction.
If a and b are both numpy arrays and b is strictly 1's and 0's:
>>> a[b.astype(bool)]
array([ 8, 25, 68])
It should be noted that this is only noticeably faster for extremely small cases, and is much more limited in scope than @falsetru's answer:
a = np.random.randint(0,2,5)
%timeit a[a==1]
100000 loops, best of 3: 4.39 µs per loop
%timeit a[a.astype(np.bool)]
100000 loops, best of 3: 2.44 µs per loop
For the larger case:
a = np.random.randint(0,2,5E6)
%timeit a[a==1]
10 loops, best of 3: 59.6 ms per loop
%timeit a[a.astype(np.bool)]
10 loops, best of 3: 56 ms per loop
>>> a = np.array([0, 8, 25, 78, 68, 98, 1])
>>> b = np.array([0, 1, 1, 0, 1])
>>> a[b == 1]
array([ 8, 25, 68])
Alternative using itertools.compress:
>>> import itertools
>>> list(itertools.compress(a, b))
[8, 25, 68]
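One caveat worth spelling out: b has 5 entries while a has 7. As far as I know, recent NumPy releases require a boolean index to match the length of the axis it indexes, so a[b == 1] raises an IndexError there. Two workarounds, sketched under that assumption:

```python
import numpy as np

a = np.array([0, 8, 25, 78, 68, 98, 1])
b = np.array([0, 1, 1, 0, 1])

# Option 1: trim a to the mask's length, then apply a boolean mask.
trimmed = a[:len(b)][b.astype(bool)]
print(trimmed)

# Option 2: turn the mask into integer positions; no length match is needed.
positions = np.flatnonzero(b)
picked = a[positions]
print(picked)
```

itertools.compress sidesteps the issue because it simply stops at the shorter of its two inputs.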