Assume that I have two arrays A and B, where both A and B are m x n. My goal is now, for each row of A and B, to find where I should insert the elements of row i of A in the corresponding row of B. That is, I wish to apply np.digitize or np.searchsorted to each row of A and B.
My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?
We can add an offset to each row that grows from one row to the next, using the same offsets for both arrays. The idea is then to use np.searchsorted on the flattened versions of the input arrays, so that each row of b is restricted to finding its sorted positions within the corresponding row of a. Additionally, to make it work for negative numbers too, the offset is based on the range of the values (maximum minus minimum) rather than on the maximum alone.
So, we would have a vectorized implementation like so -
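def searchsorted2d(a,b):
    m,n = a.shape
    # Offset step: one more than the combined value range of a and b,
    # so that the shifted rows cannot overlap even if a and b span different ranges
    max_num = np.maximum(a.max(), b.max()) - np.minimum(a.min(), b.min()) + 1
    r = max_num*np.arange(m)[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    # Remove each row's starting position in the flattened array
    return p - n*(np.arange(m)[:,None])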
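To make the offset idea concrete, here is a minimal sketch on a tiny integer input that includes negative values. The function name rowwise_searchsorted and the sample arrays are made up for illustration, and the offset step is taken from the combined range of both arrays, which is an assumption of this sketch.

```python
import numpy as np

def rowwise_searchsorted(a, b):
    """Row-wise np.searchsorted via per-row offsets (integer inputs assumed)."""
    m, n = a.shape
    # One more than the combined value range, so that shifted rows
    # occupy disjoint, increasing intervals.
    step = max(a.max(), b.max()) - min(a.min(), b.min()) + 1
    r = step * np.arange(m)[:, None]
    flat = np.searchsorted((a + r).ravel(), (b + r).ravel())
    # Subtract each row's starting position in the flattened array.
    return flat.reshape(m, -1) - n * np.arange(m)[:, None]

# Hypothetical sample data; every row of a must already be sorted.
a = np.array([[-5, 0, 4], [1, 3, 9]])
b = np.array([[-7, 0, 8], [2, 3, 10]])

result = rowwise_searchsorted(a, b)
expected = np.array([np.searchsorted(ai, bi) for ai, bi in zip(a, b)])
print(result)
```

The per-row loop is only used here as a reference to compare against.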
Runtime test -
In [173]: def searchsorted2d_loopy(a,b):
...:     out = np.zeros(a.shape,dtype=int)
...:     for i in range(len(a)):
...:         out[i] = np.searchsorted(a[i],b[i])
...:     return out
...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
The solution provided by @Divakar is ideal for integer data, but beware of precision issues for floating-point values, especially if they span multiple orders of magnitude (e.g. [[1.0, 2.0, 3.0, 1.0e+20],...]). In some cases r may be so large that computing a+r and b+r wipes out the original values you're trying to run searchsorted on, and you end up just comparing r to r.
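A small sketch of that failure mode, using made-up float64 data: the offset step is about 1e20, so adding it to the small entries of the second row absorbs them entirely.

```python
import numpy as np

# Hypothetical data spanning many orders of magnitude.
a = np.array([[1.0, 2.0, 3.0, 1.0e20],
              [1.0, 2.0, 3.0, 1.0e20]])

step = a.max() - a.min() + 1                      # about 1e20 in float64
shifted = a + step * np.arange(a.shape[0])[:, None]

# In row 1 the values 1.0, 2.0 and 3.0 are lost in the offset.
collapsed = shifted[1, :3] == step

# Searching the shifted row for its own values no longer gives 0, 1, 2, 3.
positions = np.searchsorted(shifted[1], shifted[1])
print(collapsed, positions)
```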
To make the approach more robust for floating-point data, you could embed the row information into the arrays as part of the values (as a structured dtype), and run searchsorted on these structured dtypes instead.
import numpy as np

def searchsorted_2d(a, v, side='left', sorter=None):
    # Make sure a and v are numpy arrays.
    a = np.asarray(a)
    v = np.asarray(v)
    # Augment a with row id
    ai = np.empty(a.shape,dtype=[('row',int),('value',a.dtype)])
    ai['row'] = np.arange(a.shape[0]).reshape(-1,1)
    ai['value'] = a
    # Augment v with row id
    vi = np.empty(v.shape,dtype=[('row',int),('value',v.dtype)])
    vi['row'] = np.arange(v.shape[0]).reshape(-1,1)
    vi['value'] = v
    # Perform searchsorted on augmented array.
    # The row information is embedded in the values, so only the equivalent rows
    # between a and v are considered.
    result = np.searchsorted(ai.flatten(),vi.flatten(), side=side, sorter=sorter)
    # Restore the original shape, decode the searchsorted indices so they apply to the original data.
    result = result.reshape(vi.shape) - vi['row']*a.shape[1]
    return result
Edit: The timing on this approach is abysmal!
In [21]: %timeit searchsorted_2d(a,b)
10 loops, best of 3: 92.5 ms per loop
You would be better off just using map over the array:
In [22]: %timeit np.array(list(map(np.searchsorted,a,b)))
100 loops, best of 3: 13.8 ms per loop
For integer data, @Divakar's approach is still the fastest:
In [23]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 7.26 ms per loop
I have two 3d arrays A and B with shape (N, 2, 2) that I would like to multiply along the N axis, taking the matrix product of each pair of 2x2 matrices. With a loop implementation, it looks like
C[i] = dot(A[i], B[i])
Is there a way I could do this without using a loop? I've looked into tensordot, but haven't been able to get it to work. I think I might want something like tensordot(a, b, axes=([1,2], [2,1])) but that's giving me an NxN matrix.
It seems you are doing a matrix multiplication for each slice along the first axis. For that, you can use np.einsum like so -
np.einsum('ijk,ikl->ijl',A,B)
We can also use np.matmul -
np.matmul(A,B)
On Python 3.5 and later, this matmul operation simplifies with the @ operator -
A @ B
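As a quick sanity check, the sketch below compares these spellings against the explicit loop on a small random input. The array size and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
N = 4
A = rng.random((N, 2, 2))
B = rng.random((N, 2, 2))

# Reference: one 2x2 matrix product per slice along the first axis.
looped = np.stack([A[i].dot(B[i]) for i in range(N)])

via_einsum = np.einsum('ijk,ikl->ijl', A, B)
via_matmul = np.matmul(A, B)
via_operator = A @ B

print(np.allclose(looped, via_einsum),
      np.allclose(looped, via_matmul),
      np.allclose(looped, via_operator))
```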
Benchmarking
Approaches -
def einsum_based(A,B):
    return np.einsum('ijk,ikl->ijl',A,B)

def matmul_based(A,B):
    return np.matmul(A,B)

def forloop(A,B):
    N = A.shape[0]
    C = np.zeros((N,2,2))
    for i in range(N):
        C[i] = np.dot(A[i], B[i])
    return C
Timings -
In [44]: N = 10000
...: A = np.random.rand(N,2,2)
...: B = np.random.rand(N,2,2)
In [45]: %timeit einsum_based(A,B)
...: %timeit matmul_based(A,B)
...: %timeit forloop(A,B)
100 loops, best of 3: 3.08 ms per loop
100 loops, best of 3: 3.04 ms per loop
100 loops, best of 3: 10.9 ms per loop
You just need to perform the operation on the first dimension of your tensors, which is labeled by 0:
c = tensordot(a, b, axes=(0,0))
This will work as you wish. Also you don't need a list of axes, because it's just along one dimension you're performing the operation. With axes=([1,2],[2,1]) you're contracting the 2nd and 3rd dimensions of a against the 3rd and 2nd dimensions of b. If you write it in index notation (Einstein summation convention) this corresponds to c[i,j] = a[i,k,l]*b[j,l,k], thus you're contracting the indices you want to keep.
EDIT: OK, the problem is that the tensor product of two 3d objects is a 6d object. Since contractions involve pairs of indices, there's no way you'll get a 3d object from a tensordot operation alone. The trick is to split your calculation in two: first you do the tensordot on the index that performs the matrix product, and then you take a tensor diagonal in order to reduce your 4d object to 3d. In one command:
d = np.diagonal(np.tensordot(a,b,axes=(2,1)), axis1=0, axis2=2)
In tensor notation d[i,j,k] = c[i,j,i,k] = a[i,j,l]*b[i,l,k]. Note that np.diagonal places the diagonal axis last, so the returned array has shape (2, 2, N); move that axis to the front (e.g. with np.moveaxis(d, -1, 0)) to get the (N, 2, 2) layout.
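The two-step calculation can be sketched as follows, assuming the contraction runs over the last axis of a and the middle axis of b. The final np.moveaxis call is an addition of this sketch, needed because np.diagonal appends the diagonal axis at the end. Note that the intermediate array has N*N*4 elements, so this is for illustration rather than speed.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
N = 5
a = rng.random((N, 2, 2))
b = rng.random((N, 2, 2))

# c[i, j, m, k] = sum_l a[i, j, l] * b[m, l, k]  -> shape (N, 2, N, 2)
c = np.tensordot(a, b, axes=([2], [1]))

# Keep only the entries with i == m; the diagonal axis lands last: (2, 2, N)
d = np.diagonal(c, axis1=0, axis2=2)

# Move the diagonal axis to the front to get the (N, 2, 2) layout.
d = np.moveaxis(d, -1, 0)

print(np.allclose(d, np.matmul(a, b)))
```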
I have a matrix M1 of shape (N*2) and another matrix M2 of shape (2*N). I want to obtain a result of shape (N), where each element i is the product of the ith row of M1 and the ith column of M2.
I tried to use dot in NumPy, but it can only give me the matrix multiplication result, which is (N*N). Of course, I can take the diagonal, which is what I want, but I would like to know: is there a better way to do this?
Approach #1
You can use np.einsum -
np.einsum('ij,ji->i',M1,M2)
Explanation :
The original loopy solution would look something like this -
def original_app(M1,M2):
    N = M1.shape[0]
    out = np.zeros(N)
    for i in range(N):
        out[i] = M1[i].dot(M2[:,i])
    return out
Thus, for each iteration, we have :
out[i] = M1[i].dot(M2[:,i])
Looking at the iterator i, we need to align the first axis of M1 with the second axis of M2. In addition, the dot product in each iteration, by its very definition, aligns the second axis of M1 with the first axis of M2 and sum-reduces along that axis.
When porting over to einsum, give the axes to be aligned between the two inputs the same letter in the subscript string. So, the inputs would be 'ij,ji' for M1 and M2 respectively. The output loses the second letter of M1, which is the same as the first letter of M2 and is consumed by the sum-reduction, leaving just i. Thus, the complete string notation would be 'ij,ji->i' and the final solution np.einsum('ij,ji->i',M1,M2).
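A tiny worked example may help to see what the subscripts select. The matrices below are made up, and the diagonal of the full product is only computed as a reference.

```python
import numpy as np

# Hypothetical inputs: M1 is (3, 2), M2 is (2, 3).
M1 = np.array([[1., 2.],
               [3., 4.],
               [5., 6.]])
M2 = np.array([[1., 0., 2.],
               [0., 1., 3.]])

# Reference: full (3, 3) product, then keep the diagonal.
via_diag = np.diag(M1.dot(M2))

# 'ij,ji->i': i pairs row i of M1 with column i of M2, j is summed away.
via_einsum = np.einsum('ij,ji->i', M1, M2)

print(via_einsum)   # [ 1.  4. 28.]
```

Here element 2 is 5*2 + 6*3 = 28, the dot product of the last row of M1 with the last column of M2.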
Approach #2
The number of cols in M1 or number of rows in M2 is 2. So, alternatively, we can just slice, perform the element-wise multiplication and sum up those, like so -
M1[:,0]*M2[0] + M1[:,1]*M2[1]
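If the shared dimension were larger than 2, the same slice-multiply-sum idea could be written without spelling out each term. This is a hypothetical generalization rather than part of the original answer; K, the seed and the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
N, K = 4, 3                      # K plays the role of the "2" above
M1 = rng.random((N, K))
M2 = rng.random((K, N))

# Element-wise product of M1 with the transpose of M2, summed over the shared axis.
via_sum = (M1 * M2.T).sum(axis=1)

via_einsum = np.einsum('ij,ji->i', M1, M2)
print(np.allclose(via_sum, via_einsum))
```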
Runtime test
In [431]: # Setup inputs
...: N = 1000
...: M1 = np.random.rand(N,2)
...: M2 = np.random.rand(2,N)
...:
In [432]: np.allclose(original_app(M1,M2),np.einsum('ij,ji->i',M1,M2))
Out[432]: True
In [433]: np.allclose(original_app(M1,M2),M1[:,0]*M2[0] + M1[:,1]*M2[1])
Out[433]: True
In [434]: %timeit original_app(M1,M2)
100 loops, best of 3: 2.09 ms per loop
In [435]: %timeit np.einsum('ij,ji->i',M1,M2)
100000 loops, best of 3: 13 µs per loop
In [436]: %timeit M1[:,0]*M2[0] + M1[:,1]*M2[1]
100000 loops, best of 3: 14.2 µs per loop
Massive speedup there!
I have 2 arrays: x and bigx. They span the same range, but bigx has many more points.
e.g.
x = np.linspace(0,10,100)
bigx = np.linspace(0,10,1000)
I want to find the indices in bigx where x and bigx match to 2 significant figures. I need to do this extremely quickly as I need the indices for each step of an integral.
Using numpy.where is very slow:
index_bigx = [np.where(np.around(bigx,2) == i) for i in np.around(x,2)]
Using numpy.in1d is ~30x faster
index_bigx = np.where(np.in1d(np.around(bigx,2), np.around(x,2)))
I also tried using zip and enumerate, as I know that's supposed to be faster, but it returns an empty list:
>>> index_bigx = [i for i,(v,myv) in enumerate(zip(np.around(bigx,2), np.around(x,2))) if myv == v]
>>> print index_bigx
[]
I think I must have muddled things here and I want to optimise it as much as possible. Any suggestions?
Since bigx is always evenly spaced, it's quite straightforward to just directly compute the indices:
start = bigx[0]
step = bigx[1] - bigx[0]
indices = ((x - start)/step).round().astype(int)
Linear time, no searching necessary.
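A minimal sketch of this direct computation, using grids chosen so that every point of x coincides with a point of bigx (101 and 1001 points are assumptions of this example, not the sizes from the question):

```python
import numpy as np

# Hypothetical grids: every 10th point of bigx coincides with a point of x.
x = np.linspace(0, 10, 101)
bigx = np.linspace(0, 10, 1001)

start = bigx[0]
step = bigx[1] - bigx[0]

# Nearest grid index of each x value, computed without any searching.
indices = ((x - start) / step).round().astype(int)

print(indices[:5])                      # [ 0 10 20 30 40]
print(np.allclose(bigx[indices], x))
```

When x does not fall exactly on the grid, the same expression returns the index of the nearest point of bigx.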
Since we are mapping x to bigx, whose elements are equidistant, you can use a binning operation with np.searchsorted and its 'left' option to simulate the index-finding operation. Here's the implementation -
out = np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')
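A short sketch of the same call on small, made-up grids, followed by a check that each returned position is an exact match after rounding. The grid sizes are assumptions chosen so that every rounded x value exists in the rounded bigx.

```python
import numpy as np

# Hypothetical grids: every 10th point of bigx coincides with a point of x.
x = np.linspace(0, 10, 101)
bigx = np.linspace(0, 10, 1001)

rounded_big = np.around(bigx, 2)
rounded_x = np.around(x, 2)

# Leftmost position at which each rounded x value could be inserted.
out = np.searchsorted(rounded_big, rounded_x, side='left')

# searchsorted returns a position even without an exact match,
# so compare the values found at those positions.
exact = rounded_big[out] == rounded_x
print(out[:5], exact.all())
```

In general a returned position can equal len(bigx) when a value lies beyond the last element, so indexing with it may need a bounds check first.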
Runtime tests
In [879]: import numpy as np
...:
...: xlen = 10000
...: bigxlen = 70000
...: bigx = 100*np.linspace(0,1,bigxlen)
...: x = bigx[np.random.permutation(bigxlen)[:xlen]]
...:
In [880]: %timeit np.where(np.in1d(np.around(bigx,2), np.around(x,2)))
...: %timeit np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')
...:
100 loops, best of 3: 4.1 ms per loop
1000 loops, best of 3: 1.81 ms per loop
If you want just the elements, this should work:
np.intersect1d(np.around(bigx,2), np.around(x,2))
If you want the indices, try this:
around_x = set(np.around(x,2))
index_bigx = [i for i,b in enumerate(np.around(bigx,2)) if b in around_x]
Note: these were not tested.
I need to extend this question, which sums values of an array based on indices from a second array. Let A be the result array, B be the index array, and C the array to be summed over. Then A[i] = sum over C such that index(B) == i.
Instead, my setup is
N = 5
M = 2
A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)
I need A[i,j] = sum_{k in 0...N} C[j,k] such that C[k] == i , i.e. a rowsum conditional on the indices of B matching i. Is there an efficient way to do this? For my application N is around 10,000 and M is around 20. This operation is called for every iteration in a minimization problem... my current looping method is terribly slow.
Thanks!
Following @DSM's comment, I'm assuming your C[k] == i is supposed to be B[k] == i. If that's the case, does your loop version look something like this?
Nested Loop Version
import numpy as np

N = 5
M = 2
A = np.zeros((M,N))
B = np.random.randint(M, size=N) # contains indices for A
C = np.random.rand(N,N)

for i in range(M):
    for j in range(N):
        for k in range(N):
            if B[k] == i:
                A[i,j] += C[j,k]
There's more than one way to vectorize this problem. I'm going to show my thought process below, but there are more efficient ways to do it (e.g. @DSM's version that recognizes the matrix multiplication inherent in the problem).
For the sake of explanation, here's a walk-through of one approach.
Vectorizing the Inner Loop
Let's start by re-writing the inner k loop:
for i in range(M):
    for j in range(N):
        A[i,j] = C[j, B == i].sum()
It might be easier to think of this as C[j][B == i].sum(). We're just selecting jth row of C, selecting only the elements in that row where B is equal to i, and summing them.
Vectorizing the Outer-most Loop
Next let's break down the outer i loop. Now we're going to get to the point where readability will start to suffer, unfortunately...
i = np.arange(M)[:,np.newaxis]
mask = (B == i).astype(int)
for j in range(N):
    A[:,j] = (C[j] * mask).sum(axis=-1)
There are a couple different tricks here. In this case, we're iterating over the columns of A. Each column of A is the sum of a subset of the corresponding row of C. The subset of the row of C is determined by where B is equal to the row index i.
To get around iterating through i, we're making a 2D array where B == i by adding a new axis to i. (Have a look at the documentation for numpy broadcasting if you're confused by this.) In other words:
B:
array([1, 1, 1, 1, 0])
i:
array([[0],
       [1]])
B == i:
array([[False, False, False, False,  True],
       [ True,  True,  True,  True, False]], dtype=bool)
What we want is to take two (M) filtered sums of C[j], one for each row in B == i. This will give us a two-element vector corresponding to the jth column in A.
We can't do this by indexing C directly because the result won't maintain its shape, as each row may have a different number of elements. We'll get around this by multiplying the B == i mask by the current row of C, resulting in zeros where B == i is False, and the value in the current row of C where it's True.
To do this, we need to turn the boolean array B == i into integers:
mask = (B == i).astype(int):
array([[0, 0, 0, 0, 1],
       [1, 1, 1, 1, 0]])
So when we multiply it by the current row of C:
C[j]:
array([ 0.19844887, 0.44858679, 0.35370919, 0.84074259, 0.74513377])
C[j] * mask:
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.74513377],
       [ 0.19844887,  0.44858679,  0.35370919,  0.84074259,  0.        ]])
Then we can sum over each row to get the current column of A (This will be broadcast to a column when it's assigned to A[:,j]):
(C[j] * mask).sum(axis=-1):
array([ 0.74513377, 1.84148744])
Fully Vectorized Version
Finally, breaking down the last loop, we can apply the exact same principle to add a third dimension for the loop over j:
i = np.arange(M)[:,np.newaxis,np.newaxis]
mask = (B == i).astype(int)
A = (C * mask).sum(axis=-1)
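It can help to look at the shapes involved in this broadcast. The sketch below uses the small sizes from the question with an arbitrary seed, and checks the result against the nested loops.

```python
import numpy as np

M, N = 2, 5
rng = np.random.default_rng(0)      # arbitrary seed
B = rng.integers(M, size=N)         # contains indices for A
C = rng.random((N, N))

i = np.arange(M)[:, np.newaxis, np.newaxis]   # shape (M, 1, 1)
mask = (B == i).astype(int)                   # shape (M, 1, N)
product = C * mask                            # shape (M, N, N)
A = product.sum(axis=-1)                      # shape (M, N)

# Reference: the nested-loop version.
A_loops = np.zeros((M, N))
for ii in range(M):
    for j in range(N):
        for k in range(N):
            if B[k] == ii:
                A_loops[ii, j] += C[j, k]

print(mask.shape, product.shape, A.shape, np.allclose(A, A_loops))
```

The intermediate product holds M*N*N values, so for N around 10,000 and M around 20 it would need about 2e9 float64 entries (roughly 16 GB); that makes this form better suited to understanding the broadcast than to the full-size problem.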
@DSM's vectorized version
As @DSM suggested, you could also do:
A = (B == np.arange(M)[:,np.newaxis]).dot(C.T)
This is by far the fastest solution for most sizes of M and N, and arguably the most elegant (much more elegant than my solutions, anyway).
Let's break it down a bit.
The B == np.arange(M)[:,np.newaxis] is exactly equivalent to B == i in the "Vectorizing the Outer-most Loop" section above.
The key is in recognizing that all of the j and k loops are equivalent to matrix multiplication. dot will cast the boolean B == i array to the same dtype as C behind-the-scenes, so we don't need to worry about explicitly casting it to a different type.
After that, we're just performing matrix multiplication on the transpose of C (a 5x5 array) and the "mask" 0 and 1 array above, yielding a 2x5 array.
dot will take advantage of any optimized BLAS libraries you have installed (e.g. ATLAS, MKL), so it's very fast.
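As a rough check of the matrix-multiplication view, this sketch compares the dot-based version against the nested loops on small random inputs. The sizes and the seed are arbitrary.

```python
import numpy as np

M, N = 3, 6
rng = np.random.default_rng(1)      # arbitrary seed
B = rng.integers(M, size=N)         # contains indices for A
C = rng.random((N, N))

# Boolean (M, N) mask: mask[i, k] is True where B[k] == i.
mask = (B == np.arange(M)[:, np.newaxis])

# A[i, j] = sum_k mask[i, k] * C[j, k], i.e. mask times the transpose of C.
A_dot = mask.dot(C.T)

# Reference: the nested-loop version.
A_loops = np.zeros((M, N))
for i in range(M):
    for j in range(N):
        for k in range(N):
            if B[k] == i:
                A_loops[i, j] += C[j, k]

print(A_dot.shape, np.allclose(A_dot, A_loops))
```

Unlike the broadcast version, this never builds an (M, N, N) intermediate; the largest arrays are the (M, N) mask and C itself.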
Timings
For small M's and N's, the differences are less apparent (~6x between looping and DSM's version):
M, N = 2, 5
%timeit loops(B,C,M)
10000 loops, best of 3: 83 us per loop
%timeit k_vectorized(B,C,M)
10000 loops, best of 3: 106 us per loop
%timeit vectorized(B,C,M)
10000 loops, best of 3: 23.7 us per loop
%timeit askewchan(B,C,M)
10000 loops, best of 3: 42.7 us per loop
%timeit einsum(B,C,M)
100000 loops, best of 3: 15.2 us per loop
%timeit dsm(B,C,M)
100000 loops, best of 3: 13.9 us per loop
However, once M and N start to grow, the difference becomes very significant (~600x) (note the units!):
M, N = 50, 20
%timeit loops(B,C,M)
10 loops, best of 3: 50.3 ms per loop
%timeit k_vectorized(B,C,M)
100 loops, best of 3: 10.5 ms per loop
%timeit ik_vectorized(B,C,M)
1000 loops, best of 3: 963 us per loop
%timeit vectorized(B,C,M)
1000 loops, best of 3: 247 us per loop
%timeit askewchan(B,C,M)
1000 loops, best of 3: 493 us per loop
%timeit einsum(B,C,M)
10000 loops, best of 3: 134 us per loop
%timeit dsm(B,C,M)
10000 loops, best of 3: 80.2 us per loop
I am assuming @DSM found your typo, and you want:
A[i,j] = sum_{k in 0...N} C[j,k] where B[k] == i
Then you can loop over i in range(M) since M is relatively small.
A = np.array([C[:,B == i].sum(axis=1) for i in range(M)])