Efficiently compute pairwise equal for NumPy arrays - python

Given two NumPy arrays, say:
import numpy as np
import numpy.random as rand
n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))
Is there a more efficient way to compute X in the following:
X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])

Approach #1 : Input arrays with 0s & 1s
For input arrays containing only 0s and 1s, we can treat each row as a binary number and reduce it to a single scalar. That turns both inputs into 1D arrays, and broadcasting then compares every pair of rows, like so -
n = x.shape[1]
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)
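As a quick sanity check, the sketch below compares the binary-encoding idea against the double loop on a small random input. The variable names (`x_small`, `weights`, `x_codes` and so on) are illustrative only, and the sketch assumes the rows have at most 62 columns so that every encoded value fits in a 64-bit integer.

```python
import numpy as np

rng = np.random.default_rng(0)
x_small = rng.integers(0, 2, size=(6, 4))
y_small = rng.integers(0, 2, size=(5, 4))

# Encode each 0/1 row as one integer; distinct rows get distinct codes.
weights = 2 ** np.arange(x_small.shape[1], dtype=np.int64)
x_codes = x_small.dot(weights)
y_codes = y_small.dot(weights)
fast = (x_codes[:, None] == y_codes).astype(float)

# Reference result from the straightforward double loop.
slow = np.zeros((len(x_small), len(y_small)))
for i in range(len(x_small)):
    for j in range(len(y_small)):
        slow[i, j] = np.all(x_small[i] == y_small[j])

print(np.array_equal(fast, slow))
```

For wider rows the powers of two would overflow, in which case a generic row-comparison approach is the safer choice.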
Approach #2 : Generic case
For a generic case, we can use views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)
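If the void-dtype view feels opaque, a plain broadcasting comparison is a simpler alternative for arbitrary values. Note that this is a different technique from the view-based one above and is shown only as a sketch: it builds an intermediate boolean array of shape (rows of a, rows of b, columns), so it trades memory for simplicity. The function name `pairwise_rows_equal` is made up for this example.

```python
import numpy as np

def pairwise_rows_equal(a, b):
    """Return out[i, j] = 1.0 if row i of a equals row j of b, else 0.0."""
    a = np.asarray(a)
    b = np.asarray(b)
    # (n, 1, k) == (1, m, k) broadcasts to (n, m, k); reduce over the columns.
    return (a[:, None, :] == b[None, :, :]).all(axis=-1).astype(float)

a = np.array([[1, 7, 3], [4, 5, 6]])
b = np.array([[4, 5, 6], [1, 7, 3], [1, 7, 0]])
result = pairwise_rows_equal(a, b)
print(result)
```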
Runtime test
# Setup
In [287]: np.random.seed(0)
...: n = 1000
...: x = rand.binomial(n=1, p=.5, size=(n, 10))
...: y = rand.binomial(n=1, p=.5, size=(n, 10))
# Original approach
In [288]: %%timeit
...: X = np.zeros((n, n))
...: for i in range(n):
...:     for j in range(n):
...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop
# Approach #1
In [290]: %%timeit
...: n = x.shape[1]
...: s = 2**np.arange(n)
...: x1D = x.dot(s)
...: y1D = y.dot(s)
...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop
# Approach #2
In [291]: %%timeit
...: x1D, y1D = view1D(x, y)
...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop

Related

Custom 2D matrix operation using numpy

I have two sparse binary matrices A and B that have matching dimensions, e.g. A has shape I x J and B has shape J x K. I have a custom operation that results in a matrix C of shape I x J x K, where each element (i,j,k) is 1 only if A(i,j) = 1 and B(j,k) = 1. I have currently implemented this operation as follows:
import numpy as np
I = 2
J = 3
K = 4
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method
C = np.zeros((I,J,K))
for i in range(I):
    for j in range(J):
        for k in range(K):
            if A[i,j] == 1 and B[j,k] == 1:
                C[i,j,k] = 1
print(C)
However, the for loop is quite slow for large I,J,K. Is it possible to achieve this operation using numpy methods only to speed it up? I have looked at np.multiply.outer, but no success so far.
Here you go:
C = np.einsum('ij,jk->ijk', A,B)
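Because no index is summed away in 'ij,jk->ijk', this einsum is just an elementwise product with the shared axis j kept. A small sketch, using illustrative variable names, checks it against plain broadcasting and against the triple loop from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(2, 3))
B = rng.integers(0, 2, size=(3, 4))

# einsum with no summed index: C[i, j, k] = A[i, j] * B[j, k]
C_einsum = np.einsum('ij,jk->ijk', A, B)

# The same product written with broadcasting: (I, J, 1) * (1, J, K)
C_broadcast = A[:, :, None] * B[None, :, :]

# Reference triple loop from the question
C_loop = np.zeros((2, 3, 4))
for i in range(2):
    for j in range(3):
        for k in range(4):
            if A[i, j] == 1 and B[j, k] == 1:
                C_loop[i, j, k] = 1

print(np.array_equal(C_einsum, C_broadcast), np.array_equal(C_einsum, C_loop))
```

The product equals the logical AND only because the inputs are restricted to 0 and 1.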
Try to do what you're already doing with numba.
Here's an example using your code, Sehan2's method and numba:
import numpy as np
from numba import jit, prange
I = 2
J = 3
K = 4
np.random.seed(0)
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method
def Custom_method(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I,J,K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                if A[i,j] == 1 and B[j,k] == 1:
                    C[i,j,k] = 1
    return C
def Custom_method_ein(A, B):
    C = np.einsum('ij,jk->ijk', A,B)
    return C
@jit(nopython=True)
def Custom_method_numba(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I,J,K))
    for i in prange(I):
        for j in prange(J):
            for k in prange(K):
                if A[i,j] == 1 and B[j,k] == 1:
                    C[i,j,k] = 1
    return C
print('original')
%timeit Custom_method(A, B)
print('einsum')
%timeit Custom_method_ein(A, B)
print('numba')
%timeit Custom_method_numba(A, B)
Output:
original
10000 loops, best of 5: 18.8 µs per loop
einsum
The slowest run took 20.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.32 µs per loop
numba
The slowest run took 15.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 815 ns per loop
Note that you can make your code run much faster and more efficiently if you use sparse matrix representations. That way you avoid performing unnecessary operations.
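To illustrate the sparse idea without extra dependencies, the sketch below lists only the (i, j, k) coordinates where C would be 1 and never builds the dense array. The helper name `nonzero_coords` is hypothetical, and a real implementation would more likely lean on a dedicated sparse library.

```python
import numpy as np

def nonzero_coords(A, B):
    """Return an (n, 3) array of (i, j, k) with A[i, j] != 0 and B[j, k] != 0."""
    chunks = []
    for j in range(A.shape[1]):
        rows = np.flatnonzero(A[:, j])   # all i with A[i, j] != 0
        cols = np.flatnonzero(B[j])      # all k with B[j, k] != 0
        if rows.size and cols.size:
            ii = np.repeat(rows, cols.size)
            kk = np.tile(cols, rows.size)
            jj = np.full(ii.size, j)
            chunks.append(np.stack([ii, jj, kk], axis=1))
    if not chunks:
        return np.empty((0, 3), dtype=int)
    return np.concatenate(chunks)

A = np.array([[1, 0, 1], [0, 0, 1]])
B = np.array([[0, 1], [1, 1], [1, 0]])
coords = nonzero_coords(A, B)

# Cross-check against the dense einsum result on this tiny input.
dense = np.einsum('ij,jk->ijk', A, B)
print(coords)
```

The work done is proportional to the number of ones in the output rather than to I * J * K, which is where the saving comes from when the inputs are sparse.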

Convert Python For-Loop to NumPy Operations

I have a NumPy array full of indices:
size = 100000
idx = np.random.randint(0, size, size=size)
And I have a simple function that loops over the indices and does:
def count_loop(idx, size):
    out = np.zeros(size, dtype=int)
    for i in range(size):
        j = idx[i]
        out[min(i, j)] = out[min(i, j)] + 1
        out[max(i, j)] = out[max(i, j)] - 1
    return np.cumsum(out)
This is quite slow when size is large and I am hoping to find a faster way to accomplish this. I've tried this but it isn't quite right:
def count_vectorized_attempt(idx, size):
    out = np.zeros(size, dtype=int)
    i = np.arange(size)
    j = idx[i]
    mini = np.minimum(i, j)
    maxi = np.maximum(i, j)
    out[mini] = out[mini] + 1
    out[maxi] = out[maxi] - 1
    return np.cumsum(out)
We can make use of np.bincount -
R = np.arange(size)
out = np.bincount(np.minimum(R,idx),minlength=size)
out -= np.bincount(np.maximum(R,idx),minlength=size)
final_out = out.cumsum()
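The fancy-indexed attempt in the question goes wrong because an assignment like out[mini] = out[mini] + 1 applies only one update per distinct index, even when that index appears several times. np.bincount counts every occurrence, and np.add.at is another unbuffered way to accumulate repeats. A small sketch, with illustrative names `lo` and `hi`, compares both against an explicit loop:

```python
import numpy as np

rng = np.random.default_rng(0)
size = 1000
idx = rng.integers(0, size, size=size)
R = np.arange(size)
lo = np.minimum(R, idx)
hi = np.maximum(R, idx)

# Reference loop: one increment and one decrement per position.
expected = np.zeros(size, dtype=int)
for i in range(size):
    expected[lo[i]] += 1
    expected[hi[i]] -= 1

# bincount counts repeated indices correctly.
via_bincount = np.bincount(lo, minlength=size) - np.bincount(hi, minlength=size)

# np.add.at / np.subtract.at also accumulate over repeated indices.
via_at = np.zeros(size, dtype=int)
np.add.at(via_at, lo, 1)
np.subtract.at(via_at, hi, 1)

print(np.array_equal(expected, via_bincount), np.array_equal(expected, via_at))
```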
Timings -
All posted solutions use cumsum at the end. So, let's time these skipping that last step -
In [25]: np.random.seed(0)
...: size = 100000
...: idx = np.random.randint(0, size, size=size)
# From this post
In [27]: %%timeit
...: R = np.arange(size)
...: out = np.bincount(np.minimum(R,idx),minlength=size)
...: out -= np.bincount(np.maximum(R,idx),minlength=size)
1000 loops, best of 3: 643 µs per loop
# @slaw's solution
In [28]: %%timeit
...: i = np.arange(size)
...: j = idx[i]
...: mini = np.minimum(i, j)
...: maxi = np.maximum(i, j)
...:
...: unique_mini, mini_counts = np.unique(mini, return_counts=True)
...: unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
...:
...: out = np.zeros(size, dtype=int)
...: out[unique_mini] = out[unique_mini] + mini_counts
...: out[unique_maxi] = out[unique_maxi] - maxi_counts
100 loops, best of 3: 13.3 ms per loop
# Loopy one from question
In [29]: %%timeit
...: out = np.zeros(size, dtype=int)
...:
...: for i in range(size):
...:     j = idx[i]
...:     out[min(i, j)] = out[min(i, j)] + 1
...:     out[max(i, j)] = out[max(i, j)] - 1
10 loops, best of 3: 141 ms per loop
This seems to give the same answer as the for-loop:
def count_unique(idx, size):
    i = np.arange(size)
    j = idx[i]
    mini = np.minimum(i, j)
    maxi = np.maximum(i, j)
    unique_mini, mini_counts = np.unique(mini, return_counts=True)
    unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
    out = np.zeros(size, dtype=int)
    out[unique_mini] = out[unique_mini] + mini_counts
    out[unique_maxi] = out[unique_maxi] - maxi_counts
    return np.cumsum(out)

Loop through varied number of matrices using numpy

Here is the functionality demonstrated on a fixed number of matrices:
x = np.matrix('0.5')
y = np.matrix('0.5 0.5; 0.5 0.5')
z = np.matrix('0.75 0.25; 0.34 0.66')
output = []
for i in x.flat:
    for j in y.flat:
        for k in z.flat:
            output.append(i * j * k)
I need help solving this issue on a variable number of matrices. I have tried using
reduce(np.dot, arr)
But this is not what I want to do.
With A holding the list of input matrices, we can apply np.outer iteratively. np.outer flattens its inputs on its own, so we don't need to do that ourselves; only a final flattening step is needed.
Thus, the solution would be -
A = [x,y,z,w]
out = A[0]
for i in A[1:]:
    out = np.outer(out, i)
out = out.ravel()
Note that the output would be an array. If needed as a matrix, simply wrap it with np.matrix() at the end.
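The same fold can be written with functools.reduce, which mirrors the reduce(np.dot, arr) attempt from the question but swaps in np.outer. The sketch below uses plain arrays instead of np.matrix, and the names `mats` and `expected` are illustrative; itertools.product stands in for the nested loops so that the check works for any number of matrices.

```python
from functools import reduce
from itertools import product

import numpy as np

mats = [
    np.array([[0.5]]),
    np.array([[0.5, 0.5], [0.5, 0.5]]),
    np.array([[0.75, 0.25], [0.34, 0.66]]),
]

# Fold np.outer over the list; np.outer flattens its inputs at every step.
out = reduce(np.outer, mats).ravel()

# Reference: every combination of one element per matrix, in loop order.
expected = [np.prod(combo) for combo in product(*(m.ravel() for m in mats))]

print(np.allclose(out, expected))
```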
Sample run for 4 matrices -
In [38]: x = np.matrix('0.5')
...: y = np.matrix('0.15 0.25; 0.35 0.45')
...: z = np.matrix('0.75 0.25; 0.34 0.66')
...: w = np.matrix('0.45 0.15; 0.8 0.2')
...:
...: output = []
...: for i in x.flat:
...:     for j in y.flat:
...:         for k in z.flat:
...:             for l in w.flat:
...:                 output.append(i * j * k * l)
...:
In [64]: A = [x,y,z,w]
...: out = A[0]
...: for i in A[1:]:
...:     out = np.outer(out, i)
...: out = out.ravel()
...:
In [65]: np.allclose(output, out)
Out[65]: True

Numpy element-wise dot product

Is there an elegant, NumPy way to apply the dot product elementwise? Or how can the code below be translated into a nicer version?
m0 # shape (5, 3, 2, 2)
m1 # shape (5, 2, 2)
r = np.empty((5, 3, 2, 2))
for i in range(5):
    for j in range(3):
        r[i, j] = np.dot(m0[i, j], m1[i])
Thanks in advance!
Approach #1
Use np.einsum -
np.einsum('ijkl,ilm->ijkm',m0,m1)
Steps involved :
Keep the first axes from the inputs aligned.
Lose the last axis from m0 against second one from m1 in sum-reduction.
Let remaining axes from m0 and m1 spread-out/expand with elementwise multiplications in an outer-product fashion.
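The steps above describe a batched matrix product, so the same result can also be sketched with the @ operator (np.matmul), which broadcasts over the leading axes. This is an alternative spelling rather than part of the original answer; the check below compares it with the einsum call and with the loop from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
m0 = rng.random((5, 3, 2, 2))
m1 = rng.random((5, 2, 2))

r_einsum = np.einsum('ijkl,ilm->ijkm', m0, m1)

# m1[:, None] has shape (5, 1, 2, 2); matmul broadcasts it across axis 1 of m0.
r_matmul = m0 @ m1[:, None]

# Reference double loop from the question
r_loop = np.empty((5, 3, 2, 2))
for i in range(5):
    for j in range(3):
        r_loop[i, j] = np.dot(m0[i, j], m1[i])

print(np.allclose(r_einsum, r_loop), np.allclose(r_matmul, r_loop))
```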
Approach #2
If performance matters and the sum-reduction axis is short, you are better off with a single loop over the first axis, delegating each slice to matrix multiplication through np.tensordot, like so -
s0,s1,s2,s3 = m0.shape
s4 = m1.shape[-1]
r = np.empty((s0,s1,s2,s4))
for i in range(s0):
    r[i] = np.tensordot(m0[i],m1[i],axes=([2],[0]))
Approach #3
Now, np.dot is most efficient on 2D inputs, so we can get a further boost by temporarily merging the second and third axes of m0 so that each slice is 2D. The modified version is a bit longer, but hopefully the most performant one -
s0,s1,s2,s3 = m0.shape
s4 = m1.shape[-1]
m0.shape = s0,s1*s2,s3 # Get m0 as 3D for temporary usage
r = np.empty((s0,s1*s2,s4))
for i in range(s0):
    r[i] = m0[i].dot(m1[i])
r.shape = s0,s1,s2,s4
m0.shape = s0,s1,s2,s3 # Put m0 back to 4D
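Assigning to m0.shape mutates the caller's array, and the shape would stay altered if an exception interrupted the loop. A hedged alternative sketch uses reshape instead, which returns a view for contiguous inputs and leaves m0 untouched. The function name `dot_app_reshape` is made up for this example, and no timing claim is implied.

```python
import numpy as np

def dot_app_reshape(m0, m1):
    s0, s1, s2, s3 = m0.shape
    s4 = m1.shape[-1]
    m0_3d = m0.reshape(s0, s1 * s2, s3)   # m0 itself keeps its 4D shape
    r = np.empty((s0, s1 * s2, s4))
    for i in range(s0):
        r[i] = m0_3d[i].dot(m1[i])        # one 2D matrix product per slice
    return r.reshape(s0, s1, s2, s4)

rng = np.random.default_rng(0)
m0 = rng.random((5, 3, 2, 2))
m1 = rng.random((5, 2, 2))
r = dot_app_reshape(m0, m1)
reference = np.einsum('ijkl,ilm->ijkm', m0, m1)
print(np.allclose(r, reference), m0.shape)
```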
Runtime test
Function definitions -
def original_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    r = np.empty((s0,s1,s2,s4))
    for i in range(s0):
        for j in range(s1):
            r[i, j] = np.dot(m0[i, j], m1[i])
    return r
def einsum_app(m0, m1):
    return np.einsum('ijkl,ilm->ijkm',m0,m1)
def tensordot_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    r = np.empty((s0,s1,s2,s4))
    for i in range(s0):
        r[i] = np.tensordot(m0[i],m1[i],axes=([2],[0]))
    return r
def dot_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    m0.shape = s0,s1*s2,s3 # Get m0 as 3D for temporary usage
    r = np.empty((s0,s1*s2,s4))
    for i in range(s0):
        r[i] = m0[i].dot(m1[i])
    r.shape = s0,s1,s2,s4
    m0.shape = s0,s1,s2,s3 # Put m0 back to 4D
    return r
Timings and verification -
In [291]: # Inputs
...: m0 = np.random.rand(50,30,20,20)
...: m1 = np.random.rand(50,20,20)
...:
In [292]: out1 = original_app(m0, m1)
...: out2 = einsum_app(m0, m1)
...: out3 = tensordot_app(m0, m1)
...: out4 = dot_app(m0, m1)
...:
...: print(np.allclose(out1, out2))
...: print(np.allclose(out1, out3))
...: print(np.allclose(out1, out4))
...:
True
True
True
In [293]: %timeit original_app(m0, m1)
...: %timeit einsum_app(m0, m1)
...: %timeit tensordot_app(m0, m1)
...: %timeit dot_app(m0, m1)
...:
100 loops, best of 3: 10.3 ms per loop
10 loops, best of 3: 31.3 ms per loop
100 loops, best of 3: 5.12 ms per loop
100 loops, best of 3: 4.06 ms per loop
I think numpy.inner() is what you really want?

Vectorizing NumPy covariance for 3D array

I have a 3D numpy array of shape (t, n1, n2):
x = np.random.rand(10, 2, 4)
I need to calculate another 3D array y which is of shape (t, n1, n1) such that:
y[0] = np.cov(x[0,:,:])
...and so on for all slices along the first axis.
So, a loopy implementation would be:
y = np.zeros((10,2,2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])
Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
But it didn't work.
Digging into the numpy.cov source code with the default parameters, it turns out that np.cov(x[i,:,:]) is simply:
N = x.shape[2]
m = x[i,:,:]
m = m - np.sum(m, axis=1, keepdims=True) / N  # not in place: m is a view of x
cov = np.dot(m, m.T) /(N - 1)
So, the task is to vectorize the loop over i and process all of x in one go. The mean subtraction in the third step extends to the full 3D array through broadcasting. The final step is a sum-reduction over the last axis, performed separately for each slice along the first axis, which np.einsum handles efficiently in vectorized form. Thus, the final implementation is -
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
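A short check, with illustrative variable names, confirms that the vectorized formula matches np.cov slice by slice. It also shows that the swapaxes idea from the question does work once the data is centered, the batched @ operator is used in place of np.dot, and the result is divided by N - 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 2, 4))

N = x.shape[2]
centered = x - x.mean(axis=2, keepdims=True)

# Vectorized covariance via einsum, one (n1, n1) matrix per slice.
y_einsum = np.einsum('ijk,ilk->ijl', centered, centered) / (N - 1)

# Equivalent batched matrix product with the last two axes swapped.
y_matmul = centered @ centered.swapaxes(1, 2) / (N - 1)

# Reference: np.cov applied to each slice in a loop.
y_loop = np.stack([np.cov(x[i]) for i in range(x.shape[0])])

print(np.allclose(y_einsum, y_loop), np.allclose(y_matmul, y_loop))
```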
Runtime test
In [155]: def original_app(x):
...:     n = x.shape[0]
...:     y = np.zeros((n,2,2))
...:     for i in np.arange(x.shape[0]):
...:         y[i]=np.cov(x[i,:,:])
...:     return y
...:
...: def proposed_app(x):
...:     N = x.shape[2]
...:     m1 = x - x.sum(2,keepdims=1)/N
...:     out = np.einsum('ijk,ilk->ijl',m1,m1) / (N - 1)
...:     return out
...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Huge speedup there!
