Convert Python For-Loop to NumPy Operations

I have a NumPy array full of indices:
size = 100000
idx = np.random.randint(0, size, size=size)
And I have a simple function that loops over the indices and does:
out = np.zeros(size, dtype=np.int64)
for i in range(size):
    j = idx[i]
    out[min(i, j)] = out[min(i, j)] + 1
    out[max(i, j)] = out[max(i, j)] - 1
return np.cumsum(out)
This is quite slow when size is large and I am hoping to find a faster way to accomplish this. I've tried this but it isn't quite right:
out = np.zeros(size, dtype=np.int64)
i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)
out[mini] = out[mini] + 1
out[maxi] = out[maxi] - 1
return np.cumsum(out)

We can make use of np.bincount -
R = np.arange(size)
out = np.bincount(np.minimum(R,idx),minlength=size)
out -= np.bincount(np.maximum(R,idx),minlength=size)
final_out = out.cumsum()
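For reference, the reason the vectorized attempt in the question isn't quite right: with repeated indices, out[mini] = out[mini] + 1 writes each position at most once, no matter how many times that index occurs in mini. np.add.at accumulates repeated indices instead; a minimal sketch reusing the same R and idx (it fixes the original attempt directly, though it is typically slower than bincount):
out = np.zeros(size, dtype=np.int64)
np.add.at(out, np.minimum(R, idx), 1)
np.add.at(out, np.maximum(R, idx), -1)
final_out = out.cumsum()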
Timings -
All posted solutions use cumsum at the end. So, let's time these skipping that last step -
In [25]: np.random.seed(0)
...: size = 100000
...: idx = np.random.randint(0, size, size=size)
# From this post
In [27]: %%timeit
...: R = np.arange(size)
...: out = np.bincount(np.minimum(R,idx),minlength=size)
...: out -= np.bincount(np.maximum(R,idx),minlength=size)
1000 loops, best of 3: 643 µs per loop
# @slaw's solution
In [28]: %%timeit
...: i = np.arange(size)
...: j = idx[i]
...: mini = np.minimum(i, j)
...: maxi = np.maximum(i, j)
...:
...: unique_mini, mini_counts = np.unique(mini, return_counts=True)
...: unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
...:
...: out = np.zeros(size, dtype=np.int64)
...: out[unique_mini] = out[unique_mini] + mini_counts
...: out[unique_maxi] = out[unique_maxi] - maxi_counts
100 loops, best of 3: 13.3 ms per loop
# Loopy one from question
In [29]: %%timeit
...: out = np.zeros(size, dtype=np.int64)
...:
...: for i in range(size):
...:     j = idx[i]
...:     out[min(i, j)] = out[min(i, j)] + 1
...:     out[max(i, j)] = out[max(i, j)] - 1
10 loops, best of 3: 141 ms per loop

This seems to give the same answer as the for-loop. It works because np.unique collapses the duplicate indices and return_counts reports how many times each one occurred, so every index is assigned exactly once:
i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)
unique_mini, mini_counts = np.unique(mini, return_counts=True)
unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
out = np.zeros(size, dtype=np.int64)
out[unique_mini] = out[unique_mini] + mini_counts
out[unique_maxi] = out[unique_maxi] - maxi_counts
return np.cumsum(out)

Related

Custom 2D matrix operation using numpy

I have two sparse binary matrices A and B with matching inner dimensions, e.g. A has shape I x J and B has shape J x K. I have a custom operation that results in a 3-D array C of shape I x J x K, where each element (i,j,k) is 1 only if A(i,j) = 1 and B(j,k) = 1. I have currently implemented this operation as follows:
import numpy as np
I = 2
J = 3
K = 4
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method
C = np.zeros((I, J, K))
for i in range(I):
    for j in range(J):
        for k in range(K):
            if A[i,j] == 1 and B[j,k] == 1:
                C[i,j,k] = 1
print(C)
However, the for loop is quite slow for large I,J,K. Is it possible to achieve this operation using numpy methods only to speed it up? I have looked at np.multiply.outer, but no success so far.
Here you go:
C = np.einsum('ij,jk->ijk', A,B)
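Since no output index is summed away here, the einsum is just an elementwise product over the shared j axis, i.e. C[i,j,k] = A[i,j] * B[j,k], which is 1 exactly when both entries are 1. An equivalent broadcasting sketch, if you prefer to avoid einsum:
C = A[:, :, None] * B[None, :, :]  # C[i,j,k] = A[i,j] * B[j,k]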
Try to do what you're already doing with numba.
Here's an example using your code, Sehan2's method and numba:
import numpy as np
from numba import jit, prange
I = 2
J = 3
K = 4
np.random.seed(0)
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method
def Custom_method(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I, J, K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                if A[i,j] == 1 and B[j,k] == 1:
                    C[i,j,k] = 1
    return C

def Custom_method_ein(A, B):
    C = np.einsum('ij,jk->ijk', A, B)
    return C

@jit(nopython=True)
def Custom_method_numba(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I, J, K))
    for i in prange(I):
        for j in prange(J):
            for k in prange(K):
                if A[i,j] == 1 and B[j,k] == 1:
                    C[i,j,k] = 1
    return C
print('original')
%timeit Custom_method(A, B)
print('einsum')
%timeit Custom_method_ein(A, B)
print('numba')
%timeit Custom_method_numba(A, B)
Output:
original
10000 loops, best of 5: 18.8 µs per loop
einsum
The slowest run took 20.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.32 µs per loop
numba
The slowest run took 15.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 815 ns per loop
Note that you can make your code run much faster and more efficiently if you use sparse matrix representations. That way you avoid performing unnecessary operations.
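scipy.sparse only handles 2-D matrices, so one possible reading of that suggestion is to keep A and B in sparse form and enumerate only the nonzero (i, j, k) coordinates of C rather than materializing the dense tensor; a minimal sketch of that idea (my interpretation, not code from the original answer):
from scipy import sparse

A_s = sparse.csr_matrix(A)
B_s = sparse.csr_matrix(B)
# every (i, j) with A[i,j] == 1 pairs with every k where B[j,k] == 1
triples = [(i, j, k)
           for i, j in zip(*A_s.nonzero())
           for k in B_s.getrow(j).indices]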

Is there a way to vectorise this dynamic time warping algorithm?

The dynamic time warping algorithm provides a notion of distance between two temporal sequences which may vary in speed. If I have N sequences to compare to each other, I can construct an N x N symmetric matrix with a null diagonal by applying the algorithm pairwise. However, this is very slow for long two-dimensional sequences. Therefore, I am trying to vectorise the code to speed up this matrix computation. Importantly, I also want to extract the indices defining the optimal alignment.
My code for pairwise comparison so far :
import math
import numpy as np

seq1 = np.random.randint(100, size=(100, 2))  # two-dim sequences
seq2 = np.random.randint(100, size=(100, 2))

def seqdist(seq1, seq2):  # dynamic time warping function
    ns = len(seq1)
    nt = len(seq2)
    D = np.zeros((ns+1, nt+1)) + math.inf
    D[0, 0] = 0
    cost = np.zeros((ns, nt))
    for i in range(ns):
        for j in range(nt):
            cost[i, j] = np.linalg.norm(seq1[i, :] - seq2[j, :])
            D[i+1, j+1] = cost[i, j] + min([D[i, j+1], D[i+1, j], D[i, j]])
    d = D[ns, nt]  # distance

    matchidx = [[ns-1, nt-1]]  # backwards optimal alignment computation
    i = ns
    j = nt
    for k in range(ns+nt+2):
        idx = np.argmin([D[i-1, j], D[i, j-1], D[i-1, j-1]])
        if idx == 0 and i > 1 and j > 0:
            matchidx.append([i-2, j-1])
            i -= 1
        elif idx == 1 and i > 0 and j > 1:
            matchidx.append([i-1, j-2])
            j -= 1
        elif idx == 2 and i > 1 and j > 1:
            matchidx.append([i-2, j-2])
            i -= 1
            j -= 1
        else:
            break

    matchidx.reverse()
    return d, matchidx

[d, matchidx] = seqdist(seq1, seq2)  # try it
Here is one re-write of your code that makes it more amenable to numba.jit. This is not exactly a vectorized solution, but I see a speedup by a factor of 230 for this benchmark.
from numba import jit
from scipy import spatial

@jit
def D_from_cost(cost, D):
    # operates on D in place
    ns, nt = cost.shape
    for i in range(ns):
        for j in range(nt):
            D[i+1, j+1] = cost[i,j] + min(D[i, j+1], D[i+1, j], D[i, j])
            # avoiding the list creation inside min enables better jit performance
            # D[i+1, j+1] = cost[i,j] + min([D[i, j+1], D[i+1, j], D[i, j]])

@jit
def get_d(D, matchidx):
    ns = D.shape[0] - 1
    nt = D.shape[1] - 1
    d = D[ns, nt]

    matchidx[0,0] = ns - 1
    matchidx[0,1] = nt - 1
    i = ns
    j = nt
    for k in range(1, ns+nt+3):
        idx = 0
        if not (D[i-1,j] <= D[i,j-1] and D[i-1,j] <= D[i-1,j-1]):
            if D[i,j-1] <= D[i-1,j-1]:
                idx = 1
            else:
                idx = 2

        if idx == 0 and i > 1 and j > 0:
            # matchidx.append([i-2, j-1])
            matchidx[k,0] = i - 2
            matchidx[k,1] = j - 1
            i -= 1
        elif idx == 1 and i > 0 and j > 1:
            # matchidx.append([i-1, j-2])
            matchidx[k,0] = i - 1
            matchidx[k,1] = j - 2
            j -= 1
        elif idx == 2 and i > 1 and j > 1:
            # matchidx.append([i-2, j-2])
            matchidx[k,0] = i - 2
            matchidx[k,1] = j - 2
            i -= 1
            j -= 1
        else:
            break

    return d, matchidx[:k]

def seqdist2(seq1, seq2):
    ns = len(seq1)
    nt = len(seq2)
    cost = spatial.distance_matrix(seq1, seq2)

    # initialize and update D
    D = np.full((ns+1, nt+1), np.inf)
    D[0, 0] = 0
    D_from_cost(cost, D)

    matchidx = np.zeros((ns+nt+2, 2), dtype=np.int64)
    d, matchidx = get_d(D, matchidx)
    return d, matchidx[::-1].tolist()

assert seqdist2(seq1, seq2) == seqdist(seq1, seq2)

%timeit seqdist2(seq1, seq2)  # 1000 loops, best of 3: 365 µs per loop
%timeit seqdist(seq1, seq2)   # 10 loops, best of 3: 86.1 ms per loop
Here are some changes:
cost is calculated using spatial.distance_matrix.
The definition of idx is replaced with a bunch of ugly if statements that make the compiled code faster.
min([D[i, j+1], D[i+1, j], D[i, j]]) is replaced with min(D[i, j+1], D[i+1, j], D[i, j]), i.e. instead of taking min of a list, we take min of three values. This leads to a surprising speedup under jit.
matchidx is preallocated as a numpy array and truncated to the right size just before output.

Efficiently compute pairwise equal for NumPy arrays

Given two NumPy arrays, say:
import numpy as np
import numpy.random as rand
n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))
Is there a more efficient way to compute X in the following:
X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])
Approach #1 : Input arrays with 0s & 1s
For input arrays with 0s and 1s only, we can reduce each of their rows to scalars and hence the input arrays to 1D and then leverage broadcasting, like so -
n = x.shape[1]
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)
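The dot product with s packs each row of 0s and 1s into a single integer code, so two rows are equal exactly when their codes are equal. One assumption to keep in mind: this only works while the row length stays below the 64-bit integer width, otherwise 2**np.arange(n) overflows. A quick sanity check against the loopy X from the question (my addition):
assert np.array_equal(Xout, X)  # packed codes agree with the brute-force comparison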
Approach #2 : Generic case
For a generic case, we can use views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)
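The void view reinterprets each row's raw bytes as a single opaque scalar, so whole-row equality collapses to one scalar comparison; this assumes the two arrays share the same dtype and row byte-width. A tiny illustrative check (my addition, not part of the original answer):
a = np.array([[1, 2], [3, 4]])
av, bv = view1D(a, a.copy())
assert (av == bv).all()  # identical rows compare equal as void scalars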
Runtime test
# Setup
In [287]: np.random.seed(0)
...: n = 1000
...: x = rand.binomial(n=1, p=.5, size=(n, 10))
...: y = rand.binomial(n=1, p=.5, size=(n, 10))
# Original approach
In [288]: %%timeit
...: X = np.zeros((n, n))
...: for i in range(n):
...:     for j in range(n):
...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop
# Approach #1
In [290]: %%timeit
...: n = x.shape[1]
...: s = 2**np.arange(n)
...: x1D = x.dot(s)
...: y1D = y.dot(s)
...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop
# Approach #2
In [291]: %%timeit
...: x1D, y1D = view1D(x, y)
...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop

Loop through varied number of matrices using numpy

Here is the functionality demonstrated on a fixed number of matrices:
x = np.matrix('0.5')
y = np.matrix('0.5 0.5; 0.5 0.5')
z = np.matrix('0.75 0.25; 0.34 0.66')
output = []
for i in x.flat:
    for j in y.flat:
        for k in z.flat:
            output.append(i * j * k)
I need help solving this issue on a variable number of matrices. I have tried using
reduce(np.dot, arr)
But this is not what I want to do.
With A holding the list of input matrices, we can iteratively use np.outer. np.outer flattens its inputs on its own, so we don't need to do that ourselves; only a final flattening step is needed.
Thus, solution would be -
A = [x, y, z, w]

out = A[0]
for i in A[1:]:
    out = np.outer(out, i)
out = out.ravel()
Note that the output would be an array. If needed as a matrix, simply wrap it with np.matrix() at the end.
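Since the question already reached for reduce, note that the loop above is exactly a reduction over np.outer; a small equivalent sketch using functools:
from functools import reduce
out = reduce(np.outer, A).ravel()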
Sample run for 4 matrices -
In [38]: x = np.matrix('0.5')
...: y = np.matrix('0.15 0.25; 0.35 0.45')
...: z = np.matrix('0.75 0.25; 0.34 0.66')
...: w = np.matrix('0.45 0.15; 0.8 0.2')
...:
...: output = []
...: for i in x.flat:
...:     for j in y.flat:
...:         for k in z.flat:
...:             for l in w.flat:
...:                 output.append(i * j * k * l)
...:
In [64]: A = [x,y,z,w]
...: out = A[0]
...: for i in A[1:]:
...:     out = np.outer(out, i)
...: out = out.ravel()
...:
In [65]: np.allclose(output, out)
Out[65]: True

Numpy element-wise dot product

Is there an elegant, NumPy way to apply the dot product elementwise? Or how can the code below be translated into a nicer version?
m0 # shape (5, 3, 2, 2)
m1 # shape (5, 2, 2)
r = np.empty((5, 3, 2, 2))
for i in range(5):
for j in range(3):
r[i, j] = np.dot(m0[i, j], m1[i])
Thanks in advance!
Approach #1
Use np.einsum -
np.einsum('ijkl,ilm->ijkm',m0,m1)
Steps involved :
Keep the first axes from the inputs aligned.
Lose the last axis from m0 against second one from m1 in sum-reduction.
Let remaining axes from m0 and m1 spread-out/expand with elementwise multiplications in an outer-product fashion.
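As a side note, broadcasting matrix multiplication can express the same computation in one line on NumPy 1.10+; a hedged alternative sketch (my addition, not one of the original approaches):
r = np.matmul(m0, m1[:, None])  # m1 broadcast against m0's second axis; matches the einsum result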
Approach #2
If you are looking for performance and the axis of sum-reduction has a smaller length, you are better off with one loop and matrix multiplication via np.tensordot, like so -
s0,s1,s2,s3 = m0.shape
s4 = m1.shape[-1]
r = np.empty((s0,s1,s2,s4))
for i in range(s0):
    r[i] = np.tensordot(m0[i], m1[i], axes=([2],[0]))
Approach #3
Now, np.dot can be used efficiently on 2D inputs for a further performance boost. So, the modified version, though a bit longer, would hopefully be the most performant one -
s0,s1,s2,s3 = m0.shape
s4 = m1.shape[-1]
m0.shape = s0,s1*s2,s3 # Get m0 as 3D for temporary usage
r = np.empty((s0,s1*s2,s4))
for i in range(s0):
    r[i] = m0[i].dot(m1[i])
r.shape = s0,s1,s2,s4
m0.shape = s0,s1,s2,s3 # Put m0 back to 4D
Runtime test
Function definitions -
def original_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    r = np.empty((s0,s1,s2,s4))
    for i in range(s0):
        for j in range(s1):
            r[i, j] = np.dot(m0[i, j], m1[i])
    return r

def einsum_app(m0, m1):
    return np.einsum('ijkl,ilm->ijkm', m0, m1)

def tensordot_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    r = np.empty((s0,s1,s2,s4))
    for i in range(s0):
        r[i] = np.tensordot(m0[i], m1[i], axes=([2],[0]))
    return r

def dot_app(m0, m1):
    s0,s1,s2,s3 = m0.shape
    s4 = m1.shape[-1]
    m0.shape = s0,s1*s2,s3   # Get m0 as 3D for temporary usage
    r = np.empty((s0,s1*s2,s4))
    for i in range(s0):
        r[i] = m0[i].dot(m1[i])
    r.shape = s0,s1,s2,s4
    m0.shape = s0,s1,s2,s3   # Put m0 back to 4D
    return r
Timings and verification -
In [291]: # Inputs
...: m0 = np.random.rand(50,30,20,20)
...: m1 = np.random.rand(50,20,20)
...:
In [292]: out1 = original_app(m0, m1)
...: out2 = einsum_app(m0, m1)
...: out3 = tensordot_app(m0, m1)
...: out4 = dot_app(m0, m1)
...:
...: print(np.allclose(out1, out2))
...: print(np.allclose(out1, out3))
...: print(np.allclose(out1, out4))
...:
True
True
True
In [293]: %timeit original_app(m0, m1)
...: %timeit einsum_app(m0, m1)
...: %timeit tensordot_app(m0, m1)
...: %timeit dot_app(m0, m1)
...:
100 loops, best of 3: 10.3 ms per loop
10 loops, best of 3: 31.3 ms per loop
100 loops, best of 3: 5.12 ms per loop
100 loops, best of 3: 4.06 ms per loop
I think numpy.inner() is what you really want?
