I have two sparse binary matrices A and B with matching inner dimensions, e.g. A has shape I x J and B has shape J x K. I have a custom operation that produces a matrix C of shape I x J x K, where element (i,j,k) is 1 if and only if A(i,j) = 1 and B(j,k) = 1. I have currently implemented this operation as follows:
import numpy as np
I = 2
J = 3
K = 4
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method
C = np.zeros((I, J, K))
for i in range(I):
    for j in range(J):
        for k in range(K):
            if A[i, j] == 1 and B[j, k] == 1:
                C[i, j, k] = 1
print(C)
However, the for loop is quite slow for large I,J,K. Is it possible to achieve this operation using numpy methods only to speed it up? I have looked at np.multiply.outer, but no success so far.
Here you go:
C = np.einsum('ij,jk->ijk', A,B)
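This works because the matrices are binary: np.einsum('ij,jk->ijk', A, B) sets C[i,j,k] = A[i,j] * B[j,k], which is 1 exactly when both entries are 1. If you prefer plain broadcasting, an equivalent sketch:
import numpy as np

# Broadcast A to (I, J, 1) and B to (1, J, K); the elementwise product
# has shape (I, J, K), with C[i, j, k] = A[i, j] * B[j, k].
C = A[:, :, None] * B[None, :, :]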
Try to do what you're already doing with numba.
Here's an example using your code, Sehan2's method and numba:
import numpy as np
from numba import jit, prange
I = 2
J = 3
K = 4
np.random.seed(0)
A = np.random.randint(2, size=(I, J))
B = np.random.randint(2, size=(J, K))
# Custom method (original loops)
def Custom_method(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I, J, K))
    for i in range(I):
        for j in range(J):
            for k in range(K):
                if A[i, j] == 1 and B[j, k] == 1:
                    C[i, j, k] = 1
    return C

def Custom_method_ein(A, B):
    C = np.einsum('ij,jk->ijk', A, B)
    return C

@jit(nopython=True)
def Custom_method_numba(A, B):
    I, J = A.shape
    J, K = B.shape
    C = np.zeros((I, J, K))
    for i in prange(I):
        for j in prange(J):
            for k in prange(K):
                if A[i, j] == 1 and B[j, k] == 1:
                    C[i, j, k] = 1
    return C
print('original')
%timeit Custom_method(A, B)
print('einsum')
%timeit Custom_method_ein(A, B)
print('numba')
%timeit Custom_method_numba(A, B)
Output:
original
10000 loops, best of 5: 18.8 µs per loop
einsum
The slowest run took 20.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.32 µs per loop
numba
The slowest run took 15.99 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 815 ns per loop
Note that you can make your code run much faster and more efficiently if you use sparse matrix representations. That way you avoid performing unnecessary operations.
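For example, here is a minimal sketch of that idea using scipy.sparse (my own addition; the function name and structure are illustrative). It visits only the nonzero entries of A and B, so the work scales with the number of set bits rather than with I*J*K:
import numpy as np
from scipy import sparse

def custom_method_sparse(A, B):
    # Iterate only over the nonzero entries of A; for each (i, j),
    # set the k's where row j of B is nonzero.
    A_coo = sparse.coo_matrix(A)
    B_csr = sparse.csr_matrix(B)
    I, J = A.shape
    K = B.shape[1]
    C = np.zeros((I, J, K))
    for i, j in zip(A_coo.row, A_coo.col):
        C[i, j, B_csr[j].indices] = 1
    return C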
Related
Is it possible to vectorize the following code in Python? It runs very slowly when the size of the array becomes large.
import numpy as np
# A, B, C are 3d arrays with shape (K, N, N).
# Entries in A, B, and C are in [0, 1].
# In the following, I use random values in B and C as an example.
K = 5
N = 10000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
for k in range(K):
    for m in [x for x in range(K) if x != k]:
        for i in range(N):
            for j in range(N):
                if A[m, i, j] not in [0, 1]:
                    if A[k, i, j] == 1:
                        A[m, i, j] = B[m, i, j]
                    if A[k, i, j] == 0:
                        A[m, i, j] = C[m, i, j]
I cannot identify a way to vectorize this, but I can suggest using the numba package to reduce the computation time. Here, njit is imported and applied with the nogil=True parameter to speed up your code.
from numba import njit
@njit(nogil=True)
def function():
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            for i in range(N):
                for j in range(N):
                    if A[k, i, j] == 1 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = B[m, i, j]
                    if A[k, i, j] == 0 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = C[m, i, j]
%timeit function()
7.35 s ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With njit and the nogil parameter, it took me about 7 seconds to run the whole thing; without njit, my code had been running for hours (and still was). Python's global interpreter lock (GIL) normally keeps execution single-threaded; releasing it lets the compiled code run in multiple threads. However, when using nogil=True, you'll have to be wary of the usual pitfalls of multi-threaded programming (consistency, synchronization, race conditions, etc.).
You can look at the documentation about Numba here.
https://numba.pydata.org/numba-doc/dev/user/jit.html?highlight=nogil
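If you would rather let numba manage the threads for you, here is a rough sketch (my own addition, not from the answer above) using parallel=True with prange. The parallel loop is placed over i so that each thread writes a distinct row of A[m] and no race condition arises; the function name and the passing of A, B, C as arguments are my choices:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def function_parallel(A, B, C):
    K, N, _ = A.shape
    for k in range(K):           # sequential: later k's see earlier updates
        for m in range(K):
            if m == k:
                continue
            for i in prange(N):  # parallel: each i writes a distinct row of A[m]
                for j in range(N):
                    a = A[m, i, j]
                    if a != 0 and a != 1:
                        if A[k, i, j] == 1:
                            A[m, i, j] = B[m, i, j]
                        elif A[k, i, j] == 0:
                            A[m, i, j] = C[m, i, j]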
I can help with a partial vectorization that should speed things up quite a bit, but I'm not sure on your logic for k vs. m, so didn't try to include that part. Essentially, you create a mask with the conditions you want checked across the 2nd and 3rd dimensions of A. Then map between A and either B or C using the appropriate mask:
# A, B, C are 3d arrays with shape (K, N, N).
# Entries in A, B, and C are in [0, 1].
# In the following, I use random values in B and C as an example.
np.random.seed(10)
K = 5
N = 1000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
for k in range(K):
    for m in [x for x in range(K) if x != k]:
        # if A[m, i, j] not in [0, 1]:
        mask_1 = A[k, :, :] == 1
        mask_0 = A[k, :, :] == 0
        A[m, mask_1] = B[m, mask_1]
        A[m, mask_0] = C[m, mask_0]
I omitted the A[m, i, j] not in [0, 1] part because it made things difficult to debug: nothing happens, since A is initialized as all zeros. If you need to include additional logic like this, just create another mask for it and combine it with each existing mask using &, as sketched below.
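For instance, a sketch of what that combined logic might look like inside the loop (the not-in-[0, 1] condition is the one from your question, reintroduced here for illustration):
# Extra condition from the question: only touch entries of A[m]
# that are not already 0 or 1. Combine it into each mask with &.
not_01 = (A[m] != 0) & (A[m] != 1)
mask_1 = (A[k, :, :] == 1) & not_01
mask_0 = (A[k, :, :] == 0) & not_01
A[m, mask_1] = B[m, mask_1]
A[m, mask_0] = C[m, mask_0]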
Update on 7/6/22
If you want to update the above code to remove the loop over m, then you can initialize an array with all the values of k, and use that to expand the mask to include all 3 dimensions, excluding each value of k that matches m as follows:
np.random.seed(10)
K = 5
N = 1000
A_2 = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
K_vals = np.array(range(K))
for k in range(K):
    # for m in [x for x in range(K) if x != k]:
    #     if A[m, i, j] not in [0, 1]:
    k_dim_2_skip = K_vals == k
    mask_1 = np.tile(A_2[k, :, :] == 1, (K, 1, 1))
    mask_1[k_dim_2_skip, :, :] = False
    mask_0 = np.tile(A_2[k, :, :] == 0, (K, 1, 1))
    mask_0[k_dim_2_skip, :, :] = False
    A_2[mask_1] = B[mask_1]
    A_2[mask_0] = C[mask_0]
Use these masks with the & np.logical_not... code you added in the comment below and that should do it. Note that the more you vectorize, the larger the arrays you're manipulating for masks, etc., so there is a tradeoff with memory consumption. There is usually a sweet spot that balances run time against memory usage for a given problem.
The dynamic time warping algorithm provides a notion of distance between two temporal sequences that may vary in speed. If I have N sequences to compare to each other, I can construct an N x N symmetric matrix with a null diagonal by applying the algorithm pairwise. However, this is very slow for long two-dimensional sequences. Therefore, I am trying to vectorise the code to speed up this matrix computation. Importantly, I also want to extract the indices defining the optimal alignment.
My code for pairwise comparison so far:
import math
import numpy as np
seq1 = np.random.randint(100, size=(100, 2)) #Two dim sequences
seq2 = np.random.randint(100, size=(100, 2))
def seqdist(seq1, seq2):  # dynamic time warping function
    ns = len(seq1)
    nt = len(seq2)
    D = np.zeros((ns+1, nt+1)) + math.inf
    D[0, 0] = 0
    cost = np.zeros((ns, nt))
    for i in range(ns):
        for j in range(nt):
            cost[i, j] = np.linalg.norm(seq1[i, :] - seq2[j, :])
            D[i+1, j+1] = cost[i, j] + min([D[i, j+1], D[i+1, j], D[i, j]])

    d = D[ns, nt]  # distance
    matchidx = [[ns-1, nt-1]]  # backwards optimal alignment computation
    i = ns
    j = nt
    for k in range(ns+nt+2):
        idx = np.argmin([D[i-1, j], D[i, j-1], D[i-1, j-1]])
        if idx == 0 and i > 1 and j > 0:
            matchidx.append([i-2, j-1])
            i -= 1
        elif idx == 1 and i > 0 and j > 1:
            matchidx.append([i-1, j-2])
            j -= 1
        elif idx == 2 and i > 1 and j > 1:
            matchidx.append([i-2, j-2])
            i -= 1
            j -= 1
        else:
            break

    matchidx.reverse()
    return d, matchidx
[d,matchidx] = seqdist(seq1,seq2) #try it
Here is one re-write of your code that makes it more amenable to numba.jit. This is not exactly a vectorized solution, but I see a speedup by a factor of 230 for this benchmark.
from numba import jit
from scipy import spatial
@jit
def D_from_cost(cost, D):
    # operates on D in place
    ns, nt = cost.shape
    for i in range(ns):
        for j in range(nt):
            D[i+1, j+1] = cost[i, j] + min(D[i, j+1], D[i+1, j], D[i, j])
            # avoiding the list creation inside min enables better jit performance
            # D[i+1, j+1] = cost[i, j] + min([D[i, j+1], D[i+1, j], D[i, j]])
@jit
def get_d(D, matchidx):
    ns = D.shape[0] - 1
    nt = D.shape[1] - 1
    d = D[ns, nt]

    matchidx[0, 0] = ns - 1
    matchidx[0, 1] = nt - 1
    i = ns
    j = nt
    for k in range(1, ns+nt+3):
        idx = 0
        if not (D[i-1, j] <= D[i, j-1] and D[i-1, j] <= D[i-1, j-1]):
            if D[i, j-1] <= D[i-1, j-1]:
                idx = 1
            else:
                idx = 2

        if idx == 0 and i > 1 and j > 0:
            # matchidx.append([i-2, j-1])
            matchidx[k, 0] = i - 2
            matchidx[k, 1] = j - 1
            i -= 1
        elif idx == 1 and i > 0 and j > 1:
            # matchidx.append([i-1, j-2])
            matchidx[k, 0] = i - 1
            matchidx[k, 1] = j - 2
            j -= 1
        elif idx == 2 and i > 1 and j > 1:
            # matchidx.append([i-2, j-2])
            matchidx[k, 0] = i - 2
            matchidx[k, 1] = j - 2
            i -= 1
            j -= 1
        else:
            break

    return d, matchidx[:k]
def seqdist2(seq1, seq2):
    ns = len(seq1)
    nt = len(seq2)
    cost = spatial.distance_matrix(seq1, seq2)

    # initialize and update D
    D = np.full((ns+1, nt+1), np.inf)
    D[0, 0] = 0
    D_from_cost(cost, D)

    matchidx = np.zeros((ns+nt+2, 2), dtype=int)  # np.int is removed in recent NumPy
    d, matchidx = get_d(D, matchidx)
    return d, matchidx[::-1].tolist()
assert seqdist2(seq1, seq2) == seqdist(seq1, seq2)
%timeit seqdist2(seq1, seq2) # 1000 loops, best of 3: 365 µs per loop
%timeit seqdist(seq1, seq2) # 10 loops, best of 3: 86.1 ms per loop
Here are some changes:
cost is calculated using spatial.distance_matrix (see the sketch after this list).
The definition of idx is replaced with a bunch of ugly if statements that make the compiled code faster.
min([D[i, j+1], D[i+1, j], D[i, j]]) is replaced with min(D[i, j+1], D[i+1, j], D[i, j]), i.e. instead of taking min of a list, we take min of three values. This leads to a surprising speedup under jit.
matchidx is preallocated as a numpy array and truncated to the right size just before output.
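To make the first change concrete, a small check (my own addition, using fresh variable names) that scipy.spatial.distance_matrix reproduces the pairwise norms from the original loop:
import numpy as np
from scipy import spatial

s1 = np.random.randint(100, size=(5, 2)).astype(float)
s2 = np.random.randint(100, size=(6, 2)).astype(float)

# Pairwise norms from the question's loop ...
cost_loop = np.zeros((len(s1), len(s2)))
for i in range(len(s1)):
    for j in range(len(s2)):
        cost_loop[i, j] = np.linalg.norm(s1[i, :] - s2[j, :])

# ... match the vectorized distance matrix.
assert np.allclose(cost_loop, spatial.distance_matrix(s1, s2))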
Given two NumPy arrays, say:
import numpy as np
import numpy.random as rand
n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))
Is there a more efficient way to compute X in the following:
X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])
Approach #1 : Input arrays with 0s & 1s
For input arrays with 0s and 1s only, we can reduce each of their rows to scalars and hence the input arrays to 1D and then leverage broadcasting, like so -
n = x.shape[1]
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)
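The dot product with s packs each binary row into a unique integer (reading the row as a little-endian binary number), so two rows are equal exactly when their packed scalars are equal. A tiny illustration (my own, assuming entries are only 0s and 1s):
import numpy as np

s = 2 ** np.arange(3)               # [1, 2, 4]
print(np.array([1, 0, 1]).dot(s))   # 5: binary 101 in little-endian order
print(np.array([1, 1, 0]).dot(s))   # 3: a different row gives a different scalar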
Approach #2 : Generic case
For a generic case, we can use views -
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)
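The void view collapses each row's bytes into a single opaque scalar, so whole-row equality reduces to scalar equality. As a quick sanity check (my addition, assuming X from the question's double loop is still in scope):
assert np.allclose(Xout, X)  # vectorized result matches the original double loop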
Runtime test
# Setup
In [287]: np.random.seed(0)
...: n = 1000
...: x = rand.binomial(n=1, p=.5, size=(n, 10))
...: y = rand.binomial(n=1, p=.5, size=(n, 10))
# Original approach
In [288]: %%timeit
...: X = np.zeros((n, n))
...: for i in range(n):
...:     for j in range(n):
...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop
# Approach #1
In [290]: %%timeit
...: n = x.shape[1]
...: s = 2**np.arange(n)
...: x1D = x.dot(s)
...: y1D = y.dot(s)
...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop
# Approach #2
In [291]: %%timeit
...: x1D, y1D = view1D(x, y)
...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop
I have a 3D numpy array of shape (t, n1, n2):
x = np.random.rand(10, 2, 4)
I need to calculate another 3D array y which is of shape (t, n1, n1) such that:
y[0] = np.cov(x[0,:,:])
...and so on for all slices along the first axis.
So, a loopy implementation would be:
y = np.zeros((10, 2, 2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])
Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
But it didn't work.
Hacked into the numpy.cov source code and tried using the default parameters. As it turns out, np.cov(x[i,:,:]) would simply be:
N = x.shape[2]
m = x[i, :, :]
m = m - np.sum(m, axis=1, keepdims=True) / N  # not in-place: m -= ... would mutate x through the view
cov = np.dot(m, m.T) / (N - 1)
So the task was to vectorize this loop, iterating through i and processing all of the data from x in one go. We can use broadcasting at the third step, and the final step performs a sum-reduction over the last axis for all slices along the first axis, which can be implemented efficiently in a vectorized manner with np.einsum. Thus, the final implementation came to this -
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
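For readers less familiar with einsum notation: 'ijk,ilk->ijl' multiplies m1[i,j,k] by m1[i,l,k] and sums over k, i.e. a batched product of each slice with its own transpose. A rough equivalent (my restatement) using matmul:
# Same result as the einsum: batch-multiply each slice of m1 by its
# transpose, which sums over the shared last axis k.
y_alt = m1 @ m1.transpose(0, 2, 1) / (N - 1)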
Runtime test
In [155]: def original_app(x):
...:     n = x.shape[0]
...:     y = np.zeros((n, 2, 2))
...:     for i in np.arange(x.shape[0]):
...:         y[i] = np.cov(x[i, :, :])
...:     return y
...:
...: def proposed_app(x):
...:     N = x.shape[2]
...:     m1 = x - x.sum(2, keepdims=1) / N
...:     out = np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)
...:     return out
...:
In [156]: # Setup inputs
...: n = 10000
...: x = np.random.rand(n,2,4)
...:
In [157]: np.allclose(original_app(x),proposed_app(x))
Out[157]: True # Results verified
In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop
In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Huge speedup there!
I tried to copy one array, say A (2-D), to another array, say B (3-D), which have the following shapes:
A is an m x n array and B is an m x n x p array.
I tried the following code but it is very slow, like 1 sec/frame
for r in range(0, h):
    for c in range(0, w):
        x = random.randint(0, 20)
        B[r, c, x] = A[r, c]
I also read some websites about fancy indexing, but I still don't know how to apply it in my case.
I propose a solution using array indices. M,N,P are each (m,n) index arrays, specifying the m*n elements of B that will receive data from A.
def indexing(A, p):
    m, n = A.shape
    B = np.zeros((m, n, p), dtype=int)
    P = np.random.randint(0, p, (m, n))
    M, N = np.indices(A.shape)
    B[M, N, P] = A
    return B
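A quick usage sketch (my addition, with illustrative sizes): every (r, c) fiber of B ends up holding A[r, c] at one random position along the last axis and zeros elsewhere:
import numpy as np

A = np.random.randint(1, 10, size=(4, 5))  # nonzero values so the check below works
B = indexing(A, 20)
assert (B.sum(axis=2) == A).all()  # each fiber holds exactly one copy of A[r, c]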
For comparison, here are the original loop and the solution using shuffle:
def looping(A, p):
    m, n = A.shape
    B = np.zeros((m, n, p), dtype=int)
    for r in range(m):
        for c in range(n):
            x = np.random.randint(0, p)
            B[r, c, x] = A[r, c]
    return B

def shuffling(A, p):
    m, n = A.shape
    B = np.zeros((m, n, p), dtype=int)
    B[:, :, 0] = A
    # list() forces evaluation; a bare map() is lazy in Python 3 and would do nothing
    list(map(np.random.shuffle, B.reshape(m*n, p)))
    return B
for m,n,p = 1000,1000,20, timings are:
looping: 1.16 s
shuffling: 10 s
indexing: 271 ms
For small m, n, looping is fastest. My indexing solution takes more time to set up, but the actual assignment is fast. The shuffling solution has as many iterations as the original.
The M, N arrays don't have to be full. They can be column and row arrays, respectively:
M = np.arange(m)[:,None]
N = np.arange(n)[None,:]
or
M,N = np.ogrid[:m,:n]
This shaves off some time, more so for small test cases than a large one.
A repeatable version:
def indexing(A, p, B=None):
    m, n = A.shape
    if B is None:
        B = np.zeros((m, n, p), dtype=int)
    for r in range(m):
        for c in range(n):
            x = np.random.randint(0, p)
            B[r, c, x] = A[r, c]
    return B
indexing(A,p,indexing(A,p))
If A isn't the same size as the first two dimensions of B, the index ranges will have to be changed. A doesn't have to be a 2D array either:
B[[0,0,2],[1,1,0],[3,4,5]] = [10,11,12]
Assuming that h=m, w=n and x=p, this should give you the same as you have in your example:
B[:, :, 0] = A
# list() forces evaluation; a bare map() is lazy in Python 3 and would do nothing
list(map(np.random.shuffle, B.reshape(h*w, p)))
Note also, I'm assuming the answer to NPE's question in comments is 'yes'