I'm working with a k x k x k x k tensor (say S) and an array X of size (n, k). Roughly, X's rows correspond to node features for a graph. For each pair of edges (say e = (u, v) and e' = (u_, v_)) I want to compute a new element as follows:
elt = np.sum(S * np.multiply.outer(np.outer(X[u, :], X[v, :]), np.outer(X[u_, :], X[v_, :])))
I wonder if there is a way to do this more efficiently instead of 4 nested loops over indices.
If I was working with just pairs of nodes and S was just a k x k matrix, this could be written simply as
all_elts = X @ S @ X.T
However, I'm not sure how this generalizes over multiple dimensions. Any help is much appreciated!
Here is an example to show how to use einsum():
import numpy as np
from itertools import product
n = 4
x = np.random.randn(n, n)
S = np.random.randn(n, n, n, n)
res = np.zeros((n, n, n, n))
for i, j, k, l in product(range(n), range(n), range(n), range(n)):
    res[i, j, k, l] = np.sum(S * np.multiply.outer(np.outer(x[i, :], x[j, :]), np.outer(x[k, :], x[l, :])))
res2 = np.einsum("efgh,ae,bf,cg,dh->abcd", S, x, x, x, x)
np.allclose(res, res2)
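The example above fills in every combination of four row indices. If only pairs of edges are needed, the same contraction can be restricted to them by gathering the endpoint rows first. This is a minimal sketch, assuming the edges are stored in an integer array of shape (m, 2); the names edges, U, V and pair_elts are hypothetical and not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
X = rng.standard_normal((n, k))          # node features, one row per node
S = rng.standard_normal((k, k, k, k))    # the k x k x k x k tensor
edges = np.array([[0, 1], [2, 3], [4, 5], [1, 4]])  # assumed edge list (u, v)

U = X[edges[:, 0]]  # features of each edge's first endpoint, shape (m, k)
V = X[edges[:, 1]]  # features of each edge's second endpoint, shape (m, k)

# pair_elts[a, b] is the element for the edge pair (edges[a], edges[b])
pair_elts = np.einsum("efgh,ae,af,bg,bh->ab", S, U, V, U, V, optimize=True)

# Spot check against the formula from the question for one pair of edges
(u, v), (u_, v_) = edges[0], edges[2]
direct = np.sum(S * np.multiply.outer(np.outer(X[u], X[v]), np.outer(X[u_], X[v_])))
print(np.isclose(pair_elts[0, 2], direct))
```

The result has one entry per ordered pair of edges, so it scales with the number of edges squared rather than with n^4.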
Is it possible to vectorize the following code in Python? It runs very slowly when the size of the array becomes large.
import numpy as np
# A, B, C are 3d arrays with shape (K, N, N).
# Entries in A, B, and C are in [0, 1].
# In the following, I use random values in B and C as an example.
K = 5
N = 10000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
for k in range(K):
    for m in [x for x in range(K) if x != k]:
        for i in range(N):
            for j in range(N):
                if A[m, i, j] not in [0, 1]:
                    if A[k, i, j] == 1:
                        A[m, i, j] = B[m, i, j]
                    if A[k, i, j] == 0:
                        A[m, i, j] = C[m, i, j]
I cannot identify a way to vectorize this, but I can suggest using the numba package to reduce the computation time. You can import njit and pass it the nogil=True parameter to speed up your code.
from numba import njit
@njit(nogil=True)
def function():
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            for i in range(N):
                for j in range(N):
                    if A[k, i, j] == 1 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = B[m, i, j]
                    if A[k, i, j] == 0 and A[m, i, j] not in [0, 1]:
                        A[m, i, j] = C[m, i, j]
%timeit function()
7.35 s ± 252 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
With njit and the nogil parameter, it took me about 7 seconds to run the whole thing, but without njit my code has been running for hours (and it still is now). Python has a global interpreter lock (GIL) that lets only one thread execute Python bytecode at a time. Passing nogil=True releases the GIL while the compiled function runs, so other threads can execute alongside it; it does not parallelize the loop by itself. When you do call such functions from several threads, you’ll have to be wary of the usual pitfalls of multi-threaded programming (consistency, synchronization, race conditions, etc.).
You can look at the documentation about Numba here.
https://numba.pydata.org/numba-doc/dev/user/jit.html?highlight=nogil
I can help with a partial vectorization that should speed things up quite a bit, but I'm not sure on your logic for k vs. m, so didn't try to include that part. Essentially, you create a mask with the conditions you want checked across the 2nd and 3rd dimensions of A. Then map between A and either B or C using the appropriate mask:
# A, B, C are 3d arrays with shape (K, N, N).
# Entries in A, B, and C are in [0, 1].
# In the following, I use random values in B and C as an example.
np.random.seed(10)
K = 5
N = 1000
A = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
for k in range(K):
    for m in [x for x in range(K) if x != k]:
        #if A[m, i, j] not in [0, 1]:
        mask_1 = A[k, :, :] == 1
        mask_0 = A[k, :, :] == 0
        A[m, mask_1] = B[m, mask_1]
        A[m, mask_0] = C[m, mask_0]
I omitted the A[m, i, j] not in [0, 1] part because it made debugging difficult: with A initialized to all zeros, nothing happens. If you need additional logic like this, create another boolean mask for it and combine it with each existing mask using & (NumPy's element-wise and).
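To make the extra-mask idea concrete, here is a minimal sketch that puts the not in [0, 1] condition back as a third mask and compares the result with the original four nested loops on a small array. The function names update_loops and update_masked and the test data are assumptions made for illustration; A is seeded with a mix of 0, 1 and 0.5 so that the updates actually fire.

```python
import numpy as np

def update_loops(A, B, C):
    """Reference: the four nested loops from the question, on a copy of A."""
    A = A.copy()
    K, N, _ = A.shape
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            for i in range(N):
                for j in range(N):
                    if A[m, i, j] not in [0, 1]:
                        if A[k, i, j] == 1:
                            A[m, i, j] = B[m, i, j]
                        if A[k, i, j] == 0:
                            A[m, i, j] = C[m, i, j]
    return A

def update_masked(A, B, C):
    """Same update with the two inner loops replaced by boolean masks."""
    A = A.copy()
    K = A.shape[0]
    for k in range(K):
        for m in [x for x in range(K) if x != k]:
            free = (A[m] != 0) & (A[m] != 1)   # entries of A[m] not yet 0 or 1
            mask_1 = free & (A[k] == 1)
            mask_0 = free & (A[k] == 0)
            A[m, mask_1] = B[m, mask_1]
            A[m, mask_0] = C[m, mask_0]
    return A

rng = np.random.default_rng(10)
K, N = 3, 6
A0 = rng.choice([0.0, 1.0, 0.5], size=(K, N, N))
B = rng.random((K, N, N))
C = rng.random((K, N, N))
print(np.array_equal(update_loops(A0, B, C), update_masked(A0, B, C)))
```

Both masks are computed before either assignment, which is safe here because A[k] is never modified while m differs from k.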
Update on 7/6/22
If you want to update the above code to remove the loop over m, then you can initialize an array with all the values of k, and use that to expand the mask to include all 3 dimensions, excluding each value of k that matches m as follows:
np.random.seed(10)
K = 5
N = 1000
A_2 = np.zeros((K, N, N))
B = np.random.normal(0, 1, (K, N, N))
C = np.random.normal(0, 1, (K, N, N))
K_vals = np.array(range(K))
for k in range(K):
    #for m in [x for x in range(K) if x != k]:
    #if A[m, i, j] not in [0, 1]:
    k_dim_2_skip = K_vals == k
    mask_1 = np.tile(A_2[k, :, :] == 1, (K, 1, 1))
    mask_1[k_dim_2_skip, :, :] = False
    mask_0 = np.tile(A_2[k, :, :] == 0, (K, 1, 1))
    mask_0[k_dim_2_skip, :, :] = False
    A_2[mask_1] = B[mask_1]
    A_2[mask_0] = C[mask_0]
Use these masks with the & np.logical_not... code you added in the comment below and that should do it. Note the more you vectorize, the larger the arrays you're manipulating for masks, etc. get, so there is a tradeoff with memory consumption. There is usually a sweet spot to balance run time vs memory usage for a given problem.
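As a small aside on the masks themselves, the tile-then-overwrite step can probably be written with broadcasting instead. The sketch below builds the same (K, N, N) mask both ways for a single k and checks that they agree; the variable names and the small random A are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 4, 5
A = rng.choice([0.0, 1.0], size=(K, N, N))
k = 2

# Version from the answer: tile the 2-D mask, then blank out slice k
skip = np.arange(K) == k
mask_tiled = np.tile(A[k] == 1, (K, 1, 1))
mask_tiled[skip, :, :] = False

# Broadcasting version: (K, 1, 1) "m != k" mask combined with (1, N, N) mask
mask_bcast = (np.arange(K) != k)[:, None, None] & (A[k] == 1)[None, :, :]

print(np.array_equal(mask_tiled, mask_bcast))
```

The final boolean array is still K x N x N, so this does not remove the memory cost mentioned above; it only skips the intermediate tiled copy.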
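I'm trying to write Python code for a higher-order (d=4) factorization machine that returns the scalar result y of
y = \sum_{i} v_{i} x_{i} + \sum_{i,j} w_{i,j} x_{i} x_{j} + \sum_{i,j,k,l} t_{i,j,k,l} x_{i} x_{j} x_{k} x_{l}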
Where x is a vector of some length n, v is a vector of length n, w is an upper triangular matrix of size n by n, and t is a rank 4 tensor of size n by n by n by n. The easiest implementation is just for loops over each index:
for i in range(0, len(x)):
    for j in range(0, len(x)):
        for k in range(0, len(x)):
            for l in range(0, len(x)):
                y += t[i, j, k, l] * x[i] * x[j] * x[k] * x[l]
The first two terms are easily calculated:
y = v @ x + x @ w @ x.T
My question: is there a better way of calculating the sum over the tensor than a nested for-loop? (I'm currently looking at possible solutions in PyTorch.)
This seems like a perfect fit for torch.einsum:
>>> torch.einsum('ijkl,i,j,k,l->', t, *(x,)*4)
In expanded form, this looks like torch.einsum('ijkl,i,j,k,l->', t, x, x, x, x) and computes the value defined by your four for loops:
for i, j, k, l in cartesian_prod:
    y += t[i, j, k, l] * x[i] * x[j] * x[k] * x[l]
Where cartesian_prod is the cartesian product: range(len(x))^4
Thank you swag2198
(t * x[:, None, None, None] * x[None, :, None, None] * x[None, None, :, None] * x[None, None, None, :]).sum()
returns the same result as the for-loops when tested on dummy values of x and t.
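For anyone who wants to check this without PyTorch, here is a minimal NumPy sketch that compares the nested loops, the einsum contraction and the broadcasting expression on random data. np.einsum accepts the same subscript string as the torch.einsum call above; the sizes and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)
t = rng.standard_normal((n, n, n, n))

# Reference: four nested loops over the rank-4 tensor
y_loops = 0.0
for i in range(n):
    for j in range(n):
        for k in range(n):
            for l in range(n):
                y_loops += t[i, j, k, l] * x[i] * x[j] * x[k] * x[l]

# Full contraction of t with x along every axis
y_einsum = np.einsum("ijkl,i,j,k,l->", t, x, x, x, x)

# Broadcasting: scale each axis of t by x, then sum everything
y_bcast = (t
           * x[:, None, None, None]
           * x[None, :, None, None]
           * x[None, None, :, None]
           * x[None, None, None, :]).sum()

print(np.isclose(y_loops, y_einsum), np.isclose(y_loops, y_bcast))
```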
For a 2-dimensional matrix A of size (N, K) with each element 'a', we can get a matrix B of size (N, K, N) with each element 'b' such that b[i, k, j] = a[i, k]*a[j,k] by the operation
B = tf.expand_dims(A, -1)* tf.transpose(A).
Now, with a 3-dimensional tensor A of size (M, N, K) with each element 'a', is there a way to compute a 4-dimensional tensor B of size (M, N, K, N) with each element 'b' such that
b[m, i, k, j] = a[m, i, k]*a[m, j, k]?
Try einsum:
B = np.einsum('mik,mjk->mikj', A, A)
If you are working with TensorFlow tensors, tf.einsum takes the same subscript string.
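A quick way to convince yourself that the subscripts are right is to compare the einsum result with an explicit broadcasting product in NumPy. This is a minimal sketch on small random data; the names B_einsum and B_bcast are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 2, 3, 4
A = rng.standard_normal((M, N, K))

# b[m, i, k, j] = a[m, i, k] * a[m, j, k]
B_einsum = np.einsum("mik,mjk->mikj", A, A)

# Same thing with broadcasting: (M, N, K, 1) * (M, 1, K, N)
B_bcast = A[:, :, :, None] * A.transpose(0, 2, 1)[:, None, :, :]

print(B_einsum.shape, np.allclose(B_einsum, B_bcast))
```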
Bemma,
This solution should work:
Expand N dimension, multiply, transpose result.
M, N, K = 2,3,4 # insert your dimensions here
A = tf.constant(np.random.randint(1, 100, size=[M,N,K])) # generate A
B = tf.expand_dims(A, 1)* tf.expand_dims(A, 2)
B = tf.transpose(B, perm=[0, 1, 3, 2])
# test to verify result:
for m in range(M):
    for i in range(N):
        for k in range(K):
            for j in range(N):
                assert B[m, i, k, j] == A[m, i, k] * A[m, j, k]
This test passes without errors.
I have some constraints of the form of
A_{i,j,k} = r_{i,j}B_{i,j,k}
A is an n x m x p array, as is B. r is an n x m matrix.
I would like to vectorize this in Python somehow, as efficiently as possible. Right now, I am making r into nxmxp matrix by saying r_{i,j,k} = r_{i,j} for all 1 <= k <= p. Then I call np.multiply on r and B. This seems inefficient. Any ideas welcome, thanks.
import random
import numpy as np
def ndHadamardProduct(r, n, m, p):  # r is an n x m matrix; n, m, p are ints
    rnew = np.zeros((n, m, p))
    B = np.zeros((n, m, p))
    for i in range(n):
        for j in range(m):
            for k in range(p):
                rnew[i, j, k] = r[i, j]  # repeat r along the third axis
                B[i, j, k] = random.uniform(0, 1)
    return np.multiply(rnew, B)
Add an extra dimension with np.newaxis and then broadcasting takes care of the repetition for you.
import numpy as np
r = np.random.random((3,4))
b = np.random.random((3,4,5))
a = r[:,:,np.newaxis] * b
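To see that broadcasting gives the same numbers as explicitly repeating r along the third axis, a minimal check could look like this. The einsum line is an equivalent alternative added for comparison, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random((3, 4))
b = rng.random((3, 4, 5))

# Broadcasting: r gets a trailing axis of length 1 that stretches to 5
a_bcast = r[:, :, np.newaxis] * b

# Explicit repetition, as described in the question (allocates a full copy of r)
a_repeat = np.repeat(r[:, :, np.newaxis], b.shape[2], axis=2) * b

# Equivalent einsum spelling: a[i, j, k] = r[i, j] * b[i, j, k]
a_einsum = np.einsum("ij,ijk->ijk", r, b)

print(np.allclose(a_bcast, a_repeat), np.allclose(a_bcast, a_einsum))
```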
The Wikipedia entry for the Arnoldi method provides a Python example that produces basis of the Krylov subspace of a matrix A. Supposedly, if A is Hermitian (i.e. if A == A.conj().T) then the Hessenberg matrix h generated by this algorithm is tridiagonal (source). However, when I use the Wikipedia code on a real-world Hermitian matrix, the Hessenberg matrix is not at all tridiagonal. When I perform the computation on the real part of A (so that A == A.T) then I do get a tridiagonal Hessenberg matrix, so there seems to be a problem with the imaginary components of A. Does anybody know why the Wikipedia code doesn't produce the expected results?
Working example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import circulant
def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=complex)  # np.complex was removed in NumPy 1.24
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)  # Normalize the input vector
    Q[:, 0] = q  # Use it as the first Krylov vector
    for k in range(n):
        v = A.dot(q)  # Generate a new candidate vector
        for j in range(k + 1):  # Subtract the projections on previous vectors
            h[j, k] = np.dot(Q[:, j], v)
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12  # If v is shorter than this threshold it is the zero vector
        if h[k + 1, k] > eps:  # Add the produced vector to the list, unless
            q = v / h[k + 1, k]  # the zero vector is produced.
            Q[:, k + 1] = q
        else:  # If that happens, stop iterating.
            return Q, h
    return Q, h
# Construct matrix A
N = 2**4
I = np.eye(N)
k = np.fft.fftfreq(N, 1.0 / N) + 0.5
alpha = np.linspace(0.1, 1.0, N)*2e2
c = np.fft.fft(alpha) / N
C = circulant(c)
A = np.einsum("i, ij, j->ij", k, C, k)
# Show that A is Hermitian
print(np.allclose(A, A.conj().T))
# Arbitrary (random) initial vector
np.random.seed(0)
v = np.random.rand(N)
# Perform Arnoldi iteration with complex A
_, h = arnoldi_iteration(A, v, N)
# Perform Arnoldi iteration with real A
_, h2 = arnoldi_iteration(np.real(A), v, N)
# Plot results
plt.subplot(121)
plt.imshow(np.abs(h))
plt.title("Complex A")
plt.subplot(122)
plt.imshow(np.abs(h2))
plt.title("Real A")
plt.tight_layout()
plt.show()
Result (two-panel plot of np.abs(h), with panels titled "Complex A" and "Real A"):
After browsing through some conference presentation slides, I realised that at some point Q had to be conjugated when A is complex. The correct algorithm is posted below for reference, with the code change marked (note that this correction has also been submitted to the Wikipedia entry):
import numpy as np
def arnoldi_iteration(A, b, n):
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=complex)  # np.complex was removed in NumPy 1.24
    Q = np.zeros((m, n + 1), dtype=complex)
    q = b / np.linalg.norm(b)
    Q[:, 0] = q
    for k in range(n):
        v = A.dot(q)
        for j in range(k + 1):
            h[j, k] = np.dot(Q[:, j].conj(), v)  # <-- Q needs conjugation!
            v = v - h[j, k] * Q[:, j]
        h[k + 1, k] = np.linalg.norm(v)
        eps = 1e-12
        if h[k + 1, k] > eps:
            q = v / h[k + 1, k]
            Q[:, k + 1] = q
        else:
            return Q, h
    return Q, h
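The effect of the conjugation can be checked numerically on a small random Hermitian matrix. The sketch below is a condensed variant of the corrected routine, not the Wikipedia code itself: it uses np.vdot (which conjugates its first argument), and the function name arnoldi, the test matrix and the tolerances are all assumptions made for illustration.

```python
import numpy as np

def arnoldi(A, b, n):
    """Arnoldi iteration with the conjugated inner product (via np.vdot)."""
    m = A.shape[0]
    h = np.zeros((n + 1, n), dtype=complex)
    Q = np.zeros((m, n + 1), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(n):
        v = A @ Q[:, k]
        for j in range(k + 1):
            h[j, k] = np.vdot(Q[:, j], v)   # conj(Q[:, j]) . v
            v = v - h[j, k] * Q[:, j]
        norm_v = np.linalg.norm(v)
        h[k + 1, k] = norm_v
        if norm_v <= 1e-12:                 # breakdown: stop early
            break
        Q[:, k + 1] = v / norm_v
    return Q, h

rng = np.random.default_rng(0)
m, n = 8, 6
G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
A = G + G.conj().T                          # Hermitian by construction
b = rng.standard_normal(m)

Q, h = arnoldi(A, b, n)
H = h[:n, :n]

# For Hermitian A, everything above the first superdiagonal should vanish
print(np.allclose(np.triu(H, 2), 0, atol=1e-8))
# The Krylov vectors should be orthonormal under the complex inner product
print(np.allclose(Q.conj().T @ Q, np.eye(n + 1), atol=1e-8))
```

Since h is upper Hessenberg by construction, a vanishing upper triangle beyond the first superdiagonal is exactly the tridiagonal structure the question expected.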