I want to define the following matrix
A = b * c ^T.
That is, the product of a column vector b and a transposed column vector c, which is a matrix.
In this case b and c have the same number of components, so the multiplication is possible.
The np.transpose(c) command did not really help me because when I did
import numpy as np
b = np.array([1,1])
c = np.array([0,1])
d = np.transpose(c)
A = b * d
print(A)
I received the vector [0,1], but I should be receiving a matrix, because a column vector multiplied by a transposed column vector yields a matrix.
What could I do instead?
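A minimal sketch of one way to get the matrix, assuming b and c stay 1-D arrays as in the question. Transposing a 1-D NumPy array is a no-op, so b * d is just an elementwise product. np.outer, or explicit (2, 1) and (1, 2) shapes, gives the outer product instead. The names A_outer, A_matmul and A_bcast are only illustrative.

```python
import numpy as np

b = np.array([1, 1])
c = np.array([0, 1])

# np.transpose(c) returns c unchanged for a 1-D array, so use the outer product.
A_outer = np.outer(b, c)              # A[i, j] = b[i] * c[j]

# Equivalent: turn b into a (2, 1) column and c into a (1, 2) row.
A_matmul = b[:, None] @ c[None, :]    # matrix product of column and row
A_bcast = b[:, None] * c[None, :]     # same values via broadcasting

print(A_outer)
```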
Related
I have two dense matrices A and B, each of size 3e5x100, and a sparse binary matrix C of size 3e5x3e5. I want to find the following quantity: C ∘ (AB'), where ∘ is the Hadamard (i.e., element-wise) product and B' is the transpose of B. Explicitly calculating AB' would require a huge amount of memory (~500GB). Since the end result doesn't need the whole of AB', it is sufficient to calculate the products A_iB_j' only where C_ij != 0, where A_i is row i of matrix A, B_j is row j of matrix B, and C_ij is the element at location (i,j) of matrix C. A suggested approach would be like the algorithm below:
# pseudocode
result = numpy.initalize_sparse_matrix(shape = C.shape)
while True:
    (i,j) = C_ij.pop_nonzero_index() # prototype function: returns the nonzero index and then points to the next nonzero index
    if (i,j) is empty:
        break
    result(i,j) = A_iB_j'
This algorithm, however, takes too much time. Is there any way to improve it using LAPACK/BLAS routines? I am coding in Python, so I think numpy could be a more human-friendly wrapper for LAPACK/BLAS.
You can do this computation using the following, assuming C is stored as a scipy.sparse matrix:
C = C.tocoo()
result_data = C.data * (A[C.row] * B[C.col]).sum(1)
result = sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)
Here we show that the result matches the naive algorithm for some smaller inputs:
import numpy as np
from scipy import sparse
N = 300
M = 10
def make_C(N, nnz=1000):
    data = np.random.rand(nnz)
    row = np.random.randint(0, N, nnz)
    col = np.random.randint(0, N, nnz)
    return sparse.coo_matrix((data, (row, col)), shape=(N, N))
A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)
def f_naive(C, A, B):
    return C.multiply(np.dot(A, B.T))
def f_efficient(C, A, B):
    C = C.tocoo()
    result_data = C.data * (A[C.row] * B[C.col]).sum(1)
    return sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)
np.allclose(
    f_naive(C, A, B).toarray(),
    f_efficient(C, A, B).toarray()
)
# True
And here we see that it works for the full input size:
N = 300000
M = 100
A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)
out = f_efficient(C, A, B)
print(out.shape)
# (300000, 300000)
print(out.nnz)
# 1000
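If C has many nonzeros, the temporary A[C.row] * B[C.col] (one row of length M per nonzero) can itself become large. Here is a hedged sketch of a chunked variant of the same idea. The function name masked_product_chunked and the chunk parameter are made up for illustration, and the chunk size would need tuning for the machine at hand.

```python
import numpy as np
from scipy import sparse

def masked_product_chunked(C, A, B, chunk=100_000):
    """Compute C ∘ (A @ B.T) on the nonzeros of C only, a chunk at a time."""
    C = C.tocoo()
    out = np.empty(C.nnz, dtype=np.result_type(C.dtype, A.dtype, B.dtype))
    for start in range(0, C.nnz, chunk):
        sl = slice(start, start + chunk)
        # Row-wise dot products A[i] . B[j] for this block of (i, j) pairs.
        dots = np.einsum('ij,ij->i', A[C.row[sl]], B[C.col[sl]])
        out[sl] = C.data[sl] * dots
    return sparse.coo_matrix((out, (C.row, C.col)), shape=C.shape)
```

Peak temporary memory is then roughly proportional to chunk * M rather than nnz * M.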
Given two 2-D tensors in PyTorch, A (a X m) and B (m X b), is there an efficient way to obtain a tensor C (m X a X b), where C[i,:,:] is the outer product of A[:,i] and B[i,:]?
Here I will give an example of the problem:
A = torch.FloatTensor([[1,2],[3,4]])
B = torch.FloatTensor([[1,2,3],[4,5,6]])
Result:
C = torch.FloatTensor([[[1,2,3],[3,6,9]],[[8,10,12],[16,20,24]]])
I have done it using a for-loop. However, it is very inefficient.
Look at torch.einsum, which can express this directly:
C = torch.einsum('im,mj->mij', A, B)
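torch.einsum uses the same subscript notation as np.einsum, so the index pattern can be sanity-checked with NumPy. This is a sketch on the example tensors from the question. The names C_einsum, C_bcast and C_loop are illustrative, and the broadcasting line is an alternative formulation, not something the answer above proposes.

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])            # shape (a, m) = (2, 2)
B = np.array([[1., 2., 3.], [4., 5., 6.]])    # shape (m, b) = (2, 3)

# C[m, i, j] = A[i, m] * B[m, j]: one outer product per shared index m.
C_einsum = np.einsum('im,mj->mij', A, B)      # shape (m, a, b)

# The same result via broadcasting: (m, a, 1) * (m, 1, b).
C_bcast = A.T[:, :, None] * B[:, None, :]

# Reference: the explicit loop over outer products.
C_loop = np.stack([np.outer(A[:, i], B[i, :]) for i in range(A.shape[1])])

print(C_einsum)
```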
np.linalg.solve() works great when you have an equation in the form of Ax = b
My problem is that I actually have an equation in the form of xC = D, where x is a 2x2 matrix I want to find out, and C and D are 2x2 matrices I'm given.
And because matrix multiplication is generally not commutative, I can't just swap the two around.
Is there an efficient way to solve this in numpy (or other library in python)?
Assuming C and D are invertible, x @ C = D is the same as D^-1 @ x @ C @ C^-1 = D^-1 @ D @ C^-1, which is D^-1 @ x = C^-1. That is in the form Ax = b, where A is np.linalg.pinv(D) and b is np.linalg.pinv(C),
which boils down to
x = D @ np.linalg.pinv(C)
which you could have gotten by just multiplying both sides of the equation on the right by the inverse of C.
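An alternative that avoids forming an inverse, sketched with made-up 2x2 values for C and D: transposing both sides turns x @ C = D into C.T @ x.T = D.T, which is exactly the Ax = b form that np.linalg.solve expects.

```python
import numpy as np

# Illustrative values only; any invertible C works.
C = np.array([[2., 1.], [1., 3.]])
D = np.array([[1., 2.], [3., 4.]])

# x @ C = D  <=>  C.T @ x.T = D.T, so solve for x.T and transpose back.
x = np.linalg.solve(C.T, D.T).T

print(np.allclose(x @ C, D))
```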
I need to calculate a double sum of the form:
wignersum{ell} = sum_{ell1} sum_{ell2} (2*ell1+1)(2*ell2+1) * W{ell,ell1,ell2}^2 * C1(ell1) * C2(ell2)
where wignersum is an array indexed by ell, and ell, ell1, and ell2 all run from 0 to ellmax. The W{ell,ell1,ell2}^2 are a set of known coefficients that I've already calculated (called w3j), stored in an array of shape (ellmax, ellmax, ellmax) as a global variable to be called by this function. (These coefficients are time intensive to calculate and I've found it faster to load them from a numpy file). The C1 and C2 are arrays of coefficients of shape (ellmax).
I have successfully calculated this sum using a double for loop, grabbing the appropriate elements from each preexisting array and updating the wignersum array in each iteration. I assume there is a better way to vectorize this problem to speed up the calculation. I thought about expanding the C1 and C2 arrays to the same shape as the w3j array, then multiplying these arrays elementwise before using np.sum on the ell1 and ell2 axes. I'm unsure whether this is in fact a good method of vectorizing, and if it is, how to actually do it.
The code as it stands is something like
import numpy as np
ell_max = 400
w3j = np.ones((ell_max, ell_max, ell_max))
C1 = np.arange(ell_max)
C2 = np.arange(ell_max)
def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2*i+1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:,ell1,ell2]
            wignersum += A * B * D
    return wignersum
(Note that in actuality C1 and C2 are not global variables but local variables that must be calculated from a set of parameters fed to function. This is not the limiting factor in the code speed, however.)
With the double for loop this takes ~1.5 seconds to run for ell_max~400 which is too long for the purposes I'm using it for. I'd like to vectorize this as much as possible to improve speed.
You can use either einsum or matrix multiplication for a ~20x speedup:
import numpy as np
ell_max = 400
w3j = np.random.randint(1,10,(ell_max, ell_max, ell_max))
C1 = np.random.randint(1,10,ell_max)
C2 = np.random.randint(1,10,ell_max)
def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2*i+1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:,ell1,ell2]
            wignersum += A * B * D
    return wignersum
def pp_es(l_mx):
    l = np.arange(l_mx)
    f = 2*l+1
    return np.einsum("i,i,j,j,kij",f,C1,f,C2,w3j,optimize=True)
def pp_mm(l_mx):
    l = np.arange(l_mx)
    f = 2*l+1
    return w3j.reshape(l_mx,-1)@np.outer(f*C1,f*C2).ravel()
from timeit import timeit
print(timeit(lambda:pp_es(400),number=10))
print(timeit(lambda:pp_mm(400),number=10))
print(timeit(lambda:function(400),number=10))
print((pp_mm(400)==pp_es(400)).all())
print((function(400)==pp_mm(400)).all())
Sample run:
0.6061844169162214 # einsum
0.6111843499820679 # matrix x vector
12.233918005018495 # OP
True # einsum == matrix x vector
True # OP == matrix x vector
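The same contraction can also be written as two successive matrix-vector products, which avoids building the outer product explicitly. This is a sketch with a smaller ell_max and random float data (both assumptions for illustration). No timing is claimed here, and with floating-point inputs the comparison should use np.allclose rather than ==.

```python
import numpy as np

ell_max = 50                                  # smaller than the 400 above, for a quick check
rng = np.random.default_rng(0)
w3j = rng.random((ell_max, ell_max, ell_max))
C1 = rng.random(ell_max)
C2 = rng.random(ell_max)

f = 2 * np.arange(ell_max) + 1                # the (2*ell + 1) factors

# Sum over ell2 first (last axis of w3j), then over ell1.
wignersum = (w3j @ (f * C2)) @ (f * C1)

# Reference: the einsum expression from the answer above.
reference = np.einsum('i,i,j,j,kij', f, C1, f, C2, w3j)
print(np.allclose(wignersum, reference))
```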
I can first obtain the DFT matrix of a given size, say n by
import numpy as np
n = 64
D = np.fft.fft(np.eye(n))
The FFT is of course just a quick algorithm for applying D to a vector:
x = np.random.randn(n)
ft1 = np.dot(D,x)
print( np.abs(ft1 - np.fft.fft(x)).max() )
# prints near double precision roundoff
The 2D FFT can be obtained by applying D to both the rows and columns of a matrix:
x = np.random.randn(n,n)
ft2 = np.dot(x, D.T) # Apply D to rows.
ft2 = np.dot(D, ft2) # Apply D to cols.
print( np.abs(ft2 - np.fft.fft2(x)).max() )
# near machine round off again
How do I compute this analogously for the 3 dimensional Discrete Fourier Transform?
I.e.,
x = np.random.randn(n,n,n)
ft3 = # dot operations using D and x
print( np.abs(ft3 - np.fft.fftn(x)).max() )
# prints near zero
Essentially, I think I need to apply D to each column vector in the volume, then each row vector in the volume, and finally each "depth vector". But I'm not sure how to do this using dot.
You can use the einsum expression to perform the transformation on each index:
x = np.random.randn(n, n, n)
ft3 = np.einsum('ijk,im->mjk', x, D)
ft3 = np.einsum('ijk,jm->imk', ft3, D)
ft3 = np.einsum('ijk,km->ijm', ft3, D)
print(np.abs(ft3 - np.fft.fftn(x)).max())
# 1.25571216554e-12
This can also be written as a single NumPy step:
ft3 = np.einsum('ijk,im,jn,kl->mnl', x, D, D, D, optimize=True)
Without the optimize argument (available in NumPy 1.12+), however, it will be very slow. You can also do each of the steps using dot, but it requires a bit of reshaping and transposing. In NumPy 1.14+ the einsum function will automatically detect the BLAS operations and do this for you.
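For the dot-style route, here is a sketch that uses np.tensordot plus np.moveaxis in place of manual reshaping and transposing. It applies D along one axis at a time, and a small n is assumed to keep the check quick.

```python
import numpy as np

n = 8
D = np.fft.fft(np.eye(n))
x = np.random.randn(n, n, n)

ft3 = x
for axis in range(3):
    # Contract D's second index with the chosen axis of ft3. tensordot puts
    # the new index first, so move it back to its original position.
    ft3 = np.moveaxis(np.tensordot(D, ft3, axes=(1, axis)), 0, axis)

print(np.abs(ft3 - np.fft.fftn(x)).max())
```

The printed difference should be near machine round-off, as in the 1-D and 2-D cases.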