Multidimensional matrix multiplication in python - python
I have matrix A of dimension 500x2000x30 and matrix B of dimension 30x5.
You can think of A as 500 instances of a 2000x30 matrix, since A has dimension 500x2000x30.
I want to multiply each 2000x30 slice of A with matrix B to obtain a new slice of size 2000x5.
i.e. A x B should give me a matrix of dimension 500x2000x5.
Obviously, looping 500 times through matrix A is a solution, but is there a more efficient way to achieve this?
Edit: Both A and B are numpy arrays
If you have numpy arrays you can use the np.dot function for this:
np.dot(A, B)
It will do exactly what you want, i.e. "contract" the last axis of A with the first axis of B:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
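A minimal sketch with scaled-down shapes (50x200x30 instead of 500x2000x30, so it runs instantly); the `@` operator (np.matmul) gives the same result here, since it treats A as a stack of 2-D matrices:

```python
import numpy as np

# Scaled-down stand-ins for the question's arrays
A = np.random.rand(50, 200, 30)   # question: 500 x 2000 x 30
B = np.random.rand(30, 5)

# Contracts the last axis of A (length 30) with the first axis of B
C = np.dot(A, B)
print(C.shape)          # (50, 200, 5)

# The @ operator treats A as a stack of 50 matrices of shape (200, 30)
# and broadcasts B against each one, giving the same result here:
assert np.allclose(C, A @ B)
```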
First of all, any multidimensional array is just a nested split of a flat array. The 2x3x2 array [ [ [A,B],[C,D],[E,F] ], [ [G,H],[I,J],[K,L] ] ] is simply the flat sequence A,B,C,D,E,F,G,H,I,J,K,L divided into 2 pieces, then 3 pieces, then 2 pieces again; the innermost vectors here have length 2.
Secondly, two-dimensional matrix multiplication is fixed-length vector multiplication over all combinations: an (n,m) x (m,l) matrix product is the term-by-term multiplication and summation of l different m-vectors against n different m-vectors, i.e. n times l combinations of two m-vectors.
Once you see that, you can convert any N-dimensional array into a two-dimensional matrix with the same vector size, multiply, and reshape the result back.
Quick example:
import numpy as np

a = np.random.random(120).reshape(2, 5, 3, 4)
b = np.random.random(120).reshape(5, 3, 4, 2)

# bring b's contracted axis (length 4) to the end, then flatten:
# every combination becomes a series of 4-dimensional vectors
br = np.rollaxis(b, 3, 2).reshape(30, 4).T

test1 = np.matmul(a.reshape(30, 4), br).reshape(2, 5, 3, 5, 3, 2)
test2 = np.dot(a, b)
np.allclose(test1, test2)
returns
True
Let's do it explicitly for all combinations (dimensions):
sa = a.shape
sb = b.shape
res = np.empty([sa[0], sa[1], sb[0], sb[1], sa[2], sb[3]])
for i in range(sa[0]):
    for j in range(sa[1]):
        for k in range(sb[0]):
            for n in range(sb[1]):
                res[i, j, k, n] = np.matmul(a[i, j], b[k, n])
# np.dot(a, b) orders the axes as (i, j, row, k, n, col),
# so reorder them before comparing
print(np.allclose(res, np.dot(a, b).transpose(0, 1, 3, 4, 2, 5)))
prints
True
USE CASE:
Suppose, for example, that for 10 different materials and 20 different geometries you need to matrix-multiply 3 different physical quantities given as 3-dimensional vectors, inside a geometry/physics neural-network layer trained with a genetic selection algorithm whose connection-coefficient vectors span 5 population groups, each with 100 gene sequences and 20 neural nodes. In such a case you may benefit from calculating everything at once, or you may want to serialize the calculation into two flat arrays and send it to your GPU or CPU, depending on how much free RAM you have available. Either way, you need to understand how the calculation works.
In any multidimensional matrix calculation, combinations of vectors are computed term by term.
You might instead want to multiply the first term with the second vector's last term and sum the rest; that is up to you, but understanding how it works is what matters.
So here is a simple illustration I have used to understand this.
[ [ [A,B],[C,D],[E,F] ], [ [G,H],[I,J],[K,L] ] ] --> ABCDEFGHIJKL
[ [ [1,2],[3,4],[5,6] ], [ [7,8],[9,0],[€,$] ] ] --> 1234567890€$
Apply the operator (multiply) term by term, shifting the first array by the vector size (2) each time:
ABCDEFGHIJKL CDEFGHIJKLAB FGHIJKLABCDE ...
1234567890€$ 1234567890€$ 1234567890€$ ...
Here come all the combinations;
append all of them, reshape, and apply the other operator (+):
[A*1+B*2], [C*3+D*4], [E*5+F*6] ...... [I*€+J*$]
Hope this helps and saves you time grasping it.
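The shifting picture above is just the index contraction in the np.dot formula quoted earlier; np.einsum spells the same contraction out explicitly (a sketch reusing the shapes from the quick example):

```python
import numpy as np

a = np.random.random(120).reshape(2, 5, 3, 4)
b = np.random.random(120).reshape(5, 3, 4, 2)

# dot(a, b)[i,j,k,l,m,n] = sum over the shared length-4 axis x
explicit = np.einsum('ijkx,lmxn->ijklmn', a, b)

assert np.allclose(explicit, np.dot(a, b))
```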
Related
Most efficient way to calculate every L2 distance between vectors of vector array A and vectors of vector array B?
I need to implement an algorithm, but it takes a lot of time to compute and I need to make it as fast as possible. Right now I have two numpy arrays: array A with 2000 vectors of 512 elements, and array B with 1000 vectors of 512 elements. I need to calculate every single distance between the vectors from array A and the vectors from array B. Right now, I take one vector from array A and calculate its distances to all vectors in array B as follows:
np.sum(np.abs(B - A[0])**2, axis=-1)**0.5
But with this I have to loop for 2000 cycles and it takes a lot of time. Any alternatives?
sklearn.metrics.pairwise_distances solves exactly this problem.
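The same distances can also be computed with plain numpy broadcasting, which removes the 2000-cycle loop entirely; a sketch with scaled-down sizes (note the intermediate difference array is N x M x D, so watch memory at full size):

```python
import numpy as np

A = np.random.rand(200, 64)   # question: 2000 vectors of 512 elements
B = np.random.rand(100, 64)   # question: 1000 vectors of 512 elements

# A[:, None, :] has shape (200, 1, 64); subtracting B (100, 64)
# broadcasts to (200, 100, 64), then we reduce the last axis.
dists = np.sqrt(((A[:, None, :] - B) ** 2).sum(axis=-1))
print(dists.shape)            # (200, 100)

# Row 0 matches the questioner's one-vector-at-a-time formula:
row0 = np.sum(np.abs(B - A[0]) ** 2, axis=-1) ** 0.5
assert np.allclose(dists[0], row0)
```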
set dimension of svd algorithm in python
SVD formula: A ≈ UΣV*. I use numpy.linalg.svd to run the SVD algorithm, and I want to set the dimensions of the resulting matrices. For example: A is 3x5; after running numpy.linalg.svd, U is 3x3, Σ is returned as a length-3 vector, and V* is 5x5. I need specific dimensions like U = 3x64 and V* = 64x5, but there seems to be no optional dimension parameter in numpy.linalg.svd.
If A is a 3 x 5 matrix then it has rank at most 3, so the SVD of A contains at most 3 singular values. Note that in your example above, the singular values are stored as a vector instead of a diagonal matrix. Trivially, this means that you can pad your matrices with zeroes at the bottom. Since the full S matrix consists of 3 values on the diagonal followed by zeros everywhere else (in your case it would be 64x64 with 3 nonzero values), the bottom rows of V and the right columns of U don't interact at all and can be set to anything you want. Keep in mind that this isn't the SVD of A anymore, but the condensed SVD of the matrix augmented with a lot of 0's.
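A sketch of that padding (64 is the target dimension from the question; the padded factors still reproduce A exactly, even though they are no longer a true SVD):

```python
import numpy as np

A = np.random.rand(3, 5)
U, s, Vt = np.linalg.svd(A)           # U: (3,3), s: (3,), Vt: (5,5)

k = 64                                # desired inner dimension
U_pad = np.zeros((3, k));  U_pad[:, :3] = U
S_pad = np.zeros((k, k));  S_pad[:3, :3] = np.diag(s)
Vt_pad = np.zeros((k, 5)); Vt_pad[:5, :] = Vt

# The extra rows/columns only ever multiply zeros in S_pad,
# so the product is unchanged:
assert np.allclose(U_pad @ S_pad @ Vt_pad, A)
```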
How do I organize an uneven matrix of many calculations in pandas / numpy
I am calculating a model that requires a large number of calculations in a big matrix, representing interactions between households (numbering N, roughly 10E4) and firms (numbering M, roughly 10E4). In particular, I want to perform the following steps:
1. X2 is an N x M matrix representing the pairwise distance between each household and each firm. The step is to multiply every entry by a parameter gamma.
2. delta is a vector of length M. The step is to broadcast-multiply delta into the rows of the matrix from 1.
3. Exponentiate the matrix from 2.
4. Calculate the row sums of the matrix from 3.
5. Broadcast division by the row-sum vector from 4 into the rows of the matrix from 3.
6. w is a vector of length N. The step is to broadcast-multiply w into the columns of the matrix from 5.
7. The final step is to take the column sums of the matrix from 6.
These steps have to be performed 1000s of times in the context of matching the model simulation to data. At the moment, I have an implementation using a big NxM numpy array and matrix algebra operations to perform the steps as described above. I would like to reduce the number of calculations by eliminating all the "cells" where the distance is greater than some critical value r. How can I organize my data to do this, while still performing all the operations I need (exponentiation, row/column sums, broadcasting across rows and columns)? The solution I have in mind is something like storing the distance matrix in "long form", with each row representing a household / firm pair rather than the N x M matrix, deleting all the invalid rows to get an array whose length is somewhat less than NM, and then performing all the calculations in this format. In this solution I am wondering if I can use pandas dataframes to make the "broadcasts" and "row sums" work properly (and quickly). How can I make that work? (Or alternately, if there is a better way I should be exploring, I'd love to know!)
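For reference, the dense version of the seven steps is a handful of broadcasting one-liners (gamma and all arrays below are hypothetical, scaled-down stand-ins); for the cutoff idea, the same operations map onto a scipy.sparse matrix or a long-form pandas DataFrame with groupby sums:

```python
import numpy as np

# Hypothetical scaled-down stand-ins (question: N, M ~ 1e4)
N, M = 40, 30
rng = np.random.default_rng(0)
X2 = rng.random((N, M))            # pairwise household-firm distances
gamma = -2.0                       # hypothetical parameter value
delta = rng.random(M)
w = rng.random(N)

m1 = gamma * X2                    # 1. scale every entry by gamma
m2 = m1 * delta                    # 2. delta (M,) broadcasts into each row
m3 = np.exp(m2)                    # 3. exponentiate
rowsum = m3.sum(axis=1, keepdims=True)   # 4. row sums, shape (N, 1)
m5 = m3 / rowsum                   # 5. divide each row by its sum
m6 = m5 * w[:, None]               # 6. w (N,) broadcasts into each column
result = m6.sum(axis=0)            # 7. column sums, length M
assert result.shape == (M,)
```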
Covariance matrix for 9 arrays using np.cov
I have 9 different numpy arrays that denote the same quantity, in our case xi. They are of length 19 each, i.e. they have been binned. The difference between these 9 arrays is that they have been calculated using jackknife resampling, i.e. by omitting some elements each time and repeating the same procedure 9 times. I would now like to calculate the covariance matrix, which should be of size 19x19. The square root of the diagonal elements of this covariance matrix should give me the error on this quantity (xi) for each bin (19 bins overall). The covariance matrix is given by the standard jackknife formula, where xi is the quantity and i and j index the 19 bins. I didn't want to write the code manually, so I tried numpy.cov:
vstack = np.vstack((array1, array2, ...., array9))
cov = np.cov(vstack)
This is giving me a matrix of size 9x9 instead of 19x19. What is the mistake here? Each array, i.e. array1, array2, etc., is of length 19.
As you can see in the example in the docs, the output has shape (number of rows) x (number of rows). Therefore, when you have 9 rows you get a 9x9 matrix. If you expect a 19x19 matrix, then you have mixed up your rows and columns and should use the transpose:
vst = np.vstack((array1, array2, ...., array9))
cov_matrix = np.cov(vst.T)
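A sketch with random stand-in data; np.cov's rowvar parameter avoids the explicit transpose. (Note that np.cov normalizes by n-1, while the jackknife covariance usually carries a factor of (n-1)/n, so you may need to rescale.)

```python
import numpy as np

# 9 jackknife resamplings of a quantity binned into 19 bins
samples = np.random.rand(9, 19)

cov9 = np.cov(samples)                   # 9 row-variables -> (9, 9)
cov19 = np.cov(samples.T)                # 19 bin-variables -> (19, 19)
assert cov9.shape == (9, 9) and cov19.shape == (19, 19)

# Equivalent without the transpose:
assert np.allclose(cov19, np.cov(samples, rowvar=False))

errors = np.sqrt(np.diag(cov19))         # per-bin error estimates, length 19
```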
Null matrix with constant diagonal, with same shape as another matrix
I'm wondering if there is a simple way to create a scalar-diagonal matrix in numpy: essentially an nxn matrix with the constant 40 on the diagonal and zeros elsewhere. Is there a simple function to scale an identity matrix like this? Or how would I go about making a matrix with the same shape as another matrix and filling in its diagonal? Sorry if this seems a bit basic, but for some reason I couldn't find this in the docs.
If you want a matrix with 40 on the diagonal and zeros everywhere else, you can use NumPy's fill_diagonal() on a matrix of zeros:
N = 100
value = 40
b = np.zeros((N, N))
np.fill_diagonal(b, value)
This involves only setting elements to a certain value, and is therefore likely to be faster than code that multiplies all the elements of a matrix by a constant. It also has the advantage of showing explicitly that you fill the diagonal with a specific value. If you want the diagonal matrix b to be the same size as another matrix a, you can use the following shortcut (no need for an explicit size N):
b = np.zeros_like(a)
np.fill_diagonal(b, value)
Easy:
N = 100
a = np.eye(N)   # diagonal identity 100x100 array
b = 40*a        # multiply by a scalar
If you actually want a numpy matrix rather than an array, you can do a = np.asmatrix(np.eye(N)) instead. But in general, * is element-wise multiplication in numpy.