Perform batch matrix multiplication with multiple weight matrices in PyTorch - python

I have a batch of matrices A with size torch.Size([batch_size, 9, 5]) and weight matrices B with size torch.Size([3, 5, 6]). In Keras, a simple K.dot(A, B) handles the matrix multiplication to give an output with size (batch_size, 9, 3, 6): each row of A is multiplied by each of the 3 matrices in B to form a (3x6) matrix.
How do you perform a similar operation in torch? From the documentation, torch.bmm requires that A and B have the same batch size, so I tried this:
B = B.unsqueeze(0).repeat((batch_size, 1, 1, 1))
B.size() # torch.Size([batch_size, 3, 5, 6])
torch.bmm(A,B) # gives an error
RuntimeError: invalid argument 2: expected 3D tensor, got 4D
Well, the error is expected but how do I perform such an operation?

You can describe the operation you want in Einstein notation as bxy,iyk->bxik, and use einsum to calculate it.
torch.einsum('bxy,iyk->bxik', (A, B)) will give you the answer you want.
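For concreteness, a runnable sketch with a small assumed batch size (batch_size=4 here):

```python
import torch

batch_size = 4
A = torch.rand(batch_size, 9, 5)
B = torch.rand(3, 5, 6)

# b: batch, x: rows of A, y: shared dim, i: weight-matrix index, k: output dim
out = torch.einsum('bxy,iyk->bxik', A, B)
print(out.shape)  # torch.Size([4, 9, 3, 6])
```

Each output slice out[b, x, i] is the row A[b, x] multiplied by the i-th weight matrix B[i].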

Related

Pytorch Batchwise block diagonal

I have two tensors containing batches of matrices of the same batch size (first dimension) but different matrix structure (all other dimensions).
For example A of shape (n,d,d) and B (n,e,e).
Now I would like to build block diagonals of A and B for all n,
so that the output has shape (n, d+e, d+e).
Is there an implementation for a problem like this?
I could only find torch.block_diag which is not suited for dimensions higher than 2.
Unfortunately there's no vectorized implementation; you'd have to loop through the batch:
A = torch.rand((2, 2, 2))
B = torch.rand((2, 3, 3))
C = torch.zeros((2, 5, 5))
for i in range(2):
    C[i] = torch.block_diag(A[i], B[i])
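That said, when the blocks are square (as in the question), the loop can be avoided by writing both blocks into a zeroed output with batched slicing — a sketch, not an official batched torch.block_diag:

```python
import torch

A = torch.rand(2, 2, 2)  # (n, d, d)
B = torch.rand(2, 3, 3)  # (n, e, e)
d, e = A.shape[-1], B.shape[-1]

# Zeroed output; place A in the top-left blocks and B in the bottom-right
C = torch.zeros(2, d + e, d + e)
C[:, :d, :d] = A
C[:, d:, d:] = B
```

The off-diagonal regions stay zero, which is exactly what block_diag produces per batch element.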

Different Matrix multiplication behaviour between Keras and Pytorch

I was trying to understand how matrix multiplication works over 2 dimensions in DL frameworks and I stumbled upon an article here.
The author used Keras to explain it, and it works for him.
But when I try to reproduce the same code in Pytorch, it fails with the error shown in the output of the following code.
Pytorch Code:
a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
c = torch.matmul(a,b)
print(c.shape)
Output: RuntimeError: The size of tensor a (2) must match the size of tensor b (7) at non-singleton dimension 0
Keras Code:
a = K.ones((2,3,4))
b = K.ones((7,4,5))
c = K.dot(a,b)
print(c.shape)
Output:(2, 3, 7, 5)
Can somebody explain what it is that I'm doing wrong?
Matrix multiplication (aka matrix dot product) is a well-defined algebraic operation taking two 2D matrices.
Deep-learning frameworks (e.g., tensorflow, keras, pytorch) are tuned to operate on batches of matrices; hence they usually implement batched matrix multiplication, that is, applying the matrix dot product to a batch of 2D matrices.
The examples you linked to show how matmul processes a batch of matrices:
a = tf.ones((9, 8, 7, 4, 2))
b = tf.ones((9, 8, 7, 2, 5))
c = tf.matmul(a, b)
Note how all but the last two dimensions are identical ((9, 8, 7)).
This is NOT the case in your example - the leading ("batch") dimensions are different, hence the error.
Using identical leading dimensions in pytorch:
a = torch.ones((2,3,4))
b = torch.ones((2,4,5))
c = torch.matmul(a,b)
print(c.shape)
results in
torch.Size([2, 3, 5])
If you insist on dot products with different batch dimensions, you will have to explicitly define how to multiply the two tensors. You can do that using the very flexible torch.einsum:
a = torch.ones((2,3,4))
b = torch.ones((7,4,5))
c = torch.einsum('ijk,lkm->ijlm', a, b)
print(c.shape)
Resulting in:
torch.Size([2, 3, 7, 5])

pytorch: How to do layer wise multiplication?

I have a tensor containing five 2x2 matrices, of shape (1,5,2,2), and a tensor containing 5 elements, of shape ([5]). I want to multiply each 2x2 matrix (in the former tensor) by the corresponding value (in the latter tensor). The resulting tensor should be of shape (1,5,2,2). How do I do that?
I get the following error when I run this code:
a = torch.rand(1,5,2,2)
print(a.shape)
b = torch.rand(5)
print(b.shape)
mul = a*b
RuntimeError: The size of tensor a (2) must match the size of tensor b (5) at non-singleton dimension 3
You can use either a * b or torch.mul(a, b), but you must use permute() before and after you multiply, in order to have compatible shapes for broadcasting:
import torch
a = torch.ones(1,5,2,2)
b = torch.rand(5)
a.shape # torch.Size([1, 5, 2, 2])
b.shape # torch.Size([5])
c = (a.permute(0,2,3,1) * b).permute(0,3,1,2)
c.shape # torch.Size([1, 5, 2, 2])
# OR #
c = torch.mul(a.permute(0,2,3,1), b).permute(0,3,1,2)
c.shape # torch.Size([1, 5, 2, 2])
The permute() function transposes the dimensions into the order given by its arguments. I.e., a.permute(0,2,3,1) has shape torch.Size([1, 2, 2, 5]), which is compatible with b (torch.Size([5])) for broadcasting, since the last dimension of the permuted a equals the dimension of b. After the multiplication we transpose back, using permute(0,3,1,2), to the desired shape of torch.Size([1, 5, 2, 2]).
You can read about permute() in the docs. Its arguments index the dimensions of the current shape [1, 5, 2, 2] from 0 to 3 and reorder them as listed: for a.permute(0,2,3,1), the first dimension stays in place (the first argument is 0), the second dimension moves to the fourth position (index 1 is the fourth argument), and the third and fourth dimensions move to the second and third positions. Remember that the 4th dimension, for instance, is referred to by the argument 3 (not 4).
EDIT
If you want to element-wise multiply tensors of shape [32,5,2,2] and [32,5], for example, such that each 2x2 matrix is multiplied by the corresponding value, you can rearrange the dimensions to [2,2,32,5] with permute(2,3,0,1), perform the multiplication with a * b, and then return to the original shape with permute(2,3,0,1) again (this particular permutation is its own inverse). The key here is that the trailing n dimensions of the first tensor need to align with the n dimensions of the second tensor; in our case n=2.
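Alternatively (this is not in the original answer), you can skip permute() entirely by inserting singleton dimensions into b, so that PyTorch's broadcasting aligns the batch dimension directly:

```python
import torch

a = torch.rand(1, 5, 2, 2)
b = torch.rand(5)

# View b as (1, 5, 1, 1) so each of its 5 values broadcasts over one 2x2 matrix
c = a * b.view(1, 5, 1, 1)
print(c.shape)  # torch.Size([1, 5, 2, 2])
```

This avoids the two transposes and reads closer to the intent of "multiply each matrix by its scalar".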
Hope that helps.

How to concatenate two tensors having different shape with TensorFlow?

Hello, I'm new to TensorFlow and I'd like to concatenate a 2D tensor to a 3D one. I don't know how to do it using TensorFlow functions.
tensor_3d = [[[1,2], [3,4]], [[5,6], [7,8]]] # shape (2, 2, 2)
tensor_2d = [[10,11], [12,13]] # shape (2, 2)
out: [[[1,2,10,11], [3,4,10,11]], [[5,6,12,13], [7,8,12,13]]] # shape (2, 2, 4)
I could make it work by using loops and new numpy arrays, but that way I wouldn't be using TensorFlow transformations. Any suggestions on how to make this possible? I don't see how transformations like tf.expand_dims or tf.reshape may help here...
Thanks for sharing your knowledge.
This should do the trick:
import tensorflow as tf
a = tf.constant([[[1,2], [3,4]], [[5,6], [7,8]]])
b = tf.constant([[10,11], [12,13]])
c = tf.expand_dims(b, axis=1) # Add dimension
d = tf.tile(c, multiples=[1,2,1]) # Duplicate in this dimension
e = tf.concat([a,d], axis=-1) # Concatenate on innermost dimension
with tf.Session() as sess:
    print(e.eval())
Gives:
[[[ 1  2 10 11]
  [ 3  4 10 11]]

 [[ 5  6 12 13]
  [ 7  8 12 13]]]
There is actually a different trick that is used from time to time in code bases such as OpenAI's baselines.
Suppose you have two tensors for your gaussian policy. mu and std. The standard deviation has the same shape as mu for batch size 1, but because you use the same parameterized standard deviation for all actions, when the batch size is larger than 1 the two would differ:
mu : Size<batch_size, feat_n>
std: Size<1, feat_n>
In this case a simple thing to do (as the OpenAI baselines do) is:
params = tf.concat([mu, mu * 0 + std], axis=-1)
The mu * 0 + std term broadcasts std up to the same shape as mu (note that tf.concat requires an axis argument).
Enjoy, and good luck training!
ps: numpy's and tensorflow's concat operators do not automagically apply broadcasting because, according to the maintainers, when the shapes of two tensors don't match it is usually the result of a programming error. This is not a big deal in numpy because computations are evaluated eagerly, but with tensorflow it means you have to explicitly broadcast the lower-rank tensor (or the one with shape [1, *_]) by hand, e.g. using the tf.shape operator.
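The same "broadcast by hand" rule can be checked in NumPy, where np.concatenate likewise refuses to broadcast — a sketch of the first answer's expand-and-tile pattern using np.broadcast_to instead of tf.tile:

```python
import numpy as np

a = np.arange(8).reshape(2, 2, 2)        # stands in for the 3D tensor
b = np.array([[10, 11], [12, 13]])       # the 2D tensor

# concatenate does not broadcast, so broadcast b explicitly first:
# insert a middle axis, then expand it to match a's shape
d = np.broadcast_to(b[:, None, :], (2, 2, 2))
e = np.concatenate([a, d], axis=-1)
print(e.shape)  # (2, 2, 4)
```

np.broadcast_to returns a view, so this does the explicit broadcast without copying until the concatenate itself.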

is there a way to broadcast tensor in tensordot operation in tensorflow?

I want to multiply stacked matrices expressed in tensor form.
tensor.shape == [2,5,7,6]
where 2 and 5 are batch dimensions,
tensor2.shape == [5,6,8]
where 5 is the batch size.
In numpy, tensor2's batch dimension is automatically broadcast against tensor's,
so I can simply use np.matmul(tensor, tensor2).
But in tensorflow, an error occurs.
I tried tf.expand_dims(tensor2, 0) but this doesn't work either.
Is there any way to broadcast a tensor in tensorflow?
You could use tf.einsum:
tf.einsum('abij,bjk->abik', tensor, tensor2)
Example:
import tensorflow as tf
x = tf.zeros((2, 5, 7, 6))
y = tf.zeros((5, 6, 8))
z = tf.einsum('abij,bjk->abik', x, y)
z.shape.as_list()
# returns [2, 5, 7, 8]
The most general and appropriate way to tackle such problems is to use tf.einsum. This function allows you to specify the multiplication rules directly in Einstein notation, which was invented to operate with tensors of arbitrary dimensions.
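As a sanity check of the broadcasting the question describes, np.matmul treats the last two dimensions as matrices and broadcasts the leading batch dimensions, so (2,5,7,6) @ (5,6,8) yields (2,5,7,8) directly (newer versions of tf.matmul and torch.matmul broadcast batch dimensions the same way):

```python
import numpy as np

x = np.zeros((2, 5, 7, 6))
y = np.zeros((5, 6, 8))

# Batch dims (2, 5) vs (5,) broadcast to (2, 5); matrices are (7,6) @ (6,8)
z = np.matmul(x, y)
print(z.shape)  # (2, 5, 7, 8)
```

This matches the einsum result above, which spells out the same pairing explicitly.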
