I have a tensor containing five 2x2 matrices, of shape (1,5,2,2), and a tensor containing 5 elements, of shape (5,). I want to multiply each 2x2 matrix (in the former tensor) by the corresponding value (in the latter tensor). The resulting tensor should have shape (1,5,2,2). How can I do that?
I get the following error when I run this code:
a = torch.rand(1,5,2,2)
print(a.shape)
b = torch.rand(5)
print(b.shape)
mul = a*b
RuntimeError: The size of tensor a (2) must match the size of tensor b (5) at non-singleton dimension 3
You can use either a * b or torch.mul(a, b), but the shapes must first be made compatible for broadcasting. One way is to use permute() before and after you multiply:
import torch
a = torch.ones(1,5,2,2)
b = torch.rand(5)
a.shape # torch.Size([1, 5, 2, 2])
b.shape # torch.Size([5])
c = (a.permute(0,2,3,1) * b).permute(0,3,1,2)
c.shape # torch.Size([1, 5, 2, 2])
# OR #
c = torch.mul(a.permute(0,2,3,1), b).permute(0,3,1,2)
c.shape # torch.Size([1, 5, 2, 2])
The permute() function reorders the dimensions according to its arguments. That is, a.permute(0,2,3,1) has shape torch.Size([1, 2, 2, 5]), which is compatible with the shape of b (torch.Size([5])) for element-wise multiplication with broadcasting, since the last dimension of a equals the only dimension of b. After the multiplication we reorder the dimensions again, using permute(0,3,1,2), to get the desired shape of torch.Size([1, 5, 2, 2]).
You can read about permute() in the docs. Its arguments refer to the dimensions of the current shape [1, 5, 2, 2], numbered 0 to 3, and the result has those dimensions in the order you list them. For a.permute(0,2,3,1), the first dimension stays in place, since the first argument is 0; the second dimension moves to the fourth position, since index 1 is the fourth argument; and the third and fourth dimensions move to the second and third positions, because indices 2 and 3 are the second and third arguments. Remember that the 4th dimension, for instance, is represented by the argument 3 (not 4).
EDIT
If you want to element-wise multiply tensors of shape [32,5,2,2] and [32,5] for example, such that each 2x2 matrix is multiplied by the corresponding value, you could rearrange the dimensions as [2,2,32,5] by permute(2,3,0,1), then perform the multiplication by a * b, and then return to the original shape by permute(2,3,0,1) again. The key here is that the last n dimensions of the first tensor need to align with the n dimensions of the second tensor. In our case n=2.
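As a rough sketch of the batched case above: the shapes [32,5,2,2] and [32,5] come from the example, while the variable names and the comparison against an explicitly indexed b are my own additions for illustration.

```python
import torch

a = torch.rand(32, 5, 2, 2)
b = torch.rand(32, 5)

# Move the two matrix dimensions to the front so the trailing dims are [32, 5].
a_perm = a.permute(2, 3, 0, 1)          # shape [2, 2, 32, 5]
c = (a_perm * b).permute(2, 3, 0, 1)    # back to shape [32, 5, 2, 2]

# Alternative (an assumption, not part of the answer above): give b two
# trailing singleton dimensions so it broadcasts against a directly.
c_alt = a * b[:, :, None, None]         # b viewed as [32, 5, 1, 1]

print(c.shape)
print(torch.allclose(c, c_alt))
```

Both routes rely on the same broadcasting rule; the second just avoids the round trip through permute().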
Hope that helps.
I try to run the code like below:
>>> import numpy as np
>>> A = np.array([[1,2], [3,4], [5,6]])
>>> A.shape
(3, 2)
>>> B = np.array([7,8])
>>> B.shape
(2,)
>>> np.dot(A,B)
array([23, 53, 83])
I thought the shape of np.dot(A,B) should be (1,3) not (3,).
The result returned should be a matrix:
array([[23],[53],[83]])
23
53
83
not
array([23,53,83])
23 53 83
Why does this result occur?
As its name suggests, the primary purpose of the numpy.dot() function is to deliver a scalar result by performing a traditional linear algebra dot product on two arrays of identical shape (m,).
Given this primary purpose, the documentation of numpy.dot() lists this scenario first (the first bullet point below):
numpy.dot(a, b, out=None)
1. If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
2. If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
3. If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
4. If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
Your case is covered by the 4th bullet point above (as pointed out by @hpaulj in his comments).
But then, it still does not fully answer your question as to why the result has shape (3,), and not (3,1) as you expected.
You are justified in expecting a result-shape of (3,1), only if shape of B is (2,1). In such a case, since A has shape (3,2), and B has shape (2,1), you'd be justified in expecting a result-shape of (3,1).
But here, B has a shape of (2,), and not (2,1). So, we are now in a territory that's outside the jurisdiction of the usual rules of matrix multiplication. So, it's really up to the designers of the numpy.dot() function as to how the result turns out to be. They could've chosen to treat this as an error ("dimension mis-match"). Instead, they've chosen to deal with this scenario, as described in this answer.
I'm quoting that answer, with some modifications to relate it to your code:
According to numpy a 1D array has only 1 dimension and all checks
are done against that dimension. Because of this we find that np.dot(A,B)
checks second dimension of A against the one dimension of B
So, the check would succeed, and numpy wouldn't treat this as an error.
Now, the only remaining question is why is the result-shape (3,) and not (3,1) or (1,3).
The answer to this is: in A, which has shape (3,2), we have consumed the last part (2,) to perform sum-product. The un-consumed part of A's shape is (3,), and hence the shape of the result of np.dot(A,B), would be (3,). To understand this further, if we take a different example in which A has a shape of (3,4,2), instead of (3,2), the un-consumed part of A's shape would be (3,4,), and the result of np.dot(A,B) would be (3,4,) instead of (3,) which your example produced.
Here's the code for you to verify:
import numpy as np
A = np.arange(24).reshape(3,4,2)
print ("A is:\n", A, ", and its shape is:", A.shape)
B = np.array([7,8])
print ("B is:\n", B, ", and its shape is:", B.shape)
C = np.dot(A,B)
print ("C is:\n", C, ", and its shape is:", C.shape)
The output of this is:
A is:
 [[[ 0  1]
  [ 2  3]
  [ 4  5]
  [ 6  7]]

 [[ 8  9]
  [10 11]
  [12 13]
  [14 15]]

 [[16 17]
  [18 19]
  [20 21]
  [22 23]]] , and its shape is: (3, 4, 2)
B is:
 [7 8] , and its shape is: (2,)
C is:
 [[  8  38  68  98]
 [128 158 188 218]
 [248 278 308 338]] , and its shape is: (3, 4)
Another helpful perspective to understand the behavior in this example is below:
The array A of shape (3,4,2) can be conceptually visualized as an outer array of inner arrays, where the outer array has shape (3,4), and each inner array has shape (2,). On each of these inner arrays, the traditional dot product is performed using the array B (which has shape (2,)), and the resulting scalars are all left in their own respective places, to form a (3,4) shape (the outer array's shape). So, the overall result of numpy.dot(A,B), consisting of all these in-place scalar results, has the shape (3,4).
In wiki
So (3, 2) dot with (2,1) will be (3,1)
How to fix
np.dot(A,B[:,None])
Out[49]:
array([[23],
[53],
[83]])
I just learned this dot product from Neural Network...
Anyway, it is the dot product between "1d" array and "nd" array.
For each row of A, it calculates the sum of the element-wise products with B; for the first row this is "1*7 + 2*8" = 23.
A.shape is (3, 2) and B.shape is (2,). In this situation, rule #4 for the dot operation applies directly to np.dot(A,B):
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
Because the alignment will happen between B's 2 (only axis of B) and A's 2 (last axis of A) and 2 indeed equals 2, numpy will judge that this is absolutely legitimate for dot operation. Therefore these two "2" are "consumed", leaving A's (3,) "in the wild". This (3,) will therefore be the shape of the result.
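A small sketch of the shape behaviour described above, using the A and B from the question. The names r_1d, r_col, and r_3d are illustrative only.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])   # shape (3, 2)
B = np.array([7, 8])                     # shape (2,)

# 1-D B: the last axis of A is "consumed", leaving shape (3,)
r_1d = np.dot(A, B)

# 2-D column B of shape (2, 1): ordinary matrix multiplication, shape (3, 1)
r_col = np.dot(A, B[:, None])

# N-D A with 1-D B: only the last axis is consumed, leaving shape (3, 4)
r_3d = np.dot(np.arange(24).reshape(3, 4, 2), B)

print(r_1d.shape, r_col.shape, r_3d.shape)
```

If a column vector is what you need, reshaping B before the call (as in B[:, None]) or reshaping the result afterwards both work.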
I am following a tutorial to implement the K-nearest Neighbor algorithm on a dataset.
I have an array of shape (6003,) and I want to do this:
data = data.reshape((data.shape[0], 3072))
However, I am getting this error:
cannot reshape array of size 6003 into shape (6003,3072)
Any help on this, please? Thanks!
When you reshape a NumPy array, the total number of elements must not change.
E.g. for a = [2,3,4,5,1,7], if you want to reshape it into a 2-D array, the product of the new dimensions must equal the total number of elements in the original array a.
This means you can reshape array a into the shapes (1,6), (2,3), (6,1), or (3,2).
The title of your question does give away the error, by the way.
Reshaping an array of shape (x,) into an array of shape (x,y)
is impossible (unless y is 1) because it would need more elements than your original data contains.
An array of shape (x,) can only be reshaped into an array of shape (x/y,y), where y divides x evenly.
I hope this helps.
You are trying to reshape into an incompatible shape. Now, what do I mean by that? Look at this example:
a = np.array([[1, 2, 3],
[4, 5, 6],
])
The shape of this array is:
a.shape
>> (2, 3)
Array a has 2 x 3 = 6 elements. Let's try to reshape it into a (2, 6) array
a.reshape(2, 6)
This raises
>> ValueError: cannot reshape array of size 6 into shape (2,6)
Notice that we were trying to turn an array that has 2 x 3 = 6 elements into an array that would have 2 x 6 = 12 elements. But NumPy cannot add those extra elements to your original array to give it your desired shape. So it raises ValueError.
In your case, you are trying to make an array with 6003 elements into an array that will have 6003 x 3072 = 18441216 elements!
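As a quick sketch of the element-count rule, here is a small check you could run before reshaping. The helper name can_reshape is hypothetical, not a NumPy function.

```python
import numpy as np

def can_reshape(arr, new_shape):
    """Return True if new_shape holds exactly as many elements as arr."""
    return arr.size == int(np.prod(new_shape))

a = np.arange(6)                 # 6 elements, shape (6,)

print(can_reshape(a, (2, 3)))    # 2 * 3 == 6
print(can_reshape(a, (2, 6)))    # 2 * 6 != 6

# Passing -1 lets NumPy infer one dimension from the element count.
b = a.reshape(-1, 3)
print(b.shape)
```

Using -1 still raises ValueError if the remaining dimensions do not divide the element count evenly.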
I have a batch of matrices A with size torch.Size([batch_size, 9, 5]) and weight matrices B with size torch.Size([3, 5, 6]). In Keras, a simple K.dot(A, B) is able to handle the matrix multiplication to give an output with size (batch_size, 9, 3, 6). Here, each row in A is multiplied by the 3 matrices in B to form a (3x6) matrix.
How do you perform a similar operation in torch? From the documentation, torch.bmm requires that A and B have the same batch size, so I tried this:
B = B.unsqueeze(0).repeat((batch_size, 1, 1, 1))
B.size() # torch.Size([batch_size, 3, 5, 6])
torch.bmm(A,B) # gives an error
RuntimeError: invalid argument 2: expected 3D tensor, got 4D
Well, the error is expected but how do I perform such an operation?
You can use Einstein notation to describe the operation you want as bxy,iyk->bxik. So, you can use einsum to calculate it.
torch.einsum('bxy,iyk->bxik', (A, B)) will give you the answer you want.
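To check the einsum expression, here is a minimal sketch that compares it with a broadcasting matmul. The batch size of 4 and the matmul-based comparison are assumptions added for illustration.

```python
import torch

batch_size = 4                       # arbitrary value for the demo
A = torch.rand(batch_size, 9, 5)
B = torch.rand(3, 5, 6)

# out[b, x, i, k] = sum over y of A[b, x, y] * B[i, y, k]
out = torch.einsum('bxy,iyk->bxik', A, B)

# Same computation via broadcasting: (b, 1, 9, 5) @ (3, 5, 6) -> (b, 3, 9, 6),
# then swap the middle axes to get (b, 9, 3, 6).
ref = torch.matmul(A[:, None], B).permute(0, 2, 1, 3)

print(out.shape)
print(torch.allclose(out, ref, atol=1e-6))
```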
I am trying to add a new column to my image dataset.
Sample Code:
import numpy as np
A = np.arange(240).reshape(3,4,4,5)
print(type(A))
print(A.shape)
B = np.concatenate([A, np.ones((A.shape[0],4,4,5,1),dtype=int)], axis=1)
print(B.shape)
Gives error:
ValueError: all the input arrays must have same number of dimensions
Context:
Consider this as m samples of read images (nH=height, nW=width, nC=channels).
The dataset has shape (m, nH, nW, nC), and now I want to add an additional column reflecting whether the image is a "good" example or a "bad" example of an object.
Thus, I want to create a dataset with the label added, to form the shape (m,nH,nW,nC,l), where l stands for the label and can have the value 0 or 1.
How can I achieve this? Thanks in advance.
Even simpler without reshaping :
A = np.random.rand(3, 4, 4, 5)
B = A[None] # Insert a new dimension at the beginning, shape (1, 3, 4, 4, 5)
B = A[:,:,None] # Insert a new dimension in the middle, shape (3, 4, 1, 4, 5)
B = A[:,:,:,:,None] # Insert a new dimension at the end, shape (3, 4, 4, 5, 1)
Basically, the position of None indicates where to add the new dimension.
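For comparison, here is a short sketch of equivalent spellings: None is the same object as np.newaxis, and np.expand_dims does the same job with an explicit axis argument. The variable names are illustrative.

```python
import numpy as np

A = np.random.rand(3, 4, 4, 5)

end_none = A[:, :, :, :, None]             # shape (3, 4, 4, 5, 1)
end_ellipsis = A[..., np.newaxis]          # same result, shorter indexing
end_expand = np.expand_dims(A, axis=-1)    # same result via a function call

print(end_none.shape, end_ellipsis.shape, end_expand.shape)
print(np.newaxis is None)
```

All three return views of A, so no data is copied.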
You don't need to add the fifth column explicitly. Just reshape and add the fifth dimension.
import numpy as np
A = np.arange(240).reshape(3,4,4,5,1) # add the fifth dimension here
print(type(A))
print(A.shape)
To set the "good" or "bad" label, just access the last dimension of A
I don't understand broadcasting. The documentation explains the rules of broadcasting but doesn't seem to define it in English. My guess is that broadcasting is when NumPy fills a smaller dimensional array with dummy data in order to perform an operation. But this doesn't work:
>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
The error message hints that I'm on the right track, though. Can someone define broadcasting and then provide some simple examples of when it works and when it doesn't?
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations.
It's basically a way numpy can expand the domain of operations over arrays.
The only requirement for broadcasting is a way of aligning array dimensions such that, for each pair of aligned dimensions, either:
The aligned dimensions are equal.
One of the aligned dimensions is 1.
So, for example if:
x = np.ndarray(shape=(4,1,3))
y = np.ndarray(shape=(3,3))
You could not align x and y from the left, like so:
4 x 1 x 3
3 x 3
But you could align them from the right, like so:
4 x 1 x 3
    3 x 3
What would the result of such an operation look like?
Suppose we have:
x = np.ndarray(shape=(1,3), buffer=np.array([1,2,3]),dtype='int')
array([[1, 2, 3]])
y = np.ndarray(shape=(3,3), buffer=np.array([1,1,1,1,1,1,1,1,1]),dtype='int')
array([[1, 1, 1],
[1, 1, 1],
[1, 1, 1]])
The operation x + y would result in:
array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
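The same result can be sketched by making the stretching explicit with np.broadcast_to. The variable names are illustrative, and np.ones is used here in place of the buffer-based construction above.

```python
import numpy as np

x = np.array([[1, 2, 3]])          # shape (1, 3)
y = np.ones((3, 3), dtype=int)     # shape (3, 3)

# What broadcasting does conceptually: repeat x along its length-1 axis.
x_stretched = np.broadcast_to(x, y.shape)   # read-only view, shape (3, 3)

result = x + y
print(result)
print(np.array_equal(result, x_stretched + y))

# The (4, 1, 3) and (3, 3) shapes from the alignment example combine to (4, 3, 3).
big = np.zeros((4, 1, 3)) + np.zeros((3, 3))
print(big.shape)
```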
I hope you caught the drift. If you did not, you can always check the official documentation here.
Cheers!
1. What is broadcasting?
Broadcasting is a tensor operation that is helpful in neural networks (ML, AI).
2. What is the use of broadcasting?
Without broadcasting, addition is supported only for tensors of identical shape.
Broadcasting gives us the flexibility to add two tensors of different shapes.
For example, adding a 2D tensor to a 1D tensor is not possible without broadcasting.
Run the Python example code to understand the concept:
x = np.array([1,3,5,6,7,8])
y = np.array([2,4,5])
X=x.reshape(2,3)
x is reshaped to get a 2D tensor X of shape (2,3). Adding this 2D tensor X to the 1D tensor y of shape (3,) gives a 2D tensor z of shape (2,3).
print("X =",X)
print("\n y =",y)
z=X+y
print("X + y =",z)
You are almost correct about the smaller tensor: it is broadcast to match the shape of the larger tensor. (The smaller tensor is repeated, not filled with dummy data or zeros, to match the shape of the larger one.)
3. How does broadcasting happen?
Broadcasting consists of two steps:
1. Broadcast axes are added to the smaller tensor to match the ndim of the larger tensor.
2. The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.
4. Why is broadcasting not happening in your code?
Your code raises an error because broadcasting cannot happen here: both tensors are 1D, but their lengths (3 and 2) differ and neither is 1.
Broadcasting occurs only when each pair of aligned dimensions is either equal or contains a 1.
What you need to do is change the dimensions of one of the tensors (for example, reshape x to shape (3,1)), and you will see broadcasting.
5. Going in depth.
Broadcasting (repetition of the smaller tensor) occurs along broadcast axes, but since both tensors are 1D with lengths 3 and 2, there is no axis of length 1 to repeat along.
Don't confuse the number of dimensions of a tensor (ndim) with its shape.
Tensor dimensions are not the same as matrix dimensions.
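One way to act on the suggestion of changing the dimensions of one of the arrays, sketched with the x and y from the question. Reshaping x to a column is just one possible choice.

```python
import numpy as np

x = np.array([1, 3, 5])     # shape (3,)
y = np.array([2, 4])        # shape (2,)

try:
    x + y                   # lengths 3 and 2 differ and neither is 1
except ValueError as err:
    print("broadcast failed:", err)

# Turn x into a column of shape (3, 1); it now broadcasts against y's (2,).
z = x[:, None] + y          # shape (3, 2): every x value paired with every y value
print(z)
```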
Broadcasting is numpy trying to be smart when you tell it to perform an operation on arrays that aren't the same dimension. For example:
2 + np.array([1,3,5]) == np.array([3, 5, 7])
Here it decided you wanted to apply the operation using the lower dimensional array (0-D) on each item in the higher-dimensional array (1-D).
You can also add a 0-D array (scalar) or 1-D array to a 2-D array. In the first case, you just add the scalar to all items in the 2-D array, as before. In the second case, numpy will add row-wise:
In [34]: np.array([1,2]) + np.array([[3,4],[5,6]])
Out[34]:
array([[4, 6],
[6, 8]])
There are ways to tell numpy to apply the operation along a different axis as well. This can be taken even further with applying an operation between a 3-D array and a 1-D, 2-D, or 0-D array.
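A minimal sketch of applying the operation along the other axis: reshaping the 1-D array into a column makes NumPy add it column-wise instead of row-wise. The names v, M, row_wise, and col_wise are illustrative.

```python
import numpy as np

v = np.array([1, 2])
M = np.array([[3, 4], [5, 6]])

row_wise = v + M             # v is added to each row of M
col_wise = v[:, None] + M    # v as a (2, 1) column is added to each column of M

print(row_wise)
print(col_wise)
```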
>>> x = np.array([1,3,5])
>>> y = np.array([2,4])
>>> x+y
*** ValueError: operands could not be broadcast together with shapes (3,) (2,)
Broadcasting is how NumPy does math operations on arrays of different shapes. The shape describes the layout of the array; for example, the array x you used has 1 dimension with 3 elements, and y has 1 dimension with 2 elements.
For broadcasting to work, each pair of corresponding dimensions must satisfy one of 2 rules:
1) The dimensions are equal, or
2) One of the mismatched dimensions equals one.
For example, x has shape (2,3) [or 2 rows and 3 columns];
y has shape (2,1) [or 2 rows and 1 column].
Can you add them? x + y?
Answer: Yes, because the mismatched dimension is equal to 1 (the column in y). If y had shape (2,4), broadcasting would not be possible, because the mismatched dimension is not 1.
In the case you posted:
operands could not be broadcast together with shapes (3,) (2,);
it is because 3 and 2 do not match and neither is 1, although both arrays have 1 dimension.
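The two rules can be sketched as a small pure-Python check on shapes. The function broadcast_shape is a hypothetical helper written for illustration, not part of NumPy; it treats missing leading dimensions as 1.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or None if the shapes are incompatible."""
    result = []
    # Compare dimensions from the last one backwards; pad the shorter shape with 1s.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            return None          # mismatched and neither is 1
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (2, 1)))   # compatible
print(broadcast_shape((2, 3), (2, 4)))   # incompatible
print(broadcast_shape((3,), (2,)))       # the case from the question
```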
I would suggest trying np.broadcast_arrays; running some demos may give you an intuitive idea. The official documentation is also helpful. From my current understanding, NumPy compares the dimensions from tail to head. If one dimension is 1, the array is broadcast along that dimension. If one array has more axes, e.g. an array of shape (256,256,3) multiplied by one of shape (1,), you can view (1,) as (1,1,1), and broadcasting gives a result of shape (256,256,3).
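A short demo along those lines, using the (256,256,3) and (1,) shapes from the paragraph above. The array contents and variable names are arbitrary.

```python
import numpy as np

image = np.zeros((256, 256, 3))
scale = np.array([2.0])              # shape (1,), treated as (1, 1, 1)

# broadcast_arrays returns views of both inputs stretched to the common shape.
image_b, scale_b = np.broadcast_arrays(image, scale)
print(image_b.shape, scale_b.shape)

product = image * scale              # the same broadcasting happens implicitly
print(product.shape)
```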