I have a question about using the torch.scatter() function.
I want to construct a weights tensor, weights, of shape [B, N, V], where B is the batch size, N is the number of points, and V is the number of features for each point.
Let's say I have two tensors:
a  # shape [B, N, k]: for each point, k feature indices in [0, V) selecting which features to set
b  # shape [B, N, k]: the weights for the selected features
I tried to use torch.scatter():
weights.scatter_(index=a, dim=2, value=some_fix_value). With this I can only set one fixed value, not the values of the whole tensor b, which holds the weight for each of those locations.
Can someone give me a hint on how to do this properly?
I believe what you are looking to do is:
weights.scatter_(dim=2, index=a, src=b)
In other words, a's last dimension indexes the last dimension of weights, and b supplies the value written at each of those positions. With dim=2, torch.scatter_ corresponds to the following operation in pseudo-code:
out[i][j][a[i][j][k]] = b[i][j][k]
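To check the pseudo-code above without PyTorch, here is a minimal NumPy sketch. It assumes that np.put_along_axis (NumPy's counterpart of an in-place scatter along one axis) behaves like weights.scatter_(dim=2, index=a, src=b); the PyTorch call itself is not exercised here, and the sizes B, N, V, k are made-up example values. The vectorized call is compared with the explicit loop.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, V, k = 2, 3, 6, 2  # example sizes (assumed)

# a: feature indices into the last axis of weights, shape [B, N, k].
# Drawn without replacement per point so no two entries collide.
a = np.stack([
    np.stack([rng.choice(V, size=k, replace=False) for _ in range(N)])
    for _ in range(B)
])
b = rng.random((B, N, k))  # weights for the selected features, shape [B, N, k]

# Vectorized scatter along axis 2: weights[i, j, a[i, j, t]] = b[i, j, t]
weights = np.zeros((B, N, V))
np.put_along_axis(weights, a, b, axis=2)

# Reference: the explicit loop from the pseudo-code
expected = np.zeros((B, N, V))
for i in range(B):
    for j in range(N):
        for t in range(k):
            expected[i, j, a[i, j, t]] = b[i, j, t]
```

Indices are drawn without replacement on purpose: if a contains duplicate indices for the same point, only one of the competing values survives, and PyTorch documents the choice as nondeterministic for scatter_ with src.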
Related
I was trying to write a function called spatial_batchnorm_forward, used in a convolutional neural network. In this function, I wanted to reuse the batchnorm_forward function, which is implemented for (N, D)-shaped input in a fully connected network.
The following is a correct implementation.
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """Computes the forward pass for spatial batch normalization.
    """
    out, cache = None, None
    N, C, H, W = x.shape
    x_ = x.transpose(0,2,3,1).reshape(N*H*W, C)
    out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
    out = out_.reshape(N, H, W, C).transpose(0,3,1,2)
    return out, cache
But at first, I wrote it as:
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """Computes the forward pass for spatial batch normalization.
    """
    out, cache = None, None
    N, C, H, W = x.shape
    x_ = x.reshape(-1, C)
    out_, cache = batchnorm_forward(x_, gamma, beta, bn_param)
    out = out_.reshape(N, C, H, W)
    return out, cache
This code runs, which means the dimensions match, but its output is slightly different from that of the version above.
I was wondering what's going on here.
I really appreciate your patience and help!
I guess the problem lies in the reshape function, so I read the documentation:
numpy.reshape(a, newshape, order='C')
Gives a new shape to an array without changing its data.
Parameters
a : array_like
    Array to be reshaped.
newshape : int or tuple of ints
    The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
order : {‘C’, ‘F’, ‘A’}, optional
    Read the elements of a using this index order, and place the elements into the reshaped array using this index order. ‘C’ means to read / write the elements using C-like index order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to read / write the elements using Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of indexing. ‘A’ means to read / write the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise.
Returns
reshaped_array : ndarray
    This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array.
But I still can't figure out what's going on here.
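The difference comes from the order in which reshape reads elements. With the default order='C', the last axis (W) changes fastest, so x.reshape(-1, C) on an (N, C, H, W) array fills each row with C consecutive elements in C order (neighbouring W positions, wrapping into H, C and N), not with one value per channel. Each column then mixes several channels, so batchnorm_forward computes its per-column mean and variance over the wrong groups; the shapes still match, which is why the code runs. Moving C to the last axis first with transpose(0, 2, 3, 1) makes each column hold exactly one channel. A small NumPy sketch with made-up sizes, where every element is tagged with its own channel index so the grouping is visible:

```python
import numpy as np

N, C, H, W = 2, 3, 4, 5  # example sizes (assumed)

# Tag every element with its channel index, so x[n, c, h, w] == c.
x = np.broadcast_to(np.arange(C).reshape(1, C, 1, 1), (N, C, H, W)).copy()

# Correct: move channels last, then flatten. Column c holds only channel c.
cols_ok = x.transpose(0, 2, 3, 1).reshape(N * H * W, C)

# Naive: reshape directly. Each row takes C consecutive elements along W,
# so every column collects values from several channels.
cols_naive = x.reshape(-1, C)

channels_ok = [np.unique(cols_ok[:, c]) for c in range(C)]
channels_naive = [np.unique(cols_naive[:, c]) for c in range(C)]
print(channels_ok)
print(channels_naive)
```

The same reasoning applies on the way back: the output has to be reshaped to (N, H, W, C) and transposed with (0, 3, 1, 2), undoing the forward transformation, rather than reshaped straight to (N, C, H, W).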
I have a tensor A of shape 40x1.
I need to multiply it with three other tensors: B = 40x100x384, C = 40x10, D = 40x10.
For example, B holds forty 100x384 matrices, and I need each of these matrices to be multiplied by its corresponding element of A.
What is the best way to do this in PyTorch? There may be more tensors like B, C and D; they will always have shape 40xKxL or 40xJ.
If I understand correctly, you want to multiply every i-th matrix K x L by the corresponding i-th scalar in A.
One possible way is:
(A * B.view(len(A), -1)).view(B.shape)
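Or you can use the power of broadcasting:
A3 = A.reshape(len(A), 1, 1)
# now A3 is (40, 1, 1) and you can do
A3*B
# C and D are 2-D, so use A itself, which is (40, 1);
# A3*C would broadcast to shape (40, 40, 10) instead
A*C
A*D
Essentially, each dimension of size 1 in A is stretched, as if copied, to match the corresponding dimension of the other tensor.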
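A NumPy sketch of the same broadcasting idea (NumPy and PyTorch follow the same broadcasting rules, but the PyTorch calls are not exercised here). The sizes are small stand-ins for 40x100x384 and 40x10, and scale_per_sample is a hypothetical helper that reshapes A to (batch, 1, ..., 1) so it lines up with the leading axis of a target of any rank.

```python
import numpy as np

def scale_per_sample(A, X):
    """Multiply X[i] by the scalar A[i] for every i (hypothetical helper).

    A has shape (batch, 1) or (batch,); X has shape (batch, ...).
    A is reshaped to X.ndim axes, (batch, 1, ..., 1), so it broadcasts
    only along X's trailing axes.
    """
    A = np.asarray(A).reshape((len(A),) + (1,) * (X.ndim - 1))
    return A * X

rng = np.random.default_rng(0)
A = rng.random((4, 1))     # stands in for the 40x1 tensor
B = rng.random((4, 5, 6))  # stands in for 40x100x384
C = rng.random((4, 3))     # stands in for 40x10

AB = scale_per_sample(A, B)  # shape (4, 5, 6)
AC = scale_per_sample(A, C)  # shape (4, 3)

# Pitfall: a (4, 1, 1) array times a (4, 3) array broadcasts to (4, 4, 3).
wrong = A.reshape(4, 1, 1) * C
```

Deriving the number of trailing 1s from the target's rank keeps a single code path for both the 40xKxL and the 40xJ tensors.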
I am trying to apply a weighted average scheme on RNN output.
RNN output is represented by tensor A having dimension (a,b,c).
I can simply take tf.reduce_mean(A,axis=1) to get the tensor C having dimension (a,c).
However, I want to do the "weighted average" of tensor A along axis = 1.
Weights are specified in the matrix B having dimension (d,b).
For d = 1, I can do tf.tensordot(A,B,[1,1]) to get the result of dimension (a,c).
Now for d=a, I am unable to compute the weighted average.
Can someone suggest a solution?
I don't quite get why B should have dimensions (d,b). If B contains the weights to do a weighted average of A across only one dimension, B only has to be a vector (b,), not a matrix.
If B is a vector, you can do:
C = tf.tensordot(A,B,[1,0]) to get a tensor C of shape (a,c), which contains the weighted average of A across axis=1 using the weights specified in B (assuming the weights sum to 1).
Update:
You can do something like:
A = A*B[:,:,None]
which does element-wise multiplication of A and B, where B stores the weight given to each element of A.
Then:
C = tf.reduce_mean(A,axis=1)
averages the weighted elements. Note that reduce_mean divides by b, so this equals the weighted average only if the weights in each row of B sum to b; if they sum to 1, use tf.reduce_sum instead.
Since B is already normalized, the answer is
tf.reduce_sum(A * B[:, :, None], axis=1)
Indexing with None adds a new dimension, a behavior inherited from NumPy. B[:, :, None] adds a last dimension, so the result has shape (a, b, 1). You can achieve the same thing with tf.expand_dims(B, -1), whose name may make more sense to you.
A has shape (a, b, c) while B[:, :, None] has shape (a, b, 1). When they are multiplied, expanded B will be treated as having shape (a, b, c) too, with the last dimension being c copies of the same value. This is called broadcasting.
Because of how broadcasting works, the same answer also works if B has shape (1, b).
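A NumPy sketch of the answer above (indexing with None and broadcasting behave the same way in NumPy as in TensorFlow here), with made-up sizes. It assumes each row of B sums to 1 and checks the broadcast-and-sum against an explicit loop and against the equivalent np.einsum contraction; it also shows the (1, b) case sharing one set of weights across all rows.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = 3, 4, 5  # example sizes (assumed)
A = rng.random((a, b, c))
B = rng.random((a, b))
B = B / B.sum(axis=1, keepdims=True)  # normalize: each row of weights sums to 1

# B[:, :, None] has shape (a, b, 1) and broadcasts against A's (a, b, c)
C = np.sum(A * B[:, :, None], axis=1)  # shape (a, c)

# Same contraction with einsum: sum over the shared axis b
C_einsum = np.einsum('abc,ab->ac', A, B)

# Explicit reference loop
C_loop = np.zeros((a, c))
for i in range(a):
    for j in range(b):
        C_loop[i] += B[i, j] * A[i, j]

# Shared weights of shape (1, b) broadcast over all a rows
w = B[:1]
C_shared = np.sum(A * w[:, :, None], axis=1)
```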
I have two rank-2 Tensors with equal sizes along the second dimension, but unequal along the first. For example, tensor A of shape [a, n] and tensor B of shape [b, n]. They can be regarded as two arrays containing vectors of length n.
I have a function f which takes two inputs, each a tensor of shape [n], and returns a scalar. I want to apply this function to each pair of vectors in A and B with the result being a tensor C of shape [a, b] such that, for each location (i, j) in C, C[i, j] = f(A[i], B[j]).
If these were just regular Numpy arrays, I could accomplish this with the following code:
# A has shape [a, n], B has shape [b, n]; f maps two length-n vectors to a scalar
import numpy

def pairwise(A, B, f):
    a, b = A.shape[0], B.shape[0]
    C = numpy.zeros((a, b))
    for i in range(a):
        for j in range(b):
            C[i, j] = f(A[i], B[j])
    return C
Ideally, f would take A and B directly and return C, so that everything happens as proper tensor operations that TensorFlow can parallelize, as long as the end result is the same.
I have found a solution for the specific case where f computes the Euclidean distance between each pair of vectors. I would like to extend it to other functions, such as cosine distance or Manhattan (L1) distance.
a = tf.random_normal([10,5])
b = tf.random_normal([20,5])
I would start by re-orienting the two arrays like this:
a = a[:,tf.newaxis,:]
b = b[tf.newaxis,:,:]
Now the shapes are [a,1,n] and [1,b,n], so we can broadcast a subtraction to calculate the delta for each pair:
delta = (a-b)
This has a shape of [a,b,n].
Now the Euclidean distance is straightforward (axis=-1 sums over the last axis):
distance = tf.reduce_sum(delta**2,axis = -1)**0.5
And you're done:
print(distance)
<tf.Tensor 'pow_3:0' shape=(10, 20) dtype=float32>
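The same broadcasting pattern extends to the other distances asked about. Here is a NumPy sketch (the TensorFlow version would use tf.abs, tf.reduce_sum and tf.norm in the same places; only NumPy is exercised here). The helper names are hypothetical, and cosine distance is taken as 1 minus cosine similarity. For cosine distance, one matrix product of row-normalized inputs avoids materializing the [a, b, n] delta tensor.

```python
import numpy as np

def manhattan(A, B):
    # [a, 1, n] - [1, b, n] -> [a, b, n], then sum |delta| over n
    delta = A[:, None, :] - B[None, :, :]
    return np.abs(delta).sum(axis=-1)

def euclidean(A, B):
    delta = A[:, None, :] - B[None, :, :]
    return np.sqrt((delta ** 2).sum(axis=-1))

def cosine_distance(A, B):
    # Normalize rows, then one [a, n] x [n, b] product gives all dot products
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_unit @ B_unit.T

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 5))
B = rng.normal(size=(20, 5))
L1 = manhattan(A, B)         # shape (10, 20)
cos = cosine_distance(A, B)  # shape (10, 20)
```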
I have some data represented by input_x. It is a tensor of unknown size (should be inputted by batch) and each item there is of size n. input_x undergoes tf.nn.embedding_lookup, so that embed now has dimensions [?, n, m] where m is the embedding size and ? refers to the unknown batch size.
This is described here:
input_x = tf.placeholder(tf.int32, [None, n], name="input_x")
embed = tf.nn.embedding_lookup(W, input_x)
I'm now trying to multiply each sample in my input data (which is now expanded by embedding dimension) by a matrix variable, U, and I can't seem to get how to do that.
I first tried using tf.matmul, but it gives an error due to a shape mismatch. I then tried expanding the dimension of U and applying batch_matmul (I also tried the equivalent function from tf.nn.math_ops, with the same result):
U = tf.Variable( ... )
U1 = tf.expand_dims(U,0)
h=tf.batch_matmul(embed, U1)
This passes the initial compilation, but then when actual data is applied, I get the following error:
In[0].dim(0) and In[1].dim(0) must be the same: [64,58,128] vs [1,128,128]
I also know why this is happening: expand_dims gave U a leading dimension of 1, which doesn't match the minibatch size, 64.
How can I do that matrix multiplication on my tensor-matrix input correctly (for unknown batch size)?
Previous answers are obsolete. tf.matmul() now supports tensors with rank > 2:
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
Also tf.batch_matmul() was removed and tf.matmul() is the right way to do batch multiplication. The main idea can be understood from the following code:
import tensorflow as tf
batch_size, n, m, k = 10, 3, 5, 2
A = tf.Variable(tf.random_normal(shape=(batch_size, n, m)))
B = tf.Variable(tf.random_normal(shape=(batch_size, m, k)))
tf.matmul(A, B)
The result is a tensor of shape (batch_size, n, k). Here is what happens: you have batch_size matrices of shape n x m and batch_size matrices of shape m x k. For each pair, tf.matmul computes the (n x m) x (m x k) product, an n x k matrix, so you get batch_size of them.
Notice that something like this is also valid:
A = tf.Variable(tf.random_normal(shape=(a, b, n, m)))
B = tf.Variable(tf.random_normal(shape=(a, b, m, k)))
tf.matmul(A, B)
and will give you a shape (a, b, n, k)
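NumPy's np.matmul (the @ operator) follows the same batching rule, so a quick NumPy sketch with arbitrary sizes can confirm the shapes described above; the TensorFlow calls are not exercised here. Each batch entry of the result equals the ordinary product of the corresponding pair of matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n, m, k = 10, 3, 5, 2
A = rng.normal(size=(batch_size, n, m))
B = rng.normal(size=(batch_size, m, k))

AB = np.matmul(A, B)  # shape (batch_size, n, k)

# Each slice is an ordinary (n x m) @ (m x k) product
AB_loop = np.stack([A[i] @ B[i] for i in range(batch_size)])

# Several leading batch axes work the same way
A4 = rng.normal(size=(4, 6, n, m))
B4 = rng.normal(size=(4, 6, m, k))
AB4 = A4 @ B4  # shape (4, 6, n, k)
```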
1. I want to multiply a batch of matrices with a batch of matrices of the same length, pairwise
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((batch_size, m, p))
# python >= 3.5
MN = M @ N
# or the old way,
MN = tf.matmul(M, N)
# MN has shape (batch_size, n, p)
2. I want to multiply a batch of matrices with a batch of vectors of the same length, pairwise
We fall back to case 1 by adding a dimension to v and removing it afterwards.
M = tf.random_normal((batch_size, n, m))
v = tf.random_normal((batch_size, m))
Mv = (M @ v[..., None])[..., 0]
# Mv has shape (batch_size, n)
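A NumPy check of case 2 with arbitrary sizes (the @ operator and [..., None] indexing behave the same way in NumPy; TensorFlow itself is not exercised). Appending an axis turns each vector into an m x 1 matrix, and dropping it afterwards leaves one length-n vector per batch entry.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n, m = 4, 3, 5
M = rng.normal(size=(batch_size, n, m))
v = rng.normal(size=(batch_size, m))

# v[..., None]: (batch_size, m, 1); product: (batch_size, n, 1);
# [..., 0] drops the trailing axis again
Mv = (M @ v[..., None])[..., 0]

# Reference: one matrix-vector product per batch entry
Mv_loop = np.stack([M[i] @ v[i] for i in range(batch_size)])
```

Without the extra axis, v would be read as a single (batch_size, m) matrix, and M @ v would fail because m does not match batch_size.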
3. I want to multiply a single matrix with a batch of matrices
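In this case, we cannot simply add a batch dimension of 1 to the single matrix, because in the TensorFlow versions this answer targets, tf.matmul does not broadcast in the batch dimension. Newer releases support broadcasting in the batch dimensions, so check the documentation for your version.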
3.1. The single matrix is on the right side
In that case, we can treat the matrix batch as a single large matrix, using a simple reshape.
M = tf.random_normal((batch_size, n, m))
N = tf.random_normal((m, p))
MN = tf.reshape(tf.reshape(M, [-1, m]) @ N, [-1, n, p])
# MN has shape (batch_size, n, p)
3.2. The single matrix is on the left side
This case is more complicated. We can fall back to case 3.1 by transposing the matrices.
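# Now M is the single matrix and N is the batch
M = tf.random_normal((n, m))
N = tf.random_normal((batch_size, m, p))
MT = tf.matrix_transpose(M)
NT = tf.matrix_transpose(N)
NTMT = tf.reshape(tf.reshape(NT, [-1, m]) @ MT, [-1, p, n])
MN = tf.matrix_transpose(NTMT)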
However, transposition can be a costly operation, and here it is done twice on an entire batch of matrices. It may be better to simply duplicate M to match the batch dimension:
MN = tf.tile(M[None], [batch_size, 1, 1]) @ N
Profiling will tell which option works better for a given problem/hardware combination.
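A NumPy sketch, with arbitrary sizes, that checks the reshape trick of case 3.1 and both alternatives of case 3.2 against a per-entry loop. np.swapaxes stands in for tf.matrix_transpose. Note that NumPy's @ does broadcast a 2-D operand across the batch, so in NumPy M @ N would also work directly; the point here is only to confirm that the reshapes compute the same result.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n, m, p = 4, 3, 5, 2

# Case 3.1: batch on the left, single matrix on the right
Mb = rng.normal(size=(batch_size, n, m))
N1 = rng.normal(size=(m, p))
MN_31 = (Mb.reshape(-1, m) @ N1).reshape(-1, n, p)

# Case 3.2: single matrix on the left, batch on the right
M1 = rng.normal(size=(n, m))
Nb = rng.normal(size=(batch_size, m, p))
MT = M1.T                                    # (m, n)
NT = np.swapaxes(Nb, -1, -2)                 # (batch_size, p, m)
NTMT = (NT.reshape(-1, m) @ MT).reshape(-1, p, n)
MN_32_transpose = np.swapaxes(NTMT, -1, -2)  # (batch_size, n, p)
MN_32_tile = np.tile(M1[None], (batch_size, 1, 1)) @ Nb

# References: one ordinary product per batch entry
loop_31 = np.stack([Mb[i] @ N1 for i in range(batch_size)])
loop_32 = np.stack([M1 @ Nb[i] for i in range(batch_size)])
```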
4. I want to multiply a single matrix with a batch of vectors
This looks similar to case 3.2, since the single matrix is on the left, but it is actually simpler because transposing a vector is essentially a no-op. We end up with
M = tf.random_normal((n, m))
v = tf.random_normal((batch_size, m))
MT = tf.matrix_transpose(M)
Mv = v @ MT
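Case 4 relies on the identity (M v)ᵀ = vᵀ Mᵀ: with the vectors stacked as the rows of v, the single product v @ Mᵀ yields one result row per batch entry. A NumPy check with arbitrary sizes (TensorFlow is not exercised here):

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n, m = 4, 3, 5
M = rng.normal(size=(n, m))
v = rng.normal(size=(batch_size, m))

Mv = v @ M.T  # (batch_size, m) @ (m, n) -> (batch_size, n)

# Reference: M times each vector separately
Mv_loop = np.stack([M @ v[i] for i in range(batch_size)])
```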
What about einsum?
All of the previous multiplications could have been written with the tf.einsum Swiss Army knife. For example, the first solution for 3.2 could be written simply as
MN = tf.einsum('nm,bmp->bnp', M, N)
However, note that einsum ultimately relies on transpose and matmul for the computation.
So even though einsum is a very convenient way to write matrix multiplications, it hides the complexity of the operations underneath. For example, it is not straightforward to guess how many times an einsum expression will transpose your data, and therefore how costly the operation will be. It may also hide the fact that there could be several alternatives for the same operation (see case 3.2), and it might not choose the better option.
For this reason, I would personally use explicit formulas like those above to better convey their respective complexity. Although if you know what you are doing and like the simplicity of the einsum syntax, then by all means go for it.
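For comparison, here is how the cases above map onto einsum subscripts, sketched with NumPy's np.einsum (tf.einsum accepts the same subscript strings for these cases, but it is not exercised here). The sizes are arbitrary, and each result is checked against the explicit formula from its case.

```python
import numpy as np

rng = np.random.default_rng(0)
bs, n, m, p = 4, 3, 5, 2

Mb = rng.normal(size=(bs, n, m))  # batch of matrices
Nb = rng.normal(size=(bs, m, p))  # batch of matrices
M1 = rng.normal(size=(n, m))      # single matrix (left)
N1 = rng.normal(size=(m, p))      # single matrix (right)
vb = rng.normal(size=(bs, m))     # batch of vectors

case1 = np.einsum('bnm,bmp->bnp', Mb, Nb)   # batch x batch
case2 = np.einsum('bnm,bm->bn', Mb, vb)     # batch of matrices x batch of vectors
case31 = np.einsum('bnm,mp->bnp', Mb, N1)   # batch x single matrix on the right
case32 = np.einsum('nm,bmp->bnp', M1, Nb)   # single matrix on the left x batch
case4 = np.einsum('nm,bm->bn', M1, vb)      # single matrix x batch of vectors
```

Reading the subscripts also shows which axis is contracted (the one missing after ->) and which are batch axes (those present in both inputs and the output).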
At the time of this answer, the matmul operation only worked on matrices (2D tensors). Here are two main approaches, both assuming that U is a 2D tensor.
Slice embed into 2D tensors and multiply each of them with U individually. This is probably easiest to do using tf.scan() like this:
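h = tf.scan(lambda a, x: tf.matmul(x, U), embed)
(As written, this only works when U is square: without an initializer, tf.scan uses the first element of embed as the initial accumulator, so every step must return a tensor of that same shape. tf.map_fn(lambda x: tf.matmul(x, U), embed) does not have this restriction.)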
On the other hand if efficiency is important it may be better to reshape embed to be a 2D tensor so the multiplication can be done with a single matmul like this:
embed = tf.reshape(embed, [-1, m])
h = tf.matmul(embed, U)
h = tf.reshape(h, [-1, n, c])
where c is the number of columns in U. The last reshape makes sure that h is a 3D tensor whose 0th dimension corresponds to the batch, just like the original input_x and embed.
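A NumPy sketch of the reshape approach, using the shapes from the question's error message (batch 64, n = 58, m = 128) and assuming U is m x c with a made-up c = 16 so the column count is visible. The -1 lets the batch size be inferred at run time, which is what makes this work for an unknown batch size; the per-sample loop plays the role of mapping matmul over the batch. TensorFlow is not exercised here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, c = 58, 128, 16  # c (columns of U) is an assumed example value
U = rng.normal(size=(m, c))
embed = rng.normal(size=(64, n, m))  # batch size 64, only known at run time

# Flatten batch and sequence axes, do one matmul, then restore the batch axis
h = (embed.reshape(-1, m) @ U).reshape(-1, n, c)  # shape (64, n, c)

# Per-sample reference, analogous to mapping matmul over the batch
h_loop = np.stack([x @ U for x in embed])
```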
As answered by @Stryke, there are two ways to achieve this: 1. scanning, and 2. reshaping.
tf.scan takes a function (usually a lambda) and is generally used for recursive computations, where each step can depend on the previous result. Some examples are here: https://rdipietro.github.io/tensorflow-scan-examples/
I personally prefer reshaping, since it is more intuitive. If you are trying to matrix-multiply each matrix in the 3D tensor by the 2D matrix, i.e. C[i,j,l] = sum over k of A[i,j,k] * B[k,l], you can do it with a simple reshape:
A_flat = tf.reshape(A, [i*j, k])
C_flat = tf.matmul(A_flat, B)
C = tf.reshape(C_flat, [i, j, l])
It seems that in TensorFlow 1.11.0 the docs for tf.matmul incorrectly say that it works for rank >= 2.
Instead, the cleanest alternative I've found is to use tf.tensordot(a, b, (-1, 0)).
In its general form, tf.tensordot(a, b, axes), this function computes the dot product over any chosen axes of a and b. Passing axes as (-1, 0) contracts the last axis of a with the first axis of b, which for a 3D a and a 2D b multiplies every matrix in a by b.
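NumPy's np.tensordot takes the same kind of axes argument, so a NumPy sketch with arbitrary sizes can confirm that contracting the last axis of a 3-D array with the first axis of a matrix matches the reshape approach (tf.tensordot itself is not exercised here).

```python
import numpy as np

rng = np.random.default_rng(0)
i, j, k, l = 4, 3, 5, 2
A = rng.normal(size=(i, j, k))
B = rng.normal(size=(k, l))

# Contract A's last axis with B's first: C[i, j, l] = sum_k A[i, j, k] * B[k, l]
C = np.tensordot(A, B, axes=(-1, 0))  # shape (i, j, l)

# Same result via the reshape approach
C_reshape = (A.reshape(i * j, k) @ B).reshape(i, j, l)
```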