I am trying to take an inner product of two vectors in tensorflow, for which I use the dot product:
x = tf.constant([1, 2, 3], dtype=tf.int32)
y = tf.constant([4, 5, 6], dtype=tf.int32)
# desired result
tf.tensordot(x, y, axes=1)
# Output: 32
Now I'm dealing with batch tensors which both have shape (32, 3). I still want the same operation, yielding an output vector of shape (32, ). My only succesful attempt so far is:
tf.linalg.diag_part(tf.tensordot(x, y, axes=[[1], [1]]))
# Output: <tf.Tensor: shape=(32,)>
# where each entry is the inner product of the vectors of length 3
However, I compute 32 as many inner products as required.
How do I solve my problem more efficiently?
Think about what this operation is at the end of the day: Element-wise multiplication and a sum over axis 1. So you can just do this
tf.reduce_sum(x * y, axis=1)
Related
I am trying to backpropagate from scratch using batches and I am having issues calculating dx. First, I would like to start by defining variables to avoid confusion:
a - The activation value calculated by passing z through an activation function
z - The value before the activation function of the layer
x - The inputs into the layer
w - The weights that connect the inputs to the output nodes
da - The derivative of a
dz - The derivative of z
dx - The derivative of x
I know that this is the derivative of x:
dx = w.T*dz
Note: * means dot and .T means transpose
Now let me introduce the problem. Say I have a neural network with 2 inputs, 3 output nodes, and a batch size of 5. How would I go about computing dx? In this case, the weights would be of shape (z, x) or (3, 2) before transposing and dz would be of shape (z, batches) or (3, 5). If I were to use the formula above, I would get a shape of (x, batches) or (2, 5). Would I take the sum with respect to the last dimension after using the formula above to get dx (resulting in a shape of (2, 1))? Below is a representation of the dot product using made-up values:
w.T * dz = dx
[[1, 2, 3, 4, 5],
[[1, 0.5, 1], * [1, 2, 3, 4, 5], = [[2.5, 5, 7.5, 10, 12.5],
[-1, -1, -0.5] [1, 2, 3, 4, 5]] [-2.5, -5, -7.5, -10, -12.5]]
You did everything correctly. X always needs to have the same dimensions as dX in backpropagation. If X was an intermediate outcome of shape (2,5), then the gradient also has the shape (2,5). In this way you can update the matrix X. Now in your case matrix X is the input matrix, which you will never update. You only need to update W.
If X was the result of a hidden layer, your calculation of the backpropagated gradient is correct.
I'm trying to build a neural net however I can't figure out where I'm going wrong with the max pooling layer.
self.embed1 = nn.Embedding(256, 8)
self.conv_1 = nn.Conv2d(1, 64, (7,8), padding = (0,0))
self.fc1 = nn.Linear(64, 2)
def forward(self,x):
import pdb; pdb.set_trace()
x = self.embed1(x) #input a tensor of ([1,217]) output size: ([1, 217, 8])
x = x.unsqueeze(0) #conv lay needs a tensor of size (B x C x W x H) so unsqueeze here to make ([1, 1, 217, 8])
x = self.conv_1(x) #creates 64 filter of size (7, 8).Outputs ([1, 64, 211, 1]) as 6 values lost due to not padding.
x = torch.max(x,0) #returning max over the 64 columns. This returns a tuple of length 2 with 64 values in each att, the max val and indices.
x = x[0] #I only need the max values. This returns a tensor of size ([64, 211, 1])
x = x.squeeze(2) #linear layer only wants the number of inputs and number of outputs so I squeeze the tensor to ([64, 211])
x = self.fc1(x) #Error Size mismatch (M1: [64 x 211] M2: [64 x 2])
I understand why the linear layer isn't accepting 211 however I don't understand why my tensor after maxing over the columns isn't 64 x 2.
You use of torch.max returns two outputs: the max value along dim=0 and the argmax along that dimension. Thus, you need to pick only the first output. (you might want to consider using adaptive max pooling for this task).
Your linear layer expects its input to have dim 64 (that is batch_size-by-64 shaped tensor). However, it seems like your x[0] is of shape 13504x1 - definitely not 64.
See this thread for example.
If I'm guessing your intentions correctly, your mistake is that you're using torch.max for 2d maxpooling, instead of torch.nn.functional.max_pool2d. The former reduces across a tensor dimension (for instance across all feature maps or all horizontal lines), whereas the latter reduces in each square spatial neighborhood in the [h, w] plane of a [batch, features, h, w] tensor.
Instead of this:
x = x.squeeze(2)
You can do this instead:
x = x.view(-1, 64) # view will now correctly resize it to [64 x 2]
You can think of view as numpy reshape. We use -1 to signify that we don't know how many rows we want but we know how many columns we have, 64.
I'm trying to setup a simple tf.keras model in which a vector is fed in as input and the output is the result of a single matrix multiply.
The lines of code to create the model suceed but calling it for a forward pass results in an error.
n_input_nodes = 2
n_output_nodes = 1
x = tf.keras.Input(shape=(n_input_nodes,))
W = tf.ones((n_input_nodes,n_output_nodes), dtype=tf.float32)
y = tf.matmul(x, W)
model = tf.keras.Model(inputs=x, outputs=y)
x_input = tf.constant([10,30.], shape=[1, 2])
output = model(x_input)
The final line (i.e. the forward pass) throws the following error:
ValueError: Argument must be a dense tensor: [array([[1.], [1.]], dtype=float32)] - got shape [1, 2, 1], but wanted [1].
The input is of shape (2,1) and the weight matrix has shape (2,1). Matrix multiply between the two should be a valid multiplication and result in a [1,1] tensor; however, this is not the case.
They require a dense tensor and not a sparse tensor. Consider this shape
W = tf.ones((n_input_nodes,), dtype=tf.float32)
It requires a tensor of shape ( 2, ) which is dense.
For any 2D tensor like
[[2,5,4,7],
[7,5,6,8]],
I want to do softmax for the top k element in each row and then construct a new tensor by replacing all the other elements to 0.
The result should be to get the softmax of top k (here k=2) elements for each row [[7,5],[8,7]],
which is thus
[[0.880797,0.11920291],
[0.7310586,0.26894143]]
and then reconstruct a new tensor according to the index of the top k elements in the original tensor, the final result should be
[[0,0.11920291,0,0.880797],
[0.26894143,0,0,0.7310586]].
Is it possible to implement this kind of masked softmax in tensorflow? Many thanks in advance!
Here is how you can do that:
import tensorflow as tf
# Input data
a = tf.placeholder(tf.float32, [None, None])
num_top = tf.placeholder(tf.int32, [])
# Find top elements
a_top, a_top_idx = tf.nn.top_k(a, num_top, sorted=False)
# Apply softmax
a_top_sm = tf.nn.softmax(a_top)
# Reconstruct into original shape
a_shape = tf.shape(a)
a_row_idx = tf.tile(tf.range(a_shape[0])[:, tf.newaxis], (1, num_top))
scatter_idx = tf.stack([a_row_idx, a_top_idx], axis=-1)
result = tf.scatter_nd(scatter_idx, a_top_sm, a_shape)
# Test
with tf.Session() as sess:
result_val = sess.run(result, feed_dict={a: [[2, 5, 4, 7], [7, 5, 6, 8]], num_top: 2})
print(result_val)
Output:
[[0. 0.11920291 0. 0.880797 ]
[0.26894143 0. 0. 0.7310586 ]]
EDIT:
Actually, there is a function that more closely does what you intend, tf.sparse.softmax. However, it requires a SparseTensor as input, and I'm not sure it should be faster since it has to figure out which sparse values go together in the softmax. The good thing about this function is that you could have different number of elements to softmax in each row, but in your case that does not seem to be important. Anyway, here is an implementation with that, in case you find it useful.
import tensorflow as tf
a = tf.placeholder(tf.float32, [None, None])
num_top = tf.placeholder(tf.int32, [])
# Find top elements
a_top, a_top_idx = tf.nn.top_k(a, num_top, sorted=False)
# Flatten values
sparse_values = tf.reshape(a_top, [-1])
# Make sparse indices
shape = tf.cast(tf.shape(a), tf.int64)
a_row_idx = tf.tile(tf.range(shape[0])[:, tf.newaxis], (1, num_top))
sparse_idx = tf.stack([a_row_idx, tf.cast(a_top_idx, tf.int64)], axis=-1)
sparse_idx = tf.reshape(sparse_idx, [-1, 2])
# Make sparse tensor
a_top_sparse = tf.SparseTensor(sparse_idx, sparse_values, shape)
# Reorder sparse tensor
a_top_sparse = tf.sparse.reorder(a_top_sparse)
# Softmax
result_sparse = tf.sparse.softmax(a_top_sparse)
# Convert back to dense (or you can keep working with the sparse tensor)
result = tf.sparse.to_dense(result_sparse)
# Test
with tf.Session() as sess:
result_val = sess.run(result, feed_dict={a: [[2, 5, 4, 7], [7, 5, 6, 8]], num_top: 2})
print(result_val)
# Same as before
Let's say you have a weights tensor w with shape (None, N)
Find the minimum value of the top k elements
top_kw = tf.math.top_k(w, k=10, sorted=False)[0]
min_w = tf.reduce_min(top_kw, axis=1, keepdims=True)
Generate a boolean mask for the weights tensor
mask_w = tf.greater_equal(w, min_w)
mask_w = tf.cast(mask_w, tf.float32)
Compute custom softmax using the mask
w = tf.multiply(tf.exp(w), mask_w) / tf.reduce_sum(tf.multiply(tf.exp(w), mask_w), axis=1, keepdims=True)
I have two tensors of shape N x D1 and M x D2 where D1 > D2, called X and Y respectively. For my task, X acts as the input and Y acts as the filter.
I want to calculate a matrix P of shape N x M x (D1-D2+1) such that:
P[0,0,0] = dot(X[0,0:D2], Y[0,:])
P[0,0,1] = dot(X[0,1:D2+1], Y[0,:])
...
P[N-1,M-1,D1-D2] = dot(X[N-1,D1-D2:D1], Y[M-1,:])
I can create a for loop and manually slide Y and calculate the dot products.
However I prefer using the correlation operator.
As I know, tensorflow has correlation operator implemented (https://www.tensorflow.org/versions/master/api_docs/python/nn/convolution) but I don't know how can I use my tensors as inputs and filters.
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
In your case, I'd set strides to 1, and padding to SAME.
tf.nn.conv2d(X, Y, strides=1, padding=SAME)
Yes, you can use indeed tf.nn.conv2d(), but you should add both batch and channel dimensions:
X = tf.expand_dims(tf.expand_dims(X,0),-1)
# X.shape [batch=1, in_height, in_width, in_channels=1]
Y = tf.expand_dims(tf.expand_dims(Y,-1),-1)
# Y.shape = [filter_height, filter_width, in_channels=1, out_channels=1]
# Convolution (actually correlation, see doc of conv2d)
xcorr = tf.nn.conv2d(X, Y, padding="VALID", strides=[1, 1, 1, 1])
# Padding should be VALID, since you've already padded your input
CAVEAT: However, you cannot extrapolate this approach for batches of signals, since tf.nn.conv2d uses always the same filter over the batch dimension, and from my understanding you do want to change it.