I'm doing a reinforcement learning project, and I'm trying to get a tensor that holds the expected reward of the chosen action for each batch element. I have a long tensor of chosen actions of size batch with values of either zero or one (the two possible actions), and a tensor of expected rewards for each action of size batch * action_size. I want a tensor of size batch.
For example, if batch size was 4, then I have
action = tensor([1,0,0,1])
expectedReward = tensor([[3,7],[5,9],[-1,12],[0,1]])
and what I want is
rewardForActions = tensor([7,5,-1,1])
I thought this would answer my question, but it's not the same at all: with that solution I would end up with a 4*4 tensor, selecting from each row 4 times instead of once.
Any ideas?
You could do
rewardForActions = expectedReward.index_select(1, action).diagonal()
# tensor([ 7, 5, -1, 1])
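For reference, a minimal runnable check of this (plus, as an aside, torch.gather does the same per-row selection without building the intermediate batch x batch matrix):
import torch

# Runnable check of the answer above.
action = torch.tensor([1, 0, 0, 1])
expectedReward = torch.tensor([[3, 7], [5, 9], [-1, 12], [0, 1]])

rewardForActions = expectedReward.index_select(1, action).diagonal()
print(rewardForActions)  # tensor([ 7,  5, -1,  1])

# Equivalent selection without the intermediate batch x batch matrix:
# gather picks one column per row, indexed by the chosen action.
print(expectedReward.gather(1, action.unsqueeze(1)).squeeze(1))  # tensor([ 7,  5, -1,  1])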
Long story short, I have an RNN that is stacked on top of a CNN.
The CNN was created and trained separately. To clarify things, let's suppose the CNN takes input in the form of a [BATCH SIZE, H, W, C] placeholder (H = height, W = width, C = number of channels).
Now, when stacked on top of the RNN, the overall input to the combined network will have the shape: [BATCH SIZE, TIME SEQUENCE, H, W, C], i.e. each sample in the minibatch consists of TIME_SEQUENCE many images. Moreover, the time sequences are variable in length. There is a separate placeholder called sequence_lengths with shape [BATCH SIZE] that contains scalar values corresponding to the length of each sample in the minibatch. The value of TIME SEQUENCE corresponds to the maximum possible time sequence length, and for samples with smaller lengths, the remaining values are padded with zeros.
What I want to do
I want to accumulate the output from the CNN in a tensor of shape [BATCH SIZE, TIME SEQUENCE, 1] (the last dimension just contains the final score output by the CNN for each time sample for each batch element) so that I can forward this entire chunk of information to the RNN that is stacked on top of the CNN. The tricky thing is, I also want to be able to back-propagate the error from the RNN to the CNN (the CNN is already pre-trained, but I would like to fine-tune the weights a bit), so I have to stay inside the graph, i.e. I can't make any calls to session.run().
Option A:
The easiest way would be to just reshape the overall network input tensor to [BATCH SIZE * TIME SEQUENCE, H, W, C]. The problem with this is that BATCH SIZE * TIME SEQUENCE may be as large as 2000, so I'm bound to run out of memory when trying to feed a batch that big into my CNN. And the batch size is too large for training anyway. Also, a lot of sequences are just padded zeros, and it'd be a waste of computation.
Option B:
Use tf.while_loop. My idea was to treat all the images along the time axis for a single minibatch element as a minibatch for the CNN. Essentially, the CNN would process batches of shape [TIME SEQUENCE, H, W, C] at each iteration (not exactly TIME SEQUENCE many images every time; the exact number depends on the sequence length). The code I have right now looks like this:
# The output tensor that I want populated
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))

# Counter for the loop. I'll process one batch element per iteration.
# One batch element contains a variable number of images, one per time step;
# all these images form a minibatch for the CNN.
loop_counter = tf.get_variable('loop_counter', dtype=tf.int32, initializer=0)

# Loop variables that will be passed to the body and cond methods
loop_vars = [input_image_sequence, sequence_lengths, image_output_sequence, loop_counter]
# input_image_sequence:  [BATCH SIZE, TIME SEQUENCE, H, W, C]
# sequence_lengths:      [BATCH SIZE]
# image_output_sequence: [BATCH SIZE, TIME SEQUENCE, 1]

# Abbreviations for the vars in loop_vars:
# iis --> input_image_sequence
# sl  --> sequence_lengths
# ios --> image_output_sequence
# lc  --> loop_counter

def cond(iis, sl, ios, lc):
    return tf.less(lc, batch_size)

def body(iis, sl, ios, lc):
    seq_len = sl[lc]  # the sequence length of the current batch element
    cnn_input_batch = iis[lc, :seq_len]  # extract the relevant portion (the rest is just padded zeros)

    # Propagate this 'batch' through the CNN; the output has shape [seq_len, 1]
    cnn_output_batch = my_cnn_model.process_input(cnn_input_batch)

    # Pad the remaining time steps with zeros
    padding = [[0, max_sequence_length - seq_len], [0, 0]]
    padded_cnn_output = tf.pad(cnn_output_batch, paddings=padding, mode='CONSTANT', constant_values=0)

    # The problematic part: assign these processed values to the output tensor
    ios[lc].assign(padded_cnn_output)

    return [iis, sl, ios, lc + 1]

_, _, result, _ = tf.while_loop(cond, body, loop_vars, swap_memory=True)
Inside my_cnn_model.process_input, I'm just passing the input through a vanilla CNN. All the variables created in it use reuse=tf.AUTO_REUSE, which should ensure that the while loop reuses the same weights for all iterations.
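For context, here is a simplified, hypothetical sketch of what process_input might look like (the layer names and sizes are placeholders; the real model is just a vanilla CNN ending in a single score per image):
import tensorflow as tf

# Hypothetical sketch of my_cnn_model.process_input (placeholder layer names/sizes).
# Every call enters the same variable scope with reuse=tf.AUTO_REUSE, so the first
# call creates the weights and every later call (i.e. every loop iteration) reuses them.
def process_input(images):  # images: [seq_len, H, W, C]
    with tf.variable_scope("cnn", reuse=tf.AUTO_REUSE):
        x = tf.layers.conv2d(images, filters=8, kernel_size=3,
                             activation=tf.nn.relu, name="conv1")
        x = tf.reduce_mean(x, axis=[1, 2])           # global average pooling
        return tf.layers.dense(x, 1, name="score")   # one score per image -> [seq_len, 1]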
The exact problem
image_output_sequence is a variable, but somehow when tf.while_loop calls the body method, it gets turned into a Tensor type object to which assignments can't be made. I get the error message: Sliced assignment is only supported for variables
This problem persists even if I use another format like using a tuple of BATCH SIZE Tensors each with dimensions [TIME SEQUENCE, H, W, C].
I'm open to a complete redesign of the code as well, as long as it gets the job done nicely.
The solution is to use an object of type TensorArray, which is specifically made to address such problems. The following line:
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))
is replaced by:
image_output_sequence = tf.TensorArray(size=batch_size, dtype=tf.float32, element_shape=[max_sequence_length, 1], infer_shape=True)
TensorArray doesn't actually require a fixed shape for each element, but for my case it is fixed, so it's better to enforce it.
Then inside the body function, replace this:
ios[lc].assign(padded_cnn_output)
with:
ios = ios.write(lc, padded_cnn_output)
Then after the tf.while_loop statement, the TensorArray can be stacked to form a regular Tensor for further processing:
stacked_tensor = result.stack()
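For completeness, here's a minimal, self-contained sketch of the whole pattern (TF 1.x graph mode; the row contents are just stand-ins for the padded CNN output):
import tensorflow as tf

# Minimal TF 1.x sketch: accumulate one row per loop iteration in a TensorArray,
# then stack it into a regular [batch_size, max_sequence_length, 1] tensor.
batch_size, max_sequence_length = 4, 6

ta = tf.TensorArray(dtype=tf.float32, size=batch_size,
                    element_shape=[max_sequence_length, 1], infer_shape=True)

def cond(ta, i):
    return tf.less(i, batch_size)

def body(ta, i):
    # Stand-in for the padded CNN output of one batch element.
    row = tf.fill([max_sequence_length, 1], tf.cast(i, tf.float32))
    return ta.write(i, row), i + 1

final_ta, _ = tf.while_loop(cond, body, [ta, tf.constant(0)])
stacked = final_ta.stack()  # shape: [batch_size, max_sequence_length, 1]

with tf.Session() as sess:
    print(sess.run(stacked).shape)  # (4, 6, 1)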
I'm trying to understand how the weights are scaled in an RNNCell when going from training to inference in TensorFlow.
Consider the following placeholders defined as:
data = tf.placeholder(tf.int32, [None, max_seq_len])
targets = tf.placeholder(tf.int32, [None, max_seq_len])
During training the batch_size is set to 10, e.g. both tensors have shape [10,max_seq_len]. However, during inference only one example is used, not a batch of ten, so the tensors have shape [1,max_seq_len].
TensorFlow handles this dimension change seamlessly; however, I'm uncertain how it does this.
My hypothesis is that the weight tensors in the RNNCell are actually of shape [1, hidden_dim], and scaling to larger batch sizes is achieved by broadcasting, but I'm unable to find anything that reflects this in the source. I've read through the rnn source and the rnn cell source. Any help with understanding this would be much appreciated.
You have defined your data tensor as data = tf.placeholder(tf.int32, [None, max_seq_len]), which means that the first dimension will change according to the input, but the second dimension will always remain max_seq_len.
So if max_seq_len = 5, your feed shape can be [1, 5], [2, 5], [3, 5], and so on: you can change the first dimension but not the second one.
If you change the second dimension to a number other than 5, it will throw a shape-mismatch (or similar) error.
Your input's first dimension, which is the batch_size, won't affect the weight matrix of any of the neurons in your network.
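To make this concrete, here's a minimal TF 1.x sketch (using float inputs directly instead of the int32 ids from the question, just to keep it self-contained) showing that the cell's variable shapes depend only on the input and hidden dimensions, never on the batch size:
import tensorflow as tf

tf.reset_default_graph()
hidden_dim, input_dim, max_seq_len = 8, 5, 7

# Batch dimension is None: any batch size reuses the exact same weights.
data = tf.placeholder(tf.float32, [None, max_seq_len, input_dim])
cell = tf.nn.rnn_cell.BasicRNNCell(hidden_dim)
outputs, state = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)

for v in tf.trainable_variables():
    print(v.name, v.shape)
# e.g. rnn/basic_rnn_cell/kernel:0  (13, 8)  -> (input_dim + hidden_dim, hidden_dim)
#      rnn/basic_rnn_cell/bias:0    (8,)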
I am getting the following warning in TensorFlow: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
The reason I am getting this is:
import tensorflow as tf

# Flatten batch elements to a rank-2 tensor where the first max_length rows
# belong to the first batch element, and so forth.
all_timesteps = tf.reshape(raw_output, [-1, n_dim])  # (batch_size*max_length, n_dim)

# Indices to the last element of each sequence.
# The index of the first element is the sequence order number times the max sequence length.
# The index of the last element is the index of the first element plus the sequence length minus one.
row_inds = tf.range(0, batch_size) * max_length + (seq_len - 1)

# Gather the rows at the indices of the last elements of the sequences.
# http://stackoverflow.com/questions/35892412/tensorflow-dense-gradient-explanation
# The warning is due to gather returning an IndexedSlices, which is later
# converted into a dense Tensor for the gradient calculation.
last_timesteps = tf.gather(all_timesteps, row_inds)  # (batch_size, n_dim)
tf.gather is causing the issue. I had been ignoring it until now because my architectures were not really big. However, now I have bigger architectures and a lot of data, and I am facing out-of-memory issues when training with batch sizes larger than 10. I believe that dealing with this warning would allow me to fit my models in GPU memory.
Please note that I am using TensorFlow 1.3.
I managed to solve the issue by using tf.dynamic_partition instead of tf.gather. I replaced the above code like this:
# Flatten batch elements to a rank-2 tensor where the first max_length rows
# belong to the first batch element, and so forth.
all_timesteps = tf.reshape(raw_output, [-1, n_dim])  # (batch_size*max_length, n_dim)

# Indices to the last element of each sequence.
# The index of the first element is the sequence order number times the max sequence length.
# The index of the last element is the index of the first element plus the sequence length minus one.
row_inds = tf.range(0, batch_size) * max_length + (seq_len - 1)

# Create a vector of 0s and 1s that specifies which timesteps to choose.
partitions = tf.reduce_sum(tf.one_hot(row_inds, tf.shape(all_timesteps)[0], dtype='int32'), 0)

# Select the elements we want: dynamic_partition returns a list of 2 tensors,
# and partition 1 holds the chosen rows.
last_timesteps = tf.dynamic_partition(all_timesteps, partitions, 2)
last_timesteps = last_timesteps[1]  # (batch_size, n_dim)
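A quick toy check of the same trick with concrete (made-up) numbers:
import tensorflow as tf
import numpy as np

# Pick the last valid timestep of each sequence from the flattened tensor.
batch_size, max_length, n_dim = 3, 4, 2
seq_len = tf.constant([2, 4, 1])

raw_output = tf.constant(np.arange(batch_size * max_length * n_dim,
                                   dtype=np.float32).reshape(batch_size, max_length, n_dim))
all_timesteps = tf.reshape(raw_output, [-1, n_dim])

row_inds = tf.range(0, batch_size) * max_length + (seq_len - 1)
partitions = tf.reduce_sum(tf.one_hot(row_inds, tf.shape(all_timesteps)[0], dtype='int32'), 0)
last_timesteps = tf.dynamic_partition(all_timesteps, partitions, 2)[1]

with tf.Session() as sess:
    print(sess.run(last_timesteps))
    # rows at flat indices 1, 7 and 8, i.e. the last valid step of each sequence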
I use two stacked dynamic_rnns in my model, meaning that the initial_state of the second dynamic_rnn is the final_state output by the first dynamic_rnn. My loss function is calculated only from the output of the second dynamic_rnn. My question is: would the gradient be back-propagated to the first dynamic_rnn?
You may ask why I use two dynamic_rnns instead of one. The answer is that, for my problem, most input sequences are completely identical except for the last step. So, to save time, I run dynamic_rnn once on the common part of all these input sequences and feed the resulting final_state to another dynamic_rnn that accepts the distinct last input elements.
Suppose that we have 3 sequences with length 10. All these sequences are identical except the last step (the 10th element). The simplified code:
cell = tf.nn.rnn_cell.BasicRNNCell(hidden_state_dim)

# The first dynamic_rnn, which handles the common part
first_outputs, first_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float32,
    sequence_length=[9],          # only one sample, with length 9
    inputs=identical_input)       # input with shape (1, 9, input_element_dim)

# Tile first_states to accommodate the next dynamic_rnn:
# first_states is transformed from shape (1, hidden_state_dim) to (3, hidden_state_dim)
first_states = tf.reshape(tf.tile(first_states, [1, 3]), [3, hidden_state_dim])

# The second dynamic_rnn, which handles the distinct last element
second_outputs, second_states = tf.nn.dynamic_rnn(
    initial_state=first_states,
    cell=cell,
    dtype=tf.float32,
    sequence_length=[1, 1, 1],    # 3 samples with only one element each
    inputs=distinct_input)        # input with shape (3, 1, input_element_dim)

# Calculate the loss based on second_outputs
loss = some_loss_function(second_outputs, ground_truth)
It should. If you're seeing problems, please describe the error you're getting in a little more detail.
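As a rough way to convince yourself, you can check that the gradient of the loss with respect to the first RNN's input is not None; here's a minimal TF 1.x sketch modelled on the code in the question (the explicit variable scopes are an assumption, and the exact reuse handling may differ slightly between 1.x versions):
import tensorflow as tf

tf.reset_default_graph()
hidden_state_dim, input_element_dim = 4, 3

identical_input = tf.placeholder(tf.float32, [1, 9, input_element_dim])
distinct_input = tf.placeholder(tf.float32, [3, 1, input_element_dim])

cell = tf.nn.rnn_cell.BasicRNNCell(hidden_state_dim)
with tf.variable_scope("shared_rnn"):
    _, first_states = tf.nn.dynamic_rnn(cell, identical_input,
                                        sequence_length=[9], dtype=tf.float32)
first_states = tf.reshape(tf.tile(first_states, [1, 3]), [3, hidden_state_dim])
with tf.variable_scope("shared_rnn", reuse=True):
    second_outputs, _ = tf.nn.dynamic_rnn(cell, distinct_input,
                                          initial_state=first_states,
                                          sequence_length=[1, 1, 1])

loss = tf.reduce_mean(second_outputs)
# A gradient tensor (rather than None) means the error reaches the first dynamic_rnn.
print(tf.gradients(loss, [identical_input]))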
I am using dynamic_rnn to process MNIST data:
# LSTM cell
lstm = rnn_cell.LSTMCell(num_units=200,
                         forget_bias=1.0,
                         initializer=tf.random_normal)

# Initial state
istate = lstm.zero_state(batch_size, "float")

# Get the LSTM cell output
output, states = rnn.dynamic_rnn(lstm, X, initial_state=istate)

# Output at the last time point T
output_at_T = output[:, 27, :]
Full code: http://pastebin.com/bhf9MgMe
The input to the lstm is (batch_size, sequence_length, input_size)
As a result, the dimensions of output are (batch_size, sequence_length, num_units), where num_units=200.
I need to get the last output along the sequence_length dimension. In the code above, this is hardcoded as 27. However, I do not know the sequence_length in advance as it can change from batch to batch in my application.
I tried:
output_at_T = output[:, -1, :]
but it says that negative indexing is not implemented yet. I also tried using a placeholder variable as well as a constant (into which I could ideally feed the sequence_length for a particular batch); neither worked.
Any way to implement something like this in TensorFlow at the moment?
Have you noticed that there are two outputs from dynamic_rnn?
Output 1, let's call it h, has all the outputs at each time step (i.e. h_1, h_2, etc.).
Output 2, final_state, has two elements: the cell_state and the last output for each element of the batch (as long as you pass the sequence lengths to dynamic_rnn).
So from:
h, final_state = tf.nn.dynamic_rnn(..., sequence_length=sequence_lengths, ...)  # sequence_lengths: a vector of length batch_size
the last state for each element in the batch is:
final_state.h
Note that this includes the case when the length of the sequence is different for each element of the batch, as we are using the sequence_length argument.
This is what gather_nd is for!
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of a tensor.
    :param data: TensorFlow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.gather_nd(data, indices)
    return res
In your case (assuming sequence_length is a 1-D tensor with the length of each axis 0 element):
output = extract_axis_1(output, sequence_length - 1)
Now output is a tensor of dimension [batch_size, num_cells].
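A small usage example (TF 1.x, with made-up shapes), using the extract_axis_1 function defined above:
import tensorflow as tf
import numpy as np

# Pick the last valid output of each sequence given per-example lengths.
output = tf.constant(np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3))
sequence_length = tf.constant([4, 2])

last_outputs = extract_axis_1(output, sequence_length - 1)

with tf.Session() as sess:
    print(sess.run(last_outputs))
    # -> output[0, 3] and output[1, 1], shape (2, 3)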
output[:, -1, :]
works with TensorFlow 1.x now!
Most answers cover it thoroughly, but this code snippet might help you understand what's really being returned by the dynamic_rnn layer: a tuple of (outputs, final_output_state).
So for an input with a maximum sequence length of T time steps, outputs has the shape [batch_size, T, num_units] (given time_major=False, the default) and contains the output state at each timestep h_1, h_2, ..., h_T.
final_output_state holds the final cell state c_T and output state h_T of each batch sequence, each of shape [batch_size, num_units].
But since dynamic_rnn is being used, my guess is that your sequence lengths vary from batch to batch.
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

tf.reset_default_graph()

# Create input data
X = np.random.randn(2, 10, 8)
# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

outputs, states = tf.nn.dynamic_rnn(cell=cell,
                                    dtype=tf.float64,
                                    sequence_length=X_lengths,
                                    inputs=X)

result = tf.contrib.learn.run_n({"outputs": outputs, "states": states},
                                n=1,
                                feed_dict=None)

assert result[0]["outputs"].shape == (2, 10, 64)
print(result[0]["outputs"].shape)
print(result[0]["states"].h.shape)

# The final output and the final state returned must be equal for each sequence
assert (result[0]["outputs"][0][-1] == result[0]["states"].h[0]).all()
assert (result[0]["outputs"][-1][5] == result[0]["states"].h[-1]).all()
assert (result[0]["outputs"][-1][-1] == result[0]["states"].h[-1]).all()
The final assertion will fail, as the final state for the 2nd sequence is at the 6th time step, i.e. index 5, and the outputs at indices 6 through 9 are all 0s for the 2nd sequence.
I am new to Stack Overflow and cannot comment yet, so I am writing this new answer. @VM_AI, the last index is tf.shape(output)[1] - 1.
So, reusing your answer:
# First fetch the last index of the sequence length
# (last_index has a scalar value)
last_index = tf.shape(output)[1] - 1
# Then transpose the output to [sequence_length, batch_size, num_units] for convenience
output_rs = tf.transpose(output, [1, 0, 2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs, last_index)
This works for me.
You should be able to access the shape of your output tensor using tf.shape(output). The tf.shape() function will return a 1d tensor containing the sizes of the output tensor. In your example, this would be (batch_size, sequence_length, num_units)
You should then be able to extract the value of output_at_T as output[:, tf.shape(output)[1], :]
There is a function in TensorFlow, tf.shape, that lets you get the symbolic value of the shape rather than the None returned by output._shape[1]. After fetching the last index, you can look it up using tf.nn.embedding_lookup, which is recommended especially when the amount of data to fetch is high, as it performs parallel lookups (32 in parallel by default).
# First fetch the last index of the sequence length
# (last_index has a scalar value)
last_index = tf.shape(output)[1]
# Then transpose the output to [sequence_length, batch_size, num_units] for convenience
output_rs = tf.transpose(output, [1, 0, 2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs, last_index)
This should work.
Just to clarify what @Benoit Steiner said: his solution would not work, as tf.shape returns a symbolic value of the shape, and such a value cannot be used for slicing tensors, i.e. direct indexing.