Accumulating output from a graph using tf.while_loop (TensorFlow) - python

Long story short, I have an RNN that is stacked on top of a CNN.
The CNN was created and trained separately. To clarify things, let's suppose the CNN takes input in the form of a [BATCH SIZE, H, W, C] placeholder (H = height, W = width, C = number of channels).
Now, when stacked on top of the RNN, the overall input to the combined network will have the shape [BATCH SIZE, TIME SEQUENCE, H, W, C], i.e. each sample in the minibatch consists of TIME SEQUENCE many images. Moreover, the time sequences are variable in length. There is a separate placeholder called sequence_lengths with shape [BATCH SIZE] that contains scalar values corresponding to the length of each sample in the minibatch. The value of TIME SEQUENCE corresponds to the maximum possible time sequence length; for samples with smaller lengths, the remaining values are padded with zeros.
What I want to do
I want to accumulate the output from the CNN in a tensor of shape [BATCH SIZE, TIME SEQUENCE, 1] (the last dimension just contains the final score output by the CNN for each time sample for each batch element) so that I can forward this entire chunk of information to the RNN that is stacked on top of the CNN. The tricky thing is, I also want to be able to back-propagate the error from the RNN to the CNN (the CNN is already pre-trained, but I would like to fine-tune the weights a bit), so I have to stay inside the graph, i.e. I can't make any calls to session.run().
Option A:
The easiest way would be to just reshape the overall network input tensor to [BATCH SIZE * TIME SEQUENCE, H, W, C]. The problem with this is that BATCH SIZE * TIME SEQUENCE may be as large as 2000, so I'm bound to run out of memory when trying to feed a batch that big into my CNN. The batch size would also be too large for training anyway. Also, a lot of the sequence entries are just padded zeros, so it would be a waste of computation.
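For reference, the Option A reshape itself is a one-liner (a sketch, assuming H, W, C are static Python ints and input_image_sequence is the overall input tensor):
# Flatten batch and time into one axis; -1 lets TF infer BATCH SIZE * TIME SEQUENCE
flat_input = tf.reshape(input_image_sequence, [-1, H, W, C])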
Option B:
Use tf.while_loop. My idea was to treat all the images along the time axis for a single minibatch element as a minibatch for the CNN. Essentially, the CNN would be processing batches of shape [TIME SEQUENCE, H, W, C] at each iteration (not exactly TIME SEQUENCE many images every time; the exact number depends on the sequence length). The code I have right now looks like this:
# The output tensor that I want populated
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))

# Counter for the loop. I'll process one batch element per iteration.
# One batch element contains a variable number of images, one per time step.
# Together, these images form a minibatch for the CNN.
loop_counter = tf.get_variable('loop_counter', dtype=tf.int32, initializer=0)

# Loop variables that will be passed to the body and cond methods
loop_vars = [input_image_sequence, sequence_lengths, image_output_sequence, loop_counter]
# input_image_sequence: [BATCH SIZE, TIME SEQUENCE, H, W, C]
# sequence_lengths: [BATCH SIZE]
# image_output_sequence: [BATCH SIZE, TIME SEQUENCE, 1]

# Abbreviations for the vars in loop_vars:
# iis --> input_image_sequence
# sl  --> sequence_lengths
# ios --> image_output_sequence
# lc  --> loop_counter

def cond(iis, sl, ios, lc):
    return tf.less(lc, batch_size)

def body(iis, sl, ios, lc):
    seq_len = sl[lc]  # the sequence length of the current batch element
    cnn_input_batch = iis[lc, :seq_len]  # extract the relevant portion (the rest is just padded zeros)
    # Propagate this 'batch' through the CNN; the output has shape [seq_len, 1]
    cnn_output = my_cnn_model.process_input(cnn_input_batch)
    # Pad the remaining time steps back up to the full length
    padding = [[0, max_sequence_length - seq_len], [0, 0]]
    padded_cnn_output = tf.pad(cnn_output, paddings=padding, mode='CONSTANT', constant_values=0)
    # The problematic part: assign these processed values to the output tensor
    ios[lc].assign(padded_cnn_output)
    return [iis, sl, ios, lc + 1]

_, _, result, _ = tf.while_loop(cond, body, loop_vars, swap_memory=True)
Inside my_cnn_model.process_input, I'm just passing the input through a vanilla CNN. All the variables created in it are with tf.AUTO_REUSE, so that should ensure that the while loop reuses the same weights for all the loop iterations.
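For context, the weight-sharing pattern described above looks roughly like this (a hedged sketch; the actual my_cnn_model code is not shown in the question, and the layers are illustrative):
def process_input(images):
    # images: [seq_len, H, W, C]
    with tf.variable_scope('cnn', reuse=tf.AUTO_REUSE):
        conv = tf.layers.conv2d(images, filters=16, kernel_size=3,
                                activation=tf.nn.relu, name='conv1')
        pooled = tf.reduce_mean(conv, axis=[1, 2])  # global average pooling
        return tf.layers.dense(pooled, units=1, name='score')  # [seq_len, 1]
Because the scope is entered with reuse=tf.AUTO_REUSE, the first call creates the variables and every later call (i.e. every loop iteration) reuses them.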
The exact problem
image_output_sequence is a Variable, but somehow, when tf.while_loop calls the body method, it gets turned into a Tensor-type object to which assignments can't be made. I get the error message: Sliced assignment is only supported for variables
This problem persists even if I use another format, such as a tuple of BATCH SIZE tensors, each with dimensions [TIME SEQUENCE, H, W, C].
I'm open to a complete redesign of the code as well, as long as it gets the job done nicely.

The solution is to use an object of type TensorArray, which is specifically made to address such problems. The following line:
image_output_sequence = tf.Variable(tf.zeros([batch_size, max_sequence_length, 1], tf.float32))
is replaced by:
image_output_sequence = tf.TensorArray(size=batch_size, dtype=tf.float32, element_shape=[max_sequence_length, 1], infer_shape=True)
TensorArray doesn't actually require a fixed shape for each element, but in my case it is fixed, so it's better to enforce it.
Then inside the body function, replace this:
ios[lc].assign(padded_cnn_output)
with:
ios = ios.write(lc, padded_cnn_output)
Then after the tf.while_loop statement, the TensorArray can be stacked to form a regular Tensor for further processing:
stacked_tensor = result.stack()
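Putting the pieces together, a minimal end-to-end sketch of the fixed loop might look like this (assuming, as in the question, that my_cnn_model.process_input returns a [seq_len, 1] score tensor):
image_output_sequence = tf.TensorArray(size=batch_size, dtype=tf.float32,
                                       element_shape=[max_sequence_length, 1],
                                       infer_shape=True)

def cond(iis, sl, ios, lc):
    return tf.less(lc, batch_size)

def body(iis, sl, ios, lc):
    seq_len = sl[lc]
    cnn_output = my_cnn_model.process_input(iis[lc, :seq_len])  # [seq_len, 1]
    padded = tf.pad(cnn_output, [[0, max_sequence_length - seq_len], [0, 0]])
    return [iis, sl, ios.write(lc, padded), lc + 1]

loop_vars = [input_image_sequence, sequence_lengths, image_output_sequence,
             tf.constant(0)]
_, _, result, _ = tf.while_loop(cond, body, loop_vars, swap_memory=True)
stacked_tensor = result.stack()  # [batch_size, max_sequence_length, 1]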

Related

How to reshape a dataset in order to make it sequential?

I have a classic dataset of images and labels.
Here is a simple representation of the __getitem__ function:
def __getitem__(self, index):
    (img_path, label) = df.iloc[index].values
    img = Image.open(img_path).convert("RGB")
    y = torch.tensor(label)
    return (img, y)
I have:
dataset = ClassDataset()
train_set, validation_set = random_split(dataset)
train_loader = DataLoader(dataset=train_set)
The size of one batch from the train loader would be [32, 3, 256, 256], with 32 being the batch size, 3 the number of channels, and 256 the width and height of my images.
I want to modify the shape of one batch so that it is sequential, [8, 4, 3, 256, 256], with 8 being the batch size and 4 the length of one sequence.
I know that it could easily be done with torch.view() or torch.reshape(), since my data are already in the right order (they can be grouped directly into sequences).
But I want to know where the most intelligent place to make this change is: in the dataset class, in the dataloader, or in the training loop.
I already tried passing sequences into __getitem__:
(img_path, coords) = df.iloc[4*(index-1):4*index].values
(assuming that the sequence length is 4), but it didn't work.
It is more relevant to do this kind of processing in the dataset layer. Indeed, what you are implementing there is "given a dataset index index, return the corresponding input and its label". In your case you are dealing with a sequence as input, so it makes sense for your __getitem__ to return a sequence of images, as sketched below.
The data loader will automatically collate the data such that you get (batch_size, seq_len, channel, height, width) for your input, and (batch_size, seq_len) for your label (or (batch_size,) if there is meant to be a single label per sequence).
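As an illustration, such a sequence-level dataset might look like the sketch below (a hypothetical class, assuming the rows of df are already ordered so that every 4 consecutive rows form one sequence):
import torch
from torch.utils.data import Dataset
from PIL import Image
from torchvision import transforms

class SequenceDataset(Dataset):
    """Groups every seq_len consecutive rows of df into one sample."""
    def __init__(self, df, seq_len=4):
        self.df = df
        self.seq_len = seq_len

    def __len__(self):
        return len(self.df) // self.seq_len

    def __getitem__(self, index):
        rows = self.df.iloc[index * self.seq_len:(index + 1) * self.seq_len]
        imgs, labels = [], []
        for img_path, label in rows.values:
            img = Image.open(img_path).convert("RGB")
            imgs.append(transforms.functional.to_tensor(img))
            labels.append(label)
        # imgs stack to [seq_len, 3, H, W]; labels to [seq_len]
        return torch.stack(imgs), torch.tensor(labels)
A DataLoader with batch_size=8 over this dataset then yields inputs of shape [8, 4, 3, 256, 256] without any reshaping in the training loop.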

Variable sentence length for LSTM using word2vec as inputs on tensorflow

I am building an LSTM model using word2vec as the input. I am using the tensorflow framework. I have finished the word embedding part, but I am stuck on the LSTM part.
The issue here is that I have different sentence lengths, which means that I have to either do padding or use dynamic_rnn with a specified sequence length. I am struggling with both of them.
Padding.
The confusing part about padding is what happens when I pad. My model goes like this:
word_matrix=model.wv.syn0
X = tf.placeholder(tf.int32, shape)
data = tf.placeholder(tf.float32, shape)
data = tf.nn.embedding_lookup(word_matrix, X)
Then, I am feeding sequences of word indices (into word_matrix) through X. I am worried that if I pad zeros onto the sequences fed into X, I would incorrectly keep feeding unnecessary input (word_matrix[0] in this case).
So, I am wondering what the correct way of zero-padding is. It would be great if you could let me know how to implement it with tensorflow.
dynamic_rnn
For this, I have declared a list containing all the sentence lengths, and I feed those along with X and y at the end. In this case, I cannot feed the inputs as a batch though. Then, I encountered this error (ValueError: as_list() is not defined on an unknown TensorShape.), which suggests to me that the sequence_length argument only accepts a list? (My thoughts might be entirely incorrect though.)
The following is my code for this.
X = tf.placeholder(tf.int32)
labels = tf.placeholder(tf.int32, [None, numClasses])
length = tf.placeholder(tf.int32)
data = tf.placeholder(tf.float32, [None, None, numDimensions])
data = tf.nn.embedding_lookup(word_matrix, X)
lstmCell = tf.contrib.rnn.BasicLSTMCell(lstmUnits, state_is_tuple=True)
lstmCell = tf.contrib.rnn.DropoutWrapper(cell=lstmCell, output_keep_prob=0.25)
initial_state = lstmCell.zero_state(batchSize, tf.float32)
value, _ = tf.nn.dynamic_rnn(lstmCell, data, sequence_length=length,
                             initial_state=initial_state, dtype=tf.float32)
I am really struggling with this part, so any help would be much appreciated.
Thank you in advance.
TensorFlow does not support variable-length Tensors, so when you declare a Tensor, the list/numpy array should have a uniform shape.
From your first part, what I understand is that you are already able to pad zeros onto the last time steps of each sequence, which is what the ideal situation should be. Here is how it should look for a batch size of 4, a max sequence length of 10, and 50 hidden units:
[4, 10, 50] would be the size of your whole batch, but internally it may be shaped like this when you try to visualize the paddings:
[[5+5pad, 50], [10, 50], [8+2pad, 50], [9+1pad, 50]]
Each pad represents one time step with a hidden-state-size-50 Tensor filled with nothing but zeros. Look at this question and this one to learn more about how to pad manually.
You will use dynamic_rnn for the exact reason that you do not want to compute over the padded steps; the tf.nn.dynamic_rnn api ensures that via the sequence_length argument.
For the example above, that argument will be [5, 10, 8, 9]. You can compute it by counting the non-zero entries for each batch component. A simple way to do that would be:
# data is [batch_size, max_time, num_dims]; a time step counts as real if
# any of its features is non-zero
data_mask = tf.reduce_any(tf.cast(data, tf.bool), axis=2)
data_len = tf.reduce_sum(tf.cast(data_mask, tf.int32), axis=1)
and pass it to the tf.nn.dynamic_rnn api:
tf.nn.dynamic_rnn(lstmCell, data, sequence_length=data_len, initial_state=initial_state)
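Alternatively, if the padding index is 0, the same lengths can be computed from the integer index tensor X before the embedding lookup (a sketch, assuming X has shape [batch_size, max_time]):
# Count the non-padding token indices per sequence (padding index assumed to be 0)
data_len = tf.reduce_sum(tf.cast(tf.not_equal(X, 0), tf.int32), axis=1)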

Output of Tensorflow LSTM-Cell

I've got a question about the TensorFlow LSTM implementation. There are currently several implementations in TF, but I use:
cell = tf.contrib.rnn.BasicLSTMCell(n_units)
where n_units is the number of 'parallel' LSTM cells.
Then to get my output I call:
rnn_outputs, rnn_states = tf.nn.dynamic_rnn(cell, x,
                                            initial_state=initial_state,
                                            time_major=False)
where (as time_major=False) x is of shape (batch_size, time_steps, input_length),
where batch_size is my batch size,
where time_steps is the number of timesteps my RNN will go through,
and where input_length is the length of one of my input vectors (the vector fed into the network at one specific timestep in one specific batch).
I expect rnn_outputs to be of shape (batch_size, time_steps, n_units, input_length), as I have not specified another output size.
The documentation of nn.dynamic_rnn tells me that the output is of shape (batch_size, input_length, cell.output_size).
The documentation of tf.contrib.rnn.BasicLSTMCell does have a property output_size, which defaults to n_units (the number of LSTM cells I use).
So does each LSTM cell only output a scalar for every given timestep? I would expect it to output a vector of the length of the input vector. This seems not to be the case from how I understand it right now, so I am confused. Can you tell me whether that's the case, or how I could change it to output a vector of the size of the input vector per single LSTM cell?
I think the primary confusion is about the terminology of the LSTM cell's argument num_units. Unfortunately it doesn't mean, as the name suggests, "the number of LSTM cells" (which would then have to equal your number of time steps). It is actually the dimensionality of the hidden state vector (and of the cell state vector).
The call to dynamic_rnn() returns a tensor of shape [batch_size, time_steps, output_size], where
(please note this) output_size = num_units, if num_proj is None in the LSTM cell;
whereas output_size = num_proj, if it is defined.
Now, typically, you extract the result of the last time step and project it to the size of the output dimensions, either manually with a matmul + biases operation or by using the num_proj argument of the LSTM cell.
I went through the same confusion and had to look really deep to get it cleared up. Hope this answer clears some of it.
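To make the shapes concrete, here is a hedged sketch of the manual projection route (x, n_classes and the variable names are illustrative, not from the question):
cell = tf.contrib.rnn.BasicLSTMCell(num_units=200)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
# outputs: [batch_size, time_steps, 200]; state.h: [batch_size, 200]
W = tf.get_variable('W_out', [200, n_classes])
b = tf.get_variable('b_out', [n_classes], initializer=tf.zeros_initializer())
logits = tf.matmul(state.h, W) + b  # [batch_size, n_classes]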

TensorFlow Multi-Layer Perceptron

I am learning TensorFlow, and my goal is to implement a multilayer perceptron for my needs. I checked the MNIST tutorial with a multilayer-perceptron implementation and everything was clear to me except this:
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                              y: batch_y})
I guess x is the image itself (28*28 pixels, so the input is 784 neurons) and y is a label, which is a 1x10 array:
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
They feed whole batches (which are packs of data points and labels)! How does tensorflow interpret this "batch" input? And how does it update the weights: simultaneously after each element in a batch, or after running through the whole batch?
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders, and in which form should I feed them into the session?
When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything at all?
When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could be a 2-D list.
When I feed my float numpy array X_train to x, which is :
x = tf.placeholder("float", [1, n_input])
I receive an error:
ValueError: Cannot feed value of shape (1, 18) for Tensor 'Placeholder_10:0', which has shape '(1, 1)'
It appears that I have to create my data as a Tensor too?
When I tried [18x1]:
Cannot feed value of shape (18, 1) for Tensor 'Placeholder_12:0', which has shape '(1, 1)'
They feed whole batches (which are packs of data points and labels)!
Yes, this is how neural networks are usually trained (due to some nice mathematical properties, it gives you the best of both worlds: a better gradient approximation than SGD on the one hand, and much faster convergence than full GD on the other).
How does tensorflow interpret this "batch" input?
It "interprets" it according to operations in your graph. You probably have reduce mean somewhere in your graph, which calculates average over your batch, thus causing this to be the "interpretation".
And how does it update the weights: 1. simultaneously after each element in a batch? 2. after running through the whole batch?
As in the previous answer, there is nothing "magical" about a batch; it is just another dimension, and each internal operation of the neural net is well defined for a batch of data, so there is still a single update in the end. Since you use a reduce-mean operation (or maybe reduce-sum?), you are updating according to the mean of the "small" gradients (or their sum, if there is a reduce-sum instead). Again, you can control this (up to the aggregating behaviour; you cannot force it to do per-sample updates unless you introduce a while loop into the graph).
And, if I need to input one number (input_shape = [1,1]) and output four numbers (output_shape = [1,4]), how should I change the tf.placeholders and in which form should I feed them into the session? THANKS!!
Just set the variables n_input=1 and n_classes=4, and push your data as before, as [batch, n_input] and [batch, n_classes] arrays (in your case batch=1, if by "1x1" you mean "one sample of dimension 1", since your edit suggests that you actually do have a batch, and by 1x1 you meant a 1-d input).
EDIT: 1. When I ask how tensorflow interprets it, I want to know how tensorflow splits the batch into single elements. For example, a batch is a 2-D array, right? In which direction does it split the array? Or does it use matrix operations and not split anything? 2. When I ask how I should feed my data, I want to know whether it should be a 2-D array with samples in its rows and features in its columns, or whether it could be a 2-D list.
It does not split anything. It is just a matrix, and each operation is perfectly well defined for matrices as well. Usually you put examples in rows, i.e. in the first dimension, and this is exactly what [batch, n_inputs] says: that you have batch rows, each with n_inputs columns. But again, there is nothing special about it, and you could also create a graph that accepts column-wise batches if you really needed to.
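For the concrete case asked about, a minimal sketch (X_train and Y_train are assumed numpy arrays; the names are illustrative):
# one input feature and four outputs per sample
x = tf.placeholder("float", [None, 1])
y = tf.placeholder("float", [None, 4])
# ...build the network between x and y as in the MNIST tutorial...
# feed 2-D arrays, samples in rows, features in columns:
# sess.run([optimizer, cost], feed_dict={x: X_train.reshape(-1, 1),
#                                        y: Y_train.reshape(-1, 4)})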

Get last output of dynamic_rnn in tensorflow?

I am using dynamic_rnn to process MNIST data:
# LSTM Cell
lstm = rnn_cell.LSTMCell(num_units=200,
                         forget_bias=1.0,
                         initializer=tf.random_normal)
# Initial state
istate = lstm.zero_state(batch_size, "float")
# Get lstm cell output
output, states = rnn.dynamic_rnn(lstm, X, initial_state=istate)
# Output at last time point T
output_at_T = output[:, 27, :]
Full code: http://pastebin.com/bhf9MgMe
The input to the lstm is (batch_size, sequence_length, input_size).
As a result, the dimensions of output are (batch_size, sequence_length, num_units), where num_units=200.
I need to get the last output along the sequence_length dimension. In the code above, this is hardcoded as 27. However, I do not know the sequence_length in advance as it can change from batch to batch in my application.
I tried:
output_at_T = output[:, -1, :]
but it says negative indexing is not implemented yet. I also tried using a placeholder variable as well as a constant (into which I could ideally feed the sequence_length for a particular batch); neither worked.
Any way to implement something like this in tensorflow atm?
Have you noticed that there are two outputs from dynamic_rnn?
Output 1, let's call it h, has all the outputs at each time step (i.e. h_1, h_2, etc.).
Output 2, final_state, has two elements: the cell_state and the last output for each element of the batch (as long as you pass the sequence lengths to dynamic_rnn).
So from:
h, final_state = tf.nn.dynamic_rnn(..., sequence_length=sequence_lengths, ...)
where sequence_lengths is a vector of length batch_size, the last output for each element in the batch is:
final_state.h
Note that this includes the case where the length of the sequence is different for each element of the batch, as we are using the sequence_length argument.
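A minimal usage sketch (cell, inputs and seq_lens are illustrative names; seq_lens is a [batch_size] vector of true lengths):
h, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                   sequence_length=seq_lens,
                                   dtype=tf.float32)
last_output = final_state.h  # [batch_size, num_units], one row per sequence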
This is what gather_nd is for!
def extract_axis_1(data, ind):
    """
    Get specified elements along the first axis of tensor.
    :param data: Tensorflow tensor that will be subsetted.
    :param ind: Indices to take (one for each element along axis 0 of data).
    :return: Subsetted tensor.
    """
    batch_range = tf.range(tf.shape(data)[0])
    indices = tf.stack([batch_range, ind], axis=1)
    res = tf.gather_nd(data, indices)
    return res
In your case (assuming sequence_length is a 1-D tensor with the length of each axis 0 element):
output = extract_axis_1(output, sequence_length - 1)
Now output is a tensor of dimension [batch_size, num_cells].
output[:, -1, :]
works with Tensorflow 1.x now!!
Most answers cover it thoroughly, but this code snippet might help you understand what is really being returned by the dynamic_rnn layer: a tuple of (outputs, final_output_state).
So for an input with a max sequence length of T time steps, outputs is of shape [batch_size, T, num_units] (given time_major=False, the default) and contains the output state at each timestep h1, h2, ..., hT.
And final_output_state contains the final cell state cT and output state hT of each batch sequence, each of shape [batch_size, num_units].
But since dynamic_rnn is being used, my guess is that your sequence lengths vary within each batch.
import tensorflow as tf
import numpy as np
from tensorflow.contrib import rnn

tf.reset_default_graph()

# Create input data
X = np.random.randn(2, 10, 8)

# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
outputs, states = tf.nn.dynamic_rnn(cell=cell,
                                    dtype=tf.float64,
                                    sequence_length=X_lengths,
                                    inputs=X)

result = tf.contrib.learn.run_n({"outputs": outputs, "states": states},
                                n=1,
                                feed_dict=None)

assert result[0]["outputs"].shape == (2, 10, 64)
print(result[0]["outputs"].shape)
print(result[0]["states"].h.shape)

# The final output and the final state returned must be equal for each
# sequence
assert (result[0]["outputs"][0][-1] == result[0]["states"].h[0]).all()
assert (result[0]["outputs"][-1][5] == result[0]["states"].h[-1]).all()
assert (result[0]["outputs"][-1][-1] == result[0]["states"].h[-1]).all()
The final assertion will fail because the final state of the 2nd sequence is at the 6th time step, i.e. index 5, and the outputs from [6:9] are all zeros for the 2nd sequence.
I am new to Stack Overflow and cannot comment yet, so I am writing this as a new answer. @VM_AI, the last index is tf.shape(output)[1] - 1.
So, reusing your answer:
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1] - 1
# Then let's reshape the output to [sequence_length,batch_size,num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This works for me.
You should be able to access the shape of your output tensor using tf.shape(output). The tf.shape() function will return a 1d tensor containing the sizes of the output tensor. In your example, this would be (batch_size, sequence_length, num_units)
You should then be able to extract the value of output_at_T as output[:, tf.shape(output)[1], :]
There is a function in TensorFlow, tf.shape, that allows you to get the symbolic value of the shape rather than the None returned by output._shape[1]. And after fetching the last index, you can look it up using tf.nn.embedding_lookup, which is recommended especially when the amount of data to be fetched is high, as it does parallel lookups (32 in parallel by default).
# Let's first fetch the last index of seq length
# last_index would have a scalar value
last_index = tf.shape(output)[1]
# Then let's reshape the output to [sequence_length,batch_size,num_units]
# for convenience
output_rs = tf.transpose(output,[1,0,2])
# Last state of all batches
last_state = tf.nn.embedding_lookup(output_rs,last_index)
This should work.
Just to clarify what @Benoit Steiner said: his solution would not work, as tf.shape returns a symbolic interpretation of the shape value, and such a value cannot be used for slicing tensors, i.e., direct indexing.
