Tensorflow, best way to save state in RNNs? - python

I currently have the following code for a series of chained together RNNs in tensorflow. I am not using MultiRNN since I was to do something later on with the output of each layer.
for r in range(RNNS):
with tf.variable_scope('recurent_%d' % r) as scope:
state = [tf.zeros((BATCH_SIZE, sz)) for sz in rnn_func.state_size]
time_outputs = [None] * TIME_STEPS
for t in range(TIME_STEPS):
rnn_input = getTimeStep(rnn_outputs[r - 1], t)
time_outputs[t], state = rnn_func(rnn_input, state)
time_outputs[t] = tf.reshape(time_outputs[t], (-1, 1, RNN_SIZE))
scope.reuse_variables()
rnn_outputs[r] = tf.concat(1, time_outputs)
Currently I have a fixed number of time steps. However I would like to change it to have only one timestep but remember the state between batches. I would therefore need to create a state variable for each layer and assign it the final state of each of the layers. Something like this.
for r in range(RNNS):
with tf.variable_scope('recurent_%d' % r) as scope:
saved_state = tf.get_variable('saved_state', ...)
rnn_outputs[r], state = rnn_func(rnn_outputs[r - 1], saved_state)
saved_state = tf.assign(saved_state, state)
Then for each of the layers I would need to evaluate the saved state in my sess.run function as well as calling my training function. I would need to do this for every rnn layer. This seems like kind of a hassle. I would need to track every saved state and evaluate it in run. Also then run would need to copy the state from my GPU to host memory which would be inefficient and unnecessary. Is there a better way of doing this?

Here is the code to update the LSTM's initial state, when state_is_tuple=True by defining state variables. It also supports multiple layers.
We define two functions - one for getting the state variables with an initial zero state and one function for returning an operation, which we can pass to session.run in order to update the state variables with the LSTM's last hidden state.
def get_state_variables(batch_size, cell):
# For each layer, get the initial state and make a variable out of it
# to enable updating its value.
state_variables = []
for state_c, state_h in cell.zero_state(batch_size, tf.float32):
state_variables.append(tf.contrib.rnn.LSTMStateTuple(
tf.Variable(state_c, trainable=False),
tf.Variable(state_h, trainable=False)))
# Return as a tuple, so that it can be fed to dynamic_rnn as an initial state
return tuple(state_variables)
def get_state_update_op(state_variables, new_states):
# Add an operation to update the train states with the last state tensors
update_ops = []
for state_variable, new_state in zip(state_variables, new_states):
# Assign the new state to the state variables on this layer
update_ops.extend([state_variable[0].assign(new_state[0]),
state_variable[1].assign(new_state[1])])
# Return a tuple in order to combine all update_ops into a single operation.
# The tuple's actual value should not be used.
return tf.tuple(update_ops)
We can use that to update the LSTM's state after each batch. Note that I use tf.nn.dynamic_rnn for unrolling:
data = tf.placeholder(tf.float32, (batch_size, max_length, frame_size))
cell_layer = tf.contrib.rnn.GRUCell(256)
cell = tf.contrib.rnn.MultiRNNCell([cell] * num_layers)
# For each layer, get the initial state. states will be a tuple of LSTMStateTuples.
states = get_state_variables(batch_size, cell)
# Unroll the LSTM
outputs, new_states = tf.nn.dynamic_rnn(cell, data, initial_state=states)
# Add an operation to update the train states with the last state tensors.
update_op = get_state_update_op(states, new_states)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run([outputs, update_op], {data: ...})
The main difference to this answer is that state_is_tuple=True makes the LSTM's state a LSTMStateTuple containing two variables (cell state and hidden state) instead of just a single variable. Using multiple layers then makes the LSTM's state a tuple of LSTMStateTuples - one per layer.
Resetting to zero
When using a trained model for prediction / decoding, you might want to reset the state to zero. Then, you can make use of this function:
def get_state_reset_op(state_variables, cell, batch_size):
# Return an operation to set each variable in a list of LSTMStateTuples to zero
zero_states = cell.zero_state(batch_size, tf.float32)
return get_state_update_op(state_variables, zero_states)
For example like above:
reset_state_op = get_state_reset_op(state, cell, max_batch_size)
# Reset the state to zero before feeding input
sess.run([reset_state_op])
sess.run([outputs, update_op], {data: ...})

I am now saving the RNN states using the tf.control_dependencies. Here is an example.
saved_states = [tf.get_variable('saved_state_%d' % i, shape = (BATCH_SIZE, sz), trainable = False, initializer = tf.constant_initializer()) for i, sz in enumerate(rnn.state_size)]
W = tf.get_variable('W', shape = (2 * RNN_SIZE, RNN_SIZE), initializer = tf.truncated_normal_initializer(0.0, 1 / np.sqrt(2 * RNN_SIZE)))
b = tf.get_variable('b', shape = (RNN_SIZE,), initializer = tf.constant_initializer())
rnn_output, states = rnn(last_output, saved_states)
with tf.control_dependencies([tf.assign(a, b) for a, b in zip(saved_states, states)]):
dense_input = tf.concat(1, (last_output, rnn_output))
dense_output = tf.tanh(tf.matmul(dense_input, W) + b)
last_output = dense_output + last_output
I just make sure that part of my graph is dependent on saving the state.

These two links are also related and useful for this question:
https://github.com/tensorflow/tensorflow/issues/2695
https://github.com/tensorflow/tensorflow/issues/2838

Related

PyTorch: "one of the variables needed for gradient computation has been modified by an inplace operation"

I'm training a PyTorch RNN on a text file of song lyrics to predict the next character given a character.
Here's how my RNN is defined:
import torch.nn as nn
import torch.optim
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(RNN, self).__init__()
self.hidden_size = hidden_size
# from input, previous hidden state to new hidden state
self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
# from input, previous hidden state to output
self.i2o = nn.Linear(input_size + hidden_size, output_size)
# softmax on output
self.softmax = nn.LogSoftmax(dim = 1)
def forward(self, input, hidden):
combined = torch.cat((input, hidden), 1)
#get new hidden state
hidden = self.i2h(combined)
#get output
output = self.i2o(combined)
#apply softmax
output = self.softmax(output)
return output, hidden
def initHidden(self):
return torch.zeros(1, self.hidden_size)
rnn = RNN(input_size = num_chars, hidden_size = 200, output_size = num_chars)
criterion = nn.NLLLoss()
lr = 0.01
optimizer = torch.optim.AdamW(rnn.parameters(), lr = lr)
Here's my training function:
def train(train, target):
hidden = rnn.initHidden()
loss = 0
for i in range(len(train)):
optimizer.zero_grad()
# get output, hidden state from rnn given input char, hidden state
output, hidden = rnn(train[i].unsqueeze(0), hidden)
#returns the index with '1' - indentifying the index of the right character
target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
loss += criterion(output, target_class)
loss.backward(retain_graph = True)
optimizer.step()
print("done " + str(i) + " loop")
return output, loss.item() / train.size(0)
When I run my training function, I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [274, 74]], which is output 0 of TBackward, is at version 5; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Interestingly, it makes it through two complete loops of the training function before giving me that error.
Now, when I remove the retain_graph = True from loss.backward(), I get this error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
It shouldn't be trying to go backward through the graph multiple times here. Perhaps the graph is not getting cleared between training loops?
The issue is you are accumulating your loss values (and at the same time, the computation graphs associated attached to them) on variable loss, here:
loss += criterion(output, target_class)
In turn, this means at every iteration you are trying to backpropagate through the current and previous loss values that were computed in previous inferences. In this particular instance where you are looping through your dataset, it isn't the right thing to do.
A simple fix is to accumulate loss's underlying value, i.e. the scalar value, not the tensor itself, using item. And, backpropagate on the current loss tensor:
total_loss = 0
for i in range(len(train)):
optimizer.zero_grad()
output, hidden = rnn(train[i].unsqueeze(0), hidden)
target_class = (target[i] == 1).nonzero(as_tuple=True)[0]
loss = criterion(output, target_class)
loss.backward()
total_loss += loss.item()
Since you are updating the model's parameter straight after having done the backpropagation, you don't need to retain the graph in memory.

How to feed back RNN output to input in tensorflow

In case where suppose I have a trained RNN (e.g. language model), and I want to see what it would generate on its own, how should I feed its output back to its input?
I read the following related questions:
TensorFlow using LSTMs for generating text
TensorFlow LSTM Generative Model
Theoretically it is clear to me, that in tensorflow we use truncated backpropagation, so we have to define the max step which we would like to "trace". Also we reserve a dimension for batches, therefore if I'd like to train a sine wave, I have to feed [None, num_step, 1] inputs.
The following code works:
tf.reset_default_graph()
n_samples=100
state_size=5
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
X = tf.placeholder_with_default(zero_x, [None, n_samples, 1])
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Y = np.roll(def_x, 1)
loss = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
opt = tf.train.AdamOptimizer().minimize(loss)
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
# Initial state run
plt.show(plt.plot(output.eval()[0]))
plt.plot(def_x.squeeze())
plt.show(plt.plot(pred.eval().squeeze()))
steps = 1001
for i in range(steps):
p, l, _= sess.run([pred, loss, opt])
The state size of the LSTM can be varied, also I experimented with feeding sine wave into the network and zeros, and in both cases it converged in ~500 iterations. So far I have understood that in this case the graph consists n_samples number of LSTM cells sharing their parameters, and it is only up to me that I feed input to them as a time series. However when generating samples the network is explicitly depending on its previous output - meaning that I cannot feed the unrolled model at once. I tried to compute the state and output at every step:
with tf.variable_scope('sine', reuse=True):
X_test = tf.placeholder(tf.float64)
X_reshaped = tf.reshape(X_test, [1, -1, 1])
output, last_states = tf.nn.dynamic_rnn(lstm_cell, X_reshaped, dtype=tf.float64)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
test_vals = [0.]
for i in range(1000):
val = pred.eval({X_test:np.array(test_vals)[None, :, None]})
test_vals.append(val)
However in this model it seems that there is no continuity between the LSTM cells. What is going on here?
Do I have to initialize a zero array with i.e. 100 time steps, and assign each run's result into the array? Like feeding the network with this:
run 0: input_feed = [0, 0, 0 ... 0]; res1 = result
run 1: input_feed = [res1, 0, 0 ... 0]; res2 = result
run 1: input_feed = [res1, res2, 0 ... 0]; res3 = result
etc...
What to do if I want to use this trained network to use its own output as its input in the following time step?
If I understood you correctly, you want to find a way to feed the output of time step t as input to time step t+1, right? To do so, there is a relatively easy work around that you can use at test time:
Make sure your input placeholders can accept a dynamic sequence length, i.e. the size of the time dimension is None.
Make sure you are using tf.nn.dynamic_rnn (which you do in the posted example).
Pass the initial state into dynamic_rnn.
Then, at test time, you can loop through your sequence and feed each time step individually (i.e. max sequence length is 1). Additionally, you just have to carry over the internal state of the RNN. See pseudo code below (the variable names refer to your code snippet).
I.e., change the definition of the model to something like this:
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(state_size, forget_bias=1.)
X = tf.placeholder_with_default(zero_x, [None, None, 1]) # [batch_size, seq_length, dimension of input]
batch_size = tf.shape(self.input_)[0]
initial_state = lstm_cell.zero_state(batch_size, dtype=tf.float32)
def_x = np.sin(np.linspace(0, 10, n_samples))[None, :, None]
zero_x = np.zeros(n_samples)[None, :, None]
output, last_states = tf.nn.dynamic_rnn(inputs=X, cell=lstm_cell, dtype=tf.float64,
initial_state=initial_state)
pred = tf.contrib.layers.fully_connected(output, 1, activation_fn=tf.tanh)
Then you can perform inference like so:
fetches = {'final_state': last_state,
'prediction': pred}
toy_initial_input = np.array([[[1]]]) # put suitable data here
seq_length = 20 # put whatever is reasonable here for you
# get the output for the first time step
feed_dict = {X: toy_initial_input}
eval_out = sess.run(fetches, feed_dict)
outputs = [eval_out['prediction']]
next_state = eval_out['final_state']
for i in range(1, seq_length):
feed_dict = {X: outputs[-1],
initial_state: next_state}
eval_out = sess.run(fetches, feed_dict)
outputs.append(eval_out['prediction'])
next_state = eval_out['final_state']
# outputs now contains the sequence you want
Note that this can also work for batches, however it can be a bit more complicated if you sequences of different lengths in the same batch.
If you want to perform this kind of prediction not only at test time, but also at training time, it is also possible to do, but a bit more complicated to implement.
You can use its own output (last state) as the next-step input (initial state).
One way to do this is to:
use zero-initialized variables as the input state at every time step
each time you completed a truncated sequence and got some output state, update the state variables with this output state you just got.
The second can be done by either:
fetching the states to python and feeding them back next time, as done in the ptb example in tensorflow/models
build an update op in the graph and add a dependency, as done in the ptb example in tensorpack.
I know I'm a bit late to the party but I think this gist could be useful:
https://gist.github.com/CharlieCodex/f494b27698157ec9a802bc231d8dcf31
It lets you autofeed the input through a filter and back into the network as input. To make shapes match up processing can be set as a tf.layers.Dense layer.
Please ask any questions!
Edit:
In your particular case, create a lambda which performs the processing of the dynamic_rnn outputs into your character vector space. Ex:
# if you have:
W = tf.Variable( ... )
B = tf.Variable( ... )
Yo, Ho = tf.nn.dynamic_rnn( cell , inputs , state )
logits = tf.matmul(W, Yo) + B
...
# use self_feeding_rnn as
process_yo = lambda Yo: tf.matmul(W, Yo) + B
Yo, Ho = self_feeding_rnn( cell, seed, initial_state, processing=process_yo)

using MultiRNNCell in tensorflow 0.12

With Tensorflow 0.12, there have been changes to the way that MultiRNNCell works, for starters, state_is_tuple is now set to True by default, furthermore, there is this discussion on it:
state_is_tuple: If True, accepted and returned states are n-tuples, where n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.
I'm wondering how exactly I could use a multi layer RNN with GRU cells, here is my code so far:
def _run_rnn(self, inputs):
# embedded inputs are passed in here
self.initial_state = tf.zeros([self._batch_size, self._hidden_size], tf.float32)
cell = tf.nn.rnn_cell.GRUCell(self._hidden_size)
cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=self._dropout_placeholder)
cell = tf.nn.rnn_cell.MultiRNNCell([cell] * self._num_layers, state_is_tuple=False)
outputs, last_state = tf.nn.dynamic_rnn(
cell = cell,
inputs = inputs,
sequence_length = self.sequence_length,
initial_state = self.initial_state
)
return outputs, last_state
My inputs look up word ids and return a corresponding embedding vectors. Now, running with the code above I'm greeted by the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 100 and 200 for 'rnn/while/Select_1' (op: 'Select') with input shapes: [?], [64,100], [64,200]
The places I've got a ? in is within my placeholders:
def _add_placeholders(self):
self.input_placeholder = tf.placeholder(tf.int32, shape=[None, self._max_steps])
self.label_placeholder = tf.placeholder(tf.int32, shape=[None, self._max_steps])
self.sequence_length = tf.placeholder(tf.int32, shape=[None])
self._dropout_placeholder = tf.placeholder(tf.float32)
Your main issue is in the setting of the initial_state. Since your state is now a tuple, (more specifically an LSTMStateTuple, you cannot directly assign it to tf.zeros. Instead use,
self.initial_state = cell.zero_state(self._batch_size, tf.float32)
Have a look at the documentation for more.
To use this in code, you will need to pass this tensor in the feed_dict. Do something like this,
state = sess.run(model.initial_state)
for batch in batches:
# Logic to add input placeholder in `feed_dict`
feed_dict[model.initial_state] = state
# Note I'm re-using `state` below
(loss, state) = sess.run([model.loss, model.final_state], feed_dict=feed_dict)

TensorFlow: Remember LSTM state for next batch (stateful LSTM)

Given a trained LSTM model I want to perform inference for single timesteps, i.e. seq_length = 1 in the example below. After each timestep the internal LSTM (memory and hidden) states need to be remembered for the next 'batch'. For the very beginning of the inference the internal LSTM states init_c, init_h are computed given the input. These are then stored in a LSTMStateTuple object which is passed to the LSTM. During training this state is updated every timestep. However for inference I want the state to be saved in between batches, i.e. the initial states only need to be computed at the very beginning and after that the LSTM states should be saved after each 'batch' (n=1).
I found this related StackOverflow question: Tensorflow, best way to save state in RNNs?. However this only works if state_is_tuple=False, but this behavior is soon to be deprecated by TensorFlow (see rnn_cell.py). Keras seems to have a nice wrapper to make stateful LSTMs possible but I don't know the best way to achieve this in TensorFlow. This issue on the TensorFlow GitHub is also related to my question: https://github.com/tensorflow/tensorflow/issues/2838
Anyone good suggestions for building a stateful LSTM model?
inputs = tf.placeholder(tf.float32, shape=[None, seq_length, 84, 84], name="inputs")
targets = tf.placeholder(tf.float32, shape=[None, seq_length], name="targets")
num_lstm_layers = 2
with tf.variable_scope("LSTM") as scope:
lstm_cell = tf.nn.rnn_cell.LSTMCell(512, initializer=initializer, state_is_tuple=True)
self.lstm = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_lstm_layers, state_is_tuple=True)
init_c = # compute initial LSTM memory state using contents in placeholder 'inputs'
init_h = # compute initial LSTM hidden state using contents in placeholder 'inputs'
self.state = [tf.nn.rnn_cell.LSTMStateTuple(init_c, init_h)] * num_lstm_layers
outputs = []
for step in range(seq_length):
if step != 0:
scope.reuse_variables()
# CNN features, as input for LSTM
x_t = # ...
# LSTM step through time
output, self.state = self.lstm(x_t, self.state)
outputs.append(output)
I found out it was easiest to save the whole state for all layers in a placeholder.
init_state = np.zeros((num_layers, 2, batch_size, state_size))
...
state_placeholder = tf.placeholder(tf.float32, [num_layers, 2, batch_size, state_size])
Then unpack it and create a tuple of LSTMStateTuples before using the native tensorflow RNN Api.
l = tf.unpack(state_placeholder, axis=0)
rnn_tuple_state = tuple(
[tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
for idx in range(num_layers)]
)
RNN passes in the API:
cell = tf.nn.rnn_cell.LSTMCell(state_size, state_is_tuple=True)
cell = tf.nn.rnn_cell.MultiRNNCell([cell]*num_layers, state_is_tuple=True)
outputs, state = tf.nn.dynamic_rnn(cell, x_input_batch, initial_state=rnn_tuple_state)
The state - variable will then be feeded to the next batch as a placeholder.
Tensorflow, best way to save state in RNNs? was actually my original question. The code bellow is how I use the state tuples.
with tf.variable_scope('decoder') as scope:
rnn_cell = tf.nn.rnn_cell.MultiRNNCell \
([
tf.nn.rnn_cell.LSTMCell(512, num_proj = 256, state_is_tuple = True),
tf.nn.rnn_cell.LSTMCell(512, num_proj = WORD_VEC_SIZE, state_is_tuple = True)
], state_is_tuple = True)
state = [[tf.zeros((BATCH_SIZE, sz)) for sz in sz_outer] for sz_outer in rnn_cell.state_size]
for t in range(TIME_STEPS):
if t:
last = y_[t - 1] if TRAINING else y[t - 1]
else:
last = tf.zeros((BATCH_SIZE, WORD_VEC_SIZE))
y[t] = tf.concat(1, (y[t], last))
y[t], state = rnn_cell(y[t], state)
scope.reuse_variables()
Rather than using tf.nn.rnn_cell.LSTMStateTuple I just create a lists of lists which works fine. In this example I am not saving the state. However you could easily have made state out of variables and just used assign to save the values.

How to get Tensorflow seq2seq embedding output

I am attempting to train a sequence to sequence model using tensorflow and have been looking at their example code.
I want to be able to access the vector embeddings created by the encoder as they seem to have some interesting properties.
However, it really isn't clear to me how this can be.
In the vector representations of words example they talk a lot about what these embeddings can be used for and then don't appear to provide a simple way of accessing them, unless I am mistaken.
Any help figuring out how to access them would be greatly appreciated.
As with all Tensorflow operations, most variables are dynamically created. There are different ways to access these variables ( and their values ). Here, the variable you are interested in is part of the set of trained variables. To access these, we can thus use the tf.trainable_variables() function:
for var in tf.trainable_variables():
print var.name
which will give us - for a GRU seq2seq model, the following list:
embedding_rnn_seq2seq/RNN/EmbeddingWrapper/embedding:0
embedding_rnn_seq2seq/RNN/GRUCell/Gates/Linear/Matrix:0
embedding_rnn_seq2seq/RNN/GRUCell/Gates/Linear/Bias:0
embedding_rnn_seq2seq/RNN/GRUCell/Candidate/Linear/Matrix:0
embedding_rnn_seq2seq/RNN/GRUCell/Candidate/Linear/Bias:0
embedding_rnn_seq2seq/embedding_rnn_decoder/embedding:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/GRUCell/Gates/Linear/Matrix:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/GRUCell/Gates/Linear/Bias:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/GRUCell/Candidate/Linear/Matrix:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/GRUCell/Candidate/Linear/Bias:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/OutputProjectionWrapper/Linear/Matrix:0
embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/OutputProjectionWrapper/Linear/Bias:0
This tells us that the embedding is called embedding_rnn_seq2seq/RNN/EmbeddingWrapper/embedding:0, which we can then use to retrieve a pointer to that variable in our earlier iterator:
for var in tf.trainable_variables():
print var.name
if var.name == 'embedding_rnn_seq2seq/RNN/EmbeddingWrapper/embedding:0':
embedding_op = var
This we can then pass along with other ops to our session-run:
_, loss_t, summary, embedding = sess.run([train_op, loss, summary_op, embedding_op], feed_dict)
and we have ourselves the (batch-list of) embeddings ...
There is a related post, but it is based on tensorflow-0.6 which is quite out of date. So I update his answer in tensorflow-0.8 which is also similar to that in the newest version.
(*represent where to modify)
losses = []
outputs = []
*states = []
with ops.op_scope(all_inputs, name, "model_with_buckets"):
for j, bucket in enumerate(buckets):
with variable_scope.variable_scope(variable_scope.get_variable_scope(),
reuse=True if j > 0 else None):
*bucket_outputs, _ ,bucket_states= seq2seq(encoder_inputs[:bucket[0]],
decoder_inputs[:bucket[1]])
outputs.append(bucket_outputs)
if per_example_loss:
losses.append(sequence_loss_by_example(
outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
softmax_loss_function=softmax_loss_function))
else:
losses.append(sequence_loss(
outputs[-1], targets[:bucket[1]], weights[:bucket[1]],
softmax_loss_function=softmax_loss_function))
return outputs, losses, *states
at python/ops/seq2seq, modify embedding_attention_seq2seq()
if isinstance(feed_previous, bool):
*outputs, states = embedding_attention_decoder(
decoder_inputs, encoder_state, attention_states, cell,
num_decoder_symbols, embedding_size, num_heads=num_heads,
output_size=output_size, output_projection=output_projection,
feed_previous=feed_previous,
initial_state_attention=initial_state_attention)
*return outputs, states, encoder_state
# If feed_previous is a Tensor, we construct 2 graphs and use cond.
def decoder(feed_previous_bool):
reuse = None if feed_previous_bool else True
with variable_scope.variable_scope(variable_scope.get_variable_scope(),reuse=reuse):
outputs, state = embedding_attention_decoder(
decoder_inputs, encoder_state, attention_states, cell,
num_decoder_symbols, embedding_size, num_heads=num_heads,
output_size=output_size, output_projection=output_projection,
feed_previous=feed_previous_bool,
update_embedding_for_previous=False,
initial_state_attention=initial_state_attention)
return outputs + [state]
outputs_and_state = control_flow_ops.cond(feed_previous, lambda: decoder(True), lambda: decoder(False))
*return outputs_and_state[:-1], outputs_and_state[-1], encoder_state
at model/rnn/translate/seq2seq_model.py modify init()
if forward_only:
*self.outputs, self.losses, self.states= tf.nn.seq2seq.model_with_buckets(
self.encoder_inputs, self.decoder_inputs, targets,
self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
softmax_loss_function=softmax_loss_function)
# If we use output projection, we need to project outputs for decoding.
if output_projection is not None:
for b in xrange(len(buckets)):
self.outputs[b] = [
tf.matmul(output, output_projection[0]) + output_projection[1]
for output in self.outputs[b]
]
else:
*self.outputs, self.losses, _ = tf.nn.seq2seq.model_with_buckets(
self.encoder_inputs, self.decoder_inputs, targets,
self.target_weights, buckets,
lambda x, y: seq2seq_f(x, y, False),
softmax_loss_function=softmax_loss_function)
at model/rnn/translate/seq2seq_model.py modify step()
if not forward_only:
return outputs[1], outputs[2], None # Gradient norm, loss, no outputs.
else:
*return None, outputs[0], outputs[1:], outputs[-1] # No gradient norm, loss, outputs.
with all these done, we can get the encoded states by calling :
_, _, output_logits, states = model.step(sess, encoder_inputs, decoder_inputs,
target_weights, bucket_id, True)
print (states)
in the translate.py.

Categories

Resources