How to use Tensorflow's batch_sequences_with_states utility

I am trying to build a generative RNN using Tensorflow. I have a preprocessed dataset which is a list of sequence_length x 2048 x 2 numpy arrays. The sequences have different lengths. I have been looking through examples and documentation but I really couldn't understand, for example, what key is, or how I should create the input_sequences dictionary, etc.
So how should one format a list of numpy arrays, each of which represents a sequence of rank-n (2 in this case) tensors, in order to be able to use this batch_sequences_with_states method?

Toy Implementations
I experimented with this and am glad to share my findings. It is a toy example: I attempted to create an example that works and to observe how the output varies. In particular, I used an LSTM as the case study; in your case you can define a conv net instead. Feel free to add more inputs, adjust as needed, and follow the doc:
https://www.tensorflow.org/versions/r0.11/api_docs/python/contrib.training/splitting_sequence_inputs_into_minibatches_with_state_saving#batch_sequences_with_states
I tried other, more subtle examples, but I keep this simple version to show how the operation can be useful. In particular, add more elements to the dictionaries (input sequences and context) and observe the changes.
Two Approaches
Basically I will use two approaches:
tf.contrib.training.batch_sequences_with_states
tf.train.batch()
I will start with the first one because it is directly helpful, then I will show how to solve a similar problem with tf.train.batch.
I will basically generate toy numpy arrays and tensors and use them to test the operations:
import tensorflow as tf

batch_size = 32
num_unroll = 20
num_enqueue_threads = 20
lstm_size = 8
cell = tf.contrib.rnn.BasicLSTMCell(num_units=lstm_size)

# State size
state_size = cell.state_size[0]

# Initial states
initial_state_values = tf.zeros((state_size,), dtype=tf.float32)
initial_states = {"lstm_state": initial_state_values}

# Keys should be strings. I used x as the input sequence and y as the
# input context, so there are two keys.
key = ["1", "2"]

# Toy data for our sample
x = tf.range(0, 12, name="x")
y = tf.range(12, 24, name="y")

# Convert to float so as not to raise a type-mismatch error
x = tf.to_float(x)
y = tf.to_float(y)

# The input sequences as a dictionary, as required by the TensorFlow doc
sequences = {"x": x}

# Context input
context = {"batch1": y}

# Train batch with sequence state
batch_new = tf.contrib.training.batch_sequences_with_states(
    input_key=key,
    input_sequences=sequences,
    input_context=context,
    initial_states=initial_states,
    num_unroll=num_unroll,
    batch_size=batch_size,
    input_length=None,
    pad=True,
    num_threads=num_enqueue_threads,
    capacity=batch_size * num_enqueue_threads * 2)
# To test what we have, type and observe the output of the following.
# In short, in an IPython notebook, type batch_new.<TAB> to see all options.
batch_new.key
batch_new.sequences
# Split the input: this generates one input slice per unroll step
inputs = batch_new.sequences["x"]
inputs_by_time = tf.split(inputs, num_unroll)  # you may need axis=1 to split along the time dimension
assert len(inputs_by_time) == num_unroll
# Get the LSTM (or conv net) output
lstm_output, _ = tf.contrib.rnn.static_state_saving_rnn(
    cell,
    inputs_by_time,
    state_saver=batch_new,
    state_name=("lstm_state", "lstm_state"))
Create Graph and Queue as Usual
The parts commented with # and * can be adapted to suit your requirements.
# Create the graph, etc.
init_op = tf.global_variables_initializer()

# Create a session for running operations in the graph
sess = tf.Session()

# Initialize the variables (like the epoch counter)
sess.run(init_op)

# Start input enqueue threads
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

# For the part below, uncomment the lines marked with * to run other
# operations (e.g. a training loop):
#* try:
#*     while not coord.should_stop():
#*         sess.run(train_op)  # run training steps or whatever
#* except tf.errors.OutOfRangeError:
#*     print('Done training -- epoch limit reached')
#* finally:

# When done, ask the threads to stop
coord.request_stop()

# Wait for threads to finish
coord.join(threads)
sess.close()
Second Approach
You can also use tf.train.batch in a very interesting way:
import tensorflow as tf

# [0, 1, 2, 3, 4, ...]
x = tf.range(0, 11, name="x")

# A queue will output 0, 1, 2, 3, ...; slice_end is useful for dequeuing
slice_end = 10

# Instantiate y
y = tf.slice(x, [0], [slice_end], name="y")

# Reshape y and convert to float
y = tf.reshape(y, [10, 1])
y = tf.to_float(y, name='ToFloat')
Important
Note the use of dynamic_pad and enqueue_many. Feel free to play with both options and compare the output!
batched_data = tf.train.batch(
    tensors=[y],
    batch_size=10,
    dynamic_pad=True,
    # enqueue_many=True,
    name="y_batch"
)
num_units = 128
lstm_cell = tf.contrib.rnn.LSTMCell(num_units, forget_bias=1, state_is_tuple=True)
val, state = tf.nn.dynamic_rnn(lstm_cell, batched_data, dtype=tf.float32)
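To actually evaluate val, a small hedged sketch (assuming the graph above builds as intended), reusing the queue-runner boilerplate from the first approach:

sess = tf.Session()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
print(sess.run(val))   # one batch of LSTM outputs
coord.request_stop()
coord.join(threads)
sess.close()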
Conclusion
The aim is to show that simple examples can give insight into the details of these operations. You can adapt this to a convolutional net in your case.
Hope this helps!

Related

Cascade multiple RNN models for N-dimensional output

I'm having some difficulty with chaining together two models in an unusual way.
I am trying to replicate the following flowchart:
For clarity, at each timestep of Model[0] I am attempting to generate an entire time series from IR[i] (Intermediate Representation) as a repeated input using Model[1]. The purpose of this scheme is that it allows the generation of a ragged 2-D time series from a 1-D input (while both allowing the second model to be omitted when the output for that timestep is not needed, and not requiring Model[0] to constantly "switch modes" between accepting input and generating output).
I assume a custom training loop will be required, and I already have a custom training loop for handling statefulness in the first model (the previous version only had a single output at each timestep). As depicted, the second model should have reasonably short outputs (able to be constrained to fewer than 10 timesteps).
But at the end of the day, while I can wrap my head around what I want to do, I'm not nearly adroit enough with Keras and/or Tensorflow to actually implement it. (In fact, this is my first non-toy project with the library.)
I have unsuccessfully searched literature for similar schemes to parrot, or example code to fiddle with. And I don't even know if this idea is possible from within TF/Keras.
I already have the two models working in isolation. (As in I've worked out the dimensionality, and done some training with dummy data to get garbage outputs for the second model, and the first model is based off of a previous iteration of this problem and has been fully trained.) If I have Model[0] and Model[1] as python variables (let's call them model_a and model_b), then how would I chain them together to do this?
Edit to add:
If this is all unclear, perhaps having the dimensions of each input and output will help:
Input: (batch_size, model_a_timesteps, input_size)
IR: (batch_size, model_a_timesteps, ir_size)
IR[i] (after duplication): (batch_size, model_b_timesteps, ir_size)
Out[i]: (batch_size, model_b_timesteps, output_size)
Out: (batch_size, model_a_timesteps, model_b_timesteps, output_size)
As this question has multiple major parts, I've dedicated a Q&A to the core challenge: stateful backpropagation. This answer focuses on implementing the variable output step length.
Description:
As validated in Case 5, we can take a bottom-up approach: first we feed the complete input to model_a (A), then feed its outputs as input to model_b (B), this time one step at a time.
Note that we must chain B's output steps per A's input step, not between A's input steps; i.e., in your diagram, gradient is to flow between Out[0][1] and Out[0][0], but not between Out[2][0] and Out[0][1].
For computing loss it won't matter whether we use a ragged or padded tensor; we must however use a padded tensor for writing to TensorArray.
Loop logic in code below is general; specific attribute handling and hidden state passing, however, is hard-coded for simplicity, but can be rewritten for generality.
Code: at bottom.
Example:
Here we predefine the number of iterations for B per input from A, but we can implement any arbitrary stopping logic. For example, we can take a Dense layer's output from B as a hidden state and check if its L2-norm exceeds a threshold.
Per above, if longest_step is unknown to us, we can simply set it, which is common for NLP & other tasks with a STOP token.
Alternatively, we may write to separate TensorArrays at every A's input with dynamic_size=True; see "point of uncertainty" below.
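As an illustration of the arbitrary stopping logic mentioned above, a minimal hedged sketch of an L2-norm check (halt_head and the threshold are illustrative, not part of the code at the bottom):

halt_head = tf.keras.layers.Dense(1)            # hypothetical "stop" head applied to B's hidden state
def should_stop(hidden_state, threshold=1.0):
    # stop B's inner loop once the head's L2-norm exceeds the threshold
    return tf.norm(halt_head(hidden_state)) > threshold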
A valid concern is: how do we know gradients flow correctly? Note that we've validated them for both vertical and horizontal flow in the linked Q&A, but it didn't cover multiple output steps per input step, for multiple input steps. See below.
Point of uncertainty: I'm not entirely sure whether gradients interact between e.g. Out[0][1] and Out[2][0]. I did, however, verify that gradients will not flow horizontally if we write to separate TensorArrays for B's outputs per A's inputs (case 2); reimplementing for cases 4 & 5, grads will differ for both models, including the lower one with a complete single horizontal pass.
Thus we must write to a unified TensorArray. For that, as there are no ops leading from e.g. IR[1] to Out[0][1], I can't see how TF would trace it as such - so it seems we're safe. Note, however, that in the example below, using steps_at_t=[1]*6 will make gradients flow horizontally in both models, as we're writing to a single TensorArray and passing hidden states.
The examined case is confounded, however, with B being stateful at all steps; lifting this requirement, we might not need to write to a unified TensorArray for all Out[0], Out[1], etc, but we must still test against something we know works, which is no longer as straightforward.
Example [code]:
import numpy as np
import tensorflow as tf

#%%# Make data & models, then fit ###########################################
x0 = y0 = tf.constant(np.random.randn(2, 3, 4))
msn = MultiStatefulNetwork(batch_shape=(2, 3, 4), steps_at_t=[3, 4, 2])

#%%#############################################
with tf.GradientTape(persistent=True) as tape:
    outputs = msn(x0)

# shape: (3, 4, 2, 4), 0-padded
# We can pad labels accordingly.
# Note the (2, 4) model_b output shape, which is a timestep slice;
# model_b is a *slice model*. Be careful when implementing logic that
# is and isn't intended to be stateful.
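For completeness, a hedged sketch of how one might take gradients with the persistent tape against padded labels (the zero labels and the MSE loss are purely illustrative, not part of the code below):

with tf.GradientTape(persistent=True) as tape:
    outputs = msn(x0)                          # (3, 4, 2, 4), zero-padded
    y_padded = tf.zeros_like(outputs)          # hypothetical labels, padded the same way
    loss = tf.reduce_mean(tf.math.squared_difference(outputs, y_padded))

grads_a = tape.gradient(loss, msn.model_a.trainable_weights)
grads_b = tape.gradient(loss, msn.model_b.trainable_weights)
del tape  # release resources held by the persistent tape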
Methods:
Not the cleanest, nor most optimal code, but it works; room for improvement.
More importantly: I implemented this in Eager, and have no idea how it'll work in Graph, and making it work for both can be quite tricky. If needed, just run in Graph and compare all values as done in the "cases".
# ideally we won't `import tensorflow` at all; kept for code simplicity
import tensorflow as tf
from tensorflow.python.util import nest
from tensorflow.python.ops import array_ops, tensor_array_ops
from tensorflow.python.framework import ops
from tensorflow.keras.layers import Input, SimpleRNN, SimpleRNNCell
from tensorflow.keras.models import Model

#######################################################################
class MultiStatefulNetwork():
    def __init__(self, batch_shape=(2, 6, 4), steps_at_t=[]):
        self.batch_shape = batch_shape
        self.steps_at_t = steps_at_t
        self.batch_size = batch_shape[0]
        self.units = batch_shape[-1]
        self._build_models()

    def __call__(self, inputs):
        outputs = self._forward_pass_a(inputs)
        outputs = self._forward_pass_b(outputs)
        return outputs

    def _forward_pass_a(self, inputs):
        return self.model_a(inputs, training=True)

    def _forward_pass_b(self, inputs):
        return model_rnn_outer(self.model_b, inputs, self.steps_at_t)

    def _build_models(self):
        ipt = Input(batch_shape=self.batch_shape)
        out = SimpleRNN(self.units, return_sequences=True)(ipt)
        self.model_a = Model(ipt, out)

        ipt = Input(batch_shape=(self.batch_size, self.units))
        sipt = Input(batch_shape=(self.batch_size, self.units))
        out, state = SimpleRNNCell(4)(ipt, sipt)
        self.model_b = Model([ipt, sipt], [out, state])

        self.model_a.compile('sgd', 'mse')
        self.model_b.compile('sgd', 'mse')


def inner_pass(model, inputs, states):
    return model_rnn(model, inputs, states)


def model_rnn_outer(model, inputs, steps_at_t=[2, 2, 4, 3]):
    def outer_step_function(inputs, states):
        x, steps = inputs
        x = array_ops.expand_dims(x, 0)
        x = array_ops.tile(x, [steps, *[1] * (x.ndim - 1)])  # repeat `steps` times
        output, new_states = inner_pass(model, x, states)
        return output, new_states

    (outer_steps, steps_at_t, longest_step, outer_t, initial_states,
     output_ta, input_ta) = _process_args_outer(model, inputs, steps_at_t)

    def _outer_step(outer_t, output_ta_t, *states):
        current_input = [input_ta.read(outer_t), steps_at_t.read(outer_t)]
        output, new_states = outer_step_function(current_input, tuple(states))

        # Pad if shorter than longest_step.
        # model_b may output twice, but the longest in `steps_at_t` is 4; then we need
        # output.shape == (2, *model_b.output_shape) -> (4, *...)
        # Checking directly on `output` is more reliable than on `steps_at_t`.
        output = tf.cond(
            tf.math.less(output.shape[0], longest_step),
            lambda: tf.pad(output, [[0, longest_step - output.shape[0]],
                                    *[[0, 0]] * (output.ndim - 1)]),
            lambda: output)

        output_ta_t = output_ta_t.write(outer_t, output)
        return (outer_t + 1, output_ta_t) + tuple(new_states)

    final_outputs = tf.while_loop(
        body=_outer_step,
        loop_vars=(outer_t, output_ta) + initial_states,
        cond=lambda outer_t, *_: tf.math.less(outer_t, outer_steps))

    output_ta = final_outputs[1]
    outputs = output_ta.stack()
    return outputs


def _process_args_outer(model, inputs, steps_at_t):
    def swap_batch_timestep(input_t):
        # Swap the batch and timestep dims of the incoming tensor:
        # (samples, timesteps, channels) -> (timesteps, samples, channels);
        # we iterate over dim0 to feed the (samples, channels) slices expected by the RNN
        axes = list(range(len(input_t.shape)))
        axes[0], axes[1] = 1, 0
        return array_ops.transpose(input_t, axes)

    inputs = nest.map_structure(swap_batch_timestep, inputs)

    assert inputs.shape[0] == len(steps_at_t)
    outer_steps = array_ops.shape(inputs)[0]  # model_a_steps
    longest_step = max(steps_at_t)
    steps_at_t = tensor_array_ops.TensorArray(
        dtype=tf.int32, size=len(steps_at_t)).unstack(steps_at_t)

    # assume single-input network, excluding states which are handled separately
    input_ta = tensor_array_ops.TensorArray(
        dtype=inputs.dtype,
        size=outer_steps,
        element_shape=tf.TensorShape(model.input_shape[0]),
        tensor_array_name='outer_input_ta_0').unstack(inputs)

    # TensorArray is used to write outputs at every timestep, but does not
    # support RaggedTensor; thus we must make the TensorArray such that its column
    # length is that of the longest outer step, and pad model_b's outputs accordingly
    element_shape = tf.TensorShape((longest_step, *model.output_shape[0]))
    # overall shape: (outer_steps, longest_step, *model_b.output_shape);
    # for every input / at each step we write in dim0 (outer_steps)
    output_ta = tensor_array_ops.TensorArray(
        dtype=model.output[0].dtype,
        size=outer_steps,
        element_shape=element_shape,
        tensor_array_name='outer_output_ta_0')

    outer_t = tf.constant(0, dtype='int32')
    initial_states = (tf.zeros(model.input_shape[0], dtype='float32'),)

    return (outer_steps, steps_at_t, longest_step, outer_t, initial_states,
            output_ta, input_ta)


def model_rnn(model, inputs, states):
    def step_function(inputs, states):
        output, new_states = model([inputs, *states], training=True)
        return output, new_states

    initial_states = states
    input_ta, output_ta, time, time_steps_t = _process_args(model, inputs)

    def _step(time, output_ta_t, *states):
        current_input = input_ta.read(time)
        output, new_states = step_function(current_input, tuple(states))

        flat_state = nest.flatten(states)
        flat_new_state = nest.flatten(new_states)
        for state, new_state in zip(flat_state, flat_new_state):
            if isinstance(new_state, ops.Tensor):
                new_state.set_shape(state.shape)

        output_ta_t = output_ta_t.write(time, output)
        new_states = nest.pack_sequence_as(initial_states, flat_new_state)
        return (time + 1, output_ta_t) + tuple(new_states)

    final_outputs = tf.while_loop(
        body=_step,
        loop_vars=(time, output_ta) + tuple(initial_states),
        cond=lambda time, *_: tf.math.less(time, time_steps_t))

    new_states = final_outputs[2:]
    output_ta = final_outputs[1]
    outputs = output_ta.stack()
    return outputs, new_states


def _process_args(model, inputs):
    time_steps_t = tf.constant(inputs.shape[0], dtype='int32')

    # assume single-input network (excluding states)
    input_ta = tensor_array_ops.TensorArray(
        dtype=inputs.dtype,
        size=time_steps_t,
        tensor_array_name='input_ta_0').unstack(inputs)

    # assume single-output network (excluding states)
    output_ta = tensor_array_ops.TensorArray(
        dtype=model.output[0].dtype,
        size=time_steps_t,
        element_shape=tf.TensorShape(model.output_shape[0]),
        tensor_array_name='output_ta_0')

    time = tf.constant(0, dtype='int32', name='time')
    return input_ta, output_ta, time, time_steps_t

How to properly use a Tensorflow Dataset with batch?

I am new to Tensorflow and deep learning, and I am struggling with the Dataset class. I tried a lot of things and I can’t find a good solution.
What I am trying
I have a large amount of images (500k+) to train my DNN with. This is a denoising autoencoder so I have a pair of each image. I am using the dataset class of TF to manage the data, but I think I use it really badly.
Here is how I load the filenames in a dataset:
class Data:
    def __init__(self, in_path, out_path):
        self.nb_images = 512
        self.test_ratio = 0.2
        self.batch_size = 8

        # load filenames of inputs and outputs
        inputs, outputs, self.nb_images = self._load_data_pair_paths(in_path, out_path, self.nb_images)
        self.size_training = self.nb_images - int(self.nb_images * self.test_ratio)
        self.size_test = int(self.nb_images * self.test_ratio)

        # split arrays into training / validation
        test_data_in, training_data_in = self._split_test_data(inputs, self.test_ratio)
        test_data_out, training_data_out = self._split_test_data(outputs, self.test_ratio)

        # turn the arrays into tf.data.Dataset
        self.train_dataset = tf.data.Dataset.from_tensor_slices((training_data_in, training_data_out))
        self.test_dataset = tf.data.Dataset.from_tensor_slices((test_data_in, test_data_out))
I have a function to call at each epoch that will prepare the dataset. It shuffles the filenames, transforms the filenames into images, and batches the data.
def get_batched_data(self, seed, batch_size):
    nb_batch = int(self.size_training / batch_size)

    def img_to_tensor(path_in, path_out):
        img_string_in = tf.read_file(path_in)
        img_string_out = tf.read_file(path_out)
        im_in = tf.image.decode_jpeg(img_string_in, channels=1)
        im_out = tf.image.decode_jpeg(img_string_out, channels=1)
        return im_in, im_out

    t_datas = self.train_dataset.shuffle(self.size_training, seed=seed)
    t_datas = t_datas.map(img_to_tensor)
    t_datas = t_datas.batch(batch_size)
    return t_datas
Now during the training, at each epoch we call the get_batched_data function, make an iterator, and run it for each batch, then feed the array to the optimizer operation.
for epoch in range(nb_epoch):
    sess_iter_in = tf.Session()
    sess_iter_out = tf.Session()

    batched_train = data.get_batched_data(epoch)
    iterator_train = batched_train.make_one_shot_iterator()
    in_data, out_data = iterator_train.get_next()

    total_batch = int(data.size_training / batch_size)
    for batch in range(total_batch):
        print(f"{batch + 1} / {total_batch}")
        in_images = sess_iter_in.run(in_data).reshape((-1, 64, 64, 1))
        out_images = sess_iter_out.run(out_data).reshape((-1, 64, 64, 1))
        sess.run(optimizer, feed_dict={inputs: in_images,
                                       outputs: out_images})
What do I need ?
I need to have a pipeline that loads only the images of the current batch (otherwise it will not fit in memory) and I want to shuffle the dataset in a different way for each epoch.
Questions and problems
First question: am I using the Dataset class in a good way? I have seen very different things on the internet; for example, in this blog post the dataset is used with a placeholder and fed with the data during learning. It seems strange, because the data are all in an array and therefore already loaded in memory; I don't see the point of using tf.data.Dataset in this case.
I found a solution by using repeat(epoch) on the dataset, like this, but the shuffle will not be different for each epoch in this case.
The second problem with my implementation is that I get an OutOfRangeError in some cases. With a small amount of data (512 as in the example) it works fine, but with a larger amount of data the error occurs. I thought it was due to a bad calculation of the number of batches because of rounding, or because the last batch has fewer elements, but it happens at batch 32 out of 115... Is there any way to know the number of batches created after a batch(n) call on a dataset?
Sorry for this loooonng question, but I've been struggling with this for a few days.
As far as I know, the official Performance Guide is the best teaching material for building input pipelines.
I want to shuffle the dataset in a different way for each epoch.
Using shuffle() and repeat(), you can get a different shuffle pattern for each epoch. You can confirm it with the following code:
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])
dataset = dataset.shuffle(4)
dataset = dataset.repeat(3)
iterator = dataset.make_one_shot_iterator()
x = iterator.get_next()
with tf.Session() as sess:
    for i in range(10):
        print(sess.run(x))
You can also use tf.contrib.data.shuffle_and_repeat, as mentioned on the official page above.
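A minimal sketch of that fused transform (assuming the TF 1.x contrib API; the exact module path varies across versions):

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=4, count=3))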
There are also some problems in your code beyond creating the data pipeline. You are confusing graph construction with graph execution: you re-create the data input pipeline repeatedly, so there are as many redundant input pipelines as epochs. You can observe the redundant pipelines with TensorBoard.
You should place your graph construction code outside of the loop, as in the following (pseudo) code:
batched_train = data.get_batched_data()
iterator = batched_train.make_initializable_iterator()
in_data, out_data = iterator.get_next()

for epoch in range(nb_epoch):
    # reset the iterator's state
    sess.run(iterator.initializer)
    try:
        while True:
            in_images = sess.run(in_data).reshape((-1, 64, 64, 1))
            out_images = sess.run(out_data).reshape((-1, 64, 64, 1))
            sess.run(optimizer, feed_dict={inputs: in_images,
                                           outputs: out_images})
    except tf.errors.OutOfRangeError:
        pass
Moreover, there is some minor inefficiency in your code. You loaded a list of file paths with from_tensor_slices(), so the list is embedded in your graph. (See https://www.tensorflow.org/guide/datasets#consuming_numpy_arrays for details.)
You would be better off using prefetch, and reducing the number of sess.run calls by combining your graph.
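For example, a hedged sketch (train_dataset, img_to_tensor, size_training and batch_size are reused from the question's code) that builds the pipeline once, adds prefetch, and wires the model directly to the iterator's output tensors, so a single sess.run per step does everything:

dataset = train_dataset.shuffle(size_training)
dataset = dataset.map(img_to_tensor, num_parallel_calls=4)
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(1)                     # overlap the input pipeline with training
iterator = dataset.make_initializable_iterator()
in_data, out_data = iterator.get_next()
# ... build the autoencoder on in_data / out_data instead of on placeholders ...
# then one call per step advances the iterator and trains:
# sess.run(optimizer)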

Tensorflow save final state of LSTM in dynamic_rnn for prediction

I want to save the final state of my LSTM such that it's included when I restore the model and can be used for prediction. As explained below, the Saver only has knowledge of the final state when I use tf.assign. However, this throws an error (also explained below).
During training I always feed the final LSTM state back into the network, as explained in this post. Here are the important parts of the code:
When building the graph:
self.init_state = tf.placeholder(tf.float32, [
    self.n_layers, 2, self.batch_size, self.n_hidden
])

state_per_layer_list = tf.unstack(self.init_state, axis=0)

rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])

outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training:
_current_state = np.zeros((self.n_layers, 2, self.batch_size,
                           self.n_hidden))

_train_step, _current_state, _loss, _acc, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels,
               self.init_state: _current_state})
When I later restore my model from a checkpoint, the final state is not restored as well. As outlined in this post the problem is that the Saver has no knowledge of the new state. The post also suggests a solution, based on tf.assign. Regrettably, I cannot use the suggested
assign_op = tf.assign(self.init_state, _current_state)
self.sess.run(assign_op)
because self.init_state is not a Variable but a placeholder. I get the error
AttributeError: 'Tensor' object has no attribute 'assign'
I have tried to solve this problem for several hours now but I can't get it to work.
Any help is appreciated!
EDIT:
I have changed self.init_state to
self.init_state = tf.get_variable(
    'saved_state', shape=[self.n_layers, 2, self.batch_size, self.n_hidden])

state_per_layer_list = tf.unstack(self.init_state, axis=0)

rnn_tuple_state = tuple([
    tf.contrib.rnn.LSTMStateTuple(state_per_layer_list[idx][0],
                                  state_per_layer_list[idx][1])
    for idx in range(self.n_layers)
])

outputs, self.final_state = tf.nn.dynamic_rnn(
    cell, inputs=self.inputs, initial_state=rnn_tuple_state)
And during training I don't feed a value for self.init_state:
_train_step, _current_state, _loss, _acc, summary = self.sess.run(
    [
        self.train_step, self.final_state,
        self.merged
    ],
    feed_dict={self.inputs: _inputs,
               self.labels: _labels})
However, I still can't run the assignment op. Now I get
TypeError: Expected float32 passed to parameter 'value' of op 'Assign', got (LSTMStateTuple(c=array([[ 0.07291573, -0.06366599, -0.23425588, ..., 0.05307654,
In order to save the final state, you can create a separate TF variable, then before saving the graph, run an assign op to assign your latest state to that variable, and then save the graph. The only thing you need to keep in mind is to declare that variable BEFORE you declare the Saver; otherwise it won't be included in the graph.
This is discussed at great detail here, including the working code:
TF LSTM: Save State from training session for prediction session later
*** UPDATE: answers to followup questions:
It looks like you are using BasicLSTMCell, with state_is_tuple=True. The prior discussion that I referred you to used GRUCell with state_is_tuple=False. The details between the two are somewhat different, but the overall approach could be similar, so hopefully this should work for you:
During training, you first feed zeros as initial_state into dynamic_rnn and then keep re-feeding its own output back as input as initial_state. So, the LAST output state of our dynamic_rnn call is what you want to save for later. Since it results from a sess.run() call, essentially it's a numpy array (not a tensor and not a placeholder). So the question amounts to "how do I save a numpy array as a Tensorflow variable along with the rest of the variables in the graph." That's why you assign the final state to a variable whose only purpose is that.
So, code is something like this:
# GRAPH DEFINITIONS:
state_in = tf.placeholder(tf.float32, [LAYERS, 2, None, CELL_SIZE], name='state_in')
l = tf.unstack(state_in, axis=0)
state_tup = tuple(
    [tf.nn.rnn_cell.LSTMStateTuple(l[idx][0], l[idx][1])
     for idx in range(LAYERS)])
# multicell = your BasicLSTMCell / MultiRNNCell definitions
output, state_out = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=state_tup)

savedState = tf.get_variable('savedState', shape=[LAYERS, 2, BATCHSIZE, CELL_SIZE])
saver = tf.train.Saver(max_to_keep=1)

in_state = np.zeros((LAYERS, 2, BATCHSIZE, CELL_SIZE))

# TRAINING LOOP:
feed_dict = {X: x, Y_: y_, batchsize: BATCHSIZE, state_in: in_state}
_, out_state = sess.run([training_step, state_out], feed_dict=feed_dict)
in_state = out_state

# ONCE TRAINING IS OVER:
assignOp = tf.assign(savedState, out_state)
sess.run(assignOp)
saver.save(sess, pathModel + '/my_model.ckpt')

# RECOVERING IN A DIFFERENT PROGRAM:
gInit = tf.global_variables_initializer().run()
lInit = tf.local_variables_initializer().run()
new_saver = tf.train.import_meta_graph(pathModel + 'my_model.ckpt.meta')
new_saver.restore(sess, pathModel + 'my_model.ckpt')

# retrieve the state and get its LAST batch (latest observations)
savedState = sess.run('savedState:0')  # this is the FULL state from training
state = savedState[:, :, -1, :]  # -1 gets only the LAST batch of the state (latest seen observations)
state = np.reshape(state, [state.shape[0], 2, -1, state.shape[2]])  # [LAYERS, 2, 1 (BATCH), CELL_SIZE]

# x = .... (YOUR INPUTS)
feed_dict = {'X:0': x, 'state_in:0': state}

# PREDICTION LOOP:
preds, state = sess.run(['preds:0', 'state_out:0'], feed_dict=feed_dict)
# so now state will be re-fed into feed_dict with the next loop iteration
As mentioned, this is a modified version of an approach that works well for me with GRUCell, where state_is_tuple=False. I adapted it to try BasicLSTMCell with state_is_tuple=True. It works, but not as accurately as the original approach. I don't know yet whether it's just because, for me, GRU is better than LSTM, or for some other reason. See if this works for you...
Also keep in mind that, as you can see in the recovery and prediction code, your predictions will likely be based on a different batch size than your training loop (I guess a batch of 1?). So you have to think through how to handle your recovered state -- just take the last batch? Or something else? This code takes only the last batch entry of the saved state (i.e. the most recent observations from training), because that's what was relevant for me...

Iterate a tensor in a for loop?

How do I iterate over a tensor in a for loop?
I want to do a convolution on each row of my input_tensor... but I can't seem to iterate over a tensor.
Currently I'm trying to do it like this:
def row_convolution(input):
    filter_size = 8
    print input.dtype
    print input.get_shape()
    for units in xrange(splits):
        extract = input[units:units+filter_size, :, :]
        for row_of_extract in extract:
            for unit in row_of_extract:
                temp_list.append((Conv1D(filters=1, kernel_size=1, activation='relu', name='conv')(unit)))
            print len(temp_list)
            sum_temp_list.append(sum(temp_list))
        sum_sum_temp_list.append(sum(sum_temp_list))
    conv_feature_map.append(sum_sum_temp_list)
    return np.array(conv_feature_map)
It looks like you're trying to define tensorflow operations for each input. This is a common misunderstanding about the framework.
You must first define the operations that you will perform; all operations must be defined up front. Usually it looks something like this:
g = tf.Graph()
with g.as_default():
    # define some placeholders to accept your input
    X = tf.placeholder(tf.float32, shape=[1000, 1])
    y = tf.placeholder(tf.float32, shape=[1])

    # add more operations...
    Conv1D(...)  # add your convolution operations
    # add the rest of your operations
    optimizer = tf.train.AdamOptimizer(0.00001).minimize(loss)
Now the graph has been defined, all of it. Consider that fixed, you won't add anything to it again.
Now you'll run data through the fixed graph:
with g.as_default(), tf.Session() as sess:
    X_data, y_data = get_my_data()
    # run this in a loop
    result = sess.run([optimizer, loss], feed_dict={X: X_data, y: y_data})
Note that your data and labels should be fed as a batch, so the first dimension of your data represents N, the number of data points (N=1 is perfectly acceptable, of course). You should preprocess the data so it's in that format (a small reshape sketch follows the list below). For example, a batch of 10 MNIST digits would be in shape [10,28,28,1]. That's:
10 data samples
Images are 28 px height
Images are 28 px width
It's a grayscale image, so 1 color channel
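For instance, a small sketch of getting a hypothetical flat batch into that layout (the random data is just a stand-in for real images):

import numpy as np
flat_batch = np.random.rand(10, 784).astype('float32')   # 10 flattened 28x28 images
batch = flat_batch.reshape(10, 28, 28, 1)                 # (N, height, width, channels)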

Train model using queue Tensorflow

I designed a neural network in tensorflow for my regression problem by following and adapting the tensorflow tutorial. However, due to the structure of my problem (~300,000 data points and use of the costly FTRL optimizer), my problem took too long to execute even on my 32-CPU machine (I don't have GPUs).
According to this comment and a quick confirmation via htop, it appears that I have some single-threaded operations, and the culprit should be feed_dict.
Therefore, as advised here, I tried to use queues for multi-threading my program.
I wrote a simple code example with a queue to train a model, as follows:
import numpy as np
import tensorflow as tf
import threading

# Function for enqueueing my data in parallel
def enqueue_thread():
    sess.run(enqueue_op, feed_dict={x_batch_enqueue: x, y_batch_enqueue: y})

# Set the number of (x, y) couples I use for "training" my model
BATCH_SIZE = 5

# Generate my data, where y = x + 1 + little_noise
x = np.random.randn(10, 1).astype('float32')
y = x + 1 + np.random.randn(10, 1) / 100

# Create the variables for my model y = x*W + b; W and b should both converge to 1.
W = tf.get_variable('W', shape=[1, 1], dtype='float32')
b = tf.get_variable('b', shape=[1, 1], dtype='float32')

# Prepare the placeholders for enqueueing
x_batch_enqueue = tf.placeholder(tf.float32, shape=[None, 1])
y_batch_enqueue = tf.placeholder(tf.float32, shape=[None, 1])

# Create the queue
q = tf.RandomShuffleQueue(capacity=2**20, min_after_dequeue=BATCH_SIZE,
                          dtypes=[tf.float32, tf.float32], seed=12, shapes=[[1], [1]])

# Enqueue operation
enqueue_op = q.enqueue_many([x_batch_enqueue, y_batch_enqueue])

# Dequeue operation
x_batch, y_batch = q.dequeue_many(BATCH_SIZE)

# Prediction with a linear model + bias
y_pred = tf.add(tf.mul(x_batch, W), b)

# MAE cost function
cost = tf.reduce_mean(tf.abs(y_batch - y_pred))

learning_rate = 1e-3
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init)

available_threads = 1024

# Feed the queue
for i in range(available_threads):
    threading.Thread(target=enqueue_thread).start()

# Train the model
for step in range(1000):
    _, cost_step = sess.run([train_op, cost])
    print(cost_step)
Wf = sess.run(W)
bf = sess.run(b)
This code doesn't work, because each time I call x_batch, one y_batch is also dequeued and vice versa. Then I am not comparing the features with the corresponding "result".
Is there an easy way to avoid this problem?
My mistake, everything worked fine.
I was misled because I evaluated my performance on different batches at each step of the algorithm, and also because my model was too complicated for a dummy one (I should have had something like y = W*x or y = x + b).
Then, when I tried to print to the console, I executed sess.run several times on different variables and obviously got inconsistent results.
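In other words, fetching the paired tensors in a single sess.run keeps them consistent, since the dequeue op then runs only once per call; a small sketch using the names from the code above:

# One dequeue per step: x_vals and y_vals come from the same dequeued batch.
x_vals, y_vals, cost_step = sess.run([x_batch, y_batch, cost])
# Two separate calls would dequeue twice, so the pairs would no longer match:
# x_vals = sess.run(x_batch)   # dequeues batch #1
# y_vals = sess.run(y_batch)   # dequeues batch #2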
Even though your problem is solved, I wanted to show you a small inefficiency in your code. When you created your RandomShuffleQueue you specified capacity=2**20. In all the queues, capacity is:
The upper bound on the number of elements that may be stored in this queue.
The queue will try to put as many elements as possible into the queue until it hits this limit. All these elements are eating your RAM. If each element consists of only 1 byte, your queue will eat 1 MB of your data. If you have 10 KB images in your queue, you will eat 10 GB of RAM.
This is very wasteful, especially because you never need so many elements in the queue. All you need to make sure of is that your queue is never empty. So find a reasonable queue capacity and do not use huge numbers.
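For instance, a hedged sketch following this advice, keeping just a few batches of head-room in the queue from the code above:

# A few batches of head-room is typically enough to keep the training step fed.
q = tf.RandomShuffleQueue(capacity=10 * BATCH_SIZE,
                          min_after_dequeue=BATCH_SIZE,
                          dtypes=[tf.float32, tf.float32],
                          seed=12, shapes=[[1], [1]])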
