Tensorflow string_input_producer stuck in queue

By following the mnist example, I was able to build a custom network and use the inputs function of the example to load my dataset (previously encoded as a TFRecord). Just to recap it, the inputs function looks like:
def inputs(train_dir, train, batch_size, num_epochs, one_hot_labels=False):
    if not num_epochs:
        num_epochs = None
    filename = os.path.join(train_dir,
                            TRAIN_FILE if train else VALIDATION_FILE)
    with tf.name_scope('input'):
        filename_queue = tf.train.string_input_producer(
            [filename], num_epochs=num_epochs)
        # Even when reading in multiple threads, share the filename
        # queue.
        image, label = read_and_decode(filename_queue)
        # Shuffle the examples and collect them into batch_size batches.
        # (Internally uses a RandomShuffleQueue.)
        # We run this in two threads to avoid being a bottleneck.
        images, sparse_labels = tf.train.shuffle_batch(
            [image, label], batch_size=batch_size, num_threads=2,
            capacity=1000 + 3 * batch_size,
            # Ensures a minimum amount of shuffling of examples.
            min_after_dequeue=1000)
        return images, sparse_labels
Then, during the training I declare the training operator and run everything, and everything goes smoothly.
Now I am trying to use the very same function to train a different network on the same data. The only (major) difference is that instead of calling the slim.learning.train function on some train_operator, I do the training manually (by evaluating the losses and updating the parameters myself). The architecture is more complex, so I'm forced to do it this way.
When I try to use the data generated by the inputs function, the program gets stuck. Setting a queue timeout shows that it is indeed stuck on the producer's queue.
This leads me to believe that I'm missing something about the use of producers in TensorFlow. I have read the tutorials but couldn't figure out the issue. Is there some kind of initialization that slim.learning.train performs and that I need to replicate by hand if I do my training manually? Why exactly isn't the producer producing?
For example, doing something like:
imgs, labels = inputs(...)
print imgs
prints
<tf.Tensor 'input/shuffle_batch:0' shape=(1, 128, 384, 6) dtype=float32>
which is the correct (symbolic?) tensor, but if I then try to get the actual data with imgs.eval(), it hangs indefinitely.

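You need to start the queue runners, or the queues will be empty and reading from them will hang. When you drive the training loop yourself, call tf.train.start_queue_runners(sess=sess) (usually together with a tf.train.Coordinator) after creating the session and before the first imgs.eval(). See the documentation on queue runners.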
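To see why the read hangs, here is a rough pure-Python analogy. It uses only the standard library, not the TensorFlow API, and the names example_queue, fill_queue and read_batch are made up for illustration. The queue is filled only by a background thread, so until that thread is started every read simply waits; starting it plays the role that starting the queue runners plays for TensorFlow's queues.

```python
import queue
import threading

# Stand-in for the queue behind shuffle_batch (hypothetical name).
example_queue = queue.Queue(maxsize=8)


def fill_queue(num_items):
    """Producer: plays the role of a queue-runner thread."""
    for i in range(num_items):
        example_queue.put(i)


def read_batch(batch_size, timeout):
    """Consumer: plays the role of imgs.eval(); returns None on timeout."""
    batch = []
    try:
        for _ in range(batch_size):
            batch.append(example_queue.get(timeout=timeout))
    except queue.Empty:
        return None
    return batch


# No producer thread has been started, so the read cannot make progress.
before = read_batch(4, timeout=0.2)

# Starting the producer is the analogue of starting the queue runners.
producer = threading.Thread(target=fill_queue, args=(4,), daemon=True)
producer.start()
after = read_batch(4, timeout=5.0)
producer.join()

print(before)
print(after)
```

Without the timeout, the first read_batch call would block forever, which is the behaviour described in the question.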

Related

forecasting tensorflow confused with usage of repeat

I came across this notebook that covers forecasting. I got it through this article.
I am confused about the 2nd and 4th line from below
train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_data = train_data.cache().shuffle(buffer_size).batch(batch_size).repeat()
val_data = tf.data.Dataset.from_tensor_slices((x_vali, y_vali))
val_data = val_data.batch(batch_size).repeat()
I understand that we are trying to shuffle our data because we don't want to feed it to our model in serial order. On additional reading I realized that it is better to have buffer_size the same as the size of the dataset. But I am not sure what repeat is doing in this case. Could someone explain what is being done here and what the function of repeat is?
I also looked at this page and saw below text but still not clear.
The following methods in tf.Dataset :
repeat( count=0 ) The method repeats the dataset count number of times.
shuffle( buffer_size, seed=None, reshuffle_each_iteration=None) The method shuffles the samples in the dataset. The buffer_size is the number of samples which are randomized and returned as tf.Dataset.
batch(batch_size,drop_remainder=False) Creates batches of the dataset with batch size given as batch_size which is also the length of the batches.

The repeat call with nothing passed to the count param makes this dataset repeat infinitely.
In Python terms, Datasets are a subclass of Python iterables. If you have an object ds of type tf.data.Dataset, then you can call iter(ds). If the dataset was generated by repeat() with no count, the iterator will never run out of items, i.e., it will never raise a StopIteration exception.
In the notebook you referenced, the call to tf.keras.Model.fit() is passed 100 for the steps_per_epoch parameter. An epoch is then defined as 100 steps rather than as one pass over the data, so the dataset should repeat indefinitely, and Keras will pause training to run validation every 100 steps.
tldr: leave it in.
https://github.com/tensorflow/tensorflow/blob/3f878cff5b698b82eea85db2b60d65a2e320850e/tensorflow/python/data/ops/dataset_ops.py#L134-L3445
https://docs.python.org/3/library/exceptions.html
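The effect of repeat() on a fixed number of steps per epoch can be sketched with a small pure-Python model. This is not tf.data itself: batches and run_steps are hypothetical helpers that only mimic the batch-then-repeat ordering used in the question, and shuffling is left out.

```python
import itertools


def batches(items, batch_size):
    """Yield consecutive batches, like Dataset.batch(batch_size)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_steps(batch_iter, steps):
    """Pull `steps` batches, like one epoch with steps_per_epoch=steps."""
    taken = []
    for _ in range(steps):
        try:
            taken.append(next(batch_iter))
        except StopIteration:
            break  # the data ran out before the epoch finished
    return taken


data = [0, 1, 2, 3, 4]

# Without repeat: 5 items in batches of 2 give only 3 batches.
got_finite = run_steps(batches(data, 2), 5)

# With repeat: start a new batched pass whenever the previous one ends.
repeated = itertools.chain.from_iterable(
    batches(data, 2) for _ in itertools.count())
got_repeated = run_steps(repeated, 5)

print(len(got_finite), got_finite)
print(len(got_repeated), got_repeated)
```

The finite iterator cannot supply the five requested steps, while the repeated one wraps around to the start of the data.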

How to use feed_dict in Tensorflow multiple GPU case

Recently I have been trying to learn how to use TensorFlow on multiple GPUs to speed up training. I found an official tutorial about training a classification model on the CIFAR-10 dataset. However, this tutorial reads images using a queue. Out of curiosity, how can I use multiple GPUs while feeding values into the Session? I find it hard to work out how to feed different values from the same dataset to different GPUs. Thank you, everybody! The following code is part of the official tutorial.
images, labels = cifar10.distorted_inputs()
batch_queue = tf.contrib.slim.prefetch_queue.prefetch_queue(
    [images, labels], capacity=2 * FLAGS.num_gpus)
# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
    for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
                # Dequeues one batch for the GPU
                image_batch, label_batch = batch_queue.dequeue()
                # Calculate the loss for one tower of the CIFAR model. This function
                # constructs the entire CIFAR model but shares the variables across
                # all towers.
                loss = tower_loss(scope, image_batch, label_batch)
                # Reuse variables for the next tower.
                tf.get_variable_scope().reuse_variables()
                # Retain the summaries from the final tower.
                summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)
                # Calculate the gradients for the batch of data on this CIFAR tower.
                grads = opt.compute_gradients(loss)
                # Keep track of the gradients across all towers.
                tower_grads.append(grads)

The core idea of the multi-GPU example is that you explicitly assign operations to a tf.device. The example loops over FLAGS.num_gpus devices and creates a replica for each of the GPUs.
If you create placeholder ops inside the for loop, they will get assigned to their respective devices. All you need to do is keep handles to the created placeholders and then feed them all independently in a single session.run call.
placeholders = []
for i in range(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
        plc = tf.placeholder(tf.int32)
        placeholders.append(plc)
with tf.Session() as sess:
    fd = {plc: i for i, plc in enumerate(placeholders)}
    # This should give you the sum of all numbers from 0 to FLAGS.num_gpus - 1.
    sess.run(sum(placeholders), feed_dict=fd)
To address your specific example, it should suffice to replace the batch_queue.dequeue() call with the construction of two placeholders (for image_batch and label_batch tensors), store these placeholders somewhere, and then feed the values you need to those.
Another (somewhat hacky) way is to override the image_batch and label_batch tensors directly in the session.run call, because you can feed_dict any tensor (not just a placeholder). You will still need to store the tensors somewhere to be able to reference them from the run call.
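Feeding a different slice of the same batch to each GPU then comes down to building one feed_dict with one entry per stored placeholder. A minimal sketch of that bookkeeping, in plain Python: split_batch and build_feed_dict are hypothetical helpers, and strings stand in for the placeholder handles you would have kept from the tower loop.

```python
def split_batch(batch, num_devices):
    """Split a batch into num_devices contiguous, near-equal shards."""
    base, extra = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        size = base + (1 if i < extra else 0)
        shards.append(batch[start:start + size])
        start += size
    return shards


def build_feed_dict(image_placeholders, label_placeholders, images, labels):
    """Map each tower's placeholders to its own slice of the batch."""
    num_devices = len(image_placeholders)
    feed = {}
    for img_ph, lbl_ph, img_shard, lbl_shard in zip(
            image_placeholders, label_placeholders,
            split_batch(images, num_devices),
            split_batch(labels, num_devices)):
        feed[img_ph] = img_shard
        feed[lbl_ph] = lbl_shard
    return feed


# Strings stand in for the tf.placeholder handles kept from the tower loop.
image_phs = ["images_gpu0", "images_gpu1"]
label_phs = ["labels_gpu0", "labels_gpu1"]
images = ["img0", "img1", "img2", "img3", "img4"]
labels = [0, 1, 0, 1, 1]

feed_dict = build_feed_dict(image_phs, label_phs, images, labels)
print(feed_dict)
```

In a real program the resulting dictionary would be passed as feed_dict to a single session.run call, with the actual placeholder objects as keys.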

The QueueRunner and the queue-based APIs are relatively outdated. The TensorFlow docs say so explicitly:
Input pipelines using the queue-based APIs can be cleanly
replaced by the tf.data API
It is therefore recommended to use the tf.data API, which is optimized for multi-GPU and TPU use.
How to use it?
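dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Batch the data so that x has a leading batch dimension (32 is an arbitrary choice).
dataset = dataset.batch(32)
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()
# define your model
logits = tf.layers.dense(x, 2)  # use x directly in your model
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
train_step = tf.train.AdamOptimizer().minimize(cost)
with tf.Session() as sess:
    # The dense layer and the optimizer create variables that must be initialized first.
    sess.run(tf.global_variables_initializer())
    sess.run(train_step)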
You can create one iterator per GPU with Dataset.shard(), or more easily use the Estimator API.
For a complete tutorial see here.
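As I understand it, Dataset.shard(num_shards, index) keeps the elements whose position modulo num_shards equals index, so each GPU sees a disjoint part of the data. The selection rule can be sketched in plain Python; the shard function below is a hypothetical stand-in written with itertools, not the tf.data method, and num_gpus is an assumed device count.

```python
import itertools


def shard(elements, num_shards, index):
    """Keep every num_shards-th element, starting at position `index`."""
    return list(itertools.islice(elements, index, None, num_shards))


dataset = list(range(10))
num_gpus = 2  # hypothetical device count

per_gpu = [shard(dataset, num_gpus, i) for i in range(num_gpus)]
print(per_gpu)
```

Together the shards cover the whole dataset without overlap, which is what you want when each tower pulls from its own iterator.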

Adding Tensorboard summaries from graph ops generated inside Dataset map() function calls

I've found the Dataset.map() functionality pretty nice for setting up pipelines to preprocess image/audio data before feeding into the network for training, but one issue I have is accessing the raw data before the preprocessing to send to tensorboard as a summary.
For example, say I have a function that loads audio data, does some framing, makes a spectrogram, and returns this.
import tensorflow as tf
def load_audio_examples(label, path):
    # loads audio, converts to spectrogram
    pcm = ... # this is what I'd like to put into tf.summary.audio() !
    # creates one-hot encoded labels, etc
    return labels, examples
# create dataset
training = tf.data.Dataset.from_tensor_slices((
    tf.constant(labels),
    tf.constant(paths)
))
training = training.map(load_audio_examples, num_parallel_calls=4)
# create ops for training
train_step = # ...
accuracy = # ...
# create iterator
iterator = training.repeat().make_one_shot_iterator()
next_element = iterator.get_next()
# ready session
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
train_writer = # ...
# iterator
test_iterator = testing.make_one_shot_iterator()
test_next_element = test_iterator.get_next()
# train loop
for i in range(100):
    batch_ys, batch_xs, path = sess.run(next_element)
    summary, train_acc, _ = sess.run([summaries, accuracy, train_step],
                                     feed_dict={x: batch_xs, y: batch_ys})
    train_writer.add_summary(summary, i)
It appears as though this does not become part of the graph that is plotted in the "Graph" tab of tensorboard (see screenshot below).
As you can see, it's just X (the output of the preprocessing map() function).
How would I better structure this to get the raw audio into a tf.summary.audio()? Right now the things inside map() aren't accessible as Tensors inside my training loop.
Also, why isn't my graph showing up on Tensorboard? Worries me that I won't be able to export my model or use Tensorflow Serving to put my model into production because I'm using the new Dataset API - maybe I should go back to doing things manually? (with queues, etc).

I think your use of Dataset API doesn't make much sense. In fact you have 2 disconnected subgraphs. One for reading data and the other for running your training step.
batch_ys, batch_xs, path = sess.run(next_element)
summary, train_acc, _ = sess.run([summaries, accuracy, train_step],
                                 feed_dict={x: batch_xs, y: batch_ys})
The first line in the code above runs the session and fetches data items from it, transferring them from the TensorFlow backend into Python.
The next line feeds the same data back through feed_dict, which is said to be inefficient: this time TensorFlow transfers the data from Python to the runtime.
This has the following consequences:
Your graph looks disconnected
TensorFlow wastes time doing unnecessary data transfer to and from Python.
To have a single graph (without disconnected subgraphs) you need to build your model on top of the tensors returned by the Dataset API. Note that it is possible to switch between training and testing datasets without fetching batches manually (see the Dataset guide).
As for the summary defined inside the map function, I believe you can retrieve it from the SUMMARIES collection (the default collection for summaries). You can also pass your own collection name when adding the summary operation.

Passing Input Pipeline to TensorFlow Estimator

I'm a noob to TF so go easy on me.
I have to train a simple CNN from a bunch of images in a directory with labels. After looking around a lot, I cooked up this code that prepares a TF input pipeline and I was able to print the image array.
image_list, label_list = load_dataset()
imagesq = ops.convert_to_tensor(image_list, dtype=dtypes.string)
labelsq = ops.convert_to_tensor(label_list, dtype=dtypes.int32)
# Makes an input queue
input_q = tf.train.slice_input_producer([imagesq, labelsq],
                                        shuffle=True)
file_content = tf.read_file(input_q[0])
train_image = tf.image.decode_png(file_content, channels=3)
train_label = input_q[1]
train_image.set_shape([120, 120, 3])
# collect batches of images before processing
train_image_batch, train_label_batch = tf.train.batch(
    [train_image, train_label],
    batch_size=5
    # ,num_threads=1
)
with tf.Session() as sess:
    # initialize the variables
    sess.run(tf.global_variables_initializer())
    # initialize the queue threads to start to shovel data
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # print "from the train set:"
    for i in range(len(image_list)):
        print sess.run(train_image_batch)
    # sess.run(train_image)
    # sess.run(train_label)
    # classifier.fit(input_fn=lambda: (train_image, train_label),
    #                steps=100,
    #                monitors=[logging_hook])
    # stop our queue threads and properly close the session
    coord.request_stop()
    coord.join(threads)
    sess.close()
But looking at the MNIST example given in TF docs, I see they use a cnn_model_fn along with Estimator class.
I have defined my own cnn_model_fn and would like to combine the two. Please help me with how to move forward. This code doesn't work:
classifier = learn.Estimator(model_fn=cnn_model_fn, model_dir='./test_model')
classifier.fit(input_fn=lambda: (train_image, train_label),
               steps=100,
               monitors=[logging_hook])
It seems the pipeline is populated only when the session is run; otherwise it's empty, and it gives a ValueError 'Input graph and Layer graph are not the same'
Please help me.

I'm new to tensorflow myself so take this with a grain of salt.
AFAICT, when you call any of the tf APIs that create "tensors" or "operations" they are created into a context called a Graph.
Further, I believe when the Estimator runs it creates a new empty Graph for each run. It populates the Graph by running model_fn and input_fn that are supposed to call tf APIs that add the "tensors" and "operations" in context of this fresh Graph.
The return values from model_fn and input_fn just provide references so that the parts could be connected correctly - the Graph already contains them.
However, in this example the input operations were created before the Estimator created its Graph, so they were added to the implicit default Graph (one is created automatically, I believe). When the Estimator then creates a new Graph and populates the model with model_fn, the input and the model end up on two different graphs.
To fix this you need to change the input_fn. Don't just wrap the (image, labels) pair in a lambda. Instead, wrap the entire construction of the input in a function, so that when the Estimator runs input_fn, all the input operations and tensors are created, as a side effect of the API calls, in the context of the correct Graph.
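The difference between the two ways of writing input_fn can be illustrated with a toy model of graph contexts. Everything here is a hypothetical stand-in written in plain Python (Graph, make_op and toy_estimator_fit are not TensorFlow classes or functions); it only mimics the behaviour described above, where ops are recorded in whichever graph is current at the moment they are created.

```python
import contextlib

_graph_stack = []


class Graph:
    """Toy stand-in for a graph: just records the ops created in it."""

    def __init__(self, name):
        self.name = name
        self.ops = []

    @contextlib.contextmanager
    def as_default(self):
        _graph_stack.append(self)
        try:
            yield self
        finally:
            _graph_stack.pop()


DEFAULT_GRAPH = Graph("default")


def make_op(name):
    """Create an op in whichever graph is current at call time."""
    graph = _graph_stack[-1] if _graph_stack else DEFAULT_GRAPH
    graph.ops.append(name)
    return (graph.name, name)


def toy_estimator_fit(input_fn):
    """Mimic the described behaviour: fresh graph, then call input_fn."""
    with Graph("estimator").as_default():
        features = input_fn()
        model = make_op("model")
    return features[0] == model[0]  # True if both live in the same graph


# Built up front, outside the estimator: lands in the default graph.
prebuilt = make_op("image_batch")
same_graph_lambda = toy_estimator_fit(lambda: prebuilt)


def input_fn():
    # Built when the estimator calls us: lands in the estimator's graph.
    return make_op("image_batch")


same_graph_function = toy_estimator_fit(input_fn)
print(same_graph_lambda, same_graph_function)
```

The lambda only hands back something built earlier in another graph, while the function builds its ops at call time, inside the graph the toy estimator has made current.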

How to use tf.cond in combination with batching operations / queue runners

Situation
I want to train a specific network architecture (a GAN) that needs inputs from different sources during training.
One input source is examples loaded from disk. The other source is a generator sub-network creating examples.
To choose which kind of input to feed to the network I use tf.cond. There is one caveat though that has already been explained: tf.cond evaluates the inputs to both conditional branches even though only one of those will ultimately be used.
Enough setup, here is a minimal working example:
import numpy as np
import tensorflow as tf
BATCH_SIZE = 32
def load_input_data():
    # Normally this data would be read from disk
    data = tf.reshape(np.arange(10 * BATCH_SIZE, dtype=np.float32), shape=(10 * BATCH_SIZE, 1))
    return tf.train.batch([data], BATCH_SIZE, enqueue_many=True)
def generate_input_data():
    # Normally this data would be generated by a much bigger sub-network
    return tf.random_uniform(shape=[BATCH_SIZE, 1])
def main():
    # A bool to choose between loaded or generated inputs
    load_inputs_pred = tf.placeholder(dtype=tf.bool, shape=[])
    # Variant 1: Call "load_input_data" inside tf.cond
    data_batch = tf.cond(load_inputs_pred, load_input_data, generate_input_data)
    # Variant 2: Call "load_input_data" outside tf.cond
    #loaded_data = load_input_data()
    #data_batch = tf.cond(load_inputs_pred, lambda: loaded_data, generate_input_data)
    init_op = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init_op)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        print(threads)
        # Get generated input data
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: False})
        print(data_batch_values)
        # Get input data loaded from disk
        data_batch_values = sess.run(data_batch, feed_dict={load_inputs_pred: True})
        print(data_batch_values)
if __name__ == '__main__':
    main()
Problem
Variant 1 does not work at all since the queue runner threads don't seem to run. print(threads) outputs something like [<Thread(Thread-1, stopped daemon 140165838264064)>, ...].
Variant 2 does work and print(threads) outputs something like [<Thread(Thread-1, started daemon 140361854863104)>, ...]. But since load_input_data() has been called outside of tf.cond, batches of data will be loaded from disk even when load_inputs_pred is False.
Is it possible to make Variant 1 work, so that input data is only loaded when load_inputs_pred is True and not for every call to session.run()?

If you load your data through a queue and follow it with a batching op, this shouldn't be a problem, because you can cap how much is loaded or stored in the queue via the capacity argument.
# Inside load_input_data(); somefilelist is a list of file paths (or use another way to load data).
filename_queue = tf.train.string_input_producer(somefilelist)
_, contents = tf.WholeFileReader().read(filename_queue)
return tf.train.batch([contents], batch_size=10, capacity=100)
See here for more details:
https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#batch
There is also an alternative approach that skips tf.cond completely. Just define two losses: one that follows the data through the autoencoder and the discriminator, and another that follows the data through just the discriminator.
Then it just becomes a matter of calling
sess.run(auto_loss,feed_dict)
or
sess.run(real_img_loss,feed_dict)
This way the graph will only run the parts needed for whichever loss was requested. Let me know if this needs more explanation.
Lastly, I think that to make Variant 1 work you need to do something like this, if you're using preloaded data:
https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#preloaded-data
Otherwise I'm not sure what the issue is to be honest.
