TF slice_input_producer not keeping tensors in sync - python

I'm reading images into my TF network, but I also need the associated labels along with them.
So I tried to follow this answer, but the labels that are output don't actually match the images that I'm getting in every batch.
The names of my images are in the format dir/3.jpg, so I just extract the label from the image file name.
truth_filenames_np = ...
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
# *** This line should make sure both input tensors are synced (from my limited understanding)
# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer([truth_filenames_tf, labels_tf], shuffle=False)
truth_image_value = tf.read_file(truth_image_name)
truth_image = tf.image.decode_jpeg(truth_image_value)
truth_image.set_shape([IMAGE_DIM, IMAGE_DIM, 3])
truth_image = tf.cast(truth_image, tf.float32)
truth_image = truth_image/255.0
# Another key step, where I batch them together
truth_images_batch, truth_label_batch = tf.train.batch([truth_image, truth_label], batch_size=mb_size)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch = truth_images_batch.eval()
        X_label_batch = truth_label_batch.eval()
        # Here I display all the images in this batch, and then I check which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is printed by X_label_batch!
        print(X_label_batch)
        plot_batch(X_truth_batch)
    coord.request_stop()
    coord.join(threads)
Am I doing something wrong, or does the slice_input_producer not actually ensure that its input tensors are synced?
Aside:
I also noticed that when I get a batch from tf.train.batch, the elements in the batch are adjacent to each other in the original list I gave it, but the batch order isn't in the original order.
Example: If my data is ["dir/1.jpg", "dir/2.jpg", "dir/3.jpg", "dir/4.jpg", "dir/5.jpg", "dir/6.jpg"], then I may get the batch (with batch_size=2) ["dir/3.jpg", "dir/4.jpg"], then the batch ["dir/1.jpg", "dir/2.jpg"], and then the last one.
So this makes it hard to even just use a FIFO queue for the labels since the order won't match the batch order.

Here is a complete runnable example that reproduces the problem:
import tensorflow as tf
truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False)
# # Another key step, where I batch them together
# truth_images_batch, truth_label_batch = tf.train.batch(
#     [truth_image_name, truth_label], batch_size=11)
epochs = 7
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch = truth_image_name.eval()
        X_label_batch = truth_label.eval()
        # Here I display all the images in this batch, and then I check
        # which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is
        # printed by X_label_batch!
        print(X_truth_batch)
        print(X_label_batch)
    coord.request_stop()
    coord.join(threads)
What this prints is:
Epoch 0
b'dir/0.jpg'
b'1.jpg'
Epoch 1
b'dir/2.jpg'
b'3.jpg'
Epoch 2
b'dir/4.jpg'
b'5.jpg'
Epoch 3
b'dir/6.jpg'
b'7.jpg'
Epoch 4
b'dir/8.jpg'
b'9.jpg'
Epoch 5
b'dir/10.jpg'
b'11.jpg'
Epoch 6
b'dir/12.jpg'
b'13.jpg'
So basically each eval call runs the operation another time! Adding the batching makes no difference to that; it just prints batches instead (the first 11 filenames, followed by the next 11 labels, and so on).
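To make that dequeue-per-call behaviour concrete, here is a minimal sketch (reusing the tensor names from the example above) contrasting the two fetch styles; only the single sess.run call keeps the filename and label paired:
# Two separate run/eval calls trigger two separate dequeues,
# so the filename and the label come from different elements.
name_only = sess.run(truth_image_name)   # dequeues element k
label_only = sess.run(truth_label)       # dequeues element k + 1

# One run call with both fetches uses a single dequeue,
# so the pair stays in sync.
name, label = sess.run([truth_image_name, truth_label])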
The workaround I see is:
for i in range(epochs):
    print("Epoch ", i)
    pair = tf.convert_to_tensor([truth_image_name, truth_label]).eval()
    print(pair[0])
    print(pair[1])
which correctly prints:
Epoch 0
b'dir/0.jpg'
b'0.jpg'
Epoch 1
b'dir/1.jpg'
b'1.jpg'
# ...
but does nothing about the violation of the principle of least surprise.
EDIT: yet another way of doing it:
import tensorflow as tf
truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False)
epochs = 7
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners(sess=sess)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch, X_label_batch = sess.run(
            [truth_image_name, truth_label])
        print(X_truth_batch)
        print(X_label_batch)
That's a much better way, since tf.convert_to_tensor and co. only accept tensors of the same type/shape, etc.
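To illustrate that restriction, here is a minimal sketch (with hypothetical constant tensors, not the ones from the pipeline above): tf.convert_to_tensor needs a common dtype and shape, whereas fetching a plain Python list in one sess.run does not.
import tensorflow as tf

img = tf.constant([[0.1, 0.2], [0.3, 0.4]], dtype=tf.float32)  # stand-in for a decoded image
lbl = tf.constant(b'0.jpg', dtype=tf.string)                    # stand-in for a label

# tf.convert_to_tensor([img, lbl]) would fail here: the elements do not share
# a dtype/shape, so they cannot be stacked into a single tensor.

with tf.Session() as sess:
    # Fetching a list of heterogeneous tensors in one run call has no such
    # restriction and still keeps the values paired.
    img_val, lbl_val = sess.run([img, lbl])
    print(img_val, lbl_val)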
Note that I removed the coordinator for simplicity, which however results in a warning:
W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\kernels\queue_base.cc:294] _0_input_producer/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
See this
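If the warning is a concern, here is a minimal sketch of putting the coordinator back, using the same shutdown pattern as the earlier snippets:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        for i in range(epochs):
            print("Epoch ", i)
            X_truth_batch, X_label_batch = sess.run(
                [truth_image_name, truth_label])
            print(X_truth_batch)
            print(X_label_batch)
    finally:
        # Closing the queue runners cleanly avoids the
        # "Skipping cancelled enqueue attempt" warning.
        coord.request_stop()
        coord.join(threads)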

Related

AssertionError Tensorflow

I am trying to use this code: https://github.com/KGPML/Hyperspectral
def run_training():
    """Train MNIST for a number of steps."""
    # Get the sets of images and labels for training, validation, and
    # test on IndianPines.
    """Concatenating all the training and test mat files"""
    for i in range(TRAIN_FILES):
        Training_data = input_data.read_data_sets(os.path.join(DATA_PATH, 'Train_'+str(IMAGE_SIZE)+'_'+str(1+1)+'.mat'), 'train')
    for i in range(TEST_FILES):
        Test_data = input_data.read_data_sets(os.path.join(DATA_PATH, 'Test_'+str(IMAGE_SIZE)+'_'+str(0+1)+'.mat'), 'test')

    # Tell TensorFlow that the model will be built into the default Graph.
    with tf.Graph().as_default():
        # Generate placeholders for the images and labels.
        images_placeholder, labels_placeholder = placeholder_inputs(FLAGS.batch_size)
        # Build a Graph that computes predictions from the inference model.
        logits = IndianPinesMLP.inference(images_placeholder,
                                          FLAGS.hidden1,
                                          FLAGS.hidden2,
                                          FLAGS.hidden3)
        # Add to the Graph the Ops for loss calculation.
        loss = IndianPinesMLP.loss(labels=labels_placeholder, logits=logits)
        # Add to the Graph the Ops that calculate and apply gradients.
        train_op = IndianPinesMLP.training(loss, FLAGS.learning_rate)
        # Add the Op to compare the logits to the labels during evaluation.
        eval_correct = IndianPinesMLP.evaluation(labels=labels_placeholder, logits=logits)
        # Build the summary operation based on the TF collection of Summaries.
        # summary_op = tf.merge_all_summaries()
        # Add the variable initializer Op.
        init = tf.initialize_all_variables()
        # Create a saver for writing training checkpoints.
        saver = tf.train.Saver()
        # Create a session for running Ops on the Graph.
        sess = tf.Session()
        # Instantiate a SummaryWriter to output summaries and the Graph.
        # summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
        # And then after everything is built:
        # Run the Op to initialize the variables.
        sess.run(init)
        # Start the training loop.
        for step in xrange(FLAGS.max_steps):
            start_time = time.time()
            # Fill a feed dictionary with the actual set of images and labels
            # for this particular training step.
            feed_dict = fill_feed_dict(Training_data,
                                       images_placeholder,
                                       labels_placeholder)
            # Run one step of the model. The return values are the activations
            # from the `train_op` (which is discarded) and the `loss` Op. To
            # inspect the values of your Ops or variables, you may include them
            # in the list passed to sess.run() and the value tensors will be
            # returned in the tuple from the call.
            _, loss_value = sess.run([train_op, loss],
                                     feed_dict=feed_dict)
            duration = time.time() - start_time
            # Write the summaries and print an overview fairly often.
            if step % 50 == 0:
                # Print status to stdout.
                print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
                # Update the events file.
                # summary_str = sess.run(summary_op, feed_dict=feed_dict)
                # summary_writer.add_summary(summary_str, step)
                # summary_writer.flush()
            # Save a checkpoint and evaluate the model periodically.
            if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
                saver.save(sess, '.\model-MLP-'+str(IMAGE_SIZE)+'X'+str(IMAGE_SIZE)+'.ckpt', global_step=step)
                # Evaluate against the training set.
                print('Training Data Eval:')
                do_eval(sess,
                        eval_correct,
                        images_placeholder,
                        labels_placeholder,
                        Training_data)
                print('Test Data Eval:')
                do_eval(sess,
                        eval_correct,
                        images_placeholder,
                        labels_placeholder,
                        Test_data)
and got an Error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-23-0683f80cdbe4> in <module>()
----> 1 run_training()
<ipython-input-22-b34daa52b702> in run_training()
60 feed_dict = fill_feed_dict(Training_data,
61 images_placeholder,
---> 62 labels_placeholder)
63
64 # Run one step of the model. The return values are the activations
If I run these parts manually I get no errors:
<ipython-input-5-f04ef9a1e6b2> in fill_feed_dict(data_set, images_pl, labels_pl)
15 # Create the feed_dict for the placeholders filled with the next
16 # `batch size ` examples.
---> 17 images_feed, labels_feed = data_set.next_batch(batch_size)
18 feed_dict = {
19 images_pl: images_feed,
The same problem occurs here:
~\Path to: \Spatial_dataset.py in next_batch(self, batch_size)
87 start = 0
88 self._index_in_epoch = batch_size
---> 89 assert batch_size <= self._num_examples
90 end = self._index_in_epoch
91 return self._images[start:end], np.reshape(self._labels[start:end],len(self._labels[start:end]))
AssertionError:
When I now run run_training(), the error above occurs.
What does this mean and how can I solve it? Google was no help in this case.
Thanks for any help.
The main error is due to:
---> 89 assert batch_size <= self._num_examples
Change batch_size so that it is a factor of the number of training images left after the validation split, as well as a factor of the total number of training set images.
For example, if your training set has 100 images and validation_size is 0.2, 80 images will be trained on and 20 will be used for validation. So choose a batch_size that is a factor of 80, say 20; 20 is a factor of both 80 and 100.
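As a quick sanity check, here is a minimal sketch (using the hypothetical numbers from the example above) that lists the batch sizes dividing both the post-split training count and the total count:
num_images = 100          # total training set images (example figure)
validation_size = 0.2     # fraction held out for validation (example figure)
num_train = int(num_images * (1 - validation_size))   # 80 images actually trained on

# Batch sizes that divide both 80 and 100 follow the advice above and keep
# batch_size <= self._num_examples in next_batch().
valid_batch_sizes = [b for b in range(1, num_train + 1)
                     if num_train % b == 0 and num_images % b == 0]
print(valid_batch_sizes)   # [1, 2, 4, 5, 10, 20] -> e.g. choose batch_size = 20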

How to use TensorFlow tf.train.string_input_producer to produce several epochs data?

When I want to use tf.train.string_input_producer to load data for 2 epochs, I used
filename_queue = tf.train.string_input_producer(filenames=['data.csv'], num_epochs=2, shuffle=True)
col1_batch, col2_batch, col3_batch = tf.train.shuffle_batch(
    [col1, col2, col3], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue, allow_smaller_final_batch=True)
But then I found that this op did not produce what I want.
It does produce each sample in data.csv exactly 2 times, but the generated order is unclear. For example, for 3 lines of data in data.csv:
[[1]
[2]
[3]]
it will produce the following (each sample appears exactly 2 times, but the order is arbitrary):
[1]
[1]
[3]
[2]
[2]
[3]
but what I want is (each epoch is separate, shuffled within each epoch):
(epoch 1:)
[1]
[2]
[3]
(epoch 2:)
[1]
[3]
[2]
In addition, how do I know when 1 epoch is done? Is there some flag variable? Thanks!
My code is here:
import tensorflow as tf

def read_my_file_format(filename_queue):
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)
    record_defaults = [['1'], ['1'], ['1']]
    col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults, field_delim='-')
    # col1 = list(map(int, col1.split(',')))
    # col2 = list(map(int, col2.split(',')))
    return col1, col2, col3

def input_pipeline(filenames, batch_size, num_epochs=1):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)
    col1, col2, col3 = read_my_file_format(filename_queue)
    min_after_dequeue = 10
    capacity = min_after_dequeue + 3 * batch_size
    col1_batch, col2_batch, col3_batch = tf.train.shuffle_batch(
        [col1, col2, col3], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue, allow_smaller_final_batch=True)
    return col1_batch, col2_batch, col3_batch

filenames = ['1.txt']
batch_size = 3
num_epochs = 1
a1, a2, a3 = input_pipeline(filenames, batch_size, num_epochs)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    # start populating filename queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    try:
        while not coord.should_stop():
            a, b, c = sess.run([a1, a2, a3])
            print(a, b, c)
    except tf.errors.OutOfRangeError:
        print('Done training, epoch reached')
    finally:
        coord.request_stop()
        coord.join(threads)
My data looks like this:
1,2-3,4-A
7,8-9,10-B
12,13-14,15-C
17,18-19,20-D
22,23-24,25-E
27,28-29,30-F
32,33-34,35-G
37,38-39,40-H
As Nicolas observes, the tf.train.string_input_producer() API does not give you the ability to detect when the end of an epoch is reached; instead it concatenates together all epochs into one long batch. For this reason, we recently added (in TensorFlow 1.2) the tf.contrib.data API, which makes it possible to express more sophisticated pipelines, including your use case.
The following code snippet shows how you would write your program using tf.contrib.data:
import tensorflow as tf

def input_pipeline(filenames, batch_size):
    # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
    dataset = (tf.contrib.data.TextLineDataset(filenames)
               .map(lambda line: tf.decode_csv(
                    line, record_defaults=[['1'], ['1'], ['1']], field_delim='-'))
               .shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
               .batch(batch_size))
    # Return an *initializable* iterator over the dataset, which will allow us to
    # re-initialize it at the beginning of each epoch.
    return dataset.make_initializable_iterator()

filenames = ['1.txt']
batch_size = 3
num_epochs = 10
iterator = input_pipeline(filenames, batch_size)

# `a1`, `a2`, and `a3` represent the next element to be retrieved from the iterator.
a1, a2, a3 = iterator.get_next()

with tf.Session() as sess:
    for _ in range(num_epochs):
        # Resets the iterator at the beginning of an epoch.
        sess.run(iterator.initializer)
        try:
            while True:
                a, b, c = sess.run([a1, a2, a3])
                print(a, b, c)
        except tf.errors.OutOfRangeError:
            # This will be raised when you reach the end of an epoch (i.e. the
            # iterator has no more elements).
            pass
        # Perform any end-of-epoch computation here.
        print('Done training, epoch reached')
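Note that with the initializable iterator the epoch boundary is explicit: every sess.run(iterator.initializer) starts a fresh, independently shuffled pass over the file, which is exactly the per-epoch shuffling asked about, and the OutOfRangeError marks the end of each epoch.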
You might want to have a look at this answer to a similar question.
The short story is that:
if num_epochs > 1, all the data is enqueued at the same time and shuffled independently of the epoch, so you don't have the ability to monitor which epoch is being dequeued.
What you could do is the first suggestion in the quoted answer, which is to work with num_epochs == 1 and reinitialise the local queue variables (and obviously not the model variables) in each run.
init_queue = tf.variables_initializer(tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope='input_producer'))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    for e in range(num_epochs):
        sess.run(init_queue)  # reinitialize the local variables in the input_producer scope
        # start populating filename queue
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            while not coord.should_stop():
                a, b, c = sess.run([a1, a2, a3])
                print(a, b, c)
        except tf.errors.OutOfRangeError:
            print('Done training, epoch reached')
        finally:
            coord.request_stop()
            coord.join(threads)

Tensorflow: How can I run my testing set on a trained Neural Net

I have created a Neural Net that takes a corrupted RGB image as input and produces a clean version of it. After I finish training my NN, I want to test it on a big set of 50 images. Each input (image) consists of a batch of size 64*32*32*3 (I crop my image into 64 patches and then feed them to the NN). I train my NN with the following code:
# placeholders, variables etc here
train_step = tf.train.AdamOptimizer().minimize(loss)

# loading data to queue
training_queue = tf.train.string_input_producer(clean_set, shuffle=False)
cor_queue = tf.train.string_input_producer(corrupted_set, shuffle=False)
reader = tf.WholeFileReader()
key, value = reader.read(training_queue)
cor_key, cor_value = reader.read(cor_queue)
data = tf.image.decode_jpeg(value, channels=3)
cor_data = tf.image.decode_jpeg(cor_value, channels=3)

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)

for i in range(64):
    my_data = sess.run([key, data, cor_key, cor_data])
    im_list.append(my_data[1].reshape(1, -1))
    key_list.append(my_data[0])
    cor_im_list.append(my_data[3].reshape(1, -1))
    cor_key_list.append(my_data[2])

for j in range(my_times):
    _, y = sess.run([train_step, h_y], feed_dict={x: cor_im_list, y_: im_list})

print('finished training NN')

coord.request_stop()
coord.join(threads)
This works fine!
Now I want to test my data set:
ressu = []
test_im_list = []
test_key_list = []
# I have 50 images in 50 folders (each folder contains the 64 patches of the image)
for i in range(50):
    path = 'randomize_text/permutated_data/perm_test_' + str(i) + '/*.jpg'
    testing_set = glob.glob(path)
    testing_queue = tf.train.string_input_producer(testing_set, shuffle=False)
    reader = tf.WholeFileReader()
    test_key, test_value = reader.read(testing_queue)
    test_data = tf.image.decode_jpeg(test_value, channels=3)
    for j in range(64):
        print(j)
        my_data = sess.run([test_key, test_data])
        test_im_list.append(my_data[1].reshape(1, -1))
        test_key_list.append(my_data[0])
    psi = sess.run(y, feed_dict={x: test_im_list})
    ressu.append(psi)
If I put this code right after finishing training the NN, the program becomes unresponsive. My guess is that I don't use the coord and threads, so I can't handle the big set (even if I place it right before I close the threads). If I load it the way I loaded my training set, I can only do it for one image, which is not enough; I need to load them all.
How can I test my trained NN with my testing set?
Thanks

Restoring queue state in Tensorflow from checkpoint

Context: I am training a model using an Estimator. Without extraneous details, I am using queues to read in a series of input images, which are batched and manipulated using an input function which I call "read_pics_batch":
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    keypoint_regression.fit(
        input_fn=lambda: inp_mod.read_pics_batch(names_train,
            joint_annopoints_train, num_in_batch, max_num_epochs, 'TRAIN'),
        steps=max_steps_per_epoch * max_num_epochs,  # max number of steps
        monitors=[logging_hook])

    coord.request_stop()
    coord.join(threads)
The input function has the following form, where I am also randomising the input file order:
def read_pics_batch(names_list, joint_list, batch_size, max_num_epochs, task):
    names_tensor = tf.convert_to_tensor(names_list, dtype=tf.string)
    joint_total_tensor = tf.convert_to_tensor(joint_list, dtype=tf.int32)
    min_after_dequeue = 100
    capacity = min_after_dequeue + 3 * batch_size
    file_pattern = [("...")]
    examples = graph_io.read_keyed_batch_examples(file_pattern, batch_size,
        reader=tf.WholeFileReader, randomize_input=True,
        parse_fn=example_to_standard_pic,
        num_epochs=max_num_epochs, queue_capacity=capacity)
My questions are as follows:
1) Is there any way to restore the queue state from a checkpoint, like any other variable? If "randomize_input" from read_keyed_batch_examples were set to False, then at each restart of the training_op I would read the same input files over and over, which is clearly not what I want.
2) If randomize_input = True, how exactly does the queue decide which files to enqueue? I see two possible options and I am unsure which is correct:
it selects a short-list of size "capacity" (from the full-list given by all the filenames defined by "file_pattern") and then randomises the order of the names in this short-list
it randomises the names in the full-list first, and then creates a short-list of size "capacity" out of this
If the second case applies, I don't believe that I would actually need to restore the queue state, since I would in principle read different files every time, but if the first case applies, I would still be reading the same few files over and over, just in a different order.
Thank you for your time!

How to use image_summary to view images from different batches in Tensorflow?

I am curious about how image_summary works. There is a parameter called max_images, which controls how many images will be shown. However, it seems the summary only displays images from one batch. If we use a bigger value of max_images, we will just view more images from that batch. Is there a way I can view, for example, one image from each batch?
To view one image from each batch, you need to fetch the result of the tf.image_summary() op every time you run a step. For example, if you have the following setup:
images = ...
loss = ...
optimizer = ...
train_op = optimizer.minimize(loss)
init_op = tf.initialize_all_variables()
image_summary_t = tf.image_summary(images.name, images, max_images=1)
sess = tf.Session()
summary_writer = tf.train.SummaryWriter(...)
sess.run(init_op)
...you could set up your training loop to capture one image per iteration as follows:
for _ in range(10000):
    _, image_summary = sess.run([train_op, image_summary_t])
    summary_writer.add_summary(image_summary)
Note that capturing summaries on each batch might be inefficient, and you should probably only capture the summary periodically for faster training.
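For example, a minimal sketch (the interval of 10 steps is arbitrary) of fetching the image summary only periodically could look like:
for step in range(10000):
    if step % 10 == 0:
        # Fetch the summary together with the train op only on some steps.
        _, image_summary = sess.run([train_op, image_summary_t])
        summary_writer.add_summary(image_summary, step)
    else:
        sess.run(train_op)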
EDIT: The above code writes a separate summary for each image, so your log will contain all of the images, but they will not all be visualized in TensorBoard. If you want to combine your summaries to visualize images from multiple batches, you could do the following:
combined_summary = tf.Summary()
for i in range(10000):
    _, image_summary = sess.run([train_op, image_summary_t])
    combined_summary.MergeFromString(image_summary)
    if i % 10 == 0:
        summary_writer.add_summary(combined_summary)
        combined_summary = tf.Summary()
I was able to solve this by creating a new image_summary op for each batch. i.e. I went from something that looked like:
train_writer = tf.train.SummaryWriter('summary_dir')
img = tf.image_summary("fooImage", img_data)
for i in range(N_BATCHES):
    summary, _ = sess.run([img, train_step])
    train_writer.add_summary(summary, i)
(Which, frustratingly, was not doing what I expected.) To...
train_writer = tf.train.SummaryWriter('summary_dir')
for i in range(N_BATCHES):
    # Images are sorted in lexicographic order, so zero-pad the name
    img = tf.image_summary("fooImage{:06d}".format(i), img_data)
    summary, _ = sess.run([img, train_step])
    train_writer.add_summary(summary)
