I'd like to compute the mean of each of the RGB channels of a set of images in a multithreaded manner.
My idea was to have a string_input_producer that fills a filename_queue and then have a second FIFOQueue that is filled by num_threads threads that load images from the filenames in filename_queue, perform some ops on them and then enqueue the result.
This second queue is then accessed by one single thread (the main thread) that sums up all the values from the queue.
This is the code I have:
import tensorflow as tf

# filenames (a list of paths) and num_threads are defined earlier
# variables for storing the mean and some intermediate results
mean = tf.Variable([0.0, 0.0, 0.0])
total = tf.Variable(0.0)
# the filename queue and the ops to read from it
filename_queue = tf.train.string_input_producer(filenames, num_epochs=1)
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)
image = tf.image.decode_jpeg(value, channels=3)
image = tf.cast(image, tf.float32)
sum = tf.reduce_sum(image, [0, 1])
num = tf.mul(tf.shape(image)[0], tf.shape(image)[1])
num = tf.cast(num, tf.float32)
# the second queue and its enqueue op
queue = tf.FIFOQueue(1000, dtypes=[tf.float32, tf.float32], shapes=[[3], []])
enqueue_op = queue.enqueue([sum, num])
# the ops performed by the main thread
img_sum, img_num = queue.dequeue()
mean_op = tf.add(mean, img_sum)
total_op = tf.add(total, img_num)
# adding new queue runner that performs enqueue_op on num_threads threads
qr = tf.train.QueueRunner(queue, [enqueue_op] * num_threads)
tf.train.add_queue_runner(qr)
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
# the main loop being executed until the OutOfRangeError
# (when filename_queue does not yield elements anymore)
try:
    while not coord.should_stop():
        mean, total = sess.run([mean_op, total_op])
except tf.errors.OutOfRangeError:
    print 'All images processed.'
finally:
    coord.request_stop()
    coord.join(threads)
# some additional computations to get the mean
total_3channel = tf.pack([total, total, total])
mean = tf.div(mean, total_3channel)
mean = sess.run(mean)
print mean
The problem is that each time I run this code I get different results, for example:
[ 99.35347748 58.35261154 44.56705856]
[ 95.91153717 92.54192352 87.48269653]
[ 124.991745 121.83417511 121.1891861 ]
I attribute this to race conditions. But where do those race conditions come from? Can someone help me out?
Your QueueRunner will start num_threads threads that race to access your reader and push the result onto the queue. The order of images on the queue will vary depending on which thread finishes first.
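Note also that mean_op = tf.add(mean, img_sum) never writes its result back into the variable, so each sess.run only computes variable + dequeued value. A minimal sketch of an order-independent accumulation using assign_add (my suggested change, reusing the question's names, not the original code):

# Dequeue one (sum, count) pair and fold it into the variables.
# assign_add makes the update part of the graph, and since addition is
# commutative the final totals do not depend on thread scheduling.
img_sum, img_num = queue.dequeue()
mean_op = mean.assign_add(img_sum)    # running per-channel sum
total_op = total.assign_add(img_num)  # running pixel count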
Update Feb 12
A simple example of chaining two queues and summing up values from the second queue. With num_threads > 1 there is some non-determinism in the intermediate values, but the final value will always be 30. With num_threads=1, everything is deterministic.
import numpy as np
import tensorflow as tf

tf.reset_default_graph()
queue_dtype = np.int32
# value_queue is a queue that will be filled with 0,1,2,3,4
# range_input_producer creates the queue and registers its queue_runner
value_queue = tf.train.range_input_producer(limit=5, num_epochs=1, shuffle=False)
value = value_queue.dequeue()
# value_squared_queue will be filled with 0,1,4,9,16
value_squared_queue = tf.FIFOQueue(capacity=50, dtypes=queue_dtype)
value_squared_enqueue = value_squared_queue.enqueue(tf.square(value))
value_squared = value_squared_queue.dequeue()
# value_squared_sum keeps a running sum of squares of values
value_squared_sum = tf.Variable(0)
value_squared_sum_update = value_squared_sum.assign_add(value_squared)
# register the queue_runner in the global queue-runners collection
num_threads = 2
qr = tf.train.QueueRunner(value_squared_queue, [value_squared_enqueue] * num_threads)
tf.train.queue_runner.add_queue_runner(qr)
sess = tf.InteractiveSession()
sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners()
for i in range(5):
    sess.run([value_squared_sum_update])
    print sess.run([value_squared_sum])
You should see:
[0]
[1]
[5]
[14]
[30]
Or sometimes (when the order of first 2 values is flipped)
[1]
[1]
[5]
[14]
[30]
Related
I am trying to use the function below to crop a large number of images (100,000s). I am doing this operation serially, but it's taking a lot of time. What is an efficient way to do this?
tf.image.crop_to_bounding_box
Below is my code:
import tensorflow as tf
import matplotlib.image as mpimg

# img_bbox_dict and img_path are defined elsewhere in my script
def crop_images(img_dir, list_images):
    outlist = []
    with tf.Session() as session:
        for image1 in list_images[:5]:
            image = mpimg.imread(img_dir + image1)
            x = tf.Variable(image, name='x')
            data_t = tf.placeholder(tf.uint8)
            op = tf.image.encode_jpeg(data_t, format='rgb')
            model = tf.global_variables_initializer()
            img_name = "img/" + image1.split("_img_0")[0] + "/img_0" + image1.split("_img_0")[1]
            height = x.shape[1]
            [x1, y1, x2, y2] = img_bbox_dict[img_name]
            x = tf.image.crop_to_bounding_box(x, int(y1), int(x1), int(y2) - int(y1), int(x2) - int(x1))
            session.run(model)
            result = session.run(x)
            data_np = session.run(op, feed_dict={data_t: result})
            with open(img_path + image1, 'w+') as fd:
                fd.write(data_np)
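A large part of the slowdown here is that new graph nodes (tf.Variable, encode_jpeg, crop_to_bounding_box) are created on every loop iteration, so the graph grows without bound. A minimal sketch that builds the ops once and feeds each image through placeholders (the img_bbox_dict key scheme and the output path handling are simplified assumptions, not the original code):

import tensorflow as tf
import matplotlib.image as mpimg

def crop_images_fast(img_dir, list_images, img_bbox_dict, out_dir):
    # Build the graph once; feed the raw image and its box per iteration.
    image_t = tf.placeholder(tf.uint8, shape=[None, None, 3])
    box_t = tf.placeholder(tf.int32, shape=[4])  # [y1, x1, y2, x2]
    cropped = tf.image.crop_to_bounding_box(
        image_t, box_t[0], box_t[1], box_t[2] - box_t[0], box_t[3] - box_t[1])
    encoded = tf.image.encode_jpeg(cropped, format='rgb')
    with tf.Session() as session:
        for name in list_images:
            image = mpimg.imread(img_dir + name)
            x1, y1, x2, y2 = img_bbox_dict[name]  # hypothetical key scheme
            data = session.run(encoded, feed_dict={
                image_t: image, box_t: [int(y1), int(x1), int(y2), int(x2)]})
            with open(out_dir + name, 'wb') as fd:
                fd.write(data)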
I'll give a simplified version of one of the examples from TensorFlow's Programmer's Guide on reading data, which can be found here. Basically, it uses Reader and Filename Queues to batch together image data using a specified number of threads. These threads are coordinated using what is called a thread Coordinator.
import tensorflow as tf
import glob
images_path = "./" #RELATIVE glob pathname of current directory
images_extension = "*.png"
# Save the list of files matching pattern, so it is only computed once.
filenames = tf.train.match_filenames_once(glob.glob(images_path+images_extension))
batch_size = len(glob.glob1(images_path,images_extension))
num_epochs=1
standard_size = [500, 500]
num_channels = 3
min_after_dequeue = 10
num_preprocess_threads = 3
seed = 14131
"""
IMPORTANT: Cropping params. These are arbitrary values used only for this example.
You will have to change them according to your requirements.
"""
crop_size = [200, 200]
# NOTE: crop_and_resize expects normalized [y1, x1, y2, x2] coordinates in [0, 1]
boxes = [0.1, 0.1, 0.9, 0.9]
"""
'WholeFileReader' is a Reader whose 'read' method outputs the next
key-value pair of the filename and the contents of the file (the image) from
the Queue, both of which are string scalar Tensors.
Note that the QueueRunner works in a thread separate from the
Reader that pulls filenames from the queue, so the shuffling and enqueuing
process does not block the reader.
'resize_images' is used so that all images are resized to the same
size (aspect ratios may change, so in that case use resize_image_with_crop_or_pad).
'set_shape' is used because the height and width dimensions of 'image' are
data dependent and cannot be computed without executing this operation. Without
this Op, the 'image' Tensor's shape will have None as Dimensions.
"""
def read_my_file_format(filename_queue, standard_size, num_channels):
    image_reader = tf.WholeFileReader()
    _, image_file = image_reader.read(filename_queue)
    if "jpg" in images_extension:
        image = tf.image.decode_jpeg(image_file)
    elif "png" in images_extension:
        image = tf.image.decode_png(image_file)
    image = tf.image.resize_images(image, standard_size)
    image.set_shape(standard_size + [num_channels])
    print "Successfully read file!"
    return image
"""
'string_input_producer' Enters matched filenames into a 'QueueRunner' FIFO Queue.
'shuffle_batch' creates batches by randomly shuffling tensors. The 'capacity'
argument controls how long the prefetching is allowed to grow the queues.
'min_after_dequeue' defines how big a buffer we will randomly
sample from -- bigger means better shuffling but slower startup & more memory used.
'capacity' must be larger than 'min_after_dequeue' and the amount larger
determines the maximum we will prefetch.
Recommendation: min_after_dequeue + (num_threads + a small safety margin) * batch_size
"""
def input_pipeline(filenames, batch_size, num_epochs, standard_size, num_channels, min_after_dequeue, num_preprocess_threads, seed):
    filename_queue = tf.train.string_input_producer(filenames, num_epochs=num_epochs, shuffle=True)
    example = read_my_file_format(filename_queue, standard_size, num_channels)
    capacity = min_after_dequeue + 3 * batch_size
    example_batch = tf.train.shuffle_batch([example], batch_size=batch_size, capacity=capacity, min_after_dequeue=min_after_dequeue, num_threads=num_preprocess_threads, seed=seed, enqueue_many=False)
    print "Batching Successful!"
    return example_batch
"""
Any transformation on the image batch goes here. Refer the documentation
for the details of how the cropping is done using this function.
"""
def crop_batch(image_batch, batch_size, b_boxes, crop_size):
    cropped_images = tf.image.crop_and_resize(image_batch, boxes=[b_boxes for _ in xrange(batch_size)], box_ind=[i for i in xrange(batch_size)], crop_size=crop_size)
    print "Cropping Successful!"
    return cropped_images
example_batch = input_pipeline(filenames, batch_size, num_epochs, standard_size, num_channels, min_after_dequeue, num_preprocess_threads, seed)
cropped_images = crop_batch(example_batch, batch_size, boxes, crop_size)
"""
if 'num_epochs' is not `None`, the 'string_input_producer' function creates local
counter `epochs`. Use `local_variables_initializer()` to initialize local variables.
'Coordinator' class implements a simple mechanism to coordinate the termination
of a set of threads. Any of the threads can call `coord.request_stop()` to ask for all
the threads to stop. To cooperate with the requests, each thread must check for
`coord.should_stop()` on a regular basis.
`coord.should_stop()` returns `True` as soon as `coord.request_stop()` has been called.
A thread can report an exception to the coordinator as part of the `should_stop()`
call. The exception will be re-raised from the `coord.join()` call.
After a thread has called `coord.request_stop()`, the other threads have a
fixed time to stop; this is called the 'stop grace period' and defaults to 2 minutes.
If any of the threads is still alive after the grace period expires `coord.join()`
raises a RuntimeError reporting the laggards.
IMPORTANT: 'start_queue_runners' starts threads for all queue runners collected in
the graph, & returns the list of all threads. This must be executed BEFORE running
any other training/inference/operation steps, or it will hang forever.
"""
with tf.Session() as sess:
    _, _ = sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            # Run training steps or whatever
            cropped_images1 = sess.run(cropped_images)
            print cropped_images1.shape
    except tf.errors.OutOfRangeError:
        print('Load and Process done -- epoch limit reached')
    finally:
        # When done, ask the threads to stop.
        coord.request_stop()
        coord.join(threads)
    sess.close()
I am new to TensorFlow and exploring recommendation systems with it. I have looked at a few sample codes on GitHub, and most of them are similar to the following:
https://github.com/songgc/TF-recomm/blob/master/svd_train_val.py
But the question is, how do I pick top recommendation for user U1 in the above code?
If there is any sample code or approach, please share. Thanks!
It is a little difficult! Basically, when svd returns, it closes the session, and the tensors lose their values (you still keep the graph). There are a few options:
Save the model to a file and restore it later;
Don't put the session in a with tf.Session() as sess: .... block, and instead return the session;
Do the user processing inside the with ... block
The worst option is option 3: you should train your model separately from using it. The best approach is to save your model and weights somewhere, then restore the session. However, you are still left with the question of how you use this session object once you have recovered it. To demonstrate just that part, I am going to solve this problem using option 3, assuming that you know how to restore a session.
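For completeness, here is a minimal sketch of option 1 with tf.train.Saver (the checkpoint path is arbitrary and my own; the linked repository's code does not include this):

# At the end of svd(), while the session is still open, save the weights:
saver = tf.train.Saver()
saver.save(sess, "/tmp/svd/model.ckpt")  # hypothetical path

# Later, rebuild the same graph (placeholders, inference_svd, etc.),
# then restore the trained values into a fresh session:
with tf.Session() as sess:
    saver.restore(sess, "/tmp/svd/model.ckpt")
    # ... run `infer` exactly as in the prediction code below ...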
def svd(train, test):
    samples_per_batch = len(train) // BATCH_SIZE
    iter_train = dataio.ShuffleIterator([train["user"],
                                         train["item"],
                                         train["rate"]],
                                        batch_size=BATCH_SIZE)
    iter_test = dataio.OneEpochIterator([test["user"],
                                         test["item"],
                                         test["rate"]],
                                        batch_size=-1)
    user_batch = tf.placeholder(tf.int32, shape=[None], name="id_user")
    item_batch = tf.placeholder(tf.int32, shape=[None], name="id_item")
    rate_batch = tf.placeholder(tf.float32, shape=[None])
    infer, regularizer = ops.inference_svd(user_batch, item_batch, user_num=USER_NUM, item_num=ITEM_NUM, dim=DIM,
                                           device=DEVICE)
    global_step = tf.contrib.framework.get_or_create_global_step()
    _, train_op = ops.optimization(infer, regularizer, rate_batch, learning_rate=0.001, reg=0.05, device=DEVICE)
    init_op = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init_op)
        summary_writer = tf.summary.FileWriter(logdir="/tmp/svd/log", graph=sess.graph)
        print("{} {} {} {}".format("epoch", "train_error", "val_error", "elapsed_time"))
        errors = deque(maxlen=samples_per_batch)
        start = time.time()
        for i in range(EPOCH_MAX * samples_per_batch):
            users, items, rates = next(iter_train)
            _, pred_batch = sess.run([train_op, infer], feed_dict={user_batch: users, item_batch: items, rate_batch: rates})
            pred_batch = clip(pred_batch)
            errors.append(np.power(pred_batch - rates, 2))
            if i % samples_per_batch == 0:
                train_err = np.sqrt(np.mean(errors))
                test_err2 = np.array([])
                for users, items, rates in iter_test:
                    pred_batch = sess.run(infer, feed_dict={user_batch: users, item_batch: items})
                    pred_batch = clip(pred_batch)
                    test_err2 = np.append(test_err2, np.power(pred_batch - rates, 2))
                end = time.time()
                test_err = np.sqrt(np.mean(test_err2))
                print("{:3d} {:f} {:f} {:f}(s)".format(i // samples_per_batch, train_err, test_err, end - start))
                train_err_summary = make_scalar_summary("training_error", train_err)
                test_err_summary = make_scalar_summary("test_error", test_err)
                summary_writer.add_summary(train_err_summary, i)
                summary_writer.add_summary(test_err_summary, i)
                start = end

        # Get the predicted rating of every item for user #1
        userNumber = 1
        user_prediction = sess.run(infer, feed_dict={user_batch: np.array([userNumber]), item_batch: np.array(range(ITEM_NUM))})
        # The index number is the same as the item number. Orders from lowest
        # (least recommended) to largest.
        index_rating_order = np.argsort(user_prediction)
        print("Top ten recommended items for user {} are".format(userNumber))
        print(index_rating_order[-10:][::-1])  # at the end, reverse the list
        # If you want to include the score:
        items_to_choose = index_rating_order[-10:][::-1]
        for item, score in zip(items_to_choose, user_prediction[items_to_choose]):
            print("{}: {}".format(item, score))
The only changes I made begin at the first commented line. To emphasize again, best practice would be to train in this function, but to actually make your predictions separately.
Context: I am training a model using an Estimator. Without extraneous details, I am using queues to read in a series of input images, which are batched and manipulated using an input function which I call "read_pics_batch":
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    keypoint_regression.fit(
        input_fn=lambda: inp_mod.read_pics_batch(names_train,
            joint_annopoints_train, num_in_batch, max_num_epochs, 'TRAIN'),
        steps=max_steps_per_epoch * max_num_epochs,  # max number of steps
        monitors=[logging_hook])

    coord.request_stop()
    coord.join(threads)
The input function has the following form, where I am also randomising the input file order:
def read_pics_batch(names_list, joint_list, batch_size, max_num_epochs, task):
    names_tensor = tf.convert_to_tensor(names_list, dtype=tf.string)
    joint_total_tensor = tf.convert_to_tensor(joint_list, dtype=tf.int32)
    min_after_dequeue = 100
    capacity = min_after_dequeue + 3 * batch_size
    file_pattern = [("...")]
    examples = graph_io.read_keyed_batch_examples(file_pattern, batch_size,
        reader=tf.WholeFileReader, randomize_input=True,
        parse_fn=example_to_standard_pic,
        num_epochs=max_num_epochs, queue_capacity=capacity)
My questions are as follows:
1) Is there any way to restore the queue state from a checkpoint, like any other variable? If "randomize_input" in read_keyed_batch_examples were set to False, then at each restart of the training_op I would read the same input files over and over, which is clearly not what I want.
2) If randomize_input = True, how exactly does the queue decide which files to enqueue? I see two possible options and I am unsure which is correct:
it selects a short-list of size "capacity" (from the full-list given by all the filenames defined by "file_pattern") and then randomises the order of the names in this short-list
it randomises the names in the full-list first, and then creates a short-list of size "capacity" out of this
If the second case applies, I don't believe that I would actually need to restore the queue state, since I would in principle read different files every time, but if the first case applies, I would still be reading the same few files over and over, just in a different order.
Thank you for your time!
I'm reading images into my TF network, but I also need the associated labels along with them.
So I tried to follow this answer, but the labels that are output don't actually match the images that I'm getting in every batch.
The names of my images are in the format dir/3.jpg, so I just extract the label from the image file name.
truth_filenames_np = ...
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
# *** This line should make sure both input tensors are synced (from my limited understanding)
# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer([truth_filenames_tf, labels_tf], shuffle=False)
truth_image_value = tf.read_file(truth_image_name)
truth_image = tf.image.decode_jpeg(truth_image_value)
truth_image.set_shape([IMAGE_DIM, IMAGE_DIM, 3])
truth_image = tf.cast(truth_image, tf.float32)
truth_image = truth_image/255.0
# Another key step, where I batch them together
truth_images_batch, truth_label_batch = tf.train.batch([truth_image, truth_label], batch_size=mb_size)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(epochs):
        print "Epoch ", i
        X_truth_batch = truth_images_batch.eval()
        X_label_batch = truth_label_batch.eval()
        # Here I display all the images in this batch, and then I check which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is printed by X_label_batch!
        print X_label_batch
        plot_batch(X_truth_batch)
    coord.request_stop()
    coord.join(threads)
Am I doing something wrong, or does the slice_input_producer not actually ensure that its input tensors are synced?
Aside:
I also noticed that when I get a batch from tf.train.batch, the elements in the batch are adjacent to each other in the original list I gave it, but the batches themselves don't come in the original order.
Example: If my data is ["dir/1.jpg", "dir/2.jpg", "dir/3.jpg", "dir/4.jpg", "dir/5.jpg", "dir/6.jpg"], then I may get the batch (with batch_size=2) ["dir/3.jpg", "dir/4.jpg"], then the batch ["dir/1.jpg", "dir/2.jpg"], and then the last one.
So this makes it hard to even just use a FIFO queue for the labels since the order won't match the batch order.
Here is a complete runnable example that reproduces the problem:
import tensorflow as tf
truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
# get the labels
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
# My list is also already shuffled, so I set shuffle=False
truth_image_name, truth_label = tf.train.slice_input_producer(
[truth_filenames_tf, labels_tf], shuffle=False)
# # Another key step, where I batch them together
# truth_images_batch, truth_label_batch = tf.train.batch(
# [truth_image_name, truth_label], batch_size=11)
epochs = 7
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch = truth_image_name.eval()
        X_label_batch = truth_label.eval()
        # Here I display all the images in this batch, and then I check
        # which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is
        # printed by X_label_batch!
        print(X_truth_batch)
        print(X_label_batch)
    coord.request_stop()
    coord.join(threads)
What this prints is:
Epoch 0
b'dir/0.jpg'
b'1.jpg'
Epoch 1
b'dir/2.jpg'
b'3.jpg'
Epoch 2
b'dir/4.jpg'
b'5.jpg'
Epoch 3
b'dir/6.jpg'
b'7.jpg'
Epoch 4
b'dir/8.jpg'
b'9.jpg'
Epoch 5
b'dir/10.jpg'
b'11.jpg'
Epoch 6
b'dir/12.jpg'
b'13.jpg'
So basically each eval call runs the operation another time! Adding the batching makes no difference to that; it just prints batches (the first 11 filenames followed by the next 11 labels, and so on).
The workaround I see is:
for i in range(epochs):
    print("Epoch ", i)
    pair = tf.convert_to_tensor([truth_image_name, truth_label]).eval()
    print(pair[0])
    print(pair[1])
which correctly prints:
Epoch 0
b'dir/0.jpg'
b'0.jpg'
Epoch 1
b'dir/1.jpg'
b'1.jpg'
# ...
but does nothing about the violation of the principle of least surprise.
EDIT: yet another way of doing it:
import tensorflow as tf
truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)]
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np)
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np]
labels_tf = tf.convert_to_tensor(labels)
truth_image_name, truth_label = tf.train.slice_input_producer(
[truth_filenames_tf, labels_tf], shuffle=False)
epochs = 7
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.start_queue_runners(sess=sess)
    for i in range(epochs):
        print("Epoch ", i)
        X_truth_batch, X_label_batch = sess.run(
            [truth_image_name, truth_label])
        print(X_truth_batch)
        print(X_label_batch)
That's a much better way, as tf.convert_to_tensor and co. only accept tensors of the same type/shape, etc.
Note that I removed the coordinator for simplicity, which however results in a warning:
W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\kernels\queue_base.cc:294] _0_input_producer/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
See this
I am trying to prefetch training data to hide I/O latency. I would like to write custom Python code that loads data from disk and preprocesses the data (e.g. by adding a context window). In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Update: I have a working example based on @mrry's example below.
import numpy as np
import tensorflow as tf
import threading

BATCH_SIZE = 5
TRAINING_ITERS = 4100

feature_input = tf.placeholder(tf.float32, shape=[128])
label_input = tf.placeholder(tf.float32, shape=[128])

q = tf.FIFOQueue(200, [tf.float32, tf.float32], shapes=[[128], [128]])
enqueue_op = q.enqueue([label_input, feature_input])

label_batch, feature_batch = q.dequeue_many(BATCH_SIZE)
c = tf.reshape(feature_batch, [BATCH_SIZE, 128]) + tf.reshape(label_batch, [BATCH_SIZE, 128])

sess = tf.Session()

def load_and_enqueue(sess, enqueue_op, coord):
    # Open the binary files in 'rb' mode so np.fromfile reads raw bytes.
    with open('dummy_data/features.bin', 'rb') as feature_file, open('dummy_data/labels.bin', 'rb') as label_file:
        while not coord.should_stop():
            feature_array = np.fromfile(feature_file, np.float32, 128)
            if feature_array.shape[0] == 0:
                print('reach end of file, reset using seek(0,0)')
                feature_file.seek(0, 0)
                label_file.seek(0, 0)
                continue
            label_value = np.fromfile(label_file, np.float32, 128)
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})

coord = tf.train.Coordinator()
t = threading.Thread(target=load_and_enqueue, args=(sess, enqueue_op, coord))
t.start()

for i in range(TRAINING_ITERS):
    sum = sess.run(c)
    print('train_iter=' + str(i))
    print(sum)

coord.request_stop()
coord.join([t])
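One caveat worth adding (my observation, not part of the original example): if the producer thread is blocked inside sess.run(enqueue_op) because the queue is full when coord.request_stop() is called, coord.join([t]) can hang. A sketch of one way to unblock it under that assumption:

# At graph-construction time, create an op that closes the queue and
# cancels any enqueue that is blocked waiting for free space.
cancel_op = q.close(cancel_pending_enqueues=True)

# At shutdown, cancel pending enqueues before joining the thread. The
# blocked enqueue raises tf.errors.CancelledError inside the producer
# thread, which should catch it and return.
coord.request_stop()
sess.run(cancel_op)
coord.join([t])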
This is a common use case, and most implementations use TensorFlow's queues to decouple the preprocessing code from the training code. There is a tutorial on how to use queues, but the main steps are as follows:
Define a queue, q, that will buffer the preprocessed data. TensorFlow supports the simple tf.FIFOQueue that produces elements in the order they were enqueued, and the more advanced tf.RandomShuffleQueue that produces elements in a random order. A queue element is a tuple of one or more tensors (which can have different types and shapes). All queues support single-element (enqueue, dequeue) and batch (enqueue_many, dequeue_many) operations, but to use the batch operations you must specify the shapes of each tensor in a queue element when constructing the queue (see the sketch after this list).
Build a subgraph that enqueues preprocessed elements into the queue. One way to do this would be to define some tf.placeholder() ops for tensors corresponding to a single input example, then pass them to q.enqueue(). (If your preprocessing produces a batch at once, you should use q.enqueue_many() instead.) You might also include TensorFlow ops in this subgraph.
Build a subgraph that performs training. This will look like a regular TensorFlow graph, but will get its input by calling q.dequeue_many(BATCH_SIZE).
Start your session.
Create one or more threads that execute your preprocessing logic, then execute the enqueue op, feeding in the preprocessed data. You may find the tf.train.Coordinator and tf.train.QueueRunner utility classes useful for this.
Run your training graph (optimizer, etc.) as normal.
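As a sketch of the queue construction described in the first step (the capacities, dtypes and shapes here are arbitrary illustrations):

import tensorflow as tf

# FIFO queue: elements are dequeued in the order they were enqueued.
# Specifying per-tensor shapes is what enables enqueue_many/dequeue_many.
fifo_q = tf.FIFOQueue(capacity=100,
                      dtypes=[tf.float32, tf.int32],
                      shapes=[[100], []])

# Shuffle queue: elements are dequeued in random order; it keeps at
# least min_after_dequeue elements buffered to sample from.
shuffle_q = tf.RandomShuffleQueue(capacity=100,
                                  min_after_dequeue=10,
                                  dtypes=[tf.float32, tf.int32],
                                  shapes=[[100], []])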
EDIT: Here's a simple load_and_enqueue() function and code fragment to get you started:
# Features are length-100 vectors of floats
feature_input = tf.placeholder(tf.float32, shape=[100])
# Labels are scalar integers.
label_input = tf.placeholder(tf.int32, shape=[])
# Alternatively, could do:
# feature_batch_input = tf.placeholder(tf.float32, shape=[None, 100])
# label_batch_input = tf.placeholder(tf.int32, shape=[None])
q = tf.FIFOQueue(100, [tf.float32, tf.int32], shapes=[[100], []])
enqueue_op = q.enqueue([feature_input, label_input])
# For batch input, do:
# enqueue_op = q.enqueue_many([feature_batch_input, label_batch_input])
feature_batch, label_batch = q.dequeue_many(BATCH_SIZE)
# Build rest of model taking label_batch, feature_batch as input.
# [...]
train_op = ...
sess = tf.Session()
def load_and_enqueue():
    with open(...) as feature_file, open(...) as label_file:
        while True:
            feature_array = numpy.fromfile(feature_file, numpy.float32, 100)
            if feature_array.size == 0:
                return
            label_value = numpy.fromfile(label_file, numpy.int32, 1)[0]
            sess.run(enqueue_op, feed_dict={feature_input: feature_array,
                                            label_input: label_value})

# Start a thread to enqueue data asynchronously, and hide I/O latency.
t = threading.Thread(target=load_and_enqueue)
t.start()

for _ in range(TRAINING_EPOCHS):
    sess.run(train_op)
In other words, one thread does data preprocessing and the other does training. Is this possible in TensorFlow?
Yes, it is. mrry's solution works, but a simpler one exists.
Fetching data
tf.py_func wraps a Python function and uses it as a TensorFlow operator, so we can load the data at each sess.run(). The problem with this approach is that the data is loaded during sess.run() by the main thread.
A minimal example:
import numpy as np
import tensorflow as tf

def get_numpy_tensor():
    return np.array([[1,2],[3,4]], dtype=np.float32)

tensorflow_tensor = tf.py_func(get_numpy_tensor, [], tf.float32)
A more complex example:
def get_numpy_tensors():
    # Load data from the disk into numpy arrays.
    input = np.array([[1,2],[3,4]], dtype=np.float32)
    target = np.int32(1)
    return input, target

tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])
tensorflow_input, tensorflow_target = 2*tensorflow_input, 2*tensorflow_target

sess = tf.InteractiveSession()
numpy_input, numpy_target = sess.run([tensorflow_input, tensorflow_target])
assert np.all(numpy_input==np.array([[2,4],[6,8]])) and numpy_target==2
Prefetching data in another thread
To queue our data in another thread (so that sess.run() won't have to wait for the data), we can use tf.train.batch() on our operators from tf.py_func().
A minimal example:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32, shapes=[tensor_shape])
# Run `tf.train.start_queue_runners()` once session is created.
We can omit the argument shapes if tensorflow_tensor has its shape specified:
tensor_shape = get_numpy_tensor().shape
tensorflow_tensor.set_shape(tensor_shape)
tensorflow_tensors = tf.train.batch([tensorflow_tensor], batch_size=32)
# Run `tf.train.start_queue_runners()` once session is created.
A more complex example:
input_shape, target_shape = (2, 2), ()
def get_numpy_tensors():
    input = np.random.rand(*input_shape).astype(np.float32)
    target = np.random.randint(10, dtype=np.int32)
    print('f', end='')
    return input, target

tensorflow_input, tensorflow_target = tf.py_func(get_numpy_tensors, [], [tf.float32, tf.int32])

batch_size = 2
tensorflow_inputs, tensorflow_targets = tf.train.batch([tensorflow_input, tensorflow_target], batch_size, shapes=[input_shape, target_shape], capacity=2)
# Internal queue will contain at most `capacity=2` times `batch_size=2` elements `[tensorflow_input, tensorflow_target]`.
tensorflow_inputs, tensorflow_targets = 2*tensorflow_inputs, 2*tensorflow_targets

sess = tf.InteractiveSession()
tf.train.start_queue_runners()  # Internally, `tf.train.batch` uses a QueueRunner, so we need to ask TF to start it.
for _ in range(10):
    numpy_inputs, numpy_targets = sess.run([tensorflow_inputs, tensorflow_targets])
    assert numpy_inputs.shape==(batch_size, *input_shape) and numpy_targets.shape==(batch_size, *target_shape)
    print('r', end='')
# Prints `fffffrrffrfrffrffrffrffrffrffrf`.
If get_numpy_tensor() returns a batch of tensors, tf.train.batch(..., enqueue_many=True) will help.
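A minimal sketch of that variant, assuming a hypothetical loader get_numpy_batch() that returns 8 examples at a time (shapes and sizes are arbitrary):

def get_numpy_batch():
    # Load a whole batch of 8 examples at once.
    return np.random.rand(8, 2, 2).astype(np.float32)

tensorflow_batch = tf.py_func(get_numpy_batch, [], tf.float32)
# With enqueue_many=True, the first dimension is unpacked and each of the
# 8 rows is enqueued as a separate element; tf.train.batch then regroups
# them into batches of `batch_size`. `shapes` gives the per-element shape.
tensorflow_tensors = tf.train.batch([tensorflow_batch], batch_size=32,
                                    shapes=[(2, 2)], enqueue_many=True)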