Batch training in Tensorflow Slim


I am looking at the TF-Slim introductory document and, from what I understand, it only takes in one batch of image data (32 images) on each run. Obviously, one wants to loop through this and train on many different batches, but the intro does not cover this. How can this be done properly? I imagine there should be some way to specify a load-batch function that is called automatically when a batch training step starts, but I can't find a simple example of this in the intro.
# Note that this may take several minutes.
import os

import tensorflow as tf

from datasets import flowers
from nets import inception
from preprocessing import inception_preprocessing

slim = tf.contrib.slim
image_size = inception.inception_v1.default_image_size

# load_batch, flowers_data_dir and checkpoints_dir are not defined in this
# snippet; they are assumed to be defined earlier.

def get_init_fn():
    """Returns a function run by the chief worker to warm-start the training."""
    checkpoint_exclude_scopes = ["InceptionV1/Logits", "InceptionV1/AuxLogits"]
    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]
    variables_to_restore = []
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return slim.assign_from_checkpoint_fn(
        os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
        variables_to_restore)

train_dir = '/tmp/inception_finetuned/'

with tf.Graph().as_default():
    tf.logging.set_verbosity(tf.logging.INFO)
    dataset = flowers.get_split('train', flowers_data_dir)
    images, _, labels = load_batch(dataset, height=image_size, width=image_size)
    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception.inception_v1_arg_scope()):
        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)
    # Specify the loss function:
    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
    slim.losses.softmax_cross_entropy(logits, one_hot_labels)
    total_loss = slim.losses.get_total_loss()
    # Create some summaries to visualize the training process:
    tf.scalar_summary('losses/Total Loss', total_loss)
    # Specify the optimizer and create the train op:
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
    train_op = slim.learning.create_train_op(total_loss, optimizer)
    # Run the training:
    final_loss = slim.learning.train(
        train_op,
        logdir=train_dir,
        init_fn=get_init_fn(),
        number_of_steps=2)

print('Finished training. Last batch loss %f' % final_loss)

The slim.learning.train function contains a training loop, so the code you've given does in fact train on multiple batches of images.
See here in the source code, where train_step_fn is called within a while loop. train_step (the default value of train_step_fn) contains the line sess.run([train_op, global_step]...), which actually runs the training operation on a single batch of images.
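To make the shape of that loop concrete, here is a minimal sketch in plain Python. It is not the slim source: train, train_step and batches below are simplified stand-ins written for illustration, and the "loss" is a dummy value so the example runs without TensorFlow. The only point is that one call to train consumes number_of_steps batches, one per loop iteration.

```python
def train_step(batch_source, step):
    """Stand-in for slim's train_step: consume one batch and return its loss."""
    batch = next(batch_source)       # in slim, sess.run(train_op) pulls the next batch
    return sum(batch) / len(batch)   # dummy "loss" so the sketch needs no TensorFlow


def train(batch_source, number_of_steps, train_step_fn=train_step):
    """Call train_step_fn in a loop until number_of_steps batches were processed."""
    step = 0
    loss = None
    while step < number_of_steps:
        loss = train_step_fn(batch_source, step)
        step += 1
    return loss, step


def batches(data, batch_size):
    """Yield consecutive batches forever, wrapping around the data."""
    i = 0
    while True:
        yield [data[(i + k) % len(data)] for k in range(batch_size)]
        i = (i + batch_size) % len(data)


final_loss, steps_run = train(batches(list(range(10)), batch_size=4), number_of_steps=3)
print(steps_run, final_loss)
```

In the question's code, number_of_steps=2 therefore means two batches are processed; raising it is how you train on more batches.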

Related

How to generate predictions from new data using trained tensorflow network?

I want to train Googles VGGish network (Hershey et al 2017) from scratch to predict classes specific to my own audio files.
For this I am using the vggish_train_demo.py script available on their GitHub repo, which uses TensorFlow. I've been able to modify the script to extract melspec features from my own audio by changing the _get_examples_batch() function, and then train the model on the output of this function. This runs to completion and prints the loss at each epoch.
However, I've been unable to figure out how to get this trained model to generate predictions from new data. Can this be done with changes to the vggish_train_demo.py script?

For anyone who stumbles across this in the future, I wrote this script which does the job. You must save logmel specs for train and test data in the arrays: X_train, y_train, X_test, y_test. The X_train/test are arrays of the (n, 96,64) features and the y_train/test are arrays of shape (n, _NUM_CLASSES) for two classes, where n = the number of 0.96s audio segments and _NUM_CLASSES = the number of classes used.
See the function definition statement for more info and the vggish github in my original post:
### Run the network and save the predictions and accuracy at each epoch
### Train NN, output results
r"""This uses the VGGish model definition within a larger model which adds two
layers on top, and then trains this larger model.
We input log-mel spectrograms (X_train) calculated above with associated labels
(y_train), and feed the batches into the model. Once the model is trained, it
is then executed on the test log-mel spectrograms (X_test), and the accuracy is
output, alongside a .csv file with the predictions for each 0.96s chunk and their
true class."""
def main(X):
    with tf.Graph().as_default(), tf.Session() as sess:
        # Define VGGish.
        embeddings = vggish_slim.define_vggish_slim(training=FLAGS.train_vggish)
        # Define a shallow classification model and associated training ops on top
        # of VGGish.
        with tf.variable_scope('mymodel'):
            # Add a fully connected layer with 100 units. Add an activation function
            # to the embeddings since they are pre-activation.
            num_units = 100
            fc = slim.fully_connected(tf.nn.relu(embeddings), num_units)
            # Add a classifier layer at the end, consisting of parallel logistic
            # classifiers, one per class. This allows for multi-class tasks.
            logits = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='logits')
            tf.sigmoid(logits, name='prediction')
            linear_out = slim.fully_connected(
                fc, _NUM_CLASSES, activation_fn=None, scope='linear_out')
            # NOTE: this rebinds `logits` to sigmoid outputs. The loss below,
            # tf.nn.sigmoid_cross_entropy_with_logits, applies a sigmoid itself,
            # so passing `linear_out` there would avoid applying it twice.
            logits = tf.sigmoid(linear_out, name='logits')
            # Add training ops.
            with tf.variable_scope('train'):
                global_step = tf.train.create_global_step()
                # Labels are assumed to be fed as a batch multi-hot vectors, with
                # a 1 in the position of each positive class label, and 0 elsewhere.
                labels_input = tf.placeholder(
                    tf.float32, shape=(None, _NUM_CLASSES), name='labels')
                # Cross-entropy label loss.
                xent = tf.nn.sigmoid_cross_entropy_with_logits(
                    logits=logits, labels=labels_input, name='xent')
                loss = tf.reduce_mean(xent, name='loss_op')
                tf.summary.scalar('loss', loss)
                # We use the same optimizer and hyperparameters as used to train VGGish.
                optimizer = tf.train.AdamOptimizer(
                    learning_rate=vggish_params.LEARNING_RATE,
                    epsilon=vggish_params.ADAM_EPSILON)
                train_op = optimizer.minimize(loss, global_step=global_step)
        # Initialize all variables in the model, and then load the pre-trained
        # VGGish checkpoint.
        sess.run(tf.global_variables_initializer())
        vggish_slim.load_vggish_slim_checkpoint(sess, FLAGS.checkpoint)
        # The training loop.
        features_input = sess.graph.get_tensor_by_name(
            vggish_params.INPUT_TENSOR_NAME)
        accuracy_scores = []
        for epoch in range(num_epochs):  # FLAGS.num_batches):
            epoch_loss = 0
            i = 0
            while i < len(X_train):
                start = i
                end = i + batch_size
                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train[start:end])
                _, c = sess.run([train_op, loss], feed_dict={features_input: batch_x, labels_input: batch_y})
                epoch_loss += c
                i += batch_size
            # Print no. of epochs and loss
            print('Epoch', epoch+1, 'completed out of', num_epochs, ', loss:', epoch_loss)  # FLAGS.num_batches,', loss:',epoch_loss)
            # If these lines are left here, it will evaluate on the test data every iteration and print accuracy
            # note this adds a small computational cost
            correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels_input, 1))  # Index of the max value of each row, which we want to be the same (the prediction is a value given to each class, with the highest value being the best match)
            accuracy = tf.reduce_mean(tf.cast(correct, 'float'))  # changes correct to type: float
            accuracy1 = accuracy.eval({features_input: X_test, labels_input: y_test})
            accuracy_scores.append(accuracy1)
            print('Accuracy:', accuracy1)  # TF is smart so just knows to feed it through the model without us seeming to tell it to.
            # Save predictions for test data
            predictions_sigm = logits.eval(feed_dict={features_input: X_test})  # not really _sigm, change back later
            # print(predictions_sigm)  # shows table of predictions, meaningless if saving at each epoch
            test_preds = pd.DataFrame(predictions_sigm, columns=col_names)  # converts predictions to df
            true_class = np.argmax(y_test, axis=1)  # This saves the true class
            test_preds['True class'] = true_class  # This adds true class to the df
            # Saves csv file of table of predictions for test data. NB. header will not save when using np.savetxt for some reason
            np.savetxt("/content/drive/MyDrive/..."+"Epoch_"+str(epoch+1)+"_Accuracy_"+str(accuracy1), test_preds.values, delimiter=",")

if __name__ == '__main__':
    tf.app.run()
#'An exception has occurred, use %tb to see the full traceback.' error will occur, fear not, this just means its finished (perhaps as its exited the tensorflow session?)
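The accuracy lines in the script compare the index of the highest score in each prediction row with the index of the 1 in each label row. As a rough illustration of that logic without TensorFlow, here is a plain-Python sketch; the helper names and the tiny example data are made up for this illustration.

```python
def argmax(row):
    """Index of the largest value in a row (first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def predicted_classes(scores):
    """One class index per row, like tf.argmax(scores, 1)."""
    return [argmax(row) for row in scores]


def accuracy(scores, one_hot_labels):
    """Fraction of rows where the predicted class matches the labelled class."""
    preds = predicted_classes(scores)
    truth = predicted_classes(one_hot_labels)
    return sum(p == t for p, t in zip(preds, truth)) / len(preds)


# Hypothetical scores for three 0.96 s segments and two classes.
scores = [[0.9, 0.2], [0.4, 0.7], [0.6, 0.3]]
labels = [[1, 0], [0, 1], [0, 1]]
print(predicted_classes(scores), accuracy(scores, labels))
```

For new, unlabelled audio only the predicted_classes step applies: feed the log-mel features and take the argmax of each output row.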

Training and testing CNN with pytorch. With and without model.eval()

I have two questions:
1. I am trying to train a convolutional neural network initialized with some pre-trained weights (the network contains batch normalization layers as well), taking reference from here. Before training, I want to calculate a validation error using loss_fn = torch.nn.MSELoss().cuda().
In the reference, the author calls model.eval() before calculating the validation error. With that, the output of the CNN model is off from what it should be; however, when I comment out model.eval(), the output is good (what it should be with the pre-trained weights). What could be the reason behind this? I have read in many posts that model.eval() should be used before testing the model and model.train() before training it.
2. While calculating the validation error with pre-trained weights and the loss function mentioned above, what should the batch size be? Shouldn't it be 1, since I want an output for each of my inputs, calculate the error against the ground truth, and in the end take the average of all results? If I use a higher batch size, the error increases. So the question is: can I use a higher batch size, and if yes, what is the right way to do it? In the given code I have used err = float(loss_local) / num_samples, but I also observed it without averaging, i.e. err = float(loss_local). The error is different for different batch sizes. I am doing this without model.eval() right now.
batch_size = 1
data_path = 'path_to_data'
dtype = torch.FloatTensor
weight_file = 'path_to_weight_file'
val_loader = torch.utils.data.DataLoader(NyuDepthLoader(data_path, val_lists), batch_size=batch_size, shuffle=True, drop_last=True)
model = Model(batch_size)
model.load_state_dict(load_weights(model, weight_file, dtype))
loss_fn = torch.nn.MSELoss().cuda()
# model.eval()
# Accumulators (their initialization is not shown in the original snippet)
loss_local = 0.0
num_samples = 0
with torch.no_grad():
    for input, depth in val_loader:
        input_var = Variable(input.type(dtype))
        depth_var = Variable(depth.type(dtype))
        output = model(input_var)
        input_rgb_image = input_var[0].data.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
        input_gt_depth_image = depth_var[0][0].data.cpu().numpy().astype(np.float32)
        pred_depth_image = output[0].data.squeeze().cpu().numpy().astype(np.float32)
        print(format(type(depth_var)))
        pred_depth_image_resize = cv2.resize(pred_depth_image, dsize=(608, 456), interpolation=cv2.INTER_LINEAR)
        target_depth_transform = transforms.Compose([flow_transforms.ArrayToTensor()])
        pred_depth_image_tensor = target_depth_transform(pred_depth_image_resize)
        # both inputs to loss_fn are 'torch.Tensor'
        loss_local += loss_fn(pred_depth_image_tensor, depth_var)
        num_samples += 1
        print('num_samples {}'.format(num_samples))
err = float(loss_local) / num_samples
print('val_error before train:', err)

What could be reason behind it as I have read on many posts that model.eval should be used before testing the model and model.train() before training it.
Note: testing the model is called inference.
As explained in the official documentation:
Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
So this code must be present once you load the model from a file and do inference.
# Model class must be defined somewhere
model = torch.load(PATH)
model.eval()
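This is because dropout acts as a regularizer that prevents overfitting during training; it is not needed for inference. Batch normalization also behaves differently in the two modes: during training it normalizes with the statistics of the current batch, while in evaluation mode it uses the running estimates accumulated during training.
Calling eval() just sets the module's training flag to False, which affects only certain module types, in particular Dropout and BatchNorm.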
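To illustrate why the outputs can differ with and without eval(), here is a toy batch-norm layer in plain Python. It is a simplified sketch, not PyTorch's implementation: TinyBatchNorm is a made-up class with no learnable scale or shift, and it uses the same variance formula for normalizing and for the running estimate. It only shows the mode switch: batch statistics in training mode, stored running statistics in evaluation mode.

```python
import math


class TinyBatchNorm:
    """Toy 1-D batch norm for illustration only (hypothetical, not torch.nn)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean = 0.0
        self.running_var = 1.0
        self.momentum = momentum
        self.eps = eps
        self.training = True

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, batch):
        if self.training:
            # Training mode: normalize with the statistics of this batch
            # and update the running estimates.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Evaluation mode: use the stored running estimates only.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


bn = TinyBatchNorm()
train_out = bn([10.0, 12.0])          # normalized with this batch's mean and variance
bn.eval()
eval_pair = bn([10.0, 12.0])          # normalized with the running estimates
eval_single = bn([10.0])              # same sample alone gives the same result
print(train_out, eval_pair, eval_single)
```

In evaluation mode the result for a sample no longer depends on what else is in the batch. If the stored running statistics do not match the data (for example because they were not loaded with the pre-trained weights), eval() output can look wrong while train-mode output looks fine; that is one possible cause worth checking, not a diagnosis of the code above.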

Issue with fine-tuning inceptionv3 in slim tensorflow and tf record batches

I am trying to fine-tune inceptionv3 model using slim tensorflow library.
I am unable to understand certain things while writing the code for it. I tried to read source code (no proper documentation) and figured out few things and I am able to fine-tune it and save the check point. Here are the steps I followed
1. I created a tf.record for my training data which is fine, now I am reading the data using the below code.
import tensorflow as tf
import tensorflow.contrib.slim.nets as nets
import tensorflow.contrib.slim as slim
import matplotlib.pyplot as plt
import numpy as np
# get the data and labels here
data_path = '/home/sfarkya/nvidia_challenge/datasets/detrac/train1.tfrecords'
# Training setting
num_epochs = 100
initial_learning_rate = 0.0002
learning_rate_decay_factor = 0.7
num_epochs_before_decay = 5
num_classes = 5980
# load the checkpoint
model_path = '/home/sfarkya/nvidia_challenge/datasets/detrac/inception_v3.ckpt'
# log directory
log_dir = '/home/sfarkya/nvidia_challenge/datasets/detrac/fine_tuned_model'
with tf.Session() as sess:
    feature = {'train/image': tf.FixedLenFeature([], tf.string),
               'train/label': tf.FixedLenFeature([], tf.int64)}
    # Create a list of filenames and pass it to a queue
    filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
    # Define a reader and read the next record
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # Decode the record read by the reader
    features = tf.parse_single_example(serialized_example, features=feature)
    # Convert the image data from string back to the numbers
    image = tf.decode_raw(features['train/image'], tf.float32)
    # Cast label data into int32
    label = tf.cast(features['train/label'], tf.int32)
    # Reshape image data into the original shape
    image = tf.reshape(image, [128, 128, 3])
    # Creates batches by randomly shuffling tensors
    images, labels = tf.train.shuffle_batch([image, label], batch_size=64, capacity=128, num_threads=2,
                                            min_after_dequeue=64)
Now I am finetuning the model using slim and this is the code.
    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
    sess.run(init_op)
    # Create a coordinator and run all QueueRunner objects
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    # load model
    # load the inception model from the slim library - we are using inception v3
    #inputL = tf.placeholder(tf.float32, (64, 128, 128, 3))
    img, lbl = sess.run([images, labels])
    one_hot_labels = slim.one_hot_encoding(lbl, num_classes)
    with slim.arg_scope(slim.nets.inception.inception_v3_arg_scope()):
        logits, inceptionv3 = nets.inception.inception_v3(inputs=img, num_classes=5980, is_training=True,
                                                          dropout_keep_prob=.6)
    # Restore convolutional layers:
    variables_to_restore = slim.get_variables_to_restore(exclude=['InceptionV3/Logits', 'InceptionV3/AuxLogits'])
    init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)
    # loss function
    loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits=logits)
    total_loss = tf.losses.get_total_loss()
    # train operation
    train_op = slim.learning.create_train_op(total_loss + loss, optimizer=tf.train.AdamOptimizer(learning_rate=1e-4))
    print('Im here')
    # Start training.
    slim.learning.train(train_op, log_dir, init_fn=init_fn, save_interval_secs=20, number_of_steps=10)
Now I have few questions about the code, which I am quite unable to figure out. Once, the code reaches slim.learning.train I don't see anything printing however, it's training, I can see in the log. Now,
1. How do I give the number of epochs to the code? Right now it's running step by step with each step has batch_size = 64.
2. How do I make sure that in the code tf.train.shuffle_batch I am not repeating my images and I am training over the whole dataset?
3. How can I print the loss values while it's training?

Here are answers to your questions.
You cannot give epochs directly to slim.learning.train. Instead, you give the number of batches as the argument. It is called number_of_steps. It is used to set an operation called should_stop_op on line 709. I assume you know how to convert number of epochs to batches.
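For completeness, the conversion is simple arithmetic: steps per epoch is the dataset size divided by the batch size (rounded up), multiplied by the number of epochs. A minimal sketch follows; steps_for_epochs is a hypothetical helper name and the dataset size of 10000 is an assumed example value, not a number from the question.

```python
import math


def steps_for_epochs(num_examples, batch_size, num_epochs):
    """Number of training steps (batches) that covers num_epochs passes over the data."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# Assumed example: 10000 training images, batch size 64, 100 epochs.
number_of_steps = steps_for_epochs(num_examples=10000, batch_size=64, num_epochs=100)
print(number_of_steps)
```

The result is what you would pass as number_of_steps to slim.learning.train.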
I don't think the shuffle_batch function will repeat images because internally it uses the RandomShuffleQueue. According to this answer, the RandomShuffleQueue enqueues elements using a background thread as:
While size(queue) < capacity:
    Add an element to the queue
It dequeues elements as:
While the number of elements dequeued < batch_size:
    Wait until the size(queue) >= min_after_dequeue + 1 elements.
    Select an element from the queue uniformly at random, remove it from the queue, and add it to the output batch.
So in my opinion, there is very little chance that the elements would be repeated, because in the dequeuing operation, the chosen element is removed from the queue. So it is sampling without replacement.
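The enqueue/dequeue behaviour described above can be simulated in a few lines of plain Python. This is only a single-threaded sketch of the idea, not TensorFlow's RandomShuffleQueue: shuffle_batches is a hypothetical function, and unlike tf.train.shuffle_batch with default arguments it also emits a smaller final batch so that nothing is dropped in the demonstration.

```python
import random


def shuffle_batches(examples, batch_size, capacity, min_after_dequeue, seed=0):
    """Yield shuffled batches drawn without replacement from a bounded buffer."""
    if capacity <= min_after_dequeue:
        raise ValueError("capacity must be larger than min_after_dequeue")
    rng = random.Random(seed)
    source = iter(examples)
    queue, batch, exhausted = [], [], False
    while True:
        # Enqueue side: top the buffer up to `capacity` while input remains.
        while not exhausted and len(queue) < capacity:
            try:
                queue.append(next(source))
            except StopIteration:
                exhausted = True
        if not queue:
            break
        # Dequeue side: after the refill the buffer is either full (so it holds
        # more than min_after_dequeue elements) or the input is exhausted.
        # Pick one element uniformly at random and remove it from the buffer.
        batch.append(queue.pop(rng.randrange(len(queue))))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # smaller final batch, kept here for the demonstration


all_batches = list(shuffle_batches(range(10), batch_size=4, capacity=6, min_after_dequeue=3))
flat = [x for b in all_batches for x in b]
print(all_batches)
```

Because each chosen element is popped from the buffer, every example appears exactly once per pass over the input, which is the "sampling without replacement" property described above.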
Will a new queue be created for every epoch?
The tensors being inputted to tf.train.shuffle_batch are image and label which ultimately come from the filename_queue. If that queue is producing TFRecord filenames indefinitely, then I don't think a new queue will be created by shuffle_batch. You can also create a toy code like this to understand how shuffle_batch works.
Coming to the next point, how to train over the whole dataset? In your code, the following line gets the list of TFRecord filenames.
filename_queue = tf.train.string_input_producer([data_path], num_epochs=1)
If filename_queue covers all TFRecords that you have, then you are surely training over the entire dataset. Now, how to shuffle the entire dataset is another question. As mentioned here by @mrry, there is no support (yet, AFAIK) to shuffle out-of-memory datasets. So the best way is to prepare many shards of your dataset such that each shard contains about 1024 examples. Shuffle the list of TFRecord filenames as:
filename_queue = tf.train.string_input_producer([data_path], shuffle=True, capacity=1000)
Note that I removed the num_epochs=1 argument and set shuffle=True. This way it will produce the shuffled list of TFRecord filenames indefinitely. If you then use tf.train.shuffle_batch on the examples read from each file, you will get near-uniform shuffling. Basically, as the number of examples in each shard tends to 1, your shuffling gets more and more uniform. I prefer not to set num_epochs and instead terminate the training using the number_of_steps argument mentioned earlier.
To print the loss values, you could probably just edit the training.py and introduce logging.info('total loss = %f', total_loss). I don't know if there is any simpler way. Another way without changing the code is to view summaries in Tensorboard.
There are very helpful articles on how to view summaries in Tensorboard, including the link at the end of this answer. Generally, you need to do the following things.
1. Create summary object.
2. Write variables of interest into summary.
3. Merge all individual summaries.
4. Create a summary op.
5. Create a summary file writer.
6. Write the summaries throughout the training at a desired frequency.
Now steps 5 and 6 are already done automatically for you if you use slim.learning.train.
For first 4 steps, you could check the file train_image_classifier.py. Line 472 shows you how to create a summaries object. Lines 490, 512 and 536 write the relevant variables into summaries. Line 549 merges all summaries and the line 553 creates an op. You can pass this op to slim.learning.train and you can also specify how frequently you want to write summaries. In my opinion, do not write anything apart from loss, total_loss, accuracy and learning rate into the summaries, unless you want to do specific debugging. If you write histograms, then the tensorboard file could take tens of hours to load for networks like ResNet-50 (my tensorboard file once was 28 GB, which took 12 hours to load the progress of 6 days!). By the way, you could actually use train_image_classifier.py file to finetune and you will skip most of the steps above. However, I prefer this as you get to learn a lot of things.
See the launching tensorboard section on how to view the progress in a browser.
Additional remarks:
Instead of minimizing total_loss + loss, you could do the following:
loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits = logits)
tf.losses.add_loss(loss)
total_loss = tf.losses.get_total_loss()
train_op = slim.learning.create_train_op(total_loss, optimizer=tf.train.AdamOptimizer(learning_rate=1e-4))
I found this post to be very useful when I was learning Tensorflow.

Tensorflow - Using batching to make predictions

I'm attempting to make predictions using a trained convolutional neural network, slightly modified from the example in the example expert tensorflow tutorial. I have followed the instructions at https://www.tensorflow.org/versions/master/how_tos/reading_data/index.html to read data from a CSV file.
I have trained the model and evaluated its accuracy. I then saved the model and loaded it into a new python script for making predictions. Can I still use the batching method detailed in the link above or should I use feed_dict instead? Most tutorials I've seen online use the latter.
My code is shown below, I have essentially duplicated the code for reading from my training data, which was stored as lines within a single .csv file. Conv_nn is simply a class that contains the convolutional neural network detailed in the expert MNIST tutorial. Most of the content is probably not very useful except for the part where I run the graph.
I suspect I have badly mixed up training and prediction - I'm not sure if the test images are being fed to the prediction operation correctly or if it is valid to use the same batch operations for both datasets.
filename_queue = tf.train.string_input_producer(["data/test.csv"],num_epochs=None)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# Defaults force key value and label to int, all others to float.
record_defaults = [[1]]+[[46]]+[[1.0] for i in range(436)]
# Reads in a single row from the CSV and outputs a list of scalars.
csv_list = tf.decode_csv(value, record_defaults=record_defaults)
# Packs the different columns into separate feature tensors.
location = tf.pack(csv_list[2:4])
bbox = tf.pack(csv_list[5:8])
pix_feats = tf.pack(csv_list[9:])
onehot = tf.one_hot(csv_list[1], depth=98)
keep_prob = 0.5
# Creates batches of images and labels.
image_batch, label_batch = tf.train.shuffle_batch(
[pix_feats, onehot],
batch_size=50,num_threads=4,capacity=50000,min_after_dequeue=10000)
# Creates a graph of variables and operation nodes.
nn = Conv_nn(x=image_batch,keep_prob=keep_prob,pixels=33*13,outputs=98)
# Launch the default graph.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    saver.restore(sess, 'model1.ckpt')
    print("Model restored.")
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    prediction = tf.argmax(nn.y_conv, 1)
    pred = sess.run([prediction])
    coord.request_stop()
    coord.join(threads)

This question is old, but I am going to answer anyway, as it has been viewed nearly 1000 times.
If your model has an output tensor Y (one score per class) and an input placeholder X, then:
prediction=tf.argmax(Y,1)
result = prediction.eval(feed_dict={X: [data]}, session=sess)
This evaluates a single input, for example a single mnist image, but it can be a batch.
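If the test set is too large to feed at once, the same call can be made on one slice at a time. The sketch below shows the slicing in plain Python; iter_batches and predict_all are hypothetical helpers, and dummy_predict stands in for the prediction.eval(feed_dict={X: batch}, session=sess) call so the example runs without TensorFlow.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def predict_all(rows, predict_fn, batch_size):
    """Run predict_fn on each batch and concatenate the per-row predictions."""
    predictions = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(predict_fn(batch))
    return predictions


def dummy_predict(batch):
    """Stand-in for the TensorFlow call: argmax of each row of class scores."""
    return [max(range(len(row)), key=row.__getitem__) for row in batch]


rows = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]]
print(predict_all(rows, dummy_predict, batch_size=2))
```

Unlike shuffle_batch, this keeps the predictions in the same order as the input rows, which matters when you want to match each prediction to its test example.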

Tensorflow: restoring a graph and model then running evaluation on a single image

I think it would be immensely helpful to the Tensorflow community if there was a well-documented solution to the crucial task of testing a single new image against the model created by the convnet in the CIFAR-10 tutorial.
I may be wrong, but this critical step that makes the trained model usable in practice seems to be lacking. There is a "missing link" in that tutorial—a script that would directly load a single image (as array or binary), compare it against the trained model, and return a classification.
Prior answers give partial solutions that explain the overall approach, but none of which I've been able to implement successfully. Other bits and pieces can be found here and there, but unfortunately haven't added up to a working solution. Kindly consider the research I've done, before tagging this as duplicate or already answered.
Tensorflow: how to save/restore a model?
Restoring TensorFlow model
Unable to restore models in tensorflow v0.8
https://gist.github.com/nikitakit/6ef3b72be67b86cb7868
The most popular answer is the first, in which @RyanSepassi and @YaroslavBulatov describe the problem and an approach: one needs to "manually construct a graph with identical node names, and use Saver to load the weights into it". Although both answers are helpful, it is not apparent how one would go about plugging this into the CIFAR-10 project.
A fully functional solution would be highly desirable so we could port it to other single image classification problems. There are several questions on SO in this regard that ask for this, but still no full answer (for example Load checkpoint and evaluate single image with tensorflow DNN).
I hope we can converge on a working script that everyone could use.
The below script is not yet functional, and I'd be happy to hear from you on how this can be improved to provide a solution for single-image classification using the CIFAR-10 TF tutorial trained model.
Assume all variables, file names etc. are untouched from the original tutorial.
New file: cifar10_eval_single.py
import cv2
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('eval_dir', './input/eval',
                           """Directory where to write event logs.""")
tf.app.flags.DEFINE_string('checkpoint_dir', './input/train',
                           """Directory where to read model checkpoints.""")

def get_single_img():
    file_path = './input/data/single/test_image.tif'
    pixels = cv2.imread(file_path, 0)
    return pixels

def eval_single_img():
    # below code adapted from @RyanSepassi, however not functional
    # among other errors, saver throws an error that there are no
    # variables to save
    with tf.Graph().as_default():
        # Get image.
        image = get_single_img()
        # Build a Graph.
        # TODO
        # Create dummy variables.
        x = tf.placeholder(tf.float32)
        w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
        b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
        y_hat = tf.add(b, tf.matmul(x, w))
        saver = tf.train.Saver()
        with tf.Session() as sess:
            sess.run(tf.initialize_all_variables())
            ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
                print('Checkpoint found')
            else:
                print('No checkpoint found')
            # Run the model to get predictions
            predictions = sess.run(y_hat, feed_dict={x: image})
            print(predictions)

def main(argv=None):
    if tf.gfile.Exists(FLAGS.eval_dir):
        tf.gfile.DeleteRecursively(FLAGS.eval_dir)
    tf.gfile.MakeDirs(FLAGS.eval_dir)
    eval_single_img()

if __name__ == '__main__':
    tf.app.run()

There are two methods to feed a single new image to the cifar10 model. The first method is a cleaner approach but requires modification in the main file, hence will require retraining. The second method is applicable when a user does not want to modify the model files and instead wants to use the existing check-point/meta-graph files.
The code for the first approach is as follows:
import tensorflow as tf
import numpy as np
import cv2

sess = tf.Session('', tf.Graph())
with sess.graph.as_default():
    # Read meta graph and checkpoint to restore tf session
    saver = tf.train.import_meta_graph("/tmp/cifar10_train/model.ckpt-200.meta")
    saver.restore(sess, "/tmp/cifar10_train/model.ckpt-200")
    # Read a single image from a file.
    img = cv2.imread('tmp.png')
    img = np.expand_dims(img, axis=0)
    # Start the queue runners. If they are not started the program will hang
    # see e.g. https://www.tensorflow.org/programmers_guide/reading_data
    coord = tf.train.Coordinator()
    threads = []
    for qr in sess.graph.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))
    # In the graph created above, feed "is_training" and "imgs" placeholders.
    # Feeding them will disconnect the path from queue runners to the graph
    # and enable a path from the placeholder instead. The "imgs" placeholder will be
    # fed with the image that was read above.
    logits = sess.run('softmax_linear/softmax_linear:0',
                      feed_dict={'is_training:0': False, 'imgs:0': img})
    # Print classification results.
    print(logits)
The script requires that a user creates two placeholders and a conditional execution statement for it to work.
The placeholders and conditional execution statement are added in cifar10_train.py as shown below:
def train():
    """Train CIFAR-10 for a number of steps."""
    with tf.Graph().as_default():
        global_step = tf.contrib.framework.get_or_create_global_step()
        with tf.device('/cpu:0'):
            images, labels = cifar10.distorted_inputs()
        is_training = tf.placeholder(dtype=bool, shape=(), name='is_training')
        imgs = tf.placeholder(tf.float32, (1, 32, 32, 3), name='imgs')
        images = tf.cond(is_training, lambda: images, lambda: imgs)
        logits = cifar10.inference(images)
The inputs in cifar10 model are connected to queue runner object which is a multistage queue that can prefetch data from files in parallel. See a nice animation of queue runner here
While queue runners are efficient in prefetching large dataset for training, they are an overkill for inference/testing where only a single file is needed to be classified, also they are a bit more involved to modify/maintain.
For that reason, I have added a placeholder "is_training", which is set to True while training as shown below:
import numpy as np
tmp_img = np.ndarray(shape=(1, 32, 32, 3), dtype=float)
with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_dir,
        hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
               tf.train.NanTensorHook(loss),
               _LoggerHook()],
        config=tf.ConfigProto(
            log_device_placement=FLAGS.log_device_placement)) as mon_sess:
    while not mon_sess.should_stop():
        mon_sess.run(train_op, feed_dict={is_training: True, imgs: tmp_img})
Another placeholder "imgs" holds a tensor of shape (1,32,32,3) for the image that will be fed during inference -- the first dimension is the batch size which is one in this case. I have modified cifar model to accept 32x32 images instead of 24x24 as the original cifar10 images are 32x32.
Finally, the conditional statement feeds either the placeholder or the queue runner output to the graph. The "is_training" placeholder is set to False during inference and the "imgs" placeholder is fed a numpy array. The numpy array is expanded from a 3-dimensional to a 4-dimensional array to conform to the input tensor of the inference function in the model.
That is all there is to it. Any model can be inferred with a single/user defined test data like shown in the script above. Essentially read the graph, feed data to the graph nodes and run the graph to get the final output.
Now the second method. The other approach is to hack cifar10.py and cifar10_eval.py to change batch size to one and replace the data coming from the queue runner with the one read from a file.
Set batch size to 1:
tf.app.flags.DEFINE_integer('batch_size', 1,
                            """Number of images to process in a batch.""")
Call inference with an image file read.
def evaluate():
    with tf.Graph().as_default() as g:
        # Get images and labels for CIFAR-10.
        eval_data = FLAGS.eval_data == 'test'
        images, labels = cifar10.inputs(eval_data=eval_data)
        import cv2
        img = cv2.imread('tmp.png')
        img = np.expand_dims(img, axis=0)
        img = tf.cast(img, tf.float32)
        logits = cifar10.inference(img)
Then pass logits to eval_once and modify eval once to evaluate logits:
def eval_once(saver, summary_writer, top_k_op, logits, summary_op):
    ...
    while step < num_iter and not coord.should_stop():
        predictions = sess.run([top_k_op])
        print(sess.run(logits))
There is no separate script to run this method of inference, just run cifar10_eval.py which will now read a file from the user defined location with a batch size of one.

Here's how I ran a single image at a time. I'll admit it seems a bit hacky with the reuse of getting the scope.
This is a helper function
def restore_vars(saver, sess, chkpt_dir):
    """ Restore saved net, global score and step, and epsilons OR
    create checkpoint directory for later storage. """
    sess.run(tf.initialize_all_variables())
    checkpoint_dir = chkpt_dir
    if not os.path.exists(checkpoint_dir):
        try:
            os.makedirs(checkpoint_dir)
        except OSError:
            pass
    path = tf.train.get_checkpoint_state(checkpoint_dir)
    #print("path1 = ",path)
    #path = tf.train.latest_checkpoint(checkpoint_dir)
    print(checkpoint_dir, "path = ", path)
    if path is None:
        return False
    else:
        saver.restore(sess, path.model_checkpoint_path)
        return True
Here is the main part of the code that runs a single image at a time within the for loop.
to_restore = True
with tf.Session() as sess:
    for i in test_img_idx_set:
        # Gets the image
        images = get_image(i)
        images = np.asarray(images, dtype=np.float32)
        images = tf.convert_to_tensor(images/255.0)
        # resize image to whatever your model takes in
        images = tf.image.resize_images(images, 256, 256)
        images = tf.reshape(images, (1, 256, 256, 3))
        images = tf.cast(images, tf.float32)
        saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
        #print("infer")
        with tf.variable_scope(tf.get_variable_scope()) as scope:
            if to_restore:
                logits = inference(images)
            else:
                scope.reuse_variables()
                logits = inference(images)
        if to_restore:
            restored = restore_vars(saver, sess, FLAGS.train_dir)
            print("restored ", restored)
            to_restore = False
        logit_val = sess.run(logits)
        print(logit_val)
Here is an alternative implementation of the above using placeholders. It's a bit cleaner in my opinion, but I'll leave the above example for historical reasons.
imgs_place = tf.placeholder(tf.float32, shape=[my_img_shape_put_here])
images = tf.reshape(imgs_place, (1, 256, 256, 3))
#print("infer")
logits = inference(images)
# Create the saver after the model is built, so it has variables to save.
saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
with tf.Session() as sess:
    # Restore inside the session, since restore_vars needs sess.
    restored = restore_vars(saver, sess, FLAGS.train_dir)
    print("restored ", restored)
    for i in test_img_idx_set:
        # Feed the image for index i (get_image as in the loop above).
        logit_val = sess.run(logits, feed_dict={imgs_place: get_image(i)})
        print(logit_val)

got it working with this
softmax = gn.inference(image)
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
with tf.Session() as sess:
    saver.restore(sess, ckpt.model_checkpoint_path)
    softmaxval = sess.run(softmax)
    print(softmaxval)
output
[[ 6.73550041e-03 4.44930716e-04 9.92570221e-01 1.00681427e-06
3.05406687e-08 2.38927707e-04 1.89839399e-12 9.36238484e-06
1.51646684e-09 3.38977535e-09]]
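To turn that row of probabilities into a classification, take the index of the largest entry. A small sketch using the values printed above; top_class is a hypothetical helper, and mapping the index to a label name depends on how the classes were ordered when the model was trained.

```python
# Values copied from the output above (one row, ten class probabilities).
softmaxval = [[6.73550041e-03, 4.44930716e-04, 9.92570221e-01, 1.00681427e-06,
               3.05406687e-08, 2.38927707e-04, 1.89839399e-12, 9.36238484e-06,
               1.51646684e-09, 3.38977535e-09]]


def top_class(probabilities):
    """Return (index, probability) of the most likely class."""
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return index, probabilities[index]


class_index, confidence = top_class(softmaxval[0])
print(class_index, round(confidence, 4))
```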

I don't have working code for you I'm afraid, but here's how we often tackle this problem in production:
Save out the GraphDef to disk, using something like write_graph.
Use freeze_graph to load the GraphDef and checkpoints, and save out a GraphDef with the Variables converted into Constants.
Load the GraphDef in something like label_image or classify_image.
For your example this is overkill, but I would at least suggest serializing the graph in the original example as a GraphDef, and then loading it in your script (so you don't have to duplicate the code generating the graph). With the same graph created, you should be able to populate it from a SaverDef, and the freeze_graph script may help as an example.
