I have trained a deep learning model with TensorFlow. The model is saved as a saved_model.pb, and the variables and checkpoints from training are saved as well. The model's task is regression, and the output is a single scalar.
I use the saved model on new data to test the model's accuracy. Now I want to use the saved model to save the vector just before the last layer, which does the regression. I'm going to feed that vector to different regressors and keep the best result. I have also visualized the graph in TensorBoard, and I realized the vector I'm looking for is the output of an operation node in TensorFlow. I attach a screenshot of the relevant part of the graph.
How is it possible to use a saved model to do that? I mean, to save the output vector just before the regression layer. https://drive.google.com/file/d/1tn8kGO6KBYSaD-Yz5Xp5OYQmdAniY84d/view?usp=sharing
Furthermore, here is the code I use to get the predicted scalar:
my_predictor = predictor.from_saved_model(export_dir)

mae = []
for output in read_fn(file_references=file_names,
                      mode=tf.estimator.ModeKeys.EVAL,
                      params=READER_PARAMS):
    img = output['features']['x']
    lbl = output['labels']['y']
    test_id = output['img_id']

    num_crop_predictions = 4
    crop_batch = extract_random_example_array(
        image_list=img,
        example_size=[32, 32, 32],
        n_examples=num_crop_predictions)

    y_ = my_predictor.session.run(
        fetches=my_predictor._fetch_tensors['logits'],
        feed_dict={my_predictor.feed_tensors['x']: crop_batch})
    y_ = np.mean(y_)

    mae.append(np.abs(y_ - lbl))
I have changed the part
y_ = my_predictor.session.run(
    fetches=my_predictor._fetch_tensors['logits'],
    feed_dict={my_predictor.feed_tensors['x']: crop_batch})
to
y_ = my_predictor.session.run(
    fetches=my_predictor.graph,
    feed_dict={my_predictor.feed_tensors['x']: crop_batch})
in order to feed the 3D inputs and fetch the desired node of the graph. But the output of the operation node I'm looking for (pool/global_avg_pool) is not in the graph.
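For reference, this is the kind of access I'm after; a minimal sketch, assuming the node shown in TensorBoard is named pool/global_avg_pool, so that its first output tensor would be pool/global_avg_pool:0:

# Look the tensor up by name in the predictor's graph; the node name and
# the ':0' output index are assumptions based on the TensorBoard screenshot.
feature_tensor = my_predictor.graph.get_tensor_by_name('pool/global_avg_pool:0')
features = my_predictor.session.run(
    fetches=feature_tensor,
    feed_dict={my_predictor.feed_tensors['x']: crop_batch})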
This is how I did it for transfer learning with a DNN.
Save Trained Model
self._saver.save(self._session, self._get_save_path(name))
Restore Model
## restore the graph
imported_meta = tf.train.import_meta_graph("%s.meta"%self._get_save_path(name))
graph = tf.get_default_graph()
## get a reference to the output tensor of operation DNN/hidden5_out
## here ":0" indicates the first output of the operation
self._frozen_out = graph.get_tensor_by_name("DNN/hidden5_out:0")
## restore the variables in a session
self._session = tf.Session(graph=graph)
imported_meta.restore(self._session, self._get_save_path(name))
Calculate Frozen Output Once
I need to do this because the placeholders must be fed to compute the frozen output. It is not necessary if you never need to recompute the tensor's value from a feed_dict.
tensor = sess.run(self._frozen_out, feed_dict=feed_dict)
Reuse the Frozen Output
## here dnn is an operation which uses the output of "DNN/hidden5_out"
## I am passing its value directly in feed_dict, so TensorFlow will not
## go back down the graph to recompute it
sess.run(dnn, feed_dict={self._frozen_out: tensor})
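Putting the pieces together, a minimal end-to-end sketch of the pattern (the checkpoint path "model.ckpt", the node name DNN/hidden5_out, the placeholder name x, and my_batch are all placeholders for your own graph's names and data):

import tensorflow as tf

## restore the graph structure and weights from the checkpoint
imported_meta = tf.train.import_meta_graph("model.ckpt.meta")
graph = tf.get_default_graph()
sess = tf.Session(graph=graph)
imported_meta.restore(sess, "model.ckpt")

## look up the intermediate tensor and the input placeholder by name
frozen_out = graph.get_tensor_by_name("DNN/hidden5_out:0")
x = graph.get_tensor_by_name("x:0")

## compute the intermediate vector once; it can then be fed to other ops
tensor = sess.run(frozen_out, feed_dict={x: my_batch})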
Hope this helps, comment if it is not clear.
I am working on a TensorFlow estimator that uses an RNN (GRUCell).
I use zero_state to initialize the first state, it requires a fixed size.
My problem is that I want to be able to use the estimator to predict with a single sample (batchsize=1).
When I load the serialized estimator, it complains that the batch size I use for prediction does not match the training batch size.
If I reconstruct the estimator with a different batch size, I cannot load what has been serialized.
Is there an elegant way to use zero_state in an estimator?
I saw some solutions that store the batch size in a variable and feed it via feed_dict, but I can't find how to make that work in the context of an estimator.
Here is the core of my simple test RNN in the estimator:
cells = [tf.nn.rnn_cell.GRUCell(self.getNSize()) for _ in range(self.getNLayers())]
multicell = tf.nn.rnn_cell.MultiRNNCell(cells, state_is_tuple=False)
H_init = tf.Variable(multicell.zero_state(batchsize, dtype=tf.float32), trainable=False)
H = tf.Variable(H_init)
Yr, state = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=H)
Would someone have a clue on that?
EDIT:
OK, I have tried various things on this problem.
I am now trying to filter the variables I load from the checkpoint to remove 'H', which is used as the internal state of the recurrent cells. For prediction, I can leave it at all-zero values.
So far, I did that:
First I define a hook:
class RestoreHook(tf.train.SessionRunHook):
    def __init__(self, init_fn):
        self.init_fn = init_fn

    def after_create_session(self, session, coord=None):
        print("--------------->After create session.")
        self.init_fn(session)
Then in my model_fn:
if mode == tf.estimator.ModeKeys.PREDICT:
    logits = tf.nn.softmax(logits)

    # Do not restore H, as its batch size might be different.
    vlist = tf.contrib.framework.get_variables_to_restore()
    vlist = [x for x in vlist if x.name.split(':')[0] != 'architecture/H']
    init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
        tf.train.latest_checkpoint(self.modelDir), vlist, ignore_missing_vars=True)

    spec = tf.estimator.EstimatorSpec(
        mode=mode,
        predictions={
            'logits': logits,
        },
        export_outputs={
            'prediction': tf.estimator.export.PredictOutput(logits)
        },
        prediction_hooks=[RestoreHook(init_fn)])
I took this piece of code from https://github.com/tensorflow/tensorflow/issues/14713
But it does not work yet. It seems it is still trying to load H from the checkpoint... I checked that H is not in vlist.
I am still looking for a solution.
You can get the batch size from another tensor. For example:
decoder_initial_state = cell.zero_state(array_ops.shape(attention_states)[0],
                                        dtypes.float32).clone(cell_state=encoder_state)
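Applied to the GRU model above, a minimal sketch, assuming Xo is the batch-major input tensor (so tf.shape(Xo)[0] gives the batch size at run time):

# Derive the batch size from the input tensor instead of hard-coding it,
# so the same graph works for training batches and for a single sample.
batch_size = tf.shape(Xo)[0]
H = multicell.zero_state(batch_size, dtype=tf.float32)
Yr, state = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=H)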
I found a solution:
I create the variables for the initial state for both batchsize=64 and batchsize=1.
At training I use the first one to initialize the RNN.
At Predict time, I use the second one.
It works because both variables are serialized and restored by the estimator code, so it does not complain.
The drawback is that the query batch size (in my case, 1) must be known at training time (when both variables are created).
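A minimal sketch of that idea (the variable names and the two sizes, 64 for training and 1 for prediction, are from my setup; adapt them to yours):

# Two initial-state variables, one per batch size; both are saved in the
# checkpoint, so restoring works in either mode.
H_train = tf.Variable(multicell.zero_state(64, dtype=tf.float32),
                      trainable=False, name='H_train')
H_predict = tf.Variable(multicell.zero_state(1, dtype=tf.float32),
                        trainable=False, name='H_predict')

H = H_train if mode == tf.estimator.ModeKeys.TRAIN else H_predict
Yr, state = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=H)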
I've recently started to play with TensorFlow and, more specifically, with the new Dataset API.
I've successfully used a dataset to feed training data to my simple model by plugging dataset's iterators to the nodes of my graph representing input and label. Something like:
input = input_dataset.make_one_shot_iterator().get_next()
label = label_dataset.make_one_shot_iterator().get_next()
Now I'm wondering what to do when I have to do inference on a user input, that is, the user gives me one single input value and I have to make my prediction. If I had a placeholder I would just put the user input in a feed_dict, but with the dataset api I have very little idea how to do something similar. Shall I have a separate graph only for inference in which my input variable is a placeholder?
I've already tried to make a feedable iterator as described here, but that only works with a placeholder for strings, while my inputs are int32.
Thanks for any advice.
For that specific purpose, TensorFlow provides the tf.placeholder_with_default API: when nothing is fed, the placeholder evaluates to its default value (here, the iterator's next element), so the same graph serves both training and inference.
# Create a Dataset
dataset = tf.data.Dataset.zip((input_dataset, label_dataset)).batch(32).repeat(...)

# Create an Iterator
input, label = dataset.make_one_shot_iterator().get_next()

# Create Placeholders
x = tf.placeholder_with_default(input, shape=[...], name='input')
y = tf.placeholder_with_default(label, shape=[...], name='label')

def nn_model(features, labels):
    logits = ...
    loss = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
    return optimizer, loss

# Create the Model
train_op, loss_op = nn_model(x, y)

# Training
sess.run(train_op)

# Inference
sess.run(logits, feed_dict={x: ..., y: ...})
I'm attempting to make predictions using a trained convolutional neural network, slightly modified from the example in the expert TensorFlow tutorial. I have followed the instructions at https://www.tensorflow.org/versions/master/how_tos/reading_data/index.html to read data from a CSV file.
I have trained the model and evaluated its accuracy. I then saved the model and loaded it into a new python script for making predictions. Can I still use the batching method detailed in the link above or should I use feed_dict instead? Most tutorials I've seen online use the latter.
My code is shown below; I have essentially duplicated the code for reading from my training data, which was stored as lines within a single .csv file. Conv_nn is simply a class that contains the convolutional neural network detailed in the expert MNIST tutorial. Most of the content is probably not very useful except for the part where I run the graph.
I suspect I have badly mixed up training and prediction - I'm not sure if the test images are being fed to the prediction operation correctly or if it is valid to use the same batch operations for both datasets.
filename_queue = tf.train.string_input_producer(["data/test.csv"],num_epochs=None)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
# Defaults force key value and label to int, all others to float.
record_defaults = [[1]]+[[46]]+[[1.0] for i in range(436)]
# Reads in a single row from the CSV and outputs a list of scalars.
csv_list = tf.decode_csv(value, record_defaults=record_defaults)
# Packs the different columns into separate feature tensors.
location = tf.pack(csv_list[2:4])
bbox = tf.pack(csv_list[5:8])
pix_feats = tf.pack(csv_list[9:])
onehot = tf.one_hot(csv_list[1], depth=98)
keep_prob = 0.5
# Creates batches of images and labels.
image_batch, label_batch = tf.train.shuffle_batch(
    [pix_feats, onehot],
    batch_size=50, num_threads=4, capacity=50000, min_after_dequeue=10000)
# Creates a graph of variables and operation nodes.
nn = Conv_nn(x=image_batch,keep_prob=keep_prob,pixels=33*13,outputs=98)
# Launch the default graph.
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())

    saver = tf.train.Saver()  # needed before restore; missing from the original snippet
    saver.restore(sess, 'model1.ckpt')
    print("Model restored.")

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    prediction = tf.argmax(nn.y_conv, 1)
    pred = sess.run([prediction])

    coord.request_stop()
    coord.join(threads)
This question is old, but I am going to answer anyway, as it has been viewed nearly 1000 times.
So if your model has Y as the labels and X as the input, then:
prediction=tf.argmax(Y,1)
result = prediction.eval(feed_dict={X: [data]}, session=sess)
This evaluates a single input, for example a single MNIST image, but it also works on a batch.
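As a fuller sketch of the same idea when loading a saved model first (the checkpoint path and the tensor names 'X:0' and 'Y:0' are assumptions; substitute the names from your own graph):

import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph from the meta file and load the trained weights.
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')

    graph = tf.get_default_graph()
    X = graph.get_tensor_by_name('X:0')  # input placeholder (assumed name)
    Y = graph.get_tensor_by_name('Y:0')  # output logits (assumed name)

    prediction = tf.argmax(Y, 1)
    result = prediction.eval(feed_dict={X: [data]}, session=sess)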
I think it would be immensely helpful to the TensorFlow community if there were a well-documented solution to the crucial task of testing a single new image against the model created by the convnet in the CIFAR-10 tutorial.
I may be wrong, but this critical step that makes the trained model usable in practice seems to be lacking. There is a "missing link" in that tutorial: a script that would directly load a single image (as an array or binary), compare it against the trained model, and return a classification.
Prior answers give partial solutions that explain the overall approach, but I have not been able to implement any of them successfully. Other bits and pieces can be found here and there, but unfortunately they haven't added up to a working solution. Kindly consider the research I've done before tagging this as duplicate or already answered.
Tensorflow: how to save/restore a model?
Restoring TensorFlow model
Unable to restore models in tensorflow v0.8
https://gist.github.com/nikitakit/6ef3b72be67b86cb7868
The most popular answer is the first, in which @RyanSepassi and @YaroslavBulatov describe the problem and an approach: one needs to "manually construct a graph with identical node names, and use Saver to load the weights into it". Although both answers are helpful, it is not apparent how one would go about plugging this into the CIFAR-10 project.
A fully functional solution would be highly desirable so we could port it to other single image classification problems. There are several questions on SO in this regard that ask for this, but still no full answer (for example Load checkpoint and evaluate single image with tensorflow DNN).
I hope we can converge on a working script that everyone could use.
The below script is not yet functional, and I'd be happy to hear from you on how this can be improved to provide a solution for single-image classification using the CIFAR-10 TF tutorial trained model.
Assume all variables, file names etc. are untouched from the original tutorial.
New file: cifar10_eval_single.py
import cv2
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('eval_dir', './input/eval',
                           """Directory where to write event logs.""")
tf.app.flags.DEFINE_string('checkpoint_dir', './input/train',
                           """Directory where to read model checkpoints.""")

def get_single_img():
    file_path = './input/data/single/test_image.tif'
    pixels = cv2.imread(file_path, 0)
    return pixels

def eval_single_img():
    # below code adapted from @RyanSepassi, however not functional
    # among other errors, saver throws an error that there are no
    # variables to save
    with tf.Graph().as_default():
        # Get image.
        image = get_single_img()

        # Build a Graph.
        # TODO

        # Create dummy variables.
        x = tf.placeholder(tf.float32)
        w = tf.Variable(tf.zeros([1, 1], dtype=tf.float32))
        b = tf.Variable(tf.ones([1, 1], dtype=tf.float32))
        y_hat = tf.add(b, tf.matmul(x, w))

        saver = tf.train.Saver()

        with tf.Session() as sess:
            sess.run(tf.initialize_all_variables())
            ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)

            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)
                print('Checkpoint found')
            else:
                print('No checkpoint found')

            # Run the model to get predictions
            predictions = sess.run(y_hat, feed_dict={x: image})
            print(predictions)

def main(argv=None):
    if tf.gfile.Exists(FLAGS.eval_dir):
        tf.gfile.DeleteRecursively(FLAGS.eval_dir)
    tf.gfile.MakeDirs(FLAGS.eval_dir)
    eval_single_img()

if __name__ == '__main__':
    tf.app.run()
There are two methods to feed a single new image to the cifar10 model. The first method is a cleaner approach but requires modifying the main file, hence it requires retraining. The second method is applicable when a user does not want to modify the model files and instead wants to use the existing checkpoint/meta-graph files.
The code for the first approach is as follows:
import tensorflow as tf
import numpy as np
import cv2
sess = tf.Session('', tf.Graph())
with sess.graph.as_default():
    # Read the meta graph and checkpoint to restore the tf session
    saver = tf.train.import_meta_graph("/tmp/cifar10_train/model.ckpt-200.meta")
    saver.restore(sess, "/tmp/cifar10_train/model.ckpt-200")

    # Read a single image from a file.
    img = cv2.imread('tmp.png')
    img = np.expand_dims(img, axis=0)

    # Start the queue runners. If they are not started the program will hang;
    # see e.g. https://www.tensorflow.org/programmers_guide/reading_data
    coord = tf.train.Coordinator()
    threads = []
    for qr in sess.graph.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
        threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
                                         start=True))

    # In the graph created above, feed the "is_training" and "imgs" placeholders.
    # Feeding them disconnects the path from the queue runners to the graph
    # and enables a path from the placeholders instead. The "imgs" placeholder
    # is fed with the image that was read above.
    logits = sess.run('softmax_linear/softmax_linear:0',
                      feed_dict={'is_training:0': False, 'imgs:0': img})

    # Print classification results.
    print(logits)
For the script to work, the user must add two placeholders and a conditional execution statement to the model.
The placeholders and conditional execution statement are added in cifar10_train.py as shown below:
def train():
    """Train CIFAR-10 for a number of steps."""
    with tf.Graph().as_default():
        global_step = tf.contrib.framework.get_or_create_global_step()

        with tf.device('/cpu:0'):
            images, labels = cifar10.distorted_inputs()

        is_training = tf.placeholder(dtype=bool, shape=(), name='is_training')
        imgs = tf.placeholder(tf.float32, (1, 32, 32, 3), name='imgs')
        images = tf.cond(is_training, lambda: images, lambda: imgs)
        logits = cifar10.inference(images)
The inputs of the cifar10 model are connected to a queue runner object, a multistage queue that can prefetch data from files in parallel. See a nice animation of the queue runner here.
While queue runners are efficient at prefetching a large dataset for training, they are overkill for inference/testing, where only a single file needs to be classified; they are also more involved to modify and maintain.
For that reason, I have added a placeholder "is_training", which is set to True during training and False during inference, as shown below:
import numpy as np
tmp_img = np.ndarray(shape=(1, 32, 32, 3), dtype=float)

with tf.train.MonitoredTrainingSession(
        checkpoint_dir=FLAGS.train_dir,
        hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
               tf.train.NanTensorHook(loss),
               _LoggerHook()],
        config=tf.ConfigProto(
            log_device_placement=FLAGS.log_device_placement)) as mon_sess:
    while not mon_sess.should_stop():
        mon_sess.run(train_op, feed_dict={is_training: True, imgs: tmp_img})
Another placeholder, "imgs", holds a tensor of shape (1, 32, 32, 3) for the image that will be fed during inference; the first dimension is the batch size, which is one in this case. I have modified the cifar model to accept 32x32 images instead of 24x24, as the original cifar10 images are 32x32.
Finally, the conditional statement feeds either the placeholder or the queue runner output to the graph. The "is_training" placeholder is set to False during inference, and the "imgs" placeholder is fed a numpy array, which is reshaped from a 3-dimensional to a 4-dimensional array to conform to the input tensor of the inference function in the model.
That is all there is to it. Any model can be run on a single, user-defined piece of test data as shown in the script above: essentially, read the graph, feed data to the graph nodes, and run the graph to get the final output.
Now for the second method: hack cifar10.py and cifar10_eval.py to change the batch size to one and replace the data coming from the queue runner with data read from a file.
Set batch size to 1:
tf.app.flags.DEFINE_integer('batch_size', 1,
                            """Number of images to process in a batch.""")
Call inference with the image read from a file:
def evaluate():
    with tf.Graph().as_default() as g:
        # Get images and labels for CIFAR-10.
        eval_data = FLAGS.eval_data == 'test'
        images, labels = cifar10.inputs(eval_data=eval_data)

        import cv2
        img = cv2.imread('tmp.png')
        img = np.expand_dims(img, axis=0)
        img = tf.cast(img, tf.float32)

        logits = cifar10.inference(img)
Then pass logits to eval_once and modify eval_once to evaluate logits:
def eval_once(saver, summary_writer, top_k_op, logits, summary_op):
    ...
    while step < num_iter and not coord.should_stop():
        predictions = sess.run([top_k_op])
        print(sess.run(logits))
There is no separate script to run this method of inference; just run cifar10_eval.py, which will now read a file from the user-defined location with a batch size of one.
Here's how I ran a single image at a time. I'll admit it seems a bit hacky with the variable-scope reuse.
This is a helper function
def restore_vars(saver, sess, chkpt_dir):
    """ Restore saved net, global score and step, and epsilons OR
    create checkpoint directory for later storage. """
    sess.run(tf.initialize_all_variables())

    checkpoint_dir = chkpt_dir
    if not os.path.exists(checkpoint_dir):
        try:
            os.makedirs(checkpoint_dir)
        except OSError:
            pass

    path = tf.train.get_checkpoint_state(checkpoint_dir)
    #print("path1 = ", path)
    #path = tf.train.latest_checkpoint(checkpoint_dir)
    print(checkpoint_dir, "path = ", path)
    if path is None:
        return False
    else:
        saver.restore(sess, path.model_checkpoint_path)
        return True
Here is the main part of the code that runs a single image at a time within the for loop.
to_restore = True
with tf.Session() as sess:
    for i in test_img_idx_set:
        # Get the image
        images = get_image(i)
        images = np.asarray(images, dtype=np.float32)
        images = tf.convert_to_tensor(images / 255.0)
        # resize the image to whatever your model takes in
        images = tf.image.resize_images(images, 256, 256)
        images = tf.reshape(images, (1, 256, 256, 3))
        images = tf.cast(images, tf.float32)

        saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
        #print("infer")
        with tf.variable_scope(tf.get_variable_scope()) as scope:
            if to_restore:
                logits = inference(images)
            else:
                scope.reuse_variables()
                logits = inference(images)

        if to_restore:
            restored = restore_vars(saver, sess, FLAGS.train_dir)
            print("restored ", restored)
            to_restore = False

        logit_val = sess.run(logits)
        print(logit_val)
Here is an alternative implementation of the above using placeholders; it's a bit cleaner in my opinion, but I'll leave the example above for historical reasons.
imgs_place = tf.placeholder(tf.float32, shape=[my_img_shape_put_here])
images = tf.reshape(imgs_place, (1, 256, 256, 3))

saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=1)
#print("infer")
logits = inference(images)

with tf.Session() as sess:
    restored = restore_vars(saver, sess, FLAGS.train_dir)
    print("restored ", restored)
    for i in test_img_idx_set:
        logit_val = sess.run(logits, feed_dict={imgs_place: i})
        print(logit_val)
Got it working with this:
softmax = gn.inference(image)
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
with tf.Session() as sess:
    saver.restore(sess, ckpt.model_checkpoint_path)
    softmaxval = sess.run(softmax)
    print(softmaxval)
Output:
[[ 6.73550041e-03 4.44930716e-04 9.92570221e-01 1.00681427e-06
3.05406687e-08 2.38927707e-04 1.89839399e-12 9.36238484e-06
1.51646684e-09 3.38977535e-09]]
I don't have working code for you, I'm afraid, but here's how we often tackle this problem in production:
1. Save out the GraphDef to disk, using something like write_graph.
2. Use freeze_graph to load the GraphDef and checkpoints, and save out a GraphDef with the Variables converted into Constants.
3. Load the GraphDef in something like label_image or classify_image.
For your example this is overkill, but I would at least suggest serializing the graph in the original example as a GraphDef, and then loading it in your script (so you don't have to duplicate the code generating the graph). With the same graph created, you should be able to populate it from a SaverDef, and the freeze_graph script may help as an example.
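As a rough sketch of that pipeline (the paths, the checkpoint name, the output node name softmax_linear/softmax_linear, and the img input are assumptions, not values taken from your code):

import tensorflow as tf

# 1. Save the GraphDef to disk (run inside the training script, with the graph built).
tf.train.write_graph(sess.graph_def, '/tmp/model', 'graph.pbtxt')

# 2. Freeze it: fold the checkpointed Variables into Constants.
#    Run from a shell; the flag values here are assumptions.
# python -m tensorflow.python.tools.freeze_graph \
#     --input_graph=/tmp/model/graph.pbtxt \
#     --input_checkpoint=/tmp/model/model.ckpt \
#     --output_graph=/tmp/model/frozen.pb \
#     --output_node_names=softmax_linear/softmax_linear

# 3. Load the frozen GraphDef in the inference script and run it.
with tf.gfile.GFile('/tmp/model/frozen.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as g:
    tf.import_graph_def(graph_def, name='')
    logits = g.get_tensor_by_name('softmax_linear/softmax_linear:0')
    with tf.Session(graph=g) as sess:
        print(sess.run(logits, feed_dict={'imgs:0': img}))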