Understanding TensorFlow checkpoint loading (Python)

What's contained in a TF checkpoint? Estimators for example store a separate file that contains the GraphDef proto and you can basically do a tf.import_graph_def(), then create a tf.train.Saver() and restore a checkpoint into the graph. Now if you have another GraphDef describing a completely different graph that just happens to share the exact same variable names together with matching variable dimensions, will you be able to load the checkpoint into that graph? In other words, is it just a variable name to value mapping or does it assume something else about a graph that would be checked during loading? What if you try to load a checkpoint into a graph that is a subset of the original graph (i.e. tensor dimensions and names match, but some names are missing)?

When do people start reading the documentation? See:
https://www.tensorflow.org/mobile/prepare_models
These are different concepts. You can load just the weights as long as the shapes match. If there is a mismatch you just get:
Restoring from checkpoint failed. This is most likely due to a
mismatch between the current graph and the graph from the checkpoint.
Please ensure that you have not altered the graph expected based on
the checkpoint.
However, consider a non-trivial case where the graph is completely different:
import tensorflow as tf
import numpy as np

test_data = np.arange(4).reshape(1, 2, 2, 1)

# a simple graph and everything is fine
input = tf.placeholder(dtype=tf.float32, shape=[1, 2, 2, 1])
output = tf.layers.conv2d(input, 3, kernel_size=1, name='test', use_bias=False)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(output, {input: test_data}))
    saver = tf.train.Saver()
    save_path = saver.save(sess, "/tmp/model.ckpt")

print(tf.trainable_variables())

# reset previous elements
tf.reset_default_graph()

# a new graph
input = tf.placeholder(dtype=tf.float32, shape=[1, 2, 2, 1])
# and wait: this is completely different, but has the same name and shape
W = tf.get_variable('test/kernel', shape=[1, 1, 1, 3])
# but the graph has different operations
output = input + W

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver = tf.train.Saver()
    saver.restore(sess, "/tmp/model.ckpt")
    print(sess.run(output, {input: test_data}))
In my case I got:
# 1st version (original graph)
[[[[-0. -0. -0. ]
[-0.08429337 -1.0156475 -0.42691123]]
[[-0.16858673 -2.031295 -0.85382247]
[-0.2528801 -3.0469427 -1.2807337 ]]]]
# 2nd version (altered graph)
[[[[-0.08429337 -1.0156475 -0.42691123]
[ 0.91570663 -0.01564753 0.57308877]]
[[ 1.9157066 0.98435247 1.5730888 ]
[ 2.9157066 1.9843525 2.5730886 ]]]]
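To answer the "is it just a variable name to value mapping" part more directly, here is a small sketch (not from the original answer; it assumes a TF 1.x release that provides tf.train.list_variables and reuses the /tmp/model.ckpt written above) that lists what the checkpoint actually stores and restores only a subset of the variables:

import tensorflow as tf

# A checkpoint is essentially a mapping from variable names to saved tensors;
# list_variables shows the (name, shape) pairs it contains.
for name, shape in tf.train.list_variables("/tmp/model.ckpt"):
    print(name, shape)  # e.g. test/kernel [1, 1, 1, 3]

tf.reset_default_graph()

# Restoring into a graph that holds only a subset of the saved variables works
# if the Saver is built from that subset explicitly; a default Saver over a graph
# containing variables missing from the checkpoint fails with a NotFoundError instead.
W = tf.get_variable('test/kernel', shape=[1, 1, 1, 3])
extra = tf.get_variable('not_in_checkpoint', shape=[2])

with tf.Session() as sess:
    sess.run(extra.initializer)
    tf.train.Saver(var_list=[W]).restore(sess, "/tmp/model.ckpt")
    print(sess.run(W))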

Related

How to assign new values to a tensorflow constant?

I am loading a TensorFlow model from a .pb file. I want to change the weights of all the layers. I am able to extract the weights, but I am not able to change them.
I converted the graph_def model to a TensorFlow model, but even then I cannot assign a new value to the weights, as the weights are stored in a tensor of type "Const".
b = graph_tf.get_tensor_by_name("Variable_1:0")
tf.assign(b, np.ones((1,1,64,64)))
I am getting the following error:
AttributeError: 'Tensor' object has no attribute 'assign'
Please provide a way to resolve this issue. Thanks in advance.
Here is one way you can achieve something like that. You want to replace some constant operations with variables initialized to the value of those operations, so you can first extract those constant values, and then create the graph with the variables initialized to those. See the example below.
import tensorflow as tf

# Example graph
with tf.Graph().as_default():
    inp = tf.placeholder(tf.float32, [None, 3], name='Input')
    w = tf.constant([[1.], [2.], [3.]], tf.float32, name='W')
    out = tf.squeeze(inp @ w, 1, name='Output')
    gd = tf.get_default_graph().as_graph_def()

# Extract weight values
with tf.Graph().as_default():
    w, = tf.graph_util.import_graph_def(gd, return_elements=['W:0'])
    # Get the constant weight values
    with tf.Session() as sess:
        w_val = sess.run(w)
    # Alternatively, since it is a constant,
    # you can get the values from the operation attribute directly
    w_val = tf.make_ndarray(w.op.get_attr('value'))

# Make new graph
with tf.Graph().as_default():
    # Make variables initialized with stored values
    w = tf.Variable(w_val, name='W')
    init_op = tf.global_variables_initializer()
    # Import graph
    inp, out = tf.graph_util.import_graph_def(
        gd, input_map={'W:0': w},
        return_elements=['Input:0', 'Output:0'])
    # Change value operation
    w_upd = w[2].assign([5.])
    # Test
    with tf.Session() as sess:
        sess.run(init_op)
        print(sess.run(w))
        # [[1.]
        #  [2.]
        #  [3.]]
        sess.run(w_upd)
        print(sess.run(w))
        # [[1.]
        #  [2.]
        #  [5.]]
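If the end goal is to write the modified weights back out as a frozen .pb, one possible follow-up (a sketch only, not part of the original answer; it continues inside the same "Make new graph" block above, so init_op, w_upd and out are assumed from there) is to re-freeze the graph with tf.graph_util.convert_variables_to_constants:

    # (still inside the `with tf.Graph().as_default():` block above)
    # Sketch: freeze the updated variable back into constants and serialize a new .pb.
    with tf.Session() as sess:
        sess.run(init_op)
        sess.run(w_upd)
        frozen_gd = tf.graph_util.convert_variables_to_constants(
            sess, sess.graph.as_graph_def(), [out.op.name])
        with tf.gfile.GFile('/tmp/updated_model.pb', 'wb') as f:
            f.write(frozen_gd.SerializeToString())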

Restore tf variables in a different graph

I want to use my pretrained separable convolution (which is part of a bigger module) in another separable convolution in another model.
In the trained module I tried
with tf.variable_scope('sep_conv_ker' + str(input_shape[-1])):
    sep_conv2d = tf.reshape(
        tf.layers.separable_conv2d(inputs_flatten, input_shape[-1],
                                   [1, input_shape[-2]],
                                   trainable=trainable),
        [inputs_flatten.shape[0], 1, input_shape[-1], INNER_LAYER_WIDTH])
and
all_variables = tf.trainable_variables()
scope1_variables = tf.contrib.framework.filter_variables(all_variables, include_patterns=['sep_conv_ker'])
sep_conv_weights_saver = tf.train.Saver(scope1_variables, sharded=True, max_to_keep=20)
Inside sess.run
sep_conv_weights_saver.save(sess, os.path.join(LOG_DIR + MODEL_SPEC_LOG_DIR, "init_weights",
                                               MODEL_SPEC_SUFFIX + 'epoch_' + str(epoch) + '.ckpt'))
But I cannot understand when and how I should load the weights into the separable convolution in the other model; it has a different name and a different scope.
Furthermore, as I'm using a predefined tf.layers layer, does that mean I need to access each individual weight in the new graph and assign it?
My current solution doesn't work, and I think the weights are being initialized after the assignment somehow. Besides, loading a whole new graph just for a few weights seems weird, doesn't it?
###IN THE OLD GRAPH###
all_variables = tf.trainable_variables()
scope1_variables = tf.contrib.framework.filter_variables(all_variables, include_patterns=['sep_conv_ker'])
vars = dict((var.op.name.split("/")[-1] + str(idx), var) for idx,var in enumerate(scope1_variables))
sep_conv_weights_saver = tf.train.Saver(vars, sharded=True, max_to_keep=20)
In the new graph there is a function that basically takes the variables from the old graph and assigns them; loading the meta_graph is redundant.
def load_pretrained(sess):
    sep_conv2d_vars = [var for var in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if ("sep_conv_ker" in var.op.name)]
    var_dict = dict((var.op.name.split("/")[-1] + str(idx), var) for idx, var in enumerate(sep_conv2d_vars))
    new_saver = tf.train.import_meta_graph(
        tf.train.latest_checkpoint('log/train/sep_conv_ker/global_neighbors40/init_weights') + '.meta')
    # saver = tf.train.Saver(var_list=var_dict)
    new_saver.restore(sess,
                      tf.train.latest_checkpoint('log/train/sep_conv_ker/global_neighbors40/init_weights'))
    graph = tf.get_default_graph()
    sep_conv2d_trained = dict(("".join(var.op.name.split("/")[-2:]), var) for var in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if ("sep_conv_ker_init" in var.op.name))
    for var in sep_conv2d_vars:
        tf.assign(var, sep_conv2d_trained["".join(var.op.name.split("/")[-2:])])
You need to make sure that the variables have the same names in the checkpoint file and in the graph where you load them. You can write a script that will convert the variable names.
With tf.contrib.framework.list_variables(ckpt), you can find out which variables of which shapes are in the checkpoint and create corresponding variables with the new names (I believe you can write a regex to fix the names) and the correct shapes.
Then you load the original values with tf.contrib.framework.load_checkpoint(ckpt) and create assign ops tf.assign(var, loaded) that assign the renamed variables the saved values.
Run the assign ops in a session.
Save the new variables.
Minimum example:
Original model (variables in scope "regression"):
import tensorflow as tf
x = tf.placeholder(tf.float32, [None, 3])
regression = tf.layers.dense(x, 1, name="regression")
session = tf.Session()
session.run(tf.global_variables_initializer())
saver = tf.train.Saver(tf.trainable_variables())
saver.save(session, './model')
Renaming script:
import tensorflow as tf

assign_ops = []
reader = tf.contrib.framework.load_checkpoint("./model")
for name, shape in tf.contrib.framework.list_variables("./model"):
    new_name = name.replace("regression/", "foo/bar/")
    new_var = tf.get_variable(new_name, shape)
    assign_ops.append(tf.assign(new_var, reader.get_tensor(name)))

session = tf.Session()
saver = tf.train.Saver(tf.trainable_variables())
session.run(assign_ops)
saver.save(session, './model-renamed')
Model where you load the renamed variables (the same variables in scope "foo/bar"):
import tensorflow as tf

with tf.variable_scope("foo"):
    x = tf.placeholder(tf.float32, [None, 3])
    regression = tf.layers.dense(x, 1, name="bar")

session = tf.Session()
session.run(tf.global_variables_initializer())
saver = tf.train.Saver(tf.trainable_variables())
saver.restore(session, './model-renamed')
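As an alternative to rewriting the checkpoint on disk, here is a minimal sketch (not from the original answer; it assumes a TF 1.x version that provides tf.train.init_from_checkpoint) that maps the old scope onto the new one at restore time instead:

import tensorflow as tf

with tf.variable_scope("foo"):
    x = tf.placeholder(tf.float32, [None, 3])
    regression = tf.layers.dense(x, 1, name="bar")

# Keys are scopes in the checkpoint, values are scopes in the current graph;
# this rewires the variable initializers, so the checkpoint values are loaded
# when global_variables_initializer() runs.
tf.train.init_from_checkpoint("./model", {"regression/": "foo/bar/"})

with tf.Session() as session:
    session.run(tf.global_variables_initializer())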

Understanding next step after saving TensorFlow model

I have a simple MNIST model which I've successfully saved, the code being the following:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
import tensorflow as tf
sess = tf.InteractiveSession()
tf_save_file = './mnist-to-save-saved'
x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
saver = tf.train.Saver()
sess.run(tf.global_variables_initializer())
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
saver.save(sess, tf_save_file)
for _ in range(1000):
    batch = mnist.train.next_batch(100)
    train_step.run(feed_dict={x: batch[0], y_: batch[1]})
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver.save(sess, tf_save_file, global_step=1000)
print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
Then, the following files are generated:
checkpoint
mnist-to-save-saved-1000.data-00000-of-00001
mnist-to-save-saved-1000.index
mnist-to-save-saved-1000.meta
mnist-to-save-saved.data-00000-of-00001
mnist-to-save-saved.index
mnist-to-save-saved.meta
Now, in order to use it in production (and so, for example, pass it a digit image), I want to be able to execute the trained model by passing it any digit image to make the prediction (I mean, not deploying a server yet, but making this prediction "locally", with that "fixed" digit image in the same directory, so that using the model would be like running an executable).
But, considering the (mid-low?) API level of my code, I'm confused about what would be the easiest correct next step (restoring, using an Estimator, etc...), and how to do it.
Although I've read the official documentation, there seem to be many ways, and some are a bit complex and "noisy" for a simple model like this.
Edit:
I've edited and re-run the MNIST file, whose code is the same as above except for these lines:
...
x = tf.placeholder(tf.float32, shape=[None, 784], name='input')
...
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1), name='result')
...
Then, I try to run this other .py file (in the same directory as the above code) in order to pass it a local handwritten digit image ("mnist-input-image.png") located in the same directory:
import tensorflow as tf
from PIL import Image
import numpy as np
image_test = Image.open("mnist-input-image.png")
image = np.array(image_test)
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/Users/username/.meta')
    new = saver.restore(sess, tf.train.latest_checkpoint('/Users/username/'))
    graph = tf.get_default_graph()
    input_x = graph.get_tensor_by_name("input:0")
    result = graph.get_tensor_by_name("result:0")
    feed_dict = {input_x: image}
    predictions = result.eval(feed_dict=feed_dict)
    print(predictions)
Now, if I understand correctly, I have to pass the image as a numpy array. Then, my questions are:
1) Which exact files do these lines refer to (since I have no .meta file in my user folder)?
saver = tf.train.import_meta_graph('/Users/username/.meta')
new = saver.restore(sess, tf.train.latest_checkpoint('/Users/username/'))
I mean, to which exact files (from my generated file list above) do these lines refer?
2) Translated to my case, is this line correct for passing my numpy array into the feed dict?
feed_dict = {input_x: image}
A simple solution is to use your session object. When you have generated the checkpoint file, you can restore it with a Saver object.
By the way, do you know why most tutorials have their graph creation inside of a function? One good reason is because you can deserialize the graph quickly with your inputs.
The correct method to start a session is with the following:
# Use your placeholders, variables, etc to create the entire graph.
# Usually you return the input placeholder,
# prediction and the loss/accuracy here.
# You don't need the accuracy.
x, y, _ = make_your_graph(test_X, test_y)
# This object is the interface for serialization in tf
saver = tf.train.Saver()
with tf.Session() as sess:
    # Takes your current model's checkpoint. "./checkpoint" is your checkpoint file.
    saver.restore(sess, tf.train.latest_checkpoint("./checkpoint"))
    prediction = sess.run(y)
Want to run more than 1 data point for your already-booted up session?
Then replace the last line with a feed dict:
while waiting_for_new_y():
    another_y = get_new_y()
    feed_dict = {x: [another_y]}
    another_prediction = sess.run(y, feed_dict)
First of all, give a value to the name parameter of each object you want to use later, so that you can refer to it by its name:
Change this:
x = tf.placeholder(tf.float32, shape=[None, 784])
to
x = tf.placeholder(tf.float32, shape=[None, 784],name='input')
and
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
to
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1),name='result')
Now run this small script to load the model and make a prediction:
import tensorflow as tf
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('/Users/dummy/.meta')
    new = saver.restore(sess, tf.train.latest_checkpoint('/Users/dummy/'))
    graph = tf.get_default_graph()
    input_x = graph.get_tensor_by_name("input:0")
    result = graph.get_tensor_by_name("result:0")
    feed_dict = {input_x: mnist.test.images}  # here you feed your new data, for example I am feeding mnist
    predictions = result.eval(feed_dict=feed_dict)
    print(predictions)
And you will get output.
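One caveat worth noting (an editorial addition, not part of the original answers): the tensor named 'result' above is the element-wise correct_prediction comparison, so it also depends on the labels placeholder y_. For pure prediction, something like the following sketch, which names an argmax op instead (the name 'prediction' and the image preprocessing are assumptions, not from the original code), may be closer to what is wanted:

# When building the graph, name a prediction op that needs no labels:
prediction = tf.argmax(y, 1, name='prediction')

# At inference time (inside the restore session above), feed a flattened,
# normalized 28x28 grayscale image as a (1, 784) float array:
img = np.array(image_test.convert('L').resize((28, 28)),
               dtype=np.float32).reshape(1, 784) / 255.0
pred = graph.get_tensor_by_name("prediction:0")
print(sess.run(pred, feed_dict={input_x: img}))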

Can tensorflow Saver be used in different graphs with the same structure

The network structure has already been loaded into the default global graph. I want to create another graph with the same structure and load checkpoints into this graph.
If the code is like this, it will throw the error ValueError: No variables to save on the last line. However, the second line works fine. Why? Does the GraphDef returned by as_graph_def() contain variable definitions/names?
inference_graph_def = tf.get_default_graph().as_graph_def()
saver = tf.train.Saver()
with tf.Graph().as_default():
    tf.import_graph_def(inference_graph_def)
    saver1 = tf.train.Saver()
If the code is like this, it will throw the error Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist on the last line. However, it works fine with the 3rd line removed.
inference_graph_def = tf.get_default_graph().as_graph_def()
saver = tf.train.Saver()
with tf.Graph().as_default():
    tf.import_graph_def(inference_graph_def)
    with session.Session() as sess:
        saver.restore(sess, checkpoint_path)
So, does this mean Saver cannot work in different graphs even though they have the same structure?
Any help would be appreciated~
Here's an example of using a MetaGraphDef, which unlike GraphDef saves variable collections, to initialize a new graph using a previously saved graph.
import tensorflow as tf
CHECKPOINT_PATH = "/tmp/first_graph_checkpoint"
with tf.Graph().as_default():
    some_variable = tf.get_variable(
        name="some_variable",
        shape=[2],
        dtype=tf.float32)
    init_op = tf.global_variables_initializer()
    first_meta_graph = tf.train.export_meta_graph()
    first_graph_saver = tf.train.Saver()
    with tf.Session() as session:
        init_op.run()
        print("Initialized value in first graph", some_variable.eval())
        first_graph_saver.save(
            sess=session,
            save_path=CHECKPOINT_PATH)

with tf.Graph().as_default():
    tf.train.import_meta_graph(first_meta_graph)
    second_graph_saver = tf.train.Saver()
    with tf.Session() as session:
        second_graph_saver.restore(
            sess=session,
            save_path=CHECKPOINT_PATH)
        print("Variable value after restore", tf.global_variables()[0].eval())
Prints something like:
Initialized value in first graph [-0.98926258 -0.09709156]
Variable value after restore [-0.98926258 -0.09709156]
Note that the checkpoint is still important! Loading the MetaGraph does not restore the values of Variables (it doesn't contain those values), just the bookkeeping which tracks their existence (collections). SavedModel format addresses this, bundling MetaGraphs with checkpoints and other metadata for running them.
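For completeness, here is a minimal sketch of producing a SavedModel (not in the original answer; it assumes the TF 1.x tf.saved_model.simple_save API and a toy graph with an input placeholder x and an output tensor y):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 2], name="x")
w = tf.get_variable("w", shape=[2, 1], dtype=tf.float32)
y = tf.matmul(x, w, name="y")

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    # Bundles the MetaGraph, the variable checkpoint and serving signatures
    # into a single export directory.
    tf.saved_model.simple_save(
        session, "/tmp/saved_model_example",
        inputs={"x": x}, outputs={"y": y})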
Edit: By popular demand, here's an example of doing the same thing with a GraphDef. I don't recommend it. Since none of the collections are restored when the GraphDef is loaded, we have to manually specify the Variables we want the Saver to restore; the "import/" default naming scheme is easy enough to fix with a name='' argument to import_graph_def, but removing it isn't super helpful since you'd need to manually fill in the variables collection if you wanted the Saver to work "automatically". Instead I've chosen to specify a mapping manually when creating the Saver.
import tensorflow as tf
CHECKPOINT_PATH = "/tmp/first_graph_checkpoint"
with tf.Graph().as_default():
    some_variable = tf.get_variable(
        name="some_variable",
        shape=[2],
        dtype=tf.float32)
    init_op = tf.global_variables_initializer()
    first_graph_def = tf.get_default_graph().as_graph_def()
    first_graph_saver = tf.train.Saver()
    with tf.Session() as session:
        init_op.run()
        print("Initialized value in first graph", some_variable.eval())
        first_graph_saver.save(
            sess=session,
            save_path=CHECKPOINT_PATH)

with tf.Graph().as_default():
    tf.import_graph_def(first_graph_def)
    variable_to_restore = tf.get_default_graph().get_tensor_by_name(
        "import/some_variable:0")
    second_graph_saver = tf.train.Saver(var_list={
        "some_variable": variable_to_restore
    })
    with tf.Session() as session:
        second_graph_saver.restore(
            sess=session,
            save_path=CHECKPOINT_PATH)
        print("Variable value after restore", variable_to_restore.eval())

TensorFlow Inference

I've been digging around on this for a while. I have found a ton of articles, but none really show plain TensorFlow inference as just that: plain inference. It's always "use the serving engine" or using a graph that is pre-coded/defined.
Here is the problem: I have a device which occasionally checks for updated models. It then needs to load that model and run input predictions through the model.
In Keras this was simple: build a model, train the model and then call model.predict(). In scikit-learn, same thing.
I am able to grab a new model and load it; I can print out all of the weights; but how in the world do I run inference against it?
Code to load model and print weights:
with tf.Session() as sess:
    new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
    new_saver.restore(sess, MODEL_PATH)
    for var in tf.trainable_variables():
        print(sess.run(var))
I printed out all of my collections and I have:
['queue_runners', 'variables', 'losses', 'summaries', 'train_op', 'cond_context', 'trainable_variables']
I tried using sess.run(train_op); however that just started kicking up a full training session; which is not what I want to do. I just want to run inference against a different set of inputs that I provide which are not TF Records.
Just a little more detail:
The device can use C++ or Python, as long as I can produce a .exe. I can set up a feed dict if I want to feed the system. I trained with TFRecords, but in production I'm not going to use TFRecords; it's a real/near-real-time system.
Thanks for any input. I am posting sample code to this repo: https://github.com/drcrook1/CIFAR10/TensorFlow which does all the training and sample inference.
Any hints are greatly appreciated!
------------EDITS-----------------
I rebuilt the model to be as below:
def inference(images):
    '''
    Portion of the compute graph that takes an input and converts it into a Y output
    '''
    with tf.variable_scope('Conv1') as scope:
        C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
        C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
        P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
    with tf.variable_scope('Dense1') as scope:
        P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
        dim = P_1.get_shape()[1].value
        D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
    with tf.variable_scope('Dense2') as scope:
        D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
    H = tf.nn.softmax(D_2, name='prediction')
    return H
Notice I add the name 'prediction' to the TF operation so I can retrieve it later.
When training I used the input pipeline for tfrecords and input queues.
GRAPH = tf.Graph()
with GRAPH.as_default():
    examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                          batch_size=CONSTANTS.BATCH_SIZE,
                                          img_shape=CONSTANTS.IMAGE_SHAPE,
                                          num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
    examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
                                     CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
    logits = Vgg3CIFAR10.inference(examples)
    loss = Vgg3CIFAR10.loss(logits, labels)
    OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
I am attempting to use feed_dict on the loaded operation in the graph; however now it is just simply hanging....
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        #sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        rand = np.random.rand(1, 32, 32, 3)
        print(rand)
        print(pred)
        print(sess.run(pred, feed_dict={images: rand}))
        print('done')

run_inference()
I believe this is not working because the original network was trained using TFRecords. In the sample CIFAR data set the data is small; our real data set is huge, and it is my understanding that TFRecords are the default best practice for training a network. The feed_dict makes perfect sense from a productionizing perspective; we can spin up some threads and populate it from our input systems.
So I guess I have a network that is trained, I can get the predict operation; but how do I tell it to stop using the input queues and start using the feed_dict? Remember that from the production perspective I do not have access to whatever the scientists did to make it. They do their thing; and we stick it in production using whatever agreed upon standard.
-------INPUT OPS--------
tf.Operation 'input/input_producer/Const' type=Const, tf.Operation 'input/input_producer/Size' type=Const, tf.Operation 'input/input_producer/Greater/y' type=Const, tf.Operation 'input/input_producer/Greater' type=Greater, tf.Operation 'input/input_producer/Assert/Const' type=Const, tf.Operation 'input/input_producer/Assert/Assert/data_0' type=Const, tf.Operation 'input/input_producer/Assert/Assert' type=Assert, tf.Operation 'input/input_producer/Identity' type=Identity, tf.Operation 'input/input_producer/RandomShuffle' type=RandomShuffle, tf.Operation 'input/input_producer' type=FIFOQueueV2, tf.Operation 'input/input_producer/input_producer_EnqueueMany' type=QueueEnqueueManyV2, tf.Operation 'input/input_producer/input_producer_Close' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Close_1' type=QueueCloseV2, tf.Operation 'input/input_producer/input_producer_Size' type=QueueSizeV2, tf.Operation 'input/input_producer/Cast' type=Cast, tf.Operation 'input/input_producer/mul/y' type=Const, tf.Operation 'input/input_producer/mul' type=Mul, tf.Operation 'input/input_producer/fraction_of_32_full/tags' type=Const, tf.Operation 'input/input_producer/fraction_of_32_full' type=ScalarSummary, tf.Operation 'input/TFRecordReaderV2' type=TFRecordReaderV2, tf.Operation 'input/ReaderReadV2' type=ReaderReadV2,
------END INPUT OPS-----
----UPDATE 3----
I believe what I need to do is to kill the input section of the graph trained with TFRecords and rewire the input of the first layer to a new input. It's kinda like performing surgery; but this is the only way I can find to do inference if I trained using TFRecords, as crazy as it sounds...
The full graph and the input section to kill were shown as graph visualizations (images omitted).
So I think the question becomes: How does one kill the input section of the graph and replace it with a feed_dict?
A follow up to this would be: is this really the right way to do it? This seems bonkers.
----END UPDATE 3----
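One technique that matches the "surgery" described above (a sketch only, not from the original post: the input tensor name 'input/Reshape:0' and the checkpoint prefix are assumptions and would have to be read off the actual graph) is the input_map argument of tf.train.import_meta_graph, which reroutes the queue-fed input tensor to a placeholder at import time, so the TFRecord ops upstream of it are never run:

import numpy as np
import tensorflow as tf

MODEL_PATH = 'models/model_name.model'  # hypothetical checkpoint prefix

# Placeholder that will stand in for the queue-fed input tensor.
images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3), name='feed_images')

with tf.Session() as sess:
    saver = tf.train.import_meta_graph(
        MODEL_PATH + '.meta', clear_devices=True,
        input_map={'input/Reshape:0': images})  # tensor name is an assumption
    saver.restore(sess, MODEL_PATH)
    pred = tf.get_default_graph().get_tensor_by_name('prediction:0')
    print(sess.run(pred, feed_dict={images: np.random.rand(1, 32, 32, 3)}))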
---link to checkpoint files---
https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015-12-11&sr=b&sig=oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D
--end link to checkpoint files---
-----UPDATE 4 -----
I gave in and just gave a shot at the 'normal' way of performing inference, assuming I could have the scientists simply pickle their models and we could grab the model pickle, unpack it and then run inference on it. So to test I tried the normal way, assuming we had already unpacked it... It doesn't work worth beans either...
import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
from scipy import misc
import time

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
logits = Vgg3CIFAR10.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        enq = sess.graph.get_operation_by_name(enqueue_op)
        #tf.train.start_queue_runners(sess)
        print(rand)
        print(pred)
        print(enq)
        for i in range(1, 25):
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            print(sess.run(logits, feed_dict={images: img}))
            time.sleep(3)
        print('done')

run_inference()
Tensorflow ends up building a new graph with the inference function from the loaded model; then it appends all the other stuff from the other graph to the end of it. So then when I populate a feed_dict expecting to get inferences back; I just get a bunch of random garbage as if it were the first pass through the network...
Again; this seems nuts; do I really need to write my own framework for serializing and deserializing random networks? This has had to have been done before...
-----END UPDATE 4-----
Again; thanks!
Alright, this took way too much time to figure out; so here is the answer for the rest of the world.
Quick reminder: I needed to persist a model that can be dynamically loaded and inferred against without knowledge of the underpinnings or insides of how it works.
Step 1: Create a model as a Class and ideally use an interface definition
class Vgg3Model:

    NUM_DENSE_NEURONS = 50
    DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)

    def inference(self, images):
        '''
        Portion of the compute graph that takes an input and converts it into a Y output
        '''
        with tf.variable_scope('Conv1') as scope:
            C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
            C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
            P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
        with tf.variable_scope('Dense1') as scope:
            P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
            dim = P_1.get_shape()[1].value
            D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
        with tf.variable_scope('Dense2') as scope:
            D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
        H = tf.nn.softmax(D_2, name='prediction')
        return H

    def loss(self, logits, labels):
        '''
        Adds Loss to all variables
        '''
        cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
        cross_entr = tf.reduce_mean(cross_entr)
        tf.summary.scalar('cost', cross_entr)
        tf.add_to_collection('losses', cross_entr)
        return tf.add_n(tf.get_collection('losses'), name='total_loss')
Step 2: Train your network with whatever inputs you want; in my case I used Queue Runners and TF Records. Note that this step is done by a different team which iterates, builds, designs and optimizes models. This can also change over time. The output they produce must be able to be pulled from a remote location so we can dynamically load the updated models on devices (reflashing hardware is a pain, especially if it is geographically distributed). In this instance, the team drops the 3 files associated with a graph saver, but also a pickle of the model used for that training session.
model = vgg3.Vgg3Model()

def create_sess_ops():
    '''
    Creates and returns operations needed for running
    a tensorflow training session
    '''
    GRAPH = tf.Graph()
    with GRAPH.as_default():
        examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                              batch_size=CONSTANTS.BATCH_SIZE,
                                              img_shape=CONSTANTS.IMAGE_SHAPE,
                                              num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
        examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
                                         CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]], name='infer/input')
        logits = model.inference(examples)
        loss = model.loss(logits, labels)
        OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
        gradients = OPTIMIZER.compute_gradients(loss)
        apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
        gradients_summary(gradients)
        summaries_op = tf.summary.merge_all()
        return [apply_gradient_op, summaries_op, loss, logits], GRAPH

def main():
    '''
    Run and Train CIFAR 10
    '''
    print('starting...')
    ops, GRAPH = create_sess_ops()
    total_duration = 0.0
    with tf.Session(graph=GRAPH) as SESSION:
        COORDINATOR = tf.train.Coordinator()
        THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
        SESSION.run(tf.global_variables_initializer())
        SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
        GRAPH_SAVER = tf.train.Saver()
        for EPOCH in range(CONSTANTS.EPOCHS):
            duration = 0
            error = 0.0
            start_time = time.time()
            for batch in range(CONSTANTS.MINI_BATCHES):
                _, summaries, cost_val, prediction = SESSION.run(ops)
                error += cost_val
            duration += time.time() - start_time
            total_duration += duration
            SUMMARY_WRITER.add_summary(summaries, EPOCH)
            print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
            if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
                print(
                    'Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration)
                )
                break
        GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
        with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
            pickle.dump(model, output)
        COORDINATOR.request_stop()
        COORDINATOR.join(THREADS)
Step 3: Run some Inference. Load your pickled model; create a new graph by piping in the new placeholder to the logits; and then call session restore. DO NOT RESTORE THE WHOLE GRAPH; JUST THE VARIABLES.
MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

with open('models/vgg3.pkl', 'rb') as model_in:
    model = pickle.load(model_in)
logits = model.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.Saver()
        new_saver.restore(sess, MODEL_PATH)
        print("Starting...")
        for i in range(20, 30):
            print(str(i) + '.png')
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            pred = sess.run(logits, feed_dict={images: img})
            max_node = np.argmax(pred)
            print('predicted label: ' + str(max_node))
        print('done')

run_inference()
There are definitely ways to improve on this using interfaces and maybe packaging everything up better; but this works and sets the stage for how we will be moving forward.
FINAL NOTE: When we finally pushed this to production, we ended up having to ship the stupid mymodel_model.py file down with everything to build up the graph. So we now enforce a naming convention for all models, and there is also a coding standard for production model runs so we can do this properly.
Good Luck!
While it's not as cut and dry as model.predict(), it's still really trivial.
In your model you should have a tensor that computes the final output you're interested in; let's name that tensor output. You may currently just have a loss function. If so, create another tensor (variable in the model) that actually computes the output you want.
For example, if your loss function is:
tf.nn.sigmoid_cross_entropy_with_logits(last_layer_activation, labels)
And you expect your outputs to be in the range [0,1] per class, create another variable:
output = tf.sigmoid(last_layer_activation)
Now, when you call sess.run(...), just request the output tensor. Don't request the optimization op you normally would to train it. When you request this tensor, TensorFlow will do the minimum work necessary to produce the value (e.g. it won't bother with backprop, loss functions, and all that, because a simple feed-forward pass is all that's necessary to compute output).
So if you're creating a service to return inferences of the model you'll want to keep the model loaded in memory/gpu, and repeat:
sess.run(output, feed_dict={X: input_data})
You won't need to feed it the labels because tensorflow won't bother to compute ops that aren't needed to produce the output you are requesting. You don't have to change your model or anything.
While this approach might not be as obvious as model.predict(...) I'd argue that it's vastly more flexible. If you start playing with more complex models you'll probably learn to love this approach. model.predict() is like "thinking inside the box."
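Putting this together, here is a minimal end-to-end sketch of such an inference loop over a restored checkpoint (the tensor names 'X:0' and 'output:0' and the checkpoint prefix are made up; substitute the names from your own graph):

import numpy as np
import tensorflow as tf

CKPT_PREFIX = '/tmp/my_model'  # hypothetical checkpoint prefix

with tf.Session() as sess:
    # Rebuild the graph from the MetaGraph and restore the variables.
    saver = tf.train.import_meta_graph(CKPT_PREFIX + '.meta')
    saver.restore(sess, CKPT_PREFIX)

    graph = tf.get_default_graph()
    X = graph.get_tensor_by_name('X:0')            # input placeholder (assumed name)
    output = graph.get_tensor_by_name('output:0')  # inference tensor (assumed name)

    # Keep the session alive and answer requests with plain feed_dicts;
    # only the ops needed to produce `output` are executed.
    for _ in range(3):
        input_data = np.random.rand(1, 784).astype(np.float32)  # example shape
        print(sess.run(output, feed_dict={X: input_data}))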
