Issue with Tensorflow save and restore model - python

I am trying to use a transfer learning approach. Here is a snippet of the code that trains on the training data:
max_accuracy = 0.0
saver = tf.train.Saver()
for epoch in range(epocs):
    shuffledRange = np.random.permutation(n_train)
    y_one_hot_train = encode_one_hot(len(classes), Y_input)
    y_one_hot_validation = encode_one_hot(len(classes), Y_validation)
    shuffledX = X_input[shuffledRange, :]
    shuffledY = y_one_hot_train[shuffledRange]
    for Xi, Yi in iterate_mini_batches(shuffledX, shuffledY, mini_batch_size):
        sess.run(train_step,
                 feed_dict={bottleneck_tensor: Xi,
                            ground_truth_tensor: Yi})
        # Every so often, print out how well the graph is training.
        is_last_step = (i + 1 == FLAGS.how_many_training_steps)
        if (i % FLAGS.eval_step_interval) == 0 or is_last_step:
            train_accuracy, cross_entropy_value = sess.run(
                [evaluation_step, cross_entropy],
                feed_dict={bottleneck_tensor: Xi,
                           ground_truth_tensor: Yi})
            validation_accuracy = sess.run(
                evaluation_step,
                feed_dict={bottleneck_tensor: X_validation,
                           ground_truth_tensor: y_one_hot_validation})
            print('%s: Step %d: Train accuracy = %.1f%%, Cross entropy = %f, Validation accuracy = %.1f%%' %
                  (datetime.now(), i, train_accuracy * 100, cross_entropy_value, validation_accuracy * 100))
            result_tensor = sess.graph.get_tensor_by_name(ensure_name_has_port(FLAGS.final_tensor_name))
            probs = sess.run(result_tensor, feed_dict={'pool_3/_reshape:0': Xi[0].reshape(1, 2048)})
            if validation_accuracy > max_accuracy:
                saver.save(sess, 'models/superheroes_model')
                max_accuracy = validation_accuracy
            print(probs)
        i += 1
Here is the code where I load the model:
def load_model():
    sess = tf.Session()
    # First let's load meta graph and restore weights
    saver = tf.train.import_meta_graph('models/superheroes_model.meta')
    saver.restore(sess, tf.train.latest_checkpoint('models/'))
    sess.run(tf.global_variables_initializer())
    result_tensor = sess.graph.get_tensor_by_name(ensure_name_has_port(FLAGS.final_tensor_name))
    X_feature = features[0].reshape(1, 2048)
    probs = sess.run(result_tensor,
                     feed_dict={'pool_3/_reshape:0': X_feature})
    print(probs)
    return sess
So now, for the same data point, I get totally different results during training and testing; they are not even close. During testing, my probabilities are all close to 25%, which is chance level since I have 4 classes. But during training, the highest class probability is 90%.
Is there any issue while saving or restoring the model?

Be careful -- you are calling
sess.run(tf.global_variables_initializer())
after calling
saver.restore(sess,tf.train.latest_checkpoint('models/'))
I've done something similar before: running the initializer after the restore resets all the trained weights, biases, etc. of the restored model to their initial values.
If you need the initializer at all, run it before restoring the model; if only specific variables are missing from the checkpoint, initialize just those individually (for example with tf.variables_initializer(var_list)).
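The ordering issue can be modeled without TensorFlow. In this toy sketch (plain Python; the variable names and values are hypothetical), a dict stands in for the session's variable values, restore() copies values from a "checkpoint", and initialize() resets every variable to its initial value, which is what running tf.global_variables_initializer() does:

```python
# Toy model of the ordering problem: plain Python, not the TensorFlow API.
INITIAL_VALUES = {"final_weights": 0.0, "final_biases": 0.0}  # hypothetical initial values
CHECKPOINT = {"final_weights": 1.7, "final_biases": -0.3}     # hypothetical trained values


def initialize(values):
    """Mimics the global variables initializer: every variable gets its initial value."""
    values.update(INITIAL_VALUES)


def restore(values, checkpoint):
    """Mimics saver.restore: every variable in the checkpoint gets its saved value."""
    for name, saved in checkpoint.items():
        values[name] = saved


def load(initialize_after_restore):
    values = {}
    if initialize_after_restore:
        restore(values, CHECKPOINT)
        initialize(values)  # overwrites the restored values
    else:
        initialize(values)  # harmless: restore() overwrites these next
        restore(values, CHECKPOINT)
    return values


print(load(initialize_after_restore=True))   # {'final_weights': 0.0, 'final_biases': 0.0}
print(load(initialize_after_restore=False))  # {'final_weights': 1.7, 'final_biases': -0.3}
```

Only the second order keeps the trained values: initialize first (if at all), restore last.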

Delete sess.run(tf.global_variables_initializer()) from your load_model function. If you run it, all your trained parameters are replaced with their initial values, which produce roughly a 1/4 probability for each class.
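The 1/4 figure is what softmax gives when the logits are nearly equal. Assuming the final layer follows the retrain.py pattern that the FLAGS names suggest (weights drawn from a truncated normal with a tiny standard deviation, biases starting at zero), a freshly initialized layer produces logits close to 0 for every class. A quick plain-Python check with hypothetical logit values:

```python
import math


def softmax(logits):
    # Subtracting the max improves numerical stability without changing the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits from a freshly initialized 4-class layer: all close to 0.
untrained = [0.001, -0.002, 0.0005, 0.0]
# Hypothetical logits from a trained layer with a clear winner.
trained = [3.0, 0.5, 0.2, 0.1]

print([round(p, 3) for p in softmax(untrained)])  # [0.25, 0.25, 0.25, 0.25]
print([round(p, 3) for p in softmax(trained)])    # one class well above the others
```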

Related

Saver.save is getting slower and slower in each fold

I am using TensorFlow and have developed a deep multilayer feedforward model. To be sure about the performance of the model, I decided to evaluate it with 10-fold cross-validation. In each fold I create a new instance of the neural network and call the train and predict functions.
In each fold I call the following code:
for each fold:
    nn = ffNN(hidden_nodes, epochs, learning_rate, saveFrequency, save_path, decay, decay_step, decay_factor, stop_loss, keep_probability, regularization_factor, minimum_cost, activation_function, batch_size, shuffle, stopping_iteration)
    nn.initialize(x_size)
    nn.train(X, y)
    nn.predict(X_test)
In the ffNN file I have the initialize, train and predict functions; train looks like this:
nn.train:
    sess = tf.InteractiveSession()
    init = tf.global_variables_initializer()
    sess.run(init)
    saver = tf.train.Saver()
    for each epoch:
        for each batch:
            _, loss = sess.run([self.optimizer, self.loss], feed_dict={self.X: X1, self.y: y})
        if epoch % save_frequency == 0:
            saver.save(sess, save_path)
    sess.close()
The problem is saver.save: in each fold it takes longer to save than in the previous one. I create all of the variables from scratch in each fold, so I don't know what makes saving depend on the fold and take longer and longer.
Thanks in advance.
Edit:
The code for building the model (nn.initialize) is as follows:
self.X = tf.placeholder("float", shape=[None, x_size], name='XValue')
self.y = tf.placeholder("float", shape=[None, y_size], name='yValue')
with tf.variable_scope("initialization", reuse=tf.AUTO_REUSE):
    w_in, b_in = init_weights((x_size, self.hidden_nodes))
    h_out = self.forwardprop(self.X, w_in, b_in, self.keep_prob, self.activation_function)
    l2_norm = tf.add(tf.nn.l2_loss(w_in), tf.nn.l2_loss(b_in))
    w_out, b_out = init_weights((self.hidden_nodes, y_size))
    l2_norm = tf.add(tf.nn.l2_loss(w_out), l2_norm)
    l2_norm = tf.add(tf.nn.l2_loss(b_out), l2_norm)
    self.yhat = tf.add(tf.matmul(h_out, w_out), b_out)
    self.mse = tf.losses.mean_squared_error(labels=self.y, predictions=self.yhat)
    self.loss = tf.add(self.mse, self.regularization_factor * l2_norm)
    self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
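Based on what you describe in the question, the problem is not saver.save itself but the computational graph, which gets bigger and bigger: each fold adds another copy of the model's ops (and possibly new variables) to the same default graph, and saver.save writes the whole graph to the .meta file every time. Thus, saving takes longer in each fold. Make sure to structure the code in the following way: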
for each fold:
    # Clear the previous computational graph
    tf.reset_default_graph()
    # Then build the graph
    nn = ffNN()
    # Create the saver
    saver = tf.train.Saver()
    # Create a session
    with tf.Session() as sess:
        # Initialize the variables in the graph
        sess.run(tf.global_variables_initializer())
        # Train the model
        for each epoch:
            for each batch:
                nn.train_on_batch()
            if epoch % save_frequency == 0:
                saver.save(sess, save_path)
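A framework-free toy model of why each save gets slower: a module-level list stands in for TensorFlow's default graph, building the model appends its ops to it, and the cost of "saving" is the size of the whole graph. The function names and the op count are hypothetical; only the growth pattern is the point:

```python
# Toy stand-in for TensorFlow's default graph (not the real API).
_default_graph = []


def reset_default_graph():
    """Mimics tf.reset_default_graph(): start again from an empty graph."""
    global _default_graph
    _default_graph = []


def build_model(ops_per_model=100):
    """Mimics building the network: every call adds new ops to the default graph."""
    start = len(_default_graph)
    _default_graph.extend(f"op_{start + k}" for k in range(ops_per_model))


def save_size():
    """Mimics saver.save writing the graph: the work grows with the graph size."""
    return len(_default_graph)


def run_folds(num_folds, reset):
    sizes = []
    for _ in range(num_folds):
        if reset:
            reset_default_graph()
        build_model()
        sizes.append(save_size())
    return sizes


reset_default_graph()
print(run_folds(3, reset=False))  # [100, 200, 300]: each fold saves more
reset_default_graph()
print(run_folds(3, reset=True))   # [100, 100, 100]: constant per fold
```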

Get a prediction from Tensor Flow Model

I want to get predictions from my trained TensorFlow model. The following is the code I have for training my model.
def train_model(self, train, test, learning_rate=0.0001, num_epochs=16, minibatch_size=32, print_cost=True, graph_filename='costs'):
    # Ensure that model can be rerun without overwriting tf variables
    ops.reset_default_graph()
    # For reproducibility
    tf.set_random_seed(42)
    seed = 42
    # Get input and output shapes
    (n_x, m) = train.images.T.shape
    n_y = train.labels.T.shape[0]
    costs = []
    # Create placeholders of shape (n_x, n_y)
    X, Y = self.create_placeholders(n_x, n_y)
    # Initialize parameters
    parameters = self.initialize_parameters()
    # Forward propagation
    Z3 = self.forward_propagation(X, parameters)
    # Cost function
    cost = self.compute_cost(Z3, Y)
    # Backpropagation (using Adam optimizer)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    # Initialize variables
    init = tf.global_variables_initializer()
    # Start session to compute Tensorflow graph
    with tf.Session() as sess:
        # Run initialization
        sess.run(init)
        # Training loop
        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed = seed + 1
            for i in range(num_minibatches):
                # Get next batch of training data and labels
                minibatch_X, minibatch_Y = train.next_batch(minibatch_size)
                # Execute optimizer and cost function
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X.T, Y: minibatch_Y.T})
                # Update epoch cost
                epoch_cost += minibatch_cost / num_minibatches
            # Print the cost every epoch
            if print_cost:
                print("Cost after epoch {epoch_num}: {cost}".format(epoch_num=epoch, cost=epoch_cost))
            costs.append(epoch_cost)
        # Plot costs
        plt.figure(figsize=(16, 5))
        plt.plot(np.squeeze(costs), color='#2A688B')
        plt.xlim(0, num_epochs - 1)
        plt.ylabel("cost")
        plt.xlabel("iterations")
        plt.title("learning rate = {rate}".format(rate=learning_rate))
        plt.savefig(graph_filename, dpi=300)
        plt.show()
        # Save parameters
        parameters = sess.run(parameters)
        print("Parameters have been trained!")
        # Calculate correct predictions
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        # Calculate accuracy on test set
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: train.images.T, Y: train.labels.T}))
        print("Test Accuracy:", accuracy.eval({X: test.images.T, Y: test.labels.T}))
        return parameters
After training the model, I want to extract the prediction from the model.
So I added:
print(sess.run(accuracy, feed_dict={X: test.images.T}))
But running that line gives the following error:
InvalidArgumentError: You must feed a value for placeholder tensor 'Y'
with dtype float and shape [10,?]
[[{{node Y}} = Placeholder[dtype=DT_FLOAT, shape=[10,?], _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Any help is welcome.
The tensor accuracy is a function of the tensor correct_prediction, which in turn is a function of (among other things) Y.
So you're correctly being told that you should feed values for that placeholder too.
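A small framework-free sketch of that rule: to evaluate a fetched node, every placeholder it depends on, directly or through other nodes, must be fed. The dict below is a hand-written, hypothetical model of the question's graph (node names mirror the question's variables, plus an illustrative "predictions" node):

```python
# Hypothetical, hand-written model of the question's graph: node -> its inputs.
GRAPH = {
    "X": [],                      # placeholder
    "Y": [],                      # placeholder
    "parameters": [],             # trained variables
    "Z3": ["X", "parameters"],
    "correct_prediction": ["Z3", "Y"],
    "accuracy": ["correct_prediction"],
    "predictions": ["Z3"],        # e.g. an argmax over Z3 only
}
PLACEHOLDERS = {"X", "Y"}


def required_feeds(node, graph=GRAPH, placeholders=PLACEHOLDERS):
    """Return the placeholders that `node` depends on, directly or indirectly."""
    needed, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in placeholders:
            needed.add(current)
        stack.extend(graph[current])
    return needed


print(sorted(required_feeds("accuracy")))     # ['X', 'Y']
print(sorted(required_feeds("predictions")))  # ['X']
```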
I'm assuming Y holds your labels, so it should also make intuitive sense that your feed_dict must contain the correct Y values as well.
Hope that helps.
Good luck!
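If what you want is the predictions themselves rather than the accuracy, fetch a tensor that depends only on X, such as tf.argmax(Z3). The question's code lays data out as (classes, examples), so the default axis 0 of tf.argmax, also used for correct_prediction, is the class axis. The NumPy sketch below does the same arithmetic on a hypothetical Z3 so it runs without TensorFlow:

```python
import numpy as np

# Hypothetical output of forward propagation with shape (n_y, m) = (3 classes, 4 examples),
# matching the question's one-column-per-example layout.
Z3 = np.array([
    [2.0, 0.1, 0.3, 1.0],
    [0.5, 3.0, 0.2, 0.1],
    [0.1, 0.2, 4.0, 0.9],
])

# Predicted class per example: argmax over axis 0 (the class axis); no labels needed.
predictions = np.argmax(Z3, axis=0)
print(predictions)  # [0 1 2 0]
```

Inside the training session, the TensorFlow equivalent would be sess.run(tf.argmax(Z3), feed_dict={X: test.images.T}), which does not need Y.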

How to load a trained tensorflow model

I am having all kinds of trouble loading a tensorflow model to test on some new data. When I trained the model, I used this:
save_model_file = 'my_saved_model'
saver = tf.train.Saver()
save_path = saver.save(sess, save_model_file)
This seems to result in the following files being created:
my_saved_model.meta
checkpoint
my_saved_model.index
my_saved_model.data-00000-of-00001
I have no idea which of these files I am supposed to pay attention to.
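As a rough guide, assuming the default (V2) checkpoint format that produces those file names: the .meta file holds the serialized graph (a MetaGraphDef, which tf.train.import_meta_graph reads); the .index file and the .data-00000-of-00001 shard together hold the variable values; and checkpoint is a small text file recording the latest checkpoint prefix, which tf.train.latest_checkpoint reads. The helper below is hypothetical and only sorts a file listing by those roles:

```python
def describe_checkpoint_files(filenames):
    """Map each file a tf.train.Saver typically writes to its role (hypothetical helper)."""
    roles = {}
    for name in filenames:
        if name == "checkpoint":
            roles[name] = "bookkeeping: latest checkpoint prefix(es)"
        elif name.endswith(".meta"):
            roles[name] = "graph structure (MetaGraphDef), used by import_meta_graph"
        elif name.endswith(".index"):
            roles[name] = "variable values: index into the data shards"
        elif ".data-" in name:
            roles[name] = "variable values: data shard"
        else:
            roles[name] = "unknown"
    return roles


files = ["my_saved_model.meta", "checkpoint",
         "my_saved_model.index", "my_saved_model.data-00000-of-00001"]
for name, role in describe_checkpoint_files(files).items():
    print(f"{name:40s} {role}")
```

When restoring, saver.restore() takes the common prefix (my_saved_model), not any single one of these files.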
Now the model is trained, and I can't seem to load it or use it without throwing an exception. Here is what I am doing:
def neural_net_data_input(data_shape):
    theshape = (None,) + tuple(data_shape)
    return tf.placeholder(tf.float32, shape=theshape, name='x')
def neural_net_label_input(n_out):
    return tf.placeholder(tf.float32, shape=(None, n_out), name='one_hot_labels')
def neural_net_keep_prob_input():
    return tf.placeholder(tf.float32, name='keep_prob')
def do_generate_network(x):
    #
    # here is where i generate the network layer by layer.
    # this code works fine so i am not showing it here
    #
    pass
#
# Now I want to restore the model
#
tf.reset_default_graph()
input_data_shape = (32, 32, 1)
final_num_outputs = 43
graph1 = tf.Graph()
with graph1.as_default():
    x = neural_net_data_input(input_data_shape)
    one_hot_labels = neural_net_label_input(final_num_outputs)
    keep_prob = neural_net_keep_prob_input()
    logits = do_generate_network(x)
    # Name logits Tensor, so that it can be loaded from disk after training
    logits = tf.identity(logits, name='logits')
    #
    # accuracy: we use this for validation testing
    #
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
################################
# Evaluate
################################
new_data = myutils.load_pickle_file(SOME_DATA_FILE_NAME)
new_features = new_data['features']
new_one_hot_labels = new_data['labels']
print('Evaluating on new data...')
with tf.Session(graph=graph1) as sess:
    # Initializing the variables
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, save_model_file)
    new_acc = sess.run(accuracy, feed_dict={x: new_features, one_hot_labels: new_one_hot_labels, keep_prob: 1.})
    print('Testing Accuracy For New Images: {}'.format(new_acc))
But when I do this, I get this:
TypeError: Cannot interpret feed_dict key as Tensor: The name 'save/Const:0' refers to a Tensor which does not exist. The operation, 'save/Const', does not exist in the graph.
So I tried building my graph inside the session, like this:
################################
# Evaluate
################################
print('Evaluating on web data...')
with tf.Session() as sess:
    x = neural_net_data_input(input_data_shape)
    one_hot_labels = neural_net_label_input(final_num_outputs)
    keep_prob = neural_net_keep_prob_input()
    logits = do_generate_network(x)
    # Name logits Tensor, so that it can be loaded from disk after training
    logits = tf.identity(logits, name='logits')
    #
    # accuracy: we use this for validation testing
    #
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
    sess.run(tf.global_variables_initializer())
    my_save_dir = "/home/carnd/CarND-Traffic-Sign-Classifier-Project"
    load_model_meta_file = os.path.join(my_save_dir, "my_saved_model.meta")
    load_model_path = os.path.join(my_save_dir, "my_saved_model")
    new_saver = tf.train.import_meta_graph(load_model_meta_file)
    new_saver.restore(sess, load_model_path)
    web_acc = sess.run(accuracy, feed_dict={x: web_features, one_hot_labels: web_one_hot_labels, keep_prob: 1.})
    print('Testing Accuracy For Web Images: {}'.format(web_acc))
Now it runs without throwing an error, but the accuracy it prints is 0.02! I am feeding in the very same data on which I was getting 95% accuracy during training. So it appears I am somehow loading my model incorrectly.
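One clue: 0.02 is about what random guessing scores on a 43-class problem (1/43 ≈ 0.023), which is what you would expect if the network being evaluated still has freshly initialized weights rather than the restored ones. A quick framework-free simulation with hypothetical random labels and guesses:

```python
import random

NUM_CLASSES = 43  # final_num_outputs in the question
print(round(1 / NUM_CLASSES, 4))  # 0.0233

rng = random.Random(0)
n = 100_000
labels = [rng.randrange(NUM_CLASSES) for _ in range(n)]
guesses = [rng.randrange(NUM_CLASSES) for _ in range(n)]  # an "untrained" model guessing at random
accuracy = sum(y == p for y, p in zip(labels, guesses)) / n
print(round(accuracy, 3))  # close to 0.023
```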
What am I doing wrong?
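Steps for loading the trained model:
Load the graph:
You can load the graph using tf.train.import_meta_graph(). For example:
model_path = "my_saved_model"
inference_graph = tf.Graph()
with tf.Session(graph=inference_graph) as sess:
    # Import the graph structure from the .meta file, then restore the trained variable values
    loader = tf.train.import_meta_graph(model_path + '.meta')
    loader.restore(sess, model_path)
Get the tensors: fetch the tensors needed for inference with get_tensor_by_name(). This is why you should give those tensors explicit names when you build the model (as you did with name='x', name='one_hot_labels', name='keep_prob' and name='accuracy'), so that you can look them up at inference time. Still inside the with block above:
    # Get the tensors by the names they were given when the model was built
    _accuracy = inference_graph.get_tensor_by_name('accuracy:0')
    _x = inference_graph.get_tensor_by_name('x:0')
    _y = inference_graph.get_tensor_by_name('one_hot_labels:0')
    _keep_prob = inference_graph.get_tensor_by_name('keep_prob:0')
Test: run the loaded tensors in the same session, e.g. sess.run(_accuracy, feed_dict={_x: ..., _y: ..., _keep_prob: 1.}). Do not also rebuild the network with your own functions in this session: the imported graph already contains it, and a second, rebuilt copy would run with its own freshly initialized variables instead of the restored ones.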

Shift images to the right in TensorFlow

I trained a network on MNIST with TensorFlow and saved the weights in a .ckpt file.
Now I want to test my neural network with these weights on the same images, translated a few pixels to the right and down.
Loading the weights works well, but when I print an evaluation, TensorFlow always displays the same result (0.9630 on the test set), whether the translation is 1 px or 14 px.
Here is the function that prints the evaluation:
def eval_translation(sess, eval_correct, images_pl, labels_pl, dataset):
    print('Test Data Eval:')
    for i in range(28):
        true_count = 0  # Counts the number of correct predictions.
        steps_per_epoch = dataset.num_examples // FLAGS.batch_size
        nb_exemples = steps_per_epoch * FLAGS.batch_size
        for step in xrange(steps_per_epoch):
            images_feed, labels_feed = dataset.next_batch(FLAGS.batch_size)
            feed_dict = {images_pl: translate_right(images_feed, i), labels_pl: labels_feed}
            true_count += sess.run(eval_correct, feed_dict=feed_dict)
        precision = true_count / nb_exemples
        print('Translation: %d Num examples: %d Num correct: %d Precision # 1: %0.04f' % (i, nb_exemples, true_count, precision))
This is the function with which I load the data and print the test results.
Here is my translation function:
def translate_right(images, dev):
    for i in range(len(images)):
        for j in range(len(images[i])):
            images[i][j] = np.roll(images[i][j], dev)
    return images
I call this function instead of training, just after initializing all the variables:
with tf.Graph().as_default():
    # Generate placeholders for the images and labels.
    images_placeholder, labels_placeholder = placeholder_inputs(FLAGS.batch_size)
    # Build a Graph that computes predictions from the inference model.
    weights, logits = mnist.inference(images_placeholder, neurons)
    # Add to the Graph the Ops for loss calculation.
    loss = mnist.loss(logits, labels_placeholder)
    # Add to the Graph the Ops that calculate and apply gradients.
    train_op = mnist.training(loss, learning_rate)
    # Add the Op to compare the logits to the labels during evaluation.
    eval_correct = mnist.evaluation(logits, labels_placeholder)
    # Build the summary operation based on the TF collection of Summaries.
    summary_op = tf.merge_all_summaries()
    # Create a saver for writing training checkpoints.
    save = {}
    for i in range(len(weights)):
        save['weights' + str(i)] = weights[i]
    saver = tf.train.Saver(save)
    # Create a session for running Ops on the Graph.
    sess = tf.Session()
    init = tf.initialize_all_variables()
    sess.run(init)
    # load weights
    saver.restore(sess, restore_path)
    # Instantiate a SummaryWriter to output summaries and the Graph.
    summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
    temps_total = time.time()
    eval_translation(sess, eval_correct, images_placeholder, labels_placeholder, dataset.test)
I don't know what's wrong with my code, or why TensorFlow seems to ignore my images.
Can someone help me, please?
Thanks!
Your function translate_right doesn't work, because images[i][j] is just one pixel (a single value for greyscale images), so np.roll has nothing to shift and returns it unchanged.
You should use the axis argument of np.roll instead:
def translate_right(images, dev):
    # images has shape (batch, num_pixels): roll every image by dev positions along the pixel axis
    return np.roll(images, dev, axis=1)
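np.roll wraps around: on flattened (batch, 784) MNIST images, rolling along axis=1 shifts every image right by dev positions, and pixels pushed off the end of a row reappear at the start of the next row. To shift right and down without wrapping, one option is to reshape to (batch, 28, 28), copy the image block into its shifted position, and leave the vacated border at zero. This is a sketch assuming row-major flattened square images with a zero background; translate is a hypothetical name:

```python
import numpy as np


def translate(images, dx, dy, side=28):
    """Shift flattened square images right by dx and down by dy pixels, filling with zeros.

    images: array of shape (batch, side * side), one flattened image per row.
    dx, dy: non-negative shift amounts in pixels.
    """
    batch = images.shape[0]
    grid = images.reshape(batch, side, side)
    shifted = np.zeros_like(grid)
    if dx < side and dy < side:
        # Copy the top-left (side - dy) x (side - dx) block to the bottom-right.
        shifted[:, dy:, dx:] = grid[:, :side - dy, :side - dx]
    return shifted.reshape(batch, side * side)


# Tiny 3x3 example (side=3) so the effect is easy to see.
img = np.arange(1, 10, dtype=np.float32).reshape(1, 9)
print(translate(img, dx=1, dy=1, side=3).reshape(3, 3))
```

In the question's loop this would replace translate_right(images_feed, i) with, for example, translate(images_feed, i, i) for a diagonal shift.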

tensorflow: saving and restoring session

I am trying to implement a suggestion from the answers to:
Tensorflow: how to save/restore a model?
I have an object which wraps a tensorflow model in a sklearn style.
import tensorflow as tf
class tflasso():
    saver = tf.train.Saver()
    def __init__(self,
                 learning_rate=2e-2,
                 training_epochs=5000,
                 display_step=50,
                 BATCH_SIZE=100,
                 ALPHA=1e-5,
                 checkpoint_dir="./",
                 ):
        ...
    def _create_network(self):
        ...
    def _load_(self, sess, checkpoint_dir=None):
        if checkpoint_dir:
            self.checkpoint_dir = checkpoint_dir
        print("loading a session")
        ckpt = tf.train.get_checkpoint_state(self.checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            self.saver.restore(sess, ckpt.model_checkpoint_path)
        else:
            raise Exception("no checkpoint found")
        return
    def fit(self, train_X, train_Y, load=True):
        self.X = train_X
        self.xlen = train_X.shape[1]
        # n_samples = y.shape[0]
        self._create_network()
        tot_loss = self._create_loss()
        optimizer = tf.train.AdagradOptimizer(self.learning_rate).minimize(tot_loss)
        # Initializing the variables
        init = tf.initialize_all_variables()
        " training per se"
        getb = batchgen(self.BATCH_SIZE)
        yvar = train_Y.var()
        print(yvar)
        # Launch the graph
        NUM_CORES = 3  # Choose how many cores to use.
        sess_config = tf.ConfigProto(inter_op_parallelism_threads=NUM_CORES,
                                     intra_op_parallelism_threads=NUM_CORES)
        with tf.Session(config=sess_config) as sess:
            sess.run(init)
            if load:
                self._load_(sess)
            # Fit all training data
            for epoch in range(self.training_epochs):
                for (_x_, _y_) in getb(train_X, train_Y):
                    _y_ = np.reshape(_y_, [-1, 1])
                    sess.run(optimizer, feed_dict={self.vars.xx: _x_, self.vars.yy: _y_})
                # Display logs per epoch step
                if (1 + epoch) % self.display_step == 0:
                    cost = sess.run(tot_loss,
                                    feed_dict={self.vars.xx: train_X,
                                               self.vars.yy: np.reshape(train_Y, [-1, 1])})
                    rsq = 1 - cost / yvar
                    logstr = "Epoch: {:4d}\tcost = {:.4f}\tR^2 = {:.4f}".format((epoch + 1), cost, rsq)
                    print(logstr)
                    self.saver.save(sess, self.checkpoint_dir + 'model.ckpt',
                                    global_step=1 + epoch)
        print("Optimization Finished!")
        return self
When I run:
tfl = tflasso()
tfl.fit( train_X, train_Y , load = False)
I get output:
Epoch: 50 cost = 38.4705 R^2 = -1.2036
b1: 0.118122
Epoch: 100 cost = 26.4506 R^2 = -0.5151
b1: 0.133597
Epoch: 150 cost = 22.4330 R^2 = -0.2850
b1: 0.142261
Epoch: 200 cost = 20.0361 R^2 = -0.1477
b1: 0.147998
However, when I try to recover the parameters (even without killing the object):
tfl.fit( train_X, train_Y , load = True)
I get strange results. First of all, the loaded value does not correspond to the saved one.
loading a session
loaded b1: 0.1 <------- Loaded another value than saved
Epoch: 50 cost = 30.8483 R^2 = -0.7670
b1: 0.137484
What is the right way to load the saved variables, and perhaps inspect them first?
TL;DR: You should try to rework this class so that self._create_network() is called (i) only once, and (ii) before the tf.train.Saver() is constructed.
There are two subtle issues here, which are due to the code structure and the default behavior of the tf.train.Saver constructor. When you construct a saver with no arguments (as in your code), it collects the current set of variables in your program and adds ops to the graph for saving and restoring them. In your code, saver is a class attribute, so it is constructed when the class statement runs, before any tflasso instance exists and before _create_network() has ever been called; at that point there are no variables. As a result, the checkpoint should be empty.
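A framework-free toy model of that first issue: a list stands in for the graph's global variables, and the toy saver snapshots whichever variables exist when it is constructed, as the paragraph above describes for tf.train.Saver() with no arguments. ToySaver, create_network and the values are hypothetical:

```python
# Toy stand-ins, not the TensorFlow API.
GLOBAL_VARIABLES = []


class ToySaver:
    def __init__(self):
        # Like a default Saver: capture the variables that exist at construction time.
        self.var_names = list(GLOBAL_VARIABLES)

    def save(self, values):
        return {name: values[name] for name in self.var_names}


def create_network():
    GLOBAL_VARIABLES.extend(["w1", "b1"])


early_saver = ToySaver()  # built before the network, like the class attribute
create_network()
late_saver = ToySaver()   # built after the network

values = {"w1": 0.5, "b1": 0.147998}
print(early_saver.save(values))  # {}: nothing was captured
print(late_saver.save(values))   # {'w1': 0.5, 'b1': 0.147998}
```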
The second issue is that, by default, a saved checkpoint is a map from each variable's name to its current value. If you create two variables with the same name, TensorFlow automatically "uniquifies" the second name:
v = tf.Variable(..., name="weights")
assert v.op.name == "weights"
w = tf.Variable(..., name="weights")
assert w.op.name == "weights_1"  # The "_1" is added by TensorFlow.
The consequence is that, when you call self._create_network() in the second call to tfl.fit(), the variables will all have different names from the names stored in the checkpoint (or the names that would have been stored there, had the saver been constructed after the network). (You can avoid this behavior by passing a name-to-Variable dictionary to the saver constructor, but this is usually quite awkward.)
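The renaming can be mimicked without TensorFlow. This simplified toy graph hands out unique names in the same pattern (name, name_1, name_2, ...), so building the network a second time in the same graph produces names that no longer match the checkpoint's keys. ToyGraph, create_network and the variable names are hypothetical:

```python
class ToyGraph:
    """Hands out unique names in TensorFlow's pattern: 'w', 'w_1', 'w_2', ... (simplified)."""

    def __init__(self):
        self._counts = {}

    def unique_name(self, name):
        count = self._counts.get(name, 0)
        self._counts[name] = count + 1
        return name if count == 0 else f"{name}_{count}"


def create_network(graph):
    return [graph.unique_name("weights"), graph.unique_name("bias")]


graph = ToyGraph()
first = create_network(graph)               # first fit(): these names go into the checkpoint
checkpoint = {name: 0.0 for name in first}
second = create_network(graph)              # second fit(): same graph, new names
missing = [name for name in second if name not in checkpoint]
print(first)    # ['weights', 'bias']
print(second)   # ['weights_1', 'bias_1']
print(missing)  # ['weights_1', 'bias_1']: no saved values for these names
```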
There are two main workarounds:
In each call to tflasso.fit(), create the whole model afresh: define a new tf.Graph, then build the network and create a tf.train.Saver in that graph.
Recommended: create the network, and then the tf.train.Saver, in the tflasso constructor, and reuse this graph on each call to tflasso.fit(). You might need some more work to reorganize things (in particular, I'm not sure what you do with self.X and self.xlen), but it should be possible to achieve this with placeholders and feeding.
