Python/Tensorflow: Saver for network

Python/Tensorflow: Saver for network - python

According to the documentation and numerous SO posts regarding this API, the saver object must be created using
saver = tf.train.Saver(...variables...)
I wanted to know if there is any way to automatically populate the (...variables...) without having to explicitly list all variables and ops used in my network.
Right now my network is only two layers so it is not a huge hassle, but it feels downright stone-age like to have to list all the variables manually.

The default initializer for tf.train.Saver will create an instance that saves/restores all saveable objects in your graph, which typically includes all of your model variables. Therefore you should be able to write:
saver = tf.train.Saver()
…and get the desired effect without too much trouble.

Related

How to grab one tensor from an existing model and use it in another one?

What I want to do is to grab some weights and biases from an existing trained model, and then use them in my customized op (model or graph).
I can restore model with:
# Create context
with tf.Graph().as_default(), tf.Session() as sess:
# Create model
with tf.variable_scope('train'):
train_model = MyModel(some_args)
And then grab tensor:
latest_ckpt = tf.train.latest_checkpoint(path)
if latest_ckpt:
saver.restore(sess, latest_ckpt)
weight = tf.get_default_graph().get_tensor_by_name("example:0")
My question is, if I want to use that weight in another context (model or graph), how to safely copy its value to the new graph, e.g.:
with self.test_session(use_gpu=True, graph=ops.Graph()) as sess:
with vs.variable_scope("test", initializer=initializer):
# How can I make it possible?
w = tf.get_variable('name', initializer=weight)
Any help is welcome, thank you so much.
Thanks #Sorin for the inspiration, I found a simple and clean way to do this:
z = graph.get_tensor_by_name('prefix/NN/W1:0')
with tf.Session(graph=graph) as sess:
z_value = sess.run(z)
with tf.Graph().as_default() as new_graph, tf.Session(graph=new_graph) as sess:
w = tf.get_variable('w', initializer=z_value)

The hacky way is to use tf.assign to assign the weight to the variable you want (make sure it only happens once at the begining, and not every iteration, otherwise the model won't be able to adjust those weights).
The slightly less hacky way is to load the graph and session of the trained model and modify the graph to add the operations you need. This will make the graph a bit more messy since you also have the entire graph of the original model, but it's a bit cleaner since you can depend directly on the operations instead of the weights (that is if the original model was doing a sigmoid activation, this will copy the activation as well). The unused parts of the graph will be automatically pruned by tensorflow.
The clean way to do it is to use www.tenforflow.com/hub . It's a library that allows you to define parts of the graph as modules that you can export and import into any graph. This will handle all dependencies and configuration and also gives you nice controls over the training (i.e. if you want to freeze the weights, or delay the training for some number of iterations, etc.)

Using tf.data.Dataset makes saved model bigger

I recently have an issue with saving the model in a bigger size.
I am using tensorflow 1.4
Before, I used
tf.train.string_input_producer() and tf.train.batch()
to load images from a text file. And in the training,
tf.train.start_queue_runners() and tf.train.Coordinator()
were used to provide data to the network. In this case, every time I saved the model using
saver.save(sess, checkpoint_path, global_step=iters)
only gave me a small size file, i.e. a file named model.ckpt-1000.data-00000-of-00001 with 1.6MB.
Now, I use
tf.data.Dataset.from_tensor_slices()
to supply images to an input placeholder and the saved model become 290MB. But I don't know why. I suspect the tensorflow saver saved the dataset into the model as well. If so, how to remove them to make it smaller, and only the weights of the network are saved.
This is not network depended because I tried in two networks and they were all like that.
I have googled but unfortunately didn't see any inspiration related to this issue. (Or this is not an issue, just I don't know how do?)
Thank you very much for any idea and help!
Edit
The method I initialised the dataset is:
1.First generated numpy.array dataset:
self.train_hr, self.train_lr = cifar10.load_dataset(sess)
The initial dataset is numpy.array, for example [8000,32,32,3]. I passed sess into this function is because in the function, I did tf.image.resize_images() and use sess.run() to generate numpy.array. The returns self.train_hr and self.train_lr are numpy.array in shape [8000,64,64,3].
2.Then I created the dataset:
self.img_hr = tf.placeholder(tf.float32)
self.img_lr = tf.placeholder(tf.float32)
dataset = tf.data.Dataset.from_tensor_slices((self.img_hr, self.img_lr))
dataset = dataset.repeat(conf.num_epoch).shuffle(buffer_size=conf.shuffle_size).batch(conf.batch_size)
self.iterator = dataset.make_initializable_iterator()
self.next_batch = self.iterator.get_next()
3.Then I initialised network and dataset, did the training and saved model:
self.labels = tf.placeholder(tf.float32,
shape=[conf.batch_size, conf.hr_size, conf.hr_size, conf.img_channel])
self.inputs = tf.placeholder(tf.float32,
shape=[conf.batch_size, conf.lr_size, conf.lr_size, conf.img_channel])
self.net = Net(self.labels, self.inputs, mask_type=conf.mask_type,
is_linear_only=conf.linear_mapping_only, scope='sr_spc')
sess.run(self.iterator.initializer,
feed_dict={self.img_hr: self.train_hr, self.img_lr: self.train_lr})
while True:
hr_img, lr_img = sess.run(self.next_batch)
_, loss, summary_str = sess.run([train_op, self.net.loss, summary_op],
feed_dict={self.labels: hr_img, self.inputs: lr_img})
...
...
checkpoint_path = os.path.join(conf.model_dir, 'model.ckpt')
saver.save(sess, checkpoint_path, global_step=iters)
All the sess are the same instance.

I suspect you created a tensorflow constant tf.constant out of your dataset, which would explain why the dataset gets stored with the graph. There is an initializeable dataset which let's you feed in the data using feed_dict at runtime. It's a few extra lines of code to configure but it's probably what you wanted to use.
https://www.tensorflow.org/programmers_guide/datasets
Note that constants get created for you automatically in the Python wrapper. The following statements are equivalent:
tf.Variable(42)
tf.Variable(tf.constant(42))

Tensorflow indeed saves your dataset. To solve it, lets understand why.
How tensorflow works and what does it save?
In short, Tensorflow API lets you build a computation graph via code, and then optimize it. Every op/variable/constant you define in the graph is working on tensors and is part of that graph. This framework is convenient since Tensorflow just build a graph, then the framework decides (or you specify) where to compute the graph in order to gain maximum speed out of your hardware, for instance, by computing on your GPU.
The GPU is a great example since this is a great example for your issue. Sending data from HDD/RAM/Processor to GPU is expensive time-wise. Therefore, Tensorflow also allow you to create input producers that will pretty much automatically manage the data transferred between all peripheral units, by queuing them and managing threads. However, I haven't seen much gain from that approach. Note that the inputs produced by datasets are also tensors, specifically constants/variables that are used as input to the network.. Therefore, they are part of the graph.
When saving a graph, we save several things:
Metadata - which defines the graph and its structure.
Values - of each variable/constant in the graph, in order to load it and reuse the network.
When you use datasets, the values of the non-trainable variables are saved, and therefore, your checkpoint file is larger.
To better understand datasets, see its implementation in the package files.
TL;DR - How do I fix my problem?
If its not reducing performance, use feeding dictionary to feed placeholders. Do not use tensors to store your data. This way those variables will not be saved.
Save only tensors that you would like to load (weights, biases, etc). You can user .eval() method to find its values, save it as JSON or such, and load it later by reconstructing the graph.
Good luck!

I solved this issue (not perfectly as I still don't know where the problem happens). Instead, I made a workaround to avoid saving a large amount of data.
I defined a saver fed in a specific list of variables. That list only contains the nodes of my graph. Here I show a small example of my workaround:
import tensorflow as tf
v1= tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
v2= tf.Variable(tf.zeros([200]), name="v2")
saver = tf.train.Saver( [v2])
# saver = tf.train.Saver()
with tf.Session() as sess:
init_op = tf.global_variables_initializer()
sess.run(init_op)
saver.save(sess,"checkpoint/model_test",global_step=1)
v2 is the variable list. Or you can use variables_list = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='net') to collect all the nodes.

Loading SavedModel is a lot slower than loading a tf.train.Saver checkpoint

I changed from tf.train.Saver to the SavedModel format which surprisingly means loading my model from disk is a lot slower (instead of a couple of seconds it takes minutes). Why is this and what can I do to load the model faster?
I used to do this:
# Save model
saver = tf.train.Saver()
save_path = saver.save(session, model_path)
# Load model
saver = tf.train.import_meta_graph(model_path + '.meta')
saver.restore(session, model_path)
But now I do this:
# Save model
builder = tf.saved_model.builder.SavedModelBuilder(model_path)
builder.add_meta_graph_and_variables(session, [tf.saved_model.tag_constants.TRAINING])
builder.save()
# Load model
tf.saved_model.loader.load(session, [tf.saved_model.tag_constants.TRAINING], model_path)

I am by no ways an expert in Tensorflow, but if I had to take a guess as to why this is happening, I would say that:
tf.train.Saver(), saves a complete meta-graph. Therefore, all the information needed to perform any operations contained in your graph is already there. All tensorflow needs to do to load the model, is insert the meta-graph into the default/current graph and you're good to go.
The SavedModelBuilder() on the other hand, behind the scene creates a language agnostic representation of your operations and variables. Which means that the loading method has to extract all the information, then recreate all the operation and variables from your previous graph, and insert them into the default/current graph.
Depending on the size of your graph, recreating everything that it contained might take some time.
Concerning the second question, as #J H said, if there are no reasons for you to use one strategy over the other, and time is of the essence, then just go with the fastest one.

what can I do to load the model faster?
Switch back to tf.train.Saver, as your question shows no motivations for using SavedModelBuilder, and makes it clear that elapsed time matters to you. Alternatively, an MCVE that reproduced the timing issue would allow others to collaborate with you on profiling, diagnosing, and fixing any perceived performance issue.

how to use tensorflow saver with multiple models?

I'm having a lot of trouble understanding the proper use of tf.train.Saver
I have a session where I create several distinct and separate network models. All models are trained and I save the best performing networks for later use.
However, when I try to restore a model at a later time I get an error which seems to indicate that some variables are either not getting saved or restored:
NotFoundError: Tensor name "Network_8/train/beta2_power" not found in checkpoint files networks/network_0.ckpt
for some reason, when I try and load the variables for Network_0 I'm being told I need variable information for Network_8.
What is the best way to make sure I only save/restore the correct variables from a multi-network session?
It seems part of my problem is that, while I have created a dict object for the Variables I want to save (weights and biases) for each network, when I setup an optimizer such as the AdamOptimizer tensorflow automatically creates extra variables which need to be initialized. This is fine if you use tf.train.Saver to save ALL variables and you only have one network, however I am training multiple networks and only saving the best results. I'm not sure how to specify the variables tf auto adds to my dict for saving.

My solution is to create a part_saver with the same tensor name both in the original model and the new model (i.e. Network_0 and Network_8) which only restores the needed variables.
part_saver = tf.train.Saver({"W":w,"b":b,...})
Init all the variables in Network_8 before restoring the partial model.

Restoring graph in tensorflow fails because there is no variable to save

I know that there are countless questions on stack and github, etc. on how to restore a trained model in Tensorflow. I have read most of them (1,2,3).
I have almost exactly the same problem as 3 however I would like if possible to solve it in a different fashion as my training and my test need to be in separate scripts called from the shell and I do not want to add the exact same lines I used to define the graph in the test script so I cannot use tensorflow FLAGS and the other answers based on reruning the graph by hand.
I also do not want to sess.run every variables and manually map them by hands as it was explained as my graph is quite big (Using import_graph_def with the arguments input_map).
So I run some graph and train it in a specific script. Like for instance (but without the training part)
#Script 1
import tensorflow as tf
import cPickle as pickle
x=tf.Variable(42)
saver=tf.train.Saver()
sess=tf.Session()
#Saving the graph
graph_def=sess.graph_def
with open('graph.pkl','wb') as output:
pickle.dump(graph_def,output,HIGHEST_PROTOCOL)
#Training the model
sess.run(tf.initialize_all_variables())
#Saving the variables
saver.save(sess,"pretrained_model.ckpt")
I now have both graph and variables saved so I should be able to run my test model from another script even if I have extra training nodes in my graph.
#Script 2
import tensorflow as tf
import cPickle as pickle
sess=tf.Session()
with open('graph.pkl','rb') as input:
graph_def=pickle.load(input)
tf.import_graph_def(graph_def,name='persisted')
Then obviously I want to restore the variables using a saver but I encounter the same problem as 3 as there are no variables found to save to even create a saver. So I cannot write:
saver=tf.train.Saver()
saver.restore(sess,"pretrained_model.ckpt")
Is there a way to bypass those limitations ? I thought by importing graph it would recover the uninitialized variables in every node but it seems not. Do I really need to rerun it a second time like most of the answers given ?

The list of variables is saved in a Collection which is not saved in the GraphDef. Saver by default uses the list in ops.GraphKeys.VARIABLES collection (accessible through tf.all_variables()), and if you restored from GraphDef rather than using Python API to build your model, that collection is empty. You could specify the list of variables manually in tf.train.Saver(var_list=['MyVariable1:0', 'MyVariable2:0',...]).
Alternatively instead of GraphDef you could use MetaGraphDef which saves collections, there's a recently added MetaGraphDef HowTo

To my knowledge and my tests you can't simply pass names to tf.train.Saver object. It must be either list of variables o dictionary.
I would also like to read model from graph_def and then load variables using saver - however attempting it results only in error message: "Variable to save is not a variable"

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.