TensorFlow allows us to save/load a model's structure using the method tf.train.write_graph, so that we can restore it in the future to continue our training session. However, I'm wondering whether this is necessary, because I could create a module, e.g. GraphDefinition.py, and use this module to re-create the model.
So, which is the better way to save the model structure, or is there any rule of thumb that suggests which way I should use when saving a model?
First of all, you have to understand that a TensorFlow graph does not hold the current weights (until you save them into it manually), so if you load the model structure from graph.pb, you will start your training from the very beginning. If you want to continue training or use your trained model, you have to save a checkpoint (using tf.train.Saver) with the values of the variables in it, not only the structure.
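A minimal TF1-style sketch of saving both pieces (all paths here are placeholders):

import tensorflow as tf

# Build your model here, then:
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps ...
    tf.train.write_graph(sess.graph_def, 'models/', 'graph.pb')  # structure only
    saver.save(sess, 'models/model.ckpt')  # variable values (the trained weights)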
Check out this thread: Tensorflow: How to restore a previously saved model (python)
I have defined a deep learning model my_unet() in TensorFlow. During training I set save_weights_only=False, since I wanted to save the entire model (not only the weights but the whole configuration). The generated file is path_to_model.hdf5.
However, when loading the model back I used an earlier version of my code (I forgot to update it), in which I first call the model and then load the weights:
model = my_unet()
model.load_weights('path_to_model.hdf5')
Instead of simply using model = tf.keras.models.load_model('path_to_model.hdf5') to load the entire model.
Both ways of loading the model and the weights produced the same predictions when run on some dummy data, and there were no errors.
My question is: why does loading the entire model with model.load_weights() not cause any problem? What is the structure of the HDF5 file, and how exactly do these two ways of loading work? Where can I find this information?
Please see the HDF5 documentation here for future reference: http://davis.lbl.gov/Manuals/HDF5-1.8.7/UG/03_DataModel.html
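To see what the file actually contains, you can peek inside it with h5py (a minimal sketch; the file name is the one from the question). A full-model HDF5 file stores the architecture as a JSON attribute (model_config) next to a model_weights group, which is why load_weights() can still find the weights in it:

import h5py

with h5py.File('path_to_model.hdf5', 'r') as f:
    print(list(f.keys()))               # e.g. ['model_weights', 'optimizer_weights']
    print(f.attrs.get('model_config'))  # JSON description of the architecture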
I have been searching for a method to do this for a long time, and I cannot find an answer. Most threads I found are of people wanting to do the opposite.
Backstory:
I am experimenting with some pre-trained models provided by the tensorflow/models repository. The models are saved as .pb frozen graphs. I want to fine-tune some of these models by changing the final layers to suit my application.
Hence, I want to load the models inside a Jupyter notebook as a normal Keras H5 model.
How can I do that?
Or do you have a better way to do so?
Thanks.
It seems like all you would have to do is download the model files and store them in a directory, for example c:\models. Then load the model (this works when the directory contains a SavedModel, i.e. a saved_model.pb plus a variables folder; a lone frozen-graph .pb cannot be loaded this way):
import os
import tensorflow as tf

model = tf.keras.models.load_model(r'c:\models')
model.summary()  # prints out the model layers

# modify the model as you typically do for transfer learning
# compile the changed model
# train the model

# save the trained model as a .h5 file
save_dir = r'path to the directory you want to save the model to'
model_identifier = 'abcd.h5'  # for abcd use whatever identification you want
save_path = os.path.join(save_dir, model_identifier)
model.save(save_path)
Which method is best: saving model checkpoints, or saving the entire model to disk each epoch? Why does nobody save the entire model?
A Keras model has two things: an architecture and weights. If you save the whole model at each checkpoint, you are saving the architecture every time. For this reason, the best approach during training is to save only the weights and keep the architecture in memory.
The tensorflow.keras documentation has more about the other methods.
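A minimal sketch of the weights-only approach with a checkpoint callback (the file path, model, and data names are assumptions):

import tensorflow as tf

checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath='checkpoints/weights_epoch_{epoch:02d}.h5',
    save_weights_only=True,  # only weights hit disk; the architecture stays in memory
)
# model.fit(x_train, y_train, epochs=10, callbacks=[checkpoint_cb])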
Checkpoints are used to save your model in case your system crashes or your code is interrupted during training, so that when you start training again after a crash you don't have to start from scratch. Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model.
The SavedModel format, on the other hand, includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model.
You can see the above info in the official TensorFlow documentation.
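A small sketch contrasting the two formats (the toy model and paths are assumptions):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

model.save_weights('ckpt/my_checkpoint')  # checkpoint: parameter values only
model.save('saved_model/my_model')        # SavedModel: computation plus parameters

# Restoring a checkpoint requires rebuilding the model from source code first:
rebuilt = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
rebuilt.load_weights('ckpt/my_checkpoint')

# A SavedModel can be loaded without the code that created it:
restored = tf.keras.models.load_model('saved_model/my_model')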
I have a huge TensorFlow model (the checkpoint file is 4-5 GB). I was wondering if there's a different way to save TensorFlow models, besides checkpoints, that is more space/memory efficient.
I know that a checkpoint file also saves all the optimizer state (such as gradient accumulators), so maybe that can be cut out too.
My model is very simple, just two embedding matrices, so perhaps I could save just those matrices to .npy directly?
What you want to do with the checkpoint is to freeze it. Check out this page from TensorFlow's official documentation.
The freezing process strips from the checkpoint all extraneous information that isn't used for forward inference. TensorFlow provides an easy-to-use script for it called freeze_graph.py.
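A hedged example invocation (all file and node names here are placeholders you would replace with your own):

python -m tensorflow.python.tools.freeze_graph \
    --input_graph=graph.pb \
    --input_checkpoint=model.ckpt \
    --output_node_names=output_node \
    --output_graph=frozen_graph.pb

The resulting frozen_graph.pb keeps only the weights and ops needed for inference; the optimizer state is dropped, which is often the bulk of a checkpoint saved during training with Adam.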
I'm having a lot of trouble understanding the proper use of tf.train.Saver.
I have a session where I create several distinct and separate network models. All models are trained and I save the best performing networks for later use.
However, when I try to restore a model at a later time I get an error which seems to indicate that some variables are either not getting saved or restored:
NotFoundError: Tensor name "Network_8/train/beta2_power" not found in checkpoint files networks/network_0.ckpt
For some reason, when I try to load the variables for Network_0, I'm being told I need variable information for Network_8.
What is the best way to make sure I only save/restore the correct variables from a multi-network session?
It seems part of my problem is that, while I have created a dict of the variables I want to save (weights and biases) for each network, when I set up an optimizer such as AdamOptimizer, TensorFlow automatically creates extra variables which need to be initialized. This is fine if you use tf.train.Saver to save ALL variables and you only have one network; however, I am training multiple networks and only saving the best results. I'm not sure how to add the variables TF auto-creates to my dict for saving.
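One hedged way around this (the scope name and shapes are assumptions): build each network, including its optimizer, inside its own variable scope, then collect everything under that scope instead of maintaining the dict by hand:

import tensorflow as tf

with tf.variable_scope("Network_0"):
    w = tf.get_variable("W", shape=[784, 10])
    b = tf.get_variable("b", shape=[10])
    # ... build the rest of the network and create its AdamOptimizer here,
    # so Adam's slot variables land under this scope as well ...

network_0_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="Network_0")
saver_0 = tf.train.Saver(var_list=network_0_vars)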
My solution is to create a part_saver using the same tensor names in both the original model and the new model (i.e. Network_0 and Network_8), which only restores the needed variables:
part_saver = tf.train.Saver({"W": w, "b": b, ...})
Initialize all the variables in Network_8 before restoring the partial model.
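A minimal sketch of that order of operations (tensor names taken from the snippet above):

import tensorflow as tf

part_saver = tf.train.Saver({"W": w, "b": b})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # init everything, including optimizer slots
    part_saver.restore(sess, "networks/network_0.ckpt")  # then overwrite just W and b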