Accessing Estimator evaluation results via SessionRunHooks - python

I'm trying to modify a program that uses the Estimator class in TensorFlow (v1.10) and I would like to access the evaluation metric results every time evaluation occurs so that I can copy the checkpoint files only when a new maximum has been achieved.
One idea I had was to create a class inheriting from SessionRunHook and do the work I want in the after_run method. According to the documentation, I can specify what is passed to after_run using before_run. However, I cannot find a way to access the evaluation metric results I want from the information passed in to before_run.
I looked into the Estimator code and it appears that it writes the results to a summary file, so another idea I had was to read this back in the after_run method, but the summary API doesn't seem to provide any read operations.
Are there any other ways I can achieve what I want to do? Not using the Estimator class is not an option as that would involve drastic changes to the code I'm working with.
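For reference, a rough skeleton of the hook I had in mind for the first idea (a sketch only; the tensor name fetched below is just a placeholder, and knowing what to fetch is exactly the part I'm missing):

import tensorflow as tf

class BestCheckpointHook(tf.train.SessionRunHook):

    def before_run(self, run_context):
        # Placeholder fetch; I don't know which tensor actually holds the eval metric.
        return tf.train.SessionRunArgs(fetches={"accuracy": "accuracy/value:0"})

    def after_run(self, run_context, run_values):
        accuracy = run_values.results["accuracy"]
        # ...compare against the best value so far and copy the checkpoint files...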

Checkpoints are not the same as exporting. Checkpoints are about fault-recovery and involve saving the complete training state (weights, global step number, etc.).
In your case I would recommend exporting. The exported model will be written to a directory called “exporter”, and the serving input function specifies what the end user will be expected to provide to the prediction service.
You can use the BestExporter class to export only the best-performing models:
https://www.tensorflow.org/api_docs/python/tf/estimator/BestExporter
This class exports the serving graph and checkpoints of the best models.
It also performs a model export every time the new model is better than any existing model.
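As an illustration, here is a minimal sketch of wiring a BestExporter into train_and_evaluate; estimator, train_input_fn and eval_input_fn are assumed to already exist, and the serving input function is a placeholder you would adapt to your own features. By default the exporter compares evaluation loss, so you can pass a custom compare_fn if "best" means a maximum of some other metric.

import tensorflow as tf

def serving_input_fn():
    # Hypothetical serving input; adjust the feature spec to your model.
    features = {"x": tf.placeholder(tf.float32, shape=[None, 28, 28])}
    return tf.estimator.export.ServingInputReceiver(features, features)

exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=serving_input_fn,
    exports_to_keep=5)  # keep only the five best exports

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=[exporter])

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)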

Related

keras demo code siamese_contrastive.py save and load model?

Following the demo code
"Image similarity estimation using a Siamese Network with a contrastive loss"
https://keras.io/examples/vision/siamese_contrastive/
I'm trying to save the model with model.save to h5 or hdf5; however, after using load_model (I even tried load_weights)
it shows the error message: unknown opcode.
My googling tells me this is a Python version problem between py3.5 and py3.6,
but I actually use only Python 3.8...
Other sources say some extra work needs to be done, either when building the model or when calling load_model.
It would be very kind if anyone could help provide the save-and-load-model part
to make this demo code more complete.
Thanks!!
Here they are actually using two separate pieces that have to be passed in as custom objects.
Custom objects:
the contrastive loss
the embedding layer, where euclidean_distance is computed.
Saving the model:
Saving the model is straightforward:
<model_name>.save("siamese_contrastive.h5")
Loading the model:
Here comes the tricky part: the model will not load directly, because it has no understanding of two things, your custom layer and your custom loss.
model = tf.keras.models.load_model('siamese_contrastive.h5', custom_objects={ })
In the custom_objects dict above, you have to provide the definitions of those two objects.
After that, it will accept your model and it will run fine at inference time.
Still figuring out how?
Have a look at my implementation and let me know if you still have any questions: https://github.com/anukash/Keras_siamese_contrastive
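As an illustration of the above, a minimal sketch of loading with the two custom objects filled in; the function names and bodies follow the Keras tutorial and are only approximations, so copy the exact definitions from your training code:

import tensorflow as tf

def euclidean_distance(vects):
    # Same distance used by the Lambda layer in the tutorial (approximate).
    x, y = vects
    sum_square = tf.math.reduce_sum(tf.math.square(x - y), axis=1, keepdims=True)
    return tf.math.sqrt(tf.math.maximum(sum_square, tf.keras.backend.epsilon()))

def loss(margin=1):
    # Wrapper returning a contrastive loss closed over the margin (approximate).
    def contrastive_loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        square_pred = tf.math.square(y_pred)
        margin_square = tf.math.square(tf.math.maximum(margin - y_pred, 0))
        return tf.math.reduce_mean(y_true * square_pred + (1 - y_true) * margin_square)
    return contrastive_loss

model = tf.keras.models.load_model(
    "siamese_contrastive.h5",
    custom_objects={
        "contrastive_loss": loss(margin=1),
        "euclidean_distance": euclidean_distance,
    },
)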

Is it possible to save the class/label mapping directly inside a keras model.h5 file?

Using model.save() I am able to save the trained model. However, when using the model for predictions, I still need to recover the respective class/label mapping (0: 'cat', 1: 'dog', ... etc.). Currently I save the .class_indices from my train_generator and reload it to prepare my test data. However, this is quite inconvenient, since it forces me to keep the mapping file somewhere safe for future use of my saved model.h5 file.
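For reference, my current workaround looks roughly like this (a sketch, assuming train_generator comes from ImageDataGenerator.flow_from_directory()):

import json

# After training: persist the class mapping next to the model file.
with open("class_indices.json", "w") as f:
    json.dump(train_generator.class_indices, f)  # e.g. {"cat": 0, "dog": 1}

# At prediction time: reload it and invert it to map indices back to labels.
with open("class_indices.json") as f:
    class_indices = json.load(f)
index_to_label = {index: label for label, index in class_indices.items()}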
Hence, I wonder if there is a simpler way of saving the class information i.e. saving it inside the model file. I can't find any information on this in the keras docs, only this post Attaching class labels to a Keras model where someone tried to come up with a 'workaround' but I assume there must be a better way.

Keras plot_model() function: More elaborate output

I'm using the Keras plot_model() function to visualize my machine learning models. Apart from having the issue that the first node of my output is always simply a very large number, there is another thing annoying me about this function: It does not provide a very elaborate output. For example, I would like to be able to see more information about the used loss function, the batch size, the number of epochs, the used optimizer, etc...
Is there any way I can retrieve this information from a model I previously saved to the disk and loaded again with the model_from_json() function?
How about the TensorBoard callback? It will create interactive graphs of your model that you can explore, provided you use TensorFlow as your backend.
You just need to add it as a callback to your fit function and make sure write_graph=True is set (which it is by default). If you want a shortcut, you can invoke its methods directly instead of passing it as a callback:
from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', write_graph=True)
tensorboard.set_model(model)  # your model here; writes the graph
tensorboard.on_train_end()    # closes the writer
Then just run tensorboard --logdir=./logs to start the server.
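For completeness, the callback route could look like this sketch (model, x_train and y_train are assumed to exist):

from keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir='./logs', write_graph=True)
model.fit(x_train, y_train, epochs=10, callbacks=[tensorboard])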

how to use tensorflow saver with multiple models?

I'm having a lot of trouble understanding the proper use of tf.train.Saver
I have a session where I create several distinct and separate network models. All models are trained and I save the best performing networks for later use.
However, when I try to restore a model at a later time I get an error which seems to indicate that some variables are either not getting saved or restored:
NotFoundError: Tensor name "Network_8/train/beta2_power" not found in checkpoint files networks/network_0.ckpt
For some reason, when I try to load the variables for Network_0, I'm being told I need variable information for Network_8.
What is the best way to make sure I only save/restore the correct variables from a multi-network session?
It seems part of my problem is that, while I have created a dict object for the variables I want to save (weights and biases) for each network, when I set up an optimizer such as the AdamOptimizer, TensorFlow automatically creates extra variables which need to be initialized. This is fine if you use tf.train.Saver to save ALL variables and you only have one network; however, I am training multiple networks and only saving the best results. I'm not sure how to add the variables TF automatically creates to my dict for saving.
My solution is to create a part_saver with the same tensor names in both the original model and the new model (i.e. Network_0 and Network_8) that only restores the needed variables:
part_saver = tf.train.Saver({"W":w,"b":b,...})
Initialize all the variables in Network_8 before restoring the partial model.
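A complementary sketch of how the optimizer's hidden variables (like beta2_power) can be picked up automatically: if each network and its optimizer are built inside their own tf.variable_scope, you can collect everything under that scope instead of listing tensors by hand. The scope and checkpoint names below mirror the question and are otherwise assumptions:

import tensorflow as tf

# Assume each network (and its AdamOptimizer) was built inside its own scope,
# e.g. with tf.variable_scope("Network_0"): ...

# Collect every variable under that scope, including the beta1_power/beta2_power
# slots Adam creates, and give the network its own saver.
net0_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="Network_0")
net0_saver = tf.train.Saver(var_list=net0_vars)

with tf.Session() as sess:
    sess.run(tf.variables_initializer(net0_vars))
    # ...train Network_0...
    net0_saver.save(sess, "networks/network_0.ckpt")
    # Later, in a session where the same graph has been built:
    # net0_saver.restore(sess, "networks/network_0.ckpt")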

Most efficient way to save best performing TensorFlow model on validation set while training with thread for data loading

OK, it's so easy in Torch ;) and I am following the indico example for threading to load the data: https://indico.io/blog/tensorflow-data-input-part2-extensions/
So far I have found three ways, none of which I like, and I am sure there is a better way.
1) Train and evaluate/validate in two different applications/runs: tensorflow/models/image/cifar10/cifar10_train.py and cifar10_eval.py
I don't like this one because I will waste resources, i.e. the GPUs where cifar10_eval.py runs. I could do both from one file or application, but I don't want to save the model if it is not the best-performing one!
2) Create a validation model with weight sharing: tensorflow/models/image/mnist/convolutional.py
Much better, but I don't like the fact that I need to remember all the model parameters. I am sure there is a better way to share parameters in TensorFlow, i.e. can I just copy the model and say it's for parameter sharing, but with different input feeds?
3) The one I am currently using is tf.placeholder
But I can't do threading things, i.e. tf.RandomShuffleQueue, with this approach. Maybe I just don't know how to do it with this approach.
So, how can I do threading to load the training data and run one epoch of training, then use these weights and again use threading to load the validation data and get the model's performance?
Basically, I am asking for multiple threads to load the train and validation data while saving the best-performing model. An example EXACTLY like ImageNet multi-GPU training in Torch: https://github.com/soumith/imagenet-multiGPU.torch
Thank you so much!
The variable-sharing approach is probably the easiest way to do what you want.
Take a look at the "Sharing Variables" tutorial; by using tf.variable_scope() and tf.get_variable() you can reuse variables without having to manage the sharing explicitly. You can define the model in a function, call it with different arguments, and share the model variables between the two calls.
There are also convenience libraries that wrap TensorFlow's variable management. One option is TensorFlow-Slim, which makes it easier to define some classes of models (especially convolutional models).
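A minimal sketch of that variable-sharing pattern (the layer shape and the train_batch/valid_batch input tensors are placeholders):

import tensorflow as tf

def inference(inputs, reuse=False):
    # All variables live under the "model" scope; reuse=True makes the second
    # call pick up the existing weights instead of creating new ones.
    with tf.variable_scope("model", reuse=reuse):
        w = tf.get_variable("w", shape=[784, 10])
        b = tf.get_variable("b", shape=[10], initializer=tf.zeros_initializer())
        return tf.matmul(inputs, w) + b

train_logits = inference(train_batch)              # creates the variables
valid_logits = inference(valid_batch, reuse=True)  # shares the same variables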
