Is it possible to load only weights to TF-TRT model?

Is it possible to load only weights to TF-TRT model? - python

I have two models with the exact same architecture, but different weights as the same network is used for two different problems. We're using TF-TRT to optimize the model in order to use it on edge devices.
We'd like to be able to switch from one model to the other as fast as possible. As of now, we load the next model using tf.saved_model.load(), however, this reloads the entire model including the architecture. In order to speed up the process, we'd like to simply load the weights & switch them in the model architecture.
From what I've seen, it is possible in Keras by loading a .w1 file, but we don't have such file after converting to TF-TRT.
I've found out that TRT has a Refitter object but I don't think we can use it in this case.
I'd like to know if it is possible to switch weights of a TF-TRT model, perhaps there is something I'm missing out.
Thank you for your help.

Related

How can I "see" the model/network when loading a model from tfhub?

I'm new to this topic, so forgive me my lack of knowledge. There is a very good model called inception resnet v2 that basically works like this, the input is an image and outputs a list of predictions with their positions and bounded rectangles. I find this very useful, and I thought of using the already worked model in order to recognize things that it now can't (for example if a human is wearing a mask or not). Yes, I wanted to add a new recognition class to the model.
import tensorflow as tf
import tensorflow_hub as hub
mod = hub.load("https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1")
mod is an object of type
tensorflow.python.training.tracking.tracking.AutoTrackable, reading the documentation (that was only available on the source code was a bit hard to understand without context)
and I tried to inspect some of it's properties in order to see if I could figure it out by myself.
And well, I didn't. How can I see the network, the layers, the weights? the fit methods, Is it's all abstracted away?. Can I convert it to keras? I want to experiment with it, see if I can modify it, and see if I could export the model to another representation, for example pytorch.
I wanted to do this because I thought it'd be better to modify an already working model instead of creating one from scratch. Also because I'm not good at training models myself.

I've run into this issue too. Tensorflow hub guide says:
This error frequently arises when loading models in TF1 Hub format with the hub.load() API in TF2. Adding the correct signature should fix this problem.
mod = hub.load(handle).signatures['default']
As an example, you can see this notebook.

You can dir the loaded model asset to see what's defined on it
m = hub.load(handle)
dir(model)
As mentioned in the other answer, you can also look at the signatures with print(m.signatures)
Hub models are SavedModel assets and do not have a keras .fit method on them. If you want to train the model from scratch, you'll need to go to the source code.
Some models have more extensive exported interfaces including access to individual layers, but this model does not.

Improving a pre-trained tensorflow object detection model

I want to use tensorflow for detecting cars in an embedded system, so I tried ssd_mobilenet_v2 and it actually did pretty well for me, except for some specific car types which are not very common and I think that is why the model does not recognize them. I have a dataset of these cases and I want to improve the model by fine-tuning it. I should also note that I need a .tflite file because I'm using tflite_runtime in python.
I followed these instructions https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10 and I could train the model and reached a reasonable loss value. I then used export_tflite_ssd_graph.py in the object detection API to build inference_graph from the trained model. Afterwards I used toco tool to build a .tflite file out of it.
But here is the problem, after I've done all that; not only the model did not improve, but now it does not detect any cars. I got confused and do not know what is the problem, I searched a lot and did not find any tutorial about doing what I need to do. They just added a new object to a model and then exported it, which I tried and I was successful doing that. I also tried to build a .tflite file without training the model and directly from the Tensorflow detection model zoo and it worked fine. So I think the problem has something to do with the training process. Maybe I am missing something there.
Another thing that I did not find in documents is that whether is it possible to "add" a class to the current classes of an object detection model. For example, let's assume the mobilenet ssd v2 detects 90 different object classes, I would like to add another class so that the model detects 91 different classes instead of 90 classes. As far as I understand and tested after doing transfer learning using object detection API, I could only detect the objects that I had in my dataset and the old classes will be gone. So how do I do what I explained?

I found out that there is no way to 'add' a class to the previously trained classes but with providing a little amount of data of that class you can have your model detect it. The reason is that the last layer of the model changes when transfer learning is applied. In my case I labeled around 3k frames containing about 12k objects because my frames would be complicated. But for simpler tasks as I saw in tutorials 200-300 annotated images would be enough.
And for the part that the model did not detect anything it has something to do with the convert command that I used. I should have used tflite_convert instead of toco. I explained more here.

Most space/memory efficient way to save Tensorflow model for prediction only?

I have a huge Tensorflow model (the checkpoint file is 4-5 gbs). I was wondering if there's a different way to save Tensorflow models, besides the checkpoint way, that is space/memory efficient.
I know that a checkpoint file also saves all the optimizer gradients, so maybe those can be cut out too.
My model is very simple, just two matrices of embeddings, perhaps I can only save those matrices to .npy directly?

What you want to do with the checkpoint is to freeze it. Check out this page from tensorflow's official documentation.
The freezing process strips off all extraneous information from the checkpoint that isn't used for forward inference. Tensorflow provides an easy to use script for it called freeze_graph.py.

Unable to load and use multiple keras models

I'm trying to load three different models in the same process. Only the first one works as expected, the rest of them return like random results.
Basically the order is as follows:
define and compile first model
load trained weights before
rename layers
the same process for the second model
the same process for the third model
So, something like:
model1 = Model(inputs=Input(shape=input_size_im) , outputs=layers_firstmodel)
model1.compile(optimizer='sgd', loss='mse')
model1.load_weights(weights_first, by_name=True)
# rename layers but didn't work
model2 = Model(inputs=Input(shape=input_size_im) , outputs=layers_secondmodel)
model2.compile(optimizer='sgd', loss='mse')
model2.load_weights(weights_second, by_name=True)
# rename layers but didn't work
model3 = Model(inputs=Input(shape=input_size_im) , outputs=layers_thirdmodel)
model3.compile(optimizer='sgd', loss='mse')
model3.load_weights(weights_third, by_name=True)
# rename layers but didn't work
for im in list_images:
results_firstmodel = model1.predict(im)
results_secondmodel = model2.predict(im)
results_thirdmodel = model2.predict(im)
I'd like to perform some inference over a bunch of images. To do that the idea consists in looping over the images and perform inference with these three algorithms, and return the results.
I have tried to rename all layers to make them unique with no success. Also I created a different graph for each network, and with a different session do the inference. This works but it's very inefficient (in addition I have to set their weights every time because of sess.run(tf.global_variables_initializer()) removes them). Each time it's created a session tensorflow prints "creating tensorflow device (/device:GPU:0)".
I am running Tensorflow 1.4.0-rc0, Keras 2.1.1 and Ubuntu 16.04 kernel 4.14.

The OP is correct here. There is a serious bug when you try to load multiple weight files in the same script. The above answer doesn't solve this. If you actually interrogate the weights when loading weights for multiple models in the same script you will notice that the weights are different than when you just load weights for one model on its own. This is where the randomness is the OP observes coming from.
EDIT: To solve this problem you have to encapsulate the model.load_weight command within a function and the randomness that you are experiencing should go away. The problem is that something weird screws up when you have multiple load_weight commands in the same script like you have above. If you load those model weights with a function you issues should go away.

From the Keras docs we have this explanation for the user of load_weights:
loads the weights of the model from a HDF5 file (created by save_weights). By default, the architecture is expected to be unchanged. To load weights into a different architecture (with some layers in common), use by_name=True to load only those layers with the same name.
Therefore, if your architecture is unchanged you should drop the by_name=True or make it False (its default value). This could be causing the inconsistencies that you are facing, as your weights are not being loaded probably due to having different names on your layers.
Another important thing to consider is the nature of your HDF5 file, and the way you created it. If it indeed contains only the weights (created with save_weights as the docs point out) then there should be no problem in proceeding as explained before.
Now, if that HDF5 contains weights and architecture in the same file, then you should be loading it with keras.models.load_model instead (further reading if you like here). If this is the case then this would also explain those inconsistencies.
As a side suggestion, I prefer to save my models using Callbacks, like the ModelCheckpoint or the EarlyStopping if you want to automatically determine when to stop training. This not only gives you greater flexibility when training and saving your models (as you can stop them on the optimal training epoch or when you desire), but also makes loading those models easily, as you can simply use the load_model method to load both architecture and weights to your desired variable.
Finally, here is one useful SO post where saving (and loading) Keras models is explained.

Most efficient way to save best performing TensorFlow model on validation set while training with thread for data loading

OK, it's so easy in Torch ML ;) and I am following indico example for threading to load the data- https://indico.io/blog/tensorflow-data-input-part2-extensions/
So, for I found three ways, which I don't like and I am sure there is a better way.
1) Train and evaluate\validated on two different application\app\run- tensorflow/models/image/cifar10/cifar10_train.py and cifar10_eval.py
I don't like this one because I will waste resources i.e. GPUs where cifar10_eval.py will run. I can do this both from one file or application but don't like to save if model is not the best performing model!
2) Create validation model with weight sharing- tensorflow/models/image/mnist/convolutional.py
Much better but I dont like the fact that I need to remember all the model parameters, I am sure there is a better way to share parameters in TensorFlow i.e. can I just copy the model and say it's for parameters sharing but input feeds are different?
3) The one currently I am doing is using tf.placeholder
But can't do threading things i.e. tf.RandomShuffleQueue with this approach. May be I don't know how to do via this approach.
So, how could I do, threading to load train data and do one epoch of training then use these weights and again do threading to load validation data and get the model performance?
Basically, I am saying multi-threads to load train and valid data and save the best peforming model. Example EXACTLY similar to imagenet multi GPU training in torch- https://github.com/soumith/imagenet-multiGPU.torch
Thank you so much!

The variable-sharing approach is probably the easiest way to do what you want.
Take a look at the "Sharing Variables" tutorial; by using tf.variable_scope() and tf.get_variable() you can reuse variables without having to manage the sharing explicitly. You can instead define the model in a function, call it with different arguments, but share the model variables between the two calls.
There are also convenience layers that wrap Tensorflow's variable management. One option is Tensorflow Slim, which makes it easier to define some classes of models (especially convolutional models).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.