convert .pb model into quantized tflite model

convert .pb model into quantized tflite model - python

Totally new to Tensorflow,
I have created one object detection model (.pb and .pbtxt) using 'faster_rcnn_inception_v2_coco_2018_01_28' model I found from TensorFlow zoo. It works fine on windows but I want to use this model on google coral edge TPU. How can I convert my frozen model into edgetpu.tflite quantized model?

There are 2 more steps to this pipeline:
1) Convert the .pb -> tflite:
I won't go through details since there are documentation on this on tensorflow official page and it changes very often, but I'll still try to answer specifically to your question. There are 2 ways of doing this:
Quantization Aware Training: this happens during training of the model. I don't think this applies to you since your question seems to indicates that you were not aware of this process. But please correct me if I'm wrong.
Post Training Quantization: Basically loading your model where all tensors are of type float and convert it to a tflite form with int8 tensors. Again, I won't go into too much details, but I'll give you 2 actual examples of doing so :) a) with code
b) with tflite_convert tools
2) Compile the model from tflite -> edgetpu.tflite:
Once you have produced a fully quantized tflite model, congrats your model is now much more efficient for arm platform and the size is much smaller. However it will still be ran on the CPU unless you compile it for the edgetpu. You can review this doc for installation and usage. But compiling it is as easy as:
$ edgetpu_compiler -s your_quantized_model.tflite
Hope this helps!

Related

Segmentation Fault when exporting to onnx a quantized Pytorch model

I am trying to export a model to the onnx format. The architecture is complicated so I won't share it here, but basically, I have the network weights in a .pth file. I'm able to load them, create the network and perform inference with it.
It's important to note that I have adapted the code to be able to quantize the network. I have added quantize and dequantize operators as well as some torch.nn.quantized.FloatFunctional() operators.
However, whenever I try to export it with
torch.onnx.export(torch_model, # model being run
input_example, # model input
model_name, # where to save the model
export_params=True, # store the trained parameter
opset_version=11, # the ONNX version to export
# the model to
do_constant_folding=True, # whether to execute constant
# folding for optimization
)
I get Segmentation fault (core dumped)
I am working on Ubuntu 20, with the following packages installed :
torch==1.6.0
torchvision==0.7.0
onnx==1.7.0
onnxruntime==1.4.0
Note that the according to some prints I have left in the code, the inference part of the exporting completes. The segmentation fault happens afterward.
Does anyone see any reason why this may happen ?
[Edit] : I can export my network when it is not adapted for quantized operations. Therefore, the problem is not a broken installation but more a problem of some quantized operators for onnx saving.

Well, it turns out that ONNX does not support quantized models (but does not warn you in anyway when running, it just throws out a segfault). It does not seem to be on the agenda yet, so a solution can be to use TensorRT.

Check quantization status of model

I have a Keras (not tf.keras) model which I quantized (post-training) to run it on an embedded device.
To convert the model to a quantized tflite model, I tried different approaches and ended with around five versions of quantized models. They all have slightly different size but they all seem to work on my x86 machine. All models show different inference timings.
Now, I would like to check how the models are actually quantized (fully, only weights,... ) as the embedded solution only takes a fully quantized model. And I want to see more details, e.g., what are the differences in weights (maybe explaining the different model size). the model summary does not give any insights.
Can you give me a tip on how to go about it?
Does anyone know if the tflite conversion with the TF1.x version is always fully quantized?
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use the TF1.x version to transform, or respectively tf.compat.v1.lite.TFLiteConverter.from_keras_model_file with TF2.x. so I am not sure about the output model using the "classic" TF1.x version or the tf.compat.v1. version.
The way different models were created
Using TF1.3 converting a h5 model
using TF1.5.3 converting a h5 model
using TF2.2 converting a h5 model
converting h5 model to pb with TF1.3
converting h5 model to pb with
TF1.5
converting h5 model to pb with TF2.2
using TF1.5.3 converting the converted pb models
using TF2.2 converting the converted pb models

Netron is a handy tool for visualizing networks. You can choose individual layers and see the types and values of weights, biases, inputs and outputs.

What is the difference between Tensorflow.js Layers model and Graph model?

Wanted to know what are the differences between this and this?
Is it just the ways the inputs vary?

The main differences between LayersModel and GraphModels are:
LayersModel can only be imported from tf.keras or keras HDF5 format model types. GraphModels can be imported from either the aforementioned model types, or TensorFlow SavedModels.
LayersModels support further training in JavaScript (through its fit() method). GraphModel supports only inference.
GraphModel usually gives you higher inference speed (10-20%) than LayersModel, due to its graph optimization, which is possible thanks to the inference-only support.
Hope this helps.

Both are doing the same task i.e. converting a NN model to tfjs format. It's just that in the 1st link model stored in h5 format (typically format in which keras model are saved) is used, while in another it's TF saved model.

Can I convert all the tensorflow slim models to tflite?

I'm training tensorflow slim based models for image classification on a custom dataset. Before I invest a lot of time training such huge a dataset, I wanted to know whether or not can I convert all the models available in the slim model zoo to tflite format.
Also, I know that I can convert my custom slim-model to a frozen graph. It is the step after this which I'm worried about i.e, conversion to .tflite from my custom trained .pb model.
Is this supported ? or is there anyone who is facing conversion problems that has not yet been resolved ?
Thanks.

Many Slim models can be converted to TFLite, but it isn't a guarantee since some models might have ops not supported by TFLite.
What you could do, is try and convert your model to TensorFlow Lite using TFLiteConverter in Python before training. If the conversion succeeds, then you can train your TF model and convert it once again.

Turn trained TensorFlow model into fixed operation

Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants with that value, such that one can run the model on a new input tensor without initializing or restoring the weights in a session? So can I basically condense a trained model into a fixed and immutable TensorFlow operation?

Yes, there is a freeze_graph.py tool just for that purpose.
It is described (a bit) in the Tool Developer's Guide. And you can find usage example in the Preparing models for mobile deployment section.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.