Check quantization status of model

Check quantization status of model - python

I have a Keras (not tf.keras) model which I quantized (post-training) to run it on an embedded device.
To convert the model to a quantized tflite model, I tried different approaches and ended with around five versions of quantized models. They all have slightly different size but they all seem to work on my x86 machine. All models show different inference timings.
Now, I would like to check how the models are actually quantized (fully, only weights,... ) as the embedded solution only takes a fully quantized model. And I want to see more details, e.g., what are the differences in weights (maybe explaining the different model size). the model summary does not give any insights.
Can you give me a tip on how to go about it?
Does anyone know if the tflite conversion with the TF1.x version is always fully quantized?
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use the TF1.x version to transform, or respectively tf.compat.v1.lite.TFLiteConverter.from_keras_model_file with TF2.x. so I am not sure about the output model using the "classic" TF1.x version or the tf.compat.v1. version.
The way different models were created
Using TF1.3 converting a h5 model
using TF1.5.3 converting a h5 model
using TF2.2 converting a h5 model
converting h5 model to pb with TF1.3
converting h5 model to pb with
TF1.5
converting h5 model to pb with TF2.2
using TF1.5.3 converting the converted pb models
using TF2.2 converting the converted pb models

Netron is a handy tool for visualizing networks. You can choose individual layers and see the types and values of weights, biases, inputs and outputs.

Related

Difference in prediction values for a binary classifier in ML.net and Tensorflow

I was comparing results between the same model generated and used between two frameworks (Ml.net and tensorflow/keras) independently.
These are the steps in followed in getting the results:
Train Model in Keras save as .h5 format.
Convert to .pb model to use in ML.net, as ml.net currently uses tensorflow graphs and not saved model format.
Now for the same two images, the prediction value should be the same. As the model is the same. The difference being the underlying framework.
I could find only bits and pieces regarding why this occours. I am running both the programs with same underlying hardware and software configurations.
Is the underlying cause the difference in framework?
The image prediction scores for example images :
ml.net : 0.9929248 Fail
keras/tensorflow : 0.99635077 Fail
The difference is minute, but being deterministic irrespective of platform is a feature that I am looking for.

convert .pb model into quantized tflite model

Totally new to Tensorflow,
I have created one object detection model (.pb and .pbtxt) using 'faster_rcnn_inception_v2_coco_2018_01_28' model I found from TensorFlow zoo. It works fine on windows but I want to use this model on google coral edge TPU. How can I convert my frozen model into edgetpu.tflite quantized model?

There are 2 more steps to this pipeline:
1) Convert the .pb -> tflite:
I won't go through details since there are documentation on this on tensorflow official page and it changes very often, but I'll still try to answer specifically to your question. There are 2 ways of doing this:
Quantization Aware Training: this happens during training of the model. I don't think this applies to you since your question seems to indicates that you were not aware of this process. But please correct me if I'm wrong.
Post Training Quantization: Basically loading your model where all tensors are of type float and convert it to a tflite form with int8 tensors. Again, I won't go into too much details, but I'll give you 2 actual examples of doing so :) a) with code
b) with tflite_convert tools
2) Compile the model from tflite -> edgetpu.tflite:
Once you have produced a fully quantized tflite model, congrats your model is now much more efficient for arm platform and the size is much smaller. However it will still be ran on the CPU unless you compile it for the edgetpu. You can review this doc for installation and usage. But compiling it is as easy as:
$ edgetpu_compiler -s your_quantized_model.tflite
Hope this helps!

What is the difference between Tensorflow.js Layers model and Graph model?

Wanted to know what are the differences between this and this?
Is it just the ways the inputs vary?

The main differences between LayersModel and GraphModels are:
LayersModel can only be imported from tf.keras or keras HDF5 format model types. GraphModels can be imported from either the aforementioned model types, or TensorFlow SavedModels.
LayersModels support further training in JavaScript (through its fit() method). GraphModel supports only inference.
GraphModel usually gives you higher inference speed (10-20%) than LayersModel, due to its graph optimization, which is possible thanks to the inference-only support.
Hope this helps.

Both are doing the same task i.e. converting a NN model to tfjs format. It's just that in the 1st link model stored in h5 format (typically format in which keras model are saved) is used, while in another it's TF saved model.

I use TFLiteConvert post_training_quantize=True but my model is still too big for being hosted in Firebase ML Kit's Custom servers

I have written a TensorFlow / Keras Super-Resolution GAN. I've converted the resulting trained .h5 model to a .tflite model, using the below code, executed in Google Colab:
import tensorflow as tf
model = tf.keras.models.load_model('/content/drive/My Drive/srgan/output/srgan.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.post_training_quantize=True
tflite_model = converter.convert()
open("/content/drive/My Drive/srgan/output/converted_model_quantized.tflite", "wb").write(tflite_model)
As you can see I use converter.post_training_quantize=True which was censed to help to output a lighter .tflite model than the size of my original .h5 model, which is 159MB. The resulting .tflite model is still 159MB however.
It's so big that I can't upload it to Google Firebase Machine Learning Kit's servers in the Google Firebase Console.
How could I either:
decrease the size of the current .tflite model which is 159MB (for example using a tool),
or after having deleted the current .tflite model which is 159MB, convert the .h5 model to a lighter .tflite model (for example using a tool)?
Related questions
How to decrease size of .tflite which I converted from keras: no answer, but a comment telling to use converter.post_training_quantize=True. However, as I explained it, this solution doesn't seem to work in my case.

In general, quantization means, shifting from dtype float32 to uint8. So theoretically our model should reduce by the size of 4. This will be clearly visible in files of greater size.
Check whether your model has been quantized or not by using the tool "https://lutzroeder.github.io/netron/". Here you have to load the model and check the random layers having weight.The quantized graph contains the weights value in uint8 format
In unquantized graph the weights value will be in float32 format.
Only setting "converter.post_training_quantize=True" is not enough to quantize your model. The other settings include:
converter.inference_type=tf.uint8
converter.default_ranges_stats=[min_value,max_value]
converter.quantized_input_stats={"name_of_the_input_layer_for_your_model":[mean,std]}
Hoping you are dealing with images.
min_value=0, max_value=255, mean=128(subjective) and std=128(subjective). name_of_the_input_layer_for_your_model= first name of the graph when you load your model in the above mentioned link or you can get the name of the input layer through the code "model.input" will give the output "tf.Tensor 'input_1:0' shape=(?, 224, 224, 3) dtype=float32". Here the input_1 is the name of the input layer(NOTE: model must include the graph configuration and the weight.)

Can I convert all the tensorflow slim models to tflite?

I'm training tensorflow slim based models for image classification on a custom dataset. Before I invest a lot of time training such huge a dataset, I wanted to know whether or not can I convert all the models available in the slim model zoo to tflite format.
Also, I know that I can convert my custom slim-model to a frozen graph. It is the step after this which I'm worried about i.e, conversion to .tflite from my custom trained .pb model.
Is this supported ? or is there anyone who is facing conversion problems that has not yet been resolved ?
Thanks.

Many Slim models can be converted to TFLite, but it isn't a guarantee since some models might have ops not supported by TFLite.
What you could do, is try and convert your model to TensorFlow Lite using TFLiteConverter in Python before training. If the conversion succeeds, then you can train your TF model and convert it once again.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.