I have a model trained in Keras with the TensorFlow backend, and the weights are saved in .h5 format. I am interested in applying TensorFlow's quantization features (https://www.tensorflow.org/api_docs/python/tf/quantization). So far, I have managed to convert the weights from the .h5 format to TensorFlow's .pb format using the tool available online (https://github.com/amir-abdi/keras_to_tensorflow/). There are a couple of issues with this, the main one being that I don't see a reduction in my model size after quantization. I also need to convert the .pb weights back to .h5 format to test the model with my infrastructure.
Is there a known best method for performing TensorFlow quantization within Keras?
Is there an easy way to convert weights format from .pb to .h5?
Thanks
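For reference, a common route that does shrink the file is TensorFlow Lite post-training quantization, which works directly from the .h5 file and skips the .pb round trip. This is only a minimal sketch (paths are placeholders, and the exact converter API depends on your TF version):

import tensorflow as tf

# Load the trained Keras model from its .h5 file (path is illustrative).
model = tf.keras.models.load_model("model.h5")

# Post-training dynamic-range quantization: weights are stored as 8-bit values,
# which typically shrinks the file to roughly a quarter of its float32 size.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)

Note that the output is a .tflite model rather than .h5, so this only helps if your infrastructure can run the TFLite interpreter.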
Related
I have a trained model in ONNX format.
https://github.com/onnx/models/tree/main/vision/object_detection_segmentation/faster-rcnn
I need to convert it to PyTorch so that I can modify the outputs (for example, change the number of classes) and fine-tune the network on my own dataset.
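One possible starting point (my assumption, not something from the original post) is the third-party onnx2pytorch package; the Faster R-CNN graph may contain ops the converter cannot handle, so treat this as a sketch only:

import onnx
from onnx2pytorch import ConvertModel  # pip install onnx2pytorch

# Load the ONNX graph (file name is illustrative) and convert it to a torch.nn.Module.
onnx_model = onnx.load("FasterRCNN-10.onnx")
pytorch_model = ConvertModel(onnx_model)

# The resulting module can then have its head layers swapped (e.g. a different
# number of classes) and be fine-tuned like any other PyTorch model.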
I have a Keras (not tf.keras) model which I quantized (post-training) to run it on an embedded device.
To convert the model to a quantized tflite model, I tried different approaches and ended up with around five versions of quantized models. They all have slightly different sizes, but they all seem to work on my x86 machine, and they all show different inference timings.
Now I would like to check how the models are actually quantized (fully, weights only, ...), as the embedded solution only accepts a fully quantized model. I also want to see more details, e.g. what the differences in the weights are (which might explain the different model sizes); the model summary does not give any insight.
Can you give me a tip on how to go about it?
Does anyone know if the tflite conversion with the TF1.x version is always fully quantized?
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use the TF1.x converter, or respectively tf.compat.v1.lite.TFLiteConverter.from_keras_model_file with TF2.x, so I am not sure whether the output model differs between the "classic" TF1.x version and the tf.compat.v1 version.
The ways the different models were created:
using TF1.3 to convert an h5 model
using TF1.5.3 to convert an h5 model
using TF2.2 to convert an h5 model
converting the h5 model to pb with TF1.3
converting the h5 model to pb with TF1.5
converting the h5 model to pb with TF2.2
using TF1.5.3 to convert the converted pb models
using TF2.2 to convert the converted pb models
Netron is a handy tool for visualizing networks. You can choose individual layers and see the types and values of weights, biases, inputs and outputs.
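If you also want to check this programmatically rather than visually, here is a small sketch (the file name is a placeholder) that dumps the dtype of every tensor in a .tflite file; in a fully quantized model the weights, activations and the input/output tensors are uint8/int8, while float32 entries point to parts that were left unquantized:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_quantized.tflite")
interpreter.allocate_tensors()

# Print name, dtype and quantization parameters (scale, zero point) for each tensor.
for detail in interpreter.get_tensor_details():
    print(detail["name"], detail["dtype"], detail["quantization"])

# The input/output dtypes show whether inference_input_type/inference_output_type took effect.
print("input :", interpreter.get_input_details()[0]["dtype"])
print("output:", interpreter.get_output_details()[0]["dtype"])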
I wanted to know: what are the differences between this and this?
Is it just the way the inputs vary?
The main differences between LayersModel and GraphModel are:
A LayersModel can only be imported from tf.keras or Keras HDF5 format models. A GraphModel can be imported from either of those formats or from a TensorFlow SavedModel.
A LayersModel supports further training in JavaScript (through its fit() method). A GraphModel supports only inference.
A GraphModel usually gives you higher inference speed (10-20%) than a LayersModel, thanks to graph optimizations that are possible because it only needs to support inference.
Hope this helps.
Both do the same task, i.e. converting a NN model to the tfjs format. The difference is that in the first link the model is stored in the h5 format (the format in which Keras models are typically saved), while in the other it is a TF SavedModel.
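For completeness, a hedged sketch of the two paths from Python, assuming the tensorflowjs pip package is installed (paths are placeholders):

import tensorflow as tf
import tensorflowjs as tfjs

# Keras .h5 -> tfjs LayersModel (can still be trained with fit() in JavaScript).
model = tf.keras.models.load_model("model.h5")
tfjs.converters.save_keras_model(model, "tfjs_layers_model/")

# TensorFlow SavedModel -> tfjs GraphModel (inference only) is usually done with
# the command-line converter, e.g.:
#   tensorflowjs_converter --input_format=tf_saved_model saved_model_dir tfjs_graph_model/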
I have written a TensorFlow / Keras Super-Resolution GAN. I've converted the resulting trained .h5 model to a .tflite model, using the below code, executed in Google Colab:
import tensorflow as tf

# Load the trained SRGAN model from its .h5 file.
model = tf.keras.models.load_model('/content/drive/My Drive/srgan/output/srgan.h5')

# Convert to TensorFlow Lite with post-training quantization enabled.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.post_training_quantize = True
tflite_model = converter.convert()

# Write the converted model to Drive.
open("/content/drive/My Drive/srgan/output/converted_model_quantized.tflite", "wb").write(tflite_model)
As you can see, I use converter.post_training_quantize = True, which was supposed to produce a lighter .tflite model than my original .h5 model, which is 159MB. The resulting .tflite model is still 159MB, however.
It's so big that I can't upload it to Google Firebase Machine Learning Kit's servers in the Google Firebase Console.
How could I either:
decrease the size of the current 159MB .tflite model (for example using a tool),
or, after discarding the current 159MB .tflite model, convert the .h5 model to a lighter .tflite model (for example using a tool)?
Related questions
How to decrease size of .tflite which I converted from keras: no answer, but a comment suggesting the use of converter.post_training_quantize = True. However, as explained above, this solution doesn't seem to work in my case.
In general, quantization means switching from dtype float32 to uint8, so in theory the model should shrink to roughly a quarter of its size; the reduction is most visible for larger files.
You can check whether your model has actually been quantized with the tool https://lutzroeder.github.io/netron/. Load the model there and inspect a few layers that have weights: in a quantized graph the weight values are stored as uint8, while in an unquantized graph they are float32.
Setting converter.post_training_quantize = True alone is not enough to quantize your model. The other settings include:
converter.inference_type=tf.uint8
converter.default_ranges_stats=[min_value,max_value]
converter.quantized_input_stats={"name_of_the_input_layer_for_your_model":[mean,std]}
I am assuming you are dealing with images: min_value=0, max_value=255, mean=128 (subjective) and std=128 (subjective). name_of_the_input_layer_for_your_model is the first name shown when you load your model into the tool mentioned above; alternatively, model.input prints something like tf.Tensor 'input_1:0' shape=(?, 224, 224, 3) dtype=float32, where input_1 is the name of the input layer. (NOTE: the model must include both the graph configuration and the weights.)
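Putting those settings together, a sketch using the TF1.x-style converter (the file name, the input layer name "input_1" and the mean/std of 128 are assumptions that depend on your model and preprocessing):

import tensorflow as tf

converter = tf.compat.v1.lite.TFLiteConverter.from_keras_model_file("srgan.h5")
converter.post_training_quantize = True
converter.inference_type = tf.uint8
converter.default_ranges_stats = [0, 255]
converter.quantized_input_stats = {"input_1": [128, 128]}

tflite_model = converter.convert()
with open("srgan_quantized.tflite", "wb") as f:
    f.write(tflite_model)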
I'm training TensorFlow-Slim based models for image classification on a custom dataset. Before I invest a lot of time training on such a huge dataset, I wanted to know whether I can convert all the models available in the Slim model zoo to the tflite format.
Also, I know that I can convert my custom Slim model to a frozen graph. It is the step after this that I'm worried about, i.e. the conversion from my custom trained .pb model to .tflite.
Is this supported? Or is anyone facing conversion problems that have not yet been resolved?
Thanks.
Many Slim models can be converted to TFLite, but it isn't guaranteed, since some models might contain ops that TFLite does not support.
What you could do is try to convert your model to TensorFlow Lite using TFLiteConverter in Python before training. If the conversion succeeds, you can train your TF model and convert it once again, as in the sketch below.
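For example, a minimal sketch of that sanity check from a frozen inference graph (the file and tensor names below are assumptions and vary per Slim model):

import tensorflow as tf

# Build a converter directly from the frozen graph exported from Slim.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_inference_graph.pb",
    input_arrays=["input"],
    output_arrays=["MobilenetV2/Predictions/Reshape_1"],
)
tflite_model = converter.convert()

# If this call succeeds, the architecture only uses ops that TFLite supports.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)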