TensorFlow model quantization best strategy - Python

I'm perplexed by the TensorFlow post-training quantization process. The official site refers to TensorFlow Lite quantization. Unfortunately, this doesn't work in my case: TFLiteConverter returns errors for my Mask R-CNN model:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: <...>. Here is a list of operators for which you will need custom implementations: DecodeJpeg, StatelessWhile.
Basically, I've tried all the options offered by TFLiteConverter, including the experimental ones. I'm not too surprised by those errors, as it may make sense not to support DecodeJpeg on mobile; however, I want my model to be served by TensorFlow Serving, so I don't understand why TensorFlow Lite is the official route to take.
I've also tried the Graph Transform Tool, which seems to be deprecated, and discovered two issues. First, it's impossible to quantize with bfloat16 or float16, only int8. Second, the quantized model breaks with the error:
Broadcast between [1,20,1,20,1,256] and [1,1,2,1,2,1] is not supported yet
which isn't an issue in the regular model.
Furthermore, it's worth mentioning that my model was originally built with TensorFlow 1.x and then ported to TensorFlow 2.1 via tensorflow.compat.v1.
This issue has consumed a significant amount of my time. I'd be grateful for any clue.

You can convert the model to TensorFlow Lite and still use the unsupported ops (like DecodeJpeg) from TF itself. This is called Select TF ops (SELECT_TF_OPS); see the guide here on how to enable it during conversion.
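For reference, enabling that during conversion looks roughly like this (a minimal sketch; the saved-model path is a placeholder):

import tensorflow as tf

saved_model_dir = 'path/to/saved_model'  # placeholder for your exported model
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # the standard TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops such as DecodeJpeg
]
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)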

Related

What is "The supported operations" means in TensorFlow Lite for Microcontrollers?

I want to create an image classification model for facial recognition with an OpenMV Cam H7 and TensorFlow. The TensorFlow documentation explains that "TensorFlow Lite for Microcontrollers currently supports a limited subset of TensorFlow operations, which impacts the model architectures that it is possible to run" and that "the supported operations can be seen in the file all_ops_resolver.cc".
So what are the supported operations, and how do I know which supported operations I'm using in my model?
If you open the link to the all_ops_resolver.cc file you just shared, you will see the list of supported operations. The list includes the basic building blocks needed to design a facial recognition model, such as the Conv2D/DepthwiseConv2D and FullyConnected layers and activations like Relu and Softmax.
To see the layers being used in a model, you can simply call model.summary() in TensorFlow/Keras.
I recommend starting with a simple TensorFlow example showing how to build a face recognition model, and then building a similar one using only operations supported by TensorFlow Lite for Microcontrollers.
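For instance, here is a minimal sketch of a tiny classifier that uses only layers mapping onto commonly registered TFLM operations (the input shape and layer sizes are placeholders):

import tensorflow as tf

model = tf.keras.Sequential([
    # These layers map onto ops registered in all_ops_resolver.cc:
    # Conv2D, DepthwiseConv2D, FullyConnected, Relu, Softmax.
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(96, 96, 1)),
    tf.keras.layers.DepthwiseConv2D(3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.summary()  # lists the layers (and thus the operations) the model uses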

easyOCR allocates GPU even when gpu=False

I need to use multiple TF CNN models of my own for a rather complex task. I also need to use easyOCR (PyTorch) every now and then, but the easyOCR usage is pretty rare and its task is very small compared to the TF models' inference. Therefore I pass gpu=False to the easyocr.Reader constructor. Nevertheless, as soon as easyocr predicts anything, GPU memory is allocated for PyTorch (this is a known bug; I already checked easyOCR's GitHub issues), and any TF model then throws:
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
If I predict with any TF model first, the easyocr model throws:
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278
I have found a workaround, but it seems rather dangerous to put something like this into production.
Is there a safer way of achieving this?
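For context, the setup described above is just this (a sketch; the image path is hypothetical):

import easyocr

# The reader is constructed with gpu=False, yet the first prediction
# still allocates GPU memory for PyTorch (the known bug mentioned above).
reader = easyocr.Reader(['en'], gpu=False)
result = reader.readtext('document.png')  # hypothetical image path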

Keras Model for text-prediction from Python to C++ [duplicate]

I am using Keras (with Theano) to train my CNN model. Does anyone have an idea how I can use it in my C++ application? Has anyone tried something similar? I have an idea to write some Python code that generates C++ code with the network functions. Any suggestions?
I found a similar question here (how to use a TensorFlow/Keras model in C++) but without an answer.
To answer my own question and provide a solution: I wrote a plain C++ solution called keras2cpp (its code is available on GitHub).
In this solution you store the network architecture (in JSON) and the weights (in HDF5). Then you can dump the network to a plain-text file with the provided script, as sketched below, and use the resulting text file with the network in pure C++ code. There are no dependencies on Python libraries or HDF5. It should work for both the Theano and TensorFlow backends.
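The export step described above amounts to something like this (a sketch; the model here is a stand-in and the file names are arbitrary):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(4, input_shape=(8,), activation='relu')])  # stand-in model
with open('architecture.json', 'w') as f:
    f.write(model.to_json())      # network architecture in JSON
model.save_weights('weights.h5')  # weights in HDF5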
I found myself in a similar situation, but needed to support forward passes not only of sequential Keras models in C++ but also of more complex models built with the functional API.
So I wrote a new library called frugally-deep. You can find it on GitHub, and it is published under the MIT License: https://github.com/Dobiasd/frugally-deep
In addition to supporting many common layer types, it can keep up with (and sometimes even beat) the performance of TensorFlow on a single CPU. You can find up-to-date benchmark results for some common models in the repo.
Through automated testing, frugally-deep guarantees that the output of a model used with it in C++ is exactly the same as if run with Keras in Python.
If your Keras model was trained using the TensorFlow backend, you can save it as a TensorFlow model following this code:
https://github.com/amir-abdi/keras_to_tensorflow
Here is a shorter version of the code:
import os.path as osp
from keras import backend as K
from keras.models import load_model
from tensorflow.python.framework import graph_util, graph_io

weight_file_path = 'path to your keras model'
net_model = load_model(weight_file_path)
sess = K.get_session()
# Freeze the graph: replace variables with constants so the .pb is self-contained.
constant_graph = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), ['name of the output tensor'])
graph_io.write_graph(constant_graph, 'output_folder_path', 'output.pb', as_text=False)
print('saved the constant graph (ready for inference) at:',
      osp.join('output_folder_path', 'output.pb'))
You can try this one:
https://github.com/gosha20777/keras2cpp
Keras2cpp is a small library for running trained Keras models from a C++ application without any dependencies.
Supported Keras layers:
- Dense
- Convolution1D
- Convolution2D
- Convolution3D
- Flatten
- ELU
- Activation
- MaxPooling2D
- Embedding
- LocallyConnected1D
- LocallyConnected2D
- LSTM
- GRU
- CNN
- BatchNormalization
Supported activations:
- linear
- relu
- softplus
- tanh
- sigmoid
- hard_sigmoid
- elu
- softsign
- softmax
Design goals:
Compatibility with networks generated by Keras using TensorFlow backend.
CPU only.
No external dependencies, standard library, C++17.
Model stored in memory.
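If the repo follows the usual Kerasify-style workflow, the Python-side export looks roughly like this (the export_model helper comes from the repo's keras2cpp.py script; treat the exact call as an assumption, and the model is a stand-in):

from keras.models import Sequential
from keras.layers import Dense
from keras2cpp import export_model  # assumed helper shipped with the repo

model = Sequential([Dense(4, input_shape=(8,), activation='relu')])  # stand-in model
export_model(model, 'example.model')  # writes the binary file the C++ side loads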
The solutions found here are quite good, but if your model has some layer types not supported by these libraries, I would recommend the following:
- Convert the Keras model to a TensorFlow model.
- Freeze the model and use the Transform Graph tool provided by TensorFlow (you'll have to build it from source with Bazel); see the sketch after this list.
- Compile the TensorFlow C++ API library so you can use it in your project.
- Link the TensorFlow C++ API library into your project.
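For the Transform Graph step, the tool also exposes a TF 1.x Python binding; a sketch, assuming your frozen graph's input and output node names are 'input' and 'output':

import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load the frozen graph produced in the previous step.
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

transformed = TransformGraph(
    graph_def,
    ['input'],   # assumed input node name
    ['output'],  # assumed output node name
    ['strip_unused_nodes', 'fold_constants(ignore_errors=true)'])

with tf.gfile.GFile('optimized.pb', 'wb') as f:
    f.write(transformed.SerializeToString())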
If you want to use a different compiler than Bazel (g++, for example), you can follow this great tutorial:
http://tuatini.me/building-tensorflow-as-a-standalone-project/
The easiest way is probably to make a system call to a Python script that writes the predictions to a binary or HDF5 file, which can then be read from C++. You can also directly embed Python into C++.
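A minimal sketch of the Python side of that system-call approach (file names are hypothetical; the C++ side would write input.npy and read preds.npy):

import numpy as np
from keras.models import load_model

model = load_model('model.h5')          # hypothetical model file
x = np.load('input.npy')                # input written by the C++ side
np.save('preds.npy', model.predict(x))  # predictions read back in C++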
If you need to deploy and distribute this easily, you can look into self-contained Python installations like Anaconda, but your best bet may be to avoid Keras and use the C++ interface to Caffe or TensorFlow. I wouldn't recommend TensorFlow, since using it from C++ isn't standard; see this discussion. Caffe is arguably the second most popular deep learning library, so you can't really go wrong.
I had a similar need (I wanted to embed Keras models in a C++ application) and decided to write my own library: Kerasify
Design goals of Kerasify:
Compatibility with image processing Sequential networks generated by Keras using Theano backend. (Could work with Tensorflow if you switch around matrix col/row ordering).
No external dependencies, standard library, C++11 features OK.
Model stored on disk in binary format that can be quickly read.
Model stored in memory in contiguous block for better cache performance.
Doesn't throw exceptions, returns only bool on error.
CPU only, no GPU
Example code, unit tests, etc. are at the GitHub link. It's not fully complete; it only supports the narrow subset of Keras functions I'm using, but it should be extensible with a little effort.
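Exporting a model for Kerasify looks roughly like this (a sketch; export_model is the helper from the Kerasify repo, and the model here is a stand-in):

from keras.models import Sequential
from keras.layers import Dense
from kerasify import export_model  # helper shipped with Kerasify

model = Sequential([Dense(4, input_shape=(8,), activation='relu')])  # stand-in model
export_model(model, 'example.model')  # binary file the C++ example code reads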

Difference between different tensorflow fully connected layers

What is the difference between the different fully connected layers available in TensorFlow? I understand that there could be two versions, object-oriented and functional, but I was able to find four different layers in TensorFlow:
tf.keras.layers.Dense
tf.layers.dense
tf.layers.Dense
tf.contrib.layers.fully_connected
The documentation contains examples using all of them. I'd also like to know when to use each layer.
tf.keras.layers.Dense: Keras is a deep learning library that functions as a wrapper over 'lower-level' languages such as TensorFlow and Theano. It has recently been integrated as a TensorFlow project and is part of the code base. If you are using 'raw' TensorFlow, you should not use this layer.
tf.layers.dense: TensorFlow defines a functional interface; layers and operations that are lowercase are typically part of it. These functions are used as building blocks when defining a custom layer or a loss function.
tf.layers.Dense: This is the layer you should be using.
tf.contrib.layers.fully_connected: This comes from the contrib library, whose features are typically more experimental and volatile. Once a feature is deemed stable, you should use its stable implementation (tf.layers.Dense); the contrib version will still be present in the library to maintain backwards compatibility.
tf.keras.layers.Dense is a Keras wrapper function. Its functionality is the same as tf.layers.Dense. Check out Keras.
tf.layers.dense is a functional interface for TensorFlow.
tf.layers.Dense is the commonly used one.
tf.contrib.layers.fully_connected is a function under development.
Technically speaking, the first three have the same functionality (the same inputs and outputs).
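A minimal TF 1.x sketch showing that the first three are interchangeable (shapes are placeholders):

import tensorflow as tf  # TF 1.x API

x = tf.placeholder(tf.float32, [None, 4])
y1 = tf.layers.dense(x, units=8, activation=tf.nn.relu)  # functional interface
y2 = tf.layers.Dense(units=8, activation=tf.nn.relu)(x)  # object-oriented layer
y3 = tf.keras.layers.Dense(8, activation=tf.nn.relu)(x)  # Keras wrapper
# All three build the same computation: relu(x @ W + b).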

Keras vs TensorFlow - does Keras have any actual benefits?

I have been implementing some deep nets in Keras but have eventually gotten frustrated with some of its limitations: for example, setting floatx to float16 fails on batch normalization layers, and the only way to fix it is to actually edit the Keras source; implementing custom layers requires coding them in backend code, which destroys the ability to switch backends; there appear to be no parallel training mechanisms (unlike tf.Estimator); and even vanilla programs run 30% slower in Keras than in TF (if one is to trust the interwebs). I was grumbling about moving to TensorFlow, but was pleased to discover that TensorFlow (especially if you use the tf.layers stuff) is not actually any longer, code-wise, for anything imaginable you might want to do. Is this a failure of my imagination, or is tf.layers basically a backporting of Keras into core TensorFlow, and is there any actual use case for Keras?
Keras used to have the upper hand on TensorFlow in the past, but ever since its author became affiliated with Google, all the features that made it attractive have been getting implemented into TensorFlow (you can check version 1.8). As you rightly pointed out, tf.layers is one such example.
