Turn trained TensorFlow model into fixed operation - python

Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants holding those values, so that one can run the model on a new input tensor without initializing or restoring the weights in a session? Can I basically condense a trained model into a fixed and immutable TensorFlow operation?

Yes, there is a freeze_graph.py tool for exactly that purpose.
It is described (briefly) in the Tool Developer's Guide, and you can find a usage example in the Preparing models for mobile deployment section.
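For reference, here is a minimal sketch of the same freezing step done directly with the graph_util API that freeze_graph.py builds on; it assumes a TF1-style session, and the checkpoint path and the "output" node name below are placeholders:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

with tf.Session() as sess:
    # ... build the model graph here, then restore its trained weights ...
    saver = tf.train.Saver()
    saver.restore(sess, "/tmp/model.ckpt")  # placeholder checkpoint path

    # Replace every tf.Variable in the graph with a tf.constant holding its
    # current value; only nodes needed for the listed outputs are kept.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=["output"])  # placeholder name

    # The frozen GraphDef can be written to disk and later imported with
    # tf.import_graph_def, with no variable initialization or restore needed.
    with tf.gfile.GFile("/tmp/frozen_model.pb", "wb") as f:
        f.write(frozen_graph_def.SerializeToString())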

Related

Why does a quantized TensorFlow Lite model perform poorly on latency?

I am currently testing the inference latency of a U-Net network converted with TensorFlow Lite. I am testing three NNs with the same architecture on a segmentation problem (on my Windows laptop):
First model: TensorFlow model (without optimization and created with the Keras interface).
Second model: TensorFlow model optimized with TFLite (transformed with the Python TFLite api and without quantization). It is actually the first model transformed.
Third model: TensorFlow model optimized with TFLite and quantized (transformed with the Python TFLite api and quantized with tensorflow.lite.Optimize.DEFAULT). It is actually the first model transformed.
Indeed, the second model (optimized with TFLite) improves on the inference time of the first model (normal TF model) by a factor of three. However, the third model (TFLite & quantization) has the worst time performance; it is even slower than the first model (normal TF model).
Why is the quantized model the slowest?
It depends on which kernels your model is running.
Generally, TFLite is optimized for running on mobile devices, so in your case (quantized model on a desktop) it may be using a reference implementation for some op(s).
One way to check further is to run the benchmark tool with --enable_op_profiling=true.
It will run your model with dummy data, profile the ops, and then show you a per-op summary of where the time is spent.
If something looks off, you can file a GitHub issue with details on how to reproduce it, and the team can debug the performance issue.
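If you want a quick Python-side sanity check before reaching for the benchmark tool, a sketch like the one below (file names are placeholders) lets you compare the average latency of the float and quantized models on dummy data:

import time
import numpy as np
import tensorflow as tf

def time_tflite_model(model_path, runs=50):
    # Rough average latency of a .tflite model on dummy input data.
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    dummy = np.random.random_sample(inp["shape"]).astype(inp["dtype"])

    start = time.perf_counter()
    for _ in range(runs):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
    return (time.perf_counter() - start) / runs

for path in ["unet_float.tflite", "unet_quantized.tflite"]:  # placeholder names
    print(path, time_tflite_model(path), "seconds per inference")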

convert .pb model into quantized tflite model

I am totally new to TensorFlow.
I have created an object detection model (.pb and .pbtxt) using the 'faster_rcnn_inception_v2_coco_2018_01_28' model I found in the TensorFlow model zoo. It works fine on Windows, but I want to use this model on a Google Coral Edge TPU. How can I convert my frozen model into a quantized edgetpu.tflite model?
There are 2 more steps to this pipeline:
1) Convert the .pb -> tflite:
I won't go through the details, since there is documentation on this on the official TensorFlow page and it changes very often, but I'll still try to answer your question specifically. There are 2 ways of doing this:
Quantization Aware Training: this happens during training of the model. I don't think this applies to you, since your question seems to indicate that you were not aware of this process. But please correct me if I'm wrong.
Post-Training Quantization: basically, loading your model where all tensors are of type float and converting it to a tflite form with int8 tensors. Again, I won't go into too much detail, but there are 2 actual ways of doing so :) a) with code (see the sketch below)
b) with the tflite_convert tool
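As a sketch of option a), post-training full-integer quantization with the Python converter API could look roughly like this (the input/output array names, input shape, and file names below are placeholders for whatever your frozen graph actually uses):

import numpy as np
import tensorflow as tf

def representative_dataset():
    # A handful of samples shaped like the model's real input, used to
    # calibrate the int8 quantization ranges (placeholder shape).
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_inference_graph.pb",  # placeholder path
    input_arrays=["input"],                      # placeholder node name
    output_arrays=["output"],                    # placeholder node name
    input_shapes={"input": [1, 300, 300, 3]})    # placeholder shape

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

with open("quantized_model.tflite", "wb") as f:
    f.write(converter.convert())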
2) Compile the model from tflite -> edgetpu.tflite:
Once you have produced a fully quantized tflite model, congrats: your model is now much more efficient for ARM platforms and much smaller in size. However, it will still be run on the CPU unless you compile it for the Edge TPU. You can review this doc for installation and usage, but compiling it is as easy as:
$ edgetpu_compiler -s your_quantized_model.tflite
Hope this helps!

What is the difference between Tensorflow.js Layers model and Graph model?

I wanted to know what the differences are between this and this.
Is it just the way the inputs vary?
The main differences between a LayersModel and a GraphModel are:
A LayersModel can only be imported from tf.keras or Keras HDF5 format model types. A GraphModel can be imported from either of the aforementioned model types or from a TensorFlow SavedModel.
A LayersModel supports further training in JavaScript (through its fit() method). A GraphModel supports only inference.
A GraphModel usually gives you higher inference speed (10-20% faster) than a LayersModel, due to graph optimizations that are possible because it only needs to support inference.
Hope this helps.
Both are doing the same task, i.e. converting an NN model to the tfjs format. It's just that in the first link a model stored in HDF5 format (the format in which Keras models are typically saved) is used, while in the other it's a TF SavedModel.
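To make the two starting points concrete, here is a small Python sketch (the model and paths are made up) showing the two export formats that feed the two conversion paths; each file is then passed to tensorflowjs_converter with the matching --input_format:

import tensorflow as tf

# Tiny placeholder Keras model, only to illustrate the two export formats.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Path 1: Keras HDF5 file, converted with --input_format=keras and loaded
# in the browser as a LayersModel (further training possible via fit()).
model.save("model.h5")

# Path 2: TensorFlow SavedModel, converted with --input_format=tf_saved_model
# and loaded as a GraphModel (inference only, usually faster).
model.save("saved_model_dir")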

Tensorflow - using estimator in interactive mode

I am trying to use a tensorflow neural network in "interactive" mode:
my goal would be to load a trained model, keeping it in memory, and then perform inference on it once in a while.
The problem is that apparently the TensorFlow Estimator class (tf.estimator.Estimator) does not allow this.
The predict method (documentation, source) takes as input a batch of features and the path to the model. It then creates a session, loads the model and performs the inference.
After that, the session is closed, and for a subsequent inference it is necessary to load the model again.
How could I achieve my desired behavior using the Estimator class?
Thank you
You may want to have a look at tfe.make_template; its goal is precisely to make graph-based code available in eager mode.
Following the example given during the 2018 TF Summit, that would give something like:
import tensorflow.contrib.eager as tfe

# 'my_estimator' and 'x' come from your own code.
def apply_my_estimator(x):
    return my_estimator(x)

t = tfe.make_template('f', apply_my_estimator, create_graph_function=True)
print(t(x))

How to add a new layer to an existing TensorFlow model?

I'm loading a protobuf file of an already trained model and want to add a custom op on the input layer as well as on the output layer. Is this somehow feasible, or is the design frozen once the model has been trained and exported?
Unfortunately, I could only find options to redesign the model initially (i.e. before export).
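For what it's worth, the graph stored in the protobuf is itself immutable, but it can be wrapped with new ops at import time. A rough sketch of the idea, assuming a TF1-style GraphDef and hypothetical tensor names ("input:0", "output:0") and shapes:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Load the exported protobuf (placeholder path).
graph_def = tf.GraphDef()
with tf.gfile.GFile("exported_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default():
    # Custom op placed in front of the model's original input.
    raw_images = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])
    preprocessed = raw_images / 255.0  # stand-in for any custom input op

    # Remap the graph's input tensor to the new op, pull out its output
    # tensor, and attach further custom ops behind it.
    (model_output,) = tf.import_graph_def(
        graph_def,
        input_map={"input:0": preprocessed},  # hypothetical tensor name
        return_elements=["output:0"])         # hypothetical tensor name

    post_processed = tf.nn.softmax(model_output)  # stand-in for a custom output op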
