Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants with that value, such that one can run the model on a new input tensor without initializing or restoring the weights in a session? So can I basically condense a trained model into a fixed and immutable TensorFlow operation?
Yes, there is a freeze_graph.py tool just for that purpose.
It is described (briefly) in the Tool Developer's Guide, and you can find a usage example in the Preparing models for mobile deployment section.
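If you prefer to do it from code, here is a minimal sketch of the same idea (TF 1.x) using graph_util.convert_variables_to_constants, which is what freeze_graph.py calls internally; the checkpoint path and the output node name below are placeholders for your own model:

import tensorflow as tf

with tf.Session() as sess:
    # Restore the trained weights from a checkpoint.
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # Replace every tf.Variable in the graph with a constant holding its current value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names=['output_node'])
    # The frozen GraphDef can later be imported and run without initializing
    # or restoring any variables.
    with tf.gfile.GFile('frozen_model.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())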
I am currently testing the inference latency of a U-Net network converted with TensorFlow Lite. I am testing three NNs with the same architecture on a segmentation problem (I'm testing them on my laptop, which runs Windows):
First model: TensorFlow model (without optimization, created with the Keras interface).
Second model: TensorFlow model optimized with TFLite (converted with the Python TFLite API, without quantization). It is actually the first model, converted.
Third model: TensorFlow model optimized with TFLite and quantized (converted with the Python TFLite API and quantized with tensorflow.lite.Optimize.DEFAULT). It is also the first model, converted.
Indeed, the second model (optimized with TFLite) improves on the time performance of the first model (the plain TF model) by a factor of 3 (three times faster). However, the third model (TFLite + quantization) has the worst time performance; it is even slower than the first model (the plain TF model).
Why is the quantized model the slowest?
It depends on which kernels your model is running.
Generally, TFLite is optimized for running on mobile devices, so it might be that in your case (a quantized model on a desktop CPU) it falls back to a reference implementation for some op(s).
One way to check further is to run the benchmark tool with --enable_op_profiling=true.
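For example, assuming you build the benchmark tool from the TensorFlow source tree (the model path is a placeholder):

$ bazel run -c opt //tensorflow/lite/tools/benchmark:benchmark_model -- \
    --graph=your_model.tflite --enable_op_profiling=true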
It will run your model with dummy data, profile the ops, and then show you a summary of per-op timings.
If you see something off, you can file a GitHub issue with details on how to reproduce it, and the team can debug the performance issue.
Totally new to TensorFlow,
I have created an object detection model (.pb and .pbtxt) using the 'faster_rcnn_inception_v2_coco_2018_01_28' model I found in the TensorFlow model zoo. It works fine on Windows, but I want to use this model on a Google Coral Edge TPU. How can I convert my frozen model into a quantized edgetpu.tflite model?
There are 2 more steps to this pipeline:
1) Convert the .pb -> tflite:
I won't go into detail, since there is documentation on this on the official TensorFlow page and it changes very often, but I'll still try to answer your question specifically. There are 2 ways of doing this:
Quantization-Aware Training: this happens during training of the model. I don't think this applies to you, since your question seems to indicate that you were not aware of this process. But please correct me if I'm wrong.
Post-Training Quantization: basically loading your model, where all tensors are of type float, and converting it to a tflite form with int8 tensors. Again, I won't go into too much detail, but there are 2 ways of doing so: a) with code (a sketch follows below)
b) with the tflite_convert tool
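For (a), here is a minimal sketch of post-training full-integer quantization with the Python API, using the TF 1.x frozen-graph converter (via tf.compat.v1); the file name, node names, shapes and the representative dataset are placeholders to replace with the ones from your exported graph:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few samples shaped like your real input; random data is shown
    # here only as a placeholder for real calibration images.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    'frozen_inference_graph.pb',            # placeholder path to your .pb
    input_arrays=['image_tensor'],          # placeholder input node name
    output_arrays=['detection_boxes'],      # placeholder output node name(s)
    input_shapes={'image_tensor': [1, 300, 300, 3]})
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the converter to int8 ops and integer I/O so the Edge TPU compiler
# can map the graph.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
with open('your_quantized_model.tflite', 'wb') as f:
    f.write(converter.convert())

The representative dataset is what the converter uses to calibrate the int8 ranges; without it (and the int8 op restriction) you end up with float fallbacks that the Edge TPU compiler cannot map.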
2) Compile the model from tflite -> edgetpu.tflite:
Once you have produced a fully quantized tflite model, congrats: your model is now much more efficient for the ARM platform and much smaller in size. However, it will still run on the CPU unless you compile it for the Edge TPU. You can review this doc for installation and usage, but compiling it is as easy as:
$ edgetpu_compiler -s your_quantized_model.tflite
Hope this helps!
I wanted to know: what are the differences between this and this?
Is it just the way the inputs vary?
The main differences between LayersModel and GraphModel are:
LayersModel can only be imported from tf.keras or Keras HDF5 model formats. GraphModel can be imported from either of those formats or from a TensorFlow SavedModel.
LayersModel supports further training in JavaScript (through its fit() method). GraphModel supports only inference.
GraphModel usually gives you higher inference speed (10-20%) than LayersModel, due to graph optimization, which is possible thanks to its inference-only support.
Hope this helps.
Both are doing the same task, i.e. converting an NN model to the tfjs format. It's just that in the 1st link a model stored in HDF5 format (the typical format in which Keras models are saved) is used, while in the other it's a TF SavedModel.
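For reference, the two conversions look roughly like this on the command line (assuming the tensorflowjs pip package is installed; paths are placeholders); the first produces a LayersModel, the second a GraphModel:

$ tensorflowjs_converter --input_format=keras model.h5 web_model/
$ tensorflowjs_converter --input_format=tf_saved_model saved_model_dir/ web_model/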
I am trying to use a TensorFlow neural network in "interactive" mode:
my goal is to load a trained model, keep it in memory, and then perform inference with it once in a while.
The problem is that apparently the TensorFlow Estimator class (tf.estimator.Estimator) does not allow this.
The method predict (documentation, source) takes as input a batch of features and the path to the model. Then it creates a session, loads the model and performs the inference.
After that, the session is closed, and for a subsequent inference it is necessary to load the model again.
How could I achieve my desired behavior using the Estimator class?
Thank you
You may want to have a look at tfe.make_template; its goal is precisely to make graph-based code available in eager mode.
Following the example given during the 2018 TF Summit, that would give something like:
def apply_my_estimator(x):
    return my_estimator(x)

# Wrap the call in a template so the graph is built once and reused
# across repeated inference calls.
t = tfe.make_template('f', apply_my_estimator, create_graph_function=True)
print(t(x))
I'm loading a protobuf file of an already trained model and want to add a custom op on the input layer as well as on the output layer. Is this somehow feasible, or is the design frozen once the model is trained and exported?
Unfortunately, I could only find options for redesigning the model initially (i.e. before export).