I am trying to export a model to the ONNX format. The architecture is complicated, so I won't share it here, but basically, I have the network weights in a .pth file. I'm able to load them, create the network and perform inference with it.
It's important to note that I have adapted the code to be able to quantize the network. I have added quantize and dequantize operators as well as some torch.nn.quantized.FloatFunctional() operators.
However, whenever I try to export it with
torch.onnx.export(torch_model,             # model being run
                  input_example,           # model input
                  model_name,              # where to save the model
                  export_params=True,      # store the trained parameters
                  opset_version=11,        # the ONNX opset version to export the model to
                  do_constant_folding=True # whether to execute constant folding for optimization
                  )
I get Segmentation fault (core dumped)
I am working on Ubuntu 20, with the following packages installed:
torch==1.6.0
torchvision==0.7.0
onnx==1.7.0
onnxruntime==1.4.0
Note that, according to some prints I have left in the code, the inference part of the export completes. The segmentation fault happens afterwards.
Does anyone see any reason why this may happen?
[Edit]: I can export my network when it is not adapted for quantized operations. Therefore, the problem is not a broken installation but rather an issue with exporting the quantized operators to ONNX.
Well, it turns out that ONNX export does not support quantized models (and it does not warn you in any way when running, it just throws a segfault). It does not seem to be on the agenda yet, so one alternative can be to use TensorRT.
Related
I need to use multiple TF CNN models of my own for a rather complex task. I also need to use easyOCR (PyTorch) every now and then. However, easyOCR usage is pretty rare and its task is very small compared to the TF models' inference. Therefore I use gpu=False in the easyocr.Reader constructor. Nevertheless, as soon as easyOCR predicts anything, GPU memory is allocated for PyTorch (this is a known bug, I already checked easyOCR's GitHub issues) and any TF model then throws this error:
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
If I predict with any TF model first, the easyocr model throws:
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/THCCachingHostAllocator.cpp:278
I have found a workaround, but it seems rather dangerous to put something like this into production.
Is there a safer way of achieving this?
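For context, this is roughly the kind of setup described above (a minimal sketch; the model and image paths are placeholders, not from the original code):

import easyocr
import tensorflow as tf

# easyOCR is told to stay on the CPU...
reader = easyocr.Reader(['en'], gpu=False)

# ...but as soon as it predicts, PyTorch still allocates GPU memory,
# and a subsequent TF prediction fails with the cuDNN error quoted above.
text = reader.readtext('some_image.png')            # placeholder image path

model = tf.keras.models.load_model('my_cnn.h5')     # placeholder model path
preds = model.predict(tf.zeros([1, 224, 224, 3]))   # fails once PyTorch holds the GPU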
Totally new to TensorFlow,
I have created an object detection model (.pb and .pbtxt) using the 'faster_rcnn_inception_v2_coco_2018_01_28' model I found in the TensorFlow model zoo. It works fine on Windows, but I want to use this model on a Google Coral Edge TPU. How can I convert my frozen model into a quantized edgetpu.tflite model?
There are 2 more steps to this pipeline:
1) Convert the .pb -> tflite:
I won't go through the details since there is documentation on this on the official TensorFlow page and it changes very often, but I'll still try to answer your question specifically. There are 2 ways of doing this:
Quantization Aware Training: this happens during training of the model. I don't think this applies to you, since your question seems to indicate that you were not aware of this process. But please correct me if I'm wrong.
Post Training Quantization: basically loading your model, where all tensors are of type float, and converting it to a tflite form with int8 tensors. Again, I won't go into too much detail, but I'll give you 2 actual ways of doing so :) a) with code (see the sketch just after this list)
b) with the tflite_convert tool
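For option (a), here is a minimal post-training full-integer quantization sketch, assuming the TF 1.x converter API; the graph path, input/output tensor names, shapes and the representative dataset are placeholders you would adapt to your own frozen graph:

import numpy as np
import tensorflow as tf  # assumes TF 1.x

def representative_dataset():
    # Placeholder calibration data; in practice, yield real preprocessed images.
    for _ in range(100):
        yield [np.random.rand(1, 300, 300, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='frozen_inference_graph.pb',        # placeholder path
    input_arrays=['image_tensor'],                     # placeholder tensor names
    output_arrays=['detection_boxes', 'detection_scores'],
    input_shapes={'image_tensor': [1, 300, 300, 3]})
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization, which the Edge TPU compiler requires.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open('your_quantized_model.tflite', 'wb') as f:
    f.write(tflite_model)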
2) Compile the model from tflite -> edgetpu.tflite:
Once you have produced a fully quantized tflite model, congrats, your model is now much more efficient for ARM platforms and much smaller. However, it will still be run on the CPU unless you compile it for the Edge TPU. You can review this doc for installation and usage, but compiling it is as easy as:
$ edgetpu_compiler -s your_quantized_model.tflite
Hope this helps!
Is there a way to take a trained TensorFlow model and convert all the tf.Variables and their respective weights (either from within a running tf.Session or from a checkpoint) into tf.constants with those values, such that one can run the model on a new input tensor without initializing or restoring the weights in a session? So can I basically condense a trained model into a fixed and immutable TensorFlow operation?
Yes, there is a freeze_graph.py tool just for that purpose.
It is described (a bit) in the Tool Developer's Guide, and you can find a usage example in the Preparing models for mobile deployment section.
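For reference, a minimal in-Python sketch of the same idea using tf.graph_util.convert_variables_to_constants (TF 1.x), which is what freeze_graph.py wraps; the checkpoint prefix and output node name below are placeholders:

import tensorflow as tf  # assumes TF 1.x

# Build (or import) the graph exactly as it was defined for training, then:
with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, 'model.ckpt')  # placeholder checkpoint prefix

    # Replace every tf.Variable reachable from the output node with a tf.constant
    # holding its current value.
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['output/predictions'])  # placeholder output node name

    # The resulting GraphDef is self-contained and can be run without a Saver.
    tf.train.write_graph(frozen_graph_def, '.', 'frozen_graph.pb', as_text=False)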
I am trying to use a TensorFlow neural network in "interactive" mode:
my goal would be to load a trained model, keep it in memory, and then perform inference with it once in a while.
The problem is that apparently the TensorFlow Estimator class (tf.estimator.Estimator) does not allow this.
The method predict (documentation, source) takes as input a batch of features and the path to the model. Then it creates a session, loads the model and performs the inference.
After that, the session is closed, and for a subsequent inference it is necessary to load the model again.
How could I achieve my desired behavior using the Estimator class?
Thank you
You may want to have a look at tfe.make_template; its goal is precisely to make graph-based code available in eager mode.
Following the example given during the 2018 TF summit, that would give something like
import tensorflow.contrib.eager as tfe  # TF 1.x eager module

def apply_my_estimator(x):
    return my_estimator(x)

t = tfe.make_template('f', apply_my_estimator, create_graph_function=True)
print(t(x))
I'm looking to run a basic fully-connected neural network for the MNIST dataset with the C++ API v1.2 from TensorFlow. I have trained the model and exported it using tf.train.Saver() in Python. This gave me a checkpoint file, a data file, an index file and a meta file.
I know from using TensorBoard on a previous project that the data file contains the saved variables while the meta file contains the graph.
However, I am not sure what the recommended way is to load those files and run the trained model in a C++ environment in v1.2, since all the tutorials and questions I've found are for older versions which differ substantially.
I've found that tensorflow::ops::Restore should be the method to do such a thing, but I know that inference in TensorFlow isn't well supported, so I am not certain what parameters I should give it in order to obtain the trained model that I can just put into a session->Run() and receive an accuracy statement when fed test data.