I am trying to convert the Keras OCR example into a CoreML model.
I can already train my slightly modified model, and everything looks good in Python. Now I want to convert the model to CoreML to use it in my iOS app.
The problem is that the CoreML file format does not support Lambda layers.
I am not an expert in this field, but as far as I understand, the Lambda layer here is used to calculate the loss using ctc_batch_cost().
The layer is created around line 464.
I guess this is used for greater precision than the built-in loss functions.
Is there any way the model creation can be rewritten to fit the layer set CoreML supports?
I have no idea which output layer type to use for the model.
Cost functions usually aren't included in the CoreML model, since CoreML only does inference while cost functions are used for training. So strip out that layer before you export the model and you should be good to go.
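A minimal sketch of what "stripping out that layer" can look like, assuming tf.keras. The tiny network, its sizes, and the placeholder loss Lambda are hypothetical stand-ins for the OCR example, not its actual code; the layer names the_input, softmax and ctc are assumed to follow that example. The idea is to build a second Model that ends at the softmax output and export that one instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Tiny stand-in for the OCR network (hypothetical sizes).
input_data = keras.Input(shape=(16, 8), name="the_input")
x = layers.GRU(12, return_sequences=True)(input_data)
y_pred = layers.Dense(5, activation="softmax", name="softmax")(x)

# Training-only branch. This Lambda is a placeholder for the real
# ctc_batch_cost() wrapper; it only exists to show what gets dropped.
labels = keras.Input(shape=(4,), name="the_labels")
loss_out = layers.Lambda(lambda args: args[0][:, 0, :1], name="ctc")([y_pred, labels])
training_model = keras.Model(inputs=[input_data, labels], outputs=loss_out)

# Inference-only model: same layers and weights, but it stops at the softmax,
# so the Lambda loss layer is no longer part of the graph.
inference_model = keras.Model(inputs=input_data, outputs=y_pred)

print([layer.name for layer in inference_model.layers])
```

Because both models share the same layer objects, weights trained through training_model are already in inference_model. The output layer to use is then simply the softmax, and inference_model is the one to hand to the converter.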
Is there a method to apply a low pass filter on inputs for a Keras model? I have 4 inputs of noisy sensor data, and I'm curious if I can build it into the model before I export it for Inference with ONNX, or if I need to filter it outside the model.
I'm pretty new to ML, but currently my model works perfectly when I run the low-pass filter before the model. My goal is to limit user error by being able to attach the model directly to the sensor output.
Not sure if this helps, but since you're using Keras, you can include a Lambda layer between your Input Layer and the rest of your model. The Lambda layer lets you run arbitrary Tensorflow code on model inputs, so in theory you could implement your filter in this Lambda layer, provided it is something you can do with the Tensorflow primitives.
For instance, consider the following code:
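from tensorflow import keras

signal_length, no_channels = 128, 4  # example values only
input_layer = keras.layers.Input(shape=(signal_length, no_channels))
# First difference along the time axis: x[t+1] - x[t]
sig_diff = keras.layers.Lambda(lambda x: x[:, 1:] - x[:, :-1])(input_layer)
conv_1 = keras.layers.Conv1D(filters=5, kernel_size=3)(sig_diff)
# rest of model goes here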
In this very trivial example, I'm differencing the input signal before passing it to some sort of convolutional model. You might be able to implement your low pass filter in the lambda layer as above, which should solve the problem.
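As one possible variation, a simple moving average (a basic low-pass filter) does not even need a Lambda layer. The sketch below assumes tf.keras and uses AveragePooling1D with strides=1 as the filter; the window length, signal length and channel count are hypothetical, and a moving average is only a stand-in for whatever low-pass filter you actually use.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

signal_length, no_channels = 64, 4  # hypothetical: 4 noisy sensor channels
window = 5                          # hypothetical moving-average length

inputs = keras.Input(shape=(signal_length, no_channels))
# strides=1 slides the averaging window one sample at a time, so each channel
# is smoothed independently and the sequence length is preserved.
smoothed = layers.AveragePooling1D(pool_size=window, strides=1, padding="same")(inputs)
model = keras.Model(inputs, smoothed)  # the rest of the model would follow here

rng = np.random.default_rng(0)
noisy = rng.normal(size=(1, signal_length, no_channels)).astype("float32")
filtered = model.predict(noisy, verbose=0)
print(noisy.std(), filtered.std())  # the filtered signal should vary less
```

Since average pooling is a standard layer, it is likely to survive export more easily than arbitrary Lambda code, but that is worth verifying with your converter.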
I have a Keras (not tf.keras) model which I quantized (post-training) to run it on an embedded device.
To convert the model to a quantized tflite model, I tried different approaches and ended up with around five versions of quantized models. They all have slightly different sizes, but they all seem to work on my x86 machine. All models show different inference timings.
Now I would like to check how the models are actually quantized (fully, weights only, ...), as the embedded solution only accepts a fully quantized model. I also want to see more details, e.g. how the weights differ (which might explain the different model sizes). The model summary does not give any insights.
Can you give me a tip on how to go about it?
Does anyone know if the tflite conversion with the TF1.x version is always fully quantized?
Thanks
More explanation:
The models should be fully quantized, as I used
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
during conversion. However, I had to use either a TF1.x version for the conversion or tf.compat.v1.lite.TFLiteConverter.from_keras_model_file with TF2.x, so I am not sure whether the output model differs between the "classic" TF1.x version and the tf.compat.v1 version.
The way different models were created
Using TF1.3 converting a h5 model
using TF1.5.3 converting a h5 model
using TF2.2 converting a h5 model
converting h5 model to pb with TF1.3
converting h5 model to pb with TF1.5
converting h5 model to pb with TF2.2
using TF1.5.3 converting the converted pb models
using TF2.2 converting the converted pb models
Netron is a handy tool for visualizing networks. You can choose individual layers and see the types and values of weights, biases, inputs and outputs.
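As a programmatic complement to Netron, the TFLite interpreter can list every tensor with its dtype and quantization parameters. The sketch below is only a heuristic under stated assumptions: the helper names are made up for illustration, and "no float tensors left" is used as a rough indicator of full quantization. tf.lite.Interpreter and get_tensor_details() are the real TensorFlow calls; TensorFlow is imported lazily so the counting logic can be tried without it.

```python
from collections import Counter


def summarize_tensor_dtypes(tensor_details):
    """Count tensors per dtype name, given get_tensor_details() output."""
    counts = Counter()
    for detail in tensor_details:
        dtype = detail["dtype"]
        # get_tensor_details() reports numpy types such as numpy.uint8;
        # fall back to str() so plain strings work as well.
        counts[getattr(dtype, "__name__", str(dtype))] += 1
    return dict(counts)


def looks_fully_quantized(tensor_details):
    """Heuristic: True if no float tensors remain in the model."""
    dtype_names = summarize_tensor_dtypes(tensor_details)
    return not any(name.startswith("float") for name in dtype_names)


def load_tensor_details(model_path):
    """Read tensor metadata from a .tflite file."""
    import tensorflow as tf  # imported here so the helpers above stay TF-free

    interpreter = tf.lite.Interpreter(model_path=model_path)
    return interpreter.get_tensor_details()
```

Usage would be along the lines of summarize_tensor_dtypes(load_tensor_details("model.tflite")), where the file name is a placeholder. A weights-only quantized model typically still shows float32 activation tensors, while a fully quantized one mostly shows uint8/int8 tensors plus int32 biases. Each detail entry also carries a "quantization" field with the scale and zero point, which helps when comparing your five variants.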
I am creating a model somewhat similar to the one mentioned below:
[image: model]
I am using Keras to create such a model but have hit a dead end, as I have not been able to find a way to add softmax to the outputs of the LSTM units. So far, all the tutorials and help material I found only cover outputting a single class, even in the case of image captioning, as in this link.
So is it possible to apply softmax to every unit of the LSTM (where return_sequences is True), or do I have to move to PyTorch?
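The answer is: yes, it is possible to apply softmax to the output of each LSTM time step, and no, you do not have to move to PyTorch.
While in Keras 1.X you needed to explicitly wrap the Dense layer in a TimeDistributed layer, in Keras 2.X a Dense layer is applied to every time step of a 3D input, so you can just write (note return_sequences=True, so that the LSTM emits an output for every time step):
model.add(LSTM(50,activation='relu',return_sequences=True))
model.add(Dense(number_of_classes,activation='softmax'))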
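A quick way to check that the softmax really is applied per time step is to look at the output shape. The sketch below assumes tf.keras; the sizes are hypothetical, and the LSTM uses return_sequences=True so that it emits one output per time step.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features, number_of_classes = 10, 8, 4  # hypothetical sizes

model = keras.Sequential([
    keras.Input(shape=(timesteps, features)),
    # return_sequences=True keeps the output of every time step
    layers.LSTM(50, activation="relu", return_sequences=True),
    # Dense acts on the last axis, i.e. on each time step independently
    layers.Dense(number_of_classes, activation="softmax"),
])

batch = np.random.default_rng(0).normal(size=(2, timesteps, features)).astype("float32")
probs = model.predict(batch, verbose=0)
print(probs.shape)  # (batch, timesteps, number_of_classes)
```

Each probs[i, t] is a separate probability distribution over the classes, so the values along the last axis sum to 1 for every time step.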
I wanted to know what the differences are between this and this.
Is it just the ways the inputs vary?
The main differences between LayersModel and GraphModel are:
LayersModel can only be imported from tf.keras or keras HDF5 format model types. GraphModel can be imported from either the aforementioned model types or TensorFlow SavedModels.
LayersModel supports further training in JavaScript (through its fit() method). GraphModel supports only inference.
GraphModel usually gives you higher inference speed (10-20%) than LayersModel, due to its graph optimization, which is possible thanks to the inference-only support.
Hope this helps.
Both do the same task, i.e. converting a NN model to the tfjs format. The only difference is that the first link uses a model stored in h5 format (the format in which Keras models are typically saved), while the other uses a TF SavedModel.
I have used Keras to finetune MobileNet v1. Now I have model.h5 and I need to convert it to TensorFlow Lite to use it in Android app.
I use the TFLite conversion script tflite_convert. I can convert the model without quantization, but I need more performance, so I need to quantize it.
If I run this script:
tflite_convert --output_file=model_quant.tflite \
--keras_model_file=model.h5 \
--inference_type=QUANTIZED_UINT8 \
--input_arrays=input_1 \
--output_arrays=predictions/Softmax \
--mean_values=128 \
--std_dev_values=127 \
--input_shape="1,224,224,3"
It fails:
F tensorflow/contrib/lite/toco/tooling_util.cc:1634] Array
conv1_relu/Relu6, which is an input to the DepthwiseConv operator
producing the output array conv_dw_1_relu/Relu6, is lacking min/max
data, which is necessary for quantization. If accuracy matters, either
target a non-quantized output format, or run quantized training with
your model from a floating point checkpoint to change the input graph
to contain min/max information. If you don't care about accuracy, you
can pass --default_ranges_min= and --default_ranges_max= for easy
experimentation.\nAborted (core dumped)\n"
If I use default_ranges_min and default_ranges_max (so-called "dummy quantization"), it works, but that is only meant for debugging performance without regard to accuracy, as described in the error log.
So what do I need to do to make the Keras model correctly quantizable? Do I need to find the best default_ranges_min and default_ranges_max? How? Or does it require changes in the Keras training phase?
Library versions:
Python 3.6.4
TensorFlow 1.12.0
Keras 2.2.4
Unfortunately, TensorFlow does not yet provide tooling for post-training per-layer quantization in flatbuffer (tflite), only in protobuf. The only available way for now is to introduce fakeQuantization layers in your graph and re-train / fine-tune your model on the training set or a calibration set. This is called "quantization-aware training".
Once the fakeQuant layers are introduced, you can feed in the training set, and TF will use them in the forward pass as simulated quantization layers (fp32 datatypes that represent 8-bit values) and back-propagate using full-precision values. This way, you can recover the accuracy that was lost due to quantization.
In addition, the fakeQuant layers are going to capture the ranges per layer or per channel through moving average and store them in min / max variables.
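To make the two ideas above concrete, here is a small NumPy sketch of what a fakeQuant node conceptually does: quantize and immediately dequantize values inside a [min, max] range, while tracking that range with a moving average. The function and class names are made up for illustration. It is a simplification and not TensorFlow's implementation; in particular, it skips the range nudging that makes zero exactly representable.

```python
import numpy as np


def fake_quantize(x, x_min, x_max, num_bits=8):
    """Snap float values to one of 2**num_bits levels, but keep them as floats."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)  # integer level index
    return q * scale + x_min                                  # back to float


class RangeTracker:
    """Moving average of per-batch min / max, as a stand-in for the min / max variables."""

    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, batch):
        b_min, b_max = float(np.min(batch)), float(np.max(batch))
        if self.min is None:
            self.min, self.max = b_min, b_max
        else:
            m = self.momentum
            self.min = m * self.min + (1 - m) * b_min
            self.max = m * self.max + (1 - m) * b_max


x = np.linspace(-1.0, 1.0, 1000)
y = fake_quantize(x, -1.0, 1.0)
print(len(np.unique(y)), np.max(np.abs(x - y)))
```

The output is still floating point, but it can only take 256 distinct values, which is how the network gets to see quantization error during training.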
Later, you can extract the graph definition and get rid of the fakeQuant nodes using the freeze_graph tool.
Finally, the model can be fed into tf_lite_converter (fingers crossed it won't break) to produce the u8_tflite with the captured ranges.
A very good white paper explaining all of this is provided by Google here: https://arxiv.org/pdf/1806.08342.pdf
Hope that helps.