I am trying to measure the FLOPs of a TFLite model in TF2.
I know that TensorFlow 1.x had tf.profiler, which was great for measuring FLOPs, but it doesn't seem to work well with tf.keras.
Could anybody describe how to measure FLOPs for a TFLite model in TF2? I can't seem to find an answer online.
Thank you all so much for your time.
Edit: The link commented below does not help with tflite.
I encountered the same problem and wrote a simple python package to roughly calculate FLOPS.
https://github.com/lisosia/tflite-flops
Only Conv and DepthwiseConv layers are considered, but it was sufficient for my use case.
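If you want the same kind of rough estimate without the package, the dominant term for a standard Conv2D layer is FLOPs ≈ 2 × K_h × K_w × C_in × C_out × H_out × W_out (counting a multiply-accumulate as 2 ops). Here is a minimal sketch of that calculation; all layer dimensions below are hypothetical placeholders you would read from your own model:
# Rough FLOPs estimate for a single Conv2D layer (multiply-accumulate counted as 2 ops)
def conv2d_flops(kernel_h, kernel_w, in_channels, out_channels, out_h, out_w):
    return 2 * kernel_h * kernel_w * in_channels * out_channels * out_h * out_w

# Example: a 3x3 conv, 32 -> 64 channels, 112x112 output feature map (made-up numbers)
print(conv2d_flops(3, 3, 32, 64, 112, 112))  # ~4.6e8 FLOPs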
Unfortunately, there's no direct way to calculate the FLOPs of a TFLite model. However, you can estimate the value indirectly, by following these 3 steps:
Use the official TFLite performance tool to measure how long your model takes (in ms) to perform a single inference.
Use some benchmark app (such as xOPS) to estimate how many floating-point operations per second (FLOPS) your target device can run.
Use the results you got from steps 1 and 2 to estimate the number of floating-point operations your model performs during a single inference.
The final result will probably be a rough approximation, but it still can bring some value to your performance analysis.
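As a purely illustrative example (all numbers below are hypothetical): if the benchmark tool reports 20 ms per inference and the benchmark app says your device sustains about 10 GFLOPS, the model performs on the order of 2e8 floating-point operations per inference:
# Hypothetical numbers - replace with your own measurements from steps 1 and 2
latency_s = 0.020        # step 1: measured inference time (20 ms)
device_flops = 10e9      # step 2: device throughput from a benchmark app (10 GFLOPS)
model_flops = device_flops * latency_s
print(f"~{model_flops:.2e} FLOPs per inference")  # ~2.00e+08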
System information
OpenVINO => 2022.1
Operating System / Platform => Intel(R) Core(TM) i5-9400F CPU @ 2.90GHz / Windows 10 64 Bit
I trained the following YoloV5 model:
Model Size: Large
Labels: ['mango', 'apple', 'milk', 'orange', 'grapes'].
batch-size: 4
Img-Size: 512
When I run inference with the trained YoloV5 model, the detections are decent and it is able to detect all 5 labels. The detection confidence is also good, averaging around 90%.
I then optimized the model using OpenVino:
Quantization: FP16, FP32
But the converted model only detects mango, apple, and grapes and completely ignores the remaining labels.
Things I have tried:
Retraining the Yolov5 model with different batch-size.
Tried different quantization while converting to OpenVino.
Tried different (previous) versions of OpenVino like 2020.4.
I have previously faced similar issues while training other models but could never figure out the solution or even the cause. Has anyone else faced similar issues?
It would be ideal if someone can guide me in a direction to help solve it. Other answers that also explain potential causes of the issue are also welcome!
Converting the model into a smaller precision has its pros and cons.
The inferencing time is faster but the trade-off is accuracy.
If your use case involves something like clinical results that must be accurate, smaller precision is not recommended, since you would have to accept the loss in accuracy. Meanwhile, if your use case needs to be fast without being highly precise, then smaller precision (FP16/INT8) is suitable.
You should carefully choose the right precision depending on your use case and also hardware.
This might help you to further understand.
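For example, with OpenVINO the precision is chosen at conversion time via the Model Optimizer's data_type flag, so you can generate both variants and compare their accuracy on your own validation set (paths below are placeholders):
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir_fp32"
mo --saved_model_dir "model" --data_type FP16 --output_dir "model_ir_fp16"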
I am trying to make a Multi-Class classification application, but my dataset has 300 classes, is it possible to train my model with all these classes with a normal PC?
Sure it is. You can even train imagenet with 1000 categories or more, if you have enough time! ;)
You just have to think about which loss function you want (categorical crossentropy, sparse categorical crossentropy or even binary if you want to penalize each output node independently), apart from that there's not really much difference between 10, 100 or a 1000 classes.
Of course you have to increase your model size to account for more classes, so RAM may be an issue, but then you can always decrease batch size. If you are using images and convnets and your model is still too large, try to downsample the images, use pooling layers or larger strides.
If your computer is too old and slow, you can also try Google Colab which offers free GPU and even TPU online!
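To make the loss-function point concrete, here is a minimal Keras sketch with a 300-way softmax output (the layer sizes and input shape are arbitrary placeholders); with integer labels you would use sparse_categorical_crossentropy, with one-hot labels categorical_crossentropy:
import tensorflow as tf

num_classes = 300
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(64,)),  # placeholder input size
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
# Integer labels in 0..299 -> sparse categorical crossentropy
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])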
It is difficult to answer this question. The training time of your model depends on a number of factors. It might be best to train your model for a certain number of hours and evaluate the performance. You could also fit a learning curve, which could provide an estimation of how many data points you require to achieve a certain performance. After that you could link the required amount of data to computation time.
Here is an article that provides an algorithm for fitting a learning curve: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3307431/.
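As a rough illustration (not necessarily the exact algorithm from the article), you can fit a simple power-law curve, error ≈ a · n^(-b) + c, to a few (training set size, validation error) measurements and extrapolate; all the data points below are made up:
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # error ~ a * n**(-b) + c, a common learning-curve shape
    return a * np.power(n, -b) + c

# Hypothetical (training set size, validation error) measurements
sizes = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
errors = np.array([0.42, 0.35, 0.30, 0.27, 0.25])

params, _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.5, 0.2], maxfev=10000)
print("Predicted error at 32k samples:", power_law(32000.0, *params))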
I am combining a Monte-Carlo Tree Search with a convolutional neural network as the rollout policy. I've identified the Keras model.predict function as being very slow. After experimentation, I found that surprisingly model parameter size and prediction sample size don't affect the speed significantly. For reference:
0.00135549 s for 3 samples with batch_size = 3
0.00303991 s for 3 samples with batch_size = 1
0.00115528 s for 1 sample with batch_size = 1
0.00136132 s for 10 samples with batch_size = 10
As you can see, I can predict 10 samples at about the same speed as 1 sample. The change is also very minimal, though noticeable, if I decrease the parameter size by 100x, but I'd rather not change the parameter size by that much anyway. In addition, the predict function is very slow the first time it is run (~0.2 s), though I don't think that's the problem here since the same model is predicting multiple times.
I wonder if there is some workaround because clearly the 10 samples can be evaluated very quickly, all I want to be able to do is predict the samples at different times and not all at once since I need to update the Tree Search before making a new prediction. Perhaps should I work with tensorflow instead?
The batch size controls parallelism when predicting, so it is expected that increasing the batch size gives better performance, as you can use more cores and use the GPU more efficiently.
There is not really anything to work around: a batch size of one is the worst case for performance. You could look into a smaller network that is faster to predict, or predict on the CPU if your experiments run on a GPU, to minimize the overhead of data transfer.
Don't forget that model.predict does a full forward pass of the network, so its speed completely depends on the network architecture.
One change that gave me a speed-up was switching from model.predict(x) to
model.predict_on_batch(x)
making sure your x shape has 1 as the first dimension.
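For example (a minimal sketch, assuming your single-input Keras model is already loaded as model and get_single_sample() is a hypothetical helper returning one unbatched sample):
import numpy as np

x = get_single_sample()                   # hypothetical: one sample, e.g. shape (H, W, C)
x_batch = np.expand_dims(x, axis=0)       # add the batch dimension -> shape (1, H, W, C)
probs = model.predict_on_batch(x_batch)   # avoids some per-call overhead of model.predict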
I don't think working with pure Tensorflow would change the performance much. Keras is a high-level API for low-level Tensorflow primitives. You could use a smaller model instead, like MobileNetV3 or EfficientNet, but this would require retraining.
If you need to remain with the existing model, you could try OpenVINO. OpenVINO is optimized for Intel hardware, but it should work with any CPU. It optimizes your model by converting to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization in runtime.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets are below.
Install OpenVINO
The easiest way is with pip. Alternatively, you can use this tool to find the best option for your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO is not able to convert an HDF5 model directly, so you have to save it as a SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device, e.g., CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what the best choice for you is, use AUTO. You care about latency, so I suggest adding a performance hint (as shown below) to use the device that fulfills your requirement.
# Load the network
from openvino.runtime import Core

ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.
I need to fit a deep neural network to data coming from a data generating process, think of an AR(5). So I have five features per observation and one y for some large number N observations in each simulation. I am interested only in the root mean squared error of the best performing DNN in each simulation.
Since it's a simulation setting, I have to run a large number of these simulations and, within each simulation, fit a neural network to the data. The only reasonable way I can think of doing this is to fit the DNN via hyper-parameter optimisation for each simulation (dlib's find_min_global will be my optimiser).
Does it make sense to do this exercise in C++ (slow development, because I am not proficient) or Python (faster iteration, because I am fairly proficient)?
From where I am sitting, C++ or Python might not make much of a difference in execution time, because the model has to be compiled each time the optimiser proposes a new hyper-parameter vector (am I wrong here?).
If it is possible to compile once and test all hyper-parameters between the lower and upper bounds, then C++ would be my go-to solution (is this possible in any of the open-source DNN frameworks?).
If anyone has done this exercise before, please advise.
Thank you all for your help.
Looking at your problem, one way to implement this is to use a genetic/evolutionary algorithm. If I understood your problem correctly, you want to sweep through the hyper-parameters to find the best solution.
I would recommend using Python for this; TensorFlow and Keras both support it, so this should not be a problem.
Note - If I understood your question differently, then please feel free to correct me.
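If a full genetic algorithm feels like overkill, even a plain random search gives you the "build a model per candidate, keep the best RMSE" loop in Python. This is only a sketch with made-up search ranges and training settings, not a drop-in replacement for dlib's find_min_global:
import numpy as np
import tensorflow as tf

def build_model(n_units, learning_rate, n_features=5):
    # Small feed-forward net; 5 input features as in an AR(5) setup
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(n_units, activation="relu", input_shape=(n_features,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss="mse")
    return model

def best_rmse(X_train, y_train, X_val, y_val, n_trials=20, seed=0):
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_trials):
        units = int(rng.integers(8, 128))   # made-up search range
        lr = 10 ** rng.uniform(-4, -2)      # made-up search range
        model = build_model(units, lr)
        model.fit(X_train, y_train, epochs=20, batch_size=64, verbose=0)
        mse = model.evaluate(X_val, y_val, verbose=0)
        best = min(best, float(np.sqrt(mse)))
    return best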
I was asked to create a machine learning algorithm using TensorFlow and Python that could detect anomalies by learning a range of 'normal' values. I have two parameters: a large array of floats around 1.5, and timestamps. I have not seen similar threads using TensorFlow in a basic sense, and since I am new to the technology I am looking to build a fairly basic model. However, I would like it to be unsupervised, meaning that I do not specify what an anomaly is; rather, a large amount of past data does. Thank you; I am running Python 3.5 and TensorFlow 1.2.1.
Deep Learning - Anomaly and Fraud Detection
https://exploreai.org/p/deep-learning-anomaly-and-fraud-detection
Simply normalize the values and feed them to a TensorFlow autoencoder model.
Autoencoders are deep neural networks trained to reproduce their input at the output layer, i.e. the number of neurons in the output layer is exactly the same as the number of neurons in the input layer.
The encoder part of the architecture compresses the input data into a smaller representation, ensuring that the important information is not lost while the overall size of the data is reduced significantly; this concept is called dimensionality reduction. The decoder then reconstructs the input from that compressed representation, and a high reconstruction error on new data signals an anomaly.
Check this repo for code: Autoencoder in tensorflow
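As a starting point, here is a minimal tf.keras autoencoder sketch for 1-D sensor values (window size, layer sizes, and threshold rule are arbitrary choices, and it uses the modern tf.keras API rather than TensorFlow 1.2): train it on normalized past data assumed to be normal, then flag readings whose reconstruction error exceeds a threshold derived from the training errors.
import numpy as np
import tensorflow as tf

WINDOW = 32  # arbitrary number of consecutive readings per example

def make_windows(values, window=WINDOW):
    # values: 1-D array of your floats (~1.5), already normalized
    return np.array([values[i:i + window] for i in range(len(values) - window)])

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(WINDOW,)),  # encoder
    tf.keras.layers.Dense(4, activation="relu"),                          # bottleneck
    tf.keras.layers.Dense(16, activation="relu"),                         # decoder
    tf.keras.layers.Dense(WINDOW),                                        # reconstruct the window
])
autoencoder.compile(optimizer="adam", loss="mse")

# X_normal = make_windows(normal_values)  # hypothetical: your past "normal" data
# autoencoder.fit(X_normal, X_normal, epochs=50, batch_size=64, verbose=0)
# errors = np.mean((autoencoder.predict(X_normal) - X_normal) ** 2, axis=1)
# threshold = errors.mean() + 3 * errors.std()  # simple rule of thumb
# A new window is flagged as an anomaly if its reconstruction error exceeds the threshold.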