I want to use TensorFlow for detecting cars in an embedded system, so I tried ssd_mobilenet_v2 and it actually did pretty well for me, except for some specific car types which are not very common, and I think that is why the model does not recognize them. I have a dataset of these cases and I want to improve the model by fine-tuning it. I should also note that I need a .tflite file because I'm using tflite_runtime in Python.
I followed these instructions https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10 and was able to train the model and reach a reasonable loss value. I then used export_tflite_ssd_graph.py from the Object Detection API to build an inference graph from the trained model, and afterwards used the toco tool to build a .tflite file from it.
But here is the problem: after doing all that, not only did the model not improve, it now does not detect any cars at all. I am confused and do not know what the problem is. I searched a lot and did not find any tutorial covering what I need to do; the tutorials just add a new object class to a model and then export it, which I tried and managed to do successfully. I also tried building a .tflite file directly from the TensorFlow detection model zoo, without any training, and that worked fine. So I think the problem has something to do with the training process; maybe I am missing something there.
Another thing I did not find in the documentation is whether it is possible to "add" a class to the existing classes of an object detection model. For example, assuming ssd_mobilenet_v2 detects 90 different object classes, I would like to add another class so that the model detects 91 classes instead of 90. As far as I understand and have tested, after doing transfer learning with the Object Detection API I can only detect the objects that were in my dataset, and the old classes are gone. So how do I do what I described?
I found out that there is no way to 'add' a class to the previously trained classes, but by providing a small amount of data for that class you can have your model detect it. The reason is that the last layer of the model is replaced when transfer learning is applied. In my case I labeled around 3k frames containing about 12k objects because my frames are complicated, but for simpler tasks, as I saw in tutorials, 200-300 annotated images would be enough.
As for the model not detecting anything, that had to do with the convert command I used: I should have used tflite_convert instead of toco. I explained more here.
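For reference, a typical tflite_convert call for a graph exported with export_tflite_ssd_graph.py looks roughly like this (a sketch: the file names and the default 300x300 input shape are assumptions, and the output arrays are the post-processing tensors that script adds):

tflite_convert \
  --graph_def_file=tflite_graph.pb \
  --output_file=detect.tflite \
  --input_shapes=1,300,300,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' \
  --inference_type=FLOAT \
  --allow_custom_ops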
According to the demo code
"Image similarity estimation using a Siamese Network with a contrastive loss"
https://keras.io/examples/vision/siamese_contrastive/
I'm trying to save the model with model.save to h5 or hdf5; however, after using load_model (I even tried load_weights) I get an error: unknown opcode.
I have done some googling, and everything tells me it's a Python version problem between py3.5 and py3.6, but I actually only use Python 3.8.
Other sources say there is some extra work that needs to be done, either when building the model or in load_model.
It would be very kind of anyone to help provide the save and load model parts to make this demo code more complete.
Thanks!!
Actually, here they are using two individual pieces that have to be passed in as custom objects.
Custom objects:
contrastive loss
embedding layer: the Lambda layer where euclidean_distance is computed
Saving model:
it's straightforward:
<model_name>.save("siamese_contrastive.h5")
Loading model:
Here comes the good part: the model will not load directly here, because it doesn't have an understanding of two things, one is your custom layer and the second is your loss.
model = tf.keras.models.load_model('siamese_contrastive.h5', custom_objects={ })
In the custom_objects argument mentioned above, you have to provide the definitions of those two objects.
After that, it will accept your model and it will run at inference time.
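As a minimal sketch of what that can look like (assuming you redefine euclidean_distance and the loss(margin) factory with the same names as in the linked Keras example):

import tensorflow as tf

# These definitions must match the names used when the model was built;
# they follow the linked Keras example.
def euclidean_distance(vects):
    x, y = vects
    sum_square = tf.math.reduce_sum(tf.math.square(x - y), axis=1, keepdims=True)
    return tf.math.sqrt(tf.math.maximum(sum_square, tf.keras.backend.epsilon()))

def loss(margin=1):
    def contrastive_loss(y_true, y_pred):
        square_pred = tf.math.square(y_pred)
        margin_square = tf.math.square(tf.math.maximum(margin - y_pred, 0))
        return tf.math.reduce_mean((1 - y_true) * square_pred + y_true * margin_square)
    return contrastive_loss

model = tf.keras.models.load_model(
    "siamese_contrastive.h5",
    custom_objects={
        "euclidean_distance": euclidean_distance,
        "contrastive_loss": loss(margin=1),
    },
)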
Still figuring out how? Have a look at my implementation and let me know if you still have any questions: https://github.com/anukash/Keras_siamese_contrastive
I want to detect and count the number of vines in a vineyard using Deep Learning and Computer Vision techniques. I am using the YOLOv4 object detector and training on the darknet framework. I have been able to integrate the SORT tracker into my application and it works well, but I still have the following issues:
The tracker sometimes reassigns a new ID to the same object
The detector sometimes misidentifies the object (which leads to incorrect tracking)
The tracker sometimes does not track a detected object.
You can see an example of the reassignment issue in the following image. As you can see, in frame 40 ID 9 was assigned to a metal post, and from frame 42 onwards it is assigned to a tree.
In searching for the cause of these problems, I have learnt that DeepSORT is an improved version of SORT, which aims to handle this problem by using a neural network to associate tracks with detections.
Problem:
The problem I am facing is with training this particular model for DeepSORT. I have seen that the authors used cosine metric learning to train their model, but I have not been able to adapt that training to my custom classes. The questions I have are as follows:
I have a dataset of annotated (YOLO TXT format) images which I have used to train the YOLOv4 model. Can I reuse the same dataset for the DeepSORT tracker? If so, then how?
If I cannot reuse the dataset, then how do I create my own dataset for training the model?
Thanks in advance for the help!
Yes, you can use the same classes for DeepSORT. SORT works in two stages, and DeepSORT adds a third. The first stage is detection, which is handled by YOLO; the next is track association, which is handled by a Kalman filter and IOU. DeepSORT adds the third stage: a Siamese network that compares the appearance features of the current detections with the stored features of each track. I've seen implementations use ResNet as the feature embedding network.
Basically, once YOLO detects your class, you pass the cropped detection to the Siamese network, which converts it into a feature embedding and compares it with the past ones using cosine distance.
In conclusion, you can use the same YOLO classes for DeepSORT and SORT since they both need a detection stage, which is handled by YOLO.
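To make the appearance stage concrete, here is a rough sketch (not the actual DeepSORT code: it uses an ImageNet-pretrained ResNet50 as a stand-in for the re-identification network, which DeepSORT trains with cosine metric learning, and it assumes boxes given as pixel coordinates):

import numpy as np
import tensorflow as tf

# Stand-in embedding network; DeepSORT's own embedding model is a
# re-identification network trained with cosine metric learning.
embedder = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                          input_shape=(128, 64, 3))

def embed_crops(frame, boxes):
    """Crop each detection (x1, y1, x2, y2 in pixels) and turn it into a feature embedding."""
    crops = []
    for (x1, y1, x2, y2) in boxes:
        crop = tf.image.resize(frame[y1:y2, x1:x2], (128, 64))
        crops.append(tf.keras.applications.resnet50.preprocess_input(crop))
    feats = embedder(tf.stack(crops), training=False).numpy()
    # L2-normalise so that a dot product equals cosine similarity
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def cosine_distance_matrix(track_feats, det_feats):
    """1 - cosine similarity between every stored track feature and every new detection."""
    return 1.0 - track_feats @ det_feats.T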
I'm new to this topic, so forgive my lack of knowledge. There is a very good model called Inception ResNet v2 that basically works like this: the input is an image, and the output is a list of predictions with their positions and bounding rectangles. I find this very useful, and I thought of using this already-working model to recognize things it currently can't (for example, whether a human is wearing a mask or not). Yes, I want to add a new recognition class to the model.
import tensorflow as tf
import tensorflow_hub as hub
mod = hub.load("https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1")
mod is an object of type tensorflow.python.training.tracking.tracking.AutoTrackable. Reading the documentation (which was only available in the source code and was hard to understand without context), I tried to inspect some of its properties to see if I could figure it out by myself.
Well, I didn't. How can I see the network, the layers, the weights, the fit methods? Is it all abstracted away? Can I convert it to Keras? I want to experiment with it, see if I can modify it, and see if I could export the model to another representation, for example PyTorch.
I wanted to do this because I thought it'd be better to modify an already working model instead of creating one from scratch. Also because I'm not good at training models myself.
I've run into this issue too. The TensorFlow Hub guide says:
This error frequently arises when loading models in TF1 Hub format with the hub.load() API in TF2. Adding the correct signature should fix this problem.
mod = hub.load(handle).signatures['default']
As an example, you can see this notebook.
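For instance, loading that module through its default signature and running it on a single image looks roughly like this (a sketch following the TF Hub object detection notebook; the image path is just a placeholder):

import tensorflow as tf
import tensorflow_hub as hub

detector = hub.load(
    "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"
).signatures['default']

# The signature expects a float32 image batch with values in [0, 1]
img = tf.io.decode_jpeg(tf.io.read_file("example.jpg"), channels=3)  # placeholder path
img = tf.image.convert_image_dtype(img, tf.float32)[tf.newaxis, ...]

result = detector(img)
print(result["detection_class_entities"][:5], result["detection_scores"][:5])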
You can dir the loaded model asset to see what's defined on it:
m = hub.load(handle)
dir(m)
As mentioned in the other answer, you can also look at the signatures with print(m.signatures)
Hub models are SavedModel assets and do not have a Keras .fit method on them. If you want to train the model from scratch, you'll need to go to the source code.
Some models have more extensive exported interfaces including access to individual layers, but this model does not.
I'm studying different object detection algorithms for my interest.
The main reference is Andrej Karpathy's slides on object detection, linked here.
I would like to start from some reference implementation, in particular something that lets me directly test some of the networks mentioned there on my data (mainly consisting of onboard camera footage from car and bike races).
Unfortunately I have already used a pretrained network (a repo forked from JunshengFu's, where I slightly adapted YOLO to my use case), but the classification accuracy is rather poor, I guess because there were not many training instances of racing cars such as Formula 1 cars.
For this reason I would like to retrain the networks, and this is where I'm running into the most issues:
properly training some of these networks requires either hardware (powerful GPUs) or time that I don't have, so I was wondering whether I could retrain just part of the network, in particular the classification part, and whether there is any repo that already allows that.
Thank you in advance
That is called fine-tuning of the network, or transfer learning. Basically you can do that for any network you find (with a similar problem domain, of course), and then, depending on the amount of data you have, you either fine-tune the whole network or freeze some layers and train only the last ones. In your case you would probably need to freeze the whole network except the last fully-connected layers (which you actually replace with new ones matching your number of classes), which perform the classification. I don't know which library you use, but TensorFlow has an official tutorial on transfer learning; however, it's not very clear, to be honest.
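As a minimal Keras sketch of that freeze-and-replace idea (the backbone, input size and class count are placeholders, not taken from your setup):

import tensorflow as tf

NUM_CLASSES = 3  # placeholder for your own number of classes

# Pretrained backbone without its classification head, frozen
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False  # freeze every backbone layer

# New classification head, the only part that gets trained
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])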
You can find a more user-friendly tutorial, written by an enthusiast, here: tutorial. There is a code repository as well. One correction you need, though: the author fine-tunes the whole network, while if you want to freeze some layers you need to get the list of trainable variables, remove the ones you want to freeze, and pass the resulting list to the optimizer (so it ignores the removed variables), like the following:
import tensorflow as tf
import tensorflow.contrib.slim as slim

all_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='InceptionResnetV2')
to_train = all_vars[-6:]  # better to select them by name explicitly, but this still works
optimizer = tf.train.AdamOptimizer(learning_rate=0.0001)
# total_loss comes from the model definition in the tutorial
train_op = slim.learning.create_train_op(total_loss, optimizer, variables_to_train=to_train)
Further, TensorFlow has a so-called model zoo (a collection of trained models you can use for your purposes and for transfer learning). You can find it here.
On the tensorflow/models repo on GitHub, they supply five pre-trained models for object detection.
These models are trained on the COCO dataset and can identify 90 different objects.
I need a model to just detect people, and nothing else. I can modify the code to only print labels on people, but it will still look for the other 89 objects, which takes more time than just looking for one object.
I can train my own model, but I would rather be able to use a pre-trained model, instead of spending a lot of time training my own model.
So is there a way, either by modifying the model file, or the TensorFlow or Object Detection API code, so it only looks for a single object?
Either you fine-tune your model on pedestrians with just a few thousand steps (a small training dataset would be enough), or you look in your label definition file (the .pbtxt file), search for the person label, and handle the other classes however you want.
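A rough sketch of the second option, post-filtering the detections to the person class (assumptions: the detections come in the standard Object Detection API output dict, and class id 1 is "person" as in the COCO label map; keep_only_people is a hypothetical helper):

import numpy as np

PERSON_CLASS_ID = 1  # id of "person" in mscoco_label_map.pbtxt

def keep_only_people(output_dict, score_threshold=0.5):
    """Drop every detection whose class is not person (post-filtering only; the model still predicts all classes)."""
    classes = np.asarray(output_dict['detection_classes'])
    scores = np.asarray(output_dict['detection_scores'])
    boxes = np.asarray(output_dict['detection_boxes'])
    keep = (classes == PERSON_CLASS_ID) & (scores >= score_threshold)
    return boxes[keep], scores[keep]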