I trained a custom object detection model for my use case using Darknet and YOLOv4. I listed 3 classes in the obj.names file, as shown below:
# data/obj.names
no_helmet
helmet
vest
Training completed, and detection was working with good results.
Now, I wanted to add 2 new classes to the model, so I updated the class file with 2 new class names:
# data/obj.names
no_helmet
helmet
vest
fire
smoke
I made changes to the config file, updating classes=3 to classes=5 and filters=24 to filters=30 in all the [yolo] layers and the [convolutional] layers immediately preceding them.
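For reference, the usual Darknet rule for that filters value is filters = (classes + 5) * masks-per-[yolo]-layer (3 by default), which is where 24 and 30 come from. A quick sanity check:

# Sanity check for the filters value in the [convolutional] layer that
# precedes each [yolo] layer: filters = (classes + 5) * num_masks.
# Each predicted box carries 4 coordinates + 1 objectness score + one
# confidence per class; num_masks is the anchors per [yolo] layer (3 by default).

def yolo_filters(num_classes: int, num_masks: int = 3) -> int:
    return (num_classes + 5) * num_masks

print(yolo_filters(3))  # 24 -- the original 3-class config
print(yolo_filters(5))  # 30 -- the updated 5-class config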
For the dataset, I only provided images and annotations for the 2 new classes (fire and smoke).
Then I started the Darknet training, providing my old YOLOv4 trained weights as the weights parameter. After it completed, I ran tests and it didn't detect anything in the image, not even the old classes.
Where did I go wrong?
My feeling is that since I didn't provide a dataset for the older classes, the model forgot them. But then it should at least have detected the new classes, right? Or am I wrong here?
NEW EDIT:
I trained again with the combined dataset (the old 3 classes plus the 2 new classes) on the pre-trained custom weights (trained on the first 3 classes), and when I ran the test, there was still no output.
Could somebody explain to me what's going on here? I'm thinking there is some mathematics behind it which I don't know about.
Do I have to train from scratch every time?
You can't insert new classes into a trained model.
When you trained using only the fire and smoke dataset, your config was not set up for 2 classes but for 5. That is probably why you didn't get anything in your second test.
The model does not forget; it uses those weights as a head start for your new model. Just train again with a dataset containing all categories labelled.
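For example, the combined training list could be built like this (a sketch; the paths are hypothetical, and the per-image Darknet label files must use class ids matching the 5-class obj.names, i.e. the old classes keep ids 0-2 and fire/smoke use 3-4):

# Sketch of building a combined Darknet train.txt covering both datasets.
# Paths are hypothetical; adjust to your layout.
from pathlib import Path

old_images = sorted(Path("data/old_dataset/images").glob("*.jpg"))
new_images = sorted(Path("data/new_dataset/images").glob("*.jpg"))

with open("data/train.txt", "w") as f:
    for img in [*old_images, *new_images]:
        f.write(f"{img}\n")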
Because of the procedure YOLO follows in training, any further training will modify the current layers. So if you add new classes and retrain, you will refine the existing classes while learning the new ones from scratch, which means not all classes will behave the same during detection. I think the safest way is to train everything from scratch.
Related
I am trying to train a custom object detector. Is there a limit on the number of target classes that the YOLOv5 architecture can be trained on?
For example, the COCO dataset has 80 target classes; suppose I have 500 object types to detect, is it advisable to use YOLOv5?
Can this be explained with reasons?
You can add as many classes as you want to any network.
The YOLO architecture is known for prioritizing inference time over raw accuracy. Although it achieves good results on conventional datasets, the YOLO model is built for speed.
But essentially you want a network with a good backbone (deep and wide) that can extract rich features from your image.
From my experience, there is really no straightforward answer. It also depends on your dataset, for example whether you have large, medium, or small objects to detect. I really recommend trying out different models, because every model will perform differently on custom datasets; from there you select the best one. State-of-the-art models don't directly translate to the best model for transfer learning and fine-tuning.
YOLO and the other single-shot detectors were, for me, the ones that worked best for fine-tuning (RetinaNet was best for my use cases so far). They are good for hyperparameter tuning because you can train them fast and test what works and what doesn't. With two-stage detectors (Faster R-CNN etc.) I never achieved good overall results, mainly because the training process is different and much slower.
I recommend you read this article; it explains both architecture types, with their pros and cons.
Additionally, if you want to train a model for more than 500 classes: the Tensorflow Object Detection API has pre-trained models for the OpenImages dataset (600 classes), and there is Detectron2 trained on the LVIS dataset (1200 classes). I recommend starting from models trained on a higher number of classes if you want to fine-tune to a similar number of classes in your dataset.
I want to use TensorFlow for detecting cars in an embedded system, so I tried ssd_mobilenet_v2 and it actually did pretty well for me, except for some specific car types which are not very common, which I think is why the model does not recognize them. I have a dataset of these cases and I want to improve the model by fine-tuning it. I should also note that I need a .tflite file because I'm using tflite_runtime in Python.
I followed these instructions https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10 and I was able to train the model and reach a reasonable loss value. I then used export_tflite_ssd_graph.py from the Object Detection API to build an inference graph from the trained model. Afterwards I used the toco tool to build a .tflite file out of it.
But here is the problem: after I'd done all that, not only did the model not improve, it no longer detects any cars at all. I'm confused and don't know what the problem is. I searched a lot and did not find any tutorial about doing what I need to do; they just added a new object to a model and then exported it, which I tried and succeeded at. I also tried to build a .tflite file without training the model, directly from the TensorFlow detection model zoo, and it worked fine. So I think the problem has something to do with the training process; maybe I am missing something there.
Another thing I did not find in the documentation is whether it is possible to "add" a class to the current classes of an object detection model. For example, assume ssd_mobilenet_v2 detects 90 different object classes; I would like to add another class so that the model detects 91 classes instead of 90. As far as I understand and have tested, after doing transfer learning with the Object Detection API I could only detect the objects in my dataset, and the old classes were gone. So how do I do what I explained?
I found out that there is no way to "add" a class to the previously trained classes, but by providing a small amount of data for that class you can have your model detect it. The reason is that the last layer of the model changes when transfer learning is applied. In my case I labeled around 3k frames containing about 12k objects, because my frames were complicated; but for simpler tasks, as I saw in tutorials, 200-300 annotated images would be enough.
As for the model not detecting anything: it had to do with the convert command I used. I should have used tflite_convert instead of toco. I explained more here.
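For anyone hitting the same issue, here is a sketch of the equivalent conversion using the TF 1.x Python converter (it mirrors the tflite_convert CLI). The tensor names are the standard ones produced by export_tflite_ssd_graph.py; the 300x300 input shape is an assumption matching ssd_mobilenet_v2's default.

# Sketch of the conversion in TF 1.x Python (mirrors tflite_convert).
import tensorflow as tf  # TensorFlow 1.x

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tflite_graph.pb",  # output of export_tflite_ssd_graph.py
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # the detection postprocess op is a custom TFLite op
with open("detect.tflite", "wb") as f:
    f.write(converter.convert())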
I wish to repeat a series of image classification experiments by reusing the same CNN with identical hyperparameters, especially initializations. So if I save a model after I have instantiated it and before I train it, does that also save the initializations? If I then reload it later and train with a different dataset and labels, does this new model start with the same hyperparameters and initializations as the first model I trained? I am currently using fastai, which is, of course, a library/set of APIs built on PyTorch, but I think everyone would be helped by a more general explanation that covers all CNNs using any library.
I expect an answer that says: "After this point in the workflow of creating a CNN, the model is initialized; if you save it at this point, you can reload it later and use the same hyperparameters and initializations in your next model."
You can save the learner as soon as it is created.
Example:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)  # the new head is randomly initialized here
learn.save('init')  # snapshot the untrained weights (written to models/init.pth)
later on:
learn.load('init')  # restore the saved initial weights
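To repeat the experiment on a different dataset, build a new learner on that data and load the snapshot before training. A sketch (`data2` is a hypothetical DataBunch; note the saved weights only load cleanly if the new dataset has the same number of classes, since the head's shape depends on it):

learn = cnn_learner(data2, models.resnet34, metrics=error_rate)
learn.load('init')      # restore the exact initial weights saved above
learn.fit_one_cycle(4)  # train on the new data from the shared starting point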
I'm working on a project that requires recognizing just people in a video or a live stream from a camera. I'm currently using the TensorFlow object detection API with Python, and I've tried different pre-trained models and frozen inference graphs. I want to recognize only people and maybe cars, so I don't need my neural network to recognize all 90 classes that come with the frozen inference graphs (based on MobileNet or R-CNN), as this seems to slow down the process and 89 of these 90 classes are not needed in my project. Do I have to train my own model, or is there a way to modify the inference graphs and the existing models? This is probably a noob question for some of you, but keep in mind that I've worked with TensorFlow and machine learning for just one month.
Thanks in advance
Shrinking the last layer to output one or two classes is not likely to yield large speed-ups, because most of the computation is in the intermediate layers. You could shrink the intermediate layers too, but this would result in poorer accuracy.
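To get a feel for why, here is a rough sketch using parameter counts as a loose proxy for compute, with an ImageNet-style MobileNetV2 classifier standing in for the detector (a detector's compute is even more conv-dominated):

# Rough illustration: the final 90-way layer holds only a small fraction of
# the parameters, so shrinking it to 1-2 outputs barely changes total compute.
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None, classes=90)
head = model.layers[-1]  # the final Dense "predictions" layer
print(f"head: {head.count_params():,} of {model.count_params():,} params "
      f"({100 * head.count_params() / model.count_params():.1f}%)")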
Yes, you have to train your own model. Here, in short, are some ways to do it (a sketch of OPTION 1 follows after the list).
OPTION 1. When you want to transfer as much knowledge as possible, freeze the CNN layers. Then change the number of detected classes by changing the dimension of the classifier (the dense layers), which is the last part of the CNN architecture, and retrain only the classifier.
OPTION 2. Suppose you want to transfer knowledge only in the first layers of the CNN (for example, freeze the first 2-3 CNN layers) and retrain the rest of the CNN together with the classifier. Again, change the number of detected classes via the classifier dimension, then retrain the remaining CNN layers and the classifier.
OPTION 3. Suppose you want to retrain the whole CNN together with the classifier. Change the number of detected classes via the classifier dimension, then retrain the whole CNN and the classifier.
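A minimal Keras sketch of OPTION 1 (the model choice and class count are hypothetical): freeze the convolutional backbone, attach a fresh classifier head, and retrain only the head.

# Freeze the backbone, retrain only a new classifier head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze all CNN layers: pure feature transfer

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # new head: e.g. person / background
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train_ds: your labelled dataset (hypothetical)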
Generally, the Tensorflow Object Detection API is a good start for beginners! For how to proceed with your problem, you can see more detail about the whole process here, and extra explanation here.
On the tensorflow/models repo on GitHub, they supply five pre-trained models for object detection.
These models are trained on the COCO dataset and can identify 90 different objects.
I need a model to just detect people, and nothing else. I can modify the code to only print labels on people, but it will still look for the other 89 objects, which takes more time than just looking for one object.
I can train my own model, but I would rather be able to use a pre-trained model, instead of spending a lot of time training my own model.
So is there a way, either by modifying the model file, or the TensorFlow or Object Detection API code, so it only looks for a single object?
Either you fine-tune your model on pedestrians with just a few thousand steps (a small training dataset would be enough), or you look in your label definition file (the .pbtxt file), search for the person label, and handle the other classes however you want, as in the sketch below.
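For the second route, a minimal post-processing sketch: keep only detections whose class id maps to "person", which is id 1 in the COCO label map (mscoco_label_map.pbtxt). The arrays mirror what the Object Detection API returns; the values here are fake.

# Keep only "person" detections (id 1 in mscoco_label_map.pbtxt).
def keep_people(boxes, classes, scores, min_score=0.5, person_id=1):
    return [
        (box, score)
        for box, cls, score in zip(boxes, classes, scores)
        if int(cls) == person_id and score >= min_score
    ]

# Fake output: one person (id 1) and one car (id 3); only the person survives.
print(keep_people(
    boxes=[[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.9, 0.9]],
    classes=[1, 3],
    scores=[0.92, 0.88],
))

Note this only filters the output; the network still computes all 90 class scores, so it won't make inference faster, in line with the point about the intermediate layers above.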