I have a real-time detection problem in which I need to detect 9 object classes. As far as I understand, YOLO has promising results on real-time object detection problems, so I am looking for good instructions on training a pre-trained YOLO model with my own custom dataset.
My dataset is already labeled, with bounding-box coordinates in .txt files in YOLO format. However, it is hard to find good instructions on the web about training YOLO on a custom dataset for your own detection problem: most instructions use generic datasets such as COCO or PASCAL, or they are not detailed enough to implement an object detection model on your own dataset.
TL;DR
My question is: are there any handy instructions for implementing YOLO object detection on your own dataset? I am looking for framework implementations of YOLO rather than the Darknet C implementation, since I am more familiar with Python, so a PyTorch or TensorFlow implementation would be perfect.
It would be especially appreciated if you have already implemented YOLOv3/v4 on your own dataset with the help of instructions you found on the web and are willing to share them.
Thanks in advance.
For training purposes I would highly recommend AlexeyAB's repository, as it is highly optimised for accuracy and speed, although it is also written in C. As far as testing and deployment are concerned, you have a lot of options:
OpenCV's DNN module: refer to this article.
TensorFlow model
PyTorch model
Of these, OpenCV's DNN implementation is the fastest for testing/inference.
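For illustration, here is a minimal inference sketch along those lines; the cfg/weights/image paths, input size and thresholds are placeholders (not taken from the question), and it assumes a reasonably recent OpenCV 4.x build:

# Minimal OpenCV DNN inference sketch for a Darknet-trained YOLO model.
# File names, input size and thresholds below are placeholder assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov4-custom.cfg", "yolov4-custom.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

image = cv2.imread("test.jpg")
class_ids, confidences, boxes = model.detect(image, confThreshold=0.5, nmsThreshold=0.4)

for class_id, confidence, box in zip(np.ravel(class_ids), np.ravel(confidences), boxes):
    x, y, w, h = (int(v) for v in box)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, f"{int(class_id)}: {confidence:.2f}", (x, y - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("out.jpg", image)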
Related
Currently, I am learning CNNs by myself. I have found lots of sources on the Internet, but most of them use PyTorch or TensorFlow. I want to find examples of image classification that use NumPy only and have some way to train on my dataset and to save and load the trained model. Does anyone know where such an example is?
I wrote a library, if you are interested: https://github.com/samrere/pytortto
Basically, it is PyTorch written in NumPy & CuPy. It completely follows the PyTorch interface and can be trained on a GPU.
I've included several basic examples, such as training ResNet, UNet, a vision transformer and a DCGAN, all trained/fine-tuned entirely using simple NumPy functions.
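For a flavour of what NumPy-only training looks like (this is not pytortto code, just a minimal generic sketch with made-up shapes and file names), a tiny softmax classifier with save/load could look like this:

# Minimal NumPy-only softmax classifier sketch; shapes and file names are
# illustrative assumptions, not from any particular library.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, y, num_classes, lr=0.1, epochs=100):
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]              # one-hot labels
    for _ in range(epochs):
        probs = softmax(X @ W + b)
        grad = probs - Y                    # gradient of cross-entropy w.r.t. logits
        W -= lr * X.T @ grad / n
        b -= lr * grad.mean(axis=0)
    return W, b

def save_model(path, W, b):
    np.savez(path, W=W, b=b)                # save trained weights

def load_model(path):
    data = np.load(path)
    return data["W"], data["b"]

# Toy usage with random data standing in for flattened images.
X = np.random.rand(200, 784)                # e.g. 28x28 images flattened
y = np.random.randint(0, 10, size=200)
W, b = train(X, y, num_classes=10)
save_model("model.npz", W, b)
W2, b2 = load_model("model.npz")
predictions = softmax(X @ W2 + b2).argmax(axis=1)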
I want to detect and count the number of vines in a vineyard using Deep Learning and Computer Vision techniques. I am using the YOLOv4 object detector and training on the darknet framework. I have been able to integrate the SORT tracker into my application and it works well, but I still have the following issues:
The tracker sometimes reassigns a new ID to the object
The detector sometimes misidentifies the object (which leads to incorrect tracking)
The tracker sometimes does not track a detected object.
You can see an example of the reassignment issue in the following image: in frame 40, ID 9 was a metal post, but from frame 42 onwards it is assigned to a tree.
While searching for the cause of these problems, I learnt that DeepSORT is an improved version of SORT that aims to handle this problem by using a neural network to associate tracks with detections.
Problem:
The problem I am facing is training this particular model for DeepSORT. I have seen that the authors used cosine metric learning to train their model, but I have not been able to customise the training for my custom classes. My questions are as follows:
I have a dataset of annotated images (YOLO TXT format) which I have used to train the YOLOv4 model. Can I reuse the same dataset for the DeepSORT tracker? If so, then how?
If I cannot reuse the dataset, then how do I create my own dataset for training the model?
Thanks in advance for the help!
Yes, you can use the same classes for DeepSORT. SORT works in two stages, and DeepSORT adds a third. The first stage is detection, which is handled by YOLO; the next is track association, which is handled by a Kalman filter and IoU. DeepSORT implements the third stage: a Siamese network that compares the appearance features of current detections with the features of each track. I've seen implementations use ResNet as the feature-embedding network.
Basically, once YOLO detects an object of your class, you pass the cropped detection to your Siamese network; it converts the crop into a feature embedding and compares that embedding with the past ones using cosine distance.
In conclusion, you can use the same YOLO classes for DeepSORT and SORT since they both need a detection stage, which is handled by YOLO.
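To make the appearance-matching step concrete, here is a minimal sketch of that cosine-distance comparison; the embed() function stands in for whatever feature-embedding network you use (e.g. a ResNet trained with cosine metric learning), and all names are illustrative rather than the DeepSORT authors' code:

# Illustrative sketch of cosine-distance appearance matching (not the official
# DeepSORT implementation). `embed(crop)` is a stand-in for any embedding network.
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(a @ b)

def match_detection(detection_embedding, track_embeddings, max_distance=0.2):
    """Return the track ID whose stored embedding is closest, or None."""
    best_id, best_dist = None, max_distance
    for track_id, track_embedding in track_embeddings.items():
        dist = cosine_distance(detection_embedding, track_embedding)
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id

# Usage sketch: crop each YOLO detection, embed it, then match against tracks.
# tracks = {9: stored_embedding_for_track_9, 10: stored_embedding_for_track_10}
# crop = frame[y1:y2, x1:x2]
# track_id = match_detection(embed(crop), tracks)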
I'm having problems finding the best network and configuration to detect small-scale objects. So far I have got very low mAPs on small objects (I am trying to detect traffic signs using the Mapillary dataset).
I have tried using Faster R-CNN 101 (resizing the input to 1024) and the SSD 101 with FPN (resizing the input to 1024).
I did not find a pre-trained Faster R-CNN model with FPN, so I could not try that.
What do you think would be the best network and configuration to detect small objects?
Thank you.
The models you mentioned are built for speed. With small-object detection, you often care more about accuracy, so you should probably use bigger models that sacrifice speed for accuracy (mAP). If you want to use TensorFlow 2, here is an overview of the available models. Also, for small-object detection you should keep the resolution high, as you said. You could also crop the images into multiple tiles instead and detect on portions of the image (a rough tiling sketch follows below).
So I disagree with @Akash Desai about SSD, but I also think that Detectron2 is more up to date with state-of-the-art models and gives better performance. So if you don't care about the framework, maybe switch to Detectron2.
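For illustration, here is a rough tiling sketch of the cropping idea; the tile size, overlap and the detect() callback are placeholder assumptions, not tied to any particular framework:

# Minimal image-tiling sketch for small-object detection (illustrative only).
# Detections from each tile must be shifted back to full-image coordinates.
import numpy as np

def tile_image(image, tile_size=1024, overlap=128):
    """Yield (tile, x_offset, y_offset) crops covering the full image."""
    h, w = image.shape[:2]
    step = tile_size - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield image[y:y + tile_size, x:x + tile_size], x, y

def detect_on_tiles(image, detect):
    """Run `detect` (any detector returning [x1, y1, x2, y2, score] rows) per tile."""
    all_boxes = []
    for tile, x_off, y_off in tile_image(image):
        for x1, y1, x2, y2, score in detect(tile):
            all_boxes.append([x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off, score])
    return np.array(all_boxes)  # apply NMS afterwards to merge overlapping boxes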
SSD is best for detecting small as well as large targets, because it makes predictions on each and every feature map.
You resized the images to 1024? In that case the model will take more time to train on the dataset, so keep the image size small, e.g. 460*460.
Also, you can try Detectron2; it's faster and simpler than TensorFlow.
https://colab.research.google.com/github/Tony607/detectron2_instance_segmentation_demo/blob/master/Detectron2_custom_coco_data_segmentation.ipynb
I wish to know whether I can use an Inception or ResNet model to identify faces, and whether transfer learning and training are even worth considering for my task.
I just want to be able to identify faces but I am also curious whether I can retrain/optimize a pre-trained model for my task.
Or have I been reading things wrong: do I need to get a pre-trained model that was designed for faces?
I have tried poking around with Inception and VGG16 but I have not trained them for faces. I am working on it but I want to know whether this is even viable or simply a waste of time. If I use transfer learning with FaceNet I think I'll be better off.
Transfer learning for facial detection would be a great way to go. Also, yes, transfer learning with FaceNet is a great idea.
Also, for transfer learning to work, it is not necessary that the model was initially pre-trained only on faces (as FaceNet was). A model pre-trained on ImageNet would also be pretty darn good! This is a very hot topic, so do not try to reinvent the wheel: there are many repositories that have already done this, using transfer learning from the ImageNet dataset with ResNet50, with astonishingly good results.
Here is a link to one such repository:
https://github.com/loheden/face_recognition_with_siamese_network
Also note that Siamese networks are a technique that is especially good for the facial recognition use case. The Siamese concept is really simple: take two images and compare their features. If the feature similarity is above a set threshold, the two images are considered the same (the two faces match); otherwise they are not (the face is not recognised).
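As a rough sketch of that idea, the snippet below uses an ImageNet-pretrained ResNet50 as a frozen feature extractor and compares two face crops by cosine similarity; the image paths, input size and threshold are assumptions, and a real system would fine-tune the embedding (e.g. with a contrastive or triplet loss) as the tutorials below do:

# Rough sketch: ImageNet-pretrained ResNet50 as a frozen feature extractor,
# comparing two face crops by cosine similarity. Paths, size and threshold
# are placeholder assumptions; fine-tuning on face data would work better.
import numpy as np
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the pretrained backbone

def embed(path):
    """Load an image, preprocess it, and return its feature embedding."""
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.resnet50.preprocess_input(x)
    return base.predict(x, verbose=0)[0]

def same_face(path_a, path_b, threshold=0.8):
    """Cosine similarity above the threshold => treat as the same face."""
    a, b = embed(path_a), embed(path_b)
    sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim >= threshold

# print(same_face("face1.jpg", "face2.jpg"))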
Here is a research paper on siamese networks for facial recognition.
Also, here is a two-part tutorial on how to implement the siamese network for facial recognition using transfer learning:
http://www.loheden.com/2018/07/face-recognition-with-siamese-network.html
http://www.loheden.com/2018/07/face-recognition-with-siamese-network_29.html
The above tutorial's code is in the first Github link I shared at the beginning of this answer.
I'm currently trying to use the Object Detection framework from Google TensorFlow.
I have a trained model for something similar to MNIST. What is the easiest way to use this as the classification checkpoint?
As I currently understand it I can use a classification or an object detection checkpoint.
I just don't know how to use my classification checkpoint, as I think the structure of the network is specified by the type in:
feature_extractor {
  type: "ssd_mobilenet_v2"
  ...
}
Do I have to provide my own model type using this?
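(For context, in the TF Object Detection API the starting checkpoint is usually pointed at from train_config in the same pipeline config, roughly like the abbreviated sketch below; the path is a placeholder and the exact field names vary between API versions.)

# Abbreviated pipeline.config sketch; path is a placeholder assumption.
train_config {
  fine_tune_checkpoint: "path/to/my_classification_model/ckpt-0"
  fine_tune_checkpoint_type: "classification"  # vs. "detection"
  ...
}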
All of the pre-trained models have a lot of layers, whereas my model for the MNIST-like dataset has only 3, which was quite easy to train.
The general goal is to detect math symbols on a white background with bounding boxes. The classification part was easy, but extending it to object detection seems hard. Using pre-trained object detection models that were trained on real-world images seems to work better than training from scratch, but still pretty badly in general.
Any ideas appreciated!