Use classification model for object detection framework - python

I'm currently trying to use the Object Detection framework from Google Tensorflow.
I have a trained model for something similar to MNIST. What is the easiest way to use this as the classification checkpoint?
As I currently understand it, I can use either a classification or an object detection checkpoint.
I just don't know how to use my classification checkpoint, as I think the structure of the network is specified by the type in
feature_extractor {
type: "ssd_mobilenet_v2"
...
Do I have to provide my own model type here?
All of the pretrained models have a lot of layers, while my MNIST-like model has only 3 layers, which made it quite easy to train.
The goal, in general, is to detect math symbols on a white background with bounding boxes. The classification part was easy, but extending it to object detection seems too hard. Using pretrained object detection models that were trained on real-world images works better than training from scratch, but the results are still pretty bad overall.
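From what I can tell, the checkpoint itself is wired up in the pipeline config's train_config rather than in feature_extractor; a minimal sketch, with placeholder paths:
train_config {
  fine_tune_checkpoint: "path/to/your/classification/model.ckpt"
  fine_tune_checkpoint_type: "classification"
  ...
}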
Any ideas appreciated!

Related

Training Yolov3-v4 with own dataset for real-time object detection

I have a real-time problem aimed at detecting 9 objects. As far as I understand, YOLO has promising results on real-time object detection problems, so I am searching for good instructions to train a pre-trained YOLO model on my own custom dataset.
My dataset is already labeled, with bounding-box coordinates in .txt files in YOLO format. However, it is a bit confusing to find good instructions on the web about training YOLO on a custom dataset, since most instructions use generic datasets such as COCO or PASCAL, or are not detailed enough to implement an object detection model on your own dataset.
TL;DR
My question is: are there any handy instructions for implementing YOLO object detection on your own dataset? I am looking for framework implementations of YOLO rather than the darknet C implementation, since I am more familiar with Python, so it would be perfect if you could point to a PyTorch or TensorFlow implementation.
It would be even more appreciated if you have already trained YOLOv3/v4 on your own dataset using instructions you found on the web and are willing to share them.
Thanks in advance.
For training purposes I would highly recommend AlexeyAB's repository, as it is highly optimised for accuracy and speed, although it is also written in C. As far as testing and deployment are concerned, you have a lot of options:
OpenCV's DNN module: refer to this article.
Tensorflow Model
Pytorch Model
Of these, OpenCV's DNN implementation is the fastest for testing/inference (see the sketch below).
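A minimal OpenCV DNN inference sketch for a darknet-trained model (the file names are placeholders):

import cv2
import numpy as np

# Placeholder file names -- substitute your own trained cfg/weights.
net = cv2.dnn.readNetFromDarknet("yolov4-custom.cfg", "yolov4-custom.weights")

img = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# Each output row: [cx, cy, w, h, objectness, per-class scores...]
for out in net.forward(net.getUnconnectedOutLayersNames()):
    for det in out:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        if scores[class_id] > 0.5:
            print(class_id, float(scores[class_id]))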

How does YOLO split images into grid cells?

I am trying to learn object detection models and the strategies they use, but I can't get my head around how YOLO or SSD do it with convolutional neural networks. Can anybody give some insight? How do they structure the CNNs? It would be wonderful if code (Python) could be provided.
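For intuition, the grid part can be sketched in a few lines of Python: the image is divided into an S x S grid, and the cell containing a ground-truth box's centre is responsible for predicting that object (a simplified sketch of YOLOv1's assignment rule, not the full model):

# Which grid cell "owns" an object in YOLO-style assignment (simplified).
S = 7                      # grid size, e.g. 7x7 in YOLOv1
img_w, img_h = 448, 448    # network input resolution
cx, cy = 224.0, 100.0      # centre of a ground-truth box, in pixels

cell_x = int(cx / img_w * S)
cell_y = int(cy / img_h * S)
print(cell_x, cell_y)      # -> 3 1: this cell predicts the object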

How do I train the DeepSORT tracker for custom class?

I want to detect and count the number of vines in a vineyard using Deep Learning and Computer Vision techniques. I am using the YOLOv4 object detector and training on the darknet framework. I have been able to integrate the SORT tracker into my application and it works well, but I still have the following issues:
The tracker sometimes reassigns a new ID to the object
The detector sometimes misidentifies the object (which leads to incorrect tracking)
The tracker sometimes does not track a detected object.
As an example of the reassignment issue: in frame 40, ID 9 was a metal post, and from frame 42 onwards it was assigned to a tree.
While searching for the cause of these problems, I learnt that DeepSORT is an improved version of SORT that aims to handle this problem by using a neural network to associate tracks with detections.
Problem:
The problem I am facing is with training this particular model for DeepSORT. I have seen that the authors used cosine metric learning to train their model, but I have not been able to adapt the training to my custom classes. My questions are as follows:
I have a dataset of annotated (YOLO TXT format) images which I have used to train the YOLOv4 model. Can I reuse the same dataset for the Deepsort tracker? If so, then how?
If I cannot reuse the dataset, then how do I create my own dataset for training the model?
Thanks in advance for the help!
Yes, you can use the same classes for DeepSORT. SORT works in two stages, and DeepSORT adds a third. The first stage is detection, handled by YOLO; the second is track association, handled by a Kalman filter and IoU. DeepSORT adds the third stage: a siamese network that compares the appearance features of the current detections with the stored features of each track. I've seen implementations use ResNet as the feature-embedding network.
Basically, once YOLO detects your class, you pass the cropped detection to the siamese network; it converts the crop into a feature embedding and compares it with past embeddings using cosine distance.
In conclusion, you can use the same YOLO classes for DeepSORT and SORT, since both need a detection stage, which is handled by YOLO.
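To make the appearance-matching step concrete, a minimal sketch (the embedding size and threshold are illustrative, not DeepSORT's exact values):

import numpy as np

def cosine_distance(a, b):
    # 1 - cosine similarity between two appearance embeddings
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Placeholder embeddings standing in for the re-ID network's output.
track_embedding = np.random.rand(128)
detection_embedding = np.random.rand(128)

# Associate the detection with this track when the appearance distance is small.
if cosine_distance(track_embedding, detection_embedding) < 0.2:
    print("likely the same object")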

Transfer learning for facial identification using classifiers

I wish to know whether I can use an Inception or ResNet model to identify faces, and whether transfer learning and retraining are even worth considering for my task.
I just want to be able to identify faces but I am also curious whether I can retrain/optimize a pre-trained model for my task.
Or have I been reading things wrong; do I need a pre-trained model that was designed specifically for faces?
I have tried poking around with Inception and VGG16, but I have not trained them on faces. I am working on it, but I want to know whether this is even viable or simply a waste of time. I think I'd be better off using transfer learning with FaceNet.
Transfer learning is a great way to go for facial recognition, and yes, transfer learning with FaceNet is a great idea.
Also, for transfer learning to work, the model does not have to be pre-trained only on faces (as FaceNet is). A model pre-trained on ImageNet can also be pretty darn good! This is a very hot topic, so do not try to reinvent the wheel: many repositories have already done this using transfer learning from the ImageNet dataset with ResNet50, with astonishingly good results.
Here is a link to one such repository:
https://github.com/loheden/face_recognition_with_siamese_network
Also note that siamese networks are a technique that works especially well for the facial recognition use case. The concept is really simple: take two images and compare their features. If the feature similarity is above a set threshold, the two images show the same face; otherwise the face is not recognized.
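To make the thresholding concrete, a minimal sketch (the threshold value is illustrative; FaceNet-style systems compare the L2 distance between embeddings):

import numpy as np

def same_face(emb_a, emb_b, threshold=1.1):
    # emb_a, emb_b: face embeddings produced by the trained network.
    # A small L2 distance between embeddings means the same identity.
    return np.linalg.norm(emb_a - emb_b) < threshold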
Here is a research paper on siamese networks for facial recognition.
Also, here is a two-part tutorial on how to implement the siamese network for facial recognition using transfer learning:
http://www.loheden.com/2018/07/face-recognition-with-siamese-network.html
http://www.loheden.com/2018/07/face-recognition-with-siamese-network_29.html
The above tutorial's code is in the first Github link I shared at the beginning of this answer.

Object detection - How to detect and extract features using CNN and classify them using a classifier?

I have an image classification problem where the number of classes increases over time, and when a new class is created I simply retrain the model with images of the new class. I know this is not possible with a plain CNN, so to solve the problem I used transfer learning: a Keras pretrained model extracts the image features, but instead of replacing the last (classification) layers with new ones, I feed the features to a Random Forest, which can handle a growing number of classes. I achieved an accuracy of 86% using InceptionResNetV2 trained on the ImageNet dataset, which is good for now.
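That setup looks roughly like this, as a minimal sketch (the image and label arrays are placeholders for a real dataset):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

# Pretrained CNN as a fixed feature extractor (global-average-pooled features).
extractor = InceptionResNetV2(weights="imagenet", include_top=False, pooling="avg")

# Placeholder data: (n, 299, 299, 3) images in [0, 255] and integer class ids.
images = np.random.rand(8, 299, 299, 3).astype("float32") * 255
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])

features = extractor.predict(preprocess_input(images))

# The forest is cheap to retrain from scratch whenever a new class appears.
clf = RandomForestClassifier(n_estimators=100).fit(features, labels)
print(clf.predict(features[:2]))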
Now I want to do the same but on an object detection problem. How can I achieve this? Can I use the Tensorflow Object Detection API?
Is it possible to replace the last layers of a pretrained CNN inside a detection algorithm like Faster R-CNN or SSD with a random forest?
Yes, you could implement the above-mentioned approach using the TensorFlow Object Detection API, and you could use your trained InceptionResNetV2 model as a feature extractor. The TensorFlow Object Detection API already has an InceptionResNetV2 feature extractor trained on the COCO dataset. It's available at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
Or, if you want to create your own custom feature extractor, please follow this link: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/defining_your_own_model.md
If you are new to the TensorFlow Object Detection API, please follow this tutorial:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Hope this helps.
