YOLO V3 Not Learning Well From Data - python

I tried training YOLOv3 on a signature dataset, but after 2000 iterations the trained model couldn't produce any detections.
The dataset consists of around 5000 images with signatures on them. The document pages are black and white. The images are labeled accurately, since I generated the pages myself by placing signatures on them.
I used the default YOLOv3 architecture, but trained from scratch. I also tried fine-tuning from darknet53.conv.74, but it didn't work, which I assume is because that network was trained on photos while my data are documents. Training from scratch, I trained on an AWS GPU machine for 2000 iterations. During training, the output looks like the following:
It went from:
2: 3249.269043, 3238.557373 avg, 0.000000 rate, 10.226817 seconds, 128 images Loaded: 0.000073 seconds
To:
2032: 0.667013, 0.644689 avg, 0.001000 rate, 22.906654 seconds, 130048 images Loaded: 0.000103 seconds
So the training loss has decreased significantly and has been hovering around 0.6 for at least a couple hundred iterations.
The only part I'm not 100% sure about is how to start the training process; I used the command below to train it.
./darknet detector train params/darknet.data params/darknet-yolov3.cfg
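For the fine-tuning attempt mentioned above, darknet takes the pretrained weights as an extra argument; assuming darknet53.conv.74 sits in the working directory, that command would look like:
./darknet detector train params/darknet.data params/darknet-yolov3.cfg darknet53.conv.74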
darknet.data is as follows:
classes = 1
train = ./params/data_train.txt
valid = ./params/data_test.txt
names = ./params/classes.names
backup = ./params/weights/
And darknet-yolov3.cfg is exactly the same as yolov3.cfg.
I tried testing the model on a couple of different images with signatures on them, and they are rather simple cases, but the trained model failed to detect any signature in all of these test images.
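For reference, a typical darknet test invocation looks like the following; the exact weights filename saved in params/weights/ depends on the darknet build (darknet-yolov3_last.weights and test_image.jpg are assumptions here), and -thresh lowers the detection threshold so weak detections become visible:
./darknet detector test params/darknet.data params/darknet-yolov3.cfg params/weights/darknet-yolov3_last.weights test_image.jpg -thresh 0.1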
If anyone has any suggestions on what I should do or test, it would be greatly appreciated! Thanks!

Related

Can't oversample my image data using SMOTE

I'm new to machine learning, and I have been working on a project for early dementia detection using a CNN.
I am facing an issue oversampling my data. The data are MRI images imported from Kaggle, with train and test sets each having 4 sub-classes (NonDemented, MildDemented, ...). The train set has around 5120 images and the test set around 1200, each of size 176x258, which I have resized to 176x176.
import numpy as np
from imblearn.over_sampling import SMOTE
images = []
for x, y in train_data:
    images.append(x)
images = np.concatenate(images)
train_images = images.reshape(len(images), 176*176*3)
sm = SMOTE(sampling_strategy='minority', random_state=42)
train_images = sm.fit_resample(train_images)  # this line raises the error
This is the code; I applied the same procedure to the test data as well, up to the reshaping. The last line causes an error. I know fit_resample takes 2 arguments, the second one being the labels, but in this case where I just have images, what should I put there as the second argument? Should it be my test_data? I have no clue. Please help me.
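For reference, a minimal sketch of how fit_resample is typically called: it takes the flattened images together with a parallel array of class labels and returns both resampled. Collecting the labels alongside the images (the y already yielded in the loop above) is an assumption about how train_data is structured:
import numpy as np
from imblearn.over_sampling import SMOTE
images, labels = [], []
for x, y in train_data:              # assumes train_data yields (image_batch, label_batch)
    images.append(x)
    labels.append(y)
X = np.concatenate(images).reshape(-1, 176*176*3)   # one flattened row per image
y = np.concatenate(labels)
sm = SMOTE(sampling_strategy='minority', random_state=42)
X_res, y_res = sm.fit_resample(X, y)                 # resampled features and labels are both returned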

Training model on custom data

I am trying out an object detection model on a custom dataset. I want it to recognize a specific piece of metal from my garage. I took about 32 photos and labelled them. The training goes well at first, but the loss only gets down to about 10%; after that it goes very slowly, so I need to stop it. I then deployed the model on a camera, but it has no accuracy. Could that be because I have only 32 images of the object? I have tried YOLOv2 and Faster R-CNN.
It is unlikely that your model has no accuracy on the camera just because you only have 32 images.
In any case, you previously got down to about 10% loss (which looks like roughly 90% accuracy), so it should work; I think the problem is not the amount of images.
After training your model, you need to save the trained weights.
Make sure that you deployed the trained model, and not an untrained model from scratch.
Just labeling will not help in object detection. What you are doing is image classification but expecting the results of object detection.
Object detection requires bounding-box annotations and a matching loss function, which is fed to the model during each backpropagation step.
You need an annotation tool to create the bounding boxes first, then adapt your YOLOv2/Faster R-CNN code and its loss function accordingly. Train it well and use image augmentation to generate more images, because 32 images are too few (see the sketch after this answer). Otherwise you might end up in the pitfall of high training accuracy but low test accuracy: training models on too few images often leads to overfitting.
Only then should you try to run it on the camera.
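A minimal augmentation sketch, assuming the albumentations library and YOLO-format bounding-box labels (the file names, box values and class list below are made up for illustration):
import cv2
import albumentations as A
# Augmentation pipeline that keeps the bounding boxes consistent with the transformed image
transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),
        A.RandomBrightnessContrast(p=0.5),
        A.Rotate(limit=10, p=0.5),
    ],
    bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']),
)
image = cv2.imread('metal_part_001.jpg')     # hypothetical input photo
bboxes = [[0.48, 0.52, 0.30, 0.25]]          # hypothetical YOLO box: x_center, y_center, w, h (normalized)
class_labels = [0]
# Generate a handful of augmented copies per original photo
for i in range(5):
    out = transform(image=image, bboxes=bboxes, class_labels=class_labels)
    cv2.imwrite(f'metal_part_001_aug{i}.jpg', out['image'])
    # out['bboxes'] holds the transformed boxes to write back into the label file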

How can one quickly verify that a CNN actually learns?

I tried to build a CNN from scratch based on the LeNet architecture from this article.
I implemented backprop and am now trying to train it on the MNIST dataset using SGD with batch size 16. I want a quick way to verify that the learning goes well and there are no bugs. For this, I visualize the loss for every 100th batch, but it takes too long on my laptop and I don't see an overall trend (the loss fluctuates downwards, but occasionally jumps back up, so I am not sure). Could anyone suggest a proven way to check that the CNN works well without waiting many hours of training?
MNIST consists of 60k training images of 28x28 pixels, so training a CNN with batch size 16 means 3,750 forward passes per epoch.
Taking into consideration that you are using LeNet, which is not a very deep model, I would suggest the following:
Check your PC specifications, such as RAM, processor and GPU.
Try training your model on a cloud service such as Google Colab or Kaggle.
Try a batch size of 64 or 128.
Normalize your image data before training.
Training speed also depends on the machine learning framework you are using, such as TensorFlow or PyTorch.
I hope this helps.
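As an example of the normalization step, a minimal sketch assuming the MNIST images are already loaded as a uint8 NumPy array called x_train:
import numpy as np
# x_train: uint8 array of shape (60000, 28, 28), pixel values in [0, 255]
x = x_train.astype(np.float32) / 255.0       # scale pixels to [0, 1]
x = (x - x.mean()) / (x.std() + 1e-8)        # optionally standardize to zero mean, unit variance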

Object detection in 1080p with SSD Mobilenet (Tensorflow API)

Hello everybody,
My objective is to detect people and cars (day and night) in 1920x1080 images. For this I use the TensorFlow Object Detection API with an SSD MobileNet model. I annotated 1000 images (900 for training, 100 for evaluation) from 7 different cameras, and I launch the training with an image size of 960x540. My model does not converge. I do not know what to do; should I make separate classes for day and night objects?
In a tutorial on face detection with the TensorFlow API, they use a dataset of images containing only faces, then run the model on complex scenes. Is this a good idea, given that a model like SSD also learns from negative examples?
Thank you
(sources: https://blog.usejournal.com/face-detection-for-cctv-surveillance-6b8851ca3751)
What do you mean by "not converge"? Are you referring to the train/validation loss?
In that case, the first thing that comes to my mind is to reduce the learning rate (I had a similar problem).
You can do this by modifying your configuration file: in the "train_config" section you'll find the value "initial_learning_rate".
Try setting it to a lower value (an order of magnitude lower, say) and see if it helps.
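For orientation, in a stock SSD MobileNet pipeline config the relevant block looks roughly like this (the exact optimizer and the numbers vary between configs; 0.0004 is just an illustrative lowered value):
train_config {
  batch_size: 24
  optimizer {
    rms_prop_optimizer {
      learning_rate {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
    }
  }
}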

Trained model detects almost everything as one class after a long training

I trained a custom person detector using TensorFlow and a pretrained Inception model. After a few thousand steps and an average loss of around 1-2, I stopped the training and tested it on a live video. The result was quite good, with only a few false positives; it could detect some people but not everyone, so I decided to continue training until I got an average loss below 1, then tested it again. Now it detects almost everything as a person, even the whole frame of the video, even when no object is present. The model seems to work great on pictures but not on videos. Is that overfitting?
Sorry, I forgot how many steps it was; I accidentally deleted the training folder that contained the ckpt and tfevents files.
Edit: I forgot that I am also training the same model with the same dataset, but a higher batch size, on a cloud machine as a backup, which is now at a higher step count. I'll edit the post later and provide the info from TensorBoard once I've finished downloading and testing the model from the cloud.
Edit 2: I downloaded the model trained for 200k steps from the cloud and it is working; it detects persons, but sometimes recognizes the whole frame as "person" for less than a second when I am moving the camera. I guess this could be improved by continuing to train the model.
(Total loss plot on TensorBoard)
For now, I'll just continue the training on the cloud and try to document every test result. I'll also resize some images in my dataset, train on my local machine using MobileNet, and compare the results of the two models.
Since you say the model did well with fewer training iterations, I guess the pretrained model could already detect the person class and your training set made the detection worse.
The model seems to work great on pictures but not on videos
If your single pictures are detected fine, then videos should work too; the only difference can come from the video's image resolution and quality. So compare the resolution of your images with that of the video.
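A minimal sketch of that comparison, assuming OpenCV and made-up file names:
import cv2
img = cv2.imread('test_image.jpg')           # hypothetical test image
print('image resolution:', img.shape[1], 'x', img.shape[0])
cap = cv2.VideoCapture('test_video.mp4')     # hypothetical test video
ok, frame = cap.read()
if ok:
    print('video frame resolution:', frame.shape[1], 'x', frame.shape[0])
cap.release()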
Is that overfitting?
Regarding the images and videos you are talking about: if the images were used in training, you should not use them to evaluate the model. If the model were overfitted, it would detect the training images but not any others.
Since you say the model produces too many detections, I don't think this is overfitting; it is more likely about your dataset. I think:
You have too little data to train on.
The network model is too big and complicated for the amount of data; try a smaller network like VGG or Inception v1 (SSD MobileNet).
The image resolution used in the training set is very different from that of the evaluation images.
The learning rate matters too, but I think in your case it's fine.
I would check the dataset you used for training carefully and use as much data as you can. These are the things I have generally run into and wasted time on.
