I trained a custom person detector using TensorFlow, starting from a pretrained Inception model. After a few thousand steps, with the loss averaging between 1 and 2, I stopped the training and tested the model on live video. The result was quite good, with only a few false positives. It detected some people but not everyone, so I decided to continue training until the average loss dropped below 1, and then tested it again. Now it detects almost everything as a person, sometimes even the whole video frame, even when no object is present. The model seems to work great on pictures but not on videos. Is that overfitting?
Sorry, I forgot how many steps it was. I accidentally deleted the training folder that contained the checkpoint (ckpt) and tfevents files.
Edit: I forgot that, as a backup, I am also training the same model on the same dataset with a larger batch size in the cloud, and that run is now at a higher step. I'll update the post with the TensorBoard information once I've finished downloading and testing the model from the cloud.
Edit 2: I downloaded the model trained for 200k steps from the cloud and it works: it detects people, but when I move the camera it sometimes labels the whole frame as "person" for less than a second. I guess this could be improved by continuing to train the model.
[Figure: total loss on TensorBoard]
For now, I'll continue the training in the cloud and try to document the results of every test. I'll also try resizing some of the images in my dataset, training on my local machine with MobileNet, and comparing the results of the two models.
Since you say the model did well after fewer training iterations, I suspect the pretrained model could already detect people and your training set made the detection worse.
The model seems to work great on pictures but not on videos
If single pictures are detected correctly, video should work too; the only differences should be the resolution and quality of the video frames. So compare the resolution of your images with that of the video.
Is that overfitting?
Regarding the images and videos you mention: if the images were used in training, you should not use them to evaluate the model. An overfitted model detects objects in the training images but not in other images.
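To keep evaluation images out of training, the split has to be made once and kept disjoint. The sketch below is a minimal illustration, assuming the dataset is just a list of image file names; `split_train_eval` and `eval_fraction` are hypothetical names, not part of any library.

```python
import random


def split_train_eval(image_paths, eval_fraction=0.1, seed=0):
    """Shuffle image paths and split them into disjoint train/eval lists.

    Images in the eval list are never used for training, so detections on
    them say something about generalization rather than memorization.
    """
    paths = sorted(set(image_paths))  # drop duplicates so no image lands in both lists
    rng = random.Random(seed)  # fixed seed keeps the split reproducible between runs
    rng.shuffle(paths)
    n_eval = max(1, int(len(paths) * eval_fraction))
    return paths[n_eval:], paths[:n_eval]


if __name__ == "__main__":
    images = [f"img_{i:04d}.jpg" for i in range(1000)]
    train, evaluation = split_train_eval(images)
    print(len(train), len(evaluation))  # 900 100
```

If the images are frames cut from the same videos, consecutive frames are near-duplicates, so splitting by video clip rather than by frame gives a more honest evaluation set.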
Since the model produces too many detections, I don't think this is overfitting; it is more likely related to your dataset. I think one of the following applies:
You have too little data to train on.
The network is too big and complicated for the amount of data. Try a smaller network, such as Inception v1 or SSD MobileNet.
The image resolution in the training set is very different from that of the evaluation images.
Learning rate is important, but I think in your case it's fine.
I suggest carefully checking the dataset you used for training and using as much data as you can. These are the problems I have run into most often and lost time on.
I am trying to train an object detection model on a custom dataset. I want it to recognize a specific piece of metal from my garage. I took about 32 photos and labeled them. Training goes well until the loss reaches about 10%; after that it becomes very slow, so I have to stop it. When I then run the model on a camera feed, it has no accuracy at all. Could this be because I have only 32 images of the object? I have tried both YOLOv2 and Faster R-CNN.
It is unlikely that your model has no accuracy on the camera just because you have only 32 images.
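In any case, your loss went down to about 10% before training slowed (which I take to mean roughly 90% accuracy, although a loss value does not translate directly into accuracy), so the model should work. I don't think the number of images is the problem.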
After training, you need to save the trained weights (coefficients) of the model.
Then make sure you run inference with the trained model, not with a model initialized from scratch.
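The point about saving the trained coefficients and then actually using them can be shown without any deep learning framework. This is a minimal sketch that uses NumPy arrays as a stand-in for a model's weights and a toy one-layer `predict` function (both hypothetical); in tf.keras the corresponding calls would be `model.save_weights(...)` and `model.load_weights(...)`, and the TensorFlow Object Detection API works with checkpoint files instead.

```python
import os
import tempfile

import numpy as np


def save_weights(path, weights):
    """Save a dict of named weight arrays to a .npz file."""
    np.savez(path, **weights)


def load_weights(path):
    """Load the weight arrays back into a plain dict."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


def predict(weights, x):
    """Toy linear 'model' (one dense layer), used only to show which weights are in use."""
    return x @ weights["w"] + weights["b"]


rng = np.random.default_rng(0)
trained = {"w": rng.normal(size=(4, 2)), "b": np.zeros(2)}  # stands in for weights after training
fresh = {"w": rng.normal(size=(4, 2)), "b": np.zeros(2)}    # stands in for a model built from scratch

path = os.path.join(tempfile.mkdtemp(), "trained_weights.npz")
save_weights(path, trained)
restored = load_weights(path)

x = np.ones((1, 4))
# The restored weights reproduce the trained model; the fresh ones would not.
print(np.allclose(predict(restored, x), predict(trained, x)))  # True
```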
Labeling alone will not give you object detection. What you are doing is image classification, while expecting the results of object detection.
Object detection requires bounding-box annotations and a different loss function, which is computed and backpropagated at every training step.
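To make the difference concrete: a classification label only says what is in the image, while a detection label also says where, as one box per object. The sketch below converts a pixel box into the plain-text label line used by Darknet/YOLO (`<class_id> <x_center> <y_center> <width> <height>`, normalized to [0, 1]); the function name and the example box are hypothetical.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (xmin, ymin, xmax, ymax) into a Darknet/YOLO label line.

    A classification dataset would store only `class_id` per image; a detection
    dataset stores one such line per object, including its position and size.
    """
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / image_width
    y_center = (ymin + ymax) / 2 / image_height
    width = (xmax - xmin) / image_width
    height = (ymax - ymin) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_label(0, (100, 50, 300, 250), image_width=640, image_height=480))
# 0 0.312500 0.312500 0.312500 0.416667
```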
You first need an annotation tool to label your data, and then you need to adapt your YOLOv2/Faster R-CNN code, including the loss function. Train it well, and use image augmentation to generate a few more images, because 32 images is not many. With so few images you may end up with high training accuracy but low test accuracy: training on few images often leads to unexpected overfitting.
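For detection data, augmentation has to transform the boxes along with the pixels. A minimal sketch of one common augmentation, a horizontal flip, assuming images are NumPy arrays of shape (height, width, channels) and boxes are pixel coordinates `(xmin, ymin, xmax, ymax)`; `hflip_with_boxes` is a hypothetical helper, not a library function.

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Mirror an image left-right and move its bounding boxes to match.

    image: array of shape (height, width, channels)
    boxes: array-like of shape (n, 4) with pixel coordinates (xmin, ymin, xmax, ymax)
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :]
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # new xmin comes from the old xmax
    new_boxes[:, 2] = width - boxes[:, 0]  # new xmax comes from the old xmin
    return flipped, new_boxes  # y coordinates are unchanged by a horizontal flip


image = np.zeros((480, 640, 3), dtype=np.uint8)
_, boxes = hflip_with_boxes(image, [[100, 50, 300, 250]])
print(boxes.tolist())  # [[340.0, 50.0, 540.0, 250.0]]
```

Flips, small crops, and brightness changes each produce a new training sample from an existing one, but they do not add genuinely new viewpoints, so they help with overfitting only up to a point.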
Only then should you try running it on the camera.
Hello everybody,
My objective is to detect people and cars (day and night) in 1920x1080 images. For this I use the TensorFlow Object Detection API with an SSD MobileNet model. I annotated 1000 images (900 for training, 100 for evaluation) from 7 different cameras, and I train with an image size of 960x540. My model does not converge. I don't know what to do: should I create separate classes for day and night objects?
A tutorial on face detection with the TensorFlow API uses a dataset of images containing only faces and then applies the model to complex scenes. Is this a good idea, given that a model like SSD also learns from negative examples?
Thank you
(sources: https://blog.usejournal.com/face-detection-for-cctv-surveillance-6b8851ca3751)
What do you mean by "not converge"? Are you referring to the train/validation loss?
In that case, the first thing that comes to mind is to reduce the learning rate (I had a similar problem).
You can do this by modifying your configuration file: in the "train_config" section you'll find the value "initial_learning_rate".
Try setting it to a lower value (for example, an order of magnitude lower) and see if it helps.
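Lowering the learning rate by an order of magnitude can be scripted instead of edited by hand. This is a minimal sketch, assuming the pipeline.config uses the `initial_learning_rate: <number>` field mentioned above (as in the exponential-decay schedules of the sample SSD configs); schedules that name the field differently, such as `cosine_decay_learning_rate` with `learning_rate_base`, are left untouched. The function name is hypothetical.

```python
import re


def scale_initial_learning_rate(config_text, factor=0.1):
    """Multiply every `initial_learning_rate: <value>` in a pipeline.config by `factor`."""
    pattern = re.compile(r"(initial_learning_rate:\s*)([0-9.eE+-]+)")

    def _scale(match):
        new_value = float(match.group(2)) * factor
        return f"{match.group(1)}{new_value:g}"  # e.g. 0.004 -> 0.0004

    return pattern.sub(_scale, config_text)
```

Usage would be reading the config file into a string, passing it through `scale_initial_learning_rate`, and writing the result to a new file, so the original configuration stays available for comparison.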
I tried training YOLOv3 on a signature dataset, but after 2000 iterations the trained model didn't produce any detections.
The dataset consists of around 5000 images with signatures on them. The document pages are black and white. The labels are accurate because I generated the pages myself by placing signatures on them.
I used the default YOLOv3 architecture but trained from scratch. I tried fine-tuning from darknet53.conv.74, but it didn't work, which I assume is because that network was trained on photos while my data are documents. Training from scratch, I trained on an AWS GPU machine for 2000 iterations. During training, the output looks like this:
It went from:
2: 3249.269043, 3238.557373 avg, 0.000000 rate, 10.226817 seconds, 128 images Loaded: 0.000073 seconds
To:
2032: 0.667013, 0.644689 avg, 0.001000 rate, 22.906654 seconds, 130048 images Loaded: 0.000103 seconds
So the training loss decreased significantly and has been hovering around 0.6 for at least a couple hundred iterations.
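To see whether the average loss has really plateaued, the training output can be turned into numbers and plotted. A minimal sketch, assuming the log lines have exactly the format shown in this post (`<iteration>: <loss>, <avg> avg, <rate> rate, <seconds> seconds, <images> images`); other darknet builds word these lines slightly differently, and any line that does not match (such as per-layer IoU lines) is skipped.

```python
import re

# Matches lines such as
# "2032: 0.667013, 0.644689 avg, 0.001000 rate, 22.906654 seconds, 130048 images"
LINE_RE = re.compile(
    r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg,\s*([\d.]+) rate,"
    r"\s*([\d.]+) seconds,\s*(\d+) images"
)


def parse_darknet_log(lines):
    """Return (iteration, loss, avg_loss, learning_rate) tuples from darknet output."""
    records = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            records.append(
                (int(m.group(1)), float(m.group(2)), float(m.group(3)), float(m.group(4)))
            )
    return records
```

A low training loss only shows the network fits the training pages; running the same parsing on a separate validation run, or simply testing on held-out pages, is what tells you whether it generalizes.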
The only part I'm not 100% sure about is how to start the training process; I used the command below to train it.
./darknet detector train params/darknet.data params/darknet-yolov3.cfg
darknet.data is as follows:
classes = 1
train = ./params/data_train.txt
valid = ./params/data_test.txt
names = ./params/classes.names
backup = ./params/weights/
And darknet-yolov3.cfg is exactly the same as yolov3.cfg.
I tested the model on a couple of different images with signatures on them, all rather simple cases, but the trained model failed to detect any signature in any of them.
If anyone has suggestions on what I should do or test, it would be greatly appreciated! Thanks!
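One thing worth checking: the post says darknet-yolov3.cfg is identical to yolov3.cfg, which is set up for the 80 COCO classes, while darknet.data declares `classes = 1`. Widely cited darknet training instructions (for example the README of the AlexeyAB darknet fork) say that for N classes, each `[yolo]` section needs `classes=N` and the `[convolutional]` section directly before it needs `filters=(N+5)*3`. This is a possible cause, not a confirmed diagnosis. A minimal sketch of patching the cfg text accordingly (the function name is hypothetical):

```python
def patch_yolov3_cfg(cfg_text, num_classes, anchors_per_scale=3):
    """Set classes= in every [yolo] section and filters= in the [convolutional]
    section directly before it to (num_classes + 5) * anchors_per_scale."""
    filters = (num_classes + 5) * anchors_per_scale
    lines = cfg_text.splitlines()
    section = None
    conv_filters_line = None  # index of the filters= line in the section just before [yolo]
    for i, raw in enumerate(lines):
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line
            if section != "[yolo]":
                conv_filters_line = None  # only the section *directly* before [yolo] counts
            continue
        key = line.partition("=")[0].strip()
        if section == "[convolutional]" and key == "filters":
            conv_filters_line = i
        elif section == "[yolo]" and key == "classes":
            lines[i] = f"classes={num_classes}"
            if conv_filters_line is not None:
                lines[conv_filters_line] = f"filters={filters}"
    return "\n".join(lines) + "\n"
```

For one class this gives `filters=18`; the other convolutional layers keep their original sizes.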
I'm a TensorFlow newbie and I'm trying to train a one-class object detection model. In particular, I'm trying to recognize an arrow like the following:
I need very fast recognition, so I started wondering whether a pretrained model might already cover this kind of shape.
Unfortunately I didn't find anything similar, so I started training on the arrow myself, using faster_rcnn_inception_v2_coco_2018_01_28 as the model.
I'm using its pipeline config and its fine_tune_checkpoint as well. Is this right, considering that I have to train a completely different object?
Training gives very good accuracy but very low speed. I need to increase the frame rate, and I don't yet understand whether a lower "training loss" means a higher "object recognition speed" or not.
Any suggestions on how I could speed up detection?
I'm using his pipeline config, and I'm using his fine_tune_checkpoint as well, is this right considering that I have to train a completely different object?
Yes! Whenever you want to change the output of a deep neural network, you should start from a pretrained model. Training a model from scratch can take several weeks, and you will never be able to generate enough data on your own. Taking a pretrained model and fine-tuning it is the way to go.
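In the TensorFlow Object Detection API, fine-tuning for a new object mostly means keeping the sample pipeline config but changing the number of classes and pointing `fine_tune_checkpoint` at the downloaded model. A minimal sketch, assuming the config text contains `num_classes: <n>` and `fine_tune_checkpoint: "<path>"` as the sample configs do, and assuming the model tarball was extracted into the working directory; the label map referenced by `label_map_path` must also list the arrow as its only class. The function name is hypothetical.

```python
import re


def set_pipeline_fields(config_text, num_classes, checkpoint_path):
    """Point a pipeline.config at a new class count and a pretrained checkpoint."""
    text = re.sub(r"num_classes:\s*\d+", f"num_classes: {num_classes}", config_text)
    # A function replacement avoids re treating backslashes in Windows paths as escapes.
    return re.sub(
        r'fine_tune_checkpoint:\s*"[^"]*"',
        lambda _m: f'fine_tune_checkpoint: "{checkpoint_path}"',
        text,
    )


# Hypothetical usage for the single "arrow" class:
# new_text = set_pipeline_fields(text, 1, "faster_rcnn_inception_v2_coco_2018_01_28/model.ckpt")
```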
I didn't understand yet if the less is the "training loss" the more is the "object recognition speed", or not.
No. The training loss only tells you how well your model performs on the training set.
The issue you are having is a classic speed vs. accuracy trade-off. I encourage you to look at this table and find a model that is fast enough for you (i.e. has the lowest run time) but still has decent accuracy. I would check SSD first.
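Choosing from such a table amounts to "the fastest model that still meets my accuracy requirement". A minimal sketch of that rule; the `ModelSpeed` record and `pick_model` are hypothetical, and the run times and accuracy values have to be filled in from the table or from your own measurements (no numbers are assumed here).

```python
from typing import NamedTuple, Optional, Sequence


class ModelSpeed(NamedTuple):
    name: str
    runtime_ms: float  # time per image, as reported in the table or measured on your hardware
    map_score: float   # detection accuracy, e.g. COCO mAP


def pick_model(candidates: Sequence[ModelSpeed], min_map: float) -> Optional[ModelSpeed]:
    """Return the fastest candidate whose accuracy is at least `min_map`, or None."""
    good_enough = [m for m in candidates if m.map_score >= min_map]
    return min(good_enough, key=lambda m: m.runtime_ms, default=None)
```

Published run times are measured on specific hardware, so they are best used to rank models and then confirmed on your own machine.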
The result is a training with a very good accuracy but very low speed.
How many FPS does your algorithm achieve?
Since you have already prepared a dataset, I would suggest using Tiny-YOLO, which runs at 244 FPS on the COCO dataset: https://pjreddie.com/darknet/yolo/
Preparing a training dataset for Tiny-YOLO is very easy if you use this repository.
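To answer the FPS question with a number, time the detector itself over a batch of frames. A minimal sketch, assuming `detect` is your model's inference function taking one frame (a hypothetical name); the first few calls are skipped because they often include one-time setup such as graph loading or memory allocation.

```python
import time


def measure_fps(detect, frames, warmup=5):
    """Average frames per second of `detect` over `frames`, skipping warm-up calls."""
    frames = list(frames)
    for frame in frames[:warmup]:
        detect(frame)  # warm-up calls, not timed
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup")
    start = time.perf_counter()
    for frame in timed:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed
```

This measures inference only; camera capture, resizing, and drawing boxes add their own time to the end-to-end frame rate.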
And
I didn't understand yet if the less is the "training loss" the more is the "object recognition speed"
Training loss has nothing to do with speed.
I'm working on a project that requires recognizing only people in a video or a live camera stream. I'm currently using the TensorFlow Object Detection API with Python, and I've tried different pretrained models and frozen inference graphs. I want to recognize only people and maybe cars, so I don't need my neural network to recognize all 90 classes that come with the frozen inference graphs (based on MobileNet or R-CNN), since this seems to slow the process down and 89 of those 90 classes are not needed in my project. Do I have to train my own model, or is there a way to modify the inference graphs and the existing models? This is probably a noob question for some of you, but keep in mind that I've worked with TensorFlow and machine learning for just one month.
Thanks in advance
Shrinking the last layer to output one or two classes is not likely to yield a large speedup, because most of the computation happens in the intermediate layers. You could shrink the intermediate layers, but that would result in poorer accuracy.
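A back-of-the-envelope count of multiply-accumulate operations (MACs) shows why. The layer sizes below are hypothetical toy values chosen only to illustrate the ratio, not taken from any real MobileNet or SSD configuration; a real SSD has class heads on several feature maps, so the exact share differs, but the class head stays small next to the backbone.

```python
def conv_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate operations of one k x k convolution layer."""
    return h_out * w_out * c_in * c_out * k * k


# Hypothetical backbone layers: (h_out, w_out, c_in, c_out, kernel).
BACKBONE = [
    (150, 150, 32, 64, 3),
    (75, 75, 64, 128, 3),
    (38, 38, 128, 256, 3),
    (19, 19, 256, 512, 3),
]
ANCHORS = 3  # boxes predicted per feature-map cell (hypothetical)


def total_macs(num_classes):
    backbone = sum(conv_macs(*layer) for layer in BACKBONE)
    # 1x1 class-prediction head on the last feature map: one score per class
    # plus background, for every anchor.
    head = conv_macs(19, 19, 512, ANCHORS * (num_classes + 1), 1)
    return backbone + head


saving = 1 - total_macs(2) / total_macs(90)
print(f"Going from 90 to 2 classes saves {saving:.1%} of the MACs")  # about 2.8% here
```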
Yes, you have to train your own model. Here, briefly, are some ways to do it.
OPTION 1. If you want to transfer as much knowledge as possible, you can freeze the CNN layers. Then change the number of detected classes by changing the output dimension of the classifier (the dense layers), which is the last part of the CNN architecture. Now you retrain only the classifier.
OPTION 2. If you want to transfer knowledge only from the first layers of the CNN (for example, by freezing the first 2-3 CNN layers), change the number of detected classes via the classifier's output dimension as before, then retrain the rest of the CNN layers together with the classifier.
OPTION 3. If you want to retrain the whole CNN with the classifier, change the number of detected classes via the classifier's output dimension, then retrain the whole CNN together with the classifier.
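The three options differ only in which layers receive weight updates. A framework-independent sketch of that policy, assuming the layers are listed from input to output with the classifier last; `trainable_layers` and the layer names are hypothetical. In tf.keras, the layers not returned would get `layer.trainable = False`, and the final Dense layer would be replaced by one whose `units` equals the new number of classes.

```python
def trainable_layers(layer_names, option, n_frozen=3):
    """Return the layers that get updated under each transfer-learning option.

    layer_names: ordered from input to output; the last entry is the classifier.
    option 1: freeze all CNN layers, train only the classifier.
    option 2: freeze the first `n_frozen` CNN layers, train the rest plus the classifier.
    option 3: train everything.
    """
    *cnn_layers, classifier = layer_names
    if option == 1:
        return [classifier]
    if option == 2:
        return cnn_layers[n_frozen:] + [classifier]
    if option == 3:
        return list(layer_names)
    raise ValueError("option must be 1, 2 or 3")


layers = ["conv1", "conv2", "conv3", "conv4", "conv5", "classifier"]
print(trainable_layers(layers, 2))  # ['conv4', 'conv5', 'classifier']
```

Option 1 needs the least data and compute; option 3 needs the most but can adapt the features to objects that look very different from the pretraining data.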
In general, the TensorFlow Object Detection API is a good start for beginners! You can see how to approach your problem here, and find more detail about the whole process and extra explanation here.