I am trying to train an object detection model on a custom data set. I want it to recognize a specific piece of metal from my garage. I took about 32 photos and labelled them. The training goes well down to roughly 10% loss, but after that it becomes very slow, so I have to stop it. I then deployed the model on a camera feed, but it has no accuracy at all. Could this be because I only have 32 images of the object? I have tried YOLOv2 and Faster R-CNN.
It is unlikely that the model you deployed on the camera has no accuracy simply because you only have 32 images.
In any case, you already reached roughly 10% loss during training (which suggests it performs reasonably on the training set), so it should work to some degree. I don't think the problem is the number of images.
After training your model, you need to save the weights (coefficients) of the trained model.
Make sure that the model you deploy actually loads those trained weights, and that you are not accidentally running a model initialized from scratch.
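For illustration, here is a minimal sketch of that save/restore step, assuming a Keras-style model; the architecture and file name below are placeholders, not your actual detector:

```python
import tensorflow as tf

def build_model():
    # Hypothetical stand-in for your detector; replace with your real network.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(224, 224, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4),  # e.g. box coordinates
    ])

model = build_model()
# ... training happens here ...
model.save_weights("detector.weights.h5")   # save the trained coefficients

# At inference time, rebuild the same architecture and restore the weights,
# so the camera pipeline runs the trained model, not a freshly initialized one.
inference_model = build_model()
inference_model.load_weights("detector.weights.h5")
```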
Just labeling will not help in object detection. What you are doing is image classification but expecting results of object detection.
Object detection requires bounding-box annotations and a different loss function, which is fed to the model during each backpropagation step.
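For example, in YOLO-style training each image gets a small text file with one line per object, using normalized coordinates. A hedged sketch of that conversion (the numbers and names below are made up for illustration):

```python
# Convert a pixel-space box to the YOLO-style
# "class x_center y_center width height" line (all values normalized to 0-1).
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# One annotation file per image, one line per object:
print(to_yolo_line(0, x_min=120, y_min=80, x_max=260, y_max=210, img_w=640, img_h=480))
```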
You first need an annotation tool to draw those bounding boxes, then adapt your YOLOv2 / Faster R-CNN code and its loss function accordingly. Train it well and use image augmentation to generate some more images, because 32 images are very few (a small augmentation sketch is given below). Otherwise you may fall into the pitfall of high training accuracy but low test accuracy: training on too few images often leads to unexpected overfitting.
Only then should you try deploying it on the camera.
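On the augmentation point above: a minimal sketch using Keras' ImageDataGenerator, assuming your 32 labelled images sit under a hypothetical "dataset/" folder with one subfolder per class. Note that for detection the bounding boxes must be transformed together with the images; libraries such as albumentations can handle that part.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Random flips, shifts and brightness changes give the network more
# varied views of the same object without taking new photos.
augmenter = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    brightness_range=(0.7, 1.3),
    horizontal_flip=True,
)

# "dataset/" is a placeholder path; flow_from_directory expects one
# subdirectory per class containing the images.
generator = augmenter.flow_from_directory(
    "dataset/",
    target_size=(416, 416),
    batch_size=8,
)
```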
Related
I am learning Convolutional Neural Networks and practicing on the Kaggle digit recognizer (MNIST) dataset.
While training, I noticed that in spite of the accuracy initially growing gradually, at one point there was a huge jump, i.e. from 0.8984 to 0.9814.
As a beginner, I want to understand what this jump really says about my model. Here is the image of the epochs:
[screenshot of the per-epoch training output]
I have circled the jump in yellow. Thanks in advance!
As the loss decreases, the model fits the training data better. Minimizing the cost function drives the loss down, which directly improves the fit; the better the model fits the training data, the higher the accuracy (you can see this as the accuracy rising while the loss falls). There is a difference of almost 0.08 between your consecutive loss values, which is enough for the model to fit noticeably better than in its previous state.
As the model progresses, we evaluate it on the test dataset, because real-world data is not identical to the data we trained on.
However, a very high training accuracy is not always good: the model may be overfitting, meaning it performs so well on the training data that it cannot handle even small changes. A proper balance between the learning rate and the number of epochs is therefore required to predict the classes correctly. It also depends on the architecture, the optimizer (which keeps the oscillations low) and numerous other things.
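As a small, hedged illustration of that learning-rate/epoch balance on the same MNIST data (the layer sizes, learning rate and patience value are arbitrary choices, not a recommendation): early stopping on the validation loss halts training before it runs long enough to overfit.

```python
import tensorflow as tf

# Load and normalize MNIST.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # moderate learning rate
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Stop once the validation loss has not improved for 3 epochs,
# and keep the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
model.fit(x_train, y_train, validation_split=0.1,
          epochs=50, callbacks=[early_stop])
```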
I have trained an image classification model using Keras which gives 98% training accuracy, 98% validation accuracy and 90% test accuracy, but it performs very poorly on new input images. I don't know why.
You write that you trained an image classification model. Let's say that you want to write an app which lets people photograph their dinner and it returns a prediction on which food it is.
If you train your model on "perfect" images, i.e. high-quality, sharp images, but then have it predict on a "real-life" photo taken in a dark restaurant, you have a problem: you are asking the model to predict something it basically never learned. That might be the issue.
I think it would be great if you could provide us with some typical images of your training-dataset and the dataset you are predicting on. Maybe the ones you took for predictions are not typical for the whole training dataset.
I tried image classification with a trained model and it works well, but some images are not recognized correctly. In those cases I want to collect the image and its label from users, so my question is: is it possible to add new data to an already trained model?
No, at inference time you use the weights of the trained model for predictions, which basically means that once your model is deployed, the capabilities of your image classifier are fixed by those weights. If you wish to improve your model, you have to retrain it with the new data. There is, however, another learning paradigm called "online learning", where the model keeps learning and modifying its weights: the weights are not fixed and the model updates them with each new training input. As far as I know this is not usually recommended for CNNs, because the backward pass of the gradients is computationally expensive and would make inference slow.
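A rough sketch of the retraining route, assuming the original classifier was saved to a hypothetical "classifier.h5" file; the new user-provided images and labels are placeholders here:

```python
import tensorflow as tf

# Load the previously trained classifier (path is a placeholder).
model = tf.keras.models.load_model("classifier.h5")

# new_images: float32 array of shape (n, H, W, 3), new_labels: integer labels.
# Both are hypothetical placeholders for the data collected from users.
# model.fit(new_images, new_labels, epochs=5, batch_size=16)
# model.save("classifier_updated.h5")  # save the updated weights
```

This is ordinary retraining/fine-tuning on the extra data, not online learning.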
No model can predict with 100% accuracy; if it did, it would be an ideal model. If you want to add more data to your trained model, you have to retrain the model with the new data. Having more data is always a good idea: it lets the data speak for itself instead of relying on assumptions and weak correlations, and more data generally produces better, more accurate models. So if you want better accuracy, train your model with more data. Without retraining, you cannot add data to an already trained model.
I'm a TensorFlow newbie and I'm trying to train a 1-class object detection model. In particular, I'm trying to recognize an arrow like the following:
I need very fast recognition, so I started wondering whether a pre-trained model might already contain this kind of shape.
Unfortunately I didn't find anything similar, and therefore I started training on the arrow myself, using faster_rcnn_inception_v2_coco_2018_01_28 as the model.
I'm using its pipeline config and its fine_tune_checkpoint as well; is this right considering that I have to train on a completely different object?
The result is a model with very good accuracy but very low speed. I need to increase the frame rate, and I haven't yet understood whether a lower "training loss" means a higher "object recognition speed" or not.
Any suggestion on how I could speed up the detection?
I'm using its pipeline config and its fine_tune_checkpoint as well; is this right considering that I have to train on a completely different object?
Yes! Every time you want to change the output of a deep NN, you should start from a pretrained model. Training a model from scratch can take several weeks, and you will never be able to generate enough data on your own. Taking a pretrained model and fine-tuning it is the way to go.
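As a generic illustration of that advice (not the Object Detection API itself, which handles this through fine_tune_checkpoint in the pipeline config), here is a hedged Keras sketch of starting from a pretrained backbone and training only a new head for a single class:

```python
import tensorflow as tf

# Pretrained ImageNet backbone; its weights are reused, not learned from scratch.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the pretrained features

# New head trained only for the custom class (e.g. "arrow").
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```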
I haven't yet understood whether a lower "training loss" means a higher "object recognition speed" or not.
No. Training loss just tells you how well your model performs with respect to the training set.
The issue you are having is a classic speed vs. accuracy trade-off. I encourage you to take a look at this table and find a model which is fast enough for you (i.e. has the lowest run-time) but still has decent accuracy. I would check SSD first.
The result is a model with very good accuracy but very low speed.
How many FPS does your algorithm achieve?
Since you already have a prepared dataset, I would suggest trying Tiny-YOLO, which runs at 244 FPS on the COCO dataset: https://pjreddie.com/darknet/yolo/
Preparing a training dataset for Tiny-YOLO is very easy if you use this repository.
And regarding:
I haven't yet understood whether a lower "training loss" means a higher "object recognition speed"
Training loss has nothing to do with speed.
I trained a custom person detector using TensorFlow and a pretrained Inception model. After a few thousand steps and an average loss of about 2 to 1, I stopped the training and tested it on a live video. The result was quite good, with only a few false positives. It could detect some people but not everyone, so I decided to continue training until I reached an average loss below 1, then tested it again. Now it detects almost everything as a person, even the whole video frame, and even when no object is present. The model seems to work great on pictures but not on videos. Is this overfitting?
Sorry, I forgot how many steps it was; I accidentally deleted the training folder that contained the ckpt and tfevents files.
Edit: I forgot that I am also training the same model with the same dataset, but with a higher batch size, in the cloud as a backup; it is now at a higher step count. I'll edit the post later and provide the info from TensorBoard once I've finished downloading and testing the model from the cloud.
Edit 2: I downloaded the model trained for 200k steps from the cloud and it is working; it detects people, but sometimes recognizes the whole frame as "person" for less than a second when I am moving the camera. I guess this could be improved by continuing to train the model.
[TensorBoard screenshot of the total loss]
For now, I'll just continue the training in the cloud and try to document all the results of my tests. I'll also try resizing some images in my dataset, train on my local machine using MobileNet, and compare the results of the two models.
Since you say the model did well when there were fewer training iterations, I guess the pre-trained model could already detect the person class and your training set made the detection worse.
The model seems to work great on pictures but not on videos
If your single pictures are detected fine, then videos should work too; the only difference can come from the video's resolution and quality. So compare the resolution of the images and the video.
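A quick, hedged way to make that comparison with OpenCV; the file names below are placeholders for one of your training images and your evaluation video:

```python
import cv2

# Read one training image and print its resolution.
img = cv2.imread("training_example.jpg")      # placeholder training image
if img is not None:
    print("training image:", img.shape)       # (height, width, channels)

# Grab one frame from the evaluation video and print its resolution,
# so you can see whether the camera feed differs from the training data.
cap = cv2.VideoCapture("test_video.mp4")      # placeholder evaluation video
ok, frame = cap.read()
if ok:
    print("video frame:  ", frame.shape)
cap.release()
```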
Is this overfitting?
Regarding the images and videos you are talking about: if the images were used in training, you should not use them to evaluate the model. If the model is overfitted, it will detect the training images but not any others.
Since you say the model produces too many detections, I think this is not because of overfitting; it is more likely about your dataset. I think:
You have too little data to train on.
The network model is too big and complicated for the amount of data; try a smaller network like VGG or Inception v1 (SSD MobileNet).
The image resolution used in the training set is very different from that of the evaluation images.
The learning rate is important, but I think in your case it is fine.
I think you should check the dataset you used for training carefully and use as much data as you can. These are the things I have generally run into and wasted time on.