Why is YOLOv5 not giving predictions on a live webcam? - python

I have trained a YOLOv5 model on a traffic-sign dataset whose images are 1360x800 and about 600 KB each, but when I run real-time prediction on my laptop camera it is not able to detect those signs. The webcam frames are 600x450 and about 280 KB. Is this problem due to the size or the dimensions of the images?
One thing to keep in mind: I have no GPU on my local PC (CPU only); I trained the model on Colab, and it works fine on the dataset's own test images, which have the higher resolution and file size.
So the trained YOLOv5 model works on the test split of its own dataset, but it does not work on images I capture myself or on the live webcam of my PC.
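For reference, here is a minimal sketch of how a custom YOLOv5 model is typically run against webcam frames with OpenCV. The weights filename best.pt, the confidence threshold, and the size=1280 inference resolution are assumptions to adapt, not values taken from the question:

```python
import cv2
import torch

# Load the custom weights through the official YOLOv5 hub entry point.
# "best.pt" is a placeholder; point it at your own trained weights.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.25  # keep the confidence threshold low while debugging

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # size controls the letterboxed inference resolution; keeping it closer to
    # the training image width (1360) helps small signs survive the resize.
    results = model(rgb, size=1280)
    annotated = cv2.cvtColor(results.render()[0], cv2.COLOR_RGB2BGR)
    cv2.imshow('yolov5', annotated)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

If the model still detects nothing, lowering model.conf further shows whether it produces any low-confidence boxes at all, which helps separate a thresholding issue from a genuine domain gap between the dataset photos and the webcam frames.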

Related

How to tile images on inference with Yolo Object Detection

I've recently trained a custom yolov5 model to recognize animals on safari.
Animals on safari are far away most of the time, and so, after resizing images to 640x640, most of the animals are now too small to be detected.
I've researched the technique of tiling: splitting a large image into a 5x5 grid of smaller images, so that inference doesn't use as much RAM as running it on the original large image.
However, there is no instruction on how to do this on real-time inference.
The model I'm using is Yolov5 trained with PyTorch.
Does anyone know how to do tiling on real-time inference?
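I'm not aware of a built-in tiling option in YOLOv5, but a rough sketch of one way to tile each frame at inference time is below. It assumes a model loaded through torch.hub, a 5x5 grid, and no tile overlap (objects cut by tile borders would need overlapping tiles or a post-merge NMS step):

```python
import cv2
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')  # placeholder weights
ROWS, COLS = 5, 5  # 5x5 tiling as described in the question

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    th, tw = h // ROWS, w // COLS

    tiles, offsets = [], []
    for r in range(ROWS):
        for c in range(COLS):
            y0, x0 = r * th, c * tw
            tile = frame[y0:y0 + th, x0:x0 + tw]
            tiles.append(cv2.cvtColor(tile, cv2.COLOR_BGR2RGB))
            offsets.append((x0, y0))

    # One batched forward pass over all tiles, then shift each tile's boxes
    # back into full-frame coordinates.
    results = model(tiles, size=640)
    for det, (ox, oy) in zip(results.xyxy, offsets):
        for x1, y1, x2, y2, conf, cls in det.tolist():
            cv2.rectangle(frame, (int(x1) + ox, int(y1) + oy),
                          (int(x2) + ox, int(y2) + oy), (0, 255, 0), 2)

    cv2.imshow('tiled inference', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```

Batching all 25 tiles in one call keeps the per-frame overhead manageable, but on a CPU-only machine this will still be far from real time; fewer, overlapping tiles are a common compromise.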

How can I deploy a trained CNN model to production on an ARM device?

I have trained a CNN model with Keras for semantic segmentation of cranial images and saved the weights and the trained model.
Now I want to put it into production on a microprocessor. The pipeline on the micro involves reading an image from a sensor and using it as input to the CNN model (U-Net). The resulting binary image is then used as a mask for an area of interest from which a variable is measured. Finally, a number is given as a result.
So, is it possible to load a trained model on a microprocessor? And if so, how?
Which features should the microprocessor have in order to work with CNN models?
Thanks in advance!
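For an ARM board that can run a TFLite runtime (roughly Raspberry Pi-class and up), one common route is converting the Keras model to TensorFlow Lite, optionally quantized; for bare-metal microcontrollers there is TensorFlow Lite for Microcontrollers with a C++ runtime. A minimal conversion-and-inference sketch, with placeholder file names:

```python
import numpy as np
import tensorflow as tf

# Convert the saved Keras U-Net to TFLite (file names are placeholders).
model = tf.keras.models.load_model('unet.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
tflite_model = converter.convert()
with open('unet.tflite', 'wb') as f:
    f.write(tflite_model)

# On the ARM target, run the converted model through the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_path='unet.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

image = np.zeros(inp['shape'], dtype=np.float32)  # stand-in for a preprocessed sensor frame
interpreter.set_tensor(inp['index'], image)
interpreter.invoke()
mask = interpreter.get_tensor(out['index'])        # probability map from the U-Net
binary_mask = (mask > 0.5).astype(np.uint8)        # threshold to the binary mask used downstream
```

The main hardware considerations are enough RAM to hold the (possibly quantized) weights and activations at your input resolution, and a CPU with NEON support if you want the optimized TFLite kernels.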

Input digit image is not recognised correctly using TensorFlow's MNIST dataset

I am trying to do handwritten character recognition using TensorFlow in Google Colab.
I have trained and tested a model with an accuracy of 91%.
I tried it on the image given in the tutorial, and it worked correctly;
it was resized to 28x28.
When I try it on my own input image, it predicts wrong results such as 2 or 3, although my input image is of the digit 6.
The problem may be in the image operations done before passing the image to the model.
Further on, I also want to pass such images for real-time recognition.
I am resizing and inverting the image to make it compatible with my training labels.
The OpenCV input image uses the opposite convention to the TensorFlow labels, since its matrix represents black as 0 and white as 255.
My GitHub Jupyter notebook follows the tutorial from DigitalOcean's blog.
How can I upload an image taken from a phone/webcam and recognize characters from that image?
Where am I making mistakes in processing the image?
Further, I want to use that image in a project for real-time recognition of characters.
The test images are attached.
Do you know that the MNIST dataset is constrained by the padding of its images?
Appropriate real-time image preprocessing is needed.
This is a useful article about that:
https://link.medium.com/0ySCmyMpzU
And the following is my project, a simple MNIST game:
https://github.com/mym0404/Math-Writer
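As a hedged sketch of what that preprocessing can look like for MNIST-style models: grayscale, invert so the digit is white on black, crop to the digit, scale it to fit 20x20, and pad to 28x28 with the digit roughly centered, which matches how the original MNIST images were prepared. The file name and threshold below are placeholders:

```python
import cv2
import numpy as np

def to_mnist(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.bitwise_not(img)                      # MNIST: white digit on black background
    _, img = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)

    # Crop to the bounding box of the digit.
    ys, xs = np.nonzero(img)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Scale the digit to fit in 20x20, then pad to 28x28 (MNIST-style margins).
    h, w = img.shape
    scale = 20.0 / max(h, w)
    img = cv2.resize(img, (max(1, int(w * scale)), max(1, int(h * scale))))
    canvas = np.zeros((28, 28), dtype=np.uint8)
    y0 = (28 - img.shape[0]) // 2
    x0 = (28 - img.shape[1]) // 2
    canvas[y0:y0 + img.shape[0], x0:x0 + img.shape[1]] = img

    return canvas.astype(np.float32) / 255.0        # scale to [0, 1] for the model

digit = to_mnist('my_digit.jpg').reshape(1, 28, 28, 1)
# prediction = model.predict(digit)  # model from the trained notebook
```

Skipping the crop-and-center step is a frequent reason a model that scores 91% on MNIST fails on phone or webcam photos, because the digit ends up in a very different position and scale than anything seen in training.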

TensorFlow fine-tuned Fast-RCNN model does not show bounding boxes

I fine-tuned both SSD MobileNet and Fast-RCNN on the same dataset. Both models ran training and inference without any error, but the fine-tuned Fast-RCNN model does not show any bounding boxes. So I tried with a single training image to see whether the trained model could draw a bounding box on an image it was trained on, but it shows nothing. Where should I start looking for debugging?
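One common first check, assuming the standard TF Object Detection API inference outputs (detection_boxes, detection_scores, detection_classes), is to print the raw scores and visualize with a much lower min_score_thresh. If boxes appear, the model is predicting but with low confidence; if the scores are near zero even on a training image, something is wrong with the fine-tuning, the checkpoint restore, or the label map. A sketch:

```python
import numpy as np
from object_detection.utils import visualization_utils as viz_utils

def debug_detections(image_np, output_dict, category_index):
    """Draw raw detections with a very low score threshold for debugging."""
    scores = output_dict['detection_scores']
    print('top-10 scores:', np.sort(scores)[::-1][:10])  # all ~0 => model/checkpoint issue
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'].astype(np.int64),
        scores,
        category_index,
        use_normalized_coordinates=True,
        min_score_thresh=0.05,   # far below the usual 0.5 default
        line_thickness=4)
    return image_np
```

The output_dict, image_np, and category_index arguments are assumed to come from the same inference code that already works for the SSD MobileNet model.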

Trained model detects almost everything as one class after long training

I trained a custom person detector using TensorFlow and an Inception pretrained model. After a few thousand steps, with the loss averaging between 1 and 2, I stopped the training and tested it on a live video. The result was quite good, with only a few false positives. It could detect some people but not everyone, so I decided to continue training the model until I got an average loss below 1, then tested it again. It now detects almost everything as a person, even the whole frame of the video, even when no object is present. The model seems to work well on pictures but not on videos. Is that overfitting?
Sorry, I forgot how many steps it was. I accidentally deleted the training folder that contained the checkpoints and tfevents.
Edit: I forgot that I am also training the same model with the same dataset but a higher batch size on a cloud instance as a backup, which is now at a higher step count. I'll edit the post later and provide the info from TensorBoard once I've finished downloading and testing the model from the cloud.
Edit 2: I downloaded the model trained to 200k steps from the cloud and it is working: it detects persons, but sometimes recognizes the whole frame as "person" for less than a second when I am moving the camera. I guess this could be improved by continuing to train the model.
(Screenshot: total loss on TensorBoard.)
For now, I'll just continue the training on the cloud and try to document all of my test results. I'll also try resizing some images in my dataset, train on my local machine using MobileNet, and compare the results from the two models.
Since you say the model did well with fewer training iterations, I suspect the pretrained model could already detect the person class and your training set made the detection worse.
"The model seems to work great on pictures but not on videos"
If your single pictures are detected fine, then videos should work too; the only difference can come from the video's resolution and quality. So compare the resolution of the images and of the video.
"Is that overfitting?"
Regarding the images and videos you mention: if the images were used in training, you should not use them to evaluate the model. If the model is overfitted, it will detect the training images but not any other ones.
Since, as you say, the model produces too many detections, I think this is not overfitting; it is more likely about your dataset. I think:
You have too little data to train on.
The network model is too big and complicated for the amount of data. Try a smaller network like VGG, Inception v1 (SSD MobileNet), etc.
The image resolution used in the training set is very different from that of the evaluation images.
The learning rate is important, but I think in your case it's fine.
I think you should carefully check the dataset you used for training and use as much data as you can. These are the things I have generally experienced and wasted time on.
