I am counting the total number of vehicles in a video, but I want to detect only the vehicles which are travelling up (the road has a divider), so my question is: can I run YOLO only on a rectangle where vehicles are moving up? I don't want to detect vehicles on the other side of the road.
Is there a way I can draw a rectangle and only detect objects within that specific rectangle?
The best I can think of is that for every frame I'll have to crop the frame, perform all the operations, and stitch it back into the original frame. I am hoping there is an easier alternative.
Any help is appreciated. Thanks
You can run YOLO on the entire image as usual, but add a condition to only draw (and count) boxes whose center falls in a specific region. Alternatively, you can apply this position check alongside the IoU conditions, where detected boxes are already being filtered. You can also separate the counting based on the direction the vehicles are moving and keep two different counters for the two directions.
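For example, here is a minimal sketch of the center check (not tied to any particular YOLO wrapper; the zone coordinates and the (x1, y1, x2, y2) box format are assumptions, not part of your setup):
# Keep only detections whose box center falls inside a hand-drawn counting zone.
ZONE = (100, 0, 640, 720)  # hypothetical (x_min, y_min, x_max, y_max) of the "up" lanes

def center_in_zone(box, zone=ZONE):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    zx1, zy1, zx2, zy2 = zone
    return zx1 <= cx <= zx2 and zy1 <= cy <= zy2

# detections = run_yolo(frame)                      # placeholder for your YOLO call
# kept = [d for d in detections if center_in_zone(d)]  # count/draw only these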
If you don't mind me asking, how are you tracking the vehicles?
I'm doing a similar thing...
If your product is going to be fixed on something like a light pole, then you can either detect the road and zebra crossing by training a model,
or
manually enter these values...
Later, run your object detection and object tracking on only these parts of the frames, i.e. use
frame[ymin:ymax, xmin:xmax]
This reduces the image size so your processing speed increases.
But why do you need the full image again after your work? If you do need it, just add the xmin and ymin of the road ROI to the bounding box of each vehicle detected inside that ROI to get its bounding box values in the uncropped image, as in the sketch below.
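Something like this (the ROI numbers and the detect_vehicles call are just placeholders):
# Crop to the road ROI, detect on the crop, then shift boxes back to full-frame coordinates.
x_min, y_min, x_max, y_max = 50, 200, 700, 600   # hypothetical road ROI

roi = frame[y_min:y_max, x_min:x_max]            # note: rows (y) first, then columns (x)

# detections = detect_vehicles(roi)              # your detector, run on the crop only
# for (bx1, by1, bx2, by2) in detections:
#     full_frame_box = (bx1 + x_min, by1 + y_min, bx2 + x_min, by2 + y_min)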
I'm looking for ideas to help improve my current approach for real-time object detection using computer vision (specifically the opencv library). My goal is to accurately detect a golf-ball through image processing in a large variety of environments/lighting conditions. My detection process works quite well probably 80% of the time, but I'm hitting edge cases that cause failures that I can't ignore. The edge case I'm focusing on right now is extreme shadows being cast by the golf-ball. Here is a pair of example images. The coloured image is my source, and the black and white image is my post-processed result.
There are a few important variables to consider with my application:
Source coming from a video feed, and being processed in real-time
It can be windy, so camera shake can be an issue
The camera isn't guaranteed to be high quality, so I need to account for extra noise and limited resolution
I won't go into full detail on the processing I'm doing to detect moving objects (Kalman filter, background subtraction, ...), as in this specific example I'm failing to detect a stationary object (i.e. the ball has come to rest). The relevant steps are roughly as follows (a rough code sketch appears after the list):
Grab an initial frame before any balls are in-frame as my base frame (this will be used for background subtraction)
Convert the image to greyscale
Apply a median blur to eliminate noise, which can otherwise be pretty extreme due to a combination of camera shake and poor camera quality
Apply an adaptive threshold to the image. I'm using ADAPTIVE_THRESH_GAUSSIAN_C and have been tuning the block size and C constant values as best I can
Apply background subtraction (I'm using the built-in CNT subtractor)
Apply a small dilation kernel to the entire image to try to increase the size of the contours that are left after the above processing, as they can sometimes become quite small, for example after the blur filter
Use OpenCV's findContours with the RETR_TREE and CHAIN_APPROX_SIMPLE parameters
Walk the contour hierarchy, looking for "filled in" contours. The idea is that the golf balls should mostly be completely filled in, compared to other objects which will have an outline, and I can use the hierarchy to determine which contours are filled in (i.e. whether they have child contours)
For each filled-in contour, compute the minimum enclosing circle. Compare the area of the enclosing circle to the contour area, and filter by an acceptable difference to determine how circular the object is
Do another pass, filtering by min/max area, since I can assume the camera will always be at a similar height, to hone in on the object
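Here is that sketch, assuming OpenCV 4.x plus opencv-contrib (the CNT subtractor lives in cv2.bgsegm); the parameter values are just placeholders to tune, not my actual settings, and the steps are applied in the order described above:
#!/usr/bin/env python3
import math
import cv2

subtractor = cv2.bgsegm.createBackgroundSubtractorCNT()

def find_ball_candidates(frame, block_size=21, c_const=5,
                         min_area=50, max_area=2000, circularity_tol=0.3):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.medianBlur(gray, 5)                        # knock down noise / camera shake
    thresh = cv2.adaptiveThreshold(blurred, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, block_size, c_const)
    fg = subtractor.apply(thresh)                            # background subtraction (on the binary image, per the steps above)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    fg = cv2.dilate(fg, kernel, iterations=1)                # fatten small contours

    contours, hierarchy = cv2.findContours(fg, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    if hierarchy is None:
        return candidates
    for i, cnt in enumerate(contours):
        if hierarchy[0][i][2] != -1:                         # has a child contour -> outline, not filled in
            continue
        area = cv2.contourArea(cnt)
        if not (min_area <= area <= max_area):
            continue
        (x, y), radius = cv2.minEnclosingCircle(cnt)
        circle_area = math.pi * radius * radius
        if circle_area > 0 and abs(circle_area - area) / circle_area <= circularity_tol:
            candidates.append((int(x), int(y), int(radius)))
    return candidates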
As you can see from the images above, this approach runs into problems when the ball itself has a lot of contrast due to shadows. To me it looks like the adaptive threshold pass is filtering out the darker part of the ball (due to shadow), which creates a non-circular shape. Perhaps I need to dial in the adaptive threshold pass to allow for a bit more contrast, since we can assume shadows are always on the dark side and a ground shadow should be darker than the shaded part of the ball? I'd also like to completely eliminate the leftover outline of the ground shadow if possible. My guess is that the edges of the shadow being slightly lighter is the reason they don't get filtered out by my adaptive threshold pass. Open to any and all suggestions :-)
Let me start by saying that I'm a complete amateur in image recognition and I'm trying to complete my first assignment using OpenCV in Python. I'm currently really struggling and therefore I came here for some advice or any help in general that would put me on the right path.
What am I currently trying to do:
My goal here is to recognize a shooting-target image that the user uploads and compare it to one of two shooting-target templates (images provided below). My app is afterward going to score the shooting target based on the template it matches and give the user a really accurate score for their shot/shots (based on millimeters from the center of the target). That is the long-term goal. For now, I'm just trying to figure out how to distinguish the uploaded target image from the templates I have.
Examples of shooting targets:
As I mentioned I have two shooting target templates: target 1 and target 2.
The user then uploads a target that must match one of the templates.
Example that matches target 1
Example that matches target 2
Whenever the uploaded shooting target doesn't match any of the templates, the app should tell the user and not continue with the calculation.
What have I done and tried so far:
For starters, I figured it would be beneficial to remove everything in the background and crop the image to the shooting target, and so I did. (I thought that if I removed all of the background interference I could easily just compare the two images, but I later found out this actually wouldn't be accurate at all.)
After that, I tried to calculate the percentage of the black color to the other color inside the target (without the background), but again found out this wouldn't be accurate since the shooter could shoot through a lot of the black color and then the percentage would fluctuate. Also, I wouldn't be able to tell if it's one of the templates since another completely different shooting target could have the same amount of black color in the middle.
As for comparing the two images, I tried a lot of approaches (histogram comparison, feature matching with brute force, template matching) and none of them seemed to be accurate or usable (I could have been doing it wrong, though; that's a possibility).
What I have figured after all of those failures is that possibly the best solution would be to compare the circles inside the shooting target or the numbers inside the black middle circles, but I couldn't figure out how to do so properly.
Do you guys have any idea on how to go about this? I would really appreciate any help or any push towards the solution of my problem. Code examples are highly appreciated and would make my day.
Best regards.
The targets seem to differ only in score bands (rings) 4, 5 and 6. So I would try and concentrate on those areas.
I took your sample images and resized them to exactly 500x500 pixels, then I measured the radius from the centre to the outside edge of band 4 (which was 167 px) and to the edge of band 6 (which was 95 px). So the outer limit of the area of interest is 167/500, or 0.33xW and the inner limit is 95/500, or 0.19xW where W is the width of the enclosing rectangle.
So, you can draw that mask like this:
#!/usr/bin/env python3
import numpy as np
import cv2
# Define width/height of target in pixels
W = 300
# Make mask, white for area of interest, black elsewhere
mask = np.zeros((W,W),dtype=np.uint8)
cv2.circle(mask, (W//2,W//2), int(0.33*W), 255, -1) # White outer circle
cv2.circle(mask, (W//2,W//2), int(0.19*W), 0, -1) # Black inner circle
That gives you this mask:
You can now calculate, say, the mean of all pixels within that mask using:
maskedMean = cv2.mean(YourImage, mask)
and only pixels that are white within the mask will contribute to the mean.
Here is the mask placed beside one of your targets:
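As a possible next step (the file names, the resizing and the distance threshold below are assumptions), you could compute that masked mean for the uploaded image and for both templates, pick the closer template, and reject the upload if neither is close:
#!/usr/bin/env python3
import numpy as np
import cv2

W = 300

# Same ring mask as above: white between band 6 and band 4, black elsewhere
mask = np.zeros((W, W), dtype=np.uint8)
cv2.circle(mask, (W//2, W//2), int(0.33*W), 255, -1)
cv2.circle(mask, (W//2, W//2), int(0.19*W), 0, -1)

def band_mean(path):
    """Resize an image to WxW and return its mean BGR colour inside the ring mask."""
    img = cv2.resize(cv2.imread(path), (W, W))
    return np.array(cv2.mean(img, mask=mask)[:3])

uploaded = band_mean("uploaded_target.jpg")   # placeholder file names
t1 = band_mean("template1.jpg")
t2 = band_mean("template2.jpg")

d1 = np.linalg.norm(uploaded - t1)
d2 = np.linalg.norm(uploaded - t2)

THRESHOLD = 40.0                              # needs tuning on real images
if min(d1, d2) > THRESHOLD:
    print("No template matched")
else:
    print("Matches template", 1 if d1 < d2 else 2)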
I want to validate whether the object detection (the green boxes) I have marked is tracking only that object in a video.
How do I check that the tracker has stayed on that object and not moved onto any other object? Say, in this case, how do I validate that the left (black) car is tracked correctly over the next set of frames, along with the other object (the white car on the right side)?
Edit: I have tried finding contours, extracting that particular object (the black car in this frame), and extracting features from it, but that didn't work.
If you initialize your tracking with a detection step, you could periodically reiterate that detection inside the region you are tracking to make sure the car is still there. Alternatively, you could describe the object region using various histograms (color, gradients, etc.) and check whether the region you are tracking is still similar to what it started with.
I suggest checking out color histograms and HOGs (histogram of oriented gradients) to start with, and maybe combine them.
You could also analyze the objects' motion to detect irregularities, jumps, etc. Consider comparing next-frame positions to predicted positions with a Kalman filter.
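As a rough sketch of the histogram idea (the (x, y, w, h) box format and the similarity threshold are assumptions):
import cv2

def region_hist(frame, box):
    """HSV colour histogram of a bounding-box region given as (x, y, w, h)."""
    x, y, w, h = box
    roi = frame[y:y+h, x:x+w]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

# Compute once when tracking starts, e.g.:
# reference_hist = region_hist(first_frame, initial_box)

def still_same_object(reference_hist, frame, current_box, threshold=0.5):
    """Correlation close to 1 means the tracked region still resembles the original."""
    current_hist = region_hist(frame, current_box)
    similarity = cv2.compareHist(reference_hist, current_hist, cv2.HISTCMP_CORREL)
    return similarity > threshold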
I'm doing a project on multiple object tracking, particularly of pedestrians on streets. I've read about the tracking API in OpenCV, and there's a part where you have to specify an ROI (a bounding box) around each tracked object. However, I don't know how to make the machine first understand that it needs to detect pedestrians only, and then draw the ROI around each detected person. Also, the number of people on the street differs in each frame, so how do I automate the program to detect the people and then draw those boxes on them? Thanks.
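One common starting point is OpenCV's built-in HOG person detector: run it on each frame to get pedestrian boxes, and use each box as the ROI you pass to a tracker. A minimal sketch (the video path is a placeholder, and the tracker wiring is left as a comment because the tracker API differs between OpenCV versions):
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("street.mp4")   # placeholder video path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), padding=(8, 8), scale=1.05)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        # each (x, y, w, h) is the ROI you would pass to a tracker's init()
    cv2.imshow("pedestrians", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()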
Using Python, OpenCV, and live webcam input, I can't figure out how to set a point based on an x y coordinate and track where it moves.
Below is a simple example to track a yellow object.
https://github.com/abidrahmank/OpenCV-Python/blob/master/Other_Examples/track_yellow_draw_line.py
Here is the method to track the yellow color (a code sketch follows the steps):
1) Extract the first frame of the video.
2) Convert the frame into the HSV color space. Take the H plane and threshold it for the yellow color, so that you get a binary image with the yellow object as white (also called a blob) and the rest as black.
3) Now find the centre point of the blob. You can use moments or contours (especially if you have more than one blob; in the example above, very simple logic is used: just find the leftmost, rightmost, topmost and bottommost points on the blob and draw a rectangle around it). Store these values.
4) Extract the next frame and follow all the above steps to get the new position. Join the two positions and draw a line.
Over.
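A compact sketch of those steps (the HSV bounds for yellow are an assumption to tune, and cv2.inRange is used here instead of thresholding the H plane alone):
import cv2
import numpy as np

LOWER_YELLOW = np.array([20, 100, 100])   # approximate yellow range in HSV
UPPER_YELLOW = np.array([35, 255, 255])

cap = cv2.VideoCapture(0)
ok, canvas = cap.read()                   # first frame doubles as the drawing canvas
prev_center = None

while ok:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_YELLOW, UPPER_YELLOW)      # yellow -> white blob
    M = cv2.moments(mask)
    if M["m00"] > 0:                                         # blob found: centroid from moments
        center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
        if prev_center is not None:
            cv2.line(canvas, prev_center, center, (0, 0, 255), 2)
        prev_center = center
    cv2.imshow("trail", canvas)
    if cv2.waitKey(1) & 0xFF == 27:                          # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()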
There are a few blogs that explain the basics. Check out this one: Object tracking in OpenCV and Python 2.6.
Edit: I don't think you can track arbitrary points. To be able to make a correspondence between a point in two images, you need to know something unique about the point being tracked. This is often done with interest points, which are "unique enough" to be compared across images. Other methods are based on making the point easy to detect, for example using a projection scheme.
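As a sketch of the interest-point approach, OpenCV's goodFeaturesToTrack plus pyramidal Lucas-Kanade optical flow can follow such points from frame to frame (the webcam index and parameter values below are placeholders):
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

# Detect trackable corners; in practice keep only those near the (x, y) you care about.
p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=50, qualityLevel=0.3, minDistance=7)

lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

while True:
    ok, frame = cap.read()
    if not ok or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    p1, status, err = cv2.calcOpticalFlowPyrLK(old_gray, gray, p0, None, **lk_params)
    good_new = p1[status.flatten() == 1]                     # keep points that were found again
    for pt in good_new:
        x, y = pt.ravel()
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imshow("tracked points", frame)
    if cv2.waitKey(1) & 0xFF == 27:                          # Esc to quit
        break
    old_gray, p0 = gray, good_new.reshape(-1, 1, 2)          # roll forward to the next frame

cap.release()
cv2.destroyAllWindows()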