I'm working on a project where I have to detect colored cars in video frames taken from a bird's-eye view.
For detection I used histogram backprojection to obtain a binary image that is supposed to contain only the target region of interest.
The process worked fine until I tried to generalize the detection by testing it on a video that contains objects with a similar color distribution (like me crawling under the table with parts of my T-shirt visible).
As you can see, both the car and the irrelevant objects are moving, and the detection results are:
As you can see, irrelevant objects that share a similar color distribution show up in the binary images. However, thanks to Stack Overflow experts, I could improve the detection by telling the algorithm to choose the blob that represents the target object, using the following constraints:
1- Rectangularity check
2- Area and ratio check (a rough sketch of these checks follows)
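For reference, here is a minimal sketch of what such rectangularity and area/ratio checks can look like (the threshold values are illustrative placeholders, not the exact ones used in my project):

import cv2

def passes_shape_checks(contour, min_area=150, min_rect=0.7, max_aspect=3.0):
    """Rough blob filter: rectangularity = contour area / bounding-box area,
    plus simple area and aspect-ratio limits (thresholds are illustrative)."""
    area = cv2.contourArea(contour)
    if area < min_area:
        return False
    x, y, w, h = cv2.boundingRect(contour)
    rectangularity = area / float(w * h)   # 1.0 for a perfect axis-aligned rectangle
    aspect = max(w, h) / float(min(w, h))  # elongation of the bounding box
    return rectangularity >= min_rect and aspect <= max_aspect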
With the above constraints I could get rid of the large irrelevant objects that were detected. However, for small objects (see binary images) it doesn't work that well, since the rectangularity of the target object (the small red car) ranges between 0.72 and 1 and the small irrelevant objects also fall in this range. So I decided to add another constraint: calculating the distance between the centroids of the moving car every 5 successive frames and thresholding based on that distance, by doing the following:
import scipy.spatial.distance
from collections import deque

# Defining the centroid history
centroids = deque(maxlen=40)  # double-ended queue containing the detected centroids

# ... detection code omitted ...

centroids.appendleft(center)
# center comes from the detection process, e.g. centroids=[(120,130), (125,132), ...]

Distance = scipy.spatial.distance.euclidean(centroids[0], centroids[5])
if Distance <= 50:
    ...  # choose that blob
I tested that on different videos, and it turns out that the distance between the centroids ranges between 0 and 50 (0 when the car stops).
So my question is:
Is there a way I can exploit this property to enhance the detection so that it ignores the T-shirt? The issue is that when the car is no longer visible and the irrelevant object stays, the algorithm will start calculating the distance between centroids of the irrelevant object, and that distance will shrink until it is less than 50!
Thanks in Advance
Based on the information provided by the OP, here is an approach to solve this problem.
Sample Images
I have created a few sample images that roughly represent the objects moving across time. The center object represents the car that we are seeking, and the other objects have been detected incorrectly by the classifier.
The first four images represent a car (the centre object) moving from left to right, along with two other objects that have been detected incorrectly. In the fifth image, the car has moved out of the frame but the two incorrect detections are still present. The sixth frame consists of a new car entering the frame, along with other incorrect detections.
Solution - Code
The comments contain information regarding the algorithm. We are computing the centroid of each blob and comparing it with the centroid of the previously detected/extracted blob.
import os
import cv2
import numpy as np

# Reading files and sorting them in the right order
all_files = os.listdir(".")
all_images = [file_name for file_name in all_files if file_name.endswith(".png")]
all_images.sort(key=lambda k: k.split(".")[0][-1])
print(all_images)

# Initially, no centroid information is available.
previous_centroid_x = -1
previous_centroid_y = -1
DIST_THRESHOLD = 30

for i, image_name in enumerate(all_images):
    rgb_image = cv2.imread(image_name)
    height, width = rgb_image.shape[:2]
    gray_image = cv2.cvtColor(rgb_image, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(gray_image, 127, 255, 0)
    contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    blankImage = np.zeros_like(rgb_image)

    for cnt in contours:
        # Refer to https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contour_features/py_contour_features.html#moments
        M = cv2.moments(cnt)
        cX = int(M["m10"] / M["m00"])
        cY = int(M["m01"] / M["m00"])

        # Refer to https://www.pyimagesearch.com/2016/04/11/finding-extreme-points-in-contours-with-opencv/
        # https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_contours/py_contour_properties/py_contour_properties.html#contour-properties
        extLeft = tuple(cnt[cnt[:, :, 0].argmin()][0])
        extRight = tuple(cnt[cnt[:, :, 0].argmax()][0])
        extTop = tuple(cnt[cnt[:, :, 1].argmin()][0])
        extBot = tuple(cnt[cnt[:, :, 1].argmax()][0])

        color = (0, 0, 255)
        if i == 0:  # First frame - assuming that you can find the correct blob accurately in the first frame
            # Here, I am using the simple logic of checking if the blob is close to the centre of the image.
            if abs(cY - (height / 2)) < DIST_THRESHOLD:  # Check if the blob centre is close to half the image's height
                previous_centroid_x = cX  # Update variables for finding the next blob correctly
                previous_centroid_y = cY
                DIST_THRESHOLD = (extBot[1] - extTop[1]) / 2  # Update centre distance error with half the height of the blob
                color = (0, 255, 0)
        else:
            if abs(cY - previous_centroid_y) < DIST_THRESHOLD:  # Compare with the previous centroid y and see if it lies within the distance threshold
                previous_centroid_x = cX
                previous_centroid_y = cY
                color = (0, 255, 0)

        cv2.drawContours(blankImage, [cnt], 0, color, -1)
        cv2.circle(blankImage, (cX, cY), 3, (255, 0, 0), -1)

    cv2.imwrite("result_" + image_name, blankImage)
Updating the threshold enables the algorithm to track the object's centroid across frames. Since the object can move up and down a little, we want to match the centroids of blobs found in the current frame against the centroid of the car found in the previous frame.
Solution - Results
Green - Selected blob
Red - Rejected blob
Object centres have also been marked for reference.
Note - This is not a perfect solution. It has several limitations, but it can help you to design an approximate solution for your problem.
Related
I am processing binary images, and was previously using this code to find the largest area in the binary image:
# Use the hue value to convert to binary
thresh = 20
thresh, thresh_img = cv2.threshold(h, thresh, 255, cv2.THRESH_BINARY)
cv2.imshow('thresh', thresh_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
# Finding Contours
# Use a copy of the image since findContours alters the image
contours, _ = cv2.findContours(thresh_img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
#Extract the largest area
c = max(contours, key=cv2.contourArea)
This code isn't really doing what I need it to do; I now think it would be better to extract the most central area in the binary image.
Binary Image
Largest Image
This is currently what the code is extracting, but I am hoping to get the central circle in the first binary image extracted.
OpenCV comes with a point-polygon test function (for contours). It even gives a signed distance, if you ask for that.
I'll find the contour that is closest to the center of the picture. That may be a contour actually overlapping the center of the picture.
Timings, on my quadcore from 2012, give or take a millisecond:
findContours: ~1 millisecond
all pointPolygonTests and argmax: ~1 millisecond
import numpy as np
import cv2 as cv

mask = cv.imread("fkljm.png", cv.IMREAD_GRAYSCALE)
(height, width) = mask.shape
ret, mask = cv.threshold(mask, 128, 255, cv.THRESH_BINARY) # required because the sample picture isn't exactly clean
# get contours
contours, hierarchy = cv.findContours(mask, cv.RETR_LIST | cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
center = (np.array([width, height]) - 1) / 2
# find contour closest to center of picture
distances = [
    cv.pointPolygonTest(contour, center, True)  # looking for most positive (inside); negative is outside
    for contour in contours
]
iclosest = np.argmax(distances)
print("closest contour is", iclosest, "with distance", distances[iclosest])
# draw closest contour
canvas = cv.cvtColor(mask, cv.COLOR_GRAY2BGR)
cv.drawContours(image=canvas, contours=[contours[iclosest]], contourIdx=-1, color=(0, 255, 0), thickness=5)
closest contour is 45 with distance 65.19202405202648
A cv.floodFill() on the center point can also quickly yield a labeling of that blob... assuming the mask is positive there. Otherwise, a search is needed.
(cx, cy) = center.astype(int)
assert mask[cy, cx], "floodFill not applicable"

# trying cv.floodFill on the image center
mask2 = mask >> 1  # turns everything else gray
cv.floodFill(image=mask2, mask=None, seedPoint=(int(cx), int(cy)), newVal=255)
# use (mask2 == 255) to identify that blob
This also takes less than a millisecond.
Some practically faster approaches might involve a pyramid scheme (low-res versions of the mask) to quickly identify areas of the picture that are candidates for an exact test (distance/intersection).
Test target pixel. Hit (positive)? Done.
Calculate low-res mask. Per block, if any pixel is positive, block is positive.
Find positive blocks, sort by distance, examine closer all those that are within sqrt(2) * blocksize of the best distance.
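A rough sketch of that coarse-to-fine idea (my own illustration, not the answerer's code; the block size and names are arbitrary):

import numpy as np

def coarse_candidates(mask, center, block=32):
    """Shrink the mask so each element represents one block, keep blocks that
    contain any positive pixel, and return their centers sorted by distance
    from the given point. Only the nearest candidates need an exact test."""
    h, w = mask.shape
    pad_h, pad_w = -h % block, -w % block                     # pad to a multiple of the block size
    padded = np.pad(mask > 0, ((0, pad_h), (0, pad_w)))
    blocks = padded.reshape(padded.shape[0] // block, block,
                            padded.shape[1] // block, block).any(axis=(1, 3))
    ys, xs = np.nonzero(blocks)                               # coordinates of positive blocks
    centers = np.stack([xs, ys], axis=1) * block + block / 2  # block centers in pixel coords
    dists = np.linalg.norm(centers - np.asarray(center), axis=1)
    order = np.argsort(dists)
    return centers[order], dists[order]

Blocks whose distance is within sqrt(2) * block of the best one are the candidates worth testing exactly, e.g. with pointPolygonTest as above.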
There are several ways you can define "most central." I chose to define it as the region with the smallest distance to the point you're searching for. If the point is inside the region, that distance will be zero.
I also chose to do this with a pixel-based approach rather than a polygon-based approach, like you're doing with findContours().
Here's a step-by-step breakdown of what this code is doing.
Load the image, put it into grayscale, and threshold it. You're already doing these things.
Identify connected components of the image. Connected components are places where there are white pixels which are directly connected to other white pixels. This breaks up the image into regions.
Using np.argwhere(), convert a true/false mask into an array of coordinates.
For each coordinate, compute the Euclidean distance between that point and search_point.
Find the minimum within each region.
Across all regions, find the smallest distance.
import cv2
import numpy as np

img = cv2.imread('test197_img.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh_img = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

n_groups, comp_grouped = cv2.connectedComponents(thresh_img)

components = []
search_point = [600, 150]

for i in range(1, n_groups):
    mask = (comp_grouped == i)
    component_coords = np.argwhere(mask)[:, ::-1]
    min_distance = np.sqrt(((component_coords - search_point) ** 2).sum(axis=1)).min()
    components.append({
        'mask': mask,
        'min_distance': min_distance,
    })

closest = min(components, key=lambda x: x['min_distance'])['mask']
Output:
As the title states, I'm trying to crop the largest circle out of an image. I'm using OpenCV in Python. To be exact, it's a shooting target, which always has the same format, but the picture of it can be taken with any mobile device and in different lighting conditions (I will include some examples below).
I'm completely new to image recognition, so I have been trying out many different ways of doing this, but couldn't figure out a universal solution, that would work on all of my target images.
Why I'm trying to do this:
My assignment is to calculate score of one or multiple shots on the given target image. I have tried color segmentation to find the shots, but since the shots can be on different backgrounds, this wouldn't work properly. So now I'm trying to see the difference between the empty shooting target image and the already shot on target image. Also, I need to be able to tell, which target it was shot on (there are two target types). So I'm trying to crop out only the target from image to get rid of the background interference and then continue with the shot identifications.
What I have tried so far:
1) Finding the largest circle with HoughCircles. My next step would be to somehow remove the outer part of that found circle. I have played with the configuration of the HoughCircles method for quite some time, but there was always one example image where it either wasn't highlighting the outermost circle correctly or wasn't highlighting any of the circles at all :/.
My final configuration looked something like this:
img = cv2.GaussianBlur(img, (3, 3), 0)
circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 2, 10000, param1=50, param2=100, minRadius=200, maxRadius=0)
It seemed like using HoughCircles wouldn't be the right way to do this, so I moved on to another possible solution I found on the internet.
2) Finding all the contours by filtering the 'black' color range in which the circles seem to lie, and then finding the largest one. The problem with this solution seemed to be that sometimes the pictures had a shadow that broke the outer circle, which made it impossible to crop by it.
My code looked like this:
# black color boundaries [B, G, R]
lower = [0, 0, 0]
upper = [150, 150, 150]

# create NumPy arrays from the boundaries
lower = np.array(lower, dtype="uint8")
upper = np.array(upper, dtype="uint8")

# find the colors within the specified boundaries and apply the mask
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask=mask)

ret, thresh = cv2.threshold(mask, 40, 255, 0)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

if len(contours) != 0:
    # draw in blue the contours that were found
    cv2.drawContours(output, contours, -1, 255, 3)

    # find the biggest contour (c) by the area
    c = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(c)
After that, I would try to draw a circle from the largest found contour (c) and crop by it. But I have already seen that the drawn circles weren't complete (probably due to some shadow in the picture), and therefore this wouldn't work anyway.
After those failures, I have tried so many solutions from other questions on here, but none would work for my problem.
Example images:
Target example 1
Target example 2
Target to calc score 1
Target to calc score 2
To be completely honest with you, I'm really lost on how to go about this. I would appreciate any help, advice, anything.
There are two different types of target in your samples. You may want to process them separately or ask the user what kind of target it is. Basically, you want to know how large the black part of the target is: does it cover rings 7-10 or 4-10?
Binarize your image and build histograms along X and Y; from these projections you'll find the extent of the black part of your target as (x_left, x_right, y_top, y_bottom). Once you know that, you can calculate the center as ((top + bottom) / 2, (left + right) / 2). After that you can easily calculate the score for every pixel of the image, since you know the center, the black-spot size and the number of different score areas within it.
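A minimal sketch of that projection idea in Python/OpenCV (my own illustration of this answer; the file name and the 5% mass cutoff are placeholders):

import cv2
import numpy as np

gray = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Project the binarized image onto the X and Y axes.
col_sums = binary.sum(axis=0)   # histogram along X
row_sums = binary.sum(axis=1)   # histogram along Y

# The black part of the target is where the projections carry significant mass.
min_mass = 0.05 * col_sums.max()
xs = np.where(col_sums > min_mass)[0]
ys = np.where(row_sums > min_mass)[0]
x_left, x_right = xs[0], xs[-1]
y_top, y_bottom = ys[0], ys[-1]

center = ((x_left + x_right) / 2, (y_top + y_bottom) / 2)
black_radius = (x_right - x_left) / 2
# Knowing the center, the black-spot size and the number of rings, the score of a
# shot is just a function of its distance from the center.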
I have a dataset of x-ray images that I am trying to clean by rotating the images so the arm is vertical and cropping away any excess space. Here are some examples from the dataset:
I am currently working out the best way to determine the angle of the x-ray and rotate the image based on that.
My current approach is to detect the line of the side of the rectangle that the scan is in using the Hough transform, and rotate the image based on that.
I tried to run the Hough transform on the output of a Canny edge detector, but this doesn't work so well for images where the edge of the rectangle is blurred, like in the first image.
I can't use cv's box detection, as sometimes the rectangle around the scan has an edge off screen.
So I currently use adaptive thresholding to find the edge of the box, then median filter it and try to find the longest line, but sometimes the wrong line is the longest and the image gets rotated completely wrong.
Adaptive thresholding is used because some scans have different brightnesses.
The current implementation I have is:
import cv2
import numpy as np

def get_lines(img):
    # threshold
    thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 15, 4.75)
    median = cv2.medianBlur(thresh, 3)
    # detect lines
    lines = cv2.HoughLines(median, 1, np.pi / 180, 175)
    return sorted(lines, key=lambda x: x[0][0], reverse=True)

def rotate(image, angle):
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    return cv2.warpAffine(image, M, (nW, nH))

def fix_rotation(input):
    lines = get_lines(input)
    rho, theta = lines[0][0]
    return rotate(input, theta * 180 / np.pi)  # the helper defined above is rotate()
and produces the following results:
When it goes wrong:
I was wondering if there are any better techniques to use in order to improve the performance of this, and what the best way would be to crop the images after they have been rotated?
The idea is to use the blob of the arm itself and fit an ellipse around it. Then, extract its major axis. I quickly tested the idea in Matlab – not OpenCV. Here's what I did, you should be able to use OpenCV's equivalent functions to achieve similar outputs.
First, compute the threshold value of your input via Otsu. Then add some bias to the threshold value to find a better segmentation and use this value to threshold the image.
In pseudo-code:
// the bias value
threshBias = 0.4;

// get the binary threshold via Otsu:
thresholdLevel = graythresh( grayInput, "otsu" );

// subtract the bias from the original value:
thresholdLevel = thresholdLevel - threshBias * thresholdLevel;

// get the fixed binary image:
binaryImage = imbinarize( grayInput, thresholdLevel );
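A rough OpenCV/Python equivalent of that biased Otsu step (my own translation of the pseudo-code, not tested on the poster's scans; the file name is a placeholder):

import cv2

gray = cv2.imread("xray.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
thresh_bias = 0.4

# Let Otsu pick a threshold, then lower it by the bias factor before re-thresholding.
otsu_level, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
biased_level = otsu_level - thresh_bias * otsu_level
_, binary = cv2.threshold(gray, biased_level, 255, cv2.THRESH_BINARY)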
After small blob filtering, this is the output:
Now, get the contours/blobs and fit an ellipse for each contour. Check out the OpenCV example here: https://docs.opencv.org/3.4.9/de/d62/tutorial_bounding_rotated_ellipses.html
You end up with two ellipses:
We are looking for the biggest ellipse, the one with the biggest area and the biggest major and minor axis. I used the width and height of each ellipse to filter the results. The target ellipse is then colored in green. Finally, I get the major axis of the target ellipse, here colored in yellow:
Now, to implement these ideas in OpenCV you have these options:
Use fitEllipse to find the ellipses. The return value of this function is a RotatedRect object describing each ellipse (its center, axis lengths and rotation angle).
Instead of fitting an ellipse, you could try minAreaRect, which finds the rotated rectangle of minimum area enclosing a blob.
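For the fitEllipse route, a short sketch (my own illustration of this answer; the mask file name is a placeholder, assumed to be the binarized arm image from the previous step):

import cv2

mask = cv2.imread("arm_mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical binarized arm image
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

best = None
for cnt in contours:
    if len(cnt) < 5:                  # fitEllipse needs at least 5 points
        continue
    ellipse = cv2.fitEllipse(cnt)     # ((cx, cy), (width, height), angle)
    (w, h) = ellipse[1]
    if best is None or w * h > best[1][0] * best[1][1]:
        best = ellipse                # keep the ellipse with the largest axes, i.e. the arm

if best is not None:
    angle = best[2]                   # orientation of the ellipse, in degrees
    # Rotating the image by this angle should bring the arm close to vertical.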
You can use image moments to calculate the rotation angle.
Using OpenCV's moments function, calculate the second-order central moments to construct a covariance matrix, and then obtain the orientation as shown in the Image moment Wikipedia page.
Obtain the normalized central moments nu20, nu11 and nu02 from OpenCV's moments. Then the orientation is calculated as
0.5 * arctan(2 * nu11 / (nu20 - nu02))
Please refer to the given link for details.
You can use either the raw image or the preprocessed one for the orientation calculation. See which one gives you better accuracy and use that.
As for the bounding box, once you rotate the image (assuming you used the preprocessed one), get all the non-zero pixel coordinates of the rotated image and calculate their upright bounding box using OpenCV's boundingRect.
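A minimal sketch of the moments-based orientation plus the boundingRect crop (my own illustration of this answer, using a hypothetical preprocessed binary image; the sign/offset of the rotation may need adjusting for your data):

import cv2
import numpy as np

mask = cv2.imread("arm_mask.png", cv2.IMREAD_GRAYSCALE)   # hypothetical preprocessed image

# Orientation from normalized central moments: 0.5 * arctan(2*nu11 / (nu20 - nu02))
m = cv2.moments(mask, binaryImage=True)
theta = 0.5 * np.arctan2(2 * m["nu11"], m["nu20"] - m["nu02"])   # radians
angle_deg = np.degrees(theta)

# Rotate so the major axis becomes vertical.
(h, w) = mask.shape
M = cv2.getRotationMatrix2D((w // 2, h // 2), angle_deg + 90, 1.0)
rotated = cv2.warpAffine(mask, M, (w, h))

# Upright bounding box of all non-zero pixels of the rotated image, for cropping.
ys, xs = np.nonzero(rotated)
x, y, bw, bh = cv2.boundingRect(np.column_stack([xs, ys]).astype(np.int32))
cropped = rotated[y:y + bh, x:x + bw]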
I need to detect the vein junctions of bee wings (the image is just one example). I am using OpenCV with Python.
PS: maybe the image lost a little bit of quality, but the lines are all connected and one pixel wide.
This is an interesting question. The result I got is not perfect, but it might be a good start. I filtered the image with a kernel that only looks at the edges of the kernel. The idea is that a junction has at least 3 lines crossing the kernel edge, whereas regular lines only have 2. This means that when the kernel is over a junction, the resulting value will be higher, so a threshold will reveal them.
Due to the nature of the lines there are some false positives and some false negatives. A single junction will most likely be found several times, so you'll have to account for that. You can make the detections unique by drawing small dots and detecting those dots.
Result:
Code:
import cv2
import numpy as np
# load the image as grayscale
img = cv2.imread('xqXid.png',0)
# make a copy to display result
im_or = img.copy()
# convert the image to a larger datatype so the filtered values don't overflow
img = img.astype(np.float32)

# create a kernel that only looks at its outer ring
kernel = np.ones((7,7))
kernel[2:5,2:5] = 0
print(kernel)

# apply the kernel
res = cv2.filter2D(img, -1, kernel)
# filter results
loc = np.where(res > 2800)
print(len(loc[0]))
#draw circles on found locations
for x in range(len(loc[0])):
cv2.circle(im_or,(loc[1][x],loc[0][x]),10,(127),5)
#display result
cv2.imshow('Result',im_or)
cv2.waitKey(0)
cv2.destroyAllWindows()
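To collapse the repeated hits around a single junction into one point, here is a hedged sketch of the "draw small dots and detect those dots" idea mentioned above (my own addition, not part of the original answer):

import cv2
import numpy as np

def unique_junctions(locations, shape, radius=10):
    # Draw a filled dot at every raw detection, then take the centroid of each
    # connected blob of dots as a single junction.
    dots = np.zeros(shape, dtype=np.uint8)
    for y, x in zip(locations[0], locations[1]):
        cv2.circle(dots, (int(x), int(y)), radius, 255, -1)
    n_labels, _, _, centroids = cv2.connectedComponentsWithStats(dots)
    return centroids[1:]   # skip label 0, the background

# e.g. junctions = unique_junctions(loc, img.shape[:2]) with loc and img from the code above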
Note: you can try to tweak the kernel and the threshold. For example, with the code above I got 126 matches. But when I use
kernel = np.ones((5,5))
kernel[1:4,1:4] = 0
with threshold
loc = np.where(res > 1550)
I got 33 matches in these locations:
You can use the Harris corner detector algorithm to detect the vein junctions in the image above. Compared to earlier techniques, the Harris corner detector takes the differential of the corner score into account with reference to direction directly, instead of using shifting patches for every 45-degree angle, and it has been proved to be more accurate in distinguishing between edges and corners (source: Wikipedia).
code:
import cv2
import numpy as np

img = cv2.imread('wings-bee.png')
# convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
'''
args:
img - Input image, it should be grayscale and float32 type.
blockSize - It is the size of neighbourhood considered for corner detection
ksize - Aperture parameter of Sobel derivative used.
k - Harris detector free parameter in the equation.
'''
dst = cv2.cornerHarris(gray, 9, 5, 0.04)
# result is dilated for marking the corners
dst = cv2.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img_thresh = cv2.threshold(dst, 0.32*dst.max(), 255, 0)[1]
img_thresh = np.uint8(img_thresh)
# get the matrix with the x and y locations of each centroid
centroids = cv2.connectedComponentsWithStats(img_thresh)[3]
stop_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# refine corner coordinates to subpixel accuracy
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5,5), (-1,-1), stop_criteria)
for i in range(1, len(corners)):
    #print(corners[i])
    cv2.circle(img, (int(corners[i,0]), int(corners[i,1])), 5, (0,255,0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
output:
You can check the theory behind Harris Corner detector algorithm from here.
I have found a plethora of questions about finding "things" in images using OpenCV et al. in Python, but so far I have been unable to piece them together into a reliable solution for my problem.
I am attempting to use computer vision to help count tiny surface mount electronics parts. The idea is for me to dump parts onto a solid color piece of paper, snap a picture, and have the software tell me how many items are in it.
The "things" differ from one picture to the next but will always be identical in any one image. I seem to be able to manually tune the parameters for things like hue/saturation for a particular part but it tends to require tweaking every time I change to a new part.
My current, semi-functioning code is posted below:
import imutils
import numpy
import cv2
import sys

def part_area(contours, round=10):
    """Finds the mode of the contour area. The idea is that most of the parts in an image will be separated and that
    finding the most common area in the list of areas should provide a reasonable value to approximate by. The areas
    are rounded to the nearest multiple of the round argument to reduce the list of options."""
    # Start with a list of all of the areas for the provided contours.
    areas = [cv2.contourArea(contour) for contour in contours]
    # Determine a threshold for the minimum amount of area as 1% of the overall range.
    threshold = (max(areas) - min(areas)) / 100
    # Trim the list of areas down to only those that exceed the threshold.
    thresholded = [area for area in areas if area > threshold]
    # Round the areas to the nearest value set by the round argument.
    rounded = [int((area + (round / 2)) / round) * round for area in thresholded]
    # Remove any areas that rounded down to zero.
    cleaned = [area for area in rounded if area != 0]
    # Count the areas with the same values.
    counts = {}
    for area in cleaned:
        if area not in counts:
            counts[area] = 0
        counts[area] += 1
    # Reduce the areas down to only those that are in groups of three or more with the same area.
    above = []
    for area, count in counts.items():
        if count > 2:
            for _ in range(count):
                above.append(area)
    # Take the mean of the areas as the average part size.
    average = sum(above) / len(above)
    return average


def find_hue_mode(hsv):
    """Given an HSV image as an input, compute the mode of the list of hue values to find the most common hue in the
    image. This is used to determine the center for the background color filter."""
    pixels = {}
    for row in hsv:
        for pixel in row:
            hue = pixel[0]
            if hue not in pixels:
                pixels[hue] = 0
            pixels[hue] += 1
    counts = sorted(pixels.keys(), key=lambda key: pixels[key], reverse=True)
    return counts[0]


if __name__ == "__main__":
    # load the image and resize it to a smaller factor so that the shapes can be approximated better
    image = cv2.imread(sys.argv[1])

    # define range of blue color in HSV
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    center = find_hue_mode(hsv)
    print('Center Hue:', center)

    lower = numpy.array([center - 10, 50, 50])
    upper = numpy.array([center + 10, 255, 255])

    # Threshold the HSV image to get only blue colors
    mask = cv2.inRange(hsv, lower, upper)
    inverted = cv2.bitwise_not(mask)

    blurred = cv2.GaussianBlur(inverted, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 100)
    dilated = cv2.dilate(edged, None, iterations=1)
    eroded = cv2.erode(dilated, None, iterations=1)

    # find contours in the thresholded image and initialize the shape detector
    contours = cv2.findContours(eroded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if imutils.is_cv2() else contours[1]

    # Compute the area for a single part to use when setting the threshold and calculating the number of parts within
    # a contour area.
    part_area = part_area(contours)
    # The threshold for a part's area - can't be too much smaller than the part itself.
    threshold = part_area * 0.5

    part_count = 0
    for contour in contours:
        if cv2.contourArea(contour) < threshold:
            continue
        # Sometimes parts are close enough together that they become one in the image. To battle this, the total area
        # of the contour is divided by the area of a part (derived earlier).
        part_count += int((cv2.contourArea(contour) / part_area) + 0.1)  # this 0.1 "rounds up" slightly and was determined empirically

        # Draw an approximate contour around each detected part to give the user an idea of what the tool has computed.
        epsilon = 0.1 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)

    # Print the part count and show off the processed image.
    print('Part Count:', part_count)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Here's an example of the type of input image I am using:
or this:
And I'm currently getting results like this:
The results clearly show that the script is having trouble identifying some parts, and its true Achilles' heel seems to be when parts touch one another.
So my question/challenge is, what can I do to improve the reliability of this script?
The script is to be integrated into an existing Python tool so I am searching for a solution using Python. The solution does not need to be pure Python as I am willing to install whatever 3rd party libraries might be needed.
If the objects are all of similar types, you might have more success isolating a single example in the image and then using feature matching to detect them.
A full solution would be out of scope for Stack Overflow, but my suggestion for progress would be to first somehow find one or more "correct" examples using your current rectangle retrieval method. You could probably look for all your samples that are of the expected size, or that are accurate rectangles.
Once you have isolated a few positive examples, use some feature matching techniques to find the others. There is a lot of reading up you probably need to do on it but that is a potential solution.
A general summary is that you use your positive examples to find "features" of the object you want to detect. These "features" are generally things like corners or changes in gradient. OpenCV contains many methods you can use.
Once you have the features, there are several algorithms in OpenCV you can look at that will search the image for all matching features. You’ll want one that is rotation invariant (can detect the same features arranged in different rotation), but you probably don’t need scale invariance (can detect the same features at multiple scales).
My one concern with this method is that the items you are searching for in your images are quite small. It might be difficult to find good, consistent features to match on.
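As a starting point, here is a hedged sketch of what such feature matching could look like using ORB (rotation invariant) with a brute-force matcher; the file names, template crop and ratio threshold are assumptions, not a tested solution for these parts:

import cv2

scene = cv2.imread("parts_photo.png", cv2.IMREAD_GRAYSCALE)   # hypothetical full image
template = cv2.imread("one_part.png", cv2.IMREAD_GRAYSCALE)   # hypothetical crop of one isolated part

# ORB features are rotation invariant, which suits parts lying at arbitrary angles.
orb = cv2.ORB_create(nfeatures=1000)
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Brute-force Hamming matcher with a ratio test to drop ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_t, des_s, k=2)
good = [pair[0] for pair in matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

# The matched keypoint locations in the scene hint at where copies of the part are;
# clustering those points would be the next step towards a count.
points = [kp_s[m.trainIdx].pt for m in good]
print(len(points), "candidate feature matches")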
You're tackling a 2D object-recognition problem, for which there are many possible approaches. You've gone about it using background/foreground segmentation, which is OK since you have control over the scene (laying down the background paper sheet). However, this will always have fundamental limitations when the objects touch. A simple solution to your problem can be this:
1) You assume that touching objects are rare events (which is a fine assumption in your problem). Therefore you can compute the area of each segmented region and take the median of these, which gives a robust estimate of the object's area. Let's call this robust estimate A (in squared pixels). This will be fine as long as fewer than 50% of the regions correspond to touching objects.
2) You then measure the number of objects in each segmented region. Let Ai be the area of the ith region; the number of objects in that region is Ni = round(Ai / A). Summing the Ni gives the total number of objects.
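A minimal sketch of this counting rule (my own illustration of steps 1 and 2, assuming contours from a segmentation like the one in the question; the noise cutoff is illustrative):

import cv2
import numpy as np

def count_parts(contours, min_area=50):
    # Median region area serves as a robust estimate A of a single part's area
    # (valid while fewer than 50% of regions are touching clumps).
    areas = np.array([cv2.contourArea(c) for c in contours])
    areas = areas[areas > min_area]            # drop noise specks
    A = np.median(areas)
    # Each region contributes Ni = round(Ai / A) parts.
    counts = np.rint(areas / A).astype(int)
    return int(counts.sum())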
This approach will be fine as long as the following conditions are met:
A) The touching objects do not significantly overlap
B) You do not have objects lying on their sides. If you do you might be able to deal with this using two area estimates (side and flat). Better to eliminate this scenario if you can for simplicity.
C) The objects are all roughly the same distance to the camera. If this is not the case then the areas of the objects (in pixels) cannot be modelled well by a single value.
D) There are not partially visible objects at the borders of the image.
E) You ensure that only the same type of object is visible in each image.