I'm working with license plates. What I do is apply a series of filters to them, such as:
Grayscale
Blur
Threshold
Binary
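A typical version of that filter chain (the kernel size and the use of Otsu thresholding here are illustrative assumptions, not necessarily the exact values used) is:
import cv2

image = cv2.imread("plate.png")                 # hypothetical input path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # grayscale
blur = cv2.GaussianBlur(gray, (5, 5), 0)        # blur
_, binary = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # threshold -> binary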
The problem is that when I do this, some contours remain at the borders, like in this image. How can I clear them, or at least make them just black (masked)? I used this code, but sometimes it fails.
# invert image and detect contours
inverted = cv2.bitwise_not(image_binary_and_dilated)
contours, hierarchy = cv2.findContours(inverted,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
# get the biggest contour
biggest_index = -1
biggest_area = -1
i = 0
for c in contours:
    area = cv2.contourArea(c)
    if area > biggest_area:
        biggest_area = area
        biggest_index = i
    i = i + 1
print("biggest area: " + str(biggest_area) + " index: " + str(biggest_index))
cv2.drawContours(image_binary_and_dilated, contours, biggest_index, [0,0,255])
center, size, angle = cv2.minAreaRect(contours[biggest_index])
rot_mat = cv2.getRotationMatrix2D(center, angle, 1.)
#cv2.warpPerspective()
print(size)
dst = cv2.warpAffine(inverted, rot_mat, (int(size[0]), int(size[1])))
mask = dst * 0
x1 = max([int(center[0] - size[0] / 2)+1, 0])
y1 = max([int(center[1] - size[1] / 2)+1, 0])
x2 = int(center[0] + size[0] / 2)-1
y2 = int(center[1] + size[1] / 2)-1
point1 = (x1, y1)
point2 = (x2, y2)
print(point1)
print(point2)
cv2.rectangle(dst, point1, point2, [0,0,0])
cv2.rectangle(mask, point1, point2, [255,255,255], cv2.FILLED)
masked = cv2.bitwise_and(dst, mask)
#cv2_imshow(imgg)
cv2_imshow(dst)
cv2_imshow(masked)
#cv2_imshow(mask)
Some results:
The original plates were:
Good result 1
Good result 2
Good result 3
Good result 4
Bad result 1
Bad result 2
Binary plates are:
Image 1
Image 2
Image 3
Image 4
Image 5 - Bad result 1
Image 6 - Bad result 2
How can I fix this code? I only want to avoid those bad results, or improve them.
INTRODUCTION
What you are asking is starting to become complicated, and I believe there is no longer a right or wrong answer, just different ways to do this. Almost all of them will yield positive and negative results, most likely in different ratios. Having 100% positive results is quite a challenging task, and I do believe my answer does not reach it. Yet it can be the basis for more sophisticated work towards that goal.
MY PROPOSAL
So, I want to make a different proposal here.
I am not 100% sure why you are doing all the steps, and I believe some of them could be unnecessary.
Let's start from the problem: you want to remove the white parts on the borders (which are not numbers).
So, we need an idea about how to distinguish them from the letters, in order to correctly tackle them.
If we just try to contour and warp, it is likely to work on some images and not on others, because not all of them look the same. This is the hardest part of finding a general solution that works for many images.
What is the difference between the characteristics of the numbers and the characteristics of the borders (and other small points)?
After thinking about it, I would say: the shapes! Meaning that if you imagine a bounding box around a letter/number, it looks like a rectangle whose size is related to the image size, while the borders are usually very large and narrow, or too small to be considered a letter/number (random points).
Therefore, my approach would be segmentation, separating the features by their shape. We take the binary image, we remove some parts using the projections on the axes (as you asked in the previous question, and which I believe we should use), and we get an image where each letter is separated from the white borders.
Then we can segment and check the shape of each segmented object, and if we think these are letters, we keep them, otherwise we discard them.
THE CODE
I wrote the code below as an example on your data. Some of the parameters are tuned for this set of images, so they may have to be relaxed for a larger dataset.
import cv2
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import scipy.ndimage as ndimage
# do this for all the images
num_images = 6
plt.figure(figsize=(16,16))
for k in range(num_images):
    # read the image
    binary_image = cv2.imread("binary_image/img{}.png".format(k), cv2.IMREAD_GRAYSCALE)
    # just for visualization purposes, I create another image with the same shape, to show what I am doing
    new_intermediate_image = np.zeros((binary_image.shape), np.uint8)
    new_intermediate_image += binary_image
    # here we will copy only the cleaned parts
    new_cleaned_image = np.zeros((binary_image.shape), np.uint8)

    ### THIS CODE COMES FROM THE PREVIOUS ANSWER:
    # https://stackoverflow.com/questions/62127537/how-to-clean-binary-image-using-horizontal-projection?noredirect=1&lq=1
    (rows, cols) = binary_image.shape
    h_projection = np.array([x / rows for x in binary_image.sum(axis=0)])
    threshold_h = (np.max(h_projection) - np.min(h_projection)) / 10
    print("we will use threshold {} for horizontal".format(threshold_h))
    # select the black areas
    black_areas_horizontal = np.where(h_projection < threshold_h)
    for j in black_areas_horizontal:
        new_intermediate_image[:, j] = 0

    v_projection = np.array([x / cols for x in binary_image.sum(axis=1)])
    threshold_v = (np.max(v_projection) - np.min(v_projection)) / 10
    print("we will use threshold {} for vertical".format(threshold_v))
    black_areas_vertical = np.where(v_projection < threshold_v)
    for j in black_areas_vertical:
        new_intermediate_image[j, :] = 0
    ### UNTIL HERE

    # define the features we are looking for
    # these parameters can also be tuned
    min_width = binary_image.shape[1] / 14
    max_width = binary_image.shape[1] / 2
    min_height = binary_image.shape[0] / 5
    max_height = binary_image.shape[0]
    print("we look for features with width in [{},{}] and height in [{},{}]".format(
        min_width, max_width, min_height, max_height))

    # segment the image
    labeled_array, num_features = ndimage.label(new_intermediate_image)

    # loop over all features found (labels start at 1; 0 is the background)
    for i in range(1, num_features + 1):
        # get a bounding box around them
        slice_x, slice_y = ndimage.find_objects(labeled_array == i)[0]
        roi = labeled_array[slice_x, slice_y]
        # check the shape; if the bounding box is what we expect, copy it to the new image
        if roi.shape[0] > min_height and \
           roi.shape[0] < max_height and \
           roi.shape[1] > min_width and \
           roi.shape[1] < max_width:
            new_cleaned_image += (labeled_array == i)

    # print all images on a grid
    plt.subplot(num_images, 3, 1 + (k * 3))
    plt.imshow(binary_image)
    plt.subplot(num_images, 3, 2 + (k * 3))
    plt.imshow(new_intermediate_image)
    plt.subplot(num_images, 3, 3 + (k * 3))
    plt.imshow(new_cleaned_image)
which produces the output below (in the grid, the left column shows the input images, the central column the images after the mask based on the histogram projections, and the right column the cleaned images):
CONCLUSIONS:
As said above, this method does not yield 100% positive results. The last picture has lower quality and some parts are unconnected, and they are lost in the process. I personally believe this is a price worth paying to get a cleaner image; if you have a lot of images, it won't be a problem, and you can remove those kinds of images. Overall, I think this method returns quite clear images, where all other parts that are not letters or numbers are correctly removed.
ADVANTAGES
the image is clean: nothing other than letters and numbers is kept
the parameters can be tuned, and should be consistent across images
in case of problems, adding some prints or debugging to the loop that chooses which features to keep should make it easier to understand where the problems are and correct them
LIMITATIONS
it may fail in some cases where letters and numbers touch the white borders, which seems quite possible. This is handled by the black_areas created using the projections, but I am not so confident it will work 100% of the time.
some small parts of the numbers can be lost during the process, as in the last picture.
Related
I have this set of images:
The leftmost one is the reference image.
I want to have a value telling me how close any of the other images is to the leftmost one.
I experimented with matchShapes(), calling it for each contour and averaging the values, but I didn't get useful results (the rightmost one had too high a value, for example).
I would also want the matching to work only in the correct orientation.
If they're purely black-and-white images, it would probably be easier to just AND the two pictures together and sum up the total pixels left in the result.
Something like this:
import cv2
import numpy as np
# use uint8 so the arrays behave as 8-bit binary images
x = np.zeros((100, 100), np.uint8)
y = np.zeros((100, 100), np.uint8)

# draw two crossing diagonal lines
for i in range(25, 75):
    x[i][i] = 255
    y[i][100 - i] = 255

cv2.imshow('x', x)
cv2.imshow('y', y)

# keep only the pixels that are white in both images
z = cv2.bitwise_and(x, y)

# count the remaining white pixels
sum = 0
for i in range(0, z.shape[0]):
    for j in range(0, z.shape[1]):
        if z[i][j] == 255:
            sum += 1

print(f"Similarity Score: {sum}")
cv2.imshow('z', z)
cv2.waitKey(0)
There probably exists some better library to perform this all in one line but if performance isn't much of a concern perhaps this could work.
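For example, the counting can be done without the explicit loops (assuming the same 0/255 binary images x and y as above):
# same overlap count as the nested loops above, in one NumPy call
score = int(np.count_nonzero(cv2.bitwise_and(x, y)))
print(f"Similarity Score: {score}")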
The difficult part was rejecting images that were too different. With the methods proposed here, I always got very close values for images that I thought were too different to correspond.
In the end, I did a multistep process:
First I got the contour of the test image like so :
testContours, _ = cv.findContours(testImage, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
Then, if the contour counts of the test image and the original image are not the same, I abort.
If they have the same contour count, I then calculate the average of all shape distances between the contours:
distances = []
sd = cv2.createShapeContextDistanceExtractor()
for i in range(len(testContours)):
    d2 = sd.computeDistance(testContours[i], originalContours[i])
    distances.append(d2)
value = sum(distances) / len(distances)
Then, I count the number of white pixels after AND-ing the two images, divided by the number of white pixels in the source image (in case the contours match but are not placed correctly):
exactly_placed_ratio = cv.countNonZero(cv.bitwise_and(testImage, originalImage)) / cv.countNonZero(originalImage)
In the end I have two values: I can use the first one to check if the shapes are close enough, and the second one to check if they are in the right position relative to the whole image.
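Putting those three checks together as a single hedged sketch (the function name and threshold values here are illustrative placeholders, not the ones actually used):
import cv2 as cv

def images_match(test_image, original_image, shape_thresh=10.0, placement_thresh=0.8):
    test_contours, _ = cv.findContours(test_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    orig_contours, _ = cv.findContours(original_image, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    # step 1: abort if the contour counts differ
    if len(test_contours) != len(orig_contours):
        return False
    # step 2: average shape-context distance between paired contours
    sd = cv.createShapeContextDistanceExtractor()
    distances = [sd.computeDistance(t, o) for t, o in zip(test_contours, orig_contours)]
    if sum(distances) / len(distances) > shape_thresh:
        return False
    # step 3: fraction of the original's white pixels exactly covered by the test image
    ratio = cv.countNonZero(cv.bitwise_and(test_image, original_image)) / cv.countNonZero(original_image)
    return ratio >= placement_thresh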
I have a bundle of different images with birds, but some of them contain feathers, barely visible birds, etc. I need to find images where the bird is clearly visible and remove images with feathers, far-distant birds, etc.
I already tried ORB, simple template matching, Canny edge detection. And I cannot use neural nets.
Now I am trying the following algorithm:
Binarize template image to get shapes
Slide a window over another binarized image and calculate matchShapes against the template in every window
Find best match
As you can see, this method gives me strange results.
Binary template
Shape on the other binary image, for example:
I calculated matchShapes in different parts of this image, and the best result (~0.05) was in this part:
which is obviously not similar to the original shape.
Code for sliding window:
import cv2
OFFSET = 5
SCALE_RATIO = [0.5, 1]
def get_scaled_list(img_path, template):
    matcher_list = []
    img = cv2.imread(img_path)
    # JUST BINARIZATION AND RESIZING
    img = preprocess(resize_image(img))
    height, width = img.shape
    # building size of scale window
    for scaler in SCALE_RATIO:
        x_point = 0
        y_point = 0
        x1_point = int(width * scaler)
        y1_point = x1_point
        if x1_point > height:
            y1_point = height
        while y1_point <= height:
            while x1_point <= width:
                img1 = img[y_point:y1_point, x_point:x1_point]
                # Comparing template and part of image
                diff = cv2.matchShapes(template, img1, cv2.CONTOURS_MATCH_I1, 0)
                data_tuple = (img_path, x_point, y_point, int(width * scaler), diff)
                matcher_list.append(data_tuple)
                x_point += OFFSET
                x1_point += OFFSET
            x_point = 0
            x1_point = int(width * scaler)
            y_point += OFFSET
            y1_point += OFFSET
    return matcher_list
How can I perform correct shape matching, and why does the best result appear here?
The naive window-sliding method with a rigid template will work very poorly. In particular, the sizes are different, making correct overlap impossible.
What you are trying to achieve is difficult because you only have edge information and the edges are complex, broken in several independent arcs and with junctions.
You can find many solutions when you have a single closed curve (look up "elastic contour matching" for instance), but not for your case. This would be a case of "approximate elastic graph matching".
Other possible approaches use special distance functions such as the chamfer or Hausdorff distances, but you may still be stuck because of the size mismatch.
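For what it's worth, a minimal sketch of chamfer-style matching in OpenCV could look like the following; scene_edges and template_edges are assumed to be binary edge maps (e.g. from cv2.Canny), and the brute-force scan is only meant to illustrate the idea, not to solve the scale problem:
import cv2
import numpy as np

def chamfer_match(scene_edges, template_edges, step=5):
    # distance transform of the inverted edge map: each pixel holds the
    # distance to the nearest edge pixel in the scene
    dist = cv2.distanceTransform(cv2.bitwise_not(scene_edges), cv2.DIST_L2, 3)
    th, tw = template_edges.shape
    pts = template_edges > 0
    best = (np.inf, 0, 0)
    # slide the template and average the distances under its edge pixels
    for y in range(0, scene_edges.shape[0] - th, step):
        for x in range(0, scene_edges.shape[1] - tw, step):
            score = dist[y:y + th, x:x + tw][pts].mean()  # lower = better fit
            if score < best[0]:
                best = (score, x, y)
    return best  # (score, x, y) of the best placement found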
I have an image that is given as below, which I am trying to convert into a binary image.
To convert this image into a binary image, I did:
image[image > 0] = 255
This creates a binary image in which the colored region has only white pixels. But I also want to set the pixels above the colored region to 255. How could I do this? I don't only want to convert the colored pixels to white, but also the area above them. Is there a way I could do this? The area denoted by arrows should remain black (i.e. the area after the colored region).
UPDATE
Also, how could I approach it if the edges are as shown below:
If I understood your problem correctly, a more elaborate approach is needed to get the desired result.
First of all, using a simple threshold produces a noisy result.
I used a modified image of your sample:
If you apply your thresholding then a result like this might come up:
A finer thresholding can be useful here:
image3 = cv2.inRange(image, np.array([10, 10, 10]), np.array([255, 255, 255]))
which creates a binary image as a result (resembles your desired output except for the upper strip):
To get rid of the strip I would (it's just an approach, not a perfect one) find the corner created by the white region and then use it to fill the whole region above it:
ind = np.where(image3 == 255)
max_x = np.max(ind[1])
max_y = ind[0][np.argmax(ind[1])]
image3[:max_y, :max_x] = 255
And the result would be like this:
By all means this is not a perfect answer, but it might be helpful.
I recreated the image as follows:
Then I read it and followed your path by making it binary first (with a small modification to reduce noise):
import numpy as np
from matplotlib import pyplot as plt
img = plt.imread("sample.jpg")
img2 = img.copy()
img2[img2.sum(-1) > 30] = 255
img2[img2.sum(-1) <= 30] = 0
Here is the result after this modification:
OPTION 1
This might not be what you asked but it is similar to one of the solutions discussed in the comments, and I think it is partly correct:
i, j = np.where(img2.sum(-1) > 0) # find all white coordinates
i, j = (i[j.argmax()], j[j.argmax()]) # the corner white point jutting into the black
img2[:i, :j] = 255 # paint white all of the above-left rectangle from this point
Here is the final result:
This is an imperfect but pretty simple pure numpy solution.
OPTION 2
In this solution, we need some simple calculus and linear algebra. Take two points in 2D space and draw a line between them. So, what is the function of the borderline?
point2 = (i, j) # same i and j from OPTION1 (coordinates of the top-right corner)
point1 = (img2.shape[0], img2[-1].sum(-1).argmin()) # the bottom-right white corner.
a = (point2[1] - point1[1]) / (point2[0] - point1[0])
c = point1[1] - a * point1[0]
f = lambda x: int(a * x + c)
Now, paint all areas to the left of the line:
for i in range(img2.shape[0]):
    img2[:i, :f(i)+1] = 255
Here is the result:
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not differentiate between 180 and 0, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulting image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
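If the OSD output is usable, one way to act on it is to parse the suggested rotation and apply it (a hedged sketch; the "Rotate:" field name follows Tesseract's usual OSD text output, and the clockwise direction should be verified against your Tesseract version):
import re
import cv2
import pytesseract

img = cv2.imread(file_name)
osd = pytesseract.image_to_osd(img)
# Tesseract reports the rotation (a multiple of 90 degrees) needed to correct the page
rotation = int(re.search(r"Rotate: (\d+)", osd).group(1))
for _ in range(rotation // 90):
    img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)
cv2.imwrite("oriented.png", img)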
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method: score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
angle = 0
scores = []  # keep track of the score for every angle tried
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w) / 1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
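area_to_top_of_text() is one of the omitted helpers; a plausible (hypothetical, not the author's original) implementation of the idea described above, filling each column from the top of the page down to its first white pixel, could be:
import numpy as np

def area_to_top_of_text(img):
    bg = np.zeros_like(img)
    for col in range(img.shape[1]):
        white = np.where(img[:, col] > 0)[0]
        top = white[0] if len(white) else img.shape[0]
        bg[:top, col] = 255  # fill the empty strip above the first text pixel
    return img, bg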
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach on a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
Here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then, to use it to deskew images (taken from the homepage):
from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.
I have found a plethora of questions regarding finding "things" in images using openCV, et al. in Python but so far I have been unable to piece them together for a reliable solution to my problem.
I am attempting to use computer vision to help count tiny surface mount electronics parts. The idea is for me to dump parts onto a solid color piece of paper, snap a picture, and have the software tell me how many items are in it.
The "things" differ from one picture to the next but will always be identical in any one image. I seem to be able to manually tune the parameters for things like hue/saturation for a particular part but it tends to require tweaking every time I change to a new part.
My current, semi-functioning code is posted below:
import imutils
import numpy
import cv2
import sys
def part_area(contours, round=10):
    """Finds the mode of the contour area. The idea is that most of the parts in an image will be separated and that
    finding the most common area in the list of areas should provide a reasonable value to approximate by. The areas
    are rounded to the nearest multiple of 200 to reduce the list of options."""
    # Start with a list of all of the areas for the provided contours.
    areas = [cv2.contourArea(contour) for contour in contours]
    # Determine a threshold for the minimum amount of area as 1% of the overall range.
    threshold = (max(areas) - min(areas)) / 100
    # Trim the list of areas down to only those that exceed the threshold.
    thresholded = [area for area in areas if area > threshold]
    # Round the areas to the nearest value set by the round argument.
    rounded = [int((area + (round / 2)) / round) * round for area in thresholded]
    # Remove any areas that rounded down to zero.
    cleaned = [area for area in rounded if area != 0]
    # Count the areas with the same values.
    counts = {}
    for area in cleaned:
        if area not in counts:
            counts[area] = 0
        counts[area] += 1
    # Reduce the areas down to only those that are in groups of three or more with the same area.
    above = []
    for area, count in counts.iteritems():
        if count > 2:
            for _ in range(count):
                above.append(area)
    # Take the mean of the areas as the average part size.
    average = sum(above) / len(above)
    return average


def find_hue_mode(hsv):
    """Given an HSV image as an input, compute the mode of the list of hue values to find the most common hue in the
    image. This is used to determine the center for the background color filter."""
    pixels = {}
    for row in hsv:
        for pixel in row:
            hue = pixel[0]
            if hue not in pixels:
                pixels[hue] = 0
            pixels[hue] += 1
    counts = sorted(pixels.keys(), key=lambda key: pixels[key], reverse=True)
    return counts[0]


if __name__ == "__main__":
    # load the image and resize it to a smaller factor so that the shapes can be approximated better
    image = cv2.imread(sys.argv[1])
    # define range of blue color in HSV
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    center = find_hue_mode(hsv)
    print 'Center Hue:', center
    lower = numpy.array([center - 10, 50, 50])
    upper = numpy.array([center + 10, 255, 255])
    # Threshold the HSV image to get only blue colors
    mask = cv2.inRange(hsv, lower, upper)
    inverted = cv2.bitwise_not(mask)
    blurred = cv2.GaussianBlur(inverted, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 100)
    dilated = cv2.dilate(edged, None, iterations=1)
    eroded = cv2.erode(dilated, None, iterations=1)
    # find contours in the thresholded image and initialize the shape detector
    contours = cv2.findContours(eroded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if imutils.is_cv2() else contours[1]
    # Compute the area for a single part to use when setting the threshold and calculating the number of parts within
    # a contour area.
    part_area = part_area(contours)
    # The threshold for a part's area - can't be too much smaller than the part itself.
    threshold = part_area * 0.5
    part_count = 0
    for contour in contours:
        if cv2.contourArea(contour) < threshold:
            continue
        # Sometimes parts are close enough together that they become one in the image. To battle this, the total area
        # of the contour is divided by the area of a part (derived earlier).
        part_count += int((cv2.contourArea(contour) / part_area) + 0.1)  # this 0.1 "rounds up" slightly and was determined empirically
        # Draw an approximate contour around each detected part to give the user an idea of what the tool has computed.
        epsilon = 0.1 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)
    # Print the part count and show off the processed image.
    print 'Part Count:', part_count
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Here's an example of the type of input image I am using:
or this:
And I'm currently getting results like this:
The results clearly show that the script is having trouble identifying some parts, and its true Achilles heel seems to be when parts touch one another.
So my question/challenge is, what can I do to improve the reliability of this script?
The script is to be integrated into an existing Python tool so I am searching for a solution using Python. The solution does not need to be pure Python as I am willing to install whatever 3rd party libraries might be needed.
If the objects are all of similar types, you might have more success isolating a single example in the image and then using feature matching to detect them.
A full solution would be out of scope for Stack Overflow, but my suggestion for progress would be to first somehow find one or more "correct" examples using your current rectangle retrieval method. You could probably look for all your samples that are of the expected size, or that are accurate rectangles.
Once you have isolated a few positive examples, use some feature matching techniques to find the others. There is a lot of reading up you probably need to do on it but that is a potential solution.
A general summary is that you use your positive examples to find "features" of the object you want to detect. These "features" are generally things like corners or changes in gradient. OpenCV contains many methods you can use.
Once you have the features, there are several algorithms in OpenCV you can look at that will search the image for all matching features. You'll want one that is rotation invariant (can detect the same features arranged in different rotations), but you probably don't need scale invariance (can detect the same features at multiple scales).
My one concern with this method is that the items you are searching for in your images are quite small. It might be difficult to find good, consistent features to match on.
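As a concrete starting point, a rough sketch using ORB (just one of OpenCV's feature detectors; the file names and the distance cutoff below are illustrative assumptions, not a tested pipeline) might look like this:
import cv2

# one isolated, known-good part cropped from the photo, and the full scene
exemplar = cv2.imread("part_exemplar.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
scene = cv2.imread("parts_photo.png", cv2.IMREAD_GRAYSCALE)       # hypothetical path

# detect keypoints and binary descriptors in both images
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(exemplar, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Hamming-distance brute-force matching is the usual pairing for ORB descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# keep only reasonably close matches; the cutoff of 40 would need tuning
good = [m for m in matches if m.distance < 40]
print("candidate matches:", len(good))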
You're tackling a 2D object recognition problem, for which there are many possible approaches. You've gone about it using background/foreground segmentation, which is OK since you have control over the scene (laying down the background paper sheet). However, this will always have fundamental limitations when the objects touch. A simple solution to your problem can be this:
1) You assume that touching objects are rare events (which is a fine assumption in your problem). Therefore you can compute the areas for each segmented region, and compute the median of these, which will give a robust estimate for the object's area. Let's call this robust estimate A (in squared pixels). This will be fine if fewer than 50% of regions correspond to touching objects.
2) You then proceed to measure the number of objects in each segmented region. Let Ai be the area of the ith region. You then compute the number of objects in each region by Ni=round(Ai/A). You then sum Ni to give you the total number of objects.
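A minimal sketch of these two steps (assuming mask is the binary segmentation your existing pipeline already produces, with parts white on a black background, and OpenCV 4's two-value findContours return):
import cv2
import numpy as np

mask = cv2.imread("segmented_parts.png", cv2.IMREAD_GRAYSCALE)  # hypothetical: your existing segmentation

# one contour per segmented region
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
areas = [cv2.contourArea(c) for c in contours if cv2.contourArea(c) > 0]

# step 1: robust estimate A of a single object's area (the median is insensitive to merged blobs)
A = np.median(areas)

# step 2: per-region counts Ni = round(Ai / A), summed for the total
total = sum(int(round(a / A)) for a in areas)
print("estimated part count:", total)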
This approach will be fine as long as the following conditions are met:
A) The touching objects do not significantly overlap
B) You do not have objects lying on their sides. If you do, you might be able to deal with this using two area estimates (side and flat). Better to eliminate this scenario if you can, for simplicity.
C) The objects are all roughly the same distance to the camera. If this is not the case then the areas of the objects (in pixels) cannot be modelled well by a single value.
D) There are not partially visible objects at the borders of the image.
E) You ensure that only the same type of object is visible in each image.