I have found a plethora of questions about finding "things" in images using OpenCV et al. in Python, but so far I have been unable to piece them together into a reliable solution to my problem.
I am attempting to use computer vision to help count tiny surface mount electronics parts. The idea is for me to dump parts onto a solid color piece of paper, snap a picture, and have the software tell me how many items are in it.
The "things" differ from one picture to the next but will always be identical in any one image. I seem to be able to manually tune the parameters for things like hue/saturation for a particular part but it tends to require tweaking every time I change to a new part.
My current, semi-functioning code is posted below:
import imutils
import numpy
import cv2
import sys
def part_area(contours, round=10):
    """Finds the mode of the contour area. The idea is that most of the parts in an image will be separated and that
    finding the most common area in the list of areas should provide a reasonable value to approximate by. The areas
    are rounded to the nearest multiple of 200 to reduce the list of options."""
    # Start with a list of all of the areas for the provided contours.
    areas = [cv2.contourArea(contour) for contour in contours]
    # Determine a threshold for the minimum amount of area as 1% of the overall range.
    threshold = (max(areas) - min(areas)) / 100
    # Trim the list of areas down to only those that exceed the threshold.
    thresholded = [area for area in areas if area > threshold]
    # Round the areas to the nearest value set by the round argument.
    rounded = [int((area + (round / 2)) / round) * round for area in thresholded]
    # Remove any areas that rounded down to zero.
    cleaned = [area for area in rounded if area != 0]
    # Count the areas with the same values.
    counts = {}
    for area in cleaned:
        if area not in counts:
            counts[area] = 0
        counts[area] += 1
    # Reduce the areas down to only those that are in groups of three or more with the same area.
    above = []
    for area, count in counts.iteritems():
        if count > 2:
            for _ in range(count):
                above.append(area)
    # Take the mean of the areas as the average part size.
    average = sum(above) / len(above)
    return average


def find_hue_mode(hsv):
    """Given an HSV image as an input, compute the mode of the list of hue values to find the most common hue in the
    image. This is used to determine the center for the background color filter."""
    pixels = {}
    for row in hsv:
        for pixel in row:
            hue = pixel[0]
            if hue not in pixels:
                pixels[hue] = 0
            pixels[hue] += 1
    counts = sorted(pixels.keys(), key=lambda key: pixels[key], reverse=True)
    return counts[0]


if __name__ == "__main__":
    # load the image and resize it to a smaller factor so that the shapes can be approximated better
    image = cv2.imread(sys.argv[1])

    # define range of blue color in HSV
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    center = find_hue_mode(hsv)
    print 'Center Hue:', center

    lower = numpy.array([center - 10, 50, 50])
    upper = numpy.array([center + 10, 255, 255])

    # Threshold the HSV image to get only blue colors
    mask = cv2.inRange(hsv, lower, upper)
    inverted = cv2.bitwise_not(mask)
    blurred = cv2.GaussianBlur(inverted, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 100)
    dilated = cv2.dilate(edged, None, iterations=1)
    eroded = cv2.erode(dilated, None, iterations=1)

    # find contours in the thresholded image and initialize the shape detector
    contours = cv2.findContours(eroded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if imutils.is_cv2() else contours[1]

    # Compute the area for a single part to use when setting the threshold and calculating the number of parts within
    # a contour area.
    part_area = part_area(contours)
    # The threshold for a part's area - can't be too much smaller than the part itself.
    threshold = part_area * 0.5

    part_count = 0
    for contour in contours:
        if cv2.contourArea(contour) < threshold:
            continue

        # Sometimes parts are close enough together that they become one in the image. To battle this, the total area
        # of the contour is divided by the area of a part (derived earlier).
        part_count += int((cv2.contourArea(contour) / part_area) + 0.1)  # this 0.1 "rounds up" slightly and was determined empirically

        # Draw an approximate contour around each detected part to give the user an idea of what the tool has computed.
        epsilon = 0.1 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)

    # Print the part count and show off the processed image.
    print 'Part Count:', part_count
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Here's an example of the type of input image I am using:
or this:
And I'm currently getting results like this:
The results clearly show that the script is having trouble identifying some parts, and its true Achilles' heel seems to be when parts touch one another.
So my question/challenge is, what can I do to improve the reliability of this script?
The script is to be integrated into an existing Python tool so I am searching for a solution using Python. The solution does not need to be pure Python as I am willing to install whatever 3rd party libraries might be needed.
If the objects are all of similar types, you might have more success isolating a single example in the image and then using feature matching to detect them.
A full solution would be out of scope for Stack Overflow, but my suggestion for progress would be to first somehow find one or more "correct" examples using your current rectangle retrieval method. You could probably look for all your samples that are of the expected size, or that are accurate rectangles.
Once you have isolated a few positive examples, use some feature matching techniques to find the others. There is a lot of reading up you probably need to do on it but that is a potential solution.
A general summary is that you use your positive examples to find "features" of the object you want to detect. These "features" are generally things like corners or changes in gradient. OpenCV contains many methods you can use.
Once you have the features, there are several algorithms in OpenCV you can look at that will search the image for all matching features. You’ll want one that is rotation invariant (can detect the same features arranged in different rotation), but you probably don’t need scale invariance (can detect the same features at multiple scales).
My one concern with this method is that the items you are searching for in your images are quite small. It might be difficult to find good, consistent features to match on.
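To give a rough idea of the feature-matching step, a minimal ORB-based sketch might look something like this (the file names and the distance cutoff are placeholders you would need to adjust, and this is untested against your images):

import cv2

# template: a crop of one isolated, correctly detected part (hypothetical file name)
template = cv2.imread('part_template.png', cv2.IMREAD_GRAYSCALE)
scene = cv2.imread('parts.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp_t, des_t = orb.detectAndCompute(template, None)
kp_s, des_s = orb.detectAndCompute(scene, None)

# Hamming distance is the usual choice for ORB's binary descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_t, des_s)

# keep only reasonably close matches; the cutoff needs tuning per part type
good = [m for m in matches if m.distance < 40]
print(len(good), 'candidate feature matches')

Turning matches into a part count would still need a grouping step (for example clustering the match locations), which is exactly where the small part size may hurt.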
You're tackling a 2D object recognition problem, for which there are many possible approaches. You've gone about it using background/foreground segmentation, which is OK since you have control over the scene (laying down the background paper sheet). However, this will always have fundamental limitations when the objects touch. A simple solution to your problem can be this:
1) You assume that touching objects are rare events (which is a fine assumption in your problem). Therefore you can compute the areas for each segmented region, and compute the median of these, which will give a robust estimate for the object's area. Let's call this robust estimate A (in squared pixels). This will be fine if fewer than 50% of regions correspond to touching objects.
2) You then proceed to measure the number of objects in each segmented region. Let Ai be the area of the ith region. You then compute the number of objects in each region by Ni=round(Ai/A). You then sum Ni to give you the total number of objects.
This approach will be fine as long as the following conditions are met:
A) The touching objects do not significantly overlap
B) You do not have objects lying on their sides. If you do you might be able to deal with this using two area estimates (side and flat). Better to eliminate this scenario if you can for simplicity.
C) The objects are all roughly the same distance to the camera. If this is not the case then the areas of the objects (in pixels) cannot be modelled well by a single value.
D) There are not partially visible objects at the borders of the image.
E) You ensure that only the same type of object is visible in each image.
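As a rough illustration (not a drop-in for your script), steps 1) and 2) could look like this, assuming you already have the binary mask from your colour filtering; the file name and the noise cutoff are placeholders:

import cv2
import numpy as np

# binary mask where parts are white and the background paper is black (placeholder file)
mask = cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE)

# OpenCV 4.x returns (contours, hierarchy)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
areas = np.array([cv2.contourArea(c) for c in contours])
areas = areas[areas > 10]  # drop tiny noise blobs

# 1) robust estimate of a single object's area: the median is not thrown off
#    by the occasional region that actually contains two or three touching parts
A = np.median(areas)

# 2) each region contributes round(Ai / A) objects
counts = np.rint(areas / A).astype(int)
print('Estimated object count:', int(counts.sum()))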
I need to find the largest empty area in the document and display its coordinates, center point, and area, using Python, in order to put a QR code there.
I think OpenCV and Numpy should be enough for this task.
What kind of threshold should I use? There are a lot of types of scans: gray, black-and-white, and color. And how do I find the contour properly?
How can this be implemented in the fastest way? An example using the first scan from Google is attached, where you can see that the code should find the largest empty square area.
@Mark Setchell Thanks! This code works perfectly for all docs with a white background, but when I use something with a color in the background it finds a completely different area. Also, to keep thin lines in the docs I used erosion after thresholding. I tried changing the thresholding and erode parameters, but it's still not working properly.
Edited post, added color pictures.
Here's a possible approach:
#!/usr/bin/env python3
import cv2
import numpy as np
def largestSquare(im):
    # Make image square of 100x100 to simplify and speed up
    s = 100
    work = cv2.resize(im, (s, s), interpolation=cv2.INTER_NEAREST)

    # Make output accumulator - uint16 is ok because...
    # ... max value is 100x100, i.e. 10,000 which is less than 65,535
    # ... and you can make a PNG of it too
    p = np.zeros((s, s), np.uint16)

    # Find largest square
    for i in range(1, s):
        for j in range(1, s):
            if (work[i][j] > 0):
                p[i][j] = min(p[i][j-1], p[i-1][j], p[i-1][j-1]) + 1
            else:
                p[i][j] = 0

    # Save result - just for illustration purposes
    cv2.imwrite("result.png", p)

    # Work out what the actual answer is
    ind = np.unravel_index(np.argmax(p, axis=None), p.shape)
    print(f'Location: {ind}')
    print(f'Length of side: {p[ind]}')
# Load image and threshold
im = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
_, thr = cv2.threshold(im,127,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)
# Get largest white square
largestSquare(thr)
Output
Location: (21, 77)
Length of side: 18
Notes:
I edited out your red annotation so it didn't interfere with my algorithm.
I did Otsu thresholding to get pure black and white - that may or may not be appropriate to your use case. It will depend on your scans and paper background etc.
I scaled the image down to 100x100 so it doesn't take all day to run. You will need to scale the results back up to the size of your original image but I assume you can do that easily enough.
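If it helps, mapping the result back to the original resolution might look roughly like this, assuming largestSquare() is changed to return ind and p[ind] instead of just printing them (in the DP above, ind is the bottom-right corner of the largest square in the 100x100 working image):

# hypothetical modification: largestSquare() now returns (ind, p[ind])
(row, col), side = largestSquare(thr)
fy = im.shape[0] / 100          # scale factors back to the original image size
fx = im.shape[1] / 100
bottom_right = (int(col * fx), int(row * fy))
top_left = (int((col - side + 1) * fx), int((row - side + 1) * fy))
print(top_left, bottom_right)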
Keywords: Image processing, image, Python, OpenCV, largest white square, largest empty space.
I am working on a problem where I need to find the path the robot can take without hitting any crop rows. Raw Image
My initial approach was to convert this into a bird's-eye view and then use Canny and skeletonization techniques. Then I applied the Hough transform to come up with the crop rows. This works well when the rows are straight, but if I rotate the image by 45 degrees I can't find any rows with the Hough transform. So I decided to use another approach.
First I selected only the green region and applied morphological filters to remove the small branches that stick out:
img = cv2.imread('''')
min_green2 = np.array([45, 50, 50])
max_green2 = np.array([75, 250, 250])
image_blur = cv2.GaussianBlur(img, (5, 5), 0)
image_blur_hsv = cv2.cvtColor(image_blur, cv2.COLOR_BGR2HSV)
image_green = cv2.inRange(image_blur_hsv, min_green2, max_green2)
se1 = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
se2 = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
mask = cv2.morphologyEx(image_green, cv2.MORPH_OPEN, se1)
I ended up with this: Final_output
Now I want to detect the path the robot can take, which is the black region. Only the first row is my region of interest, and I tried different methods to draw a line in the center of the row but couldn't find any help in OpenCV. I did manage to get a workaround by splitting the image in two vertically, using the cv2.fitLine function to get a line along one side of the row, doing the same with the other side, and finally plotting the center line. But this is not an ideal approach, and I feel like there might be OpenCV functions to do it in a much better way. Can someone help me with this or guide me in the right way?
This is the final output I am looking for
Final expected result with green color showing the center of path
So, here is my approach using numpy and scipy, which yielded this result:
Without doing any blurring or morphological operations, use the Canny edge detector:
edges = cv2.Canny(image, 100, 200, None, 3, cv2.DIST_L2)
Notice that most of the edges surround the track your robot wants to follow. Since each edge is a collection of white pixels, we could calculate a column's total intensity:
normalized = cv2.normalize(edges, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX)
column_intensity = normalized.sum(axis=0)
Plotting the results, we get
If we were to find the minimum of the graph, then we would find the x direction, where most of the edges are avoided. But first, let's smooth the function so as to avoid some noise.
# smooth function through moving average
window_size = 30
window = np.ones((window_size,)) / window_size
smoothed = np.convolve(column_intensity, window, mode="valid")
Since there are a lot of local minima, our additional constraint is that the x-direction the robot should take is the closest to the center of the image.
import scipy.signal

# find indices of local minima and select the one closest to the center
width = edges.shape[1]  # width of the image in pixels
indices = scipy.signal.argrelmin(smoothed)[0]
distances = np.abs(indices - int(width / 2))
x = indices[np.argmin(distances)]
Now that we have the x-direction, we need to determine a y coordinate so as to estimate the angle the robot should rotate (tan(angle)=y/x). There are as many choices as there are rows in the image, which means the y coordinate needs to be manually set. If we choose a y closer to the robot, the angle will be more volatile as the robot advances. Conversely, if we choose a y that is far from the robot, then it will be less volatile but less accurate as well. That is up to you; the final image was created with a y = 400.
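As a last snippet, turning that into an angle is just the formula above applied directly (y = 400 here, and x is the column found in the previous step):

import numpy as np

y = 400                                   # manually chosen row, as discussed
angle = np.degrees(np.arctan(y / x))      # from tan(angle) = y / x
print('x direction:', x, 'angle:', angle)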
I hope this fits your needs :)
As the title states, I'm trying to crop the largest circle out of an image. I'm using OpenCV in Python. To be exact, it's a shooting target, which always has the same format, but the picture of it can be taken with any mobile device and in different lighting conditions (I will include some examples below).
I'm completely new to image recognition, so I have been trying out many different ways of doing this, but couldn't figure out a universal solution that would work on all of my target images.
Why I'm trying to do this:
My assignment is to calculate score of one or multiple shots on the given target image. I have tried color segmentation to find the shots, but since the shots can be on different backgrounds, this wouldn't work properly. So now I'm trying to see the difference between the empty shooting target image and the already shot on target image. Also, I need to be able to tell, which target it was shot on (there are two target types). So I'm trying to crop out only the target from image to get rid of the background interference and then continue with the shot identifications.
What I have tried so far:
1) Finding the largest circle with HoughCircles. My next step would be to somehow remove the outer part of that found circle. I have played with the configuration of the HoughCircles method for quite some time, but there was always one example image where it wasn't highlighting the outermost circle correctly or wasn't highlighting any of the circles at all :/.
My final configuration looked something like this:
img = cv2.GaussianBlur(img, (3, 3), 0)
cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, 2, 10000, param1=50, param2=100, minRadius=200, maxRadius=0)
It seemed like using HoughCircles wouldn't be the right way to do this, so I moved on to another possible solution I found on the internet.
2) Finding all the contours by filtering the 'black' color range in which the circles appear in the pictures, and then finding the largest one. The problem with this solution seemed to be that sometimes the pictures had a shadow that destroyed the outer circle, and therefore it seemed impossible to crop by it.
My code looked like this:
# black color boundaries [B, G, R]
lower = [0, 0, 0]
upper = [150, 150, 150]
# create NumPy arrays from the boundaries
lower = np.array(lower, dtype="uint8")
upper = np.array(upper, dtype="uint8")
# find the colors within the specified boundaries and apply the mask
mask = cv2.inRange(img, lower, upper)
output = cv2.bitwise_and(img, img, mask=mask)
ret, thresh = cv2.threshold(mask, 40, 255, 0)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
if len(contours) != 0:
    # draw in blue the contours that were found
    cv2.drawContours(output, contours, -1, 255, 3)

    # find the biggest contour (c) by area
    c = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(c)
After that, I would try to draw a circle by the largest found contour (c) and crop by it. But I have already seen that the drawn circles weren't complete (probably due to some shadow on the picture) and therefore this wouldn't work anyway.
After those failures, I have tried so many solutions from other questions on here, but none would work for my problem.
Example images:
Target example 1
Target example 2
Target to calc score 1
Target to calc score 2
To be completely honest with you, I'm really lost on how to go about this. I would appreciate any help, advice, anything.
There are two different types of target in your samples. You may want to process them separately or ask the user which kind of target it is. Basically, you want to know how large the black part of the target is: does it cover rings 7-10 or 4-10?
Binarize your image. Build a histogram along X and Y -- you'll find the position of the black part of your target as (x_left, x_right, y_top, y_bottom). Once you know that, you can calculate the center ((top+bottom)/2, (left+right)/2). After that you can easily calculate the score for every pixel of the image, since you know the center, the black spot size and the number of different score areas within.
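A minimal sketch of that idea (the file name, the binarization threshold, and the 0.5 cut fraction are placeholders you would tune for your photos):

import cv2
import numpy as np

img = cv2.imread('target.jpg', cv2.IMREAD_GRAYSCALE)

# binarize so the black part of the target becomes 1 and everything else 0
_, binary = cv2.threshold(img, 100, 1, cv2.THRESH_BINARY_INV)

# histograms (projections) along X and Y: black pixels per column / per row
col_hist = binary.sum(axis=0)
row_hist = binary.sum(axis=1)

# the black disc shows up as the central run of large values
cols = np.where(col_hist > 0.5 * col_hist.max())[0]
rows = np.where(row_hist > 0.5 * row_hist.max())[0]
x_left, x_right = cols[0], cols[-1]
y_top, y_bottom = rows[0], rows[-1]

center = ((x_left + x_right) / 2, (y_top + y_bottom) / 2)
black_width = x_right - x_left
print('center:', center, 'black part width:', black_width)

From the centre and the size of the black part you get the ring spacing for whichever target type it is, and the score of a shot then follows from its distance to the centre.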
My task is to detect an object in a given image using OpenCV (I do not care whether it is the Python or C++ implementation). The object, shown below in three examples, is a black rectangle with five white rectangles within. All dimensions are known.
However, the rotation, scale, distance, perspective, lighting conditions, camera focus/lens, and background of the image are not known. The edge of the black rectangle is not guaranteed to be fully visible, however there will not be anything in front of the five white rectangles ever - they will always be fully visible. The end goal is to be able to detect the presence of this object within an image, and rotate, scale, and crop to show the object with the perspective removed. I am fairly confident that I can adjust the image to crop to just the object, given its four corners. However I am not so confident that I can reliably find those four corners. In ambiguous cases, not finding the object is preferred to misidentifying some other feature of the image as the object.
Using OpenCV I have come up with the following methods, however I feel I might be missing something obvious. Are there any more methods available, or is one of these the optimal solution?
Edge based outline
First idea was to look for the outside edge of the object.
Using Canny edge detection (after scaling to known size, grayscaling and gaussian blurring), finding a contour which best matches the outer shape of the object.
This deals with perspective, colour, size issues, but fails when there is a complicated background for example, or if there is something of similar shape to the object elsewhere in the image. Maybe this could be improved by a better set of rules for finding the correct contour - perhaps involving the five white rectangles as well as the outer edge.
Feature detection
The next idea was to match to a known template using feature detecting.
Using ORB feature detecting, descriptor matching and homography (from this tutorial) fails, I believe because the features it is detecting are very similar to other features within the object (lots of corners which are precisely one-quarter white and three-quarters black). However, I do like the idea of matching to a known template - this idea makes sense to me. I suppose though that because the object is quite basic geometrically, it's likely to find a lot of false positives in the feature matching step.
Parallel Lines
Using HoughLines or HoughLinesP, looking for evenly spaced parallel lines. Have just started down this road so need to investigate the best methods for thresholding etc. While it looks messy for images with complex backgrounds, I think it may work well as I can rely on the fact that the white rectangles within the black object should always be high contrast, giving a good indication of where the lines are.
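Roughly what I have in mind for this (untested sketch; the file name and thresholds are placeholders):

import cv2
import numpy as np

img = cv2.imread('object.jpg', cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)

# probabilistic Hough transform; the vote threshold and lengths would need tuning
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=50, maxLineGap=10)

if lines is not None:
    # group segments by angle to look for sets of roughly parallel lines
    angles = np.array([np.arctan2(y2 - y1, x2 - x1) for x1, y1, x2, y2 in lines[:, 0]])
    print(len(lines), 'segments, angle spread:', np.ptp(angles))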
'Barcode Scan'
My final idea is to scan the image by line, looking for the white to black pattern.
I have not started this method, but the idea is to take a strip of the image (at some angle), convert it to HSV colour space, and look for the regular black-to-white pattern appearing five times sequentially in the Value channel. This idea sounds promising to me, as I believe it should ignore many of the unknown variables.
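Untested, but the strip idea would be something along these lines (a horizontal strip through the image centre, with a fixed 128 cutoff as a placeholder):

import cv2
import numpy as np

img = cv2.imread('object.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# take one row through the middle of the image and keep only the Value channel
row = hsv.shape[0] // 2
profile = (hsv[row, :, 2] > 128).astype(np.int8)

# count dark-to-bright transitions; five bright runs would be a candidate hit
rises = np.flatnonzero(np.diff(profile) == 1)
print('bright runs along this row:', len(rises))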
Thoughts
I have looked at a number of OpenCV tutorials, as well as SO questions such as this one, however because my object is quite geometrically simple I am having issues implementing the ideas given.
I feel like this is an achievable task, however my struggle is knowing which method to pursue further. I have experimented with the first two ideas quite a bit, and while I haven't achieved anything very reliable, maybe there is something I am missing. Is there a standard way of achieving this task which I have not thought of, or is one of my suggested methods the most sensible?
EDIT: Once the corners are found using one of the above methods (or some other method), I am thinking of using Hu Moments or OpenCV's matchShapes() function to remove any false positives.
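Something like this is what I'm picturing for the matchShapes() check (file names are placeholders; untested):

import cv2

# reference: an ideal rendering of the object's outline; candidate: a detected region
reference = cv2.imread('reference_outline.png', cv2.IMREAD_GRAYSCALE)
candidate = cv2.imread('candidate_crop.png', cv2.IMREAD_GRAYSCALE)

_, ref_bin = cv2.threshold(reference, 127, 255, cv2.THRESH_BINARY)
_, cand_bin = cv2.threshold(candidate, 127, 255, cv2.THRESH_BINARY)

ref_contours, _ = cv2.findContours(ref_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cand_contours, _ = cv2.findContours(cand_bin, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# lower scores mean more similar shapes (Hu-moment based); the cutoff would need tuning
score = cv2.matchShapes(ref_contours[0], cand_contours[0], cv2.CONTOURS_MATCH_I1, 0.0)
print('shape dissimilarity:', score)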
EDIT2: Added some more input image examples as requested by @Timo
Orig1
Orig2
Orig3
Extra image 1
Extra image 2
Extra image 3
Extra image 4
I had some time to look into the problem and made a little Python script. I'm detecting the white rectangles inside your shape. Paste the code into a .py file and copy all input images into an input subfolder. The final result for the image is just a dummy at the moment, and the script isn't complete yet. I'll try to continue it in the next couple of days. The script will create a debug subfolder where it'll save some images that show the current detection state.
import numpy as np
import cv2
import os

INPUT_DIR = 'input'
DEBUG_DIR = 'debug'
OUTPUT_DIR = 'output'
IMG_TARGET_SIZE = 1000

# each algorithm must return a rotated rect and a confidence value [0..1]: (((x, y), (w, h), angle), confidence)

def main():
    # a list of all used algorithms
    algorithms = [rectangle_detection]

    # load and prepare images
    files = list(os.listdir(INPUT_DIR))
    images = [cv2.imread(os.path.join(INPUT_DIR, f), cv2.IMREAD_GRAYSCALE) for f in files]
    images = [scale_image(img) for img in images]

    for img, filename in zip(images, files):
        results = [alg(img, filename) for alg in algorithms]
        roi, confidence = merge_results(results)

        display = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
        display = cv2.drawContours(display, [cv2.boxPoints(roi).astype('int32')], -1, (0, 230, 0))

        cv2.imshow('img', display)
        cv2.waitKey()


def merge_results(results):
    '''Merges all results into a single result.'''
    return max(results, key=lambda x: x[1])


def scale_image(img):
    '''Scales the image so that the biggest side is IMG_TARGET_SIZE.'''
    scale = IMG_TARGET_SIZE / np.max(img.shape)
    return cv2.resize(img, (0, 0), fx=scale, fy=scale)


def rectangle_detection(img, filename):
    debug_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
    _, binarized = cv2.threshold(img, 50, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binarized, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    # detect all rectangles
    rois = []
    for contour in contours:
        if len(contour) < 4:
            continue
        cont_area = cv2.contourArea(contour)
        if not 1000 < cont_area < 15000:  # roughly filter by the volume of the detected rectangles
            continue
        cont_perimeter = cv2.arcLength(contour, True)
        (x, y), (w, h), angle = rect = cv2.minAreaRect(contour)
        rect_area = w * h
        if cont_area / rect_area < 0.8:  # check the 'rectangularity'
            continue
        rois.append(rect)

    # save intermediate results in the debug folder
    rois_img = cv2.drawContours(debug_img, contours, -1, (0, 0, 230))
    rois_img = cv2.drawContours(rois_img, [cv2.boxPoints(rect).astype('int32') for rect in rois], -1, (0, 230, 0))
    save_dbg_img(rois_img, 'rectangle_detection', filename, 1)

    # todo: detect pattern

    return rois[0], 1.0  # dummy values


def save_dbg_img(img, folder, filename, index=0):
    '''Writes the given image to DEBUG_DIR/folder/filename_index.png.'''
    folder = os.path.join(DEBUG_DIR, folder)
    if not os.path.exists(folder):
        os.makedirs(folder)
    cv2.imwrite(os.path.join(folder, '{}_{:02}.png'.format(os.path.splitext(filename)[0], index)), img)


if __name__ == "__main__":
    main()
Here is an example image of the current WIP
The next step is to detect the pattern / relation between multiple rectangles. I'll update this answer when I make progress.
Is it possible to detect the upper side of a dice? While this will be an easy task if you look from the top, from many perspectives multiple sides are visible.
Here is an example of a dice, feel free to take your own pictures:
You usually want to know the score you have achieved. It is easy for me to extract ALL dots, but how to extract only those on the top? In this special case, the top side is the largest, but this might not always be true. I am looking for something which evaluates the distortion of the top square (or circle in this case, which I can extract) in relation to the perspective given by the grid at the bottom.
Example program with some results is given below.
import numpy as np
import cv2
img = cv2.imread('dice.jpg')
# Colour range to be extracted
lower_blue = np.array([0,0,0])
upper_blue = np.array([24,24,24])
# Threshold the BGR image
dots = cv2.inRange(img, lower_blue, upper_blue)
# Colour range to be extracted
lower_blue = np.array([0,0,0])
upper_blue = np.array([226,122,154])
# Threshold the BGR image
upper_side_shape = cv2.inRange(img, lower_blue, upper_blue)
cv2.imshow('Upper side shape',upper_side_shape)
cv2.imshow('Dots',dots)
cv2.waitKey(0)
cv2.destroyAllWindows()
Some resulting images:
The best solution is dot size, which I mentioned in the comment. You find the largest dot, consider it as max, and then create a tolerance level.
But what if all dots are nearly equal (viewing it from the edge at an angle that makes things equidistant), or even too small? The best solution for that is creating a boundary to capture the dots. This requires analysis of the dice's edge (edge detection basically), but once you define the boundary, you're solid.
All you need is to capture the edges of the dice from the perspective you're seeing.
Here's a visual example:
Since you have a virtual boundary set, you'll simply measure dots above a specific point on the y-axis.
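A rough sketch of the counting step once the boundary is known (the boundary row, the colour range, and the minimum blob area are placeholders; the mask is built the same way as in the question):

import cv2
import numpy as np

img = cv2.imread('dice.jpg')
dots = cv2.inRange(img, np.array([0, 0, 0]), np.array([24, 24, 24]))
y_boundary = 250  # hypothetical row of the virtual boundary below the top face

num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(dots)
top_pips = 0
for label in range(1, num_labels):  # label 0 is the background
    cx, cy = centroids[label]
    if cy < y_boundary and stats[label, cv2.CC_STAT_AREA] > 20:
        top_pips += 1
print('pips on the top face:', top_pips)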
The dot size is a good heuristic, but I would also add dot roundness: if you compute the second-order image moments of the binarized dots, the more similar the x and y moments are, the rounder the figure. This will of course fail, like the size heuristic, for a side view, but then what does "top side" really mean if you can't sense gravity?
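One way to read "the x and y moments are similar" in code, using the central second-order moments of each binarized dot (the colour range is taken from the question; the idea, not the numbers, is the point):

import cv2
import numpy as np

img = cv2.imread('dice.jpg')
dots = cv2.inRange(img, np.array([0, 0, 0]), np.array([24, 24, 24]))

contours, _ = cv2.findContours(dots, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    m = cv2.moments(c)
    if m['m00'] == 0 or max(m['mu20'], m['mu02']) == 0:
        continue
    # ratio close to 1 suggests a round(ish) dot, i.e. one seen face-on
    roundness = min(m['mu20'], m['mu02']) / max(m['mu20'], m['mu02'])
    print('roundness:', round(roundness, 2))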
Why try to chop up the image at all? Based on what numbers you see on the side, you can infer what number is on the top. Your side numbers can be used as a check to validate your guess.
Note that you'll have to be careful about handedness (see: http://mathworld.wolfram.com/Dice.html).