I'm writing for Android with OpenCV. I'm segmenting an image similar to the one below using marker-controlled watershed, without the user manually marking the image. I'm planning to use the regional maxima as markers.
minMaxLoc() would give me the value, but how can I restrict it to the blobs, which are what I'm interested in? Can I use the results from findContours() or cvBlob blobs to restrict the ROI and apply maxima to each blob?
First of all, the function minMaxLoc finds only the global minimum and global maximum of a given input, so it is mostly useless for determining regional minima and/or regional maxima. But your idea is right: extracting markers based on regional minima/maxima to perform a marker-based Watershed Transform is totally fine. Let me try to clarify what the Watershed Transform is and how you should correctly use the implementation in OpenCV.
A decent number of papers that deal with watershed describe it similarly to what follows (I might miss some details; if you are unsure, ask). Consider the surface of some region you know; it contains valleys and peaks (among other details that are irrelevant here). Suppose that below this surface all you have is water, colored water. Now make holes in each valley of your surface, and the water starts to fill the whole area. At some point, differently colored waters will meet, and when this happens, you construct a dam so that they don't touch each other. In the end you have a collection of dams, which is the watershed separating all the differently colored waters.
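To make the flooding picture concrete, here is a toy sketch in Python (not OpenCV's implementation) of marker-based flooding on a 1D height profile, in the spirit of Meyer's priority-queue algorithm. Markers are positive integers, 0 means "not flooded yet", and -1 marks a dam. The function name watershed_1d is hypothetical.

```python
import heapq


def watershed_1d(heights, markers):
    """Flood a 1D height profile from labelled markers (0 = unlabelled).

    Returns a list of labels; positions where two different labels
    meet are marked -1 (the "dams").
    """
    n = len(heights)
    labels = list(markers)
    queue = []
    queued = [False] * n

    def push_neighbours(i):
        # Queue unlabelled neighbours, prioritised by their height.
        for j in (i - 1, i + 1):
            if 0 <= j < n and labels[j] == 0 and not queued[j]:
                heapq.heappush(queue, (heights[j], j))
                queued[j] = True

    for i in range(n):
        if labels[i] > 0:
            push_neighbours(i)

    # Always flood the lowest pending position first, like rising water.
    while queue:
        _, i = heapq.heappop(queue)
        neighbour_labels = {labels[j] for j in (i - 1, i + 1)
                            if 0 <= j < n and labels[j] > 0}
        if len(neighbour_labels) > 1:
            labels[i] = -1  # two "colours" of water meet: build a dam
            continue
        labels[i] = neighbour_labels.pop()
        push_neighbours(i)
    return labels
```

For example, heights [3, 1, 2, 5, 2, 0, 3] with one marker in each valley (positions 1 and 5) gives [1, 1, 1, -1, 2, 2, 2]: the peak of height 5 becomes the dam. With one hole per valley the result is sensible; putting a marker on every tiny dip would split the profile into many regions, which is the over-segmentation described next.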
Now, if you make too many holes in that surface, you end up with too many regions: over-segmentation. If you make too few you get an under-segmentation. So, virtually any paper that suggests using watershed actually presents techniques to avoid these problems for the application the paper is dealing with.
I wrote all this (which is possibly too naïve for anyone who knows what the Watershed Transform is) because it directly affects how you should use watershed implementations (which the currently accepted answer does in a completely wrong manner). Let us start on the OpenCV example now, using the Python bindings.
The image presented in the question is composed of many objects that are mostly too close together and in some instances overlapping. The usefulness of watershed here is to correctly separate these objects, not to group them into a single component. So you need at least one marker for each object and good markers for the background. As an example, first binarize the input image with Otsu and perform a morphological opening to remove small objects. The result of this step is shown below in the left image. Now, with the binary image, consider applying the distance transform to it (result at right).
With the distance transform result, we can apply a threshold so that we keep only the regions farthest from the background (left image below). Doing this, we can obtain a marker for each object by labeling the different regions left after the threshold. We can also use the border of a dilated version of the binary image to compose our marker. The complete marker is shown below at right (some markers are too dark to be seen, but each white region in the left image is represented in the right image).
This marker makes a lot of sense. Each colored water (one marker) will start to fill its region, and the watershed transform will construct dams to prevent the different "colors" from merging. If we do the transform, we get the image at left. Considering only the dams, by composing them with the original image, we get the result at right.
import sys
import cv2
import numpy
from scipy.ndimage import label

def segment_on_dt(a, img):
    # Background marker: the outline of a dilated version of the binary image.
    border = cv2.dilate(img, None, iterations=5)
    border = border - cv2.erode(border, None)

    # Object markers: regions far from the background in the distance transform.
    dt = cv2.distanceTransform(img, 2, 3)  # 2 = DIST_L2, 3 = 3x3 mask
    dt = ((dt - dt.min()) / (dt.max() - dt.min()) * 255).astype(numpy.uint8)
    _, dt = cv2.threshold(dt, 180, 255, cv2.THRESH_BINARY)
    lbl, ncc = label(dt)
    lbl = lbl * (255 / (ncc + 1))
    # Completing the markers now.
    lbl[border == 255] = 255

    lbl = lbl.astype(numpy.int32)
    cv2.watershed(a, lbl)

    # Dams are labelled -1; map them to 0 so they become white after inversion.
    lbl[lbl == -1] = 0
    lbl = lbl.astype(numpy.uint8)
    return 255 - lbl

img = cv2.imread(sys.argv[1])

# Pre-processing.
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, img_bin = cv2.threshold(img_gray, 0, 255,
        cv2.THRESH_OTSU)
# uint8 kernel: OpenCV does not accept 64-bit integer kernels
img_bin = cv2.morphologyEx(img_bin, cv2.MORPH_OPEN,
        numpy.ones((3, 3), dtype=numpy.uint8))

result = segment_on_dt(img, img_bin)
cv2.imwrite(sys.argv[2], result)

# Overlay the dams (the only pure white pixels) in red on the original image.
result[result != 255] = 0
result = cv2.dilate(result, None)
img[result == 255] = (0, 0, 255)
cv2.imwrite(sys.argv[3], img)
I would like to explain, with a simple code example, how to use watershed here. I am using OpenCV-Python, but I hope you won't have any difficulty understanding it.
In this code, I will be using watershed as a tool for foreground-background extraction. (This example is the Python counterpart of the C++ code in the OpenCV cookbook.) This is a simple case for understanding watershed. Apart from that, you can use watershed to count the number of objects in this image; that would be a slightly more advanced version of this code.
1 - First we load our image, convert it to grayscale, and threshold it with a suitable value. I took Otsu's binarization, so it would find the best threshold value.
import cv2
import numpy as np
img = cv2.imread('sofwatershed.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
Below is the result I got:
(Even this plain threshold gives a good result, because of the strong contrast between the foreground and the background.)
2 - Now we have to create the marker. The marker is an image of the same size as the original image, of type 32SC1 (32-bit signed, single channel).
Now, there will be some regions in the original image where you are simply sure that the part belongs to the foreground. Mark such regions with 255 in the marker image. The regions where you are sure it is background are marked with 128. The regions you are not sure about are marked with 0. That is what we are going to do next.
A - Foreground region: We have already got a thresholded image where the pills are white. We erode them a little, so that we are sure the remaining region belongs to the foreground.
fg = cv2.erode(thresh,None,iterations = 2)
fg :
B - Background region: Here we dilate the thresholded image so that the background region is reduced. But we are sure the remaining black region is 100% background. We set it to 128.
bgt = cv2.dilate(thresh,None,iterations = 3)
# THRESH_BINARY_INV (== 1): pixels <= 1 become 128, everything else becomes 0
ret,bg = cv2.threshold(bgt,1,128,cv2.THRESH_BINARY_INV)
Now we get bg as follows :
C - Now we add both fg and bg :
marker = cv2.add(fg,bg)
Below is what we get :
From the image above we can clearly see that the white region is 100% foreground, the gray region is 100% background, and the black region is the one we are not sure about.
Then we convert it into 32SC1 :
marker32 = np.int32(marker)
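Steps A to C plus the 32SC1 conversion can also be written directly in NumPy from two boolean masks. This is a sketch, assuming sure_fg and sure_bg are boolean arrays (for example fg == 255 and bg == 128 from the steps above); build_marker is a hypothetical helper name.

```python
import numpy as np


def build_marker(sure_fg, sure_bg):
    """Combine two boolean masks into a watershed marker image.

    255 = sure foreground, 128 = sure background, 0 = unknown.
    """
    marker = np.zeros(sure_fg.shape, dtype=np.int32)  # 32SC1, as watershed expects
    marker[sure_bg] = 128
    marker[sure_fg] = 255  # foreground wins if the masks ever overlap
    return marker
```

Writing the labels explicitly makes it obvious which value means what, and the result is already int32, so no separate conversion is needed before calling watershed.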
3 - Finally, we apply watershed and convert the result back into a uint8 image:
cv2.watershed(img,marker32)
m = cv2.convertScaleAbs(marker32)
m :
4 - We threshold it properly to get the mask and perform bitwise_and with the input image:
ret,thresh = cv2.threshold(m,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
res = cv2.bitwise_and(img,img,mask = thresh)
res :
Hope it helps!!!
ARK
Foreword
I'm chiming in mostly because I found both the watershed tutorial in the OpenCV documentation (and its C++ example) and mmgp's answer above to be quite confusing. I revisited a watershed approach multiple times, only to give up out of frustration each time. I finally realized I needed to at least give this approach a try and see it in action. This is what I've come up with after sorting through all of the tutorials I've come across.
Aside from my being a computer vision novice, most of my trouble probably had to do with my requirement to use the OpenCVSharp library rather than Python. C# doesn't have baked-in high-power array operators like those found in NumPy (though I realize NumPy has been ported via IronPython), so I struggled quite a bit in both understanding and implementing these operations in C#. Also, for the record, I really despise the nuances of, and inconsistencies in, most of these function calls. OpenCVSharp is one of the most fragile libraries I've ever worked with. But hey, it's a port, so what was I expecting? Best of all, though, it's free.
Without further ado, let's talk about my OpenCVSharp implementation of the watershed, and hopefully clarify some of the stickier points of watershed implementation in general.
Application
First of all, make sure watershed is what you want and understand its use. I am using stained cell plates, like this one:
It took me a good while to figure out I couldn't just make one watershed call to differentiate every cell in the field. Instead, I first had to isolate a portion of the field, then call watershed on that small portion. I isolated my region of interest (ROI) via a number of filters, which I will explain briefly here:
Start with source image (left, cropped for demonstration purposes)
Isolate the red channel (left middle)
Apply adaptive threshold (right middle)
Find contours then eliminate those with small areas (right)
Once we have cleaned the contours resulting from the above thresholding operations, it is time to find candidates for watershed. In my case, I simply iterated through all contours greater than a certain area.
Code
Say we've isolated this contour from the above field as our ROI:
Let's take a look at how we'll code up a watershed.
We'll start with a blank mat and draw only the contour defining our ROI:
var isolatedContour = new Mat(source.Size(), MatType.CV_8UC1, new Scalar(0, 0, 0));
Cv2.DrawContours(isolatedContour, new List<List<Point>> { contour }, -1, new Scalar(255, 255, 255), -1);
In order for the watershed call to work, it needs a couple of "hints" about the ROI. If you're a complete beginner like me, I recommend checking out the CMM watershed page for a quick primer. Suffice it to say, we're going to create hints about the ROI on the left by creating the shape on the right:
To create the white part (or "background") of this "hint" shape, we'll just Dilate the isolated shape like so:
var kernel = Cv2.GetStructuringElement(MorphShapes.Ellipse, new Size(2, 2));
var background = new Mat();
Cv2.Dilate(isolatedContour, background, kernel, iterations: 8);
To create the black part in the middle (or "foreground"), we'll use a distance transform followed by threshold, which takes us from the shape on the left to the shape on the right:
This takes a few steps, and you may need to play around with the lower bound of your threshold to get results that work for you:
var foreground = new Mat(source.Size(), MatType.CV_8UC1);
Cv2.DistanceTransform(isolatedContour, foreground, DistanceTypes.L2, DistanceMaskSize.Mask5);
Cv2.Normalize(foreground, foreground, 0, 1, NormTypes.MinMax); //Remember to normalize!
foreground.ConvertTo(foreground, MatType.CV_8UC1, 255, 0);
Cv2.Threshold(foreground, foreground, 150, 255, ThresholdTypes.Binary);
Then we'll subtract these two mats to get the final result of our "hint" shape:
var unknown = new Mat(); //this variable is also named "border" in some examples
Cv2.Subtract(background, foreground, unknown);
Again, if we Cv2.ImShow unknown, it would look like this:
Nice! This was easy for me to wrap my head around. The next part, however, got me quite puzzled. Let's look at turning our "hint" into something the Watershed function can use. For this we need ConnectedComponents, which returns a matrix of the same size as the input in which every pixel holds the index of the connected component it belongs to. For example, if we had a mat with the letters "HI", ConnectedComponents might return this matrix:
0 0 0 0 0 0 0 0 0
0 1 0 1 0 2 2 2 0
0 1 0 1 0 0 2 0 0
0 1 1 1 0 0 2 0 0
0 1 0 1 0 0 2 0 0
0 1 0 1 0 2 2 2 0
0 0 0 0 0 0 0 0 0
So, 0 is the background, 1 is the letter "H", and 2 is the letter "I". (If you get to this point and want to visualize your matrix, I recommend checking out this instructive answer.) Now, here's how we'll utilize ConnectedComponents to create the markers (or labels) for watershed:
var labels = new Mat(); //also called "markers" in some examples
Cv2.ConnectedComponents(foreground, labels); //labels is CV_32SC1
labels = labels + 1; //shift so the background is 1 instead of 0
//this is a much more verbose port of numpy's: labels[unknown==255] = 0
for (int x = 0; x < labels.Width; x++)
{
for (int y = 0; y < labels.Height; y++)
{
//At<T> takes (row, col), i.e. (y, x); "unknown" is 8-bit, so read it as byte
var borderPixel = unknown.At<byte>(y, x);
if (borderPixel == 255)
labels.Set(y, x, 0);
}
}
Note that the Watershed function requires the unknown (border) area to be marked with 0. So we've set any border pixels to 0 in the label/marker array.
At this point, we should be all set to call Watershed. However, in my particular application, it is useful to visualize just a small portion of the entire source image during this call. This may be optional for you, but I first mask off a small bit of the source by dilating the isolated contour into a mask:
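For readers following along in Python, the two marker steps above (labels + 1, then labels[unknown == 255] = 0) might look like this in NumPy. To keep it self-contained, a small breadth-first labeling function stands in for cv2.connectedComponents; it uses 8-connectivity (cv2's default), but connected_components and watershed_markers are hypothetical helpers, not OpenCV's implementation. The "HI" matrix from above serves as a check.

```python
from collections import deque

import numpy as np


def connected_components(binary):
    """Label 8-connected non-zero regions; 0 stays background.

    Labels are assigned in row-major order of each region's first pixel.
    """
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=np.int32)
    current = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and labels[r, c] == 0:
                current += 1
                labels[r, c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = current
                                queue.append((ny, nx))
    return labels


def watershed_markers(sure_fg, unknown):
    """Shift labels so the background is 1, then zero out the unknown band."""
    markers = connected_components(sure_fg) + 1
    markers[unknown == 255] = 0
    return markers
```

After the shift, 0 is free to mean "unknown", which is exactly the value watershed will fill in.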
var mask = new Mat();
Cv2.Dilate(isolatedContour, mask, new Mat(), iterations: 20);
var sourceCrop = new Mat(source.Size(), source.Type(), new Scalar(0, 0, 0));
source.CopyTo(sourceCrop, mask);
And then make the magic call:
Cv2.Watershed(sourceCrop, labels);
Results
The above Watershed call modifies labels in place, so think back to the matrix resulting from ConnectedComponents. The difference now is that if watershed found any dams between watersheds, they are marked as "-1" in that matrix. As in the ConnectedComponents result, the different watersheds are marked with incrementing numbers. For my purposes, I wanted to store these as separate contours, so I created this loop to split them up:
var watershedContours = new List<Tuple<int, List<Point>>>();
for (int x = 0; x < labels.Width; x++)
{
for (int y = 0; y < labels.Height; y++)
{
var labelPixel = labels.At<Int32>(y, x); //note: x, y switched
var connected = watershedContours.Where(t => t.Item1 == labelPixel).FirstOrDefault();
if (connected == null)
{
connected = new Tuple<int, List<Point>>(labelPixel, new List<Point>());
watershedContours.Add(connected);
}
connected.Item2.Add(new Point(x, y));
if (labelPixel == -1)
sourceCrop.Set(y, x, new Vec3b(0, 255, 255));
}
}
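The per-pixel Where(...).FirstOrDefault() lookup rescans the list for every pixel. For comparison, here is a NumPy sketch that builds the same grouping with one mask per unique label; split_by_label is a hypothetical name, and points are stored as (x, y) to match Point(x, y) in the C# code.

```python
import numpy as np


def split_by_label(labels):
    """Group pixel coordinates by watershed label; dams end up under key -1."""
    groups = {}
    for value in np.unique(labels):
        ys, xs = np.nonzero(labels == value)  # row-major order
        groups[int(value)] = list(zip(xs.tolist(), ys.tolist()))
    return groups
```

The number of passes now depends on the number of labels rather than on a list search per pixel, which matters once the image has many watersheds.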
Then, I wanted to print these contours with random colors, so I created the following mat:
var watershed = new Mat(source.Size(), MatType.CV_8UC3, new Scalar(0, 0, 0));
foreach (var component in watershedContours)
{
if (component.Item2.Count < (labels.Width * labels.Height) / 4 && component.Item1 >= 0)
{
var color = GetRandomColor();
foreach (var point in component.Item2)
watershed.Set(point.Y, point.X, color);
}
}
Which yields the following when shown:
If we draw on the source image the dams that were marked by a -1 earlier, we get this:
Edits:
I forgot to note: make sure you're cleaning up your mats after you're done with them. They WILL stay in memory, and OpenCVSharp may present an unintelligible error message. I should really be using using statements above, but mat.Release() is an option as well.
Also, mmgp's answer above includes this line: dt = ((dt - dt.min()) / (dt.max() - dt.min()) * 255).astype(numpy.uint8), which is a histogram stretching step applied to the results of the distance transform. I omitted this step for a number of reasons (mostly because I didn't think the histograms I saw were too narrow to begin with), but your mileage may vary.
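That line maps the smallest distance to 0 and the largest to 255. A small sketch of the same min-max stretch, with a guard against division by zero for a constant input (the guard and the name stretch_to_uint8 are additions, not part of mmgp's line):

```python
import numpy as np


def stretch_to_uint8(dt):
    """Linearly map dt's [min, max] range onto [0, 255] (min-max stretching)."""
    dt = dt.astype(np.float64)
    lo, hi = dt.min(), dt.max()
    if hi == lo:  # constant input: nothing to stretch, avoid dividing by zero
        return np.zeros(dt.shape, dtype=np.uint8)
    return ((dt - lo) / (hi - lo) * 255).astype(np.uint8)
```

Because astype truncates, a value exactly halfway up the range becomes 127, not 128; that only matters if your threshold sits right at such a boundary.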
Related
I need to find the largest empty area in the document and display its coordinates, center point and area, using python to put a QR Code there.
I think OpenCV and Numpy should be enough for this task.
What kind of threshold should I use? There are many types of scans (gray, black-and-white, color), and how do I find the contour properly?
How can this be implemented in the fastest way? An example using the first scan from Google is attached, where you can see that the code should find the largest empty square area.
@Mark Setchell Thanks! This code works perfectly for all documents with a white background, but when I use something with a colored background it finds a completely different area. Also, to keep thin lines in the documents, I used erode after thresholding. I tried changing the thresholding and erode parameters, but it still doesn't work properly.
Edited post, added color pictures.
Here's a possible approach:
#!/usr/bin/env python3

import cv2
import numpy as np

def largestSquare(im):
    # Make image square of 100x100 to simplify and speed up
    s = 100
    work = cv2.resize(im, (s,s), interpolation=cv2.INTER_NEAREST)

    # Make output accumulator - uint16 is ok because...
    # ... max value is 100x100, i.e. 10,000 which is less than 65,535
    # ... and you can make a PNG of it too
    p = np.zeros((s,s), np.uint16)

    # Find largest square: p[i][j] is the side of the largest all-white
    # square whose bottom-right corner is at (i, j)
    for i in range(1, s):
        for j in range(1, s):
            if (work[i][j] > 0 ):
                p[i][j] = min(p[i][j-1], p[i-1][j], p[i-1][j-1]) + 1
            else:
                p[i][j] = 0

    # Save result - just for illustration purposes
    cv2.imwrite("result.png",p)

    # Work out what the actual answer is: (row, col) of the bottom-right corner
    ind = np.unravel_index(np.argmax(p, axis=None), p.shape)
    print(f'Location: {ind}')
    print(f'Length of side: {p[ind]}')

# Load image and threshold
im = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)
_, thr = cv2.threshold(im,127,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# Get largest white square
largestSquare(thr)
Output
Location: (21, 77)
Length of side: 18
Notes:
I edited out your red annotation so it didn't interfere with my algorithm.
I did Otsu thresholding to get pure black and white - that may or may not be appropriate to your use case. It will depend on your scans and paper background etc.
I scaled the image down to 100x100 so it doesn't take all day to run. You will need to scale the results back up to the size of your original image but I assume you can do that easily enough.
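A minimal sketch of that scaling step, assuming the page was resized straight to 100x100 as in largestSquare() and that orig_w/orig_h are the original page width and height; square_in_original is a hypothetical helper, not part of the answer. It also returns the center point and area the question asks for.

```python
def square_in_original(ind, side, orig_w, orig_h, s=100):
    """Map the result on the s x s image back to original pixel coordinates.

    ind is the (row, col) of the square's bottom-right corner, as returned by
    np.unravel_index. Because the page is resized non-uniformly to s x s,
    the square generally becomes a rectangle in the original image.
    """
    row, col = ind
    top, left = row - side + 1, col - side + 1
    sy, sx = orig_h / s, orig_w / s
    x0, y0 = left * sx, top * sy
    width, height = side * sx, side * sy
    center = (x0 + width / 2, y0 + height / 2)
    return (x0, y0, width, height), center, width * height
```

For instance, with the output above and a hypothetical 850x1100 page, square_in_original((21, 77), 18, 850, 1100) gives the box (510.0, 44.0, 153.0, 198.0) as (x, y, width, height), center (586.5, 143.0) and area 30294.0. For a square QR code, min(width, height) is the largest side that fits.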
Keywords: Image processing, image, Python, OpenCV, largest white square, largest empty space.
I'm working with license plates. What I do is apply a series of filters to them, such as:
Grayscale
Blur
Threshold
Binary
The problem is that when I do this, there are some contours at the borders, like in this image. How can I clear them, or make them just black (masked)? I used this code, but sometimes it fails.
# invert image and detect contours
inverted = cv2.bitwise_not(image_binary_and_dilated)
contours, hierarchy = cv2.findContours(inverted,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
# get the biggest contour
biggest_index = -1
biggest_area = -1
i = 0
for c in contours:
    area = cv2.contourArea(c)
    if area > biggest_area:
        biggest_area = area
        biggest_index = i
    i = i+1
print("biggest area: " + str(biggest_area) + " index: " + str(biggest_index))
cv2.drawContours(image_binary_and_dilated, contours, biggest_index, [0,0,255])
center, size, angle = cv2.minAreaRect(contours[biggest_index])
rot_mat = cv2.getRotationMatrix2D(center, angle, 1.)
#cv2.warpPerspective()
print(size)
dst = cv2.warpAffine(inverted, rot_mat, (int(size[0]), int(size[1])))
mask = dst * 0
x1 = max([int(center[0] - size[0] / 2)+1, 0])
y1 = max([int(center[1] - size[1] / 2)+1, 0])
x2 = int(center[0] + size[0] / 2)-1
y2 = int(center[1] + size[1] / 2)-1
point1 = (x1, y1)
point2 = (x2, y2)
print(point1)
print(point2)
cv2.rectangle(dst, point1, point2, [0,0,0])
cv2.rectangle(mask, point1, point2, [255,255,255], cv2.FILLED)
masked = cv2.bitwise_and(dst, mask)
#cv2_imshow(imgg)
cv2_imshow(dst)
cv2_imshow(masked)
#cv2_imshow(mask)
Some results:
The original plates were:
Good result 1
Good result 2
Good result 3
Good result 4
Bad result 1
Bad result 2
Binary plates are:
Image 1
Image 2
Image 3
Image 4
Image 5 - Bad result 1
Image 6 - Bad result 2
How can I fix this code? I just want to avoid those bad results, or at least reduce them.
INTRODUCTION
What you are asking starts to become complicated, and I believe there is no longer a right or wrong answer, just different ways to do this. Almost all of them will yield positive and negative results, most likely in different ratios. Getting 100% positive results is quite a challenging task, and I do believe my answer does not reach it. Yet it can be the basis for more sophisticated work towards that goal.
MY PROPOSAL
So, I want to make a different proposal here.
I am not 100% sure why you are doing all the steps, and I believe some of them could be unnecessary.
Let's start from the problem: you want to remove the white parts on the borders (which are not numbers).
So we need a way to distinguish them from the letters, in order to tackle them correctly.
If we just try to find contours and warp, it is likely to work on some images and not on others, because not all of them look the same. The hardest part is finding a general solution that works for many images.
What is the difference between the characteristics of the numbers and those of the borders (and other small points)?
After thinking about that, I would say: the shapes! That is, if you imagine a bounding box around a letter/number, it looks like a rectangle whose size is related to the image size. In the case of the borders, by contrast, the boxes are usually very long and narrow, or too small to be a letter/number (random points).
Therefore, my bet would be on segmentation, separating the features by their shape. So we take the binary image, remove some parts using the projections onto its axes (as you correctly asked in the previous question, and which I believe we should use), and get an image where each letter is separated from the white borders.
Then we can segment and check the shape of each segmented object, and if we think these are letters, we keep them, otherwise we discard them.
THE CODE
I wrote the code below as an example on your data. Some of the parameters are tuned on this set of images, so they may have to be relaxed for a larger dataset.
import cv2
import matplotlib.pyplot as plt
import numpy as np
# Jupyter magic; drop the next line when running as a plain script
%matplotlib inline
import scipy.ndimage as ndimage

# do this for all the images
num_images = 6
plt.figure(figsize=(16,16))
for k in range(num_images):

    # read the image
    binary_image = cv2.imread("binary_image/img{}.png".format(k), cv2.IMREAD_GRAYSCALE)
    # just for visualization purposes, I create another image with the same shape, to show what I am doing
    new_intermediate_image = np.zeros((binary_image.shape), np.uint8)
    new_intermediate_image += binary_image
    # here we will copy only the cleaned parts
    new_cleaned_image = np.zeros((binary_image.shape), np.uint8)

    ### THIS CODE COMES FROM THE PREVIOUS ANSWER:
    # https://stackoverflow.com/questions/62127537/how-to-clean-binary-image-using-horizontal-projection?noredirect=1&lq=1
    (rows,cols)=binary_image.shape
    h_projection = np.array([ x/rows for x in binary_image.sum(axis=0)])
    threshold_h = (np.max(h_projection) - np.min(h_projection)) / 10
    print("we will use threshold {} for horizontal".format(threshold_h))
    # select the black areas
    black_areas_horizontal = np.where(h_projection < threshold_h)
    for j in black_areas_horizontal:
        new_intermediate_image[:, j] = 0

    v_projection = np.array([ x/cols for x in binary_image.sum(axis=1)])
    threshold_v = (np.max(v_projection) - np.min(v_projection)) / 10
    print("we will use threshold {} for vertical".format(threshold_v))
    black_areas_vertical = np.where(v_projection < threshold_v)
    for j in black_areas_vertical:
        new_intermediate_image[j, :] = 0
    ### UNTIL HERE

    # define the features we are looking for
    # these parameters can also be tuned
    min_width = binary_image.shape[1] / 14
    max_width = binary_image.shape[1] / 2
    min_height = binary_image.shape[0] / 5
    max_height = binary_image.shape[0]
    print("we look for feature with width in [{},{}] and height in [{},{}]".format(min_width, max_width, min_height, max_height))
    # segment the image
    labeled_array, num_features = ndimage.label(new_intermediate_image)
    # loop over all features found (0 is the background, so labels run 1..num_features)
    for i in range(1, num_features + 1):
        # get a bounding box around them
        slice_x, slice_y = ndimage.find_objects(labeled_array==i)[0]
        roi = labeled_array[slice_x, slice_y]
        # check the shape, if the bounding box is what we expect, copy it to the new image
        if roi.shape[0] > min_height and \
                roi.shape[0] < max_height and \
                roi.shape[1] > min_width and \
                roi.shape[1] < max_width:
            new_cleaned_image += (labeled_array == i)

    # print all images on a grid
    plt.subplot(num_images,3,1+(k*3))
    plt.imshow(binary_image)
    plt.subplot(num_images,3,2+(k*3))
    plt.imshow(new_intermediate_image)
    plt.subplot(num_images,3,3+(k*3))
    plt.imshow(new_cleaned_image)
This produces the following output (in the grid, the left images are the inputs, the central ones are the images after the mask based on the histogram projections, and the right ones are the cleaned images):
CONCLUSIONS:
As said above, this method does not yield 100% positive results. The last picture has lower quality and some parts are unconnected, so they are lost in the process. I personally believe this is the price to pay for a cleaner image, and if you have a lot of images it won't be a problem: you can discard those kinds of images. Overall, I think this method returns quite clean images, where all the parts that are not letters or numbers are correctly removed.
ADVANTAGES
the image is clean: nothing but letters or numbers is kept
the parameters can be tuned, and should be consistent across images
in case of problems, some prints or some debugging in the loop that chooses which features to keep should make it easier to understand where the problems are and correct them
LIMITATIONS
it may fail in some cases where letters and numbers touch the white borders, which seems quite possible. This is partly handled by the black areas created using the projections, but I am not confident this will work 100% of the time.
some small parts of the numbers can be lost during the process, as in the last picture.
I have two images:
I want to measure how straight/smooth the text borders are rendered.
The first image is rendered perfectly straight, so it deserves a quality measure of 1. On the other hand, the second image is rendered with a lot of varying curves (rough, in a way), which is why it deserves a quality measure less than 1. How can I measure this using image processing, or any Python function, or any function written in another language?
Clarification :
There are font styles that are rendered originally with straight strokes but there are also font styles that are rendered smoothly just like the cursive font styles. What I'm really after is to differentiate the text border surface roughness of the characters by giving it a quality measure.
I want to measure how straight/smooth the text borders are rendered in an image.
Inversely, it can also be said that I want to measure how rough the text borders are rendered in an image.
I don't know any python function, but I would:
1) Use potrace to trace the edges and convert them to Bezier curves. Here's a visualization:
2) Then let's zoom in on the top part of the P, for example:
You draw lines perpendicular to the curve for a finite length (let's say 100 pixels). You plot the color intensity (you can convert to HSI or HSV and use one of those channels, or just convert to grayscale and take the pixel value directly) over that line:
3) Then you calculate the standard deviation of the derivative, treating the derivative as a distribution of positions along the line (i.e. how spread out the edge is). A small standard deviation means sharp edges; a large standard deviation means blurry edges. For a perfect edge, the standard deviation would be zero.
4) For every edge where you drew a perpendicular line, you now have a "smoothness" value. You can then average all the smoothness values per edge, per letter, per word or per image, as you see fit. Also, the more perpendicular lines you draw, the more accurate your smoothness value, but the more computationally intensive it becomes.
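One way to read step 3 is as the spatial spread of the edge: treat the absolute derivative of the intensity profile as weights over positions and take their weighted standard deviation. A minimal sketch for a single perpendicular profile, under that interpretation (edge_spread is a hypothetical name; sampling the profile along the normal to the potrace curve is not shown):

```python
import numpy as np


def edge_spread(profile):
    """Spatial spread (std, in samples) of the edge along one perpendicular line.

    |d(intensity)| is used as a weight distribution over positions:
    a perfect step gives 0, a blurry ramp gives a larger value.
    """
    d = np.abs(np.diff(np.asarray(profile, dtype=np.float64)))
    total = d.sum()
    if total == 0:
        return 0.0  # flat profile: no edge crossed
    pos = np.arange(d.size)
    mean = (pos * d).sum() / total
    return float(np.sqrt(((pos - mean) ** 2 * d).sum() / total))
```

A hard step such as [0, 0, 0, 255, 255, 255] gives 0.0, while a three-sample ramp [0, 0, 85, 170, 255, 255] gives about 0.82 samples.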
I would try something simple like creating a 'roughness' metric using a few functions from the opencv library, since it's easy to work with in Python (and C++, as well as other wrappers).
For example (without actual source, since I'm typing on my phone):
Preprocess to create binary images (many standard ways).
Use cv2.findContours to get outlines of the letters.
Use cv2.arcLength on each contour as denominators.
Use cv2.approxPolyDP to simplify each contour.
Use cv2.arcLength on each simplified contour as numerators.
Calculate ratios of simplified over full arc lengths.
In the final step, ratios closer to 1.0 required less simplification, so they're presumably less rough. Ratios closer to 0.0 required a lot of simplification, and are therefore probably very rough. Of course, you'll have to tweak the contour finding code to get appropriate outlines to work with, and you'll need to manage numerical precision to keep the math calculations meaningful, but hopefully the idea is clear enough.
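To try the ratio without OpenCV, here is a pure-Python sketch (Python 3.8+ for math.dist). It uses Ramer-Douglas-Peucker, the algorithm cv2.approxPolyDP is documented to use, but only for open polylines; real letter outlines are closed, which cv2.arcLength and cv2.approxPolyDP handle with closed=True. The function names and the default epsilon are assumptions.

```python
import math


def arc_length(points):
    """Length of an open polyline given as [(x, y), ...]."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def simplify(points, epsilon):
    """Ramer-Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    chord = math.dist(points[0], points[-1])
    # Find the interior point farthest from the end-to-end chord.
    best_i, best_d = 0, -1.0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        if chord == 0:
            d = math.dist(points[i], points[0])
        else:
            d = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / chord
        if d > best_d:
            best_i, best_d = i, d
    if best_d <= epsilon:
        return [points[0], points[-1]]
    left = simplify(points[:best_i + 1], epsilon)
    right = simplify(points[best_i:], epsilon)
    return left[:-1] + right  # drop the shared split point once


def roughness_ratio(points, epsilon=2.0):
    """Simplified arc length over full arc length: near 1.0 means smooth."""
    full = arc_length(points)
    return arc_length(simplify(points, epsilon)) / full if full else 1.0
```

A straight run of points gives exactly 1.0; a zigzag with 3-pixel teeth simplified with epsilon=5 collapses to its end-to-end chord, giving about 0.32.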
OpenCV also has the useful functions cv2.convexHull and cv2.convexityDefects that you might find interesting in related work. However, they didn't seem appropriate for the letters here, since internal features on letters like M for example would be more challenging to address.
Speaking of rough things, I admit this algorithmic outline is incredibly rough! However, I hope it gives you a useful idea to try that seems straightforward to implement quickly to start getting quantitative feedback.
One idea might be simply to compute the average number of vertices per character in Python/OpenCV using cv2.CHAIN_APPROX_SIMPLE.
Since you have the same characters and you want to know how straight they are: CHAIN_APPROX_SIMPLE keeps only the end points of horizontal, vertical and diagonal runs, so straight strokes contribute few vertices. For your first image, there should be far fewer vertices than for your second image.
CHAIN_APPROX_SIMPLE compresses horizontal, vertical, and diagonal
segments and leaves only their end points. For example, an up-right
rectangular contour is encoded with 4 points.
import cv2
import numpy as np
# read image
img = cv2.imread('lemper1.png')
#img = cv2.imread('lemper2.png')
# convert to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# threshold
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1]
# invert
thresh = 255 - thresh
# get contours and compute average number of vertices per character (contour)
result = img.copy()
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
num_contours = 0
total_vertices = 0
for cntr in contours:
    cv2.drawContours(result, [cntr], 0, (0,0,255), 1)
    num_vertices = len(cntr)
    total_vertices = total_vertices + num_vertices
    num_contours = num_contours + 1
smoothness = (total_vertices / num_contours)
print(smoothness)
# save resulting images
cv2.imwrite('lemper1_contours.png',result)
#cv2.imwrite('lemper2_contours.png',result)
# show thresh and result
cv2.imshow("thresh", thresh)
cv2.imshow("contours", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
First image average number of vertices: 49.666666666666664
Second image average number of vertices: 179.14285714285714
So a smaller number of vertices means straighter characters.
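Why rough outlines produce more vertices can be seen with a small pure-Python imitation of the compression quoted above. It is not OpenCV's code; it assumes the input is a closed chain of 8-connected pixel coordinates, and compress_chain is a hypothetical name.

```python
def compress_chain(points):
    """Keep only the end points of straight runs in a closed 8-connected chain.

    A point is dropped when the step into it and the step out of it have the
    same direction (horizontal, vertical or diagonal).
    """
    n = len(points)
    kept = []
    for i in range(n):
        px, py = points[i - 1]          # previous point (wraps around)
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]    # next point (wraps around)
        if (cx - px, cy - py) != (nx - cx, ny - cy):
            kept.append(points[i])
    return kept
```

The 10-pixel outline of a 4x3 rectangle compresses to its 4 corners, matching the documentation's example. On a jagged stroke, every small wobble changes the step direction, so each one adds vertices.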
Preface
There are a few nice ideas presented here based on the properties of the character contours as polylines. Whilst there are some inherent flaws in this approach, due to them being a function of resolution and scale, I would like to offer one further interpretation of the same idea. My algorithm is still susceptible, but it may offer a different perspective.
Theory
The method I propose is to compare common characters by the number of inflections in their contours. In this context, by inflection I mean a sign change between the cross products of successive polyline segments treated as vectors. For example, consider a polyline contour of a circle, starting at the mid y coordinate and the x+ most coordinate. If we were to trace the polyline contour CW (clockwise) around the perimeter, each line segment would be incrementally a CW rotation of the prior one. If at any time a segment turned "away" or "outwards", this rotation would be CCW (counter-clockwise) and the sign of the cross product would invert. A "rough" circle will therefore have inflections; a "perfect" or "smooth" circle will have none.
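A minimal Python sketch of the inflection count just described. It is not the author's Emgu.CV code; it assumes the contour has already been resampled to evenly spaced points (not shown), skips collinear turns (zero cross product), and count_inflections is a hypothetical name.

```python
def count_inflections(points):
    """Count sign changes of the cross product of successive segments.

    points is a closed polyline [(x, y), ...]. Collinear turns are skipped
    so they neither add nor hide an inflection.
    """
    n = len(points)
    signs = []
    for i in range(n):
        ax, ay = points[i]
        bx, by = points[(i + 1) % n]
        cx, cy = points[(i + 2) % n]
        # z-component of (b - a) x (c - b): the turn direction at b
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            signs.append(cross > 0)
    # Compare each turn with the next, wrapping around the closed contour.
    return sum(1 for i in range(len(signs)) if signs[i] != signs[(i + 1) % len(signs)])
```

A convex outline such as a square turns the same way at every vertex and gives 0, while a square with one notch cut into its top edge gives 2 (one turn into the notch, one back out).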
Algorithm
The algorithm follows the steps below, implemented with Emgu.CV in the C# code that follows:
The images are loaded and converted to binary by means of thresholding.
The binary images then undergo contour detection, and the contours are sorted by their bounding box, left to right, so that their indices match the order in which the characters occur.
Each contour is then re-pointed (resampled) to an equal number of segments in order to normalize for scale and resolution differences between images/characters.
Each contour is "walked" and its number of inflections counted.
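The `RepointContours` helper is not shown in this answer, so its exact behaviour is unknown. A plausible sketch, assuming that "re-pointing" means resampling the closed contour to `n` points spaced evenly by arc length, could look like this (the name `resample_closed` is hypothetical):

```python
import math


def resample_closed(points, n):
    """Return n points spaced evenly by arc length around a closed polyline."""
    # Edges of the closed polyline, including the closing edge back to points[0]
    edges = [(points[i], points[(i + 1) % len(points)]) for i in range(len(points))]
    lengths = [math.dist(a, b) for a, b in edges]
    step = sum(lengths) / n
    result = []
    edge, walked = 0, 0.0  # current edge index and perimeter length before it
    for k in range(n):
        target = k * step
        # Advance to the edge that contains the target arc length
        while walked + lengths[edge] < target:
            walked += lengths[edge]
            edge += 1
        (x0, y0), (x1, y1) = edges[edge]
        # Linear interpolation along the current edge (guard zero-length edges)
        t = (target - walked) / lengths[edge] if lengths[edge] else 0.0
        result.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return result
```

Because every contour ends up with the same number of points, a large character and a small one are walked at the same relative granularity, which is the normalization the step above is after.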
// [Some basic extensions are omitted for clarity]
// Load the images
Image<Rgb, byte> baseLineImage = new Image<Rgb, byte>("BaseLine.png");
Image<Rgb, byte> testCaseImage = new Image<Rgb, byte>("TestCase.png");
// Convert them to Gray Scale
Image<Gray, byte> baseLineGray = baseLineImage.Convert<Gray, byte>();
Image<Gray, byte> testCaseGray = testCaseImage.Convert<Gray, byte>();
// Threshold the images to binary
Image<Gray, byte> baseLineBinary = baseLineGray.ThresholdBinaryInv(new Gray(100), new Gray(255));
Image<Gray, byte> testCaseBinary = testCaseGray.ThresholdBinaryInv(new Gray(100), new Gray(255));
// Some dilation required on the test image so that the characters are continuous
testCaseBinary = testCaseBinary.Dilate(3);
// Extract the contours from the images to isolate the character profiles
// and sort them left to right so that the indices match the character order
VectorOfVectorOfPoint baseLineContours = new VectorOfVectorOfPoint();
Mat baseHierarchy = new Mat();
CvInvoke.FindContours(
baseLineBinary,
baseLineContours,
baseHierarchy,
RetrType.External,
ChainApproxMethod.ChainApproxSimple);
var baseLineContoursList = baseLineContours.ToList();
baseLineContoursList.Sort(new ContourComparer());
VectorOfVectorOfPoint testCaseContours = new VectorOfVectorOfPoint();
Mat testHierarchy = new Mat();
CvInvoke.FindContours(
testCaseBinary,
testCaseContours,
testHierarchy,
RetrType.External,
ChainApproxMethod.ChainApproxSimple);
var testCaseContoursList = testCaseContours.ToList();
testCaseContoursList.Sort(new ContourComparer());
var baseLineRepointedContours = RepointContours(baseLineContoursList, 50);
var testCaseRepointedContours = RepointContours(testCaseContoursList, 50);
var baseLineInflectionCounts = GetContourInflections(baseLineRepointedContours);
var testCaseInflectionCounts = GetContourInflections(testCaseRepointedContours);
Inflection Detection/Counting
static List<List<Point>> GetContourInflections(List<VectorOfPoint> contours)
{
    // A resultant list to return the inflection points
    List<List<Point>> result = new List<List<Point>>();
    // The prior-to-forward cross product at each vertex
    List<double> crossProducts;
    // Points used to store 2D vectors as X,Y (I,J)
    Point priorVector, forwardVector;
    foreach (VectorOfPoint contour in contours)
    {
        crossProducts = new List<double>();
        for (int p = 0; p < contour.Size; p++)
        {
            // Determine the vector from the prior vertex to this one
            // If this is the first vertex, loop back to the last vertex
            Point prior = p == 0 ? contour[contour.Size - 1] : contour[p - 1];
            priorVector = new Point(contour[p].X - prior.X, contour[p].Y - prior.Y);
            // Determine the vector from this vertex to the next one
            // If this is the last vertex, loop back to vertex 0
            Point next = p == contour.Size - 1 ? contour[0] : contour[p + 1];
            forwardVector = new Point(next.X - contour[p].X, next.Y - contour[p].Y);
            // Calculate the cross product of the forward and prior vectors
            crossProducts.Add(forwardVector.X * priorVector.Y - forwardVector.Y * priorVector.X);
        }
        // Given the calculated cross products, detect the inflection points
        List<Point> inflectionPoints = new List<Point>();
        for (int p = 1; p < contour.Size; p++)
        {
            // If there is a sign change between this and the prior cross product, an inflection,
            // or change from CW to CCW bearing increments, has occurred. Changes to and from
            // zero products are ignored
            if ((crossProducts[p] > 0 && crossProducts[p - 1] < 0) ||
                (crossProducts[p] < 0 && crossProducts[p - 1] > 0))
            {
                inflectionPoints.Add(contour[p]);
            }
        }
        result.Add(inflectionPoints);
    }
    return result;
}
Output
L: Baseline Inflections:0 Testcase Inflections:22
E: Baseline Inflections:1 Testcase Inflections:16
M: Baseline Inflections:4 Testcase Inflections:15
P: Baseline Inflections:11 Testcase Inflections:17
E: Baseline Inflections:1 Testcase Inflections:10
R: Baseline Inflections:9 Testcase Inflections:16
Contours (Blue) and Inflections (Red)
I am stuck on a problem where I want to separate an object from the background (a semi-transparent white sheet with a backlight): a fixed rough line in the background is merged with the object. My algorithm so far: I take the image from the camera, smooth it with a Gaussian blur, extract the Value component from HSV, and apply local binarization using Wolf's method to get the binarized image. I then use OpenCV's connected-components algorithm to remove small artifacts that are not connected to the object, as seen here. Now only this line artifact remains, merged with the object, but I want only the object, as seen in this image. Please note that there are 2 lines in the binary image, so I think using 8-connectivity logic to detect lines that do not form a loop is not possible; I have also tried it. Here is the code for that:
import cv2
import numpy as np

# thresh_img: binary (0/255) uint8 image from the previous step
size = np.size(thresh_img)
skel = np.zeros(thresh_img.shape, np.uint8)
element = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
done = False
# morphological skeleton: repeatedly erode and keep what each opening removes
while not done:
    eroded = cv2.erode(thresh_img, element)
    temp = cv2.dilate(eroded, element)
    temp = cv2.subtract(thresh_img, temp)
    skel = cv2.bitwise_or(skel, temp)
    thresh_img = eroded.copy()
    zeros = size - cv2.countNonZero(thresh_img)
    if zeros == size:
        done = True
# set max pixel value to 1
s = np.uint8(skel > 0)
count = 0
i = 0
while count != np.sum(s):
    # non-zero pixel count
    count = np.sum(s)
    # examine 3x3 neighborhood of each pixel
    filt = cv2.boxFilter(s, -1, (3, 3), normalize=False)
    # if the center pixel of 3x3 neighborhood is zero, we are not interested in it
    s = s * filt
    # now we have pixels where the center pixel of 3x3 neighborhood is non-zero
    # if a pixel's 8-connectivity is less than 2 we can remove it
    # threshold is 3 here because the boxfilter also counted the center pixel
    s[s < 3] = 0
    # set max pixel value to 1
    s[s > 0] = 1
    i = i + 1
Any help in the form of code would be highly appreciated. Thanks.
Since you are already using connectedComponents, the best way is to exclude not only the components that are small but also the ones that touch the borders of the image.
You can find which ones to discard using connectedComponentsWithStats(), which also gives you the bounding box of each component.
Alternatively, and very similarly, you can switch from connectedComponents() to findContours(), which gives you the components directly, so you can discard the external ones and the small ones to retrieve the part you are interested in.
I have found a plethora of questions regarding finding "things" in images using OpenCV et al. in Python, but so far I have been unable to piece them together into a reliable solution to my problem.
I am attempting to use computer vision to help count tiny surface mount electronics parts. The idea is for me to dump parts onto a solid color piece of paper, snap a picture, and have the software tell me how many items are in it.
The "things" differ from one picture to the next but will always be identical in any one image. I seem to be able to manually tune the parameters for things like hue/saturation for a particular part but it tends to require tweaking every time I change to a new part.
My current, semi-functioning code is posted below:
import sys

import cv2
import numpy


def part_area(contours, round=10):
    """Finds the mode of the contour area. The idea is that most of the parts in an image will be separated and that
    finding the most common area in the list of areas should provide a reasonable value to approximate by. The areas
    are rounded to the nearest multiple of `round` to reduce the list of options."""
    # Start with a list of all of the areas for the provided contours.
    areas = [cv2.contourArea(contour) for contour in contours]
    # Determine a threshold for the minimum amount of area as 1% of the overall range.
    threshold = (max(areas) - min(areas)) / 100
    # Trim the list of areas down to only those that exceed the threshold.
    thresholded = [area for area in areas if area > threshold]
    # Round the areas to the nearest value set by the round argument.
    rounded = [int((area + (round / 2)) / round) * round for area in thresholded]
    # Remove any areas that rounded down to zero.
    cleaned = [area for area in rounded if area != 0]
    # Count the areas with the same values.
    counts = {}
    for area in cleaned:
        if area not in counts:
            counts[area] = 0
        counts[area] += 1
    # Reduce the areas down to only those that are in groups of three or more with the same area.
    above = []
    for area, count in counts.items():
        if count > 2:
            for _ in range(count):
                above.append(area)
    # Take the mean of the areas as the average part size.
    average = sum(above) / len(above)
    return average


def find_hue_mode(hsv):
    """Given an HSV image as an input, compute the mode of the list of hue values to find the most common hue in the
    image. This is used to determine the center for the background color filter."""
    # Histogram of the (uint8) hue channel; the most populated bin is the mode.
    return int(numpy.bincount(hsv[:, :, 0].ravel()).argmax())


if __name__ == "__main__":
    # load the image
    image = cv2.imread(sys.argv[1])
    # find the most common hue, which is assumed to be the background paper
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    center = find_hue_mode(hsv)
    print('Center Hue:', center)
    lower = numpy.array([center - 10, 50, 50])
    upper = numpy.array([center + 10, 255, 255])
    # Threshold the HSV image to get only the background color, then invert it so the parts are white
    mask = cv2.inRange(hsv, lower, upper)
    inverted = cv2.bitwise_not(mask)
    blurred = cv2.GaussianBlur(inverted, (5, 5), 0)
    edged = cv2.Canny(blurred, 50, 100)
    dilated = cv2.dilate(edged, None, iterations=1)
    eroded = cv2.erode(dilated, None, iterations=1)
    # find contours in the processed edge image
    contours = cv2.findContours(eroded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # OpenCV 2.4 and 4.x return (contours, hierarchy); 3.x returns (image, contours, hierarchy)
    contours = contours[0] if len(contours) == 2 else contours[1]
    # Compute the area for a single part to use when setting the threshold and calculating the number of parts within
    # a contour area.
    single_part_area = part_area(contours)
    # The threshold for a part's area - can't be too much smaller than the part itself.
    threshold = single_part_area * 0.5
    part_count = 0
    for contour in contours:
        if cv2.contourArea(contour) < threshold:
            continue
        # Sometimes parts are close enough together that they become one in the image. To battle this, the total area
        # of the contour is divided by the area of a part (derived earlier).
        part_count += int((cv2.contourArea(contour) / single_part_area) + 0.1)  # this 0.1 "rounds up" slightly and was determined empirically
        # Draw an approximate contour around each detected part to give the user an idea of what the tool has computed.
        epsilon = 0.1 * cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, epsilon, True)
        cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)
    # Print the part count and show off the processed image.
    print('Part Count:', part_count)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
Here's an example of the type of input image I am using:
or this:
And I'm currently getting results like this:
The results clearly show that the script has trouble identifying some parts, and its true Achilles' heel seems to be when parts touch one another.
So my question/challenge is, what can I do to improve the reliability of this script?
The script is to be integrated into an existing Python tool so I am searching for a solution using Python. The solution does not need to be pure Python as I am willing to install whatever 3rd party libraries might be needed.
If the objects are all of similar types, you might have more success isolating a single example in the image and then using feature matching to detect them.
A full solution would be out of scope for Stack Overflow, but my suggestion for progress would be to first somehow find one or more "correct" examples using your current rectangle retrieval method. You could probably look for all your samples that are of the expected size, or that are accurate rectangles.
Once you have isolated a few positive examples, use feature matching techniques to find the others. You will probably need to do a fair amount of reading on it, but it is a potential solution.
In general, you use your positive examples to find "features" of the object you want to detect. These "features" are generally things like corners or changes in gradient. OpenCV contains many feature detectors you can use, such as ORB, SIFT and AKAZE.
Once you have the features, there are several algorithms in OpenCV you can look at that will search the image for all matching features. You’ll want one that is rotation invariant (can detect the same features arranged in different rotation), but you probably don’t need scale invariance (can detect the same features at multiple scales).
My one concern with this method is that the items you are searching for in your images are quite small. It might be difficult to find good, consistent features to match on.
You're tackling a 2D object recognition problem, for which there are many possible approaches. You've gone about it using background/foreground segmentation, which is fine because you have control over the scene (laying down the background paper sheet). However, this will always have fundamental limitations when the objects touch. A simple solution to your problem can be this:
1) You assume that touching objects are rare events (which is a fine assumption in your problem). Therefore you can compute the areas for each segmented region, and compute the median of these, which will give a robust estimate for the object's area. Let's call this robust estimate A (in squared pixels). This will be fine if fewer than 50% of regions correspond to touching objects.
2) You then proceed to measure the number of objects in each segmented region. Let Ai be the area of the ith region. You then compute the number of objects in each region by Ni=round(Ai/A). You then sum Ni to give you the total number of objects.
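Steps 1) and 2) can be sketched in plain Python, assuming you already have the list of region areas (for example from `cv2.contourArea` or the `CC_STAT_AREA` column of `cv2.connectedComponentsWithStats`). The function name `count_objects` is hypothetical. Note that Python's built-in `round` rounds exact halves to the nearest even integer.

```python
from statistics import median


def count_objects(region_areas):
    """Estimate the total object count from the areas (in squared pixels) of segmented regions."""
    if not region_areas:
        return 0
    # Robust single-object area A, valid if fewer than 50% of regions are merged objects
    a = median(region_areas)
    # N_i = round(A_i / A), summed over all regions
    return sum(round(ai / a) for ai in region_areas)
```

For example, areas of `[100, 102, 98, 205, 101, 99, 300]` give a median of 101, so the two large regions count as 2 and 3 objects and the total is 10.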
This approach will be fine as long as the following conditions are met:
A) The touching objects do not significantly overlap
B) You do not have objects lying on their sides. If you do, you might be able to deal with this using two area estimates (side and flat). Better to eliminate this scenario if you can, for simplicity.
C) The objects are all roughly the same distance to the camera. If this is not the case then the areas of the objects (in pixels) cannot be modelled well by a single value.
D) There are not partially visible objects at the borders of the image.
E) You ensure that only the same type of object is visible in each image.