I have an array of bounding boxes from the object detection system.
They are in the format:
[[x,y], [x,y], [x,y], [x,y]]
I want to find the largest bounding box that does not intersect any of the other provided boxes and is not inside an excluded box.
I am using Python, but a response in any programming language is welcome :)
Visual example
How I tried and failed to solve this problem.
Approach I.
Iterate over every point and find the min and max of x and y.
Then crop to a polygon using these coordinates.
The problem is that, on the example image, this algorithm would remove the top part of the image even though there is no need to, just because of the top-left and top-right boxes.
Approach II.
Crop only one side at a time, because in my dataset the things to exclude are usually on one side, e.g. remove the top 100 px.
So I calculated the min and max of x and y like before.
Then I calculated the area of every possible cut (left, right, top, bottom) and chose the one with the smallest area.
This approach fails as soon as there are boxes on two sides of the picture, e.g. left and right.
Consider a full rectangle (initially the whole picture) and take away one excluded box. You will get 2x2x2x2 = 16 possible rectangular subdivisions, for example this one.
┌────────────────────────┐
│ │
│ │
├───────┬───────┬────────┤
│ │ exc │ │
│ │ lude │ │
│ ├───────┴────────┤
│ │ │
│ │ │
└───────┴────────────────┘
For each box in the subdivision, take away the next excluded box.
Do this N times, and take the biggest box of the final step.
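A minimal sketch of this subdivision idea (the function names are mine, and I collapse the 16 subdivisions to the four maximal left/right/top/bottom slabs, which is enough because any empty rectangle avoiding a hole must fit entirely in one of them):

```python
# Rectangles are (x1, y1, x2, y2).
def subtract(rect, hole):
    """Remove `hole` from `rect`, returning the maximal leftover sub-rectangles."""
    x1, y1, x2, y2 = rect
    hx1, hy1, hx2, hy2 = hole
    # no overlap: the rectangle survives untouched
    if hx1 >= x2 or hx2 <= x1 or hy1 >= y2 or hy2 <= y1:
        return [rect]
    pieces = []
    if hx1 > x1: pieces.append((x1, y1, hx1, y2))   # left slab
    if hx2 < x2: pieces.append((hx2, y1, x2, y2))   # right slab
    if hy1 > y1: pieces.append((x1, y1, x2, hy1))   # top slab
    if hy2 < y2: pieces.append((x1, hy2, x2, y2))   # bottom slab
    return pieces

def largest_empty(rect, holes):
    """Largest sub-rectangle of `rect` overlapping none of `holes`."""
    rects = [rect]
    for hole in holes:
        rects = [piece for r in rects for piece in subtract(r, hole)]
    return max(rects, key=lambda r: (r[2] - r[0]) * (r[3] - r[1]), default=None)
```

Note that the number of candidate rectangles can grow by a factor of up to 4 per excluded box, so with many boxes you would want to prune duplicates and rectangles contained inside other candidates.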
Here's a potential solution to find the bounding box contour with the largest surface area. We have two requirements:
Largest bounding box is not intersecting with any other box
Largest bounding box is not inside another box
Essentially we can reword the two requirements to this:
Given C1 and C2, determine if C1 and C2 intersect
Given C1 and C2, check if there is a point from C1 in C2
To solve #1, we can create a contour_intersect function that detects intersection using a logical AND via np.logical_and(). The idea is to create a separate mask for each contour and then apply the logical AND operation to them. Any points that have a positive value (1 or True) are points of intersection. If the entire array is False, then there was no intersection between the contours. But if there is a single True, then the contours touched at some point and thus intersect.
For #2, we can create a function contour_inside and use cv2.pointPolygonTest() to determine if a point is inside, outside, or on the edge of a contour. The function returns +1, -1, or 0 to indicate if a point is inside, outside, or on the contour, respectively. We find the centroid of C1 and then check if that point is inside C2.
Here's an example to visualize the scenarios:
Input image with three contours. Nothing special here, the expected answer would be the contour with the largest area.
Answer:
Contour #0 is the largest
Next we add two additional contours. Contour #3 will represent the intersection scenario and contour #4 will represent the inside contour scenario.
Answer:
Contour #0 has failed test
Contour #1 has failed test
Contour #2 is the largest
To solve this problem, we find contours then sort using contour area from largest to smallest. Next, we compare this contour with all other contours and check the two cases. If either case fails, we dump the current contour and move onto the next largest contour. The first contour that passes both tests for all other contours is our largest bounding box contour. Normally, contour #0 would be our largest but it fails the intersection test. We then move onto contour #1 but this fails the inside test. Thus the last remaining contour that passes both tests is contour #2.
import cv2
import numpy as np
# Check if C1 and C2 intersect
def contour_intersect(original_image, contour1, contour2):
    # Two separate contours trying to check intersection on
    contours = [contour1, contour2]

    # Create image filled with zeros the same size of original image
    blank = np.zeros(original_image.shape[0:2])

    # Copy each contour into its own image and fill it with '1'
    image1 = cv2.drawContours(blank.copy(), contours, 0, 1)
    image2 = cv2.drawContours(blank.copy(), contours, 1, 1)

    # Use the logical AND operation on the two images
    # There should be a '1' or 'True' where the contours intersect
    # and a '0' or 'False' where they didn't
    intersection = np.logical_and(image1, image2)

    # Check if there was a '1' in the intersection
    return intersection.any()
# Check if C1 is in C2
def contour_inside(contour1, contour2):
    # Find centroid of C1
    M = cv2.moments(contour1)
    cx = int(M['m10']/M['m00'])
    cy = int(M['m01']/M['m00'])

    inside = cv2.pointPolygonTest(contour2, (cx, cy), False)

    if inside == 0 or inside == -1:
        return False
    elif inside == 1:
        return True
# Load image, convert to grayscale, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find contours, sort by contour area from largest to smallest
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
sorted_cnts = sorted(cnts, key=lambda x: cv2.contourArea(x), reverse=True)
# "Intersection" and "inside" contours
# Add both contours to test
# --------------------------------
intersect_contour = np.array([[[230, 93]], [[230, 187]], [[326, 187]], [[326, 93]]])
sorted_cnts.append(intersect_contour)
cv2.drawContours(original, [intersect_contour], -1, (36,255,12), 3)
inside_contour = np.array([[[380, 32]], [[380, 229]], [[740, 229]], [[740, 32]]])
sorted_cnts.append(inside_contour)
cv2.drawContours(original, [inside_contour], -1, (36,255,12), 3)
# --------------------------------
# Find centroid for each contour and label contour number
for count, c in enumerate(sorted_cnts):
    M = cv2.moments(c)
    cx = int(M['m10']/M['m00'])
    cy = int(M['m01']/M['m00'])
    cv2.putText(original, str(count), (cx-5, cy+5), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (246,255,12), 3)
# Find largest bounding box contour
largest_contour_name = ""
largest_contour = ""
contours_length = len(sorted_cnts)
for i1 in range(contours_length):
    found = True
    for i2 in range(i1 + 1, contours_length):
        c1 = sorted_cnts[i1]
        c2 = sorted_cnts[i2]
        # Test intersection and "inside" contour
        if contour_intersect(original, c1, c2) or contour_inside(c1, c2):
            print('Contour #{} has failed test'.format(i1))
            found = False
            break
    if found:
        largest_contour_name = i1
        largest_contour = sorted_cnts[i1]
        break
print('Contour #{} is the largest'.format(largest_contour_name))
print(largest_contour)
# Display
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.imshow('original', original)
cv2.waitKey()
Note: The assumption is that you have an array of contours from cv2.findContours() with the format like this example:
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
sorted_cnts = sorted(cnts, key=lambda x: cv2.contourArea(x), reverse=True)
for c in sorted_cnts:
    print(c)
    print(type(c))
    x,y,w,h = cv2.boundingRect(c)
    print((x,y,w,h))
Output
[[[230 93]]
[[230 187]]
[[326 187]]
[[326 93]]]
<class 'numpy.ndarray'>
(230, 93, 97, 95)
Performance note: The intersection check function suffers on the performance side, since it allocates three image-sized arrays per call to draw the contours; execution time therefore grows with the number of contours and the size of the input image. I'll leave this optimization step to you!
You can use the cv2.boundingRect() method to get the x, y, w, h of each bounding box. With those values, you can use the condition x2 + w2 > x1 > x2 - w1 and y2 + h2 > y1 > y2 - h1 to check whether any two bounding boxes intersect or lie within each other:
import cv2
import numpy as np
def intersect(b1, b2):
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    return x2 + w2 > x1 > x2 - w1 and y2 + h2 > y1 > y2 - h1
# Here I am generating a random array of 10 boxes in the format [[x,y], [x,y], [x,y], [x,y]]
np.random.seed(55)
boxes = np.random.randint(10, 150, (10, 4, 2)) + np.random.randint(0, 300, (10, 1, 2))
bounds = [cv2.boundingRect(box) for box in boxes]
valids = [b1 for b1 in bounds if not any(intersect(b1, b2) for b2 in bounds if b1 != b2)]
if valids:
    x, y, w, h = max(valids, key=lambda b: b[2] * b[3])
    print(f"x: {x} y: {y} w: {w} h: {h}")
else:
    print("All boxes intersect.")
Output:
x: 75 y: 251 w: 62 h: 115
For visualization:
import cv2
import numpy as np
def intersect(b1, b2):
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    return x2 + w2 > x1 > x2 - w1 and y2 + h2 > y1 > y2 - h1
np.random.seed(55)
boxes = np.random.randint(10, 150, (10, 4, 2)) + np.random.randint(0, 300, (10, 1, 2))
bounds = [cv2.boundingRect(box) for box in boxes]
valids = [b1 for b1 in bounds if not any(intersect(b1, b2) for b2 in bounds if b1 != b2)]
img = np.zeros((500, 500), "uint8")
for x, y, w, h in bounds:
    cv2.rectangle(img, (x, y), (x + w, y + h), 255, 1)

if valids:
    x, y, w, h = max(valids, key=lambda b: b[2] * b[3])
    cv2.rectangle(img, (x, y), (x + w, y + h), 128, -1)
cv2.imshow("IMAGE", img)
cv2.waitKey(0)
Output:
Assumption: you want the largest box from your array that complies with your rules, not the largest NEW bounding box that complies.
This is pseudocode; you still have to fill in the blanks:
int largestBoxIndex = -1;
int largestBoxArea = -1;

for (i = 0; i < allBoxes.length; i++)
{
    box currentBox = allBoxes[i];
    bool isComply = true;

    for (j = 0; j < allBoxes.length; j++)
    {
        if (i == j) continue;
        box comparedBox = allBoxes[j];
        if (isIntersected(currentBox, comparedBox)) { isComply = false; break; }
        if (isInside(currentBox, comparedBox))      { isComply = false; break; }
    }

    if (isComply && Area(currentBox) > largestBoxArea)
    {
        largestBoxArea = Area(currentBox);
        largestBoxIndex = i;
    }
}

if (largestBoxIndex != -1)
    allBoxes[largestBoxIndex]; // this is the largest box
A simple mathematical solution to the problem
Suppose you are given 5 rectangles as shown below:
rects = [[100, 100, 200, 200],
[200, 200, 200, 200],
[200, 500, 200, 200],
[350, 50, 150, 200],
[500, 400, 200, 300]]
Note that the format of these rectangles is: [x, y, width, height]
Where (x, y) is the coordinate of the top-left corner of the rectangle, and width & height are the width and height of the rectangle respectively. You will have to convert your coordinates to this format first.
3 out of these 5 are intersecting.
Now what we will do is iterate over these rectangles one by one, and for each rectangle, find the intersection of this rectangle with the other rectangles one by one. If any rectangle is found to be intersecting with any of the other rectangles, then we'll set the flag value for the two rectangles as 0. If a rectangle is found not to be intersecting with any other rectangle, then its flag value will be set to 1. (Default flag value is -1). Finally, we'll find the rectangle of the greatest area among the rectangles with flag value 1.
Let's see the code for finding the intersection area of the two rectangles:
# Rect : [x, y, w, h]
def Intersection(Rect1, Rect2):
    x = max(Rect1[0], Rect2[0])
    y = max(Rect1[1], Rect2[1])
    w = min(Rect1[0] + Rect1[2], Rect2[0] + Rect2[2]) - x
    h = min(Rect1[1] + Rect1[3], Rect2[1] + Rect2[3]) - y
    if w < 0 or h < 0:
        return None
    return [x, y, w, h]
This function will return None if there is no intersecting area between the rectangles, or it will return the coordinates of the intersection rectangle. (Ignore this value for the current problem; it might be helpful in other problems.)
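For example (repeating the function so the snippet runs standalone), the first two sample rects overlap in a 100x100 region, while the first and the last are disjoint:

```python
# Rect : [x, y, w, h]
def Intersection(Rect1, Rect2):
    x = max(Rect1[0], Rect2[0])
    y = max(Rect1[1], Rect2[1])
    w = min(Rect1[0] + Rect1[2], Rect2[0] + Rect2[2]) - x
    h = min(Rect1[1] + Rect1[3], Rect2[1] + Rect2[3]) - y
    if w < 0 or h < 0:
        return None
    return [x, y, w, h]

print(Intersection([100, 100, 200, 200], [200, 200, 200, 200]))  # [200, 200, 100, 100]
print(Intersection([100, 100, 200, 200], [500, 400, 200, 300]))  # None
```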
Now, let's have a look at the algorithm.
n = len(rects)

# -1 : Not determined
#  0 : Intersects with some
#  1 : No intersection
flag = [-1]*n

for i in range(n):
    if flag[i] == 0:
        continue
    isIntersecting = False
    for j in range(n):
        if i == j or flag[j] == 1:
            continue
        Int_Rect = Intersection(rects[i], rects[j])
        if Int_Rect is not None:
            isIntersecting = True
            flag[j] = 0
            flag[i] = 0
            break
    if isIntersecting == False:
        flag[i] = 1
# Finding the maximum area rectangle without any intersection.
maxRect = None
maxArea = -1
for i in range(n):
    if flag[i] == 1:
        if rects[i][2] * rects[i][3] > maxArea:
            maxRect = rects[i]
            maxArea = rects[i][2] * rects[i][3]
print(maxRect)
Note: Add the "excluded areas" rectangle coordinates to the rects list and assign their flag value as 0 to avoid them from getting selected as the maximum area rectangle.
This solution does not involve any image operations, so it should be the fastest approach here, even without further optimization.
Find the biggest square in numpy array
Maybe this would help? If you know the size of the whole area, you can compute the biggest box within a numpy array: set all of your given boxes to 1 and the rest of the area to 0, then find the largest rectangular region that contains no 1s.
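That suggestion can be sketched with the classic largest-rectangle-in-a-histogram dynamic program (my own implementation and names; it assumes the excluded boxes have already been painted into the grid as 1s, e.g. grid[y1:y2, x1:x2] = 1 for each box):

```python
import numpy as np

def largest_zero_rect(grid):
    """Largest all-zero axis-aligned rectangle in a 2D 0/1 array, as (x, y, w, h)."""
    rows, cols = grid.shape
    heights = np.zeros(cols, dtype=int)
    best, best_area = (0, 0, 0, 0), 0
    for y in range(rows):
        # histogram of consecutive zeros ending at row y
        heights = np.where(grid[y] == 0, heights + 1, 0)
        stack = []  # (start_column, height), heights kept increasing
        for x, h in enumerate(list(heights) + [0]):  # trailing 0 flushes the stack
            start = x
            while stack and stack[-1][1] >= h:
                sx, sh = stack.pop()
                if sh * (x - sx) > best_area:
                    best_area = sh * (x - sx)
                    best = (sx, y - sh + 1, x - sx, sh)
                start = sx
            stack.append((start, h))
    return best
```

This runs in O(rows * cols) over the grid, independent of the number of boxes painted into it.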
Here's an O(n^2) solution. find_maxbox takes an array of rectangles, converts them into Box objects, and then compares each pair of boxes to eliminate invalid rectangles. This solution assumes that the rectangles' sides are parallel to the X-Y axes.
class Box():
    def __init__(self, coordinates):
        self.coordinates = tuple(sorted(coordinates))
        self.original = coordinates
        self.height = abs(self.coordinates[0][1] - self.coordinates[3][1])
        self.width = abs(self.coordinates[0][0] - self.coordinates[3][0])
        self.excluded = False

    def __eq__(self, b2):
        return self.coordinates == b2.coordinates

    def get_area(self):
        return self.height * self.width

    def bounding_box(self, b2):
        maxX, maxY = map(max, zip(*self.coordinates, *b2.coordinates))
        minX, minY = map(min, zip(*self.coordinates, *b2.coordinates))
        return Box([(minX, minY), (maxX, minY), (minX, maxY), (maxX, maxY)])

    def intersects(self, b2):
        box = self.bounding_box(b2)
        return box.height < self.height + b2.height and box.width < self.width + b2.width

    def encloses(self, b2):
        return self == self.bounding_box(b2)

    def exclude(self):
        self.excluded = True

    def is_excluded(self):
        return self.excluded

    def __str__(self):
        return str(self.original)

    def __repr__(self):
        return str(self.original)
# Pass array of rectangles as argument.
def find_maxbox(boxes):
    boxes = sorted(map(Box, boxes), key=Box.get_area, reverse=True)
    _boxes = []
    _boxes.append((boxes[0], boxes[0]))
    for b1 in boxes[1:]:
        b2, bb2 = _boxes[-1]
        bbox = b1.bounding_box(bb2)
        if not b1.intersects(bb2):
            _boxes.append((b1, bbox))
            continue
        for (b2, bb2) in reversed(_boxes):
            if not b1.intersects(bb2):
                break
            if b1.intersects(b2):
                if b2.encloses(b1):
                    b1.exclude()
                    break
                b1.exclude()
                b2.exclude()
        _boxes.append((b1, bbox))
    for box in boxes:
        if box.is_excluded():
            continue
        return box.original
    return None
In other words:
rectangles that share points are excluded
of the remaining rectangles, take the largest
No need for contours, centroids, bounding boxes, masking or redrawing pixels!
As stated before, in the provided case, the rectangles' coordinates contain duplicates. Here, we use a single class to store the outer limits of the rectangle. The Separating Axis Theorem from this answer by @samgak is used in an intersects() method.
from __future__ import annotations  # optional
from dataclasses import dataclass

@dataclass
class Rectangle:
    left: int
    top: int
    right: int
    bottom: int

    def __repr__(self):
        """String representation of the rectangle's coordinates."""
        return f"⟔ {self.left},{self.top} ⟓ {self.right},{self.bottom}"

    def intersects(self, other: Rectangle):
        """Whether this Rectangle shares points with another Rectangle."""
        h = self.right < other.left or self.left > other.right
        v = self.bottom < other.top or self.top > other.bottom
        return not (h or v)

    def size(self):
        """An indicator of the Rectangle's size, equal to half the perimeter."""
        return self.right - self.left + self.bottom - self.top
main = Rectangle(100, 100, 325, 325)
others = {
0: Rectangle(100, 100, 400, 400),
1: Rectangle(200, 200, 300, 300),
2: Rectangle(200, 300, 300, 500),
3: Rectangle(300, 300, 500, 500),
4: Rectangle(500, 500, 600, 600),
5: Rectangle(350, 350, 600, 600),
}
for i, r in others.items():
    print(i, main.intersects(r), r.size())
Simply put, h is True if the other rectangle is completely to the left or to the right; v is True if it's completely above or below. The intersects() method returns True if the rectangles share points (even so much as a corner).
Output:
0 True 600
1 True 200
2 True 300
3 True 400
4 False 500
5 False 200
It is then trivial to find the largest:
valid = {r.size():i for i, r in others.items() if not main.intersects(r)}
print('Largest:', valid[max(valid)], 'with size', max(valid))
Output:
Largest: 4 with size 500
This answer assumes left < right and top < bottom for all rectangles.
The following function turns the provided rectangle coordinates to the kind used by the Rectangle class above. This assumes that the order is [[l, t], [r, t], [r, b], [l, b]] (a path).
def trim(coordinates):
    """Remove redundant coordinates in a path describing a rectangle."""
    return coordinates[0][0], coordinates[1][1], coordinates[2][0], coordinates[3][1]
Finally, we want to do this for all rectangles, not just a "main" one. We can simply have each rectangle be the main one in turns. Use itertools.combinations() on an iterable such as a list:
itertools.combinations(rectangles, 2)
This will ensure that we don't compare two rectangles more than one time.
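Putting those pieces together, here is a self-contained sketch (trim and intersects repeated from above; largest_valid is my own name) that marks every rectangle touched by any pairwise intersection and returns the biggest untouched one:

```python
from itertools import combinations

def trim(coordinates):
    # [[l, t], [r, t], [r, b], [l, b]] -> (left, top, right, bottom)
    return coordinates[0][0], coordinates[1][1], coordinates[2][0], coordinates[3][1]

def intersects(a, b):
    left1, top1, right1, bottom1 = a
    left2, top2, right2, bottom2 = b
    h = right1 < left2 or left1 > right2      # separated horizontally
    v = bottom1 < top2 or top1 > bottom2      # separated vertically
    return not (h or v)

def size(r):
    return r[2] - r[0] + r[3] - r[1]          # half the perimeter

def largest_valid(paths):
    rects = [trim(p) for p in paths]
    bad = set()
    # combinations() visits each unordered pair exactly once
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        if intersects(a, b):
            bad.update((i, j))
    candidates = [(size(r), i) for i, r in enumerate(rects) if i not in bad]
    return max(candidates, default=None)      # (size, index) of the winner
```

With two colliding paths and one isolated path, the isolated rectangle's (size, index) pair comes back as the winner.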
Related
The output that I get is just the reference image and no bounding box is seen in the output.
I have tried this code from this website: https://www.sicara.fr/blog-technique/object-detection-template-matching
Here's the reference image
Reference Image
Here are the templates:
Template 1:
Template 2:
Template 3:
As compared to the website, using the code the output should look like this:
Expected Output:
I am expecting to have this output as discussed in the website, however, when I tried to run this code, nothing seems to be detected. Here is the code that I copied:
import cv2
import numpy as np
DEFAULT_TEMPLATE_MATCHING_THRESHOLD = 0.9
class Template:
    """
    A class defining a template
    """
    def __init__(self, image_path, label, color, matching_threshold=DEFAULT_TEMPLATE_MATCHING_THRESHOLD):
        """
        Args:
            image_path (str): path of the template image
            label (str): the label corresponding to the template
            color (List[int]): the color associated with the label (to plot detections)
            matching_threshold (float): the minimum similarity score to consider an object is detected by template
                matching
        """
        self.image_path = image_path
        self.label = label
        self.color = color
        self.template = cv2.imread(image_path)
        self.template_height, self.template_width = self.template.shape[:2]
        self.matching_threshold = matching_threshold
image = cv2.imread("reference.jpg")
templates = [
    Template(image_path="Component1.jpg", label="1", color=(0, 0, 255), matching_threshold=0.99),
    Template(image_path="Component2.jpg", label="2", color=(0, 255, 0), matching_threshold=0.91),
    Template(image_path="Component3.jpg", label="3", color=(0, 191, 255), matching_threshold=0.99),
]

detections = []
for template in templates:
    template_matching = cv2.matchTemplate(template.template, image, cv2.TM_CCORR_NORMED)
    match_locations = np.where(template_matching >= template.matching_threshold)
    for (x, y) in zip(match_locations[1], match_locations[0]):
        match = {
            "TOP_LEFT_X": x,
            "TOP_LEFT_Y": y,
            "BOTTOM_RIGHT_X": x + template.template_width,
            "BOTTOM_RIGHT_Y": y + template.template_height,
            "MATCH_VALUE": template_matching[y, x],
            "LABEL": template.label,
            "COLOR": template.color,
        }
        detections.append(match)
def compute_iou(boxA, boxB):
    xA = max(boxA["TOP_LEFT_X"], boxB["TOP_LEFT_X"])
    yA = max(boxA["TOP_LEFT_Y"], boxB["TOP_LEFT_Y"])
    xB = min(boxA["BOTTOM_RIGHT_X"], boxB["BOTTOM_RIGHT_X"])
    yB = min(boxA["BOTTOM_RIGHT_Y"], boxB["BOTTOM_RIGHT_Y"])
    interArea = max(0, xB - xA + 1) * max(0, yB - yA + 1)
    boxAArea = (boxA["BOTTOM_RIGHT_X"] - boxA["TOP_LEFT_X"] + 1) * (boxA["BOTTOM_RIGHT_Y"] - boxA["TOP_LEFT_Y"] + 1)
    boxBArea = (boxB["BOTTOM_RIGHT_X"] - boxB["TOP_LEFT_X"] + 1) * (boxB["BOTTOM_RIGHT_Y"] - boxB["TOP_LEFT_Y"] + 1)
    iou = interArea / float(boxAArea + boxBArea - interArea)
    return iou
def non_max_suppression(objects, non_max_suppression_threshold=0.5, score_key="MATCH_VALUE"):
    """
    Filter objects overlapping with IoU over threshold by keeping only the one with maximum score.
    Args:
        objects (List[dict]): a list of object dictionaries, with:
            {score_key} (float): the object score
            {top_left_x} (float): the top-left x-axis coordinate of the object bounding box
            {top_left_y} (float): the top-left y-axis coordinate of the object bounding box
            {bottom_right_x} (float): the bottom-right x-axis coordinate of the object bounding box
            {bottom_right_y} (float): the bottom-right y-axis coordinate of the object bounding box
        non_max_suppression_threshold (float): the minimum IoU value used to filter overlapping boxes when
            conducting non-max suppression.
        score_key (str): score key in objects dicts
    Returns:
        List[dict]: the filtered list of dictionaries.
    """
    sorted_objects = sorted(objects, key=lambda obj: obj[score_key], reverse=True)
    filtered_objects = []
    for object_ in sorted_objects:
        overlap_found = False
        for filtered_object in filtered_objects:
            iou = compute_iou(object_, filtered_object)
            if iou > non_max_suppression_threshold:
                overlap_found = True
                break
        if not overlap_found:
            filtered_objects.append(object_)
    return filtered_objects
NMS_THRESHOLD = 0.2
detections = non_max_suppression(detections, non_max_suppression_threshold=NMS_THRESHOLD)
image_with_detections = image.copy()
for detection in detections:
    cv2.rectangle(
        image_with_detections,
        (detection["TOP_LEFT_X"], detection["TOP_LEFT_Y"]),
        (detection["BOTTOM_RIGHT_X"], detection["BOTTOM_RIGHT_Y"]),
        detection["COLOR"],
        2,
    )
    cv2.putText(
        image_with_detections,
        f"{detection['LABEL']} - {detection['MATCH_VALUE']}",
        (detection["TOP_LEFT_X"] + 2, detection["TOP_LEFT_Y"] + 20),
        cv2.FONT_HERSHEY_SIMPLEX, 0.5,
        detection["COLOR"], 1,
        cv2.LINE_AA,
    )
# NMS_THRESHOLD = 0.2
# detection = non_max_suppression(detections, non_max_suppression_threshold=NMS_THRESHOLD)
print("Image written to file-system: ", status)
cv2.imshow("res", image_with_detections)
cv2.waitKey(0)
This is what his final output looks like:
Here's my attempt at detecting the larger components; the code was able to detect them, and here is the result:
Result
Here are the resized templates and the original components that I wanted to detect but unfortunately can't:
1st 2nd 3rd
Here is a method of finding multiple matches in template matching in Python/OpenCV using your reference and smallest template. I have removed all the white padding you had around your template. My method simply draws a black rectangle over the correlation image where it matches and then repeats, looking for the next best match in the modified correlation image.
I have used cv2.TM_CCORR_NORMED and a match threshold of 0.90. You have 4 of these templates showing in your reference image, so I set my search number to 4 and spacing of 10 for the non-maximum suppression by masking. You have other small items of the same shape and size, but the text on them is different. So you will need different templates for each.
Reference:
Template:
import cv2
import numpy as np
# read image
img = cv2.imread('circuit_board.jpg')
# read template
tmplt = cv2.imread('circuit_item.png')
hh, ww, cc = tmplt.shape
# set arguments
match_thresh = 0.90 # stopping threshold for match value
num_matches = 4 # stopping threshold for number of matches
match_radius = 10 # approx radius of match peaks
match_radius2 = match_radius//2
# get correlation surface from template matching
corrimg = cv2.matchTemplate(img,tmplt,cv2.TM_CCORR_NORMED)
hc, wc = corrimg.shape
# get locations of all peaks higher than match_thresh for up to num_matches
imgcopy = img.copy()
corrcopy = corrimg.copy()
for i in range(0, num_matches):
    # get max value and location of max
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(corrcopy)
    x1 = max_loc[0]
    y1 = max_loc[1]
    x2 = x1 + ww
    y2 = y1 + hh
    loc = str(x1) + "," + str(y1)
    if max_val > match_thresh:
        print("match number:", i+1, "match value:", max_val, "match x,y:", loc)
        # draw white bounding box to define match location
        cv2.rectangle(imgcopy, (x1,y1), (x2,y2), (255,255,255), 1)
        # insert black rectangle over copy of corr image so we don't find that match again
        corrcopy[y1-match_radius2:y1+match_radius2, x1-match_radius2:x1+match_radius2] = 0
    else:
        break
# save results
# power of 4 exaggeration of correlation image to emphasize peaks
cv2.imwrite('circuit_board_multi_template_corr.png', (255*cv2.pow(corrimg,4)).clip(0,255).astype(np.uint8))
cv2.imwrite('circuit_board_multi_template_corr_masked.png', (255*cv2.pow(corrcopy,4)).clip(0,255).astype(np.uint8))
cv2.imwrite('circuit_board_multi_template_match.png', imgcopy)
# show results
# power of 4 exaggeration of correlation image to emphasize peaks
cv2.imshow('image', img)
cv2.imshow('template', tmplt)
cv2.imshow('corr', cv2.pow(corrimg,4))
cv2.imshow('corr masked', cv2.pow(corrcopy,4))
cv2.imshow('result', imgcopy)
cv2.waitKey(0)
cv2.destroyAllWindows()
Original Correlation Image:
Modified Correlation Image after 4 matches:
Matches Marked on Input as White Rectangles:
Match Locations:
match number: 1 match value: 0.9982172250747681 match x,y: 128,68
match number: 2 match value: 0.9762057065963745 match x,y: 128,90
match number: 3 match value: 0.9755787253379822 match x,y: 128,48
match number: 4 match value: 0.963689923286438 match x,y: 127,107
I have these two Images
Vertical white line on a black background
Horizontal white line on a black background
I used the following code to get the angle of the line
import numpy as np
import cv2
x = cv2.imread('ver.png')
cv_image = cv2.cvtColor(x, cv2.COLOR_RGB2GRAY)
ret, thresh = cv2.threshold(cv_image,70,255,cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
recta = cv2.minAreaRect(contours[0])
center_x, center_y, angle = recta
print (angle)
but the printed angle value is the same for both images which is -90
it is mentioned in the documentation that the cv2.minAreaRect() returns as following:
( top-left corner(x,y), (width, height), angle of rotation )
but for me, it only returns (center-x, center-y, angle)
btw: I want to write code for a line-follower drone, so I need to know the angle of the detected line in order to adjust my drone according to it
minAreaRect prints -90 for both because it defines the rectangle differently for those lines (you can swap the width and height and end up with the same rectangle). If you need something that can distinguish between them then you can take the rectangle corners and find the longer side. You can use that line to calculate an angle.
The following code will distinguish between the two (0 degrees for horizontal, -89.999999 degrees for vertical). It should be bound between [-90, 90] degrees (relative to the bottom of the screen).
import numpy as np
import cv2
import math
# 2d distance
def dist2D(one, two):
    dx = one[0] - two[0]
    dy = one[1] - two[1]
    return math.sqrt(dx*dx + dy*dy)

# angle between three points (the last point is the middle)
def angle3P(p1, p2, p3):
    # get distances
    a = dist2D(p3, p1)
    b = dist2D(p3, p2)
    c = dist2D(p1, p2)

    # calculate angle // assume a and b are nonzero
    # (law of cosines)
    numer = c**2 - a**2 - b**2
    denom = -2 * a * b
    if denom == 0:
        denom = 0.000001
    rads = math.acos(numer / denom)
    degs = math.degrees(rads)

    # check if past 90 degrees
    return degs
# get the rotated box
x = cv2.imread('horizontal.png')
cv_image = cv2.cvtColor(x, cv2.COLOR_RGB2GRAY)
ret, thresh = cv2.threshold(cv_image, 70, 255, cv2.THRESH_BINARY)
_, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
recta = cv2.minAreaRect(contours[0])
center_x, center_y, angle = recta

# get the corners
box = cv2.boxPoints(recta)
box = np.int0(box)

# choose the first point
root = box[0]

# find the longer side
end = None
one = box[-1]
two = box[1]
if dist2D(one, root) > dist2D(two, root):
    end = one
else:
    end = two

# take the left-most point
left_point = None
right_point = None
if end[0] < root[0]:
    left_point = end
    right_point = root
else:
    left_point = root
    right_point = end

# calculate the angle [-90, 90]
offshoot = [left_point[0] + 100, left_point[1]]
angle = angle3P(right_point, offshoot, left_point)
if left_point[1] > right_point[1]:
    angle = -angle

print(angle)
Edit:
Woops, I got my orientation mixed up. I edited the code, now it should be from [-90, 90] degrees.
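As an alternative to the law-of-cosines route, the same [-90, 90] angle can be read directly off the longer edge of the corner points with atan2 (a sketch under my own naming; box_pts is assumed to be the four corners in cv2.boxPoints() order, where consecutive points are adjacent corners). Note that in image coordinates (y pointing down) the sign convention is mirrored compared to standard math coordinates:

```python
import math

def line_angle(box_pts):
    # box_pts: four corners of the rotated rect, consecutive points adjacent
    p0, p1, p3 = box_pts[0], box_pts[1], box_pts[3]
    # the two edges that meet at p0; keep the longer one
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p3[0] - p0[0], p3[1] - p0[1])
    dx, dy = e1 if math.hypot(*e1) >= math.hypot(*e2) else e2
    ang = math.degrees(math.atan2(dy, dx))
    # fold into [-90, 90)
    if ang >= 90:
        ang -= 180
    elif ang < -90:
        ang += 180
    return ang
```

A horizontal rectangle of corner points gives roughly 0 degrees and a vertical one roughly -90, matching the behavior described above.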
I have 2 contours (cont1 and cont2) received from cv2.findContours(). How do I know if they intersect or not? I don't need coordinates, I only need a boolean True or False.
I have attempted different ways and already tried to do a check with
if ((cont1 & cont2).area() > 0):
...
but got the error that the array has no method "Area()"
...
cont1array = cv2.findContours(binary1, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[0]
cont2array = cv2.findContours(binary2, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[0]
...
for cont1 in cont1array:
    for cont2 in cont2array:
        print("cont1")
        print(cont1)
        print(type(cont1))
        print("cont2")
        print(cont2)
        print(type(cont2))
        if cont1 and cont2 intersect:  # I don't know how to check intersect
            print("yes they intersect")
        else:
            print("no they do not intersect")
# cont1
# [[172 302]
# [261 301]
# [262 390]
# [173 391]]
# <class 'numpy.ndarray'>
# cont2
# [[ 0 0]
# [ 0 699]
# [499 699]
# [499 0]]
# <class 'numpy.ndarray'>
The answer by nathancy works, but suffers on the performance side: as in the example, it creates 3 copies of the image to draw the contours and is thus sluggish in execution time.
My alternative answer is below:
def contour_intersect(cnt_ref, cnt_query, edges_only=True):
    intersecting_pts = []

    ## Loop through all points in the contour
    for pt in cnt_query:
        x, y = pt[0]

        ## find points that intersect the ref contour
        ## the edges_only flag checks if the intersection to detect is only at the edges of the contour
        if edges_only and (cv2.pointPolygonTest(cnt_ref, (x, y), True) == 0):
            intersecting_pts.append(pt[0])
        elif not edges_only and (cv2.pointPolygonTest(cnt_ref, (x, y), True) >= 0):
            intersecting_pts.append(pt[0])

    return len(intersecting_pts) > 0
EDIT!!
After testing this code, I realized that the check fails when the two contours have no coincident points. Thus, I've rewritten the algorithm to check whether any two contour line segments intersect.
def ccw(A, B, C):
    return (C[1]-A[1]) * (B[0]-A[0]) > (B[1]-A[1]) * (C[0]-A[0])

def contour_intersect(cnt_ref, cnt_query):
    ## Contour is a list of points
    ## Connect each point to the following point to get a line
    ## If any of the lines intersect, then break

    for ref_idx in range(len(cnt_ref)-1):
        ## Create reference line_ref with points A and B
        A = cnt_ref[ref_idx][0]
        B = cnt_ref[ref_idx+1][0]

        for query_idx in range(len(cnt_query)-1):
            ## Create query line_query with points C and D
            C = cnt_query[query_idx][0]
            D = cnt_query[query_idx+1][0]

            ## Check if the lines intersect
            if ccw(A,C,D) != ccw(B,C,D) and ccw(A,B,C) != ccw(A,B,D):
                ## If true, break the loop early
                return True

    return False
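As a quick self-contained check of this segment test, here is the same ccw logic on plain (x, y) tuples instead of OpenCV's nested arrays (polygons_intersect is my own wrapper; unlike the loops above, it also tests the closing edge between the last and first point). Like the original, it detects boundary crossings only, so one contour lying entirely inside the other still returns False:

```python
def ccw(a, b, c):
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(a, b, c, d):
    # proper-crossing test for segments AB and CD
    return ccw(a, c, d) != ccw(b, c, d) and ccw(a, b, c) != ccw(a, b, d)

def polygons_intersect(poly1, poly2):
    # closed polygons given as plain [(x, y), ...] vertex lists
    n, m = len(poly1), len(poly2)
    for i in range(n):
        a, b = poly1[i], poly1[(i + 1) % n]
        for j in range(m):
            c, d = poly2[j], poly2[(j + 1) % m]
            if segments_cross(a, b, c, d):
                return True
    return False

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(polygons_intersect(square, [(5, 5), (15, 5), (15, 15), (5, 15)]))  # True
print(polygons_intersect(square, [(50, 50), (60, 50), (60, 60), (50, 60)]))  # False
```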
Once you have the two contours from cv2.findContours(), you can use a logical AND operation to detect intersection. Specifically, we can use np.logical_and(). The idea is to create two separate images for each contour and then use the logical AND operation on them. Any points that have a positive value (1 or True) will be points of intersection. Since you're only looking to obtain a boolean value of whether there is intersection, we can check the intersected image for a single positive value. Essentially, if the entire array is False then there was no intersection between the contours. But if there is a single True, then the contours touched and thus intersect.
def contourIntersect(original_image, contour1, contour2):
# Two separate contours trying to check intersection on
contours = [contour1, contour2]
# Create image filled with zeros, the same size as the original image
blank = np.zeros(original_image.shape[0:2])
# Copy each contour into its own image and fill it with '1'
image1 = cv2.drawContours(blank.copy(), [contours[0]], 0, 1)
image2 = cv2.drawContours(blank.copy(), [contours[1]], 0, 1)
# Use the logical AND operation on the two images
# Since the logical AND is applied pixel-wise,
# there will be a '1' or 'True' wherever the contours intersect
# and a '0' or 'False' where they don't
intersection = np.logical_and(image1, image2)
# Check if there was a '1' in the intersection
return intersection.any()
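The principle can be seen without OpenCV at all. Here is a minimal numpy-only sketch in which filled rectangles stand in for the images produced by cv2.drawContours (all names are mine):

```python
import numpy as np

def masks_intersect(mask1, mask2):
    # True if any pixel is set in both masks
    return np.logical_and(mask1, mask2).any()

blank = np.zeros((100, 100), dtype=bool)
# Filled rectangles stand in for the drawn contours
rect_a = blank.copy(); rect_a[10:40, 10:40] = True
rect_b = blank.copy(); rect_b[30:60, 30:60] = True
rect_c = blank.copy(); rect_c[70:90, 70:90] = True

print(masks_intersect(rect_a, rect_b))  # True  - the rectangles overlap
print(masks_intersect(rect_a, rect_c))  # False - fully disjoint
```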
Example
Original Image
Detected Contour
We now pass the two detected contours to the function and obtain this intersection array:
[[False False False ... False False False]
[False False False ... False False False]
[False False False ... False False False]
...
[False False False ... False False False]
[False False False ... False False False]
[False False False ... False False False]]
We check the intersection array to see if True exists. We will obtain a True or 1 where the contours intersect and False or 0 where they do not.
return intersection.any()
Thus we obtain
False
Full code
import cv2
import numpy as np
def contourIntersect(original_image, contour1, contour2):
# Two separate contours trying to check intersection on
contours = [contour1, contour2]
# Create image filled with zeros, the same size as the original image
blank = np.zeros(original_image.shape[0:2])
# Copy each contour into its own image and fill it with '1'
image1 = cv2.drawContours(blank.copy(), contours, 0, 1)
image2 = cv2.drawContours(blank.copy(), contours, 1, 1)
# Use the logical AND operation on the two images
# Since the two images had bitwise AND applied to it,
# there should be a '1' or 'True' where there was intersection
# and a '0' or 'False' where it didn't intersect
intersection = np.logical_and(image1, image2)
# Check if there was a '1' in the intersection array
return intersection.any()
original_image = cv2.imread("base.png")
image = original_image.copy()
cv2.imshow("original", image)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
cv2.imshow("gray", gray)
blurred = cv2.GaussianBlur(gray, (5,5), 0)
cv2.imshow("blur", blurred)
threshold = cv2.threshold(blurred, 60, 255, cv2.THRESH_BINARY)[1]
cv2.imshow("thresh", threshold)
contours = cv2.findContours(threshold.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Depending on OpenCV version, number of arguments return by cv.findContours
# is either 2 or 3
contours = contours[1] if len(contours) == 3 else contours[0]
contour_list = []
for c in contours:
contour_list.append(c)
cv2.drawContours(image, [c], 0, (0,255,0), 2)
print(contourIntersect(original_image, contour_list[0], contour_list[1]))
cv2.imshow("contour", image)
cv2.waitKey(0)
@Ivan's and @nathancy's answers are the best ones I saw here. However, drawing lines is still compute-intensive, especially if there are many points in your contours, while computing bitwise ANDs directly can hurt performance, especially if your canvas is large. A simple way to improve performance is to first check for bbox intersection: if the bboxes don't intersect, the contours can't either. If the bboxes do intersect, just draw the smallest filled (or outlined) ROI for both contours and compute a simple bitwise AND. I have found this to provide significant speedups compared to the other techniques listed here, and it prevents issues with large, complex contours on a large canvas. I use torch to compute bbox IoUs for simplicity/legibility.
import cv2
import numpy as np
import torch
import torchvision.ops.boxes as bops
def contour_intersect(cnt_ref, cnt_query):
## Contours are both an np array of points
## Check for bbox intersection, then check pixel intersection if bboxes intersect
# first check if it is possible that any of the contours intersect
x1, y1, w1, h1 = cv2.boundingRect(cnt_ref)
x2, y2, w2, h2 = cv2.boundingRect(cnt_query)
# get contour areas
area_ref = cv2.contourArea(cnt_ref)
area_query = cv2.contourArea(cnt_query)
# get coordinates as tensors
box1 = torch.tensor([[x1, y1, x1 + w1, y1 + h1]], dtype=torch.float)
box2 = torch.tensor([[x2, y2, x2 + w2, y2 + h2]], dtype=torch.float)
# get bbox iou
iou = bops.box_iou(box1, box2)
if iou == 0:
# bboxes don't intersect, so contours don't either
return False
else:
# bboxes intersect, now check pixels
# get the height, width, x, and y of the smaller contour
if area_ref >= area_query:
h = h2
w = w2
x = x2
y = y2
else:
h = h1
w = w1
x = x1
y = y1
# get a canvas to draw the small contour and subspace of the large contour
contour_canvas_ref = np.zeros((h, w), dtype='uint8')
contour_canvas_query = np.zeros((h, w), dtype='uint8')
# draw the pixels areas, filled (can also be outline)
cv2.drawContours(contour_canvas_ref, [cnt_ref], -1, 255, thickness=cv2.FILLED,
offset=(-x, -y))
cv2.drawContours(contour_canvas_query, [cnt_query], -1, 255, thickness=cv2.FILLED,
offset=(-x, -y))
# check for any pixel overlap
return np.any(np.bitwise_and(contour_canvas_ref, contour_canvas_query))
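If pulling in torch just for a yes/no overlap test feels heavy, the same bbox pre-check can be done with four comparisons in plain Python (a minimal sketch; the helper name is mine):

```python
def bboxes_intersect(box1, box2):
    # box = (x, y, w, h); True if the axis-aligned boxes overlap
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    return x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1

print(bboxes_intersect((0, 0, 10, 10), (5, 5, 10, 10)))  # True
print(bboxes_intersect((0, 0, 10, 10), (20, 0, 5, 5)))   # False
```

This replaces only the `iou == 0` early-out; the pixel-level check on the small canvases is still needed when the boxes do overlap.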
To handle the case where one contour contains another, we can replace
image1 = cv2.drawContours(blank.copy(), contours, 0, 1)
image2 = cv2.drawContours(blank.copy(), contours, 1, 1)
of Nathancy's answer with
image1 = cv2.fillPoly(blank.copy(), [contour1], 1)
image2 = cv2.fillPoly(blank.copy(), [contour2], 1)
Extracting table data from digital PDFs has been simple using camelot and tabula. However, those solutions don't work with scanned images of document pages, specifically when the table has no borders or inner grid lines. I have been trying to generate vertical and horizontal lines using OpenCV, but since the scanned images have slight rotation angles, it is difficult to proceed with that approach.
How can we utilize OpenCV to generate grids (horizontal and vertical lines) and borders for the scanned document page which contains table data (along with paragraphs of text)? If this is feasible, how to nullify the rotation angle of the scanned image?
I wrote some code to estimate the horizontal lines from the printed letters on the page. The same could be done for vertical ones, I guess. The code below follows some general assumptions; here are
the basic steps in pseudo-code style:
prepare picture for contour detection
do contour detection
we assume most contours are letters
calc mean width of all contours
calc mean area of contours
filter all contours with two conditions:
a) contour (letter) heights < meanHeight * 2
b) contour area > 4/5 meanArea
calc center point of all remaining contours
assume we have line regions (bins)
list all center points which are inside the region
do linear regression of region points
save slope and intercept
calc mean slope and intercept
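The binning-plus-regression steps can be sketched on synthetic letter centroids (numpy only; all names and numbers below are made up for illustration, and np.polyfit stands in for scipy's linregress):

```python
import numpy as np

# Hypothetical letter centroids on two text lines tilted with slope 0.05
rng = np.random.default_rng(0)
xs = np.arange(0, 200, 5)
line1 = np.stack([xs, 0.05 * xs + 30 + rng.normal(0, 0.3, xs.size)], axis=1)
line2 = np.stack([xs, 0.05 * xs + 80 + rng.normal(0, 0.3, xs.size)], axis=1)
points = np.vstack([line1, line2])

bin_width = 50           # rough estimate: one text line per bin
min_points_per_bin = 12  # skip bins with too few letters for a stable fit
slopes = []
for y0 in range(0, 200, bin_width):
    in_bin = points[(points[:, 1] >= y0) & (points[:, 1] < y0 + bin_width)]
    if len(in_bin) >= min_points_per_bin:
        # Fit y = slope * x + intercept over the centroids in this bin
        slope, intercept = np.polyfit(in_bin[:, 0], in_bin[:, 1], 1)
        slopes.append(slope)

mean_slope = np.mean(slopes)
skew_degrees = np.degrees(np.arctan(mean_slope))
print(round(skew_degrees, 2))  # close to degrees(atan(0.05)), about 2.86
```

The recovered angle could then be fed to a rotation step to deskew the page before drawing the grid.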
here is the full code:
import cv2
import numpy as np
from scipy import stats
def resizeImageByPercentage(img,scalePercent = 60):
width = int(img.shape[1] * scalePercent / 100)
height = int(img.shape[0] * scalePercent / 100)
dim = (width, height)
# resize image
return cv2.resize(img, dim, interpolation = cv2.INTER_AREA)
def calcAverageContourWithAndHeigh(contourList):
hs = list()
ws = list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
ws.append(w)
hs.append(h)
return np.mean(ws),np.mean(hs)
def calcAverageContourArea(contourList):
areaList = list()
for cnt in contourList:
# cv2.minAreaRect()[2] is the rotation angle, not an area,
# so use cv2.contourArea() directly
a = cv2.contourArea(cnt)
areaList.append(a)
return np.mean(areaList)
def calcCentroid(contour):
houghMoments = cv2.moments(contour)
# calculate x,y coordinate of centroid
if houghMoments["m00"] != 0: #case no contour could be calculated
cX = int(houghMoments["m10"] / houghMoments["m00"])
cY = int(houghMoments["m01"] / houghMoments["m00"])
else:
# set values as what you need in the situation
cX, cY = -1, -1
return cX,cY
def getCentroidWhenSizeInRange(contourList,letterSizeWidth,letterSizeHigh,deltaOffset,minLetterArea=10.0):
centroidList=list()
for cnt in contourList:
(x, y, w, h) = cv2.boundingRect(cnt)
# cv2.contourArea() gives the area (minAreaRect()[2] would be the angle)
area = cv2.contourArea(cnt)
#calc diff
diffW = abs(w-letterSizeWidth)
diffH = abs(h-letterSizeHigh)
#threshold A: almost smaller than mean letter size +- offset
#when almost letterSize
if diffW < deltaOffset and diffH < deltaOffset:
#threshold B: > min area
if area > minLetterArea:
cX,cY = calcCentroid(cnt)
if cX!=-1 and cY!=-1:
centroidList.append((cX,cY))
return centroidList
DEBUGMODE = True
#read image, do git clone https://github.com/WZBSocialScienceCenter/pdftabextract.git for the example
img = cv2.imread('pdftabextract/examples/catalogue_30s/data/ALA1934_RR-excerpt.pdf-2_1.png')
#get some basic infos
imgHeigh, imgWidth, imgChannelAmount = img.shape
if DEBUGMODE:
cv2.imwrite("img00original.jpg",resizeImageByPercentage(img,30))
cv2.imshow("original",img)
# prepare img
imgGrey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply Gaussian filter
imgGaussianBlur = cv2.GaussianBlur(imgGrey,(5,5),0)
#make binary img, black or white
_, imgBinThres = cv2.threshold(imgGaussianBlur, 130, 255, cv2.THRESH_BINARY)
## detect contours
contours, _ = cv2.findContours(imgBinThres, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
#we get some letter parameter
averageLetterWidth, averageLetterHigh = calcAverageContourWithAndHeigh(contours)
threshold1AllowedLetterSizeOffset = averageLetterHigh * 2 # double size
averageContourAreaSizeOfMinRect = calcAverageContourArea(contours)
threshHold2MinArea = 4 * averageContourAreaSizeOfMinRect / 5 # 4/5 * mean
print("mean letter Width: ", averageLetterWidth)
print("mean letter High: ", averageLetterHigh)
print("threshold 1 tolerance: ", threshold1AllowedLetterSizeOffset)
print("mean letter area ", averageContourAreaSizeOfMinRect)
print("threshold 2 min letter area ", threshHold2MinArea)
#we get all centroid of letter sizes contours, the other we ignore
centroidList = getCentroidWhenSizeInRange(contours,averageLetterWidth,averageLetterHigh,threshold1AllowedLetterSizeOffset,threshHold2MinArea)
if DEBUGMODE:
#debug print all centers:
imgFilteredCenter = img.copy()
for cX,cY in centroidList:
#draw in red color as BGR
cv2.circle(imgFilteredCenter, (cX, cY), 5, (0, 0, 255), -1)
cv2.imwrite("img01letterCenters.jpg",resizeImageByPercentage(imgFilteredCenter,30))
cv2.imshow("letterCenters",imgFilteredCenter)
#we estimate a bin widths
amountPixelFreeSpace = averageLetterHigh #TODO get better estimate out of histogram
estimatedBinWidth = round( averageLetterHigh + amountPixelFreeSpace) #TODO round better ?
binCollection = dict() #range(0,imgHeigh,estimatedBinWidth)
#we separate the center points into bins by y coordinate
for i in range(0,imgHeigh,estimatedBinWidth):
listCenterPointsInBin = list()
yMin = i
yMax = i + estimatedBinWidth
for cX,cY in centroidList:
if yMin < cY < yMax:#if fits in bin
listCenterPointsInBin.append((cX,cY))
binCollection[i] = listCenterPointsInBin
#we assume all points of a bin lie on one line
#model = slope (x) + intercept
#model = m (x) + n
mList = list() #slope abs in img
nList = list() #intercept abs in img
nListRelative = list() #intercept relative to bin start
minAmountRegressionElements = 12 #is also alias for letter amount we expect
#we do regression for every point in the bin
for startYOfBin, values in binCollection.items():
#we reform values
xValues = [] #TODO use more short transform
yValues = []
for x,y in values:
xValues.append(x)
yValues.append(y)
#we assume a min limit of point in bin
if len(xValues) >= minAmountRegressionElements :
slope, intercept, r, p, std_err = stats.linregress(xValues, yValues)
mList.append(slope)
nList.append(intercept)
#we calc the relative intercept
nRelativeToBinStart = intercept - startYOfBin
nListRelative.append(nRelativeToBinStart)
if DEBUGMODE:
#we debug print all lines in one picture
imgLines = img.copy()
colorOfLine = (0, 255, 0) #green
for i in range(0,len(mList)):
slope = mList[i]
intercept = nList[i]
startPoint = (0, int( intercept)) #better round ?
endPointY = int( (slope * imgWidth + intercept) )
if endPointY < 0:
endPointY = 0
endPoint = (imgWidth,endPointY) #endPointY was computed at x = imgWidth
cv2.line(imgLines, startPoint, endPoint, colorOfLine, 2)
cv2.imwrite("img02lines.jpg",resizeImageByPercentage(imgLines,30))
cv2.imshow("linesOfLetters ",imgLines)
#we assume in mean we got it right
meanIntercept = np.mean(nListRelative)
meanSlope = np.mean(mList)
print("meanIntercept :", meanIntercept)
print("meanSlope ", meanSlope)
#TODO calc angle with math.atan(slope) ...
if DEBUGMODE:
cv2.waitKey(0)
original:
center point of letters:
lines:
I had the same problem some time ago, and this tutorial is the solution. It explains using pdftabextract, a Python library by Markus Konrad that leverages OpenCV's Hough transform to detect the lines, and it works even if the scanned document is a bit tilted. The tutorial walks you through parsing a 1920s German newspaper.
I am doing a college class project on image processing. This is my original image:
I want to join nearby/overlapping bounding boxes on individual text line images, but I don't know how. My code looks like this so far (thanks to @HansHirse for the help):
import os
import cv2
import numpy as np
from scipy import stats
image = cv2.imread('example.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
ret,thresh = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
#dilation
kernel = np.ones((5,5), np.uint8)
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
#find contours
ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# https://www.pyimagesearch.com/2015/04/20/sorting-contours-using-python-and-opencv/
def sort_contours(cnts, method="left-to-right"):
# initialize the reverse flag and sort index
reverse = False
i = 0
# handle if we need to sort in reverse
if method == "right-to-left" or method == "bottom-to-top":
reverse = True
# handle if we are sorting against the y-coordinate rather than
# the x-coordinate of the bounding box
if method == "top-to-bottom" or method == "bottom-to-top":
i = 1
# construct the list of bounding boxes and sort them from top to
# bottom
boundingBoxes = [cv2.boundingRect(c) for c in cnts]
(cnts, boundingBoxes) = zip(*sorted(zip(cnts, boundingBoxes),
key=lambda b: b[1][i], reverse=reverse))
# return the list of sorted contours and bounding boxes
return (cnts, boundingBoxes)
sortedctrs,sortedbbs=sort_contours(ctrs)
xyminmax=[]
for cnt in sortedctrs:
x, y, w, h = cv2.boundingRect(cnt)
xyminmax.append([x,y,x+w,y+h])
distances=[]
for i in range(len(xyminmax)):
try:
first_xmax = xyminmax[i][2]
second_xmin = xyminmax[i + 1][0]
distance=abs(second_xmin-first_xmax)
distances.append(distance)
except IndexError:
pass
THRESHOLD=stats.mode(distances, axis=None)[0][0]
new_rects=[]
for i in range(len(xyminmax)):
try:
# [xmin,ymin,xmax,ymax]
first_ymin=xyminmax[i][1]
first_ymax=xyminmax[i][3]
second_ymin=xyminmax[i+1][1]
second_ymax=xyminmax[i+1][3]
first_xmax = xyminmax[i][2]
second_xmin = xyminmax[i+1][0]
firstheight=abs(first_ymax-first_ymin)
secondheight=abs(second_ymax-second_ymin)
distance=abs(second_xmin-first_xmax)
if distance<THRESHOLD:
new_xmin=xyminmax[i][0]
new_xmax=xyminmax[i+1][2]
if first_ymin>second_ymin:
new_ymin=second_ymin
else:
new_ymin = first_ymin
if firstheight>secondheight:
new_ymax = first_ymax
else:
new_ymax = second_ymax
new_rects.append([new_xmin,new_ymin,new_xmax,new_ymax])
else:
new_rects.append(xyminmax[i])
except IndexError:
pass
for rect in new_rects:
cv2.rectangle(image, (rect[0], rect[1]), (rect[2], rect[3]), (121, 11, 189), 2)
cv2.imwrite("result.png",image)
which produces this image as a result:
I want to join very close or overlapping bounding boxes such as these
into a single bounding box so the formula doesn't get separated into single characters. I have tried using cv2.groupRectangles but the print results were just NULL.
So, here comes my solution. I partially modified your (initial) code to my preferred naming etc., and commented all the stuff I added.
import cv2
import numpy as np
image = cv2.imread('images/example.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((5, 5), np.uint8)
img_dilated = cv2.dilate(thresh, kernel, iterations = 1)
cnts, _ = cv2.findContours(img_dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Array of initial bounding rects
rects = []
# Bool array indicating which initial bounding rect has
# already been used
rectsUsed = []
# Just initialize bounding rects and set all bools to false
for cnt in cnts:
rects.append(cv2.boundingRect(cnt))
rectsUsed.append(False)
# Sort bounding rects by x coordinate
def getXFromRect(item):
return item[0]
rects.sort(key = getXFromRect)
# Array of accepted rects
acceptedRects = []
# Merge threshold for x coordinate distance
xThr = 5
# Iterate all initial bounding rects
for supIdx, supVal in enumerate(rects):
if (rectsUsed[supIdx] == False):
# Initialize current rect
currxMin = supVal[0]
currxMax = supVal[0] + supVal[2]
curryMin = supVal[1]
curryMax = supVal[1] + supVal[3]
# This bounding rect is used
rectsUsed[supIdx] = True
# Iterate all initial bounding rects
# starting from the next
for subIdx, subVal in enumerate(rects[(supIdx+1):], start = (supIdx+1)):
# Initialize merge candidate
candxMin = subVal[0]
candxMax = subVal[0] + subVal[2]
candyMin = subVal[1]
candyMax = subVal[1] + subVal[3]
# Check if x distance between current rect
# and merge candidate is small enough
if (candxMin <= currxMax + xThr):
# Extend current rect to cover the merge candidate
currxMax = max(currxMax, candxMax)
curryMin = min(curryMin, candyMin)
curryMax = max(curryMax, candyMax)
# Merge candidate (bounding rect) is used
rectsUsed[subIdx] = True
else:
break
# No more merge candidates possible, accept current rect
acceptedRects.append([currxMin, curryMin, currxMax - currxMin, curryMax - curryMin])
for rect in acceptedRects:
img = cv2.rectangle(image, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (121, 11, 189), 2)
cv2.imwrite("images/result.png", image)
For your example
I get the following output
Now, you have to find a proper threshold to meet your expectations. Maybe, there is even some more work to do, especially to get the whole formula, since the distances don't vary that much.
Disclaimer: I'm new to Python in general, and especially to the Python API of OpenCV (C++ for the win). Comments, improvements, highlighting Python no-gos are highly welcome!
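For reference, the greedy x-merge above can be distilled into a pure-Python function that is easy to unit-test without OpenCV. This is my own re-statement; it extends x_max with max() so a candidate lying fully inside the current rect cannot shrink it:

```python
def merge_rects_x(rects, x_thr=5):
    # rects: list of (x, y, w, h); greedily merge rects that are
    # closer than x_thr along x after sorting by left edge
    rects = sorted(rects, key=lambda r: r[0])
    used = [False] * len(rects)
    merged = []
    for i, (x, y, w, h) in enumerate(rects):
        if used[i]:
            continue
        x_min, x_max = x, x + w
        y_min, y_max = y, y + h
        used[i] = True
        for j in range(i + 1, len(rects)):
            cx, cy, cw, ch = rects[j]
            if cx <= x_max + x_thr:
                # Candidate is close enough: grow the current rect
                x_max = max(x_max, cx + cw)
                y_min = min(y_min, cy)
                y_max = max(y_max, cy + ch)
                used[j] = True
            else:
                break  # rects are x-sorted, no further candidate can match
        merged.append((x_min, y_min, x_max - x_min, y_max - y_min))
    return merged

print(merge_rects_x([(0, 0, 10, 10), (12, 2, 10, 10), (40, 0, 5, 5)]))
# [(0, 0, 22, 12), (40, 0, 5, 5)]
```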
Here is a slightly different approach, using the OpenCV Wrapper library.
import cv2
import opencv_wrapper as cvw
image = cv2.imread("example.png")
gray = cvw.bgr2gray(image)
thresh = cvw.threshold_otsu(gray, inverse=True)
# dilation
img_dilation = cvw.dilate(thresh, 5)
# Find contours
contours = cvw.find_external_contours(img_dilation)
# Map contours to bounding rectangles, using bounding_rect property
rects = map(lambda c: c.bounding_rect, contours)
# Sort rects by top-left x (rect.x == rect.tl.x)
sorted_rects = sorted(rects, key=lambda r: r.x)
# Distance threshold
dt = 5
# List of final, joined rectangles
final_rects = [sorted_rects[0]]
for rect in sorted_rects[1:]:
prev_rect = final_rects[-1]
# Shift rectangle `dt` back, to find out if they overlap
shifted_rect = cvw.Rect(rect.tl.x - dt, rect.tl.y, rect.width, rect.height)
intersection = cvw.rect_intersection(prev_rect, shifted_rect)
if intersection is not None:
# Join the two rectangles
min_y = min((prev_rect.tl.y, rect.tl.y))
max_y = max((prev_rect.bl.y, rect.bl.y))
max_x = max((prev_rect.br.x, rect.br.x))
width = max_x - prev_rect.tl.x
height = max_y - min_y
new_rect = cvw.Rect(prev_rect.tl.x, min_y, width, height)
# Add new rectangle to final list, making it the new prev_rect
# in the next iteration
final_rects[-1] = new_rect
else:
# If no intersection, add the box
final_rects.append(rect)
for rect in sorted_rects:
cvw.rectangle(image, rect, cvw.Color.MAGENTA, line_style=cvw.LineStyle.DASHED)
for rect in final_rects:
cvw.rectangle(image, rect, cvw.Color.GREEN, thickness=2)
cv2.imwrite("result.png", image)
And the result
The green boxes are the final result, while the magenta boxes are the original ones.
I used the same threshold as #HansHirse.
The equals sign still needs some work. Either use a larger dilation kernel, or apply the same technique vertically.
Disclosure: I am the author of OpenCV Wrapper.
Easy-to-read solution:
contours = get_contours(frame)
boxes = [cv2.boundingRect(c) for c in contours]
boxes = merge_boxes(boxes, x_val=40, y_val=20) # Where x_val and y_val are axis thresholds
def get_contours(frame): # Returns a list of contours
contours = cv2.findContours(frame, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = imutils.grab_contours(contours)
return contours
def merge_boxes(boxes, x_val, y_val):
size = len(boxes)
if size < 2:
return boxes
if size == 2:
if boxes_mergeable(boxes[0], boxes[1], x_val, y_val):
boxes[0] = union(boxes[0], boxes[1])
del boxes[1]
return boxes
boxes = sorted(boxes, key=lambda r: r[0])
i = size - 2
while i >= 0:
if boxes_mergeable(boxes[i], boxes[i + 1], x_val, y_val):
boxes[i] = union(boxes[i], boxes[i + 1])
del boxes[i + 1]
i -= 1
return boxes
def boxes_mergeable(box1, box2, x_val, y_val):
(x1, y1, w1, h1) = box1
(x2, y2, w2, h2) = box2
return max(x1, x2) - min(x1, x2) - minx_w(x1, w1, x2, w2) < x_val \
and max(y1, y2) - min(y1, y2) - miny_h(y1, h1, y2, h2) < y_val
def minx_w(x1, w1, x2, w2):
return w1 if x1 <= x2 else w2
def miny_h(y1, h1, y2, h2):
return h1 if y1 <= y2 else h2
def union(a, b):
x = min(a[0], b[0])
y = min(a[1], b[1])
w = max(a[0] + a[2], b[0] + b[2]) - x
h = max(a[1] + a[3], b[1] + b[3]) - y
return x, y, w, h
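As a quick sanity check, the union helper (restated here so the snippet runs standalone) returns the smallest rect covering both inputs:

```python
def union(a, b):
    # Smallest (x, y, w, h) rect covering both input rects
    x = min(a[0], b[0])
    y = min(a[1], b[1])
    w = max(a[0] + a[2], b[0] + b[2]) - x
    h = max(a[1] + a[3], b[1] + b[3]) - y
    return x, y, w, h

print(union((0, 0, 10, 10), (8, 2, 10, 10)))  # (0, 0, 18, 12)
```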
--> If you have bounding boxes and want to merge along both X and Y directions, use this snippet
--> Adjust x_pixel_value and y_pixel_value to your preferences
--> But for this, you need to have the bounding boxes
import cv2
img = cv2.imread("your image path")  # replace with the actual path
x_pixel_value = 5
y_pixel_value = 6
bboxes_list = [] # your bounding boxes list
rects_used = []
for i in bboxes_list:
rects_used.append(False)
end_bboxes_list = []
for enum,i in enumerate(bboxes_list):
if rects_used[enum] == True:
continue
xmin = i[0]
xmax = i[2]
ymin = i[1]
ymax = i[3]
for enum1,j in enumerate(bboxes_list[(enum+1):], start = (enum+1)):
i_xmin = j[0]
i_xmax = j[2]
i_ymin = j[1]
i_ymax = j[3]
if rects_used[enum1] == False:
if abs(ymin - i_ymin) < y_pixel_value:
if abs(xmin-i_xmax) < x_pixel_value or abs(xmax-i_xmin) < x_pixel_value:
rects_used[enum1] = True
xmin = min(xmin,i_xmin)
xmax = max(xmax,i_xmax)
ymin = min(ymin,i_ymin)
ymax = max(ymax,i_ymax)
final_box = [xmin,ymin,xmax,ymax]
end_bboxes_list.append(final_box)
for i in end_bboxes_list:
cv2.rectangle(img,(i[0],i[1]),(i[2],i[3]), color = [0,255,0], thickness = 2)
cv2.imshow("Image",img)
cv2.waitKey(10000)
cv2.destroyAllWindows()