I want to detect text on x-ray images. The goal is to extract the oriented bounding boxes as a matrix where each row is a detected bounding box and each row contains the coordinates of all four edges i.e. [x1, x2, y1, y2]. I'm using python 3 and OpenCV 4.2.0.
Here is a sample image:
The string "test word", "a" and "b" should be detected.
I followed this OpenCV tutorial about creating rotated boxes for contours and this stackoverflow answer about detecting a text area in an image.
The resulting boundary boxes should look something like this:
I was able to detect the text, but the result included a lot of boxes without text.
Here is what I tried so far:
img = cv2.imread(file_name)
## Open the image, convert it into grayscale and blur it to get rid of the noise.
img2gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
ret, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY) # for black text , cv.THRESH_BINARY_INV
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(new_img, kernel, iterations=6)
canny_output = cv2.Canny(dilated, 100, 100 * 2)
cv2.imshow('Canny', canny_output)
## Finds contours and saves them to the vectors contour and hierarchy.
contours, hierarchy = cv2.findContours(canny_output, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Find the rotated rectangles and ellipses for each contour
minRect = [None] * len(contours)
for i, c in enumerate(contours):
minRect[i] = cv2.minAreaRect(c)
# Draw contours + rotated rects + ellipses
drawing = np.zeros((canny_output.shape[0], canny_output.shape[1], 3), dtype=np.uint8)
for i, c in enumerate(contours):
color = (255, 0, 255)
# contour
cv2.drawContours(drawing, contours, i, color)
# rotated rectangle
box = cv2.boxPoints(minRect[i])
box = np.intp(box) # np.intp: Integer used for indexing (same as C ssize_t; normally either int32 or int64)
cv2.drawContours(img, [box], 0, color)
cv2.imshow('Result', img)
cv2.waitKey()
Do I need to run the results through OCR to make sure whether it is text or not? What other approaches should I try?
PS: I'm quite new to computer vision and not familiar with most concepts yet.
Here's a simple approach:
Obtain binary image. Load image, create blank mask, convert to grayscale, Gaussian blur, then Otsu's threshold
Merge text into a single contour. Since we want to extract the text as one piece, we perform morphological operations to connect individual text contours into a single contour.
Extract text. We find contours then filter using contour area with cv2.contourArea and aspect ratio using cv2.arcLength + cv2.approxPolyDP. If a contour passes the filter, we find the rotated bounding box and draw this onto our mask.
Isolate text. We perform an cv2.bitwise_and operation to extract the text.
Here's a visualization of the process. Using this screenshotted input image (since your provided input image was connected as one image):
Input image -> Binary image
Morph close -> Detected text
Isolated text
Results with the other image
Input image -> Binary image + morph close
Detected text -> Isolated text
Code
import cv2
import numpy as np
# Load image, create mask, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
blank = np.zeros(image.shape[:2], dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Merge text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find contours
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
# Filter using contour area and aspect ratio
x,y,w,h = cv2.boundingRect(c)
area = cv2.contourArea(c)
ar = w / float(h)
if (ar > 1.4 and ar < 4) or ar < .85 and area > 10 and area < 500:
# Find rotated bounding box
rect = cv2.minAreaRect(c)
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(image,[box],0,(36,255,12),2)
cv2.drawContours(blank,[box],0,(255,255,255),-1)
# Bitwise operations to isolate text
extract = cv2.bitwise_and(thresh, blank)
extract = cv2.bitwise_and(original, original, mask=extract)
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.imshow('close', close)
cv2.imshow('extract', extract)
cv2.waitKey()
I removed the text using the following comand (after the code of above):
gray2 = cv2.cvtColor(extract, cv2.COLOR_BGR2GRAY)
blur2 = cv2.GaussianBlur(gray2, (5,5), 0)
thresh2 = cv2.threshold(blur2, 0, 255, cv2.THRESH_BINARY)[1]
test = cv2.inpaint(original, thresh2, 7, cv2.INPAINT_TELEA)
Related
I want to retrieve all contours of the image below, but ignore text.
Image:
When I try to find the contours of the current image I get the following:
I have no idea how to go about this as I am new to using OpenCV and image processing. I want to get ignore the text, how can I achieve this? If ignoring is not possible but making a single bounding box surrounding the text is, than that would be good too.
Edit:
Criteria that I need to match:
The contours may very in size and shape.
The colors from the image may differ.
The colors and size of the text inside the image may differ.
Here is one way to do that in Python/OpenCV.
Read the input
Convert to grayscale
Get Canny edges
Apply morphology close to ensure they are closed
Get all contour hierarchy
Filter contours to keep only those above threshold in perimeter
Draw contours on input
Draw each contour on a black background
Save results
Input:
import numpy as np
import cv2
# read input
img = cv2.imread('short_title.png')
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# get canny edges
edges = cv2.Canny(gray, 1, 50)
# apply morphology close to ensure they are closed
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
# get contours
contours = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
contours = contours[0] if len(contours) == 2 else contours[1]
# filter contours to keep only large ones
result = img.copy()
i = 1
for c in contours:
perimeter = cv2.arcLength(c, True)
if perimeter > 500:
cv2.drawContours(result, c, -1, (0,0,255), 1)
contour_img = np.zeros_like(img, dtype=np.uint8)
cv2.drawContours(contour_img, c, -1, (0,0,255), 1)
cv2.imwrite("short_title_contour_{0}.jpg".format(i),contour_img)
i = i + 1
# save results
cv2.imwrite("short_title_gray.jpg", gray)
cv2.imwrite("short_title_edges.jpg", edges)
cv2.imwrite("short_title_contours.jpg", result)
# show images
cv2.imshow("gray", gray)
cv2.imshow("edges", edges)
cv2.imshow("result", result)
cv2.waitKey(0)
Grayscale:
Edges:
All contours on input:
Contour 1:
Contour 2:
Contour 3:
Contour 4:
Here are two options for erasing the text:
Using pytesseract OCR.
Finding white (and small) connected components.
Both solution build a mask, dilate the mask and use cv2.inpaint for erasing the text.
Using pytesseract:
Find text boxes using pytesseract.image_to_boxes.
Fill the boxes in the mask with 255.
Code sample:
import cv2
import numpy as np
from pytesseract import pytesseract, Output
# Tesseract path
pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"
img = cv2.imread('ShortAndInteresting.png')
# https://stackoverflow.com/questions/20831612/getting-the-bounding-box-of-the-recognized-words-using-python-tesseract
boxes = pytesseract.image_to_boxes(img, lang='eng', config=' --psm 6') # Run tesseract, returning the bounding boxes
h, w, _ = img.shape # assumes color image
mask = np.zeros((h, w), np.uint8)
# Fill the bounding boxes on the image
for b in boxes.splitlines():
b = b.split(' ')
mask = cv2.rectangle(mask, (int(b[1]), h - int(b[2])), (int(b[3]), h - int(b[4])), 255, -1)
mask = cv2.dilate(mask, np.ones((5, 5), np.uint8)) # Dilate the boxes in the mask
clean_img = cv2.inpaint(img, mask, 2, cv2.INPAINT_NS) # Remove the text using inpaint (replace the masked pixels with the neighbor pixels).
# Show mask and clean_img for testing
cv2.imshow('mask', mask)
cv2.imshow('clean_img', clean_img)
cv2.waitKey()
cv2.destroyAllWindows()
Mask:
Finding white (and small) connected components:
Use mask = cv2.inRange(img, (230, 230, 230), (255, 255, 255)) for finding the text (assume the text is white).
Finding connected components in the mask using cv2.connectedComponentsWithStats(mask, 4)
Remove large components from the mask - fill components with large area with zeros.
Code sample:
import cv2
import numpy as np
img = cv2.imread('ShortAndInteresting.png')
mask = cv2.inRange(img, (230, 230, 230), (255, 255, 255))
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, 4) # Finding connected components with statistics
# Remove large components from the mask (fill components with large area with zeros).
for i in range(1, nlabel):
area = stats[i, cv2.CC_STAT_AREA] # Get area
if area > 1000:
mask[labels == i] = 0 # Remove large connected components from the mask (fill with zero)
mask = cv2.dilate(mask, np.ones((5, 5), np.uint8)) # Dilate the text in the maks
cv2.imwrite('mask2.png', mask)
clean_img = cv2.inpaint(img, mask, 2, cv2.INPAINT_NS) # Remove the text using inpaint (replace the masked pixels with the neighbor pixels).
# Show mask and clean_img for testing
cv2.imshow('mask', mask)
cv2.imshow('clean_img', clean_img)
cv2.waitKey()
cv2.destroyAllWindows()
Mask:
Clean image:
Note:
My assumption is that you know how to split the image into contours, and the only issue is the present of the text.
I would recommend using flood fill, find the seed point for each color region, flood fill it to ignore the text values within. Hope that helps!
Refer to example of using floodfill here: https://www.programcreek.com/python/example/89425/cv2.floodFill
Example below copied from link above
def fillhole(input_image):
'''
input gray binary image get the filled image by floodfill method
Note: only holes surrounded in the connected regions will be filled.
:param input_image:
:return:
'''
im_flood_fill = input_image.copy()
h, w = input_image.shape[:2]
mask = np.zeros((h + 2, w + 2), np.uint8)
im_flood_fill = im_flood_fill.astype("uint8")
cv.floodFill(im_flood_fill, mask, (0, 0), 255)
im_flood_fill_inv = cv.bitwise_not(im_flood_fill)
img_out = input_image | im_flood_fill_inv
return img_out
This might be a bit too "general" question, but how do I perform GRAYSCALE image segmentation and keep the largest contour? I am trying to remove background noise (i.e. labels) from breast mammograms, but I am not successful. Here is the original image:
First, I applied AGCWD algorithm (based on paper "Efficient Contrast Enhancement Using Adaptive Gamma Correction With Weighting Distribution") in order to get better contrast of the image pixels, like so:
Afterwards, I tried executing following steps:
Image segmentation using OpenCV's KMeans clustering algorithm:
enhanced_image_cpy = enhanced_image.copy()
reshaped_image = np.float32(enhanced_image_cpy.reshape(-1, 1))
number_of_clusters = 10
stop_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.1)
ret, labels, clusters = cv2.kmeans(reshaped_image, number_of_clusters, None, stop_criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
clusters = np.uint8(clusters)
Canny Edge Detection:
removed_cluster = 1
canny_image = np.copy(enhanced_image_cpy).reshape((-1, 1))
canny_image[labels.flatten() == removed_cluster] = [0]
canny_image = cv2.Canny(canny_image,100,200).reshape(enhanced_image_cpy.shape)
show_images([canny_image])
Find and Draw Contours:
initial_contours_image = np.copy(canny_image)
initial_contours_image_bgr = cv2.cvtColor(initial_contours_image, cv2.COLOR_GRAY2BGR)
_, thresh = cv2.threshold(initial_contours_image, 50, 255, 0)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(initial_contours_image_bgr, contours, -1, (255,0,0), cv2.CHAIN_APPROX_SIMPLE)
show_images([initial_contours_image_bgr])
Here is how image looks after I draw 44004 contours:
I am not sure how can I get one BIG contour, instead of 44004 small ones. Any ideas how to fix my approach, or possibly any ideas on using alternative approach to get rid of label in top right corner.
Thanks in advance!
Here is one way to do that in Python OpenCV
Read the image
Threshold and invert so the borders are black
Remove the borders of the image as follows (so as to make it easier to get the relevant contours later):
Count the number of non-zero pixels in each column and find the first and last column that have counts greater than 0
Count the number of non-zero pixels in each row and find the first and last row that have counts greater than 0
Crop the image to remove the borders
Crop thresh1 and invert to make thresh2
Get the external contours from thresh2
Find the largest contour and draw as white filled on a black background as a mask
Make all pixels in the cropped image black where the mask is black
Save the results -
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('xray3.png')
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold and invert
thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)[1]
thresh1 = 255 - thresh1
# remove borders
# count number of white pixels in columns as new 1D array
count_cols = np.count_nonzero(thresh1, axis=0)
# get first and last x coordinate where black
first_x = np.where(count_cols>0)[0][0]
last_x = np.where(count_cols>0)[0][-1]
print(first_x,last_x)
# count number of white pixels in rows as new 1D array
count_rows = np.count_nonzero(thresh1, axis=1)
# get first and last y coordinate where black
first_y = np.where(count_rows>0)[0][0]
last_y = np.where(count_rows>0)[0][-1]
print(first_y,last_y)
# crop image
crop = img[first_y:last_y+1, first_x:last_x+1]
# crop thresh1 and invert
thresh2 = thresh1[first_y:last_y+1, first_x:last_x+1]
thresh2 = 255 - thresh2
# get external contours and keep largest one
contours = cv2.findContours(thresh2, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
# make mask from contour
mask = np.zeros_like(thresh2 , dtype=np.uint8)
cv2.drawContours(mask, [big_contour], 0, 255, -1)
# make crop black everywhere except where largest contour is white in mask
result = crop.copy()
result[mask==0] = (0,0,0)
# write result to disk
cv2.imwrite("xray3_thresh1.jpg", thresh1)
cv2.imwrite("xray3_crop.jpg", crop)
cv2.imwrite("xray3_thresh2.jpg", thresh2)
cv2.imwrite("xray3_mask.jpg", mask)
cv2.imwrite("xray3_result.png", result)
# display it
cv2.imshow("thresh1", thresh1)
cv2.imshow("crop", crop)
cv2.imshow("thresh2", thresh2)
cv2.imshow("mask", mask)
cv2.imshow("result", result)
cv2.waitKey(0)
Threshold 1 image:
Cropped image:
Threshold 2 image:
Mask image:
Result:
I have an image which looks something like this:
here is the image
I need to make borders around the points such that they are divided into clusters. For example, the center of the image is one region. One other region can be the top of the image. How can I achieve this, with python preferably?
Here is one way to do that in Python/OpenCV.
- Read the input as unchanged, since it has transparency
- Separate the base image and the alpha channel
- Mask the base image with the alpha channel so as to make the white outer region with the text into all black
- Convert that image into grayscale and then into black/white
- Apply morphology close to connect all the dots in the regions
- Find all contours larger than some minimum area
- Draw the contours on the base image
- Save the results
Input:
import cv2
import numpy as np
# read image with transparency
image = cv2.imread("dots.png", cv2.IMREAD_UNCHANGED)
# separate base image and alpha channel and make background under transparency into black to remove white border and text
base = image[:,:,0:3]
alpha = image[:,:,3]
alpha = cv2.merge([alpha,alpha,alpha])
img = cv2.bitwise_and(base, alpha)
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#do threshold on gray image
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY)[1]
# apply morphology close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 15))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Get contours
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
result = base.copy()
for c in cnts:
area = cv2.contourArea(c)
if area > 100:
cv2.drawContours(result, [c], -1, (0, 255, 0), 1)
# display it
cv2.imshow("BASE", base)
cv2.imshow("BLACKENED", img)
cv2.imshow("CLOSED", close)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
# write results to disk
cv2.imwrite("dots_blackened.png", img)
cv2.imwrite("dots_closed.png", close)
cv2.imwrite("dots_clusters.png", result)
Base Image with transparency blackened:
Morphology Close Image:
Contours on base image:
I pieced together a quick algorithm in python to get the input boxes from a handwritten invoice.
# some preprocessing
img = np.copy(orig_img)
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
img = cv2.GaussianBlur(img,(5,5),0)
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# get contours
contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
for i, cnt in enumerate(contours):
approx = cv2.approxPolyDP(cnt, 0.01*cv2.arcLength(cnt,True), True)
if len(approx) == 4:
cv2.drawContours(orig_img, contours, i, (0, 255, 0), 2)
It fails to get the 2nd one in this example because the handwriting crosses the box boundary.
Note that this picture could be taken with a mobile phone, so aspect ratios may be a little funny.
So, what are some neat recipes to get around my problem?
And as a bonus. These boxes are from an A4 page with a lot of other stuff going on. Would you recommend a whole different approach to getting the handwritten numbers out?
EDIT
This might be interesting. If I don't filter for 4 sided polys, I get the contours but they go all around the hand-drawn digit. Maybe there's a way to make contours have water-like cohesion so that they pinch off when they get close to themselves?
FURTHER EDIT
Here is the original image without bounding boxes drawn on
Here's a potential solution:
Obtain binary image. We load the image, convert to grayscale, apply a Gaussian blur, and then Otsu's threshold
Detect horizontal lines. We create a horizontal kernel and draw detected horizontal lines onto a mask
Detect vertical lines. We create a vertical kernel and draw detected vertical lines onto a mask
Perform morphological opening. We create a rectangular kernel and perform morph opening to smooth out noise and separate any connected contours
Find contours, draw rectangle, and extract ROI. We find contours and draw the bounding rectangle onto the image
Here's a visualization of each step:
Binary image
Detected horizontal and vertical lines drawn onto a mask
Morphological opening
Result
Individual extracted saved ROI
Note: To extract only the hand written numbers/letters out of each ROI, take a look at a previous answer in Remove borders from image but keep text written on borders (preprocessing before OCR)
Code
import cv2
import numpy as np
# Load image, grayscale, blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Find horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (50,1))
detect_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
cnts = cv2.findContours(detect_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(mask, [c], -1, (255,255,255), 3)
# Find vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,50))
detect_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
cnts = cv2.findContours(detect_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
cv2.drawContours(mask, [c], -1, (255,255,255), 3)
# Morph open
mask = cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=1)
# Draw rectangle and save each ROI
number = 0
cnts = cv2.findContours(opening, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
x,y,w,h = cv2.boundingRect(c)
cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
ROI = original[y:y+h, x:x+w]
cv2.imwrite('ROI_{}.png'.format(number), ROI)
number += 1
cv2.imshow('thresh', thresh)
cv2.imshow('mask', mask)
cv2.imshow('opening', opening)
cv2.imshow('image', image)
cv2.waitKey()
Since the squares have a quite straight lines, it's good to use Hough transform:
1- Make the image grayscale, then do an Otsu threshold on it, then reverse the binary image
2- Do Hough transform (HoughLinesP) and draw the lines on a new image
3- With findContours and drawContours, make the 3 roi clean
4- Erode the final image a little to make the boxes neater
I wrote the code in C++, it's easily convertible to python:
Mat img = imread("D:/1.jpg", 0);
threshold(img, img, 0, 255, THRESH_OTSU);
imshow("Binary image", img);
img = 255 - img;
imshow("Reversed binary image", img);
Mat img_1 = Mat::zeros(img.size(), CV_8U);
Mat img_2 = Mat::zeros(img.size(), CV_8U);
vector<Vec4i> lines;
HoughLinesP(img, lines, 1, 0.1, 95, 10, 1);
for (size_t i = 0; i < lines.size(); i++)
line(img_1, Point(lines[i][0], lines[i][1]), Point(lines[i][2], lines[i][3]),
Scalar(255, 255, 255), 2, 8);
imshow("Hough Lines", img_1);
vector<vector<Point>> contours;
findContours(img_1,contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);
for (int i = 0; i< contours.size(); i++)
drawContours(img_2, contours, i, Scalar(255, 255, 255), -1);
imshow("final result after drawcontours", img_2); waitKey(0);
Thank you to those who shared solutions. I ended up taking a slightly different path in the end.
Grayscale, Gaussian Blur, Otsu threshold
Get contours
Filter contours by aspect ratio and extent
Return the minimum upright bounding box of the contour.
Remove any bounding boxes that encapsulate smaller bounding boxes (because you get two boxes, one for the inside contour, and one for the outside).
Here's the code if anyone's interested (except for step 5 - that was just basic numpy manipulation)
orig_img = cv2.imread('example0.jpg')
img = np.copy(orig_img)
img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
img = cv2.GaussianBlur(img,(5,5),0)
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
contours, hierarchy = cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
boxes = list()
for i, cnt in enumerate(contours):
x,y,w,h = cv2.boundingRect(cnt)
aspect_ratio = float(w)/h
area = cv2.contourArea(cnt)
rect_area = w*h
extent = float(area)/rect_area
if abs(aspect_ratio - 1) < 0.1 and extent > 0.7:
boxes.append((x,y,w,h))
And here's an example of what came out when cutting out the boundary boxes from the original image.
I am trying to extract object from an image using the color using OpenCV, I have tried by inverse thresholding and grayscale combined with cv2.findContours() but I am unable to use it recursively. Furthermore I can't figure out how to "cut out" the match from the original image and save it to a single file.
EDIT
~
import cv2
import numpy as np
# load the images
empty = cv2.imread("empty.jpg")
full = cv2.imread("test.jpg")
# save color copy for visualization
full_c = full.copy()
# convert to grayscale
empty_g = cv2.cvtColor(empty, cv2.COLOR_BGR2GRAY)
full_g = cv2.cvtColor(full, cv2.COLOR_BGR2GRAY)
empty_g = cv2.GaussianBlur(empty_g, (51, 51), 0)
full_g = cv2.GaussianBlur(full_g, (51, 51), 0)
diff = full_g - empty_g
# thresholding
diff_th =
cv2.adaptiveThreshold(full_g,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
cv2.THRESH_BINARY,11,2)
# combine the difference image and the inverse threshold
zone = cv2.bitwise_and(diff, diff_th, None)
# threshold to get the mask instead of gray pixels
_, zone = cv2.threshold(bag, 100, 255, 0)
# dilate to account for the blurring in the beginning
kernel = np.ones((15, 15), np.uint8)
bag = cv2.dilate(bag, kernel, iterations=1)
# find contours, sort and draw the biggest one
contours, _ = cv2.findContours(bag, cv2.RETR_TREE,
cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:3]
i = 0
while i < len(contours):
x, y, width, height = cv2.boundingRect(contours[i])
roi = full_c[y:y+height, x:x+width]
cv2.imwrite("piece"+str(i)+".png", roi)
i += 1
Where empty is just a white image size 1500 * 1000 as the one above and test is the one above.
This is what I came up with, only downside, I have a third image instead of only the 2 expected showing a shadow zone now...
Here's a simple approach:
Obtain binary image. Load the image, grayscale, Gaussian blur, Otsu's threshold, then dilate to obtain a binary black/white image.
Extract ROI. Find contours, obtain bounding boxes, extract ROI using Numpy slicing, and save each ROI
Binary image (Otsu's thresholding + dilation)
Detected ROIs highlighted in green
To extract each ROI, you can find the bounding box coordinates using cv2.boundingRect(), crop the desired region, then save the image
x,y,w,h = cv2.boundingRect(c)
ROI = original[y:y+h, x:x+w]
First object
Second object
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold, dilate
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
dilate = cv2.dilate(thresh, kernel, iterations=1)
# Find contours, obtain bounding box coordinates, and extract ROI
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
image_number = 0
for c in cnts:
x,y,w,h = cv2.boundingRect(c)
cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
ROI = original[y:y+h, x:x+w]
cv2.imwrite("ROI_{}.png".format(image_number), ROI)
image_number += 1
cv2.imshow('image', image)
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.waitKey()