I have a small script (GitHub) (based on this answer) to detect objects on a white background. The script is working fine and detects the objects. For example, this image:
becomes this:
and I crop the boundingRect (red one).
I'll be doing further operations on this image. For example instead of a rectangle crop, I will be cropping just the contour. (Anyway, these are further problems to be faced.)
What I want to do now is scale up/grow the contour (green one). I'm not sure if scale and grow mean the same thing in this context, because when I think of scale, there's usually a single point of origin/anchor point. With grow, it's relative to the edges. I want to have something like this (created in Photoshop):
So after I detect the object/find contours, I want to grow it by some value/ratio, so that I have some space/pixels to modify which won't affect the object. How can I do that?
Mentioned script:
import sys
from pathlib import Path
import cv2
import numpy
from PIL import Image, ImageCms
# drop an image on this script file
img_path = Path(sys.argv[1])
# open image with Pillow and convert it to RGB if the image is CMYK
img = Image.open(str(img_path))
if img.mode == "CMYK":
    img = ImageCms.profileToProfile(img, "Color Profiles\\USWebCoatedSWOP.icc", "Color Profiles\\sRGB_Color_Space_Profile.icm", outputMode="RGB")
img = cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshed = cv2.threshold(gray, 240, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11,11))
morphed = cv2.morphologyEx(threshed, cv2.MORPH_CLOSE, kernel)
contours = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
contour = sorted(contours, key=cv2.contourArea)[-1]
x, y, w, h = cv2.boundingRect(contour)
final = cv2.drawContours(img, contours, -1, (0,255,0), 2)
cv2.rectangle(final, (x,y), (x+w,y+h), (0,0,255), 2)
cv2.imshow("final", final)
Images posted here are scaled down to keep the question short. Original images and the script(s) can be found on the mentioned (first paragraph) GitHub page.
Thanks to HansHirse's suggestion (using morphological dilation), I've managed to make it work.
img_path = Path(sys.argv[1])
def cmyk_to_rgb(cmyk_img):
    img = Image.open(cmyk_img)
    if img.mode == "CMYK":
        img = ImageCms.profileToProfile(img, "Color Profiles\\USWebCoatedSWOP.icc", "Color Profiles\\sRGB_Color_Space_Profile.icm", outputMode="RGB")
    return cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)

def cv_threshold(img, thresh=128, maxval=255, type=cv2.THRESH_BINARY):
    if len(img.shape) == 3:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    threshed = cv2.threshold(img, thresh, maxval, type)[1]
    return threshed

def find_contours(img, to_gray=None):
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11,11))
    morphed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
    contours = cv2.findContours(morphed, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return contours[-2]

def mask_from_contours(ref_img, contours):
    mask = numpy.zeros(ref_img.shape, numpy.uint8)
    mask = cv2.drawContours(mask, contours, -1, (255,255,255), -1)
    return cv2.cvtColor(mask, cv2.COLOR_BGR2GRAY)

def dilate_mask(mask, kernel_size=10):
    kernel = numpy.ones((kernel_size,kernel_size), numpy.uint8)
    dilated = cv2.dilate(mask, kernel, iterations=1)
    return dilated

def draw_contours(src_img, contours):
    canvas = cv2.drawContours(src_img.copy(), contours, -1, (0,255,0), 2)
    x, y, w, h = cv2.boundingRect(contours[-1])
    cv2.rectangle(canvas, (x,y), (x+w,y+h), (0,0,255), 2)
    return canvas

orig_img = cmyk_to_rgb(str(img_path))
orig_threshed = cv_threshold(orig_img, 240, type=cv2.THRESH_BINARY_INV)
orig_contours = find_contours(orig_threshed)
orig_mask = mask_from_contours(orig_img, orig_contours)
orig_output = draw_contours(orig_img, orig_contours)
dilated_mask = dilate_mask(orig_mask, 50)
dilated_contours = find_contours(dilated_mask)
dilated_output = draw_contours(orig_img, dilated_contours)
cv2.imshow("orig_output", orig_output)
cv2.imshow("dilated_output", dilated_output)
I believe the code is self-explanatory enough. An example output:
Full script (again) can be found at show_dilated_contours.py
As a bonus, I later wanted to smooth the contours. I came across this blog post in which the author talks about how to smooth the edges of a shape (in Photoshop). The idea is really simple and can also be applied in OpenCV to smooth the contours. The steps are:
1. Create a mask from contours (or from the shape)
2. Blur the mask
3. Threshold the blurred mask (now, we have a smoother mask than the mask in step 1)
4. Find the contours again on the blurred + thresholded image. Since the mask/shape is smoother, we'll get smoother contours.
Example code and output:
# ... continuing previous code
# pass 1
smooth_mask_blurred = cv2.GaussianBlur(dilated_mask, (21,21), 0)
smooth_mask_threshed1 = cv_threshold(smooth_mask_blurred)
# pass 2
smooth_mask_blurred = cv2.GaussianBlur(smooth_mask_threshed1, (21,21), 0)
smooth_mask_threshed2 = cv_threshold(smooth_mask_blurred)
# find contours from smoothened mask
smooth_mask_contours = find_contours(smooth_mask_threshed2)
# draw the contours on the original image
smooth_mask_output = draw_contours(orig_img, smooth_mask_contours)
cv2.imshow("dilated_output", dilated_output)
cv2.imshow("smooth_mask_output", smooth_mask_output)
Full code at show_smooth_contours.py.
The task I want to do looks pretty simple: I take as input several images with an object centered in the photo and a little color chart needed for other purposes. My code works in the majority of cases, but sometimes it fails miserably and I just can't understand why.
For example (these are the source images), it works correctly on this https://imgur.com/PHfIqcb but not on this https://imgur.com/qghzO3V
Here's the relevant part of the code:
img = cv2.imread(path)
height, width, channel = img.shape
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = np.ones((31, 31), np.uint8)
dil = cv2.dilate(gray, kernel, iterations=1)
_, th = cv2.threshold(dil, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
th_er1 = cv2.bitwise_not(th)
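# [-2:] keeps this working on both OpenCV 3 (3 return values) and OpenCV 4 (2 return values)
contours, _ = cv2.findContours(th_er1, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
areas = [cv2.contourArea(c) for c in contours]
max_index = np.argmax(areas)
# pick the contour with the largest area
cnt = contours[max_index]
x,y,w,h = cv2.boundingRect(cnt)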
After that I'm just going to crop the image according to the given results (taking the bounding rectangle of the biggest contour), basically cutting the main object out of the photo.
But as I said, using very similar images sometimes works and sometimes not.
Thank you in advance.
maybe you could try not using otsu's method, and just set threshold manually, if it's possible... ;)
You can use the Canny edge detector. In the two images, there is a good threshold value to isolate the object in the center of the image. After applying the threshold, we blur the results and apply the Canny edge detector before finding the contours:
import cv2
import numpy as np
def process(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(img_gray, 190, 255, cv2.THRESH_BINARY_INV)
    img_blur = cv2.GaussianBlur(thresh, (3, 3), 1)
    img_canny = cv2.Canny(img_blur, 0, 0)
    kernel = np.ones((5, 5))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=1)
    return cv2.erode(img_dilate, kernel, iterations=1)

def get_contours(img):
    contours, hierarchies = cv2.findContours(process(img), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # keep only the largest contour and draw it with its bounding box
    cnt = max(contours, key=cv2.contourArea)
    cv2.drawContours(img, [cnt], -1, (0, 255, 0), 30)
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 30)

img = cv2.imread("image.jpeg")
get_contours(img)
cv2.imshow("Result", img)
cv2.waitKey(0)
Input images:
Output images:
The green outlines are the contours of the objects, and the red outlines are the bounding boxes of the objects.
I want to detect text on x-ray images. The goal is to extract the oriented bounding boxes as a matrix where each row is a detected bounding box and each row contains the coordinates of all four edges i.e. [x1, x2, y1, y2]. I'm using Python 3 and OpenCV 4.2.0.
Here is a sample image:
The string "test word", "a" and "b" should be detected.
I followed this OpenCV tutorial about creating rotated boxes for contours and this Stack Overflow answer about detecting a text area in an image.
The resulting boundary boxes should look something like this:
I was able to detect the text, but the result included a lot of boxes without text.
Here is what I tried so far:
img = cv2.imread(file_name)
## Open the image, convert it into grayscale and blur it to get rid of the noise.
img2gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
ret, mask = cv2.threshold(img2gray, 180, 255, cv2.THRESH_BINARY)
image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY) # for black text , cv.THRESH_BINARY_INV
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(new_img, kernel, iterations=6)
canny_output = cv2.Canny(dilated, 100, 100 * 2)
cv2.imshow('Canny', canny_output)
## Finds contours and saves them to the vectors contour and hierarchy.
contours, hierarchy = cv2.findContours(canny_output, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Find the rotated rectangles and ellipses for each contour
minRect = [None] * len(contours)
for i, c in enumerate(contours):
    minRect[i] = cv2.minAreaRect(c)
# Draw contours + rotated rects + ellipses
drawing = np.zeros((canny_output.shape[0], canny_output.shape[1], 3), dtype=np.uint8)
for i, c in enumerate(contours):
    color = (255, 0, 255)
    # contour
    cv2.drawContours(drawing, contours, i, color)
    # rotated rectangle
    box = cv2.boxPoints(minRect[i])
    box = np.intp(box) # np.intp: Integer used for indexing (same as C ssize_t; normally either int32 or int64)
    cv2.drawContours(img, [box], 0, color)
cv2.imshow('Result', img)
Do I need to run the results through OCR to make sure whether it is text or not? What other approaches should I try?
PS: I'm quite new to computer vision and not familiar with most concepts yet.
Here's a simple approach:
Obtain binary image. Load image, create blank mask, convert to grayscale, Gaussian blur, then Otsu's threshold
Merge text into a single contour. Since we want to extract the text as one piece, we perform morphological operations to connect individual text contours into a single contour.
Extract text. We find contours, then filter by contour area (cv2.contourArea) and by aspect ratio (computed from the width and height returned by cv2.boundingRect). If a contour passes the filter, we find the rotated bounding box and draw it onto our mask.
Isolate text. We perform a cv2.bitwise_and operation to extract the text.
Here's a visualization of the process. Using this screenshotted input image (since your provided input image was connected as one image):
Input image -> Binary image
Morph close -> Detected text
Isolated text
Results with the other image
Input image -> Binary image + morph close
Detected text -> Isolated text
import cv2
import numpy as np
# Load image, create mask, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
original = image.copy()
blank = np.zeros(image.shape[:2], dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Merge text into a single contour
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)
# Find contours
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
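for c in cnts:
    # Filter using contour area and aspect ratio
    x,y,w,h = cv2.boundingRect(c)
    area = cv2.contourArea(c)
    ar = w / float(h)
    if (ar > 1.4 and ar < 4) or ar < .85 and area > 10 and area < 500:
        # Find rotated bounding box
        rect = cv2.minAreaRect(c)
        box = cv2.boxPoints(rect)
        box = np.intp(box)  # np.int0 was removed in NumPy 2.0
        # Outline the box on the image and fill it on the blank mask
        cv2.drawContours(image, [box], -1, (36,255,12), 2)
        cv2.drawContours(blank, [box], -1, (255,255,255), -1)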
# Bitwise operations to isolate text
extract = cv2.bitwise_and(thresh, blank)
extract = cv2.bitwise_and(original, original, mask=extract)
cv2.imshow('thresh', thresh)
cv2.imshow('image', image)
cv2.imshow('close', close)
cv2.imshow('extract', extract)
I removed the text using the following commands (placed after the code above):
gray2 = cv2.cvtColor(extract, cv2.COLOR_BGR2GRAY)
blur2 = cv2.GaussianBlur(gray2, (5,5), 0)
thresh2 = cv2.threshold(blur2, 0, 255, cv2.THRESH_BINARY)[1]
test = cv2.inpaint(original, thresh2, 7, cv2.INPAINT_TELEA)
I'm using OpenCV 4 with Python 3 to find a specific area in a black & white image.
This area is not a 100% filled shape. It may have some gaps between the white lines.
This is the base image from where I start processing:
This is the rectangle I expect - made with photoshop -:
Results I got with hough transform lines - not accurate -
So basically, I start from the first image and I expect to find what you see in the second one.
Any idea of how to get the rectangle of the second image?
I'd like to present an approach that uses only NumPy's nonzero function and might be computationally less expensive than the solution in fmw42's answer. Basically, all non-zero indices for both axes are found, and then the minima and maxima are obtained. Since we have binary images here, this approach works pretty well.
Let's have a look at the following code:
import cv2
import numpy as np
# Read image as grayscale; threshold to get rid of artifacts
_, img = cv2.threshold(cv2.imread('images/LXSsV.png', cv2.IMREAD_GRAYSCALE), 0, 255, cv2.THRESH_BINARY)
# Get indices of all non-zero elements
nz = np.nonzero(img)
# Find minimum and maximum x and y indices
y_min = np.min(nz[0])
y_max = np.max(nz[0])
x_min = np.min(nz[1])
x_max = np.max(nz[1])
# Create some output
output = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
cv2.rectangle(output, (x_min, y_min), (x_max, y_max), (0, 0, 255), 2)
# Show results
cv2.imshow('img', img)
cv2.imshow('output', output)
I borrowed the cropped image from fmw42's answer as input, and my output should be the same (or most similar):
Hope that (also) helps!
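One caveat: np.min and np.max raise a ValueError on an empty array, so the code above fails on an all-black image. A small wrapper can guard against that; bbox_from_mask is a hypothetical name, and returning (x, y, w, h) is my own choice to mirror cv2.boundingRect.

```python
import numpy as np

def bbox_from_mask(mask):
    """Return (x, y, w, h) of all non-zero pixels, or None if the mask is empty.

    Hypothetical helper built on the np.nonzero idea.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    x_min, x_max = int(xs.min()), int(xs.max())
    y_min, y_max = int(ys.min()), int(ys.max())
    # +1 because both the minimum and the maximum index are inside the box.
    return x_min, y_min, x_max - x_min + 1, y_max - y_min + 1

# Synthetic binary image: white block in rows 10..19 and columns 30..44.
demo = np.zeros((50, 60), np.uint8)
demo[10:20, 30:45] = 255

demo_box = bbox_from_mask(demo)
print(demo_box)
```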
In Python/OpenCV, you can use morphology to connect all the white parts of your image and then get the outer contour. Note I have modified your image to remove the parts at the top and bottom from your screen snap.
import cv2
import numpy as np
# read image
img = cv2.imread('blackbox.png')
# convert to grayscale
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# threshold
_,thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY)
# apply close to connect the white areas
kernel = np.ones((75,75), np.uint8)
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# get contours (presumably just one around the outside)
result = img.copy()
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
for cntr in contours:
    x,y,w,h = cv2.boundingRect(cntr)
    cv2.rectangle(result, (x, y), (x+w, y+h), (0, 0, 255), 2)
# show thresh and result
cv2.imshow("thresh", thresh)
cv2.imshow("Bounding Box", result)
# save resulting images
Image after morphology:
Here's a slight modification to @fmw42's answer. The idea of connecting the desired regions into a single contour is very similar; however, since there's only one object, you can find the bounding rectangle directly. Using the same cropped input image, here's the result.
We can optionally extract the ROI too:
import cv2
# Grayscale, threshold, and dilate
image = cv2.imread('3.png')
original = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Connect into a single contour and find rect
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=1)
x,y,w,h = cv2.boundingRect(dilate)
ROI = original[y:y+h,x:x+w]
cv2.rectangle(image, (x, y), (x+w, y+h), (36, 255, 12), 2)
cv2.imshow('image', image)
cv2.imshow('ROI', ROI)
I am trying to extract objects from an image by color using OpenCV. I have tried inverse thresholding on a grayscale image combined with cv2.findContours(), but I am unable to use it recursively. Furthermore, I can't figure out how to "cut out" each match from the original image and save it to its own file.
import cv2
import numpy as np
# load the images
empty = cv2.imread("empty.jpg")
full = cv2.imread("test.jpg")
# save color copy for visualization
full_c = full.copy()
# convert to grayscale
empty_g = cv2.cvtColor(empty, cv2.COLOR_BGR2GRAY)
full_g = cv2.cvtColor(full, cv2.COLOR_BGR2GRAY)
empty_g = cv2.GaussianBlur(empty_g, (51, 51), 0)
full_g = cv2.GaussianBlur(full_g, (51, 51), 0)
diff = full_g - empty_g
# thresholding
diff_th =
# combine the difference image and the inverse threshold
zone = cv2.bitwise_and(diff, diff_th, None)
# threshold to get the mask instead of gray pixels
_, zone = cv2.threshold(bag, 100, 255, 0)
# dilate to account for the blurring in the beginning
kernel = np.ones((15, 15), np.uint8)
bag = cv2.dilate(bag, kernel, iterations=1)
# find contours, sort and draw the biggest one
contours, _ = cv2.findContours(bag, cv2.RETR_TREE,
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:3]
i = 0
while i < len(contours):
    x, y, width, height = cv2.boundingRect(contours[i])
    roi = full_c[y:y+height, x:x+width]
    cv2.imwrite("piece"+str(i)+".png", roi)
    i += 1
Here, empty is just a white 1500 x 1000 image, the same size as the one above, and test is the image shown above.
This is what I came up with, only downside, I have a third image instead of only the 2 expected showing a shadow zone now...
Here's a simple approach:
Obtain binary image. Load the image, grayscale, Gaussian blur, Otsu's threshold, then dilate to obtain a binary black/white image.
Extract ROI. Find contours, obtain bounding boxes, extract ROI using Numpy slicing, and save each ROI
Binary image (Otsu's thresholding + dilation)
Detected ROIs highlighted in green
To extract each ROI, you can find the bounding box coordinates using cv2.boundingRect(), crop the desired region, then save the image
x,y,w,h = cv2.boundingRect(c)
ROI = original[y:y+h, x:x+w]
First object
Second object
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold, dilate
image = cv2.imread('1.jpg')
original = image.copy()
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5,5), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
dilate = cv2.dilate(thresh, kernel, iterations=1)
# Find contours, obtain bounding box coordinates, and extract ROI
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
image_number = 0
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)
    ROI = original[y:y+h, x:x+w]
    cv2.imwrite("ROI_{}.png".format(image_number), ROI)
    image_number += 1
cv2.imshow('image', image)
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
I'm building a simple OCR and I'm unable to crop the letters after segmenting them with OpenCV. Can anyone help me with a simple way to crop the letters?
Here's the segmenting code.
import cv2
import numpy as np
mser = cv2.MSER_create()
# original image
# -1 loads as-is so if it will be 3 or 4 channel as the original
image = cv2.imread('1.jpg', -1)
# mask defaulting to black for 3-channel and transparent for 4-channel
# (of course replace corners with yours)
mask = np.zeros(image.shape, dtype=np.uint8)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
vis = image.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
channel_count = image.shape[2] # i.e. 3 or 4 depending on your image
ignore_mask_color = (255,)*channel_count
cv2.fillConvexPoly(mask, hulls, ignore_mask_color)
# from Masterfool: use cv2.fillConvexPoly if you know it's convex
masked_image = cv2.bitwise_and(vis, hulls)
#for m in range(len(hulls)):
#masked_image = cv2.bitwise_and(vis, ignore_mask_color)
# save the result
#cv2.imwrite('img'+m, masked_image)
This results in:
I need each letter to be cropped using the same hulls. Any help?
You can't crop and directly save the hulls as you can see them in the example you posted. Or, better, you can crop and paste them in a square/rectangle canvas. But it's not the answer you want for this question.
So, if all of your text is computer-written (printed), the best option to begin with is to apply cv2.findContours() to the image. There are other, more specific tools you could use, but for this question cv2.findContours() is enough.
import cv2
import numpy as np
#import image
image = cv2.imread('image.png')
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', gray)
ret,thresh = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV)
cv2.imshow('second', thresh)
kernel = np.ones((1,1), np.uint8)
img_dilation = cv2.dilate(thresh, kernel, iterations=1)
cv2.imshow('dilated', img_dilation)
#find contours ([-2:] keeps this working on both OpenCV 3 and OpenCV 4)
ctrs, hier = cv2.findContours(img_dilation.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2:]
#sort contours
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])
for i, ctr in enumerate(sorted_ctrs):
    # Get bounding box
    x, y, w, h = cv2.boundingRect(ctr)
    # Getting ROI (copied, so the rectangle drawn below does not end up in the saved crop)
    roi = image[y:y+h, x:x+w].copy()
    # show ROI
    #cv2.imshow('segment no:'+str(i),roi)
    cv2.rectangle(image,(x,y),( x + w, y + h ),(0,255,0),2)
    if w > 15 and h > 15:
        cv2.imwrite('roi{}.png'.format(i), roi)
cv2.imshow('marked areas',image)
You can tweak the kernel size to make the detected rectangles wider or narrower.