I am trying to get the corners of the box in image. Following are example images, their threshold results and on the right after the arrow are the results that I need. You might have seen these images before too on slack because I am using these images for my example questions on slack.
Following is the code that allows me reach till the middle image.
import cv2
import numpy as np
img_file = 'C:/Users/box.jpg'
img = cv2.imread(img_file, cv2.IMREAD_COLOR)
img = cv2.blur(img, (5, 5))
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)
thresh0 = cv2.adaptiveThreshold(s, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
thresh1 = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
thresh2 = cv2.adaptiveThreshold(v, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
thresh = cv2.bitwise_or(thresh0, thresh1)
cv2.imshow('Image-thresh0', thresh0)
cv2.waitKey(0)
cv2.imshow('Image-thresh1', thresh1)
cv2.waitKey(0)
cv2.imshow('Image-thresh2', thresh2)
cv2.waitKey(0)
Is there any method in opencv that can do it for me. I tried dilation cv2.dilate() and erosion cv2.erode() but it doesn't work in my cases.Or if not then what could be alternative ways of doing it ?
Thanks
Canny version of the image ... On the left with low threshold and on the right with high threshold
Below is a python implementation of #dhanushka's approach
import cv2
import numpy as np
# load color image
im = cv2.imread('input.jpg')
# smooth the image with alternative closing and opening
# with an enlarging kernel
morph = im.copy()
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 1))
morph = cv2.morphologyEx(morph, cv2.MORPH_CLOSE, kernel)
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
# take morphological gradient
gradient_image = cv2.morphologyEx(morph, cv2.MORPH_GRADIENT, kernel)
# split the gradient image into channels
image_channels = np.split(np.asarray(gradient_image), 3, axis=2)
channel_height, channel_width, _ = image_channels[0].shape
# apply Otsu threshold to each channel
for i in range(0, 3):
_, image_channels[i] = cv2.threshold(~image_channels[i], 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)
image_channels[i] = np.reshape(image_channels[i], newshape=(channel_height, channel_width, 1))
# merge the channels
image_channels = np.concatenate((image_channels[0], image_channels[1], image_channels[2]), axis=2)
# save the denoised image
cv2.imwrite('output.jpg', image_channels)
The above code doesn't give good results if the image you are dealing are invoices(or has large amount of text on a white background).
In order to get good results on such images, remove
gradient_image = cv2.morphologyEx(morph, cv2.MORPH_GRADIENT, kernel)
and pass morph obj to the split function and remove the ~ symbol inside for loop
You can smooth the image to some degree by applying alternative morphological closing and opening operations with an enlarging structuring element.Here are the original and smoothed versions.
Then take the morphological gradient of the image.
Then apply Otsu threshold to each of the channels, and merge those channels.
If your image sizes are different (larger), you might want to either change some of the parameters of the code or resize the images roughly to the sizes used here. The code is in c++ but it won't be difficult to port it to python.
/* load color image */
Mat im = imread(INPUT_FOLDER_PATH + string("2.jpg"));
/*
smooth the image with alternative closing and opening
with an enlarging kernel
*/
Mat morph = im.clone();
for (int r = 1; r < 4; r++)
{
Mat kernel = getStructuringElement(MORPH_ELLIPSE, Size(2*r+1, 2*r+1));
morphologyEx(morph, morph, CV_MOP_CLOSE, kernel);
morphologyEx(morph, morph, CV_MOP_OPEN, kernel);
}
/* take morphological gradient */
Mat mgrad;
Mat kernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3));
morphologyEx(morph, mgrad, CV_MOP_GRADIENT, kernel);
Mat ch[3], merged;
/* split the gradient image into channels */
split(mgrad, ch);
/* apply Otsu threshold to each channel */
threshold(ch[0], ch[0], 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
threshold(ch[1], ch[1], 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
threshold(ch[2], ch[2], 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU);
/* merge the channels */
merge(ch, 3, merged);
Not sure about how robust that solution will be but the idea is pretty simple. The edges of the box should be more pronounced than all the other high frequencies on those images. Thus using some basic preprocessing should allow to emphasize them.
I used your code to make a prototype but the contour finding doesn't have to be the right path. Also sorry for the iterative unsharp masking - didn't have time to adjust the parameters.
import cv2
import numpy as np
def unsharp_mask(img, blur_size = (9,9), imgWeight = 1.5, gaussianWeight = -0.5):
gaussian = cv2.GaussianBlur(img, (5,5), 0)
return cv2.addWeighted(img, imgWeight, gaussian, gaussianWeight, 0)
img_file = 'box.png'
img = cv2.imread(img_file, cv2.IMREAD_COLOR)
img = cv2.blur(img, (5, 5))
img = unsharp_mask(img)
img = unsharp_mask(img)
img = unsharp_mask(img)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)
thresh = cv2.adaptiveThreshold(s, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
_, contours, heirarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(contours, key = cv2.contourArea, reverse = True)
#for cnt in cnts:
canvas_for_contours = thresh.copy()
cv2.drawContours(thresh, cnts[:-1], 0, (0,255,0), 3)
cv2.drawContours(canvas_for_contours, contours, 0, (0,255,0), 3)
cv2.imshow('Result', canvas_for_contours - thresh)
cv2.imwrite("result.jpg", canvas_for_contours - thresh)
cv2.waitKey(0)
method 1: using AI models
always try image segmentation models if feasible to your project, robust models will work better on a wider domain than any thresholding technique.
for example Rembg , try online on a Huggingface space
here are the results:
method 2:
almost similar to other answers but with another approach.
instead of closing and opening to blur the "noise", we use cv2.bilateralFilter which is similar to photoshop's surface blur, read more
im = cv2.imread('1.png')
blur = cv2.bilateralFilter(im,21,75,75)
use sobel filter to find edges
from skimage.filters import sobel
gray = cv2.cvtColor(blur,cv2.COLOR_BGR2GRAY)
mm = sobel(gray)
mm = ((mm/mm.max())*255).astype('uint8')
apply thresholding, I use Sauvola Thresholding here:
from skimage.filters import threshold_sauvola
mm2 = np.invert(mm)
thresh_sauvola = threshold_sauvola(mm2, window_size=51)
th = mm2 < thresh_sauvola
dilate and fill holes:
def fill_hole(input_mask):
h, w = input_mask.shape
canvas = np.zeros((h + 2, w + 2), np.uint8)
canvas[1:h + 1, 1:w + 1] = input_mask.copy()
mask = np.zeros((h + 4, w + 4), np.uint8)
cv2.floodFill(canvas, mask, (0, 0), 1)
canvas = canvas[1:h + 1, 1:w + 1].astype(np.bool)
return ~canvas | input_mask
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
th2 =cv2.morphologyEx((th*255).astype('uint8'), cv2.MORPH_DILATE, kernel)
filled = fill_hole(th2==255)
Related
the task I want to do looks pretty simple: I take as input several images with an object centered in the photo and a little color chart needed for other purposes. My code normally works for the majority of the cases, but sometimes fails miserably and I just can't understand why.
For example (these are the source images), it works correctly on this https://imgur.com/PHfIqcb but not on this https://imgur.com/qghzO3V
Here's the code of the interested part:
img = cv2.imread(path)
height, width, channel = img.shape
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernel = np.ones((31, 31), np.uint8)
dil = cv2.dilate(gray, kernel, iterations=1)
_, th = cv2.threshold(dil, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
th_er1 = cv2.bitwise_not(th)
_, contours, _= cv2.findContours(th_er1, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
areas = [cv2.contourArea(c) for c in contours]
max_index = np.argmax(areas)
cnt=contours[max_index]
x,y,w,h = cv2.boundingRect(cnt)
After that I'm just going to crop the image accordingly to the given results (getting the biggest rectangle contour), basically cutting off the photo only the main object.
But as I said, using very similar images sometimes works and sometimes not.
Thank you in advance.
maybe you could try not using otsu's method, and just set threshold manually, if it's possible... ;)
You can use the Canny edge detector. In the two images, there is a good threshold value to isolate the object in the center of the image. After applying the threshold, we blur the results and apply the Canny edge detector before finding the contours:
import cv2
import numpy as np
def process(img):
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(img_gray, 190, 255, cv2.THRESH_BINARY_INV)
img_blur = cv2.GaussianBlur(thresh, (3, 3), 1)
img_canny = cv2.Canny(img_blur, 0, 0)
kernel = np.ones((5, 5))
img_dilate = cv2.dilate(img_canny, kernel, iterations=1)
return cv2.erode(img_dilate, kernel, iterations=1)
def get_contours(img):
contours, hierarchies = cv2.findContours(process(img), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=cv2.contourArea)
cv2.drawContours(img, [cnt], -1, (0, 255, 0), 30)
x, y, w, h = cv2.boundingRect(cnt)
cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 30)
img = cv2.imread("image.jpeg")
get_contours(img)
cv2.imshow("Result", img)
cv2.waitKey(0)
Input images:
Output images:
The green outlines are the contours of the objects, and the red outlines are the bounding boxes of the objects.
I am trying read text from this
using Python with OpenCV. However, it is not able to read it.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
img=cv.imread(file_path,0)
img = cv.medianBlur(img,5)
ret,th1 = cv.threshold(img,127,255,cv.THRESH_BINARY)
th2 =cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_MEAN_C,\
cv.THRESH_BINARY,11,2)
th3 = cv.adaptiveThreshold(img,255,cv.ADAPTIVE_THRESH_GAUSSIAN_C,\
cv.THRESH_BINARY,11,2)
titles = ['Original Image', 'Global Thresholding (v = 127)',
'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']
images = [img, th1, th2, th3]
for i in range(4):
plt.subplot(2,2,i+1),plt.imshow(images[i],'gray')
plt.title(titles[i])
plt.xticks([]),plt.yticks([])
plt.show()
anyway to do this?
Instead of working on the grayscale image, working on saturation channel of the HSV color space makes the subsequent steps easier.
img = cv2.imread(image_path_to_captcha)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
s_component = hsv[:,:,1]
s_component
Next, apply a Gaussian blur of appropriate kernel size and sigma value, and later threshold.
blur = cv2.GaussianBlur(s_component,(7,7), 7)
ret,th3 = cv2.threshold(blur,127,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
th3
Next, finding contours in the image above and preserving those above a certain area threshold in the black image variable which will be used as mask later on.
contours, hierarchy = cv2.findContours(th3, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
black = np.zeros((img.shape[0], img.shape[1]), np.uint8)
for contour in contours:
if cv2.contourArea(contour) >600 :
cv2.drawContours(black, [contour], 0, 255, -1)
black
Using the black image variable as mask over the threshold image
res = cv2.bitwise_and(th3, th3, mask = black)
res
Finally, applying morphological thinning to the above result
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
erode = cv2.erode(res, kernel, iterations=1)
erode
The end result is not what you expect. You can try experimenting different morphology operations prior to drawing contours as well.
EDIT
You can perform distance transform on the above image and use the result:
dist = cv2.distanceTransform(res, cv2.DIST_L2, 3)
dst = cv2.normalize(dist, dst=None, alpha=0, beta=255,norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
dst
I applied the floodfill function in opencv to extract the foreground from the background but some of the objects in the image were not recognized by the algorithm so I would like to know how I can improve my detections and what modifications are necessary.
image = cv2.imread(args["image"])
image = cv2.resize(image, (800, 800))
h,w,chn = image.shape
ratio = image.shape[0] / 800.0
orig = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(gray, 75, 200)
# show the original image and the edge detected image
print("STEP 1: Edge Detection")
cv2.imshow("Image", image)
cv2.imshow("Edged", edged)
warped1 = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
T = threshold_local(warped1, 11, offset = 10, method = "gaussian")
warped1 = (warped1 > T).astype("uint8") * 255
print("STEP 3: Apply perspective transform")
seed = (10, 10)
foreground, birdEye = floodFillCustom(image, seed)
cv2.circle(birdEye, seed, 50, (0, 255, 0), -1)
cv2.imshow("originalImg", birdEye)
cv2.circle(birdEye, seed, 100, (0, 255, 0), -1)
cv2.imshow("foreground", foreground)
cv2.imshow("birdEye", birdEye)
gray = cv2.cvtColor(foreground, cv2.COLOR_BGR2GRAY)
cv2.imshow("gray", gray)
cv2.imwrite("gray.jpg", gray)
threshImg = cv2.threshold(gray, 1, 255, cv2.THRESH_BINARY)[1]
h_threshold,w_threshold = threshImg.shape
area = h_threshold*w_threshold
cv2.imshow("threshImg", threshImg)[![enter image description here][1]][1]
The floodFillCustom function is as follows -
def floodFillCustom(originalImage, seed):
originalImage = np.maximum(originalImage, 10)
foreground = originalImage.copy()
cv2.floodFill(foreground, None, seed, (0, 0, 0),
loDiff=(10, 10, 10), upDiff=(10, 10, 10))
return [foreground, originalImage]
A little bit late, but here's an alternative solution for segmenting the tools. It involves converting the image to the CMYK color space and extracting the K (Key) component. This component can be thresholded to get a nice binary mask of the tools, the procedure is very straightforward:
Convert the image to the CMYK color space
Extract the K (Key) component
Threshold the image via Otsu's thresholding
Apply some morphology (a closing) to clean up the mask
(Optional) Get bounding rectangles of all the tools
Let's see the code:
# Imports
import cv2
import numpy as np
# Read image
imagePath = "C://opencvImages//"
inputImage = cv2.imread(imagePath+"DAxhk.jpg")
# Create deep copy for results:
inputImageCopy = inputImage.copy()
# Convert to float and divide by 255:
imgFloat = inputImage.astype(np.float) / 255.
# Calculate channel K:
kChannel = 1 - np.max(imgFloat, axis=2)
# Convert back to uint 8:
kChannel = (255*kChannel).astype(np.uint8)
The first step is to convert the BGR image to CMYK. There's no direct conversion in OpenCV for this, so I applied directly the conversion formula. We can get every color space component from that formula, but we are only interested on the K channel. The conversion is easy, but we need to be careful with the data types. We need to operate on float arrays. After getting the K channel, we convert back the image to an unsigned 8-bit array, this is the resulting image:
Let's threshold this image using Otsu's thresholding method:
# Threshold via Otsu:
_, binaryImage = cv2.threshold(kChannel, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
This yields the following binary image:
Looks very nice! Additionally, we can clean it up a little bit (joining the little gaps) using a morphological closing. Let's apply a rectangular structuring element of size 5 x 5 and use 2 iterations:
# Use a little bit of morphology to clean the mask:
# Set kernel (structuring element) size:
kernelSize = 5
# Set morph operation iterations:
opIterations = 2
# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
binaryImage = cv2.morphologyEx(binaryImage, cv2.MORPH_CLOSE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
Which results in this:
Very cool. What follows is optional. We can get the bounding rectangles for every tool by looking for the outer (external) contours:
# Find the contours on the binary image:
contours, hierarchy = cv2.findContours(binaryImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Look for the outer bounding boxes (no children):
for _, c in enumerate(contours):
# Get the contours bounding rectangle:
boundRect = cv2.boundingRect(c)
# Get the dimensions of the bounding rectangle:
rectX = boundRect[0]
rectY = boundRect[1]
rectWidth = boundRect[2]
rectHeight = boundRect[3]
# Set bounding rectangle:
color = (0, 0, 255)
cv2.rectangle( inputImageCopy, (int(rectX), int(rectY)),
(int(rectX + rectWidth), int(rectY + rectHeight)), color, 5 )
cv2.imshow("Bounding Rectangles", inputImageCopy)
cv2.waitKey(0)
Which produces the final image:
I'm using a KNN to detect characters, however, it is sensitive to background noise. the image is basically what I'm using and I have developed a mini script to try get the best threshold image. would anyone have any suggestions/ changed to get better results? make the X more viewable. ( attached version of current output and what the input is)
import cv2
import numpy as np
image = cv2.imread("resize.png")
img = image
img = cv2.blur(img, (5, 5))
imgGray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # get grayscale image
imgBlurred = cv2.GaussianBlur(imgGray, (11, 11), 5) # blur
# cv2.imshow("test",imgBlurred)
imgThresh = cv2.adaptiveThreshold(imgBlurred, # input image
255, # make pixels that pass the threshold full white
cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
# use gaussian rather than mean, seems to give better results
cv2.THRESH_BINARY_INV,
# invert so foreground will be white, background will be black
13, # size of a pixel neighborhood used to calculate threshold value
2) # constant subtracted from the mean or weighted mean
imgThreshCopy = imgThresh.copy() # make a copy of the thresh image, this in necessary b/c findContours modifies the image
kernel = np.ones((3, 3), np.uint8)
kernel2 = np.ones((7, 7), np.uint8)
kernel3 = np.ones((1, 1), np.uint8)
imgThreshCopy = cv2.morphologyEx(imgThreshCopy, cv2.MORPH_OPEN, kernel)
imgThreshCopy = cv2.morphologyEx(imgThreshCopy, cv2.MORPH_CLOSE, kernel2)
imgThreshCopy = cv2.dilate(imgThreshCopy, kernel3, iterations=150)
res = imgThreshCopy
cv2.imwrite("test.jpg", res)
cv2.imshow("image", res)
cv2.waitKey(0)
output
input
i used this code:
horizontalStructure = cv2.getStructuringElement(cv2.MORPH_RECT, (horizontalsize, 1))
horizontal = cv2.erode(horizontal, horizontalStructure, (-1, -1))
horizontal = cv2.dilate(horizontal, horizontalStructure, (-1, -1))
to remove lines.
and some filters to delete the noises and bold the font:
blur = cv2.GaussianBlur(img, (11, 11), 0)
thresh = cv2.threshold(blur, 80, 255, cv2.THRESH_BINARY)[1]
kernel = np.ones((2,1), np.uint8)
dilation = cv2.erode(thresh, kernel, iterations=1)
dilation = cv2.bitwise_not(dilation)
Despite the threshold and other methods, as you can see lots of noise remained
This is the result I want to reach:
Do you know an OpenCV filter that will help me achieve this result?
The following solution is not a perfect, and not generic solution, but I hope it's good enough for your needs.
For removing the line I suggest using cv2.connectedComponentsWithStats for finding clusters, and mask the wide or long clusters.
The solution uses the following stages:
Convert image to Grayscale.
Apply threshold and invert polarity.
Use automatic thresholding by applying flag cv2.THRESH_OTSU.
Use "close" morphological operation to close small gaps.
Find connected components (clusters) with statistics.
Iterate the clusters, and delete clusters with large width and large height.
Remove very small clusters - considered to be noise.
The top and left side is cleaned "manually".
Here is the code:
import numpy as np
import cv2
img = cv2.imread('Heshbonit.jpg') # Read input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # Convert to Grayscale.
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU) # Convert to binary and invert polarity
# Use "close" morphological operation to close small gaps
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.array([1, 1]));
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.array([1, 1]).T);
nlabel,labels,stats,centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)
thresh_size = 100
# Delete all lines by filling wide and long lines with zeros.
# Delete very small clusters (assumes to be noise).
for i in range(1, nlabel):
#
if (stats[i, cv2.CC_STAT_WIDTH] > thresh_size) or (stats[i, cv2.CC_STAT_HEIGHT] > thresh_size):
thresh[labels == i] = 0
if stats[i, cv2.CC_STAT_AREA] < 4:
thresh[labels == i] = 0
# Clean left and top margins "manually":
thresh[:, 0:30] = 0
thresh[0:10, :] = 0
# Inverse polarity
thresh = 255 - thresh
# Write result to file
cv2.imwrite('thresh.png', thresh)