I'm trying to develop a script using Python and OpenCV to detect some highlighted regions on a scanned instrumentation diagram and output text using Tesseract's OCR function. My workflow is first to detect the general vicinity of the region of interest, and then apply processing steps to remove everything aside from the blocks of text (lines, borders, noise). The processed image is then fed into Tesseract's OCR engine.
This workflow works on about half of the images, but fails on the rest because the text touches the borders. I'll show some examples of what I mean below:
Step 1: Find regions of interest by creating a mask using InRange with the color range of the highlighter.
Step 2: Contour regions of interest, crop and save to file.
--- Referenced code begins here ---
Step 3: Threshold image and apply Canny Edge Detection
Step 4: Contour the edges and filter them into circular shape using cv2.approxPolyDP and looking at ones with vertices greater than 8. Taking the first or second largest contour usually corresponds to the inner edge.
Step 5: Using masks and bitwise operations, everything inside contour is transferred to a white background image. Dilation and erosion is applied to de-noise the image and create the final image that gets fed into the OCR engine.
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
d_path = "Test images\\"
img_name = "cropped_12.jpg"
img = cv2.imread(d_path + img_name) # Reads the image
## Resize image before calculating contour
height, width = img.shape[:2]
img = cv2.resize(img,(2*width,2*height),interpolation = cv2.INTER_CUBIC)
img_orig = img.copy() # Makes copy of original image
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) # Convert to grayscale
# Apply threshold to get binary image and write to file
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Edge detection
edges = cv2.Canny(img,100,200)
# Find contours of mask threshold
contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:] # [-2:] works whether findContours returns 2 or 3 values
# Find contours associated w/ polygons with 8 sides or more
cnt_list = []
area_list = [cv2.contourArea(c) for c in contours]
for j in contours:
    poly_pts = cv2.approxPolyDP(j,0.01*cv2.arcLength(j,True),True)
    area = cv2.contourArea(j)
    if (len(poly_pts) > 8) and (area == max(area_list)):
        cnt_list.append(j)
cv2.drawContours(img_orig, cnt_list, -1, (255,0,0), 2)
# Show contours
cv2.namedWindow('Show',cv2.WINDOW_NORMAL)
cv2.imshow("Show",img_orig)
cv2.waitKey()
cv2.destroyAllWindows()
# Zero pixels outside circle
mask = np.zeros(img.shape).astype(img.dtype)
cv2.fillPoly(mask, cnt_list, (255,255,255))
mask_inv = cv2.bitwise_not(mask)
a = cv2.bitwise_and(img,img,mask = mask)
wh_back = np.ones(img.shape).astype(img.dtype)*255
b = cv2.bitwise_and(wh_back,wh_back,mask = mask_inv)
res = cv2.add(a,b)
# Get rid of noise
kernel = np.ones((2, 2), np.uint8)
res = cv2.dilate(res, kernel, iterations=1)
res = cv2.erode(res, kernel, iterations=1)
# Show final image
cv2.namedWindow('result',cv2.WINDOW_NORMAL)
cv2.imshow("result",res)
cv2.waitKey()
cv2.destroyAllWindows()
When code works, these are the images that get outputted:
Working
However, in the instances where the text touches the circular border, the code assumes part of the text is part of the larger contour and ignores the last letter. For example:
Not working
Are there any processing steps that can help me bypass this problem? Or perhaps a different approach? I've tried using the Hough Circle Transform to detect the borders, but it's quite finicky and doesn't work as well as contouring.
I'm quite new to OpenCV and Python so any help would be appreciated.
If the Hough circle transform didn't work for you, I think your best option will be to approximate the border shape. The best method I know for that is the Douglas-Peucker algorithm, which simplifies your contour by reducing the number of points along its perimeter.
You can check this reference file from OpenCV to see the type of post-processing you can apply to your border. They also mention Douglas-Peucker:
OpenCV border processing
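To make the idea concrete, here is a minimal pure-Python sketch of Douglas-Peucker on an open polyline. The function names and the sample points are made up for illustration; in OpenCV you would normally call cv2.approxPolyDP, which applies the same algorithm to a contour.

```python
import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(points, epsilon):
    """Simplify a polyline: drop points closer than epsilon to the chord."""
    if len(points) < 3:
        return list(points)
    # find the point farthest from the chord joining the two endpoints
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        # everything in between is close enough to the chord
        return [points[0], points[-1]]
    # otherwise keep the farthest point and simplify both halves
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# hypothetical outline: a nearly flat stretch followed by a steep, nearly straight one
outline = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
simplified = douglas_peucker(outline, epsilon=1.0)
print(simplified)
```

A larger epsilon removes more points, so small bumps (such as a letter touching the border) get flattened before larger features do.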
Just a hunch: after Otsu thresholding, erode and then dilate the image. This will make very thin joints vanish. The code for this is below.
kernel = np.ones((5,5),np.uint8)
th3 = cv2.erode(th3, kernel,iterations=1)
th3 = cv2.dilate(th3, kernel,iterations=1)
Let me know how it goes. I have a couple more ideas if this does not work.
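To show why erosion followed by dilation breaks thin joints, here is a small pure-Python sketch on a toy binary image. The helper names and the image are made up, 1 is treated as foreground, and pixels outside the image are assumed to be background, which is not necessarily how cv2.erode handles borders.

```python
def erode(grid, k=1):
    """A pixel stays 1 only if its whole (2k+1)x(2k+1) neighbourhood is 1."""
    h, w = len(grid), len(grid[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and grid[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]

def dilate(grid, k=1):
    """A pixel becomes 1 if any pixel in its (2k+1)x(2k+1) neighbourhood is 1."""
    h, w = len(grid), len(grid[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and grid[y + dy][x + dx]
                     for dy in range(-k, k + 1) for dx in range(-k, k + 1)))
             for x in range(w)] for y in range(h)]

# two 3x3 blobs joined by a one-pixel-thick bridge in the middle row
image = [[int(c) for c in row] for row in (
    "00000000000",
    "01110001110",
    "01111111110",
    "01110001110",
    "00000000000",
)]

opened = dilate(erode(image))
for row in opened:
    print("".join(str(v) for v in row))
```

The bridge is thinner than the kernel, so erosion removes it completely and dilation cannot bring it back, while the two blobs are restored to their original size.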
I want to count cardboard boxes on a conveyor belt and read a specific label, which will only contain 3 words on a white background, using OpenCV and Python. Attached is the image I am using for experiments. The problem so far is that I am unable to detect the complete box due to noise, and if I try to check w and h in x, y, w, h = cv2.boundingRect(cnt), it simply filters out the text. In this case ABC is written on the box. Also, the box I have detected has spikes on both the top and bottom, which I am not sure how to filter.
Below is the code I am using:
import cv2
# reading image
image = cv2.imread('img002.jpg')
# convert the image to grayscale format
img_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# apply binary thresholding
ret, thresh = cv2.threshold(img_gray, 150, 255, cv2.THRESH_BINARY)
# visualize the binary image
cv2.imshow('Binary image', thresh)
# collecting contours
contours,h = cv2.findContours(thresh, cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
# looping through contours
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(image,(x,y),(x+w,y+h),(0,215,255),2)
cv2.imshow('img', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Also please suggest how to crop the text ABC and then apply an OCR on that to read the text.
Many Thanks.
EDIT 2: Many thanks for your answer. Based on your suggestion I changed the code so that it can check for boxes in a video. It worked like a charm, except that it failed to identify one box for a long time. Below are my code and a link to the video I used. I have a couple of questions about this as I am new to OpenCV, if you can find some time to answer.
import cv2
import numpy as np
from time import time as timer
def get_region(image):
    contours, hierarchy = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key = cv2.contourArea)
    black = np.zeros((image.shape[0], image.shape[1]), np.uint8)
    mask = cv2.drawContours(black,[c],0,255, -1)
    return mask
cap = cv2.VideoCapture("Resources/box.mp4")
ret, frame = cap.read()
fps = 60
fps /= 1000
framerate = timer()
elapsed = int()
while(1):
    start = timer()
    ret, frame = cap.read()
    # stop at the end of the video; frame is None once ret is False
    if not ret:
        break
    # convert the frame to HSV
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Performing threshold on the hue channel `hsv[:,:,0]`
    thresh = cv2.threshold(hsv[:,:,0],127,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)[1]
    mask = get_region(thresh)
    masked_img = cv2.bitwise_and(frame, frame, mask = mask)
    newImg = cv2.cvtColor(masked_img, cv2.COLOR_BGR2GRAY)
    # collecting contours
    c,h = cv2.findContours(newImg, cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
    cont_sorted = sorted(c, key=cv2.contourArea, reverse=True)[:5]
    x,y,w,h = cv2.boundingRect(cont_sorted[0])
    cv2.rectangle(frame,(x,y),(x+w,y+h),(255,0,0),5)
    #cv2.imshow('frame',masked_img)
    cv2.imshow('Out',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    diff = timer() - start
    while diff < fps:
        diff = timer() - start
cap.release()
cv2.destroyAllWindows()
Link to video: https://www.storyblocks.com/video/stock/boxes-and-packages-move-along-a-conveyor-belt-in-a-shipment-factory-a-few-blank-boxes-for-your-custom-graphics-lmgxtwq
Questions:
1. How can we be 100% sure that the rectangle drawn is actually on top of a box and not on the belt or somewhere else?
2. Can you please tell me how I can use the function you provided in the original answer for the other boxes in this new video code?
3. Is it correct to convert the masked frame to grey again and find contours again to draw a rectangle, or is there a more efficient way to do it?
4. The final version of this code is intended to run on a Raspberry Pi. What can we do to optimize the code's performance?
Many thanks again for your time.
There are 2 steps to be followed:
1. Box segmentation
We can assume there will be no background change since the conveyor belt is present. We can segment the box using a different color space. In the following I have used HSV color space:
img = cv2.imread('box.jpg')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Performing threshold on the hue channel `hsv[:,:,0]`
th = cv2.threshold(hsv[:,:,0],127,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)[1]
Masking the largest contour in the binary image:
def get_region(image):
    contours, hierarchy = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key = cv2.contourArea)
    black = np.zeros((image.shape[0], image.shape[1]), np.uint8)
    mask = cv2.drawContours(black,[c],0,255, -1)
    return mask
mask = get_region(th)
Applying the mask on the original image:
masked_img = cv2.bitwise_and(img, img, mask = mask)
2. Text Detection:
The text region is enclosed in white, which can be isolated again by applying a suitable threshold. (You might want to apply some statistical measure to calculate the threshold)
# Applying threshold at 220 on green channel of 'masked_img'
result = cv2.threshold(masked_img[:,:,1],220,255,cv2.THRESH_BINARY)[1]
Note:
The code is written for the shared image. For boxes of different sizes you can filter contours with approximately 4 vertices/sides.
# Function to extract rectangular contours above a certain area.
# Pass a BGR `image` to also draw the matching quadrilaterals on it.
def extract_rect(contours, area_threshold, image=None):
    rect_contours = []
    for c in contours:
        if cv2.contourArea(c) > area_threshold:
            perimeter = cv2.arcLength(c, True)
            approx = cv2.approxPolyDP(c, 0.02*perimeter, True)
            if len(approx) == 4:
                if image is not None:
                    cv2.drawContours(image, [approx], 0, (0,255,0),2)
                rect_contours.append(c)
    return rect_contours
Experiment using a statistical value (mean, median, etc.) to find optimal threshold to detect text region.
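As a starting point for that experiment, here is a minimal sketch that derives the threshold from the mean and standard deviation of the pixel values instead of hard-coding 220. The function name, the factor k and the sample values are assumptions for illustration only; on a real frame you would feed in the green-channel values of masked_img.

```python
import statistics

def stat_threshold(pixels, k=1.0):
    """Threshold = mean + k * population standard deviation of the pixel values."""
    return statistics.mean(pixels) + k * statistics.pstdev(pixels)

# hypothetical green-channel values: mostly dark box/belt pixels, a few bright label pixels
pixels = [40] * 90 + [240] * 10

t = stat_threshold(pixels, k=1.0)
label_pixels = [p for p in pixels if p > t]
print(t, len(label_pixels))
```

Because the label covers only a small part of the frame, its bright pixels sit well above the mean, so a threshold a little above the mean separates them; k has to be tuned on your own footage.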
Your additional questions warranted a separate answer:
1. How can we be 100% sure if the rectangle drawn is actually on top of a box and not on belt or somewhere else?
PRO: For this very purpose I chose the Hue channel of the HSV color space. Shades of grey, white and black (on the conveyor belt) are neutral in this channel. The brown color of the box contrasts with them and can be easily segmented using the Otsu threshold. Otsu's algorithm finds the optimal threshold value without user input.
CON: You might face problems when boxes are the same color as the conveyor belt.
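To illustrate what "finds the optimal threshold without user input" means, here is a pure-Python sketch of Otsu's criterion: it tries every threshold and keeps the one that maximises the between-class variance. This is only an illustration with made-up sample values, not OpenCV's implementation; in practice cv2.THRESH_OTSU does this for you.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance.

    Class 0 holds values <= t, class 1 holds values > t.
    """
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0 so far
    sum0 = 0  # sum of intensities in class 0 so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# hypothetical hue values: one cluster for the belt, another for the box
hue = [10] * 50 + [12] * 50 + [100] * 30 + [105] * 30
t = otsu_threshold(hue)
print(t)
```

When the two clusters are well separated, any threshold between them scores the same, and the CON above is visible here too: if the box and belt values overlap, no threshold gives a clean split.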
2. Can you please tell me how can I use the function you have provided in original answer to use for other boxes in this new code for video.
PRO: In case you want to find boxes using edge detection and without using color information; there is a high chance of getting many unwanted edges. By using extract_rect() function, you can filter contours that:
have approximately 4 sides (quadrilateral)
are above certain area
CON: If you have parcels/packages/bags that have more than 4 sides, you might need to change this.
3. Is it correct way to again convert masked frame to grey, find contours again to draw a rectangle. Or is there a more efficient way to do it.
I felt this is the best way, because all that is remaining is the textual region enclosed in white. Applying threshold of high value was the simplest idea in my mind. There might be a better way :)
(I am not in the position to answer the 4th question :) )
I'm doing cell segmentation, so I'm trying to code a function that removes all minor contours around the main one in order to do a mask.
That happens because I load an image with some color markers:
The problem is that when I threshold, it treats the "box" between the color markers as part of the main contour.
As you may see in my code, I don't directly convert the color image to grayscale, because the red turns black; there are other colors too, at least 8, and they are always different in each image. I've got thousands of images like this where just one cell is displayed, but in most of them there are outside contours attached. My goal is a function that gives a binary image of a single cell for each input image like this. So I'm starting with this code:
import cv2 as cv
cell1 = cv.imread(image_cell, 0)
imgray = cv.cvtColor(cell1,cv.COLOR_BGR2HSV)
imgray = cv.cvtColor(imgray,cv.COLOR_BGR2GRAY)
ret,thresh_binary = cv.threshold(imgray,107,255,cv.THRESH_BINARY)
cnts= cv.findContours(image =cv.convertScaleAbs(thresh_binary) , mode = cv.RETR_TREE,method = cv.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
for c in cnts:
    cv.drawContours(thresh_binary,[c], 0, (255,255,255), -1)
kernel = cv.getStructuringElement(cv.MORPH_RECT, (3,3))
opening = cv.morphologyEx(thresh_binary, cv.MORPH_OPEN, kernel, iterations=2) # erosion followed by dilation
Summing up, how do I get just the red contour from image 1?
So another approach, without color ranges.
A couple of things are not going right in your code, I think. First, you are drawing the contours on thresh_binary, but that already has the outer lines of the other cells as well, the lines you are trying to get rid of. I think that is why you use opening(?), while in this case you shouldn't.
To fix things, first a little information on how findContours works. findContours starts looking for white shapes on a black background and then looks for black shapes inside that white contour, and so on. That means that the white outline of the cells in thresh_binary is detected as a contour. Inside of it are other contours, including the one you want. docs with examples
What you should do is first look only for contours that have no contours inside of them. findContours also returns a hierarchy of contours, which indicates whether a contour has 'children'. If it has none (value: -1), then you look at the size of the contour and disregard the ones that are too small. You could also just look for the largest, as that is probably the one you want. Finally you draw the contour on a black mask.
Result:
Code:
import cv2 as cv
import numpy as np
# load image as grayscale
cell1 = cv.imread("PjMQR.png",0)
# threshold image
ret,thresh_binary = cv.threshold(cell1,107,255,cv.THRESH_BINARY)
# findcontours
contours, hierarchy = cv.findContours(image =thresh_binary , mode = cv.RETR_TREE,method = cv.CHAIN_APPROX_SIMPLE)
# create an empty mask
mask = np.zeros(cell1.shape[:2],dtype=np.uint8)
# loop through the contours
for i,cnt in enumerate(contours):
    # if the contour has no other contours inside of it
    if hierarchy[0][i][2] == -1 :
        # if the size of the contour is greater than a threshold
        if cv.contourArea(cnt) > 10000:
            cv.drawContours(mask,[cnt], 0, (255), -1)
# display result
cv.imshow("Mask", mask)
cv.imshow("Img", cell1)
cv.waitKey(0)
cv.destroyAllWindows()
Note: I used the image you uploaded, your image probably has far fewer pixels, so a smaller contourArea
Note2: enumerate loops through the contours, and returns both a contour and an index for each loop
Actually, in your code the 'box' is a legitimate extra contour. And you draw all contours on the final image, so that includes the 'box'. This could cause issues if any of the other colored cells are fully in the image.
A better approach is to separate out the color you want. The code below creates a binary mask that only displays the pixels that are in the defined range of red colors. You can use this mask with findContours.
Result:
Code:
import cv2
import numpy as np
# load image
img = cv2.imread("PjMQR.png")
# Convert HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# define range of red color in HSV
lower_val = np.array([0,20,0])
upper_val = np.array([15,255,255])
# Threshold the HSV image to get only red colors
mask = cv2.inRange(hsv, lower_val, upper_val)
# display image
cv2.imshow("Mask", mask)
cv2.waitKey(0)
cv2.destroyAllWindows()
This code can help you understand how the different values in this process (HSV with inRange) work. inRange docs
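If it helps to see the test for a single pixel, here is a small pure-Python sketch of what inRange checks, using the range [0,20,0] to [15,255,255] from above. The helper names are made up, and the conversion assumes OpenCV's 8-bit HSV scale (H in 0..179, S and V in 0..255); rounding may differ slightly from cv2.cvtColor.

```python
import colorsys

def bgr_to_opencv_hsv(b, g, r):
    """Convert an 8-bit BGR pixel to HSV on OpenCV's 8-bit scale (approximate)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))

def in_range(hsv, lower, upper):
    """255 if every channel lies within [lower, upper] inclusive, else 0."""
    return 255 if all(lo <= c <= hi for c, lo, hi in zip(hsv, lower, upper)) else 0

lower_val, upper_val = (0, 20, 0), (15, 255, 255)

pure_red = bgr_to_opencv_hsv(0, 0, 255)
blue = bgr_to_opencv_hsv(255, 0, 0)
crimson = bgr_to_opencv_hsv(60, 0, 255)  # red leaning towards magenta

for name, px in (("red", pure_red), ("blue", blue), ("crimson", crimson)):
    print(name, px, in_range(px, lower_val, upper_val))
```

Note that hue wraps around: reds that lean towards magenta land near the top of the hue scale and fall outside 0..15, so if your markers include such reds you may need a second range near the upper end and combine the two masks.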
I am new to OpenCV and Python. I made a program that finds contours with an area above 500 and saves them into a new image. I used boundingRect as advised on the internet; it runs and does the job well, but I have a problem with the output image: noise right next to the region of interest is also saved. As you can see in the image below, there are some tiny shapes beside the ROI. The output is good for other images; I just want to get rid of noise like this. Is there a way to remove this kind of noise from the output?
Here is the output of the program I made:
Here is the input image:
Hide with contouring
This solution uses cv2.drawContours() to simply draw black contours over the noise. I ran the black and white sample image through a few iterations of dilation, filtered contours by area, and then drew black contour lines over the noise. I used the threshold feature because there turned out to be a good bit of minuscule noise in what initially appeared to be a simple black and white image.
Input:
Code:
import cv2
thresh_value = 10
img = cv2.imread("cells_BW.jpg")
img = cv2.medianBlur(img, 5)
dilation = cv2.dilate(img,(3,3),iterations = 3)
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(T, thresh) = cv2.threshold(img_gray, thresh_value, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:] # [-2:] works whether findContours returns 2 or 3 values
contours = [i for i in contours if cv2.contourArea(i) < 5000]
cv2.drawContours(img, contours, -1, (0,0,0), 10, lineType=8)
cv2.imwrite("cells_BW_CLEAN.jpg", img)
Output:
There could be several solutions, depending on the assumptions about the input data.
Probable Methods
1. If the ROI has a significantly different color than the others,
1-1. You can threshold the input image using RGB before finding the contour.
2. If the area of the object you want to find is significantly bigger than the others,
2-1. Fill the holes like this example
2-2. Calculate the size of the blobs, and exclude all the blobs except the largest one (example to calculate the size of blobs).
If the contours of multiple objects intersect, Method 2 will surely fail to segment the region of a single cell.
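Method 2-2 can be sketched in pure Python as follows: label the connected blobs with a flood fill and keep only the largest one. The function name and the toy image are made up for illustration; with OpenCV you could get the blob sizes from cv2.connectedComponentsWithStats or cv2.contourArea instead.

```python
from collections import deque

def largest_blob(grid):
    """Return a mask that keeps only the largest 4-connected blob of 1s."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not grid[sy][sx] or seen[sy][sx]:
                continue
            # breadth-first flood fill starting from this unvisited foreground pixel
            blob, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) > len(best):
                best = blob
    mask = [[0] * w for _ in range(h)]
    for y, x in best:
        mask[y][x] = 1
    return mask

# toy image: one 2x2 cell plus two single-pixel specks of noise
image = [[int(c) for c in row] for row in (
    "1100001",
    "1100000",
    "0000100",
    "0000000",
)]
clean = largest_blob(image)
for row in clean:
    print("".join(str(v) for v in row))
```

As noted above, this only works while the noise is not connected to the cell; once blobs touch, they are counted as one component.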
I'm trying to find the contours of this image, but findContours only returns 1 contour, which is highlighted in image 2. I'm trying to find all external contours, like the circles that have the numbers inside them. What am I doing wrong? What can I do to accomplish it?
image 1:
image 2:
Below is the relevant portion of my code.
thresh = cv2.threshold(image, 0, 255,
cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
When I change cv2.RETR_EXTERNAL to cv2.RETR_LIST, it seems to detect the same contour twice or something like that. Image 3 shows the border of the circle being detected the first time, and image 4 shows it being detected again. I'm trying to find only the outer borders of these circles. How can I accomplish that?
image 3
image 4
The problem is the flag cv2.RETR_EXTERNAL that you used in the function call. As described in the OpenCV documentation, this only returns the external contour.
Using the flag cv2.RETR_LIST you get all contours in the image. Since you try to detect rings, this list will contain the inner and the outer contour of these rings.
To filter the outer boundary of the circles, you could use cv2.contourArea() to find the larger of two overlapping contours.
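A rough pure-Python sketch of that filtering step, assuming each contour is a list of (x, y) points: compute the area with the shoelace formula and drop any contour that overlaps a larger one. The helper names are made up, squares stand in for the rings, and bounding-box overlap is used as a cheap stand-in for a real overlap test.

```python
def polygon_area(points):
    """Shoelace formula: absolute area of a closed polygon."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def bounding_box(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def keep_outer(contours):
    """Drop every contour whose bounding box overlaps that of a larger contour."""
    areas = [polygon_area(c) for c in contours]
    boxes = [bounding_box(c) for c in contours]
    kept = []
    for i, c in enumerate(contours):
        dominated = any(j != i and areas[j] > areas[i] and boxes_overlap(boxes[i], boxes[j])
                        for j in range(len(contours)))
        if not dominated:
            kept.append(c)
    return kept

def square(x, y, size):
    return [(x, y), (x + size, y), (x + size, y + size), (x, y + size)]

# two "rings", each with an outer and an inner boundary
contours = [square(0, 0, 10), square(2, 2, 6), square(20, 0, 10), square(22, 2, 6)]
outer = keep_outer(contours)
print([polygon_area(c) for c in outer])
```

With real contours from cv2.findContours you would use cv2.contourArea and cv2.boundingRect for the same comparison.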
I am not sure this is really what you expect; nevertheless, in a case like this there are many ways to help findContours do its job.
Here is a way I use frequently.
Convert your image to gray
Ig = cv2.cvtColor(I,cv2.COLOR_BGR2GRAY)
Thresholding
The background and foreground values look quite uniform in terms of colour, but locally they are not, so I apply thresholding based on Otsu's method in order to binarise the intensities.
_,It = cv2.threshold(Ig,0,255,cv2.THRESH_OTSU)
Sobel magnitude
In order to extract only the contours, I compute the magnitude of the Sobel edge detector's output.
sx = cv2.Sobel(It,cv2.CV_32F,1,0)
sy = cv2.Sobel(It,cv2.CV_32F,0,1)
m = cv2.magnitude(sx,sy)
m = cv2.normalize(m,None,0.,255.,cv2.NORM_MINMAX,cv2.CV_8U)
Thinning (optional)
I use the thinning function, which is implemented in ximgproc.
The point of thinning is to reduce the contour thickness to as few pixels as possible.
m = cv2.ximgproc.thinning(m,None,cv2.ximgproc.THINNING_GUOHALL)
Final Step findContours
contours,hierarchy = cv2.findContours(m,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)[-2:] # [-2:] works whether findContours returns 2 or 3 values
disp = cv2.merge((m,m,m))
disp = cv2.drawContours(disp,contours,-1,hierarchy=hierarchy,color=(255,0,0))
Hope it helps.
I think an approach based on SVM or a CNN might be more robust.
You can find an example here.
This one may also be interesting.
-EDIT-
I found a, let's say, easier way to reach your goal.
As before, after loading the image, applying a threshold ensures that the image is binary.
Inverting the image with a bitwise-not operation makes the contours white on a black background.
Applying cv2.connectedComponentsWithStats returns (among other things) a label matrix in which each connected white region in the source has been assigned a unique label.
Then, by applying findContours to each label, it is possible to get the external contours of every area.
import numpy as np
import cv2
from matplotlib import pyplot as plt
I = cv2.imread('/home/smile/Downloads/ext_contours.png',cv2.IMREAD_GRAYSCALE)
_,I = cv2.threshold(I,0.,255.,cv2.THRESH_OTSU)
I = cv2.bitwise_not(I)
_,labels,stats,centroid = cv2.connectedComponentsWithStats(I)
result = np.zeros((I.shape[0],I.shape[1],3),np.uint8)
for i in range(0,labels.max()+1):
    mask = cv2.compare(labels,i,cv2.CMP_EQ)
    ctrs = cv2.findContours(mask,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2] # contours are second-to-last in both the 2- and 3-value return forms
    result = cv2.drawContours(result,ctrs,-1,(0xFF,0,0))
plt.figure()
plt.imshow(result)
P.S. Among the outputs returned by findContours there is a hierarchy matrix.
It is possible to reach the same result by analyzing that matrix; however, it is a little more complex, as explained here.
Instead of finding contours, I would suggest applying the Hough circle transform using the appropriate parameters.
Finding contours poses a challenge. Once you invert the binary image the circles are in white. OpenCV finds contours both along the outside and the inside of the circle. Moreover since there are letters such as 'A' and 'B', contours will again be found along the outside of the letters and within the holes. You can find contours using the appropriate hierarchy criterion but it is still tedious.
Here is what I tried by finding contours and using hierarchy:
Code:
import cv2
#--- read the image, convert to gray and obtain inverse binary image ---
img = cv2.imread('keypad.png', 1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV|cv2.THRESH_OTSU)
#--- find contours ([-2:] works whether findContours returns 2 or 3 values) ---
contours, hierarchy = cv2.findContours(binary, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)[-2:]
#--- copy of original image ---
img2 = img.copy()
#--- for each contour that has a next sibling and a child, collect the index of its first child ---
l = []
for h in hierarchy[0]:
    if h[0] > -1 and h[2] > -1:
        l.append(h[2])
#--- draw those contours ---
for cnt in l:
    if cnt > 0:
        cv2.drawContours(img2, [contours[cnt]], 0, (0,255,0), 2)
cv2.imshow('img2', img2)
For more info on contours and their hierarchical relationship, please refer to this
UPDATE
I have a rather crude way to ignore unwanted contours. Find the average area of all the contours in list l and draw those that are above the average:
Code:
img3 = img.copy()
a = 0
for j, i in enumerate(l):
    a = a + cv2.contourArea(contours[i])
mean_area = int(a/len(l))
for cnt in l:
    if (cnt > 0) and (cv2.contourArea(contours[cnt]) > mean_area):
        cv2.drawContours(img3, [contours[cnt]], 0, (0,255,0), 2)
cv2.imshow('img3', img3)
You can select only the outer borders by this function:
def _select_contours(contours, hierarchy):
    """select contours of the second level

    `hierarchy` is expected to be hierarchy[0] from cv2.findContours,
    i.e. one [next, previous, first_child, parent] row per contour.
    """
    # find the border of the image, which has no father
    father_i = None
    for i, h in enumerate(hierarchy):
        if h[3] == -1:
            father_i = i
            break
    # collect its sons
    new_contours = []
    for c, h in zip(contours, hierarchy):
        if h[3] == father_i:
            new_contours.append(c)
    return new_contours
Note that you should use cv2.RETR_TREE in cv2.findContours() to get the contours and hierarchy.
I want to detect the text area of images using python 2.7 and opencv 2.4.9
and draw a rectangle area around it. Like shown in the example image below.
I am new to image processing so any idea how to do this will be appreciated.
There are multiple ways to go about detecting text in an image.
I recommend looking at this question here, for it may answer your case as well. Although it is not in python, the code can be easily translated from c++ to python (Just look at the API and convert the methods from c++ to python, not hard. I did it myself when I tried their code for my own separate problem). The solutions here may not work for your case, but I recommend trying them out.
If I were to go about this I would do the following process:
Prep your image:
If all of the images you want to edit are roughly like the one you provided, where the actual design consists of a range of gray colors and the text is always black, I would first white out all content that is not black (or already white). Doing so will leave only the black text.
# must import if working with opencv in python
import numpy as np
import cv2
# whites out pixels whose HSV value (brightness) channel is in the range
# [lower_val,upper_val]
def remove_gray(img,lower_val,upper_val):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    lower_bound = np.array([0,0,lower_val])
    upper_bound = np.array([255,255,upper_val])
    mask = cv2.inRange(hsv, lower_bound, upper_bound)
    result = img.copy()
    result[mask > 0] = 255 # paint the in-range (gray) pixels white
    return result
Now that all you have is the black text the goal is to get those boxes. As stated before, there are different ways of going about this.
Stroke Width Transform (SWT)
The typical way to find text areas: you can find text regions by using stroke width transform as depicted in "Detecting Text in Natural Scenes with Stroke Width Transform " by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if this is as fast and reliable as I believe it is, then this method is a more efficient method than my below code. You can still use the code above to remove the blueprint design though, and that may help the overall performance of the swt algorithm.
Here is a c library that implements their algorithm, but it is stated to be very raw and the documentation is stated to be incomplete. Obviously, a wrapper will be needed in order to use this library with python, and at the moment I do not see an official one offered.
The library I linked is CCV. It is a library that is meant to be used in your applications, not recreate algorithms. So this is a tool to be used, which goes against OP's want for making it from "First Principles", as stated in comments. Still, useful to know it exists if you don't want to code the algorithm yourself.
Home Brewed Non-SWT Method
If you have meta data for each image, say in an xml file, that states how many rooms are labeled in each image, then you can access that xml file, get the data about how many labels are in the image, and then store that number in some variable say, num_of_labels. Now take your image and put it through a while loop that erodes at a set rate that you specify, finding external contours in the image in each loop and stopping the loop once you have the same number of external contours as your num_of_labels. Then simply find each contours' bounding box and you are done.
# erodes image based on given kernel size (erosion = expands black areas)
def erode( img, kern_size = 3 ):
    retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY) # threshold to deal with only black and white.
    kern = np.ones((kern_size,kern_size),np.uint8) # make a kernel for erosion based on given kernel size.
    eroded = cv2.erode(img, kern, iterations = 1) # erode your image to blobbify black areas
    y,x = eroded.shape # get shape of image to make a white border around image of 1px, to avoid problems with find contours.
    return cv2.rectangle(eroded, (0,0), (x-1,y-1), (255,255,255), 1)
# finds contours of eroded image
def prep( img, kern_size = 3 ):
    img = erode( img, kern_size )
    retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV) # invert colors for findContours
    contours, hierarchy = cv2.findContours(img,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2:] # [-2:] works whether findContours returns 2 or 3 values
    return img, contours, hierarchy
# given img & number of desired blobs, returns contours of blobs.
def blobbify(img, num_of_labels, kern_size = 3, dilation_rate = 10):
    processed_img, contours, hierarchy = prep( img.copy(), kern_size ) # erode img and check current contour count.
    previous = (processed_img, contours, hierarchy)
    while len(contours) > num_of_labels:
        kern_size += dilation_rate # add dilation_rate to kern_size to increase the blob. Remember kern_size must always be odd.
        previous = (processed_img, contours, hierarchy)
        processed_img, contours, hierarchy = prep( img.copy(), kern_size ) # erode img and check current contour count, again.
    if len(contours) < num_of_labels:
        return previous # blobs merged too far: fall back to the result from before the last erosion
    else:
        return (processed_img, contours, hierarchy)
# finds bounding boxes of all contours
def bounding_box(contours):
    bBox = []
    for curve in contours:
        box = cv2.boundingRect(curve)
        bBox.append(box)
    return bBox
The resulting boxes from the above method will have space around the labels, and this may include part of the original design, if the boxes are applied to the original image. To avoid this make regions of interest via your new found boxes and trim the white space. Then save that roi's shape as your new box.
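The trimming step could look roughly like this in pure Python, assuming a grayscale image stored as a list of rows with 255 as white and boxes in (x, y, w, h) form as returned by cv2.boundingRect. The function name and the toy image are made up for illustration.

```python
def trim_box(image, box, white=255):
    """Shrink (x, y, w, h) to the tightest box around the non-white pixels inside it."""
    x, y, w, h = box
    rows = [r for r in range(y, y + h)
            if any(image[r][c] != white for c in range(x, x + w))]
    cols = [c for c in range(x, x + w)
            if any(image[r][c] != white for r in range(y, y + h))]
    if not rows:
        return None  # the box contains only white pixels
    return cols[0], rows[0], cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1

# 6x8 white image with a small black "label" at rows 2-3, columns 3-5
image = [[255] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        image[r][c] = 0

loose = (1, 1, 6, 4)  # box with white space around the label
tight = trim_box(image, loose)
print(tight)
```

This assumes the design has already been whited out, so the only non-white pixels left inside the box belong to the label.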
Perhaps you have no way of knowing how many labels will be in the image. If this is the case, then I recommend playing around with erosion values until you find the best one to suit your case and get the desired blobs.
Or you could try find contours on the remaining content, after removing the design, and combine bounding boxes into one rectangle based on their distance from each other.
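A minimal pure-Python sketch of combining boxes by distance, assuming boxes in (x, y, w, h) form and a hypothetical max_gap parameter that you would tune for your font size. It repeatedly merges any two boxes whose gap is at most max_gap until nothing changes.

```python
def gap(a, b):
    """Gap between two (x, y, w, h) boxes along the worse axis; 0 if they touch or overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)
    dy = max(by - (ay + ah), ay - (by + bh), 0)
    return max(dx, dy)

def union(a, b):
    """Smallest box containing both a and b."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def merge_boxes(boxes, max_gap):
    """Merge boxes that are within max_gap of each other until no pair qualifies."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if gap(boxes[i], boxes[j]) <= max_gap:
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

# three letter-sized boxes close together, and one far away
letters = [(10, 10, 8, 12), (20, 10, 8, 12), (30, 10, 8, 12), (100, 50, 8, 12)]
labels = merge_boxes(letters, max_gap=3)
print(labels)
```

The pairwise search is quadratic, which is fine for a handful of labels per image but would need a smarter grouping for thousands of boxes.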
After you found your boxes, simply use those boxes with respect to the original image and you will be done.
Scene Text Detection Module in OpenCV 3
As mentioned in the comments to your question, there already exists a means of scene text detection (not document text detection) in opencv 3. I understand you do not have the ability to switch versions, but for those with the same question and not limited to an older opencv version, I decided to include this at the end. Documentation for the scene text detection can be found with a simple google search.
The opencv module for text detection also comes with text recognition that implements Tesseract, which is a free open-source text recognition module. The downfall of Tesseract, and therefore of opencv's scene text recognition module, is that it is not as refined as commercial applications and is time consuming to use, which decreases its performance. But it's free to use, so it's the best we've got without paying money, if you want text recognition as well.
Links:
Documentation OpenCv
Older Documentation
The source code is located here, for analysis and understanding
Honestly, I lack the experience and expertise in both opencv and image processing in order to provide a detailed way in implementing their text detection module. The same with the SWT algorithm. I just got into this stuff this past few months, but as I learn more I will edit this answer.
Here's a simple image processing approach using only thresholding and contour filtering:
Obtain binary image. Load image, convert to grayscale, Gaussian blur, and adaptive threshold
Combine adjacent text. We create a rectangular structuring kernel then dilate to form a single contour
Filter for text contours. We find contours and filter using contour area. From here we can draw the bounding box with cv2.rectangle()
Using this original input image (removed red lines)
After converting the image to grayscale and Gaussian blurring, we adaptive threshold to obtain a binary image
Next we dilate to combine the text into a single contour
From here we find contours and filter using a minimum threshold area (in case there was small noise). Here's the result
If we wanted to, we could also extract and save each ROI using Numpy slicing
Code
import cv2
# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,30)
# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)
# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
ROI_number = 0
for c in cnts:
    area = cv2.contourArea(c)
    if area > 10000:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
        # ROI = image[y:y+h, x:x+w]
        # cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
        # ROI_number += 1
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()