Similar questions have been asked, but none of them seem to help in my case (nevertheless, I learned a few things from those threads).
I am using Tesseract for OCR, but the results are far from satisfactory when the text is slightly skewed (see the image above).
Inspired by similar cases, I tried to use OpenCV to detect and fix the skew, but unfortunately it just doesn't seem to work. Below, you can see my current attempt, which doesn't yield the necessary result. What I get is just another bounding box around the image (that has already been cropped).
import cv2
from matplotlib import pyplot as plt
import numpy as np
img = cv2.imread("skew.JPG")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#gray = cv2.bitwise_not(gray)
ret,thresh1 = cv2.threshold(gray, 0, 255 ,cv2.THRESH_OTSU)
rect_kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3
, 2))
dilation = cv2.dilate(thresh1, rect_kernel, iterations = 1)
cv2.imshow('dilation', dilation)
cv2.waitKey(0)
cv2.destroyAllWindows()
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
for cnt in contours:
rect = cv2.minAreaRect(cnt)
box = cv2.boxPoints(rect)
box = np.int0(box)
cv2.drawContours(img,[box],0,(0,0,255),3)
cv2.imshow('final', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
I would appreciate any advice.
Tesseract seems to have a lot of troubles when the text has some distortions.
The idea is to find the contour of the text to be able to undistort the image and then use Tesseract.
The contour is generally a rectangle which has undergone the same distortion as the text. So it does not appear as a perfect rectangle in your image anymore. Opencv gives you different methods to find it. cv2.minAreaRect() finds the best rotated rectangle. It may be sufficient depending on the distortion of your text. Otherwise, you can use cv2.convexHull() to better fit your text.
The contour should give you the corners of the text that you want to remap to a regular rectangle. You can do that with:
cv2.getAffineTransform(corners, dest_corners) # requires 3 points
cv2.getPerspectiveTransform(corners, dest_corners) # requires 4 points
and then
cv2.warpAffine(...)
cv2.warpPerspective(...)
Also, don't forget to correctly set the page segmentation method that Tesseract needs to use (https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality). In your case, "6 Assume a single uniform block of text." seems to be adapted.
Related
I have a following problem that I want to solve:
Paper cup is photographed from it's side and it's edges and contours are found with Canny and findContours function (please see IMG1 attached bellow), then I have to test if that contour is straight (without any bumps or other imprefections) and return if it's passed or not.
EDIT: as per Christoph Rackwitz recomendation I'm posting original non altered image.
I have successfully implemented edge detection as shown in IMG1 bellow, but now I want to remove parts of edges that are not used and want to get result that are shown in IMG2.
What would be the best way of doing it?
Few things that are on my mind:
Masks (mask off other paths)
Somehow delete paths that are radically different in x coordinates.
Yet another issue is detecting bumps on that curvy path. I'm thinking about few possible solutions:
Loop thru contours and look for spikes in x coordinates (radical changes, from 20 to 40 for examples)
draw similar radius arc and compare it with that contour.
The best result is shown in IMG3 image. Would be the perfect way of solving this problem for me.
Maybe there is someone who has more expierience with OPENCV and could light up my path to solving this problem. THANK YOU!
my code is as follow:
import numpy as np
import cv2
# Read the original image
img = cv2.imread('test2.jpg')
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_blur = cv2.GaussianBlur(img_gray, (3,3), 0)
edges = cv2.Canny(image=img_blur, threshold1=110, threshold2=180) # Canny Edge Detection
ret, thresh = cv2.threshold(edges, 110, 180, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(image=thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_NONE)
# print(contours)
for cnt in contours:
print(cv2.arcLength(cnt, False))
image_copy = img.copy()
cv2.drawContours(image=image_copy, contours=contours, contourIdx=-1, color=(255, 0, 0), thickness=1, lineType=cv2.LINE_AA)
cv2.imshow('Binary image', image_copy)
cv2.waitKey(0)
cv2.destroyAllWindows()
IMG1 This is what is currently detected by OpenCV
IMG2 Good edge detection that I strive to achieve
IMG3 Perfect result with combined expected path and defect in one image
IMG4 Original image for testing and so on
I have a microscopy image and need to calculate the area shown in red. The idea is to build a function that returns the area inside the red line on the right photo (float value, X mm²).
Since I have almost no experience in image processing, I don't know how to approach this (maybe silly) problem and so I'm looking for help. Other image examples are pretty similar with just 1 aglomerated "interest area" close to the center.
I'm comfortable coding in python and have used the software ImageJ for some time.
Any python package, software, bibliography, etc. should help.
Thanks!
EDIT:
The example in red I made manually just to make people understand what I want. Detecting the "interest area" must be done inside the code.
Canny, morphological transformation and contours can provide a decent result.
Although it might need some fine-tuning depending on the input images.
import numpy as np
import cv2
# Change this with your filename
image = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)
# You can fine-tune this, or try with simple threshold
canny = cv2.Canny(image, 50, 580)
# Morphological Transformations
se = np.ones((7,7), dtype='uint8')
image_close = cv2.morphologyEx(canny, cv2.MORPH_CLOSE, se)
contours, _ = cv2.findContours(image_close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# Create black canvas to draw area on
mask = np.zeros(image.shape[:2], np.uint8)
biggest_contour = max(contours, key = cv2.contourArea)
cv2.drawContours(mask, biggest_contour, -1, 255, -1)
area = cv2.contourArea(biggest_contour)
print(f"Total area image: {image.shape[0] * image.shape[1]} pixels")
print(f"Total area contour: {area} pixels")
cv2.imwrite('mask.png', mask)
cv2.imshow('img', mask)
cv2.waitKey(0)
# Draw the contour on the original image for demonstration purposes
original = cv2.imread('test.png')
cv2.drawContours(original, biggest_contour, -1, (0, 0, 255), -1)
cv2.imwrite('result.png', original)
cv2.imshow('result', original)
cv2.waitKey(0)
The code produces the following output:
Total area image: 332628 pixels
Total area contour: 85894.5 pixels
Only thing left to do is convert the pixels to your preferred measurement.
Two images for demonstration below.
Result
Mask
I don't know much about this topic, but it seems like a very similar question to this one, which has what look like two great answers:
Python: calculate an area within an irregular contour line
I'm trying to develop a script using Python and OpenCV to detect some highlighted regions on a scanned instrumentation diagram and output text using Tesseract's OCR function. My workflow is first to detect the general vicinity of the region of interest, and then apply processing steps to remove everything aside from the blocks of text (lines, borders, noise). The processed image is then feed into Tesseract's OCR engine.
This workflow is works on about half of the images, but fails on the rest due to the text touching the borders. I'll show some examples of what I mean below:
Step 1: Find regions of interest by creating a mask using InRange with the color range of the highlighter.
Step 2: Contour regions of interest, crop and save to file.
--- Referenced code begins here ---
Step 3: Threshold image and apply Canny Edge Detection
Step 4: Contour the edges and filter them into circular shape using cv2.approxPolyDP and looking at ones with vertices greater than 8. Taking the first or second largest contour usually corresponds to the inner edge.
Step 5: Using masks and bitwise operations, everything inside contour is transferred to a white background image. Dilation and erosion is applied to de-noise the image and create the final image that gets fed into the OCR engine.
import cv2
import numpy as np
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
d_path = "Test images\\"
img_name = "cropped_12.jpg"
img = cv2.imread(d_path + img_name) # Reads the image
## Resize image before calculating contour
height, width = img.shape[:2]
img = cv2.resize(img,(2*width,2*height),interpolation = cv2.INTER_CUBIC)
img_orig = img.copy() # Makes copy of original image
img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY) # Convert to grayscale
# Apply threshold to get binary image and write to file
_, img = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Edge detection
edges = cv2.Canny(img,100,200)
# Find contours of mask threshold
_, contours, hierarchy = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Find contours associated w/ polygons with 8 sides or more
cnt_list = []
area_list = [cv2.contourArea(c) for c in contours]
for j in contours:
poly_pts = cv2.approxPolyDP(j,0.01*cv2.arcLength(j,True),True)
area = cv2.contourArea(j)
if (len(poly_pts) > 8) & (area == max(area_list)):
cnt_list.append(j)
cv2.drawContours(img_orig, cnt_list, -1, (255,0,0), 2)
# Show contours
cv2.namedWindow('Show',cv2.WINDOW_NORMAL)
cv2.imshow("Show",img_orig)
cv2.waitKey()
cv2.destroyAllWindows()
# Zero pixels outside circle
mask = np.zeros(img.shape).astype(img.dtype)
cv2.fillPoly(mask, cnt_list, (255,255,255))
mask_inv = cv2.bitwise_not(mask)
a = cv2.bitwise_and(img,img,mask = mask)
wh_back = np.ones(img.shape).astype(img.dtype)*255
b = cv2.bitwise_and(wh_back,wh_back,mask = mask_inv)
res = cv2.add(a,b)
# Get rid of noise
kernel = np.ones((2, 2), np.uint8)
res = cv2.dilate(res, kernel, iterations=1)
res = cv2.erode(res, kernel, iterations=1)
# Show final image
cv2.namedWindow('result',cv2.WINDOW_NORMAL)
cv2.imshow("result",res)
cv2.waitKey()
cv2.destroyAllWindows()
When code works, these are the images that get outputted:
Working
However, in the instances where the text touches the circular border, the code assumes part of the text is part of the larger contour and ignores the last letter. For example:
Not working
Are there any processing steps that can help me bypass this problem? Or perhaps a different approach? I've tried using Hough Circle Transforms to try to detect the borders, but they're quite finicky and doesn't work as well as contouring.
I'm quite new to OpenCV and Python so any help would be appreciated.
If the Hough circle transform didn't work for you I think you're best option will be to approximate the boarder shape. The best method I know for that is: Douglas-Peucker algorithm which will make your contour simpler by reducing the perimeter on pics.
You can check this reference file from OpenCV to see the type of post processing you can apply to your boarder. They also mention Douglas-Peucker:
OpenCV boarder processing
Just a hunch. After OTSU thresholding. Erode and dilate the image. This will result in vanishing of very thin joints. The code for the same is below.
kernel = np.ones((5,5),np.uint8)
th3 = cv2.erode(th3, kernel,iterations=1)
th3 = cv2.dilate(th3, kernel,iterations=1)
Let me know how it goes. I have couple more idea if this did not work.
I am new using OpenCV and I am having a problem detecting corners in a image. Supposing I have an image-grid like this one where there are multiple corners.
I'd like to catch just the 4 extreme corners, highlighted in red.
Using cv2.cornerHarris() I've managed to highlight all the possible corners really accurately but I can't find a way to adjust these values and let just the 4 extreme remain.
This is my code:
import cv2
import numpy as np
filename = 'start.jpg'
img = cv2.imread(filename)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv2.cornerHarris(gray,2,29,0.04)
#result is dilated for marking the corners, not important
dst = cv2.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv2.imshow('dst',img)
if cv2.waitKey(0) & 0xff == 27:
cv2.destroyAllWindows()
I've got this code from the official OpenCV website and I've adjusted few values.
Anyone be able to help? Am I in the completely wrong direction? If I've not been clear what I'd like to have is the output-image the I've attached in my question while my code is just giving me all the possible corners in the grid.
Grid does not start from coordinates x,y = 0,0
Thank you in advance guys
I have another solution using contours.
Convert image to grayscale
Obtain binary image
Perform morphological close operation
Find contour present externally
contours, hierarchy = cv2.findContours(th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
Obtain the bounding rectangle around this contour. You will get the coordinates of this rectangle, which are the corners of the grid
cnts = contours[0]
x,y,w,h = cv2.boundingRect(cnts)
Draw those corners on the grid
cv2.circle(im,(x,y), 3, (0,0,255), -1)
cv2.circle(im,(x+w,y), 3, (0,0,255), -1)
cv2.circle(im,(x+w,y+h), 3, (0,0,255), -1)
cv2.circle(im,(x,y+h), 3, (0,0,255), -1)
cv2.imshow("Corners of grid", img)
There is no easy solution if the grid is in a generic position in the image. The grid extremal corners might even not be detected by the Harris filter, depending on the quality of the image itself. For a robust solution you will need to explicitly model the topology of the graph of detected corners (i.e. find out which corner is neighbor of which ones in the grid). Here is an example of how this is done for the case of a checkerboard.
For a somewhat less robust solution, you could compute the convex hull of the detected corner distribution, then fit 4 lines to it. The lines' intersections would then yield the extremal corners.
I want to detect the text area of images using python 2.7 and opencv 2.4.9
and draw a rectangle area around it. Like shown in the example image below.
I am new to image processing so any idea how to do this will be appreciated.
There are multiple ways to go about detecting text in an image.
I recommend looking at this question here, for it may answer your case as well. Although it is not in python, the code can be easily translated from c++ to python (Just look at the API and convert the methods from c++ to python, not hard. I did it myself when I tried their code for my own separate problem). The solutions here may not work for your case, but I recommend trying them out.
If I were to go about this I would do the following process:
Prep your image:
If all of your images you want to edit are roughly like the one you provided, where the actual design consists of a range of gray colors, and the text is always black. I would first white out all content that is not black (or already white). Doing so will leave only the black text left.
# must import if working with opencv in python
import numpy as np
import cv2
# removes pixels in image that are between the range of
# [lower_val,upper_val]
def remove_gray(img,lower_val,upper_val):
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lower_bound = np.array([0,0,lower_val])
upper_bound = np.array([255,255,upper_val])
mask = cv2.inRange(gray, lower_bound, upper_bound)
return cv2.bitwise_and(gray, gray, mask = mask)
Now that all you have is the black text the goal is to get those boxes. As stated before, there are different ways of going about this.
Stroke Width Transform (SWT)
The typical way to find text areas: you can find text regions by using stroke width transform as depicted in "Detecting Text in Natural Scenes with Stroke Width Transform " by Boris Epshtein, Eyal Ofek, and Yonatan Wexler. To be honest, if this is as fast and reliable as I believe it is, then this method is a more efficient method than my below code. You can still use the code above to remove the blueprint design though, and that may help the overall performance of the swt algorithm.
Here is a c library that implements their algorithm, but it is stated to be very raw and the documentation is stated to be incomplete. Obviously, a wrapper will be needed in order to use this library with python, and at the moment I do not see an official one offered.
The library I linked is CCV. It is a library that is meant to be used in your applications, not recreate algorithms. So this is a tool to be used, which goes against OP's want for making it from "First Principles", as stated in comments. Still, useful to know it exists if you don't want to code the algorithm yourself.
Home Brewed Non-SWT Method
If you have meta data for each image, say in an xml file, that states how many rooms are labeled in each image, then you can access that xml file, get the data about how many labels are in the image, and then store that number in some variable say, num_of_labels. Now take your image and put it through a while loop that erodes at a set rate that you specify, finding external contours in the image in each loop and stopping the loop once you have the same number of external contours as your num_of_labels. Then simply find each contours' bounding box and you are done.
# erodes image based on given kernel size (erosion = expands black areas)
def erode( img, kern_size = 3 ):
retval, img = cv2.threshold(img, 254.0, 255.0, cv2.THRESH_BINARY) # threshold to deal with only black and white.
kern = np.ones((kern_size,kern_size),np.uint8) # make a kernel for erosion based on given kernel size.
eroded = cv2.erode(img, kern, 1) # erode your image to blobbify black areas
y,x = eroded.shape # get shape of image to make a white boarder around image of 1px, to avoid problems with find contours.
return cv2.rectangle(eroded, (0,0), (x,y), (255,255,255), 1)
# finds contours of eroded image
def prep( img, kern_size = 3 ):
img = erode( img, kern_size )
retval, img = cv2.threshold(img, 200.0, 255.0, cv2.THRESH_BINARY_INV) # invert colors for findContours
return cv2.findContours(img,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE) # Find Contours of Image
# given img & number of desired blobs, returns contours of blobs.
def blobbify(img, num_of_labels, kern_size = 3, dilation_rate = 10):
prep_img, contours, hierarchy = prep( img.copy(), kern_size ) # dilate img and check current contour count.
while len(contours) > num_of_labels:
kern_size += dilation_rate # add dilation_rate to kern_size to increase the blob. Remember kern_size must always be odd.
previous = (prep_img, contours, hierarchy)
processed_img, contours, hierarchy = prep( img.copy(), kern_size ) # dilate img and check current contour count, again.
if len(contours) < num_of_labels:
return (processed_img, contours, hierarchy)
else:
return previous
# finds bounding boxes of all contours
def bounding_box(contours):
bBox = []
for curve in contours:
box = cv2.boundingRect(curve)
bBox.append(box)
return bBox
The resulting boxes from the above method will have space around the labels, and this may include part of the original design, if the boxes are applied to the original image. To avoid this make regions of interest via your new found boxes and trim the white space. Then save that roi's shape as your new box.
Perhaps you have no way of knowing how many labels will be in the image. If this is the case, then I recommend playing around with erosion values until you find the best one to suit your case and get the desired blobs.
Or you could try find contours on the remaining content, after removing the design, and combine bounding boxes into one rectangle based on their distance from each other.
After you found your boxes, simply use those boxes with respect to the original image and you will be done.
Scene Text Detection Module in OpenCV 3
As mentioned in the comments to your question, there already exists a means of scene text detection (not document text detection) in opencv 3. I understand you do not have the ability to switch versions, but for those with the same question and not limited to an older opencv version, I decided to include this at the end. Documentation for the scene text detection can be found with a simple google search.
The opencv module for text detection also comes with text recognition that implements tessaract, which is a free open-source text recognition module. The downfall of tessaract, and therefore opencv's scene text recognition module is that it is not as refined as commercial applications and is time consuming to use. Thus decreasing its performance, but its free to use, so its the best we got without paying money, if you want text recognition as well.
Links:
Documentation OpenCv
Older Documentation
The source code is located here, for analysis and understanding
Honestly, I lack the experience and expertise in both opencv and image processing in order to provide a detailed way in implementing their text detection module. The same with the SWT algorithm. I just got into this stuff this past few months, but as I learn more I will edit this answer.
Here's a simple image processing approach using only thresholding and contour filtering:
Obtain binary image. Load image, convert to grayscale, Gaussian blur, and adaptive threshold
Combine adjacent text. We create a rectangular structuring kernel then dilate to form a single contour
Filter for text contours. We find contours and filter using contour area. From here we can draw the bounding box with cv2.rectangle()
Using this original input image (removed red lines)
After converting the image to grayscale and Gaussian blurring, we adaptive threshold to obtain a binary image
Next we dilate to combine the text into a single contour
From here we find contours and filter using a minimum threshold area (in case there was small noise). Here's the result
If we wanted to, we could also extract and save each ROI using Numpy slicing
Code
import cv2
# Load image, grayscale, Gaussian blur, adaptive threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (9,9), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,30)
# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9,9))
dilate = cv2.dilate(thresh, kernel, iterations=4)
# Find contours, highlight text areas, and extract ROIs
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
ROI_number = 0
for c in cnts:
area = cv2.contourArea(c)
if area > 10000:
x,y,w,h = cv2.boundingRect(c)
cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 3)
# ROI = image[y:y+h, x:x+w]
# cv2.imwrite('ROI_{}.png'.format(ROI_number), ROI)
# ROI_number += 1
cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()