I have a few thousands of PDF files containing B&W images (1bit) from digitalized paper forms. I'm trying to OCR some fields, but sometime the writing is too faint:
I've just learned about morphological transforms. They are really cool!!! I feel like I'm abusing them (like I did with regular expressions when I learned Perl).
I'm only interested in the date, 07-06-2017:
im = cv2.blur(im, (5, 5))
plt.imshow(im, 'gray')
ret, thresh = cv2.threshold(im, 250, 255, 0)
plt.imshow(~thresh, 'gray')
People filling this form seems to have some disregard for the grid, so I tried to get rid of it. I'm able to isolate the horizontal line with this transform:
horizontal = cv2.morphologyEx(
~thresh,
cv2.MORPH_OPEN,
cv2.getStructuringElement(cv2.MORPH_RECT, (100, 1)),
)
plt.imshow(horizontal, 'gray')
I can get the vertical lines as well:
plt.imshow(horizontal ^ ~thresh, 'gray')
ret, thresh2 = cv2.threshold(roi, 127, 255, 0)
vertical = cv2.morphologyEx(
~thresh2,
cv2.MORPH_OPEN,
cv2.getStructuringElement(cv2.MORPH_RECT, (2, 15)),
iterations=2
)
vertical = cv2.morphologyEx(
~vertical,
cv2.MORPH_ERODE,
cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
)
horizontal = cv2.morphologyEx(
~horizontal,
cv2.MORPH_ERODE,
cv2.getStructuringElement(cv2.MORPH_RECT, (7, 7))
)
plt.imshow(vertical & horizontal, 'gray')
Now I can get rid of the grid:
plt.imshow(horizontal & vertical & ~thresh, 'gray')
The best I got was this, but the 4 is still split into 2 pieces:
plt.imshow(cv2.morphologyEx(im2, cv2.MORPH_CLOSE,
cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))), 'gray')
Probably at this point it is better to use cv2.findContours and some heuristic in order to locate each digit, but I was wondering:
should I give up and demand all documents to be rescanned in grayscale?
are there better methods in order to isolate and locate the faint digits?
do you know any morphological transform to join cases like the "4"?
[update]
Is rescanning the documents too demanding? If it is no great trouble I believe it is better to get higher quality inputs than training and trying to refine your model to withstand noisy and atypical data
A bit of context: I'm a nobody working at a public agency in Brazil. The price for ICR solutions start in the 6 digits so nobody believes a single guy can write an ICR solution in-house. I'm naive enough to believe I can prove them wrong. Those PDF documents were sitting at an FTP server (about 100K files) and were scanned just to get rid of the dead tree version. Probably I can get the original form and scan again myself but I would have to ask for some official support - since this is the public sector I would like to keep this project underground as much as I can. What I have now is an error rate of 50%, but if this approach is a dead end there is no point trying to improve it.
Maybe with Active contour model of some sort?
For example, I found this library: https://github.com/pmneila/morphsnakes
Took your final "4" number:
After some quick tweaking (without actually understanding the parameters, so it may be possible to get a better result) I got this:
with the following code (I also hacked into the morphsnakes.py a little to save the images):
import morphsnakes
import numpy as np
from scipy.misc import imread
from matplotlib import pyplot as ppl
def circle_levelset(shape, center, sqradius, scalerow=1.0):
"""Build a binary function with a circle as the 0.5-levelset."""
grid = np.mgrid[list(map(slice, shape))].T - center
phi = sqradius - np.sqrt(np.sum((grid.T)**2, 0))
u = np.float_(phi > 0)
return u
#img = imread("testimages/mama07ORI.bmp")[...,0]/255.0
img = imread("four.png")[...,0]/255.0
# g(I)
gI = morphsnakes.gborders(img, alpha=900, sigma=3.5)
# Morphological GAC. Initialization of the level-set.
mgac = morphsnakes.MorphGAC(gI, smoothing=1, threshold=0.29, balloon=-1)
mgac.levelset = circle_levelset(img.shape, (39, 39), 39)
# Visual evolution.
ppl.figure()
morphsnakes.evolve_visual(mgac, num_iters=50, background=img)
Related
I'm pretty new to both python and openCV. I just need it for one project. Users take picture of ECG with their phones and send it to the server I need to extract the graph data and that's all.
Here's a sample image :
Original Image Sample
I should first crop the image to have only the graph I think.As I couldn't find a way I did it manually.
Here's some code which tries to isolate the graph by making the lines white (It works on the cropped image) Still leaves some nasty noises and inacurate polygons at the end some parts are not detected :
import cv2
import numpy as np
img = cv2.imread('image.jpg')
kernel = np.ones((6,6),np.uint8)
dilation = cv2.dilate(img,kernel,iterations = 1)
gray = cv2.cvtColor(dilation, cv2.COLOR_BGR2GRAY)
#
ret,gray = cv2.threshold(gray,160,255,0)
gray2 = gray.copy()
mask = np.zeros(gray.shape,np.uint8)
contours, hier = cv2.findContours(gray,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
if cv2.contourArea(cnt) > 400:
approx = cv2.approxPolyDP(cnt,
0.005 * cv2.arcLength(cnt, True), True)
if(len(approx) >= 5):
cv2.drawContours(img, [approx], 0, (0, 0, 255), 5)
res = cv2.bitwise_and(gray2,gray2,mask)
cv2.imwrite('output.png',img)
Now I need to make it better. I found most of the code from different places and attached them togheter.
np.ones((6,6),np.uint8)
For example here if I use anything other than 6,6 I'm in trouble :frowning:
also
cv2.threshold(gray,160,255,0)
I found 160 255 by tweaking and all other hardcoded values in my code what if the lighting on another picture is different and these values won't work anymore?
And other than this I don't still get the result I want some polygons are attached by two different lines from bottom and top!
I just want one line to go from beggining to the end.
Please guide me to tweak and fix it for more general use.
This is not truly a duplicate of How to extract decimal in image with Pytesseract, as those answers did not solve my problem and my use case is different.
I'm using PyTesseract to recognise text in table cells. When it comes to recognising drug doses with decimal points, the OCR fails to recognise the ., though is accurate for everything else. I'm using tesseract v5.0.0-alpha.20200328 on Windows 10.
My pre-processing consists of upscaling by 400% using cubic, conversion to black and white, dilation and erosion, morphology, and blurring. I've tried a decent combination of all of these (as well as each on their own), and nothing has recognized the ..
I've tried --psm of various values as well as a character whitelist. I believe the font is Sergoe UI.
Before processing:
After processing:
PyTesseract output: 25mg »p
Processing code:
import cv2, pytesseract
import numpy as np
image = cv2.imread( '01.png' )
upscaled_image = cv2.resize(image, None, fx = 4, fy = 4, interpolation = cv2.INTER_CUBIC)
bw_image = cv2.cvtColor(upscaled_image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((2, 2), np.uint8)
dilated_image = cv2.dilate(bw_image, kernel, iterations=1)
eroded_image = cv2.erode(dilated_image, kernel, iterations=1)
thresh = cv2.threshold(eroded_image, 205, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
morh_image = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
blur_image = cv2.threshold(cv2.bilateralFilter(morh_image, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
final_image = blur_image
text = pytesseract.image_to_string(final_image, lang='eng', config='--psm 10')
If you haven't made sure of this, check out this link
visit https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/xk2ErJnFBQAJ
One major solution for for many problems is text height, I was facing many issues but wasn't able to figure out why, but seems sending image with correct size letters to tesseract solves many problems.
instead of upscaling to a random % try the number with which your image has letters close to 30- 40 Px.
Also if somehow your preprocessing change "." into a noise like char then too it will get ignored.
I had a similar case that and was able to increase the number of correct decimals by using image processing methods and upscaling of the image. Yet, a small share of the decimals were not recognized correctly.
The solution I found was to change the language setting for pytesseract:
I was using a non-English setting, but changing the config to lang='eng' fixed all remaining issues.
That might not help with the original question, though, as the setting is already eng.
I am currently trying to dectect some reflective and overlapping objects in several images. These images all come with noisy background and the objects I want to detect are overlapping badly.
Here are some examples of the images:
overlapping and sometimes the background can be bad too
Besides being able to detect the objects, I also want to count them in the future (so when there's an overlap it needs to know how many objects are overlapping).
What I tried was that I denoise the image first, and tried to apply canny edge.
image = cv2.fastNlMeansDenoisingColored(image,None,10,10,7,21) #denoise twice
image = cv2.fastNlMeansDenoisingColored(image,None,10,10,7,21)
imageblur = cv2.GaussianBlur(image,(5,5),0) #blur
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 10, 250)
hist = cv2.calcHist([gray],[0],None,[256],[0,256]) #check pixel histogram
#plt.hist(gray.ravel(),256,[0,256]); plt.show()
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
dilated = cv2.dilate(edged,kernel,iterations = 1)
closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE, kernel)
imcompare = cv2.hconcat([edged,closed])
imcompare = cv2.resize(imcompare, (0,0), fx=0.7, fy=0.7)
cv2.imshow('Compare',imcompare)
(cnts, _) = cv2.findContours(closed.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
And these are my results:
without bounding boxes, with boxes
I'm not an expert in CV and I know what I'm trying right now probably wouldn't get me to desired result but I really hope I could get this project done, so any hints or directions are greatly appreciated! I'm willing to study any material that may help, and I just want to at least have a starting point which I could dive into from.
Btw I also tried looking into different color spaces, pixel histogram, watershed...etc but some challenging parts to me is that my "object" is not an object with uniform color but instead covers a big range of intensity (since it's reflective) and the background can be distracting too. I also thought about using neural network or YOLO or some other ML methods but I don't know which one could be the best (since I also want to seperate the overlapping objects).
Thanks to any response!
I'm working on my bachelor's degree final project and I want to create an OCR for bottle inspection with python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?
I tried image processing operations on the image and I used pytesseract to recognize the characters.
Using the code bellow I got from this photo:
to this one:
and then to this one:
Sharpen function:
def sharpen(img):
sharpen = iaa.Sharpen(alpha=1.0, lightness = 1.0)
sharpen_img = sharpen.augment_image(img)
return sharpen_img
Image processing code:
textZone = cv2.pyrUp(sharpen(originalImage[y:y + h - 1, x:x + w - 1])) #text zone cropped from the original image
sharp = cv2.cvtColor(textZone, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(sharp, 127, 255, cv2.THRESH_BINARY)
#the functions such as opening are inverted (I don't know why) that's why I did opening with MORPH_CLOSE parameter, dilatation with erode and so on
kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
open = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_open)
kernel_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,7))
dilate = cv2.erode(open,kernel_dilate)
kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
close = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, kernel_close)
print(pytesseract.image_to_string(close))
This is the result of pytesseract.image_to_string:
22203;?!)
92:53 a
The expected result is :
22/03/20
02:53 A
"Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?"
First, kudos for taking this project on and getting this far with it. What you have from the OpenCV/cv2 standpoint looks pretty good.
Now, if you're thinking of Tesseract to carry you the rest of the way, at the very least you'll have to train it. Here you have a tough choice: Invest in training Tesseract, or work up a CNN to recognize a limited alphabet. If you have a way to segment the image, I'd be tempted to go with the latter.
From the result you got and the expected result, you can see that some of the characters are recognized correctly. Assuming you are using a different image from that shown in the tutorial, I recommend you to change the values of threshold and getStructuringElement.
These values work better depending on the image color. The tutorial author must have optimized it for his/her use (by trial and error or some other way).
Here is a video if you want to play around with those value using sliders in opencv. You can also print your result in the same loop to see if you are getting the desired result.
One potential thing you could do to improve recognition on the characters is to dilate the characters so pytesseract gives a better result. Dilating the characters will connect the individual blobs together and can fix the / or the A characters. So starting with your latest binary image:
Original
Dilate with a 3x3 kernel with iterations=1 (left) or iterations=2 (right). You can experiment with other values but don't do it too much or the characters will all connect. Maybe this will provide a better result with you OCR.
import cv2
image = cv2.imread("1.PNG")
thresh = cv2.threshold(image, 115, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilate = cv2.dilate(thresh, kernel, iterations=1)
final = cv2.threshold(dilate, 115, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('final', final)
cv2.waitKey(0)
The image below shows an aerial photo of a house block (re-oriented with the longest side vertical), and the same image subjected to Adaptive Thresholding and Difference of Gaussians.
Images: Base; Adaptive Thresholding; Difference of Gaussians
The roof-print of the house is obvious (to the human eye) on the AdThresh image: it's a matter of connecting some obvious dots. In the sample image, finding the blue-bounded box below -
Image with desired rectangle marked in blue
I've had a crack at implementing HoughLinesP() and findContours(), but get nothing sensible (probably because there's some nuance that I'm missing). The python script-chunk that fails to find anything remotely like the blue box, is as follows:
import cv2
import numpy as np
from matplotlib import pyplot as plt
# read in full (RGBA) image - to get alpha layer to use as mask
img = cv2.imread('rotated_12.png', cv2.IMREAD_UNCHANGED)
grey = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Otsu's thresholding after Gaussian filtering
blur_base = cv2.GaussianBlur(grey,(9,9),0)
blur_diff = cv2.GaussianBlur(grey,(15,15),0)
_,thresh1 = cv2.threshold(grey,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
thresh = cv2.adaptiveThreshold(grey,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY,11,2)
DoG_01 = blur_base - blur_diff
edges_blur = cv2.Canny(blur_base,70,210)
# Find Contours
(ed, cnts,h) = cv2.findContours(grey, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:4]
for c in cnts:
approx = cv2.approxPolyDP(c, 0.1*cv2.arcLength(c, True), True)
cv2.drawContours(grey, [approx], -1, (0, 255, 0), 1)
# Hough Lines
minLineLength = 30
maxLineGap = 5
lines = cv2.HoughLinesP(edges_blur,1,np.pi/180,20,minLineLength,maxLineGap)
print "lines found:", len(lines)
for line in lines:
cv2.line(grey,(line[0][0], line[0][1]),(line[0][2],line[0][3]),(255,0,0),2)
# plot all the images
images = [img, thresh, DoG_01]
titles = ['Base','AdThresh','DoG01']
for i in xrange(len(images)):
plt.subplot(1,len(images),i+1),plt.imshow(images[i],'gray')
plt.title(titles[i]), plt.xticks([]), plt.yticks([])
plt.savefig('a_edgedetect_12.png')
cv2.destroyAllWindows()
I am trying to set things up without excessive parameterisation. I'm wary of 'tailoring' an algorithm for just this one image since this process will be run on hundreds of thousands of images (with roofs/rooves of different colours which may be less distinguishable from background). That said, I would love to see a solution that 'hit' the blue-box target - that way I could at the very least work out what I've done wrong.
If anyone has a quick-and-dirty way to do this sort of thing, it would be awesome to get a Python code snippet to work with.
The 'base' image ->
Base Image
You should apply the following:
1. Contrast Limited Adaptive Histogram Equalization-CLAHE and convert to gray-scale.
2. Gaussian Blur & Morphological transforms (dialation, erosion, etc) as mentioned by #bad_keypoints. This will help you get rid of the background noise. This is the most tricky step as the results will depend on the order in which you apply (first Gaussian Blur and then Morphological transforms or vice versa) and the window sizes you choose for this purpose.
3. Apply Adaptive thresholding
4. Apply Canny's Edge detection
5. Find contour having four corner points
As said earlier you need to tweak with input parameters of these functions and also need to validate these parameters with other images. As it might be possible that it will work for this case but not for other cases. Based on trial and error you need to fix the parameter values.