I'm working on my bachelor's degree final project and I want to create an OCR for bottle inspection with python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?
I tried image processing operations on the image and I used pytesseract to recognize the characters.
Using the code bellow I got from this photo:
to this one:
and then to this one:
Sharpen function:
def sharpen(img):
sharpen = iaa.Sharpen(alpha=1.0, lightness = 1.0)
sharpen_img = sharpen.augment_image(img)
return sharpen_img
Image processing code:
textZone = cv2.pyrUp(sharpen(originalImage[y:y + h - 1, x:x + w - 1])) #text zone cropped from the original image
sharp = cv2.cvtColor(textZone, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(sharp, 127, 255, cv2.THRESH_BINARY)
#the functions such as opening are inverted (I don't know why) that's why I did opening with MORPH_CLOSE parameter, dilatation with erode and so on
kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
open = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_open)
kernel_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,7))
dilate = cv2.erode(open,kernel_dilate)
kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
close = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, kernel_close)
print(pytesseract.image_to_string(close))
This is the result of pytesseract.image_to_string:
22203;?!)
92:53 a
The expected result is :
22/03/20
02:53 A
"Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?"
First, kudos for taking this project on and getting this far with it. What you have from the OpenCV/cv2 standpoint looks pretty good.
Now, if you're thinking of Tesseract to carry you the rest of the way, at the very least you'll have to train it. Here you have a tough choice: Invest in training Tesseract, or work up a CNN to recognize a limited alphabet. If you have a way to segment the image, I'd be tempted to go with the latter.
From the result you got and the expected result, you can see that some of the characters are recognized correctly. Assuming you are using a different image from that shown in the tutorial, I recommend you to change the values of threshold and getStructuringElement.
These values work better depending on the image color. The tutorial author must have optimized it for his/her use (by trial and error or some other way).
Here is a video if you want to play around with those value using sliders in opencv. You can also print your result in the same loop to see if you are getting the desired result.
One potential thing you could do to improve recognition on the characters is to dilate the characters so pytesseract gives a better result. Dilating the characters will connect the individual blobs together and can fix the / or the A characters. So starting with your latest binary image:
Original
Dilate with a 3x3 kernel with iterations=1 (left) or iterations=2 (right). You can experiment with other values but don't do it too much or the characters will all connect. Maybe this will provide a better result with you OCR.
import cv2
image = cv2.imread("1.PNG")
thresh = cv2.threshold(image, 115, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilate = cv2.dilate(thresh, kernel, iterations=1)
final = cv2.threshold(dilate, 115, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('final', final)
cv2.waitKey(0)
Related
I have these images containing the handwritten circular annotation on the printed text images. I want to remove these annotations from the input image. I have tried to apply some of the thresholding methods as discussed in many threads on StackOverflow, but my results are not as I expected.
However, the method that I am using works really well if the annotation is marked by a blue pen but when the annotation is marked by a black pen then the method of thresholding and erosion won’t produce the output as expected.
Here is a sample image of my achieved results on blue annotations with the thresholding and erosion method
Image (input on the left and output on the right)
Code
import cv2
import numpy as np
from google.colab.patches import cv2_imshow
img = cv2.imread("/content/Scan_0101.jpg")
cv2_imshow(img)
wimg = img[:, :, 0]
ret,thresh = cv2.threshold(wimg,120,255,cv2.THRESH_BINARY)
cv2_imshow(thresh)
kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(thresh, kernel, iterations = 1)
mask = cv2.bitwise_or(erosion, thresh)
#cv2_imshow(erosion)
white = np.ones(img.shape,np.uint8)*255
white[:, :, 0] = mask
white[:, :, 1] = mask
white[:, :, 2] = mask
result = cv2.bitwise_or(img, white)
erosion = cv2.erode(result, kernel, iterations = 1)
Here is a sample image of my achieved results on black annotations with the thresholding and erosion method
Image (input on the left and output on the right)
Any suggested approach for this problem? or how this code can be modified to produce the required results.
You must understand that as the gray values in the text and those of the hand writings are in the same range, no thresholding method in the world can work.
In fact, no algorithm at all can succeed without "hints" on what characters look like or don't look like. Even the stroke thickness is not distinctive enough.
The only possible indication is that the circles are made of a smooth and long stroke. And removing them where they cross the characters is just impossible.
Some Parts of handwritten circles (on line spacing regions) may be able to extract, with the assumption "many letters align on same line". In your image, upper and lower part of the circle will be extracted, I think.
Then, if you track the black line with starting from the extracted part (with assuming smooth curvature), it may be able to detect the connected handwritten circle.
However... in real, I think such process will encounter many difficulties : especially regarding the fact that characters will be cut off by removing curve.
This is not truly a duplicate of How to extract decimal in image with Pytesseract, as those answers did not solve my problem and my use case is different.
I'm using PyTesseract to recognise text in table cells. When it comes to recognising drug doses with decimal points, the OCR fails to recognise the ., though is accurate for everything else. I'm using tesseract v5.0.0-alpha.20200328 on Windows 10.
My pre-processing consists of upscaling by 400% using cubic, conversion to black and white, dilation and erosion, morphology, and blurring. I've tried a decent combination of all of these (as well as each on their own), and nothing has recognized the ..
I've tried --psm of various values as well as a character whitelist. I believe the font is Sergoe UI.
Before processing:
After processing:
PyTesseract output: 25mg »p
Processing code:
import cv2, pytesseract
import numpy as np
image = cv2.imread( '01.png' )
upscaled_image = cv2.resize(image, None, fx = 4, fy = 4, interpolation = cv2.INTER_CUBIC)
bw_image = cv2.cvtColor(upscaled_image, cv2.COLOR_BGR2GRAY)
kernel = np.ones((2, 2), np.uint8)
dilated_image = cv2.dilate(bw_image, kernel, iterations=1)
eroded_image = cv2.erode(dilated_image, kernel, iterations=1)
thresh = cv2.threshold(eroded_image, 205, 255, cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
morh_image = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
blur_image = cv2.threshold(cv2.bilateralFilter(morh_image, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
final_image = blur_image
text = pytesseract.image_to_string(final_image, lang='eng', config='--psm 10')
If you haven't made sure of this, check out this link
visit https://groups.google.com/g/tesseract-ocr/c/Wdh_JJwnw94/m/xk2ErJnFBQAJ
One major solution for for many problems is text height, I was facing many issues but wasn't able to figure out why, but seems sending image with correct size letters to tesseract solves many problems.
instead of upscaling to a random % try the number with which your image has letters close to 30- 40 Px.
Also if somehow your preprocessing change "." into a noise like char then too it will get ignored.
I had a similar case that and was able to increase the number of correct decimals by using image processing methods and upscaling of the image. Yet, a small share of the decimals were not recognized correctly.
The solution I found was to change the language setting for pytesseract:
I was using a non-English setting, but changing the config to lang='eng' fixed all remaining issues.
That might not help with the original question, though, as the setting is already eng.
I want to eliminate gray lines in 16 bit image you can see.
Final goal is remove line in object image(second image) with background image(first image).
I thought it need FFT, but i don't know how FFT applied. There will be other ways, too.
please help me.
One simple way using Python/OpenCV is to use morphology close multiple times with a small vertical rectangular kernel.
Input:
import cv2
import numpy as np
img = cv2.imread('lines.png')
# do morphology multiple times to remove horizontal lines
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,5))
result = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel, iterations = 9)
# write result to disk
cv2.imwrite("lines_removed.png", result)
# display it
cv2.imshow("result", result)
cv2.waitKey(0)
However, it will modify the image everywhere slightly
I am currently working on a Optical Character Recognition system for Japanese letters. I am already able to identify individual characters if the letter in question is seperated and in the right size. (The deep learning part of the work)
As a next step I am trying to segmentate individual characters in an image to make a prediction about which letter it is. (For now it is only about black characters on white background, scanned PDFs and such)
So far, the most promising results I got was using the function "cv2.findContours" from OpenCV.
Here are 3 examples:
While the results are not entirely horrible, there are still many cases where two or more characters are treated as one or where one characters is split up into multiple boxes. I cannot seem to make the code work for all fonts and character sizes.
While the first image is still pretty close to being perfect, the second and third are not nearly as accurate. (I hope it is clear where the mistakes are)
I tried completely different approaches, such as hough transform, but I couldn't achieve anything nearly as good as this approach.
This by the way is my current code:
import cv2
import numpy as np
file_name = '../data/test.jpg'
img = cv2.imread(file_name)
img_final = cv2.imread(file_name)
img_final = cv2.resize(img_final, (img_final.shape[1], img_final.shape[0]))
img2gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(img2gray,(7,7),0)
# thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)Y)
ret, mask = cv2.threshold(blur, 180, 255, cv2.THRESH_BINARY)
image_final = cv2.bitwise_and(img2gray , img2gray , mask = mask)
ret, new_img = cv2.threshold(image_final, 180 , 255, cv2.THRESH_BINARY_INV)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(2,2))
dilated = cv2.dilate(new_img,kernel,iterations = 1)
_, contours, hierarchy = cv2.findContours(dilated,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_NONE)
index = 0
for contour in contours:
[x,y,w,h] = cv2.boundingRect(contour)
if w <1 and h<1:
continue
cv2.rectangle(img,(x,y),(x+w,y+h),(255,0,255),2)
cropped = image_final[y :y + h , x : x + w]
s = '../output/crop_' + str(index) + '.jpg'
cv2.imwrite(s , cropped)
index = index + 1
cv2.imshow('captcha_result' , img)
cv2.waitKey()
s2 = '../data/output.jpg'
cv2.imwrite(s2 , img)
Now questions are the following:
Does anybody have an idea how to improve the accuracy of my code?
Is it better to take a whole new approach?
Can a sliding window help me here?
Where do I go from here?
Can I maybe use the sliding window to send the individual characters to the prediction?
With all the false positives (e.g. characters being split in two, despite trying to limit it) I am uncertain whether or not I can simply use the cropped images of characters as they are and how to further filter the results.
As I am new to all this, I would really appreciate any help or hint I can get!
I am looking forward to your replies! :)
I'm working with images that have text. The problem is that these images are receipts, and after a lot of transformations, the text lost quality.
I'm using python and opencv.
I was trying with a lot of combinations of morphological transformations from the doc Morphological Transformations, but I don't get satisfactory results.
I'm doing this right now (I'll comment what I've tried, and just let uncommented what I'm using):
kernel = np.ones((2, 2), np.uint8)
# opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
# dilation = cv2.dilate(opening, kernel, iterations=1)
# kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)
# gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
#
img = erosion.copy()
With this, from this original image:
I get this:
It's a little bit better, as you can see. But it still too bad. The OCR (tesseract) doesn't recognize the characters here very well. I've trained, but as you can note, every "e" is different, and so on.
I get good results, but I think, if I resolve this problem, they would be even better.
Maybe I can do another thing, or use a better combination of the morphological transformations. If there is another tool (PIL, imagemagick, etc..) that I could use, I can use it.
Here's the whole image, so you can see how it looks:
As I said, it's not so bad, but a little be more "optimization" of the letters would be perfect.
After years working in this theme, I can tell now, that what I wanted to do take a big effort, it's quite slow, and NEVER worked as I expected. The irregularities of the pixels in the characters are always unpredictable, that's why "easy algorithms" don't work.
Question: It's impossible then to have a decent OCR, which can read damaged characters?
Answer: No, it's not impossible. But it takes "a bit" more than just using erosion, morphological closing or something like that.
Then, how? Neural Networks :)
Here are two amazing papers that help me a lot:
Can we build language-independent OCR using LSTM networks?
Reading Scene Text in Deep Convolutional Sequences
And for those who aren't familiar with RNN, I can suggest this:
Understanding LSTM Networks
There's also a python library, which works pretty good (and unfortunately even better for C++):
ocropy
I really hope this can help someone.
In my experience erode impairs OCR quality. If you have grayscale image (not binary) you can use better binarization algorithm. I use SAUVOLA algorithm for binarization. If you have only binary image the best thing you can do is removing the noise (remove all small dots).
Did you consider the neighboring pixels and add sum of them.
For example:
n = numpy.zeros((3,3))
s = numpy.zeros((3,3))
w = numpy.zeros((3,3))
e = numpy.zeros((3,3))
n[0][1] = 1
s[2][1] = 1
w[1][0] = 1
e[1][2] = 1
img_n = cv2.erode(img, n, iterations=1)
img_s = cv2.erode(img, s, iterations=1)
img_w = cv2.erode(img, w, iterations=1)
img_e = cv2.erode(img, e, iterations=1)
result = img_n + img_s + img_w + img_e + img
Also, you can either numpy or cv2 to add the arrays.
I found the Ramer–Douglas–Peucker Algorithm I'm trying to implement it for closed polygons in Haskell. Maybe it can solve something.