Digit recognition in python (OpenCV and pytesseract) - python

I'm currently trying to detect numbers from small screenshots. However, I have found the accuracy to be quite poor. I've been using OpenCV, the image is captured in RGB and converted to greyscale, then thresholding has been performed using a global value (I found adaptive didn't work so well).
Here is an example grey-scale of one of the numbers, followed by an example of the image post thresh-holding (the numbers can range from 1-99). Note that the initial screenshot of the image is quite small and is thus enlarged.
Any suggestions on how to improve accuracy using OpenCV or a different system altogether are much appreciated. Some code included below, the function is passed a screenshot in RGB of the number.
def getNumber(image):
image = cv2.resize(image, (0, 0), fx=3, fy=3)
img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh, image_bin = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)
image_final = PIL.Image.fromarray(image_bin)
txt = pytesseract.image_to_string(
image_final, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
return txt

Here's what i could improve, using otsu treshold is more efficent to separate text from background than giving an arbitrary value. Tesseract works better with black text on white background, and i also added padding as tesseract struggle to recognize characters if they are too close to the border.
This is the final image [final_image][1] and pytesseract manage to read "46"
import cv2,numpy,pytesseract
def getNumber(image):
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Otsu Tresholding automatically find best threshold value
_, binary_image = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
# invert the image if the text is white and background is black
count_white = numpy.sum(binary_image > 0)
count_black = numpy.sum(binary_image == 0)
if count_black > count_white:
binary_image = 255 - binary_image
# padding
final_image = cv2.copyMakeBorder(image, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=(255, 255, 255))
txt = pytesseract.image_to_string(
final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
return txt
Function is executed as :
>> getNumber(cv2.imread(img_path))
EDIT : note that you do not need this line :
image_final = PIL.Image.fromarray(image_bin)
as you can pass to pytesseractr an image in numpy array format (wich cv2 use), and Tesseract accuracy only drops for characters under 35 pixels (and also bigger, 35px height is actually the optimal height) so i did not resize it.
[1]: https://i.stack.imgur.com/OaJgQ.png

Related

Image enhancement for pytesseract 7seg detection

I'm in the middle of developing a system that predict numbers from 7Seg LCD and I'm using for the matter tesseract OCR engine and it's wrapper for python pytesseract.
I'm taking pictures with a camera then cropping the Region of Interest and I found out that I have to enhance my Image quality to increase the accuracy of the OCR engine.
I used some Image processing techniques (gray scale --> Gaussian Blur --> threshold) and I got a quiet good image but tesseract still can't detect the numbers in the image.
I use the code:
image = cv2.imread('test.jpg')
image = image[50:200, 300:540]
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('result', image)
cv2.waitKey()
cv2.destroyAllWindows()
cv2.imwrite('enhanced.jpg', image)
tess_dir_config = r'--tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"'
text = image_to_string(image, lang='letsgodigital', config=tess_dir_config)
print(text)
The Output Image:
The Input Image:
The engine usually have an empty output and if not it will not detect the number correctly.
Is there some sort of other image processing that I can use to get the potential of the Engine.
Note: I'am using letsgodigital weights
This works for me, if I improve the crop a little, and use page segmentation mode 7. (This mode does no page segmentation and assumes a single line of text.)
import cv2
import matplotlib.pyplot as plt
import pytesseract
image = cv2.imread('seven_seg_disp.jpg')
# Strip off top of meter and little percent symbol.
image = image[90:200, 300:520]
# plt.imshow(image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# plt.imshow(image)
tess_dir_config = r'--tessdata-dir "../.tesseract" --psm 7'
text = pytesseract.image_to_string(image, lang='letsgodigital', config=tess_dir_config)
text = text.strip()
print(text) # prints 75
Note: I changed the value of tessdata-dir because it's in a different place on my computer.

How to remove spotted background from numbers using cv2?

I'm using py-tesseract for OCR on images as below but I'm unable to get consistent output from the unprocessed images. How can the spotted background be reduced and the numbers highlighted using cv2 to increase accuracy? I'm also interested in keeping the separators in the output string.
Below pre-processing seems to work with some accuracy
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
(T, threshInv) = cv2.threshold(blurred, 0, 255,
cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
Getting output using psm --6: 6.903.722,99
Here's one solution, which is based on the ideas on a similar post. The main idea is to apply a Hit-or-Miss operation looking for the pattern you want to eliminate. In this case the pattern is one black (or white, if you invert the image) surrounded by pixels of the complimentary color. I've also included a thresholding operation with some bias, because some of the characters are easily destroyed (you could really benefit from more high-res image). These are the steps:
Get grayscale image via color conversion
Threshold with bias to get a binary image
Apply the Hit-or-Miss with one central pixel target kernel
Use the result from the prior operation to suppress the noise in the original image
Let's see the code:
# Imports:
import numpy as np
import cv2
image path
path = "D://opencvImages//"
fileName = "8WFNvsZ.jpg"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
thresh, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Use Otsu's threshold value and add some bias:
thresh = 1.05 * thresh
_, binaryImage = cv2.threshold(grayscaleImage, thresh, 255, cv2.THRESH_BINARY_INV )
The first bit of code gets the binary image of the input. Note that I've added some bias to the threshold obtained via Otsu to avoid degrading the characters. This is the result:
Ok, let's apply the Hit-or-Miss operation to get the dot mask:
# Perform morphological hit or miss operation
kernel = np.array([[-1,-1,-1], [-1,1,-1], [-1,-1,-1]])
dotMask = cv2.filter2D(binaryImage, -1, kernel)
# Bitwise-xor mask with binary image to remove dots
result = cv2.bitwise_xor(binaryImage, dotMask)
The dot mask is this:
And the result of subtracting (or XORing) this mask to the original binary image is this:
If I run the inverted (black text on white background) result image on PyOCR I get this string output:
Text is: 6.003.722,09
The other image produces this final result:
And its OCR returns this:
Text is: 4.705.640,00

Post-process and read blurred digits - OpenCV / tesseract

My boss just gave me 9000 images showing a multi-meter output, and he wants me to read them all and write the voltage output on a text file for Monday morning!
There's no way I can do it manually , so I desperately need your help to automate the process.
I have already done a bit of coding. The multimeter screen position is fixed over the 9000 images, so I just cropped the pictures and zoomed on the screen.
Here's my code so far :
import pytesseract as tess
tess.pytesseract.tesseract_cmd = r'D:\Programs\Tesseract-OCR\tesseract.exe'
from PIL import Image
from matplotlib import pyplot as plt
import cv2
import numpy as np
img = cv2.imread('test1.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
ret, BW = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)
plt.imshow(BW)
plt.show()
print(tess.image_to_string(img,config='digits --oem 3'))
My code outputs this kind of picture:
For this picture, tesseract reads the following number: 138
I have two problems here: the first one is I don't know how to properly pre-process my pictures for tesseract to easily read them.
Second problem: I don't know how to process the decimal dot. In fact, on some picture there is no decimal dot, so I need to find a way to make tesseract read it.
Do you think I'll need to manually "train" tesseract, so that I reach a near 100% accuracy?
Thank you sooo much for your help, I'll continue my research meanwhile on the OpenCv functions!
here's one of the originals:
UPDATED VERSION:
Ok so I updated a bit my code so far :
# Loading the picture
img = cv2.imread('test1.JPG')
# Crop - Rotate - Scale up
x = 1400
y = 1375
h = 325
w = 800
img = img[y:y+h, x:x+w]
image = cv2.rotate(img, cv2.ROTATE_180)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,10))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=1)
cv2.imshow('close', close)
x = 0
y = 0
h = 300
w = 750
img = close[y:y+h, x:x+w]
plt.imshow(img)
plt.show()
print(tess.image_to_string(img,config='--psm 7 -c tessedit_char_whitelist=0123456789 --oem 0'))
I get a finer image for tesseract to analyse. But still no luck with the decimal dot :/.
you need around 3-4 hours to type them manually, just spend 1 extrahour to write a wrapper that opens the image and let's you input a value.
when automating, my biggest concern would be if all the measurements have the same unit of kilo-ohm, you should really include that check in your open-cv code.
good luck!

How to improve OCR with Pytesseract text recognition?

Hi I am looking to improve my performance with pytesseract at digit recognition.
I take my raw image and split it into parts that look like this:
The size can vary.
To this I apply some pre-processing methods like so
image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(img, (2, 2), 0)
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)
and I get this
I then pass this to pytesseract:
num = pytesseract.image_to_string(result_img, lang='eng',
config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
However this is not good enough for me and often gets numbers wrong.
I am looking for ways to improve, I have tried to keep this minimal and self contained but let me know if I've not been clear and I will elaborate.
Thank you.
You're on the right track by trying to preprocess the image before performing OCR but using an incorrect approach. There is no reason to dilate or erode the image since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it only contains black and white pixels but it is actually a 3-channel BGR image which is probably why you're getting incorrect OCR results. If you look at Tesseract improve quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the desired text to detect is in black with the background in white. To do this, we can perform a Otsu's threshold
to obtain a binary image then invert it so the text is in the foreground. This will result in our preprocessed image where we can throw it into image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at configuration options for more settings. Here's the results:
Input image -> Binary -> Invert
Result from Pytesseract OCR
8
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()

Pytesseract random bug when reading text

I'm creating a bot for a video game and I have to read some information displayed on the screen. Given that the information is always at the same position, I have no issue to take a screenshot and crop the picture to the right position.
90% of the time, the recognition will be perfect, but sometimes it will return something that seems totally random (see the example below).
I've tried to turn the picture into black and white with no success, and tried to change the pytesseract config (config = ("-l fra --oem 1 --psm 6"))
def readScreenPart(x,y,w,h):
monitor = {"top": y, "left": x, "width": w, "height": h}
output = "monitor.png"
with mss.mss() as sct:
sct_img = sct.grab(monitor)
mss.tools.to_png(sct_img.rgb, sct_img.size, output=output)
img = cv2.imread("monitor.png")
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imwrite("result.png", img)
config = ("-l fra --oem 1 --psm 6")
return pytesseract.image_to_string(img,config=config)
Example : this picture generates a bug, it returns the string "IRPMV/LEIILK"
Another image
Now I don't know where the issue comes from, given that it is not just a single wrong character but a totally random result..
Thanks for your help
Preprocessing is an important step before throwing the image into Pytesseract. Generally, you want to have the desired text in black with the background in white. Currently, your foreground text is in green instead of white. Here's a simple process to fix the format
Convert image to grayscale
Otsu's threshold to obtain a binary image
Invert image
Original image
Otsu's threshold
Invert image
Output from Pytesseract
122 Vitalité
Other image
200 Vitalité
Before inverting the image, it may be a good idea to perform morphological operations to smooth/filter the text. But for your images, the text does not necessary require additional smoothing
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('3.png',0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
As the comment said, it's about your text and background color. Tesseract is basically useless with light text on dark background, here is the few lines i apply to any text image before giving it to tesseract :
# convert color image to grayscale
grayscale_image = cv2.cvtColor(your_image, cv2.COLOR_BGR2GRAY)
# Otsu Tresholding method find perfect treshold, return an image with only black and white pixels
_, binary_image = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
# we just don't know if the text is in black and background in white or vice-versa
# so we count how many black pixels and white pixels there are
count_white = numpy.sum(binary > 0)
count_black = numpy.sum(binary == 0)
# if there are more black pixels than whites, then it's the background that is black so we invert the image's color
if count_black > count_white:
binary_image = 255 - binary_image
black_text_white_background_image = binary_image
Now you're sure to have black text on white background no matter wich colors was the original image, also Tesseract is (weirdly) the most efficient if the characters have an height of 35pixels, larger characters doesn't significantly reduce the accuracy, but just a few pixels shorter can make tesseract useless!

Categories

Resources