I'm creating a bot for a video game and I have to read some information displayed on the screen. Since the information is always at the same position, I have no trouble taking a screenshot and cropping it to the right region.
90% of the time the recognition is perfect, but sometimes it returns something that seems totally random (see the example below).
I've tried converting the picture to black and white, with no success, and I've tried changing the pytesseract config (config = ("-l fra --oem 1 --psm 6")).
import cv2
import mss
import mss.tools
import pytesseract

def readScreenPart(x, y, w, h):
    monitor = {"top": y, "left": x, "width": w, "height": h}
    output = "monitor.png"
    # Grab the screen region and save it as a PNG
    with mss.mss() as sct:
        sct_img = sct.grab(monitor)
        mss.tools.to_png(sct_img.rgb, sct_img.size, output=output)
    # Reload it with OpenCV and convert it to grayscale
    img = cv2.imread(output)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cv2.imwrite("result.png", img)
    config = ("-l fra --oem 1 --psm 6")
    return pytesseract.image_to_string(img, config=config)
Example: this picture triggers the problem; it returns the string "IRPMV/LEIILK".
Another image
Now I don't know where the issue comes from, given that it's not just a single wrong character but a totally random result.
Thanks for your help
Preprocessing is an important step before passing the image to Pytesseract. Generally, you want the desired text in black on a white background. Currently, your foreground text is light (green) on a dark background instead. Here's a simple process to fix the format:
Convert image to grayscale
Otsu's threshold to obtain a binary image
Invert image
Original image
Otsu's threshold
Invert image
Output from Pytesseract
122 Vitalité
Other image
200 Vitalité
Before inverting the image, it may be a good idea to perform morphological operations to smooth/filter the text, but for your images the text does not necessarily require additional smoothing.
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('3.png',0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
result = 255 - thresh
data = pytesseract.image_to_string(result, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.waitKey()
As the comment said, it's about your text and background colors. Tesseract performs very poorly on light text on a dark background, so here are the few lines I apply to any text image before giving it to Tesseract:
import cv2
import numpy

def to_black_text_on_white(your_image):
    # convert the color image to grayscale
    grayscale_image = cv2.cvtColor(your_image, cv2.COLOR_BGR2GRAY)
    # Otsu's method finds the threshold automatically and returns an image with only black and white pixels
    _, binary_image = cv2.threshold(grayscale_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # we don't know whether the text is black on white or vice versa,
    # so we count how many black and white pixels there are
    count_white = numpy.sum(binary_image > 0)
    count_black = numpy.sum(binary_image == 0)
    # if there are more black pixels than white ones, the background is black, so we invert the image
    if count_black > count_white:
        binary_image = 255 - binary_image
    black_text_white_background_image = binary_image
    return black_text_white_background_image
Now you're sure to have black text on a white background no matter what colors the original image had. Also, in my experience Tesseract is (oddly) most accurate when the characters are about 35 pixels tall: larger characters don't significantly reduce the accuracy, but just a few pixels shorter can make Tesseract useless!
I have the following image:
Initial Image
I am using the following code to rotate the image:
import cv2
from skimage.transform import rotate
image = cv2.imread('122.png')
rotated = rotate(image, 34, cval=1, resize=True)
Once I execute this code, I receive the following image:
Rotated Image
To eliminate the blur on the image, I use the following code to set a threshold. Anything that is not white is turned to black (so the gray spots turn black). The code for that is as follows:
import matplotlib.pyplot as plt
ret, thresh_hold = cv2.threshold(rotated, 0, 100, cv2.THRESH_BINARY)
plt.imshow(thresh_hold)
Instead of getting a nice clear picture, I receive the following:
Choppy Image
Does anyone know what I can do to improve the image quality, or adjust the threshold to create a clearer image?
I attempted to adjust the threshold to different values, but this changed the image to all black or all white.
One way to approach that is to simply antialias the image in Python/OpenCV.
To do that, simply convert to grayscale, blur the image, then apply an intensity stretch.
Adjust the blur sigma to change the antialiasing.
Input:
import cv2
import numpy as np
import skimage.exposure
# load image
img = cv2.imread('122.png')
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# blur the grayscale image
blur = cv2.GaussianBlur(gray, (0,0), sigmaX=2, sigmaY=2, borderType = cv2.BORDER_DEFAULT)
# stretch so that 255 -> 255 and 127.5 -> 0
result = skimage.exposure.rescale_intensity(blur, in_range=(127.5,255), out_range=(0,255)).astype(np.uint8)
# save output
cv2.imwrite('122_antialiased.png', result)
# Display various images to see the steps
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
I'm in the middle of developing a system that reads numbers from a 7-segment LCD, and I'm using the Tesseract OCR engine and its Python wrapper, pytesseract.
I take pictures with a camera and then crop the region of interest, and I found that I have to enhance the image quality to increase the OCR engine's accuracy.
I used some image processing techniques (grayscale --> Gaussian blur --> threshold) and got quite a good image, but Tesseract still can't detect the numbers in it.
Here is my code:
import cv2
from pytesseract import image_to_string

image = cv2.imread('test.jpg')
image = image[50:200, 300:540]
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('result', image)
cv2.waitKey()
cv2.destroyAllWindows()
cv2.imwrite('enhanced.jpg', image)
tess_dir_config = r'--tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"'
text = image_to_string(image, lang='letsgodigital', config=tess_dir_config)
print(text)
The Output Image:
The Input Image:
The engine usually returns empty output, and when it doesn't, it does not detect the number correctly.
Is there some other image processing I can use to get the most out of the engine?
Note: I'm using the letsgodigital weights.
This works for me, if I improve the crop a little, and use page segmentation mode 7. (This mode does no page segmentation and assumes a single line of text.)
import cv2
import matplotlib.pyplot as plt
import pytesseract
image = cv2.imread('seven_seg_disp.jpg')
# Strip off top of meter and little percent symbol.
image = image[90:200, 300:520]
# plt.imshow(image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# plt.imshow(image)
tess_dir_config = r'--tessdata-dir "../.tesseract" --psm 7'
text = pytesseract.image_to_string(image, lang='letsgodigital', config=tess_dir_config)
text = text.strip()
print(text) # prints 75
Note: I changed the value of tessdata-dir because it's in a different place on my computer.
I have the following function to preprocess an image for Tesseract OCR. In most of the image the text is white, but there can also be green, red and purple text, and I want to be able to read all of it. However, when I apply the thresholding during preprocessing, the red text disappears. Is there a way to avoid this? It doesn't happen with green text unless it's dark green.
import cv2
import numpy

def pre_process_img(img):
    open_cv_image = numpy.array(img)
    # Convert RGB to BGR
    open_cv_image = open_cv_image[:, :, ::-1].copy()
    # Convert the BGR array to grayscale
    img_gray = cv2.cvtColor(open_cv_image, cv2.COLOR_BGR2GRAY)
    # Upscale 3x so the characters are larger
    img_gray = cv2.resize(img_gray, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    # Invert so the light text becomes dark
    img_inverted = 255 - img_gray
    # Fixed global threshold at 127
    ret, thresh1 = cv2.threshold(img_inverted, 127, 255, cv2.THRESH_BINARY)
    # [DEBUG] show pre processed image
    # cv2.imshow("inverted", thresh1)
    # cv2.waitKey(0)
    return thresh1
In this function, img is a PIL.Image.Image; I convert it to an OpenCV image and apply preprocessing (converting to grayscale, resizing, inverting and binary thresholding). With psm 11, Tesseract has given good enough results.
By the way, if you have any suggestions to improve my pre_process_img function, I'm open to them. I'm new to OpenCV and just stuck with whatever gave me the best results out of everything I tried.
Here is my image:
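Convert from BGR to HSV colorspace in Python/OpenCV, then simply threshold the value channel. The value (V) channel is the maximum of the B, G and R components, so saturated red, green or purple text is just as bright there as white text, whereas grayscale conversion weights red and blue much less than green. Here is the value channel; you will see that all the text is white (in this case).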
I want to automatically adjust the brightness and contrast of color images taken with a phone under different lighting conditions. Please help, I am new to OpenCV.
Source:
Input Image
Result:
result
What I am looking for is more of a localized transformation. In essence, I want the shadow to become as light as possible (completely gone, if possible), the darker pixels of the image to get darker and more contrasted, and the light pixels to get whiter, but not to the point of overexposure.
I have tried CLAHE, histogram equalization, binary thresholding, adaptive thresholding, etc., but nothing has worked.
My initial thought is that I need to neutralize the highlights and bring darker pixels closer to the average value while keeping the text and lines as dark as possible, and then maybe apply a contrast filter. But I am unable to get that result; please help.
Here is one way to do that in Python/OpenCV.
Read the input
Increase contrast
Convert original to grayscale
Adaptive threshold
Use the thresholded image to make the background white on the contrast increased image
Save results
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("math_diagram.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 15)
# make background of input white where thresh is white
result = img.copy()
result[thresh==255] = (255,255,255)
# write results to disk
cv2.imwrite("math_diagram_threshold.jpg", thresh)
cv2.imwrite("math_diagram_processed.jpg", result)
# display it
cv2.imshow("THRESHOLD", thresh)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
Threshold image:
Result:
You can use any local binarization method. OpenCV has one such method, Wolf-Jolion local binarization, which can be applied to the input image. Below is an example code snippet:
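import cv2
image = cv2.imread('input.jpg')
# use the HSV value channel as the grayscale input
gray = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)[:,:,2]
# niBlackThreshold returns the binarized image itself (not a threshold map);
# with THRESH_BINARY, pixels brighter than the local Wolf-Jolion threshold become 255
grayb = cv2.ximgproc.niBlackThreshold(gray, maxValue=255, type=cv2.THRESH_BINARY, blockSize=81, k=0.1, binarizationMethod=cv2.ximgproc.BINARIZATION_WOLF)
cv2.imshow("Binary", grayb)
cv2.waitKey(0)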
The output of the code above is shown below. Note that the ximgproc module is one of OpenCV's contrib modules, so you need to install the OpenCV contrib package (opencv-contrib-python on pip).
I'm currently trying to recognize numbers in small screenshots, but I have found the accuracy to be quite poor. I've been using OpenCV: the image is captured in RGB and converted to grayscale, then thresholded with a global value (I found adaptive thresholding didn't work as well).
Here is an example grayscale image of one of the numbers, followed by the same image after thresholding (the numbers can range from 1 to 99). Note that the initial screenshot is quite small and is therefore enlarged.
Any suggestions on how to improve accuracy, using OpenCV or a different system altogether, are much appreciated. Some code is included below; the function is passed an RGB screenshot of the number.
import cv2
import PIL.Image
import pytesseract

def getNumber(image):
    image = cv2.resize(image, (0, 0), fx=3, fy=3)
    img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh, image_bin = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)
    image_final = PIL.Image.fromarray(image_bin)
    txt = pytesseract.image_to_string(
        image_final, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    return txt
Here's what I could improve. Otsu's threshold separates text from background more reliably than an arbitrary fixed value. Tesseract works better with black text on a white background, and I also added padding, as Tesseract struggles to recognize characters that are too close to the border.
This is the final image [final_image][1], and pytesseract manages to read "46".
import cv2
import numpy
import pytesseract

def getNumber(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding automatically finds the best threshold value
    _, binary_image = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # invert the image if the text is white and the background is black
    count_white = numpy.sum(binary_image > 0)
    count_black = numpy.sum(binary_image == 0)
    if count_black > count_white:
        binary_image = 255 - binary_image
    # pad the binarized image with a white border
    final_image = cv2.copyMakeBorder(binary_image, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=255)
    txt = pytesseract.image_to_string(
        final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    return txt
The function is called as:
>>> getNumber(cv2.imread(img_path))
EDIT: note that you do not need this line:
image_final = PIL.Image.fromarray(image_bin)
because you can pass pytesseract an image in NumPy array format (which cv2 uses). Also, Tesseract's accuracy mainly drops for characters shorter than 35 pixels (it drops for bigger ones too; a 35 px height is actually optimal), so I did not resize it.
[1]: https://i.stack.imgur.com/OaJgQ.png