I want to read a column of number from an attached image (png file).
My code is
import cv2
import pytesseract
import os
img = cv2.imread(os.path.join(image_path, image_name), 0)
config= "-c
tessedit_char_whitelist=01234567890.:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
pytesseract.image_to_string(img, config=config)
This code gives me the output string: 'n113\nun\n1.08'. As we can see, there are two problems:
It fails to recognize a decimal point in 1.13 (see attached picture).
It totally cannot read 1.11 (see attached picture). It just returns 'nun'.
What is a solution to these problems?
Bests
You need to preprocess the image. A simple approach is to resize the image, convert to grayscale, and obtain a binary image using Otsu's threshold. From here we can apply a slight gaussian blur then invert the image so the desired text to extract is in white with the background in black. Here's the processed image ready for OCR
Result from OCR
1.13
1.11
1.08
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=400)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = 255 - cv2.GaussianBlur(thresh, (5,5), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
Related
I'm in the middle of developing a system that predict numbers from 7Seg LCD and I'm using for the matter tesseract OCR engine and it's wrapper for python pytesseract.
I'm taking pictures with a camera then cropping the Region of Interest and I found out that I have to enhance my Image quality to increase the accuracy of the OCR engine.
I used some Image processing techniques (gray scale --> Gaussian Blur --> threshold) and I got a quiet good image but tesseract still can't detect the numbers in the image.
I use the code:
image = cv2.imread('test.jpg')
image = image[50:200, 300:540]
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('result', image)
cv2.waitKey()
cv2.destroyAllWindows()
cv2.imwrite('enhanced.jpg', image)
tess_dir_config = r'--tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata"'
text = image_to_string(image, lang='letsgodigital', config=tess_dir_config)
print(text)
The Output Image:
The Input Image:
The engine usually have an empty output and if not it will not detect the number correctly.
Is there some sort of other image processing that I can use to get the potential of the Engine.
Note: I'am using letsgodigital weights
This works for me, if I improve the crop a little, and use page segmentation mode 7. (This mode does no page segmentation and assumes a single line of text.)
import cv2
import matplotlib.pyplot as plt
import pytesseract
image = cv2.imread('seven_seg_disp.jpg')
# Strip off top of meter and little percent symbol.
image = image[90:200, 300:520]
# plt.imshow(image)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (3,3), 0)
_, image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# plt.imshow(image)
tess_dir_config = r'--tessdata-dir "../.tesseract" --psm 7'
text = pytesseract.image_to_string(image, lang='letsgodigital', config=tess_dir_config)
text = text.strip()
print(text) # prints 75
Note: I changed the value of tessdata-dir because it's in a different place on my computer.
I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
The original image looks like this:
The resulting image looks like this:
I do this for a bunch of numbers with the same font in the same location here are the results:
It still gives no text in the output. For a few of the images, it does, but not for all and the images look nearly identical.
Here is a snippet of the code I'm using:
def checkCurrentState():
"""image = pyautogui.screenshot()
image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
cv2.imwrite("screenshot.png", image)"""
image = cv2.imread("screenshot.png")
checkNumbers(image)
def checkNumbers(image):
numbers = []
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
for i in storeLocations:
cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
number = pytesseract.image_to_string(cropped)
numbers.append(number)
print(number)
cv2.imshow("Screenshot", cropped)
cv2.waitKey(0)
To perform OCR on an image, its important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. Here's a simple approach using OpenCV and Pytesseract OCR.
To do this, we convert to grayscale, apply a slight Gaussian blur, then Otsu's threshold to obtain a binary image. From here, we can apply morphological operations to remove noise. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Here's a visualization of each step:
Input image
Convert to grayscale -> Gaussian blur
Otsu's threshold -> Morph open to remove noise
Result from Pytesseract OCR
1100
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
print(data)
cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.waitKey()
I want to recognize a image like this:
I am using the following config:
config="--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ,."
but when I try to convert that, I get the following:
1581
1
W
I think that the image shows really clearly what is written and think that there is a problem with pytesseract. Can you help?
Preprocessing the image to obtain a binary image before performing OCR seems to work. You could also try to resize the image so that more details would be seen
Results
158.1
1
IT
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale and Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# Perform text extraction
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
I have these images:
I want to remove the noise from the background(i.e make the background white in 1st and 3rd and black in 2nd) in all these images, I tried this method: Remove noise from threshold image opencv python but it didn't work, how can I do it?
P.S
This is the original image that I am trying to enhance.
You can use adaptive threshold on your original image in Python/OpenCV
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("writing.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 10)
# write results to disk
cv2.imwrite("writing_thresh.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
For example this image returns Sieteary ear
While this image returns the correct answer
The only difference between the 2 images is 2 pixels in the height.
I have tried applying some threshold but didnt seem to help...
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(path)
print(pytesseract.image_to_string(image, lang='eng'))
You can perform some preprocessing using OpenCV. The idea is to enlarge the image with imutils, obtain a binary image using Otsu's threshold, then add a slight Gaussian blur. For optimal detection, the image should be in the form where desired text to be detected is in black with the background in white. Here's the preprocessing results for the two images:
Before -> After
The output result from Pytesseract for both images are the same
BigBootyHunter2
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.jpg')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()