I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
The original image looks like this:
The resulting image looks like this:
I do this for a bunch of numbers with the same font in the same location. Here are the results:
For most of them, pytesseract still returns no text. It works for a few of the images, but not for all, even though the images look nearly identical.
Here is a snippet of the code I'm using:
def checkCurrentState():
    """image = pyautogui.screenshot()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    cv2.imwrite("screenshot.png", image)"""
    image = cv2.imread("screenshot.png")
    checkNumbers(image)

def checkNumbers(image):
    numbers = []
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    for i in storeLocations:
        cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
        number = pytesseract.image_to_string(cropped)
        numbers.append(number)
        print(number)
        cv2.imshow("Screenshot", cropped)
        cv2.waitKey(0)
To perform OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. Here's a simple approach using OpenCV and Pytesseract OCR.
To do this, we convert to grayscale, apply a slight Gaussian blur, then apply Otsu's threshold to obtain a binary image. From here, we can apply morphological operations to remove noise. We perform text extraction using the --psm 6 configuration option, which assumes a single uniform block of text. The Tesseract documentation lists the other page segmentation modes.
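For anyone wondering what Otsu's threshold actually computes, here is a rough NumPy sketch: it picks the gray level that maximizes the between-class variance of the histogram. The function name otsu_threshold and the synthetic test image are my own illustration, and OpenCV's implementation may differ in details such as tie-breaking.

```python
import numpy as np

def otsu_threshold(gray):
    """Return a threshold t for a uint8 image; pixels > t form the bright class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # weight of the dark class for each candidate t
    mu = np.cumsum(prob * levels)      # cumulative mean up to each candidate t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; candidates with an empty class keep a score of 0.
    sigma_b = np.zeros(256)
    valid = denom > 0
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Synthetic stand-in: dark background (50) with a bright square (200).
gray = np.full((20, 20), 50, dtype=np.uint8)
gray[5:15, 5:15] = 200
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
```

On a cleanly bimodal image like this one, any threshold between the two gray levels separates them, which is why Otsu's method works well on simple screenshots.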
Here's a visualization of each step:
Input image
Convert to grayscale -> Gaussian blur
Otsu's threshold -> Morph open to remove noise
Result from Pytesseract OCR
1100
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
print(data)
cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.waitKey()
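If the crops contain only a single line of digits, it may also be worth trying a different page segmentation mode and a character whitelist. Below is a small sketch that assembles the config string; build_tesseract_config is a hypothetical helper of mine, not part of pytesseract. The --psm and -c tessedit_char_whitelist options are standard Tesseract options, though whether the whitelist is honored depends on the Tesseract version and engine.

```python
def build_tesseract_config(psm=6, whitelist=None):
    """Assemble a Tesseract config string (hypothetical helper, not a pytesseract API)."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters.
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)

# psm 6: assume a single uniform block of text (as in the code above).
block_config = build_tesseract_config()
# psm 7: treat the image as a single text line, digits only.
digit_line_config = build_tesseract_config(psm=7, whitelist="0123456789")

# Usage, assuming `opening` is the preprocessed image from the code above:
# data = pytesseract.image_to_string(opening, lang='eng', config=digit_line_config)
```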
Related
I have been trying to enhance images so that I can use text recognition on them, but the images are extremely low quality and I am a beginner, so I haven't gotten good results.
Below is the original image:
Original Image:
First I resized the image
import cv2
import numpy as np
img = cv2.imread('test.jpg')
cv2.imshow('Original',img)
cv2.waitKey(0)
img = cv2.resize(img,(500,500),interpolation = cv2.INTER_AREA)
cv2.imshow('Resized',img)
cv2.waitKey(0)
Then I converted the image to grayscale
img_gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow('GRAY',img_gray)
cv2.waitKey(0)
I did some thresholding
ret, img_threshold = cv2.threshold(img_gray, 70, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('THRESHOLD', img_threshold)
cv2.waitKey(0)
and I used morphology to get a better image
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(6,6))
opening = cv2.morphologyEx(img_threshold, cv2.MORPH_OPEN, kernel, iterations = 2)
kernel = np.ones((9,9),np.uint8)
open_img = cv2.morphologyEx(opening, cv2.MORPH_OPEN, kernel, iterations = 3)
cv2.imshow('OPENING',open_img)
cv2.waitKey(0)
My final product is below:
Final Image 2
My question is: how can I remove the white chunks and the line crossing the numbers?
import cv2
import numpy as np
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Create rectangular structuring element and dilate
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
dilate = cv2.dilate(thresh, kernel, iterations=4)
cv2.imshow('dilate', dilate)
cv2.waitKey()
I am trying to mask the text elements in an image and return an image with just the remaining portions. I have applied thresholding and dilating, but how can I retain the background?
Image after thresholding and dilating
Original image:
Here is a simple approach:
Using the inverted dilated image cv2.bitwise_not(dilate), create a mask over the original image.
res = cv2.bitwise_and(image, image, mask=cv2.bitwise_not(dilate))
In the above image, all text regions and their boundaries are masked out.
Now replace those masked-out regions with the background of your original image. To do that, I first recorded the coordinates of the text regions in mask_ind, then replaced the pixel values in those regions with the background color of the original image, image[0,0].
mask_ind = (dilate == 255)
res[mask_ind] = image[0,0]
cv2.imshow('res', res)
cv2.waitKey()
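Taking the background color from image[0,0] assumes the top-left pixel is representative. A possibly more robust variant, sketched below with NumPy only, uses the median color of all unmasked pixels instead. The function fill_masked_with_background and the synthetic image are illustrative assumptions, and the sketch assumes at least one pixel lies outside the mask.

```python
import numpy as np

def fill_masked_with_background(image, mask):
    """Replace pixels where mask is True with the median color of the unmasked pixels."""
    result = image.copy()
    # image[~mask] has shape (N, 3); the median is taken per channel.
    background = np.median(image[~mask], axis=0).astype(image.dtype)
    result[mask] = background
    return result

# Synthetic stand-in: a light-gray BGR image with a dark "text" bar.
image = np.full((40, 60, 3), 200, dtype=np.uint8)
image[15:25, 10:50] = 20

# Stand-in for `dilate == 255`: the text bar plus a small margin.
mask_ind = np.zeros((40, 60), dtype=bool)
mask_ind[12:28, 7:53] = True

res = fill_masked_with_background(image, mask_ind)
```

This only works for a roughly uniform background; for textured backgrounds, inpainting would be the more natural tool.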
I have these images:
I want to remove the noise from the background (i.e., make the background white in the 1st and 3rd images and black in the 2nd) in all these images. I tried the method from "Remove noise from threshold image opencv python", but it didn't work. How can I do it?
P.S
This is the original image that I am trying to enhance.
You can apply an adaptive threshold to your original image in Python/OpenCV.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("writing.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 10)
# write results to disk
cv2.imwrite("writing_thresh.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
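To show why an adaptive threshold copes with uneven lighting, here is a rough NumPy sketch of the mean variant: each pixel is compared against the mean of its block_size x block_size neighborhood minus a constant. The function adaptive_mean_threshold and the synthetic gradient image are my own illustration; this is not OpenCV's exact implementation, and rounding may differ slightly.

```python
import numpy as np

def adaptive_mean_threshold(gray, block_size=21, c=10):
    """Return 255 where a pixel exceeds (local mean - c), else 0. block_size must be odd."""
    pad = block_size // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")  # replicate borders
    # Summed-area table with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    b = block_size
    window_sum = (integral[b:b + h, b:b + w] - integral[:h, b:b + w]
                  - integral[b:b + h, :w] + integral[:h, :w])
    local_mean = window_sum / (b * b)
    return np.where(gray > local_mean - c, 255, 0).astype(np.uint8)

# Synthetic stand-in: brightness ramps from 100 to 220, with a darker stroke on top.
background = np.tile(np.linspace(100, 220, 100), (60, 1))
gray = background.copy()
gray[28:32, 10:90] -= 60
gray = gray.astype(np.uint8)

binary = adaptive_mean_threshold(gray, block_size=21, c=10)
```

In this synthetic image the stroke on the bright side is lighter than the background on the dark side, so no single global threshold can separate them, while the local comparison can.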
This Mathematica code removes glare from an image:
img = Import["foo.png"]
Inpaint[img, Dilation[saturated, DiskMatrix[20]]]
as shown in the most upvoted answer here:
https://dsp.stackexchange.com/questions/1215/how-to-remove-a-glare-clipped-brightness-from-an-image
I want to use opencv instead of Mathematica to get the same result. How would I write equivalent code in opencv-python?
Here is how to do that in Python/OpenCV.
However, the OpenCV inpainting routines do not seem to work well, at least with my Python 3.7.5 and OpenCV 3.4.8.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread('apple.png')
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold grayscale image to extract glare
mask = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)[1]
# Optionally add some morphology close and open, if desired
#kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7,7))
#mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
#kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
#mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=1)
# use mask with input to do inpainting
result = cv2.inpaint(img, mask, 21, cv2.INPAINT_TELEA)
# write result to disk
cv2.imwrite("apple_mask.png", mask)
cv2.imwrite("apple_inpaint.png", result)
# display it
cv2.imshow("IMAGE", img)
cv2.imshow("GRAY", gray)
cv2.imshow("MASK", mask)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
Thresholded image:
Result:
I want to read a column of numbers from an attached image (png file).
My code is:
import cv2
import pytesseract
import os
img = cv2.imread(os.path.join(image_path, image_name), 0)
config = "-c tessedit_char_whitelist=01234567890.:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
pytesseract.image_to_string(img, config=config)
This code gives me the output string: 'n113\nun\n1.08'. As we can see, there are two problems:
It fails to recognize a decimal point in 1.13 (see attached picture).
It cannot read 1.11 at all (see attached picture). It just returns 'nun'.
What is a solution to these problems?
Bests
You need to preprocess the image. A simple approach is to resize the image, convert it to grayscale, and obtain a binary image using Otsu's threshold. From here we can apply a slight Gaussian blur, then invert the image so the desired text to extract is in white with the background in black. Here's the processed image, ready for OCR:
Result from OCR
1.13
1.11
1.08
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=400)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = 255 - cv2.GaussianBlur(thresh, (5,5), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
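Once the text is extracted, turning it into numbers could look something like the sketch below. parse_decimal_column is a hypothetical helper of mine, not a pytesseract function; it keeps the first decimal number on each line and skips lines without digits. It cannot recover a decimal point that the OCR step dropped.

```python
import re

def parse_decimal_column(text):
    """Return the first decimal number found on each line of OCR output."""
    values = []
    for line in text.splitlines():
        match = re.search(r"\d+(?:\.\d+)?", line)
        if match:
            values.append(float(match.group()))
    return values

# Example with the OCR result shown above.
values = parse_decimal_column("1.13\n1.11\n1.08\n")
```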