I am doing a document reader that parse all text inside it to a google spreadsheet, this script is supposed to save time in my work, the problem is that the binary image has a lot of noise (really small points around text) that confuses pytesseract. How could i remove this noise? the code i am using to binarize the image is :
import pytesseract
import cv2
import numpy as np
import os
import re
import argparse
#binarization of images
def binarize(img):
#convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#apply adaptive thresholding
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
#return thresholded image
return thresh
#construct argument parser
parser = argparse.ArgumentParser(description='Binarize image and parse text in image to string')
parser.add_argument('-i', '--image', help='path to image', required=True)
parser.add_argument('-o', '--output', help='path to output file', required=True)
args = parser.parse_args()
# load image
img = cv2.imread(args.image)
#binarization of image
thresh = binarize(img)
#show image
cv2.imshow('image', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
#save image
cv2.imwrite(args.output+'/imagen3.jpg', thresh)
and the result image i want to clean is :
and if i apply erosion this is the result:
which is worst than the other
EDIT: original image is :
You just need to increase your adaptive threshold arguments in Python/OpenCV.
Input:
import cv2
# read image
img = cv2.imread("petrol.png")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 25)
# write results to disk
cv2.imwrite("petrol_threshold.png", thresh)
# display it
cv2.imshow("THRESHOLD", thresh)
cv2.waitKey(0)
Results:
Related
I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
The original image looks like this:
The resulting image looks like this:
I do this for a bunch of numbers with the same font in the same location here are the results:
It still gives no text in the output. For a few of the images, it does, but not for all and the images look nearly identical.
Here is a snippet of the code I'm using:
def checkCurrentState():
"""image = pyautogui.screenshot()
image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
cv2.imwrite("screenshot.png", image)"""
image = cv2.imread("screenshot.png")
checkNumbers(image)
def checkNumbers(image):
numbers = []
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
for i in storeLocations:
cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
number = pytesseract.image_to_string(cropped)
numbers.append(number)
print(number)
cv2.imshow("Screenshot", cropped)
cv2.waitKey(0)
To perform OCR on an image, its important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. Here's a simple approach using OpenCV and Pytesseract OCR.
To do this, we convert to grayscale, apply a slight Gaussian blur, then Otsu's threshold to obtain a binary image. From here, we can apply morphological operations to remove noise. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Here's a visualization of each step:
Input image
Convert to grayscale -> Gaussian blur
Otsu's threshold -> Morph open to remove noise
Result from Pytesseract OCR
1100
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
print(data)
cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.waitKey()
I am following this tutorial on OpenCV, where the tutorial uses the following code:
import argparse
import imutils
import cv2
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image")
args = vars(ap.parse_args())
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 255, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow("Thresh", thresh)
cv2.waitKey()
Yet the image thresh displays as a blank window with a light grey background, where the tutorial is supposed to shoe a monochrome profile of an image that came with the tutorial source code.
I'm using the same code, and the same input image, yet the tutorial tells me to expect a monochrome image showing object outlines, while I only get a blank, grey image. What could be wrong here?
Your thresh parameter value should be different than maxval value.
thresh = cv2.threshold(src=gray,
thresh=127,
maxval=255,
type=cv2.THRESH_BINARY_INV)[1]
From the documentation
gray is your source image.
thresh is the threshold value
maxval is maximum value
typeis thresholding type
When you set both thresh and maxval the same value, you are saying:
I want to display my pixels above 255 values, but none of the pixels should exceed 255.
Therefore the result is a blank image.
One possible way is setting thresh to 127.
For instance:
Original Image Thresholded Image
import cv2
image = cv2.imread("fgtc7_256_256.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(src=gray,
thresh=127,
maxval=255,
type=cv2.THRESH_BINARY_INV)[1]
cv2.imshow("Thresh", thresh)
cv2.waitKey(0)
I have these images:
I want to remove the noise from the background(i.e make the background white in 1st and 3rd and black in 2nd) in all these images, I tried this method: Remove noise from threshold image opencv python but it didn't work, how can I do it?
P.S
This is the original image that I am trying to enhance.
You can use adaptive threshold on your original image in Python/OpenCV
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("writing.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 10)
# write results to disk
cv2.imwrite("writing_thresh.jpg", thresh)
# display it
cv2.imshow("thresh", thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
I want to read a column of number from an attached image (png file).
My code is
import cv2
import pytesseract
import os
img = cv2.imread(os.path.join(image_path, image_name), 0)
config= "-c
tessedit_char_whitelist=01234567890.:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
pytesseract.image_to_string(img, config=config)
This code gives me the output string: 'n113\nun\n1.08'. As we can see, there are two problems:
It fails to recognize a decimal point in 1.13 (see attached picture).
It totally cannot read 1.11 (see attached picture). It just returns 'nun'.
What is a solution to these problems?
Bests
You need to preprocess the image. A simple approach is to resize the image, convert to grayscale, and obtain a binary image using Otsu's threshold. From here we can apply a slight gaussian blur then invert the image so the desired text to extract is in white with the background in black. Here's the processed image ready for OCR
Result from OCR
1.13
1.11
1.08
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Resize, grayscale, Otsu's threshold
image = cv2.imread('1.png')
image = imutils.resize(image, width=400)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Blur and perform text extraction
thresh = 255 - cv2.GaussianBlur(thresh, (5,5), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()
For example this image returns Sieteary ear
While this image returns the correct answer
The only difference between the 2 images is 2 pixels in the height.
I have tried applying some threshold but didnt seem to help...
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = Image.open(path)
print(pytesseract.image_to_string(image, lang='eng'))
You can perform some preprocessing using OpenCV. The idea is to enlarge the image with imutils, obtain a binary image using Otsu's threshold, then add a slight Gaussian blur. For optimal detection, the image should be in the form where desired text to be detected is in black with the background in white. Here's the preprocessing results for the two images:
Before -> After
The output result from Pytesseract for both images are the same
BigBootyHunter2
Code
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image = cv2.imread('1.jpg')
image = imutils.resize(image, width=500)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
thresh = cv2.GaussianBlur(thresh, (3,3), 0)
data = pytesseract.image_to_string(thresh, lang='eng',config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.waitKey()