I want to crop a gray image by a mask. The gray image contains a digit that I want to process further for OCR.
The gray image in the following code snippet is only an excerpt of a bigger image; I took only the relevant part out of it (both for the image and the mask).
I thought that the cv2 function bitwise_and() should do the job, but I am apparently wrong (at least in the way I am doing it).
How can I get a result that keeps the parts of the gray image representing the digit 5 and sets everything else to 0? Is this really something cv2 should do, or is it better to use numpy for that? I have to keep the shades of gray, as binarizing would lead to errors in a later tesseract step.
So effectively it seems that everything at or below the threshold 133 has to be set to 0.
The result:
The code:
import pandas as pd
import io
import cv2
maskString = \
"""
0,0,0,0,0,0,0,0,0,0
0,0,2,2,2,2,2,2,0,0
0,0,2,2,2,2,2,2,0,0
0,0,2,2,0,0,0,0,0,0
0,0,2,2,0,0,0,0,0,0
0,2,2,2,2,2,0,0,0,0
0,2,2,2,2,2,2,2,0,0
0,2,2,0,0,0,2,2,0,0
0,0,0,0,0,0,2,2,2,0
0,2,2,0,0,0,2,2,0,0
0,2,2,0,0,0,2,2,0,0
0,0,2,2,2,2,2,2,0,0
0,0,0,2,2,2,0,0,0,0
0,0,0,0,0,0,0,0,0,0
"""
grayedString = \
"""
133,133,133,133,133,133,133,133,133,133
133,132,168,201,201,201,201,185,132,133
133,132,225,232,201,201,201,185,132,133
133,132,247,185,132,132,132,132,132,133
133,142,255,168,132,132,132,132,132,133
133,168,255,159,201,193,151,132,132,133
133,193,255,232,201,217,255,177,132,133
133,168,193,132,132,132,201,255,132,133
133,132,132,132,132,132,168,255,168,133
133,185,168,132,132,132,168,255,151,133
133,217,247,132,132,132,210,247,132,133
133,142,240,232,201,232,255,159,132,133
133,132,142,185,201,185,142,132,132,133
133,133,133,133,133,133,133,133,133,133"""
grayed = pd.read_csv(io.StringIO(grayedString), sep=',', header=None).values.astype('uint8')
mask = pd.read_csv(io.StringIO(maskString), sep=',', header=None).values.astype('uint8')
result = cv2.bitwise_and(grayed,mask)
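A possible fix (my own sketch, not part of the original post): bitwise_and combines the bit patterns of its inputs, so a mask value of 2 (binary 00000010) wipes out almost every bit of the gray values. Selecting through the mask with numpy, or passing a proper 0/255 mask via cv2's mask argument, keeps the shades of gray intact:
import numpy as np
# keep gray values where the mask is set, zero elsewhere
result = np.where(mask > 0, grayed, 0)
# equivalent in cv2: AND the image with itself, gated by a 0/255 mask
mask255 = (mask > 0).astype('uint8') * 255
result = cv2.bitwise_and(grayed, grayed, mask=mask255)
# or, following the threshold idea above, zero everything at or below 133
result = np.where(grayed > 133, grayed, 0)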
Currently I'm using the code below to get text from an image, and it works fine, but it doesn't work well with these two images; it seems like tesseract cannot scan these types of image. Please show me how to fix it.
https://i.ibb.co/zNkbhKG/Untitled1.jpg
https://i.ibb.co/XVbjc3s/Untitled3.jpg
import argparse
import os
import cv2
import pytesseract
from PIL import Image
from halo import Halo

def read_screen():
    spinner = Halo(text='Reading screen', spinner='bouncingBar')
    spinner.start()
    screenshot_file = "Screens/to_ocr.png"
    screen_grab(screenshot_file)  # screen_grab() is defined elsewhere in the project
    # prepare argparse
    ap = argparse.ArgumentParser(description='HQ_Bot')
    ap.add_argument("-i", "--image", required=False, default=screenshot_file, help="path to input image to be OCR'd")
    ap.add_argument("-p", "--preprocess", type=str, default="thresh", help="type of preprocessing to be done")
    args = vars(ap.parse_args())
    # load the image
    image = cv2.imread(args["image"])
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    if args["preprocess"] == "thresh":
        gray = cv2.threshold(gray, 177, 177,
                             cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    elif args["preprocess"] == "blur":
        gray = cv2.medianBlur(gray, 3)
    # store grayscale image as a temp file to apply OCR
    filename = "Screens/{}.png".format(os.getpid())
    cv2.imwrite(filename, gray)
    # load the image as a PIL/Pillow image, apply OCR, and then delete the temporary file
    pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
    #ENG
    #text = pytesseract.image_to_string(Image.open(filename))
    #VIET
    text = pytesseract.image_to_string(Image.open(filename), lang='vie')
    os.remove(filename)
    os.remove(screenshot_file)
    # show the output images
    '''cv2.imshow("Image", image)
    cv2.imshow("Output", gray)
    os.remove(screenshot_file)
    if cv2.waitKey(0):
        cv2.destroyAllWindows()
    print(text)
    '''
    spinner.succeed()
    spinner.stop()
    return text
You should try different psm modes instead of the default, like so:
target = pytesseract.image_to_string(im,config='--psm 4',lang='vie')
Excerpt from the docs:
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR.
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
So, for example, for Untitled3.jpg you could try --psm 4, and failing that you could try --psm 11 for both images.
Depending on your version of tesseract, you could also try different oem modes:
Use --oem 1 for LSTM, --oem 0 for legacy Tesseract. Please note that legacy Tesseract models are only included in traineddata files from the tessdata repo.
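For instance, the engine and segmentation modes can be combined in one config string (an illustrative sketch; the filename is a placeholder for a local copy of your image):
import pytesseract
from PIL import Image
# --oem 1 selects the LSTM engine, --psm 11 looks for sparse text
im = Image.open("Untitled1.jpg")
text = pytesseract.image_to_string(im, config='--oem 1 --psm 11', lang='vie')
print(text)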
EDIT
Also, as seen in your images, there are two languages; if you wish to use the lang parameter, you need to manually separate the image into two parts so as not to confuse the tesseract engine, and use a different lang value for each part (see the sketch below).
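A minimal sketch of that idea, assuming you can find the pixel row where the two language blocks split (the coordinate here is hypothetical; inspect your image for the real boundary):
import cv2
import pytesseract
img = cv2.imread("Untitled1.jpg")
split_row = 200  # hypothetical boundary between the two language blocks
top, bottom = img[:split_row], img[split_row:]
text_vie = pytesseract.image_to_string(top, lang='vie')
text_eng = pytesseract.image_to_string(bottom, lang='eng')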
EDIT 2
Below is a full working example with Untitled3. What I noticed was your improper use of thresholding: you should set maxval to something bigger than the value you are thresholding at. In my example I set thresh to 177 but maxval to 255, so everything above 177 becomes white (255) and everything at or below it becomes black (0). I didn't even have to do any further preprocessing.
import cv2
import pytesseract
image = cv2.imread("./Untitled3.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, image = cv2.threshold(image, 177, 255, cv2.THRESH_BINARY)
cv2.namedWindow("TEST")
cv2.imshow("TEST", image)
cv2.waitKey()
text = pytesseract.image_to_string(image, lang='eng')
print(text)
Output:
New York, New York
Salzburg, Austria
Hollywood, California
I'm trying to do Arabic OCR on the following ID, but I get a very noisy picture and can't extract information from it.
Here is my attempt
import cv2
import numpy as np
import pytesseract
image = cv2.imread(r'c:\ahmed\ahmed.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# denoise while keeping edges, then smooth
gray = cv2.bilateralFilter(gray, 11, 18, 18)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
kernel = np.ones((2, 2), np.uint8)
gray = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                             cv2.THRESH_BINARY, 11, 2)
#img_dilation = cv2.erode(gray, kernel, iterations=1)
#cv2.imshow("dilation", img_dilation)
cv2.imshow("gray", gray)
text = pytesseract.image_to_string(gray, lang='ara')
print(text)
with open(r"c:\ahmed\file.txt", "w", encoding="utf-8") as myfile:
    myfile.write(text)
cv2.waitKey(0)
The result and a sample of the input are shown as images in the original post.
The text on your ID is black, which makes the extraction process easy. All you need to do is threshold the dark pixels and you should be able to get the text out.
Here is a snippet of the code:
import cv2
import numpy as np
# load image in grayscale
image = cv2.imread('AVXjv.jpg',0)
# remove noise
dst = cv2.blur(image,(3,3))
# extract dark regions which corresponds to text
val, dst = cv2.threshold(dst,80,255,cv2.THRESH_BINARY_INV)
# morphological close to connect separated blobs
dst = cv2.dilate(dst,None)
dst = cv2.erode(dst,None)
cv2.imshow("dst",dst)
cv2.waitKey(0)
And here is the result:
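The snippet stops at the cleaned-up binary image; a possible continuation (my addition, not part of the original answer) would hand it to tesseract. Since THRESH_BINARY_INV leaves white text on black, and tesseract generally prefers dark text on a light background, invert first:
import pytesseract
text = pytesseract.image_to_string(cv2.bitwise_not(dst), lang='ara')
print(text)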
This is my output using ImageMagick TextCleaner script:
Script: textcleaner -g -e stretch -f 50 -o 30 -s 1 C:/Users/PC/Desktop/id.jpg C:/Users/PC/Desktop/out.png
Take a look here if you want to install and use the TextCleaner script on Windows... It's a tutorial I made as simple as possible after some research I did when I was in the same situation.
Now it should be very easy to detect the text and (though I'm not sure how simple) to recognize it.
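If you want to drive the script from Python rather than from the shell, a sketch along these lines should work (my assumptions: textcleaner is executable on your PATH under a Unix-like shell such as Cygwin or WSL, ImageMagick is installed, and the file names are placeholders):
import subprocess
# same options as the command above: gray mode, stretch enhancement,
# filter size 50, offset 30, sharpen 1
subprocess.run(["textcleaner", "-g", "-e", "stretch", "-f", "50", "-o", "30",
                "-s", "1", "id.jpg", "out.png"], check=True)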