I am trying to extract numbers from an image using pytesseract but it does not return any text. Here is my code.
from PIL import Image
import pytesseract
im = Image.open('time.png')
custom_oem_psm_config = r'--oem 3 --psm 11 -c tessedit_char_whitelist=0123456789'
text = pytesseract.image_to_string(im, config=custom_oem_psm_config)
print(text)
Here is my image
Here is the output
Pytesseract is not able to extract text from every image.
It works best on text set in ordinary fonts, like the ones used in Microsoft Word, Notepad, and similar programs.
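Before concluding that an image cannot be read, it may be worth trying several page segmentation modes, since the mode can decide whether Tesseract returns anything at all. The sketch below is only one way to do that. `first_nonempty_result` is a hypothetical helper (not part of pytesseract), `ocr` is assumed to be any callable with the same signature as `pytesseract.image_to_string`, and the list of modes is an arbitrary choice.

```python
def first_nonempty_result(image, ocr, psm_modes=(7, 8, 6, 11, 13)):
    """Try each page segmentation mode in turn.

    Returns (psm, text) for the first mode that yields non-empty text,
    or (None, "") if every mode comes back empty.
    `ocr` is assumed to accept (image, config=...) like
    pytesseract.image_to_string does.
    """
    for psm in psm_modes:
        # Same whitelist as in the question; only the psm value changes.
        config = f"--oem 3 --psm {psm} -c tessedit_char_whitelist=0123456789"
        text = ocr(image, config=config).strip()
        if text:
            return psm, text
    return None, ""
```

With the question's code this would be called as `first_nonempty_result(im, pytesseract.image_to_string)`. Passing the OCR function in as an argument keeps the helper easy to check without Tesseract installed.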
I'm trying to read the digits from this image:
Using pytesseract with these settings:
custom_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(img, config=custom_config)
This is the output:
((E ST7 [71aT6T2 ] THETOGOG5 15 [8)
Whitelisting only digits and changing the psm gives much better results. You also need to strip newlines and whitespace from the output. The code below does that.
import pytesseract
import re
from PIL import Image
# Open the image
im = Image.open("numbers.png")
# Define a configuration that whitelists only digit characters
custom_config = r'--oem 3 --psm 11 -c tessedit_char_whitelist=0123456789'
# Find the numbers in the image
numbers_string = pytesseract.image_to_string(im, config=custom_config)
# Remove every non-digit character (newlines, spaces, stray letters)
numbers = re.sub(r'\D', '', numbers_string)
# Print the output
print(numbers)
The result of the code on your image is: '31477423353'
Unfortunately, a few numbers are still missing. I tried some experimentation, and downloaded your image and erased the grid.
After removing the grid and executing the code again, pytesseract produces a perfect result: '314774628300558'
So you might think about how to remove the grid programmatically. There are alternatives to pytesseract, but whichever tool you use, you will get better output when the text is isolated in the image.
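One possible way to remove the grid programmatically is sketched below. It is a rough heuristic, not the method used to produce the result above (there the grid was erased by hand). It assumes a grayscale image with dark ink on a light background and grid lines that run straight across most of the image. `remove_grid_lines` and its thresholds are hypothetical names and values chosen for illustration.

```python
import numpy as np

def remove_grid_lines(gray, dark_threshold=128, line_fraction=0.8):
    """Blank out rows and columns that are mostly dark (likely grid lines).

    gray: 2-D uint8 array, dark ink on a light background (assumed).
    Returns a cleaned copy; the input array is left unchanged.
    """
    cleaned = gray.copy()
    dark = gray < dark_threshold
    # A row or column counts as a grid line when most of its pixels are dark.
    line_rows = dark.mean(axis=1) >= line_fraction
    line_cols = dark.mean(axis=0) >= line_fraction
    # Paint the detected lines white.
    cleaned[line_rows, :] = 255
    cleaned[:, line_cols] = 255
    return cleaned
```

To use it with the code above, convert the image with `np.array(im.convert("L"))`, clean it, and pass `Image.fromarray(cleaned)` to `pytesseract.image_to_string`. Digits that touch a grid line lose the pixels on that line, so the thresholds may need tuning for a given image.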
I tried this as a workaround:
from pytesseract import pytesseract
from PIL import Image
img = Image.open('img.jpg')
text = pytesseract.image_to_string(img, config='')
# Displaying the extracted text
print(text[:-1])
But this code does not extract all of the text.
Here is the output:
I'm trying to extract text from some images. It worked for hundreds of other images, but in some cases it doesn't find any text. To optimize the images for the extraction phase, all of them are converted to black and white: the backgrounds are white and everything else (icons, text, and so on) is black.
For example, it worked for the image below and successfully found the text 'Sleep Timer'. I'm not sure if it's relevant, but the size of that image is 320 × 351.
But for the image below it doesn't find any text at all. The size of this one is 161 × 320.
Since I couldn't find the reason, I tried resizing the image, but that didn't help.
Here is my code:
from pytesseract import Output
import pytesseract
import cv2
image = cv2.imread('imagePath')
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)
for i in range(len(results["text"])):
    text = results["text"][i]
    # conf may be a float or a numeric string, so go through float() first
    conf = int(float(results["conf"][i]))
    print("Confidence: {}".format(conf))
    print("Text: {}".format(text))
    print("")
It works for me. I tested it with:
import pytesseract
print(pytesseract.image_to_string('../images/grmgrm.jfif'))
results = pytesseract.image_to_data('../images/grmgrm.jfif', output_type=pytesseract.Output.DICT)
print(results)
Are you getting an error? Show us the error you are getting.
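When inspecting `image_to_data` output, it can help to drop the rows that carry no word at all, because Tesseract reports a confidence of -1 for layout entries (page, block, line), and those can make a result look emptier than it is. The sketch below assumes the dictionary layout returned with `output_type=Output.DICT`. `confident_words` is a hypothetical helper, not part of pytesseract.

```python
def confident_words(results, min_conf=0.0):
    """Return (text, confidence) pairs for the non-empty words.

    results: dict with parallel "text" and "conf" lists, as returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    words = []
    for text, conf in zip(results["text"], results["conf"]):
        # conf may arrive as an int, a float, or a numeric string.
        conf = float(conf)
        text = str(text).strip()
        # Skip layout rows (conf == -1), low-confidence hits, and blanks.
        if conf < min_conf or not text:
            continue
        words.append((text, conf))
    return words
```

If `confident_words(results)` returns an empty list, Tesseract really did find no words in the image, which narrows the problem down to the image itself rather than the loop that prints the results.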
Hello everyone, I'm trying to extract license plate numbers from Tunisian cars, so I decided to use Tesseract to extract the numbers and the word 'تونس'. I installed Tesseract-OCR v5.0.0 for Windows 10 and tried it on an image with Arabic words, but the words in the result are reversed, and I don't know how to fix this.
This is the code I used:
import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd=r"C:\Program Files\Tesseract-OCR\tesseract.exe"
text1 = cv2.imread('text.jpg')
text = pytesseract.image_to_string(text1, lang='ara')
print(text)
cv2.imshow("img",text1)
cv2.waitKey(0)
I need to extract digits from images (see the sample images). I tried pytesseract, but it does not work: it produces empty results. Below is the code I am using.
Code
import pytesseract
import cv2
img = cv2.imread('image_path')
digits = pytesseract.image_to_string(img)
print(digits)
Sample Images
I have a large pool of images, as shown above. Tesseract is not working on any of them.
Try adding the config option --psm 7, which tells Tesseract to treat the image as a single text line.
import pytesseract
import cv2
img = cv2.imread('image_path')
digits = pytesseract.image_to_string(img, config='--psm 7')
print(digits)
# '971101004900 1545'
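The output above contains two groups of digits separated by a space, and the raw string may also carry a trailing newline. A small post-processing step, sketched here under the assumption that each run of digits is one value, splits the string into clean groups. `digit_groups` is a hypothetical helper name.

```python
import re

def digit_groups(ocr_text):
    """Split raw OCR output into its runs of consecutive digits.

    Anything that is not a digit (spaces, newlines, stray symbols)
    acts as a separator and is discarded.
    """
    return re.findall(r"\d+", ocr_text)
```

For the string shown above, `digit_groups(digits)` gives `['971101004900', '1545']`. Joining the list with `"".join(...)` gives a single number string if the groups belong together.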