I tried the following as a workaround:
from pytesseract import pytesseract
from PIL import Image
img = Image.open('img.jpg')
text = pytesseract.image_to_string(img, config='')
# Displaying the extracted text
print(text[:-1])
But this code does not extract all of the text.
Here is the output:
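One thing that may be worth checking when text is missed is Tesseract's page segmentation mode (`--psm`), since the default layout analysis can skip parts of an image. Below is a minimal sketch, assuming an `ocr` callable that wraps `pytesseract.image_to_string`; the helper name `try_psm_modes` and the chosen modes are illustrative and not from the original post.

```python
def try_psm_modes(image, ocr, modes=(3, 4, 6, 11)):
    """Run OCR once per page segmentation mode and return {mode: text}.

    `ocr` is any callable taking (image, config_string). The default modes are
    3 (automatic), 4 (single column), 6 (uniform block) and 11 (sparse text).
    """
    return {mode: ocr(image, "--psm {}".format(mode)) for mode in modes}
```

With pytesseract installed, the callable could be `lambda img, cfg: pytesseract.image_to_string(img, config=cfg)`. Comparing the returned texts side by side shows which mode recovers the most of the page.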
I need to convert a large number of jpg/png files to docx files and then to pdf. My main goal is to write the text from each image to a pdf file; if I need to edit any text manually, I can do that in Word and save it to the corresponding pdf file.
I tried using an API, but the extracted text did not match the image correctly.
My image files contain only text and nothing else.
I already have docx to pdf conversion code in Python.
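from docx2pdf import convert
# Renamed from `input` and `output`, which shadow Python built-ins
input_path = 'INPUT_FILE_NAME.docx'
output_path = 'OUTPUT_FILE_NAME.pdf'
# Write the PDF next to the input file
convert(input_path)
# Write the PDF to an explicit output path
convert(input_path, output_path)
# If "Output" is a folder, convert every .docx file inside it
convert("Output")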
Please suggest how to convert a png/jpg file to docx. Thanks.
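One possible route is to OCR each image with pytesseract and write the recognized text into a Word file with the python-docx package. The sketch below assumes both packages and the Tesseract binary are installed; the function names and the blank-line paragraph rule are illustrative choices, not something from the original post.

```python
def text_to_paragraphs(text):
    """Split OCR output into paragraphs on blank lines, joining wrapped lines."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def image_to_docx(image_path, docx_path):
    """OCR one image and write the recognized text to a .docx file."""
    # Imported here so text_to_paragraphs can be used without these packages.
    import pytesseract
    from PIL import Image
    from docx import Document  # provided by the python-docx package

    text = pytesseract.image_to_string(Image.open(image_path))
    document = Document()
    for paragraph in text_to_paragraphs(text):
        document.add_paragraph(paragraph)
    document.save(docx_path)
```

Calling `image_to_docx('page1.png', 'page1.docx')` (hypothetical file names) would produce a docx that can then be passed to the existing docx2pdf step.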
EDIT --------------
I've got this code running successfully and uploaded it to my GitHub repo.
from PIL import Image
from pytesseract import pytesseract
# Define path to tesseract.exe
path_to_tesseract = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
# Define path to image
path_to_image = 'texttoimage.png'
# Point tesseract_cmd to tesseract.exe
pytesseract.tesseract_cmd = path_to_tesseract
# Open image with PIL
img = Image.open(path_to_image)
# Extract text from image
text = pytesseract.image_to_string(img)
print(text)
I am trying to extract numbers from an image using pytesseract but it does not return any text. Here is my code.
from PIL import Image
import pytesseract
im = Image.open('time.png')
custom_oem_psm_config = r'--oem 3 --psm 11 -c tessedit_char_whitelist="0123456789"'  # -c preserve_interword_spaces=0'
text = pytesseract.image_to_string(im, config=custom_oem_psm_config)
print(text)
Here is my image
Here is the output
Pytesseract is not able to extract text from every image.
It works best on text in standard fonts, similar to those used in Microsoft Word, Notepad, etc.
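For digits in an unusual font, one workaround that may help is to enlarge the image, run OCR without a whitelist, and keep only the digit runs afterwards. This is a sketch under assumptions: the scale factor of 3 and `--psm 7` (single text line) are guesses to tune, and the function names are made up for illustration.

```python
import re


def extract_digit_groups(ocr_text):
    """Return the runs of digits found in raw OCR output, in order."""
    return re.findall(r"\d+", ocr_text)


def read_digits(image_path, scale=3):
    """OCR an upscaled grayscale copy of the image and keep only digit runs."""
    # Imported here so extract_digit_groups works without these packages.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path).convert("L")  # grayscale
    # Enlarge small text; the factor is an assumption to tune per image set.
    new_size = (image.width * scale, image.height * scale)
    image = image.resize(new_size, Image.BICUBIC)
    # --psm 7 treats the image as one line of text (an assumption about the input).
    text = pytesseract.image_to_string(image, config="--psm 7")
    return extract_digit_groups(text)
```

Filtering after OCR avoids relying on `tessedit_char_whitelist`, which, as far as I know, is not honored by every Tesseract version and engine combination.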
I'm trying to extract text from some images. It worked for hundreds of other images, but in some cases it doesn't find any text. To optimize the images for the extraction phase, all images are converted to black and white: the backgrounds are white and everything else, such as icons and text, is black.
For example, it worked for the image below and successfully found the 'Sleep Timer' text in it. I'm not sure if it's relevant, but the size of that image is 320 × 351.
But for the image below it doesn't find any text at all. The image size for this one is 161 × 320.
Since I couldn't find the reason, I tried resizing the image, but that didn't help.
Here is my code:
from pytesseract import Output
import pytesseract
import cv2
image = cv2.imread('imagePath')
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)
for i in range(0, len(results["text"])):
    text = results["text"][i]
    # conf may be an int, a float, or a numeric string depending on the version
    conf = int(float(results["conf"][i]))
    print("Confidence: {}".format(conf))
    print("Text: {}".format(text))
    print("")
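The dictionary returned by `image_to_data` also contains rows for blocks, paragraphs and lines, which typically carry empty text and a confidence of -1. A small filter makes it easier to see whether any real words were detected. This is a sketch; the helper name and the threshold of 60 are assumptions, not part of the question.

```python
def confident_words(results, min_conf=60):
    """Return (text, confidence) pairs from an image_to_data dict.

    Rows with empty text or a confidence below min_conf are skipped.
    """
    words = []
    for text, conf in zip(results["text"], results["conf"]):
        conf = int(float(conf))  # accepts int, float, or numeric string
        if conf >= min_conf and text.strip():
            words.append((text.strip(), conf))
    return words
```

If `confident_words(results)` is empty while `confident_words(results, min_conf=0)` is not, Tesseract did find something but with low confidence, which points towards preprocessing rather than a missing detection.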
It works for me. I tested it with:
import pytesseract
print(pytesseract.image_to_string('../images/grmgrm.jfif'))
results = pytesseract.image_to_data('../images/grmgrm.jfif', output_type=pytesseract.Output.DICT)
print(results)
Are you getting an error? Show us the error you are getting.
I am trying to read text from images that each contain a single character, but the characters are not read correctly. This is the type of image I have:
It reads this image as 't'.
Most of them are read incorrectly.
These are some of my images:
This is my code:
import pytesseract
from PIL import Image
# --psm 10 tells Tesseract to treat the image as a single character
text = pytesseract.image_to_string(Image.open('15.png'), lang='eng', config='--psm 10')
print(text)
This is the first time I am working with OCR. I have an image and want to extract data from it. My image looks like this:
I have 500 such images and need to record the parameters and their respective values. I would rather do this through code than manually.
I have tried the Python pytesseract and PIL libraries. They perform well if the image contains only simple text. This is what I tried:
from PIL import Image, ImageEnhance, ImageFilter
from pytesseract import image_to_string
im = Image.open("AHU.png")
# Reduce speckle noise with a median filter
im = im.filter(ImageFilter.MedianFilter())
# Double the contrast
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
# Convert to 1-bit black and white
im = im.convert('1')
im.save('temp2.jpg')
text = image_to_string(Image.open('temp2.jpg'))
print(text)
What should I do in this case, where there are several parameters? All my images are similar with respect to the positions of the values.
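Since the values sit at the same positions in every image, one approach that might work is to crop each value's region and OCR the crops individually. The sketch below assumes the pixel boxes have been measured by hand; the parameter names and coordinates are hypothetical placeholders, and `ocr_regions` is an illustrative helper name.

```python
def ocr_regions(image, regions, ocr):
    """Crop each named box from the image and run the OCR callable on it."""
    values = {}
    for name, box in regions.items():
        # box is (left, upper, right, lower) in pixels, as PIL's Image.crop expects
        values[name] = ocr(image.crop(box)).strip()
    return values


# Hypothetical parameter names and coordinates; measure the real ones once.
EXAMPLE_REGIONS = {
    "supply_temp": (120, 40, 220, 70),
    "return_temp": (120, 90, 220, 120),
}


def read_parameters(image_path, regions=EXAMPLE_REGIONS):
    """Open one image and OCR every configured region as a single text line."""
    # Imported here so ocr_regions can be exercised without these packages.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    return ocr_regions(
        image,
        regions,
        lambda crop: pytesseract.image_to_string(crop, config="--psm 7"),
    )
```

Looping `read_parameters` over the 500 files would give one dictionary per image, which can then be written to a CSV file or spreadsheet.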