This image shows Pytesseract guessing the contents of a small window that displays '59' in white text.
The window is a live screen grab and not a static image.
[EDIT] I was advised to post the small image so people can experiment with it, so here it is:
Here is the code:
import numpy as np
import cv2
from PIL import ImageGrab
import pytesseract as loki

# Path to the Tesseract executable
loki.pytesseract.tesseract_cmd = r"C:\Users\Rahul And Anisha\AppData\Local\Tesseract-OCR\tesseract.exe"

while True:
    # Grab the screen region that shows the speed
    Odo = ImageGrab.grab(bbox=(1055, 505, 1170, 570))
    Speed = loki.image_to_string(Odo)
    # ImageGrab returns RGB, while OpenCV displays BGR
    Odo = cv2.cvtColor(np.array(Odo), cv2.COLOR_RGB2BGR)
    cv2.imshow('Speed', Odo)
    print(Speed)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
The problem is that no matter what config I set (I tried --psm 1 through --psm 13), Tesseract is unable to guess the number correctly.
What's the problem here?
Try adding some empty space (padding) around the text. The code below is for the smaller image.
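import pytesseract
# img is the small image as a NumPy array
rows, cols = img.shape[:2]
# Shift the content by 25 px onto a larger canvas with a gray border
M = np.float32([[1, 0, 25], [0, 1, 25]])
img = cv2.warpAffine(img, M, (cols * 2, rows * 2), borderValue=(127, 127, 127))
custom_oem_psm_config = r'--oem 3 --psm 3 -c tessedit_char_whitelist=1234567890'
print(pytesseract.image_to_string(img, config=custom_oem_psm_config))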
This should work, but try passing a binarized image instead, since Tesseract works best with binarized images. Preprocess the image before passing it to Tesseract: the psm modes only control how the page layout is analyzed and do not clean up the image.
Please correct me if I am wrong.
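As a rough sketch of the padding and binarization idea, using only NumPy: the helper name `pad_and_binarize` and the threshold of 127 are illustrative assumptions, and the input is assumed to be a 2-D grayscale uint8 array (for example `np.array(Odo.convert('L'))`). Setting `invert=True` turns white text on a dark background into dark text on white.

```python
import numpy as np

def pad_and_binarize(gray, pad=25, threshold=127, invert=False):
    """Binarize a 2-D uint8 grayscale array and surround it with a white border."""
    # Pixels brighter than the threshold become white (255), the rest black (0)
    binary = np.where(gray > threshold, 255, 0).astype(np.uint8)
    if invert:
        # Swap black and white, e.g. for light text on a dark background
        binary = 255 - binary
    # Add `pad` white pixels on every side
    return np.pad(binary, pad, mode="constant", constant_values=255)
```

The result can be handed to `image_to_string` (for instance via `Image.fromarray(...)`), but the right threshold still has to be found by looking at the actual capture.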
I am trying to import a Nikon '.NEF' file into OpenCV. '.NEF' is the file extension for a RAW file format for pictures captured by Nikon cameras. When I open the file in Preview on a Mac, I see that the resolution is 6000 by 4000, and the picture is extremely clear. However, when I import it into OpenCV, I see only 120 by 160 (by 3 for RGB channels) data points, and this leads to a big loss in resolution.
My understanding is that there are 120 by 160 pixels in the NumPy array storing the information about pixels for OpenCV. I tried using -1 for the IMREAD_UNCHANGED flag, but many pixels were left out and image quality was greatly affected.
For your reference, here is my code:
# first Jupyter block
img = cv2.imread('DSC_1051.NEF', -1)
img.shape
Performing img.shape returns (120, 160, 3).
# second Jupyter block
cv2.namedWindow("Resize", cv2.WINDOW_NORMAL)
cv2.resizeWindow("Resize", 1000, 700)
# Displaying the image
cv2.imshow("Resize", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Summary of problem:
Original image shape is (6000, 4000)
OpenCV imports (120, 160), leading to a big loss in resolution
Using the IMREAD_UNCHANGED flag did not lead to OpenCV importing all the pixels in the image, leading to a loss in quality of the image upon performing cv2.imshow().
My question: how can I use OpenCV to import the desired number of pixels? Is there a specific function that I can use? Am I missing an argument to be passed?
If you want to manipulate RAW images in Python without losing resolution, you need a specialized library such as rawpy:
import rawpy
with rawpy.imread('filename.NEF') as raw:
    # Copy so the array does not depend on the open file
    raw_image = raw.raw_image.copy()
You can check the rawpy documentation for more information
Notes:
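When this answer was written, installing rawpy required Python <= 3.7; newer rawpy releases support later Python versions, so check the current requirements.
If you explain a bit more about what you need to do with the image, I can help with that.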
Example 1: how to save .NEF as .jpg
Option A: rawpy + Pillow (you need to install Pillow too)
import rawpy
from PIL import Image
with rawpy.imread('filename.NEF') as raw:
    rgb = raw.postprocess(use_camera_wb=True)
Image.fromarray(rgb).save('image.jpg', quality=90, optimize=True)
Option B: rawpy + cv2
import rawpy
import cv2
with rawpy.imread('filename.NEF') as raw:
    rgb = raw.postprocess(use_camera_wb=True)
bgr = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)
cv2.imwrite("image.jpg", bgr)
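For reference, the `cv2.COLOR_RGB2BGR` step in Option B only reverses the channel order: `postprocess` returns RGB, while `cv2.imwrite` expects BGR. A NumPy-only sketch of the same swap (the helper name `rgb_to_bgr` is made up for illustration):

```python
import numpy as np

def rgb_to_bgr(rgb):
    """Return a contiguous copy of an (H, W, 3) array with the channel order reversed."""
    # Reversing the last axis turns (R, G, B) into (B, G, R)
    return np.ascontiguousarray(rgb[..., ::-1])
```

Skipping this step does not change the resolution, but red and blue end up swapped in the saved file.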
Quality comparison
I tested the code with a 19.2 MB .NEF image and got these results:
| Method | .jpg output size | Dimensions |
|---|---|---|
| PIL | 9 KB | 320 × 212 |
| cv2 | 14 KB | 320 × 212 |
| rawpy + PIL | 1.4 MB | 4284 × 2844 |
| rawpy + cv2 | 2.5 MB | 4284 × 2844 |
Example 2: show .NEF with cv2
import rawpy
import cv2
with rawpy.imread('filename.NEF') as raw:
    rgb = raw.postprocess(use_camera_wb=True)
bgr = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)
cv2.imshow('image', bgr)
cv2.waitKey(0)
cv2.destroyAllWindows()
I'm trying to extract text from some images. It worked for hundreds of other images, but in some cases it doesn't find any text. To optimize the images for the extraction phase, all images are converted to black and white: the backgrounds are white and everything else (icons, text, etc.) is black.
For example, it worked for the image below and successfully found the 'Sleep Timer' text. I'm not sure if it's relevant, but the size of that image is 320 × 351.
But for the image below it doesn't find any text at all. The image size for this one is 161 × 320.
Since I couldn't find the reason, I tried resizing the image, but that didn't help.
Here is my code:
from pytesseract import Output
import pytesseract
import cv2
image = cv2.imread('imagePath')
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
results = pytesseract.image_to_data(rgb, output_type=Output.DICT)
for i in range(0, len(results["text"])):
    text = results["text"][i]
    # conf may be a string or a float depending on the pytesseract version
    conf = int(float(results["conf"][i]))
    print("Confidence: {}".format(conf))
    print("Text: {}".format(text))
    print("")
It works for me. I tested it with:
import pytesseract
print(pytesseract.image_to_string('../images/grmgrm.jfif'))
results = pytesseract.image_to_data('../images/grmgrm.jfif', output_type=pytesseract.Output.DICT)
print(results)
Are you getting an error? Show us the error you are getting.
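When inspecting `image_to_data` output, it can help to drop the entries that are not words: layout elements such as blocks and lines come back with empty text and a confidence of -1. A small sketch, assuming the `Output.DICT` structure used above (the helper `confident_words` is hypothetical, and as far as I know the type of `conf` varies between pytesseract versions, hence the `float` conversion):

```python
def confident_words(results, min_conf=0.0):
    """Return (text, confidence) pairs for non-empty words at or above min_conf."""
    words = []
    for text, conf in zip(results["text"], results["conf"]):
        conf = float(conf)  # may arrive as int, float or str
        if text.strip() and conf >= min_conf:
            words.append((text, conf))
    return words

# Hand-made sample in the same shape as image_to_data(..., output_type=Output.DICT)
sample = {"text": ["", "Sleep", "Timer", " "], "conf": ["-1", "96", 91.5, -1]}
print(confident_words(sample))  # [('Sleep', 96.0), ('Timer', 91.5)]
```

If this returns an empty list for the failing image, Tesseract really found no words, which points at preprocessing or page segmentation rather than at the printing loop.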
I want to be able to recognize digits from images, so I have been playing around with Tesseract and Python. I looked into how to prepare the image and tried running Tesseract on it, and I must say I am pretty disappointed by how badly my digits are recognized. I prepared my images with OpenCV and thought I did a pretty good job (see examples below), but Tesseract makes a lot of errors on them. Am I expecting too much here? Looking at these example images, I think Tesseract should easily be able to identify the digits. I am wondering whether the accuracy is not there yet or whether my configuration is not optimal. Any help or direction would be greatly appreciated.
Things I tried to improve the digit recognition (nothing seemed to improve the results significantly):
Limit characters: config = "--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789"
Upscale images
Add a white border around the image to give the letters more space, as I have read that this improves the recognition process
Threshold the image to only have black and white pixels
Examples:
Image 1:
Tesseract recognized: 72
Image 2:
Tesseract recognized: 0
EDIT:
Image 3:
https://ibb.co/1qVtRYL
Tesseract recognized: 1723
I'm not sure what's going wrong for you. I downloaded those images and tesseract interprets them just fine for me. What version of tesseract are you using (I'm using 5.0)?
781429
209441
import pytesseract
import cv2
import numpy as np
from PIL import Image
# set path
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\ichu\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'
# load images
first = cv2.imread("first_text.png")
second = cv2.imread("second_text.png")
images = [first, second]
# convert to pillow
pimgs = []
for img in images:
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    pimgs.append(Image.fromarray(rgb))
# do text
for img in pimgs:
    text = pytesseract.image_to_string(img, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
    print(text.strip())  # drops the trailing newline and form feed
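Raw `image_to_string` output usually ends with a newline and a form feed character, and it may contain stray spaces. A small defensive sketch for digit-only results (the helper `clean_digits` is hypothetical, not part of pytesseract) keeps only the digit characters instead of trimming by position:

```python
def clean_digits(ocr_text):
    """Keep only ASCII digits from raw OCR output."""
    return "".join(ch for ch in ocr_text if ch in "0123456789")

print(clean_digits("781429\n\x0c"))  # 781429
```

This silently discards anything that is not a digit, so it is only appropriate when the image is known to contain nothing but a number.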
I'm using OpenCV in Python to watch a specific part of the screen with this code:
import numpy as np
from PIL import ImageGrab
import cv2
while True:
    printscreen_pil = ImageGrab.grab(bbox=(852, 530, 911, 575))
    # Convert the PIL image to a (height, width, 3) uint8 array
    printscreen = np.array(printscreen_pil.getdata(), dtype='uint8')\
        .reshape((printscreen_pil.size[1], printscreen_pil.size[0], 3))
    # ImageGrab returns RGB, so convert from RGB to grayscale
    cv2.imshow('window', cv2.cvtColor(printscreen, cv2.COLOR_RGB2GRAY))
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
I want to produce an output whenever this part of the screen matches a given image. I've been reading through a lot of different tutorials, but I'm stuck at the moment.
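One simple way to approach this, sketched under the assumption that the reference image was captured from the same screen region at the same size: compare the grabbed frame with the reference pixel by pixel and report a match when the average difference is small. The helper `region_matches` and the tolerance of 10 gray levels are illustrative choices, not tuned values; `cv2.matchTemplate` is a more robust alternative when the target can move inside the region.

```python
import numpy as np

def region_matches(region, reference, max_mean_diff=10.0):
    """True when two same-shaped images differ by at most max_mean_diff per pixel on average."""
    region = np.asarray(region, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if region.shape != reference.shape:
        return False
    # Mean absolute difference over all pixels
    return bool(np.mean(np.abs(region - reference)) <= max_mean_diff)
```

Inside the capture loop this could be called on the grayscale frame, with the reference loaded once beforehand, for example with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`.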
I have this Python code that converts text in a picture to a string. It works for some images with large characters, but not for the one I'm trying now, which contains only digits.
This is the picture:
This is my code:
import pytesseract
from PIL import Image
img = Image.open('img.png')
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
result = pytesseract.image_to_string(img)
print(result)
Why is it failing at recognising this specific image and how can I solve this problem?
I have two suggestions.
First, and by far the most important: in OCR, preprocessing the image is key to good results. In your case I suggest binarization. Your images look very clean, so you may not need it, but if recognition fails, try binarizing them:
import cv2
from PIL import Image
img = cv2.imread('gradient.png')
# If your image is not already grayscale :
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold = 180 # to be determined
_, img_binarized = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
pil_img = Image.fromarray(img_binarized)
And then try the ocr again with the binarized image.
Check if your image is in grayscale and uncomment if needed.
This is simple thresholding. Adaptive thresholding also exists, but it tends to produce noisier output and adds nothing in your case.
Binarized images will be much easier for Tesseract to handle. Tesseract already binarizes internally (https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality), but that step can go wrong, so doing your own preprocessing is often useful.
You can check whether the threshold value is right by looking at the images:
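import matplotlib.pyplot as plt
# Show the original and the binarized image side by side
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(img, cmap='gray')
ax2.imshow(img_binarized, cmap='gray')
plt.show()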
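If picking the threshold by eye is tedious, Otsu's method is a common way to choose it automatically from the histogram. Below is a NumPy-only sketch of that method, assuming a 2-D uint8 grayscale array; the function name `otsu_threshold` is made up here, and OpenCV offers a built-in equivalent through the `cv2.THRESH_OTSU` flag of `cv2.threshold`.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes the between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    weight_bg = np.cumsum(hist)             # pixels with value <= t
    weight_fg = hist.sum() - weight_bg      # pixels with value > t
    sum_bg = np.cumsum(hist * levels)
    sum_total = sum_bg[-1]
    valid = (weight_bg > 0) & (weight_fg > 0)
    mean_bg = np.zeros(256)
    mean_fg = np.zeros(256)
    mean_bg[valid] = sum_bg[valid] / weight_bg[valid]
    mean_fg[valid] = (sum_total - sum_bg[valid]) / weight_fg[valid]
    # Between-class variance (up to a constant factor) for every candidate t
    between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
    return int(np.argmax(between))
```

Pixels with `gray > otsu_threshold(gray)` are then treated as the bright class. On images with clearly separated dark and bright regions this usually lands between the two; on low-contrast captures a hand-picked value may still do better.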
Second, if the above still doesn't work (and I know this doesn't answer "why doesn't pytesseract work here"), I suggest you try tesserocr, a maintained Python wrapper for Tesseract.
You could try:
import tesserocr
text_from_ocr = tesserocr.image_to_text(pil_img)
Here is the doc for tesserocr from pypi : https://pypi.org/project/tesserocr/
And for opencv : https://pypi.org/project/opencv-python/
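As a side note, black and white are supposed to be treated symmetrically in Tesseract, so white digits on a black background should not be a problem. That said, the documentation for Tesseract 4.x and later recommends dark text on a light background, so inverting the image is worth a try if results are poor.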