I've this python code which I use to convert a text written in a picture to a string, it does work for certain images which have large characters, but not for the one I'm trying right now which contains only digits.
This is the picture:
This is my code:
import pytesseract
from PIL import Image
img = Image.open('img.png')
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
result = pytesseract.image_to_string(img)
print (result)
Why is it failing at recognising this specific image and how can I solve this problem?
I have two suggestions.
First, and this is by far the most important, in OCR preprocessing images is key to obtaining good results. In your case I suggest binarization. Your images look extremely good so you shouldn't have any problem but if you do, then maybe you should try to binarize your images:
import cv2
from PIL import Image
img = cv2.imread('gradient.png')
# If your image is not already grayscale :
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold = 180 # to be determined
_, img_binarized = cv2.threshold(img, threshold, 255, cv2.THRESH_BINARY)
pil_img = Image.fromarray(img_binarized)
And then try the ocr again with the binarized image.
Check if your image is in grayscale and uncomment if needed.
This is simple thresholding. Adaptive thresholding also exists but it is noisy and does not bring anything in your case.
Binarized images will be much easier for Tesseract to handle. This is already done internally (https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality) but sometimes things can be messed up and very often it's useful to do your own preprocessing.
You can check if the threshold value is right by looking at the images :
import matplotlib.pyplot as plt
plt.imshow(img, cmap='gray')
plt.imshow(img_binarized, cmap='gray')
Second, if what I said above still doesn't work, I know this doesn't answer "why doesn't pytesseract work here" but I suggest you try out tesserocr. It is a maintained python wrapper for Tesseract.
You could try:
import tesserocr
text_from_ocr = tesserocr.image_to_text(pil_img)
Here is the doc for tesserocr from pypi : https://pypi.org/project/tesserocr/
And for opencv : https://pypi.org/project/opencv-python/
As a side-note, black and white is treated symetrically in Tesseract so having white digits on a black background is not a problem.
Related
I have two questions.
I am working with openCv and python and I am trying to have an image's contours. I am succesfull at that but when I try to se what is the difference between when I use cv2.drawContorus() functions and directly edit image with cv2.findContours() without sending a copy of original image as the source parameter. I have tried on some images but I couldnt see anything even happenning.
I am trying to get the contours of a square I created with paint square tool. But when I try with cv2.CHAIN_APPROX_SIMPLE method, it gives me coordinates of 6 points which none of the combinations from them is suitable for my square. Why does it do like that?
Can someone explain?
Here is my code for both problems:
import cv2
import numpy as np
image = cv2.imread(r"C:\Users\fazil\Desktop\12.png")
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
gray = cv2.Canny(gray,75,200)
gray = cv2.threshold(gray,127,255,cv2.THRESH_BINARY_INV)[1]
cv2.imshow("s",gray)
contours, hiearchy = cv2.findContours(gray,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
print(contours[1])
cv2.drawContours(image,contours,1,(45,67,89),5)
cv2.imshow("k",gray)
cv2.imshow("j",image)
cv2.waitKey(0)
cv2.destroyAllWindows()
I want to extract car images without using Mask RCNN. I tried a couple of methods but couldn't decide on how to proceed with any of them. I need recommendation on which method would be best and how to go through with it.
Method 1 - Using XML files and haar cascade classifier
I was thinking of using xml files to detection and crop car images. The problems I faced were:
They only detect car in square shapes. I needed car images cropped. So ultimately I ended up with better images of cropped cars. This didn't solve my problem.
The cropped image didn't detect car as a whole but small parts of it. Maybe due to XML file's config.
My code:
!wget https://raw.githubusercontent.com/shaanhk/New-GithubTest/master/cars.xml
import numpy as py
import cv2
car_cascade=cv2.CascadeClassifier('cars.xml')
img = cv2.imread('im1.jpg')
cars = car_cascade.detectMultiScale(img, 1.1, 1)
for (x,y,w,h) in cars:
cv2.rectangle(img,(x,y),(x+w,y+h),(0,0,255),2)
cv2.imshow('image',img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Resulting image:
Method 2 - Using Canny Edge Detection
I tried to perform canny edge detection for car. It worked to some extent that I managed to reduce edges to mostly car object. But I don't know how to proceed from there.
My code:
import cv2
import numpy as np
image= cv2.imread('im1.jpg')
imagecopy= np.copy(image)
grayimage= cv2.cvtColor(imagecopy, cv2.COLOR_RGB2GRAY)
canny= cv2.Canny(grayimage, 300,150)
cv2.imshow('Highway Edge Detection Image', canny)
cv2.waitKey(0)
cv2.destroyAllWindows()
Resulting Image:
Method 3 - Extract car image using color gradients
On googling I found a method using HSV transformation and then creating a custom mask to extract cars. But I don't know much about this method and have no idea how to go about it. I used the code provided and am posting it below.
Code:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import cv2
%matplotlib inline
image = mpimg.imread('im1.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
# HSV channels
h = hsv[:,:,0]
s = hsv[:,:,1]
v = hsv[:,:,2]
background_hue = h[10,10]
lower_hue = np.array([background_hue-10,0,0])
upper_hue = np.array([background_hue+10,255,255])
mask = cv2.inRange(hsv, lower_hue, upper_hue)
# Mask the image to let the car show through
masked_image = np.copy(image)
masked_image[mask != 0] = [0, 0, 0]
cv2.imwrite('mask.jpg',masked_image)
# Display it!
plt.imshow(masked_image)
Image:
I'd like to mention, I'm a complete beginner in Computer Vision and am trying to learn by doing some small stuff like these. My code is probably very flawed and hopefully I can work on it on the way. Please feel absolutely free to mention any other method (except Mask RCNN) or any problems with code.
I am trying to recognize the text in a captcha and it is not possible for me. I am using python3, openCv and tesseract.
The simplified code is:
import cv2
from pytesseract import *
img_path = "path"
img = cv2.imread(img_path)
img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
pytesseract.image_to_string(img)
I think I should remove the color lines first, then leave the text alone, and maybe change the brightness and contrast. What filter could apply?
These are some images to recognize.
For recognising captcha text using pytesseract-ocr, you can do the following..
Prepare custom train_set to training your tesseract instance to recognise a specific font [Optional]
The captcha images need some pre-processing(such as * Apply Black & White Filter > Scale(up) > Blur > Morphological Transformation + Adaptive threshold*)to enhance the text part and reduce the noises/lines.
For removing lines: In the sample images only the text can be seen in black color and there is no black line, so you can simply convert the each non-black pixel to white by using PIL or OpenCV, you can even utilize some specific algo like Hough Line Transform to detect and remove lines.
You can learn about all these filters and algos from the official documentation and tutorial on OpenCV website.
I'm trying to read different cropped images from a big file and I manage to read most of them but there are some of them which return an empty string when I try to read them with tesseract.
The code is just this line:
pytesseract.image_to_string(cv2.imread("img.png"), lang="eng")
Is there anything I can try to be able to read these kind of images?
Thanks in advance
Edit:
Thresholding the image before passing it to pytesseract increases the accuracy.
import cv2
import numpy as np
# Grayscale image
img = Image.open('num.png').convert('L')
ret,img = cv2.threshold(np.array(img), 125, 255, cv2.THRESH_BINARY)
# Older versions of pytesseract need a pillow image
# Convert back if needed
img = Image.fromarray(img.astype(np.uint8))
print(pytesseract.image_to_string(img))
This printed out
5.78 / C02
Edit:
Doing just thresholding on the second image returns 11.1. Another step that can help is to set the page segmentation mode to "Treat the image as a single text line." with the config --psm 7. Doing this on the second image returns 11.1 "202 ', with the quotation marks coming from the partial text at the top. To ignore those, you can also set what characters to search for with a whitelist by the config -c tessedit_char_whitelist=0123456789.%. Everything together:
pytesseract.image_to_string(img, config='--psm 7 -c tessedit_char_whitelist=0123456789.%')
This returns 11.1 202. Clearly pytesseract is having a hard time with that percent symbol, which I'm not sure how to improve on that with image processing or config changes.
I tried almost all filters in PIL, but failed.
Is there any function in numpy of scipy to remove the noise?
Like Bwareaopen() in Matlab()?
e.g:
PS: If there is a way to fill the letters into black, I will be grateful
Numpy/Scipy can do morphological operations just as well as Matlab can.
See scipy.ndimage.morphology, containing, among other things, binary_opening(), the equivalent of Matlab's bwareaopen().
Numpy/Scipy solution: scipy.ndimage.morphology.binary_opening. More powerful solution: use scikits-image.
from skimage import morphology
cleaned = morphology.remove_small_objects(YOUR_IMAGE, min_size=64, connectivity=2)
See http://scikit-image.org/docs/0.9.x/api/skimage.morphology.html#remove-small-objects
I don't think this is what you want, but this works (uses Opencv (which uses Numpy)):
import cv2
# load image
fname = 'Myimage.jpg'
im = cv2.imread(fname,cv2.COLOR_RGB2GRAY)
# blur image
im = cv2.blur(im,(4,4))
# apply a threshold
im = cv2.threshold(im, 175 , 250, cv2.THRESH_BINARY)
im = im[1]
# show image
cv2.imshow('',im)
cv2.waitKey(0)
Output ( image in a window ):
You can save the image using cv2.imwrite