How to improve pytesseract function for capctha decoding?

How to improve pytesseract function for capctha decoding? - python

I want to extract the numbers from an image in python. In order to do that, I have chosen pytesseract. When I tried extracting the text from the image, the results weren't satisfactory. I also went through the following code and implemented all the techniques listed with other answers. Yet, it doesn't seem to perform well.
sample images:
and my code is:
import cv2 as cv
import pytesseract
from PIL import Image
import matplotlib.pyplot as plt
pytesseract.pytesseract.tesseract_cmd = r"E:\tesseract\tesseract.exe"
def recognize_text(image):
# edge preserving filter denoising 10,150
dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
plt.imshow(dst)
# grayscale image
gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
# binarization
ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
# morphological manipulation corrosion expansion
erode = cv.erode(binary, None, iterations=2)
dilate = cv.dilate(erode, None, iterations=1)
# logical operation makes the background white the font is black for easy recognition.
cv.bitwise_not(dilate, dilate)
# identify
test_message = Image.fromarray(dilate)
custom_config = r'digits'
text = pytesseract.image_to_string(test_message, config=custom_config)
print(f' recognition result ：{text}')
src = cv.imread(r'roughh/testt/f.jpg')
recognize_text(src)
My problem with my code is that it only works with the images of '396156' & '436359' and not with any other images. Please suggest some improvement in my code.

I don't know if you've solved your problem, but this kind of images must be pre-processed using this solution. You will need to tweak the parameters. I worked with a similar dataset and aforementioned solution works well. Let me know your results.
Editing the answer
I'm improving my answer, to not show just link for reference.
The key for this kind of problem is image pre-processing. The main idea is to clean up the input image conserving just the characters.
Given an input image as
We want an output image as
The follow code contains the image pre-processing that I used based on the solution:
# loading image and checking the height and width
img = cv.imread('PNgCd.jpg')
(h, w) = img.shape[:2]
print("Height: {} Width:{}".format(h,w))
cv.imshow('Image', img)
cv.waitKey(0)
cv.destroyAllWindows()
#converting into RBG and resizing the image
img = cv.cvtColor(img, cv.COLOR_BGR2RGB) # converting into RGB order
img = imutils.resize(img, width=450) #resizing the width into 500 pxls
cv.imshow('Image', img)
cv.waitKey(0)
cv.destroyAllWindows()
#gray scale
gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)
cv.imshow('Gray', gray)
cv.waitKey(0)
cv.destroyAllWindows()
# image thresholdinf with Otsu method and inverse operation
thresh = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)[1]
cv.imshow('Thresh Otsu', thresh)
cv.waitKey(0)
cv.destroyAllWindows()
#distance tramsform
dist = cv.distanceTransform(thresh, cv.DIST_L2, 5)
dist = cv.normalize(dist, dist, 0, 1.0, cv.NORM_MINMAX)
dist = (dist*255).astype('uint8')
cv.imshow('dist', dist)
cv.waitKey(0)
cv.destroyAllWindows()
#image thresholding with binary operation
dist = cv.threshold(dist, 0, 255, cv.THRESH_BINARY |
cv.THRESH_OTSU)[1]
cv.imshow('thresh binary', dist)
cv.waitKey(0)
cv.destroyAllWindows()
#morphological operation
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
opening = cv.morphologyEx(dist, cv.MORPH_OPEN, kernel)
cv.imshow('Morphological - Opening', opening)
cv.waitKey(0)
cv.destroyAllWindows()
#dilation or erode (it's depend on your image)
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
dilation = cv.dilate(opening, kernel, iterations = 1)
cv.imshow('Dilation', dilation)
cv.waitKey(0)
cv.destroyAllWindows()
# found contours and filtering them
cnts = cv.findContours(dilation.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
nums = []
for c in cnts:
(x, y, w, h) = cv.boundingRect(c)
if w >= 5 and h > 15:
nums.append(c)
len(nums)
#Convex hull and image masking
nums = np.vstack([nums[i] for i in range(0, len(nums))])
hull = cv.convexHull(nums)
mask = np.zeros(dilation.shape[:2], dtype='uint8')
cv.drawContours(mask, [hull], -1, 255, -1)
mask = cv.dilate(mask, None, iterations = 2)
cv.imshow('mask', mask)
cv.waitKey(0)
cv.destroyAllWindows()
# bitwise to retrieval the characters from the original image
final = cv.bitwise_and(dilation, dilation, mask=mask)
cv.imshow('final', final)
cv.imwrite('final.jpg', final)
cv.waitKey(0)
cv.destroyAllWindows()
# OCR'ing the pre-processed image
config = "--psm 7 -c tessedit_char_whitelist=0123456789"
text = tsr.image_to_string(final, config=config)
print(text)
The code is an example to how to deal with this kind of image. We must keep in mind, Tesseract is not perfect and, it requires cleaned images to work well. This code can also fail for others images like that, we must tweak the parameters or try other techniques of image pre-processing. You must also know the --psm modes, in this case I've considered --psm 7, that treats the image as a single text line. For this kind of image, you can also try --psm 8, that treats the image as single word. This code is just a start point, you can improve it according your need.

Related

Pytesseract and OpenCV can't detect digits

Thanks in advance to everyone that will answer.
I am new to OpenCV, Pytesseract and overall very inexperienced about image processing and recognition.
I am trying to detect a digit from a pdf, for the sake of this code I will directly provide the image:
Initial image
My objective is to detect the number in the colored box, which in this case is number 6.
My code for preprocessing is the following:
import numpy as np
import pytesseract
from PIL import Image
from PIL import ImageFilter, ImageEnhance
pytesseract.pytesseract.tesseract_cmd = 'Tesseract-OCR\tesseract.exe'
# -----Reading the image-----------------------------------------------------
img = cv2.imread('page_image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.resize(gray, (1028, 720))
thres_gray = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)[1]
gray_inv = cv2.bitwise_not(thres_gray)
gray_test = cv2.bitwise_not(gray_inv)
out2 = cv2.bitwise_or(gray, gray, mask=gray_inv)
thresh_end = cv2.threshold(out2, 254, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
imageObject = Image.fromarray(thresh_end)
enhancer = ImageEnhance.Sharpness(imageObject)
sharpened1 = imageObject.filter(ImageFilter.SHARPEN)
sharpened2 = sharpened1.filter(ImageFilter.SHARPEN)
# sharpened2.show()
From this I obtain the following picture:
Preprocessed image
At this point, since I am still learning about how to detect the region of interest and crop it with OpenCV, to test the code I decided to manually crop the image to test if my script works correctly enough.
Therefore the image I pass to pytesseract is the following:
Final image to read with pytesseract
I am not really sure if the image is good enough to be read, but this is the best I could get.
From this I try image_to_string:
trial = pytesseract.image_to_string(sharpened2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
I have tried many different configurations for the tesseract but none of it worked and the final output is always an empty string.
I would be really grateful if you could help me understand whether the image is not good enough or I am doing something wrong with the tesseract configuration.
If you could also be able to help me cropping the image correctly that would be awesome, but even detecting the number is enough for me.
Sorry for the long post and thanks again.

Try this:
import cv2
import pytesseract
import numpy as np
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread("form.jpg")
# https://stackoverflow.com/questions/10948589/choosing-the-correct-upper-and-lower-hsv-boundaries-for-color-detection-withcv
ORANGE_MIN = np.array([5, 50, 50], np.uint8)
ORANGE_MAX = np.array([15, 255, 255], np.uint8)
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
frame_threshed = cv2.inRange(hsv_img, ORANGE_MIN, ORANGE_MAX)
# cv2.imshow("frame_threshed", frame_threshed)
thresh = cv2.threshold(frame_threshed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
# cv2.imshow("thresh", thresh)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
# cv2.imshow("dilate", thresh)
for c in cnts:
x, y, w, h = cv2.boundingRect(c)
ROI = thresh[y:y + h, x:x + w]
ratio = 100.0 / ROI.shape[1]
dim = (100, int(ROI.shape[0] * ratio))
resizedCubic = cv2.resize(ROI, dim, interpolation=cv2.INTER_CUBIC)
threshGauss = cv2.adaptiveThreshold(resizedCubic, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 255, 17)
cv2.imshow("ROI", threshGauss)
text = int(pytesseract.image_to_string(threshGauss, lang='eng', config="--oem 3 --psm 13"))
print(f"Detected text: {text}")
cv2.waitKey(0)
I used HSV method to detect orange color first. Then, once the ROI was clearly visible, I applied "classic" image pre-processing steps.
Take a look at this link to understand how to select other colors than orange.
I also resized the ROI a bit.

How to handle Credit Cards fonts with OpenCV and Tesseract in Python

I'm trying to read cards and output card numbers and expiry date with OpenCV.
import cv2
import pytesseract
filename = 'image1.png'
img = cv2.imread(filename)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
canny = cv2.Canny(gray, 50, 150, apertureSize=3)
result = pytesseract.image_to_string(canny)
print(f"OCR Results: {result}")
cv2.imshow('img', img)
cv2.imshow('canny', canny)
if cv2.waitKey(0) & 0xff == 27:
cv2.destroyAllWindows()
Image before processing
Image after Canny
The result text does not look good. See the screenshot below:
Question: How can I properly handle the cards fonts well for better results. Any idea is highly appreciated.
Thanks.

It looks like the OCR is not working well when passing the edges of the text.
You better apply threshold instead of using Canny.
I suggest the following stages:
Convert from BGR to HSV color space, and get the S (saturation) color channel of HSV.
All gray pixels in S are zero, and colored pixels are above zero.
Convert to binary using automatic threshold (use cv2.THRESH_OTSU).
Crop the contour with the maximum size.
Because the image you posted contains some background.
Apply OCR on the cropped area.
Here is the code:
import numpy as np
import cv2
import imutils # https://pypi.org/project/imutils/
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' # I am using Windows
img = cv2.imread('image1.png') # Read input image
# Convert from BGR to HSV color space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the saturation color channel - all gray pixels are zero, and colored pixels are above zero.
s = hsv[:, :, 1]
# Convert to binary using automatic threshold (use cv2.THRESH_OTSU)
ret, thresh = cv2.threshold(s, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Find contours (in inverted thresh)
cnts = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = imutils.grab_contours(cnts)
# Find the contour with the maximum area.
c = max(cnts, key=cv2.contourArea)
# Get bounding rectangle
x, y, w, h = cv2.boundingRect(c)
# Crop the bounding rectangle out of thresh
thresh_card = thresh[y:y+h, x:x+w].copy()
# OCR
result = pytesseract.image_to_string(thresh_card)
print(f"OCR Results:\n {result}")
# Show images for debugging
cv2.imshow('s', s)
cv2.imshow('thresh', thresh)
cv2.imshow('thresh_card', thresh_card)
cv2.waitKey(0)
cv2.destroyAllWindows()
OCR Result:
Visa Classic
| By)
4000 1234 Sb18 9010
CARDHOLDER MARE
VISA
Still not perfect...
s:
thresh:
thresh_card:

remove small whits dots from binary image using opencv python

i have a binary image and I want to remove small white dots from the image using opencv python.You can refer to my problem here enter link description here
My original image is
i want the output image as:

This seems to work using connected components in Python Opencv.
#!/bin/python3.7
import cv2
import numpy as np
src = cv2.imread('img.png', cv2.IMREAD_GRAYSCALE)
# convert to binary by thresholding
ret, binary_map = cv2.threshold(src,127,255,0)
# do connected components processing
nlabels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary_map, None, None, None, 8, cv2.CV_32S)
#get CC_STAT_AREA component as stats[label, COLUMN]
areas = stats[1:,cv2.CC_STAT_AREA]
result = np.zeros((labels.shape), np.uint8)
for i in range(0, nlabels - 1):
if areas[i] >= 100: #keep
result[labels == i + 1] = 255
cv2.imshow("Binary", binary_map)
cv2.imshow("Result", result)
cv2.waitKey(0)
cv2.destroyAllWindows()
cv2.imwrite("Filterd_result.png, result)
See here

You can simply use image smoothing techniques like gaussian blur, etc. to remove noise from the image, followed by binary thresholding like below:
img = cv2.imread("your-image.png",0)
blur = cv2.GaussianBlur(img,(13,13),0)
thresh = cv2.threshold(blur, 100, 255, cv2.THRESH_BINARY)[1]
cv2.imshow('original', img)
cv2.imshow('output', thresh)
cv2.waitKey(0)
cv2.destroyAllWinsdows()
output:
Read about different image smoothing/blurring techniques from here.

You can use the closing function - erosion followed by dilation. It don't need the blurring function.
import cv2 as cv
import numpy as np
img = cv.imread('original',0)
kernel = np.ones((5,5),np.uint8)
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
cv2.imshow('original', img)
cv2.imshow('output', opening)
cv2.waitKey(0)
cv2.destroyAllWindows()

How to extract text from an image with a slight background present?

I'm looking to extract the text from an image, The output I am receiving is not very accurate. I wonder if there's any additional steps I can take to process the image more to increase the accuracy of this OCR.
I've looked into some of the different ways to process the image and improve the OCR results. The image is quite small and I've been able to blow it up slightly, but to no avail.
The image will always be horizontal, no other text will be present other than the numbers. The maximum number will go up to 55000.
An example of the image in question:
After image processing, my image is scaled up by 4 on the X and Y axis. And some saturation is removed, although this does not improve the accuracy at all.
image = self._process(scale=6, iterations=2)
text = pytesseract.image_to_string(image, config="--psm 7")
My process method is doing the following:
# Resize and desaturate.
image = cv2.resize(image, None, fx=scale, fy=scale,
interpolation=cv2.INTER_CUBIC)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply dilation and erosion.
kernel = np.ones((1, 1), np.uint8)
image = cv2.dilate(image, kernel, iterations=iterations)
image = cv2.erode(image, kernel, iterations=iterations)
return image
Expected: "10411"
The actual value is varied, usually an unrecognizable string, or some numbers are parsed but the accuracy rate is too low to be usable.

I don't have experience with OCR, but I think you're on the right track: increasing the image size so the algorithm has more pixels to work with and increasing the distinction between the numbers and the background.
Tricks I added: thresholding the image, which creates a mask where only the white pixels remain. There were a few white blobs that were not numbers, so I used findContours to color those unwanted blobs black.
Result:
Code:
import numpy as np
import cv2
# load image
image = cv2.imread('number.png')
# resize image
image = cv2.resize(image,None,fx=5, fy=5, interpolation = cv2.INTER_CUBIC)
# create grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# perform threshold
retr, mask = cv2.threshold(gray_image, 230, 255, cv2.THRESH_BINARY)
# find contours
ret, contours, hier = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# draw black over the contours smaller than 200 - remove unwanted blobs
for cnt in contours:
# print contoursize to detemine threshold
print(cv2.contourArea(cnt))
if cv2.contourArea(cnt) < 200:
cv2.drawContours(mask, [cnt], 0, (0), -1)
#show image
cv2.imshow("Result", mask)
cv2.imshow("Image", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

find rectangle in image and extract text inside of it to save it as new image

I am new to OpenCV so I really need your help. I have a bunch of images like this one:
I need to detect the rectangle on the image, extract the text part from it and save it as a new image.
Can you please help me with this?
Thank you!

Just to add to Danyals answer I have added an example code with steps written in comments. For this image you don't even need to perform morphological opening on the image. But usually for this kind of noise in the image it is recomended. Cheers!
import cv2
import numpy as np
# Read the image and create a blank mask
img = cv2.imread('napis.jpg')
h,w = img.shape[:2]
mask = np.zeros((h,w), np.uint8)
# Transform to gray colorspace and invert Otsu threshold the image
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
# ***OPTIONAL FOR THIS IMAGE
### Perform opening (erosion followed by dilation)
#kernel = np.ones((2,2),np.uint8)
#opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)
# ***
# Search for contours, select the biggest and draw it on the mask
_, contours, hierarchy = cv2.findContours(thresh, # if you use opening then change "thresh" to "opening"
cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=cv2.contourArea)
cv2.drawContours(mask, [cnt], 0, 255, -1)
# Perform a bitwise operation
res = cv2.bitwise_and(img, img, mask=mask)
########### The result is a ROI with some noise
########### Clearing the noise
# Create a new mask
mask = np.zeros((h,w), np.uint8)
# Transform the resulting image to gray colorspace and Otsu threshold the image
gray = cv2.cvtColor(res,cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# Search for contours and select the biggest one again
_, contours, hierarchy = cv2.findContours(thresh,cv2.RETR_TREE,cv2.CHAIN_APPROX_NONE)
cnt = max(contours, key=cv2.contourArea)
# Draw it on the new mask and perform a bitwise operation again
cv2.drawContours(mask, [cnt], 0, 255, -1)
res = cv2.bitwise_and(img, img, mask=mask)
# If you will use pytesseract it is wise to make an aditional white border
# so that the letters arent on the borders
x,y,w,h = cv2.boundingRect(cnt)
cv2.rectangle(res,(x,y),(x+w,y+h),(255,255,255),1)
# Crop the result
final_image = res[y:y+h+1, x:x+w+1]
# Display the result
cv2.imshow('img', final_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:

One way to do this (if the rectangle sizes are somewhat predictable) is:
Convert the image to black and white
Invert the image
Perform morphological opening on the image from (2) with a horizontal line / rectangle (I tried with 2x30).
Perform morphological opening on the image from (2) with a vertical line (I tried it with 15x2).
Add the images from (3) and (4). You should only have a white rectangle now. Now can remove all corresponding rows and columns in the original image that are entirely zero in this image.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to improve pytesseract function for capctha decoding? - python

Related

Pytesseract and OpenCV can't detect digits

How to handle Credit Cards fonts with OpenCV and Tesseract in Python

remove small whits dots from binary image using opencv python

How to extract text from an image with a slight background present?

find rectangle in image and extract text inside of it to save it as new image

Categories

Resources