I have the following question: I would like to read these types of captcha in Python:
The best code I have come up with is below; however, it is not able to solve all of these captchas:
import pytesseract
import cv2
import numpy as np
import re

def odstran_sum(img, threshold):
    """Removes noise by dropping connected components smaller than threshold."""
    filtered_img = np.zeros_like(img)
    labels, stats = cv2.connectedComponentsWithStats(img.astype(np.uint8), connectivity=8)[1:3]
    label_areas = stats[1:, cv2.CC_STAT_AREA]
    for i, label_area in enumerate(label_areas):
        if label_area > threshold:
            filtered_img[labels == i + 1] = 1
    return filtered_img

def preprocess(img_path):
    """Converts the image to a binary image."""
    img = cv2.imread(img_path, 0)
    blur = cv2.GaussianBlur(img, (3, 3), 0)
    thresh = cv2.threshold(blur, 150, 255, cv2.THRESH_BINARY_INV)[1]
    filtered_img = 255 - odstran_sum(thresh, 20) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    erosion = cv2.erode(filtered_img, kernel, iterations=1)
    return erosion

def captcha_to_string(obrazek):
    """Returns the text read from the captcha."""
    text = pytesseract.image_to_string(obrazek)
    return re.sub(r'[^\x00-\x7F]+', ' ', text).strip()
img = preprocess(CAPTCHA_NAME)
text = captcha_to_string(img)
print(text)
Is it possible to improve my code so that it will be able to read all five of these examples? Thanks a lot.
I don't think there is much to be improved besides writing your own neural network for image recognition, trained on similar captchas. Captchas are deliberately designed so that computers have a hard time decoding them, so I don't think you can get perfect results.
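That said, one cheap thing to try before going that far is to constrain Tesseract itself. The snippet below is only a sketch of the idea, not part of the question's code: --psm 7 tells Tesseract to expect a single line of text, and tessedit_char_whitelist restricts the output alphabet. The whitelist shown here is an assumption; match it to the characters your captchas actually use.

# Minimal sketch: single-line page segmentation plus a character whitelist.
# The alphabet below is an assumed example, not taken from the question.
config = "--psm 7 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
text = pytesseract.image_to_string(obrazek, config=config)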
I'm working on my bachelor's degree final project, and I want to create an OCR for bottle inspection with Python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train Tesseract, or should I try another method?
I tried image-processing operations on the image and used pytesseract to recognize the characters.
Using the code below, I went from this photo:
to this one:
and then to this one:
Sharpen function (using the imgaug library):
import imgaug.augmenters as iaa  # iaa is the conventional alias for imgaug.augmenters

def sharpen(img):
    sharpen = iaa.Sharpen(alpha=1.0, lightness=1.0)
    sharpen_img = sharpen.augment_image(img)
    return sharpen_img
Image processing code:
textZone = cv2.pyrUp(sharpen(originalImage[y:y + h - 1, x:x + w - 1]))  # text zone cropped from the original image
sharp = cv2.cvtColor(textZone, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(sharp, 127, 255, cv2.THRESH_BINARY)
# Morphology treats white pixels as foreground, and here the text is black
# on white, so the operations act inverted: "opening" is done with
# MORPH_CLOSE, dilation with erode, and so on.
kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
opened = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_open)
kernel_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 7))
dilate = cv2.erode(opened, kernel_dilate)
kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
close = cv2.morphologyEx(dilate, cv2.MORPH_OPEN, kernel_close)
print(pytesseract.image_to_string(close))
This is the result of pytesseract.image_to_string:
22203;?!)
92:53 a
The expected result is :
22/03/20
02:53 A
"Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?"
First, kudos for taking this project on and getting this far with it. What you have from the OpenCV/cv2 standpoint looks pretty good.
Now, if you're thinking of Tesseract to carry you the rest of the way, at the very least you'll have to train it. Here you have a tough choice: Invest in training Tesseract, or work up a CNN to recognize a limited alphabet. If you have a way to segment the image, I'd be tempted to go with the latter.
From the result you got and the expected result, you can see that some of the characters are recognized correctly. Assuming you are using a different image from the one shown in the tutorial, I recommend changing the values passed to threshold and getStructuringElement.
Which values work best depends on the image. The tutorial author must have optimized them for his or her own use case (by trial and error or some other way).
Here is a video if you want to play around with those values using sliders in OpenCV; a minimal sketch of that idea follows below. You can also print your result in the same loop to see whether you are getting the desired output.
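For reference, a slider loop along those lines can be as simple as the sketch below. This is my own minimal illustration, not code from the video; the file name 'bottle.png' and the parameter ranges are assumptions to adapt to your image:

import cv2

def nothing(_):
    pass  # a trackbar callback is required, but we poll the values instead

img = cv2.imread('bottle.png', 0)  # hypothetical input, loaded as grayscale
cv2.namedWindow('tune')
cv2.createTrackbar('thresh', 'tune', 127, 255, nothing)
cv2.createTrackbar('ksize', 'tune', 3, 15, nothing)

while True:
    t = cv2.getTrackbarPos('thresh', 'tune')
    k = max(1, cv2.getTrackbarPos('ksize', 'tune'))
    _, binary = cv2.threshold(img, t, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k, k))
    cleaned = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    cv2.imshow('tune', cleaned)
    if cv2.waitKey(50) == 27:  # ESC exits the loop
        break

cv2.destroyAllWindows()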
One potential thing you could do to improve recognition of the characters is to dilate them so pytesseract gives a better result. Dilating the characters will connect the individual blobs together and can fix the / or the A characters. So, starting with your latest binary image:
Original
Dilate with a 3x3 kernel with iterations=1 (left) or iterations=2 (right). You can experiment with other values, but don't overdo it or the characters will all connect. Maybe this will provide a better result with your OCR.
import cv2

image = cv2.imread("1.PNG")
# Invert so the characters become white foreground for dilation
thresh = cv2.threshold(image, 115, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
dilate = cv2.dilate(thresh, kernel, iterations=1)
# Invert back to black text on a white background for the OCR
final = cv2.threshold(dilate, 115, 255, cv2.THRESH_BINARY_INV)[1]

cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('final', final)
cv2.waitKey(0)
I have tried to read text from an image of a receipt using pytesseract, but the resulting text has a lot of weird characters and really looks awful.
Here is the code I used to manipulate the image:
import sys
from PIL import Image
import cv2 as cv
import numpy as np
import pytesseract

def manipulate_image(img):
    img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    kernel = np.ones((1, 1), dtype="uint8")
    img = cv.erode(img, kernel, iterations=1)
    img = cv.threshold(img, 0, 255,
                       cv.THRESH_BINARY | cv.THRESH_OTSU)[1]
    img = cv.medianBlur(img, 3)
    return img

if len(sys.argv) > 2:
    print("Please provide only the name of the image.")
elif len(sys.argv) == 2:
    img = cv.imread(sys.argv[1])
    img = manipulate_image(img)
    cv.imwrite("test.png", img)
    text = pytesseract.image_to_string(img)
    print(text)
else:
    print("Please provide the name of the image.")
Here is my test receipt image:
https://imgur.com/a/RjeQ9dL
and here is the output image after manipulation:
https://imgur.com/a/1tFZRdq
and here is the text result:
""'9vco4v‘l7
0 .Vt3t00N 00t300N BUNUUS
SKLEP PUU POPUGOH|
UL. JHGIELLUNSKA 25, 70-364 SZCZ[C|N
TEL. 91 4841-20-58
N|P: 955—150-21-B2
dn.19r03.05 Uydr.8534
PARAGON FISKALNY
CIHSTKH 17 0,3 ¥ 16,30 = 4.89 B
Sp.0p.B 4,89 PTU B= 8,00% 0,35
Razem PTU 0,35
ZOP{HCUNU GUTUNKQ PLN
RESZTA PLN
0025/1373 H0103 0N|0 H.
15F H9HF[B9416} 13fl02D6k0[20D4334C
7?? BW 140
Any idea how to do this in a better way to get nicer results?
Applying simple thresholding will not be enough for pytesseract to properly detect the characters. There is much more preprocessing that can be done to drastically improve your results, such as:
using Tesseract v4, where deep learning is implemented
segmenting characters
using only the part of the receipt where the text is, found through edge detection
a perspective transform to straighten out the text (see the sketch after the links below)
These are somewhat lengthy topics to write up in one answer, but you can check out some articles on pyImageSearch, where they are covered in much more depth:
https://www.pyimagesearch.com/2014/09/01/build-kick-ass-mobile-document-scanner-just-5-minutes/
https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/
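To give a flavor of the perspective-transform step, here is a minimal sketch. The file name and the four corner coordinates are made-up placeholders; in practice you would locate the receipt's corners automatically (for example from the largest contour), as the first article above demonstrates.

import cv2
import numpy as np

img = cv2.imread('receipt.png')  # hypothetical input image
# Receipt corners in the source image, ordered top-left, top-right,
# bottom-right, bottom-left. These coordinates are illustrative only.
src = np.float32([[60, 40], [420, 55], [440, 700], [35, 690]])
w, h = 400, 650  # assumed output size in pixels
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Map the skewed quadrilateral onto an upright rectangle
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite('straightened.png', warped)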
I am currently working on an Optical Character Recognition system for Japanese letters. I am already able to identify individual characters if the letter in question is separated and in the right size. (The deep learning part of the work.)
As a next step I am trying to segment individual characters in an image in order to make a prediction about which letter each one is. (For now it is only about black characters on a white background, scanned PDFs and such.)
So far, the most promising results I have gotten came from the function "cv2.findContours" in OpenCV.
Here are 3 examples:
While the results are not entirely horrible, there are still many cases where two or more characters are treated as one, or where one character is split up into multiple boxes. I cannot seem to make the code work for all fonts and character sizes.
While the first image is still pretty close to being perfect, the second and third are not nearly as accurate. (I hope it is clear where the mistakes are.)
I tried completely different approaches, such as the Hough transform, but I couldn't achieve anything nearly as good as this approach.
This, by the way, is my current code:
import cv2
import numpy as np

file_name = '../data/test.jpg'

img = cv2.imread(file_name)
img_final = cv2.imread(file_name)
img_final = cv2.resize(img_final, (img_final.shape[1], img_final.shape[0]))

img2gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(img2gray, (7, 7), 0)
# thresh = cv2.adaptiveThreshold(blur, 255, 1, 1, 11, 2)
ret, mask = cv2.threshold(blur, 180, 255, cv2.THRESH_BINARY)
image_final = cv2.bitwise_and(img2gray, img2gray, mask=mask)
ret, new_img = cv2.threshold(image_final, 180, 255, cv2.THRESH_BINARY_INV)

# Dilate slightly so the strokes of one character merge into one contour
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (2, 2))
dilated = cv2.dilate(new_img, kernel, iterations=1)

# OpenCV 3.x signature; in OpenCV 4.x drop the first return value
_, contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

index = 0
for contour in contours:
    [x, y, w, h] = cv2.boundingRect(contour)
    if w < 1 and h < 1:
        continue
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 255), 2)
    cropped = image_final[y:y + h, x:x + w]
    s = '../output/crop_' + str(index) + '.jpg'
    cv2.imwrite(s, cropped)
    index = index + 1

cv2.imshow('captcha_result', img)
cv2.waitKey()
s2 = '../data/output.jpg'
cv2.imwrite(s2, img)
Now my questions are the following:
Does anybody have an idea how to improve the accuracy of my code?
Is it better to take a whole new approach?
Can a sliding window help me here?
Where do I go from here? Can I maybe use a sliding window to send the individual characters to the prediction?
With all the false positives (e.g. characters being split in two, despite my trying to limit that), I am uncertain whether I can simply use the cropped character images as they are, and how to further filter the results.
As I am new to all this, I would really appreciate any help or hints I can get!
I am looking forward to your replies! :)
Good morning,
I'm currently trying to study real-time liquid-surface deformations by sending a laser sheet onto the surface and gathering its reflection. What I obtain is typically a bright curve at each timestep, and I wish to analyze its coordinates.
I thus wrote the Python script displayed right below. (The analysis part is taken from "laser curved line detection using opencv and python", as it represents nearly exactly what I'm trying to do, except that I'm working with a video stream.)
import cv2
from PIL import Image
import cv2.cv as cv
import numpy as np
import time

myfile = open("hauteur.txt", "w")

# Import camera flow
class Target:
    def __init__(self):
        self.capture = cv.CaptureFromCAM(0)
        cv.namedWindow("Target", 1)
        cv.SetCaptureProperty(self.capture, cv.CV_CAP_PROP_FRAME_WIDTH, 150)
        cv.SetCaptureProperty(self.capture, cv.CV_CAP_PROP_FRAME_HEIGHT, 980)
        cv.SetCaptureProperty(self.capture, cv.CV_CAP_PROP_FPS, 60)

    def run(self):
        frame = cv.QueryFrame(self.capture)
        frame_size = cv.GetSize(frame)
        color_image_cv = cv.CreateImage(cv.GetSize(frame), 8, 3)
        color_image = np.array(color_image_cv)
        grey_image = cv.CreateImage(cv.GetSize(frame), cv.IPL_DEPTH_8U, 1)
        first = True
        t = time.clock()

        # Frame analysis
        while True:
            ret, bw = cv2.threshold(color_image, 0, 255, cv2.THRESH_BINARY)
            contours, hierarchy = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
            curves = np.zeros((img.shape[0], img.shape[1], 3), np.uint8)
            for i in range(len(contours)):
                for col in range(draw.shape[1]):
                    M = cv2.moments(draw[:, col])
                    if M['m00'] != 0:
                        x = col
                        y = int(M['m01'] / M['m00'])
                        curves[y, x, :] = (0, 0, 255)
                        res = {'X': x, 'Y': y, 't': t}
                        print res
                        myfile.write('{X}\t{Y}\t{t}'.format(**res))
                        myfile.write("\n")
            cv2.ShowImage("Target", color_image)
            # Listen for ESC key
            c = cv2.WaitKey(7) % 0x100
            if c == 27:
                break

if __name__ == "__main__":
    t = Target()
    t.run()
However, mixing cv and cv2 functions within the same code seems to create a nice mess, and I get the error
src data type = 17 is not supported
from the line
ret, bw = cv2.threshold(color_image, 0, 255, cv2.THRESH_BINARY)
I understand this arises from the different ways cv and cv2 create and store images, but no conversion I have tried seems to work, and I didn't find equivalent cv2 functions to use in the part that imports the video stream (but, as you may understand, I'm clearly not a programming pro and I may have missed what I need in the documentation). Is there a way to reconcile these cv and cv2 functions, or to get an equivalent camera stream with cv2 functions only?
Bonus question: how fast can a script like this run? (Considering that I'd eventually need it to run at 300-400 fps, I'm not even sure this is actually feasible.)
Thanks for your attention
ok, cv2 video code:
def __init__(self):
    self.capture = cv2.VideoCapture(0)
    cv2.namedWindow("Target", 1)
    self.capture.set(cv2.CAP_PROP_FRAME_WIDTH, 150)
    self.capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 980)
    self.capture.set(cv2.CAP_PROP_FPS, 60)

def run(self):
    ok, frame = self.capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    ...
Bonus question: of course, it can only run as fast as the capture delivers. 300 fps seems absurd; 30 fps is more likely.
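For completeness, a minimal self-contained version of that capture loop might look like the sketch below. This is my illustration rather than the answerer's code; the Otsu threshold merely stands in for whatever per-frame analysis is needed.

import cv2

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:  # no frame available (e.g. camera disconnected)
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Placeholder analysis step: Otsu binarization of the frame
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    cv2.imshow("Target", bw)
    if cv2.waitKey(1) == 27:  # ESC quits
        break
cap.release()
cv2.destroyAllWindows()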
I use OpenCV to find the external contour of a given image and fill it.
The images I use as input are images of pants, like the one attached. The problem is that sometimes (as in the attached image) the contour is not completely closed, and then I can't fill it. What can I do in this case?
Please see the code below.
Thanks, Li
from PIL import Image
import os
import numpy
import bs4
import scipy
import cv2
image_obj_original = cv2.imread(image_file)
image_name = os.path.split(image_file)[-1]
name, extension = os.path.splitext(image_name)

# normalize to a standard size
image_obj = cv2.resize(image_obj_original, STANDARD_SIZE)
imWithBorder = cv2.copyMakeBorder(image_obj, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=[255, 255, 255])

# convert to grey-scale
greyscale_image = cv2.cvtColor(imWithBorder, cv2.COLOR_BGR2GRAY)

# get canny edges
canny_edges = cv2.Canny(greyscale_image, 1, 255)
h, w = canny_edges.shape[:2]

contours0, hierarchy = cv2.findContours(canny_edges.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = [cv2.approxPolyDP(cnt, 3, True) for cnt in contours0]
vis = numpy.zeros((h, w, 3), numpy.uint8)
cv2.drawContours(vis, contours, 0, 255, -1)
vis = cv2.bitwise_not(vis)
cv2.imshow('image', vis)
I agree that it is somewhat annoying that Canny in OpenCV does not always close the last pixel in the contour of an object, which prevents you from finding a single closed contour. The workaround I used to solve this problem is a "close" morphological operation (shown here in C++ syntax):
dilate(canny_edges, canny_edges, Mat());
erode(canny_edges, canny_edges, Mat());
Like any workaround it is not perfect, but it does solve the problem.
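Since the question's code is in Python, the equivalent there would be something like the sketch below. cv2.morphologyEx with MORPH_CLOSE performs the same dilate-then-erode pair in one call; the file name 'pants.png' and the 3x3 kernel size are assumptions to tune for your images.

import cv2
import numpy as np

img = cv2.imread('pants.png', 0)  # hypothetical input, loaded as grayscale
canny_edges = cv2.Canny(img, 1, 255)

# Close one-pixel gaps in the edge map (dilate, then erode, in one call)
kernel = np.ones((3, 3), np.uint8)
closed_edges = cv2.morphologyEx(canny_edges, cv2.MORPH_CLOSE, kernel)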