Blurry text detection and OCR from identification documents - python

I have a very specific scene-text detection and parsing problem; I am not even sure you can call it actual scene text.
I have extracted a name field from an identity card photo:
I could immediately apply some OCR to that image, but I believe a further text localisation step could be applied first to obtain a tighter crop. Do you know any such text localisation algorithms? I have already tried 'FASText by Busta' and 'EAST by argman', and they work decently. Are there any algorithms suited to this specific task?
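For reference, this is roughly how I run EAST through OpenCV's dnn module, as a minimal sketch (the model file and image name are placeholders, and the box decoding is omitted; see OpenCV's text_detection sample for that part):
import cv2

# Pretrained EAST model, downloaded separately; the file name here is a placeholder.
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
img = cv2.imread("name_field.png")  # placeholder image name

# EAST requires input dimensions that are multiples of 32.
blob = cv2.dnn.blobFromImage(img, 1.0, (320, 320),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)

# scores: per-position text confidence; geometry: box offsets and angles.
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])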
After the text has been localised, I think that is the right moment to apply OCR, and here I feel lost. Which OCR engine could you recommend? I have already tried 'Tesseract', but it just doesn't work well. Would it be a better idea to build my own OCR for document characters using e.g. TensorFlow?

Try to increase the contrast of the image. You can use:
import matplotlib.pyplot as plt
import cv2
import numpy as np

def cvt_BGR2RGB(img):
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

def contrast(img, show=False):
    # CLAHE (Contrast Limited Adaptive Histogram Equalization)
    clahe = cv2.createCLAHE(clipLimit=3., tileGridSize=(8, 8))
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)   # convert from BGR to LAB color space
    l, a, b = cv2.split(lab)                     # split into 3 different channels
    l2 = clahe.apply(l)                          # apply CLAHE to the L-channel
    lab = cv2.merge((l2, a, b))                  # merge channels
    img2 = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)  # convert from LAB back to BGR
    if show:
        # plot the original and contrasted images side by side
        f = plt.figure(figsize=(15, 15))
        ax1 = f.add_subplot(121)
        img1_cvt = cvt_BGR2RGB(img)
        plt.imshow(img1_cvt)
        ax2 = f.add_subplot(122)
        img2_cvt = cvt_BGR2RGB(img2)
        plt.imshow(img2_cvt)
        plt.show()
    return img, img2
And then maybe you can use pytesseract.
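A minimal sketch of that last step, assuming pytesseract is installed and the name-field crop is saved as name_field.png (a placeholder file name):
import cv2
import pytesseract

img = cv2.imread("name_field.png")  # placeholder: the extracted name-field crop
_, img2 = contrast(img)             # contrast() as defined above
# psm 7 treats the image as a single text line, which fits a name field
text = pytesseract.image_to_string(img2, config="--psm 7")
print(text)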

Related

Why can't tesseract extract text that has a black background?

I have attached a very simple text image that I want the text from. It is white text on a black background. To the naked eye it seems absolutely legible, but apparently to tesseract it is rubbish. I have tried changing the oem and psm parameters, but nothing seems to work. Please note that this works for other images, but not for this one.
Please try running it on your machine and see if it works; otherwise I might have to change my OCR engine altogether.
Note: it was working earlier, until I tried adding black pixels around the image to help the extraction process. Also, I don't think Tesseract was trained only on black text on a white background; it should be able to handle this too. And if that were the problem, why does it work for other text images that have the same format as this one?
Edit: miraculously, when I ran the script again it extracted 'Chand' properly, but it failed in the case mentioned below. Please also look at the parameters I have used; I have read the documentation and feel they are the right choice. I have added the image for reference. This is not just about this one image: why is Tesseract failing on such simple use cases?
To get the desired result, you need to know the following:
Page segmentation modes
Suggested image processing methods
The input images are written in a bold font, so we need to shrink the bold strokes and then treat the output as a single uniform block of text.
To shrink the strokes we can use erosion.
After erosion, the OCR result for the two images will be:
CHAND
BAKLIWAL
Code:
# Load the libraries
import cv2
import pytesseract

# Initialize the list of image names
img_lst = ["lKpdZ.png", "ZbDao.png"]

# For each image name in the list
for name in img_lst:
    # Load the image
    img = cv2.imread(name)
    # Convert to gray-scale
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Erode the image to thin the bold strokes
    erd = cv2.erode(gry, None, iterations=2)
    # OCR, treating the image as a single uniform block of text
    txt = pytesseract.image_to_string(erd, config="--psm 6")
    print(txt)

Counting Objects in an image using OPENCV and Python

I am currently trying to count the number of shrimp in a given image. I'm using this test image:
The code I have used so far is the following:
import cv2
import numpy as np
from matplotlib import pyplot as plt
#Load img
path = r'C:\Users\...' #the path to the image
original = cv2.imread(path, cv2.IMREAD_COLOR)  # note: imread takes an IMREAD_* flag, not a cv2.COLOR_* code
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
#Histogram to guide the binarization
hist = cv2.calcHist([img],[0],None,[256],[0,256])
#do the threshold
ret,thresh = cv2.threshold(img,60,255,cv2.THRESH_BINARY_INV)
From this point I have tried different morphological transformations, such as erode, dilate, open and close, but they don't seem to separate the objects the way I want.
I've read that I can apply a watershed transformation to separate touching elements, but I don't have experience with it (I am working on this at the moment); a sketch of what I am planning is below.
After that I am planning to use a Simple Blob Detector to count the blobs, but I don't know whether these steps are correct.
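For reference, this is the distance-transform plus watershed sketch I intend to try, adapted from the OpenCV tutorial (it assumes the thresh image from the code above and is untested on the shrimp image):
import cv2
import numpy as np

# Remove small noise with an opening.
kernel = np.ones((3, 3), np.uint8)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

# Sure background: dilate the foreground.
sure_bg = cv2.dilate(opening, kernel, iterations=3)

# Sure foreground: threshold the distance transform.
dist = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = np.uint8(sure_fg)

# Unknown region: background minus foreground.
unknown = cv2.subtract(sure_bg, sure_fg)

# Label the sure-foreground markers; watershed treats 0 as "unknown".
ret, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1          # make the background label 1 instead of 0
markers[unknown == 255] = 0    # mark the unknown region with 0

# Watershed needs an 8-bit 3-channel image.
markers = cv2.watershed(cv2.cvtColor(thresh, cv2.COLOR_GRAY2BGR), markers)
print(markers.max() - 1)       # object count, excluding the background label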
Any help is very welcome!

Why are my images displaying as grey squares?

I'm writing a script in Python for my image processing class which should read a directory of images, display them, and eventually perform Otsu thresholding on them. I can get a reference image to display properly, including Otsu thresholding; however, I run into trouble when I attempt to display the remaining images in the directory. I am not sure the images are being read from the directory correctly, as I am trying to store them in an array; however, the output window displays grey squares whose dimensions correspond to the actual image resolutions, which suggests they are being at least partly read correctly.
I've already attempted to isolate the loading and displaying of images into a separate file and run it. I was concerned that the successful processing of my sample image (which included a black/white binarization) was somehow affecting the later image display. This was not the case, as the separate script produced the same grey-square output.
Update
I've managed to tweak the script below (not yet updated) to run almost correctly: by writing the full file path directly for each file, I can get the output to display correctly. Best I can tell, there is some issue with loading images into an array; a potential workaround for future testing is to collect the file locations as a list of strings and load each image on demand, rather than loading the images into an array directly. A sketch of that idea follows.
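Something like this is what I have in mind (untested; the folder path is a placeholder, and note that cv.imshow only repaints once cv.waitKey runs the GUI event loop, so waitKey is used instead of time.sleep):
import glob
import cv2 as cv

paths = glob.glob(r'D:\my_image_folder' + '/*.JPG')  # placeholder folder path
for p in paths:
    im = cv.imread(p)       # load one image at a time from its path
    cv.imshow('Image', im)
    cv.waitKey(2000)        # waits 2 s and lets the window actually repaint
cv.destroyAllWindows()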
import cv2 as cv
import numpy as np
from PIL import Image
import glob
from matplotlib import pyplot as plot
import time
image=cv.imread('Fig ref.jpg')
image2=cv.cvtColor(image, cv.COLOR_RGB2GRAY)
cv.imshow('Image', image)
# global thresholding
ret1,th1 = cv.threshold(image2,127,255,cv.THRESH_BINARY)
# Otsu's thresholding
ret2,th2 = cv.threshold(image2,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
# Otsu's thresholding after Gaussian filtering
blur = cv.GaussianBlur(image2,(5,5),0)
ret3,th3 = cv.threshold(blur,0,255,cv.THRESH_BINARY+cv.THRESH_OTSU)
# plot all the images and their histograms
images = [image2, 0, th1,
          image2, 0, th2,
          blur, 0, th3]
titles = ['Original Noisy Image','Histogram','Global Thresholding (v=127)',
          'Original Noisy Image','Histogram',"Otsu's Thresholding",
          'Gaussian filtered Image','Histogram',"Otsu's Thresholding"]
for i in range(3):
    plot.subplot(3,3,i*3+1),plot.imshow(images[i*3],'gray')
    plot.title(titles[i*3]), plot.xticks([]), plot.yticks([])
    plot.subplot(3,3,i*3+2),plot.hist(images[i*3].ravel(),256)
    plot.title(titles[i*3+1]), plot.xticks([]), plot.yticks([])
    plot.subplot(3,3,i*3+3),plot.imshow(images[i*3+2],'gray')
    plot.title(titles[i*3+2]), plot.xticks([]), plot.yticks([])
plot.show()
imageFolderPath = 'D:\Google Drive\Engineering\Senior Year\Image processing\Image processing group work'
imagePath = glob.glob(imageFolderPath + '/*.JPG')
im_array = np.array( [np.array(Image.open(img).convert('RGB')) for img in imagePath] )
temp=cv.imread("D:\Google Drive\Engineering\Senior Year\Image processing\Image processing group work\Fig ref.jpg")
cv.imshow('image', temp)
time.sleep(15)
for i in range(9):
    cv.imshow('Image', im_array[i])
    time.sleep(2)
plot.subplot(3,3,i*3+3),plot.imshow(images[i*3+2],'gray'): the second argument selects the gray colour map. Get rid of it and you will get colour displays.
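For example, the corrected call would simply be (assuming the array entry holds a colour image):
plot.subplot(3,3,i*3+3), plot.imshow(images[i*3+2])  # no colour-map argument, so colour data displays in colour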

Python, text detection OCR

I am trying to extract data from a scanned form. The form has a standard format similar to the one shown in the image below:
I have tried using pytesseract (tesseract OCR) to detect the image's text and it has done a decent job at finding the text and converting the image to text.
However, it essentially just gives me all the detected text without keeping the format of the data.
I would like to be able to do something like the below:
Find a particular piece of text and then find the associated data below or beside it, similar to this question using OpenCV: Detect text region in image using Opencv
Is there a way that I can essentially do the following:
Either find all text boxes on the form, perform OCR on each box and see which one is the closest match to the "witnesses:" text, then find the sections immediately below it and perform separate OCR on those.
Or, if the form is standard and I know the approximate location of the "witness" text section, can I specify its general location in OpenCV and then just extract the text below it and perform OCR on it?
EDIT: I have tried the code below to detect specific regions of the text. However, it identifies all regions rather than just the text.
import cv2

img = cv2.imread('t2.jpg')
mser = cv2.MSER_create()  # MSER region detector
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))  # upscale by 2x
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)  # returns (regions, bounding boxes)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0, 255, 0))  # outline each region in green
cv2.imshow('img', vis)
cv2.waitKey(0)  # keep the window open until a key is pressed
Here is the result:
I think you already have the answer in your own post.
I did something similar recently, and this is how I did it:
# id_image was loaded with cv2.imread
temp_image = id_image[start_y:end_y, start_x:end_x]  # crop the field of interest
img = Image.fromarray(temp_image)
text = pytesseract.image_to_string(img, config="--psm 7")  # psm 7: a single text line
So basically, if your format is predefined, you just need to know the location of the fields you want the text of (which you already know), crop them, and then apply the OCR (Tesseract) extraction.
In this case you need to import pytesseract, PIL, cv2 and numpy.
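Put together, a self-contained sketch might look like this (the file name and field coordinates are hypothetical placeholders):
import cv2
import pytesseract
from PIL import Image

id_image = cv2.imread("form.jpg")  # placeholder: the scanned form
# Hypothetical box around the data under the "witnesses:" label.
start_x, start_y, end_x, end_y = 50, 400, 600, 480
field = id_image[start_y:end_y, start_x:end_x]  # NumPy slicing: [rows, cols]
text = pytesseract.image_to_string(Image.fromarray(field), config="--psm 7")
print(text)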

OpenCV cv2.rectangle output binary image

I have been trying to draw a rectangle on a black image using cv2.rectangle. Here is my code (it is just a sample; in the actual code the values x2, y2, w2, h2 change inside a loop):
heir = np.zeros((np.shape(image1)[0], np.shape(image1)[1]), np.uint8)
cv2.rectangle(heir, (x2, y2), (x2+w2, y2+h2), (255, 255, 0), 5)
cv2.imshow("img", heir)
cv2.waitKey()
It is giving the following output:
Why does the image look like that? Why are the boundaries not just a line of width 5?
I have tried, but I am not able to figure it out.
Can't post this in a comment, but it's a negative answer: the same operations work for me on Windows / Python 2.7.8 / OpenCV 3.1.
import numpy as np
import cv2

heir = np.zeros((100, 200), np.uint8)
x2 = 10
y2 = 20
w2 = 30
h2 = 40
cv2.rectangle(heir, (x2, y2), (x2+w2, y2+h2), (255, 255, 0), 5)
cv2.imshow("img", heir)
cv2.waitKey()
Because you are loading the image to be tagged (the one the rectangles are drawn on) in grayscale, the colours of the rectangles/bounding boxes are converted to grayscale as well.
To fix the issue, open the image in "color" format. Since you didn't include that part of the code, here is the proposed solution:
tag_img = cv2.imread(MYIMAGE,1)
Pay attention to the second parameter here, which is "1" and means load the image as colour. Read more about reading images here: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_gui/py_image_display/py_image_display.html
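Using the named constant instead of the bare flag makes the intent clearer (an equivalent call):
tag_img = cv2.imread(MYIMAGE, cv2.IMREAD_COLOR)  # IMREAD_COLOR == 1: load as a 3-channel BGR image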
