I am working on an OCR project in Python, using the layoutparser library with Tesseract as the detection backend. I have trouble detecting one particular number in the document: I cannot get its accuracy above 45% (depending on the preprocessing method it is between 30% and 45%), while everything else is between 80% and 100%. I have tried multiple preprocessing methods without any success, so I want to know if there is anything else to try.
The original image is a scan of a document with multiple fields and a graph (I cannot share the original image as it contains client data). One of the fields contains the number I am interested in. First I tried doing OCR on the whole document, which gave me the best results (45% accuracy). The image was converted to grayscale and resized to a height of 1800 while keeping the aspect ratio, and then I applied the Canny edge detector from OpenCV. The only Tesseract parameter used was languages="slk", as I need it to detect other data as well. I wasn't really happy with the results, so I thought, why not try to cut out just the number and run detection on that.
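For reference, the whole-document preprocessing was roughly the following (the file name and Canny thresholds here are placeholders, not the exact values from my pipeline):
import cv2

img = cv2.imread("document.png")                       # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
scale = 1800.0 / gray.shape[0]                          # resize to height 1800, keep aspect ratio
resized = cv2.resize(gray, (int(gray.shape[1] * scale), 1800))
edges = cv2.Canny(resized, 100, 200)                    # placeholder thresholds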
So I cut out the field (from the grayscaled and resized image) which contains the number plus some area around it, since I don't know exactly where the number is; I can only guess. This is how the cutout looks: image here. It always contains only digits and is not skewed. The best result I achieved was 33% accuracy with the following preprocessing:
import cv2
from skimage.exposure import adjust_gamma
from skimage.restoration import denoise_tv_chambolle
image = adjust_gamma(image, 1.2)                         # mild gamma correction
image = denoise_tv_chambolle(image, multichannel=True)   # total-variation denoising
image = cv2.normalize(src=image, dst=None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)  # back to 8-bit range
image = cv2.bitwise_not(image)                           # invert: black digits on white background
Tesseract parameters used: languages="slk", config="--psm 6, digits". bitwise_not is used so that the resulting image is black digits on a white background, which should help Tesseract give better results.
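For reference, this is roughly how the digit restriction could be expressed with pytesseract directly (I actually go through layoutparser, so treat the call below as an approximation; the character whitelist is an assumption about what the digits config is meant to do):
import pytesseract

text = pytesseract.image_to_string(
    image,
    lang="slk",
    config="--psm 6 -c tessedit_char_whitelist=0123456789"
)
print(text.strip())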
Other things I tried for preprocessing were thresholding (binary and Otsu), adaptive thresholding, blur, Gaussian blur, dilation, erosion, all the different --psm parameters, and the LSTM model for Tesseract. None of them gave better results. I read through the Tesseract preprocessing documentation and multiple other Stack Overflow questions with similar problems. I am thankful for any ideas about how to get better results.
Related
I'm currently learning about computer vision OCR. I have an image that needs to be scanned, and I am facing a problem during the image cleansing step.
I use opencv2 in Python. This is the original image:
import cv2
image = cv2.imread(image_path)
cv2.imshow("imageWindow", image)
cv2.waitKey(0)
I want to clean the above image; the number in the middle (64) is the area I want to scan. However, the number gets cleaned away as well.
import numpy as np  # the mask below needs numpy
image[np.where((image > [0, 0, 105]).all(axis=2))] = [255, 255, 255]  # whiten pixels above the BGR threshold
cv2.imshow("imageWindow", image)
What should I do to correct the cleansing here? I want the screen area where the number 64 is located to be cleaned up, because I will perform an OCR scan afterwards.
Please help; thank you in advance.
What you're trying to do is called "thresholding". It looks like your technique is recoloring pixels that fall above a certain threshold, but the LCD digit darkness varies enough in that image to throw it off.
I'd spend some time reading about thresholding, here's a good starting place:
Thresholding in OpenCV with Python. You're probably going to need an adaptive technique (like Adaptive Gaussian Thresholding), but you may find other ways that work for your images.
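For example, a minimal adaptive Gaussian thresholding sketch (the file name, block size, and constant are placeholders you would need to tune for your image):
import cv2

img = cv2.imread("lcd.png", cv2.IMREAD_GRAYSCALE)        # placeholder file name
img = cv2.medianBlur(img, 3)                              # light denoise before thresholding
th = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv2.THRESH_BINARY, 11, 2)      # tune block size (11) and constant (2)
cv2.imshow("thresholded", th)
cv2.waitKey(0)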
I'm trying to do OCR of a scanned document which has handwritten signatures in it. See the image below.
My question is simple, is there a way to still extract the names of the people using OCR while ignoring the signatures? When I run Tesseract OCR it fails to retrieve the names. I tried grayscaling/blurring/thresholding, using the code below, but without luck. Any suggestions?
import cv2
image = cv2.imread(file_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.GaussianBlur(image, (5, 5), 0)
image = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]  # Otsu binarization
You can use scikit-image's Gaussian filter to blur the thin lines first (with an appropriate sigma), followed by binarization of the image (e.g., with some thresholding function), and then morphological operations (such as remove_small_objects or opening with an appropriate structuring element) to remove most of the signature. After that, you can try classification of the characters with a sliding window (assuming the classifier is already trained on blurred characters like those in the test image). The following shows an example.
import numpy as np
import matplotlib.pyplot as plt
from skimage.morphology import binary_opening, square
from skimage.filters import threshold_minimum, gaussian
from skimage.io import imread
from skimage.color import rgb2gray

# Blur the thin signature strokes, then binarize with a minimum threshold
im = gaussian(rgb2gray(imread('lettersig.jpg')), sigma=2)
thresh = threshold_minimum(im)
im = (im > thresh).astype(bool)   # np.bool is deprecated; plain bool works

# Morphological opening removes most of the remaining thin strokes
plt.figure(figsize=(20, 20))
im1 = binary_opening(im, square(3))
plt.imshow(im1)
plt.axis('off')
plt.show()
[EDIT]: Use Deep Learning Models
Another option is to pose the problem as an object detection problem where the letters are the objects. We can use deep learning models such as CNN/RNN/Fast R-CNN (with tensorflow/keras) for object detection, or a YOLO model (refer to this article for car detection with the YOLO model).
I suppose the input pictures are grayscale; otherwise the different colour of the ink could have some discriminative power.
The problem here is that your training set - I guess - contains almost only 'normal' letters, without the disturbance of a signature, so naturally the classifier won't work on letters with signature ink on them. One way to go could be to extend the training set with letters of this type. Of course it is quite a job to extract and label these letters one by one.
You can use real letters with different signatures on them, but it might also be possible to artificially generate similar letters: you just need different letters with different snippets of signatures placed over them. This process could be automated, as sketched below.
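A minimal sketch of how such automation could look, assuming you have a cropped letter image and a cropped signature snippet as separate grayscale files (the file names and the darkest-pixel compositing rule are my assumptions):
import cv2
import numpy as np

letter = cv2.imread("letter.png", cv2.IMREAD_GRAYSCALE)          # dark letter on white background
stroke = cv2.imread("signature_snippet.png", cv2.IMREAD_GRAYSCALE)
stroke = cv2.resize(stroke, (letter.shape[1], letter.shape[0]))

# Random shift of the stroke so each generated sample looks different
dx, dy = np.random.randint(-10, 10, size=2)
M = np.float32([[1, 0, dx], [0, 1, dy]])
stroke = cv2.warpAffine(stroke, M, (letter.shape[1], letter.shape[0]), borderValue=255)

# Darkest-pixel compositing: ink from either image wins over the white paper
sample = np.minimum(letter, stroke)
cv2.imwrite("augmented_letter.png", sample)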
You may try to preprocess the image with morphological operations.
You can try opening to remove the thin lines of the signature. The problem is that it may remove punctuation as well.
import cv2
image = cv2.imread(file_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)  # opening suppresses thin strokes
You may have to alter the kernel size or shape. Just experiment with different combinations.
You can try other OCR providers for the same task, for example Google Cloud Vision (https://cloud.google.com/vision/). You can upload an image and try it for free.
You will get a response from the API from which you can extract the text you need. Documentation for extracting that text is also given on the same page.
Also check out my own answer to the same problem, which will help you fetch that text: Convert Google Vision API response to JSON
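For completeness, a minimal sketch of calling the Vision API from Python with the google-cloud-vision client (the exact class names vary slightly between client versions, so treat this as an outline rather than a definitive call):
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("scan.jpg", "rb") as f:                 # placeholder file name
    content = f.read()

response = client.text_detection(image=vision.Image(content=content))
for annotation in response.text_annotations:
    print(annotation.description)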
I have two different types of images (which I cannot post due to reputation, so I've linked them.):
Image 1 Image 2
I was trying to extract hand features from the images using OpenCV and Python. My code kinda looks like this:
import cv2
image = cv2.imread('image.jpg')
blur = cv2.GaussianBlur(image, (5,5), 0)
gray = cv2.cvtColor(blur, cv2.COLOR_BGR2GRAY)
retval, thresh1 = cv2.threshold(gray, 70, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imshow('image', thresh1)
cv2.waitKey(0)
The result of which looks like this:
Image 1 Image 2
The change in background in the second image is messing with the cv2.threshold() function, and it's not getting the skin parts right. Is there a way to do this right?
As a follow-up question, what is the best way to extract hand features? I tried a Haar cascade and didn't really get results. Should I train my own cascade? What other options do I have?
It's hard to say based on a sample size of two images, but I would try OpenCV's Integral Channel Features (ChnFtrs), which are like supercharged Haar features that can take cues from colour as well as any other image channels you care to create and provide.
In any case, you are going to have to train your own cascades. Separate cascades for front and profile shots of course.
Take out your thresholding by skin colour, because as you've already noticed, it may throw away some or all of the hands depending on the actual subject's skin colour and lighting. ChnFtrs will do the skin detection for you more robustly than a fixed threshold can. (Though for future reference, all humans are actually orange :))
You could eliminate some false positives by only detecting within a bounding box of where you expect the hands to be.
Try both RGB and YUV channels to see what works best. You could also throw in the results of edge detection (say, Canny, maximised across your 3 colour channels) for good measure. At the end, you could cull channels which are underused to save processing if necessary.
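As an illustration, one way to build such an extra channel is to run Canny on each colour channel and keep the per-pixel maximum (the file name and thresholds are placeholders):
import cv2
import numpy as np

img = cv2.imread("hand.jpg")                        # placeholder file name
channels = cv2.split(img)                           # B, G, R
edges = [cv2.Canny(c, 50, 150) for c in channels]   # placeholder Canny thresholds
edge_channel = np.maximum.reduce(edges)             # strongest edge response per pixel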
If you have much variation in hand pose, you may need to group similar poses and train a separate ChnFtrs cascade for each group. Individual cascades do not have a branching structure, so they do not cope well when the positive samples are disjoint in parameter space. This is, AFAIK, a bit of an unexplored area.
A correctly trained ChnFtrs cascade (or several) may give you a bounding box for the hands, which will help in extracting hand contours, but it can't exclude invalid contours within the same bounding box. Most other object detection routines will also have this problem.
Another option, which may be better/simpler than ChnFtrs, is LINEMOD (a current favourite of mine). It has the advantage that there's no complex training process, nor any training time needed.
I am working with the Google Vision API and Python to apply text_detection, an OCR function of the Google Vision API which detects the text in an image and returns it as output. My original image is the following:
I have used the following different algorithms:
1) Apply text_detection to the original image
2) Enlarge the original image by 3 times and then apply text_detection
3) Apply Canny, findContours, drawContours on a mask (with OpenCV) and then text_detection to this
4) Enlarge the original image by 3 times, apply Canny, findContours, drawContours on a mask (with OpenCV) and then text_detection to this
5) Sharpen the original image and then apply text_detection
6) Enlarge the original image by 3 times, sharpen the image and then apply text_detection
The ones which fare the best are (2) and (5). On the other hand, (3) and (4) are probably the worst among them.
The major problem is that text_detection in most cases does not detect the minus sign, especially the one in '-1.00'.
Also, I do not know why, but sometimes it does not detect '-1.00' itself at all, which is quite surprising as it does not have any significant problem with the other numbers.
What do you suggest I do to accurately detect the minus sign and, in general, the numbers?
(Keep in mind that I want to apply this algorithm to different boxes so the numbers may not be at the same position as in this image)
I dealt with the same problem. Your end goal is to correctly identify the text, and for the OCR conversion you are using a third-party service or tool (Google API / Tesseract etc.).
All the approaches you describe become less useful because whatever transformations you do with OpenCV will be repeated by Tesseract anyway. The best you can do is supply the input in an easy format.
What worked best for me was breaking the image into parts (boxes - squares and rectangles - using the sample code for identifying rectangles in all channels from the OpenCV repo examples, https://github.com/opencv/opencv/blob/master/samples/python/squares.py), then cropping each part and sending it for OCR separately, as sketched below.
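A rough sketch of that crop-then-OCR idea (the rectangle detection below is a plain contour/boundingRect pass rather than the full squares.py logic, and the size filter is an assumed value):
import cv2
import pytesseract

img = cv2.imread("box_image.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
_, thresh = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if w > 40 and h > 15:                                  # keep only box-sized regions (assumed sizes)
        crop = img[y:y + h, x:x + w]
        print(pytesseract.image_to_string(crop, config="--psm 7"))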
Since you are using the Google Vision API, which detects text in the image, it is not obvious that a text-detection API will detect negative numbers in the first place. Assuming that you may not be able to retrain the API for your case, I would recommend writing a simple script which filters the contours on the basis of their shape and size. Using this script you can easily segment out the negative signs and then merge them with the output from the Google Vision API:
import cv2
import numpy as np

img = cv2.imread("path/to/img.jpg", 0)
ret, thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns (image, contours, hierarchy)
contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)

# Filter the contours: a small area and a wide aspect ratio suggest a minus sign.
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if 5 < cv2.contourArea(cnt) < 50 and float(w) / h > 3:
        print("I have detected a minus sign at:", x, y, w, h)
After this filtering process you can make a calculated guess as to whether a given digit has a negative sign close to its left side.
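A purely illustrative sketch of that merging step, assuming the digit boxes from the Vision API have already been converted to (x, y, w, h, text) tuples (that format is my assumption, not the API's actual response structure):
# minus_boxes comes from the contour filter above; digit_boxes is assumed to be
# a list of (x, y, w, h, text) tuples extracted from the Vision API response.
def merge_minus_signs(digit_boxes, minus_boxes, max_gap=15):
    merged = []
    for x, y, w, h, text in digit_boxes:
        has_minus = any(
            0 < x - (mx + mw) < max_gap and abs((my + mh / 2) - (y + h / 2)) < h
            for mx, my, mw, mh in minus_boxes
        )
        merged.append(("-" + text) if has_minus else text)
    return merged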
If the Google Vision API uses Tesseract, which I think it does,
then optimization is usually as follows (a rough OpenCV sketch of these steps follows the list):
Sharpen
Binarize (or grayscale if you must)
Trim borders (Tesseract likes smooth background)
Deskew (Tesseract tolerates very small skew angle. It likes nice straight text lines)
Reshape and resize (Put it in a page-like shape and resize if necessary)
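A rough OpenCV sketch of these steps, assuming a dark-text-on-light scan (the sharpening kernel, margin, and minAreaRect deskew are common choices, not anything prescribed by Tesseract itself):
import cv2
import numpy as np

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)          # placeholder file name

# 1) Sharpen
kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
sharp = cv2.filter2D(img, -1, kernel)

# 2) Binarize
_, binary = cv2.threshold(sharp, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

# 3) Trim borders (assumed fixed margin)
m = 10
binary = binary[m:-m, m:-m]

# 4) Deskew using the minimum-area rectangle around the ink pixels
#    (the angle convention differs between OpenCV versions, so treat this as a sketch)
coords = np.column_stack(np.where(binary < 128))
angle = cv2.minAreaRect(coords.astype(np.float32))[-1]
if angle > 45:
    angle -= 90
h, w = binary.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewed = cv2.warpAffine(binary, M, (w, h), flags=cv2.INTER_LINEAR, borderValue=255)

# 5) Resize if the text is small
deskewed = cv2.resize(deskewed, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)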
As for negative signs, use Tesseract directly, if you can.
You will be able to retrain it or to download better training data.
Alternatively, you can correct the errors using an additional algorithm, i.e. implement your recheck as suggested in ZdaR's answer.
I want to convert the picture into a black-and-white image, accurately, where the seeds are represented by white and the background by black. I would like to have it as Python OpenCV code. Please help me out.
I got a good result for the above picture using the code given below. Now I have another picture for which thresholding doesn't seem to work. How can I tackle this problem? The output I got is in the following picture.
Also, there are some dents in the seeds, which the program takes as the boundary of the seed, which is not a good result, as in the picture below. How can I make the program ignore dents? Is masking the seeds a good option in this case?
I converted the image from BGR color space to HSV color space.
Then I extracted the hue channel:
Then I applied a threshold to it:
Note:
Whenever you face difficulty in certain areas try working in a different color space, the HSV color space being most prominent.
UPDATE:
Here is the code:
import cv2
import numpy as np

filename = 'seed.jpg'
img = cv2.imread(filename)                        #---Reading image file---
hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)    #---Converting BGR image to HSV---
hue, saturation, value = cv2.split(hsv_img)       #---Splitting HSV image into 3 channels---
blur = cv2.GaussianBlur(hue, (3, 3), 0)           #---Blur to smooth the edges---
ret, th = cv2.threshold(blur, 38, 255, 0)         #---Binary threshold---
cv2.imshow('th.jpg', th)
cv2.waitKey(0)
Now you can perform contour operations to highlight your regions of interest also. Try it out!! :)
ANOTHER UPDATE:
I kept only the contours larger than a certain constraint to get this:
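A minimal sketch of that area filtering, continuing from the snippet above (the area cut-off of 100 is an assumed value to tune):
# th, cv2 and np are already defined in the code above
contours, hierarchy = cv2.findContours(th.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
mask = np.zeros_like(th)
for cnt in contours:
    if cv2.contourArea(cnt) > 100:          # keep only contours above an assumed area threshold
        cv2.drawContours(mask, [cnt], -1, 255, -1)
cv2.imshow('filtered', mask)
cv2.waitKey(0)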
There are countless ways for image segmentation.
The simplest one is a global threshold operation. If you want to know more about other methods you should read some books, which I recommend anyway before you do any further image processing. It doesn't make much sense to start image processing if you don't know the most basic tools.
Just to show you how this could be achieved:
I converted the image from RGB to HSB. I then applied separate global thresholds to the hue and brightness channels to get the best segmentation result for both images.
Both binary images were then combined using a pixelwise AND operation. I did this because both channels gave sub-optimal results, but their overlap was pretty good.
I also applied some morphological operators to clean up the results.
Of course you can just invert the image to get the desired black background...
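A rough OpenCV approximation of that pipeline (the channel thresholds and kernel size are placeholders; the original values were tuned per image):
import cv2
import numpy as np

img = cv2.imread("seed.jpg")
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue, sat, val = cv2.split(hsv)

# Separate global thresholds on the hue and brightness channels (assumed values)
_, hue_mask = cv2.threshold(hue, 38, 255, cv2.THRESH_BINARY)
_, val_mask = cv2.threshold(val, 120, 255, cv2.THRESH_BINARY)

# Combine the two partial segmentations where they agree
combined = cv2.bitwise_and(hue_mask, val_mask)

# Morphological clean-up, then invert for a black background
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
combined = cv2.morphologyEx(combined, cv2.MORPH_OPEN, kernel)
result = cv2.bitwise_not(combined)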
Thresholds and the channels used of course depend on the image you have and what you want to achieve. This is a very case-specific process that can be dynamically adapted only to a limited extent.
This could be followed by labeling or whatever else you need: