I'm using py-tesseract for OCR on images as below but I'm unable to get consistent output from the unprocessed images. How can the spotted background be reduced and the numbers highlighted using cv2 to increase accuracy? I'm also interested in keeping the separators in the output string.
Below pre-processing seems to work with some accuracy
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
(T, threshInv) = cv2.threshold(blurred, 0, 255,
cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
Getting output using psm --6: 6.903.722,99
Here's one solution, which is based on the ideas on a similar post. The main idea is to apply a Hit-or-Miss operation looking for the pattern you want to eliminate. In this case the pattern is one black (or white, if you invert the image) surrounded by pixels of the complimentary color. I've also included a thresholding operation with some bias, because some of the characters are easily destroyed (you could really benefit from more high-res image). These are the steps:
Get grayscale image via color conversion
Threshold with bias to get a binary image
Apply the Hit-or-Miss with one central pixel target kernel
Use the result from the prior operation to suppress the noise in the original image
Let's see the code:
# Imports:
import numpy as np
import cv2
image path
path = "D://opencvImages//"
fileName = "8WFNvsZ.jpg"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Convert RGB to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
thresh, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Use Otsu's threshold value and add some bias:
thresh = 1.05 * thresh
_, binaryImage = cv2.threshold(grayscaleImage, thresh, 255, cv2.THRESH_BINARY_INV )
The first bit of code gets the binary image of the input. Note that I've added some bias to the threshold obtained via Otsu to avoid degrading the characters. This is the result:
Ok, let's apply the Hit-or-Miss operation to get the dot mask:
# Perform morphological hit or miss operation
kernel = np.array([[-1,-1,-1], [-1,1,-1], [-1,-1,-1]])
dotMask = cv2.filter2D(binaryImage, -1, kernel)
# Bitwise-xor mask with binary image to remove dots
result = cv2.bitwise_xor(binaryImage, dotMask)
The dot mask is this:
And the result of subtracting (or XORing) this mask to the original binary image is this:
If I run the inverted (black text on white background) result image on PyOCR I get this string output:
Text is: 6.003.722,09
The other image produces this final result:
And its OCR returns this:
Text is: 4.705.640,00
Related
Good day. I'm trying to identify the both printed and hand written text from the below check leaf
and here is the image after preprocessing, used below code
import cv2
import pytesseract
import numpy as np
img = cv2.imread('Images/cheque_leaf.jpg')
# Rescaling the image (it's recommended if you’re working with images that have a DPI of less than 300 dpi)
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
h, w = img.shape[:2]
# By default OpenCV stores images in BGR format and since pytesseract assumes RGB format,
# we need to convert from BGR to RGB format/mode:
# it to reduce noise
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1] # perform OTSU threhold
thresh = cv2.rectangle(thresh, (0, 0), (w, h), (0, 0, 0), 2) # draw a rectangle around regions of interest in an image
# Dilates an image by using a specific structuring element.
# enrich the charecters(to large)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
# The function erodes the source image using the specified structuring element that determines
# the shape of a pixel neighborhood over which the minimum is taken
erode = cv2.erode(thresh, kernel, iterations = 1)
# To extract the text
custom_config = r'--oem 3 --psm 6'
pytesseract.image_to_string(thresh, config=custom_config)
and now using pytesseract.image_to_string() method to convert image to text. here I'm getting irrelavant output. In that above image I wanted to identify the date,branch payee,amount in both numbers and wordings and digital signature name followed by account number.
any OCR Techniques to solve the above problem by extract the exact data as mentioned above. Thanks in advance
The following is just one of the several approaches.
I would suggest using Sauvola threshold technique. Threshold is calculated for each pixel in the image using a specific formula mentioned here. It involves calculating the mean and standard deviation of pixel values within a certain window.
This functionality is available in the skimage library (also known as scikit-image)
Following is the working example for the given image:
from skimage.filters import threshold_sauvola
img = cv2.imread('cheque.jpg', cv2.IMREAD_GRAYSCALE)
# choosing a window size of 13 (feel free to change it and visualize)
thresh_sauvola = threshold_sauvola(img, window_size=13)
binary_sauvola = img > thresh_sauvola
# converting resulting Boolean array to unsigned integer array of 8-bit (0 - 255)
binary_sauvola_int = binary_sauvola.astype(np.uint8)
result = cv2.normalize(binary_sauvola_int, dst=None, alpha=0, beta=255,norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
Result:
Note: This result is just a launchpad to try out other image processing techniques to get your desired result.
I want to apply some kind of preprocessing to this image so that text can be more readable, so that later I can read text from image. I'm new to this so I do not know what should I do, should I increase contrast or should I reduce noise, or something else. Basically, I want to remove these gray areas on the image and keep only black letters (as clear as they can be) and white background.
import cv2
img = cv2.imread('slika1.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', img)
cv2.waitKey(0)
thresh = 200
img = cv2.threshold(img, thresh, 255, cv2.THRESH_BINARY)[1]
cv2.imshow('filter',img)
cv2.waitKey(0)
I read the image and applied threshold to the image but I needed to try 20 different thresholds until I found one that gives results.
Is there any better way to solve problems like this?
The problem is that I can get different pictures with different size of gray areas, so sometime I do not need to apply any kind of threshold, and sometimes I do, because of that I think that my solution with threshold is not that good.
For this image, my code works good:
But for this it gives terrible results:
Try division normalization in Python/OpenCV. Divide the input by its blurred copy. Then sharpen. You may want to crop the receipt better or mask out the background first.
Input:
import cv2
import numpy as np
import skimage.filters as filters
# read the image
img = cv2.imread('receipt2.jpg')
# convert to gray
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# blur
smooth = cv2.GaussianBlur(gray, (95,95), 0)
# divide gray by morphology image
division = cv2.divide(gray, smooth, scale=255)
# sharpen using unsharp masking
sharp = filters.unsharp_mask(division, radius=1.5, amount=1.5, multichannel=False, preserve_range=False)
sharp = (255*sharp).clip(0,255).astype(np.uint8)
# save results
cv2.imwrite('receipt2_division.png',division)
cv2.imwrite('receipt2_division_sharp.png',sharp)
# show results
cv2.imshow('smooth', smooth)
cv2.imshow('division', division)
cv2.imshow('sharp', sharp)
cv2.waitKey(0)
cv2.destroyAllWindows()
Division result:
Sharpened result:
Hi I am looking to improve my performance with pytesseract at digit recognition.
I take my raw image and split it into parts that look like this:
The size can vary.
To this I apply some pre-processing methods like so
image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(img, (2, 2), 0)
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)
and I get this
I then pass this to pytesseract:
num = pytesseract.image_to_string(result_img, lang='eng',
config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
However this is not good enough for me and often gets numbers wrong.
I am looking for ways to improve, I have tried to keep this minimal and self contained but let me know if I've not been clear and I will elaborate.
Thank you.
You're on the right track by trying to preprocess the image before performing OCR but using an incorrect approach. There is no reason to dilate or erode the image since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it only contains black and white pixels but it is actually a 3-channel BGR image which is probably why you're getting incorrect OCR results. If you look at Tesseract improve quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the desired text to detect is in black with the background in white. To do this, we can perform a Otsu's threshold
to obtain a binary image then invert it so the text is in the foreground. This will result in our preprocessed image where we can throw it into image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at configuration options for more settings. Here's the results:
Input image -> Binary -> Invert
Result from Pytesseract OCR
8
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
I am trying to remove rules and a background smiley face from multiple notebook pages before performing text detection and recognition on the handwritten text.
An earlier thread offers helpful hints, but my problem is different in several respects.
The text to keep is written over the background items to be removed.
The items to be removed have distinct colors from that of the text, which may be the key to their removal.
The lines to be removed are not very straight, and the smiley face even less so.
I'm thinking of using OpenCV for this task, but I'm open to using ImageMagick or command-line GIMP so long as I can process the entire batch at once. Since I have never used any of these tools before, any advice would be welcome. Thank you.
Here's a simple approach with the assumption that the text is blue
Convert image to HSV format and color threshold with cv2.inRange()
Perform morphological transformations to smooth image
Isolate characters
Recolor characters for OCR/Tesseract
We begin by converting the image to HSV format and create a mask to isolate the characters
image = cv2.imread('1.png')
result = image.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)
Now we perform morphological transformations to remove small noise
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
We have the desired text outlines so we can isolate characters by masking with the original image
result[close==0] = (255,255,255)
Finally to prepare the image for OCR/Tesseract, we change the characters to black
retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]
Full code
import numpy as np
import cv2
image = cv2.imread('1.png')
result = image.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
result[close==0] = (255,255,255)
cv2.imshow('cleaned', result)
retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]
cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.waitKey()
I am implementing a program to detect lines in images from a camera. The problem is that when the photo is blurry, my line detection algorithm misses a few lines. Is there a way to increase the accuracy of the cv.HoughLines() function without editing the parameters?
Example input image:
Desired image:
My current implementation:
def find_lines(img):
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
edges = cv.dilate(gray,np.ones((3,3), np.uint8),iterations=5)
edges = cv.Canny(gray, 50, 150, apertureSize=3)
lines = cv.HoughLines(edges, 1, np.pi/180, 350)
It would be a good idea to preprocess the image before giving it to cv2.HoughLines(). Also I think cv2.HoughLinesP() would be better. Here's a simple approach
Convert image to grayscale
Apply a sharpening kernel
Threshold image
Perform morphological operations to smooth/filter image
We apply a sharpening kernel using cv2.filter2D() which gives us the general shape of the line and removes the blurred sections. Other filters can be found here.
Now we threshold the image to get solid lines
There are small imperfections so we can use morphological operations with a cv2.MORPH_ELLIPSE kernel to get clean diamond shapes
Finally to get the desired result, we dilate using the same kernel. Depending on the number of iterations, we can obtain thinner or wider lines
Left (iterations=2), Right (iterations=3)
import cv2
import numpy as np
image = cv2.imread('1.png', 0)
sharpen_kernel = np.array([[-1,-1,-1], [-1,9,-1], [-1,-1,-1]])
sharpen = cv2.filter2D(image, -1, sharpen_kernel)
thresh = cv2.threshold(sharpen,220, 255,cv2.THRESH_BINARY)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=3)
result = cv2.dilate(opening, kernel, iterations=3)
cv2.imshow('thresh', thresh)
cv2.imshow('sharpen', sharpen)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
cv2.waitKey()
You're looking for image sharpening techniques. You'll find suggestions here.
You can use different kernel operations to achieve this. OpenCV lists this C++ code here
// sharpen image using "unsharp mask" algorithm
Mat blurred; double sigma = 1, threshold = 5, amount = 1;
GaussianBlur(img, blurred, Size(), sigma, sigma);
Mat lowContrastMask = abs(img - blurred) < threshold;
Mat sharpened = img*(1+amount) + blurred*(-amount);
img.copyTo(sharpened, lowContrastMask);
which should be fairly easy to convert to Python.