Good day. I'm trying to identify both the printed and handwritten text from the cheque leaf below,
and here is the image after preprocessing, for which I used the code below:
import cv2
import pytesseract
import numpy as np
img = cv2.imread('Images/cheque_leaf.jpg')
# Rescaling the image (recommended if you're working with images that have a DPI of less than 300)
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
h, w = img.shape[:2]
# Convert the image to grayscale before thresholding
# (this also helps reduce noise):
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1] # perform Otsu threshold
thresh = cv2.rectangle(thresh, (0, 0), (w, h), (0, 0, 0), 2) # draw a black border around the whole image
# Build a small structuring element; eroding with it slightly
# thickens (enriches) the dark characters:
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2,2))
# The function erodes the source image using the specified structuring element that determines
# the shape of a pixel neighborhood over which the minimum is taken
erode = cv2.erode(thresh, kernel, iterations = 1)
# To extract the text:
custom_config = r'--oem 3 --psm 6'
text = pytesseract.image_to_string(erode, config=custom_config)  # pass the eroded image, not thresh
print(text)
Now, using the pytesseract.image_to_string() method to convert the image to text, I'm getting irrelevant output. From the above image I want to identify the date, branch, payee, the amount in both numbers and words, and the signature name followed by the account number.
Are there any OCR techniques to solve the above problem and extract the exact data mentioned? Thanks in advance.
The following is just one of several approaches.
I would suggest using the Sauvola thresholding technique. A threshold is calculated for each pixel in the image using a specific formula (shown below) involving the mean and standard deviation of the pixel values within a window around that pixel.
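For reference, the Sauvola threshold at each pixel is computed as

T(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1))

where m(x, y) and s(x, y) are the mean and standard deviation of the pixel values inside the window centered at (x, y), k is a tunable parameter (scikit-image defaults to 0.2), and R is the dynamic range of the standard deviation (128 for 8-bit images).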
This functionality is available in scikit-image (imported as skimage).
The following is a working example for the given image:
import cv2
import numpy as np
from skimage.filters import threshold_sauvola
img = cv2.imread('cheque.jpg', cv2.IMREAD_GRAYSCALE)
# choosing a window size of 13 (feel free to change it and visualize)
thresh_sauvola = threshold_sauvola(img, window_size=13)
binary_sauvola = img > thresh_sauvola
# converting resulting Boolean array to unsigned integer array of 8-bit (0 - 255)
binary_sauvola_int = binary_sauvola.astype(np.uint8)
result = cv2.normalize(binary_sauvola_int, dst=None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)
Result:
Note: This result is just a launchpad to try out other image processing techniques to get your desired result.
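If the end goal is OCR, the binarized result can then be handed straight to Tesseract; here is a minimal sketch, reusing the --oem 3 --psm 6 configuration from the question:

import pytesseract
# pytesseract accepts NumPy arrays directly:
text = pytesseract.image_to_string(result, config=r'--oem 3 --psm 6')
print(text)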
Related
I have an image I am attempting to split into its separate components. I have successfully created a mask of the objects in the image using k-means clustering (I have included the result and mask below).
I am then trying to crop each individual part of the original image and save it to a new image. Is this possible?
import numpy as np
import cv2
path = 'software (1).jpg'
img = cv2.imread(path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
twoDimage = img.reshape((-1,3))
twoDimage = np.float32(twoDimage)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 2
attempts=10
ret,label,center = cv2.kmeans(twoDimage,K,None,criteria,attempts,cv2.KMEANS_PP_CENTERS)
center = np.uint8(center)
res = center[label.flatten()]
result_image = res.reshape((img.shape))
cv2.imwrite('result.jpg',result_image)
Original image
Result of k-means
My solution involves creating a binary object mask where all the objects are colored in white and the background in black. I then extract each object based on area, from largest to smallest. I use this "isolated object" mask to segment each object in the original image and then write the result to disk. These are the steps:
Resize the image (your original input is gigantic)
Convert to grayscale
Extract each object based on area from largest to smallest
Create a binary mask of the isolated object
Apply a little bit of morphology to enhance the mask
Mask the original BGR image with the binary mask
Apply flood-fill to color the background with white
Save image to disk
Repeat the process for all the objects in the image
Let's see the code. Through the script I use two helper functions: writeImage and findBiggestBlob. The first function is pretty self-explanatory. The second function creates a binary mask of the biggest blob in a binary input image. Both functions are presented here:
# Writes a PNG image:
def writeImage(imagePath, inputImage):
    imagePath = imagePath + ".png"
    cv2.imwrite(imagePath, inputImage, [cv2.IMWRITE_PNG_COMPRESSION, 0])
    print("Wrote Image: " + imagePath)

def findBiggestBlob(inputImage):
    # Store a copy of the input image:
    biggestBlob = inputImage.copy()
    # Set initial values for the largest contour:
    largestArea = 0
    largestContourIndex = 0
    # Find the contours on the binary image:
    contours, hierarchy = cv2.findContours(inputImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
    # Get the largest contour in the contours list:
    for i, cc in enumerate(contours):
        # Find the area of the contour:
        area = cv2.contourArea(cc)
        # Store the index of the largest contour:
        if area > largestArea:
            largestArea = area
            largestContourIndex = i
    # Once we get the biggest blob, paint it black:
    tempMat = inputImage.copy()
    cv2.drawContours(tempMat, contours, largestContourIndex, (0, 0, 0), -1, 8, hierarchy)
    # Erase smaller blobs:
    biggestBlob = biggestBlob - tempMat
    return biggestBlob
Now, let's check out the main script. Let's read the image and get the initial binary mask:
# Imports
import cv2
import numpy as np
# Read image
imagePath = "D://opencvImages//"
inputImage = cv2.imread(imagePath + "L85Bu.jpg")
# Get image dimensions
originalImageHeight, originalImageWidth = inputImage.shape[:2]
# Resize at a fixed scale:
resizePercent = 30
resizedWidth = int(originalImageWidth * resizePercent / 100)
resizedHeight = int(originalImageHeight * resizePercent / 100)
# resize image
inputImage = cv2.resize(inputImage, (resizedWidth, resizedHeight), interpolation=cv2.INTER_LINEAR)
writeImage(imagePath+"objectInput", inputImage)
# Deep BGR copy:
colorCopy = inputImage.copy()
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold with a fixed value of 250:
_, binaryImage = cv2.threshold(grayscaleImage, 250, 255, cv2.THRESH_BINARY_INV)
This is the input resized by 30% according to resizePercent:
And this is the binary mask created with a fixed threshold of 250:
Now, I'm going to run this mask through a while loop. With each iteration I'll extract the biggest blob until there are no blobs left. Each step will create a new binary mask where the only thing present is one object at a time. This will be the key to isolating the objects in the original (resized) BGR image:
# Image counter to write pngs to disk:
imageCounter = 0
# Segmentation flag to stop the processing loop:
segmentObjects = True

while segmentObjects:
    # Get biggest object on the mask:
    currentBiggest = findBiggestBlob(binaryImage)
    # Use a little bit of morphology to "widen" the mask:
    kernelSize = 3
    opIterations = 2
    morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
    # Perform Dilate:
    binaryMask = cv2.morphologyEx(currentBiggest, cv2.MORPH_DILATE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
    # Mask the original BGR (resized) image:
    blobMask = cv2.bitwise_and(colorCopy, colorCopy, mask=binaryMask)
    # Flood-fill at the top left corner:
    fillPosition = (0, 0)
    # Use white color:
    fillColor = (255, 255, 255)
    colorTolerance = (0, 0, 0)
    cv2.floodFill(blobMask, None, fillPosition, fillColor, colorTolerance, colorTolerance)
    # Write file to disk:
    writeImage(imagePath + "object-" + str(imageCounter), blobMask)
    imageCounter += 1
    # Subtract current biggest blob from the original binary mask:
    binaryImage = binaryImage - currentBiggest
    # Check for stop condition - all pixels in the binary mask should be black:
    whitePixels = cv2.countNonZero(binaryImage)
    # Compare against a threshold - 1% of the resized dimensions:
    whitePixelThreshold = 0.01 * (resizedWidth * resizedHeight)
    if whitePixels < whitePixelThreshold:
        segmentObjects = False
There are some things worth noting here. This is the first isolated mask created for the first object:
Nice. Simply ANDing this mask with the BGR image will do. However, I can improve the quality of the mask if I apply a dilate morphological operation. This will "widen" the blob, covering the original outline by a few pixels. (The operation actually searches for the maximum intensity pixel within a neighborhood of pixels.) Next, the masking will produce a BGR image where there's only the object blob on a black background. I don't want that black background, I want it white, so I flood-fill at the top left corner to get the first BGR mask:
I save each mask as a new file on disk. Very cool. Now, the condition to break from the loop is pretty simple: stop when all the blobs have been processed. To achieve this I subtract the current biggest blob from the original binary mask and count the number of white pixels. When the count is below a certain threshold (in this case, 1% of the resized image area), stop the loop.
Check out this gif of every object isolated. Each frame is saved to disk as a png file:
An alternative, more compact approach:
1. Detect external contours.
2. For each contour:
   2.1 create a black mask image
   2.2 draw the i-th filled contour on the mask image
   2.3 create a black result image for the i-th part
   2.4 copy from the source image to the result image using the mask just created
   2.5 save the result image for the i-th part
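A minimal Python/OpenCV sketch of those steps (assuming binaryImage is the object mask from earlier and inputImage is the resized BGR source; the names are illustrative):

import cv2
import numpy as np

# 1. Detect external contours on the binary mask:
contours, _ = cv2.findContours(binaryImage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for i, c in enumerate(contours):
    # 2.1 Create a black mask image:
    mask = np.zeros(binaryImage.shape[:2], dtype=np.uint8)
    # 2.2 Draw the i-th filled contour on the mask:
    cv2.drawContours(mask, contours, i, 255, -1)
    # 2.3 / 2.4 Copy the masked pixels from the source into a black result image:
    part = cv2.bitwise_and(inputImage, inputImage, mask=mask)
    # 2.5 Save the result image for the i-th part:
    cv2.imwrite("part-" + str(i) + ".png", part)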
I'm using py-tesseract for OCR on images as below but I'm unable to get consistent output from the unprocessed images. How can the spotted background be reduced and the numbers highlighted using cv2 to increase accuracy? I'm also interested in keeping the separators in the output string.
The pre-processing below seems to work with some accuracy:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (7, 7), 0)
(T, threshInv) = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
Output obtained using --psm 6: 6.903.722,99
Here's one solution, based on the ideas in a similar post. The main idea is to apply a Hit-or-Miss operation looking for the pattern you want to eliminate. In this case the pattern is one black (or white, if you invert the image) pixel surrounded by pixels of the complementary color. I've also included a thresholding operation with some bias, because some of the characters are easily destroyed (you would really benefit from a higher-resolution image). These are the steps:
Get grayscale image via color conversion
Threshold with bias to get a binary image
Apply the Hit-or-Miss with one central pixel target kernel
Use the result from the prior operation to suppress the noise in the original image
Let's see the code:
# Imports:
import numpy as np
import cv2
# Image path:
path = "D://opencvImages//"
fileName = "8WFNvsZ.jpg"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu:
thresh, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Use Otsu's threshold value and add some bias:
thresh = 1.05 * thresh
_, binaryImage = cv2.threshold(grayscaleImage, thresh, 255, cv2.THRESH_BINARY_INV)
The first bit of code gets the binary image of the input. Note that I've added some bias to the threshold obtained via Otsu to avoid degrading the characters. This is the result:
Ok, let's apply the Hit-or-Miss operation to get the dot mask:
# Perform morphological hit or miss operation
kernel = np.array([[-1,-1,-1], [-1,1,-1], [-1,-1,-1]])
dotMask = cv2.filter2D(binaryImage, -1, kernel)
# Bitwise-xor mask with binary image to remove dots
result = cv2.bitwise_xor(binaryImage, dotMask)
The dot mask is this:
And the result of subtracting (or XORing) this mask from the original binary image is this:
If I run the inverted (black text on white background) result image on PyOCR I get this string output:
Text is: 6.003.722,09
The other image produces this final result:
And its OCR returns this:
Text is: 4.705.640,00
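For completeness, if you use pytesseract instead of PyOCR, a minimal sketch of the final OCR step on the result image from above might be:

import cv2
import pytesseract

# Tesseract prefers dark text on a light background, so invert the mask first:
inverted = cv2.bitwise_not(result)
text = pytesseract.image_to_string(inverted, config="--psm 6")
print("Text is: " + text)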
I want to auto-adjust the brightness and contrast of a color image taken with a phone under different lighting conditions. Please help me, I am new to OpenCV.
Source:
Input Image
Result:
result
What I am looking for is more of a localized transformation. In essence, I want the shadow to get as light as possible (completely gone, if possible), the darker pixels of the image to get darker and more contrasty, and the light pixels to get whiter, but not to the point where anything gets overexposed.
I have tried CLAHE, histogram equalization, binary thresholding, adaptive thresholding, etc., but nothing has worked.
My initial thought is that I need to neutralize the highlights and bring the darker pixels closer to the average value while keeping the text and lines as dark as possible, and then maybe apply a contrast filter. But I am unable to get that result; please help me.
Here is one way to do that in Python/OpenCV.
Read the input
Increase contrast
Convert original to grayscale
Adaptive threshold
Use the thresholded image to make the background white on the contrast increased image
Save results
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("math_diagram.jpg")
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do adaptive threshold on gray image
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 15)
# make background of input white where thresh is white
result = img.copy()
result[thresh==255] = (255,255,255)
# write results to disk
cv2.imwrite("math_diagram_threshold.jpg", thresh)
cv2.imwrite("math_diagram_processed.jpg", result)
# display it
cv2.imshow("THRESHOLD", thresh)
cv2.imshow("RESULT", result)
cv2.waitKey(0)
Threshold image:
Result:
You can use any local binarization method. In OpenCV there is one such method, Wolf-Jolion local binarization, which can be applied to the input image. Below is a code snippet as an example:
import cv2
image = cv2.imread('input.jpg')
# Use the V (value) channel of HSV as the grayscale image:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)[:, :, 2]
T = cv2.ximgproc.niBlackThreshold(gray, maxValue=255, type=cv2.THRESH_BINARY_INV, blockSize=81, k=0.1, binarizationMethod=cv2.ximgproc.BINARIZATION_WOLF)
grayb = (gray > T).astype("uint8") * 255
cv2.imshow("Binary", grayb)
cv2.waitKey(0)
The output result from the above code is shown below. Please note that to use the ximgproc module you need to install the OpenCV contrib package (opencv-contrib-python).
I'm currently trying to detect numbers from small screenshots, but I have found the accuracy to be quite poor. I've been using OpenCV; the image is captured in RGB and converted to greyscale, then thresholding has been performed using a global value (I found adaptive didn't work so well).
Here is an example greyscale of one of the numbers, followed by an example of the image post-thresholding (the numbers can range from 1 to 99). Note that the initial screenshot of the image is quite small and has thus been enlarged.
Any suggestions on how to improve accuracy using OpenCV, or a different system altogether, are much appreciated. Some code is included below; the function is passed an RGB screenshot of the number.
def getNumber(image):
    image = cv2.resize(image, (0, 0), fx=3, fy=3)
    img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh, image_bin = cv2.threshold(img, 125, 255, cv2.THRESH_BINARY)
    image_final = PIL.Image.fromarray(image_bin)
    txt = pytesseract.image_to_string(
        image_final, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    return txt
Here's what I could improve: using Otsu thresholding is more efficient at separating text from background than picking an arbitrary value. Tesseract works better with black text on a white background, and I also added padding, as Tesseract struggles to recognize characters that are too close to the border.
This is the final image (https://i.stack.imgur.com/OaJgQ.png), and pytesseract manages to read "46":
import cv2
import numpy
import pytesseract

def getNumber(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding automatically finds the best threshold value:
    _, binary_image = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
    # Invert the image if the text is white and the background is black:
    count_white = numpy.sum(binary_image > 0)
    count_black = numpy.sum(binary_image == 0)
    if count_black > count_white:
        binary_image = 255 - binary_image
    # Padding (note: pad the processed binary image, not the original input):
    final_image = cv2.copyMakeBorder(binary_image, 10, 10, 10, 10, cv2.BORDER_CONSTANT, value=(255, 255, 255))
    txt = pytesseract.image_to_string(
        final_image, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')
    return txt
Function is executed as :
>> getNumber(cv2.imread(img_path))
EDIT: note that you do not need this line:
image_final = PIL.Image.fromarray(image_bin)
as you can pass pytesseract an image in NumPy array format (which cv2 uses). Also, Tesseract accuracy only drops for characters under 35 pixels high (and also for much bigger ones; 35 px is actually the optimal height), so I did not resize the image.
I am working in Python with OpenCV 3.0. In order to find the largest white pixel region, I first thresholded the grayscale image to a binary image.
import cv2
import numpy as np
img = cv2.imread('graimage.png')
img = cv2.resize(img,(400,500))
gray = img.copy()
(thresh, im_bw) = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY )
derp,contours,hierarchy = cv2.findContours(im_bw,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
cnts = max(cnts, key=cv2.contourArea)
But it shows error as follows.
cv2.error: ..../opencv/modules/imgproc/src/contours.cpp:198: error: (-210) [Start]FindContours supports only CV_8UC1 images when mode != CV_RETR_FLOODFILL otherwise supports CV_32SC1 images only in function cvStartFindContours.
It looks like this was answered in the comments, but just to mark the question as answered:
CV_8UC1 means 8-bit pixels, unsigned, and only one channel, so grayscale. It looks like you're reading it in with 3 color channels, or CV_8UC3. You can check the image type by printing img.dtype and img.shape. The dtype should be uint8, and the shape should be (#, #), indicating two dimensions. I'm guessing you'll see that shape prints (#, #, 3) for your image as-is, indicating three color channels.
As @user3515225 said, you can fix that by reading the image in as grayscale using cv2.imread('img.png', cv2.IMREAD_GRAYSCALE). That assumes you have no use for color anywhere else, though. If you want a separate grayscale copy of the image, then replace gray = img.copy() with gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) instead.
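Putting both fixes together, a sketch of the corrected snippet (keeping the OpenCV 3.x three-value findContours signature from your code, and adding the Otsu flag so a meaningful threshold is actually computed rather than 0) might look like:

import cv2

img = cv2.imread('graimage.png')
img = cv2.resize(img, (400, 500))
# Single-channel (CV_8UC1) image, as findContours expects:
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(thresh, im_bw) = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
derp, contours, hierarchy = cv2.findContours(im_bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
cnts = max(contours, key=cv2.contourArea)  # the largest contour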