How to separate foreground text from noisy background in Python?

I am trying to extract the text from this image.
I tried adjusting contrast and brightness, smoothing (e.g. GaussianBlur, medianBlur), and thresholding techniques (e.g. Otsu) with OpenCV, but there is still a lot of remaining noise.
Is there anything else I can try?

You could try a combination of Gaussian blurring, thresholding, and morphological operations to isolate the text. Here's the pipeline:
Blur -> Threshold -> Opening -> Dilation -> Bitwise-and
import cv2
# Load image, grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (7,7), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove small noise particles
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,2))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=3)
# Repair text
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,7))
dilate = cv2.dilate(opening, kernel, iterations=2)
# Bitwise-and with input image
result = cv2.bitwise_and(image,image,mask=dilate)
result[dilate==0] = (255,255,255)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('dilate', dilate)
cv2.imshow('result', result)
cv2.waitKey()

If a more general solution is required, read on. Otherwise, you can refer to nathancy's answer, or to plenty of other answers on this site.
I assume that "extract the text from this image" means you want the text from this image as a string, or the ROI containing the text.
This is called OCR (Optical Character Recognition) and is a pretty complicated deep learning problem, especially for the type of image you posted (noisy, low sharpness, low dynamic range, etc.). If you are looking for a vanilla OpenCV library that can do this out of the box then, as far as I know, there isn't any.
Check these links for source code and explanations:
OCR with pytesseract
Text ROI detection using EAST
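For a taste of the pytesseract route, here is a minimal sketch of my own (not from either linked post). It assumes Tesseract is installed and on your PATH, and '1.png' is a placeholder for your noisy input:
import cv2
import pytesseract
# Minimal OCR sketch: binarize so the text ends up black on a white
# background, then hand the result to Tesseract.
# '1.png' is a placeholder filename -- substitute your own image.
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
print(pytesseract.image_to_string(thresh, config='--psm 6'))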

Related

Image Enhancing in Python

So, I have been trying to enhance images so that I can use text recognition, but since the images are extremely low quality and I am a beginner, I haven't been able to get good results.
Below is the original image:
First, I resized the image:
import cv2
import numpy as np
img = cv2.imread('test.jpg')
cv2.imshow('Original',img)
cv2.waitKey(0)
img = cv2.resize(img,(500,500),interpolation = cv2.INTER_AREA)
cv2.imshow('Resized',img)
cv2.waitKey(0)
then I converted the image to grayscale
img_gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow('GRAY',img_gray)
cv2.waitKey(0)
I did some thresholding
ret, img_threshold = cv2.threshold(img_gray, 70, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('THRESHOLD', img_threshold)
cv2.waitKey(0)
and I used morphology to get a better image
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(6,6))
opening = cv2.morphologyEx(img_threshold, cv2.MORPH_OPEN, kernel, iterations = 2)
kernel = np.ones((9,9),np.uint8)
open_img = cv2.morphologyEx(opening, cv2.MORPH_OPEN, kernel, iterations = 3)
cv2.imshow('OPENING',open_img)
cv2.waitKey(0)
My final result is below:
My question is: how can I remove the white chunks and the line crossing the numbers?
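(One common trick for the crossing line, sketched here as a starting point rather than a tested answer: open with a wide, flat kernel so only long horizontal structures survive, then subtract them from the binary image. It assumes the img_threshold image from above; the (25, 1) kernel size is a guess to tune.)
# Sketch, not a tested answer: detect the roughly horizontal line with a
# wide, flat structuring element, then subtract it from the binary image.
# The (25, 1) kernel size is an assumption -- tune it to the line's length.
horiz_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
line_mask = cv2.morphologyEx(img_threshold, cv2.MORPH_OPEN, horiz_kernel, iterations=1)
no_line = cv2.subtract(img_threshold, line_mask)
cv2.imshow('LINE REMOVED', no_line)
cv2.waitKey(0)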

Compare two different images and find the differences

I have a webcam which takes pictures of a concrete slab. Now I want to check if there are objects on the slab or not. The objects could be anything, so they cannot be enumerated as a fixed set of classes. Unfortunately I cannot compare the webcam image directly with an image of the slab without objects, because the camera image can shift minimally in the x and y direction and the lighting is also not always the same. So I cannot use cv2.subtract.
I would prefer a foreground/background subtraction, where the background is just my concrete slab and the foreground is the objects. But since the objects don't move but lie still on the slab, I can't use cv2.createBackgroundSubtractorMOG2() either.
The pictures look like this:
The concrete slab without any objects:
The slab with objects:
In Python/OpenCV, you could do division normalization to even out the illumination and make the background white. Then do your subtraction. Then use morphology to clean up small regions. Then find contours and discard any small regions that are due to noise left after the division normalization and morphology.
Here is how to do division normalization.
Input 1:
Input 2:
import cv2
import numpy as np
# load images
img1 = cv2.imread("img1.jpg")
img2 = cv2.imread("img2.jpg")
# convert to grayscale
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
# blur
blur1 = cv2.GaussianBlur(gray1, (0,0), sigmaX=13, sigmaY=13)
blur2 = cv2.GaussianBlur(gray2, (0,0), sigmaX=13, sigmaY=13)
# divide
divide1 = cv2.divide(gray1, blur1, scale=255)
divide2 = cv2.divide(gray2, blur2, scale=255)
# threshold
thresh1 = cv2.threshold(divide1, 200, 255, cv2.THRESH_BINARY)[1]
thresh2 = cv2.threshold(divide2, 200, 255, cv2.THRESH_BINARY)[1]
# morphology
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
morph1 = cv2.morphologyEx(thresh1, cv2.MORPH_OPEN, kernel)
morph2 = cv2.morphologyEx(thresh2, cv2.MORPH_OPEN, kernel)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
morph1 = cv2.morphologyEx(morph1, cv2.MORPH_CLOSE, kernel)
morph2 = cv2.morphologyEx(morph2, cv2.MORPH_CLOSE, kernel)
# write result to disk
cv2.imwrite("img1_division_normalize.jpg", divide1)
cv2.imwrite("img2_division_normalize.jpg", divide2)
cv2.imwrite("img1_division_morph1.jpg", morph1)
cv2.imwrite("img1_division_morph2.jpg", morph2)
# display it
cv2.imshow("img1_norm", divide1)
cv2.imshow("img2_norm", divide2)
cv2.imshow("img1_thresh", thresh1)
cv2.imshow("img2_thresh", thresh2)
cv2.imshow("img1_morph", morph1)
cv2.imshow("img2_morph", morph2)
cv2.waitKey(0)
cv2.destroyAllWindows()
Image 1 Normalized:
Image 2 Normalized:
Image 1 thresholded and morphology cleaned:
Image 2 thresholded and morphology cleaned:
In this case, Image 1 becomes completely white. So it (and subtraction) is not really needed. You just need to find contours for the second image result and if necessary discard tiny regions by area. The rest are your objects.
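The contour step described above might look like the following sketch (the 500 px² area cutoff is my assumption; tune it for your images):
# Sketch of the contour-filtering step: invert so the objects are white,
# find external contours, and keep only those above an area cutoff.
# The 500 px^2 cutoff is an assumption -- tune it for your images.
inverted = 255 - morph2
contours = cv2.findContours(inverted, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
objects = [c for c in contours if cv2.contourArea(c) > 500]
vis = img2.copy()
cv2.drawContours(vis, objects, -1, (0, 0, 255), 2)
cv2.imshow("objects", vis)
cv2.waitKey(0)
cv2.destroyAllWindows()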

Using pytesseract to get text from an image

I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
The original image looks like this:
The resulting image looks like this:
I do this for a bunch of numbers with the same font in the same location; here are the results:
It still gives no text in the output. For a few of the images it does, but not for all, even though the images look nearly identical.
Here is a snippet of the code I'm using:
def checkCurrentState():
    """image = pyautogui.screenshot()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    cv2.imwrite("screenshot.png", image)"""
    image = cv2.imread("screenshot.png")
    checkNumbers(image)

def checkNumbers(image):
    numbers = []
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    for i in storeLocations:
        cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
        number = pytesseract.image_to_string(cropped)
        numbers.append(number)
        print(number)
        cv2.imshow("Screenshot", cropped)
        cv2.waitKey(0)
To perform OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. Here's a simple approach using OpenCV and Pytesseract OCR.
To do this, we convert to grayscale, apply a slight Gaussian blur, then Otsu's threshold to obtain a binary image. From here, we can apply morphological operations to remove noise. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Here's a visualization of each step:
Input image
Convert to grayscale -> Gaussian blur
Otsu's threshold -> Morph open to remove noise
Result from Pytesseract OCR
1100
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
print(data)
cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.waitKey()

How to improve OCR with Pytesseract text recognition?

Hi, I am looking to improve my performance with pytesseract at digit recognition.
I take my raw image and split it into parts that look like this:
The size can vary.
To this I apply some preprocessing methods, like so:
import cv2
import numpy as np
image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(image, (2, 2))
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)
and I get this
I then pass this to pytesseract:
num = pytesseract.image_to_string(result_img, lang='eng',
                                  config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
However, this is not good enough for me and often gets numbers wrong.
I am looking for ways to improve. I have tried to keep this minimal and self-contained, but let me know if I haven't been clear and I will elaborate.
Thank you.
You're on the right track by trying to preprocess the image before performing OCR, but you're using an incorrect approach. There is no reason to dilate or erode the image, since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it only contains black and white pixels, but it is actually a 3-channel BGR image, which is probably why you're getting incorrect OCR results. If you look at Tesseract improve quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the desired text to detect is in black with the background in white.
To do this, we can apply Otsu's threshold to obtain a binary image, then invert it. The result is our preprocessed image, which we can throw into image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at configuration options for more settings. Here are the results:
Input image -> Binary -> Invert
Result from Pytesseract OCR
8
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()

Remove surrounding lines and background graphic noise from handwritten text

I am trying to remove rules and a background smiley face from multiple notebook pages before performing text detection and recognition on the handwritten text.
An earlier thread offers helpful hints, but my problem is different in several respects.
The text to keep is written over the background items to be removed.
The items to be removed have distinct colors from that of the text, which may be the key to their removal.
The lines to be removed are not very straight, and the smiley face even less so.
I'm thinking of using OpenCV for this task, but I'm open to using ImageMagick or command-line GIMP so long as I can process the entire batch at once. Since I have never used any of these tools before, any advice would be welcome. Thank you.
Here's a simple approach with the assumption that the text is blue:
Convert image to HSV format and color threshold with cv2.inRange()
Perform morphological transformations to smooth image
Isolate characters
Recolor characters for OCR/Tesseract
We begin by converting the image to HSV format and creating a mask to isolate the characters:
image = cv2.imread('1.png')
result = image.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)
Now we perform morphological transformations to remove small noise
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
We have the desired text outlines so we can isolate characters by masking with the original image
result[close==0] = (255,255,255)
Finally to prepare the image for OCR/Tesseract, we change the characters to black
retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]
Full code
import numpy as np
import cv2
image = cv2.imread('1.png')
result = image.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
result[close==0] = (255,255,255)
cv2.imshow('cleaned', result)
retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]
cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.waitKey()
