Image Enhancing in Python

I have been trying to enhance images so that I can run text recognition on them, but because the images are extremely low quality and I am a beginner, I haven't managed to get good results.
Below is the original image:
Original Image:
First I resized the image
import cv2
import numpy as np
img = cv2.imread('test.jpg')
img = cv2.resize(img, (500, 500), interpolation=cv2.INTER_AREA)
Then I converted the image to grayscale:
img_gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
I applied an inverse binary threshold:
ret, img_threshold = cv2.threshold(img_gray, 70, 255, cv2.THRESH_BINARY_INV)
cv2.imshow('THRESHOLD', img_threshold)
and I used morphological opening to get a cleaner image:
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(6,6))
opening = cv2.morphologyEx(img_threshold, cv2.MORPH_OPEN, kernel, iterations = 2)
kernel = np.ones((9,9),np.uint8)
open_img = cv2.morphologyEx(opening, cv2.MORPH_OPEN, kernel, iterations = 3)
My final result is below:
Final Image 2
My question is: how can I remove the white chunks and the line crossing the numbers?


Identify lines/dots on white image discarding the patterns

I'm working on computer vision and I have an image as shown below:
I want to identify the black line on the tissue. I have tried the following code:
import cv2
img = cv2.imread('image.png')
# convert img to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# do morphology gradient
kernel = cv2.getStructuringElement(cv2.MORPH_RECT , (3,3))
morph = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
# apply gain
morph = cv2.multiply(morph, 10)
morph=cv2.resize(morph, (1000, 552))
# stackImages is not an OpenCV function; it is a helper defined elsewhere and not shown here
imgStack = stackImages(0.5, ([img], [morph]))
cv2.imshow('Stacked Images', imgStack)
The code above gives:
As we can see, the existing pattern dominates and it is difficult to identify the line. How can I discard the regular pattern and identify the anomalies?
I tried the other answers on Stack Overflow, but nothing seems to work.
In reference to the comments, I was suggesting applying a global threshold with cv2.threshold():
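import cv2
img = cv2.imread(r'C:\Users\524316\Desktop\Stack\tissue.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshold_value = 20
# pixels at or below threshold_value become white, everything else becomes black
th = cv2.threshold(gray, threshold_value, 255, cv2.THRESH_BINARY_INV)[1]
# cv2.imshow needs a window name as its first argument
cv2.imshow('gray | threshold', cv2.hconcat([gray, th]))
cv2.waitKey(0)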
Notice the black line highlighted while no other patterns are affected.
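To make the effect of the inverse global threshold concrete, here is a minimal NumPy-only sketch of what cv2.threshold computes with cv2.THRESH_BINARY_INV. The function name threshold_binary_inv and the synthetic 8x8 "tissue" array are made up for illustration; they are not from the original post.

```python
import numpy as np

def threshold_binary_inv(gray, thresh, maxval=255):
    """Inverse binary threshold: pixels above thresh -> 0, all others -> maxval."""
    return np.where(gray > thresh, 0, maxval).astype(np.uint8)

# Synthetic stand-in for the tissue image (an assumption for this demo):
# a bright background with a regular, mid-gray texture and one dark line.
gray = np.full((8, 8), 180, dtype=np.uint8)
gray[::2, ::2] = 120   # repeating pattern, still well above the threshold
gray[4, :] = 10        # the dark line we want to isolate

th = threshold_binary_inv(gray, 20)
print(th)
```

Because the texture never gets darker than the threshold, only the dark line survives in the output. On a real image the value 20 would have to be tuned to the actual gray levels.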

Using pytesseract to get text from an image

I'm trying to use pytesseract to convert some images into text. The images are very basic and I tried using some preprocessing:
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.bitwise_not(gray)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
The original image looks like this:
The resulting image looks like this:
I do this for a bunch of numbers with the same font in the same location. Here are the results:
It still gives no text in the output. For a few of the images, it does, but not for all and the images look nearly identical.
Here is a snippet of the code I'm using:
def checkCurrentState():
    """image = pyautogui.screenshot()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
    cv2.imwrite("screenshot.png", image)"""
    image = cv2.imread("screenshot.png")

def checkNumbers(image):
    numbers = []
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_not(gray)
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    for i in storeLocations:
        cropped = gray[i[1]:i[1]+storeHeight, i[0]:i[0]+storeWidth]
        number = pytesseract.image_to_string(cropped)
        cv2.imshow("Screenshot", cropped)
To perform OCR on an image, it's important to preprocess the image. The idea is to obtain a processed image where the text to extract is in black with the background in white. Here's a simple approach using OpenCV and Pytesseract OCR.
To do this, we convert to grayscale, apply a slight Gaussian blur, then Otsu's threshold to obtain a binary image. From here, we can apply morphological operations to remove noise. We perform text extraction using the --psm 6 configuration option to assume a single uniform block of text. Take a look here for more options.
Here's a visualization of each step:
Input image
Convert to grayscale -> Gaussian blur
Otsu's threshold -> Morph open to remove noise
Result from Pytesseract OCR
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Grayscale, Gaussian blur, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Morph open to remove noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=1)
# Perform text extraction
data = pytesseract.image_to_string(opening, lang='eng', config='--psm 6')
cv2.imshow('blur', blur)
cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
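For readers wondering what the Otsu step above actually chooses, here is a minimal NumPy-only sketch of the idea: try every gray level as a threshold and keep the one that maximizes the between-class variance. This is an illustration, not OpenCV's implementation, and the function name otsu_threshold is hypothetical.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t that best splits a uint8 image into <= t and > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    cum_count = np.cumsum(hist)            # number of pixels with value <= t
    cum_sum = np.cumsum(hist * levels)     # sum of pixel values <= t
    total_count, total_sum = cum_count[-1], cum_sum[-1]

    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = cum_count[t]
        w1 = total_count - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is no split to score
        mean0 = cum_sum[t] / w0
        mean1 = (total_sum - cum_sum[t]) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t

# Toy bimodal image: dark "text" (50) on a light background (200).
gray = np.full((10, 10), 200, dtype=np.uint8)
gray[3:7, 2:8] = 50
t = otsu_threshold(gray)
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)  # text becomes white
```

This is why the answer passes 0 as the threshold argument together with cv2.THRESH_OTSU: the supplied value is ignored and the threshold is derived from the histogram instead.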

How to improve pytesseract function for captcha decoding?

I want to extract the numbers from an image in python. In order to do that, I have chosen pytesseract. When I tried extracting the text from the image, the results weren't satisfactory. I also went through the following code and implemented all the techniques listed with other answers. Yet, it doesn't seem to perform well.
sample images:
and my code is:
import cv2 as cv
import pytesseract
from PIL import Image
import matplotlib.pyplot as plt
pytesseract.pytesseract.tesseract_cmd = r"E:\tesseract\tesseract.exe"
def recognize_text(image):
    # edge-preserving filter for denoising (sp=10, sr=150)
    dst = cv.pyrMeanShiftFiltering(image, sp=10, sr=150)
    # grayscale image
    gray = cv.cvtColor(dst, cv.COLOR_BGR2GRAY)
    # binarization
    ret, binary = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)
    # morphological operations: erosion followed by dilation
    erode = cv.erode(binary, None, iterations=2)
    dilate = cv.dilate(erode, None, iterations=1)
    # invert in place so the background is white and the text is black, for easier recognition
    cv.bitwise_not(dilate, dilate)
    # recognize
    test_message = Image.fromarray(dilate)
    custom_config = r'digits'
    text = pytesseract.image_to_string(test_message, config=custom_config)
    print(f' recognition result :{text}')

src = cv.imread(r'roughh/testt/f.jpg')
recognize_text(src)
The problem is that my code only works on the images of '396156' and '436359', not on any of the others. Please suggest some improvements to my code.
I don't know if you've solved your problem, but this kind of image must be pre-processed using this solution. You will need to tweak the parameters. I worked with a similar dataset and the aforementioned solution works well. Let me know your results.
Editing the answer
I'm expanding my answer so that it is more than just a link for reference.
The key to this kind of problem is image pre-processing. The main idea is to clean up the input image, keeping just the characters.
Given an input image as
We want an output image as
The following code contains the image pre-processing that I used, based on that solution:
import cv2 as cv
import imutils
import numpy as np
import pytesseract as tsr

# loading the image and checking its height and width
img = cv.imread('PNgCd.jpg')
(h, w) = img.shape[:2]
print("Height: {} Width:{}".format(h, w))
cv.imshow('Image', img)
# converting to RGB order and resizing the image
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
img = imutils.resize(img, width=450)  # resize to a width of 450 px
cv.imshow('Image', img)
# grayscale
gray = cv.cvtColor(img, cv.COLOR_RGB2GRAY)
cv.imshow('Gray', gray)
# image thresholding with Otsu's method and inversion
thresh = cv.threshold(gray, 0, 255, cv.THRESH_BINARY_INV | cv.THRESH_OTSU)[1]
cv.imshow('Thresh Otsu', thresh)
# distance transform
dist = cv.distanceTransform(thresh, cv.DIST_L2, 5)
dist = cv.normalize(dist, dist, 0, 1.0, cv.NORM_MINMAX)
dist = (dist*255).astype('uint8')
cv.imshow('dist', dist)
# image thresholding with binary operation
# NOTE: this line is cut off after "cv.THRESH_BINARY |" in the source; cv.THRESH_OTSU is an assumed completion
dist = cv.threshold(dist, 0, 255, cv.THRESH_BINARY | cv.THRESH_OTSU)[1]
cv.imshow('thresh binary', dist)
# morphological operation
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
opening = cv.morphologyEx(dist, cv.MORPH_OPEN, kernel)
cv.imshow('Morphological - Opening', opening)
# dilation or erosion (it depends on your image)
kernel = cv.getStructuringElement(cv.MORPH_CROSS, (2,2))
dilation = cv.dilate(opening, kernel, iterations = 1)
cv.imshow('Dilation', dilation)
# find contours and filter them
cnts = cv.findContours(dilation.copy(), cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
nums = []
for c in cnts:
    (x, y, w, h) = cv.boundingRect(c)
    # keep only contours large enough to be characters
    if w >= 5 and h > 15:
        nums.append(c)
# convex hull and image masking
nums = np.vstack(nums)
hull = cv.convexHull(nums)
mask = np.zeros(dilation.shape[:2], dtype='uint8')
cv.drawContours(mask, [hull], -1, 255, -1)
mask = cv.dilate(mask, None, iterations = 2)
cv.imshow('mask', mask)
# bitwise AND to retrieve the characters from the processed image
final = cv.bitwise_and(dilation, dilation, mask=mask)
cv.imshow('final', final)
cv.imwrite('final.jpg', final)
# OCR'ing the pre-processed image
config = "--psm 7 -c tessedit_char_whitelist=0123456789"
text = tsr.image_to_string(final, config=config)
cv.waitKey(0)
The code is an example of how to deal with this kind of image. Keep in mind that Tesseract is not perfect and requires clean images to work well. This code can also fail on other images like this one, in which case you must tweak the parameters or try other pre-processing techniques. You should also know the --psm modes: here I've used --psm 7, which treats the image as a single text line. For this kind of image you can also try --psm 8, which treats the image as a single word. This code is just a starting point; you can improve it according to your needs.
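Since trying different --psm modes is part of the tuning, it can help to build the config string in one place. The helper below is a hypothetical convenience (tesseract_config is not part of pytesseract); it only assembles the same --psm and tessedit_char_whitelist options used in the code above.

```python
def tesseract_config(psm=7, whitelist=None):
    """Build a Tesseract config string such as '--psm 7 -c tessedit_char_whitelist=0123456789'."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # restrict recognition to the given characters, e.g. digits only
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)

# Configs for "single text line" (7) and "single word" (8), digits only.
configs = [tesseract_config(psm, "0123456789") for psm in (7, 8)]
for cfg in configs:
    print(cfg)
```

Each string can then be passed as the config argument of image_to_string, and the results compared across modes on a few sample images.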

How to remove glare from images in opencv?

This Mathematica code removes glare from an image:
img = Import["foo.png"]
Inpaint[img, Dilation[saturated, DiskMatrix[20]]]
as shown in the most upvoted answer here:
I want to use opencv instead of Mathematica to get the same result. How would I write equivalent code in opencv-python?
Here is how to do that in Python/OpenCV.
However, I do not think the OpenCV inpainting routines are working, or at least they are not working well with my Python 3.7.5 and OpenCV 3.4.8.
import cv2
import numpy as np
# read image
img = cv2.imread('apple.png')
# convert to gray
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold grayscale image to extract glare
mask = cv2.threshold(gray, 220, 255, cv2.THRESH_BINARY)[1]
# Optionally add some morphology close and open, if desired
#kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7,7))
#mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)
#kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3))
#mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=1)
# use mask with input to do inpainting
result = cv2.inpaint(img, mask, 21, cv2.INPAINT_TELEA)
# write result to disk
cv2.imwrite("apple_mask.png", mask)
cv2.imwrite("apple_inpaint.png", result)
# display it
cv2.imshow("IMAGE", img)
cv2.imshow("GRAY", gray)
cv2.imshow("MASK", mask)
cv2.imshow("RESULT", result)
Thresholded image:
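One difference from the Mathematica snippet is that it dilates the saturated mask before inpainting, while the Python code above leaves the morphology commented out. As a rough, NumPy-only illustration of that mask step (threshold, then grow the mask), here is a small sketch. The function name glare_mask, the square neighbourhood and the radius are assumptions for the demo; with OpenCV the growing step would normally be done with cv2.dilate.

```python
import numpy as np

def glare_mask(gray, thresh=220, radius=2):
    """Mark pixels brighter than thresh, then grow the mask by `radius` pixels."""
    mask = gray > thresh
    h, w = mask.shape
    # pad with False so shifted views never read outside the image
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    grown = np.zeros_like(mask)
    # dilation with a (2*radius+1) x (2*radius+1) square: OR of all shifted copies
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            grown |= padded[dy:dy + h, dx:dx + w]
    return grown.astype(np.uint8) * 255

# Toy example: one saturated pixel on a mid-gray background.
gray = np.full((9, 9), 100, dtype=np.uint8)
gray[4, 4] = 255
mask = glare_mask(gray, thresh=220, radius=1)
print(mask)
```

Growing the mask makes the inpainting cover the soft halo around each highlight rather than only its saturated core. Whether that improves the result on a given image is something to check experimentally.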

How to extract text from an image with a slight background present?

I'm looking to extract the text from an image, but the output I am receiving is not very accurate. I wonder if there are any additional steps I can take to process the image further and increase the accuracy of the OCR.
I've looked into some of the different ways to process the image and improve the OCR results. The image is quite small and I've been able to blow it up slightly, but to no avail.
The image will always be horizontal, no other text will be present other than the numbers. The maximum number will go up to 55000.
An example of the image in question:
After image processing, my image is scaled up by 4 on the X and Y axes and some saturation is removed, although this does not improve the accuracy at all.
image = self._process(scale=6, iterations=2)
text = pytesseract.image_to_string(image, config="--psm 7")
My process method is doing the following:
# Resize and desaturate.
image = cv2.resize(image, None, fx=scale, fy=scale)  # any further arguments are cut off in the original post
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply dilation and erosion.
kernel = np.ones((1, 1), np.uint8)
image = cv2.dilate(image, kernel, iterations=iterations)
image = cv2.erode(image, kernel, iterations=iterations)
return image
Expected: "10411"
The actual value is varied, usually an unrecognizable string, or some numbers are parsed but the accuracy rate is too low to be usable.
I don't have experience with OCR, but I think you're on the right track: increasing the image size so the algorithm has more pixels to work with and increasing the distinction between the numbers and the background.
Tricks I added: thresholding the image, which creates a mask where only the white pixels remain. There were a few white blobs that were not numbers, so I used findContours to color those unwanted blobs black.
import numpy as np
import cv2
# load image
image = cv2.imread('number.png')
# resize image
image = cv2.resize(image,None,fx=5, fy=5, interpolation = cv2.INTER_CUBIC)
# create grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# perform threshold
retr, mask = cv2.threshold(gray_image, 230, 255, cv2.THRESH_BINARY)
# find contours ([-2] selects the contour list in both OpenCV 3 and OpenCV 4,
# where findContours returns 3 and 2 values respectively)
contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
# draw black over the contours smaller than 200 - remove unwanted blobs
for cnt in contours:
    # print the contour area here to determine a suitable threshold
    if cv2.contourArea(cnt) < 200:
        cv2.drawContours(mask, [cnt], 0, (0), -1)
#show image
cv2.imshow("Result", mask)
cv2.imshow("Image", image)
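The blob-removal idea above can also be sketched without OpenCV, which makes the logic easier to see: find each connected white region and blank it if it is too small. This is an illustrative stand-in, not the answer's method. The name remove_small_blobs is hypothetical, and it counts pixels rather than measuring contour area, so its threshold is not directly comparable to the 200 used above.

```python
from collections import deque

import numpy as np

def remove_small_blobs(mask, min_pixels):
    """Zero out 8-connected non-zero regions that contain fewer than min_pixels pixels."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = mask.copy()
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] == 0 or seen[sy, sx]:
                continue
            # breadth-first flood fill collecting every pixel of this blob
            blob = [(sy, sx)]
            seen[sy, sx] = True
            queue = deque(blob)
            while queue:
                y, x = queue.popleft()
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny, nx] != 0 and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                            blob.append((ny, nx))
            if len(blob) < min_pixels:
                for y, x in blob:
                    out[y, x] = 0
    return out

# Toy mask: one large "digit" blob and one isolated noise pixel.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1:5, 1:5] = 255
mask[8, 8] = 255
cleaned = remove_small_blobs(mask, min_pixels=4)
```

The pure-Python loops are slow on large images, so this is only meant to show the logic; for real use the contour-based OpenCV code above is the practical choice.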

