Given the code below, the cv2.dilate and cv2.erode functions in Python return the same image I send them. What am I doing wrong? I am using OpenCV 3.0.0 and numpy 1.9.0 on Python 2.7.
im = np.zeros((100,100), dtype=np.uint8)
im[50:,50:] = 255
dilated = cv2.dilate(im, (11,11))
print np.array_equal(im, dilated)
Which returns:
True
Edit: The other dilate post is about the kernel's datatype. This post is actually about a function-call error.
The function requires a kernel, not a kernel size. So a correct function call would be:
dilated = cv2.dilate(im, np.ones((11, 11), np.uint8))
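For completeness, here is a minimal check on the same toy image from the question; with a real kernel array the result now differs from the input:

import cv2
import numpy as np

im = np.zeros((100, 100), dtype=np.uint8)
im[50:, 50:] = 255

# an actual 11x11 kernel array, not a size tuple
dilated = cv2.dilate(im, np.ones((11, 11), np.uint8))
print(np.array_equal(im, dilated))  # False: the white square has grown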
You need to specify a proper kernel. It can be rectangular, circular, etc.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
im = np.zeros((100,100), dtype=np.uint8)
im[50:,50:] = 255
dilated = cv2.dilate(im, kernel, iterations = 1)
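As a side note, cv2.getStructuringElement also supports cv2.MORPH_RECT and cv2.MORPH_CROSS if you want a rectangular or cross-shaped element; for example, cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)) is equivalent to a solid 5x5 np.ones kernel.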
I think it has to do with your second line, where you modify the array. Probably the data type gets corrupted.
The function has to be called this way:
cv2.dilate(src, kernel, iterations=n)
where src is the input image, kernel is the structuring element, and n is the number of times you want to apply the filter.
I have a grayscale image with something written in the front and something at the back. I'd like to filter out the back part of the letters and only have the front. It's only grayscale and not RGB, and I'd rather not have to calculate pixels manually.
Is there any library function I can use to do this? I'm new to python and at the moment, using PIL library, so that's my preference. But if there are other libraries, I'm open to that as well.
Here's the image:
Are you looking for a function that automatically strips the background from any given image, or just one that can filter out pixels that meet certain criteria for this particular image?
The eval function applies the same transformation to every pixel in an image. This works for your image.
from PIL import Image

with Image.open("jFmbt.jpg") as im:
    im = im.convert("L")
    # map the mid-gray background range to white (255, the max for mode "L"), leave the rest alone
    out_image = Image.eval(im, lambda x: 255 if 175 < x < 250 else x)
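Note that Image.eval evaluates the function once for each possible pixel value (0-255 for an "L" image) and applies the result as a lookup table, so it stays fast even for large images; im.point(...) would achieve the same thing.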
A very commonly used library for that would be OpenCV.
import cv2 as cv
# 0 flag -> read image as greyscale
img = cv.imread("img.jpg", 0)
# threshold
ret, thresh = cv.threshold(img, 150, 255, cv.THRESH_BINARY)
# result
cv.imwrite("output.jpg", thresh)
The resulting image would be:
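If hand-tuning the 150 cutoff turns out to be fragile across images, Otsu's method can pick the threshold automatically (a small variation on the answer above, not part of it):

import cv2 as cv

img = cv.imread("img.jpg", 0)
# the threshold value of 0 is ignored when THRESH_OTSU is set; Otsu picks it from the histogram
ret, thresh = cv.threshold(img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
cv.imwrite("output.jpg", thresh)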
I want to eliminate the gray lines in the 16-bit image you can see below.
The final goal is to remove the lines in the object image (second image) using the background image (first image).
I thought it would need an FFT, but I don't know how an FFT would be applied. There may be other ways, too.
Please help me.
One simple way using Python/OpenCV is to apply a morphological close multiple times with a small vertical rectangular kernel.
Input:
import cv2
import numpy as np
img = cv2.imread('lines.png')
# do morphology multiple times to remove horizontal lines
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,5))
result = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel, iterations = 9)
# write result to disk
cv2.imwrite("lines_removed.png", result)
# display it
cv2.imshow("result", result)
cv2.waitKey(0)
However, it will slightly modify the image everywhere.
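If that side effect matters, one possible refinement (my own sketch, not part of the original answer) is to composite the closed result back only where it differs noticeably from the input, keeping the original pixels elsewhere:

import cv2
import numpy as np

img = cv2.imread('lines.png')
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,5))
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel, iterations=9)

# pixels that changed by more than a small tolerance are (mostly) the lines
diff = cv2.absdiff(closed, img)
mask = diff.max(axis=2, keepdims=True) > 10  # the tolerance of 10 is an assumption; tune it
result = np.where(mask, closed, img)
cv2.imwrite("lines_removed_masked.png", result)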
I'm working on my bachelor's degree final project and I want to create an OCR for bottle inspection with python. I need some help with text recognition from the image. Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?
I tried image processing operations on the image and I used pytesseract to recognize the characters.
Using the code below, I got from this photo:
to this one:
and then to this one:
Sharpen function:
import imgaug.augmenters as iaa

def sharpen(img):
    sharpen = iaa.Sharpen(alpha=1.0, lightness=1.0)
    sharpen_img = sharpen.augment_image(img)
    return sharpen_img
Image processing code:
textZone = cv2.pyrUp(sharpen(originalImage[y:y + h - 1, x:x + w - 1])) #text zone cropped from the original image
sharp = cv2.cvtColor(textZone, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(sharp, 127, 255, cv2.THRESH_BINARY)
# the functions such as opening appear inverted, presumably because the threshold leaves
# dark text on a light background, flipping the polarity; that's why opening is done with
# the MORPH_CLOSE parameter, dilation with erode, and so on
kernel_open = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
opened = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel_open)
kernel_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 7))
dilated = cv2.erode(opened, kernel_dilate)
kernel_close = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 5))
closed = cv2.morphologyEx(dilated, cv2.MORPH_OPEN, kernel_close)
print(pytesseract.image_to_string(closed))
This is the result of pytesseract.image_to_string:
22203;?!)
92:53 a
The expected result is :
22/03/20
02:53 A
"Do I need to apply the cv2 operations in a better way, train tesseract or should I try another method?"
First, kudos for taking this project on and getting this far with it. What you have from the OpenCV/cv2 standpoint looks pretty good.
Now, if you're thinking of Tesseract to carry you the rest of the way, at the very least you'll have to train it. Here you have a tough choice: Invest in training Tesseract, or work up a CNN to recognize a limited alphabet. If you have a way to segment the image, I'd be tempted to go with the latter.
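A lighter-weight experiment before committing to full training: since the expected strings contain only digits and a handful of symbols, you can pass Tesseract a character whitelist and a single-line page segmentation mode through pytesseract's config parameter (variable names follow the question's code; note that some Tesseract 4 LSTM builds ignore the whitelist):

import pytesseract

# --psm 7: treat the image as a single line of text;
# the whitelist restricts output to characters that can actually occur here
config = "--psm 7 -c tessedit_char_whitelist=0123456789:/APM"
print(pytesseract.image_to_string(closed, config=config))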
From the result you got and the expected result, you can see that some of the characters are recognized correctly. Assuming you are using a different image from the one shown in the tutorial, I recommend changing the values passed to threshold and getStructuringElement.
Which values work best depends on the image. The tutorial author must have optimized them for his/her use case (by trial and error or some other way).
Here is a video if you want to play around with those values using sliders in OpenCV. You can also print your result in the same loop to see if you are getting the desired output.
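For instance, a minimal threshold slider with cv2.createTrackbar might look like this (the file name is a placeholder):

import cv2

img = cv2.imread("img.jpg", 0)  # placeholder path; load your grayscale text zone

def on_change(val):
    _, thresh = cv2.threshold(img, val, 255, cv2.THRESH_BINARY)
    cv2.imshow("thresh", thresh)

cv2.namedWindow("thresh")
cv2.createTrackbar("threshold", "thresh", 127, 255, on_change)
on_change(127)
cv2.waitKey(0)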
One potential thing you could do to improve recognition on the characters is to dilate the characters so pytesseract gives a better result. Dilating the characters will connect the individual blobs together and can fix the / or the A characters. So starting with your latest binary image:
Original
Dilate with a 3x3 kernel with iterations=1 (left) or iterations=2 (right). You can experiment with other values, but don't overdo it or the characters will all connect. Maybe this will give a better result with your OCR.
import cv2

# read as grayscale so thresholding yields a single-channel binary image
image = cv2.imread("1.PNG", 0)
# invert: characters become white so dilation grows them
thresh = cv2.threshold(image, 115, 255, cv2.THRESH_BINARY_INV)[1]
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
dilate = cv2.dilate(thresh, kernel, iterations=1)
# invert back to dark text on a white background
final = cv2.threshold(dilate, 115, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow('image', image)
cv2.imshow('dilate', dilate)
cv2.imshow('final', final)
cv2.waitKey(0)
I have a binarized image. Yellow = 1 (mask), purple = 0 (background):
I can filter out the whole mask if it is small, below some threshold, this way:
import numpy as np

def filter_image(img):
    if img.sum() < 10:
        return np.zeros(img.shape)
    else:
        return img
However, how do I get rid of these small yellow points?
That can be easily done with a Morphological Transformation
You can check the docs here
Something like this should work:
import cv2
import numpy as np
img = cv2.imread('some_image.png',0)
kernel = np.ones((5,5),np.uint8)
closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
You can play with the kernel size to control the size of the holes that get closed.
I hope it helps.
Maybe you can use sequential morphological erosions and dilations to remove this noise.
Applied in sequence, these operations are known as opening (erode then dilate) and closing (dilate then erode); see the sketch below.
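To make that concrete, here is a minimal sketch (the file name and kernel size are assumptions): opening removes small foreground specks like your yellow points, and closing fills small holes:

import cv2
import numpy as np

mask = cv2.imread("mask.png", 0)  # assumed binarized input
kernel = np.ones((3, 3), np.uint8)

# opening removes isolated foreground points smaller than the kernel
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# closing fills small background holes inside the mask
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)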
I'm working with images that contain text. The problem is that these images are receipts, and after a lot of transformations the text has lost quality.
I'm using python and opencv.
I've tried a lot of combinations of morphological transformations from the Morphological Transformations doc, but I don't get satisfactory results.
This is what I'm doing right now (I'll leave what I've tried commented out, and leave uncommented what I'm actually using):
kernel = np.ones((2, 2), np.uint8)
# opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
# closing = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
# dilation = cv2.dilate(opening, kernel, iterations=1)
# kernel = np.ones((3, 3), np.uint8)
erosion = cv2.erode(img, kernel, iterations=1)
# gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
#
img = erosion.copy()
With this, from this original image:
I get this:
It's a little bit better, as you can see, but it's still too bad. The OCR (Tesseract) doesn't recognize the characters here very well. I've trained it, but as you can see, every "e" looks different, and so on.
I get good results, but I think, if I resolve this problem, they would be even better.
Maybe I can do another thing, or use a better combination of the morphological transformations. If there is another tool (PIL, imagemagick, etc..) that I could use, I can use it.
Here's the whole image, so you can see how it looks:
As I said, it's not so bad, but a little bit more "optimization" of the letters would be perfect.
After years working on this topic, I can now say that what I wanted to do took a big effort, was quite slow, and NEVER worked as I expected. The irregularities of the pixels in the characters are always unpredictable; that's why "easy algorithms" don't work.
Question: Is it impossible, then, to have a decent OCR that can read damaged characters?
Answer: No, it's not impossible. But it takes "a bit" more than just using erosion, morphological closing, or something like that.
Then how? Neural networks :)
Here are two amazing papers that helped me a lot:
Can we build language-independent OCR using LSTM networks?
Reading Scene Text in Deep Convolutional Sequences
And for those who aren't familiar with RNN, I can suggest this:
Understanding LSTM Networks
There's also a python library which works pretty well (and unfortunately even better for C++):
ocropy
I really hope this can help someone.
In my experience, erosion impairs OCR quality. If you have a grayscale image (not binary), you can use a better binarization algorithm; I use the Sauvola algorithm. If you only have a binary image, the best thing you can do is remove the noise (remove all small dots).
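For reference, scikit-image ships a Sauvola implementation; a minimal sketch (the file name and window size are assumptions, tune them per image):

import cv2
from skimage.filters import threshold_sauvola

gray = cv2.imread("receipt.png", 0)  # assumed grayscale input
thresh = threshold_sauvola(gray, window_size=25)  # per-pixel local threshold
binary = ((gray > thresh) * 255).astype("uint8")
cv2.imwrite("receipt_sauvola.png", binary)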
Did you consider taking the neighboring pixels and adding them up?
For example:
import cv2
import numpy as np

# single-pixel structuring elements shifted north, south, west, east
n = np.zeros((3, 3), np.uint8)
s = np.zeros((3, 3), np.uint8)
w = np.zeros((3, 3), np.uint8)
e = np.zeros((3, 3), np.uint8)
n[0][1] = 1
s[2][1] = 1
w[1][0] = 1
e[1][2] = 1
# img is your input image; each erosion shifts it by one pixel
img_n = cv2.erode(img, n, iterations=1)
img_s = cv2.erode(img, s, iterations=1)
img_w = cv2.erode(img, w, iterations=1)
img_e = cv2.erode(img, e, iterations=1)
result = img_n + img_s + img_w + img_e + img
Also, you can use either numpy or cv2 to add the arrays.
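One caveat: for uint8 images, plain numpy addition wraps around on overflow, while cv2.add saturates at 255, which is usually what you want here:

result = cv2.add(cv2.add(img_n, img_s), cv2.add(img_w, img_e))
result = cv2.add(result, img)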
I found the Ramer–Douglas–Peucker algorithm, and I'm trying to implement it for closed polygons in Haskell. Maybe it can solve something here.