I want to use pytesseract to read digits from images. The images look as follows:
The digits are dotted, and in order to use pytesseract I need black connected digits on a white background. To do so, I thought about using erode and dilate as preprocessing techniques. As you can see, the images are similar, yet quite different in certain aspects. For example, the dots in the first image are darker than the background, while the dots in the second are whiter. That means in the first image I can use erode to get black connected lines, and in the second image I can use dilate to get white connected lines and then invert the colors. This leads to the following results:
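For reference, that preprocessing is roughly the following (a sketch; the file names and kernel size are placeholders):
import cv2
import numpy as np
# Placeholder file names for the two example images:
img1 = cv2.imread("dark_dots.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("light_dots.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)
# First image: dots darker than the background, so erode (local minimum)
# grows the dark dots into connected black strokes:
connectedDark = cv2.erode(img1, kernel, iterations=1)
# Second image: dots lighter than the background, so dilate (local maximum)
# grows the light dots, then invert to get black digits on white:
connectedLight = 255 - cv2.dilate(img2, kernel, iterations=1)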
Using an appropriate threshold, the first image can easily be read with pytesseract. The second image, however, is trickier. The problem is that, for example, parts of the "4" are darker than the background around the "3". So a simple threshold is not going to work. I need something like a local threshold or local contrast enhancement. Does anybody have an idea here?
Edit:
Otsu, mean and Gaussian thresholding lead to the following results:
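For reference, those attempts look roughly like this (a sketch; the file name, block size, and constant are placeholders):
import cv2
gray = cv2.imread("digits.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
# Global Otsu threshold:
_, otsuImg = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# Local mean threshold:
meanImg = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY, 11, 2)
# Local Gaussian-weighted threshold:
gaussImg = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 11, 2)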
Your images are pretty low-res, but you can try a method called gain division. The idea is that you build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image.
After gain division is performed, you can try to improve the image by applying an area filter and morphology. I only tried your first image, because it is the "least worst".
These are the steps to get the gain-divided image:
Apply a soft median blur filter to get rid of high frequency noise.
Get the model of the background via local maximum. Apply a very strong close operation, with a big structuring element (I’m using a rectangular kernel of size 15).
Perform the gain division: divide 255 by each local maximum pixel and weight (multiply) each input image pixel by this value.
You should get a nice image where the background illumination is pretty much normalized; threshold this image to get a binary mask of the characters.
Now, you can improve the quality of the image with the following, additional steps:
Threshold via Otsu, but add a little bit of bias. (This, unfortunately, is a manual step depending on the input).
Apply an area filter to filter out the smaller blobs of noise.
Let's see the code:
import numpy as np
import cv2
# image path
path = "C:/opencvImages/"
fileName = "iA904.png"
# Reading an image in default mode:
inputImage = cv2.imread(path+fileName)
# Remove small noise via median:
filterSize = 5
imageMedian = cv2.medianBlur(inputImage, filterSize)
# Get local maximum:
kernelSize = 15
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(imageMedian, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)
# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))
# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)
# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8")
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)
This is what gain division gets you:
Note that the lighting is more balanced. Now, let's apply a little bit of contrast enhancement:
# Contrast Enhancement:
grayscaleImage = np.uint8(cv2.normalize(grayscaleImage, grayscaleImage, 0, 255, cv2.NORM_MINMAX))
You get this, which creates a little bit more contrast between the foreground and the background:
Now, let's try to threshold this image to get a nice, binary mask. As I suggested, try Otsu's thresholding but add (or subtract) a little bit of bias to the result. This step, as mentioned, is dependent on the quality of your input:
# Threshold via Otsu + bias adjustment:
threshValue, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
threshValue = 0.9 * threshValue
_, binaryImage = cv2.threshold(grayscaleImage, threshValue, 255, cv2.THRESH_BINARY)
You end up with this binary mask:
Invert this and filter out the small blobs. I set an area threshold value of 10 pixels:
# Invert image:
binaryImage = 255 - binaryImage
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)
# Set the minimum pixels for the area filter:
minArea = 10
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype("uint8")
And this is the final binary mask:
If you plan on sending this image to an OCR, you might want to apply some morphology first. Maybe a closing to try and join the dots that make up the characters. Also be sure to train your OCR classifier with a font that is close to what you are actually trying to recognize. This is the (inverted) mask after a size 3 rectangular closing operation with 3 iterations:
Edit:
To get the last image, process the filtered output as follows:
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 3
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
# Invert image to obtain black numbers on white background:
closingImage = 255 - closingImage
I'm writing a CNN to classify pictures and encountered a problem with garbage pixels. The resulting network gives ~90% accuracy; it seems it could be improved by averaging out these pixels.
Is there a ready-made algorithm in numpy, opencv, etc. that can do this? Not ordinary smoothing, but something targeted specifically at these pixels. Or do I have to do it manually?
I agree that if you are using a CNN for some kind of classification, you should train the network to handle this kind of noisy image. Maybe augment your dataset with some salt and pepper noise. Anyway, here's a possible solution for filtering out the outliers. It builds on the idea proposed by fmw42. These are the steps:
Apply a median blur with a large kernel
Convert the original (unprocessed) image to grayscale
Inverse-threshold the grayscale image with a low threshold value (e.g., 5) to create a mask for the outliers close to 0.
Threshold the grayscale image with a high threshold value (e.g., 250) to create a mask for the outliers close to 255.
Combine both masks to create the outlier mask.
Use the outlier mask to adaptively filter the original input image, substituting the median values where necessary.
Let's see the code:
# Imports:
import cv2
import numpy as np
# image path
path = "D://opencvImages//noisyNumbers//"
fileName = "noisy01.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Apply median filter:
filteredImage = cv2.medianBlur(inputImage, ksize=11)
# Convert input image to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
The median filtering with a kernel size of 11 looks like this:
The outliers are practically gone. Now, let's put this aside for the moment and compute a pair of binary masks for both outliers:
# Get low mask:
_, lowMask = cv2.threshold(grayscaleImage, 5, 255, cv2.THRESH_BINARY_INV)
# Get high mask:
_, highMask = cv2.threshold(grayscaleImage, 250, 255, cv2.THRESH_BINARY)
# Create outliers mask:
outliersMask = cv2.add(lowMask, highMask)
The outliers mask is this:
Now, you really don't provide your original data. You provide an image most likely plotted using matplotlib. That's a problem, because the image you posted is processed and compressed. This results in some sharp edges around the outliers in the original image. One straightforward solution is to dilate the outliers mask a little bit to cover these compression artifacts:
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 1
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Apply dilation:
outliersMask = cv2.dilate(outliersMask, maxKernel)
The outliers mask now looks like this:
Ok, let's adaptive-filter the original input using the median blurred image and the outliers mask. Just make sure to reshape all the numpy arrays to their proper size for broadcasting:
# Re-shape the binary mask to a 3-channeled image:
augmentedBinary = cv2.merge([outliersMask, outliersMask, outliersMask])
# Apply the adaptive filter:
cleanedImage = np.where(augmentedBinary == (255, 255, 255), filteredImage, inputImage)
# Show the result
cv2.imshow("Adaptive Filtering", cleanedImage)
cv2.waitKey(0)
For the first image, this is the result:
More results:
I have screenshots received from an iPhone, in both dark and light mode.
I need to use OCR to extract the URL but am unable to do so with the underlining that appears.
What would be the best way to remove the horizontal lines from the message? Apart from the phone number, it doesn't matter if other parts of the screenshot are distorted.
I've tried approaches as described in
Removing Horizontal Lines in image (OpenCV, Python, Matplotlib)
https://docs.opencv.org/3.2.0/d1/dee/tutorial_moprh_lines_detection.html
https://legacy.imagemagick.org/discourse-server/viewtopic.php?t=22338
And none seem to work well, at all.
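For reference, the gist of those approaches is to detect the lines with a wide, flat structuring element and paint them out - roughly like this (a sketch based on the linked posts, not their exact code; the file name and kernel width are guesses):
import cv2
img = cv2.imread("screenshot.png")  # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# A wide, flat kernel only responds to long horizontal runs:
lineKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
lines = cv2.morphologyEx(binary, cv2.MORPH_OPEN, lineKernel)
# Paint the detected lines white on the original image:
img[lines == 255] = (255, 255, 255)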
Here's a possible solution for your problem. I'm using mock screenshots, since, like I suggested, it is better to use lossless images to get a better result. The main idea here is to extract the color of the text box and to fill the rest of the image with that color, then threshold the image. By doing this, we will reduce the intensity variation and obtain a better thresholded image - since the image histogram will contain fewer intensity values. These are the steps:
Crop the image to a ROI (Region Of Interest)
Get the colors in that ROI via K-Means
Get the color of the text box
Flood-fill the ROI with the color of the text box
Apply Otsu's thresholding to get a binary image
Get OCR of the image
Suppose these are our test images; one uses a "light" theme while the other uses a "dark" theme:
I'll be using pyocr as the OCR engine. Let's use the first image; the code would be this:
# imports:
from PIL import Image
import numpy as np
import cv2
import pyocr
import pyocr.builders
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]
langs = tool.get_available_languages()
lang = langs[0]
# image path
path = "D://opencvImages//"
fileName = "mockText.png"
# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Set the ROI location:
roiX = 0
roiY = 235
roiWidth = 750
roiHeight = 1080
# Crop the ROI:
smsROI = grayscaleImage[roiY:roiHeight, roiX:roiWidth]
The first bit crops the ROI - everything that is of interest, leaving out the "header" and the "footer" of the image, where there's info that we really don't need. This is the current ROI:
Wouldn't it be nice to (approximately) get all the colors used in the image? Fortunately, that's what Color Quantization gives us - a reduced palette of the average colors present in an image, given the number of colors we are looking for. Let's apply K-Means with 3 clusters to group these colors.
In our test images, most of the pixels are background - so the largest cluster of pixels will belong to the background. The text represents the smallest cluster of pixels. That leaves the remaining cluster as our target - the color of the text box. Let's apply K-Means, then. We need to format the data first, though, because K-Means expects flattened float32 arrays:
# Flatten the ROI into a column of pixel samples:
kmeansData = smsROI.reshape((-1, 1))
# convert the data to np.float32
kmeansData = np.float32(kmeansData)
# define criteria, number of clusters(K) and apply kmeans():
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 5, 1.0)
# Define number of clusters (3 colors):
K = 3
# Run K-means:
_, _, center = cv2.kmeans(kmeansData, K, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
# Convert the centers to uint8:
center = np.uint8(center)
# Sort centers from smallest to largest:
center = sorted(center, reverse=False)
# Get text color and min color:
textBoxColor = int(center[1][0])
minColor = min(center)[0]
print("Minimum Color is: "+str(minColor))
print("Text Box Color is: "+str(textBoxColor))
The info of interest is in center. That's where our colors are. After sorting this list and getting the minimum color value (that I'll use later to distinguish between a light and a dark theme) we can print the values. For the first test image, these values are:
Minimum Color is: 23
Text Box Color is: 225
Alright, so far so good. We have the color of the text box. Let's use that and flood-fill the entire ROI at position (x=0, y=0):
# Apply flood-fill at seed point (0,0):
cv2.floodFill(smsROI, mask=None, seedPoint=(0, 0), newVal=textBoxColor)
The result is this:
Very nice. Let's apply Otsu's thresholding on this bad boy:
# Threshold via Otsu:
_, binaryImage = cv2.threshold(smsROI, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
Now, here comes the minColor part. If you threshold a dark theme screenshot, you will get white text on a black background. If you threshold a light theme screenshot, you will get black text on a white background. We want to always produce the same output no matter the input: white text on a black background. Let's check the min color: if it equals 0 (black), you just received a dark theme screenshot and you don't need to invert the image. Otherwise, invert the image:
# Process "Dark Theme / Light Theme":
if minColor != 0:
# Invert image if is not already inverted:
binaryImage = 255 - binaryImage
cv2.imshow("binaryImage", binaryImage)
cv2.waitKey(0)
For our first test image, the result is:
Notice the little bits of small noise. Let's apply an area filter (function defined at the end of the post) to get rid of pixels below a certain area threshold:
# Run a minimum area filter:
minArea = 10
binaryImage = areaFilter(minArea, binaryImage)
This is the filtered image:
Very nice. Lastly, I write this image and use pyocr to get the text as a string:
cv2.imwrite(path + "ocrText.png", binaryImage)
txt = tool.image_to_string(
    Image.open(path + "ocrText.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
print("Image text is: "+txt)
Which results in:
Image text is: 301248 is your Amazon
verification code
If you test the second image you get the same exact result. This is the definition and implementation of the areaFilter function:
def areaFilter(minArea, inputImage):
    # Perform an area filter on the binary blobs:
    componentsNumber, labeledImage, componentStats, componentCentroids = \
        cv2.connectedComponentsWithStats(inputImage, connectivity=4)
    # Get the indices/labels of the remaining components based on the area stat
    # (skip the background component at index 0):
    remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
    # Filter the labeled pixels based on the remaining labels,
    # assign pixel intensity to 255 (uint8) for the remaining pixels:
    filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype('uint8')
    return filteredImage
I am trying to remove a transparent watermark from an image.
Here is my sample image:
I would like to remove the text "Watermark" from the image. As you can see, the text is transparent. So I would like to replace that text to the original background.
Something like this would be my desired output:
I tried some examples (I am currently using cv2; if other libraries can solve the problem, please also recommend them), but none of them came close to succeeding. I know the way to go would be to have a mask (like in this post), but they all already have masked images and I don't.
Here is what I tried to do to get a mask: I turned the saturation down to black and white, created an image "imagemask.jpg", then tried going through the pixels with a for loop:
mask = cv2.imread('imagemask.jpg')
new = []
rows, cols, _ = mask.shape
for i in range(rows):
    new.append([])
    for j in range(cols):
        k = img[i, j]
        if all(x in range(110, 130) for x in k):
            new[-1].append((255, 255, 255))
        else:
            new[-1].append((0, 0, 0))
cv2.imwrite('finalmask.jpg', np.array(new))
Then after that wanted to use the code for the mask, but I realized the "finalmask.jpg" is a complete mess... so I didn't try using the code for the mask.
Is this actually possible? I have been trying for around 3 hours but have had no luck...
This is not trivial, my friend. To add insult to injury, your image is very low-res, compressed and has a nasty glare - that won't help processing at all. Please, look at your input and set your expectations accordingly. With that said, let's try to get the best result with what we have. These are the steps I propose:
Try to segment the watermark text from the image
Filter the segmentation mask and try to get a binary mask as clean as possible
Use the text mask to in-paint the offending area using the input image as reference
Now, the tricky part, as you already saw, is segmenting the text. After trying out some techniques and color spaces, I found that the CMYK color space - particularly the K channel - offers promising results. The text is reasonably clear and we can try an Adaptive Thresholding on this, let's take a look:
# Imports
import cv2
import numpy as np
# Read image
imagePath = "D://opencvImages//"
img = cv2.imread(imagePath+"0f5zZm.jpg")
# Store a deep copy for the inpaint operation:
originalImg = img.copy()
# Convert to float and divide by 255:
imgFloat = img.astype(np.float64) / 255.
# Calculate channel K:
kChannel = 1 - np.max(imgFloat, axis=2)
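As an aside, only K is needed here, but the other CMYK channels follow from the same conversion formula. A sketch, reusing imgFloat and kChannel from above (the epsilon guard against division by zero on pure-black pixels is my addition):
# For reference only - the remaining CMYK channels of the conversion:
epsilon = 1e-6  # guard against division by zero on pure-black pixels
cChannel = (1 - imgFloat[..., 2] - kChannel) / (1 - kChannel + epsilon)  # from R
mChannel = (1 - imgFloat[..., 1] - kChannel) / (1 - kChannel + epsilon)  # from G
yChannel = (1 - imgFloat[..., 0] - kChannel) / (1 - kChannel + epsilon)  # from B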
OpenCV does not offer BGR to CMYK conversion directly, so I had to compute the K channel manually using the conversion formula. It is very straightforward. The K (or Key) channel represents pixels of the lowest intensity (black) with color white. Meaning that the text, which is almost white, will be rendered in black... This is the K channel of the input:
You see how the darker pixels on the input are almost white here? That's nice, it seems to get a clear separation between the text and everything else. It's a shame that we have some big nasty glare on the right side. Anyway, the conversion involves float operations, so gotta be careful with data types. Maybe we can improve this image with a little brightness/contrast adjustment. Just a little bit - I'm just trying to further separate the text from that nasty glare:
# Apply a contrast/brightness adjustment on Channel K:
alpha = 0
beta = 1.2
adjustedK = cv2.normalize(kChannel, None, alpha, beta, cv2.NORM_MINMAX, cv2.CV_32F)
# Convert back to uint 8:
adjustedK = (255*adjustedK).astype(np.uint8)
This is the adjusted image:
There's a little bit more separation between the text and the glare, it seems. Alright, let's apply an Adaptive Thresholding on this bad boy to get an initial segmentation mask:
# Adaptive Thresholding on adjusted Channel K:
windowSize = 21
windowConstant = 11
binaryImg = cv2.adaptiveThreshold(adjustedK, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV, windowSize, windowConstant)
You see I'm using a not-so-big windowSize here for the thresholding? Feel free to tune these parameters if you like. This is the binary image I get:
Yeah, there's a lot of noise. Here's what I propose to get a cleaner mask: There's some obvious blobs that are bigger than the text. Likewise, there are other blobs that are smaller than the text. Let's locate the big blobs and the small blobs and subtract them. The resulting image should contain the text, if we set our parameters correctly. Let's see:
# Get the biggest blobs on the image:
minArea = 180
bigBlobs = areaFilter(minArea, binaryImg)
# Filter the smallest blobs on the image:
minArea = 20
smallBlobs = areaFilter(minArea, binaryImg)
# Let's try to isolate the text:
textMask = smallBlobs - bigBlobs
cv2.imshow("Text Mask", textMask)
cv2.waitKey(0)
Here I'm using a helper function called areaFilter. This function returns all the blobs of an image that are above a minimum area threshold. I'll post the function at the end of the answer. In the meantime, check out these cool images:
Big blobs:
Filtered small blobs:
The difference between them:
Sadly, it seems that some portions of the characters didn't survive the filtering operations. That's because the intersection of the glare and the text is too much for the algorithm to get a clear separation. Something that could benefit the result of the in-painting is a subtle blur on this mask, to get rid of that compression aliasing. Let's apply some Gaussian blur to smooth the mask a little bit:
# Blur the mask a little bit to get a
# smoother inpainting result:
kernelSize = (3, 3)
textMask = cv2.GaussianBlur(textMask, kernelSize, cv2.BORDER_DEFAULT)
The kernel is not that big, I just want a subtle effect. This is the result:
Finally, let's apply the in-painting:
# Apply the inpaint method:
inpaintRadius = 10
inpaintMethod = cv2.INPAINT_TELEA
result = cv2.inpaint(originalImg, textMask, inpaintRadius, inpaintMethod)
cv2.imshow("Inpaint Result", result)
cv2.waitKey(0)
This is the final result:
Well, it's not that bad, considering the input image. You can try to further improve the result by adjusting some values, but the reality of this life, my dude, is that the input image is not that great to begin with. Here's the areaFilter function:
def areaFilter(minArea, inputImage):
    # Perform an area filter on the binary blobs:
    componentsNumber, labeledImage, componentStats, componentCentroids = \
        cv2.connectedComponentsWithStats(inputImage, connectivity=4)
    # Get the indices/labels of the remaining components based on the area stat
    # (skip the background component at index 0):
    remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
    # Filter the labeled pixels based on the remaining labels,
    # assign pixel intensity to 255 (uint8) for the remaining pixels:
    filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype('uint8')
    return filteredImage
I have images, which look like the following:
I want to find the bounding boxes for the 8 digits. My first try was to use cv2 with the following code:
import cv2
import matplotlib.pyplot as plt
import cvlib as cv
from cvlib.object_detection import draw_bbox
im = cv2.imread('31197402.png')
bbox, label, conf = cv.detect_common_objects(im)
output_image = draw_bbox(im, bbox, label, conf)
plt.imshow(output_image)
plt.show()
Unfortunately that doesn't work. Does anyone have an idea?
The problem in your solution is likely the input image, which is very poor in quality. There’s hardly any contrast between the characters and the background. The blob detection algorithm from cvlib is probably failing to distinguish between character blobs and background, producing a useless binary mask. Let’s try to solve this using purely OpenCV.
I propose the following steps:
Apply adaptive threshold to get a reasonably good binary mask.
Clean the binary mask from blob noise using an area filter.
Improve the quality of the binary image using morphology.
Get the outer contours of each character and fit a bounding rectangle to each character blob.
Crop each character using the previously calculated bounding rectangle.
Let’s see the code:
# importing cv2 & numpy:
import numpy as np
import cv2
# Set image path
path = "C:/opencvImages/"
fileName = "mrrm9.png"
# Read input image:
inputImage = cv2.imread(path+fileName)
inputCopy = inputImage.copy()
# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
From here there’s not much to discuss, just reading the BGR image and converting it to grayscale. Now, let’s apply an adaptive threshold using the Gaussian method. This is the tricky part, as the parameters are adjusted manually depending on the quality of the input. The method works by dividing the image into a grid of cells of size windowSize, then applying a local threshold in each cell to find the optimal separation between foreground and background. An additional constant, indicated by windowConstant, can be added to the threshold to fine-tune the output:
# Set the adaptive thresholding (Gaussian) parameters:
windowSize = 31
windowConstant = -1
# Apply the threshold:
binaryImage = cv2.adaptiveThreshold(grayscaleImage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, windowSize, windowConstant)
You get this nice binary image:
Now, as you can see, the image has some blob noise. Let’s apply an area filter to get rid of it. The noise blobs are smaller than the target blobs of interest, so we can easily filter them based on area, like this:
# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
cv2.connectedComponentsWithStats(binaryImage, connectivity=4)
# Set the minimum pixels for the area filter:
minArea = 20
# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber) if componentStats[i][4] >= minArea]
# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype('uint8')
This is the filtered image:
We can improve the quality of this image with some morphology. Some of the characters seem to be broken (check out the first "3" - it is broken into two separate blobs). We can join them by applying a closing operation:
# Set kernel (structuring element) size:
kernelSize = 3
# Set operation iterations:
opIterations = 1
# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel, None, None, opIterations, cv2.BORDER_REFLECT101)
This is the "closed" image:
Now, you want to get the bounding boxes for each character. Let’s detect the outer contour of each blob and fit a nice rectangle around it:
# Get each bounding box
# Find the big contours/blobs on the filtered image:
contours, hierarchy = cv2.findContours(closingImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
contours_poly = [None] * len(contours)
# The Bounding Rectangles will be stored here:
boundRect = []
# Alright, just look for the outer bounding boxes:
for i, c in enumerate(contours):
    if hierarchy[0][i][3] == -1:
        contours_poly[i] = cv2.approxPolyDP(c, 3, True)
        boundRect.append(cv2.boundingRect(contours_poly[i]))
# Draw the bounding boxes on the (copied) input image:
for i in range(len(boundRect)):
    color = (0, 255, 0)
    cv2.rectangle(inputCopy, (int(boundRect[i][0]), int(boundRect[i][1])), \
                  (int(boundRect[i][0] + boundRect[i][2]), int(boundRect[i][1] + boundRect[i][3])), color, 2)
The last for loop is pretty much optional. It fetches each bounding rectangle from the list and draws it on the input image, so you can see each individual rectangle, like this:
Let's visualize that on the binary image:
Additionally, if you want to crop each character using the bounding boxes we just got, you do it like this:
# Crop the characters:
for i in range(len(boundRect)):
    # Get the roi for each bounding rectangle:
    x, y, w, h = boundRect[i]
    # Crop the roi:
    croppedImg = closingImage[y:y + h, x:x + w]
    cv2.imshow("Cropped Character: " + str(i), croppedImg)
    cv2.waitKey(0)
This is how you can get the individual bounding boxes. Now, maybe you are trying to pass these images to an OCR. I tried passing the filtered binary image (after the closing operation) to pyocr (That’s the OCR I’m using) and I get this as output string: 31197402
The code I used to get the OCR of the closed image is this:
# Set the OCR libraries:
from PIL import Image
import pyocr
import pyocr.builders
# Set pyocr tools:
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]
# Set OCR language:
langs = tool.get_available_languages()
lang = langs[0]
# Get string from image:
txt = tool.image_to_string(
    Image.open(path + "closingImage.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)
print("Text is:"+txt)
Be aware that the OCR receives black characters on white background, so you must invert the image first.
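That inversion (done before saving the file the OCR reads) is just a one-liner, e.g.:
# Invert to get black characters on a white background before saving:
closingImage = 255 - closingImage
cv2.imwrite(path + "closingImage.png", closingImage)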
I need to detect the vein junctions of bee wings (the image is just one example). I am using opencv-python.
ps: maybe the image lost a little bit of quality, but the lines are all connected and one pixel wide.
This is an interesting question. The result I got is not perfect, but it might be a good start. I filtered the image with a kernel that only looks at the edges of the kernel. The idea being that a junction has at least 3 lines crossing the kernel edge, whereas regular lines only have 2. This means that when the kernel is over a junction, the resulting value will be higher, so a threshold will reveal them.
Due to the nature of the lines there are some false positives and some false negatives. A single junction will most likely be found several times, so you'll have to account for that. You can make them unique by drawing small dots and detecting those dots (a sketch of this is at the end of this answer).
Result:
Code:
import cv2
import numpy as np
# load the image as grayscale
img = cv2.imread('xqXid.png',0)
# make a copy to display result
im_or = img.copy()
# the filter response exceeds the uint8 range; instead of converting
# the image here, a 16-bit signed output is requested from filter2D below
# create kernel
kernel = np.ones((7,7))
kernel[2:5,2:5] = 0
print(kernel)
# apply kernel; ask for a 16-bit signed output (ddepth 3 == cv2.CV_16S)
# so the filter response doesn't overflow:
res = cv2.filter2D(img, cv2.CV_16S, kernel)
# filter results
loc = np.where(res > 2800)
print(len(loc[0]))
#draw circles on found locations
for x in range(len(loc[0])):
    cv2.circle(im_or, (loc[1][x], loc[0][x]), 10, (127), 5)
#display result
cv2.imshow('Result',im_or)
cv2.waitKey(0)
cv2.destroyAllWindows()
Note: you can try to tweak the kernel and the threshold. For example, with the code above I got 126 matches. But when I use
kernel = np.ones((5,5))
kernel[1:4,1:4] = 0
with threshold
loc = np.where(res > 1550)
I got 33 matches in these locations:
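To make the matches unique, as suggested above, you can draw the found locations as filled dots on a blank mask and then take one centroid per blob - a sketch that continues from the variables in the code above:
# Draw every match as a filled dot; overlapping detections of the
# same junction merge into a single blob:
mask = np.zeros(img.shape[:2], dtype=np.uint8)
for x in range(len(loc[0])):
    cv2.circle(mask, (loc[1][x], loc[0][x]), 10, 255, -1)
# One centroid per blob = one unique junction (label 0 is the background):
_, _, _, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
uniqueImg = cv2.imread('xqXid.png', 0)  # fresh copy to draw on
for cx, cy in centroids[1:]:
    cv2.circle(uniqueImg, (int(cx), int(cy)), 10, (127), 5)
cv2.imshow('Unique junctions', uniqueImg)
cv2.waitKey(0)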
You can use the Harris corner detector algorithm to detect the vein junctions in the above image. Compared to previous techniques, the Harris corner detector takes the differential of the corner score into account with reference to direction directly, instead of using shifting patches for every 45 degree angle, and has been proved to be more accurate in distinguishing between edges and corners (source: Wikipedia).
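Concretely, Harris builds a structure matrix M from local image gradients and scores every pixel with a corner response R:

M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}, \qquad R = \det(M) - k \,(\operatorname{tr} M)^2

A large positive R indicates a corner (here, a junction), a large negative R an edge, and a small |R| a flat region; k is the free parameter passed as 0.04 to cv2.cornerHarris below.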
code:
# Imports:
import cv2
import numpy as np
img = cv2.imread('wings-bee.png')
# convert image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)
'''
args:
img - Input image, it should be grayscale and float32 type.
blockSize - It is the size of neighbourhood considered for corner detection
ksize - Aperture parameter of Sobel derivative used.
k - Harris detector free parameter in the equation.
'''
dst = cv2.cornerHarris(gray, 9, 5, 0.04)
# result is dilated for marking the corners
dst = cv2.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img_thresh = cv2.threshold(dst, 0.32*dst.max(), 255, 0)[1]
img_thresh = np.uint8(img_thresh)
# get the matrix with the x and y locations of each centroid
centroids = cv2.connectedComponentsWithStats(img_thresh)[3]
stop_criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# refine corner coordinates to subpixel accuracy
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5,5), (-1,-1), stop_criteria)
for i in range(1, len(corners)):
    cv2.circle(img, (int(corners[i, 0]), int(corners[i, 1])), 5, (0, 255, 0), 2)
cv2.imshow('img', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
output:
You can check the theory behind the Harris corner detector algorithm here.