Bulk removing unwanted parts of images - python

I have downloaded a number of images (1000) from a website but they each have a black and white ruler running along 1 or 2 edges and some have these catalogue number tickets. I need these elements removed, the ruler at the very least.
Example images of coins:
The images all have the ruler in slightly different places so i cant just preform the same crop on them.
So I tried to remove the black and replace it with white using this code
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
im = Image.open('image-0.jpg')
im = im.convert('RGBA')
data = np.array(im) # "data" is a height x width x 4 numpy array
red, green, blue, alpha = data.T # Temporarily unpack the bands for readability
# Replace black with white
black_areas = (red < 150) & (blue < 150) & (green < 150)
data[..., :-1][black_areas.T] = (255, 255, 255) # Transpose back needed
im2 = Image.fromarray(data)
im2.show()
but it pretty much just removed half the coin as well:
I was having a read of some posts on opencv but though I'd see if there was a simpler way I'd missed first.

So I have taken a look at your problem and I have found a solution for your two images you provided, I hope it works for you other images as well but it is always hard to tell as it can be different on an individual basis. This solution is using OpenCV for preprocessing and contour detection to get the 2nd and 3rd largest elements in your picture (largest is the bounding box around the edges) which should be your coins. Then I create a box around those two items and add some padding before I crop to size.
So we start off with preprocessing:
import numpy as np
import cv2
img = cv2.imread(r'<PATH TO YOUR IMAGE>')
img = cv2.resize(img, None, fx=3, fy=3)
imgray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(imgray, (5, 5), 0)
ret, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
Still rather basic, we make the image bigger so it is easier to detect contours, then we turn it into grayscale, blur it and apply thresholding to it so we turn all grey values either white or black. This then gives us the following image:
We now do contour detection, get the areas around our contours and sort them by the biggest area. Then we drop the biggest one as it is the box around the whole image and take the 2nd and 3rd biggest. And then get the x,y,w,h values we are interested in.
contours, hierarchy = cv2.findContours(
thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
areas = []
for cnt in contours:
area = cv2.contourArea(cnt)
areas.append((area, cnt))
areas.sort(key=lambda x: x[0], reverse=True)
areas.pop(0)
x, y, w, h = cv2.boundingRect(areas[0][1])
x2, y2, w2, h2 = cv2.boundingRect(areas[1][1])
If we draw a rectangle around those contours:
Now we take those coordinates and create a box around both of them. This might need some minor adjusting as I just quickly took the bigger width of the two and not the corresponding one for the right coin but since I added extra padding it should be fine in most cases. And finally crop to size:
pad = 15
img = img[(min(y, y2) - pad) : (max(y, y2) + max(h, h2) + pad),
(min(x, x2) - pad) : (max(x, x2) + max(w, w2) + pad)]
I hope this helps you to understand how you could achieve what you want, I tried it on both your images and it worked well for them. It might need some adjustments and depending on how your other images look the simple approach of taking the two biggest objects (apart from image bounding box) might be turned into something more sophisticated to detect the cricular shapes or something along those lines. Alternatively you could try to detect the rulers and crop from their position inwards. You will have to decide after you have done this on more example images in your dataset.

If you're looking for a robust solution, you should try something like Max Kaha's response, since it'll provide you with greater fine tuning.
Since the rulers tend to be left with just a little bit of text after your "black to white" filter, a quick solution is to use erosion followed by a dilation to create a mask for your images, and then apply the mask to the original image.
Pillow offers that with the ImageFilter class. Here's your code with a few modifications that'll achieve that:
from PIL import Image, ImageFilter
import numpy as np
import matplotlib.pyplot as plt
WHITE = 255, 255, 255
input_image = Image.open('image.png')
input_image = input_image.convert('RGBA')
input_data = np.array(input_image) # "data" is a height x width x 4 numpy array
red, green, blue, alpha = input_data.T # Temporarily unpack the bands for readability
# Replace black with white
thresh = 30
black_areas = (red < thresh) & (blue < thresh) & (green < thresh)
input_data[..., :-1][black_areas.T] = WHITE # Transpose back needed
erosion_factor = 5
# dilation is bigger to avoid cropping the objects of interest
dilation_factor = 11
erosion_filter = ImageFilter.MaxFilter(erosion_factor)
dilation_filter = ImageFilter.MinFilter(dilation_factor)
eroded = Image.fromarray(input_data).filter(erosion_filter)
dilated = eroded.filter(dilation_filter)
mask_threshold = 220
# the mask is black on regions to be hidden
mask = dilated.convert('L').point(lambda x: 255 if x < mask_threshold else 0)
# create base image
output_image = Image.new('RGBA', input_image.size, WHITE)
# paste only the desired regions
output_image.paste(input_image, mask=mask)
output_image.show()
You should also play around with the black to white threshold and the erosion/dilation factors to try and find the best fit for most of your images.

Related

Detecting a horizontal line in an image

Problem:
I'm working with a dataset that contains many images that look something like this:
Now I need all these images to be oriented horizontally or vertically, such that the color palette is either at the bottom or the right side of the image. This can be done by simply rotating the image, but the tricky part is figuring out which images should be rotated and which shouldn't.
What I have tried:
I thought that the best way to do this, is by detecting the white line that separates the the color palette from the image. I decided to rotate all images that have the palette at the bottom such that they have it at the right side.
# yes I am mixing between PIL and opencv (I like the PIL resizing more)
# resize image to be 128 by 128 pixels
img = img.resize((128, 128), PIL.Image.BILINEAR)
img = np.array(img)
# perform edge detection, not sure if these are the best parameters for Canny
edges = cv2.Canny(img, 30, 50, 3, apertureSize=3)
has_line = 0
# take numpy slice of the area where the white line usually is
# (not always exactly in the same spot which probably has to do with the way I resize my image)
for line in edges[75:80]:
# check if most of one of the lines contains white pixels
counts = np.bincount(line)
if np.argmax(counts) == 255:
has_line = True
# rotate if we found such a line
if has_line == True:
s = np.rot90(s)
An example of it working correctly:
An example of it working incorrectly:
This works maybe on 98% of images but there are some cases where it will rotate images that shouldn't be rotated or not rotate images that should be rotated. Maybe there is an easier way to do this, or maybe a more elaborate way that is more consistent? I could do it manually but I'm dealing with a lot of images. Thanks for any help and/or comments.
Here are some images where my code fails for testing purposes:
You can start by thresholding your image by setting a very high threshold like 250 to take advantage of the property that your lines are white. This will make all the background black. Now create a special horizontal kernel with a shape like (1, 15) and erode your image with it. What this will do is remove the vertical lines from the image and only the horizontal lines will be left.
import cv2
import numpy as np
img = cv2.imread('horizontal2.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY)
kernel_hor = np.ones((1, 15), dtype=np.uint8)
erode = cv2.erode(thresh, kernel_hor)
As stated in the question the color palates can only be on the right or the bottom. So we can test to check how many contours does the right region has. For this just divide the image in half and take the right part. Before finding contours dilate the result to fill in any gaps with a normal (3, 3) kernel. Using the cv2.RETR_EXTERNAL find the contours and count how many we have found, if greater than a certain number the image is correct side up and there is no need to rotate.
right = erode[:, erode.shape[1]//2:]
kernel = np.ones((3, 3), dtype=np.uint8)
right = cv2.dilate(right, kernel)
cnts, _ = cv2.findContours(right, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if len(cnts) > 3:
print('No need to rotate')
else:
print('rotate')
#ADD YOUR ROTATE CODE HERE
P.S. I tested for all four images you have provided and it worked well. If in case it does not work for any image let me know.

How Can I Add an Outline/Stroke/Border to a PNG Image with Pillow Library in Python?

I am trying to use the Pillow (python-imaging-library) Python library in order to create an outline/stroke/border (with any color and width chosen) around my .png image. You can see here the original image and my wanted result (create by a phone app):
https://i.stack.imgur.com/4x4qh.png
You can download the png file of the original image here: https://pixabay.com/illustrations/brain-character-organ-smart-eyes-1773885/
I have done it in the medium size(1280x1138) but maybe it is better to do it with the smallest size (640x569).
I tried to solve the problem with two methods.
METHOD ONE
The first method is to create a fully blacked image of the brain.png image, enlarge it, and paste the original colored brain image on top of it. Here is my code:
brain_black = Image.open("brain.png") #load brain image
width = brain_black.width #in order not to type a lot
height = brain_black.height #in order not to type a lot
rectangle = Image.new("RGBA", (width, height), "black") #creating a black rectangle in the size of the brain image
brain_black.paste(rectangle, mask=brain_black) #pasting on the brain image the black rectangle, and masking it with the brain picture
#now brain_black is the brain.png image, but all its pixels are black. Let's continue:
brain_black = brain_black.resize((width+180, height+180)) #resizing the brain_black by some factor
brain_regular = Image.open("brain.png") #load the brain image in order to paste later on
brain_black.paste(brain_regular,(90,90), mask=brain_regular) #paste the regular (colored) brain on top of the enlarged black brain (in x=90, y=90, the middle of the black brain)
brain_black.save("brain_method_resize.png") #saving the image
This method doesn't work, as you can see in the image link above. It might have worked for simple geometric shapes, but not for a complicated shape like this.
METHOD TWO
The second method is to load the brain image pixels data into a 2-dimensional array, and loop over all of the pixels. Check the color of every pixel, and in every pixel which is not transparent (means A(or Alpha) is not 0 in the rgbA form) to draw a black pixel in the pixel above, below, right, left, main diagonal down, main diagonal up, secondary diagonal (/) down and secondary diagonal (/) up. Then to draw a pixel in the second pixel above, the second pixel below and etc. this was done with a "for loop" where the number of repetitions is the wanted stroke width (in this example is 30). Here is my code:
brain=Image.open("brain.png") #load brain image
background=Image.new("RGBA", (brain.size[0]+400, brain.size[1]+400), (0, 0, 0, 0)) #crate a background transparent image to create the stroke in it
background.paste(brain, (200,200), brain) #paste the brain image in the middle of the background
pixelsBrain = brain.load() #load the pixels array of brain
pixelsBack=background.load() #load the pixels array of background
for i in range(brain.size[0]):
for j in range(brain.size[1]):
r, c = i+200, j+200 #height and width offset
if(pixelsBrain[i,j][3]!=0): #checking if the opacity is not 0, if the alpha is not 0.
for k in range(30): #the loop
pixelsBack[r, c + k] = (0, 0, 0, 255)
pixelsBack[r, c - k] = (0, 0, 0, 255)
pixelsBack[r + k, c] = (0, 0, 0, 255)
pixelsBack[r - k, c] = (0, 0, 0, 255)
pixelsBack[r + k, c + k] = (0, 0, 0, 255)
pixelsBack[r - k, c - k] = (0, 0, 0, 255)
pixelsBack[r + k, c - k] =(0, 0, 0, 255)
pixelsBack[r - k, c + k] = (0, 0, 0, 255)
background.paste(brain, (200,200), brain) #pasting the colored brain onto the background, because the loop "destroyed" the picture.
background.save("brain_method_loop.png")
This method did work, but it is very time-consuming (takes about 30 seconds just for one picture and 30 pixels stroke). I want to do it for many pictures so this method is not good for me.
Is there an easier and better way to reach my wanted result using Python Pillow library. How can I do it?
And also, how can I fasten my loop code (I understood something about Numpy and OpenCV, which is better for this purpose?)
I know that if a phone app could do it in a matter of milliseconds, also python can, but I didn't find any way to do it.
Thank you.
I tried some solution similar with photoshop stroke effect using OpenCV (It is not perfect and I still finding better solution)
This algorithm is based on euclidean distance transform. I also tried dilation algorithm with ellipse kernel structure, it is bit different with photoshop, and there are some information that distance transform is the way that photoshop using.
def stroke(origin_image, threshold, stroke_size, colors):
img = np.array(origin_image)
h, w, _ = img.shape
padding = stroke_size + 50
alpha = img[:,:,3]
rgb_img = img[:,:,0:3]
bigger_img = cv2.copyMakeBorder(rgb_img, padding, padding, padding, padding,
cv2.BORDER_CONSTANT, value=(0, 0, 0, 0))
alpha = cv2.copyMakeBorder(alpha, padding, padding, padding, padding, cv2.BORDER_CONSTANT, value=0)
bigger_img = cv2.merge((bigger_img, alpha))
h, w, _ = bigger_img.shape
_, alpha_without_shadow = cv2.threshold(alpha, threshold, 255, cv2.THRESH_BINARY) # threshold=0 in photoshop
alpha_without_shadow = 255 - alpha_without_shadow
dist = cv2.distanceTransform(alpha_without_shadow, cv2.DIST_L2, cv2.DIST_MASK_3) # dist l1 : L1 , dist l2 : l2
stroked = change_matrix(dist, stroke_size)
stroke_alpha = (stroked * 255).astype(np.uint8)
stroke_b = np.full((h, w), colors[0][2], np.uint8)
stroke_g = np.full((h, w), colors[0][1], np.uint8)
stroke_r = np.full((h, w), colors[0][0], np.uint8)
stroke = cv2.merge((stroke_b, stroke_g, stroke_r, stroke_alpha))
stroke = cv2pil(stroke)
bigger_img = cv2pil(bigger_img)
result = Image.alpha_composite(stroke, bigger_img)
return result
def change_matrix(input_mat, stroke_size):
stroke_size = stroke_size - 1
mat = np.ones(input_mat.shape)
check_size = stroke_size + 1.0
mat[input_mat > check_size] = 0
border = (input_mat > stroke_size) & (input_mat <= check_size)
mat[border] = 1.0 - (input_mat[border] - stroke_size)
return mat
def cv2pil(cv_img):
cv_img = cv2.cvtColor(cv_img, cv2.COLOR_BGRA2RGBA)
pil_img = Image.fromarray(cv_img.astype("uint8"))
return pil_img
output = stroke(test_image, threshold=0, stroke_size=10, colors=((0,0,0),))
I can't do a fully tested Python solution for you at the moment as I have other commitments, but I can certainly show you how to do it in a few milliseconds and give you some pointers.
I just used ImageMagick at the command line. It runs on Linux and macOS (use brew install imagemagick) and Windows. So, I extract the alpha/transparency channel and discard all the colour info. Then use a morphological "edge out" operation to generate a fat line around the edges of the shape in the alpha channel. I then invert the white edges so they become black and make all the white pixels transparent. Then overlay on top of the original image.
Here's the full command:
magick baby.png \( +clone -alpha extract -morphology edgeout octagon:9 -threshold 10% -negate -transparent white \) -flatten result.png
So that basically opens the image, messes about with a cloned copy of the alpha layer inside the parentheses and then flattens the black outline that results back onto the original image and saves it. Let's do the steps one at a time:
Extract the alpha layer as alpha.png:
magick baby.png -alpha extract alpha.png
Now fatten the edges, invert and make everything not black become transparent and save as overlay.png:
magick alpha.png -morphology edgeout octagon:9 -threshold 10% -negate -transparent white overlay.png
Here's the final result, change the octagon:9 to octagon:19 for fatter lines:
So, with PIL... you need to open the image and convert to RGBA, then split the channels. You don't need to touch the RGB channels just the A channel.
im = Image.open('baby.png').convert('RGBA')
R, G, B, A = im.split()
Some morphology needed here - see here.
Merge the original RGB channels with the new A channel and save:
result = Image.merge((R,G,B,modifiedA))
result.save('result.png')
Note that there are Python bindings to ImageMagick called wand and you may find it easier to translate my command-line stuff using that... wand. Also, scikit-image has an easy-to-use morphology suite too.
I've written this function which is based on morphological dilation and lets you set the stroke size and color. But it's EXTREMELY slow and it seems to not work great with small elements.
If anyone can help me speed it up it would be extremely helpful.
def addStroke(image,strokeSize=1,color=(0,0,0)):
#Create a disc kernel
kernel=[]
kernelSize=math.ceil(strokeSize)*2+1 #Should always be odd
kernelRadius=strokeSize+0.5
kernelCenter=kernelSize/2-1
pixelRadius=1/math.sqrt(math.pi)
for x in range(kernelSize):
kernel.append([])
for y in range(kernelSize):
distanceToCenter=math.sqrt((kernelCenter-x+0.5)**2+(kernelCenter-y+0.5)**2)
if(distanceToCenter<=kernelRadius-pixelRadius):
value=1 #This pixel is fully inside the circle
elif(distanceToCenter<=kernelRadius):
value=min(1,(kernelRadius-distanceToCenter+pixelRadius)/(pixelRadius*2)) #Mostly inside
elif(distanceToCenter<=kernelRadius+pixelRadius):
value=min(1,(pixelRadius-(distanceToCenter-kernelRadius))/(pixelRadius*2)) #Mostly outside
else:
value=0 #This pixel is fully outside the circle
kernel[x].append(value)
kernelExtent=int(len(kernel)/2)
imageWidth,imageHeight=image.size
outline=image.copy()
outline.paste((0,0,0,0),[0,0,imageWidth,imageHeight])
imagePixels=image.load()
outlinePixels=outline.load()
#Morphological grayscale dilation
for x in range(imageWidth):
for y in range(imageHeight):
highestValue=0
for kx in range(-kernelExtent,kernelExtent+1):
for ky in range(-kernelExtent,kernelExtent+1):
kernelValue=kernel[kx+kernelExtent][ky+kernelExtent]
if(x+kx>=0 and y+ky>=0 and x+kx<imageWidth and y+ky<imageHeight and kernelValue>0):
highestValue=max(highestValue,min(255,int(round(imagePixels[x+kx,y+ky][3]*kernelValue))))
outlinePixels[x,y]=(color[0],color[1],color[2],highestValue)
outline.paste(image,(0,0),image)
return outline
Very simple and primitive solution: use PIL.ImageFilter.FIND_EDGES to find edge of drawing, it is about 1px thick, and draw a circle in every point of the edge. It is quite fast and require few libs, but has a disadvantage of no smoothing.
from PIL import Image, ImageFilter, ImageDraw
from pathlib import Path
def mystroke(filename: Path, size: int, color: str = 'black'):
outf = filename.parent/'mystroke'
if not outf.exists():
outf.mkdir()
img = Image.open(filename)
X, Y = img.size
edge = img.filter(ImageFilter.FIND_EDGES).load()
stroke = Image.new(img.mode, img.size, (0,0,0,0))
draw = ImageDraw.Draw(stroke)
for x in range(X):
for y in range(Y):
if edge[x,y][3] > 0:
draw.ellipse((x-size,y-size,x+size,y+size),fill=color)
stroke.paste(img, (0, 0), img )
# stroke.show()
stroke.save(outf/filename.name)
if __name__ == '__main__':
folder = Path.cwd()/'images'
for img in folder.iterdir():
if img.is_file(): mystroke(img, 10)
Solution using PIL
I was facing the same need: outlining a PNG image.
Here is the input image:
Input image
I see that some solution have been found, but in case some of you want another alternative, here is mine:
Basically, my solution workflow is as follow:
Read and fill the non-alpha chanel of the PNG image with the border
color
Resize the unicolor image to make it bigger
Merge the original image to the bigger unicolor image
Here you go! You have an outlined PNG image with the width and color of your choice.
Here is the code implementing the workflow:
from PIL import Image
# Set the border and color
borderSize = 20
color = (255, 0, 0)
imgPath = "<YOUR_IMAGE_PATH>"
# Open original image and extract the alpha channel
im = Image.open(imgPath)
alpha = im.getchannel('A')
# Create red image the same size and copy alpha channel across
background = Image.new('RGBA', im.size, color=color)
background.putalpha(alpha)
# Make the background bigger
background=background.resize((background.size[0]+borderSize, background.size[1]+borderSize))
# Merge the targeted image (foreground) with the background
foreground = Image.open(imgPath)
background.paste(foreground, (int(borderSize/2), int(borderSize/2)), foreground.convert("RGBA"))
imageWithBorder = background
imageWithBorder.show()
And here is the outputimage:
Output image
Hope it helps!
I found a way to do this using the ImageFilter module, it is much faster than any custom implementation that I've seen here and doesn't rely on resizing which doesn't work for convex hulls
from PIL import Image, ImageFilter
stroke_radius = 5
img = Image.open("img.png") # RGBA image
stroke_image = Image.new("RGBA", img.size, (255, 255, 255, 255))
img_alpha = img.getchannel(3).point(lambda x: 255 if x>0 else 0)
stroke_alpha = img_alpha.filter(ImageFilter.MaxFilter(stroke_radius))
# optionally, smooth the result
stroke_alpha = stroke_alpha.filter(ImageFilter.SMOOTH)
stroke_image.putalpha(stroke_alpha)
output = Image.alpha_composite(stroke_image, img)
output.save("output.png")

OpenCV image numbers identification

Sorry for bad english. I want to make condition, that image with 12 in left corner = image with 12 in right corner and != image with 21.
I need a fast way to determine this, cause there are many pics and they refresh.
I tried to use counting pixels of specific image:
result = np.count_nonzero(np.all(original > (0,0,0), axis=2))
(why I use >(0,0,0) instead of == (255,255,255)? there are grey shadows near white symbols, that eyes can't see)
This way doesn't see a difference between 12 and 21.
I tried the second way, compare new images with templates, but it one see a huge difference between 12 and 12 in left-right corners!
original = ('auto/5or.png' )
template= cv2.imread( 'auto/5t.png' )
res = cv2.matchTemplate( original, template, cv2.TM_CCOEFF_NORMED )
I didn't try yet some difficult method of determining digits, cause I think - this is too slow, even on my little pics. (I may mistake).
I have digits only from 0 to 30, I have all templates, examples, they are differ only with location inside black square.
Any thoughts? Thanks in advance.
If you don't want the position of the digits in the image to make a difference, you can threshold the image to black and white and find the bounding box and crop to it so your digits are always in the same place - then just difference the images or use what you were using before:
#!/usr/local/bin/python3
import numpy as np
from PIL import Image
# Open image, greyscale and threshold
im=np.array(Image.open('21.png').convert('L'))
# Mask of white pixels
mask = im.copy()
mask[mask<128] = 0 # Threshold pixels < 128 down to black
# Coordinates of white pixels
coords = np.argwhere(mask)
# Bounding box of white pixels
x0, y0 = coords.min(axis=0)
x1, y1 = coords.max(axis=0) + 1
# Crop to bbox
cropped = im[x0:x1, y0:y1]
# Save
Image.fromarray(cropped).save('result.png')
That gives you this:
Obviously crop your template images as well.
I am less familiar with OpenCV in Python, but it would look something like this:
import cv2
# Load image
img = cv2.imread('21.png',0)
# Threshold at 127
ret,thresh = cv2.threshold(img,127,255,0)
# Get contours
im2, contours, hierarchy = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# Get bounding box
cnt = contours[0]
x,y,w,h = cv2.boundingRect(cnt)

Merge MSER detected objetcs (OpenCV, Python)

I am working on this image as source:
Applying the next code...
import cv2
import numpy as np
mser = cv2.MSER_create()
img = cv2.imread('C:\\Users\\Link\\Desktop\\test2.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions, _ = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions]
cv2.polylines(vis, hulls, 1, (0, 255, 0))
mask = np.zeros((img.shape[0], img.shape[1], 1), dtype=np.uint8)
for contour in hulls:
cv2.drawContours(mask, [contour], -1, (255, 255, 255), -1)
text_only = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow('img', vis)
cv2.waitKey(0)
cv2.imshow('img', mask)
cv2.waitKey(0)
cv2.imshow('img', text_only)
cv2.waitKey(0)
cv2.imwrite('C:\\Users\\Link\\Desktop\\test_o\\1.png', text_only)
...I am obtaining this as result (mask):
The question is this:
how to merge into a single object the number 5 in the number series (157661546) as long as it is divided in the mask image ?
Thanks
Have a look here, it seems like the exact answer.
Here instead there is my version of the above code fine tuned for text extraction (with masking too).
Below there is the original code from the previous article, "ported" to python 3, opencv 3, added mser and bounding boxes. The main difference with my version is how the grouping distance is defined: mine is text-oriented while the one below is a free geometrical distance.
import sys
import cv2
import numpy as np
def find_if_close(cnt1,cnt2):
row1,row2 = cnt1.shape[0],cnt2.shape[0]
for i in range(row1):
for j in range(row2):
dist = np.linalg.norm(cnt1[i]-cnt2[j])
if abs(dist) < 25: # <-- threshold
return True
elif i==row1-1 and j==row2-1:
return False
img = cv2.imread(sys.argv[1])
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
cv2.imshow('input', img)
ret,thresh = cv2.threshold(gray,127,255,0)
mser=False
if mser:
mser = cv2.MSER_create()
regions = mser.detectRegions(thresh)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
contours = hulls
else:
thresh = cv2.bitwise_not(thresh) # wants black bg
im2,contours,hier = cv2.findContours(thresh,cv2.RETR_EXTERNAL,2)
cv2.drawContours(img, contours, -1, (0,0,255), 1)
cv2.imshow('base contours', img)
LENGTH = len(contours)
status = np.zeros((LENGTH,1))
print("Elements:", len(contours))
for i,cnt1 in enumerate(contours):
x = i
if i != LENGTH-1:
for j,cnt2 in enumerate(contours[i+1:]):
x = x+1
dist = find_if_close(cnt1,cnt2)
if dist == True:
val = min(status[i],status[x])
status[x] = status[i] = val
else:
if status[x]==status[i]:
status[x] = i+1
unified = []
maximum = int(status.max())+1
for i in range(maximum):
pos = np.where(status==i)[0]
if pos.size != 0:
cont = np.vstack(contours[i] for i in pos)
hull = cv2.convexHull(cont)
unified.append(hull)
cv2.drawContours(img,contours,-1,(0,0,255),1)
cv2.drawContours(img,unified,-1,(0,255,0),2)
#cv2.drawContours(thresh,unified,-1,255,-1)
for c in unified:
(x,y,w,h) = cv2.boundingRect(c)
cv2.rectangle(img, (x,y), (x+w,y+h), (255, 0, 0), 2)
cv2.imshow('result', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Sample output (the yellow blob is below the binary threshold conversion so it's ignored). Red: original contours, green: unified ones, blue: bounding boxes.
Probably there is no need to use MSER as a simple findContours may work fine.
------------------------
Starting from here there is my old answer, before I found the above code. I'm leaving it anyway as it describe a couple of different approaches that may be easier/more appropriate for some scenarios.
A quick and dirty trick is to add a small gaussian blur and a high threshold before the MSER (or some dilute/erode if you prefer fancy things). In practice you just make the text bolder so that it fills small gaps. Obviously you can later discard this version and crop from the original one.
Otherwise, if your text is in lines, you may try to detect the average line center (make an histogram of Y coordinates and find the peaks for example). Then, for each line, look for fragments with a close average X. Quite fragile if text is noisy/complex.
If you do not need to split each letter, getting the bounding box for the whole word, may be easier: just split in groups based on a maximum horizontal distance between fragments (using the leftmost/rightmost points of the contour). Then use the leftmost and rightmost boxes within each group to find the whole bounding box. For multiline text first group by centroids Y coordinate.
Implementation notes:
Opencv allows you to create histograms but you probably can get away with something like this (worked for me on a similar task):
def histogram(vals, th=4, bins=400):
hist = np.zeros(bins)
for y_center in vals:
bucket = int(round(y_center / 2.)) <-- change this "2."
hist[bucket-1] += 1
print("hist: ", hist)
hist = np.where(hist > th, hist, 0)
return hist
Here my histogram is just an array with 400 buckets (my image was 800px high so each bucket catches two pixels, that is where the "2." comes from). Vals are the Y coordinates of the centroids of each fragment (you may want to ignore very small elements when you build this list). The th threshold is there just to remove some noise. You should get something like this:
0,0,0,5,22,0,0,0,0,43,7,0,0,0
This list describes, moving top to bottom, how many fragments are at each location.
Now I ran another pass to merge the peaks into a single value (just scan the array and sum while it is non-zero and reset the count on first zero) getting something like this {y:count}:
{9:27, 20:50}
Now I know I have two text rows at y=9 and y=20. Now, or before, you assign each fragment to on line (with again an 8px threshold in my case). Now you can process each line on its own, finding "words". BTW, I have your identical problem with broken letters that's why I came here looking for MSER :). Notice that if you find the whole bounding box for the word this problem happens only on the first/last letters: the other broken letters just falls inside the word box anyway.
Here is a reference for the erode/dilate thing, but gaussian blur/th worked for me.
UPDATE: I've noticed that there is something wrong in this line:
regions = mser.detectRegions(thresh)
I pass in the already thresholded image(!?). This is not relevant for the aggregation part but keep in mind that the mser part is not being used as expected.

opencv python detecting color of object is white or black

I'm new to opencv, I've managed to detect the object and place a ROI around it but I can't managed it so detect if the object is black or white. I've found something i think but i don't know if this is the right solution. The function should return True of False if it's black or white. Anyone experience with this?
def filter_color(img):
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
lower_black = np.array([0,0,0])
upper_black = np.array([350,55,100])
black = cv2.inRange(hsv, lower_black, upper_black)
If you are certain that the ROI is going to be basically black or white and not worried about misidentifying something, then you should be able to just average the pixels in the ROI and check if it is above or below some threshold.
In the code below, after you set an ROI using the newer numpy method, you can pass the roi/image into the method as if you were passing a full image.
Copy-Paste Sample
import cv2
import numpy as np
def is_b_or_w(image, black_max_bgr=(40, 40, 40)):
# use this if you want to check channels are all basically equal
# I split this up into small steps to find out where your error is coming from
mean_bgr_float = np.mean(image, axis=(0,1))
mean_bgr_rounded = np.round(mean_bgr_float)
mean_bgr = mean_bgr_rounded.astype(np.uint8)
# use this if you just want a simple threshold for simple grayscale
# or if you want to use an HSV (V) measurement as in your example
mean_intensity = int(round(np.mean(image)))
return 'black' if np.all(mean_bgr < black_max_bgr) else 'white'
# make a test image for ROIs
shape = (10, 10, 3) # 10x10 BGR image
im_blackleft_white_right = np.ndarray(shape, dtype=np.uint8)
im_blackleft_white_right[:, 0:4] = 10
im_blackleft_white_right[:, 5:9] = 255
roi_darkgray = im_blackleft_white_right[:,0:4]
roi_white = im_blackleft_white_right[:,5:9]
# test them with ROI
print 'dark gray image identified as: {}'.format(is_b_or_w(roi_darkgray))
print 'white image identified as: {}'.format(is_b_or_w(roi_white))
# output
# dark gray image identified as: black
# white image identified as: white
I don't know if this is the right approach but it worked for me.
black = [0,0,0]
Thres = 50
h,w = img.shape[:2]
black = 0
not_black = 0
for y in range(h):
for x in range(w):
pixel = img[y][x]
d = math.sqrt((pixel[0]-0)**2+(pixel[1]-0)**2+(pixel[2]-0)**2)
if d<Thres:
black = black + 1
else:
not_black = not_black +1
This one worked for me but like i said, don't know if this is the right approach. It's ask a lot of processing power therefore i defined a ROI which is much smaller. The Thres is currently hard-coded...

Categories

Resources