Extract the positions of the maximum pixel value of an image - Python
I am a newbie here. I am trying to extract a single line along the edge of a 2D flame so that I can then calculate the actual (3D) flame area. The first step is getting the edge. The 2D flame is a side-viewed concave flame, so the flame base (the flat part) is brighter than the concave segment. I use the code below to find the edge; my method is to find the position of the maximum pixel value along the y-axis for each column. The result does not achieve my purpose. Could you please help me figure out what is wrong? Thanks very much in advance.
Original image. In the code I rotate the image.
from PIL import Image
import numpy as np
import cv2

def initialization_rotate(path):
    global h, w, img
    img4 = np.array(Image.open(path).convert('L'))
    img3 = img4.transpose(1, 0)        # swap the axes
    img2 = img3[::-1, ::1]             # reverse the row order
    img = img2[400:1000, 1:248]        # crop the region of interest
    h, w = img.shape

path = 'D:\\20190520\\14\\14\\1767.jpg'
initialization_rotate(path)
# Noise cancellation
def opening(binary):
    opened = np.zeros_like(binary)
    for j in range(1, w-1):
        for i in range(1, h-1):
            if binary[i][j] > 100:
                n1 = binary[i-1][j-1]
                n2 = binary[i-1][j]
                n3 = binary[i-1][j+1]
                n4 = binary[i][j-1]
                n5 = binary[i][j+1]
                n6 = binary[i+1][j-1]
                n7 = binary[i+1][j]
                n8 = binary[i+1][j+1]
                sum8 = int(n1) + int(n2) + int(n3) + int(n4) + int(n5) + int(n6) + int(n7) + int(n8)
                if sum8 < 1000:
                    opened[i][j] = 0
                else:
                    opened[i][j] = 255
            else:
                pass
    return opened
edge = np.zeros_like(img)

# Find the max pixel value and extract the position
for j in range(w-1):
    ys = [0]
    for i in range(h-1):
        if img[i][j] > 100:
            ys.append(i)
    ymax = np.amax(ys)
    edge[ymax][j] = 255
cv2.namedWindow('edge')
while True:
    cv2.imshow('edge', edge)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break
cv2.destroyAllWindows()
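As an aside, the per-column "position of the maximum pixel value" described above can also be computed without explicit Python loops. This is only a minimal sketch of that idea (not a fix for the result), assuming img is the cropped greyscale array set up by initialization_rotate:

# row index of the brightest pixel in every column (ties resolve to the first occurrence)
ymax_per_column = np.argmax(img, axis=0)
edge_vectorized = np.zeros_like(img)
edge_vectorized[ymax_per_column, np.arange(img.shape[1])] = 255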
I have done some very quick coding from the ground up (without looking into established or state-of-the-art edge-detection algorithms). Not very surprisingly, the results are quite poor. The code pasted below works only for RGB images (i.e. only three channels, not CMYK, greyscale, RGBA or anything else). Also, I have only tested it on a single, very simplistic image. Real-life images are more complicated, and I don't think it will fare very well on them yet. It needs a lot of work. However, I am hesitantly sharing it since it was requested by @Gia Tri.
Here is what I did. For every column I calculated the mean and standard deviation of the pixel intensities. I hoped that at the edge the intensity would deviate from the mean by more than the standard deviation (multiplied by a factor). If I mark the first and last such pixel in each column, I get an edge point for every column and, hopefully, once stitched together, they form an edge. The code and the attached image show how I fared.
from scipy import ndimage
import numpy as np
import matplotlib.pyplot as plt

UppperStdBoundaryMultiplier = 1.0
LowerStdBoundaryMultiplier = 1.0
NegativeSelection = False

def SumSquareRGBintensityOfPixel(Pixel):
    return np.sum(np.power(Pixel, 2), axis=0)

def GetTheContinousStretchForAcolumn(Column):
    global UppperStdBoundaryMultiplier
    global LowerStdBoundaryMultiplier
    global NegativeSelection
    SumSquaresIntensityOfColumn = np.apply_along_axis(SumSquareRGBintensityOfPixel, 1, Column)
    Mean = np.mean(SumSquaresIntensityOfColumn)
    StdDev = np.std(SumSquaresIntensityOfColumn)
    LowerThreshold = Mean - LowerStdBoundaryMultiplier*StdDev
    UpperThreshold = Mean + UppperStdBoundaryMultiplier*StdDev
    if NegativeSelection:
        Index = np.where(SumSquaresIntensityOfColumn < LowerThreshold)
        Column[Index, :] = np.array([255, 255, 255])
    else:
        Index = np.where(SumSquaresIntensityOfColumn >= LowerThreshold)
        LeastIndex = Index[Index==True][0]
        LastIndex = Index[Index==True][-1]
        Column[[LeastIndex, LastIndex], :] = np.array([255, 0, 0])
    return Column

def DoEdgeDetection(ImageFilePath):
    # note: scipy.ndimage.imread has been removed from recent SciPy releases;
    # imageio.imread can be used as a replacement
    FileHandle = ndimage.imread(ImageFilePath)
    for Column in range(FileHandle.shape[1]):
        FileHandle[:, Column, :] = GetTheContinousStretchForAcolumn(FileHandle[:, Column, :])
    plt.imshow(FileHandle)
    plt.show()

DoEdgeDetection("/PathToImage/Image_1.jpg")
Below is the result. On the left is the query image whose edge had to be detected, and on the right is the edge-detected image, with the edge points marked as red dots. As you can see, it fared poorly, but with some investment of time and thought it might do far better ... or maybe not. Maybe it is a good start but far from the finish. Please be the judge!
***** Edit after clarification on requirement from GiaTri ***************
I did manage to change the program; the idea remains the same. However, this time the problem is simplified to the case where you want to detect only a blue flame. I actually went ahead and made it work for all three colour channels, though I doubt it will be useful to you beyond the blue channel.
How to use the program below
If your flame is vertical, choose edges = "horizontal" in the class instantiation. If your flame is horizontal, choose edges = "vertical". This might be a little confusing, but please use it this way for the time being. Later either you can change it or I can.
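For example, with the DetectEdges class given below (a minimal usage sketch; the image file names are placeholders):

# vertical (tall) flame: mark the leftmost/rightmost bright pixel in each row
tall_flame = DetectEdges("vertical_flame.jpg", Channel=["blue"], edges="horizontal")
tall_flame.GetTheChannelEdge()
tall_flame.ShowTheImage()

# horizontal flame: mark the topmost/bottommost bright pixel in each column
wide_flame = DetectEdges("horizontal_flame.jpg", Channel=["blue"], edges="vertical")
wide_flame.GetTheChannelEdge()
wide_flame.ShowTheImage()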
First, let me convince you that the edge detection is working much better than yesterday. See the two images below; I took these two flame images from the internet. As before, the image whose edge has to be detected is on the left, and the edge-detected image is on the right, with the edges marked as red dots.
First horizontal flame.
and then a vertical flame.
There is still a lot of work left in this. However, if you are a little more convinced than yesterday, then the code is below.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread

class DetectEdges():

    def __init__(self, ImagePath, Channel = ["blue"], edges="vertical"):
        self.Channel = Channel
        self.edges = edges
        self.Image_ = imread(ImagePath)
        self.Image = np.copy(self.Image_)
        self.Dimensions_X, self.Dimensions_Y, self.Channels = self.Image.shape
        self.BackGroundSamplingPercentage = 0.5

    def ShowTheImage(self):
        plt.imshow(self.Image)
        plt.show()

    def GetTheBackGroundPixels(self):
        NumberOfPoints = int(self.BackGroundSamplingPercentage*min(self.Dimensions_X, self.Dimensions_Y))
        Random_X = np.random.choice(self.Dimensions_X, size=NumberOfPoints, replace=False)
        Random_Y = np.random.choice(self.Dimensions_Y, size=NumberOfPoints, replace=False)
        Random_Pixels = np.array(list(zip(Random_X, Random_Y)))
        return Random_Pixels

    def GetTheChannelEdge(self):
        BackGroundPixels = self.GetTheBackGroundPixels()
        if self.edges == "vertical":
            if self.Channel == ["blue"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 2])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,2] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel, BottomPixel], column, :] = [255, 0, 0]
            if self.Channel == ["red"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 0])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,0] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel, BottomPixel], column, :] = [0, 255, 0]
            if self.Channel == ["green"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 1])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,1] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel, BottomPixel], column, :] = [255, 0, 0]
        elif self.edges == "horizontal":
            if self.Channel == ["blue"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 2])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,2] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row, [LeftPixel, RightPixel], :] = [255, 0, 0]
            if self.Channel == ["red"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 0])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,0] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row, [LeftPixel, RightPixel], :] = [0, 255, 0]
            if self.Channel == ["green"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0], BackGroundPixels[:,1], 1])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,1] > MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row, [LeftPixel, RightPixel], :] = [255, 0, 0]
Test = DetectEdges("FlameImagePath",Channel = ["blue"],edges="vertical")
Test.GetTheChannelEdge()
Test.ShowTheImage()
Please let me know if this was of any "more" help, or whether I missed some salient requirements.
Best wishes,
By the way, Amit, I would like to show my code, which uses the idea of thresholding the pixel value. I would love to discuss it with you.
import cv2
import matplotlib.pyplot as plt

if __name__ == '__main__':
    path = 'D:\\20181229__\\7\\Area 7\\1767.jpg'
    img1 = cv2.imread(path)
    b, g, r = cv2.split(img1)
    img3 = b[94:223, 600:700]
    img4 = cv2.flip(img3, 1)
    h, w = img3.shape
    data = []
    th_val = 20
    # for each row, record the first offset j whose pixel img3[i, -j] is at or above the threshold
    for i in range(h):
        for j in range(w):
            val = img3[i, -j]
            if val >= th_val:
                data.append(j)
                break
    x = range(len(data))
    plt.figure(figsize=(10, 7))
    plt.subplot(121)
    plt.imshow(img4)
    plt.plot(data, x)
    plt.subplot(122)
    plt.plot(data, x)
    plt.show()
Please see the link for the result. The thing is, the method still does not fully fit what I want. I hope to discuss it with you.
Link: https://imgur.com/QtNk7c7
Related
How to retrieve a list of indexes of white pixels whose neighbors are black using Python PIL and Numpy?
I've been looking for hours to find a similar question, but nothing has satisfied me. My problem is: I have a PIL image (representing a canal) already converted into a NumPy array (using the "L" mode of PIL), and I'd like to retrieve the white pixels whose neighbours are black (their indexes, in fact), without using for loops (the image is really huge). I thought of np.where, but I don't know how I should use it to solve my problem, and I also don't know if it would be faster than using for loops (because my aim is to reach this goal with the fastest solution). I hope I'm clear enough, and I thank you in advance for your response!
EDIT: for example, with this image (a simple canal; it is already a black-and-white image, so the image.convert('L') isn't really useful here, but the code should be generic if possible), I'd do something like this:

import numpy as np
from PIL import Image

image = Image.open(canal)
image = image.convert("L")
array = np.asarray(image)
l = []
for i in range(1, len(array) - 1):
    for j in range(1, len(array[0]) - 1):
        if array[i][j] == 255 and (array[i+1][j] == 0 or array[i-1][j] == 0 or array[i][j+1] == 0 or array[i][j-1] == 0):
            l.append((i, j))

and I'd hope to obtain l as fast as possible :) I've coloured the pixels I need in red in the next image: here.
EDIT2: thank you all for the help, it worked!
You could use the numba just-in-time compiler to speed up your loop.

from numba import njit

@njit
def find_highlow_pixels(img):
    pixels = []
    for j in range(1, img.shape[0]-1):
        for i in range(1, img.shape[1]-1):
            if (
                img[j, i] == 255 and (
                    img[j-1, i] == 0 or img[j+1, i] == 0
                    or img[j, i-1] == 0 or img[j, i+1] == 0
                )
            ):
                pixels.append((j, i))
    return pixels
Another possibility that came to my mind would be using the minimum filter. However, I would expect it to be slower than the first proposed solution, but it could be useful to build more on top of it.

import numpy as np
from scipy.ndimage import minimum_filter

# create a footprint that only takes the neighbours into account
neighbours = (np.arange(9) % 2 == 1).reshape(3, 3)

# create a mask of relevant pixels, img should be your image as array
mask = np.logical_and(
    img == 255,
    minimum_filter(img, footprint=neighbours) == 0
)

# get indexes
indexes = np.where(mask)

# as list
list(zip(*indexes))
If memory space is not considered, I prefer manipulation of masks like the following.

import numpy as np

# Step 1: Generate two masks of white and black.
mask_white = img == 255
mask_black = img == 0

# Step 2: Apply 8-neighborhood dilation on the black mask.
# If you want to use numpy only, you need to implement dilation by yourself.
# Define a function for 8-neighborhood dilation:
def dilate_8nb(m):
    index_row, index_col = np.where(m)
    ext_index_row = np.repeat(index_row, 9)
    ext_index_col = np.repeat(index_col, 9)
    ext_index_row.reshape(-1, 9)[:, :3] += 1
    ext_index_row.reshape(-1, 9)[:, -3:] -= 1
    ext_index_col.reshape(-1, 9)[:, ::3] += 1
    ext_index_col.reshape(-1, 9)[:, 2::3] -= 1
    ext_index_row = np.clip(ext_index_row, 0, m.shape[0]-1)
    ext_index_col = np.clip(ext_index_col, 0, m.shape[1]-1)
    ret = m.copy()
    ret[ext_index_row, ext_index_col] = True
    return ret

ext_mask_black = dilate_8nb(mask_black)
# or just use dilation from scipy
# from scipy import ndimage
# ext_mask_black = ndimage.binary_dilation(mask_black, structure=ndimage.generate_binary_structure(2, 2))

# Step 3: take the intersection of mask_white and ext_mask_black
mask_target = mask_white & ext_mask_black

# Step 4: take the index using np.where
l = np.where(mask_target)

# modify this type to make it consistent with your result
l = list(zip(*l))
Python image processing: How do you align images that have been rotated and shifted?
Here I have some code that can vertically and horizontally shift images so that a specific feature can align (credits to https://stackoverflow.com/a/24769222/15016884):

import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

def cross_image(im1, im2):
    im1_gray = np.sum(im1.astype('float'), axis=2)
    im2_gray = np.sum(im2.astype('float'), axis=2)
    im1_gray -= np.mean(im1_gray)
    im2_gray -= np.mean(im2_gray)
    return signal.fftconvolve(im1_gray, im2_gray[::-1, ::-1], mode='same')

corr_img_null = cross_image(cloud1, cloud1)
corr_img = cross_image(cloud1, cloud2)
y0, x0 = np.unravel_index(np.argmax(corr_img_null), corr_img_null.shape)
y, x = np.unravel_index(np.argmax(corr_img), corr_img.shape)
ver_shift = y0-y
hor_shift = x0-x
print('horizontally shifted', hor_shift)
print('vertically shifted', ver_shift)

# defining the bounds of the part of the images I'm actually analyzing
xstart = 100
xstop = 310
ystart = 50
ystop = 200
crop_cloud1 = cloud1[ystart:ystop, xstart:xstop]
crop_cloud2 = cloud2[ystart:ystop, xstart:xstop]
crop_cloud2_shift = cloud2[ystart+ver_shift:ystop+ver_shift, xstart+hor_shift:xstop+hor_shift]

plot_pos = plt.figure(5)
plt.title('image 1')
plt.imshow(crop_cloud1)
plot_pos = plt.figure(6)
plt.title('image 2')
plt.imshow(crop_cloud2)
plot_pos = plt.figure(7)
plt.title('Shifted image 2 to align with image 1')
plt.imshow(crop_cloud2_shift)

Here are the results. Now, I want to work with the example shown below, where rotations in addition to translations will be needed to align the features in my image. Here is my code for that: the idea is to convolve each possible configuration of image 2 for every angle from -45 to 45 (for my application, this angle is not likely to be exceeded) and find at which coordinates and rotation angle the convolution is maximized.

import cv2

def rotate(img, theta):
    (rows, cols) = img.shape[:2]
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), theta, 1)
    res = cv2.warpAffine(img, M, (cols, rows))
    return res

# testing all rotations of image 2
corr_bucket = []
for i in range(-45, 45):
    rot_img = rotate(bolt2, i)
    corr_img = cross_image(bolt1, rot_img)
    corr_bucket.append(corr_img)
corr_arr = np.asarray(corr_bucket)

corr_img_null = cross_image(bolt1, bolt1)
y0, x0 = np.unravel_index(np.argmax(corr_img_null), corr_img_null.shape)
r_index, y1, x1 = np.unravel_index(np.argmax(corr_arr), corr_arr.shape)
r = -45+r_index
ver_shift = y0-y
hor_shift = x0-x
ver_shift_r = y0-y1
hor_shift_r = x0-x1

# What parts of the image do you want to analyze
xstart = 200
xstop = 300
ystart = 100
ystop = 200

crop_bolt1 = bolt1[ystart:ystop, xstart:xstop]
crop_bolt2 = bolt2[ystart:ystop, xstart:xstop]
rot_bolt2 = rotate(bolt2, r)
shift_rot_bolt2 = rot_bolt2[ystart+ver_shift_r:ystop+ver_shift_r, xstart+hor_shift_r:xstop+hor_shift_r]

plot_1 = plt.figure(9)
plt.title('image 1')
plt.imshow(crop_bolt1)
plot_2 = plt.figure(10)
plt.title('image 2')
plt.imshow(crop_bolt2)
plot_3 = plt.figure(11)
plt.title('Shifted and rotated image 2 to align with image 1')
plt.imshow(shift_rot_bolt2)

Unfortunately, from the very last line, I get the error ValueError: zero-size array to reduction operation minimum which has no identity. I'm kind of new to Python, so I don't really know what this means or why my approach isn't working. I have a feeling that my error is somewhere in unraveling corr_arr, because the x, y and r values it returns, I can already see just by estimating, would not make the lightning bolts align. Any advice?
The issue came from feeding in the entire rotated image into scipy.signal.fftconvolve. Crop a part of image2 after rotating to use as a "probe image" (crop your unrotated image 1 in the same way), and the code I have written in my question works fine.
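A rough illustration of that fix (a minimal sketch only, assuming the question's cross_image and rotate functions and the bolt1/bolt2 images are already defined; the probe window bounds below are hypothetical placeholders):

import numpy as np

# hypothetical probe window; it must stay inside the rotated image for every tested angle
py0, py1, px0, px1 = 100, 200, 200, 300

corr_bucket = []
for angle in range(-45, 45):
    rot_img = rotate(bolt2, angle)            # rotate image 2 as in the question
    probe = rot_img[py0:py1, px0:px1]         # cropped "probe image" from rotated image 2
    ref = bolt1[py0:py1, px0:px1]             # unrotated image 1 cropped in the same way
    corr_bucket.append(cross_image(ref, probe))
corr_arr = np.asarray(corr_bucket)

# rotation index and peak position of the best match
r_index, y1, x1 = np.unravel_index(np.argmax(corr_arr), corr_arr.shape)
best_angle = -45 + r_index

The shift can then be recovered from (y1, x1) relative to the autocorrelation peak of the cropped image 1, just as in the translation-only code above.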
How to code up an image stitching software for these 'simple' images?
TLDR: Need help trying to calculate the overlap region between 2 graphs.
So I'm trying to stitch these 2 images. Since I know that the images I will be stitching definitely come from the same image, I feel that I should be able to code this up myself. Using libraries like OpenCV feels a little like overkill for me for this task. My current idea is that I can simplify this task by doing the following steps for each image:
Load image using PIL.
Convert image to black and white (PIL image mode "L").
[Optional: crop images to overlapping region by inspection by eye.]
Create vector row_sum, which is a sum of each row.
[Optional: log row_sum, to reduce the size of values we're working with.]
Plot row_sum.
This would reduce the (potentially) (3*2)-dimensional problem, with 3 RGB channels for each pixel on the 2D image, to a (1*2)-D problem with the black-and-white pixel for the 2D image instead. Then, summing across the rows reduces this to a 1D problem. I used the following code to implement the above:

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

class Stitcher():
    def combine_2(self, img1, img2):
        # thr1, thr2 = self.get_cropped_bw(img1, 115, img2, 80)
        thr1, thr2 = self.get_cropped_bw(img1, 0, img2, 0)
        row_sum1 = np.log(thr1.sum(1))
        row_sum2 = np.log(thr2.sum(1))
        self.plot_4x4(thr1, thr2, row_sum1, row_sum2)

    def get_cropped_bw(self, img1, img1_keep_from, img2, img2_keep_till):
        im1 = Image.open(img1).convert("L")
        im2 = Image.open(img2).convert("L")
        data1 = (np.array(im1)[img1_keep_from:] if img1_keep_from != 0 else np.array(im1))
        data2 = (np.array(im2)[:img2_keep_till] if img2_keep_till != 0 else np.array(im2))
        return data1, data2

    def plot_4x4(self, thr1, thr2, row_sum1, row_sum2):
        fig, ax = plt.subplots(2, 2, sharey="row", constrained_layout=True)
        ax[0, 0].imshow(thr1, cmap="Greys")
        ax[0, 1].imshow(thr2, cmap="Greys")
        ax[1, 0].plot(row_sum1, "k.")
        ax[1, 1].plot(row_sum2, "r.")
        ax[1, 0].set(
            xlabel="Index Value",
            ylabel="Row Sum",
        )
        plt.show()

imgs = (r"combine\imgs\test_image_part_1.jpg", r"combine\imgs\test_image_part_2.jpg")
s = Stitcher()
s.combine_2(*imgs)

This gave me this graph (I've added in those yellow boxes, to indicate the overlap regions). This is the bit I'm stuck at. I want to find exactly: the index value of the left side of the yellow box for the 1st image, and the index value of the right side of the yellow box for the 2nd image. I define the overlap region as the longest range for which the end of the 1st graph 'matches' the start of the 2nd graph. For the method to find the overlap region, what should I do if the row sum values aren't exactly the same (what if one is the other scaled by some factor)? I feel like this could be a problem that could use dot products to find the similarity between the 2 graphs? But I can't think of how to implement this.
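As one possible illustration of the dot-product idea mentioned in the last paragraph (a minimal sketch only, assuming row_sum1 and row_sum2 are the 1-D profiles computed by the code above; cosine similarity is used so that a constant scale factor between the profiles does not matter):

import numpy as np

def find_overlap_length(row_sum1, row_sum2, min_len=10):
    # compare the tail of profile 1 with the head of profile 2 for every candidate length
    best_len, best_score = 0, -1.0
    for n in range(min_len, min(len(row_sum1), len(row_sum2)) + 1):
        a = row_sum1[-n:]                      # end of the first image's profile
        b = row_sum2[:n]                       # start of the second image's profile
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom == 0:
            continue
        score = float(np.dot(a, b) / denom)    # cosine similarity, scale-invariant
        if score > best_score:
            best_score, best_len = score, n
    return best_len, best_score

With an overlap of best_len rows, the left side of the yellow box in the first image would be at index len(row_sum1) - best_len, and the right side in the second image at index best_len.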
I had a lot more fun with this than I expected. I wrote this using OpenCV, but that's just to load and show the image. Everything else is done with numpy, so swapping this to PIL shouldn't be too difficult. I'm using a brute-force matcher. I also wrote a random-start hillclimber that runs in much less time, but I can't guarantee it'll find the correct answer since the gradient space isn't smooth. I won't include it in my code since it's long and janky, but if you really need the time efficiency I can add it back in later. I added a random crop and some salt-and-pepper noise to the images to test for robustness.
The brute-force matcher operates on the idea that we don't know which sections of the two images overlap, so we need to convolve the smaller image over the larger image from left to right, top to bottom. This means our search space is:
horizontal = small_width + big_width
vertical = small_height + big_height
area = horizontal * vertical
This will grow very quickly with image size. I motivate the algorithm by giving it points for having a larger overlap, but it loses more points for having differences in color in the overlapped area. Here are some pictures from an execution of this program.

import cv2
import numpy as np
import random

# randomly snips edges
def randCrop(image, maxMargin):
    c = [random.randint(0, maxMargin) for a in range(4)];
    return image[c[0]:-c[1], c[2]:-c[3]];

# adds noise to image
def saltPepper(image, minNoise, maxNoise):
    h,w = image.shape;
    randNum = random.randint(minNoise, maxNoise);
    for a in range(randNum):
        x = random.randint(0, w-1);
        y = random.randint(0, h-1);
        image[y,x] = random.randint(0, 255);
    return image;

# evaluate layout
def getScore(one, two):
    # do raw subtraction
    left = one - two;
    right = two - one;
    sub = np.minimum(left, right);
    return np.count_nonzero(sub);

# return 2d random position within range
def randPos(img, big_shape):
    th,tw = big_shape;
    h,w = img.shape;
    x = random.randint(0, tw - w);
    y = random.randint(0, th - h);
    return [x,y];

# overlays small image onto big image
def overlay(small, big, pos):
    # unpack
    h,w = small.shape;
    x,y = pos;
    # copy and place
    copy = big.copy();
    copy[y:y+h, x:x+w] = small;
    return copy;

# calculates overlap region
def overlap(one, two, pos_one, pos_two):
    # unpack
    h1,w1 = one.shape;
    h2,w2 = two.shape;
    x1,y1 = pos_one;
    x2,y2 = pos_two;
    # set edges
    l1 = x1;
    l2 = x2;
    r1 = x1 + w1;
    r2 = x2 + w2;
    t1 = y1;
    t2 = y2;
    b1 = y1 + h1;
    b2 = y2 + h2;
    # go
    left = max(l1, l2);
    right = min(r1, r2);
    top = max(t1, t2);
    bottom = min(b1, b2);
    return [left, right, top, bottom];

# wrapper for overlay + getScore
def fullScore(one, two, pos_one, pos_two, big_empty):
    # check positions
    x,y = pos_two;
    h,w = two.shape;
    th,tw = big_empty.shape;
    if y+h > th or x+w > tw or x < 0 or y < 0:
        return -99999999;
    # overlay
    temp_one = overlay(one, big_empty, pos_one);
    temp_two = overlay(two, big_empty, pos_two);
    # get overlap
    l,r,t,b = overlap(one, two, pos_one, pos_two);
    temp_one = temp_one[t:b, l:r];
    temp_two = temp_two[t:b, l:r];
    # score
    diff = getScore(temp_one, temp_two);
    score = (r-l) * (b-t);
    score -= diff*2;
    return score;

# do brute force
def bruteForce(one, two):
    # calculate search space
    # unpack size
    h,w = one.shape;
    one_size = h*w;
    h,w = two.shape;
    two_size = h*w;
    # small and big
    if one_size < two_size:
        small = one;
        big = two;
    else:
        small = two;
        big = one;
    # unpack size
    sh, sw = small.shape;
    bh, bw = big.shape;
    total_width = bw + sw * 2;
    total_height = bh + sh * 2;
    # set up empty images
    empty = np.zeros((total_height, total_width), np.uint8);
    # set global best
    best_score = -999999;
    best_pos = None;
    # start scrolling
    ybound = total_height - sh;
    xbound = total_width - sw;
    for y in range(ybound):
        print("y: " + str(y) + " || " + str(empty.shape));
        for x in range(xbound):
            # get score
            score = fullScore(big, small, [sw,sh], [x,y], empty);
            # show
            # prog = overlay(big, empty, [sw,sh]);
            # prog = overlay(small, prog, [x,y]);
            # cv2.imshow("prog", prog);
            # cv2.waitKey(1);
            # compare
            if score > best_score:
                best_score = score;
                best_pos = [x,y];
                print("best_score: " + str(best_score));
    return best_pos, [sw,sh], small, big, empty;

# do a step of hill climber
def hillStep(one, two, best_pos, big_empty, step):
    # make a step
    new_pos = best_pos[1][:];
    new_pos[0] += step[0];
    new_pos[1] += step[1];
    # get score
    return fullScore(one, two, best_pos[0], new_pos, big_empty), new_pos;

# hunt around for good position
# let's do a random-start hillclimber
def randHill(one, two, shape):
    # set up empty images
    big_empty = np.zeros(shape, np.uint8);
    # set global best
    g_best_score = -999999;
    g_best_pos = None;
    # lets do 200 iterations
    iters = 200;
    for a in range(iters):
        # progress check
        print(str(a) + " of " + str(iters));
        # start with random position
        h,w = two.shape[:2];
        pos_one = [w,h];
        pos_two = randPos(two, shape);
        # get score
        best_score = fullScore(one, two, pos_one, pos_two, big_empty);
        best_pos = [pos_one, pos_two];
        # hill climb (only on second image)
        while True:
            # end condition: no step improves score
            end_flag = True;
            # 8-way
            for y in range(-1, 1+1):
                for x in range(-1, 1+1):
                    if x != 0 or y != 0:
                        # get score and update
                        score, new_pos = hillStep(one, two, best_pos, big_empty, [x,y]);
                        if score > best_score:
                            best_score = score;
                            best_pos[1] = new_pos[:];
                            end_flag = False;
            # end
            if end_flag:
                break;
            else:
                # show
                # prog = overlay(one, big_empty, best_pos[0]);
                # prog = overlay(two, prog, best_pos[1]);
                # cv2.imshow("prog", prog);
                # cv2.waitKey(1);
                pass;
        # check for new global best
        if best_score > g_best_score:
            g_best_score = best_score;
            g_best_pos = best_pos[:];
            print("top score: " + str(g_best_score));
    return g_best_score, g_best_pos;

# load both images
top = cv2.imread("top.jpg");
bottom = cv2.imread("bottom.jpg");
top = cv2.cvtColor(top, cv2.COLOR_BGR2GRAY);
bottom = cv2.cvtColor(bottom, cv2.COLOR_BGR2GRAY);

# randomly crop
top = randCrop(top, 20);
bottom = randCrop(bottom, 20);

# randomly add noise
saltPepper(top, 200, 1000);
saltPepper(bottom, 200, 1000);

# set up max image (assume no overlap whatsoever)
tw = 0;
th = 0;
h, w = top.shape;
tw += w;
th += h;
h, w = bottom.shape;
tw += w*2;
th += h*2;

# do random-start hill climb
_, best_pos = randHill(top, bottom, (th, tw));

# show
empty = np.zeros((th, tw), np.uint8);
pos1, pos2 = best_pos;
image = overlay(top, empty, pos1);
image = overlay(bottom, image, pos2);

# do brute force
# small_pos, big_pos, small, big, empty = bruteForce(top, bottom);
# image = overlay(big, empty, big_pos);
# image = overlay(small, image, small_pos);

# recolor overlap
h,w = empty.shape;
color = np.zeros((h,w,3), np.uint8);
l,r,t,b = overlap(top, bottom, pos1, pos2);
color[:,:,0] = image;
color[:,:,1] = image;
color[:,:,2] = image;
color[t:b, l:r, 0] += 100;

# show images
cv2.imshow("top", top);
cv2.imshow("bottom", bottom);
cv2.imshow("overlayed", image);
cv2.imshow("Color", color);
cv2.waitKey(0);

Edit: I added in the random-start hillclimber.
Python- Clustering Hough lines
I am working to cluster probabilistic Hough lines together using unit vectors. The clustering changes every run through and is not quite right. I want to cluster the lines of [this image][2], but I am getting this clustering, and it changes drastically every run through. I know the probabilistic Hough transform changes slightly every run, but I would like to keep the merged lines fairly consistent. Is the problem with the way I am calculating the unit vector, or with DBSCAN, or is there a better way to do the clustering? Any help would be appreciated.

import math
import itertools
from operator import itemgetter
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

line_dict = []
# using hough lines via skimage.transform - probabilistic_hough_line
for line in lines:
    meta_lines = {}
    start_point, end_point = line
    # line equations and add line info to line dictionary
    meta_lines["start"] = start_point
    meta_lines["end"] = end_point
    distance = [end_point[0] - start_point[0], end_point[1] - start_point[1]]
    norm = math.sqrt(distance[0] ** 2 + distance[1] ** 2)
    direction = [distance[0] / norm, distance[1] / norm]
    meta_lines["unit-vector"] = direction
    line_dict.append(meta_lines)

# clustering of lines using DBSCAN
X = StandardScaler().fit_transform([x["unit-vector"] for x in line_dict])
db = DBSCAN(eps=0.2, min_samples=1).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
clusters = [X[labels == i] for i in range(n_clusters_)]

# cluster start/end points of lines
for c in range(len(clusters)):
    for i in range(len(line_dict)):
        line_dict[i]["scale"] = X[i]
        if line_dict[i]["scale"] in clusters[c]:
            line_dict[i]["cluster"] = c
line_dict.sort(key=itemgetter("cluster"))

cluster_lines = []
for key, group in itertools.groupby(line_dict, lambda item: item["cluster"]):
    cluster_lines.append([(i["start"], i["end"]) for i in group])

merged_lines = []
for i in cluster_lines:
    points = []
    for x in i:
        p0, p1 = x
        points.extend((p0, p1))
    # sort points and use min/max for endpoints of line
    k = sorted(points)
    merged_lines.append([k[0], k[-1]])

Edit: Original image (I am low rep on Stack Overflow so I can only post 2 images; I removed the original one with the Hough lines drawn on the image). Hough line code:

import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize
from skimage.transform import probabilistic_hough_line

# img is a grayscale image
thresh = threshold_otsu(img)
binary = img > thresh
binary = np.invert(binary)
skel = skeletonize(binary)  # skeletonize image
lines = probabilistic_hough_line(skel, threshold=5, line_length=10, line_gap=5)
Python: Image Segmentation as pre-process for Classification
What technique do you recommend to segment the characters in this image so they are ready to be fed to a model like the ones used with the MNIST dataset, since those take one character at a time? This question is regardless of the importance of transforming and binarising the image. Thanks!
As a starting point I would try the following:
1. Use an OTSU threshold.
2. Then do some morphological operations to get rid of noise and to isolate each digit.
3. Run connected component labelling.
4. Feed each connected component to your classifier to recognize the digit; if the classification score is low, discard it.
5. Final validation: you expect all the digits to be more or less on a line and at a more or less constant distance from each other.
Here are the first 4 stages. Now you need to add your recognition software to recognize the digits.

import cv2
import numpy as np
from matplotlib import pyplot as plt

# Params
EPSSILON = 0.4
MIN_AREA = 10
BIG_AREA = 75

# Read img
img = cv2.imread('i.jpg', 0)

# Otsu threshold
a, thI = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Morphological
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 1))
thIMor = cv2.morphologyEx(thI, cv2.MORPH_CLOSE, se)

# Connected component labelling
stats = cv2.connectedComponentsWithStats(thIMor, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]

# We expect the connected components of the numbers to have a more or less constant aspect ratio,
# so we find the median ratio of all the components, because the majority of the components are numbers
ratios = []
for label in range(num_labels):
    connectedCompoentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    connectedCompoentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratios.append(float(connectedCompoentWidth) / float(connectedCompoentHeight))

# Find median ratio
medianRatio = np.median(np.asarray(ratios))

# Go over all the connected components again and filter out components whose ratio is far from the median
filterdI = np.zeros_like(thIMor)
filterdI[labels != 0] = 255
for label in range(num_labels):
    # Ignore biggest label
    if label == 1:
        filterdI[labels == label] = 0
        continue
    connectedCompoentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    connectedCompoentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratio = float(connectedCompoentWidth) / float(connectedCompoentHeight)
    if ratio > medianRatio + EPSSILON or ratio < medianRatio - EPSSILON:
        filterdI[labels == label] = 0
    # Filter out components that are too small or too large
    if labelStats[label, cv2.CC_STAT_AREA] < MIN_AREA or labelStats[label, cv2.CC_STAT_AREA] > BIG_AREA:
        filterdI[labels == label] = 0

plt.imshow(filterdI)

# Now go over each of the remaining components and run the number recognition
stats = cv2.connectedComponentsWithStats(filterdI, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]
for label in range(num_labels):
    # Crop the bounding box around the component
    left = labelStats[label, cv2.CC_STAT_LEFT]
    top = labelStats[label, cv2.CC_STAT_TOP]
    width = labelStats[label, cv2.CC_STAT_WIDTH]
    height = labelStats[label, cv2.CC_STAT_HEIGHT]
    candidateDigit = labels[top:top + height, left:left + width]
    # plt.figure(label)
    # plt.imshow(candidateDigit)
I second Amitay's answer. For step 2, I would use thinning as the morphological operation (look up thinning algorithms in OpenCV). For step 3, in OpenCV 3.0 there is already a function called cv::connectedComponents. Hope it helps.
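For reference, a minimal sketch of how cv::connectedComponents can be called from Python (the file name and variable names are illustrative only, and the image is assumed to be already binarised):

import cv2
import numpy as np

# single-channel 8-bit image where digit pixels are non-zero (illustrative input)
binary_img = cv2.imread('digits_binary.png', cv2.IMREAD_GRAYSCALE)

# label the connected components; the background gets label 0
num_labels, labels = cv2.connectedComponents(binary_img, connectivity=8)

# crop each component from its bounding box, e.g. to feed the digit classifier
for label in range(1, num_labels):
    ys, xs = np.where(labels == label)
    component = binary_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]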