Python: Image Segmentation as a pre-process for Classification

What technique do you recommend for segmenting the characters in this image so that they are ready to be fed to a model like the ones used with the MNIST dataset, since those models take one character at a time? This question is asked regardless of the importance of transforming and binarizing the image.
Thanks!

As a starting point I would try the following:
1. Use Otsu thresholding.
2. Then do some morphological operations to get rid of noise and to isolate each digit.
3. Run connected component labeling.
4. Feed each connected component to your classifier to recognize the digit; if the classification score is low, discard it.
5. Final validation: you expect all the digits to lie more or less on a line, at a more or less constant distance from each other.
Here are the first 4 stages. Now you need to add your recognition software to recognize the digits.
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Params
EPSILON = 0.4
MIN_AREA = 10
BIG_AREA = 75

# Read the image
img = cv2.imread('i.jpg', 0)

# Otsu threshold
a, thI = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Morphological operations
se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (1, 1))
thIMor = cv2.morphologyEx(thI, cv2.MORPH_CLOSE, se)

# Connected component labeling
stats = cv2.connectedComponentsWithStats(thIMor, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]

# We expect the connected components of the numbers to have a more or less constant
# aspect ratio, so we find the median ratio of all the components, because the
# majority of the connected components are numbers
ratios = []
for label in range(num_labels):
    componentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    componentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratios.append(float(componentWidth) / float(componentHeight))

# Find the median ratio
medianRatio = np.median(np.asarray(ratios))

# Go over all the connected components again and filter out components whose ratio is far from the median
filteredI = np.zeros_like(thIMor)
filteredI[labels != 0] = 255
for label in range(num_labels):
    # Ignore the biggest label
    if label == 1:
        filteredI[labels == label] = 0
        continue
    componentWidth = labelStats[label, cv2.CC_STAT_WIDTH]
    componentHeight = labelStats[label, cv2.CC_STAT_HEIGHT]
    ratio = float(componentWidth) / float(componentHeight)
    if ratio > medianRatio + EPSILON or ratio < medianRatio - EPSILON:
        filteredI[labels == label] = 0
    # Filter out components that are too small or too large
    if labelStats[label, cv2.CC_STAT_AREA] < MIN_AREA or labelStats[label, cv2.CC_STAT_AREA] > BIG_AREA:
        filteredI[labels == label] = 0
plt.imshow(filteredI)

# Now go over each of the remaining components and run the number recognition
stats = cv2.connectedComponentsWithStats(filteredI, connectivity=8)
num_labels = stats[0]
labels = stats[1]
labelStats = stats[2]
for label in range(num_labels):
    # Crop the bounding box around the component
    left = labelStats[label, cv2.CC_STAT_LEFT]
    top = labelStats[label, cv2.CC_STAT_TOP]
    width = labelStats[label, cv2.CC_STAT_WIDTH]
    height = labelStats[label, cv2.CC_STAT_HEIGHT]
    candidateDigit = labels[top:top + height, left:left + width]
    # plt.figure(label)
    # plt.imshow(candidateDigit)
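The recognition part is left open above. If each surviving component is to be fed to an MNIST-style model, the crop usually has to be re-framed the way MNIST digits are: centred in a 28x28 image with a small margin and scaled to [0, 1]. A minimal sketch of that preparation, assuming the filtered binary image filteredI and the bounding boxes from labelStats above; the 28x28 size, the 4-pixel margin and the name to_mnist_input are illustrative choices, not part of the original answer:
def to_mnist_input(binary_img, left, top, width, height, out_size=28, pad=4):
    crop = binary_img[top:top + height, left:left + width]
    # Scale the longer side to (out_size - 2*pad) while keeping the aspect ratio
    scale = (out_size - 2 * pad) / float(max(width, height))
    new_w = max(1, int(width * scale))
    new_h = max(1, int(height * scale))
    resized = cv2.resize(crop, (new_w, new_h), interpolation=cv2.INTER_NEAREST)
    # Centre the digit on a black out_size x out_size canvas
    canvas = np.zeros((out_size, out_size), dtype=np.uint8)
    y0 = (out_size - new_h) // 2
    x0 = (out_size - new_w) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    # MNIST-style models usually expect floats in [0, 1]
    return canvas.astype(np.float32) / 255.0
# Example use inside the last loop above:
# digit_input = to_mnist_input(filteredI, left, top, width, height)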

I'd like to add to Amitay's answer.
For step 2: I would use thinning as the morphological operation (look up the thinning algorithms available in OpenCV).
For step 3: In OpenCV 3.0 there is already a function called cv::connectedComponents.
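A minimal sketch of these two suggestions, assuming thI (the binarized image from the code above) is available; note that cv2.ximgproc.thinning ships with the opencv-contrib-python package, not the base OpenCV wheel:
import cv2
# Thinning (skeletonization) as the morphological step (step 2);
# cv2.ximgproc is part of opencv-contrib-python.
thinned = cv2.ximgproc.thinning(thI)
# Connected component labeling (step 3), built in since OpenCV 3.0.
num_labels, labels = cv2.connectedComponents(thinned, connectivity=8)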
Hope it helps

Related

How to code up an image stitching software for these 'simple' images?

TLDR:
Need help trying to calculate overlap region between 2 graphs.
So I'm trying to stitch these 2 images:
Since I know that the images I will be stitching definitely come from the same image, I feel that I should be able to code this up myself. Using libraries like OpenCV feels a little like overkill for me for this task.
My current idea is that I can simplify this task by doing the following steps for each image:
Load image using PIL
Convert image to black and white (PIL image mode "L")
[Optional: crop images to overlapping region by inspection by eye]
Create vector row_sum, which is a sum of each row
[Optional: log row_sum, to reduce the size of values we're working with]
Plot row_sum.
This would reduce the (potentially) (3*2)-dimensional problem, with 3 RGB channels for each pixel of each of the 2 images, to a (1*2)-dimensional problem with a single grayscale value per pixel instead. Then, summing across the rows reduces this to a 1D problem.
I used the following code to implement the above:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
class Stitcher():
    def combine_2(self, img1, img2):
        # thr1, thr2 = self.get_cropped_bw(img1, 115, img2, 80)
        thr1, thr2 = self.get_cropped_bw(img1, 0, img2, 0)
        row_sum1 = np.log(thr1.sum(1))
        row_sum2 = np.log(thr2.sum(1))
        self.plot_4x4(thr1, thr2, row_sum1, row_sum2)

    def get_cropped_bw(self, img1, img1_keep_from, img2, img2_keep_till):
        im1 = Image.open(img1).convert("L")
        im2 = Image.open(img2).convert("L")
        data1 = (np.array(im1)[img1_keep_from:]
                 if img1_keep_from != 0 else np.array(im1))
        data2 = (np.array(im2)[:img2_keep_till]
                 if img2_keep_till != 0 else np.array(im2))
        return data1, data2

    def plot_4x4(self, thr1, thr2, row_sum1, row_sum2):
        fig, ax = plt.subplots(2, 2, sharey="row", constrained_layout=True)
        ax[0, 0].imshow(thr1, cmap="Greys")
        ax[0, 1].imshow(thr2, cmap="Greys")
        ax[1, 0].plot(row_sum1, "k.")
        ax[1, 1].plot(row_sum2, "r.")
        ax[1, 0].set(
            xlabel="Index Value",
            ylabel="Row Sum",
        )
        plt.show()

imgs = (r"combine\imgs\test_image_part_1.jpg",
        r"combine\imgs\test_image_part_2.jpg")
s = Stitcher()
s.combine_2(*imgs)
This gave me this graph:
(I've added in those yellow boxes, to indicate the overlap regions.)
This is the bit I'm stuck at. I want to find exactly:
the index value of the left-side of the yellow box for the 1st image and
the index value of the right-side of the yellow box for the 2nd image.
I define the overlap region as the longest range for which the end of the 1st graph 'matches' the start of the 2nd graph. For the method to find the overlap region, what should I do if the row sum values aren't exactly the same (what if one is the other scaled by some factor)?
I feel like this could be a problem that could use dot products to find the similarity between the 2 graphs? But I can't think of how to implement this.
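One way to make the dot-product idea concrete is to slide the two row-sum curves against each other and score every candidate overlap length with a normalized dot product (cosine similarity), which does not change when one curve is the other scaled by a constant factor. A rough sketch of that search, not from the original post (the function name and the minimum-length parameter are made up):
import numpy as np

def best_overlap(row_sum1, row_sum2, min_len=10):
    # Compare the tail of row_sum1 with the head of row_sum2 for every
    # candidate overlap length and keep the most similar one.
    best_len, best_score = 0, -1.0
    max_len = min(len(row_sum1), len(row_sum2))
    for n in range(min_len, max_len + 1):
        a = row_sum1[-n:]
        b = row_sum2[:n]
        # Cosine similarity: invariant to a global scale factor
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_score, best_len = score, n
    return best_len, best_score
Subtracting each window's mean before taking the dot product (i.e. using the Pearson correlation instead) would additionally make the score insensitive to a constant offset between the two curves.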
I had a lot more fun with this than I expected. I wrote this using opencv, but that's just to load and show the image. Everything else is done with numpy so swapping this to PIL shouldn't be too difficult.
I'm using a brute-force matcher. I also wrote a random-start hillclimber that runs in much less time, but I can't guarantee it'll find the correct answer since the gradient space isn't smooth. I won't include it in my code since it's long and janky, but if you really need the time efficiency I can add it back in later.
I added a random crop and some salt and pepper noise to the images to test for robustness.
The brute-force matcher operates on the idea that we don't know which section of the two images overlap, so we need to convolve the smaller image over the larger image from left to right, top to bottom. This means our search space is:
horizontal = small_width + big_width
vertical = small_height + big_height
area = horizontal * vertical
This will grow very quickly with image size. I motivate the algorithm by giving it points for having a larger overlap, but it loses more points for having differences in color for the overlapped area.
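In other words, every candidate placement is scored as the overlapped area minus a penalty proportional to the number of pixels where the two images disagree. A compact sketch of just that scoring idea (paraphrased; the full version, including the positioning logic, is in the code below):
import numpy as np

def placement_score(patch_a, patch_b, mismatch_weight=2):
    # patch_a and patch_b are the two same-sized crops of the overlap region
    mismatches = np.count_nonzero(patch_a.astype(np.int16) - patch_b.astype(np.int16))
    area = patch_a.shape[0] * patch_a.shape[1]
    return area - mismatch_weight * mismatches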
Here are some pictures from an execution of this program
import cv2
import numpy as np
import random

# randomly snips edges
def randCrop(image, maxMargin):
    c = [random.randint(0,maxMargin) for a in range(4)];
    return image[c[0]:-c[1], c[2]:-c[3]];

# adds noise to image
def saltPepper(image, minNoise, maxNoise):
    h,w = image.shape;
    randNum = random.randint(minNoise, maxNoise);
    for a in range(randNum):
        x = random.randint(0, w-1);
        y = random.randint(0, h-1);
        image[y,x] = random.randint(0, 255);
    return image;

# evaluate layout
def getScore(one, two):
    # do raw subtraction
    left = one - two;
    right = two - one;
    sub = np.minimum(left, right);
    return np.count_nonzero(sub);

# return 2d random position within range
def randPos(img, big_shape):
    th,tw = big_shape;
    h,w = img.shape;
    x = random.randint(0, tw - w);
    y = random.randint(0, th - h);
    return [x,y];

# overlays small image onto big image
def overlay(small, big, pos):
    # unpack
    h,w = small.shape;
    x,y = pos;
    # copy and place
    copy = big.copy();
    copy[y:y+h, x:x+w] = small;
    return copy;

# calculates overlap region
def overlap(one, two, pos_one, pos_two):
    # unpack
    h1,w1 = one.shape;
    h2,w2 = two.shape;
    x1,y1 = pos_one;
    x2,y2 = pos_two;
    # set edges
    l1 = x1;
    l2 = x2;
    r1 = x1 + w1;
    r2 = x2 + w2;
    t1 = y1;
    t2 = y2;
    b1 = y1 + h1;
    b2 = y2 + h2;
    # go
    left = max(l1, l2);
    right = min(r1, r2);
    top = max(t1, t2);
    bottom = min(b1, b2);
    return [left, right, top, bottom];

# wrapper for overlay + getScore
def fullScore(one, two, pos_one, pos_two, big_empty):
    # check positions
    x,y = pos_two;
    h,w = two.shape;
    th,tw = big_empty.shape;
    if y+h > th or x+w > tw or x < 0 or y < 0:
        return -99999999;
    # overlay
    temp_one = overlay(one, big_empty, pos_one);
    temp_two = overlay(two, big_empty, pos_two);
    # get overlap
    l,r,t,b = overlap(one, two, pos_one, pos_two);
    temp_one = temp_one[t:b, l:r];
    temp_two = temp_two[t:b, l:r];
    # score
    diff = getScore(temp_one, temp_two);
    score = (r-l) * (b-t);
    score -= diff*2;
    return score;

# do brute force
def bruteForce(one, two):
    # calculate search space
    # unpack size
    h,w = one.shape;
    one_size = h*w;
    h,w = two.shape;
    two_size = h*w;
    # small and big
    if one_size < two_size:
        small = one;
        big = two;
    else:
        small = two;
        big = one;
    # unpack size
    sh, sw = small.shape;
    bh, bw = big.shape;
    total_width = bw + sw * 2;
    total_height = bh + sh * 2;
    # set up empty images
    empty = np.zeros((total_height, total_width), np.uint8);
    # set global best
    best_score = -999999;
    best_pos = None;
    # start scrolling
    ybound = total_height - sh;
    xbound = total_width - sw;
    for y in range(ybound):
        print("y: " + str(y) + " || " + str(empty.shape));
        for x in range(xbound):
            # get score
            score = fullScore(big, small, [sw,sh], [x,y], empty);
            # show
            # prog = overlay(big, empty, [sw,sh]);
            # prog = overlay(small, prog, [x,y]);
            # cv2.imshow("prog", prog);
            # cv2.waitKey(1);
            # compare
            if score > best_score:
                best_score = score;
                best_pos = [x,y];
                print("best_score: " + str(best_score));
    return best_pos, [sw,sh], small, big, empty;

# do a step of hill climber
def hillStep(one, two, best_pos, big_empty, step):
    # make a step
    new_pos = best_pos[1][:];
    new_pos[0] += step[0];
    new_pos[1] += step[1];
    # get score
    return fullScore(one, two, best_pos[0], new_pos, big_empty), new_pos;

# hunt around for good position
# let's do a random-start hillclimber
def randHill(one, two, shape):
    # set up empty images
    big_empty = np.zeros(shape, np.uint8);
    # set global best
    g_best_score = -999999;
    g_best_pos = None;
    # lets do 200 iterations
    iters = 200;
    for a in range(iters):
        # progress check
        print(str(a) + " of " + str(iters));
        # start with random position
        h,w = two.shape[:2];
        pos_one = [w,h];
        pos_two = randPos(two, shape);
        # get score
        best_score = fullScore(one, two, pos_one, pos_two, big_empty);
        best_pos = [pos_one, pos_two];
        # hill climb (only on second image)
        while True:
            # end condition: no step improves score
            end_flag = True;
            # 8-way
            for y in range(-1, 1+1):
                for x in range(-1, 1+1):
                    if x != 0 or y != 0:
                        # get score and update
                        score, new_pos = hillStep(one, two, best_pos, big_empty, [x,y]);
                        if score > best_score:
                            best_score = score;
                            best_pos[1] = new_pos[:];
                            end_flag = False;
            # end
            if end_flag:
                break;
            else:
                # show
                # prog = overlay(one, big_empty, best_pos[0]);
                # prog = overlay(two, prog, best_pos[1]);
                # cv2.imshow("prog", prog);
                # cv2.waitKey(1);
                pass;
        # check for new global best
        if best_score > g_best_score:
            g_best_score = best_score;
            g_best_pos = best_pos[:];
            print("top score: " + str(g_best_score));
    return g_best_score, g_best_pos;
# load both images
top = cv2.imread("top.jpg");
bottom = cv2.imread("bottom.jpg");
top = cv2.cvtColor(top, cv2.COLOR_BGR2GRAY);
bottom = cv2.cvtColor(bottom, cv2.COLOR_BGR2GRAY);
# randomly crop
top = randCrop(top, 20);
bottom = randCrop(bottom, 20);
# randomly add noise
saltPepper(top, 200, 1000);
saltPepper(bottom, 200, 1000);
# set up max image (assume no overlap whatsoever)
tw = 0;
th = 0;
h, w = top.shape;
tw += w;
th += h;
h, w = bottom.shape;
tw += w*2;
th += h*2;
# do random-start hill climb
_, best_pos = randHill(top, bottom, (th, tw));
# show
empty = np.zeros((th, tw), np.uint8);
pos1, pos2 = best_pos;
image = overlay(top, empty, pos1);
image = overlay(bottom, image, pos2);
# do brute force
# small_pos, big_pos, small, big, empty = bruteForce(top, bottom);
# image = overlay(big, empty, big_pos);
# image = overlay(small, image, small_pos);
# recolor overlap
h,w = empty.shape;
color = np.zeros((h,w,3), np.uint8);
l,r,t,b = overlap(top, bottom, pos1, pos2);
color[:,:,0] = image;
color[:,:,1] = image;
color[:,:,2] = image;
color[t:b, l:r, 0] += 100;
# show images
cv2.imshow("top", top);
cv2.imshow("bottom", bottom);
cv2.imshow("overlayed", image);
cv2.imshow("Color", color);
cv2.waitKey(0);
Edit: I added in the random-start hillclimber

Extract the positions of their maximum pixel value of an image

I am a newbie here. I am trying to get a single line along the edge of the 2D flame so that I can then calculate the actual area (the 3D flame area). The first thing is getting the edge. The 2D flame is a sort of side-viewed concave flame, so the flame base (the flat part) is brighter than the concave segment. I use the code below to find the edge; my method is finding the maximum pixel value along the y-axis. The result does not seem to achieve my purpose; could you please help me figure it out? Thanks very much in advance.
Original image (in the code I rotate the image):
from PIL import Image
import numpy as np
import cv2
def initialization_rotate(path):
    global h, w, img
    img4 = np.array(Image.open(path).convert('L'))
    img3 = img4.transpose(1, 0)
    img2 = img3[::-1, ::1]
    img = img2[400:1000, 1:248]
    h, w = img.shape

path = 'D:\\20190520\\14\\14\\1767.jpg'
initialization_rotate(path)  # load and rotate the image so that img, h and w are defined below

# Noise cancellation
def opening(binary):
    opened = np.zeros_like(binary)
    for j in range(1, w-1):
        for i in range(1, h-1):
            if binary[i][j] > 100:
                n1 = binary[i-1][j-1]
                n2 = binary[i-1][j]
                n3 = binary[i-1][j+1]
                n4 = binary[i][j-1]
                n5 = binary[i][j+1]
                n6 = binary[i+1][j-1]
                n7 = binary[i+1][j]
                n8 = binary[i+1][j+1]
                sum8 = int(n1) + int(n2) + int(n3) + int(n4) + int(n5) + int(n6) + int(n7) + int(n8)
                if sum8 < 1000:
                    opened[i][j] = 0
                else:
                    opened[i][j] = 255
            else:
                pass
    return opened

edge = np.zeros_like(img)

# Find the max pixel value and extract its position
for j in range(w-1):
    ys = [0]
    ymax = []
    for i in range(h-1):
        if img[i][j] > 100:
            ys.append(i)
        else:
            pass
    ymax = np.amax(ys)
    edge[ymax][j] = 255

cv2.namedWindow('edge')
while(True):
    cv2.imshow('edge', edge)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:
        break
cv2.destroyAllWindows()
I have done some very quick coding from the ground up (without looking into established or state-of-the-art edge detection algorithms). Not very surprisingly, the results are very poor. The code I have pasted below will work only for RGB (i.e. only for three channels, not for images that are CMYK, grayscale, RGBA or anything else). Also, I have only tested it on a single, very simplistic image. In real life the images are complicated, and I don't think it will fare very well there yet. It needs a lot of work. However I am, hesitatingly, sharing it since it was requested by @GiaTri.
Here is what I did. For every column I calculated the mean and standard deviation of the intensities. I hoped that at the edge there would be a change in intensity away from the mean +- the standard deviation (multiplied by a factor). If I mark the first and last such pixel in each column, I will have an edge point for every column and hopefully, once stitched together, these points will form an edge. The code and the attached image are here for you to see how I fared.
from scipy import ndimage
import numpy as np
import matplotlib.pyplot as plt
UppperStdBoundaryMultiplier = 1.0
LowerStdBoundaryMultiplier = 1.0
NegativeSelection = False
def SumSquareRGBintensityOfPixel(Pixel):
    return np.sum(np.power(Pixel,2),axis=0)

def GetTheContinousStretchForAcolumn(Column):
    global UppperStdBoundaryMultiplier
    global LowerStdBoundaryMultiplier
    global NegativeSelection
    SumSquaresIntensityOfColumn = np.apply_along_axis(SumSquareRGBintensityOfPixel,1,Column)
    Mean = np.mean(SumSquaresIntensityOfColumn)
    StdDev = np.std(SumSquaresIntensityOfColumn)
    LowerThreshold = Mean - LowerStdBoundaryMultiplier*StdDev
    UpperThreshold = Mean + UppperStdBoundaryMultiplier*StdDev
    if NegativeSelection:
        Index = np.where(SumSquaresIntensityOfColumn < LowerThreshold)
        Column[Index,:] = np.array([255,255,255])
    else:
        Index = np.where(SumSquaresIntensityOfColumn >= LowerThreshold)
        LeastIndex = Index[Index==True][0]
        LastIndex = Index[Index==True][-1]
        Column[[LeastIndex,LastIndex],:] = np.array([255,0,0])
    return Column

def DoEdgeDetection(ImageFilePath):
    FileHandle = ndimage.imread(ImageFilePath)
    for Column in range(FileHandle.shape[1]):
        FileHandle[:,Column,:] = GetTheContinousStretchForAcolumn(FileHandle[:,Column,:])
    plt.imshow(FileHandle)
    plt.show()

DoEdgeDetection("/PathToImage/Image_1.jpg")
And below is the result. To the left is the query image whose edge had to be detected, and to the right is the edge-detected image. Edge points are marked with red dots. As you can see it fared poorly, but with some investment of time and thinking it might do far better ... or maybe not. Maybe it is a good start but far from finished. You, please, be the judge!
***** Edit after clarification on requirement from GiaTri ***************
So I did manage to change the program; the idea remained the same. However, this time the problem is overly simplified to the case where you want to detect only the blue flame. Actually, I went ahead and made it functional for all three color channels, though I doubt it will be useful to you beyond the blue channel.
How to use the program below:
If your flame is vertical then choose edges = "horizontal" in the class assignment. If your edges are horizontal then choose edges = "vertical". This might be a little confusing, but for the time being please use it this way. Later either you can change it or I can change it.
So first let me convince you that the edge detection is working much better than yesterday. See the two images below. I have taken these two flame images from the internet. As before, the image whose edge has to be detected is on the left and the edge-detected image is on the right. The edges are marked with red dots.
First horizontal flame.
and then a vertical flame.
There is still a lot of work left in this. However if you are a little more convinced than yesterday, then below is the code.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread
class DetectEdges():
    def __init__(self, ImagePath, Channel = ["blue"], edges="vertical"):
        self.Channel = Channel
        self.edges = edges
        self.Image_ = imread(ImagePath)
        self.Image = np.copy(self.Image_)
        self.Dimensions_X, self.Dimensions_Y, self.Channels = self.Image.shape
        self.BackGroundSamplingPercentage = 0.5

    def ShowTheImage(self):
        plt.imshow(self.Image)
        plt.show()

    def GetTheBackGroundPixels(self):
        NumberOfPoints = int(self.BackGroundSamplingPercentage*min(self.Dimensions_X, self.Dimensions_Y))
        Random_X = np.random.choice(self.Dimensions_X, size=NumberOfPoints, replace=False)
        Random_Y = np.random.choice(self.Dimensions_Y, size=NumberOfPoints, replace=False)
        Random_Pixels = np.array(list(zip(Random_X,Random_Y)))
        return Random_Pixels

    def GetTheChannelEdge(self):
        BackGroundPixels = self.GetTheBackGroundPixels()
        if self.edges == "vertical":
            if self.Channel == ["blue"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],2])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,2]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel,BottomPixel],column,:] = [255,0,0]
            if self.Channel == ["red"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],0])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,0]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel,BottomPixel],column,:] = [0,255,0]
            if self.Channel == ["green"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],1])
                for column in range(self.Dimensions_Y):
                    PixelsAboveBackGround = np.where(self.Image[:,column,1]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        TopPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        BottomPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[[TopPixel,BottomPixel],column,:] = [255,0,0]
        elif self.edges=="horizontal":
            if self.Channel == ["blue"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],2])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,2]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row,[LeftPixel,RightPixel],:] = [255,0,0]
            if self.Channel == ["red"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],0])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,0]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row,[LeftPixel,RightPixel],:] = [0,255,0]
            if self.Channel == ["green"]:
                MeanBackGroundInensity = np.mean(self.Image[BackGroundPixels[:,0],BackGroundPixels[:,1],1])
                for row in range(self.Dimensions_X):
                    PixelsAboveBackGround = np.where(self.Image[row,:,1]>MeanBackGroundInensity)
                    if PixelsAboveBackGround[PixelsAboveBackGround==True].shape[0] > 0:
                        LeftPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][0]
                        RightPixel = PixelsAboveBackGround[PixelsAboveBackGround==True][-1]
                        self.Image[row,[LeftPixel,RightPixel],:] = [255,0,0]

Test = DetectEdges("FlameImagePath",Channel = ["blue"],edges="vertical")
Test.GetTheChannelEdge()
Test.ShowTheImage()
Please let me know if this was of any "more" help or I missed some salient requirements.
Best wishes,
By the way, Amit, I would like to show my code, which uses the idea of thresholding the pixel value. I would love to discuss it with you.
import cv2
from matplotlib import pyplot as plt

if __name__ == '__main__':
    path = 'D:\\20181229__\\7\\Area 7\\1767.jpg'
    img1 = cv2.imread(path)
    b, g, r = cv2.split(img1)
    img3 = b[94:223, 600:700]
    img4 = cv2.flip(img3, 1)
    h, w = img3.shape
    data = []
    th_val = 20
    # For each row, record the first column (from the right) above the threshold
    for i in range(h):
        for j in range(w):
            val = img3[i, -j]
            if (val >= th_val):
                data.append(j)
                break
    x = range(len(data))
    plt.figure(figsize=(10, 7))
    plt.subplot(121)
    plt.imshow(img4)
    plt.plot(data, x)
    plt.subplot(121)
    plt.plot(data, x)
    plt.show()
Please see the link for the result. The thing is, the method still does not totally fit what I want. I hope to discuss it with you.
Link: https://imgur.com/QtNk7c7

Python- Clustering Hough lines

I am working to cluster probabilistic hough lines together using unit vectors.
The clustering changes every run through and is not quite right. I want to cluster the lines of [this image][2]. But I am getting this clustering, and it changes drastically every run through.
I know the probabilistic Hough transform changes slightly every run, but I would like to keep the merged lines fairly consistent. Is the problem with the way I am calculating the unit vector, or with DBSCAN, or is there a better way to do the clustering? Any help would be appreciated.
line_dict = []
# using hough lines thru skimage.transform- probabilistic_hough_line
for line in lines:
    meta_lines = {}
    start_point, end_point = line
    # line equations and add line info to line dictionary
    meta_lines["start"] = start_point
    meta_lines["end"] = end_point
    distance = [end_point[0] - start_point[0], end_point[1] - start_point[1]]
    norm = math.sqrt(distance[0] ** 2 + distance[1] ** 2)
    direction = [distance[0] / norm, distance[1] / norm]
    meta_lines["unit-vector"] = direction
    line_dict.append(meta_lines)

#clustering of lines using DBSCAN
X = StandardScaler().fit_transform([x["unit-vector"] for x in line_dict])
db = DBSCAN(eps=0.2, min_samples=1).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
clusters = [X[labels == i] for i in range(n_clusters_)]

#cluster start/end points of lines
for c in range(len(clusters)):
    for i in range(len(line_dict)):
        line_dict[i]["scale"] = X[i]
        if line_dict[i]["scale"] in clusters[c]:
            line_dict[i]["cluster"] = c
line_dict.sort(key=itemgetter("cluster"))

cluster_lines = []
for key, group in itertools.groupby(line_dict, lambda item: item["cluster"]):
    cluster_lines.append([(i["start"], i["end"]) for i in group])

merged_lines = []
for i in cluster_lines:
    points = []
    for x in i:
        p0, p1 = x
        points.extend((p0, p1))
    # sort points and use min/max for endpoints of line
    k = sorted(points)
    merged_lines.append([k[0], k[-1]])
Edit:
Original image (I am low rep on Stack Overflow so I can only post 2 images; I removed the original one with the Hough lines drawn on it).
Hough line code:
from skimage.transform import probabilistic_hough_line
#Img is grayscale image
thresh = threshold_otsu(img)
binary = img > thresh
binary = np.invert(binary)
skel = skeletonize(binary)  # skeletonize image
lines = probabilistic_hough_line(skel,
                                 threshold=5,
                                 line_length=10,
                                 line_gap=5)
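One detail worth noting about the unit-vector feature used above (an observation, not part of the original post): probabilistic_hough_line can return the same physical segment with its endpoints in either order, so (dx, dy) and (-dx, -dy) describe the same orientation yet land on opposite sides of the unit circle, which by itself can make the DBSCAN labels jump between runs. A common workaround is to cluster on the doubled angle, which maps both directions to the same point, e.g.:
import math

def orientation_feature(start_point, end_point):
    # Map a segment to (cos 2*theta, sin 2*theta) so that theta and
    # theta + pi (the same line, opposite endpoint order) coincide.
    dx = end_point[0] - start_point[0]
    dy = end_point[1] - start_point[1]
    theta = math.atan2(dy, dx)
    return [math.cos(2 * theta), math.sin(2 * theta)]

# features = [orientation_feature(*line) for line in lines]
# db = DBSCAN(eps=0.2, min_samples=1).fit(features)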

Display images with bounding boxes while running py-faster-rcnn using VGG_CNN_M_1024

I am using the demo.py given in https://github.com/rbgirshick/py-faster-rcnn/tree/master/tools.
I have modified the code to run VGG_CNN_M_1024, as I am using a 2 GB GPU. As per the comments given in https://github.com/rbgirshick/fast-rcnn/issues/2, I chose to run VGG_CNN_M_1024.caffemodel instead of VGG16_faster_rcnn_final.caffemodel.
This is the code in demo.py:
#!/usr/bin/env python
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""
Demo script showing detections in sample images.
See README.md for installation instructions before running.
"""
import _init_paths
from fast_rcnn.config import cfg
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms
from utils.timer import Timer
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as sio
import caffe, os, sys, cv2
import argparse
CLASSES = ('__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor')
NETS = {'vgg16': ('VGG16',
'VGG16_faster_rcnn_final.caffemodel'),
'zf': ('ZF',
'ZF_faster_rcnn_final.caffemodel'),
'vgg16_m_1024':('VGG_CNN_M_1024','VGG_CNN_M_1024.caffemodel')}
def vis_detections(im, class_name, dets, thresh=0.5):
"""Draw detected bounding boxes."""
inds = np.where(dets[:, -1] >= thresh)[0]
if len(inds) == 0:
return
im = im[:, :, (2, 1, 0)]
fig, ax = plt.subplots(figsize=(12, 12))
ax.imshow(im, aspect='equal')
for i in inds:
bbox = dets[i, :4]
score = dets[i, -1]
ax.add_patch(
plt.Rectangle((bbox[0], bbox[1]),
bbox[2] - bbox[0],
bbox[3] - bbox[1], fill=False,
edgecolor='red', linewidth=3.5)
)
ax.text(bbox[0], bbox[1] - 2,
'{:s} {:.3f}'.format(class_name, score),
bbox=dict(facecolor='blue', alpha=0.5),
fontsize=14, color='white')
ax.set_title(('{} detections with '
'p({} | box) >= {:.1f}').format(class_name, class_name,
thresh),
fontsize=14)
plt.axis('off')
plt.tight_layout()
plt.draw()
def demo(net, image_name):
"""Detect object classes in an image using pre-computed object proposals."""
# Load the demo image
im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name)
im = cv2.imread(im_file)
# Detect all object classes and regress object bounds
timer = Timer()
timer.tic()
scores, boxes = im_detect(net, im)
timer.toc()
print ('Detection took {:.3f}s for '
'{:d} object proposals').format(timer.total_time, boxes.shape[0])
# Visualize detections for each class
CONF_THRESH = 0.8
NMS_THRESH = 0.3
for cls_ind, cls in enumerate(CLASSES[1:]):
cls_ind += 1 # because we skipped background
cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)]
cls_scores = scores[:, cls_ind]
dets = np.hstack((cls_boxes,
cls_scores[:, np.newaxis])).astype(np.float32)
keep = nms(dets, NMS_THRESH)
dets = dets[keep, :]
vis_detections(im, cls, dets, thresh=CONF_THRESH)
def parse_args():
"""Parse input arguments."""
parser = argparse.ArgumentParser(description='Faster R-CNN demo')
parser.add_argument('--gpu', dest='gpu_id', help='GPU device id to use [0]',
default=0, type=int)
parser.add_argument('--cpu', dest='cpu_mode',
help='Use CPU mode (overrides --gpu)',
action='store_true')
parser.add_argument('--net', dest='demo_net', help='Network to use [vgg16]',
choices=NETS.keys(), default='vgg16_m_1024')
args = parser.parse_args()
return args
if __name__ == '__main__':
cfg.TEST.HAS_RPN = True # Use RPN for proposals
args = parse_args()
prototxt = os.path.join(cfg.MODELS_DIR, NETS[args.demo_net][0],
'faster_rcnn_alt_opt', 'faster_rcnn_test.pt')
caffemodel = os.path.join(cfg.DATA_DIR, 'faster_rcnn_models',
NETS[args.demo_net][1])
if not os.path.isfile(caffemodel):
raise IOError(('{:s} not found.\nDid you run ./data/script/'
'fetch_faster_rcnn_models.sh?').format(caffemodel))
if args.cpu_mode:
caffe.set_mode_cpu()
else:
caffe.set_mode_gpu()
caffe.set_device(args.gpu_id)
cfg.GPU_ID = args.gpu_id
net = caffe.Net(prototxt, caffemodel, caffe.TEST)
print '\n\nLoaded network {:s}'.format(caffemodel)
# Warmup on a dummy image
im = 128 * np.ones((300, 500, 3), dtype=np.uint8)
for i in xrange(2):
_, _= im_detect(net, im)
im_names = ['000456.jpg', '000542.jpg', '001150.jpg',
'001763.jpg', '004545.jpg']
for im_name in im_names:
print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
print 'Demo for data/demo/{}'.format(im_name)
demo(net, im_name)
plt.show()
And this is config.py
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Fast R-CNN config system.
This file specifies default config options for Fast R-CNN. You should not
change values in this file. Instead, you should write a config file (in yaml)
and use cfg_from_file(yaml_file) to load it and override the default options.
Most tools in $ROOT/tools take a --cfg option to specify an override file.
- See tools/{train,test}_net.py for example code that uses cfg_from_file()
- See experiments/cfgs/*.yml for example YAML config override files
"""
import os
import os.path as osp
import numpy as np
# `pip install easydict` if you don't have it
from easydict import EasyDict as edict
__C = edict()
# Consumers can get config by:
# from fast_rcnn_config import cfg
cfg = __C
#
# Training options
#
__C.TRAIN = edict()
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 1000
# Images to use per minibatch
__C.TRAIN.IMS_PER_BATCH = 2
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128
# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25
# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
# Overlap threshold for a ROI to be considered background (class = 0 if
# overlap in [LO, HI))
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
# Use horizontally-flipped images during training?
__C.TRAIN.USE_FLIPPED = True
# Train bounding-box regressors
__C.TRAIN.BBOX_REG = True
# Overlap required between a ROI and ground-truth box in order for that ROI to
# be used as a bounding-box regression training example
__C.TRAIN.BBOX_THRESH = 0.5
# Iterations between snapshots
__C.TRAIN.SNAPSHOT_ITERS = 10000
# solver.prototxt specifies the snapshot path prefix, this adds an optional
# infix to yield the path: <prefix>[_<infix>]_iters_XYZ.caffemodel
__C.TRAIN.SNAPSHOT_INFIX = ''
# Use a prefetch thread in roi_data_layer.layer
# So far I haven't found this useful; likely more engineering work is required
__C.TRAIN.USE_PREFETCH = False
# Normalize the targets (subtract empirical mean, divide by empirical stddev)
__C.TRAIN.BBOX_NORMALIZE_TARGETS = True
# Deprecated (inside weights)
__C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Normalize the targets using "precomputed" (or made up) means and stdevs
# (BBOX_NORMALIZE_TARGETS must also be True)
__C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = False
__C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
__C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
# Train using these proposals
__C.TRAIN.PROPOSAL_METHOD = 'selective_search'
# Make minibatches from images that have similar aspect ratios (i.e. both
# tall and thin or both short and wide) in order to avoid wasting computation
# on zero-padding.
__C.TRAIN.ASPECT_GROUPING = True
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = False
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
# If an anchor statisfied by positive and negative conditions set to negative
__C.TRAIN.RPN_CLOBBER_POSITIVES = False
# Max number of foreground examples
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before apply NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
__C.TRAIN.RPN_MIN_SIZE = 16
# Deprecated (outside weights)
__C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Give the positive RPN examples weight of p * 1 / {num positives}
# and give negatives a weight of (1 - p)
# Set to -1.0 to use uniform example weighting
__C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
#
# Testing options
#
__C.TEST = edict()
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
__C.TEST.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 1000
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Experimental: treat the (K+1) units in the cls_score layer as linear
# predictors (trained, eg, with one-vs-rest SVMs).
__C.TEST.SVM = False
# Test using bounding-box regressors
__C.TEST.BBOX_REG = True
# Propose boxes
__C.TEST.HAS_RPN = False
# Test using these proposals
__C.TEST.PROPOSAL_METHOD = 'selective_search'
## NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
## Number of top scoring boxes to keep before apply NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
## Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 300
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
__C.TEST.RPN_MIN_SIZE = 16
#
# MISC
#
# The mapping from image coordinates to feature map coordinates might cause
# some boxes that are distinct in image space to become identical in feature
# coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
# for identifying duplicate boxes.
# 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
__C.DEDUP_BOXES = 1./16.
# Pixel mean values (BGR order) as a (1, 1, 3) array
# We use the same pixel mean for all networks even though it's not exactly what
# they were trained with
__C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
# For reproducibility
__C.RNG_SEED = 3
# A small number that's used many times
__C.EPS = 1e-14
# Root directory of project
__C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
# Data directory
__C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
# Model directory
__C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'pascal_voc'))
# Name (or path to) the matlab executable
__C.MATLAB = 'matlab'
# Place outputs under an experiments directory
__C.EXP_DIR = 'default'
# Use GPU implementation of non-maximum suppression
__C.USE_GPU_NMS = False
# Default GPU device id
__C.GPU_ID = 0
def get_output_dir(imdb, net=None):
"""Return the directory where experimental artifacts are placed.
If the directory does not exist, it is created.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
if net is not None:
outdir = osp.join(outdir, net.name)
if not os.path.exists(outdir):
os.makedirs(outdir)
return outdir
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a.
"""
if type(a) is not edict:
return
for k, v in a.iteritems():
# a must specify keys that are in b
if not b.has_key(k):
raise KeyError('{} is not a valid config key'.format(k))
# the types must match, too
old_type = type(b[k])
if old_type is not type(v):
if isinstance(b[k], np.ndarray):
v = np.array(v, dtype=b[k].dtype)
else:
raise ValueError(('Type mismatch ({} vs. {}) '
'for config key: {}').format(type(b[k]),
type(v), k))
# recursively merge dicts
if type(v) is edict:
try:
_merge_a_into_b(a[k], b[k])
except:
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
yaml_cfg = edict(yaml.load(f))
_merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
"""Set config keys via list (e.g., from command line)."""
from ast import literal_eval
assert len(cfg_list) % 2 == 0
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for subkey in key_list[:-1]:
assert d.has_key(subkey)
d = d[subkey]
subkey = key_list[-1]
assert d.has_key(subkey)
try:
value = literal_eval(v)
except:
# handle the case when v is a string literal
value = v
assert type(value) == type(d[subkey]), \
'type {} does not match original type {}'.format(
type(value), type(d[subkey]))
d[subkey] = value
Whenever I run the code with a ZF net, I get the output images with the bounding box.
The terminal output is given here for ZF : http://txt.do/5bqsf
However, when I run the code with the VGG_CNN_M_1024 net, no output images are displayed, even though the code runs successfully.
The terminal output is given here for VGG_CNN_M_1024 : http://txt.do/5bqsf
What do I change in the code ?
VGG16_faster_rcnn_final.caffemodel is the model learned via Faster RCNN.
This model would have been initialised using VGG16.caffemodel.
What you're after is a faster rcnn model, which has been initialised using the VGG_CNN_M_1024.caffemodel. Such a model would be called
VGG_CNN_M_1024_faster_rcnn_final.caffemodel if it followed the above naming convention.
If the model is not available online, you'll need to train it on the PASCAL dataset.
As you mentioned that you are not able to use that model, one alternative is to place your trained model in the faster_rcnn_models folder and just change VGG16_faster_rcnn_final.caffemodel to VGG_CNN_M_1024.caffemodel instead of adding a third entry to the net dictionary. I follow the same convention and it always works for me.
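For reference, if a Faster R-CNN model initialised from VGG_CNN_M_1024 is obtained (the hypothetical VGG_CNN_M_1024_faster_rcnn_final.caffemodel named above), the other route is to register it under its own key in the NETS dictionary of demo.py, roughly like this (the key name and file name are illustrative):
NETS = {'vgg16': ('VGG16', 'VGG16_faster_rcnn_final.caffemodel'),
        'zf': ('ZF', 'ZF_faster_rcnn_final.caffemodel'),
        # hypothetical entry: a Faster R-CNN model trained from VGG_CNN_M_1024
        'vgg_cnn_m_1024': ('VGG_CNN_M_1024',
                           'VGG_CNN_M_1024_faster_rcnn_final.caffemodel')}
It could then be selected with the --net flag defined in parse_args in the demo.py shown above.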

Use OpenCV Python to create a disparity map with StereoBM with diagonal parallax

I have a stereo pair and would like to create a disparity map. However, the shift between the two images is not simply left to right or up and down, but some combination of the two. I have tried to use the StereoBM function in OpenCV Python, but the results have diagonal black and white lines across the image. My question is: is it possible to use two images where the parallax is in the diagonal direction to compute a disparity map, or do the images need to be rotated in order for this function to work?
EDIT: After reading the answers below, and doing some research, I decided to try the stereoRectifyUncalibrated function. I first find key points in the first image with SURF, and then repeat this for the second image. I then use the FLANN based matcher to match the points, and I remove the outliers. I then find the fundamental mat using the findFundamentalMat function, and then I call stereoRectifyUncalibrated. However, I get an error that begins like this: (-215) CV_IS_MAT(_points1) && CV_IS_MAT(_points2) && (_points1->rows == 1 || _points1->cols == 1) &&...
I have made sure that the data types of everything are the same, and that each point array are the same dimensions. I put the part of my code where I use stereoRectifyUncalibrated below.
#Detect feature points with SURF
detector = cv2.SURF()
kp1, desc1 = detector.detectAndCompute(img1, None)
kp2, desc2 = detector.detectAndCompute(img2, None)
#Match Points
FLANN_INDEX_KDTREE = 1 # bug: flann enums are missing
flann_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
matcher = cv2.FlannBasedMatcher(flann_params, {})
matches = matcher.knnMatch(desc1, trainDescriptors = desc2, k=2)
mkp1, mkp2 = [], []
ratio = 0.75
for m in matches:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
        m = m[0]
        mkp1.append( kp1[m.queryIdx] )
        mkp2.append( kp2[m.trainIdx] )
np.float32([kp.pt for kp in mkp1])
p1 = np.float32([kp.pt for kp in mkp1])
p2 = np.float32([kp.pt for kp in mkp2])
kp_pairs = zip(mkp1, mkp2)
H, status = cv2.findHomography(p1, p2, cv2.RANSAC, 5.0)
print '%d / %d inliers/matched' % (np.sum(status), len(status))
statusmat = np.zeros((max(status.shape),2),dtype = np.float64)
statusmat[:,0] = status[:,0]
statusmat[:,1] = status[:,0]
status = np.array(status, dtype=bool)
p1f=p1[status.view(np.ndarray).ravel()==1,:] #Remove Outliers
p2f=p2[status.view(np.ndarray).ravel()==1,:] #Remove Outliers
#Attempt to rectify using stereoRectifyUncalibrated
fundmat, mask = cv2.findFundamentalMat(p1f,p2f,cv2.RANSAC,3,0.99,)
rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1f,p2f,fundmat,imgsize)
Thanks for the answers so far!
It seems that stereoRectifyUncalibrated takes a row or column vector, not an n x 2 matrix. The output also seems to have 3 elements:
p1fNew = p1f.reshape((p1f.shape[0] * 2, 1))
p2fNew = p2f.reshape((p2f.shape[0] * 2, 1))
retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,fundmat,imgsize)
