Optimized skeleton function for OpenCV with Python

I am using OpenCV on Raspbian (Raspberry Pi 2 Model B). I am doing vision/image processing, and the Raspberry Pi is the hardware I was given (I would use a desktop computer for this if I could).
I need to run a skeleton function. I found the following implementation:
import cv2
import numpy as np

img = cv2.imread('img.png', 0)
size = np.size(img)
skeleton = np.zeros(img.shape, np.uint8)

ret, img = cv2.threshold(img, 127, 255, 0)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
finished = False

while not finished:
    eroded = cv2.erode(img, kernel)
    temp = cv2.dilate(eroded, kernel)
    temp = cv2.subtract(img, temp)
    skeleton = cv2.bitwise_or(skeleton, temp)
    img = eroded.copy()

    zeros = size - cv2.countNonZero(img)
    if zeros == size:
        finished = True

cv2.imshow("skeleton", skeleton)
cv2.waitKey(0)
cv2.destroyAllWindows()
While it runs, it is unsurprisingly very slow (I am doing an FFT and band-pass filtering operation on the image before this, then running the skeleton operation). The other code is slow, but it does complete.
The images are big - I could crop them some, but I don't think it would be enough. I was trying to find an optimized version of this, but so far haven't come up with anything. Any ideas or solutions?

In this answer, I'll focus on improving your implementation, rather than the algorithm. While this won't gain us a significant amount, I think it's still useful to be aware of.
Preparation
Let's begin with some boilerplate -- the necessary imports, a test image, and a few functions to let us compare easily:
from timeit import default_timer as timer
import numpy as np
import cv2

# Create a decent size test image...
img = cv2.imread('cage.png', 0)
img = cv2.resize(img, (2048, 2048))
cv2.normalize(img, img, 0, 255, cv2.NORM_MINMAX)

def time_fn(fn, img, iters=1):
    start = timer()
    result = None
    for i in range(iters):
        result = fn(img)
    end = timer()
    return (result, ((end - start) / iters) * 1000)

def run_test(fn, img, i):
    res, t = time_fn(fn, img, 4)
    cv2.imwrite("skeleton_%d.png" % i, res[0])
    print "Variant %d" % i
    print "Input size = (%d, %d)" % img.shape[:2]
    print "Ran %d iterations to find skeleton." % res[1]
    print "Avg. find_skeleton time = %0.4f s." % (t / 1000)
Variant 1 (Original)
Let's turn your implementation into a function, and remove a few unnecessary bits. Out of curiosity, let's track the number of iterations needed for the skeletonization.
def find_skeleton1(img):
    skeleton = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    iters = 0
    while True:
        eroded = cv2.erode(thresh, kernel)
        temp = cv2.dilate(eroded, kernel)
        temp = cv2.subtract(thresh, temp)
        skeleton = cv2.bitwise_or(skeleton, temp)
        thresh = eroded.copy()

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton, iters)
And let's see how it performs to set our baseline.
>>> run_test(find_skeleton1, img, 1)
Variant 1
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 2.7969 s.
Variant 2
The first improvement we can make is to minimize the number of allocations of new array objects and reuse as much as possible. We can create a few more temporary arrays (like skeleton) and use the dst parameter of the OpenCV functions in the loop, ignoring the return values. Since we provide a destination of the correct shape and data type, the existing array gets reused.
def find_skeleton2(img):
    skeleton = np.zeros(img.shape, np.uint8)
    eroded = np.zeros(img.shape, np.uint8)
    temp = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    iters = 0
    while True:
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh = eroded.copy()

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton, iters)
Let's try this out, and check that the results are the same:
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton2(img)[0])
True
>>> run_test(find_skeleton2, img, 2)
Variant 2
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 1.4356 s.
Variant 3
The next step is to get rid of unnecessary copies -- there's one that's very obvious: thresh = eroded.copy(). Notice that in the following iteration, we immediately overwrite the contents of eroded. Hence, we don't really care what it contains, as long as it has the correct shape and data type. It does, so rather than performing a copy, we can simply swap the two objects.
def find_skeleton3(img):
    skeleton = np.zeros(img.shape, np.uint8)
    eroded = np.zeros(img.shape, np.uint8)
    temp = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    iters = 0
    while True:
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh, eroded = eroded, thresh  # Swap instead of copy

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton, iters)
Again, let's verify the results match and do some timing.
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton3(img)[0])
True
>>> run_test(find_skeleton3, img, 3)
Variant 3
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 0.9839 s.
A few simple changes got the timing down to ~35% of the original. Of course, it still does hundreds of iterations processing the entire image. The next step would be to look into ways to reduce the amount of work -- in the later iterations, significant areas of the working image are black and don't contribute anything to the skeleton. One possible direction is sketched below.
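For illustration, here is one such direction (my sketch, not part of the original answer): every few iterations, crop the working image to the bounding box of the remaining white pixels so that later iterations touch far fewer pixels. The name find_skeleton4 and the re-crop period of 8 are arbitrary choices; it should produce the same skeleton as the variants above, but verify with np.array_equal before relying on it.

def find_skeleton4(img):
    skeleton = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    iters = 0
    x0, y0 = 0, 0                   # offset of the current crop in the full image
    while True:
        eroded = cv2.erode(thresh, kernel)
        temp = cv2.dilate(eroded, kernel)
        temp = cv2.subtract(thresh, temp)
        skeleton[y0:y0 + temp.shape[0], x0:x0 + temp.shape[1]] |= temp
        thresh = eroded

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton, iters)

        if iters % 8 == 0:          # re-crop periodically, not every iteration
            x, y, w, h = cv2.boundingRect(cv2.findNonZero(thresh))
            # Pad by one pixel so the 3x3 morphology near the crop border
            # sees the same (black) neighbourhood as in the full image.
            x1, y1 = max(x - 1, 0), max(y - 1, 0)
            x2 = min(x + w + 1, thresh.shape[1])
            y2 = min(y + h + 1, thresh.shape[0])
            thresh = thresh[y1:y2, x1:x2].copy()
            x0, y0 = x0 + x1, y0 + y1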
NB: Measurements were done on an i7-4930K. I don't have a Raspberry Pi; feel free to add timings from yours so we can see what sort of effect these changes have there.

Related

Python: Fast way to remove horizontal black line in image

I would like to remove horizontal black lines on an image:
To do this, I interpolate the RGB values of each column of pixels.
The black lines disappear, but I think it is possible to optimize this function:
import time

import cv2
import numpy as np
from scipy.interpolate import interp1d
from tqdm import tqdm

def fillMissingValue(img_in):
    img_out = np.copy(img_in)

    # Proceed column by column
    for i in tqdm(range(img_in.shape[1])):
        col = img_in[:,i]

        col_r = col[:,0]
        col_g = col[:,1]
        col_b = col[:,2]

        r = interpolate(col_r)
        g = interpolate(col_g)
        b = interpolate(col_b)

        img_out[:,i,0] = r
        img_out[:,i,1] = g
        img_out[:,i,2] = b

    return img_out

def interpolate(y):
    x = np.arange(len(y))
    idx = np.nonzero(y)
    interp = interp1d(x[idx], y[idx], fill_value="extrapolate")
    return interp(x)

if __name__ == "__main__":
    img = cv2.imread("lena.png")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (1024, 1024))

    start = time.time()
    img2 = fillMissingValue(img)
    end = time.time()

    print("Process time: {}".format(np.round(end-start, 3)))
Do you have any ideas?
I thought of adding a preprocessing step to identify the positions of the black lines and then only interpolating the neighboring pixels, but I don't think it would be faster.
Current result:
interp1d is not very efficient.
As proposed by @ChristophRackwitz in the comments, you can detect the location of the lines and use the inpainting method provided by OpenCV:
import cv2
import numpy as np

img = cv2.imread('lena.jpg')

# Locate the relatively black lines
threshold = 25
lineIdx = np.where(np.mean(img, axis=(1,2)) < threshold)

# Build a mask covering the located lines
mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[lineIdx] = 255

# Actual inpainting.
# Note: using a radius of 2 or 1 instead of 3 makes the computation
# respectively ~2x and ~4x faster on my machine, but the result is
# not as good as with 3.
img2 = cv2.inpaint(img, mask, 3, cv2.INPAINT_NS)
The computational part takes 87 ms on my machine, while your code takes 342 ms. Note that because of JPEG compression, the result is not perfect. You can also inpaint the neighbouring lines (e.g. lineIdx-1 and lineIdx+1) to get a much better result, at the expense of slower computation (about 2.5x slower on my machine); a sketch of that follows.
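For instance (my sketch, not from the original answer, reusing img and lineIdx from the snippet above), the wider mask can be built by shifting and clipping the row indices:

rows = lineIdx[0]                              # np.where(...) returned a tuple
neighbours = np.concatenate([rows - 1, rows, rows + 1])
neighbours = np.clip(neighbours, 0, img.shape[0] - 1)

mask = np.zeros(img.shape[:2], dtype=np.uint8)
mask[neighbours] = 255
img2 = cv2.inpaint(img, mask, 3, cv2.INPAINT_NS)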
An alternative solution is to perform the interpolation yourself in Numpy:
%%time

# Locate the relatively black lines
threshold = 25
lineIdx = np.where(np.mean(img, axis=(1,2)) < threshold)[0]
lineIdxSet = set(lineIdx)

img2 = img.copy()
start, end = None, None
interpolate = False

for i in range(img.shape[0]+1):
    if i in lineIdxSet:
        if start is None:
            start = i
        end = i
    else:
        if not (start is None):
            assert not (end is None)

            # The first lines are black
            if start <= 0:
                i0, i1 = end+1, end+1
            # The last lines are black
            elif end >= img.shape[0]-1:
                i0, i1 = start-1, start-1
            # Usual case
            else:
                i0, i1 = start-1, end+1

            # The full image is black
            if i0 < 0 or i1 >= img.shape[0]:
                continue

            end = min(end, img.shape[0]-1)

            # Actual linear interpolation (of a block of lines)
            coef = np.linspace(0, 1, end-start+3)[1:-1].reshape(-1, 1)
            l0 = img[i0].reshape(-1)
            l1 = img[i1].reshape(-1)
            img2[start:end+1] = (coef * l0 + (1.0-coef) * l1).reshape(-1, img.shape[1], 3)

            start, end = None, None
This code takes only 5 ms on my machine. The result should be similar to that of your original code, except that it works line by line rather than column by column, and the detection is not done independently for each colour channel. Note that the inpainting method gives nicer results when large blocks of lines are black.

Box blur is not any faster than Gaussian blur?

I have written some code to apply filters to an image using kernel convolution. Currently, it takes quite a long time, approximately 30 seconds for a 400x400 image. I understand that box blurs are much faster than Gaussian blurs. However, when I change my kernel to a box blur it seems to take as much time as the Gaussian blur. Any ideas?
import cv2
import numpy as np

img = cv2.imread('test.jpg')
img2 = cv2.imread('test.jpg')
height, width, channels = img.shape

GB3 = np.array([[1,2,1], [2,4,2], [1,2,1]])
GB5 = np.array([[1,4,6,4,1], [4,16,24,16,4], [6,24,36,24,6], [4,16,24,16,4], [1,4,6,4,1]])
BB = np.array([[1,1,1], [1,1,1], [1,1,1]])
kernel = BB

#initialise
kernel_sum = 1
filtered_sum_r = 0
filtered_sum_g = 0
filtered_sum_b = 0

for i in range(kernel.shape[0]):
    for j in range(kernel.shape[1]):
        p = kernel[i][j]
        kernel_sum += p

for x in range(1, width-1):
    for y in range(1, height-1):
        for i in range(kernel.shape[0]):
            for j in range(kernel.shape[1]):
                filtered_sum_b += img[y-1+j, x-1+i, 0]*kernel[i][j]
                filtered_sum_g += img[y-1+j, x-1+i, 1]*kernel[i][j]
                filtered_sum_r += img[y-1+j, x-1+i, 2]*kernel[i][j]

        new_pixel_r = filtered_sum_r/kernel_sum
        new_pixel_g = filtered_sum_g/kernel_sum
        new_pixel_b = filtered_sum_b/kernel_sum

        if new_pixel_r > 255:
            new_pixel_r = 255
        elif new_pixel_r < 0:
            new_pixel_r = 0

        if new_pixel_g > 255:
            new_pixel_g = 255
        elif new_pixel_g < 0:
            new_pixel_g = 0

        if new_pixel_b > 255:
            new_pixel_b = 255
        elif new_pixel_b < 0:
            new_pixel_b = 0

        img2[y, x, 0] = new_pixel_b
        img2[y, x, 1] = new_pixel_g
        img2[y, x, 2] = new_pixel_r

        filtered_sum_r = 0
        filtered_sum_g = 0
        filtered_sum_b = 0

#print(kernel_sum)

scale = 2
img_big = cv2.resize(img, (0,0), fx=scale, fy=scale)
img2_big = cv2.resize(img2, (0,0), fx=scale, fy=scale)

cv2.imshow('original', img_big)
cv2.imshow('processed', img2_big)
cv2.waitKey(0)
cv2.destroyAllWindows()
You are using Python loops, which will always be orders of magnitude slower than optimized binary code. Whenever possible, use library functions, i.e. numpy and OpenCV, or write your critical code as compilable Cython.
Your code's access pattern is also suboptimal. You should move along rows in the inner loop (for y: for x:), because that is how the image is stored; the reason is how your CPU's cache is used. In row-major storage, a cache line contains several pixels of the same row, so if you walk along columns, you only use that cache line once before needing another.
Your code also doesn't make use of the property that both types of filter are "separable"; a sketch of what that means in practice follows.
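For example (my sketch, not the answerer's code; 'test.jpg' is the question's own file), a separable kernel can be applied as two 1D passes, which costs roughly 2·k multiplications per pixel instead of k² for a k×k kernel:

import cv2
import numpy as np

img = cv2.imread('test.jpg')

# 5x5 Gaussian as the outer product of the 1D binomial kernel [1, 4, 6, 4, 1]/16
g1d = np.array([1, 4, 6, 4, 1], dtype=np.float32) / 16.0
gauss = cv2.sepFilter2D(img, -1, g1d, g1d)   # row pass + column pass

# 3x3 box blur as two 1D averaging passes
b1d = np.full(3, 1.0 / 3.0, dtype=np.float32)
box = cv2.sepFilter2D(img, -1, b1d, b1d)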
Convolution can also be expressed as an elementwise multiplication in the frequency domain (DFT, multiply, inverse DFT), which is the usual way to perform convolutions with large kernels.
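As a rough illustration of that route (my sketch, not the answerer's code): note it performs circular convolution, so borders wrap around instead of being replicated as cv2.filter2D does, and for tiny 3x3 or 5x5 kernels the direct or separable route is normally faster anyway.

import numpy as np

def fft_convolve(channel, kernel):
    # Pad the kernel to the image size and roll its centre to (0, 0) so the
    # output is not shifted; the convolution theorem does the rest.
    h, w = channel.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * np.fft.fft2(padded)))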
Use OpenCV's filter2D function for your convolutions.
As for box blur vs. Gaussian, the only difference is "interesting" weights vs. equal weights, which amounts to a few more multiplications at most. Once the code is optimized, its execution time is often dominated by the time needed to transfer the data from RAM to the CPU; that applies to optimized code, though, not to pure Python loops.
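Concretely, the whole hand-written loop collapses to a few library calls (my sketch, reusing the question's kernels and file name):

import cv2
import numpy as np

img = cv2.imread('test.jpg')

# Box blur: a normalized all-ones kernel, directly supported by cv2.blur.
box = cv2.blur(img, (3, 3))

# Gaussian blur with a 5x5 kernel.
gauss = cv2.GaussianBlur(img, (5, 5), 0)

# Arbitrary kernels go through cv2.filter2D; normalize so the weights sum to 1.
GB3 = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32)
gauss3 = cv2.filter2D(img, -1, GB3 / GB3.sum())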

Image stitching

I've recorded a video while a bottle was rotated, then extracted frames from the video and cut the central block from all of the images.
So for all frames I got the following images:
I've tried to stitch them into a panorama, but I got bad results.
I used the following program:
import glob
# from panorama import Panorama
import sys
import numpy
import imutils
import cv2


def readImages(imageString):
    images = []

    # Get images from arguments.
    for i in range(0, len(imageString)):
        img = cv2.imread(imageString[i])
        images.append(img)

    return images


def findAndDescribeFeatures(image):
    # Getting gray image
    grayImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Find and describe the features.
    # Fast: sift = cv2.xfeatures2d.SURF_create()
    sift = cv2.xfeatures2d.SIFT_create()

    # Find interest points.
    keypoints = sift.detect(grayImage, None)

    # Computing features.
    keypoints, features = sift.compute(grayImage, keypoints)

    # Converting keypoints to numbers.
    keypoints = numpy.float32([kp.pt for kp in keypoints])

    return keypoints, features


def matchFeatures(featuresA, featuresB):
    # Slow: featureMatcher = cv2.DescriptorMatcher_create("BruteForce")
    featureMatcher = cv2.DescriptorMatcher_create("FlannBased")
    matches = featureMatcher.knnMatch(featuresA, featuresB, k=2)
    return matches


def generateHomography(allMatches, keypointsA, keypointsB, ratio, ransacRep):
    if not allMatches:
        return None

    matches = []
    for match in allMatches:
        # Lowe's ratio test
        if len(match) == 2 and (match[0].distance / match[1].distance) < ratio:
            matches.append(match[0])

    pointsA = numpy.float32([keypointsA[m.queryIdx] for m in matches])
    pointsB = numpy.float32([keypointsB[m.trainIdx] for m in matches])

    if len(pointsA) > 4:
        H, status = cv2.findHomography(pointsA, pointsB, cv2.RANSAC, ransacRep)
        return matches, H, status
    else:
        return None


paths = glob.glob("C:/Users/andre/Desktop/Panorama-master/frames/*.jpg")
images = readImages(paths[::-1])

while len(images) > 1:
    imgR = images.pop()
    imgL = images.pop()

    interestsR, featuresR = findAndDescribeFeatures(imgR)
    interestsL, featuresL = findAndDescribeFeatures(imgL)

    try:
        try:
            allMatches = matchFeatures(featuresR, featuresL)
            _, H, _ = generateHomography(allMatches, interestsR, interestsL, 0.75, 4.0)

            result = cv2.warpPerspective(
                imgR, H, (imgR.shape[1] + imgL.shape[1], imgR.shape[0]))
            result[0:imgL.shape[0], 0:imgL.shape[1]] = imgL
            images.append(result)
        except TypeError:
            pass
    except cv2.error:
        pass

result = imutils.resize(images[0], height=260)

cv2.imshow("Result", result)
cv2.imwrite("Result.jpg", result)
cv2.waitKey(0)
My result was:
Maybe someone knows how to do it better? I think that using small blocks from each frame should remove the roundness... But...
Data: https://1drv.ms/f/s!ArcAdXhy6TxPho0FLKxyRCL-808Y9g
I managed to achieve a nice result. I rewrote your code just a little bit, here is the changed part:
def generateTransformation(allMatches, keypointsA, keypointsB, ratio):
    if not allMatches:
        return None

    matches = []
    for match in allMatches:
        # Lowe's ratio test
        if len(match) == 2 and (match[0].distance / match[1].distance) < ratio:
            matches.append(match[0])

    pointsA = numpy.float32([keypointsA[m.queryIdx] for m in matches])
    pointsB = numpy.float32([keypointsB[m.trainIdx] for m in matches])

    if len(pointsA) > 2:
        transformation = cv2.estimateRigidTransform(pointsA, pointsB, True)
        if transformation is None or transformation.shape[1] < 1 or transformation.shape[0] < 1:
            return None
        return transformation
    else:
        return None


paths = glob.glob("a*.jpg")
images = readImages(paths[::-1])
result = images[0]

while len(images) > 1:
    imgR = images.pop()
    imgL = images.pop()

    interestsR, featuresR = findAndDescribeFeatures(imgR)
    interestsL, featuresL = findAndDescribeFeatures(imgL)

    allMatches = matchFeatures(featuresR, featuresL)
    transformation = generateTransformation(allMatches, interestsR, interestsL, 0.75)

    if transformation is None or transformation[0, 2] < 0:
        images.append(imgR)
        continue

    transformation[0, 0] = 1
    transformation[1, 1] = 1
    transformation[0, 1] = 0
    transformation[1, 0] = 0
    transformation[1, 2] = 0

    result = cv2.warpAffine(imgR, transformation,
                            (imgR.shape[1] + int(transformation[0, 2] + 1), imgR.shape[0]))
    result[:, :imgL.shape[1]] = imgL

    cv2.imshow("R", result)
    images.append(result)
    cv2.waitKey(1)

cv2.imshow("Result", result)
So the key thing I changed is the transformation of the images. I use estimateRigidTransform() instead of findHomography() to calculate the transformation of the image. From that transformation matrix I only extract the x translation, which is in the [0, 2] cell of the resulting affine transformation matrix, transformation. I set the other transformation matrix elements as if it were an identity transformation (no scaling, no perspective, no rotation or y translation). Then I pass it to warpAffine() to transform imgR the same way you did with warpPerspective().
You can do this because you have a stable camera and a spinning object, and you capture it with a straight frontal view. That means you don't have to do any perspective/scaling/rotation corrections and can just "glue" the images together along the x axis.
I think your approach fails because you actually observe the bottle with a slightly tilted-down camera view, or the bottle is not in the middle of the frame. I'll try to describe that with an image. I depict some text on the bottle in red. For example, the algorithm finds a matching point pair (green) at the bottom of the captured round object. Note that the point moves not only to the right, but diagonally up too. The program then calculates the transformation taking those slightly upward-moving points into account, and this gets worse frame by frame.
The recognition of matching image points may also be slightly inaccurate, so extracting only the x translation is even better, because you give the algorithm "a clue" about the actual situation you have. That makes it less applicable to other conditions, but in your case it improves the result a lot.
I also filter out some incorrect results with the if transformation[0, 2] < 0 check (the bottle can rotate in only one direction, and the code won't work if that value is negative anyway).

Python SimpleBlobDetector.detect() is very slow with big images and can crash

I program in Python 3.x and use the OpenCV SimpleBlobDetector to retrieve blob key points. This works very well. However, when it comes to big images - really big, e.g. 16384x16384 pixels - processing takes a very long time. Here is the code:
...
if blur_sigma > 0:
    img_blurred = cv2.GaussianBlur(img, (-1, -1), blur_sigma)
else:
    img_blurred = img

img_hsv = cv2.cvtColor(img_blurred, cv2.COLOR_BGR2HSV)
img_filtered = cv2.inRange(img_hsv, tuple(lower_bounds), tuple(upper_bounds))

ksize = 2 * morph_size + 1
morph_element = cv2.getStructuringElement(morph_element, (ksize, ksize))
img_morphed = cv2.morphologyEx(img_filtered,
                               morph_operator,
                               morph_element,
                               borderType=cv2.BORDER_REFLECT)

blob_params = cv2.SimpleBlobDetector_Params()
blob_params.filterByInertia = False
blob_params.minConvexity = 0.0
blob_detector = cv2.SimpleBlobDetector_create(blob_params)
key_points = blob_detector.detect(img_morphed)
...
Additionally, it happened several times that during execution an assertion was thrown in the last line, which directed me here:
cv2.error: D:\Build\OpenCV\opencv-3.2.0\modules\core\src\matrix.cpp:433: error: (-215) u != 0 in function cv::Mat::create
In there, an allocation fails.
So my questions are:
1) Is this kind of image size an edge case where the detection simply can no longer run in reasonable time?
2) Is the error also related to the image size?
3) Is there a way to improve detection with additional steps beforehand?
Thanks!

Faster method for adjusting PIL pixel values

I'm writing a script to chroma key (green screen) and composite some videos using Python and PIL (pillow). I can key the 720p images, but there's some leftover green spill. That's understandable, so I'm writing a routine to remove the spill... However, I'm struggling with how long it's taking. I can probably get better speeds using numpy tricks, but I'm not that familiar with them. Any ideas?
Here's my despill routine. It takes a PIL image and a sensitivity number but I've been leaving that at 1 so far...it's been working well. I'm coming in at just over 4 seconds for a 720p frame to remove this spill. For comparison, the chroma key routine runs in about 2 seconds per frame.
def despill(img, sensitivity=1):
    """
    Blue limits green.
    """
    start = time.time()
    print '\t[*] Starting despill'
    width, height = img.size
    num_channels = len(img.getbands())
    out = Image.new("RGBA", img.size, color=0)
    for j in range(height):
        for i in range(width):
            #r,g,b,a = data[j,i]
            r, g, b, a = img.getpixel((i,j))
            if g > (b*sensitivity):
                out_g = (b*sensitivity)
            else:
                out_g = g
            # end if
            out.putpixel((i,j), (r, out_g, b, a))
        # end for
    # end for
    out.show()
    print '\t[+] done.'
    print '\t[!] Took: %0.1f seconds' % (time.time()-start)
    exit()
    return out
# end despill
Instead of putpixel, I tried to write the output pixel values to a numpy array then convert the array to a PIL image, but that was averaging just over 5 seconds...so this was faster somehow. I know putpixel isn't the snappiest option but I'm at a loss...
putpixel is slow, and loops like that are even slower, since they are run by the Python interpreter, which is slow as hell. The usual solution is to immediately convert the image to a numpy array and solve the problem with vectorized operations on it, which run in heavily optimized C code. In your case I would do something like:
import numpy as np
from PIL import Image

arr = np.array(img)
g = arr[:,:,1]
bs = arr[:,:,2]*sensitivity
cond = g > bs
arr[:,:,1] = cond*bs + (~cond)*g
out = Image.fromarray(arr)
(it may not be correct and I'm sure it can be optimized way better, this is just a sketch)
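For what it's worth, the branch in that sketch is just a per-pixel minimum, so (assuming sensitivity stays at 1, to avoid uint8 overflow when scaling the blue channel) it can be collapsed further:

arr = np.array(img)
# g > b*sensitivity -> clamp green at b*sensitivity, i.e. an elementwise minimum
arr[:, :, 1] = np.minimum(arr[:, :, 1], arr[:, :, 2] * sensitivity)
out = Image.fromarray(arr)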
