I'm writing a script to chroma key (green screen) and composite some videos using Python and PIL (Pillow). I can key the 720p images, but there's some leftover green spill. That's understandable, and I'm writing a routine to remove that spill... however, I'm struggling with how long it's taking. I can probably get better speeds using numpy tricks, but I'm not that familiar with it. Any ideas?
Here's my despill routine. It takes a PIL image and a sensitivity number but I've been leaving that at 1 so far...it's been working well. I'm coming in at just over 4 seconds for a 720p frame to remove this spill. For comparison, the chroma key routine runs in about 2 seconds per frame.
import time
from PIL import Image

def despill(img, sensitivity=1):
    """
    Blue limits green.
    """
    start = time.time()
    print '\t[*] Starting despill'
    width, height = img.size
    num_channels = len(img.getbands())
    out = Image.new("RGBA", img.size, color=0)
    for j in range(height):
        for i in range(width):
            #r,g,b,a = data[j,i]
            r, g, b, a = img.getpixel((i, j))
            if g > (b * sensitivity):
                out_g = (b * sensitivity)
            else:
                out_g = g
            # end if
            out.putpixel((i, j), (r, out_g, b, a))
        # end for
    # end for
    out.show()
    print '\t[+] done.'
    print '\t[!] Took: %0.1f seconds' % (time.time() - start)
    exit()  # note: stops the script here, so the return below is never reached
    return out
# end despill
Instead of putpixel, I tried to write the output pixel values to a numpy array then convert the array to a PIL image, but that was averaging just over 5 seconds...so this was faster somehow. I know putpixel isn't the snappiest option but I'm at a loss...
putpixel is slow, and loops like that are even slower, since they are run by the Python interpreter, which is slow as hell. The usual solution is to immediately convert the image to a numpy array and solve the problem with vectorized operations on it, which run in heavily optimized C code. In your case I would do something like:
import numpy as np
from PIL import Image

arr = np.array(img)            # H x W x 4 uint8 array for an RGBA image
g = arr[:, :, 1]
bs = arr[:, :, 2] * sensitivity
cond = g > bs
arr[:, :, 1] = cond * bs + (~cond) * g
out = Image.fromarray(arr)
(it may not be correct and I'm sure it can be optimized way better, this is just a sketch)
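For what it's worth, the same idea can be written a bit more compactly with np.minimum (a sketch, assuming an RGBA uint8 input; the cast to uint16 is just to avoid overflow if sensitivity ever goes above 1):
import numpy as np
from PIL import Image

def despill_np(img, sensitivity=1):
    arr = np.array(img)                                    # H x W x 4 uint8
    limit = arr[:, :, 2].astype(np.uint16) * sensitivity   # blue * sensitivity
    # clamp green to the blue-based limit
    arr[:, :, 1] = np.minimum(arr[:, :, 1], limit).astype(np.uint8)
    return Image.fromarray(arr)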
EDIT: Code is working now, thanks to Mark and zephyr. zephyr also has two alternate working solutions below.
I want to divide blend two images with PIL. I found ImageChops.multiply(image1, image2) but I couldn't find a similar divide(image, image2) function.
Divide Blend Mode Explained (I used the first two images here as my test sources.)
Is there a built-in divide blend function that I missed (PIL or otherwise)?
My test code below runs and is getting close to what I'm looking for. The resulting image output is similar to the divide blend example image here: Divide Blend Mode Explained.
Is there a more efficient way to do this divide blend operation (less steps and faster)? At first, I tried using lambda functions in Image.eval and ImageMath.eval to check for black pixels and flip them to white during the division process, but I couldn't get either to produce the correct result.
EDIT: Fixed code and shortened thanks to Mark and zephyr. The resulting image output matches the output from zephyr's numpy and scipy solutions below.
# PIL Divide Blend test
import Image, os, ImageMath
imgA = Image.open('01background.jpg')
imgA.load()
imgB = Image.open('02testgray.jpg')
imgB.load()
# split RGB images into 3 channels
rA, gA, bA = imgA.split()
rB, gB, bB = imgB.split()
# divide each channel (image1/image2)
rTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=rA, b=rB).convert('L')
gTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=gA, b=gB).convert('L')
bTmp = ImageMath.eval("int(a/((float(b)+1)/256))", a=bA, b=bB).convert('L')
# merge channels into RGB image
imgOut = Image.merge("RGB", (rTmp, gTmp, bTmp))
imgOut.save('PILdiv0.png', 'PNG')
os.system('start PILdiv0.png')
You are asking:
Is there a more efficient way to do this divide blend operation (less steps and faster)?
You could also use the Python package blend_modes. It is written with vectorized numpy math and is generally fast. Install it via pip install blend_modes. I have written the commands in a more verbose way to improve readability; they could be chained to make this shorter (see the sketch after the code below). Use blend_modes like this to divide your images:
from PIL import Image
import numpy
import os
from blend_modes import blend_modes
# Load images
imgA = Image.open('01background.jpg')
imgA = numpy.array(imgA)
# append alpha channel
imgA = numpy.dstack((imgA, numpy.ones((imgA.shape[0], imgA.shape[1], 1))*255))
imgA = imgA.astype(float)
imgB = Image.open('02testgray.jpg')
imgB = numpy.array(imgB)
# append alpha channel
imgB = numpy.dstack((imgB, numpy.ones((imgB.shape[0], imgB.shape[1], 1))*255))
imgB = imgB.astype(float)
# Divide images
imgOut = blend_modes.divide(imgA, imgB, 1.0)
# Save images
imgOut = numpy.uint8(imgOut)
imgOut = Image.fromarray(imgOut)
imgOut.save('PILdiv0.png', 'PNG')
os.system('start PILdiv0.png')
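As mentioned above, the same steps can be chained to make this shorter; a sketch using a small hypothetical helper for the load-and-append-alpha part:
from PIL import Image
import numpy
from blend_modes import blend_modes

def load_rgba_float(path):
    # hypothetical helper: load an image and append a fully opaque alpha channel
    rgb = numpy.array(Image.open(path)).astype(float)
    alpha = numpy.full(rgb.shape[:2] + (1,), 255.0)
    return numpy.dstack((rgb, alpha))

imgOut = blend_modes.divide(load_rgba_float('01background.jpg'),
                            load_rgba_float('02testgray.jpg'), 1.0)
Image.fromarray(numpy.uint8(imgOut)).save('PILdiv0.png', 'PNG')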
Be aware that for this to work, both images need to have the same dimensions, e.g. imgA.shape == (240,320,3) and imgB.shape == (240,320,3).
There is a mathematical definition for the divide function here:
http://www.linuxtopia.org/online_books/graphics_tools/gimp_advanced_guide/gimp_guide_node55_002.html
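In short, that definition boils down to a per-channel, 8-bit formula, which is what both snippets below compute: out = min(255, 256 * a / (b + 1)), where a is the image being divided and b is the divisor.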
Here's an implementation with scipy/matplotlib:
import numpy as np
import scipy.misc as mpl
a = mpl.imread('01background.jpg')
b = mpl.imread('02testgray.jpg')
c = a/((b.astype('float')+1)/256)
d = c*(c <= 255)+255*np.ones(np.shape(c))*(c > 255)   # clamp anything above 255
e = d.astype('uint8')
mpl.imshow(e)
mpl.imsave('output.png', e)
If you don't want to use matplotlib, you can do it like this (I assume you have numpy):
import Image                              # PIL, as in the question
from numpy import asarray, ones, shape

imgA = Image.open('01background.jpg')
imgA.load()
imgB = Image.open('02testgray.jpg')
imgB.load()
a = asarray(imgA)
b = asarray(imgB)
c = a/((b.astype('float')+1)/256)
d = c*(c <= 255)+255*ones(shape(c))*(c > 255)   # clamp anything above 255
e = d.astype('uint8')
imgOut = Image.fromarray(e)
imgOut.save('PILdiv0.png', 'PNG')
The problem you're having is when you have a zero in image B - it causes a divide by zero. If you convert all of those values to one instead I think you'll get the desired result. That will eliminate the need to check for zeros and fix them in the result.
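A minimal sketch of that approach (assuming numpy arrays a and b like in the snippets above, a being the numerator image and b the denominator):
import numpy as np

b_safe = np.where(b == 0, 1, b).astype('float')   # turn zeros into ones so the divide is defined
c = 255.0 * a / b_safe                            # divide blend: a / (b/255)
e = np.clip(c, 0, 255).astype('uint8')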
I have written some code to apply filters to an image using kernel convolution. Currently, it takes quite a long time, approximately 30 seconds for a 400x400 image. I understand that box blurs are much faster than Gaussian blurs. However, when I change my kernel to a box blur it seems to take as much time as the Gaussian blur. Any ideas?
import cv2
import numpy as np

img = cv2.imread('test.jpg')
img2 = cv2.imread('test.jpg')
height, width, channels = img.shape

GB3 = np.array([[1,2,1], [2,4,2], [1,2,1]])
GB5 = np.array([[1,4,6,4,1], [4,16,24,16,4], [6,24,36,24,6], [4,16,24,16,4], [1,4,6,4,1]])
BB = np.array([[1,1,1], [1,1,1], [1,1,1]])
kernel = BB

#initialise
kernel_sum = 1
filtered_sum_r = 0
filtered_sum_g = 0
filtered_sum_b = 0

for i in range(kernel.shape[0]):
    for j in range(kernel.shape[1]):
        p = kernel[i][j]
        kernel_sum += p

for x in range(1, width-1):
    for y in range(1, height-1):
        for i in range(kernel.shape[0]):
            for j in range(kernel.shape[1]):
                filtered_sum_b += img[y-1+j, x-1+i, 0]*kernel[i][j]
                filtered_sum_g += img[y-1+j, x-1+i, 1]*kernel[i][j]
                filtered_sum_r += img[y-1+j, x-1+i, 2]*kernel[i][j]

        new_pixel_r = filtered_sum_r/kernel_sum
        new_pixel_g = filtered_sum_g/kernel_sum
        new_pixel_b = filtered_sum_b/kernel_sum

        if new_pixel_r > 255:
            new_pixel_r = 255
        elif new_pixel_r < 0:
            new_pixel_r = 0
        if new_pixel_g > 255:
            new_pixel_g = 255
        elif new_pixel_g < 0:
            new_pixel_g = 0
        if new_pixel_b > 255:
            new_pixel_b = 255
        elif new_pixel_b < 0:
            new_pixel_b = 0

        img2[y, x, 0] = new_pixel_b
        img2[y, x, 1] = new_pixel_g
        img2[y, x, 2] = new_pixel_r

        filtered_sum_r = 0
        filtered_sum_g = 0
        filtered_sum_b = 0

#print(kernel_sum)

scale = 2
img_big = cv2.resize(img, (0,0), fx=scale, fy=scale)
img2_big = cv2.resize(img2, (0,0), fx=scale, fy=scale)

cv2.imshow('original', img_big)
cv2.imshow('processed', img2_big)
cv2.waitKey(0)
cv2.destroyAllWindows()
You are using Python loops. Those will always be orders of magnitude slower than optimized binary code. Whenever possible, use library functions, i.e. numpy and OpenCV, or write your critical code as compilable Cython.
Your code's access pattern is suboptimal. You should move along rows in the inner loop (for y: ... for x: ...) because that's how the image is stored. The reason is how your CPU's cache is used: in row-major storage, a cache line contains several pixels of a row, and if you run along columns you only use that cache line once before needing another.
Your code doesn't make use of the property that both types of filter are "separable": a separable kernel can be applied as two 1-D passes (rows, then columns), which cuts the work per pixel from k*k to 2*k multiplications.
Convolution can also be expressed as an elementwise multiplication in the frequency domain (DFT, multiply, inverse DFT), which is the usual way to perform convolution when the kernel is large.
Use OpenCV's filter2D function for your convolutions.
As for box blur vs. Gaussian, the only difference is "interesting" weights vs. no weights (all equal), which amounts to a few multiplications more or fewer. Once the code is optimized, its execution time can be dominated by the time needed to transfer the data from RAM to the CPU; that applies to optimized code, not to pure Python loops.
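A minimal sketch of the library route, reusing the 3x3 Gaussian kernel from the question (cv2.filter2D technically computes correlation rather than convolution, but these kernels are symmetric so the result is the same):
import cv2
import numpy as np

img = cv2.imread('test.jpg')

GB3 = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float32)
GB3 /= GB3.sum()                          # normalize so brightness is preserved

gauss = cv2.filter2D(img, -1, GB3)        # general 2D filtering with an arbitrary kernel

# dedicated, separable implementations are usually faster still
gauss2 = cv2.GaussianBlur(img, (3, 3), 0)
box = cv2.blur(img, (3, 3))               # box blur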
I'm trying to transform an image containing colored symbols into pixel art as featured on the right (see image below), where each colored symbol (taking up multiple pixels) would be changed into one pixel of the symbol's color.
Example of what I'm trying to achieve
So far I've written a pretty naive algorithm that just loops through all the pixels, and is pretty sluggish. I believe I could make it faster, for instance using native numpy operations, but I've been unable to find how. Any tips?
(I also started by trying to simply resize the image, but couldn't find a resampling algorithm that would make it work).
def resize(img, new_width):
    width, height = img.shape[:2]
    new_height = height*new_width//width
    new_image = np.zeros((new_width, new_height, 4), dtype=np.uint8)
    x_ratio, y_ratio = width//new_width, height//new_height
    for i in range(new_height):
        for j in range(new_width):
            sub_image = img[i*y_ratio:(i+1)*y_ratio, j*x_ratio:(j+1)*x_ratio]
            found = False
            for row in sub_image:
                for pixel in row:
                    if any(pixel != [0,0,0,0]):
                        new_image[i,j] = pixel
                        break
                if found:
                    break
    return new_image
A larger example
import cv2
import numpy as np

img = cv2.imread('zjZA8.png')
h, w, c = img.shape
new_img = np.zeros((h//7, w//7, c), dtype='uint8')
for k in range(c):
    for i in range(h//7):
        for j in range(w//7):
            new_img[i,j,k] = np.max(img[7*i:7*i+7, 7*j:7*j+7, k])
cv2.imwrite('out3.png', new_img)
Left: result with np.mean; center: source image; right: result with np.max.
Please test this code:
img = cv2.imread('zjZA8.png')
h, w, c = img.shape
bgr = [0, 0, 0]
bgr[0], bgr[1], bgr[2] = cv2.split(img)
for k in range(3):
    bgr[k].shape = (h*w//7, 7)
    bgr[k] = np.mean(bgr[k], axis=1)
    bgr[k].shape = (h//7, 7, w//7)
    bgr[k] = np.mean(bgr[k], axis=1)
    bgr[k].shape = (h//7, w//7)
    bgr[k] = np.uint8(bgr[k])
out = cv2.merge((bgr[0], bgr[1], bgr[2]))
cv2.imshow('mean_image', out)
Modifying my code to use the native np.nonzero operation did the trick!
I went down from ~8s to ~0.32s on a 1645x1645 image (with new_width=235).
I also tried using numba on top of that, but the overhead ends up making it slower.
def resize(img, new_width):
    height, width = img.shape[:2]
    new_height = height*new_width//width
    new_image = np.ones((new_height, new_width, 3), dtype=np.uint8)
    x_ratio, y_ratio = width//new_width, height//new_height
    for i in range(new_height):
        for j in range(new_width):
            sub_image = img[i*y_ratio:(i+1)*y_ratio, j*x_ratio:(j+1)*x_ratio]
            non_zero = np.nonzero(sub_image)
            if non_zero[0].size > 0:
                new_image[i, j] = sub_image[non_zero[0][0], non_zero[1][0]][:3]
    return new_image
I'm trying to get an array with the alpha channel value of a sprite image, using Pyglet library.
I wrote this code which actually works:
mask = []
for x in range(image.width):
    mask.append([])
    for y in range(image.height):
        mask[x].append(ord(image.get_region(x, y, 1, 1).get_image_data().get_data("RGBA", 4)[3]))
return mask
The only problem is that it's really, really slow; I guess I'm using the wrong function to retrieve the alpha channel. What can I do to make it faster?
UPDATE
I found a solution with the following code, which is faster:
rawimage = image.get_image_data()
format = 'RGBA'
pitch = rawimage.width * len(format)
pixels = rawimage.get_data(format, pitch)
data = unpack("%iB" % (4 * image.width * image.height), pixels)
mask = data[3::4]
return mask
I don't know pyglet, but I'm guessing the performance issue is related to the many queries for only a single pixel. Instead you want to get the entire image from the GPU in just one call, including the colour and alpha values, and then extract just the alpha. I'd also use struct.unpack instead of ord().
Note: This code is untested and purely based on the example in the question. There is probably a better way.
from struct import unpack
...
region = image.get_region(0, 0, image.width, image.height)
packed_data = region.get_image_data().get_data("RGBA", image.width * 4)  # pitch = bytes per row
data = unpack("%iB" % (4 * image.width * image.height), packed_data)
mask = data[3::4]
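If numpy is already available, the unpack/stride step can also be done directly on the packed buffer (again untested, same assumptions as the snippet above):
import numpy as np

arr = np.frombuffer(packed_data, dtype=np.uint8).reshape(image.height, image.width, 4)
mask = arr[:, :, 3]   # the alpha channel as a 2-D array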
I don't think it'd be worth it, but if you really didn't want to drag colour back from the GPU, you could explore copying the alpha to another texture first (there might even be a way to get GL to unpack it as a format conversion or reinterpretation).
I have many skeletonized images like this:
How can I detect a cycle (a loop) in the skeleton?
Are there "special" functions that do this or should I implement it as a graph?
If the graph approach is the only option, can the Python graph library NetworkX help me?
You can exploit the topology of the skeleton. A cycle encloses a hole, while a skeleton without cycles has none, so we can use scipy.ndimage to fill any holes and compare. This isn't the fastest method, but it's extremely easy to code.
import scipy.misc, scipy.ndimage
# Read the image
img = scipy.misc.imread("Skel.png")
# Retain only the skeleton
img[img!=255] = 0
img = img.astype(bool)
# Fill the holes
img2 = scipy.ndimage.binary_fill_holes(img)
# Compare the two, an image without cycles will have no holes
print "Cycles in image: ", ~(img == img2).all()
# As a test break the cycles
img3 = img.copy()
img3[0:200, 0:200] = 0
img4 = scipy.ndimage.binary_fill_holes(img3)
# Compare the two, an image without cycles will have no holes
print "Cycles in image: ", ~(img3 == img4).all()
I've used your "B" picture as an example. The first two images are the original and the filled version which detects a cycle. In the second version, I've broken the cycle and nothing gets filled, thus the two images are the same.
First, let's build an image of the letter B with PIL:
import Image, ImageDraw, ImageFont
import matplotlib.pyplot as plt
image = Image.new("RGBA", (600,150), (255,255,255))
draw = ImageDraw.Draw(image)
fontsize = 150
font = ImageFont.truetype("/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf", fontsize)
txt = 'B'
draw.text((30, 5), txt, (0,0,0), font=font)
img = image.resize((188,45), Image.ANTIALIAS)
print type(img)
plt.imshow(img)
You may find a better way to do that, particularly with the path to the fonts. It would be better to load an image instead of generating it. Anyway, we now have something to work on:
Now, the real part:
import mahotas as mh
import numpy as np

img = np.array(img)
im = img[:,0:50,0]
im = im < 128
skel = mh.thin(im)
noholes = mh.morph.close_holes(skel)
plt.subplot(311)
plt.imshow(im)
plt.subplot(312)
plt.imshow(skel)
plt.subplot(313)
cskel = np.logical_not(skel)
choles = np.logical_not(noholes)
holes = np.logical_and(cskel,noholes)
lab, n = mh.label(holes)
print 'B has %s holes'% str(n)
plt.imshow(lab)
And we have in the console (ipython):
B has 2 holes
Converting your skeleton image to a graph representation is not trivial, and I don't know of any tools to do that for you.
One way to do it on the bitmap would be to use a flood fill, like the paint bucket in Photoshop. If you start a flood fill of the image, the entire background will get filled if there are no cycles. If the fill doesn't reach the entire background, then you've found a cycle. Robustly finding all the cycles could require filling multiple times.
This is likely to be very slow to execute, but probably much faster to code than a technique where you trace the skeleton into a graph data structure.
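A rough sketch of that flood-fill idea using scipy.ndimage instead of tracing by hand: label the connected background regions and check whether there is more than one (assuming a boolean skeleton array skel, like img in the scipy answer above):
import numpy as np
import scipy.ndimage

# skel: boolean array, True on the skeleton pixels
background = ~skel
lab, n_regions = scipy.ndimage.label(background)

# with no cycles the background is a single connected region;
# every enclosed area (cycle) adds an extra region
print "Cycles in image: ", n_regions > 1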