Use multiprocessing and multithreading to print on a image

Use multiprocessing and multithreading to print on a image - python

I am facing a problem with multiprocessing and threading my program to fast the process.
My program take a list of point into an excel and create a gray scale image from this points.
The problem is I have a million points and it takes around 1 min to process. I am sure, there is a way to speed up the processing.
Here is the code without threading:
import os
import math
import json
import time
import pandas as pd
from PIL import Image
# FUNCTIONS
def CreateDataFrame(path, columns):
print('DataFrame creation ... ', end='')
with open(path) as excel_file:
lines = excel_file.read().splitlines()
np_array = []
for line in lines:
np_array.append(list(map(float, line.split(' '))))
print('done')
return pd.DataFrame(np_array, columns=columns)
def GetOffsets(df):
print('Getting offsets ... ', end='')
dict = {}
for c in df.columns:
dict[c] = min(df[c])
print('done')
return dict
def GetMaximums(df, offsets):
print('Getting Maximums ... ', end='')
max_dict = {}
for c in df.columns:
max_dict[c] = max(df[c]) - offsets[c]
print('done')
return max_dict
def CreateImage(maximums, scale = 1):
return Image.new('RGB', (int(maximums['x'] * scale) + 1, int(maximums['z'] * scale) + 1), color='black')
# MAIN
columns = ['x', 'z', 'y']
scale = 1
df = CreateDataFrame('raw data.csv', columns)
offsets = GetOffsets(df)
maximums = GetMaximums(df, offsets)
img = CreateImage(maximums, scale)
pixels = img.load()
print('Printing ... ', end='')
for i in range(len(df)):
line = df.iloc[i]
color = int(255 * (line['y'] - offsets['y']) / maximums['y'])
pixels[int((maximums['x'] - (line['x'] - offsets['x'])) * scale), int((line['z'] - offsets['z']) * scale)] = (color, color, color)
print('done')
img.save(f'terrain {scale}.png')
If you are interested, this is how it works. First, I create a dataframe from the excel and assign columns. Then, I get the minimum values of each columns to get my offset values. Once done, I do the same thing but with maximums. Thanks to those maximums, I can create an image with the maximum x and y values. Finally, I iterate into my dataframe to get the x and y for the position and gray scale the y value.
Now I am trying to multiprocess/thread it. To do that, I added this code:
import concurrent.futures
def Process(id, start, end):
start_time = time.time()
for i in range(start, end):
line = df.iloc[i]
color = int(255 * (line['y'] - offsets['y']) / maximums['y'])
pixels[int((maximums['x'] - (line['x'] - offsets['x'])) * scale), int((line['z'] - offsets['z']) * scale)] = (color, color, color)
print(f"Thread {id} ends: {time.time() - start_time}s")
nb_thread = 12
df_size = len(df)
nb_full_thread = df_size // nb_thread
thread_rest = df_size % nb_thread
with concurrent.futures.ThreadPoolExecutor(max_workers=nb_thread) as executor:
for i in range(nb_thread):
executor.submit(Process, i, i*nb_full_thread, (i+1) * nb_full_thread - 1)
print(f"Process {i} launched")
img.save(f'threading terrain {scale}.png')
Some explanations, nb_thread is the number of threads I want to create. Then, I get the number of line in my dataframe (df_size). This is usefull to determine how many lines, a thread will manage. Once done, I create my threads in order to they process the image and save the image.
With ThreadPoolExecutor, the program works but it takes the same amount of time as the previous version. And with ProcessPoolExecutor, the forloop in the Process function not orks as expected, it stops at the first value of the range.
I don't understand those behaviours, that is why I am turning to you.
I hope, it's clear enough, do not hesitate if you have a question.

Related

Looking for a more efficient way to do the array multiplication in this loop

I have a script that is taking a bit long to run, so I was trying to look through and speed it up where I can. I found a part that takes ~10 minutes or so and I feel like it could be a bit more efficient, but I might be wrong.
Basically, I am trying to multiply one array, W, with another array, res. W is an NxN diagonal matrix with 0's on the off diagonals and real numbers on the diagonal and I have Y occurrences such that the full size of the array is NxNxY.
res is a NxY array.
I want to multiply W*res for each Y. So a single instance would be like W[:,:,1]*res[:,1].
In the simplest case this could be a for loop, but that was super inefficient, so I figured this out
Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
This is better, but this is all being done in a different loop that ends up repeated T times.
I was hoping to improve this by saving each instance of W and res, so they end up being NxNxYxT and NxYxT respectively and doing it all in a single operation. Which I think I accomplished by this:
W_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
But this seems to take even longer.
I am hoping someone can see something I am missing.
I attempted to write a script to replicate this. It isn't perfect, but I think it demonstrates the difference. W and res are not randomly generated, so improving those parts won't help (which are particularly slow right now so I timed the relevant parts separately). It seems like the 2nd case is faster for smaller sizes, but pretty quickly the 1st case becomes faster.
import numpy as np
import time
gsize = 10000 #real size is 40000
t_range = np.arange(0,100) # real size is 200
n_range = np.arange(0,5) # real size is 65
I = np.identity(len(n_range))
test = np.zeros([gsize,len(t_range)]);
W = np.zeros([len(n_range),len(n_range),gsize])
tr = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
W[:,:,i] = I*tr[:,i]
res = np.random.rand(len(n_range),gsize)
# first case
tot_time = 0
for t in t_range:
#Generating random inputs for example, this are not normally random.
res = np.random.rand(len(n_range),gsize)
W = np.zeros([len(n_range),len(n_range),gsize])
tr = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
W[:,:,i] = I*tr[:,i]
st = time.time()
Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
ss = (np.sum(np.square(Wres), axis=0))
test[:,t] = ss
tot_time = tot_time + time.time() - st
print(tot_time) # total time for first case
# 2nd method
W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float)
res_T = np.zeros((len(n_range),gsize,len(t_range)), float)
for t in t_range:
res = np.random.rand(len(n_range),gsize)
W = np.zeros([len(n_range),len(n_range),gsize])
tr = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
W[:,:,i] = I*tr[:,i]
res_T[:,:,t] = res
W_T[:,:,:,t] = W
st = time.time()
Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
test = np.sum(Wres_T, axis = 0)
print(time.time() - st) #time for second case
I'd appreciate any help/ideas.
Thanks

Given that you know some of your matrix is just a diagonal, you can save a lot of time only using the diagonal part.
Your original methods take:
0.05394101142883301
0.04986453056335449
On my computer.
Here's a couple of tries at it. Make sure you verify there's no mistakes in the values: it looks ok to me though.
Replacing just the Wres call in the first call, with Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)
tot_time = 0
for t in t_range:
#Generating random inputs for example, this are not normally random.
res = np.random.rand(len(n_range),gsize)
W = np.zeros([len(n_range),len(n_range),gsize])
tr = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
W[:,:,i] = I*tr[:,i]
st = time.time()
#Wres = np.squeeze(np.matmul(W.transpose(2, 0, 1), res.T[..., None])).T
Wres = (np.diagonal(W, axis1=0, axis2=1).T*res)
ss = (np.sum(np.square(Wres), axis=0))
test[:,t] = ss
tot_time = tot_time + time.time() - st
print(tot_time) # total time for first case
Gives me:
0.01994800567626953
Modifying the second one in a similar manner: Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1)
# 2nd method
W_T = np.zeros((len(n_range),len(n_range),gsize,len(t_range)), float)
res_T = np.zeros((len(n_range),gsize,len(t_range)), float)
for t in t_range:
res = np.random.rand(len(n_range),gsize)
W = np.zeros([len(n_range),len(n_range),gsize])
tr = np.random.rand(len(n_range),gsize)
for i in range(0,tr.shape[1]):
W[:,:,i] = I*tr[:,i]
res_T[:,:,t] = res
W_T[:,:,:,t] = W
st = time.time()
#Wres_T = np.squeeze(np.matmul(W_T.transpose(2,3,0,1), res_T.transpose(1,2,0)[..., None])).T
Wres_T = np.diagonal(W_T).T * res_T.transpose(0,2,1)
test = np.sum(Wres_T, axis = 0)
print(time.time() - st) #time for second case
Gives me:
0.011936426162719727
So, we are at between 1/4 to 1/5 of the original time.

Quick pixel manipulation with Pillow and/or NumPy

I'm trying to improve the speed of my image manipulation as it's been too slow for actual use.
What I need to do is apply a complex transformation on the colour of every pixel on an image. The manipulation is basically apply a vector transform like T(r, g, b, a) => (r * x, g * x, b * y, a) or in layman's terms, it's a multiplication of Red and Green values by a constant, a different multiplication for Blue and keep Alpha. But I also need to manipulate it differently if the RGB colour falls under some specific colours, in those cases they must follow a dictionary/transformation table where RGB => newRGB again keeping alpha.
The algorithm would be:
for each pixel in image:
if pixel[r, g, b] in special:
return special[pixel[r, g, b]] + pixel[a]
else:
return T(pixel)
It's simple but speed has been sub-optimal. I believe there's some way using numpy vectors, but I could not find how.
Important details about the implementation:
I don't care about the original buffer/image (manipulation can be in place)
I can use wxPython, Pillow and NumPy
Order or dimension of the array is not important as long as the buffer keeps the length
The buffer is obtained from a wxPython Bitmap and special and (RG|B)_pal are transformation tables, the end result will become a wxPython Bitmap too. They're obtained like these:
# buffer
bitmap = wx.Bitmap # it's valid wxBitmap here, this is just to let you know it exists
buff = bytearray(bitmap.GetWidth() * bitmap.GetHeight() * 4)
bitmap.CopyToBuffer(buff, wx.BitmapBufferFormat_RGBA)
self.RG_mult= 0.75
self.B_mult = 0.83
self.RG_pal = []
self.B_pal = []
for i in range(0, 256):
self.RG_pal.append(int(i * self.RG_mult))
self.B_pal.append(int(i * self.B_mult))
self.special = {
# RGB: new_RGB
# Implementation specific for the fastest access
# with buffer keys are 24bit numbers, with PIL keys are tuples
}
Implementations I tried include direct buffer manipulation:
for x in range(0, bitmap.GetWidth() * bitmap.GetHeight()):
index = x * 4
r = buf[index]
g = buf[index + 1]
b = buf[index + 2]
rgb = buf[index:index + 3]
if rgb in self.special:
special = self.special[rgb]
buf[index] = special[0]
buf[index + 1] = special[1]
buf[index + 2] = special[2]
else:
buf[index] = self.RG_pal[r]
buf[index + 1] = self.RG_pal[g]
buf[index + 2] = self.B_pal[b]
Use Pillow with getdata():
pil = Image.frombuffer("RGBA", (bitmap.GetWidth(), bitmap.GetHeight()), buf)
pil_buf = []
for colour in pil.getdata():
colour_idx = colour[0:3]
if (colour_idx in self.special):
special = self.special[colour_idx]
pil_buf.append((
special[0],
special[1],
special[2],
colour[3],
))
else:
pil_buf.append((
self.RG_pal[colour[0]],
self.RG_pal[colour[1]],
self.B_pal[colour[2]],
colour[3],
))
pil.putdata(pil_buf)
buf = pil.tobytes()
Pillow with point() and getdata() (fastest I achieved, more than twice times faster than others)
pil = Image.frombuffer("RGBA", (bitmap.GetWidth(), bitmap.GetHeight()), buf)
r, g, b, a = pil.split()
r = r.point(lambda r: r * self.RG_mult)
g = g.point(lambda g: g * self.RG_mult)
b = b.point(lambda b: b * self.B_mult)
pil = Image.merge("RGBA", (r, g, b, a))
i = 0
for colour in pil.getdata():
colour_idx = colour[0:3]
if (colour_idx in self.special):
special = self.special[colour_idx]
pil.putpixel(
(i % bitmap.GetWidth(), i // bitmap.GetWidth()),
(
special[0],
special[1],
special[2],
colour[3],
)
)
i += 1
buf = pil.tobytes()
I also tried working with numpy.where but then I could not get it to work. With numpy.apply_along_axis it worked but the performance was terrible. Other tries with numpy I could not access the RGB together, only as separated bands.

Pure Numpy Version
This first optimization relies on the fact, that one probably has way less special colors than pixels. I use numpy to do all the inner loops. This works well with images of up to 1MP. If You have multiple images I'd recommend the parallel approach.
Let's define a test case:
import requests
from io import BytesIO
from PIL import Image
import numpy as np
# Load some image, so we have the same
response = requests.get("https://upload.wikimedia.org/wikipedia/commons/4/41/Rick_Astley_Dallas.jpg")
# Make areas of known color
img = Image.open(BytesIO(response.content)).rotate(10, expand=True).rotate(-10,expand=True, fillcolor=(255,255,255)).convert('RGBA')
print("height: %d, width: %d (%.2f MP)"%(img.height, img.width, img.width*img.height/10e6))
height: 5034, width: 5792 (2.92 MP)
Define our special colors
specials = {
(4,1,6):(255,255,255),
(0, 0, 0):(255, 0, 255),
(255, 255, 255):(0, 255, 0)
}
Algorithm
def transform_map(img, specials, R_factor, G_factor, B_factor):
# Your transform
def transform(x, a):
a *= x
return a.clip(0, 255).astype(np.uint8)
# Convert to array
img_array = np.asarray(img)
# Extract channels
R = img_array.T[0]
G = img_array.T[1]
B = img_array.T[2]
A = img_array.T[3]
# Find Special colors
# First, calculate a uniqe hash
color_hashes = (R + 2**8 * G + 2**16 * B)
# Find inidices of special colors
special_idxs = []
for k, v in specials.items():
key_arr = np.array(list(k))
val_arr = np.array(list(v))
spec_hash = key_arr[0] + 2**8 * key_arr[1] + 2**16 * key_arr[2]
special_idxs.append(
{
'mask': np.where(np.isin(color_hashes, spec_hash)),
'value': val_arr
}
)
# Apply transform to whole image
R = transform(R, R_factor)
G = transform(G, G_factor)
B = transform(B, B_factor)
# Replace values where special colors were found
for idx in special_idxs:
R[idx['mask']] = idx['value'][0]
G[idx['mask']] = idx['value'][1]
B[idx['mask']] = idx['value'][2]
return Image.fromarray(np.array([R,G,B,A]).T, mode='RGBA')
And finally some bench marks on a Intel Core i5-6300U # 2.40GHz
import time
times = []
for i in range(10):
t0 = time.time()
# Test
transform_map(img, specials, 1.2, .9, 1.2)
#
t1 = time.time()
times.append(t1-t0)
np.round(times, 2)
print('average run time: %.2f +/-%.2f'%(np.mean(times), np.std(times)))
average run time: 9.72 +/-0.91
EDIT Parallelization
With the same setup as above, we can get a 2x speed increase on large images. (Small ones are faster without numba)
from numba import njit, prange
from numba.core import types
from numba.typed import Dict
# Map dict of special colors or transform over array of pixel values
#njit(parallel=True, locals={'px_hash': types.uint32})
def check_and_transform(img_array, d, T):
#Save Shape for later
shape = img_array.shape
# Flatten image for 1-d iteration
img_array_flat = img_array.reshape(-1,3).copy()
N = img_array_flat.shape[0]
# Replace or map
for i in prange(N):
px_hash = np.uint32(0)
px_hash += img_array_flat[i,0]
px_hash += types.uint32(2**8) * img_array_flat[i,1]
px_hash += types.uint32(2**16) * img_array_flat[i,2]
try:
img_array_flat[i] = d[px_hash]
except Exception:
img_array_flat[i] = (img_array_flat[i] * T).astype(np.uint8)
# return image
return img_array_flat.reshape(shape)
# Wrapper for function above
def map_or_transform_jit(image: Image, specials: dict, T: np.ndarray):
# assemble numba typed dict
d = Dict.empty(
key_type=types.uint32,
value_type=types.uint8[:],
)
for k, v in specials.items():
k = types.uint32(k[0] + 2**8 * k[1] + 2**16 * k[2])
v = np.array(v, dtype=np.uint8)
d[k] = v
# get rgb channels
img_arr = np.array(img)
rgb = img_arr[:,:,:3].copy()
img_shape = img_arr.shape
# apply map
rgb = check_and_transform(rgb, d, T)
# set color channels
img_arr[:,:,:3] = rgb
return Image.fromarray(img_arr, mode='RGBA')
# Benchmark
import time
times = []
for i in range(10):
t0 = time.time()
# Test
test_img = map_or_transform_jit(img, specials, np.array([1, .5, .5]))
#
t1 = time.time()
times.append(t1-t0)
np.round(times, 2)
print('average run time: %.2f +/- %.2f'%(np.mean(times), np.std(times)))
test_img
average run time: 3.76 +/- 0.08

Multiprocessing on chunks of an image

I have a function that has to loop through individual pixels of an image and calculate some geometry. This function takes a very long time to run (~5 hours on a 24 Megapixel image) but seems like it should be easy to run in parallel on multiple cores. However, I can't for the life of me find a well documented, well explained example of doing something like this using the Multiprocessing package. Here is the code I am running right now as a toy example:
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from skimage import color
import multiprocessing
from multiprocessing import Process
#Some dumb stand in function for this exercise
def dumb_func(image):
ny, nx = image.shape
temp = np.empty_like(image)
for y in range(ny):
for x in range(nx):
temp[y, x] = np.square(image[y, x])
return temp
#Convert image to greyscale
img = color.rgb2gray(misc.ascent())
#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))
#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension
divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
split = np.array_split(init_split[i], divs, axis = 1)
for j in range(divs):
chunked[i, j, :, :] = split[j]
cur +=1
#Pull core count and divide by two to be safe
cores = int(multiprocessing.cpu_count() / 2)
result = np.empty_like(chunked)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
np.arange(0, divs, 1))).T.reshape(-1, 2)
Basically this code loads in an image, converts it to greyscale, makes it bigger, and then chunks it up. The chunked array is of shape (i, j, ny, nx) where i and j are indices that identify the chunk of the image I am working with, and ny,nx describe the size in pixels of each chunk.
Additionally, I am creating an array called idxs that stores all possible indices into the chunked array to pull the chunked images out.
What I want to do is run a function (in this case the dumb_func as an example) over the chunks in parallel and store the results in the results array of the same shape. The way I imagined doing it was to loop over the idxs array and assign processes the chunks belonging to those indexes up to the number of cores, wait for those cores to finish, then feed the cores more processes until finished. I got stuck because I couldn't A) figure out how to access the return value in the function, and B) how to handle a situation where I might have 16 chunks and 5 cores leading to the last iteration only requiring a single process.
How can I go about doing this? I've spent the last 6-7 hours reading about Multiprocessing Pool, Process, Map, Starmap, etc... and can't for the life of me understand how to implement this.
Edit for Reedinationer:
This is my updated code and runs without error. However the new_data array is never updated. I filled it with a value of 100 and at the end of the routine new_data is exactly how it was initialized.
import numpy as np
import matplotlib.pyplot as plt
from scipy import misc
from multiprocessing import Process, JoinableQueue
from time import time
#SOme dumb stand in function for this exercise
def dumb_func(q, new_data):
while True:
index, image = q.get()
temp = image **2
new_data[index[0], index[1], :, :] = temp
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
img = misc.ascent()
#Resize the image
ns = 2048 #Pixel size
img = misc.imresize(img, size = (ns, ns))
#Split the image into equal chunks...not sure how this works for arrays that
#are weird shapes and aren't the same size in each dimension
divs = 4
init_split = np.array_split(img, divs, axis = 0)
side = init_split[0].shape[0]
chunked = np.empty((divs, divs, side, side))
cur = 0
for i in range(divs):
split = np.array_split(init_split[i], divs, axis = 1)
for j in range(divs):
chunked[i, j, :, :] = split[j]
cur +=1
new_data = np.full(chunked.shape, 100)
idxs = np.array(np.meshgrid(np.arange(0, divs, 1),
np.arange(0, divs, 1))).T.reshape(-1, 2)
for i in range(len(idxs)):
q.put((idxs[i], chunked[idxs[i][0], idxs[i][1], :, :]))
print ('starting workers')
worker_count = len(idxs)
processes = []
for i in range(worker_count):
p = Process(target=dumb_func, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
end = time()
print('{:.3f} seconds elapsed'.format(end - start))

I'd do something like this, starting with dependencies:
from multiprocessing import Pool
import numpy as np
from PIL import Image
# and some for testing
from random import random
from time import sleep
first I define a function to divide an image up into "chunks", sort of as you talked about:
def chunkit(ys, xs, blocksize=64):
for y in range(0, ys, blocksize):
yt = (y, min(ys, y + blocksize))
for x in range(0, xs, blocksize):
xt = (x, min(xs, x + blocksize))
yield yt, xt
it's a lazy iterator, so this can go on for a while.
I then define my worker function:
def dumb_func(cc):
(y0,y1), (x0,x1) = cc
# convert to floats for ease of processing
chunk = image[y0:y1,x0:x1] / 255.
# random slow down for testing
# sleep(random() ** 6)
res = chunk ** 2
# convert back to bytes for efficiency
return cc, (res * 255).astype(np.uint8)
I make sure the source array stays as close to original format as possible for efficiency and send it back in the same format (this might take some fiddling, if you're dealing with other pixel formats obviously).
then I put it together:
if __name__ == '__main__':
source = Image.open('tmp.jpeg')
image = np.asarray(source)
print("loaded", image.shape, image.dtype)
with Pool() as pool:
resit = pool.imap_unordered(
dumb_func, chunkit(*image.shape[:2]))
output = np.empty_like(image)
for cc, res in resit:
(y0,y1), (x0,x1) = cc
output[y0:y1,x0:x1] = res
im = Image.fromarray(output, 'RGB')
im.save('out.jpeg')
this churns through a 15Mpixel image in a couple of seconds, with most of that spent loading/saving the image. it could probably be a lot more clever with array strides and cache friendliness, but hope that helps!
note: I think this code relies on CPython Unix style process forking semantics to make sure the image is shared between processes efficiently. not sure what would happen if you ran it on something else

I've been working on code for basically this same thing. Right now the goal is just to replace white pixels with transparent ones, but it seems to replace the entire image so there is a bug somewhere...It doesn't get an error within the multiprocessing module anymore though, so maybe it could serve as an example of how to load a Queue and then have your worker processes work on it!
from PIL import Image
from multiprocessing import Process, JoinableQueue
from threading import Thread
from time import time
def worker_function(q, new_data):
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
my_image = Image.open('InputImage.jpg')
my_image = my_image.convert('RGBA')
datas = list(my_image.getdata())
new_data = [0] * len(datas) # make a blank array the size of our image to fill later
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
print('starting workers')
worker_count = 50
processes = []
for i in range(worker_count):
p = Process(target=worker_function, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
my_image.putdata(new_data)
my_image.save('output.png', "PNG")
end = time()
print('{:.3f} seconds elapsed'.format(end - start))
I think it's important to "protect" your code inside the if __name__ == "__main__" block otherwise the spawned processes seem to run it.
update
It looks like you need to implement a Manager() (or there are probably other ways I am ignorant of as well!). I got my code to run by altering it into:
from PIL import Image
from multiprocessing import Process, JoinableQueue, Manager
from threading import Thread
from time import time
def worker_function(q, new_data):
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
if __name__ == "__main__":
start = time()
q = JoinableQueue()
my_image = Image.open('InputImage.jpg')
my_image = my_image.convert('RGBA')
datas = list(my_image.getdata())
# new_data = [(0, 0, 0, 0)]*len(datas)
manager = Manager()
new_data = manager.list([(0, 0, 0, 0)]*len(datas))
print(new_data)
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
print('starting workers')
worker_count = 50
processes = []
for i in range(worker_count):
p = Process(target=worker_function, args=[q, new_data])
p.daemon = True
p.start()
print('main thread waiting')
q.join()
print("Saving Image")
my_image.putdata(new_data)
my_image.save('output.png', "PNG")
end = time()
print('{:.3f} seconds elapsed'.format(end - start))
Although this doesn't seem like the fastest option! I'm sure there are other ways to increase speed. My code to do the same thing with Threads looks VERY similar:
from PIL import Image
from threading import Thread
from queue import Queue
import time
start = time.time()
q = Queue()
planeIm = Image.open('InputImage.jpg')
planeIm = planeIm.convert('RGBA')
datas = planeIm.getdata()
new_data = [0] * len(datas)
print('putting image into queue')
for count, item in enumerate(datas):
q.put((count, item))
def worker_function():
while True:
# print("Items in queue: {}".format(q.qsize()))
index, pixel = q.get()
if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240:
out_pixel = (0, 0, 0, 0)
else:
out_pixel = pixel
new_data[index] = out_pixel
q.task_done()
print('starting workers')
worker_count = 100
for i in range(worker_count):
t = Thread(target=worker_function)
t.daemon = True
t.start()
print('main thread waiting')
q.join()
print('Queue has been joined')
planeIm.putdata(new_data)
planeIm.save('output.png', "PNG")
end = time.time()
elapsed = end - start
print('{:3.3} seconds elapsed'.format(elapsed))
Yet, processing my image takes ~23 seconds with threads and ~170 seconds with multiprocessing!! I suspect this would come from the larger overhead needed to start Process objects, and the fact that my algorithm for processing each pixel is simple for now (just the if pixel[0] > 240 and pixel[1] > 240 and pixel[2] > 240: bit), so I'm likely not yielding the speed improvements that a complex pixel processing algorithm would get me. Also to note multiprocessing documentation
a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory.
Which leads me to believe that there are alternatives that are faster.

Loop through large .tif stack (image raster) and extract positions

I want to run through a large tif stack +1500 frames and extract the coordinates of the local maxima for each frame. The code below does the job, however extremely slow for large files. When running on smaller bits (e.g. 20 frames) each frame is done almost instantly - when running on the whole dataset, each frame takes seconds.
Any solutions to run a faster code? I figure it is due to the loading of the large tiff file - however it should only be necessary one time initially?
I have the following code:
from pims import ImageSequence
from skimage.feature import peak_local_max
def cmask(index,array):
radius = 3
a,b = index
nx,ny = array.shape
y,x = np.ogrid[-a:nx-a,-b:ny-b]
mask = x*x + y*y <= radius*radius
return(sum(array[mask])) # number of pixels
images = ImageSequence('tryhard_red_small.tif')
frame_list = []
x = []
y = []
int_liposome = []
BG_liposome = []
for i in range(len(images[0])):
tmp_frame = images[0][i]
xy = pd.DataFrame(peak_local_max(tmp_frame, min_distance=8,threshold_abs=3000))
x.extend(xy[0].tolist())
y.extend(xy[1].tolist())
for j in range(len(xy)):
index = x[j],y[j]
int_liposome.append(cmask(index,tmp_frame))
frame_list.extend([i]*len(xy))
print "Frame: ", i, "of ",len(images[0])
features = pd.DataFrame(
{'lip_int':int_liposome,
'y' : y,
'x' : x,
'frame' : frame_list})

Have you tried profiling the code, say with %prun or %lprun in ipython? That'll tell you exactly where your slowdowns are occurring.
I can't make my own version of this without the tif stack, but I suspect the problem is the fact that you're using lists to store everything. Every time you do an append or an extension, python is having to allocate more memory. You could try getting the total count of maxima first, then allocating your output arrays, then rerunning to fill the arrays. Something like below
# run through once to get the count of local maxima
npeaks = (len(peak_local_max(f, min_distance=8, threshold_abs=3000))
for f in images[0])
total_peaks = sum(npeaks)
# allocate storage arrays and rerun
x = np.zeros(total_peaks, np.float)
y = np.zeros_like(x)
int_liposome = np.zeros_like(x)
BG_liposome = np.zeros_like(x)
frame_list = np.zeros(total_peaks, np.int)
index_0 = 0
for frame_ind, tmp_frame in enumerate(images[0]):
peaks = pd.DataFrame(peak_local_max(tmp_frame, min_distance=8,threshold_abs=3000))
index_1 = index_0 + len(peaks)
# copy the data from the DataFrame's underlying numpy array
x[index_0:index_1] = peaks[0].values
y[index_0:index_1] = peaks[1].values
for i, peak in enumerate(peaks, index_0):
int_liposome[i] = cmask(peak, tmp_frame)
frame_list[index_0:index_1] = frame_ind
# update the starting index
index_0 = index_1
print "Frame: ", frame_ind, "of ",len(images[0])

Fast algorithm to detect main colors in an image?

Does anyone know a fast algorithm to detect main colors in an image?
I'm currently using k-means to find the colors together with Python's PIL but it's very slow. One 200x200 image takes 10 seconds to process. I've several hundred thousand images.

One fast method would be to simply divide up the color space into bins and then construct a histogram. It's fast because you only need a small number of decisions per pixel, and you only need one pass over the image (and one pass over the histogram to find the maxima).
Update: here's a rough diagram to help explain what I mean.
On the x-axis is the color divided into discrete bins. The y-axis shows the value of each bin, which is the number of pixels matching the color range of that bin. There are two main colors in this image, shown by the two peaks.

With a bit of tinkering, this code (which I suspect you might have already seen!) can be sped up to just under a second
If you increase the kmeans(min_diff=...) value to about 10, it produces very similar results, but runs in 900ms (compared to about 5000-6000ms with min_diff=1)
Further decreasing the size of the thumbnails to 100x100 doesn't seem to impact the results much either, and takes the runtime to about 250ms
Here's a slightly tweaked version of the code, which just parameterises the min_diff value, and includes some terrible code to generate an HTML file with the results/timing
from collections import namedtuple
from math import sqrt
import random
try:
import Image
except ImportError:
from PIL import Image
Point = namedtuple('Point', ('coords', 'n', 'ct'))
Cluster = namedtuple('Cluster', ('points', 'center', 'n'))
def get_points(img):
points = []
w, h = img.size
for count, color in img.getcolors(w * h):
points.append(Point(color, 3, count))
return points
rtoh = lambda rgb: '#%s' % ''.join(('%02x' % p for p in rgb))
def colorz(filename, n=3, mindiff=1):
img = Image.open(filename)
img.thumbnail((200, 200))
w, h = img.size
points = get_points(img)
clusters = kmeans(points, n, mindiff)
rgbs = [map(int, c.center.coords) for c in clusters]
return map(rtoh, rgbs)
def euclidean(p1, p2):
return sqrt(sum([
(p1.coords[i] - p2.coords[i]) ** 2 for i in range(p1.n)
]))
def calculate_center(points, n):
vals = [0.0 for i in range(n)]
plen = 0
for p in points:
plen += p.ct
for i in range(n):
vals[i] += (p.coords[i] * p.ct)
return Point([(v / plen) for v in vals], n, 1)
def kmeans(points, k, min_diff):
clusters = [Cluster([p], p, p.n) for p in random.sample(points, k)]
while 1:
plists = [[] for i in range(k)]
for p in points:
smallest_distance = float('Inf')
for i in range(k):
distance = euclidean(p, clusters[i].center)
if distance < smallest_distance:
smallest_distance = distance
idx = i
plists[idx].append(p)
diff = 0
for i in range(k):
old = clusters[i]
center = calculate_center(plists[i], old.n)
new = Cluster(plists[i], center, old.n)
clusters[i] = new
diff = max(diff, euclidean(old.center, new.center))
if diff < min_diff:
break
return clusters
if __name__ == '__main__':
import sys
import time
for x in range(1, 11):
sys.stderr.write("mindiff %s\n" % (x))
start = time.time()
fname = "akira_940x700.png"
col = colorz(fname, 3, x)
print "<h1>%s</h1>" % x
print "<img src='%s'>" % (fname)
print "<br>"
for a in col:
print "<div style='background-color: %s; width:20px; height:20px'> </div>" % (a)
print "<br>Took %.02fms<br> % ((time.time()-start)*1000)

K-means is a good choice for this task because you know number of main colors beforehand. You need to optimize K-means. I think you can reduce your image size, just scale it down to 100x100 pixels or so. Find the size on witch your algorithm works with acceptable speed. Another option is to use dimensionality reduction before k-means clustering.
And try to find fast k-means implementation. Writing such things in python is a misuse of python. It's not supposed to be used like this.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Use multiprocessing and multithreading to print on a image - python

Related

Looking for a more efficient way to do the array multiplication in this loop

Quick pixel manipulation with Pillow and/or NumPy

Multiprocessing on chunks of an image

Loop through large .tif stack (image raster) and extract positions

Fast algorithm to detect main colors in an image?

Categories

Resources