I have three functions in Python. Each one takes an image path as input, performs a simple image-processing step, and writes a new image, returning its path as output.
In the example below, the functions depend on each other: alg2 takes as input the image produced by alg1, and alg3 takes as input the image produced by alg2 (which in turn depends on alg1).
Because of their relatively high execution time (that's image processing for you), I would like to know whether I can parallelize them using Python's multiprocessing. I have read about multiprocessing's map and Pool, but I was pretty confused.
To summarize: I have three interdependent functions and I would like to run them in parallel, if that is possible. I would also like to know how I would run these three functions concurrently if they were not interdependent, i.e. if each one were autonomous.
import timeit

def alg1(input_path_image, output_path_image):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return output_path_image

def alg2(output_path_image, output_path_image1):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return output_path_image1

def alg3(output_path_image1, output_path_image2):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return output_path_image2

if __name__ == '__main__':
    alg1(input_path_image, output_path_image)
    alg2(output_path_image, output_path_image1)
    alg3(output_path_image1, output_path_image2)
Here is what I would do:
I would split the list of images into smaller chunks. Then I would combine those three functions into a single function (making the other two private helpers, just for the sake of simplicity). You can then speed up the whole process by doing:
from multiprocessing import Process

image_list = this_is_your_huge_image_list
# create smaller image lists, e.g. [[1, 2, 3], [4, 5, 6], ...]
chunked_lists = [image_list[x:x+100] for x in xrange(0, len(image_list), 100)]

for img_list in chunked_lists:
    p = Process(target=your_main_func, args=(img_list,))
    p.start()
    # without .join() here
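The answer doesn't show what your_main_func could look like; here is a minimal sketch, assuming the three functions from the question are chained per image (the output-path naming scheme is made up purely for illustration):
def your_main_func(img_list):
    # run the full alg1 -> alg2 -> alg3 chain for each image in the chunk;
    # the chain itself stays sequential because of the dependency, and the
    # parallelism comes from running different chunks in different processes
    for input_path in img_list:
        out1 = alg1(input_path, input_path + '.alg1.png')
        out2 = alg2(out1, input_path + '.alg2.png')
        alg3(out2, input_path + '.alg3.png')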
It sounds like you're doing something CPU-intensive, so you'll need to use the multiprocessing.Process object rather than threading.Thread. Because of this, you can't get return values back directly from a function run in a multiprocessing.Process, and therefore will need something like a multiprocessing.Manager dict to collect the results.
So this is an adaptation of your code which will work with multiprocessing.Process:
import timeit
from multiprocessing import Process, Manager

def alg1(input_path_image, output_path_image, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return_dict['algo1'] = output_path_image

def alg2(output_path_image, output_path_image1, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return_dict['algo2'] = output_path_image1

def alg3(output_path_image1, output_path_image2, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    print stop - start
    return_dict['algo3'] = output_path_image2

if __name__ == '__main__':
    manager = Manager()
    return_dict = manager.dict()

    a1 = Process(target=alg1, args=(input_path_image, output_path_image, return_dict))
    a2 = Process(target=alg2, args=(output_path_image, output_path_image1, return_dict))
    a3 = Process(target=alg3, args=(output_path_image1, output_path_image2, return_dict))

    jobs = [a1, a2, a3]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()

    a1_return = return_dict['algo1']
    a2_return = return_dict['algo2']
    a3_return = return_dict['algo3']
You'll need to modify this further to give your print statements a little more distinction; at the moment they only print a number, and you won't be able to tell which function it came from.
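For example, a minimal tweak (sketched for alg1 only) is to tag each timing with the function's name:
def alg1(input_path_image, output_path_image, return_dict):
    start = timeit.default_timer()
    ###processing###
    stop = timeit.default_timer()
    # label the timing so the outputs from the three processes are distinguishable
    print 'alg1:', stop - start
    return_dict['algo1'] = output_path_image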
Related
I want to translate a huge MATLAB model to Python, so I need to get the key functions working first. One key function handles parallel processing. Basically, the input is a matrix of parameters, in which every row represents the parameters for one run. These parameters are used within a computation-heavy function. This computation-heavy function should run in parallel; I don't need the results of a previous run for any other run, so all processes can run independently of each other.
Why is starmap_async slower on my PC? Also: when I add more code (to test consecutive computation) my Python crashes (I use Spyder). Can you give me advice?
import time
import numpy as np
import multiprocessing as mp
from functools import partial

# Create simulated data matrix
data = np.random.random((100, 3000))
data = np.column_stack((np.arange(1, len(data) + 1, 1), data))

def EAF_DGL(*z, package_num):
    sum_row = 0
    for i in range(1, np.shape(z)[0]):
        sum_row = sum_row + z[i]
    func_result = np.column_stack((package_num, z[0], sum_row))
    return func_result

t0 = time.time()
if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap_async(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])]).get()
        pool.close()
        pool.join()
t1 = time.time()
calculation_time_parallel_async = t1 - t0
print(calculation_time_parallel_async)

t2 = time.time()
if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])])
        pool.close()
        pool.join()
t3 = time.time()
calculation_time_parallel = t3 - t2
print(calculation_time_parallel)
I want to calculate the mean of several variables after iterating many times. My function creates random data, and from it I calculate the variables (using other functions).
So far I have:
stuff1_list = []
stuff2_list = []
stuff3_list = []

for i in range(100):
    data = create_data(arg1, arg2)
    stuff1_list.append(calc_stuff1(data))
    stuff2_list.append(calc_stuff2(data))
    stuff3_list.append(calc_stuff3(data))

mean1 = np.mean(stuff1_list)
mean2 = np.mean(stuff2_list)
mean3 = np.mean(stuff3_list)
I've been trying to figure out how to do this with multiprocessing, but I am confused with Process, Queue, Pool, and so on. How can I get this job done with parallel processing?
My approach would be:
import multiprocessing

def do_stuff(calc_stuff):
    stuff_list = []
    for i in range(100):
        data = create_data(arg1, arg2)
        stuff_list.append(calc_stuff(data))
    print(np.mean(stuff_list))

# one process per statistic
for calc_stuff in (calc_stuff1, calc_stuff2, calc_stuff3):
    p = multiprocessing.Process(target=do_stuff, args=(calc_stuff,))
    p.start()
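Since the question also mentions Pool: an alternative is to parallelize over the 100 iterations instead of over the three statistics. The following is only a sketch under the question's assumptions (create_data, calc_stuff1/2/3, arg1 and arg2 exist as module-level, picklable names); one_iteration is a hypothetical helper:
from multiprocessing import Pool
import numpy as np

def one_iteration(_):
    # each task builds its own random data set and computes all three statistics
    data = create_data(arg1, arg2)
    return calc_stuff1(data), calc_stuff2(data), calc_stuff3(data)

if __name__ == '__main__':
    with Pool() as pool:
        results = pool.map(one_iteration, range(100))
    stuff1_list, stuff2_list, stuff3_list = zip(*results)
    print(np.mean(stuff1_list), np.mean(stuff2_list), np.mean(stuff3_list))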
I want to integrate parallel processing to make my for loops run faster.
However, I noticed that it has just made my code run slower. See the example below, where I am using joblib with a simple function on a list of random integers. Notice that it runs faster without the parallel processing than with it.
Any insight into what is happening?
import random
import time
from joblib import Parallel, delayed

def f(x):
    return x**x

if __name__ == '__main__':
    s = [random.randint(0, 100) for _ in range(0, 10000)]

    # without parallel processing
    t0 = time.time()
    out1 = [f(x) for x in s]
    t1 = time.time()
    print("without parallel processing: ", t1 - t0)

    # with parallel processing
    t0 = time.time()
    out2 = Parallel(n_jobs=8, batch_size=len(s), backend="threading")(delayed(f)(x) for x in s)
    t1 = time.time()
    print("with parallel processing: ", t1 - t0)
I am getting the following output:
without parallel processing: 0.0070569515228271484
with parallel processing: 0.10714387893676758
The parameter batch_size=len(s) effectively says: give each worker a batch of len(s) tasks. This means you create 8 threads but then hand the entire workload to one of them. Leaving batch_size at its default ('auto') lets joblib pick the chunking for you.
Also, you might want to increase the per-call workload to see a measurable advantage. I prefer to use time.sleep delays:
def f(x):
    time.sleep(0.001)
    return x**x

out2 = Parallel(n_jobs=8,
                #batch_size=len(s),
                backend="threading")(delayed(f)(x) for x in s)
without parallel processing: 11.562264442443848
with parallel processing: 1.412865400314331
I am trying to come up with a way to have threads work on the same goal without interfering. In this case I am using four threads to add up every number between 0 and 90,000. This code runs, but it ends almost immediately (runtime: 0.00399994850159 sec) and only outputs 0. Originally I wanted to do it with a global variable, but I was worried about the threads interfering with each other (i.e. the small chance that two threads double-count or skip a number due to strange timing of the reads/writes). So instead I distributed the workload beforehand. If there is a better way to do this, please share. This is my simple way of trying to get some experience with multithreading. Thanks.
import threading
import time

start_time = time.time()

tot1 = 0
tot2 = 0
tot3 = 0
tot4 = 0

def Func(x,y,tot):
    tot = 0
    i = y-x
    while z in range(0,i):
        tot = tot + i + z

# class Tester(threading.Thread):
#     def run(self):
#         print(n)

w = threading.Thread(target=Func, args=(0,22499,tot1))
x = threading.Thread(target=Func, args=(22500,44999,tot2))
y = threading.Thread(target=Func, args=(45000,67499,tot3))
z = threading.Thread(target=Func, args=(67500,89999,tot4))

w.start()
x.start()
y.start()
z.start()

w.join()
x.join()
y.join()
z.join()

# while (w.isAlive() == False | x.isAlive() == False | y.isAlive() == False | z.isAlive() == False): {}

total = tot1 + tot2 + tot3 + tot4
print total
print("--- %s seconds ---" % (time.time() - start_time))
You have a bug that makes this program end almost immediately. Look at while z in range(0,i): in Func. z isn't defined in the function, and it's only by luck (bad luck, really) that you happen to have a global variable z = threading.Thread(target=Func, args=(67500,89999,tot4)) that masks the problem. You are testing whether the thread object is in a list of integers... and it's not!
The next problem is with the global variables. First, you are absolutely right that using a single global variable is not thread-safe: the threads would mess with each other's calculations. But you misunderstand how globals work. When you do threading.Thread(target=Func, args=(67500,89999,tot4)), Python passes the object currently referenced by tot4 to the function, but the function has no idea which global it came from. You only update the local variable tot and discard it when the function completes.
A solution is to use a global container to hold the calculations, as shown in the example below. Unfortunately, this is actually slower than just doing all the work in one thread. The Python global interpreter lock (GIL) only lets one thread execute Python bytecode at a time, so threading only slows down CPU-intensive tasks implemented in pure Python.
You could look at the multiprocessing module to split this into multiple processes. That works well if the cost of running the calculation is large compared to the cost of starting the process and passing it data.
Here is a working copy of your example:
import threading
import time

start_time = time.time()

tot = [0] * 4

def Func(x, y, tot_index):
    my_total = 0
    i = y - x
    for z in range(0, i):
        my_total = my_total + i + z
    tot[tot_index] = my_total

# class Tester(threading.Thread):
#     def run(self):
#         print(n)

w = threading.Thread(target=Func, args=(0, 22499, 0))
x = threading.Thread(target=Func, args=(22500, 44999, 1))
y = threading.Thread(target=Func, args=(45000, 67499, 2))
z = threading.Thread(target=Func, args=(67500, 89999, 3))

w.start()
x.start()
y.start()
z.start()

w.join()
x.join()
y.join()
z.join()

# while (w.isAlive() == False | x.isAlive() == False | y.isAlive() == False | z.isAlive() == False): {}

total = sum(tot)
print total
print("--- %s seconds ---" % (time.time() - start_time))
You can pass in a mutable object to collect your results, either keyed by an identifier (e.g. a dict) or simply a list that you append() the results to, e.g.:
import threading

def Func(start, stop, results):
    results.append(sum(range(start, stop + 1)))

rngs = [(0, 22499), (22500, 44999), (45000, 67499), (67500, 89999)]
results = []
jobs = [threading.Thread(target=Func, args=(start, stop, results)) for start, stop in rngs]

for j in jobs:
    j.start()
for j in jobs:
    j.join()

print(sum(results))
# 4049955000
# 100 loops, best of 3: 2.35 ms per loop
As others have noted, you could look at multiprocessing in order to split the work across multiple processes that can run in parallel. This would help especially in CPU-intensive tasks, assuming there isn't a huge amount of data to pass between the processes.
Here's a simple implementation of the same functionality using multiprocessing:
from multiprocessing import Pool

POOL_SIZE = 4
NUMBERS = 90000

def func(_range):
    tot = 0
    for z in range(*_range):
        tot += z
    return tot

with Pool(POOL_SIZE) as pool:
    chunk_size = int(NUMBERS / POOL_SIZE)
    chunks = ((i, i + chunk_size) for i in range(0, NUMBERS, chunk_size))
    print(sum(pool.imap(func, chunks)))
In the above, chunks is a generator that produces the same ranges that were hardcoded in the original version. It's given to imap, which works like the standard map except that it executes the function in the processes within the pool.
A lesser-known fact about multiprocessing is that you can easily convert the code to use threads instead of processes via the undocumented multiprocessing.pool.ThreadPool (the documented equivalent is multiprocessing.dummy.Pool). To convert the above example to use threads, just change the import to:
from multiprocessing.pool import ThreadPool as Pool
I'm slowly switching to Python and I wanted to make a simple test comparing the performance of a simple array operation. I generate a random 1000x1000 array and add one to each of the values in this array.
Here is my script in Python:
import time
import numpy
from numpy.random import random

def testAddOne(data):
    """
    Test addOne
    """
    return data + 1

i = 1000
data = random((i, i))

start = time.clock()
for x in xrange(1000):
    testAddOne(data)
stop = time.clock()

print stop - start
And my function in MATLAB:
function test
    % parameter declaration
    c = rand(1000);
    tic
    for t = 1:1000
        testAddOne(c);
    end
    fprintf('Structure: \n')
    toc
end

function testAddOne(c)
    c = c + 1;
end
The Python version takes 2.77 - 2.79 seconds, the same as the MATLAB function (I'm actually quite impressed by NumPy!). What would I have to change in my Python script to use multithreading? I can't do that in MATLAB since I don't have the toolbox.
Multithreading in Python is only useful for situations where threads spend time blocked, e.g. waiting on input or I/O, which is not the case here (CPU-bound Python threads are serialized by the global interpreter lock). Multiprocessing, however, is easy to do in Python.
A program taking a similar approach to your example is below
import time
import numpy
from numpy.random import random
from multiprocessing import Process

def testAddOne(data):
    return data + 1

def testAddN(data, N):
    # print "testAddN", N
    for x in xrange(N):
        testAddOne(data)

if __name__ == '__main__':
    matrix_size = 1000
    num_adds = 10000
    num_processes = 4

    data = random((matrix_size, matrix_size))

    start = time.clock()
    if num_processes > 1:
        processes = [Process(target=testAddN, args=(data, num_adds / num_processes))
                     for i in range(num_processes)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
    else:
        testAddN(data, num_adds)
    stop = time.clock()

    print "Elapsed", stop - start
A more useful example using a pool of worker processes to successively add 1 to different matrices is below.
import time
import numpy
from numpy.random import random
from multiprocessing import Pool

def testAddOne(data):
    return data + 1

def testAddN(dataN):
    data, N = dataN
    for x in xrange(N):
        data = testAddOne(data)
    return data

if __name__ == '__main__':
    num_matrices = 4
    matrix_size = 1000
    num_adds_per_matrix = 2500
    num_processes = 4

    inputs = [(random((matrix_size, matrix_size)), num_adds_per_matrix)
              for i in range(num_matrices)]
    #print inputs # test using, e.g., matrix_size = 2

    start = time.clock()
    if num_processes > 1:
        proc_pool = Pool(processes=num_processes)
        outputs = proc_pool.map(testAddN, inputs)
    else:
        outputs = map(testAddN, inputs)
    stop = time.clock()

    #print outputs # test using, e.g., matrix_size = 2
    print "Elapsed", stop - start
In this case the code in testAddN actually does something with the result of calling testAddOne, and you can uncomment the print statements to check that some useful work is being done.
In both cases I've changed the total number of additions to 10000; with fewer additions, the cost of starting up processes becomes more significant. You can experiment with the parameters, including num_processes. On my machine, compared to running in a single process with num_processes=1, I got just under a 2x speedup when spawning four processes with num_processes=4.