I have a problem, which is similar to this:
import numpy as np
C = np.zeros((100,10))
for i in range(10):
    C_sub = get_sub_matrix_C(i, other_args)  # shape 10x10
    C[i*10:(i+1)*10, :10] = C_sub
So, apparently there is no need to run this as a serial calculation, since each submatrix can be calculated independently.
I would like to use the multiprocessing module and create up to 4 processes for the for loop.
I read some tutorials about multiprocessing, but wasn't able to figure out how to use this to solve my problem.
Thanks for your help
A simple way to parallelize that code would be to use a Pool of processes:
import multiprocessing

pool = multiprocessing.Pool()
results = pool.starmap(get_sub_matrix_C, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
    C[i*10:(i+1)*10, :10] = res
I've used starmap since the get_sub_matrix_C function has more than one argument (starmap(f, [(x1, ..., xN)]) calls f(x1, ..., xN)).
Note, however, that serialization/deserialization of the arguments and results may take significant time and memory, so you may have to use a more low-level solution to avoid that overhead.
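For example, one relatively low-level option (a sketch only, assuming Python 3.8+ and reusing the get_sub_matrix_C and other_args placeholders from the question) is multiprocessing.shared_memory, so each worker writes its 10x10 block directly into a shared buffer instead of pickling it back to the parent:

import numpy as np
from multiprocessing import Pool, shared_memory


def worker(args):
    # Re-attach to the shared block and write one 10x10 slice in place.
    i, shm_name, other_args = args
    shm = shared_memory.SharedMemory(name=shm_name)
    C = np.ndarray((100, 10), dtype=np.float64, buffer=shm.buf)
    C[i*10:(i+1)*10, :] = get_sub_matrix_C(i, other_args)
    shm.close()


if __name__ == '__main__':
    shm = shared_memory.SharedMemory(create=True, size=100 * 10 * 8)
    C = np.ndarray((100, 10), dtype=np.float64, buffer=shm.buf)
    with Pool(4) as pool:
        pool.map(worker, [(i, shm.name, other_args) for i in range(10)])
    result = C.copy()  # copy the data out before releasing the shared block
    shm.close()
    shm.unlink()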
It looks like you are running an outdated version of Python (Pool.starmap requires Python 3.3+). You can replace starmap with plain map, but then you have to provide a function that takes a single parameter:
def f(args):
    return get_sub_matrix_C(*args)


pool = multiprocessing.Pool()
results = pool.map(f, ((i, other_args) for i in range(10)))
for i, res in enumerate(results):
    C[i*10:(i+1)*10, :10] = res
Perhaps the following recipe can do the job. Feel free to ask if anything is unclear.
import numpy as np
import multiprocessing


def own_process(i, other_args, out_queue):
    # Compute one submatrix and put it on the queue together with its index,
    # because results arrive in completion order, not submission order.
    C_sub = get_sub_matrix_C(i, other_args)
    out_queue.put((i, C_sub))


def processParallel():
    out_queue = multiprocessing.Queue()
    other_args = 0
    procs = []
    for i in range(10):
        p = multiprocessing.Process(
            target=own_process,
            args=(i, other_args, out_queue))
        procs.append(p)
        p.start()
    # Drain the queue before joining the processes.
    indexed_results = dict(out_queue.get() for _ in range(10))
    for p in procs:
        p.join()
    return [indexed_results[i] for i in range(10)]


C = np.zeros((100, 10))
result = processParallel()
for i in range(10):
    C[i*10:(i+1)*10, :10] = result[i]
I've been reading threads like this one, but none of them seems to work for my case. I'm trying to parallelize the following toy example to fill a Numpy array inside a for loop using multiprocessing in Python:
import numpy as np
from multiprocessing import Pool
import time


def func1(x, y=1):
    return x**2 + y


def func2(n, parallel=False):
    my_array = np.zeros((n))
    # Parallelized version:
    if parallel:
        pool = Pool(processes=6)
        for idx, val in enumerate(range(1, n+1)):
            result = pool.apply_async(func1, [val])
            my_array[idx] = result.get()
        pool.close()
    # Not parallelized version:
    else:
        for i in range(1, n+1):
            my_array[i-1] = func1(i)
    return my_array


def main():
    start = time.time()
    my_array = func2(60000)
    end = time.time()
    print(my_array)
    print("Normal time: {}\n".format(end-start))

    start_parallel = time.time()
    my_array_parallelized = func2(60000, parallel=True)
    end_parallel = time.time()
    print(my_array_parallelized)
    print("Time based on multiprocessing: {}".format(end_parallel-start_parallel))


if __name__ == '__main__':
    main()
The multiprocessing-based lines seem to work and give the right results. However, they take far longer than the non-parallelized version. Here is the output of both versions:
[2.00000e+00 5.00000e+00 1.00000e+01 ... 3.59976e+09 3.59988e+09
3.60000e+09]
Normal time: 0.01605963706970215
[2.00000e+00 5.00000e+00 1.00000e+01 ... 3.59976e+09 3.59988e+09
3.60000e+09]
Time based on multiprocessing: 2.8775112628936768
My intuition tells me that there should be a better way of capturing results from pool.apply_async(). What am I doing wrong? What is the most efficient way to accomplish this? Thanks.
Creating processes is expensive. On my machine it takes at least several hundred microseconds per process created. Moreover, the multiprocessing module copies the data to be computed between processes and then gathers the results from the process pool. This inter-process communication is very slow too. The problem is that your computation is trivial and can be done very quickly, likely much faster than all the introduced overhead. The multiprocessing module is only useful when the datasets shipped between processes are quite small and the computation performed on them is intensive (compared to the amount of data moved around).
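To isolate the IPC overhead (a side sketch, not part of the original answer): handing Pool.map a large chunksize batches many tiny tasks per inter-process round-trip, which typically helps a lot, though for work this trivial the serial loop can still win:

import time
from multiprocessing import Pool


def func1(x, y=1):
    return x**2 + y


if __name__ == '__main__':
    n = 60000
    with Pool(processes=6) as pool:
        start = time.time()
        # One item per task: dominated by pickling and queue traffic.
        pool.map(func1, range(1, n + 1), chunksize=1)
        print("chunksize=1    :", time.time() - start)

        start = time.time()
        # 10000 items per task: most of the IPC overhead is amortized.
        pool.map(func1, range(1, n + 1), chunksize=10000)
        print("chunksize=10000:", time.time() - start)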
Fortunately, when it comes to numerical computations with Numpy, there is a simple and fast way to parallelize your application: the Numba JIT. Numba can parallelize code if you explicitly use its parallel constructs (parallel=True and prange). It uses threads rather than heavy processes, and those threads work in shared memory. Numba can get around the GIL as long as your code works on native types and Numpy arrays rather than pure Python dynamic objects (lists, big integers, classes, etc.). Here is an example:
import numpy as np
import numba as nb
import time


@nb.njit
def func1(x, y=1):
    return x**2 + y


@nb.njit('float64[:](int64)', parallel=True)
def func2(n):
    my_array = np.zeros(n)
    for i in nb.prange(1, n+1):
        my_array[i-1] = func1(i)
    return my_array


def main():
    start = time.time()
    my_array = func2(60000)
    end = time.time()
    print(my_array)
    print("Numba time: {}\n".format(end-start))


if __name__ == '__main__':
    main()
Because Numba compiles the code at runtime, it is able to fully optimize the loop away (to a no-op), resulting in a time close to 0 seconds in this case.
Here is the solution proposed by @thisisalsomypassword that improves my initial proposal. That is, "collecting the AsyncResult objects in a list within the loop and then calling AsyncResult.get() after all processes have started on each result object":
import numpy as np
from multiprocessing import Pool
import time


def func1(x, y=1):
    time.sleep(0.1)
    return x**2 + y


def func2(n, parallel=False):
    my_array = np.zeros((n))
    # Parallelized version:
    if parallel:
        pool = Pool(processes=6)
        ####### HERE COMES THE CHANGE #######
        results = [pool.apply_async(func1, [val]) for val in range(1, n+1)]
        for idx, val in enumerate(results):
            my_array[idx] = val.get()
        #######
        pool.close()
    # Not parallelized version:
    else:
        for i in range(1, n+1):
            my_array[i-1] = func1(i)
    return my_array


def main():
    start = time.time()
    my_array = func2(600)
    end = time.time()
    print(my_array)
    print("Normal time: {}\n".format(end-start))

    start_parallel = time.time()
    my_array_parallelized = func2(600, parallel=True)
    end_parallel = time.time()
    print(my_array_parallelized)
    print("Time based on multiprocessing: {}".format(end_parallel-start_parallel))


if __name__ == '__main__':
    main()
Now it works. Time is reduced considerably with Multiprocessing:
Normal time: 60.107836008071
Time based on multiprocessing: 10.049324989318848
time.sleep(0.1) was added in func1 to cancel out the effect of being a super trivial task.
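For what it's worth, the same parallel branch can be written more compactly with Pool.map, which preserves input order and does the result collection for you (a sketch equivalent to the version above, reusing the slowed-down func1):

from multiprocessing import Pool
import numpy as np
import time


def func1(x, y=1):
    time.sleep(0.1)
    return x**2 + y


if __name__ == '__main__':
    n = 600
    with Pool(processes=6) as pool:
        # map() returns the results in the same order as the inputs.
        my_array = np.array(pool.map(func1, range(1, n + 1)), dtype=float)
    print(my_array[:5])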
I want to calculate the mean of several variables after iterating many times. My function creates random data, and from those, I calculate the variables (using other functions).
So far I have:
stuff1_list = []
stuff2_list = []
stuff3_list = []
for i in range(100):
    data = create_data(arg1, arg2)
    stuff1_list.append(calc_stuff1(data))
    stuff2_list.append(calc_stuff2(data))
    stuff3_list.append(calc_stuff3(data))
mean1 = np.mean(stuff1_list)
mean2 = np.mean(stuff2_list)
mean3 = np.mean(stuff3_list)
I've been trying to figure out how to do this with multiprocessing, but I am confused with Process, Queue, Pool, and so on. How can I get this job done with parallel processing?
My approach would be:
import multiprocessing
import numpy as np


def do_stuff():
    stuff_list = []
    for i in range(100):
        data = create_data(arg1, arg2)
        stuff_list.append(calc_stuff(data))
    print(np.mean(stuff_list))


for i in range(3):
    p = multiprocessing.Process(target=do_stuff, args=())
    p.start()
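A variant of that idea (a sketch only, still relying on the create_data, calc_stuff1/2/3, arg1 and arg2 from the question) is to let a Pool run the 100 independent iterations and return the per-iteration results to the parent, so the means can be computed there instead of being printed inside each child:

from multiprocessing import Pool
import numpy as np


def one_iteration(_):
    # One independent replication: fresh data, then the three statistics.
    data = create_data(arg1, arg2)
    return calc_stuff1(data), calc_stuff2(data), calc_stuff3(data)


if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(one_iteration, range(100))
    stuff1_list, stuff2_list, stuff3_list = zip(*results)
    print(np.mean(stuff1_list), np.mean(stuff2_list), np.mean(stuff3_list))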
I tried to write a merge sort using multiprocessing:
from heapq import merge
from multiprocessing import Process


def merge_sort1(m):
    if len(m) < 2:
        return m
    middle = len(m) // 2

    left = Process(target=merge_sort1, args=(m[:middle],))
    left.start()
    right = Process(target=merge_sort1, args=(m[middle:],))
    right.start()

    for p in (left, right):
        p.join()

    result = list(merge(left, right))
    return result
Testing it with arr:
In [47]: arr = list(range(9))
In [48]: random.shuffle(arr)
It reports an error:
In [49]: merge_sort1(arr)
TypeError: 'Process' object is not iterable
What's the problem with my code?
merge(left, right) tries to merge two processes, whereas you presumably want to merge the two lists that resulted from each process. Note that the return value of the function passed to Process is lost; it runs in a different process, not just a different thread, and you can't easily shuffle data back to the parent, so Python doesn't do that by default. You need to be explicit and set up such a channel yourself. Fortunately, there are multiprocessing data types to help you; for example, multiprocessing.Pipe:
from heapq import merge
import random
import multiprocessing


def merge_sort1(m, send_end=None):
    if len(m) < 2:
        result = m
    else:
        middle = len(m) // 2
        inputs = [m[:middle], m[middle:]]
        pipes = [multiprocessing.Pipe(False) for _ in inputs]
        processes = [multiprocessing.Process(target=merge_sort1, args=(input, send_end))
                     for input, (recv_end, send_end) in zip(inputs, pipes)]
        for process in processes: process.start()
        for process in processes: process.join()
        results = [recv_end.recv() for recv_end, send_end in pipes]
        result = list(merge(*results))
    if send_end:
        send_end.send(result)
    else:
        return result


arr = list(range(9))
random.shuffle(arr)
print(merge_sort1(arr))
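As an aside, spawning two new processes at every level of the recursion gets expensive quickly. A common alternative (just a sketch, not the recursive version above) is to sort fixed-size chunks in a Pool and do a single merge of the sorted chunks in the parent:

from heapq import merge
from multiprocessing import Pool
import random


def chunk_sort(chunk):
    # Each worker sorts one chunk independently.
    return sorted(chunk)


if __name__ == '__main__':
    arr = list(range(100))
    random.shuffle(arr)

    n_workers = 4
    size = (len(arr) + n_workers - 1) // n_workers
    chunks = [arr[i:i + size] for i in range(0, len(arr), size)]

    with Pool(n_workers) as pool:
        sorted_chunks = pool.map(chunk_sort, chunks)

    result = list(merge(*sorted_chunks))
    print(result)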
I'm trying to update a shared variable (a numpy array in a namespace) while using the multiprocessing module. However, the variable is not updated and I don't understand why.
Here is a sample code to illustrate this:
from multiprocessing import Process, Manager
import numpy as np

chunk_size = 15
arr_length = 1000
jobs = []

namespace = Manager().Namespace()
namespace.arr = np.zeros(arr_length)
nb_chunk = arr_length/chunk_size + 1


def foo(i, ns):
    from_idx = chunk_size*i
    to_idx = min(arr_length, chunk_size*(i+1))
    ns.arr[from_idx:to_idx] = np.random.randint(0, 100, to_idx-from_idx)


for i in np.arange(nb_chunk):
    p = Process(target=foo, args=(i, namespace))
    p.start()
    jobs.append(p)

for i in np.arange(nb_chunk):
    jobs[i].join()

print namespace.arr[:10]
You cannot share ordinary built-in objects like list or dict across processes in Python. In order to exchange data between processes, Python's multiprocessing module provides two data structures:
Queue()
Pipe()
Also read: Exchanging objects between processes
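A minimal sketch of the Queue approach (with a made-up worker, not the asker's code): each child puts its result on the queue together with an identifier, and the parent drains the queue before joining:

from multiprocessing import Process, Queue


def worker(i, q):
    # Send the result back to the parent through the queue.
    q.put((i, i * i))


if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=worker, args=(i, q)) for i in range(4)]
    for p in procs:
        p.start()
    results = dict(q.get() for _ in procs)  # drain the queue before joining
    for p in procs:
        p.join()
    print(results)  # e.g. {0: 0, 1: 1, 2: 4, 3: 9}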
The issue is that the Manager().Namespace() object doesn't notice that you're changing anything via ns.arr[from_idx:to_idx] = ... (since you're modifying an inner data structure), and therefore the change is not propagated to the other processes.
This answer explains very well what's going on here.
To fix it, create the list as a Manager().list() and pass this list to the processes, so that ns[from_idx:to_idx] = ... is recognized as a change and is propagated to the processes:
from multiprocessing import Process, Manager
import numpy as np

chunk_size = 15
arr_length = 1000
jobs = []

arr = Manager().list([0] * arr_length)
nb_chunk = arr_length/chunk_size + 1


def foo(i, ns):
    from_idx = chunk_size*i
    to_idx = min(arr_length, chunk_size*(i+1))
    ns[from_idx:to_idx] = np.random.randint(0, 100, to_idx-from_idx)


for i in np.arange(nb_chunk):
    p = Process(target=foo, args=(i, arr))
    p.start()
    jobs.append(p)

for i in np.arange(nb_chunk):
    jobs[i].join()

print arr[:10]
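Since the shared data here is a numpy array, another option (a sketch, not part of the answer above) is a shared multiprocessing.Array viewed through np.frombuffer, which avoids the Manager proxy entirely; the chunks don't overlap, so no extra locking is needed:

from multiprocessing import Process, Array
import numpy as np

chunk_size = 15
arr_length = 1000


def foo(i, shared):
    from_idx = chunk_size * i
    to_idx = min(arr_length, chunk_size * (i + 1))
    # View the shared buffer as a numpy array and write in place.
    arr = np.frombuffer(shared.get_obj())
    arr[from_idx:to_idx] = np.random.randint(0, 100, to_idx - from_idx)


if __name__ == '__main__':
    shared = Array('d', arr_length)  # doubles, zero-initialized
    nb_chunk = arr_length // chunk_size + 1
    jobs = [Process(target=foo, args=(i, shared)) for i in range(nb_chunk)]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    print(np.frombuffer(shared.get_obj())[:10])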
I've run into a problem where Python quits unexpectedly when running multiprocessing with numpy. I've isolated the problem, so I can now confirm that multiprocessing works perfectly when running the code stated below:
import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p


def test(args):
    x, i = args
    if i == 2:
        time.sleep(4)
    arr = np.dot(x.T, x)
    print i


if __name__ == '__main__':
    x = np.random.random(size=((2000, 500)))
    evaluations = [(x, i) for i in range(5)]

    p = Pool()
    p.map_async(test, evaluations)
    p.close()
    p.join()
The problem occurs when I try to evaluate the code below. This makes Python quit unexpectedly:
import numpy as np
from multiprocessing import Pool, Process
import time
import cPickle as p


def test(args):
    x, i = args
    if i == 2:
        time.sleep(4)
    arr = np.dot(x.T, x)
    print i


if __name__ == '__main__':
    x = np.random.random(size=((2000, 500)))
    test((x, 4))  # Added code
    evaluations = [(x, i) for i in range(5)]

    p = Pool()
    p.map_async(test, evaluations)
    p.close()
    p.join()
Someone please help; I'm open to all suggestions. Thanks. Note: I have tried two different machines and the same problem occurs.
This is a known issue with multiprocessing and numpy on MacOS X, and a bit of a duplicate of:
segfault using numpy's lapack_lite with multiprocessing on osx, not linux
http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html
The answer seems to be to link Numpy against a different BLAS than the Apple Accelerate framework... unfortunate :(
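If you're unsure which BLAS your NumPy build is linked against, a quick diagnostic (just an inspection step, not a fix) is:

import numpy as np

# Prints the BLAS/LAPACK configuration NumPy was built with;
# mentions of "accelerate" or "veclib" point at Apple's framework.
np.__config__.show()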
I figured out a workaround to the problem. The problem occurs when Numpy is used together with BLAS before a multiprocessing instance is initialized. My workaround is simply to put the Numpy code (the part running BLAS) into a single separate process and then run the multiprocessing instances afterwards. This is not a good coding style, but it works. See the examples below.
The following will fail (Python will quit):
import numpy as np
from multiprocessing import Pool, Process


def test(x):
    arr = np.dot(x.T, x)  # On large matrices, this calc will use BLAS.


if __name__ == '__main__':
    x = np.random.random(size=((2000, 500)))  # Random matrix
    test(x)

    evaluations = [x for _ in range(5)]

    p = Pool()
    p.map_async(test, evaluations)  # This is where Python will quit, because of the prior use of BLAS.
    p.close()
    p.join()
The following will succeed:
import numpy as np
from multiprocessing import Pool, Process


def test(x):
    arr = np.dot(x.T, x)  # On large matrices, this calc will use BLAS.


if __name__ == '__main__':
    x = np.random.random(size=((2000, 500)))  # Random matrix

    p = Process(target=test, args=(x,))
    p.start()
    p.join()

    evaluations = [x for _ in range(5)]

    p = Pool()
    p.map_async(test, evaluations)
    p.close()
    p.join()
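Another workaround that may be worth trying (hedged: whether it helps depends on the Python version and the BLAS in use) is the 'spawn' start method, available since Python 3.4, so that child processes start from a fresh interpreter instead of forking a parent that has already initialized BLAS:

import numpy as np
import multiprocessing as mp


def test(x):
    arr = np.dot(x.T, x)  # BLAS call inside a freshly spawned child


if __name__ == '__main__':
    mp.set_start_method('spawn')  # avoid forking after BLAS use in the parent
    x = np.random.random(size=(2000, 500))
    test(x)  # BLAS is used in the parent first
    with mp.Pool() as p:
        p.map(test, [x for _ in range(5)])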