I have been reading about multiprocessing in Python (e.g. I have read this and this and this and this and so on; I have also read/watched different websites/videos such as this and this and this and so many more!), but I am still confused about how I could apply multiprocessing to my specific problem. I have written a simple example code for calculating the average value of randomly generated integers using a Monte Carlo simulation (I store the random integers in a variable called integers so I can finally calculate the mean; I am also generating random numpy.ndarrays and storing them in a variable called arrays, as I need to do some post-processing on those arrays later too):
import numpy as np
nMCS = 10 ** 8
integers = []
arrays = []
for i in range(nMCS):
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)
    integers.append(a)
    arrays.append(b)
mean_val = np.average(integers)
# I will do post-processing on 'arrays' later!!
Now I want to utilize all of the 16 cores on my machine, so the random numbers/arrays are not generated in sequence and I can speed up the process. Based on what I have learnt, I recognize that I need to store the results of each Monte Carlo simulation (i.e. the generated random integer and random numpy.ndarray) and then use inter-process communication in order to later store all of the results in a list. I have written different codes, but unfortunately none of them works. As an example, when I write something like this:
import numpy as np
import multiprocessing

nMCS = 10 ** 6
integers = []
arrays = []

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
    p1 = multiprocessing.Process(target = monte_carlo)
    p1.start()
    p1.join()

    for i in range(nMCS):
        integers.append(a)
        arrays.append(b)
I get the error "name 'a' is not defined". So could anyone please help me with this and tell me how I could generate as many random integers/arrays as possible concurrently, and then add them all to a list for further processing?
Because returning a large number of results incurs communication overhead between processes, I would suggest splitting the task into a few parts and doing some of the processing inside each worker before returning.
import numpy as np
from multiprocessing import Pool

n = 4

def monte_carlo(i):
    raw_result = []
    for j in range(10**4 // n):
        a = np.random.randint(0,10)
        b = np.random.rand(10,2)
        raw_result.append([a,b])
    result = processResult(raw_result)
    # Your method to reduce the result before returning;
    # let's assume the return value is [avg(a), reformed_array(b)]
    return result

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
    pool = Pool(processes=4)
    # you can control how many processes here, for example
    # multiprocessing.cpu_count() - 1 to avoid blocking the machine completely
    multiple_results = [pool.apply_async(monte_carlo, (i,)) for i in range(n)]
    data = [res.get() for res in multiple_results]
    # OR
    data = pool.map(monte_carlo, [i for i in range(n)])
    # Both return a list of [avg(a), reformed_array(b)]
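To make the chunked idea concrete for the original question, here is a minimal, self-contained sketch; the names chunk_monte_carlo and n_chunks are my own, not from the post above. It generates the random integers and arrays in one chunk per core and gathers everything in the parent for the final averaging and post-processing.

import numpy as np
from multiprocessing import Pool, cpu_count

nMCS = 10 ** 6              # total number of Monte Carlo samples
n_chunks = cpu_count()      # one chunk of work per core

def chunk_monte_carlo(n_samples):
    # hypothetical helper (my naming): generate n_samples integers/arrays in one worker
    ints = np.random.randint(0, 10, size=n_samples)
    arrs = [np.random.rand(10, 2) for _ in range(n_samples)]
    return ints, arrs

if __name__ == '__main__':
    chunk_sizes = [nMCS // n_chunks] * n_chunks
    chunk_sizes[-1] += nMCS % n_chunks      # give the remainder to the last chunk

    with Pool(processes=n_chunks) as pool:
        results = pool.map(chunk_monte_carlo, chunk_sizes)

    integers = np.concatenate([ints for ints, _ in results])
    arrays = [arr for _, arrs in results for arr in arrs]

    mean_val = np.average(integers)
    print(mean_val)         # should be close to 4.5
    print(len(arrays))      # == nMCS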
Simple error: a and b are created inside your function, so they do not exist in your main scope. You will need to return them from the function.
def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)
    # create a return statement here. It may help if you put them into a list so you can return 2 values

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
    p1 = multiprocessing.Process(target = monte_carlo)
    p1.start()
    p1.join()

    # Call your function here and save the return value to something
    for i in range(nMCS):
        integers.append(a) # paste here
        arrays.append(b)   # and here
Edit: I tested the code and found you were never calling your monte_carlo function. a and b are now working correctly, but you have a new error to try and solve. Sorry, but I won't be able to help with this error as I don't understand it myself; here is my edit of your code.
import numpy as np
import multiprocessing

nMCS = 10 ** 6
integers = []
arrays = []

def monte_carlo():
    a = np.random.randint(0,10)
    b = np.random.rand(10,2)
    temp = [a,b]
    return temp

if __name__ == '__main__':
    __spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)" # this is because I am using Spyder!
    p1 = multiprocessing.Process(target = monte_carlo()) # added the extra brackets here
    p1.start()
    p1.join()

    for i in range(nMCS):
        array = monte_carlo()
        integers.append(array[0])
        arrays.append(array[1])
and here is the error I got with this edit. I am still learning multiprocessing myself, so other people may be better suited to help with this.
Process Process-6:
Traceback (most recent call last):
  File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 252, in _bootstrap
    self.run()
  File "c:\users\lunar\appdata\local\continuum\anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
TypeError: 'list' object is not callable
I am entirely new to parallel computing, in fact, to numerical methods. I am trying to solve a differential equation using python solve_ivp of the following form:
y''(x) + (a^2 + x^2)y(x) = 0
y(0)=1
y'(0)=0
x=(0,100)
I want to solve for a range of a and write a file as a[i] y[i](80).
The original equation is quite complicated, but essentially the structure is the same as defined above. I have used a for loop and it is taking a lot of time for computation. Searching online, I came across this beautiful website and found this question and a related answer that may solve the problem I am facing.
I tried the original code provided in the solution; however, the output produced is not properly sorted. I mean, the second column is not in proper order.
q1 a1 Y1
q1 a2 Y3
q1 a4 Y4
q1 a3 Y3
q1 a5 Y5
...
I have even tried with one loop with one parameter, but the same issue still remains. Below is my code with the same multiprocessing method but with solve_ivp
import numpy as np
import scipy.integrate
import multiprocessing as mp
from scipy.integrate import solve_ivp

def fun(t, y):
    # replace this function with whatever function you want to work with
    # (this one is the example function from the scipy docs for odeint)
    theta, omega = y
    dydt = [omega, -a*omega - q*np.sin(theta)]
    return dydt

#definitions of work thread and write thread functions
tspan = np.linspace(0, 10, 201)

def run_thread(input_queue, output_queue):
    # run threads will pull tasks from the input_queue, push results into output_queue
    while True:
        try:
            queueitem = input_queue.get(block = False)
            if len(queueitem) == 3:
                a, q, t = queueitem
                sol = solve_ivp(fun, [tspan[0], tspan[-1]], [1, 0], method='RK45', t_eval=tspan)
                F = 1 + sol.y[0].T[157]
                output_queue.put((q, a, F))
        except Exception as e:
            print(str(e))
            print("Queue exhausted, terminating")
            break

def write_thread(queue):
    # write thread will pull results from output_queue, write them to outputfile.txt
    f1 = open("outputfile.txt", "w")
    while True:
        try:
            queueitem = queue.get(block = False)
            if queueitem[0] == "TERMINATE":
                f1.close()
                break
            else:
                q, a, F = queueitem
                print("{} {} {} \n".format(q, a, F))
                f1.write("{} {} {} \n".format(q, a, F))
        except:
            # necessary since it will throw an error whenever output_queue is empty
            pass

# define time point sequence
t = np.linspace(0, 10, 201)

# prepare input and output Queues
mpM = mp.Manager()
input_queue = mpM.Queue()
output_queue = mpM.Queue()

# prepare tasks, collect them in input_queue
for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        # Your computations as commented here will now happen in run_threads as defined above and created below
        # print('Solving for q = {}, a = {}'.format(q,a))
        # sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        # print(t[157])
        # F = 1 + sol1[157]
        input_tupel = (a, q, t)
        input_queue.put(input_tupel)

# create threads
thread_number = mp.cpu_count()
procs_list = [mp.Process(target = run_thread , args = (input_queue, output_queue)) for i in range(thread_number)]
write_proc = mp.Process(target = write_thread, args = (output_queue,))

# start threads
for proc in procs_list:
    proc.start()
write_proc.start()

# wait for run_threads to finish
for proc in procs_list:
    proc.join()

# terminate write_thread
output_queue.put(("TERMINATE",))
write_proc.join()
Kindly let me know what is wrong with my use of multiprocessing so that I can learn a bit about multiprocessing in Python along the way. Also, I would much appreciate it if anyone could tell me about the most elegant/efficient way(s) of handling such a computation in Python. Thanks
What you want is something of an online sort. In this case, you know the order in which you want to present results (i.e., the input order), so just accumulate the outputs in a priority queue and pop elements from it when they match the next key you’re expecting.
A trivial example without showing the parallelism:
import heapq

def sort_follow(pairs, ref):
    """
    Sort (a prefix of) pairs so as to have its first components
    be the elements of (sorted) ref in order.
    Uses memory proportional to the disorder in pairs.
    """
    heap = []
    pairs = iter(pairs)
    for r in ref:
        while not heap or heap[0][0] != r:
            heapq.heappush(heap, next(pairs))
        yield heapq.heappop(heap)
The virtue of this approach—the reduced memory footprint—is probably irrelevant for small results of just a few floats, but it’s easy enough to apply.
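To show how this plugs into the parallel side, here is a hedged sketch using pool.imap_unordered. The worker solve_one is a hypothetical stand-in of mine for the real solve_ivp call; it returns the key together with the value so the order can be restored afterwards.

import heapq
import multiprocessing as mp

def sort_follow(pairs, ref):
    # same generator as above: re-emit (key, value) pairs in the order of ref
    heap = []
    pairs = iter(pairs)
    for r in ref:
        while not heap or heap[0][0] != r:
            heapq.heappush(heap, next(pairs))
        yield heapq.heappop(heap)

def solve_one(a):
    # hypothetical worker: stands in for the actual solve_ivp computation
    return a, a ** 2

if __name__ == '__main__':
    a_values = [0.1 * i for i in range(100)]    # keys, already in the desired order

    with mp.Pool() as pool:
        # results arrive in whatever order the workers finish...
        unordered = pool.imap_unordered(solve_one, a_values)
        # ...and come out of sort_follow in the original key order
        for a, y in sort_follow(unordered, a_values):
            print(a, y)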
I tried to write a merge sort with multiprocessing solution
from heapq import merge
from multiprocessing import Process

def merge_sort1(m):
    if len(m) < 2:
        return m
    middle = len(m) // 2

    left = Process(target=merge_sort1, args=(m[:middle],))
    left.start()
    right = Process(target=merge_sort1, args=(m[middle:],))
    right.start()

    for p in (left, right):
        p.join()

    result = list(merge(left, right))
    return result
Test it with arr
In [47]: arr = list(range(9))
In [48]: random.shuffle(arr)
It reports an error:
In [49]: merge_sort1(arr)
TypeError: 'Process' object is not iterable
What's the problem with my code?
merge(left, right) tries to merge two processes, whereas you presumably want to merge the two lists that resulted from each process. Note that the return value of the function passed to Process is lost; it is a different process, not just a different thread, and you can't very easily shuffle data back to the parent, so Python doesn't do that by default. You need to be explicit and code such a channel yourself. Fortunately, there are multiprocessing datatypes to help you; for example, multiprocessing.Pipe:
from heapq import merge
import random
import multiprocessing

def merge_sort1(m, send_end=None):
    if len(m) < 2:
        result = m
    else:
        middle = len(m) // 2
        inputs = [m[:middle], m[middle:]]
        pipes = [multiprocessing.Pipe(False) for _ in inputs]
        processes = [multiprocessing.Process(target=merge_sort1, args=(input, send_end))
                     for input, (recv_end, send_end) in zip(inputs, pipes)]
        for process in processes: process.start()
        for process in processes: process.join()
        results = [recv_end.recv() for recv_end, send_end in pipes]
        result = list(merge(*results))
    if send_end:
        send_end.send(result)
    else:
        return result

arr = list(range(9))
random.shuffle(arr)
print(merge_sort1(arr))
I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.
from multiprocessing import Process, Array
import scipy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    arr = Array('d', unshared_arr)
    print "Originally, the first two elements of arr = %s"%(arr[:2])

    # Create, start, and finish the child processes
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Printing out the changed values
    print "Now, the first two elements of arr = %s"%arr[:2]
This produces output such as:
Originally, the first two elements of arr = [0.3518653236697369, 0.517794725524976]
Now, the first two elements of arr = [-0.3518653236697369, 0.517794725524976]
The array can be accessed in a ctypes manner, e.g. arr[i] makes sense. However, it is not a numpy array, and I cannot perform operations such as -1*arr, or arr.sum(). I suppose a solution would be to convert the ctypes array into a numpy array. However (besides not being able to make this work), I don't believe it would be shared anymore.
It seems there would be a standard solution to what has to be a common problem.
To add to @unutbu's (no longer available) and @Henry Gomersall's answers: you could use shared_arr.get_lock() to synchronize access when needed:
shared_arr = mp.Array(ctypes.c_double, N)
# ...
def f(i):  # could be anything numpy accepts as an index, such as another numpy array
    with shared_arr.get_lock():  # synchronize access
        arr = np.frombuffer(shared_arr.get_obj())  # no data copying
        arr[i] = -arr[i]
Example
import ctypes
import logging
import multiprocessing as mp

from contextlib import closing

import numpy as np

info = mp.get_logger().info

def main():
    logger = mp.log_to_stderr()
    logger.setLevel(logging.INFO)

    # create shared array
    N, M = 100, 11
    shared_arr = mp.Array(ctypes.c_double, N)
    arr = tonumpyarray(shared_arr)

    # fill with random values
    arr[:] = np.random.uniform(size=N)
    arr_orig = arr.copy()

    # write to arr from different processes
    with closing(mp.Pool(initializer=init, initargs=(shared_arr,))) as p:
        # many processes access the same slice
        stop_f = N // 10
        p.map_async(f, [slice(stop_f)]*M)

        # many processes access different slices of the same array
        assert M % 2  # odd
        step = N // 10
        p.map_async(g, [slice(i, i + step) for i in range(stop_f, N, step)])
    p.join()
    assert np.allclose(((-1)**M)*tonumpyarray(shared_arr), arr_orig)

def init(shared_arr_):
    global shared_arr
    shared_arr = shared_arr_  # must be inherited, not passed as an argument

def tonumpyarray(mp_arr):
    return np.frombuffer(mp_arr.get_obj())

def f(i):
    """synchronized."""
    with shared_arr.get_lock():  # synchronize access
        g(i)

def g(i):
    """no synchronization."""
    info("start %s" % (i,))
    arr = tonumpyarray(shared_arr)
    arr[i] = -1 * arr[i]
    info("end %s" % (i,))

if __name__ == '__main__':
    mp.freeze_support()
    main()
If you don't need synchronized access or you create your own locks then mp.Array() is unnecessary. You could use mp.sharedctypes.RawArray in this case.
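As a rough illustration of that RawArray variant, here is a sketch of mine mirroring the Pool example above; the names init and negate are my own, and it assumes each index is written by exactly one task so no locking is needed.

import ctypes
import multiprocessing as mp
import numpy as np
from multiprocessing.sharedctypes import RawArray

def init(raw_arr_):
    # the RawArray is handed to each worker at pool start-up and kept global
    global raw_arr
    raw_arr = raw_arr_

def negate(i):
    # wrap the RawArray in a numpy view; no lock, no data copying
    arr = np.frombuffer(raw_arr, dtype=np.float64)
    arr[i] = -arr[i]

if __name__ == '__main__':
    N = 100
    raw_arr = RawArray(ctypes.c_double, N)
    arr = np.frombuffer(raw_arr, dtype=np.float64)
    arr[:] = np.random.uniform(size=N)
    arr_orig = arr.copy()

    with mp.Pool(initializer=init, initargs=(raw_arr,)) as pool:
        # each index is written by exactly one task, so no synchronization is needed
        pool.map(negate, range(N))

    assert np.allclose(arr, -arr_orig)  # the changes are visible in the parent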
The Array object has a get_obj() method associated with it, which returns the ctypes array which presents a buffer interface. I think the following should work...
from multiprocessing import Process, Array
import scipy
import numpy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    a = Array('d', unshared_arr)
    print "Originally, the first two elements of arr = %s"%(a[:2])

    # Create, start, and finish the child process
    p = Process(target=f, args=(a,))
    p.start()
    p.join()

    # Print out the changed values
    print "Now, the first two elements of arr = %s"%a[:2]

    b = numpy.frombuffer(a.get_obj())

    b[0] = 10.0
    print a[0]
When run, this prints out the first element of a now being 10.0, showing a and b are just two views into the same memory.
In order to make sure it is still multiprocessor safe, I believe you will have to use the acquire and release methods that exist on the Array object, a, and its built-in lock to make sure it is all safely accessed (though I'm not an expert on the multiprocessing module).
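A small sketch of what that locking could look like, using the with-statement form of the Array's built-in lock around a read-modify-write; the increment loop is an illustration of mine, not part of the question.

import numpy
from multiprocessing import Process, Array

def f(a):
    for _ in range(1000):
        # hold the Array's built-in lock across the whole read-modify-write
        with a.get_lock():
            b = numpy.frombuffer(a.get_obj())
            b[0] += 1.0

if __name__ == '__main__':
    a = Array('d', [0.0] * 10)
    procs = [Process(target=f, args=(a,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(a[0])  # 4000.0 with the lock; without it the count can come up short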
While the answers already given are good, there is a much easier solution to this problem provided two conditions are met:
You are on a POSIX-compliant operating system (e.g. Linux, Mac OSX); and
Your child processes need read-only access to the shared array.
In this case you do not need to fiddle with explicitly making variables shared, as the child processes will be created using a fork. A forked child automatically shares the parent's memory space. In the context of Python multiprocessing, this means it shares all module-level variables; note that this does not hold for arguments that you explicitly pass to your child processes or to the functions you call on a multiprocessing.Pool or so.
A simple example:
import multiprocessing
import numpy as np

# will hold the (implicitly mem-shared) data
data_array = None

# child worker function
def job_handler(num):
    # built-in id() returns unique memory ID of a variable
    return id(data_array), np.sum(data_array)

def launch_jobs(data, num_jobs=5, num_worker=4):
    global data_array
    data_array = data

    pool = multiprocessing.Pool(num_worker)
    return pool.map(job_handler, range(num_jobs))

# create some random data and execute the child jobs
mem_ids, sumvals = zip(*launch_jobs(np.random.rand(10)))

# this will print 'True' on POSIX OS, since the data was shared
print(np.all(np.asarray(mem_ids) == id(data_array)))
I've written a small python module that uses POSIX shared memory to share numpy arrays between python interpreters. Maybe you will find it handy.
https://pypi.python.org/pypi/SharedArray
Here's how it works:
import numpy as np
import SharedArray as sa
# Create an array in shared memory
a = sa.create("test1", 10)
# Attach it as a different array. This can be done from another
# python interpreter as long as it runs on the same computer.
b = sa.attach("test1")
# See how they are actually sharing the same memory block
a[0] = 42
print(b[0])
# Destroying a does not affect b.
del a
print(b[0])
# See how "test1" is still present in shared memory even though we
# destroyed the array a.
sa.list()
# Now destroy the array "test1" from memory.
sa.delete("test1")
# The array b is not affected, but once you destroy it then the
# data are lost.
print(b[0])
You can use the sharedmem module: https://bitbucket.org/cleemesser/numpy-sharedmem
Here's your original code then, this time using shared memory that behaves like a NumPy array (note the additional last statement calling a NumPy sum() function):
from multiprocessing import Process
import sharedmem
import scipy

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = int(10)
    unshared_arr = scipy.rand(N)
    arr = sharedmem.empty(N)
    arr[:] = unshared_arr.copy()
    print "Originally, the first two elements of arr = %s"%(arr[:2])

    # Create, start, and finish the child process
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Print out the changed values
    print "Now, the first two elements of arr = %s"%arr[:2]

    # Perform some NumPy operation
    print arr.sum()
With Python3.8+ there is the multiprocessing.shared_memory standard library:
# np_sharing.py
from multiprocessing import Process
from multiprocessing.managers import SharedMemoryManager
from multiprocessing.shared_memory import SharedMemory
from typing import Tuple

import numpy as np


def create_np_array_from_shared_mem(
    shared_mem: SharedMemory, shared_data_dtype: np.dtype, shared_data_shape: Tuple[int, ...]
) -> np.ndarray:
    arr = np.frombuffer(shared_mem.buf, dtype=shared_data_dtype)
    arr = arr.reshape(shared_data_shape)
    return arr


def child_process(
    shared_mem: SharedMemory, shared_data_dtype: np.dtype, shared_data_shape: Tuple[int, ...]
):
    """Logic to be executed by the child process"""
    arr = create_np_array_from_shared_mem(shared_mem, shared_data_dtype, shared_data_shape)
    arr[0, 0] = -arr[0, 0]  # modify the array backed by shared memory


def main():
    """Logic to be executed by the parent process"""

    # Data to be shared:
    data_to_share = np.random.rand(10, 10)

    SHARED_DATA_DTYPE = data_to_share.dtype
    SHARED_DATA_SHAPE = data_to_share.shape
    SHARED_DATA_NBYTES = data_to_share.nbytes

    with SharedMemoryManager() as smm:
        shared_mem = smm.SharedMemory(size=SHARED_DATA_NBYTES)

        arr = create_np_array_from_shared_mem(shared_mem, SHARED_DATA_DTYPE, SHARED_DATA_SHAPE)
        arr[:] = data_to_share  # load the data into shared memory

        print(f"The [0,0] element of arr is {arr[0,0]}")  # before

        # Run child process:
        p = Process(target=child_process, args=(shared_mem, SHARED_DATA_DTYPE, SHARED_DATA_SHAPE))
        p.start()
        p.join()

        print(f"The [0,0] element of arr is {arr[0,0]}")  # after

        del arr  # delete np array so the shared memory can be deallocated


if __name__ == "__main__":
    main()
Running the script:
$ python3.10 np_sharing.py
The [0,0] element of arr is 0.262091705529628
The [0,0] element of arr is -0.262091705529628
Since the arrays in different processes share the same underlying memory buffer, the standard caveats regarding race conditions apply.
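As a hedged illustration of guarding against such races, one option is to pass an ordinary multiprocessing.Lock to the children alongside the shared memory block and hold it across every read-modify-write; the worker add_one below is a hypothetical example of mine, not part of the snippet above.

import numpy as np
from multiprocessing import Process, Lock
from multiprocessing.managers import SharedMemoryManager

def add_one(shm, lock):
    arr = np.frombuffer(shm.buf, dtype=np.float64)
    for _ in range(10_000):
        with lock:               # guard the read-modify-write on shared memory
            arr[0] += 1.0

if __name__ == "__main__":
    with SharedMemoryManager() as smm:
        shm = smm.SharedMemory(size=np.dtype(np.float64).itemsize)
        arr = np.frombuffer(shm.buf, dtype=np.float64)
        arr[0] = 0.0

        lock = Lock()
        procs = [Process(target=add_one, args=(shm, lock)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        print(arr[0])  # 40000.0; without the lock some increments could be lost
        del arr        # release the view before the manager frees the block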
I am running a simulation using Runge-Kutta. At every time step two FFT of two independent variables are necessary which can be parallelized. I implemented the code like this:
from multiprocessing import Pool
import numpy as np

pool = Pool(processes=2)    # I like to calculate only 2 FFTs in parallel
                            # in every time step, therefore 2 processes

def Splitter(args):
    '''I have to pass 2 arguments'''
    return makeSomething(*args)

def makeSomething(a, b):
    '''dummy function instead of the one with the FFT'''
    return a*b

def RungeK():
    # ...
    # a lot of code which creates the vectors A and B and calculates
    # one Runge-Kutta step for them
    # ...
    n = 20                  # Just something for the example
    A = np.arange(50000)
    B = np.ones_like(A)
    for i in xrange(n):     # loop over the time steps
        A *= np.mean(B)*B - A
        B *= np.sqrt(A)
        results = pool.map(Splitter, [(A, 3), (B, 2)])
        A = results[0]
        B = results[1]
    print np.mean(A)        # Some output
    print np.max(B)

if __name__== '__main__':
    RungeK()
Unfortunately, Python generates an unlimited number of processes after reaching the loop; before that, it seems that only two processes are running. My memory also fills up. Adding a
pool.close()
pool.join()
after the loop does not solve my problem, and putting it inside the loop makes no sense to me. Hope you can help.
Move the creation of the pool into the RungeK function;
def RungeK():
    # ...
    # a lot of code which creates the vectors A and B and calculates
    # one Runge-Kutta step for them
    # ...
    pool = Pool(processes=2)
    n = 20                  # Just something for the example
    A = np.arange(50000)
    B = np.ones_like(A)
    for i in xrange(n):     # loop over the time steps
        A *= np.mean(B)*B - A
        B *= np.sqrt(A)
        results = pool.map(Splitter, [(A, 3), (B, 2)])
        A = results[0]
        B = results[1]
    pool.close()
    print np.mean(A)        # Some output
    print np.max(B)
Alternatively, put it in the main block.
This is probably a side effect of how multiprocessing works. E.g. on MS windows, you need to be able to import the main module without side effects (like creating new processes).
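A minimal sketch of that guard pattern: anything that spawns processes stays under the __main__ check, so importing the module (which happens in each child on Windows) has no side effects. The names work and main are mine, purely for illustration.

from multiprocessing import Pool

def work(x):
    return x * x

def main():
    with Pool(processes=2) as pool:
        print(pool.map(work, range(10)))

if __name__ == '__main__':
    main()   # only the original process ever reaches this line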
I am confused with Python multiprocessing.
I am trying to speed up a function which process strings from a database but I must have misunderstood how multiprocessing works because the function takes longer when given to a pool of workers than with “normal processing”.
Here is an example of what I am trying to achieve.
from time import clock, time
from multiprocessing import Pool, freeze_support
from random import choice

def foo(x):
    TupWerteMany = []
    for i in range(0,len(x)):
        TupWerte = []
        s = list(x[i][3])
        NewValue = choice(s)+choice(s)+choice(s)+choice(s)
        TupWerte.append(NewValue)
        TupWerte = tuple(TupWerte)
        TupWerteMany.append(TupWerte)
    return TupWerteMany

if __name__ == '__main__':
    start_time = time()
    List = [(u'1', u'aa', u'Jacob', u'Emily'),
            (u'2', u'bb', u'Ethan', u'Kayla')]
    List1 = List*1000000

    # METHOD 1 : NORMAL (takes 20 seconds)
    x2 = foo(List1)
    print x2[1:3]

    # METHOD 2 : APPLY_ASYNC (takes 28 seconds)
    # pool = Pool(4)
    # Werte = pool.apply_async(foo, args=(List1,))
    # x2 = Werte.get()
    # print '--------'
    # print x2[1:3]
    # print '--------'

    # METHOD 3: MAP (!! DOES NOT WORK !!)
    # pool = Pool(4)
    # Werte = pool.map(foo, args=(List1,))
    # x2 = Werte.get()
    # print '--------'
    # print x2[1:3]
    # print '--------'

    print 'Time Elaspse: ', time() - start_time
My questions:
Why does apply_async take longer than the “normal way”?
What am I doing wrong with map?
Does it make sense to speed up such tasks with multiprocessing at all?
Finally: after all I have read here, I am wondering whether multiprocessing in Python works on Windows at all?
So your first problem is that there is no actual parallelism happening in foo(x); you are passing the entire list to the function once.
1)
The idea of a process pool is to have many processes doing computations on separate bits of some data.
# METHOD 2 : APPLY_ASYNC
jobs = 4
size = len(List1)
pool = Pool(4)
results = []
# split the list into 4 equally sized chunks and submit those to the pool
heads = range(size/jobs, size, size/jobs) + [size]
tails = range(0,size,size/jobs)
for tail,head in zip(tails, heads):
    werte = pool.apply_async(foo, args=(List1[tail:head],))
    results.append(werte)

pool.close()
pool.join() # wait for the pool to be done

for result in results:
    werte = result.get() # get the return value from the sub jobs
This will only give you an actual speedup if the time it takes to process each chunk is greater than the time it takes to launch the processes; that is the situation here, with four processes and four jobs to be done, and of course these dynamics change if you've got 4 processes and 100 jobs to be done. Remember that you are creating a completely new Python interpreter four times; this isn't free.
2) The problem you have with map is that it applies foo to EVERY element in List1 in a separate process, and this will take quite a while. So if your pool has 4 processes, map will pop an item off the list four times and send each one to a process to be dealt with, wait for the processes to finish, pop some more items off the list, wait for the processes to finish again, and so on. This makes sense only if processing a single item takes a long time, for instance if every item is a file name pointing to a one-gigabyte text file. But as it stands, map will just take a single string of the list and pass it to foo, whereas apply_async takes a slice of the list. Try the following code:
def foo(thing):
    print thing

map(foo, ['a','b','c','d'])
That's the built-in python map and will run a single process, but the idea is exactly the same for the multiprocess version.
Added as per J.F. Sebastian's comment: you can, however, use the chunksize argument to map to specify an approximate size for each chunk.
pool.map(foo, List1, chunksize=size/jobs)
I don't know though if there is a problem with map on Windows as I don't have one available for testing.
3) Yes, given that your problem is big enough to justify forking out new Python interpreters.
4) I can't give you a definitive answer on that as it depends on the number of cores/processors etc., but in general it should be fine on Windows.
On question (2)
With the guidance of Dougal and Matti, I figured out what went wrong.
The original foo function processes a whole list at once, while map requires a function that processes single elements.
The new function should be
def foo2(x):
    TupWerte = []
    s = list(x[3])
    NewValue = choice(s)+choice(s)+choice(s)+choice(s)
    TupWerte.append(NewValue)
    TupWerte = tuple(TupWerte)
    return TupWerte
and the block to call it:
jobs = 4
size = len(List1)
pool = Pool()
#Werte = pool.map(foo2, List1, chunksize=size/jobs)
Werte = pool.map(foo2, List1)
pool.close()
print Werte[1:3]
Thanks to all of you who helped me understand this.
Results of all methods:
for List * 2 Mio records: normal 13.3 seconds; parallel with apply_async: 7.5 seconds; parallel with map with chunksize: 7.3 seconds; without chunksize: 5.2 seconds
Here's a generic multiprocessing template if you are interested.
import multiprocessing as mp
import time

def worker(x):
    time.sleep(0.2)
    print "x= %s, x squared = %s" % (x, x*x)
    return x*x

def apply_async():
    pool = mp.Pool()
    for i in range(100):
        pool.apply_async(worker, args = (i, ))
    pool.close()
    pool.join()

if __name__ == '__main__':
    apply_async()
And the output looks like this:
x= 0, x squared = 0
x= 1, x squared = 1
x= 2, x squared = 4
x= 3, x squared = 9
x= 4, x squared = 16
x= 6, x squared = 36
x= 5, x squared = 25
x= 7, x squared = 49
x= 8, x squared = 64
x= 10, x squared = 100
x= 11, x squared = 121
x= 9, x squared = 81
x= 12, x squared = 144
As you can see, the numbers are not in order, as they are being executed asynchronously.
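If you also need the return values, and need them in submission order, one option (a sketch of mine, not part of the template above) is to keep the AsyncResult handles and call get() on them in order, or simply use pool.map, which does that bookkeeping for you:

import multiprocessing as mp
import time

def worker(x):
    time.sleep(0.2)
    return x * x

if __name__ == '__main__':
    with mp.Pool() as pool:
        # keep the AsyncResult handles in submission order...
        handles = [pool.apply_async(worker, args=(i,)) for i in range(10)]
        squares = [h.get() for h in handles]   # ...and collect them in that order
        print(squares)                         # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

        # pool.map returns results in input order as well
        print(pool.map(worker, range(10)))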