I'm using a construct similar to this example to run my processing in parallel with a progress bar courtesy of tqdm...
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return square

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for _ in p.imap_unordered(_foo, range(0, max_)):
                pbar.update()
    results = p.join()  ## My attempt to combine results
results is always NoneType though, and I cannot work out how to combine my results. I understand that a with ...: block automatically closes whatever it is working with on completion.
I've tried doing away with the outer with:
if __name__ == '__main__':
    max_ = 10
    p = Pool(processes=8)
    with tqdm(total=max_) as pbar:
        for _ in p.imap_unordered(_foo, range(0, max_)):
            pbar.update()
    p.close()
    results = p.join()
    print(f"Results : {results}")
Stumped as to how to join() my results?
Your call to p.join() just waits for all the pool processes to end and returns None. The call is actually unnecessary since you are using the pool as a context manager (that is, you have specified with Pool(processes=2) as p:). When that block terminates, an implicit call is made to p.terminate(), which immediately terminates the pool processes and any tasks that may be running or queued up to run (there are none in your case).
It is, in fact, iterating the iterator returned by the call to p.imap_unordered that yields each return value from your worker function, _foo. But since you are using method imap_unordered, the results may not come back in submission order. In other words, you cannot assume that the return values will arrive in the succession 0, 1, 4, 9, etc. There are many ways to handle this, such as having your worker function return the original argument along with the squared value:
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return my_number, square  # return the argument along with the result

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        results = [None] * max_  # preallocate the results list
        with tqdm(total=max_) as pbar:
            for x, result in p.imap_unordered(_foo, range(0, max_)):
                results[x] = result
                pbar.update()
    print(results)
The second way is to not use imap_unordered, but rather apply_async with a callback function. The disadvantage of this is that for large iterables you do not have the option of specifying a chunksize argument as you do with imap_unordered:
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return square

if __name__ == '__main__':
    def my_callback(_):  # ignore result
        pbar.update()  # update progress bar when a result is produced

    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            async_results = [p.apply_async(_foo, (x,), callback=my_callback) for x in range(0, max_)]
            # wait for all tasks to complete:
            p.close()
            p.join()
        results = [async_result.get() for async_result in async_results]
    print(results)
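For completeness, a third variant (my own sketch, not part of the original answer): plain imap preserves submission order and still accepts a chunksize argument, so no index bookkeeping is needed; the trade-off is that one slow early task delays every result queued behind it.

from multiprocessing import Pool
from tqdm import tqdm

def _foo(my_number):
    return my_number * my_number

if __name__ == '__main__':
    max_ = 30
    with Pool(processes=2) as p:
        results = []
        with tqdm(total=max_) as pbar:
            # imap yields results in submission order, so results[i] == i * i
            for result in p.imap(_foo, range(max_), chunksize=4):
                results.append(result)
                pbar.update()
    print(results)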
Related
In my GUI application, I want to use multiprocessing to accelerate the calculation. I can already use multiprocessing and collect the calculated results. Now I want the subprocess to inform the main process that the calculation is finished, but I cannot find any solution.
My multiprocessing looks like:
import multiprocessing
from multiprocessing import Process
import numpy as np

class MyProcess(Process):
    def __init__(self, name, array):
        super(MyProcess, self).__init__()
        self.name = name
        self.array = array
        recv_end, send_end = multiprocessing.Pipe(False)
        self.recv = recv_end
        self.send = send_end

    def run(self):
        s = 0
        for a in self.array:
            s += a
        self.send.send(s)

    def getResult(self):
        return self.recv.recv()

if __name__ == '__main__':
    process_list = []
    for i in range(5):
        a = np.random.random(10)
        print(i, ' correct result: ', a.sum())
        p = MyProcess(str(i), a)
        p.start()
        process_list.append(p)
    for p in process_list:
        p.join()
    for p in process_list:
        print(p.name, ' subprocess result: ', p.getResult())
I want the subprocess to inform the main process that the calculation is finished so that I can show the result in my GUI.
Any suggestion is appreciated.
Assuming you would like to do something with a result (the sum of a numpy array, in your case) as soon as it has been generated, I would use a multiprocessing.pool.Pool with method imap_unordered, which returns results in the order they are generated (that is, in completion order). In this case you need to pass your worker function the index of the array in the list of arrays along with the array itself, and have it return that index along with the array's sum, since this is the only way for the main process to know which array a given sum belongs to:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(tpl):
    # unpack tuple:
    i, array = tpl
    s = 0
    for a in array:
        s += a
    return i, s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # get each result as soon as it has been returned:
    for i, s in pool.imap_unordered(compute_sum, zip(range(n), array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 0: 4.760033809335711, actual result: 4.76003380933571
correct result 1: 5.486818812843256, actual result: 5.486818812843257
correct result 2: 5.400374562564179, actual result: 5.400374562564179
correct result 3: 4.079376706247242, actual result: 4.079376706247242
correct result 4: 4.20860716467263, actual result: 4.20860716467263
In the above run the actual results happened to be generated in the same order in which the tasks were submitted. To demonstrate that in general the results could be generated in arbitrary order, depending on how long the worker function takes to compute each result, we introduce some randomness into the processing time:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(tpl):
    import time
    # unpack tuple:
    i, array = tpl
    # results will be generated in random order:
    time.sleep(np.random.sample())
    s = 0
    for a in array:
        s += a
    return i, s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # get each result as soon as it has been returned:
    for i, s in pool.imap_unordered(compute_sum, zip(range(n), array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 4: 6.662288433360379, actual result: 6.66228843336038
correct result 0: 3.352901187256162, actual result: 3.3529011872561614
correct result 3: 5.836344458981557, actual result: 5.836344458981557
correct result 2: 2.9950208717729656, actual result: 2.9950208717729656
correct result 1: 5.144743159869513, actual result: 5.144743159869513
If you are satisfied with getting back results in task-submission order rather than task-completion order, then use method imap and there is no need to pass the array indices back and forth:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(array):
    s = 0
    for a in array:
        s += a
    return s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    for i, s in enumerate(pool.imap(compute_sum, array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 0: 4.841913985702773, actual result: 4.841913985702773
correct result 1: 4.836923014762733, actual result: 4.836923014762733
correct result 2: 4.91242274200897, actual result: 4.91242274200897
correct result 3: 4.701913574838348, actual result: 4.701913574838349
correct result 4: 5.813666896917504, actual result: 5.813666896917503
Update
You can also use method apply_async, specifying a callback function to be invoked whenever a result is returned from your worker function, compute_sum. apply_async returns a multiprocessing.pool.AsyncResult instance whose get method blocks until the task has completed and then returns the task's return value. But since we are using a callback function that is automatically called with the result when each task completes, there is no need to save the AsyncResult instances or call get on them. We instead rely on calling multiprocessing.pool.Pool.close() followed by multiprocessing.pool.Pool.join() to block until all submitted tasks have completed and their results have been returned:
from multiprocessing import Pool, cpu_count
import numpy as np
from functools import partial

def compute_sum(i, array):
    s = 0
    for a in array:
        s += a
    return i, s

def calculation_display(result, t):
    # Unpack returned tuple:
    i, s = t
    print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    result[i] = s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    result = [0] * n
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # Get each result as soon as it has been returned.
    # Pass to our callback as the first argument the results list.
    # The worker's return value will then be the second argument:
    my_callback = partial(calculation_display, result)
    for i, array in enumerate(array_list):
        pool.apply_async(compute_sum, args=(i, array), callback=my_callback)
    # Wait for all submitted tasks to complete:
    pool.close()
    pool.join()
    print('results:', result)
Prints:
correct result 0: 5.381579338696546, actual result: 5.381579338696546
correct result 1: 3.8780497856741274, actual result: 3.8780497856741274
correct result 2: 4.548733927791488, actual result: 4.548733927791488
correct result 3: 5.048921365623381, actual result: 5.048921365623381
correct result 4: 4.852415747983676, actual result: 4.852415747983676
results: [5.381579338696546, 3.8780497856741274, 4.548733927791488, 5.048921365623381, 4.852415747983676]
I'm having trouble working out how to compare the return results from each of the multiprocessing runs.
I am using multiprocessing for my function. My function returns a value. I want to run my function 5 times and find which process has the lowest return value. My code is below.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return 'Current best value: ', currbestVal, 'for process {}'.format(os.getpid())

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
Output as of now:
Current best value: 12909.5 for process 21918
Current best value: 12091.5 for process 21920
Current best value: 12350.0 for process 21919
Current best value: 12000.5 for process 21921
Current best value: 11901.0 for process 21922
Finished in 85.86 second(s)
From all 5 return values above, I want to take the data for the value that is the lowest. In this example process 21922 has the lowest value, so I want to assign that value to a parameter:
FinalbestVal = 11901.0
From what I see, you are mixing the responsibilities of your function, and that is causing your issues. The function below returns only the data. The code that runs after the data is collected evaluates the data. Then the data is presented all at once. Separating the responsibilities of your code is a decades-old key to cleaner code. The reason it has been held up as a best practice for so long is precisely problems like the one you have run into. It also makes code easier to reuse later without having to change it.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return [currbestVal, os.getpid()]

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]
    best_value = -1
    values_list = []
    for f in concurrent.futures.as_completed(results):
        values = f.result()
        values_list.append(values)
        if best_value == -1 or values[0] < best_value:
            best_value = values[0]

for i in values_list:
    print(f'Current best value: {i[0]} for process {i[1]}')

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
print(f'Final best = {best_value}')
If I'm not mistaken, you could simply return currbestVal in do_processVal() instead of the string.
Then you can collect the values and select the minimum (a fuller sketch follows the snippet below):
(...)
values = []
for f in concurrent.futures.as_completed(results):
    values.append(f.result())
print(f"FinalbestVal = {min(values)}")
(...)
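A fuller sketch of that idea (this assumes do_processVal from the question, modified to return only the numeric currbestVal):

import concurrent.futures

# do_processVal is assumed to be the question's function, returning only currbestVal
if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(do_processVal) for _ in range(5)]
        values = [f.result() for f in concurrent.futures.as_completed(futures)]
    print(f"FinalbestVal = {min(values)}")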
Try this:
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return currbestVal, 'for process {}'.format(os.getpid())

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]
    minimal_value = 1000000
    for f in concurrent.futures.as_completed(results):
        res, s = f.result()
        if res < minimal_value:
            minimal_value = res
        # not sure what format s will have, you might need to change that:
        print('Current best value: ' + str(res) + ' ' + s)
    print("minimal value: " + str(minimal_value))

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
I want to translate a huge Matlab model to Python. Therefore I need to work on the key functions first. One key function handles parallel processing. Basically, the input is a matrix of parameters, in which every row represents the parameters for one run. These parameters are used within a computation-heavy function. This computation-heavy function should run in parallel; I don't need the results of a previous run for any other run, so all processes can run independently of each other.
Why is starmap_async slower on my PC? Also, when I add more code (to test consecutive computation) my Python crashes (I use Spyder). Can you give me any advice?
import time
import numpy as np
import multiprocessing as mp
from functools import partial

# Create simulated data matrix
data = np.random.random((100, 3000))
data = np.column_stack((np.arange(1, len(data)+1, 1), data))

def EAF_DGL(*z, package_num):
    sum_row = 0
    for i in range(1, np.shape(z)[0]):
        sum_row = sum_row + z[i]
    func_result = np.column_stack((package_num, z[0], sum_row))
    return func_result

t0 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap_async(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])]).get()
        pool.close()
        pool.join()

t1 = time.time()
calculation_time_parallel_async = t1 - t0
print(calculation_time_parallel_async)

t2 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])])
        pool.close()
        pool.join()

t3 = time.time()
calculation_time_parallel = t3 - t2
print(calculation_time_parallel)
I am trying to understand how multiprocessing works with Python. Here's my test code:
import numpy as np
import multiprocessing
import time

def worker(a):
    for i in range(len(a)):
        for j in arr2:
            a[i] = a[i]*j
    return len(a)

arr2 = np.random.rand(10000).tolist()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    cores = multiprocessing.cpu_count()
    arr1 = np.random.rand(1000000).tolist()
    tmp = time.time()
    pool = multiprocessing.Pool(processes=cores)
    result = pool.map(worker, [arr1], chunksize=1000000/(cores-1))
    print "mp time", time.time()-tmp
I have 8 cores. It usually ends up with 7 processes using only ~3% of the CPU for about a second, while the last process uses ~1/8 of the CPU seemingly forever (it has been running for about 15 minutes).
I understand that the interprocess communication usually bounds the complexity of parallel programming, but does it usually take this long? What else could cause the last process to take forever?
This thread: Python multiprocessing never joins seems to address a similar issue but it doesn't solve the problem with Pool.
It looks like you want to divide the work into chunks. You can use the range function to partition the data. On Linux, forked processes get a copy-on-write view of the parent memory, so you can just pass down the indexes you want to work on. On Windows, no such luck: you need to pass in each sublist. This program should do it:
import numpy as np
import multiprocessing
import time
import platform

def worker(a):
    if platform.system() == "Linux":
        # on linux we passed in start:len
        start, length = a
        a = arr1[start:length]
    for i in range(len(a)):
        for j in arr2:
            a[i] = a[i]*j
    return len(a)

arr2 = np.random.rand(10000).tolist()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    cores = multiprocessing.cpu_count()
    arr1 = np.random.rand(1000000).tolist()
    tmp = time.time()
    pool = multiprocessing.Pool(processes=cores)
    chunk = (len(arr1)+cores-1)//cores
    # on Windows, pass the sublist; on linux pass just the indexes and let the
    # worker slice its view of the parent memory space
    if platform.system() == "Linux":
        seq = [(i, i+chunk) for i in range(0, len(arr1), chunk)]
    else:
        seq = [arr1[i:i+chunk] for i in range(0, len(arr1), chunk)]
    result = pool.map(worker, seq, chunksize=1)
    print "mp time", time.time()-tmp
Your problem is here:
pool.map will automatically iterate over the object you pass it, which is [arr1] in your program. Notice that the object is [arr1], not arr1, which means the length of the iterable you pass to pool.map is only one.
I think the simplest solution is to replace [arr1] with arr1.
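A sketch of what that simplest fix could look like (the single-element worker below is my own illustrative rewrite, not code from the question or the answer):

import multiprocessing
import numpy as np

arr2 = np.random.rand(10000).tolist()

def worker(x):
    # process one element of arr1 at a time
    for j in arr2:
        x = x * j
    return x

if __name__ == '__main__':
    cores = multiprocessing.cpu_count()
    arr1 = np.random.rand(1000000).tolist()
    pool = multiprocessing.Pool(processes=cores)
    # a large chunksize keeps the number of inter-process messages small
    result = pool.map(worker, arr1, chunksize=len(arr1) // cores)
    pool.close()
    pool.join()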
I am confused by Python multiprocessing.
I am trying to speed up a function which processes strings from a database, but I must have misunderstood how multiprocessing works, because the function takes longer when given to a pool of workers than with “normal processing”.
Here is an example of what I am trying to achieve:
from time import clock, time
from multiprocessing import Pool, freeze_support
from random import choice

def foo(x):
    TupWerteMany = []
    for i in range(0, len(x)):
        TupWerte = []
        s = list(x[i][3])
        NewValue = choice(s)+choice(s)+choice(s)+choice(s)
        TupWerte.append(NewValue)
        TupWerte = tuple(TupWerte)
        TupWerteMany.append(TupWerte)
    return TupWerteMany

if __name__ == '__main__':
    start_time = time()
    List = [(u'1', u'aa', u'Jacob', u'Emily'),
            (u'2', u'bb', u'Ethan', u'Kayla')]
    List1 = List*1000000

    # METHOD 1 : NORMAL (takes 20 seconds)
    x2 = foo(List1)
    print x2[1:3]

    # METHOD 2 : APPLY_ASYNC (takes 28 seconds)
    # pool = Pool(4)
    # Werte = pool.apply_async(foo, args=(List1,))
    # x2 = Werte.get()
    # print '--------'
    # print x2[1:3]
    # print '--------'

    # METHOD 3: MAP (!! DOES NOT WORK !!)
    # pool = Pool(4)
    # Werte = pool.map(foo, args=(List1,))
    # x2 = Werte.get()
    # print '--------'
    # print x2[1:3]
    # print '--------'

    print 'Time Elapsed: ', time() - start_time
My questions:
Why does apply_async take longer than the “normal way”?
What am I doing wrong with map?
Does it make sense to speed up such tasks with multiprocessing at all?
Finally: after all I have read here, I am wondering whether multiprocessing in Python works on Windows at all?
So your first problem is that there is no actual parallelism happening in foo(x); you are passing the entire list to the function once.
1)
The idea of a process pool is to have many processes doing computations on separate bits of some data.
# METHOD 2 : APPLY_ASYNC
jobs = 4
size = len(List1)
pool = Pool(4)
results = []
# split the list into 4 equally sized chunks and submit those to the pool
heads = range(size/jobs, size, size/jobs) + [size]
tails = range(0, size, size/jobs)
for tail, head in zip(tails, heads):
    werte = pool.apply_async(foo, args=(List1[tail:head],))
    results.append(werte)

pool.close()
pool.join()  # wait for the pool to be done

for result in results:
    werte = result.get()  # get the return value from the sub jobs
This will only give you an actual speedup if the time it takes to process each chunk is greater than the time it takes to launch the processes (that is the situation with four processes and four jobs to be done; of course these dynamics change if you've got 4 processes and 100 jobs to be done). Remember that you are creating a completely new Python interpreter four times; this isn't free.
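To get a feel for that fixed cost, you can time a pool that does essentially no work; this is only a rough sketch and the numbers will vary from machine to machine:

import time
from multiprocessing import Pool

def noop(_):
    return None

if __name__ == '__main__':
    t0 = time.time()
    pool = Pool(4)
    # almost all of the elapsed time here is interpreter startup and teardown
    pool.map(noop, range(4))
    pool.close()
    pool.join()
    print('pool startup + teardown overhead: %.3f s' % (time.time() - t0))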
2) The problem you have with map is that it applies foo to EVERY element of List1 as a separate task, and this will take quite a while. So if your pool has 4 processes, map will pop an item off the list four times, send each one to a process to be dealt with, wait for a process to finish, pop some more items off the list, and wait again. This makes sense only if processing a single item takes a long time, for instance if every item is a file name pointing to a one-gigabyte text file. But as it stands, map will just take a single string from the list and pass it to foo, whereas apply_async takes a slice of the list. Try the following code:
def foo(thing):
    print thing

map(foo, ['a', 'b', 'c', 'd'])
That's the built-in Python map, and it runs in a single process, but the idea is exactly the same for the multiprocessing version.
Added as per J.F. Sebastian's comment: you can, however, use the chunksize argument to map to specify an approximate size for each chunk:
pool.map(foo, List1, chunksize=size/jobs)
I don't know, though, whether there is a problem with map on Windows, as I don't have one available for testing.
3) Yes, given that your problem is big enough to justify forking out new Python interpreters.
4) I can't give you a definitive answer on that, as it depends on the number of cores/processors, etc., but in general it should be fine on Windows; one spawn-related caveat is sketched below.
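One Windows-specific caveat worth spelling out (my addition, not part of the original answer): Windows starts worker processes by spawning fresh interpreters rather than forking, so the code that creates the pool must sit under an if __name__ == '__main__': guard, as your script already does. A minimal skeleton, for illustration only:

from multiprocessing import Pool, freeze_support

def work(item):
    return item  # placeholder worker, purely illustrative

if __name__ == '__main__':
    freeze_support()  # only matters for frozen Windows executables
    pool = Pool(4)
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()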
On question (2)
With the guidance of Dougal and Matti, I figured out what went wrong.
The original foo function processes a list of lists, while map requires a function to process single elements.
The new function should be
def foo2(x):
    TupWerte = []
    s = list(x[3])
    NewValue = choice(s)+choice(s)+choice(s)+choice(s)
    TupWerte.append(NewValue)
    TupWerte = tuple(TupWerte)
    return TupWerte
and the block to call it:
jobs = 4
size = len(List1)
pool = Pool()
#Werte = pool.map(foo2, List1, chunksize=size/jobs)
Werte = pool.map(foo2, List1)
pool.close()
print Werte[1:3]
Thanks to all of you who helped me understand this.
Results of all methods:
for List * 2 million records: normal: 13.3 seconds, parallel with apply_async: 7.5 seconds, parallel with map with chunksize: 7.3 seconds, without chunksize: 5.2 seconds
Here's a generic multiprocessing template if you are interested.
import multiprocessing as mp
import time

def worker(x):
    time.sleep(0.2)
    print "x= %s, x squared = %s" % (x, x*x)
    return x*x

def apply_async():
    pool = mp.Pool()
    for i in range(100):
        pool.apply_async(worker, args=(i,))
    pool.close()
    pool.join()

if __name__ == '__main__':
    apply_async()
And the output looks like this:
x= 0, x squared = 0
x= 1, x squared = 1
x= 2, x squared = 4
x= 3, x squared = 9
x= 4, x squared = 16
x= 6, x squared = 36
x= 5, x squared = 25
x= 7, x squared = 49
x= 8, x squared = 64
x= 10, x squared = 100
x= 11, x squared = 121
x= 9, x squared = 81
x= 12, x squared = 144
As you can see, the numbers are not in order, as they are being executed asynchronously.
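If you do need the results back in submission order with this template, one option (a sketch, my own variant of the template above) is to keep the AsyncResult handles and call get() on each one after the pool has finished:

import multiprocessing as mp
import time

def worker(x):
    time.sleep(0.2)
    return x*x

def apply_async_ordered():
    pool = mp.Pool()
    # keep the AsyncResult handles so the values can be read back in submission order
    handles = [pool.apply_async(worker, args=(i,)) for i in range(20)]
    pool.close()
    pool.join()
    return [h.get() for h in handles]

if __name__ == '__main__':
    print(apply_async_ordered())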