Python Multiprocessing Return Value Processing

I'm having trouble comparing the return values from each of my multiprocessing runs.
I am using multiprocessing to run my function. The function returns a value. I want to run the function 5 times and find which process returns the lowest value. My code is below.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return 'Current best value: ', currbestVal, 'for process {}'.format(os.getpid())

from multiprocessing import Pool
import concurrent.futures
from os import getpid
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
Output as of now:
Current best value: 12909.5 for process 21918
Current best value: 12091.5 for process 21920
Current best value: 12350.0 for process 21919
Current best value: 12000.5 for process 21921
Current best value: 11901.0 for process 21922
Finished in 85.86 second(s)
What I want is to take, from all 5 return values above, the data for the lowest value. In this example process 21922 has the lowest value, so I want to assign that value to a parameter:
FinalbestVal = 11901.0

From what I see, you are mixing the responsibilities of your function, and that is causing your issues. The function below returns only the data. The code that runs after the data is collected evaluates it, and then the data is presented all at once. Separating the responsibilities of your code is a decades-old key to cleaner code; it has been held up as a best practice for so long precisely because of problems like the one you have run into, and it also makes code easier to reuse later without having to change it.
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return [currbestVal, os.getpid()]

import concurrent.futures
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]

    best_value = -1
    values_list = []
    for f in concurrent.futures.as_completed(results):
        values = f.result()
        values_list.append(values)
        if best_value == -1 or values[0] < best_value:
            best_value = values[0]

for i in values_list:
    print(f'Current best value: {i[0]} for process {i[1]}')

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')

print(f'Final best = {best_value}')
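If you prefer, once values_list has been collected the same selection can be made in one step with min() and a key, which also keeps the pid of the winning process. A small sketch building on the code above:

# assuming values_list holds [value, pid] pairs as collected above:
best_value, best_pid = min(values_list, key=lambda pair: pair[0])
print(f'Final best = {best_value} from process {best_pid}')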

If I'm not mistaken, you could simply return currbestVal from do_processVal() instead of the string.
Then you can collect the values and select the minimum:
(...)
values = []
for f in concurrent.futures.as_completed(results):
    values.append(f.result())

print(f"FinalbestVal = {min(values)}")
(...)
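Put together, a minimal complete sketch of this approach (assuming do_processVal() is defined as in the question but returns only currbestVal):

import concurrent.futures

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]
    # collect the numeric return values as each task finishes:
    values = [f.result() for f in concurrent.futures.as_completed(results)]

FinalbestVal = min(values)
print(f'FinalbestVal = {FinalbestVal}')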

Try this:
def do_processVal():
    getParamInit()
    do_evaluation()
    currbestVal = bestGlobalVal
    return currbestVal, 'for process {}'.format(os.getpid())

import concurrent.futures
import time
import os

start = time.perf_counter()

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(do_processVal) for _ in range(5)]

    minimal_value = 1000000
    for f in concurrent.futures.as_completed(results):
        res, s = f.result()
        if res < minimal_value:
            minimal_value = res
        # not sure what format s will have, you might need to change that:
        print('Current best value: ' + str(res) + ' ' + s)

print("minimal value: " + str(minimal_value))

finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')

Related

python: how to apply callback function for multiprocess?

In my GUI application, I want to use multiprocessing to accelerate the calculation. I can already use multiprocessing and collect the calculated result. Now I want the subprocess to be able to inform the main process that the calculation is finished, but I cannot find any solution.
My multiprocessing looks like:
import multiprocessing
from multiprocessing import Process
import numpy as np

class MyProcess(Process):
    def __init__(self, name, array):
        super(MyProcess, self).__init__()
        self.name = name
        self.array = array
        recv_end, send_end = multiprocessing.Pipe(False)
        self.recv = recv_end
        self.send = send_end

    def run(self):
        s = 0
        for a in self.array:
            s += a
        self.send.send(s)

    def getResult(self):
        return self.recv.recv()

if __name__ == '__main__':
    process_list = []
    for i in range(5):
        a = np.random.random(10)
        print(i, ' correct result: ', a.sum())
        p = MyProcess(str(i), a)
        p.start()
        process_list.append(p)

    for p in process_list:
        p.join()

    for p in process_list:
        print(p.name, ' subprocess result: ', p.getResult())
I want the sub-process to be able to inform the main process that the calculation is finished so that I can show the result in my GUI.
Any suggestion is appreciated.
Assuming you would like to do something with a result (the sum of a numpy array, in your case) as soon as it has been generated, I would use a multiprocessing.pool.Pool with method imap_unordered, which returns results in the order they are generated. In this case you need to pass your worker function the index of the array in the list of arrays along with the array itself, and have it return that index along with the array's sum, since this is the only way for the main process to know which array a given sum belongs to:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(tpl):
    # unpack tuple:
    i, array = tpl
    s = 0
    for a in array:
        s += a
    return i, s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # get each result as soon as it has been returned:
    for i, s in pool.imap_unordered(compute_sum, zip(range(n), array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 0: 4.760033809335711, actual result: 4.76003380933571
correct result 1: 5.486818812843256, actual result: 5.486818812843257
correct result 2: 5.400374562564179, actual result: 5.400374562564179
correct result 3: 4.079376706247242, actual result: 4.079376706247242
correct result 4: 4.20860716467263, actual result: 4.20860716467263
In the above run the results happened to be generated in the same order in which the tasks were submitted. To demonstrate that in general the results can arrive in arbitrary order, depending on how long the worker function takes to compute each result, we introduce some randomness into the processing time:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(tpl):
    import time
    # unpack tuple:
    i, array = tpl
    # results will be generated in random order:
    time.sleep(np.random.sample())
    s = 0
    for a in array:
        s += a
    return i, s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # get each result as soon as it has been returned:
    for i, s in pool.imap_unordered(compute_sum, zip(range(n), array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 4: 6.662288433360379, actual result: 6.66228843336038
correct result 0: 3.352901187256162, actual result: 3.3529011872561614
correct result 3: 5.836344458981557, actual result: 5.836344458981557
correct result 2: 2.9950208717729656, actual result: 2.9950208717729656
correct result 1: 5.144743159869513, actual result: 5.144743159869513
If you are satisfied with getting results back in task-submission order rather than task-completion order, use method imap instead, and there is no need to pass the array indices back and forth:
from multiprocessing import Pool, cpu_count
import numpy as np

def compute_sum(array):
    s = 0
    for a in array:
        s += a
    return s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    for i, s in enumerate(pool.imap(compute_sum, array_list)):
        print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    pool.close()
    pool.join()
Prints:
correct result 0: 4.841913985702773, actual result: 4.841913985702773
correct result 1: 4.836923014762733, actual result: 4.836923014762733
correct result 2: 4.91242274200897, actual result: 4.91242274200897
correct result 3: 4.701913574838348, actual result: 4.701913574838349
correct result 4: 5.813666896917504, actual result: 5.813666896917503
Update
You can also use method apply_async, specifying a callback function to be invoked when a result is returned from your worker function, compute_sum. apply_async returns a multiprocessing.pool.AsyncResult whose get method blocks until the task has completed and then returns the task's return value. Here, since we use a callback that is automatically called with the result when the task completes, there is no need to call AsyncResult.get or even to save the AsyncResult instances. Instead we call multiprocessing.pool.Pool.close() followed by multiprocessing.pool.Pool.join() to block until all submitted tasks have completed and their results have been returned:
from multiprocessing import Pool, cpu_count
import numpy as np
from functools import partial

def compute_sum(i, array):
    s = 0
    for a in array:
        s += a
    return i, s

def calculation_display(result, t):
    # Unpack returned tuple:
    i, s = t
    print(f'correct result {i}: {array_list[i].sum()}, actual result: {s}')
    result[i] = s

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    n = len(array_list)
    result = [0] * n
    pool_size = min(cpu_count(), n)
    pool = Pool(pool_size)
    # Get each result as soon as it has been returned.
    # Pass the results list to our callback as the first argument;
    # the worker's return value will then be the second argument:
    my_callback = partial(calculation_display, result)
    for i, array in enumerate(array_list):
        pool.apply_async(compute_sum, args=(i, array), callback=my_callback)
    # Wait for all submitted tasks to complete:
    pool.close()
    pool.join()
    print('results:', result)
Prints:
correct result 0: 5.381579338696546, actual result: 5.381579338696546
correct result 1: 3.8780497856741274, actual result: 3.8780497856741274
correct result 2: 4.548733927791488, actual result: 4.548733927791488
correct result 3: 5.048921365623381, actual result: 5.048921365623381
correct result 4: 4.852415747983676, actual result: 4.852415747983676
results: [5.381579338696546, 3.8780497856741274, 4.548733927791488, 5.048921365623381, 4.852415747983676]
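For completeness, concurrent.futures offers a similar pattern: a callback can be attached to each future with add_done_callback, and it is invoked with the completed future. A minimal sketch, not part of the answer above:

from concurrent.futures import ProcessPoolExecutor
import numpy as np

def compute_sum(i, array):
    return i, array.sum()

def on_done(future):
    # called back in the submitting process once the task completes:
    i, s = future.result()
    print(f'result {i}: {s}')

if __name__ == '__main__':
    array_list = [np.random.random(10) for _ in range(5)]
    with ProcessPoolExecutor() as executor:
        for i, array in enumerate(array_list):
            executor.submit(compute_sum, i, array).add_done_callback(on_done)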

join() output from multiprocessing when using tqdm for progress bar

I'm using a construct similar to this example to run my processing in parallel with a progress bar courtesy of tqdm...
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return square

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for _ in p.imap_unordered(_foo, range(0, max_)):
                pbar.update()
        results = p.join()  ## My attempt to combine results
results is always NoneType, though, and I cannot work out how to combine my results. I understand that a with ...: block automatically closes whatever it is working with on completion.
I've tried doing away with the outer with:
if __name__ == '__main__':
    max_ = 10
    p = Pool(processes=8)
    with tqdm(total=max_) as pbar:
        for _ in p.imap_unordered(_foo, range(0, max_)):
            pbar.update()
    p.close()
    results = p.join()
    print(f"Results : {results}")
Stumped as to how to join() my results?
Your call to p.join() just waits for all the pool processes to end and returns None. This call is actually unnecessary since you are using the pool as a context manager (that is, you have specified with Pool(processes=2) as p:). When that block terminates, an implicit call is made to p.terminate(), which immediately terminates the pool processes and any tasks that may be running or queued up to run (there are none in your case).
It is the act of iterating the iterator returned by p.imap_unordered that yields each return value from your worker function, _foo. But since you are using method imap_unordered, the results may not come back in submission order. In other words, you cannot assume that the return values will arrive in the succession 0, 1, 4, 9, etc. There are many ways to handle this, such as having your worker function return the original argument along with the squared value:
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return my_number, square  # return the argument along with the result

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        results = [None] * max_  # preallocate the results array
        with tqdm(total=max_) as pbar:
            for x, result in p.imap_unordered(_foo, range(0, max_)):
                results[x] = result
                pbar.update()
    print(results)
The second way is to not use imap_unordered, but rather apply_async with a callback function. The disadvantage of this is that for large iterables you do not have the option of specifying a chunksize argument as you do with imap_unordered:
from multiprocessing import Pool
import time
from tqdm import *

def _foo(my_number):
    square = my_number * my_number
    return square

if __name__ == '__main__':
    def my_callback(_):  # ignore result
        pbar.update()  # update progress bar when a result is produced

    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            async_results = [p.apply_async(_foo, (x,), callback=my_callback) for x in range(0, max_)]
            # wait for all tasks to complete:
            p.close()
            p.join()
            results = [async_result.get() for async_result in async_results]
    print(results)
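A third variant, not from the original answer: if submission order is acceptable, tqdm can simply wrap pool.imap, so the bar advances as results arrive and the list comes back already ordered. A short sketch along the same lines:

from multiprocessing import Pool
from tqdm import tqdm

def _foo(my_number):
    return my_number * my_number

if __name__ == '__main__':
    max_ = 30
    with Pool(processes=2) as p:
        # tqdm advances once per yielded result; imap preserves submission order:
        results = list(tqdm(p.imap(_foo, range(max_)), total=max_))
    print(results)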

Thread Pool Executor using Concurrent: no improvement for various number of workers

I'm trying to run a task in parallel using concurrent.futures. Please find below a piece of code for it:
import os
import time
from concurrent.futures import ProcessPoolExecutor as PE
import concurrent.futures

# num CPUs
cpu_num = len(os.sched_getaffinity(0))
print("Number of cpu available : ", cpu_num)

# max_Worker = cpu_num
max_Worker = 1

# A fake input array
n = 1000000
array = list(range(n))
results = []

# A fake function being applied to each element of array
def task(i):
    return i**2

x = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=max_Worker) as executor:
    features = {executor.submit(task, j) for j in array}

    # the real function is heavy and we need to be sure of completeness of each run
    for future in concurrent.futures.as_completed(features):
        results.append(future.result())

results = [future.result() for future in features]

y = time.time()
print('=========================================')
print(f"Train data preparation time (s): {(y-x)}")
print('=========================================')
And now my questions:
Although there is no error, is it correct/optimized?
While playing with the number of workers, there seems to be no improvement in speed (e.g., 1 vs 16, no difference). So what's the problem and how can it be solved?
Thanks in advance.
See my comment to your question. To the overhead I mentioned in that comment you need to also add the overhead of creating the process pool itself.
The following is a benchmark with several results. The first is a timing from just calling the worker function task 100000 times and creating a results list and printing out the last element of that list. It will become apparent why I have reduced the number of times I am calling task from 1000000 to 100000.
The next attempt is to use multiprocessing to accomplish the same thing using a ProcessPoolExecutor with the submit method and then processing the Future instances that are returned.
The next attempt instead uses the map method with its default chunksize argument of 1. It is important to understand this argument. With a chunksize value of 1, each element of the iterable passed to map is written individually, as a chunk, to a queue of tasks to be processed by the processes in the pool. When a pool process becomes idle looking for work, it pulls the next chunk of tasks from the queue, processes every task comprising the chunk, and then becomes idle again. When a lot of tasks are submitted via map, a chunksize value of 1 is inefficient; you would expect its performance to be equivalent to repeatedly issuing submit calls for each element of the iterable.
The next attempt specifies a chunksize value that approximates the value the map function of the Pool class in the multiprocessing package would use by default. As you can see, the improvement is dramatic, but still not an improvement over the non-multiprocessing case.
The final attempt uses the multiprocessing facility provided by package multiprocessing and its multiprocessing.pool.Pool class. The difference in this benchmark is that its map function uses a more intelligent default chunksize when no chunksize argument is specified.
import os
import time
from concurrent.futures import ProcessPoolExecutor as PE
from multiprocessing import Pool

# A fake function being applied to each element of array
def task(i):
    return i**2

# required for Windows:
if __name__ == '__main__':
    n = 100000

    t1 = time.time()
    results = [task(i) for i in range(n)]
    print('Non-multiprocessing time:', time.time() - t1, results[-1])

    # num CPUs
    cpu_num = os.cpu_count()
    print("Number of CPUs available: ", cpu_num)

    t1 = time.time()
    with PE(max_workers=cpu_num) as executor:
        futures = [executor.submit(task, i) for i in range(n)]
        results = [future.result() for future in futures]
    print('Multiprocessing time using submit:', time.time() - t1, results[-1])

    t1 = time.time()
    with PE(max_workers=cpu_num) as executor:
        results = list(executor.map(task, range(n)))
    print('Multiprocessing time using map:', time.time() - t1, results[-1])

    t1 = time.time()
    chunksize = n // (4 * cpu_num)
    with PE(max_workers=cpu_num) as executor:
        results = list(executor.map(task, range(n), chunksize=chunksize))
    print(f'Multiprocessing time using map: {time.time() - t1}, chunksize: {chunksize}', results[-1])

    t1 = time.time()
    with Pool(cpu_num) as executor:
        results = executor.map(task, range(n))
    print('Multiprocessing time using Pool.map:', time.time() - t1, results[-1])
Prints:
Non-multiprocessing time: 0.027019739151000977 9999800001
Number of CPUs available: 8
Multiprocessing time using submit: 77.34723353385925 9999800001
Multiprocessing time using map: 79.52981925010681 9999800001
Multiprocessing time using map: 0.30500149726867676, chunksize: 3125 9999800001
Multiprocessing time using Pool.map: 0.2799997329711914 9999800001
Update
The following benchmarks use a version of task that is very CPU-intensive and show the benefit of multiprocessing. It would also seem that for this small iterable size (100), forcing a chunksize value of 1 for the Pool.map case (it would by default compute a chunksize of 4) is slightly more performant.
import os
import time
from concurrent.futures import ProcessPoolExecutor as PE
from multiprocessing import Pool

# A fake function being applied to each element of array
def task(i):
    for _ in range(1_000_000):
        result = i ** 2
    return result

def compute_chunksize(iterable_size, pool_size):
    chunksize, remainder = divmod(iterable_size, pool_size * 4)
    if remainder:
        chunksize += 1
    return chunksize

# required for Windows:
if __name__ == '__main__':
    n = 100
    cpu_num = os.cpu_count()
    chunksize = compute_chunksize(n, cpu_num)

    t1 = time.time()
    results = [task(i) for i in range(n)]
    t2 = time.time()
    print('Non-multiprocessing time:', t2 - t1, results[-1])

    # num CPUs
    print("Number of CPUs available: ", cpu_num)

    t1 = time.time()
    with PE(max_workers=cpu_num) as executor:
        futures = [executor.submit(task, i) for i in range(n)]
        results = [future.result() for future in futures]
    t2 = time.time()
    print('Multiprocessing time using submit:', t2 - t1, results[-1])

    t1 = time.time()
    with PE(max_workers=cpu_num) as executor:
        results = list(executor.map(task, range(n)))
    t2 = time.time()
    print('Multiprocessing time using map:', t2 - t1, results[-1])

    t1 = time.time()
    with PE(max_workers=cpu_num) as executor:
        results = list(executor.map(task, range(n), chunksize=chunksize))
    t2 = time.time()
    print(f'Multiprocessing time using map: {t2 - t1}, chunksize: {chunksize}', results[-1])

    t1 = time.time()
    with Pool(cpu_num) as executor:
        results = executor.map(task, range(n))
    t2 = time.time()
    print('Multiprocessing time using Pool.map:', t2 - t1, results[-1])

    t1 = time.time()
    with Pool(cpu_num) as executor:
        results = executor.map(task, range(n), chunksize=1)
    t2 = time.time()
    print('Multiprocessing time using Pool.map (chunksize=1):', t2 - t1, results[-1])
Prints:
Non-multiprocessing time: 23.12758779525757 9801
Number of CPUs available: 8
Multiprocessing time using submit: 5.336004018783569 9801
Multiprocessing time using map: 5.364996671676636 9801
Multiprocessing time using map: 5.444890975952148, chunksize: 4 9801
Multiprocessing time using Pool.map: 5.400001287460327 9801
Multiprocessing time using Pool.map (chunksize=1): 4.698001146316528 9801

starmap, starmap_async python

I want to translate a huge MATLAB model to Python, so I need to work on the key functions first. One key function handles parallel processing. Basically, a matrix with parameters is the input, in which every row represents the parameters for one run. These parameters are used within a computation-heavy function. This computation-heavy function should run in parallel; I don't need the results of a previous run for any other run, so all processes can run independently of each other.
Why is starmap_async slower on my PC? Also, when I add more code (to test consecutive computation) my Python crashes (I use Spyder). Can you give me advice?
import time
import numpy as np
import multiprocessing as mp
from functools import partial

# Create simulated data matrix
data = np.random.random((100, 3000))
data = np.column_stack((np.arange(1, len(data) + 1, 1), data))

def EAF_DGL(*z, package_num):
    sum_row = 0
    for i in range(1, np.shape(z)[0]):
        sum_row = sum_row + z[i]
    func_result = np.column_stack((package_num, z[0], sum_row))
    return func_result

t0 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap_async(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])]).get()
        pool.close()
        pool.join()

t1 = time.time()
calculation_time_parallel_async = t1 - t0
print(calculation_time_parallel_async)

t2 = time.time()

if __name__ == "__main__":
    package_num = 1
    help_EAF_DGL = partial(EAF_DGL, package_num=1)
    with mp.Pool() as pool:
        #result = pool.starmap(partial(EAF_DGL, package_num), [(data[i]) for i in range(0, np.shape(data)[0])])
        result = pool.starmap(help_EAF_DGL, [(data[i]) for i in range(0, np.shape(data)[0])])
        pool.close()
        pool.join()

t3 = time.time()
calculation_time_parallel = t3 - t2
print(calculation_time_parallel)
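One point of reference on the API (a general fact, not an answer from this thread): starmap_async returns an AsyncResult immediately, and calling .get() on it right away blocks until every task is done, so the two timed blocks above end up doing essentially the same amount of work. A minimal sketch of the two call patterns, with a stand-in worker whose names are purely illustrative:

from multiprocessing import Pool

def work(package_num, *row):
    # stand-in for the computation-heavy function (illustrative only):
    return package_num, sum(row)

if __name__ == '__main__':
    rows = [(1, 2.0, 3.0), (2, 4.0, 5.0)]
    with Pool() as pool:
        # blocking: returns the list of results directly
        res_sync = pool.starmap(work, rows)
        # non-blocking: returns an AsyncResult; .get() then blocks for all results
        res_async = pool.starmap_async(work, rows).get()
    print(res_sync, res_async)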

Is this the right way to use multiprocessing queue with python?

I am trying to speed up my code by using multiprocessing with Python. The only problem I ran into when trying to implement multiprocessing is that my function has a return statement and I need to save that data to a list. The best way I found using Google was to use a queue, putting with q.put() and retrieving with q.get(). The only issue is that I think I'm not using this the right way, because when I look at the command prompt after running, it shows I'm hardly using my CPU and I only see one Python process running. If I remove q.get() the program is super fast and utilizes my CPU. Am I doing this the right way?
import time
import numpy as np
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue

def test(x, y, q):
    q.put(x * y)

if __name__ == '__main__':
    q = Queue()
    one = []
    two = []
    three = []
    start_time = time.time()
    for x in np.arange(30, 60, 1):
        for y in np.arange(0.01, 2, 0.5):
            p = multiprocessing.Process(target=test, args=(x, y, q))
            p.start()
            one.append(q.get())
            two.append(int(x))
            three.append(float(y))
            print(x, ' | ', y, ' | ', one[-1])
            p.join()
    print("--- %s seconds ---" % (time.time() - start_time))
    d = {'x': one, 'y': two, 'q': three}
    data = pd.DataFrame(d)
    print(data.tail())
No, this is not correct. You start a process and then immediately wait for its result via q.get, so only one process runs at a time. If you want to run many tasks in parallel, use multiprocessing.Pool:
import time
import numpy as np
from multiprocessing import Pool
from itertools import product

def test(args):
    # unpack the (x, y) pair (tuple parameters in a def are not valid Python 3 syntax):
    x, y = args
    return x, y, x * y

def main():
    start_time = time.time()
    pool = Pool()
    result = pool.map(test, product(np.arange(30, 60, 1), np.arange(0.01, 2, 0.5)))
    pool.close()
    print("--- %s seconds ---" % (time.time() - start_time))
    print(result)

if __name__ == '__main__':
    main()
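Because the tuple parameter in the original def test((x, y)) is Python 2 syntax, the version above unpacks inside the function. An alternative sketch (same idea, not from the original answer) is Pool.starmap, which unpacks each argument tuple into separate parameters for you:

import numpy as np
from multiprocessing import Pool
from itertools import product

def test(x, y):
    return x, y, x * y

if __name__ == '__main__':
    with Pool() as pool:
        # starmap unpacks each (x, y) tuple into the two parameters of test:
        result = pool.starmap(test, product(np.arange(30, 60, 1), np.arange(0.01, 2, 0.5)))
    print(result[:3])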
