Clarification regarding Python's Pool.map function used for parallelism

I have a couple of questions regarding the functioning of the following code fragment.
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=10)            # start 10 worker processes
    result = pool.apply_async(f, [10])   # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))
    print(pool.map(f, range(10)))        # prints "[0, 1, 4, ..., 81]"
In the line pool = Pool(processes=10), does it make any difference that I'm running on a quad-core machine and instantiating more than 4 worker processes, given that only up to 4 processes can execute at any point in time?
In the pool.map(f, range(10)) call, if I instantiate 10 worker processes and have, say, 50 map tasks, does Python take care of assigning tasks to processes as they finish, or am I supposed to figure out how many map tasks there are and instantiate that many processes with pool = Pool(processes=number_of_mappers)?
This is my first attempt at parallelizing anything and I am thoroughly confused, so any help would be much appreciated.
Thanks in advance!

If you create more worker processes than you have available CPUs, that's fine, but the processes will compete with each other for cycles. That is, you'll waste cycles, in the sense that cycles devoted to switching among processes do nothing to get you closer to finishing. For CPU-bound tasks, it's just wasteful. For I/O-bound tasks, though, it may be just what you want, since in that case processes will spend lots of their time idle, waiting for blocking I/O to complete.
The map functions automatically slice up their iterable argument and send pieces of it to all worker processes. I really don't know what you mean by mappers, though. How many mappers do you think you created in your example? 10? 1? Something else? In what you wrote, pool.map() blocks until all work is completed.

You can create more workers than the number of threads your CPU can execute. This is required in real-time applications, like a web server, where you must ensure that each client is able to communicate with you without having to wait for others. If it's not a real-time application and you just want to finish all the jobs as soon as possible, it is wiser to create only as many workers as your CPU can handle simultaneously.
Python takes care of assigning jobs to workers no matter how many jobs you have.
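As a minimal sketch of that behaviour (the squaring function and counts here are made up), a pool of 4 workers will happily process 50 tasks; map chunks the iterable and hands pieces to whichever workers are free:
from multiprocessing import Pool
import os

def f(x):
    # return the worker's PID alongside the result so we can see who did the work
    return (os.getpid(), x * x)

if __name__ == '__main__':
    with Pool(processes=4) as pool:              # 4 workers, regardless of task count
        results = pool.map(f, range(50))         # 50 tasks, chunked among the 4 workers
    print(sorted({pid for pid, _ in results}))   # at most 4 distinct worker PIDs
    print([value for _, value in results][:5])   # [0, 1, 4, 9, 16]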

Related

Graceful Termination of Worker Pool

I want to spawn X Pool workers and give each of them X% of the work to do. My issue is that the work takes about 20 minutes to exhaust (longer for each extra process running), and due to the type of calculations being done my answer may be found within minutes or within hours. What I would like to do is implement some way for a single worker to go "HEY, I FOUND IT" and use that signal to kill the remainder of the pool and move on with my calculations.
Key points:
I have tried callbacks; they don't seem to run on starmap_async until the entire pool finishes.
I only care about the first suitable answer found.
I am not sharing resources, and surprise process death, albeit rude, is perfectly acceptable.
I've also considered using a Queue, but it wouldn't make sense because the scope of work I'm passing to each worker is already built into the parameters of the function.
Below is a very stripped-down version of what I'm working with (the real calculations can take hours to finish over a 4.2-billion-element iterable).
from multiprocessing import Pool

def doWork():
    workers = Pool(2)
    results = workers.starmap_async(func=distSearch, iterable=Sections1_5, callback=killPool)
    workers.close()
    print("Found answer : {}".format(results.get()))
    workers.join()

def killPool():
    workers.terminate()
    print("Worker Pool Terminated")
I should probably specify that my process only returns if it finds an answer; otherwise it just exits once done. I have looked at this thread but it has me completely lost, and it seems like a lot of overhead to constantly check for the win condition when that should come in the return/callback of the Worker Pool.
All the answers I've found result in significant overhead by supervising the worker pool, I'm looking for a solution that sources the kill signal at the worker level, autonomously.
I'm looking for a solution that sources the kill signal at the worker level, autonomously.
AFAIK, that doesn't exist. The methods of the Pool object (like Pool.terminate) should only be used in the process that created the pool.
What you could do is use Pool.imap_unordered. This returns an iterator in the parent process over the results which yields results as soon as they become available. As soon as the desired result pops up, you could then use Pool.terminate().
Edit:
From looking at the 3.5 implementation, starmap_async returns a MapResult instance, which is not an iterator.
You can wrap multiple inputs in a tuple and use imap_unordered over a list of those.
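A rough sketch of that approach, assuming a stand-in dist_search function and a made-up list of (section, target) tuples in place of the question's distSearch/Sections1_5: the parent iterates over results as they arrive and terminates the pool at the first hit.
from multiprocessing import Pool

def dist_search(args):
    section, target = args                # inputs arrive wrapped in a tuple
    # ... heavy search over this section; return an answer or None if nothing found
    return section if section == target else None   # placeholder predicate

if __name__ == '__main__':
    tasks = [(section, 42) for section in range(1000)]   # hypothetical work items
    with Pool(2) as workers:
        for result in workers.imap_unordered(dist_search, tasks, chunksize=10):
            if result is not None:        # first suitable answer wins
                print("Found answer : {}".format(result))
                workers.terminate()       # kill the remaining workers
                break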

Python: What is the difference between a Pool of worker processes and just running multiple Processes?

I am not sure when to use pool of workers vs multiple processes.
from multiprocessing import Process

processes = []
for m in range(1, 5):
    p = Process(target=some_function)
    p.start()
    processes.append(p)
for p in processes:
    p.join()
vs
from multiprocessing import Pool

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:
        pool_outputs = pool.map(another_function, inputs)
As it says on PYMOTW:
The Pool class can be used to manage a fixed number of workers for simple cases where the work to be done can be broken up and distributed between workers independently.
The return values from the jobs are collected and returned as a list.
The pool arguments include the number of processes and a function to run when starting the task process (invoked once per child).
Please have a look at the examples given there to better understand its application, functionalities and parameters.
Basically the Pool is a helper, easing the management of the processes (workers) in those cases where all they need to do is consume common input data, process it in parallel and produce a joint output.
The Pool does quite a few things that you would otherwise have to code yourself (not too hard, but still, it's convenient to find a pre-cooked solution), namely:
the splitting of the input data
the target process function is simplified: it can be designed to expect one input element only. The Pool is going to call it providing each element from the subset allocated to that worker
waiting for the workers to finish their job (i.e. joining the processes)
...
merging the output of each worker to produce the final output
The information below might help you understand the difference between Pool and Process in Python's multiprocessing module:
Pool:
When you have a large chunk of data to process, you can use the Pool class.
Only the jobs currently under execution are kept in memory.
I/O operations: it waits until an I/O operation is completed and does not schedule another process in the meantime, which can increase execution time.
Uses a FIFO scheduler.
Process:
When you have a small amount of data or only a few, less repetitive tasks to do.
It keeps every process in memory at once, so with larger workloads it can run out of memory.
I/O operations: the Process class suspends a process that is performing I/O and schedules another process in parallel.
Uses a FIFO scheduler.

Should I create a new Pool object every time or reuse a single one?

I'm trying to understand the best practices with Python's multiprocessing.Pool object.
In my program I use Pool.imap very frequently. Normally every time I start tasks in parallel I create a new pool object and then close it after I'm done.
I recently encountered a hang where the number of tasks submitted to the pool was less than the number of processes. What was odd was that it only occurred in my test pipeline, which had a bunch of things running before it. Running the test standalone did not cause the hang. I assume it has to do with making multiple pools.
I'd really like to find some resources to help me understand the best practices in using Python's multiprocessing. Specifically I'm currently trying to understand the implications of making several pool objects versus using only one.
When you create a Pool of worker processes, new processes are spawned from the parent one. This is a very fast operation but it has its cost.
Therefore, as long as you don't have a very good reason (for example, the Pool breaking because a worker died unexpectedly), it's better to always reuse the same Pool instance.
The reason for the hang is hard to tell without inspecting the code. You might not have cleaned up the previous instances properly (call close()/terminate() and then always call join()), or you might have sent data that was too big through the Pool channel, which usually ends up in a deadlock, and so on.
Surely a pool does not break if you submit fewer tasks than workers. The pool is designed exactly to decouple the number of tasks from the number of workers.
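A minimal sketch of that pattern, with made-up worker functions: create the Pool once, feed it several batches, and only close/join it at the very end.
from multiprocessing import Pool

def square(x):
    return x * x

def cube(x):
    return x ** 3

if __name__ == '__main__':
    pool = Pool(processes=4)                 # one pool for the whole run
    try:
        first = list(pool.imap(square, range(10)))    # reuse it across batches
        second = list(pool.imap(cube, range(10)))
    finally:
        pool.close()                         # no more tasks will be submitted
        pool.join()                          # wait for the workers to exit cleanly
    print(first, second)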

Python multiprocessing - Why does pool.close() take so long to return?

Sometimes a call to the function pool.close() takes a lot of time to return, and I want to understand why. Typically, I would have each process return a big set or a big dict, and the main process merge them. It looks like this:
import multiprocessing

def worker(n):
    s = set()
    # add millions of elements to s
    return s

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=20)
    fullSet = set.union(*pool.imap_unordered(worker, range(100)))
    pool.close()  # This takes a LOT OF TIME!
    pool.join()
As I said, the pool.close() might take 5 or 10 minutes, or more, to return. The same problem occurs when using dictionaries instead of sets. This is what the documentation says about close:
Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit.
I guess I don't understand what's going on. After the line fullSet = ..., all the work is done and I don't need the workers anymore. What are they doing that is taking so much time?
It is very unlikely that Pool.close is taking that long, simply because this is the source of close:
def close(self):
    debug('closing pool')
    if self._state == RUN:
        self._state = CLOSE
        self._worker_handler._state = CLOSE
So all that's happening is that some state variables are changed. This has no measurable impact on the runtime of that method and will not cause it to return later. You can assume close() returns essentially instantaneously.
Now instead, what’s way more likely is that your pool.join() line is the “culprit” of this delay. But it’s just doing its job:
Wait for the worker processes to exit.
It essentially calls join on every process in the pool. And if you are joining a process or thread, you are actively waiting for it to complete its work and terminate.
So in your case, you have 20 processes running that each add a million elements to a set. That takes a while. To keep your main process from quitting early (which, by the way, would cause the child processes to die), you wait for the worker processes to finish their work by joining on them. So what you're experiencing is likely just what should happen for the amount of work you're doing.
On a side note: If you do heavy CPU work in your worker functions, you shouldn’t spawn more processes than your CPU has hardware threads available, as you will only introduce additional overhead from managing and switching processes. For example for a consumer Core i7, this number would be 8.
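If you want to see for yourself where the time goes, a small, scaled-down timing harness around each phase (the element counts here are made up) will show whether the delay sits in the merge line, in close(), or in join():
import multiprocessing
import time

def worker(n):
    # scaled-down stand-in for "add millions of elements to s"
    return set(range(200000))

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)

    t = time.perf_counter()
    fullSet = set.union(*pool.imap_unordered(worker, range(20)))
    print("imap + union: %.2f s" % (time.perf_counter() - t))

    t = time.perf_counter()
    pool.close()
    print("close:        %.4f s" % (time.perf_counter() - t))

    t = time.perf_counter()
    pool.join()
    print("join:         %.2f s" % (time.perf_counter() - t))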
It is probably the iteration over the result of pool.imap_unordered and the subsequent set.union that take a long time.
After each worker has finished building a set, it has to be pickled, sent back to the original process and unpickled. This takes time and memory. And then the * has to unpack all the sets for union to process.
You might get better results with map_async. Have the callback append the returned set to a list, and loop over that list using union on each set.
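One way to get that effect without holding every returned set in memory at once is to fold the union incrementally as results arrive. This sketch uses imap_unordered rather than map_async with a callback, but the idea is the same; the worker here is a made-up stand-in:
import multiprocessing

def worker(n):
    # stand-in for the real work of building a large set
    return set(range(100000))

if __name__ == '__main__':
    fullSet = set()
    with multiprocessing.Pool(processes=4) as pool:
        # merge each partial set as soon as it comes back, instead of unpacking all at once
        for partial in pool.imap_unordered(worker, range(20)):
            fullSet |= partial
    print(len(fullSet))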

Join to later event

Consider a Python script which spawns multiple processes or threads (I have not decided which yet; I'm still experimenting). I would like to join all the spawned processes/threads at a point later than where they are created. For that purpose I am using a list as a thread pool:
import sys
from multiprocessing import Process

pool = []
for job in jobs:
    j = Process(target=some_Function, args=(some_Arg, ))
    j.start()
    pool.append(j)

# Lots of code

for p in pool:
    p.join()

do_Right_Before_Exiting()
sys.exit()
The problem with this approach is that there are many, many processes and the pool is acting like a memory leak. Eliminating the pool saves considerable memory, but I have no way to ensure that the processes finish. I could 'hack' a sleep(30) to give them time to complete, but I find this ugly and unreliable.
How might I join a process / thread to a point other than where the code is run (maybe join to something like a C# label, or join to another event)? Alternatively, how might I check that all processes / threads have completed and only then proceed?
If an array of your threads takes up too much memory, I think you have too many threads. Try dividing the jobs into 8 threads only and see if that improves things.
Most people don't have more than 8 cores.
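As a sketch of that advice (the job function and counts below are placeholders), a fixed-size Pool keeps only a handful of processes alive and lets you join once at the end instead of tracking a handle for every job:
from multiprocessing import Pool

def some_function(arg):
    # placeholder for the per-job work
    return arg * 2

if __name__ == '__main__':
    jobs = range(10000)                       # many jobs, but only 8 worker processes
    pool = Pool(processes=8)
    async_result = pool.map_async(some_function, jobs)   # queue the whole batch
    # ... lots of other code can run here while the workers churn ...
    pool.close()                              # no more jobs will be submitted
    pool.join()                               # returns only once every job has finished
    print(len(async_result.get()))            # all results are ready at this point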
