Recently I have started using Python's ProcessPoolExecutor to accelerate my processing.
So instead of doing:
list_of_res = []
for n in range(a_number):
    res = calculate_something(list_of_sources[n])
    list_of_res.append(res)
joint_results = pd.concat(list_of_res)
I do:
with ProcessPoolExecutor(max_workers=8) as executor:
    joint_results = pd.concat(executor.map(calculate_something, list_of_sources))
It works great.
However, I've noticed that inside the calculate_something function I call the same inner function about 8 times, one after another, so I might as well map over those calls instead of looping.
My question is: can I apply multiprocessing to a function that is already being run inside a multiprocessing pool?
Yes, you can have a worker process spawn another pool of workers, but it is not optimal.
Each time you launch a new process, it takes anywhere from a few hundred milliseconds to a few seconds for that process to initialize and start executing work (depending on the OS, the disk, and your code).
Launching a worker from a worker just wastes the overhead you already paid to spawn the first child; you are better off extracting the loop inside calculate_something and submitting its iterations directly to your initial executor, as in the sketch below.
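For example (a rough sketch with made-up names; inner_calculation and the nested lists stand in for whatever calculate_something actually does internally), you could flatten the inner loop so that every unit of work goes to the one pool:
from concurrent.futures import ProcessPoolExecutor

# Hypothetical stand-ins for the real data and the inner function that
# calculate_something currently calls ~8 times in a loop.
list_of_sources = [[1, 2, 3], [4, 5, 6]]

def inner_calculation(sub_input):
    return sub_input * 2  # placeholder for the real per-item work

if __name__ == "__main__":
    # Flatten the nested loop: one task per sub-input, all in a single pool.
    all_sub_inputs = [sub for source in list_of_sources for sub in source]
    with ProcessPoolExecutor(max_workers=8) as executor:
        partial_results = list(executor.map(inner_calculation, all_sub_inputs))
    print(partial_results)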
A better approach still is to launch your initial calculate_something calls from a ThreadPoolExecutor and have one shared ProcessPoolExecutor that all the thread workers push work into. This way you limit the number of newly created processes and avoid creating and destroying far more workers than you actually need, and launching a thread pool takes only a few microseconds.
Here is an example of how to nest a thread pool and a process pool:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def process_worker(n):
    print(n)
    return n

def thread_worker(list_of_n, process_pool: ProcessPoolExecutor):
    work_done = list(process_pool.map(process_worker, list_of_n))
    return work_done

if __name__ == "__main__":
    list_of_lists_of_n = [[1, 2, 3], [4, 5, 6]]
    with ProcessPoolExecutor() as process_pool, ThreadPoolExecutor() as threadpool:
        tasks = []
        work_done = []
        for item in list_of_lists_of_n:
            tasks.append(threadpool.submit(thread_worker, item, process_pool))
        for item in tasks:
            work_done.append(item.result())
    print(work_done)
Related question:
Will the following way of using a thread pool cause a deadlock? Or is such a pattern not preferred? If so, what is the alternative?
I am passing the pool to a function that runs in a thread, and that function in turn submits another function to the same pool.
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def bar():
    sleep(2)
    return 2

def foo(pool):
    sleep(2)
    my_list = [pool.submit(bar) for i in range(4)]
    return [i.result() for i in my_list]

pool = ThreadPoolExecutor(10)
my_list = [pool.submit(foo, pool) for i in range(2)]
for i in my_list:
    print(i.result())
This would be a safe way to spawn threads from within a thread that was itself started by a ThreadPoolExecutor: give the inner work its own executor. This may not be necessary if ThreadPoolExecutor itself is thread-safe. The output shows how, in this case, there are 10 concurrent threads.
from concurrent.futures import ThreadPoolExecutor
from time import sleep

BAR_THREADS = 4
FOO_THREADS = 2

def bar(_):
    print('Running bar')
    sleep(1)

def foo(_):
    print('Running foo')
    with ThreadPoolExecutor(max_workers=BAR_THREADS) as executor:
        executor.map(bar, range(BAR_THREADS))

with ThreadPoolExecutor(max_workers=FOO_THREADS) as executor:
    executor.map(foo, range(FOO_THREADS))
print('Done')
Output:
Running foo
Running foo
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Running bar
Done
Will the following way of using a thread pool cause a deadlock? ... If so, what is the alternative?
One alternative would be to use a thread pool that does not have a hard limit on the number of workers. Unfortunately, the concurrent.futures.ThreadPoolExecutor class is not so sophisticated: you would either have to write your own or find one provided by a third party. (I'm not a big-time Python programmer, so I don't know of one off-hand.)
A naive alternative thread-pool might create a new worker any time submit() was called and all of the existing workers were busy. On the other hand, that could make it easy for you to run the program out of memory by creating too many threads. A slightly more sophisticated thread pool might also kill off a worker if too many other workers were idle at the moment when the worker completed its task.
More sophisticated strategies are possible, but you might have to think more deeply about the needs and patterns-of-use of the application before writing the code.
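As a rough illustration of that naive strategy (this UnboundedThreadPool class is made up for the example and is not part of concurrent.futures), such a pool could start a fresh thread for every submit() and hand back a Future, so nested submissions can never starve each other of workers:
import threading
from concurrent.futures import Future

class UnboundedThreadPool:
    """Naive pool: every submit() starts a brand-new thread."""
    def submit(self, fn, *args, **kwargs):
        future = Future()
        def runner():
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)
        threading.Thread(target=runner, daemon=True).start()
        return future

# Nested submissions cannot deadlock, but nothing caps the thread count.
pool = UnboundedThreadPool()
print(pool.submit(lambda x: x * 2, 21).result())  # prints 42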
I am using the Python multiprocessing module to run parallel, unrelated jobs with a function similar to the following example:
import numpy as np
from multiprocessing import Pool

def myFunction(arg1):
    name = "file_%s.npy" % arg1
    A = np.load(name)
    A[A < 0] = np.nan
    np.save(name, A)

if __name__ == "__main__":
    N = list(range(50))
    with Pool(4) as p:
        p.map_async(myFunction, N)
        p.close()  # I tried with and without that statement
        p.join()   # I tried with and without that statement
    DoOtherStuff()
My problem is that the function DoOtherStuff is never executed; the processes switch into sleep mode (as seen in top) and I need to kill the script with Ctrl+C to stop it.
Any suggestions?
You have at least a couple of problems. First, you are using map_async(), which does not block until the results of the task are completed. So what you're doing is starting the task with map_async(), but then immediately closing and terminating the pool (the with statement calls Pool.terminate() upon exiting).
When you add tasks to a process pool with methods like map_async(), they are put on a task queue that is handled by a worker thread, which takes tasks off that queue and farms them out to worker processes, possibly spawning new processes as needed (actually a separate thread handles that).
Point being, you have a race condition where you're terminating the Pool, likely before any tasks have even started. If you want your script to block until all the tasks are done, just use map() instead of map_async(). For example, I rewrote your script like this:
import numpy as np
from multiprocessing import Pool

def myFunction(N):
    A = np.load(f'file_{N:02}.npy')
    A[A < 0] = np.nan
    np.save(f'file2_{N:02}.npy', A)

def DoOtherStuff():
    print('done')

if __name__ == "__main__":
    N = range(50)
    with Pool(4) as p:
        p.map(myFunction, N)
    DoOtherStuff()
I don't know what your use case is exactly, but if you do want to use map_async(), so that this task can run in the background while you do other stuff, you have to leave the Pool open, and manage the AsyncResult object returned by map_async():
result = pool.map_async(myFunction, N)
DoOtherStuff()
# Is my map done yet? If not, we should still block until
# it finishes before ending the process
result.wait()
pool.close()
pool.join()
You can see more examples in the linked documentation.
I don't know why you got a deadlock in your attempt; I was not able to reproduce it. It's possible there was a bug at some point that was then fixed, though you were also possibly invoking undefined behavior with your race condition, as well as calling terminate() on a pool after it had already been join()ed. As for why your answer did anything at all, it's possible that with the multiple calls to apply_async() you managed to skirt around the race condition somewhat, but this is not at all guaranteed to work.
I would like to use spaCy in a program which is currently implemented with multiprocessing. Specifically I am using ProcessingPool to spawn 4 subprocesses which then go off and do their merry tasks.
To use spaCy (specifically for POS tagging), I need to invoke spacy.load('en'), which is an expensive call (takes ~10 seconds). If I am to load this object within each subprocess then it will take ~40 seconds, as they are all reading from the same location. This is annoyingly long.
But I cannot figure out a way to get them to share the object which is being loaded. This object cannot be pickled, which means (as far as I know):
It cannot be passed into the Pool.map call
It cannot be stored and used by a Manager instance to then be shared amongst the processes
What can I do?
I don't know how you use Pool.map exactly, but be aware that Pool.map doesn't work well with a massive number of inputs. In Python 3.6 it's implemented in Lib/multiprocessing/pool.py, and as you can see there, although it states that it takes an iterable as its first argument, the implementation consumes the whole iterable before running the multiprocess map. So I don't think Pool.map is what you need if you have a lot of data to process. Maybe Pool.imap and Pool.imap_unordered can work.
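For instance, a minimal sketch of the lazier imap_unordered variant (with a toy work function standing in for your real processing):
from multiprocessing import Pool

def work(n):
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        # imap_unordered feeds the iterable to the workers lazily (in chunks)
        # and yields results as they finish, so the whole input is never
        # materialized up front the way Pool.map does.
        for result in pool.imap_unordered(work, range(20), chunksize=5):
            print(result)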
About your actual issue: I have a solution that doesn't involve Pool.map and works kind of like a multiprocess foreach.
First you need to subclass Process to create a worker:
import spacy
from multiprocessing import cpu_count
from multiprocessing import Queue
from multiprocessing import Process

class Worker(Process):
    # The spaCy model is loaded once when this class is defined, not once per task.
    english = spacy.load('en')

    def __init__(self, queue):
        super(Worker, self).__init__()
        self.queue = queue

    def run(self):
        for args in iter(self.queue.get, None):
            # process args here; you can use self.english
            pass
You prepare the pool of processes like this:
queue = Queue()
workers = list()
for _ in range(cpu_count()):  # minus one if the main process is CPU intensive
    worker = Worker(queue)
    workers.append(worker)
    worker.start()
Then you can feed the pool via the queue:
for args in iterable:
    queue.put(args)
iterable is the list of arguments that you pass to the workers. The above code will push the contents of iterable as fast as it can. Basically, if the workers are slow enough, almost all of the iterable will be pushed to the queue before the workers have finished their job. That's why the contents of the iterable must fit into memory.
If the worker arguments (i.e. iterable) can't fit into memory, you must somehow synchronize the main process and the workers.
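One simple way to get that back-pressure (my own addition, not part of the original recipe) is to bound the queue so that queue.put() blocks whenever the workers fall behind; it is a drop-in change to the two snippets above:
queue = Queue(maxsize=2 * cpu_count())  # instead of the unbounded Queue()

for args in iterable:
    queue.put(args)  # now blocks while the queue is full, until a worker get()s an item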
At the end, push one None sentinel per worker so that each run() loop (driven by iter(self.queue.get, None)) exits, then join the workers:
for worker in workers:
    queue.put(None)

for worker in workers:
    worker.join()
I have the following setup:
results = [f(args) for _ in range(10**3)]
But, f(args) takes a long time to compute. So I'd like to throw multiprocessing at it. I would like to do so by doing:
pool = mp.Pool(mp.cpu_count() - 1)  # mp.cpu_count() -> 8
results = [pool.apply_async(f, args) for _ in range(10**3)]
Clearly, I don't have 1000 processors on my computer, so my concern:
Does the above call result in 1000 processes simultaneously competing for CPU time, or in 7 processes running simultaneously, iteratively computing the next f(args) when the previous call finishes?
I suppose I could do something like pool.map_async(f, (args for _ in range(10**3))) to get the same results, but the purpose of this post is to understand the behavior of pool.apply_async.
You'll never have more processes running than there are workers in your pool (in your case, mp.cpu_count() - 1). If you call apply_async() and all the workers are busy, the task will be queued and executed as soon as a worker frees up. You can see this with a simple test program:
#!/usr/bin/python
import time
import multiprocessing as mp

def worker(chunk):
    print('working')
    time.sleep(10)
    return

def main():
    pool = mp.Pool(2)  # Only two workers
    for n in range(0, 8):
        pool.apply_async(worker, (n,))
        print("called it")
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
The output is like this:
called it
called it
called it
called it
called it
called it
called it
called it
working
working
<delay>
working
working
<delay>
working
working
<delay>
working
working
The number of worker processes is wholly controlled by the argument to mp.Pool(). So if mp.cpu_count() returns 8 on your box, 7 worker processes will be created.
All pool methods (apply_async() among them) then use no more than that many worker processes. Under the covers, arguments are pickled in the main program and sent over an inter-process pipe to worker processes. This hidden machinery effectively creates a work queue, off of which the fixed number of worker processes pull descriptions of work to do (function name + arguments).
Other than that, it's all just magic ;-)
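If you also want the return values back, each apply_async() call hands you an AsyncResult that you can hold on to and .get() later. A small sketch (my example, not part of the answer above):
import multiprocessing as mp

def f(x):
    return x * x

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count() - 1) as pool:
        # Only cpu_count() - 1 workers run at a time; the rest of the
        # 1000 tasks wait in the pool's internal work queue.
        async_results = [pool.apply_async(f, (n,)) for n in range(1000)]
        # .get() blocks until that particular task has finished.
        results = [r.get() for r in async_results]
    print(results[:5])  # [0, 1, 4, 9, 16]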
I have a problem running multiple processes in Python 3.
My program does the following:
1. Takes entries from an SQLite database and passes them to an input_queue.
2. Creates multiple processes that take items off the input_queue, run them through a function, and put the results on the output_queue.
3. Creates a thread that takes items off the output_queue and prints them (this thread is obviously started before the first two steps).
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times and then stops. I assumed it would keep running until it had taken all the items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) as another process and then pass its output queue as the input queue for step 2?
Edit:
Here is an example of the code. I used a list of numbers as a substitute for the database entries, as it behaves the same way. I have 300 items in the list and I would like it to process all 300, but at the moment it just processes 10 (the number of processes I have assigned).
#!/usr/bin/python3
from multiprocessing import Process, Queue
import multiprocessing
from threading import Thread

## This is the class that would be passed to the multi_processing function
class Processor:
    def __init__(self, out_queue):
        self.out_queue = out_queue

    def __call__(self, in_queue):
        data_entry = in_queue.get()
        result = data_entry * 2
        self.out_queue.put(result)

# Performs the multiprocessing
def perform_distributed_processing(dbList, threads, processor_factory, output_queue):
    input_queue = Queue()
    # Create the Data processors.
    for i in range(threads):
        processor = processor_factory(output_queue)
        data_proc = Process(target=processor,
                            args=(input_queue,))
        data_proc.start()
    # Push entries to the queue.
    for entry in dbList:
        input_queue.put(entry)
    # Push stop markers to the queue, one for each thread.
    for i in range(threads):
        input_queue.put(None)
    data_proc.join()
    output_queue.put(None)

if __name__ == '__main__':
    output_results = Queue()

    def output_results_reader(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print(item)

    # Establish results collecting thread.
    results_process = Thread(target=output_results_reader, args=(output_results,))
    results_process.start()
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    # Perform multi processing
    perform_distributed_processing(dbList, 10, Processor, output_results)
    # Wait for it all to finish.
    results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simple Python and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice, and a minimal Pool-based sketch follows.
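For example, a minimal Pool-based sketch of your pipeline (my example; the doubling processor function stands in for your real work):
from multiprocessing import Pool

def processor(data_entry):
    return data_entry * 2

if __name__ == '__main__':
    dbList = [i for i in range(300)]           # substitute for the database
    with Pool(10) as pool:
        results = pool.map(processor, dbList)  # blocks until all 300 entries are processed
    print(results[:5])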
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor

def Processor(data_entry):
    return data_entry * 2

def perform_distributed_processing(dbList, threads, processor_factory):
    # ProcessPoolExecutor takes max_workers, not a processes argument.
    with ProcessPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(processor_factory, dbList)

if __name__ == '__main__':
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    for result in perform_distributed_processing(dbList, 8, Processor):
        print(result)
Or, if you want to handle them as they come instead of in order:
from concurrent.futures import Future, as_completed

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        fs = (executor.submit(processor_factory, db) for db in dbList)
        yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library. I think you can use any of the multiprocessing.Pool methods depending on your needs; if this is a batch job you can even use the synchronous multiprocessing.Pool.map(). Only instead of pushing to an input queue, you write a generator that yields the input to the pool, as in the sketch below.
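A minimal sketch of that generator approach (my example; read_entries is a hypothetical stand-in for reading rows from the SQLite database):
from multiprocessing import Pool

def process_entry(data_entry):
    return data_entry * 2

def read_entries():
    # Hypothetical stand-in for pulling rows out of the SQLite database one at a time.
    for i in range(300):
        yield i

if __name__ == '__main__':
    with Pool(10) as pool:
        # imap consumes the generator lazily and yields results in order;
        # plain Pool.map would also work, but it reads the whole generator
        # into a list before dispatching any work.
        for result in pool.imap(process_entry, read_entries(), chunksize=10):
            print(result)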