I'm using a library whose multiprocessing is implemented using multiprocessing.pool.ThreadPool(processes).
How is it possible to get/compute the thread index from the pool (starting from 0 to processes - 1)?
I've been reading the documentation and searching the web without finding a convincing solution. I can get the thread ID (using threading.get_ident()) and could go through all the threads to construct a mapping between their index and their ID, but I would need to use some kind of time.sleep() to make sure I reach them all... Can you think of a better solution?
The idea is to create a worker function, called test_worker in the example below, that returns its thread identity and the argument it is called with, which takes on values of 0 ... pool size - 1. We then submit tasks with:
pool.map(test_worker, range(POOLSIZE), 1)
By specifying a chunksize value of 1, the idea is that each thread is given just one task to process, with the first thread given argument 0, the second thread argument 1, and so on. We must ensure that test_worker gives up control of the processor to the other threads in the pool; if it consisted only of a return statement, the first thread might end up processing all the tasks. Essentially, tasks are placed on a single queue in lists of chunksize tasks, and each pool thread takes the next available list off the queue and processes the tasks in it. But if the task is trivial, the first thread could grab all the lists because it never gives up control of the processor to the other threads. To avoid this, we insert a call to time.sleep in our worker.
from multiprocessing.pool import ThreadPool
import threading
import time

def test_worker(i):
    # To ensure that the worker gives up control of the processor we sleep.
    # Otherwise, the same thread may be given all the tasks to process.
    time.sleep(.1)
    return threading.get_ident(), i

def real_worker(x):
    # Return the argument squared and the id of the thread that did the work:
    return x**2, threading.get_ident()
POOLSIZE = 5
with ThreadPool(POOLSIZE) as pool:
    # chunksize = 1 is critical to be sure that we have 1 task per thread:
    thread_dict = {result[0]: result[1]
                   for result in pool.map(test_worker, range(POOLSIZE), 1)}
    assert len(thread_dict) == POOLSIZE
    print(thread_dict)

    value, id = pool.apply(real_worker, (7,))
    print(value)  # should be 49
    assert id in thread_dict
    print('thread index = ', thread_dict[id])
Prints:
{16880: 0, 16152: 1, 7400: 2, 13320: 3, 168: 4}
49
thread index = 4
A Version That Does Not Use sleep
from multiprocessing.pool import ThreadPool
import threading
import time

def test_worker(i, event):
    if event:
        event.wait()
    return threading.get_ident(), i

def real_worker(x):
    return x**2, threading.get_ident()
# Let's use a really big pool size for a good test:
POOLSIZE = 500
events = [threading.Event() for _ in range(POOLSIZE - 1)]
with ThreadPool(POOLSIZE) as pool:
    thread_dict = {}
    # These first POOLSIZE - 1 tasks will wait until we set their events:
    results = [pool.apply_async(test_worker, args=(i, event)) for i, event in enumerate(events)]
    # This last one is not passed an event and so it does not wait.
    # When it completes, we can be sure the other tasks, which have been submitted before it,
    # have already been picked up by the other threads in the pool.
    id, index = pool.apply(test_worker, args=(POOLSIZE - 1, None))
    thread_dict[id] = index
    # Let the others complete:
    for event in events:
        event.set()
    for result in results:
        id, index = result.get()
        thread_dict[id] = index
    assert len(thread_dict) == POOLSIZE

    value, id = pool.apply(real_worker, (7,))
    print(value)  # should be 49
    assert id in thread_dict
    print('thread index = ', thread_dict[id])
Prints:
49
thread index = 499
It is possible to get the index of the thread in the ThreadPool without using sleep, by using the initializer function. This is a function that is called once immediately after the thread is started. It can be used to acquire resources, such as a database connection, to use exactly one connection per thread.
Use threading.local() to make sure that each thread can store and access its own resource. In the example below we treat the index in the ThreadPool as a resource. Use a Queue to make sure no two threads grab the same resource.
from multiprocessing.pool import ThreadPool
import threading
import time
import queue

POOL_SIZE = 4

local_storage = threading.local()

def init_thread_resource(resources):
    local_storage.pool_idx = resources.get(False)
    print(f'\nThread {threading.get_ident()} has pool_idx {local_storage.pool_idx}')
    ## A thread can also initialize other things here, meant for only 1 thread, e.g.
    # local_storage.db_connection = open_db_connection()

def task(item):
    # When running this example you may see all the tasks are picked up by one thread.
    # Uncomment time.sleep below to see each of the threads do some work.
    # This is not required to assign a unique index to each thread.
    # time.sleep(1)
    return f'Thread {threading.get_ident()} with pool_idx {local_storage.pool_idx} handled {item}'

def run_concurrently():
    # Initialize the resources
    resources = queue.Queue(POOL_SIZE)  # one resource per thread
    for pool_idx in range(POOL_SIZE):
        resources.put(pool_idx, False)
    container = range(500, 500 + POOL_SIZE)  # Offset by 500 to not confuse the items with the pool_idx
    with ThreadPool(POOL_SIZE, init_thread_resource, [resources]) as pool:
        records = pool.map(task, container)
    print('\n'.join(records))

run_concurrently()
This outputs:
Thread 32904 with pool_idx 0 handled 500
Thread 14532 with pool_idx 1 handled 501
Thread 32008 with pool_idx 2 handled 502
Thread 31552 with pool_idx 3 handled 503
Related
Is there any option to have a multiprocessing Queue where each value can be accessed twice?
My problem is I have one "Generator process" creating a constant flux of data, and I would like to access this data in two different processes, each doing its own thing with it.
A minimal "example" of the issue.
import multiprocessing as mp
import numpy as np

class Process1(mp.Process):
    def __init__(self, Data_Queue):
        mp.Process.__init__(self)
        self.Data_Queue = Data_Queue

    def run(self):
        while True:
            self.Data_Queue.get()
            # Do stuff with the data
            self.Data_Queue.task_done()

class Process2(mp.Process):
    def __init__(self, Data_Queue):
        mp.Process.__init__(self)
        self.Data_Queue = Data_Queue

    def run(self):
        while True:
            self.Data_Queue.get()
            # Do stuff with the data
            self.Data_Queue.task_done()

if __name__ == "__main__":
    data_Queue = mp.JoinableQueue()  # task_done() requires a JoinableQueue
    P1 = Process1(data_Queue)
    P1.start()

    P2 = Process2(data_Queue)
    P2.start()

    while True:  # Generate data
        data_Queue.put(np.random.rand(1000))
The idea is that I would like for both Process1 and Process2 to access all generated data in this example. What would happen is that each one would only just get some random portions of it this way.
Thanks for the help!
Update 1: As pointed out in some of the questions and answers, this becomes a little more complicated for two reasons I did not include in the initial question.
The data is externally generated on a non-constant schedule (I may receive tons of data for a few seconds, then wait minutes for more to come).
As such, data may arrive faster than it can be processed, so it would need to be "queued" in some way while it waits for its turn to be processed.
One way to solve your problem is, first, to use a multiprocessing.Array to share, say, a numpy array with your data between the worker processes. Second, use a multiprocessing.Barrier to synchronize the main process and the workers when generating and processing the data batches. And, finally, provide each worker process with its own queue to signal it when the next data batch is ready for processing. Below is a complete working example just to show you the idea:
#!/usr/bin/env python3
import os
import time
import ctypes
import multiprocessing as mp
import numpy as np

WORKERS = 5
DATA_SIZE = 10
DATA_BATCHES = 10

def process_data(data, queue, barrier):
    proc = os.getpid()
    print(f'[WORKER: {proc}] Started')
    while True:
        data_batch = queue.get()
        if data_batch is None:
            break
        arr = np.frombuffer(data.get_obj())
        print(f'[WORKER: {proc}] Started processing data {arr}')
        time.sleep(np.random.randint(0, 2))
        print(f'[WORKER: {proc}] Finished processing data {arr}')
        barrier.wait()
    print(f'[WORKER: {proc}] Finished')

def generate_data_array(i):
    print(f'[DATA BATCH: {i}] Start generating data... ', end='')
    time.sleep(np.random.randint(0, 2))
    data = np.random.randint(0, 10, size=DATA_SIZE)
    print(f'Done! {data}')
    return data

if __name__ == '__main__':
    data = mp.Array(ctypes.c_double, DATA_SIZE)
    data_barrier = mp.Barrier(WORKERS + 1)
    workers = []

    # Start workers:
    for _ in range(WORKERS):
        data_queue = mp.Queue()
        p = mp.Process(target=process_data, args=(data, data_queue, data_barrier))
        p.start()
        workers.append((p, data_queue))

    # Generate data batches in the main process:
    for i in range(DATA_BATCHES):
        arr = generate_data_array(i + 1)
        data_arr = np.frombuffer(data.get_obj())
        np.copyto(data_arr, arr)
        for _, data_queue in workers:
            # Signal workers that the new data batch is ready:
            data_queue.put(True)
        data_barrier.wait()

    # Stop workers:
    for worker, data_queue in workers:
        data_queue.put(None)
        worker.join()
Here, you start with the definition of the shared data array data and the barrier data_barrier used for process synchronization. Then, in the loop, you instantiate a queue data_queue and create and start a worker process p, passing the shared data array, the queue instance, and the shared barrier instance data_barrier as its parameters. Once the workers have been started, you generate data batches in a loop, copy each generated numpy array into the shared data array, and signal the processes via their queues that the next data batch is ready for processing. Then you wait on the barrier until all the worker processes have finished their work before generating the next data batch. In the end, you send a None sentinel to all the processes in order to make them quit their infinite processing loops.
I'm currently setting up an automated simulation pipeline for OpenFOAM (CFD library) using the PyFoam library within Python to create a large database for machine learning purposes. The database will have around 500k distinct simulations. To run this pipeline on multiple machines, I'm using the multiprocessing.Pool.starmap_async(args) option, which will continually start a new simulation once the old simulation has completed.
However, since some of the simulations might / will crash, I want to generate a textfile with all the cases which have crashed.
I've already found this thread, which implements multiprocessing.Manager.Queue() and adds a listener, but I failed to get it running with starmap_async(). For my testing I'm trying to print the case name for every simulation which has completed, but currently only one entry is written into the text file instead of all of them (the simulations all complete successfully).
So my question is: how can I write a message to a file for each simulation which has completed?
The current code layout looks roughly like this - only the important snippet has been added, as the remaining code can't be run without OpenFOAM and additional custom scripts which were created for the automation.
Any help is highly appreciated! :)
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine

import numpy as np
import multiprocessing
import itertools
import psutil

# Defining global variables
manager = multiprocessing.Manager()
queue = manager.Queue()

def runCase(airfoil, angle, velocity):
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    '''
    A lot of pre-processing commands to prepare the simulation
    which have been removed from the snippet, such as generate geometry, create mesh etc...
    '''
    # run simulation
    machine = LAMMachine(nr=4)  # set number of cores for parallel execution
    simulation = BasicRunner(argv=[solver, "-case", case.name], silent=True, lam=machine, logname="solver")
    simulation.start()  # start simulation
    # check if simulation has completed
    if simulation.runOK():
        # write message into queue
        queue.put(newCase)
    if not simulation.runOK():
        print("Simulation did not run successfully")

def listener(queue):
    fname = 'errors.txt'
    msg = queue.get()
    while True:
        with open(fname, 'w') as f:
            if msg == 'complete':
                break
            f.write(str(msg) + '\n')

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    paramlist = list(itertools.product(nacas, angles, np.round(machs, 9)))

    # create number of processes and keep 2 cores idle for other processes
    nCores = psutil.cpu_count(logical=False) - 2
    nProc = 4
    nProcs = int(nCores / nProc)

    with multiprocessing.Pool(processes=nProcs) as pool:
        pool.apply_async(listener, (queue,))          # start the listener
        pool.starmap_async(runCase, paramlist).get()  # run parallel simulations
        queue.put('complete')
        pool.close()
        pool.join()

if __name__ == '__main__':
    main()
First, when your with multiprocessing.Pool(processes=nProcs) as pool: block exits, there will be an implicit call to pool.terminate(), which will kill all pool processes and with them any running or queued-up tasks. There is no point in calling queue.put('complete') since nobody is listening.
Second, your "listener" task gets only a single message from the queue. If it is "complete", it terminates immediately. If it is something else, it just loops continuously writing the same message to the output file. This cannot be right, can it? Did you forget an additional call to queue.get() in your loop?
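For reference, a minimal sketch of what a corrected listener loop could look like if you keep the queue-based approach (the file is opened once, and a new message is fetched on every iteration):

def listener(queue):
    # Keep reading messages until the 'complete' sentinel arrives:
    with open('errors.txt', 'w') as f:
        while True:
            msg = queue.get()
            if msg == 'complete':
                break
            f.write(str(msg) + '\n')
            f.flush()  # make sure the entry is written out immediately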
Third, I do not quite follow your computation of nProcs. Why the division by 4? If you had 5 physical processors, nProcs would be computed as 0. Do you mean something like:
nProcs = psutil.cpu_count(logical=False) // 4
if nProcs == 0:
    nProcs = 1
elif nProcs > 1:
    nProcs -= 1  # Leave a core free
Fourth, why do you need a separate "listener" task at all? Have your runCase task return whatever message is appropriate, according to how it completes, back to the main process. In the code below, multiprocessing.pool.Pool.imap is used so that results can be processed as the tasks complete and return their results:
from PyFoam.Execution.BasicRunner import BasicRunner
from PyFoam.Execution.ParallelExecution import LAMMachine

import numpy as np
import multiprocessing
import itertools
import psutil

def runCase(tpl):
    # Unpack tuple:
    airfoil, angle, velocity = tpl
    # define simulation name
    newCase = str(airfoil) + "_" + str(angle) + "_" + str(velocity)
    ...  # Code omitted for brevity
    # check if simulation has completed
    if simulation.runOK():
        return ''  # No error
    # Simulation did not run successfully:
    return f"Simulation {newCase} did not run successfully"

def main():
    # Create parameter list
    angles = np.arange(-5, 0, 1)
    machs = np.array([0.15])
    nacas = ['0012']
    # There is no reason to convert this into a list; it
    # can be lazily computed:
    paramlist = itertools.product(nacas, angles, np.round(machs, 9))

    # create number of processes and keep 1 core idle for the main process
    nCores = psutil.cpu_count(logical=False) - 1
    nProc = 4
    nProcs = int(nCores / nProc)

    with multiprocessing.Pool(processes=nProcs) as pool:
        with open('errors.txt', 'w') as f:
            # Process each message result as soon as its task ends.
            # Use method imap_unordered if you do not care about the order
            # of the messages in the output.
            # imap passes a single argument to runCase; each element of
            # paramlist is already a tuple, so it can be passed as-is:
            for msg in pool.imap(runCase, paramlist):
                if msg != '':  # Error completion
                    print(msg)
                    print(msg, file=f)
        # Not really necessary here; exiting the with block terminates the pool:
        pool.close()
        pool.join()

if __name__ == '__main__':
    main()
How can I get data back from one (provider) Process to another (requester) Process that made a request of it? This needs to work when the requester process may not have been running when the provider process was started.
I know the provider process can have a Queue to receive requests from multiple requesters, but how can it return data to the requester process that put the request on the queue? How can the requester process know when the data is available?
Simplified Example
Process A requests data from Process Z and waits for Z to return it. Process B is started and independently requests data from Process Z. Process Z handles the requests in order, returning one set of data to A and another to B.
Process Z needs to run continually while Processes A, B, C ... etc. may come and go.
N.B. Briefly, this is needed because Process Z is managing requests to an external resource which does not handle concurrent requests.
Thanks for any help and suggestions.
Julian
In the following demo I have chosen to make "Process Z" a daemon process, meaning that it will automatically terminate when all non-daemon, i.e. "regular", processes terminate. Alternatively, you can make it a regular process and put into its input queue a special sentinel request value, such as (None, None), to signal it to terminate.
The idea is that Process Z will be initialized with an input queue to which other processes put requests. Each request is a tuple. For this demo the first element of the tuple is a value to be squared and the second element is a queue to which the result is to be put. So each process that is making a request of Process Z passes its own result queue instance. The important thing to note here is that you cannot put an instance of a multiprocessing.Queue object onto another multiprocessing.Queue instance: you will get RuntimeError: Queue objects should only be shared between processes through inheritance. Therefore, these result queues must instead be managed queue instances created by a multiprocessing.managers.SyncManager instance, which is returned by calling multiprocessing.Manager():
from multiprocessing import Process, Queue, Manager

def process_z(input_q):
    while True:
        x, result_q = input_q.get()
        result_q.put(x ** 2)

def process_a(input_q, result_q):
    input_q.put((3, result_q))
    result = result_q.get()
    print('3 ** 2 =', result)

def process_b(input_q, result_q):
    input_q.put((7, result_q))
    result = result_q.get()
    print('7 ** 2 =', result)

    input_q.put((5, result_q))
    result = result_q.get()
    print('5 ** 2 =', result)

    for x in range(10, 14):
        input_q.put((x, result_q))
    for x in range(10, 14):
        result = result_q.get()
        print(f'{x} ** 2 =', result)

# required by Windows:
if __name__ == '__main__':
    with Manager() as manager:
        input_q = Queue()
        # make "Process Z" a daemon process that will end when all non-daemon processes end:
        Process(target=process_z, args=(input_q,), daemon=True).start()

        result_q_a = manager.Queue()
        p_a = Process(target=process_a, args=(input_q, result_q_a))
        p_a.start()

        result_q_b = manager.Queue()
        p_b = Process(target=process_b, args=(input_q, result_q_b))
        p_b.start()

        # wait for completion of non-daemon processes:
        p_a.join()
        p_b.join()
Prints:
3 ** 2 = 9
7 ** 2 = 49
5 ** 2 = 25
10 ** 2 = 100
11 ** 2 = 121
12 ** 2 = 144
13 ** 2 = 169
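For completeness, the sentinel-based alternative mentioned at the start of this answer is only a small change; the following is a rough sketch rather than a drop-in replacement for the demo above. process_z would check for the (None, None) sentinel:

def process_z(input_q):
    while True:
        x, result_q = input_q.get()
        if x is None:  # (None, None) sentinel: time to shut down
            break
        result_q.put(x ** 2)

and the main block would create Process Z as a regular (non-daemon) process, keep a reference to it (called p_z here, a name the demo above does not actually use), and finish with:

        input_q.put((None, None))  # tell Process Z to terminate
        p_z.join()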
I'm trying to speed up a simple Python program using multiprocessing's Pool. Specifically: the imap_unordered function.
In my case I'm searching for a specific object with specific properties, and checking this property takes a long time, hence the reason I want to spread the load over my CPU cores.
I created the following code:
from multiprocessing import Pool as ThreadPool

pool = ThreadPool(4)
some_iterator = (create_item() for _ in range(100000))
results = pool.imap_unordered(my_function, some_iterator)
for result in results:
    if is_favourable(result):
        break
Unfortunately, after calling break, there is still a lot of activity in the threads (as can be observed in my computer's activity monitor). How should I keep searching for results until I find a favourable one, and how can I stop iterating over all items of the imap_unordered iterator?
Pool.terminate() will immediately stop the worker processes, while Pool.close() will stop submitting tasks and the processes will close once their current task is done.
Pool.terminate() will also be called if the Pool instance is garbage-collected or when it is used as a context manager with a with statement, so the following is a solution:
import multiprocessing as mp
import time

def my_function(item):
    print(mp.current_process().name, item)
    time.sleep(2)  # imitate a long process
    return item * 2

def is_favourable(item):
    return item == 20  # something to look for (the result of item 10)

def find():
    with mp.Pool() as pool:
        some_iterator = range(100)
        results = pool.imap_unordered(my_function, some_iterator)
        for result in results:
            print(result)
            if is_favourable(result):
                return result  # pool will be terminated when exiting the with block

if __name__ == '__main__':
    start = time.time()
    find()
    print(time.time() - start)
A single thread would find item 10 in 22 seconds. On my 8-core system it finds it in ~4 seconds:
SpawnPoolWorker-2 0
SpawnPoolWorker-3 1
SpawnPoolWorker-1 2
SpawnPoolWorker-5 3
SpawnPoolWorker-4 4
SpawnPoolWorker-8 5
SpawnPoolWorker-7 6
SpawnPoolWorker-6 7
SpawnPoolWorker-1 8
SpawnPoolWorker-3 9
SpawnPoolWorker-2 10
4
2
0
8
SpawnPoolWorker-4 11
SpawnPoolWorker-8 12
10
SpawnPoolWorker-5 13
6
12
SpawnPoolWorker-7 14
SpawnPoolWorker-6 15
14
SpawnPoolWorker-3 16
18
SpawnPoolWorker-1 17
SpawnPoolWorker-2 18
16
20
4.203129768371582
For starters, your example code is not using a multiprocessing ThreadPool, because your import statement is wrong (it just allows access to the regular process-based Pool class via that name).
Regardless, you can use the Pool/ThreadPool as a context manager (since Python 3.3) and put the loop inside it. This causes its terminate() method to be called automatically when the context is exited (due to the break statement in the example below), and it will immediately stop the working processes.
from multiprocessing import current_process
from multiprocessing.pool import ThreadPool
from random import randint
import time

def create_item():
    return randint(0, 20)

def is_favourable(value):
    return value < 20

def my_function(value):
    print(current_process().name, value)
    time.sleep(2)
    return value * 2

if __name__ == '__main__':
    with ThreadPool(4) as pool:  # Use as context manager (Python 3.3+)
        some_iterator = (create_item() for _ in range(10000))
        start = time.time()
        results = pool.imap_unordered(my_function, some_iterator)
        for result in results:
            print('result:', result)
            if is_favourable(result):
                break  # Stop loop and exit Pool context.
    print('done')
    print(time.time() - start)
If you're using an older version of Python, you can just explicitly call pool.terminate() immediately before the break statement (and not use a with statement).
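A minimal sketch of that older-style variant (reusing my_function, is_favourable and create_item from the example above):

pool = ThreadPool(4)
some_iterator = (create_item() for _ in range(10000))
for result in pool.imap_unordered(my_function, some_iterator):
    print('result:', result)
    if is_favourable(result):
        pool.terminate()  # stop the worker threads immediately
        break
else:
    pool.close()  # nothing favourable found; let the workers finish normally
pool.join()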
My program basically has to get around 6000 items from the DB and call an external API for each item. This takes almost 30 min to complete. I just thought of using threads here, where I could create multiple threads and split the work to reduce the time. So I came up with something like this. But I have two questions here. How do I store the response from the API that is processed by the function?
api = externalAPI()

for x in instruments:
    response = api.getProcessedItems(x.symbol, days, return_perc)
    if response > float(return_perc):
        return_response.append([x.trading_symbol, x.name, response])
So in the above example the for loop runs 6000 times (len(instruments) == 6000).
Now let's say I have split the 6000 items into 2 * 3000 items and do something like this:
import _thread

class externalApi:
    def handleThread(self, symbol, days, perc):
        # I call the external API and process the items
        # how do I store the processed data?
        ...

    def getProcessedItems(self, symbol, days, perc):
        _thread.start_new_thread(self.handleThread, (symbol, days, perc))
        _thread.start_new_thread(self.handleThread, (symbol, days, perc))
        return self.thread_response
I am just starting out with threads. It would be helpful to know whether this is the right approach to reduce the time here.
P.S.: Time is important here. I want to reduce it from 30 min to 1 min.
I suggest using the worker-queue pattern, like so...
You have a queue of jobs; each worker takes a job from it and works on it, putting its result onto a separate result queue. When all workers are done, the result queue is read and the results are processed.
import queue
import threading

def worker(pool, result_q):
    while True:
        job = pool.get()
        result = handle(job)  # handle the job (handle, num_worker_threads and jobs are your own code)
        result_q.put(result)
        pool.task_done()

q = queue.Queue()
res_q = queue.Queue()

for i in range(num_worker_threads):
    t = threading.Thread(target=worker, args=(q, res_q))
    t.daemon = True
    t.start()

for job in jobs:
    q.put(job)

q.join()

while not res_q.empty():
    result = res_q.get()
    # do smth with result
The worker-queue pattern suggested in shahaf's answer works fine, but Python provides even higher-level abstractions in concurrent.futures, namely ThreadPoolExecutor, which will take care of the queueing and starting of threads for you:
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=30)
responses = executor.map(process_item, (x.symbol for x in instruments))
The main complication with using executor.map() is that it can only map over one argument, meaning that there can be only one input to process_item (namely symbol).
However, if more arguments are needed, it is possible to define a new function which fixes all arguments but one. This can be done either manually or using functools.partial:
from functools import partial
process_item = partial(api.handleThread, days=days, perc=return_perc)
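Doing it manually amounts to a small wrapper function; a sketch that is equivalent to the partial above:

def process_item(symbol):
    # days and return_perc are taken from the enclosing scope
    return api.handleThread(symbol, days=days, perc=return_perc)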
Applying the ThreadPoolExecutor strategy to your current problem would then give a solution similar to:
from concurrent.futures import ThreadPoolExecutor
from functools import partial

class Instrument:
    def __init__(self, symbol, name):
        self.symbol = symbol
        self.name = name

instruments = [Instrument('SMB', 'Name'), Instrument('FNK', 'Funky')]

class externalApi:
    def handleThread(self, symbol, days, perc):
        # Call the external API and process the items
        # Example, to give something back:
        if symbol == 'FNK':
            return days * 3
        else:
            return days

def process_item_generator(api, days, perc):
    return partial(api.handleThread, days=days, perc=perc)

days = 5
return_perc = 10

api = externalApi()
process_item = process_item_generator(api, days, return_perc)

executor = ThreadPoolExecutor(max_workers=30)
responses = executor.map(process_item, (x.symbol for x in instruments))

return_response = ([x.symbol, x.name, response]
                   for x, response in zip(instruments, responses)
                   if response > float(return_perc))
Here I have assumed that x.symbol is the same as x.trading_symbol, and I have made a dummy implementation of your API call to get some type of return value, but it should give a good idea of how to do this. Because of that, the code is a bit longer, but then again, it becomes a runnable example.
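As a side note, and purely as a sketch of an alternative style, ThreadPoolExecutor can also be used as a context manager so that the worker threads are shut down cleanly once the results have been collected:

with ThreadPoolExecutor(max_workers=30) as executor:
    responses = list(executor.map(process_item, (x.symbol for x in instruments)))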