Multiprocessing in Python with workers acting as both producers and consumers

I'm trying to write code in which there is a single queue and many workers (producer_consumer in the example) that process objects from the queue. I need to use multiprocessing since the code the workers execute will be CPU-bound. The setup is the following:
The queue is initialized by the parent process with some initial values (names in the example), which then starts the workers.
Workers get elements from the queue, and after processing an element each worker may produce a new object to be inserted into the queue (and then processed by someone else).
All this goes on until the queue is empty. When that happens I would like all workers to stop and control to be given back to the parent to conclude the execution.
I wrote this example in which the workers correctly process elements and produce new objects into the queue, but the problem is that the execution hangs when the queue is empty. Any suggestions?
Thanks in advance.
import time
import os
import random
import string
from multiprocessing import Process, Queue, Lock

# Produces and consumes names in and from the queue
def producer_consumer(queue, lock):
    # Synchronize access to the console
    with lock:
        print('Starting consumer => {}'.format(os.getpid()))

    while not queue.empty():
        time.sleep(random.randint(0, 10))
        # If the queue is empty, queue.get() will block until the queue has data
        name = queue.get()
        if random.random() < 0.7:
            product = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))
            queue.put(product)
        else:
            product = 'nothing'
        # Synchronize access to the console
        with lock:
            print('{} got {}, produced {}'.format(os.getpid(), name, product))

if __name__ == '__main__':
    # Create the Queue object
    queue = Queue()
    # Create a lock object to synchronize resource access
    lock = Lock()
    producer_consumers = []
    names = ['Mario', 'Peppino', 'Francesco', 'Carlo', 'Ermenegildo']
    for name in names:
        queue.put(name)
    for _ in range(5):
        producer_consumers.append(Process(target=producer_consumer, args=(queue, lock)))
    for process in producer_consumers:
        process.start()
    for p in producer_consumers:
        p.join()
    print('Parent process exiting...')
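One common pattern for this kind of shutdown problem (not from the original post, just a sketch) is to avoid relying on queue.empty(), which can be misleading while other workers may still put items, and instead track outstanding work with a JoinableQueue and terminate the workers with sentinel values. A minimal sketch under those assumptions:

import os
import random
import string
from multiprocessing import Process, JoinableQueue

def producer_consumer(queue):
    while True:
        name = queue.get()
        if name is None:              # sentinel: no more work will arrive
            queue.task_done()
            break
        if random.random() < 0.7:
            # produce a new item for someone else to process
            queue.put(''.join(random.choice(string.ascii_uppercase) for _ in range(10)))
        print('{} got {}'.format(os.getpid(), name))
        queue.task_done()             # mark this item as fully processed (its product was already put above)

if __name__ == '__main__':
    queue = JoinableQueue()
    for name in ['Mario', 'Peppino', 'Francesco', 'Carlo', 'Ermenegildo']:
        queue.put(name)
    workers = [Process(target=producer_consumer, args=(queue,)) for _ in range(5)]
    for w in workers:
        w.start()
    queue.join()                      # blocks until every put() has a matching task_done()
    for _ in workers:
        queue.put(None)               # one sentinel per worker
    for w in workers:
        w.join()
    print('Parent process exiting...')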

Related

Limit the number of threads used by a child process launched with `multiprocessing.Process`

I'm trying to launch a function (my_function) and stop its execution after a certain time is reached.
So I tried the multiprocessing library and everything works well. Here is the code, where my_function() has been changed to only create a dummy message.
from multiprocessing import Queue, Process
from multiprocessing.queues import Empty
import time

timeout = 1
# timeout = 3

def my_function(something):
    time.sleep(2)
    return f'my message: {something}'

def wrapper(something, queue):
    message = "too late..."
    try:
        message = my_function(something)
        return message
    finally:
        queue.put(message)

try:
    queue = Queue()
    params = ("hello", queue)
    child_process = Process(target=wrapper, args=params)
    child_process.start()
    output = queue.get(timeout=timeout)
    print(f"ok: {output}")
except Empty:
    timeout_message = f"Timeout {timeout}s reached"
    print(timeout_message)
finally:
    if 'child_process' in locals():
        child_process.kill()
You can test and verify that, depending on timeout=1 or timeout=3, I can trigger an error or not.
My main problem is that the real my_function() is a torch model inference for which I would like to limit the number of threads (to 4, let's say).
One can easily do so if my_function runs in the main process, but in my example I tried a lot of tricks to limit it in the child process without any success (using threadpoolctl.threadpool_limits(4), torch.set_num_threads(4), os.environ["OMP_NUM_THREADS"]=4, os.environ["MKL_NUM_THREADS"]=4).
I'm completely open to other solutions that can monitor the execution time of a function while limiting the number of threads used by that function.
Thanks and regards.
You can limit the number of simultaneous processes with Pool (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
You can also set the maximum number of tasks per child via maxtasksperchild. Check it out.
Here is a sample from SuperFastPython by Jason Brownlee:
# SuperFastPython.com
# example of limiting the number of tasks per child in the process pool
from time import sleep
from multiprocessing.pool import Pool
from multiprocessing import current_process

# task executed in a worker process
def task(value):
    # get the current process
    process = current_process()
    # report a message
    print(f'Worker is {process.name} with {value}', flush=True)
    # block for a moment
    sleep(1)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool(2, maxtasksperchild=3) as pool:
        # issue tasks to the process pool
        for i in range(10):
            pool.apply_async(task, args=(i,))
        # close the process pool
        pool.close()
        # wait for all tasks to complete
        pool.join()
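As for the thread-limiting part of the question, one possible approach (an assumption, not verified here against torch) is to set the OpenMP/MKL environment variables in a Pool initializer, so they are in place in the child process before the numerical library is imported; torch.set_num_threads() can then also be called inside the worker. A minimal sketch, where inference_task is a stand-in for the real my_function:

import os
import multiprocessing
from multiprocessing.pool import Pool

def limit_threads(num_threads):
    # Runs once in each worker process, before any task; the variables must be
    # set before the threaded library (OpenMP/MKL) is loaded in that process.
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    os.environ["MKL_NUM_THREADS"] = str(num_threads)

def inference_task(something):
    # Import the heavy library here so it sees the limits set above, e.g.:
    # import torch
    # torch.set_num_threads(4)
    return f'my message: {something}'

if __name__ == '__main__':
    with Pool(processes=1, initializer=limit_threads, initargs=(4,)) as pool:
        result = pool.apply_async(inference_task, args=("hello",))
        try:
            print(result.get(timeout=3))   # same timeout idea as in the question
        except multiprocessing.TimeoutError:
            print("Timeout 3s reached")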

How do I share data between processes in Python?

Here is a simple example:
from collections import deque
from multiprocessing import Process

global_dequeue = deque([])

def push():
    global_dequeue.append('message')

p = Process(target=push)
p.start()

def pull():
    print(global_dequeue)

pull()
The output is deque([]).
If I were to call the push function directly, not as a separate process, the output would be deque(['message']).
How can I get the message into the deque, but still run the push function in a separate process?
You can share data by using a multiprocessing Queue object, which is designed to share data between processes:
from multiprocessing import Process, Queue
import time

def push(q):  # the Queue is passed to the function as an argument
    for i in range(10):
        q.put(str(i))  # put an element in the Queue
        time.sleep(0.2)
    q.put("STOP")  # poison pill to stop taking elements from the Queue in the master

if __name__ == "__main__":
    q = Queue()  # create Queue instance
    p = Process(target=push, args=(q,),)  # create Process
    p.start()  # start it
    while True:
        x = q.get()
        if x == "STOP":
            break
        print(x)
    p.join()  # join process to our master process and continue master run
    print("Finish")
Let me know if it helped, feel free to ask questions.
You can also use Managers to achieve this.
Python 2: https://docs.python.org/2/library/multiprocessing.html#managers
Python 3: https://docs.python.org/3.8/library/multiprocessing.html#managers
Example of usage:
https://pymotw.com/2/multiprocessing/communication.html#managing-shared-state
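For completeness, a minimal sketch of the Manager approach, adapted to the push/pull example from the question (not taken from the linked pages):

from multiprocessing import Manager, Process

def push(shared_list):
    shared_list.append('message')

if __name__ == '__main__':
    with Manager() as manager:
        shared_list = manager.list()   # proxy object usable from any process
        p = Process(target=push, args=(shared_list,))
        p.start()
        p.join()
        print(list(shared_list))       # prints ['message']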

How to tell if an apply_async function has started or if it's still in the queue with multiprocessing.Pool

I'm using python's multiprocessing.Pool and apply_async to call a bunch of functions.
How can I tell whether a function has started processing by a member of the pool or whether it is sitting in a queue?
For example:
import multiprocessing
import time
def func(t):
    # take some time processing
    print 'func({}) started'.format(t)
    time.sleep(t)

pool = multiprocessing.Pool()
results = [pool.apply_async(func, [t]) for t in [100]*50]  # adds 50 func calls to the queue
For each AsyncResult in results you can call ready() or get(0) to see if the func finished running. But how do you find out whether the func started but hasn't finished yet?
i.e. for a given AsyncResult object (i.e. a given element of results) is there a way to see whether the function has been called or if it's sitting in the pool's queue?
First, remove completed jobs from the results list:
results = [r for r in results if not r.ready()]
The number of pending jobs is the length of the results list:
pending = len(results)
And the number pending but not yet started is the total pending minus the pool size:
not_started = pending - pool_size
pool_size will be multiprocessing.cpu_count() if the Pool is created with the default argument, as you did.
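Putting those pieces together, a small self-contained sketch (the sleep time and task count are arbitrary choices for illustration):

import multiprocessing
import time

def func(t):
    time.sleep(t)

if __name__ == '__main__':
    pool_size = multiprocessing.cpu_count()           # default Pool size
    pool = multiprocessing.Pool()
    results = [pool.apply_async(func, [2]) for _ in range(50)]
    time.sleep(0.5)
    results = [r for r in results if not r.ready()]   # drop completed jobs
    pending = len(results)
    not_started = max(pending - pool_size, 0)         # still waiting in the queue
    print(pending, not_started)
    pool.terminate()
    pool.join()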
UPDATE:
After initially misunderstanding the question, here's a way to do what OP was asking about.
I suspect this functionality could be added to the Pool class without too much trouble, because AsyncResult is implemented by Pool with a Queue; that queue could also be used internally to indicate whether a task has started or not.
But here's a way to implement using Pool and Pipe. NOTE: this doesn't work in Python 2.x -- not sure why. Tested in Python 3.8.
import multiprocessing
import time
import os
def worker_function(pipe):
    pipe.send('started')
    print('[{}] started pipe={}'.format(os.getpid(), pipe))
    time.sleep(3)
    pipe.close()

def test():
    pool = multiprocessing.Pool(processes=2)
    print('[{}] pool={}'.format(os.getpid(), pool))
    workers = []
    for x in range(1, 4):
        parent, child = multiprocessing.Pipe()
        pool.apply_async(worker_function, (child,))
        worker = {'name': 'worker{}'.format(x), 'pipe': parent, 'started': False}
        workers.append(worker)
    pool.close()
    while True:
        for worker in workers:
            if worker.get('started'):
                continue
            pipe = worker.get('pipe')
            if pipe.poll(0.1):
                message = pipe.recv()
                print('[{}] {} says {}'.format(os.getpid(), worker.get('name'), message))
                worker['started'] = True
                pipe.close()
        count_in_queue = len(workers)
        for worker in workers:
            if worker.get('started'):
                count_in_queue -= 1
        print('[{}] count_in_queue = {}'.format(os.getpid(), count_in_queue))
        if not count_in_queue:
            break
        time.sleep(0.5)
    pool.join()

if __name__ == '__main__':
    test()

Synchronize pool of workers - Python and multiproccessing

I want to make a synchronized simulation of graph coloring. To create the graph (a tree) I am using the igraph package, and for synchronization I am using the multiprocessing package for the first time. I built a graph where each node has the attributes label, color and parentColor. To color the tree I execute the following function (I am not giving the full code because it is very long, and I think not necessary to understand my problem):
def sixColor(self):
    root = self.graph.vs.find("root")
    root["color"] = self.takeColorFromList(root["label"])
    self.sendToChildren(root)
    lista = []
    for e in self.graph.vs():
        lista.append(e.index)
    p = multiprocessing.Pool(len(lista))
    p.map(fun, zip([self]*len(lista), lista), chunksize=300)

def process_sixColor(self, id):
    v = self.graph.vs.find(id)
    if not v["name"] == "root":
        while True:
            if v["received"] == True:
                v["received"] = False
                # ------------ Part 1 -----------
                self.sendToChildren(v)
                self.printInfo()
                # ----------- Part 2 -------------
                diffIdx = self.compareLabelWithParent(v)
                if not diffIdx == -1:
                    diffIdxStr = str(bin(diffIdx))[2:]
                    charAtPos = (v["label"][::-1])[diffIdx]
                    newLabel = diffIdxStr + charAtPos
                    v["label"] = newLabel
                    self.sendToChildren(v)
                    colorNum = int(newLabel, 2)
                    if colorNum in sixColorList:
                        v["color"] = self.takeColorFromList(newLabel)
                        self.printGraph()
                        break
I want each node (except the root) to call the function process_sixColor synchronously in parallel, and not to evaluate Part 2 before Part 1 has been completed by all nodes. But I notice that this is not working properly: some nodes evaluate Part 2 before every other node has executed Part 1. How can I solve that problem?
You can use a combination of a multiprocessing.Queue and a multiprocessing.Event object to synchronize the workers. Make the main process create a Queue and an Event and pass both to all the workers. The Queue will be used by the workers to let the main process know that they are finished with part 1. The Event will be used by the main process to let all the workers know that all the workers are finished with part 1. Basically,
the workers will call queue.put() to let the main process know that they have reached part 2 and then call event.wait() to wait for the main process to give the green light.
the main process will repeatedly call queue.get() until it receives as many messages as there are workers in the worker pool and then call event.set() to give the green light for the workers to start with part 2.
This is a simple example:
from __future__ import print_function
from multiprocessing import Event, Process, Queue
def worker(identifier, queue, event):
    # Part 1
    print("Worker {0} reached part 1".format(identifier))
    # Let the main process know that we have finished part 1
    queue.put(identifier)
    # Wait for all the other processes
    event.wait()
    # Start part 2
    print("Worker {0} reached part 2".format(identifier))

def main():
    queue = Queue()
    event = Event()
    processes = []
    num_processes = 5
    # Create the worker processes
    for identifier in range(num_processes):
        process = Process(target=worker, args=(identifier, queue, event))
        processes.append(process)
        process.start()
    # Wait for "part 1 completed" messages from the processes
    while num_processes > 0:
        queue.get()
        num_processes -= 1
    # Set the event now that all the processes have reached part 2
    event.set()
    # Wait for the processes to terminate
    for process in processes:
        process.join()

if __name__ == "__main__":
    main()
If you want to use this in a production environment, you should think about how to handle errors that occur in part 1. Right now if an exception happens in part 1, the worker will never call queue.put() and the main process will block indefinitely waiting for the message from the failed worker. A production-ready solution should probably wrap the entire part 1 in a try..except block and then send a special error signal in the queue. The main process can then exit immediately if the error signal is received in the queue.
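A hedged sketch of that error handling, where do_part_1 and do_part_2 are hypothetical placeholders for the real work:

def worker(identifier, queue, event):
    try:
        do_part_1(identifier)                       # hypothetical part-1 work
    except Exception as exc:
        queue.put(("error", identifier, str(exc)))  # special error signal
        return
    queue.put(("ok", identifier))                   # normal "part 1 done" message
    event.wait()
    do_part_2(identifier)                           # hypothetical part-2 work

# In main(), instead of a bare queue.get():
#     status, identifier, *rest = queue.get()
#     if status == "error":
#         # terminate the remaining workers and exit immediately
#         ...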

Python multiprocessing: accessing data from a multiprocessing queue not reading all data

I have an iterator which contains a lot of data (larger than memory) and I want to be able to perform some actions on this data. To do this quickly I am using the multiprocessing module.
def __init__(self, poolSize, spaceTimeTweetCollection=None):
    super().__init__()
    self.tagFreq = {}
    if spaceTimeTweetCollection is not None:
        q = Queue()
        processes = [Process(target=self.worker, args=((q),)) for i in range(poolSize)]
        for p in processes:
            p.start()
        for tweet in spaceTimeTweetCollection:
            q.put(tweet)
        for p in processes:
            p.join()
The aim is that I create some processes which listen on the queue:
def worker(self, queue):
    tweet = queue.get()
    self.append(tweet)  # performs some actions on the data
I then loop over the iterator and add the data to the queue. Since queue.get() in the worker method is blocking, the workers should start performing actions on the data as they receive it from the queue.
However, instead each worker on each processor runs once and that's it! So if poolSize is 8 it will read the first 8 items in the queue, perform the actions in 8 different processes, and then finish! Does anyone know why this is happening? I am running this on Windows.
Edit
I wanted to mention that even though this is all being done in a class, the class is called in __main__ like so:
if __name__ == '__main__':
    tweetDatabase = Database()
    dataSet = tweetDatabase.read2dBoundingBox(boundaryBox)
    freq = TweetCounter(8, dataSet)  # this is where the multiprocessing is done
Your worker is to blame, I believe. It just does one thing and then dies. Try:
def worker(self, queue):
    while True:
        tweet = queue.get()
        self.append(tweet)
(I'd take a look at Pool though)
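On the Pool suggestion: a hedged sketch of how the same job could look with Pool.imap_unordered, which consumes the iterator lazily and so suits data larger than memory (process_tweet is a hypothetical stand-in for self.append):

from multiprocessing import Pool

def process_tweet(tweet):
    # hypothetical stand-in for the real per-tweet work
    return len(tweet)

if __name__ == '__main__':
    data = ('tweet {}'.format(i) for i in range(100))   # stand-in for spaceTimeTweetCollection
    with Pool(8) as pool:
        for result in pool.imap_unordered(process_tweet, data, chunksize=10):
            pass   # aggregate the results here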
