Here is a simple example:
from collections import deque
from multiprocessing import Process

global_dequeue = deque([])

def push():
    global_dequeue.append('message')

p = Process(target=push)
p.start()

def pull():
    print(global_dequeue)

pull()
The output is deque([]).
If I were to call the push function directly, not as a separate process, the output would be deque(['message']).
How can I get the message into the deque, but still run the push function in a separate process?
You can share data by using a multiprocessing Queue object, which is designed to share data between processes:
from multiprocessing import Process, Queue
import time

def push(q):  # send Queue to function as argument
    for i in range(10):
        q.put(str(i))  # put element in Queue
        time.sleep(0.2)
    q.put("STOP")  # put poison pill to stop taking elements from Queue in master

if __name__ == "__main__":
    q = Queue()  # create Queue instance
    p = Process(target=push, args=(q,))  # create Process
    p.start()  # start it
    while True:
        x = q.get()
        if x == "STOP":
            break
        print(x)
    p.join()  # join the child process back into the master process and continue
    print("Finish")
Let me know if it helped, feel free to ask questions.
You can also use Managers to achieve this.
Python 2: https://docs.python.org/2/library/multiprocessing.html#managers
Python 3: https://docs.python.org/3.8/library/multiprocessing.html#managers
Example of usage:
https://pymotw.com/2/multiprocessing/communication.html#managing-shared-state
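For instance, here is a minimal sketch of sharing data through a Manager (a managed list stands in for the deque from the question; the proxy object can be passed to child processes and mutated from either side):

from multiprocessing import Process, Manager

def push(shared):
    shared.append('message')  # the proxy forwards this call to the manager process

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.list()  # proxy to a list living in the manager process
        p = Process(target=push, args=(shared,))
        p.start()
        p.join()
        print(list(shared))  # ['message']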
Related
I'm trying to write code in which there is a single queue and many workers (producer_consumer in the example) that process objects in the queue. I need to use multiprocessing since the code the workers are going to execute will be CPU bound. The setup is the following:
The queue is initialized by the parent process with some initial values (names in the example), then it starts the workers.
Workers start getting elements from the queue, and after processing an element each worker may produce a new object to be inserted into the queue (and then processed by someone else).
All this goes on until the queue is empty. When this happens, I would like all workers to stop and control to be given back to the parent to conclude the execution.
I wrote this example, in which workers correctly process elements and produce new objects into the queue, but the problem is that the execution hangs when the queue is empty. Any suggestions?
Thanks in advance
import time
import os
import random
import string
from multiprocessing import Process, Queue, Lock

# Produces and consumes names in and from the Queue
def producer_consumer(queue, lock):
    # Synchronize access to the console
    with lock:
        print('Starting consumer => {}'.format(os.getpid()))
    while not queue.empty():
        time.sleep(random.randint(0, 10))
        # If the queue is empty, queue.get() will block until the queue has data
        name = queue.get()
        if random.random() < 0.7:
            product = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(10))
            queue.put(product)
        else:
            product = 'nothing'
        # Synchronize access to the console
        with lock:
            print('{} got {}, produced {}'.format(os.getpid(), name, product))

if __name__ == '__main__':
    # Create the Queue object
    queue = Queue()
    # Create a lock object to synchronize resource access
    lock = Lock()
    producer_consumers = []
    names = ['Mario', 'Peppino', 'Francesco', 'Carlo', 'Ermenegildo']
    for name in names:
        queue.put(name)
    for _ in range(5):
        producer_consumers.append(Process(target=producer_consumer, args=(queue, lock)))
    for process in producer_consumers:
        process.start()
    for p in producer_consumers:
        p.join()
    print('Parent process exiting...')
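One way the hang can be avoided, offered only as a hedged sketch, is to drop the queue.empty() check (it races with the blocking get()) and give get() a timeout so that an idle worker eventually exits on its own; the 5-second value below is just an illustration:

from queue import Empty  # multiprocessing.Queue.get raises queue.Empty on timeout

def producer_consumer(queue, lock):
    while True:
        try:
            # If no new work arrives within a few seconds, assume the queue
            # has drained for good and let this worker exit.
            name = queue.get(timeout=5)
        except Empty:
            break
        # ... process `name` and possibly queue.put() a new product,
        # exactly as in the original worker above ...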
I'm trying to launch a function (my_function) and stop its execution once a certain time limit is reached.
So I turned to the multiprocessing library, and everything works well. Here is the code, where my_function() has been changed to only create a dummy message.
from multiprocessing import Queue, Process
from multiprocessing.queues import Empty
import time

timeout = 1
# timeout = 3

def my_function(something):
    time.sleep(2)
    return f'my message: {something}'

def wrapper(something, queue):
    message = "too late..."
    try:
        message = my_function(something)
        return message
    finally:
        queue.put(message)

try:
    queue = Queue()
    params = ("hello", queue)
    child_process = Process(target=wrapper, args=params)
    child_process.start()
    output = queue.get(timeout=timeout)
    print(f"ok: {output}")
except Empty:
    timeout_message = f"Timeout {timeout}s reached"
    print(timeout_message)
finally:
    if 'child_process' in locals():
        child_process.kill()
You can test and verify that, depending on timeout=1 or timeout=3, I can trigger the error or not.
My main problem is that the real my_function() is a torch model inference for which I would like to limit the number of threads (to 4, let's say).
One can easily do so when my_function runs in the main process, but in my example I tried a lot of tricks to limit it in the child process without any success (using threadpoolctl.threadpool_limits(4), torch.set_num_threads(4), os.environ["OMP_NUM_THREADS"]=4, os.environ["MKL_NUM_THREADS"]=4).
I'm completely open to other solutions that can monitor the execution time of a function while limiting the number of threads used by that function.
thanks
Regards
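One thing worth trying, offered only as a hedged sketch since OpenMP/MKL read these settings when they initialize, is to apply the limits inside the child process before the heavy library is imported there. The inline torch import and the my_function call below refer to the question's code and assume torch is installed:

from multiprocessing import Process, Queue
import os

def wrapper(something, queue):
    # Hypothetical tweak: set the limits in the child itself, before the
    # numerical libraries spin up their thread pools.
    os.environ["OMP_NUM_THREADS"] = "4"  # note: environment values must be strings
    os.environ["MKL_NUM_THREADS"] = "4"
    import torch  # assumption: torch is available in this environment
    torch.set_num_threads(4)  # limit intra-op threads for the inference call
    queue.put(my_function(something))  # my_function as defined in the question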
You can limit simultaneous processes with a Pool (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
You can also set the maximum number of tasks done per child. Check it out.
Here is a sample from SuperFastPython by Jason Brownlee:
# SuperFastPython.com
# example of limiting the number of tasks per child in the process pool
from time import sleep
from multiprocessing.pool import Pool
from multiprocessing import current_process

# task executed in a worker process
def task(value):
    # get the current process
    process = current_process()
    # report a message
    print(f'Worker is {process.name} with {value}', flush=True)
    # block for a moment
    sleep(1)

# protect the entry point
if __name__ == '__main__':
    # create and configure the process pool
    with Pool(2, maxtasksperchild=3) as pool:
        # issue tasks to the process pool
        for i in range(10):
            pool.apply_async(task, args=(i,))
        # close the process pool
        pool.close()
        # wait for all tasks to complete
        pool.join()
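Since the original question was about cutting my_function off after a time limit, note also that the AsyncResult returned by apply_async accepts a timeout when you collect the result; here is a minimal sketch (the slow function and the values are illustrative):

from multiprocessing.pool import Pool
from multiprocessing import TimeoutError
from time import sleep

def slow(x):
    sleep(2)  # stand-in for a long-running inference call
    return x * 2

if __name__ == '__main__':
    with Pool(2) as pool:
        result = pool.apply_async(slow, (21,))
        try:
            print(result.get(timeout=1))  # raises TimeoutError after 1 second
        except TimeoutError:
            print("Timeout reached")  # note: the worker process itself keeps running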
I am new to multiprocessing in Python, and I wrote the tiny script below:
import multiprocessing
import os

def task(queue):
    print(100)

def run(pool):
    queue = multiprocessing.Queue()
    for i in range(os.cpu_count()):
        pool.apply_async(task, args=(queue,))

if __name__ == '__main__':
    multiprocessing.freeze_support()
    pool = multiprocessing.Pool()
    run(pool)
    pool.close()
    pool.join()
I am wondering why the task() method is not executed and there is no output after running this script. Could anyone help me?
It is running, but it's dying with an error outside the main thread, and so you don't see the error. For that reason, it's always good to .get() the result of an async call, even if you don't care about the result: the .get() will raise the error that's otherwise invisible.
For example, change your loop like so:
tasks = []
for i in range(os.cpu_count()):
    tasks.append(pool.apply_async(task, args=(queue,)))
for t in tasks:
    t.get()
Then the new t.get() will blow up, ending with:
RuntimeError: Queue objects should only be shared between processes through inheritance
In short, passing Queue objects to Pool methods isn't supported.
But you can pass them to multiprocessing.Process(), or to a Pool initialization function. For example, here's a way to do the latter:
import multiprocessing
import os

def pool_init(q):
    global queue  # make queue global in workers
    queue = q

def task():
    # can use `queue` here if you like
    print(100)

def run(pool):
    tasks = []
    for i in range(os.cpu_count()):
        tasks.append(pool.apply_async(task))
    for t in tasks:
        t.get()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=pool_init, initargs=(queue,))
    run(pool)
    pool.close()
    pool.join()
On Linux-y systems, you can - as the original error message suggested - use process inheritance instead (but that's not possible on Windows).
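A minimal sketch of that inheritance route, assuming the default 'fork' start method on Linux (under 'spawn' each worker would re-import the module and end up with a fresh, unrelated queue):

import multiprocessing
import os

# Created before the Pool exists, so fork()ed workers inherit this exact queue.
queue = multiprocessing.Queue()

def task():
    queue.put(os.getpid())  # uses the inherited global, not a pickled argument

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        results = [pool.apply_async(task) for _ in range(4)]
        for r in results:
            r.get()  # surface any worker-side errors
        for _ in results:
            print(queue.get())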
I've gone through this SO thread (Synchronization issue using Python's multiprocessing module), but it doesn't provide the answer.
The following code:
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('hello world', i)
    l.release()
    # do something else

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
How do I get the processes to execute in order? I want each process to hold the lock for a few seconds and then release it, so that P1 finishes, then P2 moves into the lock, then P2 finishes and P3 acquires the lock.
It sounds like you just want to delay the start of each successive process. If that's the case, you can use a multiprocessing.Event to delay starting the next child in the main process. Just pass the event to the child, and have the child set the Event when it's done doing whatever should run prior to starting the next child. The main process can wait on that Event, and once it's signalled, clear it and start the next child.
from multiprocessing import Process, Event

def f(e, i):
    print('hello world', i)
    e.set()
    # do something else

if __name__ == '__main__':
    event = Event()
    for num in range(10):
        p = Process(target=f, args=(event, num))
        p.start()
        event.wait()
        event.clear()
This is not the purpose of locks. Your code architecture does not fit your use case. I think you should refactor your code to this:
from multiprocessing import Process

def f(i):
    # do something here
    pass

if __name__ == '__main__':
    for num in range(10):
        print('hello world', num)
        Process(target=f, args=(num,)).start()
In this case it will print in order and then do the remaining part asynchronously.
I have an iterator which contains a lot of data (larger than memory), and I want to be able to perform some actions on this data. To do this quickly I am using the multiprocessing module.
def __init__(self, poolSize, spaceTimeTweetCollection=None):
    super().__init__()
    self.tagFreq = {}
    if spaceTimeTweetCollection is not None:
        q = Queue()
        processes = [Process(target=self.worker, args=(q,)) for i in range(poolSize)]
        for p in processes:
            p.start()
        for tweet in spaceTimeTweetCollection:
            q.put(tweet)
        for p in processes:
            p.join()
The aim is that I create some processes which listen on the queue:
def worker(self, queue):
    tweet = queue.get()
    self.append(tweet)  # performs some actions on data
I then loop over the iterator and add the data to the queue. Since the queue.get() in the worker method is blocking, the workers should start performing actions on the data as they receive it from the queue.
However, instead each worker is run once and that's it! So if poolSize is 8 it will read the first 8 items from the queue, perform the actions in 8 different processes, and then it will finish! Does anyone know why this is happening? I am running this on Windows.
Edit:
I wanted to mention that even though this is all being done in a class, the class is called in __main__ like so:
if __name__ == '__main__':
    tweetDatabase = Database()
    dataSet = tweetDatabase.read2dBoundingBox(boundaryBox)
    freq = TweetCounter(8, dataSet)  # this is where the multiprocessing is done
Your worker is to blame, I believe. It just does one thing and then dies. Try:
def worker(self, queue):
    while True:
        tweet = queue.get()
        self.append(tweet)
(I'd take a look at Pool though)
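If you do go the Pool route, here is a minimal sketch of what that could look like; the tweet stream and per-tweet work below are placeholders rather than the classes from the question, and imap_unordered pulls from the iterator lazily, so the whole collection never has to fit in memory:

from multiprocessing import Pool

def process_tweet(tweet):
    # stand-in for the per-tweet work done in self.append()
    return len(tweet)

def tweet_stream():
    # placeholder for spaceTimeTweetCollection: any (possibly huge) iterator works
    for i in range(1000):
        yield 'tweet number {}'.format(i)

if __name__ == '__main__':
    tag_freq = {}
    with Pool(8) as pool:
        for result in pool.imap_unordered(process_tweet, tweet_stream(), chunksize=50):
            tag_freq[result] = tag_freq.get(result, 0) + 1
    print(tag_freq)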