I'm trying to combine multiprocessing with asyncio. The program has two main components - one which streams/generates content, and another that consumes it.
What I want to do is to create multiple processes in order to exploit multiple CPU cores - one for the stream listener/generator, another for the consumer, and a simple one to shut down everything when the consumer has stopped.
My approach so far has been to create the processes, and start them. Each such process creates an async task. Once all processes have started, I run the asyncio tasks. What I have so far (stripped down) is:
def consume_task(loop, consumer):
    loop.create_task(consume_queue(consumer))

def stream_task(loop, listener, consumer):
    loop.create_task(create_stream(listener, consumer))

def shutdown_task(loop, listener):
    loop.create_task(shutdown(consumer))

async def shutdown(consumer):
    print("Shutdown task created")
    while not consumer.is_stopped():
        print("No activity")
        await asyncio.sleep(5)
    print("Shutdown initiated")
    loop.stop()

async def create_stream(listener, consumer):
    stream = Stream(auth, listener)
    print("Stream created")
    stream.filter(track=KEYWORDS, is_async=True)
    await asyncio.sleep(EVENT_DURATION)
    print("Stream finished")
    consumer.stop()

async def consume_queue(consumer):
    await consumer.run()

loop = asyncio.get_event_loop()

p_stream = Process(target=stream_task, args=(loop, listener, consumer, ))
p_consumer = Process(target=consume_task, args=(loop, consumer, ))
p_shutdown = Process(target=shutdown_task, args=(loop, consumer, ))
p_stream.start()
p_consumer.start()
p_shutdown.start()

loop.run_forever()
loop.close()
The problem is that everything hangs (or does it block?) - no tasks are actually running. My solution was to change the first three functions to:
def consume_task(loop, consumer):
    loop.create_task(consume_queue(consumer))
    loop.run_forever()

def stream_task(loop, listener, consumer):
    loop.create_task(create_stream(listener, consumer))
    loop.run_forever()

def shutdown_task(loop, listener):
    loop.create_task(shutdown(consumer))
    loop.run_forever()
This does actually run. However, the consumer and the listener objects are not able to communicate. As a simple example, when the create_stream function calls consumer.stop(), the consumer does not stop. Even when I change a consumer class variable, the changes are not made - case in point, the shared queue remains empty. This is how I am creating the instances:
queue = Queue()
consumer = PrintConsumer(queue)
listener = QueuedListener(queue, max_time=EVENT_DURATION)
Please note that if I do not use processes, but only asyncio tasks, everything works as expected, so I do not think it's a reference issue:
loop = asyncio.get_event_loop()
stream_task(loop, listener, consumer)
consume_task(loop, consumer)
shutdown_task(loop, listener)
loop.run_forever()
loop.close()
Is it because they are running in different processes? How should I go about fixing this?
Found the problem! Multiprocessing gives each process its own copy of the instances. The solution is to create a Manager, which hosts the shared instances in a separate server process and hands out proxies to them.
EDIT [11/2/2020]:
import asyncio
from multiprocessing import Process, Manager

"""
These two functions will be created as separate processes.
"""
def task1(loop, shared_list):
    output = loop.run_until_complete(asyncio.gather(async1(shared_list)))

def task2(loop, shared_list):
    output = loop.run_until_complete(asyncio.gather(async2(shared_list)))

"""
These two functions will be called (in different processes) asynchronously.
"""
async def async1(shared_list):
    pass

async def async2(shared_list):
    pass

"""
Create the manager. Manager() returns an already-started manager, so no
separate start() call is needed. From this manager, also create a list that is
shared by functions in different processes.
"""
manager = Manager()
shared_list = manager.list()

loop = asyncio.get_event_loop()  # the event loop

"""
Create two processes.
"""
process1 = Process(target=task1, args=(loop, shared_list, ))
process2 = Process(target=task2, args=(loop, shared_list, ))

"""
Start the two processes and wait for them to finish.
"""
process1.start()
process2.start()

process1.join()
process2.join()

"""
Clean up
"""
loop.close()
manager.shutdown()
I want to start a new Process (Pricefeed) from my Executor class and then have the Executor class keep running in its own event loop (the shoot method). In my current attempt, the asyncio loop gets blocked on the line p.join(). However, without that line, my code just exits. How do I do this properly?
Note: fh.run() blocks as well.
import asyncio
from multiprocessing import Process, Queue

from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK
from cryptofeed.exchanges.ftx import FTX


class Pricefeed(Process):
    def __init__(self, queue: Queue):
        super().__init__()
        self.coin_symbol = 'SOL-USD'
        self.fut_symbol = 'SOL-USD-PERP'
        self.queue = queue

    async def _book_update(self, feed, symbol, book, timestamp, receipt_timestamp):
        self.queue.put(book)

    def run(self):
        fh = FeedHandler()
        fh.add_feed(FTX(symbols=[self.fut_symbol, self.coin_symbol], channels=[L2_BOOK],
                        callbacks={L2_BOOK: self._book_update}))
        fh.run()


class Executor:
    def __init__(self):
        self.q = Queue()

    async def shoot(self):
        print('in shoot')
        for i in range(5):
            msg = self.q.get()
            print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        asyncio.create_task(self.shoot())
        p = Pricefeed(self.q)
        p.start()
        p.join()


async def main():
    g = Executor()
    await g.run()


if __name__ == '__main__':
    asyncio.run(main())
Since you're using a queue to communicate, this is a somewhat tricky problem. To answer your first question as to why removing join makes the program work: join blocks until the process finishes, and in asyncio you can't do anything blocking in a function marked async or it will freeze the event loop. To do this properly you'll need to run your process's work through the asyncio event loop's run_in_executor method with a ProcessPoolExecutor, which runs the function in a process pool and returns an awaitable that is compatible with the asyncio event loop.
Secondly, you'll need to use a multiprocessing Manager which creates shared state that can be used by multiple processes to properly share your queue. Managers directly support creation of a shared queue. Using these two bits of knowledge you can adapt your code to something like the following which works:
import asyncio
import functools
import time

from multiprocessing import Manager
from concurrent.futures import ProcessPoolExecutor


def run_pricefeed(queue):
    i = 0
    while True:  # simulate putting an item on the queue every 250ms
        queue.put(f'test-{i}')
        i += 1
        time.sleep(.25)


class Executor:

    async def shoot(self, queue):
        print('in shoot')
        for i in range(5):
            while not queue.empty():
                msg = queue.get(block=False)
                print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        with ProcessPoolExecutor() as pool:
            with Manager() as manager:
                queue = manager.Queue()
                asyncio.create_task(self.shoot(queue))
                await asyncio.get_running_loop().run_in_executor(pool, functools.partial(run_pricefeed, queue))


async def main():
    g = Executor()
    await g.run()


if __name__ == '__main__':
    asyncio.run(main())
This code has a drawback in that you need to empty the queue in a non-blocking fashion from your asyncio task and then wait a while for new items to arrive before emptying it again, effectively implementing a polling mechanism. If you don't wait after emptying, you'll wind up with blocking code and will freeze the event loop again. This isn't as good as simply blocking until the queue has an item in it, but it may suit your needs. If possible, I would avoid asyncio here and use multiprocessing entirely, for example, by implementing queue processing as a separate process; a sketch of that is below.
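If you do go that route, here is a minimal sketch of the pure-multiprocessing variant; the producer/consumer split and the None sentinel are my assumptions, not code from the question:

import time
from multiprocessing import Process, Queue

def run_pricefeed(queue):
    for i in range(20):          # simulate a feed putting items on the queue
        queue.put(f'test-{i}')
        time.sleep(.25)
    queue.put(None)              # sentinel: no more items

def consume(queue):
    while True:
        msg = queue.get()        # a blocking get is fine in its own process
        if msg is None:
            break
        print(msg)

if __name__ == '__main__':
    q = Queue()
    producer = Process(target=run_pricefeed, args=(q,))
    consumer = Process(target=consume, args=(q,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()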
I would like to combine asyncio and multiprocessing, as I have a task where one part is IO-bound and another is CPU-bound. I first tried to use loop.run_in_executor(), but I couldn't get it to work properly. Instead I went with creating two processes, where one uses asyncio and the other doesn't.
The code is such that I have a class with some non-blocking functions and one blocking function. I have an asyncio.Queue to pass information between the non-blocking parts and a multiprocessing.Queue to pass information between the non-blocking and the blocking functions.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
import asyncio
import time


class TestClass:
    def __init__(self):
        m = mp.Manager()
        self.blocking_queue = m.Queue()

    async def run(self):
        loop = asyncio.get_event_loop()
        self.non_blocking_queue = asyncio.Queue()  # asyncio Queue must be declared within event loop

        task1 = loop.create_task(self.non_blocking1())
        task2 = loop.create_task(self.non_blocking2())
        task3 = loop.create_task(self.print_msgs())

        await asyncio.gather(task1, task2)
        task3.cancel()

    def blocking(self):
        i = 0
        while i < 5:
            time.sleep(0.6)
            i += 1
            print("Blocking ", i)
            line = self.blocking_queue.get()
            print("Blocking: ", line)
        print("blocking done")

    async def non_blocking1(self):
        for i in range(5):
            await self.non_blocking_queue.put("Hello")
            await asyncio.sleep(0.4)

    async def non_blocking2(self):
        for i in range(5):
            await self.non_blocking_queue.put("World")
            await asyncio.sleep(0.5)

    async def print_msgs(self):
        while True:
            line = await self.non_blocking_queue.get()
            self.blocking_queue.put(line)
            print(line)


test_class = TestClass()

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(asyncio.run(test_class.run()))
print("done")
About half the times I run this, it works fine and prints out the text from both the blocking and the non-blocking queues. The other half it only prints out the results of the non-blocking queue; it looks like the blocking process isn't started at all. It doesn't alternate predictably, either: it might work five times in a row and then not work five times in a row.
What might cause such a problem? What would be a better way to do this, using both multiprocessing and asyncio?
running the async task "inside" the other process works for me, e.g.:
def runfn(fn):
    return asyncio.run(fn())

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(runfn, test_class.run)
presumably there's some state inside asyncio/the task that needs to be consistent or gets broken when running in another process. Note also that pool.submit(asyncio.run(test_class.run())) evaluates asyncio.run(...) in the parent process before submit is even called and submits only its return value, so the coroutine never actually runs in the pool; submitting runfn together with the function object lets the worker process call asyncio.run itself.
I am writing a Python program that runs tasks taken from a queue concurrently, in order to learn asyncio.
Items will be put onto a queue by interacting with a main thread (within REPL).
Whenever a task is put onto the queue, it should be consumed and executed immediately.
My approach is to kick off a separate thread and pass a queue to the event loop within that thread.
The tasks are running, but only sequentially, and I am not clear on how to run them concurrently. My attempt is as follows:
import asyncio
import time
import queue
import threading


def do_it(task_queue):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        status = f'{clock()} {name}_{total_time}:'
        print(status, 'START')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(status, 'processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(status, 'DONE.')

    async def main():
        while True:
            item = task_queue.get()
            if item == _sentinel:
                break
            await asyncio.create_task(process(*item))

    print('event loop start')
    asyncio.run(main())
    print('event loop end')


if __name__ == '__main__':
    tasks = queue.Queue()
    th = threading.Thread(target=do_it, args=(tasks,))
    th.start()
    tasks.put(('abc', 5))
    tasks.put(('def', 3))
Any advice pointing me in the direction of running these tasks concurrently would be greatly appreciated!
Thanks
UPDATE
Thank you Frank Yellin and cynthi8! I have reformed main() according to your advice (a sketch of the reformed main() appears after the list):
removed await before asyncio.create_task - fixed concurrency
added wait while loop so that main would not return prematurely
used non-blocking mode of Queue.get()
The program now works as expected 👍
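For reference, this is roughly what the reformed main() inside do_it looks like after those three changes. It is my reconstruction rather than the exact code; queue is already imported at module level, and task_queue, _sentinel and process are the names from the original do_it:

    async def main():
        while True:
            try:
                item = task_queue.get(block=False)   # non-blocking get
            except queue.Empty:
                await asyncio.sleep(0.2)             # give scheduled tasks a chance to run
                continue
            if item == _sentinel:
                break
            asyncio.create_task(process(*item))      # no await, so tasks overlap
        # wait so main() does not return before the spawned tasks finish
        while len(asyncio.all_tasks()) > 1:          # any task besides main() itself?
            await asyncio.sleep(0.2)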
UPDATE 2
user4815162342 has offered further improvements, I have annotated his suggestions below.
'''
Starts auxiliary thread which establishes a queue and consumes tasks within a
queue.

Allow enqueueing of tasks from within __main__ and termination of aux thread
'''
import asyncio
import time
import threading
import functools


def do_it(started):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        print(f'{clock()} {name}_{total_time}:', 'Started.')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(f'{clock()} {name}_{total_time}:', 'Processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(f'{clock()} {name}_{total_time}:', 'Done.')

    async def main():
        # get_running_loop() gets the running event loop in the current OS
        # thread; expose it (and the queue) to the __main__ thread
        started.loop = asyncio.get_running_loop()
        started.queue = task_queue = asyncio.Queue()
        started.set()
        while True:
            item = await task_queue.get()
            if item == _sentinel:
                # task_done is used to tell join when the work in the queue is
                # actually finished. A queue length of zero does not mean work
                # is complete.
                task_queue.task_done()
                break
            task = asyncio.create_task(process(*item))
            # Add a callback to be run when the Task is done.
            # It indicates that a formerly enqueued item is complete: for each
            # get() used to fetch an item, a subsequent call to task_done()
            # tells the queue that the processing on the item is complete.
            task.add_done_callback(lambda _: task_queue.task_done())
        # keep the loop going until all the work has completed
        # When the count of unfinished tasks drops to zero, join() unblocks.
        await task_queue.join()

    print('event loop start')
    asyncio.run(main())
    print('event loop end')


if __name__ == '__main__':
    # started Event is used for communication with thread th
    started = threading.Event()
    th = threading.Thread(target=do_it, args=(started,))
    th.start()

    # started.wait() blocks until started.set(), ensuring that the tasks and
    # loop variables are available from the event loop thread
    started.wait()
    tasks, loop = started.queue, started.loop

    # call_soon schedules a callback to be called with the given arguments at
    # the next iteration of the event loop; call_soon_threadsafe is required
    # to schedule callbacks from another thread.
    # put_nowait enqueues items in non-blocking fashion, == put(block=False)
    loop.call_soon_threadsafe(tasks.put_nowait, ('abc', 5))
    loop.call_soon_threadsafe(tasks.put_nowait, ('def', 3))
    loop.call_soon_threadsafe(tasks.put_nowait, 'STOP')
As others pointed out, the problem with your code is that it uses a blocking queue which halts the event loop while waiting for the next item. The problem with the proposed solution, however, is that it introduces latency because it must occasionally sleep to allow other tasks to run. In addition to introducing latency, it prevents the program from ever going to sleep, even when there are no items in the queue.
An alternative is to switch to an asyncio queue, which is designed for use with asyncio. This queue must be created inside the running loop, so you can't pass it to do_it; you have to retrieve it from the thread that runs the event loop. Also, since it's an asyncio primitive, its put method must be invoked through call_soon_threadsafe to ensure that the event loop notices it.
One final issue is that your main() function uses another busy loop to wait for all the tasks to complete. This can be avoided by using Queue.join, which is explicitly designed for this use case.
Here is your code adapted to incorporate all of the above suggestions, with the process function remaining unchanged from your original:
import asyncio
import time
import threading


def do_it(started):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        status = f'{clock()} {name}_{total_time}:'
        print(status, 'START')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(status, 'processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(status, 'DONE.')

    async def main():
        started.loop = asyncio.get_running_loop()
        started.queue = task_queue = asyncio.Queue()
        started.set()
        while True:
            item = await task_queue.get()
            if item == _sentinel:
                task_queue.task_done()
                break
            task = asyncio.create_task(process(*item))
            task.add_done_callback(lambda _: task_queue.task_done())
        await task_queue.join()

    print('event loop start')
    asyncio.run(main())
    print('event loop end')


if __name__ == '__main__':
    started = threading.Event()
    th = threading.Thread(target=do_it, args=(started,))
    th.start()
    started.wait()
    tasks, loop = started.queue, started.loop
    loop.call_soon_threadsafe(tasks.put_nowait, ('abc', 5))
    loop.call_soon_threadsafe(tasks.put_nowait, ('def', 3))
    loop.call_soon_threadsafe(tasks.put_nowait, 'STOP')
Note: an unrelated issue with your code was that it awaited the result of create_task(), which nullified the usefulness of create_task() because it wasn't allowed to run in the background. (It would be equivalent to immediately joining a thread you've just started - you can do it, but it doesn't make much sense.) This issue is fixed both in the above code and in your edit to the question.
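To illustrate that difference, here is a minimal standalone sketch (mine, not part of the original code):

import asyncio

async def work(name, delay):
    await asyncio.sleep(delay)
    print(name, 'done')

async def sequential():
    # awaiting create_task() immediately waits for each task,
    # so these run one after another: ~2 seconds in total
    await asyncio.create_task(work('a', 1))
    await asyncio.create_task(work('b', 1))

async def concurrent():
    # create the tasks first, then await them together so they overlap:
    # ~1 second in total
    tasks = [asyncio.create_task(work('a', 1)),
             asyncio.create_task(work('b', 1))]
    await asyncio.gather(*tasks)

asyncio.run(sequential())
asyncio.run(concurrent())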
There are two problems with your code.
First, you should not have the await before the asyncio.create_task. This is possibly what is causing your code to run synchronously.
Then, once you've made your code run asynchronously, you need something after the while loop in main so that the code doesn't return immediately, but instead waits for all the jobs to finish. Another stackoverflow answer recommends:
while len(asyncio.Task.all_tasks()) > 1:  # Any task besides main() itself?
    await asyncio.sleep(0.2)
Alternatively there are versions of Queue that can keep track of running tasks.
As an additional problem: if a queue.Queue is empty, get() blocks by default rather than returning a sentinel value. See https://docs.python.org/3/library/queue.html
I have a program with one main thread where I spawn a second thread that uses asyncio. Are there any tools provided to synchronize these two threads? If everything was asyncio, I could do it with its synchronization primitives, eg:
import asyncio

async def taskA(lst, evt):
    print(f'Appending 1')
    lst.append(1)
    evt.set()

async def taskB(lst, evt):
    await evt.wait()
    print('Retrieved:', lst.pop())

lst = []
evt = asyncio.Event()
asyncio.get_event_loop().run_until_complete(asyncio.gather(
    taskA(lst, evt),
    taskB(lst, evt),
))
However, this does not work with multiple threads. If I just use a threading.Event then it will block the asyncio thread. I figured out I could defer the wait to an executor:
import asyncio
import threading

def taskA(lst, evt):
    print(f'Appending 1')
    lst.append(1)
    evt.set()

async def taskB(lst, evt):
    await asyncio.get_event_loop().run_in_executor(None, evt.wait)
    print('Retrieved:', lst.pop())

def targetA(lst, evt):
    taskA(lst, evt)

def targetB(lst, evt):
    asyncio.set_event_loop(asyncio.new_event_loop())
    asyncio.get_event_loop().run_until_complete(taskB(lst, evt))

lst = []
evt = threading.Event()
threadA = threading.Thread(target=targetA, args=(lst, evt))
threadB = threading.Thread(target=targetB, args=(lst, evt))
threadA.start()
threadB.start()
threadA.join()
threadB.join()
However, having an executor thread only to wait for a mutex seems unnatural. Is this the way this is supposed to be done? Or is there any other way to wait for synchronization between OS threads asynchronously?
A simple way to synchronize an asyncio coroutine with an event coming from another thread is to await an asyncio.Event in taskB, and set it from taskA using loop.call_soon_threadsafe.
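For the first approach, here is a minimal sketch adapted from the question's example; the way the worker thread is started is my own wiring:

import asyncio
import threading

def taskA(lst, evt, loop):
    print('Appending 1')
    lst.append(1)
    # asyncio.Event is not thread-safe, so set it from the loop's own thread
    loop.call_soon_threadsafe(evt.set)

async def taskB(lst, evt):
    await evt.wait()                 # suspends the coroutine, doesn't block the loop
    print('Retrieved:', lst.pop())

async def main():
    lst = []
    evt = asyncio.Event()
    loop = asyncio.get_running_loop()
    threading.Thread(target=taskA, args=(lst, evt, loop)).start()
    await taskB(lst, evt)

asyncio.run(main())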
To be able to pass values and exceptions between the two, you can use futures; however then you are inventing much of run_in_executor. If the only job of taskA is to take tasks off a queue, you might as well make a single-worker "pool" and use it as your worker thread. Then you can use run_in_executor as intended:
worker = concurrent.futures.ThreadPoolExecutor(max_workers=1)

async def taskB(lst):
    loop = asyncio.get_event_loop()
    # or result = await ..., if taskA has a useful return value
    # This will also propagate exceptions raised by taskA
    await loop.run_in_executor(worker, taskA, lst)
    print('Retrieved:', lst.pop())
The semantics are the same as in your version with an explicit queue - the queue is still there, it's just inside the ThreadPoolExecutor.
I am using Python websockets 4.0.1 on Ubuntu. I want to have 2 websocket servers running. I was able to get this to "kind of work" by creating 2 threads and independent event loops for each one. By "kind of work", I mean both websockets work and are responsive for about 30 seconds and then one of them stops. I have to restart the process to get them both to work again. If I only run one or the other of these 2 threads, the single websocket works forever.
What am I doing wrong and how can I have 2 websockets work forever with asyncio?
# Start VL WebSocket Task
class vlWebSocketTask (threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        # Main while loop
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        while True:
            try:
                print("Starting VL WebSocket Server...")
                startVLServer = websockets.serve(vlWebsocketServer, '192.168.1.3', 8777)
                asyncio.get_event_loop().run_until_complete(startVLServer)
                asyncio.get_event_loop().run_forever()
            except Exception as ex:
                print(ex)
                time.sleep(5)

# Start IR WebSocket Task
class irWebSocketTask (threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        while True:
            try:
                print("Starting IR WebSocket Server...")
                startIRServer = websockets.serve(irWebsocketServer, '192.168.1.3', 8555)
                asyncio.get_event_loop().run_until_complete(startIRServer)
                asyncio.get_event_loop().run_forever()
            except Exception as ex:
                print(ex)
                time.sleep(5)

# Initialize VL WebSocket Task
#VLWebSocketTask = vlWebSocketTask()
#VLWebSocketTask.start()

# Initialize IR WebSocket Task
IRWebSocketTask = irWebSocketTask()
IRWebSocketTask.start()
You don't need threads to run multiple asyncio tasks - allowing multiple agents to share the same event loop is the strong suit of asyncio. You should be able to replace both thread-based classes with code like this:
loop = asyncio.new_event_loop()
loop.run_until_complete(websockets.serve(vlWebsocketServer, '192.168.1.3', 8777))
loop.run_until_complete(websockets.serve(irWebsocketServer, '192.168.1.3', 8555))
loop.run_forever()
While it is not exactly wrong to mix threads and asyncio, doing so correctly requires care not to mix up the separate asyncio instances. The safe way to use threads with asyncio is through loop.run_in_executor(), which runs synchronous code in a separate thread without blocking the event loop, while returning an object awaitable from the loop (see the sketch at the end of this answer).
Note: the above code was written prior to the advent of asyncio.run() and manually spins the event loop. In Python 3.7 and later one would probably write something like:
async def main():
    server1 = await websockets.serve(vlWebsocketServer, '192.168.1.3', 8777)
    server2 = await websockets.serve(irWebsocketServer, '192.168.1.3', 8555)
    await asyncio.gather(server1.wait_closed(), server2.wait_closed())

asyncio.run(main())
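For completeness, here is a minimal, generic sketch of the run_in_executor() pattern mentioned above. blocking_lookup is a made-up placeholder for whatever synchronous call a handler needs to make; it is not part of the websockets API:

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def blocking_lookup(key):
    time.sleep(1)                    # stands in for blocking I/O or a library call
    return f'result for {key}'

async def handler(key):
    loop = asyncio.get_running_loop()
    # run the blocking call in a worker thread; the event loop stays responsive
    result = await loop.run_in_executor(pool, blocking_lookup, key)
    print(result)

async def main():
    await asyncio.gather(handler('a'), handler('b'), handler('c'))

asyncio.run(main())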