Python - Combining multiprocessing with asyncio works only sometimes

I would like to combine asyncio and multiprocessing, as I have a task where one part is IO-bound and another is CPU-bound. I first tried to use loop.run_in_executor(), but I couldn't get it to work properly. Instead I went with creating two processes, where one uses asyncio and the other doesn't.
The code is structured as a class with several non-blocking functions and one blocking function. An asyncio.Queue passes information between the non-blocking parts, and a multiprocessing queue (created via a Manager) passes information between the non-blocking and the blocking functions.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
import asyncio
import time

class TestClass:
    def __init__(self):
        m = mp.Manager()
        self.blocking_queue = m.Queue()

    async def run(self):
        loop = asyncio.get_event_loop()
        self.non_blocking_queue = asyncio.Queue()  # asyncio Queue must be declared within event loop
        task1 = loop.create_task(self.non_blocking1())
        task2 = loop.create_task(self.non_blocking2())
        task3 = loop.create_task(self.print_msgs())
        await asyncio.gather(task1, task2)
        task3.cancel()

    def blocking(self):
        i = 0
        while i < 5:
            time.sleep(0.6)
            i += 1
            print("Blocking ", i)
            line = self.blocking_queue.get()
            print("Blocking: ", line)
        print("blocking done")

    async def non_blocking1(self):
        for i in range(5):
            await self.non_blocking_queue.put("Hello")
            await asyncio.sleep(0.4)

    async def non_blocking2(self):
        for i in range(5):
            await self.non_blocking_queue.put("World")
            await asyncio.sleep(0.5)

    async def print_msgs(self):
        while True:
            line = await self.non_blocking_queue.get()
            self.blocking_queue.put(line)
            print(line)

test_class = TestClass()

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(asyncio.run(test_class.run()))

print("done")
print("done")
About half the time I run this, it works fine and prints the messages from both the blocking and the non-blocking queues. The other half, it only prints the output of the non-blocking queue, and it looks like the blocking process isn't started at all. The failures aren't a consistent alternation: it might work five times in a row and then fail five times in a row.
What might cause such a problem? And what is a better way to do this, using both multiprocessing and asyncio?

running the async task "inside" the other process works for me, e.g.:
def runfn(fn):
    return asyncio.run(fn())

with ProcessPoolExecutor() as pool:
    pool.submit(test_class.blocking)
    pool.submit(runfn, test_class.run)
presumably there's some state inside asyncio/the task that needs to be consistent or gets broken when running in another process. Note also that the original pool.submit(asyncio.run(test_class.run())) evaluates asyncio.run(...) in the parent process before submit is even called, and then submits its return value (None) to the pool instead of a callable; pool.submit(runfn, test_class.run) defers the actual call to the worker process.
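To see the difference in submit semantics in isolation: submit takes a callable plus its arguments, so passing the result of a call hands the pool whatever the expression already evaluated to. A minimal sketch (the function f here is hypothetical, just for illustration):

from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * 2

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        fut = pool.submit(f, 21)   # correct: callable plus args, f runs in a worker
        print(fut.result())        # 42
        bad = pool.submit(f(21))   # wrong: f(21) runs here, and 42 is "submitted"
        print(bad.exception())     # TypeError: 'int' object is not callable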

Related

How to properly use concurrent.futures with asyncio

I am prototyping a FastAPI app with an endpoint that will launch a long-running process using the subprocess module. The obvious solution is to use concurrent.futures and ProcessPoolExecutor, but I am unable to get the behavior I want. Code sample:
import asyncio
from concurrent.futures import ProcessPoolExecutor
import subprocess as sb
import time
import random

pool = ProcessPoolExecutor(5)

def long_task(s):
    print("started")
    time.sleep(random.randrange(5, 15))
    sb.check_output(["touch", str(s)])
    print("done")

async def async_task():
    loop = asyncio.get_event_loop()
    print("started")
    tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
    while True:
        print("in async task")
        done, _ = await asyncio.wait(tasks, timeout=1)
        for task in done:
            await task
        await asyncio.sleep(1)

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_task())

if __name__ == "__main__":
    main()
On the surface this sample works fine, but the spawned processes do not get stopped after execution completes: I can still see all of the Python processes in ps aux | grep python. Shouldn't awaiting a completed task stop it? In the end I do not care much about the result of the execution; it should just happen in the background and exit cleanly, without any hanging processes.
You must close the ProcessPoolExecutor when you are done using it, either by explicitly calling its shutdown() method or by using it as a context manager. I used the context manager approach.
I don't know what subprocess.check_output does, so I commented it out.
I also replaced your infinite loop with a single call to asyncio.gather, which will yield until the Executor is finished.
I'm on Windows, so to observe the creation/deletion of Processes I watched the Windows Task Manager. The program creates 5 subprocesses and closes them again when the ProcessPool context manager exits.
import asyncio
from concurrent.futures import ProcessPoolExecutor
# import subprocess as sb
import time
import random

def long_task(s):
    print("started")
    time.sleep(random.randrange(5, 15))
    # sb.check_output(["touch", str(s)])
    print("done", s)

async def async_task():
    loop = asyncio.get_event_loop()
    print("started")
    with ProcessPoolExecutor(5) as pool:
        tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
        await asyncio.gather(*tasks)
    print("Completely done")

def main():
    asyncio.run(async_task())

if __name__ == "__main__":
    main()
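For reference, a minimal sketch of the explicit-shutdown alternative (reusing long_task from above; shutdown(wait=True) blocks until the workers finish and then reaps them):

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def async_task():
    loop = asyncio.get_event_loop()
    pool = ProcessPoolExecutor(5)
    tasks = [loop.run_in_executor(pool, long_task, i) for i in range(10)]
    await asyncio.gather(*tasks)
    # Explicit shutdown instead of a with-block; waits for workers to exit.
    pool.shutdown(wait=True)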

Starting a new process from an asyncio loop

I want to start a new Process (Pricefeed) from my Executor class and then have the Executor class keep running in its own event loop (the shoot method). In my current attempt, the asyncio loop gets blocked on the line p.join(). However, without that line, my code just exits. How do I do this properly?
Note: fh.run() blocks as well.
import asyncio
from multiprocessing import Process, Queue

from cryptofeed import FeedHandler
from cryptofeed.defines import L2_BOOK
from cryptofeed.exchanges.ftx import FTX

class Pricefeed(Process):
    def __init__(self, queue: Queue):
        super().__init__()
        self.coin_symbol = 'SOL-USD'
        self.fut_symbol = 'SOL-USD-PERP'
        self.queue = queue

    async def _book_update(self, feed, symbol, book, timestamp, receipt_timestamp):
        self.queue.put(book)

    def run(self):
        fh = FeedHandler()
        fh.add_feed(FTX(symbols=[self.fut_symbol, self.coin_symbol], channels=[L2_BOOK],
                        callbacks={L2_BOOK: self._book_update}))
        fh.run()

class Executor:
    def __init__(self):
        self.q = Queue()

    async def shoot(self):
        print('in shoot')
        for i in range(5):
            msg = self.q.get()
            print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        asyncio.create_task(self.shoot())
        p = Pricefeed(self.q)
        p.start()
        p.join()

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
Since you're using a queue to communicate, this is a somewhat tricky problem. To answer your first question about the role of join: join blocks until the process finishes, and in asyncio you can't do anything blocking in a function marked async or it will freeze the event loop. To do this properly, you'll need to run your process via the asyncio event loop's run_in_executor method, which will run things in a process pool and return an awaitable that is compatible with the asyncio event loop.
Secondly, to share your queue properly you'll need to use a multiprocessing Manager, which creates shared state that can be used by multiple processes; managers directly support creating a shared queue. Using these two bits of knowledge you can adapt your code to something like the following, which works:
import asyncio
import functools
import time
from multiprocessing import Manager
from concurrent.futures import ProcessPoolExecutor

def run_pricefeed(queue):
    i = 0
    while True:  # simulate putting an item on the queue every 250ms
        queue.put(f'test-{i}')
        i += 1
        time.sleep(.25)

class Executor:
    async def shoot(self, queue):
        print('in shoot')
        for i in range(5):
            while not queue.empty():
                msg = queue.get(block=False)
                print(msg)
            await asyncio.sleep(1)  # do some stuff

    async def run(self):
        with ProcessPoolExecutor() as pool:
            with Manager() as manager:
                queue = manager.Queue()
                asyncio.create_task(self.shoot(queue))
                await asyncio.get_running_loop().run_in_executor(pool, functools.partial(run_pricefeed, queue))

async def main():
    g = Executor()
    await g.run()

if __name__ == '__main__':
    asyncio.run(main())
This code has a drawback: you need to empty the queue in a non-blocking fashion from your asyncio process and wait for a while for new items to come in before emptying it again, effectively implementing a polling mechanism. If you didn't wait after emptying, you'd wind up with blocking code and would freeze the event loop again. This isn't as good as simply blocking until the queue has an item in it, but it may suit your needs. If possible, I would avoid asyncio here and use multiprocessing entirely, for example by implementing queue processing as a separate process.
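If you do want to keep asyncio and still block for the next item, one alternative to polling (a sketch, not part of the answer above) is to hand the blocking queue.get to a worker thread via run_in_executor, as a drop-in replacement for the shoot method:

    async def shoot(self, queue):
        loop = asyncio.get_running_loop()
        print('in shoot')
        for _ in range(5):
            # queue.get blocks in a thread from the default executor,
            # not in the event loop, so other tasks keep running.
            msg = await loop.run_in_executor(None, queue.get)
            print(msg)
            await asyncio.sleep(1)  # do some stuff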

Non-blocking launching of concurrent coroutines in Python

I want to execute tasks asynchronously and concurrently. If task1 is running when task2 arrives, task2 is started right away, without waiting for task1 to complete. Also, I would like to avoid callbacks with the help of coroutines.
Here's a concurrent solution with callbacks:
from concurrent.futures import ThreadPoolExecutor

def fibonacci(n):
    if n <= 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

class FibonacciCalculatorFuture:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)

    @staticmethod
    def calculate(n):
        print(f"started n={n}")
        return fibonacci(n)

    def run(self, n):
        future = self.pool.submit(self.calculate, n)
        future.add_done_callback(lambda f: print(f.result()))

if __name__ == '__main__':
    calculator = FibonacciCalculatorFuture()
    calculator.run(35)
    calculator.run(32)
    print("initial thread can continue its work")
Its output:
started n=35
started n=32
initial thread can continue its work
3524578
14930352
And here's my effort to get rid of callbacks:
import asyncio

class FibonacciCalculatorAsync:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.loop = asyncio.get_event_loop()

    @staticmethod
    def calculate_sync(n):
        print(f"started n={n}")
        return fibonacci(n)

    async def calculate(self, n):
        result = await self.loop.run_in_executor(self.pool, self.calculate_sync, n)
        print(result)

    def run(self, n):
        asyncio.ensure_future(self.calculate(n))

if __name__ == '__main__':
    calculator = FibonacciCalculatorAsync()
    calculator.run(35)
    calculator.run(32)
    calculator.loop.run_forever()
    print("initial thread can continue its work")
Output:
started n=35
started n=32
3524578
14930352
In this case the initial thread can't get past loop.run_forever() and hence can't accept new tasks.
So, here's my question: is there a way to simultaneously:
execute tasks concurrently;
be able to accept new tasks and schedule them for execution right away (along with already running tasks);
use coroutines and code without callbacks.
The second bullet from your question can be met by running asyncio in a dedicated thread and using asyncio.run_coroutine_threadsafe to schedule coroutines. For example:
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class FibonacciCalculatorAsync:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.loop = asyncio.get_event_loop()

    @staticmethod
    def calculate_sync(n):
        print(f"started n={n}")
        return fibonacci(n)

    async def calculate(self, n):
        result = await self.loop.run_in_executor(self.pool, self.calculate_sync, n)
        print(result)

    def run(self, n):
        asyncio.run_coroutine_threadsafe(self.calculate(n), self.loop)

    def start_loop(self):
        thr = threading.Thread(target=self.loop.run_forever)
        thr.daemon = True
        thr.start()

if __name__ == '__main__':
    calculator = FibonacciCalculatorAsync()
    calculator.start_loop()
    calculator.run(35)
    calculator.run(32)
    print("initial thread can continue its work")
    calculator.run(10)
    time.sleep(1)
loop.run_forever() will indeed run forever, even if there are no tasks inside. The good news is that you don't need that function: to wait for your computations to complete, use asyncio.gather:
class FibonacciCalculatorAsync:
    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=2)
        # self.loop = asyncio.get_event_loop()

    ...

    async def calculate(self, n):
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(self.pool, self.calculate_sync, n)
        print(result)

async def main():
    calculator = FibonacciCalculatorAsync()
    fib_35 = asyncio.ensure_future(calculator.run(35))
    fib_32 = asyncio.ensure_future(calculator.run(32))
    print("initial thread can continue its work")
    ...
    # wait until the Fibonacci computations have ended
    await asyncio.gather(fib_35, fib_32)

if __name__ == '__main__':
    asyncio.run(main())
Please note how the loop is handled here; I changed a few things. If you start using asyncio, I'd recommend having one loop for everything instead of creating loops for more granular tasks. With this approach you get all the asyncio bells and whistles for handling and synchronizing tasks.
Also, due to the GIL it is not possible to parallelize pure-Python, non-IO code in a ThreadPoolExecutor. Keep that in mind and prefer a ProcessPoolExecutor in such cases.
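As a sketch of that suggestion (reusing the fibonacci function defined earlier; the __main__ guard matters because worker processes re-import the module):

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each fibonacci call runs in its own process, so the GIL
        # no longer serialises the CPU-bound work.
        results = await asyncio.gather(
            loop.run_in_executor(pool, fibonacci, 35),
            loop.run_in_executor(pool, fibonacci, 32),
        )
    print(results)

if __name__ == '__main__':
    asyncio.run(main())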
Please note that because of the global interpreter lock, Python threads cannot achieve true parallelism for CPU-bound tasks like calculating Fibonacci numbers; such code still effectively runs one thread at a time. You can, however, achieve 'fake it till you make it' concurrency with I/O-bound tasks like reading from and writing to sockets. This is covered in more depth in the O'Reilly book Python Concurrency with asyncio, which demonstrates the above with code examples and goes deeper into how asyncio leverages the OS's underlying event-notification API to achieve concurrency for I/O-bound tasks.

How to terminate long-running computation (CPU bound task) in Python using asyncio and concurrent.futures.ProcessPoolExecutor?

Similar Question (but answer does not work for me): How to cancel long-running subprocesses running using concurrent.futures.ProcessPoolExecutor?
Unlike the question linked above and the solution provided, in my case the computation itself is rather long (CPU bound) and cannot be run in a loop to check if some event has happened.
Reduced version of the code below:
import asyncio
import concurrent.futures as futures
import time

class Simulator:
    def __init__(self):
        self._loop = None
        self._lmz_executor = None
        self._tasks = []
        self._max_execution_time = time.monotonic() + 60
        self._long_running_tasks = []

    def initialise(self):
        # Initialise the main asyncio loop
        self._loop = asyncio.get_event_loop()
        self._loop.set_default_executor(
            futures.ThreadPoolExecutor(max_workers=3))

        # Run separate processes of long computation task
        self._lmz_executor = futures.ProcessPoolExecutor(max_workers=3)

    def run(self):
        self._tasks.extend(
            [self.bot_reasoning_loop(bot_id) for bot_id in [1, 2, 3]]
        )

        try:
            # Gather bot reasoner tasks
            _reasoner_tasks = asyncio.gather(*self._tasks)
            # Send the reasoner tasks to main monitor task
            asyncio.gather(self.sample_main_loop(_reasoner_tasks))
            self._loop.run_forever()
        except KeyboardInterrupt:
            pass
        finally:
            self._loop.close()

    async def sample_main_loop(self, reasoner_tasks):
        """This is the main monitor task"""
        await asyncio.wait_for(reasoner_tasks, None)
        for task in self._long_running_tasks:
            try:
                await asyncio.wait_for(task, 10)
            except asyncio.TimeoutError:
                print("Oops. Some long operation timed out.")
                task.cancel()  # Doesn't cancel and has no effect
                task.set_result(None)  # Doesn't seem to have an effect
        self._lmz_executor.shutdown()
        self._loop.stop()
        print('And now I am done. Yay!')

    async def bot_reasoning_loop(self, bot):
        import math

        _exec_count = 0
        _sleepy_time = 15
        _max_runs = math.floor(self._max_execution_time / _sleepy_time)

        self._long_running_tasks.append(
            self._loop.run_in_executor(
                self._lmz_executor, really_long_process, _sleepy_time))

        while time.monotonic() < self._max_execution_time:
            print("Bot#{}: thinking for {}s. Run {}/{}".format(
                bot, _sleepy_time, _exec_count, _max_runs))
            await asyncio.sleep(_sleepy_time)
            _exec_count += 1

        print("Bot#{} Finished Thinking".format(bot))

def really_long_process(sleepy_time):
    print("I am a really long computation.....")
    _large_val = 9729379273492397293479237492734 ** 344323
    print("I finally computed this large value: {}".format(_large_val))

if __name__ == "__main__":
    sim = Simulator()
    sim.initialise()
    sim.run()
The idea is that there is a main simulation loop that runs and monitors three bot threads. Each of these bot threads performs some reasoning, but also starts a really long background process using ProcessPoolExecutor, which may end up running longer than their own threshold/max execution time for reasoning.
As you can see in the code above, I attempted to .cancel() these tasks when a timeout occurs. This does not actually cancel the computation, though, which keeps running in the background, and the asyncio loop doesn't terminate until all the long-running computations have finished.
How do I terminate such long running CPU-bound computations within a method?
Other similar SO questions, but not necessarily related or helpful:
asyncio: Is it possible to cancel a future been run by an Executor?
How to terminate a single async task in multiprocessing if that single async task exceeds a threshold time in Python
Asynchronous multiprocessing with a worker pool in Python: how to keep going after timeout?
How do I terminate such long running CPU-bound computations within a method?
The approach you tried doesn't work because the futures returned by ProcessPoolExecutor are not cancellable. Although asyncio's run_in_executor tries to propagate the cancellation, it is simply ignored by Future.cancel once the task starts executing.
There is no fundamental reason for that. Unlike threads, processes can be safely terminated, so it would be perfectly possible for ProcessPoolExecutor.submit to return a future whose cancel terminated the corresponding process. Asyncio coroutines have well-defined cancellation semantics and could automatically make use of it. Unfortunately, ProcessPoolExecutor.submit returns a regular concurrent.futures.Future, which assumes the lowest common denominator of the underlying executors, and treats a running future as untouchable.
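This is easy to verify; a minimal sketch showing cancel being refused once a task is running:

import time
from concurrent.futures import ProcessPoolExecutor

def busy():
    time.sleep(5)

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(busy)
        time.sleep(0.5)      # give the worker time to pick the task up
        print(fut.cancel())  # False: a running future cannot be cancelled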
As a result, to cancel tasks executed in subprocesses, one must circumvent the ProcessPoolExecutor altogether and manage one's own processes. The challenge is how to do this without reimplementing half of multiprocessing. One option offered by the standard library is to (ab)use multiprocessing.Pool for this purpose, because it supports reliable shutdown of worker processes. A CancellablePool could work as follows:
Instead of spawning a fixed number of processes, spawn a fixed number of 1-worker pools.
Assign tasks to pools from an asyncio coroutine. If the coroutine is canceled while waiting for the task to finish in the other process, terminate the single-process pool and create a new one.
Since everything is coordinated from the single asyncio thread, don't worry about race conditions such as accidentally killing a process which has already started executing another task. (This would need to be prevented if one were to support cancellation in ProcessPoolExecutor.)
Here is a sample implementation of that idea:
import asyncio
import multiprocessing

class CancellablePool:
    def __init__(self, max_workers=3):
        self._free = {self._new_pool() for _ in range(max_workers)}
        self._working = set()
        self._change = asyncio.Event()

    def _new_pool(self):
        return multiprocessing.Pool(1)

    async def apply(self, fn, *args):
        """
        Like multiprocessing.Pool.apply_async, but:
         * is an asyncio coroutine
         * terminates the process if cancelled
        """
        while not self._free:
            await self._change.wait()
            self._change.clear()
        pool = usable_pool = self._free.pop()
        self._working.add(pool)

        loop = asyncio.get_event_loop()
        fut = loop.create_future()
        def _on_done(obj):
            loop.call_soon_threadsafe(fut.set_result, obj)
        def _on_err(err):
            loop.call_soon_threadsafe(fut.set_exception, err)
        pool.apply_async(fn, args, callback=_on_done, error_callback=_on_err)

        try:
            return await fut
        except asyncio.CancelledError:
            pool.terminate()
            usable_pool = self._new_pool()
        finally:
            self._working.remove(pool)
            self._free.add(usable_pool)
            self._change.set()

    def shutdown(self):
        for p in self._working | self._free:
            p.terminate()
        self._free.clear()
A minimalistic test case showing cancellation:
def really_long_process():
    print("I am a really long computation.....")
    large_val = 9729379273492397293479237492734 ** 344323
    print("I finally computed this large value: {}".format(large_val))

async def main():
    loop = asyncio.get_event_loop()
    pool = CancellablePool()
    tasks = [loop.create_task(pool.apply(really_long_process))
             for _ in range(5)]
    for t in tasks:
        try:
            await asyncio.wait_for(t, 1)
        except asyncio.TimeoutError:
            print('task timed out and cancelled')
    pool.shutdown()

asyncio.get_event_loop().run_until_complete(main())
Note how the CPU usage never exceeds 3 cores, and how it starts dropping near the end of the test, indicating that the processes are being terminated as expected.
To apply it to the code from the question, make self._lmz_executor an instance of CancellablePool and change self._loop.run_in_executor(...) to self._loop.create_task(self._lmz_executor.apply(...)).

async queue hangs when used with background thread

It seems an asyncio.Queue can only be pushed to from the same thread that reads it? For instance:
import asyncio
from threading import Thread
import time

q = asyncio.Queue()

def produce():
    for i in range(100):
        q.put_nowait(i)
        time.sleep(0.1)

async def consume():
    while True:
        i = await q.get()
        print('consumed', i)

Thread(target=produce).start()
asyncio.get_event_loop().run_until_complete(consume())
only prints
consumed 0
and then hangs. What am I missing?
You can't call asyncio methods from another thread directly.
Either use loop.call_soon_threadsafe:
loop.call_soon_threadsafe(q.put_nowait, i)
Or asyncio.run_coroutine_threadsafe:
future = asyncio.run_coroutine_threadsafe(q.put(i), loop)
where loop is the loop returned by asyncio.get_event_loop() in your main thread.
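Put together, a fixed version of the example using the first option could look like this (a sketch; the producer thread is made a daemon so Ctrl+C still exits):

import asyncio
import time
from threading import Thread

q = asyncio.Queue()

def produce(loop):
    for i in range(100):
        # Schedule the put on the event loop's thread instead of
        # touching the queue from this thread directly.
        loop.call_soon_threadsafe(q.put_nowait, i)
        time.sleep(0.1)

async def consume():
    while True:
        i = await q.get()
        print('consumed', i)

loop = asyncio.get_event_loop()
Thread(target=produce, args=(loop,), daemon=True).start()
loop.run_until_complete(consume())  # runs until interrupted, like the original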
