The code in the docs has a while True: block, and I am curious whether something like that would deadlock the process. If I get two requests, would the second one just not go through? Why or why not?
Source: https://fastapi.tiangolo.com/advanced/websockets/
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        await websocket.send_text(f"Message text was: {data}")
The short answer (based on the code sample given in your question) is yes, subsequent requests would go through (regardless of the while True loop).
The long answer is that it depends on the kind of operations you perform inside the async function or the while True loop, i.e., whether they are I/O-bound or CPU-bound. When a function is called with await and concerns an I/O-bound operation (waiting for data from the client to arrive over the network, waiting for the contents of a file on disk to be read, etc.), the event loop (running in the main thread) can continue servicing other coroutines (requests) while waiting for that operation to finish (in your case, while a message is being received or sent). If, however, the await concerns a CPU-bound operation, or a CPU-bound operation is performed without await at all (audio or image processing, machine learning, etc.), the event loop gets blocked; that means no further requests go through until that CPU-bound operation completes.
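To illustrate, here is a minimal sketch of keeping CPU-bound work off the event loop so other requests keep being served. heavy_cpu_work is a made-up placeholder, and asyncio.to_thread requires Python 3.9+; this is an illustration of the idea, not code from the FastAPI docs:

import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

def heavy_cpu_work(data: str) -> str:
    # made-up placeholder for CPU-bound processing; if called directly inside
    # the coroutine, it would block the event loop for its whole duration
    return data.upper()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()  # I/O-bound: other requests are served while waiting
        # result = heavy_cpu_work(data)        # would block every other request
        result = await asyncio.to_thread(heavy_cpu_work, data)  # runs in a worker thread instead
        await websocket.send_text(f"Message text was: {result}")

For heavy pure-Python CPU work, a ProcessPoolExecutor (via loop.run_in_executor) sidesteps the GIL as well, but the general point is the same: keep long-running synchronous work out of the event loop's thread.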
Please refer to this answer for more details.
Related
I have one program that collects data from a websocket, processes the data and if some conditions apply I want to call another function that does something with the data.
This is easy enough, but I want the program that collects the data from the websocket to keep running.
I have 'fixed' this in quite an ugly way by writing the data to a database and letting the second program check the database every few seconds. But I don't want to use this solution, since I occasionally get "database is locked" errors.
Is there a way to start program B from program A while program A keeps running?
I have looked at multithreading and multiprocessing, and I feel this could be a way to solve it, but while I grasp the basics, it is still a bit too difficult for me to use.
Is there an easier way? And if not, should I study multithreading or multiprocessing more?
(or if anyone knows a good guide/video, that would be great too!)
I suggest launching a worker thread that waits for data to process. The main thread listens to the websocket and sends data to the worker through a pipe.
The logic of worker is:
while True:
    data = peek_data_or_sleep(pipe)
    process_data(data)
This way you won't get thousands of workers when incoming traffic is high.
So the key point is how to send data to the worker, usually via a pipe or a message queue.
I've used Celery with RabbitMQ as the message queue. Send data to Celery from the Django server, and Celery calls your function from another process.
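As a rough illustration of that setup (the module name, task name, and broker URL below are made up for the example; only the Celery calls themselves are real API):

# tasks.py -- hypothetical module; the broker URL is a placeholder
from celery import Celery

app = Celery("tasks", broker="amqp://guest@localhost//")

@app.task
def process_data(data):
    # runs in a Celery worker process, independently of the producer
    ...

# In the process that receives websocket messages:
# from tasks import process_data
# process_data.delay(data)   # enqueue the work and return immediately

The producer never blocks on the processing; the Celery worker picks jobs off the broker at its own pace.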
Here is an example assuming you are using asyncio for WebSockets:
import asyncio
from time import sleep

async def web_socket(queue: asyncio.Queue):
    for i in range(5):
        await asyncio.sleep(1.0)
        await queue.put(f"Here is message n°{i}!")
    await queue.put(None)

def expensive_work(message: str):
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue):
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.to_thread(expensive_work, message)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(
        web_socket(queue),
        worker(queue)
    )

if __name__ == "__main__":
    asyncio.run(main())
The web_socket() function simulates a websocket listener which receives messages. For each received message, it puts it into a queue shared with another task running concurrently, which processes the message.
The expensive_work() function simulates the processing task to apply to each message.
The worker() function runs concurrently with the websocket listener. It reads values from the shared queue and processes them. If the processing is really expensive (for instance a CPU-bound task), consider running it in a ProcessPoolExecutor (see here how to do that; a rough sketch follows below) to avoid blocking the event loop.
Finally, the main() function creates the shared queue, launches the two tasks concurrently with asyncio.gather() and then awaits the completion of both tasks.
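Here is a minimal sketch of that ProcessPoolExecutor variant, reusing the same expensive_work stand-in and queue protocol as the example above (the pool itself is an addition for illustration):

import asyncio
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def expensive_work(message: str):
    # same stand-in for heavy processing as in the example above
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue, pool: ProcessPoolExecutor):
    loop = asyncio.get_running_loop()
    while True:
        message = await queue.get()
        if message is None:
            break
        # expensive_work runs in a separate process, so the event loop
        # (and the GIL in this process) stays free for other coroutines
        await loop.run_in_executor(pool, expensive_work, message)

The pool would typically be created once in main(), e.g. with ProcessPoolExecutor() as pool, and passed to worker() alongside the queue.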
If you are using threads and blocking I/O, the solution is essentially the same, but using threading.Thread and queue.Queue. Beware not to mix multithreading and asyncio concurrency, or look up how to do it properly first.
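A rough sketch of that thread-based variant (no asyncio; the names mirror the example above and the None sentinel again signals shutdown):

import queue
import threading
from time import sleep

def expensive_work(message: str):
    sleep(0.5)
    print(message)

def worker(q: queue.Queue):
    while True:
        message = q.get()        # blocks until an item is available
        if message is None:      # sentinel: shut the worker down
            break
        expensive_work(message)

q = queue.Queue()
t = threading.Thread(target=worker, args=(q,), daemon=True)
t.start()

# A blocking websocket listener would call q.put(message) for each received
# message; here we just simulate five messages and then signal shutdown.
for i in range(5):
    q.put(f"Here is message n°{i}!")
q.put(None)
t.join()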
I'm having issues with asyncio queues. Execution gets stuck on await queue.get() if the queue is empty - even if I publish something into the queue.
I have a loop which reads the event queue, which starts right after the app loads, so the queue is empty on the first await. In a different coroutine I publish a message to this queue, but execution still waits on the await statement. Only a single consumer is reading the queue. I publish the message using put_nowait():
async def _event_loop(self):
    while True:
        try:
            # if self.events.empty():
            #     await asyncio.sleep(0.1)
            #     continue
            ev = await self.events.get()
            print(ev)
If I uncomment the commented out part, the whole thing starts working.
I noticed a similar issue here:
https://github.com/mosquito/aio-pika/issues/56
But I had no luck figuring out how to fix this.
Does anyone have any idea what's wrong?
You are filling the queue from a thread different than the one that runs the event loop. By design, asyncio queues are not thread-safe and can only be safely accessed from asyncio coroutines and callbacks.
You can fix the issue by changing your call to queue.put_nowait(elem), to something like loop.call_soon_threadsafe(queue.put_nowait, elem), where loop is the event loop object which you must also pass to the thread, probably the same way you pass the queue.
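A minimal, self-contained sketch of that fix (the producer_thread name and the three test events are made up for illustration):

import asyncio
import threading

def producer_thread(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue):
    # Runs in a plain thread; it must not call queue.put_nowait() directly.
    for i in range(3):
        loop.call_soon_threadsafe(queue.put_nowait, f"event {i}")

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    threading.Thread(target=producer_thread, args=(loop, queue), daemon=True).start()
    for _ in range(3):
        ev = await queue.get()   # now wakes up for items produced by the thread
        print(ev)

asyncio.run(main())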
Why would uncommenting that part of the code fix the issue, then?
Uncommenting effectively removes the need for the coroutine to wake up while waiting on an empty queue. The wakeup didn't work because put_nowait assumes it is run from the event loop thread, and therefore doesn't need to emit an additional wakeup signal. See e.g. this answer for details.
I want to write a library that mixes synchronous and asynchronous work, like:
def do_things():
    # 1) do sync things
    # 2) launch a bunch of slow async tasks and block until they are all
    #    complete or an exception is thrown
    # 3) more sync work
    # ...
I started implementing this using asyncio as an excuse to learn the library, but as I learn more it seems like this may be the wrong approach. My problem is that there doesn't seem to be a clean way to do step 2, because it depends on the context of the caller. For example:
I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
Marking do_things as async is too heavy because it shouldn't require the caller to be async. Plus, if do_things was async, calling synchronous code (1 & 3) from an async function seems to be bad practice.
asyncio.get_event_loop() also seems wrong, because it may create a new loop, which if left running would prevent the caller from creating their own loop after calling do_things (though arguably they shouldn't do that). And based on the documentation of loop.close, it looks like starting/stopping multiple loops in a single thread won't work.
Basically it seems like if I want to use asyncio at all, I am forced to use it for the entire lifetime of the program, and therefore all libraries like this one have to be written as either 100% synchronous or 100% asynchronous. But the behavior I want is: Use the current event loop if one is running, otherwise create a temporary one just for the scope of 2, and don't break client code in doing so. Does something like this exist, or is asyncio the wrong choice?
I can't use asyncio.run(), because the caller could already have a running event loop and you can only have one loop per thread.
If the caller has a running event loop, you shouldn't run blocking code in the first place because it will block the caller's loop!
With that in mind, your best option is to indeed make do_things async and call sync code using run_in_executor which is designed precisely for that use case:
async def do_things():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, sync_stuff)
    await async_func()
    await loop.run_in_executor(None, more_sync_stuff)
This version of do_things is usable from async code as await do_things() and from sync code as asyncio.run(do_things()).
Having said that... if you know that the sync code will run very briefly, or you are for some reason willing to block the caller's event loop, you can work around the limitation by starting an event loop in a separate thread:
import asyncio
import threading

def run_async(aw):
    result = None
    async def run_and_store_result():
        nonlocal result
        result = await aw
    t = threading.Thread(target=asyncio.run, args=(run_and_store_result(),))
    t.start()
    t.join()
    return result
do_things can then look like this:
def do_things():
    sync_stuff()
    run_async(async_func())
    more_sync_stuff()
It will be callable from both sync and async code, but the cost will be that:
it will create a brand new event loop each and every time. (Though you can cache the event loop and never exit it; see the sketch after this list.)
when called from async code, it will block the caller's event loop, thus effectively breaking its asyncio usage, even if most time is actually spent inside its own async code.
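For completeness, here is a rough sketch of the cached-loop variant mentioned in the first bullet. run_async_cached is a made-up name; the idea is one persistent background event loop in a daemon thread for the lifetime of the process:

import asyncio
import threading

# One background event loop for the whole process.
_loop = asyncio.new_event_loop()
_thread = threading.Thread(target=_loop.run_forever, daemon=True)
_thread.start()

def run_async_cached(coro):
    # Submit the coroutine to the background loop and block until it finishes.
    # Like run_async above, this still blocks the caller (and the caller's
    # event loop, if it has one).
    return asyncio.run_coroutine_threadsafe(coro, _loop).result()

This avoids paying the loop-creation cost on every call, but the second drawback (blocking the caller's event loop) remains.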
I feel like there is a gap in my understanding of async IO: is there a benefit to wrapping small functions into coroutines, within the scope of larger coroutines? Is there a benefit to this in signaling the event loop correctly? Does the extent of this benefit depend on whether the wrapped function is IO or CPU-bound?
Example: I have a coroutine, download(), which:
Downloads JSON-serialized bytes from an HTTP endpoint via aiohttp.
Compresses those bytes via bz2.compress() - which is not in itself awaitable
Writes the compressed bytes to S3 via aioboto3
So parts 1 & 3 use predefined coroutines from those libraries; part 2 does not, by default.
Dumbed-down example:
import bz2
import io
import aiohttp
import aioboto3
async def download(endpoint, bucket_name, key):
    async with aiohttp.ClientSession() as session:
        async with session.request("GET", endpoint, raise_for_status=True) as resp:
            raw = await resp.read()  # payload (bytes)

    # Yikes - isn't it bad to throw a synchronous call into the middle
    # of a coroutine?
    comp = bz2.compress(raw)

    async with (
        aioboto3.session.Session()
        .resource('s3')
        .Bucket(bucket_name)
    ) as bucket:
        await bucket.upload_fileobj(io.BytesIO(comp), key)
As hinted by the comment above, my understanding has always been that throwing a synchronous function like bz2.compress() into a coroutine can mess with it. (Even if bz2.compress() is probably more IO-bound than CPU-bound.)
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
    return bz2.compress(*args, **kwargs)
(And now comp = await compress(raw) within download().)
Voilà, this is now an awaitable coroutine, because a sole return is valid in a native coroutine. Is there a case to be made for using this?
Per this answer, I've heard justification for throwing in asyncio.sleep(0) in a similar manner, just to signal back to the event loop that the calling coroutine wants a break. Is this right?
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
    return bz2.compress(*args, **kwargs)
There is no benefit to it whatsoever. Contrary to expectations, adding an await doesn't guarantee that the control will be passed to the event loop - that will happen only if the awaited coroutine actually suspends. Since compress doesn't await anything, it will never suspend, so it's a coroutine in name only.
Note that adding await asyncio.sleep(0) in coroutines does not solve the problem; see this answer for a more detailed discussion. If you need to run a blocking function, use run_in_executor:
async def compress(*args, **kwargs):
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, lambda: bz2.compress(*args, **kwargs))
Coroutines allow you to run something concurrently, not in parallel. They allow for a single-threaded cooperative multitasking. This makes sense in two cases:
You need to produce results in lockstep, like two generators would.
You want something useful be done while another coroutine is waiting for I/O.
Things like http requests or disk I/O would allow other coroutines to run while they are waiting for completion of an operation.
bz2.compress() is synchronous; it does release the GIL while it is running, but it still occupies the event loop's thread for its whole duration. That means other coroutines would not run during its invocation, though other threads would.
If you anticipate a large amount of data to compress, so large that the overhead of running a coroutine is small in comparison, you can use bz2.BZ2Compressor and feed it data in reasonably small blocks (like 128 KB), write the result to a stream (S3 supports streaming, or you can use io.BytesIO), and await asyncio.sleep(0) between compressing blocks to yield control.
This will allow other coroutines to also run concurrently with your compression coroutine. Possibly async S3 upload will be occurring in parallel at the socket level, too, while your coroutine would be inactive.
BTW making your compressor explicitly an async generator can be a simpler way to express the same idea.
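A rough sketch of that incremental idea as an async generator (compress_stream is a made-up name; the 128 KB block size follows the suggestion above):

import asyncio
import bz2

async def compress_stream(raw: bytes, block_size: int = 128 * 1024):
    """Yield bz2-compressed chunks of `raw`, yielding control between blocks
    so other coroutines (e.g. the S3 upload) can make progress."""
    compressor = bz2.BZ2Compressor()
    for start in range(0, len(raw), block_size):
        chunk = compressor.compress(raw[start:start + block_size])
        if chunk:
            yield chunk
        await asyncio.sleep(0)   # give other coroutines a chance to run
    yield compressor.flush()     # emit any remaining buffered data

Inside download(), the chunks could be fed to a streaming upload, or collected with comp = b''.join([chunk async for chunk in compress_stream(raw)]) if a single bytes object is still needed (at the cost of buffering everything in memory again).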
I am writing a Python program which schedules a number of asynchronous, I/O-bound items to occur, many of which will also be scheduling other, similar work items. The work items themselves are completely independent of one another and they do not require each others' results to be complete, nor do I need to gather any results from them for any sort of local output (beyond logging, which takes place as part of the work items themselves).
I was originally using a pattern like this:
async def some_task(foo):
    pending = []
    for x in foo:
        # ... do some work ...
        if some_condition:
            pending.append(some_task(bar))
    if pending:
        await asyncio.wait(pending)
However, I was running into trouble with some of the nested asyncio.wait(pending) calls sometimes hanging forever, even though the individual things being awaited were always completing (according to the debug output that was produced when I used KeyboardInterrupt to list out the state of the un-gathered results, which showed all of the futures as being in the done state). When I asked others for help they said I should be using asyncio.create_task instead, but I am not finding any useful information about how to do this nor have I been able to get clarification from the people who suggested this.
So, how can I satisfy this use case?
Python's asyncio.Queue may help tie your program's processing to its completion. It has a join() method which blocks until all items in the queue have been received and processed.
Another benefit that I like is that the worker becomes more explicit: it pulls from the queue, processes the item, potentially adds more items, and then ACKs. But this is just personal preference.
async def worker(q):
    while True:
        item = await q.get()
        # process item, potentially requeue more work
        if some_condition:
            await q.put('something new')
        q.task_done()

async def run():
    queue = asyncio.Queue()
    # seed the queue with the initial work items here, otherwise join()
    # returns immediately
    worker_task = asyncio.ensure_future(worker(queue))
    await queue.join()
    worker_task.cancel()

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
loop.close()
The example above was adapted from the asyncio producer/consumer example and modified, since your worker both consumes and produces:
https://asyncio.readthedocs.io/en/latest/producer_consumer.html
I'm not entirely sure how to fix your specific example, but I would definitely look at the primitives asyncio offers to help the event loop hook into your program state, notably join() and using a Queue.