I think everyone knows what to do with long-running tasks in Django: use Celery and relax. But what if I want to get the benefits of websockets with aiohttp (or Tornado)?
Let's say I have a very CPU-bound task which can take anywhere from a couple of seconds to several (5-10) minutes. It seems like a pretty good idea to handle this task in the websocket loop and notify the user about its progress. No AJAX requests, and very fast responses for short tasks.
import json

import aiohttp
from aiohttp import web

async def websocket_handler(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    async for msg in ws:
        if msg.type == aiohttp.WSMsgType.TEXT:
            # long_running_task and NotificationHelper are the app's own helpers
            answer_to_the_ultimate_question_of_life_the_universe_and_everything = \
                long_running_task(msg.data, NotificationHelper(ws))
            await ws.send_str(json.dumps({
                'action': 'got-answer',
                'data': answer_to_the_ultimate_question_of_life_the_universe_and_everything,
            }))
    return ws
But on the other hand, a CPU-bound task served this way blocks the entire thread, as I understand it. If I have 10 workers and 11 clients who want to use the application, the 11th client won't be served until the first client's task is done.
Maybe I should run tasks that look big in Celery and tasks that look small in the main loop?
So, my question: is there any good design pattern for serving long-running tasks with an async server?
Thanks!
Just run your long-running CPU-bound task with loop.run_in_executor() and send progress notifications with loop.call_soon_threadsafe().
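A minimal sketch of that pattern (assuming the modern aiohttp API, and assuming long_running_task accepts a progress callback instead of the question's NotificationHelper):

import asyncio
import json

async def handle_message(ws, data):
    loop = asyncio.get_running_loop()

    def report_progress(percent):
        # This runs in the worker thread; call_soon_threadsafe is the safe
        # way to hand work back to the event loop from another thread.
        payload = json.dumps({'action': 'progress', 'data': percent})
        loop.call_soon_threadsafe(asyncio.ensure_future, ws.send_str(payload))

    # run_in_executor moves the CPU-bound call onto a thread pool, so the
    # event loop keeps serving other clients while it runs.
    answer = await loop.run_in_executor(
        None, long_running_task, data, report_progress)
    await ws.send_str(json.dumps({'action': 'got-answer', 'data': answer}))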
If your job is not CPU-bound but IO-bound (sending emails, for example), you may create a new task with a loop.create_task() call. It is conceptually like spawning a new thread.
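For example (send_email is a hypothetical coroutine standing in for the email job):

import asyncio

async def handler():
    # Schedule the IO-bound job and keep going; with no await, the event
    # loop runs it in the background, much like spawning a thread.
    loop = asyncio.get_running_loop()
    task = loop.create_task(send_email("user@example.com", "all done"))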
If you cannot use a fire-and-forget approach, you need a persistent message broker like RabbitMQ (there is the https://github.com/benjamin-hodgson/asynqp library for communicating with Rabbit in an asyncio way).
Related
I have one program that collects data from a websocket and processes it, and if some conditions apply, I want to call another function that does something with the data.
This is easy enough, but I want the program that collects the data from the websocket to keep running.
I have 'fixed' this rather crudely by writing the data to a database and letting the second program check the database every few seconds. But I don't want to keep this solution, since I occasionally get 'database is locked' errors.
Is there a way to start program B from program A while program A keeps running?
I have looked at multithreading and multiprocessing, and I feel they could be a way to solve this, but while I grasp the basics, they are still a bit too difficult for me to use.
Is there an easier way? And if not, should I study multithreading or multiprocessing more?
(Or if anyone knows a good guide/video, that would be great too!)
I suggest launching a worker thread that waits for data to process. The main thread listens to the websocket and sends data to the worker through a pipe.
The logic of the worker is:
while True:
    data = peek_data_or_sleep(pipe)
    process_data(data)
This way you won't get thousands of workers when incoming traffic is high.
So the key point is how to send data to the worker, usually via a pipe or message queue.
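A minimal runnable sketch of that pattern, using queue.Queue as the "pipe" (process_data stands in for whatever program B does):

import queue
import threading

work_queue = queue.Queue()

def worker():
    while True:
        data = work_queue.get()  # blocks (sleeps) until data arrives
        if data is None:         # sentinel: shut the worker down
            break
        process_data(data)

threading.Thread(target=worker, daemon=True).start()

# In program A's websocket loop, hand data off without blocking:
# work_queue.put(data)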
I've used Celery with RabbitMQ as the message queue. Send data to Celery from the Django server, and Celery calls your function in another process.
Here is an example assuming you are using asyncio for WebSockets:
import asyncio
from time import sleep

async def web_socket(queue: asyncio.Queue):
    for i in range(5):
        await asyncio.sleep(1.0)
        await queue.put(f"Here is message n°{i}!")
    await queue.put(None)

def expensive_work(message: str):
    sleep(0.5)
    print(message)

async def worker(queue: asyncio.Queue):
    while True:
        message = await queue.get()
        if message is None:
            break
        await asyncio.to_thread(expensive_work, message)

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(
        web_socket(queue),
        worker(queue)
    )

if __name__ == "__main__":
    asyncio.run(main())
The web_socket() function simulates a websocket listener which receives messages. For each received message, it puts the message in a queue that is shared with another task running concurrently, which processes it.
The expensive_work() function simulates the processing task to apply to each message.
The worker() function will be running concurrently with the websocket listener. It reads values from the shared queue and processes them. If the processing is really expensive (for instance a CPU-bound task), consider running it in a ProcessPoolExecutor (see here for how to do that) to avoid blocking the event loop.
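For instance, one possible variant of worker() using a process pool (a sketch, not the only way to do it):

import asyncio
from concurrent.futures import ProcessPoolExecutor

async def worker(queue: asyncio.Queue, pool: ProcessPoolExecutor):
    loop = asyncio.get_running_loop()
    while True:
        message = await queue.get()
        if message is None:
            break
        # Run the CPU-bound function in another process so it cannot
        # block the event loop or compete for the GIL.
        await loop.run_in_executor(pool, expensive_work, message)

main() would then create the pool (for example with ProcessPoolExecutor() as a context manager around the asyncio.gather() call) and pass it in.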
Finally, the main() function creates the shared queue, launches the two tasks concurrently with asyncio.gather() and then awaits the completion of both tasks.
If you are using threads and blocking IO, the solution is essentially similar but uses threading.Thread and queue.Queue. Beware not to mix multithreading and asyncio concurrency, or first look up how to do it properly.
I have a simple FastAPI server:
import time
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    time.sleep(0.5)  # any I/O bound task
    return {"message": "home"}
I'm sending 10 concurrent requests to the server, which would take a little over 0.5 seconds if it were a def endpoint instead of async def:
# sending concurrent requests
with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(send_request, [...])
I understand that I should be using some asynchronous method to sleep on the server, but why does my CONCURRENT code behave sequentially, taking 5+ seconds, when I'm sending concurrent requests?
Technically there's some latency in every line, so does that mean I need to make every line asynchronous?
Definition:
asyncio is a library to write concurrent code using the async/await syntax.
asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc.
asyncio is often a perfect fit for IO-bound and high-level structured network code.
You see the benefit of async with IO-bound work: hitting a DB, calling an API, and so on.
In your example, time.sleep isn't IO-bound from the event loop's point of view; it blocks the loop. You can use await asyncio.sleep(0.5) to simulate an IO-bound wait.
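A minimal sketch of the endpoint rewritten with a non-blocking sleep:

import asyncio
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    await asyncio.sleep(0.5)  # yields to the event loop instead of blocking it
    return {"message": "home"}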
As I said, if you want async performance you should be looking at IO-bound work; for CPU-bound work, like running algorithms, encryption, or other processing that depends on the CPU, async functions won't help you.
Simply put, whenever our system is waiting for an answer from an external service, instead of sitting idle it moves on to another task, and picks up the answer from the external system when it's ready.
Is there a way to run all messages that arrive on the same websocket sequentially, in a blocking way, without blocking messages arriving on different websockets?
So let's assume someone is using a ThreadPoolExecutor with 8 threads (to utilize all available cores), together with the yield statement and the @gen.coroutine decorator. Every time the server runs executor.submit, the task goes to some thread arbitrarily. I'd like to enforce that for a given WebSocket only one thread handles its tasks, in order to ensure things run sequentially.
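One hypothetical way to picture that constraint (an illustration, not from the thread): give each connection its own single-worker executor so its tasks serialize while other connections proceed independently.

from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: one single-worker executor per websocket connection.
class PerSocketWorker:
    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, fn, *args):
        # Only one thread serves this connection, so submissions serialize.
        return self._executor.submit(fn, *args)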
In my application, I have Python Celery tasks that connect to a REST API. Simple.
The problem I have is that the API does not allow multiple requests with the same credentials.
Is there a way to have these API tasks block in the queue? Meaning, if multiple requests are made around the same time, can I have the tasks sit in the queue and execute one by one, waiting for the first in the queue to finish?
Currently, in the RabbitMQ message queue (with one worker), I see the tasks go through (spawned) and not wait.
I looked over documentation but could not find a simple solution.
Thanks.
With one worker it's impossible for Celery to do more than one task at a time. What you may be seeing is called prefetching, which allows the worker to reserve tasks.
http://docs.celeryproject.org/en/latest/userguide/optimizing.html#prefetch-limits
The default prefetch multiplier is 4; turn it down to 1 and see if that fixes it.
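As a sketch (the setting name depends on your Celery version):

# celeryconfig.py -- reserve at most one task per worker process at a time
worker_prefetch_multiplier = 1  # Celery 4+; older versions: CELERYD_PREFETCH_MULTIPLIER = 1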
The workflow of my app is:
1. User submits a file
2. On receiving it -> process_file()
3. Return the response
This could result in timeouts if process_file() takes a long time, so how could I send back the response first and then process the file, sending the desired output to the user later?
I have checked out django-celery, but I think it's quite heavy for the small app I am trying to build.
Update: I searched around on the internet a bit, and if anyone would like to use Celery, here is a nice blog post that could help you solve this situation: [Link]
You can use Celery for this:
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
The execution units, called tasks, are executed concurrently on a single or more worker servers using multiprocessing, Eventlet, or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).
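A minimal hedged sketch (the Redis broker URL is an assumption; process_file is the question's own function):

# tasks.py
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def process_file_task(path):
    # Runs in a Celery worker process, outside the request/response cycle.
    process_file(path)

The view can then call process_file_task.delay(path) and return a response to the user immediately.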