aiohttp and asyncio concurrency - python

We have an aiohttp server serving requests at:
from aiohttp import web

app = web.Application()
app.add_routes(
    [
        web.post("/submit_job", submit_job),
        web.get("/get_job/{job_name}", get_job),
    ]
)
web.run_app(
    app, host="127.0.0.1", port=s.kworkers_port, access_log=logger, keepalive_timeout=5,
    reuse_address=True, reuse_port=True)
where /submit_job sends a long-running asyncio.Task to the current running event loop:
async def coro():
    # Construct a ProcessPoolExecutor object per function run to make sure the resources are cleaned up
    # right after the function runs to completion.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Keep a reference to the result to prevent the `run_in_executor` function from
        # disappearing midway through running.
        result = await asyncio.get_running_loop().run_in_executor(
            executor, functools.partial(worker_func, **worker_func_kwargs))
        print(f"Got result from running {worker_func.__name__}({worker_func_kwargs}): {result}")

task = asyncio.create_task(coro())
self.background_tasks.add(task)
# To prevent keeping references to finished tasks forever, make each task remove its own reference
# from the set after completion.
task.add_done_callback(self.background_tasks.discard)
where worker_func is a blocking CPU-intensive function.
After a /submit_job call, a separate process polls on /get_job/{job_name} to retrieve the status of the task.
This setup works only when there is no load on the system. As soon as some sort of load is incurred, no matter how light, all /get_job/{job_name} requests hang.
What's wrong with aiohttp+asyncio in this code?
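For anyone trying to reproduce the hang in isolation, here is a minimal self-contained sketch of the same pattern; the handler bodies, the job bookkeeping, and the port are hypothetical placeholders, and the one deliberate deviation from the code above is that the ProcessPoolExecutor is created once at startup and shared between jobs, which is a useful variation to compare against the per-request executor:
import asyncio
import concurrent.futures
import functools
import time

from aiohttp import web

def worker_func(seconds):
    # Placeholder for the blocking CPU-intensive function.
    time.sleep(seconds)
    return seconds

async def submit_job(request):
    app = request.app
    job_name = request.query.get("name", "job")
    loop = asyncio.get_running_loop()
    # Offload the blocking call to a pool created once at startup, so nothing
    # heavyweight happens inside the request handler itself.
    future = loop.run_in_executor(app["executor"], functools.partial(worker_func, 5))
    app["jobs"][job_name] = future
    return web.json_response({"status": "submitted", "job": job_name})

async def get_job(request):
    job_name = request.match_info["job_name"]
    future = request.app["jobs"].get(job_name)
    if future is None:
        return web.json_response({"status": "unknown"})
    return web.json_response({"status": "done" if future.done() else "running"})

async def on_startup(app):
    app["executor"] = concurrent.futures.ProcessPoolExecutor()
    app["jobs"] = {}

async def on_cleanup(app):
    app["executor"].shutdown()

app = web.Application()
app.on_startup.append(on_startup)
app.on_cleanup.append(on_cleanup)
app.add_routes([web.post("/submit_job", submit_job),
                web.get("/get_job/{job_name}", get_job)])

if __name__ == "__main__":
    web.run_app(app, host="127.0.0.1", port=8080)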

Related

aiohttp: Tasks created from aiojobs.spawn() don't appear in main loop when doing `asyncio.Task.all_tasks()`

I am using asyncio.Task.all_tasks() to figure out which tasks to cancel and which to wait for during shutdown.
So it looks like this:
web.run_app(web_app, port=PORT, handle_signals=True)
# Then in case the app is stopped, we do cleanup
loop.run_until_complete(wait_for_all_blocker_coroutines_to_finish())
This is the function that waits for tasks that need to be completed before shutting down:
async def wait_for_all_blocker_coroutines_to_finish() -> None:
    started_time = datetime.now()
    all_tasks = asyncio.Task.all_tasks() - {asyncio.current_task()}
    # all_tasks doesn't contain any tasks that I created inside of the spawn() coroutine
    logging.debug(f"Total tasks unfinished: {len(all_tasks)}")
    loop = asyncio.get_event_loop()
    logging.debug(f"Checking in loop {loop._thread_id}")
    logging.debug(all_tasks)
    coroutines = list(filter(filter_tasks_with_meta, all_tasks))
    logging.debug(coroutines)
    total = len(coroutines)
    logging.debug(f"Waiting for all blocker coroutines to finish ({total} total)")
    await asyncio.gather(*coroutines, return_exceptions=True)
    duration = datetime.now() - started_time
    seconds = duration.total_seconds()
    logging.debug(f"Coroutines unblocked after {seconds} seconds")
Somewhere inside of spawn(coro) I do this:
class TaskMeta(TypedDict):
    is_meta: bool
    blocker: bool

def name_for_task_with_meta(task_meta: TaskMeta) -> str:
    return json.dumps(task_meta)

def create_app_blocking_task(coro) -> asyncio.Task:
    # We differentiate between tasks that need to be waited for using TaskMeta, which is then
    # converted into a json string and saved in the name parameter (later we filter tasks by name)
    name = name_for_task_with_meta(TaskMeta(is_meta=True, blocker=True))
    loop = asyncio.get_running_loop()
    task = loop.create_task(coro, name=name)
    logging.debug(f"Creating blocking task with meta={name}, loop_id={loop._thread_id}")
    return task

job = create_app_blocking_task(coro)
The tasks that I want to wait for are created inside aiojobs.spawn(). When I do asyncio.Task.all_tasks() inside the coroutine run by aiojobs.spawn(), it displays the correct listing of tasks.
However, my shutdown handler is outside of aiojobs.spawn() coroutine, and when I do asyncio.Task.all_tasks(), it doesn't return anything. Zero running tasks, even though they are actually running.
From what I understand, asyncio.Task.all_tasks() returns all running tasks inside of the current asyncio loop. Could it be that spawn() creates a separate loop and thus tasks from there don't appear in the main loop? If so, how can I prevent this? Or can I get all loops and then get all tasks from each loop? Or do graceful shutdown for spawn() separately?
EDIT: So I figured out that these tasks get cancelled when I stop the application. My question now is: How do I prevent that?
So the problem here was that aiohttp cancelled tasks before my shutdown handler could process them. The solution I used was to process tasks inside an on_shutdown handler.
web_app = web.Application(client_max_size=1024 * 1024 * 40)
web_app.on_shutdown.append(on_shutdown)
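A minimal sketch of what such an on_shutdown handler might look like, reusing the name-based filtering from the question (filter_tasks_with_meta is the helper assumed above):
import asyncio
from aiohttp import web

async def on_shutdown(app: web.Application) -> None:
    # Runs before aiohttp cancels the remaining tasks, so the blocker tasks
    # spawned via aiojobs can still be awaited here.
    all_tasks = asyncio.all_tasks() - {asyncio.current_task()}  # asyncio.Task.all_tasks() on older Pythons
    blockers = [t for t in all_tasks if filter_tasks_with_meta(t)]  # helper assumed from the question
    if blockers:
        await asyncio.gather(*blockers, return_exceptions=True)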

Wrap websockets asyncio with synchronous API

I'm using the Python websockets library in order to create a websocket server. My goal is to expose a synchronous API, as it's going to be used outside of Python.
As such, I need, at least at the beginning, start() and stop() methods, so it seems right to create a websocket server class for that.
The main issue is that the way to create (and start) a server through the library is by awaiting, so these methods are encouraged to be async, which I am trying to avoid.
The following code works perfectly when I run the main() function.
When running server = WebsocketServer(); server.start(1234) through an IPython shell, I can't seem to connect with client code. What am I missing?
class WebsocketServer():
    def __init__(self):
        self._server = None

    def start(self, port):
        start_server = websockets.serve(self._handle_client,
                                        host='localhost',
                                        port=port)
        self._server = asyncio.get_event_loop().run_until_complete(start_server)

    def stop(self):
        if self._server:
            self._server.close()
            self._server = None

    async def _handle_client(self, client, path):
        async for buffer in client:
            await self._handle_buffer(client, buffer)

    async def _handle_buffer(self, client, buffer):
        print(buffer)
        await client.send(buffer)

def main():
    server = WebsocketServer()
    server.start(1234)
    asyncio.get_event_loop().run_forever()
The synchronous interface into the IO loop is via tasks. Scheduling methods return a future that can be synchronously waited on if needed. The "running an event loop" section of the docs shows a combination for synchronous shutdown at the bottom.
When running inside an IPython shell, one option is to spawn a daemon background thread for the IO loop and register an atexit callback to synchronously shut down the IO loop when the shell exits.
Another option is to "borrow" the shell's thread once in a while for the IO tasks (this only works for short tasks, of course) using the UI event loop integration point described in the IPython docs.
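A sketch of the first option, assuming the same websockets library and echo handler from the question (the class and method names here are hypothetical): the event loop lives in a daemon thread, and the synchronous start()/stop() methods hand work to it with run_coroutine_threadsafe / call_soon_threadsafe:
import asyncio
import atexit
import threading

import websockets  # same library as in the question

class ThreadedWebsocketServer:
    """Sketch: run the asyncio loop in a daemon thread, expose sync start()/stop()."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run_loop, daemon=True)
        self._thread.start()
        self._server = None
        atexit.register(self.stop)

    def _run_loop(self):
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    async def _handle_client(self, client, path):
        # Handler signature follows the question (legacy websockets API); echoes back.
        async for buffer in client:
            await client.send(buffer)

    def start(self, port):
        async def _start():
            return await websockets.serve(self._handle_client, host='localhost', port=port)
        # Schedule on the background loop; block the calling (shell) thread until the server is up.
        self._server = asyncio.run_coroutine_threadsafe(_start(), self._loop).result()

    def stop(self):
        if self._server is not None:
            self._loop.call_soon_threadsafe(self._server.close)
            self._server = None
        self._loop.call_soon_threadsafe(self._loop.stop)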
You are missing the last line from your main function.
asyncio.get_event_loop().run_forever()
Nothing happens when the loop is not running. So, the server won't be running unless you run the loop.

python cherrypy: how to start long background (stoppable) task

I've built a RESTful service in Python using CherryPy, which is multi-threaded by default, so two different HTTP sessions don't block each other.
For a given endpoint of my API I need a way to start a long (non-blocking) background task which I can stop at any time. Currently I am using a new thread to run the task, which allows the user to send other requests to the server without waiting for the long task to complete. Unfortunately I also need a way to stop the background task at any time, and it seems I can't stop the new thread from the main thread (am I correct?).
@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        thread = Thread(target=longRunningTask, args=())
        thread.start()
        return {"message": "Long task started"}
I've tried multiprocessing.Process instead of a thread, but this seems to block the main thread (the client can't get any response from the server until the background task is completed):
@cp.expose
@cp.tools.json_in()
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def GET(self):
        process = multiprocessing.Process(target=longRunningTask, args=())
        process.start()
        return {"message": "Long task started"}
How can I start a long background task which does not block the main thread (for each HTTP session) and which the server can stop at any moment?
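A Python thread cannot be killed from the outside, so one common pattern is cooperative cancellation: the task periodically checks a threading.Event, and a second HTTP method sets it. A sketch following the question's endpoint layout (longRunningTask's body, the DELETE verb, and the join timeout are placeholders; the class needs the same MethodDispatcher wiring as the original):
import threading
import time

import cherrypy as cp

def longRunningTask(stop_event: threading.Event):
    # Placeholder for the real work: check the event between chunks of work.
    while not stop_event.is_set():
        time.sleep(1)  # do one unit of work here

@cp.expose
@cp.tools.json_out()
class LongTaskEndpoint(object):
    def __init__(self):
        self._stop_event = None
        self._thread = None

    def GET(self):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(
            target=longRunningTask, args=(self._stop_event,), daemon=True)
        self._thread.start()
        return {"message": "Long task started"}

    def DELETE(self):
        # Ask the background task to stop and wait briefly for it to exit.
        if self._stop_event is not None:
            self._stop_event.set()
            self._thread.join(timeout=5)
        return {"message": "Long task stopped"}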

Collecting results from celery worker with asyncio

I have a Python application which offloads a number of processing jobs to a set of Celery workers. The main application then has to wait for results from these workers. As and when a result is available from a worker, the main application processes it and schedules more workers to be executed.
I would like the main application to run in a non-blocking fashion. As of now, I have a polling function to see whether results are available from any of the workers.
I am looking at the possibility of using asyncio to get notified about result availability so that I can avoid the polling. But I could not find any information on how to do this.
Any pointers on this will be highly appreciated.
PS: I know with gevent, I can avoid the polling. However, I am on python3.4 and hence would prefer to avoid gevent and use asyncio.
You must be looking for asyncio.as_completed(coros). It returns an iterator that yields futures in the order in which they complete, so you get results as and when they are ready from the different coroutines. You might also want to see how it differs from asyncio.gather(*coros), which returns only once everything submitted to it has finished.
import asyncio
from asyncio.coroutines import coroutine

@coroutine
def some_work(x, y):
    print("doing some background work")
    yield from asyncio.sleep(1.0)
    return x * y

@coroutine
def some_other_work(x, y):
    print("doing some background other work")
    yield from asyncio.sleep(3.0)
    return x + y

@coroutine
def as_when_completed():
    # give me results as and when they are ready
    coros = [some_work(2, 3), some_other_work(2, 3)]
    for futures in asyncio.as_completed(coros):
        res = yield from futures
        print(res)

@coroutine
def when_all_completed():
    # when everything is complete
    coros = [some_work(2, 3), some_other_work(2, 3)]
    results = yield from asyncio.gather(*coros)
    print(results)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # loop.run_until_complete(when_all_completed())
    loop.run_until_complete(as_when_completed())
I implement the on_finish function of the Celery worker to publish a message to Redis;
then the main app uses aioredis to subscribe to the channel, and once it gets notified, the result is ready.
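A rough sketch of the subscriber side of that idea; this uses redis-py's asyncio client rather than aioredis (whose API differs between versions), and the channel name and result-handling step are placeholders:
import asyncio

import redis.asyncio as redis  # redis-py >= 4.2; the answer above used aioredis

RESULT_CHANNEL = "celery-results"  # placeholder channel the worker hook publishes to

async def listen_for_results():
    client = redis.Redis()
    pubsub = client.pubsub()
    await pubsub.subscribe(RESULT_CHANNEL)
    async for message in pubsub.listen():
        if message["type"] != "message":
            continue
        task_id = message["data"].decode()
        # The worker hook published the task id; fetch/process the result here
        # and schedule more work, without ever polling.
        print(f"result ready for task {task_id}")

if __name__ == "__main__":
    asyncio.run(listen_for_results())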

asyncio start_server timeout issue

I have a TCP server implemented in Python using asyncio's create_server.
I call the coroutine start_server with a connection_handler_cb.
Now my question is this: let's say my connection_handler_cb looks something
like this:
def connection_handler_cb(reader, writer):
    while True:
        yield from reader.read()
        # --do some computation--
I know that only the yield from parts run "concurrently" (I know it's not really concurrent); all of the "--do some computation--" work runs sequentially and prevents everything else from running in the loop.
Let's say we are talking about a TCP server with multiple clients trying to send. Can this situation cause a send timeout on the other side - the client side?
If your clients are waiting for a response from the server, and that response isn't sent until the computation is done, then it's possible the clients could eventually timeout, if the computations took long enough. More likely, though, is that the clients will just hang until the computations are done and the event loop gets unblocked.
In any case, if you're worried about timeouts or hangs, use loop.run_in_executor to run your computations in a background process (this is preferable), or thread (probably not a good choice since you're doing CPU-bound computations) without blocking the event loop:
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def comp_func(arg1, arg2):
    # Do computation here
    return output

def connection_handler_cb(reader, writer):
    while True:
        data = yield from reader.read()
        # Do computation in a background process
        # (arg1/arg2 would be derived from `data` in a real handler).
        # This won't block the event loop.
        output = yield from loop.run_in_executor(None, comp_func, arg1, arg2)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.set_default_executor(
        ProcessPoolExecutor(multiprocessing.cpu_count()))
    asyncio.ensure_future(asyncio.start_server(connection_handler_cb, ...))
    loop.run_forever()
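For reference, a sketch of the same pattern with the async/await syntax used by current Python versions; the host, port, and the computation itself are placeholders:
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def comp_func(data: bytes) -> bytes:
    # CPU-bound work runs in a worker process, off the event loop.
    return data.upper()

async def connection_handler_cb(reader, writer):
    loop = asyncio.get_running_loop()
    while True:
        data = await reader.read(1024)
        if not data:
            break
        # Offload the computation; the event loop stays responsive meanwhile.
        output = await loop.run_in_executor(None, comp_func, data)
        writer.write(output)
        await writer.drain()
    writer.close()

async def main():
    loop = asyncio.get_running_loop()
    loop.set_default_executor(ProcessPoolExecutor(multiprocessing.cpu_count()))
    server = await asyncio.start_server(connection_handler_cb, "127.0.0.1", 8888)  # placeholder host/port
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())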
