FastAPI asynchronous background tasks blocks other requests? - python

I want to run a simple background task in FastAPI, which involves some computation before dumping it into the database. However, the computation would block it from receiving any more requests.
from fastapi import BackgroundTasks, FastAPI
app = FastAPI()
db = Database()
async def task(data):
otherdata = await db.fetch("some sql")
newdata = somelongcomputation(data,otherdata) # this blocks other requests
await db.execute("some sql",newdata)
#app.post("/profile")
async def profile(data: Data, background_tasks: BackgroundTasks):
background_tasks.add_task(task, data)
return {}
What is the best way to solve this issue?

Your task is defined as async, which means fastapi (or rather starlette) will run it in the asyncio event loop.
And because somelongcomputation is synchronous (i.e. not waiting on some IO, but doing computation) it will block the event loop as long as it is running.
I see a few ways of solving this:
Use more workers (e.g. uvicorn main:app --workers 4). This will allow up to 4 somelongcomputation in parallel.
Rewrite your task to not be async (i.e. define it as def task(data): ... etc). Then starlette will run it in a separate thread.
Use fastapi.concurrency.run_in_threadpool, which will also run it in a separate thread. Like so:
from fastapi.concurrency import run_in_threadpool
async def task(data):
otherdata = await db.fetch("some sql")
newdata = await run_in_threadpool(lambda: somelongcomputation(data, otherdata))
await db.execute("some sql", newdata)
Or use asyncios's run_in_executor directly (which run_in_threadpool uses under the hood):
import asyncio
async def task(data):
otherdata = await db.fetch("some sql")
loop = asyncio.get_running_loop()
newdata = await loop.run_in_executor(None, lambda: somelongcomputation(data, otherdata))
await db.execute("some sql", newdata)
You could even pass in a concurrent.futures.ProcessPoolExecutor as the first argument to run_in_executor to run it in a separate process.
Spawn a separate thread / process yourself. E.g. using concurrent.futures.
Use something more heavy-handed like celery. (Also mentioned in the fastapi docs here).

If your task is CPU bound you could use multiprocessing, there is way to do that with Background task in FastAPI:
https://stackoverflow.com/a/63171013
Although you should consider to use something like Celery if there are lot of cpu-heavy tasks.

Read this issue.
Also in the example below, my_model.function_b could be any blocking function or process.
TL;DR
from starlette.concurrency import run_in_threadpool
#app.get("/long_answer")
async def long_answer():
rst = await run_in_threadpool(my_model.function_b, arg_1, arg_2)
return rst

This is a example of Background Task To FastAPI
from fastapi import FastAPI
import asyncio
app = FastAPI()
x = [1] # a global variable x
#app.get("/")
def hello():
return {"message": "hello", "x":x}
async def periodic():
while True:
# code to run periodically starts here
x[0] += 1
print(f"x is now {x}")
# code to run periodically ends here
# sleep for 3 seconds after running above code
await asyncio.sleep(3)
#app.on_event("startup")
async def schedule_periodic():
loop = asyncio.get_event_loop()
loop.create_task(periodic())
if __name__ == "__main__":
import uvicorn
uvicorn.run(app)

Related

how to call functions/objects from another process

Summarize the problem
I have a flask server (with endpoints and socket events) and a discord bot, both work independently, I want to run them on parallel so I can trigger functions of the bot from a flask endpoint.
Describe what you have tried
For context, this is an example of the endpoint:
#app.route("/submit", methods=["POST"])
async def submit():
data = request.json
userid = int(os.getenv("USER_ID"))
message = f"```Title: {data['title']}\nMessage: {data['message']}```"
await send_dm(userid, message)
return data
Where send_dm in its own package looks like this
# notice that this is not a method of a function
# nor a decorated function with properties from the discord library
# it just uses an intance of the commands.Bot class
async def send_dm(userid: int, message: str):
user = await bot.fetch_user(userid)
channel = await user.create_dm()
await channel.send(message)
So to run them on parallel and be able to communicate them with each other I tried:
Attempt 1: multiprocessing module
def main():
executor = ProcessPoolExecutor(2)
loop = asyncio.new_event_loop()
loop.run_in_executor(executor, start_bot)
loop.run_in_executor(executor, start_server)
loop.run_forever()
if __name__ == "__main__":
run()
When the function on the endpoint mentioned executes I get the following error AttributeError: '_MissingSentinel' object has no attribute 'is_set' on concurrent tasks
Attempt 2: threading module
# make them aware of each other
bot.flask_app = app
app.bot = bot
async def main():
task = threading.Thread(target=start_bot)
task.start()
start_server()
if __name__ == "__main__":
asyncio.run(main())
This approach brings two issues:
First is that to run start_bot I use .start() method instead of .run() because according to the this example .run() created its own event pool which would make it unreachable by other processes, and .start() is an async function, so when running this I get the error: RuntimeError: You cannot use AsyncToSync in the same thread as an async event loop - just await the async function directly.
Second is that even using the run.() function then the same issue arises when executing mentioned endpoint.
Attempt 3: asyncio module
def main():
executor = ProcessPoolExecutor(2)
loop = asyncio.new_event_loop()
boo = loop.run_in_executor(executor, start_bot)
baa = loop.run_in_executor(executor, start_server)
loop.run_forever()
if __name__ == "__main__":
main()
This time I actually get the execute both processes but still cannot call the function I want from the flask endpoint.
I also tried
await asyncio.gather([start_server(), start_bot()])
But same issue as Attempt 2, and I already upgraded the flask[async] module so that is not the issue anymore.
Show some code
To reproduce what I have right now you can either check the full repo here that has only 4 files or this sample should be enough to reproduce.
from server import socketio, app
from bot import bot
from dotenv import load_dotenv
import os
import asyncio
import threading
env_path = os.path.dirname(__file__) + "/.env"
load_dotenv(env_path)
def start_server():
socketio.run(app)
def start_bot():
token = os.getenv("BOT_TOKEN")
bot.run(token)
async def main():
# this doesn't achieve what I want and is the main part of the problem
start_server()
start_bot()
if __name__ == "__main__":
try:
asyncio.run(main())
except KeyboardInterrupt:
print("Program exited")
Did I miss something?, point it out in the comments and I will add it in the edit.

How to run in parallel queries via asyncpg? [duplicate]

I'm learning about the magic of concurrency in python and I have a script that I have been using await in(via fastapi's framework).
I need to get multiple data from my database and do something like:
DBresults1 = await db_conn.fetch_rows(query_statement1)
DBresults2 = await db_conn.fetch_rows(query_statement2)
DBresults3 = await db_conn.fetch_rows(query_statement3)
#then on to processing...
The problem is my awaits are causing my code to run sequentially which is getting to be slow.
Is there a way to have my 3 queries(or any function call for that matter) to run without await but await until the entire group is done so it can execute concurrently?
You can use gather to await multiple tasks non-sequentially. You mentioned FastAPI, this example might help. It does a 3 second task and a 2 second task in ~3 seconds total. Although the first task takes longer, gather will give you results in the order you list them, not the order they complete.
For this to work, the long running part needs to be actual awaitable IO (simulated with asyncio.sleep here) not CPU bound work.
If you run this in a terminal, then call GET localhost:8080 from Postman (or whatever is convenient for you) you'll see what's happening in the logs.
import logging
import asyncio
import time
from fastapi import FastAPI
logging.basicConfig(level=logging.INFO, format="%(levelname)-9s %(asctime)s - %(name)s - %(message)s")
LOGGER = logging.getLogger(__name__)
app = FastAPI()
async def takes_three():
LOGGER.info("will sleep for 3s...")
await asyncio.sleep(3)
LOGGER.info("takes_three awake!")
return "takes_three done"
async def takes_two():
LOGGER.info("will sleep for 2s...")
await asyncio.sleep(2)
LOGGER.info("takes_two awake!")
return "takes_two done"
#app.get("/")
async def root():
start_time = time.time()
LOGGER.info("starting...")
results = await asyncio.gather(*[takes_three(), takes_two()])
duration = time.time() - start_time
LOGGER.info(f"results: {results} after {duration:.2f}")
return f"finished after {duration:.2f}s!"
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="127.0.0.1", port=8080)
you can use asyncio.wait like in example below
async def get_suggest(entity, ids, company_id):
...
return ...
jobs = []
for entity, ids, in dict(data.filters).items():
jobs.append(get_suggest(entity, ids, company_id))
result = {}
done, _ = await asyncio.wait(jobs)
for job in done:
values, entity = job.result()
result[entity] = values

Asynchronous Python server: fire and forget at startup

I am writing a server that must handle asynchronous tasks. I'd rather stick to asyncio for the asynchronous code, so I chose to use Quart[-OpenAPI] framework with Uvicorn.
Now, I need to run a task (master.resume() in the code below) when the server is starting up without waiting for it to finish, that is, firing and forgetting it.
I'm not sure if it's even possible with asyncio, as I cannot await for this task but if I don't I get a coroutine X was never awaited error. Using loop.run_until_complete() as suggested in this answer would block the server until the task completes.
Here's a skeleton of the code that I have:
import asyncio
from quart_openapi import Pint, Resource
app = Pint(__name__, title="Test")
class Master:
...
async def resume():
await coro()
async def handle_get()
...
#app.route("/test")
class TestResource(Resource):
async def get(self):
print("Received get")
asyncio.create_task(master.handle_get())
return {"message": "Request received"}
master = Master()
# How do I fire & forget master.resume()?
master.resume() # <== This throws "RuntimeWarning: coroutine 'Master.resume' was never awaited"
asyncio.get_event_loop().run_until_complete(master.resume()) # <== This prevents the server from starting
Should this not be achievable with asyncio/Quart, what would be the proper way to do it?
It is see these docs, in summary,
#app.before_serving
async def startup():
asyncio.ensure_future(master.resume())
I'd hold on to the task though, so that you can cancel it at shutdown,
#app.before_serving
async def startup():
app.background_task = asyncio.ensure_future(master.resume())
#app.after_serving
async def shutdown():
app.background_task.cancel() # Or something similar

Removing async pollution from Python

How do I remove the async-everywhere insanity in a program like this?
import asyncio
async def async_coro():
await asyncio.sleep(1)
async def sync_func_1():
# This is blocking and synchronous
await async_coro()
async def sync_func_2():
# This is blocking and synchronous
await sync_func_1()
if __name__ == "__main__":
# Async pollution goes all the way to __main__
asyncio.run(sync_func_2())
I need to have 3 async markers and asyncio.run at the top level just to call one async function. I assume I'm doing something wrong - how can I clean up this code to make it use async less?
FWIW, I'm interested mostly because I'm writing an API using asyncio and I don't want my users to have to think too much about whether their functions need to be def or async def depending on whether they're using a async part of the API or not.
After some research, one answer is to manually manage the event loop:
import asyncio
async def async_coro():
await asyncio.sleep(1)
def sync_func_1():
# This is blocking and synchronous
loop = asyncio.get_event_loop()
coro = async_coro()
loop.run_until_complete(coro)
def sync_func_2():
# This is blocking and synchronous
sync_func_1()
if __name__ == "__main__":
# No more async pollution
sync_func_2()
If you must do that, I would recommend an approach like this:
import asyncio, threading
async def async_coro():
await asyncio.sleep(1)
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()
def sync_func_1():
# This is blocking and synchronous
return asyncio.run_coroutine_threadsafe(async_coro(), _loop).result()
def sync_func_2():
# This is blocking and synchronous
sync_func_1()
if __name__ == "__main__":
sync_func_2()
The advantage of this approach compared to one where sync functions run the event loop is that it supports nesting of sync functions. It also only runs a single event loop, so that if the underlying library wants to set up e.g. a background task for monitoring or such, it will work continuously rather than being spawned each time anew.

No speedup using asyncio despite awaiting API response

I am running a program that makes three different requests from a rest api. data, indicator, request functions all fetch data from BitMEX's api using a wrapper i've made.
I have used asyncio to try to speed up the process so that while i am waiting on a response from previous request, it can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works and as far as I know, I have set everything up correctly. But there could be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions
'''
asynchronous I/O
'''
async def data():
data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
instances=25)
await asyncio.sleep(0)
return data
async def indicator():
indicator = BitMEXData().get_indicator(symbol='XBTUSD',
timeframe='1h', indicator='RSI', period=20, source='close',
instances=25)
await asyncio.sleep(0)
return indicator
async def request():
request = BitMEXFunctions().get_price()
await asyncio.sleep(0)
return request
async def chain():
data_ = await data()
indicator_ = await indicator()
request_ = await request()
return data_, indicator_, request_
async def main():
await asyncio.gather(chain())
if __name__ == '__main__':
start = time.perf_counter()
asyncio.run(main())
end = time.perf_counter()
print('process finished in {} seconds'.format(end - start))
Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library which makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself - you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.

Categories

Resources