Post-startup event in FastAPI? - python

I have a FastAPI app that serves an ML model. It is deployed on Kubernetes. As a best practice, Kubernetes recommends implementing a liveness endpoint in your API that it can probe to see that the application is running, as well as a readiness endpoint to probe to see when the application is ready to start receiving requests.
Currently, I have implemented both the liveness and readiness endpoints as a single endpoint, which returns a status code of 200 once the ML model has been loaded and the endpoints are available for requests.
This is OK, but ideally I would like the liveness endpoint to return 200 once FastAPI's startup has completed, and the readiness endpoint to return 200 once the models have been loaded (which takes much longer than the application startup).
FastAPI allows for startup event triggers where I could initiate the loading of the model, but no endpoints become available until the application startup is complete, and startup is not complete until the startup events have finished.
Is there any way to implement a "post-startup" event in FastAPI where I could initiate the loading of the model?
Here is some simple example code for what I would like to achieve:
from fastapi import FastAPI, Response
from request import PredictionRequest
import model

app = FastAPI()

@app.on_event("post-startup")  # not possible
def load_model():
    model.load()

@app.get("/live")
def is_live():
    return Response(status_code=200)

@app.get("/ready")
def is_ready():
    if model.is_loaded():
        return Response(status_code=200)
    else:
        return Response(status_code=409)

@app.post('/predict')
def predict(request: PredictionRequest):
    return model.predict(request)

At the moment there are only two events: "shutdown" and "startup".
These are part of the ASGI lifespan protocol; they are implemented by Starlette and available in FastAPI.
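That said, you can approximate a post-startup hook: return from the regular startup handler immediately and do the slow loading in a background thread, so the app starts serving /live and /ready right away. A minimal sketch, assuming the question's model module and that model.load() is safe to run in a thread:

import threading

from fastapi import FastAPI, Response

import model  # the hypothetical model module from the question

app = FastAPI()

@app.on_event("startup")
def schedule_model_load():
    # The handler returns immediately, so FastAPI finishes startup and the
    # endpoints below become available while the model loads in the background.
    threading.Thread(target=model.load, daemon=True).start()

@app.get("/live")
def is_live():
    return Response(status_code=200)

@app.get("/ready")
def is_ready():
    # Ready only once the background load has completed.
    if model.is_loaded():
        return Response(status_code=200)
    return Response(status_code=409)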

Related

On startup of uvicorn execute script and cache the data

I am using fastapi to build a RESTful web service. To run the service, I am using uvicorn. On startup of uvicorn, I want to execute my Python script, which will make a database call and cache some data, so that it can be reused for as long as uvicorn is running. I looked through the uvicorn documentation but did not find any reference to this.
Is there any way to achieve my requirement?
Use the FastAPI startup event. From FastAPI docs:
from fastapi import FastAPI

app = FastAPI()

items = {}

@app.on_event("startup")
async def startup_event():
    items["foo"] = {"name": "Fighters"}
    items["bar"] = {"name": "Tenders"}
In this case, the startup event handler function will initialize the items "database" (just a dict) with some values.
You can add more than one event handler function.
And your application won't start receiving requests until all the startup event handlers have completed.
(This feature is actually implemented in Starlette, the ASGI framework that FastAPI is built on.)
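Applied to the question, a minimal sketch of the same pattern; the database helper below is hypothetical, substitute whatever client you actually use:

from fastapi import FastAPI

import my_database  # hypothetical module wrapping your real database client

app = FastAPI()
cache = {}

@app.on_event("startup")
def warm_cache():
    # Runs once, before uvicorn starts accepting requests; the cached data
    # stays in memory for as long as the uvicorn process lives.
    for key, value in my_database.fetch_config_rows():
        cache[key] = value

@app.get("/config/{key}")
def get_config(key: str):
    return {key: cache.get(key)}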

How to let a single Cloud Run instance (with Gunicorn sync workers) handle multiple POST requests from Pub/Sub?

Scenario
I'm trying to deploy an application written in Python that parses e-mail files into images, on Google Cloud Run with Flask and gunicorn. The Cloud Run instance is triggered by POST requests from a Pub/Sub topic. I'm trying to get my Cloud Run instances to handle multiple requests by using multiple sync workers with gunicorn, but I cannot seem to achieve this. I've been browsing SO and Google for the last few hours and I just cannot figure it out. I feel like I'm missing something extremely simple.
This is what my pipeline should look like:
A new e-mail is placed in the Storage bucket.
Storage sends a notification to the Pub/Sub topic.
Pub/Sub performs a POST request to the HTTPS endpoint of the Cloud Run instance.
Cloud Run processes the e-mail into images and saves the results in another Storage bucket.
Setup
I have configured the Cloud Run service with --concurrency=3 and --max-instances=10. Pub/Sub uses an --ack-deadline of 10 minutes (600 seconds). This is how my Cloud Run instances are started:
CMD exec gunicorn --bind :$PORT main:app --workers 3 --timeout 0
My (simplified) main.py:
import os
import json
import base64
import traceback
from flask import Flask, request
from flask_cors import CORS
from flask_sslify import SSLify
from src.utils.data_utils import images_from_email
app = Flask(__name__)
CORS(app, supports_credentials=True)
sslify = SSLify(app)
#app.route("/", methods=['GET', 'POST'])
def preprocess_emails():
envelope = request.get_json()
data = json.loads(base64.b64decode(pubsub_message["data"]).decode())
try:
# function that processes email referenced in pubsub message to images
fn, num_files, img_bucket, processed_eml_bucket = images_from_email(data)
# here I do some logging
return "", 204
except Exception as e:
traceback.print_exception(type(e), e, e.__traceback__)
return "", 500
return "", 500
if __name__ == "__main__":
app.run(ssl_context="adhoc", host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
Problem
With the above setup, I would expect that my Cloud Run service can handle 30 requests at a time. Each instance should be able to handle 3 requests (1 for each gunicorn worker) and a maximum of 10 instances can spawn. What actually happens is the following:
As soon as I throw 15 new emails in my storage bucket, 15 POST requests are made to my Cloud Run endpoint via Pub/Sub. But instead of spawning 5 Cloud Run instances each handling 3 requests, Cloud Run instantly tries to spawn an instance (with 3 workers) for every (!) request, where it seems each of the 3 workers is processing the same request. This eventually results in HTTP 429 errors "The request was aborted because there was no available instance". I'm also noticing in the logs that some of the email files are being processed by multiple Cloud Run instances at the same time. What am I doing wrong? Does this have something to do with having to enable multiprocessing in my Python code somehow or is this gunicorn/Cloud Run/PubSub related?
The lifetime of a Google Cloud Run instance begins with an HTTP request, in your case a POST, and ends when the request returns a response. No background processing is supported.
This is why there is no point to that design. Cloud Run is an HTTP request/response system. The GFE (Global Front End) sitting in front of Cloud Run determines which instance receives a request (the current instance or another instance).
The Cloud Run design is one-to-N requests per instance. You do not need to do anything to receive N requests other than set the correct command line option for concurrency and not overload the instance. In the simplest view, do nothing: your instance will receive multiple requests if set to do so (the default is 80 requests per instance).
Code a function that handles a request and returns a response. That is all that is required for your example. Worker threads will just waste resources and be terminated, as there is no background processing between requests.
Remember, there is a Global Front End (GFE) in front of Cloud Run that handles proxying, routing requests, etc. Your code does not make those decisions.
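For reference, Google's Cloud Run Python samples typically start gunicorn with a single worker process and a handful of threads, letting one instance serve its concurrent requests without extra processes. A sketch; the thread count is a starting point to tune against your workload and memory limit:

# Dockerfile: one worker process, threads provide in-instance concurrency
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 main:app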

How can we create an asynchronous API in Django?

I want to create a third-party chatbot API which is asynchronous and replies "ok" after a 10 second pause.
import time

def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    return wait()
I have tried celery for the same, as follows, where I am waiting for the celery response in the view itself:

import time
from celery import shared_task
from celery.result import AsyncResult

@shared_task
def wait():
    time.sleep(10)
    return "ok"

# views.py
def api(request):
    a = wait.delay()
    work = AsyncResult(a.id)
    while True:
        if work.ready():
            return work.get(timeout=1)
But this solution works synchronously and makes no difference. How can we make it asynchronous without asking our user to keep on requesting until the result is received?
As mentioned in @Blusky's answer: the asynchronous API will exist in Django 3.x, not before.
If this is not an option, then the answer is just no.
Please note as well that even with Django 3.x, any Django code that accesses the database will not be asynchronous; it has to be executed in a thread (thread pool).
Celery is intended for background tasks or deferred tasks, but celery will never return an HTTP response, as it didn't receive the HTTP request to which it should respond. Celery is also not asyncio friendly.
You might have to think about changing your architecture / implementation.
Look at your overall problem and ask yourself whether you really need an asynchronous API with Django.
Is this API intended for browser applications or for machine-to-machine applications?
Could your clients use web sockets and wait for the answer?
Could you separate the blocking and non-blocking parts on your server side?
Use Django for everything non-blocking and everything periodic / deferred (Django + celery), and implement the asynchronous part with web server plugins, Python ASGI code, or web sockets.
Some ideas:

1. Use Django + nginx nchan (if your web server is nginx).
Link to nchan: https://nchan.io/
Your API call would create a task id, start a celery task, and immediately return the task id or a polling URL. The polling URL would be handled, for example, via an nchan long-polling channel: your client connects to the URL corresponding to that channel, and celery unblocks it whenever your task is finished (the 10 s are over).

2. Use Django + an ASGI server + one hand-coded view, with a strategy similar to nginx nchan.
Same logic as above, but you don't use nginx nchan; you implement it yourself.

3. Use an ASGI server + a non-blocking framework (or just some hand-coded ASGI views) for all blocking URLs, and Django for the rest.
They might exchange data via the database, local files, or local HTTP requests.

4. Just stay blocking and throw enough worker processes / threads at your server.
This is probably the worst suggestion, but if it is just for personal use and you know how many requests you will have in parallel, then just make sure you have enough Django workers so that you can afford to be blocking. In this case you would block an entire Django worker for each slow request.

5. Use web sockets, e.g. with the channels module for Django (a minimal consumer sketch follows below).
Web sockets can be implemented with earlier versions of Django (>= 2.2) via the django channels module (pip install channels) ( https://github.com/django/channels ). You need an ASGI server to serve the asynchronous part; you could use, for example, Daphne or uvicorn (the channels documentation explains this rather well).
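For idea 5, a minimal sketch of a channels consumer that waits the 10 seconds from the question without blocking other connections; the consumer name is hypothetical and you still need to wire it into your ASGI routing:

import asyncio

from channels.generic.websocket import AsyncWebsocketConsumer

class ChatbotConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        # Simulate the slow chatbot; the event loop keeps serving
        # other connections during the sleep.
        await asyncio.sleep(10)
        await self.send(text_data="ok")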
Addendum 2020-06-01: a simple async example calling synchronous Django code
The following code uses the starlette module, as it seems quite simple and small:
miniasyncio.py
import asyncio
import concurrent.futures
import os

import django
from starlette.applications import Starlette
from starlette.responses import Response
from starlette.routing import Route

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'pjt.settings')
django.setup()

from django_app.xxx import synchronous_func1
from django_app.xxx import synchronous_func2

executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

async def simple_slow(request):
    """ simple function, that sleeps in an async manner """
    await asyncio.sleep(5)
    return Response('hello world')

async def call_slow_dj_funcs(request):
    """ slow django code will be called in a thread pool """
    loop = asyncio.get_running_loop()
    # run the synchronous Django functions in the thread pool so the
    # event loop is not blocked while they execute
    func1_result = await loop.run_in_executor(executor, synchronous_func1)
    func2_result = await loop.run_in_executor(executor, synchronous_func2)
    response_txt = "OK"
    return Response(response_txt, media_type="text/plain")

routes = [
    Route("/simple", endpoint=simple_slow),
    Route("/slow_dj_funcs", endpoint=call_slow_dj_funcs),
]

app = Starlette(debug=True, routes=routes)
You could, for example, run this code with:
pip install uvicorn
uvicorn --port 8002 miniasyncio:app
Then, on your web server, route these specific URLs to uvicorn and not to your Django application server.
The best option is to use the future async API, which will be introduced in the Django 3.1 release (already available in alpha):
https://docs.djangoproject.com/en/dev/releases/3.1/#asynchronous-views-and-middleware-support
(However, you will need to use an ASGI web worker to make it work properly.)
Check out Django 3's ASGI (Asynchronous Server Gateway Interface) support:
https://docs.djangoproject.com/en/3.0/howto/deployment/asgi/

Setup custom request context with flask

I have a complex service that runs Flask queries asynchronously. The Flask app accepts requests, submits them to a queue, and returns a handle to the caller. An async service then picks up these requests, runs them, and submits the response to a data store. The caller continuously polls the Flask endpoint to check whether the data is available. Currently, this asynchronous feature is only available for a single Flask endpoint, but I want to extend it to multiple endpoints. As such, I am putting the code that submits the request to the queue into a Python decorator, so that the decorator can be applied to any Flask endpoint to give it this asynchronous feature.
But to achieve this seamlessly, I need to set up a custom request context for Flask. This is because the Flask endpoints use request.args, request.json, and jsonify from Flask, and the async service just calls the functions associated with the Flask endpoints.
I tried using app.test_request_context(), but this doesn't allow me to assign to request.json:

with app.test_request_context() as req:
    req.request.json = json.dumps(args)

The above doesn't work and throws the error below:

AttributeError: can't set attribute
How can I achieve this?
The answer is:

from werkzeug.test import EnvironBuilder
import urllib.parse
import json

builder = EnvironBuilder(path='/',
                         query_string=urllib.parse.urlencode(query_options),
                         method='POST',
                         data=json.dumps(post_payload),
                         content_type="application/json")
env = builder.get_environ()
with app.request_context(env):
    func_to_call(*args, **kwargs)
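With the environ built this way, the JSON you pass as data becomes the request body, so request.get_json() (and request.json) parse it for you; there is no need to assign to request.json directly.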

Python flask async action in view

I have a Flask application written in Python 3.6.
On registration, a user is created and a synchronization with an external service is performed (takes ~5 s).
class Register(Resource):
    def post(self):
        submitted_data = flask.request.get_json(force=True)
        user = User.register(submitted_data)
        # long-running API calls
        user.synchronize_with_external_service()
        return {}, 201
User is a SQLAlchemy model. During synchronize_with_external_service, an external service is called and some fields in the db are updated. Because this synchronization takes a long time, I would like to do it asynchronously and respond to the user immediately. I have thought about celery, asyncio, or multi-threading to do it, but I would like to know which option is best.
The application is set up on AWS Elastic Beanstalk (load balancing, 2 instances, multi-docker).
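One common approach, consistent with the celery advice earlier on this page, is to enqueue the slow call and respond immediately. A sketch; it assumes a configured Celery app and that the synchronization can safely run in a worker process:

# tasks.py (hypothetical module)
from celery import shared_task

@shared_task
def synchronize_user(user_id):
    # Re-fetch the user inside the worker; SQLAlchemy objects should
    # not be shared across processes.
    user = User.query.get(user_id)
    user.synchronize_with_external_service()

# views.py
class Register(Resource):
    def post(self):
        submitted_data = flask.request.get_json(force=True)
        user = User.register(submitted_data)
        # enqueue the long-running call and return without waiting for it
        synchronize_user.delay(user.id)
        return {}, 201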
