Optimal way to initialize heavy services only once in FastAPI - python

The FastAPI application I started working on uses several services which I want to initialize only once, when the application starts, and then use the methods of these objects in different places.
It can be a cloud service or any other heavy class.
Possible ways to do it are lazy loading and the singleton pattern, but I am looking for a better approach for FastAPI.
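For reference, the lazy-loading/singleton idea can be sketched with just the standard library: functools.lru_cache turns an ordinary factory function into a lazy singleton (HeavyService here is a made-up stand-in for the real heavy client):

```python
import time
from functools import lru_cache


class HeavyService:
    """Stand-in for an expensive client (cloud SDK, DB pool, ...)."""
    def __init__(self):
        time.sleep(0.1)  # simulate slow construction
        self.ready = True


@lru_cache(maxsize=None)
def get_heavy_service() -> HeavyService:
    # The first call constructs the object; later calls return the cached one.
    return HeavyService()
```

Because lru_cache memoizes the zero-argument call, every caller (including Depends(get_heavy_service) in a route) receives the same instance.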
Another possible way is to use the Depends class and cache it, but its usage only makes sense with route methods, not with other regular functions that are called from route methods.
Example:
async def common_parameters(q: Optional[str] = None, skip: int = 0, limit: int = 100):
    return {"q": q, "skip": skip, "limit": limit}

async def non_route_function(commons: dict = Depends(common_parameters)):
    print(commons)  # returns `Depends(common_parameters)`

@router.get('/test')
async def test_endpoint(commons: dict = Depends(common_parameters)):
    print(commons)  # returns correct dict
    await non_route_function()
    return {'success': True}
There is also the @app.on_event("startup") event, where a heavy class could be initialized, but I have no idea how to make this initialized object accessible from every place without using a singleton.
Another ugly way is to save the initialized objects onto the app instance and then get this app from the request, but then you have to pass the request into each non-route function.
All of the ways I have described are either ugly, inconvenient, non-Pythonic or bad practice. We also don't have thread locals and proxy objects here like in Flask, so what is the best approach for the kind of problem I have described above?
Thanks!

It's usually a good idea to initialize the heavy objects before launching the FastAPI application. That way you're done with initialization when the application starts listening for connections (and is made available by the load balancer).
You can set up these dependencies and do any initialization in the same place you set up your app and main routers, since they're a part of the application as well. I usually expose the heavy object through a lightweight service that exposes useful methods to the controllers, and the service object is then injected through Depends.
Exactly how you want to perform the initialization will depend on what other requirements you have in the application - for example if you're planning to re-use the infrastructure in cli tools or use them in cron as well.
This is the way I've been doing it in a few projects, and so far it has worked out fine and kept code changes located in their own vicinities.
Simulated heavy class in heavylifting/heavy.py with from .heavy import HeavyLifter in __init__.py:
import time


class HeavyLifter:
    def __init__(self, initial):
        self.initial = initial
        time.sleep(self.initial)

    def do_stuff(self):
        return 'we did stuff'
A skeleton project created in a module named foo (heavylifting lives under foo/heavylifting for now to make sense of the imports below):
foo/app.py
from fastapi import FastAPI, APIRouter
from .heavylifting import HeavyLifter
heavy = HeavyLifter(initial=3)
from .views import api_router
app = FastAPI()
app.include_router(api_router)
foo/services.py
The service layer in the application; the services are the operations and services that the application exposes to controllers, handling business logic and other related activities. If a service needs access to heavy, it adds a Depends requirement on that service.
class HeavyService:
    def __init__(self, heavy):
        self.heavy = heavy

    def operation_that_requires_heavy(self):
        return self.heavy.do_stuff()


class OtherService:
    def __init__(self, heavy_service: HeavyService):
        self.heavy_service = heavy_service

    def other_operation(self):
        return self.heavy_service.operation_that_requires_heavy()
foo/app_services.py
This exposes the services defined above to the application as lightweight dependency injections. Since the services only attach their dependencies and get returned, they're quickly created for a request and then discarded afterwards.
from fastapi import Depends

from .app import heavy
from .services import HeavyService, OtherService


async def get_heavy_service():
    return HeavyService(heavy=heavy)


async def get_other_service_that_uses_heavy(heavy_service: HeavyService = Depends(get_heavy_service)):
    return OtherService(heavy_service=heavy_service)
foo/views.py
Example of an exposed endpoint to make FastAPI actually serve something and test the whole service + heavy chain:
from fastapi import APIRouter, Depends

from .services import OtherService
from .app_services import get_other_service_that_uses_heavy

api_router = APIRouter()


@api_router.get('/')
async def index(other_service: OtherService = Depends(get_other_service_that_uses_heavy)):
    return {'hello world': other_service.other_operation()}
main.py
The application entrypoint. Could live in app.py as well.
from foo.app import app

if __name__ == '__main__':
    import uvicorn
    uvicorn.run('foo.app:app', host='0.0.0.0', port=7272, reload=True)
This way the heavy client gets initialized on startup, and uvicorn starts serving requests when everything is live. Depending on how the heavy client is implemented it might need to pool and recreate sockets if they can get disconnected for inactivity (as most database libraries offer).
I'm not sure if the example is easy enough to follow, or that if it serves what you need, but hopefully it'll at least get you a bit further.


Get flow run UUID in Prefect 2.0

I'm currently discovering Prefect and I'm trying to deploy it to schedule workflows. I struggle a bit to understand how to access some data, though. Here is my problem: I create a deployment and run it via the Python API, and I need the ID of the flow run it creates (to cancel it, among other things that happen outside of the flow).
When I run without any scheduling I can access the data I need (the flow run UUID), but I kind of want the scheduling part. It may be because the run_deployment function is asynchronous but as I am nowhere near being an expert in Python I don't know for sure (well that, and the fact that my code never exits after calling the main() function).
Here is what my code looks like:
import dateutil.parser

from prefect import flow, task
from prefect.deployments import Deployment, run_deployment
from datetime import datetime, date, time, timezone

# Import the flow:
from script import my_flow

# Configure the deployment:
deployment_name = "my_deployment"

# Create the deployment for the flow:
deployment = Deployment.build_from_flow(
    flow=my_flow,
    name=deployment_name,
    version=1,
    work_queue_name="my_queue",
)
deployment.apply()

def main():
    # Schedule a flow run based on the deployment:
    response = run_deployment(
        name="my_flow/" + deployment_name,
        parameters={my_param},
        scheduled_time=dateutil.parser.isoparse(scheduledDate),
        flow_run_name="my_run",
    )
    print(response)

if __name__ == "__main__":
    main()
    exit()
exit()
I searched a bit and saw in that post that it was possible to print the flow run ID as it was executed, but in my case I need it before the execution.
Is there any way to get that data (using the Python API)? Or to set the flow run ID myself? (I've already thoroughly checked the docs; I'm quite sure this is not possible.)
Thanks a lot for your time!
Gauthier
As of 2.7.12 - released the same day you posted your question - you can create names for flows programmatically. Does that get you what you need?
Both tasks and flows now expose a mechanism for customizing the names of runs! This new keyword argument (flow_run_name for flows, task_run_name for tasks) accepts a string that will be used to create a run name for each run of the function. The most basic usage is as follows:
from datetime import datetime
from prefect import flow, task


@task(task_run_name="custom-static-name")
def my_task(name):
    print(f"hi {name}")


@flow(flow_run_name="custom-but-fixed-name")
def my_flow(name: str, date: datetime):
    return my_task(name)


my_flow()
This is great, but doesn’t help distinguish between multiple runs of the same task or flow. In order to make these names dynamic, you can template them using the parameter names of the task or flow function, using all of the basic rules of Python string formatting as follows:
from datetime import datetime
from prefect import flow, task


@task(task_run_name="{name}")
def my_task(name):
    print(f"hi {name}")


@flow(flow_run_name="{name}-on-{date:%A}")
def my_flow(name: str, date: datetime):
    return my_task(name)


my_flow()
See the docs or https://github.com/PrefectHQ/prefect/pull/8378 for more details.
run_deployment returns a flow run object - which you named response in your code.
If you want to get the ID before the flow run is actually executed, you just have to set timeout=0, so that run_deployment will return immediately after submission.
You only have to do:
flow_run = run_deployment(
    name="my_flow/" + deployment_name,
    parameters={my_param},
    scheduled_time=dateutil.parser.isoparse(scheduledDate),
    flow_run_name="my_run",
    timeout=0,
)
print(flow_run.id)

Connect Function App to CosmosDB with Managed Identity

I'm trying to write a function in a Function App that manipulates data in a CosmosDB. I get it working if I drop the read-write key in the environment variables. To make it more robust I wanted it to work as a managed identity app. The app has the role 'DocumentDB Account Contributor' on the Cosmos DB.
However, the CosmosClient constructor doesn't accept a Credential and needs the read-write key. I've been chasing down the rabbit hole of azure.mgmt.cosmosdb.operations where there is a DatabaseAccountsOperations class with a list_keys() method. I can't find a neat way to access that function though. If I try to create that object (which requires poaching the config, serializer and deserializer from my dbmgmt object) it still requires the resourceGroupName and accountName.
I can't help but think that I've taken a wrong turn somewhere because this has to be possible in a more straightforward manner. Especially given that the JavaScript SDK references a more logical class CosmosDBManagementClient in line with the SubscriptionClient. However, I can't find that class anywhere on the python side.
Any pointers?
import os

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient
from azure.mgmt.resource import SubscriptionClient
from azure.mgmt.cosmosdb import CosmosDB

from .cred_wrapper import CredentialWrapper


def main(req: func.HttpRequest) -> func.HttpResponse:
    request_body = req.get_body()

    # credential = DefaultAzureCredential()
    # https://gist.github.com/lmazuel/cc683d82ea1d7b40208de7c9fc8de59d
    credential = CredentialWrapper()
    uri = os.environ.get('cosmos-db-uri')

    # db = CosmosClient(url=uri, credential=credential)  # Doesn't work, wants a credential that is a RW/R key.
    # Does work if I replace it with my primary/secondary key, but the goal is to remove dependence on that.

    subscription_client = SubscriptionClient(credential)
    subscription = next(subscription_client.subscriptions.list())
    dbmgmt = CosmosDB(credential, subscription.subscription_id)  # This doesn't accept the DB URI??
    operations = list(dbmgmt.operations.list())  # I see the list_keys() operation there...
EDIT
A helpful soul provided a response here but removed it before I could even react or accept it as the answer. They pointed out that there is an equivalent python SDK and that from azure.mgmt.cosmosdb import CosmosDBManagementClient would do the trick.
From there, I was on my own as that resulted in
ImportError: cannot import name 'CosmosDBManagementClient' from 'azure.mgmt.cosmosdb'
I believe the root of the problem lies in an incompatibility of the package azure-mgmt. After removing azure-mgmt from my requirements.txt and only loading the cosmos and identity related packages, the import error was resolved.
This solved 90% of the problem.
dbmgmt = CosmosDBManagementClient(credential, subscription.subscription_id, c_uri)
print(dbmgmt.database_accounts.list_keys())
TypeError: list_keys() missing 2 required positional arguments: 'resource_group_name' and 'account_name'
Does one really need to collect each of these parameters? Compared to the example that reads a secret from a Vault it seems so convoluted.
For other unfortunate ones looking to access CosmosDB with Managed Identity, it seems that this is, as of May 2021, not yet possible.
Source: Discussion on Github
Update 12/05/2021 - I came here finding a solution for this with Javascript/Typescript. So leaving the answer here for others. I think that a similar approach could work for Python.
You can use RBAC for data plane operations with Managed Identities. Finding the documentation was difficult.
RBAC for Cosmos DB data plane operations with Managed Identities
Important - If you get the error Request blocked by Auth mydb : Request is blocked because principal [xxxxxx-6fad-44e4-98bc-2d423a88b65f] does not have required RBAC permissions to perform action Microsoft.DocumentDB/databaseAccounts/readMetadata on resource [/]. Don't use the Portal to assign roles, use the Azure CLI for CosmosDB.
How to - creating a role assignment for a user/system MSI/user MSI is done using the Azure CosmosDB CLI
# Find the role ID:
resourceGroupName='<myResourceGroup>'
accountName='<myCosmosAccount>'
az cosmosdb sql role definition list --account-name $accountName --resource-group $resourceGroupName

# Assign to the MSI or user managed MSI:
readOnlyRoleDefinitionId='<roleDefinitionId>'  # as fetched above
principalId='<aadPrincipalId>'
az cosmosdb sql role assignment create --account-name $accountName --resource-group $resourceGroupName --scope "/" --principal-id $principalId --role-definition-id $readOnlyRoleDefinitionId
Once this step is done, the code for connecting is very easy. Use the @azure/identity package's DefaultAzureCredential. This works in an Azure Function App with managed identity and on your laptop with VS Code or with az login.
Docs for the @azure/identity SDK
Examples of authentication with @azure/identity to get the credential object
import { CosmosClient } from "@azure/cosmos";
import { DefaultAzureCredential, ManagedIdentityCredential, ChainedTokenCredential } from "@azure/identity";

const defaultCredentials = new DefaultAzureCredential();
const managedCredentials = new ManagedIdentityCredential();
const aadCredentials = new ChainedTokenCredential(managedCredentials, defaultCredentials);

const client = new CosmosClient({
    endpoint: "https://mydb.documents.azure.com:443/",
    aadCredentials
});

Python Tornado respond to GET request

I am misunderstanding something very basic probably. I am new to tornado and web servers in general. I used some tutorials and a lot of googling to get started but I still find myself stuck at the basics.
The Situation
I am using python 3.6.9 on an Ubuntu 18.04 server with tornado 6.0.4.
I have a tornado server that accepts GET requests via a tornado.web.RequestHandler class's get() method and does some computation on it. This all works properly.
I need the tornado server to return the results (a numpy array) to the client that sent the request.
To my knowledge everything I am doing is synchronous as I did not add any async code myself.
My code in a nutshell:
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        base_data = self.get_argument("base_data")
        compute_data(base_data)
        # Here I want to return the data back to the client

application = tornado.web.Application(handlers=[(r"/calculator", MainHandler)])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(__PORT__)
    tornado.ioloop.IOLoop.instance().start()
The problem
I do not have info about the client.
I do not have any idea and cannot find any tutorial explaining how to respond back to a client from a GET request.
What I tried
I tried simply returning the np.array at the end of my get() function but I got:
TypeError: object numpy.ndarray can't be used in 'await' expression
I thought what I need to do is make a POST request back to the client, but I do not (that I know of) have the IP and port of the client.
I also randomly found that maybe I should use tornado.ioloop.IOLoop.current().spawn_callback(data), but that wasn't right, I guess, because it asked me for a callable function.
What I want to happen
I want to send back the computed data to the client that requested it.
Thanks in advance for any help available. I know I am probably misunderstanding the very basics of what tornado is meant to do or how it works, but I can't find any place addressing this question specifically.
See official documentation:
Many methods in RequestHandler are designed to be overridden in
subclasses and be used throughout the application. It is common to
define a BaseHandler class that overrides methods such as write_error
and get_current_user and then subclass your own BaseHandler instead of
RequestHandler for all your specific handlers.
So in your example it is also possible to write a write_response method that could make it easier to write responses in MainHandler as well as in other handlers.
See a simple example:
from tornado.web import RequestHandler
from http import HTTPStatus
import json


class BaseHandler(RequestHandler):
    def write_response(self, status_code, result=None, message=None):
        self.set_status(status_code)
        if result:
            self.finish(json.dumps(result))
        elif message:
            self.finish(json.dumps({
                "message": message
            }))
        elif status_code:
            self.finish()


class MainHandler(BaseHandler):
    def get(self):
        self.write_response(status_code=HTTPStatus.OK, message='Hello calculator!')
If the data you return to the client is in the form below, then use write_response with the result argument
data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
self.write_response(status_code=HTTPStatus.OK, result=data)
# and so you will send to the client:
# ["foo", {"bar": ["baz", null, 1.0, 2]}]

# or
your_numpy_list = your_numpy_object.tolist()
self.write_response(status_code=HTTPStatus.OK, result=your_numpy_list)
So I was missing the most basic thing.
Apparently, in Tornado, self.write({"data_name": data}) in the get() function will return the data.
Now I am still running into an issue of not being able to return byte data (my circumstances have changed and now I need to turn the numpy array into a wav file and send the wav file over), and I am getting a different error, Object of type 'bytes' is not JSON serializable, but if I can't figure it out I will open a new question for it.
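On that follow-up error: json can't serialize raw bytes, so a common workaround (a stdlib-only sketch; the wav bytes below are a made-up placeholder) is to base64-encode the payload before writing it and decode it again on the client:

```python
import base64
import json

wav_bytes = b"RIFF....WAVEfmt "  # placeholder for real wav file contents

# Server side: wrap the bytes in a JSON-safe ASCII string
payload = json.dumps({"wav": base64.b64encode(wav_bytes).decode("ascii")})

# Client side: recover the original bytes
recovered = base64.b64decode(json.loads(payload)["wav"])
assert recovered == wav_bytes
```

Alternatively, for binary responses it is often simpler to skip JSON entirely and self.write(wav_bytes) with a suitable Content-Type header such as audio/wav.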

How to interact with the UI when testing an application written by kivy?

The application is written by kivy.
I want to test a function via pytest, but in order to test that function I need to initialize the object first. However, the object needs something from the UI when initializing, and since I am at the testing phase I don't know how to retrieve anything from the UI.
This is the class which has an error and has been handled
class SaltConfig(GridLayout):
    def check_phone_number_on_first_contact(self, button):
        s = self.instanciate_ServerMsg(tt)
        try:
            s.send()
        except HTTPError as err:
            print("[HTTPError] : " + str(err.code))
            return
        # some code when running without error

    def instanciate_ServerMsg(self, tt):
        return ServerMsg()
This is the helper class which generates the ServerMsg object used by the former class.
class ServerMsg(OrderedDict):
    def send(self, answerCallback=None):
        # send something to server via urllib.urlopen
        pass
This is my test code:
class TestSaltConfig:
    def test_check_phone_number_on_first_contact(self):
        myError = HTTPError(url="http://127.0.0.1", code=500,
                            msg="HTTP Error Occurs", hdrs="donotknow", fp=None)
        mockServerMsg = mock.Mock(spec=ServerMsg)
        mockServerMsg.send.side_effect = myError

        sc = SaltConfig(ds_config_file_missing.data_store)

        def mockreturn():
            return mockServerMsg

        monkeypatch.setattr(sc, 'instanciate_ServerMsg', mockreturn)
        sc.check_phone_number_on_first_contact()
I can't initialize the object; it throws an AttributeError when initializing since it needs some values from the UI.
So I got stuck.
I tried to mock the object and then patch the function back to the original one, but that won't work either, since the function itself has logic related to the UI.
How can I solve this? Thanks
I made an article about testing Kivy apps together with a simple runner - KivyUnitTest. It works with unittest, not with pytest, but it shouldn't be hard to rewrite it, so that it fits your needs. In the article I explain how to "penetrate" the main loop of UI and this way you can happily go and do with button this:
button = <button you found in widget tree>
button.dispatch('on_release')
and many more. Basically you can do anything with such a test and you don't need to test each function independently. I mean... it's a good practice, but sometimes (mainly when testing UI), you can't just rip the thing out and put it into a nice 50-line test.
This way you can do exactly the same thing as a casual user would do when using your app and therefore you can even catch issues you'd have trouble with when testing the casual way e.g. some weird/unexpected user behavior.
Here's the skeleton:
import unittest
import os
import sys
import time
import os.path as op
from functools import partial

from kivy.clock import Clock

# when you have a test in <root>/tests/test.py
main_path = op.dirname(op.dirname(op.abspath(__file__)))
sys.path.append(main_path)

from main import My


class Test(unittest.TestCase):
    def pause(*args):
        time.sleep(0.000001)

    # main test function
    def run_test(self, app, *args):
        Clock.schedule_interval(self.pause, 0.000001)

        # Do something

        # Comment out if you are editing the test; it'll leave the
        # Window opened.
        app.stop()

    def test_example(self):
        app = My()
        p = partial(self.run_test, app)
        Clock.schedule_once(p, 0.000001)
        app.run()


if __name__ == '__main__':
    unittest.main()
However, as Tomas said, you should separate UI and logic when possible, or better said, when it's an efficient thing to do. You don't want to mock your whole big application just to test a single function that requires communication with UI.
Finally made it. This just gets things done; I think there must be a more elegant solution. The idea is simple, given that all lines are simply value assignments except the s.send() statement.
We just mock the original object: every time an error pops up in the testing phase (since the object lacks some values from the UI), we mock that value, and repeat this step until the test method can finally check whether the function handles the HTTPError or not.
In this example we only need to mock a PhoneNumber class, which is lucky, but sometimes we may need to handle more, so obviously @KeyWeeUsr's answer is a more ideal choice for a production environment. I just list my thinking here for somebody who wants a quick solution.
@pytest.fixture
def myHTTPError(request):
    """
    Generate an HTTPError with the pass-in parameters
    from pytest_generate_tests(metafunc).
    """
    httpError = HTTPError(url="http://127.0.0.1", code=request.param,
                          msg="HTTP Error Occurs", hdrs="donotknow", fp=None)
    return httpError


class TestSaltConfig:
    def test_check_phone_number(self, myHTTPError, ds_config_file_missing):
        """
        Raise an HTTP 500 error, and invoke the original function with this error.
        Test to see if it can handle it; if it can't, the test will fail.
        The function is located in configs.py, line 211.
        This test will run 2 times with different HTTP status codes, 404 and 500.
        """
        # A setup class used to cover the runtime error,
        # since a Mock object can't fake properties created via __init__()
        class PhoneNumber:
            text = "610274598038"

        # Mock the ServerMsg class, and apply the custom
        # HTTPError to the send() method
        mockServerMsg = mock.Mock(spec=ServerMsg)
        mockServerMsg.send.side_effect = myHTTPError

        # Mock the SaltConfig class and change some of its
        # members to our custom ones
        mockSalt = mock.Mock(spec=SaltConfig)
        mockSalt.phoneNumber = PhoneNumber()
        mockSalt.instanciate_ServerMsg.return_value = mockServerMsg
        mockSalt.dataStore = ds_config_file_missing.data_store

        # Make check_phone_number_on_first_contact()
        # refer to the original function
        mockSalt.check_phone_number_on_first_contact = SaltConfig.check_phone_number_on_first_contact
        # Call the function to do the test
        mockSalt.check_phone_number_on_first_contact(mockSalt, "button")

making a cache for json which can be explicitly purged

I have many routes on blueprints that do something along these lines:
# charts.py
@charts.route('/pie')
def pie():
    # ...
    return jsonify(my_data)
The data comes from a CSV which is grabbed once every x hours by a script which is separate from the application. The application reads this using a class which is then bound to the blueprint.
# __init__.py
from flask import Blueprint

from helpers.csv_reader import CSVReader

chart_blueprint = Blueprint('chart_blueprint', __name__)
chart_blueprint.data = CSVReader('statistics.csv')

from . import charts
My goal is to cache several of the responses of the route, as the data does not change. However, the more challenging problem is to be able to explicitly purge the data on my fetch script finishing.
How would one go about this? I'm a bit lost but I do imagine I will need to register a before_request on my blueprints
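For the "explicitly purged" part, one minimal stdlib sketch (the function and data here are made up) is to put the cached read behind functools.lru_cache and have the fetch script's completion hook, or a before_request check, call cache_clear():

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_chart_data(name):
    # Stand-in for reading and parsing statistics.csv for one chart
    return {"chart": name, "rows": [1, 2, 3]}


def purge_cache():
    # Call this when the fetch script finishes writing new data
    load_chart_data.cache_clear()
```

Until purge_cache() runs, repeated requests for the same chart get the memoized object; after it runs, the next request re-reads the file.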
ETag and Expires were made for exactly this:
class CSVReader(object):
    def read_if_reloaded(self):
        # Do your thing here
        self.expires_on = self.calculate_next_load_time()
        self.checksum = self.calculate_current_checksum()

@charts.route('/pie')
def pie():
    if request.headers.get('If-None-Match') == charts.data.checksum:
        return ('', 304, {})
    # ...
    response = jsonify(my_data)
    response.headers['Expires'] = charts.data.expires_on
    response.headers['ETag'] = charts.data.checksum
    return response
(Note that clients send a cached ETag back in the If-None-Match request header; ETag itself is a response header.)
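The calculate_current_checksum method is left abstract above; a stdlib sketch (assuming the freshly loaded CSV contents are available as bytes, and with a made-up function name) could hash the file contents to get a stable ETag value:

```python
import hashlib


def calculate_checksum(data: bytes) -> str:
    # Hex digest of the current file contents; changes whenever the data does
    return hashlib.md5(data).hexdigest()


etag = calculate_checksum(b"col_a,col_b\n1,2\n")
```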
Sean's answer is great for clients that come back and request the same information before the next batch is read in, but it does not help for clients that are coming in cold.
For new clients you can use cache servers such as redis or memcached that can store the pre-calculated results. These servers are very simple key-value stores, but they are very fast. You can even set how long the values remain valid before they expire.
Cache servers help if calculating the result is time consuming or computationally expensive, but if you are simply returning items from a file it will not make a dramatic improvement.
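The expiry feature these key-value stores offer can be imitated in-process with a tiny TTL wrapper (a toy sketch for illustration, not a replacement for redis or memcached):

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy eviction on read
            return default
        return value
```

A redis SETEX call behaves much like cache.set here, with the server doing the eviction for you.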
Here is a Flask pattern for using the werkzeug cache interface, and here is a link to the Flask-Cache extension.
