I'm using FastAPI to serve ML models. My endpoint receives and sends JSON data of the form:
[
{"id": 1, "data": [{"code": "foo", "value": 0.1}, {"code": "bar", "value": 0.2}, ...]},
{"id": 2, "data": [{"code": "baz", "value": 0.3}, {"code": "foo", "value": 0.4}, ...]},
...
]
My models and app look as follows:
from typing import Dict, List

from fastapi import Body, FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel
import pandas as pd


class Item(BaseModel):
    code: str
    value: float


class Sample(BaseModel):
    id: int
    data: List[Item]


app = FastAPI()


@app.post("/score", response_model=List[Sample])  # correct response documentation
def score(input_data: List[Sample] = Body(...)):  # 1. conversion dict -> Pydantic models, slow
    input_df: pd.DataFrame = models_to_df(input_data)  # 2. conversion Pydantic models -> df
    output_df: pd.DataFrame = predict(input_df)
    output_data: Dict = df_to_dict(output_df)  # direct conversion df -> dict, fast
    return JSONResponse(output_data)
Everything works fine and the automated documentation looks good, but the performance is bad. Since the data can be quite large, Pydantic conversion and validation can take a lot of time.
This can easily be solved by writing direct conversion functions between JSON data and data frames, skipping the intermediary representation of Pydantic models. This is what I did for the response, achieving a 10x speedup, at the same time preserving the automated API documentation with the response_model=List[Sample] argument.
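For reference, models_to_df and df_to_dict are not shown above; a minimal sketch of what the fast response-side conversion could look like, assuming the prediction frame has flat id/code/value columns, is:

import pandas as pd

def df_to_dict(df: pd.DataFrame) -> list:
    # regroup the flat prediction frame into the nested response shape:
    # [{"id": ..., "data": [{"code": ..., "value": ...}, ...]}, ...]
    return [
        {"id": sample_id, "data": group[["code", "value"]].to_dict(orient="records")}
        for sample_id, group in df.groupby("id")
    ]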
I would like to achieve the same with the request: being able to use custom JSON input parsing, while at the same time preserving API documentation using Pydantic models. Sadly I can't find a way to do it in the FastAPI docs. How can I accomplish this?
You can always accept the raw request, load the request.body() data as bytes and do your own decoding. The schema of the request body should then be documented as a (partial) raw OpenAPI Operation structure via the openapi_extra argument to the @app.post() decorator:
from fastapi import Request  # the raw-request handler below needs this import

@app.post(
    "/score",
    response_model=List[Sample],
    openapi_extra={
        "requestBody": {
            "content": {
                "application/json": {
                    "schema": {
                        "type": "array",
                        "items": Sample.schema(ref_template="#/components/schemas/{model}"),
                    }
                }
            }
        }
    },
)
async def score(request: Request):
    raw_body = await request.body()
    # parse the `raw_body` request data (bytes) into your DF directly.
The openapi_extra structure is merged into the operation structure generated from other components (such as the response_model). I used your existing Sample model here to provide the schema for the array items, but you can also map out the whole schema manually.
Instead of using the raw bytes of the body, you could also delegate parsing as JSON to the request object:
data = await request.json()
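Putting this together with the question's own predict and df_to_dict helpers, a hedged sketch of the request side, assuming the flat id/code/value layout that pd.json_normalize produces from the sample payload, might look like:

import pandas as pd
from fastapi import Request
from fastapi.responses import JSONResponse

@app.post("/score", response_model=List[Sample])  # plus the openapi_extra shown above
async def score(request: Request):
    payload = await request.json()  # list of {"id": ..., "data": [{"code": ..., "value": ...}, ...]}
    # flatten the nested JSON straight into a DataFrame, skipping Pydantic entirely
    input_df = pd.json_normalize(payload, record_path="data", meta=["id"])
    output_df = predict(input_df)               # the question's existing model call
    return JSONResponse(df_to_dict(output_df))  # direct df -> dict conversion, as in the question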
If there is a way to parse the data as a stream (pushing chunks to a parser), you could avoid the memory overhead of loading the whole body at once by treating the request as a stream in an async loop:
parser = ...  # something that can be fed chunks of data
async for chunk in request.stream():
    parser.feed(chunk)
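For instance, assuming ijson's push (coroutine) interface is available — worth double-checking against the ijson documentation before relying on it — that could look roughly like this:

import ijson

async def score(request: Request):
    items = ijson.sendable_list()           # collects parsed top-level array elements
    coro = ijson.items_coro(items, "item")  # "item" prefix = each element of the top-level array
    async for chunk in request.stream():
        coro.send(chunk)                    # feed raw byte chunks as they arrive
    coro.close()
    # `items` now holds the decoded {"id": ..., "data": [...]} dicts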
This is documented in the Custom OpenAPI path operation schema section in the Advanced User Guide. The same section also covers Us[ing] the Request object directly, and the various options for handling the Request body can be found in the Starlette Request class documentation.
I am trying to figure out ways of validating query string parameters for an API created using AWS API Gateway and backed by a Python Lambda function. API Gateway can validate the presence of the required query string parameters. However, I could not find a way to perform additional validations, such as checking whether the length of a certain parameter is within some limit (e.g. config_id should be at least 7 characters long). Such validations are possible for the request body using API Gateway request validation (refer to this link). For the query string parameters, however, only required/not-required validation is possible, as no JSON schema is used to validate them.
Hence, to overcome this issue, I decided to try the webargs module in Python for validating the query string parameters. It is generally used for request validation in APIs built with Python frameworks such as Flask or Django. I am using the core parser (refer to the webargs docs) as follows:
from typing import Dict

from webargs import fields, validate, core, ValidationError

parser = core.Parser()
params = {"config_id": fields.Str(required=True, validate=lambda p: len(p) >= 7)}

def main(event, context: Dict):
    try:
        # print(event["queryStringParameters"])
        input_params = event.get("queryStringParameters")
        print("queryStringParameters: ", str(input_params))
        if input_params is None:
            input_params = {}
        parsed_params = parser.parse(params, input_params)
        print("parsedParams: ", str(parsed_params))
    except ValidationError as e:
        return {
            "statusCode": 400,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Credentials": True,
                "x-amzn-ErrorType": "ValidationError",
            },
            "body": str(e),
        }
This is how the validation is done in the lambda function. However, only the required validation works correctly. When I pass a config_id of length 5 it does not return any error and proceeds further in the lambda function.
What could be going wrong with this? The parser seems to work, however, the validate function doesn't.
Any help is appreciated as I am new to this. Also, is there a better way of doing validations in lambda functions especially for queryStringParameters? It can be handled by the code, but we can have many parameters and many APIs which makes writing code for all such validations a cumbersome task. The webargs module comes in handy.
The webargs library is mostly used for validating HTTP requests coming in via popular Python frameworks like Flask, Django, Bottle, etc. The core Parser that you are trying to use should not be used directly, as it does not have methods like load_json, load_query, etc. implemented (source code showing the missing implementation here). There are child class implementations of the core parser for each of the frameworks, but using them on API Gateway does not make sense.
So it's better to use a simpler JSON validation library like jsonschema. I've modified your code to use jsonschema instead of webargs as follows:
from jsonschema import validate, ValidationError

schema = {
    "type": "object",
    "properties": {
        "queryStringParameters": {
            "type": "object",
            "properties": {
                "config_id": {
                    "type": "string",
                    "minLength": 7,
                }
            }
        },
    },
}

def main(event, context):
    try:
        validate(instance=event, schema=schema)
    except ValidationError as e:
        return {
            "statusCode": 400,
            "headers": {
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Credentials": True,
                "x-amzn-ErrorType": "ValidationError",
            },
            "body": e.message,
        }
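For illustration, calling the handler with a hypothetical API Gateway event whose config_id is too short should now return the 400 response (the event shape below is assumed, not taken from the question):

# hypothetical event, for illustration only
event = {"queryStringParameters": {"config_id": "abc12"}}  # 5 characters, below minLength 7
print(main(event, None))  # -> dict with "statusCode": 400 from the ValidationError branch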
When I send a PUT request to my API endpoint from Python with a JSON request body, I receive an empty request body, because the body sometimes contains special characters that are not supported by JSON.
How can I sanitize my JSON before sending my request?
I've tried stringifying and parsing the JSON before sending my request:
profile = json.loads(json.dumps(profile))
My example invalid json is:
{
"url": "https://www.example.com/edmund-chand/",
"name": "Edmund Chand",
"current_location": "FrankfurtAmMainArea, Germany",
"education": [],
"skills": []
}
and My expected validated json should be:
{
"url": "https://www.example.com/edmund-chand/",
"name": "Edmund Chand",
"current_location": "Frankfurt Am Main Area, Germany",
"education": [],
"skills": []
}
If you're looking for something quick to sanitize JSON data for a limited set of fields, e.g. current_location, you can try something like the following:
def sanitize(profile):
    profile['current_location'] = ', '.join([val.strip() for val in profile['current_location'].split(',')])
    return profile

profile = sanitize(profile)
The idea here is that you would write code to sanitize each field in that function, then send the result to your API, or raise an exception if it is invalid, etc.
For more robust validation, you can consider using jsonschema package. More details here.
With that package you can validate strings and json schema more flexibly.
Example taken from the package readme:
from jsonschema import validate
# A sample schema, like what we'd get from json.load()
schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string", "format": "uri"},
        "current_location": {"type": "string", "maxLength": 25, "pattern": "your_regex_pattern"},
    },
}

# If no exception is raised by validate(), the instance is valid.
validate(instance=profile, schema=schema)
You can find more info on the types of validation available for strings here.
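As a usage sketch (note that "format" checks such as "uri" are only enforced when a FormatChecker is passed, and only if the relevant optional dependencies are installed):

from jsonschema import validate, FormatChecker, ValidationError

try:
    validate(instance=profile, schema=schema, format_checker=FormatChecker())
except ValidationError as e:
    # reject or fix the profile here instead of sending it to the API
    print("Invalid profile:", e.message)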
Thank you @Rithin for your solution, but that one seems tightly coupled to one field of the whole JSON.
I found a solution that replaces it with the example code below, which works for any field:
profile = json.loads(json.dumps(profile).replace("\t", " "))
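If the offending characters are not just tabs, a more general sketch, assuming the goal is to replace all ASCII control characters in every string field with spaces, could be:

import re

def sanitize(profile: dict) -> dict:
    # recursively replace ASCII control characters (tabs, newlines, ...) with spaces
    def clean(value):
        if isinstance(value, str):
            return re.sub(r"[\x00-\x1f]", " ", value)
        if isinstance(value, dict):
            return {k: clean(v) for k, v in value.items()}
        if isinstance(value, list):
            return [clean(v) for v in value]
        return value
    return clean(profile)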
I am trying to write some test cases for some code I've developed using Elasticsearch and Django. The concept is straightforward: I just want to test a GET request, which will be an Elasticsearch query. However, I am constructing the query as a nested dict. When I pass the nested dict to the Client object in the test script, it gets passed through Django until it ends up at the urlencode function, which doesn't look like it can handle nested dicts, only MultiValueDicts. Any suggestions or solutions? I don't want to use any additional packages, as I don't want to depend on potentially unsupported packages for this application.
Generic Code:
import elasticsearch
from django.test import Client, TestCase

class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", query)
        print(response)
Link for urlencode function: urlencode Django
The issue is clearly at the conditional statement where the urlencode function checks whether the dictionary value is a str or bytes object. If it isn't, it creates a generator object which can never access the nested portions of the dictionary.
EDIT: 07/25/2018
So I was able to come up with a temporary workaround to at least run the test. However, it is ugly and I feel like there must be a better way. The first thing I tried was specifying the content_type and converting the dict to a JSON string first. However, Django still kicked back an error in the urlencode function.
import json

class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        response = client.get("", data=json.dumps(query), content_type="application/json")
        print(response)
So instead I had to do:
class MyViewTest(TestCase):
    es_connection = elasticsearch.Elasticsearch("localhost:9200")

    def test_es_query(self):
        client = Client()
        query = {
            "query": {
                "term": {
                    "city": "some city"
                }
            }
        }
        query = json.dumps(query)
        response = client.get("", data={"q": query}, content_type="application/json")
        print(response)
This let me send the HttpRequest to my View and parse it back out using:
json.loads(request.GET["q"])
Then I was able to successfully get the requested data from Elasticsearch and return it as an HttpResponse. I feel like there has to be a way in Django to just pass a JSON-formatted string directly to the Client object's get function. I thought specifying the content_type as application/json would work, but it still calls the urlencode function. Any ideas? I really don't want to put this current system into production.
I would like to load a JSON object from a Mongo query, access part of the object, and then have it serialised and sent to the web client. I can get the web client side to work, but not the Python side. Any examples would be great.
The request is a serialised JSON object like this
{"page":"home"}
The mongo query results would look like this, but each of "home", "hello", etc. has many more keys and values in it.
{ "_id" : ObjectId("5a8838782430d5ae7526e9eb"), "home" : { "next_page" : "/stats"}, "hello" : { "next_page" : "/home" }, "stats" : { "next_page" : "/hello" } }
The views.py code includes this, which I provide as pseudocode.
@view_config(route_name='nav_route', renderer='json')
def content(request):
    page_name = request.page          # that is, I want page_name = "home"
    mongo_json = mongo_query(woof)    # load and parse the above query
    page_nav = mongo_json[page_name]  # get the part for "home"
    return {'nav': page_nav}
UPDATE: With feedback this now works - cheers!
@view_config(route_name='rachel_content', renderer='json')
def rachel_content(request):
    page = request.json_body['page']
    result = db.rachel.find_one()
    return {'rachel': json.dumps(result[page])}
(Rachel is named after the replicant)
I would suggest you simply need to dump the JSON data. I tend to default to the simplejson package, so I would add this to the top of the file:
import simplejson as json
Then simply tell it to dump your dictionary object to a JSON string by using the dumps function:
return {'nav': json.dumps(page_nav)}
I believe you can do the same with the built-in json library if you don't want to or can't install simplejson.
I am using the following filters in Postman to make a POST request to a Web API, but I am unable to make a simple POST request in Python with the requests library.
First, I am sending a POST request to this URL (http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets) with the following filters in Postman applied to the Body, with the raw and JSON(application/json) options selected.
Filters in Postman
{
    "filter": {
        "filters": [
            {
                "field": "RCA_Assigned_Date",
                "operator": "gte",
                "value": "2017-05-31 00:00:00"
            },
            {
                "field": "RCA_Assigned_Date",
                "operator": "lte",
                "value": "2017-06-04 00:00:00"
            },
            {
                "field": "T_Subcategory",
                "operator": "neq",
                "value": "Temporary Degradation"
            },
            {
                "field": "Issue_Status",
                "operator": "neq",
                "value": "Queued"
            }
        ],
        "logic": "and"
    }
}
The database where the data is stored is Cassandra, and according to the following links (Cassandra not equal operator, Cassandra OR operator, Cassandra Between order by operators), Cassandra does not support the NOT EQUAL TO, OR, or BETWEEN operators, so there is no way I can filter the URL with these operators except AND.
Second, I am using the following code to apply a simple filter with the requests library.
import requests
payload = {'field':'T_Subcategory','operator':'neq','value':'Temporary Degradation'}
url = requests.post("http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets",data=payload)
But what I get is the complete set of tickets instead of only those that are not Temporary Degradation.
Third, the system is actually working but we are experiencing a delay of 2-3 mins to see the data. The logic goes as follows: We have 8 users and we want to see all the tickets per user that are not temporary degradation, then we do:
import json
import urllib.request
from collections import Counter
import dateutil.parser

def get_json():
    if user_name == "user 001":
        with urllib.request.urlopen(
                "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets?user_name=user&001", timeout=15) as url:
            complete_data = json.loads(url.read().decode())
    elif user_name == "user 002":
        with urllib.request.urlopen(
                "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets?user_name=user&002", timeout=15) as url:
            complete_data = json.loads(url.read().decode())
    return complete_data

def get_tickets_not_temp_degradation(start_date, end_date, complete_data):
    return Counter([k['user_name'] for k in complete_data
                    if start_date < dateutil.parser.parse(k.get('DateTime')) < end_date
                    and k['T_subcategory'] != 'Temporary Degradation'])
Basically, we get the whole set of tickets from the current and previous year, then we let Python filter the complete set by user. So far there are only 10 users, which means this process is repeated 10 times, so it is no surprise that we see the delay...
My question is: how can I fix this problem with the requests library? I am using the following link (Requests library documentation) as a tutorial to make it work, but it just seems that my payload is not being read.
Your Postman request is a JSON body. Just reproduce that same body in Python. Your Python code is not sending JSON, nor is it sending the same data as your Postman sample.
For starters, sending a dictionary via the data argument encodes that dictionary as application/x-www-form-urlencoded form data, not JSON. Secondly, you appear to be sending only a single filter.
The following code replicates your Postman post exactly:
import requests

filters = {"filter": {
    "filters": [{
        "field": "RCA_Assigned_Date",
        "operator": "gte",
        "value": "2017-05-31 00:00:00"
    }, {
        "field": "RCA_Assigned_Date",
        "operator": "lte",
        "value": "2017-06-04 00:00:00"
    }, {
        "field": "T_Subcategory",
        "operator": "neq",
        "value": "Temporary Degradation"
    }, {
        "field": "Issue_Status",
        "operator": "neq",
        "value": "Queued"
    }],
    "logic": "and"
}}

url = "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets"
response = requests.post(url, json=filters)
Note that filters is a Python data structure here, and that it is passed to the json keyword argument. Using the latter does two things:
Encode the Python data structure to JSON (producing the exact same JSON value as your raw Postman body value).
Set the Content-Type header to application/json (as you did in your Postman configuration by picking the JSON option in the dropdown menu after picking raw for the body).
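As a small usage note on the response side (hedged, since the API's exact response format isn't shown in the question), you can surface HTTP errors and decode the returned JSON like this:

response.raise_for_status()  # raise for 4xx/5xx responses instead of silently using bad data
tickets = response.json()    # decoded response body; the exact structure depends on the API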
requests is otherwise just an HTTP API; it can't make Cassandra do any more than any other HTTP library can. The urllib.request.urlopen code sends GET requests, and is trivially translated to requests with:
def get_json():
    url = "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets"
    response = requests.get(url, params={'user_name': user}, timeout=15)
    return response.json()
I removed the if branching and replaced that with using the params argument, which translates a dictionary of key-value pairs to a correctly encoded URL query (passing in the user name as the user_name key).
Note the json() call on the response; this takes care of decoding the JSON data coming back from the server. This will still take long, because you are not filtering the Cassandra data much here.
I would recommend using the json argument instead of data. It handles the JSON dumping for you.
import requests
data = {'user_name':'user&001'}
headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
url = "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets/"
r = requests.post(url, headers=headers, json=data)
Update: answer for question 3. Is there a reason you are using urllib? I'd use Python requests for this as well.
import requests

def get_json():
    r = requests.get("http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets",
                     params={"user_name": user_name.replace(" ", "&")})
    return r.json()

# not sure what you're doing here, more context/code example would help
def get_tickets_not_temp_degradation(start_date, end_date, complete_data):
    return Counter([k['user_name'] for k in complete_data
                    if start_date < dateutil.parser.parse(k.get('DateTime')) < end_date
                    and k['T_subcategory'] != 'Temporary Degradation'])
Also, is the username really supposed to be user+001 and not user&001 or user 001?
I think you can use the requests library as follows:
import requests
import json
payload = {'field':'T_Subcategory','operator':'neq','value':'Temporary Degradation'}
url = requests.post("http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets",data=json.dumps(payload))
You are sending the user in the URL; send it through the POST body instead, though it depends on how the endpoints are implemented. You can try the code below:
import requests
from json import dumps
data = {'user_name':'user&001'}
headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
url = "http://10.61.202.98:8081/T/a/api/rows/cat/ect/tickets/"
r = requests.post(url, headers=headers, data=dumps(data))