Can FastAPI/Pydantic individually validate input items in a list?

Can FastAPI/Pydantic individually validate input items in a list? - python

I have a FastAPI post method:
from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from pydantic import BaseModel
from typing import List
import pandas as pd
class InputItem(BaseModel):
Feature: str
class Item(BaseModel):
Feature: str
Result: str
app = FastAPI()
#app.post("/", response_model=List[OutputItem])
def my_function(input: List[InputItem]):
df = pd.DataFrame(jsonable_encoder(input))
result = df.apply(another_method)
return result.to_dict(orient=records)
My question is, if I pass it a list like this:
[
{"NOTFeature":"value"},
{"Feature":"value"},
{"Feature":"value"}
]
or if one of the values is of a different data type, the whole thing currently fails and returns an error. Is there a way to get it to handle the error so that the failing entry is skipped, and the API function is still carried out for items in the list which do pass validation?
Incidentally, if there's a smoother way to handle the dataframe conversion which still uses the dataframe (these are essential for the other data handling done in the functions), this would be very helpful to know as well!

welcome to Stack Overflow.
The short answer for your question is no. That's because it's not pydantic (and also FastAPI) responsability to handle payload contents or fix malformed payloads.
The right way you could do that is to make the Feature member Optional and filter out when it gets to your method, something like this:
import fastapi
import typing
import pydantic
class InputItem(pydantic.BaseModel):
feature: typing.Optional[str]
class OutputItem(pydantic.BaseModel):
Feature: str
Result: str
app = fastapi.FastAPI()
#app.post("/", response_model=typing.List[OutputItem])
def my_function(data: typing.List[InputItem]):
data = [i for i in data if i.feature is not None]
print(data)
# ... do what you gotta do

Union with dict as a passthrough seemed to work for me, i.e.:
from typing import List, Union
#app.post("/", response_model=List[OutputItem])
def my_function(input: List[Union[InputItem, dict]]):
df = pd.DataFrame(jsonable_encoder(input))
result = df.apply(another_method)
return result.to_dict(orient=records)

Related

Make Pydantic BaseModel fields optional including sub-models for PATCH

As already asked in similar questions, I want to support PATCH operations for a FastApi application where the caller can specify as many or as few fields as they like, of a Pydantic BaseModel with sub-models, so that efficient PATCH operations can be performed, without the caller having to supply an entire valid model just in order to update two or three of the fields.
I've discovered there are 2 steps in Pydantic PATCH from the tutorial that don't support sub-models. However, Pydantic is far too good for me to criticise it for something that it seems can be built using the tools that Pydantic provides. This question is to request implementation of those 2 things while also supporting sub-models:
generate a new DRY BaseModel with all fields optional
implement deep copy with update of BaseModel
These problems are already recognised by Pydantic.
There is discussion of a class based solution to the optional model
And there two issues open on the deep copy with update
A similar question has been asked one or two times here on SO and there are some great answers with different approaches to generating an all-fields optional version of the nested BaseModel. After considering them all this particular answer by Ziur Olpa seemed to me to be the best, providing a function that takes the existing model with optional and mandatory fields, and returning a new model with all fields optional: https://stackoverflow.com/a/72365032
The beauty of this approach is that you can hide the (actually quite compact) little function in a library and just use it as a dependency so that it appears in-line in the path operation function and there's no other code or boilerplate.
But the implementation provided in the previous answer did not take the step of dealing with sub-objects in the BaseModel being patched.
This question therefore requests an improved implementation of the all-fields-optional function that also deals with sub-objects, as well as a deep copy with update.
I have a simple example as a demonstration of this use-case, which although aiming to be simple for demonstration purposes, also includes a number of fields to more closely reflect the real world examples we see. Hopefully this example provides a test scenario for implementations, saving work:
import logging
from datetime import datetime, date
from collections import defaultdict
from pydantic import BaseModel
from fastapi import FastAPI, HTTPException, status, Depends
from fastapi.encoders import jsonable_encoder
app = FastAPI(title="PATCH demo")
logging.basicConfig(level=logging.DEBUG)
class Collection:
collection = defaultdict(dict)
def __init__(self, this, that):
logging.debug("-".join((this, that)))
self.this = this
self.that = that
def get_document(self):
document = self.collection[self.this].get(self.that)
if not document:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Not Found",
)
logging.debug(document)
return document
def save_document(self, document):
logging.debug(document)
self.collection[self.this][self.that] = document
return document
class SubOne(BaseModel):
original: date
verified: str = ""
source: str = ""
incurred: str = ""
reason: str = ""
attachments: list[str] = []
class SubTwo(BaseModel):
this: str
that: str
amount: float
plan_code: str = ""
plan_name: str = ""
plan_type: str = ""
meta_a: str = ""
meta_b: str = ""
meta_c: str = ""
class Document(BaseModel):
this: str
that: str
created: datetime
updated: datetime
sub_one: SubOne
sub_two: SubTwo
the_code: str = ""
the_status: str = ""
the_type: str = ""
phase: str = ""
process: str = ""
option: str = ""
#app.get("/endpoint/{this}/{that}", response_model=Document)
async def get_submission(this: str, that: str) -> Document:
collection = Collection(this=this, that=that)
return collection.get_document()
#app.put("/endpoint/{this}/{that}", response_model=Document)
async def put_submission(this: str, that: str, document: Document) -> Document:
collection = Collection(this=this, that=that)
return collection.save_document(jsonable_encoder(document))
#app.patch("/endpoint/{this}/{that}", response_model=Document)
async def patch_submission(
document: Document,
# document: optional(Document), # <<< IMPLEMENT optional <<<
this: str,
that: str,
) -> Document:
collection = Collection(this=this, that=that)
existing = collection.get_document()
existing = Document(**existing)
update = document.dict(exclude_unset=True)
updated = existing.copy(update=update, deep=True) # <<< FIX THIS <<<
updated = jsonable_encoder(updated)
collection.save_document(updated)
return updated
This example is a working FastAPI application, following the tutorial, and can be run with uvicorn example:app --reload. Except it doesn't work, because there's no all-optional fields model, and Pydantic's deep copy with update actually overwrites sub-models rather than updating them.
In order to test it the following Bash script can be used to run curl requests. Again I'm supplying this just to hopefully make it easier to get started with this question.
Just comment out the other commands each time you run it so that the command you want is used.
To demonstrate this initial state of the example app working you would run GET (expect 404), PUT (document stored), GET (expect 200 and same document returned), PATCH (expect 200), GET (expect 200 and updated document returned).
host='http://127.0.0.1:8000'
path="/endpoint/A123/B456"
method='PUT'
data='
{
"this":"A123",
"that":"B456",
"created":"2022-12-01T01:02:03.456",
"updated":"2023-01-01T01:02:03.456",
"sub_one":{"original":"2022-12-12","verified":"Y"},
"sub_two":{"this":"A123","that":"B456","amount":0.88,"plan_code":"HELLO"},
"the_code":"BYE"}
'
# method='PATCH'
# data='{"this":"A123","that":"B456","created":"2022-12-01T01:02:03.456","updated":"2023-01-02T03:04:05.678","sub_one":{"original":"2022-12-12","verified":"N"},"sub_two":{"this":"A123","that":"B456","amount":123.456}}'
method='GET'
data=''
if [[ -n data ]]; then data=" --data '$data'"; fi
curl="curl -K curlrc -X $method '$host$path' $data"
echo $curl >&2
eval $curl
This curlrc will need to be co-located to ensure the content type headers are correct:
--cookie "_cookies"
--cookie-jar "_cookies"
--header "Content-Type: application/json"
--header "Accept: application/json"
--header "Accept-Encoding: compress, gzip"
--header "Cache-Control: no-cache"
So what I'm looking for is the implementation of optional that is commented out in the code, and a fix for existing.copy with the update parameter, that will enable this example to be used with PATCH calls that omit otherwise mandatory fields.
The implementation does not have to conform precisely to the commented out line, I just provided that based on Ziur Olpa's previous answer.

When I first posed this question I thought that the only problem was how to turn all fields Optional in a nested BaseModel, but actually that was not difficult to fix.
The real problem with partial updates when implementing a PATCH call is that the Pydantic BaseModel.copy method doesn't attempt to support nested models when applying it's update parameter. That's quite an involved task for the generic case, considering you may have fields that are dicts, lists, or sets of another BaseModel, just for instance. Instead it just unpacks the dict using **: https://github.com/pydantic/pydantic/blob/main/pydantic/main.py#L353
I haven't got a proper implementation of that for Pydantic, but since I've got a working example PATCH by cheating, I'm going to post this as an answer and see if anyone can fault it or provide better, possibly even with an implementation of BaseModel.copy that supports updates for nested models.
Rather than post the implementations separately I am going to update the example given in the question so that it has a working PATCH and being a full demonstration of PATCH hopefully this will help others more.
The two additions are partial and merge. partial is what's referred to as optional in the question code.
partial:
This is a function that takes any BaseModel and returns a new BaseModel with all fields Optional, including sub-object fields. That's enough for Pydantic to allow through any sub-set of fields without throwing an error for "missing fields". It's recursive - not really popular - but given these are nested data models the depth is not expected to exceed single digits.
merge:
The BaseModel update on copy method operates on an instance of BaseModel - but supporting all the possible type variations when descending through a nested model is the hard part - and the database data, and the incoming update, are easily available as plain Python dicts; so this is the cheat: merge is an implementation of a nested dict update instead, and since the dict data has already been validated at one point or other, it should be fine.
Here's the full example solution:
import logging
from typing import Optional, Type
from datetime import datetime, date
from functools import lru_cache
from pydantic import BaseModel, create_model
from collections import defaultdict
from pydantic import BaseModel
from fastapi import FastAPI, HTTPException, status, Depends, Body
from fastapi.encoders import jsonable_encoder
app = FastAPI(title="Nested model PATCH demo")
logging.basicConfig(level=logging.DEBUG)
class Collection:
collection = defaultdict(dict)
def __init__(self, this, that):
logging.debug("-".join((this, that)))
self.this = this
self.that = that
def get_document(self):
document = self.collection[self.this].get(self.that)
if not document:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="Not Found",
)
logging.debug(document)
return document
def save_document(self, document):
logging.debug(document)
self.collection[self.this][self.that] = document
return document
class SubOne(BaseModel):
original: date
verified: str = ""
source: str = ""
incurred: str = ""
reason: str = ""
attachments: list[str] = []
class SubTwo(BaseModel):
this: str
that: str
amount: float
plan_code: str = ""
plan_name: str = ""
plan_type: str = ""
meta_a: str = ""
meta_b: str = ""
meta_c: str = ""
class SubThree(BaseModel):
one: str = ""
two: str = ""
class Document(BaseModel):
this: str
that: str
created: datetime
updated: datetime
sub_one: SubOne
sub_two: SubTwo
# sub_three: dict[str, SubThree] = {} # Hah hah not really
the_code: str = ""
the_status: str = ""
the_type: str = ""
phase: str = ""
process: str = ""
option: str = ""
#lru_cache
def partial(baseclass: Type[BaseModel]) -> Type[BaseModel]:
"""Make all fields in supplied Pydantic BaseModel Optional, for use in PATCH calls.
Iterate over fields of baseclass, descend into sub-classes, convert fields to Optional and return new model.
Cache newly created model with lru_cache to ensure it's only created once.
Use with Body to generate the partial model on the fly, in the PATCH path operation function.
- https://stackoverflow.com/questions/75167317/make-pydantic-basemodel-fields-optional-including-sub-models-for-patch
- https://stackoverflow.com/questions/67699451/make-every-fields-as-optional-with-pydantic
- https://github.com/pydantic/pydantic/discussions/3089
- https://fastapi.tiangolo.com/tutorial/body-updates/#partial-updates-with-patch
"""
fields = {}
for name, field in baseclass.__fields__.items():
type_ = field.type_
if type_.__base__ is BaseModel:
fields[name] = (Optional[partial(type_)], {})
else:
fields[name] = (Optional[type_], None) if field.required else (type_, field.default)
# https://docs.pydantic.dev/usage/models/#dynamic-model-creation
validators = {"__validators__": baseclass.__validators__}
return create_model(baseclass.__name__ + "Partial", **fields, __validators__=validators)
def merge(original, update):
"""Update original nested dict with values from update retaining original values that are missing in update.
- https://github.com/pydantic/pydantic/issues/3785
- https://github.com/pydantic/pydantic/issues/4177
- https://docs.pydantic.dev/usage/exporting_models/#modelcopy
- https://github.com/pydantic/pydantic/blob/main/pydantic/main.py#L353
"""
for key in update:
if key in original:
if isinstance(original[key], dict) and isinstance(update[key], dict):
merge(original[key], update[key])
elif isinstance(original[key], list) and isinstance(update[key], list):
original[key].extend(update[key])
else:
original[key] = update[key]
else:
original[key] = update[key]
return original
#app.get("/endpoint/{this}/{that}", response_model=Document)
async def get_submission(this: str, that: str) -> Document:
collection = Collection(this=this, that=that)
return collection.get_document()
#app.put("/endpoint/{this}/{that}", response_model=Document)
async def put_submission(this: str, that: str, document: Document) -> Document:
collection = Collection(this=this, that=that)
return collection.save_document(jsonable_encoder(document))
#app.patch("/endpoint/{this}/{that}", response_model=Document)
async def patch_submission(
this: str,
that: str,
document: partial(Document), # <<< IMPLEMENTED partial TO MAKE ALL FIELDS Optional <<<
) -> Document:
collection = Collection(this=this, that=that)
existing_document = collection.get_document()
incoming_document = document.dict(exclude_unset=True)
# VVV IMPLEMENTED merge INSTEAD OF USING BROKEN PYDANTIC copy WITH update VVV
updated_document = jsonable_encoder(merge(existing_document, incoming_document))
collection.save_document(updated_document)
return updated_document

Python. Attach data to asyncio.Task

Is there proper way to attach additional data to asyncio.create_task()? The example is
import asyncio
from dataclasses import dataclass
#dataclass
class Foo:
name: str
url_to_download: str
size: int
...
async def download_file(url: str):
return await download_impl()
objs: list[Foo] = [obj1, obj2, ...]
# Question
# How to attach the Foo object to the each task?
tasks = [asyncio.create_task(download_file(obj.url_to_download)) for obj in objs]
for task in asyncio.as_completed(tasks):
# Question
# How to find out which Foo obj corresponds the downloaded data?
data = await task
process(data)
There is also way to forward the Foo object to the download_file and return it with the downloaded data, but it is poor design. Do I miss something, or anyone has a better design to solve that problem?

If I understood correctly, you just want a way to match the returned values from all your tasks to the instances of Foo whose url_to_download attributes you passed as arguments to said tasks.
Since all you are doing in that last loop is blocking until all tasks are completed, you may as well simply run the coroutines concurrently via asyncio.gather. The order of the individual return values in the list it returns corresponds to the order of the coroutines passed to it as arguments:
from asyncio import gather, run
from dataclasses import dataclass
#dataclass
class Foo:
url_to_download: str
...
async def download_file(url: str):
return url
async def main() -> None:
objs: list[Foo] = [Foo("foo"), Foo("bar"), Foo("baz")]
returned_values = await gather(
*(download_file(obj.url_to_download) for obj in objs)
)
print(returned_values) # ['foo', 'bar', 'baz']
if __name__ == '__main__':
run(main())
That means you can simply match the objs and returned_values via index or zip them or whatever you need to do.
As for your comment that saving the returned the returned value in an attribute of the object itself is "poor design", I see absolutely no justification for that assessment. That would also be a perfectly valid way and arguably even cleaner. You might even define a download method on Foo for that purpose. But that is another discussion.

list all compute instances in a specific region gcp with python

So, I can list my instances by zones using this API.
GET https://compute.googleapis.com/compute/v1/projects/{project}/zones/{zone}/instances.
I want now to filter my instances by region. Any idea how can I do this (using python)?

You can use aggregated_list(), to list all your instances on your project. Filtering via region could be done on the actual code. See code below where I used regex to mimic a filter using region variable.
from typing import Dict, Iterable
from google.cloud import compute_v1
import re
def list_all_instances(
project_id: str,
region: str
) -> Dict[str, Iterable[compute_v1.Instance]]:
instance_client = compute_v1.InstancesClient()
request = {
"project" : project_id,
}
agg_list = instance_client.aggregated_list(request=request)
all_instances = {}
print("Instances found:")
for zone, response in agg_list:
if response.instances:
if re.search(f"{region}*", zone):
all_instances[zone] = response.instances
print(f" {zone}:")
for instance in response.instances:
print(f" - {instance.name} ({instance.machine_type})")
return all_instances
list_all_instances(project_id="your-project-id",region="us-central1") #used us-central1 for testing
NOTE: Code above is from this code. I just modified it to apply the filtering above.
Actual instances on my GCP account:
Result from code above (only zones with prefix us-central1 were displayed):

FastApi urls path and 2 values into a query

im using FastApi and get some troubles with url.
i have a root url
#app.get("/myurl")
http://host/myurl
and
http://host/myurl?id=2
and here function returns all info from needed table.
on url like http://host/myurl?id=2&type=3 i need to get another query from table. how i need to create function because now this http://host/myurl?id=2 overlapping this function http://host/myurl?id=2&type=3
how i can use multiple urls with different values in it in fastapi?
and i want to know how to make url like http://host/myurl?id=2&type=3,2 to return result from table for two types (query example is SELECT * from mytable WHERE id=%(id)s and type IN (1,2) but type IN (,) should be parameters which i need to inpout

how i can use multiple urls with different values in it in fastapi?
As far as I know, you can't. But fortunately, you don't need to. What you can do is define only one route ("/myurl") with both parameters id and type, and set the second as optional.
Then, if you don't receive type, you process a different query.
By the way, don't use id and type as parameter names, that will mess with the name of the in-built function id()and type().
Here a working example:
from fastapi import FastAPI, Query
app = FastAPI()
#app.get("/myurl")
async def my_url(my_id: int = Query(...), my_type: int = Query(None)):
if my_type:
return f"You gave an id ({my_id}) and a type ({my_type})."
return f"You gave only an id ({my_id}) but no type."
i want to know how to make url like http://host/myurl?id=2&type=3,2
Not sure you can do it at all. What you can do is add the type parameter several times, like this:
http://host/myurl?my_id=2&my_type=3&my_type=2
In this case, you need to slightly change your code:
from fastapi import FastAPI, Query
from typing import List
app = FastAPI()
#app.get("/myurl")
async def my_url(my_id: int = Query(...), my_type: List[int] = Query(None)):
if my_type:
if len(my_type) > 1:
return f"You gave an id ({my_id}) and a list of types ({my_type})."
else:
return f"You gave an id ({my_id}) and a type ({my_type})."
return f"You gave only an id ({my_id}) but no type."
Note that you'll always receive my_type as a list then, even if you pass it only one time.

How to post an image file with a list of strings using FastAPI?

I have tried a lot of things, but it doesn't seem to work. Here is my code:
#app.post("/my-endpoint")
async def my_func(
languages: List[str] = ["en", "hi"], image: UploadFile = File(...)
):
The function works fine when I remove one of the parameters, but with both of the parameters, the retrieved list comes out to be like ["en,hi"], whereas I want it to be ["en, "hi].
I am not even sure if my approach is correct, hence the broader question, if this approach is not right then how can I post a list and an image together?

Your function looks just fine. That behaviour though has to do with how FastAPI autodocs (Swagger UI)—I am assuming you are using it for testing, as I did myself and noticed the exact same behaviour—handles the list items. For some reason, the Swagger UI/OpenAPI adds all items as a single item to the list, separated by comma (i.e., ["en, hi, ..."], instead of ["en", "hi", ...]).
Testing the code with Python requests and sending the languages' list in the proper way, it works just fine. To fix, however, the behaviour of Swagger UI, or any other tool that might behave the same, you could perform a check on the length of the list that is received in the function, and if it is equal to 1 (meaning that the list contains a single item), then split this item using comma delimiter to get a new list with all languages included.
Below is a working example:
app.py
from fastapi import File, UploadFile, FastAPI
from typing import List
app = FastAPI()
#app.post("/submit")
def submit(languages: List[str] = ["en", "hi"], image: UploadFile = File(...)):
if (len(languages) == 1):
languages= [item.strip() for item in languages[0].split(',')]
return {"Languages ": languages, "Uploaded filename": image.filename}
test.py
import requests
url = 'http://127.0.0.1:8000/submit'
image = {'image': open('sample.png', 'rb')}
#payload ={"languages": ["en", "hi"]} # send languages as separate items
payload ={"languages": "en, hi"} # send languages as a single item
resp = requests.post(url=url, data=payload, files=image)
print(resp.json())

I solved this using Query parameters! This might be helpful for someone, though I think Chris' answer makes much more sense -
#app.post("/my-endpoint")
async def my_func(
languages: List[str] = Query(["en", "hi"]), image: UploadFile = File(...)
):

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Can FastAPI/Pydantic individually validate input items in a list? - python

Related

Make Pydantic BaseModel fields optional including sub-models for PATCH

Python. Attach data to asyncio.Task

list all compute instances in a specific region gcp with python

FastApi urls path and 2 values into a query

How to post an image file with a list of strings using FastAPI?

Categories

Resources