cerberus schema validator for tuples - python

I have a variable declaration as follows
my_var = typing.List[typing.Tuple[int, int]]
and I want to write a validator as follows
schema_validator = "my_var": {
"type": "list",
"empty": False,
"items": [
{"type": "tuple"},
{"items": [
{"type": "int"}, {"type": "int"}
]}
]
}
In Cerberus documentation it does not specify a validator example for tuples.
How to accomplish this?

Given your type hint typing.List[typing.Tuple[int, int]], you expect an arbitrary-length list of two-value tuples where each value is an integer.
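from cerberus import TypeDefinition, Validator
class MyValidator(Validator):
    # add a type definition to a validator subclass
    types_mapping = Validator.types_mapping.copy()
    # TypeDefinition takes a name, the included types and the excluded types
    types_mapping['tuple'] = TypeDefinition('tuple', (tuple,), ())
schema = {
    'my_var': {
        'type': 'list',
        'empty': False,
        'schema': {  # the number of items is undefined
            'type': 'tuple',
            # Cerberus names the type 'integer', not 'int'
            'items': 2 * ({'type': 'integer'},)
        }
    }
}
validator = MyValidator(schema)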
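It's important to understand the difference between the items and the schema rules: schema applies one set of rules to every element of a sequence, while items validates each position against its own rule set and requires the lengths to match.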
Mind that the default list type actually maps to the more abstract Sequence type and you might want to add another, stricter type for that.
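To make the distinction between the two rules concrete, here is a plain-Python sketch of what they check. It does not use Cerberus and is not how Cerberus implements the rules; the helper names check_schema_rule and check_items_rule are made up for this illustration.

```python
from typing import Callable, Sequence

Check = Callable[[object], bool]


def is_int(value: object) -> bool:
    # bool is a subclass of int, so it is excluded explicitly here
    return isinstance(value, int) and not isinstance(value, bool)


def check_schema_rule(values: Sequence[object], check: Check) -> bool:
    """Like `schema` on a sequence: every element must pass the same check."""
    return all(check(v) for v in values)


def check_items_rule(values: Sequence[object], checks: Sequence[Check]) -> bool:
    """Like `items`: one check per position, and the lengths must be equal."""
    return len(values) == len(checks) and all(
        check(v) for check, v in zip(checks, values)
    )


def is_int_pair(value: object) -> bool:
    # a tuple with exactly two integer elements
    return isinstance(value, tuple) and check_items_rule(value, (is_int, is_int))


def is_valid_my_var(value: object) -> bool:
    # a non-empty list whose elements are all 2-tuples of integers
    return (
        isinstance(value, list)
        and len(value) > 0
        and check_schema_rule(value, is_int_pair)
    )


print(is_valid_my_var([(1, 2), (3, 4)]))
print(is_valid_my_var([(1, 2, 3)]))
```

The outer list uses the schema-style check because its length is open, while each tuple uses the items-style check because its length is fixed at two.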

While this isn't the cleanest solution, it will certainly do what you want.
from cerberus import Validator, TypeDefinition
class MyValidator(Validator):
    def __init__(self, *args, **kwargs):
        # Add the tuple type
        tuple_type = TypeDefinition("tuple", (tuple,), ())
        Validator.types_mapping["tuple"] = tuple_type
        # Call the Validator constructor
        super(MyValidator, self).__init__(*args, **kwargs)
    def _validate_is_int_two_tuple(self, is_int_two_tuple, field, value):
        ''' Test that the value is a 2-tuple of ints
        The rule's arguments are validated against this schema:
        {'type': 'boolean'}
        '''
        if is_int_two_tuple:
            # Check the type
            if type(value) != tuple:
                self._error(field, "Must be of type 'tuple'")
                return  # the remaining checks need a tuple
            # Check the length
            if len(value) != 2:
                self._error(field, "Tuple must have two elements")
                return  # indexing below needs exactly two elements
            # Check the element types
            if type(value[0]) != int or type(value[1]) != int:
                self._error(field, "Both tuple values must be of type 'int'")
data = {"mylist": [(1,1), (2,2), (3,3)]}
schema = {
    "mylist": {
        "type": "list",
        "schema": {
            "type": "tuple",
            "is_int_two_tuple": True
        }
    }
}
v = MyValidator(schema)
print("Validated: {}".format(v.validate(data)))
print("Validation errors: {}".format(v.errors))
print("Normalized result: {}".format(v.normalized(data)))
So as bro-grammer pointed out, the custom data types will get you validation of the types, but that's it. From the schema that you provided, it looks like you also want to validate other features like the length of the tuple and the types of the elements in the tuple. Doing that requires more than just a simple TypeDefinition for tuples.
Extending Validator to include a rule for this specific use-case isn't ideal, but it will do what you want. The more comprehensive solution would be to create a TupleValidator subclass that has rules for validating length, element-types, order, etc. of tuples.

Related

Pydantic: Create model with fixed and extended fields from a Dict[str, OtherModel], the Typescript [key: string] way

From a similar question, the goal is to create a model like this Typescript interface:
interface ExpandedModel {
fixed: number;
[key: string]: OtherModel;
}
However, the OtherModel values need to be validated, so simply using:
class ExpandedModel(BaseModel):
    fixed: int
    class Config:
        extra = "allow"
won't be enough. I tried __root__ (pydantic docs):
class VariableKeysModel(BaseModel):
    __root__: Dict[str, OtherModel]
But doing something like:
class ExpandedModel(VariableKeysModel):
    fixed: int
is not possible due to:
ValueError: __root__ cannot be mixed with other fields
Would something like @root_validator (example from another answer) be helpful in this case?
Thankfully, Python is not TypeScript. As mentioned in the comments here as well, an object is generally not a dictionary and dynamic attributes are considered bad form in almost all cases.
You can of course still set attributes dynamically, but they will for example never be recognized by a static type checker like Mypy or your IDE. This means you will not get auto-suggestions for those dynamic fields. Only attributes that are statically defined within the namespace of the class are considered members of that class.
That being said, you can abuse the extra config option to allow arbitrary fields to be dynamically added to the model, while at the same time enforcing that all corresponding values are of a specific type via a root_validator.
from typing import Any
from pydantic import BaseModel, root_validator
class Foo(BaseModel):
    a: int
class Bar(BaseModel):
    b: str
    @root_validator
    def validate_foo(cls, values: dict[str, Any]) -> dict[str, Any]:
        for name, value in values.items():
            if name in cls.__fields__:
                continue  # ignore statically defined fields here
            values[name] = Foo.parse_obj(value)
        return values
    class Config:
        extra = "allow"
Demo:
if __name__ == "__main__":
    from pydantic import ValidationError
    bar = Bar.parse_obj({
        "b": "xyz",
        "foo1": {"a": 1},
        "foo2": Foo(a=2),
    })
    print(bar.json(indent=4))
    try:
        Bar.parse_obj({
            "b": "xyz",
            "foo": {"a": "string"},
        })
    except ValidationError as err:
        print(err.json(indent=4))
    try:
        Bar.parse_obj({
            "b": "xyz",
            "foo": {"not_a_foo_field": 1},
        })
    except ValidationError as err:
        print(err.json(indent=4))
Output:
{
"b": "xyz",
"foo2": {
"a": 2
},
"foo1": {
"a": 1
}
}
[
{
"loc": [
"__root__",
"a"
],
"msg": "value is not a valid integer",
"type": "type_error.integer"
}
]
[
{
"loc": [
"__root__",
"a"
],
"msg": "field required",
"type": "value_error.missing"
}
]
A better approach IMO is to just put the dynamic name-object-pairs into a dictionary. For example, you could define a separate field foos: dict[str, Foo] on the Bar model and get automatic validation out of the box that way.
Or you ditch the outer base model altogether for that specific case and just handle the data as a native dictionary with Foo values and parse them all via the Foo model.
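The first of these alternatives, a dedicated dictionary field, could look roughly like the sketch below. The field name foos comes from the paragraph above; the import fallback is only there so the snippet also runs where Pydantic 2 is installed, which ships the 1.x API as pydantic.v1.

```python
from typing import Dict

try:  # Pydantic 2 exposes the 1.x API under `pydantic.v1`
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # plain Pydantic 1.x
    from pydantic import BaseModel, ValidationError


class Foo(BaseModel):
    a: int


class Bar(BaseModel):
    b: str
    # the dynamic names become dictionary keys instead of model attributes
    foos: Dict[str, Foo]


bar = Bar.parse_obj({"b": "xyz", "foos": {"foo1": {"a": 1}, "foo2": {"a": 2}}})
print(bar.foos["foo2"].a)

# every value is validated against Foo without any custom validator
errors = []
try:
    Bar.parse_obj({"b": "xyz", "foos": {"foo": {"a": "string"}}})
except ValidationError as err:
    errors = err.errors()
print(errors)
```

Static type checkers and IDEs see foos as an ordinary typed attribute, which is the main advantage over extra = "allow".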

FastAPI create a generic response model that would suit requirements

I've been working with FastAPI for some time, it's a great framework.
However, real-life scenarios can be surprising, and sometimes a non-standard approach is necessary. There is one case I'd like to ask for your help with.
There's a strange external requirement that a model response should be formatted as stated in example:
Desired behavior:
GET /object/1
{status: ‘success’, data: {object: {id:‘1’, category: ‘test’ …}}}
GET /objects
{status: ‘success’, data: {objects: [...]}}
Current behavior:
GET /object/1 would respond:
{id: 1,field1:"content",... }
GET /objects/ would send a List of Object e.g.,:
{
[
{id: 1,field1:"content",... },
{id: 1,field1:"content",... },
...
]
}
You can substitute 'object' by any class, it's just for description purposes.
How to write a generic response model that will suit those reqs?
I know I can produce response model that would contain status:str and (depending on class) data structure e.g ticket:Ticket or tickets:List[Ticket].
The point is there's a number of classes so I hope there's a more pythonic way to do it.
Thanks for help.
Generic model with static field name
A generic model is a model where one or more fields are annotated with a type variable. Thus the type of that field is unspecified by default and must be specified explicitly during subclassing and/or initialization. But that field is still just an attribute and an attribute must have a name. A fixed name.
To go from your example, say that is your model:
{
"status": "...",
"data": {
"object": {...} # type variable
}
}
Then we could define that model as generic in terms of the type of its object attribute.
This can be done using Pydantic's GenericModel like this:
from typing import Generic, TypeVar
from pydantic import BaseModel
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
class GenericSingleObject(GenericModel, Generic[M]):
    object: M
class GenericMultipleObjects(GenericModel, Generic[M]):
    objects: list[M]
class BaseGenericResponse(GenericModel):
    status: str
class GenericSingleResponse(BaseGenericResponse, Generic[M]):
    data: GenericSingleObject[M]
class GenericMultipleResponse(BaseGenericResponse, Generic[M]):
    data: GenericMultipleObjects[M]
class Foo(BaseModel):
    a: str
    b: int
class Bar(BaseModel):
    x: float
As you can see, GenericSingleObject reflects the generic type we want for data, whereas GenericSingleResponse is generic in terms of the type parameter M of GenericSingleObject, which is the type of its data attribute.
If we now want to use one of our generic response models, we would need to specify it with a type argument (a concrete model) first, e.g. GenericSingleResponse[Foo].
FastAPI deals with this just fine and can generate the correct OpenAPI documentation. The JSON schema for GenericSingleResponse[Foo] looks like this:
{
"title": "GenericSingleResponse[Foo]",
"type": "object",
"properties": {
"status": {
"title": "Status",
"type": "string"
},
"data": {
"$ref": "#/definitions/GenericSingleObject_Foo_"
}
},
"required": [
"status",
"data"
],
"definitions": {
"Foo": {
"title": "Foo",
"type": "object",
"properties": {
"a": {
"title": "A",
"type": "string"
},
"b": {
"title": "B",
"type": "integer"
}
},
"required": [
"a",
"b"
]
},
"GenericSingleObject_Foo_": {
"title": "GenericSingleObject[Foo]",
"type": "object",
"properties": {
"object": {
"$ref": "#/definitions/Foo"
}
},
"required": [
"object"
]
}
}
}
To demonstrate it with FastAPI:
from fastapi import FastAPI
app = FastAPI()
@app.get("/foo/", response_model=GenericSingleResponse[Foo])
async def get_one_foo() -> dict[str, object]:
    return {"status": "foo", "data": {"object": {"a": "spam", "b": 123}}}
Sending a request to that route returns the following:
{
"status": "foo",
"data": {
"object": {
"a": "spam",
"b": 123
}
}
}
Dynamically created model
If you actually want the attribute name to also be different every time, that is obviously no longer possible with static type annotations. In that case we would have to resort to actually creating the model type dynamically via pydantic.create_model.
In that case there is really no point in genericity anymore because type safety is out of the window anyway, at least for the data model. We still have the option to define a GenericResponse model, which we can specify via our dynamically generated models, but this will make every static type checker mad, since we'll be using variables for types. Still, it might make for otherwise concise code.
We just need to define an algorithm for deriving the model parameters:
from typing import Any, Generic, Optional, TypeVar
from pydantic import BaseModel, create_model
from pydantic.generics import GenericModel
M = TypeVar("M", bound=BaseModel)
def create_data_model(
    model: type[BaseModel],
    plural: bool = False,
    custom_plural_name: Optional[str] = None,
    **kwargs: Any,
) -> type[BaseModel]:
    data_field_name = model.__name__.lower()
    if plural:
        model_name = f"Multiple{model.__name__}"
        if custom_plural_name:
            data_field_name = custom_plural_name
        else:
            data_field_name += "s"
        kwargs[data_field_name] = (list[model], ...)  # type: ignore[valid-type]
    else:
        model_name = f"Single{model.__name__}"
        kwargs[data_field_name] = (model, ...)
    return create_model(model_name, **kwargs)
class GenericResponse(GenericModel, Generic[M]):
    status: str
    data: M
Using the same Foo and Bar examples as before:
class Foo(BaseModel):
    a: str
    b: int
class Bar(BaseModel):
    x: float
SingleFoo = create_data_model(Foo)
MultipleBar = create_data_model(Bar, plural=True)
This also works as expected with FastAPI including the automatically generated schemas/documentations:
from fastapi import FastAPI
app = FastAPI()
@app.get("/foo/", response_model=GenericResponse[SingleFoo])  # type: ignore[valid-type]
async def get_one_foo() -> dict[str, object]:
    return {"status": "foo", "data": {"foo": {"a": "spam", "b": 123}}}
@app.get("/bars/", response_model=GenericResponse[MultipleBar])  # type: ignore[valid-type]
async def get_multiple_bars() -> dict[str, object]:
    return {"status": "bars", "data": {"bars": [{"x": 3.14}, {"x": 0}]}}
Output is essentially the same as with the first approach.
You'll have to see which one works better for you. I find the second option very strange because of the dynamic key/field name. But maybe that is what you need for some reason.

Flatten nested Pydantic model

from typing import Union
from pydantic import BaseModel, Field
class Category(BaseModel):
    name: str = Field(alias="name")
class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")
When instantiated like this:
OrderItems(**{'name': 'Test','category':{'name': 'Test Cat'}, 'unit': 'kg', 'quantity': 10})
It returns data like this:
OrderItems(name='Test', category=Category(name='Test Cat'), unit='kg', quantity=10)
But I want the output like this:
OrderItems(name='Test', category='Test Cat', unit='kg', quantity=10)
How can I achieve this?
You should try as much as possible to define your schema the way you actually want the data to look in the end, not the way you might receive it from somewhere else.
UPDATE: Generalized solution (one nested field or more)
To generalize this problem, let's assume you have the following models:
from pydantic import BaseModel
class Foo(BaseModel):
    x: bool
    y: str
    z: int
class _BarBase(BaseModel):
    a: str
    b: float
    class Config:
        orm_mode = True
class BarNested(_BarBase):
    foo: Foo
class BarFlat(_BarBase):
    foo_x: bool
    foo_y: str
Problem: You want to be able to initialize BarFlat with a foo argument just like BarNested, but the data to end up in the flat schema, wherein the fields foo_x and foo_y correspond to x and y on the Foo model (and you are not interested in z).
Solution: Define a custom root_validator with pre=True that checks if a foo key/attribute is present in the data. If it is, it validates the corresponding object against the Foo model, grabs its x and y values and then uses them to extend the given data with foo_x and foo_y keys:
from pydantic import BaseModel, root_validator
from pydantic.utils import GetterDict
...
class BarFlat(_BarBase):
    foo_x: bool
    foo_y: str
    @root_validator(pre=True)
    def flatten_foo(cls, values: GetterDict) -> GetterDict | dict[str, object]:
        foo = values.get("foo")
        if foo is None:
            return values
        # Assume `foo` must be valid `Foo` data:
        foo = Foo.validate(foo)
        return {
            "foo_x": foo.x,
            "foo_y": foo.y,
        } | dict(values)
Note that we need to be a bit more careful inside a root validator with pre=True because the values are always passed in the form of a GetterDict, which is an immutable mapping-like object. So we cannot simply assign new values foo_x/foo_y to it like we would to a dictionary. But nothing is stopping us from returning the cleaned up data in the form of a regular old dict.
To demonstrate, we can throw some test data at it:
test_dict = {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
test_orm = BarNested(a="eggs", b=-1, foo=Foo(x=False, y="..", z=1))
test_flat = '{"a": "beans", "b": 0, "foo_x": true, "foo_y": ""}'
bar1 = BarFlat.parse_obj(test_dict)
bar2 = BarFlat.from_orm(test_orm)
bar3 = BarFlat.parse_raw(test_flat)
print(bar1.json(indent=4))
print(bar2.json(indent=4))
print(bar3.json(indent=4))
The output:
{
"a": "spam",
"b": 3.14,
"foo_x": true,
"foo_y": "."
}
{
"a": "eggs",
"b": -1.0,
"foo_x": false,
"foo_y": ".."
}
{
"a": "beans",
"b": 0.0,
"foo_x": true,
"foo_y": ""
}
The first example simulates a common situation, where the data is passed to us in the form of a nested dictionary. The second example is the typical database ORM object situation, where BarNested represents the schema we find in a database. The third is just to show that we can still correctly initialize BarFlat without a foo argument.
One caveat to note is that the validator does not get rid of the foo key, if it finds it in the values. If your model is configured with Extra.forbid that will lead to an error. In that case, you'll just need to have an extra line, where you coerce the original GetterDict to a dict first, then pop the "foo" key instead of getting it.
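A minimal sketch of that variant, shown for dictionary input only (the from_orm case is not covered here). The model name BarFlatStrict is made up for this example, and the import fallback just lets the snippet run where Pydantic 2 is installed, which ships the 1.x API as pydantic.v1.

```python
try:  # Pydantic 2 exposes the 1.x API under `pydantic.v1`
    from pydantic.v1 import BaseModel, root_validator
except ImportError:  # plain Pydantic 1.x
    from pydantic import BaseModel, root_validator


class Foo(BaseModel):
    x: bool
    y: str
    z: int


class BarFlatStrict(BaseModel):  # hypothetical name for this sketch
    a: str
    b: float
    foo_x: bool
    foo_y: str

    class Config:
        extra = "forbid"  # unknown keys are rejected

    @root_validator(pre=True)
    def flatten_foo(cls, values):
        data = dict(values)          # work on a mutable copy
        foo = data.pop("foo", None)  # remove the nested key instead of reading it
        if foo is None:
            return data
        foo = Foo.parse_obj(foo)
        return {"foo_x": foo.x, "foo_y": foo.y, **data}


bar = BarFlatStrict.parse_obj(
    {"a": "spam", "b": 3.14, "foo": {"x": True, "y": ".", "z": 0}}
)
print(bar.dict())
```

Because the "foo" key is gone by the time the fields are checked, the forbid setting no longer complains about it.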
Original post (flatten single field)
If you need the nested Category model for database insertion, but you want a "flat" order model with category being just a string in the response, you should split that up into two separate models.
Then in the response model you can define a custom validator with pre=True to handle the case when you attempt to initialize it providing an instance of Category or a dict for category.
Here is what I suggest:
from pydantic import BaseModel, validator
class Category(BaseModel):
    name: str
class OrderItemBase(BaseModel):
    name: str
    unit: str | None
    quantity: int
class OrderItemCreate(OrderItemBase):
    category: Category
class OrderItemResponse(OrderItemBase):
    category: str
    @validator("category", pre=True)
    def handle_category_model(cls, v: object) -> object:
        if isinstance(v, Category):
            return v.name
        if isinstance(v, dict) and "name" in v:
            return v["name"]
        return v
Here is a demo:
if __name__ == "__main__":
    insert_data = '{"name": "foo", "category": {"name": "bar"}, "quantity": 1}'
    insert_obj = OrderItemCreate.parse_raw(insert_data)
    print(insert_obj.json(indent=2))
    ...  # insert into DB
    response_obj = OrderItemResponse.parse_obj(insert_obj.dict())
    print(response_obj.json(indent=2))
Here is the output:
{
"name": "foo",
"unit": null,
"quantity": 1,
"category": {
"name": "bar"
}
}
{
"name": "foo",
"unit": null,
"quantity": 1,
"category": "bar"
}
One of the benefits of this approach is that the JSON Schema stays consistent with what you have on the model. If you use this in FastAPI that means the swagger documentation will actually reflect what the consumer of that endpoint receives. You could of course override and customize schema creation, but... why? Just define the model correctly in the first place and avoid headache in the future.
Try this when instantiating:
myCategory = Category(name="test cat")
OrderItems(
name="test",
category=myCategory.name,
unit="kg",
quantity=10)
Well, I was curious, so here's the insane way:
class Category(BaseModel):
    name: str = Field(alias="name")
class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")
    def json(self, *args, **kwargs) -> str:
        self.__dict__.update({'category': self.__dict__['category'].name})
        return super().json(*args, **kwargs)
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)
m.json()
And you get:
'{"name": "sodie", "category": "Dranks", "unit": "can", "quantity": 1}'
The sane way would probably be:
class Category(BaseModel):
    name: str = Field(alias="name")
class OrderItems(BaseModel):
    name: str = Field(alias="name")
    category: Category = Field(alias="category")
    unit: Union[str, None] = Field(alias="unit")
    quantity: int = Field(alias="quantity")
c = Category(name='Dranks')
m = OrderItems(name='sodie', category=c, unit='can', quantity=1)
r = m.dict()
r['category'] = r['category']['name']

in pydantic.validators.find_validators TypeError: issubclass() arg 1 must be a class

Hello I am reading a JSON with the following format:
{
"1": {"id":1, "type": "a"},
2: {"id":2, "type": "b"},
"3": {"id":3, "type": "c"},
"5": {"id":4, "type": "d"}
}
As you can see, the keys are numbers but are not consecutive.
So I have the following BaseModel for the nested dict:
@validate_arguments
class ObjI(BaseModel):
    id: int
    type: str
The question is how I can validate that all items in the dict are ObjI without using:
objIs = json.load(open(path))
assert type(objIs) == dict
for objI in objIs.values():
    assert type(objI) == dict
    ObjI(**objI)
I tried with:
@validate_arguments
class ObjIs(BaseModel):
    ObjIs: Dict[Union[str, int], ObjI]
EDIT
The error validating the previous is:
in pydantic.validators.find_validators TypeError: issubclass() arg 1 must be a class
Is this possible?
Thanks
You could change your model definitions to use a custom root type (no need for the validate_arguments decorators):
from pydantic import BaseModel
from typing import Dict
class ObjI(BaseModel):
    id: int
    type: str
class ObjIs(BaseModel):
    __root__: Dict[int, ObjI]
The model can now be initialised with the JSON data, e.g. like this:
import json
with open("/path/to/data") as file:
    data = json.load(file)
objis = ObjIs.parse_obj(data)
If data contains invalid types (or has missing fields), parse_obj() will raise a ValidationError.
For example, if data looked like this:
data = {
    "1": {"id": "x", "type": "a"},  # wrong type for id
    2: {"id": 2, "type": "b"},
    "3": {"id": 3, "type": "c"},
    "4": {"id": 4, "type": "d"},
}
objs = ObjIs.parse_obj(data)
it would result in:
pydantic.error_wrappers.ValidationError: 1 validation error for ObjIs
__root__ -> 1 -> id
value is not a valid integer (type=type_error.integer)
which tells us that the id of the object with key 1 has an invalid type.
(You can catch and handle a ValidationError like any other exception in Python.)
(The pydantic docs also recommend to implement custom __iter__ and __getitem__ methods on the model if you want to access the items in the __root__ field directly.)
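Both remarks can be combined in one short sketch: the model gets __iter__ and __getitem__ so its items can be reached directly, and a small helper catches the ValidationError. The helper name load_objis is made up for this example, and the import fallback only lets the snippet run where Pydantic 2 is installed, which ships the 1.x API as pydantic.v1.

```python
from typing import Dict

try:  # Pydantic 2 exposes the 1.x API under `pydantic.v1`
    from pydantic.v1 import BaseModel, ValidationError
except ImportError:  # plain Pydantic 1.x
    from pydantic import BaseModel, ValidationError


class ObjI(BaseModel):
    id: int
    type: str


class ObjIs(BaseModel):
    __root__: Dict[int, ObjI]

    def __iter__(self):
        # iterate over the keys, like a plain dict
        return iter(self.__root__)

    def __getitem__(self, key):
        return self.__root__[key]


def load_objis(data):
    """Return the parsed collection, or None if validation fails."""
    try:
        return ObjIs.parse_obj(data)
    except ValidationError as err:
        print(f"invalid data: {err}")
        return None


objis = load_objis({"1": {"id": 1, "type": "a"}, 2: {"id": 2, "type": "b"}})
bad = load_objis({"1": {"id": "x", "type": "a"}})
print(objis[1].type, sorted(objis))
```

Note that the string key "1" ends up as the integer 1 because the root type declares int keys.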

How to Read URL param and body typing in Fast API Python

I want to create a generic endpoint definition in FastAPI that reads a URL path parameter and then calls a specific method to do the deserialization.
But I always get
422 Unprocessable Entity
So I expect that it works like so:
/answer/aaa -> handle_generic_answer -> read_item_aaa, type body to ModelAAA
/answer/bbb -> handle_generic_answer -> read_item_bbb, type body to ModelBBB
etc.
Here's the generic endpoint code:
@app.post("/answer/{type}")
def handle_generic_answer(type: str, item):
    # I also tried
    # def handle_generic_answer(type: str, item: Any):
    # or
    # def handle_generic_answer(type: str, item: Optional):
    switcher = {
        'aaaa': read_item_aaa,
        'bbb': read_item_bbb,
        'nothing': unrecognised_answer
    }
    func = switcher.get(type, unrecognised_answer)
    print('answer >> ' + type)
    func(item)
then I have separate methods called based on a type value:
def read_item_aaa(item: ModelAAA):
    update_aaa(item)
    return {"type": "aaa", "result": "success"}
def read_item_bbb(item: ModelBBB):
    update_bbb(item)
    return {"type": "bbb", "result": "success"}
and a default -
def unrecognised_answer(type):
    print("unrecognised_answer")
    raise HTTPException(status_code=400, detail="answer type not found")
    return {}
models are defined like this:
from pydantic import BaseModel, Field
class ModelAAA(BaseModel):
    field1: str
    field2: list = []
But whether I call
http://localhost:8000/answer/aaa
or http://localhost:8000/answer/some-other-url
I always get 422:
{
"detail": [
{
"loc": [
"query",
"item"
],
"msg": "field required",
"type": "value_error.missing"
}
]
}
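You forgot to annotate the body parameter item.
Without a Pydantic model annotation, item is treated as a required query string parameter, which is why the error location is ["query", "item"]. For example:
@app.post("/answer/{type}")
def handle_generic_answer(type: str, item: Union[ModelAAA, ModelBBB]):
    ...  # rest of the handler unchanged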
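To get a feeling for how such a Union annotation is resolved, here is a small sketch that leaves FastAPI out so it runs on its own. It only shows Pydantic 1.x behaviour: the members of the Union are tried from left to right and the first one that validates wins, which should also be what decides which model the endpoint receives. ModelBBB is not shown in the question, so its field value is purely hypothetical; the import fallback lets the snippet run where Pydantic 2 is installed, which ships the 1.x API as pydantic.v1.

```python
from typing import Union

try:  # Pydantic 2 exposes the 1.x API under `pydantic.v1`
    from pydantic.v1 import BaseModel, parse_obj_as
except ImportError:  # plain Pydantic 1.x
    from pydantic import BaseModel, parse_obj_as


class ModelAAA(BaseModel):
    field1: str
    field2: list = []


class ModelBBB(BaseModel):
    value: int  # hypothetical field, ModelBBB is not defined in the question


AnswerBody = Union[ModelAAA, ModelBBB]

# ModelAAA is tried first and accepts this payload
first = parse_obj_as(AnswerBody, {"field1": "hello"})
# ModelAAA fails here (field1 is missing), so ModelBBB is used
second = parse_obj_as(AnswerBody, {"value": 3})
print(type(first).__name__, type(second).__name__)
```

If two models accept the same payload, the order inside the Union matters, so put the more specific model first.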
