I am looking for something like pydantic.Field(discriminator=x) that scales to a large number of dataclasses.
I ask for a solution other than the above because, as I understand it, the out-of-the-box Field discriminator requires the user to write a Union[...] of types, and I think that is infeasible to write (not to mention maintain) for 100+ types in the type hint.
BONUS: A solution that also maintains type hints s.t. I can run mypy or similar typechecks after parsing in the data would be awesome - but I think I can engineer in that bit if I just figure out the best way to read in the data first.
ref:
(https://pydantic-docs.helpmanual.io/usage/types/#discriminated-unions-aka-tagged-unions)
Example of code I would like to run (modified from pydantic's website)
from typing import Literal, Union
from pydantic import BaseModel, Field, ValidationError
class Base(BaseModel):
    t: str  # the discriminator field!

class One(Base):
    t: Literal['one']

class Two(Base):
    t: Literal['two']

...  # many, many dataclasses (generated from a schema)

class NinetyNineThousand(Base):
    t: Literal['big number!']

class Model(BaseModel):
    pet: Union[One, Two, ... ,        # This part is where I look for
               ..., FourtyTwo, ... ,  # something more elegant
               NinetyNineThousand
               ] = Field(..., discriminator='t')

test = Model(pet={'t': 'sixtynine'})
assert isinstance(test.pet, SixtyNine)  # should be a SixtyNine
PS. If someone from the attrs team sees this - now is a great chance to mint a new fan by giving an elegant solution! ;)
Related
I am trying to automatically convert a Pydantic model to a DB schema. To do that, I am recursively looping through a Pydantic model's fields to determine the type of field.
As an example, I have this simple model:
from typing import List
from pydantic import BaseModel
class TestModel(BaseModel):
    tags: List[str]
I am recursing through the model using the __fields__ property as described here: https://docs.pydantic.dev/usage/models/#model-properties
If I do type(TestModel).__fields__['tags'] I see:
ModelField(name='tags', type=List[str], required=True)
I want to programmatically check if the ModelField type has a List origin. I have tried the following, and none of them work:
type(TestModel).__fields__['tags'].type_ is List[str]
type(TestModel).__fields__['tags'].type_ == List[str]
typing.get_origin(type(TestModel).__fields__['tags'].type_) is List
typing.get_origin(type(TestModel).__fields__['tags'].type_) == List
Frustratingly, this does return True:
type(TestModel).__fields__['tags'].type_ is str
What is the correct way for me to confirm a field is a List type?
Pydantic has the concept of the shape of a field. These shapes are encoded as integers and available as constants in the fields module. The more-or-less standard types have been accommodated there already. If a field was annotated with list[T], then the shape attribute of the field will be SHAPE_LIST and the type_ will be T.
The type_ refers to the element type in the context of everything that is not SHAPE_SINGLETON, i.e. with container-like types. This is why you get str in your example.
Thus for something as simple as list, you can simply check the shape against that constant:
from pydantic import BaseModel
from pydantic.fields import SHAPE_LIST
class TestModel(BaseModel):
    tags: list[str]
    other: tuple[str]
tags_field = TestModel.__fields__["tags"]
other_field = TestModel.__fields__["other"]
assert tags_field.shape == SHAPE_LIST
assert other_field.shape != SHAPE_LIST
If you want more insight into the actual annotation of the field, that is stored in the annotation attribute of the field. With that you should be able to do all the typing related analyses like get_origin.
That means another way of accomplishing your check would be this:
from typing import get_origin
from pydantic import BaseModel
class TestModel(BaseModel):
    tags: list[str]
    other: tuple[str]
tags_field = TestModel.__fields__["tags"]
other_field = TestModel.__fields__["other"]
assert get_origin(tags_field.annotation) is list
assert get_origin(other_field.annotation) is tuple
Sadly, neither of those attributes is officially documented anywhere as far as I know, but the beauty of open source is that we can just check ourselves. Neither the attributes nor the shape constants are obfuscated, protected or made private in any of the usual ways, so I'll assume these are stable (at least until Pydantic v2 drops).
I generated a Pydantic model and would like to import it into SQLModel. Since said model does not inherit from the SQLModel class, it is not registered in the metadata which is why
SQLModel.metadata.create_all(engine)
just ignores it.
In this discussion I found a way to manually add models:
SQLModel.metadata.tables["hero"].create(engine)
But doing so throws a KeyError for me.
SQLModel.metadata.tables["sopro"].create(engine)
KeyError: 'sopro'
My motivation for tackling the problem this way is that I want to generate an SQLModel from a simple dictionary like this:
model_dict = {"feature_a": int, "feature_b": str}
And in this SO answer, I found a working approach. Thank you very much in advance for your help!
As far as I know, it is not possible to simply convert an existing Pydantic model to an SQLModel at runtime. (At least as of now.)
There are a lot of things that happen during model definition. There is a custom meta class involved, so there is no way that you can simply substitute a regular Pydantic model class for a real SQLModel class, short of manually monkeypatching all the missing pieces.
That being said, you clarified that your actual motivation was to be able to dynamically create an SQLModel class at runtime from a dictionary of field definitions. Luckily, this is in fact possible. All you need to do is utilize the Pydantic create_model function and pass the correct __base__ and __cls_kwargs__ arguments:
from pydantic import create_model
from sqlmodel import SQLModel
field_definitions = {
    # your field definitions here
}

Hero = create_model(
    "Hero",
    __base__=SQLModel,
    __cls_kwargs__={"table": True},
    **field_definitions,
)
With that, SQLModel.metadata.create_all(engine) should create the corresponding database table according to your field definitions.
See this question for more details.
Be sure to use the correct form for the field definitions, as the example you gave would not be valid. As the documentation says, you need to define fields in the form of 2-tuples (or just a default value):
model_dict = {
    "feature_a": (int, ...),
    "feature_b": (str, ...),
    "feature_c": 3.14,
}
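Putting it together with that corrected model_dict, a minimal sketch might look like this (the id primary-key field and the SQLite URL are assumptions added for illustration; a mapped table needs a primary key):

from typing import Optional
from pydantic import create_model
from sqlmodel import Field, SQLModel, create_engine

Sopro = create_model(
    "Sopro",
    __base__=SQLModel,
    __cls_kwargs__={"table": True},
    id=(Optional[int], Field(default=None, primary_key=True)),  # assumed primary key
    **model_dict,
)

engine = create_engine("sqlite:///database.db")
SQLModel.metadata.create_all(engine)  # now also creates the "sopro" table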
Hope this helps.
I wanted to know what the difference is between:
from pydantic import BaseModel, Field
class Person(BaseModel):
    name: str = Field(..., min_length=1)
And:
from pydantic import BaseModel, constr
class Person(BaseModel):
    name: constr(min_length=1)
Both seem to perform the same validation (even raise the exact same exception info when name is an empty string). Is it just a matter of code style? Is one of them preferred over the other?
Also, if I wanted to include a list of nonempty strings as an attribute, which of these ways do you think would be better?:
from typing import List
from pydantic import BaseModel, constr
class Person(BaseModel):
    languages: List[constr(min_length=1)]
Or:
from typing import List
from pydantic import BaseModel, validator

class Person(BaseModel):
    languages: List[str]

    @validator('languages', each_item=True)
    def check_nonempty_strings(cls, v):
        if not v:
            raise ValueError('Empty string is not a valid language.')
        return v
EDIT:
FWIW, I am using this for a FastAPI app.
EDIT2:
For my 2nd question, I think the first alternative is better, as it includes the length requirement in the Schema (and so it's in the documentation)
constr and Field don't serve the same purpose.
constr is a constrained type that adds validation rules for that specific type (str). There are equivalents for all the classic Python types.
arguments of constr:
strip_whitespace: bool = False: removes leading and trailing whitespace
to_lower: bool = False: turns all characters to lowercase
to_upper: bool = False: turns all characters to uppercase
strict: bool = False: controls type coercion
min_length: int = None: minimum length of the string
max_length: int = None: maximum length of the string
curtail_length: int = None: shrinks the string length to the set value when it is longer than the set value
regex: str = None: regex to validate the string against
As you can see, those arguments let you constrain the str value itself, not pydantic's behaviour with respect to the field.
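For instance, a hypothetical constrained handle combining a few of those arguments might look like this (the Account model is just an illustration):

from pydantic import BaseModel, constr

class Account(BaseModel):
    # whitespace stripped, lowercased, then length- and regex-checked
    handle: constr(
        strip_whitespace=True,
        to_lower=True,
        min_length=3,
        max_length=20,
        regex=r"^[a-z0-9_]+$",
    )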
Field serves a different purpose: it is a way of customizing fields, all fields, not only str, and it adds 18 customization variables that you can find here.
Is it just a matter of code style? Is one of them preferred over the other?
For the specific case of str it is a matter of code style; which one is preferred only depends on your use case.
In general it is better not to mix different syntaxes together, and since you often need Field(), you will see it used often.
A classic use case is an API response that sends JSON objects in camelCase or PascalCase; you would use field aliases to match those objects while working with their attributes in snake_case.
Example:
from pydantic import BaseModel, Field

class Voice(BaseModel):
    name: str = Field(None, alias='ActorName')
    language_code: str = None
    mood: str = None
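For instance, with a hypothetical PascalCase payload, the alias maps the external key onto the snake_case attribute (using pydantic v1's parse_obj):

voice = Voice.parse_obj({'ActorName': 'Sam', 'language_code': 'en-US'})
print(voice.name)  # 'Sam'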
For your 2nd question you are right: using constr is surely the best approach, since the validation rule will be added into the OpenAPI doc.
If you want to learn more about limitations and field rule enforcement, check this.
This link shows the methods that do and don't work for pydantic and mypy together: https://lyz-code.github.io/blue-book/coding/python/pydantic_types/#using-constrained-strings-in-list-attributes
The best option for my use case was to make a class that inherits from pydantic.ConstrainedStr, like so:
import re
import pydantic
from typing import List
...

class Regex(pydantic.ConstrainedStr):
    regex = re.compile("^[0-9a-z_]*$")

class Data(pydantic.BaseModel):
    regex: List[Regex]
    # regex: list[Regex] if you are on 3.9+
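As a quick sanity check (hypothetical values, assuming pydantic v1), the constrained type behaves as expected:

data = Data(regex=["abc_123"])   # passes: the item matches the pattern
try:
    Data(regex=["NOT-VALID"])    # fails the regex
except pydantic.ValidationError as err:
    print(err)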
I'm learning a new tool called SQLModel, by Sebastian Ramirez (The FastAPI creator).
For basic CRUD operations in SQLModel, the docs teach you it's necessary to set up a model entity like this:
from typing import Optional, List
from sqlmodel import Field, SQLModel, Relationship
class RoomBase(SQLModel):
    name: str
    is_ensuite: bool
    current_occupants: Optional[List[str]]
    house_id: Optional[int] = Field(default=None, foreign_key="house.id")

class Room(RoomBase, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    house: Optional["House"] = Relationship(back_populates="rooms")

class RoomCreate(RoomBase):
    pass

class RoomRead(RoomBase):
    id: int

class RoomUpdate(SQLModel):
    name: Optional[str] = None
    is_ensuite: Optional[bool] = None
    current_occupants: Optional[List[str]] = None
    house_id: Optional[int] = None
My example above will create a model called Room, which is part of a House. This would have to be repeated for every new model class, meaning I can forget about putting multiple models in the same file.
Lots of code for a little CRUD, right!?
Since it's likely that I will use the same CRUD setup 90% of the time (e.g. I will always want all the fields to be editable on an update, or I will always only need the ID for a read, etc.), it got me wondering whether the above could be abstracted, so that whole file didn't have to be repeated for EVERY SINGLE database entity.
Is it possible in Python to pass in fields and types by means of inheritance or otherwise, such that I would only need to write a generic version of the above code once, rather than having to write it all out for every model?
It appears you are using FastAPI. If so, what about fastapi-crudrouter? It did the bulk of the work for me. While googling for the fastapi-crudrouter link, I found another project, FastAPIQuickCrud. I have only skimmed it, but it seems to solve the same problem.
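A minimal sketch of the fastapi-crudrouter approach, roughly following its README (the Potato model and the in-memory backend come from that README; the exact API may differ between versions):

from fastapi import FastAPI
from pydantic import BaseModel
from fastapi_crudrouter import MemoryCRUDRouter as CRUDRouter

class Potato(BaseModel):
    id: int
    color: str
    mass: float

app = FastAPI()
# generates the usual create/read/update/delete routes for the model
app.include_router(CRUDRouter(schema=Potato))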
I have some Python classes that relate to one another; they attempt to mimic a GraphQL schema. (The schema itself is not relevant; I post here the base case to reproduce the issue.)
The GraphQL schema looks like this:
type User {
  name: String
  orders: [Order]
}

type Order {
  key: String
  user: User
}
From a schema-design point of view, there is nothing wrong with this schema; it's valid, and I already have a database running with these relationships (it just means: a user may have several orders, and an order has only one user that created it).
It's on the Python side that things get a little messy.
I would expect the following code to work:
file: models/Model.py
import attr
@attr.s
class Model():
    pass  # Model internal workings not relevant to the example
file: models/User.py
from typing import List
import attr
from . import Model
@attr.s
class User(Model):
    name: str = 'Name'
    orders: List[Order] = attr.ib(factory=list)
file: models/Order.py
import attr
from . import Model
@attr.s
class Order(Model):
    key: str = 'abc'
    user: User = attr.ib(factory=User)
then I can do things like this:
file: main.py
import models as m
user = m.User.query(name='John', with='orders')
user.name # "John"
user.orders # [m.Order(key='1'), m.Order(key='2'), m.Order(key='3')...]
order = m.Order.query(key='1', with='user')
order.key # "1"
order.user # m.User(name="John")
This code does not work due to the circular dependency (User needing Order type to be defined earlier, and Order requiring User).
The workaround I found was late-importing the models using importlib:
# current solution:
# using the importlib to import dynamically
from typing import List
import attr
from helpers import convert_to, list_convert_to
# Note: "convert_to" receives a class name and returns a function to instantiate it dynamically

@attr.s
class Model():
    pass

@attr.s
class User(Model):
    name: str = 'Name'
    orders: List[Model] = attr.ib(factory=list_convert_to('Order'))

@attr.s
class Order(Model):
    key: str = 'abc'
    user: Model = attr.ib(factory=convert_to('User'))
This solution works, but it loses the ability to know the field types beforehand, and I think it is slower when building complex relations (hundreds of items with objects several levels deep).
This is why I am looking for better ways to solve this problem. Any ideas?
Assuming you're using Python 3.7 or later, the following line will make it work:
from __future__ import annotations
It also allows you to refer to a class while defining it. E.g.
class C:
    @classmethod
    def factory(cls) -> C:
        ...
works now.
If your classes are defined in multiple files and you get a circular dependency due to that, you can guard the imports using
from typing import TYPE_CHECKING
# ...
if TYPE_CHECKING:
    from module import User
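Applied to the models from the question, a sketch might look like this (assumptions: auto_attribs=True is added so the annotated fields are picked up by attrs, and only User.py defers its import, since Order.py genuinely needs User at runtime for the factory; the file layout mirrors the question):

file: models/User.py

from __future__ import annotations

from typing import List, TYPE_CHECKING
import attr

from . import Model

if TYPE_CHECKING:
    # only needed by type checkers; avoids the runtime circular import
    from .Order import Order

@attr.s(auto_attribs=True)
class User(Model):
    name: str = 'Name'
    orders: List[Order] = attr.ib(factory=list)

file: models/Order.py

from __future__ import annotations

import attr

from . import Model
from .User import User  # a real import: needed at runtime by the factory

@attr.s(auto_attribs=True)
class Order(Model):
    key: str = 'abc'
    user: User = attr.ib(factory=User)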