I'm following this tutorial and adapting it to my needs, in this case to build a SQL module that records the data collected by a webhook from GitLab issues.
For the database module I'm using the SQLAlchemy library and PostgreSQL as the database engine.
So I would like to resolve some doubts I have regarding the use of the Pydantic library, in particular with this example.
From what I've read, Pydantic is a library used for data validation, using classes with attributes.
But I don't quite understand some things... is the integration of Pydantic strictly necessary? I understand the purpose of using Pydantic, but I don't understand how Pydantic is supposed to integrate with the SQLAlchemy models.
In the tutorial, models.py has the following content:
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship
from .database import Base
class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True, index=True)
    email = Column(String, unique=True, index=True)
    hashed_password = Column(String)
    is_active = Column(Boolean, default=True)

    items = relationship("Item", back_populates="owner")

class Item(Base):
    __tablename__ = "items"

    id = Column(Integer, primary_key=True, index=True)
    title = Column(String, index=True)
    description = Column(String, index=True)
    owner_id = Column(Integer, ForeignKey("users.id"))

    owner = relationship("User", back_populates="items")
And schemas.py has the following content:
from typing import Optional
from pydantic import BaseModel
class ItemBase(BaseModel):
    title: str
    description: Optional[str] = None

class ItemCreate(ItemBase):
    pass

class Item(ItemBase):
    id: int
    owner_id: int

    class Config:
        orm_mode = True

class UserBase(BaseModel):
    email: str

class UserCreate(UserBase):
    password: str

class User(UserBase):
    id: int
    is_active: bool
    items: list[Item] = []

    class Config:
        orm_mode = True
I know that the primary means of defining objects in Pydantic is via models, and that models are simply classes that inherit from BaseModel.
Why does it create ItemBase, and then ItemCreate and Item that inherit from ItemBase?
Does ItemBase take the fields that are strictly necessary for the Item table, and define their types?
I have seen that the ItemCreate class is used later in crud.py to create a user. In my case, would I have to do the same with the incidents? I mean, would I have to create a class like this:
class IssueCreate(BaseModel):
    pass
Here are my examples, trying to follow the same workflow:
models.py
from sqlalchemy import Column, Integer, String, TIMESTAMP
from .database import Base

class Issues(Base):
    __tablename__ = 'issues'

    id = Column(Integer, primary_key=True)
    gl_assignee_id = Column(Integer, nullable=True)
    gl_id_user = Column(Integer, nullable=False)
    current_title = Column(String, nullable=False)
    previous_title = Column(String, nullable=True)
    created_at = Column(TIMESTAMP(timezone=False), nullable=False)
    updated_at = Column(TIMESTAMP(timezone=False), nullable=True)
    closed_at = Column(TIMESTAMP(timezone=False), nullable=True)
    action = Column(String, nullable=False)
And schemas.py
from pydantic import BaseModel
class IssueBase(BaseModel):
    updated_at: None
    closed_at: None
    previous_title: None

class Issue(IssueBase):
    id: int
    gl_task_id: int
    gl_assignee_id: int
    gl_id_user: int
    current_title: str
    action: str

    class Config:
        orm_mode = True
But I don't know if I'm doing it the right way; any suggestions are welcome.
The tutorial you mentioned is about FastAPI. Pydantic by itself has nothing to do with SQL, SQLAlchemy or relational databases. It is FastAPI that is showing you a way to use a relational database.
is the integration of pydantic strictly necessary [when using FastAPI]?
Yes. Pydantic is a requirement according to the documentation:
Requirements
Python 3.6+
FastAPI stands on the shoulders of giants:
Starlette for the web parts.
Pydantic for the data parts.
Why does it create ItemBase, and then ItemCreate and Item that inherit from ItemBase?
Pydantic models are the way FastAPI defines the schemas of the data that it receives (requests) and returns (responses). ItemCreate represents the data required to create an item. Item represents the data that is returned when items are queried. The fields that are common to ItemCreate and Item are placed in ItemBase to avoid duplication.
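For example, this split is what lets a single route accept one schema and return another. Here is a condensed sketch in the spirit of the tutorial (the crud helper, SessionLocal and module layout are assumed to be the tutorial's):

from fastapi import Depends, FastAPI
from sqlalchemy.orm import Session

from . import crud, schemas      # module layout as in the tutorial
from .database import SessionLocal

app = FastAPI()

def get_db():
    # per-request session, as defined in the tutorial
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.post("/users/{user_id}/items/", response_model=schemas.Item)
def create_item_for_user(
    user_id: int, item: schemas.ItemCreate, db: Session = Depends(get_db)
):
    # the request body is validated against ItemCreate (no id or owner_id);
    # the ORM object returned by crud is serialized through Item via orm_mode
    return crud.create_user_item(db=db, item=item, user_id=user_id)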
Does ItemBase take the fields that are strictly necessary for the Item table, and define their types?
ItemBase has the fields that are common to ItemCreate and Item. It has nothing to do with a table; it is just a way to avoid duplication. Every field of a Pydantic model must have a type; there is nothing unusual there.
in my case I would have to do the same with the incidents?
If you have a similar scenario where the schemas of the data that you receive (request) and the data that you return (response) have common fields (same name and type), you could define a model with those fields and have other models inherit from it to avoid duplication.
This could be a (probably simplistic) way of understanding FastAPI and pydantic:
FastAPI transforms requests to pydantic models. Those pydantic models are your input data and are also known as schemas (maybe to avoid confusion with other uses of the word model). You can do whatever you want with those schemas, including using them to create relational database models and persisting them.
Whatever data you want to return as a response needs to be transformed by FastAPI to a pydantic model (schema). It just happens that pydantic supports an orm_mode option that allows it to parse arbitrary objects with attributes instead of dicts. Using that option you can return a relational database model and FastAPI will transform it to the corresponding schema (using pydantic).
FastAPI uses the parsing and validation features of pydantic, but you have to follow a simple rule: the data that you receive must comply with the input schema and the data that you want to return must comply with the output schema. You are in charge of deciding whatever happens in between.
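Applied to the Issue example from the question, that rule might look like the sketch below. This is only one reasonable split of the fields, and the bare None annotations from the question are replaced with explicit Optional types (a field annotated with plain None would only ever accept None):

from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class IssueBase(BaseModel):
    # fields common to the webhook payload and the API responses
    gl_id_user: int
    current_title: str
    action: str
    gl_assignee_id: Optional[int] = None

class IssueCreate(IssueBase):
    # data received from the webhook
    created_at: datetime

class Issue(IssueBase):
    # data returned by the API, including DB-generated fields
    id: int
    created_at: datetime
    updated_at: Optional[datetime] = None
    closed_at: Optional[datetime] = None
    previous_title: Optional[str] = None

    class Config:
        orm_mode = True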
I am working on a project using Flask and SQLAlchemy. My colleagues and I found two ways to define a table. Both work, but what is the difference?
Possibility I
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

base = declarative_base()

class Story(base):
    __tablename__ = 'stories'

    user_id = Column(Integer, primary_key=True)
    email = Column(String(100), unique=True)
    password = Column(String(100), unique=True)
Possibility II
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import Integer, String

db = SQLAlchemy()

class Story(db.Model):
    __tablename__ = 'stories'

    user_id = db.Column(Integer, primary_key=True)
    email = db.Column(String(100), unique=True)
    password = db.Column(String(100), unique=True)
We want to choose one option, but which one?
It is obvious that both classes inherit from a different parent class, but what are these two possibilities used for?
Possibility 1 is raw SQLAlchemy declarative mapping.
Possibility 2 is Flask-SQLAlchemy.
Both map a class to a SQL table (or something more exotic in SQL) in a declarative style, i.e. the class is mapped to an automatically generated table.
Choosing which one to use however is a matter of opinion.
I'll say that using Flask-SQLAlchemy obviously locks the application to Flask, but that's basically a non-problem, since switching frameworks is very uncommon.
NB. __tablename__ is optional with Flask-SQLAlchemy.
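The practical difference shows up mostly in the setup: with raw declarative mapping you wire the engine and the session yourself, while Flask-SQLAlchemy derives them from the app configuration. A rough sketch of each, reusing the Story classes from the question (the connection URI is a placeholder):

# Possibility I: engine, session and table creation are managed by hand
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("postgresql://user:pass@localhost/db")  # placeholder URI
Session = sessionmaker(bind=engine)
base.metadata.create_all(engine)

session = Session()
session.add(Story(user_id=1, email="a@example.com", password="secret"))
session.commit()

# Possibility II: Flask-SQLAlchemy wires all of this into the Flask app
from flask import Flask

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/db"
db.init_app(app)

with app.app_context():
    db.create_all()
    db.session.add(Story(user_id=1, email="a@example.com", password="secret"))
    db.session.commit()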
I've got a Python 3.10.8 FastAPI application using SQLModel, and I'm having trouble with many-to-many relationships. I can define them okay, using strings to refer to the class on the other side of the many-to-many. Here is a simplified sample of one side of the many-to-many schema/model:
schedule.py file
class Schedule(ScheduleBase, table=True):
    __tablename__ = "schedules"

    id: Optional[int] = Field(default=None, primary_key=True)
    plans: List["Plan"] = Relationship(back_populates="schedules", link_model=SchedulePlanLink)
Here's the other side of the many-to-many schema/model:
class Plan(PlanBase, table=True):
    __tablename__ = "plans"

    id: Optional[int] = Field(default=None, primary_key=True)
    schedules: List["Schedule"] = Relationship(back_populates="plans", link_model=SchedulePlanLink)
And here's the association table between them:
class SchedulePlanLink(SchedulePlanLinkBase, table=True):
    __tablename__ = "schedule_plan_links"

    schedule_id: Optional[int] = Field(
        default=None, primary_key=True, foreign_key="schedules.id"
    )
    plan_id: Optional[int] = Field(
        default=None, primary_key=True, foreign_key="plans.id"
    )
This works and creates the expected tables, FKs, etc. The problem arises when I try to access the data. I have a SQLModel class that looks like this:
class ScheduleReadWithPlans(ScheduleRead):
    plans: Optional[List["PlanRead"]] = []
And I have to have this in order to read the data and return it via a route:
ScheduleReadWithPlans.update_forward_refs()
And I can get the schedule data with the list of plans. But if I try the same thing with the plan data, trying to get a list of schedules (with the appropriate classes defined), I end up with circular references.
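The mirror attempt for plans looks roughly like this (simplified; PlanRead and ScheduleRead are the corresponding read classes):

class PlanReadWithSchedules(PlanRead):
    schedules: Optional[List["ScheduleRead"]] = []

PlanReadWithSchedules.update_forward_refs()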
Any ideas about how to define/configure this kind of thing so that it works?
I am writing a FastAPI application and need to merge some properties from two SQLAlchemy model instances into a single Pydantic model for the response: some properties from object A and some from object B, returning a "consolidated" object.
from sqlalchemy import Boolean, Column, ForeignKey, Integer, Text
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class Message(Base):
    __tablename__ = 'message'
    id = Column(Integer, primary_key=True)
    subject = Column(Text)
    message = Column(Text)

class FolderLink(Base):
    __tablename__ = 'folder_link'
    id = Column(Integer, primary_key=True)
    message_id = Column(Integer, ForeignKey('message.id'))
    folder_id = Column(Integer, ForeignKey('folder.id'))
    is_read = Column(Boolean, nullable=False, default=False)
    message = relationship('Message')  # used below as link.message
In application code, I have a Message instance, needing all properties, and the relevant FolderLink instance, from which I need is_read.
My Pydantic schema looks like:
class MessageWithProperties(BaseModel):
    id: int  # from Message
    subject: str  # from Message
    message: str  # from Message
    is_read: bool  # from FolderLink
And in the view code, the only way I can seem to properly pass objects to the Pydantic model is like so:
objs = []
# [...]
for link in links:
    objs.append({**link.message.__dict__, **link.__dict__})
It feels wrong to have to use a dunder attribute this way. I had experimented with a custom @classmethod constructor on the Pydantic model, but that will not work, as I am not directly instantiating the models myself; they are instantiated by FastAPI as part of the response handling.
What is the proper way to do this?
What if, instead of mashing all the fields together, you create a top-level class that is built from two nested Pydantic models, one for Message and one for FolderLink? Something like:
import pydantic

class BaseSchema(pydantic.BaseModel):
    class Config:
        orm_mode = True

class MessageSchema(BaseSchema):
    id: int  # from Message
    subject: str  # from Message
    message: str  # from Message

class FolderLinkSchema(BaseSchema):
    is_read: bool  # from FolderLink

class CombinedSchema(BaseSchema):
    message: MessageSchema
    folderlink: FolderLinkSchema

objs = []
for link in links:
    objs.append(CombinedSchema(
        message=MessageSchema.from_orm(link.message),
        folderlink=FolderLinkSchema.from_orm(link),
    ))
Now you'll be able to access both the FolderLink and Message attributes from each object in your objs array.
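If it helps, wiring this into a route might look like the following sketch (the endpoint path and the query are placeholders; FastAPI serializes the nested schemas for the response):

from typing import List

from fastapi import FastAPI

app = FastAPI()

@app.get("/messages", response_model=List[CombinedSchema])
def list_messages():
    links = ...  # fetch the FolderLink rows here, e.g. via a session dependency
    return [
        CombinedSchema(
            message=MessageSchema.from_orm(link.message),
            folderlink=FolderLinkSchema.from_orm(link),
        )
        for link in links
    ]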
Question
Is it possible to replicate Marshmallow's dump_only feature using pydantic for FastAPI, so that certain fields are "read-only", without defining separate schemas for serialization and deserialization?
Context
At times, a subset of the attributes (e.g. id and created_date) for a given API resource are meant to be read-only and should be ignored from the request payload during deserialization (e.g. when POSTing to a collection or PUTting to an existing resource) but need to be returned with that schema in the response body for those same requests.
Marshmallow provides a convenient dump_only parameter that requires only one schema to be defined for both serialization and deserialization, with the option to exclude certain fields from either operation.
Existing Solution
Most attempts I've seen to replicate this functionality within FastAPI (i.e. FastAPI docs, GitHub Issue, Related SO Question) tend to define separate schemas for input (deserialization) and output (serialization) and define a common base schema for the shared fields between the two.
Based on my current understanding of this approach, it seems a tad inconvenient for a few reasons:
It requires the API developer to reserve separate namespaces for each schema, a problem that is exacerbated by following the practice of abstracting the common fields to a third "base" schema class.
It results in the proliferation of schema classes in APIs that have nested resources, since each level of nesting requires a separate input and output schema.
The OAS-compliant documentation displays the input/output schemas as separate definitions, when the consumer of that API only ever needs to be aware of a single schema, since the (de)serialization of those read-only fields should be handled properly by the API.
Example
Say we're developing a simple API for a survey with the following models:
from sqlalchemy.orm import declarative_base, relationship
from sqlalchemy import (
    func,
    Column,
    Integer,
    String,
    DateTime,
    ForeignKey,
)

Base = declarative_base()

class SurveyModel(Base):
    """Table that represents a collection of questions"""

    __tablename__ = "survey"

    # columns
    id = Column(Integer, primary_key=True, index=True)
    name = Column(String, nullable=False)
    created_date = Column(DateTime, default=func.now())

    # relationships
    questions = relationship("QuestionModel", backref="survey")

class QuestionModel(Base):
    """Table that contains the questions that comprise a given survey"""

    __tablename__ = "question"

    # columns
    id = Column(Integer, primary_key=True, index=True)
    survey_id = Column(Integer, ForeignKey("survey.id"))
    text = Column(String)
    created_date = Column(DateTime, default=func.now())
And we wanted a POST /surveys endpoint to accept the following payload in the request body:
{
    "name": "First Survey",
    "questions": [
        {"text": "Question 1"},
        {"text": "Question 2"}
    ]
}
And return the following in the response body:
{
    "id": 1,
    "name": "First Survey",
    "created_date": "2021-12-12T00:00:30",
    "questions": [
        {
            "id": 1,
            "text": "Question 1",
            "created_date": "2021-12-12T00:00:30"
        },
        {
            "id": 2,
            "text": "Question 2",
            "created_date": "2021-12-12T00:00:30"
        }
    ]
}
Is there an alternative way to make id and created_date read-only for both QuestionModel and SurveyModel other than defining the schemas like this?
from datetime import datetime
from typing import List

from pydantic import BaseModel

class QuestionIn(BaseModel):
    text: str

    class Config:
        extra = "ignore"  # ignores extra fields passed to the schema

class QuestionOut(QuestionIn):
    id: int
    created_date: datetime

class SurveyBase(BaseModel):
    name: str

    class Config:
        extra = "ignore"  # ignores extra fields passed to the schema

class SurveyOut(SurveyBase):
    id: int
    created_date: datetime

class SurveyQuestionsIn(SurveyBase):
    questions: List[QuestionIn]

class SurveyQuestionsOut(SurveyOut):
    questions: List[QuestionOut]
Just for comparison, here would be the equivalent schema using marshmallow:
from marshmallow import Schema, fields

class Question(Schema):
    id = fields.Integer(dump_only=True)
    created_date = fields.DateTime(dump_only=True)
    text = fields.String(required=True)

class Survey(Schema):
    id = fields.Integer(dump_only=True)
    created_date = fields.DateTime(dump_only=True)
    name = fields.String(required=True)
    questions = fields.List(fields.Nested(Question))
References
Marshmallow read-only/load-only fields
Existing Stack Exchange Question
Read-only fields issue on FastAPI repo
FastAPI documentation on Schemas
I am currently working with some legacy code that looks as follows:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, ForeignKey, Integer, Unicode
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

Base = declarative_base()

class Book(Base):
    __tablename__ = 'book'

    id = Column(Integer, primary_key=True)
    title = Column(Unicode)
    keywords = Column('keywords', ARRAY(TEXT), primary_key=False)
The keywords are currently kept as an array, but I'd like to flatten this out and move them into their own separate model:
class Keyword(Base):
    __tablename__ = 'keyword'

    id = Column(Integer, primary_key=True)
    book_id = Column(Integer, ForeignKey('book.id', ondelete='cascade'),
                     nullable=False)
    keyword = Column(Unicode)
How can I make it such that when a Book() is created, it also creates the accompanying keywords? As an intermediate step for migrating the API, I'd like to keep the current array column, but also have the accompanying Keyword() instances be created.
I could do this within an __init__ method, but would need to know what the current Session() was, in order to run a commit. I could also perhaps use a property attribute, attached to keywords, but am not sure how that would work given that I am working with a class that inherits from SQLAlchemy's base, and not with a regular class that I have defined. What's the correct way to do this?
You can use object_session to find out the session of a given instance.
But if you define a relationship between Book and Keyword, you should not even need to bother:
class Book(Base):
    # ...
    rel_keywords = relationship('Keyword', backref='book')

    def init_keyword_relationship(self):
        for kw in self.keywords:
            self.rel_keywords.append(Keyword(keyword=kw))  # list relationship: append, not add

sess = ...  # get_session...
books = sess.query(Book).all()
for book in books:
    book.init_keyword_relationship()
sess.commit()
However, I would do the migration once and get rid of the keywords array, in order not to add logic to keep the two in sync.
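If you go that route, the one-off migration could be as small as dropping the array column once init_keyword_relationship() has copied the data, e.g. with Alembic (a sketch; revision boilerplate omitted):

from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import ARRAY, TEXT

def upgrade():
    # the Keyword rows must already have been created from the array data
    op.drop_column('book', 'keywords')

def downgrade():
    # restores the column, but not its contents
    op.add_column('book', sa.Column('keywords', ARRAY(TEXT)))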