Performance one-to-many relationship in SQLAlchemy

I'm trying to define a one-to-many relationship with SQLAlchemy in which a Parent has many Child objects:
class Parent(Base):
    __tablename__ = "parent"
    id = Column(String, primary_key=True)
    children = relationship("Child")
class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    feed_type_id = Column(String, ForeignKey("parent.id"))
By business rules, a Parent has few children (between 10 and 30), and most of the time I need access to all of them, so I think it is a good idea for relationship() to load all children into memory to improve performance (first question: am I right?). Occasionally, though, I need to get one particular child, and I would rather not write something like:
def search_bar_attr(some_value):
    for bar in foo.bars:
        if bar.attr == some_value:
            return bar
lazy="dynamic" returns a query object that can be filtered further, but I think it is slower than an eagerly loaded collection because a dynamic relationship always queries the database.
Second question: Is there some configuration that covers all my needs?

You can construct the same query that lazy="dynamic" does by using .with_parent:
from sqlalchemy.orm import object_session
class Parent(Base):
    ...
    @property
    def children_dynamic(self):
        return object_session(self).query(Child).with_parent(self, Parent.children)
You can even add a function to reduce boilerplate if you have to write a lot of these:
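def dynamicize(rel):
    @property
    def _getter(self):
        # rel.mapper is the mapper of the related (child) class;
        # rel.parent would be the mapper that owns the relationship
        return object_session(self).query(rel.mapper.class_).with_parent(self, rel)
    return _getter
class Parent(Base):
    ...
    children = relationship("Child")
    children_dynamic = dynamicize(children)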
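A self-contained sketch of the same idea, for experimenting against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer and uses the standalone with_parent() function inside filter() rather than the Query.with_parent() method. The name column, the parent_id column and the children_query property are illustrative names, not part of the question's models.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (
    Session,
    declarative_base,
    object_session,
    relationship,
    with_parent,
)

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(String, primary_key=True)
    # Default lazy="select": the whole collection is loaded on first access.
    children = relationship("Child")

    @property
    def children_query(self):
        # Hypothetical helper: a Query restricted to this parent's children,
        # so any further filtering happens in SQL rather than in Python.
        session = object_session(self)
        return session.query(Child).filter(with_parent(self, Parent.children))


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String)  # assumed column, used only for the filter below
    parent_id = Column(String, ForeignKey("parent.id"))


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

session = Session(engine)
parent = Parent(id="p1", children=[Child(name=n) for n in ("a", "b", "c")])
session.add(parent)
session.flush()  # write the rows so they can be queried

# One child fetched by a SQL filter, without scanning the collection.
match = parent.children_query.filter(Child.name == "b").one()
# The ordinary collection is still available when all children are needed.
all_names = sorted(child.name for child in parent.children)
```

With this layout the relationship keeps its normal loading behaviour, and the property is only used on the occasions when a filtered lookup is wanted.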

You don't need a function like that one; you don't even need to load all of the child objects into memory.
When you want to search for a child with a certain attribute, you can do:
# get a session object, usually with sessionmaker() configured to bind to your engine instance
# this returns a list of all child objects that match the filter
c = session.query(Child).filter_by(some_attribute="some value here").all()
# or, to get the children with a certain attribute that belong to a certain parent:
# get the parent object (p)
c = session.query(Child).filter_by(feed_type_id=p.id).filter_by(some_attr="some attribute that belongs to children of the p parent object").all()

No one strategy will give you everything. However, you can choose a default strategy and then override it.
My recommendation would be to:
1) Add lazy="joined" to your relationship so that, by default, loading a parent also loads all of its children.
2) In cases where you want to query for a set of children based on properties of their parents but don't need the parent objects, use join() on the query and filter on both the parent and the child.
3) In cases where you need a query similar to what lazy="dynamic" would give you, use a loader option such as sqlalchemy.orm.lazyload to turn off the lazy="joined" eager loading for that query (sqlalchemy.orm.defer does the same job for column attributes, not relationships), and then use with_parent to construct the query.
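The default-then-override approach can be sketched as follows, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The models are simplified stand-ins for the ones in the question, and the statement recorder is only there to make the emitted SQL visible.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, lazyload, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # Default strategy: children come back in the same SELECT as the parent.
    children = relationship("Child", lazy="joined")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    # Keep the text of every statement sent to the database.
    statements.append(statement)


with Session(engine) as session:
    session.add(Parent(id=1, children=[Child(), Child()]))
    session.commit()

# Default behaviour: one SELECT with a JOIN loads parent and children.
with Session(engine) as session:
    statements.clear()
    parent = session.query(Parent).filter(Parent.id == 1).one()
    eager_child_count = len(parent.children)
    eager_sql = statements[-1]
    eager_statement_total = len(statements)

# Per-query override: lazyload() switches the eager JOIN off for this query.
with Session(engine) as session:
    statements.clear()
    parent = (
        session.query(Parent)
        .options(lazyload(Parent.children))
        .filter(Parent.id == 1)
        .one()
    )
    overridden_sql = statements[-1]
```

The relationship-level setting only chooses the default; any individual query can still pick a different loader option.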

Related

FastAPI + SqlAlchemy: Left join

I am new to using both FastAPI and SqlAlchemy with PostgreSQL. I've been working on creating some models, which started out fine.
class Parent(Base):
    __tablename__ = "parents"
    uid = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")
class Child(Base):
    __tablename__ = "children"
    uid = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.uid"))
    parent = relationship("Parent", back_populates="children")
This part works as I would expect, and I can create Parent and Child objects, with Child models having parent_id fields as ForeignKeys that reference the Parent.uid fields.
My issue arises when I try to obtain a parent and its children in a query. For this I use the SqlAlchemy query function:
session.query(Parent).outerjoin(Child).all()
In my mind this should give me a parent object that looks something like this:{ uid: 1, children: [{ uid: 111 }] }. However all I get is: { uid: 1 }. While it does not throw an error, it doesn't show me the child data. When I look at the query used by SqlAlchemy (using query.statement.compile(compile_kwargs={"literal_binds": True})) I get:
SELECT parents.uid, children.uid AS uid_1 FROM parents LEFT OUTER JOIN children ON parents.uid = children.parent_id;
Which is about what I would expect and when I run this in the psql shell I get the expected result:
uid | uid_1
-----+-------
1 | 111
I've tried various ways to define the relationship, both in the joins and in the model declarations (backrefs, explicit joins such as .outerjoin(Child, Child.parent_id == Parent.uid), etc.), but nothing I do gives me the output I am looking for from the SqlAlchemy query. Any help is very much appreciated.

When using the ORM, you usually reference relationships off the parent like this:
for parent in session.query(Parent).all():
    for child in parent.children:
        # do something here
        pass
With lazy loading, this executes a query every time the children of a parent are fetched, which is not what you want.
There are a lot of loading strategies for loading the children. One that emulates what you describe is a joined load, like this:
from sqlalchemy.orm import joinedload
q = session.query(Parent).options(joinedload(Parent.children))
for parent in q.all():
    for child in parent.children:
        # these children should already be loaded
        pass
You would use a regular join like in your example if you needed to filter the parent based on a child column.
You can load different relationships in different ways, both per query as above and beforehand by setting the loading strategy on the relationship itself. You can read about these things below:
loading_relationships
joinedload
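To see the difference between the two loops, the following sketch counts the statements each one sends to the database. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; count_statements is a hypothetical helper written for this comparison.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    uid = Column(Integer, primary_key=True)
    children = relationship("Child", back_populates="parent")


class Child(Base):
    __tablename__ = "children"
    uid = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.uid"))
    parent = relationship("Parent", back_populates="children")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


# Three parents with two children each.
with Session(engine) as session:
    session.add_all([Parent(children=[Child(), Child()]) for _ in range(3)])
    session.commit()


def count_statements(use_joinedload):
    """Walk every parent's children; return (statements emitted, children seen)."""
    statements.clear()
    with Session(engine) as session:
        query = session.query(Parent)
        if use_joinedload:
            query = query.options(joinedload(Parent.children))
        children_seen = sum(len(parent.children) for parent in query.all())
    return len(statements), children_seen


lazy_statements, lazy_children = count_statements(use_joinedload=False)
joined_statements, joined_children = count_statements(use_joinedload=True)
```

With lazy loading, the count grows with the number of parents (one SELECT for the parents plus one per parent), while the joinedload version stays at a single SELECT.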

How to use the order_by defined on the relationship in SQLAlchemy and contains_eager?

The Zen of Joined Eager Loading docs recommends using contains_eager() if we want to keep the relationship order defined in the model.
"If we wanted to use just one JOIN for collection loading as well as ordering, we use the contains_eager() option, described in Routing Explicit Joins/Statements into Eagerly Loaded Collections below."
But the following example seems to behave otherwise. I must be missing something, but unsure what.
class Parent(Base):
    __tablename__ = "parent"
    parent_id = Column(types.Integer, primary_key=True)
    name = Column(types.String(200), nullable=False)
class Child(Base):
    __tablename__ = "child"
    order = Column(types.Integer, default=0)
    name = Column(types.String(200))
    parent_id = Column(types.Integer, ForeignKey(Parent.parent_id))
    parent = relationship(
        Parent,
        backref=backref(
            "children",
            cascade="all,delete",
            order_by="Child.order",
        ),
    )
query = session.query(Parent).options(
    contains_eager(Parent.children)
).filter(Parent.parent_id == 99).filter(Child.name == "foo")
Generates the following SQL:
SELECT parent.parent_id, parent.name,
       child.order, child.name
FROM parent, child
WHERE parent.parent_id = 99 AND child.name = 'foo'
For some reason,
ORDER BY child.order
is missing, even though it's defined in the relationship(). Any hints?
It works fine if the order_by is specified at query time, but I want to avoid writing the same ordering criteria multiple times.

The documentation is correct: it refers to the fact that most out-of-the-box eager loading methods modify the query, and the result might not be optimal.
The suggestion is then to use contains_eager, where:
1) the user is responsible for constructing the correct query (including joins, filters, ordering, etc.)
2) by using contains_eager, the user hints to SA that the specified relationship is included in the query.
The way to load the relationship eagerly would be to use joinedload:
q_joined = (
    session
    .query(Parent)
    .options(joinedload(Parent.children))
    .filter(Parent.parent_id == parent_id)
)
But you cannot apply these additional filters in this case.
Using the contains_eager you would do:
q_manual = (
    session
    .query(Parent)
    .join(Child) # MUST HAVE THIS
    .options(contains_eager(Parent.children))
    .filter(Parent.parent_id == 99)
    # .filter(Child.name == "foo") # you can add this, but you are "tricking" SA into believing that only these 'Child' are part of the Parent.children relationship.
    .order_by(Parent.parent_id, Child.order) # JUST ADD THIS to solve the ordering
)
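A runnable sketch of both variants, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The child_id primary key is an assumption (the question's Child model does not show one), and the relationship is declared directly on Parent instead of through a backref to keep the example short.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import (
    Session,
    contains_eager,
    declarative_base,
    joinedload,
    relationship,
)

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    parent_id = Column(Integer, primary_key=True)
    # Ordering declared once, on the relationship.
    children = relationship("Child", order_by="Child.order")


class Child(Base):
    __tablename__ = "child"
    child_id = Column(Integer, primary_key=True)  # assumed primary key
    order = Column(Integer, default=0)
    parent_id = Column(Integer, ForeignKey("parent.parent_id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Insert the children deliberately out of order.
with Session(engine) as session:
    session.add(Parent(parent_id=99, children=[Child(order=n) for n in (3, 1, 2)]))
    session.commit()

# contains_eager: the query is hand-built, so the ORDER BY is explicit too.
with Session(engine) as session:
    parent = (
        session.query(Parent)
        .join(Parent.children)
        .options(contains_eager(Parent.children))
        .filter(Parent.parent_id == 99)
        .order_by(Child.order)
        .one()
    )
    explicit_order = [child.order for child in parent.children]

# joinedload: SQLAlchemy builds the JOIN and applies the relationship's order_by.
with Session(engine) as session:
    parent = (
        session.query(Parent)
        .options(joinedload(Parent.children))
        .filter(Parent.parent_id == 99)
        .one()
    )
    relationship_order = [child.order for child in parent.children]
```

In the contains_eager variant the collection is filled in row order, so dropping the order_by() call leaves the order up to the database.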

Why I need both relationship and foreign key for Many to One relationship?

In the docs for SQLAlchemy for Many to One relationships it shows the following example:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey('child.id'))
    child = relationship("Child")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
Many parents for a single child. Then, when we create a Parent, we need to populate both child_id and child, which seems kind of redundant. Is this mandatory, or what is the purpose of each?
child = Child()
Parent(child_id=child, child=child)
Also, in Flask-SQLAlchemy, there is this example for a simple relationship in which it creates a post like this:
Post(title='Hello Python!', body='Python is pretty cool', category=py)
without providing a category_id. If I replicate that scenario, category_id value is None.
For the purpose of creating new objects like Parent(child=child), would it be enough to add foreign_keys=[child_id] or does it have further implications?

It is not mandatory; you do not need to populate both. Setting the foreign key to the related instance can be an error waiting to manifest itself. The only thing you need to do is
child = Child()
parent = Parent(child=child)
After this parent.child_id is None, but they represent the object part of ORM just fine. parent.child is a reference to the created child. They have not been persisted to the database and have no identity, other than their Python object ID. Only when you add them to a Session and flush the changes to the database do they receive an identity, due to them using generated surrogate keys. Here is where the mapping from the object world to the relational world happens. SQLAlchemy automatically fills in parent.child_id, so that their relationship is recorded in the database as well (note that this is not what "relational" in relational model means).
Returning to the example, adding some printing helps keep track of what happens and when:
child = Child()
parent = Parent(child=child)
print(parent.child_id) # None
session.add(parent)
session.flush() # Send changes held in session to DB
print(parent.child_id) # The ID assigned to child
You can also reverse the situation: you might have the ID of an existing Child, but not the actual object. In that case you can simply assign child_id yourself.
So, to answer the title: you do not need the ORM relationship in order to have a DB foreign key relationship, but you can use it to map the DB relationship to the object world.
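For reference, here is a self-contained version of that walkthrough that can be run against an in-memory SQLite database, assuming SQLAlchemy 1.4 or newer. It also covers the reverse case, where only the ID of an existing Child is known.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey("child.id"))
    child = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    child = Child()
    parent = Parent(child=child)
    fk_before_flush = parent.child_id  # nothing persisted yet

    session.add(parent)  # the child is added too, via the default cascade
    session.flush()  # INSERT both rows; keys are assigned here
    fk_after_flush = parent.child_id
    child_pk = child.id

    # Reverse case: only the ID of an existing Child is known.
    other = Parent(child_id=child_pk)
    session.add(other)
    session.flush()
    resolves_to_same_child = other.child is child
```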

SQLAlchemy Many to One Join

I have a many to one relationship between two SQL tables using SQLAlchemy. For example:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey('child.id'))
    child = relationship("Child")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
What I would like to be able to do is be able to add information from the Child class to the parent. I tried a join query:
result = session.query(Parent).join(Child).all()
While this query adds the appropriate Child object to the Parent object at parent.child, it only returns the first parent for each child, i.e. I have four parents and two children in my database and this query only returns parents 1 and 3. How do I fix the query to return all four parents? The second question I have is: if I wanted to add just the child's name to the parent as parent.child_name, not the entire child object, how would I go about doing that?

How to get all parents when joining to children
The issue is that some parents do not have children, so using a normal join will exclude them. Use an outer join instead. Also, just adding a join won't actually load the children. You should specify contains_eager or joinedload to load the child with the parent.
# use contains_eager when you are joining and filtering on the relationship already
session.query(Parent).join(Parent.child).filter(Child.name == 'Max').options(contains_eager(Parent.child))
# use joinedload when you do not need to join and filter, but still want to load the relationship
session.query(Parent).options(joinedload(Parent.child))
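The effect of the join type can be checked with a small in-memory example. This is a sketch assuming SQLAlchemy 1.4 or newer; the data (four parents, one of them without a child) is made up to illustrate the case described above.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey("child.id"))
    child = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    max_child, min_child = Child(name="Max"), Child(name="Min")
    session.add_all(
        [
            Parent(child=max_child),
            Parent(child=max_child),
            Parent(child=min_child),
            Parent(),  # a parent without a child
        ]
    )
    session.commit()

with Session(engine) as session:
    # An inner join drops the parent that has no child.
    inner_count = len(session.query(Parent).join(Parent.child).all())
    # An outer join keeps every parent.
    outer_count = len(session.query(Parent).outerjoin(Parent.child).all())
    # joinedload fetches parent.child in the same SELECT, using an outer join.
    loaded = session.query(Parent).options(joinedload(Parent.child)).all()
    names = [p.child.name if p.child is not None else None for p in loaded]
```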
How to add child_name to the parent
You want to use an association proxy.
from sqlalchemy.ext.associationproxy import association_proxy
class Parent(Base):
    child = relationship('Child')
    child_name = association_proxy('child', 'name')
# you can filter queries with proxies:
session.query(Parent).filter(Parent.child_name == 'Min')
There are some cool things you can do with association proxies, be sure to read the docs for more information.
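A complete version of the proxy example, as a sketch that assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database, with two made-up rows:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey("child.id"))
    child = relationship("Child")
    # Exposes parent.child.name as parent.child_name.
    child_name = association_proxy("child", "name")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(
        [Parent(child=Child(name="Max")), Parent(child=Child(name="Min"))]
    )
    session.flush()

    # The proxy can be used as a filter criterion...
    matches = session.query(Parent).filter(Parent.child_name == "Min").all()
    # ...and read like a plain attribute on the instance.
    matched_names = [parent.child_name for parent in matches]
```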

Generic query in SQLAlchemy

I have following code:
class ArchaeologicalRecord(Base, ObservableMixin, ConcurrentMixin):
    author_id = Column(Integer, ForeignKey('authors.id'))
    author = relationship('Author', backref=backref('record'))
    horizont_id = Column(Integer, ForeignKey('horizonts.id'))
    horizont = relationship('Horizont', backref=backref('record'))
    .....
    somefield_id = Column(Integer, ForeignKey('somefields.id'))
    somefield = relationship('SomeModel', backref=backref('record'))
At the moment I have one such entry (an Author, a Horizont, or any other entity related to ArchaeologicalRecord), and I want to ensure that no record references it. I would rather not write separate code for each case and want to do this in the most generic way.
So, actually I have:
an instance of ArchaeologicalRecord
an instance of a child entity, for example, Horizont
(from the previous) its class definition.
How can I check whether any ArchaeologicalRecord contains (or does not contain) a reference to a Horizont (or any other child entity) without writing a great chunk of copy-pasted code?

Are you asking how to find orphaned authors, horizonts, somefields etc?
Assuming all your relations are many-to-one (ArchaeologicalRecord-to-Author), you could try something like:
from sqlalchemy.orm.properties import RelationshipProperty
from sqlalchemy.orm import class_mapper
session = ... # However you set up the session
# ArchaeologicalRecord will have various properties defined,
# some of these are RelationshipProperties, which hold the info you want
for rp in class_mapper(ArchaeologicalRecord).iterate_properties:
    if not isinstance(rp, RelationshipProperty):
        continue
    query = session.query(rp.mapper.class_)\
        .filter(~getattr(rp.mapper.class_, rp.backref[0]).any())
    orphans = query.all()
    if orphans:
        # Do something...
        print(rp.mapper.class_)
        print(orphans)
This will fail when rp.backref is None (i.e. where you've defined a relationship without a backref). In this case you'd probably have to construct the query a bit more manually, but the RelationshipProperty and its .mapper and .mapper.class_ attributes should get you all the info you need to do this in a generic way.
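A variation on the same loop, written as a sketch that assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database. It declares both sides with back_populates instead of backref, so the name of the reverse attribute can be read from rel.back_populates. The Record model and the find_orphans helper are hypothetical stand-ins for the question's classes.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Author(Base):
    __tablename__ = "authors"
    id = Column(Integer, primary_key=True)
    records = relationship("Record", back_populates="author")


class Horizont(Base):
    __tablename__ = "horizonts"
    id = Column(Integer, primary_key=True)
    records = relationship("Record", back_populates="horizont")


class Record(Base):  # simplified stand-in for ArchaeologicalRecord
    __tablename__ = "records"
    id = Column(Integer, primary_key=True)
    author_id = Column(Integer, ForeignKey("authors.id"))
    author = relationship("Author", back_populates="records")
    horizont_id = Column(Integer, ForeignKey("horizonts.id"))
    horizont = relationship("Horizont", back_populates="records")


def find_orphans(session, model):
    """Map each related class of `model` to its rows that no `model` row references."""
    orphans = {}
    for rel in inspect(model).relationships:
        if rel.back_populates is None:
            continue  # no reverse attribute to query through
        target = rel.mapper.class_
        reverse = getattr(target, rel.back_populates)
        # NOT EXISTS (a row of `model` pointing at this target row)
        orphans[target] = session.query(target).filter(~reverse.any()).all()
    return orphans


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    used_author, unused_author = Author(), Author()
    unused_horizont = Horizont()
    session.add_all(
        [used_author, unused_author, unused_horizont, Record(author=used_author)]
    )
    session.flush()

    orphans = find_orphans(session, Record)
    orphan_author_ids = [author.id for author in orphans[Author]]
    orphan_horizont_ids = [horizont.id for horizont in orphans[Horizont]]
    unused_author_id = unused_author.id
    unused_horizont_id = unused_horizont.id
```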
