I'm trying to model the following situation: A program has many versions, and one of the versions is the current one (not necessarily the latest).
This is how I'm doing it now:
class Program(Base):
    __tablename__ = 'programs'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    current_version_id = Column(Integer, ForeignKey('program_versions.id'))
    current_version = relationship('ProgramVersion', foreign_keys=[current_version_id])
    versions = relationship('ProgramVersion', order_by='ProgramVersion.id', back_populates='program')
class ProgramVersion(Base):
    __tablename__ = 'program_versions'
    id = Column(Integer, primary_key=True)
    program_id = Column(Integer, ForeignKey('programs.id'))
    timestamp = Column(DateTime, default=datetime.datetime.utcnow)
    program = relationship('Program', foreign_keys=[program_id], back_populates='versions')
But then I get the error: Could not determine join condition between parent/child tables on relationship Program.versions - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
But what foreign key should I provide for the 'Program.versions' relationship? Is there a better way to model this situation?
A circular dependency like that is a perfectly valid solution to this problem.
To fix your foreign keys problem, you need to explicitly provide the foreign_keys argument.
class Program(Base):
    ...
    current_version = relationship('ProgramVersion', foreign_keys=current_version_id, ...)
    versions = relationship('ProgramVersion', foreign_keys="ProgramVersion.program_id", ...)
class ProgramVersion(Base):
    ...
    program = relationship('Program', foreign_keys=program_id, ...)
You'll find that when you do a create_all(), SQLAlchemy has trouble creating the tables because each table has a foreign key that depends on a column in the other. SQLAlchemy provides a way to break this circular dependency by using an ALTER statement for one of the tables:
class Program(Base):
    ...
    current_version_id = Column(Integer, ForeignKey('program_versions.id', use_alter=True, name="fk_program_current_version_id"))
    ...
Finally, you'll find that when you add a complete object graph to the session, SQLAlchemy has trouble issuing INSERT statements because each row has a value that depends on the yet-unknown primary key of the other. SQLAlchemy provides a way to break this circular dependency by issuing an UPDATE for one of the columns:
class Program(Base):
    ...
    current_version = relationship('ProgramVersion', foreign_keys=current_version_id, post_update=True, ...)
    ...
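Putting the three pieces together (explicit foreign_keys, use_alter, post_update), here is a minimal sketch of the whole thing. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database, and it leaves out the timestamp column for brevity; the variable names in the usage part are mine, not from the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Program(Base):
    __tablename__ = "programs"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # use_alter lets create_all() break the CREATE TABLE cycle.
    current_version_id = Column(
        Integer,
        ForeignKey("program_versions.id", use_alter=True,
                   name="fk_program_current_version_id"),
    )
    # post_update makes the ORM set this column with a follow-up UPDATE.
    current_version = relationship(
        "ProgramVersion", foreign_keys=current_version_id, post_update=True
    )
    versions = relationship(
        "ProgramVersion",
        foreign_keys="ProgramVersion.program_id",
        order_by="ProgramVersion.id",
        back_populates="program",
    )


class ProgramVersion(Base):
    __tablename__ = "program_versions"
    id = Column(Integer, primary_key=True)
    program_id = Column(Integer, ForeignKey("programs.id"))
    program = relationship(
        "Program", foreign_keys=program_id, back_populates="versions"
    )


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    program = Program(name="demo")
    v1 = ProgramVersion(program=program)
    v2 = ProgramVersion(program=program)
    program.current_version = v2  # current, but not necessarily the latest
    session.add(program)          # the whole graph is added in one go
    session.commit()
    current_id = program.current_version_id
    version_ids = [v.id for v in program.versions]
    v2_id = v2.id
```

Adding the complete graph in a single session.add() is exactly the case post_update is there for: the rows are inserted first and programs.current_version_id is filled in afterwards.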
This design is not ideal; by having two tables refer to one another, you cannot effectively insert into either table, because the foreign key required in the other will not exist. One possible solution is outlined in the selected answer of this question related to Microsoft SQL Server, but I will summarize/elaborate on it here.
A better way to model this might be to introduce a third table, VersionHistory, and eliminate your foreign key constraints on the other two tables.
class VersionHistory(Base):
    __tablename__ = 'version_history'
    program_id = Column(Integer, ForeignKey('programs.id'), primary_key=True)
    version_id = Column(Integer, ForeignKey('program_versions.id'), primary_key=True)
    current = Column(Boolean, default=False)
    # I'm not too familiar with SQLAlchemy, but I suspect that relationship
    # information goes here somewhere
This eliminates the circular relationship you have created in your current implementation. You could then query this table by program, and receive all existing versions for the program, etc. Because of the composite primary key in this table, you could access any specific program/version combination. The addition of the current field to this table takes the burden of tracking currency off of the other two tables, although maintaining a single current version per program could require some trigger gymnastics.
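For what it's worth, here is a rough sketch of how the relationships on such a table could look in SQLAlchemy (1.4 or newer assumed). The history and version attributes and the set_current helper are hypothetical names of mine, and the "only one current row per program" rule is enforced here in application code only, not by a trigger or constraint.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Program(Base):
    __tablename__ = "programs"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    history = relationship("VersionHistory", back_populates="program")


class ProgramVersion(Base):
    __tablename__ = "program_versions"
    id = Column(Integer, primary_key=True)


class VersionHistory(Base):
    __tablename__ = "version_history"
    program_id = Column(Integer, ForeignKey("programs.id"), primary_key=True)
    version_id = Column(Integer, ForeignKey("program_versions.id"), primary_key=True)
    current = Column(Boolean, default=False, nullable=False)
    program = relationship("Program", back_populates="history")
    version = relationship("ProgramVersion")


def set_current(program, version):
    """Flag exactly one history row of `program` as current (app-level rule)."""
    for entry in program.history:
        entry.current = entry.version is version


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    program = Program(name="demo")
    v1, v2 = ProgramVersion(), ProgramVersion()
    VersionHistory(program=program, version=v1)
    VersionHistory(program=program, version=v2)
    set_current(program, v2)
    session.add(program)
    session.commit()
    current_versions = [e.version_id for e in program.history if e.current]
    v2_id = v2.id
```

Neither programs nor program_versions points at the other here, so the insert order is unambiguous and no use_alter or post_update is needed.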
HTH!
In the docs for SQLAlchemy for Many to One relationships it shows the following example:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey('child.id'))
    child = relationship("Child")
class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
Many parents for a single child. Then, when we create a Parent, we need to populate child_id and child, which seems kind of redundant? Is this mandatory, or what's the purpose of each thing?
child = Child()
Parent(child_id=child, child=child)
Also, in Flask-SQLAlchemy, there is this example for a simple relationship in which it creates a post like this:
Post(title='Hello Python!', body='Python is pretty cool', category=py)
without providing a category_id. If I replicate that scenario, category_id value is None.
For the purpose of creating new objects like Parent(child=child), would it be enough to add foreign_keys=[child_id] or does it have further implications?
It is not mandatory; you do not need to populate both. Setting the foreign key to the related instance can be an error waiting to manifest itself. The only thing you need to do is
child = Child()
parent = Parent(child=child)
After this parent.child_id is None, but they represent the object part of ORM just fine. parent.child is a reference to the created child. They have not been persisted to the database and have no identity, other than their Python object ID. Only when you add them to a Session and flush the changes to the database do they receive an identity, due to them using generated surrogate keys. Here is where the mapping from the object world to the relational world happens. SQLAlchemy automatically fills in parent.child_id, so that their relationship is recorded in the database as well (note that this is not what "relational" in relational model means).
Returning to the example, adding some printing helps keep track of what happens and when:
child = Child()
parent = Parent(child=child)
print(parent.child_id) # None
session.add(parent)
session.flush() # Send changes held in session to DB
print(parent.child_id) # The ID assigned to child
You can also reverse the situation: you might have the ID of an existing Child, but not the actual object. In that case you can simply assign child_id yourself.
So, to answer the title: you do not need the ORM relationship in order to have a DB foreign key relationship, but you can use it to map the DB relationship to the object world.
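The reverse case mentioned above (you know the ID of an existing Child but do not have the object) could look roughly like this. It is a sketch assuming SQLAlchemy 1.4 or newer and in-memory SQLite; known_child_id is just an illustrative variable name.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey("child.id"))
    child = relationship("Child")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# Somebody persisted a Child earlier; all we keep is its primary key.
with Session(engine) as session:
    child = Child()
    session.add(child)
    session.commit()
    known_child_id = child.id

# Later: only the ID is at hand, so set the foreign key column directly.
with Session(engine) as session:
    parent = Parent(child_id=known_child_id)
    session.add(parent)
    session.commit()
    # The relationship is loaded from the DB on first access.
    loaded_child_id = parent.child.id
```

Note that child_id gets an integer here, never a Child instance; the instance goes into child.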
I would like to create a nullable, self-referencing relationship which can be deleted using SQLAlchemy. An example model is as follows (note, using Flask-SQLAlchemy):
class Person(db.Model):
    __tablename__ = 'person'
    id = db.Column(db.Integer, primary_key=True)
    partner_id = db.Column(db.Integer, db.ForeignKey('person.id'), nullable=True)
    partner = db.relationship('Person', uselist=False)
So think of this as a table of cops who have only a single partner, but that partner may turn out to have been in the mafia all along, so they lose their partner for a while. A cop without a partner is fine, at least in database terms - but I assume over the course of the show their partnerless status means a lot of property damage.
Needless to say, this question: sqlalchemy: one-to-one relationship with declarative discusses how to set up this relationship. The question is how do you remove the relationship? Normally with a different foreign key you'd do this as follows:
joe.partner.remove(larry)
Where joe and larry are both Person objects. However, via the uselist argument, joe.partner is now actually a Person with no remove method.
How to delete one-to-one relationships is buried away in the SQLAlchemy documentation under the explanation of Cascades: https://docs.sqlalchemy.org/en/14/orm/cascades.html#notes-on-delete-deleting-objects-referenced-from-collections-and-scalar-relationships
"The delete-orphan cascade can also be applied to a many-to-one or one-to-one relationship, so that when an object is de-associated from its parent, it is also automatically marked for deletion. Using delete-orphan cascade on a many-to-one or one-to-one requires an additional flag relationship.single_parent which invokes an assertion that this related object is not to be shared with any other parent simultaneously."
So you'll want to set up your one-to-one relationship like so:
partner = db.relationship(
    'Person',
    cascade='all, delete-orphan',
    uselist=False,
    single_parent=True,
)
Then, deleting a Person's partner is just a matter of setting it to None:
some_person.partner = None
session.flush() # will delete the partner object
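If the goal is only to end the partnership while keeping both people in the table, then as far as I can tell no cascade setting is needed at all: assigning None to the scalar relationship just nulls the foreign key. A small sketch, using plain SQLAlchemy (1.4 or newer) instead of Flask-SQLAlchemy; remote_side=[id] is my addition so that partner is mapped as many-to-one on the same table.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    partner_id = Column(Integer, ForeignKey("person.id"), nullable=True)
    # remote_side marks `id` as the "one" side of this self-referential link.
    partner = relationship("Person", remote_side=[id], uselist=False)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    larry = Person()
    joe = Person(partner=larry)
    session.add(joe)
    session.commit()
    partner_id_before = joe.partner_id

    joe.partner = None  # unlink only; no delete-orphan cascade configured
    session.commit()
    partner_id_after = joe.partner_id
    people_left = session.query(Person).count()
```

With the delete-orphan setup shown above the same assignment would also delete the partner row, so pick whichever matches what "remove" should mean for you.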
I must be missing something trivial with SQLAlchemy's cascade options because I cannot get a simple cascade delete to operate correctly -- if a parent element is deleted, the children persist, with null foreign keys.
I've put a concise test case here:
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey(Parent.id))
    parent = relationship(Parent, cascade="all,delete", backref="children")
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
parent = Parent()
parent.children.append(Child())
parent.children.append(Child())
parent.children.append(Child())
session.add(parent)
session.commit()
print("Before delete, children = {0}".format(session.query(Child).count()))
print("Before delete, parent = {0}".format(session.query(Parent).count()))
session.delete(parent)
session.commit()
print("After delete, children = {0}".format(session.query(Child).count()))
print("After delete parent = {0}".format(session.query(Parent).count()))
session.close()
Output:
Before delete, children = 3
Before delete, parent = 1
After delete, children = 3
After delete parent = 0
There is a simple, one-to-many relationship between Parent and Child. The script creates a parent, adds 3 children, then commits. Next, it deletes the parent, but the children persist. Why? How do I make the children cascade delete?
The problem is that sqlalchemy considers Child as the parent, because that is where you defined your relationship (it doesn't care that you called it "Child" of course).
If you define the relationship on the Parent class instead, it will work:
children = relationship("Child", cascade="all,delete", backref="parent")
(note "Child" as a string: this is allowed when using the declarative style, so that you are able to refer to a class that is not yet defined)
You might want to add delete-orphan as well (delete causes children to be deleted when the parent gets deleted, delete-orphan also deletes any children that were "removed" from the parent, even if the parent is not deleted)
EDIT: just found out: if you really want to define the relationship on the Child class, you can do so, but you will have to define the cascade on the backref (by creating the backref explicitly), like this:
parent = relationship(Parent, backref=backref("children", cascade="all,delete"))
(implying from sqlalchemy.orm import backref)
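For reference, here is the question's test case reworked with the cascade declared on the Parent side, as a sketch for SQLAlchemy 1.4 or newer. The variable names collecting the counts are mine.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # The cascade lives on the one-to-many side, so deleting a Parent
    # makes the ORM delete its children as well.
    children = relationship("Child", cascade="all, delete", backref="parent")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent()
    parent.children.extend([Child(), Child(), Child()])
    session.add(parent)
    session.commit()
    children_before = session.query(Child).count()

    session.delete(parent)
    session.commit()
    children_after = session.query(Child).count()
    parents_after = session.query(Parent).count()
```

This only covers deletes that go through session.delete(); a bulk query(...).delete() bypasses the ORM cascade.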
@Steven's answer is good when you are deleting through session.delete(), which never happens in my case. I noticed that most of the time I delete through session.query().filter().delete() (which doesn't load the objects into memory and deletes directly from the db).
Using this method sqlalchemy's cascade='all, delete' doesn't work. There is a solution though: ON DELETE CASCADE through db (note: not all databases support it).
class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parents.id", ondelete='CASCADE'))
class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    child = relationship(Child, backref="parent", passive_deletes=True)
Pretty old post, but I just spent an hour or two on this, so I wanted to share my finding, especially since some of the other comments listed aren't quite right.
TL;DR
Give the child table a foreign key or modify the existing one, adding ondelete='CASCADE':
parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))
And one of the following relationships:
a) This on the parent table:
children = db.relationship('Child', backref='parent', passive_deletes=True)
b) Or this on the child table:
parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))
Details
First off, despite what the accepted answer says, the parent/child relationship is not established by using relationship, it's established by using ForeignKey. You can put the relationship on either the parent or child tables and it will work fine. Although, apparently on the child tables, you have to use the backref function in addition to the keyword argument.
Option 1 (preferred)
Second, SqlAlchemy supports two different kinds of cascading. The first, and the one I recommend, is built into your database and usually takes the form of a constraint on the foreign key declaration. In PostgreSQL it looks like this:
CONSTRAINT child_parent_id_fkey FOREIGN KEY (parent_id)
REFERENCES parent_table(id) MATCH SIMPLE
ON DELETE CASCADE
This means that when you delete a record from parent_table, then all the corresponding rows in child_table will be deleted for you by the database. It's fast and reliable and probably your best bet. You set this up in SqlAlchemy through ForeignKey like this (part of the child table definition):
parent_id = db.Column(db.Integer, db.ForeignKey('parent.id', ondelete='CASCADE'))
parent = db.relationship('Parent', backref=backref('children', passive_deletes=True))
The ondelete='CASCADE' is the part that creates the ON DELETE CASCADE on the table.
Gotcha!
There's an important caveat here. Notice how I have a relationship specified with passive_deletes=True? If you don't have that, the entire thing will not work. This is because by default when you delete a parent record SqlAlchemy does something really weird. It sets the foreign keys of all child rows to NULL. So if you delete a row from parent_table where id = 5, then it will basically execute
UPDATE child_table SET parent_id = NULL WHERE parent_id = 5
Why you would want this I have no idea. I'd be surprised if many database engines even allowed you to set a valid foreign key to NULL, creating an orphan. Seems like a bad idea, but maybe there's a use case. Anyway, if you let SqlAlchemy do this, you will prevent the database from being able to clean up the children using the ON DELETE CASCADE that you set up. This is because it relies on those foreign keys to know which child rows to delete. Once SqlAlchemy has set them all to NULL, the database can't delete them. Setting the passive_deletes=True prevents SqlAlchemy from NULLing out the foreign keys.
You can read more about passive deletes in the SqlAlchemy docs.
Option 2
The other way you can do it is to let SqlAlchemy do it for you. This is set up using the cascade argument of the relationship. If you have the relationship defined on the parent table, it looks like this:
children = relationship('Child', cascade='all,delete', backref='parent')
If the relationship is on the child, you do it like this:
parent = relationship('Parent', backref=backref('children', cascade='all,delete'))
Again, this is the child, so you have to call a function called backref and put the cascade data in there.
With this in place, when you delete a parent row, SqlAlchemy will actually run delete statements for you to clean up the child rows. This will likely not be as efficient as letting the database handle it for you, so I don't recommend it.
Here are the SqlAlchemy docs on the cascading features it supports.
Alex Okrushko's answer almost worked best for me. I used ondelete='CASCADE' and passive_deletes=True combined, but I had to do something extra to make it work for sqlite.
Base = declarative_base()
ROOM_TABLE = "roomdata"
FURNITURE_TABLE = "furnituredata"
class DBFurniture(Base):
    __tablename__ = FURNITURE_TABLE
    id = Column(Integer, primary_key=True)
    room_id = Column(Integer, ForeignKey('roomdata.id', ondelete='CASCADE'))
class DBRoom(Base):
    __tablename__ = ROOM_TABLE
    id = Column(Integer, primary_key=True)
    furniture = relationship("DBFurniture", backref="room", passive_deletes=True)
Make sure to add this code to ensure it works for sqlite.
from sqlalchemy import event
from sqlalchemy.engine import Engine
from sqlite3 import Connection as SQLite3Connection
@event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
    # SQLite ignores foreign keys unless this pragma is set per connection
    if isinstance(dbapi_connection, SQLite3Connection):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON;")
        cursor.close()
Stolen from here: SQLAlchemy expression language and SQLite's on delete cascade
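To see the pieces working together (ondelete='CASCADE', passive_deletes=True, and the SQLite pragma), here is a self-contained sketch that deletes a room with a bulk query delete and lets the database remove the furniture. It assumes SQLAlchemy 1.4 or newer with the standard sqlite3 driver; the counting variables are mine.

```python
from sqlite3 import Connection as SQLite3Connection

from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.engine import Engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


@event.listens_for(Engine, "connect")
def _set_sqlite_pragma(dbapi_connection, connection_record):
    # SQLite only enforces foreign keys when asked to, per connection.
    if isinstance(dbapi_connection, SQLite3Connection):
        cursor = dbapi_connection.cursor()
        cursor.execute("PRAGMA foreign_keys=ON;")
        cursor.close()


class DBFurniture(Base):
    __tablename__ = "furnituredata"
    id = Column(Integer, primary_key=True)
    room_id = Column(Integer, ForeignKey("roomdata.id", ondelete="CASCADE"))


class DBRoom(Base):
    __tablename__ = "roomdata"
    id = Column(Integer, primary_key=True)
    furniture = relationship("DBFurniture", backref="room", passive_deletes=True)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    room = DBRoom(furniture=[DBFurniture(), DBFurniture()])
    session.add(room)
    session.commit()
    room_id = room.id
    furniture_before = session.query(DBFurniture).count()

    # Bulk delete: no ORM cascade runs, the database does the work.
    session.query(DBRoom).filter(DBRoom.id == room_id).delete(
        synchronize_session=False
    )
    session.commit()
    furniture_after = session.query(DBFurniture).count()
```

Without the pragma listener I would expect the furniture rows to survive on SQLite, since the ON DELETE CASCADE clause is then not enforced.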
Steven is correct in that you need to explicitly create the backref, this results in the cascade being applied on the parent (as opposed to it being applied to the child like in the test scenario).
However, defining the relationship on the Child does NOT make sqlalchemy consider Child the parent. It doesn't matter where the relationship is defined (child or parent); it's the foreign key that links the two tables that determines which is the parent and which is the child.
It makes sense to stick to one convention though, and based on Steven's response, I'm defining all my child relationships on the parent.
Steven's answer is solid. I'd like to point out an additional implication.
By using relationship, you're making the app layer (Flask) responsible for referential integrity. That means other processes that access the database not through Flask, like a database utility or a person connecting to the database directly, will not experience those constraints and could change your data in a way that breaks the logical data model you worked so hard to design.
Whenever possible, use the ForeignKey approach described by d512 and Alex. The DB engine is very good at truly enforcing constraints (in an unavoidable way), so this is by far the best strategy for maintaining data integrity. The only time you need to rely on an app to handle data integrity is when the database can't handle them, e.g. versions of SQLite that don't support foreign keys.
If you need to create further linkage among entities to enable app behaviors like navigating parent-child object relationships, use backref in conjunction with ForeignKey.
I struggled with the documentation as well, but found that the docstrings themselves tend to be easier than the manual. For example, if you import relationship from sqlalchemy.orm and do help(relationship), it will give you all the options you can specify for cascade. The bullet for delete-orphan says:
if an item of the child's type with no parent is detected, mark it for deletion.
Note that this option prevents a pending item of the child's class from being
persisted without a parent present.
I realize your issue was more with the documentation for defining parent-child relationships. But it seemed that you might also be having a problem with the cascade options, because "all" includes "delete". "delete-orphan" is the only option that's not included in "all".
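To make the difference concrete, here is a small sketch of what delete-orphan adds on top of "all": a child removed from the collection is deleted even though the parent itself stays. SQLAlchemy 1.4 or newer and in-memory SQLite are assumed; the counting variables are mine.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship(
        "Child", cascade="all, delete-orphan", backref="parent"
    )


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    first, second = Child(), Child()
    parent = Parent(children=[first, second])
    session.add(parent)
    session.commit()

    # De-associate one child; the parent is not deleted.
    parent.children.remove(first)
    session.commit()
    children_left = session.query(Child).count()
    parents_left = session.query(Parent).count()
```

With cascade="all" alone, the removed child would stay in the table with a NULL parentid instead.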
Even though this question is very old, it comes up first when searched for in Google, so I'll post my solution to add to what others said (I spent a few hours on it even after reading all the answers in here).
As d512 explained, it is all about Foreign Keys. It was quite a surprise to me, but not all databases / engines support Foreign Keys. I'm running a MySQL database. After a long investigation, I noticed that when I create a new table it defaults to an engine (MyISAM) that doesn't support Foreign Keys. All I had to do was to set it to InnoDB by adding mysql_engine='InnoDB' when defining a Table. In my project I'm using an imperative mapping and it looks like so:
db.Table('child',
    Column('id', Integer, primary_key=True),
    # other columns
    Column('parent_id',
           ForeignKey('parent.id', ondelete="CASCADE")),
    mysql_engine='InnoDB')
The answer by Steven is perfect. But if you are still getting the error, another thing to try on top of that is described here:
http://vincentaudebert.github.io/python/sql/2015/10/09/cascade-delete-sqlalchemy/
Copied from the link-
Quick tip if you get in trouble with a foreign key dependency even if you have specified a cascade delete in your models.
Using SQLAlchemy, to specify a cascade delete you should have cascade='all, delete' on your parent table. Ok but then when you execute something like:
session.query(models.yourmodule.YourParentTable).filter(conditions).delete()
It actually triggers an error about a foreign key used in your children tables.
The solution I used is to query the object and then delete it:
session = models.DBSession()
your_db_object = session.query(models.yourmodule.YourParentTable).filter(conditions).first()
if your_db_object is not None:
    session.delete(your_db_object)
This should delete your parent record AND all the children associated with it.
TLDR: If the above solutions don't work, try adding nullable=False to your column.
I'd like to add a small point here for some people who may not get the cascade function to work with the existing solutions (which are great). The main difference between my work and the example was that I used automap. I do not know exactly how that might interfere with the setup of cascades, but I want to note that I used it. I am also working with a SQLite database.
I tried every solution described here, to no avail: rows in my child table continued to have their foreign key set to null when the parent row was deleted. However, the cascade worked once I set the child column with the foreign key to nullable=False.
On the child table, I added:
Column('parent_id', Integer(), ForeignKey('parent.id', ondelete="CASCADE"), nullable=False)
Child.parent = relationship("parent", backref=backref("children", passive_deletes=True))
With this setup, the cascade functioned as expected.
This question is about how to design a SQL relationship. I am pretty new to this, and I'd like to hear from people with (way) more expertise...
I am currently migrating a ZopeDB (Object oriented) database to MySQL (relational) using MeGrok and SqlAlchemy (although I don't think that's really too relevant, since my question is more about designing a relationship in a relational database).
I have two classes related like this:
class Child(object):
    def __init__(self):
        self.field1 = "hello world"
class Parent(object):
    def __init__(self):
        self.child1 = Child()
        self.child2 = Child()
The "Parent" class has two different instances of a Child() class. I am not even sure about how to treat this (two different 1:1 relationships or a 1:2 relationship).
Currently, I have this:
class Child(rdb.Model):
    rdb.metadata(metadata)
    rdb.tablename("children_table")
    id = Column("id", Integer, primary_key=True)
    field1 = Column("field1", String(64)) #Irrelevant
    def __init__(self):
        self.field1 = "hello world"
class Parent(rdb.Model):
    rdb.metadata(metadata)
    rdb.tablename("parent_table")
    id = Column("id", Integer, primary_key=True)
    child1_id = Column("child_1_id", Integer, ForeignKey("children_table.id"))
    child2_id = Column("child_2_id", Integer, ForeignKey("children_table.id"))
    child1 = relationship(Child,
        primaryjoin = ("parent_table.child1_id == children_table.id")
    )
    child2 = relationship(Child,
        primaryjoin = ("parent_table.child2_id == children_table.id")
    )
Meaning... Ok, I store the two "children" ids as foreign keys in the Parent and retrieve the children itself using that information.
This is working fine, but I don't know if it's the most proper solution.
Or I could do something like:
class Child(rdb.Model):
    rdb.metadata(metadata)
    rdb.tablename("children_table")
    id = Column("id", Integer, primary_key=True)
    parent_id = Column("parent_id", Integer, ForeignKey("parent_table.id")) # New!
    type = Column("type", SmallInteger) # New!
    field1 = Column("field1", String(64)) #Irrelevant
    def __init__(self):
        self.field1 = "hello world"
class Parent(rdb.Model):
    rdb.metadata(metadata)
    rdb.tablename("parent_table")
    id = Column("id", Integer, primary_key=True)
    child1 = relationship(
        # Well... this I still don't know how to write it down,
        # but it would be something like:
        # Give me all the children whose "parent_id" is my own "id"
        # AND their type == 1
        # I'll deal with the joins and the actual implementation depending
        # on your answer, guys
    )
    child2 = relationship(
        # Would be same as above
        # but selecting children whose type == 2
    )
This may be good for adding new children to the parent class... If I add a "Parent.child3", I just need to create a new relationship very similar to the already existing ones.
The way I have it now would imply creating a new relationship AND adding a new foreign key to the parent.
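For the record, the placeholder relationships in that second layout could be written with a string primaryjoin, roughly as below. This is only a sketch: it uses plain SQLAlchemy declarative (1.4 or newer) rather than megrok.rdb, and viewonly=True is my choice to keep the two filtered relationships read-only, so children are linked by setting parent_id and type directly.

```python
from sqlalchemy import Column, ForeignKey, Integer, SmallInteger, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Child(Base):
    __tablename__ = "children_table"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent_table.id"))
    type = Column(SmallInteger)
    field1 = Column(String(64))


class Parent(Base):
    __tablename__ = "parent_table"
    id = Column(Integer, primary_key=True)
    # Children whose parent_id is my own id AND whose type == 1.
    child1 = relationship(
        "Child",
        primaryjoin="and_(Parent.id == Child.parent_id, Child.type == 1)",
        uselist=False,
        viewonly=True,
    )
    # Same join, but selecting the child whose type == 2.
    child2 = relationship(
        "Child",
        primaryjoin="and_(Parent.id == Child.parent_id, Child.type == 2)",
        uselist=False,
        viewonly=True,
    )


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent()
    session.add(parent)
    session.flush()  # assigns parent.id
    session.add_all([
        Child(parent_id=parent.id, type=1, field1="first"),
        Child(parent_id=parent.id, type=2, field1="second"),
    ])
    session.commit()
    first_field = parent.child1.field1
    second_field = parent.child2.field1
```

Adding a hypothetical child3 would then mean one more relationship with type == 3 and no schema change on parent_table.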
Also, having a "parent" table with a bunch of foreign keys may not make it the best "parent" table in the world, right?
I'd like to know what people that know much more about databases think :)
Thank you.
PS: Related post? Question 3998545
Expanded in Response to Comments
The issue is, you are thinking in the terms that you know (understandable), and you have the limitations of an OO database ... which would not be good to carry over into the Relational db. So for many reasons, it is best to simply identify the Entities and Relations, and to Normalise them. The method you use to call is easy to change and you will not be limited to only what you have now.
There are some good answers here, but even those are limited and incomplete. If you Normalise Parent and Child (being people, they will have many common columns), you get Person, with no duplicated columns.
People have "upward" relations to other people, their Parents, but that is context, not the fact that the Parent exists as a Person first (and you can have more than two if you like). People also have "downward" relations to their Children, also contextual. The limitation of two children per Parent is absurd (you may have to inspect your methods/classes: I suspect one is an "upward" navigation and the other is "downward"). And you do not want to have to store the relations as duplicates (once that Fred is a father of Sally; twice that Sally is a child of Fred), that single fact exists in a single row, which can be interpreted Parent⇢Child or Parent⇠Child.
This requirement has come up in many questions, therefore I am using a single generic, but detailed, illustration. The model defines any tree structure that needs to be walked up or down, handled by simple recursion. It is called a Bill of Materials structure, originally created for inventory control systems, and can be applied to any tree structure requirement. It is Fifth Normal Form; no duplicate columns; no Update Anomalies.
Bill of Materials
For Assemblies and Components, which would have many common columns, they are Normalised into Part; whether they are Assemblies or Components is contextual, and these contextual columns are located in the Associative (many-to-many) table.
Two Relations 1:1 or one 1:2 ?
Actually, it is two times 1::n.
Ordinals, or Ranking, is explicit in the Primary Key (chronological order). If some other ordinal is required, simply add a column to the Associative table. Better yet, it is truly a derived column, so compute it at runtime from current values.
I'll admit that I'm not too familiar with object databases, but in relational terms this is a straightforward one-to-many (optional) relationship.
create table parent (
id int PK,
otherField whatever
)
create table child (
id int PK,
parent_id int Fk,
otherField whatever
)
Obviously, that's not usable code as it stands....
I think this is similar to your second example. If you need to track the ordinal position of the children in their relationships to the parent, you'd add a column to the child table such as:
create table child (
id int PK,
parent_id int Fk,
birth_order int,
otherField whatever
)
You'd have to be responsible for managing that field at the application level; it's not something you can expect the DBMS to do for you.
I called it an optional relationship on the assumption that childless parents can exist--if that's not true, it becomes a required relationship logically, though you'd still have to let the DBMS create a new parent record childlessly, then grab its id to create the child--and once again manage the requirement at the application level.
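Since the question mentions SQLAlchemy: there the application-level bookkeeping for a column like birth_order can be handed to the ordering_list helper from sqlalchemy.ext.orderinglist. A minimal sketch, assuming SQLAlchemy 1.4 or newer; the name column and the counting from 1 are my own choices for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.orderinglist import ordering_list
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    # The list keeps birth_order in sync with each child's position.
    children = relationship(
        "Child",
        order_by="Child.birth_order",
        collection_class=ordering_list("birth_order", count_from=1),
        backref="parent",
    )


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))
    birth_order = Column(Integer)
    name = Column(String(64))


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent()
    parent.children.append(Child(name="a"))
    parent.children.append(Child(name="b"))
    parent.children.insert(0, Child(name="c"))  # renumbers the others
    session.add(parent)
    session.commit()
    names = [c.name for c in parent.children]
    orders = [c.birth_order for c in parent.children]
```

The database still knows nothing about the rule; anything that writes to the table outside this mapping has to maintain birth_order itself.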
This is probably a little out of context, since I use none of the things you've mentioned - but as far as the general design goes, here are a couple ideas:
Keep relationships based on common types: has_one, has_many, belongs_to, has_and_belongs_to_many.
With children, it's better to not specify N number of children explicitly; either there are none, one, or there could potentially be many. Thus your model declarations of child1 and child2 would be replaced by a single property - an array containing children.
To be totally honest, I don't know how well that fits in with what you're using. However, that's generally how relationships work in an ORM sense. So, based on this:
If a model belongs to another (it has a foreign key for another table), it would have a parent [sic] property with a reference to the parent object
If a model has one model that belongs to it (the other model has a foreign key to the first model's table), it would have a child [sic] property with a reference to the child object
If a model has many models that belong to it (many other models have foreign keys to the first model's table), it would have a children [sic] property that is an array of references to child objects
If a model has and belongs to many other models... you might want to consider using both parents and children properties, or something similar; nomenclature is less important than you having access to a group of models that it belongs to, and another group of models that belong to it.
Sorry if that's totally unhelpful, but it might shed some light. HTH.