SQLAlchemy: Map multiple instances to the same row

Consider a many-to-one relationship: Article-to-Keyword.
An article has only one keyword; a single keyword can be referenced by multiple articles.
class Article(Base):
    __tablename__ = 'article'
    id = Column(Integer, primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'))
    keyword = relationship("Keyword")
class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
Now, I'd like to be able to associate multiple instances of Keyword that have the same name value
with the single row in the keyword table,
so that saving a1 and a2:
a1 = Article()
a1.keyword = Keyword(name="science")
a2 = Article()
a2.keyword = Keyword(name="science")
wouldn't raise a unique constraint violation.
Currently one has to issue an additional query to fetch the key from the keyword table and set it on the Article object as Article.keyword_id.
This gets tedious with real-world schemas.

Take a look at the Unique Object recipe.
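To make the idea concrete, here is a minimal get-or-create sketch in the spirit of that recipe. It is not the recipe itself (which uses a per-session cache and a mixin); the helper name unique_keyword is hypothetical, SQLAlchemy 1.4 or later and an in-memory SQLite database are assumed, and the lookup is not safe against concurrent transactions inserting the same name.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Keyword(Base):
    __tablename__ = "keyword"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)


class Article(Base):
    __tablename__ = "article"
    id = Column(Integer, primary_key=True)
    keyword_id = Column(Integer, ForeignKey("keyword.id"))
    keyword = relationship("Keyword")


def unique_keyword(session, name):
    """Hypothetical helper: return the Keyword called `name`, creating it once."""
    keyword = session.query(Keyword).filter_by(name=name).one_or_none()
    if keyword is None:
        keyword = Keyword(name=name)
        session.add(keyword)
        session.flush()  # insert now so later lookups find the row
    return keyword


engine = create_engine("sqlite://")  # in-memory database, assumed for the demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1 = Article(keyword=unique_keyword(session, "science"))
    a2 = Article(keyword=unique_keyword(session, "science"))
    session.add_all([a1, a2])
    session.commit()

    keyword_rows = session.query(Keyword).count()
    same_object = a1.keyword is a2.keyword
```

Both articles end up pointing at one keyword row, so the unique constraint is never hit and no keyword_id has to be set by hand.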

Related

Performance issue with multiple table references to one

Database model (simplified):
class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)
    ...
    @classmethod
    def link(cls):
        col = Column(Integer, ForeignKey(cls.id))
        rel = relationship(cls, uselist=False, lazy='selectin', foreign_keys=[col])
        return col, rel
class File(Base):
    __tablename__ = "document_file"
    id = Column(Integer, primary_key=True)
    document_id = Column(Integer, ForeignKey(Document.id))
    document = relationship(Document, back_populates="files")
    ...
class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    ...
class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey(A.id))
    a = relationship(A, back_populates="a")
    ...
    doc1_id, doc1 = Document.link()
    doc2_id, doc2 = Document.link()
    ....
class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    b_id = Column(Integer, ForeignKey(B.id))
    b = relationship(B, back_populates="a")
    ...
    doc_id, doc = Document.link()
    ...
The Document entity is used by domain-specific entities. It has a complex hierarchical structure with several one-to-many layers. It is fetched as a single root object (class A in the example), and then SQLAlchemy's lazy loading does its magic.
In a plain one-to-many case, SQLAlchemy detects this and loads the whole list of objects in a single request to the database. But in this case the optimization does not work: Document instances are fetched one by one. With 10k+ objects it becomes very slow.
I see two ways to solve this:
1. Use the selectinload policy for the whole hierarchy. The drawback is that I need the whole hierarchy only after checking several objects. Moreover, sometimes I can't issue an additional query with populate_existing because the objects are already modified.
2. Use SQLAlchemy mapper data to scan for references to Document and fetch them into a dictionary with a single request. But I can't just assign fetched objects to properties managed by SQLAlchemy, so this technique is not uniform with ordinary access to objects.
Two questions:
1. Is there another optimization technique?
2. Is there a more efficient solution in terms of database model architecture?
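One variation on the second approach may avoid the assignment problem: a simple many-to-one lazy load looks in the Session's identity map before emitting SQL, so loading all referenced Document rows in one query can be enough. The sketch below is only an illustration under assumptions not in the question: a reduced two-class model, the default lazy="select" strategy instead of 'selectin', an in-memory SQLite database, and SQLAlchemy 1.4 or later. The statements list and the _record listener are hypothetical helpers used to count queries.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)


class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    doc_id = Column(Integer, ForeignKey(Document.id))
    doc = relationship(Document, foreign_keys=[doc_id])  # default lazy="select"


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []  # every SQL string sent to the database


@event.listens_for(engine, "before_cursor_execute")
def _record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    session.add_all([C(doc=Document()) for _ in range(5)])
    session.commit()

with Session(engine) as session:
    rows = session.query(C).all()
    doc_ids = sorted({row.doc_id for row in rows if row.doc_id is not None})
    # One query for all referenced documents. Keep the list alive: the
    # identity map holds objects weakly.
    docs = session.query(Document).filter(Document.id.in_(doc_ids)).all()

    statements.clear()
    loaded = [row.doc for row in rows]  # resolved from the identity map
    lazy_selects = len(statements)
```

Access stays uniform (row.doc), and nothing is assigned to mapped attributes by hand; the cost is that the preloaded objects must stay referenced for as long as the lookups should be free.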

SQLAlchemy Handling Multiple Paths In One Relationship

Please note: this question is related but separate from my other currently open question SQLAlchemy secondary join relationship on multiple foreign keys.
The SQLAlchemy documentation describes handling multiple join paths in a single class for multiple relationships:
from sqlalchemy import Integer, ForeignKey, String, Column
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
Base = declarative_base()
class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    billing_address_id = Column(Integer, ForeignKey("address.id"))
    shipping_address_id = Column(Integer, ForeignKey("address.id"))
    billing_address = relationship("Address")
    shipping_address = relationship("Address")
class Address(Base):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    street = Column(String)
    city = Column(String)
    state = Column(String)
    zip = Column(String)
Within the same section the documentation shows three separate ways to define the relationship:
billing_address = relationship("Address", foreign_keys=[billing_address_id])  # (1)
billing_address = relationship("Address", foreign_keys="[Customer.billing_address_id]")  # (2)
billing_address = relationship("Address", foreign_keys="Customer.billing_address_id")  # (3)
As you can see in (1) and (2), SQLAlchemy allows you to define a list of foreign_keys. In fact, the documentation explicitly states:
In this specific example, the list is not necessary in any case as there’s only one Column we need:
billing_address = relationship("Address", foreign_keys="Customer.billing_address_id")
But I cannot determine how to use the list notation to specify multiple foreign keys in a single relationship.
For the classes
class PostVersion(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    tag_1_id = db.Column(db.Integer, db.ForeignKey("tag.id"))
    tag_2_id = db.Column(db.Integer, db.ForeignKey("tag.id"))
    tag_3_id = db.Column(db.Integer, db.ForeignKey("tag.id"))
    tag_4_id = db.Column(db.Integer, db.ForeignKey("tag.id"))
    tag_5_id = db.Column(db.Integer, db.ForeignKey("tag.id"))
class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tag = db.Column(db.String(127))
I have tried all of the following:
tags = db.relationship("Tag", foreign_keys=[tag_1_id, tag_2_id, tag_3_id, tag_4_id, tag_5_id]) resulting in
sqlalchemy.exc.AmbiguousForeignKeysError: Could not determine join condition between parent/child tables on relationship AnnotationVersion.tags - there are multiple foreign key paths linking the tables. Specify the 'foreign_keys' argument, providing a list of those columns which should be counted as containing a foreign key reference to the parent table.
tags = db.relationship("Tag", foreign_keys="[tag_1_id, tag_2_id, tag_3_id, tag_4_id, tag_5_id]") resulting in
sqlalchemy.exc.InvalidRequestError: When initializing mapper Mapper|AnnotationVersion|annotation_version, expression '[tag_1_id, tag_2_id, tag_3_id, tag_4_id, tag_5_id]' failed to locate a name ("name 'tag_1_id' is not defined"). If this is a class name, consider adding this relationship() to the class after both dependent classes have been defined.
And many other variations on the list style, using quotes inside and outside, and using table names and class names.
I've actually solved the problem in the course of this question. Since there seems to be no direct documentation, I'll answer it myself instead of deleting this question.

The key is to define the relationship on a primary join and specify the uselist parameter.
tags = db.relationship("Tag", primaryjoin="or_(PostVersion.tag_1_id==Tag.id,"
                       "PostVersion.tag_2_id==Tag.id, PostVersion.tag_3_id==Tag.id,"
                       "PostVersion.tag_4_id==Tag.id, PostVersion.tag_5_id==Tag.id)",
                       uselist=True)
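The same idea can be tried outside Flask. The following is a reduced sketch with two tag columns instead of five, plain SQLAlchemy 1.4 or later instead of Flask-SQLAlchemy, and an in-memory SQLite database; all of that is assumed for the demo. The viewonly=True flag is an addition that is not in the answer above: it marks the collection as read-only, which seems the safer choice since the ORM cannot know which of the columns an appended Tag should be written to.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    tag = Column(String(127))


class PostVersion(Base):
    __tablename__ = "post_version"
    id = Column(Integer, primary_key=True)
    tag_1_id = Column(Integer, ForeignKey("tag.id"))
    tag_2_id = Column(Integer, ForeignKey("tag.id"))

    # Read-only collection of every Tag referenced by either column.
    tags = relationship(
        "Tag",
        primaryjoin="or_(PostVersion.tag_1_id == Tag.id, "
        "PostVersion.tag_2_id == Tag.id)",
        uselist=True,
        viewonly=True,  # assumption: writes go through the tag_N_id columns
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all(
        [Tag(id=1, tag="python"), Tag(id=2, tag="orm"), Tag(id=3, tag="sql")]
    )
    session.flush()  # insert the tags before the row that references them
    session.add(PostVersion(id=1, tag_1_id=1, tag_2_id=3))
    session.commit()

    post = session.get(PostVersion, 1)
    tag_names = sorted(t.tag for t in post.tags)
```

Tags are still assigned by setting the individual tag_N_id columns; the tags attribute only reads them back as one list.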

Using sqlalchemy to define relationships in MySQL

I am in the process of working with sqlalchemy and MySQL to build a database. I am currently having trouble defining a specific relationship between two tables.
class Experiment_Type(Base):
    __tablename__='experiment_types'
    id = Column(Integer, primary_key=True)
    type = Column(String(100))
class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(Integer, primary_key=True)
    type = Column(String(100))
    sub_id = Column(Integer, ForeignKey('experiment_types.id'))
    experiments = relationship('Experiments',
        primaryjoin="and_(Experiment_Type.id == Experiments.sub_id,
        'Experiments.type' == 'Experiment_Type.type')",
        backref=backref('link'))
What I want to do is have the values of sub_id in experiments match the id in experiment_types based on type (if an entry in experiment_types with type = 'type1' has id = 1, then an entry in experiments with type = 'type1' should have sub_id = 1). I am not even sure this is the best way to define the relationship in this situation, so any advice is welcome.
The current error message is this:
sqlalchemy.exc.ArgumentError: Could not locate any relevant foreign key columns for primary join condition '0 = 1' on relationship Experiments.experiments. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or are annotated in the join condition with the foreign() annotation.

The whole point of setting up relationships in relational dbs is to not have to duplicate data across tables. Just do something like this:
class ExperimentType(Base):
    __tablename__ = 'experiment_types'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(Integer, primary_key=True)
    description = Column(String(100))
    type_id = Column(Integer, ForeignKey('experiment_types.id'))
    type = relationship("ExperimentType")
Then, if you do need to display the experiment type later, you can access it with something like:
exp = session.query(Experiments).first()
print(exp.type.name)

SQLAlchemy foreign key lazy loading

I have a basic one to many relationship:
class Term(Base):
    __tablename__ = 'term'
    id = Column(Integer, primary_key=True)
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    term = Column(Integer, ForeignKey('term.id'))
But when I load the Node object and access the "term" property, I just get the numeric term id, not the Term object.
node = session.query(Node).filter(Node.id == 1).one()
print(node.term)  # 123
How do I get foreign key fields to lazy load the object?
Thanks very much.
Ben

Because your term attribute is a Column, SQLAlchemy maps it as that column's value. You can get SQLAlchemy to actually load the referenced row by using relationship:
from sqlalchemy.orm import relationship
class Term(Base):
    __tablename__ = 'term'
    id = Column(Integer, primary_key=True)
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    term = Column(Integer, ForeignKey('term.id'))
    related_term = relationship(Term, backref="nodes")
Because my_node.related_term looks a bit odd, I tend to prefer a naming convention of having the column called table_column instead of just table, so that I can also name the relationship attribute after the table, instead of inventing some other, odd name.
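That naming convention could look like the sketch below, where the column is renamed to term_id so the relationship can take the name term. The renamed attribute, the sample ids, and the in-memory SQLite database are assumptions made for this illustration (SQLAlchemy 1.4 or later).

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Term(Base):
    __tablename__ = "term"
    id = Column(Integer, primary_key=True)


class Node(Base):
    __tablename__ = "node"
    id = Column(Integer, primary_key=True)
    term_id = Column(Integer, ForeignKey("term.id"))  # raw foreign key value
    term = relationship(Term, backref="nodes")  # lazily loaded Term object


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Node(id=1, term=Term(id=123)))
    session.commit()

with Session(engine) as session:
    node = session.query(Node).filter(Node.id == 1).one()
    fk_value = node.term_id  # the integer stored in the node table
    term_is_object = isinstance(node.term, Term)  # loaded on first access
    back_reference = node.term.nodes == [node]  # provided by the backref
```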

Use the returned value of node.term for a new query, to get the related objects:
node = session.query(Node).filter(Node.id == 1).one()
related_term = session.query(Term).filter(Term.id == node.term).one()

Do I need multiple association tables for this relationship?

I'm trying to get my head around the best way to construct a relationship that maps many Constants to many Items.
My initial relationship, an Item has a Constant, looked like this.
class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constantId = Column(Integer, ForeignKey("Constant.id"))
    constant = relationship("Constant")
However, I really need my item to have more than one constant, something like this...
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1Id = Column(Integer, ForeignKey("Constant.id"))
    constant1 = relationship("Constant")
    constant2Id = Column(Integer, ForeignKey("Constant.id"))
    constant2 = relationship("Constant")
My first attempt was to use an association table...
item_to_constant_assoc = Table("itemToConstantAssoc", Base.metadata, Column("constantId", Integer, ForeignKey("Constant.id")), Column("itemId", Integer, ForeignKey("Item.id")))
while updating the Item class accordingly:
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1 = relationship("Constant", secondary=item_to_constant_assoc, uselist=False)
    constant2 = relationship("Constant", secondary=item_to_constant_assoc, uselist=False)
This failed (understandably when looking at the MySQL tables that were created) because Item.constant1 and Item.constant2 referred to the same entry in the association table.
My next step is to add another association table for the second constant but I have to wonder whether I'm barking up the wrong tree as I seem to be creating a large number of tables for a relatively simple mapping. I've read the documentation. It is detailed and substantial (thanks Michael Bayer!) and I may have just overlooked a section. Could anyone provide me with a few pointers either to this problem, or what I should be looking for in the docs?
Thanks!
Phil

Couldn't see the wood for the trees. This is easily accomplished by using the primaryjoin argument on the relationship.
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1Id = Column(Integer, ForeignKey("Constant.id"))
    constant1 = relationship("Constant", primaryjoin="Constant.id==Item.constant1Id")
    constant2Id = Column(Integer, ForeignKey("Constant.id"))
    constant2 = relationship("Constant", primaryjoin="Constant.id==Item.constant2Id")
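An equivalent way to disambiguate the two paths is the foreign_keys argument, which names the column each relationship should use and lets SQLAlchemy derive the join. This is a sketch of that alternative, not the code from the answer above; the sample rows and the in-memory SQLite database are assumed, and it targets SQLAlchemy 1.4 or later.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)


class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1Id = Column(Integer, ForeignKey("Constant.id"))
    constant2Id = Column(Integer, ForeignKey("Constant.id"))
    # Each relationship follows exactly one of the two foreign keys.
    constant1 = relationship("Constant", foreign_keys=[constant1Id])
    constant2 = relationship("Constant", foreign_keys=[constant2Id])


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Item(constant1=Constant(name="pi"), constant2=Constant(name="e")))
    session.commit()

with Session(engine) as session:
    item = session.query(Item).one()
    first_name = item.constant1.name
    second_name = item.constant2.name
```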

A many-to-many association already allows each Item to have an unlimited number of constants. You don't need anything more than this as your two base tables.
class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
item_to_constant_assoc = Table("itemToConstantAssoc", Base.metadata, Column("constantId", Integer, ForeignKey("Constant.id")), Column("itemId", Integer, ForeignKey("Item.id")))
At this point, every Item has an unlimited number of Constants. When you want a specific constant, you have to query the constant by the name attribute in the Constant table. Your association table is merely a list of key pairs: (itemID, constantId).
The set of all Constants for an Item is a three-table join for all association rows joined with matching Constant rows for a given Item.
The set of all Items for a Constant is a three-table join for all association rows joined with matching Item rows for a given Constant.
A specific Constant for an Item needs to be retrieved via a join. Think of it as the set of all Constants for a given Item where both the Item and the Constant name are given. The SQL involves a join even though only a single row is retrieved.
I think your generic query to associate a constant with all relevant items, or an item with all relevant constants, will look something like this.
session.query(Item).join(item_to_constant_assoc, item_to_constant_assoc.c.itemId == Item.id).join(Constant, item_to_constant_assoc.c.constantId == Constant.id)
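As a rough end-to-end illustration of the many-to-many layout described above, the sketch below adds a constants relationship through the association table and queries across it. The relationship name, the backref, the sample rows, and the in-memory SQLite database are assumptions made for the demo; SQLAlchemy 1.4 or later is assumed.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

item_to_constant_assoc = Table(
    "itemToConstantAssoc",
    Base.metadata,
    Column("constantId", Integer, ForeignKey("Constant.id")),
    Column("itemId", Integer, ForeignKey("Item.id")),
)


class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)


class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
    # Assumed attribute: all constants of an item, via the association table.
    constants = relationship(
        "Constant", secondary=item_to_constant_assoc, backref="items"
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    pi = Constant(name="pi")
    e = Constant(name="e")
    session.add_all(
        [Item(name="circle", constants=[pi]), Item(name="spiral", constants=[pi, e])]
    )
    session.commit()

    # Items that use a given constant: a join across the association table.
    items_using_e = [
        item.name
        for item in session.query(Item)
        .join(Item.constants)
        .filter(Constant.name == "e")
    ]
    spiral = session.query(Item).filter_by(name="spiral").one()
    spiral_constants = sorted(c.name for c in spiral.constants)
```

With the relationship in place, the join through the association table is generated by SQLAlchemy, so a single association table serves any number of constants per item.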
