Performance issue with multiple table references to one - python

Database model (simplified):
class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)
    ...
    @classmethod
    def link(cls):
        col = Column(Integer, ForeignKey(cls.id))
        rel = relationship(cls, uselist=False, lazy='selectin', foreign_keys=[col])
        return col, rel
class File(Base):
    __tablename__ = "document_file"
    id = Column(Integer, primary_key=True)
    document_id = Column(Integer, ForeignKey(Document.id))
    document = relationship(Document, back_populates="files")
    ...
class A(Base):
    __tablename__ = "a"
    id = Column(Integer, primary_key=True)
    ...
class B(Base):
    __tablename__ = "b"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey(A.id))
    a = relationship(A, back_populates="a")
    ...
    doc1_id, doc1 = Document.link()
    doc2_id, doc2 = Document.link()
    ....
class C(Base):
    __tablename__ = "c"
    id = Column(Integer, primary_key=True)
    b_id = Column(Integer, ForeignKey(B.id))
    b = relationship(B, back_populates="a")
    ...
    doc_id, doc = Document.link()
    ...
The Document entity is referenced from several domain-specific entities. The model has a complex hierarchical structure with several one-to-many layers. It is fetched as a single root object (class A in the example), and SQLAlchemy's lazy loading does the rest.
For a single one-to-many relationship, SQLAlchemy detects the pattern and loads the whole list of objects in a single database request. In this case, however, that optimization does not kick in: Document instances are fetched one by one. With 10k+ objects this becomes very slow.
Two ways to solve this:
1. Use the selectinload policy for the whole hierarchy. The drawback is that I only know I need the whole hierarchy after checking several objects. Moreover, sometimes I can't issue an additional query with populate_existing because the objects are already modified.
2. Use SQLAlchemy mapper data to scan for references to Document and fetch them into a dictionary with a single request. But I can't just assign the fetched objects to properties managed by SQLAlchemy, so this technique is not uniform with ordinary object access.
Two questions:
1. Is there another optimization technique?
2. Is there a more efficient solution in terms of database model architecture?
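A minimal sketch of the second approach, under stated assumptions: Holder is a hypothetical stand-in for classes like B or C, the relationship uses the default lazy loading, and the database is an in-memory SQLite. It relies on the fact that a plain many-to-one lazy load looks in the session's identity map before emitting SQL, so pre-fetching the referenced Document rows in one query lets ordinary attribute access resolve them without further round trips and without assigning anything by hand.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Document(Base):
    __tablename__ = "document"
    id = Column(Integer, primary_key=True)


class Holder(Base):  # hypothetical stand-in for B / C in the question
    __tablename__ = "holder"
    id = Column(Integer, primary_key=True)
    doc_id = Column(Integer, ForeignKey(Document.id))
    doc = relationship(Document, foreign_keys=[doc_id])  # default lazy loading


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

statements = []  # every SQL statement sent to the database is recorded here


@event.listens_for(engine, "before_cursor_execute")
def record_statement(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    session.add_all([Holder(doc=Document()) for _ in range(5)])
    session.commit()

with Session(engine) as session:
    holders = session.query(Holder).all()
    ids = sorted({h.doc_id for h in holders if h.doc_id is not None})
    # One query for all referenced documents. Keep a strong reference to the
    # result: the identity map only holds weak references to clean objects.
    docs = session.query(Document).filter(Document.id.in_(ids)).all()

    statements.clear()
    resolved = [h.doc for h in holders]  # served from the identity map
    lazy_load_statements = len(statements)
```

In a real hierarchy the ids would be collected from every foreign key column that points at Document; how those columns are discovered (for example through mapper inspection) is left out of this sketch.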

Related

sqlalchemy mapping multiple entries to the same backref

Perhaps this is a design issue but I have a case where I have a class that defines a default part, and alternate_parts.
class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    part_id = Column(Integer, ForeignKey('part.id'))
    part = relationship("Part", backref="entities")
    alternate_parts = relationship("Part", secondary="associate_entity_to_parts", backref="entities")
    ...
class Part(Base):
    __tablename__ = "part"
    id = Column(Integer, primary_key=True)
    ...
This throws an error:
sqlalchemy.exc.ArgumentError: Error creating backref 'entities' on relationship 'Entity.alternate_parts': property of that name exists on mapper 'Mapper|Parts|part'
The purpose of this is that I want to be able to reverse look up any entities that refer to this part, including ones where it is an alternate. The reason for storing the alternates this way is that I don't have to store some kind of a "default" part id somewhere.
Any suggestions?

SQLAlchemy: Map multiple instances to the same row

Consider Many-to-One relationship: Article-to-Keyword.
An article has only one keyword, single keyword can be referenced by multiple articles.
class Article(Base):
    __tablename__ = 'article'
    id = Column(Integer, primary_key=True)
    keyword_id = Column(Integer, ForeignKey('keyword.id'))
    keyword = relationship("Keyword")
class Keyword(Base):
    __tablename__ = 'keyword'
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)
Now, I'd like to be able to associate multiple instances of Keyword having the same name value with the sole row in the keyword table.
So that saving a1 and a2:
a1 = Article()
a1.keyword = Keyword(name="science")
a2 = Article()
a2.keyword = Keyword(name="science")
wouldn't yield a unique constraint violation.
Currently one has to issue an additional query that fetches the key from the keyword table and set it on the Article object as Article.keyword_id.
This gets pretty tedious with real-world schemas.
Take a look at the Unique Object recipe.
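The idea behind that recipe can be sketched roughly as follows. This is a simplified illustration, not the recipe itself: unique_keyword is a hypothetical helper, the cache lives in session.info under an arbitrary key, and the database is an in-memory SQLite. The helper returns the cached or already stored Keyword for a name and only creates a new one when none exists.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Keyword(Base):
    __tablename__ = "keyword"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, unique=True)


class Article(Base):
    __tablename__ = "article"
    id = Column(Integer, primary_key=True)
    keyword_id = Column(Integer, ForeignKey("keyword.id"))
    keyword = relationship("Keyword")


def unique_keyword(session, name):
    """Hypothetical helper: return the one Keyword for `name` in this session."""
    cache = session.info.setdefault("unique_keywords", {})
    if name not in cache:
        # Look for an existing row without flushing pending objects first.
        with session.no_autoflush:
            keyword = session.query(Keyword).filter_by(name=name).first()
        if keyword is None:
            keyword = Keyword(name=name)
            session.add(keyword)
        cache[name] = keyword
    return cache[name]


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a1 = Article()
    a1.keyword = unique_keyword(session, "science")
    a2 = Article()
    a2.keyword = unique_keyword(session, "science")
    session.add_all([a1, a2])
    session.commit()

    same_object = a1.keyword is a2.keyword
    keyword_rows = session.query(Keyword).count()
```

Note that a per-session cache like this one does not protect against two concurrent sessions inserting the same name, and it should be cleared after a rollback; the unique constraint remains the final safeguard.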

Long chained exists query with multiple one-to-many mappings in the chain

Edit: The following seems to be the right way:
session.query(User).join("userGadgets", "gadget", "components","gadgetComponentMetals")
Original:
I have the following tables configured:
class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)
class Gadget(Base):
    __tablename__ = "gadget"
    id = Column(Integer, primary_key=True)
    brand = Column(String)
class UserGadget(Base):
    __tablename__ = "user_gadget"
    user_id = Column(Integer, ForeignKey('user.id'), primary_key=True)
    gadget_id = Column(Integer, ForeignKey('gadget.id'), primary_key=True)
    user = relationship("User", backref=backref('userGadgets', order_by=user_id))
    gadget = relationship("Gadget", backref=backref('userGadgets', order_by=gadget_id))
class GadgetComponent(Base):
    __tablename__ = "gadget_component"
    id = Column(String, primary_key=True)
    gadget_id = Column(Integer, ForeignKey('gadget.id'))
    component_maker = Column(String)
    host = relationship("Gadget", backref=backref('components', order_by=id))
class ComponentUsingMetal(Base):
    __tablename__ = "component_metal"
    id = Column(Integer, primary_key=True)
    # ForeignKey takes "<table name>.<column>", so this targets gadget_component.id
    component_id = Column(Integer, ForeignKey('gadget_component.id'))
    metal = Column(String)
    component = relationship("GadgetComponent", backref=backref('gadgetComponentMetals', order_by=id))
I want to find all user names for users who own gadgets having at least one component containing some kind of metal. SQL query for this will be something along the lines of:
SELECT DISTINCT u.name
FROM user u
JOIN user_gadget ug ON (u.id = ug.user_id)
JOIN gadget_component gc ON (ug.gadget_id = gc.gadget_id)
JOIN component_metal cm ON (gc.id = cm.component_id)
ORDER BY u.name
I have tried various versions along the lines of:
session.query(User).filter(User.userGadgets.any(UserGadget.gadget.components.any(GadgetComponent.gadgetComponentMetals.exists())))
I get the below error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with UserGadget.gadget has an attribute 'gadgetComponents'
Any ideas on what I am doing wrong or is there a better way to do this kind of query in SQLAlchemy?
join() is the better way to go here, since any() is going to produce lots of expensive nested subqueries. The mistake you made with any() is using syntax like UserGadget.gadget.components. SQLAlchemy doesn't continue the namespace of attributes in a series like that: there is no UserGadget.gadget.components; there is just UserGadget.gadget and Gadget.components, separately. Just like SQL won't let you say "SELECT * from user_gadget.gadget_id.gadget.component_id" or something, SQLAlchemy needs you to tell it how you want to join together the multiple tables that you're querying from. With any() here it would be something like any(and_(UserGadget.gadget_id == GadgetComponent.gadget_id)), but using JOIN is better in any case.
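Both forms can be sketched as below, as an illustration rather than the answerer's exact code. Assumptions: an in-memory SQLite database, sample rows invented for the demo, plain string backrefs, a component_metal foreign key that targets gadget_component.id, and component_id typed as String to match the String primary key. The joins are written with relationship attributes, since recent SQLAlchemy releases no longer accept relationship names as strings in join().

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, configure_mappers, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class Gadget(Base):
    __tablename__ = "gadget"
    id = Column(Integer, primary_key=True)
    brand = Column(String)


class UserGadget(Base):
    __tablename__ = "user_gadget"
    user_id = Column(Integer, ForeignKey("user.id"), primary_key=True)
    gadget_id = Column(Integer, ForeignKey("gadget.id"), primary_key=True)
    user = relationship("User", backref="userGadgets")
    gadget = relationship("Gadget", backref="userGadgets")


class GadgetComponent(Base):
    __tablename__ = "gadget_component"
    id = Column(String, primary_key=True)
    gadget_id = Column(Integer, ForeignKey("gadget.id"))
    host = relationship("Gadget", backref="components")


class ComponentUsingMetal(Base):
    __tablename__ = "component_metal"
    id = Column(Integer, primary_key=True)
    component_id = Column(String, ForeignKey("gadget_component.id"))  # assumed type
    metal = Column(String)
    component = relationship("GadgetComponent", backref="gadgetComponentMetals")


configure_mappers()  # make backref attributes such as User.userGadgets available

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    with_metal = GadgetComponent(
        id="c1", gadgetComponentMetals=[ComponentUsingMetal(metal="copper")]
    )
    without_metal = GadgetComponent(id="c2")
    session.add_all([
        UserGadget(user=User(name="alice"), gadget=Gadget(brand="x", components=[with_metal])),
        UserGadget(user=User(name="bob"), gadget=Gadget(brand="y", components=[without_metal])),
    ])
    session.commit()

    # JOIN form: one relationship attribute per hop.
    join_query = (
        session.query(User)
        .join(User.userGadgets)
        .join(UserGadget.gadget)
        .join(Gadget.components)
        .join(GadgetComponent.gadgetComponentMetals)
        .distinct()
        .order_by(User.name)
    )
    names_via_join = [user.name for user in join_query]

    # EXISTS form: any() for collections, has() for the many-to-one hop,
    # each called on the class that owns the relationship.
    exists_query = session.query(User).filter(
        User.userGadgets.any(
            UserGadget.gadget.has(
                Gadget.components.any(GadgetComponent.gadgetComponentMetals.any())
            )
        )
    ).order_by(User.name)
    names_via_exists = [user.name for user in exists_query]
```

Only the user whose gadget has a component with a metal row is returned by either query; the EXISTS form nests one correlated subquery per hop, which is the cost the answer warns about.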

SQLAlchemy foreign key lazy loading

I have a basic one to many relationship:
class Term(Base):
    __tablename__ = 'term'
    id = Column(Integer, primary_key=True)
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    term = Column(Integer, ForeignKey('term.id'))
But when I load the Node object and access the "term" property, I just get the numeric term id, not the Term object.
node = session.query(Node).filter(Node.id == 1).one()
print(node.term)  # 123
How do I get Foreign Key fields to lazy load the object?
Thanks very much.
Ben
Because your term attribute is a Column, SQLAlchemy maps it as that column's value. You can get SQLAlchemy to actually load the referent row by using relationship:
from sqlalchemy.orm import relationship
class Term(Base):
    __tablename__ = 'term'
    id = Column(Integer, primary_key=True)
class Node(Base):
    __tablename__ = 'node'
    id = Column(Integer, primary_key=True)
    term = Column(Integer, ForeignKey('term.id'))
    related_term = relationship(Term, backref="nodes")
Because my_node.related_term looks a bit odd, I tend to prefer a naming convention of having the column called table_column instead of just table, so that I can also name the relationship attribute after the table, instead of inventing some other, odd name.
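That naming convention might look like the sketch below, where the column is renamed to term_id and the relationship takes the name term. The renaming and the in-memory SQLite setup with sample ids are assumptions made for illustration only.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Term(Base):
    __tablename__ = "term"
    id = Column(Integer, primary_key=True)


class Node(Base):
    __tablename__ = "node"
    id = Column(Integer, primary_key=True)
    term_id = Column(Integer, ForeignKey("term.id"))  # raw foreign key value
    term = relationship(Term, backref="nodes")        # lazily loaded Term object


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Node(id=1, term=Term(id=123)))
    session.commit()

with Session(engine) as session:
    node = session.query(Node).filter(Node.id == 1).one()
    raw_key = node.term_id       # the integer stored in the column
    term_obj = node.term         # triggers the lazy load of the Term row
    term_pk = term_obj.id
    term_is_object = isinstance(term_obj, Term)
    back_reference_ids = [n.id for n in term_obj.nodes]
```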
Use the returned value of node.term for a new query, to get the related objects:
node = session.query(Node).filter(Node.id == 1).one()
related_term = session.query(Term).filter(Term.id == node.term).one()

Do I need multiple association tables for this relationship?

I'm trying to get my head around the best way to construct a relationship that maps many Constants to many Items.
My initial relationship, an Item has a Constant, looked like this.
class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constantId = Column(Integer, ForeignKey("Constant.id"))
    constant = relationship("Constant")
However, I really need my item to have more than one constant, something like this...
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1Id = Column(Integer, ForeignKey("Constant.id"))
    constant1 = relationship("Constant")
    constant2Id = Column(Integer, ForeignKey("Constant.id"))
    constant2 = relationship("Constant")
My first attempt was to use an association table...
item_to_constant_assoc = Table("itemToConstantAssoc", Base.metadata, Column("constantId", Integer, ForeignKey("Constant.id")), Column("itemId", Integer, ForeignKey("Item.id")))
while updating the Item class accordingly:
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1 = relationship("Constant", secondary=item_to_constant_assoc, uselist=False)
    constant2 = relationship("Constant", secondary=item_to_constant_assoc, uselist=False)
This failed (understandably when looking at the MySQL tables that were created) because Item.constant1 and Item.constant2 referred to the same entry in the association table.
My next step is to add another association table for the second constant but I have to wonder whether I'm barking up the wrong tree as I seem to be creating a large number of tables for a relatively simple mapping. I've read the documentation. It is detailed and substantial (thanks Michael Bayer!) and I may have just overlooked a section. Could anyone provide me with a few pointers either to this problem, or what I should be looking for in the docs?
Thanks!
Phil
Couldn't see the wood for the trees. This is easily accomplished by using the primaryjoin argument on the relationship.
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    constant1Id = Column(Integer, ForeignKey("Constant.id"))
    constant1 = relationship("Constant", primaryjoin="Constant.id==Item.constant1Id")
    constant2Id = Column(Integer, ForeignKey("Constant.id"))
    constant2 = relationship("Constant", primaryjoin="Constant.id==Item.constant2Id")
A many-to-many association already allows each Item to have an unlimited number of constants. You don't need anything more than this as your two base tables.
class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
item_to_constant_assoc = Table("itemToConstantAssoc", Base.metadata, Column("constantId", Integer, ForeignKey("Constant.id")), Column("itemId", Integer, ForeignKey("Item.id")))
At this point, every Item has an unlimited number of Constants. When you want a specific constant, you have to query the constant by the name attribute in the Constant table. Your association table is merely a list of key pairs: (itemID, constantId).
The set of all Constants for an Item is a three-table join for all association rows joined with matching Constant rows for a given Item.
The set of all Items for a Constant is a three-table join for all association rows joined with matching Item rows for a given Constant.
A specific Constant for an Item needs to be retrieved via a join. Think of it as the set of all Constants for a given Item where both the Item and the Constant name are given. The SQL involves a join even though only a single row is retrieved.
I think your generic query to associate a constant with all relevant items or an item with all relevant constants will look something like this.
session.query(Item).join(item_to_constant_assoc, item_to_constant_assoc.c.itemId == Item.id).join(Constant, item_to_constant_assoc.c.constantId == Constant.id)
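The many-to-many layout described in this answer can be sketched as follows. The constants relationship, the items backref, the sample rows and the in-memory SQLite database are assumptions added for illustration; the tables follow the definitions above. With secondary= on the relationship, SQLAlchemy builds the three-table join itself.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Association table: nothing but (constantId, itemId) key pairs.
item_to_constant_assoc = Table(
    "itemToConstantAssoc",
    Base.metadata,
    Column("constantId", Integer, ForeignKey("Constant.id")),
    Column("itemId", Integer, ForeignKey("Item.id")),
)


class Constant(Base):
    __tablename__ = "Constant"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)


class Item(Base):
    __tablename__ = "Item"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String(64), nullable=False)
    # Assumed relationship names; not part of the original answer.
    constants = relationship(
        "Constant", secondary=item_to_constant_assoc, backref="items"
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    pi, e = Constant(name="pi"), Constant(name="e")
    session.add_all([
        Item(name="circle", constants=[pi]),
        Item(name="spiral", constants=[pi, e]),
    ])
    session.commit()

    # All Items for a given Constant name: joins Item, association, Constant.
    items_using_pi = (
        session.query(Item)
        .join(Item.constants)
        .filter(Constant.name == "pi")
        .order_by(Item.name)
        .all()
    )
    item_names = [item.name for item in items_using_pi]

    # All Constants for a given Item, through the collection attribute.
    spiral = session.query(Item).filter_by(name="spiral").one()
    spiral_constants = sorted(c.name for c in spiral.constants)
```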
