Please see update at bottom
I have three classes. Let's call them Post, PostVersion, and Tag. (This is for an internal version control system in a web app, perhaps similar to StackOverflow, though I'm unsure of their implementation strategy). I sort of use terminology from git to understand it. These are highly simplified versions of the classes for the purposes of this question:
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    author_id = db.Column(db.Integer, db.ForeignKey("user.id"))
    author = db.relationship("User", backref="posts")
    head_id = db.Column(db.Integer, db.ForeignKey("post_version.id"))
    HEAD = db.relationship("PostVersion", foreign_keys=[head_id])
    added = db.Column(db.DateTime, default=datetime.utcnow)

class PostVersion(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    editor_id = db.Column(db.Integer, db.ForeignKey("user.id"))
    editor = db.relationship("User")
    previous_id = db.Column(db.Integer, db.ForeignKey("post_version.id"), default=None)
    previous = db.relationship("PostVersion")
    pointer_id = db.Column(db.Integer, db.ForeignKey("post.id"))
    pointer = db.relationship("Post", foreign_keys=[pointer_id])
    post = db.Column(db.Text)
    modified = db.Column(db.DateTime, default=datetime.utcnow)
    tag_1_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_2_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_3_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_4_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_5_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_1 = db.relationship("Tag", foreign_keys=[tag_1_id])
    tag_2 = db.relationship("Tag", foreign_keys=[tag_2_id])
    tag_3 = db.relationship("Tag", foreign_keys=[tag_3_id])
    tag_4 = db.relationship("Tag", foreign_keys=[tag_4_id])
    tag_5 = db.relationship("Tag", foreign_keys=[tag_5_id])

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tag = db.Column(db.String(128))
To make a new post, I create both a Post and an initial PostVersion to which Post.head_id points. Every time an edit is made, a new PostVersion is created pointing to the previous PostVersion, and the Post.head_id is reset to point to the new PostVersion. To reset the post version to an earlier version--well, I haven't gotten that far but it seems trivial to either copy the previous version or just reset the pointer to the previous version.
My question is this, though: how can I write a relationship between Post and Tag such that
Post.tags would be a list of all the tags the current PostVersion contains, and
Tag.posts would be a list of all the Posts that currently have that particular tag?
The first condition seems easy enough, a simple method
def get_tags(self):
    t = []
    if self.HEAD.tag_1:
        t.append(self.HEAD.tag_1)
    if self.HEAD.tag_2:
        t.append(self.HEAD.tag_2)
    if self.HEAD.tag_3:
        t.append(self.HEAD.tag_3)
    if self.HEAD.tag_4:
        t.append(self.HEAD.tag_4)
    if self.HEAD.tag_5:
        t.append(self.HEAD.tag_5)
    return t
does the trick just fine for now, but the second condition is almost intractable for me right now. I currently use an obnoxious method in Tag where I query for all the PostVersion rows with the tag using an or_ filter:
def get_posts(self):
    # Find every version that carries this tag in any of its five slots.
    edits = PostVersion.query.filter(or_(
        PostVersion.tag_1_id==self.id,
        PostVersion.tag_2_id==self.id,
        PostVersion.tag_3_id==self.id,
        PostVersion.tag_4_id==self.id,
        PostVersion.tag_5_id==self.id,
    )).order_by(PostVersion.modified.desc()).all()
    posts = []
    for e in edits:
        # Keep a post only if its current HEAD still has the tag.
        if self in e.pointer.get_tags() and e.pointer not in posts:
            posts.append(e.pointer)
    return posts
This is horribly inefficient and I cannot paginate the results.
I know this would be a secondary join from Post to Tag or Tag to Post through PostVersion, but it would have to be a secondary join on an or, and I have no clue how to even start to write that.
Looking back on my code I'm beginning to wonder why some of these relationships require the foreign_keys parameter to be defined and others don't. I'm thinking it's relating to where they're defined (immediately following the FK id column or not) and noticing that there's a list for the foreign_keys, I'm thinking that's how I could define it. But I'm unsure how to pursue this.
I'm also wondering now if I could dispense with the pointer_id on PostVersion with a well-configured relationship. This, however, is irrelevant to the question (though the circular reference does cause headaches).
For reference, I am using Flask-SQLAlchemy, Flask-migrate, and MariaDB. I am heavily following Miguel Grinberg's Flask Megatutorial.
Any help or advice would be a godsend.
UPDATE
I have devised the following MySQL query that works, and now I need to translate it into SQLAlchemy:
SELECT
    post.id, tag.tag
FROM
    post
INNER JOIN
    post_version
ON
    post.head_id=post_version.id
INNER JOIN
    tag
ON
    post_version.tag_1_id=tag.id OR
    post_version.tag_2_id=tag.id OR
    post_version.tag_3_id=tag.id OR
    post_version.tag_4_id=tag.id OR
    post_version.tag_5_id=tag.id
WHERE
    tag.tag="<tag name>";
Can you change the database design, or do you have to make your app work on a DB that you can't change? If the latter, I can't help you. If you can change the design, you should do it like this:
Replace the linked chain of PostVersions with a one-to-many relationship from Post to PostVersions. Your "Post" class will end up having a relationship "versions" to all instances of PostVersion pertinent to that Post.
Replace the tag_id members with a many-to-many relationship using an additional association table.
Both methods are well-explained in the SQLAlchemy docs. Be sure to start with minimal code, testing in small non-Flask command line programs. Once you have the basic functionality down, transfer the concept to your more complicated classes. After that, ask yourself your original questions again. The answers will come much more easily.
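The two suggestions above might look roughly like the following. This is a minimal sketch in plain SQLAlchemy (1.4 or later assumed) against in-memory SQLite, not the asker's real models: the `version_tags` table, the `versions` and `head` names, the `current_posts_with_tag` helper, and the rule that the version with the highest id is the current one are all assumptions made for illustration.

```python
from sqlalchemy import (Column, ForeignKey, Integer, String, Table, Text,
                        create_engine, func, select)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Association table (hypothetical name): one row per (version, tag) pair,
# replacing the fixed tag_1_id .. tag_5_id columns.
version_tags = Table(
    "version_tags",
    Base.metadata,
    Column("version_id", ForeignKey("post_version.id"), primary_key=True),
    Column("tag_id", ForeignKey("tag.id"), primary_key=True),
)


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    # One-to-many: every version of this post, oldest first.
    versions = relationship("PostVersion", back_populates="post",
                            order_by="PostVersion.id")

    @property
    def head(self):
        # Assumption: the newest version is the current one.
        return self.versions[-1] if self.versions else None


class PostVersion(Base):
    __tablename__ = "post_version"
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey("post.id"), nullable=False)
    body = Column(Text)
    post = relationship("Post", back_populates="versions")
    tags = relationship("Tag", secondary=version_tags, back_populates="versions")


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    tag = Column(String(128))
    versions = relationship("PostVersion", secondary=version_tags,
                            back_populates="tags")


def current_posts_with_tag(session, name):
    """Posts whose newest version carries the tag; returns a Query."""
    latest_ids = select(func.max(PostVersion.id)).group_by(PostVersion.post_id)
    return (
        session.query(Post)
        .join(Post.versions)
        .join(PostVersion.tags)
        .filter(PostVersion.id.in_(latest_ids), Tag.tag == name)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    python, flask = Tag(tag="python"), Tag(tag="flask")
    post = Post()
    post.versions.append(PostVersion(body="first draft", tags=[python, flask]))
    post.versions.append(PostVersion(body="second draft", tags=[python]))
    session.add(post)
    session.commit()

    head_tags = sorted(t.tag for t in post.head.tags)
    python_count = current_posts_with_tag(session, "python").count()
    flask_count = current_posts_with_tag(session, "flask").count()
```

Because `current_posts_with_tag` returns a query rather than a list, it can be limited, offset, or paginated before it is executed.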
I solved the problem on my own, and it really just consists of defining a primary and secondary join with an or_ in the primary:
posts = db.relationship("Post", secondary="post_version",
    primaryjoin="or_(Tag.id==post_version.c.tag_1_id,"
                "Tag.id==post_version.c.tag_2_id,"
                "Tag.id==post_version.c.tag_3_id,"
                "Tag.id==post_version.c.tag_4_id,"
                "Tag.id==post_version.c.tag_5_id)",
    secondaryjoin="Post.head_id==post_version.c.id",
    lazy="dynamic")
As you can see I mix table and class names. I will update the answer as I experiment to make it more regular.
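For comparison, the SQL from the update can also be written as an ordinary query with an `or_` in the join condition. The sketch below is only an illustration under assumptions: it uses plain SQLAlchemy (1.4 or later) with in-memory SQLite, trimmed-down copies of the three models, hard-coded ids, and a hypothetical helper name `posts_with_tag`.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, or_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    tag = Column(String(128))


class PostVersion(Base):
    __tablename__ = "post_version"
    id = Column(Integer, primary_key=True)
    tag_1_id = Column(Integer, ForeignKey("tag.id"))
    tag_2_id = Column(Integer, ForeignKey("tag.id"))
    tag_3_id = Column(Integer, ForeignKey("tag.id"))
    tag_4_id = Column(Integer, ForeignKey("tag.id"))
    tag_5_id = Column(Integer, ForeignKey("tag.id"))


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    head_id = Column(Integer, ForeignKey("post_version.id"))


def posts_with_tag(session, name):
    """Posts whose HEAD version has the tag in any slot; returns a Query."""
    return (
        session.query(Post)
        .join(PostVersion, Post.head_id == PostVersion.id)
        .join(Tag, or_(
            PostVersion.tag_1_id == Tag.id,
            PostVersion.tag_2_id == Tag.id,
            PostVersion.tag_3_id == Tag.id,
            PostVersion.tag_4_id == Tag.id,
            PostVersion.tag_5_id == Tag.id,
        ))
        .filter(Tag.tag == name)
        .order_by(Post.id)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Tag(id=1, tag="python"),
        Tag(id=2, tag="flask"),
        PostVersion(id=1, tag_1_id=1),  # old version, not a HEAD
        PostVersion(id=2, tag_3_id=1),  # HEAD of post 1, tagged python
        PostVersion(id=3, tag_2_id=2),  # HEAD of post 2, tagged flask
        Post(id=1, head_id=2),
        Post(id=2, head_id=3),
    ])
    session.commit()

    python_ids = [p.id for p in posts_with_tag(session, "python")]
    flask_ids = [p.id for p in posts_with_tag(session, "flask")]
    first_page = posts_with_tag(session, "python").limit(10).offset(0).all()
```

Since the helper returns a query, `limit` and `offset` apply before any rows are fetched, which addresses the pagination complaint in the question.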
Related
I am building a slightly unconventional marketplace in Flask; the requirements mean that the price of products varies depending on parameters specified by the user.
The user enters details -> the site creates a quote -> when a user decides to purchase, an order is created (which is linked to the quote).
I am building a page where merchants (who are selling through the site) can view their orders (so they can fulfill them). The issue I am now faced with is that the merchantID is specified in the Product, and I do not think the filter_by is up to the task.
class Quote(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    order = db.relationship('Order', backref='quote', lazy=True)

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    product = db.relationship('Quote', backref='product', lazy=True)

g.merchant.id = current_merchant()
orders = Order.query.filter_by(Order.quote.product.merchantID == g.merchant.id).all()
To get the orders for a merchant, I need to filter the orders by the merchantID that exists in the Product, related to the Quote which is finally related to the Order.
I am wondering if there is an elegant way of querying for a child based on a condition specified on its parent.
I considered this briefly:
orders = Order.query.all()
my_orders = []
for order in orders:
    if order.quote.product.merchantID == g.merchant.id:
        my_orders.append(order)
But my ugly code senses are tingling (perhaps wrongly); I just suspect there is a better way around this problem.
Many Thanks.
Based on Matt Healy's answer to the question "SQL Alchemy query child table filtered by parent join":
Orders = Order.query.join(Quote).join(Product).filter(Product.merchantID == g.merchant.id).all()
Works.
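To make the join chain concrete, here is a minimal, self-contained sketch in plain SQLAlchemy (1.4 or later assumed) with in-memory SQLite. The column names `quote_id` and `product_id`, the table names, and the helper `orders_for_merchant` are assumptions, since the question elides those parts of the models; the ON clauses are spelled out explicitly rather than inferred.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)
    merchantID = Column(Integer)


class Quote(Base):
    __tablename__ = "quote"
    id = Column(Integer, primary_key=True)
    product_id = Column(Integer, ForeignKey("product.id"))  # assumed column


class Order(Base):
    __tablename__ = "orders"  # assumed name, avoids the reserved word ORDER
    id = Column(Integer, primary_key=True)
    quote_id = Column(Integer, ForeignKey("quote.id"))  # assumed column


def orders_for_merchant(session, merchant_id):
    """Filter child rows (orders) by a column two joins up (product)."""
    return (
        session.query(Order)
        .join(Quote, Order.quote_id == Quote.id)
        .join(Product, Quote.product_id == Product.id)
        .filter(Product.merchantID == merchant_id)
        .order_by(Order.id)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Product(id=1, merchantID=10),
        Product(id=2, merchantID=20),
        Quote(id=1, product_id=1),
        Quote(id=2, product_id=2),
        Order(id=1, quote_id=1),
        Order(id=2, quote_id=2),
        Order(id=3, quote_id=1),
    ])
    session.commit()

    merchant_10_orders = [o.id for o in orders_for_merchant(session, 10)]
    merchant_20_orders = [o.id for o in orders_for_merchant(session, 20)]
```

The filtering happens in the database, so only the matching orders are loaded, unlike the Python loop over `Order.query.all()`.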
I'm writing a simple bookmark manager program that uses SQLAlchemy for data storage. I have database objects for Bookmarks and Tags, and there is a many-to-many relationship between them: a bookmark can use any tags in the database, and each tag can be assigned to any (or even all) bookmarks in the database. Tags are automatically created and removed by the program – if the number of bookmarks referencing a tag drops to zero, the tag should be deleted.
Here's my model code, with unnecessary methods such as __str__() removed:
mark_tag_assoc = Table('mark_tag_assoc', Base.metadata,
    Column('mark_id', Integer, ForeignKey('bookmarks.id')),
    Column('tag_id', Integer, ForeignKey('tags.id')),
    PrimaryKeyConstraint('mark_id', 'tag_id'))

class Bookmark(Base):
    __tablename__ = 'bookmarks'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    url = Column(String)
    description = Column(String)
    tags_rel = relationship("Tag", secondary=mark_tag_assoc,
                            backref="bookmarks", cascade="all, delete")

class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    text = Column(String)
I thought that if I set up a cascade (cascade="all, delete") it would take care of removing tags with no more references for me. What actually happens is that when any bookmark is deleted, all tags referenced by it are automatically removed from all other bookmarks and deleted, which is obviously not the intended behavior.
Is there a simple option to do what I want, or if not, what would be the cleanest way to implement it myself? Although I have a little bit of experience with simple SQL, this is my first time using SQLAlchemy, so details would be appreciated.
I'd still be interested to know if there happens to be a built-in function for this, but after further research it seems more likely to me that there is not, as there don't generally seem to be too many helpful functions for doing complicated stuff with many-to-many relationships. Here's how I solved it:
Remove cascade="all, delete" from the relationship so that no cascades are performed. Even with no cascades configured, SQLAlchemy will still remove rows from the association table when bookmarks are deleted.
Call a function after each delete of a Bookmark to check if the tag still has any relationships, and delete the tag if not:
def maybeExpungeTag(self, tag):
    """
    Delete /tag/ from the tags table if it is no longer referenced by
    any bookmarks.
    Return:
    True if the tag was deleted.
    False if the tag is still referenced and was not deleted.
    """
    if not len(tag.bookmarks):
        self.session.delete(tag)
        return True
    else:
        return False

# and for the actual delete...
mark = ...  # get the bookmark being deleted
tags = mark.tags_rel
self.session.delete(mark)
for tag in tags:
    self.maybeExpungeTag(tag)
self.session.commit()
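A self-contained variant of the same idea is sketched below, assuming plain SQLAlchemy (1.4 or later) and in-memory SQLite. The helper name `delete_bookmark` is hypothetical, and the explicit `flush()` and `expire()` calls are an assumption added here so that `tag.bookmarks` is re-read from the database after the bookmark is gone, instead of relying on a possibly stale loaded collection.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

mark_tag_assoc = Table(
    "mark_tag_assoc",
    Base.metadata,
    Column("mark_id", ForeignKey("bookmarks.id"), primary_key=True),
    Column("tag_id", ForeignKey("tags.id"), primary_key=True),
)


class Bookmark(Base):
    __tablename__ = "bookmarks"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # No cascade: deleting a bookmark must not delete its tags outright.
    tags_rel = relationship("Tag", secondary=mark_tag_assoc, backref="bookmarks")


class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    text = Column(String)


def delete_bookmark(session, mark):
    """Delete a bookmark, then delete any of its tags left without bookmarks."""
    tags = list(mark.tags_rel)  # copy before the bookmark goes away
    session.delete(mark)
    session.flush()  # emits the DELETEs, including the association rows
    for tag in tags:
        session.expire(tag, ["bookmarks"])  # force a fresh load of the backref
        if not tag.bookmarks:
            session.delete(tag)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    python, flask = Tag(text="python"), Tag(text="flask")
    first = Bookmark(name="first", tags_rel=[python, flask])
    second = Bookmark(name="second", tags_rel=[python])
    session.add_all([first, second])
    session.commit()

    delete_bookmark(session, first)
    session.commit()

    remaining_tags = [t.text for t in session.query(Tag).order_by(Tag.text)]
    remaining_marks = [b.name for b in session.query(Bookmark)]
```

Here "flask" was used only by the deleted bookmark and is removed, while "python" survives because the second bookmark still references it.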
The following code is for Flask-SQLAlchemy, but would be quite similar in SQLAlchemy.
I have two simple classes:
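class Thread(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    subject = db.Column(db.String)
    messages = db.relationship('Message', backref='thread', lazy='dynamic')

class Message(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # The relationship above needs a foreign key back to the thread.
    thread_id = db.Column(db.Integer, db.ForeignKey('thread.id'))
    # Pass the callable, not its result, so each row gets its own timestamp.
    created = db.Column(db.DateTime, default=datetime.utcnow)
    text = db.Column(db.String, nullable=False)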
I would like to query all Threads and have them ordered by last message created. This is simple:
threads = Thread.query.join(Message).order_by(Message.created.desc()).all()
The threads list is now correctly ordered and I can iterate over it. However, if I iterate over threads[0].messages, the Message objects are not ordered by Message.created descending.
I can solve this issue while declaring the relationship:
messages = relationship('Message', backref='thread', lazy='dynamic',
                        order_by='Message.created.desc()')
However, this is something I'd rather not do. I want to set this explicitly when building my query.
I could also call:
threads[0].messages.reverse()
...but this is quite inconvenient in a Jinja template.
Is there a good solution for setting order_by for joined model?
You have Thread.messages marked as lazy='dynamic'. This means that after querying for threads, messages is a query object, not a list yet. So iterate over threads[0].messages.order_by(Message.created.desc()).
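A small sketch of that suggestion, assuming plain SQLAlchemy (1.4 or later) with in-memory SQLite rather than Flask-SQLAlchemy; the `thread_id` column, the table names, and the fixed timestamps are assumptions added so the example runs on its own.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)
    subject = Column(String)
    # lazy="dynamic" makes thread.messages a query object, not a list.
    messages = relationship("Message", backref="thread", lazy="dynamic")


class Message(Base):
    __tablename__ = "message"
    id = Column(Integer, primary_key=True)
    thread_id = Column(Integer, ForeignKey("thread.id"))  # assumed column
    created = Column(DateTime)
    text = Column(String, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    thread = Thread(subject="demo")
    thread.messages.append(Message(text="oldest", created=datetime(2020, 1, 1)))
    thread.messages.append(Message(text="newest", created=datetime(2020, 1, 3)))
    thread.messages.append(Message(text="middle", created=datetime(2020, 1, 2)))
    session.add(thread)
    session.commit()

    # The ordering is chosen at query time, not in the relationship definition.
    newest_first = [m.text for m in thread.messages.order_by(Message.created.desc())]
    oldest_first = [m.text for m in thread.messages.order_by(Message.created)]
```

The same relationship can therefore be read in either order, depending on what each caller asks for.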
I'm looking for the best way to define relationships between two people and query it in SQLAlchemy. I'm having a hard time wrapping my head around this. Here is what I have so far but I don't know if I should be using a model as link table like this. Advice?
Example: character_a = student, character_b = teacher
or [[relationship.character_b, relationship.character_b.role] for relationship in character.relationships] to get a list of related characters and their roles.
class Character(db.Model):
    __tablename__ = 'characters'
    story_id = db.Column(db.String, db.ForeignKey('stories.id'))
    id = db.Column(db.Integer(), primary_key=True)
    name = db.Column(db.String(50))
    gender = db.Column(db.String(6))
    description = db.Column(db.Text())
    relationships = db.relationship('Relationship', backref='character', lazy='dynamic')

class Relationship(db.Model):
    __tablename__ = 'relationships'
    character_a_id = db.Column(db.String, db.ForeignKey('characters.id'))
    character_b_id = db.Column(db.String, db.ForeignKey('characters.id'))
    character_a_role = db.Column(db.String(25))
    character_b_role = db.Column(db.String(25))
Database schemas are hard to get right the first time. I suggest you follow the advice of Peter Norvig: write the test cases in English, then model them in code (play with the relationships as if they already existed in the database). You will discover the shortcomings of the current design that way. Then you can refine the relationships, and when you are done, your code should be as readable as the use cases you wrote in English.
Suppose we have these classes:
class Item(Base):
    id = Column(Integer, primary_key=True)
    data = Column(String)
    i18ns = relationship("ItemI18N", backref="item")

class ItemI18N(Base):
    lang_short = Column(String, primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'), primary_key=True)
    name = Column(String)
The idea here is to have this item's name in multiple languages, for example in English and German. This works fine so far, and one can easily work with that. However, most of the time I am not interested in all (i.e. both) names but only the one for the user's locale.
For example, if the user is English and wants to have the name in his language, I see two options:
# Use it as a list
item = session.query(Item).first()
print(item.data, [i18n.name for i18n in item.i18ns if i18n.lang_short == "en"][0])
# Get the name separately
item, name = session.query(Item, ItemI18N.name).join(ItemI18N).filter(ItemI18N.lang_short == "en").first()
print(item.data, name)
The first one filters the list, the second one queries the language separately. The second is the more efficient way as it only pulls the data really needed. However, there is a drawback: I now have to carry around two variables: item and name. If I were to extend my ItemI18N for example, add a description property, then I would query for ItemI18N and carry those around.
But business logic is different: I would expect to have an Item with a name and description attribute, so that I would do something like this:
item = session.query(Item).first()
print(item.data, item.name)
So that's where I want to go: pull all those attributes from ItemI18N directly into Item. Of course, I would have to specify the language somewhere. However, I cannot find any recipes for this since I don't even know what to search for. Can SQLAlchemy do such a thing?
I also created a complete example for everything I described (except of course the part I don't know how to realize).
Edit: I have played around a bit more to see whether I can come up with a better solution and so far, I have found one way that works. I initially tried to realize it with Query.get but this doesn't work beyond my simple example, because reality is different. To explain, I have to extend my initial model by adding a Language table and turn ItemI18N into a many-to-many relationship with the primary key being (lang_id, item_id):
class ItemI18N(Base):
    lang_id = Column(Integer, ForeignKey('language.id'), primary_key=True)
    item_id = Column(Integer, ForeignKey('item.id'), primary_key=True)
    name = Column(String)
    language = relationship("Language", lazy="joined")

class Language(Base):
    id = Column(Integer, primary_key=True)
    short = Column(String)
Now, to get my correct locale, I simply turn all loading into joined loading by applying lazy="joined" to the complete path. This will inevitably pull in all languages, thus returning more data than I need. My approach is then completely independent of SQLAlchemy:
class Item(Base):
    ...
    i18ns = relationship("ItemI18N", backref="item", cascade="all, delete-orphan", lazy="joined")

    def name(self, locale):
        for i18n in self.i18ns:
            if i18n.language.short == locale:
                return i18n.name
But this is not a pretty solution, because of the overhead of retrieving all I18N data from the database and then filtering that result back down to one entry, which makes it pointless to have pulled everything in the first place (since the locale will stay the same the whole time). My new full example shows how only one query is executed and gives me transparent access, but with an ugly overhead I would like to avoid.
The example also contains some playing around with transformations I have done. This could point to a solution from that direction, but I wasn't happy with this either because it required me to pass in the with_transformation part every time. I'd like it much better if this would automatically be applied when Item is queried. But I have found no event or other for this.
So now I have multiple solution attempts that all lack the ease of direct access compared to the business logic described above. I hope someone is able to figure out how to close these gaps to produce something nice and clean.
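One direction that may be worth trying, offered only as a sketch and not as a known answer to this question, is to join and filter on the locale and then tell SQLAlchemy to populate Item.i18ns from exactly those joined rows using contains_eager. The example assumes plain SQLAlchemy (1.4 or later), in-memory SQLite, and the simpler first model with lang_short; the `items_for_locale` helper, the `name` property, and the table names are hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()


class Item(Base):
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    data = Column(String)
    i18ns = relationship("ItemI18N", backref="item")

    @property
    def name(self):
        # Assumes the item was loaded via items_for_locale(), so i18ns
        # holds only the row for the requested locale.
        return self.i18ns[0].name if self.i18ns else None


class ItemI18N(Base):
    __tablename__ = "item_i18n"
    lang_short = Column(String, primary_key=True)
    item_id = Column(Integer, ForeignKey("item.id"), primary_key=True)
    name = Column(String)


def items_for_locale(session, locale):
    """Load items with i18ns restricted to one locale, in a single query."""
    return (
        session.query(Item)
        .join(Item.i18ns)
        .filter(ItemI18N.lang_short == locale)
        .options(contains_eager(Item.i18ns))
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Item(data="some data", i18ns=[
        ItemI18N(lang_short="en", name="Widget"),
        ItemI18N(lang_short="de", name="Dingsbums"),
    ]))
    session.commit()

# A fresh session, so no previously loaded collection gets in the way.
with Session(engine) as session:
    german = [(item.data, item.name) for item in items_for_locale(session, "de")]
    english = [(item.data, item.name) for item in items_for_locale(session, "en")]
    loaded_rows = [len(item.i18ns) for item in items_for_locale(session, "en")]
```

Two caveats apply to this sketch: the inner join drops items that have no translation for the locale, and an Item whose i18ns collection is already loaded in the session keeps that collection unless the query is told to refresh it.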