The following code is for Flask-SQLAlchemy, but would be quite similar in SQLAlchemy.
I have two simple classes:
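class Thread(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    subject = db.Column(db.String)
    messages = db.relationship('Message', backref='thread', lazy='dynamic')

class Message(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # Foreign key needed by Thread.messages (not shown in the simplified original)
    thread_id = db.Column(db.Integer, db.ForeignKey('thread.id'))
    # Pass the callable; datetime.utcnow() would be evaluated only once, at import time
    created = db.Column(db.DateTime, default=datetime.utcnow)
    text = db.Column(db.String, nullable=False)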
I would like to query all Threads and have them ordered by last message created. This is simple:
threads = Thread.query.join(Message).order_by(Message.created.desc()).all()
threads is now a correctly ordered list I can iterate over. However, if I iterate over
threads[0].messages, the Message objects are not ordered by Message.created descending.
I can solve this issue while declaring the relationship:
messages = relationship('Message', backref='thread', lazy='dynamic',
                        order_by='Message.created.desc()')
However, this is something I'd rather not do. I want to set the ordering explicitly when I build the query.
I could also call:
threads[0].messages.reverse()
...but this is quite inconvenient in a Jinja template.
Is there a good solution for setting order_by for a joined model?
You have Thread.messages marked as lazy='dynamic'. This means that after querying for threads, messages is a query object, not a list yet. So iterate over threads[0].messages.order_by(Message.created.desc()).
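The answer above can be sketched as follows. This is a minimal, self-contained version that assumes plain SQLAlchemy 1.4 or later and an in-memory SQLite database instead of Flask-SQLAlchemy; the `thread_id` column and the sample data are assumptions added so the example runs.

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Thread(Base):
    __tablename__ = "thread"
    id = Column(Integer, primary_key=True)
    subject = Column(String)
    # lazy="dynamic" makes Thread.messages a query object rather than a list.
    messages = relationship("Message", backref="thread", lazy="dynamic")


class Message(Base):
    __tablename__ = "message"
    id = Column(Integer, primary_key=True)
    thread_id = Column(Integer, ForeignKey("thread.id"))  # assumed column
    created = Column(DateTime, default=datetime.utcnow)
    text = Column(String, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

thread = Thread(subject="demo")
session.add(thread)
session.flush()  # assigns thread.id

start = datetime(2020, 1, 1)
session.add_all(
    Message(thread_id=thread.id, text=f"msg {i}", created=start + timedelta(days=i))
    for i in range(3)
)
session.commit()

# The ordering is chosen at the point of use, not in the relationship definition.
newest_first = [m.text for m in thread.messages.order_by(Message.created.desc())]
```

Because the ordering is applied to the query object, the same relationship can be ordered differently in different views.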
Related
I need to implement a many-to-many relationship with additional columns in Flask-SQLAlchemy. I am currently using an association table to link two models (following this guide https://flask-sqlalchemy.palletsprojects.com/en/master/models/#many-to-many-relationships). My problem is that this relationship needs to carry additional data. My two models and the table are:
log = db.Table('log',
    db.Column('workout_id', db.Integer, db.ForeignKey('workout.id')),
    db.Column('exercise_variant_id', db.Integer, db.ForeignKey('exercise_variant.id')),
    db.Column('quantity', db.Integer, nullable=False),
    db.Column('series', db.Integer, nullable=False)
)

class ExerciseVariant(db.Model):
    __tablename__ = 'exercise_variant'
    id = db.Column(db.Integer, primary_key=True)

class Workout(db.Model):
    __tablename__ = 'workout'
    id = db.Column(db.Integer, primary_key=True)
    exercises = db.relationship('ExerciseVariant', secondary=log, lazy='subquery',
                                backref=db.backref('workouts', lazy=True))
This approach works, but the current method for adding records to the log table seems a bit hacky to me, since I first have to query both objects to get the ids I need and then build a custom statement:
statement = log.insert().values(
    workout_id=workout.id,
    exercise_variant_id=exercise_variant.id,
    quantity=exercise_dict['quantity'],
    series=exercise_dict['series']
)
db.session.execute(statement)
My questions are:
1. Should this kind of relationship be implemented using Table or Model?
2. If the answer to 1. is Table, can I somehow use backrefs to pass object instances instead of querying and passing their ids?
I have two tables like this:
class Role(db.Model):
    __tablename__ = 'roles'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(64), unique=True)
    index = db.Column(db.String(64))
    users = db.relationship('User',
                            backref='role', lazy='dynamic')

class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    role_id = db.Column(db.Integer, db.ForeignKey('roles.id'))
Then I tried two kinds of query to get data from the related models.
First, I wrote it like this:
user = db.session.query(User, Role.index).filter_by(email=form.email.data).first()
In the second one I use a join:
user = db.session.query(User, Role.index).join(Role).filter(User.email==form.email.data).first()
My questions are: what is the difference between these queries, given that the second one uses a join but the result is still the same?
And for speed or performance, should I use the first or the second one?
The difference is that the first query adds both users and roles to the FROM list, which results in a CROSS JOIN. In other words, every row from users is joined with every row from roles. The second query performs an INNER JOIN, and SQLAlchemy deduces the ON clause from the foreign key relationship between the tables.
You should use the first one when you want a cartesian product, and the second one when you want the role related to the user by the foreign key relationship. That the result happens to be the same for you is just a coincidence.
For future reference, try enabling echo so that you can check from your logs what queries are actually emitted. Also have a look at defining ORM relationships, which would allow you to have a role attribute on User for accessing its related Role.
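To make the difference concrete, here is a small sketch using the standard-library sqlite3 module. The SQL is only an approximation of what the two ORM queries emit, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT, "index" TEXT);
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT,
        role_id INTEGER REFERENCES roles(id)
    );
    INSERT INTO roles VALUES (1, 'admin', 'a'), (2, 'guest', 'g');
    INSERT INTO users VALUES (1, 'bob@example.com', 2);
""")

email = ("bob@example.com",)

# First query: both tables in FROM with no join condition (cartesian product).
cross = conn.execute(
    'SELECT users.email, roles."index" FROM users, roles '
    "WHERE users.email = ? ORDER BY roles.id",
    email,
).fetchall()

# Second query: INNER JOIN on the foreign key.
inner = conn.execute(
    'SELECT users.email, roles."index" FROM users '
    "JOIN roles ON roles.id = users.role_id WHERE users.email = ?",
    email,
).fetchall()
```

With this data the cross join pairs the user with every role, so taking only the first row can return a role the user does not have, while the inner join returns only the user's own role.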
If your entities come from different classes/tables, SQLAlchemy adds all of them to the FROM clause of the emitted SQL, but it does not add a join condition on its own. Call join() to join on the foreign key, or pass a custom ON clause if the connection you need isn't the one SQLAlchemy would derive from the foreign key.
Please see update at bottom
I have three classes. Let's call them Post, PostVersion, and Tag. (This is for an internal version control system in a web app, perhaps similar to StackOverflow, though I'm unsure of their implementation strategy). I sort of use terminology from git to understand it. These are highly simplified versions of the classes for the purposes of this question:
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    author_id = db.Column(db.Integer, db.ForeignKey("user.id"))
    author = db.relationship("User", backref="posts")
    head_id = db.Column(db.Integer, db.ForeignKey("post_version.id"))
    HEAD = db.relationship("PostVersion", foreign_keys=[head_id])
    added = db.Column(db.DateTime, default=datetime.utcnow)

class PostVersion(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    editor_id = db.Column(db.Integer, db.ForeignKey("user.id"))
    editor = db.relationship("User")
    previous_id = db.Column(db.Integer, db.ForeignKey("post_version.id"), default=None)
    previous = db.relationship("PostVersion")
    pointer_id = db.Column(db.Integer, db.ForeignKey("post.id"))
    pointer = db.relationship("Post", foreign_keys=[pointer_id])
    post = db.Column(db.Text)
    modified = db.Column(db.DateTime, default=datetime.utcnow)
    tag_1_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_2_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_3_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_4_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_5_id = db.Column(db.Integer, db.ForeignKey("tag.id"), default=None)
    tag_1 = db.relationship("Tag", foreign_keys=[tag_1_id])
    tag_2 = db.relationship("Tag", foreign_keys=[tag_2_id])
    tag_3 = db.relationship("Tag", foreign_keys=[tag_3_id])
    tag_4 = db.relationship("Tag", foreign_keys=[tag_4_id])
    tag_5 = db.relationship("Tag", foreign_keys=[tag_5_id])

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tag = db.Column(db.String(128))
To make a new post, I create both a Post and an initial PostVersion to which Post.head_id points. Every time an edit is made, a new PostVersion is created pointing to the previous PostVersion, and the Post.head_id is reset to point to the new PostVersion. To reset the post version to an earlier version--well, I haven't gotten that far but it seems trivial to either copy the previous version or just reset the pointer to the previous version.
My question is this, though: how can I write a relationship between Post and Tag such that
Post.tags would be a list of all the tags the current PostVersion contains, and
Tag.posts would be a list of all the Post's that currently have that particular tag?
The first condition seems easy enough, a simple method
def get_tags(self):
    t = []
    if self.HEAD.tag_1:
        t.append(self.HEAD.tag_1)
    if self.HEAD.tag_2:
        t.append(self.HEAD.tag_2)
    if self.HEAD.tag_3:
        t.append(self.HEAD.tag_3)
    if self.HEAD.tag_4:
        t.append(self.HEAD.tag_4)
    if self.HEAD.tag_5:
        t.append(self.HEAD.tag_5)
    return t
does the trick just fine for now, but the second condition is almost intractable for me right now. I currently use an obnoxious method in Tag where I query for all the PostVersion's with the tag using an or_ filter:
def get_posts(self):
    edits = PostVersion.query.filter(or_(
        PostVersion.tag_1_id==self.id,
        PostVersion.tag_2_id==self.id,
        PostVersion.tag_3_id==self.id,
        PostVersion.tag_4_id==self.id,
        PostVersion.tag_5_id==self.id,
    )).order_by(PostVersion.modified.desc()).all()
    posts = []
    for e in edits:
        if self in e.pointer.get_tags() and e.pointer not in posts:
            posts.append(e.pointer)
    return posts
This is horribly inefficient and I cannot paginate the results.
I know this would be a secondary join from Post to Tag or Tag to Post through PostVersion, but it would have to be a secondary join on an or, and I have no clue how to even start to write that.
Looking back on my code I'm beginning to wonder why some of these relationships require the foreign_keys parameter to be defined and others don't. I'm thinking it's relating to where they're defined (immediately following the FK id column or not) and noticing that there's a list for the foreign_keys, I'm thinking that's how I could define it. But I'm unsure how to pursue this.
I'm also wondering now if I could dispense with the pointer_id on PostVersion with a well-configured relationship. This, however, is irrelevant to the question (though the circular reference does cause headaches).
For reference, I am using Flask-SQLAlchemy, Flask-migrate, and MariaDB. I am heavily following Miguel Grinberg's Flask Megatutorial.
Any help or advice would be a godsend.
UPDATE
I have devised the following MySQL query that works, and now I need to translate it into SQLAlchemy:
SELECT
    post.id, tag.tag
FROM
    post
INNER JOIN
    post_version
ON
    post.head_id=post_version.id
INNER JOIN
    tag
ON
    post_version.tag_1_id=tag.id OR
    post_version.tag_2_id=tag.id OR
    post_version.tag_3_id=tag.id OR
    post_version.tag_4_id=tag.id OR
    post_version.tag_5_id=tag.id
WHERE
    tag.tag="<tag name>";
Can you change the database design, or do you have to make your app work on a DB that you can't change? If the latter, I can't help you. If you can change the design, you should do it like this:
Replace the linked chain of PostVersions with a one-to-many relationship from Post to PostVersions. Your "Post" class will end up having a relationship "versions" to all instances of PostVersion pertinent to that Post.
Replace the tag_id members with a many-to-many relationship using an additional association table.
Both methods are well-explained in the SQLAlchemy docs. Be sure to start with minimal code, testing in small non-Flask command line programs. Once you have the basic functionality down, transfer the concept to your more complicated classes. After that, ask yourself your original questions again. The answers will come much more easily.
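A minimal sketch of the suggested redesign, written as a small non-Flask program with SQLAlchemy 1.4 or later and in-memory SQLite. The `version_tag` table, the `body` column, the `posts_currently_tagged` helper, and the rule "the newest version is the current one" are all assumptions made for illustration, not part of the original models.

```python
from sqlalchemy import (Column, ForeignKey, Integer, String, Table, Text,
                        create_engine, func, select)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical association table: one row per (version, tag) pair.
version_tag = Table(
    "version_tag",
    Base.metadata,
    Column("version_id", ForeignKey("post_version.id"), primary_key=True),
    Column("tag_id", ForeignKey("tag.id"), primary_key=True),
)


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    # One-to-many: every version of this post, oldest first.
    versions = relationship("PostVersion", back_populates="post",
                            order_by="PostVersion.id")


class PostVersion(Base):
    __tablename__ = "post_version"
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey("post.id"), nullable=False)
    body = Column(Text)
    post = relationship("Post", back_populates="versions")
    tags = relationship("Tag", secondary=version_tag)


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    tag = Column(String(128))


def posts_currently_tagged(session, name):
    # Assumption: the version with the highest id is the post's current one.
    latest_ids = select(func.max(PostVersion.id)).group_by(PostVersion.post_id)
    return (
        session.query(Post)
        .join(Post.versions)
        .join(PostVersion.tags)
        .filter(Tag.tag == name, PostVersion.id.in_(latest_ids))
        .order_by(Post.id)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

python_tag, flask_tag = Tag(tag="python"), Tag(tag="flask")
post = Post(versions=[PostVersion(body="first draft", tags=[python_tag])])
session.add(post)
session.commit()

# An edit adds a new version instead of rewriting the old one.
post.versions.append(PostVersion(body="second draft", tags=[flask_tag]))
session.commit()

current_tags = [t.tag for t in post.versions[-1].tags]
flask_post_ids = [p.id for p in posts_currently_tagged(session, "flask")]
python_post_ids = [p.id for p in posts_currently_tagged(session, "python")]
```

Since `posts_currently_tagged` returns a query, it can be paginated with `limit()` and `offset()`, which the loop-based approach in the question cannot do.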
I solved the problem on my own, and it really just consists of defining a primary and secondary join with an or_ in the primary:
posts = db.relationship("Post", secondary="post_version",
                        primaryjoin="or_(Tag.id==post_version.c.tag_1_id,"
                                    "Tag.id==post_version.c.tag_2_id,"
                                    "Tag.id==post_version.c.tag_3_id,"
                                    "Tag.id==post_version.c.tag_4_id,"
                                    "Tag.id==post_version.c.tag_5_id)",
                        secondaryjoin="Post.head_id==post_version.c.id",
                        lazy="dynamic")
As you can see I mix table and class names. I will update the answer as I experiment to make it more regular.
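As an alternative to a relationship, the SQL from the update can also be written directly as a query with explicit ON clauses. The sketch below assumes SQLAlchemy 1.4 or later and in-memory SQLite, trims the models down to the columns the query needs, and uses a hypothetical helper name `posts_with_tag`.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, or_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    tag = Column(String(128))


class PostVersion(Base):
    __tablename__ = "post_version"
    id = Column(Integer, primary_key=True)
    tag_1_id = Column(Integer, ForeignKey("tag.id"))
    tag_2_id = Column(Integer, ForeignKey("tag.id"))
    tag_3_id = Column(Integer, ForeignKey("tag.id"))
    tag_4_id = Column(Integer, ForeignKey("tag.id"))
    tag_5_id = Column(Integer, ForeignKey("tag.id"))


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    head_id = Column(Integer, ForeignKey("post_version.id"))


def posts_with_tag(session, name):
    tag_columns = [
        PostVersion.tag_1_id, PostVersion.tag_2_id, PostVersion.tag_3_id,
        PostVersion.tag_4_id, PostVersion.tag_5_id,
    ]
    return (
        session.query(Post.id, Tag.tag)
        .select_from(Post)
        # post INNER JOIN post_version ON post.head_id = post_version.id
        .join(PostVersion, Post.head_id == PostVersion.id)
        # INNER JOIN tag ON tag_1_id = tag.id OR ... OR tag_5_id = tag.id
        .join(Tag, or_(*[col == Tag.id for col in tag_columns]))
        .filter(Tag.tag == name)
        .order_by(Post.id)
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

session.add_all([
    Tag(id=1, tag="python"),
    Tag(id=2, tag="flask"),
    PostVersion(id=1, tag_1_id=1),  # old version, tagged "python"
    PostVersion(id=2, tag_2_id=2),  # current version, tagged "flask"
    Post(id=1, head_id=2),
])
session.commit()

flask_rows = [tuple(row) for row in posts_with_tag(session, "flask")]
python_rows = [tuple(row) for row in posts_with_tag(session, "python")]
```

Only the head version is joined, so tags that appear solely on older versions do not match.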
Is there a good way to speed up querying hybrid properties in SQLAlchemy that involve relationships? I have the following two tables:
class Child(Base):
    __tablename__ = 'Child'
    id = Column(Integer, primary_key=True)
    is_boy = Column(Boolean, default=False)
    parent_id = Column(Integer, ForeignKey('Parent.id'))

class Parent(Base):
    __tablename__ = 'Parent'
    id = Column(Integer, primary_key=True)
    children = relationship("Child", backref="parent")

    @hybrid_property
    def children_count(self):
        # children is a plain list on an instance, so count it with len()
        return len(self.children)

    @children_count.expression
    def children_count(cls):
        return (select([func.count(Child.id)]).
                where(Child.parent_id == cls.id).
                label("children_count"))
When I query Parent.children_count across 50,000 rows (each parent has on average roughly 2 children), it's pretty slow. Is there a good way through indexes or something else for me to speed these queries up?
By default, PostgreSQL doesn't create indexes on foreign keys.
So the first thing I'd do is add an index, which SQLAlchemy makes really easy:
parent_id = Column(Integer, ForeignKey('Parent.id'), index=True)
This will probably result in a fast enough retrieval time given the size of your current dataset--try it and see. Be sure to try the query a few times in a row to warm up the PostgreSQL cache.
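A quick way to see what an index on the foreign key changes is to compare query plans before and after creating it. The sketch below uses the standard-library sqlite3 module rather than PostgreSQL, so the plan text is SQLite's and only illustrates the idea; the index name `ix_child_parent_id` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id)
    );
""")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)


count_sql = "SELECT count(*) FROM child WHERE parent_id = 1"

plan_before = plan(count_sql)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX ix_child_parent_id ON child (parent_id)")
plan_after = plan(count_sql)   # the planner can now search the index
```

In PostgreSQL the equivalent check would be running EXPLAIN on the emitted query before and after adding `index=True`.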
For a larger dataset, or if the queries still aren't fast enough, you could look into pre-calculating the counts and caching them. There are a number of ways to cache; the easiest hack is probably to add an extra column to your Parent table and write app logic that increments the count whenever a new child is added. It's a little hacky that way. Another option is caching the count in Redis/memcache, or even using a materialized view (this is a great solution if it's okay for the count to occasionally be out of date by a few minutes).
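The extra-column approach could look roughly like this. It is a sketch with the standard-library sqlite3 module; the `children_count` column and the `add_child` helper are hypothetical names, and a real application would also need to decrement the count when a child is deleted or moved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (
        id INTEGER PRIMARY KEY,
        children_count INTEGER NOT NULL DEFAULT 0  -- cached count (assumed column)
    );
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        is_boy INTEGER NOT NULL DEFAULT 0,
        parent_id INTEGER REFERENCES parent(id)
    );
""")


def add_child(conn, parent_id, is_boy=False):
    # "with conn" commits both statements together or rolls both back,
    # so the cached count cannot drift from the real number of rows.
    with conn:
        conn.execute(
            "INSERT INTO child (parent_id, is_boy) VALUES (?, ?)",
            (parent_id, is_boy),
        )
        conn.execute(
            "UPDATE parent SET children_count = children_count + 1 WHERE id = ?",
            (parent_id,),
        )


conn.execute("INSERT INTO parent (id) VALUES (1)")
add_child(conn, 1)
add_child(conn, 1, is_boy=True)

cached = conn.execute("SELECT children_count FROM parent WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT count(*) FROM child WHERE parent_id = 1").fetchone()[0]
```

Reading the count is then a single-row lookup on the parent table instead of an aggregate over the child table.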
I'm looking for the best way to define relationships between two people and query it in SQLAlchemy. I'm having a hard time wrapping my head around this. Here is what I have so far but I don't know if I should be using a model as link table like this. Advice?
Example: character_a = student, character_b = teacher
or [[relationship.character_b, relationship.character_b.role] for relationship in character.relationships] to get a list of related characters and their roles.
class Character(db.Model):
    __tablename__ = 'characters'
    story_id = db.Column(db.String, db.ForeignKey('stories.id'))
    id = db.Column(db.Integer(), primary_key=True)
    name = db.Column(db.String(50))
    gender = db.Column(db.String(6))
    description = db.Column(db.Text())
    # Two foreign keys point at characters, so name the one this relationship uses
    relationships = db.relationship('Relationship', backref='character', lazy='dynamic',
                                    foreign_keys='Relationship.character_a_id')

class Relationship(db.Model):
    __tablename__ = 'relationships'
    # A mapped class needs a primary key
    id = db.Column(db.Integer, primary_key=True)
    # Integer, to match the type of characters.id
    character_a_id = db.Column(db.Integer, db.ForeignKey('characters.id'))
    character_b_id = db.Column(db.Integer, db.ForeignKey('characters.id'))
    character_a_role = db.Column(db.String(25))
    character_b_role = db.Column(db.String(25))
Database schemas are hard to get right the first time. I suggest you follow the advice of Peter Norvig: write the test cases in English, then model them in code (play with the relationships as if they already existed in the database). You will discover the shortcomings of the current design that way. Then you can refine the relationships, and when you are done, your code should be as readable as the use cases you wrote in English.