Efficient queries for relationships in SQLAlchemy

A little background: I am creating a web application (using Flask) for use internally in an organization. The webapp will have a very simple message board that allows users to post and comment on posts.
I'm doing this for a couple reasons -- mainly to get experience with Flask and to better understand sqlalchemy.
This is the database schema with some non-important info removed:
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # information about user
    posts = db.relationship('Post', backref='author', lazy='dynamic')
    comments = db.relationship('Comment', backref='author', lazy='dynamic')
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # information about posts (title, body, timestamp, etc.)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
    comments = db.relationship('Comment', backref='thread', lazy='dynamic')
class Comment(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # information about comment (body, timestamp, etc.)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'))  # author
    post_id = db.Column(db.Integer, db.ForeignKey('post.id'))  # thread
When I render the messages view, I want to be able to display a table of threads with the following information for each message:
Title
Author
# Replies
Time of last modification
Right now, my query to get the messages looks like this:
messages = Post.query.filter_by(post_type = TYPE_MESSAGE).order_by('timestamp desc')
With that query, I can easily get the title and author for each post. However, it currently orders by the date the message was created (I know that is wrong, and I know why), and I can't easily get the number of replies.
If I was looping through the messages to render them in the application, I could access the message.comments attribute and use that to find the length and get the timestamp of the most recent comment, but am I correct in assuming that to get that data it would require another database query (to access message.comments)?
Since that is the case, I could get the list of all of the messages with one query (good) but if I had n messages, it would require n additional database queries to populate the messages view with the information that I want, which is far from efficient.
This brings me to my main question: is it possible to use aggregate operators with SQLAlchemy as you would in a regular SQL query to get COUNT(comments) and MAX(timestamp) in the original query for messages? Or, is there another solution to this that I haven't explored yet? Ideally, I want to be able to do this all in one query. I looked through the SQLAlchemy documentation and couldn't find anything like this. Thanks!

For counting, you can try this (an example):
session.query(Comment).join(Post).filter_by(id=5).count()
or
sess.query(Comment).join(Post).filter(Post.id==5).count()
And yes, you can use aggregates (func is imported from sqlalchemy):
sess.query(func.max(Comment.id)).join(Post).filter_by(id=5).all()
or
sess.query(func.max(Comment.id)).join(Post).filter(Post.id==5).all()
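The per-post count and latest timestamp can also be fetched for all posts in one query. Below is a minimal sketch, assuming SQLAlchemy 1.4 or later and plain declarative models instead of Flask-SQLAlchemy's db.Model. The trimmed-down Post and Comment classes, the in-memory SQLite database and the sample rows are illustrative assumptions, not the asker's real schema. An outer join keeps posts that have no comments, and coalesce falls back to the post's own timestamp for them.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Post(Base):  # simplified stand-in for the question's Post model
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    title = Column(String(100))
    timestamp = Column(DateTime)


class Comment(Base):  # simplified stand-in for the question's Comment model
    __tablename__ = "comment"
    id = Column(Integer, primary_key=True)
    post_id = Column(Integer, ForeignKey("post.id"))
    timestamp = Column(DateTime)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Post(id=1, title="first", timestamp=datetime(2020, 1, 1)),
    Post(id=2, title="second", timestamp=datetime(2020, 1, 2)),
    Comment(post_id=1, timestamp=datetime(2020, 1, 5)),
    Comment(post_id=1, timestamp=datetime(2020, 1, 9)),
])
session.commit()

# COUNT(comment.id) counts only matching comments, so posts without replies get 0.
reply_count = func.count(Comment.id)
# Latest comment time, or the post's own timestamp when there are no comments.
last_activity = func.coalesce(func.max(Comment.timestamp), Post.timestamp)

rows = (
    session.query(
        Post.title,
        reply_count.label("replies"),
        last_activity.label("last_modified"),
    )
    .outerjoin(Comment, Comment.post_id == Post.id)
    .group_by(Post.id, Post.title, Post.timestamp)
    .order_by(last_activity.desc())
    .all()
)
summary = [(row.title, row.replies) for row in rows]
```

With Flask-SQLAlchemy the same query shape should work through db.session.query(...), since db.session is a regular SQLAlchemy session.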

Related

Linking databases for a management system using Flask SQLAlchemy

I'm trying to build a very basic management system website for a hypothetical insurance agency, and I just can't wrap my head around how I should organize the database so that I can assign users to specific policies and update or replace the user when there are rearrangements within the agency, so policies can be reassigned to the proper agents. This would be used to display data based on login as well. There are 3 layers that I think I need: a User table for user data, a client data/policy table to store client and policy info, and a table for tasks that would be assigned to policies. I need multiple users to have access to a policy, and each policy should have access to 1 row in the task table. Would it be better to have a user table and a large client table with the task columns inside, rather than a separate table for the tasks? I've been banging my head against this for days, so if anyone can help, I'd greatly appreciate it.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['SECRET_KEY'] = ''
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///site.db'
db = SQLAlchemy(app)
class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50))
    username = db.Column(db.String(50), unique=True)
    password = db.Column(db.String(50))
    email = db.Column(db.String(50), unique=True)
# Multiple assigned users can access
class Client(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50))
    policy_number = db.Column(db.String(50), unique=True)
    expiration_date = db.Column(db.DateTime)
# Single "client" assigned to single row of tasks based on policy number
class PolicyTasks(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    step1 = db.Column(db.String(50))
    step1_completed = db.Column(db.Boolean)
    step2 = db.Column(db.String(50))
    step2_completed = db.Column(db.Boolean)
    step3 = db.Column(db.String(50))
    step3_completed = db.Column(db.Boolean)
    step4 = db.Column(db.String(50))
    step4_completed = db.Column(db.Boolean)
    step5 = db.Column(db.String(50))
    step5_completed = db.Column(db.Boolean)
I removed the code I used to attempt to create the relationships, because it might honestly be more helpful to look at the base layout.

To wrap your head around how you should organize the database so that you can assign users to specific policies, take a look at the docs on the Flask-SQLAlchemy website. I think the second example under One-to-Many is very similar to what you're looking to do: http://flask-sqlalchemy.pocoo.org/2.3/models/
I would recommend keeping the tables separate and not jamming everything into one. In my experience that usually leads down a slippery slope to tables with too many attributes to manage.
It can also mean slower queries, because Client.query.filter_by(username='peter').first() would then always fetch the client data, the policy data and whatever else you later end up throwing into that table, when you may only have needed the policy data for that specific view or API route.
This other stackoverflow post might help too:
database design - when to split tables?
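As a rough illustration of keeping the tables separate, the sketch below models "multiple users per policy" with an association table and "one task row per policy" with a one-to-one relationship. It assumes SQLAlchemy 1.4 or later and plain declarative models. The assignments table, the clients, agents and tasks relationship names and the client_id column are hypothetical names introduced for this example, not part of the question's code.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical association table: one row per (user, client) assignment.
assignments = Table(
    "assignments",
    Base.metadata,
    Column("user_id", ForeignKey("user.id"), primary_key=True),
    Column("client_id", ForeignKey("client.id"), primary_key=True),
)


class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    username = Column(String(50), unique=True)
    clients = relationship("Client", secondary=assignments, back_populates="agents")


class Client(Base):
    __tablename__ = "client"
    id = Column(Integer, primary_key=True)
    policy_number = Column(String(50), unique=True)
    agents = relationship("User", secondary=assignments, back_populates="clients")
    # uselist=False turns the relationship into one-to-one: a single task row per client.
    tasks = relationship("PolicyTasks", back_populates="client", uselist=False)


class PolicyTasks(Base):
    __tablename__ = "policy_tasks"
    id = Column(Integer, primary_key=True)
    client_id = Column(Integer, ForeignKey("client.id"), unique=True)
    step1_completed = Column(Boolean, default=False)
    client = relationship("Client", back_populates="tasks")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = Session(engine)

alice, bob = User(username="alice"), User(username="bob")
policy = Client(policy_number="P-100", agents=[alice], tasks=PolicyTasks())
session.add(policy)
session.commit()

# Reassigning the policy only rewrites rows in the association table.
policy.agents = [bob]
session.commit()

# Policies visible to the logged-in user "bob".
visible = (
    session.query(Client)
    .join(Client.agents)
    .filter(User.username == "bob")
    .all()
)
visible_policies = [client.policy_number for client in visible]
```

Because assignments live in their own table, reassigning a policy never touches the user, client or task rows themselves.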

sqlalchemy joined load by variable declaration

Suppose I have two models, Account and Question.
class Account(DeclarativeBase):
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    user_name = Column(Unicode(255), unique=True, nullable=False)
and my Question model looks like this:
class Question(DeclarativeBase):
    __tablename__ = 'questions'
    id = Column(Integer, primary_key=True)
    content = Column(Unicode(2500), nullable=False)
    account_id = Column(Integer, ForeignKey(
        'accounts.id', onupdate='CASCADE', ondelete='CASCADE'), nullable=False)
    account = relationship('Account', backref=backref('questions'))
I have a method that returns a question in JSON format for the provided question ID.
When the method is written as below, it only returns the id, the content and the account_id of the question.
@expose('json')
def question(self, question_id):
    return dict(questions=DBSession.query(Question).filter(Question.id == question_id).one())
But I need the user_name of the Account to be included in the JSON response too. Something weird (at least to me) is that I have to explicitly tell the method that the query result contains a relation to an Account; only then is the account info included in the JSON response. I mean doing something like this:
@expose('json')
def question(self, question_id):
    result = DBSession.query(Question).filter(Question.id == question_id).one()
    weird_variable = result.account.user_name
    return dict(question=result)
Why do I have to do this? What is the reason behind it?

From Relationship Loading Techniques:
By default, all inter-object relationships are lazy loading.
In other words, in its default configuration the relationship account does not load the account data when you fetch a Question; it loads it when you first access the account attribute of a Question instance. This behaviour can be controlled:
from sqlalchemy.orm import joinedload
DBSession.query(Question).\
    filter(Question.id == question_id).\
    options(joinedload(Question.account)).\
    one()
Adding the joinedload option instructs the query to load the relationship at the same time as the parent, using a join. Other eager loading techniques are also available; their possible use cases and tradeoffs are discussed under "What Kind of Loading to Use?"
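The difference between the two loading styles can be observed directly. The sketch below assumes SQLAlchemy 1.4 or later, an in-memory SQLite database and sample data invented for the example. It uses inspect(obj).unloaded, which reports attributes that have not been loaded yet, to show that account is missing after a plain query and present after a joinedload query.

```python
from sqlalchemy import Column, ForeignKey, Integer, Unicode, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()


class Account(Base):
    __tablename__ = "accounts"
    id = Column(Integer, primary_key=True)
    user_name = Column(Unicode(255), unique=True, nullable=False)


class Question(Base):
    __tablename__ = "questions"
    id = Column(Integer, primary_key=True)
    content = Column(Unicode(2500), nullable=False)
    account_id = Column(Integer, ForeignKey("accounts.id"), nullable=False)
    account = relationship("Account", backref="questions")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Question(id=1, content="why?", account=Account(user_name="alice")))
    session.commit()

# Default lazy loading: the related Account has not been fetched yet.
with Session(engine) as session:
    lazy_q = session.query(Question).filter(Question.id == 1).one()
    lazy_unloaded = "account" in inspect(lazy_q).unloaded

# joinedload: the Account arrives in the same SELECT as the Question.
with Session(engine) as session:
    eager_q = (
        session.query(Question)
        .options(joinedload(Question.account))
        .filter(Question.id == 1)
        .one()
    )
    eager_unloaded = "account" in inspect(eager_q).unloaded

# The eagerly loaded attribute stays readable after the session is closed.
eager_user_name = eager_q.account.user_name
```

This would also explain the behaviour in the question: a serializer that only looks at already-loaded attributes would not see account until something has accessed it.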

SQLAlchemy low performance when use relationship

I have SQLAlchemy models in my Flask application:
class User(db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    photos = db.relationship('Photo', lazy='joined')
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'))
    photo = db.Column(db.String(255))
When I query a user, I get the user together with their photos automatically. But I noticed that if a user has a lot of photos, the query u = User.query.get(1) becomes very slow. I did the same thing manually with the lazy='noload' option:
u = User.query.get(1)
photos = Photo.query.filter_by(user_id=1)
and on the same data it works many times faster. Where is the problem? Is the SQL join slow (I don't think so, because it starts to hang at 100-1kk photo objects, which is not much data), or is something wrong in SQLAlchemy?

In my experience, it is worth getting familiar with the SQLAlchemy documentation on loading relationships. Even though the relationship functionality is easy to use, with larger datasets it is sometimes better not to use it, or even to execute plain-text SQL, which tends to perform better on large data sets.
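One option described in that documentation is to leave the relationship on the default lazy loading and choose an eager strategy per query. The sketch below is an illustration under stated assumptions, not a benchmark: it assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and the statements list and record listener are helper names invented here to count the SQL statements issued.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    photos = relationship("Photo")  # default lazy="select": loaded only on access


class Photo(Base):
    __tablename__ = "photos"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey("users.id"))
    photo = Column(String(255))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

statements = []  # hypothetical helper: collects every SQL statement sent to the DB


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    statements.append(statement)


with Session(engine) as session:
    session.add(User(id=1, photos=[Photo(photo="%d.jpg" % i) for i in range(3)]))
    session.commit()

# View that does not need photos: one SELECT against users only.
with Session(engine) as session:
    statements.clear()
    user_only = session.get(User, 1)
    queries_without_photos = len(statements)

# View that needs photos: opt in per query; selectinload issues a second SELECT
# for the photos instead of widening the first one with a JOIN.
with Session(engine) as session:
    statements.clear()
    user = (
        session.query(User)
        .options(selectinload(User.photos))
        .filter(User.id == 1)
        .one()
    )
    queries_with_photos = len(statements)
    photo_count = len(user.photos)
```

With lazy='joined' on the relationship, every User query pays for the photos; with a per-query option, only the views that display photos do.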

Querying with Flask-SQLAlchemy

I'm using Flask to build a RESTful api and use SQLAlchemy to connect my app to a MySQL database.
I have two models in the database: Order and Order_line. An order is made of several order lines. Each order line has an associated status.
I'm having trouble translating my SQL request into a Flask-SQLAlchemy statement. I'm especially bugged by the join.
Here are my models:
class Order(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    date_created = db.Column(db.DateTime)
    lines = db.relationship('Order_line',
                            backref=db.backref('order',
                                               lazy='joined'))
    def __init__(self, po):
        self.date_created = datetime.datetime.now()
class Order_line(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    order_id = db.Column(db.Integer, db.ForeignKey('order.id'))
    status_id = db.Column(db.Integer, db.ForeignKey('status.id'))
    def __init__(self, order_id):
        self.order_id = order_id
        self.status_id = 1
class Status(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    short_name = db.Column(db.String(60))
    description = db.Column(db.String(400))
    lines = db.relationship('Order_line',
                            backref=db.backref('status',
                                               lazy='joined'))
    def __init__(self, short_name, description):
        self.short_name = short_name
        self.description = description
Basically, I want to retrieve all the orders (so retrieve the Order.id) which have one or more order_line's status_id different from 1.
The SQL query would be
SELECT `order`.id FROM `order`
INNER JOIN order_line
    ON `order`.id = order_line.order_id
WHERE order_line.status_id NOT LIKE 1
GROUP BY `order`.id
I didn't find a way to translate that SQL statement into a SQLAlchemy command. I'm especially confused by the difference between Flask-SQLAlchemy wrapper and 'vanilla' SQLAlchemy.

You can use .any():
Order.query.filter(Order.lines.any(Order_line.status_id != 1))
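To see what any() returns, here is a small self-contained sketch, assuming SQLAlchemy 1.4 or later and plain declarative models on an in-memory SQLite database. The models are cut down to the columns involved, the Status table is left out, and the sample orders are invented for the example. It also runs the explicit join and group-by form from the question for comparison.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Order(Base):
    __tablename__ = "order"
    id = Column(Integer, primary_key=True)
    lines = relationship("Order_line", backref="order")


class Order_line(Base):
    __tablename__ = "order_line"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("order.id"))
    status_id = Column(Integer, default=1)  # plain integer here; no Status table


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Order(id=1, lines=[Order_line(status_id=1), Order_line(status_id=1)]),
    Order(id=2, lines=[Order_line(status_id=1), Order_line(status_id=2)]),
    Order(id=3, lines=[]),
])
session.commit()

# any() renders an EXISTS subquery: orders with at least one line whose status is not 1.
with_any = (
    session.query(Order.id)
    .filter(Order.lines.any(Order_line.status_id != 1))
    .order_by(Order.id)
    .all()
)

# The same result with an explicit join, mirroring the SQL in the question.
with_join = (
    session.query(Order.id)
    .join(Order.lines)
    .filter(Order_line.status_id != 1)
    .group_by(Order.id)
    .order_by(Order.id)
    .all()
)

any_ids = [row[0] for row in with_any]
join_ids = [row[0] for row in with_join]
```

Only order 2 qualifies: order 1 has every line at status 1, and order 3 has no lines at all.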

I'm also new to Flask-SQLAlchemy, but I've been learning a lot lately for my app. Here are some points that could help you:
The main difference between 'vanilla' SQLAlchemy and Flask-SQLAlchemy when running a query is the way Flask-SQLAlchemy handles the session. In the Flask version you have a db object that manages your session, as in this case:
db = SQLAlchemy()
With that object, you run the query. In your case, the query could be written this way:
db.session.query(Order).filter(Order.id==Order_line.order_id).filter(Order_line.status_id!=1).group_by(Order.id).all()
This is not exactly the same query, but it is quite similar. It returns all the fields from the Order table; if you want only the id, replace Order with Order.id in the query. I'm not totally sure how to implement your "like" filter in Flask-SQLAlchemy, but I found this question, which answers it for 'vanilla' SQLAlchemy: how to pass a not like operator in a sqlalchemy ORM query

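You can also use an explicit join (Order.lines is a relationship, so the status_id column has to be referenced on Order_line):
db.session.query(Order.id).join(Order.lines).filter(Order_line.status_id != 1).group_by(Order.id).all()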

sqlalchemy, select using reverse-inclusive (not in) list of child column values

I have a typical Post / Tags (many tags associated with one post) relationship in flask-sqlalchemy, and I want to select posts which aren't tagged with any tag in a list I provide. First, the models I set up:
class Post(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    tags = db.relationship('Tag', lazy='dynamic')
class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.Text(50))
    post_id = db.Column(db.Integer, db.ForeignKey('post.id'))
Something like
db.session.query(Post).filter(Post.tags.name.notin_(['dont','want','these']))
fails with
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Post.tags has an attribute 'name'
which I assume is because tags is a relationship and not a column. I had this working on another project when I was writing the actual SQL manually. This was the SQL that worked:
SELECT * FROM $posts WHERE id NOT IN (SELECT post_id FROM $tags WHERE name IN ('dont','want','these'))
How would I achieve this using the sqlalchemy API?

Pretty straightforward using negated any:
query = session.query(Post).filter(~Post.tags.any(Tag.name.in_(['dont', 'want', 'these'])))
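A small self-contained check of the negated any() form, assuming SQLAlchemy 1.4 or later, plain declarative models and an in-memory SQLite database. The relationship uses the default loading strategy instead of lazy='dynamic', and the sample posts and the extra tag name "keep" are made up for the example. Note that a post with no tags at all also matches, just as it would with the NOT IN subquery from the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Post(Base):
    __tablename__ = "post"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag")


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    post_id = Column(Integer, ForeignKey("post.id"))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Post(id=1, tags=[Tag(name="dont"), Tag(name="keep")]),  # has an unwanted tag
    Post(id=2, tags=[Tag(name="keep")]),                    # only acceptable tags
    Post(id=3, tags=[]),                                    # untagged
])
session.commit()

unwanted = ["dont", "want", "these"]

# NOT EXISTS (a tag of this post whose name is in the unwanted list)
posts = (
    session.query(Post)
    .filter(~Post.tags.any(Tag.name.in_(unwanted)))
    .order_by(Post.id)
    .all()
)
kept_ids = [post.id for post in posts]
```

Post 1 is excluded even though it also carries an acceptable tag, because a single unwanted tag is enough to satisfy the inner any().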

Try this one, easy:
users = session.query(Post).filter(not_(Post.tags.any(Tag.name.in_(['dont', 'want', 'these']))))
Hope this helps!

The notin_ works for me, adjusted example:
db.session.query(Post).filter(Post.tags.notin_(['dont','want','these']))

I thought up a nasty solution, but it works for the time being. I'd be interested to hear if anyone comes up with a smarter method.
ignore_ids = [item.post_id for item in Tag.query.filter(Tag.name.in_(['dont','want','these'])).all()]
Post.query.filter(Post.id.notin_(ignore_ids))
