How to paginate in Flask-SQLAlchemy for db.session joined queries?

Say we have the following relationships:
a person can have many email addresses
an email service provider can (obviously) serve multiple email addresses
So it's a many-to-many relationship. I have three tables: emails, providers, and users. Emails has two foreign keys, one for the provider and one for the user.
Now, given a specific person, I want to print all the email providers and the email address each one hosts for this person, if it exists. (If the person does not have an email at Gmail, I still want Gmail to be in the result. Otherwise, I believe a plain inner join would be enough.)
I figured out how to do this with the following subquery (following the SQLAlchemy tutorial):
email_subq = db.session.query(Emails).\
    filter(Emails.user_id == current_user.id).\
    subquery()
provider_and_email = db.session.query(Provider, email_subq).\
    outerjoin(email_subq, Provider.emails).\
    all()
This works okay (it returns a 4-tuple of (Provider, user_id, provider_id, email_address), all the information that I want), but I later found out this is not using the Flask BaseQuery class, so that pagination provided by Flask-SQLAlchemy does not work. Apparently db.session.query() is not the Flask-SQLAlchemy Query instance.
I tried Emails.query.outerjoin[...], but that returns only the columns of the emails table, whereas I want both the provider info and the emails.
My question: how can I do the same thing with Flask-SQLAlchemy so that I do not have to re-implement pagination that is already there?
I guess the simplest option at this point is to implement my own paginate function, but I'd love to know if there is another proper way of doing this.
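For reference, here is a minimal sketch of the SQL shape the question is after, using only the standard-library sqlite3 module rather than Flask-SQLAlchemy. The table layout, sample rows, and the provider_page helper are assumptions made up for illustration. It shows the user filter placed in the ON clause of a left outer join (so providers without an address still appear) combined with LIMIT/OFFSET paging.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE providers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (user_id INTEGER, provider_id INTEGER, email_address TEXT);
    INSERT INTO providers VALUES (1, 'Gmail'), (2, 'Yahoo'), (3, 'Outlook');
    INSERT INTO emails VALUES (7, 2, 'me@yahoo.example'), (8, 1, 'other@gmail.example');
""")


def provider_page(user_id, page, per_page=2):
    """Return one 1-based page of (provider name, address or None) rows."""
    # The user filter sits in the ON clause, so providers that host no
    # address for this user are kept, with NULL in the address column.
    sql = """
        SELECT p.name, e.email_address
        FROM providers AS p
        LEFT OUTER JOIN emails AS e
            ON e.provider_id = p.id AND e.user_id = ?
        ORDER BY p.id
        LIMIT ? OFFSET ?
    """
    offset = (page - 1) * per_page
    return conn.execute(sql, (user_id, per_page, offset)).fetchall()


first_page = provider_page(7, page=1)
second_page = provider_page(7, page=2)
```

Putting the user condition in a WHERE clause instead would drop the providers with no matching address, which is exactly what the subquery in the question avoids.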

I'm not sure this will end up being the long-term solution, and it does not directly address my concern about not using Flask-SQLAlchemy's BaseQuery, but the simplest way to accomplish what I want is to reimplement the paginate function.
And, in fact, it is pretty easy to use the original Flask-SQLAlchemy routine to do this:
from flask import abort
from flask_sqlalchemy import Pagination  # import path as of Flask-SQLAlchemy 2.x


def paginate(query, page, per_page=20, error_out=True):
    if error_out and page < 1:
        abort(404)
    items = query.limit(per_page).offset((page - 1) * per_page).all()
    if not items and page != 1 and error_out:
        abort(404)
    # No need to count if we're on the first page and there are fewer
    # items than we expected.
    if page == 1 and len(items) < per_page:
        total = len(items)
    else:
        total = query.order_by(None).count()
    return Pagination(query, page, per_page, total, items)
Modified from the paginate function found around line 376: https://github.com/mitsuhiko/flask-sqlalchemy/blob/master/flask_sqlalchemy.py
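The arithmetic behind that function can be checked in isolation. The sketch below is plain Python with no Flask dependency; page_window is a hypothetical helper, not part of Flask-SQLAlchemy. It computes the offset passed to the query and the navigation values a pagination object typically exposes.

```python
from math import ceil


def page_window(page, per_page, total):
    """Return (offset, limit, pages, has_prev, has_next) for a 1-based page."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    offset = (page - 1) * per_page          # rows to skip before this page
    pages = ceil(total / per_page) if total else 0
    return offset, per_page, pages, page > 1, page < pages


window = page_window(page=3, per_page=20, total=45)
```

With 45 rows and 20 per page, page 3 starts at offset 40 and is the last of 3 pages.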

Your question is how to use Flask-SQLAlchemy's Pagination with regular SQLAlchemy queries.
Since Flask-SQLAlchemy's BaseQuery object holds no state of its own, and is derived from SQLAlchemy's Query, and is really just a container for methods, you can use this hack:
from flask_sqlalchemy import BaseQuery  # formerly flask.ext.sqlalchemy

def paginate(sa_query, page, per_page=20, error_out=True):
    sa_query.__class__ = BaseQuery
    # We can now use BaseQuery methods like .paginate on our SA query
    return sa_query.paginate(page, per_page, error_out)
To use:
@route(...)
def provider_and_email_view(page):
    provider_and_email = db.session.query(...) # any SQLAlchemy query
    paginated_results = paginate(provider_and_email, page)
    return render_template('...', paginated_results=paginated_results)
*Edit:
Please be careful doing this. It's really just a way to avoid copying/pasting the paginate function, as seen in the other answer. Note that BaseQuery has no __init__ method. See the question "How dangerous is setting self.__class__ to something else?".
*Edit2:
If BaseQuery had an __init__, you could construct one using the SA query object, rather than hacking .__class__.
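To see why the hack works only when the subclass adds no state, here is a toy sketch in plain Python. PlainQuery and PagingQuery are made-up stand-ins for SQLAlchemy's Query and Flask-SQLAlchemy's BaseQuery; nothing here touches either library.

```python
class PlainQuery:
    """Stand-in for a SQLAlchemy Query: it owns all the state."""

    def __init__(self, rows):
        self.rows = rows

    def limit_offset(self, limit, offset):
        return self.rows[offset:offset + limit]


class PagingQuery(PlainQuery):
    """Stand-in for BaseQuery: extra methods, no __init__, no new attributes."""

    def paginate(self, page, per_page=20):
        return self.limit_offset(per_page, (page - 1) * per_page)


query = PlainQuery(list(range(1, 8)))
# Reassigning __class__ keeps the instance's attributes and only changes
# which methods are looked up on it.
query.__class__ = PagingQuery
page_two = query.paginate(2, per_page=3)
```

If PagingQuery set attributes in its own __init__, they would be missing on the converted object, which is the risk the edit above warns about.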

I found a quick fix for this:
provider_and_email = Provider.query.with_entities(email_subq).\
    outerjoin(email_subq, Provider.emails).paginate(page, POST_PER_PAGE_LONG, False)

I'm currently using this approach:
query = BaseQuery([Provider, email_subq], db.session())
to create my own BaseQuery. db is the SQLAlchemy instance.
Update: as @afilbert suggests you can also do this:
query = BaseQuery(provider_and_email.subquery(), db.session())

How do you initialize your application with SQLAlchemy?
Your current SQLAlchemy connection probably has nothing to do with flask.ext.sqlalchemy, and you are using plain SQLAlchemy instead.
Check this tutorial, and check that your imports really come from flask.ext.sqlalchemy:
http://pythonhosted.org/Flask-SQLAlchemy/quickstart.html#a-minimal-application

You can try to paginate the list of results:
my_list = [my_list[i:i + per_page] for i in range(0, len(my_list), per_page)][page]
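A slightly safer variant of the same idea slices only the requested page instead of building every chunk first. This is a sketch; paginate_list is a hypothetical helper name, and pages are 0-based to match the one-liner above. Note that either way the full result list has already been loaded from the database.

```python
def paginate_list(items, page, per_page):
    """Return the items on a 0-based page, or an empty list past the end."""
    if page < 0 or per_page < 1:
        raise ValueError("page must be >= 0 and per_page >= 1")
    start = page * per_page
    return items[start:start + per_page]


rows = list("abcdefg")
last_page = paginate_list(rows, page=2, per_page=3)
past_end = paginate_list(rows, page=5, per_page=3)
```

Unlike indexing into a list of chunks, a slice past the end returns an empty list rather than raising IndexError.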

I did this and it works:
query = db.session.query(Table1, Table2, ...).filter(...)
if page_size is not None:
    query = query.limit(page_size)
    if page is not None:
        query = query.offset(page*page_size)
query = query.all()

I could be wrong, but I think your problem may be the .all(). By using that, you're getting a list, not a query object.
Try leaving it off, and pass your query to the pagination method like so (I left off all the subquery details for clarity's sake):
email_query = db.session.query(Emails).filter_by(**filters)
email_query.paginate(page, per_page)

Related

Manage Multiple Params with GET Flask restful MongoEngine Clean Code

I'm creating an API with flask_restful, and I want to search with, for example, two parameters passed by GET: tag and author.
With the code below I can do this. However, it requires that I pass both parameters. I want the search to work with whichever parameters the user passes.
For example:
if I pass only tag=tech, the response should contain all the news tagged tech, by any author; likewise, if I pass only the author, it should match any tag.
class ArticleAPI(Resource):
    def get(self):
        tag=request.args.get('tag','')
        auth=request.args.get('author','')
        news = News.objects.filter(topic=tag,author=auth).to_json()
        return Response(news, mimetype="application/json", status=200)
I know I can do it the long way, like this, but it looks ugly:
if tag is not None and auth is not None :
    news = News.objects.filter(topic=tag,author=auth).to_json()
elif tag is not None :
    news = News.objects.filter(topic=tag).to_json()
elif auth is not None:
    news = News.objects.filter(author=auth).to_json()
I'm using Flask_mongoengine
from flask_mongoengine import MongoEngine
db = MongoEngine()
def initialize_db(app):
    db.init_app(app)

I think you are asking how to pass in keyword arguments into the .filter() method in a more concise way.
According to the mongoengine docs, .filter() is an alias for __call__(). It takes a Q object, or keyword arguments for the **query parameter. Your code is using the keywords style.
You could put the tag and auth variables into a dict, then unpack them using a double splat as keyword arguments.
Something like this:
fdict = dict()
# The keys must be the model's field names (topic, author), as in the question
if tag:
    fdict['topic'] = tag
if auth:
    fdict['author'] = auth
news = News.objects.filter(**fdict).to_json()
Now you can add as many of these parameters as you want and it should be the same syntax.
You could just pass in all query params at once. This is the cleanest way I can think of:
news = News.objects.filter(**request.args.to_dict()).to_json()
That said, there is usually a security tradeoff for blindly taking user provided data and passing into a database. I don't know enough about how Mongo handles this to speak intelligently about what is best practice here. Plus, what you name on the UI side may not have the same name on the DB side.
This is also possible using a Q object, but I haven't done any MongoDB stuff in a long time and the syntax seems very specific, so I won't attempt it here. :)
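One way to keep the dict approach while addressing both concerns above (untrusted parameter names, and UI names that differ from field names) is an explicit allow-list. The sketch below is plain Python; ALLOWED_FILTERS and build_filters are hypothetical names, and the tag-to-topic mapping is taken from the question's own filter call.

```python
# Hypothetical mapping from public query-string names to model field names.
ALLOWED_FILTERS = {"tag": "topic", "author": "author"}


def build_filters(args, allowed=ALLOWED_FILTERS):
    """Keep only known, non-empty parameters and rename them to field names."""
    filters = {}
    for param, field in allowed.items():
        value = args.get(param)
        if value:                      # skips missing and empty-string values
            filters[field] = value
    return filters


filters = build_filters({"tag": "tech", "author": "", "limit": "5"})
```

In the resource this would be used as News.objects.filter(**build_filters(request.args)); unknown parameters such as limit never reach the query.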

DatabaseSessionIsOver with Pony ORM due to lazy loading?

I am using Pony ORM for a flask solution and I've come across the following.
Consider the following:
@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order for s in o.supplier if o.user == user)
    q2 = q.filter(lambda o: o.date >= date and o.date <= date+timedelta(days=7))
    res = q2[:]
    #for r in res:
    #    print r.supplier.name
    return res
When I need the result in Jinja2, which looks like this:
{% for order in res %}
    Supplier: {{ order.supplier.name }}
{% endfor %}
I get a
DatabaseSessionIsOver: Cannot load attribute Supplier[3].name: the database session is over
If I uncomment the for r in res part, it works fine. I suspect there is some sort of lazy loading that doesn't get loaded with res = q2[:].
Am I completely missing a point or what's going on here?

I just added prefetch functionality that should solve your problem. You can take working code from the GitHub repository. This feature will be part of the upcoming release Pony ORM 0.5.4.
Now you can write:
q = q.prefetch(Supplier)
or
q = q.prefetch(Order.supplier)
and Pony will automatically load related supplier objects.
Below I'll show several queries with prefetching, using the standard Pony example with Students, Groups and Departments.
from pony.orm.examples.presentation import *
Loading Student objects only, without any prefetching:
students = select(s for s in Student)[:]
Loading students together with groups and departments:
students = select(s for s in Student).prefetch(Group, Department)[:]
for s in students: # no additional query to the DB is required
    print(s.name, s.group.major, s.group.dept.name)
The same as above, but specifying attributes instead of entities:
students = select(s for s in Student).prefetch(Student.group, Group.dept)[:]
for s in students: # no additional query to the DB is required
    print(s.name, s.group.major, s.group.dept.name)
Loading students and its courses (many-to-many relationship):
students = select(s for s in Student).prefetch(Student.courses)
for s in students:
    print(s.name)
    for c in s.courses: # no additional query to the DB is required
        print(c.name)
As parameters of the prefetch() method you can specify entities and/or attributes. If you specify an entity, then all to-one attributes of that type will be prefetched. If you specify an attribute, then that specific attribute will be prefetched. To-many attributes are prefetched only when specified explicitly (as in the Student.courses example). Prefetching works recursively, so you can load a long chain of attributes, such as student.group.dept.
When an object is prefetched, all of its attributes are loaded by default, except lazy attributes and to-many attributes. You can prefetch lazy and to-many attributes explicitly if needed.
I hope this new method fully covers your use case. If something is not working as expected, please open a new issue on GitHub. You can also discuss functionality and make feature requests on the Pony ORM mailing list.
P.S. I'm not sure that the repository pattern you use gives you serious benefits. I think it actually increases coupling between template rendering and the repo implementation, because you may need to change the repo implementation (i.e. add new entities to the prefetching list) when the template code starts using new attributes. With the top-level @db_session decorator you can just send the query result to the template and everything happens automatically, without the need for explicit prefetching. But maybe I'm missing something, so I would be interested to see additional comments about the benefits of using the repository pattern in your case.

This happens because you're trying to access the related object which was not loaded and since you're trying to access it outside of the database session (the function decorated with the db_session), Pony raises this exception.
The recommended approach is to use the db_session decorator at the top level, in the same place where you put Flask's app.route decorator:
@app.route('/index')
@db_session
def index():
    ....
    return render_template(...)
This way all calls to the database will be wrapped with the database session, which will be finished after a web page is generated.
If there is a reason to narrow the database session to a single function, then you need to iterate over the returned objects inside the function decorated with db_session and access all the necessary related objects. Pony will use the most efficient way of loading the related objects from the database, avoiding the N+1 query problem. This way Pony will extract all the necessary objects within the db_session scope, while the connection to the database is still active.
--- update:
Right now, for loading the related objects, you should iterate over the query result and call the related object attribute:
for r in res:
    r.supplier.name
It is similar to the code in your example, I just removed the print statement. When you 'touch' the r.supplier.name attribute, Pony loads all non-lazy attributes of the related supplier object. If you need to load lazy attributes, you need to touch each of them separately.
It seems that we need to introduce a way to specify which related objects should be loaded during query execution. We will add this feature in one of the future releases.
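The behaviour described here can be imitated with a toy model in plain Python. This is not Pony's implementation; Session, LazyOrder, and SessionOver are made-up names that only mimic the idea that an attribute is loaded on first access and that loading requires an open session.

```python
class SessionOver(Exception):
    """Raised when a lazy attribute is loaded after the session closed."""


class Session:
    def __init__(self):
        self.active = False

    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, *exc):
        self.active = False
        return False


class LazyOrder:
    """Order whose supplier name is fetched on first access only."""

    def __init__(self, session, supplier_name):
        self._session = session
        self._stored_name = supplier_name   # pretend this lives in the database
        self._loaded = {}

    @property
    def supplier_name(self):
        if "supplier_name" not in self._loaded:
            if not self._session.active:
                raise SessionOver("cannot load supplier_name: session is over")
            self._loaded["supplier_name"] = self._stored_name
        return self._loaded["supplier_name"]


session = Session()
with session:
    touched = LazyOrder(session, "Acme")
    untouched = LazyOrder(session, "Globex")
    touched.supplier_name               # loaded while the session is open

after_close = touched.supplier_name     # served from the already-loaded value
try:
    untouched.supplier_name
    failed = False
except SessionOver:
    failed = True
```

Touching the attribute inside the session is what the loop over res does in the question; the untouched object fails in the same way the template does.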

Django Models Create a Custom Query

I am trying to get all the posts in a thread that were created on or before a certain time. So how do I get Django to let me write my own queries?
This is the closest I could come using Django's model functions.
# need to get all the post from Thread post_set that were created before Thread post_set 9
posts = Thread.post_set.filter(created <= Thread.post_set.all()[9].created)

You can use raw sql like so:
Thread.objects.raw('SELECT ... FROM myapp_thread WHERE ...')

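If post_set is the reverse side of a foreign key from Post to Thread, then use the following (filter lookups take the related query name, which is post by default, rather than the post_set accessor):
posts = Thread.objects.filter(post__created__lt=datetime.date(2013, 5, 10))
If you still want to go with a raw SQL query, please be careful, as no escaping is automatically performed.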
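The usual way to handle the escaping concern is to pass values as query parameters instead of formatting them into the SQL string. The sketch below shows the idea with the standard-library sqlite3 module; the post table and the posts_up_to helper are assumptions for illustration. Django's raw() similarly accepts a params argument, with placeholder syntax that depends on the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, thread_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO post (thread_id, created) VALUES (?, ?)",
    [(1, "2013-05-01"), (1, "2013-05-09"), (1, "2013-05-20"), (2, "2013-05-02")],
)


def posts_up_to(thread_id, cutoff):
    """Return (id, created) for posts in a thread created on or before cutoff."""
    # Values are passed separately from the SQL text, so the driver
    # handles quoting and user input cannot alter the statement.
    sql = (
        "SELECT id, created FROM post "
        "WHERE thread_id = ? AND created <= ? ORDER BY created"
    )
    return conn.execute(sql, (thread_id, cutoff)).fetchall()


rows = posts_up_to(1, "2013-05-10")
```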

How to retrieve properties only once from database in django

I have some relationships in my database that I describe like this:
@property
def translations(self):
    """
    :return: QuerySet
    """
    if not hasattr(self, '_translations'):
        self._translations = ClientTranslation.objects.filter(base=self)
    return self._translations
The idea behind the hasattr() check and self._translations is to hit the db only once; on later accesses the stored value is returned.
However, after reading the docs, I'm not sure the code is doing that, as queries only hit the db when the values are really needed, which happens after my code runs.
What would a correct approach look like?

Yes, the DB is hit the first time someone needs the value. But as you pointed out, you save the query, not the results. Wrap the query with list(...) to save the results.
By the way, you can use the cached_property decorator to make it more elegant. It is not a built-in, though; Django ships one as django.utils.functional.cached_property. You end up with:
from django.utils.functional import cached_property

@cached_property
def translations(self):
    return list(ClientTranslation.objects.filter(base=self))
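The caching behaviour can be verified without Django. The sketch below uses functools.cached_property from the standard library (Python 3.8+) as a stand-in for the decorator mentioned above; the Client class and its _run_query method are hypothetical and merely count how often the "query" runs.

```python
from functools import cached_property


class Client:
    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def _run_query(self):
        # Stand-in for ClientTranslation.objects.filter(base=self):
        # returns something lazy and records that a query was issued.
        self.query_count += 1
        return iter(self._rows)

    @cached_property
    def translations(self):
        # list(...) forces evaluation, so the results (not a lazy
        # iterator) are what gets stored on the instance.
        return list(self._run_query())


client = Client(["en", "de"])
first = client.translations
second = client.translations
```

The second access returns the very same list object, and the query counter stays at one.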

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length = 64)
    [...]
class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category = 'foo').select_related('choice_set')
for poll in polls:
    print(poll.choice_set.all()) # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category = 'foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?

Time has passed, and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. It effectively does what the suggested qbind function does, i.e. two queries are performed and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():
        print(choice)
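The "two queries, join in Python" idea can be sketched with plain dictionaries standing in for the two result sets. This is not Django's internal code; attach_children and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict

# Stand-ins for the results of the two queries (parents, then children).
polls = [{"id": 1, "category": "foo"}, {"id": 2, "category": "foo"}]
choices = [
    {"id": 10, "poll_id": 1, "text": "yes"},
    {"id": 11, "poll_id": 1, "text": "no"},
    {"id": 12, "poll_id": 3, "text": "orphan"},
]


def attach_children(parents, children, fk="poll_id", target="choices"):
    """Group children by foreign key and attach each group to its parent."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child[fk]].append(child)
    for parent in parents:
        # Parents without children get an empty list, not a missing key.
        parent[target] = by_parent.get(parent["id"], [])
    return parents


result = attach_children(polls, choices)
texts = [c["text"] for c in result[0]["choices"]]
```

In a real second query the children would already be restricted to the selected parents, so a row like the orphan above would not be fetched at all.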

Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.

I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)

I think what you are trying to do is called "eager loading" of child data, meaning you load the child list (choice_set) for each Poll in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' and then the filter. That might fix it.
UPDATE: This doesn't work, see comments below.
