SQLAlchemy conversion of Django annotate - python

I have the following annotation in a Django model manager I'd like to convert to a SQLAlchemy ORM query:
annotations = {
    'review_count': Count("cookbookreview", distinct=True),
    'rating': Avg("cookbookreview__rating"),
}
return self.model.objects.annotate(**annotations)
What I essentially need is for each model object in the query to have review_count and rating attached to it as part of the initial query. I believe I could use column_property, but I would like to avoid that kind of "calculated property" on the object, because I don't want the expensive lookup being performed per object when I access the property in a template.
What is the right way to approach this problem? Thanks in advance.

So, for the sake of completeness and usefulness for others with this issue I present the following solution (which may or may not be the optimal way to solve this)
sq_reviews = db_session.query(
        CookbookReview.cookbook_id,
        func.avg(CookbookReview.rating).label('rating'),
        func.count('*').label('review_count')).\
    group_by(CookbookReview.cookbook_id).subquery()

object_list = db_session.query(
        Cookbook, sq_reviews.c.rating, sq_reviews.c.review_count).\
    outerjoin(sq_reviews, Cookbook.id == sq_reviews.c.cookbook_id).\
    order_by(Cookbook.name).limit(20)
The key here is the concept of SQLAlchemy subqueries. If you think of each annotation in my original Django query as a subquery, the concept is easier to understand. It's also worth noting that this query is quite speedy - many times faster than its (more concise/magical) Django counterpart. Hopefully this helps others curious about this particular Django/SQLAlchemy query analog.
Also keep in mind that you need to perform the actual annotation of the ORM objects yourself. A simple function like this called before sending the object list to your template will suffice:
def process(query):
    for obj, rating, review_count in query:
        obj.rating = rating
        obj.review_count = review_count
        yield obj
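To see the annotation step in isolation, here is a minimal sketch with a plain class and tuples standing in for the (Cookbook, rating, review_count) rows the query yields (all names here are hypothetical stand-ins, no ORM required):

```python
class FakeCookbook:
    """Stand-in for an ORM-mapped Cookbook object."""
    def __init__(self, name):
        self.name = name

def process(query):
    # Attach the aggregate columns to each object, as in the answer above.
    for obj, rating, review_count in query:
        obj.rating = rating
        obj.review_count = review_count
        yield obj

rows = [(FakeCookbook("Soups"), 4.5, 12), (FakeCookbook("Breads"), None, 0)]
annotated = list(process(rows))
print(annotated[0].rating, annotated[0].review_count)  # 4.5 12
```

Because process() is a generator, the annotation happens lazily as the template iterates; wrap it in list() if you need the whole list up front.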

Related

ProgrammingError: column "product" is of type product[] but expression is of type text[] enum postgres

I would like to save array of enums.
I have the following:
CREATE TABLE public.campaign
(
id integer NOT NULL,
product product[]
)
product is an enum.
In Django I defined it like this:
PRODUCT = (
    ('car', 'car'),
    ('truck', 'truck'),
)

class Campaign(models.Model):
    product = ArrayField(models.CharField(null=True, choices=PRODUCT))
However, when I write the following:
campaign = Campaign(id=5, product=["car", "truck"])
campaign.save()
I get the following error:
ProgrammingError: column "product" is of type product[] but expression is of type text[]
LINE 1: ..."product" = ARRAY['car...
Note
I saw this answer, but I don't use SQLAlchemy and would rather not use it if not needed.
EDITED
I tried @Roman Konoval's suggestion below like this:
class PRODUCT(Enum):
    CAR = 'car'
    TRUCK = 'truck'

class Campaign(models.Model):
    product = ArrayField(EnumField(PRODUCT, max_length=10))
and with:
campaign = Campaign(id=5, product=[CAR, TRUCK])
campaign.save()
However, I still get the same error;
I see that Django is translating it to a list of strings.
If I write the following directly in the psql console:
INSERT INTO campaign ("product") VALUES ('{car,truck}'::product[])
it works just fine
There are two fundamental problems here.
Don't use Enums
If you continue to use enum, your next question here on Stack Overflow will be "how do I add a new entry to an enum?". Django does not support the enum type out of the box (thank heavens), so you have to use third-party libraries for this. Your mileage will vary with how complete the library is.
An enum value occupies four bytes on disk. The length of an enum
value's textual label is limited by the NAMEDATALEN setting compiled
into PostgreSQL; in standard builds this means at most 63 bytes.
If you are thinking that you are saving space on disk by using enum, the above quote from the manual shows that it's an illusion.
See this Q&A for more on advantages and disadvantages of enum. But generally the disadvantages outweigh the advantages.
Don't use Arrays
Tip: Arrays are not sets; searching for specific array elements can be
a sign of database misdesign. Consider using a separate table with a
row for each item that would be an array element. This will be easier
to search, and is likely to scale better for a large number of
elements.
Source: https://www.postgresql.org/docs/9.6/static/arrays.html
If you are going to search for a campaign that deals with Cars or Trucks you are going to have to do a lot of hard work. So will the database.
The correct design
The correct design is the one suggested in the postgresql arrays documentation page. Create a related table. This is the standard django way as well.
class Campaign(models.Model):
    name = models.CharField(max_length=20)

class Product(models.Model):
    name = models.CharField(max_length=20)
    campaign = models.ForeignKey(Campaign)
This makes your code simpler, doesn't require any extra storage, and doesn't require third-party libraries. Best of all, the vast API of Django's related models becomes available to you.
The definition of the product field is incorrect: it specifies an array of CharFields, but in reality it is an array of enums. Django does not support the enum type natively, so you can try this extension to define the type correctly:
class Product(Enum):
    ProductA = 'a'
    ...

class Campaign(models.Model):
    product = ArrayField(EnumField(Product, max_length=<whatever>))
Try this:
def django2psql(s):
    return '{' + ','.join(s) + '}'

campaign = Campaign(id=5, product=django2psql(["car", "truck"]))
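For what it's worth, that helper just builds the Postgres array-literal syntax; here is a standalone check of its output (note it does not quote elements, so values containing commas, braces, or quotes would need proper escaping):

```python
def django2psql(s):
    # Join the values into a Postgres array literal, e.g. '{car,truck}'
    return '{' + ','.join(s) + '}'

print(django2psql(["car", "truck"]))  # {car,truck}
```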
I think you may have to subclass CharField to get it to report the correct db_type. There may be more problems than this, but you can give this a try:
class Product(models.CharField):
    def db_type(self, connection):
        return 'product'

PRODUCT = (
    ('car', 'car'),
    ('truck', 'truck'),
)

class Campaign(models.Model):
    product = ArrayField(Product(null=True, choices=PRODUCT))

SQLAlchemy filter based on subtype

I am using a join-based inheritance - I have User(parent) and CorporateUser(child) models. The polymorphic_identity of User is "user" and the one of the CorporateUser is "corporate_user".
I have a query like this
User.query.filter(User.name.like("%"+search_text+"%"))
Is it possible to "chain" to this query something that will only return objects of type CorporateUser?
Currently I just add another User.query.filter(User.name.like("%"+search_text+"%")).filter(User.type == 'corporate_user')
but this doesn't seem very elegant.
I am aware I can just do CorporateUser.query.filter(User.name.like("%"+search_text+"%"))
but the point is that I am given the filters of the initial query.
Thanks.
The .with_entities() method can help you. It will not return CorporateUser objects exactly, but only the fields you define.
query = User.query.filter(User.name.like("%" + search_text + "%")) \
    .filter(User.type == 'corporate_user') \
    .with_entities(User.corporate_user)
Each item in your query result will be a tuple with the entities defined.
Of course, your model needs the User.corporate_user back reference. Based on your question, I'm not sure if you have it.

DatabaseSessionIsOver with Pony ORM due to lazy loading?

I am using Pony ORM for a flask solution and I've come across the following.
Consider the following:
@db_session
def get_orders_of_the_week(self, user, date):
    q = select(o for o in Order for s in o.supplier if o.user == user)
    q2 = q.filter(lambda o: o.date >= date and o.date <= date + timedelta(days=7))
    res = q2[:]
    # for r in res:
    #     print r.supplier.name
    return res
When I use the result in Jinja2, which looks like this:
{% for order in res %}
    Supplier: {{ order.supplier.name }}
{% endfor %}
I get a
DatabaseSessionIsOver: Cannot load attribute Supplier[3].name: the database session is over
If I uncomment the for r in res part, it works fine. I suspect there is some sort of lazy loading that doesn't get loaded with res = q2[:].
Am I completely missing a point or what's going on here?
I just added prefetch functionality that should solve your problem. You can take working code from the GitHub repository. This feature will be part of the upcoming release Pony ORM 0.5.4.
Now you can write:
q = q.prefetch(Supplier)
or
q = q.prefetch(Order.supplier)
and Pony will automatically load related supplier objects.
Below I'll show several queries with prefetching, using the standard Pony example with Students, Groups and Departments.
from pony.orm.examples.presentation import *
Loading Student objects only, without any prefetching:
students = select(s for s in Student)[:]
Loading students together with groups and departments:
students = select(s for s in Student).prefetch(Group, Department)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
The same as above, but specifying attributes instead of entities:
students = select(s for s in Student).prefetch(Student.group, Group.dept)[:]
for s in students:  # no additional query to the DB is required
    print s.name, s.group.major, s.group.dept.name
Loading students together with their courses (a many-to-many relationship):
students = select(s for s in Student).prefetch(Student.courses)
for s in students:
print s.name
for c in s.courses: # no additional query to the DB is required
print c.name
As parameters of the prefetch() method you can specify entities and/or attributes. If you specify an entity, all to-one attributes of that type will be prefetched. If you specify an attribute, that specific attribute will be prefetched. To-many attributes are prefetched only when specified explicitly (as in the Student.courses example). Prefetching works recursively, so you can load a long chain of attributes, such as student.group.dept.
When an object is prefetched, all of its attributes are loaded by default, except lazy attributes and to-many attributes. You can prefetch lazy and to-many attributes explicitly if needed.
I hope this new method fully covers your use case. If something is not working as expected, please open a new issue on GitHub. You can also discuss the functionality and make feature requests on the Pony ORM mailing list.
P.S. I'm not sure that the repository pattern you use gives you serious benefits. I think it actually increases coupling between template rendering and the repo implementation, because you may need to change the repo implementation (i.e. add new entities to the prefetch list) when the template code starts using new attributes. With a top-level @db_session decorator you can just send the query result to the template and everything happens automatically, without the need for explicit prefetching. But maybe I'm missing something, so I'd be interested to see additional comments about the benefits of the repository pattern in your case.
This happens because you're trying to access a related object which was not loaded, and since you access it outside of the database session (the function decorated with db_session), Pony raises this exception.
The recommended approach is to use the db_session decorator at the top level, at the same place where you put the Flask's app.route decorator:
@app.route('/index')
@db_session
def index():
    ....
    return render_template(...)
This way all calls to the database will be wrapped with the database session, which will be finished after a web page is generated.
If there is a reason you want to narrow the database session to a single function, then you need to iterate over the returned objects inside the function decorated with db_session and access all the necessary related objects. Pony will use the most effective way of loading the related objects from the database, avoiding the N+1 query problem. This way Pony will extract all the necessary objects within the db_session scope, while the connection to the database is still active.
--- update:
Right now, for loading the related objects, you should iterate over the query result and call the related object attribute:
for r in res:
    r.supplier.name
It is similar to the code in your example; I just removed the print statement. When you 'touch' the r.supplier.name attribute, Pony loads all non-lazy attributes of the related supplier object. If you need lazy attributes too, you have to touch each of them separately.
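The behaviour can be mimicked with a toy lazy attribute in plain Python (purely illustrative; this is not Pony's implementation, and all names here are made up):

```python
class Session:
    """Toy stand-in for a database session that can be open or closed."""
    def __init__(self):
        self.open = True

class LazySupplier:
    """Toy entity whose 'name' is loaded on first access, like a lazy ORM attribute."""
    def __init__(self, session):
        self._session = session
        self._name = None

    @property
    def name(self):
        if self._name is None:
            if not self._session.open:
                raise RuntimeError("DatabaseSessionIsOver")
            self._name = "ACME"  # pretend this hit the database
        return self._name

session = Session()
touched = LazySupplier(session)
untouched = LazySupplier(session)
touched.name            # loaded and cached while the session is open
session.open = False
print(touched.name)     # ACME -- already cached, safe after the session ends
try:
    untouched.name      # never touched inside the session: raises
except RuntimeError as e:
    print(e)            # DatabaseSessionIsOver
```

This is exactly why uncommenting the `for r in res` loop fixes the template: it touches the attribute while the session is still open, caching the value.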
Seems that we need to introduce a way to specify what related objects should be loaded during the query execution. We will add this feature in one of the future releases.

Django: Extracting a `Q` object from a `QuerySet`

I have a Django QuerySet, and I want to get a Q object out of it. (i.e. that holds the exact same query as that queryset.)
Is that possible? And if so, how?
No, but you could create the Q object first and use that; alternatively, create your query as a dict and pass it to both your filter method and the Q object.
This is not exactly what you were asking for, but you can extract the sql from a query set by accessing the query member. For example:
x = somequeryset.query
Then you could use that on a new queryset object to reconstruct the original queryset. This may work better for saving things like "values" that are defined on a queryset, and the resulting x is easy to store. I've used this in the past to save user-constructed queries/searches that are then run daily, with the results emailed to the user.
Also relevant, if you wanted the Q object so that you could reconstruct a complex query by ORing another Q object onto it: provided two QuerySets are on the same model, you can OR the QuerySets directly for the same effect. It's worth trying that and examining the SQL before and after.
For example:
qs1 = model.objects.filter(...)
print("qs1: {}".format(qs1.query))
qs2 = model.objects.filter(...)
print("qs2: {}".format(qs2.query))
qs = qs1 | qs2
print("qs: {}".format(qs.query))
I certainly found your question because I wanted the Q object from the query for this very reason, and discovered on the Django Users Group:
https://groups.google.com/d/msg/django-users/2BuFFMDL0VI/dIih2WRKAgAJ
that QuerySets can be combined in much the same way as Q objects can.
That may or may not be helpful to you, depending on the reason you want that Q object of course.

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category='foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with only 2 SQL queries.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?
Time has passed and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. This method effectively does what the suggested qbind function performs, i.e. two queries are executed and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():
        print choice
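Under the hood, the two-query join that both qbind and prefetch_related perform can be sketched without any ORM (hypothetical dict rows standing in for model instances):

```python
from collections import defaultdict

# Query 1 returns the parents; query 2 returns all children whose
# foreign key is in the set of parent ids (the '__in' filter).
polls = [{"id": 1, "name": "colors"}, {"id": 2, "name": "foods"}]
choices = [
    {"poll_id": 1, "text": "red"},
    {"poll_id": 1, "text": "blue"},
    {"poll_id": 2, "text": "pizza"},
]

# The "join in Python land": group children by parent id, then attach.
by_poll = defaultdict(list)
for choice in choices:
    by_poll[choice["poll_id"]].append(choice)
for poll in polls:
    poll["choice_set"] = by_poll[poll["id"]]

print([c["text"] for c in polls[0]["choice_set"]])  # ['red', 'blue']
```

Two queries and one dict pass, regardless of how many parents there are; this is what replaces the N per-poll queries.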
Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choices_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.
I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
I think what you are trying to do is the term "eager loading" of child data - meaning you are loading the child list (choice_set) for each Poll, but all in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.
