Django: how to make multiple annotate calls in a single queryset

I am trying to annotate a User model in Django with two different like counts.
Here's the code I'm using to return the desired queryset:
def get_top_user(self):
    return (
        User.objects
        .annotate(guide_like=Count('guidelike'))
        .annotate(news_like=Count('newslike'))
        .values_list('first_name', 'last_name', 'guide_like', 'news_like')
        .order_by('-guide_like')
    )
However, the queryset returns ["Bob", "Miller", 612072, 612072]. As you can see, Django multiplies the two annotated counts together, which is why I'm getting 612072 for both.
Is there a way to use multiple annotate calls in a single queryset without getting these multiplied values?
EDIT: I also tried adding distinct() at the end of the query and distinct=True in each Count, but the call then simply hangs as if in an infinite loop.

This is how Django's annotate builds the SQL: it performs all the necessary joins, groups by all User fields, and applies the annotation's aggregate function (Count in your case). So it joins users with all their guide likes, then with all their news likes, and then simply counts the rows produced per user. Each count is therefore the product of the two like counts.
If you can, use raw querysets or the extra() QuerySet method. E.g.:
User.objects.all().extra(select={
    # The keys must match the names passed to values_list() below
    'guide_like': 'select count(*) from tbl_guide_likes where user_id=tbl_users.id',
    'news_like': 'select count(*) from tbl_news_likes where user_id=tbl_users.id'
}).values_list('first_name', 'last_name', 'guide_like', 'news_like')
For more flexibility, you can build these SQL strings from the real table names, which you can get through Model._meta.db_table, instead of hard-coding them. Still, this is an inconvenient and hackish method.
Sooner or later your logic will become more complicated, and then you should move it out of Python code into SQL (stored functions/procedures) and raw queries.
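The row multiplication described above can be reproduced outside Django. The following is a minimal sketch using Python's built-in sqlite3 module; the table names (users, guide_likes, news_likes) and the sample data are assumptions made up for illustration, not the asker's schema. It compares the double join that two Count annotations produce with correlated subqueries of the kind the extra() call generates.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE guide_likes (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE news_likes (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (id, first_name) VALUES (1, 'Bob');
    INSERT INTO guide_likes (user_id) VALUES (1), (1), (1);
    INSERT INTO news_likes (user_id) VALUES (1), (1);
""")

# Two joins: each of the 3 guide likes pairs with each of the 2 news likes,
# so the group for Bob holds 3 * 2 = 6 rows and both counts see all of them.
joined = conn.execute("""
    SELECT u.first_name, COUNT(g.id), COUNT(n.id)
    FROM users u
    LEFT JOIN guide_likes g ON g.user_id = u.id
    LEFT JOIN news_likes n ON n.user_id = u.id
    GROUP BY u.id
""").fetchone()

# Correlated subqueries: each count is computed independently of the other.
subqueries = conn.execute("""
    SELECT u.first_name,
           (SELECT COUNT(*) FROM guide_likes g WHERE g.user_id = u.id),
           (SELECT COUNT(*) FROM news_likes n WHERE n.user_id = u.id)
    FROM users u
""").fetchone()

print(joined)      # ('Bob', 6, 6)
print(subqueries)  # ('Bob', 3, 2)
```

With 3 guide likes and 2 news likes, the joined query reports 6 for both columns, while the subquery version reports the real counts.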

Related

Making complex query with django models

I created a view in my database with 6 joins and 10 columns; at the moment it returns around 86,000 rows.
I query all the rows with objects.all() and then filter according to user interaction (form data sent by POST, which determines the appropriate .filter(*args) call).
After that I tried to get the length of the queryset with count(), since this method doesn't load the rows. But since views don't have indexes on their columns, count() takes too long.
I searched for a way to materialize the view, but that isn't possible in MySQL.
Then I searched for a way to replace the initial .all() with the 6 joins and filter arguments expressed in Django rather than in a view, so that the indexes would still be available. But I couldn't find a solution to that problem.
Or maybe I could join every row from the view with another table, so I can use the index of the other table for faster querying?
SELECT * FROM View LEFT JOIN Table ON (View.id = Table.id)
I appreciate every answer.
Try this:
from django.db import models
# I think your table structure looks like this
class Table(models.Model):
    pass
class View(models.Model):
    # on_delete is required since Django 2.0
    table = models.ForeignKey(to=Table, on_delete=models.CASCADE)
qs = View.objects.select_related('table').filter(table__isnull=True)
for row in qs:
    # Print each row, not the whole queryset
    print(row)
Thanks!

Django - Annotate multiple fields from a Subquery

I'm working on a Django project in which I have a queryset of 'A' objects (A.objects.all()), and I need to annotate multiple fields from a Subquery on 'B' objects. The problem is that the annotate method can only deal with one field type per parameter (DecimalField, CharField, etc.), so in order to annotate multiple fields I must use something like:
A.objects.all().annotate(
    b_id=Subquery(B_queryset.values('id')[:1]),
    b_name=Subquery(B_queryset.values('name')[:1]),
    b_other_field=Subquery(B_queryset.values('other_field')[:1]),
    ...
)
This is very inefficient, as it creates a new subquery/subselect in the final SQL for each field I want to annotate. I would like to use a single subselect with multiple fields in its values() call and annotate them all onto A's queryset. I'd like to use something like this:
b_subquery = Subquery(B_queryset.values('id', 'name', 'other_field', ...)[:1])
A.objects.all().annotate(b=b_subquery)
But when I try to do that (and access the first element, A.objects.all().annotate(b=b_subquery)[0]), it raises an exception:
{FieldError}Expression contains mixed types. You must set output_field.
And if I set Subquery(B_quer...[:1], output_field=ForeignKey(B, models.DO_NOTHING)), I get a DB exception:
{ProgrammingError}subquery must return only one column
In a nutshell, the whole problem is that multiple Bs "belong" to one A, so for every A in A.objects.all() I need to use a Subquery to pick a specific B and attach it to that A, using OuterRefs and a few filters (I only want a few fields of B). This seems like a trivial problem to me.
Thanks for any help in advance!
What I do in such situations is use prefetch_related:
a_qs = A.objects.all().prefetch_related(
    models.Prefetch(
        'b_set',
        # NOTE: no need to filter with OuterRef (it won't work anyway);
        # Django automatically filters and matches B objects to A
        queryset=B_queryset,
        to_attr='b_records',
    )
)
Now a.b_records will be a list containing a's related B objects. Depending on how you filter your B_queryset, this list may be limited to only 1 object.

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
    ...
class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough, filters only Invoices with notes. However, this method results in the field being present in the query result, which I don't need and means worse performance (SQL has to execute the subquery 2 times).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but in general it is discouraged to use extra / raw SQL.
Is there a better way to do this?
You can remove annotations from the SELECT clause using the .values() queryset method. The trouble with .values() is that you have to enumerate all the names you want to keep instead of the names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip even without .values():
class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (
    Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to drop has_notes from the SELECT clause while still getting filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM note WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
Ok, I've just noticed in Django 3.0 docs, that they've updated how Exists works and can be used directly in filter:
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery will not be added to the SELECT columns, which may result in a better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
We can filter for Invoices whose joined Note is not NULL after a LEFT OUTER JOIN, and make the query distinct so that an Invoice with several notes is not returned more than once:
Invoice.objects.filter(notes__isnull=False).distinct()
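To see why distinct() is needed with the join-based filter, and how it compares with an EXISTS condition in the WHERE clause, here is a small sketch using Python's built-in sqlite3 module. The invoice and note tables and their rows are hypothetical stand-ins for the models above, and the SQL is hand-written to illustrate the idea rather than copied from what Django emits.

```python
import sqlite3

# Hypothetical tables mirroring the Invoice/Note models, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE note (id INTEGER PRIMARY KEY, invoice_id INTEGER, text TEXT);
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    INSERT INTO note (invoice_id, text) VALUES (1, 'a'), (1, 'b'), (3, 'c');
""")

def ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Join-based filter: invoice 1 has two notes, so it appears twice.
join_ids = ids("""
    SELECT i.id FROM invoice i
    LEFT JOIN note n ON n.invoice_id = i.id
    WHERE n.id IS NOT NULL
    ORDER BY i.id
""")

# Same join with DISTINCT: duplicates are collapsed.
distinct_ids = ids("""
    SELECT DISTINCT i.id FROM invoice i
    LEFT JOIN note n ON n.invoice_id = i.id
    WHERE n.id IS NOT NULL
    ORDER BY i.id
""")

# EXISTS in the WHERE clause: no join, so no duplicates to remove.
exists_ids = ids("""
    SELECT i.id FROM invoice i
    WHERE EXISTS (SELECT 1 FROM note n WHERE n.invoice_id = i.id)
    ORDER BY i.id
""")

print(join_ids)      # [1, 1, 3]
print(distinct_ids)  # [1, 3]
print(exists_ids)    # [1, 3]
```

Both the DISTINCT join and the EXISTS filter return each matching invoice once; which one is faster depends on the database and the data.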
This is the most optimized way to fetch rows from one table whose primary key is referenced from another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the method below. Note that clear() returns None, so keep a reference to the queryset first:
invoices = Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
invoices.query.annotations.clear()

Django postgres order_by distinct on field

We have a limitation for order_by/distinct fields.
From the docs: "fields in order_by() must start with the fields in distinct(), in the same order"
Now here is the use case:
class Course(models.Model):
    is_vip = models.BooleanField()
    ...
class CourseEvent(models.Model):
    date = models.DateTimeField()
    course = models.ForeignKey(Course)
The goal is to fetch the courses, ordered by nearest date but vip goes first.
The solution could look like this:
CourseEvent.objects.order_by('-course__is_vip', '-date',).distinct('course_id',).values_list('course')
But it causes an error because of this limitation.
Yeah I understand why ordering is necessary when using distinct - we get the first row for each value of course_id so if we don't specify an order we would get some arbitrary row.
But what's the purpose of limiting order to the same field that we have distinct on?
If I change order_by to something like ('course_id', '-course__is_vip', 'date',) it would give me one row for course but the order of courses will have nothing in common with the goal.
Is there any way to bypass this limitation besides walking through the entire queryset and filtering it in a loop?
You can use a nested query using id__in. In the inner query you single out the distinct events and in the outer query you custom-order them:
CourseEvent.objects.filter(
    id__in=CourseEvent.objects
        .order_by('course_id', '-date')
        .distinct('course_id')
).order_by('-course__is_vip', '-date')
From the docs on distinct(*fields):
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
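The two-stage logic of the nested query can be sketched in plain Python, without a database. The event tuples below are made-up sample data, and the code only imitates what the inner DISTINCT ON query and the outer ORDER BY do; it is not how Django or PostgreSQL implement them.

```python
from datetime import date

# Hypothetical rows: (event_id, course_id, course_is_vip, event_date)
events = [
    (1, 10, False, date(2024, 1, 5)),
    (2, 10, False, date(2024, 3, 1)),
    (3, 20, True, date(2024, 2, 1)),
    (4, 20, True, date(2024, 1, 1)),
    (5, 30, False, date(2024, 4, 1)),
]

# Stage 1 (inner query): order by course_id, then date descending,
# and keep only the first row seen for each course_id.
first_per_course = {}
for row in sorted(events, key=lambda r: (r[1], -r[3].toordinal())):
    first_per_course.setdefault(row[1], row)

# Stage 2 (outer query): reorder the surviving rows freely,
# VIP courses first, then newest date first.
result = sorted(first_per_course.values(), key=lambda r: (r[2], r[3]), reverse=True)
result_ids = [r[0] for r in result]

print(result_ids)  # [3, 5, 2]
```

Because the deduplication and the final ordering happen in separate steps, the final ordering no longer has to start with course_id.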

Django: Order by evaluation of whether or not a date is empty

In Django, is it possible to order by whether or not a field is None, instead of the value of the field itself?
I know I can send the QuerySet to python sorted() but I want to keep it as a QuerySet for subsequent filtering. So, I'd prefer to order in the QuerySet itself.
For example, I have a termination_date field and I want to first sort the ones without a termination_date, then I want to order by a different field, like last_name, first_name.
Is this possible or am I stuck using sorted() and then having to do an entire new Query with the included ids and run sorted() on the new QuerySet? I can do this, but would prefer not to waste the overhead and use the beauty of QuerySets that they don't run until evaluated.
Translation, how can I get this SQL from Django assuming my app is employee, my model is Employee and it has three fields 'first_name (varchar)', 'last_name (varchar)', and 'termination_date (date)':
SELECT
    "employee_employee"."last_name",
    "employee_employee"."first_name",
    "employee_employee"."termination_date"
FROM "employee_employee"
ORDER BY
    "employee_employee"."termination_date" IS NOT NULL,
    "employee_employee"."last_name",
    "employee_employee"."first_name"
You should be able to order by query expressions, like this:
from django.db.models import IntegerField, Case, Value, When
MyModel.objects.all().order_by(
    Case(
        # Rows where some_field is NULL get 1, all others get 0
        When(some_field__isnull=True, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    ).asc(),
    'some_other_field'
)
I cannot test here, so it might require a bit of fiddling around, but this should put rows that have a NULL some_field after those that have a value. Each set of rows should then be sorted by some_other_field.
Granted, the CASE/WHEN is a bit more cumbersome than what you put in your question, but I don't know how to get the Django ORM to output that. Maybe someone else will have a better answer.
Spectras' answer works fine, but it only orders your records by 'null or not'. There is a shorter way that allows you to put empty dates wherever you want them in your date ordering - Coalesce:
from datetime import datetime
from django.db.models import Value
from django.db.models.functions import Coalesce
wayback = datetime(year=1, month=1, day=1)  # or whatever date you want
(MyModel.objects
    .annotate(null_date=Coalesce('date_field', Value(wayback)))
    .order_by('null_date'))
This essentially sorts by 'date_field', with all records where date_field is None placed as if they had the date wayback. This works perfectly with PostgreSQL, but might need some raw SQL casting in MySQL, as described in the documentation.
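The three orderings discussed in this thread (the IS NOT NULL expression from the question, the CASE/WHEN form, and COALESCE with a stand-in date) can be compared directly in SQL. Below is a minimal sketch with Python's built-in sqlite3 module; the employee table and its rows are invented for illustration, and dates are stored as ISO strings because SQLite has no native date type.

```python
import sqlite3

# Hypothetical employee data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (first_name TEXT, last_name TEXT, termination_date TEXT);
    INSERT INTO employee VALUES
        ('Ann', 'Young', '2020-01-01'),
        ('Bob', 'Adams', NULL),
        ('Cid', 'Baker', '2019-06-30'),
        ('Dee', 'Zane', NULL);
""")

def last_names(order_by):
    """Return last names ordered by the given ORDER BY expression list."""
    sql = "SELECT last_name FROM employee ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]

# Boolean expression from the question: rows with no termination date sort first.
is_null_order = last_names("termination_date IS NOT NULL, last_name, first_name")

# Equivalent CASE/WHEN form, similar to what a Case/When expression produces.
case_order = last_names(
    "CASE WHEN termination_date IS NULL THEN 0 ELSE 1 END, last_name, first_name"
)

# COALESCE: NULL dates sort as if they were the stand-in date.
coalesce_order = last_names("COALESCE(termination_date, '0001-01-01'), last_name")

print(is_null_order)   # ['Adams', 'Zane', 'Baker', 'Young']
print(case_order)      # ['Adams', 'Zane', 'Baker', 'Young']
print(coalesce_order)  # ['Adams', 'Zane', 'Baker', 'Young']
```

The first two forms only separate NULL from non-NULL rows before the name ordering applies, while the COALESCE form sorts by the actual date value, so the results coincide here only because the sample dates happen to follow the same order as the last names.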
