Django get values for Max of grouped data

After much trial and error and checking similar questions, I think it is worth asking this with all the details.
Here's a simple model. Let's say we have a Book model and a Reserve model that holds reservation data for each Book.
from django.db import models
class Book(models.Model):
    title = models.CharField(
        'Book Title',
        max_length=50
    )
    name = models.CharField(
        max_length=250
    )
class Reserve(models.Model):
    book = models.ForeignKey(
        Book,
        on_delete=models.CASCADE
    )
    reserve_date = models.DateTimeField()
    status = models.CharField(
        'Reservation Status',
        max_length=5,
        choices=[
            ('R', 'Reserved'),
            ('F', 'Free')
        ]
    )
I added a book and two reservation records to the model:
from django.utils import timezone
book_inst = Book(title='Book1')
book_inst.save()
reserve_inst = Reserve(book=book_inst, reserve_date=timezone.now(), status='R')
reserve_inst.save()
reserve_inst = Reserve(book=book_inst, reserve_date=timezone.now(), status='F')
reserve_inst.save()
My goal is to get the data for the last reservation of each book. Based on other questions, I got it to this point:
from django.db.models import F, Q, Max
reserve_qs = Reserve.objects.values(
    'book__title'
).annotate(
    last_action=Max('reserve_date')
)
reserve_qs now has the last action for each Book, but when I add another .values() call it ignores the grouping and returns all the records.
I also tried filtering with F:
Reserve.objects.values(
    'book__title'
).annotate(
    last_action=Max('reserve_date')
).values(
).filter(
    reserve_date=F('last_action')
)
I'm using Django 3 and SQLite.

By using another filter, you will break the GROUP BY mechanism. You can however simply obtain the last reservation with:
from django.db.models import F, Max
Reserve.objects.filter(
    book__title='Book1'
).annotate(
    book_title=F('book__title'),
    last_action=Max('book__reserve__reserve_date')
).filter(
    reserve_date=F('last_action')
)
or for all books:
from django.db.models import F, Max
qs = Reserve.objects.annotate(
    book_title=F('book__title'),
    last_action=Max('book__reserve__reserve_date')
).filter(
    reserve_date=F('last_action')
).select_related('book')
Here the maximum is calculated per book: because the query joins back to the same table through book__reserve, the grouping is done per Reserve row, and the maximum reserve_date is taken over all reservations of that row's book.
This retrieves the last reservation for every Book that remains after filtering. Normally that is one per Book, but if a Book has several reservations sharing exactly the same latest timestamp, all of them are returned.
So we can for example print the reservations with:
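for q in qs:
    # .format() fills the placeholders; passing the values as extra
    # print() arguments would print the template unformatted.
    print(
        'Last reservation for {} is {} with status {}'.format(
            q.book.title,
            q.reserve_date,
            q.status
        )
    )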
For a single book, however, it is better to simply fetch the Book object and return its .latest(..) reservation:
Book.objects.get(title='Book1').reserve_set.latest('reserve_date')
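To make the "keep only the rows whose date equals the maximum for their book" idea concrete, here is a minimal sketch using the standard-library sqlite3 module. The table and column names are assumptions that mirror the Book and Reserve models, and the SQL is a hand-written equivalent of the idea, not the exact statement Django generates. The second book deliberately has two reservations with the same latest timestamp to show the tie case.

```python
import sqlite3

# Hypothetical tables mirroring the Book and Reserve models (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reserve (
        id INTEGER PRIMARY KEY,
        book_id INTEGER REFERENCES book(id),
        reserve_date TEXT,
        status TEXT
    );
    INSERT INTO book VALUES (1, 'Book1'), (2, 'Book2');
    INSERT INTO reserve VALUES
        (1, 1, '2020-01-01 10:00', 'R'),
        (2, 1, '2020-01-02 10:00', 'F'),
        (3, 2, '2020-01-03 10:00', 'R'),
        (4, 2, '2020-01-03 10:00', 'F');
""")

# Keep only the rows whose date equals the maximum date of their own book.
rows = conn.execute("""
    SELECT b.title, r.reserve_date, r.status
    FROM reserve AS r
    JOIN book AS b ON b.id = r.book_id
    WHERE r.reserve_date = (
        SELECT MAX(r2.reserve_date) FROM reserve AS r2 WHERE r2.book_id = r.book_id
    )
    ORDER BY b.title, r.id
""").fetchall()

for title, reserve_date, status in rows:
    print(f"Last reservation for {title} is {reserve_date} with status {status}")
```

Book1 yields a single row, while Book2 yields both tied rows, which is the behaviour described above for identical timestamps.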

book_obj = Book.objects.get(title='Book1')
reserve_qs = book_obj.reserve_set.all()
This returns all the Reserve objects that belong to this book.
You can get the latest object by sorting them and using .first() or .last().

Related

Can I use an annotated subquery parameter later on in the same query?

I have a Django queryset that ideally does some annotation and filtering with 3 object classes. I have Conversations, Tickets, and Interactions.
My desired output is Conversations that have 1. an OPEN ticket, and 2. exactly ONE interaction, of type mass_text, since the ticket's created_at date.
I am trying to annotate the conversation query with ticket_created_at & filter out Nones, then somehow use that ticket_created_at parameter in a subsequent annotation/subquery to get count of interactions since the ticket_created_at date. Is this doable?
class Interaction(PolymorphicModel):
    when = models.DateTimeField()
    conversation = models.ForeignKey(Conversation)
    mass_text = models.ForeignKey(MassText)
class Ticket(PolymorphicModel):
    created_at = models.DateTimeField()
    conversation = models.ForeignKey(Conversation)
    status = models.CharField()
########################################################
open_ticket_subquery = (
    Ticket.objects.filter(conversation=OuterRef("id"))
    .filter(status=Ticket.Status.OPEN)
    .order_by("-created_at")
)
filtered_conversations = (
    self.get_queryset()
    .select_related("student")
    .annotate(
        ticket_created_at=Subquery(
            open_ticket_subquery.values("created_at")[:1]
        )
    )
    .exclude(ticket_created_at=None)
    .annotate(interactions_since_ticket=Count('interactions', filter=Q(interactions__when__gte=ticket_created_at)))
    .filter(interactions_since_ticket=1)
)
This isn't working, because I can't figure out how to use ticket_created_at in the subsequent annotation.

What are the options to get filter on union querysets behavior with Django?

Basically the problem I have: I need an option or alternative approach to filter on annotated fields on union queryset.
I have the following simplified models setup:
class Course(Model):
    groups = ManyToManyField(through=CourseAssignment)
class CourseAssignment(Model):
    course = ForeignKey(Course)
    group = ForeignKey(Group)
    teacher = ForeignKey(Teacher)
class Lesson(Model):
    course = ForeignKey(Course, related_name='lessons')
class AssignmentProgress(Model):
    lesson = ForeignKey(related_name='progresses')
    course_assignment = ForeignKey(CourseAssignment)
    student = ForeignKey(Student)
    group = ForeignKey(Group)
    status = CharField(choices=(
        ('on_check', 'On check'),
        ('complete', 'Complete'),
        ('assigned', 'Assigned'),
    ))
    deadline = DateTimeField()
    checked_date = DateTimeField()
I need to display statistics on assignment progresses, grouped by lesson and by the groups to which courses are assigned. Here is my initial queryset. Note that lessons are repeated in the final result; the difference is in the annotated data:
def annotated_lessons_queryset():
    lessons = None
    for course_assignment in CourseAssignment.objects.all():
        qs = Lesson.objects.filter(
            course=course_assignment.course
        ).annotate(
            completed_progresses=Count(
                'progresses',
                filter=Q(group=course_assignment.group),
                output_field=IntegerField()
            ),
            on_check=Exists(
                AssignmentProgress.objects.filter(
                    lesson=OuterRef('id'), group=course_assignment.group, status='on_check'
                )
            )
        )
        lessons = qs if lessons is None else lessons.union(qs)
    return lessons
I cannot use the | (OR) operator here, because it returns only distinct lesson values.
So far this works, until I try to filter all the lessons by the annotated status on_check:
qs = annotated_lessons_queryset().filter(on_check=True)
Which fails with the error:
raise NotSupportedError(
django.db.utils.NotSupportedError: Calling QuerySet.filter() after union() is not supported.
Please, suggest a workaround or another approach to make this queryset filtered.

I haven't pulled this in and tried it out yet, but as the error message states you have to use union() last. This is a bit complicated as "Lessons can be repeated" in this queryset. So I would suggest using a list comprehension to get what you need out.
qs = annotated_lessons_queryset()
filtered = [lesson for lesson in qs if lesson.on_check]

django - prefetch only the newest record?

I am trying to prefetch only the latest record against the parent record.
My models are as follows:
class LinkTargets(models.Model):
    device_circuit_subnet = models.ForeignKey(DeviceCircuitSubnets, verbose_name="Device", on_delete=models.PROTECT)
    interface_index = models.CharField(max_length=100, verbose_name='Interface index (SNMP)', blank=True, null=True)
    get_bgp = models.BooleanField(default=False, verbose_name="get BGP Data?")
    dashboard = models.BooleanField(default=False, verbose_name="Display on monitoring dashboard?")
class LinkData(models.Model):
    link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
    ...
The below query fails with the error
AttributeError: 'LinkData' object has no attribute '_iterable_class'
Query:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.all().order_by('-id')[0]
        )
    )
I thought about getting LinkData instead and doing a select_related, but I've no idea how to get only 1 record for each link_target_id:
link_data = LinkData.objects.filter(link_target__dashboard=True) \
.select_related('link_target')..?
EDIT:
Using rtindru's solution, the prefetched set seems to be empty. There are 6 records in there currently, at least 1 record for each of the 3 LinkTargets.
>>> link_data[0]
<LinkTargets: LinkTargets object>
>>> link_data[0].linkdata_set.all()
<QuerySet []>
>>>

The reason is that Prefetch expects a Django Queryset as the queryset parameter and you are giving an instance of an object.
Change your query as follows:
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk=LinkData.objects.latest('id').pk)
        )
    )
This does have the unfortunate effect of undoing the purpose of Prefetch to a large degree.
Update
This prefetches exactly one record globally; not the latest LinkData record per LinkTarget.
To prefetch the max LinkData for each LinkTarget you should start at LinkData: you can achieve this as follows:
LinkData.objects.filter(link_target__dashboard=True).values('link_target').annotate(max_id=Max('id'))
This will return a queryset of dictionaries, one per link target, such as {'link_target': 12, 'max_id': 3223}.
You can then use this to return the right set of objects; perhaps filter LinkData based on the values of max_id.
That will look something like this:
from django.db.models import Max, Prefetch
# Highest LinkData id for each link target shown on the dashboard.
latest_link_data_pks = LinkData.objects.filter(
    link_target__dashboard=True
).values('link_target').annotate(max_id=Max('id')).values_list('max_id', flat=True)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_set',
            queryset=LinkData.objects.filter(pk__in=latest_link_data_pks)
        )
    )
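The two steps above (one maximum id per link target, then fetching the rows with those ids) can be sketched outside Django with the standard-library sqlite3 module. The table layout and sample rows are assumptions made for illustration, and the SQL is a hand-written approximation rather than the statement the ORM would emit.

```python
import sqlite3

# Hypothetical linkdata table; column names are assumptions mirroring the model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE linkdata (
        id INTEGER PRIMARY KEY,
        link_target_id INTEGER,
        interface_description TEXT
    );
    INSERT INTO linkdata VALUES
        (1, 10, 'old'), (2, 10, 'new'),
        (3, 11, 'old'), (4, 11, 'new'),
        (5, 12, 'only');
""")

# Step 1: one MAX(id) per link target (the values().annotate(Max('id')) step).
latest_ids = [
    row[0]
    for row in conn.execute(
        "SELECT MAX(id) FROM linkdata GROUP BY link_target_id ORDER BY link_target_id"
    )
]

# Step 2: fetch the full rows for those ids (the pk__in step).
placeholders = ",".join("?" * len(latest_ids))
latest_rows = conn.execute(
    "SELECT id, link_target_id, interface_description FROM linkdata "
    f"WHERE id IN ({placeholders}) ORDER BY id",
    latest_ids,
).fetchall()
print(latest_rows)
```

Each link target ends up with exactly one row, which is what the pk__in prefetch relies on.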

The following works on PostgreSQL. I understand it won't help OP, but it might be useful to somebody else.
from django.db.models import Count, Prefetch
from .models import LinkTargets, LinkData
link_data_qs = LinkData.objects.order_by(
    'link_target__id',
    '-id',
).distinct(
    'link_target__id',
)
qs = LinkTargets.objects.prefetch_related(
    Prefetch(
        'linkdata_set',
        queryset=link_data_qs,
    )
).all()

LinkData.objects.all().order_by('-id')[0] is not a queryset, it is a model object, hence your error.
You could try LinkData.objects.all().order_by('-id')[0:1] which is indeed a QuerySet, but it's not going to work. Given how prefetch_related works, the queryset argument must return a queryset that contains all the LinkData records you need (this is then further filtered, and the items in it joined up with the LinkTarget objects). This queryset only contains one item, so that's no good. (And Django will complain "Cannot filter a query once a slice has been taken" and raise an exception, as it should).
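The point that the prefetch queryset must contain every related row you need can be illustrated with a deliberately simplified, pure-Python model of the matching step. This is only a sketch of the idea, not Django's implementation; the dictionaries and the prefetch() helper are hypothetical stand-ins for the real objects.

```python
from collections import defaultdict

# Hypothetical in-memory rows standing in for LinkTargets and LinkData.
link_targets = [{"id": 1}, {"id": 2}, {"id": 3}]
link_data = [
    {"id": 1, "link_target_id": 1},
    {"id": 2, "link_target_id": 2},
    {"id": 3, "link_target_id": 3},
    {"id": 4, "link_target_id": 1},
    {"id": 5, "link_target_id": 2},
    {"id": 6, "link_target_id": 3},
]


def prefetch(parents, candidate_rows):
    """Attach to each parent the ids of the candidate rows that point at it."""
    parent_ids = {p["id"] for p in parents}
    by_parent = defaultdict(list)
    for row in candidate_rows:
        if row["link_target_id"] in parent_ids:
            by_parent[row["link_target_id"]].append(row["id"])
    return {p["id"]: by_parent[p["id"]] for p in parents}


# Candidate set = the single globally newest row: only one parent gets data.
newest_only = sorted(link_data, key=lambda r: r["id"], reverse=True)[:1]
global_result = prefetch(link_targets, newest_only)
print(global_result)

# Candidate set = newest row per parent: every parent gets exactly one row.
newest_per_parent = {}
for row in link_data:
    current = newest_per_parent.get(row["link_target_id"])
    if current is None or row["id"] > current["id"]:
        newest_per_parent[row["link_target_id"]] = row
per_parent_result = prefetch(link_targets, list(newest_per_parent.values()))
print(per_parent_result)
```

With a single globally newest row as the candidate set, two of the three parents end up with an empty list, which matches the empty linkdata_set seen in the question's edit.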
Let's back up. Essentially you are asking an aggregation/annotation question - for each LinkTarget, you want to know the most recent LinkData object, or the 'max' of an 'id' column. The easiest way is to just annotate with the id, and then do a separate query to get all the objects.
So, it would look like this (I've checked with a similar model in my project, so it should work, but the code below may have some typos):
linktargets = (LinkTargets.objects
               .filter(dashboard=True)
               # Reverse lookups in queries use the model name, not the _set accessor.
               .annotate(most_recent_linkdata_id=Max('linkdata__id')))
# Now, if we need them, let's collect and get the actual objects
linkdata_ids = [t.most_recent_linkdata_id for t in linktargets]
linkdata_objects = LinkData.objects.filter(id__in=linkdata_ids)
# And we can decorate the LinkTarget objects as well if we want:
linkdata_d = {l.id: l for l in linkdata_objects}
for t in linktargets:
    if t.most_recent_linkdata_id is not None:
        t.most_recent_linkdata = linkdata_d[t.most_recent_linkdata_id]
I have deliberately not made this into a prefetch that masks linkdata_set, because the result is that you have objects that lie to you - the linkdata_set attribute is now missing results. Do you really want to be bitten by that somewhere down the line? Best to make a new attribute that has just the thing you want.

Tricky, but it seems to work:
class ForeignKeyAsOneToOneField(models.OneToOneField):
    def __init__(self, to, on_delete, to_field=None, **kwargs):
        super().__init__(to, on_delete, to_field=to_field, **kwargs)
        self._unique = False
class LinkData(models.Model):
    # link_target = models.ForeignKey(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT)
    link_target = ForeignKeyAsOneToOneField(LinkTargets, verbose_name="Link Target", on_delete=models.PROTECT, related_name='linkdata_helper')
    interface_description = models.CharField(max_length=200, verbose_name='Interface Description', blank=True, null=True)
link_data = LinkTargets.objects.filter(dashboard=True) \
    .prefetch_related(
        Prefetch(
            'linkdata_helper',
            queryset=LinkData.objects.all().order_by('-id'),
            # Must be passed by keyword: a positional argument cannot follow queryset=...
            to_attr='linkdata'
        )
    )
# Now you can access linkdata:
link_data[0].linkdata
Of course, with this approach you can't use linkdata_helper to get related objects.

This is not a direct answer to your question, but it solves the same problem. It is possible to annotate the newest object with a subquery, which I think is clearer. You also don't have to do things like Max("id") to limit the prefetch query.
It makes use of django.db.models.functions.JSONObject (added in Django 3.2) to combine multiple fields:
MainModel.objects.annotate(
    last_object=RelatedModel.objects.filter(mainmodel=OuterRef("pk"))
    .order_by("-date_created")
    .values(
        data=JSONObject(
            id="id", body="body", date_created="date_created"
        )
    )[:1]
)

How to run a custom aggregation on a queryset?

I have a model called LeaveEntry:
class LeaveEntry(models.Model):
    date = models.DateField(auto_now=False, auto_now_add=False)
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.PROTECT,
        limit_choices_to={'is_active': True},
        unique_for_date='date'
    )
    half_day = models.BooleanField(default=False)
I get a set of LeaveEntries with the filter:
LeaveEntry.objects.filter(
    leave_request=self.unapproved_leave
).count()
I would like to get an aggregation called total days, so where a LeaveEntry has half_day=True then it is half a day so 0.5.
What I was thinking based on the django aggregations docs was annotating the days like this:
days = LeaveEntry.objects.annotate(days=<If this half_day is True: 0.5 else 1>)

You can use Django's conditional expressions Case and When (Django 1.8+ only):
Keeping the order of filter() and annotate() in mind, you can total up the number of days for unapproved leaves like so:
from django.db.models import Case, FloatField, Sum, Value, When
# ...
LeaveEntry.objects.filter(
    leave_request=self.unapproved_leave  # not sure what self relates to
).annotate(
    # Each entry is worth half a day or a full day.
    days=Case(
        When(half_day=True, then=Value(0.5)),
        default=Value(1.0),
        output_field=FloatField()
    )
).aggregate(
    # Sum rather than Count: Count would only count rows, ignoring 0.5 vs 1.
    total_days=Sum('days')
)
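For reference, the conditional total corresponds roughly to a SUM over a CASE expression in SQL. The sketch below uses the standard-library sqlite3 module with a hypothetical leaveentry table (the table name and rows are assumptions), and contrasts the plain row count with the weighted total.

```python
import sqlite3

# Hypothetical leaveentry table: two full days and one half day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leaveentry (id INTEGER PRIMARY KEY, date TEXT, half_day INTEGER);
    INSERT INTO leaveentry VALUES
        (1, '2020-03-02', 0),
        (2, '2020-03-03', 0),
        (3, '2020-03-04', 1);
""")

# COUNT(*) counts rows; SUM(CASE ...) weights each row as 0.5 or 1.0 day.
entry_count, total_days = conn.execute("""
    SELECT COUNT(*),
           SUM(CASE WHEN half_day THEN 0.5 ELSE 1.0 END)
    FROM leaveentry
""").fetchone()
print(entry_count, total_days)
```

The row count and the day total differ as soon as one entry is a half day, which is why a sum is needed for "total days".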

Django: annotate Count with filter

I have "post" objects and a "post like" object with how many likes a post has received by which user:
class Post(models.Model):
    text = models.CharField(max_length=500, default='')
    user = models.ForeignKey(User)
class PostLike(models.Model):
    user = models.ForeignKey(User)
    post = models.ForeignKey(Post)
I can select how many likes a post has received like this:
Post.objects.all().annotate(likes=Count('postlike'))
This roughly translates to:
SELECT p.*,
Count(l.id) AS likes
FROM post p, postlike l
WHERE p.id = l.post_id
GROUP BY (p.id)
It works. Now, how I can filter the Count aggregation by the current user? I'd like to retrieve not all the likes of the post, but all the likes by the logged user. The resulting SQL should be like:
SELECT p.*,
(SELECT COUNT(*) FROM postlike WHERE postlike.user_id = 1 AND postlike.post_id = p.id) AS likes
FROM post p, postlike l
WHERE p.id = l.post_id
GROUP BY (p.id)

Did you know that Count has a filter argument (available since Django 2.0)?
from django.db.models import Count, Q
Post.objects.annotate(
    likes=Count('postlike', filter=Q(postlike__user=logged_in_user))
)

It's not exactly as clean, but you could use Case/When...
posts = Post.objects.all().annotate(likes=models.Count(
    models.Case(
        # No default here: non-matching likes become NULL, which Count skips.
        models.When(postlike__user_id=user.id, then=1),
        output_field=models.IntegerField(),
    )
))
And of course, you can always drop down to .extra() or even raw SQL when there's something you can't express via the Django ORM.
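One detail worth checking when counting over a Case expression: SQL COUNT counts every non-NULL value, so a branch that yields 0 is still counted. The sketch below shows the difference with the standard-library sqlite3 module; the tables, rows, and the user id 1 are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: post 1 is liked by users 1 and 2, post 2 only by user 2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE postlike (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
    INSERT INTO post VALUES (1, 'first'), (2, 'second');
    INSERT INTO postlike VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

# Three ways of "counting likes by user 1" per post.
rows = conn.execute("""
    SELECT p.id,
           COUNT(CASE WHEN l.user_id = 1 THEN 1 ELSE 0 END) AS count_with_default,
           COUNT(CASE WHEN l.user_id = 1 THEN 1 END)        AS count_without_default,
           SUM(CASE WHEN l.user_id = 1 THEN 1 ELSE 0 END)   AS sum_with_default
    FROM post AS p
    LEFT JOIN postlike AS l ON l.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(rows)
```

With a default of 0, COUNT reports all likes of the post; leaving the default out (so non-matches are NULL) or switching to SUM gives the per-user number.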

Try to add filter first:
Post.objects.filter(postlike__user=request.user).annotate(likes=Count('postlike'))
From the docs:
The filter precedes the annotation, so the filter constrains the objects considered when calculating the annotation.
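A consequence of filtering first is that posts without any like from that user drop out of the result altogether, because the filter acts on the joined rows. A minimal sqlite3 sketch of the two behaviours follows; the tables, rows, and user id 1 are assumptions, and the SQL is a hand-written approximation of the two approaches.

```python
import sqlite3

# Hypothetical data: post 1 liked by users 1 and 2, post 2 by user 2, post 3 by nobody.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE postlike (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
    INSERT INTO post VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO postlike VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

# Filter first: only posts that have at least one like by user 1 survive.
filtered_first = conn.execute("""
    SELECT p.id, COUNT(l.id)
    FROM post AS p
    JOIN postlike AS l ON l.post_id = p.id
    WHERE l.user_id = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# Conditional count: every post is kept, with 0 where user 1 has no like.
conditional_count = conn.execute("""
    SELECT p.id, COUNT(CASE WHEN l.user_id = 1 THEN 1 END)
    FROM post AS p
    LEFT JOIN postlike AS l ON l.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(filtered_first)
print(conditional_count)
```

Which of the two is wanted depends on whether posts with zero likes by the user should still be listed.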
