How to run a custom aggregation on a queryset?

I have a model called LeaveEntry:
class LeaveEntry(models.Model):
    date = models.DateField(auto_now=False, auto_now_add=False)
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.PROTECT,
        limit_choices_to={'is_active': True},
        unique_for_date='date'
    )
    half_day = models.BooleanField(default=False)
I get a set of LeaveEntries with the filter:
LeaveEntry.objects.filter(
    leave_request=self.unapproved_leave
).count()
I would like to compute an aggregate called total days, where a LeaveEntry with half_day=True counts as half a day (0.5) and any other entry counts as a full day (1).
Based on the Django aggregation docs, I was thinking of annotating the days like this:
days = LeaveEntry.objects.annotate(days=<If this half_day is True: 0.5 else 1>)

You can use Django's conditional expressions Case and When (available since Django 1.8).
Keeping the order of filter() and annotate() in mind, you can total the number of days for the unapproved leave like so:
from django.db.models import Case, FloatField, Sum, When
# ...
LeaveEntry.objects.filter(
    leave_request=self.unapproved_leave  # self comes from the asker's surrounding code
).annotate(
    # Weight each entry: 0.5 for a half day, 1 for a full day
    days=Case(
        When(half_day=True, then=0.5),
        When(half_day=False, then=1.0),
        output_field=FloatField()
    )
).aggregate(
    # Sum the weights; Count() would only count rows
    total_days=Sum('days')
)
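As a sanity check on the arithmetic the aggregation is meant to perform, here is a minimal plain-Python sketch. The `Entry` tuple and the sample rows are hypothetical stand-ins for LeaveEntry rows, not part of the original models; the sketch only illustrates the 0.5 / 1 weighting and does not touch the ORM.

```python
from collections import namedtuple

# Hypothetical stand-in for a LeaveEntry row; only half_day matters here.
Entry = namedtuple("Entry", ["date", "half_day"])


def total_days(entries):
    """Add 0.5 for each half-day entry and 1 for each full-day entry."""
    return sum(0.5 if entry.half_day else 1.0 for entry in entries)


sample = [
    Entry("2020-01-06", False),
    Entry("2020-01-07", True),
    Entry("2020-01-08", False),
]
print(total_days(sample))
```

Comparing the database result against a calculation like this on a handful of rows is a cheap way to confirm that the query sums weights rather than counting rows.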

Related

Can I use an annotated subquery parameter later on in the same query?

I have a Django queryset that ideally does some annotation and filtering with 3 object classes. I have Conversations, Tickets, and Interactions.
My desired output is Conversations that have 1. an OPEN ticket, and 2. exactly ONE interaction, of type mass_text, since the ticket's created_at date.
I am trying to annotate the conversation query with ticket_created_at & filter out Nones, then somehow use that ticket_created_at parameter in a subsequent annotation/subquery to get count of interactions since the ticket_created_at date. Is this doable?
class Interaction(PolymorphicModel):
    when = models.DateTimeField()
    conversation = models.ForeignKey(Conversation)
    mass_text = models.ForeignKey(MassText)
class Ticket(PolymorphicModel):
    created_at = models.DateTimeField()
    conversation = models.ForeignKey(Conversation)
    status = models.CharField()
########################################################
open_ticket_subquery = (
    Ticket.objects.filter(conversation=OuterRef("id"))
    .filter(status=Ticket.Status.OPEN)
    .order_by("-created_at")
)
filtered_conversations = (
    self.get_queryset()
    .select_related("student")
    .annotate(
        ticket_created_at=Subquery(
            open_ticket_subquery.values("created_at")[:1]
        )
    )
    .exclude(ticket_created_at=None)
    .annotate(
        interactions_since_ticket=Count(
            'interactions',
            filter=Q(interactions__when__gte=ticket_created_at)
        )
    )
    .filter(interactions_since_ticket=1)
)
This isn't working, because I can't figure out how to use ticket_created_at in the subsequent annotation.

How to limit top N of each group in Django ORM by using Postgres Window functions or Lateral Joins?

I have following Post, Category & PostScore Model.
class Post(models.Model):
    category = models.ForeignKey('Category', on_delete=models.SET_NULL, related_name='category_posts', limit_choices_to={'parent_category': None}, blank=True, null=True)
    status = models.CharField(max_length=100, choices=STATUS_CHOICES, default='draft')
    deleted_at = models.DateTimeField(null=True, blank=True)
    ...
    ...
class Category(models.Model):
    title = models.CharField(max_length=100)
    parent_category = models.ForeignKey('self', on_delete=models.SET_NULL,
                                        related_name='sub_categories', null=True, blank=True,
                                        limit_choices_to={'parent_category': None})
    ...
    ...
class PostScore(models.Model):
    post = models.OneToOneField(Post, on_delete=models.CASCADE, related_name='post_score')
    total_score = models.DecimalField(max_digits=8, decimal_places=5, default=0)
    ...
    ...
What I want is a query that returns N posts (Post) for each distinct category (Category), sorted by post score (the total_score column in the PostScore model) in descending order, so that I get at most N records per category, namely those with the highest post scores.
I can achieve this with the following raw query, which gives me the top 10 highest-scoring posts of each category:
SELECT *
FROM (
    SELECT *,
        RANK() OVER (PARTITION BY "post"."category_id"
                     ORDER BY "postscore"."total_score" DESC) AS "rank"
    FROM "post"
    LEFT OUTER JOIN "postscore"
        ON ("post"."id" = "postscore"."post_id")
    WHERE ("post"."deleted_at" IS NULL AND "post"."status" = 'accepted')
    ORDER BY "postscore"."total_score" DESC
) final_posts
WHERE rank <= 10
What I have achieved so far using the Django ORM:
>>> from django.db.models.expressions import Window
>>> from django.db.models.functions import Rank
>>> from django.db.models import F
>>> posts = Post.objects.annotate(
...     rank=Window(
...         expression=Rank(),
...         order_by=F('post_score__total_score').desc(),
...         partition_by=[F('category_id')],
...     )
... ).filter(
...     status='accepted', deleted_at__isnull=True
... ).order_by('-post_score__total_score')
which roughly evaluates to
>>> print(posts.query)
SELECT *,
    RANK() OVER (PARTITION BY "post"."category_id"
                 ORDER BY "postscore"."total_score" DESC) AS "rank"
FROM "post"
LEFT OUTER JOIN "postscore"
    ON ("post"."id" = "postscore"."post_id")
WHERE ("post"."deleted_at" IS NULL AND "post"."status" = 'accepted')
ORDER BY "postscore"."total_score" DESC
So basically what is missing is that I need to limit each group's (i.e. each category's) results by using the "rank" alias.
I would love to know how this can be done.
I have seen one answer, suggested by Alexandr on this question: one way of achieving this is to use Subquery and the in operator. It satisfies the above condition and outputs the right results, but the query is very slow.
Anyway, this would be the query if I go by Alexandr's suggestion:
>>> from django.db.models import OuterRef, Subquery
>>> q = Post.objects.filter(
...     status='accepted', deleted_at__isnull=True, category=OuterRef('category')
... ).order_by('-post_score__total_score')[:10]
>>> posts = Post.objects.filter(id__in=Subquery(q.values('id')))
So I am more keen on completing the above raw query (which is almost done and only misses the limit part) by using window functions in the ORM. I also think this can be achieved with a lateral join, so answers in that direction are welcome too.

I have a workaround using raw(), but it returns a django.db.models.query.RawQuerySet, which does not support methods like filter(), exclude(), etc.
>>> posts = Post.objects.annotate(
...     rank=Window(
...         expression=Rank(),
...         order_by=F('post_score__total_score').desc(),
...         partition_by=[F('category_id')],
...     )
... ).filter(status='accepted', deleted_at__isnull=True)
>>> sql, params = posts.query.sql_with_params()
>>> posts = Post.objects.raw(
...     'SELECT * FROM ({}) final_posts WHERE rank <= %s'.format(sql),
...     [*params, 10],
... )
I'll wait for answers that provide a solution returning a QuerySet object instead; otherwise I will have to do it this way.
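The wrap-and-filter pattern behind the raw query (rank inside a subquery, then keep rank <= N in the outer query) can be tried in isolation. The following is a minimal sketch using Python's built-in sqlite3 module rather than Django; the two tiny tables, their rows, and the `post_rank` alias are hypothetical and only mimic the shape of the post/postscore tables. It assumes an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical miniature of the post/postscore tables, just enough for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE postscore (post_id INTEGER, total_score REAL);
    INSERT INTO post VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
    INSERT INTO postscore VALUES (1, 9.0), (2, 7.5), (3, 3.0), (4, 8.0), (5, 6.0);
    """
)


def top_n_per_category(connection, n):
    """Rank posts inside each category, then keep ranks <= n in an outer query."""
    sql = """
        SELECT id, category_id, total_score
        FROM (
            SELECT post.id, post.category_id, postscore.total_score,
                   RANK() OVER (
                       PARTITION BY post.category_id
                       ORDER BY postscore.total_score DESC
                   ) AS post_rank
            FROM post
            LEFT OUTER JOIN postscore ON post.id = postscore.post_id
        ) AS final_posts
        WHERE post_rank <= ?
        ORDER BY category_id, total_score DESC
    """
    return connection.execute(sql, (n,)).fetchall()


top_two = top_n_per_category(conn, 2)
print(top_two)
```

The outer query is needed because a window alias cannot be referenced in the WHERE clause of the same SELECT. As far as I know, Django 4.2 and later also accept filtering directly on a window annotation (for example .filter(rank__lte=10)), which would keep the result a regular QuerySet; that is worth checking against the release notes for the version in use.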

Django get values for Max of grouped data

After much trial and error and checking similar questions, I think it is worth asking with all the details.
Here's a simple model. Let's say we have a Book model and a Reserve model that holds reservation data for each Book.
class Book(models.Model):
    title = models.CharField(
        'Book Title',
        max_length=50
    )
    name = models.CharField(
        max_length=250
    )
class Reserve(models.Model):
    book = models.ForeignKey(
        Book,
        on_delete=models.CASCADE
    )
    reserve_date = models.DateTimeField()
    status = models.CharField(
        'Reservation Status',
        max_length=5,
        choices=[
            ('R', 'Reserved'),
            ('F', 'Free')
        ]
    )
I added a book and two reservation records to the model:
from django.utils import timezone
book_inst = Book(title='Book1')
book_inst.save()
reserve_inst = Reserve(book=book_inst, reserve_date=timezone.now(), status='R')
reserve_inst.save()
reserve_inst = Reserve(book=book_inst, reserve_date=timezone.now(), status='F')
reserve_inst.save()
My goal is to get data for the last reservation for each book. Based on other questions, I get it to this point:
from django.db.models import F, Q, Max
reserve_qs = Reserve.objects.values(
    'book__title'
).annotate(
    last_action=Max('reserve_date')
)
reserve_qs now has the last action for each Book, but when I add .values() it ignores the grouping and returns all the records.
I also tried filtering with F:
Reserve.objects.values(
    'book__title'
).annotate(
    last_action=Max('reserve_date')
).values(
).filter(
    reserve_date=F('last_action')
)
I'm using Django 3 and SQLite.

By using another filter, you will break the GROUP BY mechanism. You can however simply obtain the last reservation with:
from django.db.models import F, Max
Reserve.objects.filter(
    book__title='Book1'
).annotate(
    book_title=F('book__title'),
    last_action=Max('book__reserve__reserve_date')
).filter(
    reserve_date=F('last_action')
)
or for all books:
from django.db.models import F, Max
qs = Reserve.objects.annotate(
    book_title=F('book__title'),
    last_action=Max('book__reserve__reserve_date')
).filter(
    reserve_date=F('last_action')
).select_related('book')
Here Max('book__reserve__reserve_date') follows the relation from each Reserve to its Book and back to all Reserves of that Book, so every row is annotated with the latest reservation date of its book. Because the join goes back to the same table, the grouping stays per Reserve row and remains correct.
This retrieves the last reservation for every Book that remains after filtering. Normally that is one per Book, but if a Book has several Reserves with exactly the same latest timestamp, all of them will be returned.
So we can, for example, print the reservations with:
for q in qs:
    print(
        'Last reservation for {} is {} with status {}'.format(
            q.book.title,
            q.reserve_date,
            q.status
        )
    )
For a single book, however, it is better to simply fetch the Book object and return the .latest(..) reservation:
Book.objects.get(title='Book1').reserve_set.latest('reserve_date')
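The "latest per book, with ties" behaviour described in this answer can be illustrated without a database. This is a minimal plain-Python sketch; the `Row` tuple and the sample data are hypothetical stand-ins for Reserve rows joined with their book title, and the function only mimics what filtering on reserve_date == max-per-book selects.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a Reserve row joined with its book title.
Row = namedtuple("Row", ["book_title", "reserve_date", "status"])


def latest_per_book(rows):
    """Keep every row whose reserve_date equals the maximum for its book."""
    latest = {}
    for row in rows:
        current = latest.get(row.book_title)
        if current is None or row.reserve_date > current:
            latest[row.book_title] = row.reserve_date
    return [row for row in rows if row.reserve_date == latest[row.book_title]]


rows = [
    Row("Book1", datetime(2020, 1, 1, 9, 0), "R"),
    Row("Book1", datetime(2020, 1, 2, 9, 0), "F"),
    Row("Book2", datetime(2020, 1, 3, 9, 0), "R"),
    Row("Book2", datetime(2020, 1, 3, 9, 0), "F"),  # tie on the latest timestamp
]
for row in latest_per_book(rows):
    print(row.book_title, row.reserve_date, row.status)
```

Book2 contributes two rows here because both share the latest timestamp; if exactly one row per book is required, a tie-breaker such as the primary key would have to be added.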

book_obj = Book.objects.get(title='Book1')
reserve_qs = book_obj.reserve_set.all()
This returns all the Reserves that reference this book.
You can get the latest object by ordering the queryset and calling .first() or .last().

Python Django get distinct queryset by month from a DateField

class MyModel(models.Model):
    TRANSACTION_TYPE_CHOICES = (
        ('p', 'P'),
        ('c', 'C'),
    )
    status = models.CharField(max_length=50, choices=TRANSACTION_TYPE_CHOICES, default='c')
    user = models.ForeignKey(User, db_index=True, on_delete=models.CASCADE, related_name='user_wallet')
    date = models.DateField(auto_now=True)
    amount = models.FloatField(null=True, blank=True)
    def __unicode__(self):
        return str(self.id)
I am new to Python and Django and have a little knowledge of Django Rest Framework.
I have a model like the one above, and I want to filter the date field by month and get a distinct queryset by month. Is there a built-in way to do this?
Thanks in advance

You can use TruncMonth with annotations:
from django.db.models.functions import TruncMonth
MyModel.objects.annotate(
    month=TruncMonth('date')
).filter(month=YOURVALUE).values('month').distinct()
Or, if you only need to filter the date by month with distinct, you can use the __month lookup:
MyModel.objects.filter(date__month=YOURVALUE).distinct()
Older Django
You can use extra(); an example for Postgres:
MyModel.objects.extra(
    select={'month': "EXTRACT(month FROM date)"},
    where=["EXTRACT(month FROM date)=%s"],
    params=[5]
    # change 5 to your value
).values('month').distinct()

This may help you (note that distinct() with field names only works on PostgreSQL):
MyModel.objects.values('col1','col2',...,'date').distinct('date')
Or try this:
from django.db.models import Count
from django.db.models.functions import TruncMonth
(
    MyModel.objects
    .annotate(month=TruncMonth('date'))  # Truncate to month and add to select list
    .values('month')  # Group By month
    .annotate(c=Count('id'))  # Select the count of the grouping
    .values('month', 'c')  # (might be redundant, haven't tested) select month and count
)
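To make the grouping concrete, here is a small plain-Python sketch of what truncating to the month and counting per group amounts to. The sample dates are hypothetical; the sketch mirrors the idea of TruncMonth (each date is mapped to the first day of its month) rather than calling Django.

```python
from collections import Counter
from datetime import date


def count_by_month(dates):
    """Map each date to the first day of its month and count per month."""
    return Counter(d.replace(day=1) for d in dates)


sample_dates = [
    date(2019, 5, 3),
    date(2019, 5, 20),
    date(2019, 6, 1),
    date(2020, 5, 9),
]
for month, count in sorted(count_by_month(sample_dates).items()):
    print(month, count)
```

Note that truncation keeps the year, so May 2019 and May 2020 end up in different groups, whereas a date__month=5 filter matches May of every year.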

Django group by hour

I have the following model in Django.
class StoreVideoEventSummary(models.Model):
    Customer = models.ForeignKey(GlobalCustomerDirectory, null=True, db_column='CustomerID', blank=True, db_index=True)
    Store = models.ForeignKey(Store, null=True, db_column='StoreID', blank=True, related_name="VideoEventSummary")
    Timestamp = models.DateTimeField(null=True, blank=True, db_index=True)
    PeopleCount = models.IntegerField(null=True, blank=True)
I would like to find out the number of people entering the store each hour.
To achieve this, I'm trying to group the rows by the hour on Timestamp and sum the PeopleCount column.
store_count_events = StoreVideoEventSummary.objects.filter(
    Timestamp__range=(start_time, end_time),
    Customer__id=customer_id,
    Store__StoreName=store
).order_by("Timestamp").extra({
    "hour": "date_part('hour', \"Timestamp\")"
}).annotate(TotalPeople=Sum("PeopleCount"))
This doesn't seem to group the results by the hour; it merely adds a new column, TotalPeople, which has the same value as PeopleCount, to each row in the queryset.

Just break it into two steps: fetch the rows ordered by Timestamp, then group them in Python.
import itertools
# ...
def date_hour(timestamp):
    # Timestamp is a DateTimeField, so the value is already a datetime
    return timestamp.strftime("%x %H")
objs = StoreVideoEventSummary.objects.filter(
    Timestamp__range=(start_time, end_time),
    Customer__id=customer_id,
    Store__StoreName=store
).order_by("Timestamp")
groups = itertools.groupby(objs, lambda x: date_hour(x.Timestamp))
# groups is a lazy iterator, so nothing has been traversed yet
for group, matches in groups:  # the rows are traversed here
    # add up PeopleCount per group, treating NULL as 0
    print(group, "TTL:", sum(m.PeopleCount or 0 for m in matches))
This allows you to group by several distinct criteria.
If you just want the hour regardless of date, just change date_hour:
def date_hour(timestamp):
    return timestamp.strftime("%H")
If you want to group by day of the week, use:
def date_day_of_week(timestamp):
    return timestamp.strftime("%w %H")
And update itertools.groupby's lambda to use date_day_of_week.
Because itertools.groupby only merges adjacent rows, sort the rows by the new key first whenever that key no longer follows Timestamp order.
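A self-contained sketch of this two-step approach, assuming Timestamp values are already datetime objects and that the goal is to total PeopleCount per hour. The `Event` tuple and the sample rows are hypothetical stand-ins for StoreVideoEventSummary rows, so the snippet runs without a database.

```python
import itertools
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for StoreVideoEventSummary rows.
Event = namedtuple("Event", ["Timestamp", "PeopleCount"])


def people_per_hour(events):
    """Sum PeopleCount per (date, hour) bucket.

    Sorts by Timestamp first because groupby only merges adjacent items.
    """
    def bucket_key(event):
        return event.Timestamp.strftime("%Y-%m-%d %H")

    ordered = sorted(events, key=lambda event: event.Timestamp)
    return {
        bucket: sum(event.PeopleCount or 0 for event in group)  # None counts as 0
        for bucket, group in itertools.groupby(ordered, key=bucket_key)
    }


events = [
    Event(datetime(2017, 3, 1, 10, 50), 4),
    Event(datetime(2017, 3, 1, 9, 5), 3),
    Event(datetime(2017, 3, 1, 9, 40), 2),
    Event(datetime(2017, 3, 1, 10, 15), None),
]
print(people_per_hour(events))
```

Grouping in Python pulls every matching row into memory, so for large ranges a database-side GROUP BY is usually the cheaper choice.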

Building off your original code, could you try:
store_count_events = StoreVideoEventSummary.objects.filter(
    Timestamp__range=(start_time, end_time),
    Customer__id=customer_id,
    Store__StoreName=store
).extra({
    "hour": "date_part('hour', \"Timestamp\")"
}).values(
    "hour"  # QuerySet has no group_by(); values() followed by annotate() does the grouping
).annotate(TotalPeople=Sum("PeopleCount"))

I know I'm late here, but taking cues from the docs, https://docs.djangoproject.com/en/1.11/ref/models/querysets/#django.db.models.query.QuerySet.extra
the query below should work for you.
store_count_events = StoreVideoEventSummary.objects.filter(
    Timestamp__range=(start_time, end_time),
    Customer__id=customer_id,
    Store__StoreName=store
).extra(
    select={
        'hour': 'hour(Timestamp)'  # hour() is MySQL syntax
    }
).values(
    'hour'
).annotate(
    TotalPeople=Sum('PeopleCount')
).order_by(
    'hour'  # ordering by Timestamp would add it to the GROUP BY
)
