Month on month values in django query - python

I have an annotation like this, which displays the month-wise count of a field:
bar = Foo.objects.annotate(
    item_count=Count('item')
).order_by('-item_month', '-item_year')
and this produces output like this:
[image: html render]
I would like to show, for each month except the first, the change in item_count compared with the previous month's item_count. How could I achieve this using annotations, or do I need to use pandas?
Thanks
Edit:
In SQL this becomes easy with the LAG function, for example:
SELECT item_month, item_year, COUNT(item),
       LAG(COUNT(item)) OVER (ORDER BY item_month, item_year)
FROM Foo
GROUP BY item_month, item_year
(PS: item_month and item_year are date fields)
Does the Django ORM have something similar to LAG in SQL?

For this type of query you need to use window functions in the Django ORM.
For LAG, see the Lag database function:
https://docs.djangoproject.com/en/4.0/ref/models/database-functions/#lag
A working ORM query looks like this:
# models.py
class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)
This is just a dummy table to show the use of the Lag and Window functions in Django, because the Django docs do not include an example for Lag.
from django.db.models import Count, F, Window
from django.db.models.functions import Lag, ExtractYear

qs = Review.objects.annotate(num_likes=Count('likereview_review')).annotate(
    item_count_lag=Window(
        expression=Lag(expression=F('num_likes')),
        order_by=ExtractYear('date_added').asc(),
    )
).order_by('-num_likes').distinct()
print(qs.query)
The generated SQL looks like this:
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY EXTRACT(YEAR FROM `temp_view_review`.`date_added`) ASC) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC
If you don't want to order by the extracted year of the date, you can order by an F expression instead:
qs = Review.objects.annotate(num_likes=Count('likereview_review')).annotate(
    item_count_lag=Window(expression=Lag(expression=F('num_likes')), order_by=[F('date_added')])
).order_by('-num_likes').distinct()
print(qs.query)
The generated SQL for this query:
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY `temp_view_review`.`date_added`) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC
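To make the month-on-month change from the question concrete, here is a minimal sketch that runs the LAG idea against an in-memory SQLite database using only the standard library. The table name `foo`, its integer `item_year`/`item_month` columns and the sample rows are assumptions for illustration, and it needs an SQLite build with window function support (3.25 or newer). It is not the ORM query itself, only the SQL technique the Window/Lag annotation maps to.

```python
import sqlite3

# Hypothetical table and sample rows, used only to illustrate the technique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (item TEXT, item_year INTEGER, item_month INTEGER)")
rows = (
    [("a", 2021, 11)] * 3    # 3 items in November 2021
    + [("b", 2021, 12)] * 5  # 5 items in December 2021
    + [("c", 2022, 1)] * 4   # 4 items in January 2022
)
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)

# Step 1 (CTE): count items per month.
# Step 2: subtract the previous month's count, fetched with LAG.
# Ordering by year first keeps December 2021 ahead of January 2022.
sql = """
    WITH monthly AS (
        SELECT item_year, item_month, COUNT(item) AS item_count
        FROM foo
        GROUP BY item_year, item_month
    )
    SELECT item_year,
           item_month,
           item_count,
           item_count - LAG(item_count) OVER (ORDER BY item_year, item_month) AS change
    FROM monthly
    ORDER BY item_year, item_month
"""
monthly_change = conn.execute(sql).fetchall()
for row in monthly_change:
    print(row)
```

The first month has no predecessor, so LAG yields NULL and the change comes back as None, which matches the "except the first month" requirement in the question.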

Related

Is Nested aggregate queries possible with Django queryset

I want to calculate the monthly based profit with the following models using django queryset methods. The tricky point is that I have a freightselloverride field in the order table. It overrides the sum of freightsell in the orderItem table. An order may contain multiple orderItems. That's why I have to calculate order based profit first and then calculate the monthly based profit. Because if there is any order level freightselloverride data I should take this into consideration.
Below I gave a try using annotate method but could not resolve how to reach this SQL. Does Django allow this kind of nested aggregate queries?
select sales_month,
       sum(sumSellPrice - sumNetPrice - sumFreighNet + coalesce(FreightSellOverride, sumFreightSell)) as profit
from
(
    select CAST(DATE_FORMAT(b.CreateDate, '%Y-%m-01 00:00:00') AS DATETIME) AS `sales_month`,
           a.order_id, b.FreightSellOverride,
           sum(SellPrice) as sumSellPrice, sum(NetPrice) as sumNetPrice,
           sum(FreightNet) as sumFreighNet, sum(FreightSell) as sumFreightSell
    from OrderItem a
    inner join Order b
        on a.order_id = b.id
    group by 1, 2, 3
) c
group by sales_month
I tried this
result = (
    OrderItem.objects
    .annotate(sales_month=TruncMonth('order__CreateDate'))
    .values('sales_month', 'order', 'order__FreightSellOverride')
    .annotate(
        sumSellPrice=Sum('SellPrice'), sumNetPrice=Sum('NetPrice'),
        sumFreighNet=Sum('FreightNet'), sumFreightSell=Sum('FreightSell'),
    )
    .values('sales_month')
    .annotate(profit=Sum(
        F('sumSellPrice') - F('sumNetPrice') - F('sumFreighNet')
        + Coalesce('order__FreightSellOverride', 'sumFreightSell')
    ))
)
but get this error
Exception Type: FieldError
Exception Value:
Cannot compute Sum('<CombinedExpression: F(sumSellPrice) - F(sumNetPrice) - F(sumFreighNet) + Coalesce(F(ProjectId__FreightSellOverride), F(sumFreightSell))>'): '<CombinedExpression: F(sumSellPrice) - F(sumNetPrice) - F(sumFreighNet) + Coalesce(F(ProjectId__FreightSellOverride), F(sumFreightSell))>' is an aggregate
from django.db import models
from django.db.models import F, Count, Sum
from django.db.models.functions import TruncMonth, Coalesce

class Order(models.Model):
    CreateDate = models.DateTimeField(verbose_name="Create Date")
    FreightSellOverride = models.FloatField()

class OrderItem(models.Model):
    SellPrice = models.DecimalField(max_digits=10, decimal_places=2)
    FreightSell = models.DecimalField(max_digits=10, decimal_places=2)
    NetPrice = models.DecimalField(max_digits=10, decimal_places=2)
    FreightNet = models.DecimalField(max_digits=10, decimal_places=2)
    order = models.ForeignKey(Order, on_delete=models.DO_NOTHING, related_name="Item")
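The two-stage calculation described in the question (per-order sums first, then per-month profit with the order-level override) can be spelled out in plain Python. This is only a sketch of the arithmetic, not a Django queryset; the sample data, the month strings and the assumption that a missing override is represented as None are all made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical orders: order_id -> (sales_month, freight_sell_override or None)
orders = {
    1: ("2022-01", None),
    2: ("2022-01", Decimal("5.00")),
    3: ("2022-02", None),
}
# Hypothetical items: (order_id, sell_price, net_price, freight_net, freight_sell)
items = [
    (1, Decimal("100"), Decimal("60"), Decimal("4"), Decimal("6")),
    (1, Decimal("50"), Decimal("30"), Decimal("2"), Decimal("3")),
    (2, Decimal("80"), Decimal("50"), Decimal("3"), Decimal("9")),
    (3, Decimal("40"), Decimal("25"), Decimal("1"), Decimal("2")),
]

# Stage 1: per-order sums (the inner GROUP BY of the SQL).
sums = defaultdict(lambda: [Decimal(0)] * 4)
for order_id, *values in items:
    sums[order_id] = [a + b for a, b in zip(sums[order_id], values)]

# Stage 2: per-month profit (the outer GROUP BY). The override, when present,
# replaces the summed freight sell, like COALESCE in the SQL.
monthly_profit = defaultdict(Decimal)
for order_id, (sell, net, freight_net, freight_sell) in sums.items():
    month, override = orders[order_id]
    freight = override if override is not None else freight_sell
    monthly_profit[month] += sell - net - freight_net + freight

print(dict(monthly_profit))
```

Order 2 shows why the inner grouping matters: its items sum to a freight sell of 9, but the order-level override of 5 is used once for the whole order.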

Conditional annotations with Aggregation over only some fields in Django

So let's assume I have two models in my Django project:
class Article(models.Model):
    name = models.CharField(max_length=200)
    # ..

class Price(models.Model):
    article = models.ForeignKey('Article')
    date = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField()
    # ..
There exist multiple Price entries per day for the same article.
Now I want to annotate an article queryset with the average price of every article on the previous day. But I have no idea on how to do this in one efficient query.
What I have done is this:
articles = Articles.objects.all().select_related().filter(price__date__exact=datetime.datetime.now() - datetime.timedelta(days=1)).annotate(avg_price=Avg('price__price'))
This works if every article has at least one price each day, but that isn't always the case. Articles that have no price for the previous day should have None or 0 or some default as avg_price.
Does anybody know how to achieve this?
Aggregation functions can take an argument filter [Django docs] which can be used to put conditions on the aggregation:
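import datetime
from django.db.models import Avg, Q

yesterday = datetime.date.today() - datetime.timedelta(days=1)
articles = Article.objects.all().select_related().annotate(
    avg_price=Avg(
        'price__price',
        # __date compares only the calendar day of the DateTimeField;
        # an exact match against a full timestamp would almost never hit
        filter=Q(price__date__date=yesterday),
    )
)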
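The reason a filter on the aggregate keeps articles without prices, while a filter on the queryset drops them, can be seen in a small standard-library sketch. The tables, rows and the fixed "yesterday" date are assumptions for illustration; the SQL uses the portable CASE form of a conditional aggregate rather than whatever Django emits for a given backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows mirroring the Article / Price models.
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE price (article_id INTEGER, date TEXT, price REAL);
    INSERT INTO article VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO price VALUES
        (1, '2022-03-01', 2.0),
        (1, '2022-03-01', 4.0),
        (1, '2022-02-28', 9.0),
        (2, '2022-02-28', 7.0);
""")

yesterday = "2022-03-01"  # fixed date so the example is reproducible

# The condition sits inside the aggregate, so the LEFT JOIN still returns
# every article. Rows from other days become NULL, which AVG ignores.
sql = """
    SELECT a.name,
           AVG(CASE WHEN p.date = ? THEN p.price END) AS avg_price
    FROM article a
    LEFT JOIN price p ON p.article_id = a.id
    GROUP BY a.id
    ORDER BY a.id
"""
avg_prices = conn.execute(sql, (yesterday,)).fetchall()
print(avg_prices)
```

Here 'ink' has no price on the chosen day and comes back with None instead of disappearing from the result.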

Django queryset order by latest value in related field

Consider the following Models in Django:
class Item(models.Model):
    name = models.CharField(max_length=100)

class Item_Price(models.Model):
    created_on = models.DateTimeField(default=timezone.now)
    item = models.ForeignKey('Item', related_name='prices')
    price = models.DecimalField(decimal_places=2, max_digits=15)
The price of an item can vary throughout time so I want to keep a price history.
My goal is to have a single query using the Django ORM to get a list of Items with their latest prices and sort the results by this price in ascending order.
What would be the best way to achieve this?
You can use a Subquery to obtain the latest Item_Price object and sort on these:
from django.db.models import OuterRef, Subquery
last_price = Item_Price.objects.filter(
    item_id=OuterRef('pk')
).order_by('-created_on').values('price')[:1]

Item.objects.annotate(
    last_price=Subquery(last_price)
).order_by('last_price')
For each Item, we thus obtain the latest Item_Price and use this in the annotation.
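As a rough illustration of what such a subquery annotation does, the sketch below runs a correlated subquery against an in-memory SQLite database. The table names, columns and rows are assumptions chosen to mirror the models, and the SQL is hand-written rather than copied from Django's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows mirroring Item / Item_Price.
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_price (item_id INTEGER, created_on TEXT, price REAL);
    INSERT INTO item VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
    INSERT INTO item_price VALUES
        (1, '2022-01-01', 30.0), (1, '2022-02-01', 25.0),
        (2, '2022-01-15', 90.0), (2, '2022-03-01', 120.0),
        (3, '2022-02-10', 45.0);
""")

# For every item, the inner query picks the newest price row (the role of
# OuterRef('pk') plus [:1]); the outer query then sorts by that value.
sql = """
    SELECT i.name,
           (SELECT p.price
            FROM item_price p
            WHERE p.item_id = i.id
            ORDER BY p.created_on DESC
            LIMIT 1) AS last_price
    FROM item i
    ORDER BY last_price
"""
latest = conn.execute(sql).fetchall()
print(latest)
```

The lamp's older price of 30.0 is ignored because only its most recent row (25.0) takes part in the ordering.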
That being said, the above modelling is perhaps not ideal, since it will require a lot of complex queries. django-simple-history [readthedocs.io] does this differently by creating an extra model and saving historical records. It also has a manager that allows one to query for historical states. This perhaps makes working with historical data simpler.
You could prefetch them in order to do the nested ordering inline like the following:
from django.db.models import Prefetch
prefetched_prices = Prefetch("prices", queryset=Item_Price.objects.order_by("price"))
for i in Item.objects.prefetch_related(prefetched_prices):
    i.name, i.prices.all()

Django & Postgres - percentile (median) and group by

I need to calculate period medians per seller ID (see simplified model below). The problem is I am unable to construct the ORM query.
Model
class MyModel(models.Model):
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)
Query
queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)
ArrayField aggregation (seller_id) source
I think what I need to do is something along the lines below
select t.*, p_25, p_75
from t join
(select district,
percentile_cont(0.25) within group (order by sales) as p_25,
percentile_cont(0.75) within group (order by sales) as p_75
from t
group by district
) td
on t.district = td.district
above example source
Python 3.7.5, Django 2.2.8, Postgres 11.1
You can create a Median child class of the Aggregate class as was done by Ryan Murphy (https://gist.github.com/rdmurphy/3f73c7b1826cacee34f6c2a855b12e2e). Median then works just like Avg:
from django.db.models import Aggregate, FloatField
class Median(Aggregate):
    function = 'PERCENTILE_CONT'
    name = 'median'
    output_field = FloatField()
    template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'
Then to find the median of a field use
my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))
which is then available as my_model_aggregate['period__median'].
Here's what did the trick.
from django.contrib.postgres.fields.jsonb import KeyTextTransform  # Django 2.2 location
from django.db.models import F, Func, IntegerField
from django.db.models.aggregates import Aggregate
from django.db.models.functions import Cast

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()))
    .filter(duration__isnull=False)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")  # group by
    .annotate(
        median=Aggregate(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
)
Notice the median annotation employs Aggregate and not Func as in the question.
Also, order of annotate() and filter() clauses as well as order of annotate() and values() clauses matters a lot!
BTW the resulting SQL is without a nested select and join.
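To check what the query should return, the same steps (drop rows without a duration, unnest seller_ids, group by seller, take the median) can be mimicked in plain Python. The rows below are made-up sample data; `statistics.median` interpolates between the two middle values for an even count, which is also what percentile_cont(0.5) does.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows shaped like MyModel records for period=25.
rows = [
    {"seller_ids": [1, 2], "aux": {"duration": "10"}},
    {"seller_ids": [1], "aux": {"duration": "20"}},
    {"seller_ids": [2], "aux": {}},  # no duration: filtered out
    {"seller_ids": [1, 2], "aux": {"duration": "40"}},
]

durations = defaultdict(list)
for row in rows:
    raw = row["aux"].get("duration")
    if raw is None:  # equivalent of .filter(duration__isnull=False)
        continue
    for seller_id in row["seller_ids"]:  # equivalent of unnest(seller_ids)
        durations[seller_id].append(int(raw))  # equivalent of the Cast to integer

# Group by seller_id and take the median of each group.
medians = {seller_id: median(values) for seller_id, values in durations.items()}
print(medians)
```

Seller 2 has two durations (10 and 40), so its median is the interpolated value 25.0 rather than one of the stored numbers.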

In Django ORM: Select record from each group with maximal value of a given attribute

Say I have three models as follows representing the prices of goods sold at several retail locations of the same company:
class Store(models.Model):
    name = models.CharField(max_length=256)
    address = models.TextField()

class Product(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()

class Price(models.Model):
    store = models.ForeignKey(Store)
    product = models.ForeignKey(Product)
    effective_date = models.DateField()
    value = models.FloatField()
When a price is set, it is set on a store-and-product-specific basis. I.e. the same item can have different prices in different stores. And each of these prices has an effective date. For a given store and a given product, the currently-effective price is the one with the latest effective_date.
What's the best way to write the query that will return the currently-effective price of all items in all stores?
If I were using Pandas, I would get myself a dataframe with columns ['store', 'product', 'effective_date', 'price'] and I would run
dataframe\
    .sort_values(by=['store', 'product', 'effective_date'], ascending=[True, True, False])\
    .groupby(['store', 'product'])['price'].first()
But there has to be some way of doing this directly on the database level. Thoughts?
If your DBMS is PostgreSQL you can use distinct combined with order_by this way :
Price.objects.order_by('store','product','-effective_date').distinct('store','product')
It will give you all the latest prices for all product/store combinations.
There are tricks about distinct, have a look at the docs here : https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Without Postgres' added power (which you should really use) there is a more complicated solution to this (based on ryanpitts' idea), which requires two db hits:
from django.db.models import Max, Q

latest_set = (
    Price.objects
    .values('store_id', 'product_id')  # important to have values before annotate ...
    .annotate(max_date=Max('effective_date'))
    .order_by()
)
# ... to annotate for the grouping that results from values

# Build a query that reverse-engineers the Price records that contributed to
# 'latest_set'. (Relying on the fact that there are not 2 Prices
# for the same product-store with an identical date)
q_statement = Q(product_id=-1)  # something that results in an empty queryset
for latest_dict in latest_set:
    q_statement |= (
        Q(product_id=latest_dict['product_id']) &
        Q(store_id=latest_dict['store_id']) &
        Q(effective_date=latest_dict['max_date'])
    )
Price.objects.filter(q_statement)
If you are using PostgreSQL, you could use order_by and distinct to get the current effective prices for all the products in all the stores as follows:
prices = (
    Price.objects.order_by('store', 'product', '-effective_date')
    .distinct('store', 'product')
)
Now, this is quite analogous to what you have there for Pandas.
Do note that using field names in distinct only works in PostgreSQL. Once you have sorted the prices based on store, product and decreasing order of effective date, distinct('store', 'product') will retain only the first entry for each store-product pair and that will be your current entry with recent price.
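The "sort, then keep the first row per group" behaviour described above can be sketched in plain Python. The price rows are invented sample data, and the code only imitates the semantics of ordering followed by distinct on fields; it does not touch the database.

```python
from datetime import date

# Hypothetical price rows (store, product, effective_date, value).
prices = [
    {"store": "A", "product": "tea", "effective_date": date(2022, 1, 1), "value": 3.0},
    {"store": "A", "product": "tea", "effective_date": date(2022, 3, 1), "value": 3.5},
    {"store": "A", "product": "milk", "effective_date": date(2022, 2, 1), "value": 1.2},
    {"store": "B", "product": "tea", "effective_date": date(2021, 12, 1), "value": 2.8},
    {"store": "B", "product": "tea", "effective_date": date(2022, 2, 15), "value": 3.1},
]

# Same ordering as order_by('store', 'product', '-effective_date'):
# newest first, then a stable sort by store and product.
ordered = sorted(prices, key=lambda p: p["effective_date"], reverse=True)
ordered.sort(key=lambda p: (p["store"], p["product"]))

# Same effect as distinct('store', 'product'): keep only the first row
# seen for each pair, which is the newest one after the sort above.
current = {}
for p in ordered:
    current.setdefault((p["store"], p["product"]), p["value"])

print(current)
```

Because the rows are already sorted newest-first within each pair, the first row kept for a pair is its currently effective price.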
Not PostgreSQL database:
If you are not using PostgreSQL, you could do it with two queries:
First, we get latest effective date for all the store-product groups:
latest_effective_dates = (
    Price.objects.values('store_id', 'product_id')
    .annotate(led=Max('effective_date'))
    .values('led')
)
Once we have these dates, we can get the prices for them:
prices = Price.objects.filter(effective_date__in=latest_effective_dates)
Disclaimer: This assumes that a date which is the latest effective_date for one store-product group never appears as an older date in another group. Otherwise the __in filter also returns outdated prices.
