Is it possible to GROUP BY an aggregate query with django ORM? - python

I'm trying to calculate the equivalent of SELECT SUM(...) FROM ... GROUP BY .... Here's a simplified analogy:
Let's say Salesperson objects sell stuff and get a commission on the margin they generate through each Sale:
sp = Salesperson.objects.get(pk=1)
my_sales = Sale.objects.filter(fk_salesperson=sp)
# calculate commission owing to sp
commission = 0
for sale in my_sales:
    commission += sp.commission_rate \
        * (sale.selling_price - sale.cost_price)
That last loop could be done with something like:
.annotate(commission=(F('selling_price') - F('cost_price'))
          * sp.commission_rate)
But can I then further aggregate the query for all Salesperson objects? I.e. I want to know every salesperson's commission (i.e. roughly SELECT SUM( (sale_price-cost_price) * commission_rate) FROM Sales GROUP BY Salesperson). I could do something like below, but I'm trying to do it with ORM:
commissions = []
salespeople = Salesperson.objects.all()
for sp in salespeople:
    data = Sale.objects.filter(fk_salesperson=sp)\
        .annotate(salesperson=F('fk_salesperson__email'))\
        .annotate(commission=(F('selling_price') - F('cost_price'))
                  * sp.commission_rate)
    commissions.append(data)
Is there a way to do this with a single query (making the reporting db server do the work) rather than doing it on my application server?

The Sum() aggregate function is available in django.db.models, and you can use related fields in an F expression. Calling values() before annotate() groups the rows by salesperson:
from django.db.models import F, Sum
Sale.objects.values('fk_salesperson__id').annotate(commission=Sum(
    (F('selling_price') - F('cost_price')) * F('fk_salesperson__commission_rate')
))
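To see what the grouped query is expected to compute, here is a minimal sketch that runs the equivalent SQL against an in-memory SQLite database using only the standard library. It does not use Django; the table names (`salesperson`, `sale`), the column `fk_salesperson_id`, and the sample rows are assumptions made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a throwaway database with two salespeople and three sales."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE salesperson (id INTEGER PRIMARY KEY, commission_rate REAL);
        CREATE TABLE sale (
            id INTEGER PRIMARY KEY,
            fk_salesperson_id INTEGER REFERENCES salesperson(id),
            selling_price REAL,
            cost_price REAL
        );
        INSERT INTO salesperson VALUES (1, 0.5), (2, 0.25);
        INSERT INTO sale (fk_salesperson_id, selling_price, cost_price) VALUES
            (1, 100, 60), (1, 50, 30), (2, 200, 120);
        """
    )
    return conn


def commissions_by_salesperson(conn):
    """Return {salesperson_id: total commission} from a single GROUP BY query."""
    rows = conn.execute(
        """
        SELECT s.fk_salesperson_id,
               SUM((s.selling_price - s.cost_price) * p.commission_rate)
        FROM sale AS s
        JOIN salesperson AS p ON p.id = s.fk_salesperson_id
        GROUP BY s.fk_salesperson_id
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(commissions_by_salesperson(build_demo_db()))
```

The database does all the arithmetic here: the application receives one row per salesperson rather than one row per sale.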

Related

How to get a proper and query with django where not exists statement

So I have this query which I would like to use as a filter:
select * from api_document ad
where not exists (
    select 1
    from api_documentactionuser ad2
    where ad2.user_id=4 and ad2.action_id=3 and ad.id = ad2.document_id
    limit 1
)
What I've tried with django is:
q = queryset.exclude(
    Q(documentactionuser__action__id=3)
    & Q(documentactionuser__user=current_user),
)
where queryset is a queryset on the api_document table. When I print the generated query, however, Django keeps splitting the two conditions into two separate subqueries instead of combining them with AND inside a single subquery, which in turn gives me the wrong data back:
select * FROM "api_document"
WHERE NOT (
EXISTS(SELECT 1 AS "a" FROM "api_documentactionuser" U1 WHERE (U1."action_id" = 3 AND U1."document_id" = ("api_document"."id")) LIMIT 1)
AND EXISTS(SELECT 1 AS "a" FROM "api_documentactionuser" U1 WHERE (U1."user_id" = 1 AND U1."document_id" = ("api_document"."id")) LIMIT 1)
)
I've tried chaining exclude().exclude(), filter(~Q()).filter(~Q()), and the above variant; they all produce nearly the same query, with the same data output.
Use ~Exists and specify the queryset you want to check, correlating it with the outer document through OuterRef:
from django.db.models import Exists, OuterRef
queryset.filter(~Exists(DocumentActionUser.objects.filter(
    document=OuterRef('pk'), action_id=3, user=current_user,
)))
Should do the trick (untested - just typed this directly here, so beware of typos! :) )
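The difference between the two SQL shapes in the question (one NOT EXISTS with both conditions, versus NOT of two separate EXISTS clauses) can be reproduced with plain SQLite. The sketch below is not Django code; the table layout mirrors the names in the question, and the sample rows are invented so that document 2 has one row matching the user and a different row matching the action.

```python
import sqlite3

# Both conditions must hold on the SAME related row.
COMBINED = """
SELECT d.id FROM document d
WHERE NOT EXISTS (
    SELECT 1 FROM documentactionuser a
    WHERE a.user_id = ? AND a.action_id = ? AND a.document_id = d.id
)
ORDER BY d.id
"""

# Each condition may be satisfied by a DIFFERENT related row.
SPLIT = """
SELECT d.id FROM document d
WHERE NOT (
    EXISTS (SELECT 1 FROM documentactionuser a
            WHERE a.action_id = ? AND a.document_id = d.id)
    AND
    EXISTS (SELECT 1 FROM documentactionuser a
            WHERE a.user_id = ? AND a.document_id = d.id)
)
ORDER BY d.id
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE document (id INTEGER PRIMARY KEY);
        CREATE TABLE documentactionuser (
            document_id INTEGER, user_id INTEGER, action_id INTEGER);
        INSERT INTO document VALUES (1), (2), (3);
        INSERT INTO documentactionuser VALUES
            (1, 4, 3),  -- document 1: user 4 performed action 3
            (2, 4, 1),  -- document 2: user 4 performed another action
            (2, 5, 3);  -- document 2: another user performed action 3
        """
    )
    return conn


def document_ids(conn, sql, params):
    """Run one of the queries above and return the matching document ids."""
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(document_ids(conn, COMBINED, (4, 3)))  # params: user_id, action_id
    print(document_ids(conn, SPLIT, (3, 4)))     # params: action_id, user_id
```

Document 2 is kept by the combined form but dropped by the split form, which is the kind of wrong result the question describes.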

Are nested aggregate queries possible with a Django queryset?

I want to calculate the monthly profit with the following models using Django queryset methods. The tricky point is that the Order table has a FreightSellOverride field, which overrides the sum of FreightSell over the order's rows in the OrderItem table. An order may contain multiple OrderItems, so I have to calculate the profit per order first and only then the profit per month: if an order has an order-level FreightSellOverride value, it must be used instead of the summed item values.
Below is my attempt using annotate(), but I could not work out how to reproduce this SQL. Does Django allow this kind of nested aggregate query?
select sales_month
    ,sum(sumSellPrice-sumNetPrice-sumFreighNet+coalesce(FreightSellOverride,sumFreightSell)) as profit
from
(
    select CAST(DATE_FORMAT(b.CreateDate, '%Y-%m-01 00:00:00') AS DATETIME) AS `sales_month`,
        a.order_id,b.FreightSellOverride
        ,sum(SellPrice) as sumSellPrice,sum(NetPrice) as sumNetPrice
        ,sum(FreightNet) as sumFreighNet,sum(FreightSell) as sumFreightSell
    from OrderItem a
    inner join Order b
    on a.order_id=b.id
    group by 1,2,3
) c
group by sales_month
I tried this
result = (OrderItem.objects
    .annotate(sales_month=TruncMonth('order__CreateDate'))
    .values('sales_month', 'order', 'order__FreightSellOverride')
    .annotate(sumSellPrice=Sum('SellPrice'), sumNetPrice=Sum('NetPrice'),
              sumFreighNet=Sum('FreightNet'), sumFreightSell=Sum('FreightSell'))
    .values('sales_month')
    .annotate(profit=Sum(F('sumSellPrice') - F('sumNetPrice') - F('sumFreighNet')
                         + Coalesce('order__FreightSellOverride', 'sumFreightSell')))
)
but get this error
Exception Type: FieldError
Exception Value:
Cannot compute Sum('<CombinedExpression: F(sumSellPrice) - F(sumNetPrice) - F(sumFreighNet) + Coalesce(F(ProjectId__FreightSellOverride), F(sumFreightSell))>'): '<CombinedExpression: F(sumSellPrice) - F(sumNetPrice) - F(sumFreighNet) + Coalesce(F(ProjectId__FreightSellOverride), F(sumFreightSell))>' is an aggregate
from django.db import models
from django.db.models import F, Count, Sum
from django.db.models.functions import TruncMonth, Coalesce

class Order(models.Model):
    CreateDate = models.DateTimeField(verbose_name="Create Date")
    FreightSellOverride = models.FloatField()

class OrderItem(models.Model):
    SellPrice = models.DecimalField(max_digits=10, decimal_places=2)
    FreightSell = models.DecimalField(max_digits=10, decimal_places=2)
    NetPrice = models.DecimalField(max_digits=10, decimal_places=2)
    FreightNet = models.DecimalField(max_digits=10, decimal_places=2)
    order = models.ForeignKey(Order, on_delete=models.DO_NOTHING, related_name="Item")
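The two-level calculation described in the question (aggregate per order first, apply the order-level override, then aggregate per month) can be checked on a small dataset with plain SQLite. This is only a sketch of the target SQL, not a Django queryset and not an answer to the ORM question; the snake_case table and column names, the use of strftime in place of MySQL's DATE_FORMAT, the nullable override column, and the integer sample data are all assumptions. The table is called `orders` because ORDER is a reserved word.

```python
import sqlite3

NESTED_SQL = """
SELECT sales_month,
       SUM(sum_sell - sum_net - sum_freight_net
           + COALESCE(freight_sell_override, sum_freight_sell)) AS profit
FROM (
    -- inner level: one row per order
    SELECT strftime('%Y-%m-01', o.create_date) AS sales_month,
           o.id AS order_id,
           o.freight_sell_override,
           SUM(i.sell_price)   AS sum_sell,
           SUM(i.net_price)    AS sum_net,
           SUM(i.freight_net)  AS sum_freight_net,
           SUM(i.freight_sell) AS sum_freight_sell
    FROM order_item AS i
    JOIN orders AS o ON o.id = i.order_id
    GROUP BY strftime('%Y-%m-01', o.create_date), o.id, o.freight_sell_override
) AS per_order
GROUP BY sales_month   -- outer level: one row per month
ORDER BY sales_month
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, create_date TEXT, freight_sell_override INTEGER);
        CREATE TABLE order_item (
            order_id INTEGER, sell_price INTEGER, freight_sell INTEGER,
            net_price INTEGER, freight_net INTEGER);
        INSERT INTO orders VALUES
            (1, '2023-01-15', NULL),  -- no override: use summed freight_sell
            (2, '2023-01-20', 5),     -- override replaces summed freight_sell
            (3, '2023-02-03', NULL);
        INSERT INTO order_item VALUES
            (1, 100, 10, 60, 8), (1, 50, 5, 30, 4),
            (2, 80, 7, 50, 6),   (2, 20, 3, 10, 2),
            (3, 40, 4, 25, 3);
        """
    )
    return conn


def monthly_profit(conn):
    """Return [(sales_month, profit), ...] ordered by month."""
    return conn.execute(NESTED_SQL).fetchall()


if __name__ == "__main__":
    print(monthly_profit(build_demo_db()))
```

Order 2 contributes 100 - 60 - 8 + 5 = 37 because its override (5) is used instead of the summed item freight (10).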

Django & Postgres - percentile (median) and group by

I need to calculate period medians per seller ID (see simplified model below). The problem is I am unable to construct the ORM query.
Model
class MyModel(models.Model):
    period = models.IntegerField(null=True, default=None)
    seller_ids = ArrayField(models.IntegerField(), default=list)
    aux = JSONField(default=dict)
Query
queryset = (
    MyModel.objects.filter(period=25)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")
    .annotate(
        duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()),
        median=Func(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
    .values("median", "seller_id")
)
ArrayField aggregation (seller_id) source
I think what I need to do is something along the lines below
select t.*, p_25, p_75
from t join
    (select district,
            percentile_cont(0.25) within group (order by sales) as p_25,
            percentile_cont(0.75) within group (order by sales) as p_75
     from t
     group by district
    ) td
    on t.district = td.district
above example source
Python 3.7.5, Django 2.2.8, Postgres 11.1
You can create a Median child class of the Aggregate class as was done by Ryan Murphy (https://gist.github.com/rdmurphy/3f73c7b1826cacee34f6c2a855b12e2e). Median then works just like Avg:
from django.db.models import Aggregate, FloatField

class Median(Aggregate):
    function = 'PERCENTILE_CONT'
    name = 'median'
    output_field = FloatField()
    template = '%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)'
Then to find the median of a field use
my_model_aggregate = MyModel.objects.all().aggregate(Median('period'))
which is then available as my_model_aggregate['period__median'].
Here's what did the trick.
from django.db.models import F, Func, IntegerField
from django.db.models.aggregates import Aggregate
from django.db.models.functions import Cast
# import path for Django 2.2; newer versions expose it in django.db.models.fields.json
from django.contrib.postgres.fields.jsonb import KeyTextTransform

queryset = (
    MyModel.objects.filter(period=25)
    .annotate(duration=Cast(KeyTextTransform("duration", "aux"), IntegerField()))
    .filter(duration__isnull=False)
    .annotate(seller_id=Func(F("seller_ids"), function="unnest"))
    .values("seller_id")  # group by
    .annotate(
        median=Aggregate(
            F("duration"),
            function="percentile_cont",
            template="%(function)s(0.5) WITHIN GROUP (ORDER BY %(expressions)s)",
        ),
    )
)
Notice the median annotation employs Aggregate and not Func as in the question.
Also, order of annotate() and filter() clauses as well as order of annotate() and values() clauses matters a lot!
BTW the resulting SQL is without a nested select and join.
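As a cross-check of what the query above is meant to return, the same steps (drop rows without a duration, unnest seller_ids, group by seller, take the median) can be sketched in plain Python. The function name and the dict-shaped sample rows are hypothetical; `statistics.median` averages the two middle values for an even count, which matches the interpolation of percentile_cont(0.5).

```python
from collections import defaultdict
from statistics import median


def median_duration_per_seller(rows):
    """Group durations by seller id and return {seller_id: median duration}."""
    grouped = defaultdict(list)
    for row in rows:
        duration = row["aux"].get("duration")
        if duration is None:
            continue  # mirrors .filter(duration__isnull=False)
        for seller_id in row["seller_ids"]:  # mirrors unnest(seller_ids)
            grouped[seller_id].append(int(duration))  # mirrors Cast(..., IntegerField())
    return {seller_id: median(values) for seller_id, values in grouped.items()}


if __name__ == "__main__":
    sample = [
        {"seller_ids": [1, 2], "aux": {"duration": "10"}},
        {"seller_ids": [1], "aux": {"duration": "20"}},
        {"seller_ids": [2], "aux": {}},  # no duration: ignored
        {"seller_ids": [1, 2], "aux": {"duration": "40"}},
    ]
    print(median_duration_per_seller(sample))
```

This runs in the application rather than in Postgres, so it is useful for verifying results on small samples, not as a replacement for the aggregate.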

Cumulative (running) sum with django orm and postgresql

Is it possible to calculate the cumulative (running) sum using django's orm? Consider the following model:
class AModel(models.Model):
    a_number = models.IntegerField()
with a set of data where a_number = 1, so that the database holds several (more than one) AModel instances, all with a_number=1. I'd like to be able to return the following:
AModel.objects.annotate(cumsum=??).values('id', 'cumsum').order_by('id')
>>> ({id: 1, cumsum: 1}, {id: 2, cumsum: 2}, ... {id: N, cumsum: N})
Ideally I'd like to be able to limit/filter the cumulative sum. So in the above case I'd like to limit the result to cumsum <= 2
I believe that in postgresql one can achieve a cumulative sum using window functions. How is this translated to the ORM?
For reference, starting with Django 2.0 it is possible to use the Window expression to achieve this result:
from django.db.models import F, Sum, Window
AModel.objects.annotate(cumsum=Window(Sum('a_number'), order_by=F('id').asc()))\
    .values('id', 'cumsum').order_by('id', 'cumsum')
Building on Dima Kudosh's answer and on https://stackoverflow.com/a/5700744/2240489, I had to remove the PARTITION BY from the SQL template and replace it with ORDER BY, resulting in:
AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'),
        template='%(expressions)s OVER (ORDER BY %(order_by)s)',
        order_by="id"
    )
).values('id', 'cumsum').order_by('id', 'cumsum')
This gives the following sql:
SELECT "amodel"."id",
SUM("amodel"."a_number")
OVER (ORDER BY id) AS "cumsum"
FROM "amodel"
GROUP BY "amodel"."id"
ORDER BY "amodel"."id" ASC, "cumsum" ASC
Dima Kudosh's answer was not summing the results but the above does.
For posterity, I found this to be a good solution for me. I didn't need the result to be a QuerySet, so I could afford to do this, since I was just going to plot the data using D3.js:
import datetime
import numpy as np
today = datetime.date.today()
raw_data = MyModel.objects.filter(date=today).values_list('a_number', flat=True)
cumsum = np.cumsum(list(raw_data))
You can try to do this with Func expression.
from django.db.models import Func, Sum
AModel.objects.annotate(cumsum=Func(
    Sum('a_number'),
    template='%(expressions)s OVER (PARTITION BY %(partition_by)s)',
    partition_by='id',
)).values('id', 'cumsum').order_by('id')
Check this
AModel.objects.order_by("id").extra(select={"cumsum":'SELECT SUM(m.a_number) FROM table_name m WHERE m.id <= table_name.id'}).values('id', 'cumsum')
where table_name should be the name of table in database.
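The correlated subquery used in the extra() call above can be tried outside Django with the standard-library sqlite3 module. The sketch below assumes a table named `amodel` with an auto-assigned integer `id`, and wraps the running sum in an outer query so it can also be limited (for example to cumsum <= 2, as the question asks); the function names are made up for illustration.

```python
import sqlite3

RUNNING_SUM_SQL = """
SELECT id, cumsum FROM (
    SELECT t.id AS id,
           (SELECT SUM(m.a_number) FROM amodel m WHERE m.id <= t.id) AS cumsum
    FROM amodel t
)
WHERE (? IS NULL OR cumsum <= ?)
ORDER BY id
"""


def build_demo_db(n_rows=4):
    """Create n_rows AModel-like rows, all with a_number = 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
    conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,)] * n_rows)
    return conn


def running_sums(conn, limit=None):
    """Return [(id, cumsum), ...]; if limit is given, keep rows with cumsum <= limit."""
    return conn.execute(RUNNING_SUM_SQL, (limit, limit)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(running_sums(conn))
    print(running_sums(conn, limit=2))
```

A correlated subquery rescans the table for every row, so on large tables a window function is usually the better choice where the database supports it.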

SQLAlchemy group by minute

The task is a grouping of datetime values (using SQLAlchemy) into per minute points (group by minute).
I have a custom SQL-query:
SELECT COUNT(*) AS point_value, MAX(time) as time
FROM `Downloads`
LEFT JOIN Mirror ON Downloads.mirror = Mirror.id
WHERE Mirror.domain_name = 'localhost.local'
AND `time` BETWEEN '2012-06-30 00:29:00' AND '2012-07-01 00:29:00'
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
ORDER BY time ASC
It works great, but now I have to do it in SQLAlchemy. This is what I've got for now (grouping by year is just an example):
rows = (DBSession.query(func.count(Download.id), func.max(Download.time)).
filter(Download.time >= fromInterval).
filter(Download.time <= untilInterval).
join(Mirror,Download.mirror==Mirror.id).
group_by(func.year(Download.time)).
order_by(Download.time)
)
It gives me this SQL:
SELECT count("Downloads".id) AS count_1, max("Downloads".time) AS max_1
FROM "Downloads" JOIN "Mirror" ON "Downloads".mirror = "Mirror".id
WHERE "Downloads".time >= :time_1 AND "Downloads".time <= :time_2
GROUP BY year("Downloads".time)
ORDER BY "Downloads".time
As you can see, it lacks only the correct grouping:
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
Does SQLAlchemy have some function to group by minute?
You can use any SQL-side function from SQLAlchemy by means of Functions (func), which you already use for the YEAR part. I think in your case you just need to add (not tested):
from sqlalchemy.sql import func
...
# add another group_by to your existing query:
rows = ...
    group_by(func.year(Download.time),
             60 * func.HOUR(Download.time) + func.MINUTE(Download.time)
    )
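To check the bucketing itself independently of the database, the grouping key from the original SQL, (DAYOFYEAR(time), 60 * HOUR(time) + MINUTE(time)), can be reproduced in plain Python. This is a sketch with a hypothetical function name and invented timestamps; it counts rows per bucket the way COUNT(*) would.

```python
from collections import Counter
from datetime import datetime


def downloads_per_minute(timestamps):
    """Count timestamps per (day of year, minute of day) bucket."""
    counts = Counter(
        (ts.timetuple().tm_yday, 60 * ts.hour + ts.minute) for ts in timestamps
    )
    return dict(counts)


if __name__ == "__main__":
    sample = [
        datetime(2012, 6, 30, 0, 29, 10),
        datetime(2012, 6, 30, 0, 29, 50),
        datetime(2012, 6, 30, 0, 30, 5),
        datetime(2012, 7, 1, 0, 29, 0),
    ]
    print(downloads_per_minute(sample))
```

Including the day of year in the key keeps the same minute on different days in separate buckets, which grouping by minute of day alone would not.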
