Cumulative (running) sum with django orm and postgresql - python

Is it possible to calculate the cumulative (running) sum using django's orm? Consider the following model:
class AModel(models.Model):
    a_number = models.IntegerField()
Suppose the database holds N (> 1) AModel instances, all with a_number = 1. I'd like to be able to return the following:
AModel.objects.annotate(cumsum=??).values('id', 'cumsum').order_by('id')
>>> ({id: 1, cumsum: 1}, {id: 2, cumsum: 2}, ... {id: N, cumsum: N})
Ideally I'd like to be able to limit/filter the cumulative sum. So in the above case I'd like to limit the result to cumsum <= 2
I believe that in postgresql one can achieve a cumulative sum using window functions. How is this translated to the ORM?

For reference, starting with Django 2.0 it is possible to use the Window expression to achieve this result:
from django.db.models import F, Sum, Window
AModel.objects.annotate(
    cumsum=Window(Sum('a_number'), order_by=F('id').asc())
).values('id', 'cumsum').order_by('id', 'cumsum')
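To see what this computes at the SQL level, and how the "cumsum <= 2" limit from the question can be expressed, here is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL. The table name amodel and the sample rows are assumptions for illustration, and it assumes an SQLite build with window function support (3.25 or newer). As far as I know, filtering directly on a Window annotation in the ORM is only supported from Django 4.2 onward; on older versions the outer-query approach below needs raw SQL.

```python
import sqlite3

# In-memory stand-in for the AModel table; "amodel" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,)] * 5)

# Running total over rows ordered by id.
running = conn.execute(
    "SELECT id, SUM(a_number) OVER (ORDER BY id) AS cumsum "
    "FROM amodel ORDER BY id"
).fetchall()

# A window result cannot be used in WHERE of the same SELECT,
# so the limit is applied in an outer query.
limited = conn.execute(
    "SELECT id, cumsum FROM ("
    "  SELECT id, SUM(a_number) OVER (ORDER BY id) AS cumsum FROM amodel"
    ") WHERE cumsum <= 2 ORDER BY id"
).fetchall()

print(running)
print(limited)
```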

Building on Dima Kudosh's answer and on https://stackoverflow.com/a/5700744/2240489, I had to do the following:
I removed the PARTITION BY clause from the SQL template and replaced it with ORDER BY, resulting in:
from django.db.models import Func, Sum
AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'),
        template='%(expressions)s OVER (ORDER BY %(order_by)s)',
        order_by="id"
    )
).values('id', 'cumsum').order_by('id', 'cumsum')
This gives the following sql:
SELECT "amodel"."id",
SUM("amodel"."a_number")
OVER (ORDER BY id) AS "cumsum"
FROM "amodel"
GROUP BY "amodel"."id"
ORDER BY "amodel"."id" ASC, "cumsum" ASC
Dima Kudosh's answer was not summing the results but the above does.
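The reason PARTITION BY id does not accumulate is that every row ends up in its own partition, so each "sum" is just that row's own value. A small sketch with sqlite3 (a stand-in for PostgreSQL; the table name and rows are made up, and window function support is assumed) shows the two clauses side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,), (1,), (1,)])

# One partition per id: each window contains a single row.
partitioned = [row[0] for row in conn.execute(
    "SELECT SUM(a_number) OVER (PARTITION BY id) FROM amodel ORDER BY id"
)]

# One ordered window over all rows: the sum grows row by row.
ordered = [row[0] for row in conn.execute(
    "SELECT SUM(a_number) OVER (ORDER BY id) FROM amodel ORDER BY id"
)]

print(partitioned, ordered)
```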

For posterity, I found this to be a good solution for me. I didn't need the result to be a QuerySet, so I could afford to do this, since I was just going to plot the data using D3.js:
import datetime
import numpy as np
today = datetime.date.today()
raw_data = MyModel.objects.filter(date=today).values_list('a_number', flat=True)
# Evaluate the queryset into a list before handing it to NumPy
cumsum = np.cumsum(list(raw_data))
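If NumPy is not otherwise a dependency, the same running total can be had from the standard library. This is only a sketch: the list below is a made-up stand-in for the values_list() result.

```python
from itertools import accumulate

# Hypothetical stand-in for list(queryset.values_list('a_number', flat=True)).
raw_data = [1, 1, 1, 1]

# accumulate() yields the running sum by default.
cumsum = list(accumulate(raw_data))
print(cumsum)
```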

You can try to do this with a Func expression.
from django.db.models import Func, Sum
AModel.objects.annotate(
    cumsum=Func(
        Sum('a_number'),
        template='%(expressions)s OVER (PARTITION BY %(partition_by)s)',
        partition_by='id'
    )
).values('id', 'cumsum').order_by('id')

Try this:
AModel.objects.order_by("id").extra(
    select={"cumsum": 'SELECT SUM(m.a_number) FROM table_name m WHERE m.id <= table_name.id'}
).values('id', 'cumsum')
where table_name should be the name of the table in the database.
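The correlated subquery in that extra() call can be tried outside Django. Here is a rough sketch with sqlite3, using amodel as a hypothetical table name and made-up rows. Note that the subquery is re-evaluated for every outer row, so it may get slow on large tables compared to a window function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amodel (id INTEGER PRIMARY KEY, a_number INTEGER)")
conn.executemany("INSERT INTO amodel (a_number) VALUES (?)", [(1,), (2,), (3,)])

# For each outer row t, sum a_number over all rows with id <= t.id.
rows = conn.execute(
    "SELECT t.id, "
    "  (SELECT SUM(m.a_number) FROM amodel m WHERE m.id <= t.id) AS cumsum "
    "FROM amodel t ORDER BY t.id"
).fetchall()
print(rows)
```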

Related

Can't convert SQL to django query (having doesn't work)

I have this SQL:
SELECT
stock_id, consignment_id, SUM(qty), SUM(cost)
FROM
warehouse_regсonsignmentproduct
WHERE
product_id = '1'
GROUP BY
stock_id, consignment_id
HAVING
SUM(qty) > 0
I used the Django ORM to create this query:
(regСonsignmentProduct.objects
    .filter(product='1')
    .order_by('period')
    .values('stock', 'consignment')
    .annotate(total_qty=Sum('qty'), total_cost=Sum('cost'))
    .filter(total_qty__gt=0))
But my Django query returns an incorrect result.
I think the problem is in "annotate".
Thanks!
You need to order by the same fields you pass to values() to force grouping on exactly those fields, so:
regСonsignmentProduct.objects.filter(product='1').values(
    'stock', 'consignment'
).annotate(
    total_qty=Sum('qty'),
    total_cost=Sum('cost')
).order_by('stock', 'consignment').filter(total_qty__gt=0)
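My understanding of why the original query went wrong is that the order_by('period') field gets pulled into the GROUP BY, so the sums are computed per period rather than per (stock, consignment). The effect is easy to reproduce in plain SQL. The sketch below uses sqlite3 with a hypothetical product table and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (stock INTEGER, consignment INTEGER, "
    "period INTEGER, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (1, 1, 2, -5), (2, 1, 1, 3), (2, 1, 2, 4)],
)

# Intended grouping: one row per (stock, consignment).
wanted = conn.execute(
    "SELECT stock, consignment, SUM(qty) FROM product "
    "GROUP BY stock, consignment HAVING SUM(qty) > 0 "
    "ORDER BY stock, consignment"
).fetchall()

# With period added to GROUP BY, the groups are split and HAVING
# is evaluated per period instead.
split = conn.execute(
    "SELECT stock, consignment, SUM(qty) FROM product "
    "GROUP BY stock, consignment, period HAVING SUM(qty) > 0 "
    "ORDER BY stock, consignment, period"
).fetchall()

print(wanted)
print(split)
```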

SQLAlchemy counting subquery result

I have an SQL query which I want to convert to use the ORM but I cannot get the ORM to count the results from the subquery.
So my working SQL is:
select FOO
,BAR
,TOTALCOUNT
from(
select FOO
,BAR
,COUNT(BAR) OVER (PARTITION BY FOO) AS TOTALCOUNT
from(
SELECT distinct
[FOO]
,[BAR]
FROM [database].[dbo].[table]
)m
)m
WHERE TOTALCOUNT > 10
I have tried to create the equivalent code using the ORM but my final result has just 1's for the final count, the code I have tried is below
subs = session.query(table.FOO,table.BAR).filter(
table.date > datetime.now() - timedelta(days=10),
).distinct().subquery()
result = pd.read_sql(session.query(subs.c.FOO,subs.c.BAR,func.count(subs.c.BAR).label('TOTALCOUNT')).group_by(subs.c.FOO,subs.c.BAR).statement,session.bind)
I have also tried to do it in one query with:
result = pd.read_sql(session.query(table.FOO,table.BAR,func.count(table.BAR).label("TOTALCOUNT")).filter(
and_(
table.date> datetime.now() - timedelta(days= 30),
)
),groupby.order_by(table.FOO).distinct().statement,session.bind)
But that counts the rows before applying the distinct operator, so the count is incorrect. I would really appreciate it if someone could assist me or tell me where I am going wrong; I have googled all morning and can't seem to find an answer.
Ahh, I'm an idiot, I should pay more attention to what I am doing: I added the alias and then removed an additional column I was grouping by. In case anyone else ever struggles with something similar, here is the working code.
subs = session.query(table.FOO, table.BAR).filter(
    table.date > datetime.now() - timedelta(days=10),
).distinct().subquery().alias('subs')
result = pd.read_sql(
    session.query(subs.c.FOO, func.count(subs.c.BAR).label('TOTALCOUNT'))
    .group_by(subs.c.FOO).statement,
    session.bind,
)
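The "all 1's" result comes from grouping by both columns of an already distinct subquery: every (FOO, BAR) pair is then its own group. A small sqlite3 sketch (table t and its rows are invented for illustration, not the asker's schema) shows the difference between the two groupings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo TEXT, bar TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")],
)

# Grouping by both columns of the distinct subquery: every count is 1.
per_pair = conn.execute(
    "SELECT foo, bar, COUNT(bar) FROM (SELECT DISTINCT foo, bar FROM t) "
    "GROUP BY foo, bar ORDER BY foo, bar"
).fetchall()

# Grouping by foo only: counts the distinct bar values per foo.
per_foo = conn.execute(
    "SELECT foo, COUNT(bar) FROM (SELECT DISTINCT foo, bar FROM t) "
    "GROUP BY foo ORDER BY foo"
).fetchall()

print(per_pair)
print(per_foo)
```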

Django: get duplicates based on annotation

I want to get all duplicates based on a case insensitive field value.
Basically to rewrite this SQL query
SELECT count(*), lower(name)
FROM manufacturer
GROUP BY lower(name)
HAVING count(*) > 1;
with Django ORM. I was hoping something like this would do the trick
from django.db.models import Count
from django.db.models.functions import Lower
from myapp.models import Manufacturer
qs = Manufacturer.objects.annotate(
    name_lower=Lower('name'),
    cnt=Count('name_lower')
).filter(cnt__gt=1)
but of course it didn't work.
Any idea how to do this?
You can try this:
qs = Manufacturer.objects.annotate(lname=Lower('name')
    ).values('lname').annotate(cnt=Count(Lower('name'))
    ).values('lname', 'cnt').filter(cnt__gt=1).order_by('lname', 'cnt')
For why the order_by() is needed, see "Interaction with default ordering or order_by()" in the Django aggregation documentation.
The SQL query looks like:
SELECT
LOWER("products_manufacturer"."name") AS "lname",
COUNT(LOWER("products_manufacturer"."name")) AS "cnt"
FROM "products_manufacturer"
GROUP BY LOWER("products_manufacturer"."name")
HAVING COUNT(LOWER("products_manufacturer"."name")) > 1
ORDER BY "lname" ASC, "cnt" ASC
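For a quick cross-check of what the query should return on a small data set, the same "group by lowercased name, keep counts above 1" logic can be written in plain Python. The names below are made up, and this is only meant for sanity-checking, not as a replacement for doing the grouping in the database.

```python
from collections import Counter

# Hypothetical manufacturer names, differing only by case in places.
names = ["Acme", "ACME", "acme", "Bolt", "Cog", "cog"]

# Count occurrences of each lowercased name.
counts = Counter(name.lower() for name in names)

# Keep only names that occur more than once (the HAVING count(*) > 1 part).
duplicates = {name: n for name, n in counts.items() if n > 1}
print(duplicates)
```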

Django: Query Group By Month

How to calculate total by month without using extra?
I'm currently using:
Django 1.8
PostgreSQL 9.3.13
Python 2.7
Example.
What I have tried so far.
# Doesn't work for me, but I don't mind because I don't want to use extra
truncate_month = connection.ops.date_trunc_sql('month', 'day')
invoices = Invoice.objects.filter(is_deleted=False, company=company).extra({'month': truncate_month}).values('month').annotate(Sum('total'))
----
# It works, but I think it's too slow if I query a big set of data
for current_month in range(1, 13):
    Invoice.objects.filter(date__month=current_month).annotate(total=Sum("total"))
and also this one; the answer seems great but I can't import TruncMonth.
Django: Group by date (day, month, year)
P.S. I know that this question is already asked multiple times but I don't see any answer.
Thanks!
SOLUTION:
Thanks to @Vin-G's answer.
First, you have to make a Func subclass that can extract the month for you:
from django.db import models
from django.db.models import Func
class Month(Func):
    function = 'EXTRACT'
    template = '%(function)s(MONTH from %(expressions)s)'
    output_field = models.IntegerField()
After that, all you need to do is:
1. annotate each row with the month
2. group the results by the annotated month using values()
3. annotate each result with the aggregated sum of the totals using Sum()
Important: if your model class has a default ordering specified in the meta options, then you will have to add an empty order_by() clause. This is because of https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate.
If you are unsure, you could just add the empty order_by() clause anyway without any adverse effects.
i.e.
from django.db.models import Sum
summary = (Invoice.objects
    .annotate(m=Month('date'))
    .values('m')
    .annotate(total=Sum('total'))
    .order_by())
See the full gist here: https://gist.github.com/alvingonzales/ff9333e39d221981e5fc4cd6cdafdd17
If you need further information:
Details on creating your own Func classes: https://docs.djangoproject.com/en/1.8/ref/models/expressions/#func-expressions
Details on the values() clause, (pay attention to how it interacts with annotate() with respect to order of the clauses):
https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#values
the order in which annotate() and values() clauses are applied to a query is significant. If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
from django.db.models import Sum
result = (
    Invoice.objects
    .all()
    .values_list('created_at__year', 'created_at__month')
    .annotate(Sum('total'))
    .order_by('created_at__year', 'created_at__month')
)
itertools.groupby is the performant option in Python and can be utilized with a single db query:
from itertools import groupby
invoices = Invoice.objects.only('date', 'total').order_by('date')
month_totals = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda i: i.date.month)
}
month_totals
# {1: 100, 3: 100, 4: 500, 7: 500}
I am not aware of a pure django ORM solution. The date__month filter is very limited and cannot be used in values, order_by, etc.
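One caveat with the groupby approach: itertools.groupby only merges consecutive items, so if the invoices span more than one year, the same month number shows up in several groups and the dict comprehension keeps only the last one. A hedged sketch, with a namedtuple standing in for the Invoice rows (the Row type and the sample data are assumptions), shows the problem and one way around it by keying on (year, month):

```python
from collections import namedtuple
from datetime import date
from itertools import groupby

# Hypothetical stand-in for Invoice instances with date and total attributes.
Row = namedtuple("Row", "date total")
invoices = sorted(
    [
        Row(date(2015, 1, 10), 100),
        Row(date(2015, 3, 5), 100),
        Row(date(2016, 1, 20), 50),
    ],
    key=lambda r: r.date,
)

# Keyed by month only: January 2016 overwrites January 2015.
by_month = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda r: r.date.month)
}

# Keyed by (year, month): each calendar month keeps its own total.
by_year_month = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda r: (r.date.year, r.date.month))
}

print(by_month)
print(by_year_month)
```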
Don't forget that Django querysets provide a native datetimes() method, which lets you easily pull all of the days/weeks/months/years out of any queryset for models with a datetime field. So if the Invoice model above has a created datetime field, and you want totals for each month in your queryset, you can just do:
from django.db.models import Sum
invoices = Invoice.objects.all()
months = invoices.datetimes("created", kind="month")
for month in months:
    month_invs = invoices.filter(created__month=month.month)
    month_total = month_invs.aggregate(total=Sum("otherfield")).get("total")
    print(f"Month: {month}, Total: {month_total}")
No external functions or deps needed.
I don't know if my solution is faster than yours. You should profile it. Nonetheless, I only query the db once instead of 12 times.
#utils.py
from .models import Sale
def get_total_per_month_value():
    """
    Return the total of sales per month
    ReturnType: [Dict]
    {'December': 3400, 'February': 224, 'January': 792}
    """
    result = {}
    db_result = Sale.objects.values('price', 'created')
    for i in db_result:
        # Full month name, e.g. "January"
        month = i.get('created').strftime("%B")
        if month in result:
            result[month] = result[month] + i.get('price')
        else:
            result[month] = i.get('price')
    return result
#models.py
class Sale(models.Model):
    price = models.PositiveSmallIntegerField()
    created = models.DateTimeField(_(u'Published'), default="2001-02-24")
#views.py
from .utils import get_total_per_month_value
# ...
result = get_total_per_month_value()
#test.py
import pytest
from mixer.backend.django import mixer
# Mark every test in this module as needing database access
pytestmark = pytest.mark.django_db
def test_get_total_per_month():
    from .utils import get_total_per_month_value
    selected_date = ['01', '02', '03', '01', '01']
    # 2016-01-12 == YYYY-MM-DD
    for i in selected_date:
        mixer.blend('myapp.Sale', created="2016-" + i + "-12")
    values = get_total_per_month_value()  # returns a dict
    months = values.keys()
    assert 'January' in months, 'Should include January'
    assert 'February' in months, 'Should include February'
    assert len(months) == 3, 'Should aggregate the months'
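The accumulation loop in get_total_per_month_value can be shortened a little with collections.defaultdict. This is just a sketch on plain dicts standing in for the rows returned by values() (the sample rows are invented); it keys on the month number rather than the month name, since strftime("%B") output depends on the active locale.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows shaped like Sale.objects.values('price', 'created').
rows = [
    {"price": 100, "created": datetime(2016, 1, 12)},
    {"price": 50, "created": datetime(2016, 2, 12)},
    {"price": 25, "created": datetime(2016, 1, 12)},
]

# Missing months start at 0, so no explicit membership check is needed.
totals = defaultdict(int)
for row in rows:
    totals[row["created"].month] += row["price"]

print(dict(totals))
```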
I have a Reservation model with fields like booked date, commission amount, total booking amount, etc., and based on the year provided I have to aggregate the reservations by month. Here is how I did that:
from django.db.models import Count, Sum
from django.db.models.functions import ExtractMonth
Reservation.objects.filter(
    booked_date__year=year
).values(
    'id',
    'booked_date',
    'commission_amount',
    'total_amount'
).annotate(
    month=ExtractMonth('booked_date')
).values('month').annotate(
    total_commission_amount=Sum('commission_amount'),
    total_revenue_amount=Sum('total_amount'),
    total_booking=Count('id')
).order_by()

Django queryset SUM positive and negative values

I have a model which has an IntegerField named vote_threshold.
I need to get the total SUM of its values with negative values counted as positive, i.e. the sum of the absolute values.
vote_threshold
100
-200
-5
result = 305
Right now I am doing it like this.
earning = 0
result = Vote.objects.all().values('vote_threshold')
# values() yields dicts, so the field is read by key
for v in result:
    if v['vote_threshold'] > 0:
        earning += v['vote_threshold']
    else:
        earning -= v['vote_threshold']
What is a faster and more proper way?
Use the Abs database function (available since Django 2.2):
from django.db.models.functions import Abs
from django.db.models import Sum
<YourModel>.objects.aggregate(s=Sum(Abs("vote_threshold")))
try this:
objects = Vote.objects.extra(select={'abs_vote_threshold': 'abs(vote_threshold)'}).values('abs_vote_threshold')
earning = sum([obj['abs_vote_threshold'] for obj in objects])
I don't think there is an easy way to do the calculation using the Django orm. Unless you have performance issues, there is nothing wrong with doing the calculation in python. You can simplify your code slightly by using sum() and abs().
votes = Vote.objects.all()
earning = sum(abs(v.vote_threshold) for v in votes)
If performance is an issue, you can use raw SQL.
from django.db import connection
cursor = connection.cursor()
cursor.execute("SELECT sum(abs(vote_threshold)) from vote")
row = cursor.fetchone()
earning = row[0]
Here is one example, if you want to sum the negative and positive values separately in one query:
select = {'positive': 'sum(if(value>0, value, 0))',
'negative': 'sum(if(value<0, value, 0))'}
summary = items.filter(query).extra(select=select).values('positive', 'negative')[0]
positive, negative = summary['positive'], summary['negative']
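The if(cond, a, b) form used above is, as far as I know, MySQL syntax. A CASE WHEN expression does the same thing and should work on PostgreSQL and SQLite too. Here is a small sketch with sqlite3, using the three values from the question and a hypothetical vote table, that gets the absolute total plus the positive and negative sums in one query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote (vote_threshold INTEGER)")
conn.executemany("INSERT INTO vote VALUES (?)", [(100,), (-200,), (-5,)])

# One pass over the table: absolute total, positive part, negative part.
total_abs, positive, negative = conn.execute(
    "SELECT SUM(ABS(vote_threshold)), "
    "SUM(CASE WHEN vote_threshold > 0 THEN vote_threshold ELSE 0 END), "
    "SUM(CASE WHEN vote_threshold < 0 THEN vote_threshold ELSE 0 END) "
    "FROM vote"
).fetchone()

print(total_abs, positive, negative)
```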
