How to calculate total by month without using extra?
I'm currently using:
Django 1.8
PostgreSQL 9.3.13
Python 2.7
Example.
What I have tried so far.
# Doesn't work for me, but I don't mind because I don't want to use extra()
truncate_month = connection.ops.date_trunc_sql('month', 'day')
invoices = (Invoice.objects
            .filter(is_deleted=False, company=company)
            .extra({'month': truncate_month})
            .values('month')
            .annotate(Sum('total')))
----
# It works, but I think it's too slow when querying a big set of data
for current_month in range(1, 13):
    Invoice.objects.filter(date__month=current_month).annotate(total=Sum("total"))
I also found this question. The answer seems great, but I can't import TruncMonth:
Django: Group by date (day, month, year)
P.S. I know that this question has already been asked multiple times, but I haven't found an answer.
Thanks!
SOLUTION:
Thanks to @Vin-G's answer.
First, you have to make a Function that can extract the month for you:
from django.db import models
from django.db.models import Func

class Month(Func):
    function = 'EXTRACT'
    template = '%(function)s(MONTH from %(expressions)s)'
    output_field = models.IntegerField()
After that, all you need to do is:
1. annotate each row with the month
2. group the results by the annotated month using values()
3. annotate each result with the aggregated sum of the totals using Sum()
Important: if your model class has a default ordering specified in the meta options, then you will have to add an empty order_by() clause. This is because of https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate.
If you are unsure, you could just add the empty order_by() clause anyway without any adverse effects.
i.e.
from django.db.models import Sum

summary = (Invoice.objects
           .annotate(m=Month('date'))
           .values('m')
           .annotate(total=Sum('total'))
           .order_by())
See the full gist here: https://gist.github.com/alvingonzales/ff9333e39d221981e5fc4cd6cdafdd17
If you need further information:
Details on creating your own Func classes: https://docs.djangoproject.com/en/1.8/ref/models/expressions/#func-expressions
Details on the values() clause, (pay attention to how it interacts with annotate() with respect to order of the clauses):
https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#values
the order in which annotate() and values() clauses are applied to a query is significant. If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
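To see why the stray ordering column matters, here is a minimal sketch of the kind of SQL involved, run against an in-memory SQLite database rather than through Django. The table layout and sample rows are made up for illustration, and SQLite's strftime() is only a stand-in for PostgreSQL's EXTRACT(MONTH FROM ...). The first query groups by month alone; the second also groups by the ordering column, which is roughly what happens when a default ordering leaks into the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, date TEXT, total INTEGER);
    INSERT INTO invoice (date, total) VALUES
        ('2016-01-05', 100),
        ('2016-01-20', 50),
        ('2016-03-02', 200);
""")

# Assumption: SQLite stand-in for PostgreSQL's EXTRACT(MONTH FROM date)
MONTH = "CAST(strftime('%m', date) AS INTEGER)"

# Grouping by the month only: one row per month
grouped = conn.execute(
    "SELECT {m} AS m, SUM(total) FROM invoice "
    "GROUP BY {m} ORDER BY m".format(m=MONTH)
).fetchall()

# Grouping by month AND the ordering column: January is split in two
split = conn.execute(
    "SELECT {m} AS m, SUM(total) FROM invoice "
    "GROUP BY {m}, date ORDER BY date".format(m=MONTH)
).fetchall()

print(grouped)
print(split)
```

With the extra column in the GROUP BY, the two January invoices come back as separate rows, which is the symptom the empty order_by() avoids.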
from django.db.models import Sum

result = (
    Invoice.objects
    .all()
    .values_list('created_at__year', 'created_at__month')
    .annotate(Sum('total'))
    .order_by('created_at__year', 'created_at__month')
)
itertools.groupby is the performant option in Python and can be utilized with a single db query:
from itertools import groupby

invoices = Invoice.objects.only('date', 'total').order_by('date')
month_totals = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda i: i.date.month)
}
month_totals
# {1: 100, 3: 100, 4: 500, 7: 500}
I am not aware of a pure django ORM solution. The date__month filter is very limited and cannot be used in values, order_by, etc.
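One caveat with the groupby approach: groupby only merges consecutive items, so if the invoices span more than one year, a month-only key produces the same key twice and the dict keeps only the last group. A minimal sketch of the difference, using a namedtuple as a hypothetical stand-in for the model rows (the sample data is made up):

```python
from collections import namedtuple
from datetime import date
from itertools import groupby

# Hypothetical stand-in for Invoice rows fetched with .only('date', 'total')
Invoice = namedtuple("Invoice", ["date", "total"])

invoices = sorted(
    [
        Invoice(date(2015, 1, 10), 100),
        Invoice(date(2015, 1, 25), 20),
        Invoice(date(2015, 3, 2), 50),
        Invoice(date(2016, 1, 5), 30),
    ],
    key=lambda i: i.date,
)


def totals_by(rows, key):
    """Sum totals per key; rows must already be sorted so equal keys are adjacent."""
    return {k: sum(r.total for r in g) for k, g in groupby(rows, key=key)}


# Month-only key: January 2016 overwrites January 2015 in the dict
by_month = totals_by(invoices, key=lambda i: i.date.month)

# (year, month) key: every calendar month keeps its own total
by_year_month = totals_by(invoices, key=lambda i: (i.date.year, i.date.month))

print(by_month)
print(by_year_month)
```

If the queryset is already restricted to a single year, the month-only key is fine.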
Don't forget that Django querysets provide a native datetimes() method, which lets you easily pull all of the distinct days/weeks/months/years out of any queryset for models with a datetime field. So if the Invoice model above has a created datetime field, and you want totals for each month in your queryset, you can just do:
from django.db.models import Sum

invoices = Invoice.objects.all()
months = invoices.datetimes("created", kind="month")
for month in months:
    # Match on the year as well, so the same month of different years is not merged
    month_invs = invoices.filter(created__year=month.year, created__month=month.month)
    month_total = month_invs.aggregate(total=Sum("otherfield")).get("total")
    print(f"Month: {month}, Total: {month_total}")
No external functions or deps needed.
I don't know if my solution is faster than yours; you should profile it. Nonetheless, I only query the db once instead of 12 times.
#utils.py
from .models import Sale

def get_total_per_month_value():
    """
    Return the total of sales per month

    ReturnType: [Dict]
    {'December': 3400, 'February': 224, 'January': 792}
    """
    result = {}
    db_result = Sale.objects.values('price', 'created')
    for i in db_result:
        month = str(i.get('created').strftime("%B"))
        if month in result:
            result[month] = result[month] + i.get('price')
        else:
            result[month] = i.get('price')
    return result
#models.py
from django.db import models
from django.utils.translation import gettext_lazy as _

class Sale(models.Model):
    price = models.PositiveSmallIntegerField()
    created = models.DateTimeField(_(u'Published'), default="2001-02-24")
#views.py
from .utils import get_total_per_month_value
# ...
result = get_total_per_month_value()
#test.py
import pytest
from mixer.backend.django import mixer

# Allow every test in this module to access the database
pytestmark = pytest.mark.django_db

def test_get_total_per_month():
    from .utils import get_total_per_month_value
    selected_date = ['01', '02', '03', '01', '01']
    # 2016-01-12 == YYYY-MM-DD
    for i in selected_date:
        mixer.blend('myapp.Sale', created="2016-" + i + "-12")
    values = get_total_per_month_value()  # returns a dict
    months = values.keys()
    assert 'January' in months, 'Should include January'
    assert 'February' in months, 'Should include February'
    assert len(months) == 3, 'Should aggregate the months'
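The accumulation step can also be tried without a database. This is only a sketch: the rows list below is made-up data shaped like the dicts that Sale.objects.values('price', 'created') would yield, and total_per_month is a hypothetical helper, not part of the code above. A defaultdict removes the need for the if/else branch.

```python
from collections import defaultdict
from datetime import datetime


def total_per_month(rows):
    """Sum 'price' per month name for rows shaped like .values('price', 'created')."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["created"].strftime("%B")] += row["price"]
    return dict(totals)


# Made-up sample rows standing in for the queryset result
rows = [
    {"price": 10, "created": datetime(2016, 1, 12)},
    {"price": 20, "created": datetime(2016, 2, 12)},
    {"price": 25, "created": datetime(2016, 1, 12)},
]

monthly = total_per_month(rows)
print(monthly)
```

As with the original function, month names from different years land in the same bucket, so filter the rows to one year first if that matters.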
I have a Reservation model with fields such as booked date, commission amount, and total booking amount, and I have to aggregate the reservations by month for a given year. Here is how I did that:
from django.db.models import Count, Sum
from django.db.models.functions import ExtractMonth

Reservation.objects.filter(
    booked_date__year=year
).values(
    'id',
    'booked_date',
    'commission_amount',
    'total_amount'
).annotate(
    month=ExtractMonth('booked_date')
).values('month').annotate(
    total_commission_amount=Sum('commission_amount'),
    total_revenue_amount=Sum('total_amount'),
    total_booking=Count('id')
).order_by()
Related
I am using Django 3.1 with Postgres, and this is my abridged model:
class PlayerSeasonReport:
    player = models.ForeignKey(Player)
    competition_season = models.ForeignKey(CompetitionSeason)

class PlayerPrice:
    player_season_report = models.ForeignKey(PlayerSeasonReport)
    price = models.IntegerField()
    date = models.DateTimeField()
    # unique on (price, date)
I'm querying on the PlayerSeasonReport to get aggregate information about all players, in particular I would like the prices for the last n records (so the last price, the 7th-to-last price, etc.)
I currently get the PlayerSeasonReport queryset and annotate it like this:
base_query = PlayerSeasonReport.objects.filter(competition_season_id=id)
# This works fine
last_value = base_query.filter(
    pk=OuterRef('pk'),
).order_by(
    'pk',
    '-player_prices__date'
).distinct('pk').annotate(
    value=F('player_prices__price')
)
# Pull the value from a week ago
# This produces a value but is logically incorrect
# I am interested in the 7th-to-last value, not really from a week ago from day of query
week_ago = datetime.datetime.now() - datetime.timedelta(7)
value_7d_ago = base_query.filter(
    pk=OuterRef('pk'),
    player_prices__date__gte=week_ago,
).order_by(
    'pk',
    'player_prices__date'
).distinct('pk').annotate(
    value=F('player_prices__price')
)
return base_query.annotate(
    value=Subquery(
        last_value.values('value'),
        output_field=FloatField()
    ),
    # Same for value_7d_ago
    # ...
    # Many other annotations
)
Getting the most recent value works fine, but getting the last n values doesn't. I shouldn't be using datetime concepts in my logic, since what I'm really interested in is the nth-to-last values.
I've tried annotating the max date, then filtering based on this annotation, and also somehow slicing the subquery, but I can't seem to get any of it right.
It's worth noting that a price may not exist (there may be no record for n values in the past), in which case it should be null (the annotation based on datetime works)
How can I annotate the price values for the last n days?
Sorted:
base_query = PlayerSeasonReport.objects.filter(id=id)
# ...other manipulations on base query
prices = PlayerPrice.objects.filter(
    player_season_report=OuterRef('pk')
).order_by('-date')
return base_query.annotate(
    price=Subquery(
        prices.values('price')[:1],
        output_field=FloatField()
    ),
    prev_day_price=Subquery(
        prices.values('price')[1:2],
        output_field=FloatField()
    ),
    # ...
)
Explanation:
We query on the child model (PlayerPrice) and join on the pk of the PlayerSeasonReport.
prices.values('price')[i:j] where j = i + 1 allows us to get the value we desire without evaluating the QuerySet (which is indispensable in a Subquery).
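For intuition, a queryset slice [i:i+1] becomes LIMIT 1 OFFSET i in SQL, so each annotation is a correlated scalar subquery. The sketch below runs that kind of SQL directly against an in-memory SQLite database; the table names, columns, and rows are made up for illustration, and nth_latest_price_sql is a hypothetical helper. It also shows that a missing nth-to-last price simply comes back as NULL (None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY);
    CREATE TABLE price (
        report_id INTEGER REFERENCES report(id),
        price INTEGER,
        date TEXT
    );
    INSERT INTO report (id) VALUES (1), (2);
    INSERT INTO price (report_id, price, date) VALUES
        (1, 10, '2021-01-01'),
        (1, 12, '2021-01-02'),
        (1, 15, '2021-01-03'),
        (2, 7,  '2021-01-03');
""")


def nth_latest_price_sql(offset):
    """Correlated subquery for the price `offset` records back (0 = latest)."""
    return (
        "(SELECT p.price FROM price p WHERE p.report_id = r.id "
        "ORDER BY p.date DESC LIMIT 1 OFFSET %d)" % int(offset)
    )


rows = conn.execute(
    "SELECT r.id, %s AS price, %s AS prev_price FROM report r ORDER BY r.id"
    % (nth_latest_price_sql(0), nth_latest_price_sql(1))
).fetchall()

print(rows)
```

Report 2 has only one price record, so its prev_price is None, which matches the "null when no record exists" requirement from the question.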
I have a queryset in my Django application:
databytes_all = DataByte.objects
Each item in the databytes_all queryset has many attributes but one of them is publish_date.
I'd like to order the queryset by publish_date, however if publish_date is None, I'd like the item to be at the end of the queryset.
This is what I'm trying but it's not working:
databytes_all = DataByte.objects
Make a queryset: select all of the items whose publish date is None
no_date = databytes_all.filter(publish_date=None)
Make another queryset: exclude all of the items where publish date is None, then order the remaining items by publish date
databytes_with_date = databytes_all.order_by('-publish_date').exclude(publish_date=None)
Now combine the two querysets (however, this doesn't work: the items with no publish date are first in the list when I want them to be last)
databytes = databytes_with_date | no_date
You do not need to filter, you can specify that the NULL should be last with:
from django.db.models import F
databytes_all.order_by(F('publish_date').desc(nulls_last=True))
Here we use an F object and call its .desc(…) method. This method has a nulls_last=… parameter that we can use to specify that items with NULL should be selected as the last records.
This will sort with:
SELECT …
FROM …
ORDER BY publish_date IS NULL, publish_date DESC
Since False/0 is considered less than True/1, this moves the items whose publish_date is NULL to the bottom of the records you retrieve.
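The ORDER BY trick can be checked outside Django. A minimal sketch with an in-memory SQLite database, where the table name and rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE databyte (id INTEGER PRIMARY KEY, publish_date TEXT);
    INSERT INTO databyte (id, publish_date) VALUES
        (1, '2020-01-01'),
        (2, NULL),
        (3, '2021-05-05');
""")

# "publish_date IS NULL" evaluates to 0 for dated rows and 1 for NULL rows,
# so dated rows sort first, newest to oldest, and NULL rows go last
ordered_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM databyte "
        "ORDER BY publish_date IS NULL, publish_date DESC"
    )
]

print(ordered_ids)
```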
In Django, you can combine several QuerySets using itertools.chain. Note that the result is a plain list, not a QuerySet.
Try this:
from itertools import chain
...
combined_results = list(chain(databytes_with_date, no_date))
I have a model which contains a field birth_year, and in another model I have the user registration date.
I have a list of user ids, and I want to query whether their age falls within a particular range. User age is calculated as registration date - birth_year.
I was able to calculate it from the current date as:
from datetime import date
from dateutil.relativedelta import relativedelta

today = date.today()
startAge = 25
endAge = 50
ageStartRange = (today - relativedelta(years=startAge)).year
ageEndRange = (today - relativedelta(years=endAge)).year
and I made the query as:
query.filter(profile_id__in=communityUsersIds, birth_year__lte=ageStartRange, birth_year__gte=ageEndRange).values('profile_id')
This way I get the user ids whose age is in the range between 25 and 50. Instead of today, how can I use the user's registration_date (a field in another model)?
You can use native DB functions. Works like a charm using Postgres.
from django.contrib.auth.models import User
from django.db.models import DurationField, IntegerField, F, Func

class Age(Func):
    function = 'AGE'
    output_field = DurationField()

class AgeYears(Func):
    template = 'EXTRACT (YEAR FROM %(function)s(%(expressions)s))'
    function = 'AGE'
    output_field = IntegerField()

users = User.objects.annotate(age=Age(F("dob")), age_years=AgeYears(F("dob"))).filter(age_years__gte=18)
for user in users:
    print(user.age, user.age_years)
    # which will generate result like below
    # 10611 days, 0:00:00 29
The "today" version of the query was easy to do, because the "today" date doesn't depend on the individual fields in the row.
F Expressions
You can explore Django's F expressions as they allow you to reference the fields of the model in your queries (without pulling them into Python)
https://docs.djangoproject.com/en/1.7/topics/db/queries/#using-f-expressions-in-filters
e.g. for you, the age would be this F expression:
F('registration_date__year') - F('birth_year')
However, we don't really need to calculate that, because e.g. to query for what you want, consider this query:
Model.objects.filter(birth_year__lte=F('registration_date__year') - 25)
From that you can add:
birth_year__gte=F('registration_date__year') - 50,
or use birth_year__range=(F('registration_date__year') - 50, F('registration_date__year') - 25)
Alternative: precalculate age value
Otherwise you can precalculate that age, since that value is knowable at user registration time
Model.objects.update(age=F('registration_date__year') - F('birth_year'))
Once that is saved, it's as simple as Model.objects.filter(age__range=(25, 50))
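The bounds on birth_year are easy to get backwards, so here is a small plain-Python sketch of the arithmetic (no Django involved; birth_year_range is a hypothetical helper). An age between 25 and 50 at registration means the birth year lies between registration year minus 50 and registration year minus 25.

```python
def birth_year_range(registration_year, min_age=25, max_age=50):
    """Return (lowest, highest) birth year giving an age within [min_age, max_age]."""
    return registration_year - max_age, registration_year - min_age


def age_at_registration(birth_year, registration_year):
    """Age in whole calendar years, as defined in the question."""
    return registration_year - birth_year


low, high = birth_year_range(2015)
print(low, high)
```

For a registration year of 2015 this gives birth years 1965 through 1990, i.e. ages 50 down to 25.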
I have a model Book which has a field year_of_publishing. A user inputs the year and I want to filter the Book's set getting all the books published in that year.
year = self.cleaned_data.get('year', SOME_DEFAULT_VALUE)
books = Book.objects.filter(year_of_publishing=year)
However, the user might leave the year field blank. In that case I want a default value that, when passed to .filter(), makes the Django ORM return all the books, as if the filter were not present at all. What value should I use? I suppose it should be type-independent, so I can use it for Char-, Choice-, and other types of fields.
You can pass an empty dictionary to .filter() to return all results, or filter on the fields/values you want:
filters = {}
year = request.GET.get('year')
if year:
    filters['year_of_publishing'] = year
books = Book.objects.filter(**filters)
You can just build a queryset with a dictionary:
parameters = ['year', 'foo', 'bar']
query_dict = {}
for parameter in parameters:
    if self.cleaned_data.get(parameter, None):
        query_dict[parameter] = self.cleaned_data.get(parameter)
books = Book.objects.filter(**query_dict)
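The dictionary-building step can be factored out and tested without Django. A minimal sketch, where build_filters and the field mapping are hypothetical names for illustration: it skips only None and empty strings, so a legitimate value such as 0 is kept, and it lets the form field name differ from the model field name.

```python
def build_filters(cleaned_data, field_map):
    """Return filter kwargs for the fields the user actually filled in.

    field_map maps form field names to model field names.
    """
    filters = {}
    for form_field, model_field in field_map.items():
        value = cleaned_data.get(form_field)
        if value not in (None, ""):
            filters[model_field] = value
    return filters


# Hypothetical mapping: the form calls it 'year', the model 'year_of_publishing'
FIELD_MAP = {"year": "year_of_publishing", "author": "author", "title": "title"}

filters = build_filters({"year": 1999, "author": "", "title": None}, FIELD_MAP)
print(filters)
```

The result would then be passed on as Book.objects.filter(**filters); an empty dict applies no filtering at all.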
Assume I have such a model:
class Entity(models.Model):
    start_time = models.DateTimeField()
I want to regroup them into a list of lists, where each inner list contains the Entities from the same date (same day; the time should be ignored).
How can this be achieved in a pythonic way?
Thanks
Create a small function to extract just the date:
def extract_date(entity):
    'extracts the starting date from an entity'
    return entity.start_time.date()
Then you can use it with itertools.groupby:
from itertools import groupby

entities = Entity.objects.order_by('start_time')
for start_date, group in groupby(entities, key=extract_date):
    do_something_with(start_date, list(group))
Or, if you really want a list of lists:
entities = Entity.objects.order_by('start_time')
list_of_lists = [list(g) for t, g in groupby(entities, key=extract_date)]
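Here is the same idea as a self-contained sketch that runs without a database. The namedtuple is a hypothetical stand-in for the Entity model, and sorted() plays the role of order_by('start_time'); the sort matters because groupby only groups adjacent items.

```python
from collections import namedtuple
from datetime import datetime
from itertools import groupby

# Hypothetical stand-in for the Entity model instances
Entity = namedtuple("Entity", ["name", "start_time"])

entities = [
    Entity("b", datetime(2020, 1, 2, 9, 0)),
    Entity("a", datetime(2020, 1, 1, 23, 30)),
    Entity("c", datetime(2020, 1, 2, 18, 45)),
]


def extract_date(entity):
    """Return only the date part of the entity's start time."""
    return entity.start_time.date()


# Sort first (the queryset's order_by does this in the original answer)
ordered = sorted(entities, key=lambda e: e.start_time)
list_of_lists = [list(g) for _, g in groupby(ordered, key=extract_date)]

# Names per day, for easy inspection
names = [[e.name for e in group] for group in list_of_lists]
print(names)
```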
I agree with the answer:
Product.objects.extra(select={'day': 'date( date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
But there is another point:
the argument of date() cannot use the double-underscore syntax to reference a foreign-key field;
you have to use table_name.field_name instead:
result = Product.objects.extra(select={'day': 'date( product.date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
where product is the table name.
Also, you can use print(result.query) to see the SQL in the console.