I’m trying to find duplicate instances of a Django model that share the same grandparent instance id, and to filter out the older duplicates based on a timestamp field.
I suppose I could do this with distinct(*specify_fields), but passing field names to distinct() is only supported on PostgreSQL, which I don’t use (docs). I managed to achieve this with the following code:
queryset = MyModel.objects.filter(some_filtering…) \
    .only('parent_id__grandparent_id', 'timestamp', 'regular_fields'...) \
    .values('parent_id__grandparent_id', 'timestamp', 'regular_fields'...)
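# compare_all_combinations_and_remove_duplicates_with_older_timestamps
import itertools
list_of_dicts = list(queryset)
for a, b in itertools.combinations(list_of_dicts, 2):
    # Skip pairs where one row was already removed by an earlier comparison,
    # otherwise list.remove() raises ValueError.
    if a not in list_of_dicts or b not in list_of_dicts:
        continue
    if a['parent_id__grandparent_id'] == b['parent_id__grandparent_id']:
        if a['timestamp'] > b['timestamp']:
            list_of_dicts.remove(b)
        else:
            list_of_dicts.remove(a)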
However, this feels hacky and I guess this is not an optimal solution. Is there a better way (by better I mean more optimal, i.e. minimizing the number of times querysets are evaluated etc.)? Can I do the same with queryset’s methods?
My models look something like this:
class MyModel(models.Model):
    parent_id = models.ForeignKey('Parent'…
    timestamp = …
    regular_fields = …
class Parent(models.Model):
    grandparent_id = models.ForeignKey('Grandparent'…
class Grandparent(models.Model):
    …
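One way to avoid comparing every pair is a single pass that remembers the newest row seen so far for each grandparent id. The sketch below works on plain dicts shaped like the .values() output in the question; the helper name keep_newest_per_group and the sample rows are made up for illustration, and the deduplication still happens in Python rather than in the database.

```python
from datetime import datetime


def keep_newest_per_group(rows, group_key="parent_id__grandparent_id",
                          ts_key="timestamp"):
    """Return one row per group_key value: the row with the largest ts_key."""
    newest = {}
    for row in rows:
        current = newest.get(row[group_key])
        # Keep the row if it is the first for its group or newer than the stored one.
        if current is None or row[ts_key] > current[ts_key]:
            newest[row[group_key]] = row
    return list(newest.values())


# Hypothetical rows, standing in for list(queryset).
rows = [
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 1, 1)},
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 6, 1)},
    {"parent_id__grandparent_id": 2, "timestamp": datetime(2020, 3, 1)},
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 2, 1)},
]
deduped = keep_newest_per_group(rows)
```

The queryset is still evaluated only once, and the loop is linear in the number of rows instead of quadratic.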
Suppose I have an object model A with a one-to-many relationship with B in Peewee, using an SQLite backend. I want to fetch some set of A and join each with its most recent B. Is there a way to do this without looping?
import datetime
from peewee import CharField, DateTimeField, ForeignKeyField, Model
class A(Model):
    some_field = CharField()
class B(Model):
    a = ForeignKeyField(A)
    date = DateTimeField(default=datetime.datetime.now)
The naive way would be to call order_by and limit(1), but that would apply to the entire query, so
q = A.select().join(B).order_by(B.date.desc()).limit(1)
will naturally produce a singleton result, as will
q = B.select().order_by(B.date.desc()).limit(1).join(A)
I am either using prefetch wrong or it doesn't work for this, because
q1 = A.select()
q2 = B.select().order_by(B.date.desc()).limit(1)
q3 = prefetch(q1,q2)
len(q3[0].a_set)
len(q3[0].a_set_prefetch)
Neither of those sets has length 1, as desired. Does anyone know how to do this?
I realized I needed to understand functions and group_by:
q = B.select().join(A).group_by(A).having(fn.Max(B.date)==B.date)
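For comparison, the same "latest B per A" idea can be written in plain SQL by joining against a per-A MAX(date) subquery. This is only a sketch using the standard library's sqlite3 module; the table and column names (a, b, a_id) and the sample rows are assumptions, and this is not necessarily the SQL that Peewee generates for the models above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, some_field TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(id), date TEXT);
INSERT INTO a VALUES (1, 'first'), (2, 'second');
INSERT INTO b VALUES
    (1, 1, '2020-01-01 00:00:00'),
    (2, 1, '2020-03-01 00:00:00'),
    (3, 2, '2020-02-01 00:00:00');
""")

# For each a_id, find the maximum date, then join back to pick the matching b row.
latest_per_a = conn.execute("""
SELECT a.id, a.some_field, b.id, b.date
FROM a
JOIN b ON b.a_id = a.id
JOIN (SELECT a_id, MAX(date) AS max_date FROM b GROUP BY a_id) AS latest
    ON latest.a_id = b.a_id AND latest.max_date = b.date
ORDER BY a.id
""").fetchall()
```

If two B rows for the same A share the maximum date, this returns both of them, so a tie-breaker (for example on b.id) may be needed.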
You can use it this way only if you want the latest date, not the date of the last entry. If the date of the last entry was set explicitly rather than left at the default (datetime.datetime.now), it may not be the latest one, and this query will be wrong.
You can find the date of the last entry:
last_entry_date = B.select(B.date).order_by(B.id.desc()).limit(1).scalar()
and then the related A records with this date,
with both the A and B fields:
q = A.select(A, B).join(B).where(B.date == last_entry_date)
or with only the A fields:
q = A.select().join(B).where(B.date == last_entry_date)
If you want to find the latest B.date (as you do with the fn.Max(B.date)) and use it as the where filter:
latest_date = B.select(B.date).order_by(B.date.desc()).limit(1).scalar()
How to calculate total by month without using extra?
I'm currently using:
Django 1.8
PostgreSQL 9.3.13
Python 2.7
Example.
What I have tried so far.
#Doesn't work for me but I don't mind because I don't want to use extra
truncate_month = connection.ops.date_trunc_sql('month','day')
invoices = Invoice.objects.filter(is_deleted = False,company = company).extra({'month': truncate_month}).values('month').annotate(Sum('total'))
----
#It works but I think that it's too slow if I query a big set of data
for current_month in range(1, 13):
    Invoice.objects.filter(date__month=current_month).aggregate(total=Sum("total"))
I also looked at this question. The answer seems great, but I can't import TruncMonth (it was only added in Django 1.10):
Django: Group by date (day, month, year)
P.S. I know that this question has already been asked multiple times, but I haven't found an answer that works for me.
Thanks!
SOLUTION:
Thanks to @Vin-G's answer.
First, you have to make a Function that can extract the month for you:
from django.db import models
from django.db.models import Func
class Month(Func):
    function = 'EXTRACT'
    template = '%(function)s(MONTH from %(expressions)s)'
    output_field = models.IntegerField()
After that all you need to do is
annotate each row with the month
group the results by the annotated month using values()
annotate each result with the aggregated sum of the totals using Sum()
Important: if your model class has a default ordering specified in the meta options, then you will have to add an empty order_by() clause. This is because of https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate.
If you are unsure, you could just add the empty order_by() clause anyway without any adverse effects.
i.e.
from django.db.models import Sum
summary = (Invoice.objects
.annotate(m=Month('date'))
.values('m')
.annotate(total=Sum('total'))
.order_by())
See the full gist here: https://gist.github.com/alvingonzales/ff9333e39d221981e5fc4cd6cdafdd17
If you need further information:
Details on creating your own Func classes: https://docs.djangoproject.com/en/1.8/ref/models/expressions/#func-expressions
Details on the values() clause, (pay attention to how it interacts with annotate() with respect to order of the clauses):
https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#values
the order in which annotate() and values() clauses are applied to a query is significant. If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
result = (
Invoice.objects
.all()
.values_list('created_at__year', 'created_at__month')
.annotate(Sum('total'))
.order_by('created_at__year', 'created_at__month')
)
itertools.groupby is the performant option in Python and can be utilized with a single db query:
from itertools import groupby
invoices = Invoice.objects.only('date', 'total').order_by('date')
month_totals = {
k: sum(x.total for x in g)
for k, g in groupby(invoices, key=lambda i: i.date.month)
}
month_totals
# {1: 100, 3: 100, 4: 500, 7: 500}
I am not aware of a pure django ORM solution. The date__month filter is very limited and cannot be used in values, order_by, etc.
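One caveat with grouping on date.month alone: if the invoices span more than one year, the same month number shows up in several groups and the later group overwrites the earlier one in the dict. A minimal sketch that keys on (year, month) instead, using a hypothetical InvoiceRow namedtuple in place of real Invoice objects:

```python
from collections import namedtuple
from datetime import date
from itertools import groupby

# Hypothetical stand-in for Invoice instances with `date` and `total` attributes.
InvoiceRow = namedtuple("InvoiceRow", ["date", "total"])

rows = [
    InvoiceRow(date(2015, 1, 10), 100),
    InvoiceRow(date(2016, 1, 2), 70),
    InvoiceRow(date(2015, 3, 5), 100),
    InvoiceRow(date(2015, 1, 20), 50),
]

# groupby only merges adjacent items, so sort by date first
# (with the ORM this would be the order_by('date') call).
ordered = sorted(rows, key=lambda r: r.date)

month_totals = {
    key: sum(r.total for r in group)
    for key, group in groupby(ordered, key=lambda r: (r.date.year, r.date.month))
}
```

Here January 2015 and January 2016 end up under separate keys, (2015, 1) and (2016, 1), instead of sharing the key 1.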
Don't forget that Django querysets provide a built-in datetimes() method, which lets you easily pull the distinct days/weeks/months/years out of any queryset for models with a datetime field. So if the Invoice model above has a created datetime field, and you want totals for each month in your queryset, you can just do:
from django.db.models import Sum
invoices = Invoice.objects.all()
months = invoices.datetimes("created", kind="month")
for month in months:
    # Filter on the year as well, so the same month of different years is not merged.
    month_invs = invoices.filter(created__year=month.year, created__month=month.month)
    month_total = month_invs.aggregate(total=Sum("otherfield")).get("total")
    print(f"Month: {month}, Total: {month_total}")
No external functions or deps needed.
I don't know if my solution is faster than yours; you should profile it. Nonetheless, I only query the database once instead of 12 times.
#utils.py
from django.db.models import Count, Sum
from .models import Sale
def get_total_per_month_value():
    """
    Return the total of sales per month
    ReturnType: [Dict]
    {'December': 3400, 'February': 224, 'January': 792}
    """
    result = {}
    db_result = Sale.objects.values('price', 'created')
    for i in db_result:
        month = str(i.get('created').strftime("%B"))
        if month in result.keys():
            result[month] = result[month] + i.get('price')
        else:
            result[month] = i.get('price')
    return result
#models.py
class Sale(models.Model):
    price = models.PositiveSmallIntegerField()
    created = models.DateTimeField(_(u'Published'), default="2001-02-24")
#views.py
from .utils import get_total_per_month_value
# ...
result = get_total_per_month_value()
#test.py
import pytest
from mixer.backend.django import mixer
# Allow the tests in this module to access the (test) database
pytestmark = pytest.mark.django_db
def test_get_total_per_month():
    from .utils import get_total_per_month_value
    selected_date = ['01', '02', '03', '01', '01']
    # 2016-01-12 == YYYY-MM-DD
    for i in selected_date:
        mixer.blend('myapp.Sale', created="2016-" + i + "-12")
    values = get_total_per_month_value()  # returns a dict
    months = values.keys()
    assert 'January' in months, 'Should include January'
    assert 'February' in months, 'Should include February'
    assert len(months) == 3, 'Should aggregate the months'
I have a Reservation model with fields such as booked date, commission amount, total booking amount, etc., and for a given year I have to aggregate the reservations by month. Here is how I did that:
from django.db.models import Count, Sum
from django.db.models.functions import ExtractMonth
Reservation.objects.filter(
booked_date__year=year
).values(
'id',
'booked_date',
'commission_amount',
'total_amount'
).annotate(
month=ExtractMonth('booked_date')
).values('month').annotate(
total_commission_amount=Sum('commission_amount'),
total_revenue_amount=Sum('total_amount'),
total_booking=Count('id')
).order_by()
Three different models each have their own date field:
class ModelA(models.Model):
    # some fields here
    date = models.DateField()
class ModelB(models.Model):
    # some fields here
    date = models.DateField()
class ModelC(models.Model):
    # some fields here
    date = models.DateField()
I'd like to get the 50 last objects using the date fields (whatever their class).
For now it works, but I'm doing it in a very inefficient way, as you can see:
from itertools import chain
from operator import attrgetter
all_a = ModelA.objects.all()
all_b = ModelB.objects.all()
all_c = ModelC.objects.all()
last_50_events = sorted(
    chain(all_a, all_b, all_c),
    key=attrgetter('date'),
    reverse=True)[:50]
How can I do it in an efficient way (i.e. without loading useless data)?
The easy solution, which I recommend: load the 50 most recent objects of each type, sort them together, and take the first 50. This loads at most three times as many rows as you need.
A "proper" solution can't be achieved in the ORM with your current schema.
Probably the easiest way is to add a new model with a date field and a generic relation to the other models.
Theoretically you could also do some magic with UNION and raw queries, but anything like this is dirty and needs non-trivial manual processing.
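The "load 50 of each type" approach could be sketched roughly as follows. The Event namedtuple and the latest_events helper are hypothetical stand-ins so the example runs without Django; with the real models, each source would presumably be a sliced queryset such as ModelA.objects.order_by('-date')[:50].

```python
from collections import namedtuple
from datetime import date
from heapq import nlargest
from itertools import chain
from operator import attrgetter

# Hypothetical stand-in for model instances that expose a `date` attribute.
Event = namedtuple("Event", ["source", "date"])


def latest_events(*sources, limit=50):
    """Merge several iterables and return the `limit` newest items, newest first.

    Each source only needs to contain its own `limit` newest rows,
    because no source can contribute more than `limit` items to the result.
    """
    return nlargest(limit, chain.from_iterable(sources), key=attrgetter("date"))


newest_a = [Event("A", date(2021, 5, 1)), Event("A", date(2021, 1, 1))]
newest_b = [Event("B", date(2021, 4, 1)), Event("B", date(2021, 3, 1))]
newest_c = [Event("C", date(2021, 6, 1)), Event("C", date(2020, 12, 1))]

top_two = latest_events(newest_a, newest_b, newest_c, limit=2)
```

With limit=50 this touches at most 150 rows regardless of how large the three tables are.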
Assume I have a model like this:
class Entity(models.Model):
    start_time = models.DateTimeField()
I want to regroup them as a list of lists, where each inner list contains the Entities from the same date (same day; the time should be ignored).
How can this be achieved in a Pythonic way?
Thanks
Create a small function to extract just the date:
def extract_date(entity):
    'extracts the starting date from an entity'
    return entity.start_time.date()
Then you can use it with itertools.groupby:
from itertools import groupby
entities = Entity.objects.order_by('start_time')
for start_date, group in groupby(entities, key=extract_date):
    do_something_with(start_date, list(group))
Or, if you really want a list of lists:
entities = Entity.objects.order_by('start_time')
list_of_lists = [list(g) for t, g in groupby(entities, key=extract_date)]
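Note that groupby only merges adjacent items, which is why the queryset is ordered by start_time first. A small sketch with a hypothetical FakeEntity namedtuple standing in for Entity instances shows the resulting shape:

```python
from collections import namedtuple
from datetime import datetime
from itertools import groupby

# Hypothetical stand-in for Entity instances with a `start_time` attribute.
FakeEntity = namedtuple("FakeEntity", ["start_time"])


def extract_date(entity):
    """Return only the date part of the entity's start time."""
    return entity.start_time.date()


unsorted_entities = [
    FakeEntity(datetime(2020, 1, 2, 9, 0)),
    FakeEntity(datetime(2020, 1, 1, 8, 0)),
    FakeEntity(datetime(2020, 1, 1, 17, 30)),
]

# Equivalent of order_by('start_time'): same-day entities become adjacent.
entities = sorted(unsorted_entities, key=lambda e: e.start_time)
list_of_lists = [list(group) for _, group in groupby(entities, key=extract_date)]
group_sizes = [len(group) for group in list_of_lists]
```

Without the sort, the two entities from 1 January would land in separate inner lists because the 2 January entity sits between them.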
I agree with the answer:
Product.objects.extra(select={'day': 'date( date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
But there is one more point: the argument of date() cannot use the double-underscore syntax to reach a foreign key field; you have to use table_name.field_name instead:
result = Product.objects.extra(select={'day': 'date( product.date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
Here product is the table name.
Also, you can use "print(result.query)" to see the generated SQL in the console.