Assume I have a model like this:
class Entity(models.Model):
    start_time = models.DateTimeField()
I want to regroup them into a list of lists, where each inner list contains the Entities from the same date (same day; the time should be ignored).
How can this be achieved in a Pythonic way?
Thanks
Create a small function to extract just the date:
def extract_date(entity):
    'extracts the starting date from an entity'
    return entity.start_time.date()
Then you can use it with itertools.groupby:
from itertools import groupby
entities = Entity.objects.order_by('start_time')
for start_date, group in groupby(entities, key=extract_date):
    do_something_with(start_date, list(group))
Or, if you really want a list of lists:
entities = Entity.objects.order_by('start_time')
list_of_lists = [list(g) for t, g in groupby(entities, key=extract_date)]
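Note that groupby only merges adjacent items, which is why the queryset is ordered by start_time first. A minimal sketch of the difference, using plain Python objects in place of model instances (FakeEntity and the sample values are made up for illustration):

```python
from collections import namedtuple
from datetime import datetime
from itertools import groupby

# Hypothetical stand-in for a model instance; only start_time matters here.
FakeEntity = namedtuple("FakeEntity", ["name", "start_time"])

entities = [
    FakeEntity("a", datetime(2020, 1, 1, 9, 0)),
    FakeEntity("b", datetime(2020, 1, 2, 8, 0)),
    FakeEntity("c", datetime(2020, 1, 1, 17, 30)),
]


def extract_date(entity):
    """Return the calendar date of an entity, ignoring the time."""
    return entity.start_time.date()


# Unsorted input: the two January 1 entities are not adjacent,
# so groupby puts them into separate groups.
unsorted_groups = [list(g) for _, g in groupby(entities, key=extract_date)]

# Sorted input: entities from the same day are adjacent and end up together.
sorted_entities = sorted(entities, key=lambda e: e.start_time)
sorted_groups = [list(g) for _, g in groupby(sorted_entities, key=extract_date)]

print(len(unsorted_groups), len(sorted_groups))
```

With the sample data above, the unsorted input yields three groups and the sorted input yields two.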
I agree with the answer:
from django.db.models import Count
Product.objects.extra(select={'day': 'date( date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
One more point: the argument of date() cannot use the double-underscore syntax to reach a foreign-key field; you have to write table_name.field_name instead:
result = Product.objects.extra(select={'day': 'date( product.date_created )'}).values('day') \
    .annotate(available=Count('date_created'))
Here product is the table name.
You can also use print(result.query) to see the generated SQL in the console.
I have a queryset in my Django application:
databytes_all = DataByte.objects
Each item in the databytes_all queryset has many attributes but one of them is publish_date.
I'd like to order the queryset by publish_date, however if publish_date is None, I'd like the item to be at the end of the queryset.
This is what I'm trying but it's not working:
databytes_all = DataByte.objects
# Make a queryset: keep only the items whose publish date is None
no_date = databytes_all.filter(publish_date=None)
# Make another queryset: exclude all of the items where publish date is None, then order the remaining items by publish date
databytes_with_date = databytes_all.order_by('-publish_date').exclude(publish_date=None)
# Now combine the two querysets (however, this doesn't work: the items with no publish date come first in the list when I want them to be last)
databytes = databytes_with_date | no_date
You do not need to filter; you can specify that the NULLs should come last with:
from django.db.models import F
databytes_all.order_by(F('publish_date').desc(nulls_last=True))
Here we use an F object and call its .desc(…) method. This method has a nulls_last=… parameter that specifies that items with NULL should be returned as the last records.
This will sort with:
SELECT …
FROM …
ORDER BY publish_date IS NULL, publish_date DESC
Since False/0 is considered less than True/1, this moves the items whose publish_date is NULL to the bottom of the records you retrieve.
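The effect of that ORDER BY clause can be checked outside Django. The sketch below runs the same ordering against an in-memory SQLite database; the databyte table, its rows, and the extra id tie-breaker are assumptions made for illustration, not part of the original query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databyte (id INTEGER PRIMARY KEY, publish_date TEXT)")
conn.executemany(
    "INSERT INTO databyte (id, publish_date) VALUES (?, ?)",
    [(1, "2020-01-01"), (2, None), (3, "2021-06-15"), (4, None)],
)

# "publish_date IS NULL" evaluates to 0 for dated rows and 1 for NULL rows,
# so dated rows sort first; within them, newer dates come first.
# The trailing id is only a tie-breaker to make the NULL rows deterministic.
rows = conn.execute(
    "SELECT id, publish_date FROM databyte "
    "ORDER BY publish_date IS NULL, publish_date DESC, id"
).fetchall()
conn.close()

print(rows)
```

The dated rows come back newest first, followed by the rows without a publish date.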
In Django, you can combine multiple QuerySets into a single list using itertools.chain.
Try this:
from itertools import chain
...
combined_results = list(chain(databytes_with_date, no_date))
I’m trying to find duplicates of a Django model-object's instance based on grandparent-instance id and filter out older duplicates based on timestamp field.
I suppose I could do this with the distinct(*specify_fields) function, but I don't use a PostgreSQL database. I managed to achieve this with the following code:
import itertools
queryset = MyModel.objects.filter(some_filtering…) \
    .only('parent_id__grandparent_id', 'timestamp', 'regular_fields'...) \
    .values('parent_id__grandparent_id', 'timestamp', 'regular_fields'...)
# compare_all_combinations_and_remove_duplicates_with_older_timestamps
list_of_dicts = list(queryset)
for a, b in itertools.combinations(list_of_dicts, 2):
    if a['parent_id__grandparent_id'] == b['parent_id__grandparent_id']:
        if a['timestamp'] > b['timestamp']:
            list_of_dicts.remove(b)
        else:
            list_of_dicts.remove(a)
However, this feels hacky and I guess this is not an optimal solution. Is there a better way (by better I mean more optimal, i.e. minimizing the number of times querysets are evaluated etc.)? Can I do the same with queryset’s methods?
My models look something like this:
class MyModel(models.Model):
    parent_id = models.ForeignKey('Parent'…
    timestamp = …
    regular_fields = …
class Parent(models.Model):
    grandparent_id = models.ForeignKey('Grandparent'…
class Grandparent(models.Model):
    …
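One possible way to avoid the pairwise comparison, offered as a rough sketch rather than a tested solution for this schema: evaluate the queryset once and keep only the newest row per grandparent id in a dictionary. The sample rows and the newest_per_group helper below are hypothetical; the rows are only shaped like the .values() output above.

```python
from datetime import datetime

# Hypothetical rows shaped like the dictionaries returned by .values().
rows = [
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 1, 1), "regular_fields": "old"},
    {"parent_id__grandparent_id": 2, "timestamp": datetime(2020, 3, 1), "regular_fields": "only"},
    {"parent_id__grandparent_id": 1, "timestamp": datetime(2020, 2, 1), "regular_fields": "new"},
]


def newest_per_group(rows, key="parent_id__grandparent_id"):
    """Keep the row with the latest timestamp for each value of `key`."""
    newest = {}
    for row in rows:
        current = newest.get(row[key])
        # Store the row if the group is new or this row is more recent.
        if current is None or row["timestamp"] > current["timestamp"]:
            newest[row[key]] = row
    return list(newest.values())


deduplicated = newest_per_group(rows)
print([r["regular_fields"] for r in deduplicated])
```

This is a single pass over the rows instead of comparing every pair, and it never removes items from a list while iterating over it.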
I'm looking for a way to combine a custom date value and a time field in Django. My model only contains a time field. Now I have to annotate a new field combining a custom date and the time field. I thought the following code would solve my problem, but it only gives the date value; the TimeField is ignored.
class MyModel(models.Model):
    my_time_field = TimeField()
custom_date = datetime.today().date()
objects = MyModel.objects.annotate(
    custom_datetime=Func(
        custom_date + F('my_time_field'),
        function='DATE'
    )
)
Please advise the right way to solve this issue.
You should be able to use a Value expression (see the docs) in a manner similar to this:
class MyModel(models.Model):
    my_time_field = TimeField()
custom_date = datetime.today().date()
MyModel.objects.annotate(
    custom_datetime=Value(
        datetime.datetime.combine(custom_date, F('my_time_field')),
        output_field=DateTimeField()))
The key parts are to combine your custom_date, the time from your my_time_field, and then output it as a DateTimeField.
It was an easy solution, but it took me a while to figure out. If anyone else has the same question, this is the answer: just use ExpressionWrapper.
objects = MyModel.objects.annotate(
    custom_datetime=ExpressionWrapper(
        custom_date + F('my_time_field'),
        output_field=DateTimeField()
    )
)
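For reference, the value the annotation is meant to produce for each row is what datetime.combine gives in plain Python. A minimal sketch, assuming a fixed custom_date and made-up time values standing in for my_time_field:

```python
from datetime import date, datetime, time

# Assumed fixed date so the result is reproducible (the question uses today's date).
custom_date = date(2021, 5, 17)

# Hypothetical values standing in for my_time_field on two rows.
time_values = [time(8, 30), time(23, 59, 59)]

# One datetime per row: the custom date plus that row's time.
combined = [datetime.combine(custom_date, t) for t in time_values]

for value in combined:
    print(value.isoformat())
```

This can also serve as a fallback when the combination has to happen in Python after the rows are fetched rather than in the database.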
How to calculate total by month without using extra?
I'm currently using:
Django 1.8
PostgreSQL 9.3.13
Python 2.7
What I have tried so far.
# Doesn't work for me, but I don't mind because I don't want to use extra
truncate_month = connection.ops.date_trunc_sql('month','day')
invoices = Invoice.objects.filter(is_deleted=False, company=company).extra({'month': truncate_month}).values('month').annotate(Sum('total'))
----
# It works, but I think it's too slow if I query a big set of data
for current_month in range(1, 13):
    Invoice.objects.filter(date__month=current_month).annotate(total=Sum("total"))
I also looked at this one; the answer seems great, but I can't import TruncMonth:
Django: Group by date (day, month, year)
P.S. I know that this question has already been asked multiple times, but I haven't seen any answer.
Thanks!
SOLUTION:
Thanks to @Vin-G's answer.
First, you have to define a Func subclass that extracts the month for you:
from django.db import models
from django.db.models import Func
class Month(Func):
    function = 'EXTRACT'
    template = '%(function)s(MONTH from %(expressions)s)'
    output_field = models.IntegerField()
After that, all you need to do is:
1. annotate each row with the month
2. group the results by the annotated month using values()
3. annotate each result with the aggregated sum of the totals using Sum()
Important: if your model class has a default ordering specified in the meta options, then you will have to add an empty order_by() clause. This is because of https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate.
If you are unsure, you could just add the empty order_by() clause anyway without any adverse effects.
i.e.
from django.db.models import Sum
summary = (Invoice.objects
           .annotate(m=Month('date'))
           .values('m')
           .annotate(total=Sum('total'))
           .order_by())
See the full gist here: https://gist.github.com/alvingonzales/ff9333e39d221981e5fc4cd6cdafdd17
If you need further information:
Details on creating your own Func classes: https://docs.djangoproject.com/en/1.8/ref/models/expressions/#func-expressions
Details on the values() clause, (pay attention to how it interacts with annotate() with respect to order of the clauses):
https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#values
the order in which annotate() and values() clauses are applied to a query is significant. If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
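To see what the grouped query computes, here is a sketch of the equivalent SQL run against an in-memory SQLite database. SQLite has no EXTRACT(), so strftime('%m', ...) is swapped in for it here; the invoice table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, date TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO invoice (date, total) VALUES (?, ?)",
    [("2016-01-05", 100), ("2016-01-20", 50), ("2016-03-02", 70)],
)

# Step 1: compute the month for each row (strftime stands in for EXTRACT).
# Step 2: group by that month.
# Step 3: sum the totals within each group.
summary = conn.execute(
    "SELECT CAST(strftime('%m', date) AS INTEGER) AS m, SUM(total) AS month_total "
    "FROM invoice GROUP BY m ORDER BY m"
).fetchall()
conn.close()

print(summary)
```

Each result row is a (month, total) pair, which corresponds to the dictionaries with keys m and total that the queryset above returns.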
from django.db.models import Sum
result = (
    Invoice.objects
    .all()
    .values_list('created_at__year', 'created_at__month')
    .annotate(Sum('total'))
    .order_by('created_at__year', 'created_at__month')
)
itertools.groupby is the performant option in Python and can be utilized with a single db query:
from itertools import groupby
invoices = Invoice.objects.only('date', 'total').order_by('date')
month_totals = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda i: i.date.month)
}
month_totals
# {1: 100, 3: 100, 4: 500, 7: 500}
I am not aware of a pure django ORM solution. The date__month filter is very limited and cannot be used in values, order_by, etc.
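One caveat with keying on the month alone: if the invoices span several years, the same month from different years forms separate groups, and the later one overwrites the earlier one in the dictionary. A sketch that keys on (year, month) instead, using made-up FakeInvoice tuples in place of model instances:

```python
from collections import namedtuple
from datetime import date
from itertools import groupby

# Hypothetical stand-in for Invoice rows, already ordered by date.
FakeInvoice = namedtuple("FakeInvoice", ["date", "total"])

invoices = [
    FakeInvoice(date(2015, 1, 10), 100),
    FakeInvoice(date(2015, 3, 1), 40),
    FakeInvoice(date(2016, 1, 5), 25),
    FakeInvoice(date(2016, 1, 9), 75),
]

# Grouping by (year, month) keeps January 2015 and January 2016 apart.
month_totals = {
    key: sum(inv.total for inv in group)
    for key, group in groupby(invoices, key=lambda inv: (inv.date.year, inv.date.month))
}

print(month_totals)
```

As with the original snippet, this relies on the input being sorted by date so that rows from the same month are adjacent.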
Don't forget that Django querysets provide a native datetimes() method, which lets you easily pull all of the days/weeks/months/years out of any queryset for models with a datetime field. So if the Invoice model above has a created datetime field, and you want totals for each month in your queryset, you can just do:
from django.db.models import Sum
invoices = Invoice.objects.all()
months = invoices.datetimes("created", kind="month")
for month in months:
    month_invs = invoices.filter(created__month=month.month)
    month_total = month_invs.aggregate(total=Sum("otherfield")).get("total")
    print(f"Month: {month}, Total: {month_total}")
No external functions or deps needed.
I don't know if my solution is faster than yours. You should profile it. Nonetheless, I only query the db once instead of 12 times.
#utils.py
from django.db.models import Count, Sum
from .models import Sale
def get_total_per_month_value():
    """
    Return the total of sales per month
    ReturnType: [Dict]
    {'December': 3400, 'February': 224, 'January': 792}
    """
    result = {}
    db_result = Sale.objects.values('price','created')
    for i in db_result:
        month = str(i.get('created').strftime("%B"))
        if month in result.keys():
            result[month] = result[month] + i.get('price')
        else:
            result[month] = i.get('price')
    return result
#models.py
class Sale(models.Model):
    price = models.PositiveSmallIntegerField()
    created = models.DateTimeField(_(u'Published'), default="2001-02-24")
#views.py
from .utils import get_total_per_month_value
# ...
result = get_total_per_month_value()
#test.py
import pytest
from mixer.backend.django import mixer
#Allow the tests in this module to access the database
pytestmark = pytest.mark.django_db
def test_get_total_per_month():
    from .utils import get_total_per_month_value
    selected_date = ['01','02','03','01','01']
    #2016-01-12 == YYYY-MM-DD
    for i in selected_date:
        mixer.blend('myapp.Sale', created="2016-"+i+"-12")
    values = get_total_per_month_value() #return a dict
    months = values.keys()
    assert 'January' in months, 'Should include January'
    assert 'February' in months, 'Should include February'
    assert len(months) == 3, 'Should aggregate the months'
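The accumulation loop in get_total_per_month_value can be written a little more compactly with collections.defaultdict. A sketch with made-up rows shaped like the output of Sale.objects.values('price', 'created'); total_per_month is a hypothetical helper name, and the month names assume the default (English) locale for strftime("%B"):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows standing in for Sale.objects.values('price', 'created').
db_result = [
    {"price": 100, "created": datetime(2016, 1, 12)},
    {"price": 24, "created": datetime(2016, 2, 12)},
    {"price": 50, "created": datetime(2016, 1, 12)},
]


def total_per_month(rows):
    """Sum prices per month name; missing months start at 0 automatically."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["created"].strftime("%B")] += row["price"]
    return dict(totals)


result = total_per_month(db_result)
print(result)
```

The defaultdict removes the need for the "if month in result" branch, since an unseen month name starts from zero.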
I have a reservation model with fields like booked date, commission amount, total booking amount, etc., and based on the year provided I have to aggregate the reservations by month. Here is how I did that:
from django.db.models import Count, Sum
from django.db.models.functions import ExtractMonth
Reservation.objects.filter(
    booked_date__year=year
).values(
    'id',
    'booked_date',
    'commission_amount',
    'total_amount'
).annotate(
    month=ExtractMonth('booked_date')
).values('month').annotate(
    total_commission_amount=Sum('commission_amount'),
    total_revenue_amount=Sum('total_amount'),
    total_booking=Count('id')
).order_by()
I would like to do a SUM on rows in a database and group by date.
I am trying to run this SQL query using Django aggregates and annotations:
select strftime('%m/%d/%Y', time_stamp) as the_date, sum(numbers_data)
from my_model
group by the_date;
I tried the following:
data = My_Model.objects.values("strftime('%m/%d/%Y', time_stamp)").annotate(Sum("numbers_data")).order_by()
but it seems like you can only use column names in the values() function; it doesn't like the use of strftime().
How should I go about this?
This works for me:
select_data = {"d": """strftime('%%m/%%d/%%Y', time_stamp)"""}
data = My_Model.objects.extra(select=select_data).values('d').annotate(Sum("numbers_data")).order_by()
Took a bit to figure out I had to escape the % signs.
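The doubled percent signs are presumably needed because the SQL string passes through %-style parameter interpolation before it reaches the database. A small sketch of that effect using plain Python string formatting and an in-memory SQLite table (the rows are made up, and the interpolation step here only imitates what the database layer does):

```python
import sqlite3

# The escaped fragment, as written in select_data above.
escaped = "strftime('%%m/%%d/%%Y', time_stamp)"

# %-interpolation with no parameters collapses each "%%" into a single "%".
unescaped = escaped % ()
print(unescaped)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_model (time_stamp TEXT, numbers_data INTEGER)")
conn.executemany(
    "INSERT INTO my_model (time_stamp, numbers_data) VALUES (?, ?)",
    [
        ("2020-01-01 08:00:00", 1),
        ("2020-01-01 20:00:00", 2),
        ("2020-01-02 09:00:00", 5),
    ],
)

# The query from the question, with the unescaped fragment spliced in.
rows = conn.execute(
    "SELECT " + unescaped + " AS the_date, SUM(numbers_data) "
    "FROM my_model GROUP BY the_date ORDER BY the_date"
).fetchall()
conn.close()

print(rows)
```

Without the doubling, the interpolation step would try to treat %m, %d and %Y as format specifiers.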
As of v1.8, you can use Func() expressions.
For example, if you happen to be targeting AWS Redshift's date and time functions:
from django.db.models import F, Func, Value
def TimezoneConvertedDateF(field_name, tz_name):
    tz_fn = Func(Value(tz_name), F(field_name), function='CONVERT_TIMEZONE')
    dt_fn = Func(tz_fn, function='TRUNC')
    return dt_fn
Then you can use it like this:
SomeDbModel.objects \
    .annotate(the_date=TimezoneConvertedDateF('some_timestamp_col_name',
                                              'America/New_York')) \
    .filter(the_date=...)
or like this:
SomeDbModel.objects \
    .annotate(the_date=TimezoneConvertedDateF('some_timestamp_col_name',
                                              'America/New_York')) \
    .values('the_date') \
    .annotate(...)
Is there any reason not to just do this in the database, by running the following query?
select date, sum(numbers_data)
from my_model
group by date;
If your answer is that the date is a datetime with non-zero hours, minutes, seconds, or milliseconds, my answer is to use a date function to truncate the datetime, but I can't tell you exactly what that is without knowing what RDBMS you're using.
I'm not sure about strftime; my solution below uses PostgreSQL's date_trunc:
select_data = {"date": "date_trunc('day', creationtime)"}
ttl = ReportWebclick.objects.using('cms')\
    .extra(select=select_data)\
    .filter(**filters)\
    .values('date', 'tone_name', 'singer', 'parthner', 'price', 'period')\
    .annotate(loadcount=Sum('loadcount'), buycount=Sum('buycount'), cancelcount=Sum('cancelcount'))\
    .order_by('date', 'parthner')
-- corresponds to this SQL query:
select date_trunc('day', creationtime) as date, tone_name, sum(loadcount), sum(buycount), sum(cancelcount)
from webclickstat
group by tone_name, date;
My solution looks like this when the database is MySQL:
select_data = {"date":"""FROM_UNIXTIME( action_time,'%%Y-%%m-%%d')"""}
qs = ViewLogs.objects.filter().extra(select=select_data).values('mall_id', 'date').annotate(pv=Count('id'), uv=Count('visitor_id', distinct=True))
To decide which function to use, see the MySQL date and time function docs, e.g. DATE_FORMAT, FROM_UNIXTIME...