Django create subquery with values for last n days - python

I am using Django 3.1 with Postgres, and this is my abridged model:
class PlayerSeasonReport(models.Model):
    player = models.ForeignKey(Player)
    competition_season = models.ForeignKey(CompetitionSeason)
class PlayerPrice(models.Model):
    player_season_report = models.ForeignKey(PlayerSeasonReport)
    price = models.IntegerField()
    date = models.DateTimeField()
    # unique on (price, date)
I'm querying on the PlayerSeasonReport to get aggregate information about all players. In particular, I would like the prices for the last n records (the last price, the 7th-to-last price, etc.).
I currently get the PlayerSeasonReport queryset and annotate it like this:
base_query = PlayerSeasonReport.objects.filter(competition_season_id=id)
# This works fine
last_value = base_query.filter(
    pk=OuterRef('pk'),
).order_by(
    'pk',
    '-player_prices__date'
).distinct('pk').annotate(
    value=F('player_prices__price')
)
# Pull the value from a week ago
# This produces a value but is logically incorrect
# I am interested in the 7th-to-last value, not the value from a week before the day of the query
week_ago = datetime.datetime.now() - datetime.timedelta(7)
value_7d_ago = base_query.filter(
    pk=OuterRef('pk'),
    player_prices__date__gte=week_ago,
).order_by(
    'pk',
    'player_prices__date'
).distinct('pk').annotate(
    value=F('player_prices__price')
)
return base_query.annotate(
    value=Subquery(
        last_value.values('value'),
        output_field=FloatField()
    ),
    # Same for value_7d_ago
    # ...
    # Many other annotations
)
Getting the most recent value works fine, but getting the last n values doesn't. I shouldn't be using datetime concepts in my logic, since what I'm really interested in is the nth-to-last values.
I've tried annotating the max date, then filtering based on this annotation, and also somehow slicing the subquery, but I can't seem to get any of it right.
It's worth noting that a price may not exist (there may be no record for n values in the past), in which case it should be null (the annotation based on datetime works)
How can I annotate the price values for the last n days?

Sorted:
base_query = PlayerSeasonReport.objects.filter(id=id)
# ...other manipulations on base query
prices = PlayerPrice.objects.filter(
    player_season_report=OuterRef('pk')
).order_by('-date')
return base_query.annotate(
    price=Subquery(
        prices.values('price')[:1],
        output_field=FloatField()
    ),
    prev_day_price=Subquery(
        prices.values('price')[1:2],
        output_field=FloatField()
    ),
    # ...
)
Explanation:
We query on the child model (PlayerPrice) and join on the pk of the PlayerSeasonReport.
prices.values('price')[i:j] with j = i + 1 selects the single value we want without evaluating the QuerySet, which is essential inside a Subquery. When no such row exists, the subquery returns no rows and the annotation is null.
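For intuition, slicing an ordered queryset translates to LIMIT/OFFSET in SQL, so each annotation becomes a correlated scalar subquery. The sketch below reproduces that shape with Python's built-in sqlite3 module instead of Django and Postgres. The table and column names (report, price, report_id, day) and the helper nth_to_last_prices are made up for illustration and are not part of the original models.

```python
import sqlite3

# Hypothetical stand-in tables, not the original Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY);
    CREATE TABLE price (report_id INTEGER, price INTEGER, day TEXT);
    INSERT INTO report (id) VALUES (1), (2);
    INSERT INTO price VALUES (1, 10, '2021-01-01'), (1, 12, '2021-01-02'),
                             (1, 15, '2021-01-03'), (2, 7, '2021-01-03');
""")

def nth_to_last_prices(conn, n):
    """Return (report id, latest price, price n records before the latest)."""
    sql = """
        SELECT r.id,
               (SELECT p.price FROM price p WHERE p.report_id = r.id
                ORDER BY p.day DESC LIMIT 1 OFFSET 0),
               (SELECT p.price FROM price p WHERE p.report_id = r.id
                ORDER BY p.day DESC LIMIT 1 OFFSET ?)
        FROM report r ORDER BY r.id
    """
    return conn.execute(sql, (n,)).fetchall()

rows = nth_to_last_prices(conn, 1)
```

Report 2 has only one price, so its second column comes back as None, which is the null behaviour asked for in the question.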

Related

Efficiently get first and last model instances in Django Model with timestamp, by day

Suppose you have this model:
from django.db import models
from django.contrib.postgres.indexes import BrinIndex
class MyModel(models.Model):
    device_id = models.IntegerField()
    timestamp = models.DateTimeField(auto_now_add=True)
    my_value = models.FloatField()
    class Meta:
        indexes = (BrinIndex(fields=['timestamp']),)
There is a periodic process that creates an instance of this model every 2 minutes or so. This process is supposed to run for years, with multiple devices, so this table will contain a great number of records.
My goal is, for each day when there are records, to get the first and last records in that day.
So far, what I could come up with is this:
from django.db.models import Min, Max
results = []
device_id = 1 # Could be other device id, of course, but 1 for illustration's sake
# This will get me a list of dictionaries that have first and last fields
# with the desired timestamps, but not the field my_value for them.
first_last = MyModel.objects.filter(device_id=device_id).values('timestamp__date')\
    .annotate(first=Min('timestamp'), last=Max('timestamp'))
# So now I have to iterate over that list to get the instances/values
for f in first_last:
    first = f['first']
    last = f['last']
    first_value = MyModel.objects.get(device_id=device_id, timestamp=first).my_value
    last_value = MyModel.objects.get(device_id=device_id, timestamp=last).my_value
    results.append({
        'first': first,
        'last': last,
        'first_value': first_value,
        'last_value': last_value,
    })
# Do something with results[]
This works, but takes a long time (about 50 seconds on my machine, retrieving first and last values for about 450 days).
I have tried other combinations of annotate(), values(), values_list(), extra() etc, but this is the best I could come up with so far.
Any help or insight is appreciated!
You can take advantage of .distinct() if you are using PostgreSQL as DBMS.
first_models = MyModel.objects.order_by('timestamp__date', 'timestamp').distinct('timestamp__date')
last_models = MyModel.objects.order_by('timestamp__date', '-timestamp').distinct('timestamp__date')
first_last = first_models.union(last_models)
# do something with first_last
One more thing needs to be mentioned: because union() removes duplicates, first_last will contain only one row for a date that has a single record. That should not be a problem for you, but if it is, you can iterate over first_models and last_models separately.

Django annotate with subtract operation returns None when the subtrahend is None

My objective is to get the balance for a billing instance by subtracting the payments posted within the period. In my Billing model, I have a backlog field that contains the backlogs from previous period. The Billing model has m2m relationship with Payment model through PaidBills model.
In my queryset:
qs = Billing.objects.filter(
    bill_batch=prev_batch, balance__gt=0).annotate(
    payment=Sum('paidbills__amount_paid', filter=Q(
        paidbills__pay_period=batch.period))).order_by(
    'reservation__tenant__name', 'reservation__apt_unit').only(
    'id', 'bill_batch', 'reservation')
qs = qs.annotate(new_bal=F('backlog') - F('payment'))
The result is correct when the expression F('payment') contains a value, but new_bal is None when F('payment') is None. When I replace F('payment') with a fixed value, say 5000, it works as expected.
How to go about this? (Django 3.2.7, Python 3.9.5)
I haven't tested this, but Coalesce should do the job when the sum aggregation is None.
from django.db.models import Q, Sum, Value
from django.db.models.functions import Coalesce
qs = Billing.objects.filter(bill_batch=prev_batch, balance__gt=0) \
    .annotate(
        payment=Coalesce(
            Sum('paidbills__amount_paid', filter=Q(paidbills__pay_period=batch.period)),
            Value(0)
        )
    ) \
    .order_by('reservation__tenant__name', 'reservation__apt_unit') \
    .only('id', 'bill_batch', 'reservation')
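The None comes from SQL's NULL semantics: any arithmetic with NULL yields NULL, and a SUM over no matching rows is NULL. As a minimal sketch of why COALESCE helps, the snippet below runs the same subtraction through Python's built-in sqlite3 module. The helper name balance is hypothetical and only stands in for the F('backlog') - F('payment') expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def balance(backlog, payment, use_coalesce):
    """Compute backlog - payment in SQL, optionally wrapping payment in COALESCE."""
    expr = "? - COALESCE(?, 0)" if use_coalesce else "? - ?"
    return conn.execute(f"SELECT {expr}", (backlog, payment)).fetchone()[0]

without_coalesce = balance(5000, None, use_coalesce=False)  # NULL propagates
with_coalesce = balance(5000, None, use_coalesce=True)      # NULL treated as 0
```

With a payment of None, the plain subtraction returns None while the COALESCE version returns the backlog unchanged.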

How to implement cross join in django for a count annotation

I present a simplified version of my problem. I have venues and timeslots and users and bookings, as shown in the model descriptions below. Time slots are universal for all venues, and users can book into a time slot at a venue up until the venue capacity is reached.
class Venue(models.Model):
    name = models.CharField(max_length=200)
    capacity = models.PositiveIntegerField(default=0)
class TimeSlot(models.Model):
    start_time = models.TimeField()
    end_time = models.TimeField()
class Booking(models.Model):
    user = models.ForeignKey(User)
    time_slot = models.ForeignKey(TimeSlot)
    venue = models.ForeignKey(Venue)
Now I would like to as efficiently as possible get all possible combinations of Venues and TimeSlots and annotate the count of the bookings made for each combination, including the case where the number of bookings is 0.
I have managed to achieve this in raw SQL using a cross join on the Venue and TimeSlot tables. Something to the effect of the below. However despite exhaustive searching have not been able to find a django equivalent.
SELECT venue.name, timeslot.start_time, timeslot.end_time, count(booking.id)
FROM myapp_venue as venue
CROSS JOIN myapp_timeslot as timeslot
LEFT JOIN myapp_booking as booking
    on booking.time_slot_id = timeslot.id and booking.venue_id = venue.id
GROUP BY venue.name, timeslot.start_time, timeslot.end_time
I'm also able to annotate a query to retrieve the booking count for combinations that have at least one booking, but combinations with 0 bookings get excluded. Example:
qs = Booking.objects.all().values(
    venue=F('venue__name'),
    start_time=F('time_slot__start_time'),
    end_time=F('time_slot__end_time')
).annotate(bookings=Count('id')) \
    .order_by('venue', 'start_time', 'end_time')
How can I achieve the effect of the CROSS JOIN query using the django ORM?
I don't believe Django has the capability to do cross joins without reverting down to raw SQL. I can give you two ideas that could point you in the right direction though:
Combination of queries and Python loops.
venues = Venue.objects.all()
time_slots = TimeSlot.objects.all()
qs = ...  # your custom query above
# Loop through both querysets to create a master list.
venue_time_slots = []
for venue in venues:
    for time_slot in time_slots:
        # Use a list (not a tuple) so the count can be updated below.
        venue_time_slots.append([venue.name, time_slot.start_time, time_slot.end_time, 0])
# Loop through the master list and compare it to the custom qs to update the count.
for venue_time in venue_time_slots:
    for vt in qs:
        # Check if venue and time slot match (qs rows are dicts because of .values()).
        if (venue_time[0] == vt['venue'] and venue_time[1] == vt['start_time']
                and venue_time[2] == vt['end_time']):
            venue_time[3] += vt['bookings']
            break
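As a possible variation on the loops above, the nested comparison can be replaced by a dictionary lookup, so each venue/slot pair is resolved in one step. This is only a sketch on plain Python data: the function name counts_for_all_pairs and the sample values are hypothetical, and counted_rows stands in for the dicts returned by the values().annotate() query in the question.

```python
from itertools import product

def counts_for_all_pairs(venues, time_slots, counted_rows):
    """Return (venue, start, end, bookings) for every pair, using 0 if unbooked."""
    lookup = {
        (row["venue"], row["start_time"], row["end_time"]): row["bookings"]
        for row in counted_rows
    }
    return [
        (venue, start, end, lookup.get((venue, start, end), 0))
        for venue, (start, end) in product(venues, time_slots)
    ]

# Hypothetical sample data: two venues, two slots, one combination with bookings.
venues = ["Hall", "Gym"]
time_slots = [("09:00", "10:00"), ("10:00", "11:00")]
counted_rows = [
    {"venue": "Hall", "start_time": "09:00", "end_time": "10:00", "bookings": 3},
]
table = counts_for_all_pairs(venues, time_slots, counted_rows)
```

itertools.product plays the role of the CROSS JOIN here, and lookup.get(..., 0) plays the role of the LEFT JOIN that keeps unbooked combinations.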
The harder one, for which I don't have a solution, is to use a combination of filter, exclude, and union. I have only used this with 3 tables (two parents with a child link table), whereas you have 4 including user, so I can only provide the logic and not an example.
# Get all results that exist in table using .filter().
first_query.filter()
# Get all results that do not exist by using .exclude().
# You can use your results from the first query to exclude also, but
# would need to create an interim list.
exclude_ids = [fq_row.id for fq_row in first_query]
second_query.exclude(id__in=exclude_ids)
# Combine both queries
query = first_query.union(second_query)
return query

Django: Query Group By Month

How to calculate total by month without using extra?
I'm currently using:
django 1.8
PostgreSQL 9.3.13
Python 2.7
Example.
What I have tried so far.
# Doesn't work for me, but I don't mind because I don't want to use extra
truncate_month = connection.ops.date_trunc_sql('month', 'day')
invoices = Invoice.objects.filter(is_deleted=False, company=company).extra({'month': truncate_month}).values('month').annotate(Sum('total'))
----
# It works, but I think it's too slow when I query a big set of data
for current_month in range(1, 13):
    Invoice.objects.filter(date__month=current_month).annotate(total=Sum("total"))
and also this one, the answer seems great but I can't import the TruncMonth module.
Django: Group by date (day, month, year)
P.S. I know that this question has already been asked multiple times, but I don't see any answer that works for me.
Thanks!
SOLUTION:
Thanks to @Vin-G's answer.
First, you have to make a Function that can extract the month for you:
from django.db import models
from django.db.models import Func
class Month(Func):
    function = 'EXTRACT'
    template = '%(function)s(MONTH from %(expressions)s)'
    output_field = models.IntegerField()
After that all you need to do is
annotate each row with the month
group the results by the annotated month using values()
annotate each result with the aggregated sum of the totals using Sum()
Important: if your model class has a default ordering specified in the meta options, then you will have to add an empty order_by() clause. This is because of https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Fields that are mentioned in the order_by() part of a queryset (or which are used in the default ordering on a model) are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate.
If you are unsure, you could just add the empty order_by() clause anyway without any adverse effects.
i.e.
from django.db.models import Sum
summary = (Invoice.objects
    .annotate(m=Month('date'))
    .values('m')
    .annotate(total=Sum('total'))
    .order_by())
See the full gist here: https://gist.github.com/alvingonzales/ff9333e39d221981e5fc4cd6cdafdd17
If you need further information:
Details on creating your own Func classes: https://docs.djangoproject.com/en/1.8/ref/models/expressions/#func-expressions
Details on the values() clause, (pay attention to how it interacts with annotate() with respect to order of the clauses):
https://docs.djangoproject.com/en/1.9/topics/db/aggregation/#values
the order in which annotate() and values() clauses are applied to a query is significant. If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
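To make the three steps above concrete (annotate the month, group by it, sum the totals), here is a rough SQL equivalent run through Python's built-in sqlite3 module. It is a sketch under assumptions: the invoice table, its day and total columns, and the sample rows are invented, and SQLite's strftime stands in for the EXTRACT(MONTH FROM ...) that the Month function emits on PostgreSQL.

```python
import sqlite3

# Hypothetical invoice table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (day TEXT, total INTEGER);
    INSERT INTO invoice VALUES ('2016-01-05', 100), ('2016-01-20', 50),
                               ('2016-03-02', 70);
""")

def totals_by_month(conn):
    """Group invoices by month number and sum their totals."""
    sql = """
        SELECT CAST(strftime('%m', day) AS INTEGER) AS m, SUM(total)
        FROM invoice
        GROUP BY m
        ORDER BY m
    """
    return conn.execute(sql).fetchall()

summary = totals_by_month(conn)
```

Only the month expression appears in GROUP BY, which is what placing values('m') before the second annotate() achieves in the ORM.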
result = (
    Invoice.objects
    .all()
    .values_list('created_at__year', 'created_at__month')
    .annotate(Sum('total'))
    .order_by('created_at__year', 'created_at__month')
)
itertools.groupby does the grouping in Python and needs only a single db query:
from itertools import groupby
invoices = Invoice.objects.only('date', 'total').order_by('date')
month_totals = {
    k: sum(x.total for x in g)
    for k, g in groupby(invoices, key=lambda i: i.date.month)
}
month_totals
# {1: 100, 3: 100, 4: 500, 7: 500}
I am not aware of a pure django ORM solution. The date__month filter is very limited and cannot be used in values, order_by, etc.
Don't forget that Django querysets provide a native datetimes() method, which lets you easily pull all of the days/weeks/months/years out of any queryset for models with a datetime field. So if the Invoice model above has a created datetime field, and you want totals for each month in your queryset, you can just do:
invoices = Invoice.objects.all()
months = invoices.datetimes("created", kind="month")
for month in months:
    # Filter on the year as well, so the same month of different years is not mixed.
    month_invs = invoices.filter(created__year=month.year, created__month=month.month)
    month_total = month_invs.aggregate(total=Sum("otherfield")).get("total")
    print(f"Month: {month}, Total: {month_total}")
No external functions or deps needed.
I don't know if my solution is faster than yours. You should profile it. Nonetheless, I only query the db once instead of 12 times.
#utils.py
from .models import Sale
def get_total_per_month_value():
    """
    Return the total of sales per month
    ReturnType: [Dict]
    {'December': 3400, 'February': 224, 'January': 792}
    """
    result = {}
    db_result = Sale.objects.values('price', 'created')
    for i in db_result:
        month = str(i.get('created').strftime("%B"))
        if month in result.keys():
            result[month] = result[month] + i.get('price')
        else:
            result[month] = i.get('price')
    return result
#models.py
class Sale(models.Model):
    price = models.PositiveSmallIntegerField()
    created = models.DateTimeField(_(u'Published'), default="2001-02-24")
#views.py
from .utils import get_total_per_month_value
# ...
result = get_total_per_month_value()
#test.py
import pytest
from mixer.backend.django import mixer
# Allow the tests in this module to access the database
pytestmark = pytest.mark.django_db
def test_get_total_per_month():
    from .utils import get_total_per_month_value
    selected_date = ['01','02','03','01','01']
    #2016-01-12 == YYYY-MM-DD
    for i in selected_date:
        mixer.blend('myapp.Sale', created="2016-"+i+"-12")
    values = get_total_per_month_value() #return a dict
    months = values.keys()
    assert 'January' in months, 'Should include January'
    assert 'February' in months, 'Should include February'
    assert len(months) == 3, 'Should aggregate the months'
I have a Reservation model with fields such as booked date, commission amount, and total booking amount, and based on the year provided I have to aggregate the reservations by month. Here is how I did that:
from django.db.models import Count, Sum
from django.db.models.functions import ExtractMonth
Reservation.objects.filter(
    booked_date__year=year
).values(
    'id',
    'booked_date',
    'commission_amount',
    'total_amount'
).annotate(
    month=ExtractMonth('booked_date')
).values('month').annotate(
    total_commission_amount=Sum('commission_amount'),
    total_revenue_amount=Sum('total_amount'),
    total_booking=Count('id')
).order_by()

In Django ORM: Select record from each group with maximal value of a given attribute

Say I have three models as follows representing the prices of goods sold at several retail locations of the same company:
class Store(models.Model):
    name = models.CharField(max_length=256)
    address = models.TextField()
class Product(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()
class Price(models.Model):
    store = models.ForeignKey(Store)
    product = models.ForeignKey(Product)
    effective_date = models.DateField()
    value = models.FloatField()
When a price is set, it is set on a store-and-product-specific basis. I.e. the same item can have different prices in different stores. And each of these prices has an effective date. For a given store and a given product, the currently-effective price is the one with the latest effective_date.
What's the best way to write the query that will return the currently-effective price of all items in all stores?
If I were using Pandas, I would get myself a dataframe with columns ['store', 'product', 'effective_date', 'price'] and I would run
dataframe\
    .sort_values(by=['store', 'product', 'effective_date'], ascending=[True, True, False])\
    .groupby(['store', 'product'])['price'].first()
But there has to be some way of doing this directly on the database level. Thoughts?
If your DBMS is PostgreSQL you can use distinct combined with order_by this way :
Price.objects.order_by('store','product','-effective_date').distinct('store','product')
It will give you all the latest prices for all product/store combinations.
There are tricks about distinct, have a look at the docs here : https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Without Postgres' added power (which you should really use) there is a more complicated solution to this (based on ryanpitts' idea), which requires two db hits:
latest_set = (Price.objects
    .values('store_id', 'product_id') # important to have values before annotate ...
    .annotate(max_date=Max('effective_date')).order_by())
# ... to annotate for the grouping that results from values
# Build a query that reverse-engineers the Price records that contributed to
# 'latest_set'. (Relying on the fact that there are not 2 Prices
# for the same product-store with an identical date)
q_statement = Q(product_id=-1) # sth. that results in empty qs
for latest_dict in latest_set:
    q_statement |= (
        Q(product_id=latest_dict['product_id']) &
        Q(store_id=latest_dict['store_id']) &
        Q(effective_date=latest_dict['max_date']))
Price.objects.filter(q_statement)
If you are using PostgreSQL, you could use order_by and distinct to get the current effective prices for all the products in all the stores as follows:
prices = Price.objects.order_by('store', 'product', '-effective_date') \
    .distinct('store', 'product')
Now, this is quite analogous to what you have there for Pandas.
Do note that using field names in distinct only works in PostgreSQL. Once you have sorted the prices by store, product, and descending effective date, distinct('store', 'product') retains only the first entry for each store-product pair, which is the currently effective price.
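The sort-then-keep-first behaviour described above can be sketched in plain Python. This is an illustration on made-up dictionaries rather than Django code: the function name latest_per_pair and the sample rows are hypothetical, and the dictionary keys simply mirror the Price fields.

```python
from datetime import date

def latest_per_pair(prices):
    """Keep, for each (store, product) pair, the row with the latest effective_date."""
    # Mirror order_by('store', 'product', '-effective_date'): sort by date
    # descending first, then rely on sort stability when sorting by the pair.
    ordered = sorted(prices, key=lambda p: p["effective_date"], reverse=True)
    ordered.sort(key=lambda p: (p["store"], p["product"]))
    kept, seen = [], set()
    for row in ordered:
        pair = (row["store"], row["product"])
        if pair not in seen:  # the first row of each pair wins
            seen.add(pair)
            kept.append(row)
    return kept

# Hypothetical sample rows.
sample = [
    {"store": "A", "product": "tea", "effective_date": date(2016, 1, 1), "value": 2.0},
    {"store": "A", "product": "tea", "effective_date": date(2016, 3, 1), "value": 2.5},
    {"store": "B", "product": "tea", "effective_date": date(2016, 2, 1), "value": 2.2},
]
current = latest_per_pair(sample)
```

Store A has two tea prices, and only the one dated 2016-03-01 survives, just as the database would keep the first row of each ordered group.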
Not PostgreSQL database:
If you are not using PostgreSQL, you could do it with two queries:
First, we get latest effective date for all the store-product groups:
latest_effective_dates = Price.objects.values('store_id', 'product_id') \
    .annotate(led=Max('effective_date')).values('led')
Once we have these dates, we can get the prices for them:
prices = Price.objects.filter(effective_date__in=latest_effective_dates)
Disclaimer: This assumes that no older price has the same effective_date as the latest price of any store-product group; otherwise that older price is returned as well.
