Sum averages over date ranges in Django - python

I'm trying to construct a query in Django that sums averages that were taken (i.e. averaged) over a range of times.
Here is the relevant Django model:
class Data(models.Model):
    class Meta:
        verbose_name_plural = "Data"
    site = models.ForeignKey(Site)
    created_on = models.DateTimeField(auto_created=True)
    reported_on = models.DateTimeField(null=True, blank=True)
    baseline_power_kw = models.FloatField('Baseline Power (kw)', blank=True, null=True)
    measured_power_kw = models.FloatField('Measured Power (kw)', blank=True, null=True)
In my query, I'm trying to average sites' data over a range of times, and then sum those averages for each range of time. Here is the query I have so far, which I believe just gets the average of all sites' data over a range of times.
t_data = Data.objects.filter(site__in=sites) \
    .filter(created_on__range=(start, end)) \
    .extra(select={'date_slice': "trunc(extract(epoch from created_on) / '60' )"}) \
    .values('date_slice') \
    .annotate(avg_baseline_power_kw=Avg('baseline_power_kw'),
              avg_measured_power_kw=Avg('measured_power_kw'),
              time=Min('created_on')) \
    .order_by('-created_on')
Do you know how I can proceed? I am using Django with Postgres.
Thanks!

If you add 'site' to your .values() clause, like this:
.values('date_slice', 'site')
and remove the order_by('-created_on') (ordering by 'created_on' adds that field to the generated SQL GROUP BY, which splits every slice into one group per timestamp), you should get the averages of your two measurements for each slice and site. You can then sum those values within each slice to get the totals across all sites.
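The final summing step can be done in Python once the per-slice, per-site averages are back from the database. A minimal sketch, assuming the rows look like the dictionaries that .values('date_slice', 'site') with the two Avg annotations would yield; the sample rows and the helper name sum_site_averages are made up for illustration:

```python
from collections import defaultdict


def sum_site_averages(rows):
    """Sum per-site averages within each date slice.

    rows: iterable of dicts with keys 'date_slice', 'site',
    'avg_baseline_power_kw' and 'avg_measured_power_kw'.
    """
    totals = defaultdict(lambda: {"baseline_kw": 0.0, "measured_kw": 0.0})
    for row in rows:
        slice_totals = totals[row["date_slice"]]
        # Avg() yields None when every value in the group is NULL; count it as 0.
        slice_totals["baseline_kw"] += row["avg_baseline_power_kw"] or 0.0
        slice_totals["measured_kw"] += row["avg_measured_power_kw"] or 0.0
    return dict(totals)


# Hypothetical sample rows (two sites in slice 100, one site in slice 101).
sample_rows = [
    {"date_slice": 100, "site": 1, "avg_baseline_power_kw": 2.0, "avg_measured_power_kw": 1.5},
    {"date_slice": 100, "site": 2, "avg_baseline_power_kw": 3.0, "avg_measured_power_kw": None},
    {"date_slice": 101, "site": 1, "avg_baseline_power_kw": 4.0, "avg_measured_power_kw": 3.5},
]
slice_sums = sum_site_averages(sample_rows)
print(slice_sums)
```

Treating a missing average as 0 is a choice made for this sketch; skipping such sites or reporting the slice as incomplete may suit the data better.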

Related

Conditional annotations with Aggregation over only some fields in Django

So let's assume I have two models in my Django project:
class Article(models.Model):
    name = models.CharField(max_length=200)
    # ..
class Price(models.Model):
    article = models.ForeignKey('Article')
    date = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField()
    # ..
There exist multiple Price entries per day for the same article.
Now I want to annotate an article queryset with the average price of every article on the previous day. But I have no idea on how to do this in one efficient query.
What I have done is this:
articles = Article.objects.all().select_related().filter(price__date__exact=datetime.datetime.now() - datetime.timedelta(days=1)).annotate(avg_price=Avg('price__price'))
This would work if every article had at least one price each day, but that isn't always the case. Articles that have no price for the previous day should get None, 0 or some other default as avg_price.
Does anybody know how to achieve this?
Aggregation functions can take an argument filter [Django docs] which can be used to put conditions on the aggregation:
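import datetime
from django.db.models import Avg, Q
# __date compares only the date part, so every price recorded yesterday matches
yesterday = datetime.date.today() - datetime.timedelta(days=1)
articles = Article.objects.all().select_related().annotate(
    avg_price=Avg(
        'price__price',
        filter=Q(price__date__date=yesterday)
    )
)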
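Because the condition sits inside the aggregate rather than in .filter(), articles without a matching price stay in the result and get None. A plain-Python model of that behaviour, as a rough sketch only (the data and the helper name avg_price_on are invented, and this is not Django code):

```python
import datetime


def avg_price_on(day, articles, prices):
    """Return {article: average price on `day`, or None if it has none}."""
    result = {}
    for article in articles:
        matching = [
            price for (name, seen_at, price) in prices
            if name == article and seen_at.date() == day
        ]
        # Every article keeps its entry; an empty match yields None,
        # which is what a filtered Avg gives for articles without prices.
        result[article] = sum(matching) / len(matching) if matching else None
    return result


# Hypothetical data: "pen" has two prices on the day, "ink" has none.
target_day = datetime.date(2024, 1, 1)
prices = [
    ("pen", datetime.datetime(2024, 1, 1, 9, 0), 2.0),
    ("pen", datetime.datetime(2024, 1, 1, 17, 0), 4.0),
    ("ink", datetime.datetime(2023, 12, 31, 12, 0), 9.0),
]
averages = avg_price_on(target_day, ["pen", "ink"], prices)
print(averages)
```

If a default such as 0 is preferred over None, wrapping the aggregate in Coalesce from django.db.models.functions is one option.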

Django: How can I add an aggregated field to a queryset based on data from the row and data from another Model?

I have a Django App with the following models:
CURRENCY_CHOICES = (('USD', 'US Dollars'), ('EUR', 'Euro'))
class ExchangeRate(models.Model):
    currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
    rate = models.FloatField()
    exchange_date = models.DateField()
class Donation(models.Model):
    donation_date = models.DateField()
    donor = models.CharField(max_length=250)
    amount = models.FloatField()
    currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
I also have a form I use to filter donations based on some criteria:
class DonationFilterForm(forms.Form):
    min_amount = forms.FloatField(required=False)
    max_amount = forms.FloatField(required=False)
The min_amount and max_amount fields will always represent values in US Dollars.
I need to be able to filter a queryset based on min_amount and max_amount, but for that all the amounts must be in USD. To convert the donation amount to USD I need to multiply by the ExchangeRate of the donation currency and date.
The only way I found of doing this so far is by iterating the dict(queryset) and adding a new value called usd_amount, but that may offer very poor performance in the future.
Reading Django documentation, it seems the same thing can be done using aggregation, but so far I haven't been able to create the right logic that would give me same result.
I knew I had to use annotate to solve this, but I didn't know exactly how because it involved getting data from an unrelated Model.
Upon further investigation I found what I needed in the Django Documentation. I needed to use the Subquery and the OuterRef expressions to get the values from the outer queryset so I could filter the inner queryset.
The final solution looks like this:
from django.db.models import F, OuterRef, Subquery
# Prepare the filter with dynamic fields using OuterRef
# (OuterRef must name a field on Donation, i.e. 'donation_date')
rates = ExchangeRate.objects.filter(exchange_date=OuterRef('donation_date'), currency='EUR')
# Get the exchange rate for every donation made in Euros
qs = Donation.objects.filter(currency='EUR').annotate(exchange_rate=Subquery(rates.values('rate')[:1]))
# Get the equivalent amount in USD
qs = qs.annotate(usd_amount=F('amount') * F('exchange_rate'))
So, finally, I could filter the resulting queryset like so:
final_qs = qs.filter(usd_amount__gte=min_amount, usd_amount__lte=max_amount)
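To make the effect of the annotations concrete, here is a plain-Python model of the same steps: look up the rate for each donation's currency and date, multiply, then filter on the USD amount. It is only a sketch; the helper names annotate_usd and filter_usd and the sample data are assumptions for illustration, and a dictionary lookup stands in for the correlated subquery.

```python
import datetime


def annotate_usd(donations, rates):
    """Attach the matching exchange rate and the USD amount to each donation.

    rates maps (currency, date) -> rate and plays the role of the subquery.
    """
    annotated = []
    for donation in donations:
        rate = rates.get((donation["currency"], donation["donation_date"]))
        # A missing rate mirrors a NULL subquery result: the USD amount is unknown.
        usd_amount = donation["amount"] * rate if rate is not None else None
        annotated.append({**donation, "exchange_rate": rate, "usd_amount": usd_amount})
    return annotated


def filter_usd(annotated, min_amount, max_amount):
    """Keep donations whose USD amount lies in [min_amount, max_amount]."""
    return [
        d for d in annotated
        if d["usd_amount"] is not None and min_amount <= d["usd_amount"] <= max_amount
    ]


# Hypothetical data: one rate, three donations.
day = datetime.date(2024, 1, 1)
rates = {("EUR", day): 1.25}
donations = [
    {"donor": "A", "amount": 80.0, "currency": "EUR", "donation_date": day},
    {"donor": "B", "amount": 400.0, "currency": "EUR", "donation_date": day},
    {"donor": "C", "amount": 100.0, "currency": "EUR", "donation_date": datetime.date(2024, 1, 2)},
]
matches = filter_usd(annotate_usd(donations, rates), 50.0, 200.0)
print([d["donor"] for d in matches])
```

Donations without a rate for their date drop out of the range filter here, just as rows with a NULL usd_amount would not satisfy the comparisons in the database.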

Django querysets. Annotate different fields with one query

I wrote 3 queries to the database to get different values. I need to combine those queries to one query.
# Counting Total Number of Plans by Day
Day.objects.annotate(num_of_plans=Count('plan')) \
    .values('num_of_plans', 'date', 'id')
# Counting is_completed=True Plans by Day
Day.objects \
    .filter(plan__is_completed=True) \
    .annotate(num_of_completed_plans=Count('plan__is_completed')) \
    .values('num_of_completed_plans', 'id', 'date')
# Counting status=deferred Plans by Day
Day.objects \
    .filter(plan__status='deferred') \
    .annotate(num_of_deferred_plans=Count('plan__is_completed')) \
    .values('num_of_deferred_plans', 'id', 'date')
As you can see, there are 3 queries above. I need to optimize this code and get the values with a single query.
models
class Day(models.Model):
    date = models.DateField(default=datetime.date.today, unique=True)
class Plan(models.Model):
    title = models.CharField(max_length=255)
    status = models.CharField(max_length=255, choices=PLAN_STATUSES, null=True, default='upcoming')
    is_completed = models.BooleanField(default=False, null=True)
    day = models.ForeignKey(Day, CASCADE, null=True)
Is there any way to combine these 3 queries and get the values with one query?
Since Django 2.0, you can use the filter=… parameter [Django-doc] in the Count expression. For the Boolean field, you can simply use a Sum expression [Django-doc]:
from django.db.models import Count, Q, Sum
Day.objects.annotate(
    num_of_plans=Count('plan'),
    # Summing a BooleanField relies on the database treating booleans as 0/1;
    # Count('plan', filter=Q(plan__is_completed=True)) is a portable alternative.
    num_of_completed_plans=Sum('plan__is_completed'),
    num_of_deferred_plans=Count('plan', filter=Q(plan__status='deferred'))
)
Normally it is better not to use .values() and to work with the model objects instead (they carry the annotations as extra attributes), since that keeps the logic you define on your model intact.
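What the single annotated query computes can be modelled in plain Python as one pass over the plans with three counters per day. This is only an illustrative sketch; the helper name count_plans_by_day and the sample plans are assumptions, not part of the original code.

```python
from collections import Counter


def count_plans_by_day(plans):
    """Return {day: Counter with 'total', 'completed' and 'deferred' counts}."""
    counts = {}
    for plan in plans:
        day_counts = counts.setdefault(plan["day"], Counter())
        day_counts["total"] += 1
        # Conditional counts: only plans that satisfy the condition add 1.
        day_counts["completed"] += 1 if plan["is_completed"] else 0
        day_counts["deferred"] += 1 if plan["status"] == "deferred" else 0
    return counts


# Hypothetical plans spread over two days.
sample_plans = [
    {"day": "2024-01-01", "status": "upcoming", "is_completed": True},
    {"day": "2024-01-01", "status": "deferred", "is_completed": False},
    {"day": "2024-01-01", "status": "deferred", "is_completed": False},
    {"day": "2024-01-02", "status": "upcoming", "is_completed": False},
]
plan_counts = count_plans_by_day(sample_plans)
print(plan_counts)
```

One difference from the ORM query: a Day without any plans does not appear in this dictionary, whereas the annotated queryset still returns that Day.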

django product price tracker: getting the amount and date of the all time max and min price

I'm trying to build a django app where I can track product prices over time. The app fetches new prices routinely, graphs them and shows the recent history of price changes.
I'm checking the price once a day and saving that price plus the date timestamp to my models.
models.py
class Product(models.Model):
    title = models.CharField(max_length=255)
class Price(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    date_seen = models.DateTimeField(auto_now_add=True)
    price = models.IntegerField(blank=True, null=True)
Along with the current price of a product I'd also like to show the max and min over all the price data I've collected. I want to get the value and also the date it was at that value. So far I can get the value but I can't get the corresponding date. I'm using this:
def price_hla(self):
    return Product.objects.filter(price__product=self).aggregate(high_price=Max('price__price'), low_price=Min('price__price'), avg_price=Avg('price__price'))
Any advice? Thanks in advance!
EDIT: Based on responses I have the following. My problem is I'm getting the MAX price and MAX date independent of each other. I want the MAX price with that max price's date in the same response.
def price_hla(self):
    return Product.objects.filter(price__product=self)[:1].annotate(Max('price__price'), Max('price__date_seen'))
Try this:
Product.objects.filter(price__product=self).annotate(
    high_price=Max('price__price'),
).filter(price__price=F('high_price'))
Which should give you the max price and date in the resulting objects.
I can't think of a way to simultaneously find the minimum price/date in the same query though. I also have a feeling that this is going to be very slow if you have a large number of items.
Figured this out and I'm getting what I want. If anyone reads this I'd love feedback about if this is best practice or if I'm going to be overloading my database.
Because I needed both the actual price and the date on which the price was at its maximum, I needed to return the whole Price object. So I wrote some queries in my DetailView by overriding the default get_context_data method.
views.py
class ProductDetailView(DetailView):
    model = Product
    def get_context_data(self, **kwargs):
        context = super(ProductDetailView, self).get_context_data(**kwargs)
        # self.object is set by DetailView.get(), so get_object() need not be called again
        prices = Price.objects.filter(product=self.object)
        context['high'] = prices.order_by('price').last()
        context['low'] = prices.order_by('price').first()
        context['avg'] = prices.aggregate(avg_price=Avg('price'))
        return context
Then I pulled it in to my templates using high.price and high.date_seen, etc.
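The idea behind returning whole Price objects (pick the row with the extreme price, then read its date) can be shown without the ORM. The sketch below is illustrative only: the helper name price_summary and the sample observations are assumptions. It skips None prices because the price field is nullable; a similar exclusion may be needed in the ORM version, depending on how the database sorts NULLs.

```python
import datetime
from statistics import mean


def price_summary(observations):
    """Return the highest and lowest observation (with dates) and the mean price.

    observations: list of (date_seen, price) tuples; price may be None.
    """
    known = [obs for obs in observations if obs[1] is not None]
    if not known:
        return None
    return {
        # max/min over whole tuples keeps the date attached to the price.
        "high": max(known, key=lambda obs: obs[1]),
        "low": min(known, key=lambda obs: obs[1]),
        "avg": mean(price for _, price in known),
    }


# Hypothetical daily observations, one of them without a price.
observations = [
    (datetime.date(2024, 1, 1), 10),
    (datetime.date(2024, 1, 2), 30),
    (datetime.date(2024, 1, 3), None),
    (datetime.date(2024, 1, 4), 20),
]
summary = price_summary(observations)
print(summary)
```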

In Django ORM: Select record from each group with maximal value of a given attribute

Say I have three models as follows representing the prices of goods sold at several retail locations of the same company:
class Store(models.Model):
    name = models.CharField(max_length=256)
    address = models.TextField()
class Product(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()
class Price(models.Model):
    store = models.ForeignKey(Store)
    product = models.ForeignKey(Product)
    effective_date = models.DateField()
    value = models.FloatField()
When a price is set, it is set on a store-and-product-specific basis. I.e. the same item can have different prices in different stores. And each of these prices has an effective date. For a given store and a given product, the currently-effective price is the one with the latest effective_date.
What's the best way to write the query that will return the currently-effective price of all items in all stores?
If I were using Pandas, I would get myself a dataframe with columns ['store', 'product', 'effective_date', 'price'] and I would run
dataframe\
    .sort_values(by=['store', 'product', 'effective_date'], ascending=[True, True, False])\
    .groupby(['store', 'product'])['price'].first()
But there has to be some way of doing this directly on the database level. Thoughts?
If your DBMS is PostgreSQL you can use distinct combined with order_by this way :
Price.objects.order_by('store','product','-effective_date').distinct('store','product')
It will give you all the latest prices for all product/store combinations.
distinct has some caveats; have a look at the docs here: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Without Postgres' added power (which you should really use) there is a more complicated solution to this (based on ryanpitts' idea), which requires two db hits:
from django.db.models import Max, Q
latest_set = (
    Price.objects
    .values('store_id', 'product_id')  # important to have values before annotate ...
    .annotate(max_date=Max('effective_date')).order_by()
)
# ... to annotate for the grouping that results from values
# Build a query that reverse-engineers the Price records that contributed to
# 'latest_set'. (Relying on the fact that there are not 2 Prices
# for the same product-store with an identical date)
q_statement = Q(product_id=-1)  # something that results in an empty queryset
for latest_dict in latest_set:
    q_statement |= (
        Q(product_id=latest_dict['product_id']) &
        Q(store_id=latest_dict['store_id']) &
        Q(effective_date=latest_dict['max_date'])
    )
Price.objects.filter(q_statement)
If you are using PostgreSQL, you could use order_by and distinct to get the current effective prices for all the products in all the stores as follows:
prices = (Price.objects.order_by('store', 'product', '-effective_date')
          .distinct('store', 'product'))
Now, this is quite analogous to what you have there for Pandas.
Do note that using field names in distinct only works in PostgreSQL. Once you have sorted the prices by store, product and decreasing effective date, distinct('store', 'product') will retain only the first entry for each store-product pair, and that is the current entry with the most recent price.
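The sort-then-keep-first behaviour described above can be reproduced in plain Python, which may help when reasoning about what the query returns. This is a sketch under assumed data; the helper name latest_prices and the sample rows are hypothetical.

```python
import datetime


def latest_prices(rows):
    """Keep the newest row per (store, product): sort, then keep the first per group."""
    # Newest first, then a stable sort by group keeps that date order inside each group.
    ordered = sorted(rows, key=lambda r: r["effective_date"], reverse=True)
    ordered.sort(key=lambda r: (r["store"], r["product"]))
    seen = set()
    result = []
    for row in ordered:
        key = (row["store"], row["product"])
        if key not in seen:  # the first row of each group is the latest one
            seen.add(key)
            result.append(row)
    return result


# Hypothetical price rows for two stores.
price_rows = [
    {"store": 1, "product": "A", "effective_date": datetime.date(2024, 1, 1), "value": 1.0},
    {"store": 1, "product": "A", "effective_date": datetime.date(2024, 1, 5), "value": 2.0},
    {"store": 2, "product": "A", "effective_date": datetime.date(2024, 1, 1), "value": 3.0},
]
current = latest_prices(price_rows)
print([(r["store"], r["product"], r["value"]) for r in current])
```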
Non-PostgreSQL databases:
If you are not using PostgreSQL, you could do it with two queries:
First, we get the latest effective date for all the store-product groups:
latest_effective_dates = (Price.objects.values('store_id', 'product_id')
                          .annotate(led=Max('effective_date')).values('led'))
Once we have these dates, we can get the prices for them:
prices = Price.objects.filter(effective_date__in=latest_effective_dates)
Disclaimer: This assumes that no effective_date is shared between store-product groups. Otherwise an outdated price whose date happens to be the latest date of another group would also be returned.
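The disclaimer matters in practice, and a small plain-Python comparison shows why: matching on the date alone can pull in outdated rows, while matching on store, product and date together does not. The helper names and sample rows below are invented for this sketch.

```python
import datetime


def latest_by_group(rows):
    """Step 1: latest effective_date per (store, product)."""
    latest = {}
    for row in rows:
        key = (row["store"], row["product"])
        if key not in latest or row["effective_date"] > latest[key]:
            latest[key] = row["effective_date"]
    return latest


def match_on_date_only(rows, latest):
    """Step 2 as in effective_date__in: any row whose date is some group's latest."""
    dates = set(latest.values())
    return [r for r in rows if r["effective_date"] in dates]


def match_on_full_key(rows, latest):
    """Step 2 matched on store, product and date together."""
    return [r for r in rows if latest[(r["store"], r["product"])] == r["effective_date"]]


# Hypothetical rows: store 1 has an old price dated like store 2's latest price.
price_rows = [
    {"store": 1, "product": "A", "effective_date": datetime.date(2024, 1, 1), "value": 1.0},
    {"store": 1, "product": "A", "effective_date": datetime.date(2024, 1, 5), "value": 2.0},
    {"store": 2, "product": "A", "effective_date": datetime.date(2024, 1, 1), "value": 3.0},
]
latest = latest_by_group(price_rows)
by_date = match_on_date_only(price_rows, latest)
by_key = match_on_full_key(price_rows, latest)
print(len(by_date), len(by_key))
```

With this data the date-only match also returns store 1's outdated price from 1 January, because that date is the latest one for store 2.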
