Three different models each have their own date field:
from django.db import models

class ModelA(models.Model):
    # some fields here
    date = models.DateField()

class ModelB(models.Model):
    # some fields here
    date = models.DateField()

class ModelC(models.Model):
    # some fields here
    date = models.DateField()
I'd like to get the last 50 objects by date, regardless of their class.
For now it works, but I'm doing it in a very inefficient way, as you can see:
from itertools import chain
from operator import attrgetter

all_a = ModelA.objects.all()
all_b = ModelB.objects.all()
all_c = ModelC.objects.all()
last_50_events = sorted(
    chain(all_a, all_b, all_c),
    key=attrgetter('date'),
    reverse=True)[:50]
How can I do this in an efficient way (i.e. without loading useless data)?
Easy solution, which I recommend: load the 50 most recent objects of each type, sort them together, and take the first 50 (this loads at most three times more rows than needed).
A "proper" solution can't be achieved in the ORM with your current schema.
Probably the easiest way is to add a new model that holds the date and a generic relation to the actual object.
Theoretically you could also do some magic with a union and raw queries, but anything like that is dirty and needs non-trivial manual processing.
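The "load 50 of each type" approach could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name latest_across is hypothetical, and it assumes each source is already ordered newest-first and limited to n items (in Django, something like ModelA.objects.order_by('-date')[:50], which pushes the ordering and the limit into SQL). Plain objects stand in for model instances so the snippet runs without a database.

```python
from datetime import date
from heapq import merge
from itertools import islice
from operator import attrgetter
from types import SimpleNamespace


def latest_across(sources, n=50, key=attrgetter("date")):
    """Return the n newest items from several newest-first iterables.

    Hypothetical helper: each source must already be sorted by `key`
    in descending order (e.g. an ordered, sliced queryset).
    """
    # heapq.merge lazily merges pre-sorted inputs; reverse=True tells it
    # the inputs are sorted from largest to smallest.
    return list(islice(merge(*sources, key=key, reverse=True), n))


# Stand-ins for ModelA.objects.order_by('-date')[:50] and friends.
rows_a = [SimpleNamespace(kind="A", date=date(2022, 5, d)) for d in (9, 5, 1)]
rows_b = [SimpleNamespace(kind="B", date=date(2022, 5, d)) for d in (8, 4)]
rows_c = [SimpleNamespace(kind="C", date=date(2022, 5, d)) for d in (7, 6, 2)]

newest = latest_across([rows_a, rows_b, rows_c], n=4)
```

Because every source is capped at n rows, at most 3 * n objects are loaded, however large the tables are.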
I will try to be as precise as possible.
Imagine these two models, whose relation was set up years ago:
class Event(models.Model):
    instance_created_date = models.DateTimeField(auto_now_add=True)
    car = models.ForeignKey(Car, on_delete=models.CASCADE, related_name="car_events")
    ...
    # a lot of normal text fields here, but they don't matter for this problem.

and

class Car(models.Model):
    # a lot of text fields here, but they don't matter for this problem.
    hide_from_company_search = models.BooleanField(default=False)
    images = models.ManyToManyField(Image, through=CarImage)
Let's say I want to query the number of events for a given car:
def get_car_events_qs() -> QuerySet:
    six_days_ago = (
        timezone.now().replace(hour=0, minute=0, second=0, microsecond=0)
        - timedelta(days=6)
    )
    cars = Car.objects.prefetch_related(
        'car_events',
    ).filter(
        some_conditions_on_fields=False,
    ).annotate(
        num_car_events=Count(
            'car_events',
            filter=Q(car_events__instance_created_date__gt=six_days_ago),
            distinct=True,
        )
    )
    return cars
The really tricky part is the performance of the query: Car has 450,000 entries and Event has 156,850,048. All fields that I am querying on are indexed. The query takes around 4 minutes to complete, depending on the DB load. It took 18 minutes before adding the indices.
The ORM query above results in the following SQL:
SELECT
    "core_car"."id",
    COUNT("analytics_carevent"."id") FILTER (WHERE ("analytics_carevent"."event" = 'view'
        AND "analytics_carevent"."instance_created_date" >= '2022-05-10T07:45:16.672279+00:00'::timestamptz
        AND "analytics_carevent"."instance_created_date" < '2022-05-11T07:45:16.672284+00:00'::timestamptz)) AS "num_cars_view"
FROM
    "core_car"
    LEFT OUTER JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id")
WHERE
    ... some conditions that dont matter
GROUP BY
    "core_car"."id"
I somehow suspect this FILTER to be a problem.
I tried with
.annotate(num_car_events=Count('car_events'))
and moving car_events__instance_created_date__gt=six_days_ago into the filter:
.filter(some_conditions_on_fields=False, car_events__instance_created_date__gt=six_days_ago)
But of course this filters out Cars with no Events, which is not what we want. It is super fast, though!
I fiddled a bit with it in raw SQL and came to this nice working example, which I would now like to write in the ORM, since we don't really want to use raw SQL. This query takes 2.2 s, which is within our acceptable range and far less than the 18 minutes.
SELECT
    "core_car"."id",
    COUNT(DISTINCT "analytics_carevent"."id") AS "num_cars_view"
FROM
    "core_car"
    LEFT JOIN "analytics_carevent" ON ("core_car"."id" = "analytics_carevent"."car_id"
        AND "analytics_carevent"."event" = 'view'
        AND "analytics_carevent"."instance_created_date" > '2022-05-14T00:00:00+02:00'::timestamptz
        AND "analytics_carevent"."instance_created_date" <= '2022-05-15T00:00:00+02:00'::timestamptz)
WHERE (some conditions that dont matter)
GROUP BY "core_car"."id";
My question now is:
How can I express the above query in the ORM?
I need to put the filter conditions onto the LEFT JOIN. If I just use filter(), they end up in the WHERE clause, which is wrong.
I tried:
two_days_ago = (timezone.now().replace(hour=0, minute=0, second=0, microsecond=0) - timedelta(days=2))
cars = Car.objects.prefetch_related(
    'car_events',
).filter(some_filters,)

and

cars = cars.annotate(
    events=FilteredRelation('car_events'),
).filter(
    car_events__car_id__in=cars.values_list("id", flat=True),
    car_events__instance_created_date__gt=six_days_ago,
)
But I don't think this is quite correct. I also need the count in the annotation.
Using Django 4 and the latest Python release as of this writing. :)
Thanks a lot!
TL;DR: How to put a filter or conditions on the LEFT JOIN in Django, instead of in queryset.filter()
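To illustrate why the condition has to live in the JOIN's ON clause rather than in WHERE, here is a small self-contained sketch using Python's built-in sqlite3 module. The table and column names (car, event, created) are simplified stand-ins, not the real schema. On the ORM side, FilteredRelation(..., condition=Q(...)) is Django's documented way to add conditions to a join's ON clause; whether it produces exactly the raw SQL above for this schema would need checking against the generated query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car (id INTEGER PRIMARY KEY);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        car_id INTEGER REFERENCES car(id),
        created TEXT
    );
    INSERT INTO car (id) VALUES (1), (2), (3);
    -- car 1: two recent events; car 2: one old event; car 3: none
    INSERT INTO event (car_id, created) VALUES
        (1, '2022-05-14'), (1, '2022-05-15'), (2, '2022-01-01');
""")

cutoff = "2022-05-10"

# Condition in WHERE: cars without a matching recent event are dropped.
in_where = conn.execute("""
    SELECT car.id, COUNT(event.id)
    FROM car LEFT JOIN event ON event.car_id = car.id
    WHERE event.created > ?
    GROUP BY car.id ORDER BY car.id
""", (cutoff,)).fetchall()

# Condition in ON: every car is kept, with a count of 0 when nothing matches.
in_on = conn.execute("""
    SELECT car.id, COUNT(event.id)
    FROM car LEFT JOIN event
        ON event.car_id = car.id AND event.created > ?
    GROUP BY car.id ORDER BY car.id
""", (cutoff,)).fetchall()
```

Here in_where contains only car 1, while in_on lists all three cars, with zero counts for cars 2 and 3.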
So let's assume I have two models in my Django project:
class Article(models.Model):
    name = models.CharField(max_length=200)
    # ..

class Price(models.Model):
    article = models.ForeignKey('Article')
    date = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField()
    # ..
There exist multiple Price entries per day for the same article.
Now I want to annotate an article queryset with the average price of every article on the previous day, but I have no idea how to do this in one efficient query.
What I have done is this:
articles = Article.objects.all().select_related().filter(price__date__exact=datetime.datetime.now() - datetime.timedelta(days=1)).annotate(avg_price=Avg('price__price'))
This works only if every article has at least one price each day, but that isn't always the case. Articles that have no price for the previous day should have None or 0 or some default as avg_price.
Does anybody know how to achieve this?
Aggregation functions accept a filter argument [Django docs], which can be used to put conditions on the aggregation:
import datetime
from django.db.models import Avg, Q

articles = Article.objects.all().select_related().annotate(
    avg_price=Avg(
        'price__price',
        filter=Q(price__date__exact=datetime.datetime.now() - datetime.timedelta(days=1))
    )
)
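One caveat with the __exact comparison: it only matches prices whose timestamp equals "now minus one day" down to the microsecond. A sketch of computing half-open bounds for the previous calendar day, which could then be passed as Q(price__date__gte=start, price__date__lt=end) in the filter argument. The helper name previous_day_bounds is hypothetical, and time zone handling is deliberately left out.

```python
import datetime


def previous_day_bounds(now):
    """Return (start, end) covering the calendar day before `now`.

    The range is half-open: start is included, end is excluded.
    """
    today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_start - datetime.timedelta(days=1), today_start


start, end = previous_day_bounds(datetime.datetime(2022, 5, 15, 13, 45, 10))
```

With the sample input, start is midnight on 14 May 2022 and end is midnight on 15 May 2022.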
Consider the following Models in Django:
class Item(models.Model):
    name = models.CharField(max_length = 100)

class Item_Price(models.Model):
    created_on = models.DateTimeField(default = timezone.now)
    item = models.ForeignKey('Item', related_name = 'prices')
    price = models.DecimalField(decimal_places = 2, max_digits = 15)
The price of an item can vary throughout time so I want to keep a price history.
My goal is to have a single query using the Django ORM to get a list of Items with their latest prices and sort the results by this price in ascending order.
What would be the best way to achieve this?
You can use a Subquery to obtain the latest Item_Price object and sort on these:
from django.db.models import OuterRef, Subquery

last_price = Item_Price.objects.filter(
    item_id=OuterRef('pk')
).order_by('-created_on').values('price')[:1]

Item.objects.annotate(
    last_price=Subquery(last_price)
).order_by('last_price')
For each Item, we thus obtain the latest Item_Price and use this in the annotation.
That being said, the above modelling is perhaps not ideal, since it will require a lot of complex queries. django-simple-history [readthedocs.io] does this differently, by creating an extra model and saving historical records in it. It also has a manager that allows one to query for historical states. This perhaps makes working with historical data simpler.
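For intuition, the Subquery/OuterRef annotation above corresponds roughly to a correlated subquery in SQL. The following sketch uses the standard-library sqlite3 module with simplified, hypothetical table names (item, item_price) and made-up sample rows; the SQL Django actually generates will differ in naming and quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_price (
        id INTEGER PRIMARY KEY,
        item_id INTEGER REFERENCES item(id),
        created_on TEXT,
        price REAL
    );
    INSERT INTO item (id, name) VALUES (1, 'pen'), (2, 'book');
    INSERT INTO item_price (item_id, created_on, price) VALUES
        (1, '2022-01-01', 3.0), (1, '2022-02-01', 9.0),
        (2, '2022-01-15', 7.0), (2, '2022-03-01', 5.0);
""")

# For each item, the inner query picks the price of its newest row;
# the outer query then sorts the items by that value.
rows = conn.execute("""
    SELECT item.name,
           (SELECT p.price FROM item_price AS p
            WHERE p.item_id = item.id
            ORDER BY p.created_on DESC LIMIT 1) AS last_price
    FROM item
    ORDER BY last_price
""").fetchall()
```

The inner SELECT plays the role of the sliced last_price queryset, and p.item_id = item.id is what OuterRef('pk') expresses.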
You could prefetch them in order to do the nested ordering inline like the following:
from django.db.models import Prefetch

prefetched_prices = Prefetch("prices", queryset=Item_Price.objects.order_by("price"))
for i in Item.objects.prefetch_related(prefetched_prices):
    i.name, i.prices.all()
I have a Django App with the following models:
CURRENCY_CHOICES = (('USD', 'US Dollars'), ('EUR', 'Euro'))

class ExchangeRate(models.Model):
    currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
    rate = models.FloatField()
    exchange_date = models.DateField()

class Donation(models.Model):
    donation_date = models.DateField()
    donor = models.CharField(max_length=250)
    amount = models.FloatField()
    currency = models.CharField(max_length=3, default='USD', choices=CURRENCY_CHOICES)
I also have a form I use to filter donations based on some criteria:
class DonationFilterForm(forms.Form):
    min_amount = forms.FloatField(required=False)
    max_amount = forms.FloatField(required=False)
The min_amount and max_amount fields will always represent values in US Dollars.
I need to be able to filter a queryset based on min_amount and max_amount, but for that all the amounts must be in USD. To convert the donation amount to USD I need to multiply by the ExchangeRate of the donation currency and date.
The only way I found of doing this so far is by iterating the dict(queryset) and adding a new value called usd_amount, but that may offer very poor performance in the future.
Reading the Django documentation, it seems the same thing can be done using aggregation, but so far I haven't been able to come up with the right logic to give me the same result.
I knew I had to use annotate to solve this, but I didn't know exactly how because it involved getting data from an unrelated Model.
Upon further investigation I found what I needed in the Django Documentation. I needed to use the Subquery and the OuterRef expressions to get the values from the outer queryset so I could filter the inner queryset.
The final solution looks like this:
from django.db.models import F, OuterRef, Subquery

# Prepare the filter with dynamic fields using OuterRef
# (donation_date is the date field on the outer Donation queryset)
rates = ExchangeRate.objects.filter(exchange_date=OuterRef('donation_date'), currency='EUR')
# Get the exchange rate for every donation made in Euros
qs = Donation.objects.filter(currency='EUR').annotate(exchange_rate=Subquery(rates.values('rate')[:1]))
# Get the equivalent amount in USD
qs = qs.annotate(usd_amount=F('amount') * F('exchange_rate'))
So, finally, I could filter the resulting queryset like so:
final_qs = qs.filter(usd_amount__gte=min_amount, usd_amount__lte=max_amount)
I’m trying to find duplicates of a Django model-object's instance based on grandparent-instance id and filter out older duplicates based on timestamp field.
I suppose I could do this with the distinct(*specify_fields) function, but I don’t use a PostgreSQL database (docs). I managed to achieve this with the following code:
import itertools

queryset = MyModel.objects.filter(some_filtering…) \
    .only('parent_id__grandparent_id', 'timestamp', 'regular_fields'...) \
    .values('parent_id__grandparent_id', 'timestamp', 'regular_fields'...)
# compare_all_combinations_and_remove_duplicates_with_older_timestamps
list_of_dicts = list(queryset)
for a, b in itertools.combinations(list_of_dicts, 2):
    if a['parent_id__grandparent_id'] == b['parent_id__grandparent_id']:
        if a['timestamp'] > b['timestamp']:
            list_of_dicts.remove(b)
        else:
            list_of_dicts.remove(a)
However, this feels hacky and I guess this is not an optimal solution. Is there a better way (by better I mean more optimal, i.e. minimizing the number of times querysets are evaluated etc.)? Can I do the same with queryset’s methods?
My models look something like this:
class MyModel(models.Model):
    parent_id = models.ForeignKey('Parent'…
    timestamp = …
    regular_fields = …

class Parent(models.Model):
    grandparent_id = models.ForeignKey('Grandparent'…

class Grandparent(models.Model):
    …
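Since distinct(*fields) is PostgreSQL-only, one alternative to comparing all pairs is a single pass over the rows that remembers the newest row per grandparent. A minimal sketch, assuming the rows are dicts shaped like the .values() output in the question; the helper name newest_per_group and the sample data are made up for illustration.

```python
def newest_per_group(rows, group_key="parent_id__grandparent_id", ts_key="timestamp"):
    """Keep only the row with the largest timestamp for each group value."""
    newest = {}
    for row in rows:
        current = newest.get(row[group_key])
        # Replace the stored row only when this one is strictly newer.
        if current is None or row[ts_key] > current[ts_key]:
            newest[row[group_key]] = row
    return list(newest.values())


sample = [
    {"parent_id__grandparent_id": 1, "timestamp": 10, "regular_fields": "a"},
    {"parent_id__grandparent_id": 1, "timestamp": 30, "regular_fields": "b"},
    {"parent_id__grandparent_id": 2, "timestamp": 20, "regular_fields": "c"},
    {"parent_id__grandparent_id": 1, "timestamp": 20, "regular_fields": "d"},
]
deduped = newest_per_group(sample)
```

The queryset is still evaluated only once, and the work grows linearly with the number of rows instead of quadratically as with itertools.combinations.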