Django aggregate filters

I have three models similar to the ones below, and I am trying to get the latest sale date for my items in a single query. This is definitely possible in SQL, but I would like to use Django's built-in functionality:
class Item(models.Model):
    name = models.CharField()
    ...
class InventoryEntry(models.Model):
    delta = models.IntegerField()
    item = models.ForeignKey("Item")
    receipt = models.ForeignKey("Receipt", null=True)
    created = models.DateTimeField(default=timezone.now)
    ...
class Receipt(models.Model):
    amt = models.IntegerField()
    ...
What I am trying to do is query my items and annotate the last time a sale was made on them. The InventoryEntry model can be queried for whether or not an entry was a sale based on the existence of a receipt (inventory can also be adjusted because of an order, or being stolen, etc, and I am only interested in the most recent sale).
My query right now looks something like this, which currently just gets the latest of ANY inventory entry. I want to filter the annotation to only return the max value of created when receipt__isnull=False on the InventoryEntry:
Item.objects.filter(**item_qs_kwargs).annotate(latest_sale_date=Max('inventoryentry_set__created'))
I attempted to use the When query expression but it did not work as intended, so perhaps I misused it. Any insight would be appreciated

A solution with conditional expressions should work like this:
from django.db.models import Max, Case, When, F
# Entries without a receipt are not sales, so they contribute NULL, which Max ignores.
sale_date = Case(
    When(inventoryentry__receipt=None, then=None),
    default=F('inventoryentry__created'),
)
qs = Item.objects.annotate(latest_sale_date=Max(sale_date))
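
The conditional aggregate above corresponds to a MAX(CASE WHEN ... END) in SQL. As a minimal sketch, the snippet below runs that kind of query against an in-memory SQLite database using only the standard library. The table and column names (item, inventoryentry, receipt_id) and the sample rows are assumptions made up for illustration, not the schema Django would actually generate. On Django 2.0 and later, the same annotation can probably be written more directly as Max('inventoryentry__created', filter=Q(inventoryentry__receipt__isnull=False)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE inventoryentry (
        id INTEGER PRIMARY KEY,
        item_id INTEGER REFERENCES item(id),
        receipt_id INTEGER,          -- NULL means the entry was not a sale
        created TEXT
    );
    INSERT INTO item VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO inventoryentry VALUES
        (1, 1, 10,   '2018-01-01'),  -- sale
        (2, 1, NULL, '2018-03-01'),  -- adjustment, newer than the sale
        (3, 2, NULL, '2018-02-01');  -- adjustment only, never sold
""")

# Non-sale rows are mapped to NULL inside the aggregate, and MAX skips NULLs.
# The LEFT JOIN keeps items that have no sale at all.
rows = conn.execute("""
    SELECT item.name,
           MAX(CASE WHEN e.receipt_id IS NULL THEN NULL ELSE e.created END)
               AS latest_sale_date
    FROM item
    LEFT JOIN inventoryentry AS e ON e.item_id = item.id
    GROUP BY item.id
    ORDER BY item.id
""").fetchall()

print(rows)
```

The newer adjustment on the first item is ignored, and the item that was never sold still appears, with no sale date.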

Here is a modified approach you could try:
from django.db.models import F, Max
Item.objects\
    .annotate(latest_inventoryentry_id=Max('inventoryentry__created'))\
    .filter(
        inventoryentry__id=F('latest_inventoryentry_id'),
        inventoryentry__receipt=None
    )
I have not tested this myself, so please check it and let me know.
Thanks

Related

Limit prefetch_related to 1 by a certain criteria

So I have models like these
class Status(models.Model):
    name = models.CharField(max_length=255, choices=StatusName.choices, unique=True)
class Case(models.Model):
    # has some fields
class CaseStatus(models.Model):
    case = models.ForeignKey("cases.Case", on_delete=models.CASCADE, related_name="case_statuses")
    status = models.ForeignKey("cases.Status", on_delete=models.CASCADE, related_name="case_statuses")
    created = models.DateTimeField(auto_now_add=True)
I need to filter the cases on the basis of the status of their case-status but the catch is only the latest case-status should be taken into account.
To get Case objects based on all the case-statuses, this query works:
Case.objects.filter(case_statuses__status=status_name)
But I need to get the Case objects such that only their latest case_status object (descending created) is taken into account. Something like this is what I am looking for:
Case.objects.filter(case_statuses__order_by_created_first__status=status_name)
I have tried Prefetch as well, but it doesn't seem to work for my use case:
sub_query = CaseStatus.objects.filter(
    id=CaseStatus.objects.select_related('case').order_by('-created').first().id)
Case.objects.prefetch_related(Prefetch('case_statuses', queryset=sub_query)).filter(
    case_statuses__status=status_name)
This would be easy to solve in raw Postgres by using LIMIT 1, but I am not sure how to make this work in the Django ORM.

You can annotate your cases with their last status, and then filter on that status to be what you want.
from django.db.models import OuterRef, Subquery
# Name of the most recent status for each case, correlated on the outer Case row.
status_qs = CaseStatus.objects.filter(case=OuterRef('pk')).order_by('-created').values('status__name')[:1]
Case.objects.annotate(last_status=Subquery(status_qs)).filter(last_status=status_name)
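
To make the "LIMIT 1" idea from the question concrete, here is a rough sketch of the kind of correlated subquery this annotation stands for, run against an in-memory SQLite database. The schema is a simplified assumption: the status name is stored directly on a hypothetical case_status table instead of a separate Status model, and the table is called cases because CASE is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (id INTEGER PRIMARY KEY);
    CREATE TABLE case_status (
        id INTEGER PRIMARY KEY,
        case_id INTEGER REFERENCES cases(id),
        status_name TEXT,
        created TEXT
    );
    INSERT INTO cases VALUES (1), (2);
    INSERT INTO case_status VALUES
        (1, 1, 'open',   '2021-01-01'),
        (2, 1, 'closed', '2021-02-01'),  -- latest status of case 1
        (3, 2, 'closed', '2021-01-01'),
        (4, 2, 'open',   '2021-03-01');  -- latest status of case 2
""")

def cases_with_latest_status(status_name):
    """Return ids of cases whose most recent status equals status_name."""
    sql = """
        SELECT c.id
        FROM cases AS c
        WHERE (
            SELECT s.status_name
            FROM case_status AS s
            WHERE s.case_id = c.id
            ORDER BY s.created DESC
            LIMIT 1
        ) = ?
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql, (status_name,))]

print(cases_with_latest_status("closed"))
print(cases_with_latest_status("open"))
```

Case 2 was 'closed' at some point, but only its latest status counts, so it is not returned for 'closed'.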

Django annotate value based on another model field

I have these two models, Cases and Specialties, just like this:
class Case(models.Model):
    ...
    judge = models.CharField()
    ....
class Specialty(models.Model):
    name = models.CharField()
    sys_num = models.IntegerField()
I know this sounds like a really weird structure, but try to bear with me:
The judge field in the Case model refers to the sys_num value of a Specialty instance (judge is a CharField, but it always carries an integer, and each Specialty instance has a unique sys_num). So I can get the Specialty name related to a specific Case instance using something like this:
my_pk = ...  # some number here
my_case_judge = Case.objects.get(pk=my_pk).judge
my_specialty_name = Specialty.objects.get(sys_num=my_case_judge).name
I know this sounds really weird, but I can't change the underlying schema of the tables; I can only work around it with SQL and Django's ORM.
My problem is: I want to annotate the Specialty names in a queryset of Cases that have already called values().
I only managed to get it working using Case and When but it's not dynamic. If I add more Specialty instances I'll have to manually alter the code.
cases.annotate(
    specialty=Case(
        When(judge=0, then=Value('name 0 goes here')),
        When(judge=1, then=Value('name 1 goes here')),
        When(judge=2, then=Value('name 2 goes here')),
        When(judge=3, then=Value('name 3 goes here')),
        ...
Can this be done dynamically? I looked through Django's query reference docs but couldn't produce a working solution with the tools described there.

You can do this with a subquery expression:
from django.db.models import OuterRef, Subquery
Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=OuterRef('judge')).values('name')[:1]
    )
)
For some databases, casting might even be necessary:
from django.db.models import IntegerField, OuterRef, Subquery
from django.db.models.functions import Cast
Case.objects.annotate(
    specialty=Subquery(
        Specialty.objects.filter(sys_num=Cast(
            OuterRef('judge'),
            output_field=IntegerField()
        )).values('name')[:1]
    )
)
But the modeling is very bad. Usually it is better to work with a ForeignKey: this guarantees that judge can only point to a valid Specialty (referential integrity), creates an index on the field, and makes the Django ORM more effective, since it allows more advanced querying with (relatively) small queries.
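
For reference, the casting variant boils down to a correlated subquery with an explicit CAST. The sketch below shows that shape with the standard-library sqlite3 module; the table names, the specialty names and the sample rows are invented for illustration, and other databases may be stricter or looser than SQLite about comparing text with integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialty (id INTEGER PRIMARY KEY, name TEXT, sys_num INTEGER UNIQUE);
    CREATE TABLE cases (id INTEGER PRIMARY KEY, judge TEXT);  -- judge holds a number as text
    INSERT INTO specialty VALUES (1, 'civil', 0), (2, 'criminal', 1);
    INSERT INTO cases VALUES (1, '1'), (2, '0'), (3, '7');    -- '7' matches no specialty
""")

# For each case, look up the specialty whose sys_num equals the judge value,
# casting the text column to an integer before comparing.
rows = conn.execute("""
    SELECT c.id,
           (SELECT s.name
            FROM specialty AS s
            WHERE s.sys_num = CAST(c.judge AS INTEGER)
            LIMIT 1) AS specialty
    FROM cases AS c
    ORDER BY c.id
""").fetchall()

print(rows)
```

A case whose judge value matches no Specialty gets a NULL annotation instead of being dropped, which is also how the Subquery annotation behaves.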

Django prefetch_related and N+1 - How is it solved?

I am sitting with a query looking like this:
# Get the amount of kilo attached to products
product_data = {}
for productSpy in ProductSpy.objects.all():
    product_data[productSpy.product.product_id] = productSpy.kilo # RERUN
I do not see how I would be able to use prefetch_related on my last line. The examples in the docs are very simplified and somehow make sense, but I do not understand the whole concept well enough to work this out myself. Could someone please explain what is being done and how? I find this very important to understand, and I have just met my first N+1 here.
Thank you up front for your time.
models.py
class ProductSpy(models.Model):
    created_by = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    def __str__(self):
        return self.kilo
class Product(models.Model):
    product_id = models.IntegerField()
    name = models.CharField(max_length=150)
    def __str__(self):
        return self.name

Django fetches related objects lazily at runtime:
each access to productSpy.product runs a query against the product table, looking the row up by the foreign key stored on productSpy.
Because every one of those queries pays I/O latency, this code is highly inefficient. Using prefetch_related fetches the products for all the ProductSpy objects in one additional query, resulting in better performance.
# Get the amount of kilo attached to products
product_data = {}
# prefetch_related returns a new queryset, so keep the result.
# kilo is a plain field, not a relation, so it needs no prefetch.
product_spies = ProductSpy.objects.prefetch_related('product')
for productSpy in product_spies:
    product_data[productSpy.product.product_id] = productSpy.kilo

When one writes productSpy.product and the related object has not already been fetched, Django automatically makes a query to the database to get the related Product instance. Hence, if ProductSpy.objects.all() returns N instances, writing productSpy.product in a loop makes N more queries, which is what we call the N + 1 problem.
Moving further, although you can use prefetch_related here (it will use 2 queries in your case), it would be better to use select_related, which uses a JOIN and gets you the related instances in a single query:
product_data = {}
queryset = ProductSpy.objects.select_related('product')
for productSpy in queryset:
    product_data[productSpy.product.product_id] = productSpy.kilo # No extra queries as we used `select_related`
Note: There seems to be a problem with your logic here, though: multiple ProductSpy instances can have the same Product, so your loop might overwrite some values.
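
To see the difference in query counts without a Django project, here is a small sketch that imitates both access patterns with sqlite3 and counts the statements issued. CountingConnection, the product_ref column and the sample rows are hypothetical names chosen for this illustration; Django's own query logging would be the real tool in a project.

```python
import sqlite3

class CountingConnection:
    """Thin wrapper around sqlite3 that counts how many queries are issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()

db = CountingConnection()
db.conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_id INTEGER, name TEXT);
    CREATE TABLE productspy (id INTEGER PRIMARY KEY, product_ref INTEGER, kilo INTEGER);
    INSERT INTO product VALUES (1, 100, 'apples'), (2, 200, 'pears'), (3, 300, 'plums');
    INSERT INTO productspy VALUES (1, 1, 5), (2, 2, 7), (3, 3, 9);
""")

# N + 1: one query for the spies, then one more per spy for its product.
lazy = {}
for spy_id, product_ref, kilo in db.query("SELECT id, product_ref, kilo FROM productspy"):
    (product_id,) = db.query("SELECT product_id FROM product WHERE id = ?", (product_ref,))[0]
    lazy[product_id] = kilo
lazy_queries = db.count

# Joined: a single query fetches each spy together with its product.
db.count = 0
joined = {
    product_id: kilo
    for product_id, kilo in db.query(
        "SELECT p.product_id, s.kilo FROM productspy AS s "
        "JOIN product AS p ON p.id = s.product_ref"
    )
}
joined_queries = db.count

print(lazy_queries, joined_queries)
```

With three ProductSpy rows the lazy loop issues four queries and the joined version one; both build the same dictionary.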

Get time based model statistics in django

I know this is not a Django question per se, but I am working with Django models and would like a solution specific to Django.
Suppose I have a model like this:
class Foo(models.Model):
    type = models.IntegerField()
    timestamp = models.DateTimeField(auto_now_add=True)
What is the best way to get a count of all objects of a given type (say 1) spread over date/time?
For example: get_stat(type=1) gives me information on how many objects(of type 1) were created on 12/10/2018, on 13/10/2018, 14/10/2018 and so on...

I think you need to use group by. See this answer: How to query as GROUP BY in django?
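from django.db.models import Count
from django.db.models.functions import TruncDate
@classmethod
def get_stat(cls, type):
    # Truncate to the calendar day so rows group per date, not per exact timestamp.
    return cls.objects.filter(type=type).annotate(day=TruncDate('timestamp')).values(
        'day'
    ).annotate(count=Count('id')).order_by('day')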
This function is an example in your case.
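
Note that grouping on the raw timestamp column gives one group per distinct timestamp, so for per-day counts the value has to be reduced to a date first (in Django, TruncDate from django.db.models.functions does this). As a sketch of the underlying GROUP BY, here is the same idea in plain SQL via sqlite3; the foo table and its rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER PRIMARY KEY, type INTEGER, timestamp TEXT);
    INSERT INTO foo (type, timestamp) VALUES
        (1, '2018-10-12 09:00:00'),
        (1, '2018-10-12 17:30:00'),
        (1, '2018-10-13 08:15:00'),
        (2, '2018-10-13 11:00:00');   -- different type, must not be counted
""")

def get_stat(type_):
    """Count rows of the given type per calendar day."""
    sql = """
        SELECT date(timestamp) AS day, COUNT(id) AS count
        FROM foo
        WHERE type = ?
        GROUP BY date(timestamp)
        ORDER BY date(timestamp)
    """
    return conn.execute(sql, (type_,)).fetchall()

print(get_stat(1))
```

Each returned pair is a day and the number of objects of that type created on it.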

Django aggregation query on related one-to-many objects

Here is my simplified model:
class Item(models.Model):
    pass
class TrackingPoint(models.Model):
    item = models.ForeignKey(Item)
    created = models.DateField()
    data = models.IntegerField()
    class Meta:
        unique_together = ('item', 'created')
In many parts of my application I need to retrieve a set of Items and annotate each one with the data field from its latest TrackingPoint, ordered by the created field. For example, instance i1 of class Item has 3 TrackingPoints:
tp1 = TrackingPoint(item=i1, created=date(2010,5,15), data=23)
tp2 = TrackingPoint(item=i1, created=date(2010,5,14), data=21)
tp3 = TrackingPoint(item=i1, created=date(2010,5,12), data=120)
I need a query that retrieves the i1 instance annotated with the tp1.data field value, since tp1 is the latest tracking point when ordered by the created field. That query should also return Items that don't have any TrackingPoints at all. If possible, I would prefer not to use QuerySet's extra method to do this.
This is what I have tried so far... and failed :(
Item.objects.annotate(max_created=Max('trackingpoint__created'),
                      data=Avg('trackingpoint__data')).filter(trackingpoint__created=F('max_created'))
Any ideas?

Here's a single query that will provide (TrackingPoint, Item)-pairs:
TrackingPoint.objects.annotate(max=Max('item__trackingpoint__created')).filter(max=F('created')).select_related('item').order_by('created')
You would have to query for items without TrackingPoints separately.

This isn't a direct answer to your question, but in case you don't need exactly what you described, you might be interested in a greatest-n-per-group solution. You can take a look at my answer to a similar question:
Django Query That Get Most Recent Objects From Different Categories
This should apply directly to your case:
items = Item.objects.annotate(tracking_point_created=Max('trackingpoint__created'))
trackingpoints = TrackingPoint.objects.filter(created__in=[b.tracking_point_created for b in items])
Note that second line can produce ambiguous results if created dates repeat in TrackingPoint model.
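
A single-query alternative that also keeps items without any tracking points is a correlated subquery picking the newest row per item. The sketch below shows that query shape with sqlite3, reusing the sample data from the question plus an assumed second item with no tracking points; the table names are invented. On Django 1.11 and later, the likely ORM equivalent is annotating with Subquery(TrackingPoint.objects.filter(item=OuterRef('pk')).order_by('-created').values('data')[:1]).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE trackingpoint (
        id INTEGER PRIMARY KEY,
        item_id INTEGER REFERENCES item(id),
        created TEXT,
        data INTEGER,
        UNIQUE (item_id, created)
    );
    INSERT INTO item VALUES (1), (2);            -- item 2 has no tracking points
    INSERT INTO trackingpoint (item_id, created, data) VALUES
        (1, '2010-05-15', 23),
        (1, '2010-05-14', 21),
        (1, '2010-05-12', 120);
""")

# For every item, take the data value of its most recent tracking point.
# Items without tracking points stay in the result with NULL.
rows = conn.execute("""
    SELECT i.id,
           (SELECT tp.data
            FROM trackingpoint AS tp
            WHERE tp.item_id = i.id
            ORDER BY tp.created DESC
            LIMIT 1) AS latest_data
    FROM item AS i
    ORDER BY i.id
""").fetchall()

print(rows)
```

Because (item, created) is unique, the newest row per item is unambiguous here.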
