Annotating a count of a superset of fields with Django - python

So the setup here is I have a Post table that contains a bunch of posts. Some of these rows are different versions of the same post, which are grouped together by post_version_group_id, so it looks something like:
pk | title | post_version_group_id
1 | a | 123
2 | b | 789
3 | c | 123
4 | d | 123
so there are two "groups" of posts, and 4 posts in total. Each row in a PostDownloads table has a foreign key pointing to a post, and that table looks like
post | user_downloaded
1 | user1
2 | user2
3 | user3
4 | user4
what I'd like to be able to do is annotate my Post queryset so that it looks like:
pk | title | post_version_group_id | download_count
1 | a | 123 | 3
2 | b | 789 | 1
3 | c | 123 | 3
4 | d | 123 | 3
i.e. have all the posts with the same post_version_group_id share the same count (being the sum of downloads across the different versions).
At the moment, I'm doing:
Post.objects.all().annotate(download_count=models.Count("downloads__user_downloaded", distinct=True))
which doesn't quite work. It annotates a download_count which looks like:
pk | title | post_version_group_id | download_count
1 | a | 123 | 1
2 | b | 789 | 1
3 | c | 123 | 1
4 | d | 123 | 1
since downloads__user_downloaded seems to be limited to the rows in the downloads table that are linked to the current post row being annotated. That makes sense, really, but it is working against me in this particular case.
One thing I've also tried is
Post.objects.all().values("post_version_group_id").annotate(download_count=Count("downloads__user_downloaded", distinct=True))
which kind of works, but the .values() call turns the queryset of post instances into a queryset of dicts, and I need it to stay a queryset of post instances.
The actual models look something like:
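class Post(models.Model):
    title = models.CharField()
    post_version_group_id = models.UUIDField()

class PostDownloads(models.Model):
    # related_name="downloads" is assumed from the "downloads__..." lookups above
    post = models.ForeignKey(Post, related_name="downloads")
    user_downloaded = models.ForeignKey(User)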

So, I ended up figuring this out and thought I'd post the answer for anybody else that got stuck in the same rut. The key here was using a Subquery, but not just any Subquery: a custom one that returns a count of rows rather than the default Subquery type, which returns a single row of data.
First step is defining this custom subquery type:
class SubqueryCount(models.Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = models.IntegerField()
Then building the subquery:
downloads_subquery = (
    PostDownloads.objects
    .filter(post__post_version_group_id=models.OuterRef("post_version_group_id"))
    .distinct("user_downloaded")  # field-based distinct() is PostgreSQL-only
)
which filters based on that grouping version id I had.
And finally, executing the subquery in the annotation:
Post.objects.annotate(download_count=SubqueryCount(downloads_subquery))
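For anyone who wants to see the shape of the query this is aiming for, here is a rough sketch in plain SQL, run through Python's built-in sqlite3 module against the sample rows from the question. The table names post and post_downloads are made up for the illustration (they are not necessarily what Django generates), and the SQL is a hand-written equivalent of the idea rather than the exact output of SubqueryCount.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Table names are hypothetical, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, post_version_group_id INTEGER);
    CREATE TABLE post_downloads (post_id INTEGER, user_downloaded TEXT);
    INSERT INTO post VALUES (1, 'a', 123), (2, 'b', 789), (3, 'c', 123), (4, 'd', 123);
    INSERT INTO post_downloads VALUES (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4');
""")

# Correlated subquery: for each post, count the distinct users who
# downloaded any post in the same version group.
rows = conn.execute("""
    SELECT p.id, p.title, p.post_version_group_id,
           (SELECT COUNT(DISTINCT d.user_downloaded)
              FROM post_downloads AS d
              JOIN post AS p2 ON p2.id = d.post_id
             WHERE p2.post_version_group_id = p.post_version_group_id) AS download_count
      FROM post AS p
     ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Every post in group 123 ends up with the same count, which is the behaviour the per-row Count() annotation could not give.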

Related

Django queryset - Add HAVING constraint after annotate(F())

I had a seemingly normal situation with adding HAVING to the query.
I read here and here, but it did not help me
I need to add HAVING to my query
MODELS :
class Package(models.Model):
    status = models.IntegerField()

class Product(models.Model):
    title = models.CharField(max_length=10)
    packages = models.ManyToManyField(Package)
Products:
|id|title|
| - | - |
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
Packages:
|id|status|
| - | - |
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 |
Product_Packages:
|product_id|package_id|
| - | - |
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 2 |
| 2 | 3 |
| 4 | 3 |
| 4 | 4 |
visual
pack_1 (A, B) status OK
pack_2 (B, C) status not ok
pack_3 (B, D) status OK
pack_4 (D) status not ok
My task is to select those products whose latest package has status = 1
Expected result is : A, B
my query is like this
SELECT prod.title, max(pack.id)
FROM "product" as prod
INNER JOIN "product_packages" as p_p ON (prod.id = p_p.product_id)
INNER JOIN "package" as pack ON (pack.id = p_p.package_id)
GROUP BY prod.title
HAVING pack.status = 1
it returns exactly what I needed
|title|max(pack.id)|
| - | - |
| A | 1 |
| B | 3 |
But my ORM query does not work correctly.
I tried it like this:
p = Product.objects.values('id').annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1).values('id', 'pack_id')
p.query
SELECT "product"."id", MAX("product_packages"."package_id") AS "pack_id"
FROM "product" LEFT OUTER JOIN "product_packages" ON ("product"."id" = "product_packages"."product_id") LEFT OUTER JOIN "package" ON ("product_packages"."package_id" = "package"."id")
WHERE "package"."status" = 1
GROUP BY "product"."id"
Please help me write the correct ORM query.
What does the query look like when you remove .values('id', 'pack_id') at the end?
If I remember correctly, then:
p = Product.objects.values('id').annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1)
and
p = Product.objects.annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1).values('id')
will result in different queries.
Haki Benita has an excellent site aimed at database gurus, about how to make the most of Django.
You can take a look at this post:
https://hakibenita.com/django-group-by-sql#how-to-use-having
Django has a very specific way of adding the "HAVING" clause: call values() to single out the column you want to group by, annotate the Max (or whatever aggregate you want), and then filter on that annotation.
Also, the annotation annotate(my_status=F('packages__status')) looks like it won't work: it tries to fold multiple statuses into a single annotation.
You might want to try a subquery to annotate the way you want.
e.g.
Product.objects.annotate(
    latest_pack_id=Subquery(
        Package.objects.order_by('-pk').filter(status=1).values('pk')[:1]
    )
).filter(
    packages__in=F('latest_pack_id')
)
Or something along those lines, I haven't tested this out
I think you can try like this with subquery:
from django.db.models import OuterRef, Subquery
sub_query = Package.objects.filter(product=OuterRef('pk')).order_by('-pk')
products = Product.objects.annotate(latest_package_status=Subquery(sub_query.values('status')[:1])).filter(latest_package_status=1)
Here I first prepare the subquery by filtering the Package model on the Product's primary key and ordering by the Package's primary key, newest first. Then I take the first row with the [:1] slice (indexing with [0] would evaluate the subquery immediately instead of keeping it lazy), annotate its status onto the Product queryset, and filter for status 1.
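To check the logic outside Django, here is a small sketch of the SQL that this kind of "status of the latest package" subquery boils down to, run with Python's sqlite3 module on the sample data from the question. The table names follow the ones in the question's SQL; the query itself is hand-written for illustration and is not the exact SQL the ORM would emit.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE package (id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE product_packages (product_id INTEGER, package_id INTEGER);
    INSERT INTO product VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO package VALUES (1, 1), (2, 2), (3, 1), (4, 2);
    INSERT INTO product_packages VALUES (1, 1), (2, 1), (2, 2), (3, 2), (2, 3), (4, 3), (4, 4);
""")

# For each product, look up the status of its highest-id package
# and keep the product only when that status is 1.
titles = [title for (title,) in conn.execute("""
    SELECT prod.title
      FROM product AS prod
     WHERE (SELECT pack.status
              FROM package AS pack
              JOIN product_packages AS p_p ON p_p.package_id = pack.id
             WHERE p_p.product_id = prod.id
             ORDER BY pack.id DESC
             LIMIT 1) = 1
     ORDER BY prod.title
""")]

print(titles)  # ['A', 'B']
```

Note that D is dropped here because its latest package (4) has status 2, even though it also belongs to package 3, which has status 1.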

Django - How to combine 2 queryset and filter to get same element in both queryset?

I have a model:
class LocationItem(models.Model):
    location = models.ForeignKey(Location, on_delete=models.CASCADE)
    item = models.ForeignKey(Item, on_delete=models.CASCADE)
    stock_qty = models.IntegerField(null=True)
Example: I have some data like this:
------------------------------
| ID | Item | Location | Qty |
------------------------------
| 1 | 1 | 1 | 10 |
------------------------------
| 2 | 2 | 1 | 5 |
------------------------------
| 3 | 1 | 2 | 2 |
------------------------------
| 4 | 3 | 1 | 4 |
------------------------------
| 5 | 3 | 2 | 20 |
------------------------------
I have 2 queryset to get items of each location:
location_1 = LocationItem.objects.filter(location_id=1)
location_2 = LocationItem.objects.filter(location_id=2)
Now I want to combine the 2 querysets above into 1 and keep only the items that appear in both locations. For the example above the result is [Item 1, Item 3], because items 1 and 3 belong to both location 1 and location 2.
You can combine Django querysets using the following expression:
location_1 = LocationItem.objects.filter(location_id=1)
location_2 = LocationItem.objects.filter(location_id=2)
location = location_1 | location_2
The combine expression above works on querysets filtered from the same model.
Try this one
from django.db.models import Count
dupes = LocationItem.objects.values('item__id').annotate(Count('id')).order_by().filter(id__count__gt=1)
LocationItem.objects.filter(item__id__in=[i['item__id'] for i in dupes]).distinct('item__id')
Maybe the solution above helps.
If you want both conditions to be true, then you need the AND operator (&)
from django.db.models import Q
Q(location_1) & Q(location_2)
Try this:
location1_items_pk = LocationItem.objects.filter(
    location_id=1
).values_list("item_id", flat=True)
result = LocationItem.objects.filter(
    item_id__in=location1_items_pk, location_id=2
)
You can do this by chaining the filters. The result of a filter is a queryset, so after the first filter the result will be [Item1, Item2, Item3], and then the second filter is applied to that queryset, which leaves [Item1, Item3]. For example:
Item.objects.filter(locationitem__location=1).filter(locationitem__location=2)
P.S. Not tested. Hope this works.
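As a sanity check on the "items in location 2 that are also in location 1" idea, here is a sketch of the equivalent SQL run through sqlite3 on the sample table from the question. The table name location_item is an assumption made for the example, and the query is written by hand rather than copied from what Django generates.

```python
import sqlite3

# Sample rows from the question: (id, item_id, location_id, stock_qty).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location_item (id INTEGER PRIMARY KEY, item_id INTEGER,
                                location_id INTEGER, stock_qty INTEGER);
    INSERT INTO location_item VALUES
        (1, 1, 1, 10), (2, 2, 1, 5), (3, 1, 2, 2), (4, 3, 1, 4), (5, 3, 2, 20);
""")

# Keep the location-2 rows whose item also has a row in location 1.
common_items = [item_id for (item_id,) in conn.execute("""
    SELECT item_id
      FROM location_item
     WHERE location_id = 2
       AND item_id IN (SELECT item_id FROM location_item WHERE location_id = 1)
     ORDER BY item_id
""")]

print(common_items)  # [1, 3]
```

Item 2 is left out because it only has stock in location 1.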

How can I filter exported tickets from database using Django?

I am working on a Django based web project where we handle tickets based requests. I am working on an implementation where I need to export all closed tickets everyday.
My ticket table database looks like,
-------------------------------------------------
| ID | ticket_number | ticket_data | is_closed |
-------------------------------------------------
| 1 | 123123 | data 1 | 1 |
-------------------------------------------------
| 2 | 123124 | data 2 | 1 |
-------------------------------------------------
| 3 | 123125 | data 3 | 1 |
-------------------------------------------------
| 4 | 123126 | data 4 | 1 |
-------------------------------------------------
And my ticket_exported table in database is similar to
----------------------------------
| ID | ticket_id | ticket_number |
----------------------------------
| 10 | 1 | 123123 |
----------------------------------
| 11 | 2 | 123124 |
----------------------------------
So my question is: when I export tickets, is there a way to make a single query that returns all tickets which are closed but whose ticket_id and ticket_number are not in the ticket_exported table? When I run the function it should return the tickets with ticket_id 3 and 4, because they have not been exported to the ticket_exported table.
I don't want to go through all possible tickets and check one by one if their id exists in exported tickets table if I can just do it in one query whether it is raw SQL query or Django's queries.
Thanks everyone.
You can do it without an is_exported field:
exported_tickets = TicketsExported.objects.all()
unexported_tickets = Tickets.objects.filter(is_closed=True).exclude(id__in=[et.ticket_id for et in exported_tickets])
but an is_exported field could be useful somewhere else.
Per my comment: you could probably save yourself a bunch of trouble by adding a BooleanField for 'is_exported' instead of having a separate model, assuming there aren't fields specific to TicketExported.
@doniyor's answer gets you the queryset you're looking for, though. In response to your raw SQL question: print unexported_tickets.query to see the SQL that Django generates.
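Since the question says raw SQL is acceptable too, here is a sketch of the single-query version as an anti-join, run through sqlite3 on the sample data. The table names ticket and ticket_exported are assumptions for the example; adjust them to whatever your tables are really called.

```python
import sqlite3

# Sample data from the question: four closed tickets, two already exported.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, ticket_number INTEGER,
                         ticket_data TEXT, is_closed INTEGER);
    CREATE TABLE ticket_exported (id INTEGER PRIMARY KEY, ticket_id INTEGER,
                                  ticket_number INTEGER);
    INSERT INTO ticket VALUES
        (1, 123123, 'data 1', 1), (2, 123124, 'data 2', 1),
        (3, 123125, 'data 3', 1), (4, 123126, 'data 4', 1);
    INSERT INTO ticket_exported VALUES (10, 1, 123123), (11, 2, 123124);
""")

# Closed tickets for which no matching row exists in ticket_exported.
unexported = conn.execute("""
    SELECT t.id, t.ticket_number
      FROM ticket AS t
     WHERE t.is_closed = 1
       AND NOT EXISTS (SELECT 1 FROM ticket_exported AS e WHERE e.ticket_id = t.id)
     ORDER BY t.id
""").fetchall()

print(unexported)  # [(3, 123125), (4, 123126)]
```

The database does the comparison in one pass, so nothing has to be checked ticket by ticket in Python.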

Using F() expressions with lookup of position from a list to update objects

I have 4 BaseReward objects that have a default ordering of (rank, id) in their Meta class. My aim is to update the objects so that I preserve their relative rankings but give them unique ranks starting from 1.
Originally:
| id | rank |
|----|------|
| 1 | 3 |
| 2 | 2 |
| 3 | 2 |
| 4 | 1 |
after calling rerank_rewards_to_have_unique_ranks() should become
| id | rank |
|----|------|
| 1 | 4 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
I am trying to use F() expression with lookup .index() on list, but Django won't accept it as F() expression has only a fixed set of operators https://docs.djangoproject.com/en/1.8/topics/db/queries/#filters-can-reference-fields-on-the-model
Is there another way of achieving the same in an optimized way, without pulling the objects out of the database?
models.py
class BaseReward(models.Model):
    class Meta:
        ordering = ('rank', 'id')
        # BaseReward.objects.all() gets the objects ordered by 'rank' as in the Meta class, and then by id if two objects have same rank
helper.py
def rerank_rewards_to_have_unique_ranks():
    qs = BaseReward.objects.all()  # this would give the rewards of that category ordered by [rank, id]
    id_list_in_order_of_rank = list(qs.values_list('id', flat=True))  # get the ordered list of ids sequenced in order of ranks
    # now I want to update the ranks of the objects, such that rank = the desired rank
    BaseReward.objects.all().update(rank=id_list_in_order_of_rank.index(F('id'))+1)
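The .index(F('id')) call cannot work because F('id') is only resolved inside the database, while list.index() runs in Python before the query is sent. A minimal sketch of the Python-side half, using plain dicts in place of BaseReward rows (the dict layout is made up for the illustration), is to build an id-to-new-rank mapping first:

```python
# Stand-ins for the four BaseReward rows from the question.
rewards = [
    {"id": 1, "rank": 3},
    {"id": 2, "rank": 2},
    {"id": 3, "rank": 2},
    {"id": 4, "rank": 1},
]

# Same ordering as Meta.ordering = ('rank', 'id').
ordered = sorted(rewards, key=lambda r: (r["rank"], r["id"]))

# Position in that ordering (starting at 1) becomes the new unique rank.
new_rank_by_id = {r["id"]: position for position, r in enumerate(ordered, start=1)}

print(new_rank_by_id)  # {4: 1, 2: 2, 3: 3, 1: 4}
```

A mapping like this could then be handed back to the database in one statement, for example through Django's conditional expressions (Case/When, available from Django 1.8), though that part is not shown or tested here.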

Django, how to make multiple annotate in single queryset?

using Django 1.7, Python 3.4 and PostgreSQL 9.1 I am having difficulties with annotate over queryset.
Here is my model:
class Payment(models.Model):
    TYPE_CHOICES = (
        ('C', 'CREDIT'),
        ('D', 'DEBIT')
    )
    amount = models.DecimalField(max_digits=8, decimal_places=2, default=0.0)
    customer = models.ForeignKey(Customer, null=False)
    type = models.CharField(max_length=1, null=True, choices=TYPE_CHOICES)

class Customer(models.Model):
    name = models.CharField(max_length=100, unique=True)
    available_funds = models.DecimalField(max_digits=8, decimal_places=2, null=True, default=0.0)
    total_funds = models.DecimalField(max_digits=8, decimal_places=2, null=True, default=0.0)
What I am trying to get is something like:
Customers:
Name | Total in | Total out | available funds | total funds
-----------------------------------------------------------------
cust 1 | 255 | 220 | 5 | 35
cust 2 | 100 | 120 | 0 | -20
cust 3 | 50 | 20 | 15 | 30
and some data:
Payments:
amount | customer | type
--------------------------
20 | cust 1 | D
10 | cust 1 | C
70 | cust 2 | D
20 | cust 2 | C
10 | cust 2 | D
25 | cust 1 | C
200 | cust 3 | D
10 | cust 3 | C
20 | cust 1 | D
I was trying this queryset:
Customer.objects.select_related().filter(Q(payment__isnull=False)& Q(payment__type='D')).values('name').annotate(Sum('payment__amount'))
but I am getting only debits.
I don't know how to create a list with customer, total in, total out, total funds and available funds.
Can anyone help me with this?
I think you're hitting a limit of what you can do with a single queryset. The reason I say this is that you're asking to do database aggregation on different sets of Payment records.
Let's look at your current queryset:
Customer.objects.select_related().filter(Q(payment__isnull=False)& Q(payment__type='D')).values('name').annotate(Sum('payment__amount'))
Ignoring the extraneous Q() calls, the filter call payment__type='D' means that the payment__amount sum will always only pertain to debits. If you change that to 'C', it'll always only pertain to credits. This query demonstrates a fundamental constraint imposed on you by Django's queryset language -- you can't really generate two different aggregations and annotate them into a single record.
Taking a detour off to raw SQL land to see how I'd write this query is another way of demonstrating the point. You'll note, of course, that I still am running two different Payment aggregations here! One for credits and one for debits.
SELECT
    *
FROM
    customer
INNER JOIN
    (
        SELECT customer_id, SUM(amount) AS total FROM Payment WHERE type='C' GROUP BY customer_id, type
    ) AS credits
    ON credits.customer_id = customer.id
INNER JOIN
    (
        SELECT customer_id, SUM(amount) AS total FROM Payment WHERE type='D' GROUP BY customer_id, type
    ) AS debits
    ON debits.customer_id = customer.id
That query will return data approximately of the form:
customer.id | customer.name | ... | credits.total | debits.total
----------------------------------------------------------------
1 | foo bar | | 20 | 30
2 | baz qux | | 30 | 20
If you try to use only one inner join/aggregation, you're forced to group by both payment type and customer, resulting in a table like this:
customer_id | type | sum(amount)
--------------------------------
1 | C | 20
1 | D | 30
2 | C | 30
2 | D | 20
When you inner join this intermediate result with your customers, it should be immediately clear that debits and credits still are not unified into a single record.
Because you can't do this sort of select with inner joins in Django (as far as I know), you can't really do what you're trying to do in a single query. However, there are solutions to your problem.
In descending order of desirability (in my opinion, of course -- and based on what I consider obviousness/maintainability of your resulting code), the first is to just do multiple queries and unify the results manually.
You can also track credits/debits as a part of the Customer record. You're already tracking available funds this way (you're using F objects in your query to update/maintain these records, right?), so it's not really too much more onerous to maintain credit/debit summaries in similar fashion as well.
Lastly, and I don't think you should do this as I don't think there's a burning need to, you can perform a raw SQL query to get the results you need in one go.
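For completeness, the raw SQL route could look roughly like the sketch below, which uses conditional aggregation (SUM over a CASE expression) instead of two joined subqueries, so both totals land on one row per customer. It runs against the sample payments from the question via sqlite3; the table names are assumptions, the lowercase 'c' in the sample data is treated as 'C', and amounts are stored as integers to keep the output simple.

```python
import sqlite3

# Sample customers and payments from the question (table names assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payment (amount INTEGER, customer_id INTEGER, type TEXT);
    INSERT INTO customer VALUES (1, 'cust 1'), (2, 'cust 2'), (3, 'cust 3');
    INSERT INTO payment VALUES
        (20, 1, 'D'), (10, 1, 'C'), (70, 2, 'D'), (20, 2, 'C'), (10, 2, 'D'),
        (25, 1, 'C'), (200, 3, 'D'), (10, 3, 'C'), (20, 1, 'D');
""")

# One pass over payment: each SUM only adds up the rows of its own type.
totals = conn.execute("""
    SELECT c.name,
           SUM(CASE WHEN p.type = 'C' THEN p.amount ELSE 0 END) AS total_credit,
           SUM(CASE WHEN p.type = 'D' THEN p.amount ELSE 0 END) AS total_debit
      FROM customer AS c
      JOIN payment AS p ON p.customer_id = c.id
     GROUP BY c.id, c.name
     ORDER BY c.id
""").fetchall()

for row in totals:
    print(row)
```

With the sample rows this prints 35 / 40 for cust 1, 20 / 80 for cust 2 and 10 / 200 for cust 3 (credit total first, then debit total).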
