Django, how to make multiple annotate in single queryset?

Using Django 1.7, Python 3.4 and PostgreSQL 9.1, I am having difficulties with annotate over a queryset.
Here are my models:
class Payment(models.Model):
    TYPE_CHOICES = (
        ('C', 'CREDIT'),
        ('D', 'DEBIT'),
    )
    amount = models.DecimalField(max_digits=8, decimal_places=2, default=0.0)
    # String reference, because Customer is defined further down.
    customer = models.ForeignKey('Customer', null=False)
    type = models.CharField(max_length=1, null=True, choices=TYPE_CHOICES)


class Customer(models.Model):
    name = models.CharField(max_length=100, unique=True)
    available_funds = models.DecimalField(max_digits=8, decimal_places=2, null=True, default=0.0)
    total_funds = models.DecimalField(max_digits=8, decimal_places=2, null=True, default=0.0)
What I am trying to get is something like:
Customers:
Name   | Total in | Total out | available funds | total funds
-------------------------------------------------------------
cust 1 | 255      | 220       | 5               | 35
cust 2 | 100      | 120       | 0               | -20
cust 3 | 50       | 20        | 15              | 30
And some data:
Payments:
amount | customer | type
------------------------
20     | cust 1   | D
10     | cust 1   | C
70     | cust 2   | D
20     | cust 2   | C
10     | cust 2   | D
25     | cust 1   | C
200    | cust 3   | D
10     | cust 3   | C
20     | cust 1   | D
I was trying this queryset:
Customer.objects.select_related().filter(Q(payment__isnull=False) & Q(payment__type='D')).values('name').annotate(Sum('payment__amount'))
but I am getting only debits.
I don't know how to create a list with customer, total in, total out, total funds, and available funds.
Can anyone help me with this?

I think you're hitting a limit of what you can do with a single queryset. The reason I say this is that you're asking the database to aggregate over two different sets of Payment records.
Let's look at your current queryset:
Customer.objects.select_related().filter(Q(payment__isnull=False) & Q(payment__type='D')).values('name').annotate(Sum('payment__amount'))
Ignoring the extraneous Q() calls, the filter payment__type='D' means that the sum of payment__amount will only ever cover debits. If you change that to 'C', it will only ever cover credits. This query demonstrates a fundamental constraint imposed on you by Django's queryset language (at least in the Django 1.7 you're using) -- you can't really generate two different aggregations and annotate them onto a single record.
Taking a detour into raw SQL land to see how I'd write this query is another way of demonstrating the point. Note that I am still running two different Payment aggregations here: one for credits and one for debits.
SELECT
    *
FROM
    customer
INNER JOIN
    (
        SELECT customer_id, SUM(amount) AS total FROM Payment WHERE type='C' GROUP BY customer_id, type
    ) AS credits
    ON credits.customer_id = customer.id
INNER JOIN
    (
        SELECT customer_id, SUM(amount) AS total FROM Payment WHERE type='D' GROUP BY customer_id, type
    ) AS debits
    ON debits.customer_id = customer.id
That query will return data approximately of the form:
customer.id | customer.name | ... | credits.total | debits.total
----------------------------------------------------------------
1           | foo bar       |     | 20            | 30
2           | baz qux       |     | 30            | 20
If you try to use only one inner join/aggregation, you're forced to group by both payment type and customer, resulting in a table like this:
customer_id | type | sum(amount)
--------------------------------
1           | C    | 20
1           | D    | 30
2           | C    | 30
2           | D    | 20
When you inner join this intermediate result with your customers, it should be immediately clear that debits and credits still are not unified into a single record.
Because you can't do this sort of select with inner joins in Django (as far as I know), you can't really do what you're trying to do in a single query. However, there are solutions to your problem.
In descending order of desirability (in my opinion, of course -- and based on what I consider obviousness/maintainability of your resulting code), the first is to just do multiple queries and unify the results manually.
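A minimal sketch of that first option, using Python's built-in sqlite3 module in place of the Django ORM so that it runs on its own. The table layout, the sample rows (the Payments data from the question) and the helper name totals_by_type are assumptions made for illustration; in Django, each call would correspond to the question's queryset filtered on one payment type.

```python
import sqlite3

# In-memory stand-in for the Customer and Payment tables (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE payment (
    id INTEGER PRIMARY KEY,
    amount INTEGER,
    customer_id INTEGER REFERENCES customer(id),
    type TEXT
);
INSERT INTO customer (id, name) VALUES (1, 'cust 1'), (2, 'cust 2'), (3, 'cust 3');
INSERT INTO payment (amount, customer_id, type) VALUES
    (20, 1, 'D'), (10, 1, 'C'), (70, 2, 'D'), (20, 2, 'C'), (10, 2, 'D'),
    (25, 1, 'C'), (200, 3, 'D'), (10, 3, 'C'), (20, 1, 'D');
""")


def totals_by_type(payment_type):
    """Run one aggregation query and return {customer name: total}."""
    rows = conn.execute(
        "SELECT c.name, SUM(p.amount) "
        "FROM customer AS c JOIN payment AS p ON p.customer_id = c.id "
        "WHERE p.type = ? GROUP BY c.name",
        (payment_type,),
    )
    return dict(rows)


# Two separate queries, one per payment type.
credits = totals_by_type("C")
debits = totals_by_type("D")

# Unify the results manually; a customer missing from one side gets 0.
report = {
    name: {"credits": credits.get(name, 0), "debits": debits.get(name, 0)}
    for name in sorted(set(credits) | set(debits))
}

for name, totals in report.items():
    print(name, totals["credits"], totals["debits"])
```

The merge step is plain dictionary work, which is what keeps this option easy to read and maintain.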
You can also track credits/debits as a part of the Customer record. You're already tracking available funds this way (you're using F objects in your query to update/maintain these records, right?), so it's not really too much more onerous to maintain credit/debit summaries in similar fashion as well.
Lastly, and I don't think you should do this as I don't think there's a burning need to, you can perform a raw SQL query to get the results you need in one go.
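For completeness, here is a sketch of what that last option could look like: the two-subquery join from earlier in this answer, run against the question's sample data. It uses Python's sqlite3 module so it is self-contained; the table layout is an assumption, and in Django a statement like this would go through Manager.raw() or a database cursor instead.

```python
import sqlite3

# In-memory stand-in for the Customer and Payment tables (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE payment (
    id INTEGER PRIMARY KEY,
    amount INTEGER,
    customer_id INTEGER REFERENCES customer(id),
    type TEXT
);
INSERT INTO customer (id, name) VALUES (1, 'cust 1'), (2, 'cust 2'), (3, 'cust 3');
INSERT INTO payment (amount, customer_id, type) VALUES
    (20, 1, 'D'), (10, 1, 'C'), (70, 2, 'D'), (20, 2, 'C'), (10, 2, 'D'),
    (25, 1, 'C'), (200, 3, 'D'), (10, 3, 'C'), (20, 1, 'D');
""")

# One aggregation per payment type, each joined back to the customer row.
ONE_GO_SQL = """
SELECT customer.name, credits.total, debits.total
FROM customer
INNER JOIN (
    SELECT customer_id, SUM(amount) AS total
    FROM payment WHERE type = 'C' GROUP BY customer_id
) AS credits ON credits.customer_id = customer.id
INNER JOIN (
    SELECT customer_id, SUM(amount) AS total
    FROM payment WHERE type = 'D' GROUP BY customer_id
) AS debits ON debits.customer_id = customer.id
ORDER BY customer.name
"""

rows = conn.execute(ONE_GO_SQL).fetchall()
for name, credit_total, debit_total in rows:
    print(name, credit_total, debit_total)
```

Because both joins are inner joins, a customer with no credits or no debits drops out of the result; a LEFT JOIN would keep such customers with a NULL total.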

Related

Django queryset - Add HAVING constraint after annotate(F())

I have a seemingly ordinary need: adding HAVING to a query.
I read here and here, but it did not help me.
I need to add HAVING to my query.
MODELS :
class Package(models.Model):
    status = models.IntegerField()


class Product(models.Model):
    title = models.CharField(max_length=10)
    packages = models.ManyToManyField(Package)
Products:
|id|title|
| - | - |
| 1 | A |
| 2 | B |
| 3 | C |
| 4 | D |
Packages:
|id|status|
| - | - |
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 |
Product_Packages:
|product_id|package_id|
| - | - |
| 1 | 1 |
| 2 | 1 |
| 2 | 2 |
| 3 | 2 |
| 2 | 3 |
| 4 | 3 |
| 4 | 4 |
visual
pack_1 (A, B) status OK
pack_2 (B, C) status not ok
pack_3 (B, D) status OK
pack_4 (D) status not ok
My task is to select those products whose latest package has status = 1.
The expected result is: A, B
My query looks like this:
SELECT prod.title, max(pack.id)
FROM "product" as prod
INNER JOIN "product_packages" as p_p ON (prod.id = p_p.product_id)
INNER JOIN "package" as pack ON (pack.id = p_p.package_id)
GROUP BY prod.title
HAVING pack.status = 1
It returns exactly what I need:
|title|max(pack.id)|
| - | - |
| A | 1 |
| B | 3 |
But my ORM query does not work correctly.
I tried this:
p = Product.objects.values('id').annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1).values('id', 'pack_id')
p.query
SELECT "product"."id", MAX("product_packages"."package_id") AS "pack_id"
FROM "product" LEFT OUTER JOIN "product_packages" ON ("product"."id" = "product_packages"."product_id") LEFT OUTER JOIN "package" ON ("product_packages"."package_id" = "package"."id")
WHERE "package"."status" = 1
GROUP BY "product"."id"
Please help me write the correct ORM query.

What does the query look like when you remove
.values('id', 'pack_id')
at the end?
If I remember correctly, then:
p = Product.objects.values('id').annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1)
and
p = Product.objects.annotate(pack_id = Max('packages')).annotate(my_status = F('packages__status')).filter(my_status=1).values('id')
will result in different queries.

Haki Benita has an excellent site aimed at database gurus on how to make the most of Django.
You can take a look at this post:
https://hakibenita.com/django-group-by-sql#how-to-use-having
Django has a very specific way of adding the HAVING clause, i.e. your queryset needs to be structured so that a values() call singles out the column you want to group by, then an annotate() adds the Max or whatever aggregate you want; filtering on that aggregate annotation is what produces the HAVING clause.
Also, this annotation seems like it won't work: annotate(my_status=F('packages__status')). It tries to fit multiple statuses into a single annotation.
You might want to try a subquery to annotate the way you want.
e.g.
Product.objects.annotate(
    latest_pack_id=Subquery(
        Package.objects.order_by('-pk').filter(status=1).values('pk')[:1]
    )
).filter(
    packages__in=F('latest_pack_id')
)
Or something along those lines; I haven't tested this.

I think you can try it like this, with a subquery:
from django.db.models import OuterRef, Subquery
sub_query = Package.objects.filter(product=OuterRef('pk')).order_by('-pk')
products = Product.objects.annotate(latest_package_status=Subquery(sub_query.values('status')[:1])).filter(latest_package_status=1)
Here I first prepare the subquery by filtering the Package model on the Product's primary key and ordering the results by the Package's primary key, newest first. Then I take the latest value from the subquery with the [:1] slice (a slice keeps the queryset lazy, whereas indexing with [0] would try to evaluate it), annotate the Product queryset with it, and keep only the products whose latest status is 1.
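To check the idea against the question's sample data, here is a minimal sketch of the same "status of the latest package" logic written as a correlated subquery in plain SQL. It uses Python's sqlite3 module rather than Django, and the table and column names are assumptions modelled on the tables in the question.

```python
import sqlite3

# In-memory copy of the Products, Packages and Product_Packages tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE package (id INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE product_packages (product_id INTEGER, package_id INTEGER);
INSERT INTO product VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO package VALUES (1, 1), (2, 2), (3, 1), (4, 2);
INSERT INTO product_packages VALUES
    (1, 1), (2, 1), (2, 2), (3, 2), (2, 3), (4, 3), (4, 4);
""")

# For each product, look up the status of its highest-id package
# and keep the product only when that status is 1.
LATEST_STATUS_SQL = """
SELECT prod.title
FROM product AS prod
WHERE (
    SELECT pack.status
    FROM package AS pack
    JOIN product_packages AS p_p ON p_p.package_id = pack.id
    WHERE p_p.product_id = prod.id
    ORDER BY pack.id DESC
    LIMIT 1
) = 1
ORDER BY prod.title
"""

titles = [row[0] for row in conn.execute(LATEST_STATUS_SQL)]
print(titles)
```

Product D is excluded here because its latest package (id 4) has status 2, even though its older package 3 has status 1.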

Find min max from two different tables

I have 3 tables in a MySQL DB:
ORDER
order_id | order_date
---------------------
1        | 2021-09-20
2        | 2021-09-21
PRODUCTS
product_id | product_price
--------------------------
1          | 30
2          | 34
3          | 39
4          | 25
ORDER_PRODUCTS
product_id | order_id | discount_price
--------------------------------------
1          | 1        | null
2          | 1        | 18
1          | 2        | null
4          | 2        | null
Now I want to know the min and max prices of all products in each ORDER record that contains a given product_id (I need all the ORDERS that have the provided product), grouped by order_id. I have the required data for this, but here is the tricky part: the ORDER_PRODUCTS table holds the discount_price of a particular product for that specific ORDER.
So, when computing the MIN and MAX values, I want discount_price to take priority over product_price; if a product doesn't have a discount_price, then product_price should be used.
Example:
order_id | min_price                   | max_price
--------------------------------------------------
1        | 18 (p_id=2, discount price) | 30 (p_id=1)
2        | 25 (p_id=4)                 | 30 (p_id=1)

If I understand correctly, you are looking for the IFNULL() function; you can read about it here.
You can simply wrap the IFNULL() call in the appropriate aggregate function. Group by order_id only, so that each order yields one row, and quote ORDER, which is a reserved word in MySQL:
select o.order_id,
       min(ifnull(op.discount_price, p.product_price)) as min_price,
       max(ifnull(op.discount_price, p.product_price)) as max_price
from PRODUCTS p
inner join ORDER_PRODUCTS op on op.product_id = p.product_id
inner join `ORDER` o on o.order_id = op.order_id
group by o.order_id
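A runnable sketch of this approach against the question's sample data, using Python's sqlite3 module (SQLite also provides IFNULL). The table is named orders here to sidestep the reserved word, and the restriction to orders containing a given product, along with the helper name price_range_for_orders_with, is an assumption about how the "provided product" requirement might be expressed.

```python
import sqlite3

# In-memory copy of the three tables from the question (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_price INTEGER);
CREATE TABLE order_products (
    product_id INTEGER, order_id INTEGER, discount_price INTEGER
);
INSERT INTO orders VALUES (1, '2021-09-20'), (2, '2021-09-21');
INSERT INTO products VALUES (1, 30), (2, 34), (3, 39), (4, 25);
INSERT INTO order_products VALUES
    (1, 1, NULL), (2, 1, 18), (1, 2, NULL), (4, 2, NULL);
""")


def price_range_for_orders_with(product_id):
    """Min/max effective price per order, for orders containing product_id."""
    sql = """
    SELECT op.order_id,
           MIN(IFNULL(op.discount_price, p.product_price)) AS min_price,
           MAX(IFNULL(op.discount_price, p.product_price)) AS max_price
    FROM order_products AS op
    JOIN products AS p ON p.product_id = op.product_id
    WHERE op.order_id IN (
        SELECT order_id FROM order_products WHERE product_id = ?
    )
    GROUP BY op.order_id
    ORDER BY op.order_id
    """
    return conn.execute(sql, (product_id,)).fetchall()


print(price_range_for_orders_with(1))
```

For product 1, which appears in both orders, this reproduces the example table from the question: 18 and 30 for order 1, 25 and 30 for order 2.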

Annotating a count of a superset of fields with Django

So the setup here is that I have a Post table that contains a bunch of posts. Some of these rows are different versions of the same post, which are grouped together by post_version_group_id, so it looks something like:
pk | title | post_version_group_id
1 | a | 123
2 | b | 789
3 | c | 123
4 | d | 123
so there are two "groups" of posts, and 4 posts in total. Now each post has a foreign key pointing to a PostDownloads table that looks like
post | user_downloaded
1 | user1
2 | user2
3 | user3
4 | user4
what I'd like to be able to do is annotate my Post queryset so that it looks like:
pk | title | post_version_group_id | download_count
1 | a | 123 | 3
2 | b | 789 | 1
3 | c | 123 | 3
4 | d | 123 | 3
i.e have all the posts with the same post_version_group_id have the same count (being the sum of downloads across the different versions).
At the moment, I'm currently doing:
Post.objects.all().annotate(download_count=models.Count("downloads__user_downloaded", distinct=True))
which doesn't quite work, it annotates a download_count which looks like:
pk | title | post_version_group_id | download_count
1 | a | 123 | 1
2 | b | 789 | 1
3 | c | 123 | 1
4 | d | 123 | 1
since downloads__user_downloaded seems to be limited to the rows of the downloads table that are linked to the post row currently being annotated, which makes sense, really, but is working against me in this particular case.
One thing I've also tried is
Post.objects.all().values("post_version_group_id").annotate(download_count=Count("downloads__user_downloaded", distinct=True))
which kind of works, but the .values() bit turns the queryset of Post instances into a queryset of dicts, and I need it to stay a queryset of Post instances.
The actual models look something like:
class Post:
    title = models.CharField()
    post_version_group_id = models.UUIDField()


class PostDownloads:
    post = models.ForeignKey(Post)
    user_downloaded = models.ForeignKey(User)

So, I ended up figuring this out and thought I'd post the answer for anybody else who got stuck in the same rut. The key here was using a Subquery, but not just any Subquery: a custom one that returns a count of rows rather than the default Subquery type that returns a single row of data.
The first step is defining this custom subquery type:
class SubqueryCount(models.Subquery):
    template = "(SELECT count(*) FROM (%(subquery)s) _count)"
    output_field = models.IntegerField()
Then building the subquery:
downloads_subquery = (
    PostDownloads.objects
    .filter(
        post__post_version_group_id=models.OuterRef("post_version_group_id")
    )
    # Field name taken from the PostDownloads model in the question.
    .distinct("user_downloaded")
)
which filters based on that grouping version id I had.
And finally, executing the subquery in the annotation:
Post.objects.annotate(download_count=SubqueryCount(downloads_subquery))
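The effect of that annotation can be checked on the question's sample rows with a small sketch in plain SQL, run through Python's sqlite3 module. Note that this is a simplified variant rather than the exact SQL Django generates: it uses COUNT(DISTINCT ...) inside a correlated scalar subquery instead of the count-over-subquery template, and the table and column names are assumptions.

```python
import sqlite3

# In-memory copy of the Post and PostDownloads sample data (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (
    id INTEGER PRIMARY KEY, title TEXT, post_version_group_id INTEGER
);
CREATE TABLE post_downloads (post_id INTEGER, user_downloaded TEXT);
INSERT INTO post VALUES (1, 'a', 123), (2, 'b', 789), (3, 'c', 123), (4, 'd', 123);
INSERT INTO post_downloads VALUES
    (1, 'user1'), (2, 'user2'), (3, 'user3'), (4, 'user4');
""")

# For every post, count the distinct downloaders across all posts
# that share its post_version_group_id.
GROUP_COUNT_SQL = """
SELECT outer_post.id,
       outer_post.title,
       outer_post.post_version_group_id,
       (
           SELECT COUNT(DISTINCT d.user_downloaded)
           FROM post_downloads AS d
           JOIN post AS sibling ON sibling.id = d.post_id
           WHERE sibling.post_version_group_id = outer_post.post_version_group_id
       ) AS download_count
FROM post AS outer_post
ORDER BY outer_post.id
"""

rows = conn.execute(GROUP_COUNT_SQL).fetchall()
for row in rows:
    print(row)
```

Each post row is kept as its own result row, which is the property the .values() approach in the question lost.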

Django Subquery Subset of Previous Records

I am trying to annotate a queryset with an aggregate of a subset of previous rows. Take the following example table of a player's score in a particular game, with the column, last_2_average_score being the rolling average from the previous two games score for a particular player.
+----------+-----------+---------+-------------------------+
| date | player | score | last_2_average_score |
+----------+-----------+---------+-------------------------+
| 12/01/19 | 1 | 10 | None |
| 12/02/19 | 1 | 9 | None |
| 12/03/19 | 1 | 8 | 9.5 |
| 12/04/19 | 1 | 7 | 8.5 |
| 12/05/19 | 1 | 6 | 7.5 |
+----------+-----------+---------+-------------------------+
In order to accomplish this, I wrote the following query, trying to annotate each "row" with the corresponding 2-game average for their score:
ScoreModel.objects.annotate(
    last_two_average_score=Subquery(
        ScoreModel.objects.filter(
            player=OuterRef("player"), date__lt=OuterRef("date")
        )
        .order_by("-date")[:2]
        .annotate(Avg("score"))
        .values("score__avg")[:1],
        output_field=FloatField(),
    )
)
This query however, does not output the correct result. In fact the result is just every record annotated with
{'last_two_average_score': None}
I have tried a variety of different combinations of the query, and cannot find the correct combination. Any advice that you can give would be much appreciated!

Instead of trying to address the problem from the ORM first, I ended up circling back and first trying to implement the query in raw SQL. This immediately led me to the concept of window functions, which I then found very quickly in Django's ORM.
https://docs.djangoproject.com/en/3.0/ref/models/expressions/#window-functions
For those interested, the resulting query looks something like this, which is much simpler than what I was trying to accomplish with Subquery:
ScoreModel.objects.annotate(
    last_two_average=Window(
        expression=Avg("score"),
        partition_by=[F("player")],
        order_by=[F("date").desc()],
        frame=RowRange(start=-2, end=0),
    )
)
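As a rough illustration of the raw SQL side, the sketch below runs a window function over the question's sample rows with Python's sqlite3 module (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The table name and ISO date format are assumptions. To line up with the table in the question, the window here is ordered by date ascending and the frame covers the two preceding rows, excluding the current one.

```python
import sqlite3

# In-memory copy of the score table from the question (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE score (date TEXT, player INTEGER, score INTEGER);
INSERT INTO score VALUES
    ('2019-12-01', 1, 10),
    ('2019-12-02', 1, 9),
    ('2019-12-03', 1, 8),
    ('2019-12-04', 1, 7),
    ('2019-12-05', 1, 6);
""")

# Average of the two games before the current one, per player.
ROLLING_SQL = """
SELECT date,
       score,
       AVG(score) OVER (
           PARTITION BY player
           ORDER BY date
           ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
       ) AS last_two_average
FROM score
ORDER BY player, date
"""

averages = [row[2] for row in conn.execute(ROLLING_SQL)]
print(averages)
```

One difference from the question's table: the second game gets 10.0 (the average of the single earlier game) rather than None, because the frame simply contains fewer rows there. Requiring exactly two prior games would need an extra condition on the row count.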

How to make func.sum and group_by to output sum of the rows and merge the duplicate rows using sqlalchemy

I want to generate a table which sums the number of books that are sold and the total amount paid for that distinct book in a given period of time. I need it to show a report of books that are sold.
My subquery is:
bp = db.session.query(CustomerPurchase.book_category_id,
                      func.sum(CustomerPurchase.amount).label('amount'),
                      func.sum(CustomerPurchase.total_price).label('total_price'))\
    .filter(CustomerPurchase.created_on >= start_date)\
    .filter(CustomerPurchase.created_on <= end_date)\
    .group_by(CustomerPurchase.book_category_id).subquery()
Combined query with a subquery:
cp = CustomerPurchase.query\
    .join(bp, bp.c.category_id == CustomerPurchase.category_id)\
    .distinct(bp.c.category_id)\
    .order_by(bp.c.category_id)
My CustomerPurchase table looks like this and the output of my query looks the same:
id | book_category_id | book_title | amount | total_price |
---+------------------+------------+--------+-------------+
1 | 1 | Book A | 10 | 35.00 |
2 | 1 | Book A | 20 | 70.00 |
3 | 2 | Book B | 40 | 45.00 |
Desired output after the query run should be like this:
id | book_category_id | book_title | amount | total_price |
---+------------------+------------+--------+-------------+
1 | 1 | Book A | 30 | 105.00 |
2 | 2 | Book B | 40 | 45.00 |
The above query displays all the books that are sold to customers from the CustomerPurchase table, but it doesn't SUM the amount and total_price, nor does it merge the duplicate rows.
I have seen many examples, but none of them worked for me. Any help is greatly appreciated! Thanks in advance!

So after a lot of research and trial and error I came up with a query that solved my problem. Basically, I used the add_column method in SQLAlchemy, which gave me exactly the rows I wanted to display in my report.
bp = db.session.query(CustomerPurchase.book_store_category_id,
                      func.sum(CustomerPurchase.quantity).label('quantity'),
                      func.sum(CustomerPurchase.total_price).label('total'))\
    .filter(CustomerPurchase.created_on >= start_date)\
    .filter(CustomerPurchase.created_on <= end_date)
bp = bp.add_column(BookStore.book_amount)\
    .filter(BookStore.category_id == CustomerPurchase.book_store_category_id)
bp = bp.add_columns(Category.category_name, Category.total_stock_amount)\
    .filter(Category.id == CustomerPurchase.book_store_category_id)
bp = bp.add_column(Category.unit_cost)\
    .filter(Category.id == CustomerPurchase.book_store_category_id)
bp = bp.add_column(Book.stock_amount)\
    .filter(Book.category_id == CustomerPurchase.book_store_category_id)\
    .group_by(BookStore.book_amount, CustomerPurchase.book_store_category_id, Category.category_name, Category.unit_cost, Category.total_stock_amount, Book.stock_amount)
bp = bp.all()
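For reference, the merged report in the question is what a plain GROUP BY with SUM produces once every non-aggregated column is part of the grouping. The sketch below shows that on the question's three sample rows, using Python's sqlite3 module instead of SQLAlchemy so it runs on its own. The table name is an assumption, and the date filter is left out because the sample rows have no created_on values.

```python
import sqlite3

# In-memory copy of the CustomerPurchase sample rows (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_purchase (
    id INTEGER PRIMARY KEY,
    book_category_id INTEGER,
    book_title TEXT,
    amount INTEGER,
    total_price REAL
);
INSERT INTO customer_purchase VALUES
    (1, 1, 'Book A', 10, 35.0),
    (2, 1, 'Book A', 20, 70.0),
    (3, 2, 'Book B', 40, 45.0);
""")

# Every selected column is either grouped on or aggregated,
# so duplicate categories collapse into one summed row.
REPORT_SQL = """
SELECT book_category_id,
       book_title,
       SUM(amount) AS amount,
       SUM(total_price) AS total_price
FROM customer_purchase
GROUP BY book_category_id, book_title
ORDER BY book_category_id
"""

rows = conn.execute(REPORT_SQL).fetchall()
for row in rows:
    print(row)
```

The group_by call in the answer above follows the same rule: each column added with add_column also appears in the grouping.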
