Django - Convert SQL Query to ORM Query (Subquery)

SELECT *,
(SELECT sum(amount) FROM history
WHERE history_id IN
(SELECT history_id FROM web_cargroup
WHERE group_id = a.group_id) AND type = 1)
as sum
FROM web_car a;
It is very difficult to convert the above query to the ORM:
1. ORM annotate() automatically adds a GROUP BY.
2. It is difficult to put a subquery in the 'in' condition.
Please help.

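If I understand the models you presented, something like this gives the total amount. Note that aggregate() returns a single dict with one total over all matching History rows, not one sum per car as in the SQL above: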
from django.db.models import Sum
History.objects.filter(
type=1,
id__in=(CarGroup.objects.values('history_id'))
).aggregate(
total_amount=Sum('amount')
)
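To check what the SQL in the question actually returns, here is a minimal sketch using Python's built-in sqlite3 module instead of the ORM. The column layout is inferred from the query itself, and the sample rows are made up. It suggests that the correlated subquery yields one sum per car, so an ORM translation would need a per-row annotation (for example an annotate() over a correlated subquery) rather than a single aggregate() total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, reduced to the columns the question's SQL mentions.
conn.executescript("""
    CREATE TABLE web_car (id INTEGER PRIMARY KEY, group_id INTEGER);
    CREATE TABLE web_cargroup (group_id INTEGER, history_id INTEGER);
    CREATE TABLE history (history_id INTEGER, amount INTEGER, type INTEGER);

    INSERT INTO web_car VALUES (1, 10), (2, 20);
    INSERT INTO web_cargroup VALUES (10, 100), (10, 101), (20, 200);
    INSERT INTO history VALUES
        (100, 5, 1), (101, 7, 1), (200, 3, 1), (200, 9, 2);
""")

# The query from the question: the subquery is re-evaluated for every car row.
per_car = conn.execute("""
    SELECT a.id,
           (SELECT SUM(amount) FROM history
             WHERE history_id IN (SELECT history_id FROM web_cargroup
                                   WHERE group_id = a.group_id)
               AND type = 1) AS amount_sum
      FROM web_car a
     ORDER BY a.id
""").fetchall()

# Uncorrelated version: a single total over every group.
single_total = conn.execute("""
    SELECT SUM(amount) FROM history
     WHERE type = 1
       AND history_id IN (SELECT history_id FROM web_cargroup)
""").fetchone()[0]

print(per_car)       # one (car id, sum) pair per car
print(single_total)  # one number for the whole table
```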

Related

Aggregating joined tables in SQLAlchemy

I got this aggregate query working in the Django ORM; it computes some counts and percentages over a big queryset and returns the result as a dictionary.
queryset = Game.objects.prefetch_related(
"timestamp",
"fighters",
"score",
"coefs",
"rounds",
"rounds_view",
"rounds_view_f",
"finishes",
"rounds_time",
"round_time",
"time_coef",
"totals",
).all()
values = queryset.aggregate(
first_win_cnt=Count("score", filter=Q(score__first_score=5)),
min_time_avg=Avg("round_time__min_time"),
# and so on
) # -> dict
I'm trying to achieve the same using SQLAlchemy, and this is my attempt so far:
q = (
db.query(
models.Game,
func.count(models.Score.first_score)
.filter(models.Score.first_score == 5)
.label("first_win_cnt"),
)
.join(models.Game.fighters)
.filter_by(**fighter_options)
.join(models.Game.timestamp)
.join(
models.Game.coefs,
models.Game.rounds,
models.Game.rounds_view,
models.Game.rounds_view_f,
models.Game.finishes,
models.Game.score,
models.Game.rounds_time,
models.Game.round_time,
models.Game.time_coef,
models.Game.totals,
)
.options(
contains_eager(models.Game.fighters),
contains_eager(models.Game.timestamp),
contains_eager(models.Game.coefs),
contains_eager(models.Game.rounds),
contains_eager(models.Game.rounds_view),
contains_eager(models.Game.rounds_view_f),
contains_eager(models.Game.finishes),
contains_eager(models.Game.score),
contains_eager(models.Game.rounds_time),
contains_eager(models.Game.round_time),
contains_eager(models.Game.time_coef),
contains_eager(models.Game.totals),
)
.all()
)
And it gives me an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError)
column "stats_fighters.id" must appear in the GROUP BY clause or be
used in an aggregate function LINE 1: SELECT stats_fighters.id AS
stats_fighters_id, stats_fighter...
I don't really understand why stats_fighters.id should be in the GROUP BY, or why I need GROUP BY at all. Please help.
This is the SQL that the Django ORM generates:
SELECT
AVG("stats_roundtime"."min_time") AS "min_time_avg",
COUNT("stats_score"."id") FILTER (
WHERE "stats_score"."first_score" = 5) AS "first_win_cnt"
FROM "stats_game" LEFT OUTER JOIN "stats_roundtime" ON ("stats_game"."id" = "stats_roundtime"."game_id")
LEFT OUTER JOIN "stats_score" ON ("stats_game"."id" = "stats_score"."game_id")
GROUP BY collects rows that share the same values so that a summary can be calculated per group. It is often used with SUM, MAX, MIN or AVG.
Since SQLAlchemy generates the final SQL statement, you need to know your table structure and work out how to make SQLAlchemy generate the right SQL.
The documentation says there is a group_by method in SQLAlchemy.
Maybe this code will help.
q = (
db.query(
models.Game,
func.count(models.Score.first_score)
.filter(models.Score.first_score == 5)
.label("first_win_cnt"),
)
.join(models.Game.fighters)
.filter_by(**fighter_options)
.join(models.Game.timestamp)
.group_by(models.Game.fighters)
.join(
models.Game.coefs,
models.Game.rounds,
models.Game.rounds_view,
models.Game.rounds_view_f,
models.Game.finishes,
models.Game.score,
models.Game.rounds_time,
models.Game.round_time,
models.Game.time_coef,
models.Game.totals,
)
)
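func.count is an aggregate function. If any expression in your SELECT clause uses an aggregate, then every expression in the SELECT must be a constant, an aggregate, or appear in the GROUP BY.
If you try SELECT a, max(b), the SQL parser will complain that a is neither an aggregate nor in the GROUP BY. Your query selects models.Game, and each contains_eager() adds the joined table's columns to the SELECT as well, which is why stats_fighters.id shows up in the error; the Django query avoids this by selecting only aggregates. In your case, you may consider adding models.Game to the GROUP BY, or selecting only the aggregates.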
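As a rough illustration of that rule, the sketch below runs the same kind of aggregate-only query that Django generated, using Python's built-in sqlite3 module rather than SQLAlchemy. Table and column names are taken from the SQL in the question, the sample rows are invented, and COUNT(CASE ...) stands in for the FILTER clause so that it also runs on older SQLite versions. SQLite is lenient about ungrouped columns, so it will not reproduce the PostgreSQL error; the point is only that a SELECT made of nothing but aggregates returns one summary row, while adding GROUP BY returns one row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the tables named in the question.
conn.executescript("""
    CREATE TABLE stats_game (id INTEGER PRIMARY KEY);
    CREATE TABLE stats_score (
        id INTEGER PRIMARY KEY, game_id INTEGER, first_score INTEGER);
    CREATE TABLE stats_roundtime (
        id INTEGER PRIMARY KEY, game_id INTEGER, min_time REAL);

    INSERT INTO stats_game VALUES (1), (2), (3);
    INSERT INTO stats_score VALUES (1, 1, 5), (2, 2, 3), (3, 3, 5);
    INSERT INTO stats_roundtime VALUES (1, 1, 10.0), (2, 2, 20.0), (3, 3, 30.0);
""")

joins = """
    FROM stats_game g
    LEFT JOIN stats_roundtime rt ON g.id = rt.game_id
    LEFT JOIN stats_score s ON g.id = s.game_id
"""

# Only aggregates in the SELECT list: no GROUP BY needed, one row comes back.
summary = conn.execute(
    "SELECT AVG(rt.min_time), "
    "COUNT(CASE WHEN s.first_score = 5 THEN s.id END) " + joins
).fetchall()

# A plain column next to an aggregate is listed in GROUP BY,
# and the result is then one row per group.
per_game = conn.execute(
    "SELECT g.id, COUNT(CASE WHEN s.first_score = 5 THEN s.id END) "
    + joins + " GROUP BY g.id ORDER BY g.id"
).fetchall()

print(summary)
print(per_game)
```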

How to query from nested SELECT using SQLAlchemy ORM [duplicate]

Is there any way to write the following SQL statement in the SQLAlchemy ORM:
SELECT AVG(a1) FROM (SELECT sum(irterm.n) AS a1 FROM irterm GROUP BY irterm.item_id);
Thank you
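from sqlalchemy import func
# Inner query: one sum per item_id, exposed as column a1; the outer query averages those sums.
sums = session.query(func.sum(Irterm.n).label('a1')).group_by(Irterm.item_id).subquery()
average = session.query(func.avg(sums.c.a1)).scalar()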
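To sanity-check what the ORM version should return, the original SQL can be run directly with Python's built-in sqlite3 module. This is only a sketch: the irterm table is reduced to the two columns used in the query, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE irterm (item_id INTEGER, n INTEGER);
    INSERT INTO irterm VALUES (1, 2), (1, 4), (2, 10);
""")

# Inner query: one sum per item_id (6 and 10); outer query: their average.
average = conn.execute("""
    SELECT AVG(a1)
      FROM (SELECT SUM(irterm.n) AS a1 FROM irterm GROUP BY irterm.item_id)
""").fetchone()[0]

print(average)
```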

Django Subquery with Scalar Value

Is it possible to compare subquery results with scalar values using the Django ORM? I'm having a hard time converting this:
SELECT payment_subscription.*
FROM payment_subscription payment_subscription
JOIN payment_recurrent payment_recurrent ON payment_subscription.id = payment_recurrent.subscription_id
WHERE
payment_subscription.status = 1
AND (SELECT expiration_date
FROM payment_transaction payment_transaction
WHERE payment_transaction.company_id = payment_subscription.company_id
AND payment_transaction.status IN ('OK', 'Complete')
ORDER BY payment_transaction.expiration_date DESC, payment_transaction.id DESC
LIMIT 1) <= ?
The main points are:
1. The final comparison of the subquery's scalar value with an arbitrary parameter.
2. The join between the subquery and the outer query on the company id.
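from django.db.models import Max
Subscription.objects.filter(
status=1,
recurrent__isnull=False, # [inner] join with recurrent
transaction__status__in=['OK', 'Complete'],
).annotate(
# filter() comes first so that Max only sees OK/Complete transactions
max_expiration_date=Max('transaction__expiration_date')
).filter(
max_expiration_date__lte=date_value # the SQL compares with <=
)
This produces a different SQL query, but should return the same Subscription objects.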
You can (as of Django 1.11) annotate on a subquery and slice it to ensure you only get the "first" result. You can then filter on that subquery annotation by comparing it to the value you want.
from django.db.models.expressions import Subquery, OuterRef
expiration_date = Transaction.objects.filter(
company=OuterRef('company'),
status__in=['OK', 'Complete'],
).order_by('-expiration_date', '-id').values('expiration_date')[:1]
Subscription.objects.filter(status=1).annotate(
expiration_date=Subquery(expiration_date),
).filter(expiration_date__lte=THE_DATE)
However...
At the time of writing, that can result in really poor performance: your database will evaluate the subquery twice (once in the WHERE clause, from the filter, and again in the SELECT clause, from the annotation). There is work underway to resolve this, but it's not yet complete.
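For reference, the behaviour that any ORM translation has to match can be checked by running the original SQL with Python's built-in sqlite3 module. The sketch below assumes trimmed-down versions of the three tables named in the question, with invented rows, and stores dates as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns that the question's SQL touches.
conn.executescript("""
    CREATE TABLE payment_subscription (
        id INTEGER PRIMARY KEY, company_id INTEGER, status INTEGER);
    CREATE TABLE payment_recurrent (
        id INTEGER PRIMARY KEY, subscription_id INTEGER);
    CREATE TABLE payment_transaction (
        id INTEGER PRIMARY KEY, company_id INTEGER,
        status TEXT, expiration_date TEXT);

    INSERT INTO payment_subscription VALUES (1, 100, 1), (2, 200, 1);
    INSERT INTO payment_recurrent VALUES (1, 1), (2, 2);
    INSERT INTO payment_transaction VALUES
        (1, 100, 'OK',       '2020-01-31'),
        (2, 100, 'Failed',   '2020-12-31'),
        (3, 200, 'Complete', '2020-06-30');
""")

# The scalar subquery picks the latest OK/Complete expiration date
# for the subscription's company and compares it with the parameter.
query = """
    SELECT s.id
      FROM payment_subscription s
      JOIN payment_recurrent r ON s.id = r.subscription_id
     WHERE s.status = 1
       AND (SELECT t.expiration_date
              FROM payment_transaction t
             WHERE t.company_id = s.company_id
               AND t.status IN ('OK', 'Complete')
             ORDER BY t.expiration_date DESC, t.id DESC
             LIMIT 1) <= ?
     ORDER BY s.id
"""

expired_by_march = [row[0] for row in conn.execute(query, ("2020-03-01",))]
expired_by_july = [row[0] for row in conn.execute(query, ("2020-07-01",))]

print(expired_by_march)
print(expired_by_july)
```

The 'Failed' transaction has the latest date for company 100 but is ignored by the status condition, which is the detail a Max()-based rewrite has to preserve.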

How do I select from a many-to-many intermediate model in django?

I have models of books and people:
from django.db import models

class Book(models.Model):
    author = models.ManyToManyField('Person')

class Person(models.Model):
    name = models.CharField(max_length=16)
I simplified them a bit here. How do I craft a django query to get all of the authors of the books? With SQL I would do a select on the intermediate table and join it with the people table to get the name, but I'm not sure how to do something similar here... Of course, there are people in the Person table that are not book authors, or I could just get Person.objects.all().
As easy as 1,2,3 with Filtering on annotations:
from django.db.models import Count
Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)
Out of curiosity, I generated the SQL for each of the approaches proposed in this thread:
In [9]: Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name", COUNT("testapp_book_author"."book_id") AS "count_book" FROM "testapp_person" LEFT OUTER JOIN "testapp_book_author" ON ("testapp_person"."id" = "testapp_book_author"."person_id") GROUP BY "testapp_person"."id", "testapp_person"."name", "testapp_person"."id", "testapp_person"."name" HAVING COUNT("testapp_book_author"."book_id") > 0 LIMIT 21; args=(0,)
Out[9]: [<Person: Person object>]
In [10]: Person.objects.exclude(book=None)
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" WHERE NOT (("testapp_person"."id" IN (SELECT U0."id" FROM "testapp_person" U0 LEFT OUTER JOIN "testapp_book_author" U1 ON (U0."id" = U1."person_id") LEFT OUTER JOIN "testapp_book" U2 ON (U1."book_id" = U2."id") WHERE (U2."id" IS NULL AND U0."id" IS NOT NULL)) AND "testapp_person"."id" IS NOT NULL)) LIMIT 21; args=()
Out[10]: [<Person: Person object>]
In [11]: Person.objects.filter(pk__in=Book.objects.values_list('author').distinct())
DEBUG (0.000) SELECT "testapp_person"."id", "testapp_person"."name" FROM "testapp_person" WHERE "testapp_person"."id" IN (SELECT DISTINCT U1."person_id" FROM "testapp_book" U0 LEFT OUTER JOIN "testapp_book_author" U1 ON (U0."id" = U1."book_id")) LIMIT 21; args=()
Out[11]: [<Person: Person object>]
Maybe this can help you choose.
Personally, I prefer the version by Chris because it is the shortest. On the other hand, I don't know for sure what the impact of the subqueries is, which the two other approaches rely on. That said, they do demonstrate interesting QuerySet concepts:
Annotation is aggregation per value of the queryset. If you use aggregate(Count('book')), you get the total number of books. If you use annotate(Count('book')), you get the number of books per value of the queryset (per Person). Also, each person gets a 'count_book' attribute, which is pretty cool: Person.objects.annotate(count_book=Count('book')).filter(count_book__gt=0)[0].count_book
Subqueries are very useful for building complex queries or optimizing queries (merging querysets or prefetching generic relations, for example).
Easiest way:
Person.objects.filter(book__isnull=False)
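That will select all people that have at least one book associated with them. Because the filter joins through the many-to-many table, a person with several books can appear more than once; append .distinct() if that matters.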
You can get all the IDs of the authors quite easily using
Book.objects.all().values_list('author', flat=True).distinct()
As jpic points out below,
Person.objects.filter(pk__in=Book.objects.values_list('author').distinct()) will give you all the Person objects and not just their IDs.
This chapter of the django book answers all your model needs with examples very similar to yours, including a ManyToMany relation. http://www.djangobook.com/en/2.0/chapter10/
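The SQL approach described in the question (select from the intermediate table and join it with the people table) can be sketched with Python's built-in sqlite3 module. The table names follow the testapp_* names in the debug output above; the rows are invented. It also shows why a plain join can return the same person more than once, which DISTINCT removes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testapp_person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE testapp_book (id INTEGER PRIMARY KEY);
    CREATE TABLE testapp_book_author (
        id INTEGER PRIMARY KEY, book_id INTEGER, person_id INTEGER);

    INSERT INTO testapp_person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO testapp_book VALUES (1), (2);
    -- Ann wrote both books, Bob co-wrote book 2, Cy wrote nothing.
    INSERT INTO testapp_book_author VALUES (1, 1, 1), (2, 2, 1), (3, 2, 2);
""")

join_sql = """
    FROM testapp_person p
    JOIN testapp_book_author ba ON p.id = ba.person_id
"""

# One row per (person, book) pair: Ann shows up twice.
with_duplicates = [
    r[0] for r in conn.execute("SELECT p.name" + join_sql + "ORDER BY p.id")
]

# DISTINCT collapses the pairs back to one row per author.
authors = [
    r[0]
    for r in conn.execute("SELECT DISTINCT p.name" + join_sql + "ORDER BY p.name")
]

print(with_duplicates)
print(authors)
```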
