Django Subquery with Scalar Value - python

Is it possible to compare subquery results with scalar values using the Django ORM? I'm having a hard time converting this:
SELECT payment_subscription.*
FROM payment_subscription payment_subscription
JOIN payment_recurrent payment_recurrent ON payment_subscription.id = payment_recurrent.subscription_id
WHERE
    payment_subscription.status = 1
    AND (SELECT expiration_date
         FROM payment_transaction payment_transaction
         WHERE payment_transaction.company_id = payment_subscription.company_id
           AND payment_transaction.status IN ('OK', 'Complete')
         ORDER BY payment_transaction.expiration_date DESC, payment_transaction.id DESC
         LIMIT 1) <= ?
The main points are:
- the comparison of the subquery's scalar result against an arbitrary parameter, and
- the correlation (join) between the subquery and the outer query on the company id.

from django.db.models import Max

Subscription.objects.annotate(
    max_expiration_date=Max('transaction__expiration_date')
).filter(
    status=1,
    recurrent__isnull=False,  # [inner] join with recurrent
    transaction__status__in=['OK', 'Complete'],
    max_expiration_date__lte=date_value
)
This produces a different SQL query, but returns the same Subscription objects.

You can (as of Django 1.11) annotate on a subquery, and slice it to ensure you only get the "first" result. You can then filter on that subquery annotation, comparing it to the value you want.
from django.db.models.expressions import Subquery, OuterRef

expiration_date = Transaction.objects.filter(
    company=OuterRef('company'),
    status__in=['OK', 'Complete'],
).order_by('-expiration_date').values('expiration_date')[:1]

Subscription.objects.filter(status=1).annotate(
    expiration_date=Subquery(expiration_date),
).filter(expiration_date__lte=THE_DATE)
However...
Currently that can result in really poor performance: your database will evaluate the subquery twice (once in the where clause, from the filter, and again in the select clause, from the annotation). There is work underway to resolve this, but it's not currently complete.
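For intuition, the query shape that Subquery/OuterRef emits can be reproduced in plain SQL. A minimal sketch using Python's sqlite3, with a toy schema invented for illustration:

```python
import sqlite3

# Toy schema loosely mirroring the question (names invented for illustration).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE subscription (id INTEGER PRIMARY KEY, company_id INT, status INT);
CREATE TABLE tx (id INTEGER PRIMARY KEY, company_id INT, status TEXT,
                 expiration_date TEXT);
INSERT INTO subscription VALUES (1, 10, 1), (2, 20, 1);
INSERT INTO tx VALUES
    (1, 10, 'OK',       '2020-01-01'),
    (2, 10, 'Complete', '2020-06-01'),  -- latest for company 10
    (3, 20, 'OK',       '2021-01-01');  -- latest for company 20
""")

# Correlated scalar subquery: one expiration_date per outer row, ordered and
# limited to a single value, then compared against a bound parameter.
rows = con.execute("""
    SELECT s.id FROM subscription s
    WHERE s.status = 1
      AND (SELECT t.expiration_date FROM tx t
           WHERE t.company_id = s.company_id
             AND t.status IN ('OK', 'Complete')
           ORDER BY t.expiration_date DESC, t.id DESC
           LIMIT 1) <= ?
""", ("2020-12-31",)).fetchall()
print(rows)  # [(1,)] -- only company 10's latest date is on or before the cutoff
```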

Related

Aggregating joined tables in SQLAlchemy

I got this aggregation working in the Django ORM; it computes some counts and percentages over a large queryset and returns the result as a dictionary.
queryset = Game.objects.prefetch_related(
    "timestamp",
    "fighters",
    "score",
    "coefs",
    "rounds",
    "rounds_view",
    "rounds_view_f",
    "finishes",
    "rounds_time",
    "round_time",
    "time_coef",
    "totals",
).all()

values = queryset.aggregate(
    first_win_cnt=Count("score", filter=Q(score__first_score=5)),
    min_time_avg=Avg("round_time__min_time"),
    # and so on
)  # -> dict
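Under the hood that aggregate() call is just a mix of plain and conditional aggregates collapsed into one summary row. A rough stand-in in plain SQL via sqlite3 (schema and data invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE score (id INTEGER PRIMARY KEY, first_score INT);
CREATE TABLE round_time (id INTEGER PRIMARY KEY, min_time REAL);
INSERT INTO score VALUES (1, 5), (2, 3), (3, 5);
INSERT INTO round_time VALUES (1, 10.0), (2, 20.0);
""")

# Count(..., filter=Q(...)) translates to a conditional aggregate; the
# portable spelling is COUNT(CASE WHEN ... THEN 1 END).
first_win_cnt = con.execute(
    "SELECT COUNT(CASE WHEN first_score = 5 THEN 1 END) FROM score"
).fetchone()[0]
min_time_avg = con.execute(
    "SELECT AVG(min_time) FROM round_time"
).fetchone()[0]

values = {"first_win_cnt": first_win_cnt, "min_time_avg": min_time_avg}
print(values)  # {'first_win_cnt': 2, 'min_time_avg': 15.0}
```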
I'm trying to achieve the same using SQLAlchemy, and this is my attempt so far:
q = (
    db.query(
        models.Game,
        func.count(models.Score.first_score)
        .filter(models.Score.first_score == 5)
        .label("first_win_cnt"),
    )
    .join(models.Game.fighters)
    .filter_by(**fighter_options)
    .join(models.Game.timestamp)
    .join(
        models.Game.coefs,
        models.Game.rounds,
        models.Game.rounds_view,
        models.Game.rounds_view_f,
        models.Game.finishes,
        models.Game.score,
        models.Game.rounds_time,
        models.Game.round_time,
        models.Game.time_coef,
        models.Game.totals,
    )
    .options(
        contains_eager(models.Game.fighters),
        contains_eager(models.Game.timestamp),
        contains_eager(models.Game.coefs),
        contains_eager(models.Game.rounds),
        contains_eager(models.Game.rounds_view),
        contains_eager(models.Game.rounds_view_f),
        contains_eager(models.Game.finishes),
        contains_eager(models.Game.score),
        contains_eager(models.Game.rounds_time),
        contains_eager(models.Game.round_time),
        contains_eager(models.Game.time_coef),
        contains_eager(models.Game.totals),
    )
    .all()
)
And it gives me an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError)
column "stats_fighters.id" must appear in the GROUP BY clause or be
used in an aggregate function LINE 1: SELECT stats_fighters.id AS
stats_fighters_id, stats_fighter...
I don't really understand why stats_fighters.id has to be in the GROUP BY, or why I need a GROUP BY at all. Please help.
This is the SQL that the Django ORM generates:
SELECT
    AVG("stats_roundtime"."min_time") AS "min_time_avg",
    COUNT("stats_score"."id") FILTER (WHERE "stats_score"."first_score" = 5) AS "first_win_cnt"
FROM "stats_game"
LEFT OUTER JOIN "stats_roundtime" ON ("stats_game"."id" = "stats_roundtime"."game_id")
LEFT OUTER JOIN "stats_score" ON ("stats_game"."id" = "stats_score"."game_id")
GROUP BY collapses rows that share the same values so you can compute a summary over each group. It is often used with SUM, MAX, MIN, or AVG.
Since SQLAlchemy generates the final SQL, you need to know your table structure and work out how to make SQLAlchemy emit the right statement.
The documentation says there is a group_by() method in SQLAlchemy. Maybe this code helps:
q = (
    db.query(
        models.Game,
        func.count(models.Score.first_score)
        .filter(models.Score.first_score == 5)
        .label("first_win_cnt"),
    )
    .join(models.Game.fighters)
    .filter_by(**fighter_options)
    .join(models.Game.timestamp)
    .group_by(models.Game.fighters)
    .join(
        models.Game.coefs,
        models.Game.rounds,
        models.Game.rounds_view,
        models.Game.rounds_view_f,
        models.Game.finishes,
        models.Game.score,
        models.Game.rounds_time,
        models.Game.round_time,
        models.Game.time_coef,
        models.Game.totals,
    )
)
func.count is an aggregate function. If any expression in your SELECT clause uses an aggregate, then every other expression in the SELECT must be a constant, an aggregate, or appear in the GROUP BY.
If you try SELECT a, MAX(b), the SQL parser will complain that a is neither an aggregate nor in the GROUP BY. In your case, consider adding models.Game to the GROUP BY.
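The rule is easy to check in plain SQL: once the non-aggregated column is in the GROUP BY, the conditional count is well defined per group. A small sqlite3 sketch with invented tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY);
CREATE TABLE score (id INTEGER PRIMARY KEY, game_id INT, first_score INT);
INSERT INTO game VALUES (1), (2);
INSERT INTO score VALUES (1, 1, 5), (2, 1, 3), (3, 2, 5), (4, 2, 5);
""")

# game.id is the only non-aggregated SELECT column, and it appears in the
# GROUP BY, so the query is valid under strict grouping rules.
rows = con.execute("""
    SELECT g.id, COUNT(CASE WHEN s.first_score = 5 THEN 1 END) AS first_win_cnt
    FROM game g
    LEFT JOIN score s ON s.game_id = g.id
    GROUP BY g.id
    ORDER BY g.id
""").fetchall()
print(rows)  # [(1, 1), (2, 2)]
```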

SQLAlchemy: How to use group_by() correctly (only_full_group_by)?

I'm trying to use the group_by() function of SQLAlchemy with the mysql+mysqlconnector engine:
rows = session.query(MyModel) \
    .order_by(MyModel.published_date.desc()) \
    .group_by(MyModel.category_id) \
    .all()
It works fine with SQLite, but for MySQL I get this error:
[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column '...' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
I know how to solve it in plain SQL, but I'd like to use the advantages of SQLAlchemy.
What's the proper solution with SQLAlchemy?
Thanks in advance
One way to form the greatest-n-per-group query with well defined behaviour would be to use a LEFT JOIN, looking for MyModel rows per category_id that have no matching row with greater published_date:
from sqlalchemy import and_
from sqlalchemy.orm import aliased

my_model_alias = aliased(MyModel)

rows = session.query(MyModel).\
    outerjoin(my_model_alias,
              and_(my_model_alias.category_id == MyModel.category_id,
                   my_model_alias.published_date > MyModel.published_date)).\
    filter(my_model_alias.id == None).\
    all()
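The anti-join trick is portable enough to demonstrate directly in SQLite (table and data invented for illustration): a row survives only if no other row in its category is newer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE m (id INTEGER PRIMARY KEY, category_id INT, published_date TEXT);
INSERT INTO m VALUES
    (1, 1, '2021-01-01'),
    (2, 1, '2021-03-01'),  -- newest in category 1
    (3, 2, '2021-02-01');  -- newest in category 2
""")

# LEFT JOIN each row to any strictly newer row in the same category;
# rows with no such match (newer.id IS NULL) are the per-category maxima.
rows = con.execute("""
    SELECT m.id FROM m
    LEFT JOIN m AS newer
           ON newer.category_id = m.category_id
          AND newer.published_date > m.published_date
    WHERE newer.id IS NULL
    ORDER BY m.id
""").fetchall()
print(rows)  # [(2,), (3,)]
```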
This will work in just about any SQL DBMS. In SQLite 3.25.0 and MySQL 8 (and many others) you could use window functions to achieve the same:
sq = session.query(
    MyModel,
    func.row_number().
        over(partition_by=MyModel.category_id,
             order_by=MyModel.published_date.desc()).label('rn')).\
    subquery()

my_model_alias = aliased(MyModel, sq)

rows = session.query(my_model_alias).\
    filter(sq.c.rn == 1).\
    all()
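The window-function version can likewise be checked with sqlite3 (SQLite 3.25 or newer, as noted above; data invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE m (id INTEGER PRIMARY KEY, category_id INT, published_date TEXT);
INSERT INTO m VALUES
    (1, 1, '2021-01-01'),
    (2, 1, '2021-03-01'),
    (3, 2, '2021-02-01');
""")

# Number rows within each category, newest first; rn = 1 marks the winner.
rows = con.execute("""
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY category_id
                                  ORDER BY published_date DESC) AS rn
        FROM m
    )
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)  # [(2,), (3,)]
```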
Of course you could use GROUP BY as well, if you then use the results in a join:
max_pub_dates = session.query(
        MyModel.category_id,
        func.max(MyModel.published_date).label('published_date')).\
    group_by(MyModel.category_id).\
    subquery()

rows = session.query(MyModel).\
    join(max_pub_dates,
         and_(max_pub_dates.c.category_id == MyModel.category_id,
              max_pub_dates.c.published_date == MyModel.published_date)).\
    all()

python Django - Convert SQL Query to ORM Query(Subquery)

SELECT *,
       (SELECT SUM(amount)
        FROM history
        WHERE history_id IN
              (SELECT history_id FROM web_cargroup
               WHERE group_id = a.group_id)
          AND type = 1) AS sum
FROM web_car a;
It is very difficult for me to convert the above query to the ORM:
1. The ORM's annotate() automatically creates a GROUP BY.
2. It is difficult to put a subquery inside an IN condition.
Please help.
If I understand the models you presented, this should work
from django.db.models import Sum

History.objects.filter(
    type=1,
    id__in=CarGroup.objects.values('history_id')
).aggregate(
    total_amount=Sum('amount')
)
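The aggregate-with-IN shape is easy to sanity-check in plain SQL; a sqlite3 sketch with invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE history (id INTEGER PRIMARY KEY, amount INT, type INT);
CREATE TABLE cargroup (history_id INT, group_id INT);
INSERT INTO history VALUES (1, 100, 1), (2, 50, 1), (3, 70, 2);
INSERT INTO cargroup VALUES (1, 9), (2, 9);
""")

# Sum only type-1 history rows whose id appears in the group's rows --
# the same WHERE ... IN (subquery) shape the ORM filter produces.
total = con.execute("""
    SELECT SUM(amount) FROM history
    WHERE type = 1
      AND id IN (SELECT history_id FROM cargroup)
""").fetchone()[0]
print(total)  # 150
```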

Get the newest rows for each foreign key ID

I don't want to aggregate any columns. I just want the newest row for each foreign key in a table.
I've tried grouping.
Model.query.order_by(Model.created_at.desc()).group_by(Model.foreign_key_id).all()
# column "model.id" must appear in the GROUP BY clause
And I've tried distinct.
Model.query.order_by(Model.created_at.desc()).distinct(Model.foreign_key_id).all()
# SELECT DISTINCT ON expressions must match initial ORDER BY expressions
This is known as greatest-n-per-group, and for PostgreSQL you can use DISTINCT ON, as in your second example:
SELECT DISTINCT ON (foreign_key_id) * FROM model ORDER BY foreign_key_id, created_at DESC;
In your attempt, you were missing the DISTINCT ON column in your ORDER BY list, so all you had to do was:
Model.query.order_by(Model.foreign_key_id, Model.created_at.desc()).distinct(Model.foreign_key_id)
The solution is to left join an aliased model to itself (with a special join condition), then keep only the rows where the aliased side has no id, i.e. where no newer row exists.
from sqlalchemy import and_
from sqlalchemy.orm import aliased

newer = aliased(Model)  # don't shadow the aliased() helper itself

query = Model.query.outerjoin(newer, and_(
    newer.foreign_key_id == Model.foreign_key_id,
    newer.created_at > Model.created_at,
))
query = query.filter(newer.id.is_(None))

Need a workaround to filter on related model and aggregated fields in Django

I opened a ticket for this problem.
In a nutshell here is my model:
class Plan(models.Model):
    cap = models.IntegerField()

class Phone(models.Model):
    plan = models.ForeignKey(Plan, related_name='phones')

class Call(models.Model):
    phone = models.ForeignKey(Phone, related_name='calls')
    cost = models.IntegerField()
I want to run a query like this one:
Phone.objects.annotate(total_cost=Sum('calls__cost')).filter(total_cost__gte=0.5*F('plan__cap'))
Unfortunately Django generates bad SQL:
SELECT "app_phone"."id", "app_phone"."plan_id",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"."cap"
and errors with:
ProgrammingError: column "app_plan.cap" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...."plan_id" HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"....
Is there any workaround apart from running raw SQL?
When aggregating, SQL requires that every selected field either be unique within a group or be wrapped in an aggregate function that guarantees a single value per group. The problem here is that "app_plan"."cap" could have many different values for each combination of "app_phone"."id" and "app_phone"."plan_id", so you need to tell the DB how to treat it.
So, valid SQL for your result is one of two different possibilities, depending on the result you want. First, you could include app_plan.cap in the GROUP BY clause, so that any distinct combination of (app_phone.id, app_phone.plan_id, app_plan.cap) will be a different group:
SELECT "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap",
SUM("app_call"."cost") AS "total_cost"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id", "app_plan"."cap"
HAVING SUM("app_call"."cost") >= 0.5 * "app_plan"."cap"
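This first variant can be verified in sqlite3 (schema and data invented for illustration): once cap is in the GROUP BY, the HAVING comparison is legal.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE plan (id INTEGER PRIMARY KEY, cap INT);
CREATE TABLE phone (id INTEGER PRIMARY KEY, plan_id INT);
CREATE TABLE calls (id INTEGER PRIMARY KEY, phone_id INT, cost INT);
INSERT INTO plan VALUES (1, 100), (2, 10);
INSERT INTO phone VALUES (1, 1), (2, 2);
INSERT INTO calls VALUES (1, 1, 30), (2, 2, 30);
""")

# p.cap is in the GROUP BY, so referencing it in HAVING is well defined.
rows = con.execute("""
    SELECT ph.id, SUM(c.cost) AS total_cost
    FROM phone ph
    JOIN plan p ON ph.plan_id = p.id
    LEFT JOIN calls c ON c.phone_id = ph.id
    GROUP BY ph.id, p.cap
    HAVING SUM(c.cost) >= 0.5 * p.cap
""").fetchall()
print(rows)  # [(2, 30)] -- phone 2 spent 30 against a cap of 10
```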
The trick is to get the extra value into the GROUP BY clause. We can weasel our way into this by abusing extra(), though this hard-codes the table name for app_plan, which is not ideal -- you could build it programmatically from the Plan class instead if you wanted:
Phone.objects.extra({
    "plan_cap": "app_plan.cap"
}).annotate(
    total_cost=Sum('calls__cost')
).filter(total_cost__gte=0.5 * F('plan_cap'))
Alternatively, you could wrap app_plan.cap in an aggregation function, turning it into a unique value. Aggregation functions vary by DB provider, but might include things like AVG, MAX, MIN, etc.
SELECT "app_phone"."id", "app_phone"."plan_id",
       SUM("app_call"."cost") AS "total_cost",
       AVG("app_plan"."cap") AS "avg_cap"
FROM "app_phone"
INNER JOIN "app_plan" ON ("app_phone"."plan_id" = "app_plan"."id")
LEFT OUTER JOIN "app_call" ON ("app_phone"."id" = "app_call"."phone_id")
GROUP BY "app_phone"."id", "app_phone"."plan_id"
HAVING SUM("app_call"."cost") >= 0.5 * AVG("app_plan"."cap")
You could get this result in Django using the following:
Phone.objects.annotate(
    total_cost=Sum('calls__cost'),
    avg_cap=Avg('plan__cap')
).filter(total_cost__gte=0.5 * F('avg_cap'))
You may want to consider updating the bug report you left with a clearer specification of the result you expect -- for example, the valid SQL you're after.
