is there any way how to write the following SQL statement in SQLAlchemy ORM:
SELECT AVG(a1) FROM (SELECT sum(irterm.n) AS a1 FROM irterm GROUP BY irterm.item_id);
Thank you
sums = session.query(func.sum(Irterm.n).label('a1')).group_by(Irterm.item_id).subquery()
average = session.query(func.avg(sums.c.a1)).scalar()
Related
I am trying to translate the following query to peewee:
select count(*) from A where
id not in (select distinct package_id FROM B)
What is the correct Python code? So far I have this:
A.select(A.id).where(A.id.not_in(B.select(B.package_id).distinct()).count()
This code is not returning the same result. A and B are large 10-20M rows each. I can't create a dictionary of existing package_id items in the memory.
For example, this takes lot of time:
A.select(A.id).where(A.id.not_in({x.package_id for x in B.select(B.package_id).distinct()}).count()
May be LEFT JOIN?
Update: I ended up calling database.execute_sql()
Your SQL:
select count(*) from A where
id not in (select distinct package_id FROM B)
Equivalent peewee:
q = (A
.select(fn.COUNT(A.id))
.where(A.id.not_in(B.select(B.package_id.distinct()))))
count = q.scalar()
I have a sql query that I want to convert to pyspark:
select * from Table_output where cct_id not in (select * from df_hr_excl)
Pseudo Code:
Table_output=Table_output.select(col("cct_id")).exceptAll(df_hr_excl.select("cct_id")) or
col("cct_id").isin(df_hr_excl.select("cct_id"))
Correlated subqueries in where clause with NOT IN or NOT EXISTS can be written using left anti join :
Table_output = Table_output.join(df_hr_excl, ["cct_id"], "left_anti")
As per your comment, if you have a condition in your subquery then you can put it in the join condition. E.g.:
Table_output = Table_output.alias("a").join(df_hr_excl.alias("b"), (F.col("a.x") > F.col("b.y")) & (F.col("a.id") == F.col("b.id")), "left_anti")
SELECT *,
(SELECT sum(amount) FROM history
WHERE history_id IN
(SELECT history_id FROM web_cargroup
WHERE group_id = a.group_id) AND type = 1)
as sum
FROM web_car a;
It is very difficult to convert the above query to orm
1. orm annotate is automatically created by group by.
2. It is difficult to put a subquery in the 'in' condition
please help.
If I understand the models you presented, this should work
from django.db.models import Sum
History.objects.filter(
type=1,
id__in=(CarGroup.objects.values('history_id'))
).aggregate(
total_amount=Sum('amount')
)
Suppose I have two DataFrames in Pyspark and I'd want to run a nested SQL-like SELECT query, on the lines of
SELECT * FROM table1
WHERE b IN
(SELECT b FROM table2
WHERE c='1')
Now, I can achieve a select query by using where, as in
df.where(df.a.isin(my_list))
given I have selected the my_list tuple of values beforehand. How would I perform a nested query in one go instead?
As for know Spark doesn't support subqueries in WHERE clause (SPARK-4226). The closest thing you can get without collecting is join and distinct roughly equivalent to this:
SELECT DISTINCT table1.*
FROM table1 JOIN table2
WHERE table1.b = table2.b AND table2.c = '1'
is there any way how to write the following SQL statement in SQLAlchemy ORM:
SELECT AVG(a1) FROM (SELECT sum(irterm.n) AS a1 FROM irterm GROUP BY irterm.item_id);
Thank you
sums = session.query(func.sum(Irterm.n).label('a1')).group_by(Irterm.item_id).subquery()
average = session.query(func.avg(sums.c.a1)).scalar()