How to make a subquery in sqlalchemy - python

SELECT *
FROM Residents
WHERE apartment_id IN (SELECT ID
FROM Apartments
WHERE postcode = 2000)
I'm using SQLAlchemy and am trying to execute the above query. I haven't been able to execute it as raw SQL using db.engine.execute(sql), since it complains that my relations don't exist. But I can successfully query my database using this format: session.query(Residents).filter_by(???).
I cannot figure out how to build the query I want in this format, though.

You can create a subquery with the subquery() method:
subquery = session.query(Apartments.id).filter(Apartments.postcode==2000).subquery()
query = session.query(Residents).filter(Residents.apartment_id.in_(subquery))
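A minimal, self-contained sketch of the same pattern, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The model definitions, column names, and sample rows are hypothetical stand-ins for the asker's schema. The inner query is passed to in_() as a select() construct here, since recent SQLAlchemy versions warn when an explicit .subquery() object is used inside IN.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Apartments(Base):  # hypothetical model
    __tablename__ = "apartments"
    id = Column(Integer, primary_key=True)
    postcode = Column(Integer)


class Residents(Base):  # hypothetical model
    __tablename__ = "residents"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    apartment_id = Column(Integer, ForeignKey("apartments.id"))


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Apartments(id=1, postcode=2000),
        Apartments(id=2, postcode=3000),
        Residents(name="Ann", apartment_id=1),
        Residents(name="Bob", apartment_id=1),
        Residents(name="Cid", apartment_id=2),
    ])
    session.commit()

    # Inner query: ids of apartments in postcode 2000.
    apartment_ids = select(Apartments.id).where(Apartments.postcode == 2000)

    # Outer query: residents whose apartment_id is IN the inner query.
    residents = (
        session.query(Residents)
        .filter(Residents.apartment_id.in_(apartment_ids))
        .order_by(Residents.name)
        .all()
    )
    names = [r.name for r in residents]
    print(names)
```

Only the residents of the apartment in postcode 2000 should come back; the resident of the other apartment is filtered out by the IN clause.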

I just wanted to add that if you are using this method to update your DB, make sure you add the synchronize_session='fetch' kwarg. It will look something like:
subquery = session.query(Apartments.id).filter(Apartments.postcode==2000).subquery()
query = session.query(Residents).\
filter(Residents.apartment_id.in_(subquery)).\
update({"key": value}, synchronize_session='fetch')
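Otherwise the update can fail: the 'evaluate' strategy (the default in older SQLAlchemy versions) cannot evaluate the IN subquery in Python.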
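A hedged sketch of such an update, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The models and the notified column are hypothetical; they only serve to show that objects already loaded in the session pick up the new value when synchronize_session='fetch' is used.

```python
from sqlalchemy import Boolean, Column, ForeignKey, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Apartments(Base):  # hypothetical model
    __tablename__ = "apartments"
    id = Column(Integer, primary_key=True)
    postcode = Column(Integer)


class Residents(Base):  # hypothetical model
    __tablename__ = "residents"
    id = Column(Integer, primary_key=True)
    apartment_id = Column(Integer, ForeignKey("apartments.id"))
    notified = Column(Boolean, default=False)  # hypothetical column to update


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Apartments(id=1, postcode=2000),
        Apartments(id=2, postcode=3000),
        Residents(id=1, apartment_id=1),
        Residents(id=2, apartment_id=2),
    ])
    session.commit()

    # Load both residents so the session holds objects that may go stale.
    in_2000 = session.get(Residents, 1)
    elsewhere = session.get(Residents, 2)

    apartment_ids = select(Apartments.id).where(Apartments.postcode == 2000)

    # 'fetch' asks the database which rows matched, then updates or expires
    # the corresponding in-session objects.
    (
        session.query(Residents)
        .filter(Residents.apartment_id.in_(apartment_ids))
        .update({Residents.notified: True}, synchronize_session="fetch")
    )

    flags = (in_2000.notified, elsewhere.notified)
    session.commit()
    print(flags)
```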

Related

Converting raw sql query to SQL Alchemy ORM

I'm currently executing this query in one process:
SELECT DISTINCT ON (c.api_key, worker_id) worker_id, c.api_key, a.updated_at, b.user_id, a.country
FROM TABLE_A a
INNER JOIN TABLE_B b ON (b.id = a.user)
INNER JOIN TABLE_C c ON (b.owner = c.id)
WHERE 1=1
AND a.platform = 'x'
AND a.country = 'y'
AND a.access_token is not NULL
ORDER BY c.api_key, worker_id, a.updated_at desc
I'm currently wrapping it using from sqlalchemy import text and then simply executing
query_results = db.execute(query).fetchall()
list_dicts = [r._asdict() for r in query_results]
df = pd.DataFrame(list_dicts)
and it works, but I would really like to see if it's possible to write it in the other notation, like:
db.query(TABLE_A).filter().join()... etc
Yes, it's possible.
But the exact way to do it will depend on your SQLAlchemy version and how you've set up your SQLAlchemy project and models.
You may want to check out the SQLAlchemy ORM Querying Guide and the Expression Language Tutorial to see which one fits your case better.
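As one possible translation, here is a sketch assuming SQLAlchemy 1.4 or later with declarative models. The classes TableA, TableB, and TableC and their column types are hypothetical, and placing worker_id on TABLE_A is an assumption, since the original SQL does not qualify it. The statement is only compiled against the PostgreSQL dialect (DISTINCT ON is PostgreSQL-specific), not executed.

```python
from sqlalchemy import Column, DateTime, Integer, String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TableA(Base):  # hypothetical mapping of TABLE_A
    __tablename__ = "table_a"
    id = Column(Integer, primary_key=True)
    user = Column(Integer)
    worker_id = Column(Integer)  # assumption: worker_id belongs to TABLE_A
    platform = Column(String)
    country = Column(String)
    access_token = Column(String)
    updated_at = Column(DateTime)


class TableB(Base):  # hypothetical mapping of TABLE_B
    __tablename__ = "table_b"
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)
    owner = Column(Integer)


class TableC(Base):  # hypothetical mapping of TABLE_C
    __tablename__ = "table_c"
    id = Column(Integer, primary_key=True)
    api_key = Column(String)


stmt = (
    select(
        TableA.worker_id,
        TableC.api_key,
        TableA.updated_at,
        TableB.user_id,
        TableA.country,
    )
    .select_from(TableA)
    .join(TableB, TableB.id == TableA.user)
    .join(TableC, TableB.owner == TableC.id)
    .where(
        TableA.platform == "x",
        TableA.country == "y",
        TableA.access_token.is_not(None),
    )
    # On PostgreSQL, passing columns to distinct() renders DISTINCT ON (...).
    .distinct(TableC.api_key, TableA.worker_id)
    .order_by(TableC.api_key, TableA.worker_id, TableA.updated_at.desc())
)

sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

With a session bound to PostgreSQL, something like db.execute(stmt).all() should then return the rows, though that depends on how db is set up in your project.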

How to create SQL Pypika Query with "Min()"

I am trying to create a Pypika Query which uses the MIN('') function of SQL. Pypika supports the function but I don't know how to use it.
Basically I want to create this SQL statement in Pypika:
select
"ID","Car","Road","House"
from "thingsTable"
where "ID" not in
(
select MIN("ID")
from "thingsTable"
GROUP BY
"Car","Road","House"
)
order by "ID"
I have tried something like this:
from pypika import Query, Table, Field, Function
query = Query.from_(table).select(min(table.ID)).groupby(table.Car, table.Road, table.House)
And variations of it, but can't figure out how to use this function. There are not a lot of examples around.
Thanks in advance.
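Try this one. The code is based on "Selecting Data with pypika":
from pypika import Query, Table, functions as fn
tbl = Table('thingsTable')
# Subquery: smallest ID within each (Car, Road, House) group
min_ids = Query.from_(tbl).groupby(tbl.Car, tbl.Road, tbl.House).select(fn.Min(tbl.ID))
# Outer query: rows whose ID is NOT one of those minimums
q = Query.from_(tbl).where(
    tbl.ID.notin(min_ids)
).select(
    tbl.ID, tbl.Car, tbl.Road, tbl.House
).orderby(tbl.ID)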

How to write subquery in django

Is it possible to make the following SQL query in Django?
select * from (
select * from users
) order by id
It is just a minimal example. I have a long subquery instead of select * from users, but I can't work out how to insert it as a subquery.
UPDATED:
Subquery from the docs doesn't suit, because it builds the following request:
SELECT "post"."id", (
SELECT U0."email"
FROM "comment" U0
WHERE U0."post_id" = ("post"."id")
ORDER BY U0."created_at" DESC LIMIT 1
) AS "newest_commenter_email" FROM "post"
and this subquery can return only one value (.values('email')).
It builds select (subquery) as value from table instead of select value from (subquery).
I would use a Python connector for PostgreSQL (http://www.postgresqltutorial.com/postgresql-python/query/); that is what I do for MySQL, though I have not tried it with PostgreSQL.
Making a subquery is essentially setting up two queries and using one query to "feed" another:
from django.db.models import Subquery
all_users = User.objects.all()
User.objects.annotate(the_user=Subquery(all_users.values('email')[:1]))
This is more or less the same as what you provided. You can get about as complicated as you'd like here, but the best source to get going with subqueries is the docs.

How to print the ACTUAL SQLAlchemy query to troubleshoot: SQLAlchemy filter statement replaces filter criteria with %(column_name_1)s

I have a SQLAlchemy query that looks like this:
query = db.session.query(
Place.name,
Place.population,
).filter(Place.population==8000)
But when I print the query, it comes out as:
SELECT place.name AS place_name, place.population AS place_population
FROM place
WHERE place.population = %(population_1)s
I can't figure out why it keeps replacing my filter criteria with %(population_1)s. This query is part of a Flask app, maybe there's something there I'm not understanding?
Edit: changed the Title to be more descriptive of the actual problem.
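It is behaving just the way it should: %(population_1)s is a bound parameter placeholder, and the value 8000 is sent to the database separately. To see the value inlined, compile the statement with literal_binds:
from sqlalchemy.dialects import postgresql
compiled = query.statement.compile(dialect=postgresql.dialect(), compile_kwargs={"literal_binds": True})
print(compiled)  # prints the statement compiled for the given dialect, with values inlined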
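For comparison, a self-contained sketch that shows both renderings side by side, assuming SQLAlchemy 1.4 or later. The Place model is a hypothetical reconstruction from the columns in the question; nothing is executed, so no database driver is needed.

```python
from sqlalchemy import Column, Integer, String, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Place(Base):  # hypothetical model matching the question's columns
    __tablename__ = "place"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    population = Column(Integer)


stmt = select(Place.name, Place.population).where(Place.population == 8000)

# Default rendering: the value stays a bound parameter placeholder.
with_placeholders = str(stmt.compile(dialect=postgresql.dialect()))

# literal_binds inlines the value; meant for debugging, not for execution.
with_literals = str(
    stmt.compile(
        dialect=postgresql.dialect(),
        compile_kwargs={"literal_binds": True},
    )
)

print(with_placeholders)
print(with_literals)
```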

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query that returns a QuerySet with the most recent record for each vin, and then updates it!
special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
SELECT MAX(dateInShop) as maxtime, vin
FROM cars
GROUP BY vin
) AS latest_record ON (cars.dateInShop= maxtime)
AND (latest_record.vin = cars.vin)
So how can I make this happen with Django?
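This is somewhat untested, and relies on Django 1.11 for Subqueries, but perhaps something like:
from django.db.models import OuterRef, Subquery
latest_visits = Subquery(ShopVisit.objects.filter(id=OuterRef('id')).order_by('-date_in_shop').values('id')[:1])
ShopVisit.objects.filter(id__in=latest_visits)
# Note: to pick the latest visit per vin, the inner filter likely needs vin=OuterRef('vin') rather than id=OuterRef('id')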
I had a similar model, so went to test it but got an error of:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, perhaps it has support for that type of subquery.
Here's the SQL it produced (trimmed up a bit and replaced actual names with fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`id` = (`mymodel_activity`.`id`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)
I wonder if you found a solution yourself.
I could only come up with a raw query string (see the Django manual on raw SQL queries):
UPDATE "yourapplabel_shopvisit"
SET boolfield = True WHERE date_in_shop
IN (SELECT MAX(date_in_shop) FROM "yourapplabel_shopvisit" GROUP BY vin);
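Note that the statement above matches a row's date against the latest date of any vin, so a row can be flagged because its date happens to equal another car's latest visit. A hedged sketch of a correlated variant, using Python's built-in sqlite3 module with a hypothetical shopvisit table and sample rows (the real table name depends on your app label):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopvisit ("
    "id INTEGER PRIMARY KEY, vin TEXT, date_in_shop TEXT, boolfield INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO shopvisit (vin, date_in_shop) VALUES (?, ?)",
    [
        ("A", "2020-01-01"),
        ("A", "2020-02-01"),  # latest visit for A
        ("B", "2020-02-01"),  # same date as A's latest, but not B's latest
        ("B", "2020-03-01"),  # latest visit for B
    ],
)

# Correlated subquery: each row is compared only with the latest date
# recorded for its own vin.
conn.execute(
    """
    UPDATE shopvisit
    SET boolfield = 1
    WHERE date_in_shop = (
        SELECT MAX(v.date_in_shop)
        FROM shopvisit AS v
        WHERE v.vin = shopvisit.vin
    )
    """
)

flagged = conn.execute(
    "SELECT vin, date_in_shop FROM shopvisit WHERE boolfield = 1 ORDER BY vin"
).fetchall()
print(flagged)
```

With the uncorrelated IN form, the ("B", "2020-02-01") row in this sample data would also be flagged, because that date is the maximum for vin A.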
