how to subquery in queryset in django? - python

How can I have a subquery in Django's queryset? For example, if I have:
select name, age from person, employee where person.id = employee.id and
employee.id in (select id from employee where employee.company = 'Private')
This is what I have so far:
Person.objects.values('name', 'age')
Employee.objects.filter(company='Private')
But it is not working because it returns two outputs...

As mentioned by ypercube, your use case doesn't require a subquery.
But since many people land on this page to learn how to write a subquery, here is how it's done:
employee_query = Employee.objects.filter(company='Private').only('id').all()
Person.objects.values('name', 'age').filter(id__in=employee_query)
Source:
http://mattrobenolt.com/the-django-orm-and-subqueries/

ids = Employee.objects.filter(company='Private').values_list('id', flat=True)
Person.objects.filter(id__in=ids).values('name', 'age')

The correct answer to your question is here: https://docs.djangoproject.com/en/2.1/ref/models/expressions/#subquery-expressions
As an example:
>>> from django.db.models import OuterRef, Subquery
>>> newest = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
>>> Post.objects.annotate(newest_commenter_email=Subquery(newest.values('email')[:1]))

You can create subqueries in Django by using an unevaluated queryset to filter your main queryset. In your case, it would look something like this:
employee_query = Employee.objects.filter(company='Private')
people = Person.objects.filter(employee__in=employee_query)
I'm assuming that you have a reverse relationship from Person to Employee named employee. I found it helpful to look at the SQL query generated by a queryset when I was trying to understand how the filters work.
print(people.query)
As others have said, you don't really need a subquery for your example. You could just join to the employee table:
people2 = Person.objects.filter(employee__company='Private')
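To check that the subquery form and the join form select the same rows, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table layout and the sample rows are assumptions made for illustration (the question only gives the column names), and the SQL is only roughly what the two querysets would generate.

```python
import sqlite3

# Hypothetical schema and rows, loosely following the question's SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, company TEXT);
    INSERT INTO person VALUES (1, 'Ann', 30), (2, 'Bob', 41), (3, 'Cid', 25);
    INSERT INTO employee VALUES (1, 'Private'), (2, 'Public'), (3, 'Private');
""")

# Subquery form: comparable to filtering with id__in=<unevaluated queryset>.
with_subquery = conn.execute("""
    SELECT name, age FROM person
    WHERE id IN (SELECT id FROM employee WHERE company = 'Private')
    ORDER BY id
""").fetchall()

# Join form: comparable to filtering across the relation directly.
with_join = conn.execute("""
    SELECT p.name, p.age FROM person p
    JOIN employee e ON e.id = p.id
    WHERE e.company = 'Private'
    ORDER BY p.id
""").fetchall()

print(with_subquery)
print(with_join)
```

Both queries return the same people here, which is why the join is usually enough for this particular use case.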

hero_qs = Hero.objects.filter(category=OuterRef("pk")).order_by("-benevolence_factor")
Category.objects.all().annotate(most_benevolent_hero=Subquery(hero_qs.values('name')[:1]))
the generated sql
SELECT "entities_category"."id",
"entities_category"."name",
(SELECT U0."name"
FROM "entities_hero" U0
WHERE U0."category_id" = ("entities_category"."id")
ORDER BY U0."benevolence_factor" DESC
LIMIT 1) AS "most_benevolent_hero"
FROM "entities_category"
For more details, see this article.
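The generated SQL above is a correlated scalar subquery. The sketch below runs the same shape of query with Python's built-in sqlite3 module so the behaviour can be inspected without a Django project. The table names, columns, and sample heroes are assumptions for illustration, not the schema from the linked article.

```python
import sqlite3

# Hypothetical tables standing in for the Category and Hero models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hero (
        id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES category(id),
        benevolence_factor INTEGER
    );
    INSERT INTO category VALUES (1, 'gods'), (2, 'mortals'), (3, 'empty');
    INSERT INTO hero VALUES
        (1, 'Zeus', 1, 70), (2, 'Athena', 1, 95), (3, 'Achilles', 2, 40);
""")

# For each category row, the inner SELECT is evaluated against that row's id
# (the role OuterRef("pk") plays in the ORM) and LIMIT 1 keeps the top hero.
rows = conn.execute("""
    SELECT c.name,
           (SELECT h.name
            FROM hero h
            WHERE h.category_id = c.id
            ORDER BY h.benevolence_factor DESC
            LIMIT 1) AS most_benevolent_hero
    FROM category c
    ORDER BY c.id
""").fetchall()

print(rows)
```

A category with no heroes still appears in the result, with NULL (None in Python) in the annotated column.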

Take good care with only() if your subqueries don't select the primary key.
Example:
class Customer:
    pass
class Order:
    customer: Customer
    pass
class OrderItem:
    order: Order
    is_recalled: bool
Customer has Orders
Order has OrderItems
Now we are trying to find all customers with at least one recalled order item. (1)
This will not work properly:
order_ids = OrderItem.objects \
    .filter(is_recalled=True) \
    .only("order_id")
customer_ids = Order.objects \
    .filter(id__in=order_ids) \
    .only('customer_id')
# BROKEN! BROKEN
customers = Customer.objects.filter(id__in=customer_ids)
The code above looks very fine, but it produces the following query:
select * from customer where id in (
select id -- should be customer_id
from orders
where id in (
select id -- should be order_id
from order_items
where is_recalled = true))
Instead one should use values(), which makes the subquery select the named column rather than the primary key:
order_ids = OrderItem.objects \
    .filter(is_recalled=True) \
    .values("order_id")
customer_ids = Order.objects \
    .filter(id__in=order_ids) \
    .values('customer_id')
customers = Customer.objects.filter(id__in=customer_ids)
(1) Note: in a real case we might consider 'WHERE EXISTS'
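As a rough illustration of the WHERE EXISTS alternative mentioned in the note, the sketch below uses Python's built-in sqlite3 module rather than the ORM. The tables and rows are hypothetical, and the first query reproduces the primary-key pitfall described above so the two results can be compared.

```python
import sqlite3

# Hypothetical tables for Customer, Order and OrderItem.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY, order_id INTEGER, is_recalled INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (20, 2);
    INSERT INTO order_items VALUES (100, 10, 0), (101, 10, 1), (200, 20, 0);
""")

# The pitfall: each subquery selects its own primary key, so unrelated
# id values are compared and the recalled item is never matched.
broken = conn.execute("""
    SELECT id, name FROM customer WHERE id IN (
        SELECT id FROM orders WHERE id IN (
            SELECT id FROM order_items WHERE is_recalled = 1))
""").fetchall()

# EXISTS form: the inner query is correlated on the customer row.
with_exists = conn.execute("""
    SELECT c.id, c.name FROM customer c
    WHERE EXISTS (
        SELECT 1 FROM orders o
        JOIN order_items i ON i.order_id = o.id
        WHERE o.customer_id = c.id AND i.is_recalled = 1)
""").fetchall()

print(broken)
print(with_exists)
```

With this sample data the primary-key version finds nobody, while the EXISTS version finds the one customer who has a recalled item.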

Related

Peewee: Relation does not exist when querying with CTE

I want to query the count of bookings for a given event- if the event has bookings, I want to pull the name of the "first" person to book it.
The table looks something like: Event 1-0 or Many Booking, Booking.attendee is a 1:1 with User Table. In pure SQL I can easily do what I want by using Window Functions + CTE. Something like:
WITH attendee AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY b.event_id ORDER BY b.created DESC) rn,
        COUNT(*) OVER (PARTITION BY b.event_id) count
    FROM
        booking b JOIN "user" u ON u.id = b.attendee_id
    WHERE
        b.status != 'cancelled'
)
SELECT e.*, a.count, a.first_name, a.last_name
FROM event e
LEFT JOIN attendee a ON a.event_id = e.id
WHERE e.seats > COALESCE(a.count, 0) AND (a.rn = 1 OR a.rn IS NULL) AND e.cancelled != true;
This gets everything I want. When I try to turn this into a CTE and use Peewee however, I get errors about: Relation does not exist.
Not exact code, but I'm doing something like this with some dynamic where clauses for filtering based on params.
cte = (
    BookingModel.select(
        BookingModel,
        peewee.fn.ROW_NUMBER().over(partition_by=[BookingModel.event_id], order_by=[BookingModel.created.desc()]).alias("rn"),
        peewee.fn.COUNT(BookingModel.id).over(partition_by=[BookingModel.event_id]).alias("count"),
        UserModel.first_name,
        UserModel.last_name
    )
    .join(
        UserModel,
        peewee.JOIN.LEFT_OUTER,
        on=(UserModel.id == BookingModel.attendee)
    )
    .where(BookingModel.status != "cancelled")
    .cte("test")
)
query = (
    EventModel.select(
        EventModel,
        UserModel,
        cte.c.event_id,
        cte.c.first_name,
        cte.c.last_name,
        cte.c.rn,
        cte.c.count
    )
    .join(UserModel, on=(EventModel.host == UserModel.id))
    .switch(EventModel)
    .join(cte, peewee.JOIN.LEFT_OUTER, on=(EventModel.id == cte.c.event_id))
    .where(where_clause)
    .order_by(EventModel.start_time.asc(), EventModel.id.asc())
    .limit(10)
    .with_cte(cte)
)
After reading the docs twenty+ times, I can't figure out what isn't right about this. It looks like the samples... but the query will fail, because "relation "test" does not exist". I've played with "columns" being explicitly defined, but then that throws an error that "rn is ambiguous".
I'm stuck and not sure how I can get Peewee CTE to work.

Using COUNT(*) OVER() in current query with SQLAlchemy over PostgreSQL

In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    surname = Column(String(50))
    email = Column(String(255))
    [..]
    deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will be using LIMIT and OFFSET, and also COUNT(*) OVER() to get the total amount of records while executing the query (not with a different query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Guest.deleted == None)
result = query.scalar()
Now the question I am asking is how to execute one single query, using SQLAlchemy, similar to the one above, that kills two birds with one stone (returns the first 50 rows and the count of the total rows to build the pagination links, all in one query).
The interesting bit is the use of window functions in PostgreSQL which allows the abovementioned behaviour, thus saving you from having to query twice but just once.
Is it possible?
Thanks in advance.
So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
Cheers!
P.S. I also found this question on Stack Overflow, which was unanswered.
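To see what COUNT(*) OVER() returns next to LIMIT and OFFSET, here is a small sketch using Python's built-in sqlite3 module instead of SQLAlchemy and PostgreSQL. It assumes the bundled SQLite is version 3.25 or newer (window function support), and the 20 sample guests are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE guest (id INTEGER PRIMARY KEY, name TEXT, deleted TEXT)"
)
# Hypothetical data: 20 guests, every fifth one soft-deleted.
conn.executemany(
    "INSERT INTO guest (name, deleted) VALUES (?, ?)",
    [(f"guest{i}", "2020-01-01" if i % 5 == 0 else None) for i in range(1, 21)],
)

# The window function is computed over all rows that pass the WHERE clause,
# before LIMIT/OFFSET trims the page, so every row carries the full total.
page = conn.execute("""
    SELECT id, name, COUNT(*) OVER () AS total
    FROM guest
    WHERE deleted IS NULL
    ORDER BY id ASC
    LIMIT 3 OFFSET 0
""").fetchall()

for row in page:
    print(row)
```

One consequence of this design is that a page with no rows (for example, an OFFSET past the end) also carries no total, so the caller has to handle that case separately.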

Is it possible to make sql join on several fields using peewee python ORM?

Assuming we have these three models.
class Item(BaseModel):
    title = CharField()
class User(BaseModel):
    name = CharField()
class UserAnswer(BaseModel):
    user = ForeignKeyField(User, 'user_answers')
    item = ForeignKeyField(Item, 'user_answers_items')
    answer = ForeignKeyField(Item, 'user_answers')
I want to get all Items which do not have related UserAnswer records for the current user. In SQL it would be something like this:
select * from item i
left join useranswer ua on ua.item_id=i.id and ua.user_id=1
where ua.id is null;
Is it possible to make a left outer join with constraint on two fields using peewee syntax? It will be cool if I can do it in this way:
Item.select().join(UserAnswer, JOIN_LEFT_OUTER, on=['__my_constraints_here__']).where(
(UserAnswer.id.is_null(True))
)
Yes you can join on multiple conditions:
join_cond = (
    (UserAnswer.item == Item) &
    (UserAnswer.user == 1))
query = (Item
    .select()
    .join(
        UserAnswer,
        JOIN.LEFT_OUTER,
        on=join_cond)
    .where(UserAnswer.id.is_null(True)))
Docs here: http://docs.peewee-orm.com/en/latest/peewee/api.html#Query.join
Sorry there is not an example of using multiple join conditions, but the on is just an arbitrary expression so you can put any valid peewee "Expression" you like there.
Important: you should import JOIN - from peewee import JOIN
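To show why the user condition belongs in the join's ON clause, here is a sketch of the underlying SQL using Python's built-in sqlite3 module rather than peewee. The table names follow the SQL in the question; the sample rows and the second, single-condition query are assumptions added for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE useranswer (
        id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER
    );
    INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- user 1 answered item 1, user 2 answered item 2
    INSERT INTO useranswer VALUES (1, 1, 1), (2, 2, 2);
""")

# Join on both fields: answers from other users never match, so their
# items still count as unanswered for the current user.
unanswered_for_user = conn.execute("""
    SELECT i.id, i.title FROM item i
    LEFT JOIN useranswer ua ON ua.item_id = i.id AND ua.user_id = ?
    WHERE ua.id IS NULL
    ORDER BY i.id
""", (1,)).fetchall()

# Join on the item only: any user's answer hides the item.
unanswered_by_anyone = conn.execute("""
    SELECT i.id, i.title FROM item i
    LEFT JOIN useranswer ua ON ua.item_id = i.id
    WHERE ua.id IS NULL
    ORDER BY i.id
""").fetchall()

print(unanswered_for_user)
print(unanswered_by_anyone)
```

For user 1 the two-condition join reports items 2 and 3, while the single-condition join reports only item 3.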

Django query: more on weekends or weekdays

Suppose I have two models:
class User(models.Model):
    name = models.CharField(max_length=42)
class Action(models.Model):
    user = models.ForeignKey(User)
    timestamp = models.DateTimeField(auto_now_add=True)
How can I find all users who have more actions on weekends (Saturday and Sunday) than on other days of the week, and vice versa?
Edit: I don't need to check this condition for one user, that would be easy. I need to select all users who have one of these conditions hold true.
This can be done in one query with the extra method, passing a custom statement in the WHERE clause. This MySQL example selects all users where the number of actions during the weekend is less than or equal to the number of actions on other days:
f = {
    'user_table': User._meta.db_table,
    'action_table': Action._meta.db_table,
    'user_id': User._meta.pk.get_attname_column()[1],
    'user_fk': Action._meta.get_field('user').get_attname_column()[1],
    'timestamp': Action._meta.get_field('timestamp').get_attname_column()[1],
}
query = "(SELECT COUNT(*) FROM %(action_table)s \
    WHERE %(action_table)s.%(user_fk)s = %(user_table)s.%(user_id)s \
    AND DAYOFWEEK(%(action_table)s.%(timestamp)s) IN (1,7)) \
    <= (SELECT COUNT(*) FROM %(action_table)s \
    WHERE %(action_table)s.%(user_fk)s = %(user_table)s.%(user_id)s \
    AND DAYOFWEEK(%(action_table)s.%(timestamp)s) NOT IN (1,7))" % f
users = User.objects.extra(where=[query])
The syntax might be slightly different for backends other than MySQL. You should of course alter the table and column names for your situation.
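As an example of how the syntax differs on another backend, here is a sketch of the same two correlated counts on SQLite, run through Python's built-in sqlite3 module. SQLite has no DAYOFWEEK; strftime('%w', ...) returns '0' for Sunday and '6' for Saturday instead. The table names and sample actions are hypothetical, and this version selects the weekend-heavy users (a strict "more than").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE action (
        id INTEGER PRIMARY KEY, user_id INTEGER, timestamp TEXT
    );
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    -- 2023-01-01 and 2023-01-08 are Sundays, 2023-01-07 is a Saturday.
    INSERT INTO action (user_id, timestamp) VALUES
        (1, '2023-01-01 10:00:00'),
        (1, '2023-01-07 10:00:00'),
        (1, '2023-01-02 10:00:00'),
        (2, '2023-01-03 10:00:00'),
        (2, '2023-01-04 10:00:00'),
        (2, '2023-01-08 10:00:00');
""")

# Same shape as the MySQL WHERE clause: weekend count compared with
# weekday count, both correlated on the outer user row.
weekend_heavy = conn.execute("""
    SELECT u.id, u.name FROM users u
    WHERE (SELECT COUNT(*) FROM action a
           WHERE a.user_id = u.id
             AND strftime('%w', a.timestamp) IN ('0', '6'))
        > (SELECT COUNT(*) FROM action a
           WHERE a.user_id = u.id
             AND strftime('%w', a.timestamp) NOT IN ('0', '6'))
    ORDER BY u.id
""").fetchall()

print(weekend_heavy)
```

Swapping the comparison operator gives the opposite group of users.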

How do you add the mysql query select distinct in python code?

I am trying to display distinct or unique values from my column/field category
The function from views.py:
def category(request, book_category):
    latest_book_list = Books.objects.all().order_by('id')
    return render_to_response('books/category.html', {'latest_book_list': latest_book_list})
I would like the line: latest_book_list = Books.objects.all().order_by('id')
To perform the mysql query:
mysql> select distinct category from books;
I have tried using the Books.objects.filter(category=book_category) but it returns blank.
Any suggestions?
Try this...
categories = Books.objects.values_list('category', flat=True).distinct()
latest_book_list = Books.objects.filter(category__in=categories)
