I'm trying to figure out how to perform the following SQL query with the Django ORM:
SELECT main.A, main.B, main.C
FROM main,
    (SELECT A, MAX(B) AS B
     FROM main
     GROUP BY A) subq
WHERE main.A = subq.A
  AND main.B = subq.B
The last two lines are necessary because they recover the column C value when B is at a maximum in the group by. Without them, I would have A and the corresponding Max B but not the C value when B is at its max. I have searched extensively but cannot find an example that can construct this query using the Django ORM. Most examples use Django's Subquery class and show how to match the sub-queryset up with one column (so doing main.A = subq.A). But how do I match 2+ columns?
Edit:
Here is the model class:
class Tweets(models.Model):
    tweet_id = models.AutoField(primary_key=True)
    tweet_date = models.DateTimeField(blank=True)
    candidate = models.CharField(max_length=100)
    district = models.IntegerField(blank=True)
    username = models.CharField(max_length=256)
    likes = models.IntegerField(blank=True)
    tweet_text = models.CharField(max_length=560)
I'd like to group by "candidate" and "district", then find the tweet with the most likes. But I'd also like to know the "username" and "tweet_text" associated with that tweet that had the most likes.
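For reference, here is a minimal sketch of the SQL being aimed for, run against an in-memory SQLite database with the standard-library sqlite3 module rather than through Django. The table layout mirrors the Tweets model, but the table name, the sample rows, and the max_likes alias are made up for illustration. It joins the grouped subquery back on three columns (candidate, district, and the maximum likes) to recover username and tweet_text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets ("
    "tweet_id INTEGER PRIMARY KEY, candidate TEXT, district INTEGER, "
    "username TEXT, likes INTEGER, tweet_text TEXT)"
)
# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO tweets (candidate, district, username, likes, tweet_text) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Smith", 1, "alice", 10, "first"),
        ("Smith", 1, "bob", 25, "second"),
        ("Smith", 2, "carol", 7, "third"),
        ("Jones", 1, "dave", 3, "fourth"),
        ("Jones", 1, "erin", 1, "fifth"),
    ],
)

# Group by (candidate, district), then join back on all three columns
# to pick up the username and text of the most-liked tweet per group.
top_tweets = conn.execute(
    """
    SELECT t.candidate, t.district, t.username, t.likes, t.tweet_text
    FROM tweets AS t
    JOIN (
        SELECT candidate, district, MAX(likes) AS max_likes
        FROM tweets
        GROUP BY candidate, district
    ) AS subq
      ON t.candidate = subq.candidate
     AND t.district = subq.district
     AND t.likes = subq.max_likes
    ORDER BY t.candidate, t.district
    """
).fetchall()

for row in top_tweets:
    print(row)
```

If two tweets in the same group tie on likes, this join returns both of them.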
I have a model called Actuals with a field called category, which is unique, and another model called Budget, which Actuals references through a many-to-many field. A user can select a unique category in a budget and then select it in actuals, so there can be many actuals per budget. I am trying to write a query that groups by category and sums transactions_amount in the Actuals model.
class Actuals(models.Model):
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    date = models.DateTimeField(auto_now_add=False)
    transactions_amount = models.IntegerField()
    vendor = models.CharField(max_length=255, default="")
    details = models.CharField(max_length=255)
    budget = models.ManyToManyField('budget')

    def __str__(self):
        return self.category.category_feild
This is the query that I currently have. However it still gives me multiple categories
lub = (
    Actuals.objects
    .filter(category__income_or_expense='Expense', date__year="2022", date__month="01")
    .values('category__category_feild', 'date')
    .order_by('category__category_feild')
    .annotate(total_actuals=Sum('transactions_amount'))
    .annotate(total_budget=Sum('budget__budget_amt'))
)
This is the output. There should only be one line for "Fun" and one line for "Paycheck".
<QuerySet [<Actuals: Fun>, <Actuals: Fun>, <Actuals: Paycheck>, <Actuals: Paycheck>]>
The annotate method only adds another attribute to each object returned in the queryset. If you want a single result instead, use the aggregate queryset method:
lub = (
    Actuals.objects
    .filter(category__income_or_expense='Expense', date__year="2022", date__month="01")
    .order_by('category__category_feild')
    .aggregate(total_actuals=Sum('transactions_amount'), total_budget=Sum('budget__budget_amt'))
)
To get the result values use:
total_actuals = lub['total_actuals']
total_budget = lub['total_budget']
This code is not tested, so let me know if it works.
Also, is category__category_feild a typo?
P.S. If you're wondering why I didn't use values, see this antipattern.
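One possible explanation for the repeated categories, offered as a guess rather than a diagnosis: when values() comes before annotate(), Django groups by every field listed in values(), so including 'date' there groups by (category, date) pairs rather than by category alone. The plain-Python sketch below mimics the three groupings on made-up rows; the names rows and sum_by are hypothetical and it does not touch the ORM.

```python
from collections import defaultdict

# Hypothetical rows: (category, date, transactions_amount).
rows = [
    ("Fun", "2022-01-03", 20),
    ("Fun", "2022-01-17", 30),
    ("Paycheck", "2022-01-01", 1000),
    ("Paycheck", "2022-01-15", 1000),
]


def sum_by(rows, key):
    """Sum the amount column per group, where key(row) names the group."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Like values('category', 'date').annotate(Sum(...)): one row per pair.
by_category_and_date = sum_by(rows, key=lambda r: (r[0], r[1]))
# Like values('category').annotate(Sum(...)): one row per category.
by_category = sum_by(rows, key=lambda r: r[0])
# Like aggregate(Sum(...)): a single total with no grouping at all.
grand_total = sum(r[2] for r in rows)

print(by_category_and_date)
print(by_category)
print(grand_total)
```

So aggregate() gives one grand total, whereas a per-category total would need values() with only the category field before annotate().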
I have an annotation like this, which displays the month-wise count of a field:
bar = Foo.objects.annotate(
    item_count=Count('item')
).order_by('-item_month', '-item_year')
and this produces output like this:
[image: html render]
I would like to show the change in item_count when compared with the previous month item_count for each month (except the first month). How could I achieve this using annotations or do I need to use pandas?
Thanks
Edit:
In SQL this is easy with the LAG function, along the lines of:
SELECT item_month, item_year, COUNT(item),
LAG(COUNT(item)) OVER (ORDER BY item_month, item_year)
FROM Foo
GROUP BY item_month, item_year
(PS: item_month and item_year are date fields)
Does the Django ORM have something similar to LAG in SQL?
For this type of query you need to use window functions in the Django ORM.
For Lag, see
https://docs.djangoproject.com/en/4.0/ref/models/database-functions/#lag
A working ORM query looks like this:
# models.py
class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)
This is just a dummy table to show the use of Lag and Window in Django, because the Django docs do not include an example for the Lag function.
from django.db.models import Count, F, Window
from django.db.models.functions import ExtractYear, Lag

print(
    Review.objects
    .annotate(num_likes=Count('likereview_review'))
    .annotate(
        item_count_lag=Window(
            expression=Lag(expression=F('num_likes')),
            order_by=ExtractYear('date_added').asc(),
        )
    )
    .order_by('-num_likes')
    .distinct()
    .query
)
Query will look like
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY EXTRACT(YEAR FROM `temp_view_review`.`date_added`) ASC) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC
If you don't want to order by the extracted year of the date, you can use an F expression instead:
print(
    Review.objects
    .annotate(num_likes=Count('likereview_review'))
    .annotate(
        item_count_lag=Window(
            expression=Lag(expression=F('num_likes')),
            order_by=[F('date_added')],
        )
    )
    .order_by('-num_likes')
    .distinct()
    .query
)
Query for this :
SELECT DISTINCT `temp_view_review`.`user_id`, `temp_view_review`.`review_text`, `temp_view_review`.`rating`, `temp_view_review`.`date_added`, `temp_view_review`.`review_id`, COUNT(`temp_view_likereview`.`id`) AS `num_likes`, LAG(COUNT(`temp_view_likereview`.`id`), 1) OVER (ORDER BY `temp_view_review`.`date_added`) AS `item_count_lag` FROM `temp_view_review` LEFT OUTER JOIN `temp_view_likereview` ON (`temp_view_review`.`review_id` = `temp_view_likereview`.`review_id`) GROUP BY `temp_view_review`.`review_id` ORDER BY `num_likes` DESC
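To make the semantics of LAG concrete, here is a small plain-Python sketch of what the window function computes for the month-wise counts in the question. The rows and the function name with_lag are made up for illustration, and it assumes the counts have already been aggregated per month. Note that it orders by year first and then month, which is probably what is wanted, whereas the SQL in the question orders by item_month before item_year.

```python
# Hypothetical (year, month, item_count) rows, already aggregated per month.
monthly_counts = [
    (2022, 1, 6),
    (2021, 11, 4),
    (2021, 12, 9),
]


def with_lag(rows):
    """Return (year, month, count, previous_count, change) in date order."""
    ordered = sorted(rows)  # tuples sort by year first, then month
    result = []
    previous = None  # like LAG, the first row has no previous value
    for year, month, count in ordered:
        change = None if previous is None else count - previous
        result.append((year, month, count, previous, change))
        previous = count
    return result


for row in with_lag(monthly_counts):
    print(row)
```

The change column is simply the current count minus the lagged count, and it stays empty for the first month.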
I present a simplified version of my problem. I have venues and timeslots and users and bookings, as shown in the model descriptions below. Time slots are universal for all venues, and users can book into a time slot at a venue up until the venue capacity is reached.
class Venue(models.Model):
    name = models.CharField(max_length=200)
    capacity = models.PositiveIntegerField(default=0)

class TimeSlot(models.Model):
    start_time = models.TimeField()
    end_time = models.TimeField()

class Booking(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    time_slot = models.ForeignKey(TimeSlot, on_delete=models.CASCADE)
    venue = models.ForeignKey(Venue, on_delete=models.CASCADE)
Now I would like to as efficiently as possible get all possible combinations of Venues and TimeSlots and annotate the count of the bookings made for each combination, including the case where the number of bookings is 0.
I have managed to achieve this in raw SQL using a cross join on the Venue and TimeSlot tables. Something to the effect of the below. However despite exhaustive searching have not been able to find a django equivalent.
SELECT venue.name, timeslot.start_time, timeslot.end_time, count(booking.id)
FROM myapp_venue as venue
CROSS JOIN myapp_timeslot as timeslot
LEFT JOIN myapp_booking as booking
    on booking.time_slot_id = timeslot.id
    and booking.venue_id = venue.id
GROUP BY venue.name, timeslot.start_time, timeslot.end_time
I'm also able to annotate a query to retrieve the booking count for every combination that has at least one booking, but combinations with 0 bookings get excluded. Example:
qs = Booking.objects.all().values(
    venue=F('venue__name'),
    start_time=F('time_slot__start_time'),
    end_time=F('time_slot__end_time')
).annotate(bookings=Count('id')) \
    .order_by('venue', 'start_time', 'end_time')
How can I achieve the effect of the CROSS JOIN query using the django ORM?
I don't believe Django has the capability to do cross joins without reverting down to raw SQL. I can give you two ideas that could point you in the right direction though:
Combination of queries and python loops.
venues = Venue.objects.all()
time_slots = TimeSlot.objects.all()
qs = ...  # your custom query above
# Loop through both querysets to create a master list.
venue_time_slots = []
for venue in venues:
    for time_slot in time_slots:
        # Use a list (not a tuple) so the count can be updated in place.
        venue_time_slots.append([venue.name, time_slot.start_time, time_slot.end_time, 0])
# Loop through the master list and compare it to the custom qs to update the count.
for venue_time in venue_time_slots:
    for vt in qs:
        # Check whether this venue and time slot appear in the booking counts.
        # qs comes from .values(), so each vt is a dict.
        if (venue_time[0], venue_time[1], venue_time[2]) == (vt['venue'], vt['start_time'], vt['end_time']):
            venue_time[3] += vt['bookings']
            break
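The nested loops above rescan the booking counts for every venue and time slot combination. A possible variation, sketched here with plain Python data instead of querysets, is to count the bookings once into a dictionary and then walk the cross product a single time. The sample venues, time slots, and bookings are made up, and in a real project they would come from the three querysets.

```python
from collections import Counter
from itertools import product

# Hypothetical stand-ins for Venue names, TimeSlot (start, end) pairs,
# and Booking rows reduced to (venue, start, end).
venues = ["Hall", "Studio"]
time_slots = [("09:00", "10:00"), ("10:00", "11:00")]
bookings = [
    ("Hall", "09:00", "10:00"),
    ("Hall", "09:00", "10:00"),
    ("Studio", "10:00", "11:00"),
]

# Count bookings per (venue, start, end); missing keys read as 0.
booked = Counter(bookings)

# product() plays the role of the CROSS JOIN.
table = [
    (venue, start, end, booked[(venue, start, end)])
    for venue, (start, end) in product(venues, time_slots)
]

for row in table:
    print(row)
```

Combinations without any booking still appear, with a count of 0, which is the part the pure ORM query was missing.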
The harder option, for which I don't have a full solution, is to use a combination of filter, exclude, and union. I have only used this with 3 tables (two parents with a child link table), whereas you have 4 including user, so I can only provide the logic and not a working example.
# Get all results that exist in table using .filter().
first_query.filter()
# Get all results that do not exist by using .exclude().
# You can use your results from the first query to exclude also, but
# would need to create an interim list.
exclude_ids = [fq_row.id for fq_row in first_query]
second_query.exclude(id__in=exclude_ids)
# Combine both queries
query = first_query.union(second_query)
return query
I have a messy old query that I'm trying to convert from SQL to the Django ORM, and I can't seem to figure it out.
As the original query is not something that should be public, here's something similar to what I'm working with:
Table 1
id
Table 2
Id
username
active
birthday
table_1_fk
Table 3
Id
amount
table_1_fk
I need to end up with a list of active users (username), sorted by date, displaying the amount. Table1 references within table 2 and 3 are not in order. The main issues I'm having are:
How do I retrieve these with just ORM (no looping/executing, or hardly any if I must)
If I can't use solely ORM and do decide to just loop over the parts I need to, how would I even create a single object to display in a table without looping over everything multiple times?
My thought processes:
Table 2 is active -> get table 1 -> find table 1 pk in table 3 -> add table 3 info to table 1?
Table 1 -> get table 2 Actives, Table1 -> get table 3 amounts -> loop to match according to table1_fks
You can follow the related references starting from Table1. If your models look something like this:
from django.db import models
from django.db.models import F

class Table1(models.Model):
    ...

class Table2(models.Model):
    username = models.CharField(max_length=100)
    active = models.BooleanField()
    birthday = models.DateField()  # Sorted by date
    table1 = models.ForeignKey(Table1, on_delete=models.CASCADE, related_name="table2")

class Table3(models.Model):
    amount = models.IntegerField()
    table1 = models.ForeignKey(Table1, on_delete=models.CASCADE, related_name="table3")
You can do later:
>>> users = (
        Table1.objects
        .filter(table2__active=True)
        .annotate(
            username=F("table2__username"),
            amount=F("table3__amount"),
            birthday=F("table2__birthday")
        )
        .order_by("-birthday")
        .values("username", "amount", "birthday")
    )
>>> print(users)
[
["user1", 100.0, "2020-01-13"],
["user2", 890.0, "2020-01-10"],
["user3", None, "2020-01-01"],
]
It depends entirely on how your model classes are implemented.
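For the fallback the question mentions (looping in Python without passing over everything several times), one possible approach is to index Table 3 by its table_1_fk in a dictionary first, then make a single pass over Table 2. This is only a sketch on made-up rows: the tuples stand in for model instances, and it assumes at most one Table 3 row per Table 1 row.

```python
# Hypothetical rows mirroring the tables in the question.
table2 = [  # (username, active, birthday, table_1_fk)
    ("user2", True, "2020-01-10", 1),
    ("user1", True, "2020-01-13", 3),
    ("user3", True, "2020-01-01", 2),
    ("user4", False, "2020-01-20", 4),
]
table3 = [  # (amount, table_1_fk)
    (890, 1),
    (100, 3),
]

# One pass over table3: look up amounts by the shared Table 1 key.
amount_by_fk = {fk: amount for amount, fk in table3}

# One pass over table2: keep active users and attach the amount, if any.
# ISO-formatted date strings sort chronologically, newest first here.
report = sorted(
    (
        (username, amount_by_fk.get(fk), birthday)
        for username, active, birthday, fk in table2
        if active
    ),
    key=lambda row: row[2],
    reverse=True,
)

for row in report:
    print(row)
```

Users without a matching Table 3 row keep an amount of None, which corresponds to the NULL a left join would produce.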
Say I have three models as follows representing the prices of goods sold at several retail locations of the same company:
class Store(models.Model):
    name = models.CharField(max_length=256)
    address = models.TextField()

class Product(models.Model):
    name = models.CharField(max_length=256)
    description = models.TextField()

class Price(models.Model):
    store = models.ForeignKey(Store, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    effective_date = models.DateField()
    value = models.FloatField()
When a price is set, it is set on a store-and-product-specific basis. I.e. the same item can have different prices in different stores. And each of these prices has an effective date. For a given store and a given product, the currently-effective price is the one with the latest effective_date.
What's the best way to write the query that will return the currently-effective price of all items in all stores?
If I were using Pandas, I would get myself a dataframe with columns ['store', 'product', 'effective_date', 'price'] and I would run
dataframe\
    .sort_values(by=['store', 'product', 'effective_date'], ascending=[True, True, False])\
    .groupby(['store', 'product'])['price'].first()
But there has to be some way of doing this directly on the database level. Thoughts?
If your DBMS is PostgreSQL you can use distinct combined with order_by this way :
Price.objects.order_by('store','product','-effective_date').distinct('store','product')
It will give you all the latest prices for all product/store combinations.
distinct has some subtleties, so have a look at the docs here: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Without Postgres' added power (which you should really use) there is a more complicated solution to this (based on ryanpitts' idea), which requires two db hits:
from django.db.models import Max, Q

latest_set = (
    Price.objects
    .values('store_id', 'product_id')  # important to have values before annotate ...
    .annotate(max_date=Max('effective_date'))
    .order_by()
)
# ... to annotate for the grouping that results from values

# Build a query that reverse-engineers the Price records that contributed to
# 'latest_set'. (Relying on the fact that there are not 2 Prices
# for the same product-store with an identical date)
q_statement = Q(product_id=-1)  # sth. that results in empty qs
for latest_dict in latest_set:
    q_statement |= (
        Q(product_id=latest_dict['product_id'])
        & Q(store_id=latest_dict['store_id'])
        & Q(effective_date=latest_dict['max_date'])
    )
Price.objects.filter(q_statement)
If you are using PostgreSQL, you could use order_by and distinct to get the current effective prices for all the products in all the stores as follows:
prices = (
    Price.objects
    .order_by('store', 'product', '-effective_date')
    .distinct('store', 'product')
)
Now, this is quite analogous to what you have there for Pandas.
Do note that using field names in distinct only works in PostgreSQL. Once you have sorted the prices based on store, product and decreasing order of effective date, distinct('store', 'product') will retain only the first entry for each store-product pair and that will be your current entry with recent price.
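The "sort, then keep the first row per pair" behaviour described above can be mimicked in plain Python, which may help when checking results on a small data set. This is only an illustration on made-up rows and not what Django or PostgreSQL execute internally.

```python
from itertools import groupby

# Hypothetical (store, product, effective_date, value) rows.
prices = [
    ("A", "milk", "2016-01-01", 1.00),
    ("A", "milk", "2016-03-01", 1.20),
    ("B", "milk", "2016-02-01", 1.10),
    ("A", "eggs", "2016-02-15", 2.50),
]

# Emulate order_by('store', 'product', '-effective_date') with two stable
# sorts: newest date first, then by (store, product).
ordered = sorted(prices, key=lambda p: p[2], reverse=True)
ordered.sort(key=lambda p: (p[0], p[1]))

# Emulate distinct('store', 'product'): keep the first row of each group.
current = [
    next(group)
    for _, group in groupby(ordered, key=lambda p: (p[0], p[1]))
]

for row in current:
    print(row)
```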
Not PostgreSQL database:
If you are not using PostgreSQL, you could do it with two queries:
First, we get latest effective date for all the store-product groups:
latest_effective_dates = (
    Price.objects
    .values('store_id', 'product_id')
    .annotate(led=Max('effective_date'))
    .values('led')
)
Once we have these dates, we can get the prices for them:
prices = Price.objects.filter(effective_date__in=latest_effective_dates)
Disclaimer: This assumes that no effective_date is shared between different store-product groups.
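To see why the disclaimer matters, here is a plain-Python sketch on made-up rows where an older price in one group happens to share its date with the latest price in another group. Filtering on the date alone lets the stale row through, while matching on the full (store, product, date) triple does not. The variable names are hypothetical.

```python
# Hypothetical (store, product, effective_date, value) rows.
prices = [
    ("A", "milk", "2016-01-01", 1.00),  # stale price for store A
    ("A", "milk", "2016-03-01", 1.20),  # latest price for store A
    ("B", "milk", "2016-01-01", 0.90),  # latest price for store B
]

# Latest effective date per (store, product) group.
latest = {}
for store, product, date, _ in prices:
    key = (store, product)
    if key not in latest or date > latest[key]:
        latest[key] = date

# Date-only filter, analogous to effective_date__in=latest_effective_dates.
latest_dates = set(latest.values())
by_date_only = [p for p in prices if p[2] in latest_dates]

# Filter on the full (store, product, date) triple instead.
by_triple = [p for p in prices if latest[(p[0], p[1])] == p[2]]

print(len(by_date_only))  # includes the stale store A row
print(by_triple)
```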