I want to run a query with a window function and then do a GROUP BY aggregation over that subquery, but I couldn't make it work with the ORM. It fails with the error: aggregate function calls cannot contain window function calls
Is there any way to build a query like the SQL below without using .raw()?
SELECT a.col_id, AVG(a.max_count) FROM (
    SELECT col_id,
           MAX(count) OVER (PARTITION BY part_id ORDER BY part_id) AS max_count
    FROM table_one
) a
GROUP BY a.col_id;
Example
table_one
| id | col_id | part_id | count |
| -- | ------ | ------- | ----- |
| 1 | c1 | p1 | 3 |
| 2 | c2 | p1 | 2 |
| 3 | c3 | p2 | 1 |
| 4 | c2 | p2 | 4 |
First I want to get the max count for each part_id:
| id | col_id | part_id | count | max_count |
| -- | ------ | ------- | ----- | --------- |
| 1 | c1 | p1 | 3 | 3 |
| 2 | c2 | p1 | 2 | 3 |
| 3 | c3 | p2 | 1 | 4 |
| 4 | c2 | p2 | 4 | 4 |
And finally get the average of max_count grouped by col_id:
| col_id | avg(max_count) |
| ------ | -------------- |
| c1 | 3 |
| c2 | 3.5 |
| c3 | 4 |
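To make the two steps concrete, here is a plain-Python sketch (no Django, no database) that mirrors what the SQL does on the sample rows: first attach the per-partition maximum to every row, then average that value per col_id. It is only an illustration of the semantics, not how the ORM executes the query.

```python
from collections import defaultdict

# Sample rows from table_one: (id, col_id, part_id, count)
rows = [
    (1, "c1", "p1", 3),
    (2, "c2", "p1", 2),
    (3, "c3", "p2", 1),
    (4, "c2", "p2", 4),
]

# Step 1: MAX(count) OVER (PARTITION BY part_id) -> one maximum per part_id
max_per_part = {}
for _id, _col, part, count in rows:
    max_per_part[part] = max(count, max_per_part.get(part, count))

# Every row keeps its identity and gains max_count, like a window function
with_max = [(row_id, col, part, count, max_per_part[part]) for row_id, col, part, count in rows]

# Step 2: AVG(max_count) ... GROUP BY col_id
grouped = defaultdict(list)
for _id, col, _part, _count, max_count in with_max:
    grouped[col].append(max_count)

averages = {col: sum(vals) / len(vals) for col, vals in sorted(grouped.items())}
print(averages)  # {'c1': 3.0, 'c2': 3.5, 'c3': 4.0}
```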
The models I have now:
import uuid

from django.db import models

class Part(models.Model):
    part_id = models.UUIDField(primary_key=True, editable=False, default=uuid.uuid4)
    name = models.CharField(max_length=128)  # max_length is required; 128 is an arbitrary choice

class Col(models.Model):
    part_id = models.UUIDField(primary_key=True, editable=False, default=uuid.uuid4)
    name = models.CharField(max_length=128)

class TableOne(models.Model):
    id = models.UUIDField(primary_key=True, editable=False, default=uuid.uuid4)
    col_id = models.ForeignKey(
        Col,
        on_delete=models.CASCADE,
        related_name='table_one_col'
    )
    part_id = models.ForeignKey(
        Part,
        on_delete=models.CASCADE,
        related_name='table_one_part'
    )
    count = models.IntegerField()
I want to GROUP BY after the PARTITION BY. This is the query I tried, which raises the error above:
from django.db.models import Avg, F, Max, Window

query = TableOne.objects.annotate(
    max_count=Window(
        expression=Max('count'),
        order_by=F('part_id').asc(),
        partition_by=F('part_id')
    )
).values(
    'col_id'
).annotate(
    avg=Avg('max_count')  # fails: the aggregate would wrap the window expression
)
You can use a subquery in Django; you don't need a window function at all. First, build the subquery: a Part queryset, correlated to the outer TableOne row, that is annotated with the max count from TableOne:
from django.db.models import Avg, Max, OuterRef, Subquery

parts = Part.objects.filter(
    pk=OuterRef('part_id')  # Part's primary key is its part_id field; it has no `id` field
).annotate(
    max=Max('table_one_part__count')
)
Then annotate a TableOne queryset with the max count from the subquery, call values() on the column to group by (col_id), and annotate again with the average to produce the desired output:
TableOne.objects.annotate(
    max_count=Subquery(parts.values('max')[:1])
).values(
    'col_id'
).annotate(
    Avg('max_count')  # exposed under the default alias max_count__avg
)
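Roughly, the Subquery annotation turns the window function into a correlated scalar subquery, which an aggregate is allowed to wrap. The sketch below is a hand-written SQL equivalent (not the exact SQL Django generates) run with Python's built-in sqlite3 on the sample data; the table and column names simply follow the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE table_one (id INTEGER PRIMARY KEY, col_id TEXT, part_id TEXT, "count" INTEGER)'
)
conn.executemany(
    "INSERT INTO table_one VALUES (?, ?, ?, ?)",
    [(1, "c1", "p1", 3), (2, "c2", "p1", 2), (3, "c3", "p2", 1), (4, "c2", "p2", 4)],
)

# For each outer row, the scalar subquery looks up the max count of that row's part.
# Unlike a window function, this expression may appear inside AVG().
rows = conn.execute(
    """
    SELECT t.col_id,
           AVG((SELECT MAX(p."count") FROM table_one AS p WHERE p.part_id = t.part_id)) AS avg_max_count
    FROM table_one AS t
    GROUP BY t.col_id
    ORDER BY t.col_id
    """
).fetchall()
print(rows)  # [('c1', 3.0), ('c2', 3.5), ('c3', 4.0)]
```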
Information
I have two models:
class BookingModel(models.Model):
    [..fields..]

class BookingComponentModel(models.Model):
    STATUS_CHOICES = ['In Progress','Completed','Not Started','Incomplete','Filled','Partially Filled','Cancelled']
    STATUS_CHOICES = [(choice,choice) for choice in STATUS_CHOICES]
    COMPONENT_CHOICES = ['Test','Soak']
    COMPONENT_CHOICES = [(choice,choice) for choice in COMPONENT_CHOICES]
    booking = models.ForeignKey(BookingModel, on_delete=models.CASCADE, null=True, blank=True)
    component_type = models.CharField(max_length=20, choices=COMPONENT_CHOICES)
    status = models.CharField(max_length=50, choices=STATUS_CHOICES, default='Not Started')
    order = models.IntegerField(unique=True)
    [..fields..]
What I want
For each booking, I want the booking component with the last (maximum) order value. It must also have status='In Progress' and component_type='Soak'.
For example for table:
+----+------------+----------------+-------------+-------+
| id | booking_id | component_type | status | order |
+----+------------+----------------+-------------+-------+
| 1 | 1 | Test | Completed | 1 |
+----+------------+----------------+-------------+-------+
| 2 | 1 | Soak | Completed | 2 |
+----+------------+----------------+-------------+-------+
| 3 | 1 | Soak | In Progress | 3 |
+----+------------+----------------+-------------+-------+
| 4 | 2 | Test | Completed | 1 |
+----+------------+----------------+-------------+-------+
| 5 | 2 | Soak | In Progress | 2 |
+----+------------+----------------+-------------+-------+
| 6 | 3 | Test | In Progress | 1 |
+----+------------+----------------+-------------+-------+
Expected outcome would be IDs 4 & 6.
What I've tried
I've tried the following:
BookingComponentModel.objects.values('booking').annotate(max_order=Max('order')).order_by('-booking')
This doesn't include the filtering but returns the max_order for each booking.
I would need the id of the component that has that max_order, so that I can put it in a subquery and filter on the other conditions (status, component_type).
Thanks
You can make use of a Subquery expression [Django-doc] and work with:
from django.db.models import OuterRef, Subquery

BookingModel.objects.annotate(
    latest_component_id=Subquery(
        BookingComponentModel.objects.filter(
            booking_id=OuterRef('pk'), status='In Progress', component_type='Soak'
        ).values('pk').order_by('-order')[:1]
    )
)
The BookingModel objects in this queryset will have an extra attribute latest_component_id containing the primary key of the latest BookingComponentModel with status 'In Progress' and component_type 'Soak' (or None if the booking has no such component).
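The shape of that annotation is a correlated subquery with ORDER BY ... LIMIT 1 per outer row. Below is a hand-written sqlite3 sketch of that shape on the sample table (table names booking and booking_component are assumptions; Django's generated SQL will differ in aliases). Note that "order" must be quoted because it is an SQL keyword. On this data it yields component 3 for booking 1, component 5 for booking 2, and NULL for booking 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE booking (id INTEGER PRIMARY KEY);
    CREATE TABLE booking_component (
        id INTEGER PRIMARY KEY,
        booking_id INTEGER REFERENCES booking(id),
        component_type TEXT,
        status TEXT,
        "order" INTEGER
    );
    INSERT INTO booking (id) VALUES (1), (2), (3);
    INSERT INTO booking_component VALUES
        (1, 1, 'Test', 'Completed', 1),
        (2, 1, 'Soak', 'Completed', 2),
        (3, 1, 'Soak', 'In Progress', 3),
        (4, 2, 'Test', 'Completed', 1),
        (5, 2, 'Soak', 'In Progress', 2),
        (6, 3, 'Test', 'In Progress', 1);
    """
)

# Filter inside the subquery, then keep the row with the highest "order" per booking
rows = conn.execute(
    """
    SELECT b.id,
           (SELECT c.id FROM booking_component AS c
            WHERE c.booking_id = b.id
              AND c.status = 'In Progress'
              AND c.component_type = 'Soak'
            ORDER BY c."order" DESC
            LIMIT 1) AS latest_component_id
    FROM booking AS b
    ORDER BY b.id
    """
).fetchall()
print(rows)  # [(1, 3), (2, 5), (3, None)]
```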
My models.py looks something like this:
class RelevanceRelation(TimeStampable, SoftDeletable, models.Model):
    relevance_type = models.ForeignKey(
        RelevanceType,
        on_delete=models.CASCADE,
        related_name="relevance_relations"
    )
    name = models.CharField(max_length=256,
                            verbose_name="Relevance Relation Name")

    def __str__(self):
        return self.name

class RelevanceRelationValue(TimeStampable, SoftDeletable, models.Model):
    entity = models.ForeignKey(
        Entity, on_delete=models.CASCADE,
        related_name="relevance_relation_values"
    )
    relevance_relation = models.ForeignKey(
        RelevanceRelation,
        on_delete=models.CASCADE,
        related_name="values"
    )
    name = models.CharField(max_length=256,
                            verbose_name="Relevance Relation Value")

    def __str__(self):
        return self.name
And I have two querysets:
q1 = RelevanceRelationValue.objects.filter(entity=<int>)
q2 = RelevanceRelation.objects.filter(relevance_type=<int>)
Now, is there a way to find the intersection of q1 and q2? That is, I want to display all the rows of q2 whose id appears in q1 as relevance_relation.
For example:
q1 =
-------------------------------
| entity | relevance_relation |
-------------------------------
| 1      | 1                  |
| 1      | 2                  |
| 1      | 3                  |
-------------------------------
and q2 =
-------------------------------
| id     | relevance_type     |
-------------------------------
| 1      | 1                  |
| 2      | 1                  |
| 3      | 1                  |
| 4      | 1                  |
| 5      | 1                  |
| 6      | 1                  |
-------------------------------
so q3 should be:
-------------------------------
| id     | relevance_type     |
-------------------------------
| 1      | 1                  |
| 2      | 1                  |
| 3      | 1                  |
-------------------------------
You can perform extra filtering:
q1 = RelevanceRelationValue.objects.filter(entity=some_value1).values('relevance_relation')
q2 = RelevanceRelation.objects.filter(
    relevance_type=some_value2,
    id__in=q1
)
But it makes more sense to simply filter on the related model, so:
RelevanceRelation.objects.filter(
    values__entity=some_value1,
    relevance_type=some_value2
).distinct()
Here we thus get all RelevanceRelations for which the relevance_type is some_value2, and for which a related RelevanceRelationValue exists with entity=some_value1.
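The two querysets correspond roughly to an IN (subquery) filter and a JOIN plus DISTINCT. The sqlite3 sketch below hand-writes both forms (table names are assumptions, and this is not Django's exact SQL) on the sample data with some_value1 = some_value2 = 1. DISTINCT matters in the JOIN form because several values can point at the same relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE relevance_relation (id INTEGER PRIMARY KEY, relevance_type_id INTEGER);
    CREATE TABLE relevance_relation_value (
        id INTEGER PRIMARY KEY, entity_id INTEGER, relevance_relation_id INTEGER
    );
    INSERT INTO relevance_relation VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1);
    INSERT INTO relevance_relation_value (entity_id, relevance_relation_id)
        VALUES (1, 1), (1, 2), (1, 3);
    """
)

# Variant 1: id__in=q1 -> WHERE id IN (subquery)
in_subquery = conn.execute(
    """
    SELECT id FROM relevance_relation
    WHERE relevance_type_id = 1
      AND id IN (SELECT relevance_relation_id FROM relevance_relation_value WHERE entity_id = 1)
    ORDER BY id
    """
).fetchall()

# Variant 2: values__entity=... -> JOIN on the reverse relation, DISTINCT removes duplicates
joined = conn.execute(
    """
    SELECT DISTINCT r.id FROM relevance_relation AS r
    JOIN relevance_relation_value AS v ON v.relevance_relation_id = r.id
    WHERE v.entity_id = 1 AND r.relevance_type_id = 1
    ORDER BY r.id
    """
).fetchall()
print(in_subquery, joined)  # [(1,), (2,), (3,)] [(1,), (2,), (3,)]
```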
This question is a follow-up to this SO question: Django Annotated Query to Count Only Latest from Reverse Relationship
Given these models:
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    # on_delete is required since Django 2.0; CASCADE is an assumed choice
    candidate = models.ForeignKey("Candidate", on_delete=models.CASCADE, related_name="status_changes")
    status = models.ForeignKey("Status", on_delete=models.CASCADE, related_name="status_changes")
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
Represented by these tables:
candidates
+----+--------------+
| id | name |
+----+--------------+
| 1 | Beth |
| 2 | Mark |
| 3 | Mike |
| 4 | Ryan |
+----+--------------+
status
+----+--------------+
| id | name |
+----+--------------+
| 1 | Review |
| 2 | Accepted |
| 3 | Rejected |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
| 1 | 1 | 1 | 03-01-2019 |
| 2 | 1 | 2 | 05-01-2019 |
| 4 | 2 | 1 | 01-01-2019 |
| 5 | 3 | 1 | 01-01-2019 |
| 6 | 4 | 3 | 01-01-2019 |
+----+--------------+-----------+------------+
I wanted to get a count of each status type, but only include the last status for each candidate:
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count |
+-----------+-------------+--------+
| 1 | Review | 2 |
| 2 | Accepted | 1 |
| 3 | Rejected | 1 |
+-----------+-------------+--------+
I was able to achieve this with this answer:
from django.db.models import Count, F, Max

qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]
The issue, however, is that if a status is not referenced by any status change, it is omitted from the result. Instead, I would like it to be counted as zero.
For example, if the status table were:
+----+--------------+
| id | name |
+----+--------------+
| 1 | Review |
| 2 | Accepted |
| 3 | Rejected |
| 4 | Banned |
+----+--------------+
I would like to get:
+-----------+-------------+--------+
| status_id | status_name | count |
+-----------+-------------+--------+
| 1 | Review | 2 |
| 2 | Accepted | 1 |
| 3 | Rejected | 1 |
| 4 | Banned | 0 |
+-----------+-------------+--------+
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
What I tried
I solved this with an outer join in SQL, but I am not sure how to achieve that in Django.
I tried creating a queryset with all counts annotated as zero and then merging it, but it did not work:
from django.db.models import Count, F, IntegerField, Max, Value

last_status_changes = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)

zero_query = (
    Status.objects.all()
    .annotate(nlast=Value(0, output_field=IntegerField()))
    .exclude(pk__in=last_status_changes.values("id"))
)
>>> qs = last_status_changes | zero_query
>>> [(q.name, q.nlast) for q in qs]
[('Review', 3), ('Accepted', 1), ('Rejected', 1)]
# this double-counts "Review": it counts every status change, not only the latest ones
Any help is appreciated
Thanks
Update 1
I was able to solve this with a raw query using a RIGHT JOIN, but it would be great to do this with the ORM.
# Untested as I am using different model names in reality
SQL = """SELECT
Min(status.id) as id
, COUNT(latest_status_change.candidate_id) as status_count
FROM
(
SELECT
candidate_id,
Max(created_at) AS latest_date
FROM
api_status_change
GROUP BY candidate_id
)
AS latest_status_change
INNER JOIN api_candidates ON (latest_status_change.candidate_id = api_candidates.id)
INNER JOIN api_status_change ON
(
latest_status_change.candidate_id = api_candidates.id
AND
latest_status_change.latest_date = api_status_change.created_at
)
RIGHT JOIN api_status AS status ON (api_status_change.status_id = `status`.id)
GROUP BY status.name
;
"""
qs = Status.objects.raw(SQL)
>>> [(q.name, q.status_count) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
The only problem here is that you are filtering your Status queryset by existing status changes while expecting the opposite result. In your case, the solution is to drop the unnecessary filtering:
last_status_changes = Status.objects.annotate(
    nlast=Count('status_changes')
).order_by(
    '-nlast'
)
The other case is if you really do want to filter your changes (by date, for example):
from django.db import models
from django.db.models import Case, Count, F, When

changed_status_ids = Status.objects.filter(
    status_changes__created_at__gte='2020-03-03'
).values_list(
    'id',
    flat=True
)

Status.objects.annotate(
    c=Count('status_changes')
).annotate(
    cnt=Case(
        When(
            id__in=changed_status_ids,
            then=F('c')
        ),
        output_field=models.IntegerField(),
        default=0
    )
).values(
    'cnt',
    'name'
).order_by(
    '-cnt'
)
I solved it with the queryset below:
from django.db import models

# Keep only each candidate's most recent status change
qs_last_status_changes = StatusChange.objects.annotate(
    _last_change=models.Max("candidate__status_changes__created_at")
).filter(created_at=models.F("_last_change"))

# Count 1 per matching latest change; statuses with no match sum to 0
qs_status = Status.objects.annotate(
    count=models.Sum(
        models.Case(
            models.When(
                status_changes__in=qs_last_status_changes,
                then=models.Value(1)
            ),
            output_field=models.IntegerField(),
            default=0,
        )
    )
)
>>> [(k.name, k.count) for k in qs_status]
[('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
Thank you Andrey Nelubin for your suggestion
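For comparison, the outer join the question mentions can be written as a LEFT JOIN from status to "latest change per candidate", counting only matched rows so unreferenced statuses come out as 0. This is a hand-written sqlite3 sketch on the sample data (table names follow the question; the dates are rewritten as ISO strings, an assumption, so that MAX() and = compare them correctly). It uses LEFT JOIN rather than RIGHT JOIN because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY, candidate_id INTEGER, status_id INTEGER, created_at TEXT
    );
    INSERT INTO status VALUES (1, 'Review'), (2, 'Accepted'), (3, 'Rejected'), (4, 'Banned');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
    """
)

rows = conn.execute(
    """
    SELECT s.name, COUNT(latest.id) AS nlast  -- COUNT(col) ignores the NULLs of unmatched statuses
    FROM status AS s
    LEFT JOIN (
        -- keep only each candidate's most recent status change
        SELECT sc.id, sc.status_id
        FROM status_change AS sc
        WHERE sc.created_at = (
            SELECT MAX(sc2.created_at) FROM status_change AS sc2
            WHERE sc2.candidate_id = sc.candidate_id
        )
    ) AS latest ON latest.status_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.id
    """
).fetchall()
print(rows)  # [('Review', 2), ('Accepted', 1), ('Rejected', 1), ('Banned', 0)]
```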
Problem Overview
Given the models:
class Candidate(BaseModel):
    name = models.CharField(max_length=128)

class Status(BaseModel):
    name = models.CharField(max_length=128)

class StatusChange(BaseModel):
    # on_delete is required since Django 2.0; CASCADE is an assumed choice
    candidate = models.ForeignKey("Candidate", on_delete=models.CASCADE, related_name="status_changes")
    status = models.ForeignKey("Status", on_delete=models.CASCADE, related_name="status_changes")
    created_at = models.DateTimeField(auto_now_add=True, blank=True)
And SQL Tables:
candidates
+----+--------------+
| id | name |
+----+--------------+
| 1 | Beth |
| 2 | Mark |
| 3 | Mike |
| 4 | Ryan |
+----+--------------+
status
+----+--------------+
| id | name |
+----+--------------+
| 1 | Review |
| 2 | Accepted |
| 3 | Rejected |
+----+--------------+
status_change
+----+--------------+-----------+------------+
| id | candidate_id | status_id | created_at |
+----+--------------+-----------+------------+
| 1 | 1 | 1 | 03-01-2019 |
| 2 | 1 | 2 | 05-01-2019 |
| 4 | 2 | 1 | 01-01-2019 |
| 5 | 3 | 1 | 01-01-2019 |
| 6 | 4 | 3 | 01-01-2019 |
+----+--------------+-----------+------------+
I want to get the total number of candidates with a given status, but only each candidate's latest status_change should be counted.
In other words, StatusChange is used to track history of status, but only the latest is considered when counting current status of candidates.
SQL Solution
Using SQL, I was able to achieve it with GROUP BY and COUNT.
(SQL untested)
SELECT
    status.id AS status_id
    , status.name AS status_name
    , COUNT(*) AS status_count
FROM
    (
        SELECT
            candidate_id,
            MAX(created_at) AS latest_status_change
        FROM
            status_change
        GROUP BY candidate_id
    )
    AS last_status_count
    INNER JOIN status_change
    ON (status_change.candidate_id = last_status_count.candidate_id
        AND status_change.created_at = last_status_count.latest_status_change)
    INNER JOIN status
    ON (status_change.status_id = status.id)
GROUP BY status.id, status.name
ORDER BY status_count DESC;
last_status_count
+-----------+-------------+--------+
| status_id | status_name | count |
+-----------+-------------+--------+
| 1 | Review | 2 | # <= Does not include instance from candidate 1
| 2 | Accepted | 1 | # because status 2 is latest
| 3 | Rejected | 1 |
+-----------+-------------+--------+
Attempted Django Solution
I need a view to return each status and its corresponding count,
e.g. [{ status_name: "Review", count: 2 }, ...]
I am not sure how to build this queryset without pulling all records and aggregating in Python.
I figured I need annotate() and possibly Subquery but I haven't been able to stitch it all together.
The closest I got is this, which counts the number of status changes for each status but also counts non-latest changes:
queryset = Status.objects.all().annotate(case_count=Count("status_changes"))
I have found lots of SO questions on aggregating, but I couldn't find a clear answer on aggregating and annotating only the "latest" record.
Thanks in advance.
We can perform a query where we first filter the last StatusChange per Candidate and then count the statuses:
from django.db.models import Count, F, Max

qs = Status.objects.filter(
    status_changes__in=StatusChange.objects.annotate(
        last=Max('candidate__status_changes__created_at')
    ).filter(
        created_at=F('last')
    )
).annotate(
    nlast=Count('status_changes')
)
For the given sample data, this gives us:
>>> [(q.name, q.nlast) for q in qs]
[('Review', 2), ('Accepted', 1), ('Rejected', 1)]
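To see why this works, here is a rough hand-written SQL equivalent of the queryset run with sqlite3 (not Django's exact SQL; table names and ISO-formatted dates are assumptions). The inner query joins each status change to all changes of the same candidate and keeps it only if its created_at equals that candidate's maximum, which is what the last=Max(...) / created_at=F('last') pair expresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status_change (
        id INTEGER PRIMARY KEY, candidate_id INTEGER, status_id INTEGER, created_at TEXT
    );
    INSERT INTO status VALUES (1, 'Review'), (2, 'Accepted'), (3, 'Rejected');
    INSERT INTO status_change VALUES
        (1, 1, 1, '2019-01-03'),
        (2, 1, 2, '2019-01-05'),
        (4, 2, 1, '2019-01-01'),
        (5, 3, 1, '2019-01-01'),
        (6, 4, 3, '2019-01-01');
    """
)

rows = conn.execute(
    """
    SELECT s.name, COUNT(sc.id) AS nlast
    FROM status AS s
    JOIN status_change AS sc ON sc.status_id = s.id
    WHERE sc.id IN (
        -- a change is "latest" if its date equals the max date among its candidate's changes
        SELECT x.id
        FROM status_change AS x
        JOIN status_change AS y ON y.candidate_id = x.candidate_id
        GROUP BY x.id
        HAVING x.created_at = MAX(y.created_at)
    )
    GROUP BY s.id, s.name
    ORDER BY s.id
    """
).fetchall()
print(rows)  # [('Review', 2), ('Accepted', 1), ('Rejected', 1)]
```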
I am not very familiar with advanced Django ORM queries.
The problem is:
select the rows with the maximum value of one column, distinct by another column, ordered by date (descending).
I found an answer on Stack Overflow, see this.
The answer in MySQL is this (from the URL above):
SELECT tt.*
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
     FROM topten
     GROUP BY home) groupedtt
ON tt.home = groupedtt.home
AND tt.datetime = groupedtt.MaxDateTime
What is the equivalent Django ORM query?
My model is:
class ModelStatus(models.Model):
    name = models.CharField(max_length=100, default='---')
    status = models.CharField(max_length=10, default='---')
    fall = models.CharField(max_length=100, default='---')
    rise = models.CharField(max_length=100, default='---')
    add_date = models.DateTimeField(auto_now_add=False, auto_now=True)
    error_message = models.TextField(default=' ')
    # on_delete is required since Django 2.0; CASCADE is an assumed choice
    to = models.ForeignKey('model1',
                           on_delete=models.CASCADE,
                           related_name="status",
                           related_query_name="hstatus")
distinct column: name
max value column: add_date
EXAMPLE
id | name | add_date   | status | fall_count | rise_count
---|------|------------|--------|------------|-----------
1  | ab   | 04/03/2009 | up     | 399        | 100
2  | aa   | 04/03/2009 | down   | 244        | 200
3  | aa   | 03/03/2009 | down   | 555        | 210
4  | ba   | 03/03/2009 | up     | 300        | 256
5  | ab   | 03/03/2009 | up     | 200        | 145
OUTPUT
id | name | add_date   | status | fall_count | rise_count
---|------|------------|--------|------------|-----------
1  | ab   | 04/03/2009 | up     | 399        | 100
2  | aa   | 04/03/2009 | down   | 244        | 200
4  | ba   | 03/03/2009 | up     | 300        | 256
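Before translating it to the ORM, the groupwise-maximum SQL from the linked answer can be checked against this sample with Python's sqlite3. The table name model_status is an assumption, and the dates are rewritten as ISO strings (reading 04/03/2009 as 4 March 2009, also an assumption) so that MAX() and ORDER BY compare them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE model_status (
        id INTEGER PRIMARY KEY, name TEXT, add_date TEXT, status TEXT,
        fall_count INTEGER, rise_count INTEGER
    );
    INSERT INTO model_status VALUES
        (1, 'ab', '2009-03-04', 'up', 399, 100),
        (2, 'aa', '2009-03-04', 'down', 244, 200),
        (3, 'aa', '2009-03-03', 'down', 555, 210),
        (4, 'ba', '2009-03-03', 'up', 300, 256),
        (5, 'ab', '2009-03-03', 'up', 200, 145);
    """
)

# Join each row to its name's latest date and keep only the rows that match it
rows = conn.execute(
    """
    SELECT tt.id, tt.name, tt.add_date
    FROM model_status AS tt
    INNER JOIN (
        SELECT name, MAX(add_date) AS max_date
        FROM model_status
        GROUP BY name
    ) AS grouped ON tt.name = grouped.name AND tt.add_date = grouped.max_date
    ORDER BY tt.add_date DESC, tt.id
    """
).fetchall()
print(rows)  # [(1, 'ab', '2009-03-04'), (2, 'aa', '2009-03-04'), (4, 'ba', '2009-03-03')]
```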