Nested query in Django ORM - python

I'm trying to remove duplicate rows with distinct or annotate and then count what is left per group (MySQL 8, Django 2.2).
The SQL I want:
select t1.cp_id, count(*) from (
SELECT user_id, product_id, cp_id, count(*) as cnt FROM A
where created_at between '2021-07-12' and '2021-07-13'
group by user_id, product_id, cp_id
) as t1
group by t1.cp_id
My queryset:
A.objects.filter(
created_at__gte='2021-07-12',
created_at__lt='2021-07-13'
).values('cp_id', 'user_id', 'product_id').annotate(cnt=Count('cp_id')).values('cp_id').annotate(count=Count('cp_id'))
The SQL this queryset produces:
SELECT A.cp_id, COUNT(cp_id) AS count FROM A
WHERE (
created_at >= 2021-07-11 19:30:00 AND
created_at < 2021-07-12 19:30:00
)
GROUP BY cp_id, created_at
ORDER BY created_at ASC
I'm confused about why Django ignores the order of my calls and applies the last values()/annotate() to the original table rather than to the de-duplicated result.
I'd appreciate a solution or idea for collapsing rows that share the same 'cp_id', 'user_id', 'product_id' (in the query only, not in the database) and then counting the remaining rows per cp_id.
My models.py:
class A(models.Model):
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    cp = models.ForeignKey(CP, on_delete=models.CASCADE)
    created_at = models.DateTimeField()

Try this query:
from django.db.models import Count
A.objects.filter(
    created_at__range=['2021-07-12', '2021-07-14'],
).annotate(cp_count=Count('cp_id')).values('cp_count', 'user_id', 'product_id', 'created_at').order_by('created_at')
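For reference, here is a minimal sketch of the result the nested SQL in the question is after, run against an in-memory SQLite table instead of MySQL. The sample rows and the helper names (`ROWS`, `count_with_sql`, `count_in_python`) are made up for illustration; the point is only that each (user_id, product_id, cp_id) triple is counted once per cp_id, which a pure-Python set makes easy to cross-check.

```python
import sqlite3
from collections import Counter

# Hypothetical sample rows: (user_id, product_id, cp_id, created_at).
ROWS = [
    (1, 10, 100, "2021-07-12 08:00:00"),
    (1, 10, 100, "2021-07-12 09:00:00"),  # same triple again, must count once
    (2, 10, 100, "2021-07-12 10:00:00"),
    (2, 11, 200, "2021-07-12 11:00:00"),
    (3, 11, 200, "2021-07-13 11:00:00"),  # outside the date window
]

# Same shape as the SQL in the question: de-duplicate in the inner query,
# then count the surviving rows per cp_id in the outer query.
NESTED_SQL = """
SELECT t1.cp_id, COUNT(*) AS cnt
FROM (
    SELECT user_id, product_id, cp_id
    FROM a
    WHERE created_at >= ? AND created_at < ?
    GROUP BY user_id, product_id, cp_id
) AS t1
GROUP BY t1.cp_id
"""


def count_with_sql(rows, start, end):
    """Run the nested query on an in-memory SQLite copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE a (user_id INTEGER, product_id INTEGER, "
        "cp_id INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO a VALUES (?, ?, ?, ?)", rows)
    result = dict(conn.execute(NESTED_SQL, (start, end)).fetchall())
    conn.close()
    return result


def count_in_python(rows, start, end):
    """Cross-check: build the set of distinct triples, then count per cp_id."""
    triples = {(u, p, c) for u, p, c, ts in rows if start <= ts < end}
    return dict(Counter(c for _, _, c in triples))


print(count_with_sql(ROWS, "2021-07-12", "2021-07-13"))
print(count_in_python(ROWS, "2021-07-12", "2021-07-13"))
```

Whatever queryset you end up with, comparing its output against a small check like this is a quick way to see whether the de-duplication step actually happened.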

Related

Inner joins with timestamps and no foreign key in Django ORM

Django documentation states to check the ORM before writing raw SQL, but I am not aware of a resource that explains how to perform inner joins between tables without foreign keys, something that would be relatively simple to execute in a few lines of SQL.
Many of the tables in my database need to be joined by time-related properties or intervals. For example, in this basic example I need to first take a subset of change_events, and then join a different table ais_data on two separate columns:
models.py
class AisData(models.Model):
    navstatus = models.BigIntegerField(blank=True, null=True)
    start_ts = models.DateTimeField()
    end_ts = models.DateTimeField(blank=True, null=True)
    destination = models.TextField(blank=True, null=True)
    draught = models.FloatField(blank=True, null=True)
    geom = models.PointField(srid=4326)
    imo = models.BigIntegerField(primary_key=True)

    class Meta:
        managed = False
        db_table = 'ais_data'
        unique_together = (('imo', 'start_ts'),)


class ChangeEvent(models.Model):
    event_id = models.BigIntegerField(primary_key=True)
    imo = models.BigIntegerField(blank=True, null=True)
    timestamp = models.DateTimeField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'change_event'
First I take a subset of change_events which returns a QuerySet object. Using the result of this query, I need to get all of the records in ais_data that match on imo and timestamp - so the raw SQL would look exactly like this:
WITH filtered_change_events AS (
SELECT * FROM change_events WHERE timestamp BETWEEN now() - interval '1' day and now()
)
SELECT fce.*, ad.geom FROM filtered_change_events fce JOIN ais_data ad ON fce.imo = ad.imo AND fce.timestamp = ad.start_ts
views.py
from .models import ChangeEvent
from .models import AisData
from datetime import datetime, timedelta
start_period = datetime.now() - timedelta(hours=24)
end_period = datetime.now()
subset_change_events = ChangeEvent.objects.filter(timestamp__gte=start_period,
                                                  timestamp__lte=end_period)
#inner join
subset_change_events.filter()?
How would one write this relatively simple query in the Django ORM? I am finding it difficult to make a simple inner join on two columns without using a foreign key. Any advice or links to resources would be helpful.
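To pin down what the join should return, here is a small sketch of the raw SQL above run against in-memory SQLite. The table contents, the fixed time window (instead of now()) and the text `geom` column are assumptions made only so the example is self-contained; they are not the real schema. In the ORM, a match on two columns without a foreign key is often written as a correlated subquery with Subquery and OuterRef, but that part is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_event (event_id INTEGER PRIMARY KEY, imo INTEGER, timestamp TEXT);
CREATE TABLE ais_data (imo INTEGER, start_ts TEXT, geom TEXT);
INSERT INTO change_event VALUES
    (1, 9000001, '2021-07-12 10:00:00'),
    (2, 9000001, '2021-07-10 10:00:00'),
    (3, 9000002, '2021-07-12 11:00:00');
INSERT INTO ais_data VALUES
    (9000001, '2021-07-12 10:00:00', 'POINT(1 1)'),
    (9000001, '2021-07-10 10:00:00', 'POINT(2 2)'),
    (9000002, '2021-07-12 09:00:00', 'POINT(3 3)');
""")

# Filter the events first (the CTE), then join on BOTH imo and timestamp.
JOIN_SQL = """
WITH filtered_change_events AS (
    SELECT * FROM change_event WHERE timestamp BETWEEN ? AND ?
)
SELECT fce.event_id, fce.imo, fce.timestamp, ad.geom
FROM filtered_change_events AS fce
JOIN ais_data AS ad
  ON fce.imo = ad.imo AND fce.timestamp = ad.start_ts
ORDER BY fce.event_id
"""


def join_events(connection, start, end):
    """Return change events inside [start, end] with their matching geom."""
    return connection.execute(JOIN_SQL, (start, end)).fetchall()


matches = join_events(conn, "2021-07-12 00:00:00", "2021-07-13 00:00:00")
print(matches)
```

Event 2 falls outside the window and event 3 matches on imo but not on the timestamp, so only event 1 should come back.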

rawsql equivalent django queryset

I would like to write a Django queryset equivalent to the query below that hits the database only once. Right now I am using Manager.raw() to execute it.
With annotate I can generate the inner query, but I can't use it in the filter condition (when I check queryset.query, it looks like Ex1).
select *
from table1
where (company_id, year) in (select company_id, max(year) year
from table1
where company_id=3
and total_employees is not null
group by company_id);
Ex1:
SELECT `table1`.`company_id`, `table1`.`total_employees`
FROM `table1`
WHERE `table1`.`id` = (SELECT U0.`company_id` AS Col1, MAX(U0.`year`) AS `year`
FROM `table1` U0
WHERE NOT (U0.`total_employees` IS NULL)
GROUP BY U0.`company_id`
ORDER BY NULL)
Model:
class Table1(models.Model):
    year = models.IntegerField(null=False, validators=[validate_not_null])
    total_employees = models.FloatField(null=True, blank=True)
    company = models.ForeignKey('Company', on_delete=models.CASCADE, related_name='dummy_relation')
    last_modified = models.DateTimeField(auto_now=True)
    updated_by = models.CharField(max_length=100, null=False, default="research")

    class Meta:
        unique_together = ('company', 'year',)
I appreciate your response.
You can use OuterRef and Subquery to achieve it. Try it like this:
from django.db.models import OuterRef, Subquery
newest = Table1.objects.filter(company=OuterRef('pk'), total_employees__isnull=False).order_by('-year')
companies = Company.objects.annotate(total_employees=Subquery(newest.values('total_employees')[:1])).annotate(max_year=Subquery(newest.values('year')[:1]))
# Querysets are lazy: nothing runs until companies is evaluated, so the DB is hit once
Show values:
# all values
companies.values('id', 'total_employees', 'max_year')
# company three values
company_three_values = companies.filter(id=3).values('id', 'total_employees', 'max_year')
Filter on Max Year:
companies_max = companies.filter(max_year__gte=2018)
FYI: OuterRef and Subquery are available from Django 1.11.
If your model is named Table1, try this:
Table1.objects.get(pk=Table1.objects.filter(company_id=3, total_employees__isnull=False).latest('year').id)
Note that latest() already returns the row itself, so the outer get() is a second query rather than one hit on the DB.
latest() also raises DoesNotExist if nothing matches. This is safer:
filter_item = Table1.objects.filter(company_id=3, total_employees__isnull=False).order_by('-year').first()
if filter_item:
    return Table1.objects.get(pk=filter_item.id)
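As a sanity check on what any of these querysets should return, here is a sketch of the original SQL run against in-memory SQLite. The sample rows and the `latest_row` helper are invented for the example, and the `(company_id, year) IN (...)` condition is rewritten as an equivalent join on the grouped subquery so it does not depend on row-value support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, company_id INTEGER, "
    "year INTEGER, total_employees REAL)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, 3, 2017, 50.0),
        (2, 3, 2018, 60.0),
        (3, 3, 2019, None),  # newest year, but total_employees is NULL
        (4, 4, 2019, 10.0),
    ],
)

# Inner query: newest year with a non-NULL total_employees for the company.
# Outer query: the full row for that (company_id, year) pair.
LATEST_SQL = """
SELECT t.company_id, t.year, t.total_employees
FROM table1 AS t
JOIN (
    SELECT company_id, MAX(year) AS year
    FROM table1
    WHERE company_id = ? AND total_employees IS NOT NULL
    GROUP BY company_id
) AS latest
  ON latest.company_id = t.company_id AND latest.year = t.year
"""


def latest_row(connection, company_id):
    """Return the newest row with a known head count, or [] if none exists."""
    return connection.execute(LATEST_SQL, (company_id,)).fetchall()


print(latest_row(conn, 3))
```

For company 3 the 2019 row is skipped because its total_employees is NULL, so the 2018 row is the one to expect from the ORM version as well.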

Django query api: complex subquery

I have wasted a lot of time trying to compose this query. Here are my models:
class User(Dealer):
    pass


class Post(models.Model):
    text = models.CharField(max_length=500, default='')
    date = models.DateTimeField(default=timezone.now)
    interactions = models.ManyToManyField(User, through='UserPostInteraction', related_name='post_interaction')


class UserPostInteraction(models.Model):
    post = models.ForeignKey(Post, related_name='pppost')
    user = models.ForeignKey(User, related_name='uuuuser')
    status = models.SmallIntegerField()
    DISCARD = -1
    VIEWED = 0
    LIKED = 1
    DISLIKED = 2
What I need:
The subquery is: (UserPostInteraction rows with status = LIKED) - (UserPostInteraction rows with status = DISLIKED) for Post(OuterRef('pk')).
The query is: select all posts ordered by the value of that subquery.
I'm stuck on the error "Subquery returned multiple rows".
Help!
If I understand your needs correctly, you can get this with a queryset like:
from django.db.models import Case, Sum, When, IntegerField
posts = Post.objects.values('id', 'text', 'date').annotate(
    rate=Sum(Case(
        When(pppost__status=1, then=1),
        When(pppost__status=2, then=-1),
        default=0,
        output_field=IntegerField()
    ))
).order_by('rate')
On MySQL this translates to the following SQL:
SELECT
`yourapp_post`.`id`,
`yourapp_post`.`text`,
`yourapp_post`.`date`,
SUM(
CASE
WHEN `yourapp_userpostinteraction`.`status` = 1
THEN 1
WHEN `yourapp_userpostinteraction`.`status` = 2
THEN -1
ELSE 0
END) AS `rate`
FROM `yourapp_post`
LEFT OUTER JOIN `yourapp_userpostinteraction` ON (`yourapp_post`.`id` = `yourapp_userpostinteraction`.`post_id`)
GROUP BY `yourapp_post`.`id`
ORDER BY `rate` ASC
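To see what that SUM(CASE ...) query yields, here is a sketch that runs the same SQL shape on in-memory SQLite. The short table names (`post`, `interaction`) and the sample rows are assumptions for the example only; the status codes follow the model above (1 = LIKED, 2 = DISLIKED).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE interaction (post_id INTEGER, user_id INTEGER, status INTEGER);
INSERT INTO post VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO interaction VALUES
    (1, 1, 1), (1, 2, 1), (1, 3, 2),  -- post 1: two likes, one dislike
    (2, 1, 2), (2, 2, 0);             -- post 2: one dislike, one view
""")

# Likes add 1, dislikes subtract 1, everything else counts as 0.
# The LEFT JOIN keeps posts with no interactions (post 3) in the result.
RATE_SQL = """
SELECT p.id,
       SUM(CASE WHEN i.status = 1 THEN 1
                WHEN i.status = 2 THEN -1
                ELSE 0 END) AS rate
FROM post AS p
LEFT OUTER JOIN interaction AS i ON p.id = i.post_id
GROUP BY p.id
ORDER BY rate ASC
"""

rates = conn.execute(RATE_SQL).fetchall()
print(rates)
```

Because everything is aggregated in one grouped query, there is no per-post subquery that could return more than one row, which is what the original error was complaining about.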

django left outer join with condition

How do I put a condition in a LEFT OUTER JOIN in Django? I need the Django equivalent of the SQL query below.
Table 1
class LeadGroups(Audit):
    user = models.ForeignKey(User)
    group_name = models.CharField(max_length=250)
Table 2
class lead(Audit):
    group = models.ForeignKey(LeadGroups, null=True, blank=True)
    First_Name = models.CharField(max_length=255, null=True, blank=True)
    Last_Name = models.CharField(max_length=255, null=True, blank=True)
Required query
SELECT "lead_leadgroups"."id", "lead_leadgroups"."created_by_id", "lead_leadgroups"."modified_by_id",
       "lead_leadgroups"."created_at", "lead_leadgroups"."modified_at", "lead_leadgroups"."customer_id",
       "lead_leadgroups"."user_id", "lead_leadgroups"."group_name",
       COUNT("lead_lead"."id") AS "totlacontact"
FROM "lead_leadgroups"
LEFT OUTER JOIN "lead_lead"
    ON ("lead_leadgroups"."id" = "lead_lead"."group_id"
        AND "lead_lead"."data_value" = True)  -- the join condition I need
WHERE ("lead_leadgroups"."customer_id" = 309)
GROUP BY "lead_leadgroups"."id", "lead_leadgroups"."created_by_id", "lead_leadgroups"."modified_by_id",
         "lead_leadgroups"."created_at", "lead_leadgroups"."modified_at", "lead_leadgroups"."customer_id",
         "lead_leadgroups"."user_id", "lead_leadgroups"."group_name"
ORDER BY "lead_leadgroups"."id" ASC LIMIT 100
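It may help to spell out why the condition has to sit in the ON clause rather than in WHERE. The sketch below uses in-memory SQLite with made-up table names and rows (and 1/0 in place of True/False): with the condition in ON, groups without a matching lead still appear with a count of 0; with the same condition in WHERE, those groups disappear. On the Django side, conditional aggregation such as Count(..., filter=Q(...)) or FilteredRelation (both Django 2.0+, as far as I know) are the usual candidates, though which one fits depends on your version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lead_groups (id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE lead (id INTEGER PRIMARY KEY, group_id INTEGER, data_value INTEGER);
INSERT INTO lead_groups VALUES (1, 'g1'), (2, 'g2'), (3, 'g3');
INSERT INTO lead VALUES
    (1, 1, 1),   -- group 1, data_value true
    (2, 1, 0),   -- group 1, data_value false
    (3, 2, 0);   -- group 2, data_value false; group 3 has no leads
""")

# Condition inside the join: every group is kept, unmatched ones count 0.
ON_SQL = """
SELECT g.id, COUNT(l.id) AS total_contact
FROM lead_groups AS g
LEFT OUTER JOIN lead AS l ON g.id = l.group_id AND l.data_value = 1
GROUP BY g.id
ORDER BY g.id
"""

# Same condition in WHERE: rows with NULL or false data_value are filtered
# out after the join, so groups 2 and 3 vanish from the result.
WHERE_SQL = """
SELECT g.id, COUNT(l.id) AS total_contact
FROM lead_groups AS g
LEFT OUTER JOIN lead AS l ON g.id = l.group_id
WHERE l.data_value = 1
GROUP BY g.id
ORDER BY g.id
"""

on_counts = conn.execute(ON_SQL).fetchall()
where_counts = conn.execute(WHERE_SQL).fetchall()
print(on_counts)
print(where_counts)
```

A plain .filter(lead__data_value=True) before the annotate typically ends up as the WHERE variant, which is why it does not give the zero-count groups.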

Erroneous group_by query generated in python django

I am using Django==1.8.7 and I have the following models
# a model in users.py
class User(models.Model):
    id = models.AutoField(primary_key=True)
    username = models.CharField(max_length=100, blank=True)
    displayname = models.CharField(max_length=100, blank=True)
    # other fields deleted

# a model in healthrepo.py
class Report(models.Model):
    id = models.AutoField(primary_key=True)
    uploaded_by = models.ForeignKey(User, related_name='uploads',
                                    db_index=True)
    owner = models.ForeignKey(User, related_name='reports', db_index=True)
    # other fields like dateofreport, deleted
I use the following Django queryset:
Report.objects.filter(owner__id=1).values('uploaded_by__username',
                                          'uploaded_by__displayname').annotate(
    total=Count('uploaded_by__username')
)
I see that this generates the following query:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname", "healthrepo_report"."dateofreport", "healthrepo_report"."owner_id", "healthrepo_report"."uploaded_by_id"
ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
However, what I really wanted was just grouping based on "healthrepo_report"."owner_id" and not on multiple fields, i.e. what I wanted was:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname" ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
I am wondering why this is happening and how I can group on just the selected columns.
I just saw this post:
Django annotate and values(): extra field in 'group by' causes unexpected results
Changing the query by adding an empty order_by() fixes it: the ordering columns were being added to the GROUP BY, and order_by() with no arguments clears them.
Report.objects.filter(owner__id=1).values('uploaded_by__username',
                                          'uploaded_by__displayname').annotate(
    total=Count('uploaded_by__username')
).order_by()
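To make the effect of the extra GROUP BY columns concrete, here is a sketch on in-memory SQLite. The single `report` table with a plain-text `uploaded_by` column and the sample rows are simplifications invented for the example, not the real schema. Grouping by the uploader alone gives one total per user, while adding `dateofreport` to the GROUP BY (which is what the ordering did) splits each user into one row per date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report (id INTEGER PRIMARY KEY, owner_id INTEGER, "
    "uploaded_by TEXT, dateofreport TEXT)"
)
conn.executemany(
    "INSERT INTO report (owner_id, uploaded_by, dateofreport) VALUES (?, ?, ?)",
    [
        (1, "alice", "2021-01-01"),
        (1, "alice", "2021-01-02"),
        (1, "bob", "2021-01-01"),
    ],
)

# What was wanted: one row per uploader.
by_user = conn.execute("""
    SELECT uploaded_by, COUNT(uploaded_by) AS total
    FROM report
    WHERE owner_id = 1
    GROUP BY uploaded_by
    ORDER BY uploaded_by
""").fetchall()

# What the ordering caused: the date joins the GROUP BY and splits the groups.
by_user_and_date = conn.execute("""
    SELECT uploaded_by, COUNT(uploaded_by) AS total
    FROM report
    WHERE owner_id = 1
    GROUP BY uploaded_by, dateofreport
    ORDER BY uploaded_by, dateofreport
""").fetchall()

print(by_user)
print(by_user_and_date)
```

If the totals from your queryset look like the second result, printing queryset.query and checking the GROUP BY clause for ordering columns is a reasonable first step.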
