Django use LEFT JOIN instead of INNER JOIN - python

I have two models: Comments and CommentFlags
class Comments(models.Model):
    content_type = models.ForeignKey(ContentType,
                                     verbose_name=_('content type'),
                                     related_name="content_type_set_for_%(class)s",
                                     on_delete=models.CASCADE)
    object_pk = models.CharField(_('object ID'), db_index=True, max_length=64)
    content_object = GenericForeignKey(ct_field="content_type", fk_field="object_pk")
    submit_date = models.DateTimeField(_('date/time submitted'), default=None, db_index=True)
    ...
...
class CommentFlags(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="comment_flags",
                             on_delete=models.CASCADE)
    comment = models.ForeignKey(Comment, related_name="flags", on_delete=models.CASCADE)
    flag = models.CharField(max_length=30, db_index=True)
    ...
...
The flag field of CommentFlags can take values such as like, dislike, etc.
Problem statement: I want to get all Comments sorted by their number of likes, in descending order.
Raw Query for above problem statement:
SELECT
    cmnts.*, coalesce(cmnt_flgs.num_like, 0) as num_like
FROM
    comments cmnts
LEFT JOIN
    (
        SELECT
            comment_id, Count(comment_id) AS num_like
        FROM
            comment_flags
        WHERE
            flag='like'
        GROUP BY comment_id
    ) cmnt_flgs
ON
    cmnt_flgs.comment_id = cmnts.id
ORDER BY
    num_like DESC
I have not been able to convert the above query into a Django ORM queryset.
What I have tried so far:
>>> qs = (Comment.objects.filter(flags__flag='like').values('flags__comment_id')
...       .annotate(num_likes=Count('flags__comment_id')))
which generates a different query:
>>> print(qs.query)
SELECT "comment_flags"."comment_id",
       COUNT("comment_flags"."comment_id") AS "num_likes"
FROM "comments"
INNER JOIN "comment_flags"
    ON ("comments"."id" = "comment_flags"."comment_id")
WHERE "comment_flags"."flag" = 'like'
GROUP BY "comment_flags"."comment_id",
         "comments"."submit_date"
ORDER BY "comments"."submit_date" ASC
LIMIT 21
The problem with the above queryset is that it uses an INNER JOIN, and I also don't understand why it adds submit_date to the GROUP BY clause.
Can you please suggest a way to convert the raw query above into a Django ORM queryset?

You can try using the filter argument of Count:
from django.db.models import Count, Q
qs = (Comment.objects.all()
      .annotate(num_likes=Count('flags__comment_id', filter=Q(flags__flag='like'))))
It may produce a slightly different query from the one you're expecting, depending on the database backend, but it should have equivalent behavior.
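
To see why a filtered count can stand in for the LEFT JOIN onto a pre-aggregated subquery, here is a small sketch using the standard-library sqlite3 module. The table contents are made up for illustration, and the second query writes the conditional count with CASE WHEN by hand; as far as I know, the exact SQL Django emits for Count(..., filter=...) varies by backend, so treat this as a demonstration of the idea rather than of Django's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE comment_flags (
        id INTEGER PRIMARY KEY,
        comment_id INTEGER REFERENCES comments(id),
        flag TEXT
    );
    -- Hypothetical sample data: comment 3 has no likes at all.
    INSERT INTO comments (id, body) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO comment_flags (comment_id, flag) VALUES
        (1, 'like'), (1, 'like'), (1, 'dislike'),
        (2, 'like'),
        (3, 'dislike');
""")

# Shape of the raw query in the question: LEFT JOIN onto a grouped subquery.
subquery_sql = """
    SELECT c.id, COALESCE(f.n, 0) AS num_like
    FROM comments c
    LEFT JOIN (
        SELECT comment_id, COUNT(comment_id) AS n
        FROM comment_flags
        WHERE flag = 'like'
        GROUP BY comment_id
    ) f ON f.comment_id = c.id
    ORDER BY num_like DESC, c.id
"""

# Conditional aggregation: LEFT JOIN all flags, count only the 'like' rows.
conditional_sql = """
    SELECT c.id,
           COUNT(CASE WHEN f.flag = 'like' THEN f.comment_id END) AS num_like
    FROM comments c
    LEFT JOIN comment_flags f ON f.comment_id = c.id
    GROUP BY c.id
    ORDER BY num_like DESC, c.id
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
conditional_rows = conn.execute(conditional_sql).fetchall()
print(subquery_rows)
print(conditional_rows)
```

Both queries keep the comment that has no likes, which is the behavior the INNER JOIN version in the question loses.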

Related

Using annotate and distinct(field) together in Django

I've got a bunch of reviews in my app. Users are able to "like" reviews.
I'm trying to get the most liked reviews. However, there are some popular users on the app, and all their reviews have the most likes. I want to only select one review (ideally the most liked one) per user.
Here are my objects,
class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)
class LikeReview(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='likereview_user', db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, related_name='likereview_review', db_index=True)
    date_added = models.DateTimeField()
    class Meta:
        unique_together = [['user', 'review']]
And here's what I currently have to get the most liked reviews:
reviews = Review.objects.filter().annotate(
    num_likes=Count('likereview_review')
).order_by('-num_likes').distinct()
As you can see, the reviews I get are sorted by number of likes, but it's possible that the top liked reviews are all by the same user. I want to add distinct('user') here, but I get the error "annotate() + distinct(fields) is not implemented."
How can I accomplish this?
This will be a bit hard to read because of your related names. I would suggest changing the related_name of Review.user to reviews, which makes this much easier to understand; I elaborate on that in the second part of the answer.
With your current setup, I managed to do it fully in the DB using subqueries:
from django.db.models import Subquery, OuterRef, Count
# No DB Queries
best_reviews_per_user = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(user=OuterRef('id'))
# No DB Queries
review_sq = Subquery(best_reviews_per_user.values('review_id')[:1])
# First DB Query
best_review_ids = User.objects.all()\
    .annotate(best_review_id=review_sq)\
    .values_list('best_review_id', flat=True)
# Second DB Query
best_reviews = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(review_id__in=best_review_ids)\
    .exclude(num_likes=0)  # I assume this is the case
# Print it
for review in best_reviews:
    print(review, review.num_likes, review.user)
# Test it
assert len({review.user for review in best_reviews}) == len(best_reviews)
assert sorted([r.num_likes for r in best_reviews], reverse=True) == [r.num_likes for r in best_reviews]
assert all([r.num_likes for r in best_reviews])
Let's try with this completely equivalent model structure:
from django.db import models
from django.utils import timezone
class TimestampedModel(models.Model):
    """This makes your life much easier and is pretty DRY"""
    created = models.DateTimeField(default=timezone.now)
    class Meta:
        abstract = True
class Review(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='reviews', db_index=True)
    text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField()
    likes = models.ManyToManyField(User, through='ReviewLike')
class ReviewLike(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, db_index=True)
The likes are a clear m2m relationship between reviews and users, with an extra timestamp column, which is a model use case for a through model (see the Django documentation on the through option of ManyToManyField).
Now everything is, imho, much easier to read.
from django.db.models import OuterRef, Count, Subquery
# No DB Queries
best_reviews = Review.objects.all()\
    .annotate(like_count=Count('likes'))\
    .exclude(like_count=0)\
    .order_by('-like_count')
# No DB Queries
sq = Subquery(best_reviews.filter(user=OuterRef('id')).values('id')[:1])
# First DB Query
user_distinct_best_review_ids = User.objects.all()\
    .annotate(best_review=sq)\
    .values_list('best_review', flat=True)
# Second DB Query
best_reviews = best_reviews.filter(id__in=user_distinct_best_review_ids).all()
One way of doing it is as follows:
1. Get a list of tuples that represent the user.id and review.id, ordered by user and number of likes ASCENDING
2. Convert the list to a dict to remove duplicate user.ids. Later items replace earlier ones, which is why the ordering in step 1 is important
3. Create a list of review.ids from the values in the dict
4. Get a queryset using the list of review.ids, ordered by the number of likes DESCENDING
from django.db.models import Count
user_review_list = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('user', 'num_likes')\
    .values_list('user', 'pk')
user_review_dict = dict(user_review_list)
review_pk_list = list(user_review_dict.values())
reviews = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .filter(pk__in=review_pk_list)\
    .order_by('-num_likes')
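
The dict trick in step 2 is easy to check without a database. A minimal sketch in plain Python, where the (user_id, review_pk, num_likes) rows are invented stand-ins for what the annotated queryset would return:

```python
# Hypothetical (user_id, review_pk, num_likes) rows, in no particular order.
rows = [
    (1, 10, 5),
    (1, 11, 9),
    (2, 20, 3),
    (1, 12, 1),
    (2, 21, 12),
]

# Step 1: order by user, then by likes ascending, and keep (user, pk) pairs.
pairs = [(user, pk) for user, pk, likes in sorted(rows, key=lambda r: (r[0], r[2]))]

# Step 2: dict() keeps the last value seen for each key, so each user ends up
# mapped to the review that sorted last, i.e. the one with the most likes.
best_per_user = dict(pairs)

# Step 3: the surviving review pks.
best_pks = list(best_per_user.values())

# Step 4: order those reviews by likes, descending.
likes_by_pk = {pk: likes for _, pk, likes in rows}
best_sorted = sorted(best_pks, key=lambda pk: likes_by_pk[pk], reverse=True)

print(best_per_user)
print(best_sorted)
```

Note that this approach pulls one (user, pk) pair per review into Python, so it is probably best suited to tables of moderate size.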

rawsql equivalent django queryset

I would like to write a Django queryset that is equivalent to the query below and hits the database only once. Right now I am using manager.raw() to execute it.
With annotate, I can generate the inner query, but I can't use it in the filter condition (when I checked queryset.query, it looked like Ex1 below).
select *
from table1
where (company_id, year) in (select company_id, max(year) year
                             from table1
                             where company_id=3
                               and total_employees is not null
                             group by company_id);
Ex1:
SELECT `table1`.`company_id`, `table1`.`total_employees`
FROM `table1`
WHERE `table1`.`id` = (SELECT U0.`company_id` AS Col1, MAX(U0.`year`) AS `year`
                       FROM `table1` U0
                       WHERE NOT (U0.`total_employees` IS NULL)
                       GROUP BY U0.`company_id`
                       ORDER BY NULL)
Model:
class Table1(models.Model):
    year = models.IntegerField(null=False, validators=[validate_not_null])
    total_employees = models.FloatField(null=True, blank=True)
    company = models.ForeignKey('Company', on_delete=models.CASCADE, related_name='dummy_relation')
    last_modified = models.DateTimeField(auto_now=True)
    updated_by = models.CharField(max_length=100, null=False, default="research")
    class Meta:
        unique_together = ('company', 'year',)
I appreciate your response.
You can use OuterRef and Subquery to achieve it. Try something like this:
from django.db.models import OuterRef, Subquery
newest = Table1.objects.filter(company=OuterRef('pk'), total_employees__isnull=False).order_by('-year')
companies = (Company.objects
             .annotate(total_employees=Subquery(newest.values('total_employees')[:1]))
             .annotate(max_year=Subquery(newest.values('year')[:1])))
# Querysets are lazy: nothing runs until companies is evaluated, so the DB gets hit once
Show values:
# all values
companies.values('id', 'total_employees', 'max_year')
# company three values
company_three_values = companies.filter(id=3).values('id', 'total_employees', 'max_year')
Filter on Max Year:
companies_max = companies.filter(max_year__gte=2018)
FYI: OuterRef and Subquery are available in Django from version 1.11.
If your model is named Table1, try this:
Table1.objects.filter(company_id=3, total_employees__isnull=False).latest('year')
This should be a single hit on the database. Note that latest() returns a model instance and raises Table1.DoesNotExist if nothing matches.
If it is possible that nothing matches, order_by() with first() is safer, because first() returns None instead of raising:
filter_item = Table1.objects.filter(company_id=3, total_employees__isnull=False).order_by('-year').first()
if filter_item:
    return filter_item
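
For a single company, "latest year with a non-null total_employees" can be expressed as ORDER BY year DESC LIMIT 1, which is roughly what an ordered first() asks the database for. A small sketch with the standard-library sqlite3 module compares that with the tuple-IN query from the question; the rows are invented, and the first query relies on SQLite's row-value support (SQLite 3.15 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL,
        year INTEGER NOT NULL,
        total_employees REAL,
        UNIQUE (company_id, year)
    );
    -- Hypothetical data: the newest row for company 3 has no employee count.
    INSERT INTO table1 (company_id, year, total_employees) VALUES
        (3, 2016, 120.0),
        (3, 2017, 150.0),
        (3, 2018, NULL),
        (4, 2018, 40.0);
""")

# The question's query: match (company_id, year) against the grouped maximum.
tuple_in_rows = conn.execute("""
    SELECT company_id, year, total_employees
    FROM table1
    WHERE (company_id, year) IN (
        SELECT company_id, MAX(year)
        FROM table1
        WHERE company_id = 3 AND total_employees IS NOT NULL
        GROUP BY company_id
    )
""").fetchall()

# Same filter, newest year first, keep one row.
latest_row = conn.execute("""
    SELECT company_id, year, total_employees
    FROM table1
    WHERE company_id = 3 AND total_employees IS NOT NULL
    ORDER BY year DESC
    LIMIT 1
""").fetchone()

print(tuple_in_rows)
print(latest_row)
```

The two forms agree here because (company_id, year) is unique; across several companies at once, the Subquery/OuterRef approach is the one that generalizes.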

Sub-query to make use of different distinct & orderby

I need to use different order_by & distinct values, and I have made an attempt using a subquery.
How can I achieve this?
Could a queryset select the Products I want, and then a separate query select the 15 Variations whose price I want to display?
In other words: a queryset randomly selects product IDs, then Python tells it to return a queryset of just those 15 items.
Speeding up the query is important too, as it takes ~800 ms when I order by the pk, or 5.8 seconds when I use order_by('?').
My attempt:
distinct_qs = (
    Product.objects
    .distinct('id')
)
qset = (
    Product.objects
    .filter(pk__in=distinct_qs)
    .order_by('rating', '?')
    .values('name', 'image',)
    .annotate(
        price=F('variation__price__price'),
        id=F('pk'),
        vari=F('variation'),
    )[:15]
)
Sample of output data:
{"name":"Test Item","vari":10, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Test Item","vari":11, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Another one","vari":14, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"Another one","vari":15, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"And Again","vari":17, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
{"name":"And Again","vari":18, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
Desired output data:
{"name":"Test Item","vari":13, "id":1, "price":"80", "image":"xyz.com/1.jpg"},
{"name":"Another one","vari":14, "id":2, "price":"10", "image":"xyz.com/2.jpg"},
{"name":"And Again","vari":17, "id":3, "price":"12", "image":"xyz.com/3.jpg"},
Sample of models.py
class Product(models.Model):
    name = models.CharField("Name", max_length=400)
    ...
class Variation(models.Model):
    product = models.ForeignKey(Product, db_index=True, blank=False, null=False)
    ...
class Image(models.Model):
    variation = models.ForeignKey(Variation, blank=False, null=False)
    image = models.URLField(max_length=540, blank=True, null=True)
class Price(models.Model):
    price = models.DecimalField("Price", decimal_places=2, max_digits=10)
    variation = models.ForeignKey(Variation, blank=False, null=False)
I think you should write a custom model manager (see https://docs.djangoproject.com/en/1.9/topics/db/managers/ ) and add a method to it that returns the variations, instead of using a standard query.
For randomising, you could do the following:
look up the last id of Variation (or Product), generate 15 distinct random ids within that range, and then fetch just the objects with those ids from the database. I think that should be faster.
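
A rough sketch of the id-sampling step in plain Python. The function name random_ids and its parameters are made up for illustration; last_id would come from the database (for example, the highest primary key of Product).

```python
import random

def random_ids(last_id, count=15, rng=random):
    """Return up to `count` distinct ids drawn uniformly from 1..last_id.

    `rng` can be the random module or a random.Random instance
    (handy for reproducible tests).
    """
    if last_id < 1:
        return []
    # random.sample never repeats a value, so the ids are distinct.
    return rng.sample(range(1, last_id + 1), min(count, last_id))

# Assumed value for illustration: pretend the last id in the table is 1000.
ids = random_ids(last_id=1000, count=15, rng=random.Random(42))
print(ids)
```

The ids would then go into something like filter(pk__in=ids). Since deleted rows leave gaps in the id sequence, fewer than 15 objects may come back, so it can make sense to draw a few more ids than needed and slice the result.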

Erroneous group_by query generated in python django

I am using Django==1.8.7 and I have the following models
# a model in users.py
class User(models.Model):
    id = models.AutoField(primary_key=True)
    username = models.CharField(max_length=100, blank=True)
    displayname = models.CharField(max_length=100, blank=True)
    # other fields deleted
# a model in healthrepo.py
class Report(models.Model):
    id = models.AutoField(primary_key=True)
    uploaded_by = models.ForeignKey(User, related_name='uploads',
                                    db_index=True)
    owner = models.ForeignKey(User, related_name='reports', db_index=True)
    # other fields like dateofreport, deleted
I use the following Django queryset:
Report.objects.filter(owner__id=1).values('uploaded_by__username',
                                          'uploaded_by__displayname').annotate(
    total=Count('uploaded_by__username')
)
I see that this generates the following query:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname", "healthrepo_report"."dateofreport", "healthrepo_report"."owner_id", "healthrepo_report"."uploaded_by_id"
ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
However, what I really wanted was just grouping based on "healthrepo_report"."owner_id" and not multiple fields. i.e. What I wanted was:
SELECT T3."username", T3."displayname", COUNT(T3."username") AS "total" FROM "healthrepo_report"
INNER JOIN "users_user" T3 ON ( "healthrepo_report"."uploaded_by_id" = T3."id" )
WHERE "healthrepo_report"."owner_id" = 1
GROUP BY T3."username", T3."displayname" ORDER BY "healthrepo_report"."dateofreport" DESC, "healthrepo_report"."user_id" ASC, "healthrepo_report"."uploaded_by_id" ASC
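I am wondering why this is happening and how I can group on just the columns I selected.
I just saw this post:
Django annotate and values(): extra field in 'group by' causes unexpected results
The extra GROUP BY columns come from the ordering applied to the queryset (here presumably the model's default Meta.ordering). Adding an empty order_by() to the query clears that ordering and fixes it: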
Report.objects.filter(owner__id=1).values('uploaded_by__username',
                                          'uploaded_by__displayname').annotate(
    total=Count('uploaded_by__username')
).order_by()
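
To illustrate why the extra GROUP BY columns matter, here is a small sketch with the standard-library sqlite3 module. The tables are cut-down, hypothetical versions of the ones above with invented rows; the only difference between the two runs is whether an ordering column (dateofreport) is part of GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE healthrepo_report (
        id INTEGER PRIMARY KEY,
        uploaded_by_id INTEGER REFERENCES users_user(id),
        owner_id INTEGER REFERENCES users_user(id),
        dateofreport TEXT
    );
    -- Hypothetical data: alice uploaded three reports, bob one.
    INSERT INTO users_user (id, username) VALUES (1, 'owner'), (2, 'alice'), (3, 'bob');
    INSERT INTO healthrepo_report (uploaded_by_id, owner_id, dateofreport) VALUES
        (2, 1, '2016-01-01'),
        (2, 1, '2016-01-02'),
        (2, 1, '2016-01-03'),
        (3, 1, '2016-01-01');
""")

base = """
    SELECT u.username, COUNT(u.username) AS total
    FROM healthrepo_report r
    INNER JOIN users_user u ON r.uploaded_by_id = u.id
    WHERE r.owner_id = 1
    GROUP BY {group_by}
    ORDER BY u.username, total
"""

# Ordering column leaks into GROUP BY: one group per (uploader, date).
with_ordering_columns = conn.execute(
    base.format(group_by="u.username, r.dateofreport")
).fetchall()

# Grouping only on the selected column: one group per uploader.
without_ordering_columns = conn.execute(
    base.format(group_by="u.username")
).fetchall()

print(with_ordering_columns)
print(without_ordering_columns)
```

With the date in GROUP BY, every report becomes its own group and each count is 1, which matches the unexpected results described in the question.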

How to perform this sql in django model?

SELECT *, SUM( cardtype.price - cardtype.cost ) AS profit
FROM user
LEFT OUTER JOIN card ON ( user.id = card.buyer_id )
LEFT OUTER JOIN cardtype ON ( card.cardtype_id = cardtype.id )
GROUP BY user.id
ORDER BY profit DESC
I tried this:
User.objects.extra(select=dict(profit='SUM(cardtype.price-cardtype.cost)')).annotate(sum=Sum('card__cardtype__price')).order_by('-profit')
But Django automatically added SUM( cardtype.price ) to the GROUP BY clause, and the SQL doesn't run.
Can this be done without raw SQL?
Here are the models; never mind the Chinese characters :)
class User(models.Model):
    class Meta:
        verbose_name = "用户"
        verbose_name_plural = "用户"
        ordering = ['-regtime']
    user_status = (
        ("normal", "正常"),
        ("deregistered", "注销"),
        ("locked", "锁定"),
    )
    name = models.CharField("姓名", max_length=20, db_index=True)
    spec_class = models.ForeignKey(SpecClass, verbose_name="专业班级")
    idcard = models.CharField("身份证号", max_length=18)
    mobileno = models.CharField("手机号", max_length=11)
    password = models.CharField("密码", max_length=50)  # plain
    address = models.CharField("住址", max_length=100)
    comment = models.TextField("备注")
    certserial = models.CharField("客户证书序列号", max_length=100)
    regtime = models.DateTimeField("注册时间", default=datetime.datetime.now)
    lastpaytime = models.DateTimeField("上次付款时间", default=datetime.datetime.now)
    credit = models.FloatField("信用额度", default=100)
    money = models.FloatField("余额", default=0)
    use_password = models.BooleanField("使用密码")
    use_fetion = models.BooleanField("接收飞信提示")
    status = models.CharField("账户状态", choices=user_status, default="normal", max_length=20, db_index=True)
    def __unicode__(self):
        return self.name
class CardType(models.Model):
    class Meta:
        verbose_name = "点卡类型"
        verbose_name_plural = "点卡类型"
        ordering = ['name']
    name = models.CharField("类型名称", max_length=20, db_index=True)
    note = models.CharField("说明", max_length=100)
    offcial = models.BooleanField("官方卡", default=True)
    available = models.BooleanField("可用", default=True, db_index=True)
    payurl = models.CharField("充值地址", max_length=200)
    price = models.FloatField("价格")
    cost = models.FloatField("进货价格")
    def __unicode__(self):
        return u"%s(%.2f元%s)" % (self.name, self.price, u", 平台卡" if not self.offcial else "")
    def profit(self):
        return self.price - self.cost
    profit.short_description = "利润"
class Card(models.Model):
    class Meta:
        verbose_name = "点卡"
        verbose_name_plural = "点卡"
        ordering = ['-createtime']
    card_status = (
        ("instock", "未上架"),
        ("available", "可用"),
        ("sold", "已购买"),
        ("invalid", "作废"),
        ("returned", "退卡"),  # sell to the same person !
        ("reselled", "退卡重新售出"),
    )
    cardtype = models.ForeignKey(CardType, verbose_name="点卡类型")
    serial = models.CharField("卡号", max_length=40)
    password = models.CharField("卡密", max_length=20)
    status = models.CharField("状态", choices=card_status, default="instock", max_length=20, db_index=True)
    createtime = models.DateTimeField("入库时间")
    buytime = models.DateTimeField("购买时间", blank=True, null=True)
    buyer = models.ForeignKey(User, blank=True, null=True, verbose_name="买家")
    def __unicode__(self):
        return u'%s[%s]' % (self.cardtype.name, self.serial)
First, one of the outer joins appears to be a bad idea for this kind of thing. Since you provided no information on your model, I can only guess.
Are you saying that you may not have a CARD for each user? That makes some sense.
Are you also saying that some cards don't have card types? That doesn't often make sense. You haven't provided any details. However, if a Card doesn't have a Card Type, I'll bet you have either problems elsewhere in your application, or you've chosen really poor names that don't provide the least clue as to what these things mean. You should fix the other parts of your application to assure that each card actually does have a card type. Or you should fix your names to be meaningful.
Clearly, the ORM statement uses inner joins and your SQL uses outer joins. What's the real question? How to do outer joins correctly?
If you take the time to search for [Django] and Left Outer Join, you'll see that the Raw SQL is a terrible idea.
Or is the real question how to do the sum correctly? From your own answer it appears that the SQL is wrong and you're really having trouble with the sum. If so, please clean up the SQL to be correct.
If the outer joins are part of the problem -- not just visual noise -- then you have to do something like this for an outer join with a sum.
def user_profit():
    for u in User.objects.all():
        # Card.cardtype is a foreign key, so skip cards that have no card type
        profit = sum(c.cardtype.price - c.cardtype.cost
                     for c in u.card_set.all()
                     if c.cardtype is not None)
        yield u, profit
In your view function, you can then provide the value of function to the template to render the report. Since it's a generator, no huge list is created in memory. If you need to paginate, you can provide the generator to the paginator and everything works out reasonably well.
This is often of comparable speed to a complex raw SQL query with a lot of outer joins.
If, indeed, the card to card-type relationship is not actually optional, then you can shorten this, somewhat. You still have an outer join to think about.
def user_profit():
    for u in User.objects.all():
        profit = sum(c.cardtype.price - c.cardtype.cost
                     for c in u.card_set.all())
        yield u, profit
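
The generator above yields users in whatever order the queryset returns them, while the SQL in the question orders by profit descending. A minimal sketch of adding that ordering in Python, using plain dataclasses as stand-ins for the Django models (the names, fields, and sample values here are assumptions for illustration, not the real models):

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class CardType:
    price: float
    cost: float

@dataclass
class Card:
    cardtype: Optional[CardType]

@dataclass
class User:
    name: str
    cards: List[Card] = field(default_factory=list)

def user_profit(users: List[User]) -> Iterator[Tuple[User, float]]:
    """Yield (user, profit) pairs; users without cards get a profit of 0."""
    for u in users:
        profit = sum(
            c.cardtype.price - c.cardtype.cost
            for c in u.cards
            if c.cardtype is not None
        )
        yield u, profit

# Hypothetical sample data: bob has bought no cards.
users = [
    User("alice", [Card(CardType(10.0, 7.0)), Card(CardType(30.0, 25.0))]),
    User("bob"),
    User("carol", [Card(CardType(100.0, 80.0))]),
]

# Equivalent of ORDER BY profit DESC, done in Python.
ranked = sorted(user_profit(users), key=lambda pair: pair[1], reverse=True)
names = [u.name for u, _ in ranked]
profits = [p for _, p in ranked]
print(list(zip(names, profits)))
```

Keep in mind that sorted() has to consume the whole generator, so the memory advantage of yielding lazily is lost once you need the results ordered by profit.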
Well, I found this
Sum computed column in Django QuerySet
Have to use raw SQL now...
Thank you two!
