Using annotate and distinct(field) together in Django - python

I've got a bunch of reviews in my app. Users are able to "like" reviews.
I'm trying to get the most liked reviews. However, there are some popular users on the app, and all their reviews have the most likes. I want to only select one review (ideally the most liked one) per user.
Here are my models:
class Review(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='review_user', db_index=True)
    review_text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField(
        validators=[
            MaxValueValidator(10),
            MinValueValidator(1),
        ],
    )
    date_added = models.DateTimeField(db_index=True)
    review_id = models.AutoField(primary_key=True, db_index=True)

class LikeReview(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='likereview_user', db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, related_name='likereview_review', db_index=True)
    date_added = models.DateTimeField()

    class Meta:
        unique_together = [['user', 'review']]
And here's what I currently have to get the most liked reviews:
reviews = Review.objects.filter().annotate(
    num_likes=Count('likereview_review')
).order_by('-num_likes').distinct()
As you can see, the reviews I get are sorted by the number of likes, but it's possible that the top liked reviews are all by the same user. I want to add distinct('user') here, but then I get the error: annotate() + distinct(fields) is not implemented.
How can I accomplish this?

This is a bit hard to read because of your related names. I would suggest changing the related_name of Review.user to reviews; it makes this much more understandable. I've elaborated on that in the second part of the answer.
With your current setup, I managed to do it fully in the DB using subqueries:
from django.db.models import Subquery, OuterRef, Count

# No DB Queries
best_reviews_per_user = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(user=OuterRef('id'))

# No DB Queries
review_sq = Subquery(best_reviews_per_user.values('review_id')[:1])

# First DB Query
best_review_ids = User.objects.all()\
    .annotate(best_review_id=review_sq)\
    .values_list('best_review_id', flat=True)

# Second DB Query
best_reviews = Review.objects.all()\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('-num_likes')\
    .filter(review_id__in=best_review_ids)\
    .exclude(num_likes=0)  # I assume this is the case

# Print it
for review in best_reviews:
    print(review, review.num_likes, review.user)

# Test it
assert len({review.user for review in best_reviews}) == len(best_reviews)
assert sorted([r.num_likes for r in best_reviews], reverse=True) == [r.num_likes for r in best_reviews]
assert all([r.num_likes for r in best_reviews])
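To see what the two querysets compute together, here is a rough, hand-written SQL equivalent run against an in-memory SQLite database. Treat it as a sketch under assumptions: the table and column names (auth_user, review, likereview) and the sample data are made up, and this is not the exact SQL Django generates. It does have the same shape, though: a correlated subquery that picks each user's most-liked review with LIMIT 1 (the [:1] slice), and an outer query that counts likes, drops zero-like reviews and sorts.

```python
import sqlite3

# Illustrative schema only: table/column names are assumptions, not Django's generated ones.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE review (review_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE likereview (id INTEGER PRIMARY KEY, user_id INTEGER, review_id INTEGER);
    INSERT INTO auth_user (id) VALUES (1), (2), (3);
    INSERT INTO review (review_id, user_id) VALUES (10, 1), (11, 1), (12, 1), (20, 2), (30, 3);
    """
)

# Made-up like counts: user 1 is the "popular" user with three well-liked reviews.
like_counts = {10: 5, 11: 4, 12: 3, 20: 2}  # review 30 (user 3) has no likes
conn.executemany(
    "INSERT INTO likereview (user_id, review_id) VALUES (?, ?)",
    [(100 + i, review_id) for review_id, n in like_counts.items() for i in range(n)],
)

QUERY = """
SELECT r.review_id, r.user_id, COUNT(l.id) AS num_likes
FROM review AS r
LEFT JOIN likereview AS l ON l.review_id = r.review_id
WHERE r.review_id IN (
    -- correlated subquery: each user's most-liked review (the [:1] slice)
    SELECT (
        SELECT r2.review_id
        FROM review AS r2
        LEFT JOIN likereview AS l2 ON l2.review_id = r2.review_id
        WHERE r2.user_id = u.id
        GROUP BY r2.review_id
        ORDER BY COUNT(l2.id) DESC
        LIMIT 1
    )
    FROM auth_user AS u
)
GROUP BY r.review_id, r.user_id
-- HAVING plays the role of .exclude(num_likes=0), ORDER BY of .order_by('-num_likes')
HAVING COUNT(l.id) > 0
ORDER BY num_likes DESC
"""

result = conn.execute(QUERY).fetchall()
print(result)  # one (review_id, user_id, num_likes) row per user
```

User 1 contributes only review 10 even though all three of their reviews outrank user 2's, and user 3's unliked review is filtered out. If a user's top reviews tie on likes, which one LIMIT 1 picks is not defined unless you add a tie-breaker to the ordering.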
Let's try the same thing with this equivalent model structure (with renamed fields and related names):
from django.db import models
from django.utils import timezone

class TimestampedModel(models.Model):
    """This makes your life much easier and is pretty DRY"""
    created = models.DateTimeField(default=timezone.now)

    class Meta:
        abstract = True

class Review(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='reviews', db_index=True)
    text = models.TextField(max_length=5000)
    rating = models.SmallIntegerField()
    likes = models.ManyToManyField(User, through='ReviewLike')

class ReviewLike(TimestampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_index=True)
    review = models.ForeignKey(Review, on_delete=models.CASCADE, db_index=True)
The likes are clearly an m2m relationship between reviews and users with an extra timestamp column, which is exactly the use case for a through model (see the Django docs on extra fields on many-to-many relationships).
Now everything is, IMHO, much easier to read.
from django.db.models import OuterRef, Count, Subquery

# No DB Queries
best_reviews = Review.objects.all()\
    .annotate(like_count=Count('likes'))\
    .exclude(like_count=0)\
    .order_by('-like_count')

# No DB Queries
sq = Subquery(best_reviews.filter(user=OuterRef('id')).values('id')[:1])

# First DB Query
user_distinct_best_review_ids = User.objects.all()\
    .annotate(best_review=sq)\
    .values_list('best_review', flat=True)

# Second DB Query
best_reviews = best_reviews.filter(id__in=user_distinct_best_review_ids)

One way of doing it is as follows:
1. Get a list of tuples that represent the user.id and review.id, ordered by user and then by number of likes ASCENDING.
2. Convert the list to a dict to remove duplicate user.ids. Later items replace earlier ones, which is why the ordering in step 1 is important.
3. Create a list of review.ids from the values in the dict.
4. Get a queryset using the list of review.ids, ordered by the number of likes DESCENDING.
from django.db.models import Count

user_review_list = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .order_by('user', 'num_likes')\
    .values_list('user', 'pk')

user_review_dict = dict(user_review_list)
review_pk_list = list(user_review_dict.values())

reviews = Review.objects\
    .annotate(num_likes=Count('likereview_review'))\
    .filter(pk__in=review_pk_list)\
    .order_by('-num_likes')
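The dict step is what does the deduplication. As a plain-Python sketch of the four steps, with made-up (user_id, review_pk, num_likes) tuples standing in for the annotated queryset, you can see why the ascending order matters: dict() keeps the last pair it sees for each key, so each user's most-liked review survives.

```python
# Hypothetical (user_id, review_pk, num_likes) rows standing in for the annotated queryset.
rows = [
    (1, 10, 5),
    (1, 11, 4),
    (1, 12, 3),
    (2, 20, 2),
    (2, 21, 7),
    (3, 30, 0),
]

# Step 1: order by user, then by number of likes ASCENDING
# (the Python analogue of .order_by('user', 'num_likes')).
ordered = sorted(rows, key=lambda r: (r[0], r[2]))
user_review_list = [(user_id, pk) for user_id, pk, _ in ordered]

# Step 2: dict() keeps the LAST value for each key, i.e. the most-liked review per user.
user_review_dict = dict(user_review_list)

# Step 3: the surviving review pks.
review_pk_list = list(user_review_dict.values())

# Step 4: order those reviews by likes DESCENDING.
likes_by_pk = {pk: n for _, pk, n in rows}
best = sorted(review_pk_list, key=lambda pk: likes_by_pk[pk], reverse=True)
print(best)
```

Two things to keep in mind: when a user's reviews tie on likes, the winner depends on how the database orders the tied rows, and, unlike the subquery answer above, this approach keeps reviews with zero likes (review 30 here).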

Related

How to sort a queryset based on a foreign key field?

This is a contest system project. I have these models, and I know the contest_id and problem_id. I'm trying to return a queryset containing the users who have solved a problem. A user has solved a problem if one of their submissions has a score equal to the score of that problem.
Finally, I need to sort these users by the time at which they made their successful submission.
class Contest(models.Model):
    name = models.CharField(max_length=50)
    holder = models.ForeignKey(User, on_delete=models.CASCADE)
    start_time = models.DateTimeField()
    finish_time = models.DateTimeField()
    is_monetary = models.BooleanField(default=False)
    price = models.PositiveIntegerField(default=0)
    problems = models.ManyToManyField('Problem')  # string reference: Problem is defined below
    authors = models.ManyToManyField(User, related_name='authors')
    participants = models.ManyToManyField(User, related_name='participants')

class Problem(models.Model):
    name = models.CharField(max_length=50)
    description = models.CharField(max_length=1000)
    writer = models.ForeignKey("accounts.User", on_delete=models.CASCADE)
    score = models.PositiveIntegerField(default=100)

class Submission(models.Model):
    submitted_time = models.DateTimeField()
    participant = models.ForeignKey(User, related_name="submissions", on_delete=models.CASCADE)
    problem = models.ForeignKey(Problem, related_name="submissions", on_delete=models.CASCADE)
    code = models.URLField(max_length=200)
    score = models.PositiveIntegerField(default=0)
I tried the following code, but I get the wrong result. How can I sort my QuerySet?
from django.db.models import Q

def list_users_solved_problem(contest_id, problem_id):
    problem_score = Problem.objects.get(id=problem_id).score
    successful_submissions_ids = Submission.objects.filter(
        Q(score=problem_score) & Q(problem__id=problem_id)
    ).values_list('participant__id', flat=True)
    return Contest.objects.get(id=contest_id).participants.filter(
        id__in=successful_submissions_ids
    ).order_by('submissions__submitted_time')
You can use .filter(…) with:
from django.db.models import F

User.objects.filter(
    participants=contest_id,
    submissions__problem_id=problem_id,
    submissions__score=F('submissions__problem__score')
).order_by('submissions__submitted_time')
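For reference, here is a plain-Python sketch of the result this query is meant to produce, using made-up in-memory tuples instead of models (the participant names, times and scores are hypothetical). It makes two details explicit: both conditions (right problem, full score) have to hold for the same submission, and a user with several successful submissions should appear only once, ordered by the earliest one. A join-based query can return such a user more than once, so it is worth checking the ORM result against this expectation on your own data.

```python
from datetime import datetime

# Hypothetical submissions: (participant, problem_id, score, submitted_time).
submissions = [
    ("alice", 1, 100, datetime(2021, 5, 1, 10, 30)),
    ("bob", 1, 40, datetime(2021, 5, 1, 10, 5)),     # not a full score
    ("bob", 1, 100, datetime(2021, 5, 1, 11, 0)),
    ("carol", 1, 100, datetime(2021, 5, 1, 9, 45)),
    ("carol", 1, 100, datetime(2021, 5, 1, 12, 0)),  # a second successful submission
    ("dave", 2, 100, datetime(2021, 5, 1, 8, 0)),    # a different problem
]
participants = {"alice", "bob", "carol", "dave"}
problem_scores = {1: 100, 2: 100}


def users_who_solved(problem_id):
    """Participants with a full-score submission, ordered by their first such submission."""
    first_success = {}
    for user, pid, score, when in submissions:
        # Both conditions are checked on the same submission.
        if user in participants and pid == problem_id and score == problem_scores[pid]:
            if user not in first_success or when < first_success[user]:
                first_success[user] = when
    # One entry per user, sorted by the time of their earliest successful submission.
    return sorted(first_success, key=first_success.get)


print(users_who_solved(1))
```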
The modeling looks "strange", however: a submission is not linked to a contest. So if two (or more) contests share the same problem, the query will use the first complete submission to that problem, regardless of which contest it was made in.
Note: It is normally better to use settings.AUTH_USER_MODEL to refer to the user model than to use the User model directly. For more information, see the referencing the User model section of the documentation.

How to get the most liked users on a particular date in django

So I have a social media app where users can like the posts of other users. I currently fetch the top 20 users who have received the most likes, and that works fine. The problem is that I can't figure out how to fetch the top users who have received the most likes on a particular date, for example the top users who received the most likes only today.
My LIKES MODEL
class PostLike(models.Model):
    user_who_liked = models.ForeignKey(User, on_delete=models.CASCADE)
    post_liked = models.ForeignKey(Post, on_delete=models.CASCADE)
    liked_on = models.DateTimeField(default=timezone.now)
SIMPLIFIED POST MODEL
class Post(models.Model):
    id = models.AutoField(primary_key=True)
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='author')
    caption = models.TextField()
    date = models.DateTimeField(default=timezone.now)
    likes = models.ManyToManyField(
        User, blank=True, through=PostLike)
    image = models.TextField()

    class Meta:
        ordering = ['-id']
SIMPLIFIED USER MODEL
class User(AbstractBaseUser, PermissionsMixin):
    email = models.EmailField(unique=True)
    user_name = models.CharField(max_length=100, unique=True)
    date = models.DateTimeField(default=timezone.now)
    profile_picture = models.TextField(
        default="https://www.kindpng.com/picc/m/24-248253_user-profile-default-image-png-clipart-png-download.png")
    bio = models.CharField(max_length=200, default="")
    objects = CustomManger()

    def __str__(self):
        return self.user_name
My query to get the top users who received the most likes:
users = User.objects.annotate(num_likes=Count('author__likes')).order_by('-num_likes')[:20]
This works and I get the users, but I don't know how to get the top users with the most likes on a PARTICULAR DATE, for example today.
My attempt to get the top users with the most likes on a particular day:
from django.db.models import Count, Q
from django.utils.timezone import datetime

users = User.objects.annotate(
    num_likes=Count('author__likes', filter=Q(author__likes__liked_on=datetime.today()))
).order_by('-num_likes')[:20]
But with the above query, I can't achieve it. I am getting the error:
Related Field got invalid lookup: liked_on
I am pretty sure I am doing something wrong with the many-to-many fields.
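Q(author__likes__liked_on=datetime.today()) won't work, for two reasons. First, author__likes follows the many-to-many relation all the way to User, but liked_on lives on the 'through' table (PostLike), hence the "invalid lookup" error. Second, liked_on is a datetime, and datetime.today() returns the current date and time, so an exact comparison would only match likes made at that very instant.
So you need to cast liked_on to a date, and look up the field via postlike (the through model's name, lower-cased by default):
Q(author__postlike__liked_on__date=datetime.today())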
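The key point is comparing only the date part of liked_on. As a plain-Python sketch of that logic (made-up data, not Django code), counting likes per post author for a single calendar day looks like this: liked_on.date() == day plays the role of the __date lookup, and most_common(20) the role of order_by('-num_likes')[:20].

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical PostLike rows already joined to the post's author: (author, liked_on).
likes = [
    ("ann", datetime(2021, 6, 1, 9, 15)),
    ("ann", datetime(2021, 6, 2, 18, 40)),
    ("ben", datetime(2021, 6, 2, 8, 0)),
    ("ben", datetime(2021, 6, 2, 23, 59)),
    ("cat", datetime(2021, 6, 2, 12, 30)),
]


def top_authors_on(day, limit=20):
    """Count likes per author whose liked_on falls on `day` (like liked_on__date=day)."""
    counts = Counter(author for author, liked_on in likes if liked_on.date() == day)
    return counts.most_common(limit)


print(top_authors_on(date(2021, 6, 2)))
```

One difference to be aware of: this sketch uses naive datetimes, whereas in Django with USE_TZ enabled the __date lookup converts the stored value to the current time zone before extracting the date, so "today" depends on the active time zone.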

Get max value from a set of rows

This question relates to Project 2 of the CS50 course.
I have looked at the following documentation:
Django queryset API ref
Django making queries
I have also looked at aggregate() and annotate().
I've created the table in the template file, which is pretty straightforward, I think. The missing column is what I'm trying to fill.
These are the models that I have created:
class User(AbstractUser):
    pass

class Category(models.Model):
    category = models.CharField(max_length=50)

    def __str__(self):
        return self.category

class Listing(models.Model):
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    description = models.TextField()
    initial_bid = models.IntegerField()
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    date_created = models.DateField(auto_now=True)

    def __str__(self):
        return self.title

class Bid(models.Model):
    whoDidBid = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    list_item = models.ForeignKey(Listing, default=0, on_delete=models.CASCADE)
    bid = models.IntegerField()
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    date = models.DateTimeField(auto_now=True)

    def __str__(self):
        return_string = '{0.whoDidBid} {0.list_item} {0.bid}'
        return return_string.format(self)
This is the closest I could get after a very long time, but the result I get is just the number 2:
Listing.objects.filter(title='Cabinet').aggregate(Max('bid'))
where 'Cabinet' is a Listing object that I created and placed two bids on.
So the question is: how do I get the maximum bid value (i.e. 110 in this case) for a particular listing, using the ORM? I think if I used a raw SQL query, I could build a dict and send it to the template with the queryset. Then, while looping through the queryset, I would get the value for the key, where the key is the name of the listing or something along those lines. But I would like to know how to do this through the ORM, please.
Here's answer #1
Bid.objects.filter(list_item__title='Cabinet').prefetch_related('list_item').aggregate(Max('bid'))
What happens when you try this (sorry, I don't have any objects like this to test on)?
Bid.objects.values('list_item__title').annotate(max_bid=Max('bid'))
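The grouped query asks the database for one row per listing title with its highest bid. A plain-Python sketch of the same grouping, with made-up bid amounts, may help when wiring the result into the template. As for the 2 in the question: most likely Max('bid') on Listing follows the reverse relation from Bid (whose default query name is bid) and aggregates the related Bid primary keys rather than the amounts; from Listing, the amount field would be reached as bid__bid.

```python
# Hypothetical Bid rows as (listing_title, bid_amount) pairs.
bids = [
    ("Cabinet", 100),
    ("Cabinet", 110),
    ("Lamp", 15),
]

# Group by listing title and keep the maximum amount, mirroring
# values('list_item__title').annotate(max_bid=Max('bid')).
max_bid_by_title = {}
for title, amount in bids:
    if title not in max_bid_by_title or amount > max_bid_by_title[title]:
        max_bid_by_title[title] = amount

print(max_bid_by_title)
```

In the template you would then look up each listing's highest bid by its title, which is essentially the dict the question describes, just built by the database instead of raw SQL.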

How to iterate through all foreign keys pointed at an object in Django without using _set notation?

I am fairly new to Django, but I am working on an application that will follow a CPQ (Configure, Price, Quote) flow. The user should select the product they would like to configure, as well as the options to go with it. Once these are selected, the program should query an external pricing database to calculate the price. From there, the program should output the pricing and text data onto a PDF quote. I was able to get the application working with the specific product class inheriting from a base product class. The issue is that now that I've created a second product child class, I cannot use a single "related_name". I've omitted the lists associated with the drop-down fields to help with readability, but I've posted my models.py file below.
Is there a way I can iterate through the Product objects that point to a Quote object with a foreign key? A lot of answers I've found on SO relating to this were solved either by specifying the "_set" accessor or a "related_name". I've seen other answers use the select_related() method; however, I can't seem to get the query right, as the program won't know which set it needs to look at. A quote could have any mix of product instances tied to it, so I am unsure how to handle that query. Again, I have been using Django for less than 6 months, so I am a bit green, and I am not sure if I am just not understanding the big picture here. I thought about not using inheritance and instead making Product a standalone class that stores the Compact or WRC info, so I could use a single "related_name", but I suspect that would just create another nested layer that would still fail.
Any help would be very appreciated! I've definitely hit the wall.
models.py
class Quote(models.Model):
    project_name = models.CharField(max_length=256, blank=True)
    customer_first_name = models.CharField(max_length=256, blank=True)
    customer_last_name = models.CharField(max_length=256, blank=True)
    company_name = models.CharField(max_length=256, blank=True)
    address1 = models.CharField(max_length=256, blank=True, help_text="Address")
    address2 = models.CharField(max_length=256, blank=True)
    city = models.CharField(max_length=256, blank=True, default="")
    state = models.CharField(max_length=256, blank=True, default="")
    zip_code = models.CharField(max_length=256, blank=True, default="")
    country = models.CharField(max_length=256, blank=True, default="")
    phone = PhoneField(blank=True)
    email = models.EmailField(max_length=254, blank=True)
    grand_total = models.FloatField(default=0)
    create_date = models.DateTimeField(default=timezone.now)

class Product(models.Model):
    class Meta:
        abstract = True

    price = models.FloatField(default=0)
    total_price = models.FloatField(default=0)
    quantity = models.IntegerField()
    quote = models.ForeignKey('quote.Quote', on_delete=models.CASCADE)

class Compact(Product):
    base_size = models.CharField(choices=size, max_length=256)
    filter = models.CharField(choices=filter_list, max_length=256)
    product_name = models.CharField(max_length=256, default="Compact")

class WRC(Product):
    base_size = models.CharField(choices=size, max_length=256)
    construction = models.CharField(choices=construction_list, max_length=256)
    outlet = models.CharField(choices=outlet_list, max_length=256)
    product_name = models.CharField(max_length=256, default="WRC")
I was able to figure out my issue, but I wanted to answer in case someone comes across a similar problem. I was able to get all product objects attached to a quote instance dynamically by modifying the get_context_data() method of my QuoteDetailView. I also needed to use the NestedObjects class from django.contrib.admin.utils to grab all objects related to the quote instance. I also added a timestamp field to the Product class to be able to sort them. QuoteDetailView is copied below.
from django.contrib.admin.utils import NestedObjects
from django.db import DEFAULT_DB_ALIAS
from django.utils import timezone
from django.views.generic import DetailView
from django.views.generic.edit import FormMixin

class QuoteDetailView(FormMixin, DetailView):
    model = Quote
    form_class = ProductSelectForm

    def get_context_data(self, **kwargs):
        ### collects related objects from quote
        collector = NestedObjects(using=DEFAULT_DB_ALIAS)
        collector.collect([kwargs['object']])
        ### slice off first element which is the quote itself
        related_objects = collector.nested()
        related_objects = related_objects[1:]
        ### get context data for quote object
        context = super().get_context_data(**kwargs)
        context['now'] = timezone.now()
        ### if number of list items is above 0, then add them to the context
        ### and sort by timestamp
        if len(related_objects) != 0:
            context['items'] = sorted(related_objects[0], key=lambda x: x.timestamp)
        return context
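If you would rather not rely on the admin's deletion collector, another option is to query each concrete product class separately and merge the results. Because the quote foreign key is declared on an abstract base class without a related_name, each subclass should get its own default reverse accessor (quote.compact_set and quote.wrc_set). The sketch below uses plain dataclasses as stand-ins for those querysets, so the classes and sample values are assumptions; only the chain-and-sort pattern carries over.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import chain


# Stand-ins for Compact and WRC instances (hypothetical, not the real models).
@dataclass
class Compact:
    product_name: str
    timestamp: datetime


@dataclass
class WRC:
    product_name: str
    timestamp: datetime


# In Django these would be e.g. quote.compact_set.all() and quote.wrc_set.all().
compacts = [
    Compact("Compact", datetime(2021, 3, 1, 9, 0)),
    Compact("Compact", datetime(2021, 3, 1, 11, 0)),
]
wrcs = [WRC("WRC", datetime(2021, 3, 1, 10, 0))]

# Merge every product type attached to the quote and order by timestamp.
items = sorted(chain(compacts, wrcs), key=lambda item: item.timestamp)
print([item.product_name for item in items])
```

The trade-off is that every new product subclass has to be added to the chain, whereas the collector picks up any model with a cascading foreign key to Quote automatically.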

Is there a way to filter many-to-many filters in Django using fields from the intermediate table?

I have 3 models in Django: Group, Membership and User.
class Group(models.Model):
    name = models.CharField(max_length=32)
    permissions = JSONField(max_length=4096, default=list)

class Membership(models.Model):
    user = models.ForeignKey('User', on_delete=models.CASCADE, related_name='memberships')
    group = models.ForeignKey(Group, on_delete=models.CASCADE, related_name='memberships')
    expires_at = models.DateTimeField(null=True)
    valid = models.BooleanField(default=True)

class User(models.Model):
    groups = models.ManyToManyField(Group, through=Membership)
    last_seen = models.DateTimeField(null=True)
    created_at = models.DateTimeField(auto_now=True)
I was wondering how I could "filter" the many-to-many on user to only retrieve group objects from memberships where expires_at is either greater than now or null. Thank you!
I believe you may actually be looking for expires_at less than now, but I have written the query for expires_at greater than now, as you requested.
1. Use Q objects to express the OR. Your filter condition will be:
from django.db.models import Q
from django.utils import timezone
now = timezone.now()
Q(expires_at__gt=now) | Q(expires_at__isnull=True)
2. You want to filter the memberships of a certain user, so your query boils down to:
Membership.objects.filter(user=user).filter(Q(expires_at__gt=now) | Q(expires_at__isnull=True))
3. Get all group ids for that query:
Membership.objects.filter(user=user).filter(Q(expires_at__gt=now) | Q(expires_at__isnull=True)).values_list('group_id', flat=True)
4. Get all relevant groups:
queryset = Membership.objects.filter(user=user).filter(Q(expires_at__gt=now) | Q(expires_at__isnull=True)).values_list('group_id', flat=True)
result = Group.objects.filter(id__in=queryset)
I think there must be a better solution for step 4, but this works too.
If you are just looking for group names, then this is enough
result = Membership.objects.filter(user=user).filter(Q(expires_at__gt=now) | Q(expires_at__isnull=True)).values_list('group__name', flat=True)
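The only subtle part of the condition is the null branch. A plain-Python sketch of the same filter, using a hypothetical Membership dataclass in place of the model, shows why both halves of the OR are needed: a membership with no expiry date should count as active, but a plain "greater than now" comparison never matches it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Membership:
    group: str
    expires_at: Optional[datetime]  # None means "never expires", like null=True


def active_group_names(memberships, now):
    """Mirror Q(expires_at__gt=now) | Q(expires_at__isnull=True) in plain Python."""
    return [
        m.group
        for m in memberships
        # Check None first: in SQL, NULL > now is never true (hence the isnull branch),
        # and in Python, None > now would raise a TypeError.
        if m.expires_at is None or m.expires_at > now
    ]


now = datetime(2021, 1, 1, 12, 0)
memberships = [
    Membership("admins", None),
    Membership("editors", datetime(2021, 6, 1)),
    Membership("testers", datetime(2020, 12, 31)),
]
print(active_group_names(memberships, now))
```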
