Django Query, distinct on foreign key

Given these models:
class User(Model):
    pass
class Post(Model):
    by = ForeignKey(User)
    posted_on = models.DateTimeField(auto_now=True)
I want to get the latest posts, but no more than one per user. I have something like this:
posts = Post.objects.filter(public=True) \
    .order_by('posted_on') \
    .distinct("by")
But distinct() with field names doesn't work on MySQL. Is there another way to do it?
I have seen some people use values(), but that doesn't work for me because I need to do more things with the objects themselves.

Since distinct() with field names is not supported on MySQL, one possible workaround uses Subquery:
from django.db.models import Subquery, OuterRef
...
# the ForeignKey in the question is named `by`, so the column to match is `by_id`
sub_qs = Post.objects.filter(by_id=OuterRef('id')).order_by('posted_on')
# here you get users annotated with the id of one post each
# (a Subquery used as an annotation must return a single column)
qs = User.objects.annotate(last_post=Subquery(sub_qs.values('id')[:1]))
# next you can limit the number of users
Also note that ordering by posted_on puts the oldest post first, so the annotation above picks each user's oldest post. To get the newest post instead, change it to -posted_on.
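As a rough illustration of what the annotation above asks the database to do, here is a minimal sketch using only the standard-library sqlite3 module instead of Django. The table names (app_user, app_post) and the sample rows are assumptions made up for this example; the correlated subquery orders newest first, which corresponds to using -posted_on.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the User and Post models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_post (
        id INTEGER PRIMARY KEY,
        by_id INTEGER REFERENCES app_user(id),
        posted_on TEXT
    );
    INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO app_post VALUES
        (10, 1, '2020-01-01'),
        (11, 1, '2020-03-01'),
        (12, 2, '2020-02-01');
""")

# One row per user, annotated with the id of that user's newest post.
# Users without posts get NULL (None in Python).
rows = conn.execute("""
    SELECT u.id,
           (SELECT p.id
              FROM app_post p
             WHERE p.by_id = u.id
             ORDER BY p.posted_on DESC
             LIMIT 1) AS last_post
      FROM app_user u
     ORDER BY u.id
""").fetchall()

print(rows)
```

Each user appears exactly once, which is the effect distinct("by") was meant to achieve.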

The order_by() fields should start with the fields passed to distinct(). In your case, you should be doing this:
posts = Post.objects.filter(public=True) \
    .order_by('by') \
    .distinct('by')
.distinct(*fields) only works on PostgreSQL.
It is not available on the MySQL engine. This is how the Django documentation describes the difference:
Here's the difference. For a normal distinct() call, the database
compares each field in each row when determining which rows are
distinct. For a distinct() call with specified field names, the
database will only compare the specified field names.
For MySQL, a workaround could be this:
from django.db.models import Subquery, OuterRef
# `by` is the ForeignKey from the question, so the reverse lookup from User is `post`
user_post = Post.objects.filter(by_id=OuterRef('id')).order_by('-posted_on')
post_ids = User.objects.filter(post__isnull=False).annotate(
    latest_post_id=Subquery(user_post.values_list('id', flat=True)[:1])
).values_list('latest_post_id', flat=True)
posts = Post.objects.filter(id__in=post_ids)

Related

How to annotate on a Django model's M2M field and get a list of distinct instances?

I have two Django models Profile and Device with a ManyToMany relationship with one another like so:
class Profile(models.Model):
    devices = models.ManyToManyField(Device, related_name='profiles')
I am trying to use annotate() and Count() to query on all profiles that have 1 or more devices like this:
profiles = Profile.objects.annotate(dev_count=Count('devices')).filter(dev_count__gt=1)
This is great, it gives me a QuerySet with all the profiles (4500+) with one or more devices, as expected.
Next, because of the M2M relationship, I would like to get a list of all the distinct devices among all the profiles from the previous queryset.
All of my failed attempts below return an empty queryset. I have read the documentation on values, values_list, and annotate but I still can't figure out how to make the correct query here.
devices = profiles.values('devices').distinct()
devices = profiles.values_list('devices', flat=True).distinct()
I have also tried to do it in one go:
devices = (
    Profile.objects.values_list('devices', flat=True)
    .annotate(dev_count=Count('devices'))
    .filter(dev_count__gt=1)
    .distinct()
)

You cannot use .values('devices') here: the field then appears in both the SELECT clause and the GROUP BY clause, so every group contains a single device and COUNT(devices) returns 1 for each group. That is why the dev_count__gt=1 filter leaves nothing.
You can filter on the Devices that are linked to at least one of these Profiles with:
profiles = Profile.objects.annotate(
    dev_count=Count('devices')
).filter(dev_count__gt=1)
# the reverse lookup uses the related_name declared on the field: 'profiles'
devices = Device.objects.filter(profiles__in=profiles).distinct()
For some SQL dialects, usually MySQL, it is better to first materialize the list of profiles and not work with a subquery, so:
profiles = Profile.objects.annotate(
    dev_count=Count('devices')
).filter(dev_count__gt=1)
profiles_list = list(profiles)
devices = Device.objects.filter(profiles__in=profiles_list).distinct()
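To make the grouping argument concrete, the following sketch reproduces both situations in plain SQL with the standard-library sqlite3 module. The join table name profile_devices and the sample rows are assumptions for illustration only, not what Django would necessarily generate.

```python
import sqlite3

# Hypothetical many-to-many join table between profiles and devices.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile_devices (profile_id INTEGER, device_id INTEGER);
    INSERT INTO profile_devices VALUES
        (1, 100), (1, 101),   -- profile 1: two devices
        (2, 101), (2, 102),   -- profile 2: two devices, shares device 101
        (3, 103);             -- profile 3: one device, filtered out
""")

# Grouping by the device as well (what values('devices') leads to):
# every group holds exactly one row, so COUNT > 1 never matches.
per_pair = conn.execute("""
    SELECT profile_id, device_id
      FROM profile_devices
     GROUP BY profile_id, device_id
    HAVING COUNT(device_id) > 1
""").fetchall()

# Grouping by profile only, then selecting the distinct devices of those profiles.
devices = [row[0] for row in conn.execute("""
    SELECT DISTINCT device_id
      FROM profile_devices
     WHERE profile_id IN (
           SELECT profile_id
             FROM profile_devices
            GROUP BY profile_id
           HAVING COUNT(device_id) > 1)
     ORDER BY device_id
""")]

print(per_pair, devices)
```

The first query comes back empty, matching the failed attempts in the question, while the second returns each shared device only once.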

Filtering multiple models in Django

I want to filter across multiple tables in Django.
q = json.loads(request.body)
qs = Search.objects.filter(keyword__icontains=q['q']).all()
data = serialize("json", qs, fields=('keyword', 'user'))
That's the first issue.
Secondly, the user field returns an integer value (the pk) instead of something like the username.

You can try getting 'user__username' instead of 'user'. It might work.

Optimise Django query with large subquery

I have a database containing Profile and Relationship models. I haven't explicitly linked them in the models (because they are third party IDs and they may not yet exist in both tables), but the source and target fields map to one or more Profile objects via the id field:
from django.db import models
class Profile(models.Model):
    id = models.BigIntegerField(primary_key=True)
    handle = models.CharField(max_length=100)
class Relationship(models.Model):
    id = models.AutoField(primary_key=True)
    source = models.BigIntegerField(db_index=True)
    target = models.BigIntegerField(db_index=True)
My query needs to get a list of 100 values from the Relationship.source column which don't yet exist as a Profile.id. This list will then be used to collect the necessary data from the third party. The query below works, but as the table grows (10m+ rows), the Subquery is getting very large and slow.
Any recommendations for how to optimise this? Backend is PostgreSQL but I'd like to use native Django ORM if possible.
EDIT: There's an extra level of complexity that will be contributing to the slow query. Not all IDs are guaranteed to return success, which would mean they continue to "not exist" and get the program in an infinite loop. So I've added a filter and order_by to input the highest id from the previous batch of 100. This is going to be causing some of the problem so apologies for missing it initially.
from django.db.models import Subquery
user = Profile.objects.get(handle="philsheard")
qs_existing_profiles = Profile.objects.all()
rels = Relationship.objects.filter(
    target=user.id,
).exclude(
    source__in=Subquery(qs_existing_profiles.values("id"))
).values_list(
    "source", flat=True
).order_by(
    "source"
).filter(
    source__gt=max_id_from_previous_batch  # An integer representing a previous `Relationship.source` id
)
Thanks in advance for any advice!

For future searchers, here's how I bypassed the __in query and sped up the results.
from django.db.models import Count, OuterRef, Subquery  # Count and OuterRef are new
user = Profile.objects.get(handle="philsheard")
# New queryset to use within Subquery: the Profile whose id equals this row's source
subq = Profile.objects.filter(id=OuterRef("source"))
rels = Relationship.objects.order_by(
    "source"
).annotate(
    # Annotate each relationship record with a Count of the times that the "source" ID
    # appears in the `Profile` table. We can then filter on those that have a count of 0
    # (i.e. don't appear and therefore haven't yet been collected)
    prof_count=Count(Subquery(subq.values("id")))
).filter(
    target=user.id,
    prof_count=0
).filter(
    source__gt=max_id_from_previous_batch  # An integer representing a previous `Relationship.source` id
).values_list(
    "source", flat=True
)
I think this is faster because the query can stop once it has found its required 100 items (rather than comparing against a list of 1m+ IDs each time).
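The same idea, "keep only the sources that have no matching profile, in batches", can be written as an anti-join with NOT EXISTS. The sketch below uses the standard-library sqlite3 module rather than the Django ORM; the table names, the sample rows, and the helper missing_sources are hypothetical. In Django, an Exists() expression is the closest built-in counterpart to this SQL form.

```python
import sqlite3

# Hypothetical tables mirroring the Profile and Relationship models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE relationship (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        source INTEGER,
        target INTEGER
    );
    CREATE INDEX rel_source ON relationship (source);
    INSERT INTO profile VALUES (1, 'philsheard'), (20, 'known');
    INSERT INTO relationship (source, target) VALUES
        (10, 1), (20, 1), (30, 1), (40, 1), (50, 2);
""")


def missing_sources(target_id, max_id_from_previous_batch, batch_size=100):
    """Return up to batch_size source ids for target_id that have no profile row."""
    rows = conn.execute("""
        SELECT r.source
          FROM relationship r
         WHERE r.target = ?
           AND r.source > ?
           AND NOT EXISTS (SELECT 1 FROM profile p WHERE p.id = r.source)
         ORDER BY r.source
         LIMIT ?
    """, (target_id, max_id_from_previous_batch, batch_size))
    return [row[0] for row in rows]


print(missing_sources(1, 0))       # first batch
print(missing_sources(1, 10, 1))   # next batch of size 1, resuming after id 10
```

Passing the highest id of the previous batch as the lower bound keeps the loop moving forward even when some ids never produce a profile.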

Extract OneToOne Field in django model

class Post(models.Model):
    created_time = models.DateTimeField()
    comment_count = models.IntegerField(default=0)
    like_count = models.IntegerField(default=0)
    group = models.ForeignKey(Group)
class MonthPost(models.Model):
    created_time = models.DateTimeField()
    comment_count = models.IntegerField(default=0)
    like_count = models.IntegerField(default=0)
    group = models.ForeignKey(Group)
    post = models.OneToOneField(Post)
I use these two models. MonthPost holds a subset of Post.
I want to use MonthPost when the filtered date range is shorter than a month.
_models = Model.extra(
    select={'score': 'like_count + comment_count'},
    order_by=('-score',)
)
I use extra() on both of the models above. Post works well, but MonthPost doesn't work.
django.db.utils.ProgrammingError: column reference "like_count" is ambiguous
LINE 1: ... ("archive_post"."is_show" = false)) ORDER BY (like_count...
This is the error message.
_models.values_list("post", flat=True)
And then, I want to extract the OneToOne field (post) from MonthPost.
I tried values_list("post", flat=True), but it returns only a list of ids.
I need a list of Post objects for Django REST framework.

I don't quite understand what you are trying to achieve with your MonthPost model and why it duplicates Post's fields. That said, I think you can get the results you want with this info.
First of all, extra() is discouraged: the Django docs describe it as an old API that is meant to be deprecated at some point. The error itself most likely comes from the unqualified column names: MonthPost and Post both have a like_count column, so the reference is ambiguous once the two tables are joined. If you do need raw SQL, a RawSQL annotation looks more like this:
annotate(val=RawSQL(
    "select col from sometable where othercol = %s",
    (someparam,)))
However, what you are after here requires neither extra() nor RawSQL. These methods should only be used when there is no built-in way to achieve the desired results. When using RawSQL or extra(), you must tailor the SQL to your specific backend. Django has built-in expressions for such queries:
from django.db.models import F
# F() refers to a column value; Count() would count rows instead of adding the two numbers
qs = Post.objects.annotate(
    score=F('like_count') + F('comment_count'))
A values_list() query needs to explicitly list all fields from related models as well as any extra or annotated fields. For MonthPost it should look like this:
MonthPost.objects.annotate(
    score=F('post__like_count') + F('post__comment_count')
).values_list('post', 'score', 'post__created_time')
Finally, if the purpose of MonthPost is simply to list the posts with the greatest score for a given month, you can eliminate the MonthPost model entirely and query your Post model for this.
import datetime
today = datetime.date.today()
# Filter for posts this month
# Annotate the score
# Order the results by the score field, highest first
qs = Post.objects\
    .filter(created_time__year=today.year, created_time__month=today.month)\
    .annotate(score=F('like_count') + F('comment_count'))\
    .order_by('-score')
# Slice the top ten posts for the month
qs = qs[:10]
The code above is not tested, but should give you a better handle on how to perform these types of queries.
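For reference, the filter, annotate, order and slice steps above correspond roughly to the SQL in this sketch, which runs against an in-memory database through the standard-library sqlite3 module. The post table layout, the sample rows, and the helper top_posts are assumptions for illustration; the SQL Django emits for a real backend will differ in detail.

```python
import sqlite3

# Hypothetical table mirroring the relevant columns of the Post model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        created_time TEXT,
        like_count INTEGER DEFAULT 0,
        comment_count INTEGER DEFAULT 0
    );
    INSERT INTO post VALUES
        (1, '2024-05-02', 3, 4),
        (2, '2024-05-10', 10, 0),
        (3, '2024-04-28', 50, 50),
        (4, '2024-05-20', 1, 1);
""")


def top_posts(year, month, limit=10):
    """Return (id, score) for the highest-scoring posts of the given month."""
    month_key = f"{year:04d}-{month:02d}"
    return conn.execute("""
        SELECT id, like_count + comment_count AS score
          FROM post
         WHERE strftime('%Y-%m', created_time) = ?
         ORDER BY score DESC
         LIMIT ?
    """, (month_key, limit)).fetchall()


print(top_posts(2024, 5))
```

The score is a per-row sum of the two counters, which is why the ORM version needs column references rather than an aggregate.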

django model object filter

I have tables called 'has_location' and 'locations'. 'has_location' has user_has, location_id, and its own id, which is assigned by Django itself.
'locations' has more columns.
Now I want to get all locations of a certain user. What I did is (user.id is known):
users_locations_id = has_location.objects.filter(user_has__exact=user.id)
locations = Location.objects.filter(id__in=users_locations_id)
print len(locations)
But this prints 0, even though I have data in the db. I have the feeling that __in does not accept model ids, does it?
Thanks

Using __in for this kind of query is a common anti-pattern in Django: it's tempting because of its simplicity, but it scales poorly in most databases. See slides 66ff in this presentation by Christophe Pettus.
You have a many-to-many relationship between users and locations, represented by the has_location table. You would normally describe this to Django using a ManyToManyField with a through table, something like this:
class Location(models.Model):
    # ...
class User(models.Model):
    locations = models.ManyToManyField(Location, through='LocationUser')
    # ...
class LocationUser(models.Model):
    location = models.ForeignKey(Location)
    user = models.ForeignKey(User)
    class Meta:
        db_table = 'has_location'
Then you can fetch the locations for a user like this:
user.locations.all()
You can query the locations in your filter operations:
User.objects.filter(locations__name='Barcelona')
And you can request that users' related locations be fetched efficiently using the prefetch_related() method on a query set.

You are using has_location's own id to filter locations. You have to use location_ids to filter locations:
user_haslocations = has_location.objects.filter(user_has=user)
locations = Location.objects.filter(id__in=user_haslocations.values('location_id'))
You can also filter the locations directly through the reverse relation:
location = Location.objects.filter(has_location__user_has=user.id)

What do your models look like?
To answer your question: __in does accept a queryset of ids.
For your current code, the solution is:
locations = Location.objects.filter(id__in=has_location.objects.filter(user_has=user.id).values('location_id'))
# if you just want the number of locations, use locations.count()
locations.count()
# if you are going to iterate over the locations afterwards, len() evaluates the queryset and caches the results
len(locations)
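The difference between the original query and the fixed one is easy to see in plain SQL. The sketch below uses the standard-library sqlite3 module; the column layout follows the question's description, while the sample rows and the helper names are made up for illustration.

```python
import sqlite3

# Hypothetical data: the ids of has_location rows do not overlap with location ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE has_location (
        id INTEGER PRIMARY KEY,
        user_has INTEGER,
        location_id INTEGER
    );
    INSERT INTO locations VALUES (7, 'Barcelona'), (8, 'Madrid'), (9, 'Sevilla');
    INSERT INTO has_location VALUES (101, 1, 7), (102, 1, 9), (103, 2, 8);
""")


def names(sql, user_id):
    """Run a one-parameter query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql, (user_id,))]


# Original attempt: compares location ids with the join table's own ids.
wrong = names("""
    SELECT name FROM locations
     WHERE id IN (SELECT id FROM has_location WHERE user_has = ?)
     ORDER BY name
""", 1)

# Fixed version: compares location ids with has_location.location_id.
right = names("""
    SELECT name FROM locations
     WHERE id IN (SELECT location_id FROM has_location WHERE user_has = ?)
     ORDER BY name
""", 1)

print(wrong, right)
```

With these rows the first query finds nothing, which matches the 0 reported in the question, while the second returns the user's two locations.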
