prefetch limited number of related objects in django - python

I want to display a list of Posts with the 5 latest Comments for each of them. How do I do that with the minimum number of DB queries?
Post.objects.filter(...).prefetch_related('comment_set')
retrieves all comments, while I need only a few of them.

I would go with two queries. First get posts:
posts = list(Post.objects.filter(...))
Now run a raw SQL query with UNION (note: ordering omitted for simplicity):
# Parenthesize each SELECT so its LIMIT applies to that post only (MySQL/PostgreSQL syntax)
sql = "(SELECT * FROM comments WHERE post_id=%s LIMIT 5)"
query = []
for post in posts:
    query.append(sql % post.id)
query = " UNION ".join(query)
and run it:
comments = Comments.objects.raw(query)
After that you can loop over comments and group them on the Python side.
I haven't tried it, but it looks ok.
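The "group them on the Python side" step could look roughly like the sketch below. The `Comment` namedtuple is only a hypothetical stand-in for the rows returned by the raw query, and `group_by_post` is an illustrative helper name, not part of Django.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the model instances returned by the raw query.
Comment = namedtuple("Comment", ["id", "post_id", "text"])


def group_by_post(comments):
    """Return {post_id: [comment, ...]}, keeping the incoming order."""
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment.post_id].append(comment)
    return dict(grouped)


comments = [
    Comment(1, 10, "first"),
    Comment(2, 11, "second"),
    Comment(3, 10, "third"),
]
comments_by_post = group_by_post(comments)
```

Each post can then look up its own list with `comments_by_post.get(post.id, [])`, so no further queries are needed while rendering.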
There are other possible solutions to your problem (possibly getting down to one query). Have a look at this:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
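As a rough illustration of the one-query idea, here is a sketch using Python's built-in sqlite3 module rather than Django. The table and column names (`comments`, `post_id`, `created_at`) are assumptions made for the example. A row is kept when fewer than N comments on the same post are newer than it; ties on `created_at` are broken by `id`.

```python
import sqlite3

# Keep a comment when fewer than :n comments on the same post are newer.
TOP_N_SQL = """
SELECT c.id, c.post_id, c.created_at
FROM comments AS c
WHERE (
    SELECT COUNT(*)
    FROM comments AS newer
    WHERE newer.post_id = c.post_id
      AND (newer.created_at > c.created_at
           OR (newer.created_at = c.created_at AND newer.id > c.id))
) < :n
ORDER BY c.post_id, c.created_at DESC, c.id DESC
"""


def latest_comments(conn, n):
    """Return the n newest comments of every post in a single query."""
    return conn.execute(TOP_N_SQL, {"n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, created_at INTEGER)"
)
# Seven comments for each of two posts, with increasing timestamps.
conn.executemany(
    "INSERT INTO comments (post_id, created_at) VALUES (?, ?)",
    [(post_id, ts) for post_id in (1, 2) for ts in range(1, 8)],
)
result = latest_comments(conn, 5)
```

The correlated COUNT can get slow on large tables without an index on `(post_id, created_at)`, so it is worth checking the query plan before relying on it.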

Related

How can I get the last n items of a query on PostgreSQL

Question:
How can I get the last 750 records of a query at the database level?
Here is what I have tried:
# Get last 750 applications
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
This query fetches all records that match the query_params filter and only then returns the last 750 records. I want to do this work at the database level, like MongoDB aggregate queries. Is it possible?
Thanks.
Actually, that's not how Django works. The limit is also applied at the database level.
Django docs - Limiting QuerySets:
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query.
To see what query is actually being run in the database you can simply print the query like this:
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
print(apps.query)
The result will be something like this:
SELECT "app_mymodel"."id" FROM "app_mymodel" WHERE <...> ORDER BY "app_mymodel"."created_at" DESC LIMIT 750
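To see the effect of that LIMIT outside Django, the same shape of query can be run against a throwaway SQLite database. This is only a sketch: the table layout is an assumption and `latest_ids` is a hypothetical helper. The point is that the database hands back at most 750 rows, so nothing is trimmed in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, created_at INTEGER)")
# 1000 rows whose created_at grows together with id.
conn.executemany(
    "INSERT INTO app_mymodel (id, created_at) VALUES (?, ?)",
    [(i, i) for i in range(1, 1001)],
)


def latest_ids(conn, limit):
    """Return the ids of the newest rows; ordering and LIMIT run in the database."""
    cursor = conn.execute(
        "SELECT id FROM app_mymodel ORDER BY created_at DESC LIMIT ?", (limit,)
    )
    return [row[0] for row in cursor]


ids = latest_ids(conn, 750)
```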

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query to return a Queryset with the most recent record for each vin and update it!
special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
SELECT MAX(dateInShop) as maxtime, vin
FROM cars
GROUP BY vin
) AS latest_record ON (cars.dateInShop= maxtime)
AND (latest_record.vin = cars.vin)
So how can I make this happen with Django?
This is somewhat untested, and relies on Django 1.11 for Subqueries, but perhaps something like:
latest_visits = Subquery(ShopVisit.objects.filter(id=OuterRef('id')).order_by('-date_in_shop').values('id')[:1])
ShopVisit.objects.filter(id__in=latest_visits)
I had a similar model, so went to test it but got an error of:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, perhaps it supports that type of subquery.
Here's the SQL it produced (trimmed up a bit and replaced actual names with fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`id` = (`mymodel_activity`.`id`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)
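One thing to check in the snippet above: the inner filter correlates on id, so every row trivially matches itself. If the goal is the newest visit per car, the correlation would presumably need to be on vin instead. The sketch below tries that shape of query with Python's built-in sqlite3 module (not Django, and not MySQL). The `shopvisit` table and its sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopvisit (id INTEGER PRIMARY KEY, vin TEXT, date_in_shop TEXT)"
)
conn.executemany(
    "INSERT INTO shopvisit (id, vin, date_in_shop) VALUES (?, ?, ?)",
    [
        (1, "A", "2017-01-01"),
        (2, "A", "2017-03-01"),
        (3, "B", "2017-02-01"),
        (4, "B", "2017-01-15"),
    ],
)

# For every row, the subquery finds the newest visit of the same vin;
# the row is kept only when it is that newest visit.
LATEST_PER_VIN = """
SELECT v.id, v.vin, v.date_in_shop
FROM shopvisit AS v
WHERE v.id = (
    SELECT u.id
    FROM shopvisit AS u
    WHERE u.vin = v.vin
    ORDER BY u.date_in_shop DESC, u.id DESC
    LIMIT 1
)
ORDER BY v.vin
"""
latest = conn.execute(LATEST_PER_VIN).fetchall()
```

Comparing with `=` against a single-row subquery, instead of `IN`, is also the form that tends to sidestep the MySQL restriction quoted above, though that would need to be verified on the MySQL version in use.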
I wonder if you found the solution yourself.
I could only come up with a raw query string (see the Django manual on raw SQL queries):
UPDATE "yourapplabel_shopvisit"
SET boolfield = True WHERE date_in_shop =
(SELECT MAX(v.date_in_shop) FROM "yourapplabel_shopvisit" v
 WHERE v.vin = "yourapplabel_shopvisit".vin);  -- correlated on vin
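Comparing each row against the maximum date of its own vin, rather than against the set of all per-vin maxima, avoids flagging a row whose date merely coincides with another car's latest visit. Below is a small sketch of such an update using sqlite3. The table name, the sample data and the `flag_latest_visits` helper are assumptions for illustration, and other databases may need different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopvisit ("
    "id INTEGER PRIMARY KEY, vin TEXT, date_in_shop TEXT, boolfield INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO shopvisit (id, vin, date_in_shop) VALUES (?, ?, ?)",
    [
        (1, "A", "2017-01-01"),
        (2, "A", "2017-03-01"),
        (3, "B", "2017-01-01"),
        (4, "B", "2017-02-01"),
        (5, "C", "2017-01-01"),
    ],
)


def flag_latest_visits(conn, vins):
    """Set boolfield on the newest visit of each given vin; return rows updated."""
    placeholders = ", ".join("?" for _ in vins)
    sql = f"""
        UPDATE shopvisit
        SET boolfield = 1
        WHERE vin IN ({placeholders})
          AND date_in_shop = (
              SELECT MAX(u.date_in_shop)
              FROM shopvisit AS u
              WHERE u.vin = shopvisit.vin
          )
    """
    return conn.execute(sql, list(vins)).rowcount


updated = flag_latest_visits(conn, ["A", "B"])
flagged = [
    row[0]
    for row in conn.execute("SELECT id FROM shopvisit WHERE boolfield = 1 ORDER BY id")
]
```

Rows 1 and 3 share the date of car C's latest visit but are not flagged, because the subquery only looks at visits of the same vin.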

Select batch of rows sqlalchemy mysql

I have a MySQL database with a few thousand forum posts + text. I would like to grab them in batches, say 1000 at a time, and do stuff to them in python3.
My single post query looks like:
pquery = session.query(Post).\
    filter(Post.post_id.like(post_id))
How can I change this so that given a post_id, it returns that post and the 999 posts after it?
Use limit and offset, with an explicit ordering so that each batch is well defined:
pquery = session.query(Post).filter(Post.post_id >= post_id).order_by(Post.post_id).limit(1000).offset(the_offset_val)
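For walking the whole table in batches, an alternative to a growing OFFSET is to restart each batch from the last id seen (sometimes called keyset pagination). The sketch below uses the standard-library sqlite3 module instead of SQLAlchemy and MySQL. The `posts` table and the helper names are assumptions, and `post_id` is assumed to be an orderable integer key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (post_id, body) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 3501)],
)


def fetch_batch(conn, start_id, batch_size=1000):
    """Return the post with start_id and the posts after it, up to batch_size rows."""
    cursor = conn.execute(
        "SELECT post_id, body FROM posts WHERE post_id >= ? ORDER BY post_id LIMIT ?",
        (start_id, batch_size),
    )
    return cursor.fetchall()


def iter_batches(conn, batch_size=1000):
    """Yield consecutive batches until the table is exhausted."""
    start_id = 0
    while True:
        batch = fetch_batch(conn, start_id, batch_size)
        if not batch:
            break
        yield batch
        # Continue right after the last id of this batch.
        start_id = batch[-1][0] + 1


batch_sizes = [len(batch) for batch in iter_batches(conn)]
one_batch = fetch_batch(conn, 42)
```

With only a few thousand rows either approach should be fine; the keyset form mainly pays off on larger tables, where a big OFFSET forces the database to skip over many rows first.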

How to Group by id and Order By count in Django

I'm having trouble writing the correct Python script that does what I can accomplish in MySQL.
Below is the SQL query that accomplishes exactly what I want. Where I get tripped up in Python is the GROUP BY statement.
SELECT COUNT(story_id) AS theCount, `headline`, `url` from tracking
GROUP BY `story_id`
ORDER BY theCount DESC
LIMIT 20
Here's what I have in Python so far. This queries all of the articles just fine, but it lacks any kind of groupby() or order_by() based on COUNT.
articles = ArticleTracking.objects.all().filter(date__range=(start_date, end_date))[:20]
article_info = []
for article in articles:
    this_value = {
        "story_id": article.story_id,
        "url": article.url,
        "headline": article.headline,
    }
    article_info.append(this_value)
The right way to do this is to use aggregation.
from django.db.models import Count

articles = ArticleTracking.objects.filter(date__range=(start_date, end_date))
articles = articles.values('story_id', 'url', 'headline').annotate(count=Count('story_id')).order_by('-count')[:20]
Also go through the aggregation documentation in Django.
https://docs.djangoproject.com/en/dev/topics/db/aggregation/
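In Django, calling values(...) before annotate(...) groups by the listed fields. The SQL this roughly corresponds to can be tried with the standard-library sqlite3 module, as in the sketch below. The `tracking` table and its sample rows are invented for the example, and the date filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracking (story_id INTEGER, headline TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO tracking (story_id, headline, url) VALUES (?, ?, ?)",
    [
        (1, "Alpha", "/alpha"),
        (2, "Beta", "/beta"),
        (2, "Beta", "/beta"),
        (3, "Gamma", "/gamma"),
        (2, "Beta", "/beta"),
        (3, "Gamma", "/gamma"),
    ],
)

# Group by every selected column, count the rows per group, most viewed first.
TOP_STORIES = """
SELECT story_id, url, headline, COUNT(story_id) AS the_count
FROM tracking
GROUP BY story_id, url, headline
ORDER BY the_count DESC
LIMIT ?
"""


def top_stories(conn, limit=20):
    """Return (story_id, url, headline, the_count) tuples, most viewed first."""
    return conn.execute(TOP_STORIES, (limit,)).fetchall()


top = top_stories(conn)
```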
Don't try this at home.
You can add a group_by clause to a queryset like this:
qs = ArticleTracking.objects.all().filter(date__range=(start_date, end_date))
qs.query.group_by = ['story_id']
articles = qs[:20]
This is not part of the public api, so it may change, and it may work differently (or be unavailable) depending on the particular db backend you're using. Worth mentioning that I'm not sure if applying the group_by clause before or after the filter makes any difference. I have had success with this with a MySQL backend, though.

Django - Selecting related set : how many times does it hit the database?

I took this sample code here : Django ORM: Selecting related set
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
My question is very simple: do you hit the database twice when you finally use the queryset choices?
It will be one query, but containing an inner SELECT. If you want to debug it, you could either use the marvellous django-debug-toolbar or do something like print(choices.query), which will output the raw SQL of your query.
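Because `polls` is still an unevaluated queryset when it is passed to `poll__in`, Django nests it as a subquery. The resulting statement has roughly the shape shown in this sqlite3 sketch. The `poll` and `choice` tables and their rows are assumptions for illustration, not the SQL Django would print verbatim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE poll (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE choice (id INTEGER PRIMARY KEY, poll_id INTEGER, label TEXT);
    INSERT INTO poll VALUES (1, 'foo'), (2, 'bar'), (3, 'foo');
    INSERT INTO choice VALUES (1, 1, 'yes'), (2, 1, 'no'), (3, 2, 'maybe'), (4, 3, 'later');
    """
)

# A single statement: the inner SELECT picks the polls, the outer one their choices.
CHOICES_FOR_CATEGORY = """
SELECT id, poll_id, label
FROM choice
WHERE poll_id IN (SELECT id FROM poll WHERE category = ?)
ORDER BY id
"""
choices = conn.execute(CHOICES_FOR_CATEGORY, ("foo",)).fetchall()
```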
