How can I get the last N items of a query on PostgreSQL? (Python)

Question:
How can I get the last 750 records of a query at the database level?
Here is what I have tried:
# Get last 750 applications
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
As far as I understand, this query fetches every record matching the query_params filter and only then returns the last 750 of them. I want this work done at the database level, like MongoDB aggregate queries. Is that possible?
Thanks.

Actually, that's not how Django works. The LIMIT is also applied at the database level.
Django docs - Limiting QuerySets:
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query.
To see what query is actually being run in the database you can simply print the query like this:
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
print(apps.query)
The result will be something like this:
SELECT "app_mymodel"."id" FROM "app_mymodel" WHERE <...> ORDER BY "app_mymodel"."created_at" DESC LIMIT 750
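As a minimal sketch of the point above, using only Python's built-in sqlite3 module (the app_mymodel table and its rows are hypothetical), the query below runs ORDER BY ... DESC LIMIT inside the database, so only the requested rows come back to Python. Django does not support negative indexing on QuerySets, so "last N" is expressed the same way: order descending, then slice. If you need the rows oldest-to-newest, reverse the small result in Python.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical table standing in for app_mymodel: 1000 rows, created one minute apart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_mymodel (id INTEGER PRIMARY KEY, created_at TEXT)")
start = datetime(2024, 1, 1)
conn.executemany(
    "INSERT INTO app_mymodel (id, created_at) VALUES (?, ?)",
    [(i, (start + timedelta(minutes=i)).isoformat()) for i in range(1, 1001)],
)


def last_n_ids(conn, n):
    # Sorting and limiting happen in the database; only n ids are returned.
    rows = conn.execute(
        "SELECT id FROM app_mymodel ORDER BY created_at DESC LIMIT ?", (n,)
    ).fetchall()
    return [row[0] for row in rows]


newest_first = last_n_ids(conn, 750)
# Reverse in Python when chronological (oldest-to-newest) order is wanted.
oldest_first = list(reversed(newest_first))
```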

Related

Does SQLAlchemy make additional requests when we work with a Query?

I am new to sqlalchemy and I have a question regarding my code:
query = db.query(Purchase.name,
                 func.sum(Purchase.price).label('total'),
                 func.count(Purchase.name).label('count'))
if date_start and date_end:
    query = query.filter(Purchase.date >= date_start,
                         Purchase.date <= date_end)
query = query.group_by(Purchase.name)\
             .order_by(sqlalchemy.desc('total'))[:limit]
result = [ItemDict(name=item.name, total=item.total,
                   count=item.count) for item in query]
Do I understand correctly that:
1. In this program there will be only one query to the database?
2. When we work with Query objects, we do NOT make additional queries to the database (i.e. the list comprehension does not make additional queries)?
Ad. 1: Yes, there should be only one query (depending on your pool configuration, e.g. with pool_pre_ping enabled, there may also be a small "ping" query).
Ad. 2: Whether additional queries are issued depends on the relationship loading strategy. If you query a single model without joins, you should always get a single query. However, if you access related models that use lazy loading, you can trigger many implicit additional queries (my short post about it).
You can use this smart context manager to count the number of queries: How to count sqlalchemy queries in unit tests.
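The linked context manager isn't reproduced here. As a hedged sketch of the same idea using only the standard library, the following swaps in sqlite3's Connection.set_trace_callback to record every SQL statement executed inside a with-block; the purchase table and its data are hypothetical. With SQLAlchemy itself, the usual hook would be an engine-level before_cursor_execute event listener instead.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Collect every SQL statement sqlite3 executes on `conn` inside the block."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        conn.set_trace_callback(None)  # stop tracing when the block exits


# Hypothetical purchase table, loosely mirroring the question's model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (name TEXT, price REAL)")
conn.executemany("INSERT INTO purchase VALUES (?, ?)",
                 [("apple", 1.0), ("apple", 2.0), ("pear", 5.0)])
conn.commit()

with count_queries(conn) as executed:
    rows = conn.execute(
        "SELECT name, SUM(price) AS total, COUNT(name) AS count "
        "FROM purchase GROUP BY name ORDER BY total DESC LIMIT ?", (10,)
    ).fetchall()
    # Building dicts from already-fetched rows runs no further SQL.
    result = [{"name": n, "total": t, "count": c} for n, t, c in rows]
```

Asserting on len(executed) in a unit test catches accidental extra queries, for example from lazy loading inside a loop.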

Django ORM: Get latest record for distinct field

I'm having loads of trouble translating some SQL into Django.
Imagine we have some cars, each with a unique VIN, and we record the dates that they are in the shop with some other data. (Please ignore the reason one might structure the data this way. It's specifically for this question. :-) )
class ShopVisit(models.Model):
    vin = models.CharField(...)
    date_in_shop = models.DateField(...)
    mileage = models.DecimalField(...)
    boolfield = models.BooleanField(...)
We want a single query that returns a QuerySet with the most recent record for each vin, and then updates it!
from django.db.models import F, Max

special_vins = [...]
# Doesn't work
ShopVisit.objects.filter(vin__in=special_vins).annotate(max_date=Max('date_in_shop')).filter(date_in_shop=F('max_date')).update(boolfield=True)
# Distinct doesn't work with update
ShopVisit.objects.filter(vin__in=special_vins).order_by('vin', '-date_in_shop').distinct('vin').update(boolfield=True)
Yes, I could iterate over a queryset. But that's not very efficient and it takes a long time when I'm dealing with around 2M records. The SQL that could do this is below (I think!):
SELECT *
FROM cars
INNER JOIN (
    SELECT MAX(dateInShop) AS maxtime, vin
    FROM cars
    GROUP BY vin
) AS latest_record
  ON cars.dateInShop = latest_record.maxtime
 AND cars.vin = latest_record.vin
So how can I make this happen with Django?
This is somewhat untested and relies on Django 1.11+ for Subquery, but perhaps something like:
latest_visits = Subquery(ShopVisit.objects.filter(id=OuterRef('id')).order_by('-date_in_shop').values('id')[:1])
ShopVisit.objects.filter(id__in=latest_visits)
I had a similar model, so I went to test it, but got this error:
"This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
The SQL it generated looked reasonably like what you want, so I think the idea is sound. If you use PostgreSQL, it does support LIMIT inside an IN subquery.
Here's the SQL it produced (trimmed up a bit and replaced actual names with fake ones):
SELECT `mymodel_activity`.* FROM `mymodel_activity` WHERE `mymodel_activity`.`id` IN (SELECT U0.`id` FROM `mymodel_activity` U0 WHERE U0.`id` = (`mymodel_activity`.`id`) ORDER BY U0.`date_in_shop` DESC LIMIT 1)
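I wonder whether you found the solution yourself.
I could only come up with a raw SQL query (see the Django manual on performing raw SQL queries). Because it is an UPDATE rather than a SELECT, run it through django.db.connection.cursor() instead of Manager.raw(). Each row must be compared with the latest date of its own vin:
UPDATE "yourapplabel_shopvisit"
SET boolfield = TRUE
WHERE (vin, date_in_shop)
IN (SELECT vin, MAX(date_in_shop) FROM "yourapplabel_shopvisit" GROUP BY vin);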
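The essential detail is that the MAX has to be correlated with each row's own vin. A plain `date_in_shop IN (SELECT MAX(date_in_shop) ... GROUP BY vin)` also matches rows whose date merely equals some other vin's latest date. Below is a minimal sketch using sqlite3 with a hypothetical shopvisit table (the rows are invented) that restricts the update to special_vins and uses a correlated subquery. It is a stand-in for the PostgreSQL table, not Django-generated SQL.

```python
import sqlite3

# Hypothetical shopvisit table; boolfield stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shopvisit (id INTEGER PRIMARY KEY, vin TEXT, "
    "date_in_shop TEXT, boolfield INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO shopvisit (vin, date_in_shop) VALUES (?, ?)",
    [("A", "2020-01-01"), ("A", "2020-03-01"),
     ("B", "2020-03-01"), ("B", "2020-05-01"),  # B's older visit shares A's latest date
     ("C", "2020-02-01")],
)

special_vins = ["A", "B"]  # hypothetical values
placeholders = ",".join("?" for _ in special_vins)

# Correlated subquery: each row is compared with the MAX date for *its own* vin.
conn.execute(
    f"""
    UPDATE shopvisit
    SET boolfield = 1
    WHERE vin IN ({placeholders})
      AND date_in_shop = (
          SELECT MAX(s2.date_in_shop) FROM shopvisit AS s2
          WHERE s2.vin = shopvisit.vin
      )
    """,
    special_vins,
)
conn.commit()

flagged = conn.execute(
    "SELECT vin, date_in_shop FROM shopvisit WHERE boolfield = 1 ORDER BY vin"
).fetchall()
```

The whole update is one statement, so no per-row iteration over the roughly 2M records is needed; an index on (vin, date_in_shop) would likely help the subquery.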

query.group_by in Django 1.9

I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
    group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each 'group_id'; due to the way I use it, any table row with the group_id would do as a representative.
With 1.9 this code is broken. The query after the second line above is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_myreport"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
Even though in the debugger I see
query.group_by = ['group_id']
query.group_by doesn't appear to be a documented method in 1.9, and the 1.7 to 1.9 changelogs don't suggest that anything changed.
Is there a better way, not depending on Django internals, to write this query?
Or is there any way to fix my current query?
You can use order_by() to sort the results, and in the same query you can also order by a second criterion.
If you want to get the groups, you will need to iterate over the collection to retrieve those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby, which does the grouping in memory instead. It only groups consecutive items, so order the queryset by group_id first, and avoid it for large data sets.
b) Manager.raw(), although you will need to write SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reports_myreport GROUP BY group_id'):
    print(report)
This works for large data sets, but it is not portable across database engines: PostgreSQL (and MySQL with ONLY_FULL_GROUP_BY enabled, the default since 5.7) rejects SELECT * with GROUP BY group_id because the other columns are neither grouped nor aggregated, while SQLite accepts it and returns an arbitrary row per group.
Bonus: I recommend understanding exactly what the old code did before rewriting it.
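Option a) can be sketched like this, assuming the rows have already been fetched (the dicts below are hypothetical stand-ins for model instances or .values() rows). It keeps the first row of each group_id as the representative, which matches "any row would do". On PostgreSQL only, Django's order_by('group_id').distinct('group_id') should push the same selection into the database instead.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, e.g. from MyReport.objects.filter(owner=...).values(...)
rows = [
    {"id": 1, "group_id": 10, "name": "a"},
    {"id": 2, "group_id": 10, "name": "b"},
    {"id": 3, "group_id": 20, "name": "c"},
    {"id": 4, "group_id": 30, "name": "d"},
    {"id": 5, "group_id": 30, "name": "e"},
]


def one_per_group(rows, key="group_id"):
    # groupby only merges *consecutive* equal keys, so sort first
    # (or rely on ORDER BY group_id in the query). sorted() is stable,
    # so the first row of each group keeps its original relative order.
    ordered = sorted(rows, key=itemgetter(key))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


representatives = one_per_group(rows)
```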

Django query with AVG and GROUP BY

My Django-foo isn't quite up to par to translate certain raw sql into the ORM.
Currently I am executing:
SELECT avg(<value_to_be_averaged>), <id_to_group_on>
FROM <table_name>
WHERE start_time >= <timestamp>
GROUP BY <id_to_group_on>;
In Django I can do:
Model.objects.filter(start_time__gte=<timestamp>).aggregate(Avg('<value_to_be_averaged>'))
but that is for all objects in the query and doesn't return a query set that is grouped by the id like in the raw SQL above. I've been fiddling with .annotate() but haven't made much progress. Any help would be appreciated!
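A minimal sketch of what the raw SQL computes, using sqlite3 with a hypothetical reading table (sensor_id plays the role of the grouping id and value is the averaged column). WHERE filters rows first, then GROUP BY returns one average per id. In the Django ORM, the documented way to get this grouping is to call values() on the grouping field before annotate(), e.g. Model.objects.filter(start_time__gte=ts).values('sensor_id').annotate(avg_value=Avg('value')); the field names there are placeholders.

```python
import sqlite3

# Hypothetical table: (sensor_id, value, start_time as an integer timestamp).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (sensor_id INTEGER, value REAL, start_time INTEGER)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [(1, 10.0, 100), (1, 20.0, 200), (2, 5.0, 150), (2, 7.0, 250),
     (1, 99.0, 50)],  # too old: excluded by the WHERE clause
)


def averages_since(conn, ts):
    # Filter by timestamp, then produce one (sensor_id, average) row per group.
    return conn.execute(
        "SELECT sensor_id, AVG(value) AS avg_value FROM reading "
        "WHERE start_time >= ? GROUP BY sensor_id ORDER BY sensor_id",
        (ts,),
    ).fetchall()


result = averages_since(conn, 100)
```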

Django - Selecting related set : how many times does it hit the database?

I took this sample code from here: Django ORM: Selecting related set
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
My question is very simple: does this hit the database twice when I finally use the queryset choices?
It will be one query, containing an inner SELECT. If you want to debug it, you can either use the marvellous django-debug-toolbar or do something like print(choices.query), which outputs the raw SQL of your query.
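To illustrate why the nested filter is still a single round trip, here is a sketch with sqlite3 and hypothetical poll/choice tables. The trace callback records exactly one statement because the poll filter is embedded as an inner SELECT, roughly the shape Django generates for poll__in=<queryset>.

```python
import sqlite3

# Hypothetical poll/choice schema with a little sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE poll (id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE choice (id INTEGER PRIMARY KEY,
                     poll_id INTEGER REFERENCES poll(id), text TEXT);
INSERT INTO poll VALUES (1, 'foo'), (2, 'bar'), (3, 'foo');
INSERT INTO choice VALUES (1, 1, 'yes'), (2, 2, 'no'), (3, 3, 'maybe');
""")

executed = []
conn.set_trace_callback(executed.append)  # record each statement sent to SQLite
# One statement: the inner SELECT on poll is evaluated by the database itself.
choices = conn.execute(
    "SELECT id, text FROM choice "
    "WHERE poll_id IN (SELECT id FROM poll WHERE category = ?) ORDER BY id",
    ("foo",),
).fetchall()
conn.set_trace_callback(None)
```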
