Django - Selecting related set: how many times does it hit the database?

I took this sample code from here: Django ORM: Selecting related set
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)
My question is very simple: do you hit the database twice when you finally use the queryset choices?

It will be one query, containing an inner SELECT. If you want to do some debugging on that, you could either use the marvellous django-debug-toolbar, or do something like print(str(choices.query)), which will output the raw SQL of your query.
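For example, a minimal sketch reusing the queryset from the question (the exact table and column names in the output depend on your app):

polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)

# Nothing has hit the database yet; printing .query only compiles the SQL.
print(choices.query)
# Expect a single statement with a nested subquery, roughly:
# SELECT ... FROM "app_choice" WHERE "app_choice"."poll_id" IN
#     (SELECT "app_poll"."id" FROM "app_poll" WHERE "app_poll"."category" = foo)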

Related

FastAPI in-memory filtering

I'm following the tutorial here: https://github.com/Jastor11/phresh-tutorial/tree/tutorial-part-11-marketplace-functionality-in-fastapi/backend/app and I had a question: I want to filter a model by different parameters so how would I do that?
The current situation is that I have a list of doctors and so I get all of them. Then depending on the filter query parameters, I filter doctors. I can't just do it all in one go because these query parameters are optional.
So I was thinking something like this (pseudocode):
all_doctors = await self.db.fetch_all(query=GET_ALL_DOCTORS)
if language_id:
    all_doctors = all_doctors.filter(d => d.language_id == language_id)
if area:
    all_doctors = all_doctors.xyzabc
I'm trying out FastAPI according to that tutorial and couldn't figure out how to do this.
I have defined a model file for different models and am using SQLAlchemy.
One way I thought of is just getting the ids of all the doctors, then at each filtering step passing in the doctor ids from the previous step and funneling them through different SQL queries. But that is filtering using the database, and it would result in one more query per filter parameter. I want to know how to use the ORM to filter in memory.
EDIT: So basically, in the tutorial I was following, no SQLAlchemy models were defined; the tutorial was using raw SQL statements. Anyway, to answer my own question: I would first need to define SQLAlchemy models before I can use them.
Each filter call on a SQLAlchemy Query returns a Query, so you can keep building out the query conditionally inside if-statements:
query = db_session.query(Doctor)
if language_id:
    query = query.filter(Doctor.language_id == language_id)
if area_id:
    query = query.filter(Doctor.area_id == area_id)
return query.all()
The query doesn't run until you call .all() at the end. If neither argument is given, you'll get all the doctors.
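For completeness, here is a minimal self-contained sketch of how that could look once a Doctor model is defined; the column names and session setup below are assumptions for illustration, not part of the tutorial (SQLAlchemy 1.4+ imports):

from typing import Optional

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical model for illustration; adjust the fields to match your schema.
class Doctor(Base):
    __tablename__ = "doctors"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    language_id = Column(Integer)
    area_id = Column(Integer)

def get_doctors(db_session: Session,
                language_id: Optional[int] = None,
                area_id: Optional[int] = None):
    # Build the query conditionally; nothing hits the database until .all().
    query = db_session.query(Doctor)
    if language_id is not None:
        query = query.filter(Doctor.language_id == language_id)
    if area_id is not None:
        query = query.filter(Doctor.area_id == area_id)
    return query.all()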

Django querysets optimization - preventing selection of annotated fields

Let's say I have following models:
class Invoice(models.Model):
    ...

class Note(models.Model):
    invoice = models.ForeignKey(Invoice, related_name='notes', on_delete=models.CASCADE)
    text = models.TextField()
and I want to select Invoices that have some notes. I would write it using annotate/Exists like this:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True)
This works well enough and filters only the Invoices that have notes. However, it results in the annotated field being present in the query result, which I don't need, and it means worse performance (SQL has to execute the subquery twice).
I realize I could write this using extra(where=) like this:
Invoice.objects.extra(where=['EXISTS(SELECT 1 FROM note WHERE invoice_id=invoice.id)'])
which would result in the ideal SQL, but using extra() / raw SQL is generally discouraged.
Is there a better way to do this?
You can remove annotations from the SELECT clause using the .values() queryset method. The trouble with .values() is that you have to enumerate all the names you want to keep instead of the names you want to skip, and .values() returns dictionaries instead of model instances.
Django internally keeps track of removed annotations in QuerySet.query.annotation_select_mask, so you can use it to tell Django which annotations to skip even without .values():
class YourQuerySet(QuerySet):
    def mask_annotations(self, *names):
        if self.query.annotation_select_mask is None:
            self.query.set_annotation_mask(set(self.query.annotations.keys()) - set(names))
        else:
            self.query.set_annotation_mask(self.query.annotation_select_mask - set(names))
        return self
Then you can write:
invoices = (Invoice.objects
    .annotate(has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
    .filter(has_notes=True)
    .mask_annotations('has_notes')
)
to skip has_notes from the SELECT clause while still getting filtered Invoice instances. The resulting SQL query will be something like:
SELECT invoice.id, invoice.foo FROM invoice
WHERE EXISTS(SELECT note.id, note.bar FROM notes WHERE note.invoice_id = invoice.id) = True
Just note that annotation_select_mask is an internal Django API that can change in future versions without warning.
OK, I've just noticed in the Django 3.0 docs that they've updated how Exists works: it can now be used directly in filter():
Invoice.objects.filter(Exists(Note.objects.filter(invoice_id=OuterRef('pk'))))
This will ensure that the subquery is not added to the SELECT columns, which may result in better performance.
Changed in Django 3.0:
In previous versions of Django, it was necessary to first annotate and then filter against the annotation. This resulted in the annotated value always being present in the query result, and often resulted in a query that took more time to execute.
Still, if someone knows a better way for Django 1.11, I would appreciate it. We really need to upgrade :(
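For reference, a small sketch of the Django 3.0+ usage described above; the Exists expression can also be negated with ~ to get invoices without notes:

from django.db.models import Exists, OuterRef

notes_for_invoice = Note.objects.filter(invoice_id=OuterRef('pk'))

# Invoices that have at least one note (no annotation ends up in SELECT):
with_notes = Invoice.objects.filter(Exists(notes_for_invoice))

# Invoices that have no notes:
without_notes = Invoice.objects.filter(~Exists(notes_for_invoice))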
We can filter for Invoices that have a non-NULL Note when we perform a LEFT OUTER JOIN, and make the query distinct (to avoid returning the same Invoice twice).
Invoice.objects.filter(notes__isnull=False).distinct()
This is the most optimized code if you want to get data from another table whose primary key reference is stored in another table:
Invoice.objects.filter(note__invoice_id=OuterRef('pk'),)
We should be able to clear the annotated field using the method below:
Invoice.objects.annotate(
    has_notes=Exists(Note.objects.filter(invoice_id=OuterRef('pk')))
).filter(has_notes=True).query.annotations.clear()

How can I get the last nth item of a query on PostgreSQL

Question:
How can I get the last 750 records of a query at the database level?
Here is What I have tried:
# Get last 750 applications
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
This query fetches all records that match the query_params filter and after that returns the last 750 records. So I want to do this work at the database level, like MongoDB aggregate queries. Is it possible?
Thanks.
Actually, that's not how Django works. The limit part is also done at the database level.
Django docs - Limiting QuerySets:
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query.
To see what query is actually being run in the database you can simply print the query like this:
apps = MyModel.active_objects.filter(
    **query_params
).order_by('-created_at').values_list('id', flat=True)[:750]
print(apps.query)
The result will be something like this:
SELECT * FROM "app_mymodel" WHERE <...> ORDER BY "app_mymodel"."created_at" DESC LIMIT 750
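Slicing with a start index works the same way; it is compiled into OFFSET/LIMIT rather than filtered in Python. A small sketch:

apps = MyModel.active_objects.filter(**query_params).order_by('-created_at')

newest_750 = apps.values_list('id', flat=True)[:750]      # ... LIMIT 750
next_750 = apps.values_list('id', flat=True)[750:1500]    # ... LIMIT 750 OFFSET 750

# Only when a queryset is iterated (e.g. list(newest_750)) does the SQL actually run.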

query.group_by in Django 1.9

I am moving code from Django 1.6 to 1.9.
In 1.6 I had this code
models.py
class MyReport(models.Model):
    group_id = models.PositiveIntegerField(blank=False, null=False)
views.py
query = MyReport.objects.filter(owner=request.user).query
query.group_by = ['group_id']
entries = QuerySet(query=query, model=MyReport)
The query would return one object for each 'group_id'; due to the way I use it, any table row with the group_id would do as a representative.
With 1.9 this code is broken. The query after the second line above is:
SELECT "reports_myreport"."group_id", ... etc FROM "reports_myreport" WHERE "reports_myreport"."owner_id" = 1 GROUP BY "reports_myreport"."group_id", "reports_report"."otherfield", ...
Basically it lists all the table fields in the group by clause, making the query return the whole table.
Even though in the debugger I see
query.group_by = ['group_id']
It doesn't look like query.group_by is a method in 1.9, nor do the change logs of 1.7-1.9 suggest that something changed.
Is there a better way - not depending on internal Django stuff - I can use for my query?
Any way to fix my current query?
You can use order_by() to get the results ordered; in that same query you can order by a second criterion.
If you want to get the groups, you will need to iterate over the collection to retrieve those values.
If you consume all of the results returned by the query, you can consider:
a) itertools.groupby, which does the group by in memory instead (see the sketch after this list), but you should not use it for large data sets.
b) Another option is to use Manager.raw() but you will need to write SQL inside Django, like this:
for report in MyReport.objects.raw('SELECT * FROM reporting_report GROUP BY group_id'):
    print(report)
This will work for large data sets, but you could lose compatibility with some database engines.
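A minimal sketch of option (a), assuming the result set is small enough to hold in memory and that any row in a group can serve as its representative (as the question states):

from itertools import groupby

# groupby() only groups consecutive items, so sort by the same key first.
reports = MyReport.objects.filter(owner=request.user).order_by('group_id')

representatives = []
for group_id, rows in groupby(reports, key=lambda r: r.group_id):
    representatives.append(next(rows))  # any row from the group will do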
Bonus: I recommend understanding exactly what the old code did before doing a rewrite.

Programming error with order by and distinct on Django with PostgreSQL

There are a lot of similar posts, but none that hit exactly what I am trying to get at. I get that a distinct has to use the same fields as order_by, which is fine.
So I have the following query:
q = MyModel.objects.order_by('field1', 'field2', '-tstamp').distinct('field1', 'field2')
Ultimately I am trying to find the latest entry in the table for all combinations of field1 and field2. The order_by does what I think it should, and that's great. But when I do the distinct I get the following error.
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Ultimately this seems like a SQL problem (not a Django one). However, looking at the Django docs for distinct(), they show what can and can't work. They do say that
q = MyModel.objects.order_by('field1', 'field2', '-tstamp').distinct('field1')
will work (...and it does). But I don't understand why, when I add field2 in the same order as in the order_by, I still get the same error. Any help would be greatly appreciated.
EDIT: I also notice that if I do
q = MyModel.objects.order_by('field1', 'field2', '-tstamp').distinct('field1', 'field2', 'tstamp') # with or without the - in the order_by
It still raises the same error, though the docs suggest this should work just fine
I was able to get the query to run properly by using pk's in the order_by():
q = MyModel.objects.order_by('field1__pk', 'field2__pk', '-tstamp').distinct('field1__pk', 'field2__pk')
Apparently when the fields are not ordinary types, order_by doesn't play super nice. However, using the pk's of the objects seems to work.
Test this one:
q = MyModel.objects.order_by('field1__id', 'field2__id', '-tstamp').distinct('field1__id', 'field2__id')
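Either way, printing the compiled SQL makes it easy to check whether the DISTINCT ON columns really match the leading ORDER BY expressions (PostgreSQL requires the former to be a prefix of the latter):

q = MyModel.objects.order_by('field1', 'field2', '-tstamp').distinct('field1', 'field2')
print(q.query)  # compiles the SQL without executing it, so no ProgrammingError yet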
