I have a simple Python/Django class:
from django.db import models

class myModel(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()
I want to fetch two rows from my database: the newest row overall, and the newest row with a positive value. I can do it like this:
myModel.objects.all().order_by('-date')[:1][0]
myModel.objects.filter(value__gte=0).order_by('-date')[:1][0]
Note the [:1][0] at the end: I use it so that the database engine does the limiting. The problem is that I still need two queries, and I would like to combine them into a single one (something like [:2] at the end that produces the result I want). I know about Django's Q objects, but I can't figure out how to use them in this context. Maybe some raw SQL? I'm open to ideas. :)
This looks like premature optimisation to me. Is two queries instead of one really so bad? At the moment, anyone who knows the Django ORM can understand your two queries. After you've replaced them with some funky raw SQL, that might not be the case.
You should use [0] instead of [:1][0]. Django knows how to slice querysets efficiently -- both queries will result in the exact same SQL.
This doesn't fully answer your question, but you can drop both the [:1][0] and the order_by by using the QuerySet method latest(). It returns the latest object in the queryset, using the field name you pass as the date field, e.g. myModel.objects.latest('date').
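On the raw SQL idea from the question: below is a minimal sketch of fetching both rows in one statement, using the standard library's sqlite3 module as a stand-in for the Django database. The table name mymodel and the sample rows are assumptions made up for illustration. The sketch filters with value > 0 (strictly positive), whereas the ORM query in the question uses value__gte=0.

```python
import sqlite3

# In-memory database standing in for the real one (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mymodel (date, value) VALUES (?, ?)",
    [
        ("2020-01-01 10:00:00", 5),
        ("2020-01-02 10:00:00", 7),
        ("2020-01-03 10:00:00", -3),
    ],
)

# Each branch picks one row in a subquery; UNION ALL glues the two rows together.
# The 'kind' column labels which branch a row came from.
SQL = """
SELECT 'newest' AS kind, id, date, value
FROM (SELECT id, date, value FROM mymodel ORDER BY date DESC LIMIT 1)
UNION ALL
SELECT 'newest_positive' AS kind, id, date, value
FROM (SELECT id, date, value FROM mymodel WHERE value > 0 ORDER BY date DESC LIMIT 1)
"""

rows = {kind: (date, value) for kind, _id, date, value in conn.execute(SQL)}
```

Whether the saved round trip is worth the loss in readability is exactly the trade-off raised in the answer about premature optimisation.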
The title says it all.
Let's take a look at this code for example:
objs = Model.objects.prefetch_related('model2').filter()
objs.first().model2_set.first().field
vs
objs = Model.objects.filter().prefetch_related('model2')
objs.first().model2_set.first().field
Question
When prefetch_related() comes first, does Django fetch all the ManyToOne/ManyToMany relations without taking .filter() into consideration, and apply the filter only after everything has been fetched?
IMO, that doesn't matter since there's still one query executed at the end.
Thanks in advance.
It doesn't matter where you specify prefetch_related, as long as it's before any records are fetched. Personally, I put things like prefetch_related, select_related, and only at the end of the chain, but that just feels more expressive to me from a code readability perspective.
But that is not true of all manager methods. Some methods do have different effects depending on their position in the chain, for example order_by can have positional significance when used with distinct (group by).
Django 1.11 introduced a cool feature for combining querysets.
You can do something like:
combined_qs = qs1.union(qs2, qs3)
And voila! combined_qs has all querysets together keeping their order (btw, am I right here?). I need this, as qs1 should be the first, qs2 the second and so on.
But the challenging part is that those querysets can't be ordered by any field. As I understand it, this happens because MySQL ignores ORDER BY clauses inside the parts of a UNION, so it can only be done via a subquery. And I don't see how that can be done with the Django ORM without too much raw SQL magic.
Any ideas on how those querysets can be unioned with respect to their order_by() statements?
Upd.
It seems I saw what I wanted to see rather than what is actually there. Perhaps I should stay with django-querysetsequence after all. It is a good library, but I had hoped we had finally got a native method.
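One general SQL technique for keeping the parts of a union in a fixed sequence is to give each branch a constant "source rank" column and sort the combined result by it. The sketch below shows this with the standard library's sqlite3 module. The item table, the kind values, and the src column are all hypothetical names chosen for illustration. It is not what the poster ended up using, and other databases may behave differently. In the ORM this might correspond to annotating each queryset with a constant Value() before union() and ordering by that annotation, which is an assumption to verify against your Django version and backend.

```python
import sqlite3

# Hypothetical table; 'kind' plays the role of the filters behind qs1 and qs2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, kind TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO item (kind, name) VALUES (?, ?)",
    [("b", "pear"), ("a", "zebra"), ("b", "apple"), ("a", "mango")],
)

# The constant 'src' column records which branch a row came from.
# Sorting by src first keeps branch 1 ahead of branch 2; name orders rows within a branch.
SQL = """
SELECT 1 AS src, name FROM item WHERE kind = 'a'
UNION ALL
SELECT 2 AS src, name FROM item WHERE kind = 'b'
ORDER BY src, name
"""

names = [name for _src, name in conn.execute(SQL)]
```

Here the ORDER BY applies to the whole union rather than to an individual branch, which sidesteps the problem of per-branch ORDER BY clauses being ignored.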
I'd like to know how Django's order_by works when the order_by field has the same value for a set of records. Say I have a score field in the DB and I'm ordering the queryset using order_by('score'). How will records with the same score be arranged?
Every time, they're ordered randomly within the subset of records that share a score, and this breaks pagination on the client side. Is there a way to override this and return the records in a consistent order?
I'm using Django 1.4 and PostgreSQL.
As the other answers correctly explain, order_by() accepts multiple arguments. I'd suggest using something like:
qs.order_by('score', 'pk')  # where qs is your queryset
I recommend using 'pk' (or '-pk') as the last argument in these cases, since every model has a pk field and its value is never the same for 2 records.
order_by can take multiple parameters. I think order_by('score', '-create_time') will always return the rows in the same order.
If I understand correctly, you need a consistently ordered result set every time. You can use something like order_by('score', 'id'), which orders by score first and then by the auto-increment id among rows with the same score, so your output is consistent. See the order_by documentation. You need to be explicit in order_by if you want the same result set every time, and using 'id' is one way to do that.
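The tie-breaker advice in these answers can be sketched at the SQL level with the standard library's sqlite3 module. The entry table, the sample scores, and the page() helper are assumptions for illustration. The point is that ORDER BY score, id defines a total order, so LIMIT/OFFSET pages never overlap or skip rows.

```python
import sqlite3

# Hypothetical table with repeated scores, so ties exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO entry (score) VALUES (?)",
    [(10,), (20,), (10,), (20,), (10,)],
)

def page(number, size=2):
    """Return one page of (id, score) rows, ordered by score with id as tie-breaker."""
    offset = (number - 1) * size
    return conn.execute(
        "SELECT id, score FROM entry ORDER BY score, id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()

pages = [page(1), page(2), page(3)]
```

With ORDER BY score alone, the database is free to return rows with equal scores in any order, which is what breaks pagination in the question.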
Is there any way to remove select_related from a queryset?
I found that Django adds a JOIN to the SQL query for a count() operation.
So, if we have code like this:
entities = Entities.objects.select_related('subentity').all()
# We will have an INNER JOIN here.
entities.count()
I'm looking for a way to remove the join.
One important detail: I pass this queryset to Django's paginator, so I can't simply write
Entities.objects.all().count()
I believe this code comment provides a relatively good answer to the general question that is asked here:
If select_related(None) is called, the list is cleared.
https://github.com/django/django/blob/stable/1.8.x/django/db/models/query.py#L735
In the general sense, if you want to do something to the entities queryset but first remove the select_related items from it, use entities.select_related(None).
However, that probably doesn't solve your particular situation with the paginator. If you do entities.count(), it will already remove the select_related items. If you still find extra JOINs taking place, several non-ideal factors could be responsible. For example, the ORM may fail to remove the JOIN because of other logic that may or may not affect the count when combined with select_related.
As a simple example of one of these non-ideal cases, consider Foo.objects.select_related('bar').count() versus Foo.objects.select_related('bar').distinct().count(). It might be obvious to you that the original queryset does not contain multiple entries, but it is not obvious to the Django ORM. As a result, the SQL that executes contains a JOIN, and there is no universal prescription to work around that. Even applying .select_related(None) will not help you.
Can you show the code where you need this? I think refactoring is the best answer here.
If you want a quick answer, you can set entities.query.select_related = False, but that is rather hacky (and don't forget to restore the value if you need select_related later).
In general, is there a type of Model query you look for to optimize by indexing a field (db_index=True)?
In case it's relevant: I'm using MySQL.
Elaboration:
Although I appreciate the responses already given, I was looking more for advice such as this one from a colleague:
You should definitely index the fields in your default ordering and any field you use for filtering.
Think that about covers it?
Install django-debug-toolbar
Look at the SQL panel, look for long-running queries
Index the columns those queries filter, join, or sort on
If you need help with the queries, try the "EXPLAIN" MySQL command on the query.
Basically, you should index fields that are searched on often. For example, if you have a user table, then username could be indexed if things are constantly queried by username. There are trade-offs, of course: every index takes extra space and slows down writes.
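The effect of indexing a frequently searched column can be checked by looking at the query plan before and after creating the index. The sketch below uses the standard library's sqlite3 module and its EXPLAIN QUERY PLAN statement as a stand-in for MySQL's EXPLAIN. The users table, the index name idx_users_username, and the query_plan() helper are assumptions for illustration. In Django, db_index=True on the field creates a comparable index.

```python
import sqlite3

# Hypothetical user table with 1000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@example.com") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return SQLite's plan description for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)

LOOKUP = "SELECT email FROM users WHERE username = ?"

# Without an index, the lookup has to scan the whole table.
plan_before = query_plan(LOOKUP, ("user42",))

# With an index on username, the planner can search the index instead.
conn.execute("CREATE INDEX idx_users_username ON users (username)")
plan_after = query_plan(LOOKUP, ("user42",))
```

Printing plan_before and plan_after shows the switch from a table scan to an index search. The exact wording of the plan text varies between SQLite versions.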