Django 1.11 introduced cool feature to combine querysets.
You can make something like:
combined_qs = qs1.union(qs2, qs3)
And voila! combined_qs has all querysets together keeping their order (btw, am I right here?). I need this, as qs1 should be the first, qs2 the second and so on.
But the challenging part is those querysets can't be ordered by any field. As I get it, it happens due MySql ignores any ORDER BY statements in UNION, and it can be done only via subquery. And I don't get it how it can be done via Django ORM without too much raw SQL magic.
Any ideas on how those querysets can be unioned with respect to their order_by() statements?
Upd.
Seems like I saw what I wanted to see and not what it is. Perhaps, I still should stay with django-querysetsequence. Which is a good lib, but I had a hope we have finally got the native method.
Related
The title says it all.
Let's take a look at this code for example:
objs = Model.objects.prefetch_related('model2').filter()
objs.first().model2_set.first().field
vs
objs = Model.objects.filter().prefetch_related('model2')
objs.first().model2_set.first().field
Question
When using prefetch_related() first, does Django fetch all the ManyToOne/ManyToMany relations without taking into consideration .filter() and after everything is fetched, the filter is applied?
IMO, that doesn't matter since there's still one query executed at the end.
Thanks in advance.
It doesn't matter where you specify prefetch_related as long as it's before any records are fetched. Personally I put things like prefetch_related, select_related. and only at the end of the chain but that just feels more expressive to me from a code readability perspective.
But that is not true of all manager methods. Some methods do have different effects depending on their position in the chain, for example order_by can have positional significance when used with distinct (group by).
The following query:
SELECT L1.task_id FROM task_log L1
LEFT JOIN task_log L2 ON L1.started BETWEEN L2.started AND L2.ended
WHERE L2.task_id IS NULL;
Simply reads all task logs that were started when no other tasks were running. An abstraction in PHP/ZF2 looks like this:
$db->select(array('L1' => 'task_log'))
->columns('task_id')
->join(array('L2' => 'task_log'), "L1.started BETWEEN L2.started AND L2.ended", array(), 'left');
I was looking for something similar in Django, or at least python. As per this discussion, it is (understandably) out of question to build this in django models. Are there any other alternatives in python/django?
There seem to be these options:
(A) RawSQL with string templates and maybe some generic model field reflection code to make it a bit more generic.
(B) Django's Expression API which might allow you to re-use some of the Django ORM stuff and only extend it in order to make SELF JOINs work. Though I'm wondering why there wouldn't be some Django module already. Maybe it's too rare of a use case.
Looking up i found this: http://docs.sqlalchemy.org/en/latest/core/tutorial.html which is exactly what i needed.
I have a simple Python/Django class:
class myModel(models.Model):
date = models.DateTimeField()
value = models.IntegerField()
and I want to get two elements from my database. First is the newest element and the second is newest positive element. So I can do this like this:
myModel.objects.all().order_by('-date')[:1][0]
myModel.objects.filter(value__gte = 0).order_by('-date')[:1][0]
Note those [:1][0] at the end - this is because I want to get maximum use of database sql engine. The thing is that I still need two queries and I want to combine it into a single one (something like [:2] at the end which will produce the result I want). I know about Django's Q, but can't figure out how to use it in this context. Maybe some raw sql? I'm waiting for ideas. :)
This looks like premature optimisation to me. Is two queries instead of one really so bad? At the moment, anyone who knows the Django ORM can understand your two queries. After you've replaced it with some funky raw SQL, that might not be the case.
You should use [0] instead of [:1][0]. Django knows how to slice querysets efficiently -- both queries will result in the exact same SQL.
This doesn't fully answer your question, but you can get rid of those [:1][0] and order_by by using latest QuerySet method, it will return the latest element in the QuerySet using the argument provided as a date field.
Is there any way to remove select related from queryset?
I found, that django add JOIN on count() operation to sql query.
So, if we have code like this:
entities = Entities.objects.select_related('subentity').all()
#We will have INNER JOIN here..
entities.count()
I'm looking for a way to remove join.
One important detail - I got this queryset into django paginator, so I can't simply write
Entities.objects.all().count()
I believe this code comments provide a relatively good answer to the general question that is asked here:
If select_related(None) is called, the list is cleared.
https://github.com/django/django/blob/stable/1.8.x/django/db/models/query.py#L735
In the general sense, if you want to do something to the entities queryset, but first remove the select_related items from it, entities.select_related(None).
However, that probably doesn't solve your particular situation with the paginator. If you do entries.count(), then it already will remove the select_related items. If you find yourself with extra JOINs taking place, then it could be several non-ideal factors. It could be that the ORM fails to remove it because of other logic that may or may not affect the count when combined with the select_related.
As a simple example of one of these non-ideal cases, consider Foo.objects.select_related('bar').count() versus Foo.objects.select_related('bar').distinct().count(). It might be obvious to you that the original queryset does not contain multiple entries, but it is not obvious to the Django ORM. As a result, the SQL that executes contains a JOIN, and there is no universal prescription to work around that. Even applying .select_related(None) will not help you.
Can you show the code where you need this, I think refactoring is the best answer here.
If you want quick answer, entities.query.select_related = False, but it's rather hacky (and don't forget to restore the value if you will need select_related later).
Setup:
Django 1.1.2, MySQL 5.1
Problem:
Blob.objects.filter(foo = foo) \
.filter(status = Blob.PLEASE_DELETE) \
.delete()
This snippet results in the ORM first generating a SELECT * from xxx_blob where ... query, then doing a DELETE from xxx_blob where id in (BLAH); where BLAH is a ridiculously long list of id's. Since I'm deleting a large amount of blobs, this makes both me and the DB very unhappy.
Is there a reason for this? I don't see why the ORM can't convert the above snippet into a single DELETE query. Is there a way to optimize this without resorting to raw SQL?
For those who are still looking for an efficient way to bulk delete in django, here's a possible solution:
The reason delete() may be so slow is twofold: 1) Django has to ensure cascade deleting functions properly, thus looking for foreign key references to your models; 2) Django has to handle pre and post-save signals for your models.
If you know your models don't have cascade deleting or signals to be handled, you can accelerate this process by resorting to the private API _raw_delete as follows:
queryset._raw_delete(queryset.db)
More details in here. Please note that Django already tries to make a good handling of these events, though using the raw delete is, in many situations, much more efficient.
Not without writing your own custom SQL or managers or something; they are apparently working on it though.
http://code.djangoproject.com/ticket/9519
Bulk delete is already part of django
Keep in mind that this will, whenever possible, be executed purely in SQL