Django. Remove select_related from queryset

Is there any way to remove select_related from a queryset?
I found that Django adds a JOIN to the SQL query on a count() operation.
So, if we have code like this:
entities = Entities.objects.select_related('subentity').all()
# We will have an INNER JOIN here.
entities.count()
I'm looking for a way to remove the join.
One important detail: I pass this queryset to the Django paginator, so I can't simply write
Entities.objects.all().count()

I believe this code comment provides a relatively good answer to the general question asked here:
If select_related(None) is called, the list is cleared.
https://github.com/django/django/blob/stable/1.8.x/django/db/models/query.py#L735
In the general sense, if you want to do something to the entities queryset but first remove the select_related items from it, use entities.select_related(None).
However, that probably doesn't solve your particular situation with the paginator. If you call entities.count(), it will already remove the select_related items. If you still see extra JOINs, several non-ideal factors could be at play. It could be that the ORM fails to remove the join because of other logic that may or may not affect the count when combined with select_related.
As a simple example of one of these non-ideal cases, consider Foo.objects.select_related('bar').count() versus Foo.objects.select_related('bar').distinct().count(). It might be obvious to you that the original queryset contains no duplicate rows, but it is not obvious to the Django ORM. As a result, the SQL that executes contains a JOIN, and there is no universal prescription to work around that. Even applying .select_related(None) will not help you.
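To see why dropping the JOIN is safe for a plain count, here is a minimal sketch using Python's standard-library sqlite3 module rather than Django. The entity/subentity tables are hypothetical stand-ins for the models in the question. For a nullable foreign key, select_related uses a LEFT OUTER JOIN (for a NOT NULL key it uses an INNER JOIN, which likewise matches exactly one row). A to-one join along a foreign key to a primary key matches at most one row per entity, so it cannot change COUNT(*):

```python
import sqlite3

# Hypothetical schema standing in for the Entities/subentity models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subentity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity (
        id INTEGER PRIMARY KEY,
        subentity_id INTEGER REFERENCES subentity(id)
    );
    INSERT INTO subentity (id, name) VALUES (1, 'a'), (2, 'b');
    -- Two entities share subentity 1; entity 4 has no subentity.
    INSERT INTO entity (id, subentity_id) VALUES (1, 1), (2, 1), (3, 2), (4, NULL);
""")


def count(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]


# What count() on the plain queryset needs.
plain = count("SELECT COUNT(*) FROM entity")

# The same count with a to-one join bolted on, similar to what select_related adds.
# Each entity row matches at most one subentity row, so no rows are duplicated
# (and the LEFT OUTER JOIN keeps entity 4 even though its FK is NULL).
joined = count(
    "SELECT COUNT(*) FROM entity "
    "LEFT OUTER JOIN subentity ON entity.subentity_id = subentity.id"
)

print(plain, joined)  # 4 4
```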

Can you show the code where you need this? I think refactoring is the best answer here.
If you want a quick answer, entities.query.select_related = False works, but it's rather hacky (and don't forget to restore the value if you need select_related later).

Related

Does it matter in which order you use prefetch_related and filter in Django?

The title says it all.
Let's take a look at this code for example:
objs = Model.objects.prefetch_related('model2').filter()
objs.first().model2_set.first().field
vs
objs = Model.objects.filter().prefetch_related('model2')
objs.first().model2_set.first().field
Question
When using prefetch_related() first, does Django fetch all the ManyToOne/ManyToMany relations without taking into consideration .filter() and after everything is fetched, the filter is applied?
IMO, that doesn't matter since there's still one query executed at the end.
Thanks in advance.

It doesn't matter where you specify prefetch_related as long as it's before any records are fetched. Personally, I put things like prefetch_related, select_related and only at the end of the chain, but that is purely because it reads more clearly to me.
But that is not true of all QuerySet methods. Some do have different effects depending on their position in the chain: for example, annotate() behaves differently depending on whether it comes before or after values() (which determines the GROUP BY), or before or after filter().
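The position does not matter because, per the Django docs, prefetch_related() does a separate lookup for each relationship and does the "joining" in Python, after the main (already filtered) query has run. So this is two queries rather than one. Here is a minimal sketch of that two-query pattern with the standard-library sqlite3 module; the table and column names (model, model2, active, field) are hypothetical, and this is not Django's actual implementation:

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables standing in for Model and Model2 (Model2 has a FK to Model).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model (id INTEGER PRIMARY KEY, active INTEGER);
    CREATE TABLE model2 (id INTEGER PRIMARY KEY, model_id INTEGER, field TEXT);
    INSERT INTO model VALUES (1, 1), (2, 0), (3, 1);
    INSERT INTO model2 VALUES (10, 1, 'x'), (11, 1, 'y'), (12, 2, 'z'), (13, 3, 'w');
""")

# Query 1: the main queryset, with the filter applied in SQL.
parents = conn.execute("SELECT id FROM model WHERE active = 1 ORDER BY id").fetchall()
parent_ids = [row[0] for row in parents]

# Query 2: fetch related rows only for the parents that survived the filter.
placeholders = ", ".join("?" for _ in parent_ids)
children = conn.execute(
    f"SELECT id, model_id, field FROM model2 WHERE model_id IN ({placeholders}) ORDER BY id",
    parent_ids,
).fetchall()

# "Join" in Python: group the related rows by their parent id.
prefetched = defaultdict(list)
for child_id, model_id, field in children:
    prefetched[model_id].append(field)

print(dict(prefetched))  # {1: ['x', 'y'], 3: ['w']}
```

Whichever order the calls appear in, the filter ends up in query 1 and the prefetch only ever sees the rows query 1 returned.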

How to randomise the order of a queryset

Consider the following query:
candidates = Candidate.objects.filter(ElectionID=ElectionIDx)
Objects in this query are ordered by their id field.
How do I randomise the order of the objects in the query? Can it be done using .order_by()?

Yes, you can pass the special argument '?' to order_by to get a randomly ordered queryset:
Candidate.objects.filter(ElectionID=ElectionIDx).order_by('?')
See the QuerySet.order_by() documentation.
Note that, depending on the database backend, random ordering can be slow and expensive. I would suggest benchmarking it first: start with '?', and only look for alternatives if it turns out to be too slow.
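As far as I know, Django compiles order_by('?') to ORDER BY RANDOM() on SQLite (other backends use their own random function). Below is a minimal sketch with the standard-library sqlite3 module and a hypothetical candidate table, contrasting that database-side shuffle with one alternative: fetching just the ids and shuffling them in Python, which only makes sense when the result set is small.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidate (id INTEGER PRIMARY KEY, election_id INTEGER)")
# Five candidates in election 1 (ids 1-5), three in election 2 (ids 6-8).
conn.executemany(
    "INSERT INTO candidate (election_id) VALUES (?)",
    [(1,)] * 5 + [(2,)] * 3,
)

# Database-side shuffle: roughly what order_by('?') produces on SQLite.
# The database generates a random key for every matching row and sorts on it.
db_shuffled = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM candidate WHERE election_id = ? ORDER BY RANDOM()", (1,)
    )
]

# Alternative: fetch the (small) list of ids and shuffle it in Python.
ids = [row[0] for row in conn.execute("SELECT id FROM candidate WHERE election_id = ?", (1,))]
random.shuffle(ids)

# Both contain the same ids, just in some random order.
print(sorted(db_shuffled), sorted(ids))  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
```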

QuerySet.union() django 1.11.4

Django 1.11 introduced a cool feature for combining querysets.
You can make something like:
combined_qs = qs1.union(qs2, qs3)
And voila! combined_qs has all querysets together keeping their order (btw, am I right here?). I need this, as qs1 should be the first, qs2 the second and so on.
But the challenging part is that those querysets can't be ordered by any field. As I understand it, this happens because MySQL ignores any ORDER BY clauses inside a UNION, so ordering can only be done via a subquery. And I don't see how that can be done with the Django ORM without too much raw SQL magic.
Any ideas on how those querysets can be unioned with respect to their order_by() statements?
Update:
It seems I saw what I wanted to see rather than what is actually there. Perhaps I should still stay with django-querysetsequence. It is a good library, but I had hoped we had finally gotten a native method.
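SQL does not guarantee any row order for a UNION unless there is an ORDER BY on the combined result. One common workaround, sketched here in plain SQL with the standard-library sqlite3 module and a hypothetical item table, is to tag each branch with a constant column and sort the combined result by that column first, then by whatever field you want within each group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    INSERT INTO item (name, category) VALUES
        ('pear', 'b'), ('apple', 'a'), ('fig', 'c'), ('kiwi', 'a'), ('date', 'b');
""")

# Each branch carries a constant "src" column recording which query it came from.
# The ORDER BY applies once, to the whole compound SELECT, so it is not discarded.
rows = conn.execute("""
    SELECT name, 1 AS src FROM item WHERE category = 'a'
    UNION ALL
    SELECT name, 2 AS src FROM item WHERE category = 'b'
    UNION ALL
    SELECT name, 3 AS src FROM item WHERE category = 'c'
    ORDER BY src, name
""").fetchall()

print([name for name, _ in rows])  # ['apple', 'kiwi', 'date', 'pear', 'fig']
```

In the ORM, the analogous idea would be to annotate each queryset with a constant (for example Value(1, output_field=IntegerField())) before calling union() and then call order_by() on the combined queryset; I have not verified that against MySQL specifically.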

How do I explicitly define the query used in subqueryload_all?

I'm using subqueryload/subqueryload_all pretty heavily, and I've run into an edge case where I need to explicitly define the query that is used during the subqueryload. For example, I have a situation with posts and comments. My query looks something like this:
posts_q = db.query(Post).options(subqueryload(Post.comments))
As you can see, I'm loading each Post's comments. The problem is that I don't want all of each post's comments: I also need to take a deleted field into account, and they need to be ordered by creation time, descending. The only way I have seen this done is by adding options to the relationship() declaration between posts and comments. I would prefer not to do this, because it means that the relationship cannot be reused everywhere after that, and I have other places in the app where those constraints may not apply.
What I would love to do, is explicitly define the query that subqueryload/subqueryload_all uses to load the posts' comments. I read about DisjointedEagerLoading here, and it looks like I could simply define a special function that takes in the base query, and a query to load the specified relationship. Is this a good route to take for this situation? Anyone ever run into this edge case before?

The answer is that you can define multiple relationships between Posts and Comments:
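class Post(...):
    # A second, read-only relationship that only loads non-deleted comments, newest first.
    # String arguments are evaluated lazily, so Post can be referenced in its own class body.
    active_comments = relationship("Comment",
        primaryjoin="and_(Comment.post_id == Post.post_id, Comment.deleted == False)",
        order_by="Comment.created.desc()",
        viewonly=True)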
Then you should be able to subqueryload by that relationship:
posts_q = db.query(Post).options(subqueryload(Post.active_comments))
You can still use the existing .comments relationship elsewhere.

I also had this problem, and it took me some time to realize that this is by design. When you say Post.comments, you refer to the relationship that says "these are all the comments of that post". Now, however, you want to filter them. If you specified that condition somewhere on subqueryload, you would essentially be loading only a subset of values into Post.comments, so values would be missing and the model would give a faulty representation of your data.
The question, then, is how to approach this, because you obviously need this value somewhere. My approach is to build the subquery myself and specify the special conditions there. That means you get two objects back: the list of posts and the list of comments. It is not a pretty solution, but at least it does not display data in a wrong way: if you access Post.comments for some reason, you can safely assume it contains all comments.
But there is room for improvement: you might want to have this attached to your class so you don't carry around two variables. The easy way is to define a second relationship, e.g. published_comments, which specifies the extra parameters. You could then also prevent anyone from writing to it, e.g. with attribute events; in these events you could, instead of forbidding manipulation, control how manipulation is allowed. The only problem is with updates: when you add a comment to Post.comments, published_comments won't be updated automatically, because the two relationships are not aware of each other. Again, I'd use events for this if it is a required feature (but the ugly solution above would not give you that either).
As a final, hybrid solution, you could take the first approach and then just assign those values to your objects, e.g. post.deleted_comments = deleted_comments.
The thing to keep in mind here is that it is generally not a good idea to manipulate the query the ORM makes, as this can lead to problems later on. I have taken this approach and manipulated the queries (with contains_eager this is easily done), but it created problems in some places (while generally working), so I dropped it.
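The approach of loading the filtered comments yourself, and optionally attaching them to each post under a separate attribute, looks roughly like this. It is a minimal sketch using the standard-library sqlite3 module and a plain dataclass instead of SQLAlchemy, with hypothetical post/comment tables, so it shows the shape of the two queries rather than any SQLAlchemy API:

```python
import sqlite3
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Post:
    id: int
    title: str
    # Filled in by hand below; deliberately separate from a full "comments" list.
    active_comments: list = field(default_factory=list)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT,
        deleted INTEGER, created INTEGER
    );
    INSERT INTO post VALUES (1, 'first'), (2, 'second');
    INSERT INTO comment VALUES
        (1, 1, 'old', 0, 100), (2, 1, 'removed', 1, 200), (3, 1, 'new', 0, 300),
        (4, 2, 'only', 0, 150);
""")

# Query 1: the posts.
posts = [Post(pid, title) for pid, title in conn.execute("SELECT id, title FROM post ORDER BY id")]

# Query 2: our own "subquery load" with the extra conditions: no deleted
# comments, newest first, restricted to the posts loaded above.
ids = [p.id for p in posts]
marks = ", ".join("?" for _ in ids)
rows = conn.execute(
    f"SELECT post_id, body FROM comment WHERE post_id IN ({marks}) AND deleted = 0 "
    "ORDER BY created DESC",
    ids,
).fetchall()

by_post = defaultdict(list)
for post_id, body in rows:
    by_post[post_id].append(body)

# Hybrid step: attach the filtered list under its own attribute name, so a
# full "comments" collection is never silently missing rows.
for p in posts:
    p.active_comments = by_post[p.id]

print([(p.title, p.active_comments) for p in posts])
# [('first', ['new', 'old']), ('second', ['only'])]
```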

Getting certain elements from DB in Django

I have a simple Python/Django class:
from django.db import models

class myModel(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()
and I want to get two elements from my database: the newest element and the newest positive element. I can do this like so:
myModel.objects.all().order_by('-date')[:1][0]
myModel.objects.filter(value__gte = 0).order_by('-date')[:1][0]
Note the [:1][0] at the end: I use it because I want to make the most of the database's SQL engine. The thing is that I still need two queries, and I want to combine them into a single one (something like [:2] at the end that produces the result I want). I know about Django's Q, but can't figure out how to use it in this context. Maybe some raw SQL? I'm waiting for ideas. :)

This looks like premature optimisation to me. Is two queries instead of one really so bad? At the moment, anyone who knows the Django ORM can understand your two queries. After you've replaced it with some funky raw SQL, that might not be the case.
You should use [0] instead of [:1][0]. Django knows how to slice querysets efficiently -- both queries will result in the exact same SQL.

This doesn't fully answer your question, but you can get rid of the [:1][0] and the order_by by using the latest() QuerySet method, which returns the latest object in the queryset, using the field passed as the argument as the date field.
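For comparison, here is a minimal sketch in plain SQL (standard-library sqlite3, a hypothetical mymodel table with ISO date strings) of the two single-row queries next to a single-round-trip version built from two scalar subqueries. It mirrors the question's value >= 0 filter, and it also illustrates why the combined form is harder to read than the two ORM queries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mymodel (id INTEGER PRIMARY KEY, date TEXT, value INTEGER);
    INSERT INTO mymodel (date, value) VALUES
        ('2024-01-01', 5), ('2024-01-02', -3), ('2024-01-03', 7), ('2024-01-04', -1);
""")

# Two separate queries, each limited to one row (roughly what [0] or latest() compile to).
newest = conn.execute("SELECT id FROM mymodel ORDER BY date DESC LIMIT 1").fetchone()[0]
newest_pos = conn.execute(
    "SELECT id FROM mymodel WHERE value >= 0 ORDER BY date DESC LIMIT 1"
).fetchone()[0]

# The same two answers from a single query, using two scalar subqueries.
combined = conn.execute("""
    SELECT
        (SELECT id FROM mymodel ORDER BY date DESC LIMIT 1),
        (SELECT id FROM mymodel WHERE value >= 0 ORDER BY date DESC LIMIT 1)
""").fetchone()

print((newest, newest_pos), combined)  # (4, 3) (4, 3)
```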
