Consider the following query:
candidates = Candidate.objects.filter(ElectionID=ElectionIDx)
Objects in this query are ordered by their id field.
How do I randomise the order of the objects in the query? Can it be done using .order_by()?
Yes, you can pass the special argument '?' to order_by to get a randomized queryset:
Candidate.objects.filter(ElectionID=ElectionIDx).order_by('?')
Note that, depending on the DB backend, the randomization might be slow and expensive. I would suggest benchmarking it first: start with '?', and look for alternatives only if it turns out to be too slow.
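If a benchmark does show that order_by('?') is too slow, one possible alternative is to fetch the rows without any random ordering and shuffle them in Python. This is a minimal sketch, assuming the result set is small enough to hold in memory. The helper name shuffled is hypothetical, and it works on any iterable, including a queryset:

```python
import random


def shuffled(items, rng=random):
    """Return a new list with the items in random order."""
    result = list(items)  # iterating a queryset evaluates it once
    rng.shuffle(result)   # shuffle in Python instead of in the database
    return result


# With Django this would be called as (not run here):
# candidates = shuffled(Candidate.objects.filter(ElectionID=ElectionIDx))
```

The result is a plain list, not a queryset, so no further filtering or ordering can be chained onto it.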
The title says it all.
Let's take a look at this code for example:
objs = Model.objects.prefetch_related('model2').filter()
objs.first().model2_set.first().field
vs
objs = Model.objects.filter().prefetch_related('model2')
objs.first().model2_set.first().field
Question
When using prefetch_related() first, does Django fetch all the ManyToOne/ManyToMany relations without taking into consideration .filter() and after everything is fetched, the filter is applied?
IMO, that doesn't matter since there's still one query executed at the end.
Thanks in advance.
It doesn't matter where you specify prefetch_related, as long as it's before any records are fetched. Personally, I put things like prefetch_related, select_related, and only at the end of the chain, but that's just because it feels more expressive to me from a code readability perspective.
But that is not true of all manager methods. Some methods do have different effects depending on their position in the chain, for example order_by can have positional significance when used with distinct (group by).
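To make the first point concrete, here is a toy model of a lazy query builder. It is not Django's implementation, only a sketch of the idea: each chained call just records what was asked for, nothing runs until the results are needed, and the related rows are then looked up only for the records that survived the filter. The class LazyQuery and all of its internals are made up for illustration.

```python
class LazyQuery:
    """Toy stand-in for a lazy queryset: chaining only records intent."""

    def __init__(self, rows, children, filters=(), prefetch=False):
        self.rows = rows          # parent rows: list of dicts with "id"
        self.children = children  # child rows: list of dicts with "parent_id"
        self.filters = filters
        self.prefetch = prefetch

    def filter(self, **conditions):
        # Return a new object; nothing is executed yet.
        new_filters = self.filters + tuple(conditions.items())
        return LazyQuery(self.rows, self.children, new_filters, self.prefetch)

    def prefetch_related(self):
        # Only remember that related rows are wanted.
        return LazyQuery(self.rows, self.children, self.filters, True)

    def evaluate(self):
        # Step 1: the main lookup, with every recorded filter applied.
        matched = [r for r in self.rows
                   if all(r[key] == value for key, value in self.filters)]
        if not self.prefetch:
            return [dict(r) for r in matched]
        # Step 2: attach children, restricted to the matched parents.
        return [
            dict(r, children=[c for c in self.children
                              if c["parent_id"] == r["id"]])
            for r in matched
        ]


rows = [{"id": 1, "active": True}, {"id": 2, "active": False}]
children = [{"parent_id": 1, "field": "a"}, {"parent_id": 2, "field": "b"}]
base = LazyQuery(rows, children)

prefetch_first = base.prefetch_related().filter(active=True).evaluate()
filter_first = base.filter(active=True).prefetch_related().evaluate()
```

Both chains produce the same result here, because the order of the calls only changes the order in which the intent is recorded, not what is eventually executed.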
I wonder what is the way to build a query.
I was trying to use Subquery, Prefetch, prefetch_related, and select_related, but I can't get better results than what I started with.
I have a situation where I'm getting an object instance.
object = get_object_or_404(Object, id=pk)
Then I need to get more data.
object.id,
object.name,
object.description,
object.update_frequency,
object.resources.values_list('extension'),
object.tags.values_list('name'),
object.resources.count(),
object.resources.values_list('file'),
object.resources.values_list('licence'),
object.edited
Each row is a different query.
What is the best way to reduce the number of queries?
First of all, I would check whether the optimization is worth it.
I use django-debug-toolbar to have some metrics. For SQL request, you will see how many queries and how much time is spent on each.
Then, I will concentrate on what's important. Good resources to read are this and this.
If you provide a complete example, we can go deeper: template, model, etc
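As a starting point for the question above, one common approach is to prefetch the related rows once and derive every value from the loaded rows in Python. The sketch below assumes that resources and tags are related managers on the object and that the field names match the ones in the question. The helper summarize is hypothetical. Keep in mind that calling values_list() on a related manager builds a new queryset and so issues a new query even after prefetching, whereas iterating over .all() reuses the prefetched rows.

```python
from types import SimpleNamespace


def summarize(resources, tags):
    """Build the values from the question out of already-loaded rows."""
    resources = list(resources)
    return {
        "extensions": [r.extension for r in resources],
        "files": [r.file for r in resources],
        "licences": [r.licence for r in resources],
        "resource_count": len(resources),
        "tag_names": [t.name for t in tags],
    }


# With Django this would be fed from prefetched relations (not run here):
# obj = get_object_or_404(
#     Object.objects.prefetch_related('resources', 'tags'), id=pk)
# summary = summarize(obj.resources.all(), obj.tags.all())

# Stand-in rows so the sketch runs without a database.
demo_resources = [
    SimpleNamespace(extension="csv", file="a.csv", licence="CC0"),
    SimpleNamespace(extension="json", file="b.json", licence="MIT"),
]
demo_tags = [SimpleNamespace(name="open-data")]
demo_summary = summarize(demo_resources, demo_tags)
```

Under these assumptions the view needs one query for the object, one for its resources, and one for its tags, instead of one query per attribute.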
What is the fastest way to query one record from the database that satisfies my filter?
mydb.objects.filter(start__gte='2017-1-1', status='yes').order_by('?')[:1]
This statement first selects thousands of records and then picks one, which is very slow. I only need a single random record. What is the fastest way to get it?
Well, I'm not sure you will be able to do exactly what you want. I was running into a similar issue a few months ago and I ended up redesigning my implementation of my backend to make it work.
Essentially, you want to make the query time shorter by having it choose a random record that fulfills both requirements (start__gte='2017-1-1', status='yes'), but as you say, in order to do so the query needs to filter your entire database. This means that you cannot get a "true" random record from the database that also fulfills the filter requirements, because filtering inherently needs to look through all of your records (otherwise it wouldn't be really random, it would just be the first one it finds that fulfills your requirements).
Instead, consider putting all records that have a status='yes' in a separate relation, so that you can pull a random record from there and join with the larger relation. That would make the query time considerably faster (and it's the type of solution I implemented to get my code to work).
If you really want a random record with the correct filter information, you might need to employ some convoluted means.
You could use a custom manager in Django to have it find only one random record, something like this:
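from random import randint

from django.db import models
from django.db.models import Count


class UsersManager(models.Manager):
    def random(self):
        # Count the rows, then fetch the single row at a random offset.
        count = self.aggregate(count=Count('id'))['count']
        random_index = randint(0, count - 1)  # raises ValueError if the table is empty
        return self.all()[random_index]


class User(models.Model):
    # Your fields here (whatever they are; it seems start and status are among them).
    objects = UsersManager()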
Which you can invoke then just by using:
User.objects.random()
This could be repeated with a check in your code until it returns a random record that fulfills your requirements. I don't think this is necessarily the cleanest or programmatically correct way of implementing this, but I don't think a faster solution exists for your specific issue.
I used this site as a source for this answer, and it has a lot more solid information about using this custom random method! You'll likely have to change the custom manager to serve your own needs, but if you add the random() method to your existing custom manager it should be able to do what you need of it!
Hope it helps!
Using order_by('?') can cause serious performance problems. A better way is to use something like this: Getting a random row from a relational database.
from random import randint
from django.db.models import Count
count = mydb.objects.filter(start__gte='2017-1-1', status='yes').aggregate(count=Count('id'))['count']
random_index = randint(0, count - 1)
result = mydb.objects.filter(start__gte='2017-1-1', status='yes')[random_index]
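Both snippets above raise an error when no row matches the filter, because randint(0, -1) is not a valid range. Here is a hedged sketch of a helper that guards against that case. The name random_record is hypothetical, and it relies only on count() and integer indexing, both of which Django querysets provide. A row deleted between the two queries could still make the index lookup fail.

```python
from random import randint


def random_record(queryset):
    """Return one random row, or None if nothing matches."""
    count = queryset.count()  # first query: how many rows match
    if count == 0:
        return None
    # Second query: fetch only the row at the chosen offset.
    return queryset[randint(0, count - 1)]


class ListQuery:
    """Stand-in with the two operations used above, so this runs without a database."""

    def __init__(self, rows):
        self.rows = list(rows)

    def count(self):
        return len(self.rows)

    def __getitem__(self, index):
        return self.rows[index]


# With Django this would be called as (not run here):
# result = random_record(mydb.objects.filter(start__gte='2017-1-1', status='yes'))
```

The database still has to count the matching rows, so this avoids sorting the whole result set randomly but does not avoid the filter itself.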
Is there any way to remove select related from queryset?
I found that Django adds a JOIN to the SQL query on the count() operation.
So, if we have code like this:
entities = Entities.objects.select_related('subentity').all()
# We will have an INNER JOIN here.
entities.count()
I'm looking for a way to remove join.
One important detail: I pass this queryset to the Django paginator, so I can't simply write
Entities.objects.all().count()
I believe this code comment provides a relatively good answer to the general question that is asked here:
If select_related(None) is called, the list is cleared.
https://github.com/django/django/blob/stable/1.8.x/django/db/models/query.py#L735
In the general sense, if you want to do something to the entities queryset, but first remove the select_related items from it, use entities.select_related(None).
However, that probably doesn't solve your particular situation with the paginator. If you call entities.count(), Django already removes the select_related items. If you find yourself with extra JOINs taking place, then it could be due to several non-ideal factors. It could be that the ORM fails to remove the JOIN because of other logic that may or may not affect the count when combined with the select_related.
As a simple example of one of these non-ideal cases, consider Foo.objects.select_related('bar').count() versus Foo.objects.select_related('bar').distinct().count(). It might be obvious to you that the original queryset does not contain multiple entries, but it is not obvious to the Django ORM. As a result, the SQL that executes contains a JOIN, and there is no universal prescription to work around that. Even applying .select_related(None) will not help you.
Can you show the code where you need this? I think refactoring is the best answer here.
If you want a quick answer, set entities.query.select_related = False, but it's rather hacky (and don't forget to restore the value if you need select_related later).
In general, is there a type of Model query you look for to optimize by indexing a field (db_index=True)?
In case it's relevant: I'm using MySQL.
Elaboration:
Although I appreciate the responses already given, I was looking more for advice such as this from a colleague:
You should definitely index the fields in your default ordering and any field you use for filtering.
Think that about covers it?
Install django-debug-toolbar
Look at the SQL panel, look for long-running queries
Index the columns selected in those queries
If you need help with the queries, try the "EXPLAIN" MySQL command on the query.
Basically, you should index fields that are searched often. For example, if you have a user table, then username could be indexed if things are constantly queried by username. There are trade-offs, of course.
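The effect of an index on a frequently filtered column can be checked directly in the query plan. The sketch below uses SQLite from the Python standard library as a stand-in, so the plan syntax (EXPLAIN QUERY PLAN) differs from MySQL's EXPLAIN, and the table and index names are made up. In a Django model, the equivalent declaration would be db_index=True on the field.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query, as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, bio TEXT)")
conn.executemany(
    "INSERT INTO users (username, bio) VALUES (?, ?)",
    [(f"user{i}", "") for i in range(1000)],
)

lookup = "SELECT bio FROM users WHERE username = ?"

# Without an index, the plan has to scan the whole table.
plan_before = query_plan(conn, lookup, ("user42",))

conn.execute("CREATE INDEX idx_users_username ON users (username)")

# With the index in place, the plan searches through it instead.
plan_after = query_plan(conn, lookup, ("user42",))
```

Printing plan_before and plan_after shows the switch from a table scan to an index search. The exact wording of the plan depends on the SQLite version.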