Prefetch many-to-many relations for one class instance - python

I want to limit the queries for a detail view. I want to access multiple many-to-many fields of one class instance with fewer queries. It seems prefetch_related doesn't work with get, and the server hits the database for every many-to-many field.
JobInstance = Job.objects.get(pk=id).prefetch_related('cities').prefetch_related('experience_level')

You can make it work by reordering the calls:
job_instance = Job.objects.prefetch_related('cities', 'experience_level').get(pk=id)
.prefetch_related(..) is defined on a QuerySet. When you perform a .get(..), you fetch the object itself, so you are no longer working with a QuerySet.
However, for a single object, .prefetch_related(..) will not improve efficiency. Here it makes two extra queries (one per relation) to fetch the related objects, exactly as many as you would make without prefetching when you later evaluate the related objects of job_instance.
.prefetch_related(..) is therefore useful when you want to fetch the related objects of multiple objects in bulk.
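The arithmetic behind this can be sketched with a tiny helper. It is a rough model, assuming every many-to-many relation is read exactly once per object and nothing else is cached; `query_count` is a hypothetical name, not part of Django.

```python
def query_count(n_objects: int, relations: int, prefetch: bool) -> int:
    """Rough number of queries to load n_objects and read each of `relations`
    many-to-many fields once per object (a hypothetical model, not Django code)."""
    base = 1  # the query (or .get()) that loads the objects themselves
    if prefetch:
        if n_objects == 0:
            return base  # nothing was loaded, so there is nothing to prefetch
        return base + relations  # one extra query per prefetched relation
    return base + n_objects * relations  # one query per object per relation


# A single Job with 'cities' and 'experience_level': no difference.
print(query_count(1, 2, prefetch=True), query_count(1, 2, prefetch=False))      # 3 3
# A list view of 100 jobs: prefetching pays off.
print(query_count(100, 2, prefetch=True), query_count(100, 2, prefetch=False))  # 3 201
```

The saving grows with the number of objects, which is why prefetching only matters for list-style views.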

Related

Django performant way to get a queryset of a massive (I'm talking huge) list of ids in order

Pretty much the same flavor as: Django get a QuerySet from array of ids in specific order. I tried https://stackoverflow.com/a/37648265/4810639
But my list of ids is huge (> 50000) and both qs = Foo.objects.filter(id__in=id_list) and qs = qs.order_by(preserved) buckle under the strain.
Note: I need a queryset due to the specific django method I'm overriding so anything returning a list won't work.
EDIT: In response to the comments: I'm specifically overriding get_search_results() in the admin. My search engine returns the ids of the model instance(s) that match the query, but get_search_results() needs to return a queryset, hence the large list of ids.
I did this by creating a FakeQueryset class that had enough of the functions of a regular queryset that it was able to act like one. Then when I needed to display it I would hand it over to a custom paginator that would only pull a few ids from the database at a time. Duck typing for the win!
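The duck-typing idea can be sketched as follows. This is not the author's FakeQueryset (which isn't shown); `LazyIdQueryset` and the `fetch` callable are hypothetical, and in Django `fetch` could plausibly be something like `Foo.objects.in_bulk`, which returns an `{id: object}` dict. The sketch implements only `count()`, `len()`, slicing and iteration, which is roughly what a slicing-based paginator relies on; any other QuerySet methods your admin code calls would still have to be added.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

# Hypothetical fetch signature: takes a list of ids, returns {id: object}.
Fetch = Callable[[List[int]], Dict[int, Any]]


class LazyIdQueryset:
    """Duck-typed stand-in for a QuerySet built from a huge, ordered id list.

    Only the ids live in memory; objects are fetched one slice at a time.
    """

    def __init__(self, ids: Sequence[int], fetch: Fetch, chunk_size: int = 100):
        self._ids = ids
        self._fetch = fetch
        self._chunk_size = chunk_size

    def count(self) -> int:
        return len(self._ids)

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, key):
        if isinstance(key, slice):
            page_ids = list(self._ids[key])
            found = self._fetch(page_ids)
            # Restore the search engine's order; skip ids deleted since indexing.
            return [found[i] for i in page_ids if i in found]
        obj_id = self._ids[key]  # raises IndexError like a list would
        return self._fetch([obj_id])[obj_id]

    def __iter__(self) -> Iterator[Any]:
        # Walk the ids in chunks so the whole result set is never loaded at once.
        for start in range(0, len(self._ids), self._chunk_size):
            yield from self[start:start + self._chunk_size]


# Stand-in for the database, plus a log of how many ids each fetch asked for.
rows = {i: f"obj{i}" for i in range(1, 100_001)}
fetch_log = []


def fake_fetch(ids):
    fetch_log.append(len(ids))
    return {i: rows[i] for i in ids if i in rows}


qs = LazyIdQueryset(list(range(60_000, 0, -1)), fake_fetch)
print(qs.count())  # 60000
print(qs[0:3])     # ['obj60000', 'obj59999', 'obj59998']
print(fetch_log)   # [3]
```

Showing one page touches only that page's ids, no matter how long the id list is.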

What is a better way to query a ManyToManyField for speed? entire object or id only?

I'm trying to determine the most efficient way to query a ManyToManyField for speed.
I have 2 options that I know of:
Add the entire object to the field
Add just the id to the field
If I add the object, obviously with on_delete=models.CASCADE, I get that benefit, which is huge, but I'm afraid adding it might slow query speed down because it's getting an entire object, and many of them at that.
Whereas with just the id, it's just an int, so less heavy, and faster I assume.
For speed only, what would you suggest?
Django provides prefetch_related to make your queries faster for ManyToMany fields and reverse ForeignKey relations, and select_related for normal ForeignKey fields.

Number of attributes in Django Models

I searched a lot and did not find what I'm looking for.
What would be the best design for a model class in Django?
To extend User, would it be better to have one class with many attributes, or to break it into several classes with a few attributes each? I'm using the Django ORM.
Say I have a class called Person that extends User. Would it be better to do this:
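from django.contrib.auth.models import User
from django.db import models

class Person(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    attribute1 = ...
    ...
    attributeN = ...

Or, would it be better to do this:

class PersonContact(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonAddress(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    attribute1 = ...
    ...
    attribute3 = ...

class PersonHobby(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    attribute1 = ...
    ...
    attribute3 = ...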
Each of my views would (probably) use the data from the smaller classes.
Over time, the number of attributes can grow.
I want to do this once and touch it as little as possible afterwards.
Many attributes can be left unfilled by the user; they are not required.
The number of users is open-ended (it can be large).
I'm concerned about long-term performance and maintainability.
Could someone explain what would be better for my code, and why?
And what would be better in general with the Django ORM: fewer classes with more attributes, or more classes with fewer attributes?
Is it better if my views use data from only one model class, or does it make no (or little) difference?
Edit:
In my rush to write this I used bad class names. None of these attributes are many-to-many fields; the User will have only one value for each attribute, or a blank.
The number of attributes can grow over time, but not by a great deal.
Put any data that is specific to only one User directly in the model. This would probably be things like "Name", "Birthday", etc.
Some things might be better served by a separate model, though. For example, multiple people might share the same Hobby, or one User might have multiple Hobby objects. Make this a separate class and use a ForeignKey or ManyToManyField as necessary.
Whatever you choose, the real trick is to optimize the number of database queries. The django-debug-toolbar is helpful here.
Splitting up your models would by default result in multiple database queries, so make sure to read up on select_related to condense them down to one.
Also take a look at the defer() method when retrieving a queryset. You can exclude fields that you know a particular view won't use.
I think it all depends on your interface.
If you have to expose ALL data for a user on a single page and you have a single, large model, you will end up with a single SQL join instead of one for each smaller table.
Conversely, if you just need a few of these attributes, you might get a small gain in memory usage by joining the user table with a smaller one, because you don't have to load a lot of attributes that aren't going to be used (though this might also be mitigated with values()).
Also, if your attributes are not mandatory, you should at least have an idea of how many of them are going to be filled. A large table of almost empty records could be a waste of space. That may or may not be a problem; it depends on your hardware resources.
Lastly, if you really think that your attributes can expand a lot, you could try the EAV (entity-attribute-value) approach.
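A minimal sketch of an EAV layout, using the standard-library sqlite3 module rather than Django models so it runs on its own; the table and column names are hypothetical. Every value is stored as TEXT, which is the usual EAV trade-off: a flexible schema, at the cost of per-field types and constraints, and filtering on several attributes at once needs one join (or subquery) per attribute.

```python
import sqlite3
from typing import Dict, Optional


def make_eav_db() -> sqlite3.Connection:
    """In-memory schema for a hypothetical entity-attribute-value layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
        -- One row per (person, attribute) that is actually filled in.
        CREATE TABLE person_attribute (
            person_id INTEGER NOT NULL REFERENCES person(id),
            name      TEXT    NOT NULL,
            value     TEXT,
            PRIMARY KEY (person_id, name)
        );
        """
    )
    return conn


def set_attr(conn: sqlite3.Connection, person_id: int, name: str, value: Optional[str]) -> None:
    # A brand-new attribute needs no ALTER TABLE, just a new row.
    conn.execute(
        "INSERT OR REPLACE INTO person_attribute (person_id, name, value) VALUES (?, ?, ?)",
        (person_id, name, value),
    )


def get_attrs(conn: sqlite3.Connection, person_id: int) -> Dict[str, Optional[str]]:
    # All attributes of one person in one query; unfilled ones simply have no row.
    rows = conn.execute(
        "SELECT name, value FROM person_attribute WHERE person_id = ? ORDER BY name",
        (person_id,),
    )
    return dict(rows.fetchall())


conn = make_eav_db()
conn.execute("INSERT INTO person (id, username) VALUES (1, 'alice')")
set_attr(conn, 1, "hobby", "chess")
set_attr(conn, 1, "city", "Oslo")
print(get_attrs(conn, 1))  # {'city': 'Oslo', 'hobby': 'chess'}
```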

List of parents objects and their children with fewer queries

I've got a Django view that I'm trying to optimise. It shows a list of parent objects on a page, along with their children. The child model has the foreign key back to the parent, so select_related doesn't seem to apply.
from django.db import models

class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, on_delete=models.CASCADE)
A naive implementation uses n+1 queries, where n is the number of parent objects, i.e., one query to fetch the parent list, then one query to fetch the children of each parent.
I've written a view that does the job in two queries - one to fetch the parent objects, another to fetch the related children, then some Python (that I'm far too embarrassed to post here) to put it all back together again.
Once I found myself importing the standard library's collections module I realised that I was probably doing it wrong. There is probably a much easier way, but I lack the Django experience to find it. Any pointers would be much appreciated!
Add a related_name to the foreign key, then use the prefetch_related method, which was added in Django 1.4:
Returns a QuerySet that will automatically retrieve, in a single
batch, related objects for each of the specified lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different:
select_related works by creating a SQL join and including the fields
of the related object in the SELECT statement. For this reason,
select_related gets the related objects in the same database query.
However, to avoid the much larger result set that would result from
joining across a 'many' relationship, select_related is limited to
single-valued relationships - foreign key and one-to-one.
prefetch_related, on the other hand, does a separate lookup for each
relationship, and does the 'joining' in Python. This allows it to
prefetch many-to-many and many-to-one objects, which cannot be done
using select_related, in addition to the foreign key and one-to-one
relationships that are supported by select_related. It also supports
prefetching of GenericRelation and GenericForeignKey.
from django.db import models

class Parent(models.Model):
    name = models.CharField(max_length=31)

class Child(models.Model):
    name = models.CharField(max_length=31)
    parent = models.ForeignKey(Parent, related_name='children', on_delete=models.CASCADE)
>>> Parent.objects.all().prefetch_related('children')
All the relevant children will be fetched in a single query, and used
to make QuerySets that have a pre-filled cache of the relevant
results. These QuerySets are then used in the self.children.all()
calls.
Note 1: as always with QuerySets, any subsequent chained methods that imply a different database query will ignore previously cached results and retrieve data using a fresh database query.
Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together. (Since Django 4.1, iterator() does support prefetching when a chunk_size is given.)
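To see what the "joining in Python" step amounts to, here is a sketch of the same two-query strategy written against the standard-library sqlite3 module. It is not Django's actual implementation, and the parent/child table names are assumptions mirroring the models above. Note that it needs little more than a defaultdict from the collections module, much like the hand-written view the question describes. With a very large parent list you would also need to split the IN (...) clause into batches to stay under SQLite's limit on bound parameters.

```python
import sqlite3
from collections import defaultdict
from typing import List, Tuple


def parents_with_children(conn: sqlite3.Connection) -> List[Tuple[str, List[str]]]:
    """Return [(parent_name, [child_name, ...]), ...] using exactly two SELECTs."""
    parents = conn.execute("SELECT id, name FROM parent ORDER BY id").fetchall()
    children_by_parent = defaultdict(list)
    parent_ids = [pid for pid, _ in parents]
    if parent_ids:
        placeholders = ",".join("?" * len(parent_ids))
        rows = conn.execute(
            f"SELECT parent_id, name FROM child WHERE parent_id IN ({placeholders}) ORDER BY id",
            parent_ids,
        )
        # The "join in Python": bucket every child under its parent's id.
        for parent_id, name in rows:
            children_by_parent[parent_id].append(name)
    return [(name, children_by_parent[pid]) for pid, name in parents]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, name TEXT,
                        parent_id INTEGER REFERENCES parent(id));
    INSERT INTO parent (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO child (name, parent_id) VALUES ('a1', 1), ('a2', 1), ('b1', 2);
    """
)

selects = []  # every SELECT statement the connection runs


def record_select(sql: str) -> None:
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)


conn.set_trace_callback(record_select)
print(parents_with_children(conn))  # [('a', ['a1', 'a2']), ('b', ['b1']), ('c', [])]
print(len(selects))                 # 2
```

The query count stays at two however many parents there are, which is the point of prefetching.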
If you ever need to work with more than two levels at once, you can consider a different approach to storing trees in the database, such as MPTT (modified preorder tree traversal).
In a nutshell, it adds extra fields to your model that are updated whenever the tree changes and that allow much more efficient retrieval.
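The core idea behind MPTT (nested-set numbering) can be sketched in plain Python. A depth-first walk gives each node a left and right number, and a node's descendants are exactly the nodes whose numbers fall strictly inside its own range, so a database can fetch a whole subtree with one range query (roughly `WHERE lft > ? AND rght < ?`). The tree and function names are hypothetical; a library such as django-mptt keeps these numbers up to date on every write, which is the extra bookkeeping mentioned above.

```python
from typing import Dict, List, Tuple

Bounds = Dict[str, Tuple[int, int]]


def number_tree(children: Dict[str, List[str]], root: str) -> Bounds:
    """Assign (left, right) numbers with a depth-first (modified preorder) walk."""
    bounds: Bounds = {}
    counter = 1

    def visit(node: str) -> None:
        nonlocal counter
        left = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        bounds[node] = (left, counter)
        counter += 1

    visit(root)
    return bounds


def descendants(bounds: Bounds, node: str) -> List[str]:
    """Every node strictly inside node's (left, right) range, in tree order."""
    left, right = bounds[node]
    inside = [n for n, (lo, hi) in bounds.items() if left < lo and hi < right]
    return sorted(inside, key=lambda n: bounds[n][0])


tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
bounds = number_tree(tree, "root")
print(bounds["root"], bounds["a"])  # (1, 10) (2, 7)
print(descendants(bounds, "a"))     # ['a1', 'a2']
```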
Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data you need is fetched in one statement, whereas prefetch_related runs a separate query for each relation and then caches the results.
The trick here is to "join in" only what you absolutely need to in order to reduce the performance penalty of the join. "What you absolutely need to" is the long way of saying that you should pre-select only the fields that you will read later in your view or template. There is good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
This is a snippet from one of my models where I faced a similar problem:
return QuantitativeResult.objects.select_related(
    'enrollment__subscription__configuration__analyte',
    'enrollment__subscription__unit',
    'enrollment__subscription__configuration__analyte__unit',
    'enrollment__subscription__lab',
    'enrollment__subscription__instrument_model',
    'enrollment__subscription__instrument',
    'enrollment__subscription__configuration__method',
    'enrollment__subscription__configuration__reagent',
    'enrollment__subscription__configuration__reagent__manufacturer',
    'enrollment__subscription__instrument_model__instrument__manufacturer'
).filter(<snip, snip - stuff edited out>)
In this pathological case, I went down from 700+ queries to just one. The django debug toolbar is your friend when it comes to this sort of issue.
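For comparison, the JOIN strategy that select_related relies on can be sketched with the standard-library sqlite3 module: each child row comes back together with its parent's columns in a single query. This only works in the forward direction (a child to its one parent). The table names are assumptions borrowed from the Parent/Child example earlier on this page.

```python
import sqlite3
from typing import List, Tuple


def children_with_parent(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """One SELECT with a JOIN: each child together with its (single) parent's name."""
    # INNER JOIN is fine because parent_id is NOT NULL; a nullable key would need LEFT OUTER JOIN.
    return conn.execute(
        """
        SELECT child.name, parent.name
        FROM child
        INNER JOIN parent ON parent.id = child.parent_id
        ORDER BY child.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (id INTEGER PRIMARY KEY, name TEXT,
                        parent_id INTEGER NOT NULL REFERENCES parent(id));
    INSERT INTO parent (id, name) VALUES (1, 'a'), (2, 'b');
    INSERT INTO child (name, parent_id) VALUES ('a1', 1), ('a2', 1), ('b1', 2);
    """
)
print(children_with_parent(conn))  # [('a1', 'a'), ('a2', 'a'), ('b1', 'b')]
```

Going the other way (parent to many children) would repeat each parent row once per child, which is why select_related is limited to single-valued relations.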

Querying through several models

I have a Django project with 5 different models in it. All of them have a date field. Let's say I want to get all entries from all models with today's date. Of course, I could just filter every model and put the results in one big list, but I believe that's bad. What would be an efficient way to do that?
I don't think that it's a bad idea to query each model separately; indeed, from a database perspective, I can't see how you'd be able to do otherwise, as each model will need a separate SQL query. Even if, as @Nagaraj suggests, you set up a common Date model that every other model references, you'd still need to query each model separately. You are probably correct, however, that putting the results into a list is bad practice, unless you actually need to load every object into memory, as explained here:
Be warned, though, that [evaluating a QuerySet as a list] could have a large memory overhead, because Django will load each element of the list into memory. In contrast, iterating over a QuerySet will take advantage of your database to load data and instantiate objects only as you need them.
It's hard to suggest other options without knowing more about your use case. However, I think I'd probably approach this by making a list or dictionary of QuerySets, which I could then use in my view, e.g.:
querysets = [cls.objects.filter(date=now) for cls in [Model1, Model2, Model3]]
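The same "one query per model, no big list" idea can be sketched outside Django with the standard-library sqlite3 module: a generator runs one SELECT per table, only when the loop reaches that table, and yields rows instead of collecting them. The table names and the ISO-string date column are assumptions. Table names are interpolated into the SQL, so they must come from your own code, never from user input.

```python
import sqlite3
from datetime import date
from typing import Iterable, Iterator, Tuple


def entries_on(conn: sqlite3.Connection, tables: Iterable[str], day: date) -> Iterator[Tuple[str, int]]:
    """Yield (table, row_id) for every row dated `day`, one SELECT per table."""
    for table in tables:
        # Table names cannot be bound as ? parameters; they must be trusted constants.
        cursor = conn.execute(
            f"SELECT id FROM {table} WHERE date = ? ORDER BY id", (day.isoformat(),)
        )
        for (row_id,) in cursor:
            yield table, row_id


conn = sqlite3.connect(":memory:")
for name in ("model1", "model2", "model3"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, date TEXT NOT NULL)")
conn.executemany("INSERT INTO model1 (id, date) VALUES (?, ?)", [(1, "2024-05-01"), (2, "2024-05-02")])
conn.executemany("INSERT INTO model3 (id, date) VALUES (?, ?)", [(7, "2024-05-01")])

for table, row_id in entries_on(conn, ["model1", "model2", "model3"], date(2024, 5, 1)):
    print(table, row_id)  # model1 1, then model3 7
```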
Take a look at using Django's model inheritance (specifically multi-table inheritance) to define those date fields in a class that you can subclass in the classes you want to return in the query.
For example:
from django.db import models

class DateStuff(models.Model):
    date = models.DateField()

class MyClass1(DateStuff):
    ...

class MyClass2(DateStuff):
    ...
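I believe Django will let you query over the DateStuff class, and it will return a row for each matching MyClass1 and MyClass2 object. With multi-table inheritance these come back as DateStuff instances; the child object is reachable through e.g. obj.myclass1 or obj.myclass2.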
Thanks to @nrabinowitz for pointing out my previous error.
