I am new to Tastypie. I see that Tastypie calls Django models through a queryset and displays the data.
My question is: if Tastypie builds the statement queryset = < DJANGO-MODEL >.objects.all(),
will it put a tremendous load on the database/backend if there are 100 million objects?
class RestaurentsResource(ModelResource):
    class Meta:
        queryset = Restaurents.objects.all()
        print queryset
        resource_name = 'restaurents'
Django querysets are lazy: https://docs.djangoproject.com/en/dev/topics/db/queries/#querysets-are-lazy, so no database activity will be carried out until the queryset is evaluated.
If you return all of those objects from your REST interface, then a 'tremendous' load will be placed on your server. Pagination (http://django-tastypie.readthedocs.org/en/latest/paginator.html) or similar is usually used to prevent this.
Calling print on the queryset, as in the example class above, will force evaluation. Doing this in production code is a bad idea, although it can be handy when debugging or as a learning tool.
The two other answers are correct in terms of QuerySets being lazy. But on top of that, the queryset you specify in the Meta class is the base for the query. In Django, a QuerySet is essentially the representation of a database query, but is not executed. QuerySets can be additionally filtered before a query is executed.
So you could have code that looks like:
Restaurant.objects.all().filter(attribute1=something).filter(attribute2=something_else)
Tastypie just uses the QuerySet you provide as the base. On each API access, it adds additional filters to the base before executing the new query. Tastypie also handles pagination, so results come back a page at a time rather than every row being returned.
While using all() is very normal, this feature is most useful if you want to limit your Tastypie results. For example, if your Restaurant resource has a 'hidden' field, you might set:
class Meta:
    queryset = Restaurant.objects.filter(hidden=False)
All queries generated by the API will use the given queryset as the base, and won't show any rows where 'hidden=True'.
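To make the "base query plus per-request filters" idea concrete, here is a toy sketch in plain Python. It is not Django's or Tastypie's implementation; LazyQuery and fake_database are hypothetical names invented for illustration. It only shows that chaining filters builds a description of the query, and that the backend is touched once, when the result is iterated.

```python
class LazyQuery:
    """Toy stand-in for a lazy queryset; not Django's implementation."""

    def __init__(self, backend, filters=None):
        self._backend = backend            # callable that "hits the database"
        self._filters = dict(filters or {})

    def filter(self, **conditions):
        # Returns a new, still unevaluated query; the backend is not called.
        return LazyQuery(self._backend, {**self._filters, **conditions})

    def __iter__(self):
        # Evaluation happens only here.
        return iter(self._backend(self._filters))


calls = []  # records every time the fake database is actually queried
ROWS = [
    {"name": "A", "hidden": False},
    {"name": "B", "hidden": True},
    {"name": "C", "hidden": False},
]


def fake_database(filters):
    calls.append(dict(filters))
    return [r for r in ROWS if all(r[k] == v for k, v in filters.items())]


base = LazyQuery(fake_database).filter(hidden=False)  # plays the role of Meta.queryset
per_request = base.filter(name="C")                   # filter added for one API call
calls_before = len(calls)                             # nothing has been executed yet
names = [row["name"] for row in per_request]          # iteration triggers one query
```

The single recorded call carries both conditions, which mirrors how the hidden=False restriction stays attached to every query built on top of the base.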
Django QuerySet objects are evaluated lazily, that is, the result is fetched from the db only when it is really needed. In this case, queryset = Restaurents.objects.all() creates a QuerySet that has not yet been evaluated.
The default implementation of ModelResource usually forces the queryset to be evaluated during dehydration or paging. Dehydration requires model objects to be passed, and paging slices the queryset.
Custom views, authorization, or filtering methods can force the evaluation earlier.
That said, after all the filtering and paging, the list of results fetched is considerably smaller than the overall amount of data in the database.
I am working on a DRF API and I am not completely familiar with django properties.
The DB relationships are classic. Companies have different jobs to which candidates can apply. Each job has several matches, match being a joined table between a job and a candidate. Matches have different statuses representing different phases of an application process.
So here is the deal:
I am using a drf viewset to get data from the api. This viewset uses a serializer to get specific fields, specifically the number of matches per status for a job. The simplified version of the serializer looks something like this.
class Team2AmBackofficeSerializer(Normal2JobSerializer):
    class Meta:
        model = Job
        fields = (
            'pk',
            'name',
            'company',
            'company_name',
            'job__nb_matches_proposition',
            'job__nb_matches_preselection',
            'job__nb_matches_valides',
            'job__nb_matches_pitches',
            'job__nb_matches_entretiens',
            'job__nb_matches_offre',
        )
The job__xxx fields are using the decorator @property, for instance:
@property
def job__nb_matches_offre(self):
    return self.matches.filter(current_status__step_name='Offre').count()
The problem is each time I add one of these properties to my serializer's fields, the number of DB queries increases significantly. This is of course due to the fact that each property calls the DB multiple times. So here is my question:
Is there a way to optimize the number of queries made to the DB, either by changing something in the serializer or by getting the number of matches for a specific status in a different manner?
I have had a look at select_related and prefetch_related. This allows me to reduce the numbers of queries when getting information about the company but not really for the number of matches.
Any help is greatly appreciated :)
What you want is to annotate your queryset with these values which will result in the database doing all the counting in just one query. The result is significantly faster than your current solution.
Example:
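from django.db.models import Count, Q

# Count the related matches per job, one conditional count per status.
# 'matches' is the reverse relation that self.matches in the question relies on.
Job.objects.annotate(
    nb_matches_offre=Count(
        'matches',
        filter=Q(matches__current_status__step_name='Offre')
    ),
    nb_matches_entretiens=Count(...)
)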
The resulting queryset will contain Job objects that have the properties job_obj.nb_matches_offre and job_obj.nb_matches_entretiens with the count.
See also https://docs.djangoproject.com/en/3.0/topics/db/aggregation/
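To show why this collapses many queries into one, here is a rough sketch of the kind of SQL such an annotation asks the database to run, using the standard-library sqlite3 module instead of Django. The table and column names (job, job_match, step_name) are made up for the demo, and the status is stored directly on the match row to keep it short; the real schema and the exact SQL Django generates will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE job_match (id INTEGER PRIMARY KEY, job_id INTEGER, step_name TEXT);
    INSERT INTO job (id, name) VALUES (1, 'Backend dev');
    INSERT INTO job (id, name) VALUES (2, 'Designer');
    INSERT INTO job_match (job_id, step_name) VALUES (1, 'Offre');
    INSERT INTO job_match (job_id, step_name) VALUES (1, 'Offre');
    INSERT INTO job_match (job_id, step_name) VALUES (1, 'Entretiens');
    INSERT INTO job_match (job_id, step_name) VALUES (2, 'Entretiens');
""")

# One query returns every job together with one conditional count per status.
# COUNT ignores NULLs, so each CASE counts only the rows in that status.
rows = conn.execute("""
    SELECT job.id,
           COUNT(CASE WHEN job_match.step_name = 'Offre' THEN 1 END),
           COUNT(CASE WHEN job_match.step_name = 'Entretiens' THEN 1 END)
    FROM job
    LEFT JOIN job_match ON job_match.job_id = job.id
    GROUP BY job.id
    ORDER BY job.id
""").fetchall()
```

Each row holds the job id followed by its two counts, so the serializer can read plain values instead of firing one COUNT query per property and per job.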
Pretty much the same flavor as: Django get a QuerySet from array of id's in specific order. I tried https://stackoverflow.com/a/37648265/4810639
But my list of ids is huge (> 50000) and both qs = Foo.objects.filter(id__in=id_list) and qs = qs.order_by(preserved) buckle under the strain.
Note: I need a queryset due to the specific django method I'm overriding so anything returning a list won't work.
EDIT: In response to the comments I'm specifically overriding the get_search_results() in the admin. My search engine returns the id of the model(s) that match the query. But get_search_results() needs to return a queryset. Hence the large list of id's.
I did this by creating a FakeQueryset class that had enough of the functions of a regular queryset that it was able to act like one. Then when I needed to display it I would hand it over to a custom paginator that would only pull a few ids from the database at a time. Duck typing for the win!
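The answer does not include its FakeQueryset, so the following is only a sketch of the underlying idea: walk the id list in small chunks and yield objects in the original order. The names iter_in_order and fetch_by_ids are hypothetical; fetch_by_ids stands for anything that returns a dict mapping id to object, which is the shape Django's QuerySet.in_bulk() returns.

```python
def iter_in_order(id_list, fetch_by_ids, chunk_size=1000):
    """Yield objects in the order of id_list, fetching one chunk at a time."""
    for start in range(0, len(id_list), chunk_size):
        chunk = id_list[start:start + chunk_size]
        found = fetch_by_ids(chunk)  # expected to return {id: object}
        for pk in chunk:
            if pk in found:          # silently skip ids that no longer exist
                yield found[pk]


# Fake data source so the sketch runs without a database.
store = {i: "foo-%d" % i for i in range(10)}
chunk_sizes = []


def fake_fetch(ids):
    chunk_sizes.append(len(ids))
    return {i: store[i] for i in ids if i in store}


result = list(iter_in_order([7, 3, 42, 1, 5], fake_fetch, chunk_size=2))
```

Because it is a generator, a paginator that needs only one page can stop iterating early and the remaining chunks are never fetched.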
I use vue-bootstrap4-table (https://www.npmjs.com/package/vue-bootstrap4-table#8-filtering) with django-rest-framework.
The problem is that this component uses totally different query params for sorting, filtering, etc.
vue-bootstrap4-table
http://127.0.0.1:8000/api/products/?queryParams=%7B%22sort%22:[],%22filters%22:[%7B%22type%22:%22simple%22,%22name%22:%22code%22,%22text%22:%22xxx%22%7D],%22global_search%22:%22%22,%22per_page%22:10,%22page%22:1%7D&page=1
"filters":[{"type":"simple","name":"code","text":"xxx"}],
whereas Django-rest-framework needs this format:
../?code__icontains=...
How can I make DRF accept this format instead of its built-in one?
I use just ViewSet.
class ProductViewSet(viewsets.ModelViewSet):
    serializer_class = ProductSerializer
    filter_class = ProductFilter
    filter_backends = [filters.OrderingFilter]
    ordering_fields = '__all__'
Is it possible?
It translates to:
http://127.0.0.1:8000/api/products/?queryParams={"sort":[],"filters":[{"type":"simple","name":"code","text":"xxx"}],"global_search":"","per_page":10,"page":1}&page=1
It almost looks as though you're still supposed to serialize these arguments into the correct format manually, or send them in the body of the request instead of as a query param.
I don't know of a painless way to get DRF to deal with this automatically. However, since the value of 'queryParams' is valid JSON, you can override the methods you want on the ModelViewSet. This page describes the methods you can override. To get the JSON into a dict you can do json.loads(request.query_params['queryParams']). From then on you can filter and order manually with the ORM.
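As a rough sketch of the parsing step, the function below turns the queryParams value from the question into ORM-style lookup keyword arguments. It uses only the standard library, handles only the "simple" filter type shown in the question, and leaves sorting alone because the question does not show what a sort entry looks like. The function name and the allowed_fields whitelist are assumptions made for this example.

```python
import json
from urllib.parse import parse_qs, urlsplit


def lookups_from_query_params(raw_json, allowed_fields):
    """Translate vue-bootstrap4-table's JSON blob into filter kwargs."""
    params = json.loads(raw_json)
    lookups = {}
    for entry in params.get("filters", []):
        name = entry.get("name")
        # Only accept known field names so clients cannot filter on anything.
        if entry.get("type") == "simple" and name in allowed_fields and entry.get("text"):
            lookups[name + "__icontains"] = entry["text"]
    return lookups, params.get("page"), params.get("per_page")


url = (
    "http://127.0.0.1:8000/api/products/?queryParams=%7B%22sort%22:[],"
    "%22filters%22:[%7B%22type%22:%22simple%22,%22name%22:%22code%22,"
    "%22text%22:%22xxx%22%7D],%22global_search%22:%22%22,"
    "%22per_page%22:10,%22page%22:1%7D&page=1"
)
raw = parse_qs(urlsplit(url).query)["queryParams"][0]
lookups, page, per_page = lookups_from_query_params(raw, allowed_fields={"code"})
```

Inside an overridden viewset method, the resulting dict could then be applied with something like queryset.filter(**lookups).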
Or, of course, you could turn the query params into 'regular' query params client side. This is a great lib that can help you out with that: https://medialize.github.io/URI.js/.
Also, it's generally ill-advised to allow users to order against any field. With products this is probably relatively low risk, but don't make it a common practice.
I would like my models to automatically filter by current user.
I did this by defining:
class UserFilterManager(models.Manager):
    def get_queryset(self):
        return super(UserFilterManager, self).get_queryset().filter(owner=get_current_user())
where get_current_user() is a middleware which extracts the current user from the request passed to Django.
However, I need to use the models from Celery which does not go through the middleware. In these cases
MyModel.objects.all()
needs to become
MyModel.objects.filter(user=<some user>)
To avoid wrong queries caused by forgetting to filter by user, I would like the model/manager/queryset to assert when a query (any query) is performed without a filter on user.
Is there a way to achieve this?
From what I see get_queryset() cannot receive parameters and models.QuerySet won't provide aid here.
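One possible direction, sketched here with the standard library only: keep the manager as it is, but make get_current_user() fail loudly when no user has been set, and have code that runs outside a request (such as a Celery task) set the user explicitly. This is an assumption about how get_current_user() could be implemented, not the middleware from the question; acting_as is a hypothetical helper name.

```python
import contextvars
from contextlib import contextmanager

# Holds the user for the current execution context; deliberately no default.
_current_user = contextvars.ContextVar("current_user")


def get_current_user():
    """Return the active user, or raise if nobody set one."""
    try:
        return _current_user.get()
    except LookupError:
        raise RuntimeError(
            "no current user set; wrap the query in acting_as(user)"
        ) from None


@contextmanager
def acting_as(user):
    """Set the current user for the duration of a with block."""
    token = _current_user.set(user)
    try:
        yield
    finally:
        _current_user.reset(token)


try:
    get_current_user()
    unset_raises = False
except RuntimeError:
    unset_raises = True

with acting_as("alice"):
    inside = get_current_user()
```

With this shape, a query made from a task that forgot to wrap itself in acting_as(...) raises instead of silently returning rows for the wrong user or for nobody.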
I wish to use Django REST framework to create a number of model objects "together" -- i.e. in a single transaction.
The objective is that each of the objects will only be visible at the (successful) end of the transaction.
How can I do that?
Use atomic from django.db.transaction as a decorator around a function that performs the database operations you are after.
If obj_list contains a list of populated (but not saved) model objects, this will execute all operations as part of one transaction:
from django.db.transaction import atomic

@atomic
def save_multiple_objects(obj_list):
    # Every save() below commits or rolls back together.
    for o in obj_list:
        o.save()
If you want to save multiple objects as part of the same API request and they are all of the same type, you could, for example, POST a list of objects to an API endpoint; see Django REST framework post array of objects.
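The all-or-nothing behaviour can be seen in isolation with the standard-library sqlite3 module, whose connection object works as a transaction context manager. This is only an analogy for what atomic provides in Django, not Django code; the item table and save_all function are invented for the demo.

```python
import sqlite3


def save_all(conn, names):
    """Insert every name or none of them; report whether it succeeded."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for name in names:
                conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT NOT NULL)")

ok = save_all(conn, ["a", "b"])
failed = save_all(conn, ["c", None])  # None violates NOT NULL, so "c" is rolled back too
stored = [row[0] for row in conn.execute("SELECT name FROM item ORDER BY name")]
```

The second batch leaves no trace in the table, which is the property the question asks for: other readers see either all of the new objects or none of them.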
You can achieve this by using Django database transactions. Refer to the code below:
from django.db import transaction

with transaction.atomic():
    model_instance = form.save(commit=False)
    model_instance.creator = self.request.user
    model_instance.img_field.field.upload_to = 'directory/'+model_instance.name+'/logo'
    self.object = form.save()
This example is taken from my own answer to this SO post. This way, you can save or edit other dependencies before calling save().