Big django queryset in Python's if conditional expression


I have a queryset that is used in the code below:
result = 1 if queryset else 0
For a small queryset this is fine, but when the queryset gets bigger (more than 500,000 results) the program freezes, and it takes some time to stop it.
What is happening behind the scenes when Django's queryset is tested in the code above?
Is some extra work performed during that check?
Even though the queryset is big, calling count(), iterator(), or any other method causes no problem; the problem appears only in that conditional expression.
Edit:
The queryset is too big. The truth test populates the queryset's self._result_cache. The same thing happens with len() and when iterating over the queryset in a for loop.

Python uses either the __bool__ or the __len__ method to test the truth value of an object, and the implementation for the QuerySet class fetches all records:
https://github.com/django/django/blob/master/django/db/models/query.py#L279
def __bool__(self):
    self._fetch_all()
    return bool(self._result_cache)
It might be a better idea to use if queryset.count() or if queryset.exists() if that's what you want; both run a single query instead of loading every row into memory.
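To make the cost difference concrete without a database, here is a minimal pure-Python sketch. ToyQuerySet and its rows_fetched counter are hypothetical stand-ins invented for illustration, not Django's classes: __bool__ follows the _fetch_all() pattern shown above, while exists() stops after the first row, loosely like the LIMIT 1 query Django issues.

```python
class ToyQuerySet:
    """Hypothetical stand-in for a lazy QuerySet; this is not Django's class."""

    def __init__(self, rows):
        self._rows = rows          # pretend this iterable is the database table
        self._result_cache = None  # filled on the first full evaluation
        self.rows_fetched = 0      # how many rows were pulled from the "database"

    def _fetch_all(self):
        # Load every row into the cache, following the _fetch_all() pattern.
        if self._result_cache is None:
            self._result_cache = list(self._rows)
            self.rows_fetched += len(self._result_cache)

    def __bool__(self):
        # Same shape as the Django snippet above: evaluate everything, then test.
        self._fetch_all()
        return bool(self._result_cache)

    def exists(self):
        # Rough analogue of a LIMIT 1 existence check: look at one row at most.
        for _ in self._rows:
            self.rows_fetched += 1
            return True
        return False


big = ToyQuerySet(range(500_000))
result = 1 if big else 0          # the truth test triggers a full fetch
print(result, big.rows_fetched)   # 1 500000

also_big = ToyQuerySet(range(500_000))
result = 1 if also_big.exists() else 0
print(result, also_big.rows_fetched)  # 1 1
```

Both expressions produce the same result; only the amount of data pulled in differs.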

Related

when exactly django query execution occurs

In a technical interview, the interviewer asked me an odd question regarding the execution of querysets. Suppose we have a Profile model like the one below:
class Profile(models.Model):
    user = models.OneToOneField('User').select_related(User)
    surname = models.TextField(null=True)
q = Profile.objects.all()
or
q = Profile.objects.get(id=1)
l = q.filter(active=True)
He asked how many queries had been executed. I replied that, since the Python interpreter executes Profile.objects.all() at the beginning, one query has already been run. However, he answered zero, and one only once we actually use the query, something like this:
for a in l:
    a.surname
Is his answer correct for Django?
Another doubt was about models.OneToOneField('User'): why didn't he use django.contrib.auth.models.User, and why did he define models.OneToOneField('User').select_related(User)?

QuerySets are not evaluated until you do something that actually needs them to be evaluated. As the documentation for the class itself states, a QuerySet is meant to:
Represent a lazy database lookup for a set of objects.
Emphasis on the word lazy. This is because one often needs to call or chain several methods on a queryset; a good example is a GROUP BY, which requires subsequent calls to .values() and .annotate(). If a queryset were evaluated immediately, each step in the chain would send an unneeded query to the database, slowing execution to a crawl.
As to when exactly a queryset is evaluated, here is the short answer (for the long answer, refer to When QuerySets are evaluated [Django docs]):
Iterating a queryset
Slicing a queryset (with the step parameter)
Pickling/Caching a queryset
Calling repr(), len(), list(), or bool() on a queryset
Various methods such as get(), first(), last(), latest(), and earliest() also make a query to the database
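The interviewer's answer (zero queries after all() and filter(), one once the loop runs) can be illustrated with a toy class. LazyQuery and its queries_run counter are hypothetical, and this is only a sketch of the laziness idea, not how Django builds or runs SQL.

```python
class LazyQuery:
    """Hypothetical toy model of QuerySet laziness; not Django code."""

    queries_run = 0  # class-level count of simulated database hits

    def __init__(self, rows, filters=()):
        self._rows = rows
        self._filters = tuple(filters)

    def filter(self, **conditions):
        # Chaining only records another condition; nothing is "queried" yet.
        return LazyQuery(self._rows, self._filters + (conditions,))

    def __iter__(self):
        # Iteration is the point where the query actually runs.
        LazyQuery.queries_run += 1
        matches = [
            row for row in self._rows
            if all(row.get(key) == value
                   for cond in self._filters
                   for key, value in cond.items())
        ]
        return iter(matches)


profiles = [
    {"surname": "Smith", "active": True},
    {"surname": "Jones", "active": False},
]
q = LazyQuery(profiles)        # like Profile.objects.all(): no query yet
l = q.filter(active=True)      # still no query
print(LazyQuery.queries_run)   # 0
surnames = [a["surname"] for a in l]    # iterating runs exactly one query
print(LazyQuery.queries_run, surnames)  # 1 ['Smith']
```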

Overriding get_queryset leads to caching headache in ListView, where data remains stale

Could someone explain why overriding get_queryset and referencing the queryset via self completely caches the page? I need to wait 5 minutes or more before updates made to the database are displayed.
I'm trying to save a temporary value to each object and pass it to the template.
I've got everything working fine and dandy in example 3 but don't really understand what I did to make it work, so any insight would be great!
Example 1: Caches for several minutes, but r.css='abc' works OK
class AppointmentListView(ListView):
    qs = Appointment.objects.prefetch_related('client', 'patients')
    def get_queryset(self):
        for r in self.qs:
            r.css = 'abc'  #<-passes temp value to template ok
        return self.qs
Example 2: No caching problem but r.css='abc' now does not work
If I don't include a method and just have the queryset called automatically, there is no caching and database updates display immediately, but my temp data does not reach template.
class AppointmentListView(ListView):
    queryset = Appointment.objects.prefetch_related('client', 'patients')
    for r in queryset:
        r.css = 'abc'  #<- NOT passed to template
Example 3: No caching problem AND r.css='abc' works fine
Finally if I put everything in the method, it all works fine - temp data reaches the template and there's no caching.
class AppointmentListView(ListView):
    def get_queryset(self):
        qs = Appointment.objects.prefetch_related('client', 'patients')
        for r in qs:
            r.css = 'abc'  #<-passes to template ok
        return qs

The behavior you're seeing is how Python evaluates your code. Below is a simplified example that explains what you're seeing.
import random
class Example1(object):
    roll = random.randint(1, 6)  # this is evaluated immediately!
    def get_roll(self):
        return self.roll
ex1 = Example1()
# the call below always returns the same number!
# (until Python re-interprets the class)
ex1.get_roll()
If you type the code above into a python interpreter, you'll notice that ex1.get_roll() always returns the same number!
Example1.roll is a class attribute (sometimes called a class or static variable). It is evaluated only once, when the class body is executed.
class Example2(object):
    def get_roll(self):
        roll = random.randint(1, 6)
        return roll
In Example2, a new random number is generated every time the get_roll method is called.
For the examples listed in your question:
Example 1
qs is a class variable, and thus only gets evaluated once (which is why you see the "caching" behavior). Subsequent calls to get_queryset return the same qs object that was initially evaluated.
Example 2
You didn't override get_queryset, which means ListView.get_queryset implementation is used.
Django's ListView.get_queryset copies the queryset (by calling .all() on it) before it is evaluated, which is why you don't see "caching". However, because the queryset is copied, the effects of your for loop are thrown away.
Example 3
This is generally the correct way to write your code. You should write your methods like this if you don't want to see "caching" behavior.
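The reason the for loop's changes vanish in Example 2 can be sketched with a toy class. ToyQS is hypothetical and not Django's implementation; the one Django behavior it mirrors is that MultipleObjectMixin.get_queryset (used by ListView) calls .all() on a class-level queryset, which returns a copy without the already-populated result cache, so the rows are fetched again as fresh objects.

```python
class ToyQS:
    """Hypothetical stand-in for a QuerySet; not Django's implementation."""

    def __init__(self, source):
        self._source = source  # the "database" rows
        self._cache = None     # result cache, filled on first iteration

    def all(self):
        # Like QuerySet.all(): a fresh copy with an empty result cache.
        return ToyQS(self._source)

    def __iter__(self):
        if self._cache is None:
            # Each fetch builds brand-new row objects, as a real query would.
            self._cache = [dict(row) for row in self._source]
        return iter(self._cache)


# Example 2: the class-level for loop tags the rows of the original queryset...
class_level = ToyQS([{"pk": 1}, {"pk": 2}])
for r in class_level:
    r["css"] = "abc"

# ...but the view works with a copy, which re-fetches untagged rows.
copied = class_level.all()
print([r.get("css") for r in class_level])  # ['abc', 'abc']
print([r.get("css") for r in copied])       # [None, None]
```

Building the queryset inside get_queryset, as in Example 3, avoids both problems because each request creates and tags its own fresh queryset.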

How to modify a model after bulk update in django?

I tried some code like this:
mymodels = MyModel.objects.filter(status=1)
mymodels.update(status=4)
print(mymodels)
And the result is an empty list.
I know that I could replace the update with a for loop,
but that would make a lot of UPDATE queries.
Is there any way to keep working with mymodels after the bulk update?

Remember that Django's QuerySets are lazy:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated
but the update() method function is actually applied immediately:
The update() method is applied instantly, and the only restriction on the QuerySet that is updated is that it can only update columns in the model’s main table, not on related models.
So although your code creates the filter before calling update(), the filtered queryset is not evaluated until print() runs, which is after update() has already changed the rows. By then every matching record has status 4, so the status=1 filter matches nothing and the result is empty.

mymodels = MyModel.objects.filter(status=1)
objs = list(mymodels)  # evaluate now and keep the objects you are about to update
mymodels.update(status=4)
print(objs)  # these in-memory objects still show status=1; update() does not refresh them
should work.
The explanation of why has been given by Timmy O'Mahony.
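If you want the updated rows afterwards (not just the stale in-memory copies), one option is to remember their primary keys before the update and select them again by key. Below is a minimal in-memory sketch of that ordering; ToyFilter and the sample table are hypothetical. In Django itself the equivalent would presumably be pks = list(mymodels.values_list('pk', flat=True)) before the update and MyModel.objects.filter(pk__in=pks) after it.

```python
class ToyFilter:
    """Hypothetical lazy filter over an in-memory table; not Django code."""

    def __init__(self, table, **conditions):
        self.table = table            # list of dicts standing in for DB rows
        self.conditions = conditions  # checked only when results are requested

    def _matches(self, row):
        return all(row[key] == value for key, value in self.conditions.items())

    def update(self, **changes):
        # Like QuerySet.update(): modifies the stored rows immediately.
        for row in self.table:
            if self._matches(row):
                row.update(changes)

    def evaluate(self):
        # Like evaluating a QuerySet: the condition is checked right now.
        return [row for row in self.table if self._matches(row)]


table = [{"pk": 1, "status": 1}, {"pk": 2, "status": 1}, {"pk": 3, "status": 2}]
mymodels = ToyFilter(table, status=1)

pks = [row["pk"] for row in mymodels.evaluate()]  # remember the affected rows first
mymodels.update(status=4)

print(mymodels.evaluate())  # []: status=1 no longer matches anything
updated = [row for row in table if row["pk"] in pks]  # re-select by primary key
print(updated)  # [{'pk': 1, 'status': 4}, {'pk': 2, 'status': 4}]
```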

Django: When to use QuerySet none()

I just came across this in the Django docs:
Calling none() will create a queryset that never returns any objects and no query will be executed when accessing the results. A qs.none() queryset is an instance of EmptyQuerySet.
I build a lot of CRUD apps (surprise) and I can't think of a situation where I would need to use none().
Why would one want to return an EmptyQuerySet?

Usually in cases where you need to provide a QuerySet but there isn't one to provide, such as when calling a method or passing one to a template.
The advantage is if you know there is going to be no result (or don't want a result) and you still need one, none() will not hit the database.
For a non-realistic example, say you have an API where you can query your permissions. If the account hasn't been confirmed, since you already have the Account object and can see that account.is_activated is False, you could skip checking the database for permissions by just using Permission.objects.none().

In cases where you want to append to querysets but want an empty one to begin with.
This is similar to instantiating an empty list and then gradually appending meaningful values to it.
Example:
def get_me_queryset(conditionA, conditionB, conditionC):
    queryset = Model.objects.none()
    if conditionA:
        queryset |= some_complex_computation(conditionA)
    elif conditionB:
        queryset |= some_complex_computation(conditionB)
    if conditionC:
        queryset |= some_simple_computation(conditionC)
    return queryset
get_me_queryset should almost always return an instance of django.db.models.query.QuerySet (good programming practice), not None or [], or it will introduce headaches later.
This way, even if none of the conditions is true, your code still works, with no more type checking.
For those who do not understand the | operator's usage here:
queryset |= queryset2
It translates to:
queryset = queryset | queryset2
which combines the conditions of both querysets with SQL OR.
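The |= pattern works because none() is an empty starting point that leaves anything combined with it unchanged, and the function always returns the same type. The toy sketch below models a queryset as a frozenset of primary keys; ToyPkSet, pk_list() and the hard-coded key sets are hypothetical, and a real QuerySet | QuerySet combines WHERE clauses with SQL OR rather than building Python sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPkSet:
    """Hypothetical stand-in: a 'queryset' modeled as a frozenset of primary keys."""

    pks: frozenset = frozenset()

    @classmethod
    def none(cls):
        # Empty result; OR-ing anything into it leaves that thing unchanged.
        return cls(frozenset())

    def __or__(self, other):
        # Rough analogue of qs1 | qs2: rows matched by either side.
        return ToyPkSet(self.pks | other.pks)

    def pk_list(self):
        return sorted(self.pks)


def get_me_queryset(condition_a, condition_c):
    queryset = ToyPkSet.none()  # start empty, like Model.objects.none()
    if condition_a:
        queryset |= ToyPkSet(frozenset({1, 2}))  # hypothetical matching rows
    if condition_c:
        queryset |= ToyPkSet(frozenset({2, 3}))
    return queryset  # always a ToyPkSet, never None


print(get_me_queryset(False, False).pk_list())  # []
print(get_me_queryset(True, True).pk_list())    # [1, 2, 3]
```

Callers can call pk_list() on the result without first checking for None, which is the point the answer makes about returning a QuerySet.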

Another use of queryset.none() is when you don't know whether there will be objects but don't want to raise an error.
Example:
class DummyMixin(object):
    def get_context_data(self, **kwargs):
        """Add the pks of all objects to the context."""
        context = super(DummyMixin, self).get_context_data(**kwargs)
        objects_pks = context.get(
            "object_list",
            Mymodel.objects.none()
        ).values_list("pk", flat=True)
        context["objects_pks"] = objects_pks
        return context  # without this, the view would receive None as its context

Another good use case for this is when some calling method wants to call .values_list() or similar on the results. If the method returned None, you'd get an error like
AttributeError: 'NoneType' object has no attribute 'values_list'
But if your clause returns MyModel.objects.none() instead of None, the calling code will be happy, since the returned data is an empty queryset rather than a None object.
Another way of putting it is that it allows you to not mix up return types (like "this function returns a QuerySet or None," which is messy).

It's useful to see where qs.none() is used in other examples in the Django docs. For example, when initializing a model formset using a queryset if you want the resulting formset to be empty the example given is:
formset = AuthorFormSet(queryset=Author.objects.none())

In Django's admin, none() is used in InlineModelAdmin.get_queryset() to return an empty queryset depending on the result of has_view_or_change_permission(), as shown below:
class BaseModelAdmin(metaclass=forms.MediaDefiningClass):
    # ...
    def has_view_or_change_permission(self, request, obj=None):
        return self.has_view_permission(request, obj) or self.has_change_permission(
            request, obj
        )
    # ...
class InlineModelAdmin(BaseModelAdmin):
    # ...
    def get_queryset(self, request):
        queryset = super().get_queryset(request)
        if not self.has_view_or_change_permission(request):
            queryset = queryset.none()  # Here
        return queryset

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]
class Choice(models.Model):
    poll = models.ForeignKey(Poll, on_delete=models.CASCADE)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But is there a utility function to query all choices for a set of Polls?
Actually, I'm looking for something like the following (which is not supported, and I'm not asking how it could be made to work):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print(poll.choice_set.all())  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)  # first query: evaluate the parent queryset
    objects_by_id = {obj.id: obj for obj in objects}
    for obj in objects:
        setattr(obj, target_name, [])  # parents without children still get a list
    # Second query: fetch every related row for all parents at once.
    lookup = {field_name + '__in': list(objects_by_id)}
    for foreign in model.objects.filter(**lookup):
        parent = objects_by_id.get(getattr(foreign, field_name + '_id'))
        if parent is not None:
            getattr(parent, target_name).append(foreign)
    return objects
which is used as follows:
polls = Poll.objects.filter(category='foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now each object in polls has a 'choices' member with the list of its choices.
# This was achieved with only 2 SQL queries.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?

Time has passed, and this functionality has been available since Django 1.4 with the introduction of the prefetch_related() QuerySet method. This method effectively does what the suggested qbind function does, i.e. two queries are performed and the join occurs in Python land, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():  # served from the prefetch cache, no extra query
        print(choice)
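The "two queries, then a join in Python land" idea behind prefetch_related (and qbind) can be sketched with plain data. This is an illustration of the general technique only: attach_children is a hypothetical helper, plain dicts stand in for model instances, and Django's actual prefetch implementation differs in its details.

```python
from collections import defaultdict


def attach_children(parents, children, fk_name, attr_name):
    """Hypothetical helper: join two already-fetched result lists in Python.

    parents  -- list of dicts with an "id" key (result of the first query)
    children -- list of dicts holding fk_name (result of the second query)
    """
    grouped = defaultdict(list)
    for child in children:
        grouped[child[fk_name]].append(child)  # one pass over the children
    for parent in parents:
        # Parents without children still get an (empty) list.
        parent[attr_name] = grouped.get(parent["id"], [])
    return parents


polls = [{"id": 1, "category": "foo"}, {"id": 2, "category": "foo"}]
choices = [{"id": 10, "poll": 1}, {"id": 11, "poll": 1}, {"id": 12, "poll": 3}]

for poll in attach_children(polls, choices, "poll", "choices"):
    print(poll["id"], [c["id"] for c in poll["choices"]])
# 1 [10, 11]
# 2 []
```

Grouping by the foreign key once keeps the join linear in the number of rows instead of scanning the children for every parent.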

Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choice_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.

I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)

I think what you are trying to do is what's known as "eager loading" of child data, meaning you load the child list (choice_set) for each Poll in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try calling 'select_related' first and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.
