Django ORM: Selecting related set - python

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]
class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print poll.choice_set.all()  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category = 'foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls has a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?

Time has passed, and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. It effectively does what the suggested qbind function does: two queries are performed and the join happens in Python, but this is now handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():  # a related manager is not iterable; .all() reads the prefetched results
        print choice
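To make the "two queries, joined in Python" idea concrete, here is a minimal sketch that uses plain dictionaries in place of database rows. It is not Django's implementation; `poll_rows`, `choice_rows`, and `attach_children` are hypothetical names standing in for the results of the two queries and for the grouping step.

```python
from collections import defaultdict

# Stand-ins for the rows returned by the two queries (hypothetical data).
poll_rows = [{"id": 1, "category": "foo"}, {"id": 2, "category": "foo"}]
choice_rows = [
    {"id": 10, "poll_id": 1, "text": "yes"},
    {"id": 11, "poll_id": 1, "text": "no"},
    {"id": 12, "poll_id": 2, "text": "maybe"},
]


def attach_children(parents, children, fk_name, target_name):
    """Group children by foreign key and attach each group to its parent."""
    grouped = defaultdict(list)
    for child in children:
        grouped[child[fk_name]].append(child)
    for parent in parents:
        # Parents with no children get an empty list rather than a missing key.
        parent[target_name] = grouped.get(parent["id"], [])
    return parents


polls = attach_children(poll_rows, choice_rows, "poll_id", "choices")
for poll in polls:
    print(poll["id"], [c["text"] for c in poll["choices"]])
```

Each child row is visited once, so the grouping cost is linear in the number of rows, regardless of how many parents there are.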

Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choice_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.

I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)

I think what you are trying to do is called "eager loading" of child data: you load the child list (choice_set) for each Poll in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' first and then the filter? That might fix it.
UPDATE: This doesn't work, see comments below.

Related

Django: prefetch_related() with m2m through relationship v2

I know there is already a similar question, but I think my case is a bit more complicated because I have a different entry point.
These are my models:
class m_Interaction(models.Model):
    fk_ip = models.ForeignKey('m_IP', related_name="interactions")
class m_User(models.Model):
    name = models.CharField(max_length=200)
class m_IP(models.Model):
    fk_user = models.ForeignKey('m_User', related_name="ips")
class m_Feature(models.Model):
    name = models.CharField(max_length=200)
    m2m_interaction = models.ManyToManyField(m_Interaction, related_name='features', through='m_Featurescore')
class m_Featurescore(models.Model):
    score = models.FloatField(null=False)
    fk_interaction = models.ForeignKey(m_Interaction, related_name='featurescore')
    fk_feature = models.ForeignKey(m_Feature, related_name='featurescore')
I start with m_User, follow the reverse relationship over m_IP to the Interactions (m_Interaction). Then I want to get every m_Featurescore.score for each Interaction for a specific instance of m_Feature.
My working query to access at least all interactions in a performant way:
m_User.objects.all().prefetch_related('ips__interactions')
But I can't figure out the correct 'prefetch_related'-statement to access the m_Featurescore.score like this
db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score
without making a lot of queries.
I already tried almost all combinations of the following:
'ips__interactions__features__featurescore'
Any suggestions?
I found the answer to my own question with the help of noamk in the comments:
I didn't consider that the get() method in db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score issues a new query every time it's called (it's kind of obvious now).
Therefore I simply restructured my code so that I no longer need get() and can benefit from the prefetch.
If somebody still needs to filter, a Prefetch() object should be used, as suggested by noamk.
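The restructured code is not shown in the post. One way it could look, sketched with plain Python objects rather than Django models: once the featurescore rows for an interaction are already in memory, pick the wanted one by looping over them instead of calling get(). `FeatureScore` and `score_for` are hypothetical stand-ins; with the real models the list would come from calling .all() on the prefetched featurescore relation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FeatureScore:
    # Hypothetical stand-in for an m_Featurescore row.
    fk_feature_id: int
    score: float


def score_for(featurescores: Iterable[FeatureScore], feature_id: int) -> Optional[float]:
    """Return the score for feature_id from rows that are already loaded."""
    for fs in featurescores:
        if fs.fk_feature_id == feature_id:
            return fs.score
    return None  # no score recorded for this feature


prefetched = [FeatureScore(1, 0.25), FeatureScore(2, 0.75)]
print(score_for(prefetched, 2))
```

Because the lookup only touches objects that are already loaded, it costs no extra query per interaction.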

Django prefetch_related GenericForeignKey with multiple content types

I'm using django-activity-stream to display a list of recent events. For the sake of example these could be someone commenting or someone editing an article. I.e. the GenericForeignKey action_object could reference a Comment or an Article. I'd like to display a link to whatever the action_object is too:
<a href="{{ action.action_object.get_absolute_url }}">
{{ action.action_object }}
</a>
The problem is this causes queries for every single item, particularly as Comment.get_absolute_url requires the comment's article, which has not been fetched yet, and Article.__unicode__ requires its revision.content, which also hasn't been fetched.
django-activity-stream already calls prefetch_related('action_object') automatically (related discussion).
This appears to be working as testing with {{ action.action_object.id }} results in a single query per action_object_content_type, despite the docs saying:
It also supports prefetching of GenericRelation and GenericForeignKey, however, it must be restricted to a homogeneous set of results. For example, prefetching objects referenced by a GenericForeignKey is only supported if the query is restricted to one ContentType.
And there is more than one content type. However in my use case above I need extra prefetch_related calls, for example:
query = query.prefetch_related('action_object__article', 'action_object__revision')
But this complains because Articles don't have an __article (and would probably complain about Comments not having a __revision too if it got that far). I'm assuming this is what the docs are really referring to. So I thought I'd try this:
comments = query._clone().filter(action_object_content_type=comment_ctype).prefetch_related('action_object__article')
articles = query._clone().filter(action_object_content_type=article_ctype).prefetch_related('action_object__revision')
query = comments | articles
But the results are always empty. I guess querysets only support a single prefetch_related list and can't be joined like that.
I'd like a single queryset to be returned, because further filtering is done later in code that this part doesn't know about. Once the queryset is finally evaluated, though, I want Django to fetch everything needed to render the events.
Is there another way?
I had a look at Prefetch objects but I don't think they offer any help in this situation.
A solution can be found in django-notify-x which is derived from django-notifications which, in turn, is derived from django-activity-stream. It makes use of a "django snippet" linked in the copied text below.
https://github.com/v1k45/django-notify-x/pull/19
Using a snippet from https://djangosnippets.org/snippets/2492/,
prefetch generic relations to reduce the number of queries.
Currently, we trigger one additional query for each generic relation
for each record, with this code, we reduce to one additional query for
each generic relation for each type of generic relation used.
If all your notifications are related to a Badges model, only one
additional query will be triggered.
For Django 1.10 and 1.11, I am using the snippet above modified as below (just in case you are not using django-activity-stream):
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import fields as generic
def get_field_by_name(meta, fname):
    return [f for f in meta.get_fields() if f.name == fname]
def prefetch_relations(weak_queryset):
    weak_queryset = weak_queryset.select_related()
    # reverse model's generic foreign keys into a dict:
    # { 'field_name': generic.GenericForeignKey instance, ... }
    gfks = {}
    for name, gfk in weak_queryset.model.__dict__.items():
        if not isinstance(gfk, generic.GenericForeignKey):
            continue
        gfks[name] = gfk
    data = {}
    for weak_model in weak_queryset:
        for gfk_name, gfk_field in gfks.items():
            related_content_type_id = getattr(
                weak_model,
                get_field_by_name(gfk_field.model._meta, gfk_field.ct_field)[0].get_attname())
            if not related_content_type_id:
                continue
            related_content_type = ContentType.objects.get_for_id(related_content_type_id)
            related_object_id = int(getattr(weak_model, gfk_field.fk_field))
            if related_content_type not in data.keys():
                data[related_content_type] = []
            data[related_content_type].append(related_object_id)
    for content_type, object_ids in data.items():
        model_class = content_type.model_class()
        models = prefetch_relations(model_class.objects.filter(pk__in=object_ids))
        for model in models:
            for weak_model in weak_queryset:
                for gfk_name, gfk_field in gfks.items():
                    related_content_type_id = getattr(
                        weak_model,
                        get_field_by_name(gfk_field.model._meta, gfk_field.ct_field)[0].get_attname())
                    if not related_content_type_id:
                        continue
                    related_content_type = ContentType.objects.get_for_id(related_content_type_id)
                    related_object_id = int(getattr(weak_model, gfk_field.fk_field))
                    if related_object_id != model.pk:
                        continue
                    if related_content_type != content_type:
                        continue
                    setattr(weak_model, gfk_name, model)
    return weak_queryset
This is giving me the intended results.
EDIT:
To use it, you simply call prefetch_relations, with your QuerySet as the argument.
For example, instead of:
my_objects = MyModel.objects.all()
you can do this:
my_objects = prefetch_relations(MyModel.objects.all())

Django: Can you tell if a related field has been prefetched without fetching it?

I was wondering if there is a way in Django to tell if a related field, specifically the "many" part of a one-to-many relationship, has been fetched via, say, prefetch_related() without actually fetching it?
So, as an example, let's say I have these models:
class Question(Model):
    """Class that represents a question."""
class Answer(Model):
    """Class that represents an answer to a question."""
    question = ForeignKey('Question', related_name='answers')
Normally, to get the number of answers for a question, the most efficient way to get this would be to do the following (because the Django docs state that count() is more efficient if you just need a count):
# Note: "question" is an instance of class Question.
answer_count = question.answers.count()
However, in some cases the answers may already have been fetched via a prefetch_related() call (or in some other way, such as previously having iterated through the answers). In situations like that, it would be more efficient to do this (because we'd skip the extra count query):
# Answers were fetched via prefetch_related()
answer_count = len(question.answers.all())
So what I really want to do is something like:
if question.answers_have_been_prefetched:  # Does this exist?
    answer_count = len(question.answers.all())
else:
    answer_count = question.answers.count()
I'm using Django 1.4 if it matters. Thanks in advance.
Edit: added clarification that prefetch_related() isn't the only way the answers could've been fetched.
Yes, Django stores the prefetched results in the _prefetched_objects_cache attribute of the parent model instance.
So you can do something like:
instance = Parent.objects.prefetch_related('children').all()[0]
try:
    instance._prefetched_objects_cache[instance.children.prefetch_cache_name]
    # OK, it's prefetched
    child_count = len(instance.children.all())
except (AttributeError, KeyError):
    # Not prefetched
    child_count = instance.children.count()
See the relevant use in the django source trunk or the equivalent in v1.4.9
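The check above could be wrapped in a small helper. The sketch below rests on several assumptions: `related_count` is a hypothetical name, `_prefetched_objects_cache` is a private Django attribute that may change between versions, the fallback to the accessor name as cache key is a guess, and the stand-in classes exist only so the example runs without a database.

```python
def related_count(instance, accessor_name):
    """Count related objects, reusing the prefetch cache when it is populated."""
    manager = getattr(instance, accessor_name)
    cache = getattr(instance, "_prefetched_objects_cache", {})
    # The answer looks the cache up under manager.prefetch_cache_name;
    # falling back to the accessor name is an assumption of this sketch.
    cache_name = getattr(manager, "prefetch_cache_name", accessor_name)
    if cache_name in cache:
        return len(cache[cache_name])
    return manager.count()


class FakeManager:
    # Stand-in for a related manager; records how often count() is called.
    prefetch_cache_name = "answers"

    def __init__(self):
        self.count_calls = 0

    def count(self):
        self.count_calls += 1
        return 3


class FakeQuestion:
    # Stand-in for a model instance, optionally with a populated prefetch cache.
    def __init__(self, prefetched=None):
        self.answers = FakeManager()
        if prefetched is not None:
            self._prefetched_objects_cache = {"answers": prefetched}


plain = FakeQuestion()
warm = FakeQuestion(prefetched=["a1", "a2"])
print(related_count(plain, "answers"), related_count(warm, "answers"))
```

Using getattr with a default and a membership test avoids the try/except while keeping the same two outcomes: read the cached list if present, otherwise issue a count.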

How to modify a model after bulk update in django?

I try some code like this:
mymodels = MyModel.objects.filter(status=1)
mymodels.update(status=4)
print(mymodels)
And the result is an empty list.
I know that I can use a for loop instead of the update,
but that would issue a lot of UPDATE queries.
Is there any way to keep working with mymodels after the bulk update?
Remember that Django's QuerySets are lazy:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated
but the update() method is actually applied immediately:
The update() method is applied instantly, and the only restriction on the QuerySet that is updated is that it can only update columns in the model’s main table, not on related models.
So although the update() call comes after the filter() in your code, it is executed first: the filter is only evaluated (lazily) when you print the queryset, by which time the status of your objects has already been changed. No records match any more, so the result is empty.
mymodels = MyModel.objects.filter(status=1)
objs = [obj for obj in mymodels] # save the objects you are about to update
mymodels.update(status=4)
print(objs)
should work.
The explanation of why has already been given by Timmy O'Mahony.
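The ordering issue can be reproduced without a database. The following is a toy model, not Django code: `LazyFilter` is a hypothetical class that stores its conditions and only applies them when iterated, while its update() takes effect at once.

```python
class LazyFilter:
    """Toy stand-in for a lazy QuerySet over an in-memory 'table'."""

    def __init__(self, table, **conditions):
        self.table = table
        self.conditions = conditions  # stored, not evaluated yet

    def _matches(self, row):
        return all(row[k] == v for k, v in self.conditions.items())

    def update(self, **changes):
        # Applied immediately to every row that matches right now.
        matched = [row for row in self.table if self._matches(row)]
        for row in matched:
            row.update(changes)
        return len(matched)

    def __iter__(self):
        # The filter only runs here, against the table's current state.
        return (row for row in self.table if self._matches(row))


table = [{"id": 1, "status": 1}, {"id": 2, "status": 1}, {"id": 3, "status": 2}]
pending = LazyFilter(table, status=1)

# Evaluate before updating; copies mimic already-loaded model instances.
saved = [dict(row) for row in pending]
updated = pending.update(status=4)
after = list(pending)  # re-evaluated: nothing has status=1 any more

print(updated, after, [row["id"] for row in saved])
```

As with real model instances loaded before a bulk update, the saved copies still carry the old status value; only the underlying table has changed.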

Django debug error

I have the following in my model:
class info(models.Model):
    add = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
And in the view, when I write
info_l = info.objects.filter(id=1)
logging.debug(info_l.name)
I get an error at the debug statement saying that name doesn't exist:
'QuerySet' object has no attribute 'name'
1. How can this be resolved?
2. Also, how do I query for only one field instead of selecting all of them, like select name from info?
1. Selecting Single Items
It looks like you're trying to get a single object. Using filter will return a QuerySet object (as is happening in your code), which behaves more like a list (and, as you've noticed, lacks the name attribute).
You have two options here. First, you can just grab the first element:
info_l = info.objects.filter(id=1)[0]
You could also use the objects.get method instead, which will return a single object (and raise an exception if it doesn't exist):
info_l = info.objects.get(id=1)
Django has some pretty good documentation on QuerySets, and it may be worth taking a look at it:
Docs on using filters
QuerySet reference
2. Retrieving Specific Fields
Django provides the defer and only methods, which let you choose which fields are fetched from the database rather than fetching everything at once. They don't prevent the other fields from being read; those fields are simply loaded lazily on first access. defer is an "opt-in" mode: you specify which fields should be lazily loaded. only is "opt-out": only the fields you pass will be eagerly loaded.
So in your example, you'd want to do something like this:
info_l = info.objects.filter(id=1).only('name')[0]
Though with a model as simple as the example you give, I wouldn't worry much at all about limiting fields.
