Django: prefetch_related() with m2m through relationship v2

I know there is already a similar question, but I think my case is a bit more complicated because I have a different entry point.
These are my models:
class m_Interaction(models.Model):
    fk_ip = models.ForeignKey('m_IP', related_name="interactions")

class m_User(models.Model):
    name = models.CharField(max_length=200)

class m_IP(models.Model):
    fk_user = models.ForeignKey('m_User', related_name="ips")

class m_Feature(models.Model):
    name = models.CharField(max_length=200)
    m2m_interaction = models.ManyToManyField(m_Interaction, related_name='features', through='m_Featurescore')

class m_Featurescore(models.Model):
    score = models.FloatField(null=False)
    fk_interaction = models.ForeignKey(m_Interaction, related_name='featurescore')
    fk_feature = models.ForeignKey(m_Feature, related_name='featurescore')
I start with m_User, follow the reverse relationship over m_IP to the Interactions (m_Interaction). Then I want to get every m_Featurescore.score for each Interaction for a specific instance of m_Feature.
My working query to access at least all interactions in a performant way:
m_User.objects.all().prefetch_related('ips__interactions')
But I can't figure out the correct 'prefetch_related'-statement to access the m_Featurescore.score like this
db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score
without making a lot of queries.
I already tried almost all combinations of the following:
'ips__interactions__features__featurescore'
Any suggestions?

I found the answer to my own question with the help of noamk in the comments:
I didn't consider that the get() method in db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score issues a new query every time it's called (it's fairly obvious now).
So I simply restructured my code: I no longer need get() and can take advantage of the prefetch.
If you still need to filter, use a Prefetch() object, as noamk suggested.
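To illustrate the restructuring: once the featurescore rows are prefetched, iterating over featurescore.all() reads from the prefetch cache, whereas get() or filter() builds a fresh query. The following is a minimal plain-Python sketch of the lookup pattern, not the code from the answer. The Featurescore dataclass and the scores_by_feature helper are hypothetical stand-ins for prefetched m_Featurescore rows.

```python
from dataclasses import dataclass


@dataclass
class Featurescore:
    # Hypothetical stand-in for one prefetched m_Featurescore row.
    fk_feature_id: int
    score: float


def scores_by_feature(prefetched_rows):
    """Index already-loaded rows by feature id so lookups need no query."""
    return {row.fk_feature_id: row.score for row in prefetched_rows}


# In Django, this list would come from list(db_obj_interaction.featurescore.all())
# after prefetch_related('ips__interactions__featurescore').
rows = [Featurescore(1, 0.25), Featurescore(2, 0.75)]
lookup = scores_by_feature(rows)
print(lookup[2])
```

If only one feature's scores are needed, passing Prefetch('ips__interactions__featurescore', queryset=m_Featurescore.objects.filter(fk_feature=db_obj_feature)) to prefetch_related() should restrict the prefetched rows to that feature; this has not been tested against the models above.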

Related

Django Add list generated from the text of one field to many to many field

I'm having a bit of trouble trying to bulk add a list of items to a many-to-many field, and despite having tried various things I have no clue how to approach this. I've looked at the Django documentation and can't seem to find what I'm looking for.
Here is the code for my models:
class Subject(models.Model):
    noun = models.CharField(max_length=30, null=True, blank=True)

class Knowledge(models.Model):
    item_text = models.TextField()
    item_subjects = models.ManyToManyField(Subject, null=True, blank=True)

def add_subjects(sender, instance, *args, **kwargs):
    if instance.item_info:
        item_subjects = classifier.predict_subjects(instance.item_info)
        if item_subjects:
            ....

post_save.connect(add_subjects, sender=Knowledge)
The list is generated by the classifier.predict_subjects function.
I have tried using the m2m_changed signal as well as pre_save and post_save. I'm not even sure the many-to-many field is the right option; would it be better to use a foreign key relationship?
In place of the '....' I have tried this, but it doesn't create the relationship and only saves the last item:
for sub in item_subjects:
    subject = Subject(id=instance.id, noun=sub)
    subject.save()
I've also tried
instance.item_subjects = item_subjects
and a load more things that I can't really remember, I don't really think I'm in the right ballpark to be honest. Any suggestions?
edit:
ok, so I have got it adding all of the list items but still haven't managed to link these items to the many to many field.
for sub in item_subjects:
    subject = Subject.objects.get_or_create(noun=sub)
edit 2:
So doing pretty much exactly the same thing outside of the loop in the Django shell seems to be working and saves the entry but it doesn't inside the function.
>>> k[0].item_subjects.all()
<QuerySet []>
>>> d, b = Subject.objects.get_or_create(noun="cats")
<Subject: cats>
>>> k[0].item_subjects.add(d)
>>> k[0].item_subjects.all()
<QuerySet [<Subject: cats>]>
edit 3
So I took what Robert suggested, and it works in the shell just like above, but not when using the admin interface. The print statements in my code show the related items being updated, but the change doesn't persist. I read around, and this seems to be a problem with the admin form clearing items before saving.
def sub_related_changed(sender, instance, *args, **kwargs):
    print(instance.item_subjects.all())
    if instance.item_info:
        item_subjects = classifier.predict_subjects(instance.item_info)
        if item_subjects:
            for sub in item_subjects:
                subject, created = Subject.objects.get_or_create(noun=sub)
                instance.item_subjects.add(subject)
    print(instance.item_subjects.all())

post_save.connect(sub_related_changed, sender=Knowledge)
I have tried using the function as m2m_changed signal as follows:
m2m_changed.connect(model_saved, sender=Knowledge.item_subjects.through)
But this either generates a recursive loop or doesn't fire.

Once you have the subject objects (as you have in your edit), you can add them with
for sub in item_subjects:
    subject, created = Subject.objects.get_or_create(noun=sub)
    instance.item_subjects.add(subject)
The "item_subjects" attribute is a way of managing the related items. The through relationships are created via the "add" method.
Once you've done this, you can do things like instance.item_subjects.filter(noun='foo') or instance.item_subjects.all().delete() and so on
Documentation Reference: https://docs.djangoproject.com/en/1.11/topics/db/examples/many_to_many/
EDIT
Ahh, I didn't realize that this was taking place in the Django admin. I think you're right that that's the issue. Upon save, the admin calls two methods. The first is save_model(), which calls the model's save() method (where I assume this code lives). The second is save_related(), which first clears out ManyToMany relationships and then saves them based on the submitted form data. In your case, there is no valid form data because you're creating the objects on save.
If you put the relevant parts of this code into the save_related() method of the admin, the changes should persist.
I can be more specific about where it should go if you'll post both your <app>/models.py and your <app>/admin.py files.
Reference from another SO question:
Issue with ManyToMany Relationships not updating inmediatly after save

Understanding normalization tables in Django's ORM

I'm trying to learn Django from a background of coding the database schema directly myself. I want to understand how I should be effectively using the database abstraction tools to normalize.
As a contrived example, let's say I have a conversation that can ask questions on 3 subjects, and each question is complicated enough to warrant its own Class.
class Conversation(models.Model):
    partner = models.CharField()

class Weather_q(models.Model):
    # stuff
    pass

class Health_q(models.Model):
    # stuff
    pass

class Family_q(models.Model):
    # stuff
    pass
So let's say I want to have 2 conversations:
Conversation 1 with Bob: ask two different weather questions and one question about his health
Conversation 2 with Alice: ask about the weather and her family
Usually, I would code myself a normalization table for this:
INSERT INTO Conversation (partner) VALUES ('Bob'), ('Alice'); -- primary keys = 1 and 2
INSERT INTO NormalizationTable (fk_Conversation, fk_Weather_q, fk_Health_q, fk_Family_q) VALUES
(1,1,0,0), -- Bob weather#1
(1,2,0,0), -- Bob weather#2
(1,0,1,0), -- Bob health#1
(2,1,0,0), -- Alice weather#1
(2,0,0,1); -- Alice family#1
Do I need to explicitly create this normalization table or is that discouraged?
class NormalizationTable(models.Model):
    fk_Conversation = models.ForeignKey(Conversation)
    fk_Weather_q = models.ForeignKey(Weather_q)
    fk_Health_q = models.ForeignKey(Health_q)
    fk_Family_q = models.ForeignKey(Family_q)
I then wanted to execute the conversations. I wrote a view like this (skipping exception catching and logic to iterate through multiple questions per conversation):
from django.shortcuts import render
from myapp.models import Conversation, NormalizationTable, Weather_q, Health_q, Family_q

def converse(request):
    # get this conversation's pk
    # assuming "mypartner" is provided by the URL dispatcher
    conversation = Conversation.objects.filter(partner=mypartner)[0]
    # get the relevant rows of the NormalizationTable
    questions = NormalizationTable.objects.filter(fk_Conversation=conversation)
    for question in questions:
        if question.fk_Weather_q_id:
            return render(request, "weather.html", {"question": Weather_q.objects.filter(pk=question.fk_Weather_q_id)[0]})
        if question.fk_Health_q_id:
            return render(request, "health.html", {"question": Health_q.objects.filter(pk=question.fk_Health_q_id)[0]})
        if question.fk_Family_q_id:
            return render(request, "family.html", {"question": Family_q.objects.filter(pk=question.fk_Family_q_id)[0]})
Considered holistically, is this the "Django" way to solve this kind of normalization problem (N objects associated with a container object)? Can I make better use of Django's inbuilt ORM or other tools?

Leaving aside "normalization tables" (the term is unfamiliar to me), this is what I think is a "Djangoish" way of solving your problem. Please note that I went with your statement "each question is complicated enough to warrant its own Class". To me this means that every type of question needs its own unique fields and methods. Otherwise I would create a single Question model connected to a Category model by a ForeignKey.
class Partner(models.Model):
    name = models.CharField()

class Question(models.Model):
    # Fields and methods common to all kinds of questions
    partner = models.ForeignKey(Partner)
    label = models.CharField()  # example field

class WeatherQuestion(Question):
    # Fields and methods for weather questions only
    pass

class HealthQuestion(Question):
    # Fields and methods for health questions only
    pass

class FamilyQuestion(Question):
    # Fields and methods for family questions only
    pass
This way you would have a base Question model for all the fields and methods common to all questions, and a set of child models describing the different kinds of questions. There is an implicit relation between the base model and its child models, maintained by Django. This lets you create a single queryset containing different questions, no matter their type. Items in this queryset are of Question type by default, but can be converted to a particular question type by accessing a special attribute (for example, the healthquestion attribute for a HealthQuestion). This is described in detail in the "Multi-table inheritance" section of the Django documentation.
Then in a view you can get a list of (different types of) questions and then detect their particular type:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).first()
    # Detect question type
    question_type = "other"
    question_obj = question
    # in real life the list of types below would probably live in the settings
    for current_type in ['weather', 'health', 'family']:
        if hasattr(question, current_type + 'question'):
            question_type = current_type
            question_obj = getattr(question, current_type + 'question')
            break
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question_obj}
    )
The code for detecting the question type is quite ugly and complicated. You could make it much simpler and more generic using the InheritanceManager from the django-model-utils package. You would need to install the package and add this line to the Question model:
objects = InheritanceManager()
The view would then look something like this:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).select_subclasses().first()
    question_type = question._meta.object_name.lower()
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question}
    )
Both views select only a single question - the first one. That's how the view in your example behaved, so I went with it. You could easily convert those examples to return a list of questions (of different types).

I'm not familiar with the term normalization table, but I see what you're trying to do.
What you've described is not, in my opinion, a very satisfactory way to model a database. The simplest approach would be to make all questions part of the same table, with a "type" field, and maybe some other optional fields that vary between the types. In that case, this becomes very simple in Django.
But, OK, you said "let's say... each question is complicated enough to warrant its own class." Django does have a solution for that, which is generic relations. It would look something like this:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class ConversationQuestion(models.Model):
    conversation = models.ForeignKey(Conversation)
    content_type = models.ForeignKey(ContentType)
    question_id = models.PositiveIntegerField()
    question = GenericForeignKey('content_type', 'question_id')

# you can use prefetch_related("question") for efficiency
cqs = ConversationQuestion.objects.filter(conversation=conversation)
for cq in cqs:
    # do something with the question
    # you can look at the content_type if, as above, you need to choose
    # a separate template for each type.
    print(cq.question)
Because it's part of Django, you get some (but not total) support in terms of the admin, forms, etc.
Or you could do what you've done above, but, as you noticed, it's ugly and doesn't seem to capture the advantages of working with an ORM.
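To make the idea behind a generic relation concrete, here is a minimal plain-Python sketch (no Django involved): each link row stores a type label plus an id, much like content_type plus question_id, and a lookup resolves the pair to the object it points at. The names WeatherQ, HealthQ, TABLES, LINKS and questions_for are all hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class WeatherQ:
    text: str


@dataclass
class HealthQ:
    text: str


# Hypothetical in-memory "tables", keyed by primary key.
TABLES = {
    "weather": {1: WeatherQ("Is it raining?"), 2: WeatherQ("Is it windy?")},
    "health": {1: HealthQ("How are you feeling?")},
}

# Each link row: (partner, type label, object id).
LINKS = [
    ("Bob", "weather", 1),
    ("Bob", "weather", 2),
    ("Bob", "health", 1),
    ("Alice", "weather", 1),
]


def questions_for(partner):
    """Resolve each (type, id) pair for this partner to the object it points at."""
    return [TABLES[kind][pk] for who, kind, pk in LINKS if who == partner]


for q in questions_for("Bob"):
    # The concrete class tells us which template a view might pick.
    print(type(q).__name__.lower(), q.text)
```

Compared with one nullable foreign key column per question type, this layout needs no schema change when a new question type is added, at the cost of the database no longer enforcing that the id points at an existing row.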

Django: Can you tell if a related field has been prefetched without fetching it?

I was wondering if there is a way in Django to tell if a related field, specifically the "many" part of a one-to-many relationship, has been fetched via, say, prefetch_related() without actually fetching it?
So, as an example, let's say I have these models:
class Question(Model):
    """Class that represents a question."""

class Answer(Model):
    """Class that represents an answer to a question."""
    question = ForeignKey('Question', related_name='answers')
Normally, to get the number of answers for a question, the most efficient way to get this would be to do the following (because the Django docs state that count() is more efficient if you just need a count):
# Note: "question" is an instance of class Question.
answer_count = question.answers.count()
However in some cases the answers may have been fetched via a prefetch_related() call (or some way, such as previously having iterated through the answers). So in situations like that, it would be more efficient to do this (because we'd skip the extra count query):
# Answers were fetched via prefetch_related()
answer_count = len(question.answers.all())
So what I really want to do is something like:
if question.answers_have_been_prefetched:  # Does this exist?
    answer_count = len(question.answers.all())
else:
    answer_count = question.answers.count()
I'm using Django 1.4 if it matters. Thanks in advance.
Edit: added clarification that prefetch_related() isn't the only way the answers could've been fetched.

Yes, Django stores the prefetched results in the _prefetched_objects_cache attribute of the parent model instance.
So you can do something like:
instance = Parent.objects.prefetch_related('children').all()[0]
try:
    instance._prefetched_objects_cache[instance.children.prefetch_cache_name]
    # OK, it's prefetched
    child_count = len(instance.children.all())
except (AttributeError, KeyError):
    # Not prefetched
    child_count = instance.children.count()
See the relevant use in the django source trunk or the equivalent in v1.4.9
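The same check can be wrapped in a small helper. The sketch below is illustrative only: it assumes the prefetch cache is keyed by the relation's related_name (as with related_name='answers' in the question), which may not hold in every Django version or when a custom to_attr is used, and _prefetched_objects_cache is a private attribute that could change. FakeManager and FakeQuestion are hypothetical stand-ins so the example runs without Django.

```python
def related_count(instance, relation_name):
    """Return the size of a reverse relation, skipping the COUNT query if prefetched.

    Assumption: the prefetch cache key equals relation_name.
    """
    cache = getattr(instance, "_prefetched_objects_cache", {})
    manager = getattr(instance, relation_name)
    if relation_name in cache:
        return len(manager.all())
    return manager.count()


class FakeManager:
    """Hypothetical stand-in for a related manager; counts count() calls."""

    def __init__(self, rows):
        self._rows = rows
        self.count_queries = 0

    def all(self):
        return list(self._rows)

    def count(self):
        self.count_queries += 1
        return len(self._rows)


class FakeQuestion:
    """Hypothetical stand-in for a Question instance."""

    def __init__(self, rows, prefetched):
        self.answers = FakeManager(rows)
        if prefetched:
            self._prefetched_objects_cache = {"answers": rows}


prefetched = FakeQuestion(["a1", "a2"], prefetched=True)
plain = FakeQuestion(["a1", "a2"], prefetched=False)
print(related_count(prefetched, "answers"), prefetched.answers.count_queries)
print(related_count(plain, "answers"), plain.answers.count_queries)
```

This does not cover the other case mentioned in the question, where the answers were loaded by iterating over them earlier.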

How to handle data correction in Django (but this isn't Django-specific!)

I have a Django application that gathers information about composers (in the musical sense) from various sources - APIs, HTTP POSTs, scraping, and so on.
Once this information is aggregated, it's not very high quality. So you might have "J S Bach" in one place, "J. S. Bach" in another, and various other mistakes. This leads to several rows in my table that represent the same person.
I want to eliminate these duplicates, by making "J. S. Bach" the canonical version, and have it so that if we ever see "J S Bach", we know to correct it. In reality, there are quite a lot of variations, but I'm happy for the process of correction to be a manual one with human input.
So my question is, what's the best way to represent this in code? At the moment, my model is:
class Composer(models.Model):
    name = models.CharField(max_length=100)
Should I:
Have a new ComposerCorrection model, that maps composer_id to canonical_id?
Add an optional canonical_id to the Composer model?
Some other thing I've not considered?
It's also worth mentioning that there are other relationships that involve composer, such as a Work belonging to a Composer. When a correction happens, these IDs would also need to be re-pointed somehow, but I think that's not part of the main problem here.
Let me know if you'd like any more information!

Adding on to VascoP's answer (I'd make this a comment but there's a little too much code in it), you could store his replace_dic in the database so that you can add corrections through e.g. the Django admin, without having to change any code. This might look like:
class ComposerCorrection(models.Model):
    wrong_name = models.CharField(max_length=100, unique=True)
    canonical_name = models.CharField(max_length=100)

def correct_name(name):
    try:
        return ComposerCorrection.objects.get(wrong_name=name).canonical_name
    except ComposerCorrection.DoesNotExist:
        return name
Then you can put correct_name in the save() method of Composer (or as a pre-save signal), and also add VascoP's correctComposer function as a post-save signal for ComposerCorrection objects, so that adding a new one will fix the database without having to do anything else.

When you find a wrongly named Composer you should update these relationships and remove the wrongly named Composer:
def correctComposer(canonical_composer_name, wrong_composer_name):
    canonical_composer = Composer.objects.get(name__exact=canonical_composer_name)
    wrong_composer = Composer.objects.get(name__exact=wrong_composer_name)
    # repeat this for each relationship
    work = wrong_composer.work_set.all()
    for entry in work:
        entry.composer = canonical_composer
        entry.save()
    wrong_composer.delete()
EDIT: That works for previously inserted Composers. For auto-correcting upon insertion a different method could be used since we don't need to create new composers if there's already a canonical composer that suits him.
For this you can keep a dictionary of frequent mistakes (which should be kept near the model for readability) and a correctNames function:
replace_dic = {
    'motzart': 'Mozart',
    'j s bach': 'J. S. Bach'
}

def correctNames(name, dic):
    return dic.get(name.lower(), name)
By making keys lowercase you get case-insensitive replacement which is kind of a bonus.
And then you might override the Composer save method like this:
def save(self, *args, **kwargs):
    self.name = correctNames(self.name, replace_dic)
    super(Composer, self).save(*args, **kwargs)
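The lowercase-key trick can be taken a step further by normalizing punctuation and spacing before the lookup, so that "J S Bach", "J. S. Bach" and "j.s. bach" all map to the same key. This is a sketch under assumptions, not part of the answer: name_key, CANONICAL and canonicalize are hypothetical names, and the normalization rules (drop periods, collapse whitespace, lowercase) are just one plausible choice.

```python
import re


def name_key(name):
    """Reduce a name to a comparison key: lowercase, no periods, single spaces."""
    without_periods = name.replace(".", " ")
    return re.sub(r"\s+", " ", without_periods).strip().lower()


# Hypothetical correction table, keyed by normalized form.
CANONICAL = {
    name_key("J. S. Bach"): "J. S. Bach",
    "motzart": "Mozart",
}


def canonicalize(name):
    """Return the canonical spelling if the normalized name is known, else the input."""
    return CANONICAL.get(name_key(name), name)


for raw in ["J S Bach", "j.s. bach", "Motzart", "Haydn"]:
    print(raw, "->", canonicalize(raw))
```

Variants that differ by more than case, periods and spacing (for example "JS Bach") would still need their own entries or manual review.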

If Composer only contains a name before data collection is finished, then for simplicity I might not normalize the composer name into a Composer model at first, and instead store it directly on the Work instance, like this:
class Work(models.Model):
    composer_name = models.CharField(max_length=100)
    ...
You can then manually filter by composer name and perform batch updates in the admin changelist of Work, with the help of list filters and admin actions.
Later you could create Composer instances and link Work instances to them, or even use composer_name as the primary key of Composer.

Django ORM: Selecting related set

Say I have 2 models:
class Poll(models.Model):
    category = models.CharField(u"Category", max_length=64)
    [...]

class Choice(models.Model):
    poll = models.ForeignKey(Poll)
    [...]
Given a Poll object, I can query its choices with:
poll.choice_set.all()
But, is there a utility function to query all choices from a set of Poll?
Actually, I'm looking for something like the following (which is not supported, and I don't see how it could be):
polls = Poll.objects.filter(category='foo').select_related('choice_set')
for poll in polls:
    print(poll.choice_set.all())  # this shouldn't perform a SQL query at each iteration
I made an (ugly) function to help me achieve that:
def qbind(objects, target_name, model, field_name):
    objects = list(objects)
    objects_dict = dict([(object.id, object) for object in objects])
    for foreign in model.objects.filter(**{field_name + '__in': objects_dict.keys()}):
        id = getattr(foreign, field_name + '_id')
        if id in objects_dict:
            object = objects_dict[id]
            if hasattr(object, target_name):
                getattr(object, target_name).append(foreign)
            else:
                setattr(object, target_name, [foreign])
    return objects
which is used as follows:
polls = Poll.objects.filter(category = 'foo')
polls = qbind(polls, 'choices', Choice, 'poll')
# Now, each object in polls have a 'choices' member with the list of choices.
# This was achieved with 2 SQL queries only.
Is there something easier already provided by Django? Or at least, a snippet doing the same thing in a better way.
How do you handle this problem usually?

Time has passed, and this functionality is now available in Django 1.4 with the introduction of the prefetch_related() QuerySet method. It effectively does what the suggested qbind function does: two queries are performed and the join occurs in Python, but now this is handled by the ORM.
The original query request would now become:
polls = Poll.objects.filter(category = 'foo').prefetch_related('choice_set')
As is shown in the following code sample, the polls QuerySet can be used to obtain all Choice objects per Poll without requiring any further database hits:
for poll in polls:
    for choice in poll.choice_set.all():
        print(choice)
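For readers curious what "two queries and a join in Python" looks like underneath, here is a minimal sketch using only the standard library's sqlite3 module. It is not how Django implements prefetch_related() internally; the table layout and sample rows are made up for the illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE poll (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE choice (id INTEGER PRIMARY KEY, poll_id INTEGER, label TEXT);
    INSERT INTO poll VALUES (1, 'foo'), (2, 'foo'), (3, 'bar');
    INSERT INTO choice VALUES
        (1, 1, 'yes'), (2, 1, 'no'), (3, 2, 'maybe'), (4, 3, 'other');
""")

# Query 1: the parent rows.
polls = conn.execute(
    "SELECT id FROM poll WHERE category = ? ORDER BY id", ("foo",)
).fetchall()
poll_ids = [row[0] for row in polls]

# Query 2: all children of those parents in a single IN (...) query.
placeholders = ",".join("?" * len(poll_ids))
rows = conn.execute(
    "SELECT poll_id, label FROM choice WHERE poll_id IN (%s) ORDER BY id"
    % placeholders,
    poll_ids,
).fetchall()

# The "join" happens in Python: group children by their parent id.
choices_by_poll = defaultdict(list)
for poll_id, label in rows:
    choices_by_poll[poll_id].append(label)

print(dict(choices_by_poll))
```

The number of queries stays at two regardless of how many polls match, which is the property the original qbind function was written to achieve.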

Update: Since Django 1.4, this feature is built in: see prefetch_related.
First answer: don't waste time writing something like qbind until you've already written a working application, profiled it, and demonstrated that N queries is actually a performance problem for your database and load scenarios.
But maybe you've done that. So second answer: qbind() does what you'll need to do, but it would be more idiomatic if packaged in a custom QuerySet subclass, with an accompanying Manager subclass that returns instances of the custom QuerySet. Ideally you could even make them generic and reusable for any reverse relation. Then you could do something like:
Poll.objects.filter(category='foo').fetch_reverse_relations('choice_set')
For an example of the Manager/QuerySet technique, see this snippet, which solves a similar problem but for the case of Generic Foreign Keys, not reverse relations. It wouldn't be too hard to combine the guts of your qbind() function with the structure shown there to make a really nice solution to your problem.

I think what you're saying is, "I want all Choices for a set of Polls." If so, try this:
polls = Poll.objects.filter(category='foo')
choices = Choice.objects.filter(poll__in=polls)

I think the term for what you are trying to do is "eager loading" of child data, meaning you load the child list (choice_set) for each Poll in the first query to the DB, so that you don't have to make a bunch of queries later on.
If this is correct, then what you are looking for is 'select_related' - see https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
I noticed you tried 'select_related' but it didn't work. Can you try doing the 'select_related' and then the filter. That might fix it.
UPDATE: This doesn't work, see comments below.
