Understanding normalization tables in Django's ORM - python

I'm trying to learn Django from a background of coding the database schema directly myself. I want to understand how I should be effectively using the database abstraction tools to normalize.
As a contrived example, let's say I have a conversation that can ask questions on 3 subjects, and each question is complicated enough to warrant its own Class.
class Conversation(models.Model):
    partner = models.CharField()

class Weather_q(models.Model):
    pass  # stuff

class Health_q(models.Model):
    pass  # stuff

class Family_q(models.Model):
    pass  # stuff
So let's say I want to have 2 conversations:
Conversation 1 with Bob: ask two different weather questions and one question about his health
Conversation 2 with Alice: ask about the weather and her family
Usually, I would code myself a normalization table for this:
INSERT INTO Conversation (partner) VALUES ('Bob'), ('Alice'); -- primary keys = 1 and 2
INSERT INTO NormalizationTable (fk_Conversation, fk_Weather_q, fk_Health_q, fk_Family_q) VALUES
(1,1,0,0), -- Bob weather#1
(1,2,0,0), -- Bob weather#2
(1,0,1,0), -- Bob health#1
(2,1,0,0), -- Alice weather#1
(2,0,0,1); -- Alice family#1
Do I need to explicitly create this normalization table or is that discouraged?
class NormalizationTable(models.Model):
    fk_Conversation = models.ForeignKey(Conversation)
    fk_Weather_q = models.ForeignKey(Weather_q)
    fk_Health_q = models.ForeignKey(Health_q)
    fk_Family_q = models.ForeignKey(Family_q)
I then wanted to execute the conversations. I wrote a view like this (skipping exception catching and logic to iterate through multiple questions per conversation):
from django.shortcuts import render
from myapp.models import Conversation, NormalizationTable

def converse(request, mypartner):
    # get this conversation
    # assuming "mypartner" is provided by the URL dispatcher
    conversation = Conversation.objects.filter(partner=mypartner)[0]
    # get the relevant rows of the NormalizationTable
    questions = NormalizationTable.objects.filter(fk_Conversation=conversation)
    for question in questions:
        if question.fk_Weather_q:
            return render(request, "weather.html", {"question": question.fk_Weather_q})
        if question.fk_Health_q:
            return render(request, "health.html", {"question": question.fk_Health_q})
        if question.fk_Family_q:
            return render(request, "family.html", {"question": question.fk_Family_q})
Considered holistically, is this the "Django" way to solve this kind of normalization problem (N objects associated with a container object)? Can I make better use of Django's inbuilt ORM or other tools?

Leaving aside "normalization tables" (the term is unfamiliar to me), this is what I think is a "djangish" way of solving your problem. Please note that I went with your statement "each question is complicated enough to warrant its own Class". To me this means that every type of question needs its own unique fields and methods. Otherwise I would create a single Question model connected to a Category model by a ForeignKey.
class Partner(models.Model):
    name = models.CharField()

class Question(models.Model):
    # Fields and methods common to all kinds of questions
    partner = models.ForeignKey(Partner)
    label = models.CharField()  # example field

class WeatherQuestion(Question):
    # Fields and methods for weather questions only
    pass

class HealthQuestion(Question):
    # Fields and methods for health questions only
    pass

class FamilyQuestion(Question):
    # Fields and methods for family questions only
    pass
This way you would have a base Question model for all the fields and methods common to all questions, and a set of child models describing the different kinds of questions. There is an implicit relation between the base model and its child models, maintained by Django. This lets you build a single queryset containing different questions, no matter their type. Items in this queryset are of type Question by default, but each can be converted to its particular question type by accessing a special attribute (for example, the healthquestion attribute for a HealthQuestion). This is described in detail in the "Multi-table inheritance" section of the Django documentation.
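To make the implicit relation more concrete, here is a rough sketch of the table layout that multi-table inheritance produces, written with Python's sqlite3 module rather than Django itself. The table names are simplified assumptions (Django prefixes them with the app label), the weather-only and health-only columns are hypothetical, and `question_ptr_id` mirrors the name Django gives to the automatic link back to the parent row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (
        id INTEGER PRIMARY KEY,
        partner TEXT NOT NULL,   -- simplified: a plain name instead of a foreign key
        label TEXT NOT NULL
    );
    -- Each child table holds only its own fields plus a link to the parent row.
    CREATE TABLE weatherquestion (
        question_ptr_id INTEGER PRIMARY KEY REFERENCES question(id),
        city TEXT NOT NULL       -- hypothetical weather-only field
    );
    CREATE TABLE healthquestion (
        question_ptr_id INTEGER PRIMARY KEY REFERENCES question(id),
        symptom TEXT NOT NULL    -- hypothetical health-only field
    );
    INSERT INTO question VALUES (1, 'Bob', 'Is it raining?'), (2, 'Bob', 'How is your back?');
    INSERT INTO weatherquestion VALUES (1, 'Oslo');
    INSERT INTO healthquestion VALUES (2, 'back pain');
""")

# One query over the parent table returns questions of every type.
all_labels = [row[0] for row in conn.execute("SELECT label FROM question ORDER BY id")]

# Reaching a child's own fields is a join on the parent link.
weather_rows = conn.execute("""
    SELECT q.label, w.city
    FROM question AS q
    JOIN weatherquestion AS w ON w.question_ptr_id = q.id
""").fetchall()
```

Querying the parent table alone corresponds to `Question.objects.all()`, while following the `weatherquestion` attribute corresponds to the join.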
Then in a view you can get a list of (different types of) questions and then detect their particular type:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).first()
    # Detect question type
    question_type = "other"
    question_obj = question
    # in real life the list of types below would probably live in the settings
    for current_type in ['weather', 'health', 'family']:
        if hasattr(question, current_type + 'question'):
            question_type = current_type
            question_obj = getattr(question, current_type + 'question')
            break
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question_obj}
    )
The code for detecting question type is quite ugly and complicated. You could make it much simpler and more generic using the InheritanceManager from django-model-utils package. You would need to install the package and add the line to the Question model:
objects = InheritanceManager()
The view would then look something like this:
from django.shortcuts import render
from myapp.models import Question

def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).select_subclasses().first()
    question_type = question._meta.object_name.lower()
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question}
    )
Both views select only a single question - the first one. That's how the view in your example behaved, so I went with it. You could easily convert those examples to return a list of questions (of different types).

I'm not familiar with the term normalization table, but I see what you're trying to do.
What you've described is not, in my opinion, a very satisfactory way to model a database. The simplest approach would be to make all questions part of the same table, with a "type" field, and maybe some other optional fields that vary between the types. In that case, this becomes very simple in Django.
But, OK, you said "let's say... each question is complicated enough to warrant its own class." Django does have a solution for that, which is generic relations. It would look something like this:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class ConversationQuestion(models.Model):
    conversation = models.ForeignKey(Conversation)
    content_type = models.ForeignKey(ContentType)
    question_id = models.PositiveIntegerField()
    question = GenericForeignKey('content_type', 'question_id')

# you can use prefetch_related("question") for efficiency
cqs = ConversationQuestion.objects.filter(conversation=conversation)
for cq in cqs:
    # do something with the question
    # you can look at the content_type if, as above, you need to choose
    # a separate template for each type.
    print(cq.question)
Because it's part of Django, you get some (but not total) support in terms of the admin, forms, etc.
Or you could do what you've done above, but, as you noticed, it's ugly and doesn't seem to capture the advantages of working with an ORM.
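Conceptually, a generic foreign key stores two values per row: which model the target belongs to, and the target's primary key. The pair is resolved when the attribute is accessed. The following is a plain-Python sketch of that idea using the Bob and Alice example from the question. It is not Django's implementation, and the `TABLES` registry and the question texts are made up for illustration.

```python
# Hypothetical in-memory "tables", keyed by model name and then by primary key.
TABLES = {
    "weather_q": {1: "Is it raining?", 2: "How cold is it?"},
    "health_q": {1: "How is your back?"},
    "family_q": {1: "How are the kids?"},
}

# Each link row: (conversation_id, content_type, question_id).
CONVERSATION_QUESTIONS = [
    (1, "weather_q", 1),  # Bob weather#1
    (1, "weather_q", 2),  # Bob weather#2
    (1, "health_q", 1),   # Bob health#1
    (2, "weather_q", 1),  # Alice weather#1
    (2, "family_q", 1),   # Alice family#1
]


def questions_for(conversation_id):
    """Resolve every (content_type, question_id) pair for one conversation."""
    return [
        (content_type, TABLES[content_type][question_id])
        for conv_id, content_type, question_id in CONVERSATION_QUESTIONS
        if conv_id == conversation_id
    ]


bob = questions_for(1)
alice = questions_for(2)
```

Compared with the table in the question, each link row names exactly one target, so there are no unused foreign key columns to fill with placeholder values.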

Related

How to access data across M2M tables in Django?

What is the 'best practice' way of accessing data across a 1 (or more) many-to-many tables?
This is incredibly difficult for me as I am not sure what I should be googling/looking up.
I have attached a diagram of my data model. I am able to query data for 'C' related to a user by utilizing serializers.
There has to be a simpler way of doing this (I'm hoping).
Doing it with serializers seems incredibly limiting. I'd like to access a user's 'B' and 'C' and transform the object to only have a custom structure and possible unique values.
Any direction is much appreciated. Pretty new to Django, so I apologize for this newb type of question.
Here is an example of M2M relation using Django:
class User(models.Model):
    name = models.CharField(...)

class Song(models.Model):
    title = models.CharField(...)
    users_that_like_me = models.ManyToManyField('User', ..., related_name='songs_that_i_like')
So a User can like many Songs and a Song can be liked by many Users.
To see all the songs a user liked, we can do:
user = User.objects.get(id='<the-user-id>')
liked_songs = user.songs_that_i_like.all()
And to see all the users who like a particular song we can similarly do:
song = Song.objects.get(id='<the-song-id>')
users_that_like_this_song = song.users_that_like_me.all()
Both liked_songs and users_that_like_this_song are actually querysets, meaning we can do some Django magic on them.
For example, to find all users named Jon that liked this song we can do:
users_that_like_this_song.filter(name='Jon')
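Behind the scenes, a ManyToManyField is stored in a separate join table with one row per (song, user) pair. Here is a rough sqlite3 sketch of equivalent tables and of the kind of queries the calls above translate to. The `app_` prefix stands in for a hypothetical app label, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_song (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The join table: one row per "user likes song" pair.
    CREATE TABLE app_song_users_that_like_me (
        song_id INTEGER NOT NULL REFERENCES app_song(id),
        user_id INTEGER NOT NULL REFERENCES app_user(id),
        UNIQUE (song_id, user_id)
    );
    INSERT INTO app_user VALUES (1, 'Jon'), (2, 'Ann');
    INSERT INTO app_song VALUES (1, 'First Song'), (2, 'Second Song');
    INSERT INTO app_song_users_that_like_me VALUES (1, 1), (2, 1), (1, 2);
""")


def liked_songs(user_id):
    """Roughly what user.songs_that_i_like.all() asks the database."""
    rows = conn.execute(
        """
        SELECT s.title
        FROM app_song AS s
        JOIN app_song_users_that_like_me AS likes ON likes.song_id = s.id
        WHERE likes.user_id = ?
        ORDER BY s.id
        """,
        (user_id,),
    )
    return [title for (title,) in rows]


def users_that_like(song_id, name=None):
    """Roughly song.users_that_like_me.all(), optionally with .filter(name=...)."""
    sql = """
        SELECT u.name
        FROM app_user AS u
        JOIN app_song_users_that_like_me AS likes ON likes.user_id = u.id
        WHERE likes.song_id = ?
    """
    params = [song_id]
    if name is not None:
        sql += " AND u.name = ?"
        params.append(name)
    sql += " ORDER BY u.id"
    return [user_name for (user_name,) in conn.execute(sql, params)]
```

Chaining `.filter(name='Jon')` onto the related manager only adds a condition to the same joined query, which is why it stays a single database round trip.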
We can also add some property shortcuts to our Models to help with some common tasks, for example:
class User(models.Model):
    ...

    @property
    def number_of_liked_songs(self):
        return self.songs_that_i_like.count()
Then we can do:
user = User.objects.get(id='<the-user-id>')
number_of_songs_i_like = user.number_of_liked_songs
There's much more we can do with Django - if you're looking for something specific let us know.

Django: prefetch_related() with m2m through relationship v2

I know there is already a similar question, but I think my case is a bit more complicated because I have a different entry point.
These are my models:
class m_Interaction(models.Model):
    fk_ip = models.ForeignKey('m_IP', related_name="interactions")

class m_User(models.Model):
    name = models.CharField(max_length=200)

class m_IP(models.Model):
    fk_user = models.ForeignKey('m_User', related_name="ips")

class m_Feature(models.Model):
    name = models.CharField(max_length=200)
    m2m_interaction = models.ManyToManyField(m_Interaction, related_name='features', through='m_Featurescore')

class m_Featurescore(models.Model):
    score = models.FloatField(null=False)
    fk_interaction = models.ForeignKey(m_Interaction, related_name='featurescore')
    fk_feature = models.ForeignKey(m_Feature, related_name='featurescore')
I start with m_User, follow the reverse relationship over m_IP to the Interactions (m_Interaction). Then I want to get every m_Featurescore.score for each Interaction for a specific instance of m_Feature.
My working query to access at least all interactions in a performant way:
m_User.objects.all().prefetch_related('ips__interactions')
But I can't figure out the correct 'prefetch_related'-statement to access the m_Featurescore.score like this
db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score
without making a lot of queries.
I already tried almost all combinations of the following:
'ips__interactions__features__featurescore'
Any suggestions?
I found the answer to my own question with the help of noamk in the comments:
I didn't consider that the get() method in db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score issues a new query every time it's called (it's kinda obvious now).
Therefore I simply restructured my code; now I don't need get() anymore and can benefit from the prefetch.
If somebody still needs to filter, the Prefetch() object should be used, as suggested by noamk.

Django 1.7 and smart, deep, filtered aggregations

I'm using Django 1.7 and I'm trying to seize the advantages of new features in the ORM.
Assume I have:
class Player(models.Model):
    name = models.CharField(...)

class Question(models.Model):
    title = models.CharField(...)
    answer1 = models.CharField(...)
    answer2 = models.CharField(...)
    answer3 = models.CharField(...)
    right = models.PositiveSmallIntegerField(...)  # choices=1, 2, or 3

class Session(models.Model):
    player = models.ForeignKey(Player, related_name="games")

class RightAnswerManager(models.Manager):
    def get_queryset(self):
        return super(RightAnswerManager, self).get_queryset().filter(answer=models.F('question__right'))

class AnsweredQuestion(models.Model):
    session = models.ForeignKey(Session, related_name="questions")
    question = models.ForeignKey(Question, ...)
    answer = models.PositiveSmallIntegerField(...)  # 1, 2, 3, or None if not yet answered
    objects = models.Manager()
    right = RightAnswerManager()
I know I can do:
Session.objects.prefetch_related('questions')
And get the sessions with the questions.
Also I can do:
Session.objects.prefetch_related(models.Prefetch('questions', queryset=AnsweredQuestion.right.all(), to_attr='answered'))
And get the sessions with the list of questions that were actually answered and right.
BUT I cannot do aggregation over those, to get -e.g.- the count of elements instead:
Session.objects.prefetch_related(models.Prefetch('questions', queryset=AnsweredQuestion.right.all(), to_attr='answered')).annotate(total_right=models.Count('answered'))
since answered is not a real field:
FieldError: Cannot resolve keyword 'rightones' into field. Choices are: id, name, sessions
This is only a sample, since there are a lot of fields in my models I never included. However the idea is clear: I cannot aggregate over created attributes.
Is there a way without falling to raw to respond to the following question?
Get each user annotated with their "points".
A user may play any amount of sessions.
In each session it gets many questions to answer.
For each right answer, a point is earned.
In RAW SQL it would be something like:
SELECT user.*, COUNT(answeredquestion.id)
FROM user
LEFT OUTER JOIN session ON (session.user_id = user.id)
INNER JOIN answeredquestion ON (answeredquestion.session_id = session.id)
INNER JOIN question ON (answeredquestion.question_id = question.id)
WHERE answeredquestion.answer = question.right
GROUP BY user.id
Or something like that (since there's a functional dependency in the grouping field, I would collect the user data and count the related answeredquestions, assuming the condition passes). So RAW queries are not an option for me.
The idea is to get the users with the total points.
My question can be responded in one of two ways (or both).
Is there a way to perform the same query (actually I never tested this exact query; It's here to present the idea) with the ORM in Django 1.7, somehow given the Prefetch or manager selection on related/inverse FK fields? Iteration is not allowed here (I'd have a quadratic version of the N+1 problem!).
Is there any Django package which somehow does this, perhaps a third-party abstraction over RAW calls? I ask because I will have many queries like this one.
I don't believe using prefetch gets you any gain in this situation. Generally, prefetch_related and select_related are used when looping through a queryset and accessing a related object for each item. By doing the annotate in the initial query, I believe Django will take care of that optimization for you.
For the question "Get each user annotated with their 'points'", try this query:
Player.objects.annotate(total_right=models.Count('games__questions__right'))
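Since the raw SQL in the question was never tested, here is a small sqlite3 sketch of the intended result on made-up rows, which can serve as a reference when checking what an ORM query returns. The table names are simplified, the `right` column is renamed `right_answer` because RIGHT is an SQL keyword, and the sketch uses LEFT JOINs with the right-answer condition inside the join so that players without points still appear with 0. Those are choices made for this sketch, not something stated in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE question (id INTEGER PRIMARY KEY, right_answer INTEGER);
    CREATE TABLE session (id INTEGER PRIMARY KEY, player_id INTEGER REFERENCES player(id));
    CREATE TABLE answeredquestion (
        id INTEGER PRIMARY KEY,
        session_id INTEGER REFERENCES session(id),
        question_id INTEGER REFERENCES question(id),
        answer INTEGER
    );
    INSERT INTO player VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO question VALUES (1, 2), (2, 3);
    INSERT INTO session VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO answeredquestion VALUES
        (1, 1, 1, 2),     -- Ann, session 1: right
        (2, 1, 2, 1),     -- Ann, session 1: wrong
        (3, 2, 2, 3),     -- Ann, session 2: right
        (4, 3, 1, NULL),  -- Bob: not yet answered
        (5, 3, 2, 1);     -- Bob: wrong
""")

# Count only the answered questions whose answer matches the question's right answer.
points = conn.execute("""
    SELECT p.name, COUNT(q.id) AS total_right
    FROM player AS p
    LEFT JOIN session AS s ON s.player_id = p.id
    LEFT JOIN answeredquestion AS a ON a.session_id = s.id
    LEFT JOIN question AS q ON q.id = a.question_id AND a.answer = q.right_answer
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

Ann gets one point in each of her two sessions; Bob has one unanswered and one wrong answer; Cid has never played.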

Django; empty model for Foreign Key attachment

I'm building a CMS for a frequently asked questions page... I want "Frequently Asked Questions" to show up in the main admin menu and when clicked just reveal a big list of editable question/answer pairs. So there only needs to be one instance of the FAQ model and it doesn't need to have any information on its own... How would I do this?
class FAQ(models.Model):
class QandA(models.Model):
    reference = models.ForeignKey(FAQ)
    question = models.CharField()
    answer = models.CharField()
    def __unicode__(self):
        return self.question
This returns the error that an indent is expected after class FAQ(models.Model): what do I need to add to achieve this result?
Syntactic answer:
You need at least a pass to satisfy Python's desire for an indented statement.
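A quick way to see the difference without Django is to ask Python to compile both versions. This is only a sketch: it uses a plain `object` base class instead of `models.Model`, and the `compiles` helper is a made-up name.

```python
# A class statement with no body, and the same class with a `pass` body.
source_without_body = "class FAQ(object):\n"
source_with_pass = "class FAQ(object):\n    pass\n"


def compiles(source):
    """Return True if the source compiles, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


empty_ok = compiles(source_without_body)
pass_ok = compiles(source_with_pass)
```

In the model from the question, that means putting an indented `pass` directly under `class FAQ(models.Model):`.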
Semantic answer:
I'm not sure I understand 100% why you want that class in the first place, it sounds like a hack for the admin screen but maybe you could describe that more specifically.

Datastore Design Inquiry

I'm creating a Trivia app, and need some help designing my model relationships. This question may get fairly complicated, but I'll try to be concise.
Trivia questions will all be part of a particular category. Categories may be a category within another category. If a trivia question is created/removed, I need to make sure that I also update a counter. In this way, I'll be able to see how many questions are in each category, and display that back to users. If a category has 'child' categories, I will need a way of displaying a cumulative counter of all sub-categories. Accurate tallies are fairly important, but not mission critical. I do not mind using sharded counters. My question is, how should I design this so that it will adopt GAE denormalization, and maintain optimization?
I was thinking of having a Category class, with a ListProperty in each, which will represent the ancestor tree. It will contain a key to each parent entity in the tree, in order. But, should I also specify a parent when constructing the entities, or is that not needed in this case? I'm thinking that I may have to run my counter updates in transaction, which is why I am considering a parent-child relationship.
Or perhaps there is more optimized way of designing my relationships that will still allow me to keep fairly accurate counters of all questions in each category. Thanks in advance for any help.
This isn't as complicated as you might think. Here's a Category class:
class Category(db.Model):
    title = db.StringProperty()
    subcategories = db.ListProperty(db.Key)
    quizzes = db.ListProperty(db.Key)

    def add_sub_category(self, title):
        # properties must be passed as keyword arguments
        new_category = Category(title=title)
        new_category.put()
        # ListProperty(db.Key) stores keys, not entities
        self.subcategories.append(new_category.key())
        self.put()
        return new_category
By keeping both the subcategories and the quizzes that are associated with this Category in a ListProperty, getting a count of them is as simple as calling len() on the list.
You could use it something like this:
main_category = Category(title="Main")
main_category.put()
sports_category = main_category.add_sub_category("Sports")
baseball_category = sports_category.add_sub_category("Baseball")
football_category = sports_category.add_sub_category("Football")
hockey_category = sports_category.add_sub_category("Hockey")
tv_category = main_category.add_sub_category("TV")
...etc...
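The question also asks for a cumulative count that includes sub-categories. Here is a plain-Python sketch of that roll-up, independent of App Engine. The `CategoryNode` class and its fields are hypothetical stand-ins for the datastore entities above, and it holds child objects directly instead of keys.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CategoryNode:
    title: str
    quizzes: List[str] = field(default_factory=list)
    subcategories: List["CategoryNode"] = field(default_factory=list)

    def add_sub_category(self, title):
        child = CategoryNode(title)
        self.subcategories.append(child)
        return child

    def total_quizzes(self):
        """Count quizzes in this category and, recursively, in all sub-categories."""
        return len(self.quizzes) + sum(c.total_quizzes() for c in self.subcategories)


main = CategoryNode("Main")
sports = main.add_sub_category("Sports")
baseball = sports.add_sub_category("Baseball")
tv = main.add_sub_category("TV")
baseball.quizzes.extend(["q1", "q2"])
sports.quizzes.append("q3")
tv.quizzes.append("q4")
```

With datastore entities, the same walk would need to fetch each child by key, so for deep trees a stored counter per category may be cheaper than recomputing the total on every read.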
I'm not that familiar with Google App Engine, but here are some thoughts. The first is to consider whether "tags" are more appropriate than categories and sub-categories. Will there be a rigid two-level category scheme? Will all items have a main and sub-category assignment?
Rather than having a class for each category, have you considered a CategoryList class that would have an incrementCategoryByName(str name) method? The class would contain a dictionary of categories, without the overhead of a class for each category.
