I'm trying to get my head around Cassandra/Pycassa db design.
With Mongoengine, you can refer to another class using "ReferenceField", as follows:
from mongoengine import *
class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)
class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(User)
As far as I can tell from the documentation, the Pycassa equivalent is something like this, but I don't know how to create a reference from the Post class author field to the User class:
from pycassa.types import *
from pycassa.pool import ConnectionPool
from pycassa.columnfamilymap import ColumnFamilyMap
import uuid
class User(object):
    key = LexicalUUIDType()
    email = UTF8Type()
    first_name = UTF8Type()
    last_name = UTF8Type()
class Post(object):
    key = LexicalUUIDType()
    title = UTF8Type()
    author = ???
What is the preferred way to do something like this? Obviously I could just put the User key in the Post author field, but I'm hoping there's some better way where all this is handled behind the scenes, like with Mongoengine.
@jterrace is correct: you're probably going about this the wrong way. With Cassandra, you tend to be less concerned with objects, how they relate, and how to normalize them. Instead, ask yourself "What queries do I need to be able to answer efficiently?", and then pre-build the answers for those queries. This usually involves a mixture of denormalization and the "wide row" model. I highly suggest that you read some articles about data modeling for Cassandra online.
With that said, pycassa's ColumnFamilyMap is just a thin wrapper that can cut down on boilerplate, nothing more. It does not attempt to provide support for anything complicated because it doesn't know what kinds of queries you need to be able to answer. So, specifically, you could store the matching User's LexicalUUID in the author field, but pycassa will not automatically fetch that User object for you when you fetch the Post object.
I think you're really misunderstanding the data model for Cassandra. You should read Cassandra Data Model before continuing.
pycassa has no notion of "objects" like you have defined above. There are only column families, row key types, and column types. There is no such thing as a reference from one column family to another in Cassandra.
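To make the "store the key yourself" point concrete, here is a minimal sketch that uses plain Python dicts as stand-ins for column families, so neither Cassandra nor pycassa is involved. The names users, posts, posts_by_author and the helper functions are hypothetical. It shows the author's row key stored on the post, the second lookup done by hand, and a denormalized wide row built at write time to answer "which posts did this user write?":

```python
import uuid

# Stand-ins for column families: row key -> {column name: value}
users = {}
posts = {}
# Wide row per user, pre-built to answer "which posts did this user write?"
posts_by_author = {}


def add_user(email, first_name, last_name):
    key = uuid.uuid4()
    users[key] = {"email": email, "first_name": first_name, "last_name": last_name}
    return key


def add_post(title, author_key):
    key = uuid.uuid4()
    # The "reference" is just the author's row key stored as a column value.
    posts[key] = {"title": title, "author": author_key}
    # Denormalize at write time: one column per post in the author's wide row.
    posts_by_author.setdefault(author_key, {})[key] = title
    return key


def get_post_with_author(post_key):
    post = posts[post_key]
    # Nothing follows the reference for us; the second read is explicit.
    return post, users[post["author"]]


alice = add_user("alice@example.com", "Alice", "Smith")
first = add_post("Hello", alice)
second = add_post("Wide rows", alice)

post, author = get_post_with_author(first)
titles = sorted(posts_by_author[alice].values())
```

The trade-off is the usual one for this style of modeling: writes touch more than one place so that each read is a single row lookup.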
What is the 'best practice' way of accessing data across a 1 (or more) many-to-many tables?
This is incredibly difficult for me, as I am not sure what I should be googling/looking up.
I have attached a diagram of my data model. I am able to query data for 'C' related to a user by utilizing serializers.
There has to be a simpler way of doing this (I'm hoping).
Doing it with serializers seems incredibly limiting. I'd like to access a user's 'B' and 'C' and transform the object to have only a custom structure and possibly unique values.
Any direction is much appreciated. Pretty new to Django, so I apologize for this newb type of question.
Here is an example of M2M relation using Django:
class User(models.Model):
    name = models.CharField(...)
class Song(models.Model):
    title = models.CharField(...)
    users_that_like_me = models.ManyToManyField('User', ..., related_name='songs_that_i_like')
So a User can like many Songs and a Song can be liked by many Users.
To see all the songs a user liked, we can do:
user = User.objects.get(id='<the-user-id>')
liked_songs = user.songs_that_i_like.all()
And to see all the users who like a particular song we can similarly do:
song = Song.objects.get(id='<the-song-id>')
users_that_like_this_song = song.users_that_like_me.all()
Both liked_songs and users_that_like_this_song are actually querysets, meaning we can do some Django magic on them.
For example, to find all users named Jon that liked this song we can do:
users_that_like_this_song.filter(name='Jon')
We can also add some property shortcuts to our Models to help with some common tasks, for example:
class User(models.Model):
    ...
    @property
    def number_of_liked_songs(self):
        return self.songs_that_i_like.count()
Then we can do:
user = User.objects.get(id='<the-user-id>')
number_of_songs_i_like = user.number_of_liked_songs
There's much more we can do with Django - if you're looking for something specific let us know.
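It can help to see what sits underneath: a ManyToManyField is stored as a separate join table with one row per (song, user) pair, and both songs_that_i_like and users_that_like_me are joins through it. Here is a rough sketch using the standard-library sqlite3 module. The table and column names are made up for illustration and are not the names Django would generate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT);
    -- One row per "user likes song" pair
    CREATE TABLE song_likes (
        song_id INTEGER REFERENCES song(id),
        user_id INTEGER REFERENCES user(id),
        UNIQUE (song_id, user_id)
    );
    INSERT INTO user VALUES (1, 'Jon'), (2, 'Ann');
    INSERT INTO song VALUES (1, 'Blue'), (2, 'Red');
    INSERT INTO song_likes VALUES (1, 1), (2, 1), (1, 2);
""")

# Roughly what user.songs_that_i_like.all() asks for (user id 1)
liked_songs = [row[0] for row in conn.execute(
    "SELECT song.title FROM song "
    "JOIN song_likes ON song_likes.song_id = song.id "
    "WHERE song_likes.user_id = ? ORDER BY song.title", (1,))]

# Roughly what song.users_that_like_me.filter(name='Jon') asks for (song id 1)
jons = [row[0] for row in conn.execute(
    "SELECT user.name FROM user "
    "JOIN song_likes ON song_likes.user_id = user.id "
    "WHERE song_likes.song_id = ? AND user.name = ?", (1, "Jon"))]
```

Chaining .filter() onto the related manager simply adds another condition to the same join, which is why the result is still a queryset.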
I know there is already a similar question, but I think my case is a bit more complicated because I have a different entry point.
These are my models:
class m_Interaction(models.Model):
    fk_ip = models.ForeignKey('m_IP', related_name="interactions")
class m_User(models.Model):
    name = models.CharField(max_length=200)
class m_IP(models.Model):
    fk_user = models.ForeignKey('m_User', related_name="ips")
class m_Feature(models.Model):
    name = models.CharField(max_length=200)
    m2m_interaction = models.ManyToManyField(m_Interaction, related_name='features', through='m_Featurescore')
class m_Featurescore(models.Model):
    score = models.FloatField(null=False)
    fk_interaction = models.ForeignKey(m_Interaction, related_name='featurescore')
    fk_feature = models.ForeignKey(m_Feature, related_name='featurescore')
I start with m_User, follow the reverse relationship over m_IP to the Interactions (m_Interaction). Then I want to get every m_Featurescore.score for each Interaction for a specific instance of m_Feature.
My working query to access at least all interactions in a performant way:
m_User.objects.all().prefetch_related('ips__interactions')
But I can't figure out the correct 'prefetch_related'-statement to access the m_Featurescore.score like this
db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score
without making a lot of queries.
I already tried almost all combinations of the following:
'ips__interactions__features__featurescore'
Any suggestions?
I found the answer to my own question with the help of noamk in the comments:
I didn't consider that the get() method in db_obj_interaction.featurescore.get(fk_feature=db_obj_feature).score issues a new query every time it's called (it's kind of obvious now).
Therefore I simply restructured my code so that I no longer need get() and can benefit from the prefetch.
If somebody still needs to filter, the Prefetch() object should be used, as suggested by noamk.
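The difference between calling get() per interaction and reusing already-fetched rows can be sketched without Django. In the stand-in below, FakeDB, its query_count counter, and the row layout are all hypothetical. The point is only that one bulk fetch plus an in-memory dict lookup replaces one query per interaction:

```python
class FakeDB:
    """Hypothetical stand-in for a database that counts round trips."""

    def __init__(self, rows):
        self.rows = rows  # list of (interaction_id, feature_id, score)
        self.query_count = 0

    def get_score(self, interaction_id, feature_id):
        # Like .get(): every call is a separate query.
        self.query_count += 1
        for i_id, f_id, score in self.rows:
            if (i_id, f_id) == (interaction_id, feature_id):
                return score
        raise LookupError("no matching score")

    def all_scores(self):
        # Like a prefetch: one query returns every row.
        self.query_count += 1
        return list(self.rows)


rows = [(1, 10, 0.5), (1, 11, 0.1), (2, 10, 0.9), (3, 10, 0.3)]
interaction_ids = [1, 2, 3]
feature_id = 10

# Per-item lookups: one query per interaction.
db = FakeDB(rows)
slow = [db.get_score(i, feature_id) for i in interaction_ids]
slow_queries = db.query_count

# Bulk fetch, then index in memory by (interaction, feature).
db = FakeDB(rows)
by_key = {(i_id, f_id): score for i_id, f_id, score in db.all_scores()}
fast = [by_key[(i, feature_id)] for i in interaction_ids]
fast_queries = db.query_count
```

Both approaches return the same scores; only the number of round trips differs, and it grows with the number of interactions in the first case.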
I'm trying to learn Django from a background of coding the database schema directly myself. I want to understand how I should be effectively using the database abstraction tools to normalize.
As a contrived example, let's say I have a conversation that can ask questions on 3 subjects, and each question is complicated enough to warrant its own Class.
class Conversation(models.Model):
    partner = models.CharField()
class Weather_q(models.Model):
    pass  # stuff
class Health_q(models.Model):
    pass  # stuff
class Family_q(models.Model):
    pass  # stuff
So let's say I want to have 2 conversations:
Conversation 1 with Bob: ask two different weather questions and one question about his health
Conversation 2 with Alice: ask about the weather and her family
Usually, I would code myself a normalization table for this:
INSERT INTO Conversation (partner) VALUES ('Bob'), ('Alice'); -- primary keys = 1 and 2
INSERT INTO NormalizationTable (fk_Conversation, fk_Weather_q, fk_Health_q, fk_Family_q) VALUES
(1,1,0,0), -- Bob weather#1
(1,2,0,0), -- Bob weather#2
(1,0,1,0), -- Bob health#1
(2,1,0,0), -- Alice weather#1
(2,0,0,1); -- Alice family#1
Do I need to explicitly create this normalization table or is that discouraged?
class NormalizationTable(models.Model):
    fk_Conversation = models.ForeignKey(Conversation)
    fk_Weather_q = models.ForeignKey(Weather_q)
    fk_Health_q = models.ForeignKey(Health_q)
    fk_Family_q = models.ForeignKey(Family_q)
I then wanted to execute the conversations. I wrote a view like this (skipping exception catching and the logic to iterate through multiple questions per conversation):
from django.shortcuts import render
from myapp.models import Conversation, NormalizationTable
def converse(request):
    # get this conversation's pk
    # assuming "mypartner" is provided by the URL dispatcher
    conversation = Conversation.objects.filter(partner=mypartner)[0]
    # get the relevant rows of the NormalizationTable
    questions = NormalizationTable.objects.filter(fk_Conversation=conversation)
    for question in questions:
        # each fk_* attribute is already the related question object
        if question.fk_Weather_q:
            return render(request, "weather.html", {"question": question.fk_Weather_q})
        if question.fk_Health_q:
            return render(request, "health.html", {"question": question.fk_Health_q})
        if question.fk_Family_q:
            return render(request, "family.html", {"question": question.fk_Family_q})
Considered holistically, is this the "Django" way to solve this kind of normalization problem (N objects associated with a container object)? Can I make better use of Django's inbuilt ORM or other tools?
Leaving aside "normalization tables" (the term is unfamiliar to me), this is what I think is a "djangish" way of solving your problem. Please note that I went with your statement "each question is complicated enough to warrant its own Class". For me this means that every type of question necessitates its own unique fields and methods. Otherwise I would create a single Question model connected to a Category model by a ForeignKey.
class Partner(models.Model):
    name = models.CharField()
class Question(models.Model):
    # Fields and methods common to all kinds of questions
    partner = models.ForeignKey(Partner)
    label = models.CharField()  # example field
class WeatherQuestion(Question):
    # Fields and methods for weather questions only
    pass
class HealthQuestion(Question):
    # Fields and methods for health questions only
    pass
class FamilyQuestion(Question):
    # Fields and methods for family questions only
    pass
This way you would have a base Question model for all the fields and methods common to all questions, and a bunch of child models describing the different kinds of questions. There is an implicit relation between the base model and its child models, maintained by Django. This lets you create a single queryset containing different questions, no matter their type. Items in this queryset are of Question type by default, but can be converted to a particular question type by accessing a special attribute (for example, the healthquestion attribute for a HealthQuestion). This is described in detail in the "Multi-table model inheritance" section of the Django documentation.
Then in a view you can get a list of (different types of) questions and then detect their particular type:
from django.shortcuts import render
from myapp.models import Question
def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).first()
    # Detect question type
    question_type = "other"
    question_obj = question
    # in real life the list of types below would probably live in the settings
    for current_type in ['weather', 'health', 'family']:
        if hasattr(question, current_type + 'question'):
            question_type = current_type
            question_obj = getattr(question, current_type + 'question')
            break
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question_obj}
    )
The code for detecting question type is quite ugly and complicated. You could make it much simpler and more generic using the InheritanceManager from django-model-utils package. You would need to install the package and add the line to the Question model:
objects = InheritanceManager()
The view would then look something like this:
from django.shortcuts import render
from myapp.models import Question
def converse(request, partner_id):
    question = Question.objects.filter(partner=partner_id).select_subclasses().first()
    question_type = question._meta.object_name.lower()
    return render(
        request,
        "questions/{}.html".format(question_type),
        {'question': question}
    )
Both views select only a single question - the first one. That's how the view in your example behaved, so I went with it. You could easily convert those examples to return a list of questions (of different types).
I'm not familiar with the term normalization table, but I see what you're trying to do.
What you've described is not, in my opinion, a very satisfactory way to model a database. The simplest approach would be to make all questions part of the same table, with a "type" field, and maybe some other optional fields that vary between the types. In that case, this becomes very simple in Django.
But, OK, you said "let's say... each question is complicated enough to warrant its own class." Django does have a solution for that, which is generic relations. It would look something like this:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
class ConversationQuestion(models.Model):
    conversation = models.ForeignKey(Conversation)
    content_type = models.ForeignKey(ContentType)
    question_id = models.PositiveIntegerField()
    question = GenericForeignKey('content_type', 'question_id')
# you can use prefetch_related("question") for efficiency
cqs = ConversationQuestion.objects.filter(conversation=conversation)
for cq in cqs:
    # do something with the question
    # you can look at the content_type if, as above, you need to choose
    # a separate template for each type.
    print(cq.question)
Because it's part of Django, you get some (but not total) support in terms of the admin, forms, etc.
Or you could do what you've done above, but, as you noticed, it's ugly and doesn't seem to capture the advantages of working with an ORM.
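The idea behind a generic relation is that each link row stores which table the target lives in plus the target's id, and the pair is resolved at read time. Here is a plain-Python sketch of that mechanism, using the Bob and Alice conversations from the question. This is not Django's implementation; TABLES, LINKS, resolve, questions_for, and the sample question texts are hypothetical:

```python
# Hypothetical per-type question tables: id -> question text
TABLES = {
    "weather_q": {1: "Is it raining?", 2: "How hot is it?"},
    "health_q": {1: "How do you feel?"},
    "family_q": {1: "How is your sister?"},
}

# Link rows: (conversation, content_type, question_id)
LINKS = [
    ("Bob", "weather_q", 1),
    ("Bob", "weather_q", 2),
    ("Bob", "health_q", 1),
    ("Alice", "weather_q", 1),
    ("Alice", "family_q", 1),
]


def resolve(content_type, question_id):
    """Follow a (type, id) pair to the row it points at."""
    return TABLES[content_type][question_id]


def questions_for(conversation):
    """Return (template name, question) pairs for one conversation."""
    return [
        (content_type + ".html", resolve(content_type, question_id))
        for conv, content_type, question_id in LINKS
        if conv == conversation
    ]


bob = questions_for("Bob")
alice = questions_for("Alice")
```

Compared with one nullable foreign-key column per question type, adding a fourth question type here means adding a table, not a column on the link table.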
I'm currently trying to figure out the best way to structure my database schema based on a few models. I'll try to explain this as best I can so I can work out the best way to tackle the problem.
Firstly, I have 3 models that are "related"
User which is extended to contain the field api_key, Campaign and finally Beacon.
A User can have many Campaigns, but a Campaign can only relate to one User. My first choice here was to give Campaign a foreign key to User; correct me if I'm wrong, but I feel that is the best choice there. Likewise, a Campaign can have many Beacons, but a Beacon can only relate to one Campaign at a time. Again, I'm presuming that a foreign key would work best here.
The issue arises when I try to query the Beacons that relate to any given Campaign. I wish to return all Beacons that relate to the User whilst also getting the data for Campaign.
I wish to return a JSON string like the following:
{
    XXXX-YYYYY: {
        message: "Hello World",
        destination: "http://example.com"
    },
    XXXX-YYYYY: {
        message: "Hello World",
        destination: "http://example.com"
    }
}
XXXX-YYYYY being the Beacon.factory_id and message/destination being Campaign.message and Campaign.destination
I'm thinking QuerySets here, but I've never worked with them before and it just confused me.
According to your question, you have something like this:
class User(models.Model):
    pass
class Campaign(models.Model):
    user = models.ForeignKey(User, verbose_name="Attached to")
    message = models.CharField()
    destination = models.CharField()
class Beacon(models.Model):
    factory_id = models.CharField()
    campaign = models.ForeignKey(Campaign, verbose_name="Campaign")
A ForeignKey can be followed in both directions. Going "forward" from a Beacon, beacon.campaign is the single related Campaign. Going "backward" uses a generated attribute, here campaign.beacon_set and user.campaign_set:
If a model has a ForeignKey, instances of the foreign-key model will have access to a Manager that returns all instances of the first model. By default, this Manager is named FOO_set, where FOO is the source model name, lowercased.
So you can query your Beacon model like this:
beacon = Beacon.objects.get(factory_id="XXXX-YYYYY")
# Forward: a beacon points at exactly one campaign
campaign = beacon.campaign
print(campaign.message)
print(campaign.destination)
# Backward: every beacon attached to that campaign, relevant field only (list of dicts)
beacons = campaign.beacon_set.all().values('factory_id')
for b in beacons:
    print(b['factory_id'])
As for your dictionary, it is impossible to build it exactly as shown: a dictionary can't have a duplicate key, so each Beacon.factory_id can appear only once.
"I wish to return all Beacons that relate to the User whilst also getting the data for Campaign"
beacons = Beacon.objects.filter(campaign__user=user).select_related('campaign')
You can then easily process this into your desired data structure.
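As a sketch of that last step, the function below turns an iterable of beacons into the mapping from the question. It only assumes that each item has a factory_id and a campaign with message and destination. The name beacons_to_payload and the sample factory ids are hypothetical, and SimpleNamespace objects stand in for model instances so the snippet runs without Django:

```python
import json
from types import SimpleNamespace


def beacons_to_payload(beacons):
    """Map each beacon's factory_id to its campaign's message and destination."""
    return {
        beacon.factory_id: {
            "message": beacon.campaign.message,
            "destination": beacon.campaign.destination,
        }
        for beacon in beacons
    }


# Stand-ins for rows returned by the select_related('campaign') queryset
campaign = SimpleNamespace(message="Hello World", destination="http://example.com")
beacons = [
    SimpleNamespace(factory_id="AAAA-11111", campaign=campaign),
    SimpleNamespace(factory_id="BBBB-22222", campaign=campaign),
]

payload = beacons_to_payload(beacons)
body = json.dumps(payload, sort_keys=True)
```

In a view you would pass the real queryset instead of the stand-ins. Because select_related('campaign') has already joined the campaign rows, reading beacon.campaign inside the loop should not trigger extra queries.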
"I'm thinking QuerySets here, but I've never worked with them before and it just confused me"
A QuerySet is simply how the Django ORM represents a query to your database that results in a set of items. So the above is a QuerySet, as is something as simple as User.objects.all(). You can read some introductory material about QuerySets in the documentation.
I believe this is trivial, but I'm fairly new to Python.
I am trying to create a model using Google App Engine.
From an E/R point of view, I have 2 objects with a join table (the join table captures the point in time of the join).
Something like this
Person     | Idea     | Person_Idea
-----------------------------------
person.key | idea.key | person.key
           |          | idea.key
           |          | date_of_idea
my Python code would look like
class Person(db.Model):
    # some properties here....
    pass
class Idea(db.Model):
    # some properties here....
    pass
class IdeaCreated(db.Model):
    person = db.ReferenceProperty(Person)
    idea = db.ReferenceProperty(Idea)
    created = db.DateTimeProperty(auto_now_add=True)
What I want is a convenient way to get all the ideas a person has (bypassing the IdeaCreated objects); sometimes I will need the list of ideas directly.
The only way I can think of to do this is to add the following method to the Person class:
def allIdeas(self):
    ideas = []
    for ideacreated in self.ideacreated_set:
        ideas.append(ideacreated.idea)
    return ideas
Is this the only way to do this? Is there a nicer way that I am missing?
Also, I assume I could write a GQL query and bypass hydrating the IdeaCreated instances (I'm not sure of the exact syntax), but using a GQL query smells wrong to me.
You should use the person as the ancestor/parent of the idea.
idea = Idea(parent=some_person, other_field=field_value).put()
Then you can query all ideas where some_person is the ancestor:
persons_ideas = Idea.all().ancestor(some_person_key).fetch(1000)
The ancestor key will be included in the Idea entity's key, and you won't be able to change the ancestor once the entity is created.
I highly suggest you use ndb instead of db: https://developers.google.com/appengine/docs/python/ndb/
With ndb you could even use StructuredProperty or LocalStructuredProperty:
https://developers.google.com/appengine/docs/python/ndb/properties#structured
EDIT:
If you need a many-to-many relationship, look into ListProperty and store the Persons' keys in that property. Then you can query for all Ideas with that key in that property.
class Idea(db.Model):
    person = db.StringListProperty()
idea = Idea(person=[str(person.key())], ....)
idea.put()
To add another person to the idea (append() returns None, so put() is a separate call):
idea.person.append(str(another_person.key()))
idea.put()
To find all ideas for a person:
ideas = Idea.all().filter('person =', str(person.key())).fetch(1000)
Look into https://developers.google.com/appengine/docs/python/datastore/typesandpropertyclasses#ListProperty