GAE structuring models without denormalizing

I'm trying to restructure a relational database for Google App Engine, and I'm having trouble modeling a set of relationships in a way that lets me query the data I need efficiently.
As a contrived example, say I have the following models:
from google.appengine.ext import db

class Rental(db.Model):
    # ancestor of one or more RentalDatas
    date = db.DateProperty()
    location = db.IntegerProperty()
    # + customer, etc

class Unicorn(db.Model):
    name = db.StringProperty()
    color = db.IntegerProperty()
    # 'rentals' collection from RentalData

# defined after Unicorn so the ReferenceProperty can refer to the class
class RentalData(db.Model):
    # a Rental is our parent
    unicorn = db.ReferenceProperty(Unicorn, collection_name='rentals')
    mileage_in = db.FloatProperty()
    mileage_out = db.FloatProperty()
    # + returned_fed, etc
Each rental can include multiple unicorns, so I'd rather not combine the Rental and RentalData models if I can help it. I'll almost always be starting with a given Unicorn, and I'd like to be able to query, say, whether the unicorn was returned fed on the last x rentals, without having to iterate through every RentalData in the unicorn's rentals collection and query for the parent, just to get the date to sort on.
Do I just have to bite the bullet and de-normalize? I've tried duplicating date in RentalData, and using parent() as needed to get the corresponding Rental when I need properties from it. It works, but I'm worried I'm just taking the easy way out.
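As a rough illustration of why the duplicated date helps, here is a minimal in-memory sketch. It does not use the datastore at all: `RentalDataRow`, `fed_on_last_rentals` and the sample rows are hypothetical stand-ins, and the assumption is that the Rental's date is copied onto each RentalData when it is written. With that in place, "the last x rentals for a unicorn" needs no lookup of the parent Rental.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class RentalDataRow:
    """Hypothetical stand-in for a RentalData entity with a duplicated date."""
    unicorn: str
    date: datetime.date   # assumed to be copied from the parent Rental at write time
    returned_fed: bool


def fed_on_last_rentals(rows: List[RentalDataRow], unicorn: str, x: int) -> bool:
    """True if the unicorn came back fed on each of its x most recent rentals."""
    mine = [r for r in rows if r.unicorn == unicorn]
    # Sorting uses the duplicated date, so no parent Rental is ever fetched.
    latest = sorted(mine, key=lambda r: r.date, reverse=True)[:x]
    return all(r.returned_fed for r in latest)


rows = [
    RentalDataRow("Sparkle", datetime.date(2011, 1, 5), False),
    RentalDataRow("Sparkle", datetime.date(2011, 2, 9), True),
    RentalDataRow("Sparkle", datetime.date(2011, 3, 1), True),
    RentalDataRow("Comet", datetime.date(2011, 3, 2), False),
]
```

Against the real models, the same idea would presumably be an ordered, limited query on the unicorn's rentals collection (something like `unicorn.rentals.order('-date').fetch(x)`), which only works if date is a property of RentalData itself.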

Related

Flexible database models for users to define extra columns to database tables in Django

I am trying to build a tool that, at a simple level, tries to analyse how to buy a flat. DB = POSTGRES
So the model basically is:
class Property(models.Model):
    address = CharField(max_length=200)
    price = IntegerField()
    user = ForeignKey(User)  # user who entered the property in the database
    # ..
    # ..
    # some more fields that are common across all flats

    # However, users might have their own way of analysing
    # one user might want to put
    estimated_price = IntegerField()  # his own estimate of the price, different from the zoopla or rightmove listing price
    time_to_purchase = IntegerField()  # his own estimate on how long it will take to purchase

    # another user might want to put other fields
    # might be his purchase process requires sorting or filtering based on these two fields
    number_of_bedrooms = IntegerField()
    previous_owner_name = CharField()
How do I give such flexibility to users? They should be able to sort, filter and query their own rows (in the Property table) by these custom fields. The only option I can think of now is the Postgres JSONField.
Any advice? I am surprised this is not readily solved in Django. I am sure lots of other people have come across this problem already.
Thanks

Edit: As the comments point out, a JSON field is a better idea in this case.
Simple. Use relations.
Create a model called Attribute.
It will have a foreign key to a Property, a name field and a value field.
Something like,
class Attribute(models.Model):
    property = models.ForeignKey(Property)
    name = models.CharField(max_length=50)
    value = models.CharField(max_length=150)
Create one Attribute object for each custom attribute of a property.
When querying, use select_related or prefetch_related for faster responses and fewer database operations.
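To make the name/value approach concrete, here is a small sketch using the standard-library sqlite3 module rather than Django. The table and column names mirror the Property and Attribute models above, but the sample data and the `addresses_sorted_by` helper are assumptions for illustration only. It shows one catch of the approach: every value is stored as text, so a numeric sort needs an explicit cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (id INTEGER PRIMARY KEY, address TEXT, price INTEGER);
    CREATE TABLE attribute (
        property_id INTEGER REFERENCES property(id),
        name TEXT,
        value TEXT          -- every custom value is stored as text
    );
    INSERT INTO property VALUES
        (1, '1 High St', 300000),
        (2, '2 Low Rd', 250000),
        (3, '3 Mid Ln', 275000);
    INSERT INTO attribute VALUES
        (1, 'number_of_bedrooms', '3'),
        (2, 'number_of_bedrooms', '10'),
        (3, 'number_of_bedrooms', '2');
""")


def addresses_sorted_by(attr_name):
    """Return addresses ordered by a numeric custom attribute (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT p.address
        FROM property AS p
        JOIN attribute AS a ON a.property_id = p.id
        WHERE a.name = ?
        ORDER BY CAST(a.value AS INTEGER)  -- without the cast, '10' sorts before '2'
        """,
        (attr_name,),
    ).fetchall()
    return [address for (address,) in rows]


sorted_addresses = addresses_sorted_by("number_of_bedrooms")
```

Each custom field a query filters or sorts on costs one join against the attribute table, which is part of why the JSON field suggested in the edit may be the simpler choice on Postgres.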

Nested chain vs duplicated information

There is a models.py with 4 models.
The standard layout is:
class Main(models.Model):
    stuff = models.IntegerField()

class Second(models.Model):
    nested = models.ForeignKey(Main)
    stuff = models.IntegerField()

class Third(models.Model):
    nested = models.ForeignKey(Second)
    stuff = models.IntegerField()

class Last(models.Model):
    nested = models.ForeignKey(Third)
    stuff = models.IntegerField()
and there is another variant of Last model:
class Last(models.Model):
    nested1 = models.ForeignKey(Main)
    nested2 = models.ForeignKey(Second)
    nested = models.ForeignKey(Third)
    stuff = models.IntegerField()
Will that save some database load?
The information in nested1 and nested2 duplicates fields in Second and Third, and it may even become outdated (fortunately not in my case, as existing data is never changed; only new data is added).
But as I see it, this may save database load when I'm looking up all Last records for a certain Main record, or when I only need Main.id for a specific Last item.
Am I right?
Will it really save load, or is there a better practice?

It all depends on how you access the data. By default, Django makes another call to the database each time you access a foreign key. So if you want to make fewer calls to the database, you can use select_related to fetch the related models in the same query.
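For the "all Last records for a certain Main" case, the normalized chain can still be answered in a single query with joins. The sketch below uses the standard-library sqlite3 module instead of Django; the `app_*` table names, the sample rows and the `last_ids_for_main` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_main   (id INTEGER PRIMARY KEY, stuff INTEGER);
    CREATE TABLE app_second (id INTEGER PRIMARY KEY,
                             nested_id INTEGER REFERENCES app_main(id), stuff INTEGER);
    CREATE TABLE app_third  (id INTEGER PRIMARY KEY,
                             nested_id INTEGER REFERENCES app_second(id), stuff INTEGER);
    CREATE TABLE app_last   (id INTEGER PRIMARY KEY,
                             nested_id INTEGER REFERENCES app_third(id), stuff INTEGER);
    INSERT INTO app_main   VALUES (1, 10), (2, 20);
    INSERT INTO app_second VALUES (1, 1, 11), (2, 2, 21);
    INSERT INTO app_third  VALUES (1, 1, 12), (2, 2, 22);
    INSERT INTO app_last   VALUES (1, 1, 13), (2, 1, 14), (3, 2, 23);
""")


def last_ids_for_main(main_id):
    """Ids of all Last rows under one Main, fetched in one query (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT l.id
        FROM app_last AS l
        JOIN app_third  AS t ON l.nested_id = t.id
        JOIN app_second AS s ON t.nested_id = s.id
        WHERE s.nested_id = ?   -- app_second.nested_id already holds the Main id
        ORDER BY l.id
        """,
        (main_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

In the Django ORM the equivalent lookup would presumably span the relations with double underscores, for example `Last.objects.filter(nested__nested__nested=main)`. Whether the extra joins cost enough to justify the duplicated foreign keys depends on the data size and is best measured.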

Proper/most efficient way to handle Django models and relational data

I'm curious about what the best way to handle models in Django is. Let's say you want to make an app that deals with TV Show listings. One way to handle the model would be
class TVShow(models.Model):
    channel = models.CharField()
    show_name = models.CharField()
    season = models.CharField()
    episode = models.CharField()
This has the advantage of everything being packed neatly. However, if I want to display a list of all of the channels, or all of the show_names, I would have to go through the TVShow objects and remove duplicates.
On the other hand one could
class CommonModel(models.Model):
    name = models.CharField()

    class Meta:
        abstract = True

class Channel(CommonModel):
    show_name = models.ManyToManyField('ShowName')

class ShowName(CommonModel):
    seasons = models.ManyToManyField('Season')

class Season(CommonModel):
    episodes = models.ManyToManyField('Episode')

class Episode(CommonModel):
    pass
This would make it easy to show all of the ShowNames or all of the Channels without having to worry about unrelated data. However, it would be much harder to see which Channel a show is on, unless you map back as well.
Is there a "pythonic" or Django preferred way to do this? Are there any advantages in terms of space, speed, etc?
Thanks!

Your initial stab at it looks fine. That is, you could use
class TVShow(models.Model):
    channel = models.CharField()
    show_name = models.CharField()
    season = models.CharField()
    episode = models.CharField()
And then you could just use the django orm to do the queries you were looking for.
That is, if you wanted all the channels with no duplicates, you would say
TVShow.objects.distinct('channel')
Django documentation for distinct().
As far as performance goes, this is the way to do it because you are effectively having the database do it. Databases are designed for these purposes and should be significantly faster than trying to trim it in code.
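To show what "having the database do it" amounts to, here is a minimal sketch with the standard-library sqlite3 module. The table mirrors the flat TVShow model; the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tvshow (channel TEXT, show_name TEXT, season TEXT, episode TEXT);
    INSERT INTO tvshow VALUES
        ('Fox', 'Arrested Development', '1', '1'),
        ('Fox', 'Arrested Development', '1', '2'),
        ('Discovery', 'MythBusters', '1', '1');
""")

# The database removes the duplicates; Python only receives the unique channels.
channels = [row[0] for row in conn.execute(
    "SELECT DISTINCT channel FROM tvshow ORDER BY channel")]
```

One caveat on the Django side: passing field names to distinct() is documented as PostgreSQL-only. On other backends, `TVShow.objects.values_list('channel', flat=True).distinct()` should give the same list of unique channels.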

The preferred way is to use a normalized database structure unless there is a performance reason not to; it makes more complex queries easier to write in your code. ForeignKey and ManyToManyField accept a 'related_name' argument.
class Channel(models.Model):
    # name field assumed here, since the queries below filter on it
    name = models.CharField(max_length=100)

class Show(models.Model):
    name = models.CharField(max_length=100)
    # this means you can have the same show on different channels
    channels = models.ManyToManyField(Channel, related_name='shows')

class Episode(models.Model):
    name = models.CharField(max_length=100)
    # this means that one episode can be related to only one show
    show = models.ForeignKey(Show, related_name='episodes')

Channel.objects.filter(shows__name='Arrested Development')
Channel.objects.get(name='Discovery').shows.all()
Show.objects.get(name='Arrested Development').episodes.all()

# 2 db queries, 1 join
Episode.objects.select_related('show').get(
    name='Arrested Development S01E01').show.channels.all()

# 1 db query, 3 joins
Channel.objects.filter(shows__episodes__name='Arrested Development S01E01')
and so on...

Dynamic property vs. fixed property query speeds on Google App Engine

I'm designing datastore models and trying to decide the best approach when thinking about how the query filters are going to work.
Best if I just write out two examples. The first example is if I have a fixed "gender" property with just a string which can be set to either "male" or "female".
class Person(db.Model):
    name = db.StringProperty()
    gender = db.StringProperty()

p1 = Person(name="Steve", gender="male")
p2 = Person(name="Jane", gender="female")
p1.put()
p2.put()
males = db.GqlQuery("SELECT * FROM Person WHERE gender = :1", "male")
The second example is if the Person entity is an expando model, and I dynamically set a "is_male" or "is_female" dynamic property.
class Person(db.Expando):
    name = db.StringProperty()

p1 = Person(name="Steve")
p1.is_male = True
p1.put()
p2 = Person(name="Jane")
p2.is_female = True
p2.put()
males = db.GqlQuery("SELECT * FROM Person WHERE is_male = :1", True)
Now let's say that we gather millions of records and want to run a query. Which of the two methods above would be faster in production on Google App Engine running Python 2.7?

There is absolutely no difference - models look the same in the datastore regardless of whether the property was 'dynamic' or not. The only difference is that a standard property class with no value set will insert a field with value None in the datastore, which takes some extra space, but allows you to query for users with that value not set.

Even if it's not exactly what you are asking for, I think this might give you an idea:
Is there some performance issue between leaving empty ListProperties or using dynamic (expando) properties?

Entity groups, ReferenceProperty or key as a string

After building a few applications on the GAE platform, I find that I use some relationship between different models in the datastore in basically every application. I often need to find which records share the same parent (for example, matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        pass  # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a subquery to fetch Foo each time. The approach I now use is to store each Foo key as a string on Bar.parentFoo; this way I can string-compare it with someFooKeyFromSomePlace and get rid of all the subquery overhead.
Now I've started to look at entity groups and am wondering whether they are an even better way to go. I can't really figure out how to use them.
As for the two approaches above, are there any downsides to using them? Could using a stored key string come back and bite me in the * * *? And last but not least, is there a faster way to do this?

Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See Property Class

I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more on modeling, you can refer to this: Appengine Data modeling

I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
Not very specific, but that's all the info I can give you until you are more specific about exactly what you're trying to do here.
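The idea behind the prefetch pattern mentioned above can be sketched without App Engine at all. In the example below, `FakeStore`, its `get_multi` method and `prefetch_parents` are hypothetical stand-ins (not the datastore API and not Nick Johnson's code); the store simply counts round trips so the effect of batching is visible.

```python
from typing import Dict, Iterable, List


class FakeStore:
    """Hypothetical in-memory stand-in for the datastore; counts round trips."""

    def __init__(self, entities: Dict[str, dict]) -> None:
        self._entities = entities
        self.round_trips = 0

    def get_multi(self, keys: Iterable[str]) -> List[dict]:
        """Return the entities for all keys in a single simulated round trip."""
        self.round_trips += 1
        return [self._entities[k] for k in keys]


def prefetch_parents(store: FakeStore, bars: List[dict]) -> List[dict]:
    """Attach each bar's parent using one batched lookup instead of one per bar."""
    keys = sorted({bar["parent_key"] for bar in bars})  # de-duplicate the keys
    parents = dict(zip(keys, store.get_multi(keys)))
    for bar in bars:
        bar["parent"] = parents[bar["parent_key"]]
    return bars


store = FakeStore({"foo1": {"name": "first"}, "foo2": {"name": "second"}})
bars = [
    {"name": "a", "parent_key": "foo1"},
    {"name": "b", "parent_key": "foo2"},
    {"name": "c", "parent_key": "foo1"},
]
prefetch_parents(store, bars)
```

Three bars are resolved with a single lookup here, whereas dereferencing the reference on each bar separately would cost one lookup per bar. Bars that share a parent end up pointing at the same object, which is a simplified version of the reason the linked post warns about modifying prefetched entities.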

When you design your models, you also need to consider whether you want to be able to save them within a transaction. However, only do this if you actually need transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()

def _save_entities(foo_name, bar_name):
    """Save the model data and return the parent Foo."""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item

def main():
    # Run the save in a transaction; if any put fails, this should all roll back
    foo_item = db.run_in_transaction(_save_entities, "foo name", "bar name")
    # Query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor", ancestor=foo_item.key()).fetch(1000):
        pass  # do stuff
