After building a few applications on the GAE platform, I find myself using relationships between different models in the datastore in basically every application, and I often need to see which records have the same parent (like matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a sub-query to fetch Foo each time. The approach I now use is to store each Foo key as a string on Bar.parentFoo, and this way I can string-compare it with someFooKeyFromSomePlace and get rid of all the sub-query overhead.
Now I've started to look at entity groups and I'm wondering if that's an even better way to go. I can't really figure out how to use them.
And as for the two approaches above, I'm wondering: are there any downsides to using them? Could using stored key strings come back and bite me later? And last but not least, is there a faster way to do this?
Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See Property Class
I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw,position="left_front")
lf.put()
lb = Wheel(parent=bmw,position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more reference on modeling, you can refer to this: Appengine Data modeling
I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
Not very specific, but that's all the info I can give you until you're more specific about exactly what you're trying to do here.
When you design your models you also need to consider whether you want to be able to save them within a transaction. However, only do this if you need to use transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()

def _save_entities(foo_name, bar_name):
    """Save the model data"""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item

def main():
    # Run the save in a transaction; if any part fails this should all roll back
    foo_item = db.run_in_transaction(_save_entities, "foo name", "bar name")

    # to query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor",
                        ancestor=foo_item.key()).fetch(1000):
        # do stuff
        pass
Related
I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notoriously love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone-aware dates & times in SQL, please.) What I would like is a single parameter on the model class that I can use to get a proper python datetime object, and that I can also set like any other column.
Here's my table:
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    ...
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)
    ...
Here's what I have using a hybrid property (added to the above class; the datetime/pytz code is pseudocode):
@hybrid_property
def taken(self):
    return datetime.datetime(self.taken_dt, self.taken_tz)

@taken.setter
def taken(self, dt):
    self.taken_dt = dt
    self.taken_tz = dt.tzinfo
From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class; the datetime/pytz code is pseudocode):
taken = composite(DateTimeTimeZone._make, taken_dt, taken_tz)

class DateTimeTimeZone(object):
    def __init__(self, dt, tz):
        self.dt = dt
        self.tz = tz

    @classmethod
    def from_db(cls, dt, tz):
        return DateTimeTimeZone(dt, tz)

    @classmethod
    def from_dt(cls, dt):
        return DateTimeTimeZone(dt, dt.tzinfo)

    def __composite_values__(self):
        return (self.dt, self.tz)

    def value(self):
        # This is here so I can get the actual datetime.datetime object
        return datetime.datetime(self.dt, self.tz)
It would seem that this method has a decent amount of extra overhead, and I can't figure out a way to set it like I would any other column directly from a datetime.datetime object without instantiating the value object first using .from_dt.
Any guidance on if I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances which have a tz attribute set and then return a DateTimeTimeZone object. If you look at the SQLAlchemy docs for Attribute Events you can see that you can tell SQLAlchemy to listen to an attribute-set event and call your code on that. In there you can do any modification to the value being set as you like. You can't however access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know if this will be called before or after the type-conversion of the composite. You'd have to try.
edit: It's all about what you want to achieve, though. The AttributeEvent can help you with your data consistency, while the hybrid_property and friends will make querying easier for you. You should use each one for its intended use case.
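As a concrete sketch of the attribute-set event idea (my own minimal example, not from the question's code; instead of building a `DateTimeTimeZone` it simply normalizes incoming aware datetimes to naive UTC, to show where the value interception happens):

```python
from datetime import datetime, timedelta, timezone

from sqlalchemy import Column, DateTime, Integer, String, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Photo(Base):
    __tablename__ = 'photos'
    id = Column(Integer, primary_key=True)
    taken_dt = Column(DateTime, nullable=False)
    taken_tz = Column(String(64), nullable=False)

# retval=True lets the listener replace the value being set
@event.listens_for(Photo.taken_dt, 'set', retval=True)
def _normalize_taken(target, value, oldvalue, initiator):
    # if an aware datetime comes in, store it as naive UTC
    if isinstance(value, datetime) and value.tzinfo is not None:
        return value.astimezone(timezone.utc).replace(tzinfo=None)
    return value

photo = Photo()
photo.taken_dt = datetime(2015, 6, 1, 14, 0,
                          tzinfo=timezone(timedelta(hours=2)))
# photo.taken_dt now holds the naive UTC equivalent, 2015-06-01 12:00
```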
More detailed discussion on the differences between the various solutions:
hybrid_attribute and composite are two completely different beasts. To understand hybrid_attribute one first has to understand what a column_property is and can do.
1) column_property
This one is placed on a mapper and can contain any selectable. So if you put a concrete sub-select into a column_property, you can access it read-only as if it were a concrete column. The calculation is done on the fly. You can even use it to search for entries. SQLAlchemy will construct the right select containing your sub-select for you.
Example:
class User(Base):
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)
    name = column_property(first_name + ' ' + last_name)
    category = column_property(select([CategoryName.name])
                               .select_from(Category.__table__
                                            .join(CategoryName.__table__))
                               .where(Category.user_id == id))

db.query(User).filter(User.name == 'John Doe').all()
db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_method and hybrid_attribute
A hybrid_attribute is just like a column_property but can call a different code-path when you are in an instance context. So you can have the selectable on the class level but a different implementation on the instance level. With a hybrid_method you can even parametrize both sides.
3) composite_attribute
This is what enables you to combine multiple concrete columns to a logical single one. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from there and use it in the selects. This integrates neatly in the query framework and should not impose any additional problems. In my experience the use-cases for composite columns are rather rare. Your use-case seems fine. For modification of values you can always use AttributeEvents. If you want to have the whole instance available you'd have to have a MapperEvent called before flush. This certainly works, as I used this to implement a completely transparent Audit Trail tracking system which stored every value changed in every table in a separate set of tables.
I have a datastore entity with several properties. Each property is updated using a separate method. However, every so often I find that a method overwrites a property it is not modifying with an old value (Null).
For example.
class SomeModel(ndb.Model):
    property1 = ndb.StringProperty()
    property2 = ndb.StringProperty()

def method1(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property1 = "1"
    data1.put()
The data 1 entity now has property1 with value of "1"
def method2(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property2 = "2"
    data1.put()
The data 1 entity now has property2 with value of "2"
However, if these methods are run too closely in succession, method2 seems to overwrite property1 with its initial value (Null).
To get around this issue, I've been using the deferred library, however it's not reliable (deferred entities seem to disappear every now-and-then) or predictable (the _countdown time seems to be for guidance at best) enough.
My question is: Is there a way to only retrieve and modify one property of a datastore entity without overwriting the rest when you call data1.put()? I.e. In the case of method2 - could I only write to property2 without overwriting property1?
The way to prevent such overwrites is to make sure your updates are done inside transactions. With NDB this is really easy - just attach the @ndb.transactional decorator to your methods:
@ndb.transactional
def method1(self, entity_key_urlsafe):
    data1 = ndb.Key(urlsafe=entity_key_urlsafe).get()
    data1.property1 = "1"
    data1.put()
The documentation on transactions with NDB doesn't give as much background as the (older) DB version, so to familiarise yourself fully with the limitations and options, you should read both.
I say No
I have never seen a reference to that or a trick or a hack.
I also think that it would be quite difficult for such an operation to exist.
When you perform .put() on an entity the entity is serialised and then written.
An entity is an instance of the Class that you can save or retrieve from the Datastore.
Imagine if you had a date property with auto_now set. What would have to happen then? Which of the two saves should edit that property?
Though your problem seems to be different. One of your functions commits first and nullifies the other method's value, because it retrieved an outdated copy, not the expected one.
@Greg's answer talks about transactions. You might want to take a look at them.
Transactions are used for concurrent requests, not so much for succession.
Imagine 2 users pressing the save button to increase a counter at the same time. That's where transactions work.
@ndb.transactional
def increase_counter(entity_key_urlsafe):
    entity = ndb.Key(urlsafe=entity_key_urlsafe).get()
    entity.counter += 1
    entity.put()
Transactions will ensure that the counter is correct.
The first to try to commit the above transaction will succeed, and the later ones will have to retry if retries are on (3 by default).
Though succession is something different. That said, @Greg and I advise you to change your logic towards using transactions if the problem you want to solve is something like the counter example.
I'd like to create a directed graph in Django, but each node could be a separate model, with separate fields, etc.
Here's what I've got so far:
from bannergraph.apps.banners.models import *

class Node(models.Model):
    uuid = UUIDField(db_index=True, auto=True)

    class Meta:
        abstract = True

class FirstNode(Node):
    field_name = models.CharField(max_length=100)
    next_node = UUIDField()

class SecondNode(Node):
    is_something = models.BooleanField(default=False)
    first_choice = UUIDField()
    second_choice = UUIDField()
(obviously FirstNode and SecondNode are placeholders for the more domain-specific models, but hopefully you get the point.)
So what I'd like to do is query all the subclasses at once, returning all of the ones that match. I'm not quite sure how to do this efficiently.
Things I've tried:
Iterating over the subclasses with queries - I don't like this, as it could get quite heavy with the number of queries.
Making Node concrete. Apparently I have to still check for each subclass, which goes back to #1.
Things I've considered:
Making Node the class, and sticking a JSON blob in it. I don't like this.
Storing pointers in an external table or system. This would mean 2 queries per UUID, where I'd ideally want to have 1, but it would probably do OK in a pinch.
So, am I approaching this wrong, or forgetting about some neat feature of Django? I'd rather not use a schemaless DB if I don't have to (the Django admin is almost essential for this project). Any ideas?
The InheritanceManager from django-model-utils is what you are looking for.
You can iterate over all your Nodes with:
nodes = Node.objects.filter(foo="bar").select_subclasses()
for node in nodes:
    # logic
I believe this is trivial, but I'm fairly new to Python.
I am trying to create a model using google app engine.
Basically from a E/R point of view
I have 2 objects with a join table (the join table captures the point in time of the join)
Something like this
Person      | Idea      | Person_Idea
------------|-----------|--------------
person.key  | idea.key  | person.key
            |           | idea.key
            |           | date_of_idea
My Python code would look like:
class Person(db.Model):
    # some properties here....
    pass

class Idea(db.Model):
    # some properties here....
    pass

class IdeaCreated(db.Model):
    person = db.ReferenceProperty(Person)
    idea = db.ReferenceProperty(Idea)
    created = db.DateTimeProperty(auto_now_add=True)
What I want is a convenient way to get all the ideas a person has (bypassing the IdeaCreated objects), since sometimes I will need the list of ideas directly.
The only way I can think of to do this is to add the following method on the Person class:
def allIdeas(self):
    ideas = []
    for ideacreated in self.ideacreated_set:
        ideas.append(ideacreated.idea)
    return ideas
Is this the only way to do it? Or is there a nicer way that I am missing?
Also, I'm assuming I could use a GQL query and bypass hydrating the IdeaCreated instances (not sure of the exact syntax), but putting in a GQL query smells wrong to me.
You should use the person as the ancestor/parent of the idea.
idea = Idea(parent=some_person, other_field=field_value)
idea.put()
then you can query all ideas where some_person is the ancestor
persons_ideas = Idea.all().ancestor(some_person_key).fetch(1000)
The ancestor key will be included in the Idea entity's key, and you won't be able to change the ancestor once the entity is created.
I highly suggest you use ndb instead of db: https://developers.google.com/appengine/docs/python/ndb/
with ndb you could even use StructuredProperty or LocalStructuredProperty
https://developers.google.com/appengine/docs/python/ndb/properties#structured
EDIT:
If you need a many-to-many relationship, look into ListProperty and store the Person keys in that property. Then you can query for all Ideas with a given key in that property.
class Idea(db.Model):
    person = db.StringListProperty()

idea = Idea(person=[str(person.key())])  # plus the other properties
idea.put()
Add another person to the idea:

idea.person.append(str(another_person.key()))
idea.put()

ideas = Idea.all().filter('person =', str(person.key())).fetch(1000)
look into https://developers.google.com/appengine/docs/python/datastore/typesandpropertyclasses#ListProperty
I'm using Google App Engine, and am having trouble writing queries to filter ReferenceProperties.
eg.
class Group(db.Model):
    name = db.StringProperty(required=True)
    creator = db.ReferenceProperty(User)

class GroupMember(db.Model):
    group = db.ReferenceProperty(Group)
    user = db.ReferenceProperty(User)
And I have tried writing something like this:
members = models.GroupMember.all().filter('group.name =', group_name)
and various other things that don't work. Hopefully someone can give me a prod in the right direction...
If your groups are uniquely named, then your "group.name" is a unique identifier of a Group entity.
That means you can write:
members = models.GroupMember.all().filter(
    "group =", models.Group.gql("WHERE name = :1", group_name).get()
)
though you only need to do that if you don't already have the group entity lying around in the stack somewhere.
Google's essay on many-to-many with appengine is here.
If what you want is to get the members of a group, ReferenceProperties have that built-in.
class GroupMember(db.Model):
    group = db.ReferenceProperty(Group, collection_name="groupMembers")
    user = db.ReferenceProperty(User, collection_name="groupMembers")
Then you can write:
# get the group entity somehow
group = Group.get(group_key)

# do something with the members, such as list the nicknames
nicknames = [x.user.nickname for x in group.groupMembers]
This would require a join, which isn't possible in App Engine. If you want to filter by a property of another model, you need to include that property on the model you're querying against.
This would result in two datastore hits, but should work. If you use memcache it shouldn't be a problem.
group = models.Group.all().filter("name =", group_name).get()
members = models.GroupMember.all().filter('group =', group)
Using the models that you defined in your question, let's say you want to list all members of a group called "Space monkeys".
mygroup = Group.gql("WHERE name = :1", 'Space monkeys').get()
for group_member in mygroup.groupmember_set:
    print 'group member name is: %s' % (group_member.user.name)
The "groupmember_set" is called an "implicit collection property" and is very useful. You can call it whatever you want by overriding the default name using the collection_name keyword parameter to ReferenceProperty. For an example, see the answer by Thomas L Holaday.
This is all explained in a very good paper by Rafe Kaplan: Modeling Entity Relationships.