List of References in Google App Engine for Python

In Google App Engine, there is such a thing as a ListProperty that allows you to hold a list (array) of items. You may also specify the type of the item being held, for instance string, integer, or whatever.
Google App Engine also allows you to have a ReferenceProperty. A ReferenceProperty "contains" a reference to another Google App Engine Model entity. If you access a ReferenceProperty, it will automatically retrieve the actual entity that the reference points to. This is convenient, as it beats getting the key, and then getting the entity for said key.
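For illustration, here is roughly the behaviour I mean (the model names are just examples, not from a real app):

from google.appengine.ext import db

class Author(db.Model):
    name = db.StringProperty()

class Book(db.Model):
    title = db.StringProperty()
    author = db.ReferenceProperty(Author)

book = Book.all().get()
print(book.author.name)  # transparently fetches the referenced Author entity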
However, I do not see any such thing as a ListReferenceProperty (or ReferenceListProperty). I would like to hold a list of references to other entities that would automatically be resolved when I attempt to access elements within the list. The closest I can get, it seems, is to hold a list of db.Key objects; I can then use these keys to manually retrieve their associated entities from the datastore.
Is there any good solution to this? Basically, I would like the ability to have a collection of (auto-dereferencing) references to other entities. I can almost get there by having a collection of keys to other entities, but I would like it to "know" that these are key items, and that it could dereference them as a service to me.
Thank you

Step one:
Use db.ListProperty(db.Key) to create the relationship. You want the list property to be on the entity that will have fewer references in the many-to-many relationship. This also gives you a back-reference. So:
class Spam(db.Model):
    prop1 = db.StringProperty()
    eggs = db.ListProperty(db.Key)

class Eggs(db.Model):
    prop1 = db.StringProperty()

    @property
    def spams(self):
        return Spam.all().filter('eggs', self.key())
This provides references in both directions.
Step two:
Create a utility method that dereferences properties:
def prefetch_refprops(entities, *props):
    """Dereference ReferenceProperties to reduce gets. See:
    http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine
    """
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    ref_entities = dict((x.key(), x) for x in db.get(set(ref_keys)))
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities
Usage would be:
dereferenced_spams = prefetch_refprops(spams, models.Spam.eggs)

You're right, there's no built-in ReferenceListProperty. It'd be possible to write one yourself - custom Property subclasses are generally fairly easy - but getting it right is harder than you'd think when it comes to dereferencing and caching a list of references.
You can use a db.ListProperty(db.Key), however, which allows you to store a list of keys. Then, you can load them individually or all at once using a batch db.get() operation. This requires you to do the resolution step yourself, but it also gives you more control over when you dereference entities.
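For example (model and field names here are illustrative, not from the question):

from google.appengine.ext import db

class Recipe(db.Model):
    ingredient_keys = db.ListProperty(db.Key)

recipe = Recipe.all().get()
# One batch round trip resolves every referenced entity at once.
ingredients = db.get(recipe.ingredient_keys)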

Related

Initialize classes from POST JSON data

I am writing a Django app which will send some data from the site to a python script for processing. I am planning on sending this data as a JSON string (though this need not be the case). Some of the values sent over would ideally be class instances; however, this is clearly not possible, so the class name plus any arguments needed to initialize the class must somehow be serialized into a JSON value before being deserialized by the python script. This could be achieved with the code below, but it has several problems:
My attempt
I have put all the data needed for each class in a list, and used that to initialize each class:
import json

class Class1():
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._others = args

class Bar():
    POTENTIAL_OBJECTS = {"RANGE": range,
                         "Class1": Class1}

    def __init__(self, json_string):
        python_dict = json.loads(json_string)
        for key, value in python_dict.items():
            if isinstance(value, list) and value[0] in Bar.POTENTIAL_OBJECTS:
                setattr(self, key, Bar.POTENTIAL_OBJECTS[value[0]](*value[1], **value[2]))
            else:
                setattr(self, key, value)

example = ('{ "key_1":"Some string", "key_2":["heres", "a", "list"],'
           '"key_3":["RANGE", [10], {}], "key_4":["Class1", ["stuff"], {"stuff2":"x"}] }')
a = Bar(example)
The Problems with my approach
Apart from generally being a bit messy and not particularly elegant, there are other problems. Some of the lists in the JSON object will be generated by the user, which obviously presents problems if the user uses a key from POTENTIAL_OBJECTS. (In the non-simplified version, Bar will have lots of subclasses, each with its own POTENTIAL_OBJECTS, so keeping track of all the potential values for front-end validation would be tricky.)
My Question
It feels like this must be a reasonably common thing that is needed and there must be some standard patterns or ways of achieving this. Is there a common/better approach/method to achieve this?
EDIT: I have realised that one way round the problem is to make all the keys in POTENTIAL_OBJECTS start with an underscore, and then validate against underscores in user inputs at the front-end. It still seems like there must be a better way to deserialize JSON into objects more complex than strings/ints/bools/lists etc.
Instead of having one master method to turn any arbitrary JSON into an arbitrary hierarchy of Python objects, the typical pattern would be to create a Django model for each type of thing you are trying to model. Relationships between them would then be modeled via relationship fields (ForeignKey, ManyToMany, etc, as appropriate). For instance, you might create a class Employee that models an employee, and a class Paycheck. Paycheck could then have a ForeignKey field named issued_to that refers to an Employee.
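As a sketch of that approach (only the Employee/Paycheck relationship comes from the example above; the field definitions are assumptions):

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=100)

class Paycheck(models.Model):
    # Each paycheck points at the employee it was issued to.
    issued_to = models.ForeignKey(Employee, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=10, decimal_places=2)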
Note also that any scheme similar to the one you describe (where user-created JSON is translated directly into arbitrary Python objects) has security implications, potentially allowing users to execute arbitrary code in the context of the Django server. If you were to attempt it anyway, the whitelist approach you have started here would be a decent place to start as a way to do it safely.
In short, you're reinventing most of what Django already does for you. The Django ORM features will help you to create models of the specific things you are interested in, validate the data, turn those data into Python objects safely, and even save instances of these models in the database for retrieval later.
That said, if you do want to parse a JSON string directly into an object hierarchy, you would have to do a full traversal instead of just going over the top-level items. For that, look into doing something like a depth-first traversal, creating new instances at each new node in the hierarchy; a sketch follows below. And if you want to validate these inputs client-side as well, you'd need to replicate this work in JavaScript.
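A rough sketch of such a traversal under a whitelist (all names here are hypothetical, and error handling is omitted):

import json

WHITELIST = {"RANGE": range}  # hypothetical whitelist of allowed factories

def build(node):
    # Depth-first: resolve children before constructing the parent object.
    if isinstance(node, list) and node and node[0] in WHITELIST:
        factory, args, kwargs = WHITELIST[node[0]], node[1], node[2]
        return factory(*[build(a) for a in args],
                       **dict((k, build(v)) for k, v in kwargs.items()))
    if isinstance(node, dict):
        return dict((k, build(v)) for k, v in node.items())
    if isinstance(node, list):
        return [build(v) for v in node]
    return node

data = build(json.loads('{"r": ["RANGE", [3], {}]}'))  # {'r': range(0, 3)}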

NDB Expando Model with dynamic TextProperty?

I'm trying to do:
MyModel({'text': db.Text('text longer than 500 bytes')})
But get:
BadValueError: Indexed value fb_education must be at most 500 bytes
I'm thinking this is just a carry over from this issue with the old db api.
https://groups.google.com/forum/?fromgroups#!topic/google-appengine/wLAwrjtsuks
First, create the entity kind dynamically:
kindOfEntity = "MyTable"

class DynamicEntity(ndb.Expando):
    @classmethod
    def _get_kind(cls):
        return kindOfEntity
Then assign TextProperties dynamically at runtime, as shown below:
dbObject = DynamicEntity()
key = "studentName"
value = "Vijay Kumbhani"
# Attach a TextProperty to this instance by hand
textProperty = ndb.TextProperty(key)
dbObject._properties[key] = textProperty
dbObject._values[key] = value
dbObject.put()
After this, the dynamic property is stored as a Text property (unindexed), so the 500-byte limit no longer applies.
You're trying to use a db.Text, part of the old API, with NDB, which isn't going to work.
To the best of my knowledge, there's no good way to set unindexed properties in an Expando in NDB, currently. You can set _default_indexed = False on your expando subclass, as (briefly) documented here, but that will make the default for all expando properties unindexed.
A better solution would be to avoid the use of Expando altogether; there are relatively few compelling uses for it where you wouldn't be better served by defining a model (or even defining one dynamically).
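For completeness, a minimal sketch of the _default_indexed approach mentioned above (the kind and property names are made up):

from google.appengine.ext import ndb

class FreeForm(ndb.Expando):
    _default_indexed = False  # dynamic properties default to unindexed

entity = FreeForm()
entity.text = 'x' * 1000  # unindexed, so the 500-byte index limit doesn't apply
entity.put()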
Yeah, I know the question is old. But I also googled for the same solutions and found no results.
So here is a recipe that works for me (I expand User() with a "permissions" property):
prop = ndb.GenericProperty("permissions", indexed=False)
prop._code_name = "permissions"
user._properties["permissions"] = prop
prop._set_value(user, permissions)
The previous answer was VERY useful to me... Thanks!!! I just wanted to add that it appears you can also create a specific property type using this technique (if you know the datatype you want to create). When the entity is later retrieved, the dynamic property is set to the specific type instead of GenericProperty. This can be handy for ndb.PickleProperty and ndb.JsonProperty values in particular (to get the in/out conversions).
prop = ndb.TextProperty("permissions", indexed=False)
prop._code_name = "permissions"
user._properties["permissions"] = prop
prop._set_value(user, permissions)
I was trying to change just one property of an entity to Text. But when you don't map your properties explicitly, Expando/Model seems to change all properties of an entity to GenericProperty (after a get).
When you put those entities again (to change the desired property), it affects other existing TextProperties, changing them to regular strings.
Only the low-level datastore api seems to work:
https://gist.github.com/feroult/75b9ab32b463fe7f9e8a
You can call this from the remote_api_shell.py:
from yawp import *
yawp(kind).migrate(20, 'change_property_to_text', 'property_name')

Entity groups, ReferenceProperty or key as a string

After building a few applications on the gae platform, I end up using relationships between different models in the datastore in basically every application. And often I find myself needing to see which records share the same parent (like matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        # do stuff
        pass
But lately I've abandoned this approach, since bar.parentFoo.key() makes a sub-query to fetch each Foo. The approach I now use is to store each Foo key as a string on Bar.parentFoo, so I can string-compare it with someFooKeyFromSomePlace and get rid of all the subquery overhead.
Now I've started to look at Entity groups and wondering if this is even a better way to go? I can't really figure out how to use them.
And as for the two approaches above, I'm wondering if there are any downsides to using them. Could using stored key strings come back and bite me in the ***? And last but not least, is there a faster way to do this?
Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See the Property class documentation.
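Applied to the loop from the question, this looks like:

for bar in bars:
    # Compares the stored key without dereferencing the Foo entity.
    if Bar.parentFoo.get_value_for_datastore(bar) == fooKey:
        # do stuff
        pass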
I think you should consider this as well. This will help you fetch all the child entities of a single parent.
# Assumed minimal model definitions for this example:
class Car(db.Model):
    brand = db.StringProperty()

class Wheel(db.Model):
    position = db.StringProperty()

bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw, position="left_front")
lf.put()
lb = Wheel(parent=bmw, position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)
For more on data modeling, you can refer to Appengine Data modeling.
I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
Not very specific, but that's all the info I can give you until you're more specific about exactly what you're trying to do here.
When you design your models you also need to consider whether you want to be able to save this within a transaction. However, only do this if you actually need transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db

class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()

def _save_entities(foo_name, bar_name):
    """Save the model data"""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item

def main():
    # Run the save in a transaction; if any part fails, it should all roll back
    foo_item = db.run_in_transaction(_save_entities, "foo name", "bar name")
    # Query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor", ancestor=foo_item.key()).fetch(1000):
        # do stuff
        pass

Using Property Builtin with GAE Datastore's Model

I want to make attributes of GAE Model properties. The reason is for cases like to turn the value into uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, the GAE Datastore has its own concept of a Property class. I looked into the documentation, and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue I can think of with overriding get_value_for_datastore is that it might not get called until the object is put into the datastore. Hence, getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()

class MyModel(db.Model):
    foo = MyStringProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
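A minimal sketch of that two-name arrangement (all names here are illustrative):

class Person(db.Model):
    # The stored field; this is the name you must use in GQL queries.
    stored_name = db.StringProperty()

    def _get_name(self):
        return self.stored_name

    def _set_name(self, value):
        self.stored_name = value.upper() if value is not None else None

    # The in-memory attribute, a plain built-in property.
    name = property(_get_name, _set_name)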
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.
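For reference, the core of a DerivedProperty looks roughly like this (a simplified sketch, not the full aetycoon implementation):

class DerivedProperty(db.Property):
    def __init__(self, derive_func, *args, **kwargs):
        super(DerivedProperty, self).__init__(*args, **kwargs)
        self.derive_func = derive_func

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self  # class-level access returns the Property itself
        # Reading the attribute recomputes the derived value, so no
        # datastore write is needed to keep it current.
        return self.derive_func(model_instance)

    def __set__(self, model_instance, value):
        raise ValueError('Derived property %s cannot be assigned to' % self.name)

class Person(db.Model):
    name = db.StringProperty(required=True)
    name_lower = DerivedProperty(lambda self: self.name.lower())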

How do I define a unique property for a Model in Google App Engine?

I need some properties to be unique. How can I achieve this?
Is there something like unique=True?
I'm using Google App Engine for Python.
Google has provided a function to do that:
http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
Model.get_or_insert(key_name, **kwds)
Attempts to get the entity of the model's kind with the given key name. If it exists, get_or_insert() simply returns it. If it doesn't exist, a new entity with the given kind, name, and parameters in kwds is created, stored, and returned.
The get and subsequent (possible) put are wrapped in a transaction to ensure atomicity. This means that get_or_insert() will never overwrite an existing entity, and will insert a new entity if and only if no entity with the given kind and name exists.
In other words, get_or_insert() is equivalent to this Python code:
def txn():
    entity = MyModel.get_by_key_name(key_name, parent=kwds.get('parent'))
    if entity is None:
        entity = MyModel(key_name=key_name, **kwds)
        entity.put()
    return entity
return db.run_in_transaction(txn)
Arguments:
key_name
The name for the key of the entity
**kwds
Keyword arguments to pass to the model class's constructor if an instance with the specified key name doesn't exist. The parent argument is required if the desired entity has a parent.
Note: get_or_insert() does not accept an RPC object.
The method returns an instance of the model class that represents the requested entity, whether it existed or was created by the method. As with all datastore operations, this method can raise a TransactionFailedError if the transaction could not be completed.
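For example, making the unique value itself the key name turns get_or_insert() into a uniqueness guard (model and field names here are illustrative):

class UserAccount(db.Model):
    display_name = db.StringProperty()

# Atomically fetches the existing 'bob' entity, or creates it if absent;
# two concurrent calls can never both create a 'bob'.
account = UserAccount.get_or_insert('bob', display_name='Bob')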
There's no built-in constraint for making sure a value is unique. You can do this however:
query = MyModel.all(keys_only=True).filter('unique_property', value_to_be_used)
entity = query.get()
if entity:
    raise Exception('unique_property must have a unique value!')
I use keys_only=True because it'll improve the performance slightly by not fetching the data for the entity.
A more efficient method would be to use a separate model with no fields, whose key name is made up of property name + value. Then you could use get_by_key_name to fetch one or more of these composite key names; if you get one or more not-None values, you know there are duplicate values (and by checking which values were not None, you'll know which ones were not unique).
As onebyone mentioned in the comments, these approaches, by their get-first, put-later nature, run the risk of concurrency issues. Theoretically, an entity could be created just after the check for an existing value, and the code after the check would still execute, leading to duplicate values. To prevent this, you will have to use transactions: Transactions - Google App Engine
If you're looking to check for uniqueness across all entities with transactions, you'd have to put all of them in the same group using the first method, which would be very inefficient. For transactions, use the second method like this:
class UniqueConstraint(db.Model):
    @classmethod
    def check(cls, model, **values):
        # Create a pseudo-key for use as an entity group.
        parent = db.Key.from_path(model.kind(), 'unique-values')

        # Build a list of key names to test.
        key_names = []
        for key in values:
            key_names.append('%s:%s' % (key, values[key]))

        def txn():
            result = cls.get_by_key_name(key_names, parent)
            for test in result:
                if test:
                    return False
            for key_name in key_names:
                uc = cls(key_name=key_name, parent=parent)
                uc.put()
            return True

        return db.run_in_transaction(txn)
UniqueConstraint.check(...) will assume that every single key/value pair must be unique to return success. The transaction will use a single entity group for every model kind. This way, the transaction is reliable for several different fields at once (for only one field, this would be much simpler.) Also, even if you've got fields with the same name in one or more models, they will not conflict with each other.
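Hypothetical usage of the helper above (field names are made up):

# Returns True (and reserves the values) only if both are currently unused.
if UniqueConstraint.check(MyModel, username='bob', email='bob@example.com'):
    MyModel(username='bob', email='bob@example.com').put()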
