ComputedProperty only updates on second put() - python

I have a ComputedProperty inside a StructuredProperty that does not get updated when the object is first created.
When I create the object address_components_ascii does not get saved. The field is not visible in the Datastore Viewer at all. But if I get() and then immediately put() again (even without changing anything), the ComputedProperty works as expected. The address_components field works properly.
I have tried clearing the database, and deleting the whole database folder, without success.
I am using the local dev server on windows 7. I have not tested it on GAE.
Here's the code:
class Item(ndb.Model):
location = ndb.StructuredProperty(Location)
The inner Location class:
class Location(ndb.Model):
address_components = ndb.StringProperty(repeated=True) # array of names of parent areas, from smallest to largest
address_components_ascii = ndb.ComputedProperty(lambda self: [normalize(part) for part in self.address_components], repeated=True)
The normalization function
def normalize(s):
return unicodedata.normalize('NFKD', s.decode("utf-8").lower()).encode('ASCII', 'ignore')
An example of the address_components field:
[u'114B', u'Drottninggatan', u'Norrmalm', u'Stockholm', u'Stockholm', u'Stockholms l\xe4n', u'Sverige']
and the address_components_ascii field, after the second put():
[u'114b', u'drottninggatan', u'norrmalm', u'stockholm', u'stockholm', u'stockholms lan', u'sverige']

The real problem seemed to be the order that GAE calls _prepare_for_put() on the StructuredProperty relative to the call to _pre_put_hook() of the surrounding Model.
I was writing to address_components in the Item._pre_put_hook(). I assume GAE computed the ComputedProperty of the StructuredProperty before calling the _pre_put_hook() on Item. Reading from the ComputedProperty causes its value to be recalculated.
I added this to the end of the _pre_put_hook():
# quick-fix: ComputedProperty not getting generated properly
# read from all ComputedProperties, to compute them again before put
_ = self.address_components_ascii
I'm saving the return value to a dummy variable to avoid IDE warnings.

I just tried this code on dev server and its worked. Computed property is accessible before and after put.
from google.appengine.ext import ndb
class TestLocation(ndb.Model):
address = ndb.StringProperty(repeated=True)
address_ascii = ndb.ComputedProperty(lambda self: [
part.lower() for part in self.address], repeated=True)
class TestItem(ndb.Model):
location = ndb.StructuredProperty(TestLocation)
item = TestItem(id='test', location=TestLocation(
address=['Drottninggatan', 'Norrmalm']))
assert item.location.address_ascii == ['drottninggatan', 'norrmalm']
item.put()
assert TestItem.get_by_id('test').location.address_ascii == [
'drottninggatan', 'norrmalm']

This seems to be a limitation in ndb. Simply doing a put() followed by a get() and another put() worked. It's slower, but only required when creating an object the first time.
I added this method:
def double_put(self):
return self.put().get().put()
which is a drop-in replacement for put().
When I put() a new object I call MyObject.double_put() instead of MyObject.put().

Related

Initialization and changing ndb property in google-apps-engine

I created a datastore object accoring to the guestbook tutorial:
class myDS(ndb.Model):
a = ndb.StringProperty(indexed=True)
And I have an Handlers to access it and update is:
class Handler1:
my_ds = myDS()
my_ds.a = "abc" #Trying to update the value
class Handler2:
my_ds = myDS()
self.response.write(my_ds.a) #prints None although I changed the value in Handlers1
def main():
application = webapp.WSGIApplication([
('/set', Handler1),
('/get', Handler2])
I call:
Myapp.com/set
Myapp.com/get : Prints None (Didn't update to "abc")
Why wasn't the value of a updated?
How can I update across the handlers?
Cloud Datastore (GCD) stores data objects as entities, which may have one or more properties. In your case the property value type is a string 'abc'. However each entity is identified by a key which is a unique identifier within your app's datastore.
So in your case you would need to create a key for object my_ds and also define a model class. That could be Handler1 (e.g. class Handler1(ndb.Model): #your code) which defines the property you are trying to call.
Additionally, you cannot expect the value to be updated without using the put() function (e.g. my_ds.put() ). In order to use the second handler (Handler2) to create a new object and set the values of the properties you need to learn a bit more about using Webapp2 request handler.
I also suggest you follow this tutorial to get started.

Delete Data store entries synchronously in Google App Engine

I use python in GAP and try to delete one entries in datastore by using db.delete(model_obj). I suppose this operation is undertaken synchronously, since the document tell the difference between delete() and delete_async(), but when I read the source code in the db, the delete method just simply call the delete_async, which is not match what the document says :(
So is there any one to do delete in synchronous flow?
Here is the source code in db:
def delete_async(models, **kwargs):
"""Asynchronous version of delete one or more Model instances.
Identical to db.delete() except returns an asynchronous object. Call
get_result() on the return value to block on the call.
"""
if isinstance(models, (basestring, Model, Key)):
models = [models]
else:
try:
models = iter(models)
except TypeError:
models = [models]
keys = [_coerce_to_key(v) for v in models]
return datastore.DeleteAsync(keys, **kwargs)
def delete(models, **kwargs):
"""Delete one or more Model instances.
"""
delete_async(models, **kwargs).get_result()
EDIT: From a comment, this is the original misbehaving code:
def tearDown(self):
print self.account
db.delete(self.device)
db.delete(self.account)
print Account.get_by_email(self.email, case_sensitive=False)
The result for two print statement is <Account object at 0x10d1827d0> <Account object at 0x10d1825d0>. Even two memory addresses are different but they point to the same object. If I put some latency after the delete like for loop, the object fetched is None.
The code you show for delete calls delete_async, yes, but then it calls get_result on the returned asynchronous handle, which will block until the delete actually occurs. So, delete is synchronous.
The reason the sample code you show is returning an object is that you're probably running a query to fetch the account; I presume the email is not the db.Key of the account? Normal queries are not guaranteed to return updated results immediately. To avoid seeing stale data, you either need to use an ancestor query or look up the entity by key, both of which are strongly consistent.

I came across a tricky trouble about django Queryset

Tricky code:
user = User.objects.filter(id=123)
user[0].last_name = 'foo'
user[0].save() # Cannot be saved.
id(user[0]) # 32131
id(user[0]) # 44232 ( different )
user cannot be saved in this way.
Normal code:
user = User.objects.filter(id=123)
if user:
user[0].last_name = 'foo'
user[0].save() # Saved successfully.
id(user[0]) # 32131
id(user[0]) # 32131 ( same )
So, what is the problem?
In first variant your user queryset isn't evaluated yet. So every time you write user[0] ORM makes independent query to DB. In second variation queryset is evalutaed and acts like normal Python list.
And BTW if you want just one row, use get:
user = User.objects.get(id=123)
when you index into a queryset, django fetches the data (or looks in its cache) and creates a model instance for you. as you discovered with id(), each call creates a new instance. so while you can set the properties on these qs[0].last_name = 'foo', the subsequent call to qs[0].save() creates a new instance (with the original last_name) and saves that
i'm guessing your particular issue has to do with when django caches query results. when you are just indexing into the qs, nothing gets cached, but your call if users causes the entire (original) qs to be evaluated, and thus cached. so in that case each call to [0] retrieves the same model instance
Saving is possible, but everytime you access user[0], you actually get it from the database so it's unchanged.
Indeed, when you slice a Queryset, Django issues a SELECT ... FROM ... OFFSET ... LIMIT ... query to your database.
A Queryset is not a list, so if you want to it to behave like a list, you need to evaluate it, to do so, call list() on it.
user = list(User.objects.filter(id=123))
In your second example, calling if user will actually evaluate the queryset (get it from the database into your python program), so you then work with your Queryset's internal cache.
Alternatively, you can use u = user[0], edit that and then save, which will work.
Finally, you should actually be calling Queryset.get, not filter here, since you're using the unique key.

Keeping an Audit Trail of any/all Python Database objects in GAE

I'm new to Python. I'm trying to figure out how to emulate an existing application I've coded using PHP and MS-SQL, and re-create the basic back-end functionality on the Google Apps Engine.
One of the things I'm trying to do is emulate the current activity on certain tables I have in MS-SQL, which is an Insert/Delete/Update trigger which inserts a copy of the current (pre-change) record into an audit table, and stamps it with a date and time. I'm then able to query this audit table at a later date to examine the history of changes that the record went through.
I've found the following code here on stackoverflow:
class HistoryEventFieldLevel(db.Model):
# parent, you don't have to define this
date = db.DateProperty()
model = db.StringProperty()
property = db.StringProperty() # Name of changed property
action = db.StringProperty( choices=(['insert', 'update', 'delete']) )
old = db.StringProperty() # Old value for field, empty on insert
new = db.StringProperty() # New value for field, empty on delete
However, I'm unsure how this code can be applied to all objects in my new database.
Should I create get() and put() functions for each of my objects, and then in the put() function I create a child object of this class, and set its particular properties?
This is certainly possible, albeit somewhat tricky. Here's a few tips to get you started:
Overriding the class's put() method isn't sufficient, since entities can also be stored by calling db.put(), which won't call any methods on the class being written.
You can get around this by monkeypatching the SDK to call pre/post call hooks, as documented in my blog post here.
Alternately, you can do this at a lower level by implementing RPC hooks, documented in another blog post here.
Storing the audit record as a child entity of the modified entity is a good idea, and means you can do it transactionally, though that would require further, more difficult changes.
You don't need a record per field. Entities have a natural serialization format, Protocol Buffers, and you can simply store the entity as an encoded Protocol Buffer in the audit record. If you're operating at the model level, use model_to_protobuf to convert a model into a Protocol Buffer.
All of the above are far more easily applied to storing the record after it's modified, rather than before it was changed. This shouldn't be an issue, though - if you need the record before it was modified, you can just go back one entry in the audit log.
I am bit out of touch of GAE and also no sdk with me to test it out, so here is some guidelines to given you a hint what you may do.
Create a metaclass AuditMeta which you set in any models you want audited
AuditMeta while creating a new model class should copy Class with new name with "_audit" appended and should also copy the attribute too, which becomes a bit tricky on GAE as attributes are itself descriptors
Add a put method to each such class and on put create a audit object for that class and save it, that way for each row in tableA you will have history in tableA_audit
e.g. a plain python example (without GAE)
import new
class AuditedModel(object):
def put(self):
print "saving",self,self.date
audit = self._audit_class()
audit.date = self.date
print "saving audit",audit,audit.date
class AuditMeta(type):
def __new__(self, name, baseclasses, _dict):
# create model class, dervied from AuditedModel
klass = type.__new__(self, name, (AuditedModel,)+baseclasses, _dict)
# create a audit class, copy of klass
# we need to copy attributes properly instead of just passing like this
auditKlass = new.classobj(name+"_audit", baseclasses, _dict)
klass._audit_class = auditKlass
return klass
class MyModel(object):
__metaclass__ = AuditMeta
date = "XXX"
# create object
a = MyModel()
a.put()
output:
saving <__main__.MyModel object at 0x957aaec> XXX
saving audit <__main__.MyModel_audit object at 0x957ab8c> XXX
Read audit trail code , only 200 lines, to see how they do it for django

Using Property Builtin with GAE Datastore's Model

I want to make attributes of GAE Model properties. The reason is for cases like to turn the value into uppercase before storing it. For a plain Python class, I would do something like:
Foo(db.Model):
def get_attr(self):
return self.something
def set_attr(self, value):
self.something = value.upper() if value != None else None
attr = property(get_attr, set_attr)
However, GAE Datastore have their own concept of Property class, I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is and how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue of overriding get_value_for_datastore that I think of is it might not get called before the object was put into datastore. Hence getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
def get_value_for_datastore(self, model_instance):
vv = db.StringProperty.get_value_for_datastore(self, model_instance)
return vv.upper()
class MyModel(db.Model):
foo = MyStringProperty()
class MainHandler(webapp.RequestHandler):
def get(self):
my = MyModel(foo='Hello World')
k = my.put()
mm = MyModel.get(k)
s = mm.foo
self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.

Categories

Resources