How to trigger a method call every time a datastore entity attribute changes?
One way to do this that I looked into was monkeypatching db.Model.put, i.e. overriding the put method. While that allows me to react to every put(), it wasn't clear how I would detect whether the address attribute has changed, since self.address would already be set at the beginning of .put().
Elaboration:
I have users and each user has a physical address.
class User(db.Model):
    ...
    address = db.StringProperty()  # for example "2 Macquarie Street, Sydney"
    ...
I would like to verify that the entered addresses are correct. For this I have an expensive address checking function (it contacts a remote API) and a boolean field.
class User(db.Model):
    ...
    address = db.StringProperty()
    address_is_valid = db.BooleanProperty(default=False)

    def address_has_changed(self):
        self.address_is_valid = False
        Task(
            url = "/check_address",  # this would later set .address_is_valid
            params = {
                'user': self.key()
            }
        ).add()
    ...
But how can I get the address_has_changed method to trigger every time the address changes, without having to explicitly call it everywhere?
# It should work when changing an address
some_user = User.all().get()
some_user.address = "Santa Claus Main Post Office, FI-96930 Arctic Circle"
some_user.put()
# It should also work when multiple models are changed
...
db.put([some_user, another_user, yet_another_user])
# It should even work when creating a user
sherlock = User(address='221 B Baker St, London, England')
sherlock.put() # this should trigger address_has_changed
What about a Hook?
NDB offers a lightweight hooking mechanism. By defining a hook, an
application can run some code before or after some type of operations;
for example, a Model might run some function before every get().
from google.appengine.ext import ndb

class Friend(ndb.Model):
    name = ndb.StringProperty()

    def _pre_put_hook(self):
        # inform someone they have a new friend
        pass

    @classmethod
    def _post_delete_hook(cls, key, future):
        # inform someone they have lost a friend
        pass

f = Friend()
f.name = 'Carole King'
f.put()  # _pre_put_hook is called
fut = f.key.delete_async()  # _post_delete_hook not yet called
fut.get_result()  # _post_delete_hook is called
You could build in some further logic so that the original and new versions of the address are checked, and if they differ then run the expensive operation, otherwise just save.
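For instance, a minimal sketch of that idea (assuming an ndb version of the User model from the question; the extra self.key.get() costs one datastore read per put):

from google.appengine.ext import ndb

class User(ndb.Model):
    address = ndb.StringProperty()
    address_is_valid = ndb.BooleanProperty(default=False)

    def _pre_put_hook(self):
        # Re-read the stored version (one extra datastore get) and compare it
        # with what is about to be written.
        stored = self.key.get() if self.key else None
        if stored is None or stored.address != self.address:
            self.address_is_valid = False
            # enqueue the expensive address check here, e.g. a task queue task

Because the hook runs on every put, this also covers ndb.put_multi() and newly created entities, which were the cases listed in the question.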
Alright, this might be 2 years too late, but here it is anyway: you can always create instance variables in the __init__() method that store the old values, and then, when the put method is called, compare the current values against those old ones.
class User(db.Model):
    def __init__(self, *args, **kwargs):
        super(User, self).__init__(*args, **kwargs)
        self._old_address = self.address

    address = db.StringProperty()

    def put(self, **kwargs):
        if self._old_address != self.address:
            # ...
            # do your thing
            # ...
            pass
        super(User, self).put(**kwargs)
Use a Python property. This makes it easy to call address_has_changed whenever the attribute is actually changed.
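A minimal, hypothetical sketch of the property mechanism on a plain class (not a db.Model -- combining a Python property with db.StringProperty needs extra care, because model properties are themselves descriptors, and it still won't catch entities that are modified elsewhere and saved with db.put()):

class AddressHolder(object):
    def __init__(self, address=None):
        self._address = address
        self.address_is_valid = False

    @property
    def address(self):
        return self._address

    @address.setter
    def address(self, value):
        if value != self._address:
            self._address = value
            self.address_has_changed()

    def address_has_changed(self):
        self.address_is_valid = False
        # enqueue the expensive address check here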
Neither Nick's article that you refer to nor ndb hooks solve the problem of tracking explicit changes in entities; they just make it easier to solve.
You would normally call your address_has_changed method inside the pre-put hook rather than all over the code base wherever you call put().
I have code in place that uses these hook strategies to create audit trails of every change to a record.
However, your code doesn't actually detect a change to the address.
You should consider changing to ndb, then use a post-get hook (to squirrel away the original property values you wish to check, for instance in a session or request object), then use a pre-put hook to compare the current property value against the original, to see if you should take any action, and then call your address_has_changed method. You can use this strategy with db (by following Nick's article), but then you have to do a lot more heavy lifting yourself.
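A rough sketch of that strategy (names like _original_address are made up for illustration; the hooks stash the loaded value on the instance rather than in a session, but the idea is the same):

from google.appengine.ext import ndb

class User(ndb.Model):
    address = ndb.StringProperty()
    address_is_valid = ndb.BooleanProperty(default=False)

    @classmethod
    def _post_get_hook(cls, key, future):
        # Squirrel away the original value right after the entity is loaded.
        obj = future.get_result()
        if obj is not None:
            obj._original_address = obj.address

    def _pre_put_hook(self):
        original = getattr(self, '_original_address', None)
        if original != self.address:
            self.address_is_valid = False
            # run or enqueue the expensive address check here
        # keep the remembered value in sync for later puts of the same instance
        self._original_address = self.address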
Related
Let's imagine a simple Food model with a name and an expiration date, my goal is to auto delete the object after the expiration date is reached.
I want to delete objects from the database (PostgreSQL in my case) just after exp_date is reached, not filter by exp_date__gt=datetime.datetime.now() in my code and then run, via cron/celery, a script once in a while that filters by exp_date__lt=datetime.datetime.now() and deletes.
class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField()
* I could do it with a vanilla view when the object is accessed via an endpoint, or even with DRF like so:
import datetime

from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

class GetFood(APIView):
    def check_date(self, food):
        """
        checking expiration date
        """
        if food.exp_date <= datetime.datetime.now():
            food.delete()
            return False
        return True

    def get(self, request, *args, **kwargs):
        id = self.kwargs["id"]
        if Food.objects.filter(pk=id).exists():
            food = Food.objects.get(pk=id)
            if self.check_date(food) == False:
                return Response({"error": "not found"}, status.HTTP_404_NOT_FOUND)
            else:
                name = food.name
                return Response({"food": name}, status.HTTP_200_OK)
        else:
            return Response({"error": "not found"}, status.HTTP_404_NOT_FOUND)
But it would not delete the object if no one tries to access it via an endpoint.
* I could also set up a cronjob with a script that queries the database for every Food object whose expiration date is earlier than now and then deletes them, or even set up Celery. It would indeed just need to run once a day if I were using DateField, but as I am using DateTimeField it would need to run every minute (every second for the needs of my project).
* I've also thought of a fancy workaround with a post_save signal and a while loop, like:
@receiver(post_save, sender=Food)
def delete_after_exp_date(sender, instance, created, **kwargs):
    if created:
        while instance.exp_date > datetime.datetime.now():
            pass
        else:
            instance.delete()
I don't know if it'd work but it seems very inefficient (if someone could please confirm)
Voila, thanks in advance if you know some ways or some tools to achieve what I want to do, and thanks for reading!
I would advise not to delete the objects, or at least not right at the moment they expire. Scheduling tasks is cumbersome. Even if you manage to schedule this, the time when you remove the items will always be slightly off from the time you scheduled. It also means you will make an extra query per element and not remove the items in bulk. Furthermore, scheduling is inherently more complicated: it means you need something to persist the schedule. If the expiration date of some food is later changed, it will require extra logic to "cancel" the current schedule and create a new one. It also makes the system less "reliable": besides the webserver, the scheduler daemon has to run. It can happen that for some reason the daemon fails, and then you will no longer retrieve only food that has not expired.
Therefore it might be better to combine filtering the records, such that you only retrieve food that has not expired, with removing, at some regular interval, the Food that has expired. You can easily filter the objects with:
from django.db.models.functions import Now
Food.objects.filter(exp_date__gt=Now())
to retrieve Food that is not expired. To make it more efficient, you can add a database index on the exp_date field:
class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField(db_index=True)
If you need to filter often, you can even work with a Manager [Django-doc]:
from django.db.models.functions import Now

class FoodManager(models.Manager):
    def get_queryset(*args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(
            exp_date__gt=Now()
        )

class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField(db_index=True)

    objects = FoodManager()
Now if you work with Food.objects you automatically filter out all Food that is expired.
Besides that you can make a script that for example runs daily to remove the Food objects that have expired:
from django.db.models.functions import Now

Food._base_manager.filter(exp_date__lte=Now()).delete()
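One way to run that snippet on a schedule is to wrap it in a management command and call it from cron (or Celery beat); the module path and app name below are assumptions:

# food/management/commands/remove_expired_food.py  (hypothetical location)
from django.core.management.base import BaseCommand
from django.db.models.functions import Now

from food.models import Food  # assumed app label


class Command(BaseCommand):
    help = 'Remove Food objects whose expiration date has passed.'

    def handle(self, *args, **options):
        # _base_manager bypasses FoodManager, so expired rows are still visible here.
        deleted, _ = Food._base_manager.filter(exp_date__lte=Now()).delete()
        self.stdout.write('Removed %d expired Food objects' % deleted)

Then python manage.py remove_expired_food can be scheduled as often as you need.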
Update to the accepted answer. You may run into "RuntimeError: super(): no arguments" if you define the method outside the class (or without an explicit self parameter). I found this answer helpful.
As Per PEP 3135, which introduced "new super":
The new syntax:
super()
is equivalent to:
super(__class__, <firstarg>)
where __class__ is the class that the method was defined in, and <firstarg> is the first parameter of the method (normally self for instance methods, and cls for class methods).
While super is not a reserved word, the parser recognizes the use of super in a method definition and only passes in the class cell when this is found. Thus, calling a global alias of super without arguments will not necessarily work.
As such, you will still need to include self:
class FoodManager(models.Manager):
    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(
            exp_date__gt=Now()
        )
Just something to keep in mind.
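A small standalone illustration of why the bare *args signature trips up zero-argument super() (no Django involved; class names are made up):

class Base(object):
    def greet(self):
        return "hello"

class Broken(Base):
    def greet(*args, **kwargs):          # no named first parameter
        return super().greet()           # RuntimeError: super(): no arguments

class Working(Base):
    def greet(self, *args, **kwargs):    # explicit self
        return super().greet()

print(Working().greet())  # "hello"
print(Broken().greet())   # raises RuntimeError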
I am in the process of migrating from db.Model to ndb.Model. The only issue I have to solve before finishing this migration is that there is no Model.is_saved method. I have used db.Model.is_saved in my application to determine whether sharded counters must be updated on put/delete, to check for conflicting keys when creating entities, etc.
The documentation says that ndb.Model has no equivalent of the is_saved method. I can reimplement some use cases with get_or_insert instead of is_saved, but not all of them.
As a dirty hack I can set a flag like _in_memory_instance for every instance I have created by calling the constructor. But it does not solve my issue: I would still have to update this flag at least after every put() call.
The question is: is there a better way to determine whether a model is persisted in the datastore, without an extra datastore hit?
Edit 1: Forgot to mention: all the entities have keys, so checking Model._has_complete_key() does not work for me.
Edit 2: After this discussion https://groups.google.com/d/topic/google-appengine/Tm8NDWIvc70/discussion it seems the only way to solve my issue is to use _post_get_hook/_post_put_hook. I wonder why such a trivial thing was not included in the official API.
Edit 3: I ended up with the following base class for all my models. Now I can leave my codebase (almost) untouched:
class BaseModel(ndb.Model):
    @classmethod
    def _post_get_hook(cls, key, future):
        self = future.get_result()
        if self:
            self._is_saved = bool(key)

    def _post_put_hook(self, future):
        self._is_saved = future.state == future.FINISHING

    def is_saved(self):
        if self._has_complete_key():
            return getattr(self, "_is_saved", False)
        return False
To get the same kind of state in NDB you would need a combination of
post-get-hook and post-put-hook to set a flag. Here's a working
example:
class Employee(ndb.Model):
    <properties here>

    saved = False  # class variable provides default value

    @classmethod
    def _post_get_hook(cls, key, future):
        obj = future.get_result()
        if obj is not None:
            # test needed because post_get_hook is called even if get() fails!
            obj.saved = True

    def _post_put_hook(self, future):
        self.saved = True
There's no need to check for the status of the future -- when either
hook is called, the future always has a result. This is because the
hook is actually a callback on the future. However there is a need to
check if its result is None!
PS: Inside a transaction, the hooks get called as soon as the put() call returns; success or failure of the transaction doesn't affect them. See https://developers.google.com/appengine/docs/python/ndb/contextclass#Context_call_on_commit for a way to run a hook after a successful commit.
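As a rough sketch of that last point (assuming the legacy ndb Context API), the flag from the example above could be deferred until the transaction actually commits:

from google.appengine.ext import ndb

class Employee(ndb.Model):
    saved = False  # class variable provides default value

    def _post_put_hook(self, future):
        def mark_saved():
            self.saved = True
        # call_on_commit runs the callback after a successful commit;
        # outside a transaction it runs immediately.
        ndb.get_context().call_on_commit(mark_saved)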
Based on @Tim Hoffman's idea you can use a post hook like so:
class Article(ndb.Model):
    title = ndb.StringProperty()

    is_saved = False

    def _post_put_hook(self, f):
        if f.state == f.FINISHING:
            self.is_saved = True
        else:
            self.is_saved = False

article = Article()
print article.is_saved  ## False
article.put()
print article.is_saved  ## True
I can't guarantee that it's persisted in the datastore. Didn't find anything about it on google :)
On a side note, checking whether an ndb.Model instance has a key probably won't work, since a new instance seems to get a Key before it's ever sent to the datastore. You can look at the source code to see what happens when you create an instance of the ndb.Model class.
If you do not specify a key while creating an instance of the Model, you can use the following implementation of is_saved() to know whether the object has been written to the datastore at least once (this should be appropriate if you are migrating from google.appengine.ext.db to google.appengine.ext.ndb).
Using the example given by #fredrik,
class Article(ndb.Model):
    title = ndb.StringProperty()

    def is_saved(self):
        if self.key:
            return True
        return False
P.S. - I do not know if this would work with google.cloud.ndb
I'm new to Python. I'm trying to figure out how to emulate an existing application I've coded using PHP and MS-SQL, and re-create the basic back-end functionality on Google App Engine.
One of the things I'm trying to do is emulate the current activity on certain tables I have in MS-SQL, which is an Insert/Delete/Update trigger which inserts a copy of the current (pre-change) record into an audit table, and stamps it with a date and time. I'm then able to query this audit table at a later date to examine the history of changes that the record went through.
I've found the following code here on stackoverflow:
class HistoryEventFieldLevel(db.Model):
    # parent, you don't have to define this
    date = db.DateProperty()
    model = db.StringProperty()
    property = db.StringProperty()  # Name of changed property
    action = db.StringProperty(choices=(['insert', 'update', 'delete']))
    old = db.StringProperty()  # Old value for field, empty on insert
    new = db.StringProperty()  # New value for field, empty on delete
However, I'm unsure how this code can be applied to all objects in my new database.
Should I create get() and put() functions for each of my objects, and then in the put() function I create a child object of this class, and set its particular properties?
This is certainly possible, albeit somewhat tricky. Here are a few tips to get you started:
Overriding the class's put() method isn't sufficient, since entities can also be stored by calling db.put(), which won't call any methods on the class being written.
You can get around this by monkeypatching the SDK to call pre/post call hooks, as documented in my blog post here.
Alternately, you can do this at a lower level by implementing RPC hooks, documented in another blog post here.
Storing the audit record as a child entity of the modified entity is a good idea, and means you can do it transactionally, though that would require further, more difficult changes.
You don't need a record per field. Entities have a natural serialization format, Protocol Buffers, and you can simply store the entity as an encoded Protocol Buffer in the audit record. If you're operating at the model level, use model_to_protobuf to convert a model into a Protocol Buffer.
All of the above are far more easily applied to storing the record after it's modified, rather than before it was changed. This shouldn't be an issue, though - if you need the record before it was modified, you can just go back one entry in the audit log.
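As a rough sketch combining the child-entity and Protocol Buffer tips (AuditRecord and write_audit_record are made-up names, and you would call the helper from whichever hook mechanism you choose):

from google.appengine.ext import db

class AuditRecord(db.Model):
    # Stored as a child entity of the record it describes.
    created = db.DateTimeProperty(auto_now_add=True)
    action = db.StringProperty(choices=('insert', 'update', 'delete'))
    snapshot = db.BlobProperty()  # encoded Protocol Buffer of the audited entity

def write_audit_record(entity, action):
    AuditRecord(
        parent=entity,  # child of the modified entity
        action=action,
        snapshot=db.model_to_protobuf(entity).Encode(),
    ).put()

# Later, to inspect a snapshot:
#   from google.appengine.datastore import entity_pb
#   original = db.model_from_protobuf(entity_pb.EntityProto(record.snapshot))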
I am a bit out of touch with GAE and have no SDK with me to test it out, so here are some guidelines to give you a hint of what you could do.
Create a metaclass AuditMeta which you set on any models you want audited.
AuditMeta, while creating a new model class, should also create a copy of that class with "_audit" appended to its name and copy the attributes too, which becomes a bit tricky on GAE as the attributes are themselves descriptors.
Add a put method to each such class, and on put create an audit object for that class and save it; that way, for each row in tableA you will have a history in tableA_audit.
e.g. a plain python example (without GAE)
import new

class AuditedModel(object):
    def put(self):
        print "saving", self, self.date
        audit = self._audit_class()
        audit.date = self.date
        print "saving audit", audit, audit.date

class AuditMeta(type):
    def __new__(self, name, baseclasses, _dict):
        # create model class, derived from AuditedModel
        klass = type.__new__(self, name, (AuditedModel,) + baseclasses, _dict)
        # create an audit class, a copy of klass
        # we need to copy attributes properly instead of just passing them like this
        auditKlass = new.classobj(name + "_audit", baseclasses, _dict)
        klass._audit_class = auditKlass
        return klass

class MyModel(object):
    __metaclass__ = AuditMeta
    date = "XXX"

# create object
a = MyModel()
a.put()
output:
saving <__main__.MyModel object at 0x957aaec> XXX
saving audit <__main__.MyModel_audit object at 0x957ab8c> XXX
Read the audit trail code, only 200 lines, to see how they do it for Django.
I'm working on a website where I sell products (one class Sale, one class Product). Whenever I sell a product, I want to save that action in a History table and I have decided to use the observer pattern to do this.
That is: my class Sales is the subject and the History class is the observer, whenever I call the save_sale() method of the Sales class I will notify the observers. (I've decided to use this pattern because later I'll also send an email, notify the admin, etc.)
This is my subject class (the Sales class extends from this)
class Subject:
    _observers = []

    def attach(self, observer):
        if not observer in self._observers:
            self._observers.append(observer)

    def detach(self, observer):
        try:
            self._observers.remove(observer)
        except ValueError:
            pass

    def notify(self, **kargs):
        for observer in self._observers:
            observer.update(self, **kargs)
In the view I do something like this:
sale = Sale()
sale.user = request.user
sale.product = product
h = History() #here I create the observer
sale.attach(h) #here I add the observer to the subject class
sale.save_sale() #inside this class I will call the notify() method
This is the update method on History
def update(self, subject, **kargs):
    self.action = "sale"
    self.username = subject.user.username
    self.total = subject.product.total
    self.save(force_insert=True)
It works fine the first time, but when I try to make another sale, I get an error saying I can't insert into History because of a primary key constraint.
My guess is that when I call the view the second time, the first observer is still in the Subject class, and now I have two history observers listening to the Sales, but I'm not sure if that's the problem (gosh I miss the print_r from php).
What am I doing wrong? When do I have to "attach" the observer? Or is there a better way of doing this?
BTW: I'm using Django 1.1 and I don't have access to install any plugins.
This may not be an acceptable answer since it's more architecture related, but have you considered using signals to notify the system of the change? It seems that you are trying to do exactly what signals were designed to do. Django signals have the same end-result functionality as Observer patterns.
http://docs.djangoproject.com/en/1.1/topics/signals/
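For example, a sketch with Django 1.1-style signal wiring, assuming History is an ordinary model with the action, username, and total fields used in the update() method above, and that save_sale() ultimately calls the model's save():

from django.db.models.signals import post_save

def log_sale(sender, instance, created, **kwargs):
    # Runs after every Sale.save(); `created` is True only for new rows.
    if created:
        History(
            action="sale",
            username=instance.user.username,
            total=instance.product.total,
        ).save(force_insert=True)

post_save.connect(log_sale, sender=Sale)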
I think this is because _observers = [] acts like a static shared field, so every instance of Subject changes the same _observers list, which has unwanted side effects.
Initialize this variable in the constructor:
class Subject:
    def __init__(self):
        self._observers = []
@Andrew Sledge's answer indicates a good way of tackling this problem. I would like to suggest an alternate approach.
I had a similar problem and started out using signals. They worked well, but I found that my unit tests became slower because the signals were called each time I loaded an instance of the associated class using a fixture. This added tens of seconds to the test run. There is a workaround, but I found it clumsy: I defined a custom test runner and disconnected my functions from the signals before loading fixtures, then reconnected them afterwards.
Finally I decided to ditch signals altogether and overrode the appropriate save() methods of models instead. In my case, whenever an Order is changed, a row is automatically created in an OrderHistory table, among other things. To do this I added a function that creates an instance of OrderHistory and called it from within the Order.save() method. This also made it possible to test save() and the function separately (a rough sketch follows below).
Take a look at this SO question. It has a discussion about when to override save() versus when to use signals.
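A rough sketch of that save() override (Order, OrderHistory, and the field names are stand-ins for the actual models; the helper can then be unit-tested on its own):

class Order(models.Model):
    status = models.CharField(max_length=50)

    def save(self, *args, **kwargs):
        super(Order, self).save(*args, **kwargs)
        self._write_history()

    def _write_history(self):
        # Separate helper so it can be tested independently of save().
        OrderHistory.objects.create(order=self, status=self.status)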
Thank you all for your answers; reading about signals gave me another perspective, but I don't want to use them, for learning purposes (I wanted to use the observer pattern in web development :P). In the end, I solved it by doing something like this:
class Sales(models.Model, Subject):
    ...
    def __init__(self, *args, **kwargs):
        super(Sales, self).__init__(*args, **kwargs)
        self._observers = []       # reset observers for this instance
        self.attach(History())     # attach a History observer
    ...
    def save(self, *args, **kwargs):
        super(Sales, self).save(*args, **kwargs)
        self.notify()              # notify all observers
Now every time I call save(), the observers will be notified, and if I need to, I can add or remove an observer.
What do you think? Is this a good way to solve it?
Does anyone know a clever way, in Google App Engine, to return a wrapped Model instance that only exposes a few of the original properties, and does not allow saving the instance back to the datastore?
I'm not looking for ways of actually enforcing these rules, obviously it'll still be possible to change the instance by digging through its __dict__ etc. I just want a way to avoid accidental exposure/changing of data.
My initial thought was to do this (I want to do this for a public version of a User model):
class PublicUser(db.Model):
    display_name = db.StringProperty()

    @classmethod
    def kind(cls):
        return 'User'

    def put(self):
        raise SomeError()
Unfortunately, GAE maps the kind to a class early on, so if I do PublicUser.get_by_id(1) I will actually get a User instance back, not a PublicUser instance.
Also, the idea is that it should at least appear to be a Model instance so that I can pass it around to code that does not know about the fact that it is a "dumbed-down" version. Ultimately I want to do this so that I can use my generic data exposure functions on the read-only version, so that they only expose public information about the user.
Update
I went with icio's solution. Here's the code I wrote for copying the properties from the User instance over to a PublicUser instance:
class User(db.Model):
    # ...
    # code
    # ...

    def as_public(self):
        """Returns a PublicUser version of this object.
        """
        props = self.properties()
        pu = PublicUser()
        for prop in pu.properties().values():
            # Only copy properties that exist for both the PublicUser model and
            # the User model.
            if prop.name in props:
                # This line of code sets the property of the PublicUser
                # instance to the value of the same property on the User
                # instance.
                prop.__set__(pu, props[prop.name].__get__(self, type(self)))
        return pu
Please comment if this isn't a good way of doing it.
Could you not create a method within your User class which instantiates a ReadOnlyUser object and copies the values of member variables over as appropriate? Your call would be something like User.get_by_id(1).readonly() with the readonly method defined in the following form:
class User(db.Model):
    def readonly(self):
        return ReadOnlyUser(self.name, self.id)
Or you could perhaps have your User class extend another class with methods to do this automatically based on some static vars listing properties to copy over, or something.
P.S. I don't code in Python
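Building on that last suggestion, a rough sketch of such a helper base class (all names here -- PublicCopyMixin, public_properties, public_copy -- are made up, and it reuses the same property-copying trick as the as_public() code in the Update above):

class PublicCopyMixin(object):
    public_properties = ()  # subclasses list the property names safe to expose

    def public_copy(self, target_cls):
        copy = target_cls()
        source_props = self.properties()
        target_props = target_cls.properties()
        for name in self.public_properties:
            # Only copy properties that both models define.
            if name in source_props and name in target_props:
                value = source_props[name].__get__(self, type(self))
                target_props[name].__set__(copy, value)
        return copy

class User(PublicCopyMixin, db.Model):
    display_name = db.StringProperty()
    email = db.StringProperty()

    public_properties = ('display_name',)

# usage: public_user = User.get_by_id(1).public_copy(PublicUser)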