I'm new to Python. I'm trying to figure out how to port an existing application I've coded in PHP and MS-SQL, and re-create the basic back-end functionality on Google App Engine.
One of the things I'm trying to emulate is the current behaviour of certain tables I have in MS-SQL: an Insert/Delete/Update trigger that inserts a copy of the current (pre-change) record into an audit table and stamps it with a date and time. I can then query this audit table later to examine a record's history of changes.
I've found the following code here on Stack Overflow:
class HistoryEventFieldLevel(db.Model):
    # parent, you don't have to define this
    date = db.DateProperty()
    model = db.StringProperty()
    property = db.StringProperty()  # name of the changed property
    action = db.StringProperty(choices=['insert', 'update', 'delete'])
    old = db.StringProperty()  # old value for the field, empty on insert
    new = db.StringProperty()  # new value for the field, empty on delete
However, I'm unsure how this code can be applied to all objects in my new database.
Should I create get() and put() functions for each of my objects, and then in the put() function I create a child object of this class, and set its particular properties?
This is certainly possible, albeit somewhat tricky. Here are a few tips to get you started:
Overriding the class's put() method isn't sufficient, since entities can also be stored by calling db.put(), which won't call any methods on the class being written.
You can get around this by monkeypatching the SDK to call pre/post call hooks, as documented in my blog post here.
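A rough sketch of the general shape of such a monkeypatch (this is not the post's code, and the before_put hook name is made up for illustration):
from google.appengine.ext import db

# keep references to the SDK's original functions
_orig_model_put = db.Model.put
_orig_db_put = db.put

def _call_before_put(models):
    # `before_put` is a hypothetical per-model hook; define it on any
    # model that wants to be notified before it is stored
    for m in models if isinstance(models, (list, tuple)) else [models]:
        hook = getattr(m, 'before_put', None)
        if callable(hook):
            hook()

def _patched_model_put(self, **kwargs):
    _call_before_put(self)
    return _orig_model_put(self, **kwargs)

def _patched_db_put(models, **kwargs):
    _call_before_put(models)
    return _orig_db_put(models, **kwargs)

# both entity.put() and db.put(...) now run the hooks
db.Model.put = _patched_model_put
db.put = _patched_db_put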
Alternatively, you can do this at a lower level by implementing RPC hooks, documented in another blog post here.
Storing the audit record as a child entity of the modified entity is a good idea, and means you can do it transactionally, though that would require further, more difficult changes.
You don't need a record per field. Entities have a natural serialization format, Protocol Buffers, so you can simply store the entity as an encoded Protocol Buffer in the audit record. If you're operating at the model level, use model_to_protobuf to convert a model into a Protocol Buffer.
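For instance (a sketch, not code from the answer; the HistoryEvent model and its fields are illustrative):
from google.appengine.ext import db
from google.appengine.datastore import entity_pb

class HistoryEvent(db.Model):
    # illustrative audit record: one serialized snapshot per change
    date = db.DateTimeProperty(auto_now_add=True)
    action = db.StringProperty(choices=['insert', 'update', 'delete'])
    data = db.BlobProperty()  # the encoded Protocol Buffer

def write_audit_record(entity, action):
    # make the audit record a child of the audited entity, so both
    # live in the same entity group and can share a transaction
    pb = db.model_to_protobuf(entity)
    HistoryEvent(parent=entity, action=action, data=pb.Encode()).put()

def read_audit_record(event):
    # decode the snapshot back into a full model instance
    return db.model_from_protobuf(entity_pb.EntityProto(event.data))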
All of the above are far more easily applied to storing the record after it's modified, rather than before. This shouldn't be an issue, though: if you need the record as it was before a given change, you can just go back one entry in the audit log.
I'm a bit out of touch with GAE and don't have the SDK at hand to test this, so here are some guidelines to give you a hint of what you could do.
Create a metaclass AuditMeta which you set on any model you want audited.
While creating a new model class, AuditMeta should copy the class under a new name with "_audit" appended, and should copy the attributes too, which becomes a bit tricky on GAE because the attributes are themselves descriptors.
Add a put method to each such class; on put, create an audit object for that class and save it. That way, for each row in tableA you will have a history in tableA_audit.
E.g. a plain Python example (without GAE):
import new

class AuditedModel(object):
    def put(self):
        print "saving", self, self.date
        audit = self._audit_class()
        audit.date = self.date
        print "saving audit", audit, audit.date

class AuditMeta(type):
    def __new__(self, name, baseclasses, _dict):
        # create the model class, derived from AuditedModel
        klass = type.__new__(self, name, (AuditedModel,) + baseclasses, _dict)
        # create an audit class, a copy of klass
        # (we should copy the attributes properly instead of just
        # passing _dict along like this)
        auditKlass = new.classobj(name + "_audit", baseclasses, _dict)
        klass._audit_class = auditKlass
        return klass

class MyModel(object):
    __metaclass__ = AuditMeta

    date = "XXX"

# create an object and save it
a = MyModel()
a.put()
output:
saving <__main__.MyModel object at 0x957aaec> XXX
saving audit <__main__.MyModel_audit object at 0x957ab8c> XXX
Read the audit trail code (only 200 lines) to see how they do it for Django.
I have a foreign key relationship in my Django (v3) models:
class Example(models.Model):
    title = models.CharField(max_length=200)  # this is irrelevant for the question here
    not_before = models.DateTimeField(auto_now_add=True)
    ...

class ExampleItem(models.Model):
    myParent = models.ForeignKey(Example, on_delete=models.CASCADE)
    execution_date = models.DateTimeField(auto_now_add=True)
    ....
Can I have code running/triggered whenever an ExampleItem is "added to the list of items in an Example instance"? What I would like to do is run some checks and, depending on the concrete Example instance possibly alter the ExampleItem before saving it.
To illustrate:
Let's say the Example's not_before date dictates that the ExampleItem's execution_date must not be earlier. I would like to check whether the to-be-saved ExampleItem's execution_date violates this condition; if so, I would want to either change the execution_date to make it valid or throw an exception (whichever is easier). The same goes for a duplicate execution_date (i.e. if the respective Example already has an ExampleItem with the same execution_date).
So, in a view, I have code like the following:
def doit(request, example_id):
    # get the relevant `Example` object
    example = get_object_or_404(Example, pk=example_id)
    # create a new `ExampleItem`
    itm = ExampleItem()
    # set the item's parent
    itm.myParent = example  # <- this should trigger my validation code!
    itm.save()              # <- (or this???)
The thing is, this view is not the only way to create new ExampleItems; I also have an API, for example, that can do the same (let alone that a user could potentially add ExampleItems manually via the REPL). Preferably, the validation code should not have to be duplicated in every place where new ExampleItems can be created.
I was looking into signals (Django docs), specifically pre_save and post_save (of ExampleItem), but I think pre_save is too early while post_save is too late... Also, m2m_changed looks interesting, but I do not have a many-to-many relationship.
What would be the best/correct way to handle these requirements? They seem to be rather common, I imagine. Do I have to restructure my model?
The obvious solution here is to put this code in the ExampleItem.save() method - just beware that Model.save() is not invoked by some queryset bulk operations.
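A minimal sketch of that override, assuming the models above (note that with auto_now_add=True, execution_date is only populated during the first save, so the checks below are skipped while it is still None):
from django.core.exceptions import ValidationError
from django.db import models

class ExampleItem(models.Model):
    myParent = models.ForeignKey(Example, on_delete=models.CASCADE)
    execution_date = models.DateTimeField(auto_now_add=True)

    def save(self, *args, **kwargs):
        if self.execution_date is not None:
            # the parent's not_before dictates the earliest allowed date
            if self.execution_date < self.myParent.not_before:
                raise ValidationError("execution_date is before the parent's not_before")
            # reject duplicates among the parent's items
            duplicates = self.myParent.exampleitem_set.filter(
                execution_date=self.execution_date).exclude(pk=self.pk)
            if duplicates.exists():
                raise ValidationError("duplicate execution_date for this Example")
        super().save(*args, **kwargs)
Both the view and the API then run the same checks, since they both end up calling save().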
Using signal handlers on your own app's models is actually an antipattern; the goal of signals is to allow your app to hook into other apps' lifecycles without having to change those other apps' code.
Also (unrelated, but useful): you can populate your newly created model instances directly via their initializers, i.e.:
itm = ExampleItem(myParent=example)
itm.save()
and you can even create and save them in one step:
# creates a new instance, populates it AND saves it
itm = ExampleItem.objects.create(myParent=example)
This will still invoke your model's save method, so it's safe for your use case.
How to trigger a method call every time a datastore entity attribute changes?
One way to do this that I looked into was monkeypatching db.Model.put. That involved overriding the put method. While that allows me to react to every put(), it wasn't clear how I would detect whether the address attribute had changed, since self.address would already be set by the time .put() starts.
Elaboration:
I have users and each user has a physical address.
class User(db.Model):
    ...
    address = db.StringProperty()  # for example "2 Macquarie Street, Sydney"
    ...
I would like to verify that the entered addresses are correct. For this I have an expensive address checking function (it contacts a remote API) and a boolean field.
class User(db.Model):
    ...
    address = db.StringProperty()
    address_is_valid = db.BooleanProperty(default=False)

    def address_has_changed(self):
        self.address_is_valid = False
        Task(
            url='/check_address',  # this would later set .address_is_valid
            params={'user': self.key()}
        ).add()
    ...
But how can I get the address_has_changed method to trigger every time the address changes, without having to explicitly call it everywhere?
# It should work when changing an address
some_user = User.all().get()
some_user.address = "Santa Claus Main Post Office, FI-96930 Arctic Circle"
some_user.put()
# It should also work when multiple models are changed
...
db.put([some_user, another_user, yet_another_user])
# It should even work when creating a user
sherlock = User(address='221 B Baker St, London, England')
sherlock.put() # this should trigger address_has_changed
What about a Hook?
NDB offers a lightweight hooking mechanism. By defining a hook, an
application can run some code before or after some type of operations;
for example, a Model might run some function before every get().
from google.appengine.ext import ndb

class Friend(ndb.Model):
    name = ndb.StringProperty()

    def _pre_put_hook(self):
        # inform someone they have a new friend
        pass

    @classmethod
    def _post_delete_hook(cls, key, future):
        # inform someone they have lost a friend
        pass

f = Friend()
f.name = 'Carole King'
f.put()                     # _pre_put_hook is called
fut = f.key.delete_async()  # _post_delete_hook not yet called
fut.get_result()            # _post_delete_hook is called
You could build in some further logic to compare the original and new versions of the address, run the expensive operation only if they differ, and otherwise just save.
Alright, this might be two years too late, but here it is anyway: you can always create instance variables in the __init__() method that store the old values, and compare against them when the put method is called.
class User(db.Model):
    address = db.StringProperty()

    def __init__(self, *args, **kwargs):
        super(User, self).__init__(*args, **kwargs)
        self._old_address = self.address

    def put(self, **kwargs):
        if self._old_address != self.address:
            # ...
            # do your thing
            # ...
            pass
        return super(User, self).put(**kwargs)
Use a Python property. That makes it easy to call address_has_changed whenever the address actually changes.
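One way that could look (a sketch; address_value is an illustrative name for the property that actually gets stored):
class User(db.Model):
    address_value = db.StringProperty()  # the property actually stored
    address_is_valid = db.BooleanProperty(default=False)

    def address_has_changed(self):
        self.address_is_valid = False
        # enqueue the expensive /check_address task here, as above

    def _get_address(self):
        return self.address_value

    def _set_address(self, value):
        if value != self.address_value:
            self.address_value = value
            self.address_has_changed()

    address = property(_get_address, _set_address)
The catch, as the final answer below explains, is that queries must filter on address_value; address itself is not a datastore property.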
Neither Nick's article that you refer to nor the ndb hooks solve the problem of tracking explicit changes to entities; they just make it easier to solve.
You would normally call your address_has_changed method inside the pre-put hook rather than all over the code base wherever you call put().
I have code in place that uses these hook strategies to create audit trails of every change to a record.
However, your code doesn't actually detect a change to the address.
You should consider changing to ndb, then using a post-get hook (to squirrel away the original property values you wish to check, for instance in a session or request object) and a pre-put hook to compare the current property values to the originals, to decide whether to take any action and call your address_has_changed method. You can follow this strategy with db too (by following Nick's article), but then you have to do a lot more heavy lifting yourself.
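A sketch of that strategy, stashing the original value on the instance itself (note that _post_get_hook only fires for key.get() calls, not for query results):
from google.appengine.ext import ndb

class User(ndb.Model):
    address = ndb.StringProperty()
    address_is_valid = ndb.BooleanProperty(default=False)

    @classmethod
    def _post_get_hook(cls, key, future):
        # squirrel away the original value; ndb allows extra instance
        # attributes as long as their names start with an underscore
        entity = future.get_result()
        if entity is not None:
            entity._original_address = entity.address

    def _pre_put_hook(self):
        original = getattr(self, '_original_address', None)
        if self.address != original:
            self.address_has_changed()

    def address_has_changed(self):
        self.address_is_valid = False
        # enqueue the expensive validation task here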
I have the following code in models.py:
class Order(ndb.Model):
    created_at = ndb.DateTimeProperty(auto_now_add=True)
    updated_at = ndb.DateTimeProperty(auto_now=True)
    name = ndb.StringProperty()
    due_dates = ndb.DateProperty(repeated=True)

class Task(ndb.Model):
    created_at = ndb.DateTimeProperty(auto_now_add=True)
    updated_at = ndb.DateTimeProperty(auto_now=True)
    order = ndb.KeyProperty(required=True)
    order_updated_at = ndb.DateTimeProperty(required=True)
    ...
When an order is created, 6 tasks will be created. Currently, I have the following method:
def _post_put_hook(self, future):
    # delete the old tasks
    tbd = Task.query(Task.order == self.key).fetch(keys_only=True)
    ndb.delete_multi(tbd)
    # generate new tasks
    for entry in self.entries:
        pt = entry.producetype.get()
        # now create Tasks and store them in the datastore
        Task(order=self.key,
             order_updated_at=self.updated_at,
             order_entry_serial=entry.serial,
             date=dt_sowing,
             action=TaskAction.SOWING).put()
Now I am changing the way Order and Task are created.
I want to create Tasks when an Order is created, AND I want to delete the Tasks of an Order when the order is modified.
Unfortunately, ndb's API states:
The Datastore API does not distinguish between creating a new entity
and updating an existing one. If the object's key represents an entity
that already exists, the put() method overwrites the existing entity.
You can use a transaction to test whether an entity with a given key
exists before creating one. See also the Model.get_or_insert() method.
I don't really understand how Model.get_or_insert can be applied in my scenario.
Note that I can't use _pre_put_hook because my Tasks need to reference their Order via its key.
Ignore get_or_insert(); it returns an entity in any case and doesn't help you. You need to check whether the tasks already exist in the datastore. I would wrap a get() or get_multi() call in a try/except: if the entities exist, delete them; otherwise create six new Task entities with put_multi().
Edit: you need timestamps to check for pre-existence. Look at DateTimeProperty and its auto_now_add/auto_now options.
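A sketch of that check inside the hook (assuming, for brevity, six identical Tasks; the real code would derive them from the order's entries):
class Order(ndb.Model):
    # ... properties as above ...

    def _post_put_hook(self, future):
        existing = Task.query(Task.order == self.key).fetch(keys_only=True)
        if existing:
            # tasks already present: this put() modified an existing Order
            ndb.delete_multi(existing)
        else:
            # no tasks yet: this put() created the Order, so generate them
            new_tasks = [Task(order=self.key, order_updated_at=self.updated_at)
                         for _ in range(6)]
            ndb.put_multi(new_tasks)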
Does anyone know a clever way, in Google App Engine, to return a wrapped Model instance that only exposes a few of the original properties, and does not allow saving the instance back to the datastore?
I'm not looking for ways of actually enforcing these rules, obviously it'll still be possible to change the instance by digging through its __dict__ etc. I just want a way to avoid accidental exposure/changing of data.
My initial thought was to do this (I want to do this for a public version of a User model):
class PublicUser(db.Model):
    display_name = db.StringProperty()

    @classmethod
    def kind(cls):
        return 'User'

    def put(self):
        raise SomeError()
Unfortunately, GAE maps the kind to a class early on, so if I do PublicUser.get_by_id(1) I will actually get a User instance back, not a PublicUser instance.
Also, the idea is that it should at least appear to be a Model instance so that I can pass it around to code that does not know about the fact that it is a "dumbed-down" version. Ultimately I want to do this so that I can use my generic data exposure functions on the read-only version, so that they only expose public information about the user.
Update
I went with icio's solution. Here's the code I wrote for copying the properties from the User instance over to a PublicUser instance:
class User(db.Model):
    # ...
    # code
    # ...

    def as_public(self):
        """Returns a PublicUser version of this object."""
        props = self.properties()
        pu = PublicUser()
        for prop in pu.properties().values():
            # only copy properties that exist for both the PublicUser
            # model and the User model
            if prop.name in props:
                # this sets the property of the PublicUser instance to
                # the value of the same property on the User instance
                prop.__set__(pu, props[prop.name].__get__(self, type(self)))
        return pu
Please comment if this isn't a good way of doing it.
Could you not create a method within your User class which instantiates a ReadOnlyUser object and copies the values of member variables over as appropriate? Your call would be something like User.get_by_id(1).readonly() with the readonly method defined in the following form:
class User(db.Model):
    def readonly(self):
        return ReadOnlyUser(self.name, self.id)
Or you could perhaps have your User class extend another class with methods to do this automatically based on some static vars listing properties to copy over, or something.
P.S. I don't code in Python
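A sketch of that "static vars" idea (names are illustrative):
class PublicCopyMixin(object):
    # subclasses list the property names that are safe to expose
    public_properties = ()

    def as_public(self, public_class):
        pub = public_class()
        for name in self.public_properties:
            setattr(pub, name, getattr(self, name))
        return pub

class User(PublicCopyMixin, db.Model):
    display_name = db.StringProperty()
    email = db.StringProperty()  # stays private
    public_properties = ('display_name',)

# usage: pu = user.as_public(PublicUser)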
I want to give GAE Model properties attribute-like getter/setter behaviour. The reason is for cases like turning the value into uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, the GAE datastore has its own concept of a Property class. I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue I can think of with overriding get_value_for_datastore is that it might not get called until the object is put into the datastore. Hence, getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()

class MyModel(db.Model):
    foo = MyStringProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
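A minimal sketch of that two-name approach (foo/foo_stored are illustrative names):
class MyModel(db.Model):
    # the name that actually lives in the datastore; use this in GQL queries
    foo_stored = db.StringProperty()

    # the in-memory attribute; reads and writes go through foo_stored
    def _get_foo(self):
        return self.foo_stored

    def _set_foo(self, value):
        self.foo_stored = value.upper() if value is not None else None

    foo = property(_get_foo, _set_foo)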
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.
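In case the post is unavailable, here is the gist of a DerivedProperty, reconstructed from memory (details may differ from aetycoon's actual code):
from google.appengine.ext import db

class DerivedProperty(db.Property):
    def __init__(self, derive_func, *args, **kwargs):
        super(DerivedProperty, self).__init__(*args, **kwargs)
        self.derive_func = derive_func

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self  # accessed on the class, not an instance
        # the default get_value_for_datastore goes through __get__,
        # so the derived value is stored and queryable
        return self.derive_func(model_instance)

    def __set__(self, model_instance, value):
        raise AttributeError('%s is derived and cannot be assigned to'
                             % self.name)

class Foo(db.Model):
    something = db.StringProperty()
    # a queryable, always-uppercase view of `something`
    attr = DerivedProperty(
        lambda self: self.something.upper() if self.something else None)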