Using Property Builtin with GAE Datastore's Model - python

I want to make attributes of GAE Model properties. The reason is for cases like to turn the value into uppercase before storing it. For a plain Python class, I would do something like:
Foo(db.Model):
def get_attr(self):
return self.something
def set_attr(self, value):
self.something = value.upper() if value != None else None
attr = property(get_attr, set_attr)
However, GAE Datastore have their own concept of Property class, I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is and how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue of overriding get_value_for_datastore that I think of is it might not get called before the object was put into datastore. Hence getting the attribute before storing the object would yield an incorrect value.

Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
def get_value_for_datastore(self, model_instance):
vv = db.StringProperty.get_value_for_datastore(self, model_instance)
return vv.upper()
class MyModel(db.Model):
foo = MyStringProperty()
class MainHandler(webapp.RequestHandler):
def get(self):
my = MyModel(foo='Hello World')
k = my.put()
mm = MyModel.get(k)
s = mm.foo
self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!

What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.

Related

Passing an object created with SubFactory and LazyAttribute to a RelatedFactory in factory_boy

I am using factory.LazyAttribute within a SubFactory call to pass in an object, created in the factory_parent. This works fine.
But if I pass the object created to a RelatedFactory, LazyAttribute can no longer see the factory_parent and fails.
This works fine:
class OKFactory(factory.DjangoModelFactory):
class = Meta:
model = Foo
exclude = ['sub_object']
sub_object = factory.SubFactory(SubObjectFactory)
object = factory.SubFactory(ObjectFactory,
sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
The identical call to LazyAttribute fails here:
class ProblemFactory(OKFactory):
class = Meta:
model = Foo
exclude = ['sub_object', 'object']
sub_object = factory.SubFactory(SubObjectFactory)
object = factory.SubFactory(ObjectFactory,
sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
another_object = factory.RelatedFactory(AnotherObjectFactory, 'foo', object=object)
The identical LazyAttribute call can no longer see factory_parent, and can only access AnotherObject values. LazyAttribute throws the error:
AttributeError: The parameter sub_object is unknown. Evaluated attributes are...[then lists all attributes of AnotherObjectFactory]
Is there a way round this?
I can't just put sub_object=sub_object into the ObjectFactory call, ie:
sub_object = factory.SubFactory(SubObjectFactory)
object = factory.SubFactory(ObjectFactory, sub_object=sub_object)
because if I then do:
object2 = factory.SubFactory(ObjectFactory, sub_object=sub_object)
a second sub_object is created, whereas I need both objects to refer to the same sub_object. I have tried SelfAttribute to no avail.
I think you can leverage the ability to override parameters passed in to the RelatedFactory to achieve what you want.
For example, given:
class MyFactory(OKFactory):
object = factory.SubFactory(MyOtherFactory)
related = factory.RelatedFactory(YetAnotherFactory) # We want to pass object in here
If we knew what the value of object was going to be in advance, we could make it work with something like:
object = MyOtherFactory()
thing = MyFactory(object=object, related__param=object)
We can use this same naming convention to pass the object to the RelatedFactory within the main Factory:
class MyFactory(OKFactory):
class Meta:
exclude = ['object']
object = factory.SubFactory(MyOtherFactory)
related__param = factory.SelfAttribute('object')
related__otherrelated__param = factory.LazyAttribute(lambda myobject: 'admin%d_%d' % (myobject.level, myobject.level - 1))
related = factory.RelatedFactory(YetAnotherFactory) # Will be called with {'param': object, 'otherrelated__param: 'admin1_2'}
I solved this by simply calling factories within #factory.post_generation. Strictly speaking this isn't a solution to the specific problem posed, but I explain below in great detail why this ended up being a better architecture. #rhunwick's solution does genuinely pass a SubFactory(LazyAttribute('')) to RelatedFactory, however restrictions remained that meant this was not right for my situation.
We move the creation of sub_object and object from ProblemFactory to ObjectWithSubObjectsFactory (and remove the exclude clause), and add the following code to the end of ProblemFactory.
#factory.post_generation
def post(self, create, extracted, **kwargs):
if not create:
return # No IDs, so wouldn't work anyway
object = ObjectWithSubObjectsFactory()
sub_object_ids_by_code = dict((sbj.name, sbj.id) for sbj in object.subobject_set.all())
# self is the `Foo` Django object just created by the `ProblemFactory` that contains this code.
for another_obj in self.anotherobject_set.all():
if another_obj.name == 'age_in':
another_obj.attribute_id = sub_object_ids_by_code['Age']
another_obj.save()
elif another_obj.name == 'income_in':
another_obj.attribute_id = sub_object_ids_by_code['Income']
another_obj.save()
So it seems RelatedFactory calls are executed before PostGeneration calls.
The naming in this question is easier to understand, so here is the same solution code for that sample problem:
The creation of dataset, column_1 and column_2 are moved into a new factory DatasetAnd2ColumnsFactory, and the code below is then added to the end of FunctionToParameterSettingsFactory.
#factory.post_generation
def post(self, create, extracted, **kwargs):
if not create:
return
dataset = DatasetAnd2ColumnsFactory()
column_ids_by_name =
dict((column.name, column.id) for column in dataset.column_set.all())
# self is the `FunctionInstantiation` Django object just created by the `FunctionToParameterSettingsFactory` that contains this code.
for parameter_setting in self.parametersetting_set.all():
if parameter_setting.name == 'age_in':
parameter_setting.column_id = column_ids_by_name['Age']
parameter_setting.save()
elif parameter_setting.name == 'income_in':
parameter_setting.column_id = column_ids_by_name['Income']
parameter_setting.save()
I then extended this approach passing in options to configure the factory, like this:
whatever = WhateverFactory(options__an_option=True, options__another_option=True)
Then this factory code detected the options and generated the test data required (note the method is renamed to options to match the prefix on the parameter names):
#factory.post_generation
def options(self, create, not_used, **kwargs):
# The standard code as above
if kwargs.get('an_option', None):
# code for custom option 'an_option'
if kwargs.get('another_option', None):
# code for custom option 'another_option'
I then further extended this. Because my desired models contained self joins, my factory is recursive. So for a call such as:
whatever = WhateverFactory(options__an_option='xyz',
options__an_option_for_a_nested_whatever='abc')
Within #factory.post_generation I have:
class Meta:
model = Whatever
# self is the top level object being generated
#factory.post_generation
def options(self, create, not_used, **kwargs):
# This generates the nested object
nested_object = WhateverFactory(
options__an_option=kwargs.get('an_option_for_a_nested_whatever', None))
# then join nested_object to self via the self join
self.nested_whatever_id = nested_object.id
Some notes you do not need to read as to why I went with this option rather than #rhunwicks's proper solution to my question above. There were two reasons.
The thing that stopped me experimenting with it was that the order of RelatedFactory and post-generation is not reliable - apparently unrelated factors affect it, presumably a consequence of lazy evaluation. I had errors where a set of factories would suddenly stop working for no apparent reason. Once was because I renamed the variables RelatedFactory were assigned to. This sounds ridiculous but I tested it to death (and posted here) but there is no doubt - renaming the variables reliably switched the sequence of RelatedFactory and post-gen execution. I still assumed this was some oversight on my behalf until it happened again for some other reason (which I never managed to diagnose).
Secondly I found the declarative code confusing, inflexible and hard to re-factor. It isn't straightforward to pass different configurations during instantiation so that the same factory can be used for different variations of test data, meaning I had to repeat code, object needs adding to a Factory Meta.exclude list - sounds trivial but when you've pages of code generating data it was a reliable error. As a developer you'd have to pass over several factories several times to understand the control flow. Generation code would be spread between the declarative body, until you'd exhausted these tricks, then the rest would go in post-generation or get very convoluted. A common example for me is a triad of interdependent models (eg, a parent-children category structure or dataset/attributes/entities) as a foreign key of another triad of inter-dependent objects (eg, models, parameter values, etc, referring to other models' parameter values). A few of these types of structures, especially if nested, quickly become unmanagable.
I realize it isn't really in the spirit of factory_boy, but putting everything into post-generation solved all these problems. I can pass in parameters, so the same single factory serves all my composite model test data requirements and no code is repeated. The sequence of creation is easy to see immediately, obvious and completely reliable, rather than depending on confusing chains of inheritance and overriding and subject to some bug. The interactions are obvious so you don't need to digest the whole thing to add some functionality, and different areas of funtionality are grouped in the post-generation if clauses. There's no need to exclude working variables and you can refer to them for the duration of the factory code. The unit test code is simplified, because describing the functionality goes in parameter names rather than Factory class names - so you create data with a call like WhateverFactory(options__create_xyz=True, options__create_abc=True.., rather than WhateverCreateXYZCreateABC..(). This makes a nice division of responsibilities quite clean to code.

Django using self on __init__ when not saved in database

Is it possible to use self as a reference in the __init__ method when the object is not instantiated yet?
What I'm trying to do is :
class MyClass(models.Model)
__init__(self):
some_attributes = AnotherClass.objects.filter(foreignkey=self)
The thing is that as the instance of MyClass is not registered in db yet, I have an exception like "MyClass has not attribute id"
I tried to add
if self.pk:
but it doesn't work. Is there a method like
if self.is_saved_in_db():
#some code
or do I have to created this one ?
EDIT
To be more specific, I'll give an example. I have a generic class which I try to hydrate with attributes from another Model.
class MyClass(models.Model)
_init__(self):
self.hydrate()
def hydrate(self):
# Retrieving the related objects
attributes = Information.objects.filter(...)
for attr in attributes:
attribute_id = attr.name.lower().replace(" ","_")
setattr(self,attribute_id,attr)
By doing so, I can access to attributes with MyClass.my_attribute.
For a small example, if we replace MyClass by Recipe and Information with Ingredients I can do :
pasta_recipe.pasta
pasta_recipie.tomato
pasta_recipie.onions
It's a simple parsing from a foreign_key to an attribute
By writing it, I realise that it's a bit useless because I can directly use ForeignKey relationships. I think I'll do that but for my own culture, is it possible do the filter with self as attribute before database saving ?
Thanks!
This is a very strange thing to do. I strongly recommend you do not try to do it.
(That said, the self.pk check is the correct one: you need to provide more details than "it doesn't work".)

Google App Engine base and subclass gets

I want to have a base class called MBUser that has some predefined properties, ones that I don't want to be changed. If the client wants to add properties to MBUser, it is advised that MBUser be subclassed, and any additional properties be put in there.
The API code won't know if the client actually subclasses MBUser or not, but it shouldn't matter. The thinking went that we could just get MBUser by id. So I expected this to work:
def test_CreateNSUser_FetchMBUser(self):
from nsuser import NSUser
id = create_unique_id()
user = NSUser(id = id)
user.put()
# changing MBUser.get.. to NSUser.get makes this test succeed
get_user = MBUser.get_by_id(id)
self.assertIsNotNone(get_user)
Here NSUser is a subclass of MBUser. The test fails.
Why can't I do this?
What's a work around?
Models are defined by their "kind", and a subclass is a different kind, even if it seems the same.
The point of subclassing is not to share values, but to share the "schema" you've created for a given "kind".
A kind map is created on base class ndb.Model (it seems like you're using ndb since you mentioned get_by_id) and each kind is looked up when you do queries like this.
For subclasses, the kind is just defined as the class name:
#classmethod
def _get_kind(cls):
return cls.__name__
I just discovered GAE has a solution for this. It's called the PolyModel:
https://developers.google.com/appengine/docs/python/ndb/polymodelclass

Python: Assign an object's variable from a function (OpenERP)

I'm working on a OpenERP environment, but maybe my issue can be answered from a pure python perspective. What I'm trying to do is define a class whose "_columns" variable can be set from a function that returns the respective dictionary. So basically:
class repos_report(osv.osv):
_name = "repos.report"
_description = "Reposition"
_auto = False
def _get_dyna_cols(self):
ret = {}
cr = self.cr
cr.execute('Select ... From ...')
pass #<- Fill dictionary
return ret
_columns = _get_dyna_cols()
def init(self, cr):
pass #Other stuff here too, but I need to set my _columns before as per openerp
repos_report()
I have tried many ways, but these code reflects my basic need. When I execute my module for installation I get the following error.
TypeError: _get_dyna_cols() takes exactly 1 argument (0 given)
When defining the the _get_dyna_cols function I'm required to have self as first parameter (even before executing). Also, I need a reference to openerp's 'cr' cursor in order to query data to fill my _columns dictionary. So, how can I call this function so that it can be assigned to _columns? What parameter could I pass to this function?
From an OpenERP perspective, I guess I made my need quite clear. So any other approach suggested is also welcome.
From an OpenERP perspective, the right solution depends on what you're actually trying to do, and that's not quite clear from your description.
Usually the _columns definition of a model must be static, since it will be introspected by the ORM and (among other things) will result in the creation of corresponding database columns. You could set the _columns in the __init__ method (not init1) of your model, but that would not make much sense because the result must not change over time, (and it will only get called once when the model registry is initialized anyway).
Now there are a few exceptions to the "static columns" rules:
Function Fields
When you simply want to dynamically handle read/write operations on a virtual column, you can simply use a column of the fields.function type. It needs to emulate one of the other field types, but can do anything it wants with the data dynamically. Typical examples will store the data in other (real) columns after some pre-processing. There are hundreds of example in the official OpenERP modules.
Dynamic columns set
When you are developing a wizard model (a subclass of TransientModel, formerly osv_memory), you don't usually care about the database storage, and simply want to obtain some input from the user and take corresponding actions.
It is not uncommon in that case to need a completely dynamic set of columns, where the number and types of the columns may change every time the model is used. This can be achieved by overriding a few key API methods to simulate dynamic columns`:
fields_view_get is the API method that is called by the clients to obtain the definition of a view (form/tree/...) for the model.
fields_get is included in the result of fields_view_get but may be called separately, and returns a dict with the columns definition of the model.
search, read, write and create are called by the client in order to access and update record data, and should gracefully accept or return values for the columns that were defined in the result of fields_get
By overriding properly these methods, you can completely implement dynamic columns, but you will need to preserve the API behavior, and handle the persistence of the data (if any) yourself, in real static columns or in other models.
There are a few examples of such dynamic columns sets in the official addons, for example in the survey module that needs to simulate survey forms based on the definition of the survey campaign.
1 The init() method is only called when the model's module is installed or updated, in order to setup/update the database backend for this model. It relies on the _columns to do this.
When you write _columns = _get_dyna_cols() in the class body, that function call is made right there, in the class body, as Python is still parsing the class itself. At that point, your _get_dyn_cols method is just a function object in the local (class body) namespace - and it is called.
The error message you get is due to the missing self parameter, which is inserted only when you access your function as a method - but this error message is not what is wrong here: what is wrong is that you are making an imediate function call and expecting an special behavior, like late execution.
The way in Python to achieve what you want - i.e. to have the method called authomatically when the attribute colluns is accessed is to use the "property" built-in.
In this case, do just this: _columns = property(_get_dyna_cols) -
This will create a class attribute named "columns" which through a mechanism called "descriptor protocol" will call the desired method whenever the attribute is accessed from an instance.
To leran more about the property builtin, check the docs: http://docs.python.org/library/functions.html#property

Overriding/blocking __set__ on db.Property to enforce an alternate set method/process

I'm trying to sub-class db.Property and override the set method to implement stuff like pre and post set logic.
The problem is that __set__ is being called directly on the property by db.Model.__init__() during the from_entity conversion of entity to instance (after it comes out of the datastore), so obviously pre and post set logic should not be called.
class MyProperty(db.StringProperty):
def __set__(model_instance, value):
self.pre_set(value)
super(MyProperty.__set__(model_instance, value)
self.post_set(value)
class MyModel(db.Model):
foo = MyProperty()
my_model = MyModel()
my_model.put()
my_model.foo = u'A new string.' """pre/post set logic runs."""
#onload the __set__ method will be called again
loaded_model = db.get(my_model.key())
# In db.Model.__init__()
for prop in self.properties().values():
value = kwargs.get(prop.name, None) or prop.default() #or something like that
prop.__set__(self, value) """pre/post set logic also runs :("""
How can I differentiate between these two occurrences without having to override db.Model.__init__()? or should I just do that? Am I not supposed to be doing this with prop.__set__()?
Unfortunately, there's no easy way to distinguish the two situations. You could examine the stack, but that's an extremely kludgy approach. You could override Model.__init__ - either by monkeypatching it or by requiring users of your property extend your custom model subclass - but I'm not sure how you'd modify it in a way that would help yet remain backwards compatible with existing property classes.
You might want to check out Guido's NDB project - it may be more flexible in this respect, and is still under active development.

Categories

Resources