Google App Engine base and subclass gets - python

I want to have a base class called MBUser that has some predefined properties, ones that I don't want to be changed. If the client wants to add properties to MBUser, it is advised that MBUser be subclassed, and any additional properties be put in there.
The API code won't know if the client actually subclasses MBUser or not, but it shouldn't matter. The thinking went that we could just get MBUser by id. So I expected this to work:
def test_CreateNSUser_FetchMBUser(self):
    from nsuser import NSUser
    id = create_unique_id()
    user = NSUser(id=id)
    user.put()
    # changing MBUser.get.. to NSUser.get makes this test succeed
    get_user = MBUser.get_by_id(id)
    self.assertIsNotNone(get_user)
Here NSUser is a subclass of MBUser. The test fails.
Why can't I do this?
What's a work around?

Models are defined by their "kind", and a subclass is a different kind, even if it seems the same.
The point of subclassing is not to share values, but to share the "schema" you've created for a given "kind".
A kind map is kept on the base class ndb.Model (it seems you're using ndb, since you mentioned get_by_id), and the kind is looked up there when you do gets and queries like this.
For subclasses, the kind is just defined as the class name:
@classmethod
def _get_kind(cls):
    return cls.__name__
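A small sketch makes the consequence concrete (ndb assumed, as in the question; class names taken from the question):
from google.appengine.ext import ndb

class MBUser(ndb.Model):
    pass

class NSUser(MBUser):
    pass

MBUser._get_kind()  # 'MBUser'
NSUser._get_kind()  # 'NSUser' -- a different kind, stored separately

# MBUser.get_by_id(x) builds Key('MBUser', x), which can never match an
# entity that was stored under Key('NSUser', x), so the test's lookup
# returns None.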

I just discovered GAE has a solution for this. It's called the PolyModel:
https://developers.google.com/appengine/docs/python/ndb/polymodelclass
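A minimal sketch of the PolyModel approach (property names here are illustrative, not from the question):
from google.appengine.ext import ndb
from google.appengine.ext.ndb import polymodel

class MBUser(polymodel.PolyModel):
    name = ndb.StringProperty()

class NSUser(MBUser):
    extra = ndb.StringProperty()

# All subclass entities are stored under the root kind, so a get on the
# base class finds entities created through the subclass:
NSUser(id='some-id', name='Ann').put()
user = MBUser.get_by_id('some-id')  # returns the NSUser instance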

Related

Passing an object created with SubFactory and LazyAttribute to a RelatedFactory in factory_boy

I am using factory.LazyAttribute within a SubFactory call to pass in an object, created in the factory_parent. This works fine.
But if I pass the object created to a RelatedFactory, LazyAttribute can no longer see the factory_parent and fails.
This works fine:
class OKFactory(factory.DjangoModelFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object']

    sub_object = factory.SubFactory(SubObjectFactory)
    object = factory.SubFactory(ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
The identical call to LazyAttribute fails here:
class ProblemFactory(OKFactory):
    class Meta:
        model = Foo
        exclude = ['sub_object', 'object']

    sub_object = factory.SubFactory(SubObjectFactory)
    object = factory.SubFactory(ObjectFactory,
        sub_object=factory.LazyAttribute(lambda obj: obj.factory_parent.sub_object))
    another_object = factory.RelatedFactory(AnotherObjectFactory, 'foo', object=object)
The identical LazyAttribute call can no longer see factory_parent, and can only access AnotherObject values. LazyAttribute throws the error:
AttributeError: The parameter sub_object is unknown. Evaluated attributes are...[then lists all attributes of AnotherObjectFactory]
Is there a way round this?
I can't just put sub_object=sub_object into the ObjectFactory call, i.e.:
sub_object = factory.SubFactory(SubObjectFactory)
object = factory.SubFactory(ObjectFactory, sub_object=sub_object)
because if I then do:
object2 = factory.SubFactory(ObjectFactory, sub_object=sub_object)
a second sub_object is created, whereas I need both objects to refer to the same sub_object. I have tried SelfAttribute to no avail.
I think you can leverage the ability to override parameters passed in to the RelatedFactory to achieve what you want.
For example, given:
class MyFactory(OKFactory):
    object = factory.SubFactory(MyOtherFactory)
    related = factory.RelatedFactory(YetAnotherFactory)  # We want to pass object in here
If we knew what the value of object was going to be in advance, we could make it work with something like:
object = MyOtherFactory()
thing = MyFactory(object=object, related__param=object)
We can use this same naming convention to pass the object to the RelatedFactory within the main Factory:
class MyFactory(OKFactory):
    class Meta:
        exclude = ['object']

    object = factory.SubFactory(MyOtherFactory)
    related__param = factory.SelfAttribute('object')
    related__otherrelated__param = factory.LazyAttribute(lambda myobject: 'admin%d_%d' % (myobject.level, myobject.level - 1))
    related = factory.RelatedFactory(YetAnotherFactory)  # Will be called with {'param': object, 'otherrelated__param': 'admin1_2'}
I solved this by simply calling factories within @factory.post_generation. Strictly speaking this isn't a solution to the specific problem posed, but I explain below in detail why this ended up being a better architecture. @rhunwick's solution does genuinely pass a SubFactory(LazyAttribute(...)) to a RelatedFactory, but restrictions remained that meant it was not right for my situation.
We move the creation of sub_object and object from ProblemFactory to ObjectWithSubObjectsFactory (and remove the exclude clause), and add the following code to the end of ProblemFactory.
@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return  # No IDs, so wouldn't work anyway
    object = ObjectWithSubObjectsFactory()
    sub_object_ids_by_code = dict((sbj.name, sbj.id) for sbj in object.subobject_set.all())
    # self is the `Foo` Django object just created by the `ProblemFactory` that contains this code.
    for another_obj in self.anotherobject_set.all():
        if another_obj.name == 'age_in':
            another_obj.attribute_id = sub_object_ids_by_code['Age']
            another_obj.save()
        elif another_obj.name == 'income_in':
            another_obj.attribute_id = sub_object_ids_by_code['Income']
            another_obj.save()
So it seems RelatedFactory calls are executed before PostGeneration calls.
The naming in this question is easier to understand, so here is the same solution code for that sample problem:
The creation of dataset, column_1 and column_2 are moved into a new factory DatasetAnd2ColumnsFactory, and the code below is then added to the end of FunctionToParameterSettingsFactory.
@factory.post_generation
def post(self, create, extracted, **kwargs):
    if not create:
        return
    dataset = DatasetAnd2ColumnsFactory()
    column_ids_by_name = dict((column.name, column.id) for column in dataset.column_set.all())
    # self is the `FunctionInstantiation` Django object just created by the `FunctionToParameterSettingsFactory` that contains this code.
    for parameter_setting in self.parametersetting_set.all():
        if parameter_setting.name == 'age_in':
            parameter_setting.column_id = column_ids_by_name['Age']
            parameter_setting.save()
        elif parameter_setting.name == 'income_in':
            parameter_setting.column_id = column_ids_by_name['Income']
            parameter_setting.save()
I then extended this approach passing in options to configure the factory, like this:
whatever = WhateverFactory(options__an_option=True, options__another_option=True)
Then this factory code detected the options and generated the test data required (note the method is renamed to options to match the prefix on the parameter names):
@factory.post_generation
def options(self, create, not_used, **kwargs):
    # The standard code as above
    if kwargs.get('an_option', None):
        pass  # code for custom option 'an_option'
    if kwargs.get('another_option', None):
        pass  # code for custom option 'another_option'
I then further extended this. Because my desired models contained self joins, my factory is recursive. So for a call such as:
whatever = WhateverFactory(options__an_option='xyz',
                           options__an_option_for_a_nested_whatever='abc')
Within @factory.post_generation I have:
class Meta:
    model = Whatever

# self is the top level object being generated
@factory.post_generation
def options(self, create, not_used, **kwargs):
    # This generates the nested object
    nested_object = WhateverFactory(
        options__an_option=kwargs.get('an_option_for_a_nested_whatever', None))
    # then join nested_object to self via the self join
    self.nested_whatever_id = nested_object.id
Some notes, which you do not need to read, on why I went with this option rather than @rhunwick's proper solution to my question above. There were two reasons.
The thing that stopped me experimenting with it was that the relative order of RelatedFactory and post-generation execution is not reliable: apparently unrelated factors affect it, presumably as a consequence of lazy evaluation. I had errors where a set of factories would suddenly stop working for no apparent reason. Once it was because I renamed the variables the RelatedFactory declarations were assigned to. This sounds ridiculous, but I tested it to death (and posted here): there is no doubt that renaming the variables reliably switched the sequence of RelatedFactory and post-generation execution. I still assumed this was some oversight on my behalf until it happened again for some other reason (which I never managed to diagnose).
Secondly, I found the declarative code confusing, inflexible and hard to refactor. It isn't straightforward to pass different configurations during instantiation so that the same factory can be used for different variations of test data, which meant I had to repeat code. Each working object needs adding to a Factory Meta.exclude list; this sounds trivial, but when you have pages of code generating data it was a reliable source of errors. As a developer you would have to pass over several factories several times to understand the control flow. Generation code would be spread between the declarative body until those tricks were exhausted, and then the rest would go in post-generation or get very convoluted. A common example for me is a triad of interdependent models (e.g. a parent-children category structure, or dataset/attributes/entities) used as a foreign key of another triad of interdependent objects (e.g. models, parameter values, etc., referring to other models' parameter values). A few of these structures, especially if nested, quickly become unmanageable.
I realize it isn't really in the spirit of factory_boy, but putting everything into post-generation solved all these problems. I can pass in parameters, so the same single factory serves all my composite-model test-data requirements and no code is repeated. The sequence of creation is immediately visible, obvious and completely reliable, rather than depending on confusing chains of inheritance and overriding and being subject to some bug. The interactions are obvious, so you don't need to digest the whole thing to add some functionality, and different areas of functionality are grouped in the post-generation if clauses. There's no need to exclude working variables, and you can refer to them for the duration of the factory code. The unit test code is simplified, because describing the functionality goes into parameter names rather than Factory class names: you create data with a call like WhateverFactory(options__create_xyz=True, options__create_abc=True, ...) rather than WhateverCreateXYZCreateABC...(). This makes for a division of responsibilities that is quite clean to code.

Django using self on __init__ when not saved in database

Is it possible to use self as a reference in the __init__ method when the object is not instantiated yet?
What I'm trying to do is:
class MyClass(models.Model):
    def __init__(self, *args, **kwargs):
        super(MyClass, self).__init__(*args, **kwargs)
        some_attributes = AnotherClass.objects.filter(foreignkey=self)
The thing is that, as the instance of MyClass is not registered in the db yet, I get an exception like "MyClass has no attribute id".
I tried to add
if self.pk:
but it doesn't work. Is there a method like
if self.is_saved_in_db():
    # some code
or do I have to create this one?
EDIT
To be more specific, I'll give an example. I have a generic class which I try to hydrate with attributes from another Model.
class MyClass(models.Model):
    def __init__(self, *args, **kwargs):
        super(MyClass, self).__init__(*args, **kwargs)
        self.hydrate()

    def hydrate(self):
        # Retrieving the related objects
        attributes = Information.objects.filter(...)
        for attr in attributes:
            attribute_id = attr.name.lower().replace(" ", "_")
            setattr(self, attribute_id, attr)
By doing so, I can access attributes with MyClass.my_attribute.
For a small example, if we replace MyClass with Recipe and Information with Ingredients, I can do:
pasta_recipe.pasta
pasta_recipe.tomato
pasta_recipe.onions
It's a simple parsing from a foreign key to an attribute.
By writing it out, I realise that it's a bit useless because I can directly use ForeignKey relationships. I think I'll do that, but for my own culture: is it possible to do the filter with self as an attribute before saving to the database?
Thanks!
This is a very strange thing to do. I strongly recommend you do not try to do it.
(That said, the self.pk check is the correct one: you need to provide more details than "it doesn't work".)
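For completeness, a hedged sketch of what the self.pk guard looks like when the hydration is moved out of __init__ into a method called after save (names taken from the question):
class MyClass(models.Model):
    def hydrate(self):
        # Unsaved instances have no primary key, so no row can reference them yet.
        if self.pk is None:
            return
        for attr in AnotherClass.objects.filter(foreignkey=self):
            setattr(self, attr.name.lower().replace(" ", "_"), attr)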

Figure out child type with Django MTI or specify type as field?

I'm setting up a data model in django using multiple-table inheritance (MTI) like this:
class Metric(models.Model):
    account = models.ForeignKey(Account)
    date = models.DateField()
    value = models.FloatField()
    calculation_in_progress = models.BooleanField()
    type = models.CharField(max_length=20, choices=METRIC_TYPES)  # Appropriate?

    def calculate(self):
        # default calculation...

class WebMetric(Metric):
    url = models.URLField()

    def calculate(self):
        # web-specific calculation...

class TextMetric(Metric):
    text = models.TextField()

    def calculate(self):
        # text-specific calculation...
My instinct is to put a 'type' field in the base class as shown here, so I can tell which sub-class any Metric object belongs to. It would be a bit of a hassle to keep this up to date all the time, but possible. But do I need to do this? Is there some way that django handles this automatically?
When I call Metric.objects.all(), every object returned is an instance of Metric, never of the subclasses. So if I call .calculate() I never get the subclass's behavior.
I could write a function on the base class that tests to see if I can cast it to any of the sub-types like:
def determine_subtype(self):
    try:
        self.webmetric
        return WebMetric
    except WebMetric.DoesNotExist:
        pass
    # Repeat for every sub-class
but this seems like a bunch of repetitious code. And it's also not something that can be included in a SELECT filter -- only works in python-space.
What's the best way to handle this?
While it might offend some people's sensibilities, the only practical way to solve this problem is to put either a field or a method in the base class which says what kind of object each record really is. The problem with the method you describe is that it requires a separate database query for every type of subclass, for each object you're dealing with. This could get extremely slow when working with large querysets. A better way is to use a ForeignKey to Django's ContentType class.
@Carl Meyer wrote a good solution here: How do I access the child classes of an object in django without knowing the name of the child class?
Single Table Inheritance could help alleviate this issue, depending on how it gets implemented. But for now Django does not support it (see: Single Table Inheritance in Django), so it's not a helpful suggestion.
But do I need to do this?
Never. Never. Never.
Is there some way that django handles this automatically?
Yes. It's called "polymorphism".
You never need to know the subclass. Never.
"What about my WebMetric.url and my TextMetric.text attributes?"
What will you do with these attributes? Define a method function that does something. Implement different versions in WebMetric (that uses url) and TextMetric (that uses text).
That's proper polymorphism.
Please read this: http://docs.djangoproject.com/en/1.2/topics/db/models/#abstract-base-classes
Please make your superclass abstract.
Do NOT do this: http://docs.djangoproject.com/en/1.2/topics/db/models/#multi-table-inheritance
You want "single-table inheritance".
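A minimal sketch of the abstract-base-class version this answer recommends, reusing the question's field names (note the trade-off: with an abstract base there is no Metric table, so you query WebMetric and TextMetric directly):
class Metric(models.Model):
    account = models.ForeignKey(Account)
    date = models.DateField()
    value = models.FloatField()

    class Meta:
        abstract = True

    def calculate(self):
        raise NotImplementedError

class WebMetric(Metric):
    url = models.URLField()

    def calculate(self):
        return self.value  # web-specific calculation would go here

class TextMetric(Metric):
    text = models.TextField()

    def calculate(self):
        return self.value  # text-specific calculation would go here

# Every instance in WebMetric.objects.all() is a WebMetric, so
# calculate() always dispatches to the right implementation.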

Using Property Builtin with GAE Datastore's Model

I want to make GAE Model properties behave like Python attributes with getters and setters, for cases like turning the value into uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, the GAE Datastore has its own concept of a Property class. I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provides getter/setter-like functionality? If so, how to do it?
Added:
One potential issue with overriding get_value_for_datastore that I can think of is that it might not get called before the object is put into the datastore. Hence getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        # Read the value the base class would store, then transform it.
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()

class MyModel(db.Model):
    foo = MyStringProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)  # round-trip through the datastore
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
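A hedged sketch of that two-name arrangement (names are illustrative): foo_stored is the real db.Property, so it is what lives in the datastore and what GQL queries must reference, while foo is the built-in property used on in-memory instances:
class MyModel(db.Model):
    foo_stored = db.StringProperty()  # the name that exists in the datastore

    def _get_foo(self):
        return self.foo_stored

    def _set_foo(self, value):
        # Normalize on write, whether or not the entity is ever saved.
        self.foo_stored = value.upper() if value is not None else None

    foo = property(_get_foo, _set_foo)

# m = MyModel(); m.foo = 'hello'  # in-memory access via the built-in property
# GQL must use the stored name:  SELECT * FROM MyModel WHERE foo_stored = 'HELLO'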
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.

How do I access the child classes of an object in django without knowing the name of the child class?

In Django, when you have a parent class and multiple child classes that inherit from it you would normally access a child through parentclass.childclass1_set or parentclass.childclass2_set, but what if I don't know the name of the specific child class I want?
Is there a way to get the related objects in the parent->child direction without knowing the child class name?
(Update: For Django 1.2 and newer, which can follow select_related queries across reverse OneToOneField relations (and thus down inheritance hierarchies), there's a better technique available which doesn't require the added real_type field on the parent model. It's available as InheritanceManager in the django-model-utils project.)
The usual way to do this is to add a ForeignKey to ContentType on the Parent model which stores the content type of the proper "leaf" class. Without this, you may have to do quite a number of queries on child tables to find the instance, depending how large your inheritance tree is. Here's how I did it in one project:
from django.contrib.contenttypes.models import ContentType
from django.db import models

class InheritanceCastModel(models.Model):
    """
    An abstract base class that provides a ``real_type`` FK to ContentType.
    For use in trees of inherited models, to be able to downcast
    parent instances to their child types.
    """
    real_type = models.ForeignKey(ContentType, editable=False)

    def save(self, *args, **kwargs):
        if self._state.adding:
            self.real_type = self._get_real_type()
        super(InheritanceCastModel, self).save(*args, **kwargs)

    def _get_real_type(self):
        return ContentType.objects.get_for_model(type(self))

    def cast(self):
        return self.real_type.get_object_for_this_type(pk=self.pk)

    class Meta:
        abstract = True
This is implemented as an abstract base class to make it reusable; you could also put these methods and the FK directly onto the parent class in your particular inheritance hierarchy.
This solution won't work if you aren't able to modify the parent model. In that case you're pretty much stuck checking all the subclasses manually.
In Python, given a ("new-style") class X, you can get its (direct) subclasses with X.__subclasses__(), which returns a list of class objects. (If you want "further descendants", you'll also have to call __subclasses__ on each of the direct subclasses, etc etc -- if you need help on how to do that effectively in Python, just ask!).
Once you have somehow identified a child class of interest (maybe all of them, if you want instances of all child subclasses, etc), getattr(parentclass,'%s_set' % childclass.__name__) should help (if the child class's name is 'foo', this is just like accessing parentclass.foo_set -- no more, no less). Again, if you need clarification or examples, please ask!
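A small sketch of the recursive walk mentioned above (the helper name is mine; note that Django's reverse accessor is typically the lowercased child class name):
def all_subclasses(cls):
    """Recursively collect every descendant class of cls."""
    result = []
    for sub in cls.__subclasses__():
        result.append(sub)
        result.extend(all_subclasses(sub))
    return result

# For a parent instance and a chosen child class:
# child = getattr(parent_instance, childclass.__name__.lower())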
Carl's solution is a good one; here's one way to do it manually if there are multiple related child classes:
def get_children(self):
    rel_objs = self._meta.get_all_related_objects()
    return [getattr(self, x.get_accessor_name()) for x in rel_objs if x.model != type(self)]
It uses a function out of _meta, which is not guaranteed to be stable as django evolves, but it does the trick and can be used on-the-fly if need be.
It turns out that what I really needed was this:
Model inheritance with content type and inheritance-aware manager
That has worked perfectly for me. Thanks to everyone else, though. I learned a lot just reading your answers!
You can use django-polymorphic for that.
It allows you to automatically cast derived classes back to their actual type. It also provides Django admin support, more efficient SQL query handling, and support for proxy models, inlines and formsets.
The basic principle seems to be reinvented many times (including Wagtail's .specific, or the examples outlined in this post). It takes more effort, however, to make sure it doesn't result in an N-query issue or to integrate it nicely with the admin, formsets/inlines or third-party apps.
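A minimal usage sketch, assuming django-polymorphic's documented API (model names are illustrative):
from django.db import models
from polymorphic.models import PolymorphicModel

class Animal(PolymorphicModel):
    name = models.CharField(max_length=50)

class Dog(Animal):
    bark_volume = models.IntegerField(default=5)

# Animal.objects.all() yields Dog (and other subclass) instances directly,
# with no manual casting step.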
Here's my solution; again, it uses _meta, so it isn't guaranteed to be stable.
class Animal(models.Model):
    name = models.CharField()
    number_legs = models.IntegerField()
    ...

    def get_child_animal(self):
        child_animal = None
        for r in self._meta.get_all_related_objects():
            if r.field.name == 'animal_ptr':
                child_animal = getattr(self, r.get_accessor_name())
        if not child_animal:
            raise Exception("No subclass, you shouldn't create Animals directly")
        return child_animal

class Dog(Animal):
    ...

for a in Animal.objects.all():
    a.get_child_animal()  # returns the dog (or whatever) instance
You can achieve this by looking for all the fields in the parent that are an instance of django.db.models.fields.related.RelatedManager. From your example it seems that the child classes you are talking about are not subclasses. Right?
An alternative approach using proxies can be found in this blog post. Like the other solutions, it has its benefits and liabilities, which are very well put in the end of the post.
