Proper way to migrate NDB model property - python

I currently have a model in NDB and I'd like to change the property name without touching the data already in the datastore. Let's say I have the following:
from google.appengine.ext import ndb

class User(ndb.Model):
    company = ndb.KeyProperty(repeated=True)
What I would like to have is something more like this:
class User(ndb.Model):
    company_ = ndb.KeyProperty(repeated=True)

    @property
    def company(self):
        return '42'

    @company.setter
    def company(self, new_company):
        # set company here
        pass
Is there a relatively pain-free way to do so? I'd like the convenience of property getters/setters, but given the current implementation I would like to avoid touching the underlying datastore.

You can change the class-level attribute name while keeping the underlying datastore property name by passing the name="xx" argument to the Property() constructor.
So something like this could be done:
class User(ndb.Model):
    company_ = ndb.KeyProperty(name="company", repeated=True)

    @property
    def company(self):
        return self.company_

    @company.setter
    def company(self, new_company):
        self.company_ = new_company
So now any time you access .company_, NDB will actually get/set "company" internally, and you don't have to do any data migration.
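For illustration, a hypothetical usage of the pattern above (the Company kind is made up):

user = User()
user.company = [ndb.Key('Company', 1)]  # goes through the @company.setter
companies = user.company                # goes through the @property getter
user.put()  # still stored under the original "company" name, so old data keeps working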

From my understanding of ndb, the property names are stored in the database along with their contents for every entity. You would have to rewrite every entity with the new property name (and without the old one).
Since that is not pain-free, maybe you could choose other names for your getter and setter like get_company and set_company.
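For instance, a minimal sketch of that naming scheme (the method bodies are placeholders):

class User(ndb.Model):
    company = ndb.KeyProperty(repeated=True)

    def get_company(self):
        return self.company

    def set_company(self, new_company):
        # any validation or conversion logic goes here
        self.company = new_company

The datastore property name never changes, so no entities need rewriting; you only give up the attribute-style syntax.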

Related

How to maintain a table of all proxy models in Django?

I have a model A and want to make subclasses of it.
class A(models.Model):
    type = models.ForeignKey(Type, on_delete=models.CASCADE)
    data = models.JSONField()

    def compute(self):
        pass

class B(A):
    def compute(self):
        df = self.go_get_data()
        self.data = self.process(df)

class C(A):
    def compute(self):
        df = self.go_get_other_data()
        self.data = self.process_another_way(df)

# ... other subclasses of A
B and C should not have their own tables, so I decided to use the proxy attribute of Meta. However, I want there to be a table of all the implemented proxies.
In particular, I want to keep a record of the name and description of each subclass.
For example, for B, the name would be "B" and the description would be the docstring for B.
So I made another model:
class Type(models.Model):
    # The name of the class
    name = models.CharField(max_length=100)
    # The docstring of the class
    desc = models.TextField()
    # A unique identifier, different from the Django ID,
    # that allows for smoothly changing the name of the class
    identifier = models.IntegerField(unique=True)
Now, I want it so when I create an A, I can only choose between the different subclasses of A.
Hence the Type table should always be up-to-date.
For example, if I want to unit-test the behavior of B, I'll need to use the corresponding Type instance to create an instance of B, so that Type instance already needs to be in the database.
Looking over on the Django website, I see two ways to achieve this: fixtures and data migrations.
Fixtures aren't dynamic enough for my use case, since the attributes literally come from the code. That leaves me with data migrations.
I tried writing one, which goes something like this:
from django.db import migrations

def update_results(apps, schema_editor):
    A = apps.get_model("app", "A")
    Type = apps.get_model("app", "Type")
    subclasses = get_all_subclasses(A)
    for cls in subclasses:
        id = cls.get_identifier()
        Type.objects.update_or_create(
            identifier=id,
            defaults=dict(name=cls.__name__, desc=cls.__doc__),
        )

class Migration(migrations.Migration):
    operations = [
        migrations.RunPython(update_results),
    ]
    # ... other stuff
The problem is, I don't see how to store the identifier within the class, so that the Django Model instance can recover it.
So far, here is what I have tried:
I have tried using the fairly new __init_subclass__ construct of Python. So my code now looks like:
class A:
    def __init_subclass__(cls, identifier=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if identifier is None:
            raise ValueError()
        cls.identifier = identifier
        Type.objects.update_or_create(
            identifier=identifier,
            defaults=dict(name=cls.__name__, desc=cls.__doc__),
        )
    # ... the rest of A

# The identifier should never change, so that even if the
# name of the class changes, we still know which subclass is referred to
class B(A, identifier=3):
    # ... the rest of B
    pass
But this update_or_create fails when the database is new (e.g. during unit tests), because the Type table does not exist.
When I have this problem in development (we're still in early stages, so deleting the DB is still sensible), I have to comment out the update_or_create in __init_subclass__, migrate, and then put it back in.
Of course, this solution is also not great because __init_subclass__ is run way more than necessary. Ideally this machinery would only happen at migration.
So there you have it! I hope the problem statement makes sense.
Thanks for reading this far and I look forward to hearing from you; even if you have other things to do, I wish you a good rest of your day :)
With a little help from Django-expert friends, I solved this with the post_migrate signal.
I removed the update_or_create in __init_subclass__, and in project/app/apps.py I added:
from django.apps import AppConfig
from django.db.models.signals import post_migrate

def get_all_subclasses(cls):
    """Get all subclasses of a class, recursively.

    Used to get a list of all the implemented As.
    """
    all_subclasses = []
    for subclass in cls.__subclasses__():
        all_subclasses.append(subclass)
        all_subclasses.extend(get_all_subclasses(subclass))
    return all_subclasses

def update_As(sender=None, **kwargs):
    """Get a list of all implemented As and write them in the database.

    More precisely, each model is used to instantiate a Type, which will
    be used to identify As.
    """
    from app.models import A, Type

    subclasses = get_all_subclasses(A)
    for cls in subclasses:
        Type.objects.update_or_create(
            identifier=cls.identifier,
            defaults=dict(name=cls.__name__, desc=cls.__doc__),
        )

class MyAppConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "app"

    def ready(self):
        post_migrate.connect(update_As, sender=self)
Hope this is helpful for future Django coders in need!
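One caveat worth noting: ready() only runs if this AppConfig is the one Django loads. On Django 3.2+ a single AppConfig subclass in apps.py is picked up automatically; on older versions you would point INSTALLED_APPS at it explicitly:

INSTALLED_APPS = [
    # ...
    "app.apps.MyAppConfig",
]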

Google App Engine base and subclass gets

I want to have a base class called MBUser that has some predefined properties, ones that I don't want to be changed. If the client wants to add properties to MBUser, it is advised that MBUser be subclassed, and any additional properties be put in there.
The API code won't know if the client actually subclasses MBUser or not, but it shouldn't matter. The thinking went that we could just get MBUser by id. So I expected this to work:
def test_CreateNSUser_FetchMBUser(self):
    from nsuser import NSUser
    id = create_unique_id()
    user = NSUser(id=id)
    user.put()
    # changing MBUser.get_by_id to NSUser.get_by_id makes this test succeed
    get_user = MBUser.get_by_id(id)
    self.assertIsNotNone(get_user)
Here NSUser is a subclass of MBUser. The test fails.
Why can't I do this?
What's a workaround?
Models are defined by their "kind", and a subclass is a different kind, even if it seems the same.
The point of subclassing is not to share values, but to share the "schema" you've created for a given "kind".
A kind map is created on base class ndb.Model (it seems like you're using ndb since you mentioned get_by_id) and each kind is looked up when you do queries like this.
For subclasses, the kind is just defined as the class name:
@classmethod
def _get_kind(cls):
    return cls.__name__
I just discovered GAE has a solution for this. It's called the PolyModel:
https://developers.google.com/appengine/docs/python/ndb/polymodelclass
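For reference, a minimal sketch of the PolyModel approach (the nickname property is made up; MBUser/NSUser as in the question):

from google.appengine.ext import ndb
from google.appengine.ext.ndb import polymodel

class MBUser(polymodel.PolyModel):
    name = ndb.StringProperty()

class NSUser(MBUser):
    nickname = ndb.StringProperty()

All subclasses share the root kind, so the original test pattern should now work:

user = NSUser(id=create_unique_id())
user.put()
get_user = MBUser.get_by_id(user.key.id())  # returns the NSUser instance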

sqlalchemy search function on table as classmethod?

Assuming I have a class called Customer, defined in SQLAlchemy to represent the customer table, I want to write a search method so that ...
results = Customer.search(query)
will return the results based on the method. Do I want to do this as a @classmethod?
@classmethod
def search(cls, query):
    # do some stuff
    return results
Can I use cls in place of DBSession.query?
DBSession.query(Customer).filter(...)
cls.query(Customer).filter(...)
To use cls.query, you have to assign a query_property to your model classes!
You probably want to use this in other model classes as well, so you might want to do that in your model Base class somewhere in your model/__init__.py:
Base.query = Session.query_property()
Then you can simply write:
cls.query.filter(...)
Note that you don't specify the Object to query for anymore, that is automatically done by using the query_property mechanism.
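Putting the pieces together, a minimal self-contained sketch (the Customer model, the in-memory SQLite engine, and the contains() filter are all illustrative; imports assume the SQLAlchemy 1.x era the question dates from):

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite://')
Session = scoped_session(sessionmaker(bind=engine))

Base = declarative_base()
Base.query = Session.query_property()

class Customer(Base):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)

    @classmethod
    def search(cls, query):
        # cls.query is already scoped to Customer; no need to name the class again
        return cls.query.filter(cls.name.contains(query)).all()

Base.metadata.create_all(engine)
results = Customer.search('smith')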
I just recently wrote some similar code, following this reference.
class Users(Base):
    __tablename__ = 'users'

    @classmethod
    def by_id(cls, userid):
        return Session.query(Users).filter(Users.id == userid).first()
So the answer to your first question appears to be "yes". Not sure about the second, as I didn't substitute cls for DBSession.

Exposing a "dumbed-down", read-only instance of a Model in GAE

Does anyone know a clever way, in Google App Engine, to return a wrapped Model instance that only exposes a few of the original properties, and does not allow saving the instance back to the datastore?
I'm not looking for ways of actually enforcing these rules, obviously it'll still be possible to change the instance by digging through its __dict__ etc. I just want a way to avoid accidental exposure/changing of data.
My initial thought was to do this (I want to do this for a public version of a User model):
class PublicUser(db.Model):
    display_name = db.StringProperty()

    @classmethod
    def kind(cls):
        return 'User'

    def put(self):
        raise SomeError()
Unfortunately, GAE maps the kind to a class early on, so if I do PublicUser.get_by_id(1) I will actually get a User instance back, not a PublicUser instance.
Also, the idea is that it should at least appear to be a Model instance so that I can pass it around to code that does not know about the fact that it is a "dumbed-down" version. Ultimately I want to do this so that I can use my generic data exposure functions on the read-only version, so that they only expose public information about the user.
Update
I went with icio's solution. Here's the code I wrote for copying the properties from the User instance over to a PublicUser instance:
class User(db.Model):
    # ...
    # code
    # ...

    def as_public(self):
        """Returns a PublicUser version of this object."""
        props = self.properties()
        pu = PublicUser()
        for prop in pu.properties().values():
            # Only copy properties that exist for both the PublicUser model
            # and the User model.
            if prop.name in props:
                # This sets the property of the PublicUser instance to the
                # value of the same property on the User instance.
                prop.__set__(pu, props[prop.name].__get__(self, type(self)))
        return pu
Please comment if this isn't a good way of doing it.
Could you not create a method within your User class which instantiates a ReadOnlyUser object and copies the values of member variables over as appropriate? Your call would be something like User.get_by_id(1).readonly() with the readonly method defined in the following form:
class User(db.Model):
    def readonly(self):
        return ReadOnlyUser(self.name, self.id)
Or you could perhaps have your User class extend another class with methods to do this automatically based on some static vars listing properties to copy over, or something.
P.S. I don't code in Python
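For what it's worth, the "static vars" variant mentioned above might look roughly like this (all names are made up; PublicUser is the model from the question):

class PublicCopyMixin(object):
    # Subclasses list the attributes that are safe to expose.
    public_fields = ()
    public_class = None

    def readonly(self):
        ro = self.public_class()
        for field in self.public_fields:
            setattr(ro, field, getattr(self, field))
        return ro

class User(PublicCopyMixin, db.Model):
    display_name = db.StringProperty()
    email = db.StringProperty()

    public_fields = ('display_name',)
    public_class = PublicUser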

Using Property Builtin with GAE Datastore's Model

I want to add attribute-like behavior to GAE Model properties, for cases like turning the value into uppercase before storing it. For a plain Python class, I would do something like:
class Foo(db.Model):
    def get_attr(self):
        return self.something

    def set_attr(self, value):
        self.something = value.upper() if value is not None else None

    attr = property(get_attr, set_attr)
However, GAE Datastore has its own concept of a Property class. I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is and how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provide getter/setter-like functionality? If so, how do I do it?
Added:
One potential issue with overriding get_value_for_datastore that I can think of: it might not get called until the object is put into the datastore. Hence getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
class MyStringProperty(db.StringProperty):
    def get_value_for_datastore(self, model_instance):
        vv = db.StringProperty.get_value_for_datastore(self, model_instance)
        return vv.upper()

class MyModel(db.Model):
    foo = MyStringProperty()

class MainHandler(webapp.RequestHandler):
    def get(self):
        my = MyModel(foo='Hello World')
        k = my.put()
        mm = MyModel.get(k)
        s = mm.foo
        self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
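A minimal sketch of that two-name pattern (attribute names are illustrative):

class MyModel(db.Model):
    # This is what actually lives in the datastore; use foo_stored in GQL queries.
    foo_stored = db.StringProperty()

    def _get_foo(self):
        return self.foo_stored

    def _set_foo(self, value):
        # The transformation happens on assignment, so in-memory instances
        # that have never been put() also see the uppercased value.
        self.foo_stored = value.upper() if value is not None else None

    # The name used on in-memory instances: a plain built-in property.
    foo = property(_get_foo, _set_foo)

m = MyModel()
m.foo = 'hello'  # m.foo_stored is now 'HELLO', even before any put()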
What you want is a DerivedProperty. The procedure for writing one is outlined in that post - it's similar to what Alex describes, but by overriding get instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library has it and other useful properties included.
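The rough shape of the idea, as I understand it (a hedged sketch, not the aetycoon code itself): the property computes its value in __get__, so it is correct even for instances that have never been stored:

class DerivedProperty(db.Property):
    def __init__(self, derive_func, *args, **kwargs):
        super(DerivedProperty, self).__init__(*args, **kwargs)
        self.derive_func = derive_func

    def __get__(self, model_instance, model_class):
        if model_instance is None:
            return self  # accessed on the class, not an instance
        return self.derive_func(model_instance)

    def __set__(self, model_instance, value):
        # db.Model.__init__ knows to swallow this specific exception
        raise db.DerivedPropertyError('Derived property cannot be assigned to')

class Person(db.Model):
    name = db.StringProperty(required=True)
    name_upper = DerivedProperty(lambda self: self.name.upper())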
