Following backreferences of unknown kinds in NDB - python

I'm in the process of writing my first RESTful web service atop GAE and the Python 2.7 runtime; I've started out using Guido's shiny new ndb API.
However, I'm unsure how to solve a particular case without the implicit back-reference feature of the original db API. If the user-agent requests a particular resource and those resources 1 degree removed:
host/api/kind/id?depth=2
What's the best way to discover a related collection of entities from the "one" in a one-to-many relationship, given that the kind of the related entity is unknown at development time?
I'm unable to use a replacement query as described in a previous SO inquiry due to the latter restriction. The fact that my model is definable at runtime (and therefore isn't hardcoded) prevents me from using a query to filter properties for matching keys.
Ancestor and other kindless queries are also out due to the datastore limitation that prevents me from filtering on a property without the kind specified.
Thus far, the only idea I've had (beyond reverting to the db API) is to use a cross-group transaction to write my own reference on the "one", either by updating an ndb.StringProperty(repeated=True) containing all the related kinds when an entity of a new kind is introduced, or by simply maintaining a list of keys on the "one" in an ndb.KeyProperty(repeated=True) every time a related "many" entity is written to the datastore.
I'm hoping someone more experienced than myself can suggest a better approach.
Given jmort253's suggestion, I'll try to augment my question with a concrete example adapted from the docs:
from google.appengine.ext import ndb

class Contact(ndb.Expando):
    """ The One """

    # basic info
    name = ndb.StringProperty()
    birth_day = ndb.DateProperty()

    # If I were using db, a collection called 'phone_numbers' would be implicitly
    # created here. I could use this property to retrieve related phone numbers
    # when this entity was queried. Since NDB lacks this feature, the service
    # will neither have a reference to query nor the means to know the
    # relationship exists in the first place since it cannot be hard-coded. The
    # data model is extensible and user-defined at runtime; most relationships
    # will be described only in the data, and must be discoverable by the server.
    # In this case, when Contact is queried, I need a way to retrieve the
    # collection of phone numbers.

    # Company info.
    company_title = ndb.StringProperty()
    company_name = ndb.StringProperty()
    company_description = ndb.StringProperty()
    # NDB has no PostalAddressProperty (that is a db type); use StringProperty.
    company_address = ndb.StringProperty()

class PhoneNumber(ndb.Expando):
    """ The Many """

    # no collection_name='phone_numbers' equivalent exists for the key property
    contact = ndb.KeyProperty(kind='Contact')
    # NDB has no PhoneNumberProperty (that is a db type); use StringProperty.
    number = ndb.StringProperty()

Interesting question! So basically you want to look at the Contact class and find out if there is some other model class that has a KeyProperty referencing it; in this example PhoneNumber (but there could be many).
I think the solution is to ask your users to explicitly add this link when the PhoneNumber class is created.
You can make this easy for your users by giving them a subclass of KeyProperty that takes care of this; e.g.
class LinkedKeyProperty(ndb.KeyProperty):
    def _fix_up(self, cls, code_name):
        super(LinkedKeyProperty, self)._fix_up(cls, code_name)
        modelclass = ndb.Model._kind_map[self._kind]
        collection_name = '%s_ref_%s_to_%s' % (cls.__name__,
                                               code_name,
                                               modelclass.__name__)
        setattr(modelclass, collection_name, (cls, self))
Exactly how you pick the name for the collection and the value to store there is up to you; just put something there that makes it easy for you to follow the link back. The example would create a new attribute on Contact:
Contact.PhoneNumber_ref_contact_to_Contact == (PhoneNumber, PhoneNumber.contact)
[edited to make the code work and to add an example. :-) ]
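The registration idea can be tried without the App Engine SDK. The following is a minimal sketch in plain Python 3 and is not NDB code: `Model`, `LinkedKey`, `_kinds`, `_backrefs` and `related` are hypothetical stand-ins invented for illustration, and `__set_name__` plays roughly the role that `_fix_up` plays in the answer above. It assumes the target class is defined before any class that refers to it.

```python
class Model:
    """Hypothetical stand-in for a model base class (not ndb.Model)."""

    # Maps a kind name to its class, similar in spirit to a kind map.
    _kinds = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Model._kinds[cls.__name__] = cls
        # Each kind gets its own list of (referring class, property name).
        cls._backrefs = []


class LinkedKey:
    """Hypothetical key property that records itself on the kind it targets."""

    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        # Called while the owning class is being created.
        # Assumes the target kind has already been defined.
        target = Model._kinds[self.kind]
        target._backrefs.append((owner, name))


class Contact(Model):
    """The One."""


class PhoneNumber(Model):
    """A Many."""
    contact = LinkedKey("Contact")


class Address(Model):
    """Another Many, added later without touching Contact."""
    resident = LinkedKey("Contact")


def related(model_cls):
    """List (kind name, property name) pairs that point at model_cls."""
    return [(owner.__name__, name) for owner, name in model_cls._backrefs]
```

With this in place, `related(Contact)` lists every kind and property that refers to `Contact`, which is the information a service would need to build the follow-up queries for a `depth=2` request.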

Sounds like a good use case for ndb.StructuredProperty.

Related

Is it possible to override a foreign key relation with a custom method/property?

Context
I'm working on refactoring a Django 2.X app, particularly the core model, CoreModel. There's a single database (Postgres) containing all related tables.
Instances of CoreModel will no longer live in Postgres after this refactor. They will live somewhere else, outside the scope of the Django project, let's say some AWS NoSQL database service.
There are also several satellite models (SatelliteModel) of CoreModel which will continue to live in Postgres, but CoreModel is currently modelled as a foreign key field on them.
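from django.db import models

class CoreModel(models.Model):
    pass

class SatelliteModel(models.Model):
    # on_delete is mandatory in Django 2.x; CASCADE is assumed here.
    core = models.ForeignKey(CoreModel, on_delete=models.CASCADE)

    def some_instance_method(self):
        return self.core.calculate_stuff()  # <- override self.core!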
Problem
The code is filled with references to the CoreModel relation, and I haven't been able to solve this issue.
My first naive approach was to implement a @property getter method, so that I had enough flexibility to do something like:
@property
def core(self):
    try:
        # ORM
        return self.core
    except CoreNotFound:
        # External datastore
        return aws_client.fetch_core()
With this snippet I have a circular dependency on the core name, so the idea is out.
I could rename the foreign key, but I would much rather not touch the database schema. After all, I'm already refactoring the central part of the app, and that's a very error-prone process. I'd do this only if there's no other choice.
I could rename the @property to something like current_core. This way I avoid the infinite recursion, but it would mean searching the whole code base for mentions of the relation, and since this is the central model, that would take a lot of time.
After some hours of research I'm beginning to doubt whether overriding the getter of a foreign key field is possible in the way I need. Maybe this isn't exactly what I'm looking for; it's a very unusual use case, but the requirement is also very unusual.
Any insights you can give are greatly appreciated.
UPDATE
I've forgotten to add the most crucial piece of information.
Most CoreModel instances (the historic ones) will be removed from Postgres, but a small number will remain and be moved later. In essence, only the "active" CoreModels will stay in Postgres, but all of them will eventually be moved out, while new CoreModels continue to be created.
So that rules out replacing the ForeignKey field with an integer.
You could retain but rename the foreign key and then add a property with the old name:
class SatelliteModel(models.Model):
    old_core = models.ForeignKey(CoreModel, null=True, blank=True, on_delete=models.SET_NULL)

    @property
    def core(self):
        try:
            core = self.old_core
        except CoreModel.DoesNotExist:
            core = None
        # A NULL foreign key yields None instead of raising, so handle both cases.
        if core is None:
            core = aws_client.fetch_core()
        return core
This would change the column name in your schema, although you could override the column name to prevent this:
old_core = models.ForeignKey(CoreModel, db_column='core_id', null=True, blank=True, on_delete=models.SET_NULL)
It may be possible to create a subclass of ForeignKey that would behave as you wish; if this answer is not sufficient, I can share some thoughts.
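The rename-plus-property pattern can be illustrated without Django. This is a minimal sketch under stated assumptions: `LOCAL_CORES`, `REMOTE_CORES`, `load_local`, `fetch_remote`, `CoreNotFound` and `Satellite` are hypothetical names standing in for the Postgres table, the external datastore and the model. The point it shows is that the public name `core` delegates to a differently named attribute, so the getter never calls itself.

```python
class CoreNotFound(Exception):
    """Hypothetical error raised when a core is not in the local store."""


# Stand-ins for rows still in Postgres and rows already moved out.
LOCAL_CORES = {1: "core-1 (local)"}
REMOTE_CORES = {2: "core-2 (remote)"}


def load_local(core_id):
    """Look the core up locally, raising CoreNotFound if it has moved."""
    try:
        return LOCAL_CORES[core_id]
    except KeyError:
        raise CoreNotFound(core_id)


def fetch_remote(core_id):
    """Stand-in for a call to the external datastore client."""
    return REMOTE_CORES[core_id]


class Satellite:
    def __init__(self, core_id):
        # The raw id is kept, like the core_id column in the schema.
        self.core_id = core_id

    @property
    def old_core(self):
        # Plays the role of the renamed foreign key.
        return load_local(self.core_id)

    @property
    def core(self):
        # Existing call sites keep using .core unchanged.
        try:
            return self.old_core
        except CoreNotFound:
            return fetch_remote(self.core_id)
```

Callers that previously wrote `satellite.core` do not change, whichever store ends up serving the object.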

Pulling basic mongoengine document definitions into flask-mongoengine

I've been using mongoengine for a while now and have a ton of python data processing code that relies on a common set of Object Document Models.
Now I need to access the same mongodb instances from Flask. I'd like to use the same ODM definitions.
class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)
The problem is that flask-mongoengine requires you to first set up your flask context "db" and then build your ODM definitions, inheriting the document class and fieldtypes from "db" instead of the base mongoengine classes.
class User(db.Document):
    email = db.StringField(required=True)
    first_name = db.StringField(max_length=50)
    last_name = db.StringField(max_length=50)
One solution, I suppose, is to make copies of all of the existing ODM definitions, import "db" from my main flask app, and then prepend everything with "db." If I do that, I'll have to maintain two sets of nearly identical ODM definitions.
If I simply change everything to the "db." version, that would probably break all of my legacy code.
So I'm thinking there might be a trick using super() on the document classes that can detect whether I'm importing my ODM into a Flask context or from a standalone data processing script.
I'm also thinking I don't want to have to super() every fieldtype for every document, that I should be able to build or reference a common function that took care of that for me.
However, my super() skills are weak. I'm not even certain if that is the best approach. I was hoping someone might be able and willing to share some hints as to how to approach this.

Google App Engine NDB custom key id

When I create an object with ndb's method put it creates the key automatically of the type Key(kind, id) where id is a number. All over the documentation it shows that you can use a string for the key's id but I couldn't find out how to do this automatically when an object is created.
I have a User model and I was thinking of using the user's username (since it's unique) as the key's id for faster retrieval. Is that even a good idea? Would I have any problems with the username since it's user-submitted (I'm validating the input)?
class UserModel(ndb.Model):
    ...

user_model_entity = UserModel(id='some_string', ...)
If these IDs are subject to change, this may be a bad idea. If it's your own system and you can react to potential changes, it is a fine idea, but you need to make sure the IDs will be unique and relatively stable before deciding to use them.
You specify the id of the entity at the time of creation. When you define the model, you don't set an id attribute there. Thus, for example you have:
class User(ndb.Model):
    # fields here
    pass
When you create the entity, you have:
user = User(id='username', ...)
Since the username is unique and you validate your input, then you will not have any problems with this approach.
For more information about an ndb Model constructor, you can take a look at NDB Model Class - Constructor.
Hope this helps.
You can also supply an integer ID (it does not have to be a string) for your model entity.
class User(ndb.Model):
    ...

user = User(id=1234567890, ...)
user.put()

How do I remove an item from db.ListProperty?

In a Google App Engine solution (Python), I've used the db.ListProperty as a way to describe a many-to-many relation, like so:
class Department(db.Model):
    name = db.StringProperty()

    @property
    def employees(self):
        return Employee.all().filter('departments', self.key())

class Employee(db.Model):
    name = db.StringProperty()
    departments = db.ListProperty(db.Key)
I create many-to-many relations by simply appending the Department key to the db.ListProperty like so:
employee.departments.append(department.key())
The problem is that I don't know how to actually remove this relationship again, when it is no longer needed.
I've tried Googling it, but I can't seem to find any documentation that describes db.ListProperty in detail.
Any ideas or references?
The ListProperty is just a Python list with some helper methods to make it work with GAE, so anything that applies to a list applies to a ListProperty.
employee.departments.remove(department.key())
employee.put()
Keep in mind that the data must be deserialized/reserialized every time a change is made, so if you are looking for speed when adding or removing single values you may want to go with another method of modelling the relationship like the one in the Relationship Model section of this page.
The ListProperty method also has the disadvantage of sometimes producing very large indexes if you want to search through the lists in a datastore request.
This may not be a problem for you since your lists should be relatively small, but it's something to keep in mind for future projects.
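Because the value behaves like a plain Python list, the usual list caveats apply: `remove` drops only the first matching item and raises `ValueError` when the item is absent. The sketch below uses ordinary strings in place of `db.Key` objects, and `remove_all` is a hypothetical helper, not part of the db API. On a real entity the change would still need to be saved with `put()` afterwards.

```python
def remove_all(values, target):
    """Remove every occurrence of target in place and return how many were removed."""
    before = len(values)
    values[:] = [v for v in values if v != target]
    return before - len(values)


# Strings stand in for department keys; "sales" was appended twice by mistake.
departments = ["sales", "ops", "sales"]

# list.remove drops only the first match.
departments.remove("sales")
after_first_remove = list(departments)

# Removing something that is not in the list raises ValueError.
missing = False
try:
    departments.remove("legal")
except ValueError:
    missing = True

# Clear out any remaining duplicates in one pass.
removed = remove_all(departments, "sales")
```

Guarding the call with `if key in employee.departments:` is an alternative to catching `ValueError` when a missing relation is expected.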
Found it via trial and error:
employee.departments.remove(department.key())

Query over a db.Model retrieves all the properties of the db.Model whether they are necessary or not. Is there an alternative?

I have a db.Model which has several properties, as described below:
class Doc(db.Model):
    docTitle = db.StringProperty(required=True)
    docText = db.TextProperty()
    docUser = db.UserProperty(required=True)
    docDate = db.DateTimeProperty(auto_now_add=True)
In the template I just list the names of these documents as links. For that purpose I use the following query:
docList = Doc.gql("WHERE docUser = :1 ORDER BY docDate DESC", user)
As you can see docList includes all properties (including the "TextProperty"). However, I just use its docTitle and key() in my view.
Is there an alternative way to retrieve just the requested attributes of the model class?
If not, should I use PolyModel classes to differentiate the listing and actual usage of the Doc model class by creating another model class for the docText property?
EDIT: I am using webapp web framework in google app engine...
Entities are stored in the App Engine datastore as serialized protocol buffers, which are returned as a single blob, so it's not possible to just retrieve part of them. In any case, this would only save on RPC overhead between the datastore and your app, so the savings would be minimal.
If the size of each entity is significant, you may want to separate the model out, as you suggest. You don't need to (and probably shouldn't) use PolyModel, though - just use two model classes, a 'summary' and a 'detail' one.
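The summary/detail split can be pictured with plain Python objects. This is only a sketch of the data layout, not datastore code: `DocSummary`, `DocDetail`, the two dictionaries and both functions are hypothetical stand-ins, and the shared `doc_id` is an assumed way of linking the two records.

```python
from dataclasses import dataclass


@dataclass
class DocSummary:
    """Small record used by the listing view."""
    doc_id: int
    title: str
    user: str


@dataclass
class DocDetail:
    """Large record loaded only when a single document is opened."""
    doc_id: int  # same id as the matching DocSummary
    text: str


# Dictionaries stand in for the two kinds in the datastore.
summaries = {
    1: DocSummary(1, "Notes", "alice"),
    2: DocSummary(2, "Draft", "bob"),
}
details = {
    1: DocDetail(1, "long body ..."),
    2: DocDetail(2, "another long body ..."),
}


def list_titles(user):
    """Listing view: reads summaries only, never the large text."""
    return [(s.doc_id, s.title) for s in summaries.values() if s.user == user]


def load_text(doc_id):
    """Detail view: fetch the body through the shared id."""
    return details[doc_id].text
```

The listing path never touches the large text field, which is the saving the split is meant to provide.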
