GAE: Find children of an ndb entity

In GAE I'm trying to find a way to implement or construct something similar to a foreign key constraint in a SQL database. Basically, deleting an entity that is referred to by another entity via a foreign key constraint should not be allowed (in other words: deleting a parent should not be allowed if there are children that refer to that parent).
I tried the KeyProperty in the ndb datastore, but that gives me no way to find all dependent entities of the entity I want to delete. The ancestor hierarchy doesn't seem to cut it either. I can query the ancestor from the children, but there doesn't seem to be a way to query the children from the ancestor.
Is there any way to either get the children from an ancestor or another database design in the GAE ndb datastore to implement this foreign key constraint?

"Is there any way to either get the children from an ancestor?"
Yes, it's called a query, used in conjunction with either a KeyProperty in the child holding the key of the parent, or the parent as the ancestor in the child's key.
You can find all children of an ancestor, irrespective of kind, using a kindless ancestor query: https://cloud.google.com/appengine/docs/python/datastore/queries?hl=en#Python_Kindless_ancestor_queries
There is no such thing as a foreign key constraint in the datastore.
Not sure how it doesn't "cut it" apart from not performing the delete for you.
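The kindless ancestor query works because a child's key embeds the keys of its ancestors. The toy sketch below is plain Python, not the ndb API: the store, the tuple-of-(kind, id) key layout and the helper names are hypothetical. It finds descendants by prefix-matching key paths, regardless of kind. In ndb itself a kindless ancestor query is written along the lines of `ndb.Query(ancestor=parent_key)`.

```python
# Toy in-memory "datastore" (hypothetical, not ndb): a key is a tuple of
# (kind, id) pairs, where every pair before the last is an ancestor.
store = {}


def put(key, **props):
    """Store an entity under its full key path and return the key."""
    store[key] = props
    return key


def descendants(ancestor):
    """Mimic a kindless ancestor query: every key whose path starts with
    `ancestor`, whatever its kind. A real ancestor query also returns the
    ancestor itself; it is excluded here so only children remain."""
    n = len(ancestor)
    return sorted(k for k in store if len(k) > n and k[:n] == ancestor)


ann = put((("Person", 1),), name="Ann")
put(ann + (("Shoe", 10),), size=38)     # child of Ann, kind Shoe
put(ann + (("Car", 20),), make="Fiat")  # child of Ann, a different kind
put((("Person", 2),), name="Bob")       # unrelated root entity

print(descendants(ann))
```

Both the Shoe and the Car come back, because the match is on the ancestor part of the key, not on the kind.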

As Tim points out, if you use the ancestor solution you can easily query for all children. The same would be true if you use KeyProperty; you can easily query for all entities pointing at the current key:
MyChildModel.query(MyChildModel.my_key_property == my_parent_entity.key)
Again, it's not clear how this fairly simple solution wouldn't "cut it".
However it is worth mentioning that it is a fundamental mistake to treat the datastore as if it were a relational database. It is not, and you should not try. You cannot enforce referential integrity; eventual consistency makes that impossible.
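Because the datastore will not refuse the delete, the check has to live in application code. A minimal sketch in plain Python (all names hypothetical; a dict and a list stand in for ndb entities and for the KeyProperty query above) of a guarded delete, with the consistency caveat noted in the docstring:

```python
# Hypothetical in-memory entities: each child stores its parent's id in an
# "owner" field, playing the role of a KeyProperty.
parents = {"p1": {"name": "Ann"}, "p2": {"name": "Bob"}}
child_rows = [
    {"id": "c1", "owner": "p1"},
    {"id": "c2", "owner": "p1"},
]


class StillReferencedError(Exception):
    """Raised when a parent cannot be deleted because children point at it."""


def referrers(parent_id):
    """Equivalent of filtering the child kind on its reference property."""
    return [c for c in child_rows if c["owner"] == parent_id]


def delete_parent(parent_id):
    """Delete a parent only if nothing refers to it.

    This is a best-effort application check, not an enforced constraint:
    in the datastore the lookup and the delete are separate operations, and
    a non-ancestor query may not yet see a child that was just written.
    """
    refs = referrers(parent_id)
    if refs:
        raise StillReferencedError(
            f"{parent_id} is still referenced by {len(refs)} entities")
    del parents[parent_id]
```

Here `delete_parent("p2")` succeeds, while `delete_parent("p1")` raises because two children still point at it.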

Related

In database-related documentation, what do "parent" and "child" refer to?

It makes sense that we call the table that uses values from another table the child (the referencing column is defined as a foreign key); in this regard, the other table is the parent. This makes the most sense in one-to-many relationships:
one record in the parent table is related to one or more records in the child table. In short, one parent has many children.
This was mentioned in this answer on Stack Overflow.
But in this section of the SQLAlchemy documentation about many-to-one relationships, they place the ForeignKey in the parent table. What does that mean?
Also, in this section of the SQLAlchemy documentation about many-to-many relationships between two tables, implemented with an association object rather than an association table, they refer to one of the tables as the parent and the other as the child. What does that mean?
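In the one-to-many convention the question describes, the foreign key column lives in the child table, and it is the child rows that block deletion of the parent row. A minimal sketch with Python's built-in sqlite3 module (table and function names are illustrative) shows both points; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
# The child table holds the foreign key: many child rows point at one parent row.
conn.execute(
    "CREATE TABLE child ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL REFERENCES parent(id),"
    " name TEXT)"
)
conn.execute("INSERT INTO parent (id, name) VALUES (1, 'Ann')")
conn.executemany(
    "INSERT INTO child (parent_id, name) VALUES (?, ?)",
    [(1, "first"), (1, "second")],
)
conn.commit()


def delete_parent(conn, parent_id):
    """Try to delete a parent row; return False if the FK constraint blocks it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DELETE FROM parent WHERE id = ?", (parent_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

While the two child rows exist, `delete_parent(conn, 1)` returns False; after the children are removed it succeeds.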

Google NDB: Adding an entity with non existing parent

I am working on a web application based on Google App Engine (Python / Webapp2) and Google NDB Datastore.
I assumed that if I tried to add a new entity using, as its parent key, the key of an entity that no longer exists, an exception would be thrown. Instead, I have found that the entity is actually created.
Am I doing something wrong?
I could first check whether the parent still exists with a keys_only query. Does that consume GAE read quota?

You can create a key for any entity whether this entity exists or not. This is because a key is simply an encoding of an entity kind and either an id or name (and ancestor keys, if any).
This means that you can store a child entity before a parent entity is saved, as long as you know the parent's id or name. You cannot reassign a child from one parent to another, though.

As for your second question, the AppEngine pricing page says:
Calls to the datastore API result in the following billable operations. Small datastore operations include calls to allocate datastore ids or keys-only queries. These operations are free.
Complementing @andrei's answer to the first question: no key reference in NDB is checked to ensure it refers to an existing entity. This is true for keys used as parents as well as for keys stored in a KeyProperty within an entity.
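To make "a key is just an encoding" concrete, here is a toy model in plain Python (not the ndb API; the store, the key layout and both helpers are hypothetical). Storing a child under a missing parent succeeds, and an existence check has to be added by the application. In ndb the natural form of that check would be fetching the parent by key (`parent_key.get()`, which returns None when the entity is absent) rather than running a query.

```python
# Hypothetical toy model of key paths (not the ndb API): a key is a tuple of
# (kind, id) pairs, the last pair naming the entity and the rest its ancestors.
store = {}


def make_key(*pairs):
    """Build a key path. Like an ndb key, it is only an encoding: nothing
    checks that the ancestors exist."""
    return tuple(pairs)


def put(key, **props):
    """Store an entity with no referential check on its parent."""
    store[key] = props
    return key


def put_with_parent_check(key, **props):
    """Hypothetical helper: refuse to store a child whose parent is missing.

    In a real app the check and the write would need to run in one
    transaction; otherwise the parent could be deleted in between.
    """
    parent = key[:-1]
    if parent and parent not in store:
        raise KeyError(f"parent {parent} does not exist")
    return put(key, **props)
```

For example, `put(make_key(("Person", 99), ("Shoe", 1)))` stores an orphan without complaint, whereas `put_with_parent_check` raises `KeyError` for the same key.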

AppEngine model structure for user/follower relations

I have users who have "followers". I need to be able to navigate up and down the tree of users/followers. If I use ancestor relations, I'm eventually going to hit App Engine's 1 MB limit on entities when a user has many followers.
What's the best way to structure this data on AppEngine?

You cannot use ancestor relations for a simple reason that your use case allows circular references (I follow you, you follow me).
The solution depends on your expected usage patterns. You can choose between two options:
(A) In each user entity, store a list of IDs of the other users that this user is following.
(B) Create a separate entity that has two properties: "User" and "Follower". Each entity represents a single "connection" between two users.
While the first option seems simpler, you may run into exploding indexes problem. Besides, it may turn out to be a more expensive solution as each change in user relationships will require an overwrite of a user entity with updates to all of its other indexes. The second solution does not have these drawbacks, but may require a little extra code.
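Option (B) can be sketched in plain Python with one record per follow relationship. This is an in-memory stand-in (the class and method names are hypothetical); in the datastore each pair would be a separate entity with two key-valued properties, queried with an equality filter on one side or the other.

```python
class FollowGraph:
    """In-memory stand-in for option (B): one record per follow relationship.

    Each (follower, followee) pair plays the role of a separate "connection"
    entity, so a user record never grows with the number of followers, and a
    circular follow (A follows B, B follows A) is just two independent records.
    """

    def __init__(self):
        self.edges = set()

    def follow(self, follower, followee):
        self.edges.add((follower, followee))

    def unfollow(self, follower, followee):
        self.edges.discard((follower, followee))

    def followers_of(self, user):
        # Like querying connection entities filtered on the followee.
        return sorted(f for f, t in self.edges if t == user)

    def following(self, user):
        # Like querying connection entities filtered on the follower.
        return sorted(t for f, t in self.edges if f == user)
```

Navigating "up" and "down" is then just the two queries, and changing a relationship only writes or deletes one small record instead of rewriting a user entity.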

Simple explanation of Google App Engine NDB Datastore

I'm creating a Google App Engine application (python) and I'm learning about the general framework. I've been looking at the tutorial and documentation for the NDB datastore, and I'm having some difficulty wrapping my head around the concepts. I have a large background with SQL databases and I've never worked with any other type of data storage system, so I'm thinking that's where I'm running into trouble.
My current understanding is this: The NDB datastore is a collection of entities (analogous to DB records) that have properties (analogous to DB fields/columns). Entities are created using a Model (analogous to a DB schema). Every entity has a key that is generated for it when it is stored. This is where I run into trouble because these keys do not seem to have an analogy to anything in SQL DB concepts. They seem similar to primary keys for tables, but those are more tightly bound to records, and in fact are fields themselves. These NDB keys are not properties of entities, but are considered separate objects from entities. If an entity is stored in the datastore, you can retrieve that entity using its key.
One of my big questions is where do you get the keys for this? Some of the documentation I saw showed examples in which keys were simply created. I don't understand this. It seemed that when entities are stored, the put() method returns a key that can be used later. So how can you just create keys and define ids if the original keys are generated by the datastore?
Another thing that I seem to be struggling with is the concept of ancestry with keys. You can define parent keys of whatever kind you want. Is there a predefined schema for this? For example, if I had a model subclass called 'Person', and I created a key of kind 'Person', can I use that key as a parent of any other type? Like if I wanted a 'Shoe' key to be a child of a 'Person' key, could I also then declare a 'Car' key to be a child of that same 'Person' key? Or will I be unable to after adding the 'Shoe' key?
I'd really just like a simple explanation of the NDB datastore and its API for someone coming from a primarily SQL background.

I think you're overcomplicating things in your mind. When you create an entity, you can either give it a named key that you've chosen yourself, or leave that out and let the datastore choose a numeric ID. Either way, when you call put, the datastore will return the key, which is stored in the form [<entity_kind>, <id_or_name>] (actually this also includes the application ID and any namespace, but I'll leave that out for clarity).
You can make entities members of an entity group by giving them an ancestor. That ancestor doesn't actually have to refer to an existing entity, although it usually does. All that happens with an ancestor is that the entity's key includes the key of the ancestor: so it now looks like [<parent_entity_kind>, <parent_id_or_name>, <entity_kind>, <id_or_name>]. You can now only get the entity by including its parent key. So, in your example, the Shoe entity could be a child of the Person, whether or not that Person has previously been created: it's the child that knows about the ancestor, not the other way round.
(Note that that ancestry path can be extended arbitrarily: the child entity can itself be an ancestor, and so on. In this case, the group is determined by the entity at the top of the tree.)
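The flat key notation used in this answer can be mirrored directly. A small sketch in plain Python (the tuple layout follows the [kind, id, kind, id] notation above; the function names are hypothetical, not ndb calls) shows that the parent is derived from the child's own key and that the group is named by the root pair:

```python
def make_key(*flat):
    """Build a key path in the flat (kind, id, kind, id, ...) form."""
    if len(flat) % 2:
        raise ValueError("key paths need (kind, id) pairs")
    return tuple(flat)


def parent_of(key):
    """Parent key path, or None for a root entity (no ancestor)."""
    return key[:-2] or None


def entity_group(key):
    """The group is identified by the topmost (root) pair of the path."""
    return key[:2]


shoe = make_key("Person", "ann", "Shoe", 7)
lace = make_key("Person", "ann", "Shoe", 7, "Lace", 1)
```

Here `parent_of(lace)` is the Shoe's key, and both `shoe` and `lace` belong to the group rooted at `("Person", "ann")`. Nothing in the Person's own key records these children.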
Saving entities as part of a group has advantages in terms of consistency: a query inside an entity group is always guaranteed to be strongly consistent, whereas a query outside an entity group is only eventually consistent. However, there are also disadvantages: the write rate of an entity group is limited to roughly one write per second for the whole group.

Datastore keys are a little more analogous to internal SQL row identifiers, but of course not entirely. Identifiers in Appengine are a bit like SQL primary keys. To support decentralised concurrent creation of new keys by many application instances in a cloud of servers, AppEngine internally generates the keys to guarantee uniqueness. Your application defines parameters (application identifier, optional namespace, kind and optional entity identifier) which AppEngine uses to seed its key generator. If you do not provide an identifier, AppEngine will generate a unique numeric identifier that you can read.
It is occasionally more efficient to request multiple new keys in bulk. AppEngine then generates a range of numeric entity identifiers for you, and you can read each identifier from the returned keys.
Ancestry is used to group together writes of related entities of all kinds for the purpose of transactions and isolation. There is no predefined schema for this but you are limited to one parent per child.
In your example, one particular Shoe might have a particular Person as parent. Another particular Shoe could have a Horse as parent. And another Shoe might have no parent. Many entities of all kinds can have the same parent, so several Car entities could also have that initial Person as parent. The Datastore is schemaless, so it's up to your application to allow or forbid a Car to have a Horse as parent.
Note that a child knows its parent, but a parent does not know its children, because implementing that would impact scalability.
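Since the datastore is schemaless, any rule about which kinds may be parents of which is application code. A minimal sketch of such a rule table in plain Python (the table, the flat key layout and the function are hypothetical illustrations of the Shoe/Person/Horse/Car example):

```python
# Hypothetical application rule table: which parent kinds each child kind
# may use. None means "may be a root entity with no parent".
ALLOWED_PARENT_KINDS = {
    "Shoe": {"Person", "Horse", None},
    "Car": {"Person"},
}


def check_parent(child_kind, parent_key):
    """Raise ValueError if the app's rules forbid this parent for child_kind.

    parent_key is a flat (kind, id, ...) path or None; only the kind of its
    last pair matters here. The datastore itself performs no such check.
    """
    parent_kind = parent_key[-2] if parent_key else None
    allowed = ALLOWED_PARENT_KINDS.get(child_kind, set())
    if parent_kind not in allowed:
        raise ValueError(f"{child_kind} may not have parent kind {parent_kind!r}")
```

With this table, a Shoe may belong to a Person, a Horse or nobody, while a Car under a Horse is rejected before anything is written.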

Discovering referers to SQLAlchemy object

I have a lot of model classes with relations between them, and a CRUD interface to edit them. The problem is that some objects can't be deleted because other objects refer to them. Sometimes I can set up an ON DELETE rule to handle this case, but in most cases I don't want related objects to be deleted automatically until they are unbound manually. Anyway, I'd like to present the editor with a list of objects referring to the currently viewed one and highlight those that prevent its deletion due to a FOREIGN KEY constraint. Is there a ready-made solution to discover referrers automatically?
Update
The task seems to be quite common (e.g. the Django ORM shows all dependencies), so I'm surprised there is no solution for it yet.
There are two directions suggested:
Enumerate all relations of the current object and go through their backrefs. But there is no guarantee that all relations have a backref defined. Moreover, in some cases a backref is meaningless. Although I could define one everywhere, I don't like doing it this way, and it's not reliable.
(Suggested by van and stephan) Check all tables of the MetaData object and collect dependencies from their foreign_keys property (the code of sqlalchemy_schemadisplay can be used as an example, thanks to stephan's comments). This catches all dependencies between tables, but what I need is dependencies between model classes. Some foreign keys are defined in intermediate tables that have no corresponding models (they are used as secondary in relations). Sure, I could go further and find the related model (I have yet to find a way to do that), but it looks too complicated.
Solution
Below is a method of my base model class (designed for the declarative extension) that I use as a solution. It is not perfect and doesn't meet all my requirements, but it works for the current state of my project. The result is collected as a dictionary of dictionaries, so I can show referrers grouped by object class and property. I haven't decided yet whether it's a good idea, since the list of referrers is sometimes huge and I'm forced to limit it to some reasonable number.
from sqlalchemy.orm import RelationshipProperty, class_mapper, object_session


def _get_referers(self):
    """Return {model_class: {relationship: (count, query)}} for objects referring to self."""
    db = object_session(self)
    cls = type(self)
    metadata = cls.__table__.metadata
    result = {}
    # _mapped_models is my extension. It is collected by metaclass, so I didn't
    # look for other ways to find all model classes.
    for other_class in metadata._mapped_models:
        queries = {}
        for prop in class_mapper(other_class).iterate_properties:
            # Only relationships (RelationshipProperty, formerly PropertyLoader)
            # whose target class is cls or one of its base classes.
            if not (isinstance(prop, RelationshipProperty)
                    and issubclass(cls, prop.mapper.class_)):
                continue
            owner = prop.parent.class_
            query = db.query(owner)
            attr = getattr(owner, prop.key)
            if prop.uselist:
                # Collection relationship: objects whose list contains self.
                query = query.filter(attr.contains(self))
            else:
                # Scalar (many-to-one) relationship pointing directly at self.
                query = query.filter(attr == self)
            count = query.count()
            if count:
                queries[prop] = (count, query)
        if queries:
            result[other_class] = queries
    return result
Thanks to all who helped me, especially stephan and van.

SQL: I have to absolutely disagree with S.Lott's answer.
I am not aware of an out-of-the-box solution, but it is definitely possible to discover all the tables that have FOREIGN KEY constraints referencing a given table. One needs to use the INFORMATION_SCHEMA views properly, such as REFERENTIAL_CONSTRAINTS, KEY_COLUMN_USAGE, TABLE_CONSTRAINTS, etc. See SQL Server example. With some limitations and extensions, most modern relational databases support the INFORMATION_SCHEMA standard. Once you have all the FK information and the object (row) in the table, it is a matter of running a few SELECT statements to get all rows in other tables that refer to the given row and prevent it from being deleted.
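The two steps described above (read the FK metadata, then run one SELECT per referencing column) can be sketched with Python's built-in sqlite3 module. SQLite does not provide INFORMATION_SCHEMA, so this sketch swaps in SQLite's `PRAGMA foreign_key_list`; the function name and the single-column-FK assumption are mine, not part of the answer.

```python
import sqlite3


def find_referrers(conn, table, row_id):
    """Count rows in other tables whose foreign keys point at `row_id` in `table`.

    SQLite has no INFORMATION_SCHEMA, so this reads PRAGMA foreign_key_list
    instead. Assumes single-column foreign keys whose referenced column holds
    `row_id`, and plain table names that need no escaping.
    """
    result = {}
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]
    for other in tables:
        fks = conn.execute(f'PRAGMA foreign_key_list("{other}")').fetchall()
        for fk in fks:
            # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
            ref_table, from_col = fk[2], fk[3]
            if ref_table.lower() != table.lower():
                continue
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{other}" WHERE "{from_col}" = ?',
                (row_id,),
            ).fetchone()[0]
            if count:
                result[(other, from_col)] = count
    return result
```

A non-empty result means the row is still referenced; an empty dict means no declared foreign key blocks the delete (which, as the next answer points out, says nothing about undeclared relationships).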
SQLAlchemy: As noted by stephan in his comment, if you use the ORM with backrefs for relations, then it should be quite easy to get the list of parent objects that keep a reference to the object you are trying to delete, because those objects are basically mapped properties of your object (child1.Parent).
If you work with SQLAlchemy Table objects (or don't always use backrefs for relations), then you would have to get the foreign_keys of all the tables, and then call the references(...) method on each of those ForeignKey objects, passing your table as the argument. This way you will find all the FKs (and tables) that reference the table your object maps to. Then you can query all the objects that keep a reference to your object by constructing a query for each of those FKs.

In general, there's no way to "discover" all of the references in a relational database.
In some databases, they may use declarative referential integrity in the form of explicit Foreign Key or Check constraints.
But there's no requirement to do this. It can be incomplete or inconsistent.
Any query can include a FK relationship that is not declared. Without the universe of all queries, you can't know the relationships which are used but not declared.
To find "referers" in general, you must actually know the database design and have all queries.

For each model class, you can easily see if all its one-to-many relations are empty simply by asking for the list in each case and seeing how many entries it contains. (There is probably a more efficient way implemented in terms of COUNT, too.) If there are any foreign keys relating to the object, and you have your object relations set up correctly, then at least one of these lists will be non-zero in length.
