Google NDB: Adding an entity with non existing parent

I am working on a web application based on Google App Engine (Python / Webapp2) and Google NDB Datastore.
I assumed that adding a new entity whose parent key is the key of a no-longer-existing entity would throw an exception. Instead, I found that the entity is actually created.
Am I doing something wrong?
I could check beforehand whether the parent still exists through a keys_only query. Does that consume GAE read quotas?

You can create a key for any entity whether this entity exists or not. This is because a key is simply an encoding of an entity kind and either an id or name (and ancestor keys, if any).
This means that you can store a child entity before a parent entity is saved, as long as you know the parent's id or name. You cannot reassign a child from one parent to another, though.
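The point that a key is only an encoding of kinds and ids can be illustrated with a small in-memory sketch. This is a toy model, not the NDB API: `ToyDatastore`, `child_key`, and `put_child_checked` are hypothetical names invented for the illustration, and a key is represented as a tuple of (kind, id) pairs.

```python
# Toy in-memory model of the idea above; this is NOT the NDB API.
# A key is just a tuple of (kind, id_or_name) pairs, ancestors first.

class ToyDatastore(object):
    def __init__(self):
        self._entities = {}  # key path -> dict of properties

    def put(self, key, properties):
        # Nothing here looks up the parent: the key is only a path.
        self._entities[key] = dict(properties)
        return key

    def get(self, key):
        # Returns None for a key that was never stored.
        return self._entities.get(key)


def child_key(parent_key, kind, id_or_name):
    """Build a child key by appending one (kind, id) pair to the parent's path."""
    return parent_key + ((kind, id_or_name),)


def put_child_checked(store, parent_key, key, properties):
    """Hypothetical helper: refuse the write if the parent is absent."""
    if store.get(parent_key) is None:
        raise LookupError("parent %r does not exist" % (parent_key,))
    return store.put(key, properties)


store = ToyDatastore()
parent = (("Parent", 1),)              # never stored
orphan = child_key(parent, "Child", 7)
store.put(orphan, {"name": "orphan"})  # succeeds without the parent

parent_exists = store.get(parent) is not None
orphan_stored = store.get(orphan) is not None
```

In NDB itself the equivalent existence check would be a get on the parent key, which returns None when nothing is stored. To be reliable, that check and the put would presumably have to run in the same transaction.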

As for your second question, the AppEngine pricing page says:
Calls to the datastore API result in the following billable operations. Small datastore operations include calls to allocate datastore ids or keys-only queries. These operations are free.
To complement @andrei's answer to the first question: no key reference in NDB is checked for referring to an existing entity. This is true for keys used as a parent as well as for keys stored in a KeyProperty within an entity.

Related

Google Cloud Datastore - is it possible to use a transaction for a single root entity?

I'm new to Google CloudDatastore and reading a document.
(Note: we don't plan to use Google AppEngine, just DataStore only.)
According to the document, DataStore supports transaction but
If you want to use queries within a transaction,
your data must be organized into entity groups in such a way
that you can specify ancestor filters that will match the right data.
So I thought as long as I want to use transaction, I am forced to create some parent key and set it as an ancestor. And all entities under the parent have a limitation that update and transaction can be only performed once per second.
However, I also see a very simple example of insert here:
https://cloud.google.com/datastore/docs/concepts/entities#datastore-insert-python
with client.transaction():
    incomplete_key = client.key('Task')
    task = datastore.Entity(key=incomplete_key)
    task.update({
        'category': 'Personal',
        'done': False,
        'priority': 4,
        'description': 'Learn Cloud Datastore'
    })
    client.put(task)
It doesn't specify a parent and uses a single root entity inside a transaction, does it? Even among the examples on the Transactions page, only the one for a "read-only transaction" explicitly specifies a parent. Do the other ones simply omit a parent that actually exists?
I'm wondering whether I can use a transaction without an entity group (that is, without a big performance degradation) if I can specify the key of a root entity, but there is no such description in the document...
I'd appreciate it if someone could clarify the behavior. Thanks.

Transactions across multiple entity groups are indeed allowed (with a limit of 25 entity groups per transaction, according to the documentation).
If you want to use queries within a transaction,
Note this key sentence in the text you quoted. It is saying that any queries you want to issue inside a transaction need to be ancestor queries. This is because non-ancestor queries are eventually consistent, so the transaction engine could not reason about state changes and therefore could not know when to fail or commit the transaction. It is not saying that you cannot do transactions across entity groups.
It doesn't specify a parent and use a single root entity inside a
transaction, does it ?
I think this is the other source of confusion. Only child entities have a parent specified, which denotes the entity group they are in. When no parent is specified, the entity in question is a root entity (its parent is the root). Another way of saying this is that every root entity is its own entity group.
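A minimal sketch of that rule, assuming keys are modeled as tuples of (kind, id) pairs rather than real Datastore keys: the entity group is whatever sits at the root of the key path, so counting distinct roots tells you how many groups a transaction would touch. The function names are hypothetical, and the limit of 25 is the figure quoted above.

```python
from collections import defaultdict

MAX_GROUPS_PER_TRANSACTION = 25  # limit quoted in the answer above


def entity_group(key_path):
    """The entity group is identified by the root (first) element of the key path."""
    return key_path[0]


def groups_touched(key_paths):
    """Map each entity group root to the key paths that belong to it."""
    groups = defaultdict(list)
    for path in key_paths:
        groups[entity_group(path)].append(path)
    return dict(groups)


def fits_in_one_transaction(key_paths):
    """True when the keys span no more groups than the quoted limit."""
    return len(groups_touched(key_paths)) <= MAX_GROUPS_PER_TRANSACTION


keys = [
    (("Task", 1),),                       # root entity: its own group
    (("Task", 2),),                       # another root: another group
    (("TaskList", "home"), ("Task", 3)),  # child: belongs to its root's group
    (("TaskList", "home"), ("Task", 4)),
]
groups = groups_touched(keys)
```

Here the four keys fall into three entity groups: two single-entity groups for the root tasks and one shared group under the hypothetical `TaskList` root.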

Technically, the task entity in your description constitutes an entity group even though it has no child entities. The maximum number of entity groups allowed in a single transaction is 25, so if you try to create more than 25 top-level entities within one transaction using this pattern, the transaction will fail.
The way I avoid performance hits is to use multiple entity groups. I structure my datastore so that I have multiple root entities, and I try to limit the number of transactions within any one entity group.

GAE: Find children of an ndb entity

In GAE I'm trying to find a way to implement something similar to a foreign key constraint in a SQL database. Basically, deleting an entity that is referred to by another entity via a foreign key constraint should not be allowed (in other words: deleting a parent should not be allowed if there are children that refer to that parent).
I tried the KeyProperty in the ndb datastore, but that gives me no option to find all dependent entities from the entity I want to delete. The ancestor hierarchy doesn't seem to cut it either. I can query the ancestor from the children, but there doesn't seem to be a way to query the children from the ancestor.
Is there any way to either get the children from an ancestor or another database design in the GAE ndb datastore to implement this foreign key constraint?

Is there any way to either get the children from an ancestor
Yes it's called a query, used in conjunction with a KeyProperty in the child holding the key of the parent, or having the parent as the ancestor of the child key.
You can find all children of an ancestor irrespective of kind using a kindless ancestor query - https://cloud.google.com/appengine/docs/python/datastore/queries?hl=en#Python_Kindless_ancestor_queries
There is no such thing as a foreign key constraint in the datastore.
Not sure how it doesn't "cut it" apart from not performing the delete for you.

As Tim points out, if you use the ancestor solution you can easily query for all children. The same would be true if you use KeyProperty; you can easily query for all entities pointing at the current key:
MyChildModel.query(MyChildModel.my_key_property == my_parent_entity.key)
Again, it's not clear how this fairly simple solution wouldn't "cut it".
However it is worth mentioning that it is a fundamental mistake to treat the datastore as if it were a relational database. It is not, and you should not try. You cannot enforce referential integrity; eventual consistency makes that impossible.
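As a rough illustration of the two lookups described in these answers, here is an in-memory sketch that refuses to delete an entity while anything descends from it or points at it. It is a toy model, not NDB: entities live in a plain dict keyed by a tuple of (kind, id) pairs, and `descendants`, `referrers`, and `delete_if_unreferenced` are hypothetical helper names.

```python
def descendants(entities, ancestor_key):
    """Keys whose path starts with ancestor_key (a kindless ancestor query in spirit)."""
    n = len(ancestor_key)
    return [k for k in entities if len(k) > n and k[:n] == ancestor_key]


def referrers(entities, target_key, prop):
    """Keys of entities whose property `prop` holds target_key."""
    return [k for k, props in entities.items() if props.get(prop) == target_key]


def delete_if_unreferenced(entities, key, prop):
    """Delete `key` only when nothing descends from it or points at it."""
    blockers = descendants(entities, key) + referrers(entities, key, prop)
    if blockers:
        raise ValueError("still referenced by %d entities" % len(blockers))
    del entities[key]


ann = (("Author", "ann"),)
bob = (("Author", "bob"),)
entities = {
    ann: {},
    bob: {},
    ann + (("Book", 1),): {"title": "First"},  # child via ancestor path
    (("Review", 9),): {"author": ann},         # reference via a key-valued property
}
```

Against the real datastore such a check is only advisory, for the reason given above: a non-ancestor query may not yet see a recently written child, so it cannot provide a true referential-integrity guarantee.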

Simple explanation of Google App Engine NDB Datastore

I'm creating a Google App Engine application (python) and I'm learning about the general framework. I've been looking at the tutorial and documentation for the NDB datastore, and I'm having some difficulty wrapping my head around the concepts. I have a large background with SQL databases and I've never worked with any other type of data storage system, so I'm thinking that's where I'm running into trouble.
My current understanding is this: The NDB datastore is a collection of entities (analogous to DB records) that have properties (analogous to DB fields/columns). Entities are created using a Model (analogous to a DB schema). Every entity has a key that is generated for it when it is stored. This is where I run into trouble because these keys do not seem to have an analogy to anything in SQL DB concepts. They seem similar to primary keys for tables, but those are more tightly bound to records, and in fact are fields themselves. These NDB keys are not properties of entities, but are considered separate objects from entities. If an entity is stored in the datastore, you can retrieve that entity using its key.
One of my big questions is where do you get the keys for this? Some of the documentation I saw showed examples in which keys were simply created. I don't understand this. It seemed that when entities are stored, the put() method returns a key that can be used later. So how can you just create keys and define ids if the original keys are generated by the datastore?
Another thing that I seem to be struggling with is the concept of ancestry with keys. You can define parent keys of whatever kind you want. Is there a predefined schema for this? For example, if I had a model subclass called 'Person', and I created a key of kind 'Person', can I use that key as a parent of any other type? Like if I wanted a 'Shoe' key to be a child of a 'Person' key, could I also then declare a 'Car' key to be a child of that same 'Person' key? Or will I be unable to after adding the 'Shoe' key?
I'd really just like a simple explanation of the NDB datastore and its API for someone coming from a primarily SQL background.

I think you're overcomplicating things in your mind. When you create an entity, you can either give it a named key that you've chosen yourself, or leave that out and let the datastore choose a numeric ID. Either way, when you call put, the datastore will return the key, which is stored in the form [<entity_kind>, <id_or_name>] (actually this also includes the application ID and any namespace, but I'll leave that out for clarity).
You can make entities members of an entity group by giving them an ancestor. That ancestor doesn't actually have to refer to an existing entity, although it usually does. All that happens with an ancestor is that the entity's key includes the key of the ancestor: so it now looks like [<parent_entity_kind>, <parent_id_or_name>, <entity_kind>, <id_or_name>]. You can now only get the entity by including its parent key. So, in your example, the Shoe entity could be a child of the Person, whether or not that Person has previously been created: it's the child that knows about the ancestor, not the other way round.
(Note that that ancestry path can be extended arbitrarily: the child entity can itself be an ancestor, and so on. In this case, the group is determined by the entity at the top of the tree.)
Saving entities as part of a group has advantages in terms of consistency: a query inside an entity group is always guaranteed to be fully consistent, whereas a query outside a group is only eventually consistent. However, there are also disadvantages, in that the write rate of an entity group is limited to 1 per second for the whole group.

Datastore keys are a little more analogous to internal SQL row identifiers, but of course not entirely. Identifiers in Appengine are a bit like SQL primary keys. To support decentralised concurrent creation of new keys by many application instances in a cloud of servers, AppEngine internally generates the keys to guarantee uniqueness. Your application defines parameters (application identifier, optional namespace, kind and optional entity identifier) which AppEngine uses to seed its key generator. If you do not provide an identifier, AppEngine will generate a unique numeric identifier that you can read.
Eventual consistency takes time so it is occasionally more efficient to request multiple new keys in bulk. AppEngine then generates a range of numeric entity identifiers for you. You can read their values from keys as KeyProperty metadata.
Ancestry is used to group together writes of related entities of all kinds for the purpose of transactions and isolation. There is no predefined schema for this but you are limited to one parent per child.
In your example, one particular Shoe might have a particular Person as parent. Another particular Shoe could have a Horse as parent. And another Shoe might have no parent. Many entities of all kinds can have the same parent, so several Car entities could also have that initial Person as parent. The Datastore is schemaless, so it's up to your application to allow or forbid a Car to have a Horse as parent.
Note that a child knows its parent, but a parent does not know its children, because implementing that would impact scalability.
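The key layout described in these answers can be sketched in a few lines. This is an illustrative model only, assuming a key is a tuple of (kind, id) pairs; `flat_path`, `parent_of`, and `root_of` are hypothetical helpers, not NDB functions.

```python
def flat_path(key):
    """Flatten ((kind, id), ...) into [kind, id, kind, id, ...]."""
    flat = []
    for kind, ident in key:
        flat.extend([kind, ident])
    return flat


def parent_of(key):
    """Drop the last pair; a root key has no parent."""
    return key[:-1] or None


def root_of(key):
    """The top of the ancestry path, which determines the entity group."""
    return key[:1]


person = (("Person", "alice"),)
shoe = person + (("Shoe", 1),)  # a Shoe under the Person
car = person + (("Car", 1),)    # a Car under the same Person
lace = shoe + (("Lace", 1),)    # the path can be extended further
```

The child's key contains the parent's key, which is why the child "knows" its parent, while nothing in `person` changes when `shoe` or `car` is built from it.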

Is there a function to check whether an ID you want to use for an entity is available?

I think I read something about a function appengine has that can tell whether an ID / key you want to use for an entity is available, or if there was a function to get an available ID to choose. App engine team said also that we should set the ID when the entity is created and not change it. But in practice we can just copy everything to a new entity with the new ID?
Thanks!
Update
I think the function I'm looking for is allocateIDs from the docs:
http://code.google.com/appengine/docs/python/datastore/functions.html

To reserve one or more IDs, use allocate_ids(). To check whether an ID is already taken, just construct a Key for it using Key.from_path(kind, id) and try to db.get() it. Also note that IDs for keys with a parent are taken from separate pools and are only unique among keys with the same parent.
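The remark about separate ID pools can be pictured with a toy allocator that keeps one counter per (parent, kind) pair. This is only a sketch of the uniqueness scope: `ToyIdAllocator` is a hypothetical class, and the real datastore does not promise small sequential numbers like these.

```python
import itertools
from collections import defaultdict


class ToyIdAllocator(object):
    """One counter per (parent key, kind), mirroring the 'separate pools' remark."""

    def __init__(self):
        self._counters = defaultdict(lambda: itertools.count(1))

    def allocate_ids(self, kind, size, parent=None):
        # IDs are only unique among keys sharing this parent and kind.
        counter = self._counters[(parent, kind)]
        return [next(counter) for _ in range(size)]


allocator = ToyIdAllocator()
root_ids = allocator.allocate_ids("Task", 3)
child_ids = allocator.allocate_ids("Task", 2, parent=(("List", "a"),))
more_root_ids = allocator.allocate_ids("Task", 1)
```

The same numeric ID can therefore appear under two different parents without conflict, because the full key includes the parent's path.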

On the page describing transactions, a use case is presented where the entity in question, a SalesAccount, is updated or, if the account doesn't exist, created instead. The technique is to just try to load the entity with the given key and, if that returns nothing, create it. It's important to do this inside a transaction to avoid the situation where two users are racing for the same key, both see that it doesn't exist, and both try to create it.
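The race described here can be sketched without the datastore at all. In the toy below, a `threading.Lock` stands in for the isolation a transaction provides; that substitution is an assumption made for illustration (the real datastore does not work by taking a lock like this), and `ToyStore` is a hypothetical class.

```python
import threading


class ToyStore(object):
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # stand-in for transactional isolation
        self.creations = 0

    def get_or_create(self, key, default_factory):
        # The read and the conditional write happen as one atomic step.
        with self._lock:
            entity = self._data.get(key)
            if entity is None:
                entity = default_factory()
                self._data[key] = entity
                self.creations += 1
            return entity


store = ToyStore()
results = []


def worker():
    account = store.get_or_create(("SalesAccount", "acme"), lambda: {"balance": 0})
    results.append(account)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight workers end up with the same account object and only one creation happens. Without the atomic read-then-write, two workers could both observe a missing key and both create it.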

appengine: cached reference property?

How can I cache a Reference Property in Google App Engine?
For example, let's say I have the following models:
from google.appengine.ext import db

# Few must be defined before Many references it.
class Few(db.Model):
    year = db.IntegerProperty()

class Many(db.Model):
    few = db.ReferenceProperty(Few)
Then I create many Many's that point to only one Few:
one_few = Few.get_or_insert('few-2009', year=2009)  # key name is a hypothetical placeholder
for _ in range(6):
    Many(few=one_few).put()
Now, if I want to iterate over all the Many's, reading their few value, I would do this:
for many in Many.all().fetch(1000):
    print "%s" % many.few.year
The question is:
Will each access to many.few trigger a database lookup?
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time?
As noted in one comment: I know about memcache, but I'm not sure how I can "inject it" when I'm calling the other entity through a reference.
In any case memcache wouldn't be useful, as I need caching within an execution, not between them. Using memcache wouldn't help optimizing this call.

The first time you dereference any reference property, the entity is fetched - even if you'd previously fetched the same entity associated with a different reference property. This involves a datastore get operation, which isn't as expensive as a query, but is still worth avoiding if you can.
There's a good module that adds seamless caching of entities available here. It works at a lower level of the datastore, and will cache all datastore gets, not just dereferencing ReferenceProperties.
If you want to resolve a bunch of reference properties at once, there's another way: You can retrieve all the keys and fetch the entities in a single round trip, like so:
keys = [MyModel.ref.get_value_for_datastore(x) for x in referers]
referees = db.get(keys)
Finally, I've written a library that monkeypatches the db module to locally cache entities on a per-request basis (no memcache involved). It's available, here. One warning, though: It's got unit tests, but it's not widely used, so it could be broken.
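The "collect the keys, then fetch once" idea from this answer can be sketched independently of the db module. The code below is a toy: `CountingStore` is a hypothetical stand-in that merely counts round trips, and `resolve_references` is an illustrative helper that also de-duplicates the keys before fetching.

```python
class CountingStore(object):
    """Hypothetical stand-in for the datastore that counts round trips."""

    def __init__(self, entities):
        self._entities = entities
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._entities[key]

    def get_multi(self, keys):
        self.round_trips += 1  # one batch counts as one round trip
        return [self._entities[k] for k in keys]


def resolve_references(store, referers, prop):
    """Fetch each distinct referenced entity once and map it back to the referers."""
    unique_keys = list(dict.fromkeys(r[prop] for r in referers))
    fetched = dict(zip(unique_keys, store.get_multi(unique_keys)))
    return [fetched[r[prop]] for r in referers]


few_key = ("Few", 2009)
store = CountingStore({few_key: {"year": 2009}})
manys = [{"few": few_key} for _ in range(6)]

fews = resolve_references(store, manys, "few")
years = [few["year"] for few in fews]
```

Six referers pointing at the same entity cost a single round trip here, whereas dereferencing each one separately through `get` would cost six.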

The question is:
Will each access to many.few trigger a database lookup?
Yes. Not sure if it's 1 or 2 calls.
If yes, is it possible to cache somewhere, as only one lookup should be enough to bring the same entity every time?
You should be able to use memcache to do this. It is in the google.appengine.api.memcache package.
Details for memcache are in http://code.google.com/appengine/docs/python/memcache/usingmemcache.html
