Fun with GAE: using key_name as PK?

I want to insert new entities programmatically as well as manually. For this I was thinking about using key_name to uniquely identify an entity.
The problem is that I don't know how to get the model to generate a new unique key name when I create the entity.
On the other hand, I cannot create the ID (which is unique across data store) manually.
How can I do "create unique key name if provided value is None"?
Thanks for your help!

If you really need a string id (as opposed to an automatically assigned integer id), you could use a random string generator, or unique id generator like uuid.uuid4.
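The "create a unique key name if the provided value is None" logic from the question could be sketched roughly as below. The helper name make_key_name and the "k" prefix are hypothetical choices for illustration; the function only builds the string, which you would then pass as key_name when constructing the entity.

```python
import uuid


def make_key_name(value=None):
    """Return the caller's key name, or a generated one if none was given."""
    if value is None:
        # uuid4() is random: collisions are extremely unlikely, not impossible.
        # The "k" prefix (an arbitrary choice) keeps the name starting with a letter.
        return "k" + uuid.uuid4().hex
    return value


manual = make_key_name("customer-bob")  # an explicit name is kept unchanged
generated = make_key_name()             # "k" followed by 32 hex characters
```

Under these assumptions, an entity would be created with something like Customer(key_name=make_key_name(name_or_none), ...).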

I don't really understand your question. If you want an automatically-generated key name, just leave out the key when you instantiate the object - one will be automatically assigned when you call put().

If most of the time you want your entities to have automatically assigned ids, just create them without passing a key or key_name. The key will be auto-assigned, and once an entity has been put() it will have .key().id() available.
If sometimes you need to assign the numeric ids manually, then you can reserve a block of ids so that AppEngine will never auto-assign them, and then you can use an id from this reserved range whenever you want to assign an id to a known entity. To assign an id to an entity you create a Key for that entity:
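from google.appengine.ext import db
# reserve ids 4000 to 5000 for use manually on the Customer entity
db.allocate_id_range(db.Key.from_path('Customer', 1), 4000, 5000)
# manually create a customer with id 4501
bob = Customer(key=db.Key.from_path('Customer', 4501), name='bob')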


Firestore: how to specify the firestore key to be used for merge?

I have a function written in Node.js and another in Python. They both do the same thing in different scripts.
I currently have a function that creates a Firestore collection called profile and then inserts a document whose name is the document ID generated by Firestore.
The document itself contains an object representing the user: first name, last name, email, and phone.
The phone field is a list (array). For now it always contains exactly one phone number; later on we'll add multiple phone numbers per user. Phone can never be null; email can sometimes be null.
What I want is to insert a new user if the phone doesn't already exist, and otherwise update the existing user, so that there is no duplication.
Can this be done via merge or update in Firestore, or should I do a where query? I know that a query is always possible, but I want to know if I can use a merge on a list field in Firestore.
If you don't have a document reference to update and need to query on the phone_number field, use the where() method with the array_contains operator, which filters based on array values. If a document is found, you can use arrayUnion() (ArrayUnion in the Python client) to add elements to the array; only elements that are not already present are added. See the sample snippet below:
from google.cloud import firestore

# Assumes default application credentials are configured
db = firestore.Client()
# Collection Reference
col_ref = db.collection(u'profile')
# Check collection if there's any array in documents that contains the `phone_number`
admin_ref = col_ref.where(u'phone_number', u'array_contains', 12345).get()
# Checks if there's a result.
if admin_ref:
    # Iterate the result
    for item in admin_ref:
        doc = col_ref.document(item.id)
        # Adds multiple phone numbers to the field, but only adds new ones.
        doc.update({u'phone_number': firestore.ArrayUnion([12345, 54321])})
else:
    # If there's no result, then add a document with the field name `phone_number`
    col_ref.add({u'phone_number': [123456]})
For more information, see these documentation pages:
Array membership
Update elements in an array
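To make the "only adds new ones" behaviour concrete, here is a plain-Python stand-in for the effect an array union has on the stored list. array_union is a hypothetical local helper, not part of the Firestore client; the real merge is performed by Firestore itself.

```python
def array_union(existing, new_items):
    """Return existing plus any new_items not already present, keeping order."""
    result = list(existing)
    for item in new_items:
        if item not in result:
            result.append(item)
    return result


# 12345 is already stored, so only 54321 is appended
phones = array_union([12345], [12345, 54321])
```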

Dealing with python hash() collision

I've created a program that takes a user's predetermined unique identifier, hashes it, and stores it in a dictionary mapping to the user's name. Later I receive the unique identifier, rehash it, and look up the user's name.
I've run into a problem where one individual's 9-digit unique ID hashes (via hash()) to the same number as somebody else's. This occurred after gathering data for about 40 users.
Is there a common workaround for this? I believe this is different from just using a hashmap, because if I create a bucket for the hashed ID, I won't be able to tell who the user was (whether it is the first item in the bucket or the second).
id = raw_input()
hashed_id = hash(id)
if not dictionary.has_key(hashed_id):
    name = raw_input()
    dictionary[hashed_id] = name
I have never seen hash() used for this. hash() is meant to give data structures a compact stand-in for an entire object, as in the internal implementation of dictionary keys; it does not guarantee that two different objects get different values.
I would suggest using a UUID (universally unique identifier) for your users instead.
import uuid
uuid.uuid4()
# e.g. UUID('d36b850c-2433-42c6-9252-6371ea3d33c2')
You'll be very hard pressed to get a collision out of UUIDs.
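A different angle from the UUID suggestion: since the identifiers are already unique, one option is to use the raw ID itself as the dictionary key. A dict hashes its keys internally and, when two hashes collide, tells the entries apart by comparing the keys for equality. The sketch below assumes that approach; the register helper is hypothetical.

```python
def register(users, user_id, name):
    """Store name under the raw ID unless that ID is already present."""
    users.setdefault(user_id, name)
    return users[user_id]


users = {}
register(users, "123456789", "Alice")
register(users, "987654321", "Bob")
register(users, "123456789", "Mallory")  # ID already known, name is kept

# CPython detail: hash(-1) and hash(-2) are equal, yet the dict keeps
# both keys apart because the keys themselves compare unequal.
colliding = {-1: "first", -2: "second"}
```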

Where clause in Google App Engine Datastore

The model for my Resource class is as follows:
class Resource(ndb.Model):
    name = ndb.StringProperty()
    availability = ndb.StructuredProperty(Availability, repeated=True)
    tags = ndb.StringProperty(repeated=True)
    owner = ndb.StringProperty()
    id = ndb.StringProperty(indexed=True, required=True)
    lastReservedTime = ndb.DateTimeProperty(auto_now_add=False)
    startString = ndb.StringProperty()
    endString = ndb.StringProperty()
I want to extract records where the owner is equal to a certain string.
I have tried the below query. It does not give an error but does not return any result either.
Resource.query(Resource.owner == '').fetch()
As per my understanding if a column has duplicate values it shouldn't be indexed and that is why owner is not indexed. Please correct me if I am wrong.
Can someone help me figure out how to achieve a where clause kind of functionality?
Any help is appreciated! Thanks!
Just tried this. It worked first time. Either you have no Resource entities with an owner of "", or the owner property was not indexed when the entities were put (which can happen if you had indexed=False at the time the entities were put).
My test:
Resource(id='1', owner='').put()
Resource(id='2', owner='').put()
resources = Resource.query(Resource.owner == '').fetch()
assert len(resources) == 2
Also, your comment:
As per my understanding if a column has duplicate values it shouldn't
be indexed and that is why owner is not indexed. Please correct me if
I am wrong.
You're wrong!
Firstly, there is no concept of a 'column' in a datastore model, so I will assume you mean 'property'.
Next, to clarify what you mean by "if a property has duplicate values":
I assume you mean 'multiple entities created from the same model with the same value for a specific property', in your case 'owner'. This has no effect on indexing; each entity will be indexed as expected.
Or maybe you mean 'a single entity with a property that allows multiple values (i.e. a list)', which also does not prevent indexing. In this case, the entity will be indexed multiple times, once for each item in the list.
To elaborate further, most properties (i.e. ones that accept primitive types such as string, int, float etc.) are indexed automatically unless you pass indexed=False to the Property constructor. In fact, the only time you really need to worry about indexing is when you perform more complex queries, such as those that filter on more than one property or use inequality filters (and even then, by default, the App Engine dev server will auto-create the indexes for you in your local index.yaml file).
Please read the docs for more detail.
Hope this helps!

use IDs instead of instances to create new objects with foreign keys

I want to create a new Django ORM object with three foreign keys. I got the IDs of the foreign rows already, and I mean - that's all I need to fill the foreign key columns in my new row, right? However, I don't seem able to create the new row without hitting the DB three times to instantiate objects out of those IDs.
So what I need to do:
foreign_object = models.ForeignObject.objects.get(pk=foreign_object_id)
a = models.Object1.objects.get_or_create(f = foreign_object)
What I'd like to do:
a = models.Object1.objects.get_or_create(f_id = foreign_object_id)
f_id however is not a field Django recognizes. If I just assign foreign_object_id to f (I think I recall this works in some cases), Django complains that it wants a ForeignObject instance instead of an int.
Any way to do this?
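You need to use the double-underscore notation in this case. Note that get_or_create() is called on the model's manager and returns an (object, created) tuple:
a, created = models.Object1.objects.get_or_create(f__pk=foreign_object_id)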

AppEngine: Query datastore for records with <missing> value

I created a new property for my db model in the Google App Engine Datastore.
# original model
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
# model with the new property
class Logo(db.Model):
    name = db.StringProperty()
    image = db.BlobProperty()
    is_approved = db.BooleanProperty(default=False)
How do I query for the Logo records which do not have the 'is_approved' value set?
I tried
logos.filter("is_approved = ", None)
but it didn't work.
In the Data Viewer the new field values are displayed as <missing>.
According to the App Engine documentation on Queries and Indexes, there is a distinction between entities that have no value for a property, and those that have a null value for it; and "Entities Without a Filtered Property Are Never Returned by a Query." So it is not possible to write a query for these old records.
A useful article is Updating Your Model's Schema, which says that the only currently-supported way to find entities missing some property is to examine all of them. The article has example code showing how to cycle through a large set of entities and update them.
A practice that helps us is to keep a "version" field on every Kind, initially set to 1 on every record. If a need like this comes up (populating a new or existing field in a large dataset), the version field lets you iterate through all the records matching "version = 1". For each one, set the new field to "null" or another initial value, bump the version to 2, and store the record; this populates the new or existing field with a default value.
The benefit of the "version" field is that the selection can keep targeting the lower version number (initially 1) over as many sessions, or as much time, as needed until ALL records have been updated with the new field's default value.
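The version-field practice can be sketched without the datastore, using plain dicts as stand-ins for entities. Everything here (migrate_batch, the batch size, the dict layout) is a hypothetical illustration; in a real app the selection would be a query on version = 1 and the update a put().

```python
def migrate_batch(records, batch_size=100):
    """Upgrade up to batch_size version-1 records; return how many changed."""
    changed = 0
    for record in records:
        if changed >= batch_size:
            break
        if record.get("version") == 1:
            record.setdefault("is_approved", False)  # default for the new field
            record["version"] = 2                    # will not be selected again
            changed += 1
    return changed


logos = [
    {"name": "a", "version": 1},
    {"name": "b", "version": 1},
    {"name": "c", "version": 2, "is_approved": True},
]

# Repeat, possibly across sessions, until a pass finds nothing at version 1.
while migrate_batch(logos, batch_size=1):
    pass
```

Because each pass only touches records still at version 1, the work can be interrupted and resumed without reprocessing records that were already updated.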
Maybe this has changed, but I am able to filter records based on null fields.
When I try the GQL query SELECT * FROM Contact WHERE demo=NULL, it returns only records for which the demo field is missing.
According to the doc
The right-hand side of a comparison can be one of the following (as
appropriate for the property's data type): [...] a Boolean literal, as TRUE or
FALSE; the NULL literal, which represents the null value (None in
I'm not sure that "null" is the same as "missing", though: in my case, these fields already existed in my model but were not populated on creation. Federico, maybe you could let us know if the NULL query works in your specific case?

