I'm having some trouble understanding how entities and keys work in Google App Engine NDB.
I have a post entity and a user entity. How do I set the user_key on post to user?
In the interactive console, I have this so far:
from google.appengine.ext import ndb
from app.lib.posts import Post
from app.lib.users import User
from random import shuffle
users = User.query()
posts = Post.query().fetch()
for post in posts:
    post.user_key = shuffle(users)[0]
    post.put()
I'm just trying to set up some seed data for development. I know this probably isn't the ideal way to set things up, but my questions are:
How do I get a key from an entity? (The reverse is described in the docs.)
How do I set associations in NDB?
try:
post.user_key = shuffle(users)[0].key
Maybe this helps with understanding NDB. I had the same questions as you.
class Person(ndb.Expando):
    pass

class Favourite(ndb.Expando):
    pass

class Picture(ndb.Expando):
    pass

person = Person()
person.put()

picture = Picture()
picture.put()

fav = Favourite(parent=person.key,
                person=person.key,
                picture=picture.key)
fav.put()
Verify that shuffle works in this case: User.query() returns an iterator, not a list, and random.shuffle() shuffles a list in place and returns None, so indexing its result will fail. You could use random.choice(list(users)) instead. (Beware, this list could be long.)
NDB has some really weird behavior sometimes, so I'd recommend you don't store an NDB Key but its serialized string, which is also compatible with ext.db: post.user_key = random.choice(list(users)).key.urlsafe()
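A quick framework-free check of that pitfall (plain random module, no NDB involved):

```python
import random

users = ['alice', 'bob', 'carol']  # stand-ins for fetched User entities

# random.shuffle() mutates the list in place and returns None,
# so shuffle(users)[0] can never work:
result = random.shuffle(users)
assert result is None

# random.choice() returns one element directly, which is what the
# original loop actually needs:
picked = random.choice(users)
assert picked in users
```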
You could use KeyProperty for associations. If you need more fine-grained control over your relations, you'll have to implement them yourself.
See https://developers.google.com/appengine/docs/python/ndb/properties#structured
Related
The documentation (https://cloud.google.com/appengine/docs/python/ndb/) states that
NDB uses Memcache as a cache service for "hot spots" in the data
I am now using memcache only as follows:
memcache.set(key=(id), value=params, time=0)
That expires (auto flushes) pretty often and so I would like to use NDB Datastore.
I thought I would have to always put the key-value in both NDB and Memcache, then check both.
Is this being done automatically by NDB?
Ie.
ancestor_key = ndb.Key("Book", guestbook_name or "*notitle*")
greetings = Greeting.query_book(ancestor_key).fetch(20)
Would that implicitly set Memcache ?
And when I read from NDB, would it implicitly try a memcache.get(key) first?
Thanks for your patience.
EDIT - What I tried:
As a test I tried something like this:
class Book(ndb.Model):
    content = ndb.StringProperty()

class update(webapp2.RequestHandler):
    def post(self):
        p1 = '1'
        p2 = '2'
        p3 = '3'
        p4 = '4'
        p5 = '5'
        id = 'test'
        paramarray = (p1, p2, p3, p4, p5)
        book = Book(name=id, value=paramarray)
        # OR like this - book = Book(ndb.Key(id), value=paramarray)
        book.put()
Both versions error out.
I'm trying to store the values of paramarray under a key built from the var id.
EDIT 2 Daniel, Thank you for everything.
Have follow up formatting questions, will ask a new question.
Yes; see the full documentation on ndb caching. Basically, every write is cached both in a request-local in-context cache, and in the main memcached store; a get by key will look up in both caches first before falling back to the real datastore.
Edit I can't understand why you think your example would work. You defined a model with a content property, but then try to set name and value properties on it; naturally that will fail.
You should go through the ndb documentation, which gives a good introduction to using the model class.
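The read-before-datastore behaviour described above can be pictured with a toy sketch. This is plain Python, not NDB's actual implementation; the class and names are illustrative only:

```python
class ReadThroughCache(object):
    """Toy model of NDB's read path: check the cache first, then the datastore."""

    def __init__(self, datastore):
        self.datastore = datastore  # stands in for the real Datastore
        self.cache = {}             # stands in for memcache / in-context cache

    def get(self, key):
        if key in self.cache:        # cache hit: no datastore round trip
            return self.cache[key]
        value = self.datastore[key]  # cache miss: fall back to the datastore
        self.cache[key] = value      # populate the cache for next time
        return value

    def put(self, key, value):
        self.datastore[key] = value  # write through to the datastore
        self.cache[key] = value      # and cache the freshly written entity

datastore = {}
books = ReadThroughCache(datastore)
books.put('test', 'contents')
del datastore['test']                    # simulate skipping the backend
assert books.get('test') == 'contents'   # still answered from the cache
```

The point is only that, as with NDB, a get by key never touches the backing store when the cache already holds the entity.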
I'm writing a REST api over Google App Engine NDB
I excluded existing libraries because I need control over transactions and caching.
I excluded Google Endpoints for a similar reason and also because I don't want to use the javascript client they provide.
When evaluating architectural decisions I ran into some problems and odd situations, probably because my experience with Python and pythonic style is limited.
In this moment I tried to come up with some guidelines that should shape my codebase:
- in Handlers: create the dictionary representation of the objects and return them as JSON; perform authentication and authorization checks
- encapsulate ndb interaction in Services
- Model classes always receive Model objects or keys as parameters and return Model objects or Model lists
- Model classes are imported and used in Services
One particular thing I encountered is this
I have a many to many relationship that I implemented with a mapping Model, something like
UserOrganizationMembership, that has the keys of the User and the Organization
now, in my Service, at some point I want to return an object that has the list of the organizations the current user is member of and recursively fetch the companies that are in each organization:
'organizations': [
    {
        'name': 'TEST',
        'companies': [{company1}, {company2}]
    },
    {
        ...
    }
]
I do this
def user_organizatios_with_companies(user):
    def fetch_companies(x):
        x.companies = Company.by_organization(x)  # NOTICE THIS
        return x
    user_organizations = [membership.organization.get()
                          for membership in UserOrganizationMembership.by_user(user)]
    return [fetch_companies(x) for x in user_organizations]
in the highlighted line I attach the dynamic property 'companies' to the Organization Model
now if I call this method in my Handler, when I create the dictionary representation to output json, ndb.Model.to_dict() implementation ignores dynamically attached properties.
One solution I tried is this (in my Handler)
xs = Service.user_organizatios_with_companies(u)
organizations = [x.to_dict() for x in xs]
for i in xrange(0, len(xs)):
    organizations[i]['companies'] = [c.to_dict() for c in xs[i].companies]
but I don't like it because I need to know that each organization has a 'companies' property and the code seems a bit complicated and not obvious
another approach is to override ndb.Model.to_dict()
isolating dynamically attached properties and providing a dictionary representation, this simplifies my code in the Handler letting me to only call to_dict() on the stuff returned by the Service.
from google.appengine.ext import ndb
import util

class BaseModel(ndb.Model):
    created = ndb.DateTimeProperty(auto_now_add=True)
    updated = ndb.DateTimeProperty(auto_now=True)
    # App Engine clock times are always
    # expressed in coordinated universal time (UTC).

    def to_dict(self, include=None, exclude=None):
        result = super(BaseModel, self).to_dict(include=include, exclude=exclude)
        result['key'] = self.key.id()  # get the key as a string
        # add properties dynamically attached to the instance
        dynamic_vars = {k: v for (k, v) in vars(self).iteritems() if not k.startswith('_')}
        for prop, val in dynamic_vars.iteritems():
            result[prop] = val.to_dict() if not isinstance(val, list) else [x.to_dict() for x in val]
        return util.model_db_to_object(result)
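As a plain-Python sanity check (no NDB involved), the vars() pickup works like this. The class and attribute names here are illustrative, not from my real models:

```python
class Entity(object):
    schema_prop = 'declared'  # class-level, like an ndb property descriptor

e = Entity()
e.companies = ['company1', 'company2']  # dynamically attached, like x.companies

# vars(e) holds only *instance* attributes, which is why the overridden
# to_dict() can find dynamically attached properties there while the
# declared class-level properties are handled by the superclass call:
dynamic = {k: v for k, v in vars(e).items() if not k.startswith('_')}
assert dynamic == {'companies': ['company1', 'company2']}
```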
do you have any recommendation against this approach? any thought will be appreciated!
ndb supports dynamic properties through the Expando type.
Instead of defining your models as:
class BaseModel(ndb.Model):
Define it using Expando:
class BaseModel(ndb.Expando):
Now, if you write x.companies = [...], calling _to_dict() will output those companies. Just be careful about when you put() these entities, as any dynamically added properties will also be put into the Datastore.
I'm rewriting the back end of an app to use Django, and I'd like to keep the front end as untouched as possible. I need to be consistent with the JSON that is sent between projects.
In models.py I have:
class Resource(models.Model):
    # Name chosen for consistency with old app
    _id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255)

    @property
    def bookingPercentage(self):
        from bookings.models import Booking
        return (Booking.objects.filter(resource=self)
                .aggregate(models.Sum("percent"))["percent__sum"])
And in views.py that gets all resource data as JSON:
def get_resources(request):
    resources = []
    for resource in Resource.objects.all():
        resources.append({
            "_id": resource._id,
            "name": resource.name,
            "bookingPercentage": resource.bookingPercentage
        })
    return HttpResponse(json.dumps(resources))
This works exactly as I need it to, but it seems somewhat antithetical to Django and/or Python. Using .all().values() will not work because bookingPercentage is a derived property.
Another issue is that there are other similar models that will need JSON representations in pretty much the same way. I would be rewriting similar code and just using different names for the values of the models. In general is there a better way to do this that is more pythonic/djangothonic/does not require manual creation of the JSON?
Here's what I do in this situation:
def get_resources(request):
    resources = list(Resource.objects.all())
    for resource in resources:
        resource.booking = resource.bookingPercentage
That is, I create a new attribute for each entity using the derived property. It's only a local attribute (not stored in the database), but it's available for your json.dumps() call.
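Once the derived value has been read off the model it's just plain Python data, so json.dumps handles it. A framework-free sketch (stand-in classes with illustrative names, not your real Django models):

```python
import json

class Resource(object):
    def __init__(self, _id, name, percents):
        self._id = _id
        self.name = name
        self._percents = percents  # stand-in for related Booking rows

    @property
    def bookingPercentage(self):
        # mirrors the aggregate Sum("percent") from the question
        return sum(self._percents)

resources = [Resource(1, 'room-a', [30, 20]), Resource(2, 'room-b', [10])]
payload = [
    {"_id": r._id, "name": r.name, "bookingPercentage": r.bookingPercentage}
    for r in resources
]
assert json.loads(json.dumps(payload))[0]["bookingPercentage"] == 50
```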
It sounds like you just want a serialisation of your models, in JSON. You can use the serialisers in core:
from django.core import serializers
data = serializers.serialize('json', Resource.objects.all(), fields=('name','_id', 'bookingPercentage'))
So just pass in your Model class, and the fields you want to serialize into your view:
def get_resources(request, model_cls, fields):
documentation here https://docs.djangoproject.com/en/dev/topics/serialization/#id2
I'm trying to get my head around Cassandra/Pycassa db design.
With Mongoengine, you can refer to another class using "ReferenceField", as follows:
from mongoengine import *

class User(Document):
    email = StringField(required=True)
    first_name = StringField(max_length=50)
    last_name = StringField(max_length=50)

class Post(Document):
    title = StringField(max_length=120, required=True)
    author = ReferenceField(User)
As far as I can tell from the documentation, the Pycassa equivalent is something like this, but I don't know how to create a reference from the Post class author field to the User class:
from pycassa.types import *
from pycassa.pool import ConnectionPool
from pycassa.columnfamilymap import ColumnFamilyMap
import uuid

class User(object):
    key = LexicalUUIDType()
    email = UTF8Type()
    first_name = UTF8Type()
    last_name = UTF8Type()

class Post(object):
    key = LexicalUUIDType()
    title = UTF8Type()
    author = ???
What is the preferred way to do something like this? Obviously I could just put the User key in the Post author field, but I'm hoping there's some better way where all this is handled behind the scenes, like with Mongoengine.
@jterrace is correct, you're probably going about this the wrong way. With Cassandra, you don't tend to be concerned as much with objects, how they relate, and how to normalize that. Instead, you have to ask yourself "What queries do I need to be able to answer efficiently?", and then pre-build the answers for those queries. This usually involves a mixture of denormalization and the "wide row" model. I highly suggest that you read some articles about data modeling for Cassandra online.
With that said, pycassa's ColumnFamilyMap is just a thin wrapper that can cut down on boilerplate, nothing more. It does not attempt to provide support for anything complicated because it doesn't know what kinds of queries you need to be able to answer. So, specifically, you could store the matching User's LexicalUUID in the author field, but pycassa will not automatically fetch that User object for you when you fetch the Post object.
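Concretely, the "store the key and dereference it yourself" pattern looks like this (plain dicts standing in for column families; keys and names are illustrative):

```python
# Toy stand-ins for two column families, keyed like row keys.
users = {'u1': {'email': 'ann@example.com', 'first_name': 'Ann'}}
posts = {'p1': {'title': 'Hello', 'author': 'u1'}}  # author holds a raw key

# pycassa hands back the raw author key; dereferencing it is an
# explicit second lookup that you perform yourself:
post = posts['p1']
author = users[post['author']]
assert author['first_name'] == 'Ann'
```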
I think you're really misunderstanding the data model for Cassandra. You should read Cassandra Data Model before continuing.
pycassa has no notion of "objects" like you have defined above. There are only column families, row key types, and column types. There is no such thing as a reference from one column family to another in Cassandra.
I'm looking for an excuse to learn Django for a new project that has come up. Typically I like to build RESTful server-side interfaces where a URL maps to resources that spits out data in some platform independent context, such as XML or JSON. This is
rather straightforward to do without the use of frameworks, but some of them such as Ruby on Rails conveniently allow you to easily spit back XML to a client based on the type of URL you pass it, based on your existing model code.
My question is, does something like Django have support for this? I've googled and found some 'RESTful' 3rd party code that can go on top of Django. Not sure if I'm too keen on that.
If not Django, any other Python framework that's already built with this in mind so I do not have to reinvent the wheel as I already have in languages like PHP?
This is probably pretty easy to do.
URL mappings are easy to construct, for example:
urlpatterns = patterns('books.views',
    (r'^books/$', 'index'),
    (r'^books/(\d+)/$', 'get'))
Django supports model serialization, so it's easy to turn models into XML:
from django.core import serializers
from models import Book
data = serializers.serialize("xml", Book.objects.all())
Combine the two with decorators and you can build fast, quick handlers:
from django.core import serializers
from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def xml_view(func):
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return HttpResponse(serializers.serialize("xml", result),
                            mimetype="text/xml")
    return wrapper

@xml_view
def index(request):
    return Book.objects.all()

@xml_view
def get(request, id):
    return get_object_or_404(Book, pk=id)
(I had to edit out the most obvious links.)
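The wrap-the-return-value idea used by that decorator can be sketched framework-free. This generic version (names are illustrative, not Django API) serializes whatever the wrapped function returns:

```python
def serialized(serialize):
    """Generic form of the xml_view idea: serialize the view's return value."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            # call the wrapped "view", then run its result through
            # the chosen serializer before returning it
            return serialize(func(*args, **kwargs))
        return wrapper
    return decorator

@serialized(serialize=str)
def answer():
    return 42

assert answer() == '42'
```

Swapping `str` for `serializers.serialize("xml", ...)` and wrapping the result in an `HttpResponse` gives back the Django version.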
+1 for Piston (link above). I had used apibuilder (Washington Times open source) in the past, but Piston works more easily for me. The most difficult thing for me is figuring out my URL structures for the API, along with the regular expressions. I've also used surlex, which makes that chore much easier.
Example, using this model for Group (from a timetable system we're working on):
class Group(models.Model):
    """
    Tree-like structure that holds groups that may have other groups as leaves.
    For example ``st01gp01`` is part of ``stage1``.
    This allows subgroups to work. The name is ``parents``, i.e.::

        >>> stage1group01 = Group.objects.get(unique_name='St 1 Gp01')
        >>> stage1group01
        <Group: St 1 Gp01>
        # get the parents...
        >>> stage1group01.parents.all()
        [<Group: Stage 1>]

    ``symmetrical`` on ``subgroup`` is needed to allow the 'parents' attribute to be 'visible'.
    """
    subgroup = models.ManyToManyField("Group", related_name="parents", symmetrical=False, blank=True)
    unique_name = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    academic_year = models.CharField(max_length=255)
    dept_id = models.CharField(max_length=255)

    class Meta:
        db_table = u'timetable_group'

    def __unicode__(self):
        return "%s" % self.name
And this urls.py fragment (note that surlex allows regular expression macros to be set up easily):
from surlex.dj import surl
from surlex import register_macro
from piston.resource import Resource
from api.handlers import GroupHandler

group_handler = Resource(GroupHandler)

# add another macro to our 'surl' function
# this picks up our module definitions
register_macro('t', r'[\w\W ,-]+')

urlpatterns = patterns('',
    # group handler
    # all groups
    url(r'^groups/$', group_handler),
    surl(r'^group/<id:#>/$', group_handler),
    surl(r'^group/<name:t>/$', group_handler),)
Then this handler will look after JSON output (by default) and can also do XML and YAML.
class GroupHandler(BaseHandler):
    """
    Entry point for Group model
    """
    allowed_methods = ('GET',)
    model = Group
    fields = ('id', 'unique_name', 'name', 'dept_id', 'academic_year', 'subgroup')

    def read(self, request, id=None, name=None):
        base = Group.objects
        if id:
            print self.__class__, 'ID'
            try:
                return base.get(id=id)
            except ObjectDoesNotExist:
                return rc.NOT_FOUND
            except MultipleObjectsReturned:  # Should never happen, since we're using a primary key.
                return rc.BAD_REQUEST
        else:
            if name:
                print self.__class__, 'Name'
                return base.filter(unique_name=name).all()
            else:
                print self.__class__, 'NO ID'
                return base.all()
As you can see, most of the handler code is in figuring out what parameters are being passed in urlpatterns.
Some example URLs are api/groups/, api/group/3301/ and api/group/st1gp01/ - all of which will output JSON.
Take a look at Piston, it's a mini-framework for Django for creating RESTful APIs.
A recent blog post by Eric Holscher provides some more insight on the PROs of using Piston: Large Problems in Django, Mostly Solved: APIs
It can respond with any kind of data. JSON/XML/PDF/pictures/CSV...
Django itself comes with a set of serializers.
Edit
I just had a look at Piston — looks promising. Best feature:
Stays out of your way.
:)
Regarding your comment about not liking 3rd-party code: that's too bad, because pluggable apps are one of Django's greatest features. Like others answered, Piston will do most of the work for you.
A little over a year ago, I wrote a REST web service in Django for a large Seattle company that does streaming media on the Internet.
Django was excellent for the purpose. As "a paid nerd" observed, the Django URL config is wonderful: you can set up your URLs just the way you want them, and have it serve up the appropriate objects.
The one thing I didn't like: the Django ORM has absolutely no support for binary BLOBs. If you want to serve up photos or something, you will need to keep them in a file system, and not in a database. Because we were using multiple servers, I had to choose between writing my own BLOB support or finding some replication framework that would keep all the servers up to date with the latest binary data. (I chose to write my own BLOB support. It wasn't very hard, so I was actually annoyed that the Django guys didn't do that work. There should be one, and preferably only one, obvious way to do something.)
I really like the Django ORM. It makes the database part really easy; you don't need to know any SQL. (I don't like SQL and I do like Python, so it's a double win.) The "admin interface", which you get for free, gives you a great way to look through your data, and to poke data in during testing and development.
I recommend Django without reservation.