Dynamic property vs. fixed property query speeds on Google App Engine


I'm designing datastore models and trying to decide on the best approach, given how the query filters are going to work.
It's easiest to explain with two examples. In the first, I have a fixed "gender" property holding a string that is set to either "male" or "female".
from google.appengine.ext import db
class Person(db.Model):
    name = db.StringProperty()
    gender = db.StringProperty()
p1 = Person(name="Steve", gender="male")
p2 = Person(name="Jane", gender="female")
p1.put()
p2.put()
males = db.GqlQuery("SELECT * FROM Person WHERE gender = :1", "male")
In the second example, Person is an Expando model, and I dynamically set an "is_male" or "is_female" property.
from google.appengine.ext import db
class Person(db.Expando):
    name = db.StringProperty()
p1 = Person(name="Steve")
p1.is_male = True
p1.put()
p2 = Person(name="Jane")
p2.is_female = True
p2.put()
males = db.GqlQuery("SELECT * FROM Person WHERE is_male = :1", True)
Now let's say we gather millions of records and want to run a query: which of the two approaches above would be faster in production on Google App Engine running Python 2.7?

There is absolutely no difference: entities look the same in the datastore regardless of whether a property was 'dynamic' or not. The only difference is that a fixed (declared) property with no value set is still written to the datastore with the value None, which takes some extra space but lets you query for entities where that value is not set.
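As a rough illustration of that point, here is a toy in-memory model in which each stored property becomes a (name, value) index row. It is not the real datastore, whose index layout is more involved; ToyIndex and as_stored are hypothetical names used only for this sketch. It shows that a single-property equality query is the same kind of lookup whether the property was fixed or dynamic, and that an unset fixed property is still indexed, as None.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in for single-property indexes (not the real datastore).

    Maps (property name, value) -> set of entity keys.
    """

    def __init__(self):
        self._rows = defaultdict(set)

    def put(self, key, properties):
        # Every stored property gets an index row, however it was declared.
        for name, value in properties.items():
            self._rows[(name, value)].add(key)

    def query(self, name, value):
        # An equality filter is a single lookup by (name, value).
        return sorted(self._rows.get((name, value), set()))


def as_stored(declared, **values):
    """Return the properties that would actually be written.

    Declared (fixed) properties are always written, as None when unset;
    extra keyword arguments play the role of Expando dynamic properties.
    """
    stored = {name: values.get(name) for name in declared}
    stored.update(values)
    return stored


fixed = ToyIndex()
fixed.put("steve", as_stored(("name", "gender"), name="Steve", gender="male"))
fixed.put("jane", as_stored(("name", "gender"), name="Jane", gender="female"))
fixed.put("pat", as_stored(("name", "gender"), name="Pat"))

dynamic = ToyIndex()
dynamic.put("steve", as_stored(("name",), name="Steve", is_male=True))
dynamic.put("jane", as_stored(("name",), name="Jane", is_female=True))
dynamic.put("pat", as_stored(("name",), name="Pat"))

print(fixed.query("gender", "male"))    # ['steve']
print(dynamic.query("is_male", True))   # ['steve']
print(fixed.query("gender", None))      # ['pat']
print(dynamic.query("is_male", None))   # []
```

The last two lines show the one real difference: only the fixed property leaves a None row behind for entities where it was never set.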

Even if it's not exactly what you are asking for, I think this related question might give you an idea:
Is there some performance issue between leaving empty ListProperties or using dynamic (expando) properties?

Related

How can I use "class" as an enum value in Python/SQLAlchemy?

I have a model in SQLAlchemy of which one column is an enum. Wanting to stay as close to vanilla SQLAlchemy and the Python3 stdlib as possible, I've defined the enum against the stdlib's enum.Enum class, and then fed that to SQLAlchemy using its sqlalchemy.Enum class (as recommended somewhere in the SQLAlchemy documentation.)
import enum
class TaxonRank(enum.Enum):
    domain = "domain"
    kingdom = "kingdom"
    phylum = "phylum"
    class_ = "class"
    order = "order"
    family = "family"
    genus = "genus"
    species = "species"
And in the model:
rank = sqlalchemy.Column(sqlalchemy.Enum(TaxonRank), name = "rank", nullable = False)
This works well, except that it forces me to use class_ instead of class for one of the enum values (naturally, to avoid a conflict with the Python keyword; trying to access TaxonRank.class is a syntax error).
I don't really mind using class_, but the issue I'm having is that class_ is the value that ends up getting stored in the database. This, in turn, is causing me issues with my CRUD API, wherein I allow the user to do things like "filter on rank where rank ends with ss." Naturally this doesn't match anything because the value actually ends with ss_!
For record display I've been putting in some hacky case-by-case translation to always show the user class in place of class_. Doing something similar with sorting and filtering, however, is more tricky because I do both of those at the SQL level.
So my question: is there a good way around this mild annoyance? I don't really care about accessing TaxonRank.class_ in my Python, but perhaps there's a way to subclass the stdlib's enum.Enum to force the string representation of the class_ attribute (and thus the value that actually gets stored in the database) to the desired class?

Thanks to Sergey Shubin for pointing out an alternative way to define an enum.Enum (the functional API), which accepts "class" as a member name:
TaxonRank = enum.Enum("TaxonRank", [
    ("domain", "domain"),
    ("kingdom", "kingdom"),
    ("phylum", "phylum"),
    ("class", "class"),
    ("order", "order"),
    ("family", "family"),
    ("genus", "genus"),
    ("species", "species")
])
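A quick standard-library check of the functional form: the member's name and value are both the string class, so whichever of the two ends up persisted, the stored text is class rather than class_. The member still can't be written as TaxonRank.class, but it can be looked up by name or by value. The variable names below are only for illustration.

```python
import enum

TaxonRank = enum.Enum("TaxonRank", [
    ("domain", "domain"),
    ("kingdom", "kingdom"),
    ("phylum", "phylum"),
    ("class", "class"),
    ("order", "order"),
    ("family", "family"),
    ("genus", "genus"),
    ("species", "species"),
])

# TaxonRank.class is still a syntax error, but the member exists and can be
# reached by name (item access or getattr) or by value (call syntax).
by_name = TaxonRank["class"]
by_attr = getattr(TaxonRank, "class")
by_value = TaxonRank("class")

print(by_name.name, by_name.value)      # class class
print(by_name is by_attr is by_value)   # True
```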

I have been working on an interface for a Russian and English database. I am using PostgreSQL, but this will probably work with any database's enum type. This is my solution:
In mymodel.py:
from enum import Enum
from sqlalchemy import Column, Integer, Text
from sqlalchemy.dialects.postgresql import ENUM
from .meta import Base
class NounVar(Enum):
    abstract = 1
    proper = 2
    concrete = 3
    collective = 4
    compound = 5
class Nouns(Base):
    __tablename__ = 'nouns'
    id = Column(Integer, primary_key=True)
    name = Column(Text)
    runame = Column(Text)
    variety = Column("variety", ENUM(NounVar, name='variety_enum'))
And then further in default.py:
from .models.mymodel import Nouns, NounVar
class somecontainer(object):
    def somecallable(self):
        page = Nouns(
            name="word",
            runame="слово",
            variety=NounVar.concrete)
        self.request.dbsession.add(page)
I hope it works for you.

GAE structuring models without denormalizing

I'm trying to restructure a relational database for Google App Engine, and I'm having trouble modeling a set of relationships in a way that lets me query the data I need efficiently.
As a contrived example, say I have the following models:
from google.appengine.ext import db
# Unicorn is defined first so that ReferenceProperty(Unicorn) below can refer to it
class Unicorn(db.Model):
    name = db.StringProperty()
    color = db.IntegerProperty()
    # 'rentals' collection from RentalData
class Rental(db.Model):
    # ancestor of one or more RentalDatas
    date = db.DateProperty()
    location = db.IntegerProperty()
    # + customer, etc
class RentalData(db.Model):
    # a Rental is our parent
    unicorn = db.ReferenceProperty(Unicorn, collection_name='rentals')
    mileage_in = db.FloatProperty()
    mileage_out = db.FloatProperty()
    # + returned_fed, etc
Each rental can include multiple unicorns, so I'd rather not combine the Rental and RentalData models if I can help it. I'll almost always be starting with a given Unicorn, and I'd like to be able to query, say, whether the unicorn was returned fed on the last x rentals, without having to iterate through every RentalData in the unicorn's rentals collection and query for the parent, just to get the date to sort on.
Do I just have to bite the bullet and denormalize? I've tried duplicating date in RentalData and using parent() whenever I need other properties from the corresponding Rental. It works, but I'm worried I'm just taking the easy way out.
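The denormalized layout described above can be sketched in plain Python. The RentalData dataclass and the fed_on_last_rentals helper are hypothetical stand-ins, not App Engine models: once the date is copied onto each RentalData, "the last x rentals for this unicorn" becomes a filter plus a sort on a single kind, with no parent lookups. In the real datastore, an equality filter combined with a sort on a different property typically needs a composite index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RentalData:
    """Hypothetical plain-Python stand-in for the RentalData model."""
    unicorn: str
    rental_date: date   # denormalized copy of the parent Rental's date
    returned_fed: bool


def fed_on_last_rentals(rows, unicorn, x):
    """Return returned_fed for the unicorn's x most recent rentals, newest first.

    The date lives on RentalData itself, so no parent Rental is fetched;
    the datastore equivalent is an equality filter on unicorn plus a
    descending sort on the copied date.
    """
    mine = [r for r in rows if r.unicorn == unicorn]
    mine.sort(key=lambda r: r.rental_date, reverse=True)
    return [r.returned_fed for r in mine[:x]]


rows = [
    RentalData("sparkle", date(2011, 3, 1), True),
    RentalData("sparkle", date(2011, 5, 9), False),
    RentalData("glitter", date(2011, 4, 2), True),
    RentalData("sparkle", date(2011, 4, 20), True),
]
print(fed_on_last_rentals(rows, "sparkle", 2))  # [False, True]
```

The cost of this design is on writes: if a Rental's date ever changes, every child RentalData copy has to be updated too.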

SQLAlchemy: How do I get an object from a relationship by object's PK?

Suppose I have a one-to-many relationship like this:
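from sqlalchemy import Column, ForeignKey, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
Base = declarative_base()
class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    ...
    library_id = Column(Integer, ForeignKey("libraries.id"))
class Library(Base):
    __tablename__ = "libraries"
    id = Column(Integer, primary_key=True)
    ...
    books = relationship(Book, backref="library")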
Now, if I have an ID of a book, is there a way to retrieve it from the Library.books relationship, "get me a book with id=10 in this particular library"? Something like:
try:
    the_book = some_library.books.by_primary_key(10)
except SomeException:
    print("The book with id 10 is not found in this particular library")
Workarounds I can think of (but which I'd rather avoid using):
book = session.query(Book).get(10)
if book and book.library_id != library_id:
    raise SomeException("The book with id 10 is not found in this particular library")
or
book = session.query(Book).filter(Book.id == 10).filter(Book.library_id == library.id).one()
Reason: imagine there are several different relationships (scifi_books, books_on_loan, etc.) which specify different primaryjoin conditions. Querying manually would require writing an individual query for each of them, while SQLAlchemy already knows how to retrieve the items for each relationship. Also, I'd prefer to load the books all at once (by accessing library.books) rather than issue individual queries.
Another option, which works but is inefficient and inelegant, is:
for b in library.books:
    if b.id == book_id:
        return b
What I'm currently using is:
library_books = {b.id: b for b in library.books}
for data in list_of_dicts_containing_book_id:
    if data['id'] in library_books:
        library_books[data['id']].do_something(data)
    else:
        print("Book %s is not in the library" % data['id'])
I just hope there's a nicer built-in way to quickly retrieve items from a relationship by their id.
Update: I've asked the question on the SQLAlchemy mailing list.

SQLAlchemy's query object has a with_parent method, which does exactly that:
with_parent(instance, property=None)
Add filtering criterion that relates the given instance to a child object or collection, using its attribute state as well as an established relationship() configuration.
So in my example the code would look like:
q = session.query(Book)
q = q.with_parent(my_library, "scifi_books")
book = q.filter(Book.id == 10).one()
This will issue a separate query, though, even if my_library.scifi_books is already loaded. There seems to be no built-in way to retrieve an item from an already-loaded relationship by its PK, so the easiest option is to convert the collection to a dict and use that to look up items:
books_lookup = {b.id: b for b in my_library.scifi_books}
book = books_lookup[10]
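Once the collection is loaded, the dict lookup needs nothing from SQLAlchemy. In this sketch Book and Library are plain dataclasses standing in (hypothetically) for the mapped classes, and dict.get is used so that a missing id yields None instead of raising KeyError:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Book:
    id: int
    title: str


@dataclass
class Library:
    # Stands in for an already-loaded relationship collection.
    scifi_books: List[Book] = field(default_factory=list)


my_library = Library(scifi_books=[Book(3, "Dune"), Book(10, "Solaris")])

# Build the lookup once; each lookup afterwards is a dict access, not a query.
books_lookup = {b.id: b for b in my_library.scifi_books}

book = books_lookup.get(10)
missing = books_lookup.get(42)  # None instead of a KeyError
print(book.title, missing)      # Solaris None
```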

See the SQLAlchemy docs on querying with joins. You want something like this (be aware that this is untested):
session.query(Book).join(Book.library). \
    filter(Book.id == 10). \
    filter(Library.id == needed_library_id).all()
Book.library is a scalar (many-to-one) reference, so from that side you can use has(); from the collection side, Library.books, the equivalent is any():
session.query(Book).filter(Book.library.has(id=needed_library_id))
session.query(Library).filter(Library.books.any(id=10))
To query for multiple books at once, you can use the in_() operator:
session.query(Library).join(Library.books).filter(Book.id.in_([1, 2, 10])).all()

Entity groups, ReferenceProperty or key as a string

After building a few applications on the GAE platform, I find that I use relationships between different models in the datastore in basically every application. And I often need to find records that share the same parent (for example, matching all entries with the same parent).
From the beginning I used db.ReferenceProperty to set up my relations, like:
from google.appengine.ext import db
class Foo(db.Model):
    name = db.StringProperty()
class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)
fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        pass  # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a sub-query to fetch the Foo each time. The approach I use now is to store each Foo's key as a string in Bar.parentFoo, so I can compare it as a string with someFooKeyFromSomePlace and get rid of all the sub-query overhead.
Now I've started to look at entity groups, and I'm wondering whether they are an even better way to go. I can't really figure out how to use them.
And as for the two approaches above, are there any downsides to using them? Could storing the key as a string come back and bite me later? And last but not least, is there a faster way to do this?

Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
to avoid the extra lookup and fetch just the key from the ReferenceProperty.
See the Property class documentation.
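To make the cost difference concrete, here is a toy model in which following the reference costs one fetch per entity, while comparing the stored key costs none. CountingStore and this Bar class are hypothetical illustrations, not the App Engine API; parent_foo plays the role of reading bar.parentFoo, and parent_foo_key plays the role of get_value_for_datastore.

```python
class CountingStore:
    """Hypothetical key-value store that counts fetches (not the real datastore)."""

    def __init__(self, entities):
        self._entities = dict(entities)
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self._entities[key]


class Bar:
    """Toy entity that stores only the referenced Foo's key."""

    def __init__(self, store, parent_foo_key):
        self._store = store
        self._parent_foo_key = parent_foo_key

    @property
    def parent_foo(self):
        # Like reading bar.parentFoo: dereferencing fetches the Foo entity.
        return self._store.get(self._parent_foo_key)

    def parent_foo_key(self):
        # Like get_value_for_datastore: returns the stored key, no fetch.
        return self._parent_foo_key


store = CountingStore({"foo1": {"name": "a"}, "foo2": {"name": "b"}})
bars = [Bar(store, "foo1"), Bar(store, "foo2"), Bar(store, "foo1")]

by_key = [b for b in bars if b.parent_foo_key() == "foo1"]
print(len(by_key), store.gets)    # 2 0

by_deref = [b for b in bars if b.parent_foo["name"] == "a"]
print(len(by_deref), store.gets)  # 2 3
```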

I think you should consider this as well: it lets you fetch all the child entities of a single parent.
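from google.appengine.ext import db
# Minimal Car and Wheel models, assumed here so the snippet is complete
class Car(db.Model):
    brand = db.StringProperty()
class Wheel(db.Model):
    position = db.StringProperty()
bmw = Car(brand="BMW")
bmw.put()
lf = Wheel(parent=bmw, position="left_front")
lf.put()
lb = Wheel(parent=bmw, position="left_back")
lb.put()
bmwWheels = Wheel.all().ancestor(bmw)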
For more on data modeling, see the App Engine data modeling documentation.
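Conceptually, a child's key includes its parent's key as a prefix, which is what makes ancestor queries possible. The following is a toy model of that idea only, using tuples of (kind, id) pairs rather than real datastore Key objects; ancestor_query is a hypothetical helper that returns the Wheel entities whose key path begins with bmw's path.

```python
# Hypothetical toy model: a key is a path of (kind, id) pairs from the root.
bmw = (("Car", 1),)
audi = (("Car", 2),)

wheels = {
    bmw + (("Wheel", 1),): "left_front",
    bmw + (("Wheel", 2),): "left_back",
    audi + (("Wheel", 3),): "right_front",
}


def ancestor_query(entities, kind, ancestor):
    """Values of entities of `kind` whose key path starts with the ancestor's path."""
    return [
        value
        for key, value in entities.items()
        if key[-1][0] == kind and key[:len(ancestor)] == ancestor
    ]


print(sorted(ancestor_query(wheels, "Wheel", bmw)))  # ['left_back', 'left_front']
```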

I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used when you want to modify multiple entities in a transaction, since App Engine restricts transactions to entities within the same group. In addition, App Engine supports ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what you need to do. With the code above, you could just as easily use an ancestor query if you set each Bar's parent to a Foo.
If your purposes still require a lot of "subquerying", as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which fetches all the referenced entities your entity set needs in one big batch get instead of many small ones, removing a lot of the overhead. However, do note his warnings, especially about modifying the properties of entities while using this prefetch method.
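The idea behind the prefetch pattern can be shown with a toy model: collect the distinct referenced keys, fetch them in one batch, then attach the results. BatchStore, get_multi and prefetch are hypothetical illustrations, not Nick Johnson's code; in the db API the batch step corresponds to passing a list of keys to db.get.

```python
class BatchStore:
    """Hypothetical store that counts round trips (not the real datastore)."""

    def __init__(self, entities):
        self._entities = dict(entities)
        self.round_trips = 0

    def get_multi(self, keys):
        # One round trip for the whole batch of keys.
        self.round_trips += 1
        return [self._entities[k] for k in keys]


def prefetch(rows, ref_name, store):
    """Replace each row's ref_name key with its entity, using one batch get."""
    keys = list({row[ref_name] for row in rows})  # distinct keys only
    fetched = dict(zip(keys, store.get_multi(keys)))
    # Return new dicts so the original rows keep their keys.
    return [dict(row, **{ref_name: fetched[row[ref_name]]}) for row in rows]


store = BatchStore({"foo1": {"name": "a"}, "foo2": {"name": "b"}})
bars = [{"name": "x", "parentFoo": "foo1"},
        {"name": "y", "parentFoo": "foo2"},
        {"name": "z", "parentFoo": "foo1"}]

resolved = prefetch(bars, "parentFoo", store)
print([b["parentFoo"]["name"] for b in resolved], store.round_trips)  # ['a', 'b', 'a'] 1
```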
Not very specific, but that's all I can tell you until you're more specific about exactly what you're trying to do here.

When you design your models, you also need to consider whether you want to be able to save them within a transaction. However, only do this if you actually need transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db
class Foo(db.Model):
    name = db.StringProperty()
class Bar(db.Model):
    name = db.StringProperty()
def _save_entities(foo_name, bar_name):
    """Save the model data; Bar is a child of Foo, so both are in one entity group."""
    foo_item = Foo(name=foo_name)
    foo_item.put()
    bar_item = Bar(parent=foo_item, name=bar_name)
    bar_item.put()
    return foo_item
def main():
    # Run the save in a transaction; if any put fails, it all rolls back.
    # run_in_transaction returns whatever _save_entities returns.
    foo_item = db.run_in_transaction(_save_entities, "foo name", "bar name")
    # Query the model data using the ancestor relationship
    for item in Bar.gql("WHERE ANCESTOR IS :ancestor", ancestor=foo_item.key()).fetch(1000):
        pass  # do stuff

Filtering models with ReferenceProperties

I'm using Google App Engine and am having trouble writing queries that filter on ReferenceProperties.
For example:
class Group(db.Model):
    name = db.StringProperty(required=True)
    creator = db.ReferenceProperty(User)
class GroupMember(db.Model):
    group = db.ReferenceProperty(Group)
    user = db.ReferenceProperty(User)
And I have tried writing something like this:
members = models.GroupMember.all().filter('group.name =', group_name)
and various other things that don't work. Hopefully someone can point me in the right direction...

If your groups are uniquely named, then your "group.name" is a unique identifier of a Group entity.
That means you can write:
members = models.GroupMember.all().filter(
    "group =", models.Group.gql("WHERE name = :1", group_name).get()
)
though you only need to do that if you don't already have the group entity lying around in the stack somewhere.
Google's essay on many-to-many with appengine is here.

If what you want is to get the members of a group, ReferenceProperties have that built-in.
class GroupMember(db.Model):
    group = db.ReferenceProperty(Group, collection_name="groupMembers")
    user = db.ReferenceProperty(User, collection_name="groupMembers")
Then you can write:
# get the group entity somehow
group = Group.get(group_key)
# do something with the members, such as list the nicknames
nicknames = [x.user.nickname for x in group.groupMembers]

This would require a join, which isn't possible in App Engine. If you want to filter by a property of another model, you need to include that property on the model you're querying against.
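Here is a sketch of what including that property looks like, in plain Python. GroupMember below is a hypothetical dataclass, not the App Engine model; it carries a copy of its group's name (a hypothetical group_name field), so filtering by group name only touches GroupMember. The trade-off is that renaming a group means updating every member's copy. In the datastore, the equivalent would be an extra StringProperty on GroupMember.

```python
from dataclasses import dataclass


@dataclass
class GroupMember:
    """Hypothetical plain-Python stand-in for the GroupMember model."""
    group_key: str
    group_name: str   # denormalized copy of Group.name, kept in sync on writes
    user: str


members = [
    GroupMember("g1", "Space monkeys", "alice"),
    GroupMember("g2", "Chess club", "bob"),
    GroupMember("g1", "Space monkeys", "carol"),
]

# With the name stored on GroupMember itself, the filter needs no second model.
space_monkeys = [m.user for m in members if m.group_name == "Space monkeys"]
print(space_monkeys)  # ['alice', 'carol']
```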

This results in two datastore hits but should work; if you use memcache, it shouldn't be a problem.
group = models.Group.all().filter("name =", group_name).get()
members = models.GroupMember.all().filter('group =', group)

Using the models you defined in your question, let's say you want to list all members of a group called "Space monkeys".
mygroup = Group.gql("WHERE name = :1", 'Space monkeys').get()
for group_member in mygroup.groupmember_set:
    print('group member name is: %s' % group_member.user.name)
The "groupmember_set" is called an "implicit collection property" and is very useful. You can give it any name you want by overriding the default with the collection_name keyword parameter to ReferenceProperty; for an example, see the answer by Thomas L Holaday.
This is all explained in a very good article by Rafe Kaplan: Modeling Entity Relationships.

Develop Reference
