I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notorious love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone-aware dates & times in SQL, please.) What I would like is a single attribute on the model class that I can use to get a proper Python datetime object, and that I can also set like any other column.
Here's my table:
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    ...
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)
    ...
Here's what I have using a hybrid property (added to the above class; the datetime/pytz code is pseudocode):
@hybrid_property
def taken(self):
    # pseudocode: build an aware datetime from the two columns
    return datetime.datetime(self.taken_dt, self.taken_tz)

@taken.setter
def taken(self, dt):
    self.taken_dt = dt
    self.taken_tz = dt.tzinfo
From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class; the datetime/pytz code is pseudocode):
taken = composite(DateTimeTimeZone.from_db, taken_dt, taken_tz)
class DateTimeTimeZone(object):
    def __init__(self, dt, tz):
        self.dt = dt
        self.tz = tz

    @classmethod
    def from_db(cls, dt, tz):
        return DateTimeTimeZone(dt, tz)

    @classmethod
    def from_dt(cls, dt):
        return DateTimeTimeZone(dt, dt.tzinfo)

    def __composite_values__(self):
        return (self.dt, self.tz)

    def value(self):
        # this is here so I can get the actual datetime.datetime object
        # (pseudocode: combine the stored datetime with its timezone)
        return datetime.datetime(self.dt, self.tz)
It would seem that this approach carries a decent amount of extra overhead, and I can't figure out a way to set it directly from a datetime.datetime object, like I would any other column, without first instantiating the value object via .from_dt.
Any guidance on whether I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances which have a tz attribute set, then return a DateTimeTimeZone object. If you look at the SQLAlchemy docs for Attribute Events, you can see that you can tell SQLAlchemy to listen to an attribute-set event and call your code on it. In there you can modify the value being set however you like. You can't, however, access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know whether it will be called before or after the type conversion of the composite. You'd have to try.
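A minimal sketch of such a listener (assuming the Photo model and DateTimeTimeZone class from the question; with retval=True, whatever the listener returns replaces the value being set — and whether this behaves the same on a composite attribute is exactly the untested part mentioned above):

import datetime
from sqlalchemy import event

@event.listens_for(Photo.taken, 'set', retval=True)
def coerce_taken(target, value, oldvalue, initiator):
    # wrap incoming timezone-aware datetimes in the composite type
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        return DateTimeTimeZone.from_dt(value)
    return value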
edit: It's all about what you want to achieve, though. The AttributeEvent can help you with your data consistency, while hybrid_property and friends will make querying easier for you. You should use each one for its intended use case.
More detailed discussion on the differences between the various solutions:
hybrid_property and composite are two completely different beasts. To understand hybrid_property, one first has to understand what a column_property is and can do.
1) column_property
This one is placed on a mapper and can contain any selectable. So if you put a concrete sub-select into a column_property, you can access it read-only as if it were a concrete column. The calculation is done on the fly. You can even use it to search for entries; SQLAlchemy will construct the right select containing your sub-select for you.
Example:
from sqlalchemy import Column, Integer, Unicode, select
from sqlalchemy.orm import column_property

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)
    name = column_property(first_name + ' ' + last_name)
    category = column_property(
        select([CategoryName.name])
        .select_from(Category.__table__.join(CategoryName.__table__))
        .where(Category.user_id == id))

db.query(User).filter(User.name == 'John Doe').all()
db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_method and hybrid_property
A hybrid_property is just like a column_property, but it can take a different code path when accessed in an instance context. So you can have the selectable on the class level but a different implementation on the instance level. With a hybrid_method you can even parametrize both sides.
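An illustrative sketch, reusing the first_name/last_name columns from the example above (the plain getter runs as Python on instances, while the .expression variant is what gets rendered into queries):

from sqlalchemy.ext.hybrid import hybrid_property

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)

    @hybrid_property
    def name(self):
        # instance level: plain Python string concatenation
        return self.first_name + ' ' + self.last_name

    @name.expression
    def name(cls):
        # class level: a SQL expression usable in queries
        return cls.first_name + ' ' + cls.last_name

db.query(User).filter(User.name == 'John Doe').all()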
3) composite_attribute
This is what enables you to combine multiple concrete columns into a single logical one. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from it and use them in selects. This integrates neatly into the query framework and should not impose any additional problems. In my experience the use cases for composite columns are rather rare, but yours seems fine. For modification of values you can always use AttributeEvents. If you want to have the whole instance available, you'd have to use a MapperEvent called before the flush. This certainly works; I used it to implement a completely transparent audit-trail tracking system which stored every value changed in every table in a separate set of tables.
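A minimal sketch of that flush-time approach, assuming a plain Session and the Photo model from the question (session.new and session.dirty hold the pending instances, and the whole object is available here, unlike in an attribute event):

from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, 'before_flush')
def check_photo_consistency(session, flush_context, instances):
    for obj in list(session.new) + list(session.dirty):
        # e.g. derive a missing tz name from the datetime being stored
        if isinstance(obj, Photo) and obj.taken_dt is not None and obj.taken_tz is None:
            obj.taken_tz = str(obj.taken_dt.tzinfo)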
Related
I have a complex model. Let's say it contains 100 entities, all of which are related to each other in some way. Some are many to many, some are one to one, some are many to one, and so on.
These entities all have start and end timestamps indicating valid time ranges. When loading these entities via query, I wish to populate the relationship fields only with entities that have start and end stamps wrapping a given timestamp: for example datetime.now(), or yesterday, or whenever.
I'll define two models here for example, but assume there are a vast number of others:
class User(base):
    __tablename__ = 'User'
    uid = Column(Integer, primary_key=True)

class Role(base):
    __tablename__ = 'Role'
    rid = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('User.uid'))
    user = relationship(User, backref=backref('Role'))
    start = Column(DateTime, default=func.current_timestamp())
    end = Column(DateTime)
Now, I want to return entities via RESTful endpoints in Flask. So, a get might look something like this:
def get(self, uid=None) -> Tuple[Dict, int]:
    query = User.query
    if uid:
        query = query.filter_by(uid=uid)
    return create_response(
        query.all(),
        200
    )
Now, I want to restrict the Role entities returned as children to the User returned by the above query. Obviously, this could easily be done by just extending the query to filter the Roles. The problem comes when this scales up. Consider 100 nested levels of child relationships. Now consider restful endpoints providing a get for any one of them. It would be practically impossible to write out a query to properly filter every different level of child.
My desired solution was to define loading behavior on each entity, making everything composable. For example:
class User(base):
    __tablename__ = 'User'
    role = relationship("Role",
                        primaryjoin="and_(Role.start <= {desired_timestamp}, "
                                    "Role.end >= {desired_timestamp})")
The problem, of course, is that we don't know our desired_timestamp at class definition time, as it is passed at runtime. I have thought of some hacks for this, such as redefining everything at runtime, but I'm not happy with them. Does anyone have some insight as to the "right" way to do something like this?
I want to duplicate (copy) an object mapped by SQLAlchemy. It should only copy the data created by me, not all the underlying stuff. It shouldn't copy the primary keys or unique values.
This is useful when creating new data entries which differ only a little from the last one, so the user doesn't have to enter all the data again.
An important requirement is that this needs to work when the column name in the table (e.g. name) and the member name (e.g. _name) in the Python class are not the same.
This (simplified) code works for all declarative_base() derived classes, BUT ONLY when the column name and the member name are the same.
import sqlalchemy as sa

def DuplicateObject(oldObj):
    mapper = sa.inspect(type(oldObj))
    newObj = type(oldObj)()
    for col in mapper.columns:
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(newObj, col.key, getattr(oldObj, col.key))
    return newObj
col.key is the name of the column in the table. When the member name in the Python class is different, this doesn't work. I don't know how SQLAlchemy connects the column name with the member name. How does SQLA know this connection? How can I take care of it?
import sqlalchemy as sa

def duplicate_object(old_obj):
    # SQLAlchemy related data class?
    if not isinstance(old_obj, _Base):
        raise TypeError('The given parameter with type {} is not '
                        'mapped by SQLAlchemy.'.format(type(old_obj)))
    mapper = sa.inspect(type(old_obj))
    new_obj = type(old_obj)()
    for name, col in mapper.columns.items():
        # skip primary-key and unique columns
        if not col.primary_key and not col.unique:
            setattr(new_obj, name, getattr(old_obj, name))
    return new_obj
It looks like this works, even when the member names begin with double underscores (__name).
But someone on the SQLAlchemy mailing list mentioned:
It’s not a generalized solution for the whole world, though. It doesn’t take into account columns that are part of unique Index objects or columns that are mentioned in standalone UniqueConstraint objects.
But because the SQLAlchemy docs are (for me!) quite hard to read and understand, I am not really sure what happens in that code, especially in the for-construct. What is behind items(), and why are there two parameters (name, col)?
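For what it's worth: mapper.columns is a dictionary-like collection mapping keys to Column objects, so .items() behaves just like dict.items(), yielding (key, Column) pairs that unpack into the two loop variables:

for name, col in mapper.columns.items():
    # name is the mapped key, col the Column object
    print(name, col)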
I'm not sure what this is called since it is new to me, but here is what I want to do:
I have two tables in my database: TableA and TableB. TableA has pk a_id and another field called a_code. TableB has pk b_id and another field called b_code.
I have these tables mapped in my sqlalchemy code and they work fine. I want to create a third object called TableC that doesn't actually exist in my database, but that contains combinations of a_code and b_code, something like this:
class TableC:
    a_code = String
    b_code = String
Then I'd like to query TableC like:
TableC.query.filter(and_(
    TableC.a_code == x,
    TableC.b_code == y)).all()
Question 1) Does this type of thing have a name? 2) How do I do the mapping (using declarative would be nice)?
I don't really have a complete understanding of the query you are trying to express, whether it's a union or a join or some third thing, but that aside, it certainly is possible to map an arbitrary selectable (anything you can pass to a database that returns rows).
I'll start with the assumption that you want some kind of union of TableA and TableB, which would be all of the rows in A, and also all of the rows in B. This is easy enough to change to a different concept if you reveal more information about the shape of the data you are expressing.
We'll start by setting up the real tables, and classes to map them, in the declarative style.
from sqlalchemy import *
import sqlalchemy.ext.declarative

Base = sqlalchemy.ext.declarative.declarative_base()

class TableA(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a_code = Column(String)

class TableB(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    b_code = Column(String)
Since we've used declarative, we don't actually have table instances to work from, which is necessary for the next part. There are many ways to access the tables, but the way I prefer is to use SQLAlchemy's mapping introspection methods, since that will work no matter how the class was mapped.
from sqlalchemy.orm.attributes import manager_of_class
a_table = manager_of_class(TableA).mapper.mapped_table
b_table = manager_of_class(TableB).mapper.mapped_table
Next, we need an actual sql expression that represents the data we are interested in.
This is a union, which results in columns that look the same as the columns defined in the first class, id and a_code. We could rename it, but that's not a very important part of the example.
ab_view_sel = sqlalchemy.alias(a_table.select().union(b_table.select()))
Finally, we map a class to this. It is possible to use declarative for this, but it's actually more code to do it that way instead of the classic mapping style, not less. Notice that the class inherits from object, not Base.
class ViewAB(object):
    pass

sqlalchemy.orm.mapper(ViewAB, ab_view_sel)
And that's pretty much it. Of course there are some limitations with this; the most obvious being there's no (trivial) way to save instances of ViewAB back to the database.
There isn't really a concept of 'virtual tables', but it is possible to send a single query that 'joins' the data from multiple tables. This is probably as close as you can get to what you want.
For example, one way to do this in sqlalchemy/elixir would be (and this isn't far off from what you've shown, we're just not querying a 'virtual' table):
result = session.query(TableA, TableB).filter(TableA.a_code==x).filter(TableB.b_code==y).all()
This is similar to an SQL inner join, with some qualifying conditions in the filter statements. This isn't going to give you an sqlalchemy table object, but will give you a list of objects from each real table.
It looks like SQLAlchemy allows you to map an arbitrary query to a class, e.g. from SQLAlchemy: one class – two tables:
usersaddresses = sql.join(t_users, t_addresses,
                          t_users.c.id == t_addresses.c.user_id)

class UserAddress(object):
    def __repr__(self):
        return "<FullUser(%s,%s,%s)" % (self.id, self.name, self.address)

mapper(UserAddress, usersaddresses, properties={
    'id': [t_users.c.id, t_addresses.c.user_id],
})

f = session.query(UserAddress).filter_by(name='Hagar').one()
After building a few applications on the GAE platform, I usually use some relationships between different models in the datastore in basically every application. And often I find myself needing to see which records have the same parent (like matching all entries with the same parent).
From the beginning I used the db.ReferenceProperty to get my relations going, like:
class Foo(db.Model):
    name = db.StringProperty()

class Bar(db.Model):
    name = db.StringProperty()
    parentFoo = db.ReferenceProperty(Foo)

fooKey = someFooKeyFromSomePlace
bars = Bar.all()
for bar in bars:
    if bar.parentFoo.key() == fooKey:
        pass  # do stuff
But lately I've abandoned this approach, since bar.parentFoo.key() makes a sub-query to fetch Foo each time. The approach I now use is to store each Foo key as a string on Bar.parentFoo, and this way I can string-compare it with someFooKeyFromSomePlace and get rid of all the sub-query overhead.
Now I've started to look at entity groups and am wondering if that is an even better way to go? I can't really figure out how to use them.
And as for the two approaches above, I'm wondering if there are any downsides to using them. Could using a stored key string come back and bite me? And last but not least: is there a faster way to do this?
Tip:
replace...
bar.parentFoo.key() == fooKey
with...
Bar.parentFoo.get_value_for_datastore(bar) == fooKey
To avoid the extra lookup and just fetch the key from the ReferenceProperty
See Property Class
I think you should consider this as well. This will help you fetch all the child entities of a single parent.
bmw = Car(brand="BMW")
bmw.put()

lf = Wheel(parent=bmw, position="left_front")
lf.put()

lb = Wheel(parent=bmw, position="left_back")
lb.put()

bmwWheels = Wheel.all().ancestor(bmw)
For more reference on modeling, you can refer to Appengine Data modeling.
I'm not sure what you're trying to do with that example block of code, but I get the feeling it could be accomplished with:
bars = Bar.all().filter("parentFoo =", SomeFoo)
As for entity groups, they are mainly used if you want to alter multiple things in transactions, since appengine restricts that to entities within the same group only; in addition, appengine allows ancestor filters ( http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_ancestor ), which could be useful depending on what it is you need to do. With the code above, you could very easily also use an ancestor query if you set the parent of Bar to be a Foo.
If your purposes still require a lot of "subquerying" as you put it, there is a neat prefetch pattern that Nick Johnson outlines here: http://blog.notdot.net/2010/01/ReferenceProperty-prefetching-in-App-Engine which basically fetches all the properties you need in your entity set as one giant get instead of a bunch of tiny ones, which gets rid of a lot of the overhead. However do note his warnings, especially regarding altering the properties of entities while using this prefetch method.
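The gist of that prefetch pattern, roughly sketched against the Foo/Bar models above (see the linked post for the real, generalized implementation):

bars = Bar.all().fetch(100)

# collect the raw keys without dereferencing, then fetch all parents
# in one batch get instead of one sub-query per entity
foo_keys = set(Bar.parentFoo.get_value_for_datastore(b) for b in bars)
foos = dict((f.key(), f) for f in db.get(list(foo_keys)))

for b in bars:
    b.parentFoo = foos[Bar.parentFoo.get_value_for_datastore(b)]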
Not very specific, but that's all the info I can give you until you are more specific about exactly what you're trying to do here.
When you design your models you also need to consider whether you want to be able to save them within a transaction. However, only do this if you need to use transactions.
An alternative approach is to assign the parent like so:
from google.appengine.ext import db
class Foo(db.Model):
name = db.StringProperty()
class Bar(db.Model):
name = db.StringProperty()
def _save_entities( foo_name, bar_name ):
"""Save the model data"""
foo_item = Foo( name = foo_name )
foo_item.put()
bar_item = Bar( parent = foo_item, name = bar_name )
bar_item.put()
def main():
# Run the save in a transaction, if any fail this should all roll back
db.run_in_transaction( _save_transaction, "foo name", "bar name" )
# to query the model data using the ancestor relationship
for item in bar_item.gql("WHERE ANCESTOR IS :ancestor", ancestor = foo_item.key()).fetch(1000):
# do stuff
I'm using SQLAlchemy declarative base to define my model. I defined a property name that is computed from one of the columns (title):
class Entry(Base):
    __tablename__ = "blog_entry"
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))
    ...

    @property
    def name(self):
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()
When trying to perform a query using name, SQLAlchemy throws an error:
Session.query(Entry).filter(Entry.name == my_name).first()
>>> ArgumentError: filter() argument must be of type sqlalchemy.sql.ClauseElement or string
After investigating for a while, I found that maybe comparable_using() could help, but I couldn't find any example that shows a comparator that references another column of the table.
Is this even possible or is there a better approach?
From SQLAlchemy 0.7 you can achieve this using hybrid_property.
see the docs here: http://www.sqlalchemy.org/docs/orm/extensions/hybrid.html
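A minimal sketch of how that could look for the Entry model above. Note the SQL side is necessarily simplified: stripping arbitrary characters has no portable SQL equivalent, so only the lower-casing and space replacement are expressed there:

import re
from sqlalchemy import func
from sqlalchemy.ext.hybrid import hybrid_property

class Entry(Base):
    __tablename__ = 'blog_entry'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))

    @hybrid_property
    def name(self):
        # instance level: the original Python implementation
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()

    @name.expression
    def name(cls):
        # class level: a (simplified) SQL rendering of the same idea
        return func.lower(func.replace(cls.title, ' ', '-'))

Session.query(Entry).filter(Entry.name == my_name).first()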
Can you imagine what SQL should be issued for your query? The database knows nothing about name, it has neither a way to calculate it, nor to use any index to speed up the search.
My best bet is a full scan, fetching title for every record, calculating name and then filtering by it. Crudely, you can do it with [x for x in Session.query(Entry).all() if x.name == my_name][0]. With a bit more sophistication, you'd only fetch id and title in the filtering pass, and then fetch the full record(s) by id.
Note that a full scan is usually not nice from a performance POV, unless your table is quite small.
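A minimal sketch of that two-pass variant (make_name is a hypothetical helper mirroring the name property):

import re

def make_name(title):
    # same transformation as Entry.name
    return re.sub(r'[^a-zA-Z0-9 ]', '', title).replace(' ', '-').lower()

# first pass: fetch only (id, title) pairs and filter in Python
ids = [eid for eid, title in Session.query(Entry.id, Entry.title)
       if make_name(title) == my_name]

# second pass: fetch the full record(s) by primary key
matches = Session.query(Entry).filter(Entry.id.in_(ids)).all()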