I'm fairly new (but not a complete beginner) to Python, SQLAlchemy and PostgreSQL, and I'm trying hard to understand inheritance.
As I am taking over another programmer's code, I need to understand what is necessary, and where, for the inheritance concept to work.
My questions are:
Is it possible to rely only on SQLAlchemy for inheritance? In other words, can SQLAlchemy apply inheritance on Postgresql database tables that were created without specifying INHERITS=?
Is the declarative_base technology (SQLAlchemy) necessary to use inheritance the proper way? If so, we'll have to rewrite everything, so please don't discourage me.
Assuming we can use Table instance, empty Entity classes and mapper(), could you give me a (very simple) example of how to go through the process properly (or a link to an easily understandable tutorial - I did not find any easy enough yet).
The real world we are working on is real estate objects. So we basically have
- one table immobject(id, createtime)
- one table objectattribute(id, immoobject_id, oatype)
- several attribute tables: oa_attributename(oa_id, attributevalue)
Thanks for your help in advance.
Vincent
Welcome to Stack Overflow. In the future, if you have more than one question, you should provide a separate post for each. Feel free to link them together if it might help provide context.
Table inheritance in postgres is a very different thing and solves a different set of problems from class inheritance in python, and sqlalchemy makes no attempt to combine them.
When you use table inheritance in postgres, you're doing some trickery at the schema level so that more elaborate constraints can be enforced than might be easy to express in other ways. Once you have designed your schema, applications aren't normally aware of the inheritance. If they insert a row, it just magically appears in the parent table (much like a view). This is useful, for instance, for making some kinds of bulk operations more efficient (you can just drop the table for the month of January).
This is a fundamentally different idea from inheritance as seen in OOP (in python or otherwise, with relational persistence or otherwise). In that case, the application is aware that two types are related, and that the subtype is a permissible substitute for the supertype. "A holding is an address, a contact has an address therefore a contact can have a holding."
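To make the "permissible substitute" idea concrete, here's a minimal sketch in plain Python with no persistence at all. The class names follow the sentence above; the fields (street, city, owner) are made up for illustration.

```python
class Address:
    """Supertype: anything with a street and a city."""

    def __init__(self, street, city):
        self.street = street
        self.city = city

    def label(self):
        return f"{self.street}, {self.city}"


class Holding(Address):
    """Subtype: an address that also records an owner (hypothetical field)."""

    def __init__(self, street, city, owner):
        super().__init__(street, city)
        self.owner = owner


class Contact:
    """A contact has an address; any Address subtype is acceptable."""

    def __init__(self, name, address):
        if not isinstance(address, Address):
            raise TypeError("address must be an Address (or a subclass)")
        self.name = name
        self.address = address


# A Holding is accepted wherever an Address is expected.
contact = Contact("Vincent", Holding("1 Main St", "Basel", owner="ACME"))
print(contact.address.label())
```

Nothing here involves the database: the substitution is purely a property of the Python types.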
Which of these, (mostly orthogonal) tools you need depends on the application. You might need neither, you might need both.
Sqlalchemy's mechanisms for working with object inheritance are flexible and robust. You should use them in favor of a home-built solution if they are compatible with your particular needs (this should be true for almost all applications).
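For a rough picture of what object inheritance looks like at the table level without any INHERITS clause, here's a sketch that uses only the standard library's sqlite3 rather than PostgreSQL or SQLAlchemy. The table and column names are loosely borrowed from the question; the oa_price table and the 'price' value of oatype are assumptions. The child table shares its key with the parent table and oatype acts as a discriminator, which is the general shape of schema that SQLAlchemy's joined table inheritance maps. The sketch itself does no ORM mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE immoobject (
        id INTEGER PRIMARY KEY,
        createtime TEXT
    );
    -- "Parent" table: one row per attribute; oatype says which child
    -- table holds the details (the discriminator).
    CREATE TABLE objectattribute (
        id INTEGER PRIMARY KEY,
        immoobject_id INTEGER REFERENCES immoobject(id),
        oatype TEXT NOT NULL
    );
    -- "Child" table (hypothetical): shares its key with objectattribute.
    CREATE TABLE oa_price (
        oa_id INTEGER PRIMARY KEY REFERENCES objectattribute(id),
        attributevalue REAL
    );
""")
conn.execute("INSERT INTO immoobject VALUES (1, '2010-01-01')")
conn.execute("INSERT INTO objectattribute VALUES (10, 1, 'price')")
conn.execute("INSERT INTO oa_price VALUES (10, 250000.0)")

# Loading a "subclass" row means joining parent and child on the shared key.
rows = conn.execute("""
    SELECT a.id, a.oatype, p.attributevalue
    FROM objectattribute AS a
    JOIN oa_price AS p ON p.oa_id = a.id
    WHERE a.immoobject_id = ?
""", (1,)).fetchall()
print(rows)
```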
The declarative extension is a convenience. It allows you to describe the mapped table, the python class and the mapping between the two in one 'thing' instead of three. It makes your code more "DRY". It is, however, only a convenience layered on top of "classic sqlalchemy" and it isn't necessary by any measure.
If you find that you need table inheritance that's visible from sqlalchemy, your mapped classes won't be any different from not using those features. Tables with inheritance are still normal relations (like tables or views) and can be mapped without knowledge of the inheritance in the python code.
For your #3, you don't necessarily have to declare empty entity classes to use mapper. If your application doesn't need fancy properties, you can just use introspection and metaclasses to model the existing tables without defining them. Here's what I did:
    import sqlalchemy
    import sqlalchemy.orm

    mymetadata = sqlalchemy.MetaData()
    myengine = sqlalchemy.create_engine(...)

    def named_table(tablename):
        u"return a sqlalchemy.Table object given a SQL table name"
        return sqlalchemy.Table(tablename, mymetadata, autoload=True, autoload_with=myengine)

    def new_bound_class(table):
        u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a sqlalchemy.Table object"
        fieldnames = table.c.keys()

        def format_attributes(obj, transform):
            # render each column as name=value, using the value stored on obj
            attributes = [u'%s=%s' % (x, transform(getattr(obj, x, None))) for x in fieldnames]
            return u', '.join(attributes)

        class DynamicORMClass(object):
            def __init__(self, **kw):
                u"Keyword arguments may be used to initialize fields/columns"
                for key in kw:
                    if key in fieldnames:
                        setattr(self, key, kw[key])
                    else:
                        raise KeyError('%s is not a valid field/column' % (key,))

            def __repr__(self):
                return u'%s(%s)' % (self.__class__.__name__, format_attributes(self, repr))

            def __str__(self):
                return u'%s(%s)' % (str(self.__class__), format_attributes(self, str))

        DynamicORMClass.__doc__ = u"This is a dynamic class created using SQLAlchemy based on table %s" % (table,)
        # mapper() returns a Mapper, so map the class and hand back the class itself
        sqlalchemy.orm.mapper(DynamicORMClass, table)
        return DynamicORMClass

    def named_orm_class(table):
        u"returns a new ORM class (processed by sqlalchemy.orm.mapper) given a table name or object"
        if not isinstance(table, sqlalchemy.Table):
            table = named_table(table)
        return new_bound_class(table)
Example of use:
    >>> myclass = named_orm_class('mytable')
    >>> session = Session()
    >>> obj = myclass(name='Fred', age=25, ...)
    >>> session.add(obj)
    >>> session.commit()
    >>> print str(obj) # will print all column=value pairs
I beefed up my versions of new_bound_class and named_orm_class a little more with decorators, etc. to provide extra capabilities, and you can too. Of course, under the covers, it is declaring an empty entity class. But you don't have to do it, except this one time.
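If the "declare the class once, generically" idea is unfamiliar, here's a stripped-down sketch of just that part, with no SQLAlchemy involved. The helper name make_record_class and the example fields are made up; it builds a class at runtime from a list of field names, much as the code above does from a table's columns.

```python
def make_record_class(name, fieldnames):
    """Build a class whose constructor accepts only the given field names."""
    fieldnames = tuple(fieldnames)

    def __init__(self, **kw):
        for key in fieldnames:
            setattr(self, key, None)  # every field exists, defaulting to None
        for key, value in kw.items():
            if key not in fieldnames:
                raise KeyError(f"{key} is not a valid field/column")
            setattr(self, key, value)

    def __repr__(self):
        pairs = ", ".join(f"{f}={getattr(self, f)!r}" for f in fieldnames)
        return f"{name}({pairs})"

    # type(name, bases, namespace) creates the class object dynamically.
    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})


Person = make_record_class("Person", ["name", "age"])
fred = Person(name="Fred", age=25)
print(fred)
```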
This will tide you over until you decide that you're tired of doing all those joins yourself, and why can't I just have an object attribute that does a lazy select query against related classes whenever I use it. That's when you make the leap to declarative (or Elixir).
Related
I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notoriously love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone aware dates & times in SQL, please.) What I would like is a single property on the model class that I can use to get a proper python datetime object, and that I can also set like any other column.
Here's my table:
    class Photo(db.Model):
        __tablename__ = 'photos'
        id = db.Column(db.Integer, primary_key=True)
        ...
        taken_dt = db.Column(db.DateTime, nullable=False)
        taken_tz = db.Column(db.String(64), nullable=False)
        ...
Here's what I have using a hybrid property (added to the above class, datetime/pytz code is pseudocode):

    @hybrid_property
    def taken(self):
        return datetime.datetime(self.taken_dt, self.taken_tz)

    @taken.setter
    def taken(self, dt):
        self.taken_dt = dt
        self.taken_tz = dt.tzinfo

From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class, datetime/pytz code is pseudocode):

    taken = composite(DateTimeTimeZone.from_db, taken_dt, taken_tz)

    class DateTimeTimeZone(object):
        def __init__(self, dt, tz):
            self.dt = dt
            self.tz = tz

        @classmethod
        def from_db(cls, dt, tz):
            return DateTimeTimeZone(dt, tz)

        @classmethod
        def from_dt(cls, dt):
            return DateTimeTimeZone(dt, dt.tzinfo)

        def __composite_values__(self):
            return (self.dt, self.tz)

        def value(self):
            # This is here so I can get the actual datetime.datetime object
            return datetime.datetime(self.dt, self.tz)
It would seem that this method has a decent amount of extra overhead, and I can't figure out a way to set it like I would any other column directly from a datetime.datetime object without instantiating the value object first using .from_dt.
Any guidance on if I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances which have a tz attribute set and then return a DateTimeTimeZone object. If you look at the SQLAlchemy docs for Attribute Events you can see that you can tell SQLAlchemy to listen to an attribute-set event and call your code on that. In there you can do any modification to the value being set as you like. You can't however access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know if this will be called before or after the type-conversion of the composite. You'd have to try.
edit: It's all about what you want to achieve though. The AttributeEvent can help you with your data consistency, while the hybrid_property and friends will make querying easier for you. You should use each one for its intended use-case.
More detailed discussion on the differences between the various solutions:
hybrid_property and composite are two completely different beasts. To understand hybrid_property one first has to understand what a column_property is and can do.
1) column_property
This one is placed on a mapper and can contain any selectable. So if you put a concrete sub-select into a column_property you can access it read-only as if it were a concrete column. The calculation is done on the fly. You can even use it to search for entries. SQLAlchemy will construct the right select containing your sub-select for you.
Example:
    class User(Base):
        id = Column(Integer, primary_key=True)
        first_name = Column(Unicode)
        last_name = Column(Unicode)
        name = column_property(first_name + ' ' + last_name)
        category = column_property(select([CategoryName.name])
                                   .select_from(Category.__table__
                                                .join(CategoryName.__table__))
                                   .where(Category.user_id == id))

    db.query(User).filter(User.name == 'John Doe').all()
    db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_method and hybrid_property
A hybrid_property is just like a column_property but can call a different code-path when you are in an instance context. So you can have the selectable on the class level but a different implementation on the instance level. With a hybrid_method you can even parametrize both sides.
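The "different code-path depending on context" part is ordinary Python descriptor behaviour. Here's a tiny stand-in to show the mechanism only: it is not SQLAlchemy's implementation, the class name hybrid is made up, and the class-level branch returns a plain SQL-looking string where the real thing would return a SQL expression object.

```python
class hybrid:
    """Tiny stand-in for the hybrid idea; not SQLAlchemy's implementation."""

    def __init__(self, fget):
        self.fget = fget  # used for instance access
        self.expr = fget  # used for class access unless overridden

    def expression(self, expr):
        """Register a separate implementation for class-level access."""
        self.expr = expr
        return self

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed as User.name: class context
            return self.expr(owner)
        # Accessed as some_user.name: instance context
        return self.fget(instance)


class User:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @hybrid
    def name(self):
        return self.first_name + " " + self.last_name

    @name.expression
    def name(cls):
        # Placeholder string standing in for a SQL expression
        return "first_name || ' ' || last_name"


print(User("John", "Doe").name)
print(User.name)
```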
3) composite
This is what enables you to combine multiple concrete columns to a logical single one. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from there and use it in the selects. This integrates neatly in the query framework and should not impose any additional problems. In my experience the use-cases for composite columns are rather rare. Your use-case seems fine. For modification of values you can always use AttributeEvents. If you want to have the whole instance available you'd have to have a MapperEvent called before flush. This certainly works, as I used this to implement a completely transparent Audit Trail tracking system which stored every value changed in every table in a separate set of tables.
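As a sketch of what such a value class could look like for the date-plus-timezone case in the question, here's a plain-Python version that can be exercised without a database. The `__composite_values__` method is the hook the question already uses; the ZONES lookup table is a made-up stand-in for a real timezone database such as pytz, and storing the zone name via tzname() is an assumption about what goes into taken_tz.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a real timezone database such as pytz.
ZONES = {
    "UTC": timezone.utc,
    "UTC+02:00": timezone(timedelta(hours=2), "UTC+02:00"),
}


class DateTimeTimeZone:
    def __init__(self, dt, tz):
        self.dt = dt  # naive datetime, as stored in taken_dt
        self.tz = tz  # zone name, as stored in taken_tz

    @classmethod
    def from_dt(cls, aware):
        """Split an aware datetime into the two stored values."""
        return cls(aware.replace(tzinfo=None), aware.tzname())

    def __composite_values__(self):
        # The values in column order: (taken_dt, taken_tz)
        return (self.dt, self.tz)

    def __eq__(self, other):
        return (isinstance(other, DateTimeTimeZone)
                and other.__composite_values__() == self.__composite_values__())

    def value(self):
        """Recombine the stored pair into an aware datetime."""
        return self.dt.replace(tzinfo=ZONES[self.tz])


aware = datetime(2015, 6, 1, 12, 30, tzinfo=ZONES["UTC+02:00"])
stored = DateTimeTimeZone.from_dt(aware)
print(stored.__composite_values__())
print(stored.value() == aware)
```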
I am writing a Django app, which will send some data from the site to a python script to process. I am planning on sending this data as a JSON string (this need not be the case). Some of the values sent over would ideally be class instances, however this is clearly not possible, and the class name plus any arguments needed to initialize the class must somehow be serialized into a JSON value before then being deserialized by the python script. This could be achieved with the code below, but it has several problems:
My attempt
I have put all the data needed for each class, in a list and used that to initialize each class:
    import json

    class Class1():
        def __init__(self, *args, **kwargs):
            for k, v in kwargs.items():
                setattr(self, k, v)
            self._others = args

    class Bar():
        POTENTIAL_OBJECTS = {"RANGE": range,
                             "Class1": Class1}

        def __init__(self, json_string):
            python_dict = json.loads(json_string)
            for key, value in python_dict.items():
                if isinstance(value, list) and value[0] in Bar.POTENTIAL_OBJECTS:
                    setattr(self, key, Bar.POTENTIAL_OBJECTS[value[0]](*value[1], **value[2]))
                else:
                    setattr(self, key, value)

    example = ('{ "key_1":"Some string", "key_2":["heres", "a", "list"],'
               '"key_3":["RANGE", [10], {}], "key_4":["Class1", ["stuff"], {"stuff2":"x"}] }')
    a = Bar(example)
The Problems with my approach
Apart from generally being a bit messy and not particularly elegant, there are other problems. Some of the lists in the JSON object will be generated by the user, and this obviously presents problems if the user uses a key from POTENTIAL_OBJECTS. (In a non-simplified version, Bar will have lots of subclasses, each with a second POTENTIAL_OBJECTS so keeping track of all the potential values for front-end validation would be tricky).
My Question
It feels like this must be a reasonably common thing that is needed and there must be some standard patterns or ways of achieving this. Is there a common/better approach/method to achieve this?
EDIT: I have realised, one way round the problem is to make all the keys in POTENTIAL_OBJECTS start with an underscore, and then validate against any underscores in user-inputs at the front-end. It still seems like there must be a better way to de-serialize from JSON to more complex objects than strings/ints/bools/lists etc.
Instead of having one master method to turn any arbitrary JSON into an arbitrary hierarchy of Python objects, the typical pattern would be to create a Django model for each type of thing you are trying to model. Relationships between them would then be modeled via relationship fields (ForeignKey, ManyToMany, etc, as appropriate). For instance, you might create a class Employee that models an employee, and a class Paycheck. Paycheck could then have a ForeignKey field named issued_to that refers to an Employee.
Note also that any scheme similar to the one you describe (where user-created JSON is translated directly into arbitrary Python objects) would have security implications, potentially allowing users to execute arbitrary code in the context of the Django server, though if you were to attempt it, the whitelist approach you have started here would be a decent place to start as a way to do it safely.
In short, you're reinventing most of what Django already does for you. The Django ORM features will help you to create models of the specific things you are interested in, validate the data, turn those data into Python objects safely, and even save instances of these models in the database for retrieval later.
That said, if you are to parse a JSON string directly into an object hierarchy, you would have to do a full traversal instead of just going over the top-level items. To do that, you should look into doing something like a depth-first traversal, creating new model instances at each new node in the hierarchy. If you want to validate these inputs server-side, you'd need to replicate this work in Javascript as well.
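If you do go down the direct-parsing route, one way to get both the traversal and the whitelist is the object_hook parameter of json.loads, which is called for every decoded JSON object, innermost first. The sketch below is only an illustration: the reserved "__type__" key and the "args"/"kwargs" layout are assumptions, not an established convention. Because tagged values are JSON objects rather than lists, a user-supplied list such as ["RANGE", ...] can no longer be mistaken for one.

```python
import json


class Class1:
    def __init__(self, *args, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        self._others = args


# Whitelist of constructible types, keyed by tag.
ALLOWED = {"range": range, "Class1": Class1}
TAG = "__type__"  # hypothetical reserved key


def build(obj):
    """object_hook: called for every decoded JSON object, innermost first."""
    if TAG not in obj:
        return obj  # ordinary dict, leave untouched
    factory = ALLOWED.get(obj[TAG])
    if factory is None:
        raise ValueError(f"type {obj[TAG]!r} is not allowed")
    return factory(*obj.get("args", []), **obj.get("kwargs", {}))


example = """{
    "key_1": "Some string",
    "key_2": ["RANGE", "a", "list"],
    "key_3": {"__type__": "range", "args": [10]},
    "key_4": {"__type__": "Class1", "args": ["stuff"], "kwargs": {"stuff2": "x"}}
}"""
data = json.loads(example, object_hook=build)
print(data["key_3"], data["key_4"].stuff2)
```

If users can submit JSON objects as well as lists, the reserved key still has to be rejected in their input; the whitelist only limits which types can be built.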
How can I make a relationship without having a foreign key?
    @declared_attr
    def custom_stuff(cls):
        joinstr = 'foreign(Custom.name) == "{name}"'.format(name=cls.__name__)
        return db.relationship('Custom', primaryjoin=joinstr)
This raises an error:
ArgumentError: Could not locate any simple equality expressions involving locally mapped foreign key columns for primary join condition
This works, but I think it's a pretty ugly hack.
    @declared_attr
    def custom_stuff(cls):
        joinstr = ('or_('
                   'and_(foreign(Custom.name) == MyTable.title, '
                   'foreign(Custom.name) != MyTable.title), '
                   'foreign(Custom.name) == "{name}")').format(name=cls.__name__)
        return db.relationship('Custom', primaryjoin=joinstr)
Is there a better way to do this?
EDIT: the extra attribute needs to be added as @declared_attr and has to use a relationship, since our serializer is written so it works with declared attrs.
Doing this with @hybrid_property or something else would work, but then our json serializer would break. Getting that to work seems harder than defining a relationship.
You don't necessarily have to define relationships when creating tables in a database (this applies to almost every SQL database). You can still join tables that don't have a predefined foreign-key relation (the main reason for foreign keys is to enforce data consistency, not to define what you can join).
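To illustrate at the SQL level, here's a small sketch using the standard library's sqlite3 (not SQLAlchemy). The table layout and the sample rows are invented, loosely following the names in the question. Neither table declares a foreign key, and the join condition compares custom.name with a constant, yet the join works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY / REFERENCES clause anywhere in this schema.
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE custom (id INTEGER PRIMARY KEY, name TEXT, value TEXT);
""")
conn.executemany("INSERT INTO mytable (title) VALUES (?)", [("alpha",), ("beta",)])
conn.executemany(
    "INSERT INTO custom (name, value) VALUES (?, ?)",
    [("MyTable", "shared setting"), ("Other", "ignored")],
)

# Join every mytable row to the custom rows whose name matches a constant.
rows = conn.execute("""
    SELECT m.title, c.value
    FROM mytable AS m
    JOIN custom AS c ON c.name = ?
    ORDER BY m.title
""", ("MyTable",)).fetchall()
print(rows)
```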
See this for reference - http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.join
If you would still like to "show" the database and table structure/model, then better use some entity relationship modeler like ERwin (or some diagramming software).
Maybe you should consider what the word "relationship" means. It states that one thing "depends" on another, or that both depend on each other.
If you're trying to define a relationship in the ERM (Entity Relationship Model), that means you should state which of the entities "depends" on the other. Also, databases have some tricks for dealing faster with tables that are explicitly related to each other than with tables that are only related in a more abstract way.
Is there any reason why you need to do that?
I've just noticed that in a stackoverflow reply sqlalchemy's creator Mike Bayer (zzzeek) said that "Custom metaclasses aren't needed with Declarative for simple use cases, and pretty much not at all for hard use cases either. Mixins + custom bases should be able to do pretty much everything." This raised my hopes that something I've been doing in sqlalchemy using a custom metaclass with declarative_base can be done using mixins + custom bases instead. However, I have not been able to make it work. I would greatly appreciate any hints!
I have a mixin ConciseRelationshipMixin that enables the creation of relationships using a concise syntax. It is used as follows:
    class Eggs(ConciseRelationshipMixin, OtherMixin1, OtherMixin2):
        pass

    Spam = declarative_base(cls=Eggs)

    class Ham(Spam):
        some_column = Column(Integer)
        crispy_bacon = ConciseRelationship(Bacon)
This creates an extra column in Ham.__table__ to hold the FK of the relationship, as well as the relationship itself under the name Ham.crispy_bacon, using project-specific conventions for the name of the extra column and the primaryjoin. I really like this mixin because it makes the definition of Ham look very clear and uncluttered.
The reason I have not been able to implement this without using metaclasses is that I am trying to convert the assignment crispy_bacon = ... into two assignments, column_for_crispy_bacon = Column(...) and crispy_bacon = relationship(...), and I could not see any other way to do that.
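For reference, in plain Python (3.6+) the "one assignment becomes two" step can be done without a custom metaclass via `__init_subclass__`. The sketch below uses stand-in objects only: ConciseRelationship here is a bare marker, the tuples stand in for Column(...) and relationship(...), and the "_id" suffix is a made-up naming convention. Whether attributes added this way are picked up by declarative's own class processing is not something this sketch verifies.

```python
class ConciseRelationship:
    """Marker object; a stand-in, not a SQLAlchemy construct."""

    def __init__(self, target):
        self.target = target


class Expanding:
    """Base class that expands each marker into two attributes."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Copy the items first, since setattr changes the class namespace.
        for name, value in list(vars(cls).items()):
            if isinstance(value, ConciseRelationship):
                # Hypothetical naming convention for the FK column.
                setattr(cls, name + "_id", ("column", value.target))
                setattr(cls, name, ("relationship", value.target))


class Ham(Expanding):
    crispy_bacon = ConciseRelationship("Bacon")


print(Ham.crispy_bacon_id, Ham.crispy_bacon)
```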
I am also open to replies of the form "you should not be doing this kind of thing, because ...".
I want to add getter/setter-like attributes to GAE Model properties, for cases like turning the value into uppercase before storing it. For a plain Python class, I would do something like:

    class Foo(db.Model):
        def get_attr(self):
            return self.something

        def set_attr(self, value):
            self.something = value.upper() if value is not None else None

        attr = property(get_attr, set_attr)
However, the GAE Datastore has its own concept of a Property class. I looked into the documentation and it seems that I could override get_value_for_datastore(model_instance) to achieve my goal. Nevertheless, I don't know what model_instance is or how to extract the corresponding field from it.
Is overriding GAE Property classes the right way to provide getter/setter-like functionality? If so, how do I do it?
Added:
One potential issue with overriding get_value_for_datastore that I can think of is that it might not get called before the object is put into the datastore. Hence getting the attribute before storing the object would yield an incorrect value.
Subclassing GAE's Property class is especially helpful if you want more than one "field" with similar behavior, in one or more models. Don't worry, get_value_for_datastore and make_value_from_datastore are going to get called, on any store and fetch respectively -- so if you need to do anything fancy (including but not limited to uppercasing a string, which isn't actually all that fancy;-), overriding these methods in your subclass is just fine.
Edit: let's see some example code (net of imports and main):
    class MyStringProperty(db.StringProperty):
        def get_value_for_datastore(self, model_instance):
            vv = db.StringProperty.get_value_for_datastore(self, model_instance)
            return vv.upper()

    class MyModel(db.Model):
        foo = MyStringProperty()

    class MainHandler(webapp.RequestHandler):
        def get(self):
            my = MyModel(foo='Hello World')
            k = my.put()
            mm = MyModel.get(k)
            s = mm.foo
            self.response.out.write('The secret word is: %r' % s)
This shows you the string's been uppercased in the datastore -- but if you change the get call to a simple mm = my you'll see the in-memory instance wasn't affected.
But, a db.Property instance itself is a descriptor -- wrapping it into a built-in property (a completely different descriptor) will not work well with the datastore (for example, you can't write GQL queries based on field names that aren't really instances of db.Property but instances of property -- those fields are not in the datastore!).
So if you want to work with both the datastore and for instances of Model that have never actually been to the datastore and back, you'll have to choose two names for what's logically "the same" field -- one is the name of the attribute you'll use on in-memory model instances, and that one can be a built-in property; the other one is the name of the attribute that ends up in the datastore, and that one needs to be an instance of a db.Property subclass and it's this second name that you'll need to use in queries. Of course the methods underlying the first name need to read and write the second name, but you can't just "hide" the latter because that's the name that's going to be in the datastore, and so that's the name that will make sense to queries!
What you want is a DerivedProperty. The procedure for writing one is outlined in that post. It's similar to what Alex describes, but by overriding __get__ instead of get_value_for_datastore, you avoid issues with needing to write to the datastore to update it. My aetycoon library includes it along with other useful properties.
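The underlying idea can be sketched in plain Python as a read-only descriptor. This is only an illustration of the mechanism: it does not subclass db.Property, so it never touches the datastore, and the Person model and name_upper field are made up.

```python
class DerivedProperty:
    """Read-only descriptor whose value is computed from the instance."""

    def __init__(self, derive_func):
        self.derive_func = derive_func

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        # Computed on every access, so it is always current in memory.
        return self.derive_func(instance)

    def __set__(self, instance, value):
        raise AttributeError("derived values cannot be assigned")


class Person:
    def __init__(self, name):
        self.name = name

    @DerivedProperty
    def name_upper(self):
        return self.name.upper() if self.name is not None else None


p = Person("Hello World")
print(p.name_upper)
p.name = "changed"
print(p.name_upper)
```

Because the value is derived on access, the in-memory instance is correct even before it has been stored.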