I'm not sure what this is called since it is new to me, but here is what I want to do:
I have two tables in my database: TableA and TableB. TableA has pk a_id and another field called a_code. TableB has pk b_id and another field called b_code.
I have these tables mapped in my sqlalchemy code and they work fine. I want to create a third object called TableC that doesn't actually exist in my database, but that contains combinations of a_code and b_code, something like this:
class TableC:
a_code = String
b_code = String
Then I'd like to query TableC like:
TableC.query.filter(and_(
TableC.a_code == x,
TableC.b_code == y)).all()
Question 1) Does this type of thing have a name? 2) How do I do the mapping (using declarative would be nice)?
I don't really have a complete understanding of the query you are trying to express, whether it's a union, a join, or some third thing, but that aside, it certainly is possible to map an arbitrary selectable (anything you can pass to a database that returns rows).
I'll start with the assumption that you want some kind of union of TableA and TableB, which would be all of the rows in A, and also all of the rows in B. This is easy enough to change to a different concept if you reveal more information about the shape of the data you are expressing.
We'll start by setting up the real tables, and classes to map them, in the declarative style.
from sqlalchemy import *
import sqlalchemy.ext.declarative
Base = sqlalchemy.ext.declarative.declarative_base()
class TableA(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    a_code = Column(String)

class TableB(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    b_code = Column(String)
Since we've used declarative, we don't actually have table instances to work from, which is necessary for the next part. There are many ways to access the tables, but the way I prefer is to use sqlalchemy mapping introspection, since that will work no matter how the class was mapped.
from sqlalchemy.orm.attributes import manager_of_class
a_table = manager_of_class(TableA).mapper.mapped_table
b_table = manager_of_class(TableB).mapper.mapped_table
Next, we need an actual sql expression that represents the data we are interested in.
This is a union, which results in columns that look the same as the columns defined in the first class, id and a_code. We could rename it, but that's not a very important part of the example.
ab_view_sel = sqlalchemy.alias(a_table.select().union(b_table.select()))
Finally, we map a class to this. It is possible to use declarative for this, but it's actually more code to do it that way instead of the classic mapping style, not less. Notice that the class inherits from object, not Base.
class ViewAB(object):
    pass
sqlalchemy.orm.mapper(ViewAB, ab_view_sel)
And that's pretty much it. Of course there are some limitations with this; the most obvious being there's no (trivial) way to save instances of ViewAB back to the database.
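To round this off, here is a short usage sketch. The in-memory SQLite engine, the sample rows and the session setup are assumptions added for illustration, not part of the original setup:
engine = create_engine('sqlite://')  # throwaway in-memory database for the demo
Base.metadata.create_all(engine)
session = sqlalchemy.orm.sessionmaker(bind=engine)()

session.add_all([TableA(a_code='x'), TableB(b_code='y')])
session.commit()

# Rows from either table come back as ViewAB instances; the columns take
# their names from the first select in the union, i.e. id and a_code.
print(session.query(ViewAB).filter(ViewAB.a_code == 'x').all())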
There isn't really a concept of 'virtual tables', but it is possible to send a single query that 'joins' the data from multiple tables. This is probably as close as you can get to what you want.
For example, one way to do this in sqlalchemy/elixir would be (and this isn't far off from what you've shown, we're just not querying a 'virtual' table):
result = session.query(TableA, TableB).filter(TableA.a_code==x).filter(TableB.b_code==y).all()
This is similar to an SQL join, with the qualifying conditions supplied in the filter() calls (note that nothing here relates the two tables to each other, so as written it is effectively a filtered cross join). It isn't going to give you a single mapped table object, but it will give you a list of result rows, each containing an object from each real table.
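For what it's worth, here is a small sketch of consuming that result (continuing the query above): since two entities are queried, each row is a (TableA, TableB) pair that can be unpacked directly.
for a, b in result:
    print(a.a_code, b.b_code)  # one instance of each mapped class per row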
It looks like SQLAlchemy allows you to map an arbitrary query to a class. e.g. From SQLAlchemy: one classes – two tables:
usersaddresses = sql.join(t_users, t_addresses,
                          t_users.c.id == t_addresses.c.user_id)

class UserAddress(object):
    def __repr__(self):
        return "<FullUser(%s,%s,%s)>" % (self.id, self.name, self.address)

mapper(UserAddress, usersaddresses, properties={
    'id': [t_users.c.id, t_addresses.c.user_id],
})
f = session.query(UserAddress).filter_by(name='Hagar').one()
Related
I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notoriously love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone-aware dates & times in SQL, please.) What I would like is a single parameter on the model class that I can use to get a proper python datetime object, and that I can also set like any other column.
Here's my table:
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    ...
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)
    ...
Here's what I have using a hybrid property (added to the above class; the datetime/pytz code is pseudocode):
    @hybrid_property
    def taken(self):
        return datetime.datetime(self.taken_dt, self.taken_tz)

    @taken.setter
    def taken(self, dt):
        self.taken_dt = dt
        self.taken_tz = dt.tzinfo
From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class; the datetime/pytz code is pseudocode):
taken = composite(DateTimeTimeZone._make, taken_dt, taken_tz)

class DateTimeTimeZone(object):
    def __init__(self, dt, tz):
        self.dt = dt
        self.tz = tz

    @classmethod
    def from_db(cls, dt, tz):
        return DateTimeTimeZone(dt, tz)

    @classmethod
    def from_dt(cls, dt):
        return DateTimeTimeZone(dt, dt.tzinfo)

    def __composite_values__(self):
        return (self.dt, self.tz)

    def value(self):
        # This is here so I can get the actual datetime.datetime object
        return datetime.datetime(self.dt, self.tz)
It would seem that this method has a decent amount of extra overhead, and I can't figure out a way to set it like I would any other column directly from a datetime.datetime object without instantiating the value object first using .from_dt.
Any guidance on if I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances which have a tz attribute set and then return a DateTimeTimeZone object. If you look at the SQLAlchemy docs for Attribute Events you can see that you can tell SQLAlchemy to listen to an attribute-set event and call your code on that. In there you can do any modification to the value being set as you like. You can't however access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know if this will be called before or after the type-conversion of the composite. You'd have to try.
edit: It's all about what you want to achieve, though. The AttributeEvent can help you with your data consistency, while the hybrid_property and friends will make querying easier for you. You should use each one for its intended use case.
More detailed discussion on the differences between the various solutions:
hybrid_attribute and composite are two completely different beasts. To understand hybrid_attribute one first has to understand what a column_property is and can do.
1) column_property
This one is placed on a mapper and can contain any selectable. So if you put a concrete sub-select into a column_property you can access it read-only as if it were a concrete column. The calculation is done on the fly. You can even use it to search for entries. SQLAlchemy will construct the right select containing your sub-select for you.
Example:
class User(Base):
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)
    name = column_property(first_name + ' ' + last_name)
    category = column_property(select([CategoryName.name])
                               .select_from(Category.__table__
                                            .join(CategoryName.__table__))
                               .where(Category.user_id == id))
db.query(User).filter(User.name == 'John Doe').all()
db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_method and hybrid_attribute
A hybrid_attribute is just like a column_property but can call a different code-path when you are in an instance context. So you can have the selectable on the class level but a different implementation on the instance level. With a hybrid_method you can even parametrize both sides.
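To make that concrete for the Photo model from the question, here is a minimal sketch, assuming pytz and the taken_dt/taken_tz columns shown earlier (the decorator is spelled hybrid_property in SQLAlchemy):
import pytz
from sqlalchemy.ext.hybrid import hybrid_property

class Photo(db.Model):  # db is the Flask-SQLAlchemy instance from the question
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)

    @hybrid_property
    def taken(self):
        # Instance level: attach the stored timezone to the naive datetime.
        return pytz.timezone(self.taken_tz).localize(self.taken_dt)

    @taken.setter
    def taken(self, value):
        # Split an aware datetime back into the two concrete columns.
        self.taken_dt = value.replace(tzinfo=None)
        self.taken_tz = str(value.tzinfo)

    @taken.expression
    def taken(cls):
        # Class level: the database only knows the naive column, so query
        # comparisons are made against taken_dt.
        return cls.taken_dt
With the expression in place, session.query(Photo).filter(Photo.taken <= some_naive_dt) compiles against taken_dt, while photo.taken on an instance yields a timezone-aware datetime.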
3) composite_attribute
This is what enables you to combine multiple concrete columns into a single logical one. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from it and use them in selects. This integrates neatly with the query framework and should not impose any additional problems. In my experience the use cases for composite columns are rather rare. Your use case seems fine. For modification of values you can always use AttributeEvents. If you want to have the whole instance available you'd have to use a MapperEvent called before the flush. This certainly works, as I used this to implement a completely transparent audit-trail tracking system which stored every value changed in every table in a separate set of tables.
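As a concrete illustration of the AttributeEvent idea from the TL;DR above, here is a hedged sketch that listens on the plain taken_dt column (rather than on the composite, whose interaction with attribute events is untested, as noted) and normalizes timezone-aware datetimes on assignment:
import datetime
from sqlalchemy import event

@event.listens_for(Photo.taken_dt, "set", retval=True)
def _strip_tzinfo(target, value, oldvalue, initiator):
    # retval=True means whatever is returned here is what actually gets set.
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        # Keep the naive wall-clock time; the timezone name still has to be
        # written to taken_tz separately, since the listener only sees this
        # one attribute.
        return value.replace(tzinfo=None)
    return value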
I'm using python sqlite3 api to create a database.
In all the examples I saw in the documentation, table names and column names are hardcoded inside the queries. This could be a problem if I re-use the same table multiple times (i.e. creating the table, inserting records into it, reading data from it, altering it, and so on), because in case of a table modification I would need to change the hardcoded names in multiple places, and that is not good programming practice.
How can I solve this problem?
I thought of creating a class with just a constructor method to store all these string names, and using it inside the class that operates on the database. As I'm not an expert Python programmer, I'd like to share my thoughts:
class TableA(object):
    def __init__(self):
        self.table_name = 'tableA'
        self.name_col1 = 'first_column'
        self.type_col1 = 'INTEGER'
        self.name_col2 = 'second_column'
        self.type_col2 = 'TEXT'
        self.name_col3 = 'third_column'
        self.type_col3 = 'BLOB'
And then, inside the DB class:
table_A = TableA()

def insert_table(self):
    conn = sqlite3.connect(self._db_name)
    query = 'INSERT INTO ' + table_A.table_name + ..... <SNIP>
    conn.execute(query)
Is this a proper way to proceed?
I don't know what's proper but I can tell you that it's not conventional.
If you really want to structure tables as classes, you could consider an object relational mapper like SQLAlchemy. Otherwise, with the way you're going about it, how do you know how many column variables you have? What about storing a list of 2-item lists, or a list of dictionaries?
self.column_list = []
self.column_list.append({'name':'first','type':'integer'})
The way you're doing it sounds pretty novel. Check out SQLAlchemy's code and see how they do it.
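To show why the list-of-dictionaries shape helps, here is a hypothetical sketch that builds the CREATE and INSERT statements from that metadata, so the names live in exactly one place (the table and column names below are placeholders):
import sqlite3

table_name = 'tableA'
column_list = [{'name': 'first_column', 'type': 'INTEGER'},
               {'name': 'second_column', 'type': 'TEXT'}]

create_query = 'CREATE TABLE {} ({})'.format(
    table_name,
    ', '.join('{} {}'.format(c['name'], c['type']) for c in column_list))

insert_query = 'INSERT INTO {} ({}) VALUES ({})'.format(
    table_name,
    ', '.join(c['name'] for c in column_list),
    ', '.join('?' for _ in column_list))  # sqlite3 parameter placeholders

conn = sqlite3.connect(':memory:')
conn.execute(create_query)
conn.execute(insert_query, (1, 'hello'))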
If you are going to start using classes to provide an abstraction layer for your database tables, you might as well start using an ORM. Some examples are SQLAlchemy and SQLObject, both of which are extremely popular.
Here's a taste of SQLAlchemy:
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine

Base = declarative_base()

class TableA(Base):
    __tablename__ = 'tableA'
    id = Column(Integer, primary_key=True)
    first_column = Column(Integer)
    second_column = Column(String)
    # etc...

engine = create_engine('sqlite:///test.db')
Base.metadata.bind = engine
Base.metadata.create_all(engine)  # create the table if it doesn't exist yet
session = sessionmaker(bind=engine)()
ta = TableA(first_column=123, second_column='Hi there')
session.add(ta)
session.commit()
Of course you would choose semantic names for the table and columns, but you can see that declaring a table is something along the lines of what you were proposing in your question, i.e. using a class. Inserting records is simplified by creating instances of that class.
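Reading rows back, continuing the example above, also needs no SQL strings or hardcoded column names:
for row in session.query(TableA).filter(TableA.first_column == 123):
    print(row.id, row.second_column)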
I personally don't like to use libraries and frameworks without a proper reason. So, if I had such a reason, I would write a thin wrapper around sqlite.
class Column(object):
    def __init__(self, col_name="FOO", col_type="INTEGER"):
        # standard initialization
And then a table class that encapsulates operations on the database:
class Table(object):
    def __init__(self, list_of_columns, cursor):
        # initialization

    # create-update-delete commands
In the table class you can encapsulate all the database operations you want.
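A rough sketch of how such a wrapper might be fleshed out; the method names and the exact division of responsibilities here are my own assumptions, not a fixed design:
import sqlite3

class Column(object):
    def __init__(self, col_name="FOO", col_type="INTEGER"):
        self.name = col_name
        self.type = col_type

class Table(object):
    def __init__(self, name, columns, conn):
        self.name = name
        self.columns = columns
        self.conn = conn

    def create(self):
        cols = ', '.join('%s %s' % (c.name, c.type) for c in self.columns)
        self.conn.execute('CREATE TABLE IF NOT EXISTS %s (%s)' % (self.name, cols))

    def insert(self, values):
        placeholders = ', '.join('?' for _ in self.columns)
        self.conn.execute('INSERT INTO %s VALUES (%s)' % (self.name, placeholders),
                          values)

conn = sqlite3.connect(':memory:')
table_a = Table('tableA', [Column('first_column', 'INTEGER'),
                           Column('second_column', 'TEXT')], conn)
table_a.create()
table_a.insert((1, 'hello'))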
Suppose I have a one-to-many relationship like this:
class Book(Base):
    __tablename__ = "books"
    id = Column(Integer, primary_key=True)
    ...
    library_id = Column(Integer, ForeignKey("libraries.id"))

class Library(Base):
    __tablename__ = "libraries"
    id = Column(Integer, primary_key=True)
    ...
    books = relationship(Book, backref="library")
Now, if I have an ID of a book, is there a way to retrieve it from the Library.books relationship, "get me a book with id=10 in this particular library"? Something like:
try:
    the_book = some_library.books.by_primary_key(10)
except SomeException:
    print "The book with id 10 is not found in this particular library"
Workarounds I can think of (but which I'd rather avoid using):
book = session.query(Book).get(10)
if book and book.library_id != library.id:
    raise SomeException("The book with id 10 is not found in this particular library")
or
book = session.query(Book).filter(Book.id == 10).filter(Book.library_id == library.id).one()
Reason: imagine there are several different relationships (scifi_books, books_on_loan, etc.) which specify different primaryjoin conditions - manually querying would require writing individual queries for all of them, while SQLAlchemy already knows how to retrieve items for each relationship. Also, I'd prefer to load the books all at once (by accessing library.books) rather than issuing individual queries.
Another option, which works but is inefficient and inelegant is:
for b in library.books:
    if b.id == book_id:
        return b
What I'm currently using is:
library_books = {b.id: b for b in library.books}
for data in list_of_dicts_containing_book_id:
    if data['id'] in library_books:
        library_books[data['id']].do_something(data)
    else:
        print "Book %s is not in the library" % data['id']
I just hope there's a nicer built-in way of quickly retrieving items from a relationship by their id.
UPD: I've asked the question in the sqlalchemy mail list.
SQLAlchemy's query object has with_parent method which does exactly that:
with_parent(instance, property=None)
Add filtering criterion that relates the given instance to a child object or collection, using its attribute state as well as an established relationship() configuration.
so in my example the code would look like
q = session.query(Book)
q = q.with_parent(my_library, "scifi_books")
book = q.filter(Book.id == 10).one()
This will issue a separate query though, even if the my_library.scifi_books relation is already loaded. There seems to be no "built-in" way to retrieve an item from an already-loaded relation by its PK, so the easiest is to just convert the relation to a dict and use that to look up items:
book_lookup = {b.id: b for b in my_library.scifi_books}
book = book_lookup[10]
See SQLAlchemy docs on querying with joins. So you want something like this (be aware that this is untested):
query(Book).join(Library). \
    filter(Book.id == 10). \
    filter(Library.id == needed_library_id).all()
Since Library.books is a collection, you can filter on it with any() (has() is the counterpart for scalar references such as the Book -> Library backref):
query(Library).filter(Library.books.any(id=10))
To make batch queries for multiple books at once, you can use the in_() operator:
query(Library).join(Library.books).filter(Book.id.in_([1, 2, 10])).all()
I'm using SQLAlchemy, and what I liked about the Django ORM was the Manager I could implement to override the initial query for a model.
Does something like this exist in SQLAlchemy? I'd like to always exclude items that have visible = False when I do something like:
session.query(BlogPost).all()
Is it possible?
Thanks!
EDIT: the original version almost worked. The following version actually works.
It sounds like what you're trying to do is arrange for the query entity to be something other than SELECT table.* FROM table. In sqlalchemy, you can map any "selectable" to a class. There are some caveats, though: if the selectable is not a table, inserting data can be tricky. Something like this approaches a workable solution. You probably do want to have a regular table mapped to permit inserts, so the first part is a totally normal table, class and mapper.
blog_post_table = Table("blog_posts", metadata,
    Column('id', Integer, primary_key=True),
    Column('visible', Boolean, default=True),
    ...
)

class BlogPost(object):
    pass
blog_post_mapper = mapper(BlogPost, blog_post_table)
Or, if you were using the declarative extension, it'll all be in one place:
class BlogPost(Base):
    __tablename__ = 'blog_posts'
    id = Column(Integer, primary_key=True)
    visible = Column(Boolean, default=True)
Now, we need a select expression to represent the visible posts.
visible_blog_posts_expr = sqlalchemy.sql.select(
        [BlogPost.id,
         BlogPost.visible]) \
    .where(BlogPost.visible == True) \
    .alias()
Or, since naming all of the columns of the desired query is tedious (not to mention in violation of DRY), you can use the same construct as session.query(BlogPost) and extract the 'statement'. You don't actually want it bound to a session, though, so call the class directly.
visible_blog_posts_expr = \
    sqlalchemy.orm.Query(BlogPost) \
        .filter(BlogPost.visible == True) \
        .statement \
        .alias()
And we map that too.
visible_blog_posts = mapper(BlogPost, visible_blog_posts_expr, non_primary=True)
You can then use the visible_blog_posts mapper instead of BlogPost with Session.query, and you will still get BlogPost instances, which can be updated and saved as normal.
posts = session.query(visible_blog_posts).all()
assert all(post.visible for post in posts)
For this particular example, there's not much difference between explicit mapper use and the declarative extension; you still must call mapper for the non-primary mappings. At best, declarative allows you to type SomeClass.colname instead of some_table.c.colname (or SomeClass.__table__.c.colname, or BlogPost.metadata.tables[BlogPost.__tablename__].c.colname, or ... and so on).
The mistakes I made in the original example, which are now corrected: I was missing the []'s in the call to sqlalchemy.sql.select, which expects the columns to be in a sequence; and when passing a select statement to mapper, sqlalchemy insists that the statement be aliased, so that it can be named: (SELECT .... ) AS some_subselect_alias_5.
You can do e.g.
session.query(BlogPost).filter_by(visible=True)
which should give you just the posts you need.
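If the goal is to avoid repeating that filter everywhere, one lightweight option (my own suggestion, not part of the answer above) is to wrap it in a classmethod, which gives you a Manager-like entry point:
class BlogPost(Base):
    __tablename__ = 'blog_posts'
    id = Column(Integer, primary_key=True)
    visible = Column(Boolean, default=True)

    @classmethod
    def visible_posts(cls, session):
        # The default query every caller starts from.
        return session.query(cls).filter(cls.visible == True)

posts = BlogPost.visible_posts(session).all()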
I'm using SQLAlchemy declarative base to define my model. I defined a property name that is computed from one the columns (title):
class Entry(Base):
    __tablename__ = "blog_entry"
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))
    ...

    @property
    def name(self):
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()
When trying to perform a query using name, SQLAlchemy throws an error:
Session.query(Entry).filter(Entry.name == my_name).first()
>>> ArgumentError: filter() argument must be of type sqlalchemy.sql.ClauseElement or string
After investigating for a while, I found that maybe comparable_using() could help, but I couldn't find any example that shows a comparator that references another column of the table.
Is this even possible or is there a better approach?
From SqlAlchemy 0.7 you can achieve this using hybrid_property
see the docs here: http://www.sqlalchemy.org/docs/orm/extensions/hybrid.html
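For illustration, here is a minimal sketch of what that could look like for the Entry class above. The class-level expression is only an approximation (it lowercases the title and swaps spaces for dashes), since the character-stripping regex from the Python property has no portable SQL equivalent:
import re
from sqlalchemy import Column, Integer, Unicode, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property

Base = declarative_base()

class Entry(Base):
    __tablename__ = "blog_entry"
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))

    @hybrid_property
    def name(self):
        # Instance level: the slug logic from the question.
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()

    @name.expression
    def name(cls):
        # Class level: rendered as SQL, so it can appear in filter()/order_by().
        return func.lower(func.replace(cls.title, ' ', '-'))
With that in place, Session.query(Entry).filter(Entry.name == my_name).first() compiles to a WHERE clause on lower(replace(blog_entry.title, ' ', '-')) instead of raising ArgumentError.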
Can you imagine what SQL should be issued for your query? The database knows nothing about name, it has neither a way to calculate it, nor to use any index to speed up the search.
My best bet is a full scan: fetching title for every record, calculating name, and then filtering by it. You can do it crudely with [x for x in Session.query(Entry).all() if x.name == my_name][0]. With a bit more sophistication, you would only fetch id and title in the filtering pass, and then fetch the full record(s) by id.
Note that a full scan is usually not nice from a performance point of view, unless your table is quite small.
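A sketch of that two-pass idea, assuming the slug logic from the question is factored into a plain helper so it can be applied to bare title strings:
import re

def slugify(title):
    # Same transformation the name property applies.
    return re.sub(r'[^a-zA-Z0-9 ]', '', title).replace(' ', '-').lower()

# Pass 1: fetch only (id, title) pairs and filter in Python.
candidate_ids = [entry_id
                 for entry_id, title in Session.query(Entry.id, Entry.title)
                 if slugify(title) == my_name]

# Pass 2: fetch the full records for the matching ids.
matches = Session.query(Entry).filter(Entry.id.in_(candidate_ids)).all()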