I'm using SQLAlchemy with Python (connected to a MySQL database) and am having a little design issue, which has presented itself as a problem with a backref.
So the situation is this: I have a SearchGroup which contains TargetObjects and SearchObjects. These are both many-to-many relationships, and so the SearchGroup table comes with two association tables, one for each. The SearchObject type is the same for any SearchGroup, but the TargetObject varies. So far so good. The whole idea here is that a SearchObject is simply a string with a few other variables, and a SearchGroup compares them all to a given string and then, if there's a match, supplies the target objects.
Now for some code: the declaration of these three classes, although with the parent logic hidden for brevity:
class AssocTable_GroupCMClassesGrades_Grades(AssociationTable_Group_TargetObjectsParent, med.DeclarativeBase):
    __tablename__ = 'AssocTable_GroupCMClassesGrades_Grades'
    _groupsTableName = 'SearchGroup_CMClasses_Grades'
    _targetObjectsTableName = 'Grades'

class AssocTable_GroupCMClassesGrades_SearchObjects(AssociationTable_Group_SearchObjectsParent, med.DeclarativeBase):
    __tablename__ = 'AssocTable_GroupCMClassesGrades_SearchObjects'
    _groupsTableName = 'SearchGroup_CMClasses_Grades'
    _searchObjectsTableName = 'SearchObjects'

class SearchGroup_CMClasses_Grades(SearchObjectGroupParent, med.DeclarativeBase):
    __tablename__ = 'SearchGroup_CMClasses_Grades'
    targetAssociatedTargetObjectTableName = 'AssocTable_GroupCMClassesGrades_Grades'
    targetAssociatedSearchObjectTableName = 'AssocTable_GroupCMClassesGrades_SearchObjects'
    targetClassName = 'Grade'
    myClassName = 'SearchGroup_CMClasses_Grades'
    searchObjectClassName = 'SearchObject'
    searchObjectChildrenBackRefName = 'Groups'
The top two are the association tables and the bottom is the main class. The strings are used to set up various foreign keys and relationships and such.
Let's look at a specific example, which is crucial to the question:
@declared_attr
def searchObject_children(cls):
    return relationship(f'{cls.searchObjectClassName}', secondary=f'{cls.targetAssociatedSearchObjectTableName}', backref=f'{cls.searchObjectChildrenBackRefName}')
This is inside the SearchObjectGroupParent class and, as you can see, is for the 'children' of the SearchGroup, which are SearchObjects.
So now to the problem.
That all works rather well, except for one thing. If I could direct your attention back to the large bit of code above, and to this line:
searchObjectChildrenBackRefName = 'Groups'
This, as seen in the second posted piece of code (the declared_attr one), sets up a backref: a property on the target class, which it creates and then populates. I'm not an expert at this by any means, so I won't pretend to be. The point is this: if I create another SearchObjectGroupParent-derived class, like the one above, with its association tables, I can't put another 'Groups' property into SearchObject - in fact it will throw an error telling me as much:
sqlalchemy.exc.ArgumentError: Error creating backref 'Groups' on relationship 'SearchGroup_CMClasses_Grades.searchObject_children': property of that name exists on mapper 'mapped class SearchObject->SearchObjects'
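For illustration, here is a stripped-down reproduction of the collision (all names below are hypothetical stand-ins for the real classes above):

from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import configure_mappers, relationship

Base = declarative_base()

assoc_a = Table('assoc_a', Base.metadata,
                Column('group_id', ForeignKey('group_a.id'), primary_key=True),
                Column('so_id', ForeignKey('search_objects.id'), primary_key=True))
assoc_b = Table('assoc_b', Base.metadata,
                Column('group_id', ForeignKey('group_b.id'), primary_key=True),
                Column('so_id', ForeignKey('search_objects.id'), primary_key=True))

class SearchObject(Base):
    __tablename__ = 'search_objects'
    id = Column(Integer, primary_key=True)

class GroupA(Base):
    __tablename__ = 'group_a'
    id = Column(Integer, primary_key=True)
    children = relationship('SearchObject', secondary=assoc_a, backref='Groups')

class GroupB(Base):
    __tablename__ = 'group_b'
    id = Column(Integer, primary_key=True)
    # a second backref named 'Groups' on the same target class
    children = relationship('SearchObject', secondary=assoc_b, backref='Groups')

configure_mappers()  # raises the ArgumentError above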
There is a rather unsatisfying way to solve this, which is to simply change that name each time, but then SearchObject won't have a common list of SearchGroups; instead it will carry a separate 'Groups'-style property for every SearchGroup class. This will work, but it will be messy and I'd rather not do it. What I would like is to say 'okay, if this backref already exists, just use that one'. I don't know if that's possible, but I think such a thing would solve my problem.
Edit: I thought an image might help explain better:
Figure 1: what I have now:
The more of these objects derived from SearchObjectGroupParent I have, the messier it will be (SearchObject will contain Groups03, Groups04, Groups05, etc.).
Figure 2: what I want:
I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notoriously love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone-aware dates & times in SQL, please.) What I would like is a single parameter on the model class that I can use to get a proper Python datetime object, and that I can also set like any other column.
Here's my table:
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    ...
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)
    ...
Here's what I have using a hybrid property (added to the above class; the datetime/pytz code is pseudocode):
@hybrid_property
def taken(self):
    return datetime.datetime(self.taken_dt, self.taken_tz)

@taken.setter
def taken(self, dt):
    self.taken_dt = dt
    self.taken_tz = dt.tzinfo
From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class; the datetime/pytz code is pseudocode):
taken = composite(DateTimeTimeZone.from_db, taken_dt, taken_tz)

class DateTimeTimeZone(object):
    def __init__(self, dt, tz):
        self.dt = dt
        self.tz = tz

    @classmethod
    def from_db(cls, dt, tz):
        return DateTimeTimeZone(dt, tz)

    @classmethod
    def from_dt(cls, dt):
        return DateTimeTimeZone(dt, dt.tzinfo)

    def __composite_values__(self):
        return (self.dt, self.tz)

    def value(self):
        # This is here so I can get the actual datetime.datetime object
        return datetime.datetime(self.dt, self.tz)
It would seem that this method has a decent amount of extra overhead, and I can't figure out a way to set it like I would any other column directly from a datetime.datetime object without instantiating the value object first using .from_dt.
Any guidance on if I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances that have tzinfo set, then return a DateTimeTimeZone object. If you look at the SQLAlchemy docs for Attribute Events you can see that you can tell SQLAlchemy to listen to an attribute-set event and call your code on that. In there you can do any modification to the value being set as you like. You can't, however, access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know whether this will be called before or after the type conversion of the composite. You'd have to try.
Edit: It's all about what you want to achieve, though. The AttributeEvent can help you with your data consistency, while hybrid_property and friends will make querying easier for you. You should use each one for its intended use case.
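For what it's worth, a minimal sketch of the attribute-set event approach, assuming the Photo model and DateTimeTimeZone class from the question (and with the caveat above that its interaction with composites is untested):

import datetime

from sqlalchemy import event

@event.listens_for(Photo.taken, 'set', retval=True)
def coerce_taken(target, value, oldvalue, initiator):
    # If a plain tz-aware datetime is assigned, wrap it in the
    # composite's value class so both underlying columns get populated.
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        return DateTimeTimeZone.from_dt(value)
    return value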
More detailed discussion on the differences between the various solutions:
hybrid_property and composite are two completely different beasts. To understand hybrid_property, one first has to understand what a column_property is and can do.
1) column_property
This one is placed on a mapper and can contain any selectable. So if you put a concrete sub-select into a column_property, you can access it read-only as if it were a concrete column. The calculation is done on the fly. You can even use it to search for entries. SQLAlchemy will construct the right select containing your sub-select for you.
Example:
class User(Base):
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)
    name = column_property(first_name + ' ' + last_name)
    category = column_property(
        select([CategoryName.name])
        .select_from(Category.__table__.join(CategoryName.__table__))
        .where(Category.user_id == id))

db.query(User).filter(User.name == 'John Doe').all()
db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_method and hybrid_property
A hybrid_property is just like a column_property, but it can call a different code path when you are in an instance context. So you can have the selectable on the class level but a different implementation on the instance level. With a hybrid_method you can even parametrize both sides.
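For example, a minimal sketch reusing the first_name/last_name columns from the column_property example above (the decorator's actual name is hybrid_property):

from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.hybrid import hybrid_property

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)

    @hybrid_property
    def name(self):
        # instance context: plain Python string concatenation
        return self.first_name + ' ' + self.last_name

    @name.expression
    def name(cls):
        # class context: a SQL expression usable inside queries
        return cls.first_name + ' ' + cls.last_name

# user.name gives a Python string;
# db.query(User).filter(User.name == 'John Doe') emits SQL.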
3) composite
This is what enables you to combine multiple concrete columns into one logical column. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from it and use them in selects. This integrates neatly into the query framework and should not cause any additional problems. In my experience the use cases for composite columns are rather rare. Your use case seems fine. For modification of values you can always use AttributeEvents. If you want to have the whole instance available you'd have to have a MapperEvent called before flush. This certainly works, as I used this to implement a completely transparent audit trail tracking system which stored every value changed in every table in a separate set of tables.
How can I make a relationship without having a foreign key?
@declared_attr
def custom_stuff(cls):
    joinstr = 'foreign(Custom.name) == "{name}"'.format(name=cls.__name__)
    return db.relationship('Custom', primaryjoin=joinstr)
This raises an error:
ArgumentError: Could not locate any simple equality expressions involving locally mapped foreign key columns for primary join condition
This works, but I think it's a pretty ugly hack.
@declared_attr
def custom_stuff(cls):
    joinstr = ('or_('
               'and_(foreign(Custom.name) == MyTable.title, '
               'foreign(Custom.name) != MyTable.title), '
               'foreign(Custom.name) == "{name}")').format(name=cls.__name__)
    return db.relationship('Custom', primaryjoin=joinstr)
Is there a better way to do this?
EDIT: the extra attribute needs to be added as a @declared_attr and has to use a relationship, since our serializer is written so it works with declared attrs.
Doing this with @hybrid_property or something else would work, but then our JSON serializer would break. Getting that to work seems harder than defining a relationship.
You don't necessarily have to define relationships when creating tables in a database (this applies to almost every SQL database). You can still join tables that don't have a foreign-key relation predefined (the main reason for foreign keys is to enforce data consistency, not to define what you can join or not).
See this for reference - http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.join
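For example, a join with an explicit ON clause and no ForeignKey anywhere (Custom and MyTable are the names from the question):

# Pass the join condition directly to Query.join();
# no foreign key between the tables is required.
session.query(MyTable).join(Custom, Custom.name == MyTable.title).all()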
If you would still like to "show" the database and table structure/model, it is better to use an entity-relationship modeler like ERwin (or some other diagramming software).
Maybe you should consider the meaning of the word "relationship". It states that one thing "depends" on another, or that both depend on each other.
If you're trying to define a relationship in the ERM (Entity Relationship Model) sense, you should state which of the entities "depends" on the other. Databases also have some tricks to deal faster with tables that are related to each other than with tables that are only related in an upper, abstract way.
Is there any reason why you need to do that?
I'm using sqlalchemy and am trying to integrate alembic for database migrations.
My database currently exists and has a number of ForeignKeys defined without names. I would like to add a naming convention to allow for migrations that affect ForeignKey columns.
I've added the naming convention given here to the top of my models.py file:
SQLAlchemy Naming Constraints
convention = {
    "ix": 'ix_%(column_0_label)s',
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s"
}

DeclarativeBase = declarative_base()
DeclarativeBase.metadata = MetaData(naming_convention=convention)

def db_connect():
    return create_engine(URL(**settings.DATABASE))

def create_reviews_table(engine):
    DeclarativeBase.metadata.create_all(engine)

class Review(DeclarativeBase):
    __tablename__ = 'reviews'
    id = Column(Integer, primary_key=True)
    review_id = Column('review_id', String, primary_key=True)
    resto_id = Column('resto_id', Integer, ForeignKey('restaurants.id'),
                      nullable=True)
    url = Column('url', String)
    resto_name = Column('resto_name', String)
I've set up alembic/env.py as per the tutorial instructions, feeding my model's metadata into target_metadata.
When I run
$: alembic current
I get the following error:
sqlalchemy.exc.InvalidRequestError: Naming convention including %(constraint_name)s token requires that constraint is explicitly named.
In the docs they say that "This same feature [generating names for columns using a naming convention] takes effect even if we just use the Column.unique flag", and they go on to give an example using a ForeignKey that isn't named either, so I'm thinking that there shouldn't be a problem.
Do I need to go back and give all my constraints explicit names, or is there a way to do it automatically?
Just modify the "ck" entry in the convention to "ck": "ck_%(table_name)s_%(column_0_name)s". It works for me.
See the SQLAlchemy docs for reference.
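That is, the convention from the question becomes (only the "ck" entry changes):

convention = {
    "ix": 'ix_%(column_0_label)s',
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(column_0_name)s",  # was %(constraint_name)s
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s"
}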
What this error message is telling you is that you should name constraints explicitly. The constraints it's referring to are those for Boolean, Enum, etc., but not foreign keys or primary keys.
So go through your tables and, wherever you have a Boolean or Enum, add a name to it. For example:
is_active = Column(Boolean(name='is_active'))
That's what you need to do.
This does not aim to be a definitive answer, and it also fails to answer your immediate technical question, but could it be a "philosophical problem"? Either your SQLAlchemy code is the source of truth as far as the database is concerned, or the RDBMS is. Faced with a mixed situation, where each of the two holds part of it, I see two avenues:
The one that you are exploring: you modify the database's schema to match the SQLAlchemy model and make your Python code the master. This is the most intuitive, but it may not always be possible, for both technical and administrative reasons.
Accepting that the RDBMS has info that SQLAlchemy doesn't have, but that is fortunately not relevant for day-to-day work. Your best chance is to use another migration tool (ETL) that will reverse engineer the database before migrating it. After the migration is complete you could give back control of the new instance to SQLAlchemy (which may require some adjustments to the new DB or to the model).
There is no way to tell in advance which approach will work, since both have their own challenges. But I would give some thought to the second method.
I've had some luck altering naming_convention back to {} in each older migration so that they run with the correct historical context.
I'm still entirely unsure what kind of interesting side effects this might have.
I'm getting an error I don't understand when using AbstractConcreteBase.
In my_enum.py:
class MyEnum(AbstractConcreteBase, Base):
    pass
In enum1.py:
class Enum1(MyEnum):
    years = Column(SmallInteger, default=0)

# class MyEnums1:
#     NONE = Enum1()
#     Y1 = Enum1(years=1)
In enum2.py:
class Enum2(MyEnum):
    class_name_python = Column(String(50))
In test.py:
from galileo.copernicus.basic_enum.enum1 import Enum1
from galileo.copernicus.basic_enum.enum2 import Enum2
#...
If I uncomment the three lines in enum1.py I get the following error on the second import.
AttributeError: type object 'MyEnum' has no attribute '__table__'
but without MyEnums1 it works fine, or with MyEnums1 in a separate file it works fine. Why would this instantiation affect the import? Is there any way I can keep MyEnums1 in the same file?
The purpose of AbstractConcreteBase is to apply a non-standard order of operations to the standard mapping procedure. Normally, mapping works like this:
define a class to be mapped
define a Table
map the class to the Table using mapper().
Declarative essentially combines these three steps, but that's what it does.
When using an abstract concrete base, we have this totally special step that needs to happen - the base class needs to be mapped to a union of all the tables that the subclasses are mapped to. So if you have enum1 and enum2, the "Base" needs to map to essentially "select * from enum1 UNION ALL select * from enum2".
This mapping to a UNION can't happen piecemeal; the MyEnum base class has to present itself to mapper() with the full UNION of every sub-table at once. So AbstractConcreteBase performs the complex task of rearranging how declarative works such that the base MyEnum is not mapped at all until the mapper configuration occurs, which among other places occurs when you first instantiate a mapped class. It then inserts itself as the mapped base for all the existing mapped subclasses.
So basically, by instantiating an Enum1() object at the class level like that, you're invoking configure_mappers() way too early, such that by the time Enum2() comes along the AbstractConcreteBase is already baked, and the process fails.
All of that aside, it's not at all correct to be instantiating a mapped class like Enum1() at the class level like that. ORM-mapped objects are the complete opposite of global objects and must always be created local to a specific Session.
Edit: also, those classes are supposed to have __mapper_args__ = {"concrete": True} on them, which is part of why you're getting this message. I'm trying to see if the message can be improved.
Edit 2: yeah, the mechanics here are weird. I've committed something else that skips this particular error message, though it will fail differently now and not much better. Getting this to fail more gracefully would require a little more work.
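Putting those points together, a sketch of what the mapping could look like (table names and primary keys assumed; the key parts are the concrete flag and deferring any instantiation until every subclass is defined):

from sqlalchemy import Column, Integer, SmallInteger, String
from sqlalchemy.ext.declarative import AbstractConcreteBase, declarative_base
from sqlalchemy.orm import configure_mappers

Base = declarative_base()

class MyEnum(AbstractConcreteBase, Base):
    pass

class Enum1(MyEnum):
    __tablename__ = 'enum1'
    id = Column(Integer, primary_key=True)
    years = Column(SmallInteger, default=0)
    __mapper_args__ = {'polymorphic_identity': 'enum1', 'concrete': True}

class Enum2(MyEnum):
    __tablename__ = 'enum2'
    id = Column(Integer, primary_key=True)
    class_name_python = Column(String(50))
    __mapper_args__ = {'polymorphic_identity': 'enum2', 'concrete': True}

# only now can the UNION mapping for MyEnum be assembled
configure_mappers()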
I have the following classes:
class User(Base):
    # user properties

class Item(Base):
    # item properties

class User_Item(Base):
    __tablename__ = 'users_items'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    item_id = Column(Integer, ForeignKey('items.id'))
    info = Column(String(20))
    user = relationship(User, backref='items', primaryjoin=(User.id == user_id))
    item = relationship(Item, backref='users', primaryjoin=(Item.id == item_id))
now what is the difference between the following two queries:
result1 = session.query(Item).options(joinedload(Item.users)).filter(Item.users.any(user=user1))
or
result2 = session.query(Item).join(Item.users).filter(Item.users.any(user=user1))
For some reason the second one behaves strangely! Let's say user1 has two items: running result1.count() returns 2 as expected, but result2.count() returns 3, while len(result2.all()) is 2! Can somebody tell me what is going on? :D
The first approach with joinedload is clearly not what you want, because it only defines how item.users is loaded after executing the query (see Relationship Loading Techniques).
And with join, you should use contains() instead of any():
result3 = session.query(Item).join(Item.users).filter(Item.users.contains(user1))
My guess is that your version with any produces multiple rows per join result. You can verify this by inspecting the produced SQL (set echo=True on engine configuration).
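For example (connection string assumed):

from sqlalchemy import create_engine

# echo=True logs every SQL statement the engine emits
engine = create_engine('mysql://user:password@localhost/mydb', echo=True)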
Edit
You might have defined the many-to-many relationship incorrectly, and this could be the real problem. In your version, User.items and Item.users both point to User_Item instead of to User or Item, and I think you are not aware of that.
You should have a look at the Association Object Pattern to see how you can simplify it.
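A sketch of what a simplified version could look like (table names assumed; the viewonly relationship via secondary is one common way to get a direct many-to-many collection alongside the association object):

from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    # direct many-to-many view across the association table; viewonly,
    # because writes should go through User_Item (it carries 'info')
    users = relationship('User', secondary='users_items',
                         backref='items', viewonly=True)

class User_Item(Base):
    __tablename__ = 'users_items'
    user_id = Column(Integer, ForeignKey('users.id'), primary_key=True)
    item_id = Column(Integer, ForeignKey('items.id'), primary_key=True)
    info = Column(String(20))
    user = relationship('User', backref='user_items')
    item = relationship('Item', backref='item_users')

With this layout, item.users and user.items give User/Item objects directly, while user.user_items still exposes the info column on the association rows.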