Difference between join and joinedload in SQLAlchemy - Python

I have the following classes:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship

class User(Base):
    # user properties
    ...

class Item(Base):
    # item properties
    ...

class User_Item(Base):
    __tablename__ = 'users_items'
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer, ForeignKey('users.id'))
    item_id = Column(Integer, ForeignKey('items.id'))
    info = Column(String(20))
    user = relationship(User, backref='items', primaryjoin=(User.id == user_id))
    item = relationship(Item, backref='users', primaryjoin=(Item.id == item_id))
Now, what is the difference between the following two queries:
result1 = session.query(Item).options(joinedload(Item.users)).filter(Item.users.any(user=user1))
or
result2 = session.query(Item).join(Item.users).filter(Item.users.any(user=user1))
For some reason the second one looks weird! Say user1 has two items: running result1.count() returns 2 as expected, but result2.count() returns 3, while len(result2.all()) is 2! Can somebody tell me what is going on?! :D

The first approach with joinedload is clearly not what you want, because it only defines how item.users is loaded after the query executes (see Relationship Loading Techniques).
And with join, you should use contains instead of any:
result3 = session.query(Item).join(Item.users).filter(Item.users.contains(user1))
My guess is that your version with any produces multiple rows per joined result. You can verify this by inspecting the generated SQL (set echo=True in the engine configuration).
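For reference, a minimal way to turn on that SQL logging (a sketch; swap in your real database URL):

from sqlalchemy import create_engine

# echo=True logs every SQL statement the engine emits to stdout
engine = create_engine('sqlite://', echo=True)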
Edit
You might have defined the many-to-many relationship incorrectly; this could be the real problem. In your version, User.items and Item.users both point to User_Item instead of to Item or User, and I think you are not aware of that.
You should have a look at the Association Object Pattern and how you can simplify it.
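For illustration, a minimal sketch of that pattern (my addition, not from the original answer; it reuses the question's table names and assumes Base is your declarative base). The association_proxy extension hides the intermediate class so that Item.users and User.items yield User and Item objects again:

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship
from sqlalchemy.ext.associationproxy import association_proxy

class UserItem(Base):
    __tablename__ = 'users_items'
    user_id = Column(Integer, ForeignKey('users.id'), primary_key=True)
    item_id = Column(Integer, ForeignKey('items.id'), primary_key=True)
    info = Column(String(20))
    user = relationship('User', back_populates='user_items')
    item = relationship('Item', back_populates='user_items')

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    user_items = relationship('UserItem', back_populates='user')
    # proxies through the association row straight to Item objects
    items = association_proxy('user_items', 'item',
                              creator=lambda item: UserItem(item=item))

class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True)
    user_items = relationship('UserItem', back_populates='item')
    users = association_proxy('user_items', 'user',
                              creator=lambda user: UserItem(user=user))

# e.g. session.query(Item).filter(Item.users.contains(user1))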

Related

SQLAlchemy multiple backrefs causing problems

I'm using SQLAlchemy with Python (linking to a MySQL database) and am having a little design issue, which has presented itself as a problem with a backref.
So the situation is this. I have a SearchGroup which contains TargetObjects and SearchObjects. These are both many-to-many relationships, so the SearchGroup table comes with two association tables, one for each. The SearchObject is the same type for any SearchGroup, but the TargetObject varies. So far so good. The whole idea here is that a SearchObject is simply a string with a few other variables, and a SearchGroup compares them all to a given string and then, if there's a match, supplies the target objects.
Now for some code: the declaration of these three classes, although with the parent logic hidden for brevity:
class AssocTable_GroupCMClassesGrades_Grades(AssociationTable_Group_TargetObjectsParent, med.DeclarativeBase):
    __tablename__ = 'AssocTable_GroupCMClassesGrades_Grades'
    _groupsTableName = 'SearchGroup_CMClasses_Grades'
    _targetObjectsTableName = 'Grades'

class AssocTable_GroupCMClassesGrades_SearchObjects(AssociationTable_Group_SearchObjectsParent, med.DeclarativeBase):
    __tablename__ = 'AssocTable_GroupCMClassesGrades_SearchObjects'
    _groupsTableName = 'SearchGroup_CMClasses_Grades'
    _searchObjectsTableName = 'SearchObjects'

class SearchGroup_CMClasses_Grades(SearchObjectGroupParent, med.DeclarativeBase):
    __tablename__ = 'SearchGroup_CMClasses_Grades'
    targetAssociatedTargetObjectTableName = 'AssocTable_GroupCMClassesGrades_Grades'
    targetAssociatedSearchObjectTableName = 'AssocTable_GroupCMClassesGrades_SearchObjects'
    targetClassName = 'Grade'
    myClassName = 'SearchGroup_CMClasses_Grades'
    searchObjectClassName = 'SearchObject'
    searchObjectChildrenBackRefName = 'Groups'
The top two are the association tables and the bottom is the main class. The strings are used to set up various foreign keys and relationships and such.
Let's look at a specific example, which is crucial to the question:
@declared_attr
def searchObject_childen(cls):
    return relationship(
        f'{cls.searchObjectClassName}',
        secondary=f'{cls.targetAssociatedSearchObjectTableName}',
        backref=f'{cls.searchObjectChildrenBackRefName}',
    )
This is inside the SearchObjectGroupParent class and, as you can see, is for the 'children' of the SearchGroup, which are SearchObjects.
So now to the problem.
That all works rather well, except for one thing. If I could direct your attention back to the large bit of code above, and to this line:
searchObjectChildrenBackRefName = 'Groups'
This, as seen in the second posted piece of code (the declared_attr one), sets up a backref: a property on the target - it creates that property and then populates it. I'm no expert at this by any means, so I won't pretend to be. The point is this: if I create another SearchObjectGroupParent-derived class like the one above, with its association tables, I can't put another 'Groups' property into SearchObject - in fact it will throw an error telling me as much:
sqlalchemy.exc.ArgumentError: Error creating backref 'Groups' on relationship 'SearchGroup_CMClasses_Grades.searchObject_childen': property of that name exists on mapper 'mapped class SearchObject->SearchObjects'
There is a rather unsatisfying way to solve this, which is to simply change that name each time, but then SearchObject won't have a common list of SearchGroups; instead it will contain a separate 'Groups' property for every SearchGroup. This would work, but it would be messy and I'd rather not do it. What I would like is to say 'okay, if this backref already exists, just use that one'. I don't know if that's possible, but I think such a thing would solve my problem.
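For illustration, a stripped-down reproduction of that conflict (entirely hypothetical names, not the poster's code): two relationships that both ask for a backref called 'Groups' on the same target class.

from sqlalchemy import Column, ForeignKey, Integer, Table
from sqlalchemy.orm import configure_mappers, relationship
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

assoc_a = Table('assoc_a', Base.metadata,
                Column('group_id', ForeignKey('group_a.id')),
                Column('search_object_id', ForeignKey('search_objects.id')))
assoc_b = Table('assoc_b', Base.metadata,
                Column('group_id', ForeignKey('group_b.id')),
                Column('search_object_id', ForeignKey('search_objects.id')))

class SearchObject(Base):
    __tablename__ = 'search_objects'
    id = Column(Integer, primary_key=True)

class GroupA(Base):
    __tablename__ = 'group_a'
    id = Column(Integer, primary_key=True)
    children = relationship('SearchObject', secondary=assoc_a, backref='Groups')

class GroupB(Base):
    __tablename__ = 'group_b'
    id = Column(Integer, primary_key=True)
    # a second backref named 'Groups' on SearchObject
    children = relationship('SearchObject', secondary=assoc_b, backref='Groups')

configure_mappers()  # raises ArgumentError: Error creating backref 'Groups' ...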
Edit: I thought an image might help explain better.
Figure 1 (what I have now): the more of these objects derived from SearchObjectsGroupParent I have, the messier it will be (SearchObject will contain Groups03, Groups04, Groups05, etc.).
Figure 2 (what I want): [image not included]

SQLAlchemy Mapping Multiple Columns to Single Property

I'm building a web application in Python 3 using Flask & SQLAlchemy (via Flask-SQLAlchemy; with either MySQL or SQLite), and I've run into a situation where I'd like to reference a single property on my model class that encapsulates multiple columns in my database. I'm pretty well versed in MySQL, but this is my first real foray into SQLAlchemy beyond the basics. Reading the docs, scouring SO, and searching Google have led me to two possible solutions: Hybrid attributes (docs) or Composite columns (docs).
My question is what are the implications of using each of these, and which of these is the appropriate solution to my situation? I've included example code below that's a snippet of what I'm doing.
Background: I'm developing an application to track & sort photographs, and have a DB table in which I store the metadata for these photos, including when the picture was taken. Since photos are taken in a specific place, the taken date & time have an associated timezone. As SQL has a notoriously love/hate relationship with timezones, I've opted to record when the photo was taken in two columns: a datetime storing the date & time and a string storing the timezone name. (I'd like to sidestep the inevitable debate about how to store timezone-aware dates & times in SQL, please.) What I would like is a single attribute on the model class that I can use to get a proper Python datetime object, and that I can also set like any other column.
Here's my table:
class Photo(db.Model):
    __tablename__ = 'photos'
    id = db.Column(db.Integer, primary_key=True)
    ...
    taken_dt = db.Column(db.DateTime, nullable=False)
    taken_tz = db.Column(db.String(64), nullable=False)
    ...
Here's what I have using a hybrid property (added to the above class; the datetime/pytz code is pseudocode):
@hybrid_property
def taken(self):
    return datetime.datetime(self.taken_dt, self.taken_tz)

@taken.setter
def taken(self, dt):
    self.taken_dt = dt
    self.taken_tz = dt.tzinfo
From there I'm not exactly sure what else I need in the way of a @taken.expression or @taken.comparator, or why I'd choose one over the other.
Here's what I have using a composite column (again, added to the above class; the datetime/pytz code is pseudocode):
taken = composite(DateTimeTimeZone.from_db, taken_dt, taken_tz)

class DateTimeTimeZone(object):
    def __init__(self, dt, tz):
        self.dt = dt
        self.tz = tz

    @classmethod
    def from_db(cls, dt, tz):
        return DateTimeTimeZone(dt, tz)

    @classmethod
    def from_dt(cls, dt):
        return DateTimeTimeZone(dt, dt.tzinfo)

    def __composite_values__(self):
        return (self.dt, self.tz)

    def value(self):
        # This is here so I can get the actual datetime.datetime object
        return datetime.datetime(self.dt, self.tz)
It would seem that this method has a decent amount of extra overhead, and I can't figure out a way to set it directly from a datetime.datetime object, like I would any other column, without first instantiating the value object using .from_dt.
Any guidance on if I'm going down the wrong path here would be welcome. Thanks!
TL;DR: Look into hooking up an AttributeEvent to your column and have it check for datetime instances that have a tz attribute set, then return a DateTimeTimeZone object. The SQLAlchemy docs on Attribute Events show that you can tell SQLAlchemy to listen for an attribute-set event and call your code on it. In there you can modify the value being set however you like. You can't, however, access other attributes of the class at that time. I haven't tried this in combination with composites yet, so I don't know whether it will be called before or after the type conversion of the composite. You'd have to try.
Edit: It's all about what you want to achieve, though. The AttributeEvent can help you with data consistency, while hybrid_property and friends will make querying easier for you. You should use each one for its intended use case.
More detailed discussion on the differences between the various solutions:
hybrid_property and composite are two completely different beasts. To understand hybrid_property, one first has to understand what a column_property is and can do.
1) column_property
This one is placed on the mapper and can contain any selectable. If you put a concrete sub-select into a column_property, you can access it read-only as if it were a concrete column. The calculation is done on the fly, and you can even use it to search for entries: SQLAlchemy will construct the right SELECT containing your sub-select for you.
Example:
class User(Base):
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)
    name = column_property(first_name + ' ' + last_name)
    category = column_property(
        select([CategoryName.name])
        .select_from(Category.__table__.join(CategoryName.__table__))
        .where(Category.user_id == id))

db.query(User).filter(User.name == 'John Doe').all()
db.query(User).filter(User.category == 'Paid').all()
As you can see, this can simplify a lot of code, but one has to be careful to think of the performance implications.
2) hybrid_property and hybrid_method
A hybrid_property is just like a column_property, but it can take a different code path when you are in an instance context: you can have the selectable at the class level and a different implementation at the instance level. With a hybrid_method you can even parametrize both sides.
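For illustration, a minimal hybrid version of the name attribute from the column_property example above (my addition; Base is assumed to be your declarative base):

from sqlalchemy import Column, Integer, Unicode
from sqlalchemy.ext.hybrid import hybrid_property

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    first_name = Column(Unicode)
    last_name = Column(Unicode)

    @hybrid_property
    def name(self):
        # instance level: runs as plain Python on a loaded object
        return self.first_name + ' ' + self.last_name

    @name.expression
    def name(cls):
        # class level: rendered as a SQL concatenation in queries
        return cls.first_name + ' ' + cls.last_name

# user.name gives a Python string;
# db.query(User).filter(User.name == 'John Doe') issues SQL.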
3) composite
This is what enables you to combine multiple concrete columns into a single logical one. You have to write a class for this logical column so that SQLAlchemy can extract the correct values from it and use them in selects. This integrates neatly into the query framework and should not pose any additional problems. In my experience, use cases for composite columns are rather rare; yours seems fine. For modification of values you can always use AttributeEvents. If you want the whole instance available, you'd have to use a MapperEvent called before flush. That certainly works; I used this approach to implement a completely transparent audit-trail system which stored every changed value in every table in a separate set of tables.
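For reference, a literal sketch of the attribute-set event described in the TL;DR above (my addition; Photo, taken_dt, and DateTimeTimeZone come from the question, and as noted, its interaction with the composite is untested):

import datetime
from sqlalchemy import event

# retval=True tells SQLAlchemy to use the listener's return value
# as the value that actually gets set on the attribute.
@event.listens_for(Photo.taken_dt, 'set', retval=True)
def wrap_taken_dt(target, value, oldvalue, initiator):
    # spot tz-aware datetimes and wrap them in the composite value object
    if isinstance(value, datetime.datetime) and value.tzinfo is not None:
        return DateTimeTimeZone.from_dt(value)
    return value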

Adding Naming Convention to Existing Database

I'm using sqlalchemy and am trying to integrate alembic for database migrations.
My database currently exists and has a number of ForeignKeys defined without names. I would like to add a naming convention to allow for migrations that affect ForeignKey columns.
I've added the naming convention given here to the top of my models.py file:
SQLAlchemy Naming Constraints
convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}

DeclarativeBase = declarative_base()
DeclarativeBase.metadata = MetaData(naming_convention=convention)

def db_connect():
    return create_engine(URL(**settings.DATABASE))

def create_reviews_table(engine):
    DeclarativeBase.metadata.create_all(engine)

class Review(DeclarativeBase):
    __tablename__ = 'reviews'
    id = Column(Integer, primary_key=True)
    review_id = Column('review_id', String, primary_key=True)
    resto_id = Column('resto_id', Integer, ForeignKey('restaurants.id'),
                      nullable=True)
    url = Column('url', String)
    resto_name = Column('resto_name', String)
I've set up alembic/env.py as per the tutorial instructions, feeding my model's metadata into target_metadata.
When I run
$: alembic current
I get the following error:
sqlalchemy.exc.InvalidRequestError: Naming convention including %(constraint_name)s token requires that constraint is explicitly named.
In the docs they say that "This same feature [generating names for columns using a naming convention] takes effect even if we just use the Column.unique flag", so I'm thinking that there shouldn't be a problem (they go on to give an example using a ForeignKey that isn't named, too).
Do I need to go back and give all my constraints explicit names, or is there a way to do it automatically?
Just modify the "ck" entry in the convention to "ck": "ck_%(table_name)s_%(column_0_name)s". It works for me.
Refer to the SQLAlchemy docs.
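That is, the convention block from the question becomes:

convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    # column_0_name instead of constraint_name, so unnamed CHECK
    # constraints can still be given a deterministic name
    "ck": "ck_%(table_name)s_%(column_0_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}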
What this error message is telling you is that you should name your constraints explicitly. The constraints it's referring to are Boolean, Enum, etc., but not foreign keys or primary keys.
So go through your tables, and wherever you have a Boolean or Enum, add a name to it. For example:
is_active = Column(Boolean(name='is_active'))
That's what you need to do.
This does not aim to be a definitive answer, and it also fails to answer your immediate technical question, but could it be a "philosophical" problem? Either your SQLAlchemy code is the source of truth as far as the database is concerned, or the RDBMS is. Faced with this mixed situation, where each of the two holds part of the truth, I would see two avenues:
The one that you are exploring: you modify the database's schema to match the SQLAlchemy model and you make your Python code the master. This is the most intuitive, but it may not always be possible, for both technical and administrative reasons.
Accepting that the RDBMS has info that SQLAlchemy doesn't have, but which is fortunately not relevant for day-to-day work. Your best chance is to use another migration tool (ETL) that will reverse-engineer the database before migrating it. After the migration is complete you could give control of the new instance back to SQLAlchemy (which may require some adjustments to the new DB or to the model).
There is no way to tell which approach will work, since both have their own challenges. But I would give some thought to the second method.
I've had some luck altering naming_convention back to {} in each older migration so that they run with the correct historical context.
I'm still entirely unsure what kind of interesting side effects this might have.

Including Duplicate Tables using Django's ORM Extra()

I'm trying to implement a simple triplestore using Django's ORM. I'd like to be able to search for arbitrarily complex triple patterns (e.g. as you would with SPARQL).
To do this, I'm attempting to use the .extra() method. However, even though the docs mention that it can, in theory, handle duplicate references to the same table by automatically creating an alias for the duplicate table references, I've found it does not do this in practice.
For example, say I have the following model in my "triple" app:
class Triple(models.Model):
    subject = models.CharField(max_length=100)
    predicate = models.CharField(max_length=100)
    object = models.CharField(max_length=100)
and I have the following triples stored in my database:
subject predicate object
bob has-a hat .
bob knows sue .
sue has-a house .
bob knows tom .
Now, say I want to query the names of everyone bob knows who has a house. In SQL, I'd simply do:
SELECT t2.subject AS name
FROM triple_triple t1
INNER JOIN triple_triple t2 ON
t1.subject = 'bob'
AND t1.predicate = 'knows'
AND t1.object = t2.subject
AND t2.predicate = 'has-a'
AND t2.object = 'house'
I'm not completely sure what this would look like with Django's ORM, although I think it would be along the lines of:
q = Triple.objects.filter(subject='bob', predicate='knows')
q = q.extra(tables=['triple_triple'], where=["triple_triple.object=t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house'"])
q.values('t1.subject')
Unfortunately, this fails with the error "DatabaseError: no such column: t1.subject"
Running print q.query shows:
SELECT "triple_triple"."subject" FROM "triple_triple" WHERE ("triple_triple"."subject" = 'bob' AND "triple_triple"."predicate" = 'knows'
AND triple_triple.object = t1.subject AND t1.predicate = 'has-a' AND t1.object = 'house')
which appears to show that the tables param in my call to .extra() is being ignored, as there's no second reference to triple_triple inserted anywhere.
Why is this happening? What's the appropriate way to refer to complex relationships between records in the same table using Django's ORM?
EDIT: I found this useful snippet for including custom SQL via .extra() so that it's usable inside a model manager.
I think what you're missing is the select parameter (for the extra method).
This seems to work:
qs = Triple.objects.filter(subject="bob", predicate="knows").extra(
    select={'known': 't1.subject'},
    tables=['"triple_triple" AS "t1"'],
    where=['''triple_triple.object = t1.subject
              AND t1.predicate = "has-a" AND t1.object = "house"'''])
qs.values("known")
I've had the same issue, where Django escapes (adds back-ticks to) my table names, meaning that I can't add an alias manually; the resulting FROM clause looks like this:
"mytable" AS T100
But at the same time, Django won't automatically create aliases for you if the table is already mentioned; instead it ignores the tables and just adds the WHERE clauses as if they referred to the original tables.
The documentation for Django 1.8 suggests that .extra() will create aliases for you:
https://docs.djangoproject.com/en/1.8/ref/models/querysets/#django.db.models.query.QuerySet.extra
But this doesn't appear to be the case for my query, possibly because the original table is part of a LEFT OUTER JOIN rather than a simple FROM x, y, z clause.
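If .extra() keeps ignoring the alias, one workaround (my suggestion, not from the answers above) is to drop to Manager.raw(), which sidesteps the alias handling entirely; raw() must select the primary key, hence t2.id:

rows = Triple.objects.raw(
    """
    SELECT t2.id, t2.subject AS name
    FROM triple_triple t1
    INNER JOIN triple_triple t2 ON t1.object = t2.subject
    WHERE t1.subject = %s AND t1.predicate = %s
      AND t2.predicate = %s AND t2.object = %s
    """,
    ['bob', 'knows', 'has-a', 'house'])

for t in rows:
    print(t.name)  # extra SELECT columns appear as annotated attributes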

SQLAlchemy: query custom property based on table field

I'm using SQLAlchemy declarative base to define my model. I defined a property name that is computed from one of the columns (title):
class Entry(Base):
    __tablename__ = "blog_entry"
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))
    ...

    @property
    def name(self):
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()
When trying to perform a query using name, SQLAlchemy throws an error:
Session.query(Entry).filter(Entry.name == my_name).first()
>>> ArgumentError: filter() argument must be of type sqlalchemy.sql.ClauseElement or string
After investigating for a while, I found that maybe comparable_using() could help, but I couldn't find any example that shows a comparator that references another column of the table.
Is this even possible or is there a better approach?
Since SQLAlchemy 0.7 you can achieve this using hybrid_property;
see the docs here: http://www.sqlalchemy.org/docs/orm/extensions/hybrid.html
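For illustration, a minimal hybrid sketch for this model (my addition, not from the original answer). Note the SQL side can only approximate the Python slug logic: stripping arbitrary characters has no portable SQL equivalent, so the expression below only matches titles without special characters.

import re
from sqlalchemy import Column, Integer, Unicode, func
from sqlalchemy.ext.hybrid import hybrid_property

class Entry(Base):  # Base: your declarative base
    __tablename__ = 'blog_entry'
    id = Column(Integer, primary_key=True)
    title = Column(Unicode(255))

    @hybrid_property
    def name(self):
        # Python side: the full slug logic from the question
        return re.sub(r'[^a-zA-Z0-9 ]', '', self.title).replace(' ', '-').lower()

    @name.expression
    def name(cls):
        # SQL side: lower-case and replace spaces with dashes
        return func.lower(func.replace(cls.title, ' ', '-'))

# Session.query(Entry).filter(Entry.name == my_name).first() now emits SQL.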
Can you imagine what SQL would be issued for your query? The database knows nothing about name: it has neither a way to calculate it nor any index to speed up the search.
My best bet is a full scan, fetching title for every record, calculating name, then filtering by it. You can do that crudely with [x for x in Session.query(Entry).all() if x.name == my_name][0]. With a bit more sophistication, you'd fetch only id and title in the filtering pass, then fetch the full record(s) by id.
Note that a full scan is usually not nice from a performance point of view, unless your table is quite small.
