SQL to SQLAlchemy translation - python

I have a somewhat odd query that gets me all the items in a parent table that have no matches in the corresponding child table.
If possible, I'd like to turn it into an SQLAlchemy query, but I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)

class customerEmails(Base):
    """child table. holds emails for customers in the translation table"""
    __tablename__ = 'customer_emails'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)

You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
    ~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with the column names written out rather than using *, so the ORM can then create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
... ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
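Printed, that filter should render roughly like this (the exact output may vary by SQLAlchemy version):
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE NOT (EXISTS (SELECT *
FROM customer_emails
WHERE customer_translation.id = customer_emails.parent_id))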
Or, if you have a backreference on the customerTranslations class pointing to the emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
    ~customerTranslations.emails.any())
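If that relationship is not defined yet, here is a minimal sketch of one way to add it (the emails name and the backref are assumptions, not something taken from your models):
from sqlalchemy.orm import relationship

class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)
    # assumed relationship; provides customerTranslations.emails and enables .any()
    emails = relationship("customerEmails", backref="translation")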
Back in 2010, NOT EXISTS was a little slower on MySQL, but you may want to re-assess whether that is still the case.

Related

SQLAlchemy class method for subquery

I have a table of time series data from which I frequently need to get the records where the date is equal to the max date in the table. In SQL this is easily accomplished via a subquery, i.e.:
SELECT * from my_table where date = (select max(date) from my_table);
The model for this table would look like:
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
And I can accomplish the desired behavior in SQLAlchemy with two separate queries, i.e.:
maxdate = session.query(func.max(MyTable.date)).first()[0]
desired_results = session.query(MyTable).filter(MyTable.date == maxdate).all()
The problem is that I have this subquery sprinkled everywhere in my code and I feel it is an inelegant solution. Ideally I would like to write a class property or custom comparator that I can stick in the model definition, so that I can compress the subquery into a single line and reuse it constantly, something like:
session.query(MyTable).filter(MyTable.date == MyTable.max_date)
I have looked through the SQLAlchemy docs on this but haven't come up with anything that works. Does anybody have a neat solution for this kind of problem?
For posterity, here is the solution I came up with
from sqlalchemy.sql import func
from sqlalchemy import select
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    maxdate = select([func.max(date)])
desired_results = session.query(MyTable).filter(MyTable.date == MyTable.maxdate).all()
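On SQLAlchemy 1.4 and newer, the same idea is usually written with an explicit scalar subquery; a minimal sketch, assuming the same model:
from sqlalchemy import Column, Date, Integer, func, select

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    # renders as a scalar subquery: (SELECT max(my_table.date) FROM my_table)
    maxdate = select(func.max(date)).scalar_subquery()

desired_results = session.query(MyTable).filter(MyTable.date == MyTable.maxdate).all()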

Turning SQL expression into SQLAlchemy query

I have this SQL expression that I'm trying to write in SQLAlchemy:
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
If you have some models that you're working with, you could use relationship() to create a one-to-many relationship between Candidate and UploadedEmail, like so:
class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')

class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by):
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = c.uploaded_emails.with_entities(UploadedEmail.thread_id).group_by(UploadedEmail.thread_id).all()
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use .with_entities to specify the columns you would like to select, and that the column selected here is thread_id, the same one you group by. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
Sorry I didn't provide enough information to answer the question. This ended up working:
x = db_session.query(Candidate1, Uploaded_Emails1).filter(Candidate1.id == Uploaded_Emails1.candidate_id).group_by(Uploaded_Emails1.thread_id).all()
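Since that query selects two entities, each element of x is a (Candidate1, Uploaded_Emails1) row tuple; assuming Uploaded_Emails1 exposes the thread_id column from the original SQL, you might consume it like this:
for candidate, email in x:
    print(candidate.id, email.thread_id)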

SQLAlchemy: ORM filter to match all items in a list, not any

I want to search a SQLAlchemy relationship collection (via an association table) and match on multiple items within it via a filter.
I already reviewed this question but I am looking to accomplish this via the ORM filter only (and the second answer is not via an association table).
Database table setup:
tag_ast_table = Table('tag_association',
                      Base.metadata,
                      Column('file_id', Integer, ForeignKey('files.id')),
                      Column('tag_id', Integer, ForeignKey('tags.id')),
                      PrimaryKeyConstraint('file_id', 'tag_id'))

class File(Base):
    __tablename__ = 'files'
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary=tag_ast_table)

class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    tag = Column(String)
The current filter matches any; I would like to modify it to match all:
query = db.query(File).filter(File.tags.any(Tag.tag.in_(my_list))).all()
A reasonable approach to this in SQL (alluded to in your link) is to use having count(distinct tags.id) = <your number of tags>.
So the query needs two things: an IN that looks for your list of tags, and a HAVING that checks the full count is present.
from sqlalchemy import distinct, func

query = (
    session.query(File)
    .join(File.tags)
    .filter(Tag.tag.in_(search_tags))
    .group_by(File.id)
    .having(func.count(distinct(Tag.id)) == len(search_tags))
)
As an edge case, if search_tags is an empty list you won't get any results, so best to check for that first.
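A minimal sketch of that guard, using the same names as above (the matching_files name is just illustrative, and what an empty search should return is an application decision; an empty list is assumed here):
if search_tags:
    matching_files = query.all()
else:
    matching_files = []  # assumed behaviour for an empty tag list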

SQLAlchemy mapping joined tables' columns to one object

I have three tables: UserTypeMapper, User, and SystemAdmin. In my get_user method, depending on the UserTypeMapper.is_admin value, I then query either the User or SystemAdmin table. The user_id column corresponds to the primary key id in the User and SystemAdmin tables.
class UserTypeMapper(Base):
    __tablename__ = 'user_type_mapper'
    id = Column(BigInteger, primary_key=True)
    is_admin = Column(Boolean, default=False)
    user_id = Column(BigInteger, nullable=False)

class SystemAdmin(Base):
    __tablename__ = 'system_admin'
    id = Column(BigInteger, primary_key=True)
    name = Column(Unicode)
    email = Column(Unicode)

class User(Base):
    __tablename__ = 'user'
    id = Column(BigInteger, primary_key=True)
    name = Column(Unicode)
    email = Column(Unicode)
I want to be able to get any user – system admin or regular user – from one query, so I do a join, on either User or SystemAdmin depending on the is_admin value. For example:
DBSession.query(UserTypeMapper, SystemAdmin).join(SystemAdmin, UserTypeMapper.user_id==SystemAdmin.id).first()
and
DBSession.query(UserTypeMapper, User).join(User, UserTypeMapper.user_id==User.id).first()
This works fine; however, I would then like to be able to access these like so:
>>> my_admin_obj.is_admin
True
>>> my_admin_obj.name
Bob Smith
versus
>>> my_user_obj.is_admin
False
>>> my_user_obj.name
Bob Stevens
Currently, I have to specify: my_user_obj.UserTypeMapper.is_admin and my_user_obj.User.name. From what I've been reading, I need to map the tables so that I don't need to specify which table the attribute belongs to. My problem is that I do not understand how I can specify this given that I have two potential tables that the name attribute, for example, may come from.
This is the example I am referring to: Mapping a Class against Multiple Tables
How can I achieve this? Thank you.
You have discovered why a "dual purpose foreign key" is an antipattern.
There is a related problem here that you haven't quite pointed out: there's no way to use a foreign key constraint to enforce that the data is in a valid state. You want to be sure that there's exactly one of something for each row in UserTypeMapper, but that 'something' is not any one table. Formally, you want a functional dependency:
user_type_mapper → (system_admin × {1}) ∪ (user × {0})
But most SQL databases won't allow you to write a foreign key constraint expressing that.
It looks complicated because it is complicated.
Instead, let's consider what we really want to say: "every system_admin should be a user", or
system_admin → user
In SQL, that would be written:
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    name VARCHAR,
    email VARCHAR
);

CREATE TABLE system_admin (
    user_id INTEGER PRIMARY KEY REFERENCES user(id)
);
Or, in SQLAlchemy declarative style:
class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)

class SystemAdmin(Base):
    __tablename__ = 'system_admin'
    user_id = Column(ForeignKey(User.id), primary_key=True)
What sort of questions does this schema allow us to ask?
"Is there a SystemAdmin by the name of 'john doe'"?
>>> print session.query(User).join(SystemAdmin).filter(User.name == 'john doe').exists()
EXISTS (SELECT 1
FROM "user" JOIN system_admin ON "user".id = system_admin.user_id
WHERE "user".name = :name_1)
"How many users are there? How many sysadmins?"
>>> print session.query(func.count(User.id), func.count(SystemAdmin.user_id)).outerjoin(SystemAdmin)
SELECT count("user".id) AS count_1, count(system_admin.user_id) AS count_2
FROM "user" LEFT OUTER JOIN system_admin ON "user".id = system_admin.user_id
I hope you can see why the above is preferable to the design you describe in your question; but on the off chance you don't have a choice (and only in that case; if you still feel what you've got is better, please refine your question), you can still cram that data into a single Python object, which will be very difficult to work with, by providing an alternate mapping to the tables; specifically, one which follows the rough structure in the first equation.
We need to mention UserTypeMapper twice, once for each side of the union; for that, we need to give it aliases.
>>> from sqlalchemy.orm import aliased
>>> utm1 = aliased(UserTypeMapper)
>>> utm2 = aliased(UserTypeMapper)
For the union bodies, join each alias to the appropriate table. Since SystemAdmin and User have the same columns in the same order, we don't need to describe them in detail; but if they differ at all, we need to make them "union compatible" by mentioning each column explicitly. This is left as an exercise.
>>> utm_sa = Query([utm1, SystemAdmin]).join(SystemAdmin, (utm1.user_id == SystemAdmin.id) & (utm1.is_admin == True))
>>> utm_u = Query([utm2, User]).join(User, (utm2.user_id == User.id) & (utm2.is_admin == False))
And then we join them together...
>>> print utm_sa.union(utm_u)
SELECT anon_1.user_type_mapper_1_id AS anon_1_user_type_mapper_1_id, anon_1.user_type_mapper_1_is_admin AS anon_1_user_type_mapper_1_is_admin, anon_1.user_type_mapper_1_user_id AS anon_1_user_type_mapper_1_user_id, anon_1.system_admin_id AS anon_1_system_admin_id, anon_1.system_admin_name AS anon_1_system_admin_name, anon_1.system_admin_email AS anon_1_system_admin_email
FROM (SELECT user_type_mapper_1.id AS user_type_mapper_1_id, user_type_mapper_1.is_admin AS user_type_mapper_1_is_admin, user_type_mapper_1.user_id AS user_type_mapper_1_user_id, system_admin.id AS system_admin_id, system_admin.name AS system_admin_name, system_admin.email AS system_admin_email
FROM user_type_mapper AS user_type_mapper_1 JOIN system_admin ON user_type_mapper_1.user_id = system_admin.id AND user_type_mapper_1.is_admin = 1 UNION SELECT user_type_mapper_2.id AS user_type_mapper_2_id, user_type_mapper_2.is_admin AS user_type_mapper_2_is_admin, user_type_mapper_2.user_id AS user_type_mapper_2_user_id, "user".id AS user_id, "user".name AS user_name, "user".email AS user_email
FROM user_type_mapper AS user_type_mapper_2 JOIN "user" ON user_type_mapper_2.user_id = "user".id AND user_type_mapper_2.is_admin = 0) AS anon_1
While it's theoretically possible to wrap this all up into a Python class that looks a bit like standard SQLAlchemy ORM stuff, I would certainly not do that. Working with non-table mappings, especially when they are more than simple joins (this is a union), is a lot of work for zero payoff.

SQLAlchemy relational mapping

Hi, I have a simple question - I have 2 tables (addresses and users - a user has one address, and lots of users can live at the same address)... I created a SQLAlchemy mapping like this:
class Person(object):
    '''
    classdocs
    '''
    idPerson = Column("idPerson", Integer, primary_key=True)
    name = Column("name", String)
    surname = Column("surname", String)
    idAddress = Column("idAddress", Integer, ForeignKey("pAddress.idAddress"))
    idState = Column("idState", Integer, ForeignKey("pState.idState"))
    Address = relationship(Address, primaryjoin=idAddress==Address.idAddress)

class Address(object):
    '''
    Class to represent table address object
    '''
    idAddress = Column("idAddress", Integer, primary_key=True)
    street = Column("street", String)
    number = Column("number", Integer)
    postcode = Column("postcode", Integer)
    country = Column("country", String)
    residents = relationship("Person", order_by="desc(Person.surname, Person.name)", primaryjoin="idAddress=Person.idPerson")
self.tablePerson = sqlalchemy.Table("pPerson", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Person, self.tablePerson)
self.tableAddress = sqlalchemy.Table("pAddress", self.metadata, autoload=True)
sqlalchemy.orm.mapper(Address, self.tableAddress)
When I get my session and try to query something like:
myaddress = session.query(Address).get(1)
print myaddress.residents[1].name
=> I get TypeError: 'RelationshipProperty' object does not support indexing
I understand residents is there to define the relationship, but how the heck can I get the list of residents assigned to the given address?!
Thanks
You define the relationship in the wrong place. I think you are mixing the Declarative extension with non-declarative (classical mapping) use:
When using declarative, you define your relationships in your model.
Otherwise, you define them when mapping the model to a table.
If option 2 is what you are doing, then you need to remove both relationship definitions from the models and add one to the mapper (only one is enough):
mapper(Address, tableAddress,
       properties={'residents': relationship(Person, order_by=(desc(Person.name), desc(Person.surname)), backref="Address")}
       )
A few more things about the code above:
The relationship is defined only on one side; the backref takes care of the other side.
You do not need to specify the primaryjoin (as long as you have a ForeignKey specified and SA is able to infer the columns).
Your order_by configuration is not correct; see the code above for a version that works.
You might try defining Person after Address, with a backref to Address - this will create the residents collection:
class Address(Base):
    __tablename__ = 'address_table'
    idAddress = Column("idAddress", Integer, primary_key=True)

class Person(Base):
    idPerson = Column("idPerson", Integer, primary_key=True)
    ...
    address_id = Column(Integer, ForeignKey('address_table.idAddress'))
    address = relationship(Address, backref='residents')
Then you can query:
myaddress = session.query(Address).get(1)
for resident in myaddress.residents:
    print resident.name
Further, if you have a lot of residents at an address, you can filter further using a join:
resultset = session.query(Address).join(Address.residents).filter(Person.name == 'Joe')
# or
resultset = session.query(Person).filter(Person.name == 'Joe').join(Person.address).filter(Address.state == 'NY')
Then use resultset.first(), resultset[0], resultset.get(...), etc.
