SQLAlchemy loses column label on chained union/except_ - python

I have a somewhat complex query where I need to join a subquery. That subquery contains EXCEPT and UNION. In raw SQL it looks something like this:
SELECT ... FROM table t
JOIN (SELECT id AS foo_id FROM foo WHERE select_me
EXCEPT SELECT foo_id FROM bar WHERE add_or_remove = 'remove'
UNION SELECT foo_id FROM bar WHERE add_or_remove = 'add') subq
ON t.foo_id = subq.foo_id;
Where foo and bar tables are defined like this:
class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True, autoincrement=True)
    select_me = Column(Boolean)
class Bar(Base):
    __tablename__ = 'bar'
    foo_id = Column(Integer, primary_key=True)
    add_or_remove = Column(Enum('add', 'remove', name='add_or_remove'), primary_key=True)
When I try to build this subquery in SQLAlchemy, it loses the column label as soon as I add a second union/except_.
Here is what I'm talking about:
q = session.query(Foo.id.label('foo_id')).filter(Foo.select_me)
print(q.subquery().c)
This prints ['%(140275696626880 anon)s.foo_id'], which still contains the correct label.
q = q.union(session.query(Bar.foo_id.label('foo_id')).filter(Bar.add_or_remove == 'add'))
print(q.subquery().c)
This prints ['%(140275696767384 anon)s.foo_id'], which still contains the correct label.
q = q.except_(session.query(Bar.foo_id.label('foo_id')).filter(Bar.add_or_remove == 'remove'))
print(q.subquery().c)
This prints ['%(140275696769064 anon)s.%(140275696769008 anon)s_foo_id']. Now the column is labeled with an autogenerated name, and I cannot use it to specify the join condition.
For now I think I can just take the first column and use it. But that is a hacky solution, so I wonder whether this is a bug in SQLAlchemy or I'm doing something wrong.
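One possible way around the autogenerated name, offered as a minimal sketch rather than a confirmed fix: build the compound with Core select() constructs (SQLAlchemy 1.4 or later is assumed), wrap the EXCEPT step in a named subquery, and re-select its labeled column before applying the UNION. The subquery names "kept" and "subq" and the variable names are illustrative and do not come from the question. The order follows the raw SQL above (EXCEPT first, then UNION).

```python
from sqlalchemy import Boolean, Column, Enum, Integer, select
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True, autoincrement=True)
    select_me = Column(Boolean)


class Bar(Base):
    __tablename__ = "bar"
    foo_id = Column(Integer, primary_key=True)
    add_or_remove = Column(
        Enum("add", "remove", name="add_or_remove"), primary_key=True
    )


selected = select(Foo.id.label("foo_id")).where(Foo.select_me)
removed = select(Bar.foo_id).where(Bar.add_or_remove == "remove")
added = select(Bar.foo_id).where(Bar.add_or_remove == "add")

# Step 1: selected EXCEPT removed, wrapped in a subquery with an explicit name.
kept = selected.except_(removed).subquery("kept")

# Step 2: re-select the labeled column by name, then UNION the added ids.
subq = select(kept.c.foo_id).union(added).subquery("subq")

# The outer subquery exposes the column under the plain label.
column_names = list(subq.c.keys())
print(column_names)
```

With this shape the join condition can refer to subq.c.foo_id directly instead of picking the first column by position.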

Related

Auto-aliasing issues when selecting by relationship count with SQLAlchemy + Postgres

Given the following code:
from sqlalchemy import Column, ForeignKey, Integer, alias, create_engine, func, select
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
Base = declarative_base()
engine = create_engine(
    "postgresql+psycopg2://***:***@127.0.0.1:5432/***", future=True
)
Session = sessionmaker(engine)
class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bars = relationship("Bar", uselist=True, secondary="foo_bar")
    baz_id = Column(ForeignKey("baz.id"))
    baz = relationship("Baz", back_populates="foos", lazy="joined")
class Bar(Base):
    __tablename__ = "bar"
    id = Column(Integer, primary_key=True)
class Baz(Base):
    __tablename__ = "baz"
    id = Column(Integer, primary_key=True)
    foos = relationship(Foo, uselist=True)
class FooBar(Base):
    __tablename__ = "foo_bar"
    foo_id = Column(ForeignKey(Foo.id), primary_key=True)
    bar_id = Column(ForeignKey(Bar.id), primary_key=True)
Base.metadata.create_all(engine)
stmt = (
    select(Foo)
    .join(FooBar, FooBar.foo_id == Foo.id)
    .group_by(Foo.id)
    .having(func.count(FooBar.foo_id) == 2)
)
Session().execute(stmt)
I want to select all Foos with exactly two Bars.
But I'm running into the following error:
column "baz_1.id" must appear in the GROUP BY clause or be used in an aggregate function
The generated SQL is:
SELECT foo.id, foo.baz_id, baz_1.id AS id_1
FROM foo JOIN foo_bar ON foo_bar.foo_id = foo.id
LEFT OUTER JOIN baz AS baz_1 ON baz_1.id = foo.baz_id GROUP BY foo.id
HAVING count(foo_bar.foo_id) = :count_1
Now I get what Postgres wants me to do, but I'm not sure how to achieve it: I can't add baz_1.id to the GROUP BY clause, because SQLAlchemy generates that alias on the fly and I have no control over it.
Baz is being included in the query because of the lazy='joined' option on the relationship in Foo. We can override that option in the query, so that the join is not executed and the query works as desired.
from sqlalchemy import orm
stmt = (
    select(Foo)
    .options(orm.lazyload(Foo.baz))  # <- don't automatically join Baz.
    .join(FooBar, FooBar.foo_id == Foo.id)
    .group_by(Foo.id)
    .having(func.count(FooBar.foo_id) == 2)
)
Generated SQL:
SELECT foo.id, foo.baz_id
FROM foo
JOIN foo_bar ON foo_bar.foo_id = foo.id
GROUP BY foo.id
HAVING count(foo_bar.foo_id) = %(count_1)s
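If Baz should still be loaded eagerly, a different technique is to move the grouping into an IN subquery so that the outer SELECT has no GROUP BY at all. This is only a sketch under stated assumptions: it uses simplified copies of the models (no Bar class, no back_populates), an in-memory SQLite database instead of Postgres, and the hypothetical names two_bars and matching_ids.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Baz(Base):
    __tablename__ = "baz"
    id = Column(Integer, primary_key=True)


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    baz_id = Column(ForeignKey("baz.id"))
    baz = relationship(Baz, lazy="joined")  # eager join stays enabled


class FooBar(Base):
    __tablename__ = "foo_bar"
    foo_id = Column(ForeignKey("foo.id"), primary_key=True)
    bar_id = Column(Integer, primary_key=True)  # simplified: no Bar table here


engine = create_engine("sqlite://")  # assumption: in-memory SQLite for the demo
Base.metadata.create_all(engine)

# The aggregate lives in its own subquery, so the eager join is unaffected.
two_bars = (
    select(FooBar.foo_id)
    .group_by(FooBar.foo_id)
    .having(func.count(FooBar.bar_id) == 2)
)
stmt = select(Foo).where(Foo.id.in_(two_bars))

with Session(engine) as session:
    session.add_all(
        [
            Foo(id=1),
            Foo(id=2),
            FooBar(foo_id=1, bar_id=10),
            FooBar(foo_id=1, bar_id=11),
            FooBar(foo_id=2, bar_id=10),
        ]
    )
    session.commit()
    matching_ids = [foo.id for foo in session.execute(stmt).scalars()]

print(matching_ids)
```

Only the Foo with exactly two foo_bar rows is returned, and the LEFT OUTER JOIN to baz no longer conflicts with a GROUP BY.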

Is there a way to populate one table from multiple on SQLAlchemy

I'm trying to build a database with SQLAlchemy. My problem is that I have two tables with the same column names, and I'm trying to populate a third table from the other two. Below is a simple diagram to illustrate:
I usually set a foreign key on one table and the relationship on the other, like this:
class TableA(Base):
    __tablename__ = "tableA"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)
    name_relation = relationship("TableC", backref='owner')
class TableC(Base):
    __tablename__ = "tableC"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), ForeignKey('tableA.name'))
    age = Column(Integer)
You can see that this method only works with two tables, because the ForeignKey on tableC's name column refers specifically to tableA's name.
Is there a way to do that?
Thanks
In SQL, the query you'd be looking for is
INSERT INTO C (id, name, age) (
SELECT *
FROM A
UNION ALL
SELECT *
FROM B
)
As per this answer, the SQLAlchemy equivalent is
session = Session()
query = session.query(TableA).union_all(session.query(TableB))
# TableC is a declarative class, so the Core insert() construct lives on TableC.__table__
stmt = TableC.__table__.insert().from_select(['id', 'name', 'age'], query)
or equivalently
stmt = TableC.__table__.insert().from_select(
    ['id', 'name', 'age'],
    TableA.__table__.select().union_all(TableB.__table__.select())
)
After which you can execute it using connection.execute(stmt) or session.execute(stmt), depending on what you're using.
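The following is a minimal end-to-end sketch of the same INSERT ... FROM SELECT idea, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. TableB is assumed to mirror TableA, the sample rows are made up, and only name and age are copied so that primary keys from the two source tables cannot collide in the target.

```python
from sqlalchemy import Column, Integer, String, create_engine, select, union_all
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class TableA(Base):
    __tablename__ = "tableA"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)


class TableB(Base):  # assumption: same columns as TableA
    __tablename__ = "tableB"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)


class TableC(Base):
    __tablename__ = "tableC"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)


a, b, c = TableA.__table__, TableB.__table__, TableC.__table__

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# UNION ALL of both sources; ids are left out so tableC generates its own.
rows = union_all(select(a.c.name, a.c.age), select(b.c.name, b.c.age))
stmt = c.insert().from_select(["name", "age"], rows)

with engine.begin() as conn:
    conn.execute(a.insert(), [{"name": "ann", "age": 30}])
    conn.execute(b.insert(), [{"name": "bob", "age": 41}])
    conn.execute(stmt)
    copied = [
        tuple(row)
        for row in conn.execute(select(c.c.name, c.c.age).order_by(c.c.name))
    ]

print(copied)
```

The whole copy happens inside the database in one statement, so no rows are pulled into Python first.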

SQL to SQLAlchemy translation

I have a, somewhat odd, query that gets me all the items in a parent table that have no matches in its corresponding child table.
If possible, I'd like to turn it into an SQLAlchemy query, but I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)
class customerEmails(Base):
    """child table. hold emails for customers in translation table"""
    __tablename__ = 'customer_emails'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)
You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with all the column names expanded, rather than using *, the ORM then can create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
... ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
or, if you have a backreference on the customerTranslations class pointing to emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
~customerTranslations.emails.any())
Back in 2010 NOT EXISTS was a little slower on MySQL but you may want to re-assess if that is still the case.
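Apart from performance, the two forms can return different rows when parent_id contains NULL, which the model above allows: under SQL's three-valued logic, NOT IN against a set containing NULL matches nothing, while NOT EXISTS is unaffected. The sketch below illustrates this, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and made-up sample rows.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, exists, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class customerTranslations(Base):
    __tablename__ = "customer_translation"
    id = Column(Integer, primary_key=True)


class customerEmails(Base):
    __tablename__ = "customer_emails"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("customer_translation.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

not_in_stmt = select(customerTranslations.id).where(
    ~customerTranslations.id.in_(select(customerEmails.parent_id))
)
not_exists_stmt = select(customerTranslations.id).where(
    ~exists().where(customerEmails.parent_id == customerTranslations.id)
)

with Session(engine) as session:
    session.add_all(
        [
            customerTranslations(id=1),
            customerTranslations(id=2),  # has no email
            customerEmails(id=1, parent_id=1),
            customerEmails(id=2, parent_id=None),  # email with a NULL parent
        ]
    )
    session.commit()
    via_not_in = session.execute(not_in_stmt).scalars().all()
    via_not_exists = session.execute(not_exists_stmt).scalars().all()

print(via_not_in, via_not_exists)
```

If NULL parent ids are possible, NOT EXISTS (or a NOT IN subquery that filters out NULLs) is the safer choice.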

Turning SQL expression into SQLAlchemy query

I have this SQL expression that I'm trying to write in SQLAlchemy:
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
If you have some models that you're working with, you could use the relationship field type to create a one-to-many relationship between the Candidate and the UploadedEmail, like so:
class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')
class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    # the target must match Candidate's __tablename__
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by)
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = c.uploaded_emails.with_entities(UploadedEmail.thread_id).group_by(UploadedEmail.thread_id).all()
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use .with_entities() to restrict the selected columns to the ones you group by, here only thread_id. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
Sorry I didn't provide enough information to answer the question. This ended up working:
x = db_session.query(Candidate1, Uploaded_Emails1).filter(Candidate1.id == Uploaded_Emails1.candidate_id).group_by(Uploaded_Emails1.thread_id).all()
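For comparison, here is a sketch of a join with GROUP BY that selects only the grouped column plus an aggregate, which avoids the only_full_group_by class of errors mentioned above. It assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, the Candidate and UploadedEmail models from the answer (without the relationship), and made-up rows; the email_count label is hypothetical.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Candidate(Base):
    __tablename__ = "candidates1"
    id = Column(Integer, primary_key=True)


class UploadedEmail(Base):
    __tablename__ = "uploaded_emails1"
    id = Column(Integer, primary_key=True)
    candidate_id = Column(Integer, ForeignKey("candidates1.id"))
    thread_id = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# One row per thread: the grouped column and a count of its emails.
stmt = (
    select(
        UploadedEmail.thread_id,
        func.count(UploadedEmail.id).label("email_count"),
    )
    .join(Candidate, Candidate.id == UploadedEmail.candidate_id)
    .group_by(UploadedEmail.thread_id)
    .order_by(UploadedEmail.thread_id)
)

with Session(engine) as session:
    session.add_all(
        [
            Candidate(id=1),
            UploadedEmail(id=1, candidate_id=1, thread_id=7),
            UploadedEmail(id=2, candidate_id=1, thread_id=7),
            UploadedEmail(id=3, candidate_id=1, thread_id=8),
        ]
    )
    session.commit()
    per_thread = [tuple(row) for row in session.execute(stmt)]

print(per_thread)
```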

How to Add rows using subqueries in sqlalchemy?

I'm using PostgreSQL with SQLAlchemy, but it seems SQLAlchemy has trouble adding rows when subqueries are involved.
In my example, I want to update a counter for a specific tag in a table.
In SqlAlchemy a test run class would look like the following:
class TestRun( base ):
    __tablename__ = 'test_runs'
    id = sqlalchemy.Column( 'id', sqlalchemy.Integer, sqlalchemy.Sequence('user_id_seq'), primary_key=True )
    tag = sqlalchemy.Column( 'tag', sqlalchemy.String )
    counter = sqlalchemy.Column( 'counter', sqlalchemy.Integer )
The insertion code should then look like the following:
tag = 'sampletag'
counterquery = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter),0) + 1).\
    filter(TestRun.tag == tag).\
    subquery()
testrun = TestRun()
testrun.tag = tag
testrun.counter = counterquery
session.add( testrun )
session.commit()
The problem is that this gives a very interesting error when the code runs. It tries to execute the following SQL query:
'INSERT INTO test_runs (id, tag, counter)
VALUES (%(id)s,
%(tag)s,
SELECT coalesce(max(test_runs.counter), %(param_1)s) + %(coalesce_1)s AS anon_1
FROM test_runs
WHERE test_runs.tag = %(tag_1)s)'
{'coalesce_1': 1, 'param_1': 0, 'tag_1': 'mytag', 'tag': 'mytag', 'id': 267L}
This looks reasonable, except that it's missing parentheses around the SELECT. When I run the SQL query manually, it gives me the exact same error that SQLAlchemy gives me, until I add the parentheses by hand, which fixes everything. It seems unlikely that SQLAlchemy would forget to add parentheses where they are needed, so my question is: am I missing a function for using subqueries correctly when adding rows with SQLAlchemy?
Instead of using subquery(), call the as_scalar() method (named scalar_subquery() in SQLAlchemy 1.4 and later):
Return the full SELECT statement represented by this Query, converted
to a scalar subquery.
Example:
Models with a classic parent-child relationship:
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    counter = Column(Integer, nullable=False, default=0)
class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    parent_id = Column(ForeignKey(Parent.id), nullable=False)
    parent = relationship(Parent)
Code to update counter field:
parent.counter = session.query(func.count(Child.id))\
    .filter_by(parent=parent).as_scalar()
Produced SQL (copied from the log):
UPDATE parents SET counter=(SELECT count(children.id) AS count_1
FROM children
WHERE ? = children.parent_id) WHERE parents.id = ?
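Applied to the TestRun case from the question, the same idea might look like the sketch below. It assumes SQLAlchemy 1.4 or later (hence scalar_subquery() rather than as_scalar()), an in-memory SQLite database instead of PostgreSQL, and a simplified model without the Sequence; next_counter is a hypothetical helper name.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class TestRun(Base):
    __tablename__ = "test_runs"
    id = Column(Integer, primary_key=True)
    tag = Column(String)
    counter = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)


def next_counter(tag):
    """Scalar subquery: max(counter) + 1 for this tag, starting at 1."""
    return (
        select(func.coalesce(func.max(TestRun.counter), 0) + 1)
        .where(TestRun.tag == tag)
        .scalar_subquery()
    )


counters = []
with Session(engine) as session:
    for _ in range(2):
        # The subquery is rendered in parentheses inside the INSERT's VALUES.
        testrun = TestRun(tag="sampletag", counter=next_counter("sampletag"))
        session.add(testrun)
        session.commit()
        # After commit the attribute is reloaded with the stored value.
        counters.append(testrun.counter)

print(counters)
```

Because the counter is computed inside the INSERT itself, no separate SELECT round trip is needed, although concurrent inserts for the same tag could still race without a unique constraint or locking.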
