I have a situation where I am trying to count the number of rows in a table when the column value is in a subquery. For example, let's say that I have some SQL like so:
select count(*) from table1
where column1 in (select column2 from table2);
I have my tables defined like so:
class table1(Base):
    __tablename__ = "table1"
    __table_args__ = {'schema': 'myschema'}
    acct_id = Column(DECIMAL(precision=15), primary_key=True)

class table2(Base):
    __tablename__ = "table2"
    __table_args__ = {'schema': 'myschema'}
    ban = Column(String(length=128), primary_key=True)
The tables are reflected from the database so there are other attributes present that aren't explicitly specified in the class definition.
I can try to write my query but here is where I am getting stuck...
qry=self.session.query(func.?(...)) # what to put here?
res = qry.one()
I tried looking through the documentation but I don't see anything comparable to the 'in' keyword, which is a feature of many SQL dialects.
I am using Teradata as my backend if that matters.
sub_stmt = session.query(table2.some_id)
stmt = session.query(table1).filter(table1.id.in_(sub_stmt))
data = stmt.all()
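Since the original query is a count(*), you can wrap the same filter in func.count() instead of fetching the rows; a minimal sketch, keeping the placeholder column names from the answer above (swap in acct_id / ban or whichever columns you actually mean):
from sqlalchemy import func

sub_stmt = session.query(table2.some_id)
count_qry = session.query(func.count()).select_from(table1).filter(table1.id.in_(sub_stmt))
row_count = count_qry.scalar()  # SELECT count(*) FROM table1 WHERE id IN (SELECT some_id FROM table2)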
I'm trying to build a database with SQLAlchemy. My problem is that I have two tables with the same column names, and I am trying to populate a third table from the other two.
I usually set a ForeignKey on one table and the relationship on the other, like this:
class TableA(Base):
    __tablename__ = "tableA"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)
    name_relation = relationship("TableC", backref='owner')

class TableC(Base):
    __tablename__ = "tableC"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), ForeignKey('tableA.name'))
    age = Column(Integer)
You can see that this approach can only work between two tables, because the ForeignKey on tableC for name points specifically at tableA.
Is there a way to do that?
Thanks
In SQL, the query you'd be looking for is
INSERT INTO C (id, name, age) (
    SELECT *
    FROM A
    UNION ALL
    SELECT *
    FROM B
)
As per this answer, the equivalent SQLAlchemy is
session = Session()
query = session.query(TableA).union_all(session.query(TableB))
stmt = TableC.__table__.insert().from_select(['id', 'name', 'age'], query)
or equivalently
stmt = TableC.__table__.insert().from_select(
    ['id', 'name', 'age'],
    TableA.__table__.select().union_all(TableB.__table__.select())
)
After which you can execute it using connection.execute(stmt) or session.execute(stmt), depending on what you're using.
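For example, a rough sketch of that final step with a session (assuming the stmt built above); note the commit so the inserted rows are persisted:
session.execute(stmt)
session.commit()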
I have a somewhat odd query that gets me all the items in a parent table that have no matches in its corresponding child table.
If possible, I'd like to turn it into an SQLAlchemy query, but I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)

class customerEmails(Base):
    """child table. holds emails for customers in the translation table"""
    __tablename__ = 'customer_emails'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)
You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
    ~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with all the column names spelled out rather than using *, so the ORM can then create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
...     ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
or, if you have a backreference on the customerTranslations class pointing to the emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
    ~customerTranslations.emails.any())
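For completeness, a minimal sketch of what such an emails relationship might look like on the parent class (the attribute name and backref here are assumptions; use whatever your models actually define):
from sqlalchemy.orm import relationship

class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)
    # assumed relationship: exposes .emails here and .owner on customerEmails via backref
    emails = relationship("customerEmails", backref="owner")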
Back in 2010, NOT EXISTS was a little slower on MySQL, but you may want to re-assess whether that is still the case.
I have a table of time series data from which I frequently need to get the records where the date is equal to the max date in the table. In SQL this is easily accomplished via a subquery, i.e.:
SELECT * from my_table where date = (select max(date) from my_table);
The model for this table would look like:
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
And I can accomplish the desired behavior in SQLAlchemy with two separate queries, i.e.:
maxdate = session.query(func.max(MyTable.date)).first()[0]
desired_results = session.query(MyTable).filter(MyTable.date == maxdate).all()
The problem is that I have this subquery sprinkled everywhere in my code and I feel it is an inelegant solution. Ideally I would like to write a class property or custom comparator that I can stick in the model definition, so that I can compress the subquery into a single line and reuse it constantly, something like:
session.query(MyTable).filter(MyTable.date == MyTable.max_date)
I have looked through the SQLAlchemy docs on this but haven't come up with anything that works. Does anybody have a neat solution for this kind of problem?
For posterity, here is the solution I came up with
from sqlalchemy.sql import func
from sqlalchemy import select

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    maxdate = select([func.max(date)])

desired_results = session.query(MyTable).filter(MyTable.date == MyTable.maxdate).all()
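An equivalent sketch that makes the scalar subquery explicit with as_scalar() (the pre-1.4 spelling; SQLAlchemy 1.4+ renames it to scalar_subquery()):
from sqlalchemy import select
from sqlalchemy.sql import func

# reusable scalar subquery: (SELECT max(my_table.date) FROM my_table)
max_date_subq = select([func.max(MyTable.date)]).as_scalar()

desired_results = session.query(MyTable).filter(MyTable.date == max_date_subq).all()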
I have this SQL expression that I'm trying to write in SQLAlchemy:
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
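For instance, a rough sketch of consuming that result (rows come back as tuple-like objects):
rows = result.fetchall()
for row in rows:
    print(row)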
If you have some models that you're working with, you could use relationship() to create a one-to-many relationship between the Candidate and the UploadedEmail, like so:
class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')

class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    # the ForeignKey target is the table name, 'candidates1', not the class name
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by):
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = (
    c.uploaded_emails
    .with_entities(UploadedEmail.thread_id)
    .group_by(UploadedEmail.thread_id)
    .all()
)
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use .with_entities to specify the columns you would like to select, and that the column selected here is thread_id. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
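If you're not using Flask-SQLAlchemy's Model.query, roughly the same query can be written against a plain session (a sketch, assuming the models above and an existing session):
thread_id_results = (
    session.query(UploadedEmail.thread_id)
    .filter(UploadedEmail.candidate_id == candidate_id)
    .group_by(UploadedEmail.thread_id)
    .all()
)
thread_ids = [row[0] for row in thread_id_results]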
Sorry I didn't provide enough information to answer the question. This ended up working:
x = (
    db_session.query(Candidate1, Uploaded_Emails1)
    .filter(Candidate1.id == Uploaded_Emails1.candidate_id)
    .group_by(Uploaded_Emails1.thread_id)
    .all()
)
I'm using PostgreSQL with SQLAlchemy, but it seems SQLAlchemy is having trouble adding rows when using subqueries.
In my example, I want to update a counter for a specific tag in a table.
In SQLAlchemy, a test run class would look like the following:
class TestRun(base):
    __tablename__ = 'test_runs'
    id = sqlalchemy.Column('id', sqlalchemy.Integer, sqlalchemy.Sequence('user_id_seq'), primary_key=True)
    tag = sqlalchemy.Column('tag', sqlalchemy.String)
    counter = sqlalchemy.Column('counter', sqlalchemy.Integer)
The insertion code should then look like the following:
tag = 'sampletag'
counterquery = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter), 0) + 1).\
    filter(TestRun.tag == tag).\
    subquery()

testrun = TestRun()
testrun.tag = tag
testrun.counter = counterquery

session.add(testrun)
session.commit()
The problem is that it gives a very interesting error when running this code; it's trying to run the following SQL query:
'INSERT INTO test_runs (id, tag, counter)
VALUES (%(id)s,
%(tag)s,
SELECT coalesce(max(test_runs.counter), %(param_1)s) + %(coalesce_1)s AS anon_1
FROM test_runs
WHERE test_runs.tag = %(tag_1)s)'
{'coalesce_1': 1, 'param_1': 0, 'tag_1': 'mytag', 'tag': 'mytag', 'id': 267L}
Which looks reasonable, except that it's missing parentheses around the SELECT. When I run the SQL query manually, it gives me the same exact error that SQLAlchemy gives me, until I type in the parentheses manually, which fixes everything up. It seems like an unlikely bug that SQLAlchemy would forget to put parentheses where it needs to, so my question is: am I missing a function to use subqueries correctly when adding rows with SQLAlchemy?
Instead of using subquery(), call the as_scalar() method:
Return the full SELECT statement represented by this Query, converted to a scalar subquery.
Example:
Models with a classic parent-child relationship:
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    counter = Column(Integer, nullable=False, default=0)

class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    parent_id = Column(ForeignKey(Parent.id), nullable=False)
    parent = relationship(Parent)
Code to update counter field:
parent.counter = session.query(func.count(Child.id))\
    .filter_by(parent=parent).as_scalar()
Produced SQL (copied from the log):
UPDATE parents SET counter=(SELECT count(children.id) AS count_1
FROM children
WHERE ? = children.parent_id) WHERE parents.id = ?
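Applied back to the original TestRun example, that would look roughly like this; the only change from the question's code is as_scalar() in place of subquery():
tag = 'sampletag'
testrun = TestRun()
testrun.tag = tag
# scalar subquery: (SELECT coalesce(max(counter), 0) + 1 FROM test_runs WHERE tag = :tag)
testrun.counter = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter), 0) + 1).\
    filter(TestRun.tag == tag).\
    as_scalar()

session.add(testrun)
session.commit()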