I have a situation where I am trying to count the number of rows in a table when the column value is in a subquery. For example, let's say that I have some SQL like so:
select count(*) from table1
where column1 in (select column2 from table2);
I have my tables defined like so:
class table1(Base):
    __tablename__ = "table1"
    __table_args__ = {'schema': 'myschema'}
    acct_id = Column(DECIMAL(precision=15), primary_key=True)

class table2(Base):
    __tablename__ = "table2"
    __table_args__ = {'schema': 'myschema'}
    ban = Column(String(length=128), primary_key=True)
The tables are reflected from the database so there are other attributes present that aren't explicitly specified in the class definition.
I can try to write my query but here is where I am getting stuck...
qry=self.session.query(func.?(...)) # what to put here?
res = qry.one()
I tried looking through the documentation but I don't see anything comparable to the 'in' keyword, which is a feature of many SQL dialects.
I am using Teradata as my backend if that matters.
sub_stmt = session.query(table2.some_id)
stmt = session.query(table1).filter(table1.id.in_(sub_stmt))
data = stmt.all()
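Since the original query is a count(*), you can wrap the same filter in func.count() instead of fetching the rows; a minimal sketch, keeping the placeholder column names from the answer above (swap in acct_id / ban or whichever columns you actually mean):
from sqlalchemy import func

sub_stmt = session.query(table2.some_id)
count_qry = session.query(func.count()).select_from(table1).filter(table1.id.in_(sub_stmt))
row_count = count_qry.scalar()  # SELECT count(*) FROM table1 WHERE id IN (SELECT some_id FROM table2)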
I'm trying to build a database with SQLAlchemy. My problem is that I have two tables with the same column names, and I am trying to populate a third table from the other two.
I usually set a ForeignKey on one table and the relationship on the other, like this:
class TableA(Base):
    __tablename__ = "tableA"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)
    name_relation = relationship("TableC", backref='owner')

class TableC(Base):
    __tablename__ = "tableC"
    id = Column(Integer, primary_key=True)
    name = Column(String(100), ForeignKey('tableA.name'))
    age = Column(Integer)
You can see that this approach can only work between two tables, because the ForeignKey on tableC for name points specifically at tableA.
Is there a way to do that?
Thanks
In SQL, the query you'd be looking for is
INSERT INTO C (id, name, age) (
    SELECT *
    FROM A
    UNION ALL
    SELECT *
    FROM B
)
As per this answer, the equivalent SQLAlchemy is
session = Session()
query = session.query(TableA).union_all(session.query(TableB))
stmt = TableC.__table__.insert().from_select(['id', 'name', 'age'], query)
or equivalently
stmt = TableC.__table__.insert().from_select(
    ['id', 'name', 'age'],
    TableA.__table__.select().union_all(TableB.__table__.select())
)
After which you can execute it using connection.execute(stmt) or session.execute(stmt), depending on what you're using.
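For example, a rough sketch of that final step with a session (assuming the stmt built above); note the commit so the inserted rows are persisted:
session.execute(stmt)
session.commit()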
I have a somewhat odd query that gets me all the items in a parent table that have no matches in its corresponding child table.
If possible, I'd like to turn it into an SQLAlchemy query, but I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)

class customerEmails(Base):
    """child table. holds emails for customers in the translation table"""
    __tablename__ = 'customer_emails'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)
You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
    ~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with all the column names spelled out rather than using *, so the ORM can then create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
...     ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
or, if you have a backreference on the customerTranslations class pointing to the emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
    ~customerTranslations.emails.any())
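For completeness, a minimal sketch of what such an emails relationship might look like on the parent class (the attribute name and backref here are assumptions; use whatever your models actually define):
from sqlalchemy.orm import relationship

class customerTranslations(Base):
    """parent table. holds customer names"""
    __tablename__ = 'customer_translation'
    id = Column(Integer, primary_key=True)
    # assumed relationship: exposes .emails here and .owner on customerEmails via backref
    emails = relationship("customerEmails", backref="owner")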
Back in 2010, NOT EXISTS was a little slower on MySQL, but you may want to re-assess whether that is still the case.
I have a table of time series data from which I frequently need to get the records where the date is equal to the max date in the table. In SQL this is easily accomplished via a subquery, i.e.:
SELECT * from my_table where date = (select max(date) from my_table);
The model for this table would look like:
class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
And I can accomplish the desired behavior in SQLAlchemy with two separate queries, i.e.:
maxdate = session.query(func.max(MyTable.date)).first()[0]
desired_results = session.query(MyTable).filter(MyTable.date == maxdate).all()
The problem is that I have this subquery sprinkled everywhere in my code and I feel it is an inelegant solution. Ideally I would like to write a class property or custom comparator that I can stick in the model definition, so that I can compress the subquery into a single line and reuse it constantly, something like:
session.query(MyTable).filter(MyTable.date == MyTable.max_date)
I have looked through the SQLAlchemy docs on this but haven't come up with anything that works. Does anybody have a neat solution for this kind of problem?
For posterity, here is the solution I came up with
from sqlalchemy.sql import func
from sqlalchemy import select

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    maxdate = select([func.max(date)])

desired_results = session.query(MyTable).filter(MyTable.date == MyTable.maxdate).all()
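An equivalent sketch that makes the scalar subquery explicit with as_scalar() (the pre-1.4 spelling; SQLAlchemy 1.4+ renames it to scalar_subquery()):
from sqlalchemy import select
from sqlalchemy.sql import func

# reusable scalar subquery: (SELECT max(my_table.date) FROM my_table)
max_date_subq = select([func.max(MyTable.date)]).as_scalar()

desired_results = session.query(MyTable).filter(MyTable.date == max_date_subq).all()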
I have this SQL expression that I'm trying to write in SQLAlchemy:
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
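For instance, a rough sketch of consuming that result (rows come back as tuple-like objects):
rows = result.fetchall()
for row in rows:
    print(row)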
If you have some models that you're working with, you could use relationship() to create a one-to-many relationship between the Candidate and the UploadedEmail, like so:
class Candidate(Base):
    __tablename__ = 'candidates1'
    id = Column(Integer, primary_key=True)
    uploaded_emails = relationship("UploadedEmail", lazy='dynamic')

class UploadedEmail(Base):
    __tablename__ = 'uploaded_emails1'
    id = Column(Integer, primary_key=True)
    # the ForeignKey target is the table name, 'candidates1', not the class name
    candidate_id = Column(Integer, ForeignKey('candidates1.id'))
    thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by):
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = (
    c.uploaded_emails
    .with_entities(UploadedEmail.thread_id)
    .group_by(UploadedEmail.thread_id)
    .all()
)
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use .with_entities to specify the columns you would like to select, and that the column selected here is thread_id. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
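If you're not using Flask-SQLAlchemy's Model.query, roughly the same query can be written against a plain session (a sketch, assuming the models above and an existing session):
thread_id_results = (
    session.query(UploadedEmail.thread_id)
    .filter(UploadedEmail.candidate_id == candidate_id)
    .group_by(UploadedEmail.thread_id)
    .all()
)
thread_ids = [row[0] for row in thread_id_results]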
Sorry I didn't provide enough information to answer the question. This ended up working:
x = (
    db_session.query(Candidate1, Uploaded_Emails1)
    .filter(Candidate1.id == Uploaded_Emails1.candidate_id)
    .group_by(Uploaded_Emails1.thread_id)
    .all()
)
I'm using PostgreSQL with SQLAlchemy, but it seems SQLAlchemy is having trouble adding rows when using subqueries.
In my example, I want to update a counter for a specific tag in a table.
In SQLAlchemy, a test run class would look like the following:
class TestRun(base):
    __tablename__ = 'test_runs'
    id = sqlalchemy.Column('id', sqlalchemy.Integer, sqlalchemy.Sequence('user_id_seq'), primary_key=True)
    tag = sqlalchemy.Column('tag', sqlalchemy.String)
    counter = sqlalchemy.Column('counter', sqlalchemy.Integer)
The insertion code should then look like the following:
tag = 'sampletag'
counterquery = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter), 0) + 1).\
    filter(TestRun.tag == tag).\
    subquery()

testrun = TestRun()
testrun.tag = tag
testrun.counter = counterquery

session.add(testrun)
session.commit()
The problem is that it gives a very interesting error when running this code; it's trying to run the following SQL query:
'INSERT INTO test_runs (id, tag, counter)
VALUES (%(id)s,
%(tag)s,
SELECT coalesce(max(test_runs.counter), %(param_1)s) + %(coalesce_1)s AS anon_1
FROM test_runs
WHERE test_runs.tag = %(tag_1)s)'
{'coalesce_1': 1, 'param_1': 0, 'tag_1': 'mytag', 'tag': 'mytag', 'id': 267L}
Which looks reasonable, except that it's missing parentheses around the SELECT. When I run the SQL query manually, it gives me the same exact error that SQLAlchemy gives me, until I type in the parentheses manually, which fixes everything up. It seems like an unlikely bug that SQLAlchemy would forget to put parentheses where it needs to, so my question is: am I missing a function to use subqueries correctly when adding rows with SQLAlchemy?
Instead of using subquery(), call the as_scalar() method:
Return the full SELECT statement represented by this Query, converted to a scalar subquery.
Example:
Models with a classic parent-child relationship:
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    counter = Column(Integer, nullable=False, default=0)

class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    parent_id = Column(ForeignKey(Parent.id), nullable=False)
    parent = relationship(Parent)
Code to update counter field:
parent.counter = session.query(func.count(Child.id))\
    .filter_by(parent=parent).as_scalar()
Produced SQL (copied from the log):
UPDATE parents SET counter=(SELECT count(children.id) AS count_1
FROM children
WHERE ? = children.parent_id) WHERE parents.id = ?
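Applied back to the original TestRun example, that would look roughly like this; the only change from the question's code is as_scalar() in place of subquery():
tag = 'sampletag'
testrun = TestRun()
testrun.tag = tag
# scalar subquery: (SELECT coalesce(max(counter), 0) + 1 FROM test_runs WHERE tag = :tag)
testrun.counter = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter), 0) + 1).\
    filter(TestRun.tag == tag).\
    as_scalar()

session.add(testrun)
session.commit()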