I have a table of time series data that I frequently need to get records where the date is equal to the max date in the table. In SQL this is easily accomplished via subquery, i.e.:
SELECT * from my_table where date = (select max(date) from my_table);
The model for this table would look like:
class MyTable(Base):
__tablename__ = 'my_table'
id = Column(Integer, primary_key = True)
date = Column(Date)
And I can accomplish the desired behavior in SQLAlchemy with two separate queries, ie:
maxdate = session.query(func.max(MyTable.date)).first()[0]
desired_results = session.query(MyTable).filter(MyTable.date == maxdate).all()
The problem is that I have this subquery sprinkled everywhere in my code and I feel it is an inelegant solution. Ideally I would like to write a class property or custom comparator that I can stick in the model definition, so that I can compress the subquery into a single line and reuse it constantly, something like:
session.query(MyTable).filter(MyTable.date == MyTable.max_date)
I have looked through the SQLAlchemy docs on this but haven't come up with anything that works. Does anybody have neat a solution for this kind of problem?
For posterity, here is the solution I came up with
from sqlalchemy.sql import func
from sqlalchemy import select
class MyTable(Base):
__tablename__ = 'my_table'
id = Column(Integer, primary_key = True)
date = Column(Date)
maxdate = select([func.max(date)])
desired_results = session.query(MyTable).filter(MyTable.date == MyTable.maxdate).all()
Related
I am trying to build an ORM mapped SQLite database. The conception of the DB seems to work as intended but I can't seem to be able to query it properly for more complex cases. I have spent the day trying to find an existing answer to my question but nothing works. I am not sure if the issue is with my mapping, my query or both. Or if maybe querying with attributes from a many to many association table with extra data works differently.
This the DB setup:
engine = create_engine('sqlite:///')
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
class User(Base):
__tablename__ = 'users'
# Columns
id = Column('id', Integer, primary_key=True)
first = Column('first_name', String(100))
last = Column('last_name', String(100))
age = Column('age', Integer)
quality = Column('quality', String(100))
unit = Column('unit', String(100))
# Relationships
cases = relationship('UserCaseLink', back_populates='user_data')
def __repr__(self):
return f"<User(first='{self.first}', last='{self.last}', quality='{self.quality}', unit='{self.unit}')>"
class Case(Base):
__tablename__ = 'cases'
# Columns
id = Column('id', Integer, primary_key=True)
num = Column('case_number', String(100))
type = Column('case_type', String(100))
# Relationships
users = relationship('UserCaseLink', back_populates='case_data')
def __repr__(self):
return f"<Case(num='{self.num}', type='{self.type}')>"
class UserCaseLink(Base):
__tablename__ = 'users_cases'
# Columns
user_id = Column('user_id', Integer, ForeignKey('users.id'), primary_key=True)
case_id = Column('case_id', Integer, ForeignKey('cases.id'), primary_key=True)
role = Column('role', String(100))
# Relationships
user_data = relationship('User', back_populates='cases')
case_data = relationship('Case', back_populates='users')
if __name__ == '__main__':
Base.metadata.create_all()
session = Session()
and I would like to retrieve all the cases on which a particular person is working under a certain role.
So for example I want a list of all the cases a person named 'Alex' is working on as an 'Administrator'.
In other words I would like the result of this query:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
So far I have tried many things along the lines of:
cases = session.query(Case).filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')
but it is not working as I would like it to.
How can I modify the ORM mapping so that it does the joining for me and allows me to query easily (something like the query I tried)?
According to your calsses, the quivalent query for:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
is
cases = session.query(
Case.id, Case.num,Cas.type,
UserCaseLink.role
).filter(
(Case.id==UserCaseLink.case_id)
&(User.id==UserCaseLink.user_id)
&(User.first=='Alex')
&(UserCaseLink.role=='Administrator'
).all()
also, you can:
cases = Case.query\
.join(UserCaseLink,Case.id==UserCaseLink.case_id)\
.join(User,User.id==UserCaseLink.user_id)\
.filter( (User.first=='Alex') & (User.first=='Alex') )\
.all()
Good Luck
After comment
based in your comment, I think you want something like:
cases = Case.query\
.filter( (Case.case_data.cases.first=='Alex') & (Case.case_data.cases.first=='Alex') )\
.all()
where case_data connect between Case an UserCaseLink and cases connect between UserCaseLink and User as in your relations.
But,that case causes error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with dimpor.org_type has an attribute 'org_type_id'
The missage shows that the attributes combined in filter should belong to the table class
So I ended up having to compromise.
It seems the query cannot be aware of all the relationships present in the ORM mapping at all times. Instead I had to manually give it the path between the different classes for it to find all the data I wanted:
cases = session.query(Case)\
.join(Case.users)\
.join(UserCaseLink.user_data)\
.filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')\
.all()
However, as it does not meet all the criteria for my original question (ie I still have to specify the joins), I will not mark this answer as the accepted one.
I have a situation where I am trying to count up the number of rows in a table when the column value is in a subquery. For example, lets say that I have some sql like so:
select count(*) from table1
where column1 in (select column2 from table2);
I have my tables defined like so:
class table1(Base):
__tablename__ = "table1"
__table_args__ = {'schema': 'myschema'}
acct_id = Column(DECIMAL(precision=15), primary_key=True)
class table2(Base):
__tablename__ = "table2"
__table_args__ = {'schema': 'myschema'}
ban = Column(String(length=128), primary_key=True)
The tables are reflected from the database so there are other attributes present that aren't explicitly specified in the class definition.
I can try to write my query but here is where I am getting stuck...
qry=self.session.query(func.?(...)) # what to put here?
res = qry.one()
I tried looking through the documentation here but I don't see any comparable implementation to the 'in' keyword which is a feature of many SQL dialects.
I am using Teradata as my backend if that matters.
sub_stmt = session.query(table2.some_id)
stmt = session.query(table1).filter(table1.id.in_(sub_stmt))
data = stmt.all()
I have a, somewhat odd, query that gets me all the items in a parent table that have no matches in its corresponding child table.
If possible, id like to turn it into an SQLAlchemy query. But I have no idea how. I can do basic gets and filters, but this one is beyond my experience so far. Any help you folks might give would be greatly appreciated.
class customerTranslations(Base):
"""parent table. holds customer names"""
__tablename__ = 'customer_translation'
id = Column(Integer, primary_key=True)
class customerEmails(Base):
"""child table. hold emails for customers in translation table"""
__tablename__ = 'customer_emails'
id = Column(Integer, primary_key=True)
parent_id = Column(Integer, ForeignKey('customer_translation.id'))
I want to build:
SELECT * FROM customer_translation
WHERE id NOT IN (SELECT parent_id FROM customer_emails)
You have a subquery, so create one first:
all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
and then you can use that to filter your other table:
translations_with_no_email = session.query(customerTranslations).filter(
~customerTranslations.id.in_(all_emails_stmnt))
This produces the same SQL (but with all the column names expanded, rather than using *, the ORM then can create your objects):
>>> all_emails_stmnt = session.query(customerEmails.parent_id).subquery()
>>> print(all_emails_stmnt)
SELECT customer_emails.parent_id
FROM customer_emails
>>> translations_with_no_email = session.query(customerTranslations).filter(
... ~customerTranslations.id.in_(all_emails_stmnt))
>>> print(translations_with_no_email)
SELECT customer_translation.id AS customer_translation_id
FROM customer_translation
WHERE customer_translation.id NOT IN (SELECT customer_emails.parent_id
FROM customer_emails)
You could also use NOT EXISTS:
from sqlalchemy.sql import exists
has_no_email_stmnt = ~exists().where(customerTranslations.id == customerEmails.parent_id)
translations_with_no_email = session.query(customerTranslations).filter(has_no_email_stmnt)
or, if you have a a backreference on the customerTranslations class pointing to emails, named emails, use .any() on the relationship and invert:
session.query(customerTranslations).filter(
~customerTranslations.emails.any())
Back in 2010 NOT EXISTS was a little slower on MySQL but you may want to re-assess if that is still the case.
I have this SQL expression that I'm trying to write in SQL Alchemy
select * from candidates1 c
inner join uploaded_emails1 e
on c.id=e.candidate_id
group by e.thread_id
How would I go about doing that?
The execute method can be used to run raw SQL, like so:
from sqlalchemy import text
sql = text('select * from candidates1 c inner join uploaded_emails1 e on c.id=e.candidate_id group by e.thread_id')
result = db.engine.execute(sql)
... do stuff ...
If you have some models that you're working with, you could use the relationship field type to create a one-to-many relationship between the Candidate and the UploadedEmail, like so:
class Candidate(Base):
__tablename__ = 'candidates1'
id = Column(Integer, primary_key=True)
uploaded_emails = relationship("UploadedEmail", lazy='dynamic')
class UploadedEmail(Base):
__tablename__ = 'uploaded_emails1'
id = Column(Integer, primary_key=True)
candidate_id = Column(Integer, ForeignKey('candidate.id'))
thread_id = Column(Integer)
And in your code, you might use that like this (including the group_by)
candidate_id = 1
c = Candidate.query.filter_by(id=candidate_id).first()
thread_id_results = c.uploaded_emails.with_entities(UploadedEmail.thread_id).group_by(UploadedEmail.thread_id).all()
thread_ids = [row[0] for row in thread_id_results]
Note that you have to use the .with_entities clause to specify the columns you would like to select, and then the fact that you are specifying the thread_id column. If you don't do this, you'll get errors along the lines of "Expression #X of SELECT list is not in GROUP BY clause and contains nonaggregated column ... which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by".
Sorry I didn't provide enough information to answer the question. This ended up working:
x = db_session.query(Candidate1, Uploaded_Emails1).filter(Candidate1.id == Uploaded_Emails1.candidate_id).group_by(Uploaded_Emails1.thread_id).all()
I'm using Postgresql with SQLAlchemy but it seems sqlalchemy is having trouble adding rows when using subqueries.
In my example, I want to update a counter for a specific tag in a table.
In SqlAlchemy a test run class would look like the following:
class TestRun( base ):
__tablename__ = 'test_runs'
id = sqlalchemy.Column( 'id', sqlalchemy.Integer, sqlalchemy.Sequence('user_id_seq'), primary_key=True )
tag = sqlalchemy.Column( 'tag', sqlalchemy.String )
counter = sqlalchemy.Column( 'counter', sqlalchemy.Integer )
The insertion code should then look like the following:
tag = 'sampletag'
counterquery = session.query(sqlalchemy.func.coalesce(sqlalchemy.func.max(TestRun.counter),0) + 1).\
filter(TestRun.tag == tag).\
subquery()
testrun = TestRun()
testrun.tag = tag
testrun.counter = counterquery
session.add( testrun )
session.commit()
The problem with this, is it gives a very interesting error when running this code, it's trying to run the following SQL Query:
'INSERT INTO test_runs (id, tag, counter)
VALUES (%(id)s,
%(tag)s,
SELECT coalesce(max(test_runs.counter), %(param_1)s) + %(coalesce_1)s AS anon_1
FROM test_runs
WHERE test_runs.tag = %(tag_1)s)'
{'coalesce_1': 1, 'param_1': 0, 'tag_1': 'mytag', 'tag': 'mytag', 'id': 267L}
Which looks reasonable, except it's missing parenthesis around the SELECT call. When I run the SQL query manually it gives me the same exact error that sqlalchemy gives me until I type in the parenthesis manually which fixes everything up. Seems like an unlikely bug that sqlalchemy would forget to put parenthesis when it needs to, so my question is am I missing a function to use subqueries correctly when adding rows using sqlalchemy?
Instead of using subquery() call as_scalar() method:
Return the full SELECT statement represented by this Query, converted
to a scalar subquery.
Example:
Models with classing parent-child relationship:
class Parent(Base):
__tablename__ = 'parents'
id = Column(Integer, primary_key=True)
counter = Column(Integer, nullable=False, default=0)
class Child(Base):
__tablename__ = 'children'
id = Column(Integer, primary_key=True)
parent_id = Column(ForeignKey(Parent.id), nullable=False)
parent = relationship(Parent)
Code to update counter field:
parent.counter = session.query(func.count(Child.id))\
.filter_by(parent=parent).as_scalar()
Produced SQL (copied from the log):
UPDATE parents SET counter=(SELECT count(children.id) AS count_1
FROM children
WHERE ? = children.parent_id) WHERE parents.id = ?