How to execute "left outer join" in SqlAlchemy - python

I need to execute this query:
select field11, field12
from Table_1 t1
left outer join Table_2 t2 ON t2.tbl1_id = t1.tbl1_id
where t2.tbl2_id is null
I have these classes in Python:
class Table1(Base):
    ....
class Table2(Base):
    table_id = Column(
        Integer,
        ForeignKey('Table1.id', ondelete='CASCADE'),
    )
    ....
How do I write the query above using these classes?

q = (
    session.query(Table1.field1, Table1.field2)
    .outerjoin(Table2)  # use if you have a relationship defined
    # .outerjoin(Table2, Table1.id == Table2.table_id)  # use if you do not have a relationship defined
    .filter(Table2.tbl2_id.is_(None))
)
This should do it, assuming that field1 and field2 are columns of Table1 and that you define a relationship:
class Table2(Base):
    # ...
    table1 = relationship(Table1, backref="table2s")

You can also do that using SQLAlchemy Core only:
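from sqlalchemy import select
session.execute(
    select(Table1.field11, Table1.field12)
    .select_from(
        # mapped classes have no .outerjoin(); join their underlying Table objects
        Table1.__table__.outerjoin(Table2.__table__, Table1.tbl1_id == Table2.tbl1_id))
    .where(Table2.tbl2_id.is_(None))
)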
PS .outerjoin(table, condition) is equivalent to .join(table, condition, isouter=True).
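For reference, here is a self-contained sketch of the same anti-join that runs against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or 2.0 and 2.0-style select(); the table names table_1/table_2 and the sample rows are hypothetical, modelled on the columns in the question's SQL. It also compiles the join(..., isouter=True) form to check that it yields the same SQL as outerjoin().

```python
# Sketch only: hypothetical schema based on the question's SQL, SQLite in memory.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Table1(Base):
    __tablename__ = "table_1"
    tbl1_id = Column(Integer, primary_key=True)
    field11 = Column(String)
    field12 = Column(String)


class Table2(Base):
    __tablename__ = "table_2"
    tbl2_id = Column(Integer, primary_key=True)
    tbl1_id = Column(ForeignKey("table_1.tbl1_id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

on_clause = Table2.tbl1_id == Table1.tbl1_id

# LEFT OUTER JOIN ... WHERE t2.tbl2_id IS NULL -> Table1 rows with no Table2 match
stmt = (
    select(Table1.field11, Table1.field12)
    .outerjoin(Table2, on_clause)
    .where(Table2.tbl2_id.is_(None))
)

# The same join spelled with join(..., isouter=True)
stmt_isouter = (
    select(Table1.field11, Table1.field12)
    .join(Table2, on_clause, isouter=True)
    .where(Table2.tbl2_id.is_(None))
)

with Session(engine) as session:
    session.add_all([
        Table1(tbl1_id=1, field11="a", field12="b"),
        Table1(tbl1_id=2, field11="c", field12="d"),
        Table2(tbl2_id=10, tbl1_id=1),  # only tbl1_id=1 has a match
    ])
    session.commit()
    rows = session.execute(stmt).all()

print(stmt)
print(rows)
```

Only the Table1 row without a matching Table2 row (tbl1_id=2) should come back.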

Related

Auto-aliasing issues when selecting by relationship count with SQLAlchemy + Postgres

Given the following code:
from sqlalchemy import Column, ForeignKey, Integer, alias, create_engine, func, select
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
Base = declarative_base()
engine = create_engine(
    "postgresql+psycopg2://***:***@127.0.0.1:5432/***", future=True
)
Session = sessionmaker(engine)
class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bars = relationship("Bar", uselist=True, secondary="foo_bar")
    baz_id = Column(ForeignKey("baz.id"))
    baz = relationship("Baz", back_populates="foos", lazy="joined")
class Bar(Base):
    __tablename__ = "bar"
    id = Column(Integer, primary_key=True)
class Baz(Base):
    __tablename__ = "baz"
    id = Column(Integer, primary_key=True)
    foos = relationship(Foo, uselist=True)
class FooBar(Base):
    __tablename__ = "foo_bar"
    foo_id = Column(ForeignKey(Foo.id), primary_key=True)
    bar_id = Column(ForeignKey(Bar.id), primary_key=True)
Base.metadata.create_all(engine)
stmt = (
    select(Foo)
    .join(FooBar, FooBar.foo_id == Foo.id)
    .group_by(Foo.id)
    .having(func.count(FooBar.foo_id) == 2)
)
Session().execute(stmt)
I want to select all Foos with exactly two Bars.
But I'm running into the following error:
column "baz_1.id" must appear in the GROUP BY clause or be used in an aggregate function
The generated SQL is:
SELECT foo.id, foo.baz_id, baz_1.id AS id_1
FROM foo JOIN foo_bar ON foo_bar.foo_id = foo.id
LEFT OUTER JOIN baz AS baz_1 ON baz_1.id = foo.baz_id GROUP BY foo.id
HAVING count(foo_bar.foo_id) = :count_1
Now I get what Postgres wants me to do, but I'm not sure how to achieve it: I can't add baz_1.id to the GROUP BY clause, because SQLAlchemy generates that alias on the fly and I have no control over it.
Baz is included in the query because of the lazy='joined' option on the Foo.baz relationship: every SELECT of Foo eagerly LEFT OUTER JOINs baz. You can override that loader strategy per query with lazyload(), so the join is not emitted and the query works as desired.
from sqlalchemy.orm import lazyload
stmt = (
    select(Foo)
    .options(lazyload(Foo.baz))  # <- don't automatically join Baz
    .join(FooBar, FooBar.foo_id == Foo.id)
    .group_by(Foo.id)
    .having(func.count(FooBar.foo_id) == 2)
)
Generated SQL:
SELECT foo.id, foo.baz_id
FROM foo
JOIN foo_bar ON foo_bar.foo_id = foo.id
GROUP BY foo.id
HAVING count(foo_bar.foo_id) = %(count_1)s
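To see the effect of lazyload() without a PostgreSQL server, the sketch below runs both versions of the query against in-memory SQLite and records the emitted SQL with a before_cursor_execute event listener. SQLite tolerates the non-grouped baz_1.id column, so both queries run there; the point is only to compare the generated SQL. The models are a trimmed-down, hypothetical version of the question's (Bar and the secondary relationship are omitted, FooBar.bar_id is a plain integer), and SQLAlchemy 1.4 or 2.0 is assumed.

```python
# Sketch: compare emitted SQL with and without lazyload() on a lazy="joined"
# relationship. Trimmed, hypothetical models; SQLite in memory.
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event, func, select
from sqlalchemy.orm import Session, declarative_base, lazyload, relationship

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    baz_id = Column(ForeignKey("baz.id"))
    # Joined eager loading: every SELECT of Foo adds a LEFT OUTER JOIN to baz
    baz = relationship("Baz", back_populates="foos", lazy="joined")


class Baz(Base):
    __tablename__ = "baz"
    id = Column(Integer, primary_key=True)
    foos = relationship(Foo, back_populates="baz")


class FooBar(Base):
    __tablename__ = "foo_bar"
    foo_id = Column(ForeignKey("foo.id"), primary_key=True)
    bar_id = Column(Integer, primary_key=True)  # simplified: no Bar table


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

captured = []


@event.listens_for(engine, "before_cursor_execute")
def capture_sql(conn, cursor, statement, parameters, context, executemany):
    captured.append(statement)


def foos_with_two_bars(*options):
    # Same statement as in the question, with optional loader options
    stmt = (
        select(Foo)
        .join(FooBar, FooBar.foo_id == Foo.id)
        .group_by(Foo.id)
        .having(func.count(FooBar.foo_id) == 2)
    )
    if options:
        stmt = stmt.options(*options)
    return stmt


def first_select(statements):
    return next(s for s in statements if s.lstrip().upper().startswith("SELECT"))


with Session(engine) as session:
    session.add_all([Foo(id=1), Foo(id=2)])
    session.add_all([
        FooBar(foo_id=1, bar_id=1),
        FooBar(foo_id=1, bar_id=2),
        FooBar(foo_id=2, bar_id=1),
    ])
    session.commit()

    captured.clear()
    session.execute(foos_with_two_bars()).scalars().all()
    eager_sql = first_select(captured)

    captured.clear()
    result = session.execute(foos_with_two_bars(lazyload(Foo.baz))).scalars().all()
    lazy_sql = first_select(captured)
    ids = [foo.id for foo in result]

print(eager_sql)  # includes a LEFT OUTER JOIN to baz
print(lazy_sql)   # no join to baz
```

noload() or raiseload() would also suppress the join; lazyload() keeps Foo.baz loadable on first access.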

SqlAlchemy: Sub-Select with a Dual EXISTS (OR) and an additional boolean check

In SqlAlchemy I need to implement the following subquery, which runs fine in PostgreSQL. It is an OR condition consisting of 2 EXISTS, plus an additional AND block. That whole column results in a True/False boolean value.
SELECT
...
...
,
(SELECT
(
EXISTS (SELECT id from participating_ic_t pi1 where pi1.id = agreement_t_1.participating_ic_id
AND pi1.ic_nihsac = substr(ned_person_t_2.nihsac, 1, 3))
OR EXISTS (SELECT id from external_people_t ep1 where ep1.participating_ic_id = agreement_t_1.participating_ic_id
AND ep1.uniqueidentifier = ned_person_t_2.uniqueidentifier)
)
AND ned_person_t_2.current_flag = 'Y' and ned_person_t_2.inactive_date is null and ned_person_t_2.organizationalstat = 'EMPLOYEE'
) as ACTIVE_APPROVER1,
First of all, if I omit the additional AND block, the following OR-EXISTS by itself works OK:
subq1 = (db_session.query(ParticipatingIcT.id).filter((ParticipatingIcT.id == agreement.participating_ic_id),
    (ParticipatingIcT.ic_nihsac == func.substr(approver.nihsac, 1, 3)))
    .subquery())
subq2 = (db_session.query(ExternalPeopleT.id).filter((ExternalPeopleT.participating_ic_id == agreement.participating_ic_id),
    (ExternalPeopleT.uniqueidentifier == approver.uniqueidentifier))
    .subquery())
subqMain = (db_session.query(or_(exists(subq1), exists(subq2)))
    .subquery())
# ...
# Now we will select from subqMain.
agreements = (db_session.query(
..,
subqMain
But the problem starts when I introduce the final AND block. Conceptually, the final result should be the following:
subqMain = (db_session.query(and_(
    or_(exists(subq1), exists(subq2)),
    approver.current_flag == 'Y',
    approver.inactive_date == None,
    approver.organizationalstat == 'EMPLOYEE'))
    .subquery())
But this actually emits ".. FROM APPROVER_T" inside the subquery, whereas it should be linked to the FROM APPROVER_T of the main query at the very end. I need to avoid the extra ".. FROM [table]", which appears as soon as I add the and_(..). I don't understand why it does that: all subqueries are explicitly marked with subquery(). The alias approver is defined at the very top as approver = aliased(NedPersonT).
exists1 = db_session.query(
ParticipatingIcT.id
).filter(
(ParticipatingIcT.id == agreement.participating_ic_id),
(ParticipatingIcT.ic_nihsac == func.substr(approver.nihsac, 1, 3))
).exists() # use exists here instead of subquery
exists2 = db_session.query(
ExternalPeopleT.id
).filter(
(ExternalPeopleT.participating_ic_id == agreement.participating_ic_id),
(ExternalPeopleT.uniqueidentifier == approver.uniqueidentifier)
).exists() # use exists again instead of subquery
subqMain = db_session.query(and_(
or_(exists1, exists2),
approver.current_flag == 'Y',
approver.inactive_date == None,
approver.organizationalstat == 'EMPLOYEE')).subquery()
# ...
# Now we will select from subqMain.
agreements = (db_session.query(
OtherModel,
subqMain))
sqlalchemy.orm.Query.exists
Half Baked Example
I also use aliased here because I don't quite understand the order of operations when subqueries are wrapped in parentheses. It seems to work without it, but the raw SQL is confusing to read. This is a toy example I made to check whether the generated SQL was valid.
from sqlalchemy import Column, ForeignKey, Integer, String, or_
from sqlalchemy.orm import Session, aliased, relationship
# Base (the declarative base) and engine are assumed to be defined elsewhere
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)
    hobbies = relationship('Hobby', backref='user')
class Hobby(Base):
    __tablename__ = 'hobbies'
    id = Column(Integer, primary_key=True, autoincrement=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    name = Column(String, nullable=False)
Base.metadata.create_all(engine, checkfirst=True)
with Session(engine) as session:
    user1 = User(name='user1')
    session.add(user1)
    session.add_all([
        Hobby(name='biking', user=user1),
        Hobby(name='running', user=user1),
        Hobby(name='eating', user=user1),
    ])
    user2 = User(name='user2')
    session.add(user2)
    session.add(Hobby(name='biking', user=user2))
    session.commit()
    nested_user = aliased(User)
    subq1 = session.query(Hobby.id).filter(nested_user.id == Hobby.user_id, Hobby.name.like('bik%'))
    subq2 = session.query(Hobby.id).filter(nested_user.id == Hobby.user_id, Hobby.name.like('eat%'))
    subqmain = session.query(nested_user).filter(or_(subq1.exists(), subq2.exists()), nested_user.id > 0).subquery()
    q = session.query(User).select_from(User).join(subqmain, User.id == subqmain.c.id)
    print(q)
    print([user.name for user in q.all()])
SELECT users.id AS users_id, users.name AS users_name
FROM users JOIN (SELECT users_1.id AS id, users_1.name AS name
FROM users AS users_1
WHERE ((EXISTS (SELECT 1
FROM hobbies
WHERE users_1.id = hobbies.user_id AND hobbies.name LIKE %(name_1)s)) OR (EXISTS (SELECT 1
FROM hobbies
WHERE users_1.id = hobbies.user_id AND hobbies.name LIKE %(name_2)s))) AND users_1.id > %(id_1)s) AS anon_1 ON users.id = anon_1.id
['user1', 'user2']
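With 2.0-style select(), the aliased() copy and the extra subquery join are not strictly needed: an EXISTS placed in the WHERE clause that refers to User is auto-correlated to the users table of the enclosing SELECT. A minimal sketch, assuming SQLAlchemy 1.4+ (for Select.exists()) and an in-memory SQLite database; the extra user3 row without hobbies is hypothetical test data.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, or_, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)
    hobbies = relationship("Hobby", backref="user")


class Hobby(Base):
    __tablename__ = "hobbies"
    id = Column(Integer, primary_key=True, autoincrement=True)
    user_id = Column(Integer, ForeignKey("users.id"), nullable=False)
    name = Column(String, nullable=False)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    user1, user2, user3 = User(name="user1"), User(name="user2"), User(name="user3")
    session.add_all([user1, user2, user3])  # user3 has no hobbies
    session.add_all([
        Hobby(name="biking", user=user1),
        Hobby(name="running", user=user1),
        Hobby(name="eating", user=user1),
        Hobby(name="biking", user=user2),
    ])
    session.commit()

    # Both EXISTS subqueries mention User.id; because users is in the FROM of
    # the enclosing SELECT, SQLAlchemy correlates them instead of adding a
    # second "FROM users" inside each subquery.
    bikes = select(Hobby.id).where(Hobby.user_id == User.id, Hobby.name.like("bik%")).exists()
    eats = select(Hobby.id).where(Hobby.user_id == User.id, Hobby.name.like("eat%")).exists()
    stmt = select(User.name).where(or_(bikes, eats), User.id > 0).order_by(User.name)

    names = session.execute(stmt).scalars().all()

print(stmt)
print(names)  # ['user1', 'user2']
```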
Something wasn't working with that sub-select, so I rewrote it as a CASE expression instead, and now it works. I also had to join a couple of extra tables in the main query to support the new CASE.
casePrimaryApprover = case(
    # (condition, value) tuples are passed positionally (SQLAlchemy 1.4+ / 2.0 style)
    (
        and_(
            agreement.approving_official_id.is_not(None),
            approver.current_flag == 'Y',
            approver.inactive_date.is_(None),
            approver.organizationalstat == 'EMPLOYEE',
            or_(
                participatingIc.ic_nihsac == func.substr(approver.nihsac, 1, 3),
                externalPeoplePrimaryApprover.id.is_not(None),
            ),
        ),
        'Y',
    ),
    else_='N',
)
# Main Query
agreements = (db_session.query(
agreement.id,
...
casePrimaryApprover
...
.join(participatingIc, agreement.participating_ic_id == participatingIc.id)
.outerjoin(externalPeoplePrimaryApprover, approver.uniqueidentifier == externalPeoplePrimaryApprover.uniqueidentifier)
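The CASE pattern can be tried in isolation. In this sketch, Person is a hypothetical stand-in for the asker's NedPersonT, only the AND conditions are kept (the OR branch needs the extra joined tables), and SQLAlchemy 1.4+ with in-memory SQLite is assumed. case() receives (condition, value) tuples positionally, which is the form both 1.4 and 2.0 accept.

```python
from sqlalchemy import Column, Date, Integer, String, and_, case, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Person(Base):  # hypothetical stand-in for NedPersonT
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    current_flag = Column(String(1))
    inactive_date = Column(Date)
    organizationalstat = Column(String)


# CASE WHEN <all conditions hold> THEN 'Y' ELSE 'N' END
is_active = case(
    (
        and_(
            Person.current_flag == "Y",
            Person.inactive_date.is_(None),
            Person.organizationalstat == "EMPLOYEE",
        ),
        "Y",
    ),
    else_="N",
).label("active_approver")

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Person(id=1, current_flag="Y", organizationalstat="EMPLOYEE"),
        Person(id=2, current_flag="N", organizationalstat="EMPLOYEE"),
        Person(id=3, current_flag="Y", organizationalstat="CONTRACTOR"),
    ])
    session.commit()
    rows = session.execute(select(Person.id, is_active).order_by(Person.id)).all()

print(rows)
```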

SQLAlchemy: Selecting all records in one table that are not in another, related table

I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
    __tablename__ = "project_data"
    id = db.Column(db.Integer, primary_key=True)
class Label(db.Model):
    __tablename__ = "labels"
    id = db.Column(db.Integer, primary_key=True)
    data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join, or a right outer join, which is not a feature SQLAlchemy offers.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
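Both anti-join forms can be checked side by side. This sketch uses plain SQLAlchemy (1.4+) with declarative_base() instead of Flask-SQLAlchemy's db.Model, an in-memory SQLite database, and made-up rows: project_data row 2 is the only one without a label, and row 1 has two labels.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, exists, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ProjectData(Base):
    __tablename__ = "project_data"
    id = Column(Integer, primary_key=True)


class Label(Base):
    __tablename__ = "labels"
    id = Column(Integer, primary_key=True)
    data_id = Column(Integer, ForeignKey("project_data.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Anti-join 1: WHERE NOT EXISTS (correlated subquery on labels)
not_exists_stmt = (
    select(ProjectData.id)
    .where(~exists().where(Label.data_id == ProjectData.id))
    .order_by(ProjectData.id)
)

# Anti-join 2: LEFT OUTER JOIN, keep only rows where the labels side is NULL
left_join_stmt = (
    select(ProjectData.id)
    .outerjoin(Label, ProjectData.id == Label.data_id)
    .where(Label.id.is_(None))
    .order_by(ProjectData.id)
)

with Session(engine) as session:
    session.add_all([ProjectData(id=1), ProjectData(id=2), ProjectData(id=3)])
    session.add_all([
        Label(id=10, data_id=1),
        Label(id=11, data_id=1),  # row 1 has two labels
        Label(id=12, data_id=3),
    ])
    session.commit()
    ids_not_exists = session.execute(not_exists_stmt).scalars().all()
    ids_left_join = session.execute(left_join_stmt).scalars().all()

print(ids_not_exists, ids_left_join)  # [2] [2]
```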
If you already know the exact SQL statement you want to run, you can use SQLAlchemy's text() construct to execute a complex query directly:
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text
t = text("SELECT * "
         "FROM users "
         "WHERE user_id = :user_id"
         ).bindparams(user_id=user_id)
# a text() construct is executed, not passed to session.query()
results = db.session.execute(t).fetchall()
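A runnable version of the text() approach, assuming an in-memory SQLite database and a hypothetical users table created just for the example. Passing the value through a bound parameter (:user_id) keeps it out of the SQL string itself.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

# Hypothetical table and rows, created only for this example
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(
        text("INSERT INTO users (user_id, name) VALUES (:user_id, :name)"),
        [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}],
    )

# :user_id is a bound parameter; the value is sent separately from the SQL
stmt = text("SELECT * FROM users WHERE user_id = :user_id").bindparams(user_id=2)

with engine.connect() as conn:
    rows = conn.execute(stmt).all()

print(rows)
```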

Sqlalchemy: subquery in FROM must have an alias

How can I structure this sqlalchemy query so that it does the right thing?
I've given everything I can think of an alias, but I'm still getting:
ProgrammingError: (psycopg2.ProgrammingError) subquery in FROM must have an alias
LINE 4: FROM (SELECT foo.id AS foo_id, foo.version AS ...
Also, as IMSoP pointed out, it seems to be trying to turn it into a cross join, but I just want it to join a table with a group by subquery on that same table.
Here is the sqlalchemy:
(Note: I've rewritten it to be a standalone file that is as complete as possible and can be run from a python shell)
from sqlalchemy import create_engine, func, select
from sqlalchemy import Column, BigInteger, DateTime, Integer, String, SmallInteger
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('postgresql://postgres:########localhost:5435/foo1234')
session = sessionmaker()
session.configure(bind=engine)
session = session()
Base = declarative_base()
class Foo(Base):
    __tablename__ = 'foo'
    __table_args__ = {'schema': 'public'}
    id = Column('id', BigInteger, primary_key=True)
    time = Column('time', DateTime(timezone=True))
    version = Column('version', String)
    revision = Column('revision', SmallInteger)
foo_max_time_q = select([
func.max(Foo.time).label('foo_max_time'),
Foo.id.label('foo_id')
]).group_by(Foo.id
).alias('foo_max_time_q')
foo_q = select([
Foo.id.label('foo_id'),
Foo.version.label('foo_version'),
Foo.revision.label('foo_revision'),
foo_max_time_q.c.foo_max_time.label('foo_max_time')
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
).alias('foo_q')
thing = session.query(foo_q).all()
print(thing)
generated sql:
SELECT foo_id AS foo_id,
foo_version AS foo_version,
foo_revision AS foo_revision,
foo_max_time AS foo_max_time,
foo_max_time_q.foo_max_time AS foo_max_time_q_foo_max_time,
foo_max_time_q.foo_id AS foo_max_time_q_foo_id
FROM (SELECT id AS foo_id,
version AS foo_version,
revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM (SELECT max(time) AS foo_max_time,
id AS foo_id GROUP BY id
) AS foo_max_time_q)
JOIN (SELECT max(time) AS foo_max_time,
id AS foo_id GROUP BY id
) AS foo_max_time_q
ON foo_max_time_q.foo_id = id
and here is the toy table:
CREATE TABLE foo (
id bigint ,
time timestamp with time zone,
version character varying(32),
revision smallint
);
The SQL I was expecting to get (desired SQL) would be something like this:
SELECT foo.id AS foo_id,
foo.version AS foo_version,
foo.revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM foo
JOIN (SELECT max(time) AS foo_max_time,
id AS foo_id FROM foo GROUP BY id
) AS foo_max_time_q
ON foo_max_time_q.foo_id = foo.id
Final note:
I'm hoping to get an answer using select() instead of session.query() if possible. Thank you
You are almost there. Make a "selectable" subquery and join it with the main query via join():
foo_max_time_q = select([func.max(Foo.time).label('foo_max_time'),
Foo.id.label('foo_id')
]).group_by(Foo.id
).alias("foo_max_time_q")
foo_q = session.query(
Foo.id.label('foo_id'),
Foo.version.label('foo_version'),
Foo.revision.label('foo_revision'),
foo_max_time_q.c.foo_max_time.label('foo_max_time')
).join(foo_max_time_q,
foo_max_time_q.c.foo_id == Foo.id)
print(foo_q.__str__())
Prints (prettified manually):
SELECT
foo.id AS foo_id,
foo.version AS foo_version,
foo.revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM
foo
JOIN
(SELECT
max(foo.time) AS foo_max_time,
foo.id AS foo_id
FROM
foo
GROUP BY foo.id) AS foo_max_time_q
ON
foo_max_time_q.foo_id = foo.id
The complete working code is available in this gist.
Cause
subquery in FROM must have an alias
This error means the subquery (on which we're trying to perform a join) has no alias.
Even if we .alias('t') it just to satisfy this requirement, we will then get the next error:
missing FROM-clause entry for table "foo"
That's because the join's ON clause (... == Foo.id) refers to Foo, which that join does not know about.
It only knows the "left" and "right" tables: t (the subquery) and foo_max_time_q.
Solution
Instead, select_from a join of Foo and foo_max_time_q.
Method 1
Replace .join(B, on_clause) with .select_from(B.join(A, on_clause)). Before:
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
After:
]).select_from(foo_max_time_q.join(Foo, foo_max_time_q.c.foo_id == Foo.id)
This works here because A INNER JOIN B is equivalent to B INNER JOIN A.
Method 2
To preserve the order of joined tables:
from sqlalchemy import join
and replace .join(B, on_clause) with .select_from(join(A, B, on_clause)). Before:
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
After:
]).select_from(join(Foo, foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id)
Alternatives to session.query() can be found here.
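Since the question asked for select() rather than session.query(), here is a hedged 2.0-style sketch (SQLAlchemy 1.4+ assumed). Select.join_from() states both sides of the join explicitly, which gives the FROM foo JOIN (...) AS foo_max_time_q shape of the desired SQL, and .subquery(name) supplies the alias PostgreSQL requires. The schema='public' table argument is dropped because the example runs on in-memory SQLite, and the rows are made up.

```python
import datetime

from sqlalchemy import BigInteger, Column, DateTime, SmallInteger, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Foo(Base):
    __tablename__ = "foo"  # schema='public' omitted for SQLite
    id = Column(BigInteger, primary_key=True)
    time = Column(DateTime(timezone=True))
    version = Column(String)
    revision = Column(SmallInteger)


# Grouped subquery; .subquery(name) renders it as "(SELECT ...) AS foo_max_time_q"
foo_max_time_q = (
    select(func.max(Foo.time).label("foo_max_time"), Foo.id.label("foo_id"))
    .group_by(Foo.id)
    .subquery("foo_max_time_q")
)

# join_from(left, right, onclause): FROM foo JOIN (...) AS foo_max_time_q ON ...
foo_q = select(
    Foo.id.label("foo_id"),
    Foo.version.label("foo_version"),
    Foo.revision.label("foo_revision"),
    foo_max_time_q.c.foo_max_time,
).join_from(Foo, foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
t0 = datetime.datetime(2020, 1, 1)

with Session(engine) as session:
    session.add_all([
        Foo(id=1, time=t0, version="v1", revision=1),
        Foo(id=2, time=t0, version="v2", revision=3),
    ])
    session.commit()
    rows = session.execute(foo_q.order_by(Foo.id)).all()

print(foo_q)
print(rows)
```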

SQLAlchemy: filtering multiple column values in a subquery

In SQL, I can use the IN operator with a subquery like so:
SELECT * FROM t1
WHERE (t1.some_int, t1.some_string) IN (
SELECT id, name FROM t2
)
But I am unable to translate this to an SQLAlchemy query. As far as I know, the in_ method only works on one column. Is there any way to replicate this functionality in SQLAlchemy?
You could use a JOIN instead of a subquery, something like this:
SELECT * FROM t1 INNER JOIN t2 ON t1.some_int = t2.id AND t1.some_string = t2.name
And in sqlalchemy:
T1:
class T1(DeclarativeBase):
    __tablename__ = 't1'
    __table_args__ = {'mysql_engine': 'InnoDB'}
    id = Column(u'id', Integer, primary_key=True)
    some_int = Column('some_int', Integer)
    some_str = Column('some_str', String(45))
    def __init__(self, some_int, some_str):
        self.some_int = some_int
        self.some_str = some_str
T2:
class T2(DeclarativeBase):
    __tablename__ = 't2'
    __table_args__ = {'mysql_engine': 'InnoDB'}
    id = Column(u'id', Integer, primary_key=True)
    name = Column('name', String(45))
    data = Column('data', String(45))
    def __init__(self, name, data):
        self.name = name
        self.data = data
In source code:
from sqlalchemy import and_
data = session.query(T1).join(T2, and_(T1.some_int == T2.id, T1.some_str == T2.name)).all()
In result engine generates sql:
SELECT t1.some_int AS t1_some_int, t1.id AS t1_id FROM t1 INNER JOIN t2 ON t1.some_int = t2.id AND t1.some_str = t2.name
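For the original multi-column IN question, SQLAlchemy also provides tuple_(), which renders a row-value comparison such as (t1.some_int, t1.some_str) IN (SELECT t2.id, t2.name FROM t2). A minimal sketch, assuming SQLAlchemy 1.4+ and in-memory SQLite; not every database supports row-value IN, so check your backend. Unlike the JOIN above, IN never returns a T1 row more than once, even if T2 contained duplicate (id, name) pairs.

```python
from sqlalchemy import Column, Integer, String, create_engine, select, tuple_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class T1(Base):
    __tablename__ = "t1"
    id = Column(Integer, primary_key=True)
    some_int = Column(Integer)
    some_str = Column(String(45))


class T2(Base):
    __tablename__ = "t2"
    id = Column(Integer, primary_key=True)
    name = Column(String(45))
    data = Column(String(45))


engine = create_engine("sqlite://")  # row-value IN needs SQLite >= 3.15
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        T1(id=1, some_int=1, some_str="a"),  # matches T2(id=1, name="a")
        T1(id=2, some_int=2, some_str="x"),  # no T2 row with (2, "x")
        T2(id=1, name="a", data="d1"),
        T2(id=2, name="b", data="d2"),
    ])
    session.commit()

    # WHERE (t1.some_int, t1.some_str) IN (SELECT t2.id, t2.name FROM t2)
    stmt = select(T1).where(tuple_(T1.some_int, T1.some_str).in_(select(T2.id, T2.name)))
    matched = [t.id for t in session.execute(stmt).scalars()]

print(stmt)
print(matched)
```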
