I have three entities, A, B, C, where C links As to Bs (A-*C-B). I want to find those instances of A for which there is no instance of C that is not connected to a B.
I haven't been able to come up with a SQLAlchemy query that will do this for me and I'm beginning to think there's a problem with the compiler.
The following unit test illustrates this:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import Column, Integer, create_engine, literal
_sql_engine = create_engine('sqlite:///:memory:')
session = sessionmaker(bind=_sql_engine)()
def test_model():
Base = declarative_base()
class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)
class B(Base):
__tablename__ = 'b'
id = Column(Integer, primary_key=True)
class C(Base):
__tablename__ = 'c'
a = Column(Integer, primary_key=True)
b = Column(Integer, primary_key=True)
Base.metadata.create_all(_sql_engine)
a = [A(id=10), A(id=20)]
c = [C(a=10, b=11), C(a=20, b=21)]
b = [B(id=11)]
session.add_all(a + c + b)
session.commit()
q = session.query(A).filter(
A.id < literal(100),
~(
session.query(C)
.filter(
A.id == C.a,
~(
session.query(B)
.filter(
B.id == C.b,
).exists()
)
).exists()
)
)
print(q.statement)
print(len(q.all()))
assert len(q.all()) == 1
The test expects one result, but it gets zero. The SQL statement that is printed is
SELECT a.id
FROM a
WHERE a.id < :param_1 AND NOT (EXISTS (SELECT 1
FROM c
WHERE NOT (EXISTS (SELECT 1
FROM b, a
WHERE b.id = c.b AND a.id = c.a))))
Now, it looks to me like the problem is with the third FROM statement. b and a override the aliases above and disconnected from the previous constraints.
Is this correct? Is this how scoping in SQL works? If so, am I making a mistake with SQLAlchemy or is this a bug?
(The unit test uses SQLite, but the end result should run in PostgreSQL.)
Thanks to zzzeek on github, I found the answer: aliases are shadowed in subqueries after the second nesting. If we don't want this, we use correlated subqueries.
Fixed unit test:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, aliased
from sqlalchemy import Column, Integer, create_engine, literal
_sql_engine = create_engine('sqlite:///:memory:')
session = sessionmaker(bind=_sql_engine)()
def test_model():
Base = declarative_base()
class A(Base):
__tablename__ = 'a'
id = Column(Integer, primary_key=True)
class B(Base):
__tablename__ = 'b'
id = Column(Integer, primary_key=True)
class C(Base):
__tablename__ = 'c'
a = Column(Integer, primary_key=True)
b = Column(Integer, primary_key=True)
Base.metadata.create_all(_sql_engine)
a = [A(id=10), A(id=20)]
c = [C(a=10, b=11), C(a=20, b=21)]
b = [B(id=11)]
session.add_all(a + c + b)
session.commit()
q = session.query(A).filter(
A.id < literal(100),
~(
session.query(C)
.filter(
A.id == C.a,
~(
session.query(B)
.filter(
B.id == C.b
).exists().correlate(A,C)
)
).exists()
)
)
print(q.statement)
print(len(q.all()))
assert len(q.all()) == 1
Related
I have the following table structure (I have simplified it as much as possible, narrowed down the child/inheriting tables [there are additional] and removed all irrelevant columns from the provided tables):
## Base is my declarative_base
class AbstractQuestion(Base):
questionTypeId: Column = Column(
Integer, ForeignKey("luQuestionTypes.id"), index=True, nullable=False
)
__mapper_args__ = {
"polymorphic_identity": 0,
"polymorphic_on": questionTypeId,
}
class MultiChoiceQuestion(AbstractQuestion):
id: Column = Column(Integer, ForeignKey(AbstractQuestion.id), primary_key=True)
__mapper_args__ = {"polymorphic_identity": 1}
class AbstractSurveyQuestion(AbstractQuestion):
id: Column = Column(Integer, ForeignKey(AbstractQuestion.id), primary_key=True)
surveyQuestionTypeId: Column = Column(
Integer, ForeignKey("luSurveyQuestionTypes.id"), index=True, nullable=False
)
__mapper_args__ = {"polymorphic_identity": 2}
class RatingQuestion(AbstractSurveyQuestion):
id: Column = Column(
Integer, ForeignKey(AbstractSurveyQuestion.id), primary_key=True
)
The challenge I'm facing is, that I'm trying to make AbstractSurveyQuestion have two types of polymorphic mappings - one as a child of AbstractQuestion with a polymorphic_identity that matches the questionTypeId, but I also need it to have a separate polymorphic_on mapper for its own child table, which is RatingQuestion.
The closest thing I could find was this question, but it doesn't seem to be aiming at exactly what I'm looking for.
I also looked at the official docs about inheritance, but again couldn't find an accurate example to what I'm trying to achieve.
Can anyone please help me with this?
Thanks!
I posted the same question on SQLAlchemy's GitHub repo. Got this answer from the maintainer:
https://github.com/sqlalchemy/sqlalchemy/discussions/8089#discussioncomment-2878725
I'll paste the contents below as well:
it sounds like you are looking for mult-level polymorphic_on. We don't support that right now without workarounds, and that's #2555 which is a feature we're unlikely to implement, or if we did it would be a long time from now.
It looks like you are using joined inheritance....so...two ways. The more SQL efficient one is to have an extra "supplemetary" column on your base table that can discriminate for AbstractSurveyQuestion...because if you query for all the AbstractQuestion objects, by default it's just going to query that one table, and needs to know from each row if that row is in fact a RatingQuestion.
the more convoluted way is to use mapper-configured with_polymorphic so that all queries for AbstractQuestion include all the tables (or a subset of tables, can be configured, but at minimum you'd need to join out to AbstractSurveyQuestion) using a LEFT OUTER JOIN (or if you really wanted to go crazy it can be a UNION ALL).
the workarounds are a little ugly since it's not very easy to get a "composed" value out of two columns in SQL, but they are contained to the base classes. Below examples work on SQLite and might need tweaking for other databases.
Here's the discriminator on base table demo, a query here looks like:
SELECT aq.id AS aq_id, aq.d1 AS aq_d1, aq.d2 AS aq_d2, CAST(aq.d1 AS VARCHAR) || ? || CAST(coalesce(aq.d2, ?) AS VARCHAR) AS _sa_polymorphic_on
FROM aq
from typing import Tuple, Optional
from sqlalchemy import cast
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import event
from sqlalchemy import ForeignKey
from sqlalchemy import inspect
from sqlalchemy import Integer, func
from sqlalchemy import String
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import Session
Base = declarative_base()
class ident_(str):
"""describe a composed identity.
Using a string for easy conversion to a string SQL composition.
"""
_tup: Tuple[int, Optional[int]]
def __new__(cls, d1, d2=None):
self = super().__new__(cls, f"{d1}, {d2 or ''}")
self._tup = d1, d2
return self
def _as_tuple(self):
return self._tup
class AbstractQuestion(Base):
__tablename__ = "aq"
id = Column(Integer, primary_key=True)
d1 = Column(
Integer, nullable=False
) # this can be your FK to the other table etc.
d2 = Column(
Integer, nullable=True
) # this is a "supplementary" discrim column
__mapper_args__ = {
"polymorphic_identity": ident_(0),
"polymorphic_on": cast(d1, String)
+ ", "
+ cast(func.coalesce(d2, ""), String),
}
#event.listens_for(AbstractQuestion, "init", propagate=True)
def _setup_poly(target, args, kw):
"""receive new AbstractQuestion objects when they are constructed and
set polymorphic identity"""
# this is the ident_() object
ident = inspect(target).mapper.polymorphic_identity
d1, d2 = ident._as_tuple()
kw["d1"] = d1
if d2:
kw["d2"] = d2
class MultiChoiceQuestion(AbstractQuestion):
__tablename__ = "mcq"
id: Column = Column(
Integer, ForeignKey(AbstractQuestion.id), primary_key=True
)
__mapper_args__ = {"polymorphic_identity": ident_(1)}
class AbstractSurveyQuestion(AbstractQuestion):
__tablename__ = "acq"
id: Column = Column(
Integer, ForeignKey(AbstractQuestion.id), primary_key=True
)
__mapper_args__ = {"polymorphic_identity": ident_(2)}
class RatingQuestion(AbstractSurveyQuestion):
__tablename__ = "rq"
id: Column = Column(
Integer, ForeignKey(AbstractSurveyQuestion.id), primary_key=True
)
__mapper_args__ = {"polymorphic_identity": ident_(2, 1)}
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
s = Session(e)
s.add(MultiChoiceQuestion())
s.add(RatingQuestion())
s.commit()
s.close()
for q in s.query(AbstractQuestion):
print(q)
then there's the one that maintains your schema fully, a query here looks like:
SELECT aq.id AS aq_id, aq.d1 AS aq_d1, CAST(aq.d1 AS VARCHAR) || ? || CAST(coalesce(acq.d2, ?) AS VARCHAR) AS _sa_polymorphic_on, acq.id AS acq_id, acq.d2 AS acq_d2
FROM aq LEFT OUTER JOIN acq ON aq.id = acq.id
from typing import Tuple, Optional
from sqlalchemy import cast
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import event
from sqlalchemy import ForeignKey
from sqlalchemy import func
from sqlalchemy import inspect
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import Session
Base = declarative_base()
class ident_(str):
"""describe a composed identity.
Using a string for easy conversion to a string SQL composition.
"""
_tup: Tuple[int, Optional[int]]
def __new__(cls, d1, d2=None):
self = super().__new__(cls, f"{d1}, {d2 or ''}")
self._tup = d1, d2
return self
def _as_tuple(self):
return self._tup
class AbstractQuestion(Base):
__tablename__ = "aq"
id = Column(Integer, primary_key=True)
d1 = Column(
Integer, nullable=False
) # this can be your FK to the other table etc.
__mapper_args__ = {
"polymorphic_identity": ident_(0),
}
#event.listens_for(AbstractQuestion, "init", propagate=True)
def _setup_poly(target, args, kw):
"""receive new AbstractQuestion objects when they are constructed and
set polymorphic identity"""
# this is the ident_() object
ident = inspect(target).mapper.polymorphic_identity
d1, d2 = ident._as_tuple()
kw["d1"] = d1
if d2:
kw["d2"] = d2
class MultiChoiceQuestion(AbstractQuestion):
__tablename__ = "mcq"
id: Column = Column(
Integer, ForeignKey(AbstractQuestion.id), primary_key=True
)
__mapper_args__ = {"polymorphic_identity": ident_(1)}
class AbstractSurveyQuestion(AbstractQuestion):
__tablename__ = "acq"
id: Column = Column(
Integer, ForeignKey(AbstractQuestion.id), primary_key=True
)
d2 = Column(Integer, nullable=False)
__mapper_args__ = {
"polymorphic_identity": ident_(2),
"polymorphic_load": "inline", # adds ASQ to all AQ queries
}
# after ASQ is set up, set the discriminator on the base class
# that includes ASQ column
inspect(AbstractQuestion)._set_polymorphic_on(
cast(AbstractQuestion.d1, String)
+ ", "
+ cast(func.coalesce(AbstractSurveyQuestion.d2, ""), String)
)
class RatingQuestion(AbstractSurveyQuestion):
__tablename__ = "rq"
id: Column = Column(
Integer, ForeignKey(AbstractSurveyQuestion.id), primary_key=True
)
__mapper_args__ = {"polymorphic_identity": ident_(2, 1)}
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
s = Session(e)
s.add(MultiChoiceQuestion())
s.add(RatingQuestion())
s.commit()
s.close()
for q in s.query(AbstractQuestion):
print(q)
Given the following code:
from sqlalchemy import Column, ForeignKey, Integer, alias, create_engine, func, select
from sqlalchemy.orm import declarative_base, relationship, sessionmaker
Base = declarative_base()
engine = create_engine(
"postgresql+psycopg2://***:***#127.0.0.1:5432/***", future=True
)
Session = sessionmaker(engine)
class Foo(Base):
__tablename__ = "foo"
id = Column(Integer, primary_key=True)
bars = relationship("Bar", uselist=True, secondary="foo_bar")
baz_id = Column(ForeignKey("baz.id"))
baz = relationship("Baz", back_populates="foos", lazy="joined")
class Bar(Base):
__tablename__ = "bar"
id = Column(Integer, primary_key=True)
class Baz(Base):
__tablename__ = "baz"
id = Column(Integer, primary_key=True)
foos = relationship(Foo, uselist=True)
class FooBar(Base):
__tablename__ = "foo_bar"
foo_id = Column(ForeignKey(Foo.id), primary_key=True)
bar_id = Column(ForeignKey(Bar.id), primary_key=True)
Base.metadata.create_all(engine)
stmt = (
select(Foo)
.join(FooBar, FooBar.foo_id == Foo.id)
.group_by(Foo.id)
.having(func.count(FooBar.foo_id) == 2)
)
Session().execute(stmt)
I want to select all Foos with exactly two Bars.
But I'm running into the following error:
column "baz_1.id" must appear in the GROUP BY clause or be used in an aggregate function
The generated SQL is:
SELECT foo.id, foo.baz_id, baz_1.id AS id_1
FROM foo JOIN foo_bar ON foo_bar.foo_id = foo.id
LEFT OUTER JOIN baz AS baz_1 ON baz_1.id = foo.baz_id GROUP BY foo.id
HAVING count(foo_bar.foo_id) = :count_1
Now I get what Postgres wants me to do, but I'm not sure how to achieve this, since I can't add baz_1.id to the GROUP PY clause because it's something that SQLAlchemy generates on the fly and I don't have any control over it.
Baz is being included in the query because of the lazy='joined' option on the relationship in Foo. We can override that option in the query, so that the join is not executed and the query works as desired.
stmt = (
select(Foo)
.options(orm.lazyload(Foo.baz)) # <- don't automatically join Baz.
.join(FooBar, FooBar.foo_id == Foo.id)
.group_by(Foo.id)
.having(func.count(FooBar.foo_id) == 2)
)
Generated SQL:
SELECT foo.id, foo.baz_id
FROM foo
JOIN foo_bar ON foo_bar.foo_id = foo.id
GROUP BY foo.id
HAVING count(foo_bar.foo_id) = %(count_1)s
I tried to use sqlaclhemy joined table inheritance and had a strange occurrence.
class CommonObject(Base):
__tablename__ = "objects"
id = Column("objid", Integer, primary_key=True)
objname = Column(String(32))
...
class GoodsPlacement(Container, Loadable, Dumpable):
__tablename__ = "goods_placements"
id = Column("objid", Integer, ForeignKey("containers.objid"), primary_key=True)
...
class Departure(CommonObject):
__tablename__ = "departures"
id = Column(Integer, ForeignKey("objects.objid"), primary_key=True)
content_id = Column(Integer, ForeignKey("goods_placements.objid"))
content = relationship("GoodsPlacement",
primaryjoin="Departure.content_id==GoodsPlacement.id",
foreign_keys=[content_id],
lazy='joined',
backref="departures")
...
When I write query:
session.query(GoodsPlacement).filter(~GoodsPlacement.departures.any(Departure.status_id < 2))
it generates me something like this:
SELECT
objects.objid AS objects_objid,
goods_placements.objid AS goods_placements_objid,
objects.objname AS objects_objname
FROM objects
JOIN goods_placements ON objects.objid = goods_placements.objid
WHERE NOT (EXISTS (
SELECT 1
FROM (
SELECT
objects.objid AS objects_objid,
objects.objname AS objects_objname,
departures.id AS departures_id,
departures.content_id AS departures_content_id,
departures.status_id AS departures_status_id
FROM objects
JOIN departures ON objects.objid = departures.id)
AS anon_1, objects
WHERE anon_1.departures_content_id = objects.objid
AND anon_1.departures_status_id < :status_id_1)
)
And this doesn't work because objects in exist clause overrides outer objects.
As workaround I used exists from sqlexpression directly,
session.query(GoodsPlacement).filter(~exists([1],
and_("departures.status_id<2",
"departures.content_id=goods_placements.objid"),
from_obj="departures"))
but it strongly depends from column and table names.
How I can specify alias for object table in exists statement?
Debian wheezy, python-2.7.3rc2, sqlaclhemy 0.7.7-1
there's a bug involving the declarative system in how it sets up columns. The "objid" name you're giving the columns, distinct from the "id" attribute name, is the source of the issue here. The below test case approximates your above system and shows a workaround until the bug is fixed:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
Base= declarative_base()
class CommonObject(Base):
__tablename__ = "objects"
id = Column("objid", Integer, primary_key=True)
objname = Column(String(32))
class Container(CommonObject):
__tablename__ = 'containers'
id = Column("objid", Integer, ForeignKey("objects.objid"), primary_key=True)
class GoodsPlacement(Container):
__tablename__ = "goods_placements"
id = Column("objid", Integer, ForeignKey("containers.objid"), primary_key=True)
class Departure(CommonObject):
__tablename__ = "departures"
id = Column(Integer, ForeignKey("objects.objid"), primary_key=True)
content_id = Column(Integer, ForeignKey("goods_placements.objid"))
status_id = Column(Integer)
content = relationship("GoodsPlacement",
primaryjoin=lambda:Departure.__table__.c.content_id==GoodsPlacement.__table__.c.objid,
backref="departures"
)
session = Session()
print session.query(GoodsPlacement).filter(~GoodsPlacement.departures.any(Departure.status_id < 2))
output:
SELECT objects.objid AS objects_objid, containers.objid AS containers_objid, goods_placements.objid AS goods_placements_objid, objects.objname AS objects_objname
FROM objects JOIN containers ON objects.objid = containers.objid JOIN goods_placements ON containers.objid = goods_placements.objid
WHERE NOT (EXISTS (SELECT 1
FROM (SELECT objects.objid AS objects_objid, objects.objname AS objects_objname, departures.id AS departures_id, departures.content_id AS departures_content_id, departures.status_id AS departures_status_id
FROM objects JOIN departures ON objects.objid = departures.id) AS anon_1
WHERE anon_1.departures_content_id = goods_placements.objid AND anon_1.departures_status_id < :status_id_1))
Suppose we have a PostgreSQL database with two tables A, B.
table A columns: id, name
table B columns: id, name, array_a
The column array_a in table B contains a variable length array of ids from table A. In SQLAlchemy we have two classes that model those tables, say class A and B.
The following works fine to get all the objects A that are referenced in an object B:
session.query(A).join(B, A.id == func.any(B.array_a)).filter(B.id == <id>).all()
How can we create a relationship in B referencing the objects A corresponding to the array? Tried column comparators using the func.any above but it complains that ANY(array_a) is not a column in the model. Specifying the primaryjoin conditions as above doesn't seem to cut it either.
This anti-pattern is called "Jaywalking"; and PostgreSQL's powerful type system makes it very tempting. you should be using another table:
CREATE TABLE table_a (
id SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE table_b (
id SERIAL PRIMARY KEY,
name VARCHAR
);
CREATE TABLE a_b (
a_id INTEGER PRIMARY KEY REFERENCES table_a(id),
b_id INTEGER PRIMARY KEY REFERENCES table_b(id)
)
Which is mapped:
from sqlalchemy import *
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import *
Base = declarative_base()
a_b_table = Table("a_b", Base.metadata,
Column("a_id", Integer, ForeignKey("table_a.id"), primary_key=True),
Column("b_id", Integer, ForeignKey("table_b.id"), primary_key=True))
class A(Base):
__tablename__ = "table_a"
id = Column(Integer, primary_key=True)
name = Column(String)
class B(Base):
__tablename__ = "table_b"
id = Column(Integer, primary_key=True)
name = Column(String)
a_set = relationship(A, secondary=a_b_table, backref="b_set")
example:
>>> print Query(A).filter(A.b_set.any(B.name == "foo"))
SELECT table_a.id AS table_a_id, table_a.name AS table_a_name
FROM table_a
WHERE EXISTS (SELECT 1
FROM a_b, table_b
WHERE table_a.id = a_b.a_id AND table_b.id = a_b.b_id AND table_b.name = :name_1)
If you are stuck with the ARRAY column, your best bet is to use an alternate selectable that "looks" like a proper association table.
from sqlalchemy import *
from sqlalchemy.dialects import postgresql
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import *
Base = declarative_base()
class A(Base):
__tablename__ = "table_a"
id = Column(Integer, primary_key=True)
name = Column(String)
class B(Base):
__tablename__ = "table_b"
id = Column(Integer, primary_key=True)
name = Column(String)
array_a = Column(postgresql.ARRAY(Integer))
a_b_selectable = select([func.unnest(B.array_a).label("a_id"),
B.id.label("b_id")]).alias()
A.b_set = relationship(B, secondary=a_b_selectable,
primaryjoin=A.id == a_b_selectable.c.a_id,
secondaryjoin=a_b_selectable.c.b_id == B.id,
viewonly=True,)
B.a_set = relationship(A, secondary=a_b_selectable,
primaryjoin=A.id == a_b_selectable.c.a_id,
secondaryjoin=a_b_selectable.c.b_id == B.id,
viewonly=True)
which gives you:
>>> print Query(A).filter(A.b_set.any(B.name == "foo"))
SELECT table_a.id AS table_a_id, table_a.name AS table_a_name
FROM table_a
WHERE EXISTS (SELECT 1
FROM (SELECT unnest(table_b.array_a) AS a_id, table_b.id AS b_id
FROM table_b) AS anon_1, table_b
WHERE table_a.id = anon_1.a_id AND anon_1.b_id = table_b.id AND table_b.name = :name_1)
And obviously, since there's no real table there, viewonly=True is neccesary and you can't get the nice, dynamic objecty goodness you would if you had avoided jaywalking.
Or else simply you can join explicitly like below:
class A(Base):
__tablename__ = "table_a"
id = Column(Integer, primary_key=True)
name = Column(String)
class B(Base):
__tablename__ = "table_b"
id = Column(Integer, primary_key=True)
name = Column(String)
array_a = Column(postgresql.ARRAY(Integer))
a_ids= relationship('A',primaryjoin='A.id == any_(foreign(B.array_a))',uselist=True)
I can't seem to find any good documentation on this. I have a list of users and order amounts, and I want to display the users with the top 10 order amount totals. I've been having trouble creating a query that sufficiently extracts this data in SQLAlchemy. Is there a better way to approach this?
customers, amount = DBSession.query(Order.customer, func.sum(Order.amount).label('totalamount')).\
group_by(Order.customer).\
order_by(func.desc(totalamount)).\
limit(10)
for a, b in zip(customers, amount):
print a.name, str(amount)
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
import random
Base= declarative_base()
class Customer(Base):
__tablename__ = 'customer'
id = Column(Integer, primary_key=True)
name = Column(Unicode)
orders = relationship("Order", backref="customer")
class Order(Base):
__tablename__ = "order"
id = Column(Integer, primary_key=True)
customer_id= Column(Integer, ForeignKey('customer.id'))
amount = Column(Integer)
e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)
session = Session(e)
session.add_all([
Customer(name="c%d" % i, orders=[
Order(amount=random.randint(10, 100))
for j in xrange(random.randint(0, 5))
])
for i in xrange(100)
])
amount_sum = func.sum(Order.amount).label('totalamount')
amount = session.query(Order.customer_id, amount_sum).\
group_by(Order.customer_id).\
order_by(amount_sum.desc()).\
limit(10).\
subquery()
for a, b in session.query(Customer, amount.c.totalamount).\
join(amount, amount.c.customer_id==Customer.id):
print a.name, b
some guidelines on the pattern here are at http://www.sqlalchemy.org/docs/orm/tutorial.html#using-subqueries, but overall start in SQL first.