How can I structure this sqlalchemy query so that it does the right thing?
I've given everything I can think of an alias, but I'm still getting:
ProgrammingError: (psycopg2.ProgrammingError) subquery in FROM must have an alias
LINE 4: FROM (SELECT foo.id AS foo_id, foo.version AS ...
Also, as IMSoP pointed out, it seems to be trying to turn it into a cross join, but I just want it to join a table with a group by subquery on that same table.
Here is the sqlalchemy:
(Note: I've rewritten it to be a standalone file that is as complete as possible and can be run from a python shell)
from sqlalchemy import create_engine, func, select
from sqlalchemy import Column, BigInteger, DateTime, Integer, String, SmallInteger
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine('postgresql://postgres:########@localhost:5435/foo1234')
session = sessionmaker()
session.configure(bind=engine)
session = session()
Base = declarative_base()
class Foo(Base):
__tablename__ = 'foo'
__table_args__ = {'schema': 'public'}
id = Column('id', BigInteger, primary_key=True)
time = Column('time', DateTime(timezone=True))
version = Column('version', String)
revision = Column('revision', SmallInteger)
foo_max_time_q = select([
func.max(Foo.time).label('foo_max_time'),
Foo.id.label('foo_id')
]).group_by(Foo.id
).alias('foo_max_time_q')
foo_q = select([
Foo.id.label('foo_id'),
Foo.version.label('foo_version'),
Foo.revision.label('foo_revision'),
foo_max_time_q.c.foo_max_time.label('foo_max_time')
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
).alias('foo_q')
thing = session.query(foo_q).all()
print(thing)
Generated SQL:
SELECT foo_id AS foo_id,
foo_version AS foo_version,
foo_revision AS foo_revision,
foo_max_time AS foo_max_time,
foo_max_time_q.foo_max_time AS foo_max_time_q_foo_max_time,
foo_max_time_q.foo_id AS foo_max_time_q_foo_id
FROM (SELECT foo.id AS foo_id,
foo.version AS foo_version,
foo.revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM (SELECT max(foo.time) AS foo_max_time,
foo.id AS foo_id
FROM foo GROUP BY foo.id
) AS foo_max_time_q)
JOIN (SELECT max(foo.time) AS foo_max_time,
foo.id AS foo_id
FROM foo GROUP BY foo.id
) AS foo_max_time_q
ON foo_max_time_q.foo_id = foo.id
and here is the toy table:
CREATE TABLE foo (
id bigint ,
time timestamp with time zone,
version character varying(32),
revision smallint
);
The SQL I was expecting to get (desired SQL) would be something like this:
SELECT foo.id AS foo_id,
foo.version AS foo_version,
foo.revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM foo
JOIN (SELECT max(time) AS foo_max_time,
id AS foo_id
FROM foo GROUP BY id
) AS foo_max_time_q
ON foo_max_time_q.foo_id = foo.id
Final note:
I'm hoping to get an answer using select() instead of session.query() if possible. Thank you
You are almost there. Make a "selectable" subquery and join it with the main query via join():
foo_max_time_q = select([func.max(Foo.time).label('foo_max_time'),
Foo.id.label('foo_id')
]).group_by(Foo.id
).alias("foo_max_time_q")
foo_q = session.query(
Foo.id.label('foo_id'),
Foo.version.label('foo_version'),
Foo.revision.label('foo_revision'),
foo_max_time_q.c.foo_max_time.label('foo_max_time')
).join(foo_max_time_q,
foo_max_time_q.c.foo_id == Foo.id)
print(foo_q)
Prints (prettified manually):
SELECT
foo.id AS foo_id,
foo.version AS foo_version,
foo.revision AS foo_revision,
foo_max_time_q.foo_max_time AS foo_max_time
FROM
foo
JOIN
(SELECT
max(foo.time) AS foo_max_time,
foo.id AS foo_id
FROM
foo
GROUP BY foo.id) AS foo_max_time_q
ON
foo_max_time_q.foo_id = foo.id
The complete working code is available in this gist.
Cause
subquery in FROM must have an alias
This error means the subquery (on which we're trying to perform a join) has no alias.
Even if we .alias('t') it just to satisfy this requirement, we will then get the next error:
missing FROM-clause entry for table "foo"
That's because the join on clause (... == Foo.id) is not familiar with Foo.
It only knows the "left" and "right" tables: t (the subquery) and foo_max_time_q.
Solution
Instead, select_from a join of Foo and foo_max_time_q.
Method 1
Replace .join(B, on_clause) with .select_from(B.join(A, on_clause)):
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
]).select_from(foo_max_time_q.join(Foo, foo_max_time_q.c.foo_id == Foo.id)
This works here because A INNER JOIN B is equivalent to B INNER JOIN A.
Method 2
To preserve the order of joined tables:
from sqlalchemy import join
and replace .join(B, on_clause) with .select_from(join(A, B, on_clause)):
]).join(foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id
]).select_from(join(Foo, foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id)
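Putting Method 2 together with select(), as the question requested, a minimal sketch (untested, reusing the Foo model and session defined in the question) would be:
from sqlalchemy import func, join, select

foo_max_time_q = select([
    func.max(Foo.time).label('foo_max_time'),
    Foo.id.label('foo_id')
]).group_by(Foo.id).alias('foo_max_time_q')

# select_from() a join of foo and the aliased subquery, instead of
# calling .join() on the select() itself
foo_q = select([
    Foo.id.label('foo_id'),
    Foo.version.label('foo_version'),
    Foo.revision.label('foo_revision'),
    foo_max_time_q.c.foo_max_time.label('foo_max_time')
]).select_from(
    join(Foo, foo_max_time_q, foo_max_time_q.c.foo_id == Foo.id)
)

rows = session.execute(foo_q).fetchall()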
Alternatives to session.query() can be found here.
Related
I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
__tablename__ = "project_data"
id = db.Column(db.Integer, primary_key=True)
class Label(db.Model):
__tablename__ = "labels"
id = db.Column(db.Integer, primary_key=True)
data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join, or a right outer join, which is not a feature SQLAlchemy offers.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
If you already know the SQL statement you want to run, you can use SQLAlchemy's text() construct to execute a complex query:
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text

t = text("SELECT * "
         "FROM users "
         "WHERE user_id = :user_id")
results = db.session.execute(t, {"user_id": user_id}).fetchall()
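If you need mapped objects instead of raw rows, the same text clause can also be combined with Query.from_statement() (a sketch, assuming a hypothetical User model mapped to the users table):
# User is a hypothetical mapped class for the users table
users = (db.session.query(User)
         .from_statement(text("SELECT * FROM users WHERE user_id = :user_id"))
         .params(user_id=user_id)
         .all())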
I use SQLAlchemy to make changes to a table in a SQL Server database, and would like to get back the number of affected rows.
I know there is a .rowcount attribute on ResultProxy, but, as this answer demonstrates for example, .rowcount is not necessarily the same as the number of affected rows.
SQL Server uses @@ROWCOUNT to access the number of affected rows from the previous statement execution.
Is there a way to modify a SQLAlchemy expression that uses an insert / update statement so that it ends with SELECT @@ROWCOUNT?
For example, given:
from sqlalchemy import Table, Column, Integer, String, MetaData, create_engine
url = 'mssql+pyodbc://dsn'
engine = create_engine(url)
metadata = MetaData()
users = Table('users', metadata,
Column('id', Integer, primary_key=True),
Column('name', String),
Column('fullname', String),
)
ins = users.insert().values(name='jack', fullname='Jack Jones')
upd1 = users.update().values(fullname='Jack Doe').where(users.c.name == 'jack')
upd2 = users.update().values(fullname='Jack Doe').where(users.c.name == 'jack')
I could prepend SELECT @@ROWCOUNT to an update statement:
sel = select([text('@@ROWCOUNT')])
sql1 = sel.suffix_with(upd2)
print(sql1.compile(engine, compile_kwargs={"literal_binds": True}))
Yielding "wrong" query:
SELECT @@ROWCOUNT UPDATE users SET fullname='Jack Doe' WHERE users.name = 'jack'
Trying to do the "right" thing:
sql2 = upd2.suffix_with(sel)
Raises AttributeError since 'Update' object has no attribute 'suffix_with'.
So is there a way to get the desired SQL query:
UPDATE users SET fullname='Jack Doe' WHERE users.name = 'jack';
SELECT @@ROWCOUNT
using the SQL expression language, without fully textual constructs?
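One possible direction (a sketch only, not verified against SQL Server; the UpdateWithRowcount name is made up here) is a small compiler extension that appends SELECT @@ROWCOUNT when a custom Update subclass is compiled for the mssql dialect. Whether the extra result set is then reachable through pyodbc is a separate question:
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import mssql
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Update

class UpdateWithRowcount(Update):
    """An UPDATE construct that also emits SELECT @@ROWCOUNT (sketch)."""

@compiles(UpdateWithRowcount, 'mssql')
def _update_with_rowcount(element, compiler, **kw):
    # Compile the ordinary UPDATE first, then append the rowcount query.
    return compiler.visit_update(element, **kw) + "; SELECT @@ROWCOUNT"

metadata = MetaData()
users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String),
              Column('fullname', String))

upd = UpdateWithRowcount(users).values(fullname='Jack Doe').where(
    users.c.name == 'jack')
print(upd.compile(dialect=mssql.dialect(),
                  compile_kwargs={"literal_binds": True}))
# -> UPDATE users SET fullname='Jack Doe' WHERE users.name = 'jack'; SELECT @@ROWCOUNT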
I want to select, for each email address that appears more than once, the count of contacts with that address. I can't get this query working in SQLAlchemy with PostgreSQL.
SELECT count(*), email FROM contact group by email having count(*) > 1
I tried this:
all_records = db.session.query(Contact).options(
load_only('email')).group_by(Contact.email).having(
func.count('*') > 1).all()
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "contact.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT contact.id AS contact_id, contact.email AS contact_em...
^
[SQL: 'SELECT contact.id AS contact_id, contact.email AS contact_email \nFROM contact GROUP BY contact.email \nHAVING count(%(count_1)s) > %(count_2)s'] [parameters: {'count_1': '*', 'count_2': 1}]
And I tried this:
all_records = db.session.query(func.count(Contact.id)).options(
load_only('email')).group_by(Contact.email).having(
func.count('*') > 1).all()
sqlalchemy.exc.ArgumentError
sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity. Use Load(ent) to specify specific entities.
It works correctly if I execute raw SQL:
all_records = db.session.execute(
"SELECT count(*), email FROM contact group by email"
" having count(*) > 1").fetchall()
I'm using Flask-SQLAlchemy, but here's a minimal SQLAlchemy setup to demonstrate the issue:
import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Contact(Base):
__tablename__ = 'contact'
id = sa.Column(sa.Integer, primary_key=True)
email = sa.Column(sa.String)
engine = sa.create_engine('postgresql:///example', echo=True)
Base.metadata.create_all(engine)
session = orm.Session(engine)
session.add_all((
Contact(email='a@example.com'),
Contact(email='b@example.com'),
Contact(email='a@example.com'),
Contact(email='c@example.com'),
Contact(email='a@example.com'),
))
session.commit()
# first failed query
all_records = session.query(Contact).options(
orm.load_only('email')).group_by(Contact.email).having(
sa.func.count('*') > 1).all()
# second failed query
all_records = session.query(sa.func.count(Contact.id)).options(
orm.load_only('email')).group_by(Contact.email).having(
sa.func.count('*') > 1).all()
With the sample data, I expect to get one result row: 3, a@example.com.
You're not building the same query in SQLAlchemy that you're writing manually.
You want to select the count of each email that has more than one occurrence.
q = db.session.query(
db.func.count(Contact.email),
Contact.email
).group_by(
Contact.email
).having(
db.func.count(Contact.email) > 1
)
print(q)
SELECT count(contact.email) AS count_1, contact.email AS contact_email
FROM contact GROUP BY contact.email
HAVING count(contact.email) > %(count_2)s
The first query fails because you query the entire model, so SQLAlchemy selects all of its columns; with group_by, every selected column must either appear in the GROUP BY clause or be wrapped in an aggregate. SQLAlchemy always selects the primary key when querying the entire model, and load_only doesn't change that.
The second query fails because load_only only applies when an entire model is being selected, but you're selecting an aggregate and a column.
Just select what you would in a text query:
db.session.query(func.count('*'), Contact.email).\
group_by(Contact.email).\
having(func.count('*') > 1).\
all()
I would like to move rows from one table to another using SQLAlchemy with a Postgres database (there are other questions on Stack Overflow about moving data but they don't focus on using SQLAlchemy for this).
The approach is to use DELETE with RETURNING and to insert the rows into the other table.
I'm using: SQLAlchemy 1.0.12, Postgres 9.4 and Python 2.7.11.
Setting up the tables
The following SQL creates the tables and inserts a row of data:
create table example1 (
id integer,
value_a integer,
value_b integer,
CONSTRAINT example1_pkey PRIMARY KEY (id)
);
create table example2 (
id integer,
value_a integer,
value_b integer,
CONSTRAINT example2_pkey PRIMARY KEY (id)
);
insert into example1 values (18, 1, 9);
Creating tables using SQLAlchemy
The following SQLAlchemy code creates the same tables and inserts a row of data:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class ExampleOne(Base):
__tablename__ = 'example1'
id = Column(Integer, primary_key=True)
value_a = Column(Integer)
value_b = Column(Integer)
class ExampleTwo(Base):
__tablename__ = 'example2'
id = Column(Integer, primary_key=True)
value_a = Column(Integer)
value_b = Column(Integer)
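# Note: `session` below is assumed to be an already-configured Session
# bound to the same Postgres database; its setup is not shown here.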
Base.metadata.create_all(session.bind)
with session.begin():
session.add(ExampleOne(id=18, value_a=1, value_b=9))
Query that I would like to implement
This is the SQL query that I wish to run (which works on its own):
with output as (delete from example1 where value_a < 10 returning id, value_a)
insert into example2 (id, value_a, value_b)
select id, value_a, 3 from output;
SQLAlchemy query so far
The query that I have constructed so far is:
query = insert(ExampleTwo, inline=True).from_select(
['id', 'value_a', 'value_b'],
select(
['id', 'value_a', literal(3)]
).where(
select([
'id', 'value_a',
]).select_from(
delete(ExampleOne).where(
ExampleOne.value_a < 10,
).returning(
ExampleOne.id,
ExampleOne.value_a,
)
)
)
)
session.execute(query)
The problem
The error is:
File ".../lib/python2.7/site-packages/sqlalchemy/sql/selectable.py", line 41, in _interpret_as_from
raise exc.ArgumentError("FROM expression expected")
sqlalchemy.exc.ArgumentError: FROM expression expected
The problem seems to be that SQLAlchemy does not recognise the DELETE ... RETURNING query as a valid expression for the FROM part of the INSERT query.
Is there a way to make this clear to SQLAlchemy or is there are a different approach to create the given query in SQLAlchemy?
You need to make the delete expression into a CTE, as your raw SQL calls for:
>>> output = delete(ExampleOne).where(
... ExampleOne.value_a < 10,
... ).returning(
... ExampleOne.id,
... ExampleOne.value_a,
... ).cte('output')
>>> query = insert(ExampleTwo, inline=True).from_select(
... ['id', 'value_a', 'value_b'],
... select(
... ['id', 'value_a', literal(3)]
... ).select_from(output)
... )
>>> print(query.compile(engine))
WITH output AS
(DELETE FROM example1 WHERE example1.value_a < %(value_a_1)s RETURNING example1.id, example1.value_a)
INSERT INTO example2 (id, value_a, value_b) SELECT id, value_a, %(param_1)s AS anon_1
FROM output
Unfortunately, .cte only works on delete expressions in SQLAlchemy 1.1, which is currently unreleased, so you'll have to install SQLAlchemy from the source repo to make this work:
pip install -e git+https://bitbucket.org/zzzeek/sqlalchemy#egg=sqlalchemy
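If building SQLAlchemy from the repo isn't an option, a textual fallback that works on 1.0 (a sketch, passing the cut-off and the constant as bound parameters) would be:
from sqlalchemy import text

move_stmt = text(
    "WITH output AS "
    "(DELETE FROM example1 WHERE value_a < :limit RETURNING id, value_a) "
    "INSERT INTO example2 (id, value_a, value_b) "
    "SELECT id, value_a, :value_b FROM output"
)
# transaction handling depends on how the session is configured
session.execute(move_stmt, {'limit': 10, 'value_b': 3})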
I need to execute this query:
select field11, field12
from Table_1 t1
left outer join Table_2 t2 ON t2.tbl1_id = t1.tbl1_id
where t2.tbl2_id is null
I had these classes in python:
class Table1(Base):
....
class Table2(Base):
table_id = Column(
Integer,
ForeignKey('Table1.id', ondelete='CASCADE'),
)
....
How do I get to the above from the below?
q = (
    session.query(Table1.field1, Table1.field2)
    .outerjoin(Table2)  # use this if you have the relationship defined
    # .outerjoin(Table2, Table1.id == Table2.table_id)  # use this if you do not have the relationship defined
    .filter(Table2.tbl2_id == None)
)
should do it, assuming that field1 and field2 are from Table1, and that you define a relationship:
class Table2(Base):
# ...
table1 = relationship(Table1, backref="table2s")
You can also do that using SQLAlchemy Core only:
session.execute(
select(['field11', 'field12'])
.select_from(
Table1.__table__.outerjoin(Table2.__table__, Table1.tbl1_id == Table2.tbl1_id))
.where(Table2.tbl2_id.is_(None))
)
PS .outerjoin(table, condition) is equivalent to .join(table, condition, isouter=True).
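For example, the ORM query above could equivalently be written as (a sketch, reusing the Table1.id / Table2.table_id columns from the question's models):
# same anti-join, using join(..., isouter=True) instead of outerjoin()
q = (session.query(Table1.field1, Table1.field2)
     .join(Table2, Table1.id == Table2.table_id, isouter=True)
     # table_id is NULL only for Table1 rows with no matching Table2 row
     .filter(Table2.table_id.is_(None)))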