I want to select, for each email address that appears more than once, the count of contacts sharing it. I can't get this query working in SQLAlchemy with PostgreSQL.
SELECT count(*), email FROM contact group by email having count(*) > 1
I tried this:
all_records = db.session.query(Contact).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "contact.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT contact.id AS contact_id, contact.email AS contact_em...
^
[SQL: 'SELECT contact.id AS contact_id, contact.email AS contact_email \nFROM contact GROUP BY contact.email \nHAVING count(%(count_1)s) > %(count_2)s'] [parameters: {'count_1': '*', 'count_2': 1}]
And I tried this:
all_records = db.session.query(func.count(Contact.id)).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity. Use Load(ent) to specify specific entities.
It works correctly if I execute raw SQL:
all_records = db.session.execute(
"SELECT count(*), email FROM contact group by email"
" having count(*) > 1").fetchall()
I'm using Flask-SQLAlchemy, but here's a minimal SQLAlchemy setup to demonstrate the issue:
import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)
engine = sa.create_engine('postgresql:///example', echo=True)
Base.metadata.create_all(engine)
session = orm.Session(engine)
session.add_all((
    Contact(email='a@example.com'),
    Contact(email='b@example.com'),
    Contact(email='a@example.com'),
    Contact(email='c@example.com'),
    Contact(email='a@example.com'),
))
session.commit()
# first failed query
all_records = session.query(Contact).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()
# second failed query
all_records = session.query(sa.func.count(Contact.id)).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()
With the sample data, I expect to get one result row: 3, a@example.com.
You're not building the same query in SQLAlchemy that you're writing manually.
You want to select the count of each email that has more than one occurrence.
q = session.query(
    sa.func.count(Contact.email),
    Contact.email
).group_by(
    Contact.email
).having(
    sa.func.count(Contact.email) > 1
)
print(q)
SELECT count(contact.email) AS count_1, contact.email AS contact_email
FROM contact GROUP BY contact.email
HAVING count(contact.email) > %(count_2)s
The first query fails because you query the entire model, so SQLAlchemy selects all of its columns, but with group_by you can only select the grouped columns and aggregates. SQLAlchemy must always select the primary key when querying the entire model; load_only doesn't change that.
The second query fails because load_only only works when selecting an entire model, but you're selecting an aggregate and a column.
Just select what you would in a text query:
db.session.query(func.count('*'), Contact.email).\
    group_by(Contact.email).\
    having(func.count('*') > 1).\
    all()
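Either version returns plain (count, email) tuples rather than Contact objects. With the sample data from the question, consuming the result might look like this (a sketch using the standalone session, sa, and Contact from the minimal example above):

rows = session.query(
    sa.func.count(Contact.email),
    Contact.email,
).group_by(
    Contact.email,
).having(
    sa.func.count(Contact.email) > 1,
).all()

for count, email in rows:
    print(count, email)  # expected: 3 a@example.com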
I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
    __tablename__ = "project_data"
    id = db.Column(db.Integer, primary_key=True)

class Label(db.Model):
    __tablename__ = "labels"
    id = db.Column(db.Integer, primary_key=True)
    data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join. A right outer join seemed like an option, but that isn't a feature SQLAlchemy offers directly.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
    ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
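As a side note, if a relationship were declared between the two models (the models above only define the foreign key, so the relationship name below is an assumption), the same anti-join can be written with any(), which SQLAlchemy also renders as a NOT EXISTS:

# Hypothetical relationship -- not part of the original models:
# class ProjectData(db.Model):
#     ...
#     labels = db.relationship('Label', backref='project_data')

q = db.session.query(ProjectData).filter(~ProjectData.labels.any())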
If you already know the SQL statement you want to run, you can use the text() function from SQLAlchemy to execute a complex query:
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text

t = text("SELECT * "
         "FROM users "
         "WHERE user_id = :user_id"
         ).bindparams(user_id=user_id)
results = db.session.execute(t).fetchall()
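If you want mapped objects back rather than raw rows, the same text() construct can also be fed to Query.from_statement(); a sketch, assuming a mapped User class for the users table (not defined in the snippet above):

from sqlalchemy import text

stmt = text("SELECT * FROM users WHERE user_id = :user_id")
# `User` is a hypothetical mapped class for the `users` table.
user = (
    db.session.query(User)
    .from_statement(stmt)
    .params(user_id=user_id)
    .one_or_none()
)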
I have the following SQLAlchemy subquery with a join:
ChildModel: id, first_name, last_name
ParentModel: id, first_name, last_name
ClassroomModel: id, child_id
subquery = PostgresqlSession().\
    query(ChildModel, ParentModel).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()

query = PostgresqlSession().\
    query(subquery, ClassroomModel).\
    join(subquery, subquery.c.student_id == ClassroomModel.student_id)
But I'm getting an AmbiguousColumn error because the subquery has an id column from both ChildModel and ParentModel.
What I'd like to do is do a "SELECT AS" for the subquery. I'm looking at the SQLAlchemy documentation about how to do this with select(), so I tried something like:
subquery = PostgresqlSession().\
    query(ChildModel, ParentModel).\
    select(ChildModel.c.id.label('student_id'), ParentModel.c.id.label('parent_id')).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()
but select() is not available on the Query object. How can I do a SELECT AS on a session query in SQLAlchemy? Thanks!
OK, I found the answer in this SO post: How can I select only one column using SQLAlchemy?
TL;DR
Specify your SELECT AS fields in the query() method and use the label() function. You must explicitly list all the attributes you want; you can't just specify the id, or the query won't return first_name, last_name, etc. This is expected.
subquery = PostgresqlSession().\
    query(
        ChildModel.id.label('child_id'),
        ChildModel.first_name,
        ChildModel.last_name,
        ParentModel.id.label('parent_id'),
        ParentModel.first_name.label('parent_first_name'),
        ParentModel.last_name.label('parent_last_name')
    ).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()
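With the id columns labelled, the outer query can refer to them unambiguously through subquery.c. A sketch of the join, assuming the classroom table's foreign key matches the child's id (the question mixes child_id and student_id, so the exact column name is an assumption):

# Sketch only: re-uses the labelled `subquery` defined above.
query = PostgresqlSession().\
    query(subquery, ClassroomModel).\
    join(ClassroomModel, subquery.c.child_id == ClassroomModel.child_id)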
In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    surname = Column(String(50))
    email = Column(String(255))
    [..]
    deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will be using LIMIT and OFFSET, and also COUNT(*) OVER() to get the total amount of records while executing the query (not with a different query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Guest.deleted == None)
result = query.scalar()
Now the question: how do I execute one single query, using SQLAlchemy, similar to the one above, that kills two birds with one stone (returning the first 50 rows and the total row count needed for the pagination links, all in one query)?
The interesting bit is the use of a window function in PostgreSQL, which makes this possible and saves you from querying twice when once is enough.
Is it possible?
Thanks in advance.
So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
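Each row in result is then a (Guest, total) tuple, so the page of objects and the overall count can be pulled apart along these lines (a sketch; the empty-page handling is illustrative):

guests = [row[0] for row in result]        # the Guest instances for this page
total = result[0].total if result else 0   # the window count, repeated on every row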
Cheers!
P.S. I also found this question on Stack Overflow, which was unanswered.
I use SQLAlchemy to make changes to a table in a SQL Server database, and would like to get back the number of affected rows.
I know there is a .rowcount attribute on ResultProxy, but, as this answer demonstrates for example, .rowcount is not necessarily the same as the number of affected rows.
SQL Server uses @@ROWCOUNT to access the number of rows affected by the previous statement execution.
Is there a way to modify an SQLAlchemy expression that uses an insert / update statement to end with SELECT @@ROWCOUNT?
For example, given:
from sqlalchemy import Table, Column, Integer, String, MetaData, create_engine, select, text

url = 'mssql+pyodbc://dsn'
engine = create_engine(url)
metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
ins = users.insert().values(name='jack', fullname='Jack Jones')
upd1 = users.update().values(fullname='Jack Doe').where(users.c.name == 'jack')
upd2 = users.update().values(fullname='Jack Doe').where(users.c.name == 'jack')
I could prepend SELECT @@ROWCOUNT to an update statement:
sel = select([text('@@ROWCOUNT')])
sql1 = sel.suffix_with(upd2)
print(sql1.compile(engine, compile_kwargs={"literal_binds": True}))
Yielding "wrong" query:
SELECT @@ROWCOUNT UPDATE users SET fullname='Jack Doe' WHERE users.name = 'jack'
Trying to do the "right" thing:
sql2 = upd2.suffix_with(sel)
Raises AttributeError since 'Update' object has no attribute 'suffix_with'.
So is there a way to get the desired SQL query:
UPDATE users SET fullname='Jack Doe' WHERE users.name = 'jack';
SELECT @@ROWCOUNT
using the SQL expression language, without fully textual constructs?
I am trying to perform a SELECT query with an IN() clause, and have sqlalchemy perform the
parameter escaping for me. I am using pyodbc as my database connector.
This is the code I have written so far:
tables = ['table1', 'table2', ... ]
sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN(:tables)"
result = session.execute(sql, {"tables": tables})
Unfortunately this fails with an error:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('Invalid parameter type. param-index=0 param-type=list', 'HY105')
Is there any way I can have SQLAlchemy escape the whole list of parameters and join them with commas, without manually adding a :tableX placeholder for each item of the list?
Try something like this....
DECLARE @string Varchar(100) = 'Table1,table2,table3'
DECLARE @xml xml
SET @xml = N'<root><r>' + replace(@string,',','</r><r>') + '</r></root>'

SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME IN (
    SELECT r.value('.','varchar(max)') AS item
    FROM @xml.nodes('//root/r') AS records(r)
)
For good reasons it is not possible to expand a list of arguments as you wish.
If you really would like to create a raw SQL query, then you can just enumerate over your list and dynamically create the query:
vals = {"param{}".format(i): table for i, table in enumerate(tables)}
keys = ", ".join([":{}".format(k) for k in vals])
sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN ({keys})".format(keys=keys)
result = session.execute(sql, vals)
for tbl in result:
    print(tbl)
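For illustration, with a two-element list the pieces built above come out roughly like this (the parameter names simply follow the enumerate() order):

tables = ['table1', 'table2']
vals = {"param{}".format(i): table for i, table in enumerate(tables)}
keys = ", ".join([":{}".format(k) for k in vals])
# vals -> {'param0': 'table1', 'param1': 'table2'}
# keys -> ':param0, :param1'
# sql  -> "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN (:param0, :param1)"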
But you could ask SQLAlchemy to do this for you. Here we make a fake mapping of the INFORMATION_SCHEMA.TABLES view and query it using the SQLAlchemy toolset:
# definition (done just once)
class ISTable(Base):
    __tablename__ = 'tables'
    __table_args__ = {'schema': 'INFORMATION_SCHEMA'}

    _fake_id = Column(Integer, primary_key=True)
    table_catalog = Column(String)
    table_schema = Column(String)
    table_name = Column(String)
    table_type = Column(String)
# actual usage
result = session.query(
    ISTable.table_catalog, ISTable.table_schema,
    ISTable.table_name, ISTable.table_type,
).filter(
    ISTable.table_name.in_(tables))

for tbl in result:
    print(tbl)
One gotcha: you cannot query the whole mapped class (like query(ISTable)), because the _fake_id primary key column does not actually exist in the view and an exception will be raised. But querying only the columns we care about (as shown above) is good enough for the purpose.
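As a final side note, newer SQLAlchemy releases (1.2 and later) also support "expanding" bind parameters, which let a textual IN clause accept a Python list directly; a sketch, assuming a recent enough SQLAlchemy version:

from sqlalchemy import bindparam, text

stmt = text(
    "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN :tables"
).bindparams(bindparam("tables", expanding=True))

result = session.execute(stmt, {"tables": tables})
for tbl in result:
    print(tbl)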