SQLAlchemy Query "SELECT AS" - python

I have the following SQLAlchemy subquery with a join:
ChildModel
    id
    first_name
    last_name
ParentModel
    id
    first_name
    last_name
ClassroomModel
    id
    child_id
subquery = PostgresqlSession().\
    query(ChildModel, ParentModel).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()
query = PostgresqlSession().\
    query(subquery, ClassroomModel).\
    join(subquery, subquery.c.student_id == ClassroomModel.student_id)
But I'm getting an AmbiguousColumn error because the subquery has an id column for both ChildModel and ParentModel.
What I'd like to do is do a "SELECT AS" for the subquery. I'm looking at the SQLAlchemy documentation about how to do this with select(), so I tried something like:
subquery = PostgresqlSession().\
    query(ChildModel, ParentModel).\
    select(ChildModel.c.id.label('student_id'), ParentModel.c.id.label('parent_id')).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()
but select() is not available on the query model. How can I do a SELECT AS on a session query in SQLAlchemy? Thanks!

Ok I found the answer from this SO post: How can I select only one column using SQLAlchemy?
TL;DR
Pass the columns you want to alias directly to query() and call label() on each one. Once you list individual columns, you must list every column you need: if you only specify the id columns, first_name, last_name and so on are no longer returned, which is expected.
subquery = PostgresqlSession().\
    query(
        ChildModel.id.label('child_id'),
        ChildModel.first_name,
        ChildModel.last_name,
        ParentModel.id.label('parent_id'),
        # the parent's name columns are labeled too so they don't clash with the child's
        ParentModel.first_name.label('parent_first_name'),
        ParentModel.last_name.label('parent_last_name')
    ).\
    outerjoin(ParentModel, ChildModel.last_name == ParentModel.last_name).\
    subquery()
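To see the labeled columns in use, here is a minimal self-contained sketch. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database instead of PostgreSQL; the model names Child, Parent and Classroom and the sample rows are made up for illustration and only mirror the schema from the question.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical stand-ins for ChildModel, ParentModel and ClassroomModel.
class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    first_name = Column(String)
    last_name = Column(String)

class Classroom(Base):
    __tablename__ = "classroom"
    id = Column(Integer, primary_key=True)
    child_id = Column(Integer)

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Child(id=1, first_name="Ann", last_name="Lee"),
    Parent(id=7, first_name="Bob", last_name="Lee"),
    Classroom(id=3, child_id=1),
])
session.commit()

# Every column that shares a name across the two tables gets its own label,
# so the subquery exposes unambiguous names on its .c collection.
family = (
    session.query(
        Child.id.label("child_id"),
        Child.first_name.label("child_first_name"),
        Parent.id.label("parent_id"),
        Parent.first_name.label("parent_first_name"),
    )
    .outerjoin(Parent, Child.last_name == Parent.last_name)
    .subquery()
)

# The labels are what the outer query refers to.
rows = (
    session.query(family.c.child_id, family.c.parent_id, Classroom.id.label("classroom_id"))
    .join(Classroom, family.c.child_id == Classroom.child_id)
    .all()
)
print([tuple(row) for row in rows])
```

The outer query joins on family.c.child_id, which only exists because of the label() call in the subquery.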

Related

SQLAlchemy: Selecting all records in one table that are not in another, related table

I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
    __tablename__ = "project_data"
    id = db.Column(db.Integer, primary_key=True)
class Label(db.Model):
    __tablename__ = "labels"
    id = db.Column(db.Integer, primary_key=True)
    data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join, or a right outer join, which is not a feature SQLAlchemy offers.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
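Both anti-join forms can be checked side by side with a small sketch. It assumes SQLAlchemy 1.4 or later, plain declarative models in place of the Flask-SQLAlchemy db.Model classes, and an in-memory SQLite database; the three sample rows are invented for the demonstration.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, exists
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class ProjectData(Base):
    __tablename__ = "project_data"
    id = Column(Integer, primary_key=True)

class Label(Base):
    __tablename__ = "labels"
    id = Column(Integer, primary_key=True)
    data_id = Column(Integer, ForeignKey("project_data.id"))

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
Base.metadata.create_all(engine)
session = Session(engine)
# Only project_data row 2 has a label, so rows 1 and 3 should come back.
session.add_all([ProjectData(id=1), ProjectData(id=2), ProjectData(id=3)])
session.add(Label(id=1, data_id=2))
session.commit()

# Variant 1: NOT EXISTS with a correlated subquery.
not_exists_query = (
    session.query(ProjectData)
    .filter(~exists().where(Label.data_id == ProjectData.id))
    .order_by(ProjectData.id)
)

# Variant 2: LEFT OUTER JOIN, keeping only rows with no match on the right.
left_join_query = (
    session.query(ProjectData)
    .outerjoin(Label, ProjectData.id == Label.data_id)
    .filter(Label.id.is_(None))
    .order_by(ProjectData.id)
)

not_exists_ids = [row.id for row in not_exists_query]
left_join_ids = [row.id for row in left_join_query]
print(not_exists_ids, left_join_ids)
```

Which variant performs better depends on the database and its planner, so it is worth comparing the query plans on real data.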
If you already know the SQL statement you want to run, you can use the text() function from SQLAlchemy and execute it directly, which is handy for complex queries:
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text
t = text("SELECT * "
         "FROM users "
         "WHERE user_id = :user_id"
         ).bindparams(user_id=user_id)
results = db.session.execute(t)
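A textual statement is run with the session's execute() method, and the bound parameter can be supplied at execution time. The sketch below assumes SQLAlchemy 1.4 or later and an in-memory SQLite database; the users table and its two rows are hypothetical and exist only to make the example runnable.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
session = Session(engine)

# Hypothetical table and data, created with textual SQL as well.
session.execute(text("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)"))
insert = text("INSERT INTO users (user_id, name) VALUES (:user_id, :name)")
session.execute(insert, {"user_id": 1, "name": "ann"})
session.execute(insert, {"user_id": 2, "name": "bob"})

# :user_id is a bound parameter, so the value is never spliced into the SQL string.
stmt = text("SELECT user_id, name FROM users WHERE user_id = :user_id")
rows = session.execute(stmt, {"user_id": 2}).fetchall()
print([tuple(row) for row in rows])
```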

Using COUNT(*) OVER() in current query with SQLAlchemy over PostgreSQL

In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
    __tablename__ = 'guest'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    surname = Column(String(50))
    email = Column(String(255))
    [..]
    deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will be using LIMIT and OFFSET, and also COUNT(*) OVER() to get the total amount of records while executing the query (not with a different query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Guest.deleted == None)
result = query.scalar()
Now the question I am asking is how to execute one single query, using SQLAlchemy, similar to the one above, that kills two birds with one stone (returns the first 50 rows and the count of the total rows to build the pagination links, all in one query).
The interesting bit is the use of window functions in PostgreSQL which allows the abovementioned behaviour, thus saving you from having to query twice but just once.
Is it possible?
Thanks in advance.
So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
Cheers!
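For reference, the same idea can be wrapped in a small pagination helper. This is only a sketch: it assumes SQLAlchemy 1.4 or later and runs against an in-memory SQLite database (which needs SQLite 3.25+ for window functions) rather than PostgreSQL. The fetch_page helper, the trimmed-down Guest model and the seven sample guests are all hypothetical.

```python
from sqlalchemy import Column, Date, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Guest(Base):
    __tablename__ = "guest"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))
    deleted = Column(Date, default=None)

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Guest(id=i, name=f"guest{i}") for i in range(1, 8)])
session.commit()

def fetch_page(session, page, per_page):
    """Return (guests, total) for a 1-based page number using a single query."""
    query = (
        session.query(Guest, func.count(Guest.id).over().label("total"))
        .filter(Guest.deleted.is_(None))
        .order_by(Guest.id.asc())
        .limit(per_page)
        .offset((page - 1) * per_page)
    )
    rows = query.all()
    # The window count is repeated on every row, so the first row is enough.
    total = rows[0].total if rows else 0
    return [row.Guest for row in rows], total

guests, total = fetch_page(session, page=2, per_page=3)
print([guest.id for guest in guests], total)
```

One caveat of this approach: when the offset lies past the last row, the query returns no rows at all, so the total is not available and the helper falls back to 0.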
P.S. I also found this question on Stack Overflow, which was unanswered.

Select the count and value of a SQLAlchemy column using HAVING

I want to select the count of all contacts with the same email address that have more than one duplicate. I can't get this query working in SQLAlchemy with PostgreSQL.
SELECT count(*), email FROM contact group by email having count(*) > 1
I tried this:
all_records = db.session.query(Contact).options(
load_only('email')).group_by(Contact.email).having(
func.count('*') > 1).all()
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "contact.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT contact.id AS contact_id, contact.email AS contact_em...
^
[SQL: 'SELECT contact.id AS contact_id, contact.email AS contact_email \nFROM contact GROUP BY contact.email \nHAVING count(%(count_1)s) > %(count_2)s'] [parameters: {'count_1': '*', 'count_2': 1}]
And I tried this:
all_records = db.session.query(func.count(Contact.id)).options(
load_only('email')).group_by(Contact.email).having(
func.count('*') > 1).all()
sqlalchemy.exc.ArgumentError
sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity. Use Load(ent) to specify specific entities.
It works correctly if I execute raw SQL:
all_records = db.session.execute(
"SELECT count(*), email FROM contact group by email"
" having count(*) > 1").fetchall()
I'm using Flask-SQLAlchemy, but here's a minimal SQLAlchemy setup to demonstrate the issue:
import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)
engine = sa.create_engine('postgresql:///example', echo=True)
Base.metadata.create_all(engine)
session = orm.Session(engine)
session.add_all((
    Contact(email='a@example.com'),
    Contact(email='b@example.com'),
    Contact(email='a@example.com'),
    Contact(email='c@example.com'),
    Contact(email='a@example.com'),
))
session.commit()
# first failed query
all_records = session.query(Contact).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()
# second failed query
all_records = session.query(sa.func.count(Contact.id)).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()
With the sample data, I expect to get one result row, 3, a@example.com.
You're not building the same query in SQLAlchemy that you're writing manually.
You want to select the count of each email that has more than one occurrence.
q = session.query(
db.func.count(Contact.email),
Contact.email
).group_by(
Contact.email
).having(
db.func.count(Contact.email) > 1
)
print(q)
SELECT count(contact.email) AS count_1, contact.email AS contact_email
FROM contact GROUP BY contact.email
HAVING count(contact.email) > %(count_2)s
The first query fails because you query the entire model, so SQLAlchemy selects all columns. You can only select grouped columns when using group_by. SQLAlchemy must always select the primary key when querying the entire model, load_only doesn't affect that.
The second query fails because load_only only works when selecting an entire model, but you're selecting an aggregate and a column.
Just select what you would in a text query:
db.session.query(func.count('*'), Contact.email).\
group_by(Contact.email).\
having(func.count('*') > 1).\
all()
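The sample data from the question can be used to confirm the result. The sketch below assumes SQLAlchemy 1.4 or later and swaps PostgreSQL for an in-memory SQLite database so that it runs without a server; the query itself follows the answers above.

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Contact(Base):
    __tablename__ = "contact"
    id = Column(Integer, primary_key=True)
    email = Column(String)

engine = create_engine("sqlite://")  # in-memory database, assumption for the demo
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Contact(email="a@example.com"),
    Contact(email="b@example.com"),
    Contact(email="a@example.com"),
    Contact(email="c@example.com"),
    Contact(email="a@example.com"),
])
session.commit()

# Select only the aggregate and the grouped column, never the whole entity,
# so every selected column is either grouped or aggregated.
duplicates = (
    session.query(func.count(Contact.id).label("occurrences"), Contact.email)
    .group_by(Contact.email)
    .having(func.count(Contact.id) > 1)
    .all()
)
print([tuple(row) for row in duplicates])
```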

SQLAlchemy Relationship / Hybrid Property to specific instance of One-To-Many

I'm trying to create a hybrid property or a relationship (either works) to pick out a single model from the "Many" side of a One-To-Many relationship.
The accepted answer for How to set one to many and one to one relationship at same time in Flask-SQLAlchemy? doesn't work for me, as I need an expression-level construct to use in additional queries.
Relevant model details are as follows:
class ItemIdentifierType(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    code = db.Column(db.String(12))
    priority = db.Column(db.Integer)
class ItemIdentifier(db.Model):
    id = db.Column(db.String(8), primary_key=True)
    type_id = db.Column(db.ForeignKey('item_identifier_type.id'))
    type = db.relationship('ItemIdentifierType')
    item_id = db.Column(db.ForeignKey('item.id'))
    item = db.relationship('Item', back_populates='identifiers')
class Item(db.Model):
    id = db.Column(db.String(8), primary_key=True)
    name = db.Column(db.String(40))
    identifiers = db.relationship('ItemIdentifier', back_populates='item', lazy='dynamic')
    @hybrid_property
    def primary_identifier(self):
        # instance level: pick the identifier whose type has the lowest priority value
        return sorted(self.identifiers, key=lambda x: x.type.priority)[0]
    @primary_identifier.expression
    def primary_identifier(cls):
        primary_identifiers = select([
            ItemIdentifier.item_id,
            ItemIdentifierType.code,
            ItemIdentifier.value
        ]).select_from(join(ItemIdentifier, ItemIdentifierType,
                            ItemIdentifier.type_id == ItemIdentifierType.id))\
            .order_by(ItemIdentifier.item_id,
                      ItemIdentifierType.priority.asc())\
            .distinct(ItemIdentifier.item_id)\
            .alias()
        # <<< psycopg2 throws the error shown below >>>
        return select([ItemIdentifierType.code, ItemIdentifier.value])\
            .select_from(primary_identifiers)\
            .where(primary_identifiers.c.item_id == cls.id)
Error this throws when attempting to use the sql expression:
(psycopg2.ProgrammingError) subquery in FROM must have an alias
LINE 2: FROM (SELECT item_identifier_type.code AS code, instru...
^
HINT: For example, FROM (SELECT ...) [AS] foo.
[SQL: 'SELECT code AS code, value AS value
FROM (SELECT item_identifier_type.code AS code, item_identifier.value AS value
FROM item_identifier_type, item_identifier, (SELECT DISTINCT item_identifier.item_id AS item_id, item_identifier.id AS id
FROM item_identifier JOIN item_identifier_type ON item_identifier.type_id = item_identifier_type.id ORDER BY item_identifier.item_id, item_identifier_type.priority ASC, item_identifier.id) AS primary_identifiers, item
WHERE primary_identifiers.item_id = item.id) ORDER BY item.name ASC']
The following query pulls out what I'm after, no problem:
SELECT
DISTINCT ON (item_identifier.item_id)
item_identifier.item_id,
item_identifier_type.code,
item_identifier.value
FROM item_identifier
JOIN item_identifier_type
ON item_identifier.type_id = item_identifier_type.id
ORDER BY
item_identifier.item_id,
item_identifier_type.priority ASC;

Put an empty field in peewee union select

I'm trying to select from two tables that have some fields in common. In a raw MySQL query, I can write this:
SELECT t1.id, t1.username, t1.date FROM table1 as 't1' UNION SELECT t2.id, "const_txt", t2.date FROM table2 as 't2'
In that query, the username field does not exist in table2, so I select the constant const_txt in its place.
In peewee, I want to union two tables in the same situation.
class PeeweeBaseModel(Model):
    class Meta:
        database = my_db
class Table1(PeeweeBaseModel):
    id = PrimaryKeyField()
    username = CharField(255)
    date = DateTimeField()
    #other fields ...
class Table2(PeeweeBaseModel):
    id = PrimaryKeyField()
    date = DateTimeField()
    #other fields ...
and then union the two models, something like this:
u = (
Table1(
Table1.id,
Table1.username,
Table1.date
).select()
|
Table2(
Table2.id,
"const_text_instead_real_field_value",
Table2.date
).select()
).select().execute()
But the constant text is not accepted as a field and is ignored in the resulting query.
The question is: how can I define a field that does not exist in my table and set its value manually in the query?
(I would prefer not to use the SQL() function.)
Thanks.
You can use SQL() in the SELECT statement. Note that the columns are passed to the select() class method rather than to the model constructor:
u = (
    Table1.select(
        Table1.id,
        Table1.username,
        Table1.date
    )
    |
    Table2.select(
        Table2.id,
        SQL(" '' AS username "),
        Table2.date
    )
).execute()
