SQLAlchemy not building JOIN on select correctly - python

I'm trying to translate the following query into a SQLAlchemy ORM query:
SELECT applications.*,
appversions.version
FROM applications
JOIN appversions
ON appversions.id = (SELECT id
FROM appversions
WHERE appversions.app_id = applications.id
ORDER BY sort_ver DESC
LIMIT 1)
The model for the tables are as follows:
Base = declarative_base()
class Application(Base):
__tablename__ = 'applications'
id = Column(Integer, primary_key = True)
group = Column(Unicode(128))
artifact = Column(Unicode(128))
versions = relationship("AppVersion", backref = "application")
class AppVersion(Base):
__tablename__ = 'versions'
id = Column(Integer, primary_key = True)
app_id = Column(Integer, ForeignKey('applications.id'))
version = Column(Unicode(64))
sort_ver = Column(Unicode(64))
And the query I've so far come up with is:
subquery = select([AppVersion.id]). \
where(AppVersion.app_id == Application.id). \
order_by(AppVersion.sort_ver). \
limit(1). \
alias()
query = session.query(Application). \
join(AppVersion, AppVersion.id == subquery.c.id) \
.all()
However, this is producing the following SQL statement and error:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: anon_1.id
[SQL: SELECT applications.id AS applications_id, applications."group" AS applications_group, applications.artifact AS applications_artifact
FROM applications JOIN versions ON versions.id = anon_1.id]
I have tried various different methods to produce the subquery and attempting to 'tack on' the sub-SELECT command, but without any positive impact.
Is there a way to coerce the SQLAlchemy query builder to correctly append the sub-SELECT?

With thanks to #Ilja Everilä for the nudge in the right direction, the code to generate the correct query is:
subquery = select([AppVersion.id]). \
where(AppVersion.app_id == Application.id). \
order_by(AppVersion.sort_ver). \
limit(1). \
correlate(Application)
query = session.query(Application). \
join(AppVersion, AppVersion.id == subquery) \
.all()
The main change is to use the correlate() method, which alters how SQLAlchemy constructs the subquery.
To explain why this works requires some understanding of how SQL subqueries are categorised and handled. The best explanation I have found is from https://www.geeksforgeeks.org/sql-correlated-subqueries/:
With a normal nested subquery, the inner SELECT query runs first and executes once, returning values to be used by the main query. A correlated subquery, however, executes once for each candidate row considered by the outer query. In other words, the inner query is driven by the outer query.

Related

SQLAlchemy: Selecting all records in one table that are not in another, related table

I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
__tablename__ = "project_data"
id = db.Column(db.Integer, primary_key=True)
class Label(db.Model):
__tablename__ = "labels"
id = db.Column(db.Integer, primary_key=True)
data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join, or a right outer join, which is not a feature SQLAlchemy offers.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
If you know your desired SQL statement to run, you can utilize the 'text' function from sqlalchemy in order to execute a complex query
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text
t = text("SELECT * "
"FROM users "
"where user_id=:user_id "
).params(user_id=user_id)
results = db.session.query(t)

SQLAlchemy: Query filter including extra data from association table

I am trying to build an ORM mapped SQLite database. The conception of the DB seems to work as intended but I can't seem to be able to query it properly for more complex cases. I have spent the day trying to find an existing answer to my question but nothing works. I am not sure if the issue is with my mapping, my query or both. Or if maybe querying with attributes from a many to many association table with extra data works differently.
This the DB setup:
engine = create_engine('sqlite:///')
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
class User(Base):
__tablename__ = 'users'
# Columns
id = Column('id', Integer, primary_key=True)
first = Column('first_name', String(100))
last = Column('last_name', String(100))
age = Column('age', Integer)
quality = Column('quality', String(100))
unit = Column('unit', String(100))
# Relationships
cases = relationship('UserCaseLink', back_populates='user_data')
def __repr__(self):
return f"<User(first='{self.first}', last='{self.last}', quality='{self.quality}', unit='{self.unit}')>"
class Case(Base):
__tablename__ = 'cases'
# Columns
id = Column('id', Integer, primary_key=True)
num = Column('case_number', String(100))
type = Column('case_type', String(100))
# Relationships
users = relationship('UserCaseLink', back_populates='case_data')
def __repr__(self):
return f"<Case(num='{self.num}', type='{self.type}')>"
class UserCaseLink(Base):
__tablename__ = 'users_cases'
# Columns
user_id = Column('user_id', Integer, ForeignKey('users.id'), primary_key=True)
case_id = Column('case_id', Integer, ForeignKey('cases.id'), primary_key=True)
role = Column('role', String(100))
# Relationships
user_data = relationship('User', back_populates='cases')
case_data = relationship('Case', back_populates='users')
if __name__ == '__main__':
Base.metadata.create_all()
session = Session()
and I would like to retrieve all the cases on which a particular person is working under a certain role.
So for example I want a list of all the cases a person named 'Alex' is working on as an 'Administrator'.
In other words I would like the result of this query:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
So far I have tried many things along the lines of:
cases = session.query(Case).filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')
but it is not working as I would like it to.
How can I modify the ORM mapping so that it does the joining for me and allows me to query easily (something like the query I tried)?
According to your calsses, the quivalent query for:
SELECT [cases].*,
[main].[users_cases].role
FROM [main].[cases]
INNER JOIN [main].[users_cases] ON [main].[cases].[id] = [main].[users_cases].[case_id]
INNER JOIN [main].[users] ON [main].[users].[id] = [main].[users_cases].[user_id]
WHERE [main].[users].[first_name] = 'Alex'
AND [main].[users_cases].[role] = 'Administrator';
is
cases = session.query(
Case.id, Case.num,Cas.type,
UserCaseLink.role
).filter(
(Case.id==UserCaseLink.case_id)
&(User.id==UserCaseLink.user_id)
&(User.first=='Alex')
&(UserCaseLink.role=='Administrator'
).all()
also, you can:
cases = Case.query\
.join(UserCaseLink,Case.id==UserCaseLink.case_id)\
.join(User,User.id==UserCaseLink.user_id)\
.filter( (User.first=='Alex') & (User.first=='Alex') )\
.all()
Good Luck
After comment
based in your comment, I think you want something like:
cases = Case.query\
.filter( (Case.case_data.cases.first=='Alex') & (Case.case_data.cases.first=='Alex') )\
.all()
where case_data connect between Case an UserCaseLink and cases connect between UserCaseLink and User as in your relations.
But,that case causes error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with dimpor.org_type has an attribute 'org_type_id'
The missage shows that the attributes combined in filter should belong to the table class
So I ended up having to compromise.
It seems the query cannot be aware of all the relationships present in the ORM mapping at all times. Instead I had to manually give it the path between the different classes for it to find all the data I wanted:
cases = session.query(Case)\
.join(Case.users)\
.join(UserCaseLink.user_data)\
.filter(User.first == 'Alex', UserCaseLink.role == 'Administrator')\
.all()
However, as it does not meet all the criteria for my original question (ie I still have to specify the joins), I will not mark this answer as the accepted one.

Using COUNT(*) OVER() in current query with SQLAlchemy over PostgreSQL

In a prototype application that uses Python and SQLAlchemy with a PostgreSQL database I have the following schema (excerpt):
class Guest(Base):
__tablename__ = 'guest'
id = Column(Integer, primary_key=True)
name = Column(String(50))
surname = Column(String(50))
email = Column(String(255))
[..]
deleted = Column(Date, default=None)
I want to build a query, using SQLAlchemy, that retrieves the list of guests, to be displayed in the back-office.
To implement pagination I will be using LIMIT and OFFSET, and also COUNT(*) OVER() to get the total amount of records while executing the query (not with a different query).
An example of the SQL query could be:
SELECT id, name, surname, email,
COUNT(*) OVER() AS total
FROM guest
WHERE (deleted IS NULL)
ORDER BY id ASC
LIMIT 50
OFFSET 0
If I were to build the query using SQLAlchemy, I could do something like:
query = session.query(Guest)
query = query.filter(Login.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
And if I wanted to count all the rows in the guests table, I could do something like this:
from sqlalchemy import func
query = session.query(func.count(Guest.id))
query = query.filter(Login.deleted == None)
result = query.scalar()
Now the question I am asking is how to execute one single query, using SQLAlchemy, similar to the one above, that kills two birds with one stone (returns the first 50 rows and the count of the total rows to build the pagination links, all in one query).
The interesting bit is the use of window functions in PostgreSQL which allows the abovementioned behaviour, thus saving you from having to query twice but just once.
Is it possible?
Thanks in advance.
So I could not find any examples in the SQLAlchemy documentation, but I found these functions:
count()
over()
label()
And I managed to combine them to produce exactly the result I was looking for:
from sqlalchemy import func
query = session.query(Guest, func.count(Guest.id).over().label('total'))
query = query.filter(Guest.deleted == None)
query = query.order_by(Guest.id.asc())
query = query.offset(0)
query = query.limit(50)
result = query.all()
Cheers!
P.S. I also found this question on Stack Overflow, which was unanswered.

sqlalchemy filtering by model which has multiple relation references

For example, I have Parcel model, which has sender and receiver, both are Subjects.
I'm trying to get parcels from particular sender. I don't want to use Parcel.sender.has(), because of performance, my real table is too big.
From docs:
Because has() uses a correlated subquery, its performance is not nearly as good when compared against large target tables as that of using a join.
Here is a full paste-and-run example:
from sqlalchemy import create_engine, Column, Integer, Text, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative.api import declarative_base
from sqlalchemy.orm.util import aliased
engine = create_engine('sqlite://')
Session = sessionmaker(bind=engine)
s = Session()
Base = declarative_base()
class Subject(Base):
__tablename__ = 'subject'
id = Column(Integer, primary_key=True)
name = Column(Text)
class Parcel(Base):
__tablename__ = 'parcel'
id = Column(Integer, primary_key=True)
sender_id = Column(Integer, ForeignKey('subject.id'))
receiver_id = Column(Integer, ForeignKey('subject.id'))
sender = relationship('Subject', foreign_keys=[sender_id], uselist=False, lazy='joined')
receiver = relationship('Subject', foreign_keys=[receiver_id], uselist=False, lazy='joined')
def __repr__(self):
return '<Parcel #{id} {s} -> {r}>'.format(id=self.id, s=self.sender.name, r=self.receiver.name)
# filling database
Base.metadata.create_all(engine)
p = Parcel()
p.sender, p.receiver = Subject(name='Bob'), Subject(name='Alice')
s.add(p)
s.flush()
#
# Method #1 - using `has` method - working but slow
print(s.query(Parcel).filter(Parcel.sender.has(name='Bob')).all())
So, i tried to join and filter by aliased relationship, which raised an error:
#
# Method #2 - using aliased joining - doesn't work
# I'm getting next error:
#
# sqlalchemy.exc.InvalidRequestError: Could not find a FROM clause to join from.
# Tried joining to <AliasedClass at 0x7f24b7adef98; Subject>, but got:
# Can't determine join between 'parcel' and '%(139795676758928 subject)s';
# tables have more than one foreign key constraint relationship between them.
# Please specify the 'onclause' of this join explicitly.
#
sender = aliased(Parcel.sender)
print(s.query(Parcel).join(sender).filter(sender.name == 'Bob').all())
I've found that if I specify Model with join condition instead of relation, it'll work. But final SQL query was'nt what I expect:
print(
s.query(Parcel)\
.join(Subject, Parcel.sender_id == Subject.id)\
.filter(Subject.name == 'Bob')
)
produces next SQL query:
SELECT parcel.id AS parcel_id,
parcel.sender_id AS parcel_sender_id,
parcel.receiver_id AS parcel_receiver_id,
subject_1.id AS subject_1_id,
subject_1.name AS subject_1_name,
subject_2.id AS subject_2_id,
subject_2.name AS subject_2_name
FROM parcel
JOIN subject ON parcel.sender_id = subject.id
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_2 ON subject_2.id = parcel.receiver_id
WHERE subject.name = ?
Here you can see that the subject table is being joined three times instead of two. That's because both sender and receiver relations are configured to load joined. And third join is the subject, by which I'm filtering.
I expect that final query will look like this:
SELECT parcel.id AS parcel_id,
parcel.sender_id AS parcel_sender_id,
parcel.receiver_id AS parcel_receiver_id,
subject_1.id AS subject_1_id,
subject_1.name AS subject_1_name,
subject_2.id AS subject_2_id,
subject_2.name AS subject_2_name
FROM parcel
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_2 ON subject_2.id = parcel.receiver_id
WHERE subject_1.name = ?
I believe that filtering by multiple referenced relations shouldn't be so unclear and there is better and clearer ways to do it. Please help me find it.
You have configured it in such a way that sender and reciever will always by join loaded.
You can change it and do a joinedload by hand, when you actually need both of them loaded at the same time with a join.
If you prefer to leave the definitions as they are, you can just "help" SQLAlchemy and point out that the query already has all the data for this comparison and there is no need for an additional join. For this, contains_eager option is used.
The modified query:
q = (s.query(Parcel)
.join(Parcel.sender)
.options(contains_eager(Parcel.sender))
.filter(Subject.name == 'Bob'))
And SQL it produces:
SELECT subject.id AS subject_id,
subject.name AS subject_name,
parcel.id AS parcel_id,
parcel.sender_id AS parcel_sender_id,
parcel.receiver_id AS parcel_receiver_id,
subject_1.id AS subject_1_id,
subject_1.name AS subject_1_name
FROM parcel
JOIN subject ON subject.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.receiver_id
WHERE subject.name = ?

GeoAlchemy2: find a set of Geometry items that doesn't intersect with a separate set

I have a postgis database table called tasks, mapped to a python class Task using geoalchemy2/sqlalchemy - each entry has a MultiPolygon geometry and an integer state. Collectively, entries in my database cover a geographic region. I want to select a random entry of state=0 which is not geographically adjacent to any entry of state=1.
Here's code which selects a random entry of state=0:
class Task(Base):
__tablename__ = "tasks"
id = Column(Integer, primary_key=True, index=True)
geometry = Column(Geometry('MultiPolygon', srid=4326))
state = Column(Integer, default=0)
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0)
count = taskgetter.count()
if count != 0:
atask = taskgetter.offset(random.randint(0, count-1)).first()
So far so good. But now, how to make sure that they are not adjacent to another set of entries?
Geoalchemy has a function ST_Union which can unify geometries, and ST_Disjoint which detects if they intersect or not. SO it seems I should be able to select items of state=1, union them into a single geometry, and then filter down my original query (above) to only keep the items that are disjoint to it. But I can't find a way to express this in geoalchemy. Here's one way I tried:
session = DBSession()
taskgetter = session.query(Task).filter_by(state=0) \
.filter(Task.geometry.ST_Disjoint(session.query( \
Task.geometry.ST_Union()).filter_by(state=1)))
count = taskgetter.count()
if count != 0:
atask = taskgetter.offset(random.randint(0, count-1)).first()
and it yields an error like this:
ProgrammingError: (ProgrammingError) subquery in FROM must have an alias
LINE 3: FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"...
^
HINT: For example, FROM (SELECT ...) [AS] foo.
'SELECT count(*) AS count_1
FROM (SELECT tasks.id AS tasks_id
FROM tasks, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s)
WHERE tasks.state = %(state_2)s AND ST_Disjoint(tasks.geometry, (SELECT ST_Union(tasks.geometry) AS "ST_Union_1"
FROM tasks
WHERE tasks.state = %(state_1)s))) AS anon_1' {'state_1': 1, 'state_2': 0}
A shot in the dark as I don't have the setup to test it :
This seems to be related to SQLAlchemy's subqueries more than GeoAlchemy, try to add .subquery() at the end of your subquery to generate an alias (cf : http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html#using-subqueries)
Edit :
Still using info from the linked tutorial, I think this may work :
state1 = session.query(
Task.geometry.ST_Union().label('taskunion')
).filter_by(state=1).subquery()
taskgetter = session.query(Task)\
.filter_by(state=0)
.filter(Task.geometry.ST_Disjoint(state1.c.taskunion))
Add a label to the column you're creating on your subquery to reference it in your super-query.

Categories

Resources