Converting a CTE query to SQLAlchemy ORM - python

(This is a rewritten version of a deleted question from earlier today.)
I'm using the SQLAlchemy ORM as part of a Flask app, with MySQL as a backend, and I'm trying to write a query to return a list of entries surrounding a particular entry. While I have a working query in SQL, I'm unsure how to code it in SQLA. The docs for CTE in the ORM show a very complicated example, and there aren't many other examples I can find.
For now assume a very simple table that only contains words:
class Word(db.Model):
    __tablename__ = 'word'
    id = db.Column(db.Integer, primary_key=True)
    word = db.Column(db.String(100))
If I want the 10 words before and after a given word (with an id of 73), an SQL query that does what I need is:
WITH cte AS (SELECT id, word, ROW_NUMBER() OVER (ORDER BY word) AS rownumber FROM word)
SELECT * FROM cte
WHERE rownumber >= (SELECT rownumber FROM cte WHERE cte.id = 73) - 10
AND rownumber <= (SELECT rownumber FROM cte WHERE cte.id = 73) + 10
ORDER BY rownumber;
I can't figure out the next step, however. I want to get a list of Word objects. I'd imagine that the first part of it could be something like
id = 73
rowlist = db.session.query(Word.id, db.func.row_number()).filter(Word.id == id).order_by(Word.word).cte()
but even if this is right, I don't know how to get this into the next part; I got bogged down in the aliased bits in the examples. Could someone give me a push in the right direction?
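Before translating the query, it can help to confirm what the SQL itself returns on a small data set. The following is a minimal sketch, not ORM code: it runs the windowed CTE against an in-memory SQLite database using only the standard library. The `words_around` helper, the `ranked` CTE name, and the sample words are made up for illustration, and it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Same shape as the query in the question, written with inclusive bounds so
# that up to n rows are returned on each side of the target row.
WINDOW_SQL = """
WITH ranked AS (
    SELECT id, word, ROW_NUMBER() OVER (ORDER BY word) AS rownumber
    FROM word
)
SELECT id, word
FROM ranked
WHERE rownumber BETWEEN (SELECT rownumber FROM ranked WHERE id = :target_id) - :n
                    AND (SELECT rownumber FROM ranked WHERE id = :target_id) + :n
ORDER BY rownumber
"""


def words_around(conn, target_id, n):
    """Return (id, word) rows within n alphabetical positions of target_id."""
    return conn.execute(WINDOW_SQL, {"target_id": target_id, "n": n}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT)")
sample = ["Hotel", "Charlie", "Alfa", "India", "Foxtrot", "Echo", "Bravo", "Golf", "Delta"]
conn.executemany("INSERT INTO word (word) VALUES (?)", [(w,) for w in sample])

# "Echo" was inserted sixth, so it has id 6.
context = words_around(conn, target_id=6, n=3)
print([word for _, word in context])
```

With strict comparisons (`>` and `<`) instead of inclusive bounds, the same query would return one row fewer on each side.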

This may not be the most elegant solution but it seems to be working for me:
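# Assumed imports (not shown in the original answer); needs SQLAlchemy 1.4+
import sqlalchemy as db
from pprint import pprint
from sqlalchemy.orm import declarative_base, sessionmaker
# sqlalchemy_uri is the database connection URL, defined elsewhere
engine = db.create_engine(sqlalchemy_uri)
Base = declarative_base()
class Word(Base):
    __tablename__ = "so64359277"
    id = db.Column(db.Integer, primary_key=True)
    word = db.Column(db.String(100))
    def __repr__(self):
        return f"<Word(id={self.id}, word='{self.word}')>"
Base.metadata.drop_all(engine, checkfirst=True)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()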
# test data
word_objects = []
for x in [
    "Hotel",
    "Charlie",
    "Alfa",
    "India",
    "Foxtrot",
    "Echo",
    "Bravo",
    "Golf",
    "Delta",
]:
    word_objects.append(Word(word=x))
session.add_all(word_objects)
session.commit()
# show test data with id values
pprint(session.query(Word).all())
"""console output:
[<Word(id=1, word='Hotel')>,
<Word(id=2, word='Charlie')>,
<Word(id=3, word='Alfa')>,
<Word(id=4, word='India')>,
<Word(id=5, word='Foxtrot')>,
<Word(id=6, word='Echo')>,
<Word(id=7, word='Bravo')>,
<Word(id=8, word='Golf')>,
<Word(id=9, word='Delta')>]
"""
target_word = "Echo"
num_context_rows = 3
# CTE: every word together with its position in alphabetical order
rowlist = session.query(
    Word.id,
    Word.word,
    db.func.row_number().over(order_by=Word.word).label("rownum"),
).cte("rowlist")
# row number of the target word
target_rownum = session.query(rowlist.c.rownum).filter(
    rowlist.c.word == target_word
)
# (rownum, id) pairs within num_context_rows of the target
select_subset = session.query(rowlist.c.rownum, rowlist.c.id).filter(
    db.and_(
        (rowlist.c.rownum >= target_rownum.scalar() - num_context_rows),
        (rowlist.c.rownum <= target_rownum.scalar() + num_context_rows),
    )
)
rownum_id_map = {x[0]: x[1] for x in select_subset.all()}
min_rownum = min(rownum_id_map)
max_rownum = max(rownum_id_map)
# load the Word objects in row-number order
result = []
for rownum in range(min_rownum, max_rownum + 1):
    result.append(session.query(Word).get(rownum_id_map[rownum]))
pprint(result)
"""console output:
[<Word(id=7, word='Bravo')>,
 <Word(id=2, word='Charlie')>,
 <Word(id=9, word='Delta')>,
 <Word(id=6, word='Echo')>,
 <Word(id=5, word='Foxtrot')>,
 <Word(id=8, word='Golf')>,
 <Word(id=1, word='Hotel')>]
"""

Related

Subquery to the same table in SQLAlchemy ORM

Hello SQLAlchemy masters,
I'm having trouble expressing the following SQL query with the SQLAlchemy ORM in Python:
SELECT systems.name,
(
SELECT date
FROM accounting A
WHERE A.ticker = C.ticker AND A.info = 'Trade_opened'
) AS entry,
C.*
FROM accounting C
JOIN systems ON C.system_id = systems.id
WHERE C.status = 'open'
I can't figure out how to use aliased() correctly:
H = aliased(Accounting, name='H')
C = aliased(Accounting, name='C')
his = db.session.query(H.date) \
.filter(H.ticker == C.ticker, H.info == r'Trade_opened')
sql = db.session.query(Systems.name, C, his) \
.join(Systems, C.system_id == Systems.id) \
.filter(C.status == r'Open') \
.statement
print(sql)
Can you help me, please?
I think you need:
scalar_subquery, to be able to use the subquery as a column
select_from, to be able to set the "left" side of the joins to something other than the first column (i.e. C instead of systems).
I haven't tested this with actual data, so I don't know whether it returns the right rows. It helps if you post your schema and some test data. I used Account because it has an easy plural, accounts, to set up a test.
# Assumed imports (not shown in the original answer); needs SQLAlchemy 1.4+
from sqlalchemy import Column, Date, ForeignKey, Integer, String, and_
from sqlalchemy.orm import Session, aliased, declarative_base, relationship
Base = declarative_base()
class Account(Base):
    __tablename__ = 'accounts'
    id = Column(Integer, primary_key=True)
    date = Column(Date)
    ticker = Column(String(length=200))
    info = Column(String(length=200))
    status = Column(String(length=200))
    system = relationship('System', backref='accounts')
    system_id = Column(Integer, ForeignKey('systems.id'))
class System(Base):
    __tablename__ = 'systems'
    id = Column(Integer, primary_key=True)
    name = Column(String(length=200))
with Session(engine) as session:
    C = aliased(Account, name='C')
    A = aliased(Account, name='A')
    # correlated subquery used as a column: refers to the outer alias C
    date_subq = session.query(A.date).filter(and_(A.ticker == C.ticker, A.info == 'Trade_opened')).scalar_subquery()
    q = session.query(System.name, date_subq.label('entry'), C).select_from(C).join(C.system).filter(C.status == 'open')
    print(q)
Formatted SQL:
SELECT
systems.name AS systems_name,
(SELECT "A".date
FROM accounts AS "A" WHERE "A".ticker = "C".ticker AND "A".info = %(info_1)s) AS entry,
"C".id AS "C_id",
"C".date AS "C_date",
"C".ticker AS "C_ticker",
"C".info AS "C_info",
"C".status AS "C_status",
"C".system_id AS "C_system_id"
FROM accounts AS "C"
JOIN systems ON systems.id = "C".system_id
WHERE "C".status = %(status_1)s
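To see what the correlated scalar subquery returns row by row, here is a small sketch that runs SQL of the same shape against an in-memory SQLite database with the standard library. The sample rows are invented, and only `C.ticker` is selected instead of `C.*` to keep the output short. Note that SQLite silently takes the first row if the subquery matches several; other databases may raise an error in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE systems (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        date TEXT,
        ticker TEXT,
        info TEXT,
        status TEXT,
        system_id INTEGER REFERENCES systems(id)
    );
    INSERT INTO systems (id, name) VALUES (1, 'alpha');
    -- invented sample data: AAPL has a 'Trade_opened' row, MSFT does not
    INSERT INTO accounts (date, ticker, info, status, system_id) VALUES
        ('2021-01-04', 'AAPL', 'Trade_opened', 'closed', 1),
        ('2021-01-10', 'AAPL', 'Price_update', 'open', 1),
        ('2021-02-01', 'MSFT', 'Price_update', 'open', 1);
""")

rows = conn.execute("""
    SELECT systems.name,
           (SELECT A.date
              FROM accounts AS A
             WHERE A.ticker = C.ticker AND A.info = 'Trade_opened') AS entry,
           C.ticker
      FROM accounts AS C
      JOIN systems ON C.system_id = systems.id
     WHERE C.status = 'open'
     ORDER BY C.id
""").fetchall()

for row in rows:
    print(row)
```

The subquery is evaluated once per outer row; when no matching 'Trade_opened' row exists, the `entry` column is NULL (None in Python).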

SqlAlchemy: Sub-Select with a Dual EXISTS (OR) and an additional boolean check

In SqlAlchemy I need to implement the following subquery, which runs fine in PostgreSQL. It is an OR condition consisting of 2 EXISTS, plus an additional AND block. That whole column results in a True/False boolean value.
SELECT
...
...
,
(SELECT
(
EXISTS (SELECT id from participating_ic_t pi1 where pi1.id = agreement_t_1.participating_ic_id
AND pi1.ic_nihsac = substr(ned_person_t_2.nihsac, 1, 3))
OR EXISTS (SELECT id from external_people_t ep1 where ep1.participating_ic_id = agreement_t_1.participating_ic_id
AND ep1.uniqueidentifier = ned_person_t_2.uniqueidentifier)
)
AND ned_person_t_2.current_flag = 'Y' and ned_person_t_2.inactive_date is null and ned_person_t_2.organizationalstat = 'EMPLOYEE'
) as ACTIVE_APPROVER1,
First of all, if I omit the additional AND block, the following OR-EXISTS by itself works OK:
subq1 = db_session.query(ParticipatingIcT.id).filter((ParticipatingIcT.id == agreement.participating_ic_id),
(ParticipatingIcT.ic_nihsac == func.substr(approver.nihsac, 1, 3)))
.subquery()
subq2 = db_session.query(ExternalPeopleT.id).filter((ExternalPeopleT.participating_ic_id == agreement.participating_ic_id),
(ExternalPeopleT.uniqueidentifier == approver.uniqueidentifier))
.subquery()
subqMain = db_session.query(or_(exists(subq1), exists(subq2))
.subquery()
# ...
# Now we will select from subqMain.
agreements = (db_session.query(
..,
subqMain
But the problem starts when I introduce the final AND block. Conceptually, the final result should be the following:
subqMain = db_session.query(and_(
or_(exists(subq1), exists(subq2)),
approver.current_flag == 'Y',
approver.inactive_date == None,
approver.organizationalstat == 'EMPLOYEE'))
.subquery()
But this actually emits " .. FROM APPROVER_T" right in the sub-query, whereas it should be linked to the FROM APPROVER_T of the main query at the very end. I need to avoid adding the .. FROM [table] which happens as soon as I specify the and_(..). I don't understand why it's doing that. All subqueries are specifically marked as subquery(). The alias approver is defined at the very top as approver = aliased(NedPersonT).
exists1 = db_session.query(
ParticipatingIcT.id
).filter(
(ParticipatingIcT.id == agreement.participating_ic_id),
(ParticipatingIcT.ic_nihsac == func.substr(approver.nihsac, 1, 3))
).exists() # use exists here instead of subquery
exists2 = db_session.query(
ExternalPeopleT.id
).filter(
(ExternalPeopleT.participating_ic_id == agreement.participating_ic_id),
(ExternalPeopleT.uniqueidentifier == approver.uniqueidentifier)
).exists() # use exists again instead of subquery
subqMain = db_session.query(and_(
or_(exists1, exists2),
approver.current_flag == 'Y',
approver.inactive_date == None,
approver.organizationalstat == 'EMPLOYEE')).subquery()
# ...
# Now we will select from subqMain.
agreements = (db_session.query(
OtherModel,
subqMain))
sqlalchemy.orm.Query.exists
Half Baked Example
I also use aliased here because I don't quite understand how wrapping subqueries in parentheses works. It seems to work without it, but the raw SQL is confusing to read. This is a toy example I made to see whether the SQL produced was invalid.
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True, autoincrement=True)
    name = Column(String, nullable=False)
    hobbies = relationship('Hobby', backref='user')
class Hobby(Base):
    __tablename__ = 'hobbies'
    id = Column(Integer, primary_key=True, autoincrement=True)
    user_id = Column(Integer, ForeignKey('users.id'), nullable=False)
    name = Column(String, nullable=False)
Base.metadata.create_all(engine, checkfirst=True)
with Session(engine) as session:
    user1 = User(name='user1')
    session.add(user1)
    session.add_all([
        Hobby(name='biking', user=user1),
        Hobby(name='running', user=user1),
        Hobby(name='eating', user=user1),
    ])
    user2 = User(name='user2')
    session.add(user2)
    session.add(Hobby(name='biking', user=user2))
    session.commit()
    nested_user = aliased(User)
    subq1 = session.query(Hobby.id).filter(nested_user.id == Hobby.user_id, Hobby.name.like('bik%'))
    subq2 = session.query(Hobby.id).filter(nested_user.id == Hobby.user_id, Hobby.name.like('eat%'))
    subqmain = session.query(nested_user).filter(or_(subq1.exists(), subq2.exists()), nested_user.id > 0).subquery()
    q = session.query(User).select_from(User).join(subqmain, User.id == subqmain.c.id)
    print(q)
    print([user.name for user in q.all()])
SELECT users.id AS users_id, users.name AS users_name
FROM users JOIN (SELECT users_1.id AS id, users_1.name AS name
FROM users AS users_1
WHERE ((EXISTS (SELECT 1
FROM hobbies
WHERE users_1.id = hobbies.user_id AND hobbies.name LIKE %(name_1)s)) OR (EXISTS (SELECT 1
FROM hobbies
WHERE users_1.id = hobbies.user_id AND hobbies.name LIKE %(name_2)s))) AND users_1.id > %(id_1)s) AS anon_1 ON users.id = anon_1.id
['user1', 'user2']
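For comparison with the original goal (a True/False column rather than a filter), here is a sketch of the target SQL run on an in-memory SQLite database with the standard library. The `active` column is a hypothetical stand-in for `current_flag`, and the sample rows are invented. The point is that both EXISTS subqueries and the extra condition refer to the outer row `u`, so the flag expression needs no FROM clause of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        active TEXT NOT NULL  -- hypothetical stand-in for current_flag
    );
    CREATE TABLE hobbies (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        name TEXT NOT NULL
    );
    INSERT INTO users (id, name, active) VALUES
        (1, 'user1', 'Y'), (2, 'user2', 'N'), (3, 'user3', 'Y');
    INSERT INTO hobbies (user_id, name) VALUES
        (1, 'biking'), (1, 'eating'), (2, 'biking'), (3, 'reading');
""")

# (EXISTS ... OR EXISTS ...) AND <condition on the outer row>, as one column
rows = conn.execute("""
    SELECT u.name,
           (EXISTS (SELECT 1 FROM hobbies AS h
                     WHERE h.user_id = u.id AND h.name LIKE 'bik%')
            OR EXISTS (SELECT 1 FROM hobbies AS h
                        WHERE h.user_id = u.id AND h.name LIKE 'eat%'))
           AND u.active = 'Y' AS flag
      FROM users AS u
     ORDER BY u.id
""").fetchall()

print(rows)
```

SQLite reports booleans as 1 and 0: user1 satisfies both parts, user2 has a matching hobby but is not active, and user3 is active but has no matching hobby.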
Something wasn't working with that Sub-Select so I rewrote it as a CASE statement instead. Now it's working. Also, I had to join on a couple of extra tables in the main query to facilitate this new CASE.
casePrimaryApprover = case(
[
(and_(
(agreement.approving_official_id.is_not(None)),
(approver.current_flag == 'Y'),
(approver.inactive_date.is_(None)),
(approver.organizationalstat == 'EMPLOYEE'),
or_(
(participatingIc.ic_nihsac == func.substr(approver.nihsac, 1, 3)),
(externalPeoplePrimaryApprover.id.is_not(None))
)
), 'Y')
],
else_ = 'N'
)
# Main Query
agreements = (db_session.query(
agreement.id,
...
casePrimaryApprover
...
.join(participatingIc, agreement.participating_ic_id == participatingIc.id)
.outerjoin(externalPeoplePrimaryApprover, approver.uniqueidentifier == externalPeoplePrimaryApprover.uniqueidentifier)

SQLAlchemy: Selecting all records in one table that are not in another, related table

I have two tables, ProjectData and Label, like this.
class ProjectData(db.Model):
    __tablename__ = "project_data"
    id = db.Column(db.Integer, primary_key=True)
class Label(db.Model):
    __tablename__ = "labels"
    id = db.Column(db.Integer, primary_key=True)
    data_id = db.Column(db.Integer, db.ForeignKey('project_data.id'))
What I want to do is select all records from ProjectData that are not represented in Label - basically the opposite of a join, or a right outer join, which is not a feature SQLAlchemy offers.
I have tried to do it like this, but it doesn't work.
db.session.query(ProjectData).select_from(Label).outerjoin(
ProjectData
).all()
Finding records in one table with no match in another is known as an anti-join.
You can do this with a NOT EXISTS query:
from sqlalchemy.sql import exists
stmt = exists().where(Label.data_id == ProjectData.id)
q = db.session.query(ProjectData).filter(~stmt)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
WHERE NOT (
EXISTS (
SELECT *
FROM labels
WHERE labels.data_id = project_data.id
)
)
Or by doing a LEFT JOIN and filtering for null ids in the other table:
q = (db.session.query(ProjectData)
.outerjoin(Label, ProjectData.id == Label.data_id)
.filter(Label.id == None)
)
which generates this SQL:
SELECT project_data.id AS project_data_id
FROM project_data
LEFT OUTER JOIN labels ON project_data.id = labels.data_id
WHERE labels.id IS NULL
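As a quick sanity check that the two forms are equivalent, the sketch below runs both generated statements (slightly shortened) against an in-memory SQLite database using only the standard library. The sample rows are invented: records 1 and 3 have labels, records 2 and 4 do not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_data (id INTEGER PRIMARY KEY);
    CREATE TABLE labels (
        id INTEGER PRIMARY KEY,
        data_id INTEGER REFERENCES project_data(id)
    );
    INSERT INTO project_data (id) VALUES (1), (2), (3), (4);
    INSERT INTO labels (data_id) VALUES (1), (1), (3);
""")

# Anti-join, variant 1: NOT EXISTS with a correlated subquery
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT project_data.id
      FROM project_data
     WHERE NOT EXISTS (
           SELECT 1 FROM labels WHERE labels.data_id = project_data.id)
     ORDER BY project_data.id
""")]

# Anti-join, variant 2: LEFT JOIN and keep rows with no partner
left_join_ids = [row[0] for row in conn.execute("""
    SELECT project_data.id
      FROM project_data
      LEFT OUTER JOIN labels ON project_data.id = labels.data_id
     WHERE labels.id IS NULL
     ORDER BY project_data.id
""")]

print(not_exists_ids, left_join_ids)
```

Record 1 has two labels, yet neither variant duplicates or leaks it, since only unmatched rows survive the filter.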
If you already know the SQL statement you want to run, you can use the text() function from SQLAlchemy to execute it directly, passing the parameters separately:
https://docs.sqlalchemy.org/en/13/core/sqlelement.html
from sqlalchemy import text
t = text("SELECT * "
         "FROM users "
         "WHERE user_id = :user_id")
results = db.session.execute(t, {"user_id": user_id})

Select linked objects from Query in SQLAlchemy

I have a database where I store mouse IDs and FMRI measurement sessions, the classes (with greatly reduced columns, for convenience) look as follows:
class FMRIMeasurement(Base):
    __tablename__ = "fmri_measurements"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    animal_id = Column(Integer, ForeignKey('animals.id'))
class Animal(Base):
    __tablename__ = "animals"
    id = Column(Integer, primary_key=True)
    id_eth = Column(Integer)
    fmri_measurements = relationship("FMRIMeasurement", backref="animal")
I would like to create a pandas dataframe containing all of the details of all of the FMRIMeasurements assigned to one particular animal. Selecting data from that animal works fine:
mystring = str(session.query(Animal).filter(Animal.id_eth == 1))
print(pd.read_sql_query(mystring, engine, params=[4001]))
But as soon as I try to select the FMRIMeasurements it blows up. None of the following work.
mystring = str(session.query(Animal.fmri_measurements).filter(Animal.id_eth == 1))
mystring = str(session.query(FMRIMeasurement).filter(FMRIMeasurement.animal.id_eth == 1))
mystring = str(session.query(Animal.fmri_measurements.date).filter(Animal.id_eth == 1))
I guess I'm just using SQLAlchemy wrong, but I couldn't find anything to help me with my use case in the docs (perhaps I don't know what the thing I want to do is actually called) :-/
session.query() only builds a Query object; nothing is sent to the database until you execute it, for example with .first() or .all(), which return the result set.
For instance:
sql_query = session.query(Animal.fmri_measurements).filter(Animal.id_eth == 1)
result_set = sql_query.all()  # execute the query and return the result set
for item in result_set:
    # do work with item
    # if the item changes and you want to commit the changes
    session.merge(item)
# commit changes
session.commit()
Alternatively, you can skip .all(): iterating over the Query object executes it as well.
sql_query = session.query(Animal.fmri_measurements).filter(Animal.id_eth == 1)
for item in sql_query:
    pass  # do something with item
To then get the pandas dataframe, you can run:
mystring = str(session.query(Weight))
print(pd.read_sql_query(mystring, engine))
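None of the attempts in the question spell out a join between the two tables, which is what the SQL ultimately needs. As a sketch of the target statement only (not ORM code), the following runs it against an in-memory SQLite database with the standard library. The `measurements_for` helper and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (id INTEGER PRIMARY KEY, id_eth INTEGER);
    CREATE TABLE fmri_measurements (
        id INTEGER PRIMARY KEY,
        date TEXT,
        animal_id INTEGER REFERENCES animals(id)
    );
    INSERT INTO animals (id, id_eth) VALUES (1, 4001), (2, 4002);
    INSERT INTO fmri_measurements (date, animal_id) VALUES
        ('2015-06-01', 1), ('2015-06-08', 1), ('2015-06-02', 2);
""")


def measurements_for(conn, id_eth):
    """Return all measurement rows of the animal with the given id_eth."""
    return conn.execute(
        """
        SELECT m.id, m.date, m.animal_id
          FROM fmri_measurements AS m
          JOIN animals AS a ON a.id = m.animal_id
         WHERE a.id_eth = ?
         ORDER BY m.date
        """,
        (id_eth,),
    ).fetchall()


rows = measurements_for(conn, 4001)
print(rows)
```

The filter is applied to the joined `animals` table while the selected columns come from `fmri_measurements`, which is the result the dataframe is meant to hold.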

SQLAlchemy: Converting Self-Ref JOIN, COUNT, GROUP BY SELECT

I have been struggling for a day to convert a working SQL SELECT statement into the equivalent SQLAlchemy code. It involves two tables.
A Tags table
class Tags(Base):
    __tablename__ = 't_tags'
    uid = Column(Integer, primary_key=True)
    category = Column(Enum('service', 'event', 'attribute', name='enum_tag_category'))
    name = Column(String(32))
And a table that maps them to their originating parents
class R_Incident_Tags(Base):
    __tablename__ = 'r_incident_tags'
    incident_uid = Column(String(48), ForeignKey('t_incident.uid'), primary_key=True)
    tag_uid = Column(Integer, ForeignKey('t_tags.uid'), primary_key=True)
    tag = relationship("Tags", backref="r_incident_tags")
incident_uid is a unique string to identify the parent.
The SELECT I have been struggling to represent in SQLAlchemy is as follows
SELECT DISTINCT s.name, e.name, count(e.name)
FROM "t_tags" AS s,
"t_tags" AS e,
"r_incident_tags" AS sr,
"r_incident_tags" AS er
WHERE s.category='service' AND
e.category='event' AND
e.uid = er.tag_uid AND
s.uid = sr.tag_uid AND
er.incident_uid = sr.incident_uid
GROUP BY s.name, e.name
Any assistance would be appreciated as I haven't even got close to getting something working after a whole day of effort.
Kindest Regards!
This should do the job:
s = aliased(Tags)
e = aliased(Tags)
sr = aliased(R_Incident_Tags)
er = aliased(R_Incident_Tags)
qry = (session.query(s.name, e.name, func.count(e.name)).
select_from(s, e, sr, er).
filter(s.category=='service').
filter(e.category=='event').
filter(e.uid == er.tag_uid).
filter(s.uid == sr.tag_uid).
filter(er.incident_uid == sr.incident_uid).
group_by(s.name, e.name)
)
But you could also use relationship-based JOINs instead of simple WHERE clauses.
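To illustrate what the JOIN-based form looks like at the SQL level, here is a sketch that rewrites the comma-join from the question with explicit JOINs and runs it on an in-memory SQLite database using the standard library. The tag names and incident ids are invented sample data, and the foreign key to t_incident is left out to keep the example small. DISTINCT is dropped because GROUP BY already yields one row per (service, event) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_tags (uid INTEGER PRIMARY KEY, category TEXT, name TEXT);
    CREATE TABLE r_incident_tags (
        incident_uid TEXT,
        tag_uid INTEGER REFERENCES t_tags(uid),
        PRIMARY KEY (incident_uid, tag_uid)
    );
    INSERT INTO t_tags (uid, category, name) VALUES
        (1, 'service', 'web'), (2, 'service', 'db'),
        (3, 'event', 'outage'), (4, 'event', 'slow');
    -- each incident carries one service tag and one event tag
    INSERT INTO r_incident_tags (incident_uid, tag_uid) VALUES
        ('i1', 1), ('i1', 3),
        ('i2', 1), ('i2', 3),
        ('i3', 1), ('i3', 4),
        ('i4', 2), ('i4', 3);
""")

# Self-join: pair every service tag with every event tag of the same incident
rows = conn.execute("""
    SELECT s.name, e.name, COUNT(e.name)
      FROM r_incident_tags AS sr
      JOIN t_tags AS s ON s.uid = sr.tag_uid AND s.category = 'service'
      JOIN r_incident_tags AS er ON er.incident_uid = sr.incident_uid
      JOIN t_tags AS e ON e.uid = er.tag_uid AND e.category = 'event'
     GROUP BY s.name, e.name
     ORDER BY s.name, e.name
""").fetchall()

print(rows)
```

Each alias in the ORM answer (s, e, sr, er) maps to one of the four table references here.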
