Join on a CTE in SQLAlchemy

Join on a CTE in SQLAlchemy - python

I'm trying to formulate a SQLAlchemy query that uses a CTE to build a table-like structure of an input list of tuples, and JOIN it with one of my tables (backend DB is Postgres). Conceptually, it would look like:
WITH to_compare AS (
SELECT * FROM (
VALUES
(1, 'flimflam'),
(2, 'fimblefamble'),
(3, 'pigglywiggly'),
(4, 'beepboop')
-- repeat for a couple dozen or hundred rows
) AS t (field1, field2)
)
SELECT b.field1, b.field2, b.field3
FROM my_model b
JOIN to_compare c ON (c.field1 = b.field1) AND (c.field2 = b.field2)
The goal is to see what field3 for the pair (field1, field2) in the table if it is, for a medium-sized list of (field1, field2) pairs.
In SQLAlchemy I'm trying to do it like this:
stmts = [
sa.select(
[
sa.cast(sa.literal(field1), sa.Integer).label("field1"),
sa.cast(sa.literal(field2), sa.Text).label("field2"),
]
)
if idx == 0
else sa.select([sa.literal(field1), sa.literal(field2)])
for idx, (field1, field2) in enumerate(list_of_tuples)
]
cte = sa.union_all(*stmts).cte(name="temporary_table")
already_in_db_query = db.session.query(MyModel)\
.join(cte,
cte.c.field1 == MyModel.field1,
cte.c.field2 == MyModel.field2,
).all()
But it seems like CTEs and JOINs don't play well together: the error is on the join, saying:
sqlalchemy.exc.InvalidRequestError: Don't know how to join to ; please use an ON clause to more clearly establish the left side of this join
And if I try to print the cte, it does look like a non-SQL entity:
$ from pprint import pformat
$ print(pformat(str(cte)), flush=True)
> ''
Is there a way to do this? Or a better way to achieve my goal?

The second argument to Query.join() should in this case be the full ON clause, but instead you pass 3 arguments to join(). Use and_() to combine the predicates, as is done in the raw SQL:
already_in_db_query = db.session.query(MyModel)\
.join(cte,
and_(cte.c.field1 == MyModel.field1,
cte.c.field2 == MyModel.field2),
).all()

Related

How to prevent nested queries in sqlalchemy from selecting a table again?

I wrote a query for mysql that achieved what I wanted. It's structured a bit like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c where table_a.y = table_c.y and table_b.z = table_c.z
)
)
I translated the query to sqlalchemy and the result is structured like this:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.x == table_c.z).exists()
).exists()
)
Which generates a query like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c, table_a where table_a.y = table_c.y and table_b.z = table_c.z
)
)
Note the re-selection of table_a in the innermost query - which breaks the intended functionality.
How can I stop sqlalchemy from selecting the table again in a nested query?

Tell the innermost query to correlate all except table_c:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.x == table_c.z)
.exists().correlate_except(table_c)
).exists()
)
In contrast to "auto-correlation", which only considers FROM elements from the enclosing Select, explicit correlation will consider FROM elements from any nesting level as candidates.

sqlalchemy: selecting columns referring to an aliased table name

I'm fairly new to using sqlalchemy and having some issues generating the sql code that I am looking for.
Ultimately, I'm trying to join two different subsets of table2 to table1 by using the following SQL query:
SELECT table1.date, a1.id AS name1_id, a2.id AS name2_id
FROM table1
LEFT JOIN table2 as a1
ON table1.name1 = table2.label AND table2.lookup_id = 1000
LEFT JOIN table2 as a2
ON table1.name2 = table2.label AND table2.lookup_id = 2000
Here's what I have so far using sqlalchemy:
q_generate = (
select([table1.c.date,
a1.id.label('name1_id'),
a2.id.label('name2_id')])
.select_from(table1
.outerjoin(table2.alias(name='a1'),
and_(
table2.c.lookup_id == 1000,
table1.c.name1 == table2.c.label
))
.outerjoin(table2.alias(name='a2'),
and_(
table2.c.lookup_id == 2000,
table1.c.name2== table2.c.label
))
)
)
which produces the following errors:
*NameError: name 'a1' is not defined*
Is there a special way that aliased table names must be referenced? What am I missing here? I think the error has something to do with these lines but I can't figure out how exactly to get this to work:
...
a1.id.label('name1_id'),
a2.id.label('name2_id')])
...
Thank you!

Yes, do this:
a1 = table2.alias(name='a1')
a2 = table2.alias(name='a2')
q_generate = (
select([table1.c.date,
a1.c.id.label('name1_id'),
a2.c.id.label('name2_id')])
.select_from(table1.outerjoin(a1, ...).outerjoin(a2, ...)))

Selecting a Join in sqlalchemy gives too many rows

I am trying to build a compound SQL query that builds a table from a join I have previously performed. (Using SqlAlchemy (Core part) with python3 and Postgresql 9.4)
I include here the relevant part of my python3 code. I first create "in_uuid_set" using a select with a group_by. Then I join "in_uuid_set" with "in_off_messages" to get "jn_in".
Finally, I try to build a new table "incoming" from "jn_in" by selecting and generating the wanted columns:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))\
.alias()
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
Surprisingly, I get that "incoming" has more rows than "jn_in". "incoming" has 12 rows, while "jn_in" has only 2 rows. I expect that "incoming" will have the same amount of rows (2) as "jn_in".
I also include here the SQL output the SqlAlchemy generates for "incoming":
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM in_off_messages,
(SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1,
(SELECT anon_1.remote_uuid AS anon_1_remote_uuid,
in_off_messages.msg_uuid AS in_off_messages_msg_uuid,
in_off_messages.orig_src_uuid AS in_off_messages_orig_src_uuid,
in_off_messages.src_uuid AS in_off_messages_src_uuid,
in_off_messages.dst_uuid AS in_off_messages_dst_uuid,
in_off_messages.msg_type AS in_off_messages_msg_type,
in_off_messages.date_sent AS in_off_messages_date_sent,
in_off_messages.content AS in_off_messages_content,
in_off_messages.was_read AS in_off_messages_was_read
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2) AS anon_2
Something doesn't look right for me with this SQL output, mostly because I see GROUP BY too many times. I would have expected it to show up about once, but it seems like it shows up twice here.
My guesses is that somehow some braces went out of place (In the generated SQL). I also suspect that I did something wrong with the alias() thing, though I'm not sure about it.
What should I do to get the wanted result (Same amount of rows for "jn_in" and "incoming")?

After playing with the code for a while, I found a way to fix it.
The answer was eventually related to the alias().
In order to make this work, the second alias() (Of jn_in) should be omitted, like this:
in_uuid_set = \
sa.select([in_off_messages.c.src_uuid.label('remote_uuid')])\
.select_from(in_off_messages)\
.where(in_off_messages.c.dst_uuid == local_uuid)\
.group_by(in_off_messages.c.src_uuid)\
.alias()
jn_in = in_uuid_set.join(in_off_messages,\
and_(\
in_off_messages.c.src_uuid == in_uuid_set.c.remote_uuid,\
in_off_messages.c.dst_uuid == local_uuid,\
))
# <<< The alias() is gone >>>
incoming = sa.select([\
in_off_messages.c.msg_uuid.label('msg_uuid'),\
in_uuid_set.c.remote_uuid.label('remote_uuid'),\
in_off_messages.c.msg_type.label('msg_type'),\
in_off_messages.c.date_sent.label('date_sent'),\
in_off_messages.c.content.label('content'),\
in_off_messages.c.was_read.label('was_read'),\
true().label('is_incoming')]
)\
.select_from(jn_in)
It seems, however, that the first alias() (of in_uuid_set) can not be ommited. If I try to omit it, I get this error message:
E subquery in FROM must have an alias
E LINE 2: FROM (SELECT in_off_messages.src_uuid AS remote_uuid
E ^
E HINT: For example, FROM (SELECT ...) [AS] foo.
As a generalization of this, probably if you have a select that you want to put as a clause somewhere else, then you want to alias() it, however if you have a join that you want to put as a clause, you should not alias() it.
For the sake of completeness, I include here the resulting SQL of the new code:
SELECT in_off_messages.msg_uuid AS msg_uuid,
anon_1.remote_uuid AS remote_uuid,
in_off_messages.msg_type AS msg_type,
in_off_messages.date_sent AS date_sent,
in_off_messages.content AS content,
in_off_messages.was_read AS was_read,
1 AS is_incoming
FROM (SELECT in_off_messages.src_uuid AS remote_uuid
FROM in_off_messages
WHERE in_off_messages.dst_uuid = :dst_uuid_1
GROUP BY in_off_messages.src_uuid) AS anon_1
JOIN in_off_messages
ON in_off_messages.src_uuid = anon_1.remote_uuid
AND in_off_messages.dst_uuid = :dst_uuid_2
Much shorter than the one at the question.

SQLAlchemy subquery in from clause without join

i need a little help.
I have following query and i'm, curious about how to represent it in terms of sqlalchemy.orm. Currently i'm executing it by session.execute. Its not critical for me, but i'm just curious. The thing that i'm actually don't know is how to put subquery in FROM clause (nested view) without doing any join.
select g_o.group_ from (
select distinct regexp_split_to_table(g.group_name, E',') group_
from (
select array_to_string(groups, ',') group_name
from company
where status='active'
and array_to_string(groups, ',') like :term
limit :limit
) g
) g_o
where g_o.group_ like :term
order by 1
limit :limit
I need this subquery thing because of speed issue - without limit in the most inner query function regexp_split_to_table starts to parse all data and does limit only after that. But my table is huge and i cannot afford that.
If something is not very clear, please, ask, i'll do my best)

I presume this is PostgreSQL.
To create a subquery, use subquery() method. The resulting object can be used as if it were Table object. Here's how your query would look like in SQLAlchemy:
subq1 = session.query(
func.array_to_string(Company.groups, ',').label('group_name')
).filter(
(Company.status == 'active') &
(func.array_to_string(Company.groups, ',').like(term))
).limit(limit).subquery()
subq2 = session.query(
func.regexp_split_to_table(subq1.c.group_name, ',')
.distinct()
.label('group')
).subquery()
q = session.query(subq2.c.group).\
filter(subq2.c.group.like(term)).\
order_by(subq2.c.group).\
limit(limit)
However, you could avoid one subquery by using unnest function instead of converting array to string with arrayt_to_string and then splitting it with regexp_split_to_table:
subq = session.query(
func.unnest(Company.groups).label('group')
).filter(
(Company.status == 'active') &
(func.array_to_string(Company.groups, ',').like(term))
).limit(limit).subquery()
q = session.query(subq.c.group.distinct()).\
filter(subq.c.group.like(term)).\
order_by(subq.c.group).\
limit(limit)

format a string with Python from list elements

I have a list that contains two elements like this :
tr = ['table1', 'table2']
I would like to be able to generate a part of a query and get this :
table1 INNER JOIN table2 ON table1.id = table2.id
How can I do this please in Python ?
Any help would be appreciated.
EDIT :
Here is what I've tried to produce table1 INNER JOIN table2:
join_tables = ('%s LEFT JOIN %s'.format(' '.join('%s' for _ in range(element -1))) for element in tr)

"{0} INNER JOIN {1} ON {0}.id = {1}.id".format("table1", "table2")
Edit
If this was a legitimate example, you need to use cursor.execute("{0} INNER JOIN {1} ON {0}.id = {1}.id", ("table1", "table2")) for MySQL, or cursor.mogrify(...) for Postgres to properly escape the table names and prevent SQL injection.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Join on a CTE in SQLAlchemy - python

Related

How to prevent nested queries in sqlalchemy from selecting a table again?

sqlalchemy: selecting columns referring to an aliased table name

Selecting a Join in sqlalchemy gives too many rows

SQLAlchemy subquery in from clause without join

format a string with Python from list elements

Categories

Resources