How to prevent nested queries in sqlalchemy from selecting a table again? - python

I wrote a query for mysql that achieved what I wanted. It's structured a bit like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c where table_a.y = table_c.y and table_b.z = table_c.z
)
)
I translated the query to sqlalchemy and the result is structured like this:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.x == table_c.z).exists()
).exists()
)
Which generates a query like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c, table_a where table_a.y = table_c.y and table_b.z = table_c.z
)
)
Note the re-selection of table_a in the innermost query - which breaks the intended functionality.
How can I stop sqlalchemy from selecting the table again in a nested query?

Tell the innermost query to correlate all except table_c:
session.query(table_a).filter(
session.query(table_b).filter(table_a.x == table_b.x).filter(
session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.x == table_c.z)
.exists().correlate_except(table_c)
).exists()
)
In contrast to "auto-correlation", which only considers FROM elements from the enclosing Select, explicit correlation will consider FROM elements from any nesting level as candidates.

Related

How to iteratively create UNION ALL SQL statement using Python?

I am connecting to Snowflake to query row count data of view table from Snowflake. I am also querying metadata related to View table. My Query looks like below. I was wondering if I can iterate through UNION ALL statement using python ? When I try to run my below query I received an error that says "view_table_3" does not exist.
Thanks in advance for your time and efforts!
Query to get row count for Snowflake view table (with metadata)
view_tables=['view_table1','view_table2','view_table3','view_table4']
print(f""" SELECT * FROM (SELECT TABLE_SCHEMA,TABLE_NAME,CREATED,LAST_ALTERED FROM SCHEMA='INFORMATION_SCHEMA.VIEWS' WHERE TABLE_SCHEMA='MY_SCHEMA' AND TABLE_NAME IN ({','.join("'" +x+ "'" for x in view_tables)})) t1
LEFT JOIN
(SELECT 'view_table1' table_name2, count(*) as view_row_count from MY_DB.SCHEMA.view_table1
UNION ALL SELECT {','.join("'" +x+ "'" for x in view_tables[1:])},count(*) as view_row_count from MY_DB.SCHEMA.{','.join("" +x+ "" for x.replace("'"," ") in view_tables)})t2
on t1.TABLE_NAME =t2.table_name2 """)
If you want to make a union dynamically, put the entire SELECT query inside the generator, and then join them with ' UNION '.
sql = f'''SELECT * FROM INFORMATION_SCHEMA.VIEWS AS v
LEFT JOIN (
{' UNION '.join(f"SELECT '{table}' AS table_name2, COUNT(*) AS view_row_count FROM MY_SCHEMA.{table}" for table in view_tables)}
) AS t2 ON v.TABLE_NAME = t2.table_name2
WHERE v.TABLE_NAME IN ({','.join(f"'{table}'" for table in view_tables)})
'''
print(sql);

Join on a CTE in SQLAlchemy

I'm trying to formulate a SQLAlchemy query that uses a CTE to build a table-like structure of an input list of tuples, and JOIN it with one of my tables (backend DB is Postgres). Conceptually, it would look like:
WITH to_compare AS (
SELECT * FROM (
VALUES
(1, 'flimflam'),
(2, 'fimblefamble'),
(3, 'pigglywiggly'),
(4, 'beepboop')
-- repeat for a couple dozen or hundred rows
) AS t (field1, field2)
)
SELECT b.field1, b.field2, b.field3
FROM my_model b
JOIN to_compare c ON (c.field1 = b.field1) AND (c.field2 = b.field2)
The goal is to see what field3 for the pair (field1, field2) in the table if it is, for a medium-sized list of (field1, field2) pairs.
In SQLAlchemy I'm trying to do it like this:
stmts = [
sa.select(
[
sa.cast(sa.literal(field1), sa.Integer).label("field1"),
sa.cast(sa.literal(field2), sa.Text).label("field2"),
]
)
if idx == 0
else sa.select([sa.literal(field1), sa.literal(field2)])
for idx, (field1, field2) in enumerate(list_of_tuples)
]
cte = sa.union_all(*stmts).cte(name="temporary_table")
already_in_db_query = db.session.query(MyModel)\
.join(cte,
cte.c.field1 == MyModel.field1,
cte.c.field2 == MyModel.field2,
).all()
But it seems like CTEs and JOINs don't play well together: the error is on the join, saying:
sqlalchemy.exc.InvalidRequestError: Don't know how to join to ; please use an ON clause to more clearly establish the left side of this join
And if I try to print the cte, it does look like a non-SQL entity:
$ from pprint import pformat
$ print(pformat(str(cte)), flush=True)
> ''
Is there a way to do this? Or a better way to achieve my goal?
The second argument to Query.join() should in this case be the full ON clause, but instead you pass 3 arguments to join(). Use and_() to combine the predicates, as is done in the raw SQL:
already_in_db_query = db.session.query(MyModel)\
.join(cte,
and_(cte.c.field1 == MyModel.field1,
cte.c.field2 == MyModel.field2),
).all()

Add calculated column in lateral join SQLAlchemy

I am trying to write the following PosgreSQL query in SQLAlchemy:
SELECT DISTINCT user_id
FROM
(SELECT *, (amount * usd_rate) as usd_amount
FROM transactions AS t1
LEFT JOIN LATERAL (
SELECT rate as usd_rate
FROM fx_rates fx
WHERE (fx.ccy = t1.currency) AND (t1.created_date > fx.ts)
ORDER BY fx.ts DESC
LIMIT 1
) t2 On true) AS complete_table
WHERE type = 'CARD_PAYMENT' AND usd_amount > 10
So far, I have the lateral join by using subquery in the following way:
lateral_query = session.query(fx_rates.rate.label('usd_rate')).filter(fx_rates.ccy == transactions.currency,
transactions.created_date > fx_rates.ts).order_by(desc(fx_rates.ts)).limit(1).subquery('rates_lateral').lateral('rates')
task2_query = session.query(transactions).outerjoin(lateral_query,true()).filter(transactions.type == 'CARD_PAYMENT')
print(task2_query)
This produces:
SELECT transactions.currency AS transactions_currency, transactions.amount AS transactions_amount, transactions.state AS transactions_state, transactions.created_date AS transactions_created_date, transactions.merchant_category AS transactions_merchant_category, transactions.merchant_country AS transactions_merchant_country, transactions.entry_method AS transactions_entry_method, transactions.user_id AS transactions_user_id, transactions.type AS transactions_type, transactions.source AS transactions_source, transactions.id AS transactions_id
FROM transactions LEFT OUTER JOIN LATERAL (SELECT fx_rates.rate AS usd_rate
FROM fx_rates
WHERE fx_rates.ccy = transactions.currency AND transactions.created_date > fx_rates.ts ORDER BY fx_rates.ts DESC
LIMIT %(param_1)s) AS rates ON true
WHERE transactions.type = %(type_1)s
Which print the correct lateral query,but so far I don't know how to add the calculated field (amount*usd_rate), so I can apply the distinct and where statements.
Add the required entity in the Query, give it a label, and use the result as a subquery as you've done in SQL:
task2_query = session.query(
transactions,
(transactions.amount * lateral_query.c.usd_rate).label('usd_amount')).\
outerjoin(lateral_query, true()).\
subquery()
task3_query = session.query(task2_query.c.user_id).\
filter(task2_query.c.type == 'CARD_PAYMENT',
task2_query.c.usd_amount > 10).\
distinct()
On the other hand wrapping it in a subquery should be unnecessary, since you can use the calculated USD amount in a WHERE predicate in the inner query just as well:
task2_query = session.query(transactions.user_id).\
outerjoin(lateral_query, true()).\
filter(transactions.type == 'CARD_PAYMENT',
transactions.amount * lateral_query.c.usd_rate > 10).\
distinct()

Multiple insertion of one value in sqlalchemy statement to pandas

I have constructed a sql clause where I reference the same table as a and b to compare the two geometries as a postgis command.
I would like to pass a value into the sql statement using the %s operator and read the result into a pandas dataframe using to_sql, params kwargs. Currently my code will allow for one value to be passed to one %s but i'm looking for multiple insertions of the same list of values.
I'm connecting to a postgresql database using psycopg2.
Simplified code is below
sql = """
SELECT
st_distance(a.the_geom, b.the_geom, true) AS dist
FROM
(SELECT
table.*
FROM table
WHERE id in %s) AS a,
(SELECT
table.*
FROM table
WHERE id in %s) AS b
WHERE a.nid <> b.nid """
sampList = (14070,11184)
df = pd.read_sql(sql, con=conn, params = [sampList])
Basically i'm looking to replace both %s with the sampList value in both places. The code as written will only replace the first value indicating ': list index out of range. If I adjust to having one %s and replacing the second in statement with numbers the code runs, but ultimately I would like away to repeat those values.
You dont need the subqueries, just join the table with itself:
SELECT a.*, b.* -- or whatwever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN (%s)
AND b.id IN (%s)
;
avoid repetition by using a CTE (this may be non-optimal, performance-wise)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
;
Performance-wise, I would just stick to the first version, and supply the list-argument twice. (or refer to it twice, using a FORMAT() construct)
first of all i would recommend you to use updated SQL from #wildplasser - it's much better and more efficient way to do that.
now you can do the following:
sql_ = """\
WITH zt AS (
SELECT * FROM ztable
WHERE id IN ({})
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
"""
sampList = (14070,11184)
sql = sql_.format(','.join(['?' for x in sampList]))
df = pd.read_sql(sql, con=conn, params=sampList)
dynamically generated SQL with parameters (AKA: prepared statements, bind variables, etc.):
In [27]: print(sql)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (?,?)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid

Return all values when a condition value is NULL

Suppose I have the following very simple query:
query = 'SELECT * FROM table1 WHERE id = %s'
And I'm calling it from a python sql wrapper, in this case psycopg:
cur.execute(query, (row_id))
The thing is that if row_id is None, I would like to get all the rows, but that query would return an empty table instead.
The easy way to approach this would be:
if row_id:
cur.execute(query, (row_id))
else:
cur.execute("SELECT * FROM table1")
Of course this is non idiomatic and gets unnecessarily complex with non-trivial queries. I guess there is a way to handle this in the SQL itself but couldn't find anything. What is the right way?
Try to use COALESCE function as below
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s,id)'
SELECT * FROM table1 WHERE id = %s OR %s IS NULL
But depending how the variable is forwarded to the query it might be better to make it 0 if it is None
SELECT * FROM table1 WHERE id = %s OR %s = 0

Categories

Resources