I am trying to write the following PostgreSQL query in SQLAlchemy:
SELECT DISTINCT user_id
FROM
(SELECT *, (amount * usd_rate) as usd_amount
FROM transactions AS t1
LEFT JOIN LATERAL (
SELECT rate as usd_rate
FROM fx_rates fx
WHERE (fx.ccy = t1.currency) AND (t1.created_date > fx.ts)
ORDER BY fx.ts DESC
LIMIT 1
) t2 ON true) AS complete_table
WHERE type = 'CARD_PAYMENT' AND usd_amount > 10
So far, I have the lateral join by using subquery in the following way:
lateral_query = session.query(fx_rates.rate.label('usd_rate')).\
    filter(fx_rates.ccy == transactions.currency,
           transactions.created_date > fx_rates.ts).\
    order_by(desc(fx_rates.ts)).\
    limit(1).\
    subquery('rates_lateral').\
    lateral('rates')
task2_query = session.query(transactions).\
    outerjoin(lateral_query, true()).\
    filter(transactions.type == 'CARD_PAYMENT')
print(task2_query)
This produces:
SELECT transactions.currency AS transactions_currency, transactions.amount AS transactions_amount, transactions.state AS transactions_state, transactions.created_date AS transactions_created_date, transactions.merchant_category AS transactions_merchant_category, transactions.merchant_country AS transactions_merchant_country, transactions.entry_method AS transactions_entry_method, transactions.user_id AS transactions_user_id, transactions.type AS transactions_type, transactions.source AS transactions_source, transactions.id AS transactions_id
FROM transactions LEFT OUTER JOIN LATERAL (SELECT fx_rates.rate AS usd_rate
FROM fx_rates
WHERE fx_rates.ccy = transactions.currency AND transactions.created_date > fx_rates.ts ORDER BY fx_rates.ts DESC
LIMIT %(param_1)s) AS rates ON true
WHERE transactions.type = %(type_1)s
This prints the correct lateral query, but so far I don't know how to add the calculated field (amount * usd_rate) so that I can apply the DISTINCT and WHERE clauses.
Add the required entity in the Query, give it a label, and use the result as a subquery as you've done in SQL:
task2_query = session.query(
        transactions,
        (transactions.amount * lateral_query.c.usd_rate).label('usd_amount')).\
    outerjoin(lateral_query, true()).\
    subquery()
task3_query = session.query(task2_query.c.user_id).\
    filter(task2_query.c.type == 'CARD_PAYMENT',
           task2_query.c.usd_amount > 10).\
    distinct()
On the other hand, wrapping it in a subquery should be unnecessary, since you can use the calculated USD amount in a WHERE predicate of the inner query just as well:
task2_query = session.query(transactions.user_id).\
    outerjoin(lateral_query, true()).\
    filter(transactions.type == 'CARD_PAYMENT',
           transactions.amount * lateral_query.c.usd_rate > 10).\
    distinct()
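To sanity-check the end result, here is a self-contained sketch with invented sample data. SQLite has no LATERAL, so the rate lookup is emulated with an equivalent correlated scalar subquery; table and column names follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (id INTEGER, user_id INTEGER, type TEXT,
                           currency TEXT, amount REAL, created_date TEXT);
CREATE TABLE fx_rates (ccy TEXT, ts TEXT, rate REAL);
INSERT INTO transactions VALUES
  (1, 100, 'CARD_PAYMENT', 'EUR', 20.0, '2020-01-02'),
  (2, 200, 'CARD_PAYMENT', 'EUR', 5.0,  '2020-01-02'),
  (3, 300, 'TOPUP',        'EUR', 50.0, '2020-01-02');
INSERT INTO fx_rates VALUES
  ('EUR', '2020-01-01', 1.1),
  ('EUR', '2019-12-01', 1.0);
""")

# The correlated subquery picks the latest rate preceding each transaction,
# mirroring the LEFT JOIN LATERAL ... ORDER BY ts DESC LIMIT 1 in PostgreSQL.
rows = conn.execute("""
SELECT DISTINCT user_id
FROM transactions t1
WHERE t1.type = 'CARD_PAYMENT'
  AND t1.amount * (SELECT fx.rate FROM fx_rates fx
                   WHERE fx.ccy = t1.currency AND t1.created_date > fx.ts
                   ORDER BY fx.ts DESC LIMIT 1) > 10
""").fetchall()
print(rows)  # only user 100: 20.0 * 1.1 = 22.0 > 10
```

Against PostgreSQL, the DISTINCT query above should return the same user set.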
I have this code:
query = """SELECT sp.customer_surname, sp.amount, cp.amount, sp.monthly, sp.date_
           FROM set_payment7777 sp
           JOIN customers_payments7777 cp ON cp.customer_VAT = sp.customer_VAT
           WHERE sp.date_ = (SELECT MAX(date_) FROM set_payment7777 GROUP BY customer_VAT)
           GROUP BY sp.customer_VAT"""
mycursor.execute(query)
for row in mycursor:
#do something
but I get the error:
mysql.connector.errors.DataError: 1242 (21000): Subquery returns more
than 1 row
You have several customer_VAT values, so your subquery returns more than one row. To avoid this, you can join on the subquery instead:
query = """SELECT sp.customer_surname, sp.amount, cp.amount, sp.monthly, sp.date_
FROM set_payment7777 sp
INNER JOIN customers_payments7777 cp ON cp.customer_VAT = sp.customer_VAT
INNER JOIN (
    SELECT customer_VAT, MAX(date_) AS date_
    FROM set_payment7777
    GROUP BY customer_VAT
) t ON t.customer_VAT = sp.customer_VAT AND t.date_ = sp.date_
GROUP BY sp.customer_VAT"""
In any case, your outer SELECT has no aggregate function, so you should avoid this improper use of GROUP BY. Use DISTINCT instead if you don't want repeated results:
query = """SELECT DISTINCT sp.customer_surname, sp.amount, cp.amount, sp.monthly, sp.date_
FROM set_payment7777 sp
INNER JOIN customers_payments7777 cp ON cp.customer_VAT = sp.customer_VAT
INNER JOIN (
    SELECT customer_VAT, MAX(date_) AS date_
    FROM set_payment7777
    GROUP BY customer_VAT
) t ON t.customer_VAT = sp.customer_VAT AND t.date_ = sp.date_"""
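As a hedged sketch (SQLite, made-up data), the join formulation returns one latest row per customer, whereas the original scalar subquery would fail as soon as two customers exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE set_payment7777 (customer_VAT TEXT, customer_surname TEXT,
                              amount REAL, monthly REAL, date_ TEXT);
INSERT INTO set_payment7777 VALUES
  ('VAT1', 'Smith', 10, 1, '2021-01-01'),
  ('VAT1', 'Smith', 20, 1, '2021-02-01'),
  ('VAT2', 'Jones', 30, 2, '2021-03-01');
""")

# Join each row against its own customer's latest date instead of
# comparing against a subquery that yields one row per customer.
rows = conn.execute("""
SELECT DISTINCT sp.customer_surname, sp.amount, sp.date_
FROM set_payment7777 sp
INNER JOIN (
    SELECT customer_VAT, MAX(date_) AS max_date
    FROM set_payment7777
    GROUP BY customer_VAT
) t ON t.customer_VAT = sp.customer_VAT AND t.max_date = sp.date_
ORDER BY sp.customer_surname
""").fetchall()
print(rows)  # the latest payment per customer
```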
I have a fairly heavy query in SQLAlchemy that I'm trying to optimise, but I'm struggling with the joins as it's not something I have much knowledge of. A very small test showed the SELECTs were 7x slower than the joins, so this could be quite a speed increase.
Here are the relevant tables and their relationships:
ActionInfo (id, session_id = SessionInfo.id)
SessionInfo (id)
SessionLink (info_id = SessionInfo.id, data_id = SessionData.id)
SessionData (id, key, value)
I basically want to read SessionData.value where SessionData.key equals something, from a select of ActionInfo.
Here is the current way I've been doing things:
stmt = select(
    ActionInfo.id,
    select(SessionData.value).where(
        SessionData.key == 'username',
        SessionLink.data_id == SessionData.id,
        SessionLink.info_id == ActionInfo.session_id,
    ).label('username'),
    select(SessionData.value).where(
        SessionData.key == 'country',
        SessionLink.data_id == SessionData.id,
        SessionLink.info_id == ActionInfo.session_id,
    ).label('country'),
)
In doing the above mentioned speed test, I got a single join working, but I'm obviously limited to only 1 value via this method:
stmt = select(
    ActionInfo.id,
    SessionData.value.label('country')
).filter(
    SessionData.key == 'country'
).outerjoin(SessionInfo).outerjoin(SessionLink).outerjoin(SessionData)
How would I adapt it to end up something like this?
stmt = select(
    ActionInfo.id,
    select(SessionData.value).where(SessionData.key == 'username').label('username'),
    select(SessionData.value).where(SessionData.key == 'country').label('country'),
).outerjoin(SessionInfo).outerjoin(SessionLink).outerjoin(SessionData)
If it's at all helpful, this is the join code as generated by SQLAlchemy:
SELECT action_info.id
FROM action_info LEFT OUTER JOIN session_info ON session_info.id = action_info.session_id LEFT OUTER JOIN session_link ON session_info.id = session_link.info_id LEFT OUTER JOIN session_data ON session_data.id = session_link.data_id
As a side note, I'm assuming I want a left outer join because I want to still include any records with missing SessionData records. Once I have this working though I'll test what difference an inner join makes to be sure.
The code below:
from sqlalchemy import and_, select
from sqlalchemy.orm import aliased

keys = ["username", "country", "gender"]
q = select(ActionInfo.id).join(SessionInfo)
for key in keys:
    SD = aliased(SessionData)
    SL = aliased(SessionLink)
    q = (
        q.outerjoin(SL, SessionInfo.id == SL.info_id)
        .outerjoin(SD, and_(SL.data_id == SD.id, SD.key == key))
        .add_columns(SD.value.label(key))
    )
is generic, can be extended to any number of fields, and should generate SQL similar to the following:
SELECT action_info.id,
session_data_1.value AS username,
session_data_2.value AS country,
session_data_3.value AS gender
FROM action_info
JOIN session_info ON session_info.id = action_info.session_id
LEFT OUTER JOIN session_link AS session_link_1 ON session_info.id = session_link_1.info_id
LEFT OUTER JOIN session_data AS session_data_1 ON session_link_1.data_id = session_data_1.id
AND session_data_1.key = :key_1
LEFT OUTER JOIN session_link AS session_link_2 ON session_info.id = session_link_2.info_id
LEFT OUTER JOIN session_data AS session_data_2 ON session_link_2.data_id = session_data_2.id
AND session_data_2.key = :key_2
LEFT OUTER JOIN session_link AS session_link_3 ON session_info.id = session_link_3.info_id
LEFT OUTER JOIN session_data AS session_data_3 ON session_link_3.data_id = session_data_3.id
AND session_data_3.key = :key_3
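The key detail is that each `key = :key_n` predicate sits in the ON clause rather than the WHERE clause, so a session missing one of the keys still produces a row with a NULL in that column. A minimal SQLite sketch of the generated shape (schema and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE action_info (id INTEGER, session_id INTEGER);
CREATE TABLE session_info (id INTEGER);
CREATE TABLE session_link (info_id INTEGER, data_id INTEGER);
CREATE TABLE session_data (id INTEGER, key TEXT, value TEXT);
INSERT INTO action_info VALUES (1, 10);
INSERT INTO session_info VALUES (10);
INSERT INTO session_link VALUES (10, 100);
INSERT INTO session_data VALUES (100, 'username', 'alice');
-- note: no 'country' or 'gender' rows exist for this session
""")

# One aliased pair of joins per key; the key filter lives in the ON clause,
# so missing keys become NULL columns instead of dropping the row.
rows = conn.execute("""
SELECT ai.id, sd1.value AS username, sd2.value AS country, sd3.value AS gender
FROM action_info ai
JOIN session_info si ON si.id = ai.session_id
LEFT JOIN session_link sl1 ON si.id = sl1.info_id
LEFT JOIN session_data sd1 ON sl1.data_id = sd1.id AND sd1.key = 'username'
LEFT JOIN session_link sl2 ON si.id = sl2.info_id
LEFT JOIN session_data sd2 ON sl2.data_id = sd2.id AND sd2.key = 'country'
LEFT JOIN session_link sl3 ON si.id = sl3.info_id
LEFT JOIN session_data sd3 ON sl3.data_id = sd3.id AND sd3.key = 'gender'
""").fetchall()
print(rows)  # the row survives with NULLs for the missing keys
```

One caveat: if a session has several session_link rows, the unfiltered link joins can fan rows out, so in that case a DISTINCT on top may be needed.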
I have a "multipliers" table that has a foreign key "push_id" pointing to records in the "pushes" table. This is a many to one relationship.
Some push records don't have multipliers, but others do. What I'm trying to accomplish is a SQL query that selects the latest push record which has multipliers, and then query the multipliers themselves.
Something like:
push_id = result_of("SELECT id FROM pushes ORDER BY ID DESC LIMIT 1 WHERE <multiplier record exists where push_id == id>")
multipliers = result_of("SELECT * FROM multipliers LIMIT 1 WHERE push_id == push_id")
print(multipliers)
I may also want to add a constraint on the pushes. As in I only want multipliers from a certain type of push.
Not much SQL experience here - any help appreciated.
Thanks.
UPDATE
I've tried the following:
SELECT * from
(
select m.*, p.type
from multipliers m
inner join pushes p
on m.push_id = p.id
where p.type = 'CONSTANT'
) AS res1 where res1.push_id = (
select max(push_id) from
(
select m.push_id
from res1
) AS res2
);
and I get this error:
Error Code: 1146. Table 'res1' doesn't exist
Since you are only interested in pushes that are linked to a multiplier, this can be achieved without a join between the tables. The following query based on your own attempts demonstrates the general idea:
select *
from multipliers
where push_id is not null
  and push_id = (
      select max(push_id)
      from multipliers
  )
If you want to constrain by push_type, assuming your model is normalized to have that information only within the pushes table, you are going to need a join such as:
select m.*
from multipliers m
inner join pushes p
on m.push_id = p.id
where p.type = 'Whatever push type'
and m.push_id = (
select max(push_id)
from multipliers
);
EDIT, based on the new requirement in the question to filter by push type:
You can extend the previous query with nested membership tests as follows to achieve the required result:
select m.*
from multipliers m
inner join pushes p
on m.push_id = p.id
where p.type = 'CONSTANT'
and m.push_id = (
select max(push_id)
from multipliers
where push_id in (
select push_id
from pushes
where type = 'CONSTANT'
)
);
or alternatively use a much simpler query derived from the initial one (note that pushes' key column is id, and that this variant returns nothing if the latest 'CONSTANT' push happens to have no multipliers):
select *
from multipliers
where push_id = (
    select max(id)
    from pushes
    where type = 'CONSTANT'
)
Perhaps something like this?
SELECT * FROM
multipliers INNER JOIN pushes
ON multipliers.push_id = pushes.id
ORDER BY multipliers.push_id DESC
LIMIT 1
The INNER JOIN ensures that you are selecting the data from a multiplier which has a push record, with the dataset ordered by the largest push_id.
Following your comments, for additional criteria on the "push type", you could use:
SELECT * FROM
multipliers INNER JOIN pushes
ON multipliers.push_id = pushes.id
WHERE pushes.type = 'X'
ORDER BY multipliers.push_id DESC
LIMIT 1
EDIT:
Following the update to your question, you may need to do something like this:
select m.*, p.type
from multipliers m inner join pushes p on m.push_id = p.id
where m.push_id =
(
select max(m2.push_id)
from multipliers m2 inner join pushes p2 on m2.push_id = p2.id
where p2.type = 'CONSTANT'
)
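Both answers converge on the same idea: take MAX(push_id) over multipliers (restricted to the wanted push type), not over pushes, because the newest push of that type may have no multipliers at all. A hedged SQLite sketch with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pushes (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE multipliers (id INTEGER PRIMARY KEY, push_id INTEGER, factor REAL);
INSERT INTO pushes VALUES (1, 'CONSTANT'), (2, 'OTHER'), (3, 'CONSTANT');
-- push 3 is the latest CONSTANT push but has no multipliers
INSERT INTO multipliers VALUES (10, 1, 1.5), (11, 1, 2.0), (12, 2, 9.9);
""")

# MAX over multipliers joined to CONSTANT pushes lands on push 1,
# the latest CONSTANT push that actually has multipliers.
rows = conn.execute("""
SELECT m.*
FROM multipliers m
INNER JOIN pushes p ON m.push_id = p.id
WHERE p.type = 'CONSTANT'
  AND m.push_id = (
      SELECT MAX(m2.push_id)
      FROM multipliers m2
      INNER JOIN pushes p2 ON m2.push_id = p2.id
      WHERE p2.type = 'CONSTANT'
  )
ORDER BY m.id
""").fetchall()
print(rows)  # all multipliers of push 1
```

Taking MAX(id) over pushes instead would select push 3 here and return no rows.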
I wrote a query for mysql that achieved what I wanted. It's structured a bit like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c where table_a.y = table_c.y and table_b.z = table_c.z
)
)
I translated the query to sqlalchemy and the result is structured like this:
session.query(table_a).filter(
    session.query(table_b).filter(table_a.x == table_b.x).filter(
        session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.z == table_c.z).exists()
    ).exists()
)
Which generates a query like this:
select * from table_a where exists(
select * from table_b where table_a.x = table_b.x and exists(
select * from table_c, table_a where table_a.y = table_c.y and table_b.z = table_c.z
)
)
Note the re-selection of table_a in the innermost query, which breaks the intended functionality.
How can I stop sqlalchemy from selecting the table again in a nested query?
Tell the innermost query to correlate all except table_c:
session.query(table_a).filter(
    session.query(table_b).filter(table_a.x == table_b.x).filter(
        session.query(table_c).filter(table_a.y == table_c.y).filter(table_b.z == table_c.z)
        .exists().correlate_except(table_c)
    ).exists()
)
In contrast to "auto-correlation", which only considers FROM elements from the enclosing Select, explicit correlation will consider FROM elements from any nesting level as candidates.
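For reference, the intended fully-correlated query, in which table_a appears only in the outermost FROM (exactly what correlate_except(table_c) enforces), behaves like this sketch (SQLite, invented data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (x INTEGER, y INTEGER);
CREATE TABLE table_b (x INTEGER, z INTEGER);
CREATE TABLE table_c (y INTEGER, z INTEGER);
INSERT INTO table_a VALUES (1, 1), (2, 2);
INSERT INTO table_b VALUES (1, 5), (2, 6);
INSERT INTO table_c VALUES (1, 5);   -- matches a=(1,1) via b=(1,5)
""")

# table_a is referenced, not re-selected, inside both EXISTS levels,
# so the inner predicates correlate to the outer row as intended.
rows = conn.execute("""
SELECT * FROM table_a WHERE EXISTS (
  SELECT * FROM table_b WHERE table_a.x = table_b.x AND EXISTS (
    SELECT * FROM table_c
    WHERE table_a.y = table_c.y AND table_b.z = table_c.z
  )
)
""").fetchall()
print(rows)  # only the a-row with a matching b/c chain
```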
The task is to group datetime values (using SQLAlchemy) into per-minute points (GROUP BY minute).
I have a custom SQL-query:
SELECT COUNT(*) AS point_value, MAX(time) as time
FROM `Downloads`
LEFT JOIN Mirror ON Downloads.mirror = Mirror.id
WHERE Mirror.domain_name = 'localhost.local'
AND `time` BETWEEN '2012-06-30 00:29:00' AND '2012-07-01 00:29:00'
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
ORDER BY time ASC
It works great, but now I have to do it in SQLAlchemy. This is what I've got for now (grouping by year is just an example):
rows = (DBSession.query(func.count(Download.id), func.max(Download.time)).
filter(Download.time >= fromInterval).
filter(Download.time <= untilInterval).
join(Mirror,Download.mirror==Mirror.id).
group_by(func.year(Download.time)).
order_by(Download.time)
)
It gives me this SQL:
SELECT count("Downloads".id) AS count_1, max("Downloads".time) AS max_1
FROM "Downloads" JOIN "Mirror" ON "Downloads".mirror = "Mirror".id
WHERE "Downloads".time >= :time_1 AND "Downloads".time <= :time_2
GROUP BY year("Downloads".time)
ORDER BY "Downloads".time
As you can see, it lacks only the correct grouping:
GROUP BY DAYOFYEAR( time ) , ( 60 * HOUR( time ) + MINUTE(time ))
Does SQLAlchemy have some function to group by minute?
You can use any SQL-side function from SA by means of func, which you already use for the YEAR part. I think in your case you just need to change the grouping to match your target SQL (not tested):
from sqlalchemy.sql import func
...
# change the group_by in your existing query:
rows = ...
    group_by(func.dayofyear(Download.time),
             60 * func.hour(Download.time) + func.minute(Download.time))
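As a sanity check of the per-minute bucketing itself, here is a hedged SQLite sketch; SQLite lacks DAYOFYEAR/HOUR/MINUTE, so an equivalent strftime minute key stands in for MySQL's expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE downloads (time TEXT)")
conn.executemany("INSERT INTO downloads VALUES (?)", [
    ("2012-06-30 00:29:10",),
    ("2012-06-30 00:29:55",),
    ("2012-06-30 00:30:05",),
])

# One group per calendar minute: COUNT(*) is the point value,
# MAX(time) stands in for the point's timestamp, as in the question.
rows = conn.execute("""
SELECT COUNT(*) AS point_value, MAX(time) AS time
FROM downloads
GROUP BY strftime('%Y-%m-%d %H:%M', time)
ORDER BY time ASC
""").fetchall()
print(rows)  # two downloads in minute 00:29, one in minute 00:30
```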