Error when trying to execute an SQL query - Python

How do I solve this problem? My code:
cursor.execute("""SELECT * FROM Users AS t1
INNER JOIN Users_has_Users AS t ON t.Users_id = t1.id
INNER JOIN Users AS t2 ON t.Users_id1 = t2.id
WHERE t1.email = %s AND t1.id != t2.id AND t2.id >= %s
ORDER BY t2.name {}
LIMIT 10""".format(order), (email, since_id, limit))
Error:
not all arguments converted during string formatting

You cannot use SQL parameters to interpolate anything other than data; they cannot be used for SQL keywords such as ASC, nor for the LIMIT value. That is the whole point of SQL parameters: to prevent their values from being interpreted as SQL.
Use string formatting to interpolate your sort direction and query limit instead:
cursor.execute("""SELECT * FROM Users AS t1
INNER JOIN Users_has_Users AS t ON t.Users_id = t1.id
INNER JOIN Users AS t2 ON t.Users_id1 = t2.id
WHERE t1.email = %s AND t1.id != t2.id AND t2.id >= %s
ORDER BY t2.name {}
LIMIT {}""".format(order, limit), (email, since_id))
This does assume that you have full control over the contents of order and limit; never set them from user-supplied data, as string formatting like this would otherwise open you up to a SQL injection attack.
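For example, a minimal sketch of validating those values before formatting them in (reusing cursor, email, since_id, order and limit from the question; the allowed_orders whitelist and the int() cast are illustrative, not part of the original code):
# Hypothetical guard: only format in values you have validated yourself.
allowed_orders = {'ASC', 'DESC'}
if order not in allowed_orders:
    raise ValueError('invalid sort direction: {!r}'.format(order))
limit = int(limit)  # anything non-numeric fails here instead of reaching the SQL

cursor.execute("""SELECT * FROM Users AS t1
    INNER JOIN Users_has_Users AS t ON t.Users_id = t1.id
    INNER JOIN Users AS t2 ON t.Users_id1 = t2.id
    WHERE t1.email = %s AND t1.id != t2.id AND t2.id >= %s
    ORDER BY t2.name {}
    LIMIT {}""".format(order, limit), (email, since_id))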

Related

How to replace IN in a PostgreSQL query containing a lot of parameters?

I am trying to retrieve information from a database using a Python tuple containing a set of ids (between 1000 and 10000 ids), but my query uses the IN statement and is consequently very slow.
query = """ SELECT *
FROM table1
LEFT JOIN table2 ON table1.id = table2.id
LEFT JOIN ..
LEFT JOIN ...
WHERE table1.id IN {} """.format(my_tuple)
and then I query the database with PostgreSQL to load the result into a Pandas dataframe:
with tempfile.TemporaryFile() as tmpfile:
    copy_sql = "COPY ({query}) TO STDOUT WITH CSV {head}".format(
        query=query, head="HEADER"
    )
    conn = db_engine.raw_connection()
    cur = conn.cursor()
    cur.copy_expert(copy_sql, tmpfile)
    tmpfile.seek(0)
    df = pd.read_csv(tmpfile, low_memory=False)
I know that IN is not very efficient with a high number of parameters, but I have no idea how to optimise this part of the query. Any hints?
You could debug your query using an EXPLAIN statement. You are probably reading the big table sequentially while only needing a few rows. Is the field table1.id indexed?
Alternatively, you could filter table1 first and only then start joining:
with t1 as (
select f1,f2, .... from table1 where id in {}
)
select *
from t1
left join ....
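If you want to keep the COPY-based export from the question, note that copy_expert() cannot take bind parameters, so one option is to let psycopg2 render the id tuple safely instead of using plain str.format(). A minimal sketch, assuming a psycopg2 connection conn and the id tuple my_tuple (the column names are placeholders):
cur = conn.cursor()
# psycopg2 adapts a Python tuple to a parenthesised value list, e.g. "(1000, 1001)"
id_list = cur.mogrify("%s", (my_tuple,)).decode()
query = """
    WITH t1 AS (
        SELECT id, f1, f2   -- select only the columns you actually need
        FROM table1
        WHERE id IN {}
    )
    SELECT *
    FROM t1
    LEFT JOIN table2 ON t1.id = table2.id
""".format(id_list)
The COPY ... TO STDOUT code from the question can then be reused unchanged with this query.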

Python SQL - two left joins

I have a problem with an SQL query in Python that I hope you can help me with - I'm trying to retrieve some data from WordPress/WooCommerce.
My code:
cursor.execute("
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2
ON t1.ID = t2.post_id
WHERE t2.meta_key = '_billing_first_name' and t2.post_id = t1.ID
LEFT JOIN test_postmeta t3
ON t1.ID = t3.post_id
WHERE t3.meta_key = '_billing_last_name' and t3.post_id = t1.ID
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20")
I'm getting the following error:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LEFT JOIN test_postmeta t3 ON t1.ID = t3.post_id WHERE t3.meta_key = '_billing' at line 1
What am I doing wrong?
Thanks in advance.
There should be only one WHERE clause, placed before GROUP BY.
But since you use LEFT joins, putting a condition on the right table such as t2.meta_key = '_billing_first_name' in the WHERE clause turns the join into an INNER join, because unmatched rows are rejected.
So put all those conditions in the ON clauses instead:
cursor.execute("
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2
ON t1.ID = t2.post_id AND t2.meta_key = '_billing_first_name'
LEFT JOIN test_postmeta t3
ON t1.ID = t3.post_id AND t3.meta_key = '_billing_last_name'
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20")
Although this query may be syntactically valid in MySQL, it does not make sense to use GROUP BY here, since you do not perform any aggregation.
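For reference, a sketch of that query with the GROUP BY dropped (same tables and meta keys as in the question):
cursor.execute("""
    SELECT t1.ID, t1.post_date,
           t2.meta_value AS first_name,
           t3.meta_value AS last_name
    FROM test_posts t1
    LEFT JOIN test_postmeta t2
           ON t1.ID = t2.post_id AND t2.meta_key = '_billing_first_name'
    LEFT JOIN test_postmeta t3
           ON t1.ID = t3.post_id AND t3.meta_key = '_billing_last_name'
    ORDER BY t1.post_date DESC
    LIMIT 20""")
rows = cursor.fetchall()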
Your SQL syntax is incorrect. Try this:
cursor.execute("
SELECT t1.ID, t1.post_date, t2.meta_value AS first_name, t3.meta_value AS last_name
FROM test_posts t1
LEFT JOIN test_postmeta t2 ON t1.ID = t2.post_id
LEFT JOIN test_postmeta t3 ON t1.ID = t3.post_id
WHERE t3.meta_key = '_billing_last_name' and t2.meta_key = '_billing_first_name'
GROUP BY t1.ID
ORDER BY t1.post_date DESC LIMIT 20")
It might be worth reading a little bit about SQL Joins and WHERE statements.

Add calculated column in lateral join SQLAlchemy

I am trying to write the following PosgreSQL query in SQLAlchemy:
SELECT DISTINCT user_id
FROM
(SELECT *, (amount * usd_rate) as usd_amount
FROM transactions AS t1
LEFT JOIN LATERAL (
SELECT rate as usd_rate
FROM fx_rates fx
WHERE (fx.ccy = t1.currency) AND (t1.created_date > fx.ts)
ORDER BY fx.ts DESC
LIMIT 1
) t2 On true) AS complete_table
WHERE type = 'CARD_PAYMENT' AND usd_amount > 10
So far, I have built the lateral join by using a subquery in the following way:
lateral_query = session.query(fx_rates.rate.label('usd_rate')).\
    filter(fx_rates.ccy == transactions.currency,
           transactions.created_date > fx_rates.ts).\
    order_by(desc(fx_rates.ts)).\
    limit(1).\
    subquery('rates_lateral').\
    lateral('rates')
task2_query = session.query(transactions).outerjoin(lateral_query,true()).filter(transactions.type == 'CARD_PAYMENT')
print(task2_query)
This produces:
SELECT transactions.currency AS transactions_currency, transactions.amount AS transactions_amount, transactions.state AS transactions_state, transactions.created_date AS transactions_created_date, transactions.merchant_category AS transactions_merchant_category, transactions.merchant_country AS transactions_merchant_country, transactions.entry_method AS transactions_entry_method, transactions.user_id AS transactions_user_id, transactions.type AS transactions_type, transactions.source AS transactions_source, transactions.id AS transactions_id
FROM transactions LEFT OUTER JOIN LATERAL (SELECT fx_rates.rate AS usd_rate
FROM fx_rates
WHERE fx_rates.ccy = transactions.currency AND transactions.created_date > fx_rates.ts ORDER BY fx_rates.ts DESC
LIMIT %(param_1)s) AS rates ON true
WHERE transactions.type = %(type_1)s
This prints the correct lateral query, but so far I don't know how to add the calculated field (amount * usd_rate) so that I can apply the DISTINCT and WHERE statements.
Add the required entity in the Query, give it a label, and use the result as a subquery as you've done in SQL:
task2_query = session.query(
        transactions,
        (transactions.amount * lateral_query.c.usd_rate).label('usd_amount')).\
    outerjoin(lateral_query, true()).\
    subquery()
task3_query = session.query(task2_query.c.user_id).\
    filter(task2_query.c.type == 'CARD_PAYMENT',
           task2_query.c.usd_amount > 10).\
    distinct()
On the other hand wrapping it in a subquery should be unnecessary, since you can use the calculated USD amount in a WHERE predicate in the inner query just as well:
task2_query = session.query(transactions.user_id).\
    outerjoin(lateral_query, true()).\
    filter(transactions.type == 'CARD_PAYMENT',
           transactions.amount * lateral_query.c.usd_rate > 10).\
    distinct()
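Either way, you can print the query to check the emitted SQL and then fetch the distinct user ids (a quick sanity check, assuming the session and models defined above):
print(task2_query)  # shows the SELECT DISTINCT ... LEFT OUTER JOIN LATERAL ... statement
user_ids = [row.user_id for row in task2_query.all()]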

Multiple insertion of one value in sqlalchemy statement to pandas

I have constructed an SQL clause in which I reference the same table as a and b to compare the two geometries with a PostGIS command.
I would like to pass a value into the SQL statement using the %s placeholder and read the result into a pandas dataframe using read_sql and its params kwarg. Currently my code allows one value to be passed to one %s, but I'm looking for multiple insertions of the same list of values.
I'm connecting to a postgresql database using psycopg2.
Simplified code is below
sql = """
    SELECT
        st_distance(a.the_geom, b.the_geom, true) AS dist
    FROM
        (SELECT
             table.*
         FROM table
         WHERE id in %s) AS a,
        (SELECT
             table.*
         FROM table
         WHERE id in %s) AS b
    WHERE a.nid <> b.nid """
sampList = (14070,11184)
df = pd.read_sql(sql, con=conn, params = [sampList])
Basically I'm looking to replace both %s placeholders with the sampList value. The code as written only replaces the first one, failing with 'list index out of range'. If I adjust it to a single %s and replace the second IN statement with literal numbers, the code runs, but ultimately I would like a way to repeat those values.
You don't need the subqueries; just join the table with itself:
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM ztable a
JOIN ztable b ON a.nid < b.nid
WHERE a.id IN (%s)
AND b.id IN (%s)
;
Avoid repetition by using a CTE (this may be non-optimal, performance-wise):
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
;
Performance-wise, I would just stick to the first version and supply the list argument twice (or refer to it twice, using a FORMAT() construct).
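A sketch of that first approach with pandas (reusing conn from the question; note that psycopg2 adapts a Python tuple to a parenthesised value list for IN, so the explicit parentheses around %s are dropped here):
sql = """
    SELECT a.*, b.*,
           st_distance(a.the_geom, b.the_geom, true) AS dist
    FROM ztable a
    JOIN ztable b ON a.nid < b.nid
    WHERE a.id IN %s
      AND b.id IN %s
"""
sampList = (14070, 11184)
# the same tuple is simply supplied once per placeholder
df = pd.read_sql(sql, con=conn, params=[sampList, sampList])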
First of all, I would recommend using the updated SQL from @wildplasser - it's a much better and more efficient way to do it.
Now you can do the following:
sql_ = """\
WITH zt AS (
    SELECT * FROM ztable
    WHERE id IN ({})
)
SELECT a.*, b.* -- or whatever
     , st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid
"""
sampList = (14070, 11184)
# psycopg2 / PostgreSQL expect %s-style placeholders, one per id
sql = sql_.format(','.join(['%s' for x in sampList]))
df = pd.read_sql(sql, con=conn, params=sampList)
Dynamically generated SQL with parameters (AKA prepared statements, bind variables, etc.):
In [27]: print(sql)
WITH zt AS (
SELECT * FROM ztable
WHERE id IN (%s,%s)
)
SELECT a.*, b.* -- or whatever
, st_distance(a.the_geom, b.the_geom, true) AS dist
FROM zt a
JOIN zt b ON a.nid < b.nid

Return all values when a condition value is NULL

Suppose I have the following very simple query:
query = 'SELECT * FROM table1 WHERE id = %s'
And I'm calling it from a python sql wrapper, in this case psycopg:
cur.execute(query, (row_id,))
The thing is that if row_id is None, I would like to get all the rows, but that query would return an empty table instead.
The easy way to approach this would be:
if row_id:
    cur.execute(query, (row_id,))
else:
    cur.execute("SELECT * FROM table1")
Of course this is non-idiomatic and gets unnecessarily complex with non-trivial queries. I guess there is a way to handle this in the SQL itself, but I couldn't find anything. What is the right way?
Try using the COALESCE function as below:
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s,id)'
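With that, the call stays the same whether row_id is a value or None (a minimal sketch; note that rows whose id is NULL are still not returned, since id = id is NULL for them):
cur.execute(query, (row_id,))  # row_id = None matches every row with a non-NULL id
rows = cur.fetchall()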
SELECT * FROM table1 WHERE id = %s OR %s IS NULL
But depending on how the variable is forwarded to the query, it might be better to make it 0 when it is None:
SELECT * FROM table1 WHERE id = %s OR %s = 0
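In either variant the parameter appears twice, so it has to be bound twice (a minimal sketch using the IS NULL form):
query = 'SELECT * FROM table1 WHERE id = %s OR %s IS NULL'
cur.execute(query, (row_id, row_id))
rows = cur.fetchall()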
