SQLAlchemy SQL parameter list substitution with pyodbc - python

I am trying to bind a list to a parameter in a raw SQL query in sqlalchemy. This post suggests a great way to do so with psycopg2 as below.
some_ids = [1, 2, 3, 4]
query = "SELECT * FROM my_table WHERE id = ANY(:ids);"
engine.execute(sqlalchemy.sql.text(query), ids=some_ids)
However, this does not seem to work in my environment (SQL Server with pyodbc): only one "?" placeholder gets added instead of four.
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError)
('Invalid parameter type. param-index=0 param-type=tuple', 'HY105')
[SQL: 'SELECT * FROM my_table WHERE id = ANY(?);'] [parameters: ((1, 2, 3, 4),)]
Is there any way to make this work? I would like to avoid manually creating placeholders if possible.
sqlalchemy version=1.0.13, pyodbc version=4.0.16

This seems to be working for me:
from sqlalchemy import create_engine
from sqlalchemy.sql import table, column, select, literal_column
engine = create_engine('mssql+pyodbc://SQLmyDb')
cnxn = engine.connect()
some_ids = [2, 3]
sql = select([literal_column('*')]).where(column('ID').in_(some_ids)).select_from(table('People'))
print(sql)
print('')
# the values are already bound by .in_(some_ids), so there is no
# need to build a separate parameter dict for execute()
rows = cnxn.execute(sql).fetchall()
print(rows)
which prints the generated SQL statement, followed by the results of the query:
SELECT *
FROM "People"
WHERE "ID" IN (:ID_1, :ID_2)
[(2, 'Anne', 'Elk'), (3, 'Gord', 'Thompson')]
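As an aside: if upgrading is an option, SQLAlchemy 1.2 introduced "expanding" bind parameters, which render one placeholder per list element at execution time, so the raw-text form from the question works as-is. A minimal sketch reusing the cnxn connection from above (untested on the exact versions in the question, which predate 1.2):
import sqlalchemy as sa

some_ids = [1, 2, 3, 4]
# expanding=True tells SQLAlchemy to render one placeholder per element
query = sa.text("SELECT * FROM my_table WHERE id IN :ids").bindparams(
    sa.bindparam('ids', expanding=True))
rows = cnxn.execute(query, {'ids': some_ids}).fetchall()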

Related

SQLAlchemy returns script parameters instead of script results

I'm trying to use SQLAlchemy 1.4 to query an Oracle database:
SELECT MAX(:created_date) FROM EXAMPLE_TABLE
This should return some kind of datetime object.
When I execute this code (roughly) using SQLAlchemy, it decides to return the value of the :created_date parameter instead.
from sqlalchemy import create_engine as SQLAlchemyEngine
from sqlalchemy import text as SQLAlchemyText
with SQLAlchemyEngine("example").connect() as sqlalchemy_connection:
    with open(sql_script_path) as sql_script:
        sql_query = SQLAlchemyText(sql_script.read())
    parameters = {"created_date": "blah"}
    result = sqlalchemy_connection.execute(sql_query, parameters)
    for row in result:
        print(row)
Why is the result (literally) "blah"?
EDIT: See snakecharmerb's answer below; you can't bind column or table names with SQLAlchemy. It's a bit hidden in the SQLAlchemy docs:
Binding Column and Table Names
Column and table names cannot be bound in SQL queries. You can concatenate text to build up a SQL statement, but make sure you use an Allow List or other means to validate the data in order to avoid SQL Injection security issues.
The behaviour described in the question is consistent with trying to set column names in the query using parameter binding. For example given this initial setup:
import datetime
import sqlalchemy as sa

engine = sa.create_engine('sqlite://', future=True)
tbl = sa.Table(
    't',
    sa.MetaData(),
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('created_date', sa.Date),
)
tbl.create(engine)
dates = [datetime.date(2022, 9, 30), datetime.date(2022, 10, 1)]
with engine.begin() as conn:
    conn.execute(tbl.insert(), [{'created_date': d} for d in dates])
del tbl
Then executing this code:
q = 'SELECT :created_date FROM t'
with engine.connect() as conn:
    rows = conn.execute(sa.text(q), {'created_date': 'blah'})
    for row in rows:
        print(row)
results in this output:
('blah',)
('blah',)
The problem is that parameter binding is designed for values, not identifiers like column or table names. To use dynamic identifiers you need to either construct the query string manually, or consider using SQLAlchemy's reflection capabilities to build the query using objects:
# Reflect the table.
reflected_tbl = sa.Table('t', sa.MetaData(), autoload_with=engine)
# Specify the column(s) to be selected.
q2 = sa.select(reflected_tbl.c['created_date'])
with engine.connect() as conn:
    rows = conn.execute(q2)
    for row in rows:
        print(row)
outputs:
(datetime.date(2022, 9, 30),)
(datetime.date(2022, 10, 1),)
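If a user-supplied column name is genuinely required, the "Allow List" approach from the docs quoted above can be combined with the same setup. A minimal sketch (ALLOWED_COLUMNS and select_max are names I've made up for illustration):
# hypothetical allow list: only these identifiers may be interpolated
ALLOWED_COLUMNS = {'id', 'created_date'}

def select_max(conn, col_name):
    if col_name not in ALLOWED_COLUMNS:
        raise ValueError(f'column not allowed: {col_name!r}')
    # safe to interpolate: col_name comes from a fixed allow list
    return conn.execute(sa.text(f'SELECT MAX({col_name}) FROM t')).scalar()

with engine.connect() as conn:
    print(select_max(conn, 'created_date'))  # '2022-10-01' (SQLite stores dates as ISO text)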

Alter query according to user selection in sqlite python

I have a sqlite database named StudentDB which has 3 columns: Roll Number, Name, Marks. Now I want to fetch only the columns that the user selects in the IDE. The user can select one column, two, or all three. How can I alter the query accordingly using Python?
I tried:
import sqlite3
sel={"Roll Number":12}
query = 'select * from StudentDB Where({seq})'.format(seq=','.join(['?']*len(sel))),[i for k,i in sel.items()]
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query)
all_data = cur.fetchall()
all_data
I am getting:
operation parameter must be str
You should control the text of the query. The where clause should always be in the form WHERE colname=value [AND colname2=...] or (better) WHERE colname=? [AND ...] if you want to build a parameterized query.
So you want:
query = 'select * from StudentDB Where ' + ' AND '.join(
    '"{}"=?'.format(col) for col in sel.keys())
...
cur.execute(query, tuple(sel.values()))
In your code, query is a tuple instead of a str, which is why you get the error.
I assume you want to execute a query like below -
select * from StudentDB Where "Roll number"=?
Then you can change the SQL query like this (assuming you want AND and not OR) -
query = "select * from StudentDB Where {seq}".format(seq=" and ".join('"{}"=?'.format(k) for k in sel.keys()))
and execute the query like -
cur.execute(query, tuple(sel.values()))
Please make sure that in your code the variable database is defined and contains the database name, and that StudentDB is indeed the table name and not the database name.
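Putting the pieces together, a minimal runnable version of the fix might look like this (the students.db file name is just a placeholder for your database):
import sqlite3

sel = {"Roll Number": 12}
query = 'select * from StudentDB where ' + ' and '.join(
    '"{}"=?'.format(col) for col in sel)

con = sqlite3.connect('students.db')  # substitute your database file
cur = con.cursor()
cur.execute(query, tuple(sel.values()))
all_data = cur.fetchall()
con.close()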

Multi-row UPSERT (INSERT or UPDATE) from Python

I am currently executing the simple query below with Python, using pyodbc to insert data into a SQL Server table:
import pyodbc
table_name = 'my_table'
insert_values = [(1,2,3),(2,2,4),(3,4,5)]
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(
    ' '.join([
        'insert into',
        table_name,
        'values',
        ','.join(
            [str(i) for i in insert_values]
        )
    ])
)
cursor.commit()
This should work as long as there are no duplicate keys (let's assume the first column contains the key). However for data with duplicate keys (data already existing in the table) it will raise an error.
How can I, in one go, insert multiple rows into a SQL Server table using pyodbc such that data with duplicate keys simply gets updated?
Note: There are solutions proposed for single rows of data, however, I would like to insert multiple rows at once (avoid loops)!
This can be done using MERGE. Let's say you have a key column ID, and two columns col_a and col_b (you need to specify column names in update statements), then the statement would look like this:
MERGE INTO MyTable as Target
USING (SELECT * FROM
(VALUES (1, 2, 3), (2, 2, 4), (3, 4, 5))
AS s (ID, col_a, col_b)
) AS Source
ON Target.ID=Source.ID
WHEN NOT MATCHED THEN
INSERT (ID, col_a, col_b) VALUES (Source.ID, Source.col_a, Source.col_b)
WHEN MATCHED THEN
UPDATE SET col_a=Source.col_a, col_b=Source.col_b;
You can give it a try on rextester.com/IONFW62765.
Basically, I'm creating a Source table "on-the-fly" using the list of values, which you want to upsert. When you then merge the Source table with the Target, you can test the MATCHED condition (Target.ID=Source.ID) on each row (whereas you would be limited to a single row when just using a simple IF <exists> INSERT (...) ELSE UPDATE (...) condition).
In python with pyodbc, it should probably look like this:
import pyodbc
insert_values = [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
table_name = 'my_table'
key_col = 'ID'
col_a = 'col_a'
col_b = 'col_b'
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(('MERGE INTO {table_name} as Target '
                'USING (SELECT * FROM '
                '(VALUES {vals}) '
                'AS s ({k}, {a}, {b}) '
                ') AS Source '
                'ON Target.{k}=Source.{k} '
                'WHEN NOT MATCHED THEN '
                'INSERT ({k}, {a}, {b}) VALUES (Source.{k}, Source.{a}, Source.{b}) '
                'WHEN MATCHED THEN '
                'UPDATE SET {a}=Source.{a}, {b}=Source.{b};')
               .format(table_name=table_name,
                       vals=','.join([str(i) for i in insert_values]),
                       k=key_col,
                       a=col_a,
                       b=col_b))
cursor.commit()
You can read up more on MERGE in the SQL Server docs.
Following up on the existing answers here because they are potentially prone to injection attacks and it's better to use parameterized queries (for mssql/pyodbc, these are the "?" placeholders). I tweaked Alexander Novas's code slightly to use dataframe rows in a parameterized version of the query with sqlalchemy:
import numpy as np
import pandas as pd

# assuming you already have a dataframe "df" and sqlalchemy engine called "engine"
# also assumes your dataframe columns have all the same names as the existing table
table_name_to_update = 'update_table'
table_name_to_transfer = 'placeholder_table'

# the dataframe and existing table should both have a column to use as the primary key
primary_key_col = 'id'

# replace the placeholder table with the dataframe
df.to_sql(table_name_to_transfer, engine, if_exists='replace', index=False)

# building the command terms
cols_list = df.columns.tolist()
cols_list_query = f'({(", ".join(cols_list))})'
sr_cols_list = [f'Source.{i}' for i in cols_list]
sr_cols_list_query = f'({(", ".join(sr_cols_list))})'
up_cols_list = [f'{i}=Source.{i}' for i in cols_list]
up_cols_list_query = f'{", ".join(up_cols_list)}'

# fill values that should be interpreted as "NULL" with None
def fill_null(vals: list) -> tuple:
    def bad(val):
        if isinstance(val, type(pd.NA)):
            return True
        # the list of values you want to interpret as 'NULL' should be
        # tweaked to your needs
        return val in ['NULL', np.nan, 'nan', '', '-', '?']
    return tuple(i if not bad(i) else None for i in vals)

# create the list of parameter indicators (?, ?, ?, etc...)
# and the parameters, which are the values to be inserted
params = [fill_null(row.tolist()) for _, row in df.iterrows()]
param_slots = '(' + ', '.join(['?'] * len(df.columns)) + ')'

cmd = f'''
MERGE INTO {table_name_to_update} as Target
USING (SELECT * FROM
    (VALUES {param_slots})
    AS s {cols_list_query}
) AS Source
ON Target.{primary_key_col}=Source.{primary_key_col}
WHEN NOT MATCHED THEN
    INSERT {cols_list_query} VALUES {sr_cols_list_query}
WHEN MATCHED THEN
    UPDATE SET {up_cols_list_query};
'''

# execute the command once per row of parameters to merge the tables
with engine.begin() as conn:
    conn.execute(cmd, params)
This method is also better if you are inserting strings with characters that aren't compatible with SQL insert text (such as apostrophes which mess up the insert statement) since it lets the connection engine handle the parameterized values (which also makes it safer against SQL injection attacks).
For reference, I'm creating the engine connection using this code - you'll obviously need to adapt it to your server/database/environment and whether or not you want fast_executemany:
import urllib.parse
import pyodbc
pyodbc.pooling = False
import sqlalchemy

terms = urllib.parse.quote_plus(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=<your server>;'
    'DATABASE=<your database>;'
    'Trusted_Connection=yes;'  # to logon using Windows credentials
)
url = f'mssql+pyodbc:///?odbc_connect={terms}'
engine = sqlalchemy.create_engine(url, fast_executemany=True)
EDIT: I realized that this code does not actually make use of the "placeholder" table at all, and is just copying values directly from the dataframe rows by way of the parameterized command.
Given a dataframe (df), I used the code from ksbg to upsert into a table. Note that I matched on two columns (date and stationcode); you can use one. The code generates the query given any df.
def append(df, c):
    table_name = 'ddf.ddf_actuals'
    columns_list = df.columns.tolist()
    columns_list_query = f'({(",".join(columns_list))})'
    sr_columns_list = [f'Source.{i}' for i in columns_list]
    sr_columns_list_query = f'({(",".join(sr_columns_list))})'
    up_columns_list = [f'{i}=Source.{i}' for i in columns_list]
    up_columns_list_query = f'{",".join(up_columns_list)}'
    rows_to_insert = [row.tolist() for idx, row in df.iterrows()]
    rows_to_insert = str(rows_to_insert).replace('[', '(').replace(']', ')')[1:][:-1]
    query = f"MERGE INTO {table_name} as Target \
        USING (SELECT * FROM \
        (VALUES {rows_to_insert}) \
        AS s {columns_list_query} \
        ) AS Source \
        ON Target.stationcode=Source.stationcode AND Target.date=Source.date \
        WHEN NOT MATCHED THEN \
        INSERT {columns_list_query} VALUES {sr_columns_list_query} \
        WHEN MATCHED THEN \
        UPDATE SET {up_columns_list_query};"
    c.execute(query)
    c.commit()

Passing Array Parameter to SQL command

In Python 2.7, I can pass a parameter to an SQL command like this:
cursor.execute("select * from my_table where id = %s", [2])
I can not get the array equivalent working like this:
cursor.execute("select * from my_table where id in %s", [[10,2]])
Obviously, I can just do string formatting, but I would like to use a proper parameter if possible. I'm using a postgresql database if that matters.
cursor.execute("select * from my_table where id = ANY(%s);", [[10, 20]])
To use IN instead:
cursor.execute(cursor.mogrify("select * from my_table where id in %s",
                              [tuple([10, 20])]))
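For the record, psycopg2 adapts a Python tuple to a SQL record, so the mogrify() round-trip shouldn't be necessary; passing the tuple as an ordinary parameter works with IN:
# the tuple (10, 20) is rendered by psycopg2 as the record (10, 20)
cursor.execute("select * from my_table where id in %s", [(10, 20)])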

Python MySQL executemany in WHERE clause

I want to select values from MySQL as follows
do_not_select = [1,2,3]
cursor = database.cursor()
cursor.executemany("""SELECT * FROM table_a WHERE id != %s""",(do_not_select))
data = cursor.fetchall()
The query returns all the values in the db apart from the first id (1). However, I don't want it to select id 1, 2, or 3.
Is this possible using the executemany command?
Give NOT IN a go:
do_not_select = [1, 2, 3]
cursor.execute("""SELECT * FROM table_a
                  WHERE id NOT IN ({}, {}, {})""".format(do_not_select[0],
                                                         do_not_select[1],
                                                         do_not_select[2]))
data = cursor.fetchall()
I suspect (though I haven't tested this) that this would work better if do_not_select was a tuple; then I think you could just fire it straight into your query:
do_not_select = (1, 2, 3)
cursor.execute("""SELECT * FROM table_a
                  WHERE id NOT IN {}""".format(do_not_select))
data = cursor.fetchall()
I'd be interested to know if this works - if you try it please let me know :)
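Alternatively, rather than formatting the values into the string, you could build one %s placeholder per element and let the driver handle the quoting. A sketch (untested, but this is the standard DB-API paramstyle for MySQL drivers):
do_not_select = (1, 2, 3)
placeholders = ', '.join(['%s'] * len(do_not_select))
query = "SELECT * FROM table_a WHERE id NOT IN ({})".format(placeholders)
# a single execute() is enough here; executemany() is meant for running
# the same statement once per parameter set, not for expanding a list
cursor.execute(query, do_not_select)
data = cursor.fetchall()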
