I have a SQLAlchemy engine where I try to insert parameters via sqlalchemy.sql.text to protect against SQL injection.
The following code works, where I pass variables for the condition and the condition's value.
from sqlalchemy import create_engine
from sqlalchemy.sql import text
db_engine = create_engine(...)
db_engine.execute(
    text('SELECT * FROM table_name WHERE :condition_1 = :condition_1_value'),
    condition_1="name",
    condition_1_value="John",
).fetchall()
However, when I try to pass the table name as a variable, it raises an error.
from sqlalchemy import create_engine
from sqlalchemy.sql import text
db_engine = create_engine(...)
db_engine.execute(
    text('SELECT * FROM :table_name WHERE :condition_1 = :condition_1_value'),
    table_name="table_1",
    condition_1="name",
    condition_1_value="John",
).fetchall()
Any ideas why this does not work?
EDIT:
I know that it has something to do with the table_name not being a string, but I am not sure how to do it in another way.
Any ideas why this does not work?
Query parameters are used to supply the values of things (usually column values), not the names of things (tables, columns, etc.). Every database I've seen works that way.
So, despite the ubiquitous advice that dynamic SQL is a "Bad Thing", there are certain cases where it is simply necessary. This is one of them.
table_name = "table_1" # NOTE: Do not use untrusted input here!
stmt = text(f'SELECT * FROM "{table_name}" …')
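For completeness, a minimal sketch of the combined pattern, using the names from the question: the identifier comes from a trusted allow-list and is interpolated into the string, while the value still goes through a bound parameter.
from sqlalchemy import create_engine
from sqlalchemy.sql import text
db_engine = create_engine(...)  # placeholder, as in the question
ALLOWED_TABLES = {"table_1", "table_2"}  # allow-list for the identifier
table_name = "table_1"
if table_name not in ALLOWED_TABLES:
    raise ValueError(f"unexpected table name: {table_name}")
# Identifier interpolated into the SQL text; value bound as a parameter.
stmt = text(f'SELECT * FROM "{table_name}" WHERE name = :value')
rows = db_engine.execute(stmt, value="John").fetchall()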
Also, check the results you get from trying to parameterize a column name. You may not be getting what you expect.
stmt = text("SELECT * FROM table_name WHERE :condition_1 = :condition_1_value")
db_engine.execute(stmt, dict(condition_1="name", condition_1_value="John"))
will not produce the equivalent of
SELECT * FROM table_name WHERE name = 'John'
It will render the equivalent of
SELECT * FROM table_name WHERE 'name' = 'John'
and will not throw an error, but it will also return no rows because 'name' = 'John' will never be true.
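In other words, only the value should be a bound parameter; the column name belongs in the SQL text itself. A sketch of what the original query was presumably meant to do:
# Column name fixed in the statement; only the value is bound.
stmt = text("SELECT * FROM table_name WHERE name = :condition_1_value")
rows = db_engine.execute(stmt, dict(condition_1_value="John")).fetchall()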
Related
I'm fairly new to SQL in general, and I'm trying to bolster my understanding of how to pass commands via cursor.execute(). Right now I'm trying to grab a column from a table and rename it to something different.
import mysql.connector
user = 'root'
pw = 'test!*'
host = 'localhost'
db = 'test1'
conn = mysql.connector.connect(user=user, password=pw, host=host, database=db)
cursor = conn.cursor(prepared=True)
new_name = 'Company Name'
query = f'SELECT company_name AS {new_name} from company_directory'
cursor.execute(query)
fetch = cursor.fetchall()
I've also tried it like this:
query = 'SELECT company_name AS %s from company_directory'
cursor.execute(query, ('Company Name',))
fetch = cursor.fetchall()
but that returns the following error:
stmt = self._cmysql.stmt_prepare(statement)
_mysql_connector.MySQLInterfaceError: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '? from company_directory' at line 1
I'm using Python and MySQL. I keep reading about SQL injection and not using string concatenation, but every time I try to use %s I get an error similar to the one above. I've tried switching to ? syntax, but I get the same error.
Could someone ELI5 what the difference is, what exactly SQL injection is, and whether what I'm doing in the first attempt qualifies as the string concatenation I should be trying to avoid?
Thank you so much!
If a column name or alias contains spaces, you need to put it in backticks.
query = f'SELECT company_name AS `{new_name}` from company_directory'
You can't use a placeholder for identifiers like table and column names or aliases; placeholders are allowed only where expressions are allowed.
In particular, you can't use a query parameter in place of a column alias. The rules for column aliases are the same as for column identifiers, and they must be fixed in the query string before you pass it.
So you could do this:
query = f"SELECT company_name AS `{'Company Name'}` from company_directory'
cursor.execute(query)
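To tie the two points together, a short sketch reusing the cursor from the question: values go through %s placeholders, while the alias stays fixed in the string with backticks. The WHERE clause and company_id column here are made up for illustration.
new_name = 'Company Name'  # trusted value, not user input
query = f'SELECT company_name AS `{new_name}` FROM company_directory WHERE company_id = %s'
cursor.execute(query, (42,))  # 42 is a made-up example value
fetch = cursor.fetchall()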
I am trying to create a temporary table from a pandas df and then use it in a SQL statement.
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
with snowflake.connector.connect(
    account='snoflakewebsite',
    user='username',
    authenticator='externalbrowser',
    database='db',
    schema='schema'
) as con:
    success, nchunks, nrows, _ = write_pandas(
        conn=con,
        df=df,
        table_name='temp_table',
        auto_create_table=True,
        table_type='temporary',
        overwrite=True,
        database='db',
        schema='schema'
    )
    cur = con.cursor()
    cur.execute('select * from temp_table')
The error I get:
ProgrammingError: 002003 (42S02): SQL compilation error:
Object 'TEMP_TABLE' does not exist or not authorized.
write_pandas() creates the table using the exact letter case passed in table_name=, while cur.execute() submits the whole query string to Snowflake SQL, and Snowflake upper-cases object names unless they are written in double quotes.
Therefore, either you create a table using capital letters table_name='TEMP_TABLE',
or you query it using double quotes:
cur.execute('select * from "temp_table"')
In that case the table is created with a lowercase name, and you always need to add double quotes to refer to it.
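A minimal sketch of both options side by side, reusing con, cur, and df from the question:
# Option 1: create the table with an upper-case name ...
write_pandas(conn=con, df=df, table_name='TEMP_TABLE',
             auto_create_table=True, table_type='temporary', overwrite=True)
cur.execute('select * from temp_table')    # ... unquoted names are upper-cased
# Option 2: keep the lowercase name and always quote it.
write_pandas(conn=con, df=df, table_name='temp_table',
             auto_create_table=True, table_type='temporary', overwrite=True)
cur.execute('select * from "temp_table"')  # quoted name matches exactly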
One of our queries that worked in Python 2 + mxODBC is failing in Python 3 + pyodbc against SQL Server; it raises an error like: "Maximum number of parameters in the sql query is 2100." Since both of the printed queries have 3000 params, I thought it should fail in both environments, but clearly that isn't the case. In the Python 2 environment both MSODBC 11 and MSODBC 17 work, so I immediately ruled out a driver-related issue.
So my question is:
Is it correct for SQLAlchemy to send a list as multiple parameters, so that the parameter count is proportional to the length of the list? It looks a bit strange to me; I would have preferred concatenating the list into a single string, since the DB doesn't understand the list datatype.
Are there any hints on why it works in mxODBC but not in pyodbc? Does mxODBC optimize something that pyodbc does not? Please let me know if there are any pointers - I can try to paste more info here. (I am still new to debugging SQLAlchemy.)
Footnote: I have seen a lot of answers that suggest chunking the data, but because of the two questions above, I wonder if I am doing the correct thing in the first place.
(Since it seems to be related to pyodbc, I have raised an internal issue in the official repository.)
import sqlalchemy
import sqlalchemy.orm
from sqlalchemy import MetaData, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
create_tables = """
CREATE TABLE products(
idn NUMERIC(8) PRIMARY KEY
);
"""
check_tables = """
SELECT * FROM products;
"""
insert_values = """
INSERT INTO products
(idn)
values
(1),
(2);
"""
delete_tables = """
DROP TABLE products;
"""
engine = sqlalchemy.create_engine('mssql+pyodbc://user:password@dsn')
connection = engine.connect()
cursor = engine.raw_connection().cursor()
Session = sqlalchemy.orm.sessionmaker(bind=connection)
session = Session()
session.execute(create_tables)
metadata = MetaData(connection)

class Products(Base):
    __table__ = Table('products', metadata, autoload=True)

try:
    session.execute(check_tables)
    session.execute(insert_values)
    session.commit()
    query = session.query(Products).filter(
        Products.idn.in_(list(range(0, 3000)))
    )
    query.all()
    f = open("query.sql", "w")
    f.write(str(query))
    f.close()
finally:
    session.execute(delete_tables)
    session.commit()
When you do a straightforward .in_(list_of_values) SQLAlchemy renders the following SQL ...
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN (?, ?)
... where each value in the IN clause is specified as a separate parameter value. pyodbc sends this to SQL Server as ...
exec sp_prepexec @p1 output,N'@P1 nvarchar(4),@P2 nvarchar(4)',N'SELECT team.prov AS team_prov, team.city AS team_city, team.team_name AS team_team_name
FROM team
WHERE team.prov IN (@P1, @P2)',N'AB',N'ON'
... so you hit the limit of 2100 parameters if your list is very long. Presumably, mxODBC inserted the parameter values inline before sending it to SQL Server, e.g.,
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN ('AB', 'ON')
You can get SQLAlchemy to do that for you with
provinces = ["AB", "ON"]
stmt = (
session.query(Team)
.filter(
Team.prov.in_(sa.bindparam("p1", expanding=True, literal_execute=True))
)
.statement
)
result = list(session.query(Team).params(p1=provinces).from_statement(stmt))
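To confirm what is actually sent, enabling statement echoing on the engine is enough (the connection string here is a placeholder): with literal_execute=True the IN list is rendered inline as literals, so only a handful of real parameters reach the driver.
# Hypothetical engine setup; echo=True logs each rendered statement.
engine = sa.create_engine("mssql+pyodbc://user:password@dsn", echo=True)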
I am using Sqlalchemy 1.3 to connect to a PostgreSQL 9.6 database (through Psycopg).
I have a very, very raw Sql string formatted using Psycopg2 syntax which I can not modify because of some legacy issues:
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
Notice the %(user_id)s
I can happily execute that using a sqlalchemy connection just by doing:
conn = sqlalch_engine.connect()
rows = conn.execute(statement_str, user_id=self.user_id)
And it works fine. I get my user and all is nice and good.
Now, for debugging purposes I'd like to get the actual query with the %(user_id)s argument expanded to the actual value. For instance: If user_id = "foo", then get SELECT * FROM users WHERE user_id = 'foo'
I've seen tons of examples using sqlalchemy.text(...) to produce a statement and then get a compiled version, and thanks to other answers I have been able to produce a decent str when I have a SQLAlchemy query.
However, in this particular case, since I'm using the more cursor-specific %(user_id)s syntax, I can't do that. If I try:
text(statement_str).bindparams(user_id="foo")
I get:
This text() construct doesn't define a bound parameter named 'user_id'
So I guess what I'm looking for would be something like
conn.compile(statement_str, user_id=self.user_id)
But I haven't been able to get that.
Not sure if this is what you want, but here goes.
Assuming statement_str is actually a string:
import sqlalchemy as sa
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
params = {'user_id': 'foo'}
query_text = sa.text(statement_str % params)
# str(query_text) should print "SELECT * FROM users WHERE user_id=foo"
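Note that plain % interpolation does not quote string values (there are no quotes around foo above), so the result is not valid SQL for string columns. A sketch that quotes the values first via psycopg2's adapt(), assuming psycopg2 is the driver in use:
from psycopg2.extensions import adapt

params = {'user_id': 'foo'}
# adapt() wraps each value in a quoted/escaped SQL representation.
quoted = {k: adapt(v).getquoted().decode() for k, v in params.items()}
print(statement_str % quoted)  # SELECT * FROM users WHERE user_id='foo'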
Ok I think I got it.
The combination of SqlAlchemy's raw_connection + Psycopg's mogrify seems to be the answer.
import re

conn = sqlalch_engine.raw_connection()
try:
    cursor = conn.cursor()
    s_str = cursor.mogrify(statement_str, {'user_id': self.user_id})
    s_str = s_str.decode("utf-8")  # mogrify returns bytes
    # Some cleanup for niceness:
    s_str = s_str.replace('\n', ' ')
    s_str = re.sub(r'\s{2,}', ' ', s_str)
finally:
    conn.close()
I hope someone else finds this helpful.
I just upgraded to Pandas 0.24.0 from 0.23.4 (Python 2.7.12), and many of my pd.read_sql queries are breaking. It looks like something related to MySQL, but it's strange that these errors only occur after updating my pandas version. Any ideas what's going on?
Here's my MySQL table:
CREATE TABLE `xlations_topic_update_status` (
`run_ts` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Here's my query:
import pandas as pd
from sqlalchemy import create_engine
db_engine = create_engine('mysql+mysqldb://<><>/product_analytics', echo=False)
pd.read_sql('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]
And here's the error:
OperationalError: (_mysql_exceptions.OperationalError) (1059, "Identifier name 'select max(run_ts) from product_analytics.xlations_topic_update_status;' is too long") [SQL: 'DESCRIBE `select max(run_ts) from product_analytics.xlations_topic_update_status;`']
I've also gotten this for other more complex queries, but won't post them here.
According to the documentation, the first argument is either a string (a table name) or a SQLAlchemy Selectable (a select or text object). In other words, pd.read_sql() is delegating to pd.read_sql_table() and treating the entire query string as a table identifier.
Wrap your query string in a text() construct first:
from sqlalchemy import text

stmt = text('select max(run_ts) from product_analytics.xlations_topic_update_status')
pd.read_sql(stmt, con=db_engine).values[0][0]
This way pd.read_sql() will delegate to pd.read_sql_query() instead. Another option is to call it directly.
Try using pd.read_sql_query(sql, con) instead of pd.read_sql(...).
So:
pd.read_sql_query('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]