I'm trying to link my code to a MySQL database using pymysql. In general everything has gone smoothly, but I'm having difficulty with the following function, which should find the minimum of a variable column.
def findmin(column):
    cur = db.cursor()
    sql = "SELECT MIN(%s) FROM table"
    cur.execute(sql, column)
    mintup = cur.fetchone()
    return mintup
If everything went smoothly, this would return a tuple with the minimum, e.g. (1,).
However, if I run the function:
findmin(column_name)
I have to put the column name in quotes (i.e. "column_name"), otherwise Python treats it as an undefined variable. But if I put quotation marks around column_name, then SQL sees
SELECT MIN("column_name") FROM table
which just returns the column header, not the value.
How can I get around this?
The issue is the use of %s for the column name. The SQL driver escapes whatever is bound to %s, including quoting it, which is not what you want for identifiers such as column names, table names, etc.
When passing a value (in a SELECT, a WHERE clause, etc.), you do want to use %s, to prevent SQL injection and get correct quoting, among other things.
Here, you just want to interpolate using pure Python (assuming a trusted value; see below for more information). That also means no parameters tuple is passed to the execute method.
def findmin(column):
    cur = db.cursor()
    # The column name becomes part of the SQL text; nothing is bound.
    sql = "SELECT MIN({0}) FROM table".format(column)
    cur.execute(sql)
    mintup = cur.fetchone()
    return mintup
SQL fiddle showing the SQL working:
http://sqlfiddle.com/#!2/e70a41/1
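You can reproduce the quoting effect without a MySQL server. For convenience, this sketch uses Python's built-in sqlite3 module instead of pymysql (sqlite3's placeholder is ? rather than %s), along with a made-up items table. When the column name is bound as a parameter, MIN() aggregates the string constant 'price', which is the "column header" the question describes. When the name is interpolated instead, MIN() reads the actual column.

```python
import sqlite3

# Throwaway in-memory table standing in for the MySQL one (hypothetical name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (price INTEGER)")
conn.executemany("INSERT INTO items (price) VALUES (?)", [(3,), (1,), (2,)])

column = "price"

# Bound as a parameter: MIN() sees the string literal 'price', not the column.
bound = conn.execute("SELECT MIN(?) FROM items", (column,)).fetchone()

# Interpolated: the name becomes part of the SQL text and refers to the column.
interpolated = conn.execute("SELECT MIN({0}) FROM items".format(column)).fetchone()

print(bound)         # ('price',)
print(interpolated)  # (1,)
```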
In response to the Jul 15, 2014 comment from Colin Phipps (September 2022):
The relatively recent edit on this post by another community member brought it to my attention, and I wanted to respond to Colin's comment from many years ago.
I totally agree re: being careful about one's input if one interpolates like this. Certainly one needs to know exactly what is being interpolated. In this case, I would say a defined value within a trusted internal script or one supplied by a trusted internal source would be fine. But if, as Colin mentioned, there is any external input, then that is much different and additional precautions should be taken.
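One way to make "a defined value within a trusted internal script" concrete is to check the requested column against a fixed set before interpolating it. This is a minimal sketch, not part of pymysql's API. ALLOWED_COLUMNS, build_min_query and the items table are hypothetical names, and findmin expects an already-open pymysql connection as db.

```python
# Hypothetical whitelist of columns this script may aggregate over.
ALLOWED_COLUMNS = frozenset({"price", "quantity", "weight"})

def build_min_query(column, allowed=ALLOWED_COLUMNS):
    """Return a MIN() query for `column`, refusing any name not in `allowed`."""
    if column not in allowed:
        raise ValueError("column not allowed: {!r}".format(column))
    # Backticks are MySQL's identifier quote character; the (hypothetical)
    # table name is a constant written into the code, not caller input.
    return "SELECT MIN(`{0}`) FROM `items`".format(column)

def findmin(db, column):
    """Run the MIN() query on an open pymysql connection `db`."""
    cur = db.cursor()
    try:
        cur.execute(build_min_query(column))  # no parameters left to bind
        return cur.fetchone()
    finally:
        cur.close()
```

With this design, a typo or a hostile string fails with ValueError before any SQL is sent. Adding a column to the whitelist is a deliberate code change.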
The code is very simple; I just run it directly from the console. Here, spider.table_name = 'crawler'.
import MySQLdb
import scrapy
print (spider.table_name) # >> 'crawler'
db = MySQLdb.connect(........)
db.set_character_set('utf8')
mysql = db.cursor()
sql = "CREATE TABLE %s like r_template;"
mysql.execute(sql, (spider.table_name, ))
db.commit()
But I got a syntax error:
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''crawler' like r_template' at line 1")
It seems that the SQL statement actually being executed was:
CREATE TABLE 'crawler' like r_template
Where do those single quotes come from, and how can I prevent them?
Then I tried something simpler:
mysql.execute(sql, ('crawler', ))
mysql.execute("CREATE TABLE %s like r_template", ('crawler', ))
The error still occurred.
You accidentally opened the door to a mysterious and adventurous world. My suggestion is: don't close that door before having a look.
The problem is that you are trying to pass an argument for a placeholder "on the left side", but your interface with MySQL works only on the "right side". Placeholders are used for values, not for field names or tables.
The superficial explanation
Let me start with another example.
It is legal to write:
where field = %s
and if the variable is a string, your PEP 249-compliant interface will correctly interpret it: think of it as "putting quotes around it" (though that's NOT all it does, otherwise it would open the door to SQL injections; but it illustrates the point).
That's on the right side of the equality.
But if you write:
where %s = 5
with a value 'my_field', it will not work, because it is on the left side. This is not part of the interface.
As you said, if you applied the same logic, it would "put quotes around it", so you would get:
where 'my_field' = 5
it apparently doesn't make sense, because you get quotes where you didn't expect them (caution: again, that's not what happens, but it illustrates the point). It doesn't work, yet those quotes are what you should get if you followed your own logic. There is a contradiction, so something is apparently wrong.
But wait!
A deeper explanation
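It is important to understand that, with PEP 249 interfaces, the arguments for placeholders are only ever treated as values, translated into their SQL equivalents (int, string, etc.). Depending on the driver, they are either escaped and quoted as literals before the query is sent (MySQLdb does this on the client side, which is why 'crawler' shows up quoted in your error message) or sent to the server separately from the SQL text and bound there. In neither case do they become part of the SQL syntax the way a table name would.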
The mechanism has been specified for converting the arguments into values. It was not designed for variable identifiers, such as fields or tables (which is a more advanced use case).
Saying that an "identifier is a variable" is quite an advanced idea... It gets you into the wonderful world of higher-order programming.
Could PEP 249 be extended to do that? In theory yes, but this is not an open and shut question.
Solution
In the meantime, you are left with only one option: interpolate your string query before you give it to the SQL engine.
sql = "CREATE TABLE %s like r_template;" % 'crawler'
I can imagine the shudders of horror in people around you (don't open that door!). But to the best of my knowledge, that's what you have to do if you really want to have a variable table name.
At that point, you may want to ask yourself: why do you want a variable table name? Did you do that as a lazy workaround for something else? In that case, I would return to the beaten path and forget about making tables or fields variable.
I see, however, two use cases where variable tables or fields are perfectly legitimate: for database administration or for building a database framework.
If that is your case, just use string interpolation with care, to avoid unintentional SQL injections. You will have to deal with various issues involving whitespace, special characters, etc.; a good practice is to "quote" your field or table names, e.g. in standard MySQL:
`table accentuée`
whereas, with ANSI quoting:
"table accentuée"
(As you see, you were not that far off!)
Also be careful to strip off things that could throw the interpreter off, like semicolons.
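Here is a sketch of that quoting advice. It assumes MySQL with its default backtick quoting (not ANSI_QUOTES mode). The name is wrapped in backticks, and any backtick inside it is doubled, which is MySQL's own rule for including the quote character in a quoted identifier. Because the whole name stays inside the quotes, a semicolon in it cannot end the statement. The helper name quote_mysql_identifier is made up for this example, and it does not check MySQL's other limits, such as the 64-character maximum.

```python
def quote_mysql_identifier(name):
    """Quote `name` as a MySQL identifier, doubling any embedded backticks."""
    # MySQL does not allow U+0000 in quoted identifiers; an empty name is meaningless.
    if not name or "\x00" in name:
        raise ValueError("invalid identifier: {!r}".format(name))
    return "`" + name.replace("`", "``") + "`"

table_name = "crawler"  # e.g. spider.table_name from the question
sql = "CREATE TABLE {0} LIKE r_template".format(quote_mysql_identifier(table_name))
print(sql)  # CREATE TABLE `crawler` LIKE r_template
```

The resulting string is then passed to mysql.execute(sql) with no parameter tuple.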
Anyway, if you want to do that, you will need to navigate out of sight of the coast, straight toward the sunset. You are on the threshold of the hero's journey to the left side. You will enjoy the adventure, as long as you accept that there will be no lifeguard to come to your rescue.
Use .format instead of a %s placeholder; this lets you avoid the single quotes in your query.
For Example:
my_sql_query = "CREATE TABLE {} LIKE r_template".format(table_name)
mysql.execute(my_sql_query)
That should work :)
This question was closed as a duplicate of: How do you escape strings for SQLite table/column names in Python?
I have a wide table in a sqlite3 database, and I wish to dynamically query certain columns in a Python script. I know that it's bad to inject parameters by string concatenation, so I tried to use parameter substitution instead.
I find that, when I use parameter substitution to supply a column name, I get unexpected results. A minimal example:
import sqlite3 as lite
db = lite.connect("mre.sqlite")
c = db.cursor()
# Insert some dummy rows
c.execute("CREATE TABLE trouble (value real)")
c.execute("INSERT INTO trouble (value) VALUES (2)")
c.execute("INSERT INTO trouble (value) VALUES (4)")
db.commit()
for row in c.execute("SELECT AVG(value) FROM trouble"):
    print(row)  # (3.0,)
for row in c.execute("SELECT AVG(:name) FROM trouble", {"name": "value"}):
    print(row)  # (0.0,)
db.close()
Is there a better way to accomplish this than simply injecting a column name into a string and running it?
As Rob indicated in his comment, a related SO post contains my answer. These substitution constructions are called "placeholders," which is why I did not find the answer on SO initially. There is no placeholder mechanism for column names; as that post explains, choosing columns dynamically raises a different safety question from inserting values:
It comes down to what "safe" means. The conventional wisdom is that using normal python string manipulation to put values into your queries is not "safe". This is because there are all sorts of things that can go wrong if you do that, and such data very often comes from the user and is not in your control. You need a 100% reliable way of escaping these values properly so that a user cannot inject SQL in a data value and have the database execute it. So the library writers do this job; you never should.
If, however, you're writing generic helper code to operate on things in databases, then these considerations don't apply as much. You are implicitly giving anyone who can call such code access to everything in the database; that's the point of the helper code. So now the safety concern is making sure that user-generated data can never be used in such code. This is a general security issue in coding, and is just the same problem as blindly execing a user-input string. It's a distinct issue from inserting values into your queries, because there you want to be able to safely handle user-input data.
So the solution is that there is no problem in the first place: inject the column names using string formatting (as long as user-generated data can never reach them), be happy, and move on with your life.
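If the column name might come from somewhere less trusted, one option in SQLite is to check it against the table's real columns before formatting it into the query. This is a minimal sketch that assumes the question's trouble table and a hypothetical average_of helper. PRAGMA table_info returns one row per column, with the column name in the second field. The validated name is then double-quoted (standard SQL identifier quoting, which SQLite supports) before interpolation. The table name is a constant here, not caller input.

```python
import sqlite3

TABLE = "trouble"  # fixed in the code, never taken from the caller

def average_of(conn, column):
    """Return AVG(column) from TABLE, accepting only columns that exist."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute("PRAGMA table_info({0})".format(TABLE))}
    if column not in existing:
        raise ValueError("unknown column: {!r}".format(column))
    quoted = '"' + column.replace('"', '""') + '"'  # standard SQL identifier quoting
    return conn.execute("SELECT AVG({0}) FROM {1}".format(quoted, TABLE)).fetchone()[0]

# Usage with an in-memory copy of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trouble (value real)")
conn.executemany("INSERT INTO trouble (value) VALUES (?)", [(2,), (4,)])
print(average_of(conn, "value"))  # 3.0
```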
Why not use string formatting?
for row in c.execute("SELECT AVG({name}) FROM trouble".format(name="value")):
    print(row)  # => (3.0,)
I have a table with three columns: id, word, essay. I want to do a query using a ? placeholder for the column. The SQL statement is sql1 = "select id,? from training_data". My code is below:
def dbConnect(db_name, sql, flag):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    if flag == "danci":
        itm = 'word'
    elif flag == "wenzhang":
        itm = 'essay'
    n = cursor.execute(sql, (itm,))
    res1 = cursor.fetchall()
    return res1
However, when I print dbConnect("data.db", sql1, "danci"), the result I get is [(1,'word'),(2,'word'),(3,'word')...]. What I really want is [(1,'the content of word column'),(2,'the content of word column')...]. What should I do? Please give me some ideas.
You can't use placeholders for identifiers -- only for literal values.
I don't know what to suggest in this case, as your function takes a database name, an SQL string, and a flag to say how to modify that string. I think it would be better to pass just the first two, and write something like
sql = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}
and then call it with one of
dbConnect("data.db", sql['danci'])
or
dbConnect("data.db", sql['wenzhang'])
But a lot depends on why you are asking dbConnect to decide on the columns to fetch based on a string passed in from outside; it's an unusual design.
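Here is a fuller sketch of that suggestion, assuming the caller still passes the question's danci/wenzhang flags. The flag only chooses between complete, hand-written queries, so no column name is ever built from input. The names db_connect and QUERIES, and the choice to pass an open connection instead of a file name, are illustrative and not the original poster's code.

```python
import sqlite3

# Complete queries, written by hand; the flag only selects one of them.
QUERIES = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}

def db_connect(conn, flag):
    """Run the query registered for `flag` on an open sqlite3 connection."""
    try:
        sql = QUERIES[flag]
    except KeyError:
        raise ValueError("unknown flag: {!r}".format(flag))
    return conn.execute(sql).fetchall()

# Usage with a throwaway in-memory copy of the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_data (id INTEGER, word TEXT, essay TEXT)")
conn.execute("INSERT INTO training_data VALUES (1, 'apple', 'An essay.')")
print(db_connect(conn, "danci"))  # [(1, 'apple')]
```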
Update - SQL Injection
The problems with SQL injection and tainted data are well documented, but here is a summary.
The principle is that, in theory, programmers can write safe and secure programs as long as all the sources of data are under their control. As soon as they use any information from outside the program without checking its integrity, security is under threat.
Such information ranges from the obvious -- the parameters passed on the command line -- to the obscure -- if the PATH environment variable is modifiable then someone could induce a program to execute a completely different file from the intended one.
Perl provides direct help to avoid such situations with Taint Checking, but SQL Injection is the open door that is relevant here.
Suppose you take the value for a database column from an unverified external source, and that value appears in your program as $val. Then, if you write
my $sql = "INSERT INTO logs (date) VALUES ('$val')";
$dbh->do($sql);
then it looks like it's going to be okay. For instance, if $val is set to 2014-10-27 then $sql becomes
INSERT INTO logs (date) VALUES ('2014-10-27')
and everything's fine. But now suppose that our data is being provided by someone less than scrupulous or downright malicious, and your $val, having originated elsewhere, contains this
2014-10-27'); DROP TABLE logs; SELECT COUNT(*) FROM security WHERE name != '
Now it doesn't look so good. $sql is set to this (with added newlines)
INSERT INTO logs (date) VALUES ('2014-10-27');
DROP TABLE logs;
SELECT COUNT(*) FROM security WHERE name != '')
which adds an entry to the logs table as before, and then goes ahead and drops the entire logs table and counts the number of records in the security table. That isn't what we had in mind at all, and it is something we must guard against.
The immediate solution is to use ? placeholders in a prepared statement and pass the actual values later in a call to execute. This not only speeds things up, because the SQL statement can be prepared (compiled) just once, but also protects the database from malicious data by quoting every supplied value appropriately for its data type and escaping any embedded quotes, so that it is impossible to close one statement and open another.
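Here is the same attack replayed in Python, translated from the Perl example and using the built-in sqlite3 module with an in-memory database. It shows the placeholder at work: the hostile string is stored verbatim as a single date value, and the logs table survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (date TEXT)")

# The hostile value from the example above.
val = "2014-10-27'); DROP TABLE logs; SELECT COUNT(*) FROM security WHERE name != '"

# The ? placeholder passes val as one string value; it is never parsed as SQL.
conn.execute("INSERT INTO logs (date) VALUES (?)", (val,))
conn.commit()

rows = conn.execute("SELECT date FROM logs").fetchall()
print(rows == [(val,)])  # True: the table still exists and holds the literal text
```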
This whole concept was humorously illustrated in Randall Munroe's excellent XKCD comic "Exploits of a Mom" (the "Little Bobby Tables" strip).
It should be simple, but I've spent the last hour searching for the answer. This is using psycopg2 on Python 2.6.
I need something like this:
special_id = 5
sql = """
select count(*) as ct
from some_table tbl
where tbl.id = %(the_id)
"""
cursor = connection.cursor()
cursor.execute(sql, {"the_id" : special_id})
I cannot get this to work. If special_id were a string, I could replace %(the_id) with %(the_id)s and things would work well. However, I want it to use the integer so that it hits my indexes correctly.
There is a surprising lack of specific information on psycopg2 on the internet. I hope someone has an answer to this seemingly simple question.
Per PEP 249, since in psycopg2 paramstyle is pyformat, you need to use %(the_id)s even for non-strings -- trust it to do the right thing.
BTW, internet searches will work better if you use the correct spelling (no h there), but even if you mis-spelled, I'm surprised you didn't get a "did you mean" hint (I did when I deliberately tried!).
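Applied to the question's query, the fix looks like the sketch below. It assumes an already-open psycopg2 connection is passed in (connecting is out of scope here), and count_for_id is a hypothetical helper name. With %(the_id)s and a Python int in the dict, psycopg2 adapts the value to an unquoted integer literal, so the id is compared with an integer, not a string.

```python
SQL = """
    SELECT count(*) AS ct
    FROM some_table tbl
    WHERE tbl.id = %(the_id)s
"""

def count_for_id(connection, special_id):
    """Count rows of some_table whose id equals special_id, via psycopg2."""
    cursor = connection.cursor()
    try:
        # %(the_id)s is psycopg2's pyformat placeholder for values of ANY type.
        cursor.execute(SQL, {"the_id": special_id})
        return cursor.fetchone()[0]
    finally:
        cursor.close()
```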