The code is very simple; I run it directly from the console, with spider.table_name = 'crawler':
import MySQLdb
import scrapy
print (spider.table_name) # >> 'crawler'
db = MySQLdb.connect(........)
db.set_character_set('utf8')
mysql = db.cursor()
sql = "CREATE TABLE %s like r_template;"
mysql.execute(sql, (spider.table_name, ))
db.commit()
But I got a syntax error:
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''crawler' like r_template' at line 1")
It seems that the SQL statement actually executed was:
CREATE TABLE 'crawler' like r_template
Where do those single quotes come from, and how can I prevent them?
Then I tried a simpler way:
mysql.execute(sql, ('crawler', ))
mysql.execute("CREATE TABLE %s like r_template", ('crawler', ))
The errors still happened.
You accidentally opened the door to a mysterious and adventurous world. My suggestion is: don't close that door before having a look.
The problem is that you are trying to pass an argument for a placeholder "on the left side", but your interface with MySQL works only on the "right side". Placeholders are used for values, not for field names or table names.
The superficial explanation
Let me start with another example.
It is legal to write:
where field = %s
and if the variable is a string, your PEP 249-compliant interface will correctly interpret it: think of it as "putting quotes around it" (though that is NOT all it does, otherwise it would open the door to SQL injections; but that will illustrate the point).
That's on the right side of the equality.
But if you write:
where %s = 5
with a value 'my_field', it will not work, because it is on the left side. This is not part of the interface.
If you applied the same logic, it would "put quotes around it", so you would get:
where 'my_field' = 5
This apparently doesn't make sense, because you get quotes where you didn't expect them (caution: again, that's a simplification, but it illustrates the point). It doesn't work, yet those quotes are what you should get if you followed your own logic. There is a contradiction, so something is apparently wrong.
But wait!
A deeper explanation
It is important to understand that with PEP 249 interfaces, the arguments for placeholders are always treated as values. Depending on the driver, they are either sent to the server separately from the query text, or escaped and quoted as SQL literals before being merged into it (MySQLdb does the latter, which is why your table name arrived as 'crawler'). Either way, they never become part of the SQL syntax itself.
The mechanism has been specified for converting the arguments into values. It was not designed for variable identifiers, such as fields or tables (which is a more advanced use case).
Saying that an "identifier is a variable" is quite an advanced idea... It gets you into the wonderful world of higher-order programming.
Could PEP 249 be extended to do that? In theory yes, but this is not an open-and-shut question.
Solution
In the meantime, you are left with only one option: interpolate your string query before you give it to the SQL engine.
sql = "CREATE TABLE %s like r_template;" % 'crawler'
I can imagine the shudders of horror in people around you (don't open that door!). But to the best of my knowledge, that's what you have to do if you really want to have a variable table name.
At that point, you may want to ask yourself why you want to have a variable table name? Did you do that as a lazy workaround for something else? In that case, I would return to the beaten path and forget about making tables or fields variable.
I see, however, two use cases where variable tables or fields are perfectly legitimate: for database administration or for building a database framework.
If that is your case, just use your string interpolation with care, to avoid unintentional SQL injections. You will have to deal with issues such as whitespace, special characters, etc.; a good practice is to "quote" your field or table names, e.g. in standard MySQL:
`table accentuée`
whereas, with ANSI quoting:
"table accentuée"
(As you see, you were not that far off!)
Also be careful to strip off things that could throw the interpreter off, like semicolons.
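As a minimal sketch of "interpolating with care", the identifier can be checked before it goes anywhere near the query string. The helper names `quote_identifier` and `create_table_like_sql` are hypothetical (they are not part of MySQLdb), and the rule used here is deliberately stricter than what MySQL accepts: only ASCII letters, digits and underscores, so a name like the accented example above would be rejected rather than escaped.

```python
import re

# Hypothetical helper, not part of MySQLdb or any driver.
# Assumption: only plain ASCII identifiers are needed, so anything else is rejected.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Return name wrapped in backticks, or raise ValueError if it looks unsafe."""
    if not isinstance(name, str) or not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return "`" + name + "`"


def create_table_like_sql(new_table, template="r_template"):
    """Build the CREATE TABLE ... LIKE statement from checked identifiers."""
    return "CREATE TABLE %s LIKE %s" % (
        quote_identifier(new_table),
        quote_identifier(template),
    )


print(create_table_like_sql("crawler"))
```

The resulting string is then passed to `mysql.execute()` on its own, with no arguments tuple. Semicolons, spaces and quotes never reach the server, because such names fail the check.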
Anyway, if you want to do that, you will need to navigate out of sight of the coast, straight toward the sunset. You are on the threshold of the hero's journey to the left side. You will enjoy the adventure, as long as you accept that there will be no lifeguard to come to your rescue.
Use .format instead of %s; this will let you avoid the single quotes in your query.
For example:
my_sql_query = "CREATE TABLE {}".format(table_name)
mysql.execute(my_sql_query)
That should work :)
Related
I'm trying to link my code to a MySQL database using pymysql. In general everything has gone smoothly, but I'm having difficulty with the following function to find the minimum of a variable column.
def findmin(column):
    cur = db.cursor()
    sql = "SELECT MIN(%s) FROM table"
    cur.execute(sql,column)
    mintup = cur.fetchone()
If everything went smoothly, this would return a tuple with the minimum, e.g. (1,).
However, if I run the function:
findmin(column_name)
I have to put column name in "" (i.e. "column_name"), else Python sees it as an unknown variable. But if I put the quotation marks around column_name then SQL sees
SELECT MIN("column_name") FROM table
which just returns the column header, not the value.
How can I get around this?
The issue is likely the use of %s for the column name. That means the SQL Driver will try to escape that variable when interpolating it, including quoting, which is not what you want for things like column names, table names, etc.
When using a value in SELECT, WHERE, etc. then you do want to use %s to prevent SQL injections and enable quoting, among other things.
Here, you just want to interpolate using pure Python (assuming a trusted value; please see below for more information). That also means no bindings tuple passed to the execute method.
def findmin(column):
    cur = db.cursor()
    # The column name is interpolated by Python, so it must be a trusted value.
    sql = "SELECT MIN({0}) FROM table".format(column)
    cur.execute(sql)
    mintup = cur.fetchone()
    return mintup
SQL fiddle showing the SQL working:
http://sqlfiddle.com/#!2/e70a41/1
In response to the Jul 15, 2014 comment from Colin Phipps (September 2022):
The relatively recent edit on this post by another community member brought it to my attention, and I wanted to respond to Colin's comment from many years ago.
I totally agree re: being careful about one's input if one interpolates like this. Certainly one needs to know exactly what is being interpolated. In this case, I would say a defined value within a trusted internal script or one supplied by a trusted internal source would be fine. But if, as Colin mentioned, there is any external input, then that is much different and additional precautions should be taken.
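One way to act on that caution, sketched here under assumptions: compare the requested column against a fixed allow-list before interpolating it. The names `ALLOWED_COLUMNS`, `build_min_query` and `my_table` are hypothetical stand-ins, and the column names are invented for illustration.

```python
# Hypothetical allow-list; a real script would list the table's actual columns.
ALLOWED_COLUMNS = {"price", "quantity"}


def build_min_query(column, allowed=ALLOWED_COLUMNS):
    """Return the MIN() query for column, refusing anything not on the allow-list."""
    if column not in allowed:
        raise ValueError("unexpected column name: %r" % (column,))
    # Safe to interpolate: the value is one of a few known strings.
    return "SELECT MIN({0}) FROM my_table".format(column)


print(build_min_query("price"))
```

With this in place, `findmin` could call `cur.execute(build_min_query(column))`, and an externally supplied column name can only ever select one of the listed columns.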
I'm currently reviewing someone's code, and I ran into the following Python line:
db.query('''SELECT foo FROM bar WHERE id = %r''' % id)
This goes against my common sense, because I would usually opt to use prepared statements, or at the very least use the database system's native string escaping function.
However, I am still curious how this could be exploited, given that:
The 'id' value is a string or number that's provided by an end-user/pentester
This is MySQL
The connection is explicitly set to use UTF8.
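To make the question concrete, here is a small sketch of what `%r` actually produces for a few inputs. It only shows the strings Python builds; whether any of them is exploitable depends on the server's character set and sql_mode, which this sketch does not test. The function name `render` is hypothetical.

```python
def render(id_value):
    """Build the query the same way as the reviewed line, using Python's repr()."""
    return '''SELECT foo FROM bar WHERE id = %r''' % (id_value,)


# repr() follows Python's quoting rules, not MySQL's.
print(render(42))         # number: no quotes at all
print(render("42"))       # plain string: single quotes
print(render("it's"))     # contains ': Python switches to double quotes
print(render('a\'b"c'))   # contains both: single quotes plus a backslash escape
```

The last two cases are the interesting ones: the quoting style changes with the input, and it relies on double-quoted strings and backslash escapes meaning the same thing to MySQL as they do to Python.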
Python drivers for MySQL don't support real prepared statements. They all do some form of string-interpolation. The trick is to get Python to do the string-interpolation with proper escaping.
See a demonstration of doing it unsafely: How do PyMySQL prevent user from sql injection attack?
The conventional solution to simulate parameters is the following:
sql = "SELECT foo FROM bar WHERE id = %s"
cursor.execute(sql, (id,))
See https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html
The only ways I know to overcome escaping (when it is done correctly) are:
Exploit GBK or SJIS or similar character sets, where an escaped quote becomes part of a multi-byte character. As long as you set names utf8, you should be safe from this issue.
Change the sql_mode to break the escaping, like enable NO_BACKSLASH_ESCAPES or ANSI_QUOTES. You should set sql_mode at the start of your session, similar to how you set names. This will ensure it isn't using a globally changed sql_mode that causes a problem.
See also Is "mysqli_real_escape_string" enough to avoid SQL injection or other SQL attacks?
I have a table with three columns: id, word, essay. I want to do a query using (?). The SQL statement is sql1 = "select id,? from training_data". My code is below:
def dbConnect(db_name,sql,flag):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    if (flag == "danci"):
        itm = 'word'
    elif flag == "wenzhang":
        itm = 'essay'
    n = cursor.execute(sql,(itm,))
    res1 = cursor.fetchall()
    return res1
However, when I print dbConnect("data.db",sql1,"danci")
The result I obtained is [(1,'word'),(2,'word'),(3,'word')...]. What I really want to get is [(1,'the content of word column'),(2,'the content of word column')...]. What should I do? Please give me some ideas.
You can't use placeholders for identifiers -- only for literal values.
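The difference is easy to reproduce with an in-memory database. This is a small sketch; the two sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_data (id INTEGER, word TEXT, essay TEXT)")
conn.executemany(
    "INSERT INTO training_data VALUES (?, ?, ?)",
    [(1, "apple", "An essay about apples."), (2, "pear", "An essay about pears.")],
)

# The placeholder is bound as the literal string 'word', not as a column name.
bound = conn.execute(
    "SELECT id, ? FROM training_data ORDER BY id", ("word",)
).fetchall()

# Writing the column name into the SQL text selects the column itself.
named = conn.execute("SELECT id, word FROM training_data ORDER BY id").fetchall()

print(bound)  # [(1, 'word'), (2, 'word')]
print(named)  # [(1, 'apple'), (2, 'pear')]
```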
I don't know what to suggest in this case, as your function takes a database name, an SQL string, and a flag to say how to modify that string. I think it would be better to pass just the first two, and write something like
sql = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}
and then call it with one of
dbConnect("data.db", sql['danci'])
or
dbConnect("data.db", sql['wenzhang'])
But a lot depends on why you are asking dbConnect to decide on the columns to fetch based on a string passed in from outside; it's an unusual design.
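For completeness, a sketch of how the lookup-table idea might fit together. It departs slightly from the suggestion above: the hypothetical function `fetch_rows` takes an open connection and the flag, so that it can be tried against an in-memory database, and an unknown flag is rejected instead of leaving a variable undefined. The sample row is invented.

```python
import sqlite3

# Complete, fixed queries keyed by flag; no identifier is ever bound as a parameter.
QUERIES = {
    "danci": "SELECT id, word FROM training_data",
    "wenzhang": "SELECT id, essay FROM training_data",
}


def fetch_rows(conn, flag):
    """Run the query registered for flag and return all rows."""
    try:
        query = QUERIES[flag]
    except KeyError:
        raise ValueError("unknown flag: %r" % (flag,))
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_data (id INTEGER, word TEXT, essay TEXT)")
conn.execute("INSERT INTO training_data VALUES (1, 'apple', 'An essay about apples.')")

print(fetch_rows(conn, "danci"))     # [(1, 'apple')]
print(fetch_rows(conn, "wenzhang"))  # [(1, 'An essay about apples.')]
```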
Update - SQL Injection
The problems with SQL injection and tainted data are well documented, but here is a summary.
The principle is that, in theory, programmers can write safe and secure programs as long as all the sources of data are under their control. As soon as they use any information from outside the program without checking its integrity, security is under threat.
Such information ranges from the obvious -- the parameters passed on the command line -- to the obscure -- if the PATH environment variable is modifiable then someone could induce a program to execute a completely different file from the intended one.
Perl provides direct help to avoid such situations with Taint Checking, but SQL Injection is the open door that is relevant here.
Suppose you take the value for a database column from an unverified external source, and that value appears in your program as $val. Then, if you write
my $sql = "INSERT INTO logs (date) VALUES ('$val')";
$dbh->do($sql);
then it looks like it's going to be okay. For instance, if $val is set to 2014-10-27 then $sql becomes
INSERT INTO logs (date) VALUES ('2014-10-27')
and everything's fine. But now suppose that our data is being provided by someone less than scrupulous or downright malicious, and your $val, having originated elsewhere, contains this
2014-10-27'); DROP TABLE logs; SELECT COUNT(*) FROM security WHERE name != '
Now it doesn't look so good. $sql is set to this (with added newlines)
INSERT INTO logs (date) VALUES ('2014-10-27');
DROP TABLE logs;
SELECT COUNT(*) FROM security WHERE name != '')
which adds an entry to the logs table as before, and then goes ahead and drops the entire logs table and counts the number of records in the security table. That isn't what we had in mind at all, and it is something we must guard against.
The immediate solution is to use placeholders ? in a prepared statement, and later pass the actual values in a call to execute. This not only speeds things up, because the SQL statement can be prepared (compiled) just once, but also protects the database from malicious data by quoting every supplied value appropriately for the data type, and escaping any embedded quotes so that it is impossible to close one statement and open another.
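The same idea, sketched in Python with the standard sqlite3 module rather than Perl's DBI (a stand-in chosen so the snippet runs without a database server). The malicious value from the example above ends up stored as ordinary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (date TEXT)")

# The hostile value from the example above.
val = "2014-10-27'); DROP TABLE logs; SELECT COUNT(*) FROM security WHERE name != '"

# The value is bound to the placeholder and never parsed as SQL.
conn.execute("INSERT INTO logs (date) VALUES (?)", (val,))

# The logs table still exists and holds the string exactly as supplied.
rows = conn.execute("SELECT date FROM logs").fetchall()
print(rows == [(val,)])  # True
```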
This whole concept was humourised in Randall Munroe's excellent XKCD comic
I could write a stored procedure inside MySQL and execute it with a CALL statement, but I'm looking to write it in Python instead. I got stuck using an SQL script that spans multiple lines.
conn = pyodbc.connect('DSN=MySQL;PWD=xxxx')
csr = conn.cursor()
Sql= 'SELECT something, something
FROM table
WHERE foo=bar
ORDER BY foo '
csr.execute(Sql)
sqld = csr.fetchall()
Heh, I don't mind to make it a proper answer.
String literals in triple quotes can include linebreaks and won't cause syntax errors. Otherwise (with "string" or 'string') you will need to include a backslash before every linebreak to make it work. And from experience, that's easy to screw up. :)
As a minor note, in Python variable names usually start with a lowercase letter; names starting with capital letters are usually given to classes.
So:
sql = """SELECT something, something
FROM table
WHERE foo=bar
ORDER BY foo"""
If you don't mind the overhead, take a look at sqlalchemy:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.