I'm currently reviewing someone's code, and I ran into the following Python line:
db.query('''SELECT foo FROM bar WHERE id = %r''' % id)
This goes against my common sense, because I would usually opt to use prepared statements, or at the very least use the database system's native string escaping function.
However, I am still curious how this could be exploited, given that:
The 'id' value is a string or number that's provided by an end-user/pentester
This is MySQL
The connection is explicitly set to use UTF8.
Most Python drivers for MySQL don't use real prepared statements by default. They do some form of string interpolation on the client side. The trick is to get the driver to do the string interpolation with proper escaping.
See a demonstration of doing it unsafely: How do PyMySQL prevent user from sql injection attack?
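To see why %r is not a substitute for escaping, it helps to look at what it actually generates. A minimal sketch (Python 3; the helper name build_query is made up for illustration) showing that the quoting follows Python's repr() rules rather than MySQL's:

```python
def build_query(user_value):
    # Same construction as the reviewed line: %r inserts repr(user_value).
    return "SELECT foo FROM bar WHERE id = %r" % (user_value,)


as_number = build_query(5)
plain_string = build_query("abc")
with_apostrophe = build_query("it's")

print(as_number)        # SELECT foo FROM bar WHERE id = 5
print(plain_string)     # SELECT foo FROM bar WHERE id = 'abc'
print(with_apostrophe)  # SELECT foo FROM bar WHERE id = "it's"
```

In the last case repr() switches to double quotes. MySQL only reads those as a string delimiter while ANSI_QUOTES is off, so the meaning of the generated query depends on the server's sql_mode. repr() is meant to produce Python literals, not SQL literals.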
The conventional solution to simulate parameters is the following:
sql = "SELECT foo FROM bar WHERE id = %s"
cursor.execute(sql, (id,))
See https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html
The only ways I know to overcome escaping (when it is done correctly) are:
Exploit GBK or SJIS or similar character sets, where an escaped quote becomes part of a multi-byte character. As long as you set names utf8 on the connection, you should be safe from this issue.
Change the sql_mode to break the escaping, like enable NO_BACKSLASH_ESCAPES or ANSI_QUOTES. You should set sql_mode at the start of your session, similar to how you set names. This will ensure it isn't using a globally changed sql_mode that causes a problem.
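A rough sketch of pinning both settings at the start of a session. The function name pin_session_settings and the chosen mode value are assumptions for illustration; the cursor is any PEP 249 cursor that uses the %s placeholder style, as the MySQL drivers discussed here do:

```python
# Assumption: any mode list that leaves out NO_BACKSLASH_ESCAPES and
# ANSI_QUOTES would do; STRICT_TRANS_TABLES is only an example value.
SESSION_SQL_MODE = "STRICT_TRANS_TABLES"


def pin_session_settings(cursor, sql_mode=SESSION_SQL_MODE):
    """Run once right after connecting, before any query with user input."""
    cursor.execute("SET NAMES utf8")
    # SET SESSION overrides whatever sql_mode was configured globally.
    cursor.execute("SET SESSION sql_mode = %s", (sql_mode,))
```

Many drivers also accept a charset argument when connecting, which may be a better place to choose the character set than a manual SET NAMES.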
See also Is "mysqli_real_escape_string" enough to avoid SQL injection or other SQL attacks?
Related
The code is very simple, I just directly run it from console, meanwhile, spider.table_name = 'crawler'
import MySQLdb
import scrapy
print (spider.table_name) # >> 'crawler'
db = MySQLdb.connect(........)
db.set_character_set('utf8')
mysql = db.cursor()
sql = "CREATE TABLE %s like r_template;"
mysql.execute(sql, (spider.table_name, ))
db.commit()
But I got Syntax Error:
ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''crawler' like r_template' at line 1")
It seems that the actual SQL statement being executed was:
CREATE TABLE 'crawler' like r_template
Where do those single quotes come from? How can I prevent that from happening?
Then I tried a simpler way:
mysql.execute(sql, ('crawler', ))
mysql.execute("CREATE TABLE %s like r_template", ('crawler', ))
The errors still happened.
You accidentally opened the door to a mysterious and adventurous world. My suggestion is: don't close that door before having a look.
The problem is that you are trying to pass an argument for a placeholder "on the left side", but your interface with MySQL works only on the "right side". Placeholders are used for values, not for field names or tables.
The superficial explanation
Let me start with another example.
It is legal to write:
where field = %s
and if the variable is a string, your PEP 249-compliant interface will correctly interpret it: think of it as "putting quotes around it" (though it's NOT what it does, otherwise it would open the door to SQL injections; but that will illustrate the point).
That's on the right side of the equality.
But if you write:
where %s = 5
with a value 'my_field', it will not work, because it is on the left side. This is not part of the interface.
If you applied the same logic, it would "put quotes around it", so you would get:
where 'my_field' = 5
This apparently doesn't make sense, because you get quotes where you didn't expect them (caution: again, that's not what happens, but it illustrates the point). It doesn't work, yet those quotes are what you should get if you followed your own logic. There is a contradiction, so something is apparently wrong.
But wait!
A deeper explanation
It is important to understand that with PEP 249 interfaces, the arguments for placeholders are NOT pasted into the query as raw text. Depending on the driver, they are either sent to the server separately from the SQL text, or converted by the driver into properly escaped SQL literals (MySQLdb takes the second route, which is why the string 'crawler' arrives wrapped in quotes). Either way, each argument is always treated as a value.
The mechanism has been specified for converting the arguments into values. It was not designed for variable identifiers, such as fields or tables (which is a more advanced use case).
Saying that an "identifier is a variable" is quite an advanced idea... It gets you into the wonderful world of higher-order programming.
Could PEP249 be extended to do that? In theory yes, but this is not an open and shut question.
Solution
In the meantime, you are left with only one option: interpolate your string query before you give it to the SQL engine.
sql = "CREATE TABLE %s like r_template;" % 'crawler'
I can imagine the shudders of horror in people around you (don't open that door!). But to the best of my knowledge, that's what you have to do if you really want to have a variable table name.
At that point, you may want to ask yourself why you want to have a variable table name? Did you do that as a lazy workaround for something else? In that case, I would return to the beaten path and forget about making tables or fields variable.
I see, however, two use cases where variable tables or fields are perfectly legitimate: for database administration or for building a database framework.
If that is your case, just use your string interpolation with care, to avoid unintentional SQL injections. You will have to deal with issues such as whitespace, special characters, etc.; a good practice is to "quote" your field or table names, e.g. in standard MySQL:
`table accentuée`
whereas, with ANSI quoting:
"table accentuée"
(As you see, you were not that far off!)
Also be careful to strip off things that could throw the interpreter off, like semicolons.
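As a minimal sketch of that quoting step (the helper name quote_mysql_identifier is hypothetical, not part of MySQLdb): in MySQL's default mode an identifier is wrapped in backticks, and a backtick inside the name is written twice.

```python
def quote_mysql_identifier(name):
    """Wrap a table or column name in backticks, doubling any backtick in it."""
    if not name or "\x00" in name:
        raise ValueError("invalid identifier: %r" % (name,))
    return "`" + name.replace("`", "``") + "`"


# Identifiers are interpolated by hand; values would still go through execute().
sql = "CREATE TABLE %s LIKE r_template" % quote_mysql_identifier("crawler")
print(sql)  # CREATE TABLE `crawler` LIKE r_template
```

If the name comes from an untrusted source, checking it against a fixed list of allowed names before quoting is probably the more conservative choice.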
Anyway, if you want to do that, you will need to navigate out of sight of the coast, straight toward the sunset. You are on the threshold of the hero's journey to the left side. You will enjoy the adventure, as long as you accept that there will be no lifeguard to come to your rescue.
Use .format instead of passing the table name as a %s parameter. This avoids the single quotes in your query.
For Example:
my_sql_query = "CREATE TABLE {}".format(table_name)
mysql.execute(my_sql_query)
That should work :)
I have written a query that uses string replacement. I am trying to update a URL in a table, but the URL contains % signs, which causes a "tuple index out of range" exception.
If I print the query and run it manually it works fine, but running it through peewee causes the issue. How can I get around this? I'm guessing it is because of the percent signs?
query = """
update table
set url = '%s'
where id = 1
""" % 'www.example.com?colour=Black%26white'
db.execute_sql(query)
The code you are currently sharing is incredibly unsafe, probably for the same reason as is causing your bug. Please do not use it in production, or you will be hacked.
Generally: you practically never want to use normal string operations like %, +, or .format() to construct a SQL query. Rather, you should use your SQL API/ORM's specific built-in methods for providing dynamic values for a query. In your case of SQLite in peewee, that looks like this:
query = """
update table
set url = ?
where id = 1
"""
values = ('www.example.com?colour=Black%26white',)
db.execute_sql(query, values)
The database engine will automatically take care of any special characters in your data, so you don't need to worry about them. If you ever find yourself encountering issues with special characters in your data, it is a very strong warning sign that some kind of security issue exists.
This is mentioned in the Security and SQL Injection section of peewee's docs.
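The same idea can be tried without peewee, using only the standard library's sqlite3 module. This is a small sketch; the table name pages and the in-memory database are assumptions made for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT)")
conn.execute("INSERT INTO pages (id, url) VALUES (1, 'old')")

new_url = "www.example.com?colour=Black%26white"
# The value travels separately from the SQL text, so its % signs are just data.
conn.execute("UPDATE pages SET url = ? WHERE id = ?", (new_url, 1))

stored = conn.execute("SELECT url FROM pages WHERE id = 1").fetchone()[0]
print(stored == new_url)  # True
```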
Wtf are you doing? Peewee supports updates.
Table.update(url=new_url).where(Table.id == some_id).execute()
I've just realized that psycopg2 allows multiple queries in one execute call.
For instance, this code will actually insert two rows in my_table:
>>> import psycopg2
>>> connection = psycopg2.connect(database='testing')
>>> cursor = connection.cursor()
>>> sql = ('INSERT INTO my_table VALUES (1, 2);'
... 'INSERT INTO my_table VALUES (3, 4)')
>>> cursor.execute(sql)
>>> connection.commit()
Does psycopg2 have some way of disabling this functionality? Or is there some other way to prevent this from happening?
What I've come up with so far is to check whether the query contains a semicolon (;):
if ';' in sql:
    raise ValueError('Multiple queries not allowed!')
But this solution is not perfect, because it wouldn't allow some valid queries like:
SELECT * FROM my_table WHERE name LIKE '%;'
EDIT: SQL injection attacks are not an issue here. I do want to give to the user full access of the database (he can even delete the whole database if he wants).
If you want a general solution to this kind of problem, the answer is always going to be "parse format X, or at least parse it well enough to handle your needs".
In this case, it's probably pretty simple. PostgreSQL doesn't allow semicolons in the middle of column or table names, etc.; the only places they can appear are inside strings, or as statement terminators. So, you don't need a full parser, just one that can handle strings.
Unfortunately, even that isn't completely trivial, because you have to know the rules for what counts as a string literal in PostgreSQL. For example, is "abc\"def" a string abc"def?
But once you write or find a parser that can identify strings in PostgreSQL, it's easy: skip all the strings, then see if there are any semicolons left over.
For example (this is probably not the correct logic,* and it's also written in a verbose and inefficient way, just to show you the idea):
def skip_quotes(sql):
    in_1, in_2 = False, False
    for c in sql:
        if in_1:
            if c == "'":
                in_1 = False
        elif in_2:
            if c == '"':
                in_2 = False
        else:
            if c == "'":
                in_1 = True
            elif c == '"':
                in_2 = True
            else:
                yield c
Then you can just write:
if ';' in skip_quotes(sql):
    raise ValueError('Multiple queries not allowed!')
If you can't find a pre-made parser, the first things to consider are:
If it's so trivial that simple string operations like find will work, do that.
If it's a simple, regular language, use re.
If the logic can be explained descriptively (e.g., via a BNF grammar), use a parsing library or parser-generator library like pyparsing or pybison.
Otherwise, you will probably need to write a state machine, or even explicit iterative code (like my example above). But this is very rarely the best answer for anything but teaching purposes.
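For the "use re" option, here is a hedged sketch under the same simplified dialect as the generator above: single- or double-quoted strings, quotes escaped by doubling, no backslash escapes, no comments and no dollar-quoting. The function name has_statement_separator is made up for illustration:

```python
import re

# A quoted run: an opening quote, then non-quote characters or doubled quotes,
# then the closing quote.
_SINGLE = r"'(?:[^']|'')*'"
_DOUBLE = r'"(?:[^"]|"")*"'
_QUOTED = re.compile(_SINGLE + "|" + _DOUBLE)


def has_statement_separator(sql):
    """Return True if a semicolon appears outside of any quoted run."""
    return ";" in _QUOTED.sub("", sql)


print(has_statement_separator("SELECT * FROM my_table WHERE name LIKE '%;'"))  # False
print(has_statement_separator(
    "INSERT INTO my_table VALUES (1, 2);INSERT INTO my_table VALUES (3, 4)"))  # True
```

Whether this is good enough for PostgreSQL depends on the string rules mentioned above, so treat it as an illustration of the approach rather than a complete check.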
* This is correct for a dialect that accepts either single- or double-quoted strings, does not escape one quote type within the other, and escapes quotes by doubling them (we will incorrectly treat 'abc''def' as two strings abc and def, rather than one string abc'def, but since all we're doing is skipping the strings anyway, we get the right result), but does not have C-style backslash escapes or anything else. I believe this matches sqlite3 as it actually works, although not sqlite3 as it's documented, and I have no idea whether it matches PostgreSQL.
Allowing users to make arbitrary queries (even single queries) can open your program up to SQL injection attacks and denial-of-service (DOS) attacks. The safest way to deal with potentially malicious users is to enumerate exactly which queries are allowable and only allow the user to supply parameter values, not the entire SQL query itself.
So for example, you could define
sql = 'INSERT INTO my_table VALUES (%s, %s)'
args = [1, 2] # <-- Supplied by the user
and then safely execute the INSERT statement with:
cursor.execute(sql, args)
This is called parametrized SQL because the sql uses %s as parameter placemarkers, and the cursor.execute statement takes two arguments. The second argument is expected to be a sequence, and the database driver (e.g. psycopg2) will replace the parameter placemarkers with properly quoted values supplied by args.
This will prevent SQL injection attacks.
The onus is still on you (when you write your allowable SQL) to prevent denial-of-service attacks. You can attempt to protect yourself from DOS attacks by making sure the arguments supplied by the user are in a reasonable range, for instance.
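A range check of that kind could look roughly like this. The helper name checked_int and the limit MAX_VALUE are hypothetical; pick bounds that make sense for your own table:

```python
MAX_VALUE = 1000  # hypothetical upper bound for user-supplied numbers


def checked_int(value, low=0, high=MAX_VALUE):
    """Convert user input to int and reject anything outside [low, high]."""
    number = int(value)  # raises ValueError for non-numeric input
    if not low <= number <= high:
        raise ValueError("value %d is outside %d..%d" % (number, low, high))
    return number


# Validate first, then hand the values to cursor.execute(sql, args) as before.
args = [checked_int("1"), checked_int("2")]
print(args)  # [1, 2]
```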
I am currently trying to use place holders in my PostgreSQL query within Python's psycopg's module. Here is a sample of the code I am using.
table.execute('SELECT * FROM table WHERE col2 = %s ORDER BY pID ASC LIMIT %s OFFSET %s;',(val1,val2,val3))
I read somewhere that it is not possible to use placeholders like this for LIMIT and OFFSET; however, I should use this placeholder format for WHERE =.
safely specifying 'order by' clause from user input in python / postgresql / psycopg2
Does anyone know the proper placeholder syntax for this sql query? Thanks!
Limit and offset can both be used with placeholders without any issue.
Generally speaking you can use placeholders wherever a 'value' would be allowed in an expression.
cur.execute("select * from node where node_name = %s limit %s offset %s", ('test', 5, 5))
Works just fine.
As already noted in the referenced article you cannot use placeholders to refer to tables, columns, schemas, or aliases for any of them. In those cases you generally need to do your own variable substitution before calling execute.
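One common way to do that substitution safely is to accept only names from a fixed list. A sketch, where the column names in ALLOWED_SORT_COLUMNS and the helper build_node_query are assumptions for illustration:

```python
ALLOWED_SORT_COLUMNS = {"node_name", "node_id"}  # hypothetical column names
ALLOWED_DIRECTIONS = {"ASC", "DESC"}

QUERY_TEMPLATE = (
    "select * from node where node_name = %s "
    "order by {column} {direction} limit %s offset %s"
)


def build_node_query(sort_column, direction="ASC"):
    """Fill in identifiers from a whitelist; values stay as %s placeholders."""
    direction = direction.upper()
    if sort_column not in ALLOWED_SORT_COLUMNS:
        raise ValueError("unknown sort column: %r" % (sort_column,))
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError("unknown sort direction: %r" % (direction,))
    return QUERY_TEMPLATE.format(column=sort_column, direction=direction)


query = build_node_query("node_id", "desc")
# The values are then passed separately: cur.execute(query, ('test', 5, 5))
```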
In very old versions of PostgreSQL, it was indeed not possible to use placeholders in LIMIT and OFFSET clauses. This functionality was added in version 7.4, so it is safe to assume that it exists in current installations.
But that only applies to server-side prepared statements. Psycopg does not use server-side prepared statements. It does its own string substitution and sends the resulting string to the backend as a constant. So in principle, you can use its parameter substitution feature anywhere the resulting literal would be syntactically valid.
So what you are proposing to do is fine either way.
I could write a stored procedure inside MySQL and execute it with a CALL statement, but I'm looking to write it in Python instead. I got stuck using a SQL script that spans multiple lines.
conn = pyodbc.connect('DSN=MySQL;PWD=xxxx')
csr = conn.cursor()
Sql= 'SELECT something, something
FROM table
WHERE foo=bar
ORDER BY foo '
csr.execute(Sql)
sqld = csr.fetchall()
Heh, I don't mind to make it a proper answer.
String literals in triple quotes can include linebreaks and won't cause syntax errors. Otherwise (with "string" or 'string') you will need to include a backslash before every linebreak to make it work. And from experience, that's easy to screw up. :)
As a minor note, in Python variables are usually started with a lowercase letter, names starting with capital letters usually being given to classes.
So:
sql = """SELECT something, something
FROM table
WHERE foo=bar
ORDER BY foo"""
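For comparison, here are the three spellings side by side. This is only an illustration; the variable names are made up, and the third form (adjacent literals inside parentheses) is an extra option not mentioned above:

```python
# Triple quotes: the line breaks become part of the string.
sql_triple = """SELECT something, something
FROM table
WHERE foo=bar
ORDER BY foo"""

# Backslash before each line break: the break is dropped from the string.
sql_backslash = 'SELECT something, something \
FROM table \
WHERE foo=bar \
ORDER BY foo'

# Adjacent literals inside parentheses are joined into one string.
sql_concat = (
    "SELECT something, something "
    "FROM table "
    "WHERE foo=bar "
    "ORDER BY foo"
)

print("\n" in sql_triple)     # True
print("\n" in sql_backslash)  # False
```

Apart from whitespace, all three hold the same statement, so which one to use is mostly a matter of readability.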
If you don't mind the overhead, take a look at sqlalchemy:
SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.