I'm looking to escape special characters in a string in Python 2.7.
For example, if I have:
str = "You're the best "dog" on earth."
I would like to get:
str = "You\'re the best \"dog\" on earth."
I want this because I'm inserting strings into a SQL database using PyMySQL and I can't find a way to do it.
I guess the escaped characters must look like this? (I'm not really sure.)
I would also like a way to do the reverse: remove the escaping characters.
You are approaching this entirely the wrong way. You should never need to escape special characters when inserting a string into a SQL database: always use parametrised SQL queries and any needed escaping will be done for you. If you start trying to escape the strings yourself you are opening your code up to all manner of security problems.
with connection.cursor() as cursor:
    # Create a new record
    sql = "INSERT INTO `mytable` (`thestring`) VALUES (%s)"
    cursor.execute(sql, (str,))
If you ever find yourself building a query string out of data that has come from any outside source, stop and reconsider: you should never need to do that.
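To see that no manual escaping is needed, here is a minimal runnable sketch. It uses the standard-library sqlite3 module as a stand-in for PyMySQL so that it runs without a database server; that substitution is an assumption made for illustration, and the only visible difference is the placeholder (sqlite3 uses ?, PyMySQL uses %s).

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a MySQL connection.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE mytable (thestring TEXT)")

text = "You're the best \"dog\" on earth."

# The value is passed separately from the SQL text; no backslashes are added by hand.
cursor.execute("INSERT INTO mytable (thestring) VALUES (?)", (text,))
connection.commit()

# Read the row back: the stored value is the original string, quotes included.
cursor.execute("SELECT thestring FROM mytable")
stored = cursor.fetchone()[0]
print(stored == text)
```

Because the string comes back unchanged, there is also no "reverse" unescaping step to perform.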
You don't need to escape values for the purpose of SQL by hand! Let the database API take care of that.
Form a valid string literal in Python source code:
str = "You're the best \"dog\" on earth."
str = 'You\'re the best "dog" on earth.'
str = """You're the best "dog" on earth."""
These are all equivalent, you just need to escape the appropriate quotes that you're using as string literal terminators.
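A quick check of that equivalence, as a small sketch (the variable names a, b and c are just for illustration):

```python
# Three ways of writing the same string literal in Python source code.
a = "You're the best \"dog\" on earth."
b = 'You\'re the best "dog" on earth.'
c = """You're the best "dog" on earth."""

# The backslashes exist only in the source code, not in the resulting strings.
print(a == b == c)
print(a)
```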
Use the database API correctly and don't worry about escaping. From the manual:
sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
cursor.execute(sql, ('webmaster@python.org', 'very-secret'))
Escaping is handled by separating the query and values, not by adding backslashes.
I have a Python program and I want to insert data into a table (using an INSERT INTO statement). I receive the data from the web (web scraping), and it contains both single and double quotes. MySQL allows both single and double quotes to be stored in a table, so the error does not come from the database. The problem appears when I use that data in Python and an error is raised.
It doesn't matter whether I use single or double quotes for the string (the INSERT INTO statement values) in Python: either way an error appears because of the data, which contains single or double quotes. I use MySQL and Connector/Python, and in my script I import mysql. I hope this is clear; sorry about my bad English.
Most likely explanation for the behavior is a SQL Injection vulnerability. (That's just a guess because we are speculating about code we haven't seen; only a description of the behavior.)
The short answer is to use prepared statements with bind placeholders:
https://pynative.com/python-mysql-execute-parameterized-query-using-prepared-statement/
If for some reason that is not possible, then at a bare minimum, any potentially unsafe values included in SQL text must be properly escaped to make them safe for inclusion.
(The single quote in Little Bobby Tables https://xkcd.com/327/ is not escaped.)
As example, this SQL will throw an error, because the second single quote ends the string literal, and what follows the end of the string literal "s wrong" is gibberish in terms of SQL:
INSERT INTO mytab (mycol) VALUES ( 'It's wrong' )
                                      ^
But this will work:
INSERT INTO mytab (mycol) VALUES ( 'It''ll work' )
                                      ^^
Because the single quote within the string literal is escaped, by preceding it with another single quote.
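The two statements above can be tried out directly. The sketch below runs them against an in-memory SQLite database rather than MySQL (an assumption made so the example is self-contained); doubling the single quote is standard SQL and behaves the same way in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytab (mycol TEXT)")

# The unescaped quote ends the string literal early, so the statement fails to parse.
try:
    conn.execute("INSERT INTO mytab (mycol) VALUES ( 'It's wrong' )")
    unescaped_failed = False
except sqlite3.OperationalError:
    unescaped_failed = True

# Doubling the quote inside the literal escapes it.
conn.execute("INSERT INTO mytab (mycol) VALUES ( 'It''ll work' )")
value = conn.execute("SELECT mycol FROM mytab").fetchone()[0]

print(unescaped_failed)
print(value)
```

The stored value contains a single apostrophe; the doubling exists only in the SQL text.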
The OWASP project provides a good overview of SQL Injection.
https://www.owasp.org/index.php/SQL_Injection
https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet
I know that variants of this topic have been discussed elsewhere, but none of the other threads were helpful.
I want to hand over a string from python to sql. It might however happen that apostrophes (') occur in the string. I want to escape them with a backslash.
sql = "update tf_data set authors=\'"+(', '.join(authors).replace("\'","\\\'"))+"\' where tf_data_id="+str(tf_data_id)+";"
However, this will always give \\' in my string. Therefore, the backslash itself is escaped and the sql statement doesn't work.
Can someone help me or give me an alternative to the way I am doing this?
Thanks
Simply don't.
Also don't concatenate sql queries as these are prone to sql injections.
Instead, use a parameterized query:
sql = "update tf_data set authors=%(authors)s where tf_data_id=%(data_id)s"
# or :authors and :data_id, I get confused with all those sql dialects out there
authors = ', '.join(authors)
data_id = str(tf_data_id)
# db or whatever your db instance is called
db.execute(sql, {'authors': authors, 'data_id': data_id})
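For a version of this that can be run as-is, here is a sketch using sqlite3, which accepts the :name placeholder style mentioned in the comment above. The table layout and the author names are made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical tf_data table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tf_data (tf_data_id INTEGER PRIMARY KEY, authors TEXT)")
db.execute("INSERT INTO tf_data (tf_data_id, authors) VALUES (1, '')")

authors = ["Flann O'Brien", "Sean O'Casey"]  # apostrophes need no special handling
tf_data_id = 1

# Named placeholders; the values travel in a dict, separate from the SQL text.
sql = "UPDATE tf_data SET authors = :authors WHERE tf_data_id = :data_id"
db.execute(sql, {"authors": ", ".join(authors), "data_id": tf_data_id})

saved = db.execute(
    "SELECT authors FROM tf_data WHERE tf_data_id = ?", (tf_data_id,)
).fetchone()[0]
print(saved)
```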
You're using double-quoted strings, but still escaping the single quotes within them. That's not required, all you need to do is escape the backslash that you want to use in the replace operation.
>>> my_string = "'Hello there,' I said."
>>> print(my_string)
'Hello there,' I said.
>>> print(my_string.replace("'", "\\'"))
\'Hello there,\' I said.
Note that I'm using print. If you just ask Python to show you its representation of the string after the replace operation, you'll see double backslashes because they need to be escaped.
>>> my_string.replace("'", "\\'")
"\\'Hello there,\\' I said."
As others have alluded to, if you are using a Python package to execute your SQL, use the provided methods with parameter placeholders (if available).
My answer addresses the escaping issues mentioned.
Use a String literal with prefix r
print(r"""the\quick\fox\\\jumped\'""")
Output:
the\quick\fox\\\jumped\'
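The last two answers can be checked together in a few lines. This sketch shows that the replacement inserts one backslash per apostrophe, that repr() merely displays each backslash doubled, and that a raw string literal is another spelling of the same replacement text.

```python
my_string = "'Hello there,' I said."

# "\\'" in source code is two characters: one backslash and one apostrophe.
escaped = my_string.replace("'", "\\'")

print(escaped)        # shows the actual characters
print(repr(escaped))  # shows the Python literal form, with backslashes doubled

# A raw string spells the same two-character replacement without doubling.
raw_replacement = r"\'"
same = raw_replacement == "\\'"
print(same)
```

Note that a raw string literal cannot end in an odd number of backslashes, so it is not a universal substitute for escaping.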
Is MATCH from MySQL also vulnerable to injection attack?
For example:
"""SELECT *
FROM myTable
WHERE MATCH(myColumnName) AGAINST(%s)
ORDER BY id
LIMIT 20""" % query
seems to allow arbitrary strings, which looks bad.
If so, I've instead tried - following examples in the Python docs -
t = (query,)
statement = """SELECT *
FROM myTable
WHERE MATCH(myColumnName) AGAINST(?)
ORDER BY id
LIMIT 20"""
cursor.execute(statement, t)
but nothing is returned - even when the string query returned hits in (1) above. Why is that?
In 2), using the placeholder %s instead of ? returns results. Why is this safer than 1) (if at all)? E.g. with the query string I can always close off a string and parenthesis with query=')...' and continue query=') OR otherColumnName LIKE '%hello%' --.
Therefore, is it enough to strip query strings of everything but roman characters or numerals?
It doesn't much matter what operators, functions, clauses or any other host-language terms you're using when it comes to injection. Injection is a matter of mixing data and language statements, which happens when you interpolate data into a statement. Prepared statement parameters keep data and statements separate, so they're not vulnerable to injection.
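The difference between interpolating data and binding it can be seen in a small sketch. It uses sqlite3 and a made-up users table purely for illustration; the same reasoning applies to MATCH ... AGAINST in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Input crafted to close the string literal and append its own condition.
payload = "nobody' OR '1'='1"

# Unsafe: the data becomes part of the SQL text, so the payload is parsed as SQL.
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % payload
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the payload is bound as a value and compared as one plain string.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (payload,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The interpolated query matches every row, while the parameterised one matches none, because no user is literally named after the payload.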
As for ? versus %s for parameters, the MySQLdb documentation for Cursor.execute says the following:
execute(self, query, args=None)
[...]
Note: If args is a sequence, then %s must be used as the parameter placeholder in the query. If a mapping is used, %(key)s must be used as the placeholder.
When using a parameterized statement with arguments, MySQLdb escapes the arguments and uses Python string formatting to re-insert those arguments into the parameterized statement. A single string statement is then sent to the server.
You are relying on MySQLdb's ability to properly escape arguments to protect against SQL injection:
MySQLdb/cursors.py:
def execute(self, query, args=None):
    ...
    if args is not None:
        query = query % db.literal(args)
In contrast, oursql sends queries and data to the server completely separately.
It does not rely on escaping the data. This should be even safer.
The first method is not correct since it uses basic python string formatting, that will not escape the query string.
The second method is the preferred method of sending a simple query to the server and it will properly escape the query.
You just have a simple bug. You need to replace AGAINST(?) with AGAINST(%s)
The information is also found in the python DbApi FAQ
For my purposes it's enough to sanitise the query string: strip it of all non-alphanumeric characters (in particular ' and )).
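A minimal sketch of that kind of sanitising, assuming that ASCII letters, digits and spaces are the only characters worth keeping (the helper name strip_non_alphanumeric is hypothetical). As the other answers point out, binding the value as a parameter is still the more robust choice.

```python
import re


def strip_non_alphanumeric(query):
    """Drop every character that is not an ASCII letter, digit or space."""
    return re.sub(r"[^A-Za-z0-9 ]", "", query)


# The injection attempt from the question loses its quotes, parenthesis and comment marker.
cleaned = strip_non_alphanumeric("hello') OR otherColumnName LIKE '%hello%' --")
print(cleaned)
```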
This string:
"CREATE USER %s PASSWORD %s", (user, pw)
always gets expanded to:
CREATE USER E'someuser' PASSWORD E'somepassword'
Can anyone tell me why?
Edit:
The expanded string above is the string my database gives me back in the error message. I'm using psycopg2 to access my postgres database. The real code looks like this:
conn=psycopg2.connect(user=adminuser, password=adminpass, host=host)
cur = conn.cursor()
#user and pw are simple standard python strings the function gets as parameter
cur.execute("CREATE USER %s PASSWORD %s", (user, pw))
conn.commit()
To pass identifiers to postgresql through psycopg use AsIs from the extensions module
from psycopg2.extensions import AsIs
import psycopg2
connection = psycopg2.connect(database='db', user='user')
cur = connection.cursor()
cur.mogrify(
    'CREATE USER %s PASSWORD %s', (AsIs('someuser'), AsIs('somepassword'))
)
'CREATE USER someuser PASSWORD somepassword'
That works also for passing conditions to clauses like order by:
cur.mogrify(
    'select * from t order by %s', (AsIs('some_column, another column desc'),)
)
'select * from t order by some_column, another column desc'
As the OP's edit reveals he's using PostgreSQL, the docs for it are relevant, and they say:
PostgreSQL also accepts "escape"
string constants, which are an
extension to the SQL standard. An
escape string constant is specified by
writing the letter E (upper or lower
case) just before the opening single
quote, e.g. E'foo'.
In other words, psycopg is correctly generating escape string constants for your strings (so that, as the docs also say:
Within an escape string, a backslash
character (\) begins a C-like
backslash escape sequence, in which
the combination of backslash and
following character(s) represents a
special byte value.
(which as it happens are also the escape conventions of non-raw Python string literals).
The OP's error clearly has nothing to do with that, and, besides the excellent idea of studying PostgreSQL's excellent docs, he should not worry about that E'...' form in this case;-).
Not only the E but the quotes appear to come from whatever type user and pw have. %s simply does what str() does, which may fall back to repr(), both of which have corresponding methods __str__ and __repr__. Also, that isn't the code that generates your result (I'd assumed there was a %, but now see only a comma). Please expand your question with actual code, types and values.
Addendum: Considering that it looks like SQL, I'd hazard a guess that you're seeing escape string constants, likely properly generated by your database interface module or library.
Before attempting something like:
statement = "CREATE USER %s PASSWORD %s" % (user, pw)
Please ensure you read: http://www.initd.org/psycopg/docs/usage.html
Basically the issue is that if you are accepting user input (I assume so as someone is entering in the user & pw) you are likely leaving yourself open to SQL injection.
As the psycopg2 documentation states:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
As has been identified, Postgres (or Psycopg2) doesn't seem to provide a good answer to escaping identifiers. In my opinion, the best way to resolve this is to provide a 'whitelist' filtering method.
I.e., identify which characters are allowed in a 'user' and a 'pw' (perhaps A-Za-z0-9_). Be careful not to allow characters with special meaning in SQL (', ;, etc.), or if you do, make sure you escape them.
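A whitelist check along those lines might look like the sketch below. The helper name check_identifier and the exact character set are assumptions for illustration; it rejects a value outright instead of trying to repair it.

```python
import re

# Only ASCII letters, digits and underscores, for the whole string.
_ALLOWED = re.compile(r"[A-Za-z0-9_]+\Z")


def check_identifier(value):
    """Return value unchanged if it passes the whitelist, else raise ValueError."""
    if not _ALLOWED.match(value):
        raise ValueError("disallowed characters in %r" % (value,))
    return value


print(check_identifier("some_user1"))

try:
    check_identifier("x'; DROP ROLE admin; --")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Only a value that passes such a check would then be placed into the CREATE USER statement.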
I have a python script that reads raw movie text files into an sqlite database.
I use re.escape(title) to add escape chars into the strings to make them db safe before executing the inserts.
Why does this not work:
In [16]: c.execute("UPDATE movies SET rating = '8.7' WHERE name='\'Allo\ \'Allo\!\"\ \(1982\)'")
---------------------------------------------------------------------------
OperationalError                          Traceback (most recent call last)
/home/rajat/Dropbox/amdb/<ipython console> in <module>()
OperationalError: near "Allo": syntax error
Yet this works (removed \' in two places) :
In [17]: c.execute("UPDATE movies SET rating = '8.7' WHERE name='Allo\ Allo\!\"\ \(1982\)'")
Out[17]: <sqlite3.Cursor object at 0x9666e90>
I can't figure it out. I also can't ditch those leading quotes because they're actually part of the movie title.
Thank you.
You're doing it wrong. Literally. You should be using parameters, like this:
c.execute("UPDATE movies SET rating = ? WHERE name = ?", (8.7, "'Allo 'Allo! (1982)"))
Like that, you won't need to do any quoting at all and (if those values are coming from anyone untrusted) you'll be 100% safe (here) from SQL injection attacks too.
I use re.escape(title) to add escape
chars into the strings to make them db
safe
Note that re.escape makes a string re-safe -- nothing to do with making it db safe. Rather, as @Donal says, what you need is the parameter substitution concept of the Python DB API -- that makes things "db safe" as you need.
SQLite doesn't support backslash escape sequences. Apostrophes in string literals are indicated by doubling them: '''Allo ''Allo! (1982)'.
But, like Donal said, you should be using parameters.
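Both points can be verified with sqlite3 itself. The sketch below assumes a minimal movies table and a title that contains two apostrophes and a double quote, as in the question; it updates the row through parameters and then finds the same row with a literal that doubles the apostrophes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE movies (name TEXT, rating REAL)")

# The raw title, exactly as it should be stored: no re.escape, no backslashes.
title = "'Allo 'Allo!\" (1982)"
c.execute("INSERT INTO movies (name, rating) VALUES (?, ?)", (title, 0.0))

# Parameters carry the values, so the quotes in the title need no treatment.
c.execute("UPDATE movies SET rating = ? WHERE name = ?", (8.7, title))
updated = c.rowcount

rating = c.execute(
    "SELECT rating FROM movies WHERE name = ?", (title,)
).fetchone()[0]

# The same title written as a SQL literal: each apostrophe is doubled.
literal_count = c.execute(
    "SELECT COUNT(*) FROM movies WHERE name = '''Allo ''Allo!\" (1982)'"
).fetchone()[0]

print(updated, rating, literal_count)
```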
I've one simple tip you could use to handle this problem:
When your SQL statement string contains a single quote ('), you can enclose the statement string in double quotes. And when your SQL statement string contains double quotes ("), you can enclose the statement string in single quotes (').
E.g.
sqlString="UPDATE movies SET rating = '8.7' WHERE name='Allo Allo !' (1982 )"
c.execute(sqlString)
Or,
sqlString='UPDATE movies SET rating = "8.7" WHERE name="Allo Allo !" (1982 )'
c.execute(sqlString)
This solution works for me in Python environment.