I am trying to update a Cassandra database using the Python client as follows.
def update_content(session, id, content):
    update_statement = """
        UPDATE mytable SET content='{}' WHERE id={}
    """
    session.execute(update_statement.format(content, id))
It works in most cases, but in some scenarios the content is a string of the form
content = "Content Message -'[re]...)"
which results in the error: Exception calling application: <Error from server: code=2000 [Syntax error in CQL query] message="line 2:61 mismatched input 're' expecting K_WHERE (
I am not sure why this is happening.
Is Cassandra trying to interpret the string as a regex somehow?
I tried printing the statement before the update and it seems fine:
"UPDATE mytable SET content='Content Message -'[re]...)' WHERE id=2"
To avoid such problems you should stop using .format to build CQL statements and start using prepared statements, which allow you to:
avoid problems with unescaped special characters, like '
do basic type checking
get better performance, because the query is parsed once and only the data is sent over the wire
get token-aware query routing, meaning the query is sent directly to one of the replicas that holds the data for the partition
Your code needs to be modified as follows:
prep_statement = session.prepare('UPDATE mytable SET content=? WHERE id=?')

def update_content(session, id, content):
    session.execute(prep_statement, [content, id])
Please note that the statement needs to be prepared only once, because preparation includes a round-trip to the cluster nodes to parse the query.
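Put together, a minimal sketch (assuming the DataStax cassandra-driver package; the contact points and keyspace are illustrative):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('mykeyspace')

# Prepare once, at startup -- preparation costs a round-trip to the cluster.
prep_statement = session.prepare('UPDATE mytable SET content=? WHERE id=?')

def update_content(session, id, content):
    session.execute(prep_statement, [content, id])

# Quotes and brackets in the value no longer break the statement:
update_content(session, 2, "Content Message -'[re]...)")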
I have a Python script that uses PyMySQL to connect to a MySQL database and insert rows into it. Some of the columns in the table are of type json.
I know that in order to insert JSON, we can run something like:
my_json = {"key" : "value"}
insert_query = """INSERT INTO my_table (my_json_column) VALUES ('%s')""" % (json.dumps(my_json))
cursor = connection.cursor()
cursor.execute(insert_query)
connection.commit()
The problem in my case is that the JSON is a variable over which I do not have much control (it's coming from an API call to a third-party endpoint), so my script keeps throwing new errors for non-valid JSON variables.
For example, the json could very well contain a stringified json as a value, so my_json would look like:
{"key": "{\"key_str\":\"val_str\"}"}
→ In this case, running the usual insert script would throw a [ERROR] OperationalError: (3140, 'Invalid JSON text: "Missing a comma or \'}\' after an object member." at position 1234 in value for column \'my_table.my_json_column\'.')
Or another example are json variables that contain a single quotation mark in some of the values, something like:
{"key" : "Here goes my value with a ' quotation mark"}
→ In this case, the usual insert script returns an error similar to the one below, unless I manually escape those single quotation marks in the script by replacing them.
[ERROR] ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'key': 'Here goes my value with a ' quotation mark' at line 1")
So my question is the following:
Are there any best practices that I might be missing and that I can use to avoid my script breaking in the two scenarios mentioned above, but also with any other JSON that might break the insert query?
I read some existing posts like this one here or this one, where it's recommended to insert the json into a string or a blob column, but I'm not sure if that's a good practice / if other issues (like string length limitations for example) might arise from using a string column instead of json.
Thanks!
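(For reference, a parameterized insert sidesteps both quoting scenarios, because the driver escapes the value itself. A minimal sketch with PyMySQL; the connection details are illustrative:)

import json
import pymysql

connection = pymysql.connect(host='localhost', user='user',
                             password='secret', database='mydb')

my_json = {"key": "{\"key_str\":\"val_str\"}"}  # stringified JSON as a value

with connection.cursor() as cursor:
    # The driver quotes and escapes the value, so embedded quotes are safe.
    cursor.execute("INSERT INTO my_table (my_json_column) VALUES (%s)",
                   (json.dumps(my_json),))
connection.commit()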
My objective is to store a JSON object into a MySQL database field of type json, using the mysql.connector library.
import mysql.connector
import json
jsonData = json.dumps(origin_of_jsonData)
cnx = mysql.connector.connect(**config_defined_elsewhere)
cursor = cnx.cursor()
cursor.execute('CREATE DATABASE dataBase')
cnx.database = 'dataBase'
cursor = cnx.cursor()
cursor.execute('CREATE TABLE `table` (id_field INT NOT NULL, json_data_field JSON NOT NULL, PRIMARY KEY (id_field))')
Now, the code below WORKS just fine, the focus of my question is the use of '%s':
insert_statement = "INSERT INTO `table` (id_field, json_data_field) VALUES (%s, %s)"
values_to_insert = (1, jsonData)
cursor.execute(insert_statement, values_to_insert)
My problem with that: I am very strictly adhering to the use of '...{}'.format(aValue) (or f'...{aValue}') when combining variable aValue(s) into a string, thus avoiding the use of %s (whatever my reasons for that, let's not debate them here - but it is how I would like to keep it wherever possible, hence my question).
In any case, I am simply unable, whichever way I try, to create something that stores the jsonData into the MySQL database using something that resembles the above structure and uses '...{}'.format() (in whatever shape or form) instead of %s. For example, I have (among many iterations) tried
insert_statement = "INSERT INTO `table` (id_field, json_data_field) VALUES ({}, {})".format(1, jsonData)
cursor.execute(insert_statement)
but no matter how I turn and twist it, I keep getting the following error:
ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[some_content_from_jsonData})]' at line 1
Now my question(s):
1) Is there a way to avoid the use of %s here that I am missing?
2) If not, why? What is it that makes this impossible? Is it the cursor.execute() function, or is it the fact that it is a JSON object, or is it something completely different? Shouldn't {}.format() be able to do everything that %s could do, and more?
First of all: NEVER DIRECTLY INSERT YOUR DATA INTO YOUR QUERY STRING!
Using %s in a MySQL query string is not the same as using it in a Python string.
In Python, you just format the string, and 'hello %s!' % 'world' becomes 'hello world!'. In SQL, the %s signals parameter insertion, which sends your query and your data to the server separately. You are not bound to this syntax either; the Python DB-API specification allows several parameter styles: DB-API parameter styles (PEP 249). Parameter insertion has several advantages over inserting your data directly into the query string:
Prevents SQL injection
Say you have a query to authenticate users by password. You would do that with the following query (of course you would normally salt and hash the password, but that is not the topic of this question):
SELECT 1 FROM users WHERE username='foo' AND password='bar'
The naive way to construct this query would be:
"SELECT 1 FROM users WHERE username='{}' AND password='{}'".format(username, password)
However, what would happen if someone inputs ' OR '1'='1 as the password? The formatted query would then become
SELECT 1 FROM users WHERE username='foo' AND password='' OR '1'='1'
which will always return a row. When using parameter insertion:
execute('SELECT 1 FROM users WHERE username=%s AND password=%s', (username, password))
this will never happen, as the data is sent to the server separately from the query.
Performance
If you run the same query many times with different data, the performance difference between a formatted query and parameter insertion can be significant. With parameter insertion, the server only has to compile the query once (as it is the same every time) and execute it with different data; with string formatting, it has to compile the query over and over again.
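For example (a sketch, reusing the cursor and connection from the earlier snippets; the table, columns, and data are illustrative), executemany reuses a single parameterized statement for a whole batch:

rows = [("alice", "secret1"), ("bob", "secret2"), ("carol", "secret3")]

# One parameterized statement, reused for every row of data.
cursor.executemany(
    "INSERT INTO users (username, password) VALUES (%s, %s)",
    rows)
connection.commit()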
In addition to what was said above, I would like to add some details that I did not immediately understand, and that other newbies (like me ;)) may also find helpful:
1) "parameter insertion" is meant only for values; it will not work for table names, column names, etc. For those, Python string substitution works fine in the SQL syntax definition.
2) the cursor.execute function requires a tuple of parameters to work (as specified here, albeit not immediately clear, at least to me: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html)
EXAMPLE for both in one function:
def checkIfRecordExists(column, table, condition_name, condition_value):
    ...
    sqlSyntax = 'SELECT {} FROM {} WHERE {} = %s'.format(column, table, condition_name)
    cursor.execute(sqlSyntax, (condition_value,))
Note both the use of .format in the initial sql syntax definition and the use of (condition_value,) in the execute function.
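A call then looks like this (the table and column names are illustrative):

checkIfRecordExists('id', 'users', 'email', 'foo@example.com')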
I have some code in Python that sets a char(80) value in an sqlite DB.
The string is obtained directly from the user through a text input field and sent back to the server with a POST method in a JSON structure.
On the server side I currently pass the string to a method calling the SQL UPDATE operation.
It works, but I'm aware it is not safe at all.
I expect that the client side is unsafe anyway, so any protection has to be put on the server side. What can I do to secure the UPDATE operation against SQL injection?
A function that would "quote" the text so that it can't confuse the SQL parser is what I'm looking for. I expect such a function exists but couldn't find it.
Edit:
Here is my current code setting the char field name label:
def setLabel( self, userId, refId, label ):
    self._db.cursor().execute( """
        UPDATE items SET label = ? WHERE userId IS ? AND refId IS ?""", ( label, userId, refId) )
    self._db.commit()
From the documentation:
con.execute("insert into person(firstname) values (?)", ("Joe",))
This escapes "Joe", so what you want is
con.execute("insert into person(firstname) values (?)", (firstname_from_client,))
The DB-API's .execute() supports parameter substitution, which will take care of escaping for you; it's mentioned near the top of the docs (http://docs.python.org/library/sqlite3.html), just above "Never do this -- insecure".
Noooo... USE BIND VARIABLES! That's what they're there for. See this
Another name for the technique is parameterized sql (I think "bind variables" may be the name used with Oracle specifically).
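As a minimal, self-contained sketch of both parameter styles sqlite3 accepts (the table and values are illustrative):

import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE items (userId INTEGER, refId INTEGER, label TEXT)')

# Positional (qmark) style:
con.execute('INSERT INTO items VALUES (?, ?, ?)', (1, 2, "a 'quoted' label"))

# Named style, also allowed by the DB-API:
con.execute(
    'UPDATE items SET label = :label WHERE userId = :userId AND refId = :refId',
    {'label': 'new label', 'userId': 1, 'refId': 2},
)
con.commit()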
I'm trying to extract email addresses from text in the column alltext and update the column email with the list of emails found in alltext. The datatype for email is a string array (i.e. text[]).
1) I'm getting the following error and can't seem to find a way around it:
psycopg2.ProgrammingError: syntax error at or near "["
LINE 1: UPDATE comments SET email=['person@email.com', 'other@email.com']
2) Is there a more efficient way to be doing this in the first place? I've experimented some with the PostgreSQL regex documentation but a lot of people seem to think it's not great for this purpose.
def getEmails():
    '''Get emails from alltext.'''
    DB = psycopg2.connect("dbname=commentDB")
    c = DB.cursor()
    c.execute("SELECT id, alltext FROM comments WHERE id < 100")
    for row in c:
        match = re.findall(r'[\w\.-]+@[\w\.-]+', str(row[1]))
        data = {'id':int(row[0]), 'email':match}
        c.execute("UPDATE comments SET email=%(email)s WHERE id=%(id)s" % data)
    DB.commit()
    DB.close()
execute should be passed a list for unnamed arguments, or a dict -- as in this case -- for named arguments, as its second argument, to ensure that it is psycopg2 (via libpq) doing all the proper escaping. You are using native Python string interpolation, which is subject to SQL injection and is what leads to this error, since it isn't libpq doing the interpolation.
Also, as an aside, your regex won't capture various types of email addresses. One that immediately comes to mind is the form foo+bar@loopback.edu. The + is technically allowed and can be used, for example, for filtering email. See this link for more details on the issues that crop up when using regexes for validating/parsing email addresses.
In short, the above link recommends using this regex:
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
With the caveat that it matches only what the author claims is a valid email address. Still, it's probably a good jumping-off point and can be adjusted if you have specific cases that differ.
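For example, applied in Python (note the IGNORECASE flag, since the pattern is written with uppercase character classes; the sample text is illustrative):

import re

EMAIL_RE = re.compile(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b', re.IGNORECASE)

text = "Contact foo+bar@loopback.edu or admin@example.com."
print(EMAIL_RE.findall(text))  # ['foo+bar@loopback.edu', 'admin@example.com']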
Edit in response to comment from OP:
The execute line from above would become:
c.execute("UPDATE comments SET email=%(email)s WHERE id=%(id)s", data)
Note that data is now the second argument to execute, as opposed to being part of an interpolation operation. This means that psycopg2 will handle the interpolation, and not only avoid the SQL injection issue but also properly interpret how the dict should be interpolated into the query string.
Edit in response to follow-up comment from OP:
Yes, the subsequent "no results to fetch" error is likely because you are using the same cursor. Since you are iterating over the current cursor, trying to use it again inside the for loop to do an update interferes with the iteration.
I would declare a new cursor inside the for loop and use that.
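A sketch of the corrected loop using a second cursor for the updates (declaring it once outside the loop works just as well; schema as in the question):

import re
import psycopg2

EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')

DB = psycopg2.connect("dbname=commentDB")
read_cur = DB.cursor()
write_cur = DB.cursor()  # a separate cursor, so iteration is not disturbed

read_cur.execute("SELECT id, alltext FROM comments WHERE id < 100")
for comment_id, alltext in read_cur:
    emails = EMAIL_RE.findall(str(alltext))
    # psycopg2 adapts a Python list to a PostgreSQL array (text[]) and does
    # all the escaping, since the data is passed as the second argument.
    write_cur.execute(
        "UPDATE comments SET email=%(email)s WHERE id=%(id)s",
        {'email': emails, 'id': comment_id},
    )
DB.commit()
DB.close()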
I am looking for a syntax definition, example, sample code, wiki, etc. for
executing a LOAD DATA LOCAL INFILE command from Python.
I believe I can use mysqlimport as well if that is available, so any feedback (and code snippets) on which is the better route is welcome. A Google search is not turning up much in the way of current info.
The goal in either case is the same: Automate loading hundreds of files with a known naming convention & date structure, into a single MySQL table.
David
Well, using python's MySQLdb, I use this:
connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**')
cursor = connection.cursor()
query = "LOAD DATA INFILE '/path/to/my/file' INTO TABLE sometable FIELDS TERMINATED BY ';' ENCLOSED BY '\"' ESCAPED BY '\\\\'"
cursor.execute( query )
connection.commit()
replacing the host/user/passwd/db as appropriate for your needs. This is based on the MySQL docs here. The exact LOAD DATA INFILE statement will depend on your specific requirements, etc. (note that the FIELDS TERMINATED BY, ENCLOSED BY, and ESCAPED BY clauses will be specific to the type of file you are trying to read in).
You can also get the results of the import by adding the following line after your query:
results = connection.info()
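Toward the stated goal of loading hundreds of files with a known naming convention, a minimal sketch (assuming MySQLdb with local_infile enabled on both client and server; the glob pattern and table name are illustrative):

import glob
import MySQLdb

connection = MySQLdb.Connect(host='**', user='**', passwd='**', db='**',
                             local_infile=1)
cursor = connection.cursor()

for path in sorted(glob.glob('/path/to/my/files/data_*.csv')):
    # MySQLdb interpolates parameters client-side, so the file path is
    # quoted and escaped for us before the statement reaches the server.
    cursor.execute(
        "LOAD DATA LOCAL INFILE %s INTO TABLE sometable "
        "FIELDS TERMINATED BY ';' ENCLOSED BY '\"' ESCAPED BY '\\\\'",
        (path,))
    print(connection.info())  # per-file import summary

connection.commit()
cursor.close()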