I'm using psycopg2 to interact with a PostgreSQL database. I have a function whereby any number of columns (from a single column to all columns) in a table could be inserted into. My question is: how would one properly, dynamically, construct this query?
At the moment I am using string formatting and concatenation, and I know this is the absolute worst way to do this. Consider the code below where, in this case, my unknown number of columns (i.e. the keys of a dict) is in fact 2:
dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}
def createMyQuery(user_ids, dictOfUnknownLength):
    fields, values = list(), list()
    for key, val in dictOfUnknownLength.items():
        fields.append(key)
        values.append(val)
    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')
    query = f"INSERT INTO myTable {fields} VALUES {values} RETURNING someValue;"
    return query
# query == "INSERT INTO myTable (key1, key2) VALUES (3, 'myString') RETURNING someValue;"
This provides a correctly formatted query but is of course prone to SQL injections and the like and, as such, is not an acceptable method of achieving my goal.
In other queries I am using the recommended methods of query construction when handling a known number of variables (%s and separate argument to .execute() containing variables) but I'm unsure how to adapt this to accommodate an unknown number of variables without using string formatting.
How can I elegantly and safely construct a query with an unknown number of specified insert columns?
To add to your worries, the current .replace()-based approach breaks on edge cases where fields or values contain [, ], or ': those characters get replaced no matter where they appear and may mangle your query.
You could always use .join() to join a variable number of values in your list. On top of that, format the query with %s placeholders after VALUES and pass your arguments into .execute().
Note: you may also want to consider the case where the number of fields is not equal to the number of values.
import psycopg2
conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}
def createMyQuery(user_ids, dictOfUnknownLength):
    # Directly assign keys/values.
    fields, values = list(dictOfUnknownLength.keys()), list(dictOfUnknownLength.values())
    if len(fields) != len(values):
        # Raise an error? SQL won't work in this case anyways...
        pass
    # Stringify the fields and values.
    # Note: the column names are still pasted into the SQL text,
    # so the dict keys must come from a trusted source.
    fieldsParam = ', '.join(fields)                # "key1, key2"
    valuesParam = ', '.join(['%s'] * len(values))  # "%s, %s"
    # "INSERT ... (key1, key2) VALUES (%s, %s) ..."
    query = 'INSERT INTO myTable ({}) VALUES ({}) RETURNING someValue;'.format(fieldsParam, valuesParam)
    # .execute('INSERT ... (key1, key2) VALUES (%s, %s) ...', [3, 'myString'])
    cur.execute(query, values)  # Anti-SQL-injection: pass placeholder
                                # values as second argument.
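The join-based version above still formats the dictionary keys straight into the SQL text, and identifiers cannot be sent as query parameters. psycopg2 (2.7 and later) ships a psycopg2.sql module (sql.SQL, sql.Identifier, sql.Placeholder) for composing identifiers safely. The sketch below takes a simpler, database-independent route instead: it checks the keys against a whitelist before building the placeholders. ALLOWED_COLUMNS and build_insert are hypothetical names, and the demo uses the standard-library sqlite3 module (which writes placeholders as ?) so it runs without a PostgreSQL server.

```python
import sqlite3

# Hypothetical whitelist: the only column names callers may insert into.
ALLOWED_COLUMNS = {"key1", "key2"}


def build_insert(table, row, placeholder="%s"):
    """Return (query, params) for an INSERT covering exactly the keys of row.

    Identifiers cannot be passed as query parameters, so column names are
    checked against ALLOWED_COLUMNS instead. The values are never formatted
    into the string; they travel separately as params.
    """
    if not row:
        raise ValueError("nothing to insert")
    unknown = set(row) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError("unexpected columns: %s" % sorted(unknown))
    columns = list(row)  # fix one order for both names and values
    query = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), ", ".join([placeholder] * len(columns)))
    params = [row[c] for c in columns]
    return query, params


query, params = build_insert("myTable", {"key1": 3, "key2": "myString"})
print(query)   # INSERT INTO myTable (key1, key2) VALUES (%s, %s)
print(params)  # [3, 'myString']

# sqlite3 stands in for PostgreSQL here; it uses ? placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (key1 INTEGER, key2 TEXT)")
q, p = build_insert("myTable", {"key2": "O'Brien"}, placeholder="?")
conn.execute(q, p)
print(conn.execute("SELECT key1, key2 FROM myTable").fetchall())  # [(None, "O'Brien")]
```

With psycopg2 you would pass the default %s version to cur.execute(query, params), appending RETURNING someValue if you need it. The table name is also formatted in, so it should likewise be a constant from your own code.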
Related
lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
I have a long list of dictionaries of the form above.
I have two fixed variables.
person = 'Sam'
date = datetime.datetime.now()
I wish to insert this information into a mysql table.
How I do it currently
for item in lst:
    item['Person'] = person
    item['Date'] = date
cursor.executemany("""
INSERT INTO myTable (Person,Date,Fruit,HadToday)
VALUES (%(Person)s, %(Date)s, %(Fruit)s, %(HadToday)s)""", lst)
conn.commit()
Is there a way to do it that avoids the loop, since the person and date variables are constant? I have tried:
lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
cursor.executemany("""
INSERT INTO myTable (Person,Date,Fruit,HadToday)
VALUES (%s, %s, %(Fruit)s, %(HadToday)s)""", (person,date,lst))
conn.commit()
TypeError: not enough arguments for format string
Your problem here is that it tries to apply all of lst to %(Fruit)s, and nothing is left for %(HadToday)s.
You should not fix it by hardcoding the fixed values into the statement, because you will run into trouble if you have a name like "Tim O'Molligan"; it's better to let the DB handle the correct formatting.
Not MySQL, but you get the gist: http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters. I learned this myself just a week ago ;o)
Probably the cleanest way would be to set MySQL user variables (MySQL spells them @name) once:
cursor.execute("SET @myname = %s", (person,))
cursor.execute("SET @mydate = %s", (datetime.datetime.now(),))
and then use
cursor.executemany("""
INSERT INTO myTable (Person,Date,Fruit,HadToday)
VALUES (@myname, @mydate, %(Fruit)s, %(HadToday)s)""", lst)
I am not 100% sure about the syntax, but I hope you get the idea. Comment on or edit the answer if I have a typo in it.
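If the main goal is simply not to mutate lst, another option (a sketch, not the answerer's method) is to build fresh parameter dicts that merge in the constants and hand those to executemany. It still touches each row once, in a list comprehension, but leaves lst intact. The demo uses the standard-library sqlite3 module so it runs without a MySQL server; sqlite3 writes named placeholders as :name where MySQLdb uses %(name)s, and the date is a fixed ISO string chosen only for the example.

```python
import datetime
import sqlite3

lst = [{'Fruit': 'Apple', 'HadToday': 2}, {'Fruit': 'Banana', 'HadToday': 8}]
person = 'Sam'
date = datetime.datetime(2024, 1, 1).isoformat()  # fixed value for the demo

# New dicts carry the constant values; the dicts in lst are not modified.
rows = [dict(item, Person=person, Date=date) for item in lst]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Person TEXT, Date TEXT, Fruit TEXT, HadToday INTEGER)")
# sqlite3 spells named placeholders :name; MySQLdb would use %(name)s.
conn.executemany(
    "INSERT INTO myTable (Person, Date, Fruit, HadToday) "
    "VALUES (:Person, :Date, :Fruit, :HadToday)",
    rows,
)
conn.commit()
print(conn.execute("SELECT * FROM myTable ORDER BY Fruit").fetchall())
# [('Sam', '2024-01-01T00:00:00', 'Apple', 2), ('Sam', '2024-01-01T00:00:00', 'Banana', 8)]
```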
I'm trying to insert data from a JSON string to MySQL using MySQLdb. The total number of columns is fixed, but not every row of data from the JSON string has a value for every column.
Here is my sample code:
vacant_building = 'http://data.cityofchicago.org/resource/7nii-7srd.json?%24where=date_service_request_was_received=%272014-06-02T00:00:00%27'
obj = urllib2.urlopen(vacant_building)
data = json.load(obj)
def insert_mysql(columns, placeholders, data):
    sql = "INSERT INTO vacant_buildings (%s) VALUES (%s)" % (columns, placeholders)
    db = MySQLdb.connect(host="localhost", user="xxxx", passwd="xxxx", db="chicago_data")
    cur = db.cursor()
    cur.execute(sql, data)
for row in data:
    placeholders = ', '.join(['%s'] * len(row))
    columns = ', '.join(c[:64] for c in row.keys())
    row_data = ', '.join(str(value) for value in row.values())
    insert_mysql(columns, placeholders, row_data)
I get the following error:
query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting
I'm pretty sure the error has to do with the way I'm inserting the values. I've tried to change this to:
sql = "INSERT INTO vacant_buildings (%s) VALUES (%s) (%s)" % (columns, placeholders, data)
but I get a 1064 error. It's because the values are not enclosed by quotes (').
Thoughts to fix?
In order to parameterize your query using MySQLdb's cursor.execute method, the second argument to execute has to be a sequence of values; in your for loop, you're joining the values together into one string with the following line:
row_data = ', '.join(str(value) for value in row.values())
Since you generated a number of placeholders equal to len(row), you need to supply that many values to cursor.execute. A single joined string is the wrong shape: the traceback shows MySQLdb building tuple([db.literal(item) for item in args]), and iterating over a string yields one item per character. A joined string of many values therefore supplies far more arguments than there are placeholders, which is exactly the case that produces "not all arguments converted during string formatting." (Too few arguments would instead give "not enough arguments for format string.")
In order to run an INSERT statement through MySQLdb with a variable set of columns, you could do just as you've done for the columns and placeholders, but I prefer to use mapping types with the extended formatting syntax supported by MySQLdb (e.g., %(name)s instead of %s) to make sure that I've constructed my query correctly and not put the values into any wrong order. I also like using advanced string formatting where possible in my own code.
You could prepare your inputs like this:
max_key_length = 64
columns = ','.join(k[:max_key_length] for k in row.keys())
placeholders = ','.join('%({})s'.format(k[:max_key_length]) for k in row.keys())
row_data = {k[:max_key_length]: str(v) for k, v in row.items()}
Because the placeholders are named, row_data has to be a mapping rather than a list: MySQLdb looks each value up by key, so no value can end up in the wrong column regardless of ordering.
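A runnable sketch of the named-placeholder approach, assuming a trimmed-down row with three keys taken from the sample data shown in this answer. It uses the standard-library sqlite3 module instead of MySQLdb, so the placeholders are spelled :name rather than %(name)s; the point it demonstrates is that the parameters are a mapping, matched by key.

```python
import sqlite3

max_key_length = 64

# Trimmed-down stand-in for one row of the JSON feed.
row = {'ward': '20', 'zip_code': '60621', 'police_district': '7'}

keys = [k[:max_key_length] for k in row]
columns = ', '.join(keys)
# MySQLdb spells named placeholders %(name)s; sqlite3 spells them :name.
placeholders = ', '.join(':' + k for k in keys)
# The parameters are a mapping, so each value is matched by name, not position.
row_data = {k[:max_key_length]: v for k, v in row.items()}

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vacant_buildings (ward TEXT, zip_code TEXT, police_district TEXT)')
sql = 'INSERT INTO vacant_buildings ({}) VALUES ({})'.format(columns, placeholders)
conn.execute(sql, row_data)
print(conn.execute('SELECT ward, zip_code FROM vacant_buildings').fetchall())  # [('20', '60621')]
```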
Generally speaking, this should work okay with the sort of code in your insert_mysql function. However, looking at the JSON data you're actually pulling from that URL, you should be aware that you may run into nesting issues; for example:
>>> pprint.pprint(data[0])
{u'address_street_direction': u'W',
u'address_street_name': u'61ST',
u'address_street_number': u'424',
u'address_street_suffix': u'ST',
u'any_people_using_property_homeless_childen_gangs_': True,
u'community_area': u'68',
u'date_service_request_was_received': u'2014-06-02T00:00:00',
u'if_the_building_is_open_where_is_the_entry_point_': u'FRONT',
u'is_building_open_or_boarded_': u'Open',
u'is_the_building_currently_vacant_or_occupied_': u'Vacant',
u'is_the_building_vacant_due_to_fire_': False,
u'latitude': u'41.78353874626324',
u'location': {u'latitude': u'41.78353874626324',
u'longitude': u'-87.63573355602661',
u'needs_recoding': False},
u'location_of_building_on_the_lot_if_garage_change_type_code_to_bgd_': u'Front',
u'longitude': u'-87.63573355602661',
u'police_district': u'7',
u'service_request_number': u'14-00827306',
u'service_request_type': u'Vacant/Abandoned Building',
u'ward': u'20',
u'x_coordinate': u'1174508.30988836',
u'y_coordinate': u'1864483.93566661',
u'zip_code': u'60621'}
The string representation of the u'location' column is:
"{u'latitude': u'41.78353874626324', u'needs_recoding': False, u'longitude': u'-87.63573355602661'}"
You may not want to put that into a database field, especially considering that there are atomic lat/lon fields already in the JSON object.
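One way to handle this, sketched under the assumption that you are happy to drop nested values because the flat latitude/longitude fields already carry the same information: filter each row down to scalar values before building the columns and placeholders. scalar_fields is a hypothetical helper name, and json.dumps would be an alternative if you do want to keep the nested data.

```python
def scalar_fields(row):
    """Keep only values that map naturally onto a single column.

    Nested objects (dicts or lists, such as 'location' in the sample)
    are dropped.
    """
    return {k: v for k, v in row.items() if not isinstance(v, (dict, list))}


sample = {
    'latitude': '41.78353874626324',
    'longitude': '-87.63573355602661',
    'location': {'latitude': '41.78353874626324',
                 'longitude': '-87.63573355602661',
                 'needs_recoding': False},
    'ward': '20',
}
print(sorted(scalar_fields(sample)))  # ['latitude', 'longitude', 'ward']
```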
I'm getting a weird error when inserting some data from a Python script to MySQL. It's basically related to a variable being blank that I am inserting. I take it that MySQL does not like blank variables but is there something else I can change it to so it works with my insert statement?
I can successfully use an IF statement to turn it to 0 if it's blank, but this may mess up some of the data analytics I plan to do in MySQL later. Is there a way to convert it to NULL or something so MySQL accepts it but doesn't add anything?
When using MySQLdb and cursor.execute(), pass the value None, not "NULL":
value = None
cursor.execute("INSERT INTO table (`column1`) VALUES (%s)", (value,))
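To illustrate why None is preferable to a 0 placeholder for later analytics, here is a small sketch using the standard-library sqlite3 module (its placeholder is ? rather than MySQLdb's %s). The prices table and its rows are hypothetical: blank strings are converted to None, which the driver sends as SQL NULL, and aggregates such as AVG then skip that row.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE prices (item TEXT, price REAL)')

raw = [('apple', '1.25'), ('pear', '')]  # hypothetical input; '' means "no price"
# Turn blanks into None; the driver then sends a real SQL NULL.
rows = [(item, float(p) if p != '' else None) for item, p in raw]
conn.executemany('INSERT INTO prices (item, price) VALUES (?, ?)', rows)

print(conn.execute('SELECT item, price FROM prices ORDER BY item').fetchall())
# [('apple', 1.25), ('pear', None)]
print(conn.execute('SELECT AVG(price) FROM prices').fetchone())
# (1.25,)  NULL is ignored by AVG, whereas a 0 would have pulled it down
```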
Found the answer here
If col1 is a char column and col2 is an int column, a trick could be:
sql = "insert into table (col1, col2) values (%s, %s)" % ("'{}'".format(val1) if val1 else "NULL", val2 if val2 not in ('', None) else "NULL")
You do not need to put quotes around %s in the statement; the quoting is done before the value is passed to SQL.
This method works when executing SQL through a SQLAlchemy session, for example session.execute(text(sql)).
PS: the SQL is not tested yet.
Quick note about using parameters in SQL statements with Python. See the RealPython article on this topic, Preventing SQL Injection Attacks With Python. Here's another good article from TowardsDataScience.com, A Simple Approach To Templated SQL Queries In Python. These helped me with the same None/NULL issue.
Also, I found that if I put "NULL" (without quotes) directly into the INSERT query in VALUES, it was interpreted appropriately in the SQL Server DB. The translation problem only exists if needing to conditionally add NULL or a value via string interpolation.
Examples:
cursor.execute("SELECT admin FROM users WHERE username = %s", (username, ))
cursor.execute("SELECT admin FROM users WHERE username = %(username)s", {'username': username})
UPDATE: This StackOverflow discussion is more in line with what I'm trying to do and may help someone else.
Example:
import pypyodbc
myData = [
(1, 'foo'),
(2, None),
(3, 'bar'),
]
connStr = """
DSN=myDb_SQLEXPRESS;
"""
cnxn = pypyodbc.connect(connStr)
crsr = cnxn.cursor()
sql = """
INSERT INTO myTable VALUES (?, ?)
"""
for dataRow in myData:
    print(dataRow)
    crsr.execute(sql, dataRow)
cnxn.commit()
crsr.close()
cnxn.close()
Based on the answers above, I wrote a wrapper function for my use case; you can try it and adapt the function to your needs.
def sanitizeData(value):
    if value in ('', None):
        return "NULL"
    # Handle values that already contain ' (ex: O'Brien): SQL escapes a single quote by doubling it.
    if isinstance(value, str):
        return "'{}'".format(value.replace("'", "''"))
    return value
Now call the sql query like so,
"INSERT INTO %s (Name, Email) VALUES (%s, %s)"%(table_name, sanitizeData(actual_name), sanitizeData(actual_email))
Why not set the variable equal to some string like 'no price' and then filter this out later when you want to do math on the numbers?
filter(lambda x: x != 'no price',list_of_data_from_database)
Do a quick check for blank, and if it is, set it equal to NULL:
if not variable_to_insert:
    variable_to_insert = "NULL"
...then make sure that the inserted variable is not in quotes in the insert statement, like:
insert = "INSERT INTO table (var) VALUES (%s)" % (variable_to_insert)
...
not like:
insert = "INSERT INTO table (var) VALUES ('%s')" % (variable_to_insert)
...
I'm trying to create an SQLite 3 database from Python. I have a few types I'd like to insert into each record: a float, and then 3 groups of n floats, currently tuples but they could be arrays or lists; I'm not well enough versed in Python to understand all the differences. My problem is the INSERT statement.
DAS = 12345
lats = (42,43,44,45)
lons = (10,11,12,13)
times = (1,2,3,4,5,6,7,8,9)
import sqlite3
connection = sqlite3.connect("test.db")
cursor = connection.cursor()
cursor.execute( "create table foo(DAS LONG PRIMARY KEY,lats real(4),lons real(4), times real(9) )" )
I'm not sure what comes next. Something along the lines of:
cmd = 'INSERT into foo values (?,?,?,?), ...'
cursor.execute(cmd)
How should I best build the SQL insert command given this data?
The type real(4) does not mean an array/list/tuple of 4 reals; the 4 alters the 'real' type. However, SQLite mostly ignores column types due to its manifest typing, but they can still affect column affinity.
You have a few options, such as storing the text representation (from repr) or using four columns, one for each.
You can modify this with various hooks provided by Python's SQLite library to handle some of the transformation for you, but separate columns (with functions to localize and handle various statements, so you don't repeat yourself) are probably the easiest to work with if you need to search/etc. in SQL on each value.
If you do store a text representation, ast.literal_eval (or eval, under special conditions) will convert back into a Python object.
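A minimal round trip for the text-representation option, assuming a simplified table with only DAS and lats: repr() produces the string stored in the column, and ast.literal_eval turns it back into a tuple. SQL itself cannot search inside such a value, which is the trade-off mentioned above.

```python
import ast
import sqlite3

lats = (42, 43, 44, 45)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foo (DAS INTEGER PRIMARY KEY, lats TEXT)')
# Store the text representation of the tuple in a single column.
conn.execute('INSERT INTO foo VALUES (?, ?)', (12345, repr(lats)))

stored = conn.execute('SELECT lats FROM foo WHERE DAS = ?', (12345,)).fetchone()[0]
print(stored)                    # (42, 43, 44, 45)
print(ast.literal_eval(stored))  # (42, 43, 44, 45)
```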
Something like this:
db = sqlite3.connect("test.db")
cursor = db.cursor()
cursor.execute("insert into foo values (?,?,?,?)", (val1, val2, val3, val4))
db.commit() # Autocommit is off by default (and rightfully so)!
Please note that I am not using string formatting to inject actual data into the query, but instead make the library do this work for me. That way the data is quoted and escaped correctly.
EDIT: Obviously, considering your database schema, it doesn't work. It is impractical to attempt to store a collection-type value in a single field of a sqlite database. If I understand you correctly, you should just create a separate column for every value you are storing in the single row. That will be a lot of columns, sure, but that's the most natural way to do it.
(A month later), two steps:
1. flatten e.g. DAS, lats, lons, times into one long list, say 18 long
2. generate "Insert into tablename values (?,?,... 18 question marks)" and execute that.
Test = 1
def flatten(*args):
    """ 1, (2,3), [4,5] -> [1 2 3 4 5] """
    # 1 level only -- SO [python] [flatten] zzz
    flat = []
    for a in args:
        # Strings are iterable in Python 3 but must stay whole.
        iterable = hasattr(a, "__iter__") and not isinstance(a, str)
        flat.extend(a if iterable else [a])
    if Test: print("test flatten:", flat)
    return flat
def sqlinsert(db, tablename, *args):
    flatargs = flatten(*args)  # one long list
    ncol = len(flatargs)
    qmarks = "?" + (ncol - 1) * ",?"
    insert = "Insert into %s values (%s)" % (tablename, qmarks)
    if Test: print("test sqlinsert:", insert)
    if db:
        db.execute(insert, flatargs)
        # db.executemany(insert, [flatten(*row) for row in rows])
    return insert
#...............................................................................
if __name__ == "__main__":
    print(sqlinsert(None, "Table", "hidiho", (4, 5), [6]))
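A sketch of the separate-columns layout combined with the flattening idea, using the DAS/lats/lons/times values from the question. The column names lat0..lat3, lon0..lon3 and time0..time8 are a hypothetical naming scheme; they are generated by the code itself rather than taken from user input, which is why formatting them into the SQL string is acceptable here, while the values still go through ? placeholders.

```python
import sqlite3

DAS = 12345
lats = (42, 43, 44, 45)
lons = (10, 11, 12, 13)
times = (1, 2, 3, 4, 5, 6, 7, 8, 9)

# Hypothetical column naming scheme: lat0..lat3, lon0..lon3, time0..time8.
columns = (['DAS']
           + ['lat%d' % i for i in range(len(lats))]
           + ['lon%d' % i for i in range(len(lons))]
           + ['time%d' % i for i in range(len(times))])
values = [DAS, *lats, *lons, *times]  # the flattened row

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foo (%s)' % ', '.join(
    [columns[0] + ' INTEGER PRIMARY KEY'] + [c + ' REAL' for c in columns[1:]]))
conn.execute('INSERT INTO foo VALUES (%s)' % ', '.join(['?'] * len(values)), values)

print(len(columns))  # 18
```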
I'm using Python and its MySQLdb module to import some measurement data into a MySQL database. The amount of data that we have is quite large (currently about 250 MB of CSV files, with plenty more to come).
Currently I use cursor.execute(...) to import some metadata. This isn't problematic as there are only a few entries for these.
The problem is that when I try to use cursor.executemany() to import larger quantities of the actual measurement data, MySQLdb raises a
TypeError: not all arguments converted during string formatting
My current code is
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into values (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
where values is a list of tuples containing three strings each. Any ideas what could be wrong with this?
Edit:
The values are generated by
yield (prefix + row['id'], row['value'], sample_id)
and then read into a list one thousand at a time, where row comes from iterating over a csv.DictReader.
In retrospect, this was a really stupid but hard-to-spot mistake. VALUES is a reserved word in SQL, so a table named values needs quotes (backticks in MySQL) around it.
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into `values` (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
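To see the fix in isolation, here is a sketch with the standard-library sqlite3 module (which, like MySQL, accepts backtick-quoted identifiers and uses ? instead of %s) and two hypothetical rows. Every statement that mentions the table has to quote it; renaming the table to something that is not a reserved word avoids the problem altogether.

```python
import sqlite3

values = [('ENSG01', '1.5', 's1'), ('ENSG02', '2.0', 's1')]  # hypothetical rows

conn = sqlite3.connect(':memory:')
# `values` is a reserved word, so the table name is quoted in every statement.
conn.execute('CREATE TABLE `values` (ensg TEXT, value TEXT, sampleid TEXT)')
conn.executemany(
    'INSERT INTO `values` (ensg, value, sampleid) VALUES (?, ?, ?)', values)
print(conn.execute('SELECT COUNT(*) FROM `values`').fetchone())  # (2,)
```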
The message you get indicates that inside the executemany() method, one of the conversions failed. Check your values list for a tuple longer than 3.
For a quick verification:
max(map(len, values))
If the result is higher than 3, locate your bad tuple with a filter:
[t for t in values if len(t) != 3]
or, if you need the index:
[(i,t) for i,t in enumerate(values) if len(t) != 3]