Python + MySQLdb executemany

I'm using Python and its MySQLdb module to import some measurement data into a MySQL database. The amount of data is quite large (currently about 250 MB of CSV files, with plenty more to come).
Currently I use cursor.execute(...) to import some metadata. This isn't problematic as there are only a few entries for these.
The problem is that when I try to use cursor.executemany() to import larger quantities of the actual measurement data, MySQLdb raises a
TypeError: not all arguments converted during string formatting
My current code is
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into values (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
where values is a list of tuples containing three strings each. Any ideas what could be wrong with this?
Edit:
The values are generated by
yield (prefix + row['id'], row['value'], sample_id)
and then read into a list one thousand at a time, where row comes from iterating over a csv.DictReader.
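A minimal sketch of that generator-plus-batching setup, assuming the CSV has the id and value columns used in the question. The helper names measurement_rows and batches are hypothetical, and an in-memory io.StringIO stands in for a real file:

```python
import csv
import io
from itertools import islice

def measurement_rows(csv_file, prefix, sample_id):
    """Yield one (ensg, value, sampleid) tuple per CSV row, as in the question."""
    for row in csv.DictReader(csv_file):
        yield (prefix + row['id'], row['value'], sample_id)

def batches(iterable, size=1000):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# In-memory CSV standing in for a real measurement file (hypothetical data).
sample = io.StringIO("id,value\n1,0.5\n2,0.7\n3,0.9\n")
for chunk in batches(measurement_rows(sample, "ENSG", "s1"), size=2):
    # Each chunk is a list of 3-tuples that could go to cursor.executemany(...).
    print(chunk)
```

Every tuple has exactly three items, which matches the three %s placeholders in the INSERT.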

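In retrospect, this was a silly but hard-to-spot mistake. VALUES is a reserved word in SQL, so a table named values has to be quoted with backticks. With executemany() the unquoted name is especially confusing, because MySQLdb searches the statement for its VALUES clause so that it can repeat it for every row, and a table called values gets in the way of that search.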
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into `values` (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
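If table or column names can collide with reserved words, one option is to quote every identifier when the statement is built. This is a sketch; quote_mysql_identifier and insert_statement are hypothetical helpers, and doubling an embedded backtick is MySQL's rule for escaping it inside a quoted identifier:

```python
def quote_mysql_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"

def insert_statement(table, columns):
    """Build an INSERT with one %s placeholder per column (MySQLdb paramstyle)."""
    cols = ", ".join(quote_mysql_identifier(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(
        quote_mysql_identifier(table), cols, placeholders)

print(insert_statement("values", ["ensg", "value", "sampleid"]))
# INSERT INTO `values` (`ensg`, `value`, `sampleid`) VALUES (%s, %s, %s)
```

Renaming the table to something that is not a keyword avoids the problem altogether.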

The message you get indicates that one of the string-formatting conversions inside the executemany() method failed: there were more arguments than %s placeholders. Check your values list for a tuple longer than 3.
For a quick verification:
max(map(len, values))
If the result is higher than 3, locate your bad tuple with a filter:
[t for t in values if len(t) != 3]
or, if you need the index:
[(i,t) for i,t in enumerate(values) if len(t) != 3]
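The checks above can be wrapped into one guard to run before calling executemany(). This is a sketch with a hypothetical helper name. It also flags tuples that are too short, which would otherwise fail with "not enough arguments for format string" instead:

```python
def check_row_widths(rows, width=3):
    """Return (index, row) pairs whose length differs from `width`."""
    return [(i, row) for i, row in enumerate(rows) if len(row) != width]

# Hypothetical data with one tuple too long and one too short.
values = [("ENSG1", "0.5", "s1"), ("ENSG2", "0.7", "s1", "extra"), ("ENSG3", "0.9")]
print(check_row_widths(values))
# [(1, ('ENSG2', '0.7', 's1', 'extra')), (2, ('ENSG3', '0.9'))]
```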

Related

Properly format SQL query when insert into variable number of columns

I'm using psycopg2 to interact with a PostgreSQL database. I have a function whereby any number of columns (from a single column to all columns) in a table could be inserted into. My question is: how would one properly, dynamically, construct this query?
At the moment I am using string formatting and concatenation, and I know this is the worst possible way to do it. Consider the code below where, in this case, my unknown number of columns (i.e. the keys of a dict) is in fact 2:
dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}
def createMyQuery(user_ids, dictOfUnknownLength):
    fields, values = list(), list()
    for key, val in dictOfUnknownLength.items():
        fields.append(key)
        values.append(val)
    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')
    query = f"INSERT INTO myTable {fields} VALUES {values} RETURNING someValue;"
    # query is now: INSERT INTO myTable (key1, key2) VALUES (3, 'myString') RETURNING someValue;
This provides a correctly formatted query but is of course prone to SQL injections and the like and, as such, is not an acceptable method of achieving my goal.
In other queries I am using the recommended methods of query construction when handling a known number of variables (%s and separate argument to .execute() containing variables) but I'm unsure how to adapt this to accommodate an unknown number of variables without using string formatting.
How can I elegantly and safely construct a query with an unknown number of specified insert columns?
To add to your worries, the current methodology using .replace() is prone to edge cases where fields or values contain [, ], or '. They will get replaced no matter what and may mess up your query.
You can use .join() to build a variable number of column names and %s placeholders. Put the placeholders after VALUES in the query and pass the values themselves as the second argument to .execute().
Note: you may also want to handle the case where the number of fields is not equal to the number of values.
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}

def createMyQuery(user_ids, dictOfUnknownLength):
    # Directly assign keys/values (a dict yields them in matching order).
    fields, values = list(dictOfUnknownLength.keys()), list(dictOfUnknownLength.values())
    if len(fields) != len(values):
        # The SQL cannot work in this case, so fail early.
        raise ValueError("number of fields and values differ")
    # Stringify the fields and build one placeholder per value.
    fieldsParam = ', '.join(fields)                 # "key1, key2"
    valuesParam = ', '.join(['%s'] * len(values))   # "%s, %s"
    # "INSERT ... (key1, key2) VALUES (%s, %s) ..."
    # Column names are still pasted in verbatim, so they must come from a trusted source.
    query = 'INSERT INTO myTable ({}) VALUES ({}) RETURNING someValue;'.format(fieldsParam, valuesParam)
    # .execute('INSERT ... (key1, key2) VALUES (%s, %s) ...', [3, 'myString'])
    # Anti-SQL-injection: pass the values as the second argument, not in the string.
    cur.execute(query, values)
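Column names cannot be sent as query parameters, so if the dict keys might come from outside, one approach is to check them against a whitelist before building the string. This is a sketch: build_insert and ALLOWED_COLUMNS are hypothetical, the table name is assumed to be a trusted constant, and the function only returns the query and parameters for cur.execute():

```python
ALLOWED_COLUMNS = {"key1", "key2", "key3"}  # hypothetical list of real column names

def build_insert(table, data, allowed=ALLOWED_COLUMNS):
    """Return (query, params) for a psycopg2-style INSERT of the dict `data`."""
    unknown = set(data) - set(allowed)
    if unknown:
        raise ValueError("unknown columns: {}".format(sorted(unknown)))
    fields = list(data)                 # keys in insertion order
    params = [data[f] for f in fields]  # values in the same order
    query = "INSERT INTO {} ({}) VALUES ({}) RETURNING someValue;".format(
        table, ", ".join(fields), ", ".join(["%s"] * len(fields)))
    return query, params

query, params = build_insert("myTable", {"key1": 3, "key2": "myString"})
print(query)   # INSERT INTO myTable (key1, key2) VALUES (%s, %s) RETURNING someValue;
print(params)  # [3, 'myString']
# cur.execute(query, params)
```

psycopg2 (2.7 and later) also ships the psycopg2.sql module, whose sql.SQL, sql.Identifier and sql.Placeholder objects let you compose column names with proper quoting instead of whitelisting them. The sketch above avoids it only to stay dependency-free.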

Insert list of dictionaries and variable into table

lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
I have a long list of dictionaries of the form above.
I have two fixed variables.
person = 'Sam'
date = datetime.datetime.now()
I wish to insert this information into a mysql table.
How I do it currently
for item in lst:
    item['Person'] = person
    item['Date'] = date
cursor.executemany("""
INSERT INTO myTable (Person,Date,Fruit,HadToday)
VALUES (%(Person)s, %(Date)s, %(Fruit)s, %(HadToday)s)""", lst)
conn.commit()
Is there a way to do it that bypasses the loop, since the person and date variables are constant? I have tried
lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
cursor.executemany("""
INSERT INTO myTable (Person,Date,Fruit,HadToday)
VALUES (%s, %s, %(Fruit)s, %(HadToday)s)""", (person,date,lst))
conn.commit()
TypeError: not enough arguments for format string
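Your problem here is that executemany() treats each element of its second argument as the parameters for one row. With (person, date, lst), the first "row" is just the string 'Sam', which fills the first %s and leaves nothing for the remaining placeholders. In addition, a single statement cannot mix positional %s and named %(Fruit)s placeholders.
You should not fix it by hardcoding the fixed values into the statement, as you will get into trouble with a name like "Tim O'Molligan"; it's better to let the database driver handle the quoting.
Not MySQL, but you get the gist: http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters (I learned this myself just a week ago ;o)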
Probably the cleanest way is to set two MySQL user variables once:
cursor.execute("SET @myname = %s", (person,))
cursor.execute("SET @mydate = %s", (datetime.datetime.now(),))
and then use them in the statement:
cursor.executemany("""
    INSERT INTO myTable (Person,Date,Fruit,HadToday)
    VALUES (@myname, @mydate, %(Fruit)s, %(HadToday)s)""", lst)
MySQL user variables are written @name (# starts a comment in MySQL) and last for the current session, so the SET statements and the INSERT must run on the same connection.
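If session variables feel too indirect, another option keeps everything parameterized and leaves lst unmodified: build a fresh parameter dict per row. This is a sketch that still iterates in Python (a comprehension instead of the mutating loop); the executemany call is commented out because it needs a live connection:

```python
import datetime

lst = [{'Fruit': 'Apple', 'HadToday': 2}, {'Fruit': 'Banana', 'HadToday': 8}]
person = 'Sam'
date = datetime.datetime.now()

# dict(item, Person=..., Date=...) copies `item` and adds the constant keys,
# so the original dictionaries in `lst` are left untouched.
rows = [dict(item, Person=person, Date=date) for item in lst]

query = """
    INSERT INTO myTable (Person, Date, Fruit, HadToday)
    VALUES (%(Person)s, %(Date)s, %(Fruit)s, %(HadToday)s)"""
# cursor.executemany(query, rows)
# conn.commit()
```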

Python MySQLdb TypeError("not all arguments converted during string formatting")

I know this is a popular topic but I searched the various answers and didn't see a clear answer to my issue. I have a function that I want to use to insert records into my NDBC database that is giving me the error I mentioned in the title. The function is below:
def insertStdMet(station,cursor,data):
    # This function takes in a station id, database cursor and an array of data. At present
    # it assumes the data is a pandas dataframe with the datetime value as the index
    # It may eventually be modified to be more flexible. With the parameters
    # passed in, it goes row by row and builds an INSERT INTO SQL statement
    # that assumes each row in the data array represents a new record to be
    # added.
    fields=list(data.columns) # if our table has been constructed properly, these column names should map to the fields in the data table
    # Building the SQL string
    strSQL1='REPLACE INTO std_met (station_id,date_time,'
    strSQL2='VALUES ('
    for f in fields:
        strSQL1+=f+','
        strSQL2+='%s,'
    # trimming the last comma
    strSQL1=strSQL1[:-1]
    strSQL2=strSQL2[:-1]
    strSQL1+=") " + strSQL2 + ")"
    # Okay, now we have our SQL string. Now we need to build the list of tuples
    # that will be passed along with it to the .executemany() function.
    tuplist=[]
    for i in range(len(data)):
        r=data.iloc[i][:]
        datatup=(station,r.name)
        for f in r:
            datatup+=(f,)
        tuplist.append(datatup)
    cursor.executemany(strSQL1,tuplist)
When we get to the cursor.executemany() call, strSQL1 looks like this:
REPLACE INTO std_met (station_id,date_time,WDIR,WSPD,GST,WVHT,DPD,APD,MWD,PRES,ATMP,WTMP,DEWP,VIS) VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)
I'm using %s placeholders throughout, and I am passing a list of tuples (~2315 tuples). Every value being passed is either a string, a datetime, or a number. I still have not found the issue. Any insights anyone cares to pass along would be sincerely appreciated.
Thanks!
You haven't given your SQL query a placeholder for either station_id or date_time, so each tuple carries two more values than the statement has %s placeholders, and those two are left unconverted.
I suspect you want the final statement to be something like:
REPLACE INTO std_met
(station_id,date_time,WDIR,WSPD,GST,WVHT,DPD,APD,MWD,
PRES,ATMP,WTMP,DEWP,VIS) VALUES (%s, %s, %s,%s,%s,%s,
%s,%s,%s,%s,%s,%s,%s,%s)
Note the extra two %s. It looks like your tuple already contains values for station_id and date_time, so you could try this change:
strSQL1='REPLACE INTO std_met (station_id,date_time,'
strSQL2='VALUES (%s, %s, '
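A way to keep the two counts from drifting apart is to derive the placeholders from the same column list that names the columns. This is a sketch with a hypothetical build_replace helper and dummy row values. It uses plain lists rather than a DataFrame so that it runs on its own:

```python
def build_replace(table, fixed_columns, data_columns):
    """Build a REPLACE statement with exactly one %s per column."""
    columns = list(fixed_columns) + list(data_columns)
    placeholders = ",".join(["%s"] * len(columns))
    return "REPLACE INTO {} ({}) VALUES ({})".format(table, ",".join(columns), placeholders)

fields = ["WDIR", "WSPD", "GST", "WVHT", "DPD", "APD", "MWD",
          "PRES", "ATMP", "WTMP", "DEWP", "VIS"]
sql = build_replace("std_met", ["station_id", "date_time"], fields)

# Every parameter tuple must have as many items as there are placeholders.
row = ("42001", "2014-06-02 00:00") + (0,) * len(fields)  # dummy values
assert sql.count("%s") == len(row)
print(sql)
```

With pandas, tuples of the matching shape could be built as [(station,) + t for t in data.itertuples(name=None)], since itertuples(name=None) yields plain tuples with the index first.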

Is it possible to create a query command that takes in a list of variables in python-mysql

I am trying to run a multi-row query using executemany from the MySQLdb library. After searching around, I found that to use executemany I have to write an INSERT INTO ... ON DUPLICATE KEY statement instead of an UPDATE.
All is good so far, but I can't build the SET part (the per-column assignments) efficiently. My table has about 20 columns (feel free to criticize how wide the table is; it works for me so far), and I want to form the command string efficiently if possible.
Right now I have
update_query = """
INSERT INTO `my_table`
({all_columns}) VALUES({vals})
ON DUPLICATE KEY UPDATE <should-have-each-individual-column-set-to-value-here>
""".format(all_columns=all_columns, vals=vals)
Where all_columns covers all the columns, and vals cover bunch of %s as I'm going to use executemany later.
However, I have no idea how to form the SET part of the string. I thought about splitting on commas to turn the columns into a list, but I'm not sure whether I can iterate over them.
Overall, the goal of this is to only call the db once for update, and that's the only way I can think of right now. If you happen to have a better idea, please let me know as well.
EDIT: adding more info
all_columns is something like 'id, id2, num1, num2'
vals right now is set to be '%s, %s, %s, %s'
and of course there are more columns than just 4
Assuming that you have a list of (column, expression) tuples for the assignment part of your command:
listUpdate = [('f1', 'i'), ('f2', '2')]
# Each pair becomes "column = expression"; the expressions are pasted in verbatim,
# so they must be trusted SQL (for example 'VALUES(f1)'), never user input.
setCommand = ', '.join([' %s = %s' % x for x in listUpdate])
all_columns = 'id, id2, num1, num2'
vals = '%s, %s, %s, %s'
update_query = """
INSERT INTO `my_table`
({all_columns}) VALUES({vals})
ON DUPLICATE KEY UPDATE {set}
""".format(all_columns=all_columns, vals=vals, set=setCommand)
print(update_query)
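For an upsert driven by executemany(), the usual assignment sets each non-key column to the value the row would have inserted, written col = VALUES(col) in MySQL. This sketch splits the comma-separated all_columns string, as the question suggested. build_upsert is hypothetical, and which columns form the unique key is an assumption you would adjust:

```python
def build_upsert(table, columns, key_columns=()):
    """Build INSERT ... ON DUPLICATE KEY UPDATE with one %s per column."""
    cols = [c.strip() for c in columns.split(",")]
    updates = ", ".join("{0} = VALUES({0})".format(c) for c in cols if c not in key_columns)
    if not updates:
        raise ValueError("no non-key columns to update")
    return "INSERT INTO `{}` ({}) VALUES ({}) ON DUPLICATE KEY UPDATE {}".format(
        table, ", ".join(cols), ", ".join(["%s"] * len(cols)), updates)

query = build_upsert("my_table", "id, id2, num1, num2", key_columns=("id", "id2"))
print(query)
# cursor.executemany(query, list_of_4_tuples)
```

Using VALUES() inside ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20 in favour of a row alias, but older servers need the form shown here.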

TypeError when inserting JSON data into MySQL using MySQL-Python

I'm trying to insert data from a JSON string to MySQL using MySQLdb. The total number of columns is fixed. Each row of data from the JSON string does not always have values for each column.
Here is my sample code:
vacant_building = 'http://data.cityofchicago.org/resource/7nii-7srd.json?%24where=date_service_request_was_received=%272014-06-02T00:00:00%27'
obj = urllib2.urlopen(vacant_building)
data = json.load(obj)
def insert_mysql(columns, placeholders, data):
    sql = "INSERT INTO vacant_buildings (%s) VALUES (%s)" % (columns, placeholders)
    db = MySQLdb.connect(host="localhost", user="xxxx", passwd="xxxx", db="chicago_data")
    cur = db.cursor()
    cur.execute(sql, data)

for row in data:
    placeholders = ', '.join(['%s'] * len(row))
    columns = ', '.join(c[:64] for c in row.keys())
    row_data = ', '.join(str(value) for value in row.values())
    insert_mysql(columns, placeholders, row_data)
I get the following error:
query = query % tuple([db.literal(item) for item in args])
TypeError: not all arguments converted during string formatting
I'm pretty sure the error has to do with the way I'm inserting the values. I've tried to change this to:
sql = "INSERT INTO vacant_buildings (%s) VALUES (%s) (%s)" % (columns, placeholders, data)
but I get a 1064 error. It's because the values are not enclosed by quotes (').
Thoughts to fix?
In order to parameterize your query using MySQLdb's cursor.execute method, the second argument to execute has to be a sequence of values; in your for loop, you're joining the values together into one string with the following line:
row_data = ', '.join(str(value) for value in row.values())
Since you generated len(row) placeholders for your values, you need to supply that many values to cursor.execute. If you pass a single string instead, MySQLdb iterates over it character by character (that is the tuple([db.literal(item) for item in args]) line in your traceback), so every character counts as a separate argument. Because the joined string has more characters than there are placeholders, you get exactly the error you see: "not all arguments converted during string formatting."
In order to run an INSERT statement through MySQLdb with a variable set of columns, you could do just as you've done for the columns and placeholders, but I prefer to use mapping types with the extended formatting syntax supported by MySQLdb (e.g., %(name)s instead of %s) to make sure that I've constructed my query correctly and not put the values into any wrong order. I also like using advanced string formatting where possible in my own code.
You could prepare your inputs like this:
max_key_length = 64
columns = ','.join(k[:max_key_length] for k in row.keys())
placeholders = ','.join('%({})s'.format(k[:max_key_length]) for k in row.keys())
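row_data = {k[:max_key_length]: str(v) for k, v in row.items()}
Note that with %(name)s placeholders the second argument to execute must be a mapping keyed by those names, not a list. The generator expressions over row.keys() produce columns and placeholders in the same order, as long as you don't alter the dict in the meantime.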
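Put together, the named-placeholder approach can live in one function that returns the statement and its parameter mapping. This is a sketch: build_named_insert is hypothetical, and it truncates keys to 64 characters (MySQL's column-name limit) as the answer does. Two keys sharing the same first 64 characters would collide.

```python
def build_named_insert(table, row, max_key_length=64):
    """Return (sql, params) using MySQLdb's %(name)s placeholders."""
    names = [k[:max_key_length] for k in row]
    columns = ", ".join(names)
    placeholders = ", ".join("%({})s".format(n) for n in names)
    # keys and values() iterate in corresponding order for an unmodified dict
    params = {n: v for n, v in zip(names, row.values())}
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table, columns, placeholders)
    return sql, params

sql, params = build_named_insert("vacant_buildings", {"ward": "20", "zip_code": "60621"})
print(sql)     # INSERT INTO vacant_buildings (ward, zip_code) VALUES (%(ward)s, %(zip_code)s)
print(params)  # {'ward': '20', 'zip_code': '60621'}
# cur.execute(sql, params); db.commit()
```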
Generally speaking, this should work okay with the sort of code in your insert_mysql function. However, looking at the JSON data you're actually pulling from that URL, you should be aware that you may run into nesting issues; for example:
>>> pprint.pprint(data[0])
{u'address_street_direction': u'W',
u'address_street_name': u'61ST',
u'address_street_number': u'424',
u'address_street_suffix': u'ST',
u'any_people_using_property_homeless_childen_gangs_': True,
u'community_area': u'68',
u'date_service_request_was_received': u'2014-06-02T00:00:00',
u'if_the_building_is_open_where_is_the_entry_point_': u'FRONT',
u'is_building_open_or_boarded_': u'Open',
u'is_the_building_currently_vacant_or_occupied_': u'Vacant',
u'is_the_building_vacant_due_to_fire_': False,
u'latitude': u'41.78353874626324',
u'location': {u'latitude': u'41.78353874626324',
u'longitude': u'-87.63573355602661',
u'needs_recoding': False},
u'location_of_building_on_the_lot_if_garage_change_type_code_to_bgd_': u'Front',
u'longitude': u'-87.63573355602661',
u'police_district': u'7',
u'service_request_number': u'14-00827306',
u'service_request_type': u'Vacant/Abandoned Building',
u'ward': u'20',
u'x_coordinate': u'1174508.30988836',
u'y_coordinate': u'1864483.93566661',
u'zip_code': u'60621'}
The string representation of the u'location' column is:
"{u'latitude': u'41.78353874626324', u'needs_recoding': False, u'longitude': u'-87.63573355602661'}"
You may not want to put that into a database field, especially considering that there are atomic lat/lon fields already in the JSON object.
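One simple policy is to drop nested values before building the insert, since the flat latitude/longitude fields already carry the same information. This is a sketch with a hypothetical scalar_fields helper. Whether to drop, flatten, or serialize such fields is a design choice the answer leaves open.

```python
def scalar_fields(record):
    """Keep only values that map onto a single column; drop nested dicts/lists."""
    return {k: v for k, v in record.items() if not isinstance(v, (dict, list))}

# Trimmed copy of the sample record above.
record = {
    "latitude": "41.78353874626324",
    "longitude": "-87.63573355602661",
    "location": {"latitude": "41.78353874626324",
                 "longitude": "-87.63573355602661",
                 "needs_recoding": False},
    "ward": "20",
}
print(scalar_fields(record))
```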
