Insert list of dictionaries and variable into table - python

lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
I have a long list of dictionaries of the form above.
I have two fixed variables.
person = 'Sam'
date = datetime.datetime.now()
I wish to insert this information into a MySQL table.
How I do it currently:
for item in lst:
    item['Person'] = person
    item['Date'] = date

cursor.executemany("""
    INSERT INTO myTable (Person,Date,Fruit,HadToday)
    VALUES (%(Person)s, %(Date)s, %(Fruit)s, %(HadToday)s)""", lst)
conn.commit()
Is there a way to do it that bypasses the loop, as the person and date variables are constant? I have tried
lst = [{'Fruit':'Apple','HadToday':2},{'Fruit':'Banana','HadToday':8}]
cursor.executemany("""
    INSERT INTO myTable (Person,Date,Fruit,HadToday)
    VALUES (%s, %s, %(Fruit)s, %(HadToday)s)""", (person, date, lst))
conn.commit()
TypeError: not enough arguments for format string

Your problem here is that it tries to apply all of lst to %(Fruit)s and nothing is left for %(HadToday)s.
You should not fix it by hardcoding the fixed values into the statement, as you get into trouble with a name like "Tim O'Molligan" - it's better to let the db handle the correct formatting.
Not MySQL, but you get the gist: http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters - learned this myself just a week ago ;o)
Probably the cleanest way would be to use
cursor.execute("SET @myname = %s", (person,))
cursor.execute("SET @mydate = %s", (datetime.datetime.now(),))
and use
cursor.executemany("""
    INSERT INTO myTable (Person,Date,Fruit,HadToday)
    VALUES (@myname, @mydate, %(Fruit)s, %(HadToday)s)""", lst)
I am not 100% sure about the syntax, but I hope you get the idea. Comment on or edit the answer if I have a misspelling in it.
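If you would rather avoid the server-side variables entirely, here is a minimal sketch of an alternative (my own suggestion, not the original poster's code; it assumes Python 3.5+ and a MySQLdb/mysql-connector style cursor with %(name)s placeholders): merge the two constants into each row dict with a comprehension, leaving lst itself untouched:

import datetime

person = 'Sam'
date = datetime.datetime.now()
lst = [{'Fruit': 'Apple', 'HadToday': 2}, {'Fruit': 'Banana', 'HadToday': 8}]

# Build new dicts on the fly; the originals in lst are not mutated.
rows = [{**item, 'Person': person, 'Date': date} for item in lst]
cursor.executemany("""
    INSERT INTO myTable (Person,Date,Fruit,HadToday)
    VALUES (%(Person)s, %(Date)s, %(Fruit)s, %(HadToday)s)""", rows)
conn.commit()

This is still a single executemany() call, so the round-trip count is the same as in the looped version.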

Related

Properly format SQL query when inserting into a variable number of columns

I'm using psycopg2 to interact with a PostgreSQL database. I have a function whereby any number of columns (from a single column to all columns) in a table could be inserted into. My question is: how would one properly and dynamically construct this query?
At the moment I am using string formatting and concatenation, and I know this is the absolute worst way to do this. Consider the code below where, in this case, my unknown number of columns (i.e. keys from a dict) is in fact 2:
dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}

def createMyQuery(user_ids, dictOfUnknownLength):
    fields, values = list(), list()
    for key, val in dictOfUnknownLength.items():
        fields.append(key)
        values.append(val)
    fields = str(fields).replace('[', '(').replace(']', ')').replace("'", "")
    values = str(values).replace('[', '(').replace(']', ')')
    query = f"INSERT INTO myTable {fields} VALUES {values} RETURNING someValue;"
    # query == INSERT INTO myTable (key1, key2) VALUES (3, 'myString') RETURNING someValue;
This provides a correctly formatted query but is of course prone to SQL injection and the like and, as such, is not an acceptable method of achieving my goal.
In other queries I am using the recommended methods of query construction when handling a known number of variables (%s and separate argument to .execute() containing variables) but I'm unsure how to adapt this to accommodate an unknown number of variables without using string formatting.
How can I elegantly and safely construct a query with an unknown number of specified insert columns?
To add to your worries, the current methodology using .replace() is prone to edge cases where fields or values contain [, ], or '. They will get replaced no matter what and may mess up your query.
You could always use .join() to join a variable number of values in your list. On top of that, format the query with a matching number of %s placeholders after VALUES and pass your arguments to .execute().
Note: You may also want to consider the case where the number of fields is not equal to the number of values.
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()

dictOfUnknownLength = {'key1': 3, 'key2': 'myString'}

def createMyQuery(user_ids, dictOfUnknownLength):
    # Directly assign keys/values.
    fields, values = list(dictOfUnknownLength.keys()), list(dictOfUnknownLength.values())
    if len(fields) != len(values):
        # Raise an error? SQL won't work in this case anyway...
        pass
    # Stringify the fields and values.
    fieldsParam = ','.join(fields)                # "key1,key2"
    valuesParam = ','.join(['%s'] * len(values))  # "%s,%s"
    # "INSERT ... (key1, key2) VALUES (%s, %s) ..."
    query = 'INSERT INTO myTable ({}) VALUES ({}) RETURNING someValue;'.format(fieldsParam, valuesParam)
    # .execute('INSERT ... (key1, key2) VALUES (%s, %s) ...', [3, 'myString'])
    cur.execute(query, values)  # Anti-SQL-injection: pass placeholder values as the second argument.
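The field names themselves still reach the query through .format(), which placeholders cannot protect. psycopg2 (2.7+) ships the sql composition module for exactly this case; a minimal sketch, reusing the fields/values lists from above:

from psycopg2 import sql

# sql.Identifier safely quotes table/column names; the placeholders
# are still filled in by execute() from the values list.
query = sql.SQL("INSERT INTO {} ({}) VALUES ({}) RETURNING someValue;").format(
    sql.Identifier('myTable'),
    sql.SQL(', ').join(map(sql.Identifier, fields)),
    sql.SQL(', ').join(sql.Placeholder() * len(values)),
)
cur.execute(query, values)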

Is it possible to create a query command that takes in a list of variables in python-mysql

I am trying to do a multi-row query using executemany in the MySQLdb library. After searching around, I found that I'll have to create a command that uses INSERT INTO along with ON DUPLICATE KEY UPDATE instead of a plain UPDATE in order to use executemany.
All is good so far, but then I ran into a problem: I can't build the SET part efficiently. My table has about 20 columns (whether you want to criticize the fatness of the table is up to you; it works for me so far) and I want to form the command string efficiently if possible.
Right now I have
update_query = """
    INSERT INTO `my_table`
    ({all_columns}) VALUES({vals})
    ON DUPLICATE KEY UPDATE <each-individual-column-set-to-its-value-here>
""".format(all_columns=all_columns, vals=vals)
Where all_columns covers all the columns, and vals is a bunch of %s placeholders, as I'm going to use executemany later.
However, I have no idea how to form the SET part of the string. I thought about splitting the column string on commas to get a list, but I'm not sure how to iterate over it to build the clause.
Overall, the goal of this is to only call the db once for update, and that's the only way I can think of right now. If you happen to have a better idea, please let me know as well.
EDIT: adding more info
all_columns is something like 'id, id2, num1, num2'
vals right now is set to be '%s, %s, %s, %s'
and of course there are more columns than just 4
Assuming that you have a list of tuples for the set piece of your command:
listUpdate = [('f1', 'i'), ('f2', '2')]
setCommand = ', '.join(['%s = %s' % x for x in listUpdate])

all_columns = 'id, id2, num1, num2'
vals = '%s, %s, %s, %s'
update_query = """
    INSERT INTO `my_table`
    ({all_columns}) VALUES({vals})
    ON DUPLICATE KEY UPDATE {set}
""".format(all_columns=all_columns, vals=vals, set=setCommand)
print(update_query)
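If you want executemany() to reuse the same row tuples for both the insert and the update, a common pattern is to generate col = VALUES(col) pairs from a single list of column names. A sketch, with the four columns from the edit standing in for your ~20 (note that VALUES() inside ON DUPLICATE KEY UPDATE is deprecated from MySQL 8.0.20, but still widely supported):

columns = ['id', 'id2', 'num1', 'num2']  # assumption: your real columns go here

all_columns = ', '.join('`{}`'.format(c) for c in columns)
vals = ', '.join(['%s'] * len(columns))
# VALUES(col) refers to the value that would have been inserted into col.
set_clause = ', '.join('`{0}` = VALUES(`{0}`)'.format(c) for c in columns)

update_query = (
    'INSERT INTO `my_table` ({}) VALUES ({}) '
    'ON DUPLICATE KEY UPDATE {}'.format(all_columns, vals, set_clause)
)
cursor.executemany(update_query, rows)  # rows: a list of tuples, one per record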

How can I insert NULL data into MySQL database with Python?

I'm getting a weird error when inserting some data from a Python script into MySQL. It's basically related to a variable I am inserting being blank. I take it that MySQL does not like blank variables, but is there something else I can change it to so it works with my insert statement?
I can successfully use an IF statement to turn it into 0 if it's blank, but this may mess up some of the data analytics I plan to do in MySQL later. Is there a way to convert it to NULL or something so MySQL accepts it but doesn't add anything?
When using MySQLdb and cursor.execute(), pass the value None, not "NULL":
value = None
cursor.execute("INSERT INTO table (`column1`) VALUES (%s)", (value,))
Found the answer here
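A small extension of this, as a sketch (the row data here is hypothetical): executemany() treats None the same way, so rows may freely mix real values and NULLs:

myRows = [('Alice', 'a@example.com'), ('Bob', None)]  # hypothetical data
cursor.executemany(
    "INSERT INTO table (`column1`, `column2`) VALUES (%s, %s)", myRows)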
If col1 is a char and col2 is an int, a trick could be:
sql = "insert into table (col1, col2) values (%s, %s)" % ("'{}'".format(val1) if val1 else "NULL", val2 if val2 else "NULL")
You do not need to add quotes around the %s; the value can be processed before it is passed into the SQL string.
This method works when executing SQL through a SQLAlchemy session, for example session.execute(text(sql)).
PS: the SQL is not tested yet.
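For comparison, a sketch of the bound-parameter route with SQLAlchemy (assuming a session as above): a Python None passed as a parameter is transmitted as SQL NULL, with no quoting logic needed:

from sqlalchemy import text

# val1/val2 may be None; the driver converts None to NULL automatically.
session.execute(
    text("INSERT INTO mytable (col1, col2) VALUES (:val1, :val2)"),
    {"val1": val1, "val2": val2},
)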
Quick note about using parameters in SQL statements with Python. See the RealPython article on this topic, Preventing SQL Injection Attacks With Python. Here's another good article from TowardsDataScience.com: A Simple Approach To Templated SQL Queries In Python. These helped me with the same None/NULL issue.
Also, I found that if I put NULL (without quotes) directly into the INSERT query's VALUES, it was interpreted appropriately in the SQL Server DB. The translation problem only exists if you need to conditionally add NULL or a value via string interpolation.
Examples:
cursor.execute("SELECT admin FROM users WHERE username = %s", (username,))
cursor.execute("SELECT admin FROM users WHERE username = %(username)s", {'username': username});
UPDATE: This StackOverflow discussion is more in line with what I'm trying to do and may help someone else.
Example:
import pypyodbc

myData = [
    (1, 'foo'),
    (2, None),
    (3, 'bar'),
]

connStr = """
DSN=myDb_SQLEXPRESS;
"""
cnxn = pypyodbc.connect(connStr)
crsr = cnxn.cursor()

sql = """
INSERT INTO myTable VALUES (?, ?)
"""
for dataRow in myData:
    print(dataRow)
    crsr.execute(sql, dataRow)

cnxn.commit()
crsr.close()
cnxn.close()
Based on the above answers, I wrote a wrapper function for my use case; you can adapt it to your needs.

def sanitizeData(value):
    if value in ('', None):
        return "NULL"
    # Handles values that already contain a single quote (e.g. O'Brien);
    # doubling the quote is how SQL escapes single quotes.
    if type(value) is str:
        return "'{}'".format(value.replace("'", "''"))
    return value
Now build the SQL string like so:
"INSERT INTO %s (Name, Email) VALUES (%s, %s)" % (table_name, sanitizeData(actual_name), sanitizeData(actual_email))
Why not set the variable equal to some string like 'no price' and then filter this out later when you want to do math on the numbers?
filter(lambda x: x != 'no price', list_of_data_from_database)
Do a quick check for blank, and if it is, set it equal to "NULL":
if not variable_to_insert:
    variable_to_insert = "NULL"
...then make sure that the inserted variable is not in quotes in the insert statement, like:
insert = "INSERT INTO table (var) VALUES (%s)" % (variable_to_insert)
not like:
insert = "INSERT INTO table (var) VALUES ('%s')" % (variable_to_insert)

Problems INSERTing record if similar doesn't already exist

I'm trying to check whether a record already exists in the database (by similar title), and insert it if not. I've tried it two ways and neither quite works.
More elegant way (?) using IF NOT EXISTS
if mode == "update":
    # check if book is already present in the system
    cursor.execute('IF NOT EXISTS (SELECT * FROM book WHERE TITLE LIKE "%s") INSERT INTO book (title,author,isbn) VALUES ("%s","%s","%s") END IF;' % (title, title, author, isbn))
    cursor.execute('SELECT bookID FROM book WHERE TITLE LIKE "%s";' % (title))
    bookID = cursor.fetchall()
    print('found the bookid %s' % (bookID))
    #cursor.execute('INSERT INTO choice (uid,catID,priority,bookID) VALUES ("%d","%s","%s","%s");' % ('1',cat,priority,bookID))  # commented out because above doesn't work
With this, I get an error on the IF NOT EXISTS query saying that "author" isn't defined (although it is).
Less elegant way using count of matching records
if mode == "update":
    # check if book is already present in the system
    cursor.execute('SELECT COUNT(*) FROM book WHERE title LIKE "%s";' % (title))
    anyresults = cursor.fetchall()
    print('anyresults looks like %s' % (anyresults))
    if anyresults[0] == 0:  # if we didn't find a bookID
        print("I'm in the loop for adding a book")
        cursor.execute('INSERT INTO book (title,author,isbn) VALUES ("%s","%s","%s");' % (title, author, isbn))
    cursor.execute('SELECT bookID FROM book WHERE TITLE LIKE "%s";' % (title))
    bookID = cursor.fetchall()
    print('found the bookid %s' % (bookID))
    #cursor.execute('INSERT INTO choice (uid,catID,priority,bookID) VALUES ("%d","%s","%s","%s");' % ('1',cat,priority,bookID))  # commented out because above doesn't work
In this version, anyresults is a tuple that looks like (0L,), but I can't find a way of matching it that gets me into that "loop for adding a book". if anyresults[0] == 0, 0L, '0', '0L' - none of these seem to get me into the loop.
I think I may not be using IF NOT EXISTS correctly--examples I've found are for separate procedures, which aren't really in the scope of this small project.
ADDITION:
I think unutbu's code will work great, but I'm still getting this dumb NameError saying author is undefined, which prevents the INSERT from being tried, even when I am definitely passing it in.
if form.has_key("title"):
    title = form['title'].value
    mode = "update"
if form.has_key("author"):
    author = form['author'].value
    mode = "update"
    print("I'm in here")
if form.has_key("isbn"):
    isbn = form['isbn'].value
    mode = "update"
It never prints that "I'm in here" test statement. What would stop it getting in there? It seems so obvious--I keep checking my indentation, and I'm testing it on the command line and definitely specifying all three parameters.
If you set up a UNIQUE index on book, then inserting unique rows is easy.
For example,
mysql> ALTER IGNORE TABLE book ADD UNIQUE INDEX book_index (title,author);
WARNING: if there are rows with non-unique (title,author) pairs, all but one such row will be dropped.
If you want just the author field to be unique, then just change (title,author) to (author).
Depending on how big the table is, this may take a while...
Now, to insert a unique record,
sql = 'INSERT IGNORE INTO book (title,author,isbn) VALUES (%s, %s, %s)'
cursor.execute(sql, [title, author, isbn])
If (title,author) are unique, the triplet (title,author,isbn) is inserted into the book table.
If (title,author) are not unique, then the INSERT command is ignored.
Note the second argument to cursor.execute(). Passing arguments this way helps prevent SQL injection.
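Putting it together with your follow-up SELECT, a sketch of the whole update branch (parameter passing throughout, %s placeholders as in MySQLdb) might look like:

if mode == "update":
    # Silently skipped when a row with this (title, author) already exists.
    cursor.execute('INSERT IGNORE INTO book (title,author,isbn) VALUES (%s, %s, %s)',
                   [title, author, isbn])
    # Either way, the book row now exists, so fetch its bookID.
    cursor.execute('SELECT bookID FROM book WHERE title = %s AND author = %s',
                   [title, author])
    bookID = cursor.fetchone()[0]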
This doesn't answer your question since it's for Postgresql rather than MySQL, but I figured I'd drop it in for people searching their way here.
In Postgres, you can batch insert items if they don't exist:
CREATE TABLE book (title TEXT, author TEXT, isbn TEXT);

-- Create a row of test data:
INSERT INTO book (title,author,isbn) VALUES ('a', 'b', 'c');

-- Do the real batch insert:
INSERT INTO book
SELECT add.* FROM (VALUES
    ('a', 'b', 'c'),
    ('d', 'e', 'f'),
    ('g', 'h', 'i')
) AS add (title, author, isbn)
LEFT JOIN book ON (book.title = add.title)
WHERE book.title IS NULL;
This is pretty simple. It selects the new rows as if they're a table, then left joins them against the existing data. The rows that don't already exist will join against a NULL row; we then filter out the ones that already exist (where book.title isn't NULL). This is extremely fast: it takes only a single database transaction to do a large batch of inserts, and lets the database backend do a bulk join, which it's very good at.
By the way, you really need to stop formatting your SQL queries directly (unless you really have to and really know what you're doing, which you don't here). Use query substitution, e.g. cur.execute("SELECT * FROM table WHERE title=%s AND isbn=%s", (title, isbn)) - with MySQLdb the placeholder style is %s; the ? style belongs to drivers like sqlite3.

Python + MySQLdb executemany

I'm using Python and its MySQLdb module to import some measurement data into a MySQL database. The amount of data that we have is quite high (currently about ~250 MB of CSV files and plenty more to come).
Currently I use cursor.execute(...) to import some metadata. This isn't problematic as there are only a few entries for these.
The problem is that when I try to use cursor.executemany() to import larger quantities of the actual measurement data, MySQLdb raises a
TypeError: not all arguments converted during string formatting
My current code is
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into values (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
where values is a list of tuples containing three strings each. Any ideas what could be wrong with this?
Edit:
The values are generated by
yield (prefix + row['id'], row['value'], sample_id)
and then read into a list one thousand at a time, where row comes from a csv.DictReader iterating over the file.
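For context, a sketch of that generation and batching pipeline (prefix, sample_id, and the open csvfile are assumptions taken from the question, not tested code):

import csv
from itertools import islice

def generate_values(csvfile, prefix, sample_id):
    for row in csv.DictReader(csvfile):
        yield (prefix + row['id'], row['value'], sample_id)

def batches(iterable, size=1000):
    # Collect the generator's output into lists of at most `size` tuples,
    # each of which is then handed to executemany().
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch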
In retrospect this was a really stupid but hard-to-spot mistake: values is a keyword in SQL, so the table name values needs to be quoted with backticks.
def __insert_values(self, values):
    cursor = self.connection.cursor()
    cursor.executemany("""
        insert into `values` (ensg, value, sampleid)
        values (%s, %s, %s)""", values)
    cursor.close()
The message you get indicates that inside the executemany() method, one of the conversions failed. Check your values list for a tuple longer than 3.
For a quick verification:
max(map(len, values))
If the result is higher than 3, locate your bad tuple with a filter:
[t for t in values if len(t) != 3]
or, if you need the index:
[(i,t) for i,t in enumerate(values) if len(t) != 3]
