I am trying to store some data generated by a python script in a MySQL database. Essentially I am using the commands:
con = oursql.connect(user="user", host="host", passwd="passwd",
db="testdb")
c = con.cursor()
c.executemany(insertsimoutput, zippedsimoutput)
con.commit()
c.close()
where,
insertsimoutput = '''insert into simoutput
(repnum,
timepd,
...) values (?, ?, ...?)'''
About 30,000 rows are inserted, with about 15 columns each. The above takes about 7 minutes. If I use MySQLdb instead of oursql, it takes about 2 seconds. Why this huge difference? Is this supposed to be done some other way in oursql, or is oursql just plain slow? If there is a better way to insert this data with oursql, I would appreciate it if you could let me know.
Thank you.
The difference is that MySQLdb does some hackery to your query while oursql does not...
Taking this:
cursor.executemany("INSERT INTO sometable VALUES (%s, %s, %s)",
[[1,2,3],[4,5,6],[7,8,9]])
MySQLdb translates it before running into this:
cursor.execute("INSERT INTO sometable VALUES (1,2,3),(4,5,6),(7,8,9)")
But if you do:
cursor.executemany("INSERT INTO sometable VALUES (?, ?, ?)",
[[1,2,3],[4,5,6],[7,8,9]])
In oursql, it gets translated into something like this pseudocode:
stmt = prepare("INSERT INTO sometable VALUES (?, ?, ?)")
for params in [[1,2,3],[4,5,6],[7,8,9]]:
    stmt.execute(*params)
So if you want to emulate what MySQLdb is doing but still benefit from prepared statements and other goodness with oursql, you need to do this:
from itertools import chain
data = [[1,2,3],[4,5,6],[7,8,9]]
one_val = "({})".format(','.join("?" for i in data[0]))
vals_clause = ','.join(one_val for i in data)
cursor.execute("INSERT INTO sometable VALUES {}".format(vals_clause),
chain.from_iterable(data))
I bet oursql will be faster when you do this :-)
Also, if you think it's ugly, you are right. But remember that MySQLdb is doing something uglier internally: it uses regular expressions to parse your INSERT statement, breaks off the parameterized part, and THEN does what I suggested you do for oursql.
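A minimal sketch of the multi-row technique described above, with batching added so that a very large list does not end up in one enormous statement. The helper names (build_multirow_insert, insert_in_batches) and the batch size are assumptions made for illustration and are not part of oursql. The demo runs against the standard-library sqlite3 module only because it also uses ? placeholders and needs no server.

```python
import sqlite3
from itertools import chain


def build_multirow_insert(table, rows):
    """Return (sql, flat_params) for one multi-row INSERT using ? placeholders."""
    one_row = "({})".format(",".join("?" for _ in rows[0]))
    values_clause = ",".join(one_row for _ in rows)
    sql = "INSERT INTO {} VALUES {}".format(table, values_clause)
    return sql, list(chain.from_iterable(rows))


def insert_in_batches(cursor, table, rows, batch_size=100):
    """Issue one multi-row INSERT per batch (batch_size is an arbitrary choice)."""
    for start in range(0, len(rows), batch_size):
        sql, params = build_multirow_insert(table, rows[start:start + batch_size])
        cursor.execute(sql, params)


# Demo against an in-memory SQLite database (a stand-in for the MySQL connection).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sometable (a, b, c)")
data = [[i, i + 1, i + 2] for i in range(250)]
insert_in_batches(cur, "sometable", data)
conn.commit()
row_count = cur.execute("SELECT COUNT(*) FROM sometable").fetchone()[0]
```

The table name is formatted into the SQL text, so it should only ever come from trusted code, never from user input.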
I would say to check if oursql supports a bulk insert sql command to boost performance.
Oursql does support bulk insert statements. I've written code to do so, using the sqlalchemy wrapper.
For pure oursql, something like this should be fine:
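import csv
with open('tmp.csv', 'wb') as tmp:
    # csv.writer emits one comma-separated line per row ('wb' is the Python 2 file mode)
    writer = csv.writer(tmp, lineterminator='\n')
    writer.writerows(zippedsimoutput)
c.execute("""LOAD DATA LOCAL INFILE 'tmp.csv' INTO TABLE simoutput FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\\n';""")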
Note that the rows must be in the same order as the columns on the database.
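For Python 3, the file-based approach might look like the sketch below. The helper names (write_rows_to_csv, load_data_statement) and the sample rows are hypothetical. The statement is only built here, not executed, since running it needs a MySQL server with LOCAL INFILE enabled on both client and server. The point is that the FIELDS and LINES options must match exactly what the writer produced.

```python
import csv
import os
import tempfile


def write_rows_to_csv(rows, path):
    """Write rows as comma-separated, double-quoted fields with LF line endings."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerows(rows)


def load_data_statement(path, table):
    """Build a LOAD DATA statement whose options match write_rows_to_csv."""
    return (
        "LOAD DATA LOCAL INFILE '{}' INTO TABLE {} "
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    ).format(path, table)


# Hypothetical sample rows standing in for zippedsimoutput.
rows = [(1, 0, 3.5), (1, 1, 4.25)]
csv_path = os.path.join(tempfile.mkdtemp(), "simoutput.csv")
write_rows_to_csv(rows, csv_path)
statement = load_data_statement(csv_path, "simoutput")

with open(csv_path, newline="") as handle:
    written = handle.read()
```

Both the path and the table name are pasted into the SQL text, so they should come from trusted code only.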
I'm inserting data in a database with INSERT INTO
The main problem is that I'm inserting around 170k points of data and I'm looping a code like that:
for row in data:
    SQL='''INSERT INTO Table
    VALUES ({},{},{})
    '''.format(row[0],row[1],row[2])
    cur.execute(SQL)
cur.commit()
cur.close()
con.close()
This code is extremely slow, is there a faster way to do it?
I was thinking if there is a way to insert a whole column of my matrix data at once.
Try this. You can achieve it with the executemany() method.
import mysql.connector
mydb = mysql.connector.connect(
.....
)
mycursor = mydb.cursor()
val = []
for row in data:
    val.append((row[0],row[1],row[2]))
sql = "INSERT INTO table (x,y,z) VALUES (%s, %s, %s)"
mycursor.executemany(sql, val)
mydb.commit()
Support may vary by DBMS (which you do not specify), but you can use a prepared statement, using your DBMS's paramstyle string in the VALUES clause, and pass a list of rows to the executemany() method. See the docs at https://www.python.org/dev/peps/pep-0249/#cursor-methods
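As a rough illustration of the paramstyle point, the sketch below builds the placeholder list from the driver module's paramstyle attribute (defined by PEP 249) and passes all rows to executemany() in one call. The placeholder_list helper and the points table are made up for this example, and sqlite3 stands in for whichever DBMS driver is actually in use.

```python
import sqlite3


def placeholder_list(paramstyle, count):
    """Return a comma-separated placeholder string for positional paramstyles."""
    if paramstyle == "qmark":
        marks = ["?"] * count
    elif paramstyle == "format":
        marks = ["%s"] * count
    elif paramstyle == "numeric":
        marks = [":{}".format(i + 1) for i in range(count)]
    else:
        raise ValueError("unsupported positional paramstyle: " + paramstyle)
    return ", ".join(marks)


rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE points (x, y, z)")

# sqlite3.paramstyle is "qmark"; a MySQL driver would typically report "format".
sql = "INSERT INTO points (x, y, z) VALUES ({})".format(
    placeholder_list(sqlite3.paramstyle, 3)
)
cur.executemany(sql, rows)
conn.commit()
stored = cur.execute("SELECT x, y, z FROM points ORDER BY x").fetchall()
```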
I am using python 2.7 and postgresql 10.0.
For learning purposes I am attempting to take user input from raw_input and pass it to an INSERT via execute, but whatever I try, whether %s or {} with .format, I receive errors.
All values are strings except age (int).
Specifically,
with conn:
c.execute("INSERT INTO people(person_first, person_last, person_email,
person_age) VALUES ({}, {}, {}, {})".format(person_first, person_last,
person_email, person_age))
gives me non-string values (from the inputs),
and the %s method gives me an error at the first '%' in VALUES(%s, %s, %s, %s).
I have also attempted VALUES (?, ?, ?, ?), which was unsuccessful in the same way as %s.
The code, as pasted, is incomplete: it uses with conn and c.execute but does not show where c is created. Assuming c is the cursor and conn is the connection, you can manage the cursor with its own context manager: with conn.cursor() as c:. The cursor context manager closes the cursor when the with block exits. Note that in psycopg2 it is the with conn: block that commits the transaction, so keep that block around the cursor or call conn.commit() explicitly.
Also, don't get in the habit of using .format() on your SQL. It 1) is a vector for SQL injection vulnerabilities and 2) breaks if the input contains a single quote character.
So, combining those two points, your code should look like this:
with conn:
    with conn.cursor() as c:
        c.execute(
            "INSERT INTO people (person_first, person_last, person_email, "
            "person_age) VALUES (%s, %s, %s, %s)",
            (person_first, person_last, person_email, person_age)
        )
Note that the parameters are passed as a tuple directly to execute; the driver will parse the query, translate to appropriate SQL/parameter for the server, manage quoting, etc. If you are still seeing errors, post the traceback.
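To see why .format() breaks on a single quote while parameter binding does not, here is a small self-contained demonstration. It uses the standard-library sqlite3 module so it runs without a PostgreSQL server, which means the placeholder is ? rather than psycopg2's %s. The sample person is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE people (person_first TEXT, person_last TEXT, "
    "person_email TEXT, person_age INTEGER)"
)

person = ("Pat", "O'Neil", "pat@example.com", 42)

# String formatting pastes the raw text into the SQL, so the apostrophe in
# O'Neil ends the string literal early and the statement fails to parse.
unsafe_sql = "INSERT INTO people VALUES ('{}', '{}', '{}', {})".format(*person)
try:
    cur.execute(unsafe_sql)
    formatted_insert_failed = False
except sqlite3.OperationalError:
    formatted_insert_failed = True

# Parameter binding sends the values separately from the SQL text.
cur.execute("INSERT INTO people VALUES (?, ?, ?, ?)", person)
conn.commit()
stored = cur.execute("SELECT person_last, person_age FROM people").fetchall()
```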
See also -
http://initd.org/psycopg/docs/usage.html#with-statement
http://initd.org/psycopg/docs/usage.html#the-problem-with-the-query-parameters
Hope this helps.
In a python script, I need to run a query on one datasource and insert each row from that query into a table on a different datasource. I'd normally do this with a single insert/select statement with a tsql linked server join but I don't have a linked server connection to this particular datasource.
I'm having trouble finding a simple pyodbc example of this. Here's how I'd do it but I'm guessing executing an insert statement inside a loop is pretty slow.
result = ds1Cursor.execute(selectSql)
for row in result:
    insertSql = "insert into TableName (Col1, Col2, Col3) values (?, ?, ?)"
    ds2Cursor.execute(insertSql, row[0], row[1], row[2])
ds2Cursor.commit()
Is there a better bulk way to insert records with pyodbc? Or is this a relatively efficient way to do it anyway? I'm using SQL Server 2012, and the latest pyodbc and Python versions.
The best way to handle this is to use the pyodbc function executemany.
ds1Cursor.execute(selectSql)
result = ds1Cursor.fetchall()
ds2Cursor.executemany('INSERT INTO [TableName] (Col1, Col2, Col3) VALUES (?, ?, ?)', result)
ds2Cursor.commit()
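If the source query returns more rows than you want to hold in memory at once, a variant of the same idea is to read with fetchmany() and write each batch with executemany(). The sketch below is only an illustration: copy_rows and the batch size are assumptions, SourceTable is a hypothetical source table, and two in-memory sqlite3 connections stand in for the two pyodbc datasources (both use ? placeholders).

```python
import sqlite3


def copy_rows(src_cursor, dst_cursor, select_sql, insert_sql, batch_size=1000):
    """Copy the result of select_sql into the destination in fixed-size batches."""
    src_cursor.execute(select_sql)
    copied = 0
    while True:
        batch = src_cursor.fetchmany(batch_size)
        if not batch:
            break
        dst_cursor.executemany(insert_sql, batch)
        copied += len(batch)
    return copied


# Stand-ins for the two datasources.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE SourceTable (Col1, Col2, Col3)")
src.executemany(
    "INSERT INTO SourceTable VALUES (?, ?, ?)",
    [(i, i * 2, str(i)) for i in range(2500)],
)
dst.execute("CREATE TABLE TableName (Col1, Col2, Col3)")

copied = copy_rows(
    src.cursor(),
    dst.cursor(),
    "SELECT Col1, Col2, Col3 FROM SourceTable",
    "INSERT INTO TableName (Col1, Col2, Col3) VALUES (?, ?, ?)",
)
dst.commit()
total = dst.execute("SELECT COUNT(*) FROM TableName").fetchone()[0]
```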
Here's a function that can do the bulk insert into SQL Server database.
import pyodbc
import contextlib
def bulk_insert(table_name, file_path):
    # FORMAT = 'CSV' requires SQL Server 2017 or later; the file must be readable by the server
    string = "BULK INSERT {} FROM '{}' WITH (FORMAT = 'CSV');"
    with contextlib.closing(pyodbc.connect("MYCONN")) as conn:
        with contextlib.closing(conn.cursor()) as cursor:
            cursor.execute(string.format(table_name, file_path))
        conn.commit()
This definitely works.
UPDATE: I've noticed from the comments, as well as from coding regularly, that pyodbc is better supported than pypyodbc.
NEW UPDATE: remove conn.close() since the with statement handles that automatically.
Since the discontinuation of the pymssql library (which seems to be under development again), we started using the cTDS library developed by the smart people at Zillow, and to our surprise it supports FreeTDS bulk insert.
As the name suggests, cTDS is written in C on top of the FreeTDS library, which makes it fast, really fast. IMHO this is the best way to bulk insert into SQL Server, since the ODBC driver does not support bulk insert, and executemany or fast_executemany as suggested aren't really bulk insert operations. The BCP tool and T-SQL BULK INSERT have their limitations, since they need the file to be accessible by the SQL Server, which can be a deal breaker in many scenarios.
Below is a naive implementation of bulk inserting a CSV file. Please forgive any bugs; I wrote this from memory without testing.
I don't know why, but for my server, which uses Latin1_General_CI_AS, I needed to wrap the data that goes into NVarChar columns with ctds.SqlVarChar. I opened an issue about this, but the developers said the naming is correct, so I changed my code to keep myself sane.
import csv
import time
import ctds
# DBCONFIG (connection keyword arguments) and the _to_datetime / _to_int
# helpers are assumed to be defined elsewhere; they are not shown here.
def _to_varchar(txt: str) -> ctds.VARCHAR:
    """
    Wraps strings into ctds.NVARCHAR.
    """
    if txt == "null":
        return None
    return ctds.SqlNVarChar(txt)
def _to_nvarchar(txt: str) -> ctds.VARCHAR:
    """
    Wraps strings into ctds.VARCHAR.
    """
    if txt == "null":
        return None
    return ctds.SqlVarChar(txt.encode("utf-16le"))
def read(file):
    """
    Open CSV File.
    Each line is a column:value dict.
    https://docs.python.org/3/library/csv.html?highlight=csv#csv.DictReader
    """
    with open(file, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield row
def transform(row):
    """
    Do transformations to data before loading.
    Data specified for bulk insertion into text columns (e.g. VARCHAR,
    NVARCHAR, TEXT) is not encoded on the client in any way by FreeTDS.
    Because of this behavior it is possible to insert textual data with
    an invalid encoding and cause the column data to become corrupted.
    To prevent this, it is recommended the caller explicitly wrap
    the object with either ctds.SqlVarChar (for CHAR, VARCHAR or TEXT
    columns) or ctds.SqlNVarChar (for NCHAR, NVARCHAR or NTEXT columns).
    For non-Unicode columns, the value should be first encoded to
    column’s encoding (e.g. latin-1). By default ctds.SqlVarChar will
    encode str objects to utf-8, which is likely incorrect for most SQL
    Server configurations.
    https://zillow.github.io/ctds/bulk_insert.html#text-columns
    """
    row["col1"] = _to_datetime(row["col1"])
    row["col2"] = _to_int(row["col2"])
    row["col3"] = _to_nvarchar(row["col3"])
    row["col4"] = _to_varchar(row["col4"])
    return row
def load(rows):
    stime = time.time()
    with ctds.connect(**DBCONFIG) as conn:
        with conn.cursor() as curs:
            curs.execute("TRUNCATE TABLE MYSCHEMA.MYTABLE")
        loaded_lines = conn.bulk_insert("MYSCHEMA.MYTABLE", map(transform, rows))
    etime = time.time()
    print(loaded_lines, " rows loaded in ", etime - stime)
if __name__ == "__main__":
    load(read('data.csv'))
You should use executemany with the cursor.fast_executemany = True, to improve the performance.
pyodbc's default behaviour is to run many inserts, but this is inefficient. By applying fast_executemany, you can drastically improve performance.
Here is an example:
connection = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server}',host='host', database='db', user='usr', password='foo')
cursor = connection.cursor()
# I'm the important line
cursor.fast_executemany = True
sql = "insert into TableName (Col1, Col2, Col3) values (?, ?, ?)"
tuples=[('foo','bar', 'ham'), ('hoo','far', 'bam')]
cursor.executemany(sql, tuples)
cursor.commit()
cursor.close()
connection.close()
Docs.
Note that this has been available since pyodbc 4.0.19 (October 23, 2017).
Here is a helpful function for generating the SQL required for executemany():
def generate_bulk_insert_sql(self, data: pd.DataFrame, table_name) -> str:
    # Assumes pandas is imported as pd; builds one ? placeholder per DataFrame column
    table_sql = ", ".join(str(c) for c in data.columns)
    placeholders = ",".join("?" for _ in data.columns)
    return f'INSERT INTO {table_name} ({table_sql}) VALUES ({placeholders})'
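The same idea works without pandas. The sketch below takes a plain list of column names; the function name generate_insert_sql is made up for this example. With a DataFrame you could pass list(data.columns) as the columns argument.

```python
def generate_insert_sql(table_name, columns):
    """Build an INSERT statement with one ? placeholder per column."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table_name, column_list, placeholders
    )


sql = generate_insert_sql("TableName", ["Col1", "Col2", "Col3"])
```

The table and column names are formatted straight into the statement, so they should come from your own code or schema, not from user input.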
Background:
So I have a large array that I am reading from one source and trying to write (efficiently) into SQLite3 using python.
Currently I use the default form:
cursor.executemany("INSERT into mytable1 VALUES(?,?,?)", my_arr.tolist())
Now I want to expand to a few hundred thousand tables. I would like to be able to do something like the following (wish):
cursor.executemany("INSERT into ? VALUES(?,?,?)", TableNameList, my_arr.tolist())
Questions:
Is there a way to do this without inserting columns into the array before converting it to a list? If so, what is it?
If there is no such way, then suggestions and alternatives are requested.
I tried looking in stackexchange, but may have missed something.
I tried looking in the Python SQLite docs, but did not see something like this.
I tried generic google search.
First, the Python bit. Assuming that my_arr is some sort of two-dimensional array, and that .tolist() produces a list-of-lists, Yes, there is a way to add an element to every row in your list:
result = [[a]+b for a,b in zip(TableNameList, my_arr.tolist())]
Second, the SQL bit. No, you can't use ? to specify a table name. The table name must be literally present in the SQL statement. The best that I can offer you is to run cursor.execute several times:
for table, values in zip(TableNameList, my_arr):
    c.execute("INSERT INTO %s VALUES (?, ?, ?)"%table, values)
But, be mindful of whether you trust the source of TableNameList. Using untrusted data in a %s leads to SQL injection security flaws.
Sample program:
import sqlite3
import numpy as np
import itertools
my_arr = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
TableNameList = 't1', 't1', 't2', 't3'
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE t1 (c1, c2, c3)''')
c.execute('''CREATE TABLE t2 (c1, c2, c3)''')
c.execute('''CREATE TABLE t3 (c1, c2, c3)''')
## Insert a row of data
#c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
# zip works on Python 2 and 3; tolist() yields plain Python ints that sqlite3 can bind
for table, values in zip(TableNameList, my_arr.tolist()):
    c.execute("INSERT INTO %s VALUES (?, ?, ?)"%table, values)
# Save (commit) the changes
conn.commit()
# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
conn.close()
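One way to address both the injection concern and the per-row execute calls is to check each table name against the tables that actually exist, group the rows by table, and then call executemany() once per table. This is only a sketch: insert_grouped is a made-up helper, and the lookup in sqlite_master is specific to SQLite.

```python
import sqlite3
from collections import defaultdict


def insert_grouped(conn, table_names, rows):
    """Insert rows[i] into table_names[i], using one executemany call per table."""
    # Whitelist: only names of tables that really exist may reach the SQL text.
    known = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    grouped = defaultdict(list)
    for table, row in zip(table_names, rows):
        if table not in known:
            raise ValueError("unknown table: {!r}".format(table))
        grouped[table].append(tuple(row))
    for table, table_rows in grouped.items():
        placeholders = ", ".join("?" for _ in table_rows[0])
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(table, placeholders), table_rows
        )


conn = sqlite3.connect(":memory:")
for name in ("t1", "t2", "t3"):
    conn.execute("CREATE TABLE {} (c1, c2, c3)".format(name))

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
insert_grouped(conn, ["t1", "t1", "t2", "t3"], rows)
conn.commit()
t1_rows = conn.execute("SELECT c1, c2, c3 FROM t1 ORDER BY c1").fetchall()
```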
I am trying to use SQL with prepared statements in Python. Python doesn't have its own mechanism for this so I try to use SQL directly:
sql = "PREPARE stmt FROM ' INSERT INTO {} (date, time, tag, power) VALUES (?, ?, ?, ?)'".format(self.db_scan_table)
self.cursor.execute(sql)
Then later, in the loop:
sql = "EXECUTE stmt USING \'{}\', \'{}\', {}, {};".format(d, t, tag, power)
self.cursor.execute(sql)
And in the loop I get:
MySQL Error [1064]: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ''2014-12-25', '12:31:46', 88000000, -6.64' at line 1
What's going on?
Using prepared statements with MySQL in Python is explained, e.g., at http://zetcode.com/db/mysqlpython/ -- look within that page for "Prepared statements".
In your case, that would be, e.g:
sql = ('INSERT INTO {} (date, time, tag, power) VALUES '
'(%s, %s, %s, %s)'.format(self.db_scan_table))
and later, "in the loop" as you put it:
self.cursor.execute(sql, (d, t, tag, power))
with no further string formatting -- the MySQLdb module does the prepare and execute parts on your behalf (and may cache things to avoid repeating work needlessly, etc, etc).
Do consider, depending on the nature of "the loop" you mention, that a single call to .executemany (with a sequence of tuples as the second argument) could take the place of the whole loop (unless you need more processing within that loop beyond just the insertion of data into the DB).
Added: a better alternative nowadays may be to use MySQL's own Connector/Python and the explicit prepared=True option in the .cursor() factory; see http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html . This lets you have a specific cursor on which statements are prepared (with the "more efficient than using PREPARE and EXECUTE" binary protocol, according to that mysql.com page) and another one for statements that are better not prepared; "explicit is better than implicit" is after all one of the principles in "The Zen of Python" (run import this from an interactive prompt to read all those principles). MySQLdb doing things implicitly (and it seems the current open-source version doesn't use prepared statements) can't be as good an architecture as Connector/Python's more explicit one.
import mysql.connector
db_con=mysql.connector.connect(host='',
database='',
user='',
password='')
cursor = db_con.cursor(prepared=True)
sql = """INSERT INTO table (xy,zy) VALUES (%s, %s)"""
params = (1, 2)
cursor.execute(sql, params)
db_con.commit()
# SELECT statement
sql = """SELECT * FROM TABLE WHERE XY=%s ORDER BY id DESC LIMIT 1 """
ID = 1
params = (ID,)  # the trailing comma matters: (ID) is just an int, not a tuple
cursor.execute(sql, params)
data = cursor.fetchall()
rowsNumber = cursor.rowcount
Python's database drivers do support parameterized queries, which cover the same need:
sql = "INSERT INTO {} (date, time, tag, power) VALUES (%s, %s, %s, %s);"
sql = sql.format(self.db_scan_table)
self.cursor.execute(sql, (d, t, tag, power))
(You should ensure self.db_scan_table is not vulnerable to SQL injection)
This assumes your paramstyle is 'format', which it should be for MySQL.