python sqlite3 executemany using multiple lists - python

Background:
So I have a large array that I am reading from one source and trying to write (efficiently) into SQLite3 using python.
Currently I use the default form:
cursor.executemany("INSERT into mytable1 VALUES(?,?,?)", my_arr.tolist())
Now I want to expand to a few hundred thousand tables. I would like to be able to do something like the following (wish):
cursor.executemany("INSERT into ? VALUES(?,?,?)", TableNameList, my_arr.tolist())
Questions:
Is there a way to do this without inserting a column of table names into the array
before converting it to a list? If so, how?
If there is not such a way, then suggestions and alternatives are
requested.
I tried searching Stack Exchange, but may have missed something.
I tried looking in the Python sqlite3 docs, but did not see anything like this.
I tried a generic Google search.

First, the Python bit. Assuming that my_arr is some sort of two-dimensional array, and that .tolist() produces a list-of-lists, yes, there is a way to add an element to every row in your list:
result = [[a] + b for a, b in zip(TableNameList, my_arr.tolist())]
Second, the SQL bit. No, you can't use ? to specify a table name. The table name must be literally present in the SQL statement. The best that I can offer you is to run cursor.execute several times:
for table, values in zip(TableNameList, my_arr):
    c.execute("INSERT INTO %s VALUES (?, ?, ?)" % table, values)
But, be mindful of whether you trust the source of TableNameList. Using untrusted data in a %s leads to SQL injection security flaws.
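If you cannot fully trust TableNameList, one common mitigation is to validate each name against a whitelist before it ever reaches the SQL string. A minimal sketch (the allowed_tables set and the table names are illustrative, not from the question):
allowed_tables = {'t1', 't2', 't3'}  # hypothetical whitelist of known table names
for table, values in zip(TableNameList, my_arr.tolist()):
    if table not in allowed_tables:
        raise ValueError('unexpected table name: %r' % table)
    c.execute("INSERT INTO %s VALUES (?, ?, ?)" % table, values)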
Sample program:
import sqlite3
import numpy as np
import itertools
my_arr = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
TableNameList = 't1', 't1', 't2', 't3'
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''CREATE TABLE t1 (c1, c2, c3)''')
c.execute('''CREATE TABLE t2 (c1, c2, c3)''')
c.execute('''CREATE TABLE t3 (c1, c2, c3)''')
## Insert a row of data
#c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
for table, values in itertools.izip(TableNameList, my_arr):  # itertools.izip is Python 2 only; use the built-in zip() on Python 3
    c.execute("INSERT INTO %s VALUES (?, ?, ?)" % table, values)
# Save (commit) the changes
conn.commit()
# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
conn.close()

Related

INSERT INTO a whole list with python and SQL

I'm inserting data in a database with INSERT INTO
The main problem is that I'm inserting around 170k points of data and I'm looping a code like that:
for row in data:
    SQL = '''INSERT INTO Table
             VALUES ({},{},{})
          '''.format(row[0], row[1], row[2])
    cur.execute(SQL)
cur.commit()
cur.close()
con.close()
This code is extremely slow, is there a faster way to do it?
I was thinking if there is a way to insert a whole column of my matrix data at once.
Try this. Basically, you can achieve it using the executemany() method.
import mysql.connector

mydb = mysql.connector.connect(
    .....
)
mycursor = mydb.cursor()

val = []
for row in data:
    val.append((row[0], row[1], row[2]))

sql = "INSERT INTO table (x,y,z) VALUES (%s, %s, %s)"
mycursor.executemany(sql, val)
mydb.commit()
Support may vary by DBMS (which you do not specify), but you can use a prepared statement, using your DBMS's paramstyle string in the VALUES clause, and pass a list of rows to the executemany() method. See the docs at https://www.python.org/dev/peps/pep-0249/#cursor-methods
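For sqlite3 specifically the paramstyle is qmark, so the same idea looks roughly like this (a minimal sketch; the points table and the sample rows are made up for illustration):
import sqlite3

conn = sqlite3.connect('example.db')  # hypothetical database file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS points (x, y, z)")
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # illustrative data
# qmark paramstyle: one ? per column; executemany binds each tuple in turn
cur.executemany("INSERT INTO points (x, y, z) VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()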

Proper way to store table in python efficiently: low memory usage and fast search by few complex indexes

I need to store a few tables of strings in python (each table contains a few million records). Let the header be ("A", "B", "C") and let ("A", "B") be the data's primary key. Then I need the following operations to be fast:
Add new record (need O(1) complexity).
Find / update, delete record with (A="spam", B="eggs") (need O(1) complexity).
Find all records with (A="spam", C="foo") (need O(k) complexity, where k is the number of result rows).
I can see a solution based on a nested-dict structure for each index (sketched below). It fits my needs, but I think there is a better existing solution.
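For reference, a minimal sketch of the nested-dict scheme the question describes (the sample values and helper functions are illustrative assumptions, not part of the question):
# primary index: (A, B) -> record, giving O(1) add / find / update / delete
by_ab = {}
# secondary index: (A, C) -> set of (A, B) keys, giving O(k) search
by_ac = {}

def add(a, b, c):
    by_ab[(a, b)] = {"A": a, "B": b, "C": c}
    by_ac.setdefault((a, c), set()).add((a, b))

def find_by_a_c(a, c):
    return [by_ab[key] for key in by_ac.get((a, c), ())]

add("spam", "eggs", "foo")
print(find_by_a_c("spam", "foo"))  # -> [{'A': 'spam', 'B': 'eggs', 'C': 'foo'}]
Deleting or updating a record has to maintain both dicts, which is exactly the bookkeeping a database does for you.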
As suggested in the comments, use a database. sqlite3 is small and fairly easy: it creates a database that lives in a single file, and you interact with it.
Here is an adapted example from the API docs:
import sqlite3

# Connect to your database (or create it if it was not there)
db = sqlite3.connect('data.db')
# Create the table
conn = db.cursor()
conn.execute("""
    CREATE TABLE my_table (
        A text,
        B text,
        C text
    )
""")
# Add an entry to the db
conn.execute("INSERT INTO my_table VALUES ('spam','eggs','foo')")
# Read all the entries under a condition
for row in conn.execute("SELECT * FROM my_table WHERE A='spam' AND C='foo'"):
    print(row)
# Commit the insert and safely close the db connection
db.commit()
db.close()
Note: example is in python3
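Since the question needs fast lookups on (A, B) and fast searches on (A, C), it is also worth adding indexes; a short sketch, assuming the table above (the index names are made up):
# unique index backs the (A, B) primary-key-style lookups
conn.execute("CREATE UNIQUE INDEX idx_ab ON my_table (A, B)")
# plain index speeds up the A + C search
conn.execute("CREATE INDEX idx_ac ON my_table (A, C)")
db.commit()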

How to use python2.7 to write to multiple sqlite3 tables using a loop

I have a question about writing to different tables using a loop.
I have 3 tables in an sqlite file.
Their names are Table_A, Table_B and Table_C.
I want to use a loop to write to all of them.
Could you show me how to write the script?
My Code
import sqlite3
Data_temp= [1,2,3,4,5,6]
conn = sqlite3.connect('test.sqlite')
conn.execute('INSERT INTO Table_A values (?,?,?,?,?,?,?,?,?)', Data_temp)
conn.execute('INSERT INTO Table_B values (?,?,?,?,?,?,?,?,?)', Data_temp)
conn.execute('INSERT INTO Table_C values (?,?,?,?,?,?,?,?,?)', Data_temp)
conn.commit()
Thank you, everybody!!
This looks like homework to me, but w/e.
tables = ["Table_A", "Table_B", "Table_C"]
for table in tables:
    conn.execute('INSERT INTO {} values (?,?,?,?,?,?,?,?,?)'.format(table), Data_temp)
If you use this, however, you will have to be extra careful with the format call. If table names come from user input, you open your code up to SQL injection.

Using UPDATE in SQLite for Many Rows with a Python List

I am using SQLite (sqlite3), interfaced with Python, to hold parameters in a table which I use for processing a large amount of data. Suppose I have already populated the table initially, but then change the parameters and want to update the table. If I create a Python list holding the updated parameters, for every row and column in the table, how do I update the table?
I have looked here and here (though the latter refers to C++ as opposed to Python) but these don't really answer my question.
To make this concrete, I show some of my code below:
import sqlite3 as sql
import numpy as np
db = sql.connect('./db.sq3')
cur = db.cursor()
#... Irrelevant Processing Code ...#
cur.execute("""CREATE TABLE IF NOT EXISTS process_parameters (
parameter_id INTEGER PRIMARY KEY,
exciton_bind_energy REAL,
exciton_bohr_radius REAL,
exciton_mass REAL,
exciton_density_per_QW REAL,
box_trap_side_length REAL,
electron_hole_overlap REAL,
dipole_matrix_element REAL,
k_cutoff REAL)""")
#Parameter list
process_params = [(E_X/1.6e-19, a_B/1e-9, m_exc/9.11e-31, 1./(np.sqrt(rho_0)*a_B), D/1e-6, phi0/1e8, d/1e-28, k_cut/(1./a_B)) for i in range(0,14641)]
#Check to see if table is populated or not
count = cur.execute("""SELECT COUNT (*) FROM process_parameters""").fetchone()[0]
#If it's not, fill it up
if count == 0:
    cur.executemany("""INSERT INTO process_parameters VALUES(NULL, ?, ?, ?, ?, ?, ?, ?, ?);""", process_params)
    db.commit()
Now, suppose that on a subsequent processing run I change one or more of the parameters in process_params. What I'd like is for Python to update the database with the most recent version of the parameters on any subsequent run. So I do:
else:
    cur.executemany("""UPDATE process_parameters SET exciton_bind_energy=?, exciton_bohr_radius=?, exciton_mass=?, exciton_density_per_QW=?, box_trap_side_length=?, electron_hole_overlap=?, dipole_matrix_element=?, k_cutoff=?;""", process_params)
    db.commit()
db.close()
But when I do this, the script seems to hang (or run very slowly), to the point that Ctrl+C doesn't even quit the script (it is being run via ipython).
I know that in this case updating from a huge Python list may be unnecessary, but it's the principle I want to clarify, since at another time I may not be updating every row with the same values. If someone could help me understand what's happening and/or how to fix this, I'd really appreciate it. Thank you.
cur.executemany("""
UPDATE process_parameters SET
exciton_bind_energy=?,
exciton_bohr_radius=?,
exciton_mass=?,
exciton_density_per_QW=?,
box_trap_side_length=?,
electron_hole_overlap=?,
dipole_matrix_element=?,
k_cutoff=?
;
""", process_params)
You forgot the WHERE clause in the UPDATE. Without a WHERE clause, the UPDATE statement updates every row in the table. Since you provide 14641 sets of parameters, the SQLite driver performs 14641 (parameter sets) × 14641 (rows in the table) ≈ 214 million row updates, which is why it is so slow.
The proper way is to update only the relevant row every time:
cur.executemany("""
UPDATE process_parameters SET
exciton_bind_energy=?,
exciton_bohr_radius=?,
exciton_mass=?,
exciton_density_per_QW=?,
box_trap_side_length=?,
electron_hole_overlap=?,
dipole_matrix_element=?,
k_cutoff=?
WHERE parameter_id=?
-- ^~~~~~~~~~~~~~~~~~~~ don't forget this
;
""", process_params)
For sure, this means process_params must include parameter IDs, and you need to modify the INSERT statement to insert the parameter ID as well.
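A rough sketch of what that could look like, assuming parameter_id simply runs from 1 upward (that numbering is an assumption for illustration, not something stated in the question):
# append each row's parameter_id so it binds to the WHERE clause placeholder
update_params = [params + (i,) for i, params in enumerate(process_params, start=1)]
cur.executemany("""
    UPDATE process_parameters SET
        exciton_bind_energy=?,
        exciton_bohr_radius=?,
        exciton_mass=?,
        exciton_density_per_QW=?,
        box_trap_side_length=?,
        electron_hole_overlap=?,
        dipole_matrix_element=?,
        k_cutoff=?
    WHERE parameter_id=?
""", update_params)
db.commit()
The INSERT branch would then bind the explicit id instead of NULL, e.g. INSERT INTO process_parameters VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?) with the id as the first bound value.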

oursql extremely slow in inserting data

I am trying to store some data generated by a python script in a MySQL database. Essentially I am using the commands:
con = oursql.connect(user="user", host="host", passwd="passwd",
                     db="testdb")
c = con.cursor()
c.executemany(insertsimoutput, zippedsimoutput)
con.commit()
c.close()
where,
insertsimoutput = '''insert into simoutput
(repnum,
timepd,
...) values (?, ?, ...?)'''
About 30,000 rows are inserted and there are about 15 columns. The above takes about 7 minutes. If I use MySQLdb instead of oursql, it takes about 2 seconds. Why this huge difference? Is this supposed to be done some other way in oursql, or is oursql just plain slow? If there is a better way to insert this data with oursql, I would appreciate it if you could let me know.
Thank you.
The difference is that MySQLdb does some hackery to your query while oursql does not...
Taking this:
cursor.executemany("INSERT INTO sometable VALUES (%s, %s, %s)",
[[1,2,3],[4,5,6],[7,8,9]])
MySQLdb translates it before running into this:
cursor.execute("INSERT INTO sometable VALUES (1,2,3),(4,5,6),(7,8,9)")
But if you do:
cursor.executemany("INSERT INTO sometable VALUES (?, ?, ?)",
[[1,2,3],[4,5,6],[7,8,9]])
In oursql, it gets translated into something like this pseudocode:
stmt = prepare("INSERT INTO sometable VALUES (?, ?, ?)")
for params in [[1,2,3],[4,5,6],[7,8,9]]:
    stmt.execute(*params)
So if you want to emulate what MySQLdb is doing but benefit from prepared statements and other goodness with oursql, you need to do this:
from itertools import chain

data = [[1,2,3],[4,5,6],[7,8,9]]
one_val = "({})".format(','.join("?" for i in data[0]))
vals_clause = ','.join(one_val for i in data)
cursor.execute("INSERT INTO sometable VALUES {}".format(vals_clause),
               chain.from_iterable(data))
I bet oursql will be faster when you do this :-)
Also, if you think it's ugly, you are right. But just remember that MySQLdb is doing something uglier internally: it uses regular expressions to parse your INSERT statement and break off the parameterized part, and THEN does what I suggested you do for oursql.
I would say to check if oursql supports a bulk insert sql command to boost performance.
Oursql does support bulk insert statements. I've written code to do so, using the sqlalchemy wrapper.
For pure oursql, something like this should be fine:
with open('tmp.csv', 'wb') as tmp:
    for item in zippedsimoutput:
        # write one comma-separated line per row, matching the FIELDS/LINES terminators below
        tmp.write(','.join(str(field) for field in item) + '\r\n')
c.execute("""LOAD DATA LOCAL INFILE 'tmp.csv' INTO TABLE flags FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' ;""")
Note that the rows must be in the same order as the columns on the database.
