Faster solution than executemany to insert multiple rows at once in pyodbc - python

I would like to insert multiple rows with one insert statement.
I tried with
params = ((1, 2), (3,4), (5,6))
sql = 'insert into tablename (column_name1, column_name2) values (?, ?)'
cursor.fast_executemany = True
cursor.executemany(sql, params)
but under the hood it's just a simple loop over the params, running the execute method for each row.
I also tried creating a longer insert statement, like INSERT INTO tablename (col1, col2) VALUES (?,?), (?,?) ... (?,?).
def flat_map_list_of_tuples(list_of_tuples):
    return [element for tupl in list_of_tuples for element in tupl]

args_str = ', '.join('(?,?)' for x in params)
sql = 'insert into tablename (column_name1, column_name2) values '
db.cursor.execute(sql + args_str, flat_map_list_of_tuples(params))
It worked and reduced the insertion time from 10.9 s to 6.1 s.
Is this solution correct? Does it have some vulnerabilities?

Is this solution correct?
The solution you propose, which is to build a table value constructor (TVC), is not incorrect but it is really not necessary. pyodbc with fast_executemany=True and Microsoft's ODBC Driver 17 for SQL Server is about as fast as you're going to get short of using BULK INSERT or bcp as described in this answer.
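For reference, a minimal sketch of that setup; the server and database names in the connection string are placeholders, and the table and columns are carried over from the question:
import pyodbc

# Placeholder connection details; adjust SERVER/DATABASE/auth to your setup.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.fast_executemany = True  # ship rows as a parameter array, not one by one
cursor.executemany(
    "insert into tablename (column_name1, column_name2) values (?, ?)",
    [(1, 2), (3, 4), (5, 6)],
)
conn.commit()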
Does it have some vulnerabilities?
Since you are building a TVC for a parameterized query you are protected from SQL Injection vulnerabilities, but there are still a couple of implementation considerations:
A TVC can insert a maximum of 1000 rows at a time.
pyodbc executes SQL statements by calling a system stored procedure, and stored procedures in SQL Server can accept a maximum of 2100 parameters, so the number of rows your TVC can insert is also limited by number_of_rows * number_of_columns < 2100.
In other words, your TVC approach will be limited to a "chunk size" of 1000 rows or less. The actual calculation is described in this answer.
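For illustration, here is a minimal sketch of chunking the TVC approach to respect both limits; the helper name is hypothetical, the table and column names come from the question, and the chunk size of 1000 assumes two columns per row (2 * 1000 = 2000 parameters, under 2100):
# Hypothetical helper: chunk the TVC insert to stay under both limits.
def insert_tvc_chunked(cursor, params, chunk_size=1000):
    # 1000 rows is the TVC ceiling; with two columns per row the
    # parameter count (2 * 1000 = 2000) also stays under 2100.
    for i in range(0, len(params), chunk_size):
        chunk = params[i:i + chunk_size]
        placeholders = ', '.join('(?, ?)' for _ in chunk)
        sql = ('insert into tablename (column_name1, column_name2) '
               'values ' + placeholders)
        cursor.execute(sql, [value for row in chunk for value in row])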

Related

In Python script I have an insert query, but when I want to insert multiple columns in the same query it gives an error

In my Python script I have an insert query, but when I want to insert multiple columns in the same query it gives an error,
while for a single column it works perfectly.
Below is my code.
My database is AWS S3.
A = []
for score_row in score:
    A.append(score_row[2])
print("A=", A)
B = []
for day_row in score:
    B.append(day_row[1])
print("B=", B)
for x, y in zip(A, B):
    sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
    cursor.executemany(sql, (x,), (y,))
When I replace the above query with the following SQL insert statement, it works perfectly.
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?)"""
cursor.executemany(sql, (x,))
Fix your code like this:
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
cursor.execute(sql, (x,y,)) #<-- here
Because it is just one insert (not several inserts).
Explanation
I guess you are confusing the number of inserts (rows) with the number of parameters (fields to insert in each row). When you want to insert several rows, use executemany; for just one row you should use execute. The second parameter of execute is the "list" (or sequence) of values to be inserted in that row.
Alternative
You can try to change the syntax and insert all the data in one shot with a single executemany call:
values = list(zip(A, B))  # instead of the "for" loop
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
cursor.executemany(sql, values)
Notice this approach doesn't use a for statement. This means all the data is sent to the database in one call, which is more efficient.

psycopg2 to generate insert statements with variable column counts

I am attempting to insert Excel spreadsheets into a Postgres DB using a Python script with psycopg2.
The problem is not all the spreadsheets have the same number of columns, and I need the insert statement to be flexible enough so I don't have to specify them by name.
My approach is to load the columns of the spreadsheet's header row into a tuple, and likewise with the values being inserted. So for example:
sql = ''''INSERT INTO my_table (%s) VALUES (%s);'''
cur.execute(sql, (cols, vals))
where 'cols' and 'vals' are both tuples.
'cols' can have 7, 9, 10, etc. entries, again depending on how many columns the spreadsheet had.
When I attempt to run this, I get:
psycopg2.ProgrammingError: syntax error at or near "'INSERT INTO my_table
(ARRAY['"
LINE 1: 'INSERT INTO my_table...
^
Not sure if the problem is in my calling syntax, or if you simply can't do what I'm trying to do.
There's an apostrophe ' at the beginning of your sql query.
''''INSERT INTO my_table (%s) VALUES (%s);'''
should be
'''INSERT INTO my_table (%s) VALUES (%s);'''
Edit: didn't realize you were trying to fill in the columns dynamically. To do that, you should use string formatting. Assuming cols is a list:
sql = '''INSERT INTO my_table ({}) VALUES ({})'''.format(','.join(cols), ','.join(['%s'] * len(vals)))
Then, your execution would be:
cur.execute(sql, vals)
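One caveat: interpolating column names with .format() trusts the spreadsheet headers completely. As a sketch of a safer alternative, psycopg2's sql module can quote the identifiers; this assumes cols and vals as above:
from psycopg2 import sql

query = sql.SQL("INSERT INTO my_table ({}) VALUES ({})").format(
    sql.SQL(',').join(map(sql.Identifier, cols)),      # quoted column names
    sql.SQL(',').join(sql.Placeholder() * len(vals)),  # one %s per value
)
cur.execute(query, vals)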

psycopg2 interpolate table name in executemany statement

I am trying to insert data into a table. The table is determined at the beginning of the program and remains constant throughout. How do I interpolate the table name in an executemany statement like the one below?
tbl = 'table_name'
rows = [{'this':x, 'that': x+1} for x in range(10)]
cur.executemany("""INSERT INTO %(tbl)s
VALUES(
%(this)s,
%(that)s
)""", rows)
As stated in the official documentation: "Only query values should be bound via this method: it shouldn’t be used to merge table or field names to the query. If you need to generate dynamically an SQL query (for instance choosing dynamically a table name) you can use the facilities provided by the psycopg2.sql module."
It has the following syntax:
from psycopg2 import sql
tbl = 'table_name'
rows = [{'this':x, 'that': x+1} for x in range(10)]
cur.executemany(
    sql.SQL("INSERT INTO {} VALUES (%(this)s, %(that)s);")
       .format(sql.Identifier(tbl)),
    rows)
More on http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql

Brief syntax for inserting row into sqlite database using python sqlite3

I have a csv file of size 360x120 that I want to import into my sqlite database row by row. For one row, I know that below syntax works if mytuple has 2 elements:
import sqlite3
conn = sqlite3.connect(dbLoc)
cur = conn.cursor()
mytuple = (a, b, c, ...) #some long tuple of 120 elements
cur.execute('INSERT INTO tablename VALUES (?, ?)', mytuple)
Problem is, my rows contain 120 columns and I can't really go type 120 question marks into the cur.execute() line. Actually I have, it works but yeah, it is not a good solution. One thing I have tried was:
cur.execute('INSERT INTO tablename VALUES ?', mytuple)
Thought it would just do ?=mytuple and replace ? with mytuple, but it doesn't do that. A user comment on the article sebastianraschka.com/Articles/2014_sqlite_in_python_tutorial.html shows such syntax, which looked like it would work for me, but it does not:
t = ('RHAT',)
c.execute('SELECT * FROM stocks WHERE symbol=?', t)
As seen there, he's able to substitute a tuple into the execute string with a single ?. How can I achieve the same with INSERT INTO tablename?
sqlite3 doesn't support a more concise syntax; you have to build the list of placeholders yourself:
placeholders = ','.join(['?'] * len(t))  # one '?' per element of t
c.execute('INSERT INTO tablename VALUES ({})'.format(placeholders), t)
Note: the default SQLITE_MAX_COLUMN is 2000, and some algorithms in SQLite are O(n**2) in the number of columns, i.e., raising the limit may slow down db operations.
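Putting it together for the CSV in the question, a minimal sketch; the file path and table name are placeholders, dbLoc is the database path from the question, and the table is assumed to already exist with 120 columns:
import csv
import sqlite3

conn = sqlite3.connect(dbLoc)  # dbLoc: database path from the question
cur = conn.cursor()

with open('data.csv', newline='') as f:  # placeholder CSV path
    rows = list(csv.reader(f))           # 360 rows of 120 fields each

placeholders = ','.join(['?'] * len(rows[0]))  # one '?' per column
cur.executemany(
    'INSERT INTO tablename VALUES ({})'.format(placeholders), rows)
conn.commit()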

python db insert

I am facing a performance problem in my code. I make a db connection, run a select query, and then insert into a table; around 500 rows are populated by one select query. Before inserting, I run the select query around 8-9 times, then insert everything using cursor.executemany. But it takes 2 minutes to insert, which is not good. Any ideas?
def insert1(id, state, cursor):
    cursor.execute("select * from qwert where asd_id = %s", [id])
    if sometcondition:
        adding.append(rd[i])
    cursor.executemany(indata, adding)
where rd[i] is an array used to build the records and indata is an insert statement
# prog starts here
cursor.execute("select * from assd")
for row in cursor.fetchall():
    if row[1] == 'aq':
        insert1(row[1], row[2], cursor)
    if row[1] == 'qw':
        insert2(row[1], row[2], cursor)
I don't really understand why you're doing this.
It seems that you want to insert a subset of rows from "assd" into one table, and another subset into another table?
Why not just do it with two SQL statements, structured like this:
insert into tab1 select * from assd where asd_id = 42 and cond1 = 'set';
insert into tab2 select * from assd where asd_id = 42 and cond2 = 'set';
That'd dramatically reduce your number of roundtrips to the database and your client-server traffic. It'd also be an order of magnitude faster.
Of course, I'd also strongly recommend that you specify your column names in both the insert and select parts of the code.
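For illustration, a minimal sketch of that approach in Python; tab1, tab2, the column names (col_a, col_b), and the id value 42 are placeholders carried over from the answer:
# Two set-based statements replace hundreds of single-row inserts.
# col_a/col_b are hypothetical column names; list yours explicitly.
cursor.execute(
    "insert into tab1 (col_a, col_b) "
    "select col_a, col_b from assd where asd_id = %s and cond1 = 'set'",
    [42],
)
cursor.execute(
    "insert into tab2 (col_a, col_b) "
    "select col_a, col_b from assd where asd_id = %s and cond2 = 'set'",
    [42],
)
connection.commit()  # assuming an open DB-API connection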
