I am facing a performance problem in my code. I make a DB connection, run a select query, and then insert into a table. One select query populates around 500 rows of IDs. Before inserting, I run the select query around 8-9 times, and then insert everything using cursor.executemany. But the insert takes 2 minutes, which is not good. Any ideas?
def insert1(id, state, cursor):
    cursor.execute("select * from qwert where asd_id = %s", [id])
    if some_condition:
        adding.append(rd[i])
    cursor.executemany(indata, adding)
where rd[i] is an array of records being built up and indata is an INSERT statement.
# program starts here
cursor.execute("select * from assd")
for row in cursor.fetchall():
    if row[1] == 'aq':
        insert1(row[1], row[2], cursor)
    if row[1] == 'qw':
        insert2(row[1], row[2], cursor)
I don't really understand why you're doing this.
It seems that you want to insert a subset of rows from "assd" into one table, and another subset into another table?
Why not just do it with two SQL statements, structured like this:
insert into tab1 select * from assd where asd_id = 42 and cond1 = 'set';
insert into tab2 select * from assd where asd_id = 42 and cond2 = 'set';
That'd dramatically reduce your number of roundtrips to the database and your client-server traffic. It'd also be an order of magnitude faster.
Of course, I'd also strongly recommend that you specify your column names in both the insert and select parts of the code.
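For reference, a minimal sketch of running those two statements from Python (assuming conn is an open DB-API connection, and reusing the placeholder table and column names from above):

cursor = conn.cursor()
# Each statement copies its subset entirely server-side in one round trip,
# instead of fetching rows into Python and re-inserting them one at a time.
cursor.execute("insert into tab1 select * from assd where asd_id = %s and cond1 = 'set'", [42])
cursor.execute("insert into tab2 select * from assd where asd_id = %s and cond2 = 'set'", [42])
conn.commit()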
I'm using pyodbc with an application that requires me to insert >1000 rows, which I currently do individually with pyodbc. This tends to take >30 minutes to finish, and I was wondering whether there are any faster methods that could do this in under a minute. I know you can use multiple values in an insert command, but according to this (Multiple INSERT statements vs. single INSERT with multiple VALUES) that could possibly be even slower.
The code currently looks like this.
def Insert_X(X_info):
    columns = ', '.join(X_info.keys())
    placeholders = ', '.join('?' * len(X_info.keys()))
    columns = columns.replace("'", "")
    values = [x for x in X_info.values()]
    query_string = f"INSERT INTO X ({columns}) VALUES ({placeholders});"
    with conn.cursor() as cursor:
        cursor.execute(query_string, values)
With Insert_X being called >1000 times.
I would like to insert multiple rows with one insert statement.
I tried with
params = ((1, 2), (3,4), (5,6))
sql = 'insert into tablename (column_name1, column_name2) values (?, ?)'
cursor.fast_executemany = True
cursor.executemany(sql, params)
but it is just a simple loop over the params, running the execute method under the hood.
I also tried creating a longer insert statement, like INSERT INTO tablename (col1, col2) VALUES (?,?), (?,?) ... (?,?):
def flat_map_list_of_tuples(list_of_tuples):
    return [element for tupl in list_of_tuples for element in tupl]

args_str = ', '.join('(?,?)' for x in params)
sql_template = 'insert into tablename (column_name1, column_name2) values '
cursor.execute(sql_template + args_str, flat_map_list_of_tuples(params))
It worked and reduced the insertion time from 10.9 s to 6.1 s.
Is this solution correct? Does it have some vulnerabilities?
Is this solution correct?
The solution you propose, which is to build a table value constructor (TVC), is not incorrect but it is really not necessary. pyodbc with fast_executemany=True and Microsoft's ODBC Driver 17 for SQL Server is about as fast as you're going to get short of using BULK INSERT or bcp as described in this answer.
Does it have some vulnerabilities?
Since you are building a TVC for a parameterized query you are protected from SQL Injection vulnerabilities, but there are still a couple of implementation considerations:
A TVC can insert a maximum of 1000 rows at a time.
pyodbc executes SQL statements by calling a system stored procedure, and stored procedures in SQL Server can accept a maximum of 2100 parameters, so the number of rows your TVC can insert is also limited by the requirement that number_of_rows * number_of_columns < 2100.
In other words, your TVC approach will be limited to a "chunk size" of 1000 rows or less. The actual calculation is described in this answer.
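As a rough sketch (not a drop-in implementation), chunking the TVC approach from the question to respect both caps could look like this, reusing the two-column placeholder table from above:

def insert_in_chunks(cursor, params, num_columns=2):
    # A TVC accepts at most 1000 rows, and the whole statement at most
    # 2100 parameters, so the chunk size must satisfy both limits.
    chunk_size = min(1000, 2099 // num_columns)
    sql_template = 'insert into tablename (column_name1, column_name2) values '
    for start in range(0, len(params), chunk_size):
        chunk = params[start:start + chunk_size]
        args_str = ', '.join('(?,?)' for _ in chunk)
        flat = [element for tupl in chunk for element in tupl]
        cursor.execute(sql_template + args_str, flat)

That said, a plain cursor.fast_executemany = True with executemany avoids both limits entirely and is usually at least as fast.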
I have a python program in which I want to read the odd rows from one table and insert them into another table. How can I achieve this?
For example, the first table has 5 rows in total, and I want to insert the first, third, and fifth rows into another table.
Note that the table may contain millions of rows, so performance is very important.
I found a few methods here. Here's two of them transcribed to psycopg2.
If you have a sequential primary key, you can just use mod on it:
database_cursor.execute('SELECT * FROM table WHERE mod(primary_key_column, 2) = 1')
Otherwise, you can use a subquery to get the row number and use mod:
database_cursor.execute('''SELECT col1, col2, col3
                           FROM (SELECT row_number() OVER () AS rnum, col1, col2, col3
                                 FROM table) AS sub
                           WHERE mod(rnum, 2) = 1''')
If you have an id-type column that is guaranteed to increment by 1 upon every insert (kinda like an auto-increment index), you could always mod that to select the row. However, this would break when you begin to delete rows from the table you are selecting from.
A more complicated solution would be to use postgresql's row_number() function. The following assumes you have an id column that can be used to sort the rows in the desired order:
select r.*
from (select *, row_number() over(order by id) as rn
      from <tablename>
     ) r
where r.rn % 2 = 1
Note: regardless of how you do it, this will never be especially efficient, since you necessarily have to do a full table scan, and selecting all columns on a table with millions of records via a full table scan is going to be slow.
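Since the question ultimately wants the odd rows inserted into another table, either query above can be wrapped in an INSERT ... SELECT so the copy happens entirely server-side. A minimal psycopg2 sketch, where source_table, target_table, and the column names are placeholder assumptions:

# Sketch only: copies the odd-numbered rows in a single server-side statement.
# 'conn' is assumed to be an open psycopg2 connection; all names are placeholders.
with conn.cursor() as cursor:
    cursor.execute('''INSERT INTO target_table (col1, col2, col3)
                      SELECT col1, col2, col3
                      FROM (SELECT row_number() OVER (ORDER BY id) AS rn,
                                   col1, col2, col3
                            FROM source_table) AS numbered
                      WHERE mod(rn, 2) = 1''')
conn.commit()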
I have a database table with 3 columns (A, B, C). I want to add some rows to the table, taking the input from the user with a 'textentrydialog' like this: https://pastebin.com/0JYm5x6e. The problem is that I want to insert multiple rows for multiple values of A, while the values of B and C stay the same. For example:
B = Ram
C = Aam
A = s,t,k
So the values in table should insert in this way:
(s,Ram,Aam)
(t,Ram,Aam)
(k,Ram,Aam)
Can someone please help me with how to insert these?
Here is a proposal that produces the output you have shown from the input you have shown.
Note that I assume you insist on this way of inputting the data, which implies using a single table.
If you can accept different input, I recommend using two tables:
one with (id, A, C) and one with (id, B), and then querying with a join using(id).
An MCVE for this is at the end of the answer. It contains some additional test cases, which I made up to demonstrate that it does more than reproduce the given output for the given input, trying to guess the obvious use cases.
Query:
select A, group_concat(B), C
from toy
group by A,C;
Output:
Mar|t,u|Aam
Ram|s,t,k|Aam
Ram|k,s,m|Maa
MCVE:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE toy (A varchar(10), B varchar(10), C varchar(10));
INSERT INTO toy VALUES('Ram','s','Aam');
INSERT INTO toy VALUES('Ram','t','Aam');
INSERT INTO toy VALUES('Ram','k','Aam');
INSERT INTO toy VALUES('Mar','t','Aam');
INSERT INTO toy VALUES('Mar','u','Aam');
INSERT INTO toy VALUES('Ram','k','Maa');
INSERT INTO toy VALUES('Ram','s','Maa');
INSERT INTO toy VALUES('Ram','m','Maa');
COMMIT;
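For the insert side itself, a minimal Python sketch of turning the user's input into individual rows, assuming the A values arrive as one comma-separated string and using the toy table from the MCVE:

# Sketch only: split the comma-separated A values and insert one row per value.
# 'conn' is assumed to be an open sqlite3 connection; the input values
# would really come from the textentrydialog.
a_input, b, c = "s,t,k", "Ram", "Aam"
rows = [(a.strip(), b, c) for a in a_input.split(",")]
cursor = conn.cursor()
cursor.executemany("INSERT INTO toy (A, B, C) VALUES (?, ?, ?)", rows)
conn.commit()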
I'm using Python to query a SQL database. I'm fairly new to databases. I've tried looking this up, but I can't find a similar enough question to get the right answer.
I have a table with multiple columns/rows. I want to find the MAX of a single column, I want ALL columns returned (the entire ROW), and I want only one instance of the MAX. Right now I'm getting ten ROWS returned, because the MAX is repeated ten times. I only want one ROW returned.
The query strings I've tried so far:
sql = 'select max(f) from cbar'
# this returns one ROW, but only a single COLUMN (a single value)
sql = 'select * from cbar where f = (select max(f) from cbar)'
# this returns all COLUMNS, but it also returns multiple ROWS
I've tried a bunch more, but they returned nothing; they weren't right somehow. That's the problem: I'm too new to find the middle ground between my two working query statements.
In SQLite 3.7.11 or later, you can just retrieve all columns together with the maximum value:
SELECT *, max(f) FROM cbar;
But the SQLite library bundled with your Python might be too old. In the general case, you can sort the table by that column and then just read the first row:
SELECT * FROM cbar ORDER BY f DESC LIMIT 1;
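Either way, Python then reads exactly one row with fetchone(). A minimal sqlite3 sketch, where the database file name is an assumption:

import sqlite3

# Sketch only: fetch the single row holding the maximum of column f.
conn = sqlite3.connect("example.db")  # assumed file name
cursor = conn.cursor()
cursor.execute("SELECT * FROM cbar ORDER BY f DESC LIMIT 1")
row = cursor.fetchone()  # the full row, or None if the table is empty
print(row)
conn.close()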