Using execute: 40 inserts per minute
Using executemany: 41 inserts per minute
Using extras.execute_values: 42 inserts per minute
import datetime
from typing import Any

import psycopg2
from psycopg2 import extras


def save_return_to_postgres(record_to_insert) -> Any:
    insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
                      VALUES %s returning id"""
    records = (record_to_insert[2], record_to_insert[1], record_to_insert[3],
               record_to_insert[4], record_to_insert[0], datetime.datetime.now())
    # df = df[["description","code","unit","price","created_date","updated_date"]]
    try:
        conn = psycopg2.connect(database='xxxx',
                                user='xxxx',
                                password='xxxxx',
                                host='xxxx',
                                port='xxxx',
                                connect_timeout=10)
        print("Connection Opened with Postgres")
        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        conn.commit()
        # print(record_to_insert)
    finally:
        if conn:
            cursor.close()
            conn.close()
            print("Connection to postgres was successfully closed")


valores = df.values
for valor in valores:
    save_return_to_postgres(valor)
    print(valor)
I don't know how many rows per INSERT Postgres can take, but many SQL-based databases can accept multiple inserts in a single statement. So instead of running

for insert_query in queries:
    sql_execute(insert_query)

try making several inserts at once in a single command (test it in pure SQL first to see if it works):

insert_list = []
for insert_query in queries:
    insert_list.append(insert_query)
sql_execute(insert_list)
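For illustration, here is a minimal sketch of building one multi-row INSERT in Python; the table and column names are placeholders, and cursor/conn are assumed to already exist as in the question:

rows = [(1, 'a'), (2, 'b'), (3, 'c')]                # placeholder data
placeholders = ", ".join(["(%s, %s)"] * len(rows))   # "(%s, %s), (%s, %s), (%s, %s)"
flat_params = [value for row in rows for value in row]
cursor.execute(
    "INSERT INTO my_table (id, name) VALUES " + placeholders,
    flat_params,
)
conn.commit()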
I had a similar issue and this link helped me:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-multiple-rows/
(Of course mine was not Postgres, but the idea is the same: cut down on network time by running multiple inserts in one command.)
We're in this together!
Use execute_batch or execute_values and apply them over the entire record set. As it stands, you are not using the batch capabilities of execute_values because you are inserting a single record at a time. You are further slowing things down by opening and closing a connection for each record, as that is a time- and resource-expensive operation. The code below is untested, as I don't have the actual data and am assuming what df.values is.
insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
VALUES %s returning id"""
#execute_batch query
#insert_query = """INSERT INTO pricing.xxxx #(description,code,unit,price,created_date,updated_date)
# VALUES (%s, %s, %s, %s, %s, %s) returning id"""
valores = df.values
#Create a list of lists to pass to query as a batch instead of singly.
records = [[record_to_insert[2],record_to_insert[1],record_to_insert[3],
record_to_insert[4],record_to_insert[0],datetime.datetime.now()]
for record_to_insert in valores]
try:
conn = psycopg2.connect(database = 'xxxx',
user = 'xxxx',
password = 'xxxxx',
host= 'xxxx',
port='xxxx',
connect_timeout = 10)
print("Connection Opened with Postgres")
cursor = conn.cursor()
extras.execute_values(cursor, insert_query, [records])
#execute_batch
#extras.execute_batch(cursor, insert_query, [records])
conn.commit()
# print(record_to_insert)
finally:
if conn:
cursor.close()
conn.close()
print("Connection to postgres was successfully closed")
For more information see Fast execution helpers. Note that both the execute_values and execute_batch functions have a page_size argument with a default value of 100. This is the batch size for the operations. For large data sets you can reduce the time further by increasing page_size to make bigger batches and reduce the number of server round trips.
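For example, a minimal sketch of passing a larger page_size (the value 1000 is only an illustrative choice; tune it to your data volume):

extras.execute_values(cursor, insert_query, records, page_size=1000)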
Using Python, I am looping through a CSV file to read data, then I do some modifications on the row that was read and call a save function to insert the modified data into MySQL.
import mysql.connector


def save(Id, modifiedData):
    try:
        mydb = mysql.connector.connect(host="localhost", user="use", password="pass", database="data")
        sql = "INSERT INTO data (Id, modifiedData) VALUES (%s, %s)"
        recordTuple = (Id, modifiedData)
        mycursor = mydb.cursor()
        mycursor.execute(sql, recordTuple)
        mydb.commit()
        print("Record inserted successfully into table")
    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))


def main():
    for row in csv:
        # modify row
        # create Id
        save(Id, modifiedData)
But I don't think it is a good solution to open a MySQL connection and insert data on each iteration; it will be time- and resource-consuming, especially when I move to a real server in production.
How can I improve my solution?
Ideally, connections should be managed by a connection pool and commits should be done in bulk. For the amount of data in a typical CSV it may not matter that much, but if you don't want to deal with this yourself, I recommend using an ORM such as SQLAlchemy.
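For illustration, a minimal sketch of a bulk insert with SQLAlchemy (1.4+ style) that commits once for all rows; the connection URL and the csv_rows variable are assumptions standing in for your own data:

from sqlalchemy import create_engine, text

engine = create_engine("mysql+mysqlconnector://use:pass@localhost/data")

# csv_rows is a hypothetical iterable of (Id, modifiedData) pairs built from the CSV.
rows = [{"Id": i, "modifiedData": d} for i, d in csv_rows]

with engine.begin() as conn:  # commits once when the block exits
    conn.execute(
        text("INSERT INTO data (Id, modifiedData) VALUES (:Id, :modifiedData)"),
        rows,  # passing a list of dicts runs the statement for every row
    )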
You only need to create the connection once, and that should be in function main, which then passes the connection to function save as follows:
def save(mydb, Id, modifiedData):
    try:
        sql = "INSERT INTO data (Id, modifiedData) VALUES (%s, %s)"
        recordTuple = (Id, modifiedData)
        mycursor = mydb.cursor()
        mycursor.execute(sql, recordTuple)
        mydb.commit()
        print("Record inserted successfully into table")
    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))


def main():
    try:
        mydb = mysql.connector.connect(host="localhost", user="use", password="pass", database="data")
    except mysql.connector.Error as error:
        print("Failed to create connection: {}".format(error))
        return
    for row in csv:
        # modify row
        # create Id
        save(mydb, Id, modifiedData)
For perhaps even greater performance you can try executemany:
def save(mydb, modified_records):
    try:
        sql = "INSERT INTO data (Id, modifiedData) VALUES (%s, %s)"
        mycursor = mydb.cursor()
        mycursor.executemany(sql, modified_records)
        mydb.commit()
        print("Records inserted successfully into table")
    except mysql.connector.Error as error:
        print("Failed to insert into MySQL table {}".format(error))


def main():
    try:
        mydb = mysql.connector.connect(host="localhost", user="use", password="pass", database="data")
    except mysql.connector.Error as error:
        print("Failed to create connection: {}".format(error))
        return
    modified_records = []
    for row in csv:
        # modify row
        # create Id
        modified_records.append([Id, modifiedData])
    save(mydb, modified_records)
I have several queries. Most of my INSERT queries work, and they follow the same format as this one (I even copy-pasted and modified as needed). For some reason, this query is throwing a syntax error and I'm not sure why.
I've looked at the various solutions on SO for resolving this error. Seems like people get it for a variety of different reasons, and I'm not sure exactly what the reason is for this error being thrown at me. I can't see the problem with the statement.
def set_prices(game_name, vendor_id, vendor_name, game_id, game_platform, vendor_store_link,
               current_price_at_vendor, previous_price_at_vendor, historical_low_at_vendor):
    # Create the query
    insert_price_data = """ INSERT INTO game_vendors
        (game_name, vendor_id, vendor_name, game_id, game_platform, vendor_store_link, current_price_at_vendor, previous_price_at_vendor, historical_low_at_vendor, historical_low_date)
        VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s); """
    vals = (game_name, vendor_id, vendor_name, game_id, game_platform, vendor_store_link,
            current_price_at_vendor, previous_price_at_vendor, historical_low_at_vendor,
            datetime.datetime.now().strftime("%Y/%m/%d").replace('/', '-'))
    send_to_db(insert_price_data, vals)


set_prices("borderlands 3", "1", "xbox", "26", "xbox one", "xbox.com", "45", "45", "25")
I wasn't expecting to receive this error, as it had not occurred with any of my previous SQL statements.
"1064 (42000): You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to
use near '%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)' at line 4"
I bet it is something minor I'm overlooking.
Update: the send_to_db() function:
def send_to_db(query, multi_result=False, vals=None):
    # This function opens a database connection for a query.
    # Build engine for database
    db_engine = mysql.connector.connect(
        host="localhost",
        user="root",
        passwd="",
        database="mydb",
    )
    # Initiate the cursor
    cursor = db_engine.cursor()
    # row = None
    # rows = None
    try:
        if vals:
            # insert queries require vals
            cursor.execute(query, vals)
            db_engine.commit()
        else:
            # select queries which don't require vals
            cursor.execute(query)
            if multi_result:
                rows = cursor.fetchall()
                return rows
            else:
                row = cursor.fetchone()
                return row
    except mysql.connector.Error as error:
        print(error)
    finally:
        if db_engine.is_connected():
            cursor.close()
            db_engine.close()
            print("MySQL connection is closed")
            db_engine.disconnect()
I query a table, then loop through the results to update another table. The console prints show the correct data. I'm not sure how to debug the cursor.execute for the UPDATE query: it is not updating the table. It's not a permission issue; if I run the UPDATE command in my SQL workbench it works fine.
cursor = conn.cursor()
cursor.execute("Select Account_Name FROM dsf_CS_WebAppView")
for row in cursor.fetchall():
    try:
        cursor.execute("Select fullpath FROM customerdesignmap WHERE fullpath LIKE '%{}%'".format(row.Account_Name))
        rows = cursor.fetchall()
        print(len(cursor.fetchall()))
        if len(rows) > 0:
            for rowb in rows:
                print(rowb.fullpath)
                print(row.Account_Name)
                if len(row.Account_Name) > 2:
                    cursor.execute("UPDATE customerdesignmap SET householdname = {}, msid = {} WHERE fullpath LIKE '{}'".format(row.Account_Name, row.UniqueProjectNumber, rowb.fullpath))
                    conn.commit()
    except:
        pass
Consider a pure SQL solution as SQL Server supports UPDATE and JOIN across multiple tables. This avoids the nested loops, cursor calls, and string formatting of SQL commands.
UPDATE m
SET m.householdname = v.Account_Name,
    m.msid = v.UniqueProjectNumber
FROM customerdesignmap m
JOIN dsf_CS_WebAppView v
    ON m.fullpath LIKE CONCAT('%', v.Account_Name, '%')
In Python, run the above in a single cursor.execute() followed by a commit() call:
cursor.execute('''my SQL Query''')
conn.commit()
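Putting the pieces together, a minimal sketch (assuming conn is the existing database connection from the question):

update_sql = """
    UPDATE m
    SET m.householdname = v.Account_Name,
        m.msid = v.UniqueProjectNumber
    FROM customerdesignmap m
    JOIN dsf_CS_WebAppView v
        ON m.fullpath LIKE CONCAT('%', v.Account_Name, '%')
"""
cursor = conn.cursor()
cursor.execute(update_sql)  # one set-based statement replaces the nested loops
conn.commit()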
I am trying to execute the following script, but I get neither the desired results nor an error message, and I can't figure out where I'm going wrong.
import pyodbc
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=mySRVERNAME;"
                      "Database=MYDB;"
                      "uid=sa;pwd=MYPWD;"
                      "Trusted_Connection=yes;")
cursor = cnxn.cursor()
cursor.execute('select DISTINCT firstname,lastname,coalesce(middlename,\' \') as middlename from Person.Person')

for row in cursor:
    print('row = %r' % (row,))
Any ideas? Any help is appreciated :)
You have to use a fetch method along with the cursor. For example:

for row in cursor.fetchall():
    print('row = %r' % (row,))
EDIT:
The fetchall function returns all remaining rows in a list. If there are no rows, an empty list is returned. If there are a lot of rows, this will use a lot of memory. Unread rows are stored by the database driver in a compact format and are often sent in batches from the database server. Reading in only the rows you need at one time will save a lot of memory, for example with cursor.fetchmany().
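A minimal sketch of reading rows in fixed-size chunks with fetchmany() (the chunk size of 500 and the query are only illustrative):

cursor.execute("select bla, anotherbla from blabla")
while True:
    chunk = cursor.fetchmany(500)  # fetch at most 500 rows per call
    if not chunk:
        break                      # no rows left
    for row in chunk:
        print(row.bla, row.anotherbla)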
If we are going to process the rows one at a time, we can instead use the cursor itself as an iterator. Moreover, we can simplify the code, since cursor.execute() always returns the cursor:
for row in cursor.execute("select bla, anotherbla from blabla"):
    print(row.bla, row.anotherbla)
Documentation
I found this approach useful for retrieving data from a SQL database into Python as a data frame.
import pandas as pd
import pymssql

con = pymssql.connect(server='use-et-aiml-cloudforte-aiops-db.database.windows.net',
                      user='login_username',
                      password='login_password',
                      database='database_name')
query = "SELECT * FROM <TABLE_NAME>"
# pd.read_sql executes the query itself, so a separate cursor.execute() is not needed.
df = pd.read_sql(query, con)
con.close()
df
import mysql.connector as mc
import pandas as pd

# connection creation
conn = mc.connect(host='localhost', user='root', passwd='password')
print(conn)

# create cursor object
cur = conn.cursor()
print(cur)

cur.execute('show databases')
for i in cur:
    print(i)

query = "Select * from employee_performance.employ_mod_recent"
emp_data = pd.read_sql(query, conn)
emp_data
I tried to fill a table in a database using MySQLdb. It did not give any errors, and once gave the warning
main.py:23: Warning: Data truncated for column 'other_id' at row 1
cur.execute("INSERT INTO map VALUES(%s,%s)",(str(info[0]).replace('\n',''), str(info[2].replace('\n','').replace("'",""))))
so I thought it was working fine. However, when it finished and I did a row count, it turned out that nothing had been added. Why was the data not added to the database? The code is below.
import MySQLdb


def fillDatabase():
    db = MySQLdb.connect(host="127.0.0.1",
                         user="root",
                         passwd="",
                         db="uniprot_map")
    cur = db.cursor()
    conversion_file = open('idmapping.dat')
    for line in conversion_file:
        info = line.split('\t')
        cur.execute("INSERT INTO map VALUES(%s,%s)",
                    (str(info[0]).replace('\n', ''), str(info[2].replace('\n', '').replace("'", ""))))


def test():
    db = MySQLdb.connect(host="127.0.0.1",
                         user="root",
                         passwd="",
                         db="uniprot_map")
    cur = db.cursor()
    cur.execute("SELECT COUNT(*) FROM map")
    rows = cur.fetchall()
    for row in rows:
        print(row)


def main():
    fillDatabase()
    test()
You need to do a db.commit() after adding all of the entries. Even if the update is not transactional, the DBAPI imposes an implicit transaction on every change.
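For illustration, a minimal sketch of the change inside fillDatabase(); only the final commit line is new:

    for line in conversion_file:
        info = line.split('\t')
        cur.execute("INSERT INTO map VALUES(%s,%s)",
                    (str(info[0]).replace('\n', ''), str(info[2].replace('\n', '').replace("'", ""))))
    db.commit()  # without this, the inserts are discarded when the connection is closed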