I am grabbing JSON data from a messaging bus and dumping that JSON into a database. I had this working pretty well with psycopg2, doing ~3000 entries/sec into Postgres. For a number of reasons we've since moved to SQL Server 2016, and my entries dropped to around 100 per second.
I've got a function called insert_into() that inserts the JSON into the database. All I've really done to my insert_into() function is change the library to pyodbc and the connection string. It seems that my slowdown comes from setting up and then tearing down the connection each time the function is called ('conn' in the code below). If I move the line that sets up the connection outside of my insert_into() function, my speed comes back. I was just wondering two things:
What's the proper way to set up connections like this from a SQL Server perspective?
Is this even the best way to do this in Postgres?
For SQL Server, the server is 2016, using ODBC driver 17, SQL authentication.
Slow for SQL Server:
def insert_into():
    conn = None
    try:
        conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=server1;DATABASE=json;UID=user;PWD=pass')
        cur = conn.cursor()
        for i in buffer_list:
            command = 'INSERT INTO jsonTable (data) VALUES (%s)' % ("'" + i + "'")
            cur.execute(command)
        cur.close()
        conn.commit()
    except (Exception, pyodbc.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
Fast for SQL Server:
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=server1;DATABASE=json;UID=user;PWD=pass')

def insert_into():
    #conn = None
    try:
        cur = conn.cursor()
        for i in buffer_list:
            command = 'INSERT INTO jsonTable (data) VALUES (%s)' % ("'" + i + "'")
            cur.execute(command)
        cur.close()
        conn.commit()
    except (Exception, pyodbc.DatabaseError) as error:
        print(error)
This daemon runs 24/7 and any advice on setting up a fast connection to MSSQL will be greatly appreciated.
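For what it's worth, here is a minimal sketch of one common pattern, assuming pyodbc 4.0.19+ (for fast_executemany) and the jsonTable/buffer_list names from the question: keep one long-lived connection for the daemon, use a parameter placeholder instead of % formatting, and push each batch with executemany. Reconnect/retry handling is deliberately omitted.

import pyodbc

# One long-lived connection for the 24/7 daemon (reconnect logic omitted).
conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=server1;DATABASE=json;UID=user;PWD=pass')

def insert_into(buffer_list):
    try:
        cur = conn.cursor()
        cur.fast_executemany = True  # send the whole batch in bulk instead of row by row
        cur.executemany('INSERT INTO jsonTable (data) VALUES (?)',  # ? placeholder, no manual quoting
                        [(item,) for item in buffer_list])
        conn.commit()
        cur.close()
    except pyodbc.DatabaseError as error:
        conn.rollback()
        print(error)

Parameterizing also avoids the quoting problems (and SQL injection risk) of building the INSERT by concatenating the JSON string into the command.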
I am reading from a Microsoft SQL Server instance. I need to read all data from a table that is quite big (~4 million records), so I'd like to do it in chunks to limit the memory usage of my Python program.
This normally works fine, but now I need to move where this runs, which forces it to go over a not-so-stable connection (I believe the VPN sometimes throttles it). So occasionally I get a connection error in one of the chunks:
sqlalchemy.exc.OperationalError: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLGetData)')
The code I run comes down to this:
import pandas as pd
from sqlalchemy import create_engine

connection_string = 'mssql+pyodbc://DB_USER:DB_PASSWORD@DB_HOST/DB_NAME?trusted_connection=no&driver=ODBC+Driver+17+for+SQL+Server'
db = create_engine(connection_string, pool_pre_ping=True)

query = 'SELECT * FROM table'
for chunk in pd.read_sql_query(query, db, chunksize=500_000):
    ...  # do stuff with chunk
What I would like to know: is it possible to add a retry mechanism that can continue with the correct chunk if the connection fails? I've tried a few options, but none of them seem to be able to recover and continue at the same chunk.
query = 'SELECT * FROM table'
is bad practice: always filter by the fields you need and process in chunks (say, 500 records at a time); a sketch follows the TOP syntax below.
https://www.w3schools.com/sql/sql_top.asp
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;
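To illustrate that advice, a rough sketch, reusing connection_string from the question and assuming the table has an indexed key column (called id here; col_a and col_b are placeholder column names) and a reasonably recent pandas/SQLAlchemy: read fixed-size chunks by key range, so a chunk that dies mid-transfer can simply be requested again.

import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine(connection_string, pool_pre_ping=True)

chunk_size = 500
last_id = 0  # resume point; persist it somewhere if the whole process may restart

while True:
    # Keyset pagination: every chunk is an independent, repeatable query.
    query = text(
        "SELECT TOP (:n) id, col_a, col_b "  # select only the columns you need
        "FROM table WHERE id > :last ORDER BY id"
    )
    chunk = pd.read_sql_query(query, engine, params={"n": chunk_size, "last": last_id})
    if chunk.empty:
        break
    # ... do stuff with chunk ...
    last_id = int(chunk["id"].max())

Because every chunk is an independent query, the read_sql_query call can be wrapped in a retry loop (like the one in the next answer) without re-streaming everything that came before it.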
I feel your pain. My VPN is the same. I'm not sure if this is a viable solution for you, but you can try this technique.
retry_flag = True
retry_count = 0
cursor = cnxn.cursor()
while retry_flag and retry_count < 5:
    try:
        cursor.execute('SELECT too_id FROM [TTMM].[dbo].[Machines] WHERE MachineID = {}'.format(machineid))
        too_id = cursor.fetchone()[0]
        cursor.execute('INSERT INTO [TTMM].[dbo].[{}](counter, effectively, too_id) VALUES ({},{},{})'.format(machineid, counter, effectively, too_id))
        retry_flag = False
        print("Printed To DB - Counter = ", counter, ", Effectively = ", effectively, ", too_id = ", too_id)
    except Exception as e:
        print(e)
        print("Retry after 5 sec")
        retry_count = retry_count + 1
        cursor.close()
        cnxn.close()
        time.sleep(5)
        cnxn = pyodbc.connect('DRIVER=FreeTDS;SERVER=*;PORT=*;DATABASE=*;UID=*;PWD=*;TDS_Version=8.7;', autocommit=True)
        cursor = cnxn.cursor()
cursor.close()
I am writing a Python script that will read data from a SQL Server database. For this I have used pyodbc to connect to SQL Server on Windows (my driver is ODBC Driver 17 for SQL Server).
My script works fine, but I need to use a connection pool instead of a single connection to manage resources more effectively. However, the pyodbc documentation only mentions pooling without giving examples of how connection pooling can be implemented. Any ideas on how this can be done in Python while connecting to SQL Server? I only found solutions for PostgreSQL using psycopg2, which obviously do not work for me.
At the moment my code looks like this:
def get_limited_rows(size):
    try:
        server = 'here-is-IP-address-of-server'
        database = 'here-is-my-db-name'
        username = 'here-is-my-username'
        password = 'here-is-my-password'
        conn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
        cursor = conn.cursor()
        print('Connected to database')
        select_query = 'select APPN, APPD from MAIN'
        cursor.execute(select_query)
        while True:
            records = cursor.fetchmany(size)
            if not records:
                cursor.close()
                sys.exit("Completed")
            else:
                for record in records:
                    print(record)
                time.sleep(10)
    except pyodbc.Error as error:
        print('Error reading data from table', error)
    finally:
        if (conn):
            conn.close()
            print('Data base connection closed')
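pyodbc itself only exposes the ODBC driver manager's pooling through the module-level pyodbc.pooling flag (it has to be set before the first connection is made); it does not ship an application-level pool object. Below is a minimal, illustrative sketch of a do-it-yourself pool built on the standard library's queue.Queue, reusing the connection details from the question; SQLAlchemy's QueuePool is a more mature alternative.

import queue
import pyodbc

CONN_STR = ('DRIVER={ODBC Driver 17 for SQL Server};'
            'SERVER=here-is-IP-address-of-server;DATABASE=here-is-my-db-name;'
            'UID=here-is-my-username;PWD=here-is-my-password')

class SimplePool:
    """A tiny fixed-size connection pool (illustrative, not production-hardened)."""

    def __init__(self, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(pyodbc.connect(CONN_STR))

    def get(self):
        return self._pool.get()      # blocks until a connection is free

    def put(self, conn):
        self._pool.put(conn)

pool = SimplePool(size=5)

def get_limited_rows(size):
    conn = pool.get()
    try:
        cursor = conn.cursor()
        cursor.execute('select APPN, APPD from MAIN')
        while True:
            records = cursor.fetchmany(size)
            if not records:
                break
            for record in records:
                print(record)
        cursor.close()
    finally:
        pool.put(conn)               # always hand the connection back to the pool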
I'm running Ubuntu 16.04 with MySQL. I've opened the MySQL server for remote connections, and my remote Python script can query my database, but all attempts to INSERT fail without any error log entry.
It also looks like my remote INSERTs are reaching the server, because the AUTO_INCREMENT ID keeps increasing even though no rows appear when I run the Python INSERT code.
Any insight is appreciated!
Simple table schema:
CREATE TABLE test (
    ID int NOT NULL AUTO_INCREMENT,
    x INT,
    PRIMARY KEY (ID)
);
This works directly on the server:
INSERT INTO test (x) VALUES (10);
This is the Python query that's working:
try:
    connection = db.Connection(host=HOST, port=PORT, user=USER, passwd=PASSWORD, db=DB)
    cursor = connection.cursor()
    print("Connected to Server")
    cursor.execute("SELECT * FROM test")
    result = cursor.fetchall()
    for item in result:
        print(item)
except Exception as e:
    print('exception connecting to server db: ' + str(e))
finally:
    print('closing connection...')
    connection.close()
And the Python INSERT that's not working:
try:
    connection = db.Connection(host=HOST, port=PORT, user=USER, passwd=PASSWORD, db=DB)
    cursor = connection.cursor()
    print("Connected to Server")
    cursor.execute("INSERT INTO test (x) VALUES (10);")
except Exception as e:
    print('exception connecting to server db: ' + str(e))
finally:
    print('closing connection...')
    connection.close()
Thanks
Add this line after the execute() call:
cursor.execute("INSERT INTO test (x) VALUES (10)")
connection.commit()
When making changes to the database you must commit them; otherwise the changes will not take effect (which is why the AUTO_INCREMENT counter advances but no rows appear).
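If you would rather not call commit() after every statement, MySQLdb connections can also be put into autocommit mode. A small sketch, reusing the connection variables from the question:

connection = db.Connection(host=HOST, port=PORT, user=USER, passwd=PASSWORD, db=DB)
connection.autocommit(True)   # every execute() is committed immediately

cursor = connection.cursor()
cursor.execute("INSERT INTO test (x) VALUES (10)")
connection.close()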
I am getting the error below:

query = command % processed_params
TypeError: not all arguments converted during string formatting

I am trying to pull data from SQL Server and then insert it into Snowflake. My code is below:
import pyodbc
import sqlalchemy
import snowflake.connector
driver = 'SQL Server'
server = 'tanmay'
db1 = 'testing'
tcon = 'no'
uname = 'sa'
pword = '123'
cnxn = pyodbc.connect(driver='{SQL Server}',
                      host=server, database=db1, trusted_connection=tcon,
                      user=uname, password=pword)
cursor = cnxn.cursor()
cursor.execute("select * from Admin_tbldbbackupdetails")
rows = cursor.fetchall()
#for row in rows:
# #data = [(row[0], row[1],row[2], row[3],row[4], row[5],row[6], row[7])]
print (rows[0])
cnxn.commit()
cnxn.close()
connection = snowflake.connector.connect(user='****',password='****',account='*****')
cursor2 = connection.cursor()
cursor2.execute("USE WAREHOUSE FOOD_WH")
cursor2.execute("USE DATABASE Test")
sql1="INSERT INTO CN_RND.Admin_tbldbbackupdetails_ip"
"(id,dbname, dbpath, backupdate, backuptime, backupStatus, FaildMsg, Backupsource)"
"values (?,?,?,?,?,?,?,?)"
cursor2.execute(sql1,*rows[0])
It's obviously a string formatting error.
You're missing a parameter for a %s placeholder somewhere.
If you cannot fix it, step back and try another approach.
Use another script to achieve the same thing and come back to your bug tomorrow :-)
My script is doing pretty much the same:
1. Connect to SQL Server
-> fetchmany
-> multipart upload to s3
-> COPY INTO Snowflake table
Details are here: Snowpipe-for-SQLServer
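For completeness, a sketch of what the corrected INSERT might look like, keeping the question's table and column names: the statement has to be built as a single string (the three adjacent string literals in the question are separate expressions and never get joined into sql1), and the Snowflake connector's default paramstyle is pyformat, so %s placeholders are used rather than pyodbc-style ?.

# Build the statement as one string with %s placeholders.
sql1 = ("INSERT INTO CN_RND.Admin_tbldbbackupdetails_ip "
        "(id, dbname, dbpath, backupdate, backuptime, backupStatus, FaildMsg, Backupsource) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

cursor2.execute(sql1, tuple(rows[0]))               # insert a single row
# or load everything fetched from SQL Server in one call:
cursor2.executemany(sql1, [tuple(r) for r in rows])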
Is it okay to use a single MySQLdb connection for multiple transactions without closing the connection between them? In other words, something like this:
conn = MySQLdb.connect(host="1.2.3.4", port=1234, user="root", passwd="x", db="test")
for i in range(10):
    try:
        cur = conn.cursor()
        query = "DELETE FROM SomeTable WHERE ID = %d" % i
        cur.execute(query)
        cur.close()
        conn.commit()
    except Exception:
        conn.rollback()
conn.close()
It seems to work okay, but I just wanted to double check.
I think there is a misunderstanding about what constitutes a transaction here.
Your example opens one connection and then runs ten transactions on it, one per loop iteration: each iteration executes a single DELETE statement and commits it, and the connection is only closed once at the end. Of course that's more than fine.
A version in which each transaction groups several statements (rather than committing after every single one) looks like this:
conn = MySQLdb.connect(host="1.2.3.4", port=1234, user="root", passwd="x", db="test")
for j in range(10):
    try:
        for i in range(10):
            cur = conn.cursor()
            query = "DELETE FROM SomeTable WHERE ID = %d" % i
            cur.execute(query)
            cur.close()
        conn.commit()
    except Exception:
        conn.rollback()
conn.close()
The above code commits 10 transactions, each consisting of 10 individual delete statements.
And yes, you should be able to re-use the open connection for that without problems, as long as you don't share that connection between threads.
For example, SQLAlchemy re-uses connections by pooling them, handing out open connections as needed to the application. New transactions and new statements are executed on these connections throughout the lifetime of an application, without needing to be closed until the application is shut down.
It would be better to first build a query string and then execute that single MySQL statement. For example:
query = "DELETE FROM table_name WHERE id IN ("
for i in range(10):
query = query + "'" + str(i) + "', "
query = query[:-2] + ')'
cur = conn.cursor()
cur.execute(query)
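A parameterized variant of the same idea, letting MySQLdb quote the values instead of concatenating them into the SQL string (a sketch, assuming the same table_name and id column):

ids = list(range(10))
placeholders = ", ".join(["%s"] * len(ids))                  # one %s per id
query = "DELETE FROM table_name WHERE id IN (%s)" % placeholders

cur = conn.cursor()
cur.execute(query, ids)    # MySQLdb escapes and quotes each value
conn.commit()
cur.close()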