I'm using a few apps running Tornado Web server which all connect to a MySql DB using mysqldb. When I spin up the server, it instantiates a DB class (below) which opens a connection to the DB. All transactions are made using this same connection - which I'm not sure is a good idea.
class RDSdb(object):
def __init__(self):
self.connect()
def connect(self):
self.connection = MySQLdb.connect(cursorclass = MySQLdb.cursors.SSDictCursor, host=self.RDS_HOST,
user=self.RDS_USER, passwd=self.RDS_PASS, db=self.RDS_DB)
def get_cursor(self):
try:
cursor = self.connection.cursor()
except (AttributeError, MySQLdb.OperationalError):
self.connect()
cursor = self.connection.cursor()
return cursor
def fetch_by_query(self, query):
cursor = self.get_cursor()
cursor.execute(query)
result = cursor.fetchall()
cursor.close()
return result
I'm pretty sure I shouldn't open/close a new connection for every transaction, but then, when should I?
I noticed something else that's a bit off, which I'm certain is related : when I need to update one of my db table's schema (ex : alter table), the whole table in question gets locked and unresponsive - until I kill my 3 apps with open connections to the DB - I realize that one of those connections was holding up this update.
Best practices when it comes to this? Ideas?
thanks.
Related
Let me start off by saying I am extremely new to Python and Postgresql so I feel like I'm in way over my head. My end goal is to get connected to the dvdrental database in postgresql and be able to access/manipulate the data. So far I have:
created a .config folder and a database.ini is within there with my login credentials.
in my src i have a config.py folder and use config parser, see below:
def config(filename='.config/database.ini', section='postgresql'):
# create a parser
parser = ConfigParser()
# read config file
parser.read(filename)
# get section, default to postgresql
db = {}
if parser.has_section(section):
params = parser.items(section)
for param in params:
db[param[0]] = param[1]
else:
raise Exception('Section {0} not found in the {1} file'.format(section, filename))
return db
then also in my src I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg
def connect():
""" Connect to the PostgreSQL database server """
conn = None
try:
# read connection parameters
params = config()
# connect to the PostgreSQL server
print('Connecting to the PostgreSQL database...')
conn = psycopg.connect(**params)
# create a cursor
cur = conn.cursor()
# execute a statement
print('PostgreSQL database version:')
cur.execute('SELECT version()')
# display the PostgreSQL database server version
db_version = cur.fetchone()
print(db_version)
# close the communication with the PostgreSQL
cur.close()
except (Exception, psycopg.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
print('Database connection closed.')
if __name__ == '__main__':
connect()
Now this runs and prints out the Postgresql database version which is all well & great but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor?
I need the connect function to basically just connect to the dvdrental database and create a cursor so that I can then use my connection to select from the database in other needed "tasks" -- for example I'd like to be able to create another function like the below:
def select_from_table(cursor, table_name, schema):
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
return results
but I'm struggling with how to just create a connection to the dvdrental database & a cursor so that I'm able to actually fetch data and create pandas tables with it and whatnot.
so it would be like
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from 2 into a pandas df
thanks so much for any help!! This is for a project in a class I am taking and I am extremely overwhelmed and have been googling-researching non-stop and seemingly end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming but you're on the right track.
Just copy these three lines from connect into the select_from_table method
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (also added conn.close() at the end):
def select_from_table(cursor, table_name, schema):
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
conn.close()
return results
I'm using psycopg2 library to connection to my postgresql database.
Every time I want to execute any query, I make a make a new connection like this:
import psycopg2
def run_query(query):
with psycopg2.connect("dbname=test user=postgres") as connection:
cursor = connection.cursor()
cursor.execute(query)
cursor.close()
But I think it's faster to make one connection for whole app execution like this:
import psycopg2
connection = psycopg2.connect("dbname=test user=postgres")
def run_query(query):
cursor = connection.cursor()
cursor.execute(query)
cursor.close()
So which is better way to connect my database during all execution time on my app?
I've tried both ways and both worked, but I want to know which is better and why.
You should strongly consider using a connection pool, as other answers have suggested, this will be less costly than creating a connection every time you query, as well as deal with workloads that one connection alone couldn't deal with.
Create a file called something like mydb.py, and include the following:
import psycopg2
import psycopg2.pool
from contextlib import contextmanager
dbpool = psycopg2.pool.ThreadedConnectionPool(host=<<YourHost>>,
port=<<YourPort>>,
dbname=<<YourDB>>,
user=<<YourUser>>,
password=<<YourPassword>>,
)
#contextmanager
def db_cursor():
conn = dbpool.getconn()
try:
with conn.cursor() as cur:
yield cur
conn.commit()
"""
You can have multiple exception types here.
For example, if you wanted to specifically check for the
23503 "FOREIGN KEY VIOLATION" error type, you could do:
except psycopg2.Error as e:
conn.rollback()
if e.pgcode = '23503':
raise KeyError(e.diag.message_primary)
else
raise Exception(e.pgcode)
"""
except:
conn.rollback()
raise
finally:
dbpool.putconn(conn)
This will allow you run queries as so:
import mydb
def myfunction():
with mydb.db_cursor() as cur:
cur.execute("""Select * from blahblahblah...""")
Both ways are bad. The fist one is particularly bad, because opening a database connection is quite expensive. The second is bad, because you will end up with a single connection (which is too few) one connection per process or thread (which is usually too many).
Use a connection pool.
I have written a Python Tool with an wxPython GUI which has mainly the task to get a lot of user input regarding Customer Data, Product Data and so on and save it to a SQL Database, at the moment locally with a SQLite3 Database for testing an now switching to MS Azure to have anybody work in the same Database.
As i now plan to use a MS Azure SQL DB i have a few questions an i am hoping this is the right place to ask:
What is the best library to connect to Azure via Python? I found
pyodbc and pymssql but i think both need to have an extra driver
installed? Is this true and is this a problem in real usecases?
I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my Database. I have no module which is like a SQL Master which handels some overhead.
So my code looks like this most of the time:
import wx
import sqlite3
SQL_PATH = "Database_Test.db"
class ManageCustomerToDB(wx.Dialog):
def __init__(self, *args, **kw):
super(ManageCustomerToDB, self).__init__(*args, **kw)
def InitUI(self):
#[GUI an so on...]
# I do this on time inside a module:
conn = sqlite3.connect(SQL_PATH)
self.c = conn.cursor()
# Use functions like the ones below...
def GetCustomerData(self):
self.c.execute("SELECT * FROM Customer WHERE CustomerID = ?", (self.tc_customer_id.GetValue(),))
customer_data = self.c.fetchall()
# Do something with Customer Data
def GetPersonData(self):
self.c.execute("SELECT * FROM Person WHERE PersonID = ?", (self.tc_person_id.GetValue(),))
person_data = self.c.fetchall()
# Do something with Person Data
I hope this example shows what i do. Are there any bigger mistakes i do?
After a read in SQL I dont have to close the DB in any way?
Thanks for your help and let me know if i can improve my question or give more details.
It is not a good idea to create a new connection to Azure SQL every time you CRUD. This is a waste of resources, and when the number of accesses reaches a certain number, it will have a large impact on the performance of mssql.
I suggest you use database connection pool. The pool manager will initial several connections to SQL Server instance, and then reuse these connections when requested.
There is an existing package which you can take advantage of. It is DBUtils. You can use the PoolDB from it with pyodbc together.
A sample for showing how database connection pool works:
import pyodbc
from DBUtils.PooledDB import PooledDB
class Database:
def __init__(self, server, driver, port, database, username, password):
self.server = server
self.driver = driver
self.port = port
self.database = database
self.username = username
self.password = password
self._CreatePool()
def _CreatePool(self):
self.Pool = PooledDB(creator=pyodbc, mincached=2, maxcached=5, maxshared=3, maxconnections=6, blocking=True, DRIVER=self.driver, SERVER=self.server, PORT=self.port, DATABASE=self.database, UID=self.username, PWD=self.password)
def _Getconnect(self):
self.conn = self.Pool.connection()
cur = self.conn.cursor()
if not cur:
raise "connection error"
else:
return cur
# query sql
def ExecQuery(self, sql):
cur = self._Getconnect()
cur.execute(sql)
relist = cur.fetchall()
cur.close()
self.conn.close()
return relist
# non-query sql
def ExecNoQuery(self, sql):
cur = self._Getconnect()
cur.execute(sql)
self.conn.commit()
cur.close()
self.conn.close()
def main():
server = 'jackdemo.database.windows.net'
database = 'jackdemo'
username = 'jack'
port=1433
password = '*********'
driver= '{ODBC Driver 17 for SQL Server}'
ms = Database(server=server, driver=driver, port=port, database=database, username=username, password=password)
resList = ms.ExecQuery("select * from Users")
print(resList)
if __name__ == '__main__':
main()
Answers to your questions:
Q1: What is the best library to connect to Azure via Python? I found pyodbc and pymssql but i think both need to have an extra driver installed? Is this true and is this a problem in real usecases?
Answer: Both of then would be OK. ODBC stands for Open Database Connectivity, so it could be used to connect many databases. I see the Microsoft tutorial uses pyodbc, so maybe it is a better choice.
Q2: I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my Database. I have no module which is like a SQL Master which handels some overhead.
Answer: Use database connection pool.
Q3: After a read in SQL I dont have to close the DB in any way?
Answer: If you use database connection pool, the connection will be put back too pool after you call close() method.
I have written a function for connecting to a database using pymysql. Here is my code:
def SQLreadrep(sql):
connection=pymysql.connect(host=############,
user=#######,
password=########,
db=#########)
with connection.cursor() as cursor:
cursor.execute(sql)
rows=cursor.fetchall()
connection.commit()
connection.close()
return rows
I pass the SQL into this function and return the rows. However, I am doing quick queries to the database. (Something like "SELECT sku WHERE object='2J4423K').
What is a way to avoid so many connections?
Should I be avoiding this many connections to begin with?
Could I crash a server using this many connections and queries?
Let me answer your last question first. Your function is acquiring a connection but it is closing it prior to returning. So, I see no reason why unless your were multithreading or multiprocessing you would ever be using more than one connection at a time and you should not be crashing the server.
The way to avoid the overhead of creating and closing so many connections would be to "cache" the connection. One way to do that would be to replace your function by a class:
import pymysql
class DB(object):
def __init__(self, datasource, db_user, db_password):
self.conn = pymysql.connect(db=datasource, user=db_user, password=db_password)
def __del__(self):
self.conn.close()
def query(self, sql):
with self.conn.cursor() as cursor:
cursor.execute(sql)
self.conn.commit()
return cursor.fetchall()
Then you instantiate an instance of the DB class and invoke its query method. When the DB instance is grabage collected, the connection will be automatically closed.
I'm trying to run a Python script which leaves a connection open permanently, and responds to changes made outside of the script.
So for example:
Data script: Accepts form posts and commits form data to the database
Worker script: Monitors the database for new form posts and takes action accordingly
The relevant code in the worker script is:
import pymysql
conn = pymysql.connect(host='127.0.0.1', port=3306, user='dbuser', passwd='dbpass', db='my_db')
def processForms(Formdat):
c = conn.cursor(pymysql.cursors.DictCursor)
myform.sendEmail(c)
conn.commit()
c.close()
def doForms():
while True:
... get data and store in 'myforms' ...
futures = [executor.submit(processForms, myform) for myform in myforms]
time.sleep(30)
doForms()
Now I don't understand why this is not picking up new forms... If I create a new connection in each iteration of doForms(), the new forms are picked up, but I don't want to be creating and destroying connections all the time.
For example, this modification works:
conn = None
def doForms():
while True:
global conn
conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='', db='mw_py')
... get data and store in 'myforms' ...
futures = [executor.submit(processForms, myform) for myform in myforms]
conn.close()
time.sleep(30)
Is there a way for me to use the open connection and have it poll the latest data?
Open 1 connection in the beginning of your script. It is not a cheap operation to connect.
Remember ID of last row fetched.
On every iteration select rows with ID greater than last seen.