I have started to learn how to use psycopg2 together with Python. What I do is that I have quite a few scripts, and together they can open up to 150 connections; as we know, we cannot have more than 100 connections open at the same time. What I figured out is that whenever I want to do a database query/execution, I connect to the database, do the execution, and then close the connection. However, I believe that opening and closing new connections is very expensive and that connections should be longer-lived.
I have done something like this:
import psycopg2
import psycopg2.extras

DATABASE_CONNECTION = {
    "host": "TEST",
    "database": "TEST",
    "user": "TEST",
    "password": "TEST"
}
def get_all_links(store):
    """
    Get all links from given store
    :param store:
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_query = "SELECT id, link FROM public.store_items WHERE store = %s AND visible = %s;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        data_tuple = (store, "yes")
        cursor.execute(sql_query, data_tuple)
        test_data = [{"id": links["id"], "link": links["link"]} for links in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        conn.close()  # close the connection on error as well
        return 1
def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_query = "SELECT store FROM public.store_config;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        cursor.execute(sql_query)
        test_data = [stores["store"] for stores in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        conn.close()  # close the connection on error as well
        return 1
I wonder how I can make this as efficient as possible, so that I can have a lot of scripts connected to the database while still not hitting the max_connections limit?
I forgot to add that the way I'm connecting is that I have multiple scripts, e.g.:
test1.py
test2.py
test3.py
....
....
Every script runs by itself, and they all import database.py, which contains the code I showed before.
UPDATE:
import psycopg2
import psycopg2.extras
from psycopg2 import pool

threaded_postgreSQL_pool = pool.ThreadedConnectionPool(1, 2,
                                                       user="test",
                                                       password="test",
                                                       host="test",
                                                       database="test")
if threaded_postgreSQL_pool:
    print("Connection pool created successfully using ThreadedConnectionPool")

def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    # Use getconn() to get a connection from the connection pool
    ps_connection = threaded_postgreSQL_pool.getconn()
    sql_query = "SELECT store FROM public.store_config;"
    ps_cursor = ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        ps_cursor.execute(sql_query)
        test_data = [stores["store"] for stores in ps_cursor]
        ps_cursor.close()
        threaded_postgreSQL_pool.putconn(ps_connection)
        print("Put away a PostgreSQL connection")
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        ps_cursor.close()
        ps_connection.rollback()
        threaded_postgreSQL_pool.putconn(ps_connection)  # return the connection on error too
        return 1
While opening and closing database connections is not free, it is also not all that expensive compared to starting up and stopping the Python interpreter. If all your scripts run independently and briefly, that is probably the first thing you should fix. You have to decide and describe how your scripts are scheduled and invoked before you can know how (and whether) to use a connection pooler.
and as we know, we cannot have more than 100 connections open at the same time.
100 is the default setting for max_connections, but it is entirely configurable. You can increase it if you want to. If you refactor for performance, you should probably do so in a way that naturally means you don't need to raise max_connections. But refactoring just because you don't want to raise max_connections is letting the tail wag the dog.
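For reference, a quick way to inspect and change the setting from psql; note that a change to max_connections only takes effect after a server restart:

SHOW max_connections;                    -- current value, 100 by default
ALTER SYSTEM SET max_connections = 200;  -- applied once the server restarts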
You are right, establishing a database connection is expensive; therefore, you should use connection pooling. But there is no need to re-invent the wheel, since psycopg2 has built-in connection pooling:
Use a psycopg2.pool.SimpleConnectionPool or psycopg2.pool.ThreadedConnectionPool (depending on whether you use threading or not) and use the getconn() and putconn() methods to grab or return a connection.
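A minimal sketch of that pattern, reusing the DATABASE_CONNECTION settings from the question (the pool bounds of 1 and 10 are illustrative):

import psycopg2
from psycopg2 import pool

# one pool per process, shared by all functions
connection_pool = pool.SimpleConnectionPool(1, 10, **DATABASE_CONNECTION)

def get_all_stores():
    conn = connection_pool.getconn()  # grab a connection from the pool
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT store FROM public.store_config;")
        return [row[0] for row in cursor.fetchall()]
    finally:
        connection_pool.putconn(conn)  # always return the connection to the pool

The try/finally guarantees the connection goes back to the pool even when the query raises.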
Let me start off by saying I am extremely new to Python and PostgreSQL, so I feel like I'm in way over my head. My end goal is to get connected to the dvdrental database in PostgreSQL and be able to access/manipulate the data. So far I have:
created a .config folder with a database.ini inside it that holds my login credentials.
in my src I have a config.py file and use ConfigParser, see below:
from configparser import ConfigParser

def config(filename='.config/database.ini', section='postgresql'):
    # create a parser
    parser = ConfigParser()
    # read config file
    parser.read(filename)
    # get section, default to postgresql
    db = {}
    if parser.has_section(section):
        params = parser.items(section)
        for param in params:
            db[param[0]] = param[1]
    else:
        raise Exception('Section {0} not found in the {1} file'.format(section, filename))
    return db
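For reference, config() as written expects a database.ini along these lines; the key names must match keyword arguments that psycopg.connect() accepts, and the values here are placeholders:

[postgresql]
host=localhost
dbname=dvdrental
user=postgres
password=secret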
then also in my src I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg

def connect():
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # read connection parameters
        params = config()
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg.connect(**params)
        # create a cursor
        cur = conn.cursor()
        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')
        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)
        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')

if __name__ == '__main__':
    connect()
Now this runs and prints out the PostgreSQL database version, which is all well and great, but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor.
I need the connect function to basically just connect to the dvdrental database and create a cursor so that I can then use my connection to select from the database in other needed "tasks" -- for example I'd like to be able to create another function like the below:
def select_from_table(cursor, table_name, schema):
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    return results
but I'm struggling with how to just create a connection to the dvdrental database & a cursor so that I'm able to actually fetch data and create pandas tables with it and whatnot.
so it would be like
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from 2 into a pandas df
thanks so much for any help!! This is for a project in a class I am taking and I am extremely overwhelmed and have been googling-researching non-stop and seemingly end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming but you're on the right track.
Just copy these three lines from connect into the select_from_table method
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (I also added conn.close() at the end, and dropped the cursor parameter, since the function now creates its own cursor):

def select_from_table(table_name, schema):
    params = config()
    conn = psycopg.connect(**params)
    cursor = conn.cursor()
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    conn.close()
    return results
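From there, task 3 is a one-liner; a sketch, assuming the film table from the dvdrental sample database:

import pandas as pd

rows = select_from_table("film", "public")  # task 2: fetch the raw rows
df = pd.DataFrame(rows)                     # task 3: turn them into a DataFrame
print(df.head())

Note that pandas will just number the columns here; to get real column names you would also need to return cursor.description from the helper.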
I'm using the psycopg2 library to connect to my PostgreSQL database.
Every time I want to execute a query, I make a new connection like this:
import psycopg2

def run_query(query):
    with psycopg2.connect("dbname=test user=postgres") as connection:
        cursor = connection.cursor()
        cursor.execute(query)
        cursor.close()
But I think it's faster to make one connection for the whole app's execution, like this:

import psycopg2

connection = psycopg2.connect("dbname=test user=postgres")

def run_query(query):
    cursor = connection.cursor()
    cursor.execute(query)
    cursor.close()
So which is the better way to connect to my database for the whole execution time of my app?
I've tried both ways and both work, but I want to know which is better and why.
You should strongly consider using a connection pool, as other answers have suggested. This will be less costly than creating a connection every time you query, and it can handle workloads that a single connection couldn't deal with.
Create a file called something like mydb.py, and include the following:
import psycopg2
import psycopg2.pool
from contextlib import contextmanager

# ThreadedConnectionPool needs minimum and maximum pool sizes as its
# first two arguments; 1 and 10 here are illustrative
dbpool = psycopg2.pool.ThreadedConnectionPool(1, 10,
                                              host=<<YourHost>>,
                                              port=<<YourPort>>,
                                              dbname=<<YourDB>>,
                                              user=<<YourUser>>,
                                              password=<<YourPassword>>,
                                              )

@contextmanager
def db_cursor():
    conn = dbpool.getconn()
    try:
        with conn.cursor() as cur:
            yield cur
        conn.commit()
        """
        You can have multiple exception types here.
        For example, if you wanted to specifically check for the
        23503 "FOREIGN KEY VIOLATION" error type, you could do:

        except psycopg2.Error as e:
            conn.rollback()
            if e.pgcode == '23503':
                raise KeyError(e.diag.message_primary)
            else:
                raise Exception(e.pgcode)
        """
    except:
        conn.rollback()
        raise
    finally:
        dbpool.putconn(conn)
This will allow you to run queries like so:

import mydb

def myfunction():
    with mydb.db_cursor() as cur:
        cur.execute("""Select * from blahblahblah...""")
Both ways are bad. The first one is particularly bad, because opening a database connection is quite expensive. The second is bad, because you will end up with either a single connection (which is too few) or one connection per process or thread (which is usually too many).
Use a connection pool.
I have a Python application that is reading from MySQL/MariaDB, uses that to fetch data from an API, and then inserts the results into another table.
I had set up a module with a function to connect to the database and return the connection object that is passed to other functions/modules. However, I believe this might not be the correct approach. The idea was to have a small module that I could just call whenever I needed to connect to the db.
Also note that I am using the same connection object during loops (and within the loop passing it to the db_update module) and call close() when all is done.
I am also getting some warnings from the db sometimes; those mostly happen at the point where I call db_conn.close(), so I guess I am not handling the connection or session/engine correctly. Also, the connection IDs in the log warnings keep increasing, which is another hint that I am doing it wrong.
[Warning] Aborted connection 351 to db: 'some_db' user: 'some_user' host: '172.28.0.3' (Got an error reading communication packets)
Here is some pseudo code that represents the structure I currently have:
################
## db_connect.py
################
# imports ...
from sqlalchemy import create_engine

def db_connect():
    # get env ...
    db_string = f"mysql+pymysql://{db_user}:{db_pass}@{db_host}:{db_port}/{db_name}"
    try:
        engine = create_engine(db_string)
    except Exception as e:
        return None
    db_conn = engine.connect()
    return db_conn
################
## db_update.py
################
# imports ...
from sqlalchemy import text

def db_insert(db_conn, api_result):
    # ...
    ins_qry = "INSERT INTO target_table (attr_a, attr_b) VALUES (:a, :b);"
    ins_qry = text(ins_qry)
    ins_qry = ins_qry.bindparams(a=value_a, b=value_b)
    try:
        db_conn.execute(ins_qry)
    except Exception as e:
        print(e)
        return None
    return True
################
## main.py
################
from sqlalchemy import text
from db_connect import db_connect
from db_update import db_insert

def run():
    try:
        db_conn = db_connect()
        if not db_conn:
            return False
    except Exception as e:
        print(e)
    qry = """SELECT *
             FROM some_table
             WHERE some_attr IN (:some_value);"""
    qry = text(qry)
    search_run_qry = qry.bindparams(
        some_value='abc'
    )
    result_list = db_conn.execute(search_run_qry).fetchall()  # execute the bound query
    for result_item in result_list:
        ## do stuff like fetching data from api for every record in the query result
        api_result = get_api_data(...)
        ## insert into db:
        db_ins_status = db_insert(db_conn, api_result)
        ## ...
    db_conn.close()  # note the parentheses: db_conn.close without them does nothing

run()
EDIT: Another question:
a) Is it OK, in a loop that does an update on every iteration, to use the same connection, or would it be wiser to instead pass the engine to the run() function and call db_conn = engine.connect() and db_conn.close() just before and after each update?
b) I am thinking about using ThreadPoolExecutor instead of the loop for the API calls. Would this have implications on how to use the connection, i.e. can I use the same connection for multiple threads that are doing updates to the same table?
Note: I am not using the ORM feature, mostly because I have a strong DWH/SQL background (though not so much as a DBA) and am used to writing even complex SQL queries. I am thinking about switching to just using the PyMySQL connector for that reason.
Thanks in advance!
Yes, you can return/pass the connection object as a parameter, but what is the purpose of the db_connect method other than testing the connection? As I see it, db_connect serves no real purpose, so I would recommend doing it the way I have done it before.
I would like to share a code snippet from one of my projects.
def create_record(sql_query: str, data: tuple):
    try:
        # mysql_obj is the project's configured MySQL connector object
        connection = mysql_obj.connect()
        db_cursor = connection.cursor()
        db_cursor.execute(sql_query, data)
        connection.commit()
        return db_cursor, connection
    except Exception as error:
        print(f'Connection failed error message: {error}')
and then I use a similar function, fetch_data, for another of my needs:
db_cursor, connection, query_data = fetch_data(sql_query, query_data)
After everything is done, I close the connection with this method:
def close_connection(connection, db_cursor):
    """
    This method is used to close the SQL server connection
    """
    db_cursor.close()
    connection.close()
and the corresponding method call:
close_connection(connection, db_cursor)
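Put together, a minimal usage sketch (the query and values are placeholders, and mysql_obj is assumed to be configured elsewhere in the project):

sql_query = "INSERT INTO target_table (attr_a, attr_b) VALUES (%s, %s);"
db_cursor, connection = create_record(sql_query, ("value_a", "value_b"))
# ... any further work with db_cursor ...
close_connection(connection, db_cursor)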
I am not sure whether I can share my GitHub here, but please check the link: under model.py you can see the database methods, and main.py shows how they are called.
Best,
Hasan.
I'm still using Flask-MySQL.
I'm getting the database context (the mysql variable) just fine, and I can query the database and get results. It's only the insert that is not working: it's not complaining (no exceptions thrown), and the insert method returns True.
This should insert the record when it commits, but for some reason, as I watch the database with MySQL Workbench, nothing is getting inserted into the table (and no exceptions are thrown from the insert method):
I'm passing in this to insertCmd:
"INSERT into user(username, password) VALUES ('test1','somepassword');"
I've checked the length of the column in the database, and copied the command into MySQL Workbench (where it successfully inserts the row into the table).
I'm at a loss. The examples I've seen all seem to follow this format, and I have a good database context. You can see other things I've tried in the comments.
def insert(mysql, insertCmd):
    try:
        #connection = mysql.get_db()
        cursor = mysql.connect().cursor()
        cursor.execute(insertCmd)
        mysql.connect().commit()
        #mysql.connect().commit
        #connection.commit()
        return True
    except Exception as e:
        print("Problem inserting into db: " + str(e))
        return False
You need to keep a handle to the connection; you keep overriding it in your loop.
Here is a simplified example:
con = mysql.connect()
cursor = con.cursor()

def insert(mysql, insertCmd):
    try:
        cursor.execute(insertCmd)
        con.commit()
        return True
    except Exception as e:
        print("Problem inserting into db: " + str(e))
        return False
If mysql is your connection, then you can just commit on that, directly:
def insert(mysql, insertCmd):
    try:
        cursor = mysql.cursor()
        cursor.execute(insertCmd)
        mysql.commit()
        return True
    except Exception as e:
        print("Problem inserting into db: " + str(e))
        return False
Apparently, you MUST separate the connect and cursor, or it won't work.
To get the cursor, this will work: cursor = mysql.connect().cursor()
However, as Burchan Khalid so adeptly pointed out, any attempt after that to make a connection object in order to commit will wipe out the work you did using the cursor.
So, you have to do the following (no shortcuts):
connection = mysql.connect()
cursor = connection.cursor()
cursor.execute(insertCmd)
connection.commit()
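Folded back into the original insert function, a sketch of the fix looks like this:

def insert(mysql, insertCmd):
    try:
        # create the connection once and reuse it for both execute and commit
        connection = mysql.connect()
        cursor = connection.cursor()
        cursor.execute(insertCmd)
        connection.commit()
        return True
    except Exception as e:
        print("Problem inserting into db: " + str(e))
        return False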
According to http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html#disconnect-handling-pessimistic, SQLAlchemy can be instrumented to reconnect if an entry in the connection pool is no longer valid. I created the following test case to test this:
import subprocess
from sqlalchemy import create_engine, event
from sqlalchemy import exc
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    cursor = dbapi_connection.cursor()
    try:
        print("pinging server")
        cursor.execute("SELECT 1")
    except:
        print("raising disconnect error")
        raise exc.DisconnectionError()
    cursor.close()

engine = create_engine('postgresql://postgres@localhost/test')
connection = engine.connect()

subprocess.check_call(['psql', str(engine.url), '-c',
                       "select pg_terminate_backend(pid) from pg_stat_activity " +
                       "where pid <> pg_backend_pid() " +
                       "and datname='%s';" % engine.url.database],
                      stdout=subprocess.PIPE)

result = connection.execute("select 'OK'")
for row in result:
    print("Success!", " ".join(row))
But instead of recovering I receive this exception:
sqlalchemy.exc.OperationalError: (OperationalError) terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Since "pinging server" is printed on the terminal it seems safe to conclude that the event listener is attached. How can SQLAlchemy be taught to recover from a disconnect?
It looks like the checkout event is only fired when you first get a connection from the pool (e.g. your connection = engine.connect() line).
If you subsequently lose your connection, you will have to explicitly replace it, so you could just grab a new one and retry your SQL:
try:
    result = connection.execute("select 'OK'")
except sqlalchemy.exc.OperationalError:  # may need more exceptions here
    connection = engine.connect()  # grab a new connection
    result = connection.execute("select 'OK'")  # and retry
This would be a pain to do around every bit of SQL, so you could wrap database queries using something like:
def db_execute(conn, query):
    try:
        result = conn.execute(query)
    except sqlalchemy.exc.OperationalError:  # may need more exceptions here (or trap all)
        conn = engine.connect()  # replace your connection
        result = conn.execute(query)  # and retry
    return result
The following should now succeed:

result = db_execute(connection, "select 'OK'")
Another option would be to also listen for the invalidate event, and take some action at that time to replace your connection.
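A sketch of that option, using the pool's invalidate event (the handler body is illustrative; it only logs, and the pool itself opens a fresh DBAPI connection on the next checkout):

from sqlalchemy import event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "invalidate")
def on_invalidate(dbapi_connection, connection_record, exception):
    # called when a pooled DBAPI connection is marked as invalid
    print("Connection invalidated:", exception)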