According to http://docs.sqlalchemy.org/en/rel_0_9/core/pooling.html#disconnect-handling-pessimistic, SQLAlchemy can be instrumented to reconnect if an entry in the connection pool is no longer valid. I created the following test case to verify this:
import subprocess

from sqlalchemy import create_engine, event
from sqlalchemy import exc
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    cursor = dbapi_connection.cursor()
    try:
        print("pinging server")
        cursor.execute("SELECT 1")
    except Exception:
        print("raising disconnect error")
        raise exc.DisconnectionError()
    cursor.close()

engine = create_engine('postgresql://postgres@localhost/test')
connection = engine.connect()

subprocess.check_call(['psql', str(engine.url), '-c',
                       "select pg_terminate_backend(pid) from pg_stat_activity " +
                       "where pid <> pg_backend_pid() " +
                       "and datname='%s';" % engine.url.database],
                      stdout=subprocess.PIPE)

result = connection.execute("select 'OK'")
for row in result:
    print("Success!", " ".join(row))
But instead of recovering I receive this exception:
sqlalchemy.exc.OperationalError: (OperationalError) terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Since "pinging server" is printed on the terminal it seems safe to conclude that the event listener is attached. How can SQLAlchemy be taught to recover from a disconnect?
It looks like the checkout event is only fired when you first get a connection from the pool (e.g. your connection = engine.connect() line).
If you subsequently lose your connection, you will have to explicitly replace it, so you could just grab a new one and retry your SQL:
try:
    result = connection.execute("select 'OK'")
except sqlalchemy.exc.OperationalError:  # may need more exceptions here
    connection = engine.connect()  # grab a new connection
    result = connection.execute("select 'OK'")  # and retry
This would be a pain to do around every bit of SQL, so you could wrap database queries using something like:
def db_execute(conn, query):
    try:
        result = conn.execute(query)
    except sqlalchemy.exc.OperationalError:  # may need more exceptions here (or trap all)
        conn = engine.connect()  # replace your connection
        result = conn.execute(query)  # and retry
    return result
The following:
result = db_execute(connection, "select 'OK'")
should now succeed.
Another option would be to also listen for the invalidate event, and take some action at that time to replace your connection.
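For illustration, a minimal sketch (not part of the original answer) of such a listener, using SQLAlchemy's PoolEvents "invalidate" hook; the print is just a placeholder for whatever replacement or alerting logic you need:

from sqlalchemy import event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "invalidate")
def on_invalidate(dbapi_connection, connection_record, exception):
    # Fired when a connection is marked invalid; the pool opens a
    # replacement on the next checkout, so this is mainly a place
    # to log or alert.
    print("Connection invalidated: %r" % exception)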
I'm using the psycopg2 library to connect to my PostgreSQL database. Every time I want to execute a query, I make a new connection like this:
import psycopg2

def run_query(query):
    with psycopg2.connect("dbname=test user=postgres") as connection:
        cursor = connection.cursor()
        cursor.execute(query)
        cursor.close()
But I think it would be faster to make one connection for the whole app's execution, like this:
import psycopg2

connection = psycopg2.connect("dbname=test user=postgres")

def run_query(query):
    cursor = connection.cursor()
    cursor.execute(query)
    cursor.close()
So which is the better way to connect to my database for the whole execution time of my app? I've tried both ways and both worked, but I want to know which is better and why.
You should strongly consider using a connection pool, as other answers have suggested. This will be less costly than creating a connection every time you query, and it can also handle workloads that a single connection couldn't.
Create a file called something like mydb.py, and include the following:
import psycopg2
import psycopg2.pool
from contextlib import contextmanager

# ThreadedConnectionPool requires minconn and maxconn as its first two
# arguments; the bounds below are placeholders, like the credentials.
dbpool = psycopg2.pool.ThreadedConnectionPool(minconn=1,
                                              maxconn=20,
                                              host=<<YourHost>>,
                                              port=<<YourPort>>,
                                              dbname=<<YourDB>>,
                                              user=<<YourUser>>,
                                              password=<<YourPassword>>,
                                              )

@contextmanager
def db_cursor():
    conn = dbpool.getconn()
    try:
        with conn.cursor() as cur:
            yield cur
            conn.commit()
        """
        You can have multiple exception types here.
        For example, if you wanted to specifically check for the
        23503 "FOREIGN KEY VIOLATION" error type, you could do:

        except psycopg2.Error as e:
            conn.rollback()
            if e.pgcode == '23503':
                raise KeyError(e.diag.message_primary)
            else:
                raise Exception(e.pgcode)
        """
    except Exception:
        conn.rollback()
        raise
    finally:
        dbpool.putconn(conn)
This will allow you to run queries like so:
import mydb

def myfunction():
    with mydb.db_cursor() as cur:
        cur.execute("""Select * from blahblahblah...""")
Both ways are bad. The first one is particularly bad, because opening a database connection is quite expensive. The second is bad because you will end up with either a single connection (which is too few) or one connection per process or thread (which is usually too many).
Use a connection pool.
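A minimal sketch of what that could look like with psycopg2's built-in pooling, reusing the question's "dbname=test user=postgres" DSN (the pool bounds of 1 and 10 are arbitrary):

import psycopg2.pool

pool = psycopg2.pool.SimpleConnectionPool(1, 10, "dbname=test user=postgres")

def run_query(query):
    conn = pool.getconn()   # borrow an already-open connection
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        cursor.close()
        conn.commit()
    finally:
        pool.putconn(conn)  # always hand the connection back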
I have started learning how to use psycopg2 with Python. What I have is quite a few scripts; let's take an example where there can be up to 150 connections and, as we know, we cannot have more than 100 connections connected at the same time. What I figured out is that whenever I want to do a database query/execution, I connect to the database, do the execution and then close the connection. However, I believe that opening and closing new connections is very expensive, and connections should be longer-lived.
I have done something like this:
DATABASE_CONNECTION = {
    "host": "TEST",
    "database": "TEST",
    "user": "TEST",
    "password": "TEST"
}
import psycopg2
import psycopg2.extras

def get_all_links(store):
    """
    Get all links from given store
    :param store:
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_update_query = "SELECT id, link FROM public.store_items WHERE store = %s AND visible = %s;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        data_tuple = (store, "yes")
        cursor.execute(sql_update_query, data_tuple)
        test_data = [{"id": links["id"], "link": links["link"]} for links in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        return 1

def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    conn = psycopg2.connect(**DATABASE_CONNECTION)
    sql_update_query = "SELECT store FROM public.store_config;"
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        cursor.execute(sql_update_query)
        test_data = [stores["store"] for stores in cursor]
        cursor.close()
        conn.close()
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        cursor.close()
        conn.rollback()
        return 1
I wonder how I can make this as efficient as possible, so that I can have a lot of scripts connected to the database while still not hitting the max_connections limit?
I forgot to add that the way I'm connecting is that I have multiple scripts, e.g.:
test1.py
test2.py
test3.py
....
....
Each script runs on its own, and they all import database.py, which has the code I showed before.
UPDATE:
import psycopg2.extras
from psycopg2 import pool

threaded_postgreSQL_pool = pool.ThreadedConnectionPool(1, 2,
                                                       user="test",
                                                       password="test",
                                                       host="test",
                                                       database="test")
if threaded_postgreSQL_pool:
    print("Connection pool created successfully using ThreadedConnectionPool")
def get_all_stores():
    """
    Get all stores in database
    :return:
    """
    # Use getconn() method to get a connection from the connection pool
    ps_connection = threaded_postgreSQL_pool.getconn()
    sql_update_query = "SELECT store FROM public.store_config;"
    ps_cursor = ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        ps_cursor.execute(sql_update_query)
        test_data = [stores["store"] for stores in ps_cursor]
        ps_cursor.close()
        threaded_postgreSQL_pool.putconn(ps_connection)
        print("Put away a PostgreSQL connection")
        return test_data
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        ps_cursor.close()
        ps_connection.rollback()
        # Return the connection to the pool on the error path as well
        threaded_postgreSQL_pool.putconn(ps_connection)
        return 1
While opening and closing database connections is not free, it is also not all that expensive compared to starting up and shutting down the Python interpreter. If all your scripts run independently and briefly, that is probably the first thing you should fix. You have to decide and describe how your scripts are scheduled and invoked before you can know how (and whether) to use a connection pooler.
and as we know, we cannot have more than 100 connections connected at the same time.
100 is the default setting for max_connections, but it is entirely configurable. You can increase it if you want to. If you refactor for performance, you should probably do so in a way that naturally means you don't need to raise max_connections. But refactoring just because you don't want to raise max_connections is letting the tail wag the dog.
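If you do decide to raise it, here is a sketch of inspecting and changing the setting from psycopg2 (connection parameters are placeholders; ALTER SYSTEM needs superuser rights, and the new value only takes effect after a server restart):

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("SHOW max_connections;")
    print(cur.fetchone()[0])  # '100' on a default installation
    cur.execute("ALTER SYSTEM SET max_connections = 200;")
conn.close()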
You are right, establishing a database connection is expensive; therefore, you should use connection pooling. But there is no need to re-invent the wheel, since psycopg2 has built-in connection pooling:
Use a psycopg2.pool.SimpleConnectionPool or psycopg2.pool.ThreadedConnectionPool (depending on whether you use threading or not) and use the getconn() and putconn() methods to grab or return a connection.
With psycopg2, connecting to and querying the database works like so:
conn = psycopg2.connect('connection string')

with conn:
    cur = conn.cursor()
    cur.execute("SELECT * FROM pg_stat_activity")  # simple query
    rows = cur.fetchall()
    for row in rows:
        print(row)
After trial and error, I found out that with conn is absolutely necessary or you will get many unexplained locks.
My question is: is there a way to set up the connection so as to avoid the need to use it?
From https://www.psycopg.org/docs/usage.html,
Warning
Unlike file objects or other resources, exiting the connection’s with
block doesn’t close the connection, but only the transaction
associated to it. If you want to make sure the connection is closed
after a certain point, you should still use a try-catch block:
conn = psycopg2.connect(DSN)
try:
    # connection usage
finally:
    conn.close()
In psycopg2, the context manager has been implemented in such a way that the with statement will only terminate the transaction, not close the connection for you. The connection needs to be closed by you separately.
In case of an error in your transaction, you have the option to roll back and re-raise the error.
One way to do this is to write the connection closing logic yourself.
def with_connection(func):
    """
    Function decorator for passing connections
    """
    def connection(*args, **kwargs):
        # Here, you may even use a connection pool
        conn = psycopg2.connect(DSN)
        try:
            rv = func(conn, *args, **kwargs)
        except Exception:
            conn.rollback()
            raise
        else:
            # Decide here whether you need to commit the transaction or not
            conn.commit()
        finally:
            conn.close()
        return rv
    return connection

@with_connection
def run_sql(conn, arg1):
    cur = conn.cursor()
    cur.execute(SQL, (arg1,))
Since version 2.5, psycopg2 supports the with statement the way you expect it to behave. Docs:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

conn.close()
I want to know what the proper way is to close a connection to a Postgres database when using the with statement and psycopg2.
import pandas as pd
import psycopg2

def create_df_from_postgres(params: dict,
                            columns: str,
                            tablename: str,
                            ) -> pd.DataFrame:
    with psycopg2.connect(**params) as conn:
        data_sql = pd.read_sql_query(
            "SELECT " + columns + ", SUM(total)"
            " AS total FROM " + str(tablename),
            con=conn
        )
        # do I need to close the connection here:
        # conn.close()
    # or here:
    conn.close()
    return data_sql
Is this a better way to handle the connection?
def get_ci_method_and_date(params: dict,
                           columns: str,
                           tablename: str,
                           ) -> pd.DataFrame:
    connection = None  # so the finally block is safe if connect() fails
    try:
        connection = psycopg2.connect(**params)
        data_sql = pd.read_sql_query('SELECT ' + columns +
                                     ' FROM ' + str(tablename),
                                     con=connection
                                     )
    finally:
        if connection:
            connection.close()
    return data_sql
From official psycopg docs
Warning Unlike file objects or other resources, exiting the connection’s with block doesn’t close the connection, but only the transaction associated to it. If you want to make sure the connection is closed after a certain point, you should still use a try-catch block:
conn = psycopg2.connect(DSN)
try:
# connection usage
finally:
conn.close()
Proper way to close a connection:
From official psycopg docs:
Warning Unlike file objects or other resources, exiting the connection’s with
block doesn’t close the connection, but only the transaction associated to
it. If you want to make sure the connection is closed after a certain point, you
should still use a try-catch block:
conn = psycopg2.connect(DSN)
try:
    # connection usage
finally:
    conn.close()
I thought the Connection ContextManager closes the connection, but according to the docs, it does not:
Connections can be used as context managers. Note that a context wraps a transaction: if the context exits with success the transaction is committed, if it exits with an exception the transaction is rolled back. Note that the connection is not closed by the context and it can be used for several contexts.
Proposed usage is:
conn = psycopg2.connect(DSN)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)

with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)

# leaving contexts doesn't close the connection
conn.close()
source: https://www.psycopg.org/docs/connection.html
It depends on your code structure and logic, but you can also use:
from contextlib import contextmanager

@contextmanager
def _establish_connection():
    db_connection = psycopg2.connect(...)
    try:
        yield db_connection
    finally:
        # Extra safety check in case the transaction was not rolled back for some reason
        if db_connection.status == psycopg2.extensions.STATUS_IN_TRANSACTION:
            db_connection.rollback()
        db_connection.close()

# Afterwards, use your function like this:
with _establish_connection() as conn:
    # Do your logic here
    return ...
The whole point of a with statement is that the resources are cleaned up automatically when it exits. So there is no need to call conn.close() explicitly at all.
I've received the error OperationalError: (OperationalError) (2006, 'MySQL server has gone away') before, when I coded a project on Flask, but I can't understand why I'm getting it here. I have code like this (yes, if the code is small and executes fast, then there are no errors):
db_engine = create_engine('mysql://root@127.0.0.1/mind?charset=utf8', pool_size=10, pool_recycle=7200)
Base.metadata.create_all(db_engine)

Session = sessionmaker(bind=db_engine, autoflush=True)
Session = scoped_session(Session)
session = Session()

# there are many classes and functions here

session.close()
This code returns the 'MySQL server has gone away' error, but only after some time, when I use pauses in my script. The MySQL I use comes from openserver.ru (a web server stack similar to WAMP). Thanks.
Looking at the mysql docs, we can see that there are a bunch of reasons why this error can occur. However, the two main reasons I've seen are:
1) The most common reason is that the connection has been dropped because it hasn't been used in more than 8 hours (default setting)
By default, the server closes the connection after eight hours if nothing has happened. You can change the time limit by setting the wait_timeout variable when you start mysqld
I'll just mention for completeness the two ways to deal with that, but they've already been mentioned in other answers:
A: I have a very long running job and so my connection is stale. To fix this, I refresh my connection:
create_engine(conn_str, pool_recycle=3600) # recycle every hour
B: I have a long running service and long periods of inactivity. To fix this I ping mysql before every call:
create_engine(conn_str, pool_pre_ping=True)
2) My packet size is too large, which should throw this error:
_mysql_exceptions.OperationalError: (1153, "Got a packet bigger than 'max_allowed_packet' bytes")
I've only seen this buried in the middle of the trace, though often you'll only see the generic _mysql_exceptions.OperationalError (2006, 'MySQL server has gone away'), so it's hard to catch, especially if logs are in multiple places.
The above doc says the max packet size is 64MB by default, but it's actually 16MB, which can be verified with SELECT @@max_allowed_packet.
To fix this, decrease packet size for INSERT or UPDATE calls.
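For example, one way to keep a bulk INSERT under the packet limit is to send the rows in smaller batches; this sketch is illustrative only, with made-up table and column names:

def insert_in_batches(cursor, rows, batch_size=1000):
    # Send at most batch_size rows per statement so no single
    # packet approaches max_allowed_packet.
    for i in range(0, len(rows), batch_size):
        cursor.executemany(
            "INSERT INTO mytable (col1, col2) VALUES (%s, %s)",
            rows[i:i + batch_size],
        )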
SQLAlchemy now has a great write-up on how you can use pinging to be pessimistic about your connection's freshness:
http://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-pessimistic
From there,
from sqlalchemy import exc
from sqlalchemy import event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def ping_connection(dbapi_connection, connection_record, connection_proxy):
    cursor = dbapi_connection.cursor()
    try:
        cursor.execute("SELECT 1")
    except:
        # optional - dispose the whole pool
        # instead of invalidating one at a time
        # connection_proxy._pool.dispose()

        # raise DisconnectionError - pool will try
        # connecting again up to three times before raising.
        raise exc.DisconnectionError()
    cursor.close()
And a test to make sure the above works:
from sqlalchemy import create_engine

e = create_engine("mysql://scott:tiger@localhost/test", echo_pool=True)
c1 = e.connect()
c2 = e.connect()
c3 = e.connect()
c1.close()
c2.close()
c3.close()

# pool size is now three.

print("Restart the server")
input()

for i in range(10):
    c = e.connect()
    print(c.execute("select 1").fetchall())
    c.close()
From the documentation, you can use the pool_recycle parameter:
from sqlalchemy import create_engine

e = create_engine("mysql://scott:tiger@localhost/test", pool_recycle=3600)
I just faced the same problem and solved it with some effort. I hope my experience is helpful to others.
Following some suggestions, I used a connection pool and set pool_recycle to less than wait_timeout, but it still didn't work.
Then I realized that a global session may just keep using the same connection, so the connection pool didn't help. To avoid a global session, generate a new session for each request and remove it with Session.remove() after processing.
Finally, all was well.
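As a rough sketch of that pattern, assuming a plain SQLAlchemy setup (the engine URL and the handler function are illustrative):

from sqlalchemy import create_engine, text
from sqlalchemy.orm import scoped_session, sessionmaker

db_engine = create_engine('mysql://root@127.0.0.1/mind?charset=utf8',
                          pool_size=10, pool_recycle=3600)
Session = scoped_session(sessionmaker(bind=db_engine))

def handle_request():
    session = Session()  # a session scoped to this request
    try:
        session.execute(text("SELECT 1"))
        session.commit()
    finally:
        Session.remove()  # dispose of the session once the request is done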
One more point to keep in mind is to manually push the Flask application context when initializing the database. This should resolve the issue.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()
app = Flask(__name__)

with app.app_context():
    db.init_app(app)
https://docs.sqlalchemy.org/en/latest/core/pooling.html#disconnect-handling-optimistic
from sqlalchemy.exc import OperationalError

def sql_read(cls, sql, connection):
    """sql for read action like select
    """
    LOG.debug(sql)
    try:
        result = connection.engine.execute(sql)
        header = result.keys()
        for row in result:
            yield dict(zip(header, row))
    except OperationalError as e:
        LOG.info("recreate pool due to %s" % e)
        connection.engine.pool.recreate()
        result = connection.engine.execute(sql)
        header = result.keys()
        for row in result:
            yield dict(zip(header, row))
    except Exception as ee:
        LOG.error(ee)
        # SqlExecuteError is an application-specific exception
        raise SqlExecuteError()