Re-running SQLAlchemy queries when connections get stale - python

I have SQLAlchemy connecting to Postgres via PGPool. PGPool is configured to recycle connections that are about 60s old.
I have two problems:
1) Sometimes, we get a huge query that takes more than 60s (I know it's bad... we're working on improving this) and subsequent queries fail because they rely on the same old connection that is no longer valid.
2) Similarly, when I start my Pyramid app under iPython, the connections go stale while I stop to think for a moment.
When attempting to perform a query with a session with a stale connection, I get an exception saying:
OperationalError: (psycopg2.OperationalError) connection terminated due to client idle limit reached
ERROR: connection terminated due to client idle limit reached
SQLAlchemy's pessimistic disconnect handling docs recommend testing the connection when you get it out of the pool. However, the connection is becoming stale after being checked out, so this wouldn't help much.
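For reference, newer SQLAlchemy (1.2+) exposes that pessimistic check as a single create_engine flag (the URL here is a placeholder); as noted, it only tests connections at the moment of checkout:

from sqlalchemy import create_engine

# pool_pre_ping issues a lightweight test on each checkout and
# transparently replaces connections that went stale while pooled.
engine = create_engine("postgresql://user:pass@pgpool:5432/app",
                       pool_pre_ping=True)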
I think the right solution would be to refresh the session's connection upon getting this type of error:
session = MySession()  # using scoped_session here
query = session.query(...)
try:
    rows = [r for r in query]
except OperationalError:
    # somehow tell query.session to use a new connection here and try again?
How can I do this?

For me, executing
session.close_all()
makes the session able to run queries again, at least until it idles out once more.
Interestingly, running session.remove() or session.close(), which the SQLAlchemy documentation seems to imply should work, doesn't: future queries then raise InvalidRequestError: Can't reconnect until invalid transaction is rolled back (which, of course, session.rollback() doesn't fix) until session.close_all() is called.
I hope somebody can provide insight into why session.close_all() does the trick. It may not be an appropriate solution for production, but it should at least save you from restarting the whole app in your iPython session.
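For what it's worth, here is a minimal sketch of the catch-and-retry idea from the question; the helper name, retry count, and MyModel are made up, and dispose() simply discards the engine's pooled connections so fresh ones are opened on the next checkout:

from sqlalchemy.exc import OperationalError

def run_query_with_retry(session, build_query, retries=1):
    # A sketch: on a stale-connection error, roll back the failed
    # transaction, drop all pooled connections, and retry once.
    for attempt in range(retries + 1):
        try:
            return build_query(session).all()
        except OperationalError:
            session.rollback()            # clear the failed transaction
            session.get_bind().dispose()  # discard all pooled connections
            if attempt == retries:
                raise

# e.g. rows = run_query_with_retry(MySession(), lambda s: s.query(MyModel))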

Related

SQLAlchemy connection errors

I'm experiencing some strange bugs which seem to be caused by connections used by SQLAlchemy, which I can't pin down exactly. I was hoping someone has a clue what's going on here.
We're working on a Pyramid app (version 1.5b1) and use SQLAlchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the db connection or session; most of the time it's a cursor already closed or This Connection is closed error, but we get other related exceptions too:
(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further
There is no silver bullet to reproduce them; only by refreshing many times are they bound to happen at some point. So I made a script using multi-mechanize to spam different URLs concurrently and see where and when it happens.
It appears the URL triggered doesn't really matter; the errors happen when there are concurrent requests that span a longer time (and other requests get served in between). This seems to indicate some kind of threading problem: either the session or the connection is shared among different threads.
After googling these issues I found a lot of topics, most of them saying to use scoped sessions, but the thing is we do use them already:
db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
We have a BaseModel for all our orm objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
We use the pyramid_tm tween to handle transactions during the request
We hook db_session.remove() to the Pyramid NewResponse event (which is fired after everything has run). I also tried putting it in a separate tween running after pyramid_tm, or even not doing it at all; none of these seemed to have any effect, so the response event seemed like the cleanest place to put it.
We create the engine in our main entrypoint of our pyramid project and use a NullPool and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()
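For context, the cleanup_db_session subscriber registered above would be roughly this (a sketch; the question doesn't show its body):

def cleanup_db_session(event):
    # NewResponse fires once the response is ready; remove() returns
    # this thread's scoped session to the registry for the next request.
    db_session.remove()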
In our app we access all db operations using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)
We use psycopg2==2.5.2 to handle the connection to postgres with pgbouncer in between
I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them)
I also tried the spamming test using different webservers: with waitress and cogen I got the errors very easily; with wsgiref we unsurprisingly had no errors (it's single-threaded). Using uwsgi and gunicorn (4 workers, gevent) I didn't get any errors.
Given the differences in the webservers used, I thought it either has to do with some webservers handling requests in threads and some using new processes (maybe a forking problem)? To complicate matters even more, as time went on and I did some new tests, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue how to go about debugging this...
Finally, to test what happens to the connection, I attached an attribute to the connection at the start of cursor execute and tried to read the attribute back at the end of the execute:
@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
    conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
    print(conn.pdtb_start_timer)
Surprisingly, this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'
Which struck me as very strange. I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
And tried adding strategy='threadlocal' to the engine, which from what I understand should force one connection per thread. But it didn't have any effect on the errors I'm seeing (besides some unit tests failing, because I need two different sessions/connections for some tests and this forces one connection to be associated).
Does anyone have any idea what might be going on here, or have some more pointers on how to attack this problem?
Thanks in advance!
Matthijs Blaas
Update: The errors were caused by multiple commands being sent in one prepared SQL statement. Psycopg2 seems to allow this, but apparently it can cause strange issues. The pg8000 connector is stricter and bailed out on the multiple commands; sending one command at a time fixed the issue!
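To illustrate (a hypothetical example, not the actual query from the app):

import psycopg2

conn = psycopg2.connect("dbname=app")  # placeholder DSN
cur = conn.cursor()

# psycopg2 accepts several commands in a single execute() call;
# pg8000 rejects this, which is what exposed the problem.
cur.execute("SET search_path TO public; SELECT 1;")

# One command per execute() avoids the strange errors:
cur.execute("SET search_path TO public")
cur.execute("SELECT 1")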

Python storm/web.py: Correctly handling DisconnectionError with MySQL database

I am writing a web service based on web.py, using storm as an ORM layer for querying records from a MySQL database. The web service is deployed via mod_wsgi using Apache2 on a Linux box. I create a connection to the MySQL database server when the script is started using storm's create_database() method. This is also the point where I create a Store object, which is used later on to perform queries when a request comes in.
After some hours of inactivity, store.find() throws a DisconnectionError: (2006, 'MySQL server has gone away'). I am not surprised that the database connection is dropped as Apache/mod_wsgi reuses the Python processes without reinitializing them for a long time. My question is how to correctly deal with this?
I have tried setting up a mechanism to keep alive the connection to the MySQL server by sending it a recurring "SELECT 1" (every 300 seconds). Unfortunately that fixed the problem on our testing machine, but not on our demo deployments (ouch) while both share the same MySQL configuration (wait_timeout is set to 8 hours).
I have searched for solutions for reconnecting the storm store to the database, but didn't find anything sophisticated. The only recommendation seems to be that one has to catch the exception, treat it like an inconsistency, call rollback() on the store and then retry. However, this would imply that I either have to wrap the whole Store class or implement the same retry-mechanism over and over. Is there a better solution or am I getting something completely wrong here?
Update: I have added a web.py processor that gracefully handles the disconnection error by recreating the storm Store if the exception is caught, and then retrying the operation as recommended by Andrey. However, this is an incomplete and suboptimal solution, as (a) the store is referenced by a handful of objects for re-use, which requires an additional mechanism to re-wire the store reference on each of these objects, and (b) it doesn't cover transaction handling (rollbacks) when performing writes on the database. However, at least it's an acceptable fix for all read operations on the store for now.
Perhaps you can use web.py's application processor to wrap your controller methods and catch DisconnectionError from them. Something like this:
def my_processor(handler):
    tries = 3
    while True:
        try:
            return handler()
        except DisconnectionError:
            tries -= 1
            if tries == 0:
                raise
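The processor would be registered the same way as in the cookbook example below, e.g. app.add_processor(my_processor).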
Or you may check the cookbook entry on how an application processor is used to run SQLAlchemy with web.py (http://webpy.org/cookbook/sqlalchemy) and make something similar for storm:
def load_storm(handler):
    web.ctx.store = Store(database)
    try:
        return handler()
    except web.HTTPError:
        web.ctx.store.commit()
        raise
    except:
        web.ctx.store.rollback()
        raise
    finally:
        web.ctx.store.commit()

app.add_processor(load_storm)

Psycopg / Postgres : Connections hang out randomly

I'm using psycopg2 for the CherryPy app I'm currently working on, and the CLI & phpPgAdmin to handle some operations manually. Here's the Python code:
# One connection per thread
cherrypy.thread_data.pgconn = psycopg2.connect("...")
...
# Later, an object is created by a thread:
class dbobj(object):
    def __init__(self):
        self.connection = cherrypy.thread_data.pgconn
        self.curs = self.connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
...
# Then,
try:
    blabla
    self.curs.execute(...)
    self.connection.commit()
except:
    self.connection.rollback()
    lalala
...
# Finally, the destructor is called:
def __del__(self):
    self.curs.close()
I'm having a problem with either psycopg or Postgres (although I think the latter is more likely). After having sent a few queries, my connections drop dead. Similarly, phpPgAdmin usually gets dropped as well; it prompts me to reconnect after I have made requests several times. Only the CLI remains persistent.
The problem is, these happen very randomly and I can't even track down the cause. I can either get locked out after a few page requests or never encounter anything after having requested hundreds of pages. The only error I've found in the Postgres log, after terminating the app, is:
...
LOG: unexpected EOF on client connection
LOG: could not send data to client: Broken pipe
LOG: unexpected EOF on client connection
...
I thought of creating a new connection every time a new dbobj instance is created but I absolutely don't want to do this.
Also, I've read that one may run into similar problems unless all transactions are committed: I use the try/except block for every single INSERT/UPDATE query, but I never use it for SELECT queries, nor do I want to write even more boilerplate code (btw, do they need to be committed?). Even if that's the case, why would phpPgAdmin close down?
max_connections is set to 100 in the .conf file, so I don't think that's the reason either. A single cherrypy worker has only 10 threads.
Does anyone have an idea where I should look first ?
Psycopg2 needs a commit or rollback after every transaction, including SELECT queries, or it leaves the connections "IDLE IN TRANSACTION". This is now a warning in the docs:
Warning: By default, any query execution, including a simple SELECT will start a transaction: for long-running programs, if no further action is taken, the session will remain “idle in transaction”, an undesirable condition for several reasons (locks are held by the session, tables bloat...). For long lived scripts, either ensure to terminate a transaction as soon as possible or use an autocommit connection.
It's a bit difficult to see exactly where you're populating and accessing cherrypy.thread_data. I'd recommend investigating psycopg2.pool.ThreadedConnectionPool instead of trying to bind one conn to each thread yourself.
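A minimal sketch of that pool-based approach (the DSN and pool sizes are illustrative):

import psycopg2.pool

# 1 to 10 connections, shared safely across threads.
pgpool = psycopg2.pool.ThreadedConnectionPool(
    1, 10, "dbname=app user=app host=localhost")

conn = pgpool.getconn()      # borrow a connection for this thread
try:
    curs = conn.cursor()
    curs.execute("SELECT 1")
    curs.fetchone()
    conn.commit()            # end the transaction: no 'idle in transaction'
finally:
    pgpool.putconn(conn)     # always hand the connection back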
Even though I don't have any idea why successful SELECT queries would block the connection, sprinkling .commit() after pretty much every query that doesn't have to work in conjunction with another solved the problem.

Python + SQLAlchemy problem: The transaction is inactive due to a rollback in a subtransaction

I have a problem with Python + SQLAlchemy.
When something goes wrong (in my case it is an integrity error, due to a race condition) and the database error is raised, all following requests result in the error being raised:
InvalidRequestError: The transaction is inactive due to a rollback in a subtransaction. Issue rollback() to cancel the transaction.
While I can prevent this original error (a race condition) from happening, I would like a more robust solution; I want to prevent a single error from crashing the entire application.
What is the best way to do this? Is there a way to tell Python to rollback the failed transaction?
The easiest thing is to make sure you are using a new SQLAlchemy Session when you start work in your controller. In /project/lib/base.py, add a method to BaseController:
def __before__(self):
    model.Session.close()
Session.close() will clear out the session and close any open transactions if there are any. You want to make sure that each time you use a session it's cleared when you're done with your work in the controller. Doing it at the start of the controller's handling of the request will make sure that it's always cleared, even if the thread's previous request had an exception and there is a rollback waiting.
Do you use yourapp.lib.base.BaseController in your controllers?
You can look at Handle mysql restart in SQLAlchemy.
You can also catch the SQLAlchemy exception in a BaseController try/finally block and do a session rollback().
The contextual SA Session is removed in BaseController: http://www.sqlalchemy.org/docs/05/session.html#lifespan-of-a-contextual-session
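A sketch of that try/finally pattern in a Pylons-style BaseController (model.Session, the project layout, and the controller wiring are assumptions based on the question):

from pylons.controllers import WSGIController
from sqlalchemy.exc import SQLAlchemyError

from yourapp import model  # assumed project layout

class BaseController(WSGIController):
    def __call__(self, environ, start_response):
        try:
            return WSGIController.__call__(self, environ, start_response)
        except SQLAlchemyError:
            model.Session.rollback()   # cancel the failed transaction
            raise
        finally:
            model.Session.remove()     # discard the contextual session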

Sql Alchemy connection time Out

I am using SQLAlchemy with MySQL, executing queries built with the SQL expression language. When executing a number of queries it times out. I found an answer but it is not clear to me. Can anyone please help?
TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30
Whenever you create a new session in your code, make sure you close it. Just call session.close()
When I got this error I thought I was closing all of my sessions, but I looked carefully and there was one new method where I wasn't. Closing the session in that method fixed this error for me.
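One way to make the close unconditional (a sketch; Session and MyModel are placeholders for your session factory and model):

from contextlib import closing

# closing() guarantees session.close() runs even if the query raises,
# so the connection always goes back to the pool.
with closing(Session()) as session:
    rows = session.query(MyModel).all()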
In multi-threaded mode, if your number of concurrent requests is much higher than the db connection pool size, it will throw the QueuePool limit of size 5 overflow 10 reached error. Try this:
engine = create_engine('mysql://', convert_unicode=True,
                       pool_size=20, max_overflow=100)
to increase the pool size.
Update: the method above is not the correct fix. The actual reason is that the db connection pool is used up and no other connection is available. The most likely cause is that you are failing to release connections. For example:
@app.teardown_appcontext
def shutdown_session(exception=None):
    db_session.remove()
