SQLAlchemy connection errors - python

I'm experiencing some strange bugs that seem to be caused by the connections SQLAlchemy uses, and I can't pin them down exactly. I was hoping someone has a clue what's going on here.
We're working on a Pyramid application (version 1.5b1) and use SQLAlchemy (version 0.9.6) for all our database connectivity. Sometimes we get errors related to the DB connection or session; most of the time it's a "cursor already closed" or "This Connection is closed" error, but we get other related exceptions too:
(OperationalError) connection pointer is NULL
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
A conflicting state is already present in the identity map for key (<class '...'>, (1001L,))
This Connection is closed (original cause: ResourceClosedError: This Connection is closed)
(InterfaceError) cursor already closed
Parent instance <...> is not bound to a Session; lazy load operation of attribute '...' cannot proceed
Parent instance <...> is not bound to a Session, and no contextual session is established; lazy load operation of attribute '...' cannot proceed
'NoneType' object has no attribute 'twophase'
(OperationalError) connection pointer is NULL
This session is in 'prepared' state; no further SQL can be emitted within this transaction
There is no silver bullet to reproduce them; only by refreshing many times are they bound to happen at some point. So I made a script using multi-mechanize to spam different URLs concurrently and see where and when it happens.
It appears the URL triggered doesn't really matter; the errors happen when there are concurrent requests that span a longer time (and other requests get served in between). This seems to indicate some kind of threading problem: that either the session or the connection is shared among different threads.
After googling these issues I found a lot of topics, most of which say to use scoped sessions, but the thing is we already use them:
db_session = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), autocommit=False, autoflush=False))
db_meta = MetaData()
We have a BaseModel for all our ORM objects:
BaseModel = declarative_base(cls=BaseModelObj, metaclass=BaseMeta, metadata=db_meta)
We use the pyramid_tm tween to handle transactions during the request.
We hook db_session.remove() into the Pyramid NewResponse event (which is fired after everything has run). I also tried putting it in a separate tween running after pyramid_tm, or even not doing it at all; none of these seemed to have any effect, so the response event seemed like the cleanest place to put it.
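For reference, a minimal sketch of what that subscriber could look like (the body here is an assumption; only the name cleanup_db_session appears in the configuration below):

def cleanup_db_session(event):
    # Discard the thread-local session so the next request starts fresh.
    db_session.remove()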
We create the engine in the main entry point of our Pyramid project, use a NullPool, and leave connection pooling to pgbouncer. We also configure the session and the bind for our BaseModel here:
engine = engine_from_config(config.registry.settings, 'sqlalchemy.', poolclass=NullPool)
db_session.configure(bind=engine, query_cls=FilterQuery)
BaseModel.metadata.bind = engine
config.add_subscriber(cleanup_db_session, NewResponse)
return config.make_wsgi_app()
In our app we access all DB operations using:
from project.db import db_session
...
db_session.query(MyModel).filter(...)
db_session.execute(...)
We use psycopg2==2.5.2 to handle the connection to Postgres, with pgbouncer in between.
I made sure no references to db_session or connections are saved anywhere (which could result in other threads reusing them).
I also tried the spamming test using different web servers: with waitress and cogen I got the errors very easily; with wsgiref we unsurprisingly had no errors (it is single-threaded). Using uWSGI and gunicorn (4 workers, gevent) I didn't get any errors.
Given the differences between the web servers used, I thought it had to do with some servers handling requests in threads and some using new processes (maybe a forking problem)? To complicate matters even more, as time went on and I did some new tests, the problem had gone away in waitress but now happened with gunicorn (when using gevent)! I have no clue how to go about debugging this...
Finally, to test what happens to the connection, I attached an attribute to the connection at the start of the cursor execute and tried to read the attribute back at the end of the execute:
import time
from sqlalchemy import event
from sqlalchemy.engine import Engine

@event.listens_for(Engine, "before_cursor_execute")
def _before_cursor_execute(conn, cursor, stmt, params, context, execmany):
    conn.pdtb_start_timer = time.time()

@event.listens_for(Engine, "after_cursor_execute")
def _after_cursor_execute(conn, cursor, stmt, params, context, execmany):
    print(conn.pdtb_start_timer)
Surprisingly this sometimes raised an exception: 'Connection' object has no attribute 'pdtb_start_timer'
Which struck me as very strange. I found one discussion about something similar: https://groups.google.com/d/msg/sqlalchemy/GQZSjHAGkWM/rDflJvuyWnEJ
I tried adding strategy='threadlocal' to the engine, which from what I understand should force one connection per thread. But it didn't have any effect on the errors I'm seeing (besides some unit tests failing, because I need two different sessions/connections for some tests and this forces one connection to be associated).
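For clarity, this is roughly how that looked (a sketch, assuming the same engine_from_config call shown earlier; strategy='threadlocal' was a create_engine option in SQLAlchemy 0.9 that has since been removed):

engine = engine_from_config(config.registry.settings, 'sqlalchemy.',
                            poolclass=NullPool, strategy='threadlocal')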
Does anyone have any idea what might go on here or have some more pointers on how to attack this problem?
Thanks in advance!
Matthijs Blaas

Update: The errors were caused by multiple commands being sent in one prepared SQL statement. psycopg2 seems to allow this, but apparently it can cause strange issues. The pg8000 connector is stricter and bailed out on the multiple commands; sending one command at a time fixed the issue!
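To illustrate the kind of change described (an assumed example, not the asker's actual statements):

# Problematic: two commands packed into a single execute() call.
# psycopg2 accepts this silently, pg8000 rejects it outright.
db_session.execute(
    "UPDATE items SET qty = qty - 1 WHERE id = 42; "
    "SELECT qty FROM items WHERE id = 42"
)

# Fix: send each command as its own statement.
db_session.execute("UPDATE items SET qty = qty - 1 WHERE id = 42")
result = db_session.execute("SELECT qty FROM items WHERE id = 42")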

Related

"server closed the connection unexpectedly" in long running task after trying to save stuff to db

I have a Django app and Celery workers.
One Celery task is quite big and can run for over 15 minutes. When the main calculations are done and I try to save the results to the DB, I get an error: psycopg2.OperationalError: server closed the connection unexpectedly.
@celery_app.task
def task(param):
    Model1.objects.create(...)
    huge_calculations(param)  # may run for over 15 minutes
    Model2.objects.create(...)  # <- error here
Everything that I managed to google refers to the simple solution "update everything", but I already did that; I have the latest versions of every package in the project and still get this error.
For a short task (even the same task with different params) everything works fine.
I've also tried to adjust the DB connection timeout, but no luck :/
Did you try this?
I'm having the same issue. For now, as a workaround, I had to create a wrapper around the connection object and hook the methods that are used. In the cursor() method I do a SELECT 1 "ping" check and reconnect if needed, so that a valid working cursor is returned.

SQLAlchemy core engine.execute() vs connection.execute()

I'm playing around with SQLAlchemy core in Python, and I've read over the documentation numerous times and still need clarification about engine.execute() vs connection.execute().
As I understand it, engine.execute() is the same as doing connection.execute(), followed by connection.close().
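For reference, the two call styles being compared look roughly like this in 1.x-era SQLAlchemy (a sketch; engine and query stand for the objects set up below, and "connectionless" engine.execute() was removed in SQLAlchemy 2.0):

# Connectionless execution: the engine checks a connection out of the pool,
# runs the statement, and returns the connection once the result is consumed.
result = engine.execute(query)

# Explicit connection: you hold the same connection until you close it,
# at which point it goes back to the pool.
connection = engine.connect()
result = connection.execute(query)
connection.close()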
The tutorials I've followed led me to use this in my code:
Initial setup in the script:
try:
    engine = db.create_engine("postgres://user:pass@ip/dbname", connect_args={'connect_timeout': 5})
    connection = engine.connect()
    metadata = db.MetaData()
except exc.OperationalError:
    print_error(f":: Could not connect to {db_ip}!")
    sys.exit()
Then, I have functions that handle my database access, for example:
def add_user(a_username):
    query = db.insert(table_users).values(username=a_username)
    connection.execute(query)
Am I supposed to be calling connection.close() before my script ends? Or is that handled efficiently enough by itself? Would I be better off closing the connection at the end of add_user(), or is that inefficient?
If I do need to be calling connection.close() before the script ends, does that mean interrupting the script will cause hanging connections on my Postgres DB?
I found this post helpful for better understanding the different interaction paradigms in SQLAlchemy, in case you haven't read it yet.
Regarding your question as to when to close your DB connection: it is indeed very inefficient to create and close a connection for every statement execution. However, you should make sure that your application does not have connection leaks in its global flow.
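One common pattern is to let the engine's pool manage connection lifetimes and check a connection out per unit of work (a sketch, not from the original answer; the table definition is only there to make the example self-contained):

import sqlalchemy as db

engine = db.create_engine("postgresql://user:pass@ip/dbname")
metadata = db.MetaData()
table_users = db.Table("users", metadata, db.Column("username", db.String))

def add_user(a_username):
    # engine.begin() checks a connection out of the pool, commits on success,
    # rolls back on error, and returns the connection when the block exits.
    with engine.begin() as connection:
        connection.execute(db.insert(table_users).values(username=a_username))

add_user("alice")
engine.dispose()  # close the pool's connections once, when the script is done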

SQLAlchemy + pyTelegramBotAPI: SQLite objects created in a thread can only be used in that same thread

I have a real headache from trying to understand the cause of the following problem. We are using a combination of the following libraries:
pyTelegramBotAPI to process requests in a multi-threaded way
SQLAlchemy
sqlite
SQLAlchemy was first using a NullPool and is now configured to use a QueuePool. I am also using the following idiom to have a new DB session fired up for each thread (as per my understanding):
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

Session = sessionmaker(bind=create_engine(classes.db_url, poolclass=QueuePool))

@contextmanager
def session_scope():
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

@bot.message_handler(content_types=['document'])
def method_handler(message):
    with session_scope() as session:
        do_database_stuff_here(session)
Nevertheless I am still getting this annoying exception: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread
Any ideas? ;) In particular, I don't get how it is possible for another thread to get in between the DB operations... this is probably the reason for the pesky exception.
Update 1: if I change the poolclass to SingletonThreadPool, there seem to be no more errors coming up. However, SQLAlchemy's documentation says it isn't intended for production use.
As you can see in the source, sqlite will raise this exception inside pysqlite_check_thread if the connection object is reused across any threads.
By using a QueuePool, you are telling SQLAlchemy it is safe to reuse connections across multiple threads. It will therefore just pick a connection from the pool for any session, no matter which thread it is on. This is why you're hitting the error: the first time you create and use a connection, you'll be fine; however, the next use will probably be on a different thread and so fail the check.
This is why SQLAlchemy mandates the use of other pools such as SingletonThreadPool and NullPool.
Assuming you are using a file-based database, you should use the NullPool. This will give you good concurrency on reads. Write concurrency is always going to be an issue for SQLite; if you need that, you probably want a different database.
Something that may be worth trying: use scoped_session instead of your contextmanager; scoped_session implicitly creates a thread-local session when it is accessed from a different thread. Be sure also to use NullPool.
from sqlalchemy.orm import scoped_session, sessionmaker

session_factory = sessionmaker(bind=create_engine(classes.db_url, poolclass=NullPool))
session = scoped_session(session_factory)
Note that you can use this scoped session directly as if it were just a regular session, even though it is actually creating thread-local sessions behind the scenes when it is being used.
With scoped_session, you should call session.remove() after you're done (i.e., after each method_handler) and explicitly call session.commit() as needed.
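A sketch of what that could look like in the handler (using the bot, handler, and helper names from the question; this exact body is an assumption):

@bot.message_handler(content_types=['document'])
def method_handler(message):
    try:
        do_database_stuff_here(session)  # the scoped_session created above
        session.commit()
    finally:
        # Discard the thread-local session so the next message handled on
        # this (or any other) thread starts from a clean state.
        session.remove()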
In theory, your context manager should work in giving each thread its own session, but, for lack of better explanation, I wonder if there are multiple threads accessing that session within the context.

Connections Not Being Closed SQLAlchemy

I have recently been seeing MySQL server has gone away in my application logs for a daemon that I have running utilizing SQLAlchemy.
I wrap every database query or update in a decorator that should close all the sessions after finishing. In theory, that should also close the connection.
My decorator looks like:
from functools import wraps

def dbop(meth):
    @wraps(meth)
    def nf(self, *args, **kwargs):
        self.session = self.sm()
        res = meth(self, *args, **kwargs)
        self.session.commit()
        self.session.close()
        return res
    return nf
I also initialize the database at the top of my Python script with:
def initdb(self):
    engine = create_engine(db_url)
    Base.metadata.create_all(engine)
    self.sm = sessionmaker(bind=engine,
                           autocommit=False,
                           autoflush=False,
                           expire_on_commit=False)
To my understanding, I am getting that error because my connection is timing out. Why would this be the case if I wrap each method in the decorator above? Is it because expire_on_commit causes queries even after the connection is closed and might reopen them? Is it because Base.metadata.create_all causes SQL to be executed, which opens a connection that isn't closed?
Your session is bound to an "engine", which in turn uses a connection pool. Each time SQLAlchemy requires a connection it checks one out from the pool, and if it is done with it, it is returned to the pool but it is not closed! This is a common strategy to reduce overhead from opening/closing connections. All the options you set above have only an impact on the session, not the connection!
By default, the connections in the pool are kept open indefinitely.
But MySQL will automatically close the connection after a certain amount of inactivity (See wait_timeout).
The issue here is that your Python process will not be informed by the MySQL server that the connection was closed if it falls into the inactivity timeout. Instead, the next time a query is sent over that connection, the Python process will discover that the connection is no longer available. A similar thing can happen if the connection is lost for other reasons, for example forced service restarts which don't wait for open connections to be cleanly closed (for example, using the "immediate" option in Postgres restarts).
This is when you run into the exception.
SQLAlchemy gives you various strategies for dealing with this, which are well documented in the "Dealing with Disconnects" section mentioned by @lukas-graf.
If you jump through some hoops you can get a reference to the connection currently in use by the session. You could close it that way, but I strongly recommend against it. Instead, refer to the "Dealing with Disconnects" section above, and let SQLAlchemy deal with this for you transparently. In your case, setting the pool_recycle option might solve your problem.
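A minimal sketch of that option (not from the original answer; db_url stands in for the question's connection string, and pool_pre_ping is an additional option available since SQLAlchemy 1.2):

from sqlalchemy import create_engine

engine = create_engine(
    db_url,
    pool_recycle=3600,   # replace pooled connections older than one hour
    pool_pre_ping=True,  # optionally also test each connection on checkout
)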

Invalid transaction persisting across requests

Summary
One of our threads in production hit an error and is now producing InvalidRequestError: This session is in 'prepared' state; no further SQL can be emitted within this transaction. errors, on every request with a query that it serves, for the rest of its life! It's been doing this for days, now! How is this possible, and how can we prevent it going forward?
Background
We are using a Flask app on uWSGI (4 processes, 2 threads), with Flask-SQLAlchemy providing us DB connections to SQL Server.
The problem seemed to start when one of our threads in production was tearing down its request, inside this Flask-SQLAlchemy method:
@teardown
def shutdown_session(response_or_exc):
    if app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN']:
        if response_or_exc is None:
            self.session.commit()
    self.session.remove()
    return response_or_exc
...and somehow managed to call self.session.commit() when the transaction was invalid. This resulted in sqlalchemy.exc.InvalidRequestError: Can't reconnect until invalid transaction is rolled back getting output to stdout, in defiance of our logging configuration, which makes sense since it happened during the app context tearing down, which is never supposed to raise exceptions. I'm not sure how the transaction got to be invalid without response_or_exc getting set, but that's actually the lesser problem AFAIK.
The bigger problem is, that's when the "'prepared' state" errors started, and haven't stopped since. Every time this thread serves a request that hits the DB, it 500s. Every other thread seems to be fine: as far as I can tell, even the thread that's in the same process is doing OK.
Wild guess
The SQLAlchemy mailing list has an entry about the "'prepared' state" error saying it happens if a session started committing and hasn't finished yet, and something else tries to use it. My guess is that the session in this thread never got to the self.session.remove() step, and now it never will.
I still feel like that doesn't explain how this session is persisting across requests though. We haven't modified Flask-SQLAlchemy's use of request-scoped sessions, so the session should get returned to SQLAlchemy's pool and rolled back at the end of the request, even the ones that are erroring (though admittedly, probably not the first one, since that raised during the app context tearing down). Why are the rollbacks not happening? I could understand it if we were seeing the "invalid transaction" errors on stdout (in uwsgi's log) every time, but we're not: I only saw it once, the first time. But I see the "'prepared' state" error (in our app's log) every time the 500s occur.
Configuration details
We've turned off expire_on_commit in the session_options, and we've turned on SQLALCHEMY_COMMIT_ON_TEARDOWN. We're only reading from the database, not writing yet. We're also using Dogpile-Cache for all of our queries (using the memcached lock since we have multiple processes, and actually, 2 load-balanced servers). The cache expires every minute for our major query.
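For concreteness, that configuration would look roughly like this (an assumption; the post only describes these settings in prose):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mssql+pyodbc://...'  # SQL Server, per the post
app.config['SQLALCHEMY_COMMIT_ON_TEARDOWN'] = True

db = SQLAlchemy(app, session_options={'expire_on_commit': False})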
Updated 2014-04-28: Resolution steps
Restarting the server seems to have fixed the problem, which isn't entirely surprising. That said, I expect to see it again until we figure out how to stop it. benselme (below) suggested writing our own teardown callback with exception handling around the commit, but I feel like the bigger problem is that the thread was messed up for the rest of its life. The fact that this didn't go away after a request or two really makes me nervous!
Edit 2016-06-05:
A PR that solves this problem has been merged on May 26, 2016.
Flask PR 1822
Edit 2015-04-13:
Mystery solved!
TL;DR: Be absolutely sure your teardown functions succeed, by using the teardown-wrapping recipe in the 2014-12-11 edit!
Started a new job also using Flask, and this issue popped up again, before I'd put in place the teardown-wrapping recipe. So I revisited this issue and finally figured out what happened.
As I thought, Flask pushes a new request context onto the request context stack every time a new request comes down the line. This is used to support request-local globals, like the session.
Flask also has a notion of "application" context which is separate from request context. It's meant to support things like testing and CLI access, where HTTP isn't happening. I knew this, and I also knew that that's where Flask-SQLA puts its DB sessions.
During normal operation, both a request and an app context are pushed at the beginning of a request, and popped at the end.
However, it turns out that when pushing a request context, the request context checks whether there's an existing app context, and if one's present, it doesn't push a new one!
So if the app context isn't popped at the end of a request due to a teardown function raising, not only will it stick around forever, it won't even have a new app context pushed on top of it.
That also explains some magic I hadn't understood in our integration tests. You can INSERT some test data, then run some requests and those requests will be able to access that data despite you not committing. That's only possible since the request has a new request context, but is reusing the test application context, so it's reusing the existing DB connection. So this really is a feature, not a bug.
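A tiny demonstration of that behaviour (not from the original post; it assumes a pre-2.3 Flask where the internal _app_ctx_stack is still importable):

from flask import Flask, _app_ctx_stack

app = Flask(__name__)

with app.app_context():                   # an app context is already active
    outer = _app_ctx_stack.top
    with app.test_request_context('/'):   # pushing a request context...
        # ...finds the existing app context and reuses it instead of pushing
        # a fresh one, so anything stored on it (like a DB session) is shared.
        assert _app_ctx_stack.top is outer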
That said, it does mean you have to be absolutely sure your teardown functions succeed, using something like the teardown-function wrapper below. That's a good idea even without that feature to avoid leaking memory and DB connections, but is especially important in light of these findings. I'll be submitting a PR to Flask's docs for this reason. (Here it is)
Edit 2014-12-11:
One thing we ended up putting in place was the following code (in our application factory), which wraps every teardown function to make sure it logs the exception and doesn't raise further. This ensures the app context always gets popped successfully. Obviously this has to go after you're sure all teardown functions have been registered.
from functools import wraps

# Flask specifies that teardown functions should not raise.
# However, they might not have their own error handling,
# so we wrap them here to log any errors and prevent errors from
# propagating.
def wrap_teardown_func(teardown_func):
    @wraps(teardown_func)
    def log_teardown_error(*args, **kwargs):
        try:
            teardown_func(*args, **kwargs)
        except Exception as exc:
            app.logger.exception(exc)
    return log_teardown_error

if app.teardown_request_funcs:
    for bp, func_list in app.teardown_request_funcs.items():
        for i, func in enumerate(func_list):
            app.teardown_request_funcs[bp][i] = wrap_teardown_func(func)
if app.teardown_appcontext_funcs:
    for i, func in enumerate(app.teardown_appcontext_funcs):
        app.teardown_appcontext_funcs[i] = wrap_teardown_func(func)
Edit 2014-09-19:
Ok, turns out --reload-on-exception isn't a good idea if 1.) you're using multiple threads and 2.) terminating a thread mid-request could cause trouble. I thought uWSGI would wait for all requests for that worker to finish, like uWSGI's "graceful reload" feature does, but it seems that's not the case. We started having problems where a thread would acquire a dogpile lock in Memcached, then get terminated when uWSGI reloads the worker due to an exception in a different thread, meaning the lock is never released.
Removing SQLALCHEMY_COMMIT_ON_TEARDOWN solved part of our problem, though we're still getting occasional errors during app teardown during session.remove(). It seems these are caused by SQLAlchemy issue 3043, which was fixed in version 0.9.5, so hopefully upgrading to 0.9.5 will allow us to rely on the app context teardown always working.
Original:
How this happened in the first place is still an open question, but I did find a way to prevent it: uWSGI's --reload-on-exception option.
Our Flask app's error handling ought to be catching just about anything, so it can serve a custom error response, which means only the most unexpected exceptions should make it all the way to uWSGI. So it makes sense to reload the whole app whenever that happens.
We'll also turn off SQLALCHEMY_COMMIT_ON_TEARDOWN, though we'll probably commit explicitly rather than writing our own callback for app teardown, since we're writing to the database so rarely.
A surprising thing is that there's no exception handling around that self.session.commit(). A commit can fail, for example if the connection to the DB is lost. So the commit fails, the session is not removed, and the next time that particular thread handles a request it still tries to use that now-invalid session.
Unfortunately, Flask-SQLAlchemy doesn't offer any clean way to supply your own teardown function. One option is to set SQLALCHEMY_COMMIT_ON_TEARDOWN to False and then write your own teardown function.
It should look like this:
@app.teardown_appcontext
def shutdown_session(response_or_exc):
    try:
        if response_or_exc is None:
            sqla.session.commit()
    finally:
        sqla.session.remove()
    return response_or_exc
Now, you will still have your failing commits, and you'll have to investigate that separately... But at least your thread should recover.
