Leaking database connections: PostgreSQL, SQLAlchemy, Flask

I'm running PostgreSQL 9.3 and SQLAlchemy 0.8.2 and am experiencing leaking database connections. After deploying, the app consumes around 240 connections. Over the next 30 hours this number gradually grows to 500, at which point PostgreSQL starts dropping connections.
I use SQLAlchemy thread-local sessions:
import os

from sqlalchemy import orm, create_engine

engine = create_engine(os.environ['DATABASE_URL'], echo=False)
Session = orm.scoped_session(orm.sessionmaker(engine))
For the Flask web app, the .remove() call on the Session proxy object is sent during request teardown:
@app.teardown_request
def teardown_request(exception=None):
    if not app.testing:
        Session.remove()
This should be the same as what Flask-SQLAlchemy is doing.
I also have some periodic tasks that run in a loop, and I call .remove() for every iteration of the loop:
def run_forever():
    while True:
        do_stuff(Session)
        Session.remove()
What am I doing wrong which could lead to a connection leak?

If I remember correctly from my experiments with SQLAlchemy, scoped_session() is used to create sessions that you can access from multiple places. That is, you create a session in one method and use it in another without explicitly passing the session object around.
It does that by keeping a registry of sessions and associating each with a "scope ID". By default, it uses the current thread ID as the scope ID, so you get one session per thread. You can supply a scopefunc to provide – for example – one ID per request:
# This is (approx.) what flask-sqlalchemy does:
from flask import _request_ctx_stack as context_stack

Session = orm.scoped_session(orm.sessionmaker(engine),
                             scopefunc=context_stack.__ident_func__)
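To see the registry behaviour in action, here is a small self-contained demonstration (the in-memory SQLite URL and the helper function are my own, purely for illustration): calling the scoped session repeatedly within one thread returns the same object, while another thread gets its own.

import threading
from sqlalchemy import create_engine, orm

demo_engine = create_engine('sqlite://')  # hypothetical in-memory database
DemoSession = orm.scoped_session(orm.sessionmaker(demo_engine))

def show_session_identity(label):
    # DemoSession() returns the session registered for the current
    # scope, which by default is the current thread
    s1 = DemoSession()
    s2 = DemoSession()
    print(label, id(s1), s1 is s2)  # same object within one thread
    DemoSession.remove()  # clear this thread's session from the registry

show_session_identity('main thread:')
worker = threading.Thread(target=show_session_identity, args=('worker thread:',))
worker.start()
worker.join()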
Also, take note of the other answers and comments about doing background tasks.

First of all, a bare while True loop is a really, really bad way to run background tasks. Try an asynchronous task scheduler such as Celery; a sketch follows below.
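For instance, a minimal Celery setup might look like the following sketch (the broker URL, module name, and 60-second interval are illustrative assumptions, not anything from the question):

from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')  # hypothetical broker URL

@celery_app.task
def do_stuff_task():
    try:
        do_stuff(Session)
    finally:
        # always return the session to the registry, even on error
        Session.remove()

# run the task periodically instead of looping forever in a thread
celery_app.conf.beat_schedule = {
    'do-stuff-every-minute': {'task': 'tasks.do_stuff_task', 'schedule': 60.0},
}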
Not 100% sure, so this is a bit of a guess based on the information provided, but I wonder if each page load is starting a new db connection which is then left listening for notifications. If this is the case, that connection is effectively removed from the pool and a new one gets created on the next page load.
If this is the case, my recommendation would be to have a separate database connection dedicated to listening for notifications, so that listeners never tie up connections in the pool. This might be done outside your request workflow, for example:
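Here is a sketch of that dedicated-listener idea, assuming psycopg2 and a hypothetical channel name my_channel (neither appears in the question):

import os
import select

import psycopg2
import psycopg2.extensions

# one long-lived connection reserved for LISTEN/NOTIFY,
# kept entirely outside SQLAlchemy's pool
listen_conn = psycopg2.connect(os.environ['DATABASE_URL'])
listen_conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = listen_conn.cursor()
cur.execute('LISTEN my_channel;')

while True:
    # wait up to 5 seconds for the socket to become readable
    if select.select([listen_conn], [], [], 5) == ([], [], []):
        continue
    listen_conn.poll()
    while listen_conn.notifies:
        notify = listen_conn.notifies.pop(0)
        print('got NOTIFY:', notify.pid, notify.channel, notify.payload)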
Also
In particular, the leak happens when making more than one simultaneous request. At the same time, I could see that some of the requests were left with uncompleted query executions and timed out. You can write something to manage this yourself.

Related

How is SQLAlchemy's session used concurrently in the case of fork (uwsgi/celery prefork)?

For example, it was found during development that uwsgi and celery both use multiple processes via prefork, but if the session has already been created at application initialization time, the forked processes share the same session. How can this situation be avoided?
A SQLAlchemy session should be short-lived. In a web application, you typically create a session at the start of a request, and remove the session when the request ends.
So each request gets its own session.
Using the session properly is important in SQLAlchemy; I suggest you read this part of the documentation: https://docs.sqlalchemy.org/en/latest/orm/session.html
From When do I construct a Session, when do I commit it, and when do I close it?:
A web application is the easiest case because such an application is
already constructed around a single, consistent scope - this is the
request, which represents an incoming request from a browser, the
processing of that request to formulate a response, and finally the
delivery of that response back to the client. Integrating web
applications with the Session is then the straightforward task of
linking the scope of the Session to that of the request. The Session
can be established as the request begins, or using a lazy
initialization pattern which establishes one as soon as it is needed.
The request then proceeds, with some system in place where application
logic can access the current Session in a manner associated with how
the actual request object is accessed. As the request ends, the
Session is torn down as well, usually through the usage of event hooks
provided by the web framework. The transaction used by the Session may
also be committed at this point, or alternatively the application may
opt for an explicit commit pattern, only committing for those requests
where one is warranted, but still always tearing down the Session
unconditionally at the end.
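A minimal sketch of this request-scoped pattern, combined with the usual fix for the prefork problem above (dispose of the engine after the fork so each worker builds its own connection pool; the uwsgidecorators.postfork hook and the database URL are illustrative assumptions):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('postgresql:///example')  # hypothetical URL
Session = scoped_session(sessionmaker(bind=engine))

def handle_request():
    # each request gets its own short-lived session
    session = Session()
    try:
        # ... query and modify objects via `session` ...
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        Session.remove()  # tear the session down unconditionally

# under uwsgi prefork, give every forked worker a fresh pool
try:
    from uwsgidecorators import postfork

    @postfork
    def reset_pool():
        engine.dispose()  # drop connections inherited from the parent
except ImportError:
    pass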

SQLAlchemy + pyTelegramBotAPI: SQLite objects created in a thread can only be used in that same thread

I have a real headache from trying to understand the cause of the following problem. We are using a combination of the following libraries:
pyTelegramBotAPI to process requests in a multi-threaded way
SQLAlchemy
sqlite
SQLAlchemy was first using NullPool and is now configured to use QueuePool. I am also using the following idiom so that a new DB session fires up for each thread (as per my understanding):
from contextlib import contextmanager

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

Session = sessionmaker(bind=create_engine(classes.db_url, poolclass=QueuePool))

@contextmanager
def session_scope():
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()

@bot.message_handler(content_types=['document'])
def method_handler(message):
    with session_scope() as session:
        do_database_stuff_here(session)
Nevertheless I am still getting this annoying exception: (sqlite3.ProgrammingError) SQLite objects created in a thread can only be used in that same thread
Any ideas? ;) In particular, I don't get how it is possible for another thread to get in between the db operations... this is probably the cause of the pesky exception.
Update 1: if I change the poolclass to SingletonThreadPool, there seem to be no more errors coming up. However, the SQLAlchemy documentation says SingletonThreadPool is not production-ready.
As you can see in the source, sqlite will raise this exception inside pysqlite_check_thread if the connection object is reused across any threads.
By using a QueuePool, you are telling SQLAlchemy it is safe to reuse connections across multiple threads. It will therefore just pick a connection from the pool for any session, no matter which thread the session is on. This is why you're hitting the error: the first time you create and use a connection you'll be fine, but the next use will probably happen on a different thread and so fail the check.
This is why SQLAlchemy mandates the use of other pools such as SingletonThreadPool and NullPool.
Assuming you are using a file-based database, you should use the NullPool. This will give you good concurrency on reads. Write concurrency is always going to be an issue for sqlite; if you need that, you probably want a different database.
Something that may be worth trying: use scoped_session instead of your contextmanager; scoped_session implicitly creates a thread-local session when it is accessed from a different thread. Be sure also to use NullPool.
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.pool import NullPool

engine = create_engine(classes.db_url, poolclass=NullPool)
session = scoped_session(sessionmaker(bind=engine))
Note that you can use this scoped session directly as if it were just a regular session, even though it is actually creating thread-local sessions behind the scenes when it is being used.
With scoped_session, you should call session.remove() after you're done (i.e., after each method_handler) and explicitly call session.commit() as needed.
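For example, a handler using the scoped session might look like the sketch below (the message parameter and the commit/rollback handling are my assumptions for illustration):

@bot.message_handler(content_types=['document'])
def method_handler(message):
    try:
        # the scoped session transparently resolves to a
        # thread-local session for whichever thread runs this handler
        do_database_stuff_here(session)
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        # discard this thread's session so the next message starts fresh
        session.remove()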
In theory, your context manager should work in giving each thread its own session, but, for lack of better explanation, I wonder if there are multiple threads accessing that session within the context.

Triggering connection pools with sqlalchemy in flask

I am using Flask + SQLAlchemy (the DB is Postgres) for my server, and am wondering how connection pooling happens. I know that it is enabled by default with a pool size of 5, but I don't know whether my code triggers it.
Assuming I use the default Flask-SQLAlchemy bridge:
db = SQLAlchemy(app)
And then use that object to place database calls like
db.session.query(......)
How does Flask-SQLAlchemy manage the connection pool behind the scenes? Does it grab a new session every time I access db.session? When is this object returned to the pool (assuming I don't store it in a local variable)?
What is the correct pattern to write code to maximize concurrency + performance? If I access the DB multiple times in one serial method, is it a good idea to use db.session every time?
I was unable to find documentation on this matter, so I don't know what is happening behind the scenes (the code works, but will it scale?)
Thanks!
You can use event registration - http://docs.sqlalchemy.org/en/latest/core/event.html#event-registration
There are many different event types that can be monitored: checkout, checkin, connect, etc. - http://docs.sqlalchemy.org/en/latest/core/events.html
Here is a basic example from the docs that prints a message when a new connection is established.
from sqlalchemy.event import listen
from sqlalchemy.pool import Pool

def my_on_connect(dbapi_con, connection_record):
    print("New DBAPI connection:", dbapi_con)

listen(Pool, 'connect', my_on_connect)
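To watch pool usage under Flask-SQLAlchemy, you could also listen for checkout/checkin events. A hypothetical sketch (the counter is purely for illustration):

import itertools

from sqlalchemy import event
from sqlalchemy.pool import Pool

checkouts = itertools.count(1)

@event.listens_for(Pool, 'checkout')
def on_checkout(dbapi_con, connection_record, connection_proxy):
    # fires every time a connection is handed out from the pool
    print('checkout #%d: %r' % (next(checkouts), dbapi_con))

@event.listens_for(Pool, 'checkin')
def on_checkin(dbapi_con, connection_record):
    # fires when a connection is returned to the pool,
    # e.g. after db.session.remove() at the end of a request
    print('checkin: %r' % (dbapi_con,))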

Multi-threaded use of SQLAlchemy

I want to make a database API written in Python using SQLAlchemy (or any other database connector, if I'm told that SQLAlchemy is not the right way to go for this kind of task). The setup is a MySQL server running on Linux or BSD and the Python software running on a Linux or BSD machine (either remote or local).
Basically, what I want to do is spawn a new thread for each connection; the protocol would be custom and quite simple. For each request I would like to open a new transaction (or session, as I have read) and then commit the session. The problem I am facing right now is that there is a high probability that sessions from other connections will be running at the same time.
My question here is what should I do to handle this situation?
Should I use a lock so only a single session can run at the same time?
Are sessions actually thread-safe and I am wrong about thinking that they are not?
Is there a better way to handle this situation?
Is threading the way not-to-go?
Session objects are not thread-safe, but SQLAlchemy can give you thread-local ones. From the docs:
"The Session object is entirely designed to be used in a non-concurrent fashion, which in terms of multithreading means "only in one thread at a time" .. some process needs to be in place such that mutltiple calls across many threads don’t actually get a handle to the same session. We call this notion thread local storage."
If you don't want to do the work of managing threads and sessions yourself, SQLAlchemy has the ScopedSession object to take care of this for you:
The ScopedSession object by default uses threading.local() as storage, so that a single Session is maintained for all who call upon the ScopedSession registry, but only within the scope of a single thread. Callers who call upon the registry in a different thread get a Session instance that is local to that other thread.
Using this technique, the ScopedSession provides a quick and relatively simple way of providing a single, global object in an application that is safe to be called upon from multiple threads.
See the examples in Contextual/Thread-local Sessions for setting up your own thread-safe sessions:
# set up a scoped_session
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
session_factory = sessionmaker(bind=some_engine)
Session = scoped_session(session_factory)
# now all calls to Session() will create a thread-local session
some_session = Session()
# you can now use some_session to run multiple queries, etc.
# remember to close it when you're finished!
Session.remove()
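As a small demonstration of that behaviour, each thread below transparently gets its own session from the same global registry (the worker function and thread count are just for illustration):

import threading

def worker():
    # Session() inside this thread returns a session that is local
    # to this thread, distinct from the main thread's session
    local_session = Session()
    # ... run queries with local_session ...
    Session.remove()  # release this thread's session when done

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()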

What's the recommended scoped_session usage pattern in a multithreaded sqlalchemy webapp?

I'm writing an application with python and sqlalchemy-0.7. It starts by initializing the sqlalchemy orm (using declarative) and then it starts a multithreaded web server - I'm currently using web.py for rapid prototyping but that could change in the future. I will also add other "threads" for scheduled jobs and so on, probably using other python threads.
From SA documentation I understand I have to use scoped_session() to get a thread-local session, so my web.py app should end up looking something like:
import web
from myapp.model import Session # scoped_session(sessionmaker(bind=engine))
from myapp.model import This, That, AndSoOn

urls = blah...
app = web.application(urls, globals())

class index:
    def GET(self):
        s = Session()
        # get stuff done
        Session.remove()
        return(stuff)

class foo:
    def GET(self):
        s = Session()
        # get stuff done
        Session.remove()
        return(stuff)
Is that the Right Way to handle the session?
As far as I understand, I should get a scoped_session at every method since it'll give me a thread local session that I could not obtain beforehand (like at the module level).
Also, I should call .remove() or .commit() or something like them at the end of every method, otherwise the session will still contain Persistent objects and I would not be able to query/access the same objects in other threads?
If that pattern is the correct one, it could probably be made better by writing it only once, maybe using a decorator? Such a decorator could get the session, invoke the method and then make sure to dispose the session properly. How would that pass the session to the decorated function?
Yes, this is the right way.
Example:
The Flask microframework with the Flask-SQLAlchemy extension does what you described. It also calls .remove() automatically at the end of each HTTP request ("view" functions), so the session is released by the current thread. Calling just .commit() is not sufficient; you should also use .remove().
When not using Flask views, I usually use a "with" statement:
from contextlib import contextmanager

# `session` is assumed to be a scoped_session registry created elsewhere
@contextmanager
def get_db_session():
    try:
        yield session
    finally:
        session.remove()

with get_db_session() as session:
    # do something with session
    ...
You can create a similar decorator.
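For example, a hypothetical decorator along those lines, which passes the thread-local session into the wrapped function as its first argument (the name with_session, the argument convention, and the User model are illustrative assumptions):

from functools import wraps

def with_session(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        db = Session()  # thread-local session from the scoped_session registry
        try:
            return func(db, *args, **kwargs)
        finally:
            Session.remove()  # always dispose of the session afterwards
    return wrapper

@with_session
def list_users(db):
    return db.query(User).all()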
The engine behind the scoped session maintains a DBMS connection pool, so this approach is faster than opening and closing a database connection for each HTTP request. It also works nicely with greenlets (gevent or eventlet).
You don't need to create a scoped session if you create a new session for each request and each request is handled by a single thread.
You have to call s.commit() to make pending objects persistent, i.e. to save changes into the database.
You may also want to close the session by calling s.close().
