I would like to use psycopg2 (directly, without SQLAlchemy). Also, I would prefer using a connection pool to avoid initializing database connections on every request, as opposed to what (I think?) official docs recommend.
However, Flask app context has approximately the same lifetime as request context, which is the lifetime of the request, so defining the pool there would not make sense. The only cross-request place I found is in a global variable on module level, which seems to work, but I'm worried if this is safe?
In other words, where is the correct place to initialize a DB connection pool in a Flask application so that it is used across requests?
Related
I need to create a connection pool to database which can be reused by requests in Flask. The documentation(0.11.x) suggests to use g, the application context to store the database connections.
The issue is application context is created and destroyed before and after each request. Thus, there is no limit on the number of connection being created and no connection is getting reused. The code I am using is:
def get_some_connection():
if not hasattr(g, 'some_connection'):
logger.info('creating connection')
g.some_connection = SomeConnection()
return g.some_connection
and to close the connection
#app.teardown_appcontext
def destroy_some_connection(error):
logger.info('destroying some connection')
g.some_connection.close()
Is this intentional, that is, flask want to create a fresh connection everytime, or there is some issue with the use of application context in my code. Also, if its intentional is there a workaround to have the connection global. I see, some of the old extensions keep the connection in app['extension'] itself.
No, you'll have to have some kind of global connection pool. g lets you share state across one request, so between various templates and functions called while handling one request, without having to pass that 'global' state around, but it is not meant to be a replacement for module-global variables (which have the same lifetime as the module).
You can certainly set the database connection onto g to ensure all of your request code uses just the one connection, but you are still free to draw the connection from a (module) global pool.
I recommend you create connections per thread and pool these. You can either build this from scratch (use a threading.local object perhaps), or you can use a project like SQLAlchemy which comes with excellent connection pool implementations. This is basically what the Flask-SQLAlchemy extension does.
I am using Flask + SQLAlchemy (DB is Postgres) for my server, and am wondering how connection pooling happens. I know that it is enabled by default with a pool size of 5, but I don't know if my code triggers it.
Assuming I use the default flask SQLalchemy bridge :
db = SQLAlchemy(app)
And then use that object to place database calls like
db.session.query(......)
How does flask-sqlalchemy manage the connection pool behind the scene? Does it grab a new session every time I access db.session? When is this object returned to the pool (assuming I don't store it in a local variable)?
What is the correct pattern to write code to maximize concurrency + performance? If I access the DB multiple times in one serial method, is it a good idea to use db.session every time?
I was unable to find documentation on this manner, so I don't know what is happening behind the scene (the code works, but will it scale?)
Thanks!
You can use event registration - http://docs.sqlalchemy.org/en/latest/core/event.html#event-registration
There are many different event types that can be monitored, checkout, checkin, connect etc... - http://docs.sqlalchemy.org/en/latest/core/events.html
Here is a basic example from the docs on printing a when a new connection is established.
from sqlalchemy.event import listen
from sqlalchemy.pool import Pool
def my_on_connect(dbapi_con, connection_record):
print "New DBAPI connection:", dbapi_con
listen(Pool, 'connect', my_on_connect)
I'm running PostgreSQL 9.3 and SQLAlchemy 0.8.2 and experience database connections leaking. After deploying the app consumes around 240 connections. Over next 30 hours this number gradually grows to 500, when PostgreSQL will start dropping connections.
I use SQLAlchemy thread-local sessions:
from sqlalchemy import orm, create_engine
engine = create_engine(os.environ['DATABASE_URL'], echo=False)
Session = orm.scoped_session(orm.sessionmaker(engine))
For the Flask web app, the .remove() call to the Session proxy-object is send during request teardown:
#app.teardown_request
def teardown_request(exception=None):
if not app.testing:
Session.remove()
This should be the same as what Flask-SQLAlchemy is doing.
I also have some periodic tasks that run in a loop, and I call .remove() for every iteration of the loop:
def run_forever():
while True:
do_stuff(Session)
Session.remove()
What am I doing wrong which could lead to a connection leak?
If I remember correctly from my experiments with SQLAlchemy, the scoped_session() is used to create sessions that you can access from multiple places. That is, you create a session in one method and use it in another without explicitly passing the session object around.
It does that by keeping a list of sessions and associating them with a "scope ID". By default, to obtain a scope ID, it uses the current thread ID; so you have session per thread. You can supply a scopefunc to provide – for example – one ID per request:
# This is (approx.) what flask-sqlalchemy does:
from flask import _request_ctx_stack as context_stack
Session = orm.scoped_session(orm.sessionmaker(engine),
scopefunc=context_stack.__ident_func__)
Also, take note of the other answers and comments about doing background tasks.
First of all, it is a really really bad way to run background tasks. Try any ASync scheduler like celery.
Not 100% sure so this is a bit of a guess based on the information provided, but I wonder if each page load is starting a new db connection which is then listening for notifications. If this is the case, I wonder if the db connection is effectively removed from the pool and so gets created on the next page load.
If this is the case, my recommendation would be to have a separate DBI database handle dedicated to listening for notifications so that these are not active in the queue. This might be done outside your workflow.
Also
Particularly, the leak is happening when making more than one simultaneous requests. At the same time, I could see some of the requests were left with uncompleted query execution and timing out. You can write something to manage this yourself.
It looks like this is what e.g. MongoEngine does. The goal is to have model files be able to access the db without having to explicitly pass around the context.
Pyramid has nothing to do with it. The global needs to handle whatever mechanism the WSGI server is using to serve your application.
For instance, most servers use a separate thread per request, so your global variable needs to be threadsafe. gunicorn and gevent are served using greenlets, which is a different mechanic.
A lot of engines/orms support a threadlocal connection. This will allow you to access your connection as if it were a global variable, but it is a different variable in each thread. You just have to make sure to close the connection when the request is complete to avoid that connection spilling over into the next request in the same thread. This can be done easily using a Pyramid tween or several other patterns illustrated in the cookbook.
I'm writing an application with python and sqlalchemy-0.7. It starts by initializing the sqlalchemy orm (using declarative) and then it starts a multithreaded web server - I'm currently using web.py for rapid prototyping but that could change in the future. I will also add other "threads" for scheduled jobs and so on, probably using other python threads.
From SA documentation I understand I have to use scoped_session() to get a thread-local session, so my web.py app should end up looking something like:
import web
from myapp.model import Session # scoped_session(sessionmaker(bind=engine))
from myapp.model import This, That, AndSoOn
urls = blah...
app = web.application(urls, globals())
class index:
def GET(self):
s = Session()
# get stuff done
Session().remove()
return(stuff)
class foo:
def GET(self):
s = Session()
# get stuff done
Session().remove()
return(stuff)
Is that the Right Way to handle the session?
As far as I understand, I should get a scoped_session at every method since it'll give me a thread local session that I could not obtain beforehand (like at the module level).
Also, I should call .remove() or .commit() or something like them at every method end, otherwise the session will still contain Persistent objects and I would not be able to query/access the same objects in other threads?
If that pattern is the correct one, it could probably be made better by writing it only once, maybe using a decorator? Such a decorator could get the session, invoke the method and then make sure to dispose the session properly. How would that pass the session to the decorated function?
Yes, this is the right way.
Example:
The Flask microframework with Flask-sqlalchemy extension does what you described. It also does .remove() automatically at the end of each HTTP request ("view" functions), so the session is released by the current thread. Calling just .commit() is not sufficient, you should use .remove().
When not using Flask views, I usually use a "with" statement:
#contextmanager
def get_db_session():
try:
yield session
finally:
session.remove()
with get_db_session() as session:
# do something with session
You can create a similar decorator.
Scoped session creates a DBMS connection pool, so this approach will be faster than opening/closing session at each HTTP request. It also works nice with greenlets (gevent or eventlet).
You don't need to create a scoped session if you create new session for each request and each request is handled by single thread.
You have to call s.commit() to make pending objects persistent, i.e. to save changes into database.
You may also want to close session by calling s.close().