Flask's application context and global connection - Python

I need to create a connection pool for database connections that can be reused across requests in Flask. The documentation (0.11.x) suggests using g, the application context, to store database connections.
The issue is that the application context is created and destroyed with each request, so there is no limit on the number of connections being created and no connection is reused. The code I am using is:
def get_some_connection():
    if not hasattr(g, 'some_connection'):
        logger.info('creating connection')
        g.some_connection = SomeConnection()
    return g.some_connection
and to close the connection:
@app.teardown_appcontext
def destroy_some_connection(error):
    logger.info('destroying some connection')
    g.some_connection.close()
Is this intentional, that is, does Flask want to create a fresh connection every time, or is there some issue with the use of the application context in my code? Also, if it's intentional, is there a workaround to make the connection global? I see that some of the old extensions keep the connection in app['extension'] itself.

No, you'll have to have some kind of global connection pool. g lets you share state within one request, between the various templates and functions called while handling it, without having to pass that 'global' state around, but it is not meant to be a replacement for module-global variables (which have the same lifetime as the module).
You can certainly set the database connection onto g to ensure all of your request code uses just the one connection, but you are still free to draw the connection from a (module) global pool.
I recommend you create connections per thread and pool these. You can either build this from scratch (use a threading.local object perhaps), or you can use a project like SQLAlchemy which comes with excellent connection pool implementations. This is basically what the Flask-SQLAlchemy extension does.
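For illustration, a minimal sketch of that thread-local approach, reusing the hypothetical SomeConnection from the question (an assumption-laden sketch, not Flask's own API):
import threading

from flask import g

_pool = threading.local()  # module-global: lives as long as the module

def get_some_connection():
    # Create at most one connection per thread and reuse it.
    if not hasattr(_pool, 'connection'):
        _pool.connection = SomeConnection()  # hypothetical class from the question
    # Still expose it on g so request code has a single access point.
    g.some_connection = _pool.connection
    return g.some_connection
With this variant the teardown handler should no longer close the connection, since it is meant to outlive the request.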

Related

Where should DB connection pool be initialized in Flask?

I would like to use psycopg2 (directly, without SQLAlchemy). Also, I would prefer using a connection pool to avoid initializing database connections on every request, as opposed to what (I think?) the official docs recommend.
However, the Flask app context has approximately the same lifetime as the request context, which is the lifetime of the request, so defining the pool there would not make sense. The only cross-request place I found is a module-level global variable, which seems to work, but I'm worried whether this is safe.
In other words, where is the correct place to initialize a DB connection pool in a Flask application so that it is used across requests?
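Not an official recommendation, but a minimal sketch of the module-level approach using psycopg2's built-in ThreadedConnectionPool (the DSN and pool sizes are placeholders):
# app/db.py
import psycopg2.pool
from flask import g

# Created once at import time; shared across requests and threads.
pool = psycopg2.pool.ThreadedConnectionPool(
    minconn=1, maxconn=10,
    dsn='dbname=mydb user=myuser')  # placeholder DSN

def get_conn():
    if 'db_conn' not in g:
        g.db_conn = pool.getconn()
    return g.db_conn

def return_conn(e=None):
    conn = g.pop('db_conn', None)
    if conn is not None:
        pool.putconn(conn)  # return to the pool rather than closing

# in your app factory: app.teardown_appcontext(return_conn)
The pool itself is thread-safe, but note that a forking server still needs a separate pool per worker process.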

SQLAlchemy - work with connections to DB and Sessions (not clear behavior and part in documentation)

I use SQLAlchemy (a really good ORM, but its documentation is not clear enough) for communicating with PostgreSQL.
Everything was great until one case when Postgres "crashed" because the maximum connection limit was reached: no more connections allowed (max_client_conn).
That case makes me think that I am doing something wrong. After a few experiments I figured out how not to face that issue again, but some questions remain.
Below you'll see code examples (Python 3+, default PostgreSQL settings) without and with the mentioned issue, and what I'd like to hear eventually are answers to the following questions:
What exactly does the context manager do with connections and sessions? Closing the session and disposing of the connection, or what?
Why does the first working example of code behave like the example with the issue without NullPool as poolclass in the "connect" method?
Why did I get only 1 connection to the db for all queries in the first example, but a separate connection for each query in the second? (please correct me if I understood it wrong; I was checking it with "pgbouncer")
What are the best practices for opening and closing connections (and/or working with Session) when you use SQLAlchemy and a PostgreSQL DB for multiple instances of a script (or separate threads in a script) that listen for requests and need a separate session for each of them? (I mean raw SQLAlchemy, not Flask-SQLAlchemy or something like it)
Working example of code without issue:
making a connection to the DB:
import sqlalchemy
from sqlalchemy.pool import NullPool  # does not work without NullPool, why?

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
function to get a session to work with the DB:
from contextlib import contextmanager
from sqlalchemy.orm import sessionmaker

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    con_loc, meta_loc = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con_loc)
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
query example:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
Failing example of code:
function to get a session to work with the DB:
def create_session():
    # connect method the same as in first example
    con, meta = connect(db_user, db_pass, db_instance, 'localhost')
    Session = sessionmaker(bind=con)
    session = Session()
    return session
query example:
session = create_session()
entity = session.query(SomeEntity).first()
Hope you got the main idea
First of all, you should not create engines repeatedly in your connect() function. The usual practice is to have a single global Engine instance per database URL in your application. The same goes for the Session class created by sessionmaker().
What exactly does the context manager do with connections and sessions? Closing the session and disposing of the connection, or what?
What you've programmed it to do, and if this seems unclear, read about context managers in general. In this case it commits or rolls back the session if an exception was raised within the block governed by the with-statement. Both actions return the connection used by the session to the pool, which in your case is a NullPool, so the connection is simply closed.
Why does the first working example of code behave like the example with the issue without NullPool as poolclass in the "connect" method?
and
from sqlalchemy.pool import NullPool # does not work without NullPool, why?
Without NullPool the engines you repeatedly create also pool connections, so if they for some reason do not go out of scope, or their refcounts are otherwise not zeroed, they will hold on to the connections even if the sessions return them. It is unclear if the sessions go out of scope timely in the second example, so they might also be holding on to the connections.
Why did I get only 1 connection to the db for all queries in the first example, but a separate connection for each query in the second? (please correct me if I understood it wrong; I was checking it with "pgbouncer")
The first example ends up closing the connection due to the use of the context manager that handles transactions properly and the NullPool, so the connection is returned to the bouncer, which is another pool layer.
The second example might never close the connections because it lacks the transaction handling, but that's unclear due to the example given. It also might be holding on to connections in the separate engines that you create.
The 4th point of your question set is pretty much covered by the official documentation in "Session Basics", especially "When do I construct a Session, when do I commit it, and when do I close it?" and "Is the session thread-safe?".
There's one exception: multiple instances of the script. You should not share an engine between processes, so in order to pool connections between them you need an external pool such as PgBouncer.
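For illustration, a sketch of that setup: each process creates its own engine pointing at a local PgBouncer (the port 6432 and credentials are assumptions), and NullPool leaves all pooling to the bouncer:
import sqlalchemy
from sqlalchemy.pool import NullPool

# Each process builds its own engine; PgBouncer (assumed to listen on
# localhost:6432) does the cross-process connection pooling.
engine = sqlalchemy.create_engine(
    'postgresql://user:password@localhost:6432/mydb',  # placeholder URL
    poolclass=NullPool)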
What exactly does the context manager do with connections and sessions? Closing the session and disposing of the connection, or what?
The context manager in Python is used to create a runtime context for use with the with statement. Simply put, when you run the code:
with session_scope() as session:
    entity = session.query(SomeEntity).first()
session is the yielded session. So, to your question of what the context manager does with the connections and sessions, all you have to do is look at what happens after the yield. In this case it's just:
try:
    yield session
    session.commit()
except:
    session.rollback()
    raise
If you trigger no exceptions, it will be session.commit(), which according to the SQLAlchemy docs will "Flush pending changes and commit the current transaction."
Why does the first working example of code behave like the example with the issue without NullPool as poolclass in the "connect" method?
The poolclass argument just tells SQLAlchemy which subclass of Pool to use. However, when you pass NullPool here, you are telling SQLAlchemy not to use a pool; you're effectively disabling connection pooling. From the docs: "to disable pooling, set poolclass to NullPool instead." I can't say for sure, but using NullPool is probably contributing to your max_connection issues.
Why did I get only 1 connection to the db for all queries in the first example, but a separate connection for each query in the second? (please correct me if I understood it wrong; I was checking it with "pgbouncer")
I'm not exactly sure. I think this has to do with how, in the first example, you are using a context manager, so everything within the with block uses the session generator. In your second example, you created a function that initializes a new Session and returns it, so you're not getting back a generator. I also think this has to do with your use of NullPool, which prevents connection pooling: with NullPool, each query execution acquires a connection on its own.
What are the best practices for opening and closing connections (and/or working with Session) when you use SQLAlchemy and a PostgreSQL DB for multiple instances of a script (or separate threads in a script) that listen for requests and need a separate session for each of them? (I mean raw SQLAlchemy, not Flask-SQLAlchemy or something like it)
See the section "Is the session thread-safe?" for this; you need to take a "share nothing" approach to your concurrency. So in your case, each instance of the script should share nothing with the others.
You probably want to check out Working with Engines and Connections. I don't think messing with sessions is where you want to be if concurrency is what you're working on. There's more information about the NullPool and concurrency there:
For a multiple-process application that uses the os.fork system call, or for example the Python multiprocessing module, it's usually required that a separate Engine be used for each child process. This is because the Engine maintains a reference to a connection pool that ultimately references DBAPI connections - these tend to not be portable across process boundaries. An Engine that is configured not to use pooling (which is achieved via the usage of NullPool) does not have this requirement.
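A minimal sketch of that per-process pattern with multiprocessing (the URL is a placeholder; dispose() discards the pool inherited from the parent):
import multiprocessing
import sqlalchemy

engine = sqlalchemy.create_engine('postgresql://user:pw@localhost/mydb')  # placeholder

def worker():
    # Drop connections inherited across the fork so this child
    # process establishes its own DBAPI connections.
    engine.dispose()
    with engine.connect() as conn:
        conn.execute(sqlalchemy.text('SELECT 1'))

if __name__ == '__main__':
    for _ in range(2):
        multiprocessing.Process(target=worker).start()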
@Ilja Everilä's answer was mostly helpful.
I'll leave the edited code here; maybe it'll help someone.
The new code that works as I expected is the following:
making a connection to the DB:
import sqlalchemy
from sqlalchemy.pool import NullPool  # will work even without NullPool in code

def connect(user, password, db, host='localhost', port=5432):
    """Returns a connection and a metadata object"""
    url = 'postgresql://{}:{}@{}:{}/{}'.format(user, password, host, port, db)
    temp_con = sqlalchemy.create_engine(url, client_encoding='utf8', poolclass=NullPool)
    temp_meta = sqlalchemy.MetaData(bind=temp_con, reflect=True)
    return temp_con, temp_meta
one instance of the connection and sessionmaker per app, for example where your main function is:
from sqlalchemy.orm import sessionmaker

# create one connection and sessionmaker per instance of the app
# (to avoid creating them repeatedly)
con, meta = connect(db_user, db_pass, db_instance, db_host)
session_maker = sessionmaker(bind=con)
function to get a session using the with statement:
from contextlib import contextmanager

from sqlalchemy.orm import Session
from some_place import session_maker

@contextmanager
def session_scope() -> Session:
    """Provide a transactional scope around a series of operations."""
    session = session_maker()  # create session from SQLAlchemy sessionmaker
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
wrap transaction and use session:
with session_scope() as session:
    entity = session.query(SomeEntity).first()

Managing connection creation in Python?

Applications often need to connect to other services (a database, a cache, an API, etc.). For sanity and DRY, we'd like to keep all of these connections in one module so the rest of our code base can share them.
To reduce boilerplate, downstream usage should be simple:
# app/do_stuff.py
from .connections import AwesomeDB

db = AwesomeDB()

def get_stuff():
    return db.get('stuff')
And setting up the connection should also be simple:
# app/cli.py or some other main entry point
from .connections import AwesomeDB

db = AwesomeDB()
db.init(username='stuff admin')  # Or os.environ['DB_USER']
Web frameworks like Django and Flask do something like this, but it feels a bit clunky:
Connect to a Database in Flask, Which Approach is better?
http://flask.pocoo.org/docs/0.10/tutorial/dbcon/
One big issue with this is that we want a reference to the actual connection object instead of a proxy, because we want to retain tab-completion in IPython and other dev environments.
So what's the Right Way (tm) to do it? After a few iterations, here's my idea:
# app/connections.py
from awesome_database import AwesomeDB as RealAwesomeDB
from horrible_database import HorribleDB as RealHorribleDB

class ConnectionMixin(object):
    __connection = None

    def __new__(cls):
        cls.__connection = cls.__connection or object.__new__(cls)
        return cls.__connection

    def __init__(self, real=False, **kwargs):
        if real:
            super().__init__(**kwargs)

    def init(self, **kwargs):
        kwargs['real'] = True
        self.__init__(**kwargs)

class AwesomeDB(ConnectionMixin, RealAwesomeDB):
    pass

class HorribleDB(ConnectionMixin, RealHorribleDB):
    pass
Room for improvement: set the initial __connection to a generic ConnectionProxy instead of None, one that catches all attribute access and throws an exception.
I've done quite a bit of poking around here on SO and in various OSS projects and haven't seen anything like this. It feels pretty solid, though it does mean a bunch of modules will be instantiating connection objects as a side effect at import time. Will this blow up in my face? Are there any other negative consequences to this approach?
First, design-wise, I might be missing something, but I don't see why you need the heavy mixin+singleton machinery instead of just defining a helper like so:
_awesome_db = None

def awesome_db(**overrides):
    global _awesome_db
    if _awesome_db is None:
        # Read config/set defaults.
        # overrides.setdefault(...)
        _awesome_db = RealAwesomeDB(**overrides)
    return _awesome_db
Also, there is a bug that might not look like a supported use case, but anyway: if you make the following two calls in a row, you would wrongly get the same connection object twice even though you passed different parameters:
db = AwesomeDB()
db.init(username='stuff admin')
db = AwesomeDB()
db.init(username='not-admin') # You'll get admin connection here.
An easy fix for that would be to use a dict of connections keyed on the input parameters.
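For illustration, a sketch of that fix on top of the helper above (assuming the parameters are hashable keyword arguments):
_connections = {}

def awesome_db(**overrides):
    # Key the cache on the parameters, so different settings
    # yield different connection objects.
    key = tuple(sorted(overrides.items()))
    if key not in _connections:
        _connections[key] = RealAwesomeDB(**overrides)
    return _connections[key]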
Now, on the essence of the question.
I think the answer depends on how your "connection" classes are actually implemented.
Potential downsides with your approach I see are:
In a multithreaded environment you could get problems with unsynchronized concurrent access to the global connection object from multiple threads, unless it is already thread-safe. If you care about that, you could change your code and interface a bit and use a thread-local variable (see the sketch after this list).
What if a process forks after creating the connection? Web application servers tend to do that and it might not be safe, again depending on the underlying connection.
Does the connection object have state? What happens if the connection object becomes invalid (due to, e.g., a connection error or timeout)? You might need to replace the broken connection with a new one to return the next time a connection is requested.
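As a sketch of the thread-local variant mentioned in the first point (RealAwesomeDB stands in for your actual connection class):
import threading

_local = threading.local()

def awesome_db(**overrides):
    # Each thread lazily creates and then reuses its own connection.
    if not hasattr(_local, 'db'):
        _local.db = RealAwesomeDB(**overrides)
    return _local.db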
Connection management is often already efficiently and safely implemented through a connection pool in client libraries.
For example, the redis-py Redis client uses the following implementation:
https://github.com/andymccurdy/redis-py/blob/1c2071762ad9b9288e786665990083e61c1cf355/redis/connection.py#L974
The Redis client then uses the connection pool like so:
Requests a connection from the connection pool.
Tries to execute a command on the connection.
If the connection fails, the client closes it.
In any case, it is finally returned to the connection pool so it can be reused by subsequent calls or other threads.
So since the Redis client handles all of that under the hood, you can safely do what you want directly. Connections will be lazily created until the connection pool reaches full capacity.
# app/connections.py
import redis

def redis_client(**kwargs):
    # Maybe read configuration/set default arguments
    # kwargs.setdefault()
    return redis.Redis(**kwargs)
Similarly, SQLAlchemy can use connection pooling as well.
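For example, a minimal sketch with SQLAlchemy's default QueuePool (the URL and pool sizes are placeholders):
import sqlalchemy

# One module-level engine; its internal QueuePool shares
# connections across threads.
engine = sqlalchemy.create_engine(
    'postgresql://user:pw@localhost/mydb',  # placeholder URL
    pool_size=5, max_overflow=10)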
To summarize, my understanding is that:
If your client library supports connection pooling, you don't need to do anything special to share connections between modules and even threads. You could just define a helper similar to redis_client() that reads configuration, or specifies default parameters.
If your client library provides only low-level connection objects, you will need to make sure access to them is thread-safe and fork-safe. You also need to make sure you return a valid connection each time (or raise an exception if you can't establish one or reuse an existing one).

Leaking database connections: PostgreSQL, SQLAlchemy, Flask

I'm running PostgreSQL 9.3 and SQLAlchemy 0.8.2 and am experiencing leaking database connections. After deploying, the app consumes around 240 connections. Over the next 30 hours this number gradually grows to 500, at which point PostgreSQL starts dropping connections.
I use SQLAlchemy thread-local sessions:
import os

from sqlalchemy import orm, create_engine

engine = create_engine(os.environ['DATABASE_URL'], echo=False)
Session = orm.scoped_session(orm.sessionmaker(engine))
For the Flask web app, the .remove() call on the Session proxy object is sent during request teardown:
@app.teardown_request
def teardown_request(exception=None):
    if not app.testing:
        Session.remove()
This should be the same as what Flask-SQLAlchemy is doing.
I also have some periodic tasks that run in a loop, and I call .remove() for every iteration of the loop:
def run_forever():
    while True:
        do_stuff(Session)
        Session.remove()
What am I doing wrong which could lead to a connection leak?
If I remember correctly from my experiments with SQLAlchemy, scoped_session() is used to create sessions that you can access from multiple places. That is, you create a session in one method and use it in another without explicitly passing the session object around.
It does that by keeping a registry of sessions and associating them with a "scope ID". By default, to obtain a scope ID, it uses the current thread ID, so you have a session per thread. You can supply a scopefunc to provide, for example, one ID per request:
# This is (approx.) what flask-sqlalchemy does:
from flask import _request_ctx_stack as context_stack

Session = orm.scoped_session(orm.sessionmaker(engine),
                             scopefunc=context_stack.__ident_func__)
Also, take note of the other answers and comments about doing background tasks.
First of all, that is a really bad way to run background tasks. Try an async scheduler like Celery.
Not 100% sure, so this is a bit of a guess based on the information provided, but I wonder if each page load is starting a new db connection which then listens for notifications. If this is the case, I wonder if the db connection is effectively removed from the pool and so gets created anew on the next page load.
If this is the case, my recommendation would be to have a separate DBI database handle dedicated to listening for notifications so that these are not active in the queue. This might be done outside your workflow.
Also
In particular, the leak happens when making more than one simultaneous request. At the same time, I could see that some of the requests were left with uncompleted query executions and timed out. You can write something to manage this yourself.

In Pyramid, is it safe to have a Python global variable that stores the db connection?

It looks like this is what, e.g., MongoEngine does. The goal is for model files to be able to access the db without having to explicitly pass the context around.
Pyramid has nothing to do with it. The global needs to handle whatever mechanism the WSGI server is using to serve your application.
For instance, most servers use a separate thread per request, so your global variable needs to be thread-safe. gunicorn with gevent workers serves requests using greenlets, which is a different mechanism.
A lot of engines/ORMs support a thread-local connection. This allows you to access your connection as if it were a global variable, but it is a different variable in each thread. You just have to make sure to close the connection when the request is complete, to avoid that connection spilling over into the next request in the same thread. This can be done easily using a Pyramid tween (see the sketch below) or several other patterns illustrated in the cookbook.
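For illustration, a hedged sketch of such a tween; get_connection and make_connection are hypothetical stand-ins for your driver's API:
import threading

_local = threading.local()

def get_connection():
    # One lazily created connection per thread.
    if not hasattr(_local, 'conn'):
        _local.conn = make_connection()  # your driver's connect call (hypothetical)
    return _local.conn

def connection_tween_factory(handler, registry):
    def tween(request):
        try:
            return handler(request)
        finally:
            # Close the thread's connection so it does not spill
            # over into the next request served by this thread.
            conn = getattr(_local, 'conn', None)
            if conn is not None:
                conn.close()
                del _local.conn
    return tween

# during configuration:
# config.add_tween(__name__ + '.connection_tween_factory')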
