I have a multi-tenant Python Falcon app. Every tenant has its own database, so on each incoming request I need to connect to that tenant's database.
There is a complication: the database configs are stored in another service, and they change regularly.
I tried creating the session in process_resource, before the resource is processed, but SQL queries slowed down after this change. What should I do to make this faster?
P.S.: I use PostgreSQL.
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
import config
import json
import requests

class DatabaseMiddleware:
    def __init__(self):
        pass

    def process_resource(self, req, resp, resource, params):
        # Fetch the database config from the config service on every request
        engineConfig = requests.get('http://database:10003/v1/databases?loadOnly=config&appId=06535108-111a-11e9-ab14-d663bd873d93').text
        engineConfig = json.loads(engineConfig)
        # A new engine (and with it a new connection pool) is created per request
        engine = create_engine(
            '{dms}://{user}:{password}@{host}:{port}/{dbName}'.format(
                dms=engineConfig[0]['config']['dms'],
                user=engineConfig[0]['config']['user'],
                password=engineConfig[0]['config']['password'],
                host=engineConfig[0]['config']['host'],
                port=engineConfig[0]['config']['port'],
                dbName=engineConfig[0]['config']['dbName']
            ))
        session_factory = sessionmaker(bind=engine, autoflush=True)
        databaseSession = scoped_session(session_factory)
        resource.databaseSession = databaseSession

    def process_response(self, req, resp, resource, req_succeeded):
        # Check for the attribute that process_resource actually sets
        if hasattr(resource, 'databaseSession'):
            if not req_succeeded:
                resource.databaseSession.rollback()
            resource.databaseSession.remove()
Your approach is probably wrong, since it goes against the intended usage pattern of engine instances described in the SQLAlchemy documentation on engine disposal. The lifetime of the engine instance should be the same as that of your middleware instance.
The Engine refers to a connection pool, which means under normal circumstances, there are open database connections present while the Engine object is still resident in memory. When an Engine is garbage collected, its connection pool is no longer referred to by that Engine, and assuming none of its connections are still checked out, the pool and its connections will also be garbage collected, which has the effect of closing out the actual database connections as well. But otherwise, the Engine will hold onto open database connections assuming it uses the normally default pool implementation of QueuePool.
The Engine is intended to normally be a permanent fixture established up-front and maintained throughout the lifespan of an application. It is not intended to be created and disposed on a per-connection basis; it is instead a registry that maintains both a pool of connections as well as configurational information about the database and DBAPI in use, as well as some degree of internal caching of per-database resources.
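For the multi-tenant case in the question, one way to follow this advice while the configs still change is to keep one engine per distinct connection URL and reuse it across requests. The sketch below is only an illustration under assumptions: `EngineRegistry` and its `factory` argument are hypothetical names, not part of SQLAlchemy or Falcon. In a real app the factory would typically be `sqlalchemy.create_engine`; a stand-in factory is used here so the example runs without a database.

```python
import threading


class EngineRegistry:
    """Caches one engine-like object per connection URL (hypothetical helper)."""

    def __init__(self, factory):
        # factory: callable that takes a URL and returns an engine,
        # e.g. sqlalchemy.create_engine (assumption; not imported here).
        self._factory = factory
        self._engines = {}
        self._lock = threading.Lock()

    def get(self, url):
        # Reuse the engine (and therefore its connection pool) when the URL
        # is unchanged; build a new one only for a URL not seen before.
        with self._lock:
            engine = self._engines.get(url)
            if engine is None:
                engine = self._factory(url)
                self._engines[url] = engine
            return engine


created = []


def fake_create_engine(url):
    # Stand-in for sqlalchemy.create_engine that records each call.
    created.append(url)
    return object()


registry = EngineRegistry(fake_create_engine)
first = registry.get("postgresql://user:pw@host-a:5432/tenant1")
second = registry.get("postgresql://user:pw@host-a:5432/tenant1")
other = registry.get("postgresql://user:pw@host-b:5432/tenant2")
```

With this shape, the middleware would still look up the tenant config per request (or cache it briefly), but an engine is built only when the resulting URL has not been seen before. Engines for URLs that are no longer in use would need to be disposed of separately, which is not shown here.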
In conjunction with SQLAlchemy, I use SQLService as an interface layer to SQLAlchemy's session manager and ORM layer, which nicely centralizes the core functionality of SQLAlchemy.
Here is my middleware component definition:
class DatabaseSessionComponent(object):
    """ Initiates a new Session for incoming request and closes it in the end. """

    def __init__(self, sqlalchemy_database_uri):
        self.sqlalchemy_database_uri = sqlalchemy_database_uri

    def process_resource(self, req, resp, resource, params):
        resource.db = sqlservice.SQLClient(
            self.sqlalchemy_database_uri,
            model_class=BaseModel
        )

    def process_response(self, req, resp, resource):
        if hasattr(resource, "db"):
            resource.db.disconnect()
With its instantiation within the API's instantiation here:
api = falcon.API(
    middleware=[
        DatabaseSessionComponent(os.environ["SQLALCHEMY_DATABASE_URI"]),
    ]
)
I am running into a problem where my Dash/Flask web app uses too many MySQL resources when it has been running for a longer time. Eventually the server becomes incredibly slow because it tries to keep too many database connections alive. The project started out based on this article and is still organised in a similar way: https://hackersandslackers.com/plotly-dash-with-flask/
Once I open a URL from the website, each Dash callback seems to open its own connection to the database. Apart from the callbacks, Flask opens a database connection as well to store the user session. The number of connections open at the same time isn't really a problem, but the fact that the connections aren't closed once they are finished with is.
I've tried different settings and ways to setup the database connection, but none of them solved the problem of open database connections after the request is finished. Eventually the database runs out of resources because it tries to keep too many database connections open and the web app becomes unusable.
I've tried
db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
from sqlalchemy.ext.declarative import declarative_base
app_engine = create_engine('databasestring', poolclass=NullPool)
db_app_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=app_engine))
Base = declarative_base()
Base.query = db_app_session.query_property()
def init_db():
    Base.metadata.create_all(bind=app_engine)
and
db.py
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
app_engine = create_engine('databasestring', poolclass=NullPool)
Session = sessionmaker(bind=app_engine)
And then import the db.py session / connection into the dash app.
Depending on the contents of db.py I use it in this way in the Dash app in the callback:
dash.py
@app.callback(
    # Callback input/output
    ....
)
def update_graph(rows):
    # ... Callback logic
    session = Session()
    domain: Domain = session.query(Domain).get(domain_id)
    # Do stuff
    session.close()
    session.bind.dispose()
I've tried to close the database connections in the __init__.py of the Flask app with @app.after_request or @app.teardown_request, but neither of these seemed to work.
__init__.py
@app.after_request
def after_request(response):
    session = Session()
    session.close()
    session.bind.dispose()
    return response
I am aware of the Flask-SQLAlchemy package and tried that as well, with similar results. When I use similar code outside of Flask/Dash, closing the connections after the code has finished does seem to work.
Adding the NullPool helped to get the connections closed when the code is executed outside of Flask/Dash, but not within the web app itself. So something still goes wrong within Flask/Dash, but I am unable to find out what.
Can anyone point me in the right direction?
I've also found this issue and pin-pointed it to the login_required decorator. Essentially, each Dash view route has this decorator so any time the dash app is opened in Flask, it opens up a new DB connection, querying for the current user. I've brought it up on a GitHub post here.
I tried this out (in addition to the NullPool configuration) and it worked. Not sure if it's the right solution since it disposes of the database. Try it out and let me know.
@login.user_loader
def load_user(id):
    user = db.session.query(User).filter(User.id == id).one_or_none()
    db.session.close()
    engine = db.get_engine(current_app)
    engine.dispose()
    return user
I usually do this:
from sqlalchemy.orm import sessionmaker
from sqlalchemy.engine import create_engine
Session = sessionmaker()
engine = create_engine("some connection db string", echo=False)
Session.configure(bind=engine)
db_con = Session()
try:
    # DB MANIPULATION
finally:
    db_con.close()
Is this a good habit? If so, why doesn't SQLAlchemy let you simply do:
with Session() as db_con:
    # DB MANIPULATION
?
No, this isn't good practice. It's easy to forget, and will make the code more confusing.
Instead, you can use the contextlib.closing context manager, and make that the only way to get a session.
import contextlib

# Wrapped in a custom context manager for better readability
@contextlib.contextmanager
def get_session():
    with contextlib.closing(Session()) as session:
        yield session

with get_session() as session:
    session.add(...)
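A common variant of this idea also commits on success and rolls back on error, in addition to always closing. The following is a minimal sketch of that variant, not the answer's own code: `session_scope` is a hypothetical helper name, and `FakeSession` is a stand-in that only records calls so the example runs without a database. With SQLAlchemy, the factory passed in would be a `sessionmaker` instance.

```python
import contextlib


class FakeSession:
    """Stand-in for a SQLAlchemy session; records which methods were called."""

    def __init__(self):
        self.calls = []

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")

    def close(self):
        self.calls.append("close")


@contextlib.contextmanager
def session_scope(session_factory):
    # Commit on success, roll back on error, and always close.
    session = session_factory()
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


ok = FakeSession()
with session_scope(lambda: ok):
    pass  # DB work would go here

failed = FakeSession()
try:
    with session_scope(lambda: failed):
        raise ValueError("simulated failure")
except ValueError:
    pass
```

After the two blocks run, `ok.calls` holds the commit followed by the close, and `failed.calls` holds the rollback followed by the close, so the session is released in both cases.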
Firstly, if you are done with the session object, you should close the session. session.close() returns the connection to the engine's pool, and if you are exiting the program you should dispose of the engine's pool with engine.dispose().
Now to your question. In most cases sessions are used in long-running applications such as web servers, where it makes sense to centralize session management. For example, in Flask-SQLAlchemy a session is created at the start of each web request and closed when the request is over.
I am creating a Flask app with a Redis database, and I have a question about connections.
I can keep a global Redis connection open the whole time:
__init__.py
import os
from flask import Flask
import redis
app = Flask(__name__)
db = redis.StrictRedis(host='localhost', port=6379, db=0)
Alternatively, I can reconnect on every request (Flask docs: http://flask.pocoo.org/docs/tutorial/dbcon/):
__init__.py
import os
from flask import Flask, g
import redis
app = Flask(__name__)
#code...
@app.before_request
def before_request():
    g.db = connect_db()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
Which method is better, and why should I use it?
Thanks for the help!
By default redis-py uses connection pooling. The github wiki says:
Behind the scenes, redis-py uses a connection pool to manage connections to a Redis server. By default, each Redis instance you create will in turn create its own connection pool.
This means that for most applications, and assuming your Redis server is on the same computer as your Flask app, it's unlikely that "opening a connection" for each request is going to cause any performance issues. The creator of redis-py has suggested these approaches:
a. create a global redis client instance and have your code use that.
b. create a global connection pool and pass that to various redis instances throughout your code.
Additionally, if you have a lot of commands to execute at any one time, it may be worth looking at pipelining, as this reduces the round-trip time required for each command.
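To illustrate why a shared pool (option b above) makes per-request use cheap, here is a toy pool that opens a connection only when none is idle. This is a sketch of the general pooling idea and not redis-py's implementation: `SimplePool` and `FakeConnection` are hypothetical names, and the connection is a stand-in object rather than a real socket.

```python
import queue


class FakeConnection:
    """Stand-in for a network connection; counts how many were opened."""

    opened = 0

    def __init__(self):
        FakeConnection.opened += 1


class SimplePool:
    """Toy pool: hands out an idle connection, opening one only if none is idle."""

    def __init__(self, connect):
        self._connect = connect
        self._idle = queue.LifoQueue()

    def acquire(self):
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            return self._connect()

    def release(self, conn):
        # Return the connection to the pool instead of closing it.
        self._idle.put(conn)


pool = SimplePool(FakeConnection)
for _ in range(100):  # 100 simulated requests, one after another
    conn = pool.acquire()
    pool.release(conn)
```

Because every simulated request returns its connection before the next one starts, the loop reuses a single connection throughout; concurrent requests would cause more to be opened, up to the number in flight at once.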
With Flask, global variables are not recommended. We can use g to manage the Redis client during a request, in the same way that a database connection is managed with the factory pattern.
from flask import g
import redis

def get_redis():
    if 'db' not in g:
        g.db = redis.Redis(host='localhost', port=6379, db=0)
    return g.db
Reconnecting on every request is the better option for you.
The application context is a good place to store common data during a request or CLI command. Flask provides the g object for this purpose. It is a simple namespace object that has the same lifetime as an application context.
I am running Pylons using SQLAlchemy to connect to MySQL, so when I want to use a database connection in a controller, I can do this:
from myapp.model.meta import Session

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        rows = conn.execute('SELECT whatever')
        ...
Say my controller needs to call up an external library, that also needs a database connection, and I want to provide the connection for it from the SQLAlchemy MySQL connection that is already established:
from myapp.model.meta import Session
import mymodule

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        myobject = mymodule.someobject(DATABASE_OBJECT)
        ...
        conn.close()
What should DATABASE_OBJECT be? Possibilities:
1. Pass Session, and then open and close Session.connection() in the module code
2. Pass conn, and then call conn.close() in the controller
3. Just pass the connection parameters, and have the module code set up its own connection
There is another wrinkle, which is that I need to instantiate some objects in app_globals.py, and these objects need a database connection as well. It seems that app_globals.py cannot use Session's SQLAlchemy connection yet, because it's not bound yet.
Is my architecture fundamentally unsound? Should I not be trying to share connections between Pylons and external libraries this way? Thanks!
You should not manage connections yourself; that is all done by SQLAlchemy. Just use the scoped session object everywhere, and you will be fine.
def init_model(engine):
    sm = orm.sessionmaker(autoflush=False, autocommit=False, expire_on_commit=False, bind=engine)
    meta.engine = engine
    meta.Session = orm.scoped_session(sm)

def index(self):
    rows = Session.execute('SELECT ...')
You can pass the Session object to your external library and run queries there as you wish. There is no need to call .close() on it.
Regarding app_globals, I solved that by adding another method to the Globals class, which is called from environment.py after the database has been initialized:
class Globals(...):
    def init_model(self, config):
        self.some_persistent_db_object = Session.execute('...')

def load_environment(...):
    ...
    config['pylons.app_globals'].init_model(config)
    return config
What should DATABASE_OBJECT be? Possibilities:
4. Pass a "proxy" or "helper" object with a higher-level interface
Unless the external library really needs direct access to the SQLAlchemy session, you could provide it with an object that has methods like "get_account(account_no)" instead of "execute(sql)". Doing so would keep SQLAlchemy-specific code more isolated, and the code would also be easier to test.
Sorry that this is not so much an answer to your original question, more a design suggestion.
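A minimal sketch of such a helper object follows. All names in it (`AccountStore`, the `accounts` table and its columns) are hypothetical, and an in-memory SQLite connection from the standard library stands in for the application's real SQLAlchemy session so that the example is runnable on its own.

```python
import sqlite3


class AccountStore:
    """Hypothetical helper exposing domain-level methods instead of raw SQL."""

    def __init__(self, connection):
        # The external library only ever sees this object, not the connection.
        self._connection = connection

    def get_account(self, account_no):
        row = self._connection.execute(
            "SELECT account_no, owner FROM accounts WHERE account_no = ?",
            (account_no,),
        ).fetchone()
        if row is None:
            return None
        return {"account_no": row[0], "owner": row[1]}


# In-memory SQLite stands in for the application's real database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE accounts (account_no TEXT PRIMARY KEY, owner TEXT)")
connection.execute("INSERT INTO accounts VALUES ('A-1', 'alice')")

store = AccountStore(connection)
found = store.get_account("A-1")
missing = store.get_account("Z-9")
```

The external library would receive `store` rather than the connection, so in tests it can be handed a fake object with the same `get_account` method.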
I'm using Django with Apache, mod_wsgi and PostgreSQL (all on the same host), and I need to handle a lot of simple dynamic page requests (hundreds per second). The bottleneck I ran into is that Django doesn't keep a persistent database connection and reconnects on each request, which takes about 5 ms.
In a benchmark, I found that with a persistent connection I can handle about 500 requests per second, while without one I get only 50.
Does anyone have any advice? How can I modify Django to use a persistent connection, or speed up the connection from Python to the database?
Django 1.6 added support for persistent connections (see the documentation for the latest stable Django):
Persistent connections avoid the overhead of re-establishing a
connection to the database in each request. They’re controlled by the
CONN_MAX_AGE parameter which defines the maximum lifetime of a
connection. It can be set independently for each database.
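As a rough illustration, the setting goes inside the per-database dictionary in settings.py. The fragment below is a sketch with placeholder values: the database name, user, and host are made up, and the engine path shown is the one used by recent Django versions (older releases such as 1.6 used `django.db.backends.postgresql_psycopg2`).

```python
# settings.py (fragment); NAME, USER, and HOST are placeholder values
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        "HOST": "localhost",
        # Reuse each connection for up to 60 seconds.
        # 0 closes the connection at the end of every request;
        # None keeps it open with no age limit.
        "CONN_MAX_AGE": 60,
    }
}
```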
Try PgBouncer - a lightweight connection pooler for PostgreSQL.
Features:
Several levels of brutality when rotating connections:
Session pooling
Transaction pooling
Statement pooling
Low memory requirements (2k per connection by default).
In Django trunk, edit django/db/__init__.py and comment out the line:
signals.request_finished.connect(close_connection)
This signal handler causes it to disconnect from the database after every request. I don't know what all of the side-effects of doing this will be, but it doesn't make any sense to start a new connection after every request; it destroys performance, as you've noticed.
I'm using this now, but I haven't done a full set of tests to see if anything breaks.
I don't know why everyone thinks this needs a new backend or a special connection pooler or other complex solutions. This seems very simple, though I don't doubt there are some obscure gotchas that made them do this in the first place, which should be dealt with more sensibly; 5 ms of overhead for every request is quite a lot for a high-performance service, as you've noticed. (It takes me 150 ms, and I haven't figured out why yet.)
Edit: another necessary change is in django/middleware/transaction.py; remove the two transaction.is_dirty() tests and always call commit() or rollback(). Otherwise, it won't commit a transaction if it only read from the database, which will leave locks open that should be closed.
I created a small Django patch that implements connection pooling of MySQL and PostgreSQL via sqlalchemy pooling.
This has been working perfectly in production at http://grandcapital.net/ for a long time.
The patch was written after googling the topic a bit.
Disclaimer: I have not tried this.
I believe you need to implement a custom database back end. There are a few examples on the web that show how to implement a database back end with connection pooling.
Using a connection pool would probably be a good solution for your case, as the network connections are kept open when connections are returned to the pool.
This post accomplishes this by patching Django (one of the comments points out that it is better to implement a custom back end outside of the core django code)
This post is an implementation of a custom db back end
Both posts use MySQL - perhaps you are able to use similar techniques with Postgresql.
Edit:
The Django Book mentions Postgresql connection pooling, using pgpool (tutorial).
Someone posted a patch for the psycopg2 backend that implements connection pooling. I suggest creating a copy of the existing back end in your own project and patching that one.
This is a package for django connection pool:
django-db-connection-pool
pip install django-db-connection-pool
You can provide additional options to pass to SQLAlchemy's pool creation, key's name is POOL_OPTIONS:
DATABASES = {
    'default': {
        ...
        'POOL_OPTIONS' : {
            'POOL_SIZE': 10,
            'MAX_OVERFLOW': 10
        }
        ...
    }
}
I made a small custom psycopg2 backend that implements a persistent connection using a global variable.
With this I was able to improve the number of requests per second from 350 to 1600 (on a very simple page with a few selects).
Just save it in a file called base.py in any directory (e.g. postgresql_psycopg2_persistent) and set, in settings,
DATABASE_ENGINE to projectname.postgresql_psycopg2_persistent
NOTE!!! The code is not thread-safe: you can't use it with Python threads because the results are unpredictable. With mod_wsgi, please use prefork daemon mode with threads=1.
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using global variable
from django.db.backends.postgresql_psycopg2.base import DatabaseError, DatabaseWrapper as BaseDatabaseWrapper, \
    IntegrityError
from psycopg2 import OperationalError

connection = None

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        global connection
        if connection is not None and self.connection is None:
            try:  # Check if connection is alive
                connection.cursor().execute('SELECT 1')
            except OperationalError:  # The connection is not working, need reconnect
                connection = None
            else:
                self.connection = connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if connection is None and self.connection is not None:
            connection = self.connection
        return cursor

    def close(self):
        if self.connection is not None:
            self.connection.commit()
            self.connection = None
Or here is a thread-safe one. Python threads don't use multiple cores, though, so you won't get as large a performance boost as with the previous one. You can use this one in a multi-process setup too.
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using thread local storage
from threading import local
from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
    DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError

threadlocal = local()

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        if hasattr(threadlocal, 'connection') and threadlocal.connection is \
                not None and self.connection is None:
            try:  # Check if connection is alive
                threadlocal.connection.cursor().execute('SELECT 1')
            except OperationalError:  # The connection is not working, need reconnect
                threadlocal.connection = None
            else:
                self.connection = threadlocal.connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if (not hasattr(threadlocal, 'connection') or threadlocal.connection \
                is None) and self.connection is not None:
            threadlocal.connection = self.connection
        return cursor

    def close(self):
        if self.connection is not None:
            self.connection.commit()
            self.connection = None
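The thread-local technique that this second backend relies on can be seen in isolation in the sketch below. It uses only the standard library; `get_connection` is a hypothetical helper, and a plain `object()` stands in for a real database connection.

```python
import threading

_local = threading.local()
opened = []  # names of the threads that had to open a connection


def get_connection():
    # Each thread lazily creates its own connection object and then reuses it.
    conn = getattr(_local, "connection", None)
    if conn is None:
        conn = object()  # stand-in for a real database connection
        _local.connection = conn
        opened.append(threading.current_thread().name)
    return conn


def worker(results, index):
    first = get_connection()
    second = get_connection()
    results[index] = (first, second)


results = [None, None]
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Within one thread both calls return the same object, while the two threads end up with different objects, which is why connections stored this way are never shared across threads.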