Pylons: Sharing SQLAlchemy MySQL connection with external library - python

I am running Pylons using SQLAlchemy to connect to MySQL, so when I want to use a database connection in a controller, I can do this:
from myapp.model.meta import Session

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        rows = conn.execute('SELECT whatever')
        ...
Say my controller needs to call an external library that also needs a database connection, and I want to provide it with the SQLAlchemy MySQL connection that is already established:
from myapp.model.meta import Session
import mymodule

class SomeController(BaseController):
    def index(self):
        conn = Session.connection()
        myobject = mymodule.someobject(DATABASE_OBJECT)
        ...
        conn.close()
What should DATABASE_OBJECT be? Possibilities:
1. Pass Session, and then open and close Session.connection() in the module code
2. Pass conn, and then call conn.close() in the controller
3. Just pass the connection parameters, and have the module code set up its own connection
There is another wrinkle: I need to instantiate some objects in app_globals.py, and these objects need a database connection as well. It seems that app_globals.py cannot use Session's SQLAlchemy connection yet, because the Session is not bound at that point.
Is my architecture fundamentally unsound? Should I not be trying to share connections between Pylons and external libraries this way? Thanks!

You should not manage connections yourself; SQLAlchemy does that for you. Just use the scoped session object everywhere, and you will be fine.
def init_model(engine):
    sm = orm.sessionmaker(autoflush=False, autocommit=False, expire_on_commit=False, bind=engine)
    meta.engine = engine
    meta.Session = orm.scoped_session(sm)

def index(self):
    rows = Session.execute('SELECT ...')
You can pass the Session object to your external library and run queries there as you wish. There is no need to call .close() on it.
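For example, a minimal sketch of what the external code might look like if you hand it the scoped Session (mymodule and the query are just placeholders from the question):

# mymodule.py -- hypothetical sketch of the external code
def someobject(session):
    # Run queries through the scoped session that the controller passes in;
    # SQLAlchemy manages the underlying connection, so nothing is opened or
    # closed here.
    result = session.execute('SELECT whatever')
    return result.fetchall()

The controller would then simply call mymodule.someobject(Session) and let the scoped session handle the connection.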
Regarding app_globals, I solved that by adding another method to the Globals class, which is called from environment.py after the database is initialized:
class Globals(...):
    def init_model(self, config):
        self.some_persistent_db_object = Session.execute('...')

def load_environment(...):
    ...
    config['pylons.app_globals'].init_model(config)
    return config

What should DATABASE_OBJECT be? Possibilities:
4. Pass a "proxy" or "helper" object with a higher-level abstraction interface
Unless the external library really needs direct access to the SQLAlchemy session, you could provide it with an object that has methods like "get_account(account_no)" instead of "execute(sql)". Doing so keeps the SQLAlchemy-specific code more isolated, and also makes the code easier to test.
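For illustration, a minimal sketch of such a helper, assuming a hypothetical accounts table and the get_account(account_no) method mentioned above:

# Hypothetical helper that hides the SQLAlchemy session behind a narrow interface.
class AccountRepository(object):
    def __init__(self, session):
        self._session = session

    def get_account(self, account_no):
        # The external library only ever sees get_account(), not execute(sql).
        result = self._session.execute(
            'SELECT * FROM accounts WHERE account_no = :no', {'no': account_no})
        return result.fetchone()

The controller would then pass AccountRepository(Session) into the external library instead of a raw connection.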
Sorry that this is not so much an answer to your original question, more a design suggestion.

Related

When using airflow I get "UserWarning: MongoClient opened before fork" + best practice tips to instantiate mongo client in python

I have created a Python flow that executes various steps. I also use MongoDB, which for now is used for configuration purposes, but I will later use it for persistence as well.
When I execute my code from PyCharm all is good, but when I execute it through Airflow I get the fork warning "UserWarning: MongoClient opened before fork. Create MongoClient only after forking.".
At first I thought the problem was that I open many instances of the Mongo client, so I used the singleton pattern so that the connection is instantiated only once (please see the code below).
The only way to get rid of the warning is by adding the connect=False parameter, as you can see commented out in my code, but according to the documentation that seems to be a workaround rather than a solid solution.
So what is the problem here? Does it have something to do with Airflow, since I am not using the mongo_hook?
In addition, please let me know whether using the singleton pattern to instantiate the Mongo client once is good practice. The client is called from various modules.
Note that Mongo and Airflow each run in their own Docker containers.
from pymongo import MongoClient

def singleton(class_):
    instances = {}
    def get_instance(*args, **kwargs):
        if class_ not in instances:
            instances[class_] = class_(*args, **kwargs)
        return instances[class_]
    return get_instance

@singleton
class MongoPersistence(Persistence):
    def __init__(self, driver, host, user, password, port, db):
        self._uri = '{driver}://{host}:{port}'.format(driver=driver, host=host, port=port)
        self._client = MongoClient(self._uri,
                                   serverSelectionTimeoutMS=3000,  # 3 second timeout
                                   username=user,
                                   password=password,
                                   # connect=False,
                                   )

    def find_one(self, **kwargs):
        return self._client[kwargs.get('db_name')][kwargs.get('collection_name')].find_one(kwargs.get('query'))
Thank you,
Dina
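Sketched here for illustration only (not from the original post): the warning's own advice points at two directions with pymongo, either pass connect=False so the client only connects on first use in the child process, or create the client lazily inside the worker. The URI and class name below are placeholders.

from pymongo import MongoClient

# Option 1: defer the connection; the client will not open sockets until the
# first operation, which by then happens in the forked worker process.
client = MongoClient('mongodb://mongo:27017', connect=False)

# Option 2: create the client lazily, so it is only built in the process that
# actually uses it (e.g. inside the Airflow task callable).
class LazyMongoPersistence(object):
    def __init__(self, uri):
        self._uri = uri
        self._client = None

    @property
    def client(self):
        if self._client is None:
            self._client = MongoClient(self._uri)
        return self._client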

How to create a SQLAlchemy database session per request

I have a multi-tenancy Python Falcon app. Every tenant has their own database, and on each incoming request I need to connect to the tenant's database.
But there is a complication: the database configs are stored on another service and they change regularly.
I tried creating the session in process_resource, but the SQL queries slowed down after this change. What should I do to make this faster?
P.S.: I use PostgreSQL.
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session
from sqlalchemy.orm import sessionmaker
import config
import json
import requests

class DatabaseMiddleware:
    def __init__(self):
        pass

    def process_resource(self, req, resp, resource, params):
        engineConfig = requests.get('http://database:10003/v1/databases?loadOnly=config&appId=06535108-111a-11e9-ab14-d663bd873d93').text
        engineConfig = json.loads(engineConfig)
        engine = create_engine(
            '{dms}://{user}:{password}@{host}:{port}/{dbName}'.format(
                dms=engineConfig[0]['config']['dms'],
                user=engineConfig[0]['config']['user'],
                password=engineConfig[0]['config']['password'],
                host=engineConfig[0]['config']['host'],
                port=engineConfig[0]['config']['port'],
                dbName=engineConfig[0]['config']['dbName']
            ))
        session_factory = sessionmaker(bind=engine, autoflush=True)
        databaseSession = scoped_session(session_factory)
        resource.databaseSession = databaseSession

    def process_response(self, req, resp, resource, req_succeeded):
        if hasattr(resource, 'databaseSession'):
            if not req_succeeded:
                resource.databaseSession.rollback()
            resource.databaseSession.remove()
Your approach is probably wrong, since it goes against the intended usage pattern of engine instances described in "Engine Disposal". The lifetime of an engine instance should be the same as that of your middleware instance.
The Engine refers to a connection pool, which means under normal circumstances, there are open database connections present while the Engine object is still resident in memory. When an Engine is garbage collected, its connection pool is no longer referred to by that Engine, and assuming none of its connections are still checked out, the pool and its connections will also be garbage collected, which has the effect of closing out the actual database connections as well. But otherwise, the Engine will hold onto open database connections assuming it uses the normally default pool implementation of QueuePool.
The Engine is intended to normally be a permanent fixture established up-front and maintained throughout the lifespan of an application. It is not intended to be created and disposed on a per-connection basis; it is instead a registry that maintains both a pool of connections as well as configurational information about the database and DBAPI in use, as well as some degree of internal caching of per-database resources.
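A minimal sketch of what that could look like here, assuming the middleware caches one engine per tenant and reuses it across requests; the tenant lookup (_resolve_tenant) is a placeholder for however you map a request to a tenant and its cached connection URL:

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

class DatabaseMiddleware:
    def __init__(self):
        self._engines = {}        # tenant id -> Engine, created once and reused
        self._sessionmakers = {}  # tenant id -> sessionmaker bound to that engine

    def _get_sessionmaker(self, tenant_id, db_url):
        if tenant_id not in self._engines:
            self._engines[tenant_id] = create_engine(db_url)
            self._sessionmakers[tenant_id] = sessionmaker(bind=self._engines[tenant_id])
        return self._sessionmakers[tenant_id]

    def process_resource(self, req, resp, resource, params):
        tenant_id, db_url = self._resolve_tenant(req)  # placeholder lookup
        resource.databaseSession = scoped_session(self._get_sessionmaker(tenant_id, db_url))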
In conjunction with SQLAlchemy, I use SQLService as an interface layer to SQLAlchemy's session manager and ORM layer, which nicely centralizes the core functionality of SQLAlchemy.
Here is my middleware component definition:
class DatabaseSessionComponent(object):
    """Initiates a new Session for each incoming request and closes it in the end."""

    def __init__(self, sqlalchemy_database_uri):
        self.sqlalchemy_database_uri = sqlalchemy_database_uri

    def process_resource(self, req, resp, resource, params):
        resource.db = sqlservice.SQLClient(
            self.sqlalchemy_database_uri,
            model_class=BaseModel
        )

    def process_response(self, req, resp, resource):
        if hasattr(resource, "db"):
            resource.db.disconnect()
And here is its instantiation within the API's instantiation:
api = falcon.API(
    middleware=[
        DatabaseSessionComponent(os.environ["SQLALCHEMY_DATABASE_URI"]),
    ]
)

Mock library method python

I'm trying to unit test a class that connects to a database. To avoid hard-coding a database, I'd like to assert that the mysql.connector.connect method is called with the right values.
from mysql.connector import connect
from mysql.connector import Error
from discovery.database import Database

class MariaDatabase(Database):
    def connect(self, username, password):
        """
        Don't forget to close the connection !
        :return: connection to the database
        """
        try:
            return connect(host=str(self.target),
                           database=self.db_name,
                           user=username,
                           password=password)
I've read documentation around mocks and similar problems (notably "Python Unit Test: How to unit test the module which contains database operations?"), which I thought would solve my problem, but mysql.connector.connect keeps being called instead of the mock.
I don't know what I could do to unit test this class.
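For illustration, a minimal sketch of one common approach: patch connect in the module where MariaDatabase is defined, not in mysql.connector, because that module did "from mysql.connector import connect". The module path discovery.maria and the constructor call are hypothetical; use whatever module and arguments your class actually has:

from unittest import TestCase, mock

from discovery.maria import MariaDatabase  # hypothetical import path

class TestMariaDatabase(TestCase):
    @mock.patch('discovery.maria.connect')  # patch the name where it is looked up
    def test_connect_uses_expected_credentials(self, mock_connect):
        db = MariaDatabase()  # placeholder: construct as your Database base class requires
        db.connect('user', 'secret')
        mock_connect.assert_called_once_with(host=str(db.target),
                                             database=db.db_name,
                                             user='user',
                                             password='secret')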

Why does the python Cassandra driver fail when declared as a class field?

I'm currently working with a server process that is meant to connect to a Cassandra database and forward information to it. This process is written as a class, and my goal is to create a single Cassandra session for this class that it can use to send the information. However, I'm running into a problem: when I create a Cassandra session in the class's __init__ method and then later try to use the session in another method, I get the following error:
errors={}, last_host=<server IP address>
I can currently get around this problem by creating a new Cassandra session each time the method is called, but this is obviously not a good way to go about it. So, how can I make a Cassandra session that I can use consistently throughout the class?
This code does NOT work:
from cassandra.cluster import Cluster
from multiprocessing import Process

class DataProcess(Process):
    def __init__(self):
        super(DataProcess, self).__init__()
        # Do a few other irrelevant things ...
        # Set up the Cassandra connection
        self.cluster = Cluster(contact_points=[CASSANDRA_IP])
        self.session = self.cluster.connect('some keyspace')
        print "Connected to cassandra."

    def callback(self, ch, method, props, body):
        prepared_statement = self.session.prepare("Some CQL statement...")
        bound_statement = prepared_statement.bind(some values)
        self.session.execute(bound_statement)
Output:
"Connected to cassandra."
errors={}, last_host=<server IP address>
This code DOES work, but it's a silly way to do it:
from cassandra.cluster import Cluster
from multiprocessing import Process

class DataProcess(Process):
    def __init__(self):
        super(DataProcess, self).__init__()
        # Do a few irrelevant things ...

    def callback(self, ch, method, props, body):
        # Set up the Cassandra connection
        cluster = Cluster(contact_points=[CASSANDRA_IP])
        session = cluster.connect('some keyspace')
        prepared_statement = session.prepare("Some CQL statement...")
        bound_statement = prepared_statement.bind(some values)
        session.execute(bound_statement)
Other relevant info:
Using python cassandra-driver version 2.5.1
Cassandra database version 2.1.8
EDIT: ANSWER
The following code solved the issue:
from cassandra.cluster import Cluster
from multiprocessing import Process

class DataProcess(Process):
    def __init__(self):
        super(DataProcess, self).__init__()
        self.cluster = None
        self.session = None
        # Do a few irrelevant things ...

    def callback(self, ch, method, props, body):
        # Use the session created in run(), i.e. after the fork
        prepared_statement = self.session.prepare("Some CQL statement...")
        bound_statement = prepared_statement.bind(some values)
        self.session.execute(bound_statement)

    def run(self):
        # Set up the Cassandra connection in the child process
        self.cluster = Cluster(contact_points=[CASSANDRA_IP])
        self.session = self.cluster.connect('some keyspace')
Are you creating your cluster and sessions before the fork? That could cause issues like what you are seeing. There is an amazing writeup of how to distribute work using pools with the Python driver here. It will likely be exactly what you are looking for.
If it's not, please leave more context on how you are running your process, as it's tough to reproduce without knowing your process lifecycle.

Django persistent database connection

I'm using Django with Apache, mod_wsgi and PostgreSQL (all on the same host), and I need to handle a lot of simple dynamic page requests (hundreds per second). I ran into the problem that the bottleneck is that Django doesn't have a persistent database connection and reconnects on each request (which takes about 5 ms).
While benchmarking, I found that with a persistent connection I can handle about 500 r/s, while without one I get only 50 r/s.
Does anyone have any advice? How can I modify Django to use a persistent connection, or otherwise speed up the connection from Python to the database?
Django 1.6 added support for persistent connections (see the docs for the latest stable Django):
Persistent connections avoid the overhead of re-establishing a connection to the database in each request. They're controlled by the CONN_MAX_AGE parameter, which defines the maximum lifetime of a connection. It can be set independently for each database.
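For example, a minimal settings.py sketch (the database name and the 600-second value are just placeholders):

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',        # placeholder database name
        'CONN_MAX_AGE': 600,   # keep each connection open for up to 10 minutes
    }
}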
Try PgBouncer - a lightweight connection pooler for PostgreSQL.
Features:
Several levels of brutality when rotating connections:
Session pooling
Transaction pooling
Statement pooling
Low memory requirements (2k per connection by default).
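Django itself then just points at PgBouncer instead of PostgreSQL directly; a minimal sketch, assuming PgBouncer listens on its default port 6432 on the same host:

DATABASES = {
    'default': {
        ...
        'HOST': '127.0.0.1',
        'PORT': '6432',  # PgBouncer's default listen port; it forwards to PostgreSQL
        ...
    }
}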
In Django trunk, edit django/db/__init__.py and comment out the line:
signals.request_finished.connect(close_connection)
This signal handler causes it to disconnect from the database after every request. I don't know what all of the side-effects of doing this will be, but it doesn't make any sense to start a new connection after every request; it destroys performance, as you've noticed.
I'm using this now, but I haven't done a full set of tests to see if anything breaks.
I don't know why everyone thinks this needs a new backend or a special connection pooler or other complex solutions. This seems very simple, though I don't doubt there are some obscure gotchas that made them do this in the first place, which should be dealt with more sensibly; 5 ms of overhead for every request is quite a lot for a high-performance service, as you've noticed. (It takes me 150 ms; I haven't figured out why yet.)
Edit: another necessary change is in django/middleware/transaction.py; remove the two transaction.is_dirty() tests and always call commit() or rollback(). Otherwise, it won't commit a transaction if it only read from the database, which will leave locks open that should be closed.
I created a small Django patch that implements connection pooling of MySQL and PostgreSQL via sqlalchemy pooling.
It has worked perfectly in production at http://grandcapital.net/ for a long time.
The patch was written after googling the topic a bit.
Disclaimer: I have not tried this.
I believe you need to implement a custom database back end. There are a few examples on the web that show how to implement a database back end with connection pooling.
Using a connection pool would probably be a good solution for your case, as the network connections are kept open when connections are returned to the pool.
This post accomplishes this by patching Django (one of the comments points out that it is better to implement a custom back end outside of the core django code)
This post is an implementation of a custom db back end
Both posts use MySQL - perhaps you are able to use similar techniques with Postgresql.
Edit:
The Django Book mentions Postgresql connection pooling, using pgpool (tutorial).
Someone posted a patch for the psycopg2 backend that implements connection pooling. I suggest creating a copy of the existing back end in your own project and patching that one.
Here is a package for Django connection pooling:
django-db-connection-pool
pip install django-db-connection-pool
You can provide additional options to pass to SQLAlchemy's pool creation; the key's name is POOL_OPTIONS:
DATABASES = {
    'default': {
        ...
        'POOL_OPTIONS': {
            'POOL_SIZE': 10,
            'MAX_OVERFLOW': 10
        }
        ...
    }
}
I made a small custom psycopg2 backend that implements a persistent connection using a global variable.
With this I was able to improve the number of requests per second from 350 to 1600 (on a very simple page with a few selects).
Just save it in a file called base.py in any directory (e.g. postgresql_psycopg2_persistent) and set in settings
DATABASE_ENGINE to projectname.postgresql_psycopg2_persistent
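That is, roughly (old-style top-level Django settings, with projectname as a placeholder):

# settings.py (pre-1.2 style settings, as this backend assumes)
DATABASE_ENGINE = 'projectname.postgresql_psycopg2_persistent'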
NOTE!!! The code is not thread-safe: you can't use it with Python threads without unpredictable results. With mod_wsgi, please use prefork daemon mode with threads=1.
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using global variable
from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
    DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError

connection = None

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        global connection
        if connection is not None and self.connection is None:
            try:  # Check if connection is alive
                connection.cursor().execute('SELECT 1')
            except OperationalError:  # The connection is not working, need reconnect
                connection = None
            else:
                self.connection = connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if connection is None and self.connection is not None:
            connection = self.connection
        return cursor

    def close(self):
        if self.connection is not None:
            self.connection.commit()
            self.connection = None
Or here is a thread-safe one, but Python threads don't use multiple cores, so you won't get as big a performance boost as with the previous one. You can use this one with multiple processes too.
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using thread local storage
from threading import local

from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
    DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError

threadlocal = local()

class DatabaseWrapper(BaseDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        if hasattr(threadlocal, 'connection') and threadlocal.connection is \
                not None and self.connection is None:
            try:  # Check if connection is alive
                threadlocal.connection.cursor().execute('SELECT 1')
            except OperationalError:  # The connection is not working, need reconnect
                threadlocal.connection = None
            else:
                self.connection = threadlocal.connection
        cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
        if (not hasattr(threadlocal, 'connection') or threadlocal.connection
                is None) and self.connection is not None:
            threadlocal.connection = self.connection
        return cursor

    def close(self):
        if self.connection is not None:
            self.connection.commit()
            self.connection = None
