We have an application running multiple worker processes connected by a multiprocessing queue.
In order to take care about the DB connections and possible errors we build a static class taking care of establishing the connection and handling errors.
An extract:
class DBConnector:
mysqlhost = "localhost"
mySQLConnections = dict()
#staticmethod
def getWaitingTime():
return DBConnector.time_to_wait_after_failure
#staticmethod
def getRetries():
return DBConnector.retries
#staticmethod
def getMySQLDB(database, user, pwd):
'''return only new connection if no connection for this db, this user (and this thread) exists'''
dbuserkey = database+user
if dbuserkey in DBConnector.mySQLConnections:
print "returning stored connection for "+dbuserkey
pprint(DBConnector.mySQLConnections)
return DBConnector.mySQLConnections[dbuserkey]
else:
print "returning new connection for "+dbuserkey
pprint(DBConnector.mySQLConnections)
mySQLConn = MySQLConnection(DBConnector.mysqlhost, database, user, pwd, DBConnector.retries, DBConnector.time_to_wait_after_failure)
DBConnector.mySQLConnections[dbuserkey] = mySQLConn
return mySQLConn
We had in mind that every worker process now uses this static methods to get a DB connection and ran into strange problems.
We expected that when we start 10 Workers which call the static methods that there will be 10 different database connections. But instead there was a non deterministic behaviour resulting in various number of connections sometimes 3 different or 7 different ones.
I call it "pseudo" instances of the class holding the static method.
Is this behavior normal ? Or is it a bug or someting ?
Related
Summary
We run into the MySQL “max connection reached” issue by making a lot of read/write queries from different python multiprocessing workers from different autoscaled AWS server instances, because we have limited “max connections” for the AWS RDS database instance. While we could beef up the RDS instance type (this shows approximately how many max concurrent connections each instance type can have) and have a higher max connection limit, at some point also those connections will get exhausted if we scale up enough new server instances with new workers.
Questions
Is there a way to create a Connection Pool as a separate service on a separate AWS server instance, so that all python multiprocessing workers across all autoscaled AWS server instances can use the pool and thus we would not exceed the RDS DB max connection limit?
We are able to create the pool using SQLAlchemy (direct link to pool docs) on the first server instance for example, but how can the workers from the other AWS server instances connect to that pool? This is the reason why I highlight creating a pool on a separate AWS server instance because workers from all other servers would connect to that.
Are there any libraries that already handle this scenario? If not, this sounds like a huge effort to implement?
Main Components/Concepts of the Current APP
Flask backend. It has a connection pool and the size is set to 10. This never exceeds the connection beyond 10. There is no issue with this part as it is a separate web facing part that does not relate to the ‘python processing’ workers.
Python Workers. Those are multiprocessing workers which consume messages from the message broker. Whenever a python worker gets a message, the DB connection is established and closed at the end of the task. We have 4 types of workers and each worker has at least 5 instances (we could config this to 10 for example if we use a larger AWS instance). This leads to 20 concurrent connections (5x4) at a worst case scenario when all workers are making a db connection at the same time.
Autoscale. We automatically create new instances for additional workers when there is an overload of messages (tasks). This means that every time a new server instance is added, there could be another 20 concurrent DB connections in the worst case if all workers connect at the same time. So if we have two server instances, that would be 40 concurrent DB connections in the worst case. If we have 100 servers then that could be 2000 concurrent connections.
flask_app.py
app = Flask(__name__)
app.config.from_pyfile('../api.conf')
CORS(app)
jwt = JWTManager(app)
db = SQLAlchemy(app)
app.logger.info("[SQLPOOLSTATUS] pool size = {}".format(db.engine.pool.status()))
#app.route('/upload', methods=['POST'])
def api_upload_file():
log_request(request)
payload = request.get_json()
#--- database read and write -----
img_rec = db.session.query(Table).filter(Table.id == payload.get("img_id")).all()
user_rec = db.session.query(Table2).filter(Table2.id == payload.get("user_id")).first()
#------
some more code for write records for table --
db.session.add(record)
db.session.commit()
return json_response
worker.py
from models import Image, Upload, File, PDF, Album, Account
import os, sys, signal
import socket
import multiprocessing
import time
import pika
from utils import *
def run_priority(workerid, stop_event):
connection = amqp_connect()
channel = connection.channel()
amqp_init_queue(channel)
channel.queue_declare(queue=queue, durable=True, exclusive=False, auto_delete=True)
channel.queue_bind(routing_key=routing_key,queue=queue,exchange=exchange)
method_frame, header_frame, body = channel.basic_get(queue)
# --- Establish database connection ---
engine = db_engine()
connection = engine.connect() #
Session = sessionmaker(bind=engine)
session = Session()
#--- doing some database operation ----
record = session.query(Table).first()
try:
session.add(new_record)
session.commit()
except Exception as e:
session.rollback()
if __name__ == '__main__':
stop_event = multiprocessing.Event()
workers = []
workerid = 0
try:
default_handler = signal.getsignal(signal.SIGINT)
signal.signal(signal.SIGINT, signal.SIG_IGN)
workercount = int(config.get('backend', 'priority_upload_workers'))
for x in range(workercount):
worker = multiprocessing.Process(target=run_priority, args=(workerid, stop_event))
workers.append(worker)
worker.daemon = True
worker.start()
workercount = int(config.get('backend', 'upload_workers'))
for x in range(workercount):
worker = multiprocessing.Process(target=run, args=(workerid, stop_event))
workers.append(worker)
worker.daemon = True
worker.start()
signal.signal(signal.SIGTERM, upload_sigterm_handler)
signal.signal(signal.SIGINT, default_handler)
monitor_worker(workers)
except Exception as e:
# some code to handle exceptions
Tried: create an flask application with sqlachemy pool as a seperate service but the challenge is that i need to rewrite SQLAlchemy ORM queries everywhere in the workers code. Is there a better way to tackle the problem?
Expectation: Any alternative solution/suggestions to use a connection pool globally to all multiprocessing workers and use database connection with the limited connections and
never exceed the limit in the pool.
Any links or resources would be helpful.
Today I found a bug on my python application using ZODB.
Trying to find why my application freezes up, I figured that ZODB was the cause.
Setting the logging to debug, it seem that when commiting, that ZODB would find 2 connections and then start freezing.
INFO:ZEO.ClientStorage:('127.0.0.1', 8092) Connected to storage: ('localhost', 8092)
DEBUG:txn.140661100980032:new transaction
DEBUG:txn.140661100980032:commit
DEBUG:ZODB.Connection:Committing savepoints of size 1858621925
DEBUG:discord.gateway:Keeping websocket alive with sequence 59.
DEBUG:txn.140661100980032:commit <Connection at 7fee2d080fd0>
DEBUG:txn.140661100980032:commit <Connection at 7fee359e5cc0>
As I'm a ZODB beginner, any idea on a how to solve / how to dig deeper ?
It seems to be related to concurrent commits.
I believed that opening a new connection would initiate a dedicated transaction manager, but this is not the case. While initiating a new connection without specifying a transaction manager, the local one (shared with other connections on the thread) is used.
My code:
async def get_connection():
return ZEO.connection(8092)
async def _message_db_init_aux(self, channel, after=None, before=None):
connexion = await get_connection()
root = connexion.root()
messages = await some_function_which_return_a_list()
async for message in messages:
# If author.id doesn't exist on the data, let's initiate it as a Tree
if message.author.id not in root.data: # root.data is a BTrees.OOBTree.BTree()
root.data[message.author.id] = BTrees.OOBTree.BTree()
# Message is a defined classed inherited from persistant.Persistant
root.data[message.author.id][message.id] = Message(message.id, message.author.id, message.created_at)
transaction.commit()
connexion.close()
Don't re-use transaction managers across connections. Each connection has its own transaction manager, use that.
Your code currently creates the connection, then commits. Rather than create the connection, ask the database to create a transaction manager for you, which then manages its own connection. The transaction manager can be used as a context manager, meaning that changes to the database are automatically committed when the context ends.
Moreover, by using ZEO.connection() for each transaction, you are forcing ZEO to create a complete new client object, with a fresh cache and connection pool. By using ZEO.DB() instead, and caching the result, a single client is created from which connections can be pooled and reused, and with a local cache to speed up transactions.
I'd alter the code to:
def get_db():
"""Access the ZEO database client.
The database client is cached to take advantage of caching and connection pooling
"""
db = getattr(get_db, 'db', None)
if db is None:
get_db.db = db = ZEO.DB(8092)
return db
async def _message_db_init_aux(self, channel, after=None, before=None):
with self.get_db().transaction() as conn:
root = conn.root()
messages = await some_function_which_return_a_list()
async for message in messages:
# If author.id doesn't exist on the data, let's initiate it as a Tree
if message.author.id not in root.data: # root.data is a BTrees.OOBTree.BTree()
root.data[message.author.id] = BTrees.OOBTree.BTree()
# Message is a defined classed inherited from persistant.Persistant
root.data[message.author.id][message.id] = Message(
message.id, message.author.id, message.created_at
)
The .transaction() method on the database object creates a new connection under the hood, the moment the context is entered (with causing __enter__ to be called), and when the with block ends the transaction is committed and the connection is released to the pool again.
Note that I used a synchronous def get_db() method; the call signatures on the ZEO client code are entirely synchronous. They are safe to call from asynchronous code because under the hood, the implementation uses asyncio throughout, using callbacks and tasks on the same loop, and actual I/O is deferred to separate tasks.
When not precised, the local transaction manager is used.
If you open multiple connections on the same thread, you have to precise the transaction manager you want to use. By default
transaction.commit()
is the local transaction manager.
connection.transaction.manager.commit()
will use the transaction manager dedicated to the transaction (and not the local one).
For more informations, check http://www.zodb.org/en/latest/guide/transactions-and-threading.html
I wrote a small server chat that does very basic things and I would like to write the tests around it. Unfortunately I quite lost regarding. I would need some help to get on the right tracks.
I have a class called Server() and it contains a method called bind_socket(). I would like to write unit test (preferably using pytest) to test the following method:
class Server(Threading.Thread):
""" Server side class
Instanciate a server in a thread.
"""
MAX_WAITING_CONNECTIONS = 10
def __init__(self, host='localhost', port=10000):
""" Constructor of the Server class.
Initialize the instance in a thread.
Args:
host (str): Host to which to connect (default=localhost)
port (int): Port on which to connect (default=10000)
"""
threading.Thread.__init__(self)
self.host = host
self.port = port
self.connections = []
self.running = True
def bind_socket(self, ip=socket.AF_INET, protocol=socket.SOCK_STREAM):
self.server_socket = socket.socket(ip, protocol)
self.server_socket.bind((self.host, self.port))
self.server_socket.listen(self.MAX_WAITING_CONNECTIONS)
self.connections.append(self.server_socket)
I'm wondering what is the best way to write a test for this method as it doesn't return anything. Should I mock it and try to return the number of of call of socket(), bind(), listen() and append() or is it the wrong way to do proceed? I'm quite lost on that, I did many tests either with pytest and unittest, watch conferences and read articles and I still don't have anything working.
Some explanation and/or examples would be greatly appreciated.
Thanks a lot
For each line of bind_socket you should ask yourself the questions:
What if this line didn't exist
(for conditionals... I know you don't have any here) What if this condition was the other way around
Can this line raise exceptions.
You want your tests to cover all these eventualities.
For example, socket.bind can raise an exception if it's already bound, or socket.listen can raise an exception. Do you close the socket afterwards?
python-memcached memcache client is written in a way where each thread gets its own connection. This makes python-memcached code simple, which is nice, but presents a problem if your application has hundreds or thousands of threads (or if you run lots of applications), because you will quickly run out of available connections in memcache.
Typically this kind of problem is solved by using a connection pool, and indeed the Java memcache libraries I have seen implement connection pooling. After reading the documentation for various Python memcache libraries it seems the only one offering connection pool is pylibmc, but it has two problems for me: it is not pure Python, and it does not seem to have a timeout for reserving a client from the pool. While not being pure Python is perhaps not a deal breaker, not having a timeout certainly is. It is also not clear how those pools would work with for example dogpile.cache.
Preferably I would like to find a pure Python memcache client with connection pooling that would work with dogpile.cache, but I am open to other suggestions as well. I'd rather avoid changing the application logic, though (like pushing all memcache operations into fewer background threads).
A coworker came up with an idea that seems to work well enough for our use case, so sharing that here. The basic idea is that you create the number of memcache clients you want to use up front, put them in a queue, and whenever you need a memcache client you pull one from the queue. Due to Queue.Queue get() method having optional timeout parameter, you can also handle the case where you can't get a client in time. You also need to deal with the use of threading.local in memcache client.
Here is how it could work in code (note that I haven't actually run this exact version so there might be some issues, but this should give you an idea if the textual description did not make sense to you):
import Queue
import memcache
# See http://stackoverflow.com/questions/9539052/python-dynamically-changing-base-classes-at-runtime-how-to
# Don't inherit client from threading.local so that we can reuse clients in
# different threads
memcache.Client = type('Client', (object,), dict(memcache.Client.__dict__))
# Client.__init__ references local, so need to replace that, too
class Local(object): pass
memcache.local = Local
class PoolClient(object):
'''Pool of memcache clients that has the same API as memcache.Client'''
def __init__(self, pool_size, pool_timeout, *args, **kwargs):
self.pool_timeout = pool_timeout
self.queue = Queue.Queue()
for _i in range(pool_size):
self.queue.put(memcache.Client(*args, **kwargs))
def __getattr__(self, name):
return lambda *args, **kw: self._call_client_method(name, *args, **kw)
def _call_client_method(self, name, *args, **kwargs):
try:
client = self.queue.get(timeout=self.pool_timeout)
except Queue.Empty:
return
try:
return getattr(client, name)(*args, **kwargs)
finally:
self.queue.put(client)
Many thank to #Heikki Toivenen for providing ideas to the problem! However, I'm not sure how to call the get() method exactly in order to use a memcache client in the PoolClient. Direct calling of get() method with arbitrary name gives AttributeError or MemcachedKeyNoneError.
By combining #Heikki Toivonen's and pylibmc's solution to the problem, I came up with the following code for the problem and posted here for the convenience of future users (I have debugged this code and it should be ready to run):
import Queue, memcache
from contextlib import contextmanager
memcache.Client = type('Client', (object,), dict(memcache.Client.__dict__))
# Client.__init__ references local, so need to replace that, too
class Local(object): pass
memcache.local = Local
class PoolClient(object):
'''Pool of memcache clients that has the same API as memcache.Client'''
def __init__(self, pool_size, pool_timeout, *args, **kwargs):
self.pool_timeout = pool_timeout
self.queue = Queue.Queue()
for _i in range(pool_size):
self.queue.put(memcache.Client(*args, **kwargs))
print "pool_size:", pool_size, ", Queue_size:", self.queue.qsize()
#contextmanager
def reserve( self ):
''' Reference: http://sendapatch.se/projects/pylibmc/pooling.html#pylibmc.ClientPool'''
client = self.queue.get(timeout=self.pool_timeout)
try:
yield client
finally:
self.queue.put( client )
print "Queue_size:", self.queue.qsize()
# Intanlise an instance of PoolClient
mc_client_pool = PoolClient( 5, 0, ['127.0.0.1:11211'] )
# Use a memcache client from the pool of memcache client in your apps
with mc_client_pool.reserve() as mc_client:
#do your work here
I'm using django with apache and mod_wsgi and PostgreSQL (all on same host), and I need to handle a lot of simple dynamic page requests (hundreds per second). I faced with problem that the bottleneck is that a django don't have persistent database connection and reconnects on each requests (that takes near 5ms).
While doing a benchmark I got that with persistent connection I can handle near 500 r/s while without I get only 50 r/s.
Anyone have any advice? How can I modify Django to use a persistent connection or speed up the connection from Python to DB?
Django 1.6 has added persistent connections support (link to doc for latest stable Django ):
Persistent connections avoid the overhead of re-establishing a
connection to the database in each request. They’re controlled by the
CONN_MAX_AGE parameter which defines the maximum lifetime of a
connection. It can be set independently for each database.
Try PgBouncer - a lightweight connection pooler for PostgreSQL.
Features:
Several levels of brutality when rotating connections:
Session pooling
Transaction pooling
Statement pooling
Low memory requirements (2k per connection by default).
In Django trunk, edit django/db/__init__.py and comment out the line:
signals.request_finished.connect(close_connection)
This signal handler causes it to disconnect from the database after every request. I don't know what all of the side-effects of doing this will be, but it doesn't make any sense to start a new connection after every request; it destroys performance, as you've noticed.
I'm using this now, but I havn't done a full set of tests to see if anything breaks.
I don't know why everyone thinks this needs a new backend or a special connection pooler or other complex solutions. This seems very simple, though I don't doubt there are some obscure gotchas that made them do this in the first place--which should be dealt with more sensibly; 5ms overhead for every request is quite a lot for a high-performance service, as you've noticed. (It takes me 150ms--I havn't figured out why yet.)
Edit: another necessary change is in django/middleware/transaction.py; remove the two transaction.is_dirty() tests and always call commit() or rollback(). Otherwise, it won't commit a transaction if it only read from the database, which will leave locks open that should be closed.
I created a small Django patch that implements connection pooling of MySQL and PostgreSQL via sqlalchemy pooling.
This works perfectly on production of http://grandcapital.net/ for a long period of time.
The patch was written after googling the topic a bit.
Disclaimer: I have not tried this.
I believe you need to implement a custom database back end. There are a few examples on the web that shows how to implement a database back end with connection pooling.
Using a connection pool would probably be a good solution for you case, as the network connections are kept open when connections are returned to the pool.
This post accomplishes this by patching Django (one of the comments points out that it is better to implement a custom back end outside of the core django code)
This post is an implementation of a custom db back end
Both posts use MySQL - perhaps you are able to use similar techniques with Postgresql.
Edit:
The Django Book mentions Postgresql connection pooling, using pgpool (tutorial).
Someone posted a patch for the psycopg2 backend that implements connection pooling. I suggest creating a copy of the existing back end in your own project and patching that one.
This is a package for django connection pool:
django-db-connection-pool
pip install django-db-connection-pool
You can provide additional options to pass to SQLAlchemy's pool creation, key's name is POOL_OPTIONS:
DATABASES = {
'default': {
...
'POOL_OPTIONS' : {
'POOL_SIZE': 10,
'MAX_OVERFLOW': 10
}
...
}
}
I made some small custom psycopg2 backend that implements persistent connection using global variable.
With this I was able to improve the amout of requests per second from 350 to 1600 (on very simple page with few selects)
Just save it in the file called base.py in any directory (e.g. postgresql_psycopg2_persistent) and set in settings
DATABASE_ENGINE to projectname.postgresql_psycopg2_persistent
NOTE!!! the code is not threadsafe - you can't use it with python threads because of unexpectable results, in case of mod_wsgi please use prefork daemon mode with threads=1
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using global variable
from django.db.backends.postgresql_psycopg2.base import DatabaseError, DatabaseWrapper as BaseDatabaseWrapper, \
IntegrityError
from psycopg2 import OperationalError
connection = None
class DatabaseWrapper(BaseDatabaseWrapper):
def _cursor(self, *args, **kwargs):
global connection
if connection is not None and self.connection is None:
try: # Check if connection is alive
connection.cursor().execute('SELECT 1')
except OperationalError: # The connection is not working, need reconnect
connection = None
else:
self.connection = connection
cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
if connection is None and self.connection is not None:
connection = self.connection
return cursor
def close(self):
if self.connection is not None:
self.connection.commit()
self.connection = None
Or here is a thread safe one, but python threads don't use multiple cores, so you won't get such performance boost as with previous one. You can use this one with multi process one too.
# Custom DB backend postgresql_psycopg2 based
# implements persistent database connection using thread local storage
from threading import local
from django.db.backends.postgresql_psycopg2.base import DatabaseError, \
DatabaseWrapper as BaseDatabaseWrapper, IntegrityError
from psycopg2 import OperationalError
threadlocal = local()
class DatabaseWrapper(BaseDatabaseWrapper):
def _cursor(self, *args, **kwargs):
if hasattr(threadlocal, 'connection') and threadlocal.connection is \
not None and self.connection is None:
try: # Check if connection is alive
threadlocal.connection.cursor().execute('SELECT 1')
except OperationalError: # The connection is not working, need reconnect
threadlocal.connection = None
else:
self.connection = threadlocal.connection
cursor = super(DatabaseWrapper, self)._cursor(*args, **kwargs)
if (not hasattr(threadlocal, 'connection') or threadlocal.connection \
is None) and self.connection is not None:
threadlocal.connection = self.connection
return cursor
def close(self):
if self.connection is not None:
self.connection.commit()
self.connection = None