What is the best pattern to handle Redis connections (both for interacting with Redis directly and indirectly through Python-RQ)?
Generally, database connections need to be closed / returned to a pool when done, but I don't see how to do that with redis-py. That makes me wonder if I'm doing it the wrong way.
Also, I have seen some performance dips when enqueuing jobs to RQ, which I've been told might relate to poor connection usage / reuse.
Basically, I'm interested in knowing the correct pattern, so I can either verify or correct what we have in our application.
Thanks a lot! If there's more information that would be helpful, let me know.
Behind the scenes, redis-py uses a connection pool to manage connections to a Redis server. By default, each Redis instance you create will in turn create its own connection pool. You can override this behavior and use an existing connection pool by passing an already created connection pool instance to the connection_pool argument of the Redis class. You may choose to do this in order to implement client-side sharding or to have finer-grained control of how connections are managed.
import redis

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
r = redis.Redis(connection_pool=pool)
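Since the question also mentions Python-RQ: RQ lets you pass an explicit connection, so one shared Redis client (and therefore one pool) can be reused for every enqueue instead of each call building its own client. A minimal sketch, where some_function is just a placeholder for a job function defined elsewhere:

import redis
from rq import Queue

# One client per process; it owns a single connection pool.
redis_conn = redis.Redis(host='localhost', port=6379, db=0)
q = Queue(connection=redis_conn)

# Enqueue jobs through the shared client, reusing its pool.
job = q.enqueue(some_function, 'some_arg')  # some_function is a placeholder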
Related
I would like to set up a pool of 3-4 MongoDB connections, held in memory as threads, which will act as a pool of connections. I don't want to create a connection every time my core functions run their DB queries. The idea is that my core functions would take an available connection (thread) from the pool, use it, and release it back to the pool.
Does it make any sense? Is it possible to achieve this?
I know that MongoDB internally does have connection pooling, but I would like to do the above-mentioned setup on top of it.
Your question doesn't really make sense. There's no need to do anything like have connection pooling on top of existing connection pooling.
From the PyMongo docs: all that really matters is whether your application uses multithreading or multiprocessing.
multithreading is already handled by the connection pools and you only need one client instance.
multiprocessing requires you to create a new instance for each process to avoid any deadlock issues.
Your app is likely multithreaded, so all you should need to do is have a global instance of your db connection and reuse it for each db query. PyMongo will take care of the rest.
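A minimal sketch of that pattern (the database and collection names here are just placeholders):

from pymongo import MongoClient

# One client for the whole process; it maintains its own connection pool
# and is safe to share across threads.
client = MongoClient('mongodb://localhost:27017')
db = client['mydb']

def get_user(user_id):
    # Each query borrows a connection from the client's internal pool.
    return db.users.find_one({'_id': user_id})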
If I don't need transactions, can I reuse the same database connection for multiple requests?
Flask documentation says:
Because database connections encapsulate a transaction, we also need to make sure that only one request at a time uses the connection.
Here's how I understand the meaning of the above sentence:
Python DB-API connection can only handle one transaction at a time; to start a new transaction, one must first commit or roll back the previous one. So if each of our requests needs its own transaction, then of course each request needs its own database connection.
Please let me know if I got it wrong.
But let's say I set autocommit mode, and handle each request in a single SQL statement. Or, alternatively, let's say I only read - not write - to the database. In either case, it seems I can just reuse the same database connection for all my requests to save the overhead of multiple connections. But I'm not sure if there's any downside to this approach.
Edit: I can see one issue with what I'm proposing: each request might be handled by a different process. Since connections should probably not be reused across processes, let me clarify my question: I mean creating one connection per process, and using it for all requests that happen to be handled by this process.
On the other hand, the whole point of (green or native) threads is usually to serve one request per thread, so my proposed approach implies sharing connection across threads. It seems one connection can be used concurrently in multiple native threads, but not in multiple green threads.
So let's say for concreteness my environment is flask + gunicorn with multiple multi-threaded sync workers.
Based on @Craig Ringer's comment on a different question, I think I know the answer.
The only possible advantage of connection sharing is performance (other factors - like transaction encapsulation and simplicity - favor a separate connection per request). And since a connection can't be shared across processes or green threads, it only has a chance with native threads. But psycopg2 (and presumably other drivers) doesn't allow concurrent access from the same connection. So unless each request spends very little time talking to the database, there is likely a performance hit, not benefit, from connection sharing.
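If the goal is just to avoid per-request connection overhead without sharing a single connection, a per-process pool is the usual middle ground. A minimal sketch using psycopg2's built-in pool (the DSN and pool sizes are placeholders):

from psycopg2.pool import ThreadedConnectionPool

# Created once per worker process (e.g. at app startup).
pool = ThreadedConnectionPool(minconn=1, maxconn=10,
                              dsn='dbname=mydb user=me host=localhost')

def handle_request():
    conn = pool.getconn()      # each request/thread gets its own connection
    try:
        with conn.cursor() as cur:
            cur.execute('SELECT 1')
        conn.commit()
    finally:
        pool.putconn(conn)     # return it for reuse by other threads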
Applications often need to connect to other services (a database, a cache, an API, etc). For sanity and DRY, we'd like to keep all of these connections in one module so the rest of our code base can share connections.
To reduce boilerplate, downstream usage should be simple:
# app/do_stuff.py
from .connections import AwesomeDB

db = AwesomeDB()

def get_stuff():
    return db.get('stuff')
And setting up the connection should also be simple:
# app/cli.py or some other main entry point
from .connections import AwesomeDB
db = AwesomeDB()
db.init(username='stuff admin') # Or os.environ['DB_USER']
Web frameworks like Django and Flask do something like this, but it feels a bit clunky:
Connect to a Database in Flask, Which Approach is better?
http://flask.pocoo.org/docs/0.10/tutorial/dbcon/
One big issue with this is that we want a reference to the actual connection object instead of a proxy, because we want to retain tab-completion in IPython and other dev environments.
So what's the Right Way (tm) to do it? After a few iterations, here's my idea:
# app/connections.py
from awesome_database import AwesomeDB as RealAwesomeDB
from horrible_database import HorribleDB as RealHorribleDB


class ConnectionMixin(object):
    __connection = None

    def __new__(cls):
        cls.__connection = cls.__connection or object.__new__(cls)
        return cls.__connection

    def __init__(self, real=False, **kwargs):
        if real:
            super().__init__(**kwargs)

    def init(self, **kwargs):
        kwargs['real'] = True
        self.__init__(**kwargs)


class AwesomeDB(ConnectionMixin, RealAwesomeDB):
    pass


class HorribleDB(ConnectionMixin, RealHorribleDB):
    pass
Room for improvement: Set initial __connection to a generic ConnectionProxy instead of None, which catches all attribute access and throws an exception.
I've done quite a bit of poking around here on SO and in various OSS projects and haven't seen anything like this. It feels pretty solid, though it does mean a bunch of modules will be instantiating connection objects as a side effect at import time. Will this blow up in my face? Are there any other negative consequences to this approach?
First, design-wise, I might be missing something, but I don't see why you need the heavy mixin+singleton machinery instead of just defining a helper like so:
_awesome_db = None

def awesome_db(**overrides):
    global _awesome_db
    if _awesome_db is None:
        # Read config/set defaults.
        # overrides.setdefault(...)
        _awesome_db = RealAwesomeDB(**overrides)
    return _awesome_db
Also, there is a bug (perhaps this isn't a supported use case, but still): if you make the following two calls in a row, you wrongly get the same connection object twice even though you passed different parameters:
db = AwesomeDB()
db.init(username='stuff admin')
db = AwesomeDB()
db.init(username='not-admin') # You'll get admin connection here.
An easy fix for that would be to use a dict of connections keyed on the input parameters.
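A rough sketch of that fix, reusing the hypothetical RealAwesomeDB from above, could look like this:

_connections = {}

def awesome_db(**kwargs):
    # Key the cache on the (hashable) connection parameters so that
    # different parameters produce different connection objects.
    key = tuple(sorted(kwargs.items()))
    if key not in _connections:
        _connections[key] = RealAwesomeDB(**kwargs)
    return _connections[key]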
Now, on the essence of the question.
I think the answer depends on how your "connection" classes are actually implemented.
Potential downsides with your approach I see are:
In a multithreaded environment you could get problems with unsynchronized concurrent access to the global connection object from multiple threads, unless it is already thread-safe. If you care about that, you could change your code and interface a bit and use a thread-local variable (see the thread-local sketch below).
What if a process forks after creating the connection? Web application servers tend to do that and it might not be safe, again depending on the underlying connection.
Does the connection object have state? What happens if the connection object becomes invalid (due to, e.g., a connection error or timeout)? You might need to replace the broken connection with a new one to return the next time a connection is requested.
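For the first point, a minimal thread-local sketch (SomeConnection stands in for whatever client class you actually use):

import threading

_local = threading.local()

def get_connection():
    # Each thread lazily creates, and then reuses, its own connection.
    if not hasattr(_local, 'connection'):
        _local.connection = SomeConnection()  # placeholder client class
    return _local.connection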
Connection management is often already efficiently and safely implemented through a connection pool in client libraries.
For example, the redis-py Redis client uses the following implementation:
https://github.com/andymccurdy/redis-py/blob/1c2071762ad9b9288e786665990083e61c1cf355/redis/connection.py#L974
The Redis client then uses the connection pool like so:
Requests a connection from the connection pool.
Tries to execute a command on the connection.
If the connection fails, the client closes it.
In any case, the connection is finally returned to the connection pool so it can be reused by subsequent calls or other threads.
So since the Redis client handles all of that under the hood, you can safely do what you want directly. Connections will be lazily created until the connection pool reaches full capacity.
# app/connections.py
import redis

def redis_client(**kwargs):
    # Maybe read configuration/set default arguments
    # kwargs.setdefault()
    return redis.Redis(**kwargs)
Similarly, SQLAlchemy can use connection pooling as well.
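For example, a minimal sketch of SQLAlchemy's built-in pooling (the URL and pool sizes are illustrative):

from sqlalchemy import create_engine, text

# One engine per process; it maintains an internal connection pool.
engine = create_engine('postgresql://user:password@localhost/mydb',
                       pool_size=5,      # connections kept open in the pool
                       max_overflow=10)  # extra connections allowed under load

with engine.connect() as conn:
    result = conn.execute(text('SELECT 1'))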
To summarize, my understanding is that:
If your client library supports connection pooling, you don't need to do anything special to share connections between modules and even threads. You could just define a helper similar to redis_client() that reads configuration, or specifies default parameters.
If your client library provides only low-level connection objects, you will need to make sure access to them is thread-safe and fork-safe. Also, you need to make sure you return a valid connection each time (or raise an exception if you can't establish one or reuse an existing one).
I am working on an online judge. I am using Python 2.7 and MySQL (as I am working on the back-end part).
My Method:
I create a main thread which pulls submissions from the database (10 at a time) and puts them in a queue. Then I have multiple threads that take submissions from the queue, evaluate them, and write the results back to the database.
Now I have some doubts (I know they are doubts about different topics, but an approach to any of them is also highly appreciated).
Currently, when I start the threads, I give each of them its own DB connection, which they use. Is it good practice to give one connection per thread? Does sharing connections between threads create problems? How do I go about this?
My main thread uses a single connection, as its only work is to pull submissions from the DB and put them in the queue (and also update their status in the DB to "Assessing Submission"). But sometimes I get the error "Lost connection to MySQL server while querying". I keep getting it even when I stop the program and start it again. What do I do about it? Also, should I implement a pool of connections just for the main thread?
Also, does a DB connection stay alive forever? What do I do when its session memory etc. gets exhausted, and how do I handle that?
Use a connection pool. Sharing the database connection is not always bad, but you have to be careful about it. You can try SQLAlchemy to manage a lot of this for you (see the sketch below): http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#unitofwork-contextual
The server might be out of connections, or your connection might have been killed because it used too many resources, etc. A connection pool could help you solve this.
It all depends; theoretically it could stay alive indefinitely, but usually you have a timeout somewhere.
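As a sketch of the SQLAlchemy approach linked above, each worker thread can use a contextual (scoped) session backed by one shared engine and pool; pool_recycle also helps against MySQL dropping idle connections. The URL and the work inside the function are placeholders:

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

# One engine (and pool) per process; recycle connections before MySQL's timeout.
engine = create_engine('mysql+mysqldb://user:password@localhost/judge',
                       pool_recycle=3600)
Session = scoped_session(sessionmaker(bind=engine))

def evaluate_submission(submission):
    session = Session()   # this thread's own session/connection
    try:
        # ... evaluate the submission and write the result ...
        session.commit()
    finally:
        Session.remove()  # give the connection back to the pool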
If you give the same connection to every thread, the threads will not be able to query the database properly and race conditions will occur. So you do need to provide a separate connection to every thread, and it is indeed a good idea. Use a connection pool for this purpose; it will help you get different connections.
A connection pool will surely help.
Release the connection once your work is over. There is a limit on how long a connection can be held, termed the connection timeout. So you need to use some third-party library to handle that; c3p0 (from the Java world) is a good example of a library that helps with this.
Please refer the below link to configure it:
Best configuration of c3p0
I have developed some custom DAO-like classes to meet some very specialized requirements for my project that is a server-side process that does not run inside any kind of framework.
The solution works great except that every time a new request is made, I open a new connection via MySQLdb.connect.
What is the best "drop in" solution to switch this over to using connection pooling in python? I am imagining something like the commons DBCP solution for Java.
The process is long running and has many threads that need to make requests, but not all at the same time... specifically they do quite a lot of work before brief bursts of writing out a chunk of their results.
Edited to add:
After some more searching I found anitpool.py which looks decent, but as I'm relatively new to python I guess I just want to make sure I'm not missing a more obvious/more idiomatic/better solution.
In MySQL?
I'd say don't bother with the connection pooling. They're often a source of trouble and with MySQL they're not going to bring you the performance advantage you're hoping for. This road may be a lot of effort to follow--politically--because there's so much best practices hand waving and textbook verbiage in this space about the advantages of connection pooling.
Connection pools are simply a bridge between the post-web era of stateless applications (e.g. the HTTP protocol) and the pre-web era of stateful, long-lived batch-processing applications. Because connections were very expensive in pre-web databases (no one used to care too much about how long a connection took to establish), post-web applications devised this connection-pool scheme so that every hit didn't incur this huge processing overhead on the RDBMS.
Since MySQL is more of a web-era RDBMS, connections are extremely lightweight and fast. I have written many high volume web applications that don't use a connection pool at all for MySQL.
This is a complication you may benefit from doing without, so long as there isn't a political obstacle to overcome.
IMO, the "more obvious/more idiomatic/better solution" is to use an existing ORM rather than invent DAO-like classes.
It appears to me that ORMs are more popular than "raw" SQL connections. Why? Because Python is OO, and the mapping from a SQL row to an object is absolutely essential. There aren't many use cases where you deal with SQL rows that don't map to Python objects.
I think that SQLAlchemy or SQLObject (and the associated connection pooling) are the more idiomatic Pythonic solutions.
Pooling as a separate feature isn't very common because pure SQL (without object mapping) isn't very popular for the kind of complex, long-running processes that benefit from connection pooling. Yes, pure SQL is used, but it's always used in simpler or more controlled applications where pooling isn't helpful.
I think you might have two alternatives:
Revise your classes to use SQLAlchemy or SQLObject. While this appears painful at first (all that work wasted), you should be able to leverage all the design and thought. It's merely an exercise in adopting a widely-used ORM and pooling solution.
Roll out your own simple connection pool using the algorithm you outlined -- a simple set or list of connections that you cycle through (see the sketch after these steps).
Wrap your connection class.
Set a limit on how many connections you make.
Return an unused connection.
Intercept close to free the connection.
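A rough sketch of those four steps (not production code; the MySQLdb.connect arguments are placeholders):

import queue
import threading
import MySQLdb

class SimplePool(object):
    """Hand out idle connections and create new ones up to a fixed limit."""

    def __init__(self, maxsize=5, **conn_kwargs):
        self._idle = queue.Queue()
        self._lock = threading.Lock()
        self._created = 0
        self._maxsize = maxsize
        self._conn_kwargs = conn_kwargs

    def acquire(self):
        try:
            return self._idle.get_nowait()   # reuse an unused connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._maxsize:
                self._created += 1
                return MySQLdb.connect(**self._conn_kwargs)
        return self._idle.get()              # at the limit: block until one is released

    def release(self, conn):
        # "Intercept close": instead of conn.close(), put it back for reuse.
        self._idle.put(conn)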
Update:
I put something like this in dbpool.py:
import sqlalchemy.pool as pool
import MySQLdb as mysql
mysql = pool.manage(mysql)
Old thread, but for general-purpose pooling (connections or any expensive object), I use something like:
import contextlib
import multiprocessing

def pool(ctor, limit=None):
    local_pool = multiprocessing.Queue()
    n = multiprocessing.Value('i', 0)

    @contextlib.contextmanager
    def pooled(ctor=ctor, lpool=local_pool, n=n):
        # block iff at limit
        try:
            i = lpool.get(limit and n.value >= limit)
        except multiprocessing.queues.Empty:
            n.value += 1
            i = ctor()
        try:
            yield i
        finally:
            # always return the object to the pool, even on error
            lpool.put(i)

    return pooled
Which constructs lazily, has an optional limit, and should generalize to any use case I can think of. Of course, this assumes that you really need the pooling of whatever resource, which you may not for many modern SQL-likes. Usage:
# in main:
my_pool = pool(lambda: do_something())

# in thread:
with my_pool() as my_obj:
    my_obj.do_something()
This does assume that whatever object ctor creates has an appropriate destructor if needed (some servers don't kill connection objects unless they are closed explicitly).
I've just been looking for the same sort of thing.
I've found pysqlpool and the sqlalchemy pool module.
Replying to an old thread but the last time I checked, MySQL offers connection pooling as part of its drivers.
You can check them out at:
https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html
From TFA, assuming you want to open a connection pool explicitly (as the OP stated):
dbconfig = { "database": "test", "user":"joe" }
cnxpool = mysql.connector.pooling.MySQLConnectionPool(pool_name = "mypool",pool_size = 3, **dbconfig)
This pool is then accessed by requesting a connection from it through the get_connection() function.
cnx1 = cnxpool.get_connection()
cnx2 = cnxpool.get_connection()
Making your own connection pool is a BAD idea if your app ever decides to start using multi-threading. Making a connection pool for a multi-threaded application is much more complicated than one for a single-threaded application. You can use something like PySQLPool in that case.
It's also a BAD idea to use an ORM if you're looking for performance.
If you'll be dealing with huge/heavy databases that have to handle lots of selects, inserts, updates and deletes at the same time, then you're going to need performance, which means you'll need custom SQL written to optimize lookups and lock times. With an ORM you don't usually have that flexibility.
So basically, yeah, you can make your own connection pool and use ORMs but only if you're sure you won't need anything of what I just described.
Use DBUtils, simple and reliable.
pip install DBUtils
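A minimal sketch with DBUtils' PooledDB, here assuming pymysql as the underlying driver (the import path is dbutils.pooled_db in DBUtils 2.x; credentials are placeholders):

import pymysql
from dbutils.pooled_db import PooledDB

pool = PooledDB(creator=pymysql,    # DB-API module used to create connections
                maxconnections=5,   # upper bound on open connections
                host='localhost', user='user',
                password='secret', database='test')

conn = pool.connection()            # borrow a connection from the pool
try:
    with conn.cursor() as cur:
        cur.execute('SELECT 1')
finally:
    conn.close()                    # returns the connection to the pool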
I did this for OpenSearch, so you can refer to it.
from opensearchpy import OpenSearch

# `settings` is assumed to be this application's configuration module.


def get_connection():
    connection = None
    try:
        connection = OpenSearch(
            hosts=[{'host': settings.OPEN_SEARCH_HOST, 'port': settings.OPEN_SEARCH_PORT}],
            http_compress=True,
            http_auth=(settings.OPEN_SEARCH_USER, settings.OPEN_SEARCH_PASSWORD),
            use_ssl=True,
            verify_certs=True,
            ssl_assert_hostname=False,
            ssl_show_warn=False,
        )
    except Exception as error:
        print("Error: Connection not established {}".format(error))
    else:
        print("Connection established")
    return connection


class OpenSearchClient(object):
    # Class-level lists shared by all instances: idle clients and clients in use.
    connection_pool = []
    connection_in_use = []

    def __init__(self):
        # Populate the pool once, on first instantiation.
        if not OpenSearchClient.connection_pool:
            OpenSearchClient.connection_pool = [
                get_connection() for i in range(0, settings.CONNECTION_POOL_SIZE)
            ]

    def search_data(self, query="", index_name=settings.OPEN_SEARCH_INDEX):
        # Take a client from the pool, run the search, then put the client back.
        available_cursor = OpenSearchClient.connection_pool.pop(0)
        OpenSearchClient.connection_in_use.append(available_cursor)
        response = available_cursor.search(body=query, index=index_name)
        available_cursor.close()
        OpenSearchClient.connection_pool.append(available_cursor)
        OpenSearchClient.connection_in_use.pop(-1)
        return response