Cassandra Connection Pool for Flask - python

I am working on a REST web service built with Flask which needs to query a Cassandra database. The most expensive part of the logic is creating the connection to the Cassandra cluster.
What do I need to do with Flask so that I do not have to create the connection to the Cluster on every request?

You should not create a new connection on every request; instead, create one connection object per process.
If you are running your Flask application with uWSGI, I suggest using the @postfork decorator.
Say you are spawning 4 processes with uWSGI; then one session is created for each process right after it is spawned.
from uwsgidecorators import postfork
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
from cassandra.policies import RoundRobinPolicy

session = None
hosts = ["127.0.0.1", "127.0.0.2"]
keyspace = "mykeyspace"
# credentials are placeholders; substitute your own
auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")

def get_new_session():
    global cluster
    cluster = Cluster(hosts, protocol_version=4, auth_provider=auth_provider,
                      control_connection_timeout=None, max_schema_agreement_wait=10,
                      port=9042, load_balancing_policy=RoundRobinPolicy())
    s = cluster.connect(keyspace)
    s.row_factory = dict_factory
    return s

# initialize the session in every process spawned by uwsgi
@postfork
def connect():
    global session
    session = get_new_session()
    session.row_factory = dict_factory
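With one session per worker process, a request handler can use the module-level session directly. A minimal sketch, assuming a hypothetical users table in mykeyspace and that the view lives in the same module as the session:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<user_id>")
def get_user(user_id):
    # session was created once per process by the @postfork hook above;
    # dict_factory makes each row a dict, so it serializes directly to JSON
    rows = session.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    return jsonify(list(rows))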

Related

Alternative to resolve max connection limit issue in sqlalchemy?

Summary
We run into the MySQL “max connections reached” issue because many Python multiprocessing workers on different autoscaled AWS server instances make read/write queries, while the AWS RDS database instance has a limited “max connections” setting. We could beef up the RDS instance type (this shows approximately how many concurrent connections each instance type allows) and get a higher connection limit, but at some point those connections would also be exhausted if we scale up enough new server instances with new workers.
Questions
Is there a way to create a connection pool as a separate service on a separate AWS server instance, so that all Python multiprocessing workers across all autoscaled AWS server instances can use the pool and we would not exceed the RDS DB max connection limit?
We are able to create the pool using SQLAlchemy (direct link to pool docs) on the first server instance, for example, but how can workers from the other AWS server instances connect to that pool? This is why I highlight creating the pool on a separate AWS server instance: workers from all other servers would connect to it.
Are there any libraries that already handle this scenario? If not, does this sound like a huge effort to implement?
Main Components/Concepts of the Current App
Flask backend. It has a connection pool with its size set to 10, and it never goes beyond 10 connections. There is no issue with this part, as it is a separate web-facing component that is unrelated to the Python processing workers.
Python workers. These are multiprocessing workers that consume messages from the message broker. Whenever a worker gets a message, a DB connection is established and then closed at the end of the task. We have 4 types of workers, and each type has at least 5 instances (we could configure this to 10, for example, on a larger AWS instance). This leads to 20 concurrent connections (5x4) in the worst case, when all workers open a DB connection at the same time.
Autoscale. We automatically create new server instances for additional workers when there is an overload of messages (tasks). Every time a new server instance is added, there could be another 20 concurrent DB connections in the worst case if all workers connect at the same time: two server instances would mean 40 concurrent connections in the worst case, and 100 servers could mean 2000.
flask_app.py
app = Flask(__name__)
app.config.from_pyfile('../api.conf')
CORS(app)
jwt = JWTManager(app)
db = SQLAlchemy(app)
app.logger.info("[SQLPOOLSTATUS] pool size = {}".format(db.engine.pool.status()))

@app.route('/upload', methods=['POST'])
def api_upload_file():
    log_request(request)
    payload = request.get_json()
    # --- database read and write ---
    img_rec = db.session.query(Table).filter(Table.id == payload.get("img_id")).all()
    user_rec = db.session.query(Table2).filter(Table2.id == payload.get("user_id")).first()
    # --- some more code that writes records to the table ---
    db.session.add(record)
    db.session.commit()
    return json_response
worker.py
from models import Image, Upload, File, PDF, Album, Account
import os, sys, signal
import socket
import multiprocessing
import time
import pika
from sqlalchemy.orm import sessionmaker
from utils import *

def run_priority(workerid, stop_event):
    connection = amqp_connect()
    channel = connection.channel()
    amqp_init_queue(channel)
    channel.queue_declare(queue=queue, durable=True, exclusive=False, auto_delete=True)
    channel.queue_bind(routing_key=routing_key, queue=queue, exchange=exchange)
    method_frame, header_frame, body = channel.basic_get(queue)
    # --- Establish database connection ---
    engine = db_engine()
    connection = engine.connect()
    Session = sessionmaker(bind=engine)
    session = Session()
    # --- doing some database operation ---
    record = session.query(Table).first()
    try:
        session.add(new_record)
        session.commit()
    except Exception as e:
        session.rollback()

if __name__ == '__main__':
    stop_event = multiprocessing.Event()
    workers = []
    workerid = 0
    try:
        default_handler = signal.getsignal(signal.SIGINT)
        signal.signal(signal.SIGINT, signal.SIG_IGN)
        workercount = int(config.get('backend', 'priority_upload_workers'))
        for x in range(workercount):
            worker = multiprocessing.Process(target=run_priority, args=(workerid, stop_event))
            workers.append(worker)
            worker.daemon = True
            worker.start()
        workercount = int(config.get('backend', 'upload_workers'))
        for x in range(workercount):
            worker = multiprocessing.Process(target=run, args=(workerid, stop_event))
            workers.append(worker)
            worker.daemon = True
            worker.start()
        signal.signal(signal.SIGTERM, upload_sigterm_handler)
        signal.signal(signal.SIGINT, default_handler)
        monitor_worker(workers)
    except Exception as e:
        pass  # some code to handle exceptions
Tried: creating a Flask application with an SQLAlchemy pool as a separate service, but the challenge is that I would need to rewrite the SQLAlchemy ORM queries everywhere in the worker code. Is there a better way to tackle the problem?
Expectation: any alternative solution or suggestion that lets all multiprocessing workers share a connection pool globally, so database connections stay within the pool's limit and never exceed it.
Any links or resources would be helpful.

How can I setup a flask webhook to wait before process the next request if the previous one hasn't finished yet? [duplicate]

I'm writing a small Flask application and am having it connect to Rserve using pyRserve. I want every session to initiate and then maintain its own Rserve connection.
Something like this:
session['my_connection'] = pyRserve.connect()
doesn't work because the connection object is not JSON serializable. On the other hand, something like this:
flask.g.my_connection = pyRserve.connect()
doesn't work because it does not persist between requests. To add to the difficulty, it doesn't seem as though pyRserve provides any identifier for a connection, so I can't store a connection ID in the session and use that to retrieve the right connection before each request.
Is there a way to accomplish having a unique connection per session?
The following applies to any global Python data that you don't want to recreate for each request, not just rserve, and not just data that is unique to each user.
We need some common location to create an rserve connection for each user. The simplest way to do this is to run a multiprocessing.Manager as a separate process.
import atexit
from multiprocessing import Lock
from multiprocessing.managers import BaseManager

import pyRserve

connections = {}
lock = Lock()

def get_connection(user_id):
    with lock:
        if user_id not in connections:
            connections[user_id] = pyRserve.connect()
        return connections[user_id]

@atexit.register
def close_connections():
    for connection in connections.values():
        connection.close()

manager = BaseManager(('', 37844), b'password')
manager.register('get_connection', get_connection)
server = manager.get_server()
server.serve_forever()
Run it before starting your application, so that the manager will be available:
python rserve_manager.py
We can access this manager from the app during requests using a simple function. This assumes you've got a value for "user_id" in the session (which is what Flask-Login would do, for example). This ends up making the rserve connection unique per user, not per session.
from multiprocessing.managers import BaseManager
from flask import g, session

def get_rserve():
    if not hasattr(g, 'rserve'):
        manager = BaseManager(('', 37844), b'password')
        manager.register('get_connection')
        manager.connect()
        g.rserve = manager.get_connection(session['user_id'])
    return g.rserve
Access it inside a view:
result = get_rserve().eval('3 + 5')
This should get you started, although there's plenty that can be improved, such as not hard-coding the address and password, and not throwing away the connections to the manager. This was written with Python 3, but should work with Python 2.
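For instance, one way to stop throwing away the manager connection on every request is to keep a single manager client per worker process; a rough sketch along those lines (address and password still hard-coded for brevity):

from multiprocessing.managers import BaseManager
from flask import g, session

_manager = None

def _get_manager():
    # connect to the manager once per worker process and reuse it afterwards
    global _manager
    if _manager is None:
        _manager = BaseManager(('', 37844), b'password')
        _manager.register('get_connection')
        _manager.connect()
    return _manager

def get_rserve():
    if not hasattr(g, 'rserve'):
        g.rserve = _get_manager().get_connection(session['user_id'])
    return g.rserve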

How to configure Cassandra with Pyramid using SQLAlchemy?

I have a requirement to develop a web app using Pyramid with Cassandra in the back end. I've googled enough to find out how to configure Cassandra with Pyramid (using the alchemy scaffold), but I could not find many details on it. From my search, it seems it is not possible to configure NoSQL-class databases using alchemy. Is there any way to integrate Cassandra with Pyramid?
You just need to connect to your Cassandra cluster on application start and register the session on the request:
app.models.__init__.py
def includeme(config):
    def get_session():
        from cassandra.cluster import Cluster
        cluster = Cluster(['your.cluster.ip'])
        return cluster.connect()

    config.add_request_method(
        lambda request: get_session(),
        'dbsession',
        reify=True)
app.__init__:
def main(global_config, **settings):
    config = Configurator(settings=settings)
    config.include('app.models')
Then you can use the Cassandra session in your views by calling request.dbsession, for example:
request.dbsession.execute('SELECT name, email FROM users')
At the moment, using SQLAlchemy with Cassandra is not possible because SQLAlchemy generates SQL while Cassandra queries are written in CQL.
Regarding connecting Pyramid with the Cassandra database, I have an example similar to the one posted by @matino, but it also includes a finished callback, so all connections are closed at the end of the request.
Example of my app.__init__.py:
from cassandra.cluster import Cluster
from cassandra.io.libevreactor import LibevConnection
from cassandra.query import dict_factory

def main(global_config, **settings):
    """
    ... MORE CONFIG CODE ...
    """

    # Retrieves connection to Cassandra (non-SQL database)
    def get_cassandra(request):
        cluster = Cluster(['127.0.0.1'], port=9042)
        cluster.connection_class = LibevConnection

        def disconnect(request):
            cluster.shutdown()

        session = cluster.connect('app')
        session.row_factory = dict_factory
        request.add_finished_callback(disconnect)
        return session

    config.add_request_method(get_cassandra, 'cassandra', reify=True)

    """
    ... MORE CONFIG CODE ...
    """
It certainly works, although to be honest I don't know if this is the best approach, because every time we execute a statement:
request.cassandra.execute('SELECT * FROM users')
it goes through the whole process: creating the cluster, defining the connection, connecting, executing the statement, and shutting down the cluster.
I wonder if there is a better approach...
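One alternative, under the same assumptions as the example above (local cluster, keyspace 'app'), is to create the cluster and session once in main() and hand the same session to every request, so the connect/shutdown cost is paid only at startup. A rough sketch:

from pyramid.config import Configurator
from cassandra.cluster import Cluster
from cassandra.query import dict_factory

def main(global_config, **settings):
    config = Configurator(settings=settings)

    # connect once at startup and reuse the same session for every request
    cluster = Cluster(['127.0.0.1'], port=9042)
    session = cluster.connect('app')
    session.row_factory = dict_factory

    config.add_request_method(lambda request: session, 'cassandra', reify=True)

    return config.make_wsgi_app()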

Rabbitmq connections management in Pyramid web app?

How can I manage my RabbitMQ connection in a Pyramid app?
I would like to re-use a connection to the queue throughout the web application's lifetime. Currently I am opening/closing connection to the queue for every publish call.
But I can't find any "global" services definition in Pyramid. Any help appreciated.
Pyramid does not need a "global services definition" because you can trivially do that in plain Python:
db.py:
connection = None

def connect(url):
    global connection
    connection = FooBarBaz(url)
your startup file (__init__.py)
from db import connect

if __name__ == '__main__':
    connect(DB_CONNSTRING)
elsewhere:
from db import connection
...
connection.do_stuff(foo, bar, baz)
Having a global (any global) is going to cause problems if you ever run your app in a multi-threaded environment, but is perfectly fine if you run multiple processes, so it's not a huge restriction. If you need to work with threads the recipe can be extended to use thread-local variables. Here's another example which also connects lazily, when the connection is needed the first time.
db.py:
import threading

connections = threading.local()

def get_connection():
    if not hasattr(connections, 'this_thread_connection'):
        connections.this_thread_connection = FooBarBaz(DB_STRING)
    return connections.this_thread_connection
elsewhere:
from db import get_connection
get_connection().do_stuff(foo, bar, baz)
Another common problem with long-living connections is that the application won't auto-recover if, say, you restart RabbitMQ while your application is running. You'll need to somehow detect dead connections and reconnect.
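A rough sketch of lazy reconnection with pika's blocking connection (the AMQP URL is a placeholder):

import pika

_connection = None

def get_connection():
    # lazily (re)connect if there is no connection yet or it has died
    global _connection
    if _connection is None or _connection.is_closed:
        _connection = pika.BlockingConnection(
            pika.URLParameters("amqp://guest:guest@localhost:5672/%2F"))
    return _connection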
It looks like you can attach objects to the request with add_request_method.
Here's a little example app using that method to make one and only one connection to a socket on startup, then make the connection available to each request:
from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response

def index(request):
    return Response('I have a persistent connection: {} with id {}'.format(
        repr(request.conn).replace("<", "&lt;"),
        id(request.conn),
    ))

def add_connection():
    import socket
    s = socket.socket()
    s.connect(("google.com", 80))
    print("I should run only once")

    def inner(request):
        return s
    return inner

if __name__ == '__main__':
    config = Configurator()
    config.add_route('index', '/')
    config.add_view(index, route_name='index')
    config.add_request_method(add_connection(), 'conn', reify=True)
    app = config.make_wsgi_app()
    server = make_server('0.0.0.0', 8080, app)
    server.serve_forever()
You'll need to be careful about threading / forking in this case though (each thread / process will need its own connection). Also, note that I am not very familiar with Pyramid; there may be a better way to do this.

Python Redis connection should be closed on every request? (flask)

I am creating a Flask app with a Redis database, and I have one connection question.
I can have a global Redis connection and keep it open the whole time:
__init__.py
import os
from flask import Flask
import redis
app = Flask(__name__)
db = redis.StrictRedis(host='localhost', port=6379, db=0)
Or I can reconnect on every request (Flask docs: http://flask.pocoo.org/docs/tutorial/dbcon/):
__init__.py
import os
from flask import Flask, g
import redis

app = Flask(__name__)

# code...

@app.before_request
def before_request():
    g.db = connect_db()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
Which method is better, and why should I use it?
Thanks for the help!
By default redis-py uses connection pooling. The github wiki says:
Behind the scenes, redis-py uses a connection pool to manage connections to a Redis server. By default, each Redis instance you create will in turn create its own connection pool.
This means that for most applications, and assuming your Redis server is on the same machine as your Flask app, it's unlikely that "opening a connection" for each request will cause any performance issues. The creator of redis-py has suggested this approach:
a. create a global redis client instance and have your code use that.
b. create a global connection pool and pass that to various redis instances throughout your code.
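Option (b) might look roughly like this (host and port are placeholders):

import redis

# one pool shared by the whole process
pool = redis.ConnectionPool(host='localhost', port=6379, db=0)

def get_client():
    # every client created here borrows connections from the shared pool
    return redis.Redis(connection_pool=pool)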
Additionally, if you have a lot of instructions to execute at any one time, it may be worth having a look at pipelining, as this reduces the back-and-forth time required for each instruction.
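For example, using the global db client from the first snippet in the question, a pipeline batches several commands into a single round trip (the key name here is made up):

pipe = db.pipeline()
pipe.set('views:page1', 0)
pipe.incr('views:page1')
pipe.get('views:page1')
results = pipe.execute()  # all three commands are sent in one round trip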
In Flask, global variables are not recommended. We can use g to manage the Redis client during a request, similar to managing a database connection with a factory pattern.
from flask import g
import redis

def get_redis():
    if 'db' not in g:
        g.db = redis.Redis(host='localhost', port=6379, db=0)
    return g.db
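A view can then fetch the client through the factory; a tiny sketch, assuming the app object from the question and a made-up counter key:

@app.route('/counter')
def counter():
    db = get_redis()
    # incr creates the key on first use and returns the new value
    return str(db.incr('counter'))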
Reconnecting on every request is better for you.
The application context is a good place to store common data during a request or CLI command. Flask provides the g object for this purpose. It is a simple namespace object that has the same lifetime as an application context.
