I have a Flask web app running on Heroku. Some functions need more than 30 seconds to process data, so for those tasks I use Heroku background jobs with Redis (20-connection limit). However, these tasks are only available to specific users.
My understanding is that Redis opens a connection as soon as I initiate the Queue, regardless of whether a job is ever queued or processed.
Here's my import and Queue initiation:
from rq import Queue
from rq.job import Job
from worker import conn as rconn
q = Queue(connection=rconn)
And here's my worker file:
import os
import urllib
from redis import Redis
from rq import Worker, Queue, Connection
listen = ['high', 'default', 'low']
redis_url = os.getenv('REDIS_URL')
urllib.parse.uses_netloc.append('redis')
url = urllib.parse.urlparse(redis_url)
conn = Redis(host=url.hostname, port=url.port, db=0, password=url.password)
if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
I am looking for a way to initiate the Redis connection only for users with a specific access level, so the app won't hit the connection limit.
Does it make sense to initiate the Queue from the user_login function as a global variable, like this:
if check_password_hash(db_pwd, pwd) and acces_level == 4:
    global q
    q = Queue(connection=rconn)
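For what it's worth, redis-py normally opens connections lazily, only when a command is actually sent, so constructing the Queue by itself shouldn't consume one of the 20 connections. If you still want to defer it, one option is a small lazy factory instead of a global created at login. This is just a sketch with my own naming (get_queue and long_running_task are made-up names), not tested against this app:
from rq import Queue
from worker import conn as rconn  # the same Redis client the worker file builds

_q = None

def get_queue():
    # Build the Queue on first use only; every caller then shares the same
    # Redis client, so the connection count does not grow per user.
    global _q
    if _q is None:
        _q = Queue(connection=rconn)
    return _q

# inside the view that only privileged users can reach:
# if check_password_hash(db_pwd, pwd) and acces_level == 4:
#     get_queue().enqueue(long_running_task)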
Summary
We run into the MySQL "max connections reached" issue by making a lot of read/write queries from different Python multiprocessing workers across autoscaled AWS server instances, because the AWS RDS database instance has a limited max connections setting. While we could beef up the RDS instance type (this shows approximately how many concurrent connections each instance type allows) and get a higher connection limit, at some point those connections would also be exhausted if we scale up enough new server instances with new workers.
Questions
Is there a way to create a Connection Pool as a separate service on a separate AWS server instance, so that all python multiprocessing workers across all autoscaled AWS server instances can use the pool and thus we would not exceed the RDS DB max connection limit?
We are able to create the pool using SQLAlchemy (direct link to pool docs) on the first server instance, for example, but how can the workers on the other AWS server instances connect to that pool? This is why I highlight creating the pool on a separate AWS server instance: workers from all other servers would connect to it.
Are there any libraries that already handle this scenario? If not, is this a huge effort to implement?
Main Components/Concepts of the Current APP
Flask backend. It has a connection pool with the size set to 10, and it never exceeds 10 connections. There is no issue with this part, as it is a separate web-facing component unrelated to the Python processing workers.
Python workers. These are multiprocessing workers that consume messages from the message broker. Whenever a worker gets a message, a DB connection is established and closed at the end of the task. We have 4 types of workers and each has at least 5 instances (we could configure this to 10, for example, on a larger AWS instance). This leads to 20 concurrent connections (5x4) in the worst case, when all workers open a DB connection at the same time.
Autoscale. We automatically create new server instances for additional workers when there is an overload of messages (tasks). Every new server instance can therefore add another 20 concurrent DB connections in the worst case. With two server instances that is 40 concurrent connections in the worst case; with 100 servers it could be 2000.
flask_app.py
app = Flask(__name__)
app.config.from_pyfile('../api.conf')
CORS(app)
jwt = JWTManager(app)
db = SQLAlchemy(app)
app.logger.info("[SQLPOOLSTATUS] pool size = {}".format(db.engine.pool.status()))

@app.route('/upload', methods=['POST'])
def api_upload_file():
    log_request(request)
    payload = request.get_json()
    # --- database read and write ---
    img_rec = db.session.query(Table).filter(Table.id == payload.get("img_id")).all()
    user_rec = db.session.query(Table2).filter(Table2.id == payload.get("user_id")).first()
    # --- some more code that writes records to the table ---
    db.session.add(record)
    db.session.commit()
    return json_response
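Not from the original project, but as an illustration of where the 10-connection cap described above would typically live: Flask-SQLAlchemy (2.4+) forwards pool settings to the engine via SQLALCHEMY_ENGINE_OPTIONS, so api.conf might contain something like the following (the URI and numbers are placeholders):
# hypothetical excerpt from api.conf (a plain Python config file read by from_pyfile)
SQLALCHEMY_DATABASE_URI = "mysql+pymysql://user:password@my-rds-host/mydb"
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_size": 10,        # the 10-connection pool mentioned above
    "max_overflow": 0,      # never open connections beyond pool_size
    "pool_pre_ping": True,  # discard connections the server has already closed
}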
worker.py
from models import Image, Upload, File, PDF, Album, Account
import os, sys, signal
import socket
import multiprocessing
import time
import pika
from utils import *

def run_priority(workerid, stop_event):
    connection = amqp_connect()
    channel = connection.channel()
    amqp_init_queue(channel)
    channel.queue_declare(queue=queue, durable=True, exclusive=False, auto_delete=True)
    channel.queue_bind(routing_key=routing_key, queue=queue, exchange=exchange)
    method_frame, header_frame, body = channel.basic_get(queue)

    # --- Establish database connection ---
    engine = db_engine()
    connection = engine.connect()
    Session = sessionmaker(bind=engine)
    session = Session()

    # --- doing some database operation ---
    record = session.query(Table).first()
    try:
        session.add(new_record)
        session.commit()
    except Exception as e:
        session.rollback()

if __name__ == '__main__':
    stop_event = multiprocessing.Event()
    workers = []
    workerid = 0
    try:
        default_handler = signal.getsignal(signal.SIGINT)
        signal.signal(signal.SIGINT, signal.SIG_IGN)

        workercount = int(config.get('backend', 'priority_upload_workers'))
        for x in range(workercount):
            worker = multiprocessing.Process(target=run_priority, args=(workerid, stop_event))
            workers.append(worker)
            worker.daemon = True
            worker.start()

        workercount = int(config.get('backend', 'upload_workers'))
        for x in range(workercount):
            worker = multiprocessing.Process(target=run, args=(workerid, stop_event))
            workers.append(worker)
            worker.daemon = True
            worker.start()

        signal.signal(signal.SIGTERM, upload_sigterm_handler)
        signal.signal(signal.SIGINT, default_handler)
        monitor_worker(workers)
    except Exception as e:
        # some code to handle exceptions
        pass
Tried: creating a Flask application with an SQLAlchemy pool as a separate service, but the challenge is that I would need to rewrite the SQLAlchemy ORM queries everywhere in the worker code. Is there a better way to tackle the problem?
Expectation: any alternative solution or suggestion for sharing a connection pool across all multiprocessing workers, so that database connections stay within the limited pool and never exceed its limit.
Any links or resources would be helpful.
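One partial measure, sketched below with my own names (it bounds connections per worker process rather than providing the cross-server pool asked about above): give each worker process a single long-lived engine with a hard pool cap, created once after the process starts, instead of opening and closing a connection per message. Total DB connections then stay at roughly worker_processes x pool_size per server, using only the SQLAlchemy pool options the question already links to.
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def make_process_engine(db_url):
    # Call this once inside run_priority()/run() after the worker process starts;
    # SQLAlchemy engines should not be shared across a fork boundary.
    engine = create_engine(
        db_url,
        pool_size=1,         # one pooled connection per worker process
        max_overflow=0,      # never open extras; callers wait for the pooled one
        pool_pre_ping=True,  # detect connections RDS dropped while idle
        pool_recycle=1800,   # recycle before MySQL's wait_timeout closes them
    )
    return engine, sessionmaker(bind=engine)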
Hello fellow developers,
I'm trying to create a small web app that lets me monitor multiple Binance accounts from a dashboard and maybe, in the future, perform some small automatic trading actions.
My frontend is implemented with Vue+Quasar and my backend is a Python Flask server exposing the REST API.
What I would like to do is start a background process dynamically when a specific endpoint of my server is called. Once this process is started on the server, I would like it to communicate via WebSocket with my Vue client.
Right now I can spawn the worker and create the websocket communication, but I can't figure out how to make all the threads in my worker work together. Let me get a bit more specific:
Once my worker is started, I'm trying to create at least two threads. One is an infinite loop allowing me to automate some small actions, and the other is the flask-socketio server that handles the socket connections. Here is the code of that worker:
customWorker.py
import os
import time
from flask import Flask
from flask_socketio import SocketIO, send, emit
import threading
import json
import eventlet

# custom class allowing me to communicate with my mongoDB
from db_wrap import DbWrap
from binance.client import Client
from binance.exceptions import BinanceAPIException, BinanceWithdrawException, BinanceRequestException
from binance.websockets import BinanceSocketManager


def process_message(msg):
    print('got a websocket message')
    print(msg)


class customWorker:
    def __init__(self, workerId, sleepTime, dbWrap):
        self.workerId = workerId
        self.sleepTime = sleepTime
        self.socketio = None
        self.dbWrap = DbWrap()
        # this retrieves the worker configuration from the database
        self.config = json.loads(self.dbWrap.get_worker(workerId))
        keys = self.dbWrap.get_worker_keys(workerId)
        self.binanceClient = Client(keys['apiKey'], keys['apiSecret'])

    def handle_message(self, data):
        print('My PID is {} and I received {}'.format(os.getpid(), data))
        send(os.getpid())

    def init_websocket_server(self):
        app = Flask(__name__)
        socketio = SocketIO(app, async_mode='eventlet', logger=True, engineio_logger=True, cors_allowed_origins="*")
        eventlet.monkey_patch()
        socketio.on_event('message', self.handle_message)
        self.socketio = socketio
        self.app = app

    def launch_main_thread(self):
        while True:
            print('My PID is {} and workerId {}'.format(os.getpid(), self.workerId))
            if self.socketio is not None:
                info = self.binanceClient.get_account()
                self.socketio.emit('my_account', info, namespace='/')

    def launch_worker(self):
        self.init_websocket_server()
        self.socketio.start_background_task(self.launch_main_thread)
        self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False)
Once the REST endpoint is called, the worker is spawned by calling the birth_worker() method of a "Broker" object available within my server:
from multiprocessing import Process
from custom_worker import customWorker
# ...

def create_worker(self, workerid, sleepTime, dbWrap):
    worker = customWorker(workerid, sleepTime, dbWrap)
    worker.launch_worker()

def birth_worker(self, workerid, sleepTime, dbWrap):
    p = Process(target=self.create_worker, args=(workerid, sleepTime, dbWrap))
    p.start()
So when this is done, the worker is launched in a separate process that successfully creates the threads and listens for socket connections. But my problem is that I can't use my binanceClient in my main thread. I think it uses threads, and the fact that I use eventlet, in particular the monkey_patch() function, breaks it. When I try to call the binanceClient.get_account() method I get the error AttributeError: module 'select' has no attribute 'poll'.
I'm pretty sure it comes from monkey_patch, because if I call it in the __init__() method of my worker (before patching) it works and I can get the account info. So I guess there is a conflict here that I've been trying to resolve, unsuccessfully.
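For reference, the usual eventlet rule is that monkey_patch() has to run before any module that grabs references to socket/select/threading is imported; calling it inside init_websocket_server(), after python-binance has already been imported, is too late. Below is a sketch of that ordering only; it is an assumption about what may help here, not a verified fix for python-binance:
# top of the worker's entry module -- patch first, import everything else afterwards
import eventlet
eventlet.monkey_patch()

# only now import libraries that capture socket/select at import time
from flask import Flask
from flask_socketio import SocketIO
from binance.client import Client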
I've tried using only the thread mode for my Socket.IO app with async_mode='threading', but then my flask-socketio app won't start and listen for sockets, as the line self.socketio.run(self.app, host="127.0.0.1", port=8001, debug=True, use_reloader=False) blocks everything.
I'm pretty sure I have an architecture problem here and that I shouldn't start my app by calling socketio.run. I've been unable to start it with gunicorn, for example, because I need it to be dynamic and callable from my Python scripts. I've been struggling to find the proper way to do this, and that's why I'm here today.
Could someone please give me a hint on how this is supposed to be achieved? How can I dynamically spawn a subprocess that manages a socket server thread, an infinite-loop thread, and connections with binanceClient? I've been roaming Stack Overflow without success; every piece of advice is welcome, even an architecture overhaul.
Here is my environment:
Manjaro Linux 21.0.1
pip-chill:
eventlet==0.30.2
flask-cors==3.0.10
flask-socketio==5.0.1
pillow==8.2.0
pymongo==3.11.3
python-binance==0.7.11
websockets==8.1
In my Heroku application I successfully implemented background tasks. For this purpose I created a Queue object at the top of my views.py file and called queue.enqueue() in the appropriate view.
Now I'm trying to set up a repeated job with rq-scheduler's scheduler.schedule() method. I know it is not the best way to do it, but I call this method at the top of my views.py file as well. Whatever I do, I can't get it to work, even with a simple HelloWorld function.
views.py:
from datetime import datetime

from redis import Redis
from rq import Queue
from worker import conn
from rq_scheduler import Scheduler

# the queue already used for the existing background tasks
q = Queue(connection=conn)

scheduler = Scheduler(queue=q, connection=conn)
print("SCHEDULER = ", scheduler)

def say_hello():
    print(" Hello world!")

scheduler.schedule(
    scheduled_time=datetime.utcnow(),  # Time for first execution, in UTC timezone
    func=say_hello,                    # Function to be queued
    interval=60,                       # Time before the function is called again, in seconds
    repeat=10,                         # Repeat this number of times (None means repeat forever)
    queue_name='default',
)
worker.py:
import os

import redis
from rq import Worker, Queue, Connection
import django

django.setup()

listen = ['high', 'default', 'low']

redis_url = os.getenv('REDISTOGO_URL')
if not redis_url:
    print("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")
    raise RuntimeError("Set up Redis To Go first. Probably can't get env variable REDISTOGO_URL")

conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        print(" CREATING NEW WORKER IN worker.py")
        worker = Worker(map(Queue, listen))
        worker.work()
I'm checking the length of my queue before and after schedule(), and it looks like the length is always 0. I can also see the jobs when I call scheduler.get_jobs(), but those jobs never seem to get enqueued or performed.
I also don't want to use another cron solution for this project; since I can already do background tasks with rq, it shouldn't be that hard to implement a repeated task, or is it?
I went through the documentation a couple of times and now feel stuck, so I appreciate any help or advice I can get.
Using rq 1.6.1 and rq-scheduler 0.10.0 packages with Django 2.2.5 and Python 3.6.10
Edit: When I print the jobs in the scheduler, I see that their enqueued_at param is set to None. Am I missing something really simple?
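One thing worth checking (this is how rq-scheduler is documented to work, though it may not be the only issue here): scheduler.schedule() only records the job in Redis. A separate scheduler process has to poll for due jobs and move them onto the queue, where the rq worker finally picks them up; until that process runs, enqueued_at stays None and the queue length stays 0. On Heroku that usually means running the bundled rqscheduler command (or the equivalent Python sketch below, file name mine) as its own process alongside the worker:
# run_scheduler.py -- minimal sketch of the polling process rq-scheduler needs
import os

from redis import Redis
from rq_scheduler import Scheduler

conn = Redis.from_url(os.getenv('REDISTOGO_URL'))

# Poll Redis every 60 seconds and enqueue any scheduled jobs that are due.
Scheduler(connection=conn, interval=60).run()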
I'm trying to extend the flask-base project (https://github.com/hack4impact/flask-base/tree/master/app), which comes with a user model only. I'm trying to add the ability to run a background task on Redis using rq. I've found https://devcenter.heroku.com/articles/python-rq, which is helpful.
This app has support for Redis queues, with a background Redis queue implemented by running:
@manager.command
def run_worker():
    """Initializes a slim rq task queue."""
    listen = ['default']
    conn = Redis(
        host=app.config['RQ_DEFAULT_HOST'],
        port=app.config['RQ_DEFAULT_PORT'],
        db=0,
        password=app.config['RQ_DEFAULT_PASSWORD'])

    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
using:
$ python manage.py run_worker
In my views I have:
@main.route('/selected')
def background_selected():
    from rq import Queue
    from manage import run_worker.conn
    q = Queue(connection=conn)
    return q.enqueue(selected)
The problem is that I don't know how to import the connection created in run_worker() into my view. I've tried variations of:
from manage import run_worker.conn
but I'm getting:
SyntaxError: invalid syntax.
How can I get access to the conn variable in the background task?
From the documentation: python-rq Configuration.
Can you try making the changes below?
manager.py
import redis
from rq import Worker, Queue, Connection

listen = ['default']
conn = redis.Redis(host=app.config['RQ_DEFAULT_HOST'],
                   port=app.config['RQ_DEFAULT_PORT'],
                   db=0,
                   password=app.config['RQ_DEFAULT_PASSWORD'])

@manager.command
def run_worker():
    """Initializes a slim rq task queue."""
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
and from view:
from rq import Queue
from manage import conn
q = Queue(connection=conn)
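As a side note on why the original attempt failed: from manage import run_worker.conn can never parse, since dotted names are not allowed after import like that, and a function's local variables are not importable anyway. Moving conn to module level makes it a plain attribute of the module, which the import machinery can see. A small usage sketch of the view once that is done (whether selected takes no arguments is an assumption on my part):
from rq import Queue
from manage import conn  # works now that conn is a module-level name

q = Queue(connection=conn)

@main.route('/selected')
def background_selected():
    job = q.enqueue(selected)
    # return something serializable; the Job object itself is not a valid Flask response
    return job.get_id()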
I contacted the developer who provided the following:
I am working on a REST web service built with Flask which needs to query a Cassandra database. The most expensive part of the logic is creating the connection to the Cassandra cluster.
What do I need to do with Flask so that I do not have to create the connection to the Cluster on every request?
You should not create a new connection on every request; rather, you should create one connection object per process.
If you are running your Flask application with uwsgi, I suggest using the @postfork decorator.
Say you are spawning 4 processes with uwsgi: then one session is created for each process after it is spawned.
from uwsgidecorators import postfork
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
from cassandra.policies import RoundRobinPolicy

session = None
hosts = ["127.0.0.1", "127.0.0.2"]
keyspace = "mykeyspace"
# placeholder: auth_provider was not defined in the original snippet
auth_provider = PlainTextAuthProvider(username="user", password="pass")

def get_new_session():
    global cluster
    cluster = Cluster(hosts, protocol_version=4, auth_provider=auth_provider,
                      control_connection_timeout=None,
                      max_schema_agreement_wait=10, port=9042,
                      load_balancing_policy=RoundRobinPolicy())
    s = cluster.connect(keyspace)
    s.row_factory = dict_factory
    return s

# initializing a session in every process spawned by uwsgi
@postfork
def connect():
    global session
    session = get_new_session()
    session.row_factory = dict_factory
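To show how the per-process session is then consumed (the route and table are illustrative, not from the original answer): each uwsgi worker runs connect() once after forking, and every request handled by that process reuses the same global session instead of reconnecting to the cluster.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/users/<user_id>")
def get_user(user_id):
    # `session` was created by the @postfork hook for this worker process
    rows = session.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    return jsonify(list(rows))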