Python - Celery how to establish async connection per worker - python

I created several endpoints that call Celery tasks, each performing different operations against the DB.
Obviously it doesn't make sense to re-connect to the DB each time.
But on the other hand, when should the connection be closed?
Does it make sense to use an async connection to the DB?
I'm not sure how I can achieve that, or whether using async with Celery makes sense at all - I would appreciate any guidance.
import os
import traceback
from celery import Celery
from celery.utils.log import get_task_logger
from config.config import *

app = Celery('proj',
             broker=config('CELERY_BROKER_URL'),
             backend=config('CELERY_RESULT_BACKEND'),
             include=['proj.tasks', 'proj.fetch_data'])
app.conf.update(
    result_expires=3600,
)
app.autodiscover_tasks()

if __name__ == '__main__':
    app.start()
I came across worker_process_init and worker_process_shutdown.
Note:
database_ps_ms_stg is based on Databases (async access to Postgres).
tasks.py
from .celery import app
from celery.signals import worker_process_init, worker_process_shutdown, task_postrun
from config.db import database_ps_ms_stg
import asyncio

@worker_process_init.connect
async def init_worker(**kwargs):
    if not database_ps_ms_stg.is_connected:
        await database_ps_ms_stg.connect()
        print("connected to database_ps_ms_stg")

@worker_process_shutdown.connect
async def shutdown_worker(**kwargs):
    if database_ps_ms_stg.is_connected:
        await database_ps_ms_stg.disconnect()
        print("disconnecting from database_ps_ms_stg")
Getting :
[2021-07-18 16:23:16,951: WARNING/ForkPoolWorker-1] /usr/local/lib/python3.8/site-packages/celery/concurrency/prefork.py:77: RuntimeWarning: coroutine 'init_worker' was never awaited
signals.worker_process_init.send(sender=None)

Your coroutines are not being scheduled for execution in any event loop.
For example, this code
@worker_process_init.connect
async def init_worker(**kwargs):
    if not database_ps_ms_stg.is_connected:
        await database_ps_ms_stg.connect()
        print("connected to database_ps_ms_stg")
just creates a coroutine object when worker_process_init fires, but does nothing with this object afterwards.
One possible solution is to wrap your coroutines in a scheduler decorator that runs them to completion:
def async2sync(func):
    def wrapper(*args, **kwargs):
        loop = asyncio.get_event_loop()
        task = loop.create_task(func(*args, **kwargs))
        task.add_done_callback(lambda f: loop.stop())
        loop.run_forever()
        try:
            return task.result()
        except asyncio.CancelledError:
            pass
    return wrapper
...
@worker_process_init.connect
@async2sync
async def init_worker(**kwargs):
    if not database_ps_ms_stg.is_connected:
        await database_ps_ms_stg.connect()
        print("connected to database_ps_ms_stg")
Please check if this answers some of your questions. In my opinion it's not worth it - better to just use blocking connectors, as long as your code is intended to be used with Celery.
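For illustration only, the decorator pattern can be exercised with a stand-in coroutine, no database required. This sketch uses new_event_loop() so it also runs on Python versions where get_event_loop() is restricted outside a running loop; the connect step is simulated with asyncio.sleep:

```python
import asyncio

def async2sync(func):
    # same idea as the wrapper above: drive the coroutine on a loop until done
    def wrapper(*args, **kwargs):
        loop = asyncio.new_event_loop()
        task = loop.create_task(func(*args, **kwargs))
        task.add_done_callback(lambda f: loop.stop())
        loop.run_forever()
        try:
            return task.result()
        finally:
            loop.close()
    return wrapper

@async2sync
async def init_worker(**kwargs):
    await asyncio.sleep(0)  # stand-in for database_ps_ms_stg.connect()
    return "connected"

print(init_worker())  # prints: connected
```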

Related

Python | FastApi/Uvicorn: How to pass a Queue to uvicorn.run(..)?

What I tried
I first tried to pass it as a parameter somehow within uvicorn.run(...), but was not successful (I didn't find a general placeholder I could use for it).
I also tried to use a global variable, but it seems that uvicorn runs in a separate process, so the Queue address did not stay constant.
Then I tried to pass the memory address of the queue (as a string) via an environment variable, but I was not able to turn that string back into an object to use inside the server.
Does anyone know how to solve this and basically pass a Queue to the (FastAPI) uvicorn server?
from multiprocessing import Queue
from multiprocessing import Process
import uvicorn
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

class WSServer:
    def __init__(self): pass

    def run(self, queue):
        # how can I pass the --> queue <-- inside the uvicorn (FastAPI) server?
        uvicorn.run("server:app", host="0.0.0.0", port=8081, reload=True, access_log=False)

main_queue: Queue = Queue()
proc = Process(target=WSServer().run, name="Process: Simple Server", args=(main_queue,))
proc.start()
------------
app = FastAPI()

@app.websocket("/")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            msg = await websocket.receive_text()
            ...
            # here it would be lovely if I could use the "queue"...

How to gracefully close aioredis pool in aiohttp server?

I create the Redis pool in the following way:
async def create_redis_connection_pool(app) -> aioredis.Redis:
    redis = aioredis.from_url(
        "redis://localhost", encoding="utf-8", decode_responses=True, max_connections=10,
    )
    app["redis"] = redis
    try:
        yield
    finally:
        loop = asyncio.get_event_loop()
        await loop.create_task(app["redis"].close())
Then I use the function when I create the aiohttp app:
def init() -> web.Application:
    app = web.Application()
    ...
    app.cleanup_ctx.append(create_redis_connection_pool)
    ...
    return app
When I start the server, make at least one request that uses the Redis pool, and then press Ctrl+C, I get the following warning message:
sys:1: RuntimeWarning: coroutine 'Connection.disconnect' was never awaited
How can I solve this issue and gracefully close the Redis connection pool? I'm testing on macOS.
If you're using redis==4.2.0 (from redis import asyncio as aioredis) or later,
pass close_connection_pool=True when you call .close():
await app["redis"].close(close_connection_pool=True)
Otherwise, for aioredis==2.0.1 (latest version as of this answer) or earlier,
call .connection_pool.disconnect() after .close():
await app["redis"].close()
await app["redis"].connection_pool.disconnect()
Reference: https://github.com/aio-libs/aioredis-py/pull/1256

How to create only 1 class instance when using Gunicorn and multiple workers?

I have a simple Python backend using falcon and websockets. If a client makes a call to an endpoint (e.g., to submit data) all other connected clients are notified via their respective websocket connection, i.e., the backend makes a broadcast to all currently connected clients. In general, this works just fine. Here's the minimal script for the falcon app
import falcon
from db.dbmanager import DBManager
from ws.wsserver import WebSocketServer
from api.resources.liveqa import DemoResource
dbm = DBManager() # PostgreSQL connection pool; works fine with multiple workers
wss = WebSocketServer() # Works only with 1 worker
app = falcon.App()
demo_resource = DemoResource(dbm, wss)
app.add_route('/api/v1/demo', demo_resource)
And here is the code for the websockets server which I instantiate and pass the resource class:
import json
import asyncio
import websockets
import threading
class WebSocketServer:
    def __init__(self):
        self.clients = {}
        self.start_server()

    async def handler(self, ws, path):
        session_id = path.split('/')[-1]
        if session_id in self.clients:
            self.clients[session_id].add(ws)
        else:
            self.clients[session_id] = {ws}
        try:
            async for msg in ws:
                pass  # The clients are not supposed to send anything
        except websockets.ConnectionClosedError:
            pass
        finally:
            self.clients[session_id].remove(ws)

    async def send(self, client, msg):
        await client.send(msg)

    def broadcast(self, session_id, msg):
        if session_id not in self.clients:
            return
        for client in self.clients[session_id]:
            try:
                asyncio.run(self.send(client, json.dumps(msg)))
            except:
                pass

    def start_server(self):
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        start_server = websockets.serve(self.handler, host='111.111.111.111', port=5555)
        asyncio.get_event_loop().run_until_complete(start_server)
        threading.Thread(target=asyncio.get_event_loop().run_forever).start()
I use Gunicorn as the server for the backend, and it works if I use just 1 worker. However, if I try --workers 2 I get the error that port 5555 is already in use. I guess this makes sense, as each worker tries to create a WebSocketServer instance using the same IP/port pair.
What is the best / cleanest / most Pythonic way to address this? I assume that I have to ensure that only one WebSocketServer instance is created. But how?
On a side note, I assume that a DBManager instance gets created for each worker as well. While it doesn't throw an error, since there can be multiple connection pools, I guess ensuring a single instance of DBManager is also preferable.
First of all, even running with one worker is potentially problematic, because Gunicorn is primarily a pre-forking server, and forking a process with threads is, in general, unsafe and may lead to unpredictable results.
One way to solve this is to use Gunicorn's server hooks to only start a thread (in this case a WebSocket server) in one of the workers, and only do that after forking. For instance,
import logging
import os
import threading
import falcon
import gunicorn.app.base
logging.basicConfig(
    format='%(asctime)s [%(levelname)s] %(message)s', level=logging.INFO)

class HelloWorld:
    def on_get(self, req, resp):
        resp.media = {'message': 'Hello, World!'}

def do_something(fork_nr):
    pid = os.getpid()
    logging.info(f'in a thread, {pid=}')
    if fork_nr == 1:
        logging.info('we could start a WebSocket server...')
    else:
        logging.info('not the first worker, not starting any servers')

class HybridApplication(gunicorn.app.base.BaseApplication):
    forks = 0

    @classmethod
    def pre_fork(cls, server, worker):
        logging.info(f'about to fork a new worker #{cls.forks}')
        cls.forks += 1

    @classmethod
    def post_fork(cls, server, worker):
        thread = threading.Thread(
            target=do_something, args=(cls.forks,), daemon=True)
        thread.start()

    def __init__(self):
        self.options = {
            'bind': '127.0.0.1:8000',
            'pre_fork': self.pre_fork,
            'post_fork': self.post_fork,
            'workers': 4,
        }
        self.application = falcon.App()
        self.application.add_route('/hello', HelloWorld())
        super().__init__()

    def load_config(self):
        config = {key: value for key, value in self.options.items()
                  if key in self.cfg.settings and value is not None}
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.application

if __name__ == '__main__':
    HybridApplication().run()
This simplistic prototype is not infallible, as we should also handle server reloads, the worker getting killed, etc. Speaking of which, you should probably use another worker type than sync for potentially long running requests, or set a long timeout, because otherwise the worker can get killed, taking the WebSocket thread with it. Specifying a number of threads should automatically change your worker type into gthread.
Note that here I implemented a custom Gunicorn application, but you could achieve the same effect by specifying hooks via a configuration file.
Another option is to use the ASGI flavour of Falcon, and implement even the WebSocket part inside your app:
import asyncio
import logging
import falcon.asgi
logging.basicConfig(
    format='%(asctime)s [%(levelname)s] %(message)s', level=logging.INFO)

class HelloWorld:
    async def on_get(self, req, resp):
        resp.media = {'message': 'Hello, World!'}

    async def on_websocket(self, req, ws):
        await ws.accept()
        logging.info(f'WS accepted {req.path=}')
        try:
            while True:
                await ws.send_media({'message': 'hi'})
                await asyncio.sleep(10)
        finally:
            logging.info(f'WS disconnected {req.path=}')

app = falcon.asgi.App()
app.add_route('/hello', HelloWorld())
Note that Gunicorn itself does not "speak" ASGI, so you would either need to use an ASGI app server, or use Gunicorn as a process manager for Uvicorn workers.
For instance, assuming your file is called test.py, you could run Uvicorn directly as:
pip install uvicorn[standard]
uvicorn test:app
However, if you went the ASGI route, you would need to implement your responders as coroutine functions (async def on_get(...) etc), or run your synchronous DB code in a threadpool executor.

Running separate infinite background thread in a python webapp (fastapi/flask/django)

How can I launch an application which does the below 2 things:
Expose a REST endpoint via FastAPI.
Run a separate thread infinitely (RabbitMQ consumer - pika) waiting for requests.
Below is the code through which I am launching the FastAPI server. But when I try to run a thread before the line below executes, it says the coroutine was never awaited.
How can both be run in parallel?
Maybe this is not the answer you are looking for.
There is a library called celery that makes multithreading easy to manage in python.
Check it out:
https://docs.celeryproject.org/en/stable/getting-started/introduction.html
import asyncio
from concurrent.futures import ThreadPoolExecutor
from fastapi import FastAPI
import uvicorn
app = FastAPI()
def run(corofn, *args):
    loop = asyncio.new_event_loop()
    try:
        coro = corofn(*args)
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro)
    finally:
        loop.close()

async def sleep_forever():
    await asyncio.sleep(1000)

async def main():
    loop = asyncio.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=2)
    futures = [loop.run_in_executor(executor, run, sleep_forever),
               loop.run_in_executor(executor, run, uvicorn.run, app)]
    print(await asyncio.gather(*futures))

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Note: This may hinder your FastAPI performance. A better approach would be to use a Celery task.

Calling Tornado Coroutine from asyncio

My main event loop uses asyncio but needs to call a library method that is a coroutine of type tornado.concurrent.Future. Attempting to await on the coroutine fails with RuntimeError.
RuntimeError: Task got bad yield: <tornado.concurrent.Future object at 0x7f374abdbef0>
Documentation and searches have suggested upgrading the version of Tornado (currently using 4.5) or using method tornado.platform.asyncio.to_asyncio_future which no longer produces a RuntimeError but instead just hangs on await. I'm curious to know if someone can explain what is happening. There are two main methods, one with asyncio calling a Tornado coroutine and another that is purely Tornado which works as expected.
import asyncio
from tornado import gen
from tornado.platform.asyncio import to_asyncio_future

async def coro_wrap():
    tornado_fut = coro()
    print(f'tornado_fut = {tornado_fut}, type({type(tornado_fut)})')
    async_fut = to_asyncio_future(tornado_fut)
    print(f'async_fut = {async_fut}')
    res = await async_fut
    print(f'done => {res}')

@gen.coroutine
def coro():
    print('coro start')
    yield gen.sleep(3)
    print('coro end')
    return 'my result'

def main():
    loop = asyncio.get_event_loop()
    task = loop.create_task(coro_wrap())
    loop.run_until_complete(task)
    print('end')

def main2():
    from tornado import ioloop
    loop = ioloop.IOLoop()
    res = loop.run_sync(coro)
    print(res)

if __name__ == '__main__':
    main()
Output from main
coro start
tornado_fut = <tornado.concurrent.Future object at 0x7f41493f1f28>, type(<class 'tornado.concurrent.Future'>)
async_fut = <Future pending>
Output from main2
coro start
coro end
my result
In new versions of Tornado, this just works.
In old versions of Tornado you must both use to_asyncio_future and, at startup, call tornado.platform.asyncio.AsyncIOMainLoop().install().
