For a Python web application running behind a WSGI server, how can one single WSGI worker perform a task?

My web application, built with Python 3.9 and Flask, runs behind a WSGI web server.
As more users connect, the WSGI server starts more workers, but there are some tasks that must be performed by one single worker rather than by all workers at the same time.
Among such tasks to be performed by one single worker are:
delete obsolete files on disk
copy some data from a file to Redis
delete specific lines in various TXT and LOG files
If all workers perform these tasks at the same time, things quickly become a mess.
How can I have one single worker do them, rather than all workers?

You may want to look into implementing an asynchronous task queue; something like Celery would work for this, and it lets you define the frequency at which tasks run.
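For example, a minimal sketch of that approach with Celery beat (the module name tasks.py, the Redis broker URL, and the 15-minute interval are assumptions for illustration only):

# tasks.py -- minimal sketch; the broker URL and interval are placeholders
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

# celerybeat schedules this task; only the one worker that picks it up runs it,
# so the WSGI workers never execute the maintenance code themselves
app.conf.beat_schedule = {
    'periodic-maintenance': {
        'task': 'tasks.run_maintenance',
        'schedule': 15 * 60,  # every 15 minutes, in seconds
    },
}

@app.task
def run_maintenance():
    # delete obsolete files, copy data to Redis, trim TXT/LOG files, ...
    pass

Started with celery -A tasks worker --beat, a single worker process (separate from the WSGI workers) then performs these jobs on schedule.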

Related

Single process Python WSGI Server?

Are there any single-threaded Python WSGI servers out there? It seems that every one of the last generation of Python servers has an arbiter process that exists to ensure a worker count. For instance, when you start Gunicorn you actually start, at a bare minimum, three processes: one root process, one arbiter process, and the actual worker.
This really doesn't play nicely in Kubernetes, because Kubernetes generally assumes you have one process and a thread pool. Having multiple processes can mess with things like the OOM killer, and having an arbiter process is redundant when you have health checks and multiple pods. It can cause more problems than it solves when you have multiple things doing the same thing.
Are there any reliable single threaded Python WSGI servers around? In the past I've written hacks around Gunicorn.
If not, what should I be aware of (e.g. signals, etc.)?
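For what it's worth, the reference server in the standard library really is one process and one thread; a minimal sketch (wsgiref is not production-grade, this only illustrates the process model, and "application" stands in for your real WSGI callable):

# single_process_server.py -- stdlib-only sketch
from wsgiref.simple_server import make_server

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from one process, one thread\n']

if __name__ == '__main__':
    # WSGIServer is not a ThreadingMixIn, so this is exactly one process and
    # one thread -- no root/arbiter/worker split for Kubernetes to reconcile
    with make_server('', 8000, application) as httpd:
        httpd.serve_forever()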

Is it common to run 20 Python workers which use Redis as a queue?

This program listens to a Redis queue. If there is data in Redis, the workers start doing their jobs. All these jobs have to run simultaneously, which is why each worker listens to one particular Redis queue.
My question is: is it common to run more than 20 workers listening to Redis?
python /usr/src/worker1.py
python /usr/src/worker2.py
python /usr/src/worker3.py
python /usr/src/worker4.py
python /usr/src/worker5.py
....
....
python /usr/src/worker6.py
Having multiple worker processes (and by "multiple" I mean hundreds or more), possibly running on different machines, fetching jobs from a job queue is indeed a common pattern nowadays. There are even whole packages/frameworks devoted to such workflows, for example Celery.
What is less common is trying to write the whole task queue system from scratch in a seemingly ad-hoc way instead of using a dedicated task queue system like Celery, ZeroMQ or something similar.
If your workers need to do long-running tasks on data, this is a workable solution, but each piece of data must be handled by a single worker.
This way, you can easily distribute your tasks (without dealing with threads, etc.), and it works even better if your workers don't all run on the same server.
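For context, each of those worker scripts typically boils down to a blocking pop loop; a rough sketch with redis-py (the queue name and connection details are placeholders):

# worker1.py -- rough sketch of a single-queue listener using redis-py
import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

while True:
    # blpop blocks until something is pushed onto the list, so the worker
    # waits idle instead of polling Redis in a tight loop
    _queue, raw = r.blpop('queue:worker1')
    job = json.loads(raw)
    # ... process the job ...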

How to set up a distributed worker pool with Celery and RabbitMQ

I'm still really new to this kind of thing so it's entirely possible that I've got this wrong.
I am trying to set up a distributed task system. I have a Django webapp that is generating tasks using Celery. Right now, I have the webapp, the worker, and RabbitMQ running all on the same server. I would like to distribute this out to several servers.
As I currently understand it, I should be able to have my webapp generating tasks, handing them off to the message queue -- which is its own server -- and then workers distributed across any number of servers will consume tasks from that queue. I know how to tell my Django app which server is the broker, but how do I start worker threads on the worker servers and instruct them where to consume tasks from? I'm totally lost -- I don't even know where to look.
You can run your worker code (async_tasks.py) like this:
from celery import Celery

# set this to your actual broker, e.g. RabbitMQ on its own server, not localhost
broker_url = 'amqp://user:password@your-broker-host:5672//'
app = Celery('tasks', broker=broker_url)

@app.task(queue='queue_name')
def async_compute_something(input):
    # do something
    return "Result"
and start it on the other machines using this command:
celery -A async_tasks worker -Q queue_name
Note that you have to set the broker URL correctly, pointing at the broker host rather than localhost.
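On the Django side, dispatching then looks roughly like this (a sketch; apply_async only publishes a message to the broker, so the webapp never talks to the worker machines directly):

from async_tasks import async_compute_something

# publish the job to the named queue on the broker; any worker on any machine
# consuming 'queue_name' can pick it up
result = async_compute_something.apply_async(args=['some input'], queue='queue_name')
# result.get() would block until a remote worker finishes (this also requires
# a result backend to be configured on the Celery app)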

How to do logging with multiple django WSGI processes + celery on the same webserver

I've got a mod_wsgi server set up with 5 processes and a celery worker queue (2 of them), all on the same VM. I'm running into problems where the loggers are stepping on each other, and while it appears there are some solutions if you are using Python multiprocessing, I don't see how that applies to mod_wsgi processes combined with celery processes.
What is everyone else doing about this problem? The celery tasks are using code that logs to the same files as the webserver code.
Do I somehow have to add a pid to the logfilename? That seems like it could get messy fast with lots of logfiles with unique names and no real coherent way to pull them all back together.
Do I have to write a log daemon that allows all the processes to log to it? If so, where do you start it up so that it is ready for all of the processes that might want to log?
Surely there is some kind of sane pattern out there for this, I just don't know what it is yet.
As mentioned in the docs, you could use a separate server process which listens on a socket and logs to different destinations, and has whatever logging configuration you want (in terms of files, console and so on). The other processes just configure a SocketHandler to send their events to the server process. This is generally better than separate log files with pids in their filenames.
The logging docs contain an example socket server implementation which you can adapt to your needs.
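A minimal sketch of the client side, which each mod_wsgi process and each celery worker would run at startup (the host and port assume the example socket server from the logging docs running locally):

import logging
import logging.handlers
import os

# send every record to the log server process; it alone writes the files
handler = logging.handlers.SocketHandler(
    'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)  # 9020 by default

root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(handler)

logging.getLogger(__name__).info('socket logging configured in pid %s', os.getpid())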

Making a zmq server run forever in Django?

I'm trying to figure out the best way to keep a ZeroMQ listener running forever in my Django app.
I'm setting up a zmq server app in my Django project that acts as an internal API to other applications in our network (no need to go through HTTP/requests since these apps are internal). I want the zmq listener inside of my Django project to always be alive.
I want the zmq listener in my Django project so I have access to all of the projects models (for querying) and other django context things.
I'm currently thinking:
Set up a Django management command that will run the listener and keep it alive forever (aka infinite loop inside the zmq listener code) or
use a celery worker to always keep the zmq listener alive? But I'm not exactly sure how to get a celery worker to restart a task only if it's not running. All the celery docs are about frequency/delayed running. Or maybe I should let celery purge the task at a given interval and restart it anyway.
Any tips, advice on performance implications or alternate approaches?
Setting up a management command is a fine way to do this, especially if you're running on your own hardware.
If you're running in a cloud, where a machine may disappear along with your process, then the latter is a better option. This is how I've done it:
Set up a periodic task that runs every N seconds (you need celerybeat running somewhere)
When the task spawns, it first checks a shared network resource (Redis, ZooKeeper, or a database) to see if another process has an active/valid lease. If one exists, abort.
If there's no valid lease, obtain your lease (beware of concurrency here!), and start your infinite loop, making sure you periodically renew the lease.
Add instrumentation so that you know which process holds the lease and where it is running.
Start celery workers on multiple boxes, consuming from the same queue your periodic task is designated for.
The second solution is more complex and harder to get right, so if you can, the singleton management command is great; consider using something like supervisord to ensure the process gets restarted if it faults for some reason.
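A hedged sketch of that lease pattern, using Redis as the shared resource (the task name, lock key, and timeouts are all placeholders):

import redis
from celery import shared_task

r = redis.Redis()
LEASE_KEY = 'zmq-listener-lease'
LEASE_TTL = 60  # seconds

@shared_task
def ensure_zmq_listener():
    # set(nx=True, ex=TTL) acquires the lease atomically; if another worker
    # already holds a valid lease, bail out and let celerybeat retry later
    if not r.set(LEASE_KEY, 'holder-id', nx=True, ex=LEASE_TTL):
        return
    while True:
        # ... run one iteration of the zmq listener loop here ...
        # renew the lease periodically so it doesn't expire while we're alive
        r.expire(LEASE_KEY, LEASE_TTL)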
