How to do logging with multiple django WSGI processes + celery on the same webserver - python

I've got a mod_wsgi server set up with 5 processes and a celery worker queue (2 workers), all on the same VM. I'm running into problems where the loggers are stepping on each other, and while there appear to be some solutions if you are using python multiprocessing, I don't see how they apply to mod_wsgi processes combined with celery processes.
What is everyone else doing about this problem? The celery tasks use code that logs to the same files as the webserver code.
Do I somehow have to add a pid to the log filename? That seems like it could get messy fast, with lots of log files with unique names and no coherent way to pull them all back together.
Do I have to write a log daemon that all the processes can log to? If so, where do you start it up so that it is ready for all of the processes that might want to log?
Surely there is some kind of sane pattern out there for this; I just don't know what it is yet.

As mentioned in the docs, you could use a separate server process which listens on a socket and logs to different destinations, and has whatever logging configuration you want (in terms of files, console and so on). The other processes just configure a SocketHandler to send their events to the server process. This is generally better than separate log files with pids in their filenames.
The logging docs contain an example socket server implementation which you can adapt to your needs.
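For the client side, a minimal sketch of what each mod_wsgi or celery process would configure, assuming the cookbook-style socket server is listening on localhost (logger names here are just placeholders):

    import logging
    import logging.handlers

    # Each web/worker process only configures a SocketHandler; the single
    # listening server process owns the file handlers, so the processes
    # never write to the same log file directly.
    socket_handler = logging.handlers.SocketHandler(
        'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)  # 9020 by default

    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(socket_handler)

    logging.getLogger('myapp').info('hello from any process, pid irrelevant')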

Related

For a Python web application running in a web server with WSGI, how can I have one single WSGI worker perform a task?

My web application, built with Python 3.9 and Flask, is running in a web server with WSGI.
As more users connect to the web server, more workers are started by WSGI, but there are some tasks that must be performed by one single WSGI worker rather than by all workers at the same time.
Among the tasks to be performed by a single worker are:
delete obsolete files on disk
copy some data from a file to Redis
delete specific lines in various TXT and LOG files
If all workers do such tasks, things get messy.
How can I have one single worker do this, rather than all workers?
You may want to look into implementing an asynchronous task queue; something like Celery would work for this, since you can define the frequency at which tasks run.
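As a rough sketch of what that could look like with Celery beat (the module, task, and broker URL below are hypothetical; adjust them to your project):

    # tasks.py (hypothetical module) -- a periodic task run by exactly one worker
    from celery import Celery

    app = Celery('maintenance', broker='redis://localhost:6379/0')

    # celery beat enqueues the task on a schedule; a single worker picks it up,
    # so the WSGI workers themselves never perform the cleanup.
    app.conf.beat_schedule = {
        'delete-obsolete-files': {
            'task': 'tasks.delete_obsolete_files',
            'schedule': 3600.0,  # seconds between runs
        },
    }

    @app.task
    def delete_obsolete_files():
        ...  # remove stale files, sync data to Redis, prune log lines, etc.

You would then run a worker with an embedded beat scheduler (for example, celery -A tasks worker -B) alongside the web server, so the schedule lives outside the WSGI workers entirely.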

Single process Python WSGI Server?

Are there any single-threaded Python WSGI servers out there? It seems that every one of the last generation of Python servers has an arbiter process that exists to maintain a worker count. For instance, when you start Gunicorn you actually start, at a bare minimum, 3 processes: one root process, one arbiter process, and the actual worker.
This really doesn't play nice with Kubernetes, which generally assumes you have one process and a thread pool. Having multiple processes can mess with things like the OOM killer, and having an arbiter process is redundant when you have health checks and multiple pods. It can cause more problems than it solves when you have multiple things doing the same thing.
Are there any reliable single-threaded Python WSGI servers around? In the past I've written hacks around Gunicorn.
If not, what should I be aware of (e.g. signals)?
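For reference, the standard library's wsgiref server is the simplest example of the single-process, single-threaded model described above; a minimal sketch, assuming your WSGI callable is importable as application:

    # serve.py -- one process, one thread, no arbiter; fine for illustrating
    # the model, though wsgiref is not built for production traffic.
    from wsgiref.simple_server import make_server

    from myproject.wsgi import application  # hypothetical import path

    with make_server('0.0.0.0', 8000, application) as httpd:
        httpd.serve_forever()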

Making a zmq server run forever in Django?

I'm trying to figure out the best way to keep a ZeroMQ listener running forever in my Django app.
I'm setting up a zmq server app in my Django project that acts as an internal API to other applications on our network (no need to go through HTTP/requests since these apps are internal). I want the zmq listener inside my Django project to always be alive.
I want the zmq listener in my Django project so that I have access to all of the project's models (for querying) and other Django context.
I'm currently thinking:
Set up a Django management command that runs the listener and keeps it alive forever (i.e. an infinite loop inside the zmq listener code), or
use a celery worker to keep the zmq listener alive. But I'm not exactly sure how to get a celery worker to restart a task only if it's not already running; all the celery docs are about frequency/delayed running. Or maybe I should let celery purge the task at a given interval and restart it anyway.
Any tips, advice on performance implications, or alternate approaches?
Setting up a management command is a fine way to do this, especially if you're running on your own hardware.
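A minimal sketch of such a management command, assuming pyzmq and hypothetical app, model, and port names:

    # myapp/management/commands/zmq_listener.py (hypothetical path)
    import zmq
    from django.core.management.base import BaseCommand

    from myapp.models import Widget  # hypothetical model, to show ORM access


    class Command(BaseCommand):
        help = 'Run the ZeroMQ listener forever'

        def handle(self, *args, **options):
            context = zmq.Context()
            socket = context.socket(zmq.REP)
            socket.bind('tcp://*:5555')
            while True:  # the infinite loop that keeps the listener alive
                request = socket.recv_json()
                # full access to Django models, settings, etc. from here
                count = Widget.objects.filter(name=request.get('name', '')).count()
                socket.send_json({'count': count})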
If you're running in a cloud, where a machine may disappear along with your process, then the latter is a better option. This is how I've done it:
Set up a periodic task that runs every N seconds (you need celery beat running somewhere)
When the task spawns, it first checks a shared network resource (redis, zookeeper, or a db), to see if another process has an active/valid lease. If one exists, abort.
If there's no valid lease, obtain your lease (beware of concurrency here!), and start your infinite loop, making sure you periodically renew the lease.
Add instrumentation so that you know who is running the process and where.
Start celery workers on multiple boxes, consuming from the same queue your periodic task is designated for.
The second solution is more complex and harder to get right, so if you can, stick with the singleton and consider using something like supervisord to ensure the process gets restarted if it faults for some reason.
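A rough sketch of the lease check from the list above, using a Redis lock as the shared resource (the key name, timings, and helper function are illustrative):

    # tasks.py (hypothetical) -- the periodic task that tries to become the listener
    import redis
    from celery import shared_task

    client = redis.Redis()

    @shared_task
    def ensure_zmq_listener():
        # A lock with a TTL acts as the lease; acquire() is atomic, so only
        # one worker across all boxes can win it at a time.
        lease = client.lock('zmq-listener-lease', timeout=60)
        if not lease.acquire(blocking=False):
            return  # someone else holds a valid lease; abort
        try:
            while True:
                handle_one_zmq_message()  # hypothetical helper with the listener logic
                lease.reacquire()         # renew the lease back to its full timeout
        finally:
            lease.release()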

Need queue module to be shared between two applications

I need to share a queue between two applications on the same machine: one is a Tornado app that will occasionally add a message to the queue, and the other is a Python script run from cron that adds new messages on every iteration. Can anyone suggest a module for this?
(Can this be solved with Redis? I'd like to avoid using MySQL for this purpose.)
I would use Redis with a list. You can LPUSH an element onto the head and RPOP to remove one from the tail.
See the Redis documentation for rpop and lpushx.
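A quick sketch with the redis-py client (the queue key name is arbitrary):

    import redis

    r = redis.Redis()

    # Producer side (the Tornado app or the cron script): push onto the head.
    r.lpush('shared-queue', 'some message payload')

    # Consumer side: pop from the tail so messages come out oldest-first.
    item = r.rpop('shared-queue')  # bytes, or None if the queue is empty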
The purest way I can think of to do this is with IPC. Python has very good support for IPC between two processes when one process spawns the other, but not in your scenario. There are Python modules for IPC such as sysv_ipc and posix_ipc. But if your main application is built on Tornado, why not just have it listen on a ZeroMQ socket for published messages?
Here is a link with more information. You want the Publisher-Subscriber model.
http://zeromq.github.io/pyzmq/eventloop.html#tornado-ioloop
Your cron job will start and publish messages to a ZeroMQ socket; your already-running application will receive them as a subscriber.
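A bare-bones sketch of the two sides (the address and port are arbitrary; the linked docs show how to hook the SUB socket into the Tornado ioloop with ZMQStream instead of blocking on it):

    import time
    import zmq

    ctx = zmq.Context()

    # Publisher side -- the short-lived cron job.
    pub = ctx.socket(zmq.PUB)
    pub.connect('tcp://127.0.0.1:5556')  # the long-running Tornado process binds
    time.sleep(0.1)                      # give the connection a moment (slow-joiner caveat)
    pub.send_string('new message payload')

    # Subscriber side -- inside the Tornado process.
    sub = ctx.socket(zmq.SUB)
    sub.bind('tcp://127.0.0.1:5556')
    sub.setsockopt_string(zmq.SUBSCRIBE, '')  # subscribe to all messages
    print(sub.recv_string())                  # blocking receive, for illustration only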
Try RabbitMQ for hosting the queue independently of your applications, then access it using Pika, which even comes with a Tornado adapter. Just pick the appropriate model (queue/exchange/topic) and the message format you want (strings, JSON, XML, YAML) and you're set.
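For completeness, a minimal producer sketch with Pika's blocking connection (host and queue name are placeholders); the Tornado side can consume the same queue via basic_consume or the Tornado adapter mentioned above:

    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()
    channel.queue_declare(queue='shared-queue', durable=True)

    # The default ('') exchange routes by queue name, so routing_key is the queue.
    channel.basic_publish(
        exchange='',
        routing_key='shared-queue',
        body=json.dumps({'event': 'example'}),
    )
    connection.close()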

Apache + mod_wsgi interaction

Before posting this, I have read quite a few resources online, including the mod_wsgi wiki, but I am confused about how exactly Apache processes/threads interact with mod_wsgi.
This is my current understanding: Apache can be configured to run such that one or more child processes can handle incoming requests, and each of these child processes can be configured to in turn use one or more threads to service requests. After that, things start getting hazy for me. My doubts are:
What is a WSGIDaemonProcess, and who actually calls my Django app in the Python sub-interpreter?
If I have my Django app running in a mode where multiple threads are allowed in a single Apache child process, does that mean that multiple requests could be accessing my app at the same time? If so, could a module-level variable (say, one holding a user's ID) be overwritten by other parallel requests, leading to non-thread-safe behavior?
For the case above, with Python's global interpreter lock, would the threads actually be executing in parallel?
Answers to each of the points.
1 - WSGIDaemonProcess/WSGIProcessGroup indicate that mod_wsgi should fork off a separate process for running the WSGI application in. This is a fork only and not a fork/exec, so mod_wsgi is still in control of it. When it is detected that a URL maps to a WSGI application running in daemon mode, the mod_wsgi code in the Apache child worker processes proxies the request details through to the daemon mode process, where the mod_wsgi code there reads them and calls up into your WSGI application.
2 - Yes, multiple requests can be operating concurrently and may be trying to modify the module-level global data at the same time (see the sketch after these answers).
3 - For the time that execution is within Python itself, then no, they aren't strictly running in parallel, as the global interpreter lock means that only one thread can be executing Python code at a time. The Python interpreter will periodically switch which thread gets to run. If one of the threads calls into C code and releases the GIL, then at least for the time that thread is in that state it can run in parallel with other threads, whether they are running Python or C code. As an example, when calls are made down into the Apache/mod_wsgi layer to write back response data, the GIL is released, which means that the actual writing back of response data at the lower layers does not prevent other threads from running.
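To illustrate point 2, a small sketch of why a module-level variable is risky under multithreaded mod_wsgi, and the usual thread-local alternative (names are hypothetical):

    import threading

    current_user_id = None              # module-level: shared by every thread in the process
    _request_state = threading.local()  # one independent copy per thread

    def handle_request(user_id):
        global current_user_id
        current_user_id = user_id         # another concurrent request may overwrite this
        _request_state.user_id = user_id  # safe: each thread only sees its own value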

Categories

Resources