Start a Django Channels Worker within the same process - python

I have to set up a worker which handles some data after a certain event happens. I know I can start the worker with python manage.py runworker my_worker, but what I would need is to start the worker in the same process as the main Django app on a separate thread.
Why do I need it in a separate thread and not in a separate process? Because the worker would perform a pretty light-weight job which would not overload the server's resources, and, moreover, the effort of making the set up for a new process in the production is not worth the gain in performance. In other words, I would prefer to keep it in the Django's process if possible.
Why not perform the job synchronously? Because it is a separate logic that needs the possibility to be extended, and it is out of the main HTTP request-reply scope. It is a post-processing task which doesn't interfere with the main logic. I need to decouple this task from an infrastructural point-of-view, not only logical (e.g. with plain signals).
Is there any possibility provided by Django Channels to run a worker in such a way?
Would there be any downsides to start the worker manually on a separate thread?
Right now I have the setup for a message broker consumer thread (without using Channels), so I have the entry point for starting a new worker thread. But as I've seen from the Channel's runworker command, it loads the whole application, so it doesn't seem like a naïve worker.run() call is the proper way to do it (I might be wrong with this one).

I found an answer to my question.
The answer is no, you can't just start a worker within the same process. This is because the consumer needs to run inside an event loop thread and it is not good at all to have more than one event loop thread in the same process (Django WSGI application already runs the main thread with an event loop).
The best you can do is to start the worker in a separate process. As I mentioned in my question, I started a message broker consumer on a separate thread, which was not a good approach either, so I changed my configuration to start the consumers as separate processes.

Related

Health Check API blocked due to single IOLoop

I have a tornado application, we have two API's /health and /make
call to /make API takes 10 min to build required resources and load it to memory, during which period call's to /health is blocked due to which the server is marked as unhealthy. what is the better way to build /health API.
It's a good and widespread practice (for a reason) to move long blocking operations from the main thread into a separate thread/pools/Celery/etc. If you do so with resources building, your main thread with the /health would be unblocked and available.
I believe that the easiest, and most tornado-like, way of moving the blocking process into a new thread is to use tornados subprocess impl. which described here: https://www.tornadoweb.org/en/stable/process.html#tornado.process.Subprocess
In short: the idea is to start the build process in a new thread where the I/O is added to the IOLoop like any other non-blocking I/O resource. In reality the new process (the child / sub process) is completely separate from the main tornado process but it's interfaced to hide that fact.

Celery worker: periodic local background task

Is there any Celery functionality or preferred way of executing periodic background tasks locally when using a single worker? Sort of like a background thread, but scheduled and handled by Celery?
celery.beat doesn't seem suitable as it appears to be simply tied to a consumer (so could run on any server) - that's the type of scheduling I was after, but just a task that is always run locally on each server running this worker (the task does some cleanup and stats relating to the main task the worker handles).
I may be going about this the wrong way, but I'm confined to implementing this within a celery worker daemon.
You could use a custom remote control command and use the broadcast function on a cron to run cleanup or whatever else might be required.
One possible method I thought of, though not ideal, is to patch the celery.worker.heartbeat Heart() class.
Since we already use heartbeats, the class allows for a simple modification to its start() method (add another self.timer.call_repeatedly() entry), or an additional self.eventer.on_enabled.add() __init__ entry which references a new method that also uses self.timer.call_repeatedly() to perform a periodic task.

Making a zmq server run forever in Django?

I'm trying to figure that best way to keep a zeroMQ listener running forever in my django app.
I'm setting up a zmq server app in my Django project that acts as internal API to other applications in our network (no need to go through http/requests stuff since these apps are internal). I want the zmq listener inside of my django project to always be alive.
I want the zmq listener in my Django project so I have access to all of the projects models (for querying) and other django context things.
I'm currently thinking:
Set up a Django management command that will run the listener and keep it alive forever (aka infinite loop inside the zmq listener code) or
use a celery worker to always keep the zmq listener alive? But I'm not exactly sure on how to get a celery worker to restart a task only if it's not running. All the celery docs are about frequency/delayed running. Or maybe I should let celery purge the task # a given interval & restart it anyways..
Any tips, advice on performance implications or alternate approaches?
Setting up a management command is a fine way to do this, especially if you're running on your own hardware.
If you're running in a cloud, where a machine may disappear along with your process, then the latter is a better option. This is how I've done it:
Setup a periodic task that runs every N seconds (you need celerybeat running somewhere)
When the task spawns, it first checks a shared network resource (redis, zookeeper, or a db), to see if another process has an active/valid lease. If one exists, abort.
If there's no valid lease, obtain your lease (beware of concurrency here!), and start your infinite loop, making sure you periodically renew the lease.
Add instrumentation so that you know who, where the process is running.
Start celery workers on multiple boxes, consuming from the same queue your periodic task is designated for.
The second solution is more complex and harder to get right; so if you can, a singleton is great and consider using something like supervisord to ensure the process gets restarted if it faults for some reason.

Danger to having long lasting (non-deamon) threads in a Django/Gunicorn app?

I generally don't need to explicitly use threads in my Django application level programming (i.e. views). But I've noticed a library that looks interesting which handles server side analytics by via threading.
During a Django view, you would use their Python client to batch HTTP POSTs to their web service in a separate (non-daemon) thread. Normally, I would go with RabbitMQ for something like this, instead of threads but they wanted to lower the startup costs for the library.
My question is, are there any downsides to this approach? Threads have some additional memory footprint, but I'm not too worried about that. It obviously depends on the number of requests/threads started.
Is the fact that the threads are not daemons and potentially long running an issue? I assume that the Gunicorn process is the main thread of execution and it runs in an infinite loop, so it generally doesn't matter if it has to wait on the non-daemon threads to exit. Is that correct?
Kind of an open question but the main point is understanding the impact of non-daemon threads in Django/Gunicorn apps.
Gunicorn uses a pre-fork worker model. The Master process spawns and manages Worker processes. For non-Tornado uses, there are two kinds of Workers: Sync (default) and Async.
In normal operations, these Workers run in a loop until the Master either tells them to graceful shutdown or kills them. Workers will periodically issue a heartbeat to the Master to indicate that they are still alive and working. If a heartbeat timeout occurs, then the Master will kill the Worker and restart it.
Therefore, daemon and non-daemon threads that do not interfere with the Worker's main loop should have no impact. If the thread does interfere with the Worker's main loop, such as a scenario where the thread is performing work and will provide results to the HTTP Response, then consider using an Async Worker. Async Workers allow for the TCP connection to remain alive for a long time while still allowing the Worker to issue heartbeats to the Master.

How to create a thread-safe singleton in python

I would like to hold running threads in my Django application. Since I cannot do so in the model or in the session, I thought of holding them in a singleton. I've been checking this out for a while and haven't really found a good how-to for this.
Does anyone know how to create a thread-safe singleton in python?
EDIT:
More specifically what I wand to do is I want to implement some kind of "anytime algorithm", i.e. when a user presses a button, a response returned and a new computation begins (a new thread). I want this thread to run until the user presses the button again, and then my app will return the best solution it managed to find. to do that, i need to save somewhere the thread object - i thought of storing them in the session, what apparently i cannot do.
The bottom line is - i have a FAT computation i want to do on the server side, in different threads, while the user is using my site.
Unless you have a very good reason - you should execute the long running threads in a different process altogether, and use Celery to execute them:
Celery is an open source asynchronous
task queue/job queue based on
distributed message passing. It is
focused on real-time operation, but
supports scheduling as well.
The execution units, called tasks, are
executed concurrently on one or more
worker nodes using multiprocessing,
Eventlet or gevent. Tasks can execute
asynchronously (in the background) or
synchronously (wait until ready).
Celery guide for djangonauts: http://django-celery.readthedocs.org/en/latest/getting-started/first-steps-with-django.html
For singletons and sharing data between tasks/threads, again, unless you have a good reason, you should use the db layer (aka, models) with some caution regarding db locks and refreshing stale data.
Update: regarding your use case, define a Computation model, with a status field. When a user starts a computation, an instance is created, and a task will start to run. The task will monitor the status field (check db once in a while). When a user clicks the button again, a view will change the status to user requested to stop, causing the task to terminate.
If you want asynchronous code in a web application then you're taking the wrong approach. You should run background tasks with a specialist task queue like Celery: http://celeryproject.org/
The biggest problem you have is web server architecture. Unless you go against the recommended Django web server configuration and use a worker thread MPM, you will have no way to track your thread handles between requests as each request typically occupies its own process. This is how Apache normally works: http://httpd.apache.org/docs/2.0/mod/prefork.html
EDIT:
In light of your edit I think you might learn more by creating a custom solution that does this:
Maintains start/stop state in the database
Create a new program that runs as a daemon
Periodically check the start/stop state and begin or end work from here
There's no need for multithreading here unless you need to create a new process for each user. If so, things get more complicated and using Celery will make your life much easier.

Categories

Resources