How to start Django from code in the background? - Python

One can easily start Django using a management command like this:
management.call_command('runserver', interactive=False)
But it actually blocks execution.
Is there any workaround apart from subprocess/threading/multiprocessing?
I mean, how can I do it in a more native fashion?

A management command is not "starting Django".
You "start Django" by deploying it on any number of web servers, each of which has its own way of running in the background.
https://docs.djangoproject.com/en/dev/howto/deployment/
Dynamically deploying Django isn't something I've seen, but I suppose you could write scripts that generate web server configuration files.
manage.py runserver should never be used in production.
If that was just an example, and you actually want to run other asynchronous management commands, the accepted community answer is to use a task queue like Celery.
http://docs.celeryproject.org/en/latest/django/
You could then fire off 10,000 non-blocking management commands to be consumed "in the future" at some point by Celery workers.
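For example, here is a minimal sketch of wrapping a management command in a Celery task, assuming a standard Celery-with-Django setup; the clearsessions command used at the end is only illustrative.

# tasks.py - run a management command on a Celery worker instead of blocking the caller.
from celery import shared_task
from django.core.management import call_command

@shared_task
def run_management_command(name, *args, **options):
    # Executed by the worker process, so the caller returns immediately.
    call_command(name, *args, **options)

# Fire-and-forget from a view or script:
# run_management_command.delay("clearsessions")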

Related

Running an arbitrary Django task along with "runserver" forever

For a Django-based server I need to run scripts periodically, in a fashion similar to cron jobs. I want to avoid explicit cron jobs and instead integrate these periodic tasks into the HTTP server initialization - that is, when I run manage.py runserver or a very similar management command, two other processes start alongside the HTTP daemon and perform my tasks periodically.
I already created management commands for these scripts. What are my options?
My best guess is starting two threads, either in AppConfig.ready() as suggested here or somehow in manage.py itself. I'm not entirely sure whether that has any caveats, though.
Since asking this question, I have realized that starting threads is my only solution, and that I should do it explicitly in either asgi.py or wsgi.py, depending on my production setup - the runserver management command is not suitable for production.
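For reference, a minimal sketch of the AppConfig.ready() approach, with a hypothetical app name and task function. One caveat: ready() can run more than once (for example under the autoreloader), so guard against starting duplicate threads, and for production prefer wiring this into asgi.py/wsgi.py as noted above.

# apps.py - start a background thread once the app registry is ready.
import threading
import time

from django.apps import AppConfig

def periodic_task(interval_seconds=60):
    while True:
        # call_command("my_periodic_command") or any other periodic work here
        time.sleep(interval_seconds)

class MyAppConfig(AppConfig):
    name = "myapp"  # hypothetical app name
    _started = False

    def ready(self):
        # Guard so repeated ready() calls in one process don't spawn twice.
        if not MyAppConfig._started:
            MyAppConfig._started = True
            threading.Thread(target=periodic_task, daemon=True).start()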

What is a robust way to execute long-running tasks/batches under Django?

I have a Django app that is intended to run on VirtualBox VMs on LANs. The typical user will be a savvy IT end user, not a sysadmin.
Part of the app's job is to connect to external databases on the LAN, run some Python batches against those databases and save the results in its local db. The user can then explore the systems using Django pages.
Run time for the batches isn't all that long - minutes, potentially tens of minutes, not seconds. Runs are infrequent; you could go days without needing a refresh.
This is not Celery's normal use case of long tasks that eventually push their results back into the web UI via Ajax and/or polling. It is more similar to a dev's occasional use of django-admin commands, but this time intended for an end user.
The user should be able to initiate a run of one or several of those batches whenever they want, in order to refresh the calculations for a given external database (the target db is a parameter to the batch).
Until the batches are done for a given db, the app really isn't usable. You can access its pages, but many functions won't be available.
It is very important, from a support point of view, that the batches remain easily runnable at all times. Dropping down to the VM's shell over SSH would probably require frequent handholding, which wouldn't be good - it is best if they can be launched from the Django web pages.
What I currently have:
Each batch is in its own script.
I can run each one on the command line (via if __name__ == "__main__":).
The batches are also hooked up as celery tasks and work fine that way.
Given the way I have written them, it would be relatively easy for me to allow running them from subprocess calls in Python. I haven't really looked into it, but I suppose I could make them into django-admin commands as well (a sketch of such a command follows this list).
The batches already have their own rudimentary status checks. For example, they can look at the calculated data and tell whether they have been run and display that in Django pages without needing to look at celery task status backends.
The batches themselves are relatively robust and I can make them more so. This is about their launch mechanism.
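For illustration, a minimal sketch of wrapping an existing batch as a django-admin command; the myapp.batches module and run_batch function are stand-ins for the real batch code.

# myapp/management/commands/refresh_external_db.py
from django.core.management.base import BaseCommand

from myapp.batches import run_batch  # assumed existing batch entry point

class Command(BaseCommand):
    help = "Refresh calculations for one external database."

    def add_arguments(self, parser):
        parser.add_argument("target_db", help="Name of the external database to refresh.")

    def handle(self, *args, **options):
        run_batch(options["target_db"])
        self.stdout.write(self.style.SUCCESS("Batch finished."))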
What's not so great:
In my Mac dev environment I find the celery/celerycam/RabbitMQ stack to be somewhat unstable. It seems as if sometimes the RabbitMQ daemon balloons up in CPU/RAM use and then needs to be terminated. That mightily confuses the Celery processes, and I find I have to kill -9 various tasks and relaunch them manually. Sometimes Celery still works but celerycam doesn't, so there are no task updates. Some of these issues may be OS X specific, or may be due to the DEBUG flag being switched on for now, which Celery warns about.
Until the whole Celery stack has been reset, I then need to run the batches on the command line, which is what I was trying to avoid.
This might be acceptable on a normal website, with an admin watching over it. But I can't have that happen on a remote VM to which only the user has access.
Given that these are somewhat fire-and-forget batches, I am wondering if celery isn't overkill at this point.
Some options I have thought about:
writing a cleanup shell/Python script to restart rabbitmq/celery/celerycam and generally make the stack more robust, i.e. whatever is required to make Celery and everything around it more stable. I've already used psutil to figure out whether the rabbit/celery processes are running and to display their status in Django.
Running the batches via subprocess instead and avoiding Celery (see the sketch after this list). What about django-admin commands here? Does that make a difference? They would still need to be launched from the web pages.
an alternative task/process manager to Celery, with less capability but also fewer moving parts?
not using subprocess but relying on the Python multiprocessing module? To be honest, I have no idea how that compares to launching via subprocess.
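As a rough sketch of the subprocess option: a view that launches a batch as a detached django-admin command. The view, the URL wiring and the refresh_external_db command name are assumptions, and the process would need to be started from the project directory so manage.py is found.

# views.py - kick off a batch without blocking the request.
import subprocess
import sys

from django.http import HttpResponse
from django.views.decorators.http import require_POST

@require_POST
def start_batch(request, target_db):
    # Popen returns immediately; the batch keeps running after the response.
    subprocess.Popen(
        [sys.executable, "manage.py", "refresh_external_db", target_db],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return HttpResponse("Batch started", status=202)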
Environment:
nginx, WSGI, Ubuntu on VirtualBox, Chef to build the VMs.
I'm not sure how your Celery configuration makes it unstable, but it sounds like Celery is still the best fit for your problem. I'm using Redis as the queue system and, in my experience, it works better than RabbitMQ. Maybe you can try it and see if it improves things.
Otherwise, just use cron as the driver for your periodic tasks. Let it run your script periodically and update the database; your UI component can then poll the database with no conflict.
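To make the cron-driven approach concrete: cron invokes the batch command on its own schedule (for example, an hourly crontab entry running manage.py refresh_external_db for a given target), and the UI only reads freshness information out of the shared database. Below is a minimal sketch of the polling side; BatchRun is a hypothetical bookkeeping model that records when each batch finished.

# views.py - report when a target database was last refreshed.
from django.http import JsonResponse

from myapp.models import BatchRun  # assumed model storing finished_at per target db

def batch_status(request, target_db):
    last = BatchRun.objects.filter(target_db=target_db).order_by("-finished_at").first()
    return JsonResponse({
        "target_db": target_db,
        "last_refreshed": last.finished_at.isoformat() if last else None,
    })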

Django, RabbitMQ, & Celery - why does Celery run old versions of my tasks after I update my Django code in development?

So I have a Django app that occasionally sends a task to Celery for asynchronous execution. I've found that as I work on my code in development, the Django development server knows how to automatically detect when code has changed and then restart the server so I can see my changes. However, the RabbitMQ/Celery section of my app doesn't pick up on these sorts of changes in development. If I change code that will later be run in a Celery task, Celery will still keep running the old version of the code. The only way I can get it to pick up on the change is to:
stop the Celery worker
stop RabbitMQ
reset RabbitMQ
start RabbitMQ
add the user to RabbitMQ that my Django app is configured to use
set appropriate permissions for this user
restart the Celery worker
This seems like a far more drastic approach than I should have to take, however. Is there a more lightweight approach I can use?
I've found that as I work on my code in development, the Django development server knows how to automatically detect when code has changed and then restart the server so I can see my changes. However, the RabbitMQ/Celery section of my app doesn't pick up on these sorts of changes in development.
What you've described here is exactly correct and expected. Keep in mind that Python will use a module cache, so you WILL need to restart the Python interpreter before you can use the new code.
The question is "Why doesn't Celery pick up the new version?", but this is how most libraries work. The Django development server is the exception: it has special code that automatically reloads Python code as necessary, essentially restarting the web server for you so that you don't have to.
Note that when you run Django in production, you probably WILL have to restart/reload your server, since you won't be using the development server in production, and most production servers don't take on the hassle of detecting file changes and auto-reloading.
Finally, you shouldn't need to restart RabbitMQ. You should only have to restart the Celery worker to use the new version of the Python code. You might have to clear the queue if the new version of the code changes the data in the messages, however; for example, the Celery worker might receive a version 1 message when it expects version 2.
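As an aside, a common development-only workaround (not an official Celery feature) is to wrap the worker in Django's autoreload utility inside a custom management command, so the worker is killed and restarted whenever code changes. This relies on an internal Django helper, and the project name proj below is an assumption.

# management/commands/celery_dev.py - restart the Celery worker on code changes.
import shlex
import subprocess

from django.core.management.base import BaseCommand
from django.utils import autoreload

def restart_celery():
    # Kill any previously started worker, then start one that picks up the new code.
    subprocess.call(shlex.split("pkill -f 'celery -A proj worker'"))
    subprocess.call(shlex.split("celery -A proj worker --loglevel=info"))

class Command(BaseCommand):
    help = "Run a Celery worker that reloads on code changes (development only)."

    def handle(self, *args, **options):
        autoreload.run_with_reloader(restart_celery)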

django server sharing scope with celery workers

I am working on a web application that uses a permanent object, MyService. Using a web interface I dynamically update its state and monitor its behavior. Now I would like to call one of its methods periodically. I was thinking of using Celery's PeriodicTask but ran into some scope issues. It seems I need to run three different processes:
python manage.py runserver
python manage.py celery worker
python manage.py celerybeat
The problem is that even if I ensure that MyService is a singleton that can be safely used by more than one thread, Celery creates its own fresh copy of the object. Is there a way I could share this object between the Django server and the Celery main process? I tried to find a way to start Celery from within a Django script, but so far without success. I would appreciate any help.
If you need to share something between multiple processes, or maybe even multiple machines (e.g. your workers could run on a separate machine), the best (and probably easiest) practice for sharing information is to use an external service.
In the simplest case you could use Django's DB, but if that turns out not to be suitable, for example because you have a heavy write load, you can use something like Redis or Memcached (which you can also talk to via Django's caching API). These can handle a big write load, and you can use Redis as a queue for Celery as well.
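A minimal sketch of sharing state through Django's cache API instead of an in-process singleton, assuming CACHES is configured with a shared backend such as Redis or Memcached; the key name is made up.

# Both the web process and the Celery worker talk to the same external store.
from django.core.cache import cache

def save_service_state(state):
    # Called by whichever process updates MyService (web view or task).
    cache.set("myservice:state", state, timeout=None)

def load_service_state():
    # Called by the other process; both sides see the same shared backend.
    return cache.get("myservice:state", {})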

Python, Django and event loop (periodic jobs)

I am developing a Python application for a server that uses Django + WSGI + Apache under Debian Linux. The application has a web interface as well as a command line interface (which still uses Django models..., it just does not use views and templates).
The database backend is SQLite3.
This application also needs to run some jobs periodically. I wrote a Unix-like daemon that uses python-gobject and python-glib, and runs those jobs like this:
gobject.timeout_add_seconds(seconds, someCallback...)
gobject.timeout_add_seconds(seconds, someCallback...)
...
gobject.timeout_add_seconds(seconds, someCallback...)
glib.MainLoop().run()
I tested it, and there are some strange problems with the data written to the SQLite db. I think that's because there are two Python instances reading from and writing to a single SQLite db: one for Apache + WSGI and one for my own daemon. (Or even 3 Python instances, when I use the command line interface.)
My question is, what do you recommend I do? Should I put those timeout_add calls and the MainLoop in my "dj_survey.wsgi" so they run when Apache starts?
No, you don't want to run background processes inside your Apache (or whatever) WSGI environment.
Start them from the shell and use some method to communicate with your background process.
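For example, a minimal sketch of a standalone periodic-job script that runs outside Apache/WSGI (started from the shell or a process manager) while still using the ORM. The settings module dj_survey.settings, the app and the model are assumptions; adjust them to the real project.

# run_jobs.py - standalone daemon-style script using the Django ORM.
import os
import time

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "dj_survey.settings")
django.setup()  # makes models usable outside the web server process

from myapp.models import Survey  # hypothetical model; import after setup()

def run_jobs_forever(interval_seconds=300):
    while True:
        # Do the periodic work via the ORM. SQLite serializes writes, so keep
        # write transactions short to avoid "database is locked" errors.
        print("surveys:", Survey.objects.count())
        time.sleep(interval_seconds)

if __name__ == "__main__":
    run_jobs_forever()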
