I'm looking for a relatively simple and lightweight way to set up primitive DB maintenance tasks for a Django-based website. Celery seems like overkill to me.
Right now my plan looks like writing a custom Django management command and putting it in cron. Could someone suggest a better method?
django-extensions has a job-scheduling feature that would work well for DB maintenance tasks. You'd still rely on cron entries to actually run the jobs, though.
But then again, just running a management command from cron is perfectly reasonable. A minimal sketch of such a command follows.
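For illustration, a management command that prunes expired sessions might look like this (the command name, file location, and the specific cleanup query are assumptions, not a prescription):

# myapp/management/commands/cleanup_sessions.py  (hypothetical name/location)
from django.contrib.sessions.models import Session
from django.core.management.base import BaseCommand
from django.utils import timezone

class Command(BaseCommand):
    help = "Delete expired sessions (an example DB maintenance task)"

    def handle(self, *args, **options):
        expired = Session.objects.filter(expire_date__lt=timezone.now())
        count = expired.count()
        expired.delete()
        self.stdout.write("Removed %d expired sessions" % count)

A crontab entry along the lines of 0 3 * * * /path/to/venv/bin/python /path/to/manage.py cleanup_sessions would then run it nightly.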
Django Chronograph is a Django app with a very nice admin interface for managing cron jobs and setting up multiple tasks. That way you don't have to fiddle with your server's cron file yourself; the interface/app manages it for you.
You can also do it the Django way by writing custom management commands, as also mentioned here.
In creating scheduled tasks I've used both cron and a specially set up daemon for Django.
Cron is silly-simple, and the daemon (in my opinion) might be excessive. The daemon set up an independent Django instance.
Django itself (if I'm not mistaken) runs as a daemon anyway, correct?
I'm wondering: how do you schedule tasks within the Django environment without departing from standard usage?
You can use Celery to run periodic tasks, but depending on what you're trying to do it could be overkill.
If your use case is simple, cron plus a management command is much easier. You can use Kronos, django-cron, or any of these libraries for that; with Kronos, for example, registration looks like the sketch below.
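A minimal Kronos sketch (the module path, schedule, and task body are placeholders):

# myapp/cron.py -- django-kronos discovers functions registered like this
import kronos

@kronos.register('0 4 * * *')  # every day at 04:00
def refresh_caches():
    # whatever DB maintenance you need goes here
    pass

Running python manage.py installtasks then writes the matching crontab entries for you.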
I have a Django app that is intended to be run on Virtualbox VMs on LANs. The basic user will be a savvy IT end-user, not a sysadmin.
Part of that app's job is to connect to external databases on the LAN, run some Python batches against those databases, and save the results in its local DB. The user can then explore the systems using Django pages.
Run time for the batches isn't all that long: minutes, potentially tens of minutes, not seconds. Runs are infrequent at best; I think you could go days without needing a refresh.
This is not celery's normal use case of long tasks that eventually push results back into the web UI via AJAX and/or polling. It is more similar to a dev's occasional use of django-admin commands, but this time intended for an end user.
The user should be able to initiate a run of one or several of those batches when they want in order to refresh the calculations of a given external database (the target db is a parameter to the batch).
Until the batches are done for a given db, the app really isn't useable. You can access its pages, but many functions won't be available.
It is very important, from a support point of view, that the batches remain easily runnable at all times. Dropping down to the VM's shell over SSH would probably require frequent handholding, which wouldn't be good; it is best if they can be launched from the Django web pages.
What I currently have:
Each batch is in its own script.
I can run it on the command line (via if __name__ == "__main__":).
The batches are also hooked up as celery tasks and work fine that way.
Given the way I have written them, it would be relatively easy for me to allow running them from subprocess calls in Python. I haven't really looked into it, but I suppose I could make them into django-admin commands as well.
The batches already have their own rudimentary status checks. For example, they can look at the calculated data and tell whether they have been run and display that in Django pages without needing to look at celery task status backends.
The batches themselves are relatively robust and I can make them more so. This is about their launch mechanism.
What's not so great:
In my Mac dev environment I find the celery/celerycam/rabbitmq stack to be somewhat unstable. It seems as if sometimes the rabbitmq daemon balloons in CPU/RAM use and then needs to be terminated. That mightily confuses the celery processes, and I find I have to kill -9 various tasks and relaunch them manually. Sometimes celery still works but celerycam doesn't, so there are no task updates. Some of these issues may be OS X specific, or may be due to the DEBUG flag being switched on for now, which celery warns about.
So then I need to run the batches on the command line, which is what I was trying to avoid, until the whole celery stack has been reset.
This might be acceptable on a normal website, with an admin watching over it. But I can't have that happen on a remote VM to which only the user has access.
Given that these are somewhat fire-and-forget batches, I am wondering if celery isn't overkill at this point.
Some options I have thought about:
writing a cleanup shell/Python script to restart rabbitmq/celery/celerycam and generally make things more robust, i.e. whatever is required to make celery & co. more stable. I've already used psutil to figure out whether the rabbit/celery processes are running and display their status in Django.
running the batches via subprocess instead and avoiding celery (see the sketch after this list). What about django-admin commands here? Does that make a difference? They would still need to be run from the web pages.
an alternative task/process manager to celery, with less capability but also fewer moving parts?
not using subprocess but relying on the Python multiprocessing module? To be honest, I have no idea how that compares to launching via subprocess.
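For the subprocess option, here is a minimal sketch of launching a batch from a view; the run_batch command name, the redirect URL, and the project path are hypothetical:

# views.py -- fire-and-forget launch of a batch from a Django page
import subprocess
import sys

from django.http import HttpResponseRedirect

def launch_batch(request, target_db):
    # Popen returns immediately, so the request doesn't block while the
    # batch runs; the batches' own status checks (see above) feed the UI.
    subprocess.Popen(
        [sys.executable, "manage.py", "run_batch", target_db],
        cwd="/path/to/project",  # assumption: project root on the VM
    )
    return HttpResponseRedirect("/status/")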
Environment: nginx, WSGI, Ubuntu on VirtualBox, Chef to build the VMs.
I'm not sure how your celery configuration makes it unstable, but it sounds like celery is still the best fit for your problem. I'm using Redis as the queue system, and in my experience it works better than RabbitMQ. Maybe you can try it and see if it improves things.
Otherwise, just use cron as a driver to run periodic tasks: let it run your script periodically and update the database, and your UI component can poll the database with no conflict.
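For reference, pointing celery at Redis instead of RabbitMQ is essentially a one-line broker change (shown in the Celery 3.x-era settings style; the URL is the default local Redis):

# settings.py -- newer Celery versions spell this CELERY_BROKER_URL,
# or you pass broker=... to the Celery() constructor
BROKER_URL = 'redis://localhost:6379/0'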
Does anyone know of a proven and simple way of running a system command from a Django application?
Maybe using Celery? ...
From my research, it's a problematic task, since it involves permissions and insecure approaches to the problem. Am I right?
EDIT: Use case: delete some files on a remote machine.
Thanks
Here is one approach: in your Django web application, write a message to a queue (e.g., RabbitMQ) containing the information you need. In a separate process, read the message from the queue and perform the file actions. You can indeed use Celery to set up such a system.
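A minimal sketch of that consumer side as a Celery task; the broker URL, host name, and key-based SSH auth are all assumptions:

# tasks.py -- the worker, not the web process, holds the SSH credentials
import subprocess
from celery import Celery

app = Celery('tasks', broker='amqp://localhost')  # assumed broker URL

@app.task
def delete_remote_files(host, paths):
    # run rm on the remote machine over SSH (key-based auth assumed)
    subprocess.check_call(['ssh', host, 'rm', '-f', '--'] + list(paths))

The view then only calls delete_remote_files.delay('user@remote', ['/tmp/old.log']); keeping the credentials and the deletion logic in the worker addresses much of the permissions concern.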
I am working on a web application that uses a permanent object, MyService. Through a web interface I dynamically update its state and monitor its behavior. Now I would like to periodically call one of its methods. I was thinking of using a celery PeriodicTask, but I ran into some scope issues. It seems I need to run three different processes:
python manage.py runserver
python manage.py celery worker
python manage.py celerybeat
The problem is that even if I ensure that MyService is a singleton that can be safely used by more than one thread, celery creates its own fresh copy of the object. Is there a way I could share this object between the Django server and the celery main process? I tried to find a way to start celery from within a Django script, but so far with no success. I would appreciate any help.
If you need to share something between multiple processes, or maybe even multiple machines (e.g. your workers could run on a separate machine), the best (and probably easiest) practice is to share the information via an external service.
In the simplest case you could use Django's DB, but if that turns out not to be suitable, for example because you have a heavy write load, you can use something like Redis or Memcached (which you can also talk to via Django's caching API). These can handle a big write load, and you can use Redis as a queue for celery as well. A sketch of the caching approach follows.
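A minimal sketch using Django's cache API (a Redis or Memcached cache backend, plus the key name and state shape, are assumptions):

# both the web process and the celery worker import the same cache,
# so state lives in the backend rather than in an in-process singleton
from django.core.cache import cache

# web process: persist the service's current state
cache.set('myservice:state', {'enabled': True, 'interval': 30}, timeout=None)

# celery worker (e.g. inside the periodic task): read it back
state = cache.get('myservice:state', {})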
Something I've been interested in is running a certain set of actions at regular time intervals. Obviously, this is a task for cron, right?
Unfortunately, the Internet seems to be in a bit of disagreement there.
Let me elaborate a little on my setup. First, my development environment is on Windows, while my production environment is hosted on WebFaction (Linux). There is no real cron on Windows, right? Also, I use Django! And what's suggested for Django?
Celery, of course! Unfortunately, setting up Celery has been more or less a literal nightmare for me; please see Error message 'No handlers could be found for logger “multiprocessing”' using Celery. And this is only ONE of the problems I've had with Celery. Others include a socket error which, as far as I can tell, I'm the only one ever to have run into.
Don't get me wrong, Celery seems REALLY cool. Unfortunately, there seems to be a lack of support, and some odd limitations built into its preferred backend, RabbitMQ. Unfortunately, no matter how cool a program is, if it doesn't work, well, it doesn't work!
That's where I hope all of you can come in. I'd like to know about cron or a cron equivalent that can be set up similarly (preferably identically) in both Windows and Linux environments.
(I've been struggling with Celery for about two weeks now and unfortunately I think it's time to toss in the towel and give up on it, at least for now.)
I had the same problem, and held off trying to solve it with Celery (too complicated) or cron (external to the application), and ended up finding Advanced Python Scheduler (APScheduler). I've only just started using it, but it seems reasonably mature and stable, has decent documentation, and accepts a number of scheduling formats (e.g. cron style).
From the documentation, running a function at a specific interval:
from apscheduler.scheduler import Scheduler  # legacy APScheduler 2.x import path

sched = Scheduler()
sched.start()  # runs the scheduler in a background thread

def hello_world():
    print "hello world"

# call hello_world every 10 seconds
sched.add_interval_job(hello_world, seconds=10)
This is non-blocking, and I run something pretty much identical by simply importing the module from my urls.py. Hope this helps.
A simple, non-Celery way to approach things would be to create custom django-admin commands to perform your asynchronous or scheduled tasks.
Then, on Windows, you use the at command to schedule these tasks. On Linux, you use cron. For example:
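Concretely, assuming a hypothetical command named mytask, the two schedulers would be driven along these lines (the Windows at syntax is from memory, so double-check the quoting on your Windows version, or use schtasks on newer releases):

# Linux crontab entry: run nightly at 03:00
0 3 * * * /path/to/python /path/to/manage.py mytask

# Windows equivalent with at
at 03:00 /every:M,T,W,Th,F,S,Su cmd /c "python C:\path\to\manage.py mytask"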
I'd also strongly recommend ditching Windows for your development environment if you can. Your life will be so much better on Linux or even Mac OS X. Repurpose a spare or old machine with Ubuntu, for example, or run Ubuntu in a VM on your Windows box.
https://github.com/andybak/django-cron
Triggered by a single cron task, but all the scheduling and configuration is done in Python, roughly as in the sketch below.
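If memory serves, jobs in that older django-cron are declared roughly like this; treat the exact names (cronScheduler, Job, run_every) as assumptions and check the project's README:

# cron.py in your app -- run_every is in seconds
from django_cron import cronScheduler, Job

class RefreshData(Job):
    run_every = 3600  # once an hour

    def job(self):
        # the actual work goes here
        pass

cronScheduler.register(RefreshData)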
Django Chronograph is a great alternative. You only need to set up one cron job, then do everything from the Django admin. You can schedule tasks/commands from Django management.