I have this Django cron job script (I am using kronos for that, which is great).
Since I trigger this job every minute, I want to make sure that there isn't another instance of the script already running. If there is a previous job running, then I want to skip the current execution.
I know I can do that with a lock file, but a plain lock file isn't very reliable: if the machine reboots in the middle of an execution you have to clear the stale lock file by hand, and so on.
What's the best way to do this using Python (Django in this case)?
EDIT: I am targeting Linux, sorry for leaving this out.
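For context on the lock-file concern: on Linux, an flock-based guard avoids the stale-file problem, because the kernel releases the lock automatically when the process exits, whether it crashed or the machine rebooted. A minimal sketch (the lock path and the work function are only illustrative):

# run_job.py -- sketch of an flock-based single-instance guard (Linux)
import fcntl
import sys

def do_the_actual_work():
    pass  # placeholder for the real job body

def main():
    lock_file = open("/tmp/my_cron_job.lock", "w")  # illustrative path
    try:
        # Non-blocking exclusive lock; raises IOError/OSError if another
        # instance already holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        print("Previous run still in progress, skipping.")
        sys.exit(0)
    do_the_actual_work()
    # The lock is released automatically when this process exits.

if __name__ == "__main__":
    main()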
There's a Django app here: https://github.com/jsocol/django-cronjobs
Also available on pip as cronjobs.
Just in case you go straight to pip: you register jobs using the decorator like so:
# myapp/cron.py
import cronjobs

@cronjobs.register
def periodic_task():
    pass
Then run the command via:
$ ./manage.py cron periodic_task
It has job locking by default but you can disable it when you apply the decorator:
@register(lock=False)
In creating scheduled tasks I've used both Cron and a specially set up daemon for django.
Cron is silly-simple, and the daemon (in my opinion) might be excessive. The daemon sets up an independent Django instance.
Django itself (if I'm not mistaken) runs as a daemon anyway, correct?
I'm wondering: how do you schedule tasks within the Django environment without departing from standard usage?
You can use Celery to run periodic tasks, but depending on what you are trying to do it could be overkill.
If your use case is simple, cron plus a management command is much easier. You can use Kronos, django-cron or any of these libraries for that.
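As a concrete illustration, registering a periodic task with Kronos (which the first question above already uses) looks roughly like this; the schedule and function body are just placeholders:

# myapp/cron.py -- sketch of a django-kronos periodic task
import kronos

@kronos.register('*/5 * * * *')  # standard cron syntax: every five minutes
def refresh_data():
    # your actual update logic goes here
    pass

Then ./manage.py installtasks writes the corresponding crontab entries for you.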
Context:
I have a table on the database that uses values from an external database. This external database updates its values periodically.
Problem:
In order to update my database every time I start the server, I want to run a script right after runserver.
Potential Solution:
I have seen that it is possible to run a script from a certain app, which is something I'm interested in. This is achievable using django-extensions:
https://django-extensions.readthedocs.io/en/latest/runscript.html
However, this script only runs with the following command:
python manage.py runscript your_script
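(For reference, runscript expects the script to live in a scripts package inside one of your apps and to expose a run() function, roughly like this; the file name is only an example:)

# myapp/scripts/update_db.py -- layout django-extensions' runscript expects
# (the scripts directory also needs an __init__.py)
def run():
    # your update logic here
    pass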
Is there any other way to run a script from an app and execute it right after the runserver command? I am open to suggestions!
Thanks in advance
Update
Thanks to @Raydel Miranda for the remarks, I feel I left some information out.
My goal is that, once I start the server, I open a socket to keep my database updated.
You can execute the code in the top-level urls.py. That module is imported and executed once.
urls.py
from django.conf.urls.defaults import *
from your_script import one_time_startup_function
urlpatterns = ...
one_time_startup_function()
I would recommend something like this. Let's say you have a script like this:
# abc.py
from your_app.models import do_something
do_something()
Now you can run this script right after runserver (or any other way you are running the Django application) like this:
python manage.py runserver & python manage.py shell < abc.py
FYI, it will only work if you have bash in your terminal (e.g. on Linux or macOS).
Update
After reading your problem carefully, I think running a script after runserver might not be the best solution. As you said:
This external database updates its values periodically.
So, I think you need some sort of periodic task to do this update. You can use a cron job or you can use Celery for this.
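For completeness, a rough sketch of the Celery variant (assuming Celery 4 or later, where the beat schedule lives on the app config; the task and schedule names are made up):

# myproject/celery.py -- sketch of a periodic task scheduled with Celery beat
from celery import Celery
from celery.schedules import crontab

app = Celery('myproject')

@app.task
def sync_external_db():
    # pull fresh values from the external database here
    pass

app.conf.beat_schedule = {
    'sync-external-db-every-10-minutes': {
        'task': 'myproject.celery.sync_external_db',
        'schedule': crontab(minute='*/10'),
    },
}

You would then run a beat process alongside the worker (celery -A myproject beat).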
Running the script after runserver doesn't seem like a very good idea. The main reason is that there will be a window between the moment the server starts running (and is available to users) and the moment you finish synchronizing your data. Also, if you synchronize using a script after runserver, you won't get updates from the external db after that.
The best solution for this is to configure multiple databases; you can use the external database with read-only access. This way your views will always serve genuinely up-to-date data.
On the other hand ...
If you want to use something like a script, it is better to write a custom Django management command (this way you don't have to deal with initializing Django settings and other issues) and execute it using cron or Celery, as @ruddra states in his/her answer.
Said this, you should see this: https://docs.djangoproject.com/en/2.1/topics/db/multi-db/
This may help.
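To make the multiple-database idea concrete, a minimal sketch of the settings entry and a read-only query (the alias, engine, credentials and model name are purely illustrative):

# settings.py -- second, read-only external database
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'local_db',
    },
    'external': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'external_db',
        'USER': 'readonly_user',  # grant this account SELECT only
        'HOST': 'external.example.com',
    },
}

# somewhere in a view -- query the external alias directly
latest_values = ExternalValue.objects.using('external').all()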
You can edit yourapp/apps.py:
# yourapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        # update my database here
        pass
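One caveat worth noting: ready() runs for every management command (migrate, shell, etc.), and under runserver's autoreloader it runs in the reloader parent as well as the serving child. A common guard is to check the RUN_MAIN environment variable, which the reloader sets in the child; verify this behaviour on your Django version, and note it is not set under a real WSGI server or with --noreload:

# yourapp/apps.py -- sketch of guarding ready() so the update runs once under runserver
import os
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        if os.environ.get('RUN_MAIN') != 'true':
            return  # reloader parent (or another management command); skip
        # update my database / open the socket here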
I have a Django app that is intended to be run on Virtualbox VMs on LANs. The basic user will be a savvy IT end-user, not a sysadmin.
Part of that app's job is to connect to external databases on the LAN, run some python batches against those databases and save the results in its local db. The user can then explore the systems using Django pages.
Run time for the batches isn't all that long: minutes, potentially tens of minutes, not seconds. Runs are infrequent at best; I think you could go days without needing a refresh.
This is not celery's normal use case of long tasks which will eventually push the results back into the web UI via ajax and/or polling. It is more similar to a dev's occasional use of the django-admin commands, but this time intended for an end user.
The user should be able to initiate a run of one or several of those batches when they want in order to refresh the calculations of a given external database (the target db is a parameter to the batch).
Until the batches are done for a given db, the app really isn't useable. You can access its pages, but many functions won't be available.
It is very important, from a support point of view, that the batches remain easily runnable at all times. Dropping down to the VM's SSH would probably require frequent handholding, which wouldn't be good; it is best if they can be launched from the Django web pages.
What I currently have:
Each batch is in its own script.
I can run each one on the command line (via if __name__ == "__main__":).
The batches are also hooked up as celery tasks and work fine that way.
Given the way I have written them, it would be relatively easy for me to allow running them from subprocess calls in Python. I haven't really looked into it, but I suppose I could make them into django-admin commands as well.
The batches already have their own rudimentary status checks. For example, they can look at the calculated data and tell whether they have been run and display that in Django pages without needing to look at celery task status backends.
The batches themselves are relatively robust and I can make them more so. This is about their launch mechanism.
What's not so great:
In my Mac dev environment I find the celery/celerycam/rabbitmq stack to be somewhat unstable. It seems as if sometimes the rabbitmq daemon balloons in CPU/RAM use and then needs to be terminated. That mightily confuses the celery processes and I find I have to kill -9 various tasks and relaunch them manually. Sometimes celery still works but celerycam doesn't, so there are no task updates. Some of these issues may be OSX-specific or may be due to the DEBUG flag being switched on for now, which celery warns about.
So then I need to run the batches on the command line, which is what I was trying to avoid, until the whole celery stack has been reset.
This might be acceptable on a normal website, with an admin watching over it. But I can't have that happen on a remote VM to which only the user has access.
Given that these are somewhat fire-and-forget batches, I am wondering if celery isn't overkill at this point.
Some options I have thought about:
writing a cleanup shell/Python script to restart rabbitmq/celery/celerycam and generally make things more robust, i.e. whatever is required to make celery & co. more stable. I've already used psutil to figure out whether the rabbit/celery processes are running and display their status in Django.
Running the batches via subprocess instead and avoiding celery. What about django-admin commands here? Does that make a difference? Still needs to be run from the web pages.
an alternative task/process manager to celery with less capability but also less moving parts?
not using subprocess but relying on the Python multiprocessing module? To be honest, I have no idea how that compares to launching via subprocess.
environment:
nginx, wsgi, ubuntu on virtualbox, chef to build VMs.
I'm not sure how your celery configuration makes it unstable, but it sounds like Celery is still the best fit for your problem. I'm using redis as the queue system and, from my own experience, it works better than rabbitmq. Maybe you can try it and see if it improves things.
Otherwise, just use cron as the driver for periodic tasks. You can let it run your script periodically and update the database; your UI component can then poll the database with no conflict.
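If you do end up dropping Celery, one simple alternative (my own sketch, not something this answer prescribes) is to launch the batch as a management command in a detached subprocess straight from the view, and let the page poll the status the batches already write to the local db. The command name and argument here are illustrative:

# views.py -- sketch: fire-and-forget launch of a batch management command
import subprocess
import sys

from django.http import JsonResponse

def launch_batch(request, target_db):
    # Under wsgi you may need cwd= pointing at the project directory.
    subprocess.Popen(
        [sys.executable, 'manage.py', 'run_batch', target_db],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,
    )
    return JsonResponse({'started': True})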
I'm new to Python (and relatively new to programming in general) and I have created a small Python script that scrapes some data off a site once a week and stores it in a local database (I'm trying to do some statistical analysis on downloaded music). I've tested it on my Mac and would like to put it on my server (a VPS with WiredTree running CentOS 5), but I have no idea where to start.
I tried Googling for it, but apparently I'm using the wrong terms, as "deploying" turns up results about creating an executable file. The only thing that seems to make sense is to set it up inside Django, but I think that might be overkill. I don't know...
EDIT: More clarity
You should look into cron for this, which will allow you to schedule the execution of your Python script.
If you aren't sure how to make your Python script executable, add a shebang to the top of the script, and then add execute permissions to the script using chmod.
Copy the script to the server.
Test the script manually on the server.
Set cron ("crontab -e") to a schedule that will trigger it soon, so you can test it.
Once you've debugged any issues, set cron to the appropriate schedule (example entry below).
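As a concrete example of those steps (the path and schedule are only illustrative):

#!/usr/bin/env python    # first line (shebang) of /home/you/scraper.py

$ chmod +x /home/you/scraper.py
$ crontab -e

# crontab entry: run weekly, Mondays at 03:00, appending output to a log
0 3 * * 1 /home/you/scraper.py >> /home/you/scraper.log 2>&1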
Sounds like a job for Cron?
Cron is a scheduler that provides a way to run certain scripts (apps, etc.) at certain times.
Here is a short tutorial that explains how to set up cron.
See this for more general cron information.
Edit:
Also, since you are using CentOS: if you end up having issues with your script later on, they could partly be caused by SELinux. There are ways to disable SELinux on your server (if you have enough access permissions), but there are arguments against disabling SELinux as well.
As I can see in the top utility, celery processes consume a lot of CPU time, so I want to profile them.
I can do it manually on developer machine like so:
python -m cProfile -o test-`date +%Y-%m-%d-%T`.prof ./manage.py celeryd -B
But to have accurate timings I need to profile it on the production machine. On that machine (Fedora 14) celery is launched by init scripts, e.g.
service celeryd start
I have figured out that these scripts eventually call manage.py celeryd_multi. So my question is: how can I tell celeryd_multi to start celery with profiling enabled? In my case this means adding the -m cProfile -o out.prof options to python.
Any help is much appreciated.
I think you're confusing two separate issues. You could be processing too many individual tasks or an individual task could be inefficient.
You may know which of these is the problem, but it's not clear from your question which it is.
To track how many tasks are being processed I suggest you look at celerymon. If a particular task appears more often than you would expect then you can investigate where it is getting called from.
Profiling the whole of celery is probably not helpful as you'll get lots of code that you have no control over. As you say it also means you have a problem running it in production. I suggest you look at adding the profiling code directly into your task definition.
You can use cProfile.run('func()') as a layer of indirection between celery and your code so each run of the task is profiled. If you generate a unique filename and pass it as the second parameter to run you'll have a directory full of profile data that you can inspect on a task-by-task basis, or use pstats.add to combine multiple task runs together.
Finally, per-task profiling means you can also turn profiling on or off using a setting in your project code either globally or by task, rather than needing to modify the init scripts on your server.
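A rough sketch of that per-task wrapping (task and file names are illustrative; runctx is used instead of run so the task arguments are in scope, and the decorator should be adapted to however your tasks are currently declared):

# tasks.py -- sketch: profile each run of a celery task with cProfile
import cProfile
import uuid

from celery import shared_task

def _do_work(*args, **kwargs):
    # the real body of your task goes here
    pass

@shared_task
def my_task(*args, **kwargs):
    profile_path = '/tmp/my_task-%s.prof' % uuid.uuid4().hex  # unique file per run
    cProfile.runctx('_do_work(*args, **kwargs)', globals(), locals(), profile_path)

Afterwards you can load the files with pstats.Stats and combine runs with its add() method.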