I have a few cron jobs running with the help of django-crontab. Let us take one as an example: suppose job A is scheduled to run every two minutes.
However, if the job is still running when the next two-minute mark comes around, I do not want another instance of it to start.
While exploring a few resources I came across this article, but I am not sure where it fits in:
https://bencane.com/2015/09/22/preventing-duplicate-cron-job-executions/
Has anyone come across this issue before? How did you fix it?
According to the readme, you should be able to set:
CRONTAB_LOCK_JOBS = True
in your Django settings. That will prevent a new job instance from starting if a previous one is still running.
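For reference, the setting goes straight into the project's settings module alongside the job definitions. A minimal sketch, where 'myapp.cron.job_a' is a placeholder path for your own job function:

```python
# settings.py
# With django-crontab's lock option enabled, each job acquires a lock
# before running; a new run is skipped while the previous one still
# holds the lock.
CRONTAB_LOCK_JOBS = True

CRONJOBS = [
    # every two minutes; 'myapp.cron.job_a' is a placeholder path
    ('*/2 * * * *', 'myapp.cron.job_a'),
]
```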
Related
[TLDR] What are the best ways to handle scheduled jobs in development so that only one job is spun up instead of two? Do you use the 'noreload' option? Do you test to see if a file is locked and stop the second instance of the job if it is? Are there better alternatives?
[Context]
[edit] We are still in a development environment and are looking at next steps for a production environment.
My team and I are currently developing a Django project (Django 1.9 with Python 3.5). We just discovered that Django spins up two instances of itself to allow for real-time code changes.
APScheduler 3.1.0 is being used to schedule a DB ping every few minutes to see if there is new data for us to process. However, when Django spins up, we noticed that we were pinging twice and that there were two instances of our functions running. We tried to shut down the second job through APScheduler, but as they are in two different processes, APScheduler is unable to see the other job.
After researching this, we discovered the 'noreload' option and another suggestion to test if a file has been locked.
The noreload option prevents Django from spinning up the second instance. This solution works but feels weird; we haven't come across any documentation or guides saying whether this is something you should or should not do in production.
Filelock 2.0.6 is another option that we have tested. In this solution, the two scheduled tasks check a local file to see if it is locked. If it isn't locked, one task locks it and runs while the other stops. If the task crashes, the file remains locked until a server restart. This feels like a hack.
In a production environment, are these good solutions? Are there other alternatives that we should look at for handling scheduled tasks that are better for this? Are there cons to either of these solutions that we haven't thought of?
'noreload' - is this something that is done normally in a production environment?
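One common workaround for the double start under the autoreloader is to gate the scheduler on Django's RUN_MAIN environment variable, which the autoreloader sets only in the child process that actually serves requests. A sketch, with the caveat that RUN_MAIN is an internal Django detail and this is a development-only workaround, not a production pattern:

```python
import os

def should_start_scheduler() -> bool:
    # Django's autoreloader runs the process twice; only the reloaded
    # child sets RUN_MAIN to "true". Starting the scheduler only there
    # avoids the duplicate jobs described above.
    return os.environ.get("RUN_MAIN") == "true"

if should_start_scheduler():
    # Start APScheduler here, e.g. (placeholder ping_db function):
    # from apscheduler.schedulers.background import BackgroundScheduler
    # scheduler = BackgroundScheduler()
    # scheduler.add_job(ping_db, "interval", minutes=2)
    # scheduler.start()
    pass
```

With --noreload (or in production, where the autoreloader is off), RUN_MAIN is never set, so a production setup should not rely on this check alone.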
I'm currently making a program that sends random text messages at randomly generated times during the day. I first wrote the program in Python and then realized that if I want other people to sign up to receive messages, I will have to use some sort of online framework. (If anyone knows a way to use my Python code without having to change it, that would be amazing, but for now I have been trying to use web2py.) I looked into the scheduler, but it does not seem to do what I have in mind. If anyone knows a way to pass a time value into a function and have it run at that time, that would be great. Thanks!
Check out the APScheduler module for cron-like scheduling of events in Python; their examples show how to schedule some Python code to run in a cron'ish way.
Still not sure about the random part, though.
As for a web framework that may appeal to you (seeing you are already familiar with Python), you should really look into Django (or, to keep things simple, just use WSGI).
Best.
I think you can actually use web2py's Scheduler and Tasks. I've never used it ;) but the documentation describes creating a task to which you can pass parameters from your code, which is exactly what you need, and it should work fine for your purposes:
scheduler.queue_task('mytask', start_time=myrandomtime)
So you need a web2py cron job running every day and firing code similar to the above for each message to be sent (passing the parameters you need, possibly the message content and phone number; see the examples in the web2py book). This is a daily creation of tasks which are then processed later by the scheduler.
You could also go with a simpler solution: one daily cron job that prepares the queue of messages with random times for the next day, and a second one that runs every, say, ten minutes, checks what is waiting to be processed, and sends the messages. So, no Tasks. This way is a bit ugly, though (consider a single run that takes more than ten minutes). You may also want to store and check statuses on the messages (like pending, ongoing, done) to prevent two jobs from working on the same message and to allow tracking the progress of the processing. In any case, you could use the cron method in an early version of your software and replace it with a better method later :)
Either way, you should check the expected number of messages and the average processing time on your target platform, to make sure the chosen method is quick enough for your needs.
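The daily "prepare tomorrow's queue" step can be sketched in plain Python. The queue_task call shape follows the web2py book; the 9-to-21 sending window, the task name and the pvars contents are arbitrary assumptions for illustration:

```python
import random
from datetime import datetime, timedelta

def random_times_for_tomorrow(n, start_hour=9, end_hour=21):
    """Pick n random send times within tomorrow's daytime window."""
    midnight = datetime.now().replace(hour=0, minute=0, second=0,
                                      microsecond=0) + timedelta(days=1)
    window_seconds = (end_hour - start_hour) * 3600
    offsets = sorted(random.randrange(window_seconds) for _ in range(n))
    return [midnight + timedelta(hours=start_hour, seconds=s) for s in offsets]

# The daily cron job would then queue one scheduler task per time, e.g.:
# for t in random_times_for_tomorrow(3):
#     scheduler.queue_task('send_message',
#                          pvars={'phone': '...', 'text': '...'},
#                          start_time=t)
```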
This is an old question, but in case someone is interested: the answer is APScheduler's blocking scheduler with jobs set to run at regular intervals with some jitter.
See: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/interval.html
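In APScheduler 3.x this looks like scheduler.add_job(func, 'interval', hours=6, jitter=120), where jitter requires a reasonably recent 3.x release. The effect of the jitter option can be sketched with the standard library alone:

```python
import random
import time

def run_with_jitter(job, interval_seconds, jitter_seconds, iterations):
    """Call `job` roughly every `interval_seconds`, shifting each run by a
    random offset of up to +/- `jitter_seconds`. This mimics what
    APScheduler's interval trigger with jitter does for you; `iterations`
    is only here so the loop terminates."""
    for _ in range(iterations):
        delay = interval_seconds + random.uniform(-jitter_seconds,
                                                  jitter_seconds)
        time.sleep(max(0.0, delay))
        job()
```

The jitter spreads the runs out randomly, which is what gives the "random times" behavior asked about above.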
I've got a simple question, I believe, but it has me stuck anyway.
Say I have a simple model:
class myModel(models.Model):
    expires = models.DateTimeField(...)
and I want to do something at the specified time: send an email, delete the model, change some of the model's fields... something. Is there a tool in Django core that allows me to do so?
Or, if not, I think some task-queuing tool might be in order. I have djcelery working in my project, though I'm a complete newbie at it, and all I have managed so far is to run the django-celery-email package in order to send my mail asynchronously. I can't say I'm fully capable of defining tasks and workers that run in the background and are reliable.
If you have any ideas on how to solve this problem, please do not hesitate =)
Write a custom management command to do the task that you desire. When you are done, you should be able to run your task with python manage.py yourtaskname.
Use cron, at, periodic tasks in celery, django-cron, djangotaskscheduler or django-future to schedule your tasks.
I think the best approach is a background task that reads the datetime and executes a task once the datetime has been reached.
See the solution given here for a scheduled task
So the workflow would be:
Create the task you want to apply to objects whose date has been reached
Create a management command that checks the datetimes in your DB and executes the above task for every object whose datetime has been reached
Use cron (Linux) or at (Windows) to schedule the command call
If you're on a UNIX-like machine, you probably have access to cron jobs. If you're on Windows, I hear there's a program called at that can do similar things. If neither suits your needs, there are a number of ways to do things every X hours using the time library: time.sleep(SOME_NUMBER_OF_SECONDS) in a loop, together with whatever else you want to do, will work if you want something done regularly; otherwise look at time.localtime() and check for the conditions you care about.
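The sleep-in-a-loop idea, as a minimal sketch (the `iterations` parameter is only there so the loop can be exercised without waiting; in real use you would leave it at None and let the loop run forever):

```python
import time

SIX_HOURS = 6 * 60 * 60  # seconds

def run_every(job, interval_seconds=SIX_HOURS, iterations=None):
    """Run `job`, then sleep `interval_seconds`, forever by default.
    Note the schedule drifts by however long `job` itself takes; for
    exact wall-clock times, check time.localtime() instead."""
    count = 0
    while iterations is None or count < iterations:
        job()
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval_seconds)
```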
How do I schedule a task to run once every six hours (on repeat)?
I am trying to implement a Redis queue for the first time.
I went through Heroku's tutorial : https://devcenter.heroku.com/articles/python-rq
But the tutorial did not explain how to run a task repeatedly on a schedule (such as checking a couple of websites for info once every six hours).
Also, since I am new to this: if I should not be using Redis for such a task, please let me know what I should be using instead to check a couple of websites for info once every six hours.
Thanks
You don't need Redis for this functionality at all.
Take a look at the Heroku Scheduler here: https://devcenter.heroku.com/articles/scheduler
You can set this to run your code every hour, and have your code check whether the current hour is 0, 6, 12, or 18 (or whatever other interval you may need).
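Assuming the job should fire at hours 0, 6, 12 and 18, the hourly check reduces to a modulus test. A sketch of what the hourly-scheduled script would do (check_websites is a placeholder for the actual scraping code):

```python
from datetime import datetime

def should_run_now(hour=None):
    """True only at hours 0, 6, 12 and 18 -- i.e. once every six hours
    when the script itself is triggered hourly by the scheduler."""
    if hour is None:
        hour = datetime.utcnow().hour
    return hour % 6 == 0

if should_run_now():
    pass  # check_websites() -- placeholder for the actual work
```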
My application creates PeriodicTask objects according to user-defined schedules; that is, the schedule for a PeriodicTask can change at any time. The past couple of days have been spent in frustration trying to figure out how to get Celery to support this. Ultimately, the issue is that for something to run as a PeriodicTask it first has to be created and then, second, has to be registered (I have no idea why this is required).
So, for dynamic tasks to work, I need
to register all the tasks when the celery server starts
to register a task when it is newly created.
#1 should be solved easily enough by running a startup script (i.e., something that gets run after ./manage.py celerybeat gets called). Unfortunately, I don't think there's a convenient place to put this. If there were, the script would go something like this:
from djcelery.models import PeriodicTask
from celery.registry import tasks

for task in PeriodicTask.objects.filter(name__startswith='scheduler.'):
    tasks.register(task)
I'm filtering for 'scheduler.' because the names of all my dynamic tasks begin that way.
#2 I have no idea about. The issue, as far as I can see, is that celery.registry.tasks is kept in memory, and there's no way, barring some coding magic, to access celerybeat's task registry once it has started running.
Thanks in advance for your help.