I followed the answers to this question to set a Celery task schedule with a time in the US/Eastern timezone. This seems to work as expected.
However, when I check schedules using the method described in this question (my own answer, actually), the output does not include the fact that there is a nowfun callback set.
Is there a way to see this at runtime?
I'm trying to implement an SLA in my Airflow DAG.
I know how SLAs work: you set a timedelta object, and if the task does not get done within that duration, an email is sent notifying that the task is not done yet.
I want similar functionality, but instead of giving a duration, I want to set a specific time of day as the SLA. For example, if the task is not done by 8:00 AM, it sends an email and notifies the manager. Something like this:
'sla': time(hour=8, minute=0, second=0)
I have searched a lot, but found nothing.
Is there any solution for this specific problem, or any approach other than SLA?
Thanks in advance.
The sla param of BaseOperator expects a datetime.timedelta object, so there is nothing more to do there. Take into consideration that the SLA represents a time delta after the scheduled period is over. The example from the docs assumes a DAG scheduled daily:
For example if you set an SLA of 1 hour, the scheduler would send an email soon after 1:00AM on the 2016-01-02 if the 2016-01-01 instance has not succeeded yet.
The point is, it's always a time delta from the schedule period, which is not what you are looking for.
So I think you should take another approach: schedule your DAG whenever you need it, execute the tasks you want, and then add a sensor operator to check whether the condition you are looking for is met. There are a few types of sensors, and depending on your context you can choose among them.
Another option could be to create a new DAG dedicated to checking whether the tasks executed in the original DAG succeeded, and to act accordingly (for example, send emails, etc.). To do this you could use an ExternalTaskSensor; check online for tutorials on how to implement it, although it may be simpler to avoid cross-DAG dependencies, as stated in the docs.
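For illustration only, here is a minimal sketch of that second approach, not a definitive implementation: it assumes Airflow 2.x import paths, a working SMTP configuration, and made-up names (original_dag, load_data, the check DAG's schedule, and the manager's address are all placeholders).

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.email import EmailOperator
from airflow.sensors.external_task import ExternalTaskSensor
from airflow.utils.trigger_rule import TriggerRule

with DAG(
    dag_id="check_original_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 8 * * *",  # run the check at 8:00 AM
    catchup=False,
) as dag:
    # Briefly wait for the final task of the other (daily) DAG; fail fast if
    # it has not succeeded yet.
    wait_for_task = ExternalTaskSensor(
        task_id="wait_for_load_data",
        external_dag_id="original_dag",
        external_task_id="load_data",
        execution_delta=timedelta(hours=8),  # offset between the two schedules
        timeout=60,
        mode="reschedule",
    )

    # Runs only if the sensor failed, i.e. the task was not done by 8:00 AM.
    notify_manager = EmailOperator(
        task_id="notify_manager",
        to="manager@example.com",
        subject="Task not finished by 8:00 AM",
        html_content="The task load_data has not completed yet.",
        trigger_rule=TriggerRule.ALL_FAILED,
    )

    wait_for_task >> notify_manager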
Hope this points you in the right direction.
Let's say the scheduler is stopped for 5 hours and I had a DAG scheduled to run twice every hour. Now when I restart the scheduler, I do not want Airflow to backfill all the instances that were missed; instead I want it to continue from the current hour.
To achieve this behavior, you can add the LatestOnlyOperator, which was just recently introduced to master, to the start of your DAG. It is not currently part of a released version though (1.7.1.3 is the latest version as of the writing of this post).
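For reference, a minimal sketch of how it is wired up in later Airflow releases where the operator is available; the import paths assume Airflow 2.x (older versions ship it as airflow.operators.latest_only_operator), and the DAG and task names are made up.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.latest_only import LatestOnlyOperator

with DAG(
    dag_id="twice_hourly_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval="*/30 * * * *",  # twice every hour
) as dag:
    # Skips all downstream tasks for any run that is not the most recent one,
    # so backfilled runs after a scheduler outage become no-ops.
    latest_only = LatestOnlyOperator(task_id="latest_only")

    do_work = BashOperator(task_id="do_work", bash_command="echo 'doing the work'")

    latest_only >> do_work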
I'm sure you're no longer waiting for an answer, but for reference, this is covered here: https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls.
"When needing to change your start_date and schedule interval, change the name of the dag (a.k.a. dag_id) - I follow the convention : my_dag_v1, my_dag_v2, my_dag_v3, my_dag_v4, etc..."
I'm currently making a program that would send random text messages at randomly generated times during the day. I first made my program in Python and then realized that if I would like other people to sign up to receive messages, I would have to use some sort of online framework. (If anyone knows a way to use my Python code without having to change it, that would be amazing, but for now I have been trying to use web2py.) I looked into the scheduler, but it does not seem to do what I have in mind. If anyone knows whether there is a way to pass a time value into a function and have it run at that time, that would be great. Thanks!
Check out the APScheduler module for cron-like scheduling of events in Python; their examples show how to schedule some Python code to run in a cron'ish way.
Still not sure about the random part, though.
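As a rough sketch only, assuming APScheduler 3.x: a cron-style job could look like this, where send_message and the chosen hour are placeholders.

import random

from apscheduler.schedulers.blocking import BlockingScheduler


def send_message():
    print("sending a text message")


scheduler = BlockingScheduler()

# Cron-style trigger: run once a day at 9 AM, at a minute chosen randomly
# when the job is registered (so the "random time" is fixed for as long as
# this script runs, not re-drawn every day).
scheduler.add_job(send_message, "cron", hour=9, minute=random.randint(0, 59))

scheduler.start()  # blocks and keeps running the scheduled jobs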
As for a web framework that may appeal to you (seeing you are already familiar with Python), you should really look into Django (or, to keep things simple, just use WSGI).
Best.
I think that you can actually use the Scheduler and Tasks of web2py. I've never used it ;) but the documentation describes creating a task to which you can pass parameters from your code, which is exactly what you need, and it should work fine for your purposes:
scheduler.queue_task('mytask', start_time=myrandomtime)
So you need a web2py cron job, running every day and firing code similar to the above for each message to be sent (passing the parameters you need, possibly the message content and phone number; see the examples in the web2py book). This would be a daily creation of tasks which would later be processed by the scheduler.
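A minimal sketch of what that could look like in a web2py model file, assuming db is the DAL connection defined earlier in the model; send_sms, the phone number, the message, and the time window are made-up placeholders.

import random
from datetime import datetime, timedelta

from gluon.scheduler import Scheduler


def send_sms(phone_number, message):
    # Call your SMS provider's API here.
    return "sent %s to %s" % (message, phone_number)


# db is the web2py database connection defined earlier in the model file.
scheduler = Scheduler(db, dict(send_sms=send_sms))

# In the daily cron job: queue one task per message at a random time today,
# here somewhere between 8 AM and 10 PM.
midnight = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
random_time = midnight + timedelta(minutes=random.randint(8 * 60, 22 * 60))

scheduler.queue_task('send_sms',
                     pvars=dict(phone_number='+15551234567', message='Hello!'),
                     start_time=random_time)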
You can also have a simpler solution: one daily cron job which prepares the queue of messages with random times for the next day, and a second one which runs every ten minutes or so, checks what is waiting to be processed, and sends the messages. So, no Tasks. This way is a bit ugly, though (consider a single run that takes more than 10 minutes). You may also want to keep and check statuses for the messages to be processed (like pending, ongoing, done), both to prevent two jobs working on the same message and to allow tracking the progress of the processing. Anyway, you could use the cron method in an early version of your software and later replace it with a better method :)
In any case, you should check the expected number of messages to process and the average processing time on your target platform, to make sure that the chosen method is quick enough for your needs.
This is an old question, but in case someone is interested: the answer is an APScheduler blocking scheduler with jobs set to run at regular intervals with some jitter.
See: https://apscheduler.readthedocs.io/en/3.x/modules/triggers/interval.html
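A minimal sketch of that idea, assuming APScheduler 3.x (the jitter option requires 3.3 or later); send_message and the interval are placeholders.

from apscheduler.schedulers.blocking import BlockingScheduler


def send_message():
    print("sending a text message")


scheduler = BlockingScheduler()

# Run roughly once an hour, with each execution shifted by up to
# 600 seconds (10 minutes) earlier or later, chosen at random.
scheduler.add_job(send_message, "interval", hours=1, jitter=600)

scheduler.start()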
Got a simple question, I believe, but it got me stuck anyway.
Say I have a simple model:
class myModel(models.Model):
    expires = models.DateTimeField(...)
and I want, say, at the specified time, to do something: send an email, delete the model, change some of the model's fields... something. Is there a tool in Django core allowing me to do so?
Or, if not, I think some task queuing tool might be in order. I have djcelery working in my project, though I'm a complete newbie with it, and all I have managed so far is to run the django-celery-email package in order to send my mail asynchronously. I can't say I'm fully capable of defining tasks and workers that run in the background and are reliable.
If you have any ideas on how to solve such a problem, please do not hesitate =)
Write a custom management command to do the task that you desire. When you are done, you should be able to run your task with python manage.py yourtaskname.
Use cron, at, periodic tasks in celery, django-cron, djangotaskscheduler or django-future to schedule your tasks.
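A minimal sketch of such a management command, assuming the model above lives in a hypothetical app called myapp; the file would go in myapp/management/commands/expire_models.py, and the command name and the delete action are just examples.

from django.core.management.base import BaseCommand
from django.utils import timezone

from myapp.models import myModel


class Command(BaseCommand):
    help = "Act on every myModel instance whose 'expires' time has passed."

    def handle(self, *args, **options):
        expired = myModel.objects.filter(expires__lte=timezone.now())
        count = expired.count()
        for obj in expired:
            # Do whatever you need here: send an email, change fields, etc.
            obj.delete()
        self.stdout.write("Processed %d expired objects" % count)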
I think the best is a background task that reads the datetime and executes a task if a datetime is or has been reached.
See the solution given here for a scheduled task.
So the workflow would be:
Create the task you want to apply to objects whose date has been reached
Create a management command that checks the datetimes in your DB and executes the above task for every object whose datetime has been reached
Use cron (Linux) or at (Windows) to schedule the command call; an example crontab entry is shown below
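For example, on Linux the hypothetical expire_models command from the sketch above could be scheduled with a crontab entry like this (paths are placeholders):

# Run the management command every 10 minutes.
*/10 * * * * /path/to/virtualenv/bin/python /path/to/project/manage.py expire_models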
If you're on a UNIX-like machine, it's possible that you have access to cron jobs. If you're on Windows, I hear there's a program called at that can do similar things. If this doesn't suit your needs, there are a number of ways to do things every X hours using the time library: time.sleep(SOME_NUMBER_OF_SECONDS) in a loop with whatever else you want to do will work if you want something done regularly; otherwise you'll need to look at time.localtime() and check for conditions.
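A minimal sketch of that sleep-and-check approach, assuming you want something to fire once a day at 08:00 local time; do_something and the polling interval are placeholders.

import time


def do_something():
    print("doing the scheduled work")


while True:
    now = time.localtime()
    # Fire once when the local clock reaches 08:00.
    if now.tm_hour == 8 and now.tm_min == 0:
        do_something()
        time.sleep(60)  # avoid firing again within the same minute
    time.sleep(30)      # poll the clock every 30 seconds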
What is the best way to schedule a periodic task starting at specific datetime?
(I'm not using cron for this, considering I need to schedule about a hundred remote rsyncs,
where I compute the remote vs. local offset and would need to rsync each path the second the logs are generated on each host.)
By my understanding, the celery.task.schedules crontab class only allows specifying hour, minute, and day of week.
The most useful tip I've found so far was this answer by nosklo.
Is this the best solution?
Am I using the wrong tool for the job?
Celery seems like a good solution for your scheduling problem: Celery's PeriodicTasks have run time resolution in seconds.
You're using an appropriate tool here, but the crontab entry is not what you want. You want to use Python's datetime.timedelta object; the crontab scheduler in celery.schedules has only minute resolution, but using timedeltas to configure the PeriodicTask interval provides strictly more functionality, in this case per-second resolution.
e.g. from the Celery docs
>>> from celery.task import tasks, PeriodicTask
>>> from datetime import timedelta
>>> class EveryThirtySecondsTask(PeriodicTask):
...     run_every = timedelta(seconds=30)
...
...     def run(self, **kwargs):
...         logger = self.get_logger(**kwargs)
...         logger.info("Execute every 30 seconds")
http://ask.github.com/celery/reference/celery.task.base.html#celery.task.base.PeriodicTask
class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
The only challenge here is that you have to describe the frequency with which you want this task to run rather than the clock time at which you want it to run; however, I would suggest you check out the Advanced Python Scheduler: http://packages.python.org/APScheduler/
It looks like Advanced Python Scheduler could easily be used to launch normal (i.e. non-periodic) Celery tasks at any schedule of your choosing, using its own scheduling functionality.
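A minimal sketch of that idea, assuming APScheduler 3.x and an existing Celery task; sync_host, its module, the path argument, and the run date are made-up placeholders.

from datetime import datetime

from apscheduler.schedulers.blocking import BlockingScheduler

from tasks import sync_host  # a normal (non-periodic) Celery task


scheduler = BlockingScheduler()

# Fire the Celery task once at an exact datetime, e.g. the second the logs
# are expected to appear for this host; APScheduler handles the clock,
# Celery handles the asynchronous execution.
scheduler.add_job(sync_host.delay, "date",
                  run_date=datetime(2023, 1, 1, 3, 15, 42),
                  args=["/var/log/remote/host1"])

scheduler.start()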
I've recently worked on a task that involved Celery, and I had to use it for asynchronous operation as well as scheduled tasks. Suffice to say, I resorted to the old crontab for the scheduled tasks, although it calls a Python script that spawns a separate asynchronous task. This way I have less to maintain in the crontab (making the Celery scheduler run requires some further setup), but I am making full use of Celery's asynchronous capabilities.