Celery beat schedule: schedule a task to run on load, then on interval - python

I am trying to figure out how to configure a periodic task in celery to be scheduled to run on load regardless of interval.
For example,
beat_schedule = {
    'my-task': {
        'task': 'module.my_task',
        'schedule': 60.0,
    },
}
will wait 60 seconds after the beat is started to run for the first time.
This is problematic for a task with a longer interval, such as an hour, whose work is immediately valuable but does not need to be refreshed at shorter intervals.
This question addresses the issue, but neither of its answers is satisfactory:
Adding startup lag for the task to be enqueued is both undesirable performance-wise and bad for maintainability, since the initial run and the schedule are now separated.
Re-implementing the schedule within the task is bad for maintainability.
This seems like something that should be straightforward, so I am quite surprised that that SO question is all I can find on the matter. I cannot figure this out from the docs or the celery GitHub issues, so I wonder if I am missing something obvious.
Edit:
There seems to be more to the story here, because when I tried a different task with an hour interval, it ran immediately when the project's celery was started.
If I stop and clear the queue with celery purge -A proj -f, then start celery again, the task does not run within the heartbeat interval. This would make sense because the worker handles the messages, but beat has its own schedule record, celerybeat-schedule, which would be unaffected by the purge.
If I delete celerybeat-schedule and restart beat, the task still does not run. Starting celery beat with a non-default schedule db location also does not cause the task to run. The next time the task runs is one hour from the time I started the new beat (14:59), not one hour from the first start time of the task (13:47).
There seems to be some state that is not documented well, or is unknown, that is the basis of this issue. My question can also be stated as: how do you force beat to clear its record of last runs?
I am also concerned that while running the worker and beat, running celery -A proj inspect scheduled gives - empty - but presumably the task had to be scheduled at some point because it gets run.
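One common workaround (not mentioned in the question, so treat it as a sketch) is to enqueue the task once from Celery's worker_ready signal, so the immediate run reuses the same task definition that beat triggers on its interval. The task name must match the 'task' entry in beat_schedule:
from celery.signals import worker_ready

@worker_ready.connect
def run_on_startup(sender, **kwargs):
    # enqueue one run as soon as the worker comes online; beat keeps
    # scheduling the task every 60 seconds as configured above
    sender.app.send_task('module.my_task')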

Related

Every time Celery is restarted, all the scheduled tasks are acknowledged first, and it takes 20 minutes if there are 100 thousand tasks

Every time we restart our Celery workers because our app is being deployed, they do not actually execute any tasks from the queue until all the tasks have been received/acknowledged first. This takes a huge amount of time given the number of tasks in the queue - close to 100K tasks and about 20 minutes of lag accordingly.
This question was already asked in 2016 here; here is what I can come up with given that, but I hope there is a better way:
We can support multiple Celery queues inside a single RabbitMQ instance and do this:
tasks go to the new queue by default
upon every deploy we move all the thousands of tasks from the new queue to the old queue (not sure how quickly this can be done)
different workers deal with the old and new queues
That way at least the new incoming tasks won't be waiting for the old ones to be acked. But imagine that among the old tasks there is one that should be executed during this period, while the old tasks are still being fed to the workers - this approach won't solve that.
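A minimal sketch of the queue split described above, assuming a RabbitMQ broker and the illustrative queue names 'new' and 'old':
from celery import Celery

app = Celery('proj', broker='amqp://localhost')  # RabbitMQ broker assumed

# new tasks are published to the 'new' queue by default; the backlog stays in 'old'
app.conf.task_default_queue = 'new'

# run one worker per queue so the old backlog never blocks fresh tasks:
#   celery -A proj worker -Q new
#   celery -A proj worker -Q old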

Celery Django: running periodic tasks after the previous one is done [django-celery-beat]

I want to use the django-celery-beat library to make some changes in my database periodically. I set the task to run every 10 minutes. Everything works fine as long as my task takes less than 10 minutes; if it lasts longer, the next task starts while the first one is still doing calculations, and it causes an error.
My task looks like this:
from celery import shared_task
from .utils.database_blockchain import BlockchainVerify

@shared_task()
def run_function():
    build_block = BlockchainVerify()  # long-running verification
    return "Database updated"
Is there a way to avoid starting the same task if the previous one isn't done?
There is definitely a way. It's locking.
There is a whole page in the Celery documentation - Ensuring a task is only executed one at a time.
Briefly: you can use a cache or even a database to hold a lock, and then every time the task starts, check whether the lock is still in use or has already been released.
Be aware that the task may fail or run longer than expected. Task failure can be handled by adding an expiration to the lock; set the lock expiration long enough to cover the case where the task is still running.
There already is a good thread on SO - link.
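A minimal sketch of the cache-lock approach described above, assuming Django's cache framework is available (modeled on the pattern from the Celery documentation page mentioned earlier):
from contextlib import contextmanager

from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60 * 15  # lock expires after 15 minutes in case the task dies mid-run

@contextmanager
def task_lock(lock_id):
    # cache.add is atomic: it succeeds only if the key does not already exist
    acquired = cache.add(lock_id, 'locked', LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        if acquired:
            cache.delete(lock_id)

@shared_task()
def run_function():
    with task_lock('run_function-lock') as acquired:
        if not acquired:
            return 'Skipped: previous run still in progress'
        # ... do the actual verification work here ...
        return 'Database updated'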

Celery's expires option doesn't work

I'm playing around with Celery, and I'm trying to do a periodic task with CELERYBEAT_SCHEDULE. Here is my configuration:
CELERY_TIMEZONE = 'Europe/Kiev'
CELERYBEAT_SCHEDULE = {
    'run-task-every-5-seconds': {
        'task': 'tasks.run_every_five_seconds',
        'schedule': timedelta(seconds=5),
        'options': {
            'expires': 10,
        }
    },
}
# the task
@app.task()
def run_every_five_seconds():
    return '5 seconds passed'
When running beat with celery -A celery_app beat, the task doesn't seem to expire. Then I read that there might be some issue with beat, so that it does not take the expires option into account.
Then I tried a task that gets called manually.
import datetime
from time import sleep

@app.task()
def print_hello():
    while True:
        print(datetime.datetime.now())
        sleep(1)
I am calling the task in this way:
print_hello.apply_async(args=[], expires=5)
The worker's console tells me that my task will expire, but it doesn't expire either; it executes indefinitely.
Received task: tasks.print_hello[05ee0175-cf3a-492b-9601-1450eaaf8ef7] expires:[2016-01-15 00:08:03.707062+02:00]
Is there something I am doing wrong?
I think you have misunderstood the expires argument.
The documentation says: "The task will not be executed after the expiration time." ref. It means the execution will not start if the expiration time has passed. If the execution has already started, the execution will run to completion.
Your configuration adds a task to the task queue every 5 seconds. If the execution does not start within 10 seconds of the task being added to the task queue, the task is discarded. However, the task is executed immediately because there is a free celery worker available.
Your code example adds a task that is discarded if the execution is not started in 5 seconds.
To get the functionality you want, you can replace 'expires': 10, with 'expires': datetime.datetime.now() + timedelta(seconds=10),. That will set expires to an absolute time.
To add to the previous answer, the purpose of the expires parameter is captured at: https://github.com/celery/celery/issues/591
Let me explain with an example. Let's say you schedule a task to be executed every 5 minutes, so celery beat adds the task to the task queue every 5 minutes. Now, if for some reason the worker is not running, it does not pick up any tasks from the queue. The queue grows over time with many repetitive tasks. As soon as the worker starts, it has a huge backlog and wastes time doing the old tasks.
Solution? expires parameter.
Each task will now have, let's say, 1 minute of expiry time. So when the worker is online again, it discards all old tasks that have expired and only works on the latest unexpired tasks. Thanks to this, the worker doesn't have to waste time on old repetitive tasks.
Best Practice
When you don't know what to set the expires time to, it's generally good to set it equal to the schedule interval.
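A sketch of that practice applied to the schedule from the question, with expires set equal to the 5-second interval (illustrative only):
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'run-task-every-5-seconds': {
        'task': 'tasks.run_every_five_seconds',
        'schedule': timedelta(seconds=5),
        'options': {
            # any run that has not started by the time the next one is queued
            # is discarded instead of piling up
            'expires': 5,
        },
    },
}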

Celery/Django: Get result of periodic task execution

I have a Django 1.7 project using Celery (latest). I have a REST API that receives some parameters, and creates, programmatically, a PeriodicTask. For testing, I'm using a period of seconds:
periodic_task, _ = PeriodicTask.objects.get_or_create(name=task_label, task=task_name, interval=interval_schedule)
I store a reference to this task somewhere. I start celery beat:
python manage.py celery beat
and a worker:
python manage.py celery worker --loglevel=info
and my task runs as I can see in the worker's output.
I've set the result backend:
CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'
and with that, I can check the task results using the TaskMeta model. The objects there contain the task_id (the same one I would get if I called the task with .delay() or .apply_async()), the status, the result, everything. Beautiful.
However, I can't find a connection between the PeriodicTask object and TaskMeta.
PeriodicTask has a task property, but it's just the task name/path. The id is just a consecutive number, not the task_id from TaskMeta, and I really need to be able to match the task that was executed as a PeriodicTask with TaskMeta so I can offer some monitoring over its status. TaskMeta doesn't have any other value that allows me to identify which task ran (since I will have several), so I could at least give the status of the last execution.
I've checked all over the Celery docs and on here, but no solution so far.
Any help is highly appreciated.
Thanks
You can run a service to monitor the tasks that have been performed by using the command line:
python manage.py celerycam --frequency=10.0
More detail at:
http://www.lexev.org/en/2014/django-celery-setup/
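A hedged sketch of how the data captured by celerycam could be queried: django-celery stores event snapshots in djcelery.models.TaskState, which (unlike TaskMeta) records the task name, so runs of a given PeriodicTask can be looked up by that name. Model and field names here are based on django-celery and should be verified against the installed version:
from djcelery.models import TaskState

def last_run_state(task_name):
    # task_name is the same dotted path stored in PeriodicTask.task
    return TaskState.objects.filter(name=task_name).order_by('-tstamp').first()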

Celery - schedule periodic tasks starting at a specific time

What is the best way to schedule a periodic task starting at specific datetime?
(I'm not using cron for this, considering I need to schedule about a hundred remote rsyncs,
where I compute the remote vs. local offset and would need to rsync each path the second the logs are generated on each host.)
By my understanding, the celery.task.schedules crontab class only allows specifying hour, minute, and day of week.
The most useful tip I've found so far was this answer by nosklo.
Is this the best solution?
Am I using the wrong tool for the job?
Celery seems like a good solution for your scheduling problem: Celery's PeriodicTasks have run time resolution in seconds.
You're using an appropriate tool here, but the crontab entry is not what you want. You want to use Python's datetime.timedelta object; the crontab scheduler in celery.schedules has only minute resolution, but using timedeltas to configure the PeriodicTask interval provides strictly more functionality, in this case per-second resolution.
e.g. from the Celery docs
>>> from celery.task import tasks, PeriodicTask
>>> from datetime import timedelta
>>> class EveryThirtySecondsTask(PeriodicTask):
...     run_every = timedelta(seconds=30)
...
...     def run(self, **kwargs):
...         logger = self.get_logger(**kwargs)
...         logger.info("Execute every 30 seconds")
http://ask.github.com/celery/reference/celery.task.base.html#celery.task.base.PeriodicTask
class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
The only challenge here is that you have to describe the frequency with which you want this task to run rather than the clock time at which you want it to run; however, I would suggest you check out the Advanced Python Scheduler: http://packages.python.org/APScheduler/
It looks like Advanced Python Scheduler could easily be used to launch normal (i.e. non-periodic) Celery tasks at any schedule of your choosing, using its own scheduling functionality.
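A hedged sketch of that idea: use APScheduler to fire a normal (non-periodic) Celery task at an exact datetime. This uses the APScheduler 3.x API; the task and run time are illustrative:
from datetime import datetime

from apscheduler.schedulers.blocking import BlockingScheduler
from proj.tasks import sync_host  # hypothetical Celery task that runs one rsync

scheduler = BlockingScheduler()
# fire the task once at an exact second, e.g. when the remote logs are expected
scheduler.add_job(sync_host.delay, 'date',
                  run_date=datetime(2024, 1, 1, 13, 45, 30),
                  args=['host-01'])
scheduler.start()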
I've recently worked on a project that involved Celery, and I had to use it for asynchronous operation as well as scheduled tasks. Suffice it to say I went back to the old crontab for the scheduled task, although it calls a Python script that spawns a separate asynchronous task. This way I have less to maintain in the crontab (making the Celery scheduler run requires further setup), while still making full use of Celery's asynchronous capabilities.
