Celery's expires option doesn't work - python

I'm playing around with Celery, and I'm trying to set up a periodic task with CELERYBEAT_SCHEDULE. Here is my configuration:
from datetime import timedelta

CELERY_TIMEZONE = 'Europe/Kiev'

CELERYBEAT_SCHEDULE = {
    'run-task-every-5-seconds': {
        'task': 'tasks.run_every_five_seconds',
        'schedule': timedelta(seconds=5),
        'options': {
            'expires': 10,
        }
    },
}
# the task
@app.task()
def run_every_five_seconds():
    return '5 seconds passed'
When running beat with celery -A celery_app beat, the task doesn't seem to expire. Then I read that there might be an issue with beat, so that it does not take the expires option into account.
Then I tried a task that I call manually:
import datetime
from time import sleep

@app.task()
def print_hello():
    while True:
        print(datetime.datetime.now())
        sleep(1)
I am calling the task in this way:
print_hello.apply_async(args=[], expires=5)
The worker's console tells me that the task will expire, but it doesn't expire either; it keeps executing indefinitely.
Received task: tasks.print_hello[05ee0175-cf3a-492b-9601-1450eaaf8ef7] expires:[2016-01-15 00:08:03.707062+02:00]
Is there something I am doing wrong?

I think you have misunderstood the expires argument.
The documentation says: "The task will not be executed after the expiration time." (ref) It means the execution will not start if the expiration time has passed; if the execution has already started, it will run to completion.
Your configuration adds a task to the task queue every 5 seconds. If the execution does not start within 10 seconds of the task being added to the queue, the task is discarded. In your case, however, the task is executed immediately because there is a free Celery worker available.
Your code example adds a task that is discarded if its execution has not started within 5 seconds.
To get the behaviour you want, you can replace 'expires': 10 with 'expires': datetime.datetime.now() + timedelta(seconds=10). That will set expires to an absolute time.
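To illustrate the difference, here is a minimal sketch of the two forms apply_async accepts (seconds-from-now versus an absolute datetime); the task name is the one from the question:

import datetime
from datetime import timedelta

# Relative: discard the run if it has not started within 10 seconds
# of being enqueued.
run_every_five_seconds.apply_async(expires=10)

# Absolute: discard the run if it has not started by this wall-clock time.
run_every_five_seconds.apply_async(
    expires=datetime.datetime.now() + timedelta(seconds=10)
)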

To add to the previous answer, the purpose of the expires parameter is captured at: https://github.com/celery/celery/issues/591
Let me explain with an example.
Say you schedule a task to be executed every 5 minutes, so Celery beat adds the task to the task queue every 5 minutes. Now, if the worker is down for some reason, it does not pick any tasks from the queue, and the queue grows over time with many repetitive tasks. As soon as the worker starts again, it has a huge backlog and wastes time working through the old tasks.
Solution? The expires parameter.
Say each task now has an expiry time of 1 minute. When the worker comes back online, it discards all the old tasks that have expired and only works on the latest unexpired task. Thanks to this, the worker doesn't waste time on old repetitive tasks.
Best Practice
When you don't know what to set the expires time to, it's generally good to set it equal to the schedule interval.
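Applied to the schedule from the question, a sketch of that practice looks like this (the expiry matches the 5-second interval, so a run that misses its slot is simply replaced by the next one):

from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'run-task-every-5-seconds': {
        'task': 'tasks.run_every_five_seconds',
        'schedule': timedelta(seconds=5),
        'options': {
            # Expire after one full interval: if this run has not started
            # by the time the next one is enqueued, drop it.
            'expires': 5,
        }
    },
}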

Related

Celery Django running periodic tasks after the previous one is done [django-celery-beat]

I want to use the django-celery-beat library to make some changes in my database periodically. I set the task to run every 10 minutes. Everything works fine as long as my task takes less than 10 minutes; if it lasts longer, the next task starts while the first one is still doing its calculations, which causes an error.
My task looks like this:
from celery import shared_task
from .utils.database_blockchain import BlockchainVerify

@shared_task()
def run_function():
    build_block = BlockchainVerify()
    return "Database updated"
Is there a way to avoid starting the task if the previous run hasn't finished?
There is definitely a way. It's locking.
There is a whole page about it in the Celery documentation - Ensuring a task is only executed one at a time.
Shortly explained: you can use a cache or even the database to store a lock, and every time the task starts, check whether that lock is still held or has already been released.
Be aware that the task may fail or run longer than expected. Task failure can be handled by adding an expiration to the lock; set the lock expiration long enough that it outlives a task that is still legitimately running.
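A minimal sketch of that idea using Django's cache backend (the lock id, the timeout, and the task_lock helper are assumptions, not part of the question's code):

from contextlib import contextmanager

from celery import shared_task
from django.core.cache import cache

from .utils.database_blockchain import BlockchainVerify

LOCK_EXPIRE = 60 * 15  # longer than the longest expected run

@contextmanager
def task_lock(lock_id):
    # cache.add is atomic: it only sets the key if it does not already exist.
    acquired = cache.add(lock_id, 'locked', LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        if acquired:
            cache.delete(lock_id)

@shared_task()
def run_function():
    with task_lock('run_function-lock') as acquired:
        if not acquired:
            return "Previous run still in progress, skipping"
        build_block = BlockchainVerify()
        return "Database updated"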
There already is a good thread on SO - link.

Celery apply_async gets called multiple times

I have created a task:

@app.task(bind=True, max_retries=1)
def notify_feedback(self, req_id):
    # some things
    ...
I call this task from my view with a delay of 1 hour, like this:
later = datetime.datetime.utcnow() + datetime.timedelta(hours=1)
notify_feedback.apply_async((req_id,), eta=later)
When I checked the SQS Messages in Flight, it showed 1 message pending.
After one hour, notify_feedback gets called multiple times. Has anyone encountered this kind of issue with Celery?
Celery 4.1.0 is used.
I faced this issue as well, but with tasks delayed by more than 1 hour.
Setting this in settings.py solved my issue:
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 86400}
The visibility timeout defines the number of seconds the broker waits for the worker to acknowledge the task before the message is redelivered to another worker.
More details are in the Celery documentation on broker transport options.
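The rule of thumb that follows: the visibility timeout must exceed the longest ETA or countdown you schedule, otherwise the broker redelivers the message before the worker ever acknowledges it. A sketch (the 1-hour ETA comes from the question; the margin is an assumption):

# Longest ETA used in the app is 1 hour; choose a visibility timeout
# comfortably above it so a waiting message is never redelivered.
MAX_ETA_SECONDS = 60 * 60
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': MAX_ETA_SECONDS * 24}  # 86400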

Celery beat schedule: run on load, then on interval

I am trying to figure out how to configure a periodic task in Celery so that it is scheduled to run on startup, regardless of the interval.
For example,
beat_schedule = {
    'my-task': {
        'task': 'module.my_task',
        'schedule': 60.0,
    },
}
will wait 60 seconds after beat is started before running for the first time.
This is problematic for a longer interval, such as an hour, where the work is immediately valuable but does not need to be "fresh" at shorter intervals.
This question addresses the issue, but neither of its answers is satisfactory:
Adding a startup delay before the task is enqueued is both undesirable performance-wise and bad for maintainability, since the initial run and the schedule are now separated.
Re-implementing the schedule inside the task is bad for maintainability.
This seems to me like something that should be obvious, so I am quite surprised that that SO question is all I can find on the matter. I could not figure this out from the docs or the Celery GitHub issues, so I wonder if I am missing something obvious.
Edit:
There seems to be more to the story here, because when I tried a different task with an hour interval, it ran immediately when the project's Celery was started.
If I stop and clear the queue with celery purge -A proj -f and then start Celery again, the task does not run within the heartbeat interval. This would make sense, because the worker handles the messages but beat has its own schedule record, celerybeat-schedule, which would be unaffected by the purge.
If I delete celerybeat-schedule and restart beat, the task still does not run. Starting celery beat with a non-default schedule db location also does not cause the task to run. The next time the task ran was one hour from the time I started the new beat (14:59), not one hour from the first start time of the task (13:47).
There seems to be some state behind this issue that is not well documented. My question can also be stated as: how do you force beat to clear its record of last runs?
I am also concerned that, while running the worker and beat, celery -A proj inspect scheduled gives - empty -, but presumably the task had to be scheduled at some point because it does get run.

How to set a time limit on a task in Celery so that it is removed if it does not execute within a certain time

I'm using Celery (with RabbitMQ as the broker) to run many tasks every minute. Since each task is a little time-consuming, the tasks in the queue may accumulate, which means the newest tasks are not executed in time. How can I deal with this?
I think I found it in the documentation:
Expiration
add.apply_async(args=[10, 10], expires=60)
Task Expires
@task(expires=50)
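Putting those two together, a minimal runnable sketch (the add task and the numbers come from the docs' example; the broker URL is an assumption, and the decorator form is taken to set a per-task default that apply_async can override, as the quoted docs suggest):

from celery import Celery

app = Celery('tasks', broker='amqp://localhost')  # assumption: local RabbitMQ

# Default expiry on the task itself: runs that have not started within
# 50 seconds of being enqueued are discarded by the worker.
@app.task(expires=50)
def add(x, y):
    return x + y

# A per-call expiry overrides the task default.
add.apply_async(args=[10, 10], expires=60)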

Same task executed multiple times

I have ETA tasks that get sent to a Redis broker for Celery. It is a single Celery and Redis instance, both on the same machine.
The problem is, tasks are getting executed multiple times. I've seen tasks executed 4 to 11 times.
I set the visibility timeout to 12 hours, given that my ETAs are between 4 and 11 hours (determined at runtime):
BROKER_TRANSPORT_OPTIONS = {'visibility_timeout': 12 * 60 * 60}
Even with that, tasks still get executed multiple times.
Initially, the task in question was not idempotent, so I tried adding a DB check to make it idempotent.
It looks something like this:
@app.task
def foo(side_effect_action):
    if side_effect_action.executed:
        return ALREADY_EXECUTED
    else:
        do_side_effect()
        side_effect_action.executed = True
        side_effect_action.save()  # hits the db
        return JUST_EXECUTED
It turns out that the Celery worker gets to the task again before foo is able to call side_effect_action.save() and persist the state, so whenever it checks side_effect_action.executed the flag is still False, and the task executes multiple times.
Any ideas how I can solve this issue?
I switched my Celery broker to RabbitMQ to avoid this issue. It is unfortunate, since I now have one more component in my webapp (I still need Redis for something else), but it did solve the multiple-execution bug for ETA tasks.
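For completeness, the race described in the question can also be narrowed without switching brokers by making the check-and-set atomic. A sketch assuming a Django model behind the question's pseudocode (the SideEffectAction model, its manager, and passing an id instead of an object are all assumptions):

@app.task
def foo(side_effect_action_id):
    # Atomically claim the row: update() returns the number of rows changed,
    # so only one concurrent execution can win the race.
    claimed = SideEffectAction.objects.filter(
        pk=side_effect_action_id, executed=False
    ).update(executed=True)
    if not claimed:
        return ALREADY_EXECUTED
    do_side_effect()
    return JUST_EXECUTED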
