Is there a way to set the acks_late config in Celery? - python

For my Django project, I am using Celery with Redis, registering the tasks on runtime using the celery_app.tasks.register method. I want to retrigger a task in case of some failure, I have set the acks_late config param using task_acks_late=True on the app level while instantiating celery itself. I have also set task_reject_on_worker_lost=True. However, the tasks aren't being received back by celery no matter what. Is there any other way?

Related

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from message queue. I am using the configuration of Redis and celery here.
tasks.py
#periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
"""
some operations here
"""
There are other periodic tasks also in the project but I need to stop this specific task to stop from now on.
As explained in this answer, the following code will work?
#periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
pass
In this example periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without code change and app re-deploy.
The provided code with task logic deleted or with simple return at the beginning - will work, but will not be the answer to the question - task will still run, there just is no code that will run with it.
Also, it is recommended NOT to use #periodic_task:
"""Deprecated decorator, please use :setting:beat_schedule."""
so it is not recommended to use it.
First, change method from being #periodic_task to just regular celery #task, and because you are using Django - it is better to go straightforward for #shared_task:
from celery import shared_task
#shared_task
def task_abcd():
...
Now this is just one of celery tasks, which needs to be called explicitly. Or it can be run periodically if added to celery beat schedule.
For production and if using multiple workers it is not recommended to run celery worker with embedded beat (-B) - run separate instance of celery beat scheduler.
Schedule can specified in celery.py or in django project settings (settings.py).
It is still not very dynamic, as to re-read settings app needs to be reloaded.
Then, use Database Scheduler which will allow dynamically creating schedules - which tasks need to be run and when and with what arguments. It even provides nice django admin web views for administration!
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
#periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
config = load_config_for_task_abcd()
if not config.is_enabled:
return
# some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried to remove the function and then restart your celery service. You can restart your Redis service as well as your Django server for safe measure.
Make sure that the function you removed is not referenced anywhere else.

How to test code that creates Celery tasks?

I've read Testing with Celery but I'm still a bit confused. I want to test code that generates a Celery task by running the task manually and explicitly, something like:
def test_something(self):
do_something_that_generates_a_celery_task()
assert_state_before_task_runs()
run_task()
assert_state_after_task_runs()
I don't want to entirely mock up the creation of the task but at the same time I don't care about testing the task being picked up by a Celery worker. I'm assuming Celery works.
The actual context in which I'm trying to do this is a Django application where there's some code that takes too long to run in a request, so, it's delegated to background jobs.
In test mode use CELERY_TASK_ALWAYS_EAGER = True. You can set this setting in your settings.py in django if you have followed the default guide for django-celery configuration.

Multiple instances of celerybeat for autoscaled django app on elasticbeanstalk

I am trying to figure out the best way to structure a Django app that uses Celery to handle async and scheduled tasks in an autoscaling AWS ElasticBeanstalk environment.
So far I have used only a single instance Elastic Beanstalk environment with Celery + Celerybeat and this worked perfectly fine. However, I want to have multiple instances running in my environment, because every now and then an instance crashes and it takes a lot of time until the instance is back up, but I can't scale my current architecture to more than one instance because Celerybeat is supposed to be running only once across all instances as otherwise every task scheduled by Celerybeat will be submitted multiple times (once for every EC2 instance in the environment).
I have read about multiple solutions, but all of them seem to have issues that don't make it work for me:
Using django cache + locking: This approach is more like a quick fix than a real solution. This can't be the solution if you have a lot of scheduled tasks and you need to add code to check the cache for every task. Also tasks are still submitted multiple times, this approach only makes sure that execution of the duplicates stops.
Using leader_only option with ebextensions: Works fine initially, but if an EC2 instance in the enviroment crashes or is replaced, this would lead to a situation where no Celerybeat is running at all, because the leader is only defined once at the creation of the environment.
Creating a new Django app just for async tasks in the Elastic Beanstalk worker tier: Nice, because web servers and workers can be scaled independently and the web server performance is not affected by huge async work loads performed by the workers. However, this approach does not work with Celery because the worker tier SQS daemon removes messages and posts the message bodies to a predefined urls. Additionally, I don't like the idea of having a complete additional Django app that needs to import the models from the main app and needs to be separately updated and deployed if the tasks are modified in the main app.
How to I use Celery with scheduled tasks in a distributed Elastic Beanstalk environment without task duplication? E.g. how can I make sure that exactly one instance is running across all instances all the time in the Elastic Beanstalk environment (even if the current instance with Celerybeat crashes)?
Are there any other ways to achieve this? What's the best way to use Elastic Beanstalk's Worker Tier Environment with Django?
I guess you could single out celery beat to different group.
Your auto scaling group runs multiple django instances, but celery is not included in the ec2 config of the scaling group.
You should have different set (or just one) of instance for celery beat
In case someone experience similar issues: I ended up switching to a different Queue / Task framework for django. It is called django-q and was set up and working in less than an hour. It has all the features that I needed and also better Django integration than Celery (since djcelery is no longer active).
Django-q is super easy to use and also lighter than the huge Celery framework. I can only recommend it!

Pymongo Celery ConfiguraionError Unknown option auto_start_request

I am using decorator inside my tasks which it manages my tasks. And I am using MongoDB as celery backend.
#app.task(bind=True)
#my_customize_decorator
def some_task(self):
#Do something
return
My decorator and task, both of them have MongoDB connection. When i send some_task.delay() to worker it gives me ConfigurationError: Unknown option auto_start_request.
I think celery sends auto_start_request option to pymongo and pymongo couldn't resolve that. But i don't know how can i override that configuration.
It causes from celery backend options. Not from task or decorator. Celery mongodb backend default options are you can see here
self.options.setdefault('max_pool_size', self.max_pool_size)`
self.options.setdefault('auto_start_request', False)`
These lines are causes ConfigurationError. After i remove these lines from
path/to/dist-pack/celery/backends/mongodb.py issue has been solved.

How do I configure Celery to use separate BROKER_URLs for producing and consuming from the same broker?

We have an application that uses a Celery instance in two ways: The instance's .task attribute is used as our task decorator, and when we invoke celery workers, we pass the instance as the -A (--app) argument. This workflow uses the same Celery instance for both producing and consuming, and it has worked, but we are using the same Celery instance for both producers (the tasks) and consumers (the celery workers).
Now, we are considering using Bigwig RabbitMQ, which is an AMQP service provider, and they publish two different URLs, one optimized for message producers, the other optimized for message consumers.
What's the best way for us to modify our setup in order to take advantage of the separate broker endpoints? I'm assuming a single Celery instance can only use a single broker URL (via the BROKER_URL setting). Should we use two distinct Celery instances configured identically except for the BROKER_URL setting?
This feature will be available in Celery 4.0: http://docs.celeryproject.org/en/master/whatsnew-4.0.html#configure-broker-url-for-read-write-separately
Yes you are right one celery instance can use only one broker URL. As you said the only way is to use 2 workers with just different BROKER_URL one for consuming and one for producing.
Technically is trivial, you can take advantage of this (http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.config_from_object) but off course you will have two workers running but I don't think that this introduces any problem.
There is also another option explained here , but I would avoid it.

Categories

Resources