How to test code that creates Celery tasks? - python

I've read Testing with Celery but I'm still a bit confused. I want to test code that generates a Celery task by running the task manually and explicitly, something like:
def test_something(self):
    do_something_that_generates_a_celery_task()
    assert_state_before_task_runs()
    run_task()
    assert_state_after_task_runs()
I don't want to entirely mock up the creation of the task but at the same time I don't care about testing the task being picked up by a Celery worker. I'm assuming Celery works.
The actual context in which I'm trying to do this is a Django application where some code takes too long to run within a request, so it is delegated to background jobs.

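In tests, set CELERY_TASK_ALWAYS_EAGER = True: tasks then run synchronously in the calling process as soon as .delay() or .apply_async() is called. You can put this setting in your Django settings.py if you have followed the default guide for Django/Celery configuration (which reads CELERY_-prefixed settings). Setting CELERY_TASK_EAGER_PROPAGATES = True as well makes exceptions raised by the task surface in the test.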

Related

Is there a way to set the acks_late config in Celery?

For my Django project, I am using Celery with Redis, registering the tasks at runtime using the celery_app.tasks.register method. I want to retrigger a task in case of failure, so I set the acks_late config param using task_acks_late=True at the app level when instantiating Celery itself. I have also set task_reject_on_worker_lost=True. However, the tasks are never received back by Celery, no matter what. Is there any other way?
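For reference, a sketch of the settings mentioned above as app-level configuration. Points worth checking against the Celery documentation for your version: task_acks_late only delays the acknowledgement until the task has finished, and task_reject_on_worker_lost only requeues the message when the worker process dies mid-task. A task that raises an exception is still acknowledged by default, so it does not come back; retrying on exceptions is normally done per task with self.retry() or autoretry_for. With the Redis broker, an unacknowledged message is also only redelivered after the visibility timeout (1 hour by default). CELERY_SETTINGS and my_task are hypothetical names, and the dict is plain Python so it runs without Celery installed.

```python
# Sketch of app-level settings; with Celery installed they would be applied via
#   celery_app = Celery("proj"); celery_app.conf.update(CELERY_SETTINGS)
CELERY_SETTINGS = {
    # Acknowledge the message only after the task has finished executing.
    "task_acks_late": True,
    # If the worker process is killed mid-task, requeue the message
    # instead of acknowledging it.
    "task_reject_on_worker_lost": True,
    # Redis transport: unacknowledged messages are redelivered only after
    # this many seconds (3600, i.e. 1 hour, is the Redis default).
    "broker_transport_options": {"visibility_timeout": 3600},
}

# Retrying when the task raises is a per-task option, e.g. (requires Celery):
# @celery_app.task(bind=True, autoretry_for=(Exception,), max_retries=3)
# def my_task(self, *args):
#     ...
```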

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from the message queue. I am using Redis and Celery.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project too, but I need this specific task to stop running from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task's schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not answer the question: the task will still run; there is just no code inside it.
Also, it is recommended NOT to use @periodic_task, because it is deprecated:
"""Deprecated decorator, please use :setting:beat_schedule."""
First, change the function from a @periodic_task to a regular Celery @task; and because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just a regular Celery task, which needs to be called explicitly, or which can be run periodically if added to the celery beat schedule.
For production, and especially with multiple workers, it is not recommended to run the celery worker with embedded beat (-B); run a separate celery beat scheduler instance instead.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
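A minimal sketch of the settings.py option, assuming the default Django setup where Celery reads CELERY_-prefixed settings, so the beat_schedule setting is written CELERY_BEAT_SCHEDULE. The task paths and DISABLED_TASKS are hypothetical. timedelta(minutes=6) is used only so the snippet runs without Celery installed; celery.schedules.crontab(minute='*/6') keeps the original clock-aligned schedule. Stopping the task then means dropping its entry and restarting beat.

```python
from datetime import timedelta

# Hypothetical: names of tasks an operator has switched off.
DISABLED_TASKS = {"myapp.tasks.task_abcd"}

# Every periodic task the project knows about (entry name -> schedule entry).
_ALL_PERIODIC_TASKS = {
    "task-abcd-every-6-minutes": {
        "task": "myapp.tasks.task_abcd",
        # With Celery installed: crontab(minute="*/6") from celery.schedules.
        "schedule": timedelta(minutes=6),
    },
    "cleanup-hourly": {
        "task": "myapp.tasks.cleanup",  # hypothetical second task
        "schedule": timedelta(hours=1),
    },
}

# Celery beat reads beat_schedule when it starts, so changes need a restart.
CELERY_BEAT_SCHEDULE = {
    name: entry
    for name, entry in _ALL_PERIODIC_TASKS.items()
    if entry["task"] not in DISABLED_TASKS
}
```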
This is still not very dynamic, since the app needs to be restarted to re-read the settings.
Instead, use the database scheduler (django_celery_beat.schedulers:DatabaseScheduler from django-celery-beat), which allows creating schedules dynamically: which tasks need to run, when, and with what arguments. It even provides nice Django admin views for administration!
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service and your Django server as well, for good measure.
Make sure that the function you removed is not referenced anywhere else.

Perpetual tasks in Celery?

I'm building a django app where I use a camera to capture images, analyze them, store metadata and results of the analysis in a database, and finally present the data to users.
I'm considering using Celery to handle the background process of capturing images and then processing them:
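from time import sleep

from celery import Celery

app = Celery('myapp')

@app.task
def capture_and_process_images(camera):
    while True:
        image = camera.get_image()
        # Called directly, this runs inline inside this task;
        # process_image.delay(image) would enqueue it instead.
        process_image(image)
        sleep(5)  # time.sleep takes seconds, so 5 rather than 5000

@app.task
def process_image(image):
    # do some calculations
    # django orm calls
    # etc...
    ...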
The first task will run perpetually, while the second should take ~20 seconds, so there will be multiple images being processed at once.
I haven't found any examples online of using Celery in this way, so I'm not sure if this is bad practice or not.
Can/should Celery be used to handle perpetually running tasks?
Thank you.
Running perpetual tasks in Celery is done in practice. Take a look at daemonization, which runs the Celery worker permanently as a background service without user interaction, so I wouldn't say there is anything wrong with running the task permanently in your case.
Having a Celery task run indefinitely does not seem like a good idea to me.
If you are going to capture images at fixed intervals, I would suggest using a cron-like script that grabs an image every 5 seconds and launches a Celery task to process it.
Note also that it is best practice to avoid launching synchronous subtasks in Celery; see the docs for more details.
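A minimal sketch of the capture-then-enqueue split suggested above. run_capture_loop, get_image and enqueue are hypothetical names: in the real app, enqueue would be something like process_image.delay, and the loop would live in a small standalone process (started by systemd, supervisor or similar) rather than inside a Celery task. Whatever is passed to enqueue has to be serializable for Celery (a file path or bytes, for example), which is another reason not to pass the camera object itself. The iterations and sleep parameters exist only so the loop can be tested; in production it would loop forever.

```python
import time
from typing import Callable, Optional


def run_capture_loop(
    get_image: Callable[[], object],
    enqueue: Callable[[object], None],
    interval: float = 5.0,
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Capture an image every `interval` seconds and hand it off for
    asynchronous processing. Returns the number of images enqueued."""
    count = 0
    while iterations is None or count < iterations:
        image = get_image()
        # Hand off instead of processing inline, so several images can be
        # processed concurrently by workers (e.g. process_image.delay).
        enqueue(image)
        count += 1
        sleep(interval)
    return count
```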

Multiple instances of celerybeat for autoscaled django app on elasticbeanstalk

I am trying to figure out the best way to structure a Django app that uses Celery to handle async and scheduled tasks in an autoscaling AWS ElasticBeanstalk environment.
So far I have used only a single instance Elastic Beanstalk environment with Celery + Celerybeat and this worked perfectly fine. However, I want to have multiple instances running in my environment, because every now and then an instance crashes and it takes a lot of time until the instance is back up, but I can't scale my current architecture to more than one instance because Celerybeat is supposed to be running only once across all instances as otherwise every task scheduled by Celerybeat will be submitted multiple times (once for every EC2 instance in the environment).
I have read about multiple solutions, but all of them seem to have issues that don't make it work for me:
Using django cache + locking: This approach is more like a quick fix than a real solution. This can't be the solution if you have a lot of scheduled tasks and you need to add code to check the cache for every task. Also tasks are still submitted multiple times, this approach only makes sure that execution of the duplicates stops.
Using leader_only option with ebextensions: Works fine initially, but if an EC2 instance in the environment crashes or is replaced, this would lead to a situation where no Celerybeat is running at all, because the leader is only defined once at the creation of the environment.
Creating a new Django app just for async tasks in the Elastic Beanstalk worker tier: Nice, because web servers and workers can be scaled independently and the web server performance is not affected by huge async work loads performed by the workers. However, this approach does not work with Celery because the worker tier SQS daemon removes messages and posts the message bodies to predefined URLs. Additionally, I don't like the idea of having a complete additional Django app that needs to import the models from the main app and needs to be separately updated and deployed if the tasks are modified in the main app.
How do I use Celery with scheduled tasks in a distributed Elastic Beanstalk environment without task duplication? E.g. how can I make sure that exactly one Celerybeat instance is running across all instances at all times in the Elastic Beanstalk environment (even if the instance currently running Celerybeat crashes)?
Are there any other ways to achieve this? What's the best way to use Elastic Beanstalk's Worker Tier Environment with Django?
I guess you could single out celery beat into a different group.
Your auto scaling group runs multiple Django instances, but Celery is not included in the EC2 config of the scaling group.
You should have a separate instance (or set of instances) for celery beat.
In case someone experiences similar issues: I ended up switching to a different queue/task framework for Django. It is called django-q and was set up and working in less than an hour. It has all the features that I needed and also better Django integration than Celery (since djcelery is no longer active).
Django-q is super easy to use and also lighter than the huge Celery framework. I can only recommend it!

Celery's inspect unstable behaviour

I have a Celery project with a RabbitMQ backend that relies heavily on inspecting scheduled tasks. I found that the following code returns nothing most of the time (even though there are scheduled tasks):
i = app.control.inspect()
scheduled = i.scheduled()
if scheduled:
    # do something
    ...
This code also runs from inside one of the tasks, but I don't think that matters; I get the same result from an interactive Python shell (with some exceptions, see below).
At the same time, the celery -A <proj> inspect scheduled command never fails. I also noticed that the first call from an interactive Python shell never fails either; most subsequent i.scheduled() calls return nothing.
Does i.scheduled() guarantee a result only when called for the first time?
If so, why, and how can I inspect scheduled tasks from a task? Run a dedicated worker and restart it after every task? That seems like overkill for such a trivial task.
Please explain, how to use this feature the right way.
This seems to be caused by an issue inside the Celery app instance: to call methods on an Inspect object repeatedly, you have to create a new Celery app instance each time.
Here is a small snippet which can help you:
from celery import Celery

def inspect(method):
    # Create a fresh app (and connection) for every call.
    app = Celery('app', broker='amqp://')
    return getattr(app.control.inspect(), method)()

print(inspect('scheduled'))
print(inspect('active'))
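Before recreating the app on every call, it may be worth ruling out a timeout: app.control.inspect() waits only timeout seconds (1.0 by default) for worker replies, and when no worker answers in time the method returns None, which the if scheduled: check cannot tell apart from "nothing scheduled" (a worker with no scheduled tasks replies with an empty list, not None). A sketch that retries until some worker replies; inspect_with_retry is a hypothetical helper, and the usage in its docstring assumes an already configured app.

```python
import time
from typing import Callable, Optional


def inspect_with_retry(
    query: Callable[[], Optional[dict]],
    attempts: int = 3,
    delay: float = 0.5,
) -> Optional[dict]:
    """Call an inspect query until some worker replies.

    Inspect methods return None when no worker answered within the timeout,
    so None is treated as 'no reply' and the query is retried.
    Assumed usage with Celery:
        inspect_with_retry(lambda: app.control.inspect(timeout=5.0).scheduled())
    """
    for attempt in range(attempts):
        result = query()
        if result is not None:
            return result  # may be {'worker@host': []}: replied, nothing scheduled
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```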
