What is the purpose of Celery's "autodiscover_tasks" function?

I'm wondering what the purpose of Celery's autodiscover_tasks function is. I am using Celery 4.1.2 with Django 2.1.4.
The Celery documentation refers to imports:
foo.tasks and bar.tasks being imported
But I can't comprehend how this works.
All the examples I found on GitHub, including this one from the official Celery repo, rely on manually importing the tasks (i.e. from demoapp.tasks import add, mul, xsum), even when the autodiscover_tasks function is called when booting the worker.
I guess this is how Python works: you can't access classes "globally" as you can in Ruby, for example.
Then once again, what is this function for? I'm no expert at Celery and maybe I am missing something. The only thing I see is the name of the discovered tasks when launching the Celery worker, is that all this function is supposed to do?
Thanks for your inputs,

When using celery with django, the autodiscover_tasks function registers all decorated tasks found in the tasks module of each INSTALLED_APPS entry.
For example, if your INSTALLED_APPS setting included app1, app2, and app3, celery would automatically register any decorated tasks that could be found by importing app1.tasks, app2.tasks, and app3.tasks.
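The mechanism can be illustrated without Celery at all. The following is a toy sketch, not Celery's implementation: the registry module `toy_registry`, the `task` decorator, the `autodiscover` helper, and the package names `toyapp_one` and `toyapp_two` are all made up for illustration. It shows that "discovery" just means importing each package's tasks module, so that the decorators run and the tasks land in a registry.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Toy registry module standing in for the Celery app (hypothetical, not Celery's API).
registry = types.ModuleType("toy_registry")
registry.TASKS = {}


def task(func):
    """Register func under its dotted name; this runs when its module is imported."""
    registry.TASKS[f"{func.__module__}.{func.__name__}"] = func
    return func


registry.task = task
sys.modules["toy_registry"] = registry


def autodiscover(packages, related_name="tasks"):
    """Import '<package>.<related_name>' for every package, skipping missing modules."""
    for package in packages:
        try:
            importlib.import_module(f"{package}.{related_name}")
        except ModuleNotFoundError:
            # Simplification: this also hides missing imports inside a tasks module.
            continue


# Build two throwaway packages on disk: only the first one has a tasks.py.
root = Path(tempfile.mkdtemp())
for name in ("toyapp_one", "toyapp_two"):
    (root / name).mkdir()
    (root / name / "__init__.py").write_text("")
(root / "toyapp_one" / "tasks.py").write_text(
    "from toy_registry import task\n\n\n@task\ndef add(x, y):\n    return x + y\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

autodiscover(["toyapp_one", "toyapp_two"])
print(sorted(registry.TASKS))
```

Code that wants to call a task still imports the function itself (or refers to it by name); discovery only makes sure the worker's registry is populated at startup, which is why the task names show up when the worker boots.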

Related

Does django-celery-beat deal with multiple instances of django processes created by web-servers

I have a fairly complex periodic task that needs to be offloaded from the django context. django-celery-beat looks promising. While I was going through the celery-beat docs, I found this:
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you’d end up with duplicate tasks. Using a centralized approach means the schedule doesn’t have to be synchronized, and the service can operate without using locks.
A typical production deployment will spawn a pool of worker-processes each running a django instance. Will that result in creation of multiple scheduler processes as well? Do I need to have some synchronisation logic?
Thanks for your time!
It does not.
You can dig into the issues page on their github repo for confirmation. I think it's weird that the documentation doesn't call this out, but I suppose you have to assume that's how all celery beats work unless they specify otherwise.
In theory, you could build your own synchronization, but it will probably be a better experience to use a different scheduler that has that functionality built in, like Heroku's redbeat: https://blog.heroku.com/redbeat-celery-beat-scheduler.

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from message queue. I am using the configuration of Redis and celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need to stop this specific task from running from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task's schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and a redeploy of the app.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not answer the question: the task will still run on schedule, there is just no code for it to execute.
Also, using @periodic_task is not recommended. Its docstring reads:
"""Deprecated decorator, please use :setting:beat_schedule."""
First, change the function from a @periodic_task to a regular celery @task. Because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task
@shared_task
def task_abcd():
    ...
Now this is just an ordinary celery task, which needs to be called explicitly, or which can be run periodically if it is added to the celery beat schedule.
For production, and when using multiple workers, it is not recommended to run a celery worker with embedded beat (-B). Run a separate instance of the celery beat scheduler instead.
The schedule can be specified in celery.py or in the django project settings (settings.py).
This is still not very dynamic, because the app needs to be reloaded for the settings to be re-read.
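A static schedule of the kind described above might look like the following sketch. It assumes that celery.py loads the Django settings with the CELERY namespace, and the entry name and dotted task path are hypothetical placeholders for your own project.

```python
# settings.py (sketch). Assumes celery.py calls
# app.config_from_object("django.conf:settings", namespace="CELERY"),
# so the beat_schedule setting is spelled CELERY_BEAT_SCHEDULE here.
CELERY_BEAT_SCHEDULE = {
    # The entry name and the dotted task path are hypothetical.
    "task-abcd-every-6-minutes": {
        "task": "myapp.tasks.task_abcd",
        # Interval in seconds; celery.schedules.crontab(minute="*/6")
        # could be used instead to align runs with the clock.
        "schedule": 6 * 60,
    },
}
```

Deleting or commenting out the entry and restarting beat stops the task from being scheduled, while the task itself remains callable.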
For a dynamic setup, use the Database Scheduler, which lets you create schedules dynamically: which tasks to run, when, and with what arguments. It even provides django admin views for managing them!
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is to use a configurable variable whose value could come from an admin panel, a configuration file, or whatever you want, and use that to return before your code runs if the task is in disabled mode.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your celery service? You can restart your Redis service and your Django server as well for good measure.
Make sure that the function you removed is not referenced anywhere else.

Difference between different ways to create celery task

I am very confused by the different ways of creating a celery task. On the surface they all work the same, so can someone explain the difference between them?
1.
from myproject.tasks import app
@app.task
def foo():
    pass
2.
from celery import task
@task
def foo():
    pass
3.
from celery import shared_task
@shared_task
def foo():
    pass
I know from a little googling that the difference between the 1st and the 3rd is that shared_task is used when you don't have a concrete app instance. Can someone elaborate on that, and on when the second one is used?
Don't use #2 unless you are using celery v3. If you are using celery v4, use #1.
Use #3 in instances where you are writing a reusable library or django app. For example, if you are writing an open source set of tasks that allow you to manage aws ec2 instances using celery, you would use shared_task so that the tasks could be run on celery, but you would leave it to the person using your library to configure celery for themselves.
Use #1 if you are writing for your own project and there is no concern for re-use.

How to test code that creates Celery tasks?

I've read Testing with Celery but I'm still a bit confused. I want to test code that generates a Celery task by running the task manually and explicitly, something like:
def test_something(self):
    do_something_that_generates_a_celery_task()
    assert_state_before_task_runs()
    run_task()
    assert_state_after_task_runs()
I don't want to entirely mock up the creation of the task but at the same time I don't care about testing the task being picked up by a Celery worker. I'm assuming Celery works.
The actual context in which I'm trying to do this is a Django application where there's some code that takes too long to run in a request, so, it's delegated to background jobs.
In test mode use CELERY_TASK_ALWAYS_EAGER = True. You can set this setting in your settings.py in django if you have followed the default guide for django-celery configuration.
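As a sketch, a test-only settings module could carry the eager flags. The file name is an assumption, and the CELERY_ prefix assumes the Celery app reads Django settings with namespace="CELERY".

```python
# settings_test.py (hypothetical file name), used only when running tests.
# With eager mode on, .delay() and .apply_async() run the task inline
# in the calling process instead of sending it to a broker.
CELERY_TASK_ALWAYS_EAGER = True

# Re-raise exceptions from eagerly executed tasks so that a failing
# task makes the test fail.
CELERY_TASK_EAGER_PROPAGATES = True
```

Because the task then runs at the moment it is enqueued, any "before" assertions have to happen before the code that creates the task.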

Getting reusable tasks to work in a setup with one celery server and 3k+ django sites, each with its own database

Here's the problem: I have one celery server and 3k+ django sites, each with its own database. New sites (and databases) can be added dynamically.
I'm writing celery tasks which need to be run for each site, through the common celery server. The code is in an app which is meant to be reusable, so it shouldn't be written in a way that ties it to this particular setup.
So. Without mangling the task code to fit my exact setup, how can I make sure that the tasks connect to the correct database when they run?
This is hard to accomplish because of an inherent limitation in Django: The settings are global. So unless all the apps shared the same settings, this is going to be a problem.
You could try spawning new worker processes for every task and create the django environment each time. Don't use django-celery, but use celery directly with something like this in
celeryconfig.py:
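from celery import signals
from importlib import import_module
def before_task(task, **kwargs):
    # Take the per-site settings module name out of the task's keyword arguments
    settings_module = task.request.kwargs.pop("settings_module", None)
    if settings_module:
        settings = import_module(settings_module)
        # setup_environ lives in django.core.management and only exists in old Django versions (removed in 1.6)
        from django.core.management import setup_environ
        setup_environ(settings)
signals.task_prerun.connect(before_task)
# Recycle each worker process after one task so settings never leak between sites
CELERYD_MAX_TASKS_PER_CHILD = 1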
