Python Celery - Switch TaskRegistry implementation

I'm wondering how I can change the Celery TaskRegistry implementation so that I can switch it with my own implementation. I wish to inject dependencies into tasks when they are created (e.g. when running the celery worker process).
I have tried:
from celery import Celery
Celery.registry_cls = MyCustomTaskRegistry
at the top, before creating my app instance, but it does not seem to be picked up.
Any help in the right direction is greatly appreciated.

You can define your own registry like this:
from celery import Celery

a = Celery()
print(a.registry_cls)  # gives the original registry

from celery.app.registry import TaskRegistry

class MyCustomRegistry(TaskRegistry):
    pass

a.registry_cls = MyCustomRegistry
print(a.registry_cls)  # gives your custom registry

Found the solution myself:
app._tasks = MyCustomTaskRegistry()
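To get back to the original goal (injecting dependencies when tasks are registered), a rough sketch building on that solution could look like the following. This is only an illustration, not an official API: get_database_connection() and the db attribute are placeholders for whatever dependency you want to inject.
from celery import Celery
from celery.app.registry import TaskRegistry

def get_database_connection():
    # Placeholder for your real dependency factory.
    return object()

class MyCustomTaskRegistry(TaskRegistry):
    def register(self, task):
        # Register as usual, then attach the dependency to the stored task instance.
        super().register(task)
        self[task.name].db = get_database_connection()

app = Celery('proj')
app._tasks = MyCustomTaskRegistry()  # as in the answer above; do this before defining your tasks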

Related

Change concurrency/manage queue in running celery app

I am using Celery 5 to manage some external tasks within a FastAPI app.
I started Celery with one worker and 8 concurrent jobs:
celery worker --app=app.worker.celery --concurrency=8 --loglevel=info --logfile=logs/celery.log
I want to be able to change the concurrency from the FastAPI app. I don't know if this is the best way of doing this, or even if it is possible.
I haven't found a way to change the concurrency, so I tried to add new workers using:
from celery import current_app as app
cmd = ["--app=app.worker.celery", "--concurrency=8", "--loglevel=info", "--logfile=logs/celery.log", "--without-gossip" , "--detach", "-E"]
app.worker_main(cmd)
but this doesn't work; even when passing --detach, it blocks the request.
Is there another/better way of doing this?
EDIT:
After looking at how Flower 1.0.1 does this, I was able to track down the right API.
Solved:
from celery import current_app as app

response = app.control.pool_grow(
    n=4, reply=True, destination=[worker_name])
You can write your own autoscaler by subclassing celery.worker.autoscale.Autoscaler and pointing the worker_autoscaler setting at it. Your autoscaler could, for example, monitor a database record and, when that changes, scale the pool up or down.
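A rough sketch of that idea, assuming the Autoscaler API of recent Celery versions (maybe_scale, scale_up, scale_down, processes); read_desired_pool_size() is a placeholder for your database lookup:
from celery.worker.autoscale import Autoscaler

def read_desired_pool_size():
    # Placeholder: read the target concurrency from a database record, cache key, etc.
    return 8

class DatabaseAutoscaler(Autoscaler):
    def maybe_scale(self, req=None):
        target = read_desired_pool_size()
        current = self.processes
        if target > current:
            self.scale_up(target - current)
        elif target < current:
            self.scale_down(current - target)

Point the worker at it via the worker_autoscaler setting (e.g. worker_autoscaler = 'myproject.autoscale:DatabaseAutoscaler') and start the worker with --autoscale=<max>,<min> so the autoscaler component is enabled.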
Flower allows you to grow and shrink the pool (see the Flower implementation):
from celery import current_app as app

n = 10  # grow the pool by 10 processes
worker_name = "my_worker"
response = app.control.pool_grow(
    n=n, reply=True, destination=[worker_name])
The only problem is that there is no built-in option to read back the pool size after you change it.
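A possible workaround is to ask the worker for its stats after growing the pool; the exact keys in the reply depend on the pool implementation and Celery version, so treat this as a sketch:
from celery import current_app as app

stats = app.control.inspect(destination=[worker_name]).stats() or {}
pool_info = stats.get(worker_name, {}).get("pool", {})
print(pool_info.get("max-concurrency"), pool_info.get("processes"))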

Django celery redis remove a specific periodic task from queue

There is a specific periodic task that needs to be removed from the message queue. I am using a Redis and Celery configuration here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need this specific task to stop from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will work, but it does not really answer the question: the task will still run on schedule, there just is no code left for it to execute.
Also, it is recommended NOT to use @periodic_task, whose docstring says:
"""Deprecated decorator, please use :setting:`beat_schedule`."""
First, change the method from a @periodic_task to a regular Celery @task, and because you are using Django it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just an ordinary Celery task that needs to be called explicitly, or it can be run periodically if added to the Celery beat schedule.
For production, and when using multiple workers, it is not recommended to run the Celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler.
The schedule can be specified in celery.py or in the Django project settings (settings.py).
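For example, if the Celery app loads Django settings with the CELERY namespace (app.config_from_object('django.conf:settings', namespace='CELERY')), the schedule could be declared like this; the dotted task path is illustrative:
# settings.py
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    'run-task-abcd': {
        'task': 'myapp.tasks.task_abcd',  # illustrative dotted path to the task
        'schedule': crontab(minute='*/6'),
    },
}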
It is still not very dynamic, because the app needs to be reloaded for changed settings to be re-read.
Then, use the Database Scheduler (from django-celery-beat), which allows creating schedules dynamically: which tasks need to run, when, and with what arguments. It even provides nice Django admin views for administration!
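For instance, with django-celery-beat installed and beat started with --scheduler django_celery_beat.schedulers:DatabaseScheduler, the schedule lives in the database and the task can be switched off without a deploy (a sketch; the entry name and task path are illustrative):
from django_celery_beat.models import CrontabSchedule, PeriodicTask

schedule, _ = CrontabSchedule.objects.get_or_create(minute='*/6')
task, _ = PeriodicTask.objects.get_or_create(
    name='run task_abcd every 6 minutes',
    defaults={'crontab': schedule, 'task': 'myapp.tasks.task_abcd'},
)

# Later, disable it from code or from the Django admin:
task.enabled = False
task.save()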
That code will work, but I'd go for something that doesn't force you to update your code every time you need to disable or enable the task.
What you could do is use a configurable flag whose value could come from an admin panel, a configuration file, or whatever you want, and return before your code runs if the task is disabled.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your Celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.

Difference between different ways to create celery task

I am very confused by the different ways of creating a Celery task. On the surface they all work the same, so can someone explain what the difference between these is?
1.
from myproject.tasks import app

@app.task
def foo():
    pass
2.
from celery import task

@task
def foo():
    pass
3.
from celery import shared_task

@shared_task
def foo():
    pass
I know from a little bit of googling that the difference between the 1st and 3rd one is that shared_task is used when you don't have a concrete app instance. Can someone elaborate more on that, and explain when the second one is used?
Don't use #2 unless you are using Celery v3. If you are using Celery v4, use #1.
Use #3 in instances where you are writing a reusable library or Django app. For example, if you are writing an open-source set of tasks that lets you manage AWS EC2 instances using Celery, you would use shared_task so that the tasks could be run on Celery, but you would leave it to the person using your library to configure Celery for themselves.
Use #1 if you are writing for your own project and there is no concern for re-use.
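To make #3 concrete, a reusable app's tasks.py might look like this (module and task names are illustrative):
# ec2_tasks/tasks.py in a reusable library / Django app
from celery import shared_task

@shared_task
def stop_instance(instance_id):
    # No concrete app is imported here; the task binds to whichever
    # Celery app the host project configures.
    ...
The host project then creates its own Celery app and calls app.autodiscover_tasks(); the library itself never needs to know about that app instance.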

Django Celery delay() always pushing to default 'celery' queue

I'm ripping my hair out with this one.
The crux of my issue is that using the Django CELERY_DEFAULT_QUEUE setting in my settings.py is not forcing my tasks to go to the particular queue that I've set up. They always go to the default celery queue in my broker.
However, if I specify queue='proj:dev' in the shared_task decorator, it goes to the correct queue and behaves as expected.
My setup is as follows:
Django code on my localhost (for testing and such), executing task .delay() calls via Django's shell (manage.py shell)
a remote Redis instance configured as my broker
2 Celery workers configured on a remote machine, set up and waiting for messages from Redis (on Google App Engine, though that's perhaps irrelevant)
NB: For the pieces of code below, I've obscured the project name and used proj as a placeholder.
celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery, shared_task

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

app = Celery('proj')
app.config_from_object('django.conf:settings', namespace='CELERY', force=True)
app.autodiscover_tasks()

@shared_task
def add(x, y):
    return x + y
settings.py
...
CELERY_RESULT_BACKEND = 'django-db'
CELERY_BROKER_URL = 'redis://:{}@{}:6379/0'.format(
    os.environ.get('REDIS_PASSWORD'),
    os.environ.get('REDIS_HOST', 'alice-redis-vm'))
CELERY_DEFAULT_QUEUE = os.environ.get('CELERY_DEFAULT_QUEUE', 'proj:dev')
The idea is that, for right now, I'd like to have different queues for the different environments that my code exists in: dev, staging, prod. Thus, on Google App Engine, I define an environment variable that is passed based on the individual App Engine service.
Steps
So, with the above configuration, I fire up the shell using ./manage.py shell and run add.delay(2, 2). I get an AsyncResult back, but the Redis monitor clearly shows that a message was sent to the default celery queue:
1497566026.117419 [0 155.93.144.189:58887] "LPUSH" "celery"
...
What am I missing?
Not to throw a spanner in the works, but I feel like there was a point today at which this was actually working. But for the life of me, I can't think what part of my brain is failing me here.
Stack versions:
python: 3.5.2
celery: 4.0.2
redis: 2.10.5
django: 1.10.4
This issue is far simpler than I thought: incorrect documentation!
The Celery documentation asks us to use CELERY_DEFAULT_QUEUE to set the task_default_queue configuration on the celery object.
Ref: http://docs.celeryproject.org/en/latest/userguide/configuration.html#new-lowercase-settings
We should currently use CELERY_TASK_DEFAULT_QUEUE. This naming is inconsistent with all the other settings' names. It was raised on GitHub here: https://github.com/celery/celery/issues/3772
Solution summary
Using CELERY_DEFAULT_QUEUE in a configuration module (using config_from_object) has no effect on the queue.
Use CELERY_TASK_DEFAULT_QUEUE instead.
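With the settings shown in the question, the fix is just the renamed key:
# settings.py
CELERY_TASK_DEFAULT_QUEUE = os.environ.get('CELERY_DEFAULT_QUEUE', 'proj:dev')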
If you are here because you're trying to implement a predefined queue using SQS in Celery and find that Celery creates a new queue called "celery" in SQS regardless of what you say, you've reached the end of your journey, friend.
Before passing broker_transport_options to Celery, change your default queue and/or specify the queues you will use explicitly. In my case, I need just the one queue, so doing the following worked:
celery.conf.task_default_queue = "<YOUR_PREDEFINED_QUEUE_NAME_IN_SQS>"
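A slightly fuller sketch of that ordering; the queue name and URL are placeholders, and predefined_queues is the SQS transport option understood by kombu:
# Set the default queue and list every queue you use under predefined_queues,
# so Celery never tries to create its own "celery" queue in SQS.
celery.conf.task_default_queue = "my-predefined-queue"
celery.conf.broker_transport_options = {
    "predefined_queues": {
        "my-predefined-queue": {
            "url": "https://sqs.us-east-1.amazonaws.com/123456789012/my-predefined-queue",
        },
    },
}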

Getting reusable tasks to work in a setup with one celery server and 3k+ django sites, each with its own database

Here's the problem: I have one celery server and 3k+ django sites, each with its own database. New sites (and databases) can be added dynamically.
I'm writing celery tasks which need to be run for each site, through the common celery server. The code is in an app which is meant to be reusable, so it shouldn't be written in a way that ties it to this particular setup.
So. Without mangling the task code to fit my exact setup, how can I make sure that the tasks connect to the correct database when they run?
This is hard to accomplish because of an inherent limitation in Django: the settings are global. So unless all the apps share the same settings, this is going to be a problem.
You could try spawning new worker processes for every task and creating the Django environment each time. Don't use django-celery; use Celery directly, with something like this in celeryconfig.py:
from celery import signals
from importlib import import_module

def before_task(task, **kwargs):
    settings_module = task.request.kwargs.pop("settings_module", None)
    if settings_module:
        settings = import_module(settings_module)
        from django.core.management import setup_environ
        setup_environ(settings)

signals.task_prerun.connect(before_task)

CELERYD_MAX_TASKS_PER_CHILD = 1
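The call site would then name the settings module of the site each task should run against, e.g. (task name and kwargs are illustrative):
# Hypothetical enqueue for one of the 3k+ sites
run_site_report.apply_async(
    kwargs={"settings_module": "site_0042.settings", "report_id": 7},
)
With CELERYD_MAX_TASKS_PER_CHILD = 1, each task gets a fresh worker process, so the Django settings configured by the task_prerun handler never leak into the next task.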
