Django/Celery multiple queues on localhost - routing not working - python

I followed celery docs to define 2 queues on my dev machine.
My celery settings:
CELERY_ALWAYS_EAGER = True
CELERY_TASK_RESULT_EXPIRES = 60 # 1 mins
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_CREATE_MISSING_QUEUES = True
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='arena.social.tasks.#'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'fs_feeds',
},
}
I opened two terminal windows, in the virtualenv of my project, and ran the following commands:
terminal_1$ celery -A arena worker -Q default -B -l debug --purge -n deafult_worker
terminal_2$ celery -A arena worker -Q feeds -B -l debug --purge -n feeds_worker
The result is that both workers process all tasks.
My goal is to have one queue process only the one task defined in CELERY_ROUTES, and the default queue process all other tasks.
I also followed this SO question: rabbitmqctl list_queues returns celery 0, and rabbitmqctl list_bindings returns exchange celery queue celery [] twice. Restarting the RabbitMQ server didn't change anything.

OK, so I figured it out. Below is my whole setup, the settings and how to run Celery, for anyone wondering about the same thing.
Settings
CELERY_TIMEZONE = TIME_ZONE
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
# celery queues setup
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'feeds',
'routing_key': 'long_tasks',
},
}
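To see why the routing_key on each Queue matters when an exchange is of type topic, here is a minimal plain-Python sketch of topic matching as AMQP describes it ('*' matches exactly one dot-separated word, '#' matches zero or more). The name topic_matches is made up for illustration; it is not part of Celery or kombu, and the broker does the real matching.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if an AMQP-style topic binding pattern matches a routing key.

    '*' matches exactly one dot-separated word, '#' matches zero or more words.
    Illustrative model only; the broker performs the real matching.
    """
    def match(p, k):
        if not p:
            # Pattern exhausted: match only if the key is exhausted too.
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


if __name__ == "__main__":
    # Binding used for the 'feeds' queue in the settings above.
    print(topic_matches("long_tasks", "long_tasks"))
    # Pattern from the question, tested against two routing keys.
    print(topic_matches("arena.social.tasks.#", "arena.social.tasks.Update"))
    print(topic_matches("arena.social.tasks.#", "default"))
```

The pattern from the question matches routing keys that start with arena.social.tasks, but a message only carries such a key if the route or the caller sets it; the task name by itself is not the routing key.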
How to run celery?
terminal - tab 1:
celery -A proj worker -Q default -l debug -n default_worker
This starts the first worker, which consumes tasks from the default queue. NOTE: -n default_worker is not required for the first worker, but it is required if you have any other Celery instances up and running. Setting -n worker_name is the same as --hostname=default@%h.
terminal - tab 2:
celery -A proj worker -Q feeds -l debug -n feeds_worker
This starts the second worker, which consumes tasks from the feeds queue. Notice -n feeds_worker; if you are running with -l debug (log level = debug), you will see that both workers are syncing between them.
terminal - tab 3:
celery -A proj beat -l debug
This starts beat, which executes tasks according to the schedule in your CELERYBEAT_SCHEDULE.
I didn't have to change the task or the CELERYBEAT_SCHEDULE.
For example, this is how my CELERYBEAT_SCHEDULE looks for the task that should go to the feeds queue:
CELERYBEAT_SCHEDULE = {
...
'update_feeds': {
'task': 'arena.social.tasks.Update',
'schedule': crontab(minute='*/6'),
},
...
}
As you can see, there is no need to add 'options': {'routing_key': 'long_tasks'} or to specify which queue it should go to. Also, if you were wondering why Update is upper-cased, it's because it's a custom task, and custom tasks are defined as subclasses of celery.Task.
Update Celery 5.0+
Celery has made a couple of changes since version 5; here is an updated setup for routing tasks.
How to create the queues?
Celery can create the queues automatically. It works perfectly for simple cases, where celery default values for routing are ok.
task_create_missing_queues=True or, if you're using django settings and you're namespacing all celery configs under CELERY_ key, CELERY_TASK_CREATE_MISSING_QUEUES=True. Note, that it is on by default.
Automatic scheduled task routing
After configuring celery app:
celery_app.conf.beat_schedule = {
"some_scheduled_task": {
"task": "module.path.some_task",
"schedule": crontab(minute="*/10"),
"options": {"queue": "queue1"}
}
}
Automatic task routing
Celery app still has to be configured first and then:
app.conf.task_routes = {
"module.path.task2": {"queue": "queue2"},
}
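As a rough mental model of what a task_routes mapping does, the sketch below resolves a task name to a queue: exact name first, then glob patterns, then the default queue ('celery' unless configured otherwise). resolve_queue is a hypothetical helper written for this illustration, not Celery's router, and it ignores regex keys and router functions.

```python
from fnmatch import fnmatchcase


def resolve_queue(task_name, task_routes, default_queue="celery"):
    """Pick a queue for task_name from a task_routes-style dict.

    Simplified model: exact match first, then glob patterns in dict order,
    otherwise the default queue. Not Celery's actual implementation.
    """
    route = task_routes.get(task_name)
    if route is None:
        for pattern, candidate in task_routes.items():
            if fnmatchcase(task_name, pattern):
                route = candidate
                break
    return (route or {}).get("queue", default_queue)


if __name__ == "__main__":
    routes = {
        "module.path.task2": {"queue": "queue2"},
        "feeds.tasks.*": {"queue": "feeds"},  # hypothetical glob entry
    }
    for name in ("module.path.task2", "feeds.tasks.refresh", "module.path.other"):
        print(name, "->", resolve_queue(name, routes))
```

Anything that matches no entry falls through to the default queue, so a worker started with -Q queue2 alone would never see those tasks.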
Manual routing of tasks
If you want to route tasks dynamically, specify the queue when sending the task:
from module import task
def do_work():
    # do some work and launch the task
    task.apply_async(args=(arg1, arg2), queue="queue3")
More details re routing can be found here:
https://docs.celeryproject.org/en/stable/userguide/routing.html
And regarding calling tasks here:
https://docs.celeryproject.org/en/stable/userguide/calling.html

In addition to accepted answer, if anyone comes here and still wonders why his settings aren't working (as I did just moments ago), here's why: celery documentation isn't listing settings names properly.
For Celery 5.0.5, the settings CELERY_DEFAULT_QUEUE, CELERY_QUEUES and CELERY_ROUTES should be named CELERY_TASK_DEFAULT_QUEUE, CELERY_TASK_QUEUES and CELERY_TASK_ROUTES instead. These are the settings that I've tested, but my guess is that the same rule applies to the exchange and routing key as well.
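For reference, a small sketch of the renaming for the routing-related settings used in this thread. The mapping follows the old-to-new names in Celery's configuration docs; the helper django_setting_name is made up here and assumes the Django config is loaded with namespace 'CELERY'.

```python
# Old (Celery 3.x style) name -> new lowercase name.
RENAMED = {
    "CELERY_DEFAULT_QUEUE": "task_default_queue",
    "CELERY_DEFAULT_EXCHANGE": "task_default_exchange",
    "CELERY_DEFAULT_EXCHANGE_TYPE": "task_default_exchange_type",
    "CELERY_DEFAULT_ROUTING_KEY": "task_default_routing_key",
    "CELERY_QUEUES": "task_queues",
    "CELERY_ROUTES": "task_routes",
    "CELERYBEAT_SCHEDULE": "beat_schedule",
}


def django_setting_name(old_name, namespace="CELERY"):
    """Return the namespaced Django settings key for an old setting name."""
    return f"{namespace}_{RENAMED[old_name].upper()}"


if __name__ == "__main__":
    for old in RENAMED:
        print(f"{old:32} -> {django_setting_name(old)}")
```

Only the names listed in the dict are covered; any other old setting should be looked up in the configuration docs rather than guessed.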

Related

Celery Beat: Synchronizing schedule is the last debug message I see

I am running superset and celery on AWS ECS. Celery worker, Celery beat and Superset are running in separate containers of the same task. I have turned on debug logs in celery so that I can see each step celery is taking. Celery is starting up and running. Celery worker goes until the log message DEBUG/MainProcess] | Consumer: Starting Connection, celery beat goes until the first time it wakes up, then it displays the log message DEBUG/MainProcess] beat: Synchronizing schedule... and doesn't wake up again.
The command I am using to start the celery worker is:
celery --app=superset.tasks.celery_app:app worker -E --pool=gevent -c 500 -l DEBUG
The command I am using to start the celery beat is:
celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule -l DEBUG
From my superset_config.py, the relevant lines of code are:
class CeleryConfig:
    broker_url = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    imports = (
        "superset.sql_lab",
        "superset.tasks",
        "superset.tasks.thumbnails",
    )
    result_backend = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    worker_log_level = "DEBUG"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 600,
            "soft_time_limit": 600,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "alerts.schedule_check": {
            "task": "alerts.schedule_check",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }
CELERY_CONFIG = CeleryConfig
WEBDRIVER_BASEURL = "http://0.0.0.0:8088"
Things I have tried:
Configuring a postgres database for Celery
putting celery into the same container as superset
Changing the db number in the superset config
Tried variations on WEBDRIVER_BASEURL (localhost:8088, www.actual-url.com)
Changing the dependencies in the ECS task definition
reconfiguring security groups for redis
Various commands for starting celery worker and beat
Ensuring the security group for the container allows ingress on port 6379
Setting up Flower to run on the ECS instance; Flower shows no workers, tasks or monitors
Things I know:
Celery is connecting to Redis. (At one point it wasn't, and that threw very specific errors.)
Celery is reading the schedule, I can see in the logs where it is displaying the three things scheduled for the beat_schedule.
Schedule reports are not firing; there are no logs at the time of the report, nor is there evidence of a report being generated.
I get an error cron_descriptor.GetText:Failed to find locale en_US, when I access the reports page of Superset (although, TBH, I feel like this is unrelated).
So it turns out I was using MemoryDb instead of Elasticache, and the two things are not interchangeable.

Celery not consuming tasks when queue name is specified [duplicate]

By default Celery sends all tasks to the 'celery' queue, but you can change this behavior by adding an extra parameter:
@task(queue='celery_periodic')
def recalc_last_hour():
    log.debug('sending new task')
    recalc_hour.delay(datetime(2013, 1, 1, 2))  # for example
Scheduler settings:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15 # every 15 sec for test
},
}
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
Run worker:
python manage.py celery worker -c 1 -Q celery_periodic -B -E
This scheme doesn't work as expected: the worker sends periodic tasks to the 'celery' queue, not 'celery_periodic'. How can I fix that?
P.S. celery==3.0.16
Periodic tasks are sent to queues by celery beat where you can do everything you do with the Celery API. Here is the list of configurations that comes with celery beat:
https://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#available-fields
In your case:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15, # every 15 sec for test
'options': {'queue' : 'celery_periodic'}, # options are mapped to apply_async options
},
}
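If many schedule entries need the same queue, the 'options' key can be filled in programmatically. The sketch below works on plain dicts only; with_queue is a hypothetical helper name, and it leaves any entry that already names a queue untouched.

```python
import copy


def with_queue(schedule, queue):
    """Return a copy of a beat schedule where every entry has a queue.

    Entries that already set options['queue'] keep their own value.
    """
    result = copy.deepcopy(schedule)
    for entry in result.values():
        entry.setdefault("options", {}).setdefault("queue", queue)
    return result


if __name__ == "__main__":
    schedule = {
        "installer_recalc_hour": {
            "task": "stats.installer.tasks.recalc_last_hour",
            "schedule": 15,  # every 15 sec for test
        },
    }
    print(with_queue(schedule, "celery_periodic"))
```

The result could then be assigned to CELERYBEAT_SCHEDULE in place of the hand-written dict.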
I found a solution to this problem:
1) First of all, I changed the way periodic tasks are configured. I used the @periodic_task decorator like this:
@periodic_task(run_every=crontab(minute='5'),
               queue='celery_periodic',
               options={'queue': 'celery_periodic'})
def recalc_last_hour():
    dt = datetime.utcnow()
    prev_hour = datetime(dt.year, dt.month, dt.day, dt.hour) \
        - timedelta(hours=1)
    log.debug('Generating task for hour %s', str(prev_hour))
    recalc_hour.delay(prev_hour)
2) I wrote celery_periodic twice in the params to @periodic_task:
queue='celery_periodic' is used when you invoke the task from code (.delay or .apply_async)
options={'queue': 'celery_periodic'} is used when celery beat invokes it.
I'm sure the same thing is possible if you configure periodic tasks with the CELERYBEAT_SCHEDULE variable.
UPD. This solution is correct for both DB-based and file-based storage for CELERYBEAT_SCHEDULER.
And if you are using djcelery Database scheduler, you can specify the queue on the Execution Options -> queue field

Django Celery Worker Not Receiving the Tasks

Whenever I run the celery worker I get the warning
./manage.py celery worker -l info --concurrency=8
and if I ignore this warning, my celery worker does not receive the celery beat tasks.
After googling, I also changed the worker name. This time I don't get the warning, but the celery worker still does not receive the tasks scheduled by celery beat.
I have checked the celery beat logs, and celery beat is scheduling the tasks on time.
I have also checked Celery Flower, and it shows two workers; the first worker is receiving the tasks but not executing them. How do I send all tasks to the second worker? Or how can I disable the first kombu worker? Which django-celery setting am I missing?
My django settings.py
RABBITMQ_USERNAME = "guest"
RABBITMQ_PASSWORD = "guest"
BROKER_URL = 'amqp://%s:%s@localhost:5672//' % (RABBITMQ_USERNAME,
                                                RABBITMQ_PASSWORD)
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_IGNORE_RESULT = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
celery_enable_utc=True
import djcelery
djcelery.setup_loader()
You only started the worker. For a task to be executed, you must call it with your_task.delay().
For example, open another terminal, go to your project, and run python manage.py shell. In your Django project's shell, import your task and run your_task.delay().
In the following link, there is an example of celery code with rabbitmq broker, I advise you to study it:
https://github.com/celery/celery/tree/master/examples/django

Deleting periodic task for celery scheduler in `settings.py` will not delete the actual task

Is there a way of deleting periodic task or removing the cache in Django Celery? Commenting out the code or deleting the corresponding code segment that schedules the task does not delete the actual task.
""" Commenting out, or deleting both entries from the code base doesn't do anything
CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.add',
'schedule': timedelta(seconds=2),
'args': (2, 2)
},
'add-every-30-seconds2': {
'task': 'tasks.add',
'schedule': timedelta(seconds=5),
'args': (2, 6)
},
}
"""
I tried celery -A my_proj purge but the periodic tasks still run. I am using RabbitMQ as my broker:
BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
From the celery guide to periodic tasks and the celery management guide.
inspect active: List active tasks
$ celery -A proj inspect active
inspect scheduled: List scheduled ETA tasks
$ celery -A proj inspect scheduled
control disable_events: Disable events
$ celery -A proj control disable_events
Alternatively, try the GUI management systems available in the management guide.
EDIT: Purge will only remove the messages, not the task itself.
Delete the task in the djcelery admin screen to remove it from the database.

Celery events specific to a queue

I have two Django projects, each with a Celery app:
- fooproj.celery_app
- barproj.celery_app
Each app is running its own Celery worker:
celery worker -A fooproj.celery_app -l info -E -Q foo_queue
celery worker -A barproj.celery_app -l info -E -Q bar_queue
Here's how I am configuring my Celery apps:
import os
from celery import Celery
from django.conf import settings
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings.local')
app = Celery('celery_app', broker=settings.BROKER_URL)
app.conf.update(
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend',
CELERY_SEND_EVENTS=True,
CELERY_DEFAULT_QUEUE=settings.CELERY_DEFAULT_QUEUE,
CELERY_DEFAULT_EXCHANGE=settings.CELERY_DEFAULT_EXCHANGE,
CELERY_DEFAULT_ROUTING_KEY=settings.CELERY_DEFAULT_ROUTING_KEY,
CELERY_DEFAULT_EXCHANGE_TYPE='direct',
CELERY_ROUTES = ('proj.celeryrouters.MainRouter', ),
CELERY_IMPORTS=(
'apps.qux.tasks',
'apps.lorem.tasks',
'apps.ipsum.tasks',
'apps.sit.tasks'
),
)
My router class:
from django.conf import settings
class MainRouter(object):
    """
    Routes Celery tasks to a proper exchange and queue
    """
    def route_for_task(self, task, args=None, kwargs=None):
        return {
            'exchange': settings.CELERY_DEFAULT_EXCHANGE,
            'exchange_type': 'direct',
            'queue': settings.CELERY_DEFAULT_QUEUE,
            'routing_key': settings.CELERY_DEFAULT_ROUTING_KEY,
        }
fooproj has settings:
BROKER_URL = 'redis://localhost:6379/0'
CELERY_DEFAULT_EXCHANGE = 'foo_exchange'
CELERY_DEFAULT_QUEUE = 'foo_queue'
CELERY_DEFAULT_ROUTING_KEY = 'foo_routing_key'
barproj has settings:
BROKER_URL = 'redis://localhost:6379/1'
CELERY_DEFAULT_EXCHANGE = 'foo_exchange'
CELERY_DEFAULT_QUEUE = 'foo_queue'
CELERY_DEFAULT_ROUTING_KEY = 'foo_routing_key'
As you can see, both projects use their own Redis database as a broker, their own MySQL database as a result backend, their own exchange, queue and routing key.
I am trying to have two Celery events processes running, one for each app:
celery events -A fooproj.celery_app -l info -c djcelery.snapshot.Camera
celery events -A barproj.celery_app -l info -c djcelery.snapshot.Camera
The problem is, both celery events processes are picking up tasks from all of my Celery workers! So in the fooproj database, I can see task results from barproj database.
Any idea how to solve this problem?
From http://celery.readthedocs.org/en/latest/getting-started/brokers/redis.html:
Monitoring events (as used by flower and other tools) are global and
is not affected by the virtual host setting.
This is caused by a limitation in Redis. The Redis PUB/SUB channels are global and not affected by the database number.
This seems to be one of Redis' caveats :(
