By default Celery sends all tasks to the 'celery' queue, but you can change this behavior by adding an extra parameter:
@task(queue='celery_periodic')
def recalc_last_hour():
    log.debug('sending new task')
    recalc_hour.delay(datetime(2013, 1, 1, 2))  # for example
Scheduler settings:
CELERYBEAT_SCHEDULE = {
    'installer_recalc_hour': {
        'task': 'stats.installer.tasks.recalc_last_hour',
        'schedule': 15,  # every 15 sec for test
    },
}
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
Run worker:
python manage.py celery worker -c 1 -Q celery_periodic -B -E
This scheme doesn't work as expected: the worker sends periodic tasks to the 'celery' queue, not 'celery_periodic'. How can I fix that?
P.S. celery==3.0.16
Periodic tasks are sent to queues by celery beat, where you can do everything you do with the Celery API. Here is the list of configuration fields that celery beat supports:
https://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#available-fields
In your case:
CELERYBEAT_SCHEDULE = {
    'installer_recalc_hour': {
        'task': 'stats.installer.tasks.recalc_last_hour',
        'schedule': 15,  # every 15 sec for test
        'options': {'queue': 'celery_periodic'},  # options are mapped to apply_async options
    },
}
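To make the mapping concrete: when that entry is due, beat ends up doing essentially what you would do by hand. A rough sketch (assuming the task import path from the schedule above):

from stats.installer.tasks import recalc_last_hour

# what beat effectively does for 'installer_recalc_hour' when it is due:
# the 'options' dict is passed through as apply_async keyword arguments
recalc_last_hour.apply_async(queue='celery_periodic')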
I found a solution for this problem:
1) First of all, I changed the way of configuring periodic tasks. I used the @periodic_task decorator like this:
@periodic_task(run_every=crontab(minute='5'),
               queue='celery_periodic',
               options={'queue': 'celery_periodic'})
def recalc_last_hour():
    dt = datetime.utcnow()
    prev_hour = datetime(dt.year, dt.month, dt.day, dt.hour) \
        - timedelta(hours=1)
    log.debug('Generating task for hour %s', str(prev_hour))
    recalc_hour.delay(prev_hour)
2) I wrote celery_periodic twice in the params to @periodic_task:
the queue='celery_periodic' option is used when you invoke the task from code (.delay or .apply_async);
the options={'queue': 'celery_periodic'} option is used when celery beat invokes it.
I'm sure the same thing is possible if you configure periodic tasks with the CELERYBEAT_SCHEDULE variable.
UPD: This solution is correct for both DB-based and file-based storage for CELERYBEAT_SCHEDULER.
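For completeness, an equivalent CELERYBEAT_SCHEDULE entry would look roughly like this (a sketch, combining the schedule from this answer with the 'options' key mentioned in the other answer):

from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    'installer_recalc_hour': {
        'task': 'stats.installer.tasks.recalc_last_hour',
        'schedule': crontab(minute='5'),
        'options': {'queue': 'celery_periodic'},  # passed to apply_async by beat
    },
}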
And if you are using the djcelery database scheduler, you can specify the queue in the Execution Options -> queue field.
I am running Superset and Celery on AWS ECS. The Celery worker, Celery beat and Superset are running in separate containers of the same task. I have turned on debug logs in Celery so that I can see each step Celery is taking. Celery is starting up and running. The Celery worker goes until the log message [DEBUG/MainProcess] | Consumer: Starting Connection; Celery beat goes until the first time it wakes up, then it displays the log message [DEBUG/MainProcess] beat: Synchronizing schedule... and doesn't wake up again.
The command I am using to start the celery worker is:
celery --app=superset.tasks.celery_app:app worker -E --pool=gevent -c 500 -l DEBUG
The command I am using to start the celery beat is:
celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule -l DEBUG
From my superset_config.py, the relevant lines of code are:
class CeleryConfig:
    broker_url = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    imports = (
        "superset.sql_lab",
        "superset.tasks",
        "superset.tasks.thumbnails",
    )
    result_backend = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    worker_log_level = "DEBUG"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 600,
            "soft_time_limit": 600,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "alerts.schedule_check": {
            "task": "alerts.schedule_check",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig
WEBDRIVER_BASEURL = "http://0.0.0.0:8088"
Things I have tried:
Configuring a Postgres database for Celery
Putting Celery into the same container as Superset
Changing the db number in the Superset config
Trying variations on WEBDRIVER_BASEURL (localhost:8088, www.actual-url.com)
Changing the dependencies in the ECS task definition
Reconfiguring security groups for Redis
Various commands for starting the Celery worker and beat
Ensuring the security group for the container allows ingress on port 6379
I set up Flower to run on the ECS instance; Flower shows no workers, tasks or monitors
Things I know:
Celery is connecting to Redis. (At one point it wasn't, and that threw very specific errors.)
Celery is reading the schedule; I can see in the logs where it displays the three entries from beat_schedule.
Scheduled reports are not firing; there are no logs at the time of the report, nor is there evidence of a report being generated.
I get an error cron_descriptor.GetText:Failed to find locale en_US when I access the reports page of Superset (although, TBH, I feel like this is unrelated).
So it turns out I was using MemoryDB instead of ElastiCache, and the two are not interchangeable.
As far as I know, the @periodic_task decorator has been deprecated since Celery 3.1.
So I am trying to run an example from the Celery docs, and can't figure out what I am doing wrong.
I have the following code in task_planner.py:
from celery import Celery
from kombu import Queue, Exchange
class Config(object):
    CELERY_QUEUES = (
        Queue(
            'try',
            exchange=Exchange('try'),
            routing_key='try',
        ),
    )

celery = Celery('tasks',
                backend='redis://',
                broker='redis://localhost:6379/0')
celery.config_from_object(Config)

celery.conf.beat_schedule = {
    'planner': {
        'task': 'some_task',
        'schedule': 5.0,
    },
}

@celery.task(queue='try')
def some_task():
    print('Hooray')
And when I run celery -A task_planner worker -l info -B, I receive only the following: [2016-11-27 19:06:56,119: INFO/Beat] Scheduler: Sending due task planner (some_task) every 5 sec.
But I am expecting the output 'Hooray'.
So, what am I missing?
I have found the solution.
I had the task:
@celery.task(queue='try')
def some_task():
    print('Hooray')
I printed its name:
print(some_task)
Got the following:
<@task: task_planner.some_task of tasks:0x7fceaaf5b9e8>
So I just changed the name of the task from some_task to task_planner.some_task here:
celery.conf.beat_schedule = {
    'planner': {
        'task': 'task_planner.some_task',
        'schedule': 5.0,
    },
}
And it worked!
[2016-11-29 10:09:57,697: WARNING/PoolWorker-3] Hooray
Note: you should run beat together with the worker (if the task is in the same module as beat) and with log level 'info' in order to see the results:
celery -A task_planner worker -B -l info
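Alternatively, an untested sketch: the task decorator accepts a name argument, so you could keep the short name in the schedule by naming the task explicitly:

@celery.task(name='some_task', queue='try')
def some_task():
    print('Hooray')

# with the explicit name, the beat entry 'task': 'some_task' resolves correctly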
Is there a way of deleting a periodic task or removing the cache in Django Celery? Commenting out or deleting the corresponding code segment that schedules the task does not delete the actual task.
""" Commenting out, or deleting both entries from the code base doesn't do anything
CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.add',
'schedule': timedelta(seconds=2),
'args': (2, 2)
},
'add-every-30-seconds2': {
'task': 'tasks.add',
'schedule': timedelta(seconds=5),
'args': (2, 6)
},
}
"""
I tried celery -A my_proj purge but the periodic tasks still happen. I am using RabbitMQ as my broker.
BROKER_URL = "amqp://guest:guest@localhost:5672//"
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
From the celery guide to periodic tasks and the celery management guide.
inspect active: List active tasks
$ celery -A proj inspect active
inspect scheduled: List scheduled ETA tasks
$ celery -A proj inspect scheduled
control disable_events: Disable events
$ celery -A proj control disable_events
Alternatively, try the GUI management systems available in the management guide.
EDIT: Purge will only remove the messages, not the task itself.
Delete the task in the djcelery admin screen to remove it from the database.
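If you prefer the shell to the admin screen, here is a rough sketch of the same cleanup via djcelery's model (assuming djcelery and the DatabaseScheduler from the question):

# run inside: python manage.py shell
from djcelery.models import PeriodicTask

# remove the stale entries that DatabaseScheduler created from CELERYBEAT_SCHEDULE
PeriodicTask.objects.filter(task='tasks.add').delete()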
I would like to launch a periodic task every second but only if the previous task ended (db polling to send task to celery).
In the Celery documentation they are using the Django cache to make a lock.
I tried to use the example:
from __future__ import absolute_import

import datetime
import time

from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60 * 5

@shared_task
def periodic():
    acquire_lock = lambda: cache.add('lock_id', 'true', LOCK_EXPIRE)
    release_lock = lambda: cache.delete('lock_id')
    a = acquire_lock()
    if a:
        try:
            time.sleep(10)
            print a, 'Hello ', datetime.datetime.now()
        finally:
            release_lock()
    else:
        print 'Ignore'
with the following configuration:
app.conf.update(
    CELERY_IGNORE_RESULT=True,
    CELERY_ACCEPT_CONTENT=['json'],
    CELERY_TASK_SERIALIZER='json',
    CELERY_RESULT_SERIALIZER='json',
    CELERYBEAT_SCHEDULE={
        'periodic_task': {
            'task': 'app_task_management.tasks.periodic',
            'schedule': timedelta(seconds=1),
        },
    },
)
But in the console, I never see the Ignore message, and I get Hello every second. It seems that the lock is not working properly.
I launch the periodic task with:
celeryd -B -A my_app
and the worker with:
celery worker -A my_app -l info
Could you please correct my misunderstanding?
From the Django Cache Framework documentation about local-memory cache:
Note that each process will have its own private cache instance, which
means no cross-process caching is possible.
So basically your workers are each dealing with their own private cache. If you need a low-resource-cost cache backend, I would recommend the file-based or database cache; both allow cross-process access.
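For example, a minimal sketch of a file-based cache in the Django settings (the path is illustrative), so that every worker process sees the same lock:

# settings.py -- a shared, cross-process cache backend for the lock
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',  # any directory writable by all workers
    }
}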
I followed the celery docs to define 2 queues on my dev machine.
My celery settings:
CELERY_ALWAYS_EAGER = True
CELERY_TASK_RESULT_EXPIRES = 60  # 1 min
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_CREATE_MISSING_QUEUES = True
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='arena.social.tasks.#'),
)
CELERY_ROUTES = {
    'arena.social.tasks.Update': {
        'queue': 'fs_feeds',
    },
}
I opened two terminal windows, in the virtualenv of my project, and ran the following commands:
terminal_1$ celery -A arena worker -Q default -B -l debug --purge -n deafult_worker
terminal_2$ celery -A arena worker -Q feeds -B -l debug --purge -n feeds_worker
What I get is that all tasks are being processed by both workers.
My goal is to have one queue process only the one task defined in CELERY_ROUTES and the default queue process all other tasks.
I also followed this SO question; rabbitmqctl list_queues returns celery 0, and running rabbitmqctl list_bindings returns exchange celery queue celery [] twice. Restarting the RabbitMQ server didn't change anything.
OK, so I figured it out. Following is my whole setup, settings and how to run Celery, for those who might be wondering about the same thing as my question.
Settings
CELERY_TIMEZONE = TIME_ZONE
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1

# celery queues setup
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_ROUTES = {
    'arena.social.tasks.Update': {
        'queue': 'feeds',
        'routing_key': 'long_tasks',
    },
}
How to run celery?
terminal - tab 1:
celery -A proj worker -Q default -l debug -n default_worker
this will start the first worker that consumes tasks from the default queue. NOTE! -n default_worker is not a must for the first worker, but it is a must if you have any other celery instances up and running. Setting -n worker_name is the same as --hostname=default@%h.
terminal - tab 2:
celery -A proj worker -Q feeds -l debug -n feeds_worker
this will start the second worker that consumes tasks from the feeds queue. Notice -n feeds_worker; if you are running with -l debug (log level = debug), you will see that both workers are syncing between them.
terminal - tab 3:
celery -A proj beat -l debug
this will start the beat, executing tasks according to the schedule in your CELERYBEAT_SCHEDULE.
I didn't have to change the task, or the CELERYBEAT_SCHEDULE.
For example, this is how my CELERYBEAT_SCHEDULE looks for the task that should go to the feeds queue:
CELERYBEAT_SCHEDULE = {
    ...
    'update_feeds': {
        'task': 'arena.social.tasks.Update',
        'schedule': crontab(minute='*/6'),
    },
    ...
}
As you can see, there is no need for adding 'options': {'routing_key': 'long_tasks'} or specifying which queue it should go to. Also, if you were wondering why Update is upper cased, it's because it's a custom task, and custom tasks are defined as subclasses of celery.Task.
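For reference, a rough sketch of what such a class-based task can look like (the module path is taken from the routes above, the body and the app object are assumed):

from celery import Task

class Update(Task):
    # this registered name is what CELERYBEAT_SCHEDULE and CELERY_ROUTES refer to
    name = 'arena.social.tasks.Update'

    def run(self, *args, **kwargs):
        pass  # fetch and refresh the feeds here

# on Celery 4+ the subclass is no longer auto-registered; you would also call
# app.register_task(Update())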
Update Celery 5.0+
Celery made a couple of changes since version 5; here is an updated setup for routing of tasks.
How to create the queues?
Celery can create the queues automatically. This works perfectly for simple cases, where the celery default values for routing are OK.
Set task_create_missing_queues=True or, if you're using Django settings and you're namespacing all celery configs under the CELERY_ key, CELERY_TASK_CREATE_MISSING_QUEUES=True. Note that it is on by default.
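As a one-line sketch (assuming celery_app is your configured Celery instance):

# explicit form of the default behaviour; queues named in routes/options are declared on demand
celery_app.conf.task_create_missing_queues = True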
Automatic scheduled task routing
After configuring celery app:
celery_app.conf.beat_schedule = {
    "some_scheduled_task": {
        "task": "module.path.some_task",
        "schedule": crontab(minute="*/10"),
        "options": {"queue": "queue1"},
    }
}
Automatic task routing
Celery app still has to be configured first and then:
app.conf.task_routes = {
    "module.path.task2": {"queue": "queue2"},
}
Manual routing of tasks
In case you want to route the tasks dynamically, specify the queue when sending the task:
from module import task

def do_work():
    # do some work and launch the task
    task.apply_async(args=(arg1, arg2), queue="queue3")
More details on routing can be found here:
https://docs.celeryproject.org/en/stable/userguide/routing.html
And regarding calling tasks here:
https://docs.celeryproject.org/en/stable/userguide/calling.html
In addition to the accepted answer, if anyone comes here and still wonders why their settings aren't working (as I did just moments ago), here's why: the celery documentation isn't listing the setting names properly.
For celery 5.0.5, the settings CELERY_DEFAULT_QUEUE, CELERY_QUEUES and CELERY_ROUTES should be named CELERY_TASK_DEFAULT_QUEUE, CELERY_TASK_QUEUES and CELERY_TASK_ROUTES instead. These are the settings that I've tested, but my guess is the same rule applies for exchange and routing key as well.
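For example, in a Django settings module with the CELERY_ namespace, the working names would look roughly like this (values are placeholders based on the earlier answer):

from kombu import Exchange, Queue

# settings.py -- Celery 5.x setting names with the CELERY_ prefix
CELERY_TASK_DEFAULT_QUEUE = 'default'
CELERY_TASK_QUEUES = (
    Queue('default', Exchange('default'), routing_key='default'),
    Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_TASK_ROUTES = {
    'arena.social.tasks.Update': {'queue': 'feeds', 'routing_key': 'long_tasks'},
}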