I am running superset and celery on AWS ECS. Celery worker, Celery beat and Superset are running in separate containers of the same task. I have turned on debug logs in celery so that I can see each step celery is taking. Celery is starting up and running. Celery worker goes until the log message DEBUG/MainProcess] | Consumer: Starting Connection, celery beat goes until the first time it wakes up, then it displays the log message DEBUG/MainProcess] beat: Synchronizing schedule... and doesn't wake up again.
The command I am using to start the celery worker is:
celery --app=superset.tasks.celery_app:app worker -E --pool=gevent -c 500 -l DEBUG
The command I am using to start the celery beat is:
celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule -l DEBUG
From my superset_config.py, the relevant lines of code are:
class CeleryConfig:
broker_url = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
imports = (
"superset.sql_lab",
"superset.tasks",
"superset.tasks.thumbnails",
)
result_backend = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
worker_log_level = "DEBUG"
worker_prefetch_multiplier = 10
task_acks_late = True
task_annotations = {
"sql_lab.get_sql_results": {
"rate_limit": "100/s",
},
"email_reports.send": {
"rate_limit": "1/s",
"time_limit": 600,
"soft_time_limit": 600,
"ignore_result": True,
},
}
beat_schedule = {
"alerts.schedule_check": {
"task": "alerts.schedule_check",
"schedule": crontab(minute="*", hour="*"),
},
"reports.scheduler": {
"task": "reports.scheduler",
"schedule": crontab(minute="*", hour="*"),
},
"reports.prune_log": {
"task": "reports.prune_log",
"schedule": crontab(minute=0, hour=0),
},
}
CELERY_CONFIG = CeleryConfig
WEBDRIVER_BASEURL = "http://0.0.0.0:8088"
Things I have tried:
Configuring a postgres database for Celery
putting celery into the same container as superset
Changed db numer in superset config
Tried variations on WEBDRIVER_BASEURL (localhost:8088, www.actual-url.com)
Changing the dependencies in the ECS task definition
reconfiguring security groups for redis
Various commands for starting celery worker and beat
Ensuring the security group for the container allows ingress on port 6379
I setup flower to run on the ECS instance; flower shows no workers, tasks or monitors
Things I know:
Celery is connecting to Redis. (At one point it wasn't, and that threw very specific errors.)
Celery is reading the schedule, I can see in the logs where it is displaying the three things scheduled for the beat_schedule.
Schedule reports are not firing; there are no logs at the time of the report, nor is there evidence of a report being generated.
I get an error cron_descriptor.GetText:Failed to find locale en_US, when I access the reports page of Superset (although, TBH, I feel like this is unrelated).
So it turns out I was using MemoryDb instead of Elasticache, and the two things are not interchangeable.
Related
By default Celery send all tasks to 'celery' queue, but you can change this behavior by adding extra parameter:
#task(queue='celery_periodic')
def recalc_last_hour():
log.debug('sending new task')
recalc_hour.delay(datetime(2013, 1, 1, 2)) # for example
Scheduler settings:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15 # every 15 sec for test
},
}
CELERYBEAT_SCHEDULER = "djcelery.schedulers.DatabaseScheduler"
Run worker:
python manage.py celery worker -c 1 -Q celery_periodic -B -E
This scheme doesn't work as expected: this workers sends periodic tasks to 'celery' queue, not 'celery_periodic'. How can I fix that?
P.S. celery==3.0.16
Periodic tasks are sent to queues by celery beat where you can do everything you do with the Celery API. Here is the list of configurations that comes with celery beat:
https://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#available-fields
In your case:
CELERYBEAT_SCHEDULE = {
'installer_recalc_hour': {
'task': 'stats.installer.tasks.recalc_last_hour',
'schedule': 15, # every 15 sec for test
'options': {'queue' : 'celery_periodic'}, # options are mapped to apply_async options
},
}
I found solution for this problem:
1) First of all I changed the way for configuring periodic tasks. I used #periodic_task decorator like this:
#periodic_task(run_every=crontab(minute='5'),
queue='celery_periodic',
options={'queue': 'celery_periodic'})
def recalc_last_hour():
dt = datetime.utcnow()
prev_hour = datetime(dt.year, dt.month, dt.day, dt.hour) \
- timedelta(hours=1)
log.debug('Generating task for hour %s', str(prev_hour))
recalc_hour.delay(prev_hour)
2) I wrote celery_periodic twice in params to #periodic_task:
queue='celery_periodic' option is used when you invoke task from code (.delay or .apply_async)
options={'queue': 'celery_periodic'} option is used when celery beat invokes it.
I'm sure, the same thing is possible if you'd configure periodic tasks with CELERYBEAT_SCHEDULE variable.
UPD. This solution correct for both DB based and file based storage for CELERYBEAT_SCHEDULER.
And if you are using djcelery Database scheduler, you can specify the queue on the Execution Options -> queue field
Whenever I am running the celery worker I am getting the warning
./manage.py celery worker -l info --concurrency=8
and if I am ignored this warning then my celery worker not receiving the celery beat tasks
After googled I have also changed the worker name, but this time I am not receiving the warning but celery worker still not receiving the celery beat scheduled tasks
I have checked the celery beat logs, and celery beat scheduling the task on time.
I have also checked the celery flower and its showing two workers and the first worker is receiving the tasks and not executing it, how to send all task the second worker? or how can i disable the first kombu worker, what is djagno-celery setting that i am missing?
My django settings.py
RABBITMQ_USERNAME = "guest"
RABBITMQ_PASSWORD = "guest"
BROKER_URL = 'amqp://%s:%s#localhost:5672//' % (RABBITMQ_USERNAME,
RABBITMQ_PASSWORD)
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_IGNORE_RESULT = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
celery_enable_utc=True
import djcelery
djcelery.setup_loader()
You only enabled the worker. For a task to be executed, you must call the task with the help of the your_task.delay () function.
For example, open another terminal, enter your project, and run the python manage.py shell command. When entering the shell of your project Django, import your task and run the command your_task.delay ()
In the following link, there is an example of celery code with rabbitmq broker, I advise you to study it:
https://github.com/celery/celery/tree/master/examples/django
Is there a way of deleting periodic task or removing the cache in Django Celery? Commenting out the code or deleting the corresponding code segment that schedules the task does not delete the actual task.
""" Commenting out, or deleting both entries from the code base doesn't do anything
CELERYBEAT_SCHEDULE = {
'add-every-30-seconds': {
'task': 'tasks.add',
'schedule': timedelta(seconds=2),
'args': (2, 2)
},
'add-every-30-seconds2': {
'task': 'tasks.add',
'schedule': timedelta(seconds=5),
'args': (2, 6)
},
}
"""
I tried celery -A my_proj purge but the periodic tasks still happens. I am using RabbitMQ as my broker
BROKER_URL = "amqp://guest:guest#localhost:5672//"
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
From the celery guide to periodic tasks and the celery management guide.
inspect active: List active tasks
$ celery -A proj inspect active
inspect scheduled: List scheduled ETA tasks
$ celery -A proj inspect scheduled
control disable_events: Disable events
$ celery -A proj control disable_events
Alternatively, try the GUI management systems available in the management guide.
EDIT: Purge will only remove the messages, not the task itself.
Delete the task in the djcelery admin screen to remove it from the database.
I am hoping someone can help me as I've looked on Stack Overflow and cannot find a solution to my problem. I am running a Django project and have Supervisor, RabbitMQ and Celery installed. RabbitMQ is up and running and Supervisor is ensuring my celerybeat is running, however, while it logs that the beat has started and sends tasks every 5 minutes (see below), the tasks never actually execute:
My supervisor program conf:
[program:nrv_twitter]
; Set full path to celery program if using virtualenv
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery beat -A app --loglevel=INFO --pidfile=/tmp/nrv-celerybeat.pid --schedule=/tmp/nrv-celerybeat-schedule
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celerybeat_twitter.log
redirect_stderr=true
autorestart=true
autostart=true
startsecs=10
user=tsantor
; if rabbitmq is supervised, set its priority higher so it starts first
priority=999
Here is the output of the log from the program above:
[2014-12-16 20:29:42,293: INFO/MainProcess] beat: Starting...
[2014-12-16 20:34:08,161: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:39:08,186: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:44:08,204: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:49:08,205: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:54:08,223: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
Here is my celery.py settings file:
from datetime import timedelta
BROKER_URL = 'amqp://guest:guest#localhost//'
CELERY_DISABLE_RATE_LIMITS = True
CELERYBEAT_SCHEDULE = {
'gettweets-every-5-mins': {
'task': 'twitter.tasks.get_tweets',
'schedule': timedelta(seconds=300) # 300 = every 5 minutes
},
}
Here is my celeryapp.py:
from __future__ import absolute_import
import os
from django.conf import settings
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
app = Celery('app')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Here is my twitter/tasks.py:
from __future__ import absolute_import
import logging
from celery import shared_task
from twitter.views import IngestTweets
log = logging.getLogger('custom.log')
#shared_task
def get_tweets():
"""
Get tweets and save them to the DB
"""
instance = IngestTweets()
IngestTweets.get_new_tweets(instance)
log.info('Successfully ingested tweets via celery task')
return True
The get_tweets method never gets executed, however I know it works as I can execute get_tweets manually and it works fine.
I have spent two days trying to figure out why its sending due tasks, but not executing them? Any help is greatly appreciated. Thanks in advance.
user2097159 thanks for pointing me in the right direction, I was not aware I also must run a worker using supervisor. I thought it was either a worker or a beat, but now I understand that I must have a worker to handle the task and a beat to fire off the task periodically.
Below is the missing worker config for supervisor:
[program:nrv_celery_worker]
; Worker
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery worker -A app --loglevel=INFO
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celery_worker.log
redirect_stderr=true
autostart=true
autorestart=true
startsecs=10
user=tsantor
numprocs=1
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
I then reset the RabbitMQ queue. Now that I have both the beat and worker programs managed via supervisor, all is working as intended. Hope this helps someone else out.
You need to a start both a worker process and a beat process. You can create separate processes as described in tsantor's answer, or you can create a single process with both a worker and a beat. This can be more convenient during development (but is not recommended for production).
From "Starting the scheduler" in the Celery documentation:
You can also embed beat inside the worker by enabling the workers -B option, this is convenient if you’ll never run more than one worker node, but it’s not commonly used and for that reason isn’t recommended for production use:
$ celery -A proj worker -B
For expression in Supervisor config files see https://github.com/celery/celery/tree/master/extra/supervisord/ (linked from "Daemonization")
I followed celery docs to define 2 queues on my dev machine.
My celery settings:
CELERY_ALWAYS_EAGER = True
CELERY_TASK_RESULT_EXPIRES = 60 # 1 mins
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_CREATE_MISSING_QUEUES = True
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='arena.social.tasks.#'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'fs_feeds',
},
}
i opened two terminal windows, in virtualenv of my project, and ran following commands:
terminal_1$ celery -A arena worker -Q default -B -l debug --purge -n deafult_worker
terminal_2$ celery -A arena worker -Q feeds -B -l debug --purge -n feeds_worker
what i get is that all tasks are being processed by both queues.
My goal is to have one queue to process only the one task defined in CELERY_ROUTES and default queue to process all other tasks.
I also followed this SO question, rabbitmqctl list_queues returns celery 0, and running rabbitmqctl list_bindings returns exchange celery queue celery [] twice. Restarting rabbit server didn't change anything.
Ok, so i figured it out. Following is my whole setup, settings and how to run celery, for those who might be wondering about same thing as my question did.
Settings
CELERY_TIMEZONE = TIME_ZONE
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
# celery queues setup
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'feeds',
'routing_key': 'long_tasks',
},
}
How to run celery?
terminal - tab 1:
celery -A proj worker -Q default -l debug -n default_worker
this will start first worker that consumes tasks from default queue. NOTE! -n default_worker is not a must for the first worker, but is a must if you have any other celery instances up and running. Setting -n worker_name is the same as --hostname=default#%h.
terminal - tab 2:
celery -A proj worker -Q feeds -l debug -n feeds_worker
this will start second worker that consumers tasks from feeds queue. Notice -n feeds_worker, if you are running with -l debug (log level = debug), you will see that both workers are syncing between them.
terminal - tab 3:
celery -A proj beat -l debug
this will start the beat, executing tasks according to the schedule in your CELERYBEAT_SCHEDULE.
I didn't have to change the task, or the CELERYBEAT_SCHEDULE.
For example, this is how looks my CELERYBEAT_SCHEDULE for the task that should go to feeds queue:
CELERYBEAT_SCHEDULE = {
...
'update_feeds': {
'task': 'arena.social.tasks.Update',
'schedule': crontab(minute='*/6'),
},
...
}
As you can see, no need for adding 'options': {'routing_key': 'long_tasks'} or specifying to what queue it should go. Also, if you were wondering why Update is upper cased, its because its a custom task, which are defined as sub classes of celery.Task.
Update Celery 5.0+
Celery made a couple changes since version 5, here is an updated setup for routing of tasks.
How to create the queues?
Celery can create the queues automatically. It works perfectly for simple cases, where celery default values for routing are ok.
task_create_missing_queues=True or, if you're using django settings and you're namespacing all celery configs under CELERY_ key, CELERY_TASK_CREATE_MISSING_QUEUES=True. Note, that it is on by default.
Automatic scheduled task routing
After configuring celery app:
celery_app.conf.beat_schedule = {
"some_scheduled_task": {
"task": "module.path.some_task",
"schedule": crontab(minute="*/10"),
"options": {"queue": "queue1"}
}
}
Automatic task routing
Celery app still has to be configured first and then:
app.conf.task_routes = {
"module.path.task2": {"queue": "queue2"},
}
Manual routing of tasks
In case and you want to route the tasks dynamically, then when sending the task specify the queue:
from module import task
def do_work():
# do some work and launch the task
task.apply_async(args=(arg1, arg2), queue="queue3")
More details re routing can be found here:
https://docs.celeryproject.org/en/stable/userguide/routing.html
And regarding calling tasks here:
https://docs.celeryproject.org/en/stable/userguide/calling.html
In addition to accepted answer, if anyone comes here and still wonders why his settings aren't working (as I did just moments ago), here's why: celery documentation isn't listing settings names properly.
For celery 5.0.5 settings CELERY_DEFAULT_QUEUE, CELERY_QUEUES, CELERY_ROUTES should be named CELERY_TASK_DEFAULT_QUEUE, CELERY_TASK_QUEUESand CELERY_TASK_ROUTES instead. These are settings that I've tested, but my guess is the same rule applies for exchange and routing key aswell.