When Celery receives a task, the task never gets executed; the worker just hangs.
These tasks arrive randomly, under very low load.
Celery version: 3.1.20
[2016-03-02 22:33:08,300: INFO/MainProcess] Received task: catalogue.cluster.deploy_cluster.imbue_cluster[A5C030C4E0]
[2016-03-02 22:33:08,303: INFO/MainProcess] Scaling up 1 processes.
After this, nothing happens.
I started celery with supervisord using a shell script:
source ~/.profile
CELERY_LOGFILE=/usr/local/src/imbue/application/imbue/log/celeryd.log
CELERYD_OPTS=" --loglevel=INFO --autoscale=10,5"
cd /usr/local/src/imbue/application/imbue/conf
exec celery worker -n celeryd#%h -f $CELERY_LOGFILE $CELERYD_OPTS
My configuration:
CELERYD_CHDIR = settings.filepath
CELERY_IGNORE_RESULT = False
CELERY_RESULT_BACKEND = "amqp"
CELERY_TASK_RESULT_EXPIRES = 360000
CELERY_RESULT_PERSISTENT = True
BROKER_URL = <rabbitmq>
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "US/Eastern"
CELERY_IMPORTS = ("catalogue.cluster.deploy_cluster",
                  "tools.deploy_tools",)
This is how I call my tasks:
celery = Celery()
celery.config_from_object('conf.celeryconfig')
celery.send_task("catalogue.cluster.deploy_cluster.imbue_cluster",
                 kwargs={'configuration': configuration,
                         'job': job_instance,
                         'api_call': True},
                 task_id=job_instance.reference)
@task(bind=True, default_retry_delay=300, max_retries=5)
def imbue_cluster(...)
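For reference, a minimal sketch of what the worker-side task module might look like; the real function body is elided in the question, so the signature and retry handling below are assumptions, not the project's actual code:

from celery import Celery

celery = Celery()
celery.config_from_object('conf.celeryconfig')

# Hypothetical signature: only the decorator arguments are taken from the question.
@celery.task(bind=True, default_retry_delay=300, max_retries=5)
def imbue_cluster(self, configuration=None, job=None, api_call=False):
    try:
        pass  # the real deployment logic is elided in the question
    except Exception as exc:
        # retry up to max_retries times, waiting default_retry_delay seconds between attempts
        raise self.retry(exc=exc)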
Similar issues:
http://comments.gmane.org/gmane.comp.python.amqp.celery.user/4990
https://groups.google.com/forum/#!topic/cloudify-users/ANvSv7mV7h4
I am running Superset and Celery on AWS ECS. The Celery worker, Celery beat, and Superset run in separate containers of the same task. I have turned on debug logging in Celery so that I can see each step Celery takes. Celery starts up and runs: the worker gets as far as the log message "DEBUG/MainProcess] | Consumer: Starting Connection", and beat gets as far as the first time it wakes up, logs "DEBUG/MainProcess] beat: Synchronizing schedule..." and never wakes up again.
The command I am using to start the celery worker is:
celery --app=superset.tasks.celery_app:app worker -E --pool=gevent -c 500 -l DEBUG
The command I am using to start the celery beat is:
celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule -l DEBUG
From my superset_config.py, the relevant lines of code are:
class CeleryConfig:
    broker_url = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    imports = (
        "superset.sql_lab",
        "superset.tasks",
        "superset.tasks.thumbnails",
    )
    result_backend = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    worker_log_level = "DEBUG"
    worker_prefetch_multiplier = 10
    task_acks_late = True
    task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
        "email_reports.send": {
            "rate_limit": "1/s",
            "time_limit": 600,
            "soft_time_limit": 600,
            "ignore_result": True,
        },
    }
    beat_schedule = {
        "alerts.schedule_check": {
            "task": "alerts.schedule_check",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=0, hour=0),
        },
    }

CELERY_CONFIG = CeleryConfig
WEBDRIVER_BASEURL = "http://0.0.0.0:8088"
Things I have tried:
Configuring a Postgres database for Celery
Putting Celery into the same container as Superset
Changing the DB number in the Superset config
Trying variations on WEBDRIVER_BASEURL (localhost:8088, www.actual-url.com)
Changing the dependencies in the ECS task definition
Reconfiguring the security groups for Redis
Various commands for starting the Celery worker and beat
Ensuring the security group for the container allows ingress on port 6379
Setting up Flower on the ECS instance; Flower shows no workers, tasks, or monitors
Things I know:
Celery is connecting to Redis. (At one point it wasn't, and that threw very specific errors.)
Celery is reading the schedule; I can see the three beat_schedule entries being displayed in the logs.
Scheduled reports are not firing; there are no log entries at the time a report is due, nor any evidence of a report being generated.
I get an error, cron_descriptor.GetText: Failed to find locale en_US, when I access the reports page of Superset (although, to be honest, I feel this is unrelated).
It turns out I was using MemoryDB instead of ElastiCache, and the two are not interchangeable.
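For what it's worth, the fix amounts to pointing the Redis settings in superset_config.py at an ElastiCache for Redis endpoint rather than a MemoryDB one; a minimal sketch, where the hostname is a placeholder and not a real endpoint:

REDIS_HOST = "my-redis.abc123.use1.cache.amazonaws.com"  # placeholder ElastiCache primary endpoint
REDIS_PORT = "6379"

class CeleryConfig:
    broker_url = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)
    result_backend = "redis://%s:%s/0" % (REDIS_HOST, REDIS_PORT)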
Whenever I run the Celery worker, I get a warning. The command I use is:
./manage.py celery worker -l info --concurrency=8
If I ignore the warning, my Celery worker does not receive the Celery beat tasks.
After some googling I also changed the worker name; this time I do not get the warning, but the worker still does not receive the scheduled beat tasks.
I have checked the Celery beat logs, and beat is scheduling the tasks on time.
I have also checked Celery Flower: it shows two workers, and the first worker receives the tasks but does not execute them. How do I send all tasks to the second worker, or how can I disable the first (kombu) worker? What django-celery setting am I missing?
My Django settings.py:
RABBITMQ_USERNAME = "guest"
RABBITMQ_PASSWORD = "guest"
BROKER_URL = 'amqp://%s:%s@localhost:5672//' % (RABBITMQ_USERNAME,
                                                RABBITMQ_PASSWORD)
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_IGNORE_RESULT = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
celery_enable_utc=True
import djcelery
djcelery.setup_loader()
You only enabled the worker. For a task to be executed, you must call the task, for example with the your_task.delay() function.
For example, open another terminal, go into your project, and run the python manage.py shell command. Once inside your Django project's shell, import your task and call your_task.delay().
The following link has an example of Celery code with a RabbitMQ broker; I advise you to study it:
https://github.com/celery/celery/tree/master/examples/django
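As a concrete sketch of that shell session (the module and task names below are placeholders, not part of the question):

# In another terminal: python manage.py shell
from myapp.tasks import my_task   # placeholder import path

result = my_task.delay(1, 2)      # enqueue the task; a running worker will pick it up
print(result.id)                  # the task id that should then appear in the worker log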
I set up Celery with a Django app and Redis as the broker.
@task
def proc(product_id, url, did, did_name):
    ## some long operation here

@task
def Scraping(product_id, num=None):
    if num:
        num = int(num)  ## so I can set how many subtasks run now
    res = group([proc.s(product_id, url, did, dis[did]) for did in dis.keys()[:num]])()
    result = res.get()
    return sum(result)
The first few subtasks run successfully, but after a while one of the workers disappears and new tasks stay in RECEIVED status, because the worker that should handle them no longer exists.
I set up minimal concurrency and 2 workers in /etc/default/celeryd.
I monitor CPU and memory usage; no high load is detected.
There are no errors in the Celery logs!
What's wrong?
[2015-12-19 04:00:30,131: INFO/MainProcess] Task remains.tasks.proc[fd0ec29c-436f-4f60-a1b6-3785342ac173] succeeded in 20.045763085s: 6
[2015-12-19 04:17:28,895: INFO/MainProcess] missed heartbeat from w2@server.domain.com
[2015-12-19 04:17:28,897: DEBUG/MainProcess] w2@server.domain.com joined the party
[2015-12-19 05:11:44,057: INFO/MainProcess] missed heartbeat from w2@server.domain.com
[2015-12-19 05:11:44,058: DEBUG/MainProcess] w2@server.domain.com joined the party
Solution:
If you use django-celery and want to run Celery as a daemon, do not use app() (http://docs.celeryproject.org/en/latest/userguide/application.html). Instead, point your setup in /etc/default/celeryd directly at your project's manage.py, as in: CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
Do not disable heartbeats!
To use Celery directly through manage.py you need to:
Set CELERY_APP="" in /etc/default/celeryd, because if you don't, beat will build its run command with the old "app" argument.
Add the line export DJANGO_SETTINGS_MODULE="your_app.settings" to the celeryd config if you are not using the default settings. A sketch of the resulting file follows.
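Putting those settings together, a sketch of the relevant lines of /etc/default/celeryd (the paths and the settings module are placeholders for your own project):

# Where manage.py lives -- placeholder path.
CELERYD_CHDIR="/path/to/your/project"
# Run Celery through manage.py instead of a standalone app():
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
# Leave CELERY_APP empty so the init script does not pass the old "app" argument:
CELERY_APP=""
# Only needed if you are not using the default settings module:
export DJANGO_SETTINGS_MODULE="your_app.settings"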
When I run celery -A tasks2.celery worker -B I want to see "celery task" printed every second. Currently nothing is printed. Why isn't this working?
from app import app
from celery import Celery
from datetime import timedelta

celery = Celery(app.name, broker='amqp://guest:@localhost/', backend='amqp://guest:@localhost/')
celery.conf.update(CELERY_TASK_RESULT_EXPIRES=3600,)

@celery.task
def add(x, y):
    print "celery task"
    return x + y

CELERYBEAT_SCHEDULE = {
    'add-every-30-seconds': {
        'task': 'tasks2.add',
        'schedule': timedelta(seconds=1),
        'args': (16, 16)
    },
}
This is the only output after staring the worker and beat:
[tasks]
. tasks2.add
[INFO/Beat] beat: Starting...
[INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[INFO/MainProcess] mingle: searching for neighbors
[INFO/MainProcess] mingle: all alone
You wrote the schedule, but didn't add it to the celery config. So beat saw no scheduled tasks to send. The example below uses celery.config_from_object(__name__) to pick up config values from the current module, but you can use any other config method as well.
Once you configure it properly, you will see messages from beat about sending scheduled tasks, as well as the output from those tasks as the worker receives and runs them.
from celery import Celery
from datetime import timedelta

celery = Celery(__name__)
celery.config_from_object(__name__)

@celery.task
def say_hello():
    print('Hello, World!')

CELERYBEAT_SCHEDULE = {
    'every-second': {
        'task': 'example.say_hello',
        'schedule': timedelta(seconds=5),
    },
}
$ celery -A example.celery worker -B -l info
[tasks]
. example.say_hello
[2015-07-15 08:23:54,350: INFO/Beat] beat: Starting...
[2015-07-15 08:23:54,366: INFO/MainProcess] Connected to amqp://guest:**@127.0.0.1:5672//
[2015-07-15 08:23:54,377: INFO/MainProcess] mingle: searching for neighbors
[2015-07-15 08:23:55,385: INFO/MainProcess] mingle: all alone
[2015-07-15 08:23:55,411: WARNING/MainProcess] celery#netsec-ast-15 ready.
[2015-07-15 08:23:59,471: INFO/Beat] Scheduler: Sending due task every-second (example.say_hello)
[2015-07-15 08:23:59,481: INFO/MainProcess] Received task: example.say_hello[2a9d31cb-fe11-47c8-9aa2-51690d47c007]
[2015-07-15 08:23:59,483: WARNING/Worker-3] Hello, World!
[2015-07-15 08:23:59,484: INFO/MainProcess] Task example.say_hello[2a9d31cb-fe11-47c8-9aa2-51690d47c007] succeeded in 0.0012782540870830417s: None
In version 4.1.0 you have to add a logger to your task.py file, like so:
import random

from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@task(name="multiply_two_numbers")
def mul(x, y):
    total = x * (y * random.randint(3, 100))
    # HERE:
    logger.info('Adding {0} + {1}'.format(x, y))
    return total
This is covered about halfway down in the docs, if you want more info:
http://docs.celeryproject.org/en/latest/userguide/tasks.html
Make sure you run the celery beat scheduler so your scheduled tasks get sent:
celery beat --app app.celery
Check the docs here: http://celery.readthedocs.org/en/latest/userguide/periodic-tasks.html#starting-the-scheduler
I had issues printing mine on the command prompt because I was using the wrong command, but I found a project which I forked (Project), and these commands worked:
(If on Mac ) celery -A Project worker --loglevel=info
(If on Windows) celery -A Project worker -l info --pool=solo
I am hoping someone can help me, as I've looked on Stack Overflow and cannot find a solution to my problem. I am running a Django project and have Supervisor, RabbitMQ, and Celery installed. RabbitMQ is up and running, and Supervisor is ensuring my celerybeat is running. However, while beat logs that it has started and sends tasks every 5 minutes (see below), the tasks never actually execute:
My supervisor program conf:
[program:nrv_twitter]
; Set full path to celery program if using virtualenv
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery beat -A app --loglevel=INFO --pidfile=/tmp/nrv-celerybeat.pid --schedule=/tmp/nrv-celerybeat-schedule
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celerybeat_twitter.log
redirect_stderr=true
autorestart=true
autostart=true
startsecs=10
user=tsantor
; if rabbitmq is supervised, set its priority higher so it starts first
priority=999
Here is the output of the log from the program above:
[2014-12-16 20:29:42,293: INFO/MainProcess] beat: Starting...
[2014-12-16 20:34:08,161: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:39:08,186: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:44:08,204: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:49:08,205: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
[2014-12-16 20:54:08,223: INFO/MainProcess] Scheduler: Sending due task gettweets-every-5-mins (twitter.tasks.get_tweets)
Here is my celery.py settings file:
from datetime import timedelta

BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_DISABLE_RATE_LIMITS = True

CELERYBEAT_SCHEDULE = {
    'gettweets-every-5-mins': {
        'task': 'twitter.tasks.get_tweets',
        'schedule': timedelta(seconds=300)  # 300 = every 5 minutes
    },
}
Here is my celeryapp.py:
from __future__ import absolute_import
import os
from django.conf import settings
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'app.settings')
app = Celery('app')
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
Here is my twitter/tasks.py:
from __future__ import absolute_import
import logging

from celery import shared_task

from twitter.views import IngestTweets

log = logging.getLogger('custom.log')

@shared_task
def get_tweets():
    """
    Get tweets and save them to the DB
    """
    instance = IngestTweets()
    IngestTweets.get_new_tweets(instance)
    log.info('Successfully ingested tweets via celery task')
    return True
The get_tweets task never gets executed. I know the code itself works, because I can run get_tweets manually and it works fine.
I have spent two days trying to figure out why beat is sending due tasks but nothing executes them. Any help is greatly appreciated. Thanks in advance.
user2097159, thanks for pointing me in the right direction; I was not aware that I also had to run a worker via Supervisor. I thought it was either a worker or a beat, but now I understand that I need a worker to handle the tasks and a beat to fire the tasks off periodically.
Below is the missing worker config for supervisor:
[program:nrv_celery_worker]
; Worker
command=/Users/tsantor/.virtualenvs/nrv_env/bin/celery worker -A app --loglevel=INFO
; Project dir
directory=/Users/tsantor/Projects/NRV/nrv
; Logs
stdout_logfile=/Users/tsantor/Projects/NRV/nrv/logs/celery_worker.log
redirect_stderr=true
autostart=true
autorestart=true
startsecs=10
user=tsantor
numprocs=1
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
I then reset the RabbitMQ queue. Now that I have both the beat and worker programs managed via supervisor, all is working as intended. Hope this helps someone else out.
You need to start both a worker process and a beat process. You can create separate processes as described in tsantor's answer, or you can create a single process that runs both a worker and a beat. The latter can be more convenient during development (but is not recommended for production).
From "Starting the scheduler" in the Celery documentation:
You can also embed beat inside the worker by enabling the workers -B option, this is convenient if you’ll never run more than one worker node, but it’s not commonly used and for that reason isn’t recommended for production use:
$ celery -A proj worker -B
For example Supervisor config files, see https://github.com/celery/celery/tree/master/extra/supervisord/ (linked from "Daemonization").
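If you do use the embedded-beat approach during development, here is a sketch of a single Supervisor program for it, modeled on the worker config earlier in this thread; the paths and app name are placeholders:

[program:proj_celery]
; Worker with embedded beat (-B); development use only, as noted above.
command=/path/to/virtualenv/bin/celery -A proj worker -B --loglevel=INFO
directory=/path/to/project
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=600
killasgroup=true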