Celery worker doesn't launch from Python - python

We have Python 3.6.1 set up with Django, Celery, and Rabbitmq on Ubuntu 14.04. Right now, I'm using the Django debug server (for dev and Apache isn't working). My current problem is that the celery workers get launched from Python and immediately die -- processes show as defunct. If I use the same command in a terminal window, the worker gets created and picks up the task if there is one waiting in the queue.
Here's the command:
celery worker --app=myapp --loglevel=info --concurrency=1 --maxtasksperchild=20 -n celery_1 -Q celery
The same functionality occurs for whichever queues are being set up.
In the terminal, we see the output myapp.settings - INFO - Loading... followed by output that describes the queue and lists the tasks. When running from Python, the last thing we see is the Loading...
In the code, we do have a check to be sure we are not running the celery command as root.
These are the Celery settings from our settings.py file:
CELERY_ACCEPT_CONTENT = ['json','pickle']
CELERY_TASK_SERIALIZER = 'pickle'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_IMPORTS = ('api.tasks',)
CELERYD_PREFETCH_MULTIPLIER = 1
CELERYD_CONCURRENCY = 1
BROKER_POOL_LIMIT = 120 # Note: I tried this set to None but it didn't seem to make any difference
CELERYD_LOG_COLOR = False
CELERY_LOG_FORMAT = '%)asctime)s - $(processName)s - %(levelname)s - %(message)s'
CELERYD_HIJACK_ROOT_LOGGER = False
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(psconf.BASE_DIR, 'myapp_static/')
BROKER_URL = psconf.MQ_URI
CELERY_RESULT_BACKEND = 'rpc'
CELERY_RESULT_PERSISTENT = True
CELERY_ROUTES = {}
for entry in os.scandir(psconf.PLUGIN_PATH):
if not entry.is_dir() or entry.name == '__pycache__':
continue
plugin_dir = entry.name
settings_file = f'{plugin_dir}.settings'
try:
plugin_tasks = importlib.import_module(settings_file)
queue_name = plugin_tasks.QUEUENAME
except ModuleNotFoundError as e:
logging.warning(e)
except AttributeError:
logging.debug(f'The plugin {plugin_dir} will use the general worker queue.')
else:
CELERY_ROUTES[f'{plugin_dir}.tasks.run'] = {'queue': queue_name}
logging.debug(f'The plugin {plugin_dir} will use the {queue_name} queue.')
Here is the part that kicks off the worker:
class CeleryWorker(BackgroundProcess):
def __init__(self, n, q):
self.name = n
self.worker_queue = q
cmd = f'celery worker --app=myapp --loglevel=info --concurrency=1 --maxtasksperchild=20 -n {self.name" -Q {self.worker_queue}'
super().__init__(cmd, cwd=str(psconf.BASE_DIR))
class BackgroundProcess(subprocess.Popen):
def __init__(self, args, **kwargs):
super().__init__(args, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True, **kwargs)
Any suggestions as to how to get this working from Python are appreciated. I'm new to Rabbitmq/Celery.

Just in case someone else needs this...It turns out that the problem was that the shell script which kicks off this whole app is now being launched with sudo and, even though I thought I was checking so we wouldn't launch the celery worker with sudo, I'd missed something and we were trying to launch as root. That is a no-no. I'm now explicitly using 'sudo -u ' and the workers are starting properly.

Related

Celery consumer does not receive messages from SQS queue on LocalStack

I have a SQS queue on a LocalStack server and I'm trying to consume messages from it with a Celery consumer.
It seams that the consumer is properly attached to the queue, for example the queue sqs-test-queue, but it does not receive any message when I try to send one with aws command.
My celeryconfig.py looks like this:
from kombu import (
Exchange,
Queue
)
broker_transport_options = {'region': REGION}
broker_transport = 'sqs'
accept_content = ['application/json']
result_serializer = 'json'
content_encoding = 'utf-8'
task_serializer = 'json'
worker_enable_remote_control = False
worker_send_task_events = False
result_backend = None
task_queues = (
Queue('sqs-test-queue', exchange=Exchange(''), routing_key='sqs-test-queue'),
)
and my tasks.py module looks like this:
from celery import Celery
from kombu.utils.url import quote
AWS_ACCESS_KEY = quote("AWS_ACCESS_KEY")
AWS_SECRET_KEY = quote("AWS_SECRET_KEY")
LOCALSTACK = "<IP>:<PORT>"
broker_url = "sqs://{access}:{secret}#{host}".format(access=AWS_ACCESS_KEY,
secret=AWS_SECRET_KEY,
host=LOCALSTACK)
app = Celery('tasks', broker=broker_url, backend=None)
app.config_from_object('celeryconfig')
#app.task(bind=True, name='tasks.consume', acks_late=True, ignore_result=True)
def consume(self, msg):
# DO SOMETHING WITH THE RECEIVED MESSAGE
return True
Tried to execute it with celery -A tasks worker -l INFO -Q sqs-test-queue and everything seams OK:
...
[tasks]
. tasks.consume
[... INFO/MainProcess] Connected to sqs://AWS_ACCESS_KEY:**#<IP>:<PORT>//
[... INFO/MainProcess] celery#local ready
but when I try to send a message with aws sqs send-message --endpoint-url=http://<IP>:<PORT> --queue-url=http://localhost:<PORT>/queue/sqs-test-queue --message-body="Test message", nothing happens.
What am I doing wrong? Have I missed something in the configuration maybe?
PS: If I try to run the command aws sqs receive-message --endpoint-url=http://<IP>:<PORT> --queue-url=http://localhost:<PORT>/queue/sqs-test-queue, I'm able to get the message.
NOTE:
I'm using Python 3.7.0 and my pip freeze looks like this:
boto3==1.10.16
botocore==1.13.16
celery==4.3.0
kombu==4.6.6
pycurl==7.43.0.3
...
I am going through the same thing as you. To fix it I did a couple of things:
I set the HOSTNAME_EXTERNAL and HOSTNAME env variables in localstack
Set broker_url to sqs://{access}:{secret}#{host}:{port} (as you have it)
Make sure that the celery worker's broker_transport_options does not include the config item: wait_time_seconds since this causes errors with localstack as of February 7th, 2020 (check issue here).
Once I did those two things, it started working, hope it helps.
Celery can't publish or consume arbitrary messages to/from any message queue system. Use kombu for that - that is what Celery uses behind the scenes too.

Django Celery Worker Not reciving the Tasks

Whenever I am running the celery worker I am getting the warning
./manage.py celery worker -l info --concurrency=8
and if I am ignored this warning then my celery worker not receiving the celery beat tasks
After googled I have also changed the worker name, but this time I am not receiving the warning but celery worker still not receiving the celery beat scheduled tasks
I have checked the celery beat logs, and celery beat scheduling the task on time.
I have also checked the celery flower and its showing two workers and the first worker is receiving the tasks and not executing it, how to send all task the second worker? or how can i disable the first kombu worker, what is djagno-celery setting that i am missing?
My django settings.py
RABBITMQ_USERNAME = "guest"
RABBITMQ_PASSWORD = "guest"
BROKER_URL = 'amqp://%s:%s#localhost:5672//' % (RABBITMQ_USERNAME,
RABBITMQ_PASSWORD)
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE = 'default'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_IGNORE_RESULT = True
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
celery_enable_utc=True
import djcelery
djcelery.setup_loader()
You only enabled the worker. For a task to be executed, you must call the task with the help of the your_task.delay () function.
For example, open another terminal, enter your project, and run the python manage.py shell command. When entering the shell of your project Django, import your task and run the command your_task.delay ()
In the following link, there is an example of celery code with rabbitmq broker, I advise you to study it:
https://github.com/celery/celery/tree/master/examples/django

Starting celery worker from multiprocessing

I'm new to celery. All of the examples I've seen start a celery worker from the command line. e.g:
$ celery -A proj worker -l info
I'm starting a project on elastic beanstalk and thought it would be nice to have the worker be a subprocess of my web app. I tried using multiprocessing and it seems to work. I'm wondering if this is a good idea, or if there might be some disadvantages.
import celery
import multiprocessing
class WorkerProcess(multiprocessing.Process):
def __init__(self):
super().__init__(name='celery_worker_process')
def run(self):
argv = [
'worker',
'--loglevel=WARNING',
'--hostname=local',
]
app.worker_main(argv)
def start_celery():
global worker_process
worker_process = WorkerProcess()
worker_process.start()
def stop_celery():
global worker_process
if worker_process:
worker_process.terminate()
worker_process = None
worker_name = 'celery#local'
worker_process = None
app = celery.Celery()
app.config_from_object('celery_app.celeryconfig')
Seems like a good option, definitely not the only option but a good one :)
One thing you might want to look into (you might already be doing this), is linking the autoscaling to the size of your Celery queue. So you only scale up when the queue is growing.
Effectively Celery does something similar internally of course, so there's not a lot of difference. The only snag I can think of is the handling of external resources (database connections for example), that might be a problem but is completely dependent on what you are doing with Celery.
If anyone is interested, I did get this working on Elastic Beanstalk with a pre-configured AMI server running Python 3.4. I had a lot of problems with the Docker based server running Debian Jessie. Something to do with port remapping, maybe. Docker is kind of a black box, and I've found it very hard to work with and debug. Fortunately, the good folks at AWS just added a non-docker Python 3.4 option on April 8, 2015.
I did a lot of searching to get this deployed and working. I saw lots of questions without answers. So here's my very simple deployed python 3.4/flask/celery process.
Celery you can just pip install. You'll need to install rabbitmq from a configuration file with a config command or container_command. I'm using a script in my uploaded project zip, so a container_command is necessary to use the script (regular eb config command takes place before the project is installed).
[yourapproot]/.ebextensions/05_install_rabbitmq.config:
container_commands:
01RunScript:
command: bash ./init_scripts/app_setup.sh
[yourapproot]/init_scripts/app_setup.sh:
#!/usr/bin/env bash
# Download and install Erlang
yum install erlang
# Download the latest RabbitMQ package using wget:
wget http://www.rabbitmq.com/releases/rabbitmq-server/v3.5.1/rabbitmq-server-3.5.1-1.noarch.rpm
# Install rabbit
rpm --import http://www.rabbitmq.com/rabbitmq-signing-key-public.asc
yum -y install rabbitmq-server-3.5.1-1.noarch.rpm
# Start server
/sbin/service rabbitmq-server start
I'm doing a flask app, so I startup the workers before the first request:
#app.before_first_request
def before_first_request():
task_mgr.start_celery()
The task_mgr creates the celery app object (which I call celery, since the flask app object is app). The -Ofair is pretty key here, for a simple task manager. There's all kinds of strange behavior with task prefetch. This should maybe be the default?
task_mgr/task_mgr.py:
import celery as celery_module
import multiprocessing
class WorkerProcess(multiprocessing.Process):
def __init__(self):
super().__init__(name='celery_worker_process')
def run(self):
argv = [
'worker',
'--loglevel=WARNING',
'--hostname=local',
'-Ofair',
]
celery.worker_main(argv)
def start_celery():
global worker_process
multiprocessing.set_start_method('fork') # 'spawn' seems to work also
worker_process = WorkerProcess()
worker_process.start()
def stop_celery():
global worker_process
if worker_process:
worker_process.terminate()
worker_process = None
worker_name = 'celery#local'
worker_process = None
celery = celery_module.Celery()
celery.config_from_object('task_mgr.celery_config')
My config is pretty simple so far:
task_mgr/celery_config.py:
BROKER_URL = 'amqp://'
CELERY_RESULT_BACKEND = 'amqp://'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json' # 'pickle' warning: can't use datetime in json
CELERY_RESULT_SERIALIZER = 'json' # 'pickle' warning: can't use datetime in json
CELERY_TASK_RESULT_EXPIRES = 18000 # Results hang around for 5 hours
CELERYD_CONCURRENCY = 4
Then you can put tasks wherever you need them:
from task_mgr.task_mgr import celery
import time
#celery.task(bind=True)
def error_task(self):
self.update_state(state='RUNNING')
time.sleep(10)
raise KeyError('im an error')
#celery.task(bind=True)
def long_task(self):
self.update_state(state='RUNNING')
time.sleep(20)
return 'long task finished'
#celery.task(bind=True)
def task_with_status(self, wait):
self.update_state(state='RUNNING')
for i in range(5):
time.sleep(wait)
self.update_state(
state='PROGRESS',
meta={
'current': i + 1,
'total': 5,
'status': 'progress',
'host': self.request.hostname,
}
)
time.sleep(wait)
return 'finished with wait = ' + str(wait)
I also keep a task queue to hold the async results so I can monitor the tasks:
task_queue = []
def queue_task(task, *args):
async_result = task.apply_async(args)
task_queue.append(
{
'task_name':task.__name__,
'task_args':args,
'async_result':async_result
}
)
return async_result
def get_tasks_info():
tasks = []
for task in task_queue:
task_name = task['task_name']
task_args = task['task_args']
async_result = task['async_result']
task_id = async_result.id
task_state = async_result.state
task_result_info = async_result.info
task_result = async_result.result
tasks.append(
{
'task_name': task_name,
'task_args': task_args,
'task_id': task_id,
'task_state': task_state,
'task_result.info': task_result_info,
'task_result': task_result,
}
)
return tasks
And of course, start the tasks where you need to:
from webapp.app import app
from flask import url_for, render_template, redirect
from webapp import tasks
from task_mgr import task_mgr
#app.route('/start_all_tasks')
def start_all_tasks():
task_mgr.queue_task(tasks.long_task)
task_mgr.queue_task(tasks.error_task)
for i in range(1, 9):
task_mgr.queue_task(tasks.task_with_status, i * 2)
return redirect(url_for('task_status'))
#app.route('/task_status')
def task_status():
current_tasks = task_mgr.get_tasks_info()
return render_template(
'parse/task_status.html',
tasks=current_tasks
)
And that's about it. Let me know if you need any help, though my celery knowledge is still fairly limited.

Django/Celery multiple queues on localhost - routing not working

I followed celery docs to define 2 queues on my dev machine.
My celery settings:
CELERY_ALWAYS_EAGER = True
CELERY_TASK_RESULT_EXPIRES = 60 # 1 mins
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
CELERY_CREATE_MISSING_QUEUES = True
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='arena.social.tasks.#'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'fs_feeds',
},
}
i opened two terminal windows, in virtualenv of my project, and ran following commands:
terminal_1$ celery -A arena worker -Q default -B -l debug --purge -n deafult_worker
terminal_2$ celery -A arena worker -Q feeds -B -l debug --purge -n feeds_worker
what i get is that all tasks are being processed by both queues.
My goal is to have one queue to process only the one task defined in CELERY_ROUTES and default queue to process all other tasks.
I also followed this SO question, rabbitmqctl list_queues returns celery 0, and running rabbitmqctl list_bindings returns exchange celery queue celery [] twice. Restarting rabbit server didn't change anything.
Ok, so i figured it out. Following is my whole setup, settings and how to run celery, for those who might be wondering about same thing as my question did.
Settings
CELERY_TIMEZONE = TIME_ZONE
CELERY_ACCEPT_CONTENT = ['json', 'pickle']
CELERYD_CONCURRENCY = 2
CELERYD_MAX_TASKS_PER_CHILD = 4
CELERYD_PREFETCH_MULTIPLIER = 1
# celery queues setup
CELERY_DEFAULT_QUEUE = 'default'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'default'
CELERY_QUEUES = (
Queue('default', Exchange('default'), routing_key='default'),
Queue('feeds', Exchange('feeds'), routing_key='long_tasks'),
)
CELERY_ROUTES = {
'arena.social.tasks.Update': {
'queue': 'feeds',
'routing_key': 'long_tasks',
},
}
How to run celery?
terminal - tab 1:
celery -A proj worker -Q default -l debug -n default_worker
this will start first worker that consumes tasks from default queue. NOTE! -n default_worker is not a must for the first worker, but is a must if you have any other celery instances up and running. Setting -n worker_name is the same as --hostname=default#%h.
terminal - tab 2:
celery -A proj worker -Q feeds -l debug -n feeds_worker
this will start second worker that consumers tasks from feeds queue. Notice -n feeds_worker, if you are running with -l debug (log level = debug), you will see that both workers are syncing between them.
terminal - tab 3:
celery -A proj beat -l debug
this will start the beat, executing tasks according to the schedule in your CELERYBEAT_SCHEDULE.
I didn't have to change the task, or the CELERYBEAT_SCHEDULE.
For example, this is how looks my CELERYBEAT_SCHEDULE for the task that should go to feeds queue:
CELERYBEAT_SCHEDULE = {
...
'update_feeds': {
'task': 'arena.social.tasks.Update',
'schedule': crontab(minute='*/6'),
},
...
}
As you can see, no need for adding 'options': {'routing_key': 'long_tasks'} or specifying to what queue it should go. Also, if you were wondering why Update is upper cased, its because its a custom task, which are defined as sub classes of celery.Task.
Update Celery 5.0+
Celery made a couple changes since version 5, here is an updated setup for routing of tasks.
How to create the queues?
Celery can create the queues automatically. It works perfectly for simple cases, where celery default values for routing are ok.
task_create_missing_queues=True or, if you're using django settings and you're namespacing all celery configs under CELERY_ key, CELERY_TASK_CREATE_MISSING_QUEUES=True. Note, that it is on by default.
Automatic scheduled task routing
After configuring celery app:
celery_app.conf.beat_schedule = {
"some_scheduled_task": {
"task": "module.path.some_task",
"schedule": crontab(minute="*/10"),
"options": {"queue": "queue1"}
}
}
Automatic task routing
Celery app still has to be configured first and then:
app.conf.task_routes = {
"module.path.task2": {"queue": "queue2"},
}
Manual routing of tasks
In case and you want to route the tasks dynamically, then when sending the task specify the queue:
from module import task
def do_work():
# do some work and launch the task
task.apply_async(args=(arg1, arg2), queue="queue3")
More details re routing can be found here:
https://docs.celeryproject.org/en/stable/userguide/routing.html
And regarding calling tasks here:
https://docs.celeryproject.org/en/stable/userguide/calling.html
In addition to accepted answer, if anyone comes here and still wonders why his settings aren't working (as I did just moments ago), here's why: celery documentation isn't listing settings names properly.
For celery 5.0.5 settings CELERY_DEFAULT_QUEUE, CELERY_QUEUES, CELERY_ROUTES should be named CELERY_TASK_DEFAULT_QUEUE, CELERY_TASK_QUEUESand CELERY_TASK_ROUTES instead. These are settings that I've tested, but my guess is the same rule applies for exchange and routing key aswell.

Can't get "retry" to work in Celery

Given a file myapp.py
from celery import Celery
celery = Celery("myapp")
celery.config_from_object("celeryconfig")
#celery.task(default_retry_delay=5 * 60, max_retries=12)
def add(a, b):
with open("try.txt", "a") as f:
f.write("A trial = {}!\n".format(a + b))
raise add.retry([a, b])
Configured with a celeryconfig.py
CELERY_IMPORTS = ["myapp"]
BROKER_URL = "amqp://"
CELERY_RESULT_BACKEND = "amqp"
I call in the directory that have both files:
$ celeryd -E
And then
$ python -c "import myapp; myapp.add.delay(2, 5)"
or
$ celery call myapp.add --args="[2, 5]"
So the try.txt is created with
A trial = 7!
only once.
That means, the retry was ignored.
I tried many other things:
Using MongoDB as broker and backend and inspecting the database (strangely enough, I can't see anything in my broker "messages" collection even in a "countdown"-scheduled job)
The PING example in here, both with RabbitMQ and MongoDB
Printing on the screen with both print (like the PING example) and logging
Make the retry call in an except block after an enforced Exception is raised, raising or returning the retry(), changing the "throw" parameter to True/False/not specified.
Seeing what's happening with celery flower (in which the "broker" link shows nothing)
But none happened to work =/
My celery report output:
software -> celery:3.0.19 (Chiastic Slide) kombu:2.5.10 py:2.7.3
billiard:2.7.3.28 py-amqp:N/A
platform -> system:Linux arch:64bit, ELF imp:CPython
loader -> celery.loaders.default.Loader
settings -> transport:amqp results:amqp
Is there anything wrong above? What I need to do to make the retry() method work?

Categories

Resources